
Identification of metastasis-related genes for predicting prostate cancer diagnosis, metastasis and immunotherapy drug candidates using machine learning approaches



Prostate cancer (PCa) is the second leading cause of tumor-related mortality in men. Metastasis from advanced tumors is the primary cause of death among patients. Identifying novel and effective biomarkers is essential for understanding the mechanisms of metastasis in PCa patients and developing successful interventions.


Using the GSE8511 and GSE27616 data sets, 21 metastasis-related genes were identified through the weighted gene co-expression network analysis (WGCNA) method. Subsequent functional analysis of these genes was conducted on the gene set cancer analysis (GSCA) website. Cluster analysis was utilized to explore the relationship between these genes, immune infiltration in PCa, and the efficacy of targeted drug IC50 scores. Machine learning algorithms were then employed to construct diagnostic and prognostic models, assessing their predictive accuracy. Additionally, multivariate COX regression analysis highlighted the significant role of POLD1 and examined its association with DNA methylation. Finally, molecular docking and immunohistochemistry experiments were carried out to assess the binding affinity of POLD1 to PCa drugs and its impact on PCa prognosis.


The study identified 21 metastasis-related genes using the WGCNA method, which were found to be associated with DNA damage, hormone AR activation, and inhibition of the RTK pathway. Cluster analysis confirmed a significant correlation between these genes and PCa metastasis, particularly in the context of immunotherapy and targeted therapy drugs. A diagnostic model combining multiple machine learning algorithms showed strong predictive capabilities for PCa diagnosis, while a metastasis model using the LASSO algorithm also yielded promising results. POLD1 emerged as a key prognostic gene among the metastasis-related genes, showing associations with DNA methylation. Molecular docking experiments supported its high affinity with PCa-targeted drugs. Immunohistochemistry experiments further validated that increased POLD1 expression is linked to poor prognosis in PCa patients.


The developed diagnostic and metastasis models provide substantial value for patients with prostate cancer. The discovery of POLD1 as a novel biomarker related to prostate cancer metastasis offers a promising avenue for enhancing treatment of prostate cancer metastasis.


Prostate cancer (PCa) ranks among the top ten most common malignant tumors in humans and is now the second leading cause of cancer-related mortality in men [1]. Despite advancements in diagnosis and treatment, the global incidence of PCa continues to increase annually. Treatment options for PCa include surgical resection, chemotherapy, radiotherapy, and hormonal therapy, all of which can effectively treat early-stage PCa characterized by localized tumors [2]. However, approximately 30% of PCa cases progress to metastatic disease, with bone metastases being the most prevalent in advanced cases [3]. The five-year survival rate for advanced metastatic PCa is only around 28% [4]. Since a significant portion of PCa cell growth relies on androgen receptor (AR) signaling, androgen deprivation therapy (ADT) remains the primary clinical treatment for advanced PCa [5]. While ADT typically induces remission for 1–2 years, the development of metastatic castration-resistant prostate cancer (mCRPC) leads to drug resistance and poor efficacy [6]. Therefore, there is a critical need to identify new therapeutic targets for metastatic PCa.

The tumor microenvironment (TME) is a key factor in promoting tumor progression and metastasis. Various elements like secreted molecules, changes in the extracellular matrix, vascular remodeling, and interactions with immune cells contribute to the dynamic nature of the TME. Tumor cells are responsible for orchestrating the intricate changes in the extracellular matrix, the formation of new blood vessels, and communication between cells within the TME [7]. Immune cells present in the TME, such as tumor-associated macrophages (TAMs) and infiltrating T cells, are known to drive the progression and spread of PCa. TAMs release pro-inflammatory cytokines such as CCL5 and CCL2, influencing the development and dissemination of PCa [8, 9]. CD8+ T cells play a role in anti-tumor immunity in PCa and are regulated by the N6-methyladenosine regulator YTHDF1 [10]. Mast cells are important cellular components in the TME and have been linked to poor outcomes in individuals with prostate cancer and other solid tumors. Recent research has highlighted mast cell-derived SAMD14 as a new regulator in the prostate tumor microenvironment [11]. Therefore, a comprehensive investigation into the relationship between PCa metastasis and immune cells is essential for uncovering the mechanisms underlying PCa metastasis.

Multi-omics analysis, utilizing artificial intelligence and machine learning techniques, aims to analyze multidimensional data to uncover hidden patterns and correlations related to diseases. These technologies play a crucial role in making clinically relevant decisions and applications [12,13,14]. Martelin et al. developed a machine learning model for cost-effective preliminary prostate cancer screening [15]. Altıntaş et al. also achieved high accuracy in predicting urethral stricture after TURP with their model [16]. Additionally, Yin et al. used machine learning to predict biochemical recurrence of prostate cancer [17]. This study utilized the WGCNA method on two PCa metastasis-related datasets (GSE8511 and GSE27616) to pinpoint genes associated with PCa metastasis. By integrating differential prognostic genes from the TCGA-PRAD dataset, 21 differential prognostic genes linked to metastasis were identified. Cluster analysis further validated the relationship between these genes and PCa patient prognosis, immunotherapy, and targeted therapy. Machine learning algorithms were employed to construct diagnostic and metastatic models to assess the predictive value of these genes for PCa diagnosis and metastasis. Multivariate COX regression analysis underscored the significance of POLD1 among metastasis-related genes. Moreover, significant associations between POLD1 and DNA methylation, as well as androgen receptor-related compounds, were observed. Immunohistochemistry was utilized to confirm the expression and prognostic variances of POLD1. In conclusion, this research identifies POLD1 as a novel marker of PCa metastasis, with implications for diagnosis, prognosis, metastasis, and immunotherapy.

Materials and methods

Datasets and patient samples

The study utilized the GSE8511 and GSE27616 datasets to investigate genes associated with prostate cancer (PCa) metastasis. GSE8511 included 13 metastatic PCa samples and 12 non-metastatic PCa samples, while GSE27616 had 4 metastatic PCa samples and 5 non-metastatic PCa samples. In addition, RNAseq data and clinical information from the TCGA-PRAD dataset were integrated into the analysis. The diagnostic model was developed and validated using the TCGA-PRAD, GSE6956, GSE16120, GSE14206, and GSE38241 datasets. Furthermore, the metastasis model was constructed and validated using the GSE8511, GSE32269, GSE27616, GSE29650, GSE41192, and GSE38241 datasets. Sixty PRAD tissue samples and corresponding adjacent tissue samples were obtained from Shanghai Outdo Biotech Company. The participants in the tissue chip study underwent surgical procedures between January 2011 and December 2014, with a follow-up period until November 2021, ranging from 6 to 10 years.

WGCNA algorithm identifies metastasis-related genes

Using the ‘WGCNA’ package, the GSE8511 and GSE27616 data were checked for missing values, and sample clustering was performed to identify outlier samples. Optimal soft-threshold power values were chosen to build the adjacency matrix so that the connectivity distribution of the genes followed a scale-free network. Topological overlap matrices were then computed, genes were clustered, modules were identified by dynamic tree cutting, and similar modules were merged (each module containing a minimum of 100 genes). Correlation coefficients and P values between modules and clinical traits were calculated to identify the modules, and their constituent genes, most closely associated with prostate cancer metastasis.
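The role of the soft-threshold power can be illustrated with a minimal Python sketch. The analysis itself was run with the R ‘WGCNA’ package; the code below is not that package. It assumes an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to the chosen power, and the gene names and expression values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, power):
    """Unsigned adjacency (an assumption): a_ij = |cor(gene_i, gene_j)| ** power."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for g in genes
        for h in genes
        if g != h
    }


# Invented expression values for three hypothetical genes across five samples.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly anti-correlated with geneA
}
adjacency = soft_threshold_adjacency(toy_expr, power=8)
```

Raising correlations to a high power keeps strong co-expression links close to 1 while pushing weak ones toward 0, which is what lets the resulting network approximate a scale-free topology.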

Subgroup analysis based on metastatic genes

RNA-sequencing expression profiles and corresponding clinical information for PRAD were downloaded from the TCGA dataset. Consensus clustering was performed using the ConsensusClusterPlus R package (v1.54.0) with a maximum of 6 clusters; 80% of the total samples were drawn in each of 100 repetitions. The clustering algorithm used was ‘hc’ with inner linkage method ‘ward.D2’. Clustering heatmaps were generated using the R software package pheatmap (v1.0.12).

Immune infiltration and drug IC50 analysis

For a dependable assessment of the immune score findings, we utilized immunedeconv, a software package in R [18]. Each algorithm underwent comprehensive testing, and each presented distinct benefits. The XCELL approach was chosen for this research as it evaluates a broader spectrum of immune cells [19]. Chemotherapy response to various drugs for each sample in TCGA-PRAD was predicted using the Genomics of Drug Sensitivity in Cancer (GDSC) website. The prediction was carried out through the R package pRRophetic, which estimated the IC50 of the samples using ridge regression.

Constructing diagnostic and metastasis models

In order to develop a PCa diagnosis model with high accuracy and stable performance, we integrated multiple machine learning algorithms into various algorithm combinations. These algorithms include elastic net (Enet), gradient boosting machines (GBM), glmBoost, the least absolute shrinkage and selection operator (Lasso), linear discriminant analysis (LDA), NaiveBayes, plsRglm, random forest (RF), Ridge, Stepglm, support vector machine (SVM), and extreme gradient boosting (XGBoost). The training set comprised the TCGA-PRAD dataset, with validation performed on the GSE6956, GSE16120, GSE14206, and GSE38241 datasets. Each algorithm combination was evaluated based on the AUC value, and the combination with the highest average AUC was selected as the optimal model. Metastasis models were constructed using the LASSO regression algorithm and assessed through 10-fold cross-validation. The metastasis model was trained on the GSE8511 dataset and validated on a combination of the GSE29650, GSE27616, and GSE32269 datasets. The analysis was conducted using the R software glmnet package.
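The selection rule described above (compute an AUC per cohort, then keep the combination with the highest average AUC) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, model names, and AUC values are hypothetical, and the AUC is computed with the standard pairwise-ranking definition rather than with any particular R package used in the study.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def best_by_mean_auc(auc_table):
    """Return the model whose AUCs, averaged over all cohorts, are highest."""
    return max(auc_table, key=lambda name: sum(auc_table[name]) / len(auc_table[name]))


# Invented labels (1 = tumor, 0 = normal) and model scores for four samples.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Invented per-cohort AUCs for two hypothetical algorithm combinations.
hypothetical_aucs = {
    "combination_a": [0.90, 0.70, 0.80],
    "combination_b": [0.95, 0.60, 0.70],
}
selected = best_by_mean_auc(hypothetical_aucs)
```

Averaging over cohorts rewards combinations that generalize to the validation sets rather than those that only fit the training set well.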

Analysis of POLD1 correlation with DNA methylation and screening and docking of compounds

The correlation analysis between POLD1 expression and DNA methylation was conducted using the Shiny Methylation Analysis Resource Tool (SMART) website. SMART is an interactive web server specifically designed for DNA methylation analysis within the TCGA project [20]. The Connectivity Maps (CMaps) website was utilized to examine the correlation between POLD1 and androgen receptor-related compounds [21, 22]. Subsequently, the chosen compounds were subjected to molecular docking with POLD1 using the CB-Dock2 website. The vina score served as the parameter for evaluating molecular docking affinity, with lower scores indicating higher affinity between the receptor and the ligand [23]. Typically, a vina score below −5 is considered to indicate favorable binding.

Analysis of POLD1 expression in prostate cancer tissue microarrays by immunohistochemistry

The tissue microarray was initially placed in an oven at 85 °C for 10 min, followed by soaking in xylene for 15 min and hydration using an ethanol concentration gradient of 100%, 95%, 80%, and 70%. Subsequently, the chip was treated with a citric acid solution in an autoclave to facilitate antigen retrieval. After cooling, the chips were rinsed with PBS and exposed to hydrogen peroxide for 20 min. Next, the POLD1 antibody (AF302114) was added and incubated at room temperature for 2 h. The tissue microarray was then rinsed three times with PBS and incubated with immunohistochemical secondary antibodies for 20 min at room temperature. Following three additional rinses with PBS, the microarray was stained with DAB and hematoxylin. Dehydration was carried out using an ethanol gradient of 70%, 80%, 90%, and 100%. Subsequently, immersion in xylene for eight minutes and blocking of the microarray took place. The immunostaining intensity score ranges from 0 to 3, where 0, 1, 2, and 3 correspond to no reaction, weak reaction, moderate reaction, and strong reaction, respectively. A scale is then applied based on the proportion of positive staining observed: scores of 1, 2, 3, and 4 represent ranges of 0–25%, 26–50%, 51–75%, and 76–100%, respectively. The final score is calculated by multiplying the strength score by the scale score. Interpretation of results is as follows: scores from 0 to 5 indicate low expression, while scores from 6 to 12 indicate high expression.
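The scoring scheme above can be written out as a short Python sketch. The function names are hypothetical, and the handling of non-integer percentages at the range boundaries (for example 25.5%) is an assumption, since the text only gives integer ranges.

```python
def proportion_score(percent_positive):
    """Map the percentage of positively stained cells to a score of 1-4."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:  # assumption: values just above 25 fall here
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def ihc_score(intensity, percent_positive):
    """Final score = staining intensity (0-3) x proportion score (1-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity * proportion_score(percent_positive)


def expression_group(score):
    """Scores of 0-5 are low expression; scores of 6-12 are high expression."""
    return "high" if score >= 6 else "low"


# Example: strong staining (3) in 60% of cells gives 3 x 3 = 9, i.e. high expression.
example_group = expression_group(ihc_score(3, 60))
```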

Statistical analysis

POLD1 expression in PCa and normal tissues was determined by the Wilcoxon rank-sum test. Prognostic analysis was performed using the log-rank test. Statistical significance was defined as p < 0.05.


Screening genes related to PCa metastasis

The workflow of our study is shown in Fig. 1. To identify potential targets for regulating PCa metastasis, we conducted WGCNA analysis on the GSE8511 dataset. To meet the requirements of scale-free network distribution, we determined the adjacency matrix weight parameter power value to be 8 (Fig. 2A-B). Subsequently, a weighted co-expression network model was constructed based on this power value, resulting in the categorization of all genes into 3 modules (Fig. 2C). By utilizing the Pearson correlation algorithm to assess the correlation coefficient and p-value between module eigengenes and traits, we observed that the turquoise module exhibited the highest correlation (correlation coefficient of 0.81) (Fig. 2D-E). Additionally, we incorporated GSE27616 into our analysis, where we selected a power value of 30 using the WGCNA method, leading to the division of genes into 9 modules based on this power value. Notably, we identified the brown module as having the strongest correlation (correlation coefficient of 0.88) (Fig. 2F-J). Metastasis is a critical driver of PCa progression. To identify potential targets, we examined all genes up-regulated in cancer within the TCGA-PRAD dataset, resulting in the identification of 76 metastasis-related up-regulated genes (Fig. 2K). scRNA-seq technology makes it possible to study the functional heterogeneity of cancer cells, and CancerSEA, a specialized database, was utilized to decode the distinct functional states of cancer cells at the single-cell level. By analyzing the functions of the 76 identified genes in the CancerSEA database, we found that they were associated with PCa angiogenesis and invasion, supporting their role in regulating PCa metastasis (Fig. 2L). Furthermore, we identified 21 genes among the 76 metastasis-related genes that were linked to poor prognosis in PRAD, highlighting their significance as risk factors (Fig. 2M).

Fig. 1

The workflow of our study

Fig. 2

21 PCa metastasis-related prognostic genes identified. (A-B) The optimal soft-threshold power. The threshold was 8. (C) Weighted co-expression network modeling based on selected power values. (D) Heatmap of trait module associations. (E) Scatter plot of association between specific traits and module genes. (F-G) The optimal soft-threshold power. The threshold was 30. (H) Weighted co-expression network modeling based on selected power values. (I) Heatmap of trait module associations. (J) Scatter plot of association between specific traits and module genes. (K) Venn diagram plotting the intersection of GSE8511 and GSE27616 metastases with the TCGA-PRAD oncogene. (L) Functional analysis of metastasis-related genes. (M) Prognostic analysis of metastasis-related genes

The role of metastasis-related genes in PCa

The mutational status of genes in TCGA-PRAD was initially examined. An Oncoplot was used to visualize the SNV status of the top 10 mutated genes out of the 21 genes analyzed in PRAD. Among these genes, LENG8 and TELO2 exhibited the highest mutation frequencies (Fig. 3A). The SNV categories of the gene sets in PRAD were summarized, indicating a predominance of missense mutations (Fig. 3B). A heat map was created to depict the mutation frequencies of these genes, highlighting TELO2, LENG8, and MAP3K10 as the top 3 genes with the highest mutation rates (Fig. 3C). Furthermore, the analysis included the profiles of heterozygous and homozygous CNVs for the 21 genes in PRAD, with larger circles denoting higher frequencies (Fig. 3D-E). The study also investigated methylation variances between PRAD and normal samples, as well as the correlation between methylation and mRNA expression of these genes. Larger circles in the visualization indicated stronger correlations with methylation (Fig. 3F-G). Additionally, the association between these genes and chemotherapy drugs was explored using the GDSC database, revealing a high correlation between DAZAP1 and the chemotherapy drugs analyzed (Fig. 3H). Finally, the relationship between the expression of these genes and established pathways was assessed, showing that genes related to metastasis were closely linked to DNA damage and activation of androgen receptors (Fig. 3I). In conclusion, this comprehensive analysis provides insights into the potential roles of these genes in PRAD from various perspectives.

Fig. 3

Metastasis-related prognostic genes play an important role in PRAD. (A) Oncoplot provides the situation of the SNV of the top 10 mutated genes. (B) Figure summarizes the SNV classes of inputted gene set. (C) Figure provides the profile of SNV. (D) Figure provides the profile of heterozygous CNV. (E) Figure provides the profile of homozygous CNV. (F) Figure summarizes the methylation difference between tumor and normal samples. (G) Figure summarizes the profile of correlations between methylation and mRNA expression. (H) Figure summarizes the correlation between gene expression and the sensitivity of GDSC drugs. (I) Figure summarizes the percentage of cancers in which specific gene’s mRNA expression has potential effect on pathway activity

Consensus clustering analysis based on metastasis-related genes

Consensus clustering analysis was conducted based on the expression of metastasis-related genes in the TCGA-PRAD data set. The optimal number of clusters was determined by identifying the K value with the lowest ‘Proportion of Ambiguous Clusters’ (PAC), a common approach in consensus clustering. PAC measures the middle segment of the consensus distribution, defined as the fraction of sample pairs whose consensus index falls within the interval (u1, u2) ⊂ [0, 1], where u1 is close to 0 and u2 is close to 1 (for instance u1 = 0.2 and u2 = 0.8). Lower PAC values indicate a flatter middle segment and fewer inconsistent assignments in permuted clustering runs [24]. The cumulative distribution curve and the area under the curve revealed that the highest average consistency within the groups was achieved when K = 2. Furthermore, a clustering heat map for K = 2 was generated (Fig. 4A-D). When K = 2, the TCGA-PRAD samples were divided into two clusters: cluster 1 with 259 samples and cluster 2 with 239 samples. Significant differences in metastasis-related genes, except for RABL6, were observed between these two clusters (Fig. 4E). When PRAD metastasizes, the patient’s prognosis often deteriorates. We therefore compared overall survival and progression-free survival between the two clusters. Patients in cluster 1 consistently experienced a poorer prognosis in terms of both overall survival and progression-free survival (Fig. 4F-G). To investigate the potential mechanisms underlying this disparity in prognosis between the two clusters, we conducted GSEA. The results indicated that aging, oxidative stress, and other related factors could have a significant impact on the outcomes (Fig. 4H).
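The PAC criterion described above can be sketched in Python as the difference between the empirical cumulative distribution of the consensus indices evaluated at u2 and at u1. The sketch is illustrative only: the consensus values are invented, the function names are hypothetical, and the study itself used the ConsensusClusterPlus R package.

```python
def pac(consensus_values, u1=0.2, u2=0.8):
    """Proportion of ambiguous clustering: CDF(u2) - CDF(u1), where the CDF is
    the empirical distribution of pairwise consensus indices."""
    n = len(consensus_values)

    def cdf(u):
        return sum(1 for c in consensus_values if c <= u) / n

    return cdf(u2) - cdf(u1)


def best_k(consensus_by_k, u1=0.2, u2=0.8):
    """Return the number of clusters K with the lowest PAC."""
    return min(consensus_by_k, key=lambda k: pac(consensus_by_k[k], u1, u2))


# Invented pairwise consensus indices for two candidate values of K.
toy_consensus = {
    2: [0.0, 0.0, 0.05, 0.95, 1.0, 1.0],  # pairs are almost always together or apart
    3: [0.0, 0.3, 0.5, 0.6, 0.9, 1.0],    # many pairs are assigned inconsistently
}
chosen_k = best_k(toy_consensus)
```

A K whose consensus indices sit near 0 or 1 has few ambiguous pairs and therefore a low PAC, which is the behaviour used to justify K = 2 here.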

Fig. 4

PRAD patients were categorized into two clusters based on the expression of metastasis-related genes. (A) Cumulative distribution curve. (B) Area under the distribution curve. (C) Clustering heatmap. (D) Sample clustering consistency. (E) Expression of metastasis-related genes in two clusters. (F) Difference in overall survival between the two clusters. (G) Difference in progression-free survival between the two clusters. (H) Gene enrichment analysis of two clusters

Correlation of metastasis-related genes with immunotherapy and targeted drug therapy for PRAD

Utilizing the XCELL algorithm, our study examined the relationship between metastasis-related genes and immune cell infiltration in PRAD. Between the two clusters, notable differences were observed in T cell subsets (CD4+ naive, CD4+ non-regulatory, CD4+ central memory, CD4+ effector memory, CD8+ naive), myeloid dendritic cells, eosinophils, macrophages, and M2 macrophages as opposed to CD4+ Th2 cells (Fig. 5A-B). Subsequent analysis of patient responsiveness to immunotherapy revealed a significantly higher tumor immune dysfunction and exclusion (TIDE) score in cluster 1 compared to cluster 2, potentially contributing to the less favorable prognosis in cluster 1 (Fig. 5C). Furthermore, comparison of IC50 scores for commonly used clinical drugs between the clusters showed significant differences across the seven targeted drugs assessed (Fig. 5D). Lastly, the immune cell infiltration levels in both clusters, along with the percentage abundance of tumor-infiltrating immune cells for each sample, were visualized as heatmaps generated using the XCELL algorithm (Fig. 5E-F).

Fig. 5

Metastasis-related genes are associated with PRAD immunotherapy and targeted therapy. (A-B) Analysis of the infiltration level of each immune cell in PRAD samples based on the XCELL algorithm. (C) Analysis of responsiveness of two clusters treated with immunosuppressive drugs. (D) Analyzing differences in IC50 scores for different targeted drugs between two clusters. (E) Heat map of different immune cell infiltration levels. (F) Percent abundance of tumor-infiltrating immune cells per sample

Build PRAD diagnostic models

To develop diagnostic models for PRAD, five datasets were utilized: TCGA-PRAD for training and GSE6956, GSE16120, GSE14206, and GSE38241 for validation. Among the 10 different machine learning algorithms tested, the XGBoost algorithm demonstrated superior efficiency in constructing diagnostic models (Fig. 6A). The AUC value for the TCGA-PRAD training set was 0.960, while the AUC values for the GSE6956, GSE16120, GSE14206, and GSE38241 validation sets were 0.718, 0.661, 0.693, and 0.989, respectively. The diagnostic model developed by the XGBoost algorithm identified 18 genes: DAZAP1, FAM193B, GSDMB, GTPBP3, HMGB3, LIME1, LUC7L, MAP3K10, MC1R, POLD1, RABL6, SH3D21, SLC16A8, SLC26A6, SPPL2B, TELO2, TRMT1, and TRMU. Based on these findings, the XGBoost algorithm was exclusively employed to construct diagnostic models for the aforementioned datasets, yielding consistently satisfactory results (Fig. 6B-F). Whether analyzing the TCGA-PRAD training set or the GSE6956, GSE16120, GSE14206, and GSE38241 validation sets, all AUC values exceeded 0.8, underscoring the exceptional predictive capability of our model.

Fig. 6

The model developed by the XGBoost algorithm is considered the top PRAD diagnostic model. (A) Comparison of AUC values among diagnostic models created by various algorithm combinations. (B-F) ROC curves for training and validation sets

Build a metastasis prediction model

This study focused on the significance of metastasis-related genes and utilized ROC curves to pinpoint the top 5 genes linked to metastasis in the GSE8511 dataset (Fig. 7A-B). Subsequently, a prediction model for PRAD metastasis was constructed from the expression levels of these five genes using the LASSO algorithm. The final model retained 4 genes: FAM193B, POLD1, TELO2, and SLC26A6 (Fig. 7C-D). The formula for calculating the risk score is as follows: risk score = FAM193B * 0.98284 + POLD1 * 9.34847 + TELO2 * 4.34915 + SLC26A6 * 4.26241. ROC curve analysis indicated that the metastasis model exhibited robust predictive performance in PRAD patients (AUC = 1) (Fig. 7E). Decision curve and nomogram analysis further supported the model’s predictive accuracy (Fig. 7F-G). Subsequent assessment in the GSE32269 dataset revealed an AUC value of 0.828, underscoring the predictive capability of the model (Fig. 7H). To overcome the limitation of the small sample size in GSE27616, its samples were combined with other datasets for validation, which consistently demonstrated the effectiveness of the metastasis model (Fig. 7I-J). Validation in the TCGA-PRAD dataset was not possible because it contained too few metastatic samples. Nevertheless, an analysis of the correlation between risk score and patient prognosis in the TCGA-PRAD dataset revealed that patients classified as high-risk exhibited poorer overall and progression-free survival outcomes (Fig. 7K-L).
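The risk score formula above is a weighted sum of the four gene expression values, which the following Python sketch makes explicit. The coefficients are those reported in the text; the sample names and expression values are invented, and the median split into high- and low-risk groups is an assumption, since the cutoff used in the study is not stated here.

```python
import statistics

# LASSO coefficients as reported in the risk score formula.
COEFFICIENTS = {
    "FAM193B": 0.98284,
    "POLD1": 9.34847,
    "TELO2": 4.34915,
    "SLC26A6": 4.26241,
}


def risk_score(expression):
    """Weighted sum of the four gene expression values for one sample."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def split_by_median(scores):
    """Assumed grouping rule: samples above the median score are high risk."""
    cutoff = statistics.median(scores.values())
    return {sample: "high" if s > cutoff else "low" for sample, s in scores.items()}


# Invented expression values for two hypothetical samples.
toy_samples = {
    "sample_1": {"FAM193B": 1.0, "POLD1": 1.0, "TELO2": 1.0, "SLC26A6": 1.0},
    "sample_2": {"FAM193B": 0.0, "POLD1": 0.0, "TELO2": 0.0, "SLC26A6": 0.0},
}
toy_scores = {name: risk_score(expr) for name, expr in toy_samples.items()}
toy_groups = split_by_median(toy_scores)
```

Because POLD1 carries by far the largest coefficient, a unit change in its expression moves the score more than the same change in any of the other three genes.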

Fig. 7

Construction of a metastasis prediction model based on metastasis-related genes. (A-B) ROC curves on metastasis genes in the GSE8511 dataset. (C-D) Construction of metastasis prediction model based on LASSO algorithm. (E-G) The metastasis prediction model has excellent predictive value. (H-J) Validating the predictive value of metastasis prediction models. (K-L) Correlation of high and low risk groups with overall and progression-free survival in patients with PRAD

Correlation analysis of metastasis model with PRAD immune infiltration and sensitivity to targeted drugs

We first explored the correlation between the metastasis model and PRAD immune cell infiltration. The following differed between the high- and low-risk groups: B cell, T cell CD4+ memory, T cell CD4+ central memory, T cell CD8+ effector memory, Class-switched memory B cell, Common lymphoid progenitor, Endothelial cell, Granulocyte-monocyte progenitor, Mast cell, Neutrophil, B cell memory, T cell CD4+ Th2, stroma score and microenvironment score (Fig. 8A-B). Eight targeted drugs in common clinical use were included in the analysis, including drugs commonly used by prostate cancer patients, such as bicalutamide. Significant differences were observed between the high- and low-risk groups for these 8 drugs (Fig. 8C). Additionally, heat maps of immune cell infiltration levels and abundance in the high- and low-risk groups were generated based on the XCELL algorithm (Fig. 8D-E).

Fig. 8

Metastasis model significantly correlates with PRAD immune infiltration and targeted drug sensitivity. (A-B) The degree of infiltration of each immune cell in the high and low risk groups was analyzed according to the XCELL algorithm. (C) Analyzing differences in IC50 scores for different targeted drugs between high and low risk groups. (D) Heat map of different immune cell infiltration levels. (E) Percent abundance of tumor-infiltrating immune cells per sample

Identification of the best prognostic genes among metastasis-related genes

This study confirmed the significant roles of POLD1, SLC26A6, TELO2, and FAM193B in the diagnosis, metastasis, and prognosis of PRAD. To pinpoint the metastasis-related genes most crucial for prognosis, a further analysis was conducted. Examination of the GSE8511, GSE27616 and GSE29650 datasets revealed elevated expression levels of POLD1, SLC26A6, TELO2, and FAM193B in metastatic samples compared to non-metastatic samples (Fig. 9A-B). Correlation analysis in the TCGA-PRAD dataset indicated that POLD1, SLC26A6, and TELO2 expression was significantly higher in pathological T3 and T4 stages than in T2 stage, and in pathological N1 stage compared to N0 stage (Fig. 9C-D). Moreover, POLD1, SLC26A6, TELO2, and FAM193B expression was notably lower in clinical T1 and T2 stages than in T3 and T4 stages (Fig. 9E). Subsequent univariate and multivariate COX regression analysis highlighted POLD1 and SLC26A6 as potential prognostic indicators for PRAD patients, with POLD1 standing out (Fig. 9F-G). Recent studies have demonstrated that abnormal DNA methylation of various genes, including metastasis suppressor genes and genes responsible for maintaining cell differentiation, plays a significant role in the progression of tumor metastasis [25]. Using the SMART website, we investigated the relationship between POLD1 and DNA methylation in PCa. In the TCGA-PRAD dataset, we identified a total of 19 methylation probes associated with POLD1 (Fig. 9H). Of these 19 probes, the methylation levels of 13 were correlated with the expression of POLD1 (Fig. 9I). Subsequent prognostic analysis of these 13 POLD1-related methylation probes revealed that patients with high methylation at cg00450979 had a poor prognosis, while patients with low methylation at cg18099632, cg09480336, and cg25677697 had a worse prognosis (Fig. 9J-M). Additionally, immune cell infiltration levels and correlations with immune checkpoint-related genes were compared between the POLD1 high expression group and the POLD1 low expression group (Fig. 9N-O).

Fig. 9

POLD1 identified as the best prognostic gene among metastasis-related genes. (A) POLD1 expression in metastatic and non-metastatic groups of the GSE8511 dataset. (B) POLD1 expression in metastatic and non-metastatic groups of the GSE27616 dataset. (C) Expression of POLD1 in pathologically different T-stages of the TCGA-PRAD dataset. (D) Expression of POLD1 in pathologically different N-stages of the TCGA-PRAD dataset. (E) Expression of POLD1 in clinically different T-stages of the TCGA-PRAD dataset. (F-G) Prognostic value of univariate and multivariate COX regression analysis of POLD1. (H) Chromosomal distribution of the methylation probes associated with POLD1. (I) Pearman correlation analysis between methylation probes and POLD1 expression. (J-M) Prognostic Kaplan-Meier (KM) curves for methylation probes. (N-O) Analysis of POLD1 correlation with immune infiltration and immune checkpoints

Analysis of POLD1 expression in PCa cell lines and its correlation with androgen-targeting compounds

Jin et al. integrated 45 cancer cell lines from The Human Protein Atlas (HPA) with 973 cancer cell lines from The Cancer Cell Line Encyclopedia (CCLE), resulting in a total of 985 cancer cell lines for research purposes [26, 27]. The expression of POLD1 in prostate cancer cell lines was analyzed using tools provided by Jin et al., revealing that POLD1 was highly expressed in PC3 and DU145 cells, and least expressed in MDA-PCa-2b cells (Fig. 10A). Given that ADT is fundamental in treating metastatic prostate cancer, our objective was to identify the androgen-related compounds most strongly correlated with POLD1. Initially, in the TCGA-PRAD dataset, we identified the top 50 genes positively correlated with POLD1 and generated a co-expression heat map (Fig. 10B). CMaps can be used to unveil mechanisms of action of small-molecule drugs, functionally annotate genetic variations in disease genes, and guide clinical trials. Based on the top 50 positively correlated genes, we further analyzed compounds highly correlated with POLD1 in VCaP cells via the CMaps website. From drugs targeting androgen receptors, we identified the top 5 drugs positively correlated with POLD1 and the top 5 drugs negatively correlated, which were then used for subsequent analysis (Fig. 10C). Subsequently, we presented the 3D structures of the significantly positively correlated compound bicalutamide and the negatively correlated compound oxymetholone. Molecular docking with POLD1 was then conducted to assess their binding affinities. The docking results indicated that POLD1 exhibits a robust binding affinity towards both compounds (Fig. 10D-F).

Fig. 10

POLD1 has a high affinity for androgen receptor-targeting compounds. (A) POLD1 expression in PCa cell lines. (B) Heatmap of POLD1 co-expressed genes. (C) Androgen receptor-related compounds associated with POLD1. (D) Molecular docking of POLD1 with bicalutamide. (E) Molecular docking of POLD1 with oxymetholone. (F) Molecular structure of POLD1

Expression and prognostic value of POLD1 in PRAD

This study provides evidence supporting the significant role of POLD1 in prostate cancer. To validate these findings, 60 paired PCa samples and corresponding adjacent non-cancerous samples were subjected to immunohistochemical staining, with the aim of examining differences in POLD1 expression and their impact on the prognosis of PCa patients. POLD1 expression was notably higher in PRAD than in normal prostate tissue (Fig. 11A), and dot plots were used to depict this difference between PCa and normal tissue (Fig. 11B). Elevated POLD1 levels were also associated with poorer prognosis in PCa patients (Fig. 11C). Lastly, a Sankey plot illustrated the distribution of patients with different characteristic variables according to POLD1 expression (Fig. 11D).
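The survival comparison behind a Kaplan-Meier curve can be sketched as follows. This is a minimal product-limit estimator under stated assumptions: the follow-up times, event indicators and the high/low grouping are invented for illustration and are not the study's data, and the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented follow-up data (months) for two hypothetical expression groups.
high_times, high_events = [5, 8, 12, 12, 20], [1, 1, 1, 0, 1]
low_times, low_events = [10, 25, 30, 40, 40], [0, 1, 0, 1, 0]

km_high = kaplan_meier(high_times, high_events)
km_low = kaplan_meier(low_times, low_events)

for label, curve in (("high", km_high), ("low", km_low)):
    steps = ", ".join(f"t={t}: S={s:.3f}" for t, s in curve)
    print(f"{label} expression -> {steps}")
```

In this toy example the high-expression curve falls faster than the low-expression curve, which is the pattern a poorer prognosis would produce; a formal comparison would additionally require a log-rank test, which is not shown here.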

Fig. 11

POLD1 is highly expressed in PCa and is associated with poor patient prognosis. (A-B) Differential expression of POLD1 in PCa. (C) KM curve of overall survival of POLD1 in PCa. (D) Sankey diagram showing distribution of patients in different subgroups


Significant advances in early detection and novel treatment modalities continue to reduce PCa mortality. Despite this progress, PCa remains the most common malignancy among men globally [28]. Metastatic PCa accounts for around 400,000 deaths annually. While the 5-year survival rate for localized PCa is nearly 100%, it drops to only 30% for metastatic PCa [29]. Investigating markers associated with metastasis is therefore crucial for identifying individuals at high risk of developing lethal metastatic tumors and for offering this patient population improved treatment strategies.

In our study, we first identified PCa metastasis-related genes from the GSE8511 and GSE27616 datasets. Focusing on genes that are highly expressed in cancer and facilitate its progression, we cross-referenced these metastasis-related genes with the TCGA-PRAD dataset, which yielded 76 differential genes. Drawing on scRNA-seq data, we explored the functional heterogeneity of cancer cells. CancerSEA, a specialized database, offers a comprehensive view of the distinct functional states of cancer cells at the single-cell level [30]. Using the CancerSEA platform, we investigated the potential roles of these 76 genes in PCa. Our analysis revealed that the metastasis genes primarily contribute to angiogenesis and invasion in PCa, which supports our findings because angiogenesis and invasion are critical factors in tumor metastasis. Previous studies have shown that tumor-derived exosomes and their contents promote cancer metastasis; for example, the interaction between exosomal PGAM1 and ACTG1 enhances PCa metastasis by regulating angiogenesis [31]. Furthermore, Ephrin-A2, a member of the Eph receptor subgroup, has demonstrated value in diagnosing PCa and predicting its prognosis, and it facilitates PCa metastasis by promoting angiogenesis and EMT [32]. These findings collectively underscore the pivotal role of angiogenesis in PCa metastasis. Functional analysis also revealed a significant correlation between the metastasis-related genes and activation of the hormone AR pathway. It is widely acknowledged that the progression of PCa is influenced by androgens, and endocrine therapy targeting androgens is a crucial aspect of PCa treatment [33]. These observations are consistent with the relevance of the metastasis-related genes we screened.

Clustering classification is a method used to group similar samples together based on commonalities and differences in the data, while separating dissimilar samples into distinct clusters to reveal unique characteristics and interactions. This approach is commonly applied in various fields such as disease diagnosis, prognosis analysis, gene therapy, epidemiology, and medical image analysis. In our study, we categorized PCa samples into two clusters using cluster analysis, with cluster 1 showing a significantly poorer prognosis compared to cluster 2. Gene enrichment analysis revealed enrichment of pathways related to cellular senescence, oxidative stress, and other biological processes. Cellular senescence is commonly recognized as a tumor-suppressing mechanism; however, senescent cells also exhibit heightened invasiveness and lymphangiogenic capabilities attributed to the development of a senescence-associated secretory phenotype [34]. Recent studies report that cellular senescence is associated with the spatial evolution of colorectal cancer toward a more metastatic phenotype [35]. Oxidative stress plays a crucial role in promoting various aggressive behaviors in tumors. Recent studies have shown that oxidative stress can lead to redox modifications of the protein kinase A β subunit, and in some cases, it can drive tumor metastasis by facilitating the RNF25-mediated degradation of ECAD protein in hepatocellular carcinoma [36]. Immune infiltration plays a crucial role in tumor progression. Utilizing the XCELL algorithm, we identified distinct scores of immune cells in two clusters. Patients with elevated TIDE scores exhibited reduced effectiveness of immune checkpoint blockade therapy and shorter survival post-ICB treatment [37]. Notably, cluster 1 displayed markedly higher TIDE scores compared to cluster 2, correlating with the unfavorable prognosis observed in patients within cluster 1. 
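The idea of grouping similar samples and separating dissimilar ones can be illustrated with a minimal sketch. It uses plain k-means with two clusters as a stand-in, which is an assumption made for illustration and not necessarily the clustering procedure used in this study; the sample coordinates and the function name `two_means` are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def two_means(points, n_iter=20):
    """Split points into two clusters with plain k-means (k = 2).

    The start is deterministic: the first point and the point farthest from it.
    """
    first = points[0]
    farthest = max(points, key=lambda p: squared_distance(p, first))
    centroids = [list(first), list(farthest)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        labels = [
            0 if squared_distance(p, centroids[0]) <= squared_distance(p, centroids[1]) else 1
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for cluster in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == cluster]
            if members:
                centroids[cluster] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented two-feature profiles for six hypothetical samples.
samples = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels, centroids = two_means(samples)
print(labels)
```

Once each sample carries a cluster label, downstream comparisons such as survival or immune scores per cluster can be run on the two groups separately.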

Machine learning methods offer a convenient approach for identifying characteristic genes, and we used them to screen metastasis-related genes for PCa diagnosis. The diagnostic model derived from these genes demonstrated excellent predictive capability. Likewise, the metastasis model constructed with the LASSO algorithm showed strong predictive value for metastasis and a significant association with immune infiltration in PCa. COX regression analyses and our immunohistochemistry experiments highlighted POLD1 as a key prognostic gene. DNA methylation is an epigenetic modification crucial for regulating individual growth, development, gene expression patterns, and genome stability without altering the DNA sequence. This modification is significant in development and can be stably passed on during cell proliferation. Numerous recent studies have highlighted the close association between abnormal DNA methylation and the onset and progression of tumors and cellular transformation [38, 39]. Methylation analysis platforms such as MethSurv and SMART offer a convenient way to explore the relationship between genes and DNA methylation [20, 40]. Through the SMART website, we identified 4 methylation probes associated with POLD1. These probes are not only linked to POLD1; their levels also correlate with the prognosis of patients with PCa.

The CCLE and HPA websites offer researchers a convenient platform for investigating gene expression in tumor cell lines, and numerous studies have successfully used these databases for analysis and validation [41, 42]. Using these resources, we identified variations in POLD1 expression among PCa cell lines, which will be valuable for future cell experiments. Additionally, CMap is a valuable tool for identifying functional relationships between small-molecule compounds, genes, and disease states [43]. Our investigation revealed that POLD1 exhibits the strongest correlation with bicalutamide and oxymetholone. Notably, bicalutamide is a commonly used targeted drug for clinical metastatic PCa, further supporting the potential of POLD1 as a drug target. Furthermore, our immunohistochemistry analysis confirmed high expression of POLD1 in PCa, with patients exhibiting high expression levels showing a poor prognosis.
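The feature-selection behaviour that makes LASSO useful for building a compact gene model can be shown with a minimal sketch. It solves a linear-regression LASSO by coordinate descent, which is a simplification chosen for illustration and not the authors' actual modelling pipeline; the toy design matrix, the outcome values and the penalty of 0.1 are all invented.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + penalty * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            scale = 0.0  # mean squared value of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                scale += X[i][j] ** 2
            rho /= n
            scale /= n
            beta[j] = soft_threshold(rho, penalty) / scale if scale > 0 else 0.0
    return beta


# Invented design: three hypothetical gene features over four samples.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Outcome driven mainly by the first feature, weakly by the second.
y = [2.05, 1.95, -1.95, -2.05]

beta = lasso_coordinate_descent(X, y, penalty=0.1)
print(beta)
```

The penalty shrinks the dominant coefficient slightly and sets the weak and irrelevant coefficients exactly to zero, which is how LASSO reduces a candidate gene list to a small signature.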

Our research is enhanced by the utilization of bioinformatics. Transcriptome data allows us to uncover gene expression patterns in various tissues, developmental stages, or environmental conditions, aiding in our comprehension of gene regulatory networks and biological processes’ regulatory mechanisms. Nonetheless, bioinformatics-driven research does come with limitations, as the quality of genomic data utilized directly impacts the accuracy and reliability of results. Therefore, we strive to incorporate multiple data sets in our studies to validate our findings, ensuring result accuracy. Nevertheless, additional experiments are required to further validate our conclusions.


Various machine learning methods were employed in our study to pinpoint key genes linked to metastasis in PCa. Our research findings support the correlation between these genes and the treatment outcomes of patients with PCa, including responses to immunotherapy, chemotherapy, and overall prognosis. Notably, our study identifies POLD1 as a significant gene, with our experiments confirming its crucial role in predicting tumour invasion and prognosis in PCa patients. Furthermore, our investigation underscores the potential of targeting POLD1 for the development of novel drugs for PCa.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.



Prostate cancer


Weighted gene co-expression network analysis


Androgen receptor


Androgen deprivation therapy


Metastatic castration-resistant prostate cancer


Tumor microenvironment


Elastic network


Gradient boosting machines


The least absolute shrinkage and selection operator


Linear discriminant analysis


Random forest


Support vector machine


Extreme gradient boosting


Tumor-associated macrophages


Kaplan-Meier


The human protein atlas


Cancer cell line encyclopedia


Genomics of drug sensitivity in cancer


Connectivity maps


Tumor immune dysfunction and exclusion


  1. Elagawany M, Abdel Ghany LMA, Ibrahim TS, Alharbi AS, Abdel-Aziz MS, El-Labbad EM, Ryad N. Development of certain benzylidene coumarin derivatives as anti-prostate cancer agents targeting EGFR and PI3Kβ kinases. J Enzyme Inhib Med Chem. 2024;39(1):2311157.

  2. Li S, Kang Y, Zeng Y. Targeting tumor and bone microenvironment: novel therapeutic opportunities for castration-resistant prostate cancer patients with bone metastasis. Biochim Biophys Acta Rev Cancer. 2024;1879(1):189033.

  3. Arriaga JM, Ronaldson-Bouchard K, Picech F, Nunes de Almeida F, Afari S, Chhouri H, Vunjak-Novakovic G, Abate-Shen C. In vivo genome-wide CRISPR screening identifies CITED2 as a driver of prostate cancer bone metastasis. Oncogene 2024 Mar 7.

  4. Pan J, Tong F, Ren N, Ren L, Yang Y, Gao F, Xu Q. Role of N6–methyladenosine in the pathogenesis, diagnosis and treatment of prostate cancer (review). Oncol Rep. 2024;51(6):88.

  5. Kim J, Park S, Kim S, Ryu S, Hwang H, Cho S, Han Y, Kim J, Park Y, Lee EK, Lee M. Enhancing the anticancer effect of androgen deprivation therapy by monocarboxylate transporter 1 inhibitor in prostate cancer cells. Prostate. 2024;84(9):814–22.

  6. Li W, Wang Z. Ubiquitination process mediates prostate Cancer Development and Metastasis through multiple mechanisms. Cell Biochem Biophys. 2024;82(1):77–90.

  7. Messex JK, Liou GY. Impact of Immune cells in the Tumor Microenvironment of prostate Cancer metastasis. Life (Basel). 2023;13(2):333.

  8. Xiong Z, Yu SL, Xie ZX, Zhuang RL, Peng SR, Wang Q, Gao Z, Li BH, Xie JJ, Huang H, Li KW. Cancer-associated fibroblasts promote enzalutamide resistance and PD-L1 expression in prostate cancer through CCL5-CCR5 paracrine axis. iScience. 2024;27(5):109674.

  9. Chen ML, Yuan TT, Chuang CF, Huang YT, Chung IC, Huang WC. A novel Enolase-1 antibody targets multiple interacting players in the Tumor Microenvironment of Advanced prostate Cancer. Mol Cancer Ther. 2022;21(8):1337–47.

  10. Wang Y, Jin P, Wang X. N6-methyladenosine regulator YTHDF1 represses the CD8 + T cell-mediated antitumor immunity and ferroptosis in prostate cancer via m6A/PD-L1 manner. Apoptosis. 2024;29(1–2):142–53.

  11. Teng LKH, Pereira BA, Keerthikumar S, Huang C, Niranjan B, Lee SN, Richards M, Schittenhelm RB, Furic L, Goode DL, Lawrence MG, Taylor RA, Ellem SJ, Risbridger GP, Lister NL. Mast cell-derived SAMD14 is a Novel Regulator of the human prostate Tumor Microenvironment. Cancers (Basel). 2021;13(6):1237.

  12. Ren S, Li J, Dorado J, Sierra A, González-Díaz H, Duardo A, Shen B. From molecular mechanisms of prostate cancer to translational applications: based on multi-omics fusion analysis and intelligent medicine. Health Inf Sci Syst. 2023;12(1):6.

  13. Yang Y, Zhou L, Luo J, Xue J, Liu J, Zhang J, Wang Z, Gong P, Chen T. Prediction analysis of TBI 24-h survival outcome based on machine learning. Heliyon. 2024;10(9):e30198.

  14. Chen Y, Zhou X, Ji L, Zhao J, Xian H, Xu Y, Wang Z, Ge W. Construction and analysis of a joint diagnostic model of machine learning for cryptorchidism based on single-cell sequencing. Birth Defects Res. 2024;116(3):e2316.

  15. Martelin N, De Witt B, Chen B, Eschwège P. Development and validation of an imageless machine-learning algorithm for the initial screening of prostate cancer. Prostate. 2024;84(9):842–9.

  16. Altıntaş E, Şahin A, Babayev H, Gül M, Batur AF, Kaynar M, Kılıç Ö, Göktaş S. Machine learning algorithm predicts urethral stricture following transurethral prostate resection. World J Urol. 2024;42(1):324.

  17. Yin W, Chen G, Li Y, Li R, Jia Z, Zhong C, Wang S, Mao X, Cai Z, Deng J, Zhong W, Pan B, Lu J. Identification of a 9-gene signature to enhance biochemical recurrence prediction in primary prostate cancer: a benchmarking study using ten machine learning methods and twelve patient cohorts. Cancer Lett. 2024;588:216739.

  18. Sturm G, Finotello F, Petitprez F, Zhang JD, Baumbach J, Fridman WH, List M, Aneichyk T. Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology. Bioinformatics. 2019;35(14):i436–45.

  19. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, Hoang CD, Diehn M, Alizadeh AA. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–7.

  20. Li Y, Ge D, Lu C. The SMART app: an interactive web application for comprehensive DNA methylation analysis and visualization. Epigenetics Chromatin. 2019;12(1):71.

  21. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–35.

  22. Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, Gould J, Davis JF, Tubelli AA, Asiedu JK, Lahr DL, Hirschman JE, Liu Z, Donahue M, Julian B, Khan M, Wadden D, Smith IC, Lam D, Liberzon A, Toder C, Bagul M, Orzechowski M, Enache OM, Piccioni F, Johnson SA, Lyons NJ, Berger AH, Shamji AF, Brooks AN, Vrcic A, Flynn C, Rosains J, Takeda DY, Hu R, Davison D, Lamb J, Ardlie K, Hogstrom L, Greenside P, Gray NS, Clemons PA, Silver S, Wu X, Zhao WN, Read-Button W, Wu X, Haggarty SJ, Ronco LV, Boehm JS, Schreiber SL, Doench JG, Bittker JA, Root DE, Wong B, Golub TR. A Next Generation Connectivity Map: L1000 platform and the first 1,000,000 profiles. Cell. 2017;171(6):1437–1452.e17.

  23. Liu Y, Yang X, Gan J, Chen S, Xiao ZX, Cao Y. CB-Dock2: improved protein-ligand blind docking by integrating cavity detection, docking and homologous template fitting. Nucleic Acids Res. 2022;50(W1):W159–64.

  24. Șenbabaoğlu Y, Michailidis G, Li JZ. Critical limitations of consensus clustering in class discovery. Sci Rep. 2014;4:6207.

  25. Habeshian TS, Cannavale KL, Slezak JM, Shu YH, Chien GW, Chen X, Shi F, Siegmund KD, Van Den Eeden SK, Huang J, Chao CR. DNA methylation markers for risk of metastasis in a cohort of men with localized prostate cancer. Epigenetics. 2024;19(1):2308920.

  26. Jin H, Zhang C, Zwahlen M, von Feilitzen K, Karlsson M, Shi M, Yuan M, Song X, Li X, Yang H, Turkez H, Fagerberg L, Uhlén M, Mardinoglu A. Systematic transcriptional analysis of human cell lines for gene expression landscape and tumor representation. Nat Commun. 2023;14(1):5417.

  27. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehár J, Kryukov GV, Sonkin D, Reddy A, Liu M, Murray L, Berger MF, Monahan JE, Morais P, Meltzer J, Korejwa A, Jané-Valbuena J, Mapa FA, Thibault J, Bric-Furlong E, Raman P, Shipway A, Engels IH, Cheng J, Yu GK, Yu J, Aspesi P Jr, de Silva M, Jagtap K, Jones MD, Wang L, Hatton C, Palescandolo E, Gupta S, Mahan S, Sougnez C, Onofrio RC, Liefeld T, MacConaill L, Winckler W, Reich M, Li N, Mesirov JP, Gabriel SB, Getz G, Ardlie K, Chan V, Myer VE, Weber BL, Porter J, Warmuth M, Finan P, Harris JL, Meyerson M, Golub TR, Morrissey MP, Sellers WR, Schlegel R, Garraway LA. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7.

  28. Booijink R, Terstappen LWMM, Dathathri E, Isebia K, Kraan J, Martens J, Bansal R. Identification of functional and diverse circulating cancer-associated fibroblasts in metastatic castration-naïve prostate cancer patients. Mol Oncol. 2024 Apr;17.

  29. Schostak M, Bradbury A, Briganti A, Gonzalez D, Gomella L, Mateo J, Penault-Llorca F, Stenzinger A, Wyatt AW, Bjartell A. Practical Guidance on establishing a molecular testing pathway for alterations in homologous recombination repair genes in clinical practice for patients with metastatic prostate Cancer. Eur Urol Oncol. 2024;7(3):344–54.

  30. Yuan H, Yan M, Zhang G, Liu W, Deng C, Liao G, Xu L, Luo T, Yan H, Long Z, Shi A, Zhao T, Xiao Y, Li X. CancerSEA: a cancer single-cell state atlas. Nucleic Acids Res. 2019;47(D1):D900–8.

  31. Luo JQ, Yang TW, Wu J, Lai HH, Zou LB, Chen WB, Zhou XM, Lv DJ, Cen SR, Long ZN, Mao YY, Zheng PX, Su XH, Xian ZY, Shu FP, Mao XM. Exosomal PGAM1 promotes prostate cancer angiogenesis and metastasis by interacting with ACTG1. Cell Death Dis. 2023;14(8):502.

  32. Zhao Y, Cai C, Zhang M, Shi L, Wang J, Zhang H, Ma P, Li S. Ephrin-A2 promotes prostate cancer metastasis by enhancing angiogenesis and promoting EMT. J Cancer Res Clin Oncol. 2021;147(7):2013–23.

  33. Gourdin T. Highlighting recent progress in the treatment of men with advanced prostate cancer. Curr Opin Oncol. 2024;36(3):174–9.

  34. Banerjee P, Gaddam N, Pandita TK, Chakraborty S. Cellular Senescence as a Brake or Accelerator for Oncogenic Transformation and Role in Lymphatic Metastasis. Int J Mol Sci. 2023;24(3):2877.

  35. Park SS, Lee YK, Choi YW, Lim SB, Park SH, Kim HK, Shin JS, Kim YH, Lee DH, Kim JH, Park TJ. Cellular senescence is associated with the spatial evolution toward a higher metastatic phenotype in colorectal cancer. Cell Rep. 2024;43(3):113912.

  36. Huang Z, Zhou L, Duan J, Qin S, Jiang J, Chen H, Wang K, Liu R, Yuan M, Tang X, Nice EC, Wei Y, Zhang W, Huang C. Oxidative stress promotes Liver Cancer Metastasis via RNF25-Mediated E-Cadherin protein degradation. Adv Sci (Weinh). 2024;11(13):e2306929.

  37. Yidan S, Yang X, Ren S, Lu Z, Liu Z, Kong F, Wang Z. Stratification of risk based on immune signatures and prediction of the efficacy of immune checkpoint inhibitors in prostate cancer. J Men’s Health. 2023;19(11):16–33.

  38. Anuraga G, Wang WJ, Phan NN, An Ton NT, Ta HDK, Berenice Prayugo F, Minh Xuan DT, Ku SC, Wu YF, Andriani V, Athoillah M, Lee KH, Wang CY. Potential prognostic biomarkers of NIMA (never in mitosis, gene A)-Related kinase (NEK) family members in breast Cancer. J Pers Med. 2021;11(11):1089.

  39. Xing C, Wang Z, Zhu Y, Zhang C, Liu M, Hu X, Chen W, Du Y. Integrate analysis of the promote function of cell division cycle-associated protein family to pancreatic adenocarcinoma. Int J Med Sci. 2021;18(3):672–84.

  40. Modhukur V, Iljasenko T, Metsalu T, Lokk K, Laisk-Podar T, Vilo J. MethSurv: a web tool to perform multivariable survival analysis using DNA methylation data. Epigenomics. 2018;10(3):277–88.

  41. Kao TJ, Wu CC, Phan NN, Liu YH, Ta HDK, Anuraga G, Wu YF, Lee KH, Chuang JY, Wang CY. Prognoses and genomic analyses of proteasome 26S subunit, ATPase (PSMC) family genes in clinical breast cancer. Aging. 2021;13(14):17970.

  42. Lazaro-Camp VJ, Salari K, Meng X, Yang S. SETDB1 in cancer: overexpression and its therapeutic implications. Am J Cancer Res. 2021;11(5):1803–27.

  43. Wang CY, Chiao CC, Phan NN, Li CY, Sun ZD, Jiang JZ, Hung JH, Chen YL, Yen MC, Weng TY, Chen WC, Hsu HP, Lai MD. Gene signatures and potential therapeutic targets of amino acid metabolism in estrogen receptor-positive breast cancer. Am J Cancer Res. 2020;10(1):95–113.



This study was supported by the Natural Science Foundation of Heilongjiang Province (LH2019H030) and National Natural Science Foundation of China (82002680).

Author information

Authors and Affiliations



Y W, B J and L Z drafted the manuscript, J W conducted preliminary investigations. J H conducted the preliminary data collection. M R and B D were responsible for reviewing it. Y W, B J and L Z contributed equally to this manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to BeiChen Ding or MingHua Ren.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee of Shanghai Outdo Biotech Company. All patients provided written informed consent before their surgery.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wang, Y., Ji, B., Zhang, L. et al. Identification of metastasis-related genes for predicting prostate cancer diagnosis, metastasis and immunotherapy drug candidates using machine learning approaches. Biol Direct 19, 50 (2024).
