Identification of candidate cuproptosis-related genes
In the present study, the analysis processes were implemented according to the workflow shown in Fig. 1. In total, 146 CRGs were included for further analyses. Compared with tumor tissues and adjacent controls, 114, 53, and 98 cuproptosis-related genes were aberrantly expressed in the GSE107943, GSE76297 and TCGA-CCA cohorts, respectively. Among these targets, 47 significant CRGs overlapped in a consistent direction in all datasets (Fig. 2a, Supplementary Table 1). As shown in Fig. 2b, a univariate Cox regression analysis was conducted using 255 CCA patients from the OEP001105 cohort to inspect prognostic biomarkers. A total of 50 prognostic genes were detected to be significantly associated with OS (Supplementary Table 2). Among the cuproptosis-related DEGs and prognostic markers, 16 overlapping genes, 8 upregulated and 8 downregulated, were annotated as candidate CRGs for CCA (Fig. 2b). To gain better insight into the molecular mechanisms involved in CCA, we constructed a protein‒protein network and investigated the orchestration of the candidate CRGs for CCA (Fig. 2c). In addition, we evaluated the correlations among the candidate genes’ transcriptome expression profiles using a correlation matrix (Fig. 2d). Interestingly, these candidate CRGs exhibited strong correlations with each other, which indicated a close interaction and exerted their effects by producing costimulatory or inhibitory signals in biological processes.
Construction of the prognostic prediction model
To assess the joint effect of the prognostic significance of the 16 candidate CRGs on CCA patient survival, we constructed a multigene prognostic model by performing LASSO penalized regression analysis in the OEP001105 cohort. Consequently, LASSO modeling identified a ten-DEG-based signature with the minimum optional lambda value (Fig. 3a, b), which included ADAM Metallopeptidase Domain 9 (ADAM9), ADAM Metallopeptidase Domain 17 (ADAM17), albumin (ALB), aquaporin 1 (AQP1), cyclin-dependent kinase 1 (CDK1), metallothionein 2A (MT2A), peptidylglycine alpha-amidating monooxygenase (PAM), superoxide dismutase 3 (SOD3), STEAP3 metalloreductase (STEAP3) and transmembrane serine protease 6 (TMPRSS6). The risk score for CCA was calculated as follows: (6.83×106 × expression level of ADAM9) + (1.21×105 × expression level of ADAM17) + (-4.27 × 107 × expression level of ALB) + (-1.43 × 105 × expression level of AQP1) + (0.0898670 × expression level of CDK1) + (1.76 × 105 × expression level of MT2A) + (1.6696174 × expression level of PAM) + (0.7596718 × expression level of SOD3) + (0.1891796 × expression level of STEAP3) + (0.1891796 × expression level of TMPRSS6).
All CCA patients were divided into high-risk and low-risk groups for the discovery (high-risk: n = 127, low-risk: n = 128) and validation datasets (GSE107943: high-risk: n = 15, low-risk: n = 15; TCGA-CCA: high-risk: n = 18, low-risk: n = 18) according to corresponding median values of the risk score. The expression patterns of the ten-CRG signature in the high- and low-risk groups of the OEP001105 dataset were visualized (Fig. 3c-e). K-M curves were conducted to compare survival conditions in the two risk groups. The K-M analysis indicated that the low-risk group had a better OS rate than the high-risk group in the training dataset (Fig. 3f, P < 0.0001). In addition, in line with the K-M analysis in the training dataset, the high-risk group was associated with a worse OS rate in the validation datasets (Fig. 3g, GSE107943: P = 0.0092; Fig. 3h, TCAG-CCA: P = 0.04). ROC curves were applied to assess the precision of the ten-CRG prognostic model in predicting patient survival. Remarkably, the ROC curve analysis in the discovery cohort showed that the areas under the ROC curve (AUC) values at 1, 2 and 3 years were up to 0.755, 0.76 and 0.743, respectively (Fig. 3i). In the validation dataset, the AUC values at 1, 2 and 3 years were 0.912, 0.786 and 0.767 for the GSE107943 dataset and 0.838, 0.845 and 0.794 for the TCGA-CCA dataset, respectively (Fig. 3j, k). Therefore, these results indicated that the CRG signature could predict CCA patients with high sensitivity and specificity.
Assessment of prognostic factors in CCA
To better understand the relevance and underlying mechanisms of the novel prognostic model in CCA, we applied univariate and multivariate Cox regression analyses to evaluate whether the ten-gene model was an independent prognostic factor of OS for cholangiocarcinoma patients. Associations between the clinicopathological features and the risk score of the ten-CRG scoring system were assessed in the OEP001105 cohort. The distribution of the value of the risk score, patient survival status, and survival time of CCA patients is visualized in Fig. 4a and 4b. To analyze the clinical characteristics of CCA, tumor size diameter as well as the expression levels of CA19-9 and CEA were significantly higher in the high-risk group than in the low-risk group (P = 0.015, P = 0.009, P = 0.007, Fig. 4c). Furthermore, compared with the early stage and advanced stage of low-risk CCA patients, the risk score was dramatically upregulated in the high-risk group (P < 0.0001, Fig. 4d). Additionally, to test whether the developed ten-CRG signature risk score was independent of other clinicopathological features of CCA, we performed univariate and multivariate Cox regression analyses for risk score, involving fluke infection, tumor size diameter, CA19-9, CEA, TNM stage, sex and age (Fig. 4e, f). Both univariate and multivariate Cox regression revealed that tumor size diameter and risk score were independent prognostic predictors for CCA.
The risk model predicted the infiltration of immune cells in CCA
To recognize the indicative role of prognostic signals in the tumor microenvironment (TME), we implemented CIBERSORT to evaluate the infiltration of 22 immune-related cells in CCA in the OEP001105 cohort (Fig. 5a). Then, the differential distribution of the detected immune-related cells between the high-risk and low-risk groups was examined by the Wilcoxon rank-sum test. Compared with those in the low-risk group, the proportions of activated memory CD4 T cells, follicular helper T cells, monocytes, M0 macrophages, activated mast cells and neutrophils were significantly increased in the high-risk group (P = 3.48 × 10–03, P = 6.80 × 10–04, P = 4.02 × 10–02, P = 8.30 × 10–04, P = 3.80 × 10–07, P = 1.42 × 10–03, respectively). Naive B cells, CD8 T cells, resting memory CD4 T cells, activated NK cells, M1 macrophages, and resting dendritic cells were significantly highly infiltrated in the low-risk group (P = 5.40 × 10–04, P = 2.41 × 10–02, P = 1.90 × 10–06, P = 2.10 × 10–06, P = 2.96 × 10–03, P = 8.27 × 10–03, respectively). In addition, as shown in Fig. 5b, based on the EPIC analysis, CAF scores were significantly higher and endothelial scores were decreased in the high-risk group compared with the low-risk group (P = 0.01, P = 0.003).
Potential therapeutic targeting and tertiary lymphoid structure analyses in CCA
To identify the drug sensitivity differences of chemotherapeutic drugs and novel inhibitors for CCA patients between the high-risk and low-risk groups, we investigated the expression patterns of hub genes for chemokines, immune checkpoints, and chemotherapeutic regimens in the field of CCA treatment. A total of 17, 8 and 7 hub targets for chemokines, immune checkpoints, and chemotherapeutic regimens were acquired from previous studies to evaluate the therapeutic effects between the two groups 32,33. Among the 17 chemokine-specific markers, the expression levels of 5 genes (CCL17, CCL19, CCL21, CCL7, CCL8) were significantly decreased and those of 2 markers (CCL13, CCL24) were significantly increased in the high-risk group compared with the low-risk group (Fig. 5c). Moreover, immune checkpoint and chemotherapeutic regimen analyses showed that HAVCR2, IDO1, LAG3, PD-L1, CDK1, CLSPN, PARP1 and RAD51 were significantly higher in the high-risk group than in the low-risk group (Fig. 5d, e). Subsequently, to infer the potential functions of the hub genes, we performed correlation analysis between the immune-related infiltrating cells and the hub genes for chemokines, immune checkpoints, and chemotherapeutic regimens (Fig. 5f).
In addition, the mRNA expression matrix was used for estimation with OncoPredict (v. 0.2) 30. Bonferroni correction was used to define significant drugs (i.e., P = 0.05/198 = 2.5 × 10− 3), where 198 is the total number of drugs evaluated for the software. As a result, AGI-6780, AZD1208, AZD8055, cediranib, ibrutinib, ML323, OSI-027, PAK-5399, PCI-34051, PD325901, PFI3, picolinici-acid, rapamycin, SCH772984, selumetinib and Wnt-C59 were lower in the low-risk group than in the high-risk group, with P values ranging from 1.80 × 10− 4 to 1.90 × 10− 8, illuminating that the low-risk group was more prone to sensitivity to these drugs than the high-risk group (Fig. 6a-p).
Validation of the risk score at the protein expression level
To further replicate the sensitivity and accuracy of our prediction model, we performed a similar analysis with the target proteins. A total of 8,320 proteins were detected in the OEP001105 cohort, and eight of the ten-CRG signatures were found in this dataset (ADAM9, ADAM17, ALB, AQP1, CDK1, MT2A, PAM, SOD3 and STEAP3). The scoring system of CCA was conducted to assess the prognosis of CCA in this cohort dependent on these eight gene protein expression levels. Of note, based on the K-M analysis results, the survival probability was significantly lower in the high-risk group than in the low-risk group (Fig. 7a). The AUCs at 1, 2 and 3 years of survival were 0.713, 0.698 and 0.723, respectively (Fig. 7b). In addition, Pearson correlation between the biomarker of mRNA and protein expression levels was performed to test the relationship. Among the eight genes, the mRNA and protein expression levels were strongly correlated with each other (ADAM9: r = 0.77, P < 2.2 × 10− 16; ADAM17: r = 0.32, P = 4.0 × 10− 06; AQP1: r = 0.79, P < 2.2 × 10− 16; CDK1: r = 0.80, P < 2.2 × 10− 16; MT2A: r = 0.32, P = 4.9 × 10− 06; PAM: r = 0.62, P < 2.2 × 10− 16; SOD3: r = 0.53, P = 1.1 × 10− 15; STEAP3: r = 0.75, P < 2.2 × 10− 16; Fig. 7c). Furthermore, we performed differential expression protein analysis between the high-risk and low-risk groups via the limma package. A total of 131 significantly upregulated and 107 downregulated proteins were identified for subsequent analysis (Fig. 7d). ClusterProfiler was used to reveal the underlying differences in functions between the low-risk and high-risk groups based on KEGG pathways. Fourteen CCA-related pathways, such as the PI3K-Akt signaling pathway, chemical carcinogenesis DNA adducts and bile secretion, were significantly associated with the risk score after FDR correction (Fig. 7e). These results indicated that our ten-CRG indicator playing an effect on CCA development is closely related to the immune response and hepatobiliary functions.
Expression level of CRG-related genes in CCA
To explore the expression patterns of CRG-related genes in different types of infiltrating immune cells, a total of 47,261 cells from 7 CCA patients were classified into T cells, B cells, hepatocytes, cholangiocytes, TAMs, CAFs and TECs according to 22 markers (Fig. 8a, b). Our evidence revealed that MT2A, ADAM9, PAM and ADAM17 pervasively existed in all kinds of cells, and SOD3, AQP1 and ALB were highly expressed in CAFs, TECs and hepatocytes, respectively (Fig. 8c-l).
Validation of the prediction model including gene expression patterns
According to the qRT‒PCR results, compared with the normal cell line, the majority of the target genes were aberrantly expressed in at least one CCA cell line, except PAM and STEAP (P < 0.05, Fig. 9a-k). The expression levels of AQP1, MT2A and SOD3 were significantly upregulated in the three CCA cell lines. ADAM9 and ADAM17 were evaluated in RBE and HCCC9810 cell lines, and CDK1 was upregulated in HuCCT1 and HCCC9810 cells. In addition, TMPRSS6 and ALB were significantly decreased in the HuCCT1 and RBE cell lines and the HCCC9810 cell line, respectively. However, inconsistent results for TMPRSS6 and ALB were presented in the remaining cell lines.