CRGI was optimized from 10 mainstream algorithms
10 Cuproptosis-related genes were curated from the work of Tsvetkov et al1. Combined with 6 well-recognized driver mutation genes (i.e., KRAS, TP53, SMAD4, BRCA1, BRCA2, and CDKN2A) in PAAD, they were subjected to mainstream machine learning procedures to develop a Cuproptosis-related gene index (CRGI)40.
In the TCGA dataset, among 10 mainstream machine learning algorithms, we optimized the best model through LASSO penalized Cox regression that had a leading AUC value in 1, 2, 3, and 4-year overall survival (OS) predictive performance, up to 0.736, 0.703, 0.708, 0.812, respectively (Figure 2 C). The formula for the risk score calculation is:
0.5316*KRAS + 0.014*TP53 – 0.0407*CDKN2A – 0.0999*SMAD4 + 0.3768*BRCA1 + 0.0866*BRCA2 – 0.1292*LIAS – 0.587*LIPT1 – 0.3158*DLD + 0.4833*DLAT – 0.3627*PDHA1 – 0.3253*MTF1 – 0.1286*GLS
Following the calculation of the risk score, patients were separated into high- and low-risk groups, and apparently, there was a higher mortality rate within the high-risk group relative to that of the low-risk group (Figure 2 A). The survival analysis further revealed the fact that the low-risk group possessed a significant survival advantage (Figure 2 B). The difference in expression of the CRGI genes between high- and low-risk groups was visualized in the manner of a heatmap as shown at the bottom of Figure 2 A.
Interestingly, although the predictive performances of the other machine learning algorithms in the first 4 years were poor, 7 algorithms (i.e., Decision Tree, Gaussian, K-NN, Logistic Regression, Random Forest, SVM, and XGBoost) possessed an AUC value = 1.000 in 5-year OS prediction (Figure 2 D-L). Taking it all together, we decided to choose LASSO as the final predictive model for the following comprehensive analytics.
Figure 2 Predictive performance comparison of 10 mainstream machine learning algorithms. (A) Distribution of the risk score and survival status in high- and low-risk groups, and the heatmap that depicts the expression of CRGI genes between high- and low-risk groups. (B) Survival analysis of high- and low-risk groups. (C) The time-dependent ROC curve with the AUC value of 1-, 2-, 3-, 4-, and 5-year OS prediction of the best model optimized by LASSO penalized Cox regression. (D)-(L) The predictive accuracy of other machine learning algorithms (i.e., Decision Tree, Gaussian Mixture Model, GBDT, K-NN, LGBM, Logistic Regression, Random Forest, SVM, and XGboost, respectively).
The predictive performance of CRGI was superior to that of the signatures derived from other cell death mechanisms
Recently, with the progressions in the in-depth understanding of cell death mechanisms, a considerable number of prognosis-predictive gene signatures have been proposed. To clarify whether CRGI behaves better than those signatures originating from other underlying mechanisms, we retrieved signatures that were derived from autophagy, ferroptosis, and pyroptosis gene set41,42,43.
We performed time-dependent receiver operative characteristic (ROC) curves across the TCGA (Figure 2 C, Figure 3 A-C) and ICGC datasets (Figure 3 D-G) for each signature and observed that our model was significantly associated with prognosis in both cohorts and possessed the best AUC values and decision curves, demonstrating the stability and accuracy of CRGI.
Usually, it was considered rigorous enough to assess different models by comparing the AUC values of the ROC curve. However, as ROC analysis merely accounts for the specificity and sensitivity of the model, in the field of medicine, in case of unavoidable false positives and false negatives, one should maximize the clinical benefits from either result as possible. Therefore, we complementarily employed decision curve analysis (DCA) for each signature in the TCGA and ICGC datasets, the result of which again supported that our model had the most advanced predictive power (Figure 3 H-I).
Figure 3 Comparison of other cell death mechanisms-based prognostic signatures in PAAD. (A)-(C) The ROC curve of autophagy-, ferroptosis-, and pyroptosis-based models in the TCGA dataset, respectively. (D)-(G) The ROC curve of our model, and the autophagy-, ferroptosis-, and pyroptosis-based ones in the ICGC dataset, respectively. (H) DCA of our model, and the autophagy-, ferroptosis-, and pyroptosis-based ones in the TCGA dataset, respectively. (I) DCA of our model, and the autophagy-, ferroptosis-, and pyroptosis-based ones in the ICGC dataset, respectively.
CRGI served as an independent and reliable indicator in PAAD prognostic prediction
Based on our model, we extract the risk score, 13 CRGI genes (i.e., KRAS, TP53, SMAD4, BRCA1, BRCA2, CDKN2A, DLAT, LIPT1, LIAS, DLD, PDHA1, MTF1, GLS), age, gender, pathological status, TNM stages, histological grades, radiotherapy, smoking, and drinking information from the TCGA dataset, and carried out univariate Cox regression to examine if they are statistically correlated with prognosis and multivariate Cox regression to qualify their eligibility as independent prognostic indicators. It turned out that the risk score, KRAS, SMAD4, BRCA2, LIAS, age, pathological T stage, pathological N stage, and radiotherapy were associated with prognosis as a result of the univariate Cox regression, while results of multivariate Cox regression furtherly indicated that the risk score, LIAS, and smoking were independent prognostic indicators (Figure 4 A-B).
According to the mentioned findings, we constructed a conventional nomogram and its calibration curve (Figure 4 C-D) which suggested that our model accurately predicted the prognosis of PAAD with a C-index value up to 0.795 (95% Cl: 0.743-0.846, P-value = 2.78e-29).
To make it more user-friendly, the underlying statistics were implemented in a web-based OS calculator which assisted the clinicians to estimate the OS probability by entering the clinicopathological parameters, and the survival time of interest for prediction (Figure 4 E). The calculator is available at https://debmed.shinyapps.io/CRGIProgCal/. The clinical data behind the calculator is available in the supplementary material T1_Rare data for the web-based OS calculator.
We also explored the correlation between the risk score and different clinical factors (e.g., age, gender, tumor grade, etc.). It appeared that the risk score was only associated with tumor grade, between G1 and G3, G2 and G3 (S1_Correlation between the risk score and clinical factors).
Figure 4 Nomograms with clinicopathological characteristics to predict OS in PAAD. (A)-(B) Forest plots of the results of the univariate and multivariate Cox regression, respectively. (C)-(D) Conventional nomogram and its calibration curve based on our model. (E) The screenshot of the web-based OS calculator that was developed from the underlying statistics and calculating a fictional case.
Over-representative analysis revealed the functional importance of CRGI in PAAD
Over-representative analysis was conducted to unravel the functional mechanisms underlying high- and low-risk groups through the ssGSEA. The result of correlation analysis of the risk score and ssGSEA score implied that they were statistically and positively associated, with a P-value = 8.15e−03 and Spearman coefficient = 0.2 (Figure 5 A). Moreover, different cancerous hallmarks were found enriched in high- and low-risk groups with statistical significance. Consequently, 34 pathways were identified within the high-risk group. The most enriched pathways were upregulated and mostly related to cell proliferation, including G2M CHECKPOINT, E2F_TARGETS, etc. (Figure 5 B). Within the low-risk group, 4 pathways were downregulated, mainly related to digestive functions, such as BILE_ACID_METABOLISM (Figure 5 C).
We also performed GO terms and KEGG enrichment analysis for the CRGI genes. Subsequently, GO terms were found most relevant to the energy production within the mitochondria (Figure 5 D). In particular, KEGG pathways were found directly connected to pancreatic cancer (Figure 5 E).
Figure 5 Over-representative analysis of CRGI in PAAD. (A) Correlation analysis of the risk score and ssGSEA score. (B) The top 10 most enriched and upregulated cancerous hallmarks in the high-risk group. (C) The 4 downregulated cancerous hallmarks in the low-risk group. (D) The enriched GO terms of CRGI genes. (E) The enriched KEGG pathways of CRGI genes.
CRGI was associated with the tumor microenvironment condition, cancer immunity cycle, and immunotherapy efficacy
It has been widely believed that cancers are essentially considered as dynamic ecosystems wherein subclone populations of most cancer cells and non-malignant cells in the tumor microenvironment engage cooperatively to promote the disease progression. Therefore, it is necessary to investigate the tumor microenvironment condition. Herein, we utilized the R package “ESTIMATE” to elucidate it in a quantitative way, through which we found that except for the stromal score, the immune score and ESTIMATE score were found statistically significant and that higher immune score, as well as higher ESTIMATE score, were observed in normal tissues (Figure 6 A-C). Then, we analyzed the correlation between the risk scores and stromal, immune, and ESTIMATE scores, respectively. It was found that the risk score was negatively associated with the immune score with statistical significance (Figure 6 D-F). We also analyzed the mentioned scores in the high- and low-risk groups, and it was observed that there was a difference in immune score and ESTIMATE score with statistical significance where the lower-risk group possessed a relatively higher immune score and ESTIMATE score (Figure 6 G). These findings supported the idea that CRGI as a classifier of risk group played an important role in the distinguishment of the tumor microenvironment condition in PAAD. On the other hand, we exhaustively screened the immune cell types in the tumor infiltration process in different risk groups by integrating 7 mainstream immunoinformatic algorithms, including TIMER, CIBERSORT, CIBERSORT−ABS, QUANTISEQ, MCPCOUNTER, XCELL, and EPIC. It was found that the immune cell types involved in this process were very much diverse (Figure 6 H).
Immune checkpoints are negatively regulatory proteins in the immune system that are indispensable for the maintenance of self-tolerance, the prevention of autoimmune responses, and the minimalization of tissue damage. They function by controlling the timing and intensity of immune response. In immunotherapy, the overexpression of immune checkpoints inhibits the function of immune cells so that the body cannot produce an effective anti-tumorous immune response, which ultimately leads to immune escape. Therefore, to fully evaluate the potential impact on immunotherapy efficacy caused by CRGI, we also analyzed the difference in the expression of the representative genes of the soundest immune checkpoints in high- and low-risk groups, including PD-1 (i.e., protein of PDCD1 gene), PD-L1 (i.e., protein of CD274 gene), PD-L2 (i.e., protein of PDCD1LG2 gene), CTLA4, HAVCR2, LAG3, TIGIT, and SIGLEC15. It was found that there are statistical significances in the expression of PD-1, CTLA4, TIGIT, and LAG3 (Figure 6 I). It was observed that there were relatively lower expression levels in the low-risk group, which indicated a betterment in immunotherapy efficacy. In addition, after integrating the data curated from the TCGA cohort, by utilizing the TIDE algorithm (i.e., a scoring system that inversely reflects the immunotherapy efficacy), we predicted the potential response of PAAD patients to immune checkpoint blockade. To be more exact, we first carried out a correlation analysis in which we found that the TIDE score and the risk score showed a significantly positive correlation with a P-value = 0.01 and Spearman coefficient = 0.19 (Figure 6 J). Then, we stepped forward, finding that in high- and low-risk groups, there was also a distinct difference in which the low-risk group demonstrated a prominently lower TIDE score than that of the high-risk group, implying that there was an advancement in immunotherapy efficacy in the low-risk group (Figure 6 K). On the other hand, tumor mutation burden (TMB) and Microsatellite instability (MSI) as other reliable references for the prediction of immunotherapy efficacy are also worth being studied. Therefore, we extensively analyzed their relationship with the risk score. It was found that there was statistical significance regarding TMB, but not for MSI (Figure 6 L, S2_Correlation with MSI).
As Cuproptosis is a cell death mechanism that may raise immune reactions, it is of great interest to investigate the underlying mechanism in the cancer immunity cycle44 (Figure 6 M). Correlation analysis between the risk score and cancer–immunity cycle as well as known biological functions in pancreatic cancer showed that the risk score presented a significantly positive correlation to the processes of cancer antigen presentation, recognition of cancer cells by T cells, killing of cancer cells in cancer immunity cycle, and tumor proliferation signature, cellular response to hypoxia, EMT markers, apoptosis, DNA repair, DNA replication, G2M checkpoint, PI3K AKT mTOR pathway, MYC targets, P53 pathway, TGFB, collagen formation, and ferroptosis in known biological functions (Figure 6 N). Similar differences with statistical significance existed between high- and low-risk groups, indicative of the remarkable interactions of CRGI with tumor immunology (Figure 6 O).
Figure 6 In-depth analytics on the relationship between CRGI and the tumor microenvironment condition, immune cell infiltration, immunotherapy efficacy, TMB, as well as the cancer immunity cycle. (A)-(C) Comparison of the stromal, immune, and ESTIMATE scores of tumorous and normal tissues in the TCGA dataset. (D)-(F) Correlation analysis between the risk score and the stromal, immune, and ESTIMATE scores. (G) The violent plot demonstrated the comparison of the stromal, immune, and ESTIMATE scores in high- and low-risk groups. (H) The heatmap demonstrated the diverse immune cell types in the infiltration process. (I) The violent plot demonstrated the comparison of the expression of the representative genes of the soundest immune checkpoints in high- and low-risk groups. (J) Correlation analysis between the risk score and TIDE score. (K) The violent plot demonstrated the comparison of the TIDE scores in high- and low-risk groups. (L) Correlation analysis between the risk score and TMB score. (M) Graphical demonstration of the cancer immunity cycle. (N) Correlation analysis between the risk score and the main steps of the cancer immunity cycle as well as known biological processes. (O) The boxplot demonstrated the comparison of marker scores in high- and low-risk groups.
CRGI-based risk groups possessed various chemosensitivity
Chemotherapy has been the centerpiece in the treatment of cancer over the past few decades, yet due to the heterogeneous characteristics of tumors, even the responses to the same chemotherapeutic may vary from one patient to another45. To address this problem, genome-based methodologies must be introduced. For this purpose, we evaluated the chemosensitivity of PAAD patients from the TCGA dataset who were classified into high- and low-risk groups previously in the present study to 32 commonly used anti-cancerous drugs (S3_Comparison of the efficacy of 32 anti-cancerous drugs in high- and low-risk groups). Subsequently, the IC50 value of 5 drugs (i.e., Lenalidomide, Metformin, Temsirolimus, Axitinib, and Camptothecin) was found relatively higher in the high-risk group (Figure 7 A), while that of the other 11 drugs (i.e., Paclitaxel, Lapatinib, Dasatinib, Bleomycin, Docetaxel, Doxorubicin, Bexarotene, Gefitinib, Bosutinib, and Bortezomib) was found relatively higher in the low-risk group (Figure 7 B).
Figure 7 Comparison of the efficacy of 32 chemotherapeutics in (A) high-risk group, and (B) low-risk group.
CRGI-based molecular subtypes were characterized by different survival outcomes and immunotherapy efficacy
Since the above analytics has revealed that high- and low-risk groups possessed distinct OS probability and immunotherapy efficacy, it raised our interest in systemically dividing it into more precise molecular subtypes through an unsupervised consensus method (i.e., K-mean algorithm). It was found that when K = 4, the PAAD samples were separated into 4 clusters in the consensus diagram (Figure 8 A). Meanwhile, when K = 4, CDF almost reached its maximum which indicated good stability (Figure 8 B). Besides, it was observed that CDF changed only slightly when K+/-1 (Figure 8 C). Therefore, K = 4 was deemed to be an ideal value in the present study. To ensure the robustness of the clustering, we also conducted a principal component analysis (PCA), through which we could see that the samples were indeed well separated (Figure 8 D). Therefore, ultimately, 4 molecular subtypes were identified, annotated by C1 (N = 37), C2 (N = 90), C3 (N = 23), and C4 (N = 29).
Furthermore, we examined the expression profiles of the 13 CRGI genes in each molecular subtype and found that generally, the expression levels were arranged in such order: C1 > C4 > C2 > C3 (Figure 8E).
We then investigated the clinical outcomes in these molecular subtypes. Results of the survival analysis suggested that C2 had the best prognosis, followed by C4, C3, and C1 (Figure 8 F).
Finally, we inspected the TIDE score and the expression of the representative genes of the soundest immune checkpoints in different molecular subtypes. For the TIDE score, there were differences with statistical significance between C1 and C3, and between C2 and C3, where C3 had a relatively lower TIDE score compared with that of C1 and C2 (Figure 8 G). For the expression of the representative genes, except SIGLEC15, the rest were all found statistically significant and exhibited prominent differences in each molecular subtype (Figure 8 H). In short, the 4 molecular subtypes demonstrated diverse immunogenic features, and that may lead to various efficacy in immunotherapy.
Figure 8 Characterization of 4 CRGI-based molecular subtypes with different survival outcomes and immunotherapy efficacy. (A) The consensus matrix (K= 5), where the columns and rows represented the TCGA samples involved which were clustered into squares in white to deep blue colors. (B) The consensus CDF plot (K = 1, 2, 3, 4, 5, 6). (C) The Delta area plot (K = 1, 2, 3, 4, 5, 6). (D) The PCA diagram where TCGA samples were represented in the manner of scattering spots. The circle bordered the area within which all the samples belonged to the same molecular subtype. (E) The boxplot demonstrated the expression profiles of the 13 CRGI genes in each molecular subtype. (F) Survival analysis demonstrated the clinical outcomes of different molecular subtypes. (G) The boxplot demonstrated the comparison of the TIDE scores in different molecular subtypes. (H) The boxplot demonstrated the expression profiles of the representative genes of the soundest immune checkpoints in different molecular subtypes.
CRGI-based molecular subtypes also possessed various chemosensitivity
Like what we have done to the high- and low-risk groups, we carried out the elucidation of the chemosensitivity of PAAD patients from the TCGA dataset who were classified into 4 molecular subtypes according to the CRGI. As a result, we found that 27 out of 32 commonly used anti-cancerous drugs exerted statistically significant differences in the 4 molecular subtypes (Figure 9). Notably, Axitinib was probably the most distinct one as it differed between every two molecular subtypes with extremely high statistical significance.
Figure 9 Comparison of the efficacy of 32 chemotherapeutics in 4 CRGI-based molecular subtypes.