Differential Gene Expression Analysis and Functional Enrichment Analysis of CIC-Related Genes
The expression levels of 101 CIC-related genes were explored in normal pancreas and PC samples using GTEx and TCGA datasets. PCA showed that the distribution difference between normal and tumor samples (Figure 1A). A total of 49 DEGs were identified, including 42 upregulated genes and 7 downregulated genes, and visualized by the heatmap (Figure 1B) and volcano plot (Figure 1C). GO analysis suggested that these DEGs were mainly involved in reactive oxygen species metabolic process, receptor-mediated endocytosis, cell leading edge, endocytic vesicle, tubulin binding and cytokine activity (Supplementary Figure 1A). Moreover, KEGG pathway analysis indicated that apoptosis, phagosome, regulation of actin cytoskeleton, ferroptosis, transcriptional misregulation in cancer and focal adhesion were enriched (Supplementary Figure 1B).
Construction of the CIC-Related Prognostic Signature in PC
To reduce the number of genes needed for constructing the prognostic signature, we firstly utilized random forest screening to assign an importance factor to each CIC-related DEGs. And then, top 20 genes, sorted by importance, were selected into further analysis (Figure 1D). Meanwhile, LASSO regression analysis was performed on 49 DEGs, and 18 candidate genes were retained by the most proper value of lambda (λ) (Figure 1E). Subsequently, we combined 10 overlapping candidate genes between these two methods to establish the best regression model by stepwise multivariate Cox regression analysis (Figure 1F). Finally, four significantly CIC-related genes contributing to OS in PC patients were confirmed (Figure 1G), and the risk score of each patient was calculated using the following formula: Risk score = (0.362× expression level of KRT7) + (0.302 × expression level of AURKA) + (−0.114× expression level of CDKN2A) + (−0.377 × expression level of RARB). The correlation analysis among these four genes was shown in the circle plot (Figure 1H). Moreover, the related networks and functions were predicted using geneMANIA website (Supplementary Figure 1C). We found that related functions mainly involved in the regulation of cell cycle, mitosis-related process and kinase regulator activity.
Evaluation and Validation of the CIC-Related Prognostic Signature
Based on the median of risk scores, patients in TCGA cohort were separated into the low- and high-risk groups. The scatterplots showed that, as the patient’s risk score increased, the number of deaths increased and the survival time decreased. Compared with the low-risk group, the expression levels of KRT7 and AURKA in the high-risk group were upregulated, while the expression levels of CDKN2A and RARB were downregulated (Figure 2A). The Kaplan-Meier survival analysis indicated that patients in the high-risk group were significantly associated with shorter OS than those in the low-risk group (Figure 2B). The PCA revealed that patients with different risk levels were distributed into two clusters (Figure 2D). The area under curves (AUC) were 0.760, 0.766 and 0.807 in the 1-year, 2-year, and 3-year ROC curves, respectively (Figure 2C). We also found that, compared with other clinicopathological characteristics, the AUC value of risk score was much higher, suggesting that it was a better prognostic indicator for PC patients (Figure 2E). To demonstrate the robustness of the prognostic signature, the predictive efficiency was evaluated in three independent validation cohorts, including ICGC, GSE21501 and GSE62452. The patients from these three validation cohorts were stratified into the low- and the high-risk groups based on the median of risk scores calculated by using the same formula as in TCGA modeling cohort. Consistent with the results of TCGA cohort, patients in the high-risk group demonstrated the worse prognosis than those in the low-risk group. Similarly, the expression levels of KRT7 and AURKA in the high-risk group were increased, while the expression levels of CDKN2A and RARB were decreased (Figure 2F and 2G; Supplementary Figure 2A, 2B, 2F and 2G). The PCA confirmed that patients in different subgroups could be divided into two separate directions (Figure 2I; Supplementary Figure 2D and 2I). Moreover, the AUC value of time-dependent ROC curves analysis reached around 0.700, indicating that the prognostic signature performed well in three validation cohorts (Figure 2H; Supplementary Figure 2C and 2H), and the risk score had better predictive accuracy compared with other clinicopathological characteristics (Figure 2J; Supplementary Figure 2E and 2J).
Independent Prognostic Value of the CIC-Related Prognostic Signature and Establishment of the Predictive Nomogram Model
The univariate and multivariate Cox regression analyses were performed to evaluate the prognostic power of the signature. Based on the multivariate Cox regression analysis, the risk score was confirmed to be an independent prognostic factor for OS prediction in all four cohorts (Supplementary Table 2). In order to further improve predictive efficiency, the risk score and other clinical characteristics such as age, gender, grade, and stage were together used to establish the predictive nomogram in TCGA and ICGC cohorts. The C-index for the nomogram was 0.704 (95%CI 0.645−0.763) in TCGA cohort and 0.725 (95%CI 0.644−0.805) in ICGC cohort, indicating that the two nomograms both had well predictive performance (Figure 3A and 3B). Subsequently, time-dependent ROC curves, the calibration curves and the decision curve analysis (DCA) were applied to further evaluate the effectiveness of established nomograms. The AUCs of ROC curves for predicting 1-, 2-, and 3-year survival were 0.716, 0.785 and 0.810 in TCGA cohort (Figure 3C), 0.762, 0.787 and 0.890 in ICGC cohort (Figure 3D). In addition, the calibration curves presented satisfied coherence between observed and predicted 1-year, 2-year and 3-year OS in both cohorts (Figure 3E and 3F). DCA was performed to further assess the predictive efficacy between the risk score and nomogram, and the results demonstrated that the nomogram achieved the highest net benefits, suggesting that it was an efficient model to predict the prognosis of PC patients (Figure 3G and 3H).
Functional Enrichment Analyses and Somatic Mutation Profiles of the CIC-Related Prognostic Signature
To further explore the biological functions and pathways associated with the established prognostic signature, we firstly analyzed the DEGs between the high-risk and low-risk groups (Figure 4A and 4B). A total of 212 DEGs were identified in TCGA cohort, including 189 upregulated genes and 23 downregulated genes. Moreover, in ICGC cohort, 162 DEGs were identified, including 52 upregulated genes and 110 downregulated genes. Notably, KRT7 was one of the top 10 upregulated genes sorting by adjust P value in both TCGA and ICGC cohorts (Supplementary Figure 3A and 3B). The GO analysis showed that the DEGs were enriched in several cell differentiation and tumor metastasis-related processes, such as epidermal cell differentiation, extracellular matrix organization, ameboidal-type cell migration, intermediate filament cytoskeleton and cell-cell junction (Figure 4C and 4D). And the KEGG pathway analysis demonstrated that the DEGs were mainly enriched in several pathways associated with tumor invasiveness and metastasis, such as PI3K−Akt signaling pathway, Wnt signaling pathway, Hippo signaling pathway, Focal adhesion, ECM−receptor interaction and Regulation of actin cytoskeleton (Figure 4E and 4F). To clarify whether the risk score was associated with the mutational landscapes of PC patients, we compared the somatic mutation profiles between the high-risk and low-risk groups in TCGA cohort (Figure 5A−5D). Notably, the mutation frequency in the high-risk group was 96.25%, while 78.21% in the low-risk group, indicating that the mutation frequency increased with the risk score elevation. Moreover, KRAS and TP53 were the top two genes with the highest mutation frequencies in both subgroups. We also found that more co-occurrence and mutually exclusive mutations were observed in the high-risk group when compared with the low-risk group (Figure 5E). In addition, patients with higher risk scores demonstrated higher TMB levels (P < 0.001; Figure 5F). We further conducted the same analyses in ICGC cohort and similar results were verified (Supplementary Figure 4).
Analysis of Immune Features between Two Groups
PC harbored a highly heterogeneous TME. To further investigate whether the differences in prognosis between two risk groups were associated with immune cell infiltration, we firstly utilized ESTIMATE and CIBERSORT algorithms to explore the immune infiltration levels in TCGA and ICGC cohorts. We found that the high-risk group was characterized with lower estimate score, stromal score and immune score, while higher tumor purity (Figure 6A; Supplementary Figure 5A). The composition and correlation of tumor-infiltrating immune cells showed that CIC-related prognostic signature was positively correlated with T cells regulatory (Tregs) and Macrophages M0, and negatively correlated with T cells CD4 memory resting, natural killer (NK) cells activated, Monocytes, Mast cells activated (Figure 6B and 6C) and Macrophages M2 (Supplementary Figure 5B and 5C). Patients in the high-risk group exhibited significantly higher infiltrating levels of B cells memory, Macrophages M0 (P < 0.01; Figure 6D) and Mast cells activated (P < 0.05; Supplementary Figure 5D), while lower infiltrating levels of B cells naive, Dendritic cells resting, Monocytes (P < 0.05; Figure 6D) and T cells CD8 (P < 0.01; Supplementary Figure 5D). Subsequently, the classification of immune subtypes showed that four subtypes and five subtypes were identified in TCGA cohort and ICGC cohort, respectively (Figure 6E; Supplementary Figure 5E). Patients in the high-risk group had more C1 (wound healing) and C2 (IFN-γ dominant) subtypes, and less C3 (inflammatory) subtype, suggesting the unfavorable prognosis. Emerging evidence has shown that immune features of TME are associated with the immune checkpoint blockade (ICB) therapeutic responses. The expression levels of some representative immune checkpoints including CD274 (PD-L1), CD276 (B7-H3), CTLA4 (CTLA-4), HAVCR2 (TIM3), LAG3 (LAG-3), PDCD1 (PD-1), TIGIT (VSIG9) and VTCN1 (B7-H4) were compared between two risk groups. We found that CD276 (Figure 6F), CD274 and VTCN1 (Supplementary Figure 5F) were upregulated in the high-risk group, while TIGIT was downregulated (Figure 6F). Besides, the results of ICB responses prediction demonstrated that the response rates were higher in the high-risk group, and responders had higher risk scores (Figure 6G; Supplementary Figure 5G). Although the rate of response and the risk score between two groups were not statistically significant in TCGA cohort, an increased tendency was observed in the high-risk group and responders, respectively. These findings indicated that patients with higher risk scores might be in an immunosuppressive state.
KRT7 is correlated with the unfavorable prognosis and CIC formation
As shown in the results of multivariate Cox regression and correlation analysis, KRT7 was the most important risk gene in the established CIC-related gene signature for significantly predicting prognosis of PC patients and positively correlated with other three genes expression (Figure 1G and 1H). We investigated the protein and mRNA expression levels of KRT7 by HPA database, GTEx and TCGA datasets. We found that the expression of KRT7 in PC tissue was significantly higher than normal pancreas tissue and the expression level increased with the risk score elevation (Figure 7A and 7B). Moreover, the survival analysis showed that, based on the median of KRT7 expression level, patients with KRT7 high expression had shorter survival than those with low expression (Figure 7C). Given the extensive degree of intra-tumoral heterogeneity in PC, we further examined the expression of KRT7 among diverse cell types in TME at single-cell level. The results demonstrated that KRT7 was mainly expressed by malignant ductal cells and expression levels varied among different subpopulations of malignant cells (Figure 7D). We also analyzed the gene expression features and correlations with prognosis of AURKA, CDKN2A and RARB, but none of them was as significant as KRT7 (Supplementary Figure 6). The related networks and functions of KRT7 mainly involved in cytoskeleton remodeling, cell differentiation and protein localization-related functions (Figure 7E). To clarify whether KRT7 is associated with CIC formation, we compared the expression levels of it among different PC cell lines in three independent datasets (Figure 7F). Two cell lines with relatively higher and lower expression of KRT7 at both mRNA and protein levels were selected respectively for CIC formation assay (Figure 7G). We found that BxPC-3 and CFPAC-1 cell lines with relatively higher expression levels of KRT7 showed higher frequencies of CIC formation than PANC-1 and MIA PaCa-2 with relatively lower expression levels of KRT7 (Figure 7H). Then, we performed IHC staining of KRT7 in 24 PDAC samples from the PUMCH cohort to further validate that high KRT7 expression was associated with unfavorable prognosis in PC (Table 2; Figure 8A). According to the median of IHC score, we next divided patients into low expression and high expression subgroups. Combined with KRT7 IHC scores and clinicopathological characteristics, patients from high expression group showed higher proportions of male, IIB-IV stage, lymphatic metastasis and death (Figure 8B). Although the results of survival analysis were not statistically significant (P = 0.1026), KRT7 as a risk factor (hazard ratio = 2.901), high expression group had poorer prognosis (Figure 8C).