Immune scores and stromal scores are associated with pathologic stages and overall survival of pancreatic cancer patients
We downloaded gene expression profiles and clinical information of all 177 pancreatic cancer patients with initial pathologic diagnosis in TCGA cohort. All pancreatic cancer cases with complete gene expression data and clinical information in the TCGA were included in our analysis. There clinicopathologic information was listed in Supplementary Table S1. According to the clinical information recorded in the TCGA database, we divided all pancreatic cancer cases into Stage I (21/177, 11.9%), Stage II (146/177, 82.5%) and Stage III and IV (7/177, 4.0%) subgroups based on the overall stage of cancer. Their immune and stromal scores were obtained from the ESTIMATE website. The numerical distribution of the two scores was shown in Fig. 1B. In accordance with ESTIMATE algorithm, immune scores were distributed between − 1,559.87 to 3037.78 and stromal scores ranged from − 1,843.32 to 2,179.19, respectively. As shown in Fig. 1C, the pancreatic cancer cases at Stage II subgroup had the highest average immune score (p < 0.01), followed by Stage III and IV, while the Stage I samples had the lowest immune score. Similarly, the rank order of stromal scores across pancreatic cancer subgroups from highest to lowest was Stage III and IV > Stage II > Stage I (Fig. 1D), indicating that both immune and stromal scores are meaningful in the correlation of subgroup classification.
To find out the potential correlation of overall survival with immune scores and/or stromal scores, we divided the 177 pancreatic cancer cases into high vs. low score groups (using zero as the cut-off point) and performed survival analysis using clinical follow-up data for each set of samples. Kaplan-Meier survival curves showed that 5-year survival rate of cases with low immune scores was longer (Fig. 1E, p = 0.024 in log-rank test). Consistently, cases with lower stromal scores also showed longer 5-year overall survival compared to patients with higher stromal scores (Fig. 1F, p = 0.37 in log-rank test), although it was not statistically significant.
Comparison of gene expression profile with immune scores and stromal scores in pancreatic cancer
To reveal the correlation of global gene expression profiles with immune scores and/or stromal scores, we compared the RNAseq - HTSeq - FPKM data of all 177 pancreatic cancer cases obtained in TCGA database. Heatmaps in Fig. 2A and 2B showed distinct gene expression profiles of cases belong to high vs. low immune or stromal score groups. According to immune scores, a total of 2,224 differentially expressed genes (DEGs) were screened with 1922 upregulated genes and 302 downregulated genes in high vs. low score groups (fold change > 2, p < 0.05). When it comes to stromal scores, 1982 upregulated genes and 160 downregulated genes were screened in high vs. low score groups (fold change > 2, p < 0.05). Moreover, Venn diagrams (Fig. 2C, D) showed that 1611 genes were commonly upregulated in the high score groups while 104 genes were commonly downregulated. There is a certain degree of overlap between the DEGs in the high- and low-scoring groups of the immune score and the stromal score, especially in the up-regulation group. Among the differentially up-regulated genes, 1611 genes were shared between immune and stromal score groups, accounting for 70.3% of the genes. While the 104 common down-regulated genes accounted for 29.1%. It is worth mentioning that the DEGs extracted from the comparison of immune score groups covered the majority of genes when it comes to comparison based on stromal scores. Thus, we decided to focus on DEGs of immune score groups for all subsequent analyses in this manuscript.
Then, to outline the potential function of the DEGs between high- and low-immune score groups, we performed functional enrichment analysis of all 2224 genes upregulated in high-immune scores group using KOBAS 3.0. The DEGs we obtained in the GO annotation and relevant pathways were enriched in the KEGG database. As shown in Fig. 2E, F and G, numerous genes were associated with tumor microenvironment or immune function.
Correlation of expression of individual DEGs in overall survival
To explore the potential roles of individual DEGs in 5-year overall survival, we generated Kaplan-Meier survival curves for the 2,224 DEGs, of which 246 genes in total were identified to predict poor overall survival significantly (log-rank p < 0.05, selected genes are shown in Supplementary Figure S2).
Protein-protein interactions among genes of prognostic value
To better understand the interplay among the identified DEGs, we obtained protein-protein interaction (PPI) networks using the STRING tool. By limiting the genes with interactions to be only included in these 246 DEGs, we established a PPI network that are significantly associated with 5-year overall survival using Cytoscape software. As shown in Fig. 3A, the network was made up of 93 nodes and 158 edges.
Functional enrichment analysis for DEGs of prognostic value
To further clarify the main biological functions of the 93 screened DEGs, we performed functional enrichment clustering analysis of the 93 gene nodes using the gene set enrichment analysis (GSEA) method (p < 0.05, FDR < 0.25). As shown in Fig. 3B and 3C, these genes showed strong association with biologically significant processes such as immune response, cell and subcellular composition and movement of cell or subcellular component, which is consistent with the result of PPI network analysis.
Module mining of PPI networks
To perform the module mining on the previously mentioned PPI networks, we obtained a total of 7 modules using the MCODE tool in Cytoscape. It is worth mentioning that one module contained more than 10 nodes (Fig. 4). CXCL5, which have been reported to recruit and activate leukocytes and play a role in cancer progression, was also indicated to be one of the most key genes in the PPI network. The genes in the module, including CXCL9, CXCL10, CXCL11, PPY, LPAR3, HCAR3 and so on, were mainly associated with the prognosis of pancreatic cancer or other cancers as listed in Table 1.
Validation in the ICGC database
To validate the prognostic significance of the 93 prognosis-related genes identified from the TCGA cohort, we downloaded and analyzed gene expression data and its corresponding clinical information of a cohort of 95 pancreatic cancer cases from ICGC. It showed that this set of data contained 79 of 93 prognostically significant genes and their prognostic values in pancreatic cancer were visualized by Kaplan-Meier survival curves. A total of 26 genes were found to be significantly associated with poor prognosis (Supplementary Figure S2), while five of them, including SYT12, GJB5, RHOJ, GJB3 and IFI27, were of particular interest as they have not been previously associated with poor outcomes in pancreatic cancer cases.
Degree and betweenness analysis with prognosis-related DEGs and multi-regulatory network construction
By using RAID 2.0 and TRRUST v2 database respectively, we investigate the interaction between human ncRNAs along with TFs and the 93 prognosis-related genes. We obtained 8437 out of 1954911 pairs of ncRNAs with genes and 179 out of 8427 pairs of TFs with genes. These interaction pairs were imported into Cytoscape software to build a multi-regulatory network with 2228 nodes and 8616 edges, including 2021 miRNA nodes, 3 lncRNA nodes, 89 TF (transcription factor) nodes and 91 mRNA nodes. In order to obtain more accurate and significant results, we used the degree and betweenness analysis method to further screen factors including ncRNAs, TFs and drugs that have significant regulatory effects on prognosis-related DEGs. Here, a pivotal node was defined as having interactions with at least two prognosis-related genes and the hypergeometric p value was less than 0.05.
ncRNAs regulate genes associated with significant prognosis
To explore ncRNAs that can significantly regulate prognosis-related mRNAs, we screened the RAID 2.0 database for pivotal ncRNA nodes. By using degree and betweenness analysis, we obtained a total of 42 ncRNA-pivotal nodes, including 39 miRNA nodes and 3 lncRNA nodes. The 10 most hypergeometrically significant ncRNAs, including WSPAR, has-miR-4425, has-miR-4536-5p, has-miR-410-5p, has-miR-8064, has-miR-6871-5p, has-miR-4767, has-miR-608, has-miR-4481 and has-miR-3605-3p, were listed in Supplementary Table S3.
Transcription factors (TFs) regulate genes associated with significant prognosis
To explore TFs that can significantly regulate prognosis-related genes, we screened the TRRUST v2 database for pivotal TF nodes. Accordingly, we put nodes that interact with at least two prognosis-related genes (hypergeometric p < 0.05) into the candidate TF node set. Since we only found one TF, RFWD2, based on the hypergeometric p value, we loosened the p value cut-off to 0.1 and obtained 7 TF-pivotal nodes as listed in Supplementary Table S4.
Construction of a multi-factor regulatory network based on the pivot nodes
Next, we pruned the multi-regulatory network constructed above based on the obtained ncRNA nodes and TF nodes and removed the rest of the non-pivot nodes. As shown in Fig. 5, this network included 123 nodes and 276 edges, including 7 TFs, 3 lncRNAs and 39 miRNAs.
Explore drugs that have a therapeutic effect on pancreatic cancer
To explore more effective drugs that can significantly regulate prognosis-related genes, we screened the DrugBank database for pivotal interaction nodes. Similarly, a pivotal node is defined as having interactions with at least two prognosis-related genes and the hypergeometric p value was less than 0.05. A total of 17 drug-pivot nodes were obtained (hypergeometric p < 0.05). The 10 most significant drugs including Amcinonide and Octreotide were listed in Table 2.