Immune and stromal scores are associated with pathologic stages and overall survival of pancreatic cancer patients
The gene expression profiles and clinical information of all 177 pancreatic cancer patients (with initial pathologic diagnosis) were downloaded in TCGA cohort. We included all the pancreatic cancer cases with complete gene expression data and clinical information in the TCGA in our analysis. There clinicopathologic information was listed in Supplementary Table S1. According to the clinical information recorded in the TCGA database, we divided all pancreatic cancer cases into Stage I (21/177, 11.9%), Stage II (146/177, 82.5%) and Stage III and IV (7/177, 4.0%) subgroups based on the overall stage of cancer. Their immune and stromal scores were obtained from the ESTIMATE website. The numerical distribution of the two scores was shown in Figure 1B. In accordance with ESTIMATE algorithm, immune scores were distributed between -1,559.87 to 3037.78 and stromal scores ranged from -1,843.32 to 2,179.19, respectively. As shown in Figure 1C, the pancreatic cancer cases at Stage II subgroup had the highest average immune score (p < 0.01), followed by Stage III and IV, while the Stage I samples had the lowest immune score. Similarly, the rank order of stromal scores across pancreatic cancer subgroups from highest to lowest was Stage III and IV > Stage II > Stage I (Figure 1D), indicating that both immune and stromal scores are meaningful in the correlation of subgroup classification.
To find out the potential correlation of immune and/or stromal scores with overall survival, we divided the 177 pancreatic cancer cases into high vs. low score groups (using zero as the cut-off point) and performed survival analysis using clinical follow-up data for each set of samples. Kaplan-Meier survival curves indicated that 5-year survival rate of cases with low immune scores was longer (Figure 1E, log-rank p = 0.024). Consistently, although not statistically significant, longer 5-year overall survival were observed in cases with lower, compared to higher, stromal scores (Figure 1F, log-rank p= 0.37).
Comparison of gene expression profile with immune and stromal scores in pancreatic cancer
To explore the correlation of global gene expression profiles with immune and/or stromal scores, we analyzed the RNAseq - HTSeq - FPKM data of all 177 TCGA pancreatic cancer cohorts. As the heatmaps shown in Figure 2A and 2B, there were evident gene expression profiles between cases of high vs. low immune/stromal score groups. According to the immune scores, a total of 2,224 DEGs were screened with 1,922 upregulated and 302 downregulated genes in high vs. low score groups (FC > 2, p < 0.05). When it comes to stromal scores, there were 1,982 up-regulated genes and 160 down-regulated genes screened in high vs. low score groups (FC > 2, p < 0.05). In addition, the Venn diagrams in Figure 2C and 2D showed 1,611 commonly up-regulated genes and 104 commonly down-regulated genes in the high score groups . There is a certain degree of overlap between the DEGs in the high-scoring and low-scoring groups of the immune score and the stromal score, especially in the up-regulation group. Among the differentially up-regulated genes, 1,611 genes were shared between immune and stromal score groups, accounting for 70.3% of the genes. While the 104 common down-regulated genes accounted for 29.1%. Especially to deserve to be mentioned, when it comes to the comparison based on stromal scores, the majority of the genes were composed of the DEGs from the comparison of immune score groups . Therefore, we were determined to focus on the DEGs from immune score groups for all the following analyses in this manuscript.
Then, by using KOBAS 3.0, functional enrichment analysis of all upregulated 2,224 genes in high-immune scores group were carried out in order to sketch the possible effect of the DEGs between the two immune score groups (high and low). The DEGs we obtained in the GO annotation and relevant pathways were enriched in the KEGG database. As shown in Figure 2E, F and G, numerous genes were associated with tumor microenvironment or immune function.
Correlation of expression of individual DEGs in overall survival
To determine the separate effects of the 2,224 DEGs on 5-year overall survival, Kaplan-Meier survival curves were generated, of which 246 genes in total were identified to be significantly associated with poor overall survival prediction (log-rank p < 0.05, Supplementary Figure S2 shows designated genes).
Protein-protein interactions among genes of prognostic value
To investigate the interactions among the above DEGs, we utilized the STRING tool and yeasted protein-protein interaction (PPI) networks. By limiting the genes with interactions to be only included in these 246 DEGs, we established a PPI network that are significantly associated with 5-year overall survival using Cytoscape software. As suggested by Figure 3A, 93 nodes and 158 edges composed the network.
Functional enrichment analysis for DEGs of prognostic value
To further clarify the main biological functions of the 93 screened DEGs, we performed functional enrichment clustering analysis of the 93 gene nodes using the gene set enrichment analysis (GSEA) method (p < 0.05, FDR < 0.25). As shown in Figure 3B and 3C, these genes showed strong association with biologically significant processes such as immune response, cell and subcellular composition and movement of cell or subcellular component, which is consistent with the result of PPI network analysis.
Module mining of PPI networks
To perform the module mining on the previously mentioned PPI networks, we obtained a total of 7 modules using the MCODE tool in Cytoscape. It is worth mentioning that one module contained more than 10 nodes (Figure 4). CXCL5, which have been reported to recruit and activate leukocytes and play a role in cancer progression, was also indicated to be one of the most key genes in the PPI network. The genes in the module, including CXCL9, CXCL10, CXCL11, PPY, LPAR3, HCAR3 and so on, were mainly associated with the prognosis of pancreatic cancer or other cancers as listed in Table 1.
Validation in the ICGC database
To validate the prognostic significance of the 93 prognosis-related genes identified from the TCGA cohort, we downloaded and analyzed gene expression data and its corresponding clinical information of a cohort of 95 pancreatic cancer cases from ICGC. It showed that this set of data contained 79 of 93 prognostically significant genes and their prognostic values in pancreatic cancer were visualized by Kaplan-Meier survival curves. A total of 26 genes were found to be significantly associated with poor prognosis (Supplementary Figure S2), while five of them, including SYT12, GJB5, RHOJ, GJB3 and IFI27, were of particular interest as they have not been previously associated with poor outcomes in pancreatic cancer cases.
Degree and betweenness analysis with prognosis-related DEGs and multi-regulatory network construction
By using RAID 2.0 and TRRUST v2 database respectively, we investigate the interaction between human ncRNAs along with TFs and the 93 prognosis-related genes. We obtained 8437 out of 1954911 pairs of ncRNAs with genes and 179 out of 8427 pairs of TFs with genes. These interaction pairs were imported into Cytoscape software to build a multi-regulatory network with 2228 nodes and 8616 edges, including 2021 miRNA nodes, 3 lncRNA nodes, 89 TF (transcription factor) nodes and 91 mRNA nodes. In order to obtain more accurate and significant results, we used the degree and betweenness analysis method to further screen factors including ncRNAs, TFs and drugs that have significant regulatory effects on prognosis-related DEGs. Here, a pivotal node was defined as having interactions with at least two prognosis-related genes and the hypergeometric p value was less than 0.05.
ncRNAs regulate genes associated with significant prognosis
To explore ncRNAs that can significantly regulate prognosis-related mRNAs, we screened the RAID 2.0 database for pivotal ncRNA nodes. By using degree and betweenness analysis, we obtained a total of 42 ncRNA-pivotal nodes, including 39 miRNA nodes and 3 lncRNA nodes. The 10 most hypergeometrically significant ncRNAs, including WSPAR, has-miR-4425, has-miR-4536-5p, has-miR-410-5p, has-miR-8064, has-miR-6871-5p, has-miR-4767, has-miR-608, has-miR-4481 and has-miR-3605-3p, were listed in Supplementary Table S3.
Transcription factors (TFs) regulate genes associated with significant prognosis
To explore TFs that can significantly regulate prognosis-related genes, we screened the TRRUST v2 database for pivotal TF nodes. Accordingly, we put nodes that interact with at least two prognosis-related genes (hypergeometric p < 0.05) into the candidate TF node set. Since we only found one TF, RFWD2, based on the hypergeometric p value, we loosened the p value cut-off to 0.1 and obtained 7 TF-pivotal nodes as listed in Supplementary Table S4.
Construction of a multi-factor regulatory network based on the pivot nodes
Next, we pruned the multi-regulatory network constructed above based on the obtained ncRNA nodes and TF nodes and removed the rest of the non-pivot nodes. As shown in Figure 5, this network included 123 nodes and 276 edges, including 7 TFs, 3 lncRNAs and 39 miRNAs.
Explore drugs that have a therapeutic effect on pancreatic cancer
To explore more effective drugs that can significantly regulate prognosis-related genes, we screened the DrugBank database for pivotal interaction nodes. Similarly, a pivotal node is defined as having interactions with at least two prognosis-related genes and the hypergeometric p value was less than 0.05. A total of 17 drug-pivot nodes were obtained (hypergeometric p < 0.05). The 10 most significant drugs including Amcinonide and Octreotide were listed in Table 2.