An Accurate Prognostic Model and Treatment for Pancreatic Carcinoma Based on the Tumor Immune Microenvironment

Background Pancreatic adenocarcinoma (PAAD) is a highly malignant cancer with a poor prognosis. The tumor microenvironment (TME) is closely related to tumorigenesis, progression, and treatment. However, the relationship between TME immune cell genes and prognosis in PAAD is currently unclear. Methods n this study, we identied three prognostic subtypes based on the TME by using data from The Cancer Genome Atlas (TCGA) database, The International Cancer Genome Consortium (ICGC) database and University of California Santa Cruz (UCSC) database. The Silhouette plot analysis was used to evaluate 758 immune genes expression in PAAD from each database, then to divide all samples into three subtypes (Clusters A, B, C) by Lasso’s binomial logistic regression. We analyzed the relationship between subtypes and prognosis by the survival R package. CIBERSORT was used for evaluating the expression changes of immune cells. We detect the copy number variation areas between two groups through GISTIC 2.0 algorithm. The TIDE network tool was used to predict the response of immune therapy. We dened three clusters (Clusters A, B, and C) based on the analysis of immune gene expression. Cluster B got a worse prognosis than the other two clusters. The Cluster B group had the highest level of Macrophages M0 and Macrophage M2. NK cell resting was much higher in Cluster B than other groups in TME. Gene KRAS was mutated in 77% of all samples. Cluster C had a better immune therapy effect than others. We found a news model to predicted patients’ prognosis who with pancreatic adenocarcinoma. Cluster B had the signicant worse prognosis than other groups. Patients in Cluster C could get batter treatment effect by using immunotherapy.


Introduction
Pancreatic adenocarcinoma (PAAD) is one of the most fatal malignancies in the world, representing 2.5% of all cancers, as well as one of the most di cult to diagnose and cure [1]. Almost 85% of PAADS are pancreatic ductal adenocarcinomas (PDAC), which are originating from the glandular epithelium [2]. PAAD has a 5-year survival rate of 9%, making it one of malignant tumors with poorest prognosis [3]. The early diagnosis rate of pancreatic cancer is very low, for 80-90% of patients have unresectable tumors at the moment of diagnosis [3]. Even though the surgical rate is increasing, the cure rate is low and the recurrent rate remains very high. The median overall survival varies between 24-30 months [4]. The clinical manifestations of PAAD include the cancer site, the course of the disease, the presence of metastasis, and the involvement of adjacent organs [5,6].
The major challenges in the treatment of pancreatic cancer are as follows: the metastasis and immunosuppressive tumor microenvironment and the di culty in treating PAAD may be largely due to the protective stromal barriers that effectively limit drug delivery, and the distribution of immune cell populations [7][8][9][10]. There is no doubt that stroma is an important feature of this disease, regulating central processes such as tumor growth, angiogenesis, drug response, and metastasis [8]. As such, the tumor microenvironment in uences cancer initiation and progression, making it a target for drug development efforts [11,12]. Therefore, it is imperative to better understand the pathogenesis of the tumor and explore more treatment strategies to improve the prognosis of PAAD patients. Malignant solid tumor tissues are composed of tumor cells, tumor-associated normal epithelium cells, vascular cells, immune cells, and stromal cells [13]. Tumor-associated broblasts, adipocytes, pericytes, mesenchymal stem cells, endothelial cells, lymphocytes, and extracellular matrix are components of the tumor microenvironment [14]. A noticeable feature of pancreatic cancer is an extensive desmoplastic stroma comprised of broblasts, extracellular matrix (ECM), and immune cells [15][16][17]. There were four subclasses in the stroma functionally: (i) structurally vascularized, (ii) activated, (iii) inflammatory, and (iv) immune [18]. According to the relevant research, the immune microenvironment surrounding cancer cells can recognize and restrain tumor growth or promote progression [19,20].
However, the in ltration of immune cells in the tumor environment of PAAD is poorly understood, where researchers assessed each cell type individually but did not consider the immune microenvironment as a whole [21][22][23]. Thus we need more efforts to specify the role and clinical relevance of the immune contexture in PAAD. To identifying the quality and quantity of tumor-site immune responses is critical because it may help determine which patients can bene t from immunotherapy and improve our understanding of tumor host biology. In this study, we intended to build a prediction model of poor prognosis for PAAD and nd the genes or immune cells related to the immune microenvironment leading to inferior prognosis. We found clinically relevant immune clusters with progressive immune in ltration.
By characterizing the immune components of the cluster, we identi ed a group of tumorigenic immune in ltrates associated with poor prognosis. Further phenotypic analysis revealed two mutually exclusive aggressive tumor phenotypes in PAAD. Then we tried to nd the possibility of insight therapeutic targets for PAAD.

Data collection and processing
Expression data and prognostic clinical data of PAAD were obtained from University of California Santa Cruz (UCSC) database. Mutation data were obtained from The Cancer Genome Atlas (TCGA) database.
Copy number variation (CNV) data were obtained from Broad Institute Genome Data Analysis Center (GDAC) Firehose database. Expression pro le data of pancreatic cancer (PACA) with survival information were obtained from The International Cancer Genome Consortium (ICGC) database. All collected samples from each database have been shown in Table 1.

Unsupervised clustering to obtain immune clusters
The expression pro les of tumor samples from TCGA-PAAD and PACA-CA datasets were extracted for subsequent analysis. The list of immune genes was downloaded from nCounter® PanCancer Immune Pro ling Panel (Human), which were, respectively, intersected with the expression pro les of TCGA-PAAD and PACA-CA tumor samples to obtain the expression pro les of the training expression pro le and the veri cation expression pro le of 758 immune genes. According to the immune gene expression pro le obtained in the previous step, we evaluated the optimal class number K in combination with the Silhouette algorithm, and then conducted hierarchical clustering for all tumor samples.
CIBERSORT analysis CIBERSORT (http://cibersort.stanford.edu/) is a deconvolution algorithm based on gene expression that can evaluate the expression changes of a gene set relative to all other genes in the sample [24]. Thus, the concentration of tumor-in ltrating immune cells (TIICs) can be accurately estimated by this process. The continued performance of CIBERSORT has led to increased interest in cell heterogeneity studies [22,23]. Our current analysis assessed the immune response of 22 TIICs in PAAD using CIBERSORT to assess their correlation with survival [24]. In short, a dataset of gene expression is created using a standard annotation le and uploaded to the CIBERSORT website portal, where the algorithm uses its default signature matrix for a thousand permutations. CIBERSORT estimates the P value of deconvolution by Monte Carlo sampling and establishes the con dence of the result. We divided PACA-CA database and TCGA-PAAD database, including 139 samples, into two clusters and then used the data to make plots.

Comparison of Survival Prognosis
Among the immune subtypes with survival data, the relationship between the immune subtypes and the overall survival of patients was analyzed using the survival R packet. The survival difference was tested by log-rank, and P <0.05 was considered statistically signi cant. The Kaplan-Meier curve shows the survival differences between different immune subtypes. An R-Package "Estimate" was then used to evaluate stromal score, immune score, and tumor purity based on immune gene expression pro les.

Binomial logistic regression
We designed a classi cation model to predict which sample belonged to the poor prognosis group by using lasso's binomial logistic regression. At the meantime, a key gene was obtained, and the Risk Score of each sample was calculated according to the weight of these genes, that is, the expression value of each key gene multiplied by its weight and summed up.
Gene set enrichment analysis (GSEA) analysis Gene set collections for canonical pathways (curated gene sets, 1329 gene sets) were downloaded from the Molecular Signatures Database version 6.2 (www.broadinstitute.org/gsea/msigdb/collections.jsp). Gene set enrichment scores were calculated with GSEA package version 1.32.0 with RNA-seq parameters [28]. Differential gene collection enrichment was determined by LIMMA packets. A statistically signi cant threshold was recorded in the results. The DESeq2 packet of R was used to analyze the differential expression between Cluster B and Cluster A & C groups, according to the threshold FDR<0.05 and |log2FC|>1. Next, GSEA function in R's cluster Pro ler package was used to analyze the differently expressed genes in the curated gene sets of GSEA.

Copy Number Variation (CNV)
The CNV data of Sequence Tagged S (STS) samples were downloaded from FireHose website (http://gdac.broadinstitute.org/), which stored integrated data sourced from TCGA database. The corresponding GISTIC_2.0 module in Gene Pattern was used to detect the copy number variation areas in tumor samples from different groups using its GISTIC algorithm.
Tumor Immune Dysfunction and Exclusion (TIDE) TIDE (http://tide.dfci.harvard.edu) was used to identify factors that underlie tumor immune escape. TIDE combines and models data from 189 human cancer studies, including a total of 33,197 samples. We hypothesis and verify that an accurate genetic marker that mimics tumor immune escape can serve as a reliable alternative biomarker for predicting immune checkpoint blockade (ICB) responses. The TIDE network tool was used to predict the response of different tumor samples to immunotherapy based on the gene expression data of tumor samples.

Statistical analysis
All statistical analyses were performed using IBM SPSS Statistics for Windows, Version 22 (IBM Corp. Armonk, NY, USA) and GraphPad Prism 7.0 (GraphPad Software, La Jolla, CA, USA). The chi-square test was used to analyze the relationship between PDAC and clinicopathological characteristics. Student's-t test was used to analyze the differences among the three groups. Kaplan-Meier method was used to establish survival curves and log-rank test was used to compare survival differences. All statistical results with P <0.05 were considered statistically signi cant.

Results
Cluster of immune genes in PDAC from TCGA-PAAD. The expression data of 178 samples from TCGA-PAAD were measured by the nCounter® PanCancer Immune Pro ling array, an immune gene array. Silhouette plot analysis from 3 to 10 clusters indicated that 3 clusters demonstrated the best segmentation of the nCounter (Fig 1A), showing it as a better pick than more numerous clusters. Then all samples were clustered by the hierarchical clustering method ( Fig 1B). Finally, we got three clusters: Cluster A, Cluster B and Cluster C (Fig 1C, Supplementary Table 1). We concluded that Clusters A-C re ect gradual immune in ltration and were therefore called immune clusters.

Characterization and prognosis of Clusters
We evaluated the stemness features of the 3 clusters based on the methods reported by previous study [29] and compared them by the Wilcoxon Signed Rank Test. We found fundamental differences in stemness scores among the 3 clusters (Fig 2A, Supplementary Table 2). Cluster C had the highest stemness feature score. The scatter graph with bars showed that Cluster A had the highest stromal scores, immune scores, and tumor purity(Supplementary Table 3), while Cluster C got the lowest score ( Fig 2B-D). There were signi cant differences in stromal score, immune score and tumor purity among Cluster A vs Cluster B, Cluster A vs Cluster C and Cluster B vs Cluster C.
We designed a classi cation model to predict which sample belonged to the poor prognosis group by using lasso's binomial logistic regression. At meantime, we got 10 genes, which include CCL4, CD8A, EOMES, FLT3, ITGB4, MUC1, PECAM1, STAT5B, TLR7 AND TNFRSF8 (Supplementary Table 4), from the model and used it to calculate the tumor scores(Supplementary Table 5). The patient had higher scores which proved the worse prognosis. The clinical survival information revealed that the prognosis of the three subtypes had a signi cant differences (Fig 2E). After combing Cluster A and Cluster C to group Cluster A & C, we found Cluster B (with intermediate levels of immune in ltration) associated with a worse survival prognosis than Cluster A & C (Fig 2F), indicating Cluster B as a group predicting inferior survival outcomes. Moreover, Cluster B got higher risk scores compared to Cluster A & C group based on the key genes related to inferior survival ( Fig 2G). Then we divided the samples into groups according to some clinical features, such as smoking status and pancreatic tumor's site. It was found that the risk scores of Cluster B and Cluster A & C groups were signi cantly higher in smoking patients (Fig 2H). And the tumor in Cluster B which had higher cancer scores compared to Cluster A & C in various pancreatic tumor's sites (Fig 2I).

Veri cation of the model via ICGC data
We downloaded the PACA-CA data from ICGC database, then divided all samples into three clusters using the Silhouette Coe cient. Then we used the ICGC database to certify the model's effect. In ICGC database, Cluster B group got higher risk scores compared to the other groups in all cancer samples ( Fig  3A). Then divided the samples into various groups according to some clinical features. We could nd that the Cluster B got higher scores no matter whether the patient smoked or not, nor the patient were diagnosed with diabetes or not. (Fig 3B and G). The location of the tumor in the pancreas does not in uence the worse prognosis in Cluster B (Fig 3C). In tumor characters, such as tumor with or without metastatic tumor stage and tumor's dimension, Cluster B got a higher score than other clusters (Fig 3D-F). The survival curve showed a signi cant differences in prognosis among Cluster A, Cluster B, and Cluster C (Fig 4A-C). Cluster B showed a worse survival outcome when compared to the combined group Fig 4D). The model showed a strong prediction performance demonstrated by the ROC curve in 10-year follow-up with an AUC=0.645. The results proved that the model has outstanding prediction power in PAAD.

The high expression of NK cells resting in TME predicted worser prognosis in Cluster B
We used the CIBERSORT algorithm to evaluate the immune microenvironment composition of PAAD samples. For each group, we calculated the median of CIBERSORT absolute scores for each of the 22 cell types in each cohort(Supplementary Table 6). The level of macrophages M0 and other immune cells had signi cant differences between Cluster B and Cluster A & C (Figs 5A and B). Macrophage M0 (P<0.001) showed an apparent upward trend in the in ltration of immune cells in Cluster B. We found that the Cluster B group had the highest level of Macrophages M0 and Macrophage M2, whereas Cluster A & C group had lower levels of these cell types. The immune cells in the tumor samples were signi cantly correlated with the prognosis, 55 differentially expressed genes were screened (Fig 5C). The different NK cells resting expression in TME showed signi cant difference in prognosis (P=0.024), while Cluster B had higher NK cell resting expression (P<0.05, Fig 5). Therefore we supposed that NK cell resting may be the key point of affecting the patient's prognosis in PAAD.

Phenotypic analysis of the immune clusters
To further characterize the phenotype associated with the poor prognosis in Cluster B, we identified the genes significantly overexpressed in Cluster B. We found 55 genes upregulated in Cluster B when compared to Cluster A & C ( Table 2, Supplementary Table 7). These genes are involved in stem cell biology and EMT. To further characterize the relationship between immune clusters and cancer cell phenotypes, we used gene sets associated with EMT, stem cells, hypoxia, and proliferation. There were a total of seven gene sets from msigDB and one EMT-related signature from Tan et al. [30] were selected. We used the GSVA method to calculate the average enrichment fraction of gene sets for each cluster and cohort. This score re ects the activity of each pathway/gene set in the immune cluster [30]. These genes are mainly related to "BREAST_CANCER_NORMAL_LIKE_UP, REAST_CANCER_LUMINAL_B_DN SANSOM_APC_TARGETS_DN, POOLA_INVASIVE_BREAST_CANCER_UP, WALLACE_PROSTATE_CANCER_RACE_UP, REACTOME_IMMUNE_SYSTEM" (Fig 6A, Supplementary Table 8). To formally identify which gene set scores explained Cluster B, we tested how each gene set contributed to Cluster B vs Clusters A&C using generalized linear models (Fig 6B, Supplementary Table 9). To further characterize the relationship between gene signatures and survival, we performed multivariables analysis, which indicated that "cell motility, mitotic cell cycle, hypoxia, mammary stem cell" are related to poor prognosis (Fig 6C). These results suggested there was a certain association between immune clusters with gene signatures.
Gene mutations and immunotherapy response prediction in different subtypes R's Maftools package was used to analyze the highly mutated genes in tumor samples in different groups (Fig 7A). KRAS was mutated in 77% of the samples, followed by TP53 and SMAD4, which were mutated in 64% and 24% of the samples, respectively. We found that Cluster B and Cluster A & C had frequent ampli cation on chromosomes 8q and 18q, and frequent deletion on chromosome 9q via GISTIC_2.0 of R's GenePattern package (Figs 7B and C). The TIDE network tool was used to predict the response of different tumor samples to immunotherapy based on the gene expression data of tumor samples. Among them (Supplementary Table 10), a high TIDE prediction score was associated with poor immune checkpoint inhibition and a poor prognosis. There was no signi cant difference in TIDE scores between the Cluster B and Cluster A&C groups (Fig 8A). However, Cluster C had a better immune therapy effect than the other two clusters (Fig 8B).

Discussion
Tumor microenvironment plays an important role in the pathogenesis of PAAD. In this study, we present a novel immune-related subtype that is associated with PAAD prognosis. The immune cluster described here depends on the abundance and composition of the immune cell in ltrates. We used an established computational resource (CIBERSORT) to explore the gene expression pro les of downloaded samples to infer the density of 22 types of immune cells. The CIBERSORT algorithm applied initially to the 22 immune cell subtypes helped assess their differing concentrations in Cluster B and Cluster A & C groups for prognosis.
After the precise grouping, we got three clusters in TCGA-PAAD, Cluster A, Cluster B and Cluster C. Then we found the patients in Cluster C got better prognosis than the other two clusters, so we predicted those patients with the higher stem-like feature and lower immune scores which had prolonged survival. Cluster B had the signi cant worse prognosis than other groups. Because the purpose of this study was to predict the prognosis of patients through the tumor immune microenvironment, the reanalysis of the prognosis showed that the prognosis of Group B still had the worst prognosis after combing the Cluster A and Cluster C.
The investigation of tumor-in ltrating immune cells (TIICs) has greatly improved our knowledge of the nature of tumor-immune interactions. The presence of TIICs has been associated with a favorable prognosis across multiple cancer type [32][33][34]. We found that NK cells resting was higher in Cluster B than in Cluster A & C group. And the NK cells resting had signi cantly affected the prognosis of patients who diagnosed as pancreatic adenocarcinoma among all immune cells. Recent evidences suggest that NK cells, besides their direct cytolytic effect against tumor cells, may also shape the adaptive immune response toward a T helper 1 (Th1) pro le, which is thought to favor antitumor responses [35,36]. Despite the complexity of NK cell receptors, NK cells appear to be a valuable and potentially more versatile alternative T cell in antitumor immunotherapy. The NK cells do not need to select or amplify speci c clones to achieve antitumor activity. More importantly, they can be safely used for adoptive immunotherapy in allogeneic Settings [37,38]. According to this study, we found that high resting NK cells in ltrating in the tumor microenvironment had worse prognosis in Cluster B.
In the present study, we found the key genes related to pancreatic cancer immunity. The score of Cluster B was higher than that of the other groups, indicating that the group with higher score had a poorer prognosis (Fig 2C). Correlating with any in uence factors, the score of group B was higher and the prognosis was worse (Fig 2D-H). We veri ed this classi cation method through PACA-CA data, which also testifying the prognosis of Cluster B was worse. The ROC curve was satisfying, which proved that this classi cation method was reliable and feasible. However, there was no signi cant difference in some in uencing factors among the poor prognosis group of PACA-CA data species. It may suggest that the immune microenvironment may have little correlation with these in uencing factors. It indicating that the gene signatures such as EMT, stem cell, and others, which may have correlation with PAAD. The Forest plot shows the risk degree of each pathway in Figure 7C, which indicates that the pathways like STEM_CELL_PATHWAY are risk factors for pancreatic cancer survival, and it is speculated that it may be related to the classi cation of Cluster B with poor prognosis.
According to the region of tumor sample copy number change (Fig 8B), both Cluster B and Cluster A & C had frequent ampli cation on chromosome 8q and 18q, and frequent deletion on chromosome 9q. In addition, a signi cant deletion of Cluster B occurred on chromosome 18q, suggesting that the deletion on chromosome 18q may be associated with poor prognosis. Many studies using different molecular techniques have agreed that the loss of chromosome 18Q is an early event in pancreatic cancer. Chromosome 18Q contains a group of tumor or metastasis-suppressor genes such as SMAD2, SMAD4, DCC, and PAI-2.Loss of 18q heterozygosity (LOH) is a common event in over 90% of pancreatic cancers, while only 50% of pancreatic cancers are characterized by bi-allele inactivation of the SMAD4 gene [39,40]. In pancreatic cancer, KRAS, TP53, p16, and SMAD4 are thought to play key roles in tumorigenesis [41]. As with many aspects of PDAC, mutated KRAS is to blame. KRAS is an early oncogenic event that drives the recruitment of B cells early in PDAC development and thus plays an immunosuppressive role [42]. Therefore we thought the frequent deletion of chromosome 18q may in uence KRAS, TP53, p16, and SMAD4 and induce worse prognosis. KRAS, TP53, CDKN2A, and SMAD4 are the most commonly altered genes in human PDAC, but are considered to have a poor therapeutic response in most cases. , no targeted therapy for PDAC has yet been approved, and despite the large investment in the development of KRAS pathway inhibitors, trial results to date have been unsatisfactory [43]. The counts of Cluster C were much lower than the other two clusters, which may in uence the immune therapy effect. We should collect more data to con rm the real therapy effect.
We conclude that Cluster B had a worse prognosis than others groups. Then we designed a classi cation model to testify whether the sample is Cluster B. In Cluster B, high in ltrating resting NK cells in the tumor microenvironment induce a worse prognosis. We thought the frequent deletion of chromosome 18q may in uence KRAS, TP53, p16, and SMAD4 to induce worse prognosis.
In the modern era of immunotherapy, several clinical trials using immune checkpoint inhibitors have been conducted in PAAD and are planned in combination with immunogenic chemotherapy or radiation therapy. Our study shows that taking into account both the type of immune cells in ltrating the tumor and the dominant state of the tumor will lead to more accurate treatment decisions and improved responses to these new therapeutic strategies. Availability of supporting data and materials Not applicable.

Competing interests
We have no competing interests to declare.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.