Identification of Genetic Determinants of Pancreatic Cancer Immune Phenotypes by Integrative Genome-scale Analysis


 Background Pancreatic cancer (PC) is a malignant neoplasm of the digestive tract that is highly malignant and difficult to diagnose at an early stage with high postoperative mortality and low cure rates. Cancer immunotherapy is innovating the clinical treatment of several cancers, but has a limited role in PC. The incomplete understanding of immune response hinders the development of gene therapy. To fill this gap, it is very necessary to classify the immunogenic subtypes of PC to understand the relationship between tumor microenvironments and clinical pathological characteristics, DNA damage repair and tumor immune response.Methods We extracted copy number change, somatic mutation and expression data from tumor genome map (TCGA). Using RNA sequencing data, we defined three different immunophenotypes and elucidated how immune cell interactions in immune subtypes vary with the background of the immune system. Further correlation analysis between DNA damage repair (DDR) related genes expression and immune response was conducted to explore the effects of DDR expression on antitumor activity in the tumor microenvironments.Results We defined three different immunophenotypes and elucidated how immune cell interactions in immune subtypes vary with the background of the immune system. When the total number of mutations was standardized, no enrichment of new epitopes was detected in immunocompetent phenotypes. These observations suggest that mutations in Th-1 enriched IS3 tumors are essentially no more immunogenic than those in IS2 cancers. We also found that the expression patterns of key IFN mediators STAT1 and / or STAT3 were increased in tumors with DDR mutations (19 of ATM, ERCC1, Rb1, BRCA2, pole and TP53), which is a reflex activation of IFN pathway.Conclusions Three defined immune subtypes show different characteristics of immune cell infiltration, revealing different manifestations in anti-cancer immune function. That is to say, clinical biological events of differential tumors are associated to immune-phenotypes. The invasive phenotype was driven by somatic mutations across immune subtypes, and DDR defect may also influence the response of tumor immune subtypes. Our results suggested that the occurrence of pancreatic cancer is related to genetic factors of immunophenotype, providing information for clinical prognosis and outcome of pancreatic cancer.


Introduction
Pancreatic cancer (PC) is the fourth leading cause of cancer death in Western society and is expected to become the second in ten years [1]. It has a median survival time measured in months and a ve-year survival rate < 5% after adjuvant therapy such as surgical resection, radiotherapy and chemotherapy due to late detection, di cult treatment, and poor prognosis [2]. The micro-environment of pancreatic cancer plays an important role in the occurrence and development of pancreatic cancer. It is also the main reason for poor prognosis and insensitivity to radiotherapy and chemotherapy of patients with pancreatic cancer. Immunotherapy mainly ghts tumors by activating the body's own immune system. However, the existence of the unique tumor microenvironment in pancreatic cancer limits the effectiveness of immunotherapy [3,4].
Solid tumors interact with the immune cells that in ltrate them in a dynamic balance, which determines the progression of the disease. Evading immune surveillance of tumors is a common feature of all cancer types. In this process, protecting cancer cells from the action of cytotoxic immune in ltration uid or promoting the inhibition of the in ltration uid can be positively triggered [5,6]. Although several mechanisms for evading immunity against certain tumor types have been experimentally determined, the understanding of tumor immune escape pathways is not yet comprehensive so far [7]. Some of these mechanisms (such as activation of immune checkpoints) have been used therapeutically and achieved signi cant clinical success in various cancer types. Therefore, discovering the characteristics obtained by tumors in response to surrounding immune cells may open up new strategies for treating the disease [8,9]. There is an urgent need to better understand the immune in ltration of PC in order to improve the selection of current treatment options and formulate new treatment strategies for patients.
The development of the Cancer Genome Atlas (TCGA) has profoundly revealed the gene expression and mutation pattern of human tumors. [10,11]. At present, deconvolution algorithms such as CIBERSORT can extract genome and transcriptome data from a large number of tumor samples have been used to study the number of immune cells in the tumor microenvironment (TME). An article reveals that the measurement of immune in ltration determines the molecular subtypes of many tumors, and the expression of immune genes varies with molecular subtypes [12,13]. Characterization of the immune microenvironment using gene expression signatures, T cell receptor (TCR) and B cell receptor (BCR) repertoire, and further analysis to identify neo-antigenic immune targets provide a wealth of information in many cancer types, and have prognostic. Therefore, extraction of immune genes can distinguish the types of tumor microenvironment on the basis of molecular typing. Recognition of the immune microenvironment using gene expression signatures, especially including immune cell member surface biomarker, and further analysis to identify neo-antigenic immune targets provide a lot of information in many cancer types, and have prognostic value value [14,15]. Cancer immunotherapy as well as cancer chemic-drug treatment has been revolutionized and survival rate has been improved with the help of bioinformatic analysis based on the development age of TCGA's working. Antibodies against immune check point, such as CTLA-4, PD-1and PD-L1, are effective in treating a number of tumor. However, the mechanism of the microenvironment that cause these immune response results has not been fully understood, but is critical to the strategies designing for immunotherapy study [16].
In this study, based on the certi ed immune-related gene set, the pancreatic cancer expression data in the TCGA database was clustered to identify three pancreatic cancer immune subtypes, compare the immune in ltration of each subtype and characterize the activation of immune function in each subtype. Through survival analysis and integration of pathological processes, different prognostic outcomes that may be caused by different subtypes are described. The effector pathways are also signi cantly different in distinct immune subtypes. In order to further explore the mechanism of different immune subtypes, the analysis of the somatic mutation and the variation in copy number of different immune subtypes in pancreatic cancer was conducted. Although no direct evidence is found, it suggests that DNA damage response (DDR) related genes may be related to the appearance of different immune subtypes. This study found that the three immune subtypes of pancreatic cancer could further enhance the understanding of the occurrence and development of cancer, and also provide information for clinical prediction of prognosis and outcome, and the theoretical guidance of pancreatic cancer medication.

Sample data collection and processing
All data collection and analysis are done using R (3.5.0) unless otherwise stated. The packages used are mentioned in various parts of the article. Use the TCGA assembler and our TCGA bioloinks R/Bioconductor package to download RNA-seq, clinical and copy SNP data. Mutation data (genome.wustl.edu__IlluminaGA_curated_DNA_sequencing_level2.maf) download using TCGA portal (N=178).

Immune Signature Compilation
We conducted an extensive literature search and collected 160 immune expression signatures. Using different resources, based on the expert opinions of immuno-oncologists, these resources are considered reliable and comprehensive. Among these characteristics, 83 come from the research background of cancer immune response, and the remaining 77 have general immune effectiveness [17]. It is known that 83 signals related to the immune activity of tumor tissues are composed of 68 genes collected from earlier studies, of which 9 expression signals come from the computational analysis of all TCGA gene expression data sets (immune hypergene attractors), of which 3 These signals represent the functional direction of the tumor. The immune background (or immune constant of rejection, ICR) in a recent study and 3 characteristics of the immune microenvironment of clear cell renal cell carcinoma [18,19]. The 77 more general signatures include scores of 45 signatures, representing a single cell type from two sources, and scores containing the main pattern from immune GDB [20]. These patterns were identi ed as the rst 32 principal components of the 1888 immune C7 human gene set and used as a large and complex set.
The scoring of gene sets uses single-sample gene set enrichment analysis (ssGSEA) (Barbie et al., 2009), as implemented in the gene set variation analysis (GSVA) R software package.

Correlation matrix and consensus clustering
Spearman test was used to calculate the correlation matrix, and corrplot 0.73 was used to draw the graph. The genes were sorted by the rst principal component. Use the consensus Cluster Plus software package to generate a consensus matrix diagram. The parameters are as follows: 5000 replicates, up to 7 clusters, and agglomerated hierarchical clustering with ward criterion (ward.D2) internal linkage and complete external linkage. The genes used for the consensus cluster analysis are 160 immune gene sets selected a priori [21,22].

Immune Cellular Fraction Estimates
Use CIBERSORT to estimate the relative fraction of 22 immune cell types in the white blood cell compartment [23,24]. More speci cally, we apply CIBERSORT to TCGA-RNASeq data. CIBERSORT (cell type identi cation by estimating relative subsets of RNA transcripts) uses a set of 22 immune cell reference maps to derive a base (feature) matrix, which can be applied to mixed samples to determine the relative proportion of immune cells [25,26].

Heat maps and Kaplan-Meier survival plots
For the heat map, pheatmap was used, and ggkm function was created to generate the Kaplan-Meier survival curve. OS data can be used in the TCGA data set and used to generate Kaplan-Meier curves [27,28]. In order to minimize population-speci c bias due to patient follow-up time, the survival rate analysis was limited to a 10-y window; the p-value was based on the log-rank test.

Differential gene expression and functional analysis
The DESeq2 analysis software package is used for gene differential expression analysis. In order to reduce false statistical tests, the gene counts with low expression values were screened out. The critical values for identifying signi cantly differentially expressed genes (DEGs) are false discovery rate (FDR) corrected p-value (q-value) <0.05 and absolute logarithmic change (logfc)>0.5. Gene Ontology (GO) term enrichment analysis was performed using clusterPro ler [29].

Somatic mutations
With the help of R package maftools ,we performed the Kruskal-Wallis test to detect the distribution of somatic mutations and new epitopes in comparison. The DMG between immune subtypes was determined by Fisher's exact test, among which 5037 genes had at least 1% mutation (cutoff value p=0.01) [30,31].

Neo-epitope analysis
NetMHCpan was used to calculate the binding a nity of all possible mutant amino acid peptides overlapping the mutation site, and the presence or absence of somatic mutant cell mutations [32,33]. Use clone and non-clonal neoantigen classi er genotype3 to infer whether the HLA allele corresponding to each sample is compatible with the newly generated mutant peptide [34,35]. If a peptide containing a given mutation is predicted to be a high-a nity binder, while the original unmutated germline peptide is not predicted to be a high-a nity binder, then the mutant peptide is de ned as a potentially strong new epitope [36]. We calculated the immunogenicity pro le of a single sample or gene as the ratio between the different counts of mutations that caused at least one putative strong new epitope to the total number of different mutations observed in the sample or gene.

Copy number alterations
Signi cant genomic imbalances were based on Genomic Identi cation of Signi cant Targets in Cancer (GISTIC) version 2.0 algorithm109 to identify broad and focal CNAs speci c for each group. GISTIC computes, for each segment through the genome, a score based on the frequency of CNA combined with its amplitude, with bootstrapping to calculate the signi cance level in terms of FDR-adjusted p values (q values). GISTIC thresholds for calling gain and loss were set to absolute log 2 ratio > 0.1 and only CNAs with a q value < 0.05 were considered relevant for further analysis. For the generation of the ideogram plot, we made use of NCBI's Genome Decoration Page (GDP, source: http://www.ncbi.nlm.nih.gov/genome/tools/gdp) [37]. BED les were generated for signi cant (FDRadjusted p value (q value) ≤ 0.001, test) differently ampli ed or deleted genes between IS1 to IS3 and uploaded to the GDP tool to create an ideogram for ampli cation and deletion.

Results
Identi cation of three immunogenic subtypes for PADC In order to evaluate the ability of compiled 138 immune gene sets in pancreatic cancer classi cation, we performed ssGSEA on the RNA-seq data extracted from 146 pancreatic ductal adenocarcinoma (PADC) samples collected by TCGA cohort (across). We found that although different gene sets are relatively large in the enrichment score variation on a single sample, the enrichment scores of related gene sets are relatively consistent between multiple samples (Fig. 1A). At the same time, the enrichment scores of these gene sets are correlated within immunogenic modular components, especially the enrichment scores of immune cell tag gene sets are highly positively correlated. This analysis revealed that the relative abundance of many immune cell populations is correlated (Fig.1B), suggesting at least some degree of co-in ltration of the tumor. In short, these gene sets will provide profound and detailed information on the classi cation of immune subtypes of PDAC.
In order to de ne discrete pancreatic cancer categories built on the immune components of pancreatic cancer, we performed an unsupervised consensus cluster based on the expression of these 138 representative immune genes which were selected a priori. Calinski index was used to select the best number of clusters, which indicated that the best separation was a partial trivial solution re ected by dividing the queue into three clusters. These clusters re ected the different amplitudes of overall gene expression (K = 3; Fig. 1C). When the total sum of square is between 2000 and 3000, and the consistency index score is determined as 3, it is the best and the sample group could be effectively divided into three clusters in the cluster separation scatter plot (Fig. 1D). Before further excavating the patient population with the characteristics actually represented by the classi cation, we designated the samples as IS1 (N = 71), IS2 (N = 43) and IS3 (N = 32). IS1 is the most common, while IS3 is the least. We posited that different types to represent different immune response activations, which would be analyzed in more detail below.
To evaluate the tumor micro-environment of PDAC of different immuno-types, we wanted to identify the speci c immunocytes subtypes activated among the immunogenic clusters. In particular, we deconvolved the gene-expression signatures of the IS1, IS2 and IS3 samples with CIBERSORT. Overall, in the three types of the immunogenic clusters, the majority types are NK cell resting, T cell CD4 memory activated, macrophage M0, plasma cells, B cell memory, dendritic cell resting, T cell regulatory, T cell CD8 and monocytes etc. [38,39], which is consistent with the previous reports. IS3 tumors were enriched with Band T-cell marker( Fig. 2A). However, less T-reg cells, macrophages, neutrophils, and DCs and more mast cells resting were obserced in the IS3 tumors compared with IS1 and IS2. Monocyte-related transcripts were also represent more in IS1 samples. Additionally, IS1 shows an enrichment in macrophages M2, while IS2 macrophages were inhibited in M0 stage. Different immune subtypes show different characteristics of immune cell in ltration, revealing different manifestations in anti-cancer immune function.
Since we have identi ed the signi cant associations and difference in immune-cell composition and abundance between three pancreatic cancer immune subtypes, we further investigated whether alterations in immune hub-genes expression were related to the presence of immune cells in the tumor microenvironment (TME). Analysis of a panel of immune-regulatory genes was con rmed, which included 5 types about immune activators, immune inhibitor, immune checkpoint resistance (ICR), major histocompatibility complex (MHC) and T regulatory (Treg) cell. The immune subtypes were signi cantly different among the three IS clusters, which showed a distinct association with the immune subtypes (Fig. 2B). Immune subtypes are part of a shared cluster with high expression of immunosuppressive agents, MHC-related genes, STAT1 and STAT3 transcription factors. Interestingly, from the heatmap we nd the expression of PD-L1, CTLA-4, IDO1, STAT1 and STAT3was signi cantly higher in IS3 than other molecular subtypes.

Differential tumor biological events associated to immunephenotypes
We next explored whether the immune-phenotypes related with clinical and pathological characteristics of the tumors. First we proceeded to compare the different immune subtypes in term of survival. In coherence with previous studies, we observed that the survival of patients showed an improvement trend in immune types IS3 at rst (Fig. 3A). However, since the IS3 cluster was enriched in anti-tumor immune response, which is classically characterized by worse prognosis, prognosis of patients bearing IS3 immune phenotype was worse as compared with subjects bearing the other both immune phenotypes in the end (Fig. 3A) An intriguing exception was that the clinical outcome of pancreatic cancer is best in the case of IS2 (immune-phenotypes of lowest cytotoxicity), with the longest average survival age of the patients. These results suggest that anti-tumor immune response negatively promote the survival potential ability of patients and has higher survival time and rate of risk than tumor immune response.
In addition, we also observed that tumors at a higher pathological stage at the time of diagnosis were signi cantly associated with decreased cytotoxicity of immune subtypes in each immune cancer subtype ( Figure 3B). These observations indicate that tumors preferentially grow when cytotoxic immune in ltration is weak. In contrast, tumors with a highly cytotoxic immune phenotype are partially controlled by the immune system and progress to more advanced stages less frequently. Finally, we analyzed the biological functional pathways enriched by differentially expressed genes of different immune subtypes ( Figure 3C). The activation pathway of each immune subtype is complex and irregular, and it is impossible to nd valuable clues that affect tumor occurrence and development.

Somatic mutations across immune subtypes drive invasive phenotype
To identify whether somatic mutations are speci cally associated with the level of immune activations, we compared the mutations genes in each immune phenotypes (Fig. 4A). Several differentially mutated genes (DMGs) by the maftools are identi ed. The mutation frequency of ve of these genes (e.g., KRAS, TP53, SMAD4, CDKN2A and TTN) was higher than expected by chance given background mutation processes and the missense and frame-shift mutations were in majority. Therefore, this situation were considered driver mutations. Then we observed the relationship between Ti/Tv in the overall mutation (Fig. 4B) and found that the overall level of C>T transformation is signi cantly higher than other transformations, indicating that the enzyme gene catalyzing the conversion of cytosine to thymine is inactivated, and the enzyme catalyzing the conversion of thymine to guanine is activated (Fig. 4B). This mutation conversion indicates that somatic mutations of immune subtypes may increase the invasiveness of normal pancreatic cells, and may be associated with poor prognosis of pancreatic cancer [40,41].
In addition to SNP mutation, cancer mutation information also includes copy number variation and chromosome aberration. In order to investigate the difference in copy number variation in different immune subtypes, CISTIC2 was further adopted for analysis (Fig. 5A). From the results, it can be found that the copy number variation of IS3 is signi cantly different from the other two subtypes, and the copy number is missing at 1-2p, and increased at 4p, especially at 6-13p when compared with IS1 and IS2. The copy number changes little after 16p. IS2 has relatively more variation in ampli ed copy number while its delepted copy number variety is relatively few. (Fig. 5B). This may suggest that structural damage in the genome has an impact on the immune in ltration performance of tumors and further increases the mutation caused by the immune in ltration of tumor cell. However, there is no abnormal change between different subtypes when we count neoantigen, which is worth thinking and exploring when mutations between different subtypes are used to accelerate the cell death in the treatment of pancreatic cancer.

DDR defect may affect the production of immune subtypes
Since overall genomic and chromosomal instability are insu cient to explain the differences in the immune-cell composition within each PADC subtype, there were several studies reporting that DNA damage repair (DDR) is closely related to the tumor immune response. So, we aimed to investigate the effects of DDR expression in the anti-tumor activity in the TME (total mesorestal excision) and found the expression of DDR genes correlated with immune in ltration (Fig. 6A). Classical DDR related genes were selected to show their relationship with immunocyte content, such as B cell and CD4+/CD8+ T cell. These three kinds of cells are the most common immune cells derived from lymphocytes after the occurrence of cancer, , whose main function is to eliminate abnormal cells, protect the body and avoid proliferation [42,43]. Then, we found the expression of DDR genes negatively correlated with the expression of immuneregulatory genes. For example, activation of ATM, RB1, and TP53 were part of the same group and showed high relationship with PDCD1/PD-L1 and CTLA4 (Fig. 6B). Among the seven genes, ATM was positively correlated with PDCD1, and the correlation between PALB2 and PDCD1 was the lowest (0.01). The correlation between CTLA4 and ATM was also the highest (0.499). It should be noted that CD274 was highly correlated with ATM, BRCA2, PALB2, Rb1(0.561, 0.593, 0.407, 0.533).
Another study showed that ATM, BRCA1/2, PALB2, RB1, and TP53 were linked to signi cant increase in immunogenic mutations. These results suggest the presence of cytolytic and regulatory cells was impacted by expression of DDR genes within the TME of PADC tumors [44]. These ndings suggest that we may start with DDR genes within the TME of PADC tumors for the treatment of PCDA.

Discussion
This study developed a clustering method based on immune gene sets to characterize the immune invasion classi cation of pancreatic cancers. Although the current mainstream research has a variety of methods for immunophenotyping, the immune-related subtyping of pancreatic cancer is not yet clear. An obvious conclusion can be drew from the results, consensus clustering worth to attempt further pancreatic tumor study, if provided theoretical support with more detailed and complete statistics. To what extent do the three immune in ltration conditions described based on the immune gene set represent different evolutionary pathways or the conclusions of tumors at different stages of progression are unclear, and this needs to be predicted and veri ed with a larger sample size. These three in ltration manifestations have their own characteristics in tumor biology. For example, in the cases of IS1, antitumor immune-related pathways are activated and immune response intensively, indicating that the patient is in a state of recovering, and the occurrence and development of tumors may be affected.
Limitation or reversal. However, none of these are available for IS2 cases. In the case of IS3, tumors with a highly cytotoxic immune phenotype will be at an early stage and have mechanisms to suppress strong immune pressure. Then they will gradually evolve to invade neighboring tissues, and the analysis of the proportion of their immune cells shows that at the same time their immunophenotype will shift to a more immunosuppressive in ltration mode. Tumors with poor cytotoxic in ltration represent advanced stages of malignant tumors, complete progression, and are hardly controlled by the host's immune system.
The research concern about evaluation of cancer conditions needs to rely on comprehensive methods to predict the internal and external regulation of the tumor. TME can well re ect the information from the special mode of intracellular and extracellular content. It is worth noting that IS3 reveal that the immune cell communication network in TME is complex, but critical to activation of anti-tumor immune reaction. The consequence network structure is variable. The property of tumor-reactive immune cells shows a important role in immunocytes in ltration. Relevant key receptors and ligands and the role of receptorligand pairs (such as the CCL5-CCR5 axis), and explain how immune cell interactions differ depending on the immune system environment and manifest in immune subtypes.
Through the comparison of mutations in immunophenotyping in this study, we have observed the possible in uence of genome changes on immune response. For example, KRAS mutations are enriched in IS3, but not common in IS2, which suggests that mutations that drive oncogenes can activate different immune pathways in cells. Driver mutations, such as TP53, can alter the immune landscape by inducing abnormal transcription, possibly by generating new antigens. Our results con rm previous ndings that mutations in BRAF enhance immune in ltration, while mutations in IDH1 suppress immune responses. [45,46]. These views require further work to determine the functional aspects of these associations.
A recent analysis of the TCGA data set found that the number of somatic mutations and cytolytic activity (de ned as the average expression of GZMA and PRF1 transcripts) have a linear correlation, but are highly different in multiple cancers [47]. Although a higher number of mutations may increase the possibility of generating mutant peptide sequences that are recognized as exogenous by T cells, when divided by the normalized neoantigen count of the total number of mutations, we did not detect the relative IS2 novelty in the immune IS3 phenotype. Enrichment of epitopes. These observations indicate that mutations in Th-1 rich IS3 tumors are not inherently more immunogenic than mutations in IS21 tumors. We also observed that the correlation between mutation load and immunophenotype strongly depends on IMS.
The DDR mutation status of tumors is an important predictive biomarker of immune checkpoint inhibitor response [48]. Analysis of tumor DDR mutation status plays an important role in predicting the causes of different immune in ltration characteristics in patients. There are reports showing that the abnormal expression of a variety of important chemokines is related to the occurrence of DDR [49]. Spontaneous TIL in ltration may occur at the same time as the malignant progression of tumors in the state before immunothermal treatment caused by DDR de ciency [50]. Although it is necessary to conduct mechanistic studies to further determine these associations, our results show that in tumors with DDR mutations (such as TP53, ATM, RB1, and BRCA2), the expression pattern of key IFN mediators STAT1 and/or STAT3 increases. The activation of the IFN pathway. Availability of data and materials

Abbreviations
The datasets analyzed during the current study are available in the TCGA repository, https://cancergenome.nih.gov/.