A Prognostic Signature Based on Immune-Related Genes Associated with Tumor Microenvironment for Cholangiocarcinoma

Several studies indicate immune related genes (IRGs) dramatically correlates with tumorigenesis and progress of cancer. However, the association IRGs and the value in predicting the prognosis of cholangiocarcinoma (CCA) patients remains unclear. Thus, we aimed to discover reliable biomarkers based on IRGs signature to predict survival in CCA. Gene expression proles of CCA patients were collected from TCGA and GEO database. Univariate, multivariate and Lasso (Least absolute shrinkage and selection operator) Cox analysis was used to identify IRGs signature. Subsequently, a nomogram that predicts outcomes for CCA patients based on IRGs signature was constructed to visualize prognosis. We also explored the relationships between tumor-inltrating immune cells (TIICs) and IRGs. GDSC database was used to explore potential targeted drugs. Tissue samples of 23 CCA patients were collected and qRT-PCR were employed to validate the mRNA level of IRGs.


Abstract
Background Several studies indicate immune related genes (IRGs) dramatically correlates with tumorigenesis and progress of cancer. However, the association IRGs and the value in predicting the prognosis of cholangiocarcinoma (CCA) patients remains unclear. Thus, we aimed to discover reliable biomarkers based on IRGs signature to predict survival in CCA.

Methods
Gene expression pro les of CCA patients were collected from TCGA and GEO database. Univariate, multivariate and Lasso (Least absolute shrinkage and selection operator) Cox analysis was used to identify IRGs signature. Subsequently, a nomogram that predicts outcomes for CCA patients based on IRGs signature was constructed to visualize prognosis. We also explored the relationships between tumor-in ltrating immune cells (TIICs) and IRGs. GDSC database was used to explore potential targeted drugs. Tissue samples of 23 CCA patients were collected and qRT-PCR were employed to validate the mRNA level of IRGs.

Results
5 IRGs (AVPR1B, CST4, TDGF1, RAET1E and IL9R) were identi ed as robust indicators for OS. A prognostic model based on IRGs signature can signi cantly stratify patients into high-risk and low-risk groups. Meanwhile, ROC indicated that the risk model had satisfactory performance. We also observed that the risk score was an independent prognostic factor for OS and built a nomogram. The low-risk group had a signi cantly higher percentage of in ltrated macrophages M1 than the high-risk group.
Through analyzing the GDSC database, the high-risk group demonstrated drug sensitivity to some targeted drugs such as AMG 706 and Thapsigargin. Finally, the results of qRT-PCR were consistent with bioinformatics analysis.

Conclusions
We established a novel IRGs signature to predict prognosis for CCA. The risk model exhibited excellent diagnostic e ciency in predicting OS.
Background Cholangiocarcinoma (CCA) is a highly aggressive malignancy on biliary tract system with extremely unsatisfactory prognosis, and its incidence is increasing worldwide in recent years [1]. Although surgical resection remains the only possible treatment, the 5-year survival rate for CCA patients underwent radical resection remains less than 30% due to high frequency of metastasis and recurrence [2]. The bene ts from adjuvant therapy appear to be very limited [3]. One reason for the poor prognosis of CCA is that most patients are diagnosed at advanced stage. Immunotherapy has become a new pillar of cancer treatment for more than a decade, and it has offered hope for reducing the morbidity and mortality of this refractory disease [4]. Immunotherapy based on immune checkpoint inhibitors (ICIs) has achieved notable success in clinical treatment of tumor [5,6]. Many studies have demonstrated that immunotherapy shows a promising safety and e cacy for patients with CCA. Unfortunately, immunotherapy is not effective for all CCA patients [7,8]. Hence, these problems necessitate the development of an effective risk model that allows to closely monitor progression and shed light on treatment strati cation.
Evidence that immune-related genes (IRGs) are closely related to the tumorigenesis and progression of CCA has substantially increased over past years [9,10]. Several IRGs have been reported to serve as promising biomarkers for predicting survival [11][12][13]. For example, PD-L1 was signi cantly correlated with TNM stage and overall survival (OS) for CCA patients after resection [14,15]. The comprehensive analysis of the association between the IRGs and prognosis is conducive to elucidating the potential prognostic value. However, most of previous investigations mainly focused on a single IRG, which is not su ciently rigorous to clinical practice. Tumor immune microenvironment (TIME), composed of many immune cells, participated in tumor progression and immunotherapeutic response [16][17][18]. The proportion and type of tumor-in ltrating immune cells (TIICs) and cytokines in the TIME are in turn determined by IRGs expression. Therefore, it is essential to explore the relevance between the IRGs and immune cells, which will help to uncover the underlying molecular pathway and nd a robust immune target serve as prognostic biomarker.
Our study was to investigate the clinical implications of IRGs signature on prognostic strati cation for CCA patients. Second, we also explored the association between IRG signature and tumor immune cell in ltration. Finally, a prognostic nomogram that incorporated prognostic IRGs signature and clinical features was established to predict OS.
Meanwhile, we obtained a list of IRGs from the ImmPort database (https:// www.immport.org/home). The genes of cytokines, cytokine receptors, and those that were related to the signaling pathways of the T-cell receptor, B-cell antigen receptor, natural killer cell cytotoxicity, and antigen processing and presentation were all selected. Transcription factors (TFs) were obtained from the Cistrome Cancer database (http://cistrome.org/CistromeCancer). The list of IRGs and TFs used in our study were listed on Additional le 6: Table S1.

Identi cation of differentially expressed IRGs and TFs
The normalization of gene expression and screening of differentially expressed genes (DEGs) were determined using "limma" package. Probe sets without corresponding gene symbols or genes with more than one probe set were removed or averaged. |log 2 fold change| >1 and adjusted P <0.05 were considered as statistically signi cant. To explore regulatory network of IRGs, we draw the IRGs and TFs from all DEGs. Volcano plots and heatmaps were plotted respectively by "gplots" and "pheatmap" package on R software.

GO and KEGG pathway enrichment analysis
To understand biological pathways underlying differentially expressed genes, Gene Ontology (GO) annotation and Kyoto encyclopedia of genes and genomes (KEGG) pathway analyses were performed using DAVID (https://david-d.ncifcrf.gov/) and "ClusterPro ler" package. GO analysis consists of three categories: biological process (BP), cellular component (CC) and molecular function (MF). The results were visualized using "GO plot" package and false discovery rate (FDR) <0.05 was considered as the threshold.
Development and validation of prognostic IRGs signature for CCA Univariate Cox analysis were utilized to assess relationship between IRGs expression and OS. In order to further identify the robust independent prognostic IRGs, we performed least absolute shrinkage and selection operator (Lasso) and multivariate Cox analysis to conduct dimensionality reduction analysis by "survival" and "glmnet" packages. An independent prognostic model for prognostic analysis was designed based on the IRGs signature. We applied a linear model for prognosis risk score prediction: The risk score was calculated based on combination of the expression level of IRGs weighted by the regression coe cient. refers to the expression of IRGs in the sample, and β i is the regression coe cient.
To prevent over tting of the model, TCGA-CHOL dataset was utilized as training group. GSE107943 was used as a validation cohort to test reliability and stability of the model. The median risk score was used to divide samples into low-risk and high-risk group. The survival curves were plotted by Kaplan-Meier (K-M) analysis was performed to assess differences in the survival rates between two group. The predictive performance of prognostic model was evaluated by ROC curve. In addition, the risk score distribution and survival status scatter plots for patients in the prognostic model and the heatmap of prognosis-related IRGs were also displayed.

Construction of TF and IRGs network
To uncover the regulatory mechanisms of independent prognostic IRGs, correlations test between TF and IRGs was investigated by Pearson correlation coe cient (r), and the cut-off criteria were set as correlation coe cient >0.4 and P <0.05. The TF-IRGs regulatory network was built on STRING database and further visualized using Cytoscape version 3.7.2.
Construction and validation of the nomogram A nomogram was visualized by means of "rms" package based on risk scores and other clinicopathological parameters (age, gender and AJCC stage). The nomogram was subjected to 1000 bootstrap resamples for internal validation. Calibration of the nomogram for 1-, 2-and 3-year OS was performed by comparing the actual survival with the predicted survival. Further, the discriminatory ability of nomogram was evaluated by calculating the concordance index (C-index).

Estimation of tumor-in ltrating immune cells
We used CIBERSORT algorithm to quantify the proportion of immune cells [20]. The RNA-seq data of CCA patients from TCGA-CHOL and GSE107943 were integrated. Normalized gene expression data were analyzed using the CIBERSORT algorithm, running with 1,000 permutations. The correlation matrix for all 22 immune cell proportions were displayed by the "corheatmap" package. The difference of abundance of 22 TIICs between high-and low-risk groups was compared by Wilcoxon test. Furthermore, the relationship between TIICs and prognostic IRGs were analyzed using Spearman's rank correlation. The distribution of gene expression and immune cell in ltration in each sample were also displayed by "ggscatterstats" package.

Chemotherapeutic response prediction
According to the immune related risk score, we divided the CCA patients into high risk group and low risk group. Based on the Genomics of Drug Sensitivity in Cancer (GDSC, https://www.cancerrxgene.org/) [21], the largest publicly available pharmacogenomics database, we explored the chemotherapeutic response of targeted drugs for the patients from different risk groups. Two commonly used chemo drugs, cisplatin and gemcitabine, were also selected. The prediction process was implemented by R package "pRRophetic", which estimated the half-maximal inhibitory concentration (IC50) by ridge regression and the prediction accuracy was evaluated by 10-fold cross-validation based on the GDSC training set.

Quantitative Real-Time -PCR
Twenty-three matched tumorous and non-tumorous tissue specimens of cholangiocarcinoma are collected from the Biobank of First A liated Hospital of Xi'an Jiaotong University. Total RNA was extracted using TRIzol reagent (Invitrogen) and the total RNA integrity were checked by RNA 6000 Nano kit. Before reverse transcription to cDNA, RevertAid First Strand cDNA Synthesis Kit (Thermoscienti c K1622) was used and reacted at 42°C for 2 min to remove the residual genomic DNA from total RNA. PrimeScript ® , RT reagent kit was used to synthesize the complementary RNA. The FastStart Essential DNA Green Master (Roche 53510700) was utilized to perform real-time quanti cation. The relative expression levels of target genes were normalized by GAPDH and estimated using the 2 -△△Ct method.
The PCR primers are listed in Additional le 7: Table S2.

Statistical analysis
Statistical analyses of all data utilized in our study were completed by R software (version 4.0.3).
Categorical data were analyzed using Chi-square test or Fisher's exact test, and continuous variables were analyzed using ANOVA or Kruskal-Wallis H test for variables with an abnormal distribution. Univariate and multivariate Cox proportional hazards regression model was used to identify independent predictors of OS. All statistical tests were two-sided and P <0.05 was considered to be statistically signi cant.

Differential expressions of IRGs and TFs in CCA
The main design and process of this study are shown in the following gure ( Fig. 1). We used RNA-seq data of 45 samples, including 36 CCA and 9 non-tumor tissues, collected from TCGA-CHOL. Totally, 1755 DEGs were identified, which consist of 743 signi cantly down-regulated genes and 1012 signi cantly upregulated genes, for subsequent bioinformatics analysis ( Fig. 2A and 2B).
The mRNA levels of 2498 IRGs (ImmPort database) and 318 TFs (Cistrome Cancer database) were examined. This analysis eventually revealed 156 IRGs (73 upregulated and 83 downregulated) and 18 TFs (11 upregulated and 7 downregulated) according to the screening criteria. Then, the heatmap and volcano plot were visualized to show the expression pattern of differentially expressed IRGs ( Fig. 2C and 2D) and TFs (Additional le 1: Figure S1A and B) between CCA and non-tumor tissues. The names of DEGs were listed on Additional le 8: Table S3.

Construction of TF regulatory network
To investigate the biological roles and pathways of the DEGs, we found that top enriched GO terms were associated with regulation of signaling in immune and in ammatory response, such as "immune response", "in ammatory response", and "cytokine activity" (Fig. 2E). The signi cant pathways identi ed by KEGG pathway analysis included "cytokine-cytokine receptor interaction", "natural killer cell mediated cytotoxicity" and "chemokine signaling pathway".
Furthermore, GO enrichment analyses of differentially expressed IRGs demonstrated that the differentially expressed IRGs were mainly primarily participated in the pathway about tumor immunity, such as "regulation of immune effector process" and "regulation of cell killing" (Fig. 2F).
Identi cation of prognostic immune-related genes signature For observing the association between IRGs and OS, we further conducted a Cox regression analysis to screen the IRGs. We found 11 prognostic IRGs (RAET1E, CST4, CCL24, CCK, CGB, GUCA2A, TDGF1, TDGF3, THPO, AVPR1B and IL9R) that were signi cantly related to survival of CCA patients in the training group (Fig. 3A). To further explore potential regulatory mechanisms behind deregulation of 11 prognostic IRGs in CCA, we analyzed the correlation between TFs and IRGs expression, as showed on Additional le 1: Figure S1C.
To investigate signi cance of risk genes in estimating prognosis, the 11 survival-related IRGs were further submitted to Lasso method and multivariate Cox regression analysis (Fig. 3B-C; Table 1). Finally, only 5 candidates prognostic IRGs (RAET1E, CST4, TDGF1, AVPR1B and IL9R) which may serve as signi cant predictors were obtained for risk model. Among these genes, RAET1E, CST4 and TDGF1 were identi ed as high-risk genes (serving as risk factors), while AVPR1B and IL9R were identi ed as low-risk genes (serving as protective factors). In the TCGA training cohort, the expression distribution of ve IRGs and K-M curves for OS were shown in Additional le 2: Figure S2. Analysis of the immune-related signature in the training set The survival status of each patient in the training cohort was marked on dot plot, which showed that mortality rate in the high-risk group was signi cantly higher than that of the low-risk group (Fig. 4A and  4B). Among high-risk patients, three risk genes (RAET1E, CST4 and TDGF1) were upregulated, while two protective genes (AVPR1B and IL9R) were downregulated (Fig. 4C). In low-risk patients, these genes displayed an opposite expression pattern. The survival risk curve showed that patients with high risk had a signi cantly shorter OS time than those with low risk (P <0.001, Fig. 4D). Interestingly, we also explored the difference of DFS between two groups, and found the similar results (Fig. 4E). Then ROC analysis revealed reliable performance in the survival prediction of the model, with the AUC value of ROC at 1-, 3-, 5-year were 0.920, 0.966 and 0.940, respectively (Fig. 4F).
Analysis of the immune-related signature in the validation set The GSE107943 cohort including 30 CCA samples was used for validation of prognostic model. The risk score distribution, survival status and risk gene expression in the validation set were displayed in Fig. 5A-C. The K-M curves demonstrated that the OS (P <0.05; Fig. 5D) and DFS (P <0.01; Fig. 5E) was signi cantly poorer in high-risk group than low-risk group. The AUC at 1-, 3-, 5-year in the validation set were 0.689, 0.713 and 0.710, respectively (Fig. 5F). Taking together, these results indicated the risk score could well assess patients' prognosis. K-M analysis of OS among patients with different expression of each IRGs were displayed in Additional le 3: Figure S3.
Independent prognostic value of the risk model Next, combined with other clinicopathological factors, we performed Cox regression analysis to assess performance of our model, such as age, gender and AJCC stage. The univariate analysis and multivariate analysis revealed that risk score was an independent risk factor in the TCGA cohort (HR=16.459, 95%CI, 3.923-69.046, P <0.001) ( Table 2).
Nomogram for the prediction of prognosis in the training cohort To better estimate the prognosis of CCA patients, we established a nomogram that integrated prognostic risk score and clinical variables (age, gender and AJCC stage) (Fig. 6A). The nomogram could accurately predict OS, and demonstrated that risk score contributed much more risk points than other variables. Cindex for OS was 0.814, and calibration plot of internal veri cation for probability of survival at 1-, 2-and 3-year showed better conformity between the predicted and realistic observation results (Fig. 6B-D).

The correlation between the risk score and tumor-in ltrating immune cells (TIICs)
To determine whether our IRGs signature could re ect the proportion of TIICs on tumor immune microenvironment, we investigated the correlation between risk score and 22 immune cells in CCA patients. The difference on proportion of 22 TIICs types between high-and low-risk group were investigated ( Fig. 7A and Additional le 4: Figure S4). As shown on the violin plot, we found the low-risk group had more in ltrated macrophages M1 compared to high-risk group. In addition, we also analyzed the association between immune cells subpopulations, which indicated that memory resting CD4+ T cell is negatively related to CD8+ T cell, while active NK cell is negatively related to resting NK cell in CCA patients (Fig. 7B). To analyze whether prognostic IRGs might contribute to TIICs, we analyzed correlation between IRGs and tumor-immune cell proportions (Additional le 5: Figure S5). Interestingly, we found the expression of RAET1E is positively correlated to the proportion of B cells naïve; TDGF1 is positively correlated to memory active CD4+ T cells. IL9R is positively correlated to follicular helper T cells, and negatively correlated to M2 Macrophages ( Fig. 7C-F).

Drug Sensitivity prediction
We used GDSC database to predict the likelihood of response to several common chemotherapy and targeted drugs. We estimated the IC50 of each sample and observed a signi cant difference of IC50 between high-risk and low-risk groups among anticancer drugs listed on the GDSC database. Patients in the high-risk group were more sensitive to commonly administered chemodrugs (P =0.037 for AMG 706, P =0.019 for Thapsigargin) (Fig. 8A-D). In contrast, the chemotherapeutic response of cisplatin and gemcitabine was not signi cantly different between both groups.

External cohort validation for 5-IRGs signature
Finally, the 5-gene immune related signature was also validated in our cohort. Compared with adjacent normal tissues, the expressions of RAET1E, CST4, AVPR1B and IL9R were signi cantly higher (Fig. 8E), while TDGF1 were signi cantly lower in CCA samples.

Discussion
Activation of immune system has been widely proven to be a pivotal factor on tumorigenesis and metastasis [22]. With development of tumor immunotherapy in recent years, the function of immune system on cancer growth and metastasis has been addressed, in which immune cells participate in tumor biological activity by dysregulating expression of IRGs [23,24]. Hence, IRGs may be an important predictor for prognosis of CCA patients. Although some considerable efforts had been made to identify prognostic model for CCA, the reports on prognostic role of IRGs signature is lacking [25]. In our study, we identi ed a new prognostic IRGs signature and built a dependable model to predict survival for CCA patients.
In order to obtain a reliable prognostic model, we used LASSO and multivariate Cox regression analyses. Five IRGs (AVPR1B, CST4, TDGF1, RAET1E and IL9R) were identi ed as robust prognostic gene signature on CCA patients. The reliability of our model was further validated by GSE107943 cohort. The results demonstrated that the model had better discriminating power for risk strati cation, and showed prominent performance for prediction survival. After adjusting to other clinical factors, risk score generated by our model was demonstrated to be an independent prognostic factor. Finally, a nomogram model based on prognostic IRGs signature and clinical features was constructed to predict the survival rates of CCA patients. The immune risk score has the greatest effect on prognosis. Our results indicated that the risk model could act as an effective biomarker for survival prediction of CCA patients.
The functional enrichment analysis suggests that the IRGs are widely involved in the tumor immunological process, such as "regulation of immune effector process" and "regulation of cell killing". Retinoic Acid Early Transcript 1E (RAET1E) is one important component of the RAET1 family, which consists of major histocompatibility complex (MHC) class I-related genes. RAET1E binds and activates killer cell lectin like receptor K1 (KLRK1), mediating natural killer cell cytotoxicity. Eisele et al found that RAET1E participated in glioma immune escape in glioblastoma [26]. Interleukin-9 (IL-9) is a T cell cytokine that acts through a γ C-family receptor on target cells and is associated with in ammation and allergy. One study demonstrated that IL9R showed suppressed tumor growth on melanoma mice, and administration of recombinant IL-9 (rIL-9) inhibited melanoma as well as lung carcinoma growth [27]. In our study, we also found that IL9R was a protective prognostic factor for CCA patients, which is consistent with previous study. As for Cystatin 4 (CST4), some studies demonstrated that CST4 enhanced gastric cancer (GC) and colorectal cancer (CRC) aggressiveness, and found that blood biomarkers of CST4 have enormous potential in terms of clinical diagnosis in GC and CRC [28,29]. Teratocarcinoma-Derived Growth Factor 1 (TDGF1) is a member of the epidermal growth factor family, which had been regarded as a predictive marker for metachronous metastasis in CRC patients [30].
Recently, some studies have demonstrated that immune in ltration was a crucial determinant of responsiveness to immunotherapy and prognosis prediction of CCA [31,32]. The immune milieu on tumor microenvironment may account to the difference of survival outcome between low-and high-risk group. We found that M1 macrophages were the most in ltrated immune cells in low-risk compared with highrisk group, but the populations of M2 macrophage between two groups had no signi cant difference. Tumor-associated macrophages (TAMs) are one of the main components of the tumor immune contexture and have a major role in cancer progression and remodeling of the microenvironment [33].
Macrophages have different subtypes, which are differentiated under different cytokines. According to their activation, macrophages have three subtypes (M0, M1 and M2), and each subtype serves different immune functions. M1 macrophages secrete IL-12, IL-16, INF-γ and other pro-in ammatory cytokines to activate the in ammatory response and also participate in the host innate immunity, killing tumor cells in the TME. Our results indicated that M1 tumor in ltrated macrophages may exhibit anti-tumor roles on CCA.
GDSC database is a useful tool for predicting the chemotherapeutic response of targeted drugs for the patients from different risk groups. To determine small molecule drugs with potential roles to treat CCA, we observed that AMG 706 and Thapsigargin (TG) may have potential therapeutic e cacy against CCA with high risk. In contrast, cisplatin and gemcitabine haven`t been showed signi cant difference between both groups. AMG 706 is an orally administered, small-molecule angiogenesis inhibitor that acts as a multi-targeted tyrosine kinase inhibitor of VEGFR-1, -2 and -3, PDGF receptor (PDGFR), and stem cell factor receptor [34]. AMG 706 has also exhibited anticancer activity on clinical trial in solid tumors including low-grade neuroendocrine tumors and NSCLC [35,36]. Of which, TG is a non-competitive inhibitor of endoplasmic-reticulum Ca ( 2+ )-ATPase pump [37]. Moreover, TG can directly induce Estrogen receptor (ER) stress and decline the cell stemness [38]. The treatment of TG could inhibit cell viability via mediating NF-kappaB translocation and PI3K pathway [39]. Therefore, we concluded that these identi ed chemotherapy drugs might have potential to treat CCA.
There have been multiple studies for the value of IRG models to predict prognosis in various cancers, such asclear cell renal cell carcinoma [40], hepatocellular carcinoma [41] and cervical cancer [42].
Compared with the previous studies, our study had some advantages. First, we explored the relationship between IRGs expression signature and prognosis in CCA, and identi ed some prognostic IRGs. Second, our prognostic model based on IRGs and the nomogram showed outstanding performance in survival prediction. But some limitations should be addressed. The samples included in our study has limited sample size, and the results of our study were only validated in GEO database. The model requires more data support from clinical patients. Next, the potential molecular mechanisms of IRGs impact on CCA are not fully elucidated, which require further in vivo and in vitro experiments to investigate.
In summary, we established a novel ve IRGs-based prognostic model that accurately predicted the prognosis in CCA, and further con rmed the prognostic performance of this model using GEO database. Our nding elucidated the association between immune risk score and immune cells in ltration, proving its key role in tumor immune microenvironment. These ndings provide new insight into the role of IRGs and tumor immune cells in ltration on CCA, and could be useful in the future as a theoretical foundation  Figure 1 Flow chart of the main procedures of this study. Differential expression of IRGs and functional enrichment analysis of DEGs in CCA. Heatmaps and volcano plots of the differential expressed genes (DEGs) between CCA and non-tumor samples, including all differential expressed genes (A, B) and IRGs (C, D). DEGs are represented in rows, and samples are represented in columns. The expression value for each row was normalized by the z-score. The color from green to red represents the progression from low expression to high expression. Blue bar represents non-Lasso coe cient pro les of IRGs. (C) A coe cient pro le plot was generated to nd the optimal parameter (lambda).

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.