Identication of Prognostic Gene Signature Associated with Tumor Microenvironment of Cholangiocarcinoma

Background: Cholangiocarcinoma (CCA) is the most common malignancy of the biliary tract with a dismal prognosis. Increasing evidence suggests that tumor microenvironment (TME) is closely associated with cancer prognosis. However, the prognostic signature for CCA based on TME has not yet been reported. This study aimed to develop a TME-related prognostic signature for accurately predicting the prognosis of patients with CCA. Methods: Based on the TCGA database, we calculated the stromal and immune scores using the ESTIMATE algorithm to assess TME in stromal and immune cells derived from CCA. TME-related differentially expressed genes were identified, followed by functional enrichment analysis and PPI network analysis. Univariate Cox regression analysis, Lasso Cox regression model and multivariable Cox regression analysis were performed to identify and construct the TME-related prognostic gene signature. Gene Set Enrichment Analyses (GSEA) was performed to further investigate the potential molecular mechanisms. The correlations between the risk scores and tumor infiltration


Background
Cholangiocarcinoma (CCA) is one of the most common malignancies of the biliary tract and also the second most common hepatic malignancy after hepatocellular carcinoma. Based on the anatomic location, CCA can be classified into 3 subtypes : intrahepatic (iCCA), perihilar (pCCA), and distal (dCCA) subtypes [1]. Despite some therapeutic progress has been achieved, the prognosis of CCA remains dismal [2]. Clinicopathological parameters such as the TNM stage, completeness of the procedure (R0/R1) and lymph node metastases have been used to predict the prognosis of CCA patients [3,4]. However, these prognostic markers have limitations in the prediction of prognosis of CCA patients. Thus, more reliable prognostic markers are needed to improve such assessment.
Tumor microenvironment (TME) has been reported to play pivotal roles in tumor development, progression and recurrence [5]. Increasing evidence shows that TME is closely associated with cancer prognosis [6][7][8]. TME consists of surrounding cells and noncellular components. Tumor-infiltrating stromal cells and immune cells as the major cell components of surrounding cells that play a significant role in cancer prognosis [9][10][11]. For example, Vigano et al. found that tumor-infiltrating lymphocytes and macrophages were associated with prognosis in intrahepatic cholangiocellular carcinoma patients after complete surgery [12]. To access the fraction of stromal and immune cells in tumor samples, Estimation of STromal and Immune cells in Malignant Tumors using Expression data (ESTIMATE) has been developed as an open online tool [13]. Accumulating evidence has indicated that tumor-infiltrating stromal cells and immune cells were correlated with prognosis in patients with several tumors using ESTIMATE algorithm. However, these prognostic factors have not been investigated in CCA. Thus, using tumor-infiltrating stromal cells and immune cells to predict CCA patient's prognosis still needs further investigation.
Considering the prognostic potential of TME in CCA, we calculated the stromal and immune scores of CCA patients with the ESTIMATE algorithm in the current study and established a novel TME-related prognostic model for CCA. Our results may serve as a prognosis stratification tool for guiding personalized treatment of CCA patients.

Data collection and calculation of stromal and immune scores
We download transcriptome RNA-sequencing data and clinical data of cholangiocarcinoma patients from The Cancer Genome Atlas (TCGA) database, and a total of 32 patients were selected according to the screening criteria. The clinical information of CCA patients is shown in Table 1. The stromal and immune scores were calculated through the ESTIMATE algorithm using the "estimate" package in R. All patients were divided into high-and low-stromal/immune-score groups based on the median stromal/immune scores.

Differential expression analysis
Differentially expressed genes (DEGs) between high-and low-stromal/immune-score groups were identified using "DESeq2" package in R software. |log2fold-change| > 1 and adjusted P value (FDR) < 0.05 was set as the threshold for DEG identification.

Function enrichment analysis and protein-protein interaction (PPI) network construction
Functional enrichment analysis was carried out using "clusterProfiler" package in R, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). GO terms are composed of biological process (BP), cellular component (CC), and molecular function (MF). Adjusted P-value < 0.05 was considered statistically significant.
The protein-protein interaction (PPI) network was constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database [14]. Cytoscape software (version: 3.6.0) was utilized to visualize the molecular interaction networks [15]. The top 3 modules of PPI network were selected using the Molecular Complex Detection (MCODE), which is a plugin in Cytoscape. The degree cut-off = 2, node score cut-off = 0.2, k-core = 2, and maxinum depth = 100 were regarded as standard.

Construction of TME-related prognostic model
Univariate Cox regression analysis was performed to identify survival-associated DEGs according to the threshold value of p<0.05 using the "survival" package in R.
Then, we conducted a least absolute shrinkage and selection operator (LASSO) and multivariable Cox regression analysis to identify TME-related prognostic genes using the "glmnet" and "survival" package, respectively in R. The calculation formula of risk score for CCA patients was shown as follows: Risk score = Coef1*Exp1+Coef2*Exp2+Coefi*Expi, Coefi was the regression coefficient which was calculated by multivariate Cox regression model, and Expi was the gene expression level. Patients were divided into a high-and low-risk groups based on the median Risk score.

Gene set enrichment analysis (GSEA)
To investigate the differences of the potential molecular mechanisms of the TMErelated signature between the high-and low-risk groups, gene set enrichment analysis (GSEA) was performed using GSEA software (version: 4.0.3) [16]. Annotated gene sets c2.cp.kegg.v7.2.symbols.gmt and c5.go.v7.2.symbols.gmt were chosen as the reference gene sets. P-value < 0.05 were considered statistically significant.

The correlation between the prognostic model and tumor-infiltrating immune cells
Different types of tumor-infiltrating immune cells were obtained from Tumor Immune Estimation Resource (TIMER) (https://cistrome.shinyapps.io/timer/) database using gene expression profiles based on different algorithm [17]. We assessed the correlation of the prognostic model between with stromal scores, immune scores and tumorinfiltrating immune cells by spearman correlation analysis in R. Then, we compared the fraction of 22 immune cell types between low-and high-risk groups using the Wilcoxon rank sum test. P value < 0.05 were considered statistically significant.

Relationship of immune/stromal scores with clinicopathological indicators and the prognosis in patients with CCA
The overall scheme of this study is shown in the Figure 1. A total of 32 CCA patients were collected from TCGA database according to the screening criteria. Immune/stromal scores of the 32 CCA patients were calculated based on the ESTIMATE algorithm using gene expression profile. We firstly analyzed the distribution of immune/stromal scores in different group with clinical characteristics. Our results showed that there was no significant correlation between immune/stromal scores and clinicopathological indicators, such as tumor grade and stage (Data not shown). Next, we divided the 32 CCA patients into high-and low-score groups according to the median immune/stromal scores to detect the potential correlation between survival and immune/stromal scores (Figures 2A-D). Kaplan-Meier survival analysis showed that low immune scores (p = 0.0320) ( Figure 2C) were associated with poorer overall survival (OS) than those with high scores.

Identification of DEGs in CCA based on immune/stromal scores
To identify the DEGs, differential expression analysis was performed between highand low-score groups using the Dseq2 package in R. With the threshold of |logFC|>1 and FDR <0.05, 745 upregulated and 142 downregulated immune-related genes were obtained based on immune scores ( Figure 3A, Table S1). In addition, we identified 791 upregulated and 171 downregulated stromal-related genes based on stromal scores ( Figure 3B, Table S2). Hierarchical clustering analysis results visualized the difference in expression patterns of the top 20 up-regulated and down-regulated DEGs between high-and low-stromal/immune-score groups (Figures 3C, D). Furthermore, Venn diagrams showed that 674 genes were commonly upregulated and 110 genes were commonly downregulated in the high-score groups (Figures 3E, F).

Functional enrichment analysis and PPI network
To investigate the biological characteristics of DEGs, functional enrichment analyses were performed, including GO enrichment and KEGG pathway analysis. The GO results showed that DEGs were mainly involved in "T cell activation", "lymphocyte proliferation and differentiation", "MHC protein complex", "cytokine binding" and "chemokine binding" (Figure 4A). Furthermore, KEGG enrichment analysis results indicated that these DEGs were mainly involved in "cell adhesion molecular", "chemokine signaling", "hematopoietic cell lineage" and "Th17 cell differentiation" ( Figure 4B). All the results of GO and KEGG pathway analysis were showed in Table  S3 and Table S4. To explore the relationships between these DEGs, we performed PPI network analysis using STRING database and Cytoscape software. Top 3 most significant modules were selected from the PPI network using the MCODE plugin in Cytoscape software (Figures 4C-E). these genes in the top 3 significant modules were selected for further analysis. These results of GO and KEGG pathway analysis indicated that these genes in the top 3 significant modules were strongly associated with the immune response (Figures S1A-B).

Identification of TME-related prognostic genes and construction of a two-gene risk prediction model
In order to evaluate the prognostic effect of DEGs, univariate Cox regression analysis was performed, and 22 survival-related DEGs were identified with P < 0.05 (Table 2). Subsequently, we conducted a LASSO regression and multivariate Cox regression, two genes were identified and used to construct a novel TME-related prognostic model for CCA (Figures 5A, B). As shown in forest plots, KLRB1 was a protective factor, while GAD1 was risk factor for CCA ( Figure 5C). After extracting the multivariate Cox regression coefficient, we calculated risk score of each patient on the basis of the following formula: risk Score = (0.2940 * expression value of GAD1) + (-0.5259 * expression value of KLRB1). Patients were categorized into two groups according to the median risk score, namely the high-risk group and the low-risk group. Kaplan-Meier survival analysis showed that patients with high-risk score had shorter survival time than those with low-risk score ( Figure 5D). The AUC value of ROC curve analysis for predicting the survival of patients at 1-, 2-, and 3-years was 0.811, 0.772 and 0.844, respectively (Figures 5E-G), indicated that the TME-related prognostic model was relatively accurate. Risk score curve and survival status plot reflecting that the number of death cases in high-risk group is significantly more than that in the low-risk group (Figures 5H, I). The heatmap showed the expression of two-gene signature between high-and low-risk groups ( Figure 5J).

Independent prognostic analysis
In order to examine the clinical independence of the model, we performed univariate/multivariate independent prognostic analysis. Univariate independent prognostic analysis ( Figure 6A) showed that high-risk score was related to poor prognosis (HR = 2.718, p < 0.001). Moreover, multivariate independent prognostic analysis ( Figure 6B) indicated that the risk score could serve as an independent prognostic factors for patients with CCA (HR = 3.21, p = 0.003).

GSEA of the mechanism underlying the prognostic differences between the two groups
In order to investigated the potential molecular mechanisms of the prognosis difference between high-and low-risk score, we performed GSEA analysis. GO terms enrichment in the high-risk score group was mainly focused on: "Golgi cisterna membrane" and "positive regulation of smoothened signaling pathway" (Figure7A), and KEGG pathway enrichment was in the high-risk score group was mainly involved in notch signaling pathway ( Figure 7B). However, in the low-risk score group, GO enrichment was primarily focused on: "regulation of cardiac muscle contraction by regulation of the release of sequestered calcium ion", "negative regulation of NF-kappaB transcription factor activity" and "CD4-positive and alpha-beta T cell cytokine production" (Figure 7C); and in the case of KEGG pathway, cell adhesion molecules pathway was significantly enriched ( Figure 7D). Results of the GSEA are shown in Table S5.

Correlation of risk score to TME-score and immune cell infiltration
To evaluate the correlation between the TME score (including immune/stromal/ESTIMATE score) and the prognostic model, spearman correlation test was performed. The results showed that the risk score of the prognostic model was negative correlated with the immune/stromal/ESTIMATE score (Figures 8A-C).
We evaluated the correlations between the prognostic model and tumor-infiltrating immune cells which were obtained from TIMER database through different algorithm. Based on TIMER algorithm, we analyzed the relationship between the prognostic model and six type of tumor-infiltrating immune cells. Our results showed that B cells (Cor = -0.462, P = 7.83e-03) ( Figure 9A) and CD4+ T cells (Cor = -0.467, P = 7e-03) ( Figure 9B) were negative correlated with risk score. However, there were no significant correlations between the risk score and dendritic cell ( Figure 9C), neutrophil ( Figure 9D), CD8+ T cells ( Figure 9E) and macrophage ( Figure 9F). In addition, we compared the difference of 22 type of tumor-infiltrating immune cells, which obtained from TIMER database through CIBERSORT algorithm between highand low-risk groups. Our results showed that the abundance of naïve B cell (p = 0.04) and CD4 memory resting T cell (p = 0.0063) were significantly enriched in low-risk group as compared to the high-risk group (Figure 10A). Finally, we performed spearman correlation test to analyze the correlations between 20 different immune cell types. The correlation of the 20 different immune cells was range from -0.64 to 0.86, the correlation coefficient between resting mast cell and resting NK cell was 0.86, indicating that resting mast cell was significantly correlated with resting NK cell. The correlation of the other immune cells was range -0.64 to 0.57, which indicates that they were weakly to moderately correlated ( Figure 10B).

Survival and expression analysis of the TME-related prognostic signature in CCA patients
To analyze the survival and expression of the two-gene signature in CCA patients, we integrated mRNA expression profile of the two signature genes and survival information of CCA patients. Our results showed that the expression of GAD1 mRNA was significantly upregulated in tumor tissues (P < 0.0001) (Figure 11A), while KLRB1 mRNA expression was significantly decreased in tumor tissues (P = 0.0046) ( Figure 11C). The Kaplan-Meier survival analysis demonstrated that Patients with high GAD1 expression had poorer overall survival (P = 8.25e-03; Figure 11B), and those with low KLRB1 expression had poorer overall survival (p = 1.637e-02; Figure 11D). Therefore, GAD1 may be a promising therapeutic target for CCA patients.

Discussion
Cholangiocarcinomas are aggressive tumors, and most patients were diagnosed at advanced stage [1]. For patients with advanced-stage or unresectable cholangiocarcinoma, the overall survival of untreated patients is a dismal median of 3.9 months [4]. Although several combination therapeutic regimens for CCA have been applied into clinical practice to improve prognosis, the median overall survival is still less than 1 year [4]. Thus, identification of new biomarkers and establishing prediction models of CCA for diagnosis and treatment are much warranted. Accumulating evidence suggests that the TME plays an important role in occurrence and development of tumors and targeting TME has large potential for anti-cancer treatment. Recently, TME-related signature has gained much attention and shown great potential in prognosis prediction of cancer. Therefore, it is critical to screen prognostic markers related to TME of CCA.
In this study, we first calculated the stromal and immune scores of CCA tumor tissues using the ESTIMATE algorithm and analyzed the relationship of stromal-immune scores between with prognosis of CCA patients. We found that CCA patients with low immune/stromal scores had a poorer prognosis than those with high scores, which indicates that the TME composition affects the clinical outcomes of CCA patients. Then, a total of 784 overlapping DEGs based on immune and stromal score were obtained for further analysis. Function enrichment analysis of these DEGs and the genes in the top 3 modules obtained from PPI network showed both the GO terms and KEGG pathways were strongly associated with the immune response.
Furthermore, we identified a two-gene signature (including GAD1 and KLRB1) and subsequently established a novel risk prediction model for CCA prognosis prediction. Among them, glutamic acid decarboxylase 1 (GAD1) serves as a rate-limiting enzyme involved in synthesizing γ-aminobutyric acid (GABA). Over expression of GAD1 has been found in several cancers (including oral cancer, colorectal carcinoma, breast cancer, lymphoma and gastric cancer) and it can influence the proliferation, metastasis and chemoresistance of cancer cells [18][19][20]. Increasing research suggests that GAD1 has the potential to be a biomarker for prognosis prediction [21][22][23][24]. Tsuboi et al. found that upregulation of GAD1 in lung adenocarcinoma patients showed significantly poorer prognosis for overall survival [22]. Lee et al. reported overexpression of GAD1 as a poor prognostic factor in patients with nasopharyngeal carcinoma [23]. Additionally, GAD1 also has been found to be a prognostic marker for glioblastoma (GBM) patients [24]. However, the role and mechanism of GAD1 in CCA remains poorly defined. Killer cell lectin-like receptor B1 (KLRB1), also named as CD161, has been reported as a prognostic gene in many cancers [25][26][27]. For example, Qin et al. showed that high expression of KLRB1 was associated with good overall survival in early cervical squamous cell carcinoma patients [25]. Braud et al. have shown that expression KLRB1 in lung cancer is associated with better clinical outcome [27]. Consistent with these findings, our studies have demonstrated that low KLRB1 expression in CCA patients had poorer overall survival. However, single-cell analysis in glioma revealed KLRB1 as a candidate inhibitory receptor of tumor-infiltrating T cells, and targeting KLRB1 could enhances T cell-mediated killing of glioma cells [28]. In addition, single-cell analysis in early-relapse hepatocellular carcinoma revealed that CD8+ T cells in recurrent tumors overexpressed KLRB1 have an innate-like, low cytotoxic, and low clonal expansion phenotype, and these T cell with high KLRB1 expression enrichment in HCC was associated with a worse prognosis [29]. The bimodal role of KLRB1 in prognosis prediction may be due to the great heterogeneity between different cancers.
Then, we calculated the risk score of each patient based on the TME-related gene signature, and CCA patients were divided into high-and low-risk groups according to the median risk score. Further analysis revealed that the two-gene risk model can be served as an independent prognostic factor in CCA, and patients in the high-risk group showed worse prognosis. ROC curve analysis indicated that the prognostic signature had an excellent performance in predicting OS. Furthermore, we evaluated the correlations between the prognostic model and the TME scores. Our results found that the risk score of the prognostic model was negatively correlated with the immune scores, which consistent with the previous results of this research. Both low-risk group and high-immune score group are associated with favorable prognosis, indicating that immune cell infiltration of the TME could have a beneficial impact on prognosis.
Finally, we analysis tumor-infiltrating immune cells of CCA patients using TIMER database based on TIMER and CIBERSORT algorithm, then evaluated the correlations between the prognostic model and tumor-infiltrating immune cells. Our results showed that B cells (Cor = -0.462, P = 7.83e-03) and CD4+ T cells (Cor = -0.467, P = 7e-03) were negative correlated with risk score. Further analysis revealed that naïve B cell (a subtype of the B cell) and CD4 memory resting T cell (a subtype of CD4+ T cell) were enrichment in low-risk group. Previous study revealed that naïve B cell and CD4 memory resting T cell infiltration was positively correlated with prognosis in several cancers [30][31][32]. Thus, infiltrating of naïve B cell and CD4 memory resting T cell may play important roles in the prognosis of CCA patients, which will be worth further investigating.
However, this study also had some limitations. First, the sample size of CCA in TCGA database was relatively small, it should be validated in a large sample size and other databases. Second, the function and mechanism of the gene signature in our study was analyzed based on multiple bioinformatics tools, the conclusion of this study need to be further confirmed by a series of basic experiments.

Conclusions
In conclusion, our study established a novel TME-related gene signature to predict the prognosis of patients with CCA. This may further our understanding of the potential relationship between TME and CCA prognosis, and serve as a prognosis stratification tool for guiding personalized treatment of CCA patients.