Identification for prognostic value of differentially expressed genes in immune microenvironment of clear cell renal cell carcinoma

Background: Genes related to Anchorimmune microenvironment of clear cell renal cell carcinoma (ccRCC) remains unclear. We aimed to identify related to immune microenvironment and to screen the most significant genes to predict outcomes of ccRCC. Methods: Gene expression and clinicopathological data from TCGA data portal were obtained (KIRC). Immune and stromal scores were calculated based on ESTIMATE algorithm. DEGs between low and high groups of immune scores were identified. Subsequent functional enrichment analysis and protein-protein interaction of DEGs were conducted by DAVID database. Results: Patients were divided into low and high groups by medians according to immune (median: 1038.45) and stromal scores (median: 667.945), respectively. Immune scores were significantly correlated with clinicopathological parameters and overall survival (OS). Based on immune scores, 1433 genes were up-regulated, and among them, 890 DEGs were significantly associated with OS. Based on top 10 DEGs, cases with number of upregulated genes ≥5 were associated poor OS (P = 0.002). In addition, the mean differences of percentages of CD8 T cells (11.32%), CD4 memory resting T cells (-4.52%) and mast resting cells (-3.55%) between low and high immune scores were the most significant. Conclusions: A list of immune microenvironment-related genes in ccRCC was initially identified, and high immune score was an independent poor prognostic factor of OS. Furthermore, the combination of these genes might use to predict the efficacy of immunotherapy. Further analyses of these genes were warrant to explore their potential association with the prognosis of ccRCC.


Introduction
Renal cell carcinoma (RCC) is one of the most common malignancies in urological oncology, among which clear cell RCC (ccRCC) is the most predominant histological type, accounting for more than 80% of all RCCs (1). Generally, ccRCC has been proved to be with highly infiltrated immune cells (2). Meanwhile, a subset of spontaneous regression of RCC, with a frequency of approximate 1%, were attributed to immune-mediated (3). Therefore, ccRCC was one of the earliest malignancies in history to be treated with immunotherapy, and heretofore remains one of most responsive tumors (4)(5)(6). In recent years, the progress of molecular immunology has promoted the discovery of immune checkpoint inhibitors (ICIs), which were successfully applied into clinical practice. Although complete response dramatically occurred up to 9% in patients with immunotherapy (7), still only part of population could benefit from either immunotherapy alone or immunotherapy combined with TKIs(ORR: 25-59%) (7)(8)(9)(10)(11)(12). Commonly speaking, response to immunotherapy in the field of ccRCC were not satisfied as we expected (13). This has prompted researchers around the world to keep exploring new potential biomarkers which can predict efficacy of immunotherapy and help clinical decision making.
Currently no optimal biomarker has been demonstrated to be capable to screen appropriate patients for immunotherapy in ccRCC. PD-L1 was the first biomarker to be considered as a promising predictor on immunotherapy. It has been proven that PD-L1 was a valuable biomarker in predicting immunotherapy efficacy of melanoma and lung cancer (14). Unfortunately, PD-L1 expression failed to distinguish appropriate patients with ccRCC for ICIs (14).Although a variety of prognostic models have been used in clinical settings to predict therapeutic effects and guide clinical decisions of RCC (15), these models offers limited information, especially in the era of targeted therapy and even immunotherapy. Theoretically, tumor mutation burden (TMB) and tumor neoantigen burden (TNB) could predict tumor response to immunotherapy. In terms of ccRCC, only a modest mutation frequency (with a median of 2.6/Mb, ranged from 0.1-10/Mb) observed (16,17). Recent study has demonstrated all three classic histopathologic types of RCC have predominantly high level of TNB (17), the key problem was it was hard to calculate accurately. In the study of IMmotion150, TNB failed to predict the immunotherapy efficacy of mRCC (18). In addition, TMB is positively correlated with cytolytic activity (CYT) in most tumors and the CYT of ccRCC was the highest among 17 tumor types, while the correlation between TMB and CYT in ccRCC was not significant (19).
As a whole, the above evidence suggested that none of current biomarkers could determine immune infiltration and predict immunotherapy response (16).
In the context of immuno-oncology, growing evidences suggested that tumor microenvironment (TME) plays an important role in tumor progression, metastasis and therapeutic resistance (20,21). Immune and stromal cells are the two main types of nontumor cells in TME (22). It has been reported that regulatory T cells (Tregs), tumor associated macrophages (TAM) and CD8 + T cells have different effects on tumor cells (23)(24)(25)(26)(27). Meanwhile, multiple researches have demonstrated that immune microenvironment is closely related to ccRCC (28)(29)(30). Whereas, the mechanisms of TME highly infiltrated with immune cells, spontaneous tumor remission in the course of treatment of primary tumor and the treatment response to immunotherapy in ccRCC still remain obscure. Since the immune microenvironment is a complex space consisted of various kinds of immune cells, the identification of these cells alone is insufficient to assess the complex microenvironment, which is mainly characterized by histology (immunohistochemistry, flow cytometry, etc.). Considering the technique is limited by multiple factors, such as the difficulty to simultaneously assess the number of miscellaneous cell types and the high demanded amount of tissue, Yoshihara et al. designed a calculation to quantitatively predict immune and stromal scores based on ESTIMATE algorithm (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data), which utilizes specific genes in immune and stromal cells to predict scores of tumor purity and infiltration extent (31,32).
The cancer genome atlas (TCGA) is an open database aims to apply high-throughput genomic sequencing techniques to assist better understanding of cancer and thus improve the ability to prevent, diagnose and treatment of cancer (33,34). In order to analyze the influences of immune and stromal related genes on the prognosis of ccRCC, we aimed to classify ccRCC cases of TCGA into low and high score groups based on immune and stromal scores. Subsequently, we planned to identify differentially expressed genes between low and high groups and to screen the most significant genes to predict outcomes of ccRCC.

Acquisition of gene expression profile data
We obtained level 3 gene expression profile of ccRCC patients (KIRC) from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). In addition, clinical and pathological data were also down loaded. The presence of infiltrating stromal/immune cells in tumor tissues of ccRCC was predicted by a tool of ESTIMATE algorithm created by Yoshihara et al. Relationship between immune/stromal score and clinicopathological parameters and overall survival (OS) low and high groups by medians according to immune and stromal scores, respectively.
Subsequently, we analyzed correlations between immune or stromal score and T stage, histologic type, tumor stage and top 4 mutated genes in ccRCC (VHL, PBRM1, SETD2 and BAP1). Additionally, survival analysis was conducted to assess difference between low and high groups of immune/stromal score.
Identification of differentially expressed genes (DEGs) between low and high groups stratified by immune or stromal score DEGs between low and high groups were identified by performing data analysis according to limma package from R software (35). The following cut-off points were adopted to screen DEGs: >1.5 of fold change with adjusted P value <0.05. Heatmaps, volcano plot and venn diagram were plotted by R software using pheatmap package, ggplot2 and gplots. Visual results of the above analysis were presented by bubble plot conducted by using R software with ggplot2 package. P value < 0.05 was considered as cut-off point of statistically significant.

Construction of protein-protein interaction (PPI) network and module analysis
We used a biological database named STRING (Search Tool for the Retrieval of Interacting Genes) for PPI analysis. The role of the database (http://string.embl.de/) was to conduct a network of PPI for identified DEGs on the basis of the proven and predicted PPIs, and then analyze the potential interactions among proteins (37). After online analysis, TSV format files were downloaded for further analysis in Cytoscape software (version 3.5.1) (38). We used Cytoscape software to visualize PPI network and explore top 3 significant modules of network with 10 or more nodes.

Survival analysis
Analysis of overall outcome was performed by R software with survival package.
Additionally, Kaplan-Meier plots were conducted by R software with survminer package.
We calculated hazard ratio (HR) and log-rank P value for presentation. P value < 0.05 was considered as cut-off point of statistically significant.
The composition differences of 22 immune cell subtypes between low and high immune scores In order to quantify the abundance of 22 immune subtypes in ccRCC specimens, the CIBERSORT deconvolution algorithm was applied (39). Specifically, gene expression data was uploaded to the CIBERSORT web portal (http://cibersort.stanford.edu/) to calculate proportions of immune cells using the LM22 signature at 1,000 permutations. After acquisition of the infiltration data of each case, we planned to find the difference of abundance of immune infiltration cell between low and high groups of immune scores.

Results
Immune scores were significantly correlated with clinicopathological parameters and OS Among all 537 cases downloaded from TCGA portal, 191 (35.6%) cases were female, 346 (64.4%) cases were male. The median age was 61y (ranged from 26 to 90 y). According to ESTIMATE algorithm mentioned before, the distributions of immune and stromal scores were ranged from -11.57.91 to 2030.4 and from 1158.94 to 3076.4, respectively. As presented in Figure 1, cases with T stage ≥ 2 were associated with higher immune score rather than stromal score ( Figure 1A Except for BAP1, immune scores were not correlated with mutations of VHL, PBRM1 and SETD2, while stromal scores were not statistically associated with all four genes ( Figure   1G-N). We further analyzed the correlations of immune and stromal scores with OS, and Kaplan-Meier survival curves showed that cases with high immune scores were associated poorer OS than cases with low immune scores ( Figure 1O), whereas stromal scores were not correlated with OS ( Figure 1P).
Gene expression profiles were distinct between low and high groups stratified by immune and stromal scores As shown in Figure 2A-C, the gene expression profiles presented by RNAseq data were significantly different between low and high groups in both immune and stromal scores.

Individual DEGs and OS
We explored the relationships of the 1433 up-regulated genes to OS and a total of 890 DEGs were statistically significant (Supplementary Table S1). Among them, 855 genes were shown to predict poor OS and only 35 genes were shown to predict favorable OS.
Kaplan-Meir survival curves of selected genes (Top 10 DEGs) were shown in Figure 3 and high expression of each gene was associated with poor OS.

PPI network among prognostic DEGs
We obtained the protein-protein interaction among DEGs by using the STRING tool. After downloading the data, we plotted PPI network through Cytoscape software for subsequent analysis. The whole network was composed of 733 nodes and 7001 edges, and a total of 7 modules were screened through MCODE. For further analysis, we selected the three most important modules (Figure 4). In the module 1, a total of 32 nodes and 495 edges form the network, and CXCL8, CXCR3/4/5 and CCL4/5 were the most important nodes ( Figure 4A). In the module 2 ( Figure 4B  and mast resting cells (-3.55%) between low and high immune scores were the most significant, with CD8 T cells the greatest (increased from 7.64% to 18.96%, P < 0.001). It is noteworthy that the trend of change in cell composition of CD8 T cells were opposite to CD4 memory resting T cells and mast resting cells ( Figure 6).

Discussion
As the in-depth research on molecular mechanism of incidence and evolution of solid tumors including renal cell carcinoma, researchers used ICIs to block co-inhibitory signals, prevent inactivation of T cells and then facilitate T cells to kill tumor cells (40,41). The unsatisfied treatment response keeps impelling researchers concentrate on developing and optimizing biomarkers. Current biomarkers focusing on immune status and prognostic models were not sufficient in screening out appropriate patients to receive ICIs treatment.
Thus, it is important to focus on the comprehensive immune status in TME. Based on this, we aimed to predict the prognosis of ccRCC by calculating immune and stromal scores in the aspect of comprehensive level, and explore potential immune related genes to predict prognosis of ccRCC.
In the present study, we found that immune score was a poor prognostic factor of OS.
Since stromal score was not associated with prognosis, all subsequent analysis was based on immune score. After comparing gene expression profiles of cases with low and high scores, a total of 1433 genes related to immune response were identified. Functional enrichment analysis showed that most of these genes were involved in tumor microenvironment. Further survival analysis revealed that 890 genes were found to be associated with prognosis, and PPI network analysis also found that these genes were associated with immune or inflammatory responses, such as IL6, IL10, CTLA4, FOXP3, and ITGAX in module 1. Eventually, we identified top 10 prognosis-related DEGs to be integrated as one comprehensive parameter to predict prognosis of ccRCC. The   comprehensive parameter was consisted of IL10RA, FCER1G, SASH3, TIGIT, RHOH,   IL12RB1,  have been demonstrated to be associated with the development and progression multiple malignancies (47)(48)(49)(50)(51)(52)(53)(54)(55)(56). TIGIT (T cell immunoglobulin and ITIM domain), IL12RB1(Interleukin 12 Receptor Subunit Beta 1), AIF1 (allograft inflammatory factor 1) and SP140 (SP140 nuclear body protein) mainly involved in the process of immune and inflammation. As a novel immune checkpoint receptor similar to PD-1, TIGIT is upregulated on CD8+ TILs and Tregs in multiple tumors, exhibiting therapeutic benefits in animal models (57,58). IL12RB1 promotes delayed type hypersensitivity and autoimmunity (59). AIF1 is mainly associated with allograft rejection, autoimmune diseases and vasculopathy, etc. (60).
SP140 was demonstrated to be associated with multiple sclerosis (61).
We also compared subtypes of immune cells between low and high immune scores to find potential factors attributed to the situation of high immune score. It seemed that increased compositions of CD8 T cells and decreased proportions of CD4 memory resting T cells and mast resting cells were the most important in contributing to higher immune score. Therefore, the higher the immune score, the worse the prognosis may be mainly related to CD8 T cells, which is consistent with previous research evidence, suggesting that the reactions of immune cells were more pronounced as the progression of tumor grade/biological malignancy, possibly due to the increased antigenicity of tumor cells (62,63).      The compositions of CD8 T cells, CD4 memory resting T cells and mast resting cells were significant different between low and high immune scores

Supplementary Files
This is a list of supplementary files associated with the primary manuscript. Click to download.