Construction of an Immune-related Gene Model for Predicting Prognosis and Immune Inltration in Clear Cell Renal Cell Carcinoma

Background: Clear cell renal cell carcinoma (ccRCC) is a common pathological type of kidney cancer with high immune inltration that has been proven to be treatable with immune checkpoint inhibitor (ICI) therapy. However, the role of immunity in ccRCC remains poorly understood. Therefore, this paper aimed to develop and validate a novel immune-related prognostic marker to predict both the overall survival rate (OS) of ccRCC patients and the response to ICI therapy. Methods: Based on the transcriptome and clinicopathological data of ccRCC from The Cancer Genome Atlas (TCGA) dataset and immune-related genes (IRGs) from immune datasets, IRGs related to prognosis were screened to construct an IRG prognostic index (IRGPI) via coexpression analysis and Cox regression. After verifying that IRGPI was a prognostic indicator independent of clinical parameters, a nomogram was established. In addition, functional enrichment analysis, the CIBERSORT algorithm and single-sample gene set enrichment analysis (ssGSEA) were performed to compare the molecular and immune characteristics of IRGPI-dened subgroups. Finally, the expression of immunosuppressive genes, tumor mutational burden (TMB) and the TIDE algorithm were used to predict the response of ICI therapy in different IRGPI subgroups. Results: A total of 11 IRGs (IFNG, XCL1, APOBEC3G, CD86, CXCR3, IL10RA, IL2RG, CD244, SH2D1A, CD3D and FCER1G) were included in the IRGPI module. IRGPI high patients had a worse OS and had poorer clinical pathological status than IRGPI low patients. A nomogram containing clinical features and IRGPI scores may guide the clinical practice of ccRCC. Chemokine signaling pathways were mainly involved in functional enrichment analysis. Furthermore, the IRGPI could effectively reect the immune characteristics and immune checkpoint gene expression of ccRCC and the response to ICI therapy. Conclusions: The IRGPI is a promising biomarker for determining prognosis and has the potential to be used to predict immunotherapy response in ccRCC.


Introduction
Clear cell renal cell carcinoma (ccRCC) is a common pathological type of kidney cancer, representing almost 70%-80% of pathological cases. Approximately 20-30% of ccRCC progresses to metastasis quickly 1 , which makes treatment challenging. ccRCC is a disease without a cure, and the effectiveness of surgery, radiotherapy, and chemotherapy is limited at the metastatic or advanced stage. Compared to conventional therapies, novel targeted agents, immune checkpoint inhibitor (ICI) treatments and combination therapy strategies have revolutionized the management of ccRCC 2,3 . However, only a subset of patients achieved durable responses when treated with ICIs, which may be attributed to changes in dissimilar biochemical parameters and a high level of cancer heterogeneity caused by changes in the tumor immune microenvironment (TIME).
TIMEs, such as immune cell compositions and tumor-in ltrating lymphocyte (TIL) status, also affect tumor immunity. Although growing evidence has highlighted the TIME as an important prognostic indicator that can enhance the potential of precision immunotherapy, its clinical implication on ccRCC patient survival is not fully understood [4][5][6] . Promising biomarkers for determining prognosis and predicting immunotherapy response in ccRCC are urgently needed 7 . With the establishment of The Cancer Genome Atlas (TCGA) database and the development of gene sequencing technology, it has become possible to comprehensively analyze the immunogenomic landscape of ccRCC.
In this study, immune-related genes (IRGs) related to prognosis were identi ed via weighted gene coexpression network analysis (WGCNA), a systematic biological method, and univariate survival analysis to construct an IRG prognostic index (IRGPI). We then identi ed the correlation between immune in ltration and the IRGPI. Finally, we examined the possibility of the IRGPI to predict immunotherapy response. The results showed that the IRGPI was an individualized immune-related prognostic index independent of clinical factors that can determine the prognosis and predict the immunotherapy response of patients with ccRCC. A graphical abstract is shown in Figure 1.

Material And Methods Patients And Data Acquisition
In this study, all database information were obtained from the open-access database and all methods were carried out in according with relevant guidelines and regulations. The RNA-seq, clinicopathological and gene mutation data of 611 ccRCC samples, including 539 cancer tissues and 72 normal tissues, were retrieved from TCGA database (Furthermore, the list of IRGs was obtained from the ImmPort database () and InnateDB ().
Functional and Pathway Enrichment Analyses of Immunerelated Differentially Expressed Genes (DEGs) Difference analysis was used to identify DEGs between 539 cancer and 72 normal tissues from TCGA by the "limma" package of R and Wilcox test, visualized by the "pheatmap" package of R. The cutoff values were false discovery rate (FDR)<0.05 and |log 2 [fold change (FC)] | >1. Overlapping genes between the DEGs and IRGs were selected for further functional enrichment analysis, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. Both analyses were performed by the "clusterPro ler" package of R. Color gradients depict high (red) or low (blue) expression levels. P<0.05 was considered statistically signi cant.
Identi cation of Immune-related Hub Genes WGCNA was used to describe the associations of gene clusters inside a comprehensive network and evaluate the correlations between gene modules and phenotypes 8 . We evaluated the immune-related DEGs and used the R package "WGCNA" to construct different gene coexpression modules. First, the similarity matrix was constructed by Pearson analysis of the immune-related DEGs. Next, an adjacency matrix and a topological overlap matrix (TOM) were obtained according to the power value (β = 3). A dissimilarity measure (1-TOM) was used to cluster the immune-related DEGs, and a dynamic tree cut algorithm was established for module recognition. For each module, we correlated the module eigengenes (MEs) with clinical features (tumor or normal status). Ultimately, we identi ed 7 modules by setting the merging soft-thresholding power at 0.25. We selected 50 immune-related hub genes with a threshold degree (the number of connections between one point and other points in the network) of > 20.

Identi cation of Prognosis-related IRGs (PRIRGs)
We excluded samples from patients with ccRCC with incomplete clinical prognostic information and retained 513 samples with a follow-up time >30 days as the entire group. The "caret" package of R was used to randomly divide the 513 ccRCC samples into a training cohort and a testing cohort at a ratio of 2:1. Detailed information is presented in Supplementary Table S1. The training TCGA survival information was used to determine the prognosis-related genes through univariate Cox regression analysis by using "survival" package and "survminer" package of R software, and among 50 immune-related hub genes, 34 signi cant PRIRGs were selected for further analysis (P < 0.05, log-rank test). To determine the potential regulatory mechanisms, functional enrichment analysis of the 34 PRIRGs was performed with R software. The gene network of PRIRGs was generated with online tool GeneMANIA () and protein networks were generated with online tool STRING ().

Construction and Validation of the IRGPI
Through multivariate Cox regression analysis, the genes signi cantly affecting overall survival (OS) were identi ed and used to construct an IRGPI based on the PRIRGs. The IRGPI of each sample was calculated as follows: IRGPI = Expression GENE1 * Coe cient GENE1 + Expression GENE2 * Coe cient GENE2 + … Expression GENEn * Coe cient GENEn . Ultimately, patients were randomly assigned to two groups (high and low IRGPI) according to the median IRGPI score in the training group by using "survival" package and "survminer" package of R software. The prognostic power of the IRGPI was evaluated by log-rank tests for survival analysis and the 1-,3-, and 5-year receiver operating characteristic (ROC) curve analyses among the TCGA training (n=342), testing (n=171) and entire cohorts (n=513) by using "timeROC" package of R software. Whether the IRGPI score and clinical factors displayed independent prognostic value for OS was evaluated by univariate and multivariate Cox regression analyses. Moreover, chi-square tests were used to assess the association of the signature with other clinical factors (age, sex, grade and stage) and visualized using the "ComplexHeatmap" package of R software, and nomogram models were drawn.
Comprehensive Analysis of Immune Characteristics and Immunotherapy Response in Different IRGPI Subgroups In the signaling pathway analysis, gene set enrichment analysis (GSEA) based on GO and KEGG analyses was performed to compare the differential gene sets between two IRGPI subgroups. The normalized enrichment score (NES), FDR and adjust P-value were used to adjudicate each enriched pathway. The GO gene sets (c5.go.v7.4.symbols.gmt) and KEGG gene sets (c2.cp.kegg.v7.4.symbols.gmt) in the GSEA database were obtained from .
Cell type identi cation by estimating relative subsets of RNA transcripts (CIBERSORT) was utilized and iterated 1000 times to quantify the relative proportions of 22 human immune cell subsets in 513 ccRCC samples by using "e1071" package and "preprocessCore" package of R software, and the correlation between the 22 immune cells and prognosis was analyzed. To further de ne the immune characteristics of the IRGPI subgroups, we quanti ed the enrichment of 30 immune cell subgroups and immune function in every ccRCC sample based on speci c gene marks by the singlesample GSEA (ssGSEA) score and compared the score between two IRGPI subgroups by using "GSVA" package and "GSEABase" package of R software [9][10][11] .The R packages "ggpubr" then used to visualize the different expression in two IRGPI subgroups.
The expression of some critical immune checkpoints and immunosuppressive genes in different IRGPI subgroups and the correlation between the genes and the IRGPI score were investigated. Tumor mutational burden (TMB) can be utilized to assess the therapeutic effect of immunotherapy in some cancers 12 . We performed correlation analysis between the IRGPI score and TMB by using gene mutation information obtained from TCGA. In addition, we employed the tumor immune dysfunction and exclusion (TIDE) algorithm and calculated microsatellite instability (MSI) obtained from TIDE website (http://tide.dfci.harvard.edu/) to predict the immune response and determine the prognostic value of the IRGPI in patients treated with immunotherapy.

Statistical Analysis
Statistical analysis or visualization were performed by R software (version 4.1.1). Continuous normally distributed data were compared by t-tests. Categorical variables were compared by chi-square tests. Spearman's coe cient was used to analyze the correlations between TMB and IRGPI score. Univariate KM survival analysis was performed, and the log-rank test was used to analyze differences between the curves. Multivariate survival analysis was performed with Cox proportional hazards models. P less than 0.05 was considered statistically signi cant.

Results
Identi cation of Immune-related Hub Genes In the differential expression analysis of 539 tumor vs. 72 normal tissues, a total of 9459 DEGs were identi ed ( Figure 2A). By intersecting these genes with the lists of IRGs downloaded from ImmPort and InnateDB, 946 differentially expressed IRGs (Table S2) were obtained, of which 780 IRGs were upregulated (red) and 166 were downregulated (blue) in renal cancer tissues compared with normal tissues ( Figure 2B, C). The enrichment analysis showed that the 946 differentially expressed IRGs were signi cantly associated with 2015 GO function terms and 86 KEGG pathways, and the top 30 GO function terms and top 8 KEGG pathways enrichment were visualized in Figure 2D, E. To determine the immune-related hub genes, we visualized the distribution of expression of the differentially expressed IRGs (n=946) and obtained 7 different modules based on a power of β=3 and used the dynamic hybrid tree cutting method to merge the highly familiar modules with a minimum module size of 25 and a cutting line <0.25 ( Figure 3A, B, C). Only one module (brown) was negatively correlated with ccRCC, whereas the green, red, black, blue and yellow modules were positively correlated with ccRCC. The modules contained 137 genes (brown), 61 genes (green), 36 genes (red), 33 genes (black), 534 genes (blue), and 130 genes (yellow) ( Figure 3D). Among the above modules signi cantly related to tumor and normal samples, we obtained 50 immune-related hub genes with a threshold weight > 0.2 and a threshold degree of > 20. A total of 34 PRIRGs were correlated with ccRCC patient OS, as determined by univariate Cox regression analysis in the training cohort. (P < 0.05, log-rank test) ( Figure 3E); therefore, these PRIRGs were further analyzed. GO analysis of the genes showed that they were signi cantly enriched in the terms T cell activation (BP), external side of plasma membrane (CC), and cytokine receptor binding (MF) ( Figure 4A). KEGG analysis found that the PRIRGs were signi cantly enriched in cytokine-cytokine receptor interactions and immune responses ( Figure 4B). The gene-gene and proteinprotein interaction networks are shown in Figure 4C, D.
To further explore the characteristics of the IRGPI score in a large cohort and draw more convincing conclusions, we conducted further experiments in the entire TCGA cohort. Univariate Cox regression showed that age, grade, stage and the IRGPI score were associated with the prognosis of ccRCC. After adjusting for age, sex, grade and stage, multivariate Cox regression analysis con rmed that the IRGPI (HR: 1.566, 95% CI: 1.284-1.910, P < 0.001) was an independent prognostic factor ( Figure 5C). Chisquare tests were employed to evaluate the association of the IRGPI subgroups and other clinicopathological factors. IRGPI high was more commonly detected in patients with poor tumor differentiation (grade 3-4), advanced pathological stage (stage -), large tumor size and deep local in ltration area(T3-4), distant metastasis(M1) and higher N stage (P < 0.05, Figure 6A). These ndings indicated that the risk signature was well correlated with a variety of clinical factors. The higher the IRGPI score, the poorer the clinical pathological status. Furthermore, a nomogram including clinical factors and IRGPI score for the entire TCGA cohort was generated ( Figure 6B

Immune Characteristics of Different IRGPI Subgroups
To estimate immune cell in ltration in the entire ccRCC cohort, CIBERSORT was adopted to evaluate the relative proportions of 22 tumor-in ltrating immune cell subtypes in different IRGPI subgroups. As shown in Figure 8A, B, the in ltrating immune cells with high proportions in ccRCC included resting memory CD4 T cells, CD8 T cells and macrophages. Among the 22 immune-in ltrating cells, monocytes and resting mast cells had higher in ltrations in the IRGPI low subgroup, while activated memory CD4 T cells, follicular helper T (Tfh) cells, regulatory T cells (Tregs) and M0 macrophages had higher in ltrations in the IRGPI high subgroup (P<0.05). We analyzed 30 immune-related genomes representing different immune cell types, functions, and pathways ( Figure 8C). Enrichment of immune signature scores indicated that several immune cell types, including multiple types of dendritic cells (DCs), CD8 T cells, macrophages, T helper (Th) cells, Tfh cells, Th1 cells, Th2 cells, TILs, Tregs and tumor-associated macrophages (TAMs), and several immune terms, including APC costimulation and coinhibition, chemokine receptors (CCRs), checkpoints, cytolytic activity, HLA, in ammation promotion, MHC class I, T cell costimulation and coinhibition, and type IFN response, were enriched in the IRGPI high subgroup, while mast cells and the type II IFN response were enriched in the IRGPI low subgroup. Immunosuppressive cells were present at higher proportions in the IRGPI high subgroup. In addition, we analyzed the association between the OS of ccRCC patients and 22 immune cell populations: a high abundance of in ltrating resting memory CD4 T cells, neutrophils, monocytes, resting mast cells, M2 macrophages, resting DCs and naïve B cells was related to better survival rates, while a high abundance of in ltrating Tregs, Tfh cells, activated memory CD4 T cells, plasma cells, M1 macrophages and M0 macrophages was related to poorer survival rates ( Figure S1).

The IRGPI Predicts Response to Immunotherapy
The expression of immunosuppressive genes was investigated. The expression of PD-1 (PDCD1), PD-L2 (PDCD1LG2), CTLA-4, LAG-3, GZMB, BTLA, TIGIT, and TGFB1 was higher in the IRGPI high subgroup than in the IRGPI low subgroup and had a positive correlation with the IRGPI score ( Figure 9A and Table 1).
The above results indicate that the IRGPI low group was more likely to respond to immunotherapy. TIDE can be used to predict ICI response and ICI therapy outcome and to reveal mechanisms of tumor immune escape 13,14 . The correlation among the IRGPI, TIDE score and MSI in ccRCC patients was explored. In our results, the IRGPI high subgroup had a higher TIDE score than the IRGPI low subgroup, implying that IRGPI high patients had a higher potential for immune escape and thus were less likely to bene t from immunotherapy than IRGPI low patients. Moreover, higher tumor TIDE scores are associated with poor patient survival under anti-PD1 or anti-CTLA4 treatments. Additionally, we found that the IRGPI low subgroup had a higher MSI score and a lower T cell dysfunction score, but no statistically signi cant difference in T cell exclusion between the high and low groups was observed ( Figure 9B). TMB is a wellknown immunotherapy biomarker. The results showed that the IRGPI was positively correlated with TMB (R=0.16, P=0.0048. Figure 9C), and the VHL and PBRM1 genes had the highest TMB. Table 1 Correlation between immunosuppressive genes and the IRGPI.  17,18 . An increasing number of immune genes have been discovered; however, the immune molecular mechanisms of ccRCC are poorly de ned. An accurate prediction of prognosis based on immune genes is essential for guiding patients, individualized monitoring and selecting patients for clinical trials.
In the present study, based on ccRCC immune gene datasets, we identi ed differentially expressed IRGs. GO function and KEGG pathway analyses were performed to explore the role of these differentially expressed IRGs in ccRCC. As expected, the results revealed that genes were enriched in processes such as the adaptive immune response, immunoglobulin complex, antigen binding and cytokine-cytokine receptor interaction. Based on the results of the univariate Cox analysis, we used WGCNA to identify 34 PRIRGs affecting patient OS. GO function and KEGG pathway analyses revealed that these genes were enriched in T cell activation and related to the terms external side of plasma membrane, cytokine receptor binding and cytokine-cytokine receptor interaction. Previous studies have shown that chemokines and their receptors can recruit immune cell subsets into the tumor microenvironment 19 . Chemokines also in uence cancer progression and therapy in ccRCC 20 . This evidence suggests that the 34 immune-related hub genes can function not only as potential biomarkers for prognosis but also as therapy targets for ccRCC.
We randomly divided the 513 ccRCC patients into training and testing cohorts for validation of the IRGPI model we constructed based on 11 survival-associated differentially expressed IRGs (IFNG, XCL1, APOBEC3G, CD86, CXCR3, IL10RA, IL2RG, CD244, SH2D1A, CD3D and FCER1G) to draw a more reliable conclusion. The IRGPI proved to be a valid prognostic immune-related biomarker for ccRCC, with worse survival in IRGPI high patients and better survival in IRGPI low patients in the three TCGA cohorts. Although IFNG enhanced immune function, it promoted T cell exhaustion through PDL1, and its functions in cancer cells and immune cells were opposite 21 . High expression of IFNG was associated with poor prognosis in ccRCC 22 . XCL1 is a C class chemokine, that is produced by T and natural killer cells during in ammatory responses 23 . APOBEC3G was overexpressed in renal cancer 24 . The APOBEC family can drive tumor evolution and may induce recurrence, metastasis, and/or therapy resistance 25 . The interaction between CD86 and ligand is the main factor inducing T lymphocyte proliferation and IL-2 production. CXCR3 is the receptor of CXCL10, and alternative splicing of CXCR3 (producing CXCR3-A and CXCR3-B) is partially related to the different functions (tumor-suppressive and tumor-promoting functions) of CXCL10 in renal cancer. Downregulation of CXCR3-B promoted renal cancer cells proliferation and migration 26 . Interleukin 10 receptor subunit alpha (IL10RA) is mainly expressed on most hematopoietic cell and functions as an anti-in ammatory factor 27 . FCER1G was associated with the progression and poor prognosis of ccRCC 28 . CD244, as a receptor of NK cells, plays important roles in the anticancer function of NK cells. In the IRGPI calculation formula, the coe cients of IFNG, XCL1, APOBEC3G, IL10RA, IL2RG and FCER1G were positive, indicating that high expression of the above genes had a worse outcome in ccRCC patients.
However, the coe cients of CD86, CXCR3, CD244, SH2D1A and CD3D were negative, suggesting that the above genes may function as protective factors for ccRCC patients. In summary, IRGPI could be used as a biomarker associated with immune cell dysfunction(exhaustion) and tumor cell proliferation and migration.
The IRGPI presented moderate value in predicting the prognosis of ccRCC patients independent of other clinical factors. Furthermore, a close correlation was found between the IRGPI and some clinical factors.
To further improve the correctness of the prognosis model, a nomogram incorporating the IRGPI score and clinical factors was constructed and veri ed, and this nomogram could also predict the OS of patients to guide clinical practice.
To understand the molecular mechanism of the two subtype groups, GSEA was used to analyze the differences in activated pathways. In the GO and KEGG GSEA, active immune responses and chemokine signaling pathways were found to be enriched in the IRGPI high subgroup. The JAK/STAT signaling pathway downstream of the chemokine signaling pathway was also enriched in the IRGPI high subgroup, indicating that in ammation may be involved in tumor progression. The enrichment of immune response activation in the IRGPI high subgroup seems to contradict the worse prognosis of this group. However, the phenotypic and functional pro les of immune cells vary depending on the tumor subtype and the TIME, and the role of immune cells in ccRCC and the surrounding immune environment needs to be further studied.
Immune cell in ltration plays an important role in tumor progression and metastasis. Understanding the landscape of the TIME could help to nd new ways to treat ccRCC or to alter the TIME to improve the effectiveness of immunotherapy. The immune characteristics of the IRGPI high subgroup indicated that the inability of TILs to mediate antitumor functions is likely due to the overrepresentation of immunosuppressive cells [e.g., Tfh cells, uncommitted (M0) macrophages, HLA-G/E-expressing cells, Tregs and TAMs] and promotion of in ammation. T cells exposed to persistent antigen and/or in ammatory signals can develop defects and become anergic 29 . In contrast to almost all other tumors, ccRCC tumors with increased in ltration of CD8+ T cells and CD4+ T cells have high tumor grade and poor prognosis 30,31 . Macrophages play an immunosuppressive role in ccRCC patients by increasing the activity of Tregs 32 . It has been found that increased expression of nonclassical HLA class I molecules (HLA-G and HLA-E) negatively affects T cell function 33 . TAMs are the main sources of vascular permeability factors, immunosuppressive mediators and proin ammatory cytokines in the TIME. In ccRCC, the accumulation of C1q+ TAMs were correlated with an exhausted T cell phenotype and poor OS 34 . These ndings support our study. The immune escape of tumors is an essential step in the process of cancer progression to evade the immune system and prevent the body from producing an effective antitumor response. In addition to immunosuppressive cells, immunosuppressive molecules are also involved. In this study, the expression of most immunosuppressive genes was higher in the IRGPI high subgroup and had a positive correlation with the IRGPI score. The expression of immunosuppressive molecules was also a characteristic of T cell dysfunction. Increased expression of PD-1 was found to be associated with high-risk RCC tumors, and immune cell PD-1 may promote cancer progression by contributing to immune dysfunction 35 . Since the expression of TGFB1 and HAVCR2 was much higher than that of PD-1 in the TIME, TGFB1 and HAVCR2 may be better targets for immunotherapy in ccRCC.
Since the composition of immune cell subtypes and the expression of immunosuppressive molecules were different between the two IRGPI subgroups, IRGPI might re ect different immune bene ts from ICI therapy (anti-PD1 and anti-CTLA4 therapy) identi ed with TIDE. In our study, TIDE prediction showed that IRGPI high patients had a lower ICI response, possibly because of immune evasion via T cell dysfunction.
MSI is a signi cant factor affecting ICI therapy. There is evidence indicating that MSI is positively correlated with survival, and patients with high MSI were predicted to bene t from novel immunotherapies in clinical trials 36, 37 . Studies have revealed that TMB is a biomarker for predicting immunotherapy response in some cancer types 12,38 . A recent study discovered that high TMB is correlated with poor prognosis and might inhibit immune in ltration in ccRCC 39 . Based on the above evidence, we speculate that our prediction model based on PRIRGs may serve as a reliable immunerelated tool for cancer ICI therapy.
In conclusion, the IRGPI is a prognostic index independent of clinical factors for estimating the prognosis in ccRCC patients and might predict immunotherapy response in ccRCC patients.