Immune Cell In ltration Score and Predicting Model Among Lower-grade Gliomas Based on Immunogenomic Clusters

Zhile Wang PUMCH: Peking Union Medical College Hospital https://orcid.org/0000-0001-6820-8383 Fucun Xie PUMCH: Peking Union Medical College Hospital Yijun Wu PUMCH: Peking Union Medical College Hospital Li Wang PUMCH: Peking Union Medical College Hospital Yi Bai Tianjin First Central Hospital Junyu Long PUMCH: Peking Union Medical College Hospital Xiang Wang (  wangxiang@pumch.cn ) Department of Medical Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China. https://orcid.org/0000-0002-2202-2871

The differences in classi cation in uenced both the treatment and prognosis of glioma patients. Surgical resection with postoperative radiation was still recommended for LGG patients, especially for those with super cial lesions [5][6][7]. And relevant studies recommended chemotherapy agents as additional strategies for unresectable cases [8]. Despite all the strategies, recurrence or progression of original lesions was still a challenge. Novel strategies for the treatment of recurrent gliomas were urgently needed.
Though the brain had been recognized as an immune-privileged organ for over half a century, recent studies showed that tumor microenvironment (TME) of brain malignancies were lled with enough immune cells to generate immune responses [9,10]. Naturally, researchers focused on immune-checkpoint inhibitors as possible solutions for recurrent glioma. An earlier study proved that Inhibitors of PD-1/PD-L1 or CTLA-4 were effective to gliomas in vitro [11]. Unfortunately, the failure among in vivo clinical trials [12,13] stalled further exploration for immunotherapies.
The study was initially designed to explore possible genes as immunotherapy targets and evaluate immune status by scores. We collected the study cohort from The Cancer Genome Atlas (TCGA) database and enrolled a total of 495 patients. Extra gures of the study were shown in additional les in details [see Additional le 1].

Data And Materials Datasets and Samples
The original dataset of LGG samples (including transcriptome sequencing data, clinical data, and mutation information) was derived from the TCGA-LGG database, and accessed through the TCGA dataset website (https://portal.gdc.cancer.gov/). mRNAseq_693 [14,15]  permutations during calculation). The stromal, immune, and ESTIMATE scores were collected from single sample gene set enrichment analysis (ssGSEA) algorithm by 'estimate' R package.

Consensus Clustering
The k-means clustering algorithm was conducted based on the immune cell types using Euclidean distance. The consensus matrix plot (shown in Figure S1) and Probably Approximately Correct algorithm were set as the cluster number selection criteria. The clustering process for genes was similarly conducted. A nal of four ICI clusters and two gene clusters were selected, and were named "A", "B", "C", "D" ICI clusters and "A", "B" gene clusters individually.
ICI Score Calculation ICI score was de ned to describe the overall status of prognostic genes in the study. And the calculation of the ICI scores was accomplished in the following process: The Boruta algorithm trained a dataset based on the random forest classi cation [19], and adopted a feature importance ranking measure to evaluate each immune cell type's importance. After continuous deletion of unimportant features, the Boruta would score the importance of each immune cell type. The values of ICI score were calculated through the following equation.

Gene Set Enrichment Analysis
To quantify the enrichment level of immune cells and immune-related pathways among samples, we conducted enrichment analysis by employing single-sample gene set enrichment analysis (ssGSEA) based on the Kyoto Encyclopedia of Genes and Genomes database. The pathways were scored according to the expression level of genes among samples. And the highly enriched pathways were selected and plotted based on both the enrichment score and the ICI score.
Tumor Mutation Burden TMB was de ned as the total count number of somatic mutations in samples. TMB values were calculated based on the mutation data from TCGA-LGG. The clinical value of the TMB score had been veri ed in earlier studies. And the relationship among TMB, ICI score, and overall survival were further analyzed through the Kaplan-Meier method.
The levels of immune cell types were calculated through the 'CIBERSORT' R source based on the transcriptome expression levels. The scoring process of in ltrated immune environment was conducted by 'estimate' R package. Both ICI and gene expression were clustered ('ConsensusClusterPlus' R package) for further analysis. The correlation picture of immune cells was plotted ('corrplot' R package). And the heatmap of ICI and gene clusters were drawn ('pheatmap' R package). Kaplan-Meier analysis and logrank tests ('survival' R package) further plotted survival curves among clusters. The study's box plots were plotted using 'ggpubr' R package, with dot plots using 'enrichplot' R package. Additional enrichment analysis was conducted through GSEA software (version 4.1.0). TMB values were calculated through R software, with oncoplots plotted using 'maftools' R package.
Clinical features with ICI score groups were screened to nd prognostic factors through the least absolute shrinkage and selection operator (LASSO) regression ('glmnet' R package). Univariate and multivariate Cox regression ('survival' R package) was then conducted to select independent prognostic factors for model-building. The nomogram model was built using the 'rms' R package based on samples randomly selected from the TCGA-LGG dataset. The internal validation was based on the testing group (the rest samples of TCGA-LGG). And external validation was conducted using data from the CGGA dataset (combination of mRNAseq_693 and mRNAseq_325 with batch effect confounding removed through 'sva' R package). ROC curves of the nomogram in different datasets were plotted using 'survivalROC' R package. And calibration curves were plotted through 'rms' R package. Time-dependent decision curve analysis (DCA) was conducted using the 'stdca' R source.

Data Preparation
We collected the information of 511 patients (529 samples) was from the TCGA-LGG dataset. Due to the limited mutation data, 495 patients were nally enrolled. Moreover, we recorded another 1,018 patients for external validation from the CGGA database (mRNAseq_693 and mRNAseq_325). All transcriptome expression data among datasets accepted the following transition before data analysis: .
We acquired the immune in ltration status of 529 samples through 'CIBERSORT' analysis. 314 out of 529 samples (305 out of 511 patients) passed the veri cation process (p<0.05). Furthermore, we calculated the stromal, immune, and ESTIMATE scores. The Pearson correlation plot of immune cell types among all patients was shown in Figure 1C.

Immune Cell In ltration Cluster
Four ICI clusters were built based on the types of in ltrated immune cells among all 305 patients. The consensus matrix plot was shown in additional les [see Additional le 1]. Heatmap of four immune cell in ltration (ICI) clusters were plotted in Figure 1A. Survival curves of each ICI cluster were plotted and shown in Figure 1D. Signi cant differences were observed between ICI cluster A-B (log-rank test; p=0.003; Figure 1D), B-C (log-rank test; p<0.001), and C-D (log-rank test; p=0.034).
The differences in prognosis among different ICI clusters might result from the differences in speci c genes. Moreover, we compared these clusters in immune-checkpoint gene expression and immune cell types. Cluster B showed signi cantly higher gene expression levels in PD-1, PD-L1, CTLA-4, IDO1, and CD161 expression than the rest three clusters (Kruskal-Wallis test; p<0.01; Figure 1E-I). Cluster C had a better prognosis than cluster D. However, the differences between the two clusters were not signi cant enough in PD-1, CTLA-4, and IDO1 expression (Kruskal-Wallis test; p>0.05). We further compared the ICI clusters in immune cell types. ICI cluster B had signi cantly more in ltrated immune cells in the following types: naïve B cells (Kruskal-Wallis test; p<0.01; Figure 1B)

Gene Expression Cluster
To identify the critical prognostic factor among the ICI clusters, we selected the genes that differed in expression levels for further analysis. We reclassi ed all 511 patients into two consensus clusters according to listed genes, including gene cluster A (397 patients) and B (114 patients). The consensus matrix plots for gene clusters were also shown in additional les [see Additional le 1]. The heatmap of gene clusters was shown in Figure 2A. We further classi ed genetypes according to the correlation coe cients between expression level and gene cluster. Genetype A included data of positive coe cients, and genetype B consisted of negative-coe cient data. We conducted intersection (495 patients) of the gene cluster dataset (511 patients) and clinical dataset (495 patients) for Kaplan-Meier analysis. And gene cluster A (383 patients) acquired signi cantly higher survival probabilities than gene cluster B (112 patients) (log-rank test; p<0.001; Figure 2B). Gene ontology enrichment analysis showed the high-level expression pathways concerning biological process, cellular component, and molecular function in both gene cluster A and B groups ( Figure 2C). All pathways listed in the bubble plot were signi cantly highly expressed in tumor tissues.

Immune Cell In ltration Score
To better describe the correlations between gene clusters and expression levels, we divided the patients by the correlation coe cients into two genetypes through Boruta analysis (Boruta analysis showed signi cantly better performance in large number gene weighting than traditional regression models [20]).
Further reduction of dimensions was conducted through PCA. The ICI scores (based on results of Boruta and PCA) contained the information of prognostic genes among LGG patients. The relationship among ICI score, gene cluster, and the prognosis was shown in the alluvial plot ( Figure 3A). The high ICI score group had signi cantly lower survival probabilities than the low ICI score group (log-rank test; p<0.001; Figure 3B). Moreover, the high ICI score group showed higher gene expression in CXCL10, PD1, IDO1, LAG3, GZMB, GZMA, CD8A, HAVCR2, CXCL9, CTLA4, PRF1, and PD-L1 (Kruskal-Wallis test; all p values<0.001; Figure 3C). Enrichment analysis was conducted at the pathway level, and the immunerelated highly-enriched pathways for the high ICI score group were shown in Figure 3D.

Tumor Mutation Burden
Mutation information for both the high ICI score group and low ICI score group was rstly analyzed through the 'maftool' R package, and the oncoplots for each group were plotted in Figure 4A and 4B. Mutations in IDH1, TP53, ATRX, and CIC frequently happened in both groups, and non-missense mutation occurred most frequently in ATRX. Further, we calculated the tumor mutation burden value, and the high ICI score group got signi cantly higher TMB values (Kruskal-Wallis test; p<0.001; Figure 4C). The correlation coe cient between TMB values and ICI scores was 0.17 (Spearman test; p<0.001; Figure 4D).
Patients with high TMB values acquired signi cantly lower survival probabilities (log-rank test; p<0.001; Figure 4E). To better describe the prognostic relationship of both ICI scores and TMB values, patients were divided into four groups. High TMB values with high ICI score subgroup acquired the minuscule survival pro ts (log-rank test; p<0.001; Figure 4F), and low values for both TMB and ICI scores received the most survival bene ts (log-rank test; p<0.001; Figure 4F).

Clinical Features and ICI Score Group
The differences in prognosis between high and low ICI score groups were signi cant. Nevertheless, there still might be selection bias during the enrolling process. The limited number of patients prevented us from conducting propensity score matching (PSM) to minimize the effect of selection bias. Therefore, we further compared patients' survival outcomes with different ICI scores in multiple clinical feature subgroups. Signi cant differences were observed between high and low ICI score subgroups in different groups divided by age (log-rank test; p≤0.001; Figure 5B and C), gender (log-rank test; p<0.001; Figure 5E and F), and histology (log-rank test; p≤0.01; Figure 5H-J). And we also observed that the ICI score distribution was not relatively even among patients with different clinical features. Patients in the following groups had signi cantly lower ICI scores: age < 56 years (vs. ≥ 56 years; Wilcox test; p<0.01; Figure 5A) and oligodendroglioma (vs. astrocytoma; Wilcox test; p<0.01; Figure 5G).

Nomogram Model
Clinical features and ICI score group were analyzed through the LASSO regression, univariate Cox regression, and multivariate Cox regression. Age, grade, and ICI score were identi ed as independent prognostic factors (Wald test; all p values<0.05; Table 1). We randomly divided all 495 available patients into the training group (248 patients) and test group (247 patients) based on the three risk factors. The nomogram model was built based on the training group ( Figure 6A). Further internal validation was conducted among 247 patients in the test group. The 1-year, 3-year, and 5-year ROC curves in the training and test groups were shown in Figure 6B and C. The AUC values were 0.895, 0.754, and 0.73 for the training set. And the AUC values of internal validation were 0.851, 0.86, and 0.768 for 1-year, 3-year, and 5year truncation time. We further draw the calibration curve for internal validation to correct the model's possible bias ( Figure 6E). Figure 6G showed the 3-year time-dependent DCA curve in internal validation, and the rest DCA curves were shown in additional les [see Additional le 1]. External validation was conducted based on the 591 patients from the CGGA database. The ROC was plotted in Figure 6D, and the AUC value was 0.725, 0.744, and 0.735 for 1-year, 3-year, and 5-year truncation time. The calibration curve and time-dependent DCA of external validation were shown in Figure 6F and H.

Discussion
Currently, surgery was recommended as rst-line therapy for regional resectable glioma [3]. Relapse of glioma, especially among glioblastoma patients, was quite common after surgery. Therapeutic options for relapse glioma did exist, but the patients' prognosis was not satisfying [5]. New treatment strategies for glioma were in urgent demand.
Ingression of immune cells in the microenvironment of glioma was widely reported, and the differences in immune cell types were closely related to prognosis of glioma patients. Myeloid cells were the key mediators of immune suppression [3]. Microglia, dendritic cells (DCs), and macrophages, especially M2 macrophages, were signi cantly highly expressed in the TME [21][22][23]. The distribution of immune cells among different clusters were shown in Figure 1B and 2D. All cluster which had worse prognosis showed higher level of the following immune cells in common: naïve B cells, CD8 T cells, resting memory CD4 T cells, M0 macrophage, M1 macrophage, and neutrophiles. Though these immune cell types had not been directly proved as risk factors for glioma, these cells were possible targets for immunotherapy in the future. Due to the lack of information, we decided to list the immune cell types for reference of further studies.
Immune-checkpoint had been a signi cant focus among all immunotherapies. Indoleamine 2,3deoxygenase enzymes and PD-1/PD-L1 [24] were highly expressed to affect the immune tolerance of T cells [25]. Besides, CTLA-4 and LAG-3 were also highly expressed in glioma cells and were related to immune suppression among glioma patients [26]. Despite all the immune-checkpoint studies' efforts, anti-PD-1 therapies failed to prove the survival bene ts in phase III clinical trials [12,13]. Highly immunosuppressive TME might have in uenced the e cacy of the therapy. Furthermore, new immune targets were also a possible solution for the reduced e cacy. Recently, CD161 was considered as another possible molecular target for immunotherapy. In ltrated T cells highly expressed CD161 to inhibit the anti-tumor effect of killer T cells. The process might share some features with the PD-1 pathway, since the CD161 was also expressed in immunosuppressive myeloid cells and tumor cells [27]. In the study, ICI cluster B showed higher immune-checkpoint gene expression levels in PD-1, PD-L1, CTLA-4, IDO1, and CD161 with signi cantly worse prognosis ( Figure 1D-I). Similar results were also observed in gene clusters and ICI score clusters ( Figure 2B, E-H and Figure 3B-C). Based on the results of ICI score groups, the following immune-checkpoint genes were negatively related to prognosis: CXCL-10, PD -1, IFN-gamma,  IDO1, LAG-3, CD161, GZMB, GZMA, CD8A, TIM3, CXCL9, CTLA-4, PRF1, and PD-L1. And part of the conclusion just supported previous studies[28 -33].
The differences in expression level of immune-checkpoint genes might not completely explain the differences in prognosis. In the study, ICI cluster C and D showed different prognosis with similar expression levels in PD-1, IDO1, and CTLA-4. Between ICI cluster B and D, signi cant differences in immune-checkpoint genes were observed with similar prognosis. Other prognostic genes might exist besides identi ed immune-checkpoint genes. Differences in prognosis between gene cluster A and B proved that the differential genes among ICI clusters were related to prognosis ( Figure 2B). Further analysis was focused on description of genes based on pathways and weighting the prognostic genes.
Phenomenon of different immune cells in uencing prognosis through different pathways had been revealed in many studies. In ltrated immune cells expressed multiple tumor-intrinsic factors to form the immunosuppression TME of glioma [3]. Signal transducer and activator of transcription 3 [34], TGF-β [35], IL-10 signaling pathway had been proved to play an essential role in suppressing the activity of immune cells among glioblastoma patients. Highly expressed pathways between high and low ICI score groups were shown in Figure 3D.
TMB had been regarded as a key prognostic indicator for many tumors including glioma [36,37]. And we calculated and analyzed the TMB values with many clinical features which had been proved as prognostic factors for reduction of bias [38,39]. TMB values were positively related to ICI scores ( Figure  4D). Furthermore, comparison of prognosis between high and low ICI score groups received same results in prognostic clinical factors ( Figure 5). After minimizing the in uence of selection bias, the ICI scores were still closely related to prognosis.
Finally, we built a nomogram model to evaluate the ICI score's clinical application's signi cance. Through both internal and external validation, the model showed impressive predicting ability ( Figure 6B-H). The application of both ICI score and the nomogram could be rational in prognosis-prediction and even therapy-selection.
There were still some shortages in the study. 1) Firstly, the dataset was not big enough for us to conduct PSM analysis to minimize the in uence of selection bias. We had to compared the prognosis with speci c prognostic clinical features, and calculate the TMB values.
2) The dataset of the study was collected from the TCGA database, and the results might not be applicable in some regions. More data were required to ensure the reliability of the results.

Conclusions
The ICI score, which was built according to immune-related differential genes, had been proved to be a reliable predictor for prognosis. The nomogram model based on the ICI score showed great guiding signi cance for clinical practice through multiple validations.