Investigation of Candidate Genes and Mechanism Associated with Immune Cell during the Progression of Glioblastoma based on Bioinformatics

Background: This study aimed to investigate the molecular mechanism of immune cell in�ltration during the development of glioblastoma multiforme (GBM), and explore the potential immune cell associated prognostic genes for GBM. Methods: Gene expression data and corresponding clinical data of GBM samples (tumor group) and normal samples (normal group) in TCGA-GBM and GTEx dataset were downloaded. The differentially expression analysis was performed on samples between two groups. Based on tumor immune microenvironment analysis, the immune-related RNAs (lncRNAs and mRNAs) were further explored. Then, functional enrichment analysis, ceRNA network, risk prediction model and prognosis investigation were performed. Finally, the results of survival prognosis of key genes were tested by additional datasets. Results: A total of 4989 up-regulated genes and 2349 down-regulated genes were revealed between tumor group and normal group. M2 macrophages accounted for the largest proportion of tumor in�ltrates immune cells in GBM, and was related to the prognosis of GBM patients. Totally 168 mRNAs (KIF18B) and 5 lncRNAs were related to in�ltration of M2 Macrophage, of which 25 mRNAs and 5 lncRNAs forms a ceRNA network through 37 miRNAs (eg., miR-6849-3p). These genes were mainly assembled in functions like signal release. A risk model based on 5 mRNAs (such as FOX4 and ELFN2) and lncRNA PR11-161H23.5 was constructed. Veri�cation test showed that all 5 mRNAs were signi�cantly associated with OS prognosis. Conclusions:


Background
Glioblastoma multiforme (GBM) is a common malignant neuroepithelial tumor of the central nervous system in adults [1].The survival of GBM following diagnosis is only 12-15 months [2].Typically, surgery after chemotherapy and radiation therapy are common used for the treatment of GBM [3].However, the poor effect of existing treatments on the diffusive, in ltrative, and metastatic block the clinical therapy of GBM [4].
The cancer immunotherapy are proved to be valuable in clinical trials of patients with GBM [5].Innovative novel immunotherapies for the treatment of GBM is hopeful since targeting the immune system contribute to the GBM therapy with signi cant clinical bene ts [6].Actually, the immune in ltration of tumor microenvironment followed by immunotherapy in GBM has been revealed [7].A previous study proves that T cells can in ltrate the tumor bed [8].Kmiecik et al. showed that the survival rate of patients with GBM is closed related with the expression of CD3 + and CD8 + in immune cells [9].Fortunately, the detail mechanism of immune cell in ltration of immune cells during the progression of tumor development can be revealed by the gene expression analysis [10].Based on correlation analysis of immune-related functions, immune cell in ltration abundance and patient survival, Zhao et al. showed that ADAMTSL4 gene was a novel immune-related biomarker for primary GBM multiforme [11].Based on TCGA datasets and Cell-type Identi cation by Estimating Relative Subsets of RNA Transcripts (CIBERSORT) algorithm, a recent gene expression-based study determined the relative proportions of 22 in ltrative immune cells, which closed associated with the GBM prognosis [12].However, due to the lack of detail mechanisms of immune cell-related genes in tumor, possible prediction models and subsequent data validation, the potential immune cell-related genes, as well as correlation between these genes and GBM prognosis are still unclear.
In this study, the gene expression data and clinical information of GBM tumor samples and normal tissue samples in TCGA-GBM dataset and GTEx dataset were downloaded.The differentially expression analysis was performed on samples between two groups.Then, based on tumor immune microenvironment analysis, the immune-related differentially expressed RNAs (DEGs), differentially expressed lncRNAs (DElncRNAs) and differentially expressed mRNAs (DEMs) were further investigated.
The enrichment analysis on was performed on DEMs, followed by mRNAs associated miRNAs were predicted.Moreover, the lncRNA-miRNA-mRNA (competing endogenous RNAs (ceRNA)) network study, risk prediction model and prognosis evaluation were performed based on key genes.Finally, the results of survival prognosis of key genes were tested by additional datasets.This study hoped to investigate the detail molecular mechanism of immune cell during the development of GBM, and explore the potential immune cell associated genes for prognosis of GBM patients.

Methods
The ow chart of experimental design in this study was listed in Supplementary Figure 1.

Microarray data
The RNA-seq data including corresponding clinical phenotype data of GBM samples in TCGA and GTEx were downloaded from UCSC database.A total of 166 GBM samples obtained from TCGA-GBM dataset were enrolled as tumor group.Meanwhile, 5 normal paracancerous tissue samples from TCGA-GBM dataset and 105 normal brain cortex samples from GTEx dataset were combined as normal group.
Moreover, all transcripts per million (TPM) format data, phenotypic data and survival information data of all enrolled samples were further obtained.

Differentially expression analysis
The linear regression and empirical bayesian methods [13,14] in limma package (3.40.6) of R [15] were used to explore DEGs between tumor group and normal group.The P < 0.05 and | log2 fold change (FC)| >1 were selected as the thresholds for DEGs screening.Then, the volcano plots and clustering heatmap were constructed using ggplot2 (version: 3.2.1)[16] and using pheatmap (version: 1.0.12)[17], respectively.Moreover, the DEMs and DElncRNAs from DEGs were explored by gene annotation based on the HUMAN Release 23 gene annotation information in GENCODE database.Finally, the GEPIA 2 database [18] (http://gepia2.cancer-pku.cn/#index)was used to verify the results in current differentially expression analysis.
Immune score and cytolytic activity analysis ESTIMATE (version: 1.0.13)algorithm [19] was sued to estimate the scores of stromal cells and immune cells in reactive tumors.Furthermore, the immune cytolytic score was calculated by the log-average of GZMA and PRF1 expression values in TPM.Finally, the differences of immune score and cytolytic activity between tumor and normal samples were analyzed by T test, and were visualized by Boxplot using ggpubr (version: 0.2.2) [20] software in R. Cibersort algorithm [21] was performed to reveal in ltration abundance of 22 kinds of immune cells.The calculation results of TCIA database [22] were used for veri cation.Furthermore, TIMER algorithm [23] as used to estimate in ltration abundance of 6 kinds of immune cells (B cells, CD4 T cells, CD8 T cells, Neutrophil, Macrophage and Dendritic cells).Meanwhile, the relationship between immune cells in ltration abundance and the overall survival (OS) was explored by log-rank test, followed by the Kaplan-Meier (K-M) survival curves construction using survival (version: 2.44-1.1) in R. Finally, based on the results of the two algorithms, the immune cells with different abundances between GBM and normal samples and related to the prognosis of GBM were selected as the focus of follow-up analysis.

Immune cells associated DEGs investigation
The Spearman correlation test between GBM and normal samples was performed on DEMs, DElncRNAs and in ltration abundance of immune cells using corrplot (version: 0.84) [24] in R. The P < 0.05 and correlation coe cient |r| > 0.3 were selected as the cutoff value for screening the immune associated DEMs and DElncRNAs.

Enrichment analysis of the immune associated DEMs
GO function [25] and KEGG pathway [26] enrichment analyses of immune associated DEMs and DElncRNAs were performed using the clusterPro ler (version: 3.12.0)software [27].Based on Fisher's exact test, the P value < 0.05 was considered as cut-off values of signi cant enrichment results.The results were visualized by Bubble chart and histogram.

The miRNA prediction and co-expression investigation
The lncRNA-mRNA pairs were identi ed from all DElncRNAs and DEMs by Person correlation analysis using corrplot (version: 0.84) [24] in R. The lncRNA-mRNA pairs with r > 0.5 and P < 0.01 were enrolled for the further investigation.Furthermore, the miRNAs prediction for DEMs which correlated with mRNAs were performed using miRWalk3.0software (parameters: binding probability = 0.95; binding site position = 3UTR) [28].Then, totally 4 databases including miRWalk3.0[28], miRDB [29], TargetScan [30] and MirTarBase [31] were used to explored the potential mRNAs-miRNAs relations.Finally, the mRNAs-miRNAs interactions that validated by at least 3 databases (must include MirTarBase) were enrolled for the further investigation.
The online tool LncBase Predicted v.2 was used to predict miRNAs targeting immune related DElncRNAs.The miRNA-lncRNA regulatory relations were screened with the threshold score > 0.7.The lncRNA-miRNA-mRNA interactions were screened by integrating miRNA-mRNA, miRNA-lncRNA and mRNA-lncRNA interactions, and the ceRNA network was visualized by Cytoscape software [32].

PPI network construction
According to STING database (version: 10.0) [33], the information of the protein interaction was explored, and PPI pairs among mRNAs in ceRNA network were predicted (minimum required interaction score = 0.4).Then, the results of these interactions was visualized by Cytoscape software [32].

Survival analysis
According to the OS and progression free survival (PFS) survival prognosis data of 165 GBM samples with survival information, the prognostic value of mRNAs and lncRNAs (from ceRNA network) on GBM patients were investigated.The mRNAs and lncRNAs signi cantly related to the prognosis of OS and PFS were screened by single factor Cox expression regression, and the signi cance p value and the prognosis correlation coe cient β value were obtained by log-rank test, followed by the Risk score calculation with: The corresponding risk score of each sample was calculated, and then with median risk score as the boundary, all samples were divided into high risk group or low risk group.The Kaplan-Meier (KM) [34] survival curve was used to assess the association between risk models and survival outcomes in GBM samples.Furthermore, the computational model for predicting clinical OS and PFS endpoints by the expression value of screened key genes in samples was constructed with svm package, followed by the area under curve (AUC) calculation based on receiveroperating characteristic (ROC) curve.AUC between 0.1 and 1 was used as a numerical value to evaluate the quality of the classi er intuitively.

Prognosis veri cation based on external data set
A total of 4 GBM related GEO dataset including GSE42669, GSE37418, GES7696 and GSE2817 were selected to verify the survival prognosis of the screened key genes By using SurvExpress software [35].
Furthermore, the mRNA-seq 693.RSEM-genes.20190909pro le and associated clinical data mRNAseq_693.clinical.20190909were downloaded from Co-expressed Gene Groups Analysis (CGGA) database (http://www.cgga.org.cn/index.jsp).The GBM and rGBM samples were selected from the original GEO dataset, followed by the survival information extraction including OS and Censor status.Then, these data were integrated with the expression data of the screened key genes.The samples were divided into two groups with high expression and low expression according to the median expression level, followed by the log-RANK test analysis.Finally, the result was visualized by K-M survival curve.

Differentially expression analysis
A total of 4989 up-regulated genes and 2349 down-regulated genes were revealed between tumor group and normal group.Among them, there were 4412 up-regulated and 1805 down-regulated mRNAs, as well as 155 up-regulated and 316 down-regulated lncRNAs between two groups (Supplementary Figure 2).The veri cation results based on the GEPIA2 database show that there were totally 2454 down-regulated genes and 5213 up-regulated genes.Moreover, there were 2229 intersected down-regulated genes (94.89% of GEPIA2 analysis results) and 4778 intersected up-regulated genes (95.77% of GEPIA2 analysis results), which indicating a reliable results of current differentially expression analysis.

Tumor immune score and cytolytic activity analysis
The Box plots were used to reveal the tumor immune score and cytolytic activity between tumor samples and normal samples.The result showed that the stromal score (Supplementary Figure 3A), immune score (Supplementary Figure 3B) and ESTIMATE score (Supplementary Figure 3C) in GBM group were signi cantly higher (all P < 0.05) than those in normal group.Moreover, the cytolytic activity score in GBM group was signi cantly higher (P < 0.05) than that in normal group (Supplementary Figure 3D).

Immune cell in ltration analysis based on CIBERSORT and TIMER
In the expression matrix of CIBERSORT, a total of 335 genes were included in the 547 genes of LM22 signal matrix (accounting for 61.4%).A total of 22 immune cell in ltration abundance matrices were estimated by CIBERSORT algorithm.In all 276 samples, only 66 samples were valid (P < 0.05), including 65 GBM samples and 1 normal sample (Figure 1A).Among these 66 samples, the in ltration ratio of M2 macrophages cells in GBM samples reached an average of 0.458 and a maximum of 0.759.The nding was con rmed in TCIA database that macrophages M2 cells had the highest average in ltration based on 604 samples (Figure 1B).However, in the unique normal sample (GTEX-ZYY3-3126-SM-5SI9), the in ltration ratio of M2 Macrophages was only 0.109, which was much lower than the GBM sample.Due to the small sample size, the result could not be veri ed in this data.Furthermore, the clustering heat map of 22 immune cell in ltration abundances was showed in Supplementary Figure 4.This result showed that the in ltration degree of macrophages M2 cells in GBM samples was signi cantly higher than that of other immune cells.
Based on TIMER, the immunocyte in ltration histograms of GBM vs. normal showed that the in ltration degree of Macrophage cells in GBM samples were signi cantly higher (P < 0.05) than those in normal samples (Figure 1C).Meanwhile, the Violin chart of in ltration abundance showed that except for B cells (P = 0.489) and CD4 T cells (P = 0.006), the in ltration abundance between two groups in all other cells were signi cantly different (all P < 0.01).The heatmap for GBM immune cell subsets was showed in Supplementary Figure 5. Furthermore, the ROC analysis of in ltration abundance of 6 immune cells and OS showed that Macrophage cell in ltration abundance was signi cantly related to sample survival time (Supplementary Figure 6).The lower the in ltration abundance, the longer the overall survival of the patient.Based on the results of the two algorithms, Macrophages, which had different abundance in GBM vs. normal and associated with the prognosis of GBM, were selected as the focus for the further investigation.
The GO function analysis showed that the immune-related DEMs were mainly assembled in functions like Cognition (GO:0050890; Gene: BRSK1), Signal release (GO:0099643; Gene: FOX4 etc.) and Regulation of neurotransmitter transport (GO:0051590; Gene: BRSK1) (Figure 2A).Furthermore, the KEGG pathway enrichment analysis showed that the immune cell associated DEMs were mainly enriched in pathways like Endocrine resistance (hsa01522), Viral protein interaction with cytokine and cytokine receptor (hsa04061), as well as Chemokine signaling pathway (hsa04062) (Figure 2B).

CeRNA network construction
Person correlation analysis of lncRNA-mRNA interaction revealed that totally 137 mRNAs were positively correlated with 5 lncRNAs with r > 0.5 and P < 0.01.The top-10 mRNA-lncRNA interactions according to the correlation coe cient were showed in Supplementary Table 1.The result showed that all the 10 mRNAs including RUNDC3A, BRSK1 and SPTBN2 were interacted with the same lncRNA RP11-74E22.8.Moreover, there were totally 41 mRNAs, 126 miRNAs and 140 interactions in current mRNA-miRNA relations.Furthermore, the lncRNA-miRNA interaction analysis revealed a total of 580 interactions, 5 lncRNAs and 501 miRNAs.

Risk prediction model construction based on prognostic analysis
Based on the COX univariate regression analysis, the mRNAs including SOX4, KIFC1, ELFN2, PLP2 and MEX3A, as well as lncRNA PR11-161H23.5 were related to the prognosis of PFS.Meanwhile, the K-M survival curve analysis showed that SOX4, KIFC1, ELFN2, MEX3A, and RP11-161H23.5 were signi cantly correlated with prognosis.However, the result of multivariate COX regression analysis showed that only RP11-161H23.5 was associated with prognosis (Supplementary Figure 7).Then, the risk prediction model based on the expression of SOX4, KIFC1, ELFN2, PLP2 and MEX3A and PR11-161H23.5wasused to predict the clinical end point of PFS.The result showed that the AUC of the prediction model was 0.828 with a speci city of 0.758 and a sensitivity of 0.909, which could be used to predict the clinical end point of PFS (Figure 4).
According to the median value of risk score (0.08325018), 165 samples were divided into high risk (H) groups and low risk (L) groups.The result of samples sorted by risk score was showed in the Figure 5A.Meanwhile, the scatter plot of sample grouping and PFS survival time showed a shorter survival time in H group (Figure 5B).Moreover, the K-M curve analysis showed that the PFS survival time of H group was signi cantly lower than that of L group (Figure 5C).Furthermore, the clustering heat map of 6 gene expression levels among the sample groups showed that the expression levels of SOX4, KIFC1, ELFN2, and MEX3A were lower in the L group when compared to the H group.Meanwhile, the expression of PLP2 and RP11-161H23.5 were higher in samples of L group when compared to the H group (Figure 6).The box plot analysis for 6 gene expression showed that there was signi cant difference in the expression of each key gene between two groups (Figure 7).

Prognosis veri cation based on external data sets
Since there was no PFS survival information in the external data sets (GEO and CGGA datasets), the effect of the risk model constructed in the previous step could not be veri ed.Thus, only the differences of 6 key genes and OS prognosis were enrolled for current veri cation analysis.A total of 693 samples were downloaded from the CGGA database.After screening, there were 249 GBM and rGBM samples.The analysis results showed that KIFC1, ELFN2 and PLP2 were signi cantly related to OS (Supplementary Figure 8).Furthermore, COX survival analysis was performed based on 4 GEO validation datasets.The results showed that except for RP11-161H23.5, the COX survival analysis results of the other four genes were contribute to the survival (Supplementary Figure 9).

Discussion
Although GBM-in ltrated innate immune cells are proved to be associated with the progression of disease [9], the detail mechanism of these cells and associated genes are still unclear.In this study, the bioinformatics analysis showed that compared with the normal tissues, the immune score and cytolytic activity in GBM tissues were signi cantly higher than those in normal tissues.Meanwhile, the proportion of M2 Macrophage in ltration in GBM was signi cantly higher than that in normal tissues, and was related to the prognosis of GBM patients.A total of 4989 up-and 2349 down-regulated genes were revealed between tumor group and normal group.Among them, totally 37 DEGs (KIF18B) and 5 DElncRNAs (such as lncRNA PR11-161H23.5)were related to in ltration of M2 Macrophage, and forms a ceRNA network through DEMs (such as miR-6849-3p).Signal release was one of the outstanding functions assembled by these DEGs.Furthermore, a risk model for predicting the clinical endpoint of PFS in GBM was successfully constructed based on 5 mRNAs (such as FOX4 and ELFN2) and lncRNA PR11-161H23.5.Finally, the analysis of external data sets showed that all these 5 mRNAs were signi cantly related to the OS prognosis in GBM patients.
Tumor associated macrophages are vital for the tumor microenvironment [36].M2 Macrophage, also named as macrophage anti-in ammatory phenotype, is known to secrete various cytokines and promote tumor-growth [37].A previous study proves the association between in ltration levels of M2 Macrophage and cytokines such as IL-6 in GBM [38].An investigation for the invasiveness of GBM stem cells in the absence and presence of macrophages M2 shows that M2 Macrophage may affect cancer stem cell phenotypes via cell-cell communication [39].Waldemar Debinski et al. indicated that more speci c gene markers for M2 Macrophage were needed for detail mechanism investigation of M2 Macrophage in GBM [40].Kinesin Family Member (KIF) 18B (KIF18B) gene is a member belong to the Kinesin family [41].
Previous studies shows that KIF family members such as KIF3A and KIF18A take part in the progression of different kinds of tumors [42,43].Recently, a study has proved that KIF18B is associated with the development of tumor in human [44].Moreover, the biological function of KIF family members are commonly regulated by miRNAs such as miR-127 [45].Meanwhile, some immune-related lncRNAs such as RP11-161H23.5 contributes to the prognosis prediction of GBM via certain lncRNA-miRNA interactions [46].Actually, the lncRNA-miRNA-mRNA regulation take part in the malignant progression of GBM disease identi cation and functional characterization [47].A previous differential correlation analysis of GBM reveals that immune ceRNA interactions can be used to predict the patient survival [48].In this study, the immune cell in ltration analysis showed that macrophages have the highest percentage of in ltration in GBM tissue compared to all immune-related cells, and was related to the prognosis of GBM patients.
Due to the extremely poor prognosis, there is a pressing need for an improved understanding of the molecular biomarkers for GBM [49].Based on a literature-based meta-analysis, a previous study shows that the PFS can be used as a surrogate endpoint for overall survival in GBM [50].Survival prediction in patients with GBM can be realized by certain human gene variation [51].SRY-Box Transcription Factor 4 (SOX4), a member of SOX family, is a transcription factor required for neuron survival and neurite growth [52].Li et al. indicated that SOX4 was overexpressed in diffusely in ltrating astrocytoma and conferred poor prognosis [53].The differentially expressed of SOX4 contribute to the overall survival and poor prognosis of various cancer [54].A previous study shows that SOX4 can block the cell growth and induce cell cycle arrest in GBM [55].Moreover, ELFN2 is proved to be associated with the biological function of cerebral cortex neurons [56].The evaluation of survival following melanoma brain metastases shows that ELFN2 differentially expressed between tumor samples and controls samples [57].It has been proved that ELFN2 has important signi cance for evaluating the prognosis of patients with GBM [58].In the current study, a risk model for predicting the clinical endpoint of PFS in GBM was successfully (AUC > 0.8) constructed based on M2 Macrophage related mRNAs such as FOX4 and ELFN2.Importantly, the external datasets veri cations showed that these mRNAs were signi cantly associated with OS prognosis in GBM patients.Thus, we speculated that mRNAs such as FOX4 and ELFN2 might be used as novel prognostic key genes for patients with GBM.

Conclusions
M2 Macrophage in ltration might play contribute to the GBM progression via RP11-161H23.5-miR6849-3p-KIF18BceRNA interaction.Moreover, mRNAs such as FOX4 and ELFN2 might be used as novel prognostic key genes for patients with GBM.

Figures
Figure 1 The       The risk prediction model constructed by the expression of mRNAs and lncRNAs.A total of 5 mRNAs including SOX4, KIFC1, ELFN2, PLP2 and MEX3A, as well as 1 lncRNA RP11-161H23.5 were enrolled in current prediction.The X-axis represented the speci city, while the Y-axis presented the sensitivity.
The risk prediction model constructed by the expression of mRNAs and lncRNAs.A total of 5 mRNAs including SOX4, KIFC1, ELFN2, PLP2 and MEX3A, as well as 1 lncRNA RP11-161H23.5 were enrolled in current prediction.The X-axis represented the speci city, while the Y-axis presented the sensitivity.The clustering heat map of the expression of 6 key genes between high-risk group and low-risk group.A total of 5 mRNAs including SOX4, KIFC1, ELFN2, PLP2 and MEX3A, as well as 1 lncRNA RP11-161H23.5 were enrolled in current clustering.The clustering heat map of the expression of 6 key genes between high-risk group and low-risk group.A total of 5 mRNAs including SOX4, KIFC1, ELFN2, PLP2 and MEX3A, as well as 1 lncRNA RP11-161H23.5 were enrolled in current clustering.The box plot analysis for 6 gene expression between high-risk and low-risk groups.A total of 5 mRNAs including SOX4, KIFC1, ELFN2, PLP2 and MEX3A, as well as 1 lncRNA RP11-161H23.5 were enrolled in current clustering.The X-axis represented different groups, while the Y-axis represented the expression value.
histogram and violin chart for immune cell in ltration.A, the histogram for 66 samples of immune cell in ltration abundance explored by CIBERSORT algorithm; the X-axis represented the different samples, while the Y-axis represented the relative percent.B, the histogram for 604 samples of immune cell in ltration abundance explored by TCGA-GBM dataset.The different color represented different immune cell type.C, In ltration histograms of 6 immune cells in Glioblastoma and normal samples explored by TIMER algorithm; the Y-axis represented the relative percent; the different color represented different types of immune cells.D, Violin chart for in ltration abundance difference of 6 immune cells in Glioblastoma and normal samples explored by TIMER algorithm; the X-axis represented the different abundance of immune cells, while the Y-axis represented the different types of immune cells.

Figure 1 The
Figure 1

Figure 2 The
Figure 2

Figure 2 The
Figure 2

Figure 5 The
Figure 5