Development and validation of an immune-related gene pairs signature in lower-grade glioma


 Background

Lower-grade gliomas (LGG) are the prevalent primary intracerebral malignancy tumor. Increasing evidence indicated an association between immune signature and LGG prognosis. Thus, we aim to develop an immune-related gene pairs (IRGPs) signature that can predict prognosis for LGG.
Method:

Gene expression levels and clinical information of LGG patients (LGGs) were collected from The Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas (CGGA) databases. The two databases were divided into training cohort (n = 515) and an independent validation cohort (n = 604). IGRPs significantly associated with prognosis were selected by Cox regression. Gene set enrichment analysis and filtration were performed on IGRPs.
Results

Within 1991 immune genes, an 8 IRGPs signature including 15 unique genes was constructed, which had a significant association with survival. In the validation dataset, the IRGPs signature significantly stratified LGGs into low- and high-risk groups (P < 0.001), and it remained an independent prognostic factor in univariate and multivariate analyses (P < 0.001). Additionally, 26 functional pathways were filtrated through the intersection of Gene set enrichment analysis (GSEA) and gene ontology (GO) enrichment analysis.
Conclusion

The IGRPs signature demonstrated good prognostic value in lower-grade glioma, which may provide new insights into individual treatment for glioma patients. And the IGRPs might take effect through these filtrated 26 functional pathways.


Results
Within 1991 immune genes, an 8 IRGPs signature including 15 unique genes was constructed, which had a signi cant association with survival. In the validation dataset, the IRGPs signature signi cantly strati ed LGGs into low-and high-risk groups (P < 0.001), and it remained an independent prognostic factor in univariate and multivariate analyses (P < 0.001). Additionally, 26 functional pathways were ltrated through the intersection of Gene set enrichment analysis (GSEA) and gene ontology (GO) enrichment analysis.

Conclusion
The IGRPs signature demonstrated good prognostic value in lower-grade glioma, which may provide new insights into individual treatment for glioma patients. And the IGRPs might take effect through these ltrated 26 functional pathways.

Background
Lower-grade gliomas (LGG), consisting of the World Health Organization (WHO) grades II and III, are the prevalent primary intracerebral malignancy tumor [1]. To date, surgical resection combined with postoperative chemoradiotherapy is the standard treatment for LGG. Although a lot of effort has been put forth to improve the clinical outcome, more than half of the LGGs experience recurrence or progress to high-grade glioma over time [2]. What's more, in addition to conventional surgical therapy, LGGs has limited sensitivity to chemotherapy and radiation [3,4]. Thus, there may exist other prognostic factors for LGG. Several biomarkers, including co-deletion of chromosome arms 1p and 19q (1p/19q co-deletion), O-6-methylguanine-DNA methyltransferase (MGMT) methylation and isocitrate dehydrogenase (IDH) mutation have been added to the 2016 WHO classi cation, to clarify the histological features and to guide the therapeutic plan [5][6][7][8]. However, these widely recognized biomarkers cannot properly evaluate individual risk strati cation in LGG.
Tumor microenvironment (TME) has drawn a lot of attention as a vital cellular milieu for its pivotal roles in tumor initiation, progression, and metastasis. It consists of mesenchymal cells, in ammatory mediators endothelial cells, as well as immune cells and stromal cells [9]. Additionally, immune cells and stromal cells are two main non-tumor components, which have great importance in the diagnosis and prognosis of tumors [10]. A prevalent algorithm called ESTIMATE designed by Yoshihara et al. has been used to evaluate immune cells and stromal cells in many malignant tumors [11][12][13]. In this study, we also employed this algorithm for further research.
Glioma immune microenvironment matters in glioma biology [14]. Immune-checkpoint inhibitors and immunotherapy play a part in improving gliomas prognosis [15,16]. As to prognosis biomarkers, researchers have explored the possibility of stratifying patients with malignant tumors (including gliomas) based on gene expression pro les and have constructed a multigene-expression model that can be applied to stratify high-risk subgroup [17][18][19][20][21][22]. One common drawback of the above studies is that gene expression pro les require suitable normalization. There exists an inevitable deviation owing to technical biases across sequencing platforms and biological heterogeneity [23]. Thus, researchers have invented the conception of immune-related gene pairs, which can eliminate the disadvantages of data processing and is considered to be a robust prognostic indicator in some tumors [24][25][26][27][28][29].
However, its prognostic value has not been estimated in lower-grade glioma. In our study, we constructed and validated a personal prognostic indicator by integrating immune-related gene pairs for lower-grade glioma.

Materials And Methods
Acquisition of LGG expression pro les from TCGA and CGGA datasets

Acquisition of immune-related genes
We downloaded a comprehensive list of immune-related genes (IRGs) from the Immunology Database and Analysis Portal (ImmPort) database (https://www.immport.org). Then, the limma package of R software was utilized to extract the RNA expression pro les of IRGs (determined by median absolute deviation, MAD > 0.5) [30]. We select co-expressed IRGs both in TCGA and CGGA databases for further study.
Establishment of prognostic signature based on the immune-related genes pairs A pairwise comparison was played between the immune-related gene expression levels in each sample to acquire a score for each IRGP. An IRGP was counted as follows: (1) if IRG 1 > IRG 2, IRGP = 1; (2) if IRG1 < IRG 2, IRGP = 0. The superiority of analyzing genes in a pairwise method is that it eliminates the need for normalization steps for individualized analysis [29]. If the score of an IRGP was 0 or 1 in more than 80% of the samples of the TCGA dataset or the CGGA dataset, then we discarded the IRGP. Then we apply Least absolute shrinkage and selection operator (LASSO) Cox regression (iteration = 1000) to generate an IRGP index (IRGPI) by using R package glmnet. In order to stratify patients into high and low-risk groups, the optimal cut-off value of the IRGPI was determined using receiver operating characteristic (ROC) curve analysis at 3-year overall survival in the TCGA database.

Validation of the IRGPI
We further evaluated the IRGPI model in the CGGA database of LGG samples by the log-rank test.
Additionally, we assessed the IRGPI together with other clinical variables in the univariate and multivariate cox proportional hazard regression model in TCGA and CGGA datasets.

Pro ling of in ltrating immune cells
To clarify immune cell in ltration in a different risk group, we used CIBERSORT [31], a popular algorithm for exploring cell composition of tumor tissues from their gene expression pro les. Gene expression pro le was uploaded to the online analytical platform CIBERSORT web portal (http://cibersort.stanford.edu), using the default signature matrix at 1,000 permutations. Then, The pvalues were calculated by using 'wilcox.test' function in R.

Gene set enrichment analysis and the ltration of the results
To understand the biological functions of this immune-related prognostic signature, we employed gene set enrichment analysis (GSEA/MSigDB, http://software.broadinstitute.org/gsea/downloads.jsp). FalseDiscovery Rate-adjusted P < 0.05 was considered statistically signi cant. To make a comprehensive understanding of some biological functions, we lter the outcome of GESA from the perspective of the tumor microenvironment. Scores of immune and stromal were calculated using the ESTIMATE algorithm [11] , immune scores and stromal scores were strati ed into low-level groups and high-level groups according to the median value, respectively. Then we acquired the intersected differentially expressed genes (DEGs) among stromal scores and immune scores, which exhibited by the VennDiagram package [32]. Additionally, we conducted gene ontology (GO) enrichment analysis based on the DEGs.
Finally, we took an intersection between GO and GSEA by using the Venn Diagram package.

Result
Prognostic IRGPs signature construction To make our investigation process more explicit, the entire work ow is illustrated in Fig. 1. A total of 1,119 patients with LGG were included in the analysis (Additional le 1: Table S1). TCGA cohort was used as the training dataset. Altogether, 1991 immune-related genes (IRGs) were downloaded from the ImmPort database (accessed on 10/5/2020). 372 IRGs were measured both in TCGA and CGGA datasets (MAD > 0.5).
From these 372 IRGs, 15,245 IRGPs were constructed. After discarding IRGPs with relatively small variation (MAD = 0), 22 IRGPs were left and chosen as initial candidate IRGPs. Then we de ned IRGPI by Lasso Cox proportional hazards regression on the training set and selected 8 IRGPs in the nal model. The IRGPI consisted of 15 unique IRGs ( Table 1). The optimal cut-off value of the IRGPI for prediction of LGG prognosis was determined to be -0.007 using the ROC curve analysis (Fig. 2). The IRGPI signi cantly strati ed patients into low-and high-risk groups in terms of overall survival (OS). We found the high-risk group had a signi cantly more inferior OS than the low-risk group in the TCGA cohort (P < 0.001, Fig. 3a). To further investigate the prognostic value of the IRGPI, we applied uni-and multivariate Cox regression analyses to the training cohort. After combining with the clinical factors such as age, sex, and tumor grade. The risk score of IRGPI remained an independent prognostic factor (P < 0.001, HR, 2.467 [2.003, 3.037]; Fig. 3c, e). Validation of the IRGPI for survival prediction To con rm whether the IRGPI had consistent prognostic value in different populations, we applied the IRGPI signature to the CGGA database (n = 604) as external validation. Unsurprising, On the CGGA cohort, the high-risk group had a signi cantly shorter OS than the low-risk patients (P < 0.001, Fig. 3b). Additionally, it also remained an independent prognostic factor when taking clinical factors such as age, sex, and tumor grade into consideration (P < 0.001, HR, 1.873 [1.653, 2.122]; Fig. 3d, f).

Immune cells in ltration between different risk groups
To gain a better insight into the difference of immune cells in ltration between the low-risk and high-risk groups, we carried out immune in ltration analysis. Figure 4a presented a comparative summary of the CIBERSORT outcome resulting from the two risk groups. We found Macrophage M1 and T cells CD8 were signi cantly highly expressed in the high-risk group (P = 1.757e-04, Fig. 4b; P = 0.002, Fig. 4d), while Monocytes was signi cantly highly expressed in the low-risk group (P = 5.749e-05, Fig. 4c). What's more, Macrophage M0 and T cells CD4 memory activated were also highly expressed in the high-risk group in our study (P = 0.004, Additional le 3: Fig S1a; P = 0.033, Additional le 3: Fig S1b).
We carry out GSEA between low-and high-risk groups to explore the pathways that were signi cantly altered. And we found 299 pathways were signi cantly different(Additional le 2: Table S2). Then we selected the 50 minimum adjusted Ppathways for ltration. According to statistics, 1199 up-regulated DEGs and 296 down-regulated DEGs were selected in the stromal score group. While 965 up-regulated DEGs and 692 down-regulated DEGs were selected in the immune score group. Then, 887 up-regulated intersected genes and 277 down-regulated intersected genes were revealed in Venn plots (Fig. 5a,b). The gene ontology (GO) analysis of these intersected genes was exhibited in Fig. 5c. Finally, we took an intersection between GO and GSEA. As a result, 26 functional pathways were ltrated (Fig. 6, Table 2).

Discussion
In the present study, we constructed an immune-related gene pairs signature and validated this signature in the external dataset. Our research showed the signature could stratify patients into low-and high-risk groups. Uni-and multivariate Cox regression analyses showed that the risk score was an independent prognostic factor. The prognostic signature related to tumor immune microenvironment may disclose a new perspective to develop a novel predictive biomarker and improve LGG patient management in the era of immunotherapy.
Similar to other studies, several immune cells distributed signi cantly differently in our research. For example, we found T cells CD8 and Macrophage M0 are signi cantly enriched in the high-risk group, Wu et al.' s research [27] also have the same trend, though was not statistically signi cant. While the opposite view indicated T cells CD8 was signi cantly higher in the low-risk group [28]. This phenomenon may because the selected immune-related gene pairs and tumor types are different from each other in the above researches. Thus it needs further exploration. Additionally, we ltrated 26 functional pathways from the intersection between GO and GSEA, and this may indicate these immune-related gene pairs take effect through these functions.
Despite a better prognosis for LGG patients (LGGs), 70% of them inevitably progress to high-grade aggressive gliomas and eventually result in death [33]. Over the past 30 years, the overall survival of the majority LGGs had not been signi cantly improved [2]. Thus, it is vital to develop an individualized treatment protocol for LGG. Reliable prognostic biomarkers could identify patients with poor outcomes and who might bene t from intensive therapy. Recently, signi cant research on immune-related genes (IRGs) expression has shown great prognostic value in LGG. For example, Deng et al. [34] found up to 397 IRGs were signi cantly associated with LGG survival. Zhang et al. [21] established a risk model based on 6 IRGs (CANX, HSPA1B, KLRC2, PSMC6, RFXAP, and TAP1). However, one common disadvantage of all the above assays lies in standardizing gene expression pro les, which is di cult to avoid. The immunerelated gene pairs signature depends on relative ranking and paired comparison of gene expression pro le within a sample. This strategy is in favor of dealing with combined gene expression pro les from multiple databases.
LGGs can be further divided into subgroups with different survival outcomes by this prognostic immune signature. So our signature can evaluate LGGs prognosis in a single-sample, individualized form.
Our 8-IRGP signature consists of 15 immune-related genes, and these genes play a vital role in the regulation of the immune microenvironment. Bone morphogenetic protein 2 (MP2) is known to facilitate differentiation and growth inhibition in glioma through the downregulation of both O6-methylguanine-DNA methyltransferase (MGMT) and hypoxia-inducible factor-1a (HIF-1a) [35]. It was reported that killer cell lectin-like receptor subfamily C, member 2 (KLRC2), a transmembrane activating receptor of natural killer cells, expressed higher in glioma-initiating cells compared with neural stem cell lines and normal adult brain tissue [36]. Emerging evidence suggests that the chemokine (C-X-C motif) ligand 12 (CXCL12), which is secreted from glioma stem cells (GSCs), plays a vital role in tumorigenesis and proliferation [37]. Additionally, many of the selected 26 functional pathways are related to the immune microenvironment, such as regulation of immune effector process, positive regulation of cytokine production, and cell chemotaxis. Researches had shown that numerous cytokines and chemokines promote the in ltration of various cells: circulating progenitor cells, endothelial cells, and a range of immune cells such as peripheral macrophages, microglia, leukocytes, CD4 + T cells into the glioma [38][39][40]. These nonneoplastic cells have an important role in tumor growth, metastasis, as well as response to treatment [14]. Thus these functional pathways matter in glioma immune microenvironment.
There are also some limitations to this study. First, the immune-related gene expression pro les are determined by RNA-seq, which is hard to generalize in daily clinical applications for its sophisticated detection program and high price. Second, the immune signature comes from fresh frozen samples.
Thus, there are still doubts about the e ciency and stability. Finally, our study was a retrospective analysis, and prospective research is needed to validate the results.

Conclusion
Taken together, we systematically estimated the prognostic value of the new IRGP prognostic model, which may provide a legitimate approach in LGG management. And this may become a valuable predictive tool to identify patients who may bene t from immunotherapy. Additionally, we ltrated 26 functional pathways from the intersection between GO and GSEA, indicating these immune-related gene pairs may take effect through these functions.

Declarations
Ethics approval and consent to participate Ethical approval was waived by the local Ethics Committee of Ningxia Medical University for the data of patients are downloaded from public databases.

Consent for publication
Not applicable Availability of data and materials The datasets generated and analysed during the current study are available in the TCGA (https://portal.gdc.cancer.gov) and CGGA (http://www.cgga.org.cn) repositories.

Competing interests
The authors declare that they have no competing interests.  The work ow describes the construction of our investigation process. The TCGA data were assigned to a training dataset, and the training dataset was used to construct immune-related gene pair signatures. The CGGA datasets were used to validate the IRGPI. The 26 functional pathways were ltrated by the intersection of GO and GSEA in TCGA dataset.  Time-dependent ROC curve for IRGPI in the TCGA dataset. IRGPI score of −0.007 was used as the optimal cut-off for IRGPI to stratify patients into low-or high-risk groups.