Malignant tumor purity reveals the correlation between CD3E and low grade glioma microenvironment

Background: The tumor microenvironment (TME) contributes to the initiation and progression of low grade glioma (LGG); however, the specifics of the LGG TME remain unclear. Methods: We selected 161 LGG patients from the Cancer Genome Atlas (TCGA) and calculated the percentage of tumor infiltrating immune cells (TICs) and the tumor purity of LGG using the ESTIMATE and CIBERSORT methods. Immune-related genes were screened through Cox regression and a protein-protein interaction (PPI) network. Data from the Gene Expression Omnibus (GEO) were used to screen clinically relevant genes. After combining the two gene sets, CD3E was selected as the predictor. Finally, we conducted verification at the Affiliated Hospital of YouJiang Medical University for Nationalities (AHYMUN) center. Results: We found that the higher the expression of CD3E, the lower the purity of LGG tumors and the worse the prognosis of patients. Gene Set Enrichment Analysis (GSEA) showed that genes in the high-expressing CD3E group are mainly involved in immune-related activities. This suggests that CD3E may be responsible for regulating LGG's TME and tumor purity. Conclusion: has and biological may help evaluate and Evaluating purity may provide additional into the complex role LGG microenvironment and clinical


Background
Because the gene regulation and carcinogenesis of gliocytes are not comprehensively understood, the treatment and prognosis of gliomas remain relatively limited [1,2]. In clinical practice, gliomas are generally divided into four grades, and low grade glioma (LGG) comprises grades I and II [3]. A large number of clinical studies have found that the survival rate of LGG patients is not high, and many patients have a sharp decline in survival time due to tumor deterioration in the later stage [4]. Moreover, the high recurrence and malignancy rates of LGG still bring great pain to patients [5,6]. Approaches that maintain the quality of life of LGG patients while prolonging overall survival (OS) have become a common concern for clinicians and researchers [7].
Recent research finds that the tumor microenvironment (TME) facilitates the development of tumors [8]. The interaction between cancer cells, stromal cells and immune cells recruited from a distance promotes the invasion and metastasis of a variety of cancers through proliferation, anti-apoptosis, and evasion of immune surveillance, thereby significantly affecting the treatment and prognosis of cancer patients [9,10]. The TME is mainly composed of resident stromal cells and recruited immune cells [11]. Stromal cells and immune cells affect tumor blood vessel growth and tumor proliferation, respectively.
Meanwhile, tumor-infiltrating immune cells (TICs) in the TME can be used to determine the prognosis of patients [12], and the related immune genes have an impact on the survival of cancer patients. For example, immune genes affect brain tumors [13,14]. This correlation has led to improvements in immune-based treatment methods, including immune checkpoint inhibitors and prognostic biomarkers for tumor patients [15][16][17]. These studies suggest that the various immune responses of the LGG TME may change the purity of the tumor, thereby affecting the invasion and metastasis of LGG. A previous study found a deep connection between LGG and the TME: the higher the stromal and immune scores of LGG, the lower the purity of the tumor and the more aggressive it is. Low glioma purity shows a strong immunophenotype and suggests a poor prognosis [18]. Thus, clinicians and basic researchers need to identify tumor purity measures that accurately reflect LGG heterogeneity and the complex role of the microenvironment, which may also help to explore novel biomarkers of LGG.
We selected 161 LGG patients from the Cancer Genome Atlas (TCGA), calculated the percentage of tumor infiltrating immune cells (TICs) and the tumor purity of LGG, as well as the ratio of immune and matrix components, using the ESTIMATE and CIBERSORT methods, and screened samples from the Gene Expression Omnibus (GEO). LGG genes associated with prognosis were identified and the predictive biomarker CD3E was found. The T cell antigen receptor epsilon subunit (CD3E) gene is located at 11q23.3, is composed of 9 exons, and is associated with the autosomal recessive early-onset immunodeficiency 18 phenotype, which is a severe combined immunodeficiency variant [19]. Moreover, CD3E is overexpressed in certain solid tumors and is associated with immunity [20,21]. We started from the differentially expressed genes (DEGs) produced by comparing immune and matrix components in LGG samples, and revealed that CD3E was a potential indicator of TME status changes in LGG.
In the discovery step, we selected only data sets that include both LGG tissue and normal brain tissue. The titles and abstracts of these data sets were screened, and all information on the data sets of interest was further evaluated. Finally, three data sets were selected for analysis: GSE107850 on GPL14951, GSE26576 on GPL6801 and GPL570, and GSE20395 on GPL9183. All data sets were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo) [23].

Score calculation
We used the estimate package in R [24] (version 4.0.0) to estimate the proportion of TME immune cells and stromal cells in each LGG sample, and we set ImmuneScore, StromalScore and ESTIMATEScore according to the proportion of the corresponding cells in the TME.

Survival analysis
This study included 161 patients from the TCGA database, 459 patients from the GEO database and 100 patients from the AHYMUN database. Survival analysis was performed in R, and p < 0.05 was considered significant.
We performed univariate Cox analysis on the clinical data of patients at the Affiliated Hospital of YouJiang Medical University for Nationalities (AHYMUN) center to evaluate all factors that may affect the OS and disease-free survival (DFS) of LGG patients, including age, gender, epilepsy history, Karnofsky score, tumor envelope infiltration, CD3E expression, etc.

Screening for prognosis-related differentially expressed genes (DEGs)
Using "LIMMA" [25] in R software, the data were standardized and miRNA differential expression analysis was performed.
The DEGs between LGG samples and normal brain tissue samples were analyzed in R with the limma software package. P value < 0.05 and |fold change (FC)| > 1 were set as the thresholds for identifying clinical-related DEGs.

Screening for Immune-related DEGs
According to the medians of the ImmuneScore and StromalScore we calculated, the 161 LGG samples in the TCGA database were marked as high or low. We used the limma package to conduct differential analysis of gene expression and generated immune-related DEGs by comparing high and low score samples. Immune-related DEGs (high/low score group) with a false discovery rate < 0.05 and a fold change greater than 1 after log2 conversion were considered significant. We calculated the TIC value in all LGG data by the CIBERSORT method, and only samples with P < 0.05 were analyzed further.
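The grouping and thresholding described above can be illustrated with a minimal Python sketch. It is not the authors' R/limma code: the sample names, gene labels and values are hypothetical, the function names are invented for illustration, and assigning samples that sit exactly on the median to the low group is an assumption the text does not specify.

```python
from statistics import median


def split_by_median(scores):
    """Label each sample 'high' or 'low' relative to the median score.

    Assumption: samples equal to the median go to the 'low' group.
    """
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


def filter_degs(results, fdr_cutoff=0.05, log2fc_cutoff=1.0):
    """Keep genes with FDR < fdr_cutoff and |log2 fold change| > log2fc_cutoff."""
    return {
        gene: r
        for gene, r in results.items()
        if r["fdr"] < fdr_cutoff and abs(r["log2fc"]) > log2fc_cutoff
    }


# Hypothetical ImmuneScore values for four samples.
immune_scores = {"S1": 1200.0, "S2": -300.0, "S3": 450.0, "S4": 800.0}
groups = split_by_median(immune_scores)

# Hypothetical high-vs-low differential expression results.
results = {
    "GENE_A": {"log2fc": 2.1, "fdr": 0.001},
    "GENE_B": {"log2fc": -1.4, "fdr": 0.03},
    "GENE_C": {"log2fc": 0.4, "fdr": 0.001},  # effect size too small
    "GENE_D": {"log2fc": 3.0, "fdr": 0.2},    # not significant
}
degs = filter_degs(results)
print(groups)
print(sorted(degs))
```

Both up-regulated and down-regulated genes pass the filter because the threshold is applied to the absolute log2 fold change.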

Bioinformatics Analysis
The protein-protein interaction (PPI) network was constructed from the STRING database. All gene interaction networks were drawn with Cytoscape (version 3.8.0) [26]. We performed gene ontology (GO) enrichment analysis of DEGs in R, and determined the biological processes (BPs), cellular components (CCs) and molecular functions (MFs) of each gene. We also performed Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to show enrichment for related genes.

Gene Set Enrichment Analysis (GSEA)
We used GSEA software (version 4.0.3) to analyze the entire transcriptome of all tumor samples [27], and only gene sets with p < 0.05 were considered significant.

Immunohistochemistry
The immunohistochemistry streptavidin peroxidase method was used to detect the expression of CD3E in LGG and nearby normal tissues. The LGG samples were scored according to the degree of cell staining: 0, cytoplasmic yellow particles; 1, light brown particles; 2, obvious brown particles; 3, a large number of dark brown particles. The LGG samples were also scored according to the percentage of positive cells: 0 points, 0%; 1 point, <10%; 2 points, 11%-50%; 3 points, 51%-80%; 4 points, >80%. The final IHC score was calculated by multiplying the two scores [28].
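As an illustration of the scoring rule, the following Python sketch multiplies the staining intensity score by the positive-cell percentage score. The function names are hypothetical, and the handling of boundary values (for example exactly 10%, which falls between the stated "<10%" and "11%-50%" bins) is an assumption.

```python
def percentage_score(percent_positive):
    """Map the percentage of positive cells (0-100) to a 0-4 score.

    Assumption: boundary values fall into the lower bin (10% scores 1).
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive <= 10:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 80:
        return 3
    return 4


def ihc_score(intensity, percent_positive):
    """Final IHC score = staining intensity score (0-3) x percentage score (0-4)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return intensity * percentage_score(percent_positive)


# Obvious brown particles (intensity 2) in 60% of cells.
example_score = ihc_score(2, 60)
print(example_score)
```

Under this scheme the final score ranges from 0 to 12.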

Results
As shown in Fig. 1, our research is divided into three stages. To estimate the proportion of TICs in LGG samples and tumor purity, transcriptome RNA-seq data from 516 patients were downloaded from TCGA; then the ESTIMATE and CIBERSORT algorithms were applied. DEGs shared by ImmuneScore and StromalScore were used to construct a PPI network. Significant hub genes in the PPI network were evaluated using univariate Cox regression cross-analysis. Meanwhile, we selected qualified data sets from the GEO database and conducted a differential analysis to obtain clinical-related DEGs; the association between all the DEGs and the survival of LGG patients was then evaluated and screened. Next, CD3E was identified and validated as the most relevant gene after combining the two sets of DEGs. Further studies focused on the impact of CD3E on survival, GSEA and correlation with TICs. Functional annotations of neighbor genes and clinical validation of CD3E were elaborated. Finally, we tested the research conclusions in a clinical cohort study at our own AHYMUN center.

TME-related scores are related to survival of LGG patients
To confirm whether the proportion of cells in the TME and tumor purity affect the survival time of LGG patients, we calculated ImmuneScore, StromalScore and ESTIMATEScore, and drew Kaplan-Meier survival curves. The higher the score, the higher the proportion of the corresponding component in the TME.
The sum of ImmuneScore and StromalScore is ESTIMATEScore, which also indirectly reflects tumor purity. In Fig. 2, TME scores are related to overall survival. ImmuneScore (P = 0.003), StromalScore (P < 0.001) and ESTIMATEScore (P = 0.006) were positively correlated with OS. These results suggest that the prognosis of LGG patients can be inferred from the proportion of immune cells in the TME, which may help in formulating personalized treatment plans.
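For readers unfamiliar with the Kaplan-Meier curves used in this comparison, a minimal Python sketch of the product-limit estimator follows. The survival analysis in this study was run in R; the function name, follow-up times and event indicators below are hypothetical and serve only to show how each step of the curve is computed.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up (months) for five patients; 0 marks censoring.
km_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
for t, s in km_curve:
    print(t, round(s, 3))
```

Computing one curve per group (for example high and low ImmuneScore) and comparing them with a log-rank test would mirror the comparison reported in Fig. 2; the log-rank test itself is not shown here.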

TME-related scores are related to the Clinical features of LGG Patients
We combined the corresponding clinical information of TCGA's LGG patients with the scores calculated above to determine whether the TME and tumor purity of LGG are related to the patients' clinical characteristics. ImmuneScore was positively correlated with high grade of LGG (Fig. 3C, P < 0.001); StromalScore was positively correlated with high grade of LGG (Fig. 3F, P < 0.001), and ESTIMATEScore was associated with high grade of LGG (Fig. 3I, P < 0.001). These results indicate that tumor purity and the ratio of immune/stromal cells in the TME are related to the deterioration of LGG. The higher the ratio of immune/stromal cells in the TME, the lower the purity of the tumor, and the worse the prognosis of LGG patients.

The Enrichment Analyses of Immune-Related DEGs
To determine the exact changes in the genetic profiles of immune and matrix components in the TME, we compared high- and low-scoring samples based on the median. We obtained 297 DEGs through ImmuneScore, of which 201 genes were upregulated and 96 genes were downregulated (Fig. 4A, C, D). We also obtained 518 DEGs from StromalScore, comprising 461 upregulated genes and 57 downregulated genes (Fig. 4B-4D). A Venn diagram showed that 199 upregulated genes with high scores and 19 downregulated genes with low scores were shared between ImmuneScore and StromalScore. These 218 immune-related DEGs may play a decisive role in the TME of LGG. GO enrichment analysis and KEGG analysis showed that the biological functions of these genes are mainly related to immunity (Fig. 4E-F).

Identify key Immune-Related genes
To further study the underlying mechanism of the above genes and find the key genes, we drew the PPI network diagram with STRING. The interaction between the genes is shown in Fig. 5A. We selected the top 30 genes ranked by the number of adjacent nodes and plotted them as a bar graph (Fig. 5B). We performed univariate Cox regression analysis of the immune-related DEGs against the survival of LGG patients to determine which genes are high risk and which are low risk for LGG patients (Fig. 5C). Finally, we intersected the main nodes in the PPI network with the top 75 genes ranked by p value and obtained 30 intersecting genes (Fig. 5D).
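The hub-gene selection can be sketched in Python as ranking genes by their number of network neighbours and intersecting the top of that list with the genes ranked best by Cox p value. This is an illustrative sketch only: the edge list, p values, gene labels and function names are hypothetical, and breaking ties alphabetically is an assumption.

```python
from collections import Counter


def top_by_degree(edges, k):
    """Return the k genes with the most interaction partners in an edge list."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Sort by descending degree; ties are broken alphabetically (assumption).
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:k]


def top_by_pvalue(pvalues, k):
    """Return the k genes with the smallest univariate Cox p values."""
    return sorted(pvalues, key=lambda g: (pvalues[g], g))[:k]


# Hypothetical PPI edges and Cox p values for five genes.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
cox_p = {"G1": 0.0001, "G2": 0.2, "G3": 0.003, "G4": 0.04, "G5": 0.0005}

hubs = top_by_degree(edges, 3)
prognostic = top_by_pvalue(cox_p, 3)
key_genes = sorted(set(hubs) & set(prognostic))
print(key_genes)
```

With k set to 30 for the network ranking and 75 for the Cox ranking, the same intersection would correspond to the selection shown in Fig. 5D.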

Filter clinical-related DEGs and Lock the Target Gene
We used R packages to screen all the genes that affect survival in the three GSE sets. We screened 114 clinical-related DEGs (P < 0.001) that were significantly related to survival from 13299 related genes, and compared them with the previous immune-related DEGs to obtain 7 genes: CD3E, TLR2, CCR5, CXCL9, CXCL10, FCGR2A, and ITGAL (Fig. 5E). We mapped the PPI network for these 7 genes (Fig. 5F). 78.89% of terms were in co-expression (lavender line), 7.65% of terms were shared protein domains (yellow line), 7.11% of terms were in co-localization (deep blue line), and 7.11% of terms were predicted (khaki line). We also performed GO and KEGG pathway analyses on these 7 genes, finding that the genes were related to immune diseases and inflammatory response (Fig. 5G). Based on the hazard ratio (HR) value of each gene and the survival-related p value, we targeted CD3E for further study.

Identification of Clinical-Related DEGs
According to the median of CD3E expression in the samples, we divided the data set into high and low expression groups and screened using "log fold change = 1, and P < 0.05". A total of 114 related differential genes were obtained. The 15 genes with the most significant up-regulation and the 11 genes with the most significant down-regulation were selected for further analysis (Table 1) and visualized by volcano map (Fig. 6A) and heat map (Fig. 6B). As illustrated in Fig. 6C, gene-gene interaction analysis between clinical-related DEGs and related genes was performed. 95.20% of terms were in co-expression (lavender line), and 4.80% of terms were in co-localization (deep blue line). In Fig. 6D-6F, we conducted a biological function enrichment analysis of DEGs. The results showed that enrichments of biological processes were positive regulation of voltage-gated potassium channel activity, positive regulation of potassium ion transmembrane transporter activity and regulation of pri-miRNA transcription by RNA polymerase II (Fig. 6D); enrichments of cellular components were ion glutamatergic synapse, apical plasma membrane and apical part of cell (Fig. 6E); enrichments of molecular functions were oxidoreductase activity, calmodulin binding and copper ion binding (Fig. 6F). Enrichments in KEGG pathways were glioma, tyrosine metabolism and citrate cycle (Fig. 6G).
We correlated the 20 most significantly up-regulated genes and the 20 most significantly down-regulated genes with CD3E. Red represents a positive correlation, and green represents a negative correlation. The deeper the color, the stronger the correlation. CD3E is positively correlated with LILRB4, UPK1A, and REM1, and negatively correlated with RIT2, OGDHL, and KCNC2 (Fig. 6H).

CD3E Expression is Negatively Related to the Survival of LGG Patients
CD3E is the epsilon subunit of the T cell antigen receptor. According to the median of CD3E expression, all LGG samples were divided into CD3E high, median and low expression groups. Survival analysis showed that in TCGA (P = 0.0011; Fig. 7A) and GSE (P < 0.001; Fig. 7B), the survival rate of LGG patients with high CD3E expression was lower than that of patients with low CD3E expression. Similarly, in GEPIA, the OS of the CD3E high expression group was lower than that of the low expression group (P < 0.001; Fig. 7C).

CD3E is a Potential Indicator of TME Modulation
Considering that CD3E expression is negatively correlated with the survival rate of LGG patients, we performed GSEA on the high expression group. We found that the genes in the CD3E high expression group mainly participated in immune-related activities, such as the B cell receptor signaling pathway, chemokine signaling pathway and T cell receptor signaling pathway (Fig. 7D). Furthermore, CD3E was positively related to glioma and immune cell response. These results suggest that CD3E may be a potential indicator of TME status for LGG.

Correlation of CD3E With the Proportion of TICs
We used the CIBERSORT algorithm to analyze the proportions of 22 types of immune cells among the TICs in LGG to further study the correlation between CD3E and the immune microenvironment of LGG (Fig. 8). We found that the expression of CD3E is related to 10 kinds of TICs in LGG (Fig. 9). Seven kinds of TICs were positively correlated with CD3E expression, including macrophages M0, macrophages M1, mast cells resting, NK cells resting, T cells CD4 memory activated, T cells CD8 and T cells regulatory; three kinds were negatively correlated with CD3E expression, including eosinophils, monocytes and NK cells activated. These results indicate that CD3E is related to the immune activity of the TME, thereby affecting the tumor purity of LGG.
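A correlation between CD3E expression and the estimated proportion of one immune cell type can be computed as in the minimal Python sketch below. The text does not state which correlation coefficient was used; Spearman's rank correlation is assumed here for illustration, and the expression values, cell fractions and function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical CD3E expression and CD8 T cell fractions for five samples.
cd3e_expression = [1.0, 2.5, 3.1, 4.8, 6.0]
cd8_fraction = [0.02, 0.05, 0.04, 0.09, 0.12]
rho = spearman(cd3e_expression, cd8_fraction)
print(round(rho, 3))
```

A positive rho corresponds to cell types reported as positively correlated with CD3E, and a negative rho to those reported as negatively correlated; a significance test would still be needed before drawing conclusions.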

Clinicopathological features related to CD3E expression
To verify CD3E expression in LGG, we performed immunohistochemistry (IHC) (Fig. 10A-10B). The scatter plot of the IHC scores revealed that CD3E expression was increased in LGG tissues in the AHYMUN cohort (P < 0.01). As shown in Table 2, higher CD3E expression was associated with patients' age (P = 0.027), grade (P < 0.001), microvascular invasion (P = 0.009), history of epilepsy (P < 0.001) and Karnofsky score (P = 0.002). This seems to indicate that the higher the expression of CD3E in patients, the worse the prognosis. We used univariate Cox regression analysis to show the relationship between CD3E and AHYMUN patients, and found that CD3E is not very relevant to age and gender (Fig. 10C). In the multivariate model, we also found that patients in the high expression group had worse OS (HR = 3.22; P = 0.001). Moreover, in the AHYMUN cohort, microvascular invasion (HR = 1.52; P = 0.024), the presence of capsular infiltration (HR = 1.63; P = 0.016), and Karnofsky score (ref < 80) (HR = 1.46; P = 0.023) were associated with low OS (Table 3). We found that the patient's gender and epilepsy history were not related to DFS (Fig. 10D). Through multivariate Cox analysis, we found that high expression of the CD3E gene was associated with a significant decrease in OS (HR = 4.33; P < 0.001) (Table 3). Grade, capsular infiltration, microvascular invasion and Karnofsky score were also related to OS (P < 0.05). In Fig. 10E-F, the higher the CD3E expression level, the lower the OS and DFS of LGG patients.

Discussion
In our study, we first screened the immune genes related to the TME in LGG patients from TCGA. Next, we screened genes related to the prognosis of LGG patients from GEO. After combining these genes, we determined that CD3E is the target gene. Then we conducted a series of bioinformatics analyses and verified the research results at our own center. We found that CD3E may be an indicator gene of the TME status of LGG patients and that, by affecting the TME of LGG, it may change tumor purity and affect the prognosis of patients.
The combination of the cancer cell genotype and the expression program related to the cell phenotype and the influence of the TME determines the tumor's adaptability, evolution, and resistance to treatment [29]. In recent years, studies on TCGA and GSE have mapped the genetic picture and overall expression status of numerous tumors, identified driver mutations and defined tumor subtypes based on specific transcription profiles [30,31].
LGG is a common brain tumor, and the prognosis of patients is often poor [32]. However, neither surgery, radiation therapy nor chemotherapy (usually using temozolomide) can improve the prognosis and survival of patients [7,33,34]. The reasons for the lack of progress include the invasive growth of tumors in an essential organ, which limits the utility of local therapies; the protection of tumor cells by the blood-brain barrier; their inherent resistance to induced cell death; and the lack of dependence on a single, targetable carcinogenic pathway [35]. Besides, when pursuing immune-based glioblastoma treatment methods, the unique immune environment of the central nervous system needs to be considered [36][37][38]. Therefore, we need to study novel LGG immunotherapy candidates. Here, we start from the transcription analysis of LGG in TCGA and find that the increased expression of CD3E is closely related to poor prognosis of patients. Therefore, CD3E is a potential prognostic indicator and treatment target in LGG patients.
The CD3E gene encodes the polypeptide CD3-ε, which together with CD3-γ, -δ and -ζ and the T cell receptor α/β and γ/δ heterodimers forms the T cell receptor-CD3 complex. The complex plays an important role in coupling antigen recognition to several intracellular signal transduction pathways, so defects in CD3E can lead to immunodeficiency [39]. CD3E also participates in proper T cell development. TCR-CD3 complex assembly is initiated by forming two heterodimers, CD3D/CD3E and CD3G/CD3E. CD3E also participates in the internalization of TCR-CD3 complexes and their cell surface down-regulation through endocytic sequences present in the cytoplasmic region of CD3E [40][41][42]. The relationship between the abundance of tumor infiltrating lymphocytes and the expression, copy number, methylation or mutation of CD3E in LGG is shown in Supplement Fig. 1.
In LGG patients, the higher the expression of CD3E, the worse the patient's survival. This might be attributed to immune cells with high CD3E expression, which promote anti-tumor immunity with the exception of regulatory T cells. Notably, CD3E acts as part of the T cell receptor, and its high expression in many cancers indicates better clinical results (longer survival), with LGG being the only exception [43]. This may be related to the cause of LGG and the immune environment of the brain, or it may be due to the interconnection between isocitrate dehydrogenase and the TME [36,[44][45][46]. Therefore, CD3E may play a dual role in tumors, either promoting survival or inducing apoptosis. In addition, in the TME of glioma, the proliferation of malignant cells is enhanced, the pool of undifferentiated glioma cells increases, and macrophage expression exceeds microglial expression [43]. We used GSEA and found that the CD3E high expression group was enriched in immune-related signaling pathways, such as the B/T cell receptor signaling pathways and the chemokine signaling pathway. These results indicate that CD3E may be involved in the transition of the TME from immune-based to metabolic-based. More and more studies show that CD3E is related to tumor treatment [43,[47][48][49]. Our research also found that the balance between tumor pathways, sugar metabolism and lactic acid formation can affect the immune status of LGG. Therefore, we suspect that in the development of LGG, the up-regulation of CD3E promotes the decline of tumor purity, and at the same time the transition of the TME from an immune type to a metabolic type further promotes the deterioration of LGG.
In general, we used the ESTIMATE algorithm to determine the TME-related genes in LGG by analyzing LGG samples in TCGA. Through the analysis of LGG samples in GEO, the prognosis-related genes in LGG were determined. These analyses suggest that CD3E is not only a potential prognostic factor for LGG patients, but also a possible driving factor for the TME to transform from an immune state to a metabolic state.

Conclusion
In conclusion, the purity of LGG has a considerable impact on clinical, genomic and biological status.