Identification of key genes and their microRNAs in glioma: Evidence from bioinformatic analysis

Background Glioma is one of the most common primary intracranial tumors. Although a lot of studies have been conducted to elucidate the pathogeny of glioma, the molecular mechanisms are still unclear because of its complex biological functions. Methods To identify the candidate genes in the carcinogenesis and progression of glioma, microarray datasets GSE4290, GSE122498 and GSE2223 were downloaded from Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were identified, and performed function enrichment of DEGs by Gene ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. The protein-protein interaction network (PPI) was constructed using STRING and Cytoscape to find hub genes. Survival analysis and GEPIA database was conducted to screen and validate critical genes. Analysis of miRNA and genetic alteration was used to explore and predict the molecule mechanism. A total of 150 DEGs were identified, consisting of 54 downregulated genes and 96 upregulated genes. The enriched functions and pathways of the DEGs include regulation of transportation, synaptic transmission and SH3 domain binding. Fifteen hub genes were identified and biological process analysis revealed that these genes were mainly enriched in SH3 domain binding, neuron projection terminus, mitotic nuclear division, condensed chromosome and affected the brain development. Survival analysis showed that VAMP2, PPP3CA, DLGAP5, KIF14, REPS2, CENPU, KNTC1 and SMC4, may be involved in the carcinogenesis, invasion or recurrence of glioma. These 8 hub genes, which were related miRNAs and genetic changes were commonly involved in the development of glioma, were closely associated with tumor grade. DEGs and hub genes identified in the present study help us understand the molecular mechanisms of carcinogenesis and progression of glioma, and provide candidate targets for diagnosis and treatment of glioma.


Introduction
Gliomas are one of the most common aggressive neoplasms of the brain, and it is caused by carcinogenesis of the brain and spinal glial cells [1] . The annual incidence of glioma is 3-8 cases per 100,000 people [2] , and the incidence is increasing in recent few years [3][4] . Currently, the standard treatment for glioma is surgical resection and postoperative combined radiochemotherapy, but due to the complex biological characteristic, it is easy to relapse and the prognosis is extremely poor. About 70% of glioma patients only have a two-year survival time after diagnosis, and more than 90% of patients die within five years [5][6][7] . According to the grading system set by the World Health Organization (WHO) [8] , gliomas are classified into four grades: Grade 1, the slightest with the most favorable prognosis; to grade 4, the most severe with the worst prognosis of all grades. The anaplastic gliomas, in terms of traditional cytopathology, are classified as grade 3, and glioblastoma (GBM) is classified as grade 4 [9] . GBM is evolved from astroid neuroglial cell lineage that supports neurocytes, and it accounts for 12-15% of intracranial tumors, and 50-60% of astrocytic tumors [10] . Although the exact pathogenesis and factors are not clear, numerous studies have shown that genetic mutations play an important role in the gliomas over the past few years [11][12][13] . Some experts thought that classification based on gene expression is a better indicator of survival than traditional histological classification [14] .
Recently, microarray technology and bioinformatic analysis have made breakthrough progress in searching oncogenes, and have identified many biomarkers that can be used for diagnosis and prognosis of various cancers [15][16][17] . For glioma, it is determined that the basic molecules could effectively resist the drug resistance and tumor recurrence. Therefore, it can be used to screen for genetic variation, which can help us identify the differentially expressed genes (DEGs), molecular mechanism and new therapeutic targets related to the canceration and progression of glioma. Furthermore, co-expression analysis could help us perform multigene analysis to obtain complex mechanisms for large-scale data, especially for those identifying functional modules. However, the results obtained in the independent microarray analysis may be not accurate and reliable due to the false-positive rate. Thus, in this study, 3 profiles from Gene Expression Omnibus (GEO) were downloaded and analyzed to identify DEGs between the tumor and normal tissues. Besides, the enrichment analysis of Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, and protein-protein interaction (PPI) network analyses were performed to annotate gene function and screen hub genes. The prognostic value was evaluated through survival analysis and the relationship between gene expression and disease grade, which implies some potential prognostic biomarkers and therapeutic targets.

Microarray Data
Three gene expression profiles in GSE4290 [18] , GSE122498 [19] , and GSE2223 [20] were downloaded from the GEO (http://www.ncbi.nlm.nih.gov/geo) database [21][22] , which was based the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array) by Fine HA and Fritsche J, and the GPL1833 platform by Bredel M. The GSE4290 dataset contains 157 cases of glioma and 23 cases of normal brain tissue. GSE122498 has 16 glioblastoma samples and a normal sample. GSE2223 includes four normal samples and different histological types of a total of 50 glioma samples.

Identification of Degs
GEO2R(http://www.ncbi.nlm.nih.gov/geo/geo2r), which is an online tool based on R languages, was used to analyze the DEGs between glioma and normal brain tissue. Platform annotation files were used to annotate the probes. Probe sets without corresponding gene symbols or with numerous genes were removed. For different probes that match the same gene, the average value of the probes was used as the final gene expression value. We defined DEGs as differentially expressed with |logFC| ≥ 1 and defined as considered statistically significant with P-value < 0.05. By comparing these three data sets for analysis to generate common DEG, the results are presented in Venn diagrams.

Functional and Pathway Analysis of Degs
Including biological process (BP), cellular component (CC) and molecular function (MF), Gene Ontology (GO) analysis is used to annotate genes and used for functional studies of genes [23] . The Kyoto Encyclopedia of Genes and Genomes (KEGG) contains not only each single pathway, but also its related pathways, which is an online biochemical information database [24][25] . Metascape (http://www.metascape.org/) is a commonly used online analysis tool that supports the enrichment analysis of common model organisms [26] . DAVID (https://david.ncifcrf.gov/), an online tool, can provide systematic and comprehensive biological function annotation information for functional analysis of genes [27][28] . Therefore, we applied GO and KEGG pathway analyses to the DEGs by using Metascape and DAVID online tools at the functional level. P-value < 0.05 was chosen to have significant differences, and molecules > 3 was considered meaningful.

Ppi Network Construction and Module Analysis
The Search Tool for the Retrieval of Interacting Genes (STING: https://string-db.org/cgi/) is a database of proteinprotein interations including physical and functional associations [29][30] . We applied it to construct PPI network of DEGs and the interaction score > 0.4 was considered statistically significant, before visualizing by Cytoscape software (http://www.cytoscape.org). Cytoscape supports many use cases in molecular and systems biology, genomics, and proteomics [31] . We used this software to visualize the up-regulated and down-regulated genes of the PPI network respectively. In addition, the MCODE application is used to draw the most significant module in the PPI network. The criteria for selection were degree cut-off = 2, node score cut-off = 0.2, Max depth = 100 and k-score = 2.

Hub Genes Identification and Analysis
We identified the top 15 hub genes in the network by using the cytoHubba application in Cytoscape [31] , and visualized the expression of the 15 hub genes in the GSE122498 data set. We performed overall survival and disease free survival analyses of those hub genes through the online database Gene Expression Profiling Interactive Analysis (GEPIA: http://gepia.cancer-pku.cn/) [32] , and P < 0.05 was considered to have statistical significance. Eventually, we selected the 8 hub genes with the most obvious differences for further analysis. The GEPIA online database contains TCGA and GTEx data, which can be combined and analyzed for different databases at the same time [32] . GEPIA was used to verify and analyze the expression of 8 hub genes in different grades of gliomas and normal brain tissues.

Construction of Mirna-target Gene Regulatory Network
MiRNAs are involved in the regulation of post-transcriptional gene expression in plants and animals [33] . The miRWalk2.0 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/) database was used to search for miRNAs and their target genes related to glioma [34][35] . Through comparing target genes with the 8 hub genes, miRNAhub gene pairs were obtained. Then, miRNA-hub gene regulatory network was visualized by Cytoscape software [31] .

Identification of DEGs in Glioma
Following the analysis of the datasets (GSE4290, GSE122498 and GSE2223) with GEO2R, the differences expression of multiple genes between normal and glioma tissues was presented in volcano plots ( Fig. 1A-1C). The analysis of three datasets extracted 2,588, 6,232 and 1,129 DEGs respectively, and the Venn diagram revealed the common DEGs (Fig. 1D). There were 150 common DEGs, including 96 up-regulated genes and 54 down-regulated genes in glioma tissues.

Degs Go and Kegg Pathway Analysis
To analyze the biological classification of DEGs, functional and pathway enrichment analyses were performed using Metascape. GO analysis results showed that changes in biological processes (BP) of DEGs were significantly enriched in regulation of transporter activity, regulation of potassium ion transport, vesicle transport, synaptic transmission and the regulation of growth and neuron differentiation (Table 1). Changes in cell component (CC) of DEGs were mainly enriched in the microtubule associated complex, perinuclear region of cytoplasm, dendritic spine, endocytic vesicle, apical plasma membrane and synapse (Table 1). Changes in molecular function (MF) were mainly enriched in Src homology 3 (SH3) domain binding, protein domain specific binding, ATPase coupled ion transmembrane transporter activity and small ubiquitin-related modifier (SUMO) binding (Table 1). KEGG pathway analysis revealed that the DEGs were mainly enriched in adherens junction, cGMP-PKG signaling pathway, glutamatergic synapse and endocytosis.

Construction of the PPI network and identification of significant modules and hub genes
Based on the STRING database, the PPI network for the 150 DEGs was constructed, which was displayed by Cytoscape (Fig. 1E). And the most significant modules were obtained using Cytoscape (Fig. 1F). Then we applied the cytoHubba application to screen out the top 15 hub genes in the whole network ( Fig. 2A). The table showed the abbreviations, full names and major functions for these hub genes ( Table 2).

Functional Annotation And Analysis Of The Hub Genes
The biological process and KEGG enrichment analysis of 15 hub genes are performed in the figure (Fig. 2B). The hub genes mainly enriched in the following GO terms and pathways: clathrin-clated pit, brain development, SH3 domain binding, neuron projection terminus, mitotic nuclear division, Notch signaling pathway and condensed chromosome (Fig. 2B). Heat map revealed that the hub genes could basically differentiate the glioma samples from the normal samples (Fig. 2C). Subsequently, we drew the overall survival and disease free survival curves of 15 hub genes based on GEPIA datasets to explore their prognostic value. According to the results, we selected the 8 genes that most relevant to prognosis as candidate genes, which are VAMP2, LRP2, PPP3CA, DLGAP5, KIF14, REPS2, CENPU, KNTC1 and SMC4 (Fig. 3-4). We noted that high expression of VAMP2, PPP3CA and REPS2 showed better overall and disease-free survival than low expression ( Fig. 3A-C, 4A-C), while high expression of KIF14, DLGAP5, CENPU, KNTC1 and SMC4 showed reductions in overall and disease-free survival (Fig. 3D-H, 4D-H). In the GEPIA online database based on the TCGA and GTEx data, higher mRNA levels of KIF14, DLGAP5, CENPU, KNTC1 and SMC4, and lower mRNA levels of VAMP2, PPP3CA and REPS2 were associated with tumor grade (Fig. 5). CBioportal analysis showed that genetic changes of 8 hub-genes, which was that some of them were amplified and some were mutated (Fig. 6B).

Mirna-target Gene Regulatory Network Analysis
A total of 93 miRNAs related to glioma were identified from miRWalk2.0, and only 49 miRNAs remained after removing the duplicate miRNAs and miRNAs in mice. The target genes of the selected miRNAs were compared with the 8 hub genes, and a total of 28 miRNA-hub gene pairs were obtained. The miRNA-hub gene regulatory network was visualized by Cytoscape software, consisting of 30 nodes and 28 edges (Fig. 6A).

Discussion
Glioma refers to a tumor derived from neuroepithelium, which is the most common primary intracranial tumor [2] . The patients with glioma suffer from a high recurrence rate and short survival time with low cure rate and poor prognosis. The changes of genes are implicated in the development of multiple cancers, including glioma. Despite some genes mutations have been confirmed to play key roles in glioma, the specific molecular markers in the occurrence and development of glioma remained to be elucidated. Recent advances in genomic analysis have helped us comprehensively learn the mechanisms of glioma and identified some potential biomarkers of glioma for prognosis and therapy. In this study, the expression and prognostic values of hub genes in glioma were analyzed.
In the present study, we performed gene expression profile analysis among three datasets: GSE4290, GSE122498 and GSE2223, totally included 223 GBM and 24 normal samples. At last, overlapped 150 DEGs among three datasets were identified, of which 96 were upregulated and 54 were downregulated. GO annotation indicated that the DEGs were mainly manifested in SH3 domain binding, regulation of transporter activity, actin filament organization and synaptic transmission. KEGG pathway shows that the DEGs were associated with endocytosis, glutamatergic synapse and cGMP-PKG signaling pathway. These results indicate that the pathogenesis of glioma is a complex biological process driven by different genes and epigenetic changes. Abnormal regulation of multiple genes promotes the occurrence and development of glioma through multiple pathways. In order to identify the key genes, we constructed a protein-protein interaction network through STRING, and then we focused our attention on the top 15 hub genes selected by cytoHubba from the network according to their degree, which were identified that may provide new approaches for therapeutic studies of glioma. GO annotation indicated the 15 hub-genes mainly took part in clathrin-clated pit, SH3 domain binding, neuron projection terminus, mitotic nuclear division, condensed chromosome and affected the brain development; among them, NOTCH1, DLGAP5, and AAK1 were enriched in the Notch signaling pathway. As is known to all, those biological process and pathway play a vital role in cancer. Proteins bind to the SH3 domain to participate in cytoskeletal elements and signal transmission. Notch signaling pathway consists of a set of genes and their products that affect multiple processes of normal cell morphogenesis, including the differentiation of multipotent progenitor cells, apoptosis, cell proliferation, and the formation of cell boundaries [36] . Previous studies have shown that blocking the Notch pathway caused cell cycle exit, apoptosis, and differentiation in tumor cell lines [37] . However, researches in recent years have demonstrated that the Notch signaling pathway mediates cell fate decisions and is tumor suppressive or oncogenic depending on the context [38][39] . In our brain, Notch signal maintains the normal neural stem cells, and also maintains the growth of brain cancer stem cells, which indicates that it has a carcinogenic effect. And the activation of genes related to Notch pathway reduces the growth of glioma and increases survival rate, which means the Notch signaling restricts cell proliferation and is a predominant signaling pathway in glioma [40] . For the future, it is necessary to further clarify the precise molecular mechanism of this pathway and related genes, as well as the genes, molecules and pathways that have synergy with it.
The present survival analysis of 15 hub genes revealed that VAMP2, PPP3CA, DLGAP5, KIF14, REPS2, CENPU, KNTC1 and SMC4, were highly associated with the prognosis of glioma patients; patients with high expression of DLGAP5, KIF14, CENPU, KNTC1 and SMC4 showed shorter overall survival time than those with low expression level, while the glioma patients with the higher expression of VAMP2, PPP3CA and REPS2 had longer overall survival time. GEPIA analysis showed that higher mRNA levels of DLGAP5, KIF14, CENPU, KNTC1, SMC4 and lower mRNA levels of VAMP2, PPP3CA, REPS2 were associated with higher tumor grade, especially in the GBM. MicroRNAs (miRNAs) are short non-coding RNAs that regulate the post-transcriptional gene expression of animals and plants, and influence the expression of target genes by directly binding their mRNA [41][42][43] . To predict the mechanism of action of 8 hub genes in glioma, we constructed a miRNA-8 target gene regulatory network and studied the genetic alteration of selected genes. VAMP2 (Vesicle associated membrane protein 2), also known as Syb2 or Synaptobrevin 2, is the most representative type of VAMP protein family. VAMP2-mediated vesicle fusion provides materials for neuronal morphogenesis, and the reduction of VAMP2 level is the basis of neurotransmission defects [44][45] . The expression of VAMP2 in glioma patients significantly decreased, which indicates that the loss of nerve function in patients with glioma is related to VAMP2. According to miRNA and mutation analysis, we predicted that hsa-let-7d-5p, hsa-miR-146b-3p and hsa-miR-483-5p suppressed the late stage of VAMP2 gene transcription, thereby reducing the Syb2 protein instead of mutation of VAMP2. PPP3CA (Calcineurin A Alpha or CALNA) is protein phosphatase 3 catalytic subunit alpha Current study showed the different types of mutations in PPP3CA cause several distinct disorders [46] , particularly remarkable in severe epilepsies and other neurodevelopmental diseases [47] . Our analysis illustrated that the mRNA expression of PPP3CA had decreased and the mutation of PPP3CA showed deep deletion, which means that PPP3CA relied on mutations to cause glioma. But we still have no idea what role the corresponding miRNAs (miR-211-5p, hsa-miR-381-3p and hsa-miR-524-5p) of PPP3CA played in the process of glioma. DLGAP5 (Discs large homolog associated protein 5), also known as HURP or DLG7, is a mitotic spindle protein that promotes the formation of tubulin polymers [48] . It has been proved that DLGAP5 may promote cancer by regulating spindle function and its organization, and regulating the progression of M phase in the cell cycle [49] . We found that DLGAP5 increased the expression of mRNA in gliomas, which was the same as the genetic changes shown, but no regulatory miRNA of DLGAP5 was found in miRNAs associated with glioma. This is the same as the research results that have been reported, indicating that DLGAP5 plays an important role through increasing expression in the occurrence of glioma. KIF14 (Kinesin Family Member 14) promotes the occurrence and development of various of tumors by regulating the formation of mitotic spindles, the separation of chromosomes and the completion of cytokinesis [50] . At the same time, KIF14 is also related to tumor resistance [51] . At present, studies have shown that inhibiting the expression of KIF14 can inhibit the growth of glioma cells and promote apoptosis [52] . Studies on the molecular mechanism of KIF14 indicated that long non-coding RNA PAXIP1-AS1 up-regulated the expression of KIF14 by recruiting the transcription factor ETS1, and promoted cell invasion and angiogenesis of glioma [53] . MiR-137 and miR-6500-3p can target regulation KIF14 to affect the proliferation of glioma cells [54] . In this study, we also found other miRNA, like hsa-miR-124-3p, hsa-miR-17-5p, hsa-miR-193a-3p, hsa-miR-20a-5p, hsa-miR-92a-3p and hsa-miR-93-5p, that can regulate the expression of KIF14. REPS2 (RALBP1 Associated Eps Domain Containing 2) or POB1 encode a protein that regulates the endocytosis of growth factor receptors. REPS2 plays an important role in inhibiting cell proliferation, migration and inducing apoptosis of cancer cells [55] . It has been identified as a biomarker with good prognosis for prostate, breast, esophageal and other cancers [56][57][58] . In our research, we found 9 targeted miRNAs for REPS2, namely hsa-let-7a-5p, hsa-let-7d-5p, hsa-miR-146b-5p, hsa-miR-193a 3p, hsa-miR-199b-5p, hsa-miR-19a-3p, hsa-miR-21-5p, hsa-miR-222-3p and hsa-miR-326. In terms of genetic changes, REPS2 had a variety of mutations in glioma, which showed that REPS2 triggers the occurrence of glioma through various pathways and factors. But there is no research on REPS2 in glioma. This provides direction for future research. CENPU (Centromere Protein U), alias MLF1IP. CENPU is an essential centromere component of mitosis and is involved in the oncogenesis of various cancers [59] . Downregulation of CENPU significantly inhibited the proliferation, migration and invasion of tumor cells [60] . In terms of genetic changes, the expression of CENPU at the core of glioma increased [61] , which was consistent with our findings. However, the mutation of CENPU displayed that the CENPU gene was more likely to be deeply deleted.
We speculated that hsa-miR-181b-5p, hsa-miR-196a-5p and hsa-miR-24-3p were associated with the upregulation of CENPU expression in glioma, making up for the effects of gene mutation. KNTC1 (Kinetochore Associated 1) is a key component of mitosis, which encodes a protein involved in ensuring chromosome segregation and spindle assembly during cell division [62] . KNTC1 knockdown inhibits the cell viability of cancer cells and induces their apoptosis [63] . In the development of gliomas, KNTC1 expression increased significantly without the involvement of miRNA. We did not find miRNAs associated with KNTC1 in glioma, and the genetic changes of KNTC1 in glioma were diverse, indicating that KNTC1 mutation played an important role in glioma. SMC4 (Structural Maintenance Of Chromosomes 4), the last selected gene we chosen, is required for chromosome assembly and separation during cell division [64] . The expression of SMC4 is significantly increased in tumors, and inhibition of SMC4 expression can significantly inhibit the proliferation of cancer cells and reduce their malignancy [65] . Our study found that SMC4 expression increased in glioma, especially in GBM. This is consistent with existing research performance [66] . The analysis of miRNAs related to glioma showed that hsa-miR-181b-5p regulated the expression of SMC4, which indicated that hsa-miR-181b-5p promotes the occurrence of glioma by up-regulating the expression of SMC4.

Conclusion
In conclusion, the present study was designed to identify DEGs that may be involved in the progression of glioma. 8 hub genes, which are VAMP2, PPP3CA,DLGAP5, KIF14, REPS2, CENPU, KNTC1 and SMC4, were identified and may regard as diagnostic biomarkers for glioma. And the potential mechanism of these 8 hub genes was predicted through genetic alteration and miRNA. However, further studies are required to elucidate the biological function of these genes in glioma.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent for publication:
Not applicable.

Availability of data and materials:
The datasets generated and/or analyzed during the current study are available in the Gene Expression Omnibus (GEO) repository, http://www.ncbi.nlm.nih.gov/geo. This research was supported by the Graduate Innovation Funding Project of Hebei University in 2020 (No. hbu2020ss057). Hebei Provincial Department of Finance and Hebei Provincial Health Commission 2019 government-funded clinical medicine outstanding talent training project plan(361007-2).

Authors' contributions:
Xc-Wen made substantial contributions to conception and design, and draught manuscript. Mj-F and Ws-W made contributions to acquisition of data, or analysis and interpretation of data. Zm-Z revised the manuscript critically for important intellectual content. Kb-Z gave final approval of the version to be published.   Univariate overall survival analysis of the hub genes was performed using GEPIA online platform.

Figure 4
Univariate disease-free survival analysis of the hub genes was performed using GEPIA online platform.

Figure 5
The expression of 8 hub genes in GBM, LGG and normal brain tissues was analyzed using GEPIA online platform.