Comprehensive Analysis of Key Genes and Regulatory Elements in Ewing Sarcoma With Prognostic Values

Background: Ewing sarcoma (ES) is one of the most common types of round cell mesenchymal neoplasms related to children and young adults. It is characterized by chromosomal translocations. However, that does not completely explain the ES proliferation and development. Methods: Several public data series (GSE45544, GSE68591, and GSE68776) were used to identify differentially expressed genes (DEGs) between ES and normal tissue. Furthermore, functional enrichment analysis and protein-protein interaction network of DEGs were performed. The regulatory network of DEGs and hub genes, survival analysis of hub genes was visualized. In addition, functional predictions of the candidate gene were analyzed. Results: A total of 142 DEGs 10 hub genes were identied. While out of them, only one candidate gene FGF7 was found to be suitable as a candidate gene. Conclusions: In conclusion, the DEGs, hub genes, and their regulatory molecules screened out in this research could contribute to comprehend the mechanisms in ES and provide potential research molecular for further study.


Background
Ewing sarcoma (ES) is one of the most common types of round cell mesenchymal neoplasms. The prevalence of ES is most often related to children and young adults [1]. Molecularly, ES is characterized by repeated chromosomal translocations, resulting in a framework fusion protein consisting of the amino terminus of the Ewing sarcoma protein (EWS) linked to the carboxy terminus of different transcription factors. In most of the cases, the translocation is a fusion of the Ewing sarcoma breakpoint region 1 (EWSR1) gene on chromosome 22 with the Friend Leukemia Virus Integration Site 1 (FLI1) gene on chromosome 11 [2,3]. This aberrant translocation causes the chimeric oncogene EWS-FLI1 which could promote ES pathogenesis [4]. Meanwhile, the clinical presentation of ES still varies from patient to patient. Although the treatment of ES has made great progress through unremitting efforts, there are still some patients who do not respond well to treatment. However, the EWS-FLI1 molecule alone does not completely solve this problem. The major challenges in terms of novel molecular players could be involved in ES proliferation and development.
In recent years, with the development of sequencing technology, enormous amounts of data of tumors had been generated. Bioinformatics analysis of these data can reveal a better understanding of the underlying mechanisms of tumor development. Nowadays, numerous researches have been published on many different cancer types. Similarly, ES is no exception, some key genes involved in metastasis and development [5][6][7][8]. However, these studies contain one or two datasets. Integrating more datasets is likely to yield more valuable ndings.
In our study, several microarray datasets from the Gene Expression Omnibus (GEO) database including ES and normal tissues were analyzed to obtained differentially expressed genes (DEGs). Subsequently, functional and pathway enrichment analyses of the DEGs were used to reveal the underlying mechanisms. Besides, a protein-protein interaction (PPI) network was constructed to screen out hub genes in the DEGs. Meanwhile, we constructed regulatory networks of the DEGs and the hub genes. Furthermore, the immune in ltration analysis, DNA methylation analysis, isomer expression analysis, and the survival analysis of the hub genes were carried out to select one candidate gene and to predict the possible function of the candidate gene. Methods 1. Analysis of gene expression pro les to identify the DEGs.
Three gene expression datasets GSE45544 [9], GSE68591 [10], and GSE68776 [11] containing ES and normal tissue data, as available datasets, were deposited in the GEO (http://www.ncbi.nlm.nih.gov/geo) [12]. The GSE45544 dataset contained 6 ES patient samples and 22 normal tissues samples. The GSE33006 contained 22 ES cell lines and 5 normal human cells. The GSE41804 contained 32 ES patient samples and 33 normal tissue samples. Normal bone marrow was chosen as the contact normal tissue group to the ES. The DEGs between the ES group and the corresponding normal tissue group were analyzed using the R language (version 4.0.2) with the Limma package. The p-value < 0.01 and absolute value of logFC (fold change) > 1 were considered statistically signi cant.

Dimensionality Reduction of the DEGs
In addition, the 3D-PCA dimensionality reduction was performed on samples from tumor sarcoma tissue, normal tissue, and skeletal muscle tissue using the online database GEPIA2 (http://gepia2.cancer-pku.cn/index.html) [13]. The data was based on their expression of a group of genes.

GO functional analysis and pathway enrichment of DEGs
Gene Ontology (GO) [14] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [15]analyses of DEGs were performed by WebGestalt (WEB-based Gene SeT AnaLysis Toolkit, http://www.webgestalt.org) [16], which is a continuously updated useful functional enrichment analysis web tool. Over-Representation Analysis (ORA) was selected as the analyzed method. The GO functional enrichment was performed in the biological process no Redundant (BP), cellular component no Redundant (CC), molecular function no Redundant (MF), and the pathway analysis was performed in the KEGG pathway. The p-value < 0.05 was considered to indicate statistical signi cance.

Construction of the PPI network to select hub genes
The STRING database (online Search Tool for the Retrieval of Interacting Genes, http://string-db.org, RRID: SCR_005223) [17] was used to predict a PPI network of DEGs. The selecting parameter was that the interaction scores greater than 0.4. Furthermore, the network was visualized by Cytoscape software (version 3.8.0) [18]. Then, MCODE (Molecular Complex Detection,) [19], a plug-in of Cytoscape, was open to select hub genes, with the following criteria: MCODE score >5.5, degree cut-off=2, score cut-off=0.2, max depth=100, and k-score=2.

Regulatory network of the hub genes
To analyze the miRNA-gene and TF-gene networks among DEGs and hub genes, the database NetworkAnalyst (https://www.networkanalyst.ca/NetworkAnalyst/home.xhtml) [20] was applied to use. NetworkAnalyst is a powerful webbased 3D visual analytics platform for comprehensive pro ling, meta-analysis, and systems-level interpretation among genes, proteins, miRNAs, transcription factors (TF), and metabolites. In this study, we choose miRTarBase database for building gene-miRNA interactions and the ENCODE database for TF-gene interactions. Besides that, to analyze TF-lncRNA interactions. All the networks above were visualized by Cytoscape software (version 3.8.0) [18].

Immune in ltration analysis of the hub genes
Tumor IMmune Estimation Resource (TIMER, https://cistrome.shinyapps.io/timer/) [21] is a comprehensive web server for calculating a systematic analysis of immune in ltrates based on the microarray expression values across diverse cancer types. The immune in ltration estimation of the hub genes was performed in SARC dataset by TIMER. The scatterplots of the hub genes were generated to show the purity and different immune cells with partial correlation coe cient and statistical p-value.

DNA methylation analysis and Isomer expression analysis of the hub genes
The level of DNA methylation in the promoter region of the hub genes was obtained from the UALCAN (http://ualcan.path.uab.edu/index.html) [22]. While the analysis of the expression level of the isomers of the hub genes was also obtained by GEPIA2 [13]. 8. Survival analysis of the hub genes PROGgeneV2 (http://genomics.jefferson.edu/proggene) [23] is a comprehensive web tool that can be used to indicate the prognostic signi cance of genes with publicly available data. All the hub genes were performed separately with the cancer type chosen in the sarcoma dataset. The overall survival plots were created based on the cohort divided at the median expression of the given input gene. The hub genes that had p-values < 0.05 were considered as candidate genes and taken further analyzed.

Prediction of functions of the candidate genes
The database GeneMANIA (http://www.genemania.org) [24] was used to construct the functional association network among the candidate genes and their interactive genes. The statistical options were as follows: max resultant genes = 20, max resultant attributes = 10. Subsequently, the expression of candidate genes in normal and corresponding malignant tissues was displayed by the Anatomic Viewer of the online SAGE database (http://www.ncbi.nlm.nih.gov/SAGE) [25]. Furthermore, the heatmaps of positively and negatively correlated signi cant genes of the candidate genes were illustrated by the LinkedOmics (http://www.linkedomics.org/) [26] The parameter selection of all these tools above is based on the developers' references to select the most appropriate parameters.

Identi cation and PCA analysis of the DEGs.
After standardization of the microarray results, DEGs (5,744 in GSE68591, 1,859 in GSE45544 and 2,066 in GSE68776) were identi ed. The overlap among the 3 datasets contained 142 genes as shown in Fig. 1A. While the PCA dimensionality reduction showed a good distinction between the expression of genes among the tumor sarcoma tissue, normal tissue, and skeletal muscle tissue (Fig. 1B).

GO functional analysis and pathway enrichment of DEGs
The biological classi cation, functional and pathway enrichment analyses of DEGs are shown in Fig. 2 (details are shown in Table 1). In biological processes (BP) ontology, DEGs were signi cantly enriched in positive regulation of cell motility, leukocyte migration, and neutrophil mediated immunity ( Fig. 2A). In molecular function (MF) ontology, DEGs were mainly enriched in growth factor binding, extracellular matrix structural constituent, and transmembrane receptor protein kinase activity (Fig. 2B). In cell component (CC) ontology, DEGs were mainly enriched in the extracellular matrix, secretory granule membrane, and speci c granule (Fig. 2C). Besides that, KEGG pathway analysis revealed that the terms were related to the Rap1 signaling pathway, Ras signaling pathway, and Glycine, serine, and threonine metabolism (Fig. 2D).

Construction of the PPI network to select hub genes
The PPI network of the DEGs was constructed by Cytoscape as shown in Fig. 1C, while the most signi cant module with 10 nodes and 17 edges as shown in Fig. 1D. These 10 genes were identi ed as hub genes and the detailed information of hub genes is provided in Table 2. The hub genes may be an important regulatory network in the development of ES.

Immune in ltration analysis of the hub genes
The expression levels of CTSB, CTSD, HMOX1, BCL2L1, CD68, TEK, IL1R1, and VEGFC were negatively associated with tumor purity while FGF7 and PTGS2 were positively associated with tumor purity (Fig. 4). This result indicated that most hub genes are probably expressed in the microenvironment, while FGF7 and PTGS2 are probably expressed in the tumor cells.
6. DNA methylation analysis and Isomer expression analysis of the hub genes Furthermore, the relationship between DNA methylation level in the promoter region of the hub genes and the occurrence of SARC was explored. It was found that only the DNA methylation levels of CTSB in SARC were signi cantly lower than those in corresponding normal tissues, while there was no signi cant difference in other genes (Fig. 5). Meanwhile, the expression levels of isomers of the hub genes were also analyzed. For BCL2L1, BCL2L1-004 are highly expressed in SARC. For CD68, CD68-001 and CD68-002 are highly expressed in SARC. For CTSB, CTSB-006 and CTSB-001 are highly expressed. For CTSD, CTSD-001 and CTSD-006 are highly expressed. For FGF7, FGF7-001 are highly expressed. For HMOX1, HMOX1-001 are highly expressed. For PTGS2, PTGS2-001 are highly expressed. For TEK, TEK-002 is highly expressed. For IL1R1, IL1R1-005 and IL1R1-001 are highly expressed. For VEGFC, VEGFC-001 is highly expressed (Fig. 6).

Survival analysis of the hub genes
The overall survival plots of hub genes could be obtained as shown in Fig. 7. Unfortunately, most of them, namely CTSB, CTSD, HMOX1, BCL2L1, CD68, TEK, IL1R1, PTGS2, and VEGFC, had no statistically signi cant difference in the survival curve. While only FGF7 had a p-value < 0.05. High expression of FGF7 had a better overall survival (Fig. 7).
Together with all the results above, FGF7 is one of the hub genes which could probably be expressed in the tumor cells and its expression had a relationship with the prognosis of ES patients. Therefore, FGF7 was considered as a candidate gene for further analyses.

Prediction of functions of the candidate genes
A functional network of FGF7 and interactive association genes was constructed by GeneMANIA. The function of this gene set was mainly enriched to broblast growth factor receptor signaling pathway, mesenchymal cell proliferation, and renal system development (Fig. 8A). Besides that, the expression pro les of the FGF7 in human tissue are displayed. As shown, FGF71 mRNA in thyroid, lung, and colon normal tissues displayed higher levels than in the matched cancer tissues (Fig.  8B). Subsequently, the gene expression correlation heatmaps indicated that ATP8B4, IL33, A2M, NPR1, and PPAP2A were the most positively correlated signi cant genes of FGF7 (Fig. 8C), and WNT7B, PRR7, NOTUM, ETV4, and RPSAP52 were the most negatively correlated signi cant genes of FGF7 (Fig. 8D). All these ndings predict the function of FGF7 and can be helpful for further research.

Discussion
The genetics of ES is relatively simple and are driven by chromosomal translocations leading to fusion oncogenes [27]. The oncogene EWS-FLI1 plays as an aberrant transcription factor which is enhanced by its interaction with several molecular partners. The current study found that the RNA helicase A (RHA), which interacted with EWS-FLI1, would lead to cancer progression and drug resistance [28]. Besides that, RHA also enhances EWS-FLI1-mediated transcription and promotes migration and proliferation of ES cells [29].
In our present study, several microarray datasets analyses were used to obtain DEGs and hub genes together with their regulatory elements from ES contrast to normal tissue. Altogether, 142 DEGs, 10 hub genes, and 1 candidate gene FGF7 were identi ed. Subsequently, further analyses of the hub genes were used to explore some other interactions. Previous studies have reported that some key molecules could strongly induce the motility and invasion of ES cells [30]. Moreover, studies have revealed that different constituents of the extracellular matrix may be held responsible for its effects on different parameters of ES proliferation [31]. Meanwhile, IGF-1R/RAS/Rac1 signaling played a role in the hyperstimulation of macro-pinocytosis and selective death of EW cells [32]. In summary, all these studies are consistent with the results obtained in our studies.
Consistently, 10 hub genes were screened out from the PPI network of DEGs. Some of them have been stated involved in the tumorigenesis and progression of ES. A previous study revealed that BCL2 interacted with BCLXL to escape PARP inhibitor toxicity in ES [33]. The density of CD68-positive Tumor-associated macrophages was found in ES tissue samples and statistically correlated with clinical parameters [34]. Meanwhile, TEK and VEGFC were reported their expression was upregulated in ES which may allow the development of sarcoma subtype-targeted therapeutics [35,36]. Interestingly, Celecoxib, a PTGS2 (also known as cyclooxygenase-2, COX-2) inhibitor, could signi cantly inhibit the invasion and migration of ES cells [37,38].
In contrast, a search of published literature revealed that the other hub genes have been associated with tumors. It is previously described that inhibitors and small interfering RNA (siRNA) targeting CTSB could inhibit Kaposi's sarcoma proliferation which was crucial for the transformation of endothelial cells [39]. The mRNA and protein expression of HMOX1 were up-regulated in and treatment with the HMOX1 inhibitor could reduce the proliferation of Kaposi's sarcoma [40]. Additionally, when infused γδ T cells lacked IL-1R1, their anticancer immune responses were lost [41].
Further survival analysis revealed FGF7 as a candidate gene because of its signi cantly statistical function. Notably, there was no report on FGF7 in ES, while there had been some studies in sarcoma. In leiomyosarcoma, the SK-LMS-1 cells only responded to exogenous FGF7 but did not respond to VEGF [42]. In chondrosarcoma, aberrant splicing of broblast growth factor receptor was able to bind FGF7, leading to loss of ligand-binding speci city [43]. In contrast, there was another research reported that FGF7 was a preferentially used growth factor in cancer, which was based on the special type receptor in carcinoma cells but not on alternative splicing in sarcoma cells [44].
In this work, a disadvantage is that the information on DEGs, hub genes and their regulatory elements involved in ES are analyzed via bioinformatics. However, the relationship of these molecules has not been con rmed. Furthermore, in vivo and in vitro experiments and analyses are required to test the biological functions of the DEGs and hub genes.