The Oxidative Stress Pathway May be an Important Pathway Leading to The Recurrence of Ewing Sarcoma —— Based On Weighted Gene Co-Expression Network Analysis

Background The role of gene and pathway in recurrence of Ewing sarcoma (ES) was not clear. Thus, we investigated the biological role and underlying mechanism of gene and pathway in recurrence of ES. Methods Data sets of patients with ES were collected from the GEO database. We used dataset GSE63155 and GSE63156 to construct co-expression networks by weighted gene co-expression network analysis (WGCNA). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed by Database for Annotation, Visualization and Integrated Discovery (DAVID). Results We can find that genes with significant interactions in the genes of the recurrence group include SRSF11, TRIM39, SOCS3,NUPL2,COPS5. They work primarily through the oxidative stress pathway. Conclusion Through our research, for the first time found that ES by SRSF11 TRIM39, SOCS3, NUPL2, COPS5 interaction, activation of phosphorylation of bone and oxidative stress is affecting tumor recurrence.


Introduction
Ewing sarcoma (ES) is a rare malignant bone and soft tissue sarcoma. It is the second most common primary malignant bone tumor. It has a very high mortality rate and is more common in adolescents and young people [1]. In fact, ES exhibits a large degree of undifferentiated and "dry" phenotype, contributing to its clinical invasiveness [2]. Among ES patients, 60-75% of patients benefit from multimodal treatment. Therefore more than 30% of patients have limited response to treatment, which is usually first based on evaluating their histological response after neoadjuvant chemotherapy [3]. Therefore, it is particularly important to find the target sensitive to ES drugs for the relapsing type ES. In combination with the latest advances in molecular biology, it is found that the occurrence and development of ES is related to the activation of oncogenes such as SOX2, EGR1 and ARID1A and the inactivation of tumor suppressor genes such as p53 [4]. However, the occurrence of tumor is a complex and gradual process with multiple factors, multiple stages and multiple steps. The molecular pathogenesis of relapsed ES is still unclear, preventing the recurrence of ES and improving the survival rate of patients is the key. In order to judge relapsed ES early and improve its therapeutic effect, it is necessary to better understand the pathogenesis of relapsed ES. Weighted gene coexpression network analysis (WGCNA) is a new tool for analyzing potential gene modules in gene expression data [5].The analysis of the core genes of recurrent and non-recurrent ES by WGCNA can bring great convenience to our clinical drug development. To this end, we searched for the core genes and pathways of recurrent type and non-recurrent type ES through the inclusion of two queues and the method of WCGNA.

Data information
ES from NCBI gene expression data sets Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/), queue GSE63155 and GSE63156 were included in the two groups. packets were used to process the raw data, generate expression matrices, and match the probes with their genetic symbols.

Construction of WGCNA
The raw data for GSE63155 and GSE63156 was downloaded from the GEO database. We preprocessed and normalized the raw data using the affy package (R environment, version 3.4.3). The parameters were set to RMA (background correction) and impute(supplementary missing value).We selected the first 5,000 genes of WGCNA by sequencing the SD from large to small (including primary and recurrent tumors).The R package was used for WGCNA and the pick Soft Threshold function was used to predict the power parameters. The scale-free topological fitting index of multiple efficiencies was calculated to provide an appropriate soft threshold efficiency for protein network construction.
The adjacency is defined as the sum of the adjacency of a gene and all other genes, and the network generation is carried out to measure the network connectivity of the gene. Hierarchical clustering function was used to divide genes with similar expression profiles into modules according to TOM differences with minimum size of 50. By calculating the differences between the characteristic genes of each module, a dividing line was selected to merge each module.1000 genes were randomly selected and the gene network was visualized. Finally, the gene network of feature genes is visualized.

Identification of clinical significant modules
We identified the relevant modules by calculating the correlation between MEs and recurrence.
Subsequently, gene significance (GS) was defined as log10 conversion of p value (GS= lgP) in linear regression of gene expression and clinical information. Module significance (MS) is defined as the average GS of all genes in a module. In general, of all the selected modules, the module with the highest absolute value of MS was considered to be related to recurrence.

Identification of hub genes in key modules
The core gene expression in ES was verified on two independent ES tissue data sets (GSE63155 and GSE63156). By looking for the common core genes of primary ES and relapsing ES affecting different tissue datasets, we included the WGCNA core genes of the two datasets to find the common genes, which helped us to find the most important genes.

Differentially expressed genes functional enrichment and pathway analysis
The DAVID database was used for GO analysis of the identified differentially expressed genes. DAVID provides comprehensive and systematic annotation tools for elucidating the biological significance of genes. GO analysis includes molecular function analysis, biological process analysis and cell component analysis [6]. The difference genes detected by DESeq2 software were used as the enrichment foreground, and the reference genes in human database were used as the enrichment analysis background. P < 0.05 was selected as the significantly enriched GO pathway, and the number of different genes in each significantly enriched GO and pathway was counted. We plot through the Goplot package of R language and achieve visual results [7].

Survival analysis of DEGs
To assess the prognostic value of a particular gene, patient samples were divided into two groups based on the median expression of the gene (high expression vs low expression). Kaplan-meier survival map was used to analyze the overall survival (OS) of patients with ES, and the core genes were respectively passed through R language to obtain the kaplan-meier survival map.

Hub gene analysis and identification
The genes in the co-expression module were uploaded to the search tool to retrieve the interacting genes/proteins (STRING) online database (version 11.0) to evaluate PPI information and construct the functional protein association network. Interactions with a composite score above 0.9 were considered significant. STRING was used to detect the PPI central gene, and the connectivity was set to the top 5%. The weight network of each common expression module is weighted by STRING. The PPI central gene in the top weighted network is considered to be the true central gene [8].

Statistical Analyses
Pearsons 2 test was used to investigate the differences in the clinical and pathological parameters of each group. Overall survival rate (OS) was calculated by survival analysis. OS was defined as the time from the initial diagnosis of ES to death from any cause. The survival rate was statistically analyzed by kaplan-meier, and the difference of survival curve was statistically analyzed by log-rank test.
Spearman's rank correlation (rs) was used to investigate the relationship between relapsed and nonrelapsed groups. All statistical analyses were performed using IBM SPSS 20.0 (IBM Corp., Armonk, NY, USA) and R version 3.3.0 (The R Foundation). P value was bilateral, and P < 0.05 was considered statistically significant.

Construction of weighted co-expression network and identifcation of key modules
To construct a gene co-expression network, the original data of ES (GSE63155 and GSE63156) were downloaded from the GEO database. R is used for background correction and normalization, and the original data is preprocessed in the same way. The probe was matched with the gene symbol by rpacket annotation, and the probe matching multiple genes was removed. For the gene matched by multiple probes, the median was taken as the final expression value. In the end, we had a total of 24,991 genes. We calculated the SD values of each gene, sequenced them from large to small, and selected the first 5,000 genes as WGCNA. The function of fashClust in the WGCNA package was used for cluster analysis of 5000 genes ( Figure 1A and Figure 2A). The selection of soft threshold power is an important step in the construction of WGCNA. The network topology of 1 ~ 20 threshold weights was analyzed, and the scale independence and average connectivity of WGCNA were determined. As shown in Figure 1B, C and Figure 2B, C, the power value 9 was selected as the lowest power (0.9) of the scaling free topological ft index to generate a hierarchical clustering tree (dendrogram) of 5,000 genes. We set the MEDiss Thres to 0.25 to merge similar modules ( Figure 3A and Figure 4A) and generated 65 modules ( Figure 5A and Figure 6A). The gene statistics in each module are shown in Table1 and 2. Genes that cannot be included in any module are put into the grey module and removed in subsequent analysis.

Correlation between modules and identifcation of key modules
We analyzed the interaction between 65 modules and drew a network heat map (

Look for common core genes
The core differentially expressed genes were verified and the core genes were found in two independent ES data sets (GSE63155, GSE63156). By analyzing the genes that play a key role in recurrence in GSE63155, we can find that Blue module and black module are positively correlated in the recurrence group through WGCNA, while GSE63156 is Brown module and turquoise module. Euler diagram shows that they share 12 genes (see figure 9). Second, through WGCNA, we can find that brown module and Blue module are negatively correlated in the non-recurrence group, while GSE63156 is Blue module. By Euler diagram, we found that they Shared 26 genes (see figure 10).So we have 38 genes that are critical to the recurrence group.

Function enrichment analysis in two key modules
In order to study the role of core genes, we conducted enrichment analysis and explored the pathways involved in BP and these two key modules. GO enrichment of BP by DAVID shows that, Recurrence of core set of genes is mainly enriched in regulation of phosphorus metabolic process, regulation of phosphate metabolic process, regulation of phosphorylation, the response to hypoxia, the response to oxygen levels, Regulation of protein amino acid phosphorylation (FIG. 11,12,13,).

Identifcation of hub genes in the Salmon module and Darkorange module
We submitted the core gene set related to prognosis to the protein interaction of STRING, with the binding confidence interval of truncation set at 0.4. In Plugin Molecular Complex Detection (MCODE), significant mondels with strong protein-protein connections are calculated and selected, with default parameters (degree cut≥2, node score cut≥2, k-core ≥2, maximum depth =100). In order to P&lt;0.05 was considered statistically significant. The candidate genes of node degree were sequenced and the core genes were selected for further analysis. Figure 18 shows the hub genes for SRSF11 TRIM39, SOCS3, NUPL2, COPS5. There are few studies on the causes of recurrence of ES and most of them are based on clinically relevant studies. Therefore, we urgently need to master the molecular mechanism of recurrence of ES, so as to facilitate our treatment and diagnosis. Therefore, we first through the analysis of GEO data sets, by incorporating two independent ES (GSE63155 GSE63156) data sets on core differentially expressed genes are verified and looking for the core found SRSF11, TRIM39, SOCS3, NUPL2, COPS5 by influencing the phosphorylation of bone and oxidative stress path is the main factor affecting the recurrence of ES.
Oxidative stress is an important factor affecting tumors. Reactive oxygen species (ROS) and reactive nitrogen (RNS) are highly reactive by-products of energy produced during physiological processes (such as metabolism).ROS are prone to interact with lipids, proteins and DNA, leading to oxidative damage, which, if unchecked, may lead to the occurrence and development of a variety of pathologies. Levels of ROS increase in cancer cells compared to normal cells. Although ROS production and removal responses are essentially the same in cancer cells and normal cells, increased ROS production and decreased detoxification can lead to oxidative stress and loss of REDOX control [10]. ROS play an important role in oncogenic signaling, and elevated ROS levels are an important feature of many highly invasive cancers, possibly including ewing tumors [11]. There is strong evidence that many malignancies have a permanently active "oxidative stress phenotype" that leads to increased invasion, which is considered another marker of cancer recurrence [12]. Among other effects, mitochondrial respiratory defects inactivate PTEN in a redox-dependent manner, leading to activation of the PI3K/Akt pathway for survival [13]. ROS dependent inactivation of PTEN (through the oxidation of bound cysteine) and activation of the PI3K/Akt pathway highlights the role of ROS as a second receptor that regulates intracellular signal transduction to determine cell fate. ROS dependent MAP kinase cascade regulation also plays an important role in cell fate regulation through REDOX dependent regulation of ASK1 [14]. The ROS/Jab1/Trx signaling pathway may be significantly associated with the recurrence and metastasis of tumor patients [15]. In this study, oxidative stress pathway was significantly upregulated in the recurrence group, which may be closely related to tumor recurrence. which about 50 species of higher organisms participate in RNA splicing [16]. The splicing factor activity of this family is regulated by reversible phosphorylation mediated by protein kinases belonging to SRPK and CLK families and by kinases activated by different signaling pathways such as MAPK, PI3K and Akt [17]. They are involved in tumor apoptosis by activating ROS and other oxidative stress pathways through interaction, and affect recurrence and metastasis [18]. Tripartite motif 39 (TRIM39), also known as the Ring finger protein 23 (RNF23), belongs to the class with annular structure domain, B -box structure and curly structure domains for characteristics of protein family.
In recent years, TRIM39 stabilized apoptosis regulator 1 (moap-1) by inhibiting the APC/C cdh1mediated polyubiquitination process [19]. This could make it easier for us to find precise treatments.

Declarations
Consent for publication Not applicable.

Detail of funding
None.

Ethics committee
The information of patients with ES obtained from the GEO database does not contain any identifiers and is public. Due to the retrospective nature of the study, the patient's informed consent was not required. All data base information were deidentified by United States institutional review board approval.This analysis does not involve interaction with human subjects or the use of personally identifiable information. Tables Table 1 Gene statistics in each module of GSE63155.   Module  Genes  black  57  blue  647  brown  362  cyan  11  green  146  greenyellow  34  grey  418  magenta  50  pink  54  purple  38  red  112  salmon  18  tan  23  turquoise  1357  yellow  222  Table 2 Gene statistics in each module of GSE63156.   Module  Genes  black  27  blue  391  brown  320  green  73  grey  309  magenta  24  pink  25  red  28  turquoise  507          Euler diagram of the number of differentially expressed genes identified as significant (FDR adjusted p<0.05) for the Positive correlation of recurrence. Of 967 genes, 12 were identified as significant.

Figure 10
Euler diagram of the number of differentially expressed genes identified as significant (FDR adjusted p<0.05) for the Negative correlation of Non-recurrence. Of 1317 genes, 26 were identified as significant.

Figure 11
Functional enrichment and pathway analysis of the hub gene.

Figure 12
Hub Functional enrichment and pathway analysis of the hub gene.

Figure 13
Functional enrichment and pathway correlate with gene of the hub gene.

Figure 18
The top hub genes in the core gene.