Identication of genes that inuence poor prognosis in osteosarcoma

Objectives Osteosarcoma (OS) is the most common primary bone cancer in children and adolescents. At present, the 5-year overall survival rate of OS patients is about 65%, and the long-term prognosis is still not ideal. The study was designed to screen genes that could contribute to the poor prognosis of OS and explore their potential pathogenic mechanisms. Methods The gene expression prole of the GSE94805 dataset from the GEO database, containing data from 12 U2OS cell samples, including four control, four quiescent, and four senescent samples was obtained. Co-expressed differentially expressed genes (DEGs) in OS U2OS cells were selected using the GEO2R tool and Venn diagram analysis. Next, using the STRING, Cytoscap, and Molecular Complex Detection (MCODE) plug-in, the related protein-protein interaction network among upregulated genes was analyzed. Moreover, Kaplan-Meier plots were used to analyze the relationship between the identied genes and OS prognosis. Genes signicantly associated with worse prognosis were evaluated using the Gene Expression Proling Interactive Analysis. Results Thirteen genes were conrmed to be signicantly more expressed in OS than in normal tissues. Five genes (AURKB, EXO1, KIF4A, KIF15, and MCM4) were found to inuence OS prognosis. Conclusion We identied ve core genes related to the prognosis of OS and constructed a clinical prediction model for OS. Our data may provide a reference for future research on mechanisms, clinical diagnosis, and treatment of OS.


Introduction
Osteosarcoma (OS) is a highly hereditary and unstable cancer, and is the most common primary malignant bone tumor in children and adolescents, accounting for approximately 20% of all bone tumors [1]. The 5-year survival rate of patients with non-metastatic OS was only 60-70% through the combined treatment of surgery and chemotherapy [2]. Moreover, OS is prone to local invasion and early metastasis, mainly affecting the lung and skeleton. Indeed, local recurrence, metastasis, and chemotherapy resistance are associated to a poor prognosis [3,4]. Despite signi cant improvements in treatment strategies in recent years, treatment outcomes for patients with metastatic or recurrent osteosarcoma are still pessimistic, with a rather grim prognosis of a greatly reduced 5-year overall survival rate of only 20-30% [5]. Current surgical treatment has a serious impact on limb function and chemotherapy can induce diverse side effects; therefore, there is an urgent need of new and enhanced treatment and early diagnosis strategies.
The rapid development of gene chip technology allowed its use in the diagnosis, treatment, and prognosis evaluation of diseases. Bioinformatics, as a new interdisciplinary subject in the eld of life science, can deeply mine the microarray data of gene expression pro les, identifying new cues for exploring the underlying molecular pathological mechanisms [6,7]. In particular, studies have shown that OS is associated to altered expression of oncogenes and tumor suppressor genes and thereby in uence the cell migration, angiogenesis, cell apoptosis, and proliferation.
In this study, bioinformatics methods were used to analyze publicly available OS microarray data and identify differentially expressed genes (DEGs) that could potentially contribute for the molecular mechanisms or as potential therapeutic targets of OS. Moreover, recognition of an OS-speci c genephenotype correlation network could aid on a comprehensive analysis to screen the disorders associated with osteosarcoma typing.

Microarray data information
The microarray dataset GSE94805 was obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), which was based on the GPL13252 platform (Agilent-027114 Genotypic Technology designed Custom Human Whole Genome 8 × 60 k Microarray) and included 12 samples of OS U2OS cells (four control, four quiescence, and four senescence samples).

Data processing of co-expressed DEGs
The GEO2R online tool was used to identify the DEGs in control, quiescence, and senescence OS samples of the dataset, with |log fold change (FC)| > 2 and adjusted P-value < 0.05. The co-expressed DEGs between control vs. quiescence and control vs. senescence were selected through a Venn diagram.

Enrichment analysis of co-expressed DEGs
Gene ontology (GO) analysis was performed using the differential genes selected from the David website (https://david.ncifcrf.gov/). Molecular function, biological process, and cellular component were selected to observe the functional category or cellular location of the differential genes. Using the KOBAS 3.0 database, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the differential genes was carried out to observe their role in possible signaling pathways.

Protein-protein interaction (PPI) network and module analysis
The PPI network of the co-expressing DEGs was constructed using the retrieval tool STRING (https://string-db.org/). The MCODE plug-in of Cytoscape (version 3.7.1) was used to select the densely connected modules in the PPI network and the key genes were determined.

Survival analysis and RNA sequencing expression of core genes
Kaplan-Meier-plotter is a commonly used online tool for assessing the effect on survival of a great number of genes based on the European Genome-Phenome Archive, The Cancer Genome Atlas (TCGA), and GEO databases (Affymetrix microarrays only). The log rank P-value and hazard ratio (HR) with 95% con dence intervals were computed and are shown in the plots. To validate the identi ed DEGs, the Gene Expression Pro ling Interactive Analysis (http://gepia.cancer-pku.cn) was performed using the data of RNA sequencing expression based on thousands of samples from the Genotype-Tissue Expression and TCGA projects.

Evaluation of prognostic indicators
The prognostic indicators, including overall survival, progression-free interval (PFI), disease-free interval (DFI), and disease-speci c survival (DSS) were evaluated using the Sangerbox software (http://www.sangerbox.com). Survival analysis was evaluated using the Kaplan-Meier method and log rank test (P < 0.05). Every median core gene expression was chosen as the cut-off value for the human cancer dichotomy, thus dividing respective patients into high-and low-risk groups.

Statistical analysis
Statistical analysis was performed using Graph Pad Prism 7 (GraphPad Software, La Jolla, CA, USA) and IBM SPSS Statistics for Windows version 23.0 (IBM Corp., Armonk, NY, USA). Data are expressed as mean ± standard deviation. The F test was used to analyze the homogeneity of variance among quantitative data groups, the Student's t-test and non-parametric test were used to compare the data between groups, and P-value < 0.05 was selected as the threshold of signi cance.

Identi cation of DEGs
According to the GSE94805 dataset, 614 and 1,105 upregulated DEGs of normal vs. quiescent OS and normal vs. senescent OS, respectively, were identi ed. Overall, a total of 360 upregulated DEGs shared by the two DEG sets were selected for further analysis (Fig. 1).

Network and module analysis of co-expressed DEGs
The PPI network corresponding to the 360 intersecting DEGs was analyzed using Cytoscape (Fig. 1A).
Additional analysis further revealed a total of 113 upregulated core genes among the network (Fig. 1B).

Analysis of core genes by the Kaplan-Meier plotter and GEPIA
Survival outcome related to the 113 core genes identi ed was evaluated by Kaplan-Meier plots. Sixteen genes had signi cantly worse survival, whereas 97 had no signi cant impact on survival (P < 0.05, Fig. 3A). GEPIA was then used to determine the expression pro le of the 16 genes, revealed that 13 of these genes were highly expressed in the quiescent and senescent groups compared to the control group (P < 0.05, Fig. 3B).

Enrichment analysis of co-expressed DEGs
Based on the PPI network analysis, enrichment analysis of co-expressed DEGs was performed by GO term and KEGG pathway analysis. The GO results indicated that (1) for biological processes, 13 core genes were particularly enriched in mitotic sister chromatid segregation, G2/M transition of mitotic cell cycle, spindle checkpoint, mitotic spindle organization, regulation of cytokinesis, mitotic spindle assembly, and negative regulation of apoptotic process; (2) for cell component, 13 core genes were signi cantly enriched in the chromosome passenger complex, spindle midzone, spindle microtubule, nucleoplasm, midbody, kinetochore, condensed nuclear chromosome, centromeric region, spindle pole centrosome, chromocenter, chromosome, and centromeric region; (3) for molecular function, 13 core genes were enriched in the ATP binding, histone serine kinase activity, and protein serine/threonine/tyrosine kinase activity (Table 1). The results from KEGG analysis showed that the 13 upregulated core genes were enriched in multiple pathways (Table 2 and Fig. 4A), including cell cycle, progesterone-mediated oocyte maturation, mismatch repair, cellular senescence, DNA replication, viral carcinogenesis, oocyte meiosis, and FxO signaling pathways. According to the biological pathway for key genes, 13 upregulated key genes were enriched in PLK1 signaling events, mitotic prometaphase, aurora kinases, FOXM1 transcription factor network, cell cycle, and aurora B signaling (Fig. 4B)

Five core genes reduce the prognosis of OS
To evaluate the prognostic value of the identi ed ve core genes in OS, the survival outcomes were assessed. As shown in Fig. 5, high expression of the ve core genes (AURKB, EXO1, KIF4A, KIF15, and MCM4) had worse OS prognosis.

Discussion
Osteosarcoma is the most common primary bone tumor in children and adolescents, and patients with advanced osteosarcoma with signs of metastasis have a poor prognosis [8]. In order to identify new valuable prognostic biomarkers for OS, bioinformatics methods based on the GSE94805 dataset were used in this study.
Herein, the gene expression pro les of U2OS cells in the quiescence and senescence phases were analyzed, revealing 360 overlapping upregulated DEGs. However, these genes preliminarily screened were not necessarily related to the occurrence and development of OS. To further re ne the study targets, String, Cytoscap, and MCODE plug-ins were used to conduct enrichment analysis, narrowing the investigation list to 113 key candidate genes. Next, 13 signi cantly upregulated genes were selected by comparing cancer and para-cancerous samples from OS patients. Furthermore, analysis with the Sangbox software showed that the expression of ve closely related core genes -AURKB, EXO1, KIF4A, KIF15, and MCM4 -had signi cant impact on several survival and disease-progression prognostic indicators.
Aurora kinases (AURKs) are highly conserved serine/threonine kinases, including AURKA, AURKB, and AURKC, which are key regulators involved in mitosis [9]. Studies have shown that AURKB is highly expressed in a variety of human cancers, such as lung [10], gastric [11] and liver [12] cancer, and is associated with tumor invasion and metastasis, and poor patient prognosis [13,14]. Overexpression of AURKB can lead to errors during mitosis, such as damage to the junction between kinetochores and microtubules, interfering with chromosome concentration and spindle assembly, and hindering centromere localization, which can cause abnormal chromosome segregation and cytokinesis, ultimately leading to malignant cell transformation [15]. AURKB inhibitors have no effect on non-proliferative cells and have little effect on normal cells [16]. Therefore, AURKB holds potential as a clinical therapeutic target for tumors.
Exonuclease 1 (EXO1), a member of the Rad2/XPG nuclease family, plays important roles in the regulation of cell cycle checkpoint, replication fork maintenance, and DNA repair after replication [17]. Previous studies revealed that EXO1 is associated with various cancers, including Lynch syndrome, ovarian, gastric, lung, breast, and pancreatic cancers [18]. Overexpression of EXO1 disturbes the relative stability of the genome. Dai et al. [19] reported that EXO1 knockdown can reduce the proliferation of hepatocellular carcinoma cells in vitro, inhibit the survival of cancer clone cells, and is associated with poor prognosis of patients with hepatocellular carcinoma. To date, studies on EXO1 and OS are still lacking, and the role of EXO1 in OS is unclear. The present study suggests that high expression of EXO1 in OS is closely related to disease prognosis, and that the upregulation of EXO1 in OS might predict a poor clinical outcome.
The kinesin superfamily (KIFs) is a group of proteins that share highly conserved motor regions, including 1-14 subtypes. Most KIFs have ATP-dependent activity and catalyze microtubule-dependent tailing reactions. KIFs play important roles in the occurrence and development of tumors. KIF4A, a member of the kinesin-4 family, is located in the nucleus during the interphase, which is mainly involved in the biological processes of chromatin separation, spindle formation, and cytoplasmic separation during mitosis. KIF4A is abnormally expressed in lung and ovarian cancers, and oral squamous cell carcinoma [20]. KIF15, belonging to the kinesin-12 family, is an N-recti ed directional motor that plays a crucial role in the formation of bipolar spindles, and maintains the overall integrity of the spindle by shaping and stabilizing the kineto-ber [21,22]. Studies have shown that these proteins participate in several cellular processes, including proliferation, apoptosis, differentiation, and development    The prognosis analysis of key genes. (A) The prognostic information of the 113 key candidate genes, among which 16 genes had a signi cantly worse survival rate (P < 0.05). (B) Signi cantly expressed genes related with poor prognosis were analyzed between OS and normal samples using the GEPIA software. Thirteen genes had signi cant expression in OS cancer samples compared to normal samples (*P < 0.05). Red and grey colors represent tumor and normal tissues, respectively.