A Death-Related Gene Signature for Prognosis with Osteosarcoma

Purpose: Osteosarcoma is one of the most prevalent malignancies, and despite signicant advances in its treatment, patient prognosis remains poor and survival rates are low. It is undoubtedly important to explore the possible reasons for the low survival rates of patients and to reveal the differences. Methods and Results: We obtained RNA-Seq (HT seq) and clinical characteristics of osteosarcoma patients from the TCGA database and divided them into survival group and death group. We dened the differentially expressed genes (DEGs) between the two groups as death-related genes (DRGs) and used them to construct a prognostic signature for overall survival of patients with osteosarcoma. The results of the validation demonstrated satisfactory accuracy and predictive prognostic value of the model. In addition, we performed a series of bioinformatic analyses that identied two key genes and the regulatory networks they constituted that may play a role in the progression of osteosarcoma. Conclusion: Our DRGs signature represents a novel and clinically useful prognostic biomarker for patients with osteosarcoma, helping to aid clinical decision-making.


Introduction
Osteosarcoma (OS) is one of the most common malignancies, with an incidence of one to four per million [1,2]. It is most often seen in adolescents and originates in bone tissue, commonly in the epiphysis of bones. Osteosarcoma of the limbs can be cured in about 75% of cases with current treatments [3].
Osteosarcoma in the spine, however, has a relatively low cure rate due to the special anatomy of the spine. Moreover, osteosarcoma is highly malignant and aggressive, and spreads and metastasizes easily through the bloodstream at an early stage. Pulmonary metastases from osteosarcoma are the leading cause of death in patients [4]. Therefore, although many advances have been made in the diagnosis and treatment of osteosarcoma, its survival rate is low and its prognosis is poor.
In order to improve the prognosis of patients with osteosarcoma, it has become imperative to nd the real factors that in uence prognosis. A number of factors affecting the overall survival of OS patients have been reported, including the size and location of the tumor, the surgical option, and the responsiveness of the tumor to chemotherapy [5][6][7]. However, the predictive power of these factors is inadequate due to the heterogeneity of the sample and the diversity of clinical strategies. Therefore, it is necessary to construct more accurate, adaptable and better predictive models.
With the rise of bioinformatics and advances in molecular biology techniques, it has become possible to address these issues. Bioinformatics methods and tools are increasingly being used to explore and analyze biomolecules such as target genes or proteins. A number of studies have used bioinformatics methods to reveal the mechanisms of tumor development, including lung cancer and breast cancer. In addition, high-throughput sequencing technologies have revolutionized tumor genetics, providing comprehensive molecular characterization of a wide range of tumors, including osteosarcoma [8]. A number of epigenetic abnormalities in osteosarcoma have been identi ed from high-throughput sequencing data and experimentally con rmed to be associated with the progression of osteosarcoma [9,10]. Honglai Tian et al. used WGCNA (Weighted Gene Co-expression Network Analysis) to predict the gene expression modules associated with the mechanism of OS occurrence [4]. Hongwu Fan et al. identi ed several genes closely associated with osteosarcoma metastasis, including MMP3, by bioinformatics analysis [11].
Therefore, in this study, data on OS patients were obtained from the TCGA database, and various bioinformatics tools were used to obtain DRGs, analyze their functions and successfully construct a DRGs signature and a nomogram to predict the prognosis of patients with OS.

Acquisition of genes expression dataset and clinical features and screening of death-related genes (DRGs)
We obtained RNA-Seq (HT seq) data of OS from TARGET program in TCGA database. The clinical features of OS patients were also downloaded from TCGA. We included data from 84 cases after excluding patients with unknown survival status. 55 patients were placed in the survival group and 29 patients were placed in the death group. Then the data were analyzed using the limma package [12], P < 0.05 and |log2FC| > 1 was considered statistically signi cant when the DRGs were obtained.

Functional analysis and identi cation of hub genes
To reveal the functional roles of the DRGs, the clusterPro ler package [13] was used to perform GO (Gene Ontology) enrichment analysis and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis and visualize results. PPI (Protein-Protein Interaction) network was constructed by STRING database successively and the MCC algorithm in Cytohubba was then used to identify the top 50 most core genes. Cytoscape software (version 3.8.2) was utilized to visualize the network. The signi cant level was set as combined score > 0.4.

Construction and validation of the DRG-Based prognostic signature
We used lasso regression via glmnet package [14] to obtain the genes associated with prognosis. In order to improve the prediction accuracy of the signature, we used more stringent criteria to screen genes and P < 0.001 was statistically signi cant level. A DRG-based prognostic signature was constructed by stepwise multivariate Cox regression. The risk score was calculated as follows: Riskscore = (Exp = expression level; Coe = regression coe cient). Patients were divided separately into low-risk group and high-risk group based on the median risk score value. We assessed differences in overall survival between two groups by KM survival analysis. A nomogram consisting of clinical features and risk score was then constructed and calibration plots were used as method of assessing the accuracy of it. Then ROC curve analysis was carried out to compute the sensitivity and speci city of the DRG signature in predicting the survival of osteosarcoma patients using the survivalROC package [15]. The ggplot2 package[16] was used for visualizing.

Regulatory network of key genes
We took the intersection of the genes that make up the prognostic signature with the top 50 genes in the PPI and obtained two key genes. Then two databases (Targetscan and miRcode) were used separately to identify the target miRNAs for the two key genes. Following this, we used Starbase to nd lncRNAs that interacted with miRNAs. Finally, the regulatory network of key genes in OS was established. In addition, the top 20 KEGG pathways in which miRNAs were mainly involved were shown.

Analysis of osteosarcoma immune in ltration patterns
The percentage of various immune cells in all OS samples was analyzed using CIBERSORT. Wilcoxon rank sum test was performed to detect the differences in immune cell proportions between the two groups and the statistical signi cance level was P < 0.05. In addition, we used difference test and correlation tests to analyze the association between key genes and the proportion of immune cell in ltration, respectively. In the difference test, al tumor samples were divided into high and low expression groups according to the median expression of key genes, and then compared the difference in the proportion of various immune cell in ltrations between the two groups. The correlation test was a method that used the Pearson test to directly determine the correlation between the expression of key genes and the proportion of immune cell in ltration. Finally, the intersection of the results obtained by the two methods was considered to be statistically signi cant.

Overall survival analysis of tumor microenvironment scores
We used the estimate package [17] to score the tumor microenvironment and divided the entire set into low-score group and high-score group based on the median value of the score. KM (Kaplan-Meier) survival analysis was performed to assess differences in overall survival between two groups. In addition, Wilcoxon rank sum test was used to analysis the differences of correlation between clinical characteristics and tumor microenvironment scores (P < 0.05).

Identi cation of death-related genes (DRGs) and functional analysis
After screening the two groups of data, we obtained 561 differentially expressed genes (DEGs) (Fig. 1).
The top 50 up-regulated and down-regulated DEGs were located in Table 1. Then, we de ned these DEGs as death-related genes (DRGs). Fig. 2A shows the results of KEGG pathway enrichment analysis, DRGs were mainly involved in neuroactive ligand-receptor interaction. The GO analysis showed that DGRs were mainly enriched in immune response, Fc receptor-related pathways, etc (Fig. 2B, D). In terms of CC (cellular component), blood mircoparticle and extracellular matrix were enriched with more DGRs (Fig.  2C). We successfully constructed the PPI of DRGs using Cytoscape and demonstrated the top 50 core genes in the network (Fig. 2E). Table 1 Screening of DRGs from two groups, including top 50 up-regulated and down-regulated genes

DRGs
Gene symbol

Construction and validation of the DRG-Based prognostic signature
After a series of statistical analyses (P < 0.001), we obtained 14 hub genes. This led us to the risk score formula. Detail information of 14 hub genes used to construct the signature was located in Table 2.
Finally, a fourteen-DRG prognostic signature for osteosarcoma patients was successively constructed. We divided the patients into a high-risk group and a low-risk group using the median risk score as a cutoff (Fig. 3C). The survival times of patients with different risk scores are shown in Fig. 3D. The survival rate of patients in the low-risk group was signi cantly higher than that in the high-risk group (P < 0.001) (Fig. 3F).
A nomogram consisting of age, gender and risk score was made to predict the probability of patient survival at 1, 3 and 5 years (Fig. 4A). The calibration plots demonstrated the high accuracy of this nomogram ( Fig. 4B -D). The forest plots showed that risk score was associated with patient prognosis in both univariate COX regression and multivariate COX regression (P < 0.001) (Fig. 4E, F). Finally, we plotted the time-dependent ROC curve to evaluate the predictive power of the fourteen-DRG signature. As shown, the AUC was 0.887, 0.941 and 0.949 at 1 year, 3 years and 5 years (Fig. 3G). Fig. 4H showed that risk score was a better predictor of prognosis compared to other clinical features.

Regulatory network of key genes
We obtained two key genes (CSAG1 and MAGEA12) by intersecting the 14 genes that made up the signature with the top 50 core genes in the PPI of the DGRs (Fig. 5A). Two databases were then used to nd common target miRNAs for key genes, a key miRNA --miR-193a-3p was obtained (Fig. 5B). Subsequently, we had put together a regulatory network with miRNA's targeting lncRNAs, miR-193a-3p, CSAG1 and MAGEA12 (Fig. 5C). In addition, Fig. 5D showed that miR-193a-3p was mainly involved in KEGG pathways such as Pathways in cancer and MAPK signaling pathway.

Immune in ltration patterns
We analyzed the proportion of various immune cells in all OS samples and found that three immune cells, Macrophanges M0, Macrophanges M2, T cells CD4 memory resting, were present in high proportions in most samples (Fig. 6A). In addition, we obtained the degree of correlation between various immune cells (Fig. 6B). Furthermore, by comparing the proportions of various immune cells in the survival group and death group, the proportion of T cells CD8 was the only statistically signi cant difference (Fig. 6C). We noted that the P value for T cells CD4 naive was 0.07 which was within the borderline signi cant interval, so we believed that there was a tendency for differences in the proportions of T cells CD4 naive. Finally, the immune cells that we derived from both tests that were associated with both key genes was T cells CD4 naive (Fig. 6D -I). More detailed data were shown in Table 3.

Overall survival analysis of tumor microenvironment scores
We analyzed all OS samples using the estimate package and obtained three types of scores: StromalScore, ImmuneScore and ESTIMATEScore. Patients were then divided into high and low subgroups based on the median of the scores. KM survival curves show that survival rates were signi cantly higher in the high score group than in the low score group for all three kinds of scores (P < 0.05) (Fig. 7A -C). Analysis of clinical characteristics showed that ESTIMATEScore was signi cantly higher in female patients than in males (P < 0.05) (Fig. 7I). However, there was no statistically signi cant correlation between the remaining clinical characteristics and the scores (Fig. 6D -H).

Discussion
Osteosarcoma (OS) is the most common type of bone malignancy, originating from mesenchymal cells and characterized by the production of bone-like tissue or bone tissue by the tumor cells. And osteosarcoma is commonly found in the long bones of the limbs, particularly the epiphysis, and is characterized by strong localized aggressiveness and early haematogenous metastasis. In recent years, signi cant advances have been made in the diagnosis and treatment of osteosarcoma, including neoadjuvant chemotherapy and improvements in surgical techniques. Although early detection and timely treatment have largely improved the survival rate of the disease, osteosarcoma currently remains a high mortality rate among malignancies in children and adolescents. Therefore, it is necessary to search for markers with high predictive value for patient prognosis during OS development, so that more precise treatment can be implemented. There is no doubt that this has important implications for patients and clinicians alike.
With advances in computer technology and molecular biology, a new interdisciplinary discipline, bioinformatics, has emerged. The analytical tools of bioinformatics can identify those that are meaningful from the vast amount of biological data. The advent of bioinformatics has transformed disease research, with advanced methods and tools used to explore the mechanisms involved in the onset and progression of disease, which also includes various tumors. At the same time, the application of high-throughput sequencing technologies has enabled a comprehensive molecular characterization of tumors, both temporally and spatially. In 2019, Yizhe Xi et al. conducted a study in which they analyzed several hundred circ RNAs that were differentially expressed between osteosarcoma and paraneoplastic tissues and analyzed their potential functions [18]. Kun-Peng Zhu et al.'s study combined bioinformatics analysis and experimentation to construct an RNA regulatory network and explore the underlying mechanisms of OS chemoresistance [19] In addition to this, there is a large body of literature that has examined OS using bioinformatics and has found valuable information. However, there is no literature that correlates the possible reasons for the poor prognosis of OS patients. Therefore, in the present study, the potential causes of low survival in OS patients were focused on. We grouped patients according to their survival status and conducted a series of studies.
In our study, we identi ed several hundred genes that were differentially expressed between the survival and death groups and de ned these genes as death-related genes (DRGs). Then a functional analysis of these genes was performed. KEGG enrichment analysis revealed a pathway related to intercellular signaling. We therefore hypothesize that the OS cell-to-cell signaling was more active in the dead group, and that the OS tissue was more "active" and more prone to metastasis and invasion, resulting in a lower survival rate.
Our main concern was how to predict the prognosis of patients more accurately. The fourteen-DGRs signature obtained after a series of analyses helped us to solve this problem to some extent, and both the KM survival curve and the ROC curve showed the high value of the signature. We have analyzed the two key genes obtained and established a regulatory network. Both CSAG1 and MAGEA12 are related to cancer/testis antigen family. There are several studies have been reported on the role of two key genes in a variety of human tumors. MAGEA12 has been shown to act as a prognostic-related gene in gastric cancer [20]. In addition, overexpression of MAGEA12 is involved in the pathogenesis of human cutaneous squamous cell carcinoma and can also promote invasion of breast cancer cells [21,22]. The study by Chuzhao Lin et al. reported that cancer/testis antigen CSAGE was concurrently expressed with MAGE in chondrosarcoma [23]. In our study, both CSAG1 and MAGEA12 were positively associated with T cells CD8 (Fig. 6E, H). And the percentage of T cells CD8 was higher in the death group than in the survival group (P < 0.05). These results suggest that MAGEA12 and CSAG1 may act through some mechanism on T cell CD8 to promote the progression of osteosarcoma, which is the direction of our future research.
In addition, we also noted that of the fourteen DGRs that comprised the signature, only TAC4 expression was relatively low in the death group (log 2 FC = -1.629). This gene is a member of the tachykinin family of neurotransmitter-encoding genes. And the products of this gene preferentially activate tachykinin receptor 1, and are thought to regulate peripheral endocrine and paracrine functions including blood pressure, the immune system, and endocrine gland secretion [24][25][26]. Unfortunately, no studies have reported a relationship between TAC4 and human tumors. We believe that the relationship between TAC4 and tumors is of interest to explore.
The interaction between immune function and tumor has received increasing attention in recent years and a large number of studies have been reported in the literature. Immunotherapy has a long history in osteosarcoma, with inactivated bacteria being used to treat osteosarcoma for over 100 years, but with controversial results [27]. Mifamurtide, as an immune adjuvant, has been at the forefront of osteosarcoma treatment, but there is a lack of large-scale clinical trials to demonstrate its e cacy and safety, so Maya Kansara et al. suggest that the understanding of the relationship between bone tumors and immune function is still in its infancy [1]. Our GO enrichment data show that DRGs are focused on a number of immune-related pathways. Our analysis of the immune cell composition of OS revealed that Macrophanges M0, Macrophanges M2, and T cells CD4 memory resting were the three most predominant. When comparing the two groups of OS samples, the proportion of T cells CD8 was signi cantly higher in the death group than in the survival group (P < 0.05). There is no doubt that CD8 + T cells are important for their protective immune role against pathogens and tumors. In the case of chronic infections or tumors, the "load" on the immune system is signi cantly increased. Excessive antigenic and/or in ammatory signals continue to act on CD8 + T cells and this leads to a progressive deterioration of T cell function, T cell depletion in RE and ultimately to hypo-and even loss of immune function. Therefore, we believe that the reason for the low survival rate in OS patients is most likely related to the reduced function of the immune system.
As with the immune system, the tumor microenvironment has been one of the focal points of research in recent years. Osteosarcoma grows in a complex and speci c bone microenvironment composed of a wide range of cells. Of these cells that make up the microenvironment, stromal cells and immune cells have received the most attention. Notably, the "Mr. Tumor" will adapt the local microenvironment to its own heterogeneity in a way that suits it. We therefore scored stromal cells and immune cells separately and derived an estimatescore based on these two scores. As shown in the gure, patients with higher scores had signi cantly higher overall survival (P < 0.05). We believe that the scores we obtained are useful as a guide to clinical decision-making, although the process of performing the scoring is more complex.
However, there are some limitations to this study. Firstly, the number of cases we obtained was not su ciently large and a larger sample is needed to validate the e cacy of the fourteen-DGR signature. In addition, the mechanism by which key genes affect patient's survival needs to be further explored.

Conclusions
We successfully constructed a fourteen-DRG signature to predict the prognosis of patients. In addition, a nomogram was obtained. Both of them can help clinical decision making and thus implement more precise interventions for patients.

Declarations
Ethics approval and consent to participate: As this study did not involve human participants, ethical approval and informed consent were not applicable.

Consent for publication:
All authors grant BioMed Central a license to publish the article and identify itself as the original publisher.
Authors' contributions: Bin Xie wrote the manuscript, analyzed the data.
Shiyong Tan analyzed a portion of the data.
Chao Li plotted the graphs.
Junyang Liang designed and guided the implementation of this study. Acknowledgements: Not applicable. Funding: This research did not receive any speci c grant from funding agencies in the public, commercial, or notfor-pro t sectors.
Con icts of Interest: The author(s) declare that they have no competing interests.
Availability of data and material and code availability: The datasets supporting the conclusions of this article are available in the TCGA repository (https://portal.gdc.cancer.gov/).