Identication of a Programmed Cell Death-related Signature for Prediction of Osteosarcoma

Background: This study aims to perform bioinformatics analysis of programmed cell death-related genes (PCDGs) in osteosarcoma, and to construct a multi-gene signature for predicting the prognosis. Methods: The functional enrichment analysis was applied for prognostic PCDGs, and PPI network as well as drug-gene interactions were constructed. In order to set up the prognosis evaluation system, we established a prognosis model by integrating PCDGs. The Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression analysis as well as the multivariate Cox proportional hazard regression analysis were used to construct the ve-genes signature (MUC1, TCF7L2, TGFB2, TRIAP1, CBS) for prediction in Gene Expression Omnibus (GEO) cohort. According to the median risk score, survival analysis was conducted to evaluate the prognostic value of the risk score in GEO cohort. Next, by combining other clinic-pathological independent prognostic factor, stage at diagnosis, a nomogram was established to predict individual survival probability. Result: GO analysis showed that the 15 prognostic PCDGs were mainly enriched in apoptotic signaling pathway, regulation of secretion and p53 signaling pathway. KEGG analysis showed that aforesaid genes were mainly related to PI3K-Akt signaling pathway, diverse neoplasms signaling pathway and hepatitis B. Drug-gene interactions displayed available drugs for inuencing osteosarcoma via programmed cell death, such as adalimumab, tacrolimus and sirolimus. The risk score was constructed based on 5 genes and patients were signicantly divided into high-risk and low-risk groups according to overall survival. In multivariate Cox regression analysis, risk score was still an independent prognostic factor (HR = 2.526, 95% CI = 1.597−3.994, P < 0.001). Cumulative curve showed that low-risk score patients were signicantly had better prognosis than that of patients with high-risk score (P < 0.001). The external independent TARGET cohort proved the validity of risk score model and the nomograph. Conclusion: From the facet of programmed cell death, we provided a multi-gene


Background
Osteosarcoma, a type of malignant tumor, whose survival rate has not been obviously improved for decades. As previous researches exhibited, osteosarcoma often showed resistance to the methods of immunity, such as inhibitor for PD-1/PD-L1 [1,2]. Thus, brand-new and e cient therapeutics are urgently needed. In the past few decades, people have learned increasingly about the death of cell [3]. And programmed cell death (PCD) as one of the breakthroughs has huge potential in application to the treatments for cancers. PCD, also named regulated cell death, is a biologically controlled process. There are many kinds of PCD, including apoptosis, ferroptosis, autophagy-dependent cell death, pyroptosis, entotic cell death and so on [3]. Among them, apoptosis, ferroptosis and autophagy have the most intensively investigated and these processes are suitable for bioinformatics analysis. Apoptosis, widely participating in early development and tumor progression, is different from necrosis in many facets, such as morphology, availability of caspases and biochemical features [4]. Although Ford et al. discovered that apoptotic tumor cells may stimulate the proliferation of tumors, triggering apoptosis is still one of the important approaches for killing cancer cells [5]. Ferroptosis, is a special form of PCD, dependent on iron and excessive peroxidation of polyunsaturated fatty acids. Additionally, targeting the sensibility of ferroptosis in tumors, many kinds of drugs like altretamine, sorafenib, and silica nanoparticles had already shown the therapeutic potential [6]. Aautophagy, widely exists in eukaryotic cells, disintegrating cellular organelles and macromolecules, may enhance the tolerance of tumor to treatments [7,8].
Increasingly, compelling evidence has shown the signi cance of PCD in the therapeutics for osteosarcoma. Via inducing cell apoptosis, many drugs play roles in antineoplastic activity in osteosarcoma, such as apatinib, As2S2, artocarpin and metformin [9][10][11][12]. Further, Liansheng and coworkers observed that, inhibited TGFB1I1 which is up-regulated in exosome from osteosarcoma can promote apoptosis and inhibit proliferation [13]. As previous researches exhibited, autophagy process in osteosarcoma may have diverse effects owing to the stage of autophagy [14]. In general, inhibition of autophagy often has a negative impact on osteosarcoma [15][16][17]. Even though there are few researches about ferroptosis in osteosarcoma, researchers still discovered relational drugs like EF24 and PEITC [18,19].
In the present research, we explored prognostic PCD-related genes (PCDGs) in osteosarcoma. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and protein-protein interactions (PPI) network map analyses were next carried out and drug-gene interactions was displayed. In order to construct a prognostic model of osteosarcoma integrating PCDGs, the least absolute shrinkage and selection operator regression (LASSO) method as well as multivariate Cox regression with stepwise method analyses were used to further screen the genes for constructing a risk score model [20]. Based on the median risk score, survival analysis was conducted to evaluate the prognostic value of the model. Then, the risk score and tumor metastasis stage at diagnosis were recognized independent prognostic factors. Further, a nomogram was established to predict individual survival probability. Finally, an external independent TARGET cohort proved the validity of the risk score model as well as the nomograph.

Data Sources and Data Preprocessing
We rstly paid attention to all the open samples from patients with overall survival (OS) rate for osteosarcoma. Total three datasets (GSE21257, GSE16091, GSE39055) in GEO (https://www.ncbi.nlm.nih.gov/gds) were collected to make GEO merged cohort, and the "ComBat" algorithm was applied to reduce the likelihood of batch effects from non-biological technical biases [21].
And then samples from TARGET (https://ocg.cancer.gov/programs/target/data-matrix) were also gathered, and FPKM values were transformed into transcripts per kilobase million (TPM) values and then log2(TPM + 1) values, which are more comparable between samples and more similar to data from microarrays [22,23]. All the gene matrices were standardized by quantiles normalization. Finally, 121 patients in GEO merged cohort and 95 patients in TARGET cohort were selected.

Construction of the prognostic gene signatures
First, respectively, Kaplan-Meier survival analysis and univariate Cox hazard analysis were applied on the entire GEO merged cohort to screen PCDGs signi cantly associated with OS rate. Then, the LASSO method was implemented to reduce dimensionality and using the R package "glmnet". Next, the multivariate Cox regression analysis with stepwise method was performed to build the risk score model [27]. As weights, beta (β) coe cients of the prognostic genes in the multivariate logistic analysis were used to calculate the risk score.
where EXP stands for the expression level of each gene. The speci city and sensitivity of the prognosis prediction based on the risk score model, were depicted by a receiver operating characteristic (ROC) curve and quanti ed by the area under the ROC curve (AUC) using "timeROC" package [28].
In order to learn whether PCDGs related risk index can be used as an independent predictor, univariate as well as multivariate Cox regression analyses were conducted. Risk score, gender, age, tumor site and metastasis stage at diagnosis were used as covariates in both GEO merged cohort and TARGET cohort.
The construction of nomogram Independent predictors, metastasis stage at diagnosis and risk score were used to construct the nomogram together using the "rms" package in R. Then, calibration curves were drawn to assess the consistency between actual and predicted survival.

Identi cation and enrichment analysis of PCDGs
The GEO merged cohort consisted of 121 patients with OS rate. The clinical characteristics of the patients in both GEO and TARGET cohort were collected. The ow chart of this study was illuminated in Supplement Fig. 1. We searched for PCDGs in the Harmonizome database [29]. A total of 836 PCDGs were selected (Supplement table 1). To further get prognostic PCDGs for osteosarcoma, Kaplan-Meier and univariate Cox hazard analyses were applied to GEO merged cohort (p < 0.05 was considered signi cant). Results showed 15 PCDGs were signi cantly correlated with OS rate (Fig. 1A). To further understand PCDGs in osteosarcoma, GO, KEGG and PPI network were applied to GEO merged cohort ( Fig. 1B-1D). Via GO analysis, among PCDGs, prognostic genes for osteosarcoma mainly participate in apoptotic signaling pathway, regulation of secretion and p53 signaling pathway, suggesting correlative treatments for osteosarcoma may be advantageous. Via KEGG analysis, the aforesaid genes were of universal applicability and they were signi cantly correlated with other tumors like hepatocellular carcinoma, colorectal cancer, prostate cancer, gastric cancer and pancreatic cancer. Via KEGG analysis, the prognostic PCDGs also were mainly enriched in PI3K-Akt signaling pathway, PD-L1 expression and PD-1 checkpoint pathway, TGF-beta signaling pathway. Next, with drug-gene interactions, interrelated available drugs were exhibited ( Fig. 2A). In facet of PCDGs of osteosarcoma, some drugs or chemical compounds may exert in uence, such as adalimumab, aspirin, cysteine, fasudil, pravastatin and repaglinide. And some of them were already used to the cure of cancer or other diseases.
Construction of the prognostic gene signatures 15 prognostic PCDGs entered the LASSO regression analysis. Nine genes (CREB3L1, DNAJC10, LAMP1, MUC1, TCF7L2, TGFB2, TRIAP1, CBS, KEAP1) were identi ed and subsequently were exerted multivariate Cox regression analysis with the stepwise method ( Fig. 2B-2C). Finally, 5 genes (MUC1, TCF7L2, TGFB2, TRIAP1 and CBS) related to the prognosis of osteosarcoma were obtained (Fig. 2D). The coe cients of each gene are shown in Table 1. The median of risk score was used to de ne the groups.
In GEO merged cohort, the distribution of risk scores in patients and the relationship between risk scores and survival time were displayed (Fig. 3A-3B). By heatmap, gene expression pro les in high-risk and lowrisk were exhibited (Fig. 3C). The genes with HR > 1(MUC1, TCF7L2, TRIAP1 and CBS) were considered dangerous, while the gene with HR < 1 (TGFB2) protective. Additionally, the high-risk group has the higher expression dangerous genes with the lower protective one. With the Kaplan-Meier cumulative curve in GEO merged cohort, the validity of the risk model was exhibited (Fig. 4A). By the time-dependent ROC curves, the AUC for 1-year, 2-year, 3-year, and 5-year OS were gathered (Fig. 4B).

The risk model based on PCDGs as an independent prognostic factor
In GEO merged cohort, both univariate and multivariate analysis showed that the risk score an independent prognostic indicator (HR = 2.764, 95%CI = 1.750-4.364, P < 0.001; HR = 2.526, 95%CI = 1.597-3.994, P < 0.001) (Fig. 4C-4D). Further, we validated this model in the external independent TARGET cohort with the same way (Fig. 5A-5B). The risk score was still proved an independent prognostic factor the same as tumor stage at diagnosis (Fig. 5C-5D).

The construction of nomogram
A nomogram is not only a highly legible illustration of a mathematical model, but also a convenient tool for predicting the outcome of individual patients in the clinical environment. The above analysis showed that the risk score as well as tumor stage at diagnosis were independent prognostic indicators. And in this study, the nomogram consisting of above signi cant risk factors was established, with C-index of 0.798 in GEO merged cohort and 0.700 in TARGET cohort (Fig. 6A). The calibration curves showed nonsigni cant deviations between predicted and actual probability in both GEO merged cohort and TARGET cohort (Fig. 6B-6C).

Discussion
Programmed cell death is crucial for the normal growth and development of an organism [30]. As a special kind of cell, cancer cell, is also controlled by programmed cell death via its speci c patterns.
Additionally, the imbalance of programmed cell death in a cell may be also cancerogenic [30]. Among them, apoptosis, ferroptosis and autophagy have been the most extensively studied. Further, programmed cell death is relational to the cancer metastasis as well as prognosis [31][32][33].
In the previous studies, there are several gene expression-based signatures for the prognosis of osteosarcoma [34][35][36][37][38][39][40]. Moreover, gene signatures based on cell death have been reported in diverse cancers, such as lung cancer, hepatocellular carcinoma and gastric cancer [32,33,41]. Although an earlier study indicated the importance of apoptosis, little attention was paid to PCDGs in prognosis prediction of osteosarcoma [36]. In this study, we explored and analyzed the prognostic PCDGs in GEO merged osteosarcoma dataset. Via drug-gene interactions, relational available drugs were excavated. LASSO regression as well as multivariate Cox regression with stepwise method analyses were exerted to develop ve prognostic markers for GEO merged cohort. In the GEO merged cohort, risk scores signi cantly strati ed patient outcomes. Further, both univariate and multivariate analysis showed that the risk score an independent prognostic indicator. Subsequently, the nomogram consisting of signi cant risk factors was established. More importantly, in the external independent TARGET cohort, the prognostic e cacy of the ve-gene signature as well as the nomogram were veri ed.
In previous researches, it has been shown that the ve genes contained in the signature are all related to cancer. MUC1, a membrane tethered mucin glycoprotein, as a high-risk related gene in our study, was proved a kind of oncogene by most human carcinomas [42,43]. In the facet of apoptosis, MUC1 often acted as an inhibitory of apoptosis in cancer cells while inducing apoptosis of activated human T cells in the study by Gimmi et al. [44][45][46][47]. As an inhibitor of apoptosis, MUC1 conferred the resistance of a variety of drugs as well as radiation treatment on cancer cells [48][49][50][51]. Further, many studies indicated the relationship between the high expression of MUC1 and poor prognosis with enhanced metastasis [52,53]. A possible underlying mechanism by Zhao and his colleagues is that, MUC1 could prolong the survival of circulating tumor cells [54]. TCF7L2, transcription factor 7-like 2, is a critical member of the Wnt/beta-catenin canonical signaling pathway and a moderator of apoptosis [55,56]. In a variety of tumors, such as colorectal cancer, acute myeloid leukemia and clear cell renal cell carcinoma, this factor was proved related to the metastasis and poor prognosis [57][58][59][60]. Further, in beta-estradiol treatment process of osteosarcoma, TCF7L2 was observed reduce the expression with other members involved in Wnt/beta-catenin canonical signaling pathway [61]. TRIAP1 (TP53 regulated inhibitor of apoptosis 1) is a notable inhibitor of apoptosis controlled by TP53 and an available target by epirubicin in osteosarcoma [62]. CBS, cystathionine β-synthase, was associated with homocysteine metabolism, remethylation metabolism, the generation of H 2 S, drug resistance, oxidative stress and ferroptosis [63][64][65]. In many neoplasms, CBS was proven differentially expressed to normal tissue and harmful to prognosis. As for osteosarcoma, HU Yang and his group revealed the difference of gene expression of CBS between tumor and normal samples, but did not explore the impact of CBS and its mechanism [66]. TGFB2 encodes a secreted ligand of the transforming growth factor-beta (TGF-β2) superfamily of proteins. TGF-β2 is capable of suppressing interleukin-2-dependent growth of T lymphocytes, and TGF-β2 can be either a proto-oncogene or a tumor suppressor [67]. According to early studies, the function of TGF-β2 was controversial and heterogeneous. On the one hand, TGF-β2 promoted the progress of malignant tumors, such as glioblastoma and breast cancer [68,69]. On the other hand, TGF-β2 may play a tumor inhibition in pancreatic ductal adenocarcinoma as well as squamous cell carcinomas [70,71]. Further, TGF-β2 introduced the dormancy of malignant disseminated tumor cells in the bone marrow and TGF-β2 introduced the restriction of metastasis in lungs [72][73][74].
For osteosarcoma population, our 5-gene risk score model based on PCDGs was shown to be a reliable predictor of prognosis. However, the present study still has some limitations. Firstly, the race and district of patient were heterogeneous because of the retrospective multiple-platform study. Secondly, integrated and uniform clinical characteristic was incomplete for osteosarcoma cohort, needing preferable clinical follow-up and data collection.

Conclusion
In conclusion, our study established a novel 5-gene signature and nomogram to forecast the prognosis of osteosarcoma patients, which may contribute to the clinical decision-making of individual therapy.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
Publicly available datasets were analyzed in this study. The data are accessible at the following repositories: https://www.ncbi.nlm.nih.gov/gds, https://ocg.cancer.gov/programs/target/data-matrix.

Competing interests
The authors declare that they have no competing interests.   Tables   Due to technical limitations, table 1 is only available as a download in the Supplemental Files section. Figure 1 We searched for PCDGs in the Harmonizome database [29]. A total of 836 PCDGs were selected (Supplement table 1). To further get prognostic PCDGs for osteosarcoma, Kaplan-Meier and univariate Cox hazard analyses were applied to GEO merged cohort (p<0.05 was considered signi cant). Results showed 15 PCDGs were signi cantly correlated with OS rate (Fig. 1A). To further understand PCDGs in osteosarcoma, GO, KEGG and PPI network were applied to GEO merged cohort (Fig. 1B-1D).

Figure 3
In GEO merged cohort, the distribution of risk scores in patients and the relationship between risk scores and survival time were displayed (Fig. 3A-3B). By heatmap, gene expression pro les in high-risk and lowrisk were exhibited (Fig. 3C).

Figure 5
Further, we validated this model in the external independent TARGET cohort with the same way ( Fig. 5A-5B). The risk score was still proved an independent prognostic factor the same as tumor stage at diagnosis (Fig. 5C-5D).