Comprehensive Analysis of Metabolism-Related lncRNA Related To The Progression And Prognosis In Osteosarcoma From TCGA


 Background: Osteosarcoma is one of the most common malignant neoplasm among children and adolescents. Studies have shown that metabolism-related pathways are more important for the development and metastasis of osteosarcoma. Long non-coding RNA (LncRNA) plays a key role in the occurrence and progression of cancer in a variety of ways, Metabolism-related lncRNA-mediated molecular mechanisms in the regulation of osteosarcoma have not been fully elucidated.Methods: In this study, all metabolic-related mRNAs and metabolic-related LncRNA in osteosarcoma were extracted and identified based on transcriptomic data from the TCGA database. The survival analysis, The univariable and multivariable independent prognostic analysis, The results of gene set enrichment analysis (GSEA) and the nomogram were used to construct a prognosis signature with metabolic LncRNA as prognostic factor.Results: 9 prognostic factors including that LncRNA AC009779.2, LncRNA AL591895.1, LncRNA AC026271.3, LncRNA LPP-AS2, LncRNA LINC01857, LncRNA AP005264.1, LncRNA LINC02454, LncRNA AL133338.1 and LncRNA AC135178.5, respectively. The survival analysis showed that the difference in expression of an individual LncRNA was closely related to poor prognosis in osteosarcoma. The univariable and multivariable independent prognostic analysis showed that the signature had good independent predictive ability for patient survival. The results of gene set enrichment analysis (GSEA) suggest that these predictors may be involved in the metabolism of certain substances or energy in cancer. The nomogram is further drawn for clinical guidance and assistance in clinical decision-making.Conclusions: This study identified multiple metabolic-related lncRNA that can be considered as novel therapeutic targets for osteosarcoma and contribute to better exploring the specific regulatory mechanisms of lncRNA in the metabolism of osteosarcoma.


Introduction
Sarcoma is a common malignant tumor in many tissues or organs [1]. Osteosarcoma, one of the most common Sarcoma, is diagnosed among children and adolescents [2]. Because it occurs in the metaphysis of long bones and the disease progresses rapidly, it often leads to inconvenience and even restricted movement of patients [2]. Although effective progress has been made in current amputation or limb salvage with chemotherapy, the patient's prognosis is not particularly satisfactory, 20% patients have metastasized at the time of diagnosis [3]. Patients with metastatic osteosarcoma have a lower survival rate [4], which means that further studies on the molecular mechanisms of its progression and metastasis are needed. Therefore, it is particularly important to explore new therapeutic targets and more accurate prognostic markers. Long non-coding RNA (LncRNA) has been shown to play an important regulatory role in the occurrence, development, and metastasis of all kinds of cancers [5][6][7]. Recently, the research on abnormal expression of lncRNA in osteosarcoma has become ever more in-depth [8,9]. On the other hand, we know that the growth of bone tissue is inextricably linked to bone metabolism [10]. Abnormal bone metabolism can lead to loss of control of bone growth or abnormal differentiation, which further results in osteoma or osteosarcoma [10]. There are very few reports that lncRNA affects osteosarcoma by regulating the metabolic process [11], and the mechanism and function of most lncRNA in regulating the metabolism of osteosarcoma are not very clear. Regardless of the fact that there are many therapeutic targets and prognostic markers for osteosarcoma, such as immunomodulation, tumor in ltrating immune cells could be used as a prognosis factor in osteosarcoma patients [12], Niu et. al had found that EGR1, CXCL10, MYC and CXCR4 could be regarded as potential biomarker of osteosarcoma [13]. There are few studies in the metabolic process, and its potential mechanism needs further exploration.
In this study, gene expression data and clinical information were available from the Cancer Genome Atlas (TCGA) database, and metabolic genes and related LncRNA were further extracted. 9 Metabolism-related LncRNA with high prognosis were nally obtained with various bioinformatics analysis methods, and the prognostic signature was constructed and veri ed. This study explains the possible metabolic epigenetic mechanism of osteosarcoma, may also provide new independent targets for the treatment of osteosarcoma.

Materials And Methods
Data downloading and processing All transcriptions data of osteosarcoma samples were downloaded from the TCGA (https://gdcportal.nci.nih.gov/) database. There was a total of 265 samples. After removing duplicated and extremely low expression genes, the mRNA and lncRNA matrix were extracted, respectively. The c2.cp.kegg.v7.2.Symbols.Gmt (https://www.gsea-msigdb.org/gsea/downloads.jsp) from the Gene Set Enrichment Analysis (GSEA) database was downloaded [14], a total of 940 related genes in metabolic pathways were extracted. Furthermore, the intersection of the extracted mRNA matrix was taken to obtain the expression data of 935 metabolic genes.
Preliminary Screening of Metabolism-related LncRNA (MRLncRNA) By using correlation test in R 4.0.3, 335 LncRNA that highly related to metabolic genes (corFilter = 0.6, pvalueFilter = 0.001) were acquired. 15 LncRNA signi cantly related to prognosis were obtained by combining with clinical information through Kaplan-Meier survival (P < 0.05) analysis and univariable Cox regression survival (P < 0.05) analysis.

Construction and evaluation of prognostic signature
We used "glmnet" package in R 4.0.3 to perform Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis, and further used the multivariable Cox regression analysis to obtain prognosis related metabolic LncRNA. These factors were used to construct a prognostic signature. Then, the risk score signature for prognostic metabolism of LncRNA was established according to the following formula: Risk Score (RS) = ∑β (MRLncRNA) ×Exp (MRLncRNA) . β represents the coe cient of the multivariable Cox regression analysis. Exp (MRLncRNA) means the individual prognostic metabolic LncRNA expression value. The risk score of each osteosarcoma patient was calculated accurately, then, all patients were divided into high risk group and low risk group though the median of risk score. In addition, a survival curve was drawn to assess the difference in survival between the two risk groups by utilizing "survival" package in R 4.0.3.

Metabolism-related mRNA-lncRNA network construction
The genes extracted from the TCGA database were utilized to construct the co-expression network, including 935 metabolic genes and 335 metabolic-related LncRNA. Similarly, we also have constructed the network of LncRNA and related metabolic genes with prognostic signatures. The visual co-expression network was produced by Cytoscape 3.7.2.

Independent prognosis and clinical correlation analysis
The univariable and the multivariable Cox regression analysis were applied to determine independent clinical prognostic factors, that is, whether individual LncRNA has a potential impact on certain clinical features or not. P value < 0.05 was considered statistically signi cant.

Survival analysis
By using "survival" package in R 4.0.3, the risk model was analyzed and evaluated whether the high and low risk groups were related to survival. Similarly, each prognostic factor was divided into high and low expression group according to the median of the expression value, then the survival curve was drawn. P value < 0.05 was statistically signi cant.

Nomogram drawing and evaluation
The nomogram was drawn with the known clinicopathological data and the risk score and evaluated. Firstly, the receiver operator characteristic (ROC) curve was used to evaluate its representativeness. Secondly, the C index was used to evaluate its predictive ability. In addition, the calibration curves of 1year, 3-year and 5-year survival rates have been established to evaluate the accuracy of the nomogram. Finally, the dimensionality of the data was reduced by principal component analysis (PCA) to determine whether an individual LncRNA can distinguish high and low risk patients.

Result
The entire research process is shown in Figure 1.

Screening of Metabolism-related LncRNA
By downloading the transcriptions data of osteosarcoma patients, the mRNA and LncRNA matrix were extracted, respectively. A total of 940 genes related to metabolic pathways were acquired from the GSEA database. The expression data of 935 metabolic genes were extracted accurately, and 335 LncRNA were obtained by correlation test in the R. These LncRNA are highly correlated with metabolic genes. A network of 935 metabolic genes and 335 related LncRNA was constructed and visualized by Cytoscape ( Figure 2).
It can be seen that most mRNA and LncRNA have a one-to-many or many-to-one relationship, which initially reveals the complexity and extensiveness of LncRNA network regulation. After incorporating the complete prognostic information, Kaplan-Meier survival analysis and univariable Cox regression survival analysis were utilized to obtain 15 LncRNA signi cantly related to the prognosis (Table 1).

Construction and Evaluation of Risk Signature
In order to further reduce and optimize the signature, the "glmnet" package in R was used to perform LASSO regression analysis, and it was found that the model ts best when λ = -3 and N = 9 (Figure 3a-b). We further used the multivariable Cox regression analysis to obtain 9 prognostic metabolic LncRNA and draw a forest map (Figure 3c). It can be observed that LncRNA AC026271.3, LncRNA LINC02454, LncRNA AL133338.1 and LncRNA AC135178.5 are high-risk factors for patient survival time and status, which implied that these three LncRNA may act as carcinogens. While LncRNA AC009779.2 is a low-risk factor, it is very likely to become a protective factor. Then we used these 9 factors to construct a prognostic risk signature: According to this formula, regarding the median of risk score as the grouping standard, we can accurately calculate the risk score of each osteosarcoma patient. Then all patients were subdivided into high risk group and low risk group. The survival difference between the two risk groups was evaluated by drawing a survival curve ( Figure 3d). Obviously, patients in the low risk group is under a higher survival rate and patients in the high risk group have a worse survival rate (P < 0.001). In addition, we drew a risk curve and a heat map of the expression of 9 LncRNA in all samples based on the constructed signatures (Figure 3eg). It can be seen that this signature can accurately distinguish high risk patients from low risk groups through different risk score and survival time.

Network construction of LncRNA and its related mRNA
In the next step, we also constructed a grid of 9 LncRNA and their related metabolic genes. Figure 4a is a Sankey diagram constructed based on co-expression relationships. The left side is related metabolic genes. The middle is LncRNA that are highly related to metabolic genes, and the right side represents prognostic type of each LncRNA. For example, it can be observed in the gure that LINC01857 may have a co-expression relationship with DGKA, PLCG2, and PLA2G2D, which means that there may be some interaction or mutual regulation mechanism between them. At the same time, it can be seen that LncRNA LINC01857 is a protective factor, which suggests that it may act as tumor suppressor genes in the development of cancer. Similarly, these mRNA-LncRNA co-expression relationships can be found more intuitively in Cytoscape 3.7.2 (Figure 4c).

Independent prognosis and clinical correlation analysis
Next, in order to verify whether the prognostic signature we constructed can function as an independent prognostic factor, that is, in order to verify whether the signature can be utilized to guide clinical decisionmaking or assist in evaluating prognostic status, just like other pathological features, we use univariable Cox regression analysis to verify respectively. In the univariable Cox regression analysis, it can be found that the hazard ratio value of risk score is 1.538 (1.313-1.802) and is statistically signi cant (P < 0.001) (Figure 4b). Similarly, in the multivariate regression analysis, it can be found that the hazard ratio value of risk score is 1.549 (1.286-1.865) and has signi cant statistical signi cance (P < 0.001) (Figure 4d). The consistent results con rm that the signature can have good independent predictive ability. In addition, we also drew a multi-index ROC curve, the results showed that risk score has better diagnostic and recognition capabilities for patients with osteosarcoma (AUC=0.861) compared with other clinical features ( Figure 4e). In addition, we further analyzed whether a single LncRNA has a potential impact on certain clinical characteristics ( Figure 5). For example, as can be seen from Figure 5a, LncRNA AL591895, LncRNA LINC02454, and LncRNA AL133338.1 have de nite differences in tumor populations over 65 years of age and younger (P value < 0.05). Similarly, LncRNA AC026271.3 and LncRNA AL133338.1 also have partial expression differences for patients of different genders (P value < 0.05) in Figure 5b. LncRNA AC009779.2 appears to be related to cancer metastasis (P value < 0.01) in Figure 5c.

Prognostic survival analysis of metabolism-related lncRNA
In order to further evaluate each lncRNA in the signature, whether they can individually affect the prognosis of the patient, we divide each prognostic factor into two groups with high and low expression according to the median of the expression value and draw the survival curve ( Figure 6). It can be seen that all 9 prognostic factors are statistically signi cant, and patients with two different prognostic status can be distinguished signi cantly. This further illustrates that they may play a potential anti-cancer or carcinogenic effect in patients with osteosarcoma.

GSEA analysis
Signed risk scores were divided into high and low risk groups according to the median value, and GSEA analysis was performed on all samples (Figure 7). In the biological process of GO, we can see the enrichment of multiple biological processes related to bone metabolism.  Figure 7d shows the aggregation of metabolic pathways and cancer pathways in the high and low risk groups in KEGG. In conclusion, GSEA analysis further describes the potential biological functions of prognostic factors.

Nomogram drawing and principal component analysis (PCA)
In order to make the signature to facilitate clinical prognosis guidance, we have drawn a nomogram by using known clinical pathological data and risk score we signed (Figure 8a). Then we evaluated the nomogram. First, the ROC curve (AUC=0.839) was drawn, which indicates the nomogram was considered to be well representative (Figure 8b). Further calculating the C index (0.812), it can be established that the nomogram has good distinguishing performance. In addition, we have also developed a calibration curve using 1-year, 3-year and 5-year survival rates to estimate the accuracy of the nomogram (Figure 8c). It can be observed that the nomogram evaluation of the 3-year survival rate is the most accurate. Finally, in order to more truly re ect the difference between the high and low risk of each prognostic LncRNA, we conducted a principal component analysis of these prognostic factors. 2d-PCA ( Figure 8d) and 3d-PCA (Figure 8e) analysis both show that, the prognostic signature composed of these LncRNA can still distinguish precisely whether the patient belongs to the low risk or high risk group after dimensionality reduction of the data. This once again veri es the accuracy of the signature.

Discussion
Sarcoma is a malignant tumor derived from connective tissue or other non-epithelial tissues [15]. It can occur in a variety of tissues and organs (osteosarcoma, leiomyoma, lymphosarcoma, synovial sarcoma [1]), among which osteosarcoma is more common in children and adolescents [16]. Due to the active cartilage and bone metabolism in children and adolescents, osteosarcoma develops rapidly and makes early metastasis [17]. Therefore, the standard of treatment is immediate amputation or limb salvage with medication [17]. However, 20% of patients are prone to early metastasis at diagnosis [3], so it is necessary to explore the pathogenesis and metastasis mechanism. Tumor cells are extremely active in their internal energy and material metabolism due to their immortal proliferation [10]. Several studies have reported the molecular mechanisms, which are about the speci c metabolic process of osteosarcoma cells in the body, they prove the key role of the regulation of metabolic pathway in the process of osteosarcoma [10,18,19].
In recent years, with the rapid development of high-throughput technology, the mechanism of various diseases including cancer has been studied more and more deeply, especially in epigenetics, which is represented by LncRNA and circRNA [5,6]. Sequencing technology identi ed 5 LncRNA including sense, antisense, bidirectional, intergenic and intronic LncRNA [20,21]. These 5 LncRNA work in different ways.
Although more and more LncRNA have been reported in osteosarcoma, we found that there are comparatively few reports on the regulatory role of LncRNA in the metabolism of osteosarcoma, the metabolic changes of osteosarcoma are an extremely important biological process. This study rstly screened out metabolism-related LncRNA through comprehensive bioinformatics analysis, which provides guidance for further research on the molecular mechanism of osteosarcoma metabolism.
In this study, we rst downloaded the data and extracted the corresponding mRNA and LncRNA matrix, we had searched for LncRNA with signi cant correlations through the corresponding metabolic-related genes, then we had made correlate them with clinical information to screen out 15 prognostic-related metabolic LncRNA. Further through univariable Cox regression, LASSO analysis and the multivariable Cox regression to obtain 9 metabolic-related LncRNA, which including LncRNA AC009779.2, LncRNA AL591895.1, LncRNA AC026271.3, LncRNA LPP-AS2, LncRNA LINC01857, LncRNA AP005264.1, LncRNA LINC02454, LncRNA AL133338.1 and LncRNA AC135178.5. In addition, the multivariable Cox regression analysis is utilized to construct a prognostic signature and risk score based on the signature. The risk curve analysis model can distinguish patients in high and low risk groups. Survival analysis in high and low risk groups indicates that the two groups of patients have substantially different prognosis. After that, we performed univariable Cox and multivariable Cox analysis of the prognostic signature to see if it can be used as an independent prognostic factor. The multi-index ROC curve shows that the diagnostic power of the prognostic signature is preferable to any clinical factor. At the same time, we also found some LncRNA and clinical features such as age, sex and tumor metastasis. Draw a survival curve for each prognostic factor and nd that they can be utilized to distinguish the prognostic survival rate of patients independently. GSEA analysis shows that the high and low risk groups are closely linked to cancer and multiple metabolic pathways. Finally, we constructed a nomogram to quantify clinical prognosis.
Some of these 9 prognostic factors have been reported in other cancers, such as LPP-AS2, LPP-AS2 is thought to be associated with immune cells in soft tissue sarcoma [22], and another report that LPP-AS2 regulates EGFR expression by sponging Mir-7-5p as ceRNA, and it had been found that the promoter region of LPP-AS2 combined with c-MYC to continue to promote the occurrence of glioma [23]. LINC01857 has been declared in a variety of cancers. LINC01857 can distinguish the prognostic survival in hepatocellular carcinoma patients with brosis [24]. Another study showed that it can be used as a novel molecular marker for pancreatic cancer [25]. LINC01857 can be bind to certain protein promoters to regulate the transcription process. For example, LINC01857 interacts with CREBBP to promote the transcription of H3K27Ac and CREB1 and promote the progression of breast cancer [26]. In addition, another mechanism of LINC01857 is to combine with certain miRNA to play an important spongy effect. For example, LncRNA LINC01857 sponge miR-1281 to promote TRIM65 expression and facilitate the development of glioma [27]. A recent study revealed that LINC01857 promotes the migration and invasion of gastric cancer cells by regulating microRNA-200b [28]. LINC02454, as a long intergenic LncRNA, is mainly reported in papillary thyroid carcinoma [29,30]. In addition, other LncRNA in this study have not been reported in the relevant literature. This shows that the prognostic signature we constructed has a certain degree of accuracy and is worth in-depth study. Although this study integrates a variety of bioinformatics methods and advanced statistical methods and uses a variety of analyses to construct prognostic signatures and multiple veri cations, there are continued to be some limits. First, the samples come from a single database, which may be not very representative. Second, the necessary cell or tissue level experimental veri cation were lacked. The above two points are the main de ciencies and are also the main aspects of our next research.

Conclusion
We constructed a prognostic signature using LncRNA as a factor from the metabolic process of osteosarcoma patients, in order to provide guidance for clinical activities and assist clinical decisionmaking, and provide a reference for the subsequent regulation of lncRNA in the metabolism of osteosarcoma.  Tables   Table 1 is not available with