Signature Based on Immune-Related LncRNA Can Predict Overall Survival of Osteosarcoma Patients

Background: Osteosarcoma is a malignant bone tumor common in children and adolescents. Metastatic status remains the most important guideline for classifying patients and making clinical decisions. Despite many efforts, newly diagnosed patients receive the same therapy that patients have received over the last 4 decades. With the development of high-throughput sequencing technology and the rise of immunotherapy, it is necessary to deeply explore the immune molecular mechanism of osteosarcoma. Methods: We obtained RNA-seq data and clinical information of osteosarcoma patients from TCGA database and TARGET database. With the help of co-expression analysis we identied immune-related lncRNA and then by means of univariate Cox regression analysis prognostic-related lncRNA was screened out. And also by using least absolute shrinkage and selection operator regression method a model based on immune-related lncRNA was constructed. The differences in overall survival, immune inltration, immune checkpoint gene expression, and tumor microenvironmental immunity type between the two groups were evaluated. Results: We constructed a signature consisting of 13 lncRNA. Our results show that signatures can reliably predict the overall survival of patients with osteosarcoma and can bring net clinical benets. Further more, the signatures can be used for further risk stratication of the metastasis patients. Patients in the low-risk group had higher immune cell inltration and immune checkpoint gene expression. The results from gene set variation analysis show that patients in low-risk group are closely related to immune-related pathways when compared with patients in high-risk group. Finally, patients in the low-risk group are more likely to be classied as TMIT I and hence more likely to benet from immunotherapy. Conclusion: Our signature may be a reliable marker for predicting the overall survival of patients with osteosarcoma.


Background
Osteosarcoma is a malignant bone tumor that commonly affects children and adolescents [1]. In the 1970s, chemotherapy was introduced and signi cantly improved patient-survival. Currently, patients with newly diagnosed osteosarcoma routinely receive neoadjuvant chemotherapy, surgical removal of the lesion, and undergo adjuvant chemotherapy. However, approximately 30% of patients die within 5 years of treatment due to being unresponsive to chemotherapy and metastasis [2].Today, clinical characteristics such as metastatic status remain the most important criteria for stratifying patients and making clinical decisions. However, in clinical work, a notable difference in the prognosis of patients with similar clinical characteristics is often observed. In addition, perhaps due to the relatively low incidence of osteosarcoma, the formulation of new drugs and new treatment programs has reached a deadlock, and there has been no breakthrough in the past 40 years [3]. Hence, in order to nd new biomarkers and therapeutic targets we need an even deeper understanding of the molecular mechanism of osteosarcoma, which will help us.
Nowadays, immunotherapy has been used as a new type of anti-tumor method, which has shown reliable e cacy in a variety of tumors including melanoma and hepatocellular carcinoma [4][5][6]. This new treatment method opens hope for patients with osteosarcoma [7,8]. However, due to many factors, the effectiveness of these new therapies in osteosarcoma -patients is still unclear. Therefore, there is an urgent need to understand the immune molecular mechanisms of osteosarcoma. LncRNA is de ned as a kind of non-coding RNA longer than 200 nucleotides [9]. Over a decade ago, little was known about this RNA. However, with the rapid development of technology, more and more evidence indicated that this non-coding RNA has a vital role in the development of many tumors, including osteosarcoma [10][11][12][13].
Therefore, with the help of co-expression analysis, this study identi ed the immune-related lncRNA in osteosarcoma and developed reliable prognostic signatures based on the lncRNA to stratify the patients more precisely. In addition, the differences in immune characteristics between high-risk and low-risk patients were identi ed. Finally, the relationship between our signature and the e cacy of immunotherapy was also explored based on marker or typing developed in the previous literature.

Data collection
Normalized RNA-Seq (FPKM format) data for 88 osteosarcoma patients was downloaded from the TCGA database (https://cancergenome.nih.gov/). At the same time, we downloaded the clinical data related to these patients from the TARGET database (https://ocg.cancer.gov/programs/target). Table 1 lists the clinical characteristics of these patients. All these datas were obtained from a public database, so no additional informed consent was required. Identi cation of immune-related lncRNA A list of immune-related genes was downloaded from the immunology database and the analysis portal (ImmPort) database [14]. The list consisted of a total of 2498 unique immune-related genes. The lncRNA pro le from RNA-seq data was extracted using R software. Correlation between lncRNA and immunerelated genes was then calculated. LncRNA with a correlation coe cient of 0.4 and P < 0.05 are considered as immune-related lncRNA and were used for subsequent analysis.
Construction and evaluation of lncRNA model Univariate Cox regression analysis was used to identify survival-related lncRNA from the abovementioned immune-related lncRNA. The lncRNA with a P value from the univariate Cox proportional hazards analysis (Wald test for predictive potential) of less than 0.5 is considered as a prognostic-related lncRNA. The prognosis-related lncRNA expression data set was used as an input to the survival model. Subsequently, 1000 iterations were performed using the least absolute shrinkage and selection operator (LASSO), and the retained lncRNA in more than 50 iterations was considered as an important lncRNA. These important lncRNAs are incorporated into the proportional hazards model in turn, and the area under the receiver's operating characteristic curve (AUROC) was calculated. The model when AUROC reaches the peak is the optimal model. Subsequently, each patient's risk score based on the best model was calculated, and time-dependent receiver operating characteristic (ROC) curve analysis was used to determine the optimal cut-off value for the risk score [15]. Risk score = βlncRNA(1) × expr lncRNA(1) + βlncRNA(2) × exprlncRNA(2) +···+βlncRNA(n) × exprlncRNA(n) [16]. The patients were then divided into high-risk group and low-risk group based on the best cut-off value. The log-rank test was used to achieve the overall survival difference between the two groups, and the KM survival curve was drawn. Using the receiver operating characteristic (ROC) curve analysis the speci city and sensitivity of the risk score was assessed. Finally, multivariate Cox regression analysis was used to explore the independent prognostic value of risk scores.

Construction and evaluation of lncRNA model
The relationship between risk scores and clinical characteristics was further investigated. We divided patients into 6 subgroups based on metastatic status, gender and other clinical characteristics, and explored the prognostic value of risk scores in each subgroup. The forest plot of subgroup analysis was then drawn by means of R software. In addition, based on the metastatic status and risk status, the patients were divided into four groups, and the overall difference in the survival rate among patients in each group was calculated and a KM survival curve was drawn.

Estimation of immune in ltration
With the help of 'Microenvironmental cell counter (MCP-counter)' method the immune in ltration was assessed. This method can reliably quantify the absolute abundance of 8 immune and 2 stromal cell population [17]. Then by means of the 'ESTIMATE' R package the stromal score, immune score and estimate score were estimated (https://sourceforge.net/projects/ estimateproject /) [18]. Finally, the single sample GSEA was used to evaluate the enrichment of 29 immune-related gene sets in each sample [19]. The difference in immune in ltration between the two groups of patients was evaluated. 'Tumor microenvironment immune type (TMIT)' was used to speculate the e cacy of anti-PD-1 / PD-L1 treatment. TMIT divides patients into four types based on PD-L1 and CD8A mRNA expression, which has been shown to predict patients' response to immune checkpoint inhibitors in pan-cancer analysis [20].
Using SubMap analysis (Gene Pattern) to compare gene expression pro les of osteosarcoma patients with melanoma patients treated with immunotherapy to indirectly predict the e cacy of immunotherapy in osteosarcoma patients [21,22].

Gene set variation analysis
Genome Variation Analysis (GSVA) is an unsupervised gene set enrichment method that can estimate the scores of certain pathways or markers over a sample population [23]. We downloaded the 'c2.cp.kegg.v7.1.symbols' and 'c5.all.v7.1.symbols' gene sets from the 'Molecular Signatures Database' for GSVA. Subsequently, the differential analysis of these gene sets was performed using the LIMMA package of R software, and the gene set with adjusted P < 0.05 was regarded as the differentially expressed gene set.

Construction of the nomogram
The rms package of R software was used to build a nomogram based on clinical factors and immunerelated lncRNA risk scores. We then assessed the predictive capability using the concordance index (C-index). In addition, calibration plots were drawn to verify the accuracy of the nomogram. Finally, by means of decision curve analysis, the clinical utility of the nomogram was assessed.

Statistical analysis
All analyses in this study were performed with R software (version 3.6.3) and P < 0.05 was considered statistically signi cant.

Identi cation of prognostic-related immune-related lncRNA in osteosarcoma
Firstly, we isolated lncRNA expression data from the RNA-SEQ data of 88 patients downloaded from the TCGA database. Subsequently, a total of 2498 immune-related genes were extracted from the ImmPort database. Supplementary Table 1 provides detailed information on these immune-related genes. As shown in Supplementary Table 2, 1986 immune-associated lncRNA were identi ed by constructing immune-lncRNA co-expression networks. We analyzed the relationship between these immune-related lncRNA and the survival of 85 patients using univariate Cox regression analysis (3 patients lack valid clinical data). Finally, 240 lncRNAs were identi ed as survival-related immune-related lncRNA and used for further analysis. Supplementary Fig. 1 and Supplementary Table 3 shows the results of univariate cox analysis of these lncRNAs.
3.2 Establishing a risk score and testing its prognostic value in osteosarcoma As mentioned above, these lncRNAs related to prognosis were subjected to 1000 iterations using LASSO analysis. Subsequently, the 29 lncRNA retained in more than 50 iterations were considered important lncRNA for further analysis. Supplementary Table 1 shows the details of these lncRNA. Incorporating these important lncRNA into the COX model, the optimal model consisting of 13 lncRNA was determined based on the 5-year survival AUROC. The risk score of each patient was calculated, and the cut-off value of the risk score was determined to be 0.186 using the time-dependent receiver operating characteristic (ROC) curve analysis. The results of the survival analysis showed that patients in the low-risk group had longer survival rate than patients in the high-risk group (Fig. 1D, P < 0.001). Figure 1 shows the process of building this model. Supplementary Fig. 2 shows the relationship between 13 lncRNA and immune genes.
To verify the independent prognostic value of the risk score, we performed a multivariate regression analysis. As shown in Fig. 2B, after adjusting for other variables (including age, gender, and metastatic information), we found that the risk score may be an independent predictor. The results of the forest plot show that the higher the risk score, the shorter the overall survival of the patient (hazard ratios: 2.974, 95% of con dence intervals: 2.164-4.088, P = 1.89e − 11). The results of the time-dependent ROC analysis show that the risk score has good discriminative ability. As shown in Fig. 2A, no matter how the patient's survival time changes, the risk score always has an excellent discriminating ability. On the contrary, as the patient's survival time prolongs, the discriminating ability of metastatic status continues to decline.

Relationship between risk score and clinical characteristics
To further explore the strength of the risk score, we divided patients into 6 subgroups based on their age, gender and metastatic status, and explored the prognostic value of risk scores in each subgroup. As shown in Fig. 3A, the risk score shows a good prognostic value in each subgroup. Metastatic status is an important basis for formulating treatment plans for patients with osteosarcoma. Therefore, we further divided patients into 4 groups according to their metastatic status and risk status. As shown in Fig. 3B, there was no signi cant difference in overall survival rate between metastatic patients and nonmetastatic patients in the low-risk group. Among patients in the metastasis group, patients in the low-risk group had a longer overall survival than those in the high-risk group.

Evaluation of differences in immune in ltration between high-and low-risk groups
We further evaluated the differences in immunological characteristics between the two groups of patients. As mentioned above, the abundance of 10 immune-related cells was calculated using the MCPcounter method. As shown in Fig. 4A, a signi cant difference was observed between the two groups of patients. Compared with patients in the high-risk group, the abundance of the 8 cell populations in the patients in the low-risk group was higher (B-cell lineage, CD8 + T cells, Cytotoxic lymphocytes, Endothelial cells, Monocytic lineage cells, Neutrophils, NK cells, T cells). Then, the estimate, immune and stromal scores were calculated using the ESTIMATE algorithm. Similarly, patients in the low-risk group had higher estimate scores, immune scores, and stromal scores than those in the high-risk group (p < 0.001, Fig. 4B). We further used ssGSEA to evaluate the difference of 29 immune-related gene sets or immune cells between the two groups of patients as a supplement. The results of ssGSEA have reached similar conclusions. Most immune-related gene sets or immune cells are signi cantly enriched in the low-risk group (Fig. 4C). Finally, we explored the relationship between abundance of 10 immune-related cells and risk score. As shown in Fig. 5, with the increasing risk score, the abundance of immune cells kept decreasing, especially CD8 + T cells.
The differences in gene expression of some immune-related pathways were further evaluated and heatmaps were drawn to show the results. We found that genes related to activated T effector and IFNγ pathway such as STAT1, CCL4, CXCL9, CXCL10 were signi cantly up-regulated in the low-risk group (Fig. 6B).
3.5 The relationship between signature and the expression of immune checkpoint gene, tumor microenvironment type As mentioned above, the association between the expression of 7 potentially targetable immune checkpoint genes between the two groups was assessed. As shown in the gure, all immune checkpoint genes in the low-risk group were more highly expressed, although PDCD1 did not show statistical signi cance (Fig. 6A). Further analysis showed that the CD8A gene is also highly expressed in the low-risk group. Similarly, tumor lymphocyte in ltration was higher in the low-risk group. However, the optimal cutoff values for PD-L1 and CD8A mRNA expression have not been determined. Therefore, we analyzed the relationship between the risk score value and PD-L1 gene expression, CD8A gene expression, and TIL. The results showed that as the risk score increased, the expression of PD-L1 gene and CD8A gene decreased ( Fig. 6C-D). Unfortunately, although TIL also showed a similar trend, it did not reach statistical signi cance (Fig. 6E). However, the results of the box plot show that the TIL of the low-risk group is higher than that of the high-risk group (Fig. 4C).
In addition, we use SubMap analysis to further study the relationship between MRGP signature and immunotherapy e ciency. Using subclass mapping, the expression pro les of the two groups of patients (high-risk group and low-risk group) were compared with a published immunotherapy data set. This data set records the expression data of 47 melanoma patients treated with programmed cell death protein 1 (PD-1) immune checkpoint inhibitors or cytotoxic T lymphocyte associated protein 4 (CTLA-4) immune checkpoint inhibitors. The results showed that the expression pro les of patients in the low-risk group were correlated with those in the PD-L1 response group (Fig. 6F). This indicates that patients in the lowrisk group are more likely to bene t from PD-L1 therapy.

Identifying the differences in biological pathways and processes between the two groups of patients
We used GSVA to study the differences in biological pathways between the two groups of patients. As shown in the Fig. 7, signi cant differences were observed in the biological pathways between the two groups of patients. We found that in the low-risk group, most immunization-related pathways had higher GSVA scores. It is worth noting that patients in the high-risk group had higher GSVA scores for certain metabolic-related pathways. Supplemental Table 4 shows detailed information on GSVA results.

Construction of Nomogram Based on Risk Score
We developed a nomogram that combined risk scores with traditional clinical factors (Fig. 8), and tested the accuracy of the nomogram using a calibration curve. As shown in Fig. 8B, the 3-year and 5-year overall survival predictions show that the nomograms have good accuracy. The C index of the nomogram is 0.924, which indicates that our nomogram has a good degree of discrimination. These results suggest that the new nomogram shows reliable performance in predicting patient prognosis. Finally, the results of the DCA analysis show that the model combined with the risk score can bring clinical net bene ts.
Although with the increasing development of limb salvage surgery techniques for osteosarcoma, the patient's postoperative limb function has been greatly improved, but the patient's overall survival is still similar to that of decades ago [24][25][26]. In recent years, with the continuous development of highthroughput sequencing technology, exploring speci c mRNA expressions, such as tumor microenvironment genes and metabolic genes, can help us better identify tumor diversity and develop personalized treatment strategies [27,28]. This study identi ed immune-related lncRNA in osteosarcoma patients based on IRG in the Immport database. These lncRNA were then used to construct a signature consisting of 13 lncRNA. Our signature can reliably identify high-risk patients. The results of subgroup analysis and nomogram further support our conclusion. Considering that our signature is based on immune-related lncRNA, we used three methods to study the differences in immune characteristics between the two groups of patients. The results showed that the low-risk group was closely related to immune cell in ltration and immune-related gene expression. In addition, compared with the high-risk group, the immune checkpoint gene expression was higher in the low-risk group. These results indicate that our signature can effectively identify high-risk patients and help to understand the immune microenvironment of osteosarcoma.
Presently, the presence or absence of metastases at diagnosis is still a key indicator for identifying patients with high-risk osteosarcoma [29]. However, some patients have a good prognosis, despite the diagnosis of metastasis. Therefore, effective identi cation of this part of patients helps to formulate more accurate treatment plans. By means of further analysis of the patient's risk score and metastatic status, we found that the risk score can effectively screen this part of the patient. Compared with metastatic patients in the high-risk group, metastatic patients in the low-risk group had signi cantly higher overall survival. However, due to the limitation of sample size, further prospective studies are needed to verify our results.
Recently, immunotherapy, especially immune checkpoint inhibitors, have shown great value in the treatment of various tumors [30][31][32]. But selecting patients who may bene t from this treatment is crucial. Recently, various markers or types have been developed to predict the e cacy of immunotherapy. Among them, the expression of PD-L1 gene has been proved to be an effective biomarker for anti-PD-1 / PD-L1 treatment in various tumor [33][34][35]. Therefore, we explored the expression differences of 7 immune checkpoint genes including PD-L1 gene between the two groups. The results showed that although the PDCD1 gene did not reach statistical signi cance, all immune checkpoint genes tended to be highly expressed in the low-risk group. In addition, studies have shown that combining PD-L1 gene expression with tumor lymphocyte in ltration can be very helpful in dividing the patients into four types : type I or adaptive immune resistance (PDL1 (+), TIL (+)), II Type or immunological ignorance (PD-L1) (-), TIL (-)), type III (PD-L1 (+), TIL (-)) and type IV or immune tolerance (PD-L1 (-), TIL (+)).Among patients with melanoma, patients with type I are most likely to bene t from immunotherapy [36]. In our study, compared with patients in the high-risk group, patients in the low-risk group were more likely to be classi ed as type I because of the high expression of the PD-L1 gene and TIL. A recent pan-cancer based study showed that combining PD-L1 gene and CD8A gene expression can also divide patients into four types: TMIT I (PDL1 (+), CD8A (+)), TMIT II (PDL1 (-), CD8A (-)), TMIT III (PDL1 (+), CD8A (-)), TMIT IV (PDL1 (-), CD8A (+)). From the perspective of pan-cancer genome, TMIT I (PD-L1 + CD8A +) is associated with high PD-L1 expression, high mutation load / new antigen, high MSI, PD-L1 ampli cation and the presence of oncogenic viruses [20]. In addition, Yu-Pei Chen et al further proved that in some solid tumors, patients with TMIT type I are most likely to bene t from immunotherapy and calculate the optimal cut-off value [37]. Similarly, our results indicate that patients with lower risk scores are more likely to be classi ed as TMIT I patients. In addition, we compared the expression pro les of osteosarcoma patients with melanoma patients treated with immune checkpoint inhibitors by submap analysis. There is a certain correlation between the expression pro les of the low-risk group and the PD-L1 response group. In conclusion, it is reasonable to speculate that immunotherapy may be an effective treatment for low-risk groups. However, due to the lack of data on immunotherapy in patients with osteosarcoma, we were unable to calculate the cut-off value of the risk score to predict the patient's response to immunotherapy. In addition, further research is also needed to verify the application of risk score in predicting the e cacy of osteosarcoma-immunotherapy.
It must be admitted that our research has some limitations. Firstly, the sample size is relatively small. Although osteosarcoma is a rare tumor with a low incidence, we believe that only a larger sample size will make the conclusion more convincing. Secondly, due to the scarcity of lncRNA research in osteosarcoma, we were unable to nd other lncRNA data with clinical information, so we failed to set up external validation. Finally, due to the lack of immunotherapy data of osteosarcoma patients, we could only combine the conclusions of previous studies to speculate the value of risk scores in predicting patients' response to immunotherapy. Further research is needed to prove our conclusion.

Conclusion
In conclusion, our signature can effectively predict the overall survival of osteosarcoma patients and can be used as a reliable supplement to the metastatic state to formulate more accurate treatment plans. Finally, our signature is expected to provide some guidance for identifying patients who may bene t from immunotherapy.

Declarations
Ethics approval and consent to participate No ethical approval nor informed consent was required in this study due to the public-availability of the data used.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.