Establishment of an 8 Immune-Related LncRNA Signature for Predicting the Prognosis of Soft Tissue Sarcoma

Background: Soft tissue sarcoma (STS) is relatively rare and highly heterogeneous, which makes treatment very difficult. Long non-coding RNA (lncRNA) plays a vital role in the occurrence and progression of soft tissue sarcoma, especially in tumor-related immune processes, which have become a focus of current research. We therefore set out to develop immune-related lncRNA markers to improve the treatment and prognostic assessment of patients with soft tissue sarcoma. Methods: Based on the TCGA-SARC and GTEx data sets, we screened out 8 prognosis-related immune lncRNAs and constructed a nomogram, which was verified in the test set. Furthermore, immune infiltration analysis was carried out on high- and low-risk patients. Results: Based on Pearson's correlation coefficient, we obtained 859 immune-related lncRNAs. After differential expression analysis, we determined 54 differentially expressed lncRNAs. Univariate and multivariate Cox regression analysis finally identified 8 immune-related lncRNAs, which were used to construct prognostic models and nomograms to predict the prognosis of STS patients. These results were verified in an external data set, indicating that the model has good predictive ability. Gene Set Enrichment Analysis and ESTIMATE analysis showed obvious differences in immune infiltration status and immune cell subtypes between high- and low-risk patients. Conclusion: We constructed an immune-related lncRNA signature to predict the survival status of soft tissue sarcoma patients.


Introduction
Soft tissue sarcoma (STS) is a rare and highly heterogeneous tumor. It comprises at least 100 histological and molecular subtypes with different clinical features [1]. Because of these characteristics, soft tissue sarcomas are often difficult to diagnose, treatment options are limited, and outcomes are mediocre [2]. Given the current state of STS treatment, it is imperative to find a more accurate method of prognostic assessment.
A growing number of studies have shown that the immune response is crucial in tumor progression, including in soft tissue sarcoma. Recent studies have confirmed that there is substantial immune heterogeneity between subtypes of soft tissue sarcoma, and clinical trials have reported clear responses to immunotherapy [3]. The combined use of immunotherapy and immune checkpoint inhibitors has shown a clear effect on certain types of soft tissue sarcoma [4].
Many studies have demonstrated that immune-related lncRNAs have a significant impact on tumors [5]. However, there are relatively few studies on immune-related lncRNAs in soft tissue sarcoma.
Therefore, we established, for the first time, an 8 immune-related lncRNA signature to predict the prognosis of STS patients, and verified it in the validation set. We performed GSEA and immune correlation analysis on the high- and low-risk groups identified by the multivariate Cox analysis. In conclusion, this signature of 8 differentially expressed immune-related lncRNAs (DEIRLs) can predict the prognosis of STS patients well, and may contribute to the precise treatment and immunotherapy of STS patients.

Collection of Sample and Data
We downloaded RNA expression profiles with the corresponding clinical features for the TCGA-SARC and GTEx data sets from the UCSC Xena website (https://xena.ucsc.edu/). The GTEx database contains RNA transcriptome data for 54 normal tissue types from healthy individuals. We obtained RNA sequencing data for muscle and adipose tissue from GTEx and used them as controls. TCGA-SARC contains 263 tumor samples and 2 normal samples, and GTEx contributed 911 normal samples, for a total of 1176 samples used as the training set. The GSE21050 data set, which contains 310 sarcoma samples, was used for external validation.

Immune-related LncRNA
First, we isolated the lncRNA expression data from the TCGA-SARC transcriptome data. Second, we obtained the expression data of immune-related genes, based on the following gene sets: IMMUNE_RESPONSE and IMMUNE_SYSTEM_PROCESS. We then performed Pearson correlation analysis between the immune gene matrix and the lncRNA expression matrix, and retained as immune-related those lncRNAs with R > 0.4 and P < 0.001. Using the 'limma' package, we performed a differential expression analysis of these lncRNAs. Based on |logFC| > 0.05 and adjusted p < 0.05, we obtained 54 lncRNAs and carried them forward to the next step of the analysis.
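The correlation screen described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R pipeline: the function names, the toy expression values, and the rule "keep an lncRNA if it correlates with at least one immune gene" are assumptions made for the example, and the P < 0.001 filter is omitted for brevity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def immune_related_lncrnas(lncrna_expr, immune_expr, r_cutoff=0.4):
    """Keep lncRNAs correlated above r_cutoff with at least one immune gene.

    Assumption: one qualifying immune gene is enough to keep an lncRNA.
    """
    selected = []
    for lnc_name, lnc_values in lncrna_expr.items():
        if any(pearson_r(lnc_values, gene_values) > r_cutoff
               for gene_values in immune_expr.values()):
            selected.append(lnc_name)
    return selected


# Hypothetical toy data: expression across five samples.
lncrna_expr = {
    "lncA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lncB": [5.0, 1.0, 4.0, 2.0, 3.0],
}
immune_expr = {
    "geneX": [2.0, 4.1, 5.9, 8.2, 9.8],
}

selected = immune_related_lncrnas(lncrna_expr, immune_expr)
print(selected)
```

In this toy data only lncA tracks geneX closely, so it is the only lncRNA that passes the cutoff; lncB is negatively correlated and is dropped.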

Identification of immune-related LncRNA related to OS and establishment of prognostic signatures
We performed univariate Cox regression analysis, LASSO regression analysis, and multivariate Cox regression analysis, and screened out 8 prognosis-related lncRNAs. The risk score was calculated as follows: $\text{riskScore} = \sum_{i} \beta_i \times G_i$, where $\beta_i$ is the coefficient of lncRNA $i$ from the multivariate Cox regression analysis and $G_i$ is the expression level of lncRNA $i$. STS patients were separated into a high-risk group and a low-risk group according to the median risk score. To assess the accuracy of the results, we analyzed the data in the test set in the same way. To assess the usefulness of our signature, we conducted an overall survival (OS) analysis to evaluate the OS differences between high-risk and low-risk patients. ROC curves at 3 and 5 years were also generated to evaluate the credibility and accuracy of the signature. In the next step, we performed univariate and multivariate Cox analysis on the risk score and patient clinical information to construct a nomogram.
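As a rough illustration of the risk score and the median split, the sketch below computes a weighted sum of expression levels per patient and assigns each patient to a group. The coefficients, lncRNA names, and patient values are hypothetical placeholders, not the fitted values from this study, and treating a score equal to the median as low risk is an assumption.

```python
from statistics import median


def risk_score(coefficients, expression):
    """Sum of coefficient times expression level over the signature lncRNAs."""
    return sum(beta * expression[name] for name, beta in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: a score exactly equal to the median is labeled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low")
            for pid, score in scores.items()}


# Hypothetical Cox coefficients for a two-lncRNA signature.
coefficients = {"lncA": 0.5, "lncB": -0.2}

# Hypothetical expression levels per patient.
patients = {
    "P1": {"lncA": 2.0, "lncB": 1.0},
    "P2": {"lncA": 1.0, "lncB": 3.0},
    "P3": {"lncA": 4.0, "lncB": 0.0},
    "P4": {"lncA": 0.0, "lncB": 1.0},
}

scores = {pid: risk_score(coefficients, expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)
```

The same coefficients would be applied unchanged to the test set, so that the validation reflects the fitted model rather than a refit.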

The Gene Set Enrichment Analysis
To investigate the differences in gene function between high- and low-risk STS patients, we conducted a GSEA using the following gene sets: IMMUNE_RESPONSE and IMMUNE_SYSTEM_PROCESS.

Analysis of immune infiltration in high risk and low risk STS patients
We analyzed the relationship between the 8-DEIRL risk score and the immune microenvironment in STS patients, including ESTIMATEScore, TumorPurity, ImmuneScore, and StromalScore.

Screening of the differentially expressed immune-related LncRNAs (DEIRLs)
We isolated the expression profiles of 13,832 lncRNAs from the TCGA-SARC database, downloaded the immune-related gene sets (IMMUNE_RESPONSE and IMMUNE_SYSTEM_PROCESS) from MSigDB, and extracted the SARC expression matrix of the immune-related genes, 332 in total. Pearson correlation analysis was performed between the expression matrix of these genes and the lncRNA matrix, and 859 immune-related lncRNAs were identified using the criteria R > 0.4 and P < 0.001. Furthermore, we performed a differential expression analysis of these lncRNAs in R with the 'limma' package, using |logFC| > 0.05 and adjusted p < 0.05 as the selection criteria, and finally obtained 54 DEIRLs. The flow chart of this research is shown in Fig.
According to the median risk score, patients with STS were separated into a high-risk group and a low-risk group. We drew expression heatmaps, risk distribution plots, and survival status profiles of the 8 identified DEIRLs, and compared the survival difference between the two groups in the training set (Fig. 4A, 4C, 4E, 4G). Similar differences were also obtained in the test set, which verified the prognostic model (Fig. 4B, 4D, 4F, 4H). As shown in Fig. 5A and 5B, the 8-DEIRL signature can predict the survival status of STS patients, with an AUC of 0.784 in the training set (test set: 0.585).

Evaluation of the lncRNA signature
The results of univariate and multivariate independent prognostic analysis showed that the 8-DEIRL risk score was clearly related to the survival status of STS patients, with p-value < 0.001 (Fig. 6A and 6B). Analysis of multiple ROC curves showed that the risk score signature had the largest AUC (Fig. 6E). The AUC reflects the prognostic efficiency of the 8-DEIRL model: the larger the area, the better the prediction of patient prognosis. In addition, curves were plotted with the "timeROC" package (version 0.4) in R to evaluate the predictive value (Fig. 5C and 5D). Our results showed that the 8-DEIRL prognostic model could predict the 3- and 5-year survival rates well (AUC 0.76 and 0.765). These results indicate good accuracy and sensitivity. Based on multivariate Cox regression, we constructed a prognostic nomogram to predict the 1-year, 3-year, and 5-year survival probability (Fig. 7A). Furthermore, calibration plots of the 3- and 5-year survival predictions were used to assess the predictive ability of the nomogram, as shown in Fig. 7B-D. The calibration curves showed that the predictions of the nomogram were highly consistent with the observed survival status in the training and test sets.
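To make the meaning of the AUC values above concrete, the sketch below computes a plain ROC AUC at a fixed time horizon as the fraction of (event, non-event) patient pairs in which the patient with the event has the higher risk score. This is a simplification under stated assumptions: it ignores censoring, which the "timeROC" package used in this study accounts for, and the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison; ties between scores count as half a win.

    labels: 1 if the patient had the event within the horizon, 0 otherwise.
    Assumption: no censoring before the horizon.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and 3-year event indicators for five patients.
risk_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
died_within_horizon = [1, 0, 1, 0, 0]

value = auc(risk_scores, died_within_horizon)
print(round(value, 3))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to a score that ranks every patient with the event above every patient without it.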

Gene set enrichment analysis (GSEA)
We used GSEA software to analyze the differences in the immune gene sets between the high-risk and low-risk patient groups. The results showed a higher degree of immune gene enrichment in the low-risk group. In addition, KEGG pathway analysis and GO analysis were used to further explore risk-related pathways and genes (Fig. 8).

Analysis of immune infiltration between high- and low-risk patients
Using the ESTIMATE algorithm, we studied the differences in tumor immune status between the high- and low-risk patient groups (Fig. 9A). Compared with the low-risk patients, the high-risk patients had higher tumor purity, whereas their StromalScore, ImmuneScore, and ESTIMATEScore were all lower (Fig. 9B-D). Given these results, we also explored the correlation between the risk groups and immune cell subtypes. The results showed that CD4+ cells, macrophages, and neutrophils were related to the risk groups (P < 0.05).

Discussion
In the era of precision medicine, tailoring a treatment plan to the clinicopathological and molecular characteristics of each patient's tumor is extremely important [6,7]. The doctor's preoperative evaluation, postoperative prediction, and follow-up have a profound impact on the quality of life of cancer patients [8,9].
Doctors are exploring targeted precision therapies for the specific histological subtypes and genetic mutations of each STS patient. There is an urgent need to link the patient's transcriptome information with the best treatment strategy [10]. Some recent studies have made the genomic characterization of soft tissue sarcoma subtypes more precise and have discovered some prognosis-related molecular markers [11].
Long non-coding RNA (lncRNA) has multiple functions in regulating gene expression at both the transcriptional and translational levels, and a growing number of studies have found that lncRNA is closely related to tumor immunity [12]. With the development of bioinformatics, more and more immune-related lncRNAs have been mined to construct tumor prognosis models [13-15]. In this study, we constructed for the first time a model comprising 8 immune-related lncRNAs, and verified the accuracy of this model as a marker for the survival status of soft tissue sarcoma patients. First, we analyzed the transcriptome data of TCGA-SARC and obtained the lncRNAs co-expressed with immune genes. After univariate and multivariate Cox regression analysis, we finally determined 8 immune-related lncRNAs: C5orf56, LINC00294, LINC01023, PCOLCE-AS1, LINC00944, LINC01140, SERTAD4-AS1, and THUMPD3-AS1. Next, in a comparison with other clinical characteristics, the score determined by the 8 DEIRLs had the largest AUC, indicating that this model has excellent prognostic predictive ability for STS patients. The risk score and clinical information were combined to construct a nomogram to predict the 1-, 3-, and 5-year survival rates of patients, and the corresponding calibration plots show that this nomogram has relatively high accuracy.
We used GSEA to explore the differences in gene function between the high- and low-risk groups, and the results showed that high-risk patients had relatively low levels of immune gene enrichment. The ESTIMATE algorithm showed significant differences in ImmuneScore, tumor purity, StromalScore, and ESTIMATEScore between high- and low-risk patients. In addition, the 8-DEIRL model was related to 3 immune cell subtypes: CD4+ cells, macrophages, and neutrophils.
GSEA showed that these lncRNAs were enriched in the pathways "KEGG_SPLICEOSOME", "KEGG_RNA_POLYMERASE", "KEGG_PYRIMIDINE_METABOLISM", "KEGG_RNA_DEGRADATION", and "KEGG_CELL_CYCLE". Recent genome analysis has shown that many of the molecular changes observed in cancer result from mutations in the splicing process. Understanding the link between tumor cell biology and splicing regulation is essential for studying pathogenesis and treatment methods. cells that is treated with YM155 [19]. The cell cycle of cancer cells is dysregulated, leading to uncontrolled growth of tumor cells [20]. ESTIMATE algorithm and TIMER database analysis show that this model is closely related to tumor immune infiltration and immune cell subtypes, which may provide potential targets for immunotherapy of soft tissue sarcoma.
There have been some studies on the role of these DEIRLs in tumor cells. Xiaokun Zhou et al. found that overexpression of LINC00294 inhibited the growth of glioma cells and induced apoptosis [21]. LINC01023 can inhibit the growth and metastasis of glioma cells by regulating the IGF1R/AKT pathway [22]. Pamela R de Santiago et al. discovered that LINC00944 regulated the level of ADAR1 in breast cancer cells, and that the expression of LINC00944 was positively correlated with T lymphocyte infiltration [23]. Knockdown of LINC01140 can inhibit the growth and metastasis of glioma cells through miR-199a-3p [24]. THUMPD3-AS1 affects the proliferation of NSCLC cells by regulating the level of ONECUT2, so it can be used as a prognosis-related marker and a potential therapeutic target [25].
To our knowledge, this is the first time that an immune-related lncRNA model has been constructed to predict the prognosis of patients with soft tissue sarcoma, and that the immune infiltration related to the model has been explored. The GTEx database compensates for the lack of normal samples in the TCGA database. However, this work has some limitations. First, some important clinical features of patients in the TCGA database, such as tumor stage information, are not sufficiently detailed, although they may affect the treatment and prognosis of STS patients. Second, it is necessary to further determine, at the cellular level, the functional correlation between the expression levels of these 8 DEIRLs and the immunophenotype in soft tissue sarcoma. Finally, to ensure the predictive performance of the nomogram, more independent external cohorts should be analyzed using our model construction method.

Conclusion
In summary, we have constructed a new immune-related lncRNA signature to predict the prognosis of patients with soft tissue sarcoma. In addition, we found that high- and low-risk patients, as defined by the risk score, had different immune infiltration status. These findings may provide new insights into the prognostic assessment of STS patients, and may provide new ideas for the immunotherapy of STS.

Declarations
Availability of data and materials
The datasets generated and/or analysed during the current study are available in the UCSC Xena repository (https://xena.ucsc.edu/) and the GEO database (https://www.ncbi.nlm.nih.gov/geo/).

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.