Retinoic acid metabolism-related enzyme signature identied prognostic and immunotherapeutic eciency in sarcoma

Background: Dynamic balance of retinoic acid metabolism plays a major role in a variety of biological functions including cell proliferation and differentiation, while its dysregulation often leads to cancer progression and disordered immunity. Targeting retinoic acid signaling has shown effectivity in re-educates tumor microenvironment so that they could enhance the ecacy of immunotherapies and received better outcome. However, a comprehensive analysis of retinoic acid metabolism abnormality in sarcoma is still lacking, which limits the development and application of related targeted drugs. Methods: The RA metabolism related enzymes data set was collected from several database. Then we systematically analyzed the molecular features of 19 retinoic acid metabolism-related enzymes based on TCGA/TARGET/GSE datasets and revealed two subtypes with distinct metabolic status and prognostic value. And we further generated a 7 genes signature to predict retinoic acid metabolism index based on LASSO-penalized Cox regression model. Results: Gene set enrichment analysis indicated a set of immune and oncogenic pathways were enriched in poor-prognosis group. Connectivity Map screened 56 potential small molecules specic to different sub-groups. Survival analysis demonstrated signicant prognostic difference between high- and low-risk groups among all datasets. Several immune cells including CD8+ T cells, Treg cells, Monocytes, and Macrophages showed different abundance between these groups, and immune checkpoint blockade therapy response prediction indicated potential immunotherapeutic eciency of poor-prognosis group. Conclusions: Taken together, our study elaborated two different retinoic acid metabolism status of sarcoma, which revealed the metabolic heterogeneity of sarcoma. Robust and powerful metabolic index risk model could provide insightful suggestions to explore the molecular functions and mechanisms of retinoic acid metabolism.


Background
Retinoic acid (RA), a metabolically active form of vitamin A, acts as a critical signal transduction molecule [1,2]. In organism, the catalytic process of vitamin A to RA consists of three parts. Vitamin A derived from colored fruits and vegetables was rst catabolized to its alcohol form retinol by retinyl ester hydrolases and then oxidized into retinal by widely expressed alcohol dehydrogenases (ADH) [3,4].
Finally, retinal dehydrogenases (RALDH) binds to retinal and oxidate it into RA [5]. All-trans RA is the majority isoform and predominated in most tissues, which plays functional role in regulating cell growth and controlling differential [6][7][8]. Unsurprisingly, dysregulation of RA metabolism was reported to be functionally related to cancer by regulating the expression of many tumor suppression genes and oncogenes, such as mucins and retinoic acid receptor beta (RARβ) family [9][10][11]. For example, previous reports indicated the combination of RA, interferon-gamma (IFN-γ) and transforming growth factor-beta (TGF-β) induce the expression of MUC4, which is aberrantly expressed in numerous cancers [11,12].
Retinoic-acid receptor responder protein 1 (RARRES2), a RA inducible gene, is a well-documented tumor suppressor in many types of cancer. Downregulation of RARRES2 expression level promotes tumor proliferation and tumorigenicity [13].
RA metabolism and signaling was reported to be functionally related to immunity. In 1992, Carman et al [14]. argued that RA impaired Th2 cell responses to parasitic during vitamin A de ciency by inhibit IFN-γ production from Th1 cells and CD8 + T cells. In the meanwhile, RA regulates cell differentiation in TGF-βdependent responses [15]. In serum-free cultures, RA activate T cell in an IL-2-dependent manner and promotes TCR-mediated CD4 + T cell proliferation [16]. Preliminary evidence suggests CD4 + T cell activation was signi cantly correlated with RARα expression [17,18], and RARγ was reported to be critical for CD8 + T cell effector differentiation [18], indicating RA controls T cell immunity largely depended on RARα and RARγ. In addition to T cell regulation, RA was found to regulate numerous types of immune cells, such as macrophage, DCs and NK cells [19][20][21]. For instance, CD103 + DCs mediate T and B cells by high expression of RALDH1 and RALDH2 enzymes, which are crucial for the conversion of retinal to RA [20]. In NK cells, RA was shown to be activated by IFNα and suppress the cytotoxicity of human NK cells [21]. These ndings support that RA metabolism is functionally correlated with immunity, and dissecting its instinct molecular mechanism will broaden insight to immunotherapy of cancer.
Targeting RA metabolism has been proved to show effectivity in several cancers [22], however, the researches on retinoic acid metabolism abnormality in sarcoma is still lacking, which limits the development and application of related targeted drugs. Sarcoma is a kind of malignant tumor, which can be classi ed into bone sarcomas and soft tissue sarcomas based on the tumor-of-origin. The overall 5year survival for bone sarcomas and soft tissue sarcomas are 66.9% and 64.5% [23,24].
Recent clinical research results indicate that the overall response rate of sarcoma to immunotherapy is less than 20% [25][26][27]. Our previous research found that the sarcoma tissue presents an immunosuppressive microenvironment [28]. For example, the degree of cell-killing T cell in ltration is very low, with only (1.8 cells/HPF) [29]. Devalaraja et al. con rmed that sarcoma cells can release RA and promote the differentiation of monocytes in the tumor microenvironment into tumor-associated macrophages in animal models, thereby inhibiting the in ltration of antitumor T-cells [30] and also weakening the clinical effect of immune-checkpoint blockade [31]. This nding was highly recommended by Balkwill for its potential implementation on immunotherapy of sarcomas [32]. Therefore, these studies show us the importance of retinoic acid metabolism in sarcoma. However, there are still many unknowns about the clinical value of RA.
In this study, we systematically analyzed the molecular features of 19 retinoic acid metabolism-related enzymes and revealed two subtypes with distinct metabolic status and prognostic value. The overall design of this research was shown at Figure S1. Gene set enrichment analysis indicated a set of immune and oncogenic pathways were enriched in poor-prognosis group. Connective Map [33] screened 56 potential small molecules speci c to different sub-groups. We further generated a 6 genes signature to predict retinoic acid metabolism index based on LASSO-penalized Cox regression model. Several immune cells including CD8 + T cells, Treg cells, Monocytes, and Macrophages showed different abundance between these groups, and immune checkpoint blockade therapy response prediction indicated potential immunotherapeutic e ciency of poor-prognosis group. The robust and powerful metabolic index risk model could provide insightful suggestions to explore the molecular functions and mechanisms of retinoic acid metabolism.

Material And Methods
Data acquisition TCGA and TARGET RNA sequence level 3 normalized data and clinical informations of SARC were downloaded from UCSC Xena (https://xenabrowser.net/datapages/) for further analysis. The RNA-seq data of GSE17679 datasets was downloaded from Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/gds/?term=GSE17679). The RNA-seq data of healthy of healthy human tissue was downloaded from the Genotype-Tissue Expression (GTEx) database (http://commonfund.nih.gov/GTEx/).

Selection of RA metabolism related enzymes
The RA metabolism related enzymes data set was collected in three major aspects. The major and important method was searching the literatures from PubMed. And besides, the enzyme information from MetaCyc database [34] and KEGG retinol metabolism pathway were integrated. Bioinformatics analysis Protein interaction network was constructed based on GeneMANIA (https://genemania.org/).
Unsupervised hierarchical clustering of SARC samples was achieved by R package "ConsensusClusterPlus". Principal Component Analysis (PCA) was constructed to reveal the expression pattern difference of SARC. The R package "survival" (https://cran.r-project.org/web/packages/survival/) was adopted to acquire the overall survival through Kaplan-Meier estimation. We calculated the fold change and adjusted p-value by the DESeq2 package for all genes between different groups, in which an adjusted p-value less than 0.05 and |log2FoldChange| > 1 was considered the differential expression gene. Gene Set Enrichment Analysis (GSEA) was used to classify the functional pathway difference between different subgroups. mode of action (MoA) analysis was generated based on the analysis of differentially expressed genes in Connectivity Map [33] (https://clue.io/). Compounds with an absolute value of p < 0.05 and enrichment ≥ 0.7 were treated as potential therapeutic drugs for different subgroups in SARC. The mutation data of SARC was downloaded from Xena Browser (https://xenabrowser.net/datapages/) and R package "maftools" was used to dissect the mutation pattern differences. CIBERSORT [35] was used to retrieve the immune cell components in SARC samples.
ImmuCellAI [36] was used to predict the reponse of Immune checkpoint blockade (ICB) therapy. Gene signature identi cation Univariate Cox regression analysis of the expression of 19 RA metabolism related enzymes was performed on SARC samples to determine the candidate genes with R package "glmnet". Finally, a total of 7-gene signature was obtained after 1,000-time alteration and cross-validation. Risk score for each patient was calculated with the linear combinational of the signature gene expression. The regression coe cients were determined by the minimum criteria. That is, Risk Score = k1*x1 + k2*x2+…+ki*xi (i = n).
where i represents each selected enzyme, k is the regression coe cient and x is the expression level. We further classi ed the samples into high-risk group and low-risk group based on the median value of risk scores.

Results
Expression patterns of RA metabolism regulators in sarcomas With the mRNA expression data stored in TCGA and GTEx database, we examined the relationship between RA metabolism related enzymes and the clinical features. The results for 19 enzymes expression and relevant clinical features were displayed in the heatmap and boxplot (Fig. 1). Due to there are only two normal samples for sarcomas in TCGA, we collected the muscle and adipose tissues in GTEx as normal controls. Figure 1A and 1B indicates that the majority of RA metabolism related enzymes was differentially expressed between sarcomas and normal tissues. Compared with normal tissues, the expression level of CYP3A4, RDH8, CYP2C9, LRAT, CYP2A6, RDH11, RDH16, UGT2B7, DHRS9, RPE65 and RDH12 were elevated, while the expression level of RDH10, ALDH1A1, ALDH1A2, ADH4, RDH5, DHRS4, ADH1B and ADH5 were decreased. However, few enzymes showed signi cant correlation with recurrence ( Fig. 1C and 1D). Only DHRS4 upregulated in primary tumors and RDH8 downregulated in recurrent tumors. We further dissected the regulation network and pathways of RA metabolism related enzymes based on expression patterns, sublocations and protein-protein interactions (Fig. 1E). RA metabolism was found to be related to several metabolic pathways such as terpenoid, isoprenoid, retinoid, diterpenoid, glucuronate and uronic acid metabolic process. Finally, we explored the correlation of RA metabolism related enzymes (Fig. 1F). Most enzymes were positive correlated, especially DHRS4, which exhibits the most positive correlation with other enzymes RA based subtype of sarcomas showed different characteristics We then examined the prognostic value for all RA metabolism related enzymes. ALDH1A1, RDH5 and RDH12 was positive related to better prognosis, while RDH8 and RDH11 indicated poor prognosis ( Fig. 2A). We adopted unsupervised hierarchical clustering method to generate the subtype of sarcomas based on the expression matrix of 19 enzymes. 263 samples from TCGA database were classi ed into two groups, which were then renamed G1 and G2 subgroups (Fig. 2B). G2 subgroup mainly contained sarcomas with elevated expression of RDH11, RDH8, RDH16, CYP2A6 and ALDH1A2. Consistent with the phenomenon that these enzymes showed poor survival. Principle component analysis (PCA) revealed distinct expression pattern between these two groups (Fig. 2C). We then performed Kaplan-Meier survival analysis for these subtypes and noticed a signi cantly decreased OS status in the G2 subgroup compared to G1 subgroup, suggesting the 19 enzymes based marker could indicate the prognostic risk level (Fig. 2D). The median survival for G1 and G2 subgroups were 3.09 years and 2.11 years.
To better understand the molecular mechanisms between RA metabolism regulation and sarcomas. We performed differential expression analysis and identi ed 2,279 differentially expressed genes (DEGs). 1,184 genes were upregulated in G1 subgroup, while 1,095 genes were upregulated in G2 subgroup (Fig. 3A, Table S1). GSEA enrichment analysis based on hallmark and KEGG pathway annotations revealed that in ammatory response, interferon alpha response, interferon gamma response and tyrosine metabolism pathways were enriched in G1 subgroup, and G2 subgroup was related to MYC targets, KRAS signaling, E2F targets and G2M checkpoint (Fig. 3B-C, Table S2). To determine the potential compounds that target the relevant biological functions and pathways, CMap mode of action (MoA) analysis was used to generate candidates based on DEGs. 56 kinds of molecules capable of elevating or repressing the expression of the DEGs, which was surmised and displayed in Table S3  Both G1 and G2 subgroups contains only 3 genes with more than 10% mutation rate in sarcomas. The top 15 most frequently mutated genes were illustrated in Figure S2A and S2B. Interestingly, TP53, ATRX, RB1 and MUC16 occupy the top four positions in both cohorts. The mutated positions and frequency of TP53 were similar and showed no survival difference ( Figure S2C and S3). We then calculated the differentially mutated genes using Fisher's exact test and found only MAGEC1 showed signi cant different mutation rate between G1 and G2 subgroups. Furthermore, we investigated the exclusive mutations and co-occurring of the top 20 most frequently mutated genes and found some unique cooccurring cases between these two subgroups, such as CTNND2-ARFGEF1 in G1 subgroups and LRP1B-CSMD1 in G2 subgroups. The 7-gene signature showed strong prognostic power To expanse the application of RA metabolism based enzymes in different kinds of datasets, we generated a prognostic model to score all samples. The 19 putative RA metabolism related enzymes were exploited to construct the model. LASSO Cox regression algorithm was adopted. After 1,000-time alteration and cross-validation, a 7-gene based risk signature was nally used. The parameters of the model were displayed in Fig. 4A and Fig. 4B. Signi cant differences was observed between different OS status groups Fig. 4C. To evaluate the predictive performance of the model, we calculated AUC and the result was shown in Fig. 4D. The AUC was 0.69 in predicting overall survival status and was able to reach 0.75 in survival time based ROC analysis ( Fig. 4D and S4). Cox proportional hazard regression analysis indicates the risk signature was an independent prognostic factor (Fig. 4E). A similar trend was observed in two independent validation dataset TARGET and GSE17679 with signi cant different survival risk (P = 0.0005 for TARGET dataset and P = 0.075 for GSE17679 dataset) ( Figure S5). And besides, risk score was found to be signi cant related to unsupervised hierarchical clustering results of TCGA samples, showing a robust predictive ability of RA metabolism related enzymes ( Figure S6).
We further constructed a nomogram to predict 1-, 3-, and 5-years OS probability based on risk score. As shown in Fig. 5A, Nomogram analysis indicated risk score deviated very little from actual OS probability, such as 1-, 3-and 5-years OS probability (Figs. 5B-D). These results suggested the potential application of the risk score as a valuable marker to improve the estimation of sarcomas prognosis. RA based risk score correlated with immunity RA metabolism and signaling was reported to be functionally related to immunity. To resolve the relationship between RA metabolism and immune cell subsets, we characterized cell composition using CIBERSORT (Table S4). 18 types of immune cells were decomposed and used to calculate the correlation coe cient with RA based risk score. As shown in Fig. 6, the fractions of CD8 + T cells, Treg cells, Monocytes, and Macrophages were negatively correlated with risk score (Fig. 6). Previous reported that RA inhibit IFN-γ production from Th1 cells and CD8 + T cells, and enhance macrophage activation in response to in vitro infection, indicating RA metabolism dysregulation closely related to immunity. And besides, ImmuCellAI can be used to predict the reponse of Immune checkpoint blockade (ICB) therapy with the ICB response prediction (Table S4). The results in Figure S7 revealed that high RA metabolism risk samples were potentially bene t from immunotherapy.
High expression of RDH11 is a biomarker in sarcomas RDH11 was reported to be elevated and signi cantly related to tumorigenesis and development in many cancers [37][38][39]. Our result further dissected it may act as a potential biomarker in sarcomas. To comprehensive classify the functions of RDH11, we retrieved the expression level of RDH11 in all normal and tumor tissues based on GTEx and TCGA database. The RDH11 expression levels were similar among different healthy tissues except muscle and blood, and notably, its expression level in many tumors were higher than the corresponding healthy tissues (Fig. 7). These results revealed that RDH11 play a crucial role in sarcomas and could act as a potential biomarker.

Discussion
RA metabolism plays functional role in regulating cell growth and controlling differential [6][7][8], dysregulation of RA metabolism was reported to be functionally related to cancer by regulating the expression of many tumor suppression genes and oncogenes, such as mucins and retinoic acid receptor beta (RARβ) family [9][10][11]. Targeting RA metabolism has been proved to show incredible effectivity in several cancers [22], however, the comprehensive analysis on retinoic acid metabolism abnormality in sarcoma is still lacking, which limits the development and application of related targeted drugs.
By systematically analyzing the molecular features of 19 retinoic acid metabolism-related enzymes, we found two subtypes with distinct metabolic status and prognostic value. Gene set enrichment analysis indicated a set of immune and oncogenic pathways were enriched in poor-prognosis group. Connective Map screened 56 potential small molecules speci c to different sub-groups. We further generated a 7 genes signature to predict retinoic acid metabolism index based on LASSO-penalized Cox regression model. The robust and powerful metabolic index risk model could provide insightful suggestions to explore the molecular functions and mechanisms of retinoic acid metabolism.
Recently, RA metabolism and signaling was reported to be functionally related to immunity. Devalaraja et al. revealed that RA made by sarcoma cells promotes immunosuppression and tumor growth, and blocking RA production could enhance anti-tumor T cell immunity in sarcomas [31]. We speci ed that several immune cells including CD8 + T cells, Treg cells, Monocytes, and Macrophages showed different abundance between these groups, which consistent with existing researches [19][20][21], and immune checkpoint blockade therapy response prediction suggested potential immunotherapeutic e ciency of poor-prognosis group, indicating the huge druggable potential of RA combined with immune checkpoint blockade therapy.
RDH11 was reported to be elevated and signi cantly related to tumorigenesis and development in many cancers [37][38][39]. We found the RDH11 expression levels were similar among different healthy tissues except muscle and blood, and notably, its expression level in many tumors were higher than the corresponding healthy tissues, indicating a therapeutic targets potential in SARC. However, more rigorous experimental veri cation still needs to be further developed.

Conclusion
Taken together, our study elaborated two different retinoic acid metabolism status of sarcoma, which revealed the metabolic heterogeneity of sarcoma. Robust and powerful metabolic index risk model could provide insightful suggestions to explore the molecular functions and mechanisms of retinoic acid metabolism.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
All listed authors participated in the study and have seen and approved the submitted manuscript.      Correlation of the immune cell members and risk score.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.