Identification and characterization of two ubiquitin-related subgroups
The detailed workflow was shown in Figure S1. Among 752 URGs, 138 prognostic-related URGs were identified by univariate Cox analyses (Figure 1A, Table S2). NMF clustering was performed to classify sarcoma patients based on the expression of 138 URGs using TCGA-SARC. K = 2 was selected as the best cutoff based on the NMF algorithm, which means that patients from TCGA-SARC were classified into two clusters (Figure S2). A survival analysis revealed that the overall survival (OS) of patients in cluster 2 was better than that in cluster 1 (p < 0.0001; Figures 1B).
We further investigated the correlation between the two clusters and clinicopathological features in the TCGA cohort. The relevance between clinical characteristic and expression levels of the URGs in clusters 1 and 2 are shown in the heat map (Figure 1C). The enriched pathways were also analyzed by GSVA (Table S3) and the results were shown in a heat map in Figure 1D. We noticed that the EMT pathway was enriched in cluster 1. Therefore, we calculated the EMT score of the two clusters, and observed a higher EMT score in cluster 1 (Figure 1E). The results have implicated an activated immune response for cluster 2, which might account for its better survival and was explored in subsequent analyses.
Characterization of the tumor microenvironment of the two clusters
To understand the characteristics of the immune infiltration, we performed CIBERSORT. The distribution of immune cells (listed in Table S4) and its relationship with some clinical parameters of the two clusters were shown in a heat map in Figure 2A. Specifically, cluster 2 was characterized by higher infiltration levels of CD8+ T cells, and a higher M1/M2 macrophage proportion (Figure 2B,C).
We then evaluated the expressions of key immune characteristics (including leukocyte fraction, stromal fraction, single nucleotide variant neoantigens, indel neoantigens, TCR Shannon and TCR richness) and major histocompatibility complex (MHC) class II genes. Leukocyte fraction, stromal fraction, TCR Shannon and TCR richness were significantly higher in cluster 2 (Figure 2D). Meanwhile, most of the MHC class II genes were also higher in cluster 2 (Figure 2E).
We also evaluated the expression of several prominent checkpoints, including CD274, TIGIT, PDCD1, CTLA4, LAG3, BTLA, and HAVCR2. The expressions of these checkpoints were all significantly higher in cluster 2 (Figure 2F), indicating the possibility of a better response to immunotherapy for this cluster.
Establishment and evaluation of a ubiquitin-related signature
From 752 URGs, 59 candidate genes with significant prognostic values (p < 0.01) were identified by univariate analyses (Figure S3A). Random survival forest analyses were then implemented to narrow the scope of candidate genes and the top 10 genes ranked by importance were selected for combination analysis and model construction (Figure S3B). The top 10 combinations with the smallest p values were shown in Figure S3C. Among them, the combination with the least gene number (LRRC41, RNF125, TRIM21 and UBE3D) was selected for building the final signature. The risk score was calculated using the following formula: risk score = 0.587057893868294*LRRC41 + (-0.227071632803124)*RNF125 + (-0.54040867418788)*TRIM21 + 0.418750359339415*UBE3D. The prognostic values of the four genes were investigated and all of them showed a significant prognostic value in patients from TCGA-SARC (Figure S3D).
Further, the prognostic values of the signature was explored in three public datasets. First, the patients in TCGA-SARC (training cohort) were assigned to a high risk or a low risk group according to the median risk score (Figure 3A). The survival analysis revealed that patients in the low risk group had significantly better OS (Figure 3D). The areas under the ROC curve for OS were 0.745 at 2 years, 0.762 at 3 years, and 0.743 at 5 years (Figure 3G). The risk scores were then calculated likewise for patients from two validation cohorts (Figure 3B for TARGET-OS and Figure 3C for GSE17674), and patients in low risk groups all had significantly better prognoses (Figure 3E,F). Analyses of the 2-year, 3-year, and 5-year prognostic prediction abilities indicated that the model had relatively strong robustness and effectiveness (Figure 3H,I).
Pathway analyses and tumor microenvironment characterization in different risk groups
We explored the correlations between the clustering system and the signature. Changes in the attributions of individual sarcoma patients from TCGA-SARC in different clusters, risk subgroups, and survival status were shown in an alluvial diagram; the diagram showed that cluster 1 patients had the highest proportion of the high risk subtype and was linked to a higher incidence of fatality (Figure 4A). Correspondingly, the mean risk score of patients in cluster 1 was significantly higher than cluster 2 (Figure 4B). We then further investigated the enriched pathways related to the signature. GSEA showed that pathways including glycolysis, unfolded protein response, MYC targets V2, MYC targets V1, G2M checkpoint, E2F targets were enriched in the high risk group (Figure 4C). Interestingly, we noticed that hallmark of EMT was also enriched in the high risk group (Figure 4D). Therefore, differential expressions of EMT-related genes were explored in the two groups. Among 8 EMT-related genes, 7 were significantly higher in the high risk group, while 1 (CDH1) was lower in that group (Figure 4E). Consistently, EMT score was significantly higher in the high risk group (Figure 4F). We also performed GSVA, which revealed the correlation between risk score and the known biological processes (Table S5). Interestingly, risk score was negatively correlated with CD8 T effector pathway, antigen processing machinery, and immune checkpoint (Figure 4G). Based on this, relative pathways were further explored.
To further understand the underlying mechanisms, the tumor microenvironment of the two risk groups was characterized. Firstly, Spearman’s correlation analyses were performed to explore the correlations between risk score, each signature gene and immune cell infiltration; a widespread correlation between the expression of the genes, risk score, and immune cell infiltration was observed (Figure 5A). Specifically, CD8+T cells and M1/M2 macrophage proportion were significantly higher in the low risk group (Figure 5B,C), which is in line with the GSVA result. Consistent with this, GSEA revealed an enrichment of hallmark of interferon gamma response in the low risk group and an enrichment of hallmark of TGF beta signaling in the high risk group (Figure 5D,E). According to the GSVA result, we further tested the major MHC class II genes, and most of them showed a higher expression in the low risk group (Figure 5F). Additionally, the expressions of several prominent checkpoint genes in different risk groups were examined, and significantly higher levels of CTLA4, PDCD1, BTLA, TIGIT, HAVCR2, CD274, LAG3 were observed in the low risk group (Figure 5G). The results implicated that patients in the low risk group may have a better response to immunotherapy.
Analyses for TRIM21 and validation in our cohort
One of the signature gene, TRIM21, is an E3 ubiquitin ligase that is is well known to be involved in innate immunity and cancer proliferation.(21) Therefore, we further analyzed the potential role of it in sarcomas. Enrichment analysis showed that hallmark of interferon-gamma response was enriched in patients with high TRIM21 (Figure 6A). Correlation analysis showed that TRIM21 was positively correlated with several prominent checkpoint genes such as CD274, LAG3, TIGIT, BTLA, CTLA4, PDCD1, and HAVCR2 (Figure 6B,C). Intriguingly, TRIM21 was also correlated with CD8+ T cells (Figure 6D). To partly validate the above conclusions, we retrospectively collected the specimens of 50 sarcoma patients from Cancer Hospital, Chinese Academy of Medical Sciences and performed immunohistochemistry assays to analyze the expressions of related proteins. The clinical information of these patients has been listed in Table S6. We analyzed the expressions of TRIM21 and CD8. Significantly, the expression of TRIM21 was positively correlated with that of CD8 (Figure 6F). We have also analyzed the prognostic values of them, and observed that a higher expression of TRIM21 was significantly related to better patients’ survival. (Figure 6G). Although the association between a higher expression of CD8 protein and better patients’ survival was not significant, higher TRIM21 combined with higher CD8 was significantly associated with better OS (Figure 6H,I). Representative images of IHC were shown in Figure 6E.