Identification of DEGs and enrichment analyses
Three datasets from the GEO database (GSE105261, GSE47352 and GSE23629) were included in the analyses. The volcano plots in Fig. 1A1-A3 show the top 250 DEGs of each study. DEGs with a fold change ≥ 2 were selected for subsequent analyses. Seven DEGs identified from more than two studies were finally selected for survival analyses: GLS2, osteoglycin (OGN), adhesion G protein-coupled receptor F1 (ADGRF1), adaptor-related protein complex 4 epsilon 1 subunit (AP4E1), tetraspanin 3 (TSPAN3), paired-related homeobox 1 (PRRX1) and katanin catalytic subunit A1-like 2 (KATNAL2). The heatmap of the seven DEGs based on 840 RCC patients is shown in Fig. 1B. Among them, KATNAL2 was highly expressed in RCC tissues, whereas the other DEGs showed low expression. KEGG enrichment analyses were performed on all DEGs with fold changes ≥ 2, and 40 altered pathways were detected, as shown in Fig. 1C. ROC curves of the seven DEGs were constructed, and four DEGs showed satisfactory AUC values for the prognosis of RCC: GLS2 (AUC = 0.605, P < 0.001), ADGRF1 (AUC = 0.587, P < 0.001), KATNAL2 (AUC = 0.595, P < 0.001) and OGN (AUC = 0.585, P < 0.001), as shown in Fig. 1D1-D4.
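The selection step described above (a fold-change cutoff followed by an overlap across studies) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name, the toy gene names and the fold-change values are hypothetical, fold changes are assumed to be linear ratios with down-regulated genes judged by the inverse ratio, and the overlap rule is exposed as a parameter because the exact criterion is an assumption here.

```python
from collections import Counter


def select_shared_degs(studies, min_fold_change=2.0, min_studies=2):
    """Return genes that pass the fold-change cutoff in at least `min_studies` studies.

    `studies` maps a study ID to {gene: fold_change}. Fold changes are assumed
    to be linear ratios (tumor / normal); ratios below 1 are inverted so that
    down-regulated genes are judged by the same magnitude threshold.
    """
    counts = Counter()
    for fold_changes in studies.values():
        for gene, fc in fold_changes.items():
            if fc <= 0:
                continue  # a ratio must be positive; skip malformed entries
            magnitude = fc if fc >= 1 else 1.0 / fc
            if magnitude >= min_fold_change:
                counts[gene] += 1
    return sorted(g for g, n in counts.items() if n >= min_studies)


# Hypothetical fold changes, for illustration only (not the study's data).
toy = {
    "study_A": {"GENE1": 0.3, "GENE2": 2.5, "GENE3": 1.2},
    "study_B": {"GENE1": 0.4, "GENE2": 1.1, "GENE3": 0.9},
    "study_C": {"GENE1": 1.0, "GENE2": 3.0, "GENE4": 4.0},
}
shared = select_shared_degs(toy, min_fold_change=2.0, min_studies=2)
print(shared)
```

Raising `min_studies` to 3 in the toy example returns an empty list, which shows how sensitive the final gene list is to the overlap rule.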
Construction of a prognostic nomogram for RCC patients based on DEGs
A total of 840 RCC patients and their basic characteristics were obtained from the TCGA database. The mean age of all RCC patients was 60.2 years (range, 17 to 90 years). We divided the patients into two groups according to outcome, those who were alive and those who had died, and compared the expression of the seven selected DEGs between the two groups, as shown in Fig. 1D5-D8. GLS2, ADGRF1 and KATNAL2 showed significantly lower expression in the group of patients who died, which indicated that the functional loss of these genes was associated with a poor prognosis in RCC.
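The ROC analysis and the alive-versus-died comparison both ask how well a gene's expression separates the two outcome groups. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen patient who died has a higher score than a randomly chosen patient who was alive, with ties counted as one half. The function name and the toy values are hypothetical and are not taken from the TCGA data.

```python
def auc_from_scores(scores, events):
    """AUC as P(score of an event case > score of a non-event case); ties count 1/2."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values and outcomes (1 = died, 0 = alive).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
died = [1, 1, 0, 1, 0, 0]

auc_high_expr = auc_from_scores(expression, died)
# For a gene whose LOW expression marks a poor outcome, score by negated expression.
auc_low_expr = auc_from_scores([-x for x in expression], died)
print(auc_high_expr, auc_low_expr)
```

In this toy example the two values sum to 1, which is why an AUC below 0.5 for a protective gene carries the same information as an AUC above 0.5 computed on the reversed score.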
Univariate analyses were performed to demonstrate the relationship between selected DEGs and the prognosis of RCC patients, as shown in Fig. 2A-D. The four selected DEGs were tightly associated with prognosis. High expression of GLS2, ADGRF1 and KATNAL2 was related to a good prognosis both in terms of disease-free survival (DFS) and overall survival (OS). However, high expression of OGN was related to poor DFS and OS. Multivariate analysis was performed to integrate the variables into a nomogram. As shown in Table 1, age (P = 0.014), TNM stage (P < 0.001) and GLS2 expression (P = 0.001) were selected for the nomograms based on multivariate analysis with the Cox regression model. The nomogram for OS of RCC patients was constructed based on the three variables above, as shown in Fig. 2E. In the nomogram, every variable produced a score, and the total score was easy to calculate. By correlating the total score with the 1-year to 5-year OS values, the probability of survival for every patient could be obtained.
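How a nomogram turns a Cox model into per-variable scores can be sketched as follows. This is an illustration under explicit assumptions: the coefficients and baseline survival values are hypothetical placeholders (not the fitted values behind Table 1 or Fig. 2E), all predictors are coded as binary indicators of the higher-risk level so that every coefficient is positive, and the variable names are invented for the example.

```python
import math

# Hypothetical Cox log-hazard coefficients; each predictor is a binary flag
# for the higher-risk level. Placeholders only, NOT the study's fitted values.
COEFS = {"age_over_60": 0.40, "tnm_stage_II_IV": 1.20, "gls2_low": 0.50}
# Hypothetical baseline survival S0(t) at 1, 3 and 5 years (linear predictor = 0).
BASELINE_SURVIVAL = {1: 0.97, 3: 0.90, 5: 0.85}


def points_per_variable(coefs):
    """Scale coefficients so that the strongest predictor is worth 100 points."""
    max_effect = max(coefs.values())
    return {name: 100.0 * b / max_effect for name, b in coefs.items()}


def total_points(patient, coefs):
    """Sum the points of every risk factor present in the patient."""
    pts = points_per_variable(coefs)
    return sum(pts[name] for name, present in patient.items() if present)


def survival_probability(patient, coefs, baseline, years):
    """Cox model: S(t | x) = S0(t) ** exp(linear predictor)."""
    lp = sum(coefs[name] for name, present in patient.items() if present)
    return baseline[years] ** math.exp(lp)


low_risk = {"age_over_60": False, "tnm_stage_II_IV": False, "gls2_low": False}
high_risk = {"age_over_60": True, "tnm_stage_II_IV": True, "gls2_low": False}

print(total_points(high_risk, COEFS))
print(survival_probability(high_risk, COEFS, BASELINE_SURVIVAL, 5))
```

Because the points are a linear rescaling of the Cox linear predictor, a higher total score always maps to a lower predicted survival probability at every time point.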
Table 1
Univariate and multivariate analyses of RCC patients based on DEGs.
Variables | Groups | RCC patients (N = 840), n (%) | Univariate analysis (P value) | Multivariate analysis (P value) |
age | | | < 0.001 | 0.014 |
| > 60 | 421(50.1) | | |
| <=60 | 419(49.9) | | |
gender | | | 0.42 | 0.522 |
| male | 564(67.1) | | |
| female | 276(32.9) | | |
TNM stage | | | < 0.001 | < 0.001 |
| I | 450(53.6) | | |
| II-IV | 390(46.4) | | |
GLS2 | | | < 0.001 | 0.001 |
| low | 420(50.0) | | |
| high | 420(50.0) | | |
ADGRF1 | | | 0.001 | 0.323 |
| low | 420(50.0) | | |
| high | 420(50.0) | | |
KATNAL2 | | | < 0.001 | 0.58 |
| low | 420(50.0) | | |
| high | 420(50.0) | | |
OGN | | | 0.026 | 0.349 |
| low | 420(50.0) | | |
| high | 420(50.0) | | |
The role of GLS2 in RCC and ccRCC
The nomogram for the OS of RCC patients revealed that GLS2 plays an important role in the prognosis of RCC. Given that GLS2 was a DEG of the ccRCC samples (GSE105261 and GSE47352), the following analyses were all based on ccRCC samples to reduce the heterogeneity associated with RCC. We first determined the expression status of GLS2 in ccRCC through immunohistochemistry (IHC) data from the Human Protein Atlas. Strong and weak staining of GLS2 are shown in Fig. 3A. We then examined the expression status of GLS2 in ccRCC samples from the TCGA database, as shown in Fig. 3B-E. GLS2 expression was lower in tumor tissue than in normal control tissue, suggesting that loss of GLS2 expression is likely to induce tumorigenesis. In addition, lower GLS2 expression was related to more metastatic lymph nodes, higher TNM stage and higher tumor grade. GLS2 was also a good biomarker for predicting the prognosis of ccRCC patients in terms of both DFS and OS, as shown in Fig. 3F and Fig. 3G, respectively. Therefore, we conducted GSEA to compare the different signaling pathways and molecules between the GLS2-high (GLS2-H) group and GLS2-low (GLS2-L) group. The cell cycle pathway was altered significantly in the GLS2-L group according to the results from three pathway databases, namely KEGG (nominal P-value = 0.014), BIOCARTA (nominal P-value = 0.006) and REACTOME (nominal P-value = 0.031), as shown in Fig. 3H1-H3. In addition, the E2F pathway (Oncogenic signatures: nominal P-value = 0.006) was overactivated in the GLS2-L group, and E2F1 was overexpressed (Oncogenic signatures: nominal P-value = 0.015; KEGG: nominal P-value = 0.019), as shown in Fig. 4A1-A3.
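The GSEA comparison above rests on a running-sum enrichment score computed over a ranked gene list. The sketch below implements the standard weighted running-sum statistic in its simplest form. It is an illustration only: the function name, gene labels and ranking values are hypothetical, and the permutation step that yields a nominal P-value is not included.

```python
def enrichment_score(ranked_genes, correlations, gene_set, weight=1.0):
    """Running-sum enrichment score of `gene_set` over a ranked gene list.

    ranked_genes: genes ordered from most positively to most negatively
                  associated with the phenotype (e.g. GLS2-L versus GLS2-H).
    correlations: ranking metric of each gene, in the same order.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(r) ** weight for r, h in zip(correlations, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running = 0.0
    best = 0.0
    for r, h in zip(correlations, hits):
        # Step up at a set member (weighted by its metric), step down otherwise.
        running += abs(r) ** weight / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list and gene sets, for illustration only.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(genes, metric, {"A", "B"})
es_bottom = enrichment_score(genes, metric, {"E", "F"})
print(es_top, es_bottom)
```

A positive score means the set members cluster at the top of the ranking and a negative score means they cluster at the bottom. In practice the significance is assessed by recomputing the score on permuted phenotype labels.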
E2F family and the prognosis of RCC and ccRCC patients
The E2F pathway and molecules of the E2F family were found to be activated in the GLS2-L group of ccRCC patients through GSEA. Therefore, we further determined the relationship between the E2F family and the prognosis of ccRCC and RCC patients. E2F1 to E2F8, the E2F family members identified to date [25], were included in the prognostic analyses in our study. ROC curves of these eight molecules were constructed, and E2F1, E2F2, E2F3, E2F4, E2F5 and E2F7 showed satisfactory AUC values for predicting the prognosis of ccRCC patients, as shown in Fig. 4B1-B6. Survival analyses for these six genes were performed, and the results are displayed in Fig. 4C1-C6. Except for E2F5, the other five molecules of the E2F family demonstrated a convincing ability to predict the prognosis of ccRCC patients and were all related to poor OS. We therefore performed multivariate analysis to comprehensively evaluate ccRCC patient prognosis. A total of 524 ccRCC patients from the TCGA database were included in the analysis. The mean age of ccRCC patients was 60.6 years (range, 26 to 90 years). Basic characteristics were collected for the construction of a nomogram. In the multivariate analysis for the OS of ccRCC patients, five variables, including age (P = 0.001), TNM stage (P < 0.001), E2F1 (P = 0.037), E2F4 (P = 0.012) and E2F5 (P = 0.015), were selected for the construction of a nomogram, as shown in Fig. 5A. The details of the multivariate analysis are summarized in Table 2. Interestingly, although E2F5 was not significant in the univariate analysis for predicting the prognosis of ccRCC patients, it may influence prognosis in combination with other factors. E2F1 demonstrated a robust ability to predict the prognosis of ccRCC patients and was activated in the GLS2-L group. We further determined the expression status of E2F1 in the different groups, and the comparisons are shown in Fig. 5B-E. In contrast to that of GLS2, higher E2F1 expression was detected in tumor tissue than in normal tissue, and higher E2F1 expression was related to more metastatic lymph nodes, higher TNM stage and higher tumor grade.
Table 2
Univariate and multivariate analyses of RCC and ccRCC patients based on E2F family members.
Variables | Groups | RCC: total patients, n (%) | RCC: univariate analysis (P value) | RCC: multivariate analysis (P value) | ccRCC: total patients, n (%) | ccRCC: univariate analysis (P value) | ccRCC: multivariate analysis (P value) |
age | | | < 0.001 | < 0.001 | | 0.001 | 0.001 |
| > 60 | 421(50.1) | | | 262(50.0) | | |
| <=60 | 419(49.9) | | | 262(50.0) | | |
gender | | | 0.42 | 0.296 | | 0.695 | 0.905 |
| male | 564(67.1) | | | 340(64.9) | | |
| female | 276(32.9) | | | 184(35.1) | | |
TNM stage | | | < 0.001 | < 0.001 | | < 0.001 | < 0.001 |
| I | 450(53.6) | | | 263(50.2) | | |
| II-IV | 390(46.4) | | | 261(49.8) | | |
GLS2 | | | < 0.001 | 0.036 | | NA | NA |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
E2F1 | | | 0.06 | 0.185 | | 0.001 | 0.037 |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
E2F2 | | | < 0.001 | 0.553 | | 0.002 | 0.884 |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
E2F3 | | | 0.008 | 0.253 | | < 0.001 | 0.669 |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
E2F4 | | | NA | NA | | < 0.001 | 0.012 |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
E2F5 | | | NA | NA | | 0.773 | 0.015 |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
E2F7 | | | < 0.001 | 0.021 | | < 0.001 | 0.891 |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
E2F8 | | | < 0.001 | 0.091 | | 0.045 | 0.218 |
| low | 420(50.0) | | | 262(50.0) | | |
| high | 420(50.0) | | | 262(50.0) | | |
The ability of GLS2 and the E2F family to predict the prognosis of RCC patients was then estimated. E2F2, E2F3, E2F7 and E2F8 demonstrated satisfactory AUC values, as shown in Table 3, and were convincing biomarkers for predicting the prognosis according to the univariate analyses, the findings of which are summarized in Table 2. Age (P < 0.001), TNM stage (P < 0.001), GLS2 (P = 0.036) and E2F7 (P = 0.021) were selected for the construction of a nomogram based on the findings of the multivariate analysis, as shown in Fig. 6A. Given the heterogeneity of RCC, E2F7 is likely to be an important biomarker for prognosis. As shown in Fig. 6B-D, higher expression of E2F7 was detected in the group of RCC patients who died than in the group of RCC patients who were alive, and high expression of E2F7 was related to a poor prognosis in terms of both DFS and OS in RCC patients.
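Comparisons of DFS and OS between high- and low-expression groups are commonly drawn as Kaplan-Meier curves. The text does not name the estimator, so the following is a sketch under that assumption, with a hypothetical function name and invented follow-up times.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival probability)] at each event time.

    times:  follow-up time of each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Hypothetical follow-up data (months) for one expression group.
group_times = [5, 8, 12, 20]
group_events = [1, 1, 0, 1]
km_curve = kaplan_meier(group_times, group_events)
print(km_curve)
```

Running the same function on the high- and low-expression groups separately gives the two curves that a log-rank test would then compare.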
Table 3
The comparison of AUC and C-index.
Category | Variable | RCC: AUC or C-index | RCC: P value | ccRCC: AUC or C-index | ccRCC: P value |
Differentially expressed genes | GLS2 | 0.605 | < 0.001 | 0.605 | < 0.001 |
| KATNAL2 | 0.595 | < 0.001 | NA | NA |
| ADGRF1 | 0.587 | < 0.001 | NA | NA |
| OGN | 0.585 | < 0.001 | NA | NA |
| PRRX1 | 0.54 | 0.076 | NA | NA |
| AP4E1 | 0.54 | 0.076 | NA | NA |
| TSPAN3 | 0.526 | 0.254 | NA | NA |
E2F family | E2F1 | 0.587 | 0.001 | 0.581 | < 0.001 |
| E2F2 | 0.603 | < 0.001 | 0.667 | < 0.001 |
| E2F3 | 0.603 | < 0.001 | 0.626 | < 0.001 |
| E2F4 | 0.607 | < 0.001 | 0.524 | 0.281 |
| E2F5 | 0.596 | < 0.001 | 0.566 | 0.004 |
| E2F6 | 0.524 | 0.375 | 0.539 | 0.088 |
| E2F7 | 0.608 | < 0.001 | 0.677 | < 0.001 |
| E2F8 | 0.554 | 0.046 | 0.607 | < 0.001 |
Prognostic model | Nomograms | C-index 1: 0.7790; C-index 2: 0.7888 | | C-index: 0.7679 | |
Validation of nomogram performance
Harrell’s C-indexes were calculated to evaluate the discrimination ability of the nomograms and were compared with the AUC values of the individual biomarkers. The comparisons of AUC values and C-indexes are shown in Table 3. In both RCC and ccRCC patients, the nomograms demonstrated a more robust ability to predict prognosis than any single variable selected in this study. The calibration plots are shown in Figure S1. The probabilities predicted by our prognostic models agreed with the actual probabilities within acceptable margins (dashed lines in the calibration plots correspond to a 10% margin of error), except for the 5-year OS model.
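Harrell's C-index is the fraction of comparable patient pairs in which the patient with the higher predicted risk is the one who experienced the event earlier. A minimal sketch follows, in which the function name and the toy data are hypothetical and the risk score stands in for a nomogram total score.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event strictly
    before patient j's follow-up time. It is concordant when patient i also
    has the higher risk score; tied scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event flags (1 = died) and risk scores.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the C-indexes in Table 3 are read.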