3.1 Association of the genes in the described immune-related signatures with the disease-free and overall survival in RCC
In our study, we have downloaded the data of 758 RCC samples from the TCGA database and independent data of 91 RCC samples from the ICGC database. The analysis of the correlation between the expression levels of the immune-related genes and the prognosis of RCC allowed us to select genes derived from the IFN-gamma signature, the expanded immune gene signature, the cytotoxic T lymphocyte (CTL) signature and the HLA-A and HLA-B in HLA I molecules. These immune-related signatures were reported to be related to the prognosis of solid tumors, such as melanoma, ovarian cancer, breast cancer[15-19].
The univariate COX regression analysis was used to correlate gene expression levels with DFS and the OS of RCC. First, according to the clinical stage, we have divided the samples into two groups: an early-stage group that comprised the RCC in stages I and II and an advanced stage group containing RCC in stages III and IV. After excluding the invalid samples, 499 early RCC and 231 advanced RCC samples were further analyzed. In the two groups of RCC subsets, we found that few of the immune-related genes were significantly associated with the DFS and OS of RCC patients. For the early-stage RCC, we found that the high expression levels of CXCL13 and STAT1 resulted in poor DFS while the high expression levels of IDO1, CXCL13 and GZMB were related to detrimental OS. For the advanced RCC, the high expression levels of TNFRSF8 and CXCL13 were shown to be good predictors of adverse DFS and OS, respectively (Supplementary Figure 1).
3.2 Construction of prognostic risk models for RCC based on the genes from each of the immune-related signatures
Genes from the four gene signatures were studied to perform a multiple COX regression analysis in the early and advanced RCC groups, to construct prognostic models for the OS of RCC and to evaluate the performance of each model in the two groups of samples. The model was used to calculate the risk score of each sample. It determined the division threshold according to the surv_cutpoint function, dividing the samples into high-risk and low-risk groups. The KM survival analysis according to the high and low-risk groups of the samples was further conducted. All four models allowed discrimination of the RCC samples into high and low-risk groups. The OS was worse in the high-risk RCC group than in the low-risk one. Contrary to the early-stage RCC, the survival curves for the four models of the immune-related signatures indicated more significant differences in the OS between the two groups from the advanced RCC. The differences were statistically significant: for the HLA-A and HLA-B the p-value was 0.0015, for the IFN-gamma signature the calculated p-value was 9.787e-6, whereas for the expanded immune gene signature the p-value was1.137e-11, as for the cytotoxic T cell lymphocyte signature the p-value was 0.00011 (Figure 1).
3.3Establishment and examination of the prognostic model with the selected genes for the advanced RCC
The four risk models constructed by using the four immune-related signatures in the advanced RCC samples divided the samples into high and low-risk groups with significant statistical differences in the overall survival rate. We have used 8 genes that were most likely to be associated with OS in the advanced RCC samples. These genes were HLA-B; HLA-A; HLA-DRA; IDO1; TAGAP; CIITA and PRF1. They were combined to make a multiple COX regression analysis and a comprehensive prognostic risk model reflecting the genes’ weight coefficient (see Supplementary Table 1). The advanced RCC samples were divided into high- and low-risk groups according to the risk score of each sample (Supplementary Table 2). The estimated division threshold was with a cutoff of -2.20465 (Figure 2b). The OS in the high-risk group was lower than that in the low-risk group and there were significant differences in the overall survival between the two groups with a p-value equal to 0.032 (Figure 2a). The ROC curves suggested that though the predictive value of risk score for prognosis of advance RCC was not high enough (area under the curve (AUC) = 0.64). however, the stage of the RCC combined with risk score could increase the predictive value for the prognosis (AUC = 0.77) as seen in Figure 2e.
We have further used 91 of RCC samples from the ICGC database to test the comprehensive prognostic risk model. The division threshold determined by the above method was -2.622015 (Figure 4b). The samples were divided into high and low-risk groups (Supplementary Table 3) according to the threshold. The results showed that the OS in the high-risk group was significantly lower than that in the low-risk group (p-value = 0.013, Figure 4a). The prediction result of the model was consistent with the previous results (Figure 3 and Figure 4), and the stability of the model was effectively verified.
3.4 Landscape analysis of gene mutation in the high- and low-risk advanced RCC groups based on the TCGA database
Among the advanced RCC samples in the TCGA database, the genes with the top ten mutation rates in the high-risk group included TTN, MUC4, PBRM1, VHL, CHECK2, ATRX, DNAM2, FAT1, FRG1B, KMT2C (Figure 5a), while in the low-risk group these genes were PBRM1, VHL, TTN, SETD2, MUC4, BAP1, MUC16, MT-CYB, MUC2, CSMD3 (Figure 5b). The distribution and annotation of mutations of the top sixteen mutant genes in the two groups of samples are shown in Figure5. The frequencies of the mutant genes, such as VHL, CHEK2 and ATRX, were different between the high- and low-risk groups. Among them, the frequency of ATRX in the high-risk group was significantly higher than that in the low-risk group (p-value = 0.0455).
3.5 Stability assessment of the prognostic risk model
The stability of model risk score in the different RCC clinical characteristic subgroups of the TCGA database was evaluated. There were significant differences between the high and low-risk groups according to the age, gender, clinical stage and pathological pattern (Figure 5a-d). Moreover, it was indicated that the high-risk groups in all subgroups led to adverse prognosis. This showed that the comprehensive prognostic model constructed by the eight selected genes had a good stability.
3.6 Association of the genes involved in the model with the tumor immune infiltrates
The Tumor Immune Assessment Resource (TIMER) platform was used to download the immune score (Supplementary Table 4) of the advanced RCC samples. Then we have explored the relationship between the expression of HLA-B, HLA-A, HLA-DRA, IDO1, TAGAP, CIITA, PRF1 and CD8B at the transcriptional level and the tumor-infiltrating immune cell populations (B cells, CD8+ T cells, CD4+, T cells, neutrophils and dendritic cells). We found that the expression levels of PRF1, CIITA, TAGAP and HLA-DRA were positively correlated with infiltrates of six types of immune cells in the tumors. Additionally, higher infiltration levels of CD8+ T cells, neutrophils and myeloid dendritic cells were significantly correlated with higher expression of the eight selected genes, respectively (Figure 6).