3.1 Association of genes in described immune-related signatures with disease free survival and overall survival in RCC
In our present study, we downloaded the data of 758 RCC samples from TCGA database and independent data of 91 RCC samples from ICGC database, respectively. The analysis of the correlation between the expression levels of immune-related genes and the prognosis of RCC allowed us to select genes derived from IFN-gamma signature, extended immune gene signature, cytotoxic T lymphocyte (CTL) signature and HLA-A and HLA-B in HLA I molecules. These immune-related signatures were reported to be related to the prognosis of solid tumors, such as melanoma, ovarian cancer, breast cancer[11-15].
The univariate COX regression analysis was used to correlate gene expression levels with DFS and the OS of RCC. First, according to the clinical stage, we have divided the samples into two groups: an early stage group that comprised RCC in stages I and II and an advanced stage group containing RCC in stages III and IV. After excluding the invalid samples, 499 early RCC and 231 advanced RCC samples were further analyzed. In the two groups of RCC subsets, we found that a few of immune-related genes were significantly associated with DFS and OS of RCC patients. For the early stage RCC, we found that high expression levels of CXCL13 and STAT1 resulted in poor DFS while the high expression levels of IDO1, CXCL13 and GZMB were related to detrimental OS. For the advanced RCC, the high expression levels of TNFRSF8 and CXCL13 were shown to be good predictors of adverse DFS and OS, respectively. (Supplementary Figure 1).
3.2 Construction of predictive models of RCC on the basis of genes from each immune-related signature
Genes from four gene signatures were studied to perform a multiple COX regression analysis in early and advanced RCC groups and to construct prognostic models for the OS of RCC and to evaluate the performance of each model in the two groups of samples. The model was used to calculate the risk score of each sample. It determined the division threshold according to the surv_cutpoint function, divided the samples into high-risk and low-risk groups, and conducted the KM survival analysis according to the high and low risk groups of the samples. All four models allowed discrimination of the RCC samples into high and low risk groups. The OS was worse in the high-risk RCC group than in the low-risk one. Contrary to the early stage RCC, the survival curves for the four models of immune-related signatures indicated more significant differences in the OS between the two groups from the advanced RCC (HLA-A and HLA-B: p value = 0.0015; IFN-gamma signature: p value =9.787e-6; Expended immune gene signature: p value =1.137e-11; Cytotoxic T cell lymphocyte signature: p value = 0.00011 ) as shown in Figure 1.
3.3 Establishment and validation of the prognostic model with selected genes for advanced RCC
The four risk models constructed by using the four immune-related signatures in advanced RCC samples divided the samples into high and low risk groups with significant statistical differences in the overall survival rate. Therefore, we have used 8 genes that were most likely to be associated with OS in the advanced RCC samples. These genes were HLA-B; HLA-A; HLA-DRA; IDO1; TAGAP; CIITA and PRF1. These genes were combined to make multiple COX regression analysis, and a comprehensive prognosis prediction model was constructed according to the gene weight coefficient (see Supplementary Table 1). The advanced RCC samples could also be divided into high-risk and low-risk groups according to risk score of each sample (Supplementary Table 2) and the division threshold, cutoff = -2.20465, showing in figure 2B. The OS in the high-risk group is lower than that in the low risk group and there were significant differences in the overall survival between the two groups with p value = 0.032 (Figure 2A). The ROC curve suggested that the comprehensive prediction model built on the data for the studied 8 genes was relatively stable for survival prognosis prediction of advanced RCC (area under curve (AUC) = 0.64 showing in the Figure 2E).
We further used 91 of RCC samples from the ICGC database to validate the comprehensive predictive model. The division threshold determined by the above method was -2.622015 (Figure 4B). And the samples were divided into high and low risk groups (Supplementary Table 3) according to the threshold. The results showed that the OS in the high-risk group was significantly lower than that in the low-risk group (p value = 0.013 showing in the figure 4A). The prediction result of the model was consistent with the previous results (Figur3 and Figure 4), and the stability of the model was effectively verified.
3.4 Landscape Analysis of Gene mutation in high and low risk advanced RCC groups based on TCGA database
Among the advanced RCC samples in the TCGA database, the genes with the top 10 mutation rates in the high risk group included TTN, MUC4, PBRM1, VHL, CHECK2, ATRX, DNAM2, FAT1, FRG1B, KMT2C (Figure 5A), while in the low risk group these genes were PBRM1, VHL, TTN, SETD2, MUC4, BAP1, MUC16, MT-CYB, MUC2, CSMD3 (Figure 5B). The distribution and annotation of mutations of top16 mutant genes in the two groups of samples showed in the Figure5. The frequencies of the mutant genes, such as VHL, CHEK2 and ATRX , were different between the high-risk and low-risk groups. Among them, the frequency of ATRX in the high-risk group was significantly higher than that in the low-risk group (p value = 0.0455).
3.5 Stability assessment of prognosis prediction model
The stability of model risk score in different RCC clinical characteristic subgroups of TCGA database was evaluated. There were significant differences between the high and low risk groups according to the age, gender, clinical stage and pathological pattern (Figure 5A-D). Moreover, it was indicated that the high-risk groups in all subgroups led to adverse prognosis. This showed that the comprehensive prognostic model constructed by the 8genes had good stability.
3.6 Association of the genes involved in the model with tumor immune infiltrates
The Tumor Immune Assessment Resource (TIMER) platform was used to download the immune score (Supplementary Table 4) of advanced RCC samples. Then we have explored the relationship between the expression of HLA-B, HLA-A, HLA-DRA, IDO1, TAGAP, CIITA, PRF1 and CD8B at transcriptional level and the tumor infiltrating immune cell populations (B cells, CD8+ T cells, CD4+, T cells, neutrophils and dendritic cells). We found that the expression levels of PRF1, CIITA, TAGAP and HLA-DRA were positively correlated with infiltrates of six types of immune cells in tumors. Additionally, higher infiltration levels of CD8+ T cells, neutrophils and myeloid dendritic cells were significantly correlated with higher expression of the 8 selected genes, respectively (Figure 6).