Immunophenotypes and tumor microenvironment in ccRCC tissues
Figure 1 illustrates how IRGs in ccRCC were studied for their molecular function and prognostic value. A total of 72 kidney tissue specimens and 539 ccRCC specimens were downloaded from the TCGA transcriptome database. Accordingly, 539 ccRCC tissues from the TCGA database were analyzed for immune cell activity or enrichment, function, pathway, and checkpoints using the ssGSEA algorithm and 29 immune gene sets. In accordance with the hierarchical clustering results, we examined the immunity characteristics of clusters 1 (n=519) and 2 (n=20). Figure 2 shows the results. A cluster with high immunity was referred to as Immunity High, and a cluster with low immunity was referred to as Immunity Low (Figure 2A). Each TME sample was then scored, and the characteristics of the groups with Immunity High and Immunity Low were compared. The results showed that the StromalScore of the Immunity High group and the Immunity Low group were 691.8±21.22 and -602.9± 133.1 respectively. The Immune Score of the Immunity High group and the Immunity Low group were 1066±28.29 and -535.1± 66.03 correspondingly. The Immune Score of the Immunity High group and the Immunity Low group were 1758±42.63 and -1138±183.5 respectively (Figure 2D, E, F). To compare the degree of immune cell infiltration between Immunity High and Immunity Low groups, we used the CIBERSORT algorithm (Figure 2B). Among the two groups, the infiltration degree of T cells CD8, CD4 naive T cells, CD4 memory activated T cells, regulatory T cells (Tregs), macrophages M1, activated dendritic cells, resting mast cells, and eosinophils differed significantly (Figure 2C). There was a significant difference in the infiltration of immune cells between the two groups regardless of the machine learning algorithm. Identifying differentially expressed IRGs in ccRCC patients
Due to the lack of knowledge of immune-related molecular characteristics and their prognostic potential in ccRCC, we investigated the immunophenotype-specific molecular characteristics and prognostic potential of ccRCC-immune interactions. Our first step was to obtain 2,483 tumor-related immune genes from the ImmPort database. By screening for differentially expressed IRGs according to the screening criteria (|log2 FC| >1.0 and FDR<0.05), 355 up-regulated IRGs were found while 200 down-regulated IRGs were found. As shown in Figure 3A and B, these IRGs are expressed inconsistently.
Evaluation of the correlation between gene modules and clinical parameters
By utilizing WGCNA to build different gene modules and evaluate the relationship between these differentially expressed IRGs and clinical parameters, we performed WGCNA for these differentially expressed IRGs. According to the optimal soft threshold, six gene modules were identified. These modules are correlated with clinical parameters, including age, gender, tumor grade, tumor stage, T stage, N stage, and M stage, as shown in Figure 3C. The results show that all modules were related to grade (P<0.05), three modules were closely related to stage and T stage (P<0.01), three modules have a close negative correlation with M stage (P<0.01). A significant correlation was not found between age, gender, or N stage and gene modules. Consequently, further analysis of these differentially expressed IRGs was warranted because they might be associated with clinical prognosis.
Construction and evaluation of immune associated prognostic model
As a result of the univariate Cox regression analysis of these differentially expressed IRGs, we identified 190 IRGs that were associated with prognosis (Table S1). After performing LASSO regression analysis on these IRGs, 16 IRGs were selected as having prognostic significance, namely CCL7, CHGA, CMA1, CRABP2, DCD, IFNE, ISG15, NPR3, PDIA2, PGLYRP2, PLA2G2A, SAA1, TEK, TGFA, TNFSF14, and UCN2 (Figure S1). These IRGs were then analyzed with multivariate Cox regression to determine the most relevant IRGs to prognosis, yielding 15 IRGs in total, including CCL7, CHGA, CMA1, CRABP2, IFNE, ISG15, NPR3, PDIA2, PGLYRP2, PLA2G2A, SAA1, TEK, TGFA, TNFSF14, and UCN2.
To establish a prognostic model, multivariate Cox regression analysis was conducted on these 15 IRGs (Table 1). Each ccRCC patient was assigned a risk score based on the following formula:
Risk score = (0.0710×Exp CCL7) + (0.0935×Exp CHGA) + (-0.1179×Exp CMA1) + (0.0668×Exp CRABP2) + (0.0999×Exp IFNE) + (0.1679×Exp ISG15) + (0.0095×Exp NPR3) + (0.1025×Exp PDIA2) + (0.0619×Exp PGLYRP2) + (0.0078×Exp PLA2G2A) + (0.0270×Exp SAA1) + (-0.0207×Exp TEK) + (-0.1503×Exp TGFA) + (0.0949×Exp TNFSF14) + (0.0571×Exp UCN2)
The median risk scores of 480 ccRCC patients in TCGA were used to categorize the patients into high- and low-risk subgroups. An analysis of Kaplan-Meier survival showed that patients with high-risk disease had a worse outcome than those with low-risk disease (P=4.663e-15, Figure 4A). To further evaluate the predictive performance of the model, the area under ROC curves (AUCs) was 0.837 after one year, 0.751 after three years, and 0.749 after five years (Figure 4B). TCGA cohort members' survival status and expression heat maps were shown in Figures 4C and 4G. In addition, we calculated the risk score of the E-MTAB-1980 cohort using the same formula to determine if the model has similar predictive power in other ccRCC patient cohorts. Similarly, the Kaplan-Meier survival analysis showed a worse outcome for patients with high-risk factors (P=3.671e-04, Figure 4D), with predicted AUCs of 0.854, 0.817, and 0.867 at five years (Figure 4E). Each patient's survival status and expression heat map in the E-MTAB-1980 cohort was shown in figures 4F and 4H. The results showed that the IRG-based prognostic related model is stable and performs well.
Table 1. Multivariate Cox regression analysis to identify prognosis-related immune genes.
Gene
|
Coef
|
Exp(coef)
|
se(coef)
|
z
|
Pr(>|z|)
|
CCL7
|
0.0710
|
1.0736
|
0.0588
|
1.2075
|
0.2272
|
CHGA
|
0.0935
|
1.0980
|
0.0506
|
1.8479
|
0.0646
|
CMA1
|
-0.1179
|
0.8888
|
0.0625
|
-1.8863
|
0.0593
|
CRABP2
|
0.0668
|
1.0691
|
0.0554
|
1.2062
|
0.2277
|
IFNE
|
0.0999
|
1.1051
|
0.0508
|
1.9680
|
0.0491
|
ISG15
|
0.1679
|
1.1828
|
0.1041
|
1.6128
|
0.1068
|
NPR3
|
0.0095
|
1.0096
|
0.0576
|
0.1651
|
0.8688
|
PDIA2
|
0.1025
|
1.1079
|
0.0453
|
2.2615
|
0.0237
|
PGLYRP2
|
0.0619
|
1.0639
|
0.0519
|
1.1924
|
0.2331
|
PLA2G2A
|
0.0078
|
1.0078
|
0.0371
|
0.2104
|
0.8333
|
SAA1
|
0.0270
|
1.0273
|
0.0297
|
0.9095
|
0.3631
|
TEK
|
-0.0207
|
0.9795
|
0.0848
|
-0.2441
|
0.8072
|
TGFA
|
-0.1503
|
0.8605
|
0.0730
|
-2.0580
|
0.0396
|
TNFSF14
|
0.0949
|
1.0996
|
0.0630
|
1.5059
|
0.1321
|
UCN2
|
0.0571
|
1.0587
|
0.0654
|
0.8728
|
0.3828
|
Coef, coefficient.
Prognostic significance of the model stratified by clinical parameters
TCGA data was stratified by age, gender, tumor grade, tumor stage, T stage, N stage, and M stage to explore clinical significance of this model. Kaplan-Meier survival analysis showed that OS for high-risk patients was significantly lower than for low-risk patients (Figure 5). A prediction model based on IRGs can be used to predict the prognosis of ccRCC patients without regard to clinical parameters, according to these results.
Relationship between prognostic related IRGs and clinical parameters
In order to better understand the role of these genes in ccRCC, we examined the relationship between these 15 IRGs and clinical parameters. In the study, it was revealed that IFNE, NPR3, and SAA1 were significantly correlated with gender; CHGA, CMA1, IFNE, ISG15, NPR3, PDIA2, PLA2G2A, SAA1, TEK, TGFA, TNFSF14, and UCN2 were significantly correlated with grade; In terms of stage and T stage, CCL7, CHGA, CMA1, CRABP2, IFNE, ISG15, NPR3, PDIA2, PGLYRP2, PLA2G2A, SAA1, TEK, TGFA, TNFSF14, and UCN2 were significantly correlated; TEK was significantly correlated with N stage; There was a significant correlation between M stage and CCL7, CMA1, ISG15, NPR3, PDIA2, SAA1, TEK, TGFA, TNFSF14, and UCN2 (Table 2).
Table 2. The relationship between prognostic related immune genes and clinicopathologic parameters.
Gene
|
|
Gender
(Male/Female)
|
Grade
(G1-2/G3-4)
|
Stage
(I-II/ III-IV)
|
T stage
(T1-T2/T3-T4)
|
N stage (N0/N1-X)
|
M stage
(M0/M1-X)
|
N
|
|
318/162
|
221/257
|
292/185
|
310/170
|
213/267
|
382/96
|
CCL7
|
t-value
|
0.762
|
1.639
|
2.691
|
2.525
|
0.429
|
2.021
|
|
P-value
|
0.447
|
0.102
|
0.007
|
0.012
|
0.668
|
0.044
|
CHGA
|
t-value
|
0.675
|
3.595
|
3.451
|
3.374
|
0.156
|
1.773
|
|
P-value
|
0.500
|
<0.001
|
<0.001
|
<0.001
|
0.876
|
0.077
|
CMA1
|
t-value
|
1.409
|
4.590
|
6.814
|
5.772
|
0.602
|
5.487
|
|
P-value
|
0.160
|
<0.001
|
<0.001
|
<0.001
|
0.548
|
<0.001
|
CRABP2
|
t-value
|
0.868
|
1.396
|
3.088
|
3.076
|
0.785
|
0.361
|
|
P-value
|
0.386
|
0.163
|
0.002
|
0.002
|
0.433
|
0.718
|
IFNE
|
t-value
|
2.200
|
4.068
|
3.040
|
3.204
|
1.142
|
0.483
|
|
P-value
|
0.028
|
<0.001
|
0.003
|
0.001
|
0.254
|
0.630
|
ISG15
|
t-value
|
1.798
|
3.798
|
4.804
|
3.621
|
1.252
|
3.732
|
|
P-value
|
0.073
|
<0.001
|
<0.001
|
<0.001
|
0.211
|
<0.001
|
NPR3
|
t-value
|
2.240
|
4.513
|
3.996
|
3.713
|
0.369
|
3.714
|
|
P-value
|
0.026
|
<0.001
|
<0.001
|
<0.001
|
0.713
|
<0.001
|
PDIA2
|
t-value
|
0.549
|
3.560
|
5.851
|
4.853
|
1.238
|
4.614
|
|
P-value
|
0.584
|
<0.001
|
<0.001
|
<0.001
|
0.216
|
<0.001
|
PGLYRP2
|
t-value
|
0.281
|
1.465
|
4.133
|
3.278
|
0.744
|
1.501
|
|
P-value
|
0.779
|
0.144
|
<0.001
|
0.001
|
0.457
|
0.134
|
PLA2G2A
|
t-value
|
0.615
|
4.946
|
5.338
|
6.165
|
0.778
|
1.104
|
|
P-value
|
0.539
|
<0.001
|
<0.001
|
<0.001
|
0.437
|
0.270
|
SAA1
|
t-value
|
3.231
|
6.930
|
7.382
|
6.598
|
0.454
|
4.251
|
|
P-value
|
0.001
|
<0.001
|
<0.001
|
<0.001
|
0.650
|
<0.001
|
TEK
|
t-value
|
1.054
|
6.582
|
5.976
|
4.897
|
2.068
|
5.338
|
|
P-value
|
0.292
|
<0.001
|
<0.001
|
<0.001
|
0.039
|
<0.001
|
TGFA
|
t-value
|
1.761
|
3.349
|
3.547
|
2.953
|
1.217
|
5.708
|
|
P-value
|
0.079
|
<0.001
|
<0.001
|
0.003
|
0.224
|
<0.001
|
TNFSF14
|
t-value
|
1.523
|
6.451
|
4.582
|
3.850
|
0.920
|
2.008
|
|
P-value
|
0.128
|
<0.001
|
<0.001
|
<0.001
|
0.358
|
0.045
|
UCN2
|
t-value
|
0.666
|
3.796
|
4.727
|
4.528
|
0.042
|
2.060
|
|
P-value
|
0.506
|
<0.001
|
<0.001
|
<0.001
|
0.966
|
0.040
|
Multi-dimensional regulatory network and functional enrichment analysis of prognostic related IRGs
Using the prognostic model, we studied the upstream mechanism of IRGs in order to further explore their multi-dimensional regulatory network role in tumor genesis and development. After differential analysis of 318 TFs from the Cistrome Project, 79 TFs were identified, including 45 up-regulated TFs and 34 down-regulated TFs (Figure 6A). TF and IRG coexpression analyses were then used to identify regulatory relationships between differentially expressed TFs and prognostic indicators. It turned out that a total of 50 TFs were involved in upstream regulation, as shown in Figure 6B. The regulatory relationships between these TFs and the prognostic related IRGs are shown in Table S2.
After that, we performed GO and KEGG enrichment analyses of these differentially expressed IRGs using the clusterProfiler package in order to explore the biological functions and molecular mechanisms associated with these IRGs. According to GO enrichment analysis, the molecular regulatory network can be viewed from three different perspectives. In the biological process analysis, these differentially expressed IRGs were significantly enriched in the migration of leukocytes, cell chemotaxis, positive regulation of external stimuli, leukocyte chemotaxis, proliferation of leukocytes, migration of myeloid leukocytes, ERK1 and ERK2 cascade activation, migration of granulocytes, granulocyte chemotaxis, and neutrophil chemotaxis (Figure 6C). These differentially expressed IRGs were greatly enriched on the external side of the plasma membrane, the lumens of vesicles, the membrane raft, the membrane microdomain, the secretory granule lumen, the membrane region, the collagen-containing extracellular matrix, the plasma membrane receptor complex, and the alpha granule lumen of platelets (Figure 6C). It was found that these differentially expressed IRGs had significantly higher receptor ligand activities, cytokine receptor binding activity, cytokine activity, growth factor activity, G protein-coupled receptor binding activity, cytokine receptor activity, cytokine binding activity, hormone binding activity, chemokine receptor binding activity, and chemokine activity (Figure 6C). Furthermore, KEGG analysis indicated that the differentially expressed IRGs were mainly enriched in interactions between cytokines and cytokine receptors, viral proteins and cytokine receptors, chemokine signaling pathways, JAK-STAT signaling pathways, PI3K-Akt signaling pathways, MAPK signaling pathways, Ras signaling pathways, and Rap1 signaling pathways.
Construction and validation of a nomogram
To assess the prognostic significance of different clinical parameters and risk scores in ccRCC patients, we first performed the Cox regression analysis. Age (P=0.004), primary tumor location (P<0.001), tumor grade (P<0.001), tumor stage (P<0.001), lymph node infiltration (P=0.031), distant metastasis (P<0.001), and risk score (P<0.001) of ccRCC patients were significantly correlated with OS (Figure 7A). However, multiple regression analysis indicated that age (P=0.042), tumor stage (P<0.001), primary tumor location (P=0.023), and risk score (P<0.001) were independent prognostic factors associated with OS (Figure 7B).
After combining clinical parameters and risk scores, a nomogram was constructed by using the RMS package (Figure 7C). A 0-100 distribution was normalized by mapping the points of each variable to the corresponding horizontal line. The survival rates of ccRCC patients were calculated by drawing vertical lines between the points and total points, which can serve as a reference for clinical decision making. The calibration curve also shows a good correlation between the predicted and actual values (Figures 7D, E, F). To extend its clinical application and availability, the nomogram was also verified in the TCGA and E-MTAB-1980 cohorts. In both cohorts, the nomogram was able to better distinguish between patients with poor prognoses (P<0.001 and P=1.923e-06, Figure 7G, I). Based on the nomogram, the predicted AUCs for 1‐, 3‐ and 5‐ years were 0.897, 0.803, and 0.775 respectively in the TCGA cohort (Figure 7H), while the predicted AUCs in the E-MTAB-1980 cohort were 0.916, 0.919, and 0.904 correspondingly, indicating that the nomogram was very accurate and predictive.
Comparison of IRGs-based prognostic related model with other prognostic models
Using four different types of models, we compared our immune-related model to four-gene models (Gao, Yang et al. 2020), seven-gene models (Wan, Liu et al. 2019), six-gene models (Ren, Wang et al. 2020), and five-gene models (Hua, Chen et al. 2020). For these models, we obtained the genes from the literature and constructed the ROC curve and survival curve for the entire TCGA cohort. Our model is more accurate in predicting the prognosis of ccRCC than the other four models after analyzing and comparing these models (Figure 8).