ccRCC transcriptome data identified 269 high level differently expressed meanwhile immune related genes in cancer versus normal renal samples
Four GEO profiles GSE53000, GSE53757, GSE68417 and GSE71963 were combine applied to explore the aberrant differently expressed genes in ccRCC comparing to normal renal samples. And in GSE53000, a total of 5559 genes were identified to be differently expressed including 4270 genes with the expression change ≤ 2 fold, 1028 genes that were 2 ~ 4 fold, 180 genes that were 4 ~ 8 fold and 81 genes whose expression were > 8 fold in ccRCC comparing to normal renal samples (Fig. 1A). And in GSE53757, a total of 28021 genes were identified, and the number was 21367, 4857, 1195 and 602 in ≤ 2 fold, 2 ~ 4fold, 4 ~ 8 fold and > 8 fold groups respectively (Fig. 1B). In GSE68417, a total of 10150 genes were identified, and the number was 8276, 1425, 290 and 159 in each group (Fig. 1C). Meanwhile, in GSE71963, a total of 10744 genes were identified, and the number was 6266, 3120, 841 and 517 genes in each group respectively (Fig. 1D, Table S2).
Considering the feasibility of further clinical medical use, we mainly focused on the high differently expressed genes (at least > 4 fold in cancer vs. normal). As a result, besides the genes that were shared in multiple profiles, the analysis of 4 GEO profiles indicated a total of 3095 genes that were high level changed expressed in ccRCC comparing to normal control samples (Fig. 1E). Further, the immune related genes list was obtained from IMMPORT immune database, and Venn diagram analysis result identified 269 genes from the 3095 genes that were both high level expression changed and immune regulation related for next step analysis (Fig. 1F, detailed 269 genes information was listed in Table S3).
PPI network of 269 genes highlighted a 46 genes-containing immune relating gene cluster
The PPI network of 269 differently expressed meanwhile potentially immune related genes was constructed (Fig. 2A), and based on the network we identified three promising gene clusters. The first gene cluster posses the highest computed module score and contains 46 genes with a big portion of them predicted to be related with immune system modulation (Fig. 2B). Meanwhile, the second and third gene modules contain 25 and 29 genes respectively, and genes were mostly related with CXCR4, PI3K, EGF and mTOR related signaling pathways (Fig. 2C, 2D).
Given the first module gene cluster possess the highest score and a big percentage of containing genes were immune system related which shows more potential for further clinical immune indicators selection, the 46 genes in the gene cluster were mainly focused for next step analysis.
Kaplan-Meier combine with Cox regression analysis of cluster genes identified 4 ccRCC prognosis related hub genes
Genes should be of more potential if they were both immune regulation and survival related, and promising immune biomarkers should also be prognosis related for potential clinical medical drug targeting use. To further analyze survival relationship of the 46 selected immune related candidate genes, univariate survival analysis including UALCAN and GEPIA, as well as multivariate Cox regression analysis were in succession performed, and the results supported four genes, namely MMP9 (Fig. 3A), NFKB1 (Fig. 3B), IRF7 (Fig. 3C) and HMOX1 (Fig. 3D) as independent prognostic indicators in ccRCC, all four genes not only relate with patients overall survival but also progress free survival indicating their high potential in clinical medical use (Table 1).
Table 1 Univariate combine with multivariate Cox Regression analysis result of the 4 hub genes used for signature construction
OSCC parameters
|
P Value
|
B value
|
HR
|
95% CI
|
Univariate analysis
|
Multivariate analysis
|
UALCAN
|
GEPIA
|
MMP9
|
0.001
|
0.016
|
0.002
|
0.256
|
1.292
|
1.009-1.519
|
NFKB1
|
0.013
|
2.9E-05
|
0.004
|
-0.216
|
0.805
|
0.696-0.932
|
IRF7
|
<0.0001
|
9E-04
|
0.022
|
0.077
|
1.080
|
1.011-1.154
|
HMOX1
|
0.034
|
0.00062
|
<0.001
|
-0.222
|
0.801
|
0.708-0.906
|
Basic physicochemical properties of the 4 selected hub genes
Basic physiochemical properties of MMP9, NFKB1, IRF7 and HMOX1 were preliminary interpreted before deeper scientific use of them (Table 2). As for MMP9, which is a member of matrix metalloproteinase (MMP) family, locates in 20q13.12 and encodes a protein composed of 707 amino acids with an estimated molecular weight of 78.5KD. The theoretical isoelectric point of the protein is estimated to be 5.69 and instability index to be 41.10, meanwhile, the grand average of hydrophobic value of the protein is -0.394 indicating MMP9 works as a cellular unstable and hydrophilic protein which locates in cellular cytoplasm or to be secreted in the extracellular region, and the related signaling pathways include collagen chain trimerization and apoptotic pathways in synovial fibroblasts (Fig. 3E).
Table 2
Basic physicochemical properties of the 4 hub genes used for gene signature construction
Gene Property
|
MMP9
|
NFKB1
|
IRF7
|
HMOX1
|
Formula
|
C3517H5298N958O1035S28
|
C4643H7343N1271O1458S33
|
C2418H3740N678O710S19
|
C1475H2323N405O427S8
|
Molecular Weight
|
78.46KD
|
105.36KD
|
54.28KD
|
32.82KD
|
Number of amino acids
|
707AA
|
968AA
|
503AA
|
288AA
|
Theoretical pI
|
5.69
|
5.20
|
5.89
|
7.89
|
Aliphatic index
|
65.13
|
84.74
|
72.07
|
83.02
|
Hydrophobic value
|
−0.394
|
−0.339
|
−0.367
|
−0.427
|
Estimated protein half life
|
30h
|
30h
|
30h
|
30h
|
Instability index
|
42.10
|
38.15
|
63.17
|
60.81
|
NFKB1, which is short for Nuclear Factor Kappa B Subunit 1 locates in 4q24 and encodes a protein composed of 968 amino acids and estimated to be weighting 105KD with computed theoretical isoelectric point as 5.20 and instability index as 38.15. Meanwhile, the grand average of hydrophobic value of protein is -0.339 indicating NFKB1 to be cellular stable and hydrophilic. NFKB1 is predicted to locates in nucleoplasm and cytoplasm, it has been reported as a transcription regulator that could be activated by various cellular stimuli such as cytokines, ultraviolet irradiation, and bacterial or viral products. Activated NFKB1 translocates into cell nucleus and stimulates the expression of genes involved in various biological functions (Fig. 3F).
IRF7 is short for Interferon Regulatory Factor 7 and it’s a member of the interferon regulatory factor (IRF) family, locating in 11p15.5 and encoding a protein composed of 503 amino acids including 56 negatively charged amino acid residues (ASP + Glu) and 49 positively charged amino acid residues (Arg + Lys). The estimated protein molecular weight is 54.2KD with theoretical isoelectric point computed to be 5.89. Meanwhile, the estimated instability index of the protein is 63.17 and grand average of hydrophobic value is -0.367, the cellular location of the gene is predicted to be in nucleoplasm or cytoplasm (Fig. 3G).
Meanwhile, HMOX1 is short for Heme Oxygenase1, and locates in 22q12.3, encoding a protein composed of 288 amino acids including 35 negatively charged amino acid residues (ASP + Glu) and 36 positively charged amino acid residues (Arg + Lys). The estimated protein molecular weight is 32.8KD with theoretical isoelectric point computed as 7.89. Moreover, the estimated instability index of the protein is 60.81 and grand average of hydrophobic value is -0.427 indicating HMOX1 works as a cellular unstable and hydrophilic protein which is consistent with the ProtScale analysis result of HMOX1 structure showing that the protein harbors more hydrophilic regions than hydrophobicity regions. HMOX1 is predicted to locates in cellular Golgi apparatus and plasma membrane, it has been reported to be associated with the development of heme oxygenase1 deficiency and pulmonary disease, as well as chronic obstructive (Fig. 3H).
Validation of the changed expression of selected hub genes in ccRCC versus normal renal tissues
Although the four selected hub genes were obtained from the differently expressed gene clusters analyzed based on GEO data from the beginning, after preliminary interpretation of the basic physicochemical information, it’s necessary to validate each of the gene’s aberrant changed expression in ccRCC comparing to normal renal samples individually. In the study, two analysis databases including UALCAN and GEPIA were used, and the results revealed that as for MMP9 and IRF7 genes, they gain of expression in most of human cancers (Fig. 4A, 4C). And as for NFKB1 and HMOX1, their expression vary in different cancers (Fig. 4E, 4G), although in ccRCC, all four genes were indicated to be statistical significantly up regulated in cancers comparing to normal renal tissues (Fig. 4B, 4D, 4F, 4H).
Construction of a 4 genes containing ccRCC prognosis related immune gene signature and clinical features analysis
To maximum the clinical prediction value of the four selected genes, Cox-proportional hazards analysis based on LASSO algorithm was applied to construct a MMP9, NFKB1, IRF7 and HMOX1 four genes containing signature which weights the normalized expression level of each gene to the regression coefficient of multivariate Cox regression analysis. And the result revealed a formula: Risk Score = 0.256 * expression (MMP9) − 0.222 * expression (HMOX1) − 0.216 * expression (NFKB1) + 0.077 * expression (IRF7) as the best signature for combining the four differently expressed meanwhile immune related hub genes (Fig. 5A, 5B).
Based on the signature, the risk score for each patient was calculated followed by the patients being categorized into high-risk or low-risk groups according to the median risk score which was set as the cut off point for the signature (Fig. 5C). Further, the association between the gene signature and ccRCC clinical features was preliminary analyzed, which result revealed that higher risk score was positively related with older age (> 45 years old) and more advanced T, N and M stage, meanwhile, the low risk group of patients were tend to be younger (≤ 45 years old) female with lower TNM stage (Table 3).
Table 3
Association between the gene signature and ccRCC clinical features
Parameters
|
Gene signature
|
P Value
|
Low-risk group
|
High-risk group
|
Gender
|
|
|
|
|
female
|
112 (59.6%)
|
76 (40.4%)
|
0.001
|
male
|
154 (44.6%)
|
191 (55.4%)
|
Age
|
|
|
|
|
≤ 45
|
38 (64.4%)
|
21 (35.6%)
|
0.018
|
> 45
|
228 (48.1%)
|
246 (51.9%)
|
Race
|
|
|
|
|
White
|
234 (50.6%)
|
228 (49.4%)
|
0.697
|
Yellow
|
4 (50.0%)
|
4 (50.0%)
|
Black
|
25 (44.6%)
|
31 (55.4%)
|
T stage
|
|
|
|
|
T1
|
164 (60.01%)
|
109 (39.9%)
|
< 0.001
|
T2
|
31 (44.9%)
|
38 (55.1%)
|
T3
|
69 (38.3%)
|
111 (61.7%)
|
T4
|
2 (18.2%)
|
9 (81.8%)
|
N stage
|
|
|
|
|
N0
|
115 (47.9%)
|
125 (52.1%)
|
0.023
|
N1
|
3 (18.8%)
|
13 (81.3%)
|
M stage
|
|
|
|
|
M0
|
231 (54.7%)
|
191 (45.3%)
|
< 0.001
|
M1
|
22 (27.8%)
|
57 (72.2%)
|
High risk score based on the gene signature indicated worse ccRCC patients prognosis
All four genes used to construct the gene signature were previously supported to be high level differently expressed in ccRCC comparing to normal renal samples, but changed gene expression doesn’t equal to survival association. To validate the survival relationship of the gene signature, a series of methods were used. Firstly, Kaplan-Meier survival analysis revealed that the high risk group of patients had a statistical significantly worse overall survival than their low risk counterparts (Fig. 5D). Then the ROC curve showed that the area under the ROC curve (AUC) of gene signature for overall survival was 0.747 at 1 year, 0.696 at 3 year and 0.705 at 5 years (Fig. 5E). Meanwhile, univariate Kapkan-Meier survival as well as multivariate Cox regression analysis were applied for testing the survival prediction ability of the signature, and the results supported the risk score calculated based on the gene signature works as an independent prognosis indicator for ccRCC patients together with some other well accepted prognosis related clinical parameters including patient T and M stage (Table 4). Further, a nomogram was constructed and and in the nomogram, a point scale was assigned for each variable, the sum of all the variables points equal to the final score of each patient, and the survival could be predicted by drawing a vertical line from the total point axis downward to the outcome axis (Fig. 5F).
Table 4
Survival prediction value of the gene signature included ccRCC clinical parameters
Clinical parameters
|
P Value
|
Exp (B)
|
Univariate analysis
|
Multivariate analysis
|
Age
|
< 0.001
|
0.026
|
1.627 (1.058−2.500)
|
Gender
|
0.693
|
-
|
-
|
Race
|
0.719
|
-
|
-
|
T stage
|
< 0.001
|
0.018
|
1.360 (1.054–1.755)
|
N stage
|
< 0.001
|
0.480
|
1.290 (0.636–2.615)
|
M stage
|
< 0.001
|
< 0.001
|
2.794 (1.728–4.517)
|
Signature risk score
|
< 0.001
|
0.002
|
1.871 (1.299–2.693)
|
High-risk group of ccRCC cases were more enriched in immune related phenotype
After preliminary demonstrating the association between constructed gene signature and ccRCC prognosis, the influence of gene signature on cancer immune profiles was to be investigated. And in the first step, GSEA was utilized to analyze the immune-related biological processes linked to the signature, and the analysis result showed that the high-risk group cases were significantly enriched in multiple biological processes, of which 4 immune-related processes were identified including HUMORAL_IMMUNE_RESPONSE (NES = 1,733, Nominal p value = 0.0), REGULATION_OF_T_CELL_MIGRATION (NES = 1.762, Nominal p value = 0.0), REGULATION_OF_LYMPHOCYTE_CELL_MIGRATION (NES = 1.743, Nominal p value = 0.003), POSITIVE_REGULATION_OF_IMMUNOGLOBULIN_PRODUCTION (NES = 1.754,Nominal p value = 0.0). Meanwhile, the low-risk group cases were not indicated to be enriched in any immune-related biological processes (Fig. 5G).
High-risk and low-risk groups of ccRCC patients revealed disparate ICD expression levels
Besides GSEA immune phenotype enrichment analysis, given the significant roles of ICD in antitumor immunological responses, the connection between gene signature and ICD related genes were evaluated for additionally exploring the immune status in high-risk and low-risk groups of ccRCC patients. And the results revealed that the expression of a large portion of the 32 ICD related genes were statistical significantly different between the two groups of patients indicating the diverse immune status in the microenvironment of two groups of patients (Fig. 6A).
Risk score calculated based on the gene signature associated with ccRCC estimated environment immune score
For further validating the immune association of the gene signature, ESTIMATE was performed to evaluate the immune, stromal score and tumor purity of ccRCC samples. And the result revealed that although no significant correlation was found between the gene signature and ccRCC stromal score, a mediate correlation was revealed between the risk score which was calculated based on the gene signature and tumor immune score as well as tumor purity (Fig. 6B). Meanwhile, the high risk-group patients were tend to posses higher immune score and lower tumor purity, indicating the immune targeting potential of the group of patients (Fig. 6C).
Multi immune checkpoints expressed higher in high risk group of ccRCC patients based on the gene signature
Besides estimation of immune score, the association between the gene signature and clinical promising immune checkpoints including PD-L1, CTLA4, TIGIT, TIM-3 and LAG-3 were evaluated (Fig. 6D). And a median association was revealed between the gene signature and CTLA4 expression. Moreover, mild correlation was indicated between gene signature and two of the immune checkpoints including LAG3 and TIGIT, meanwhile, no significant relation was found between the signature and PD-L1 or TIM-3 expression (Fig. 6E). An inspiring fact was that all CTLA4, LAG3 and TIGIT tend to express higher in high-risk group of patients which was categorized based on the gene signature (Fig. 6F), and the distribution shall be an additional support besides above ESMINATE immune score evaluation result for indicating the immune targeting potential for this group of ccRCC patients.
Evaluation of relationships between gene signature and 22 tumor infiltrating immune cells (TICs)
Previous analysis supported that the constructed gene signature was related to immunity, so we carried out analyses on 22 TICs whose distribution profiles were draw based on CIBERSORT algorithm to further study the interaction between the gene signature and ccRCC immune microenvironment. And the correlation analysis result found four types of TICs to be related with the gene signature including plasma cells, activated CD4(+) T memory cells, activated dendritic cells and resting mast cells (Fig. 7A-7C).
Further, the prognostic abilities of the 22 TICs were tested and the results revealed that of the four signature related TICs, CD4(+) T cell and resting mast cell were able to predict ccRCC patients prognosis. The resting and activated CD4(+) T memory cells played opposite roles in patients survival, namely the activated CD4(+) T memory cells were related with worse patients survival (Fig. 7D), meanwhile, the resting CD4(+) T memory cells predicting better patients survival (Fig. 7E). Also, the resting mast cells were correlated with positive patients prognosis (Fig. 7F).
Combining the analysis results, one inspiring deduction could be draw that CD4(+) T memory cells and resting mast cells not only are significantly related to the gene signature but also predict ccRCC patients prognosis, indicating these immune cells may play important roles in the immune regulation of the gene signature in ccRCC microenvironment.