FDEGs identification in HCC.
We exhibited the scheme of our study in a flow chart (Fig. 1). A total of 1014 differential expressed genes, including 539 down-regulated and 375 up-regulated genes, were identified in 242 HCC tissues compared with 246 adjacent normal liver tissues from the GSE14520 dataset. Next, we obtained 254 ferroptosis-related genes from the FerrDb database. Then, the overlapping 39 genes between the 1014 differential expressed genes with 254 ferroptosis-related genes were identified as FDEGs.
Establishment of the prognostic seven-gene signature.
The univariate Cox regression and multivariate Cox regression on 39 FDEGs were performed sequentially to identify the independent prognostic genes for RFS. After the Multivariate Cox regression, seven genes were identified to establish a predictive gene signature. These seven genes were mitogen-activated protein kinase 9 (MAPK9), solute carrier family 1 member 4 (SLC1A4), phosphoenolpyruvate carboxykinase 2 (PCK2), acyl-CoA synthetase long-chain family member 3 (ACSL3), stathmin 1(STMN1), cysteine dioxygenase type 1 (CDO1), and chemokine ligand 2 (CXCL2). The risk score = (-0.167) * expression MAPK9 + (-0.086) * expression SLC1A4 + (-0.167) * expression PCK2 + 0.203 * expression ACSL3 + 0.201 * expression STMN1 + (-0.003) * expression CDO1 + 0.109 * expression CXCL2.
Table 1
Multivariate Cox regression analysis of the 7-gene signature
Gene
|
Coef
|
HR
|
Lower 95%CI
|
Upper 95%CI
|
P-Value
|
MAPK9
|
-0.167
|
0.497
|
0.330
|
0.750
|
0.001
|
SLC1A4
|
-0.086
|
0.600
|
0.387
|
0.929
|
0.022
|
PCK2
|
-0.167
|
0.516
|
0.315
|
0.847
|
0.009
|
ACSL3
|
0.203
|
1.784
|
1.105
|
2.879
|
0.018
|
STMN1
|
0.201
|
1.851
|
1.102
|
3.112
|
0.020
|
CDO1
|
-0.003
|
0.593
|
0.379
|
0.926
|
0.022
|
CXCL2
|
0.109
|
1.606
|
1.027
|
2.512
|
0.038
|
HR-hazard ratio; CI-confidence interval. |
Internal validation of the prognostic gene signature.
We calculated the seven-gene-based risk score for each HCC patient in the training set (GSE14520). Next, the 242 patients were separated into high-risk and low-risk groups based on the median Risk score (Fig. 2A). The Kaplan–Meier survival analysis and time-dependent ROC curves were used to evaluate the predictive performance of this gene signature for RFS. The ROC curve revealed that AUCs of 1-year, 3-year, 5-year for RFS were 0.68, 0.64, and 0.61 (Fig. 2B). Furthermore, The Kaplan–Meier survival analysis revealed that patients in the high-risk group exhibited a poorer RFS than patients in the low-risk group (Fig. 2C). In addition, the correlation analysis demonstrated that high-risk score correlated to tumor-node-metastasis (TNM) stage (P=0.020), serum AFP level (P<0.001), alanine aminotransferase (ALT) (0.025), predicted risk metastasis signature (PRMS) (P<0.001), recurrence (P=0.001), and death (P=0.002) (Table 2). Moreover, we explored the independent prognostic value of the gene signature for RFS by the univariate and multivariate Cox regression analyses. We found that gender (P=0.009), PRMS (0.006), tumor size (P=0.045), TNM stage(P<0.001), and high-risk score (P<0.001) were risk factors of recurrence-free survival by performing univariate cox regression analysis. The multivariate Cox regression analysis confirmed that gender (HR (95%CI): 2.092(1.081-4.049); P=0.028), TNM stage (HR (95%CI): 2.608(1.262-3.389); P=0.004), and high-risk score (HR (95%CI): 1.879(1.215-2.906); P=0.005) were independent risk factors for RFS (Fi. 3). These results exhibited a favorite performance of the seven-gene signature for RFS prediction.
Table 2
Correlation between risk score and clinicopathological features of HCC patients for RFS in the GSE14520 HCC cohort.
Characteristics
|
|
N
|
Risk score level
|
X2
|
P-Value
|
Low
|
High
|
Age
|
>55
|
117
|
64
|
53
|
2.370
|
0.124
|
<=55
|
125
|
56
|
69
|
Gender
|
Male
|
211
|
106
|
105
|
0.279
|
0.598
|
Female
|
31
|
14
|
17
|
Main tumor size
|
>5cm
|
88
|
36
|
52
|
0.131
|
0.064
|
<=5cm
|
154
|
84
|
70
|
TNM stage
|
I/II
|
174
|
97
|
77
|
5.400
|
0.020
|
III
|
51
|
19
|
32
|
Serum AFP level
|
>300ng/ml
|
110
|
41
|
69
|
12.393
|
<0.001
|
<=300ng/ml
|
128
|
77
|
51
|
ALT
|
>50U/L
|
100
|
41
|
59
|
5.027
|
0.025
|
<=50U/L
|
142
|
79
|
63
|
Multinodular
|
Yes
|
52
|
22
|
30
|
1.404
|
0.236
|
No
|
190
|
98
|
92
|
Cirrhosis
|
Yes
|
223
|
107
|
116
|
2.926
|
0.087
|
No
|
19
|
13
|
6
|
PRMS
classification
|
High
|
121
|
25
|
96
|
80.997
|
<0.001
|
Low
|
121
|
95
|
26
|
Recurrence
|
Yes
|
136
|
55
|
81
|
10.389
|
0.001
|
No
|
106
|
65
|
41
|
Death
|
Yes
|
96
|
36
|
60
|
9.299
|
0.002
|
No
|
146
|
84
|
62
|
TNM - tumor, node, metastasis, AFP - alpha fetoprotein, ALT- alanine aminotransferase, PRMS-Predicted risk Metastasis Signature. P-Value<0.05 were considered statistically significant. |
Validation of the prognostic gene signature in the TCGA database.
We download the RNA seq and corresponding clinical characteristic data in the TCGA database to validate the performance of the seven-gene signature for RFS prediction. We calculated the seven-gene-based risk score for each HCC patient in the validation set (TCGA HCC cohort). The 372 patients were separated into high-risk and low-risk groups based on the median Risk score (Fig. 4A). The ROC analysis showed that the AUCs of 1-year, 3-year, 5-year for RFS were 0.69, 0.73, and 0.74, respectively (Fig. 4B). Furthermore, consistent with the result of the GSE14520 dataset, the Kaplan–Meier survival analysis revealed that patients in the high-risk group exhibited a poorer RFS than patients in the low-risk group (Fig. 4C). In addition, the correlation analysis demonstrated that high-risk score correlated to tumor grade (P=0.026), preoperative pharmaceutical (P=0.036), T 3/4(P=0.002), lymph node invasion (P=0.001), metastasis (P=0.048), recurrence (P<0.001), and death (P=0.048) (Table 3). Moreover, we explored the independent prognostic value of the gene signature for RFS by the univariate and multivariate Cox regression analyses. We found that preoperative pharmaceutical (P=0.044), pathologic stage (0.003), stage 3/4(P<0.001), lymph node invasion (P=0.040), and high-risk score (P<0.001) were risk factors of recurrence-free survival via univariate Cox regression analysis. The multivariate Cox regression analysis confirmed that T 3/4 (HR (95%CI): 2.056(1.320-3.204); P=0.001), and high-risk score (HR (95%CI): 1.779(1.286-2.462); P=0.001) were independent risk factors for RFS (Fig. 5). Overall, these results validated the good performance of the seven-gene signature for RFS prediction.
Table 3
Correlation between risk score and clinicopathological features of HCC patients for RFS in the TCGA HCC cohort.
Characteristics
|
|
N
|
Risk score level
|
X2
|
P-Value
|
Low
|
High
|
Age
|
>60
|
180
|
83
|
97
|
1.971
|
0.160
|
<=60
|
191
|
102
|
89
|
Gender
|
Male
|
250
|
128
|
122
|
0.546
|
0.460
|
Female
|
121
|
57
|
64
|
Race
|
White
|
192
|
91
|
101
|
0.971
|
0.325
|
Other
|
179
|
94
|
85
|
Tumor grade
|
G1/G2
|
236
|
128
|
108
|
4.959
|
0.026
|
G3/G4
|
135
|
57
|
78
|
Radiation
|
Yes
|
10
|
3
|
7
|
1.600
|
0.203
|
No
|
361
|
182
|
179
|
Pharmaceutical
|
Yes
|
24
|
7
|
17
|
4.397
|
0.036
|
No
|
247
|
178
|
169
|
Pathologic stage
|
I/II
|
266
|
137
|
129
|
1.009
|
0.315
|
III/ IV
|
105
|
48
|
57
|
T
|
T1/T2
|
260
|
143
|
117
|
9.165
|
0.002
|
T3/T4
|
111
|
42
|
69
|
N
|
Yes
|
15
|
1
|
14
|
11.669
|
0.001
|
No
|
156
|
184
|
172
|
M
|
Yes
|
13
|
3
|
10
|
3.867
|
0.048
|
No
|
358
|
182
|
176
|
Recurrence
|
Yes
|
171
|
62
|
109
|
23.496
|
<0.001
|
No
|
200
|
123
|
77
|
Death
|
Yes
|
130
|
55
|
75
|
4.572
|
0.032
|
No
|
241
|
130
|
111
|
Establishment and validation of a predictive nomogram.
We next established a nomogram that integrated all the independent risk factors including gender, risk score, and tumor stages identified from the multivariate Cox regression analysis to predict the RFS of 1-year, 3-year, and 5-year (Fig. 6A). The C-index of the combined nomogram model was 0.872. The calibration curve representing the actual and combined model-predictive RFS of the training set (GSE14520) at 1-year, 3-year, and 5-year exhibited an accurate performance for predicting recurrence of the nomogram (Fig. 6B-D). Furthermore, the ROC analysis showed that the AUCs for 1-year, 3-year, 5-year RFS predictions were 0.824, 0.807, and 0.762, respectively (Fig. 6E). In addition, the DCA curve exhibited the best net benefit of the combined model for 1-year, 3-year, and 5-year RFS prediction compared with the individual predictive factors (Fig. 6F-H). Summarizing, these results demonstrated that the combined model of nomogram exhibited an excellent predictive ability for 1-year, 3-year, and 5-year RFS of HCC patients, which might help the clinical therapy decision.
Genetic alteration of gene signature correlated with poor survival probability.
We queried the genetic alteration of the seven-gene signature in the Liver Hepatocellular Carcinoma cohort (TCGA, Firehose Legacy) of the cBioPortal database. Among 352 HCC patients, 95 patients (27.0%) shown genetic alterations of gene signature (Fig. 7A). In addition, the patients with genetic alteration have poorer overall survival probability (P=2.98e-3, Fig. 7B), disease-free probability (P=0.0274, Fig. 7C), progression-free survival probability (P=0.0474, Fig. 7D), and disease-specific survival probability (P=0.0118, Fig. 7E) than the patients without genetic alterations. We further investigate the protein expression of gene signature between the HCC tissues and adjacent normal liver tissues in the Human Protein Atlas database. In HCC tissues, the expression of MAPK9, SLC1A4, ACSL3, and STMN1 proteins increased, while the expression of PCK2 and CDO1 proteins decreased (Fig. 7F). However, we didn't find the CXCL2 protein expression in the Human Protein Atlas database.
GO, KEGG enrichment analysis and protein-protein interaction (PPI) network constructions of FDEGs.
We performed the GO and KEGG enrichment analysis on the 39 FDEGs to explore the potential mechanism that these genes regulating the tumorigenesis and progression of HCC. GO analysis revealed that in the biological process, the FDEGs significantly enriched in the pathway of response to nutrient levels, response to the metal ion, response to starvation, response to oxidative stress, etc (Fig. 8A). In the cellular component, the FDEGs are mainly enriched in neuron projection cytoplasm, pigment granules, melanosome, etc (Fig. 8B). In the molecular function, the FDEGs are mainly enriched in protein serine kinase activity, decanoate-CoA ligase activity, etc. (Fig. 8C) KEGG analysis showed that the FDEGs significantly enriched in the pathway of Ferroptosis, Hepatocellular carcinoma, MAPK signaling pathway, and other cancer-related pathways (Fig. 8D). We further constructed a PPI network of these FDEGs in the STRING database and visualized it utilizing the Cytoscape software (Fig. 8E).
GSEA enrichment analysis.
We performed the GSEA analysis to further investigate the significant signaling pathway that genes of the high-risk score and low-risk score patients were enriched. We found that the high-risk score group was significantly correlated with pathways of cancers, such as cell cycle (NES=1.5, P<0.001), notch signaling pathway (NES=1.5, P<0.001), pathway in cancer (NES=1.5, P<0.001), VEGF signaling pathway (NES=1.7, P<0.001), etc (Fig. 8F). Meanwhile, the low-risk score group was negatively correlated with the signaling pathway of lysine degradation (NES=-2.2, P<0.001), peroxisome (NES=-2.1, P<0.001), propanoate metabolism (NES=-1.9, P<0.001), etc (Fig. 8G).