LASSO regression with 20-fold cross-validation was used to build the risk score models for prognosis prediction in the training cohort (Fig. 3). Ten lncRNAs related to DFS and ten lncRNAs related to OS were selected at the lambda.min value. The expression levels of these prognostic lncRNAs, weighted by their regression coefficients, were combined into a DFS signature and an OS signature, respectively.
The disease-free survival risk score = (0.13915 * AC009005.2) + (0.14837 * CASC9) + (0.26053 * CTC_338M12.5) + (0.13069 * DYNLL1_AS1) + (-0.06954 * LINC01018) + (0.08498 * PRRT3_AS1) + (0.13913 * RP11_488L18.10) + (0.22265 * RP11_739N20.2) + (0.19098 * RP11_977G19.5) + (0.06799 * WAC_AS1).
The overall survival risk score = (0.10825 * AC007405.6) + (-0.12949 * CTC_297N7.9) + (0.17571 * CTD_2510F5.4) + (-0.07341 * F11_AS1) + (-0.00409 * LINC00152) + (0.31658 * LINC01138) + (0.20993 * PCAT6) + (0.21144 * PRRT3_AS1) + (0.25757 * RP11_307C12.11) + (0.31922 * RP11_479G22.8).
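As an illustration of how the two formulas are applied, the following minimal Python sketch computes a risk score as the coefficient-weighted sum of lncRNA expression values. The coefficients are copied from the formulas above; the function name `risk_score` and the example expression profile are hypothetical, and the expression scale (normalization, log transform) is not specified here and is left to the caller.

```python
# Coefficients copied from the DFS and OS risk score formulas in the text.
DFS_COEFS = {
    "AC009005.2": 0.13915, "CASC9": 0.14837, "CTC_338M12.5": 0.26053,
    "DYNLL1_AS1": 0.13069, "LINC01018": -0.06954, "PRRT3_AS1": 0.08498,
    "RP11_488L18.10": 0.13913, "RP11_739N20.2": 0.22265,
    "RP11_977G19.5": 0.19098, "WAC_AS1": 0.06799,
}
OS_COEFS = {
    "AC007405.6": 0.10825, "CTC_297N7.9": -0.12949, "CTD_2510F5.4": 0.17571,
    "F11_AS1": -0.07341, "LINC00152": -0.00409, "LINC01138": 0.31658,
    "PCAT6": 0.20993, "PRRT3_AS1": 0.21144, "RP11_307C12.11": 0.25757,
    "RP11_479G22.8": 0.31922,
}


def risk_score(expression, coefs):
    """Return the sum of coefficient * expression over the signature lncRNAs.

    `expression` maps lncRNA name -> expression level for one patient.
    Hypothetical helper, not the authors' code.
    """
    missing = [name for name in coefs if name not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coefs[name] * expression[name] for name in coefs)


# Hypothetical patient whose signature lncRNAs all have expression 1.0,
# so each score reduces to the sum of the coefficients.
unit_dfs_profile = {name: 1.0 for name in DFS_COEFS}
unit_os_profile = {name: 1.0 for name in OS_COEFS}
dfs_score = risk_score(unit_dfs_profile, DFS_COEFS)
os_score = risk_score(unit_os_profile, OS_COEFS)
```

Note that PRRT3_AS1 appears in both signatures with different coefficients, so the two dictionaries are kept separate.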
Validation and confirmation of the prognostic performance of the lncRNA signatures
These formulas were used to calculate the risk score of each patient. All patients were classified into high-risk and low-risk groups according to the cutoff value. Kaplan-Meier curves showed that patients with high risk scores had shorter DFS (P < 0.0001) and OS (P < 0.0001) than those with low risk scores in the training set. Similar results were observed in the validation and entire sets (Fig. 4).
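The grouping and survival estimation described above can be sketched as follows. This is a simplified illustration, not the analysis pipeline used in the study: the cutoff is taken to be the median risk score purely as an assumption (the text does not state how the cutoff was chosen), the helper names are hypothetical, and the toy data are invented for the example.

```python
import statistics


def split_by_cutoff(scores, cutoff):
    """Label each risk score 'high' if above the cutoff, else 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(time, survival)] at each event time.

    `events` holds 1 for an observed event and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data (hypothetical): four patients with risk scores and follow-up.
scores = [0.2, 1.5, 0.9, 2.1]
cutoff = statistics.median(scores)  # assumption: median used as cutoff
groups = split_by_cutoff(scores, cutoff)
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

In practice the curves for the two groups would be estimated separately and compared with a log-rank test, which is omitted here for brevity.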
In addition, we compared the clinicopathological characteristics of the high-risk and low-risk groups in the training, validation and entire sets using the chi-square test. Detailed information is shown in Table 2. For the DFS-related group, the distribution of grade differed significantly between the high-risk and low-risk groups in the training set (P = 0.002). In the validation set, there were statistically significant differences in pT (P = 0.008), stage (P = 0.006) and grade (P = 0.004) between the high-risk and low-risk groups. For the OS-related group, similar results were obtained in the entire set: there were significant differences in pT (P = 0.002), stage (P = 0.004) and grade (P < 0.001) between the high-risk and low-risk groups. The proportions of patients with pT3-pT4, stage III-IV and grade 3–4 disease were higher in the high-risk group than in the low-risk group. Conversely, in both the DFS-related and OS-related cohorts, patients with pT3-pT4, stage III-IV and grade 3–4 disease had higher risk scores than patients with pT1-pT2, stage I-II and grade 1–2 disease in the entire set (Fig. 5).
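For a binary characteristic such as grade 1–2 versus grade 3–4, the chi-square comparison between risk groups reduces to a 2 x 2 contingency table. The sketch below computes the Pearson chi-square statistic and its P value with one degree of freedom. It assumes no continuity correction (the text does not say whether one was applied), the function name is hypothetical, and the counts are invented for illustration rather than taken from Table 2.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test for a 2 x 2 table, without continuity correction.

    `table` is [[a, b], [c, d]], e.g. rows = risk group, columns = grade.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = low-risk / high-risk, columns = grade 1-2 / 3-4.
stat, p_value = chi_square_2x2([[10, 20], [20, 10]])
```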
Univariate and multivariate Cox regression analyses were performed on the lncRNA-based signature and the clinicopathological characteristics. The results are presented in Fig. 6. The lncRNA-based signature was an independent predictive factor of DFS and OS in the training and entire sets (P < 0.05). In the univariate analyses, pT and tumor stage were risk factors for both DFS and OS of hepatocellular carcinoma patients in the training, validation and entire cohorts (P < 0.05). Pathological metastasis was also a significant risk factor for DFS and OS in the univariate regression (P < 0.05).
The performance of the 10-lncRNA signatures was evaluated by the area under the receiver operating characteristic (ROC) curve. The area under the curve (AUC) values for 1-, 3- and 5-year DFS in the entire set were 0.73, 0.69 and 0.68, respectively (Fig. 7A). The AUC values for 1-, 3- and 5-year OS in the entire set were 0.76, 0.72 and 0.71, respectively (Fig. 7B). These AUC values indicated good performance of the lncRNA-based signatures. To explore the prognostic value of the lncRNA-based signature, we compared its AUC with that of the American Joint Committee on Cancer (AJCC) 8th edition TNM stage (Fig. 8). The AUC of TNM stage for DFS was 0.67, lower than that of the lncRNA-based signature. However, the AUC of TNM stage combined with the lncRNA-based signature was 0.79, higher than that of either TNM stage or the lncRNA-based signature alone. Similar results were obtained for OS, which suggested that combining TNM stage with the lncRNA-based signature could improve prognostic prediction.
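To make the AUC values concrete, the sketch below computes an AUC as the probability that a randomly chosen patient with an event has a higher risk score than a randomly chosen patient without one, counting ties as one half. This is a deliberate simplification: it treats the outcome at a fixed horizon (for example, 1 year) as a plain binary label and ignores censoring, whereas time-dependent ROC methods handle censored follow-up. The function name and the toy data are hypothetical.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positive and negative cases.

    `labels` holds 1 for patients with the event by the chosen horizon
    and 0 otherwise; censoring is ignored in this simplified version.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(positives) * len(negatives))


# Toy example (hypothetical): three of four positive-negative pairs are
# ranked correctly by the risk score.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the values reported above should be read.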
Furthermore, to analyze the clinical applicability of the lncRNA-based signature, we divided all patients into subgroups by combining TNM stage with the lncRNA-based risk group. Kaplan-Meier curves were plotted for these subgroups (Fig. 9). The survival curves indicated that stage I/II patients in the low-risk group had better survival than those in the high-risk group (P < 0.0001), and that stage III/IV patients in the high-risk group had worse survival than those in the low-risk group (P < 0.0001).
Functional enrichment analyses
First, 3212 and 250 target genes co-expressed with the DFS-related and OS-related lncRNAs, respectively, were extracted from the RAID 2.0 database. These significantly correlated genes were used for GO and KEGG enrichment analyses to explore the potential mechanisms by which the DFS- and OS-related lncRNAs regulate HCC (Fig. 10). GO enrichment analysis found that the target genes of the DFS-related DElncRNAs were mainly enriched in organelle fission, nuclear division, mitotic nuclear division and chromosome segregation. For the OS-related DElncRNAs, the most enriched functions were regulation of mRNA metabolic process, regulation of RNA stability and pri-miRNA transcription by RNA polymerase II. KEGG analysis showed that the target genes of the DFS-related DElncRNAs were significantly enriched in multiple pathways, including the cell cycle, p53 signaling pathway, tumor necrosis factor signaling pathway and MAPK signaling pathway. The KEGG analysis of the OS-related DElncRNAs revealed that their target genes were involved in transcriptional misregulation in cancer, human T-cell leukemia virus 1 infection, the interleukin-17 signaling pathway, the cell cycle and the transforming growth factor-beta signaling pathway.
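GO and KEGG over-representation analyses are commonly based on a hypergeometric test, which asks whether a term is annotated to more of the target genes than expected by chance. The text does not state which tool or test was used, so the sketch below is only an assumed illustration of that general approach; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, drawn, overlap):
    """One-sided hypergeometric P value, P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated to the term or pathway
    drawn:      number of target genes tested
    overlap:    target genes annotated to the term or pathway
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical small example: 10 background genes, 5 annotated to a term,
# 5 target genes, all 5 of which carry the annotation.
p_small = enrichment_p_value(population=10, annotated=5, drawn=5, overlap=5)
```

With many terms tested at once, the resulting P values would normally be adjusted for multiple comparisons before terms are reported as enriched.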