Establishment of the apoptotic gene signature and risk score calculation
Initially, there were 415 DEGs, including up and down-regulated, in the training cohort (Figure 1a, Additional file 1a), 666 DEGs in the validation cohort (Figure 1b, Additional file 1b), and 1982 apoptosis-related genes from GSEA. Then, 36 AR-DEGs were defined as the intersection of the three sets (Figure 1c) and presented in Additional file 2. Furthermore, AR-DEGs, survival status, and survival time were put into the Lasso-penalized Cox regression model and fourteen apoptotic genes were chosen to conduct the signature when the partial likelihood deviance was minimized (Figure 2a, b). Based on the apoptotic gene signature, the risk score was calculated as follows: risk score=0.0364×expression of SOX4+(-0.2997×expression of PDE1B)+ 0.2101×expression of IVNS1ABP+0.4377×expression of TNFRSF25+(-0.2879×expression of GPLD1)+0.1611×expression of SERPINE1+1.3316×expression of FGF8+0.0890×expression of NRP1+0.1165×expression of UNG+(-0.1386×expression of IL2RB )+ (-0.0364×expression of CTLA4)+ (-0.3726×expression of CD3E) + (-0.2001×expression of FASLG)+ (-0.0560×expression of ATOX1). Moreover, the detailed description and coefficient could be found in Additional file 3.
Identification of the prognostic value of the apoptotic gene signature
Patients were subdivided into the high-risk group and low-risk group with the median risk score. In the training cohort, overall survival was significantly better in the low-risk group than the high-risk group and patients with low-risk scores tended to survive (P-value<0.0001, Figure 2c, Additional file 4). The 3-year AUC and 5-year AUC values were 0.762 and 0.818 respectively, both of which were significant (Figure 2d). As for the validation cohort, the high-risk group also showed greater median survival time than the low-risk group (P-value=0.0028, Figure 2e). And the time-dependent ROC revealed the prognostic accuracy of the risk score was 0.717 at 3 years, and 0.745 at 5 years (Figure 2f). Taken together, these results indeed demonstrated the prognostic value of the apoptotic gene signature in training and validation cohorts.
Subgroup analyses in the clinical characteristics using the signature
The purpose of subgroup analysis was to detect whether the apoptotic gene signature could benefit specific populations. Based on the risk score, patients in the training cohort were stratified according to gender, age, tumor size, cirrhosis, multinodular, AFP, ALT, TNM_stage, and BCLC_stage. The prognostic value of the signature was certified in all of the subgroup analyses (all P-value<0.01, Figure 3a, b,c, d, Additional file 5), except for female groups (Additional file 5d).
Construction and validation of the nomogram
In the training cohort, univariate Cox regression analysis revealed that tumor size, cirrhosis, pathological TNM_stage, BCLC_stage, AFP, and risk score of the apoptotic gene signature were the prognostic factor for overall survival (Figure 4a, Table 1). Then, the significant variables were included in multivariate Cox regression analysis, and only BCLC_stage, cirrhosis, and risk score were still independent prognostic factors (Figure 4b). The forest map and nomogram were plotted built on the three variables. When patients met specific conditions, they received corresponding points and the higher total points got, the worse outcome was (Figure 4c). Moreover, three internal validation groups (70 subjects per group) demonstrated that the nomogram approached an ideal model (Figure 4f,g) and had better predictive performance than cirrhosis or BCLC_stage at 3 years and 5 years with the decision curve analysis (Figure 4d,e).
As for the validation cohort, univariate and multivariate Cox regression analyses indicated that pathological N_stage, M_stage, and risk score were selected to build nomogram (Figure 5a, b, Table 2). Similarly, the nomogram and risk score proved the accuracy and clinical utility for 3-year and 5-year survival prediction (Figure 5c, d, e). And we also generated three internal validation groups (120 subjects per group) randomly to testify the deviations from the ideal model (Figure 5f, g).
In short, the risk score based on the apoptotic signature brought obvious net benefit for prognosis, alone or in combination with classical clinical features.
Potential mechanisms of the apoptotic gene signature
Subsequent exploration of the underlying mechanisms was conducted using PPI, GO, and KEGG analyses. A total of 69 genes encoding proteins were enriched and visualized in Additional file 6. Several genes were previously reported to participate in apoptosis and cancer tumorigenesis, including caspase family, FAS, VEGFA and so on. The cut-off of the combined score was set at 0.4 (high) and the detailed information was shown in Additional file 7. Then, the 69 genes were analyzed from the biological process (BP), cellular component (CC), molecular function (MF), and KEGG pathway perspectives. And genes of the signature mainly mapped to the chromosome and took part in growth factor, death receptor, and tumor necrosis factor receptor binding (Figure 6a). Additionally, some of the signaling pathways were related to multiparous tumors and enriched in analysis, including the PI3K-Akt signaling pathway, p53 signaling pathway, MAPK signaling pathway, and TNF signaling pathway. Interestingly, the Hepatitis B pathway was also associated with the genes of the apoptotic signature (Figure 6b).
Extra validation of the 14 genes of the apoptotic signature
In the GEPIA2 database, the mRNA expression of the 14 apoptotic genes was concordant with our previous study. The overexpressed genes included SOX4, IVNS1ABP, TNFRSF25, CTLA4, CD3E, FASLG, FGF8, ATOX1, NRP1, and UNG; while the down-regulated ones were PDE1B, GPLD1, SERPINE1, and IL2RB (Additional file 8). Moreover, the protein level expression of IVNS1ABP, NRP1, SOX4, and UNG was significantly increased in immunohistochemistry (Figure 7a). Unfortunately, the remaining 10 genes were not recorded in the protein atlas. In the cBioportal database, FASLG possessed the most frequent genetic alterations (8%), followed by IVNS1ABP (7%), GPLD1 (3%), SOX4 (2.5%) and TNFRSF25 (2.5%). The most frequent alteration type was amplification and deep deletion (Figure 7b). Notably, FALSG would be up-regulated in the primary tumor with metastasis than those without metastasis in HCC patients. And this suggested that FALSG might play a vital role in the neoplasm metastasis according to the HCMDB database (Additional file 8o).