Identification of differentially expressed apoptosis-related genes
We identified 39 differentially expressed genes(DEGs)between normal tissues and GC tissues. There were 26 up-regulated genes (CASP8, LUM, IFITM3, CCND1, SLC20A1, BIK, CDK2, TAP1, TNFRSF12A, KRT18, BID, EREG, PMAIP1, CASP2, LEF1, PDGFRB, AIFM3, BRCA1, CDC25B, PLPPR4, TIMP1, F2R, ERBB2, TOP2A, BGN, and F2) and 13 down-regulated genes (BTG2, GPX3, GADD45B, IGFBP6, RHOB, GSN, TGFBR3, GSTM1, CAV1, DCN, JUN, CDKN1A, and TXNIP). The expression distribution of DEGs is presented in Fig. 1A-C.
Functional enrichment analysis and protein interaction network
To determine the main biological function of the DEGs and the pathways involved, GO enrichment analysis and KEGG pathway analysis were performed on 39 DEGs. The results of GO enrichment analysis showed that the main biological pathways of DEG enrichment were the extrinsic apoptotic signaling pathway and positive regulation of the cell cycle. The main enriched cell components were the cycle-dependent protein kinase holoenzyme complex and lysosomal lumen. The main enriched molecular function was ubiquitin/ubiquitin-like protein ligase binding (Fig. 2A, B). In addition, KEGG pathway analysis showed that these genes were not only highly enriched in apoptotic pathways, but also involved in the regulation of the p53/PI3K-Akt signaling pathway, platinum drug resistance, Epstein-Barr virus infection and some carcinogenic pathways (Figure 2C, 2D). To further explore the interactions between these ARGs, we PPI analysis using the STRING database and visualized the results using Cytoscape. We finally identified 36 apoptotic genes as hub genes. Among them, 23 genes were up-regulated and 13 genes were down-regulated (Figure 2E).
Association between ARGs and gastric cancer patient survival and prognosis
After univariate Cox regression analysis, we obtained 10 ARGs associated with the prognosis of GC patients (P < 0.05). Among them, 9 genes (CAV1, DCN, F2, F2R, GADD45B, GPX3, IGFBP6, LUM, and PDGFRB) were negatively correlated with the prognosis of GC patients, and one gene (TAP1) was positively correlated with prognosis (Figure 3A). By performing LASSO Cox regression analysis, a 3-gene signature (CAV1, LUM and F2) was constructed according to the optimum λ value (Fig. 3B, C). By comparing the RNA expression and protein expression levels of 3 ARGs between GC tissues and normal tissues, we found that the expression levels of the 3 ARGs in GC samples were significantly increased compared with those in normal samples (Fig. 3D). To further analyse the correlation between the expression of 3 ARG genes and the prognosis of GC patients, we divided the samples into a high expression group (n=152) and a low expression group (n=153) using the median expression level of genes as the critical value. Kaplan–Meier analysis showed that the survival rate of the high expression group was significantly lower than that of the low expression group (Figure 3E). Gene mutations in 3 ARGs were analysed using the cBioPorta database (Figure 3F). A total of 434 patients were enrolled, of whom 46 (11%) had a genetic mutation. Among the 3 ARGs, F2 was the most prone to mutation (5%), while LUM was the gene with the lowest mutation probability but the greatest number of possible mutations (missense mutation, truncating mutation, amplification and deep deletion).
We further compared the expression differences of the 3 ARGs at the protein level between GC and normal tissues. Immunohistochemical results showed that the expression of CAV1, F2 and LUM at the protein level in gastric cancer tissues was higher than that in normal tissues (Fig. 4), which was consistent with our previous results and further verified the prognostic value of the three genes.
Establishment of a prognostic risk signature based on apoptosis-related genes
By combining the expression levels of 3 ARGs with the risk coefficient, the prognostic risk score formula of GC patients was established. Risk score= (0.1457×expression levels of CAV1) + (0.2978×expression levels of F2) + (0.1317×expression levels of LUM). After calculating the risk score of each patient according to the above formula, patients were divided into a high-risk group (n=152) and a low-risk group (n=153) using the median risk score as the critical value (Fig. 5A). The survival distribution of patients (Figure 5B) showed that patients in the high-risk group had more deaths and shorter survival times than those in the low-risk group. Kaplan–Meier (K-M) analysis showed a statistically significant difference in survival between the high-risk and low-risk groups (P < 0.0001) (Figure 5C). The survival rate of the high-risk group was significantly lower than that of the low-risk group, especially in regards to 2- and 3-year survival rates. A ROC curve showed that the prognostic model had good sensitivity and specificity, and the AUC values of the 1-year, 3-year and 5-year survival rates were 0.630, 0.657 and 0.722, respectively (Figure 5D).
Independent prognostic value of the risk model
To determine whether the risk score was an independent predictor of prognosis in patients with GC (P < 0.05), univariate and multivariate Cox regression analyses were performed to compare the risk score and other clinical prognostic variables. The results of univariate Cox regression analysis (Fig. 6A) showed that age, tumour stage, N stage and risk score were correlated with the prognosis of GC patients. Multivariate Cox regression analysis showed that risk score (P < 0.001) and age (P=0.001) were independent predictors of the prognosis of GC patients (Fig. 6B). In addition, ROC curves were used to verify the accuracy of each factor in predicting the prognosis of GC patients. Comparison of the AUC values of each curve, showed that the AUC value of the risk score (AUC=0.631) was significantly higher than that of the clinical variables, indicating that the risk score had higher predictive power and accuracy than the clinical variables (Fig. 6C).
External validation of the risk signature
The risk score of patients in the GEO dataset GSE84426 (n=76) was calculated by the previously established risk scoring formula. According to the median risk score, patients were divided into high-risk (n=38) and low-risk groups (n=38). GSEA was used to identify differences between the high- and low-risk groups stratified by the apoptosis-related signature. The results showed that ECM receptor interaction, the calcium signaling pathway, and focal adhesion were significantly enriched in the high-risk group (Fig. 7A). K-M survival analysis showed that patients in the high-risk group of the external independent cohort also had significantly lower outcomes than those in the low-risk group (P=0.018) (Fig. 7B). The AUC values of the ROC curve (1-year, 3-year and 5-year survival rates: 0.579, 0.605 and 0.650, respectively) further verified the high accuracy of the risk score (Fig. 7C).
Development of a prognostic nomogram
To apply the risk score to clinical practice, we integrated risk scores with clinical variables to construct a nomogram to predict the prognosis of GC patients. The forward stepwise selection method for optimizing AIC applied based on multivariate Cox analysis was used to ultimately identify two variables, risk score and age. According to the actual situation of each patient, the corresponding score of each factor was obtained, and the 1-year, 3-year and 5-year survival rates of the patient were calculated after obtaining total score (Fig. 8A). The results showed that higher final scores were correlated with shorter survival times. Furthermore, the calibration curves for 1-year, 3-year and 5-year survival rates were used to verify the prediction ability of the line graph, and the results showed that the predicted value was in strong agreement with the actual value (C index =0.607) (Fig. 8B-D).