Clinical characteristics of the study populations
The study included 384 patients with a clinical and pathological diagnosis of GC. Of these patients, 247 (64.32%) were male and 137 (35.68%) were female. The median age at diagnosis was 68 years (range, 35–90), and the median RFS was 383 days. The 3-year RFS rate of all patients was 10.4%. Pathologic stage was defined according to the American Joint Committee on Cancer (AJCC) Cancer Staging Manual. Stage ranged from I to IV: 56 (14.58%) patients were in stage I, 119 (30.99%) in stage II, 144 (37.50%) in stage III, 42 (10.94%) in stage IV, and 23 (5.99%) in stage X (X: the stage could not be identified). The GC dataset from TCGA comprised three histological categories: stomach adenocarcinoma (213, 55.47%), stomach-intestinal adenocarcinoma (170, 44.27%), and not available (1, 0.26%). Patients were separated into three groups according to the cancer status of the samples: tumor free (262, 68.23%), with tumor (69, 17.97%), and indeterminate (53, 13.80%). The recorded race categories were Asian, Black or African American, Native Hawaiian or other Pacific Islander, White, and indeterminate; White was the most common (239, 62.24%). The complete list of clinicopathological characteristics of all included patients in the TCGA and GEO databases is detailed in Table 1. The study flowchart is presented in Figure 1.
Gene set enrichment analysis and PPI analysis
Figure 2A and Figure 2B show the top 10 enriched GO terms and the genes correlated with the top 5 GO terms, respectively. Figure 2C and Figure 2D list the top 10 KEGG pathways and the genes correlated with the top 3 enriched KEGG pathways, respectively. Genes with more than 8 interactions in the PPI network were defined as hub genes. On this basis, 6 hub genes were selected: WDR5, TBP, PAX5, MYB, POU5F1, and SMAD3 (Figure 2E). The top 2 core sub-modules of the PPI network were used for functional annotation. Enrichment analysis indicated that the genes in these 2 sub-modules were primarily associated with DNA replication and Wnt signaling (Figure 2F).
Identification of TFs significantly associated with RFS and establishment of prognostic signatures
Univariate Cox regression analysis and LASSO Cox regression analysis were conducted to identify the relationship between the 722 TFs and RFS in patients with GC. After LASSO Cox regression analysis, 28 TFs were significantly correlated with GC patients’ RFS (Figure 3A & 3B). Finally, multivariate Cox analysis identified 14 TFs (NOTCH3, NR5A1, WDR5, RARB, SRCAP, SMAD3, ONECUT1, PITX3, TRAF6, MTA2, JDP2, FOSL1, GLI1, MTF1) that were significantly related to GC patients’ RFS. The risk score was calculated as follows: Risk score = 6e-05*NOTCH3 + 0.00878*NR5A1 - 0.00124*WDR5 + 0.00233*RARB + 1e-04*SRCAP + 0.00025*SMAD3 + 0.00217*ONECUT1 + 0.08996*PITX3 - 0.00186*TRAF6 - 0.00039*MTA2 - 0.00233*JDP2 + 0.00012*FOSL1 + 0.00196*GLI1 - 0.00257*MTF1. This 14-TF signature was used to predict RFS of GC patients. High expression of NOTCH3, NR5A1, RARB, SRCAP, SMAD3, ONECUT1, PITX3, FOSL1 and GLI1 corresponded to a higher risk, whereas low expression of WDR5, TRAF6, MTA2, MTF1 and JDP2 corresponded to a higher risk (Figure 4) (Figure S1).
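The risk-score formula above is a weighted sum of the 14 TF expression values, which can be illustrated with a short Python sketch. The coefficients are copied from the formula in the text. The function name `risk_score`, the dictionary layout, and the example expression values are assumptions made for illustration only, and the expression units are not specified here.

```python
# Coefficients copied from the risk-score formula in the text.
COEFFICIENTS = {
    "NOTCH3": 6e-05,
    "NR5A1": 0.00878,
    "WDR5": -0.00124,
    "RARB": 0.00233,
    "SRCAP": 1e-04,
    "SMAD3": 0.00025,
    "ONECUT1": 0.00217,
    "PITX3": 0.08996,
    "TRAF6": -0.00186,
    "MTA2": -0.00039,
    "JDP2": -0.00233,
    "FOSL1": 0.00012,
    "GLI1": 0.00196,
    "MTF1": -0.00257,
}


def risk_score(expression):
    """Return the sum of coefficient * expression over the 14 TFs.

    `expression` maps gene symbol -> expression value (hypothetical layout).
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical patient: every TF set to 100.0 (illustrative values, not study data).
example = {gene: 100.0 for gene in COEFFICIENTS}
print(round(risk_score(example), 5))
```

Because the coefficients of WDR5, TRAF6, MTA2, JDP2 and MTF1 are negative, raising any of these values lowers the score, which matches the direction of effect described above.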
Relationship between the 14-TF signature and GC patients’ RFS in the internal validation, external validation, and entire datasets
Kaplan–Meier analysis was applied to compare RFS between the high-score and low-score groups. In the internal validation dataset, RFS was shorter for high-score GC patients than for low-score GC patients (P = 3e-10) (Figure 5A). A similar outcome was observed in the external validation dataset (P = 6e-05) (Figure 5C) and in the entire dataset (P = 1e-13) (Figure 5E).
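For readers unfamiliar with the Kaplan–Meier estimator, the following is a minimal Python sketch of how a recurrence-free survival curve is built for one group. It is not the code used in this study: the function name `kaplan_meier` and the toy follow-up times are hypothetical, and the significance test behind the reported P values is not reproduced.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time per patient (e.g. days)
    events -- 1 if recurrence was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Recurrences observed exactly at time t.
        recurrences = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        # Patients leaving the risk set at time t (recurrence or censoring).
        leaving = sum(1 for ti in times if ti == t)
        if recurrences > 0:
            survival *= 1.0 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data (not study data): four patients, the second one censored at t=2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_curve)
```

Applying the function separately to the high-score and low-score groups gives the two curves that a Kaplan–Meier plot compares.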
Evaluation of the predictive performance of the 14-TF signature using ROC analysis
Time-dependent ROC curves were drawn to assess the predictive power of the 14-TF signature. The AUCs of the 14-TF signature at 1, 3, and 5 years in the internal validation dataset were 0.827, 0.817, and 0.811, respectively (Figure 5B). High predictive power was also observed in the external validation dataset (0.808, 0.907, 0.813) (Figure 5D) and in the entire dataset (0.815, 0.849, 0.801) (Figure 5F). These results suggested that the 14-TF signature was a stable predictor of RFS in GC patients.
Furthermore, patients were ranked by risk score (Figure 6A), and a dot plot of their survival status was drawn (Figure 6B). The outcome implied that the high-risk cohort had a greater mortality rate than the low-risk cohort. A heatmap of the 14 TFs grouped by risk score is presented in Figure 6C and was consistent with the earlier boxplots. A similar result was obtained in GSE26253 (Figure S2). In addition, subgroup analysis was performed according to several clinicopathological factors: age, gender, stage, histologic type, anatomic site and metastasis status. The result demonstrated good predictive power of the 14-TF signature in the majority of subgroups (Figures S3–S7).
Determination of the 14-TF signature-related biological pathways
Patients were assigned to the high- or low-risk cohort according to the median risk score, which served as the cutoff. The top 20 pathways that were more activated in high-risk patients than in low-risk patients are shown in Figure 7A and Table S4. The enriched pathways and the risk score followed the same trend (Figure 7B), suggesting a good correlation between the pathways and the risk score.
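The median-based grouping can be sketched in a few lines of Python. The function name `split_by_median` and the patient identifiers are hypothetical. Placing patients whose score equals the median in the low-risk cohort is an assumption of this sketch, since the text does not state how ties were handled.

```python
from statistics import median


def split_by_median(scores):
    """Assign each patient to 'high' or 'low' risk using the median score as cutoff.

    `scores` maps patient ID -> risk score. Scores equal to the cutoff are
    labeled 'low' (an assumption; the tie rule is not given in the text).
    """
    cutoff = median(scores.values())
    groups = {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}
    return cutoff, groups


# Toy scores (not study data).
cutoff, groups = split_by_median({"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0})
print(cutoff, groups)
```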
Nomogram development
We performed univariate and multivariate Cox regression with the TF-related risk score and several other clinicopathological factors to assess whether the 14-TF signature was an independent prognostic predictor for GC patients. The hazard ratio (HR) from Cox regression analysis showed that the 14-TF signature was significantly associated with the RFS of GC patients (P = 2.60e-13, HR 2.11, 95% CI 1.72-2.57) (Table 2) (Figure 8), implying that the 14-TF signature functioned as an independent prognostic predictor. A nomogram (Figure 9) combining the TF risk score with other clinical factors (P < 0.2 in multivariate Cox analysis) was developed. The relative importance of the 14-TF signature and the clinicopathological factors is shown in Figure 10A. The C-index (0.788, 95% CI: 0.741-0.835), the AUCs (0.865, 0.921, 0.907) (Figure 10B) and the calibration plots (Figure 10C, 10D & 10E) all indicated good performance. In addition, decision curve analysis (DCA) implied that the established nomogram had greater clinical value for predicting RFS in GC patients than the treat-all or treat-none strategies. The benefit was most apparent for the 3-year recurrence risk of GC patients (Figure 10F), suggesting strong robustness of our model.
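The C-index reported for the nomogram measures how often the model ranks patients correctly: among comparable patient pairs, it is the fraction in which the patient with the higher predicted risk recurs first. The following Python sketch of Harrell's C-index illustrates the idea. The function name `concordance_index` and the toy data are hypothetical, and the sketch does not reproduce the confidence interval or the software used in this study.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index for right-censored data.

    times  -- follow-up time per patient
    events -- 1 if recurrence was observed, 0 if censored
    scores -- predicted risk (higher means earlier expected recurrence)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed recurrence
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not study data): the third patient is censored.
toy_c = concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported 0.788.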