Construction and validation of the gene signature for prognosis prediction in TCGA
Figure 1 shows the gene expression levels. Data from The Cancer Genome Atlas gastric cancer cohort (TCGA-STAD) were collected to train the prognostic model. Univariate Cox regression analysis identified 16 survival-related genes, of which 13 were retained by Lasso-penalized Cox analysis to construct the prognostic model: GSTA2, POLD3, GLA, GGT5, DCK, CKMT2, ASAH1, OPLAH, ME1, ACYP1, NNMT, POLR1A, and RDH12. The risk score was calculated as follows: risk score = 0.0152 * GSTA2 expression - 0.0058 * POLD3 expression - 0.0350 * GLA expression + 0.0092 * GGT5 expression - 0.0088 * DCK expression + 0.0784 * CKMT2 expression + 0.0117 * ASAH1 expression - 0.0105 * OPLAH expression + 0.0244 * ME1 expression - 0.0452 * ACYP1 expression + 0.0035 * NNMT expression - 0.0566 * POLR1A expression + 0.0090 * RDH12 expression.
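The risk score formula above is a weighted sum of expression values, which can be sketched in a few lines of Python. The coefficients are copied from the text; the function name `risk_score` and the dictionary-based input are illustrative assumptions, and the sketch does not address how expression values were normalized, which the text does not state.

```python
# Coefficients as reported in the text for the 13-gene signature.
COEFFICIENTS = {
    "GSTA2": 0.0152,
    "POLD3": -0.0058,
    "GLA": -0.0350,
    "GGT5": 0.0092,
    "DCK": -0.0088,
    "CKMT2": 0.0784,
    "ASAH1": 0.0117,
    "OPLAH": -0.0105,
    "ME1": 0.0244,
    "ACYP1": -0.0452,
    "NNMT": 0.0035,
    "POLR1A": -0.0566,
    "RDH12": 0.0090,
}


def risk_score(expression):
    """Return the weighted sum of expression values for the 13 signature genes.

    `expression` maps gene symbol -> expression value (hypothetical input
    format; units and normalization must match those used to fit the model).
    """
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {', '.join(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


if __name__ == "__main__":
    # Made-up example: every gene has expression 1.0, so the score is
    # simply the sum of the coefficients.
    toy_sample = {gene: 1.0 for gene in COEFFICIENTS}
    print(round(risk_score(toy_sample), 4))
```

Genes with positive coefficients (for example CKMT2) raise the score as their expression increases, whereas genes with negative coefficients (for example POLR1A) lower it.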
All cases were classified into the low- or high-risk group according to the best threshold of the risk score (Figures 2 and 3). In the TCGA cohort, the time-dependent areas under the receiver operating characteristic (ROC) curve (AUCs) for prognosis were 0.695, 0.592, 0.574, 0.572, 0.568, 0.558, 0.536, and 0.520 for risk score, stage, N stage, age, grade, T stage, gender, and M stage, respectively. In addition, the high-risk group had significantly poorer overall survival (OS) than the low-risk group (P<0.001, Figure 4). The prognostic model was then validated in the GSE84437 cohort, in which the AUC values for N stage, T stage, age, risk score, and gender were 0.691, 0.608, 0.574, 0.514, and 0.484, respectively (Figure 4).
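To illustrate what an AUC value measures, the following minimal sketch computes a plain AUC as the probability that a case with the event has a higher score than a case without it. This is a simplification: the analysis above reports time-dependent AUCs, which also account for censoring, and that is not reproduced here. The function names, the toy data, and the threshold value are all hypothetical, and the text does not state how the best threshold was chosen.

```python
def auc(scores, events):
    """Plain (not time-dependent) AUC via pairwise comparison.

    `scores` are predictor values and `events` are 0/1 outcome flags.
    Ties between an event case and a non-event case count as 0.5.
    """
    with_event = [s for s, e in zip(scores, events) if e]
    without_event = [s for s, e in zip(scores, events) if not e]
    if not with_event or not without_event:
        raise ValueError("need at least one case with and one without the event")
    wins = 0.0
    for p in with_event:
        for n in without_event:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(with_event) * len(without_event))


def assign_risk_group(score, threshold):
    """Label a case as high risk when its score exceeds the threshold."""
    return "high" if score > threshold else "low"


if __name__ == "__main__":
    # Made-up scores and outcomes, for illustration only.
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    toy_events = [0, 0, 1, 1]
    print(auc(toy_scores, toy_events))
    # Hypothetical threshold; not the cut-off used in the study.
    print([assign_risk_group(s, 0.38) for s in toy_scores])
```

An AUC of 0.5 corresponds to no discrimination, so values close to 0.5, such as those reported for gender and M stage, indicate little standalone prognostic information.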
Correlations between the prognostic model and clinicopathological features
A total of 813 patients (380 from TCGA and 433 from GSE84437) with complete data on gender, age, and tumor-node-metastasis (TNM) stage were included to compare the hazard ratios (HRs) of the clinicopathological characteristics. In the TCGA dataset, univariate Cox regression analysis revealed that risk score, N stage, T stage, and stage were related to prognosis. However, multivariate Cox regression analysis revealed that only risk score (HR 5.955, 95% CI 3.267-10.855), gender (HR 1.596, 95% CI 1.038-2.452), and age (HR 1.035, 95% CI 1.014-1.056) were associated with prognosis (Figure 5). For the GEO dataset, gender, age, risk score, N stage, and T stage were entered into univariate Cox regression analysis; however, only age (HR 1.025, 95% CI 1.013-1.038), T stage (HR 1.600, 95% CI 1.255-2.039), N stage (HR 1.501, 95% CI 1.275-1.766), and risk score (HR 1.677, 95% CI 1.177-2.390) were retained in the results (Figure 6). Moreover, a nomogram incorporating the prognostic model was constructed (Figure 7).
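The hazard ratios and intervals above can be related to the underlying Cox coefficients with a short sketch. It assumes the intervals are Wald-type, that is HR = exp(beta) with limits exp(beta ± 1.96 × SE); the text does not state how the intervals were computed, so this is an assumption, and the function names are illustrative.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959964


def hr_with_ci(beta, se):
    """Return (HR, lower, upper) for a Cox coefficient and its standard error,
    assuming a Wald-type interval on the log-hazard scale."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


def beta_and_se_from_report(hr, lower, upper):
    """Recover the coefficient and standard error implied by a reported
    HR and 95% CI, under the same Wald-interval assumption."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


if __name__ == "__main__":
    # Risk score in the TCGA multivariate model, as reported in the text.
    beta, se = beta_and_se_from_report(5.955, 3.267, 10.855)
    hr, lower, upper = hr_with_ci(beta, se)
    print(f"beta={beta:.3f}, se={se:.3f}")
    print(f"HR={hr:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

If the reconstructed limits match the reported ones to within rounding, the reported interval is symmetric on the log scale, which is what a Wald-type interval would produce.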
Gene Set Enrichment Analysis (GSEA)
GSEA based on TCGA-STAD identified a total of 31 significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, most of which were associated with metabolism (including purine, pyrimidine, dicarboxylate, and glutamate metabolism) or metabolic disease (such as Huntington's disease). The remaining pathways showed no correlation with metabolism but are frequently deregulated in tumors (Figure 8).