Groups and evaluation of tumor-infiltrating immune
The immune cell infiltration status was assessed by applying the ssGSEA approach to each tumor sample of TCGA STAD cohort. As shown in Figure 2, 375 tumor samples were distinctly divided into two clusters, including Immunity-H (n = 192) and Immunity-L (n = 183) groups based on the landscape of 28 immune cell subpopulations infiltrations in STAD. Detailed results from ssGSEA were presented in Table S1. Next, we calculated the stromal score, immune score and estimate score using ESTIMATE method (Table S2). As depicted in Figure 3A, there were significant differences in stromal score, immune score and estimate score between Immunity-H and Immunity-L groups, of which corresponding scores in Immunity-H group were notably higher than those in Immunity-L group. Moreover, the results from CIBERSORT algorithm on immune cell type showed that the fraction of some important immune cell subtypes varied distinctly between Immunity-H and Immunity-L groups (Figure 3B), which were summarized in Table S3. Collectively, the Immunity-H and L groupings obtained based on ssGSEA evaluation can be used for subsequent analysis.
Identification of DEIIGs
We screened the DEGs between tumor and normal samples, and Immunity-H and Immunity-L samples in TCGA dataset. The volcano plots were drawn to visualize DEGs between tumor and normal group, and Immunity-H and Immunity-L group (Figure 4A). A total of 894 DEGs between tumor and normal group, and 592 DEGs between Immunity-H and Immunity-L group were screened, which were listed in Table S4. Moreover, the number of overlapping DEIIGs was 191, including 6 lncRNAs and 185 mRNAs (Figure 4B, Table S5).
GO function and KEGG pathway analyses
GO function and KEGG pathway enrichment analyses were performed for 185 DEIIGs. These DEIIGs were significantly enriched in 11 biological processes, such as retinoid metabolic process, cell-cell signaling, collagen catabolic process, potassium ion transport and potassium ion transmembrane transport (Table 1, Figure 5). The KEGG pathway analyses showed that the DEIIGs were mainly concentrated in transcriptional misregulation in cancer, vitamin digestion and absorption, fat digestion and absorption, protein digestion and absorption, and gastric acid secretion (Table 1, Figure 5).
Construction of the DEIIG prognostic signature
Univariate Cox regression analysis was performed for the 191 DEIIGs, of which 32 DEIIGs showed significant prognostic potential (log-rank p value < 0.05). Next, total 13 independent prognostic DEIIGs were further screened via a multivariate Cox regression model. After that, we performed the LASSO Cox regression analysis to reduce the number of independent prognosis related DEIIGs with the optimal lambda and finally obtained nine prognostic DEIIG signatures with corresponding coefficients for further study (Table 2).
Evaluation of the prognostic performance of DEIIG signature
According to the risk coefficient of each gene and the gene expression level, the PS of each patient in training dataset was calculated, which is a linear combination of the expression level of each gene weighted by its multivariate LASSO regression coefficient. The samples in training dataset were assigned into high-risk and low risk groups with the median PS as the cutoff value. The survival analysis indicated that the survival rate was remarkably lower in the high-risk group as opposed to low-risk group (p-value < 0.001, HR = 2.230, 95% CI = 1.583-3.142); whereas, the ROC curve analysis showed acceptable discrimination with AUC of 0.824, and high sensitivity and specificity in training dataset (Figure 6A). In addition, the external dataset GSE62254 was used to validate the prediction performance of the nine prognostic DEIIG signature. With the aforementioned formula, we calculated individual PS and classified the patients in validation dataset into high-risk and low-risk groups. Consistently, a significant separation was shown in the KM survival curve in validation dataset (p-value < 0.05, HR = 1.405, 95% CI = 1.020-1.935) and ROC curve analysis demonstrated accepted discrimination with an AUC of 0.766 (Figure 6B). In general, the nine prognostic DEIIG signature performed well at predicting OS of STAD.
Identification of independent prognostic parameters of STAD
Total 351 patients from the TCGA STAD dataset for which complete clinical information was provided, including age, gender, pathologic-M, pathologic-N, pathologic-T, pathologic-stage, neoplasm histologic grade and radiation therapy were included in the univariate and multivariate Cox regression analyses (Table S6). As shown in Table 3, univariate analysis revealed that age (p = 6.91E-03), pathologic M (p = 1.13E-02), pathologic N (p = 1.71E-03), pathologic T (p = 8.09E-03), pathologic stage (p = 1.68E-05), radiation therapy (p = 2.29E-04), recurrence (p = 3.44E-06) and PS model status (p = 2.56E-06) were significantly correlated with overall survival of STAD patients. Multivariate analysis further screened that age (p = 4.17E-03), pathologic T (p = 3.19E-02), radiation therapy (p = 1.05E-02), recurrence (p = 6.33E-04) and PS model status (p = 2.72E-03) were independent risk factors of overall survival. The results from forest map clearly described that age, pathologic T, tumor recurrence and PS model status were tumor risk factors, while radiotherapy was tumor protective factor (Figure 7). Subsequently, we performed ROC analyses to assess how these independent risk factors could behave in predicting prognosis. As shown in Figure 8, the AUC of PS model status performed on overall survival in the training cohort was 0.824, which was superior to those of age (0.545), pathologic T (0.537), radiotherapy (0.544) and recurrence (0.640), which may be the best performance in predicting overall survival.
Correlation of PS with the proportion of TICs
Based on the expression levels of TCGA STAD samples, we used TIMER to analyze the proportion of six kinds of TICs (Table S7). Combining the results of correlation analysis (Figure 9), B cell, T cell CD4+, neutrophil, macrophage and myeloid dendritic cell were positively correlated with PS, whereas T cell CD8+ was negatively correlated with PS. Thus, the significant infiltration with these TICs may potential act as one of the critical factors that the nine DEIIG signature holds to influence the outcome of STAD pronounced.
Validation of on DEIIG signature in clinical specimens
As described above, we have identified nine-DEIIG prognostic signature baed on the TCGA database. To further verify our findings, 59 cases of STAD specimens were collected and performed with quantitative real time PCR. As expected, ADH4 was downregulated in tumor tissues compared with adjacent tissues (Figure 10A). Clinical analysis further demonstrated that decreased ADH4 was associated with TNM stage, lymph node metastasis (Table 4), and represented an independent risk factor for overall survival (Table 5, Figure 10B). Conversely, ANGPT2 was upregulated in tumor tissues compared with adjacent tissues (Figure 10C), which was correlated with TNM stage (Table 6) and worse prognosis (Table 7, Figure 10D).