Construction of immune signature
In order to identify gastric cancer specific immune related gene, we first perform differential gene analysis by DEseq2 R package. There are 4567 differentially expressed genes compared with adjacent normal tissue. Then, we check the immune gene list among all the three cohorts to make sure they express in all gastric cancer samples. A total of 608 immune-related genes were identified as differential expressed in gastric cancer tissues (Figure 1A). The univariate analysis was applied with all the 608 immune related genes for TCGA STAD datasets. There were 104 most remarkable genes filtered out with the criteria of a p-value less than 0.05. Multivariate Cox analysis were performed on the 104 immune related genes to generate the best gene signature in terms of predicting the overall survival of gastric cancer patients and 47 genes were eventually selected to develop a prognostic mode. The risk score of each patient was calculated based on the sum of the average expression of genes multiplied corresponding coefficients in multivariate Cox analysis, gene list and the coefficient values were showed in Table S1. Then all the 862 stomach cancer patients in three cohorts were classified into a low-risk group and a high-risk group in terms of the median risk score. The overview of survival time and expression heatmap of genes in our model were presented in Figure 1B-D.
Validation of Immune-related Prognostic Signature
To sustain our immune gene signature, we calculated the c-index for the forecast of overall survival. The c-index for TCGA dataset, GSE62254 and GSE15459 were 0.7754, 0.7082 and 0.7742 respectively (Figure 2A), which meant the high predictive veracity of the signature for survival. The area under the ROC curve for 5-year OS in TCGA dataset, GSE62254 and GSE15459 were 0.730, 0.733 and 0.798 respectively, indicating that this prognostic model presented a wonderful sensitivity and specificity (Figure 2B-D). Survival analysis presented that patients of high-risk showed dramatically poorer overall survival than those of low-risk in all the three datasets (P < 0.001; Figure 3A-C).
Independence of Immune-related Gene Signature from Other Clinical Features
A total of 862 gastric cancer patients with clinical information, including gender, age, tumor stage, TNM classification were selected for analysis. We further analyzed 300 patients of GSE62254 dataset with more detailed data, such as recurrence, MLH1 immunohistochemistry (IHC) and EBV in-situ hybridization (ISH). The univariate Cox analysis represented that stage III( p<0.05), stage IV (p<0.005), M1 (p=0.0157), N1-N3 (p<0.05), MLH1 IHC (p=0.0012), recurrence (p<0.0001) and risk score (p<0.0001) were significantly associated with overall survival (Figure 4). Multivariate Cox analysis further indicated that our immune related gene signature could be independent predictor of patients’ overall survival after regulated by other clinical factors in TCGA dataset [Hazard ratio (HR) =1.2865, 95% confidence intervals (95% CI) 1.2236-1.3526, p<0.0001], GSE62254 dataset (HR=1.3200, 95% CI 1.1965-1.45, p<0.0001) and GSE15459 dataset (HR=1.0240, 95% CI 1.0132-1.0350, p<0.0001), as demonstrated in Figure 5.
Association of Risk Score with Immune Cell Infiltration
As shown in Figure 6A, infiltration level of eosinophils, macrophages, mast cells, Natural killer cells (NK cells), Plasmacytoid dendritic cells (pDC), gamma delta T cells (Tgd cells) and angiogenesis related immune cells in high-risk group were higher than low-risk group, whereas infiltration level of T helper cells and Th2 cells in low-risk group were higher than high-risk group.
In addition, the relationships between immune cell infiltration and the risk score were analyzed by Pearson correlation. Eosinophils, macrophages, mast cells, Natural killer cells (NK cells), Plasmacytoid dendritic cells (pDC), gamma delta T cells (Tgd cells) and angiogenesis related immune cells were positive correlated with risk score. However, T helper cells and Th2 cells were negative correlated with risk score (Figure 6B).
Association of Risk Score with Tumor Mutation Burden and Neoantigen
Therapeutic efficiency of immune checkpoint inhibitor could be estimated with nonsynonymous mutation burden20, 21. Thus, we detected the association of our immune signature and each type of nonsynonymous mutation. Furthermore, we would like to explore whether our immune signature could be an predictor of reaction to immune checkpoint inhibitor in terms of the number of neoantigen. In general, patients with low risk score revealed higher nonsynonymous mutation load than those with high risk score (p=0.012, Figure7A). In addition, low-risk group patients had higher missense mutation (p=0.011, Fig. 7B), splice site mutation (p = 0.015, Fig. 7D) and Single Nucleotide Polymorphism (SNP) (p=0.012, Figure 7I) .We did not find this relationship in nonsense mutation (p = 0.052, Fig. 7C), frameshift mutation (Figure 7E and F), inframe deletion (p = 0.053, Figure 7G), inframe insertion (p=0.45, Figure 7H), total deletion mutation (p=0.075, Figure 7J), and total insertion mutation (p=0.37, Figure 7K). Moreover, we found there be a positive correlation between the signature and the number of neoantigens in TCGA STAD dataset (Figure 7L).