3.1 Characteristics of patients in the GEO and TCGA cohorts
The overall workflow of the study is summarized in the flowchart in Figure 1. Clinical information for each patient in the GEO and TCGA cohorts is provided in Supplementary Tables 1 and 2. After deconvolution with CIBERSORT, samples with a p-value > 0.05 were discarded. This left 100 non-tumor samples and 299 tumor samples in the GEO cohort, and 13 non-tumor samples and 166 tumor samples in the TCGA cohort. The clinical characteristics of these patients, including age, gender, pathologic stage and TNM stage, are summarized in Table 1.
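The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the CIBERSORT results have already been read into a list of rows with a "P-value" column, and the sample identifiers and values are made up.

```python
def filter_cibersort(rows, alpha=0.05):
    """Keep samples whose deconvolution p-value is not greater than alpha.

    `rows` is a list of dicts with a "P-value" entry (assumed column name).
    Samples with p > alpha are discarded, as described in the text.
    """
    return [row for row in rows if float(row["P-value"]) <= alpha]


# Hypothetical example rows; identifiers and p-values are illustrative only.
rows = [
    {"sample": "S1", "P-value": "0.01"},
    {"sample": "S2", "P-value": "0.20"},
    {"sample": "S3", "P-value": "0.05"},
]

kept = filter_cibersort(rows)
kept_ids = [row["sample"] for row in kept]
print(kept_ids)
```

Because the text discards only samples with p > 0.05, a sample with p exactly equal to 0.05 is retained in this sketch.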
3.2 Heterogeneity of TIIC distribution in normal and tumor samples
The proportions of the 22 TIICs in the GEO and TCGA cohorts are summarized as histograms (Fig. 2A, 2B). Although the distribution of TIICs differed between the two cohorts, the fraction of T cells CD4 naive was almost negligible in both. We therefore analyzed the differences in the other 21 TIICs between tumor and normal tissues in the two cohorts. As shown in Table 2 and Table 3, four TIICs (T cells CD4 memory resting, macrophages M2, plasma cells and mast cells resting) accounted for more than half of the TIICs in the normal samples of the GEO cohort, while plasma cells, T cells CD4 memory resting and T cells CD8 accounted for more than half of the TIICs in the normal samples of the TCGA cohort. In tumor samples, macrophages M2, T cells CD4 memory resting, plasma cells, macrophages M1 and neutrophils accounted for more than half of the TIICs in the GEO cohort, while T cells CD4 memory resting, T cells CD8, macrophages M0, macrophages M2 and macrophages M1 accounted for more than half of the TIICs in the TCGA cohort. Figures 2C and 2D show the differences in TIIC infiltration between normal and tumor samples as violin plots, compared with the Wilcoxon signed-rank test; p < 0.001 was considered significant. In the GEO cohort, 15 TIICs differed significantly between normal and tumor samples; of these, plasma cells, macrophages M0, macrophages M1 and macrophages M2 also differed significantly in the TCGA cohort. In both cohorts, the fractions of macrophages M0 and macrophages M1 were significantly higher in tumor samples than in normal samples, while the fraction of plasma cells was significantly lower. Notably, the comparison of macrophages M2 gave contradictory results in the two cohorts: compared with tumor samples, the fraction of macrophages M2 was higher in normal samples of the GEO cohort (18.37% ± 5.94%), but lower in normal samples of the TCGA cohort (6.02% ± 2.45%).
Figures 2E and 2F show the infiltration of the 21 TIICs in samples of the GEO and TCGA cohorts as heatmaps. The composition of TIICs varied widely both within and between the normal and tumor groups. This heterogeneity of TIICs in GC tissues may therefore influence different pathological processes and lead to different clinical outcomes.
Figure 2G shows the pairwise correlations between TIICs in tumor samples of the GEO and TCGA cohorts. Red represents a positive correlation and blue a negative correlation. The correlation coefficient (R value) is shown in the box where each pair of TIICs intersects, and the size of the bubble indicates the strength of the correlation. The two most positively correlated TIICs in the GEO cohort were activated CD4+ T cells and macrophages M1, with an R value of 0.65 (R = 0.42 in the TCGA cohort), while in the TCGA cohort they were neutrophils and activated mast cells, with an R value of 0.46 (R = 0.39 in the GEO cohort). In contrast, the two most negatively correlated TIICs in the GEO cohort were resting CD4 memory T cells and activated CD4 memory T cells, with an R value of -0.58 (R = -0.49 in the TCGA cohort), while in the TCGA cohort they were CD8 T cells and resting CD4 memory T cells, with an R value of -0.52 (R = -0.53 in the GEO cohort).
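The pairwise correlation analysis above can be illustrated with a short sketch. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the fractions below are invented toy values rather than data from either cohort.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_pairs(fractions):
    """Return {(cell_a, cell_b): r} for every unordered pair of cell types."""
    names = sorted(fractions)
    pairs = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs[(a, b)] = pearson_r(fractions[a], fractions[b])
    return pairs


# Toy per-sample fractions (hypothetical values, four samples per cell type).
fractions = {
    "T cells CD4 memory resting": [0.30, 0.20, 0.10, 0.25],
    "T cells CD4 memory activated": [0.02, 0.06, 0.10, 0.04],
    "Macrophages M1": [0.03, 0.05, 0.09, 0.04],
}

pairs = correlation_pairs(fractions)
most_positive = max(pairs, key=pairs.get)
most_negative = min(pairs, key=pairs.get)
print(most_positive, most_negative)
```

Ranking the resulting dictionary by value gives the most positively and most negatively correlated pairs, which is the comparison reported for each cohort.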
3.3 Correlation of TIIC infiltration with clinical survival rate
We next explored the effect of each TIIC, as a single factor, on the survival of GC patients using the clinical information provided for the GEO and TCGA cohorts; the results are summarized in Supplemental Figure S1. Among the 21 TIICs, activated CD4 memory T cells and resting CD4 memory T cells were closely associated with survival in both cohorts. As shown in Fig. 3A-3D, tumors with a high density of activated CD4 memory T cells were associated with better long-term survival (p < 0.0001 in GEO, p = 0.029 in TCGA), while a high density of resting CD4 memory T cells was associated with significantly shorter survival (p = 0.0096 in GEO, p = 0.034 in TCGA). In addition, a high density of macrophages M1 in the GEO cohort and of CD8 T cells in the TCGA cohort was closely associated with a better prognosis, although neither association reached significance in the other cohort (Fig. 3E-3H).
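A single-factor survival comparison of this kind is commonly visualized with Kaplan-Meier curves for the high- and low-density groups. The text does not name the estimator, so the sketch below is an assumption, and the follow-up times and event indicators are invented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival probability)].

    `times` are follow-up times; `events` are 1 for death and 0 for
    censoring. One point is returned per distinct time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all subjects who leave the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for one group of patients.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Running the same function on the high- and low-density groups separately yields the two curves to compare; a significance test such as the log-rank test would be a separate step that is not shown here.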
3.4 Establishment of GC survival predictive nomogram
To further explore which TIICs have prognostic value, we used the GEO cohort as the training group to establish an immune risk model, and the TCGA cohort as the test group to verify its reliability. B cells naïve, plasma cells, T cells CD4 memory activated, T cells gamma delta, macrophages M2, mast cells activated and neutrophils were selected by LASSO regression analysis with tenfold cross-validation (Fig. 4A, 4B), but B cells naïve and T cells gamma delta were then excluded by stepwise filter optimization. In the final nomogram shown in Fig. 4C, age, gender and five TIICs were used as indicators for predicting prognosis. The “point” bar at the top shows the individual risk contribution of each indicator, and the “total point” bar at the bottom shows the combined risk corresponding to the 1-, 3- and 5-year survival rates. M2 macrophages and activated mast cells were strongly associated with poor prognosis, while plasma cells, activated CD4 memory T cells and neutrophils contributed to a better prognosis. In addition, ROC curves and calibration curves showed the predictive accuracy of this nomogram in the training group (GEO cohort) (Figure 4D, 4F) and the test group (TCGA cohort) (Figure 4E, 4G), indicating that the nomogram could accurately predict survival rates.
3.5 Establishment and validation of the survival immune risk (SIR) score model
To further explore the overall role of these five prognostic TIICs in the course of disease, we constructed a "survival immune risk (SIR) score" from the fractions of the above five TIICs. The formula, based on the Z-index of Cox regression, was: SIRS = 2.724*Macrophages M2 + 3.030*Mast cells activated - 2.184*Plasma cells - 3.133*T cells CD4 memory activated - 2.643*Neutrophils. Patients were then divided into “high-” and “low-” SIR groups based on a cutoff value of -0.03630041 (AUC = 0.687). As shown in Fig. 5A, patients with higher SIR scores in the training group had significantly worse survival, which was validated in the test group (Figure 5B). Two ROC curves show the predictive accuracy of the SIR score for 1-, 3- and 5-year survival rates in the training group (Figure 5C) and the test group (Figure 5D).
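As a worked illustration of the formula above, the sketch below computes the SIR score and assigns a risk group. The coefficients and cutoff are taken from the text; the input fractions are hypothetical, and two details are assumptions: that the fractions are expressed as proportions between 0 and 1, and that scores strictly above the cutoff fall in the high group.

```python
# Coefficients as reported in the text for the five TIICs.
SIR_COEFFICIENTS = {
    "Macrophages M2": 2.724,
    "Mast cells activated": 3.030,
    "Plasma cells": -2.184,
    "T cells CD4 memory activated": -3.133,
    "Neutrophils": -2.643,
}
SIR_CUTOFF = -0.03630041  # cutoff value reported in the text


def sir_score(fractions):
    """Weighted sum of the five TIIC fractions (assumed to be in 0..1)."""
    return sum(coef * fractions[cell] for cell, coef in SIR_COEFFICIENTS.items())


def sir_group(score, cutoff=SIR_CUTOFF):
    """Assign 'high' or 'low'; treating score > cutoff as high is an assumption."""
    return "high" if score > cutoff else "low"


# Two hypothetical patients; the fractions are illustrative only.
patient_a = {
    "Macrophages M2": 0.20,
    "Mast cells activated": 0.05,
    "Plasma cells": 0.05,
    "T cells CD4 memory activated": 0.02,
    "Neutrophils": 0.03,
}
patient_b = {
    "Macrophages M2": 0.05,
    "Mast cells activated": 0.00,
    "Plasma cells": 0.20,
    "T cells CD4 memory activated": 0.05,
    "Neutrophils": 0.02,
}

score_a = sir_score(patient_a)
score_b = sir_score(patient_b)
print(score_a, sir_group(score_a))
print(score_b, sir_group(score_b))
```

In this toy example the first patient, with more M2 macrophages and activated mast cells, receives a positive score, while the second, with more plasma cells and activated CD4 memory T cells, receives a negative score below the cutoff.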
3.6 The relationship between SIR score and clinical features
We then analyzed the relationship between the SIR score and clinical features, including pathological stage, TNM stage and the pathological classification based on Laurén’s criteria, in samples of the GEO cohort. The SIR score increased with pathological stage (Figure 6A, p < 0.001), and patients in stages T3 and T4 had notably higher SIR scores than patients in stage T2 (Figure 6B). In addition, patients in stage N3 had significantly higher SIR scores than other patients (Figure 6C). However, SIR scores did not differ between M stages (Figure 6D). The Laurén classification divides gastric cancer mainly into intestinal type carcinoma, diffuse gastric carcinoma and mixed type carcinoma according to cancer cell morphology and histochemistry. As shown in Figure 6E, patients with diffuse type tumors had higher SIR scores than those with intestinal type tumors. In the TCGA cohort, however, no statistically significant differences were found among patients with different clinical traits (Supplemental Figure S2).
3.7 Identification of SIR score-associated oncogenic pathways
Patients in the GEO cohort with high and low SIR scores were included in a GSEA based on KEGG pathways (Supplemental Figure S3). The results showed that pathways related to oxidative phosphorylation, DNA repair and glucose metabolism were significantly enriched in tumor tissues with high SIR scores (Figure 7A), while pathways related to immune cell response, cytokine reactivity, regulation of the cytoskeleton and focal adhesion were enriched in tissues with low SIR scores (Figure 7C). However, similar pathway enrichment was not observed in the TCGA cohort.