Markers of stromal cells identified from scRNA-seq analysis
To accurately obtain marker genes of stromal cells, bioinformatics analysis was performed on a scRNA-seq dataset of 21 comprehensive GC samples, which contained cases from stage I to IV, different pathological subtype (intestinal and diffuse) and MMR status (dMMR and pMMR) (Supplementary Table S1). After filtering low-quality cells and doublets, 62376 cells were included in the subsequent analysis. We performed batch correction, unsupervised clustering and obtained 27 cell clusters (Figure 1A, B). Through finding differentially highly expressed genes in each cluster as described in method, we identified marker genes of all clusters. Using cell type-specific classical markers, these cell clusters could be readily annotated as known cell types, mainly including stromal cell (fibroblast, pericyte, EC), immune cell (T cell, NK cell, B cell, plasma, macrophage, dendritic cell, mast cell) and epithelial cell, which covered almost all cell types within GC (Figure 1C). A small cell number of cluster16, 25 with ambiguous marker genes were disregarded. We further identified marker genes of each cell type. The average expression of classical markers across each cell type were showed by heatmap (Figure 1D).
To verified the annotation of each cell type, GO and KEGG enrichment analysis was performed on their marker genes. Stromal cells (fibroblasts, pericytes, ECs) were mainly involved in function of extracellular matrix composition, conferring tensile strength and growth factor binding, as well as pathways of focal adhesion, ECM-receptor interaction, vascular smooth muscle contraction and PI3K-Akt signaling, which had a distinct functional and pathway enrichment profile from immune cells and epithelial cells (Figure 1E, F; Supplementary Figure S1).
Finally, marker genes of fibroblast, pericyte and EC were pooled, resulting in a gene set containing 572 genes, which were marker genes of stromal cells (Figure 1G; Supplementary Table S2).
Stromal prognostic signature construction
Two-step analysis was applied to construct stromal prognostic model. Based on training cohort of GC in TCGA, univariate Cox regression analysis was performed on 572 stromal marker genes, and 239 prognostic genes with p<0.05 were obtained (Supplementary Table S3). Then, LASSO Cox regression method was applied to construct the prognostic signature. We chose the lambda.min value as optimal parameter and obtained a signature containing 7 genes with positive coefficients (Figure 2A, B). A seven-gene prognostic model was built by their expression level and corresponding coefficient: risk score = (0.083687924*SERPINE1 expression) + (0.018012147*SLCO2A1 expression) + (0.003779246 *GJA1 expression) + (0.002782004*SDC2 expression) + (0.011735780*BEX3 expression) + (0.028603108*PLOD2 expression) + (0.171599748*MARCKS expression).
We next verified prognostic value of the stromal risk score in training cohort. Stromal risk score was calculated for each patient. Ranking of risk score and heatmap demonstrated expression profile of the seven genes in cohort cases (Figure 2C). Then, patients were divided into high-risk and low-risk group according to the optimal cutoff of risk score. As Kaplan-Meier curve shown, high-risk group had a significant poorer OS than low-risk group (p=2e-9) (Figure 2D). Time-dependent ROC analysis was used to explore predictive power of the risk model. The area under the curve (AUC) for OS prediction were 0.65 (1 year), 0.70 (3 years), and 0.77 (5 years) (Figure 2E). Furthermore, taking risk score, TNM stage, gender, and age as covariates, multivariate Cox regression analysis revealed that stromal risk score was an independent risk factor with HR=12.4 (95% CI=4.82-31.8, p<0.001, Figure 2F).
Predictive performance validation of the stromal prognostic signature
Three external cohorts of GC, including GSE15459 of 182 cases, GSE62254 of 297 cases, GSE84437 of 431 cases, were used to validate prognostic value of the stromal signature. The same calculation formula of risk score described above was applied for all cohorts. Risk score ranking and gene expression heatmap displayed expression profile of the stromal signature in each cohort (Figure 3A, 3D, 3G). Then, patients were divided into high-risk and low-risk group based on optimal cutoff value in each cohort. According to the Kaplan-Meier survival analysis, high-risk group had poorer OS than low-risk group in all of cohort GSE15459 (p=2e-4), GSE62254 (p=9e-5) and GSE84437 (p=8.96e-5) (Figure 3B, 3E, 3H). Besides, using risk score, TNM stage, gender and age as covariates, the result of multivariate Cox regression analysis shown the independence of the stromal risk score as a risk factor, with HR=5.13 in GSE15459 (p=0.005), HR=745 in GSE62254 (p<0.001) and HR=2.89 in GSE84437 (p=0.021) (Figure 3C, 3F, 3I).
Correlation between the stromal signature and clinicopathological features
High-risk cases identified by the stromal prognostic model was discovered to have a relationship with advanced TNM stage and invasive molecular subtype (p<0.05, Table 1). Through analyzing the differences of risk score between groups, risk score between age groups and gender groups didn’t have a significant difference (Figure 4A, 4B). Whereas, advanced TNM stage group had higher score than early TNM stage (p=0.027, Figure 4C), diffuse type of Lauren had higher score than intestinal type (p=0.035, Figure 4D).
In addition, we explored prognostic value of the stromal signature in each subgroup stratified by TNM stage or Lauren subtype. As shown in Figure 4E-4H, high-risk patients identified by the stromal signature had worse OS than low-risk patients in each subgroup (p<0.05, Figure 4E-4H).
The stromal signature was predictive of chemosensitivity and immunotherapy response
We explored the efficacy of the stromal signature to guide systemic therapy. Firstly, “pRRophetic” R package was applied to infer drug IC50. In each dataset of GSE15459, GSE84437 and TCGA, through comparing IC50 between risk groups identified by the stromal signature, high-risk group was more sensitive to docetaxel and bortezomib (p<0.05, Figure 5A-C). Then, we used TIDE algorithm to infer response to immune checkpoint inhibitor. In GSE15459 dataset, high-risk group identified by stromal signature had higher rate of no-response to immunotherapy than low-risk group (no-response rate: 73%VS46%, p<0.001, Figure 5D). Besides, patients of no-response had significantly higher stromal risk score than responders (p<0.001, Figure 5E). The same tendency could be seen in the other datasets. Stromal high-risk group was less sensitive to immunotherapy compared with low-risk group in GSE84437 (no-response rate: 77%VS53%, p<0.001, Figure 5F) and TCGA (no-response rate: 77%VS61%, p=0.01, Figure 5H). Patients with no-response to immunotherapy had higher risk score than responders (p<0.001, Figure 5G, 5I).
Underlying mechanism of the stromal signature
Since the stromal signature was highly correlated with poor prognosis, as well as resistant to immunotherapy, we are curious to explore its potential mechanism. Through GO enrichment analysis of differentially highly expressed genes in high-risk group identified by the stromal signature, high-risk cases enriched in terms of extracellular matrix organization, angiogenesis, positive regulation of cell migration and connective tissue development, which were the functional characteristic of activated tumor stroma (p<0.05, Figure 6A).3 We further compared the composition of non-cancer cells between the two groups. High-risk tumor had a distinctly higher proportion of stromal cells (including fibroblasts and ECs) and monocytes (p<0.05, Figure 6B) than low-risk tumor. Figure 6C displayed top nine pathways enriched for high-risk group, including epithelial mesenchymal transition (EMT), NF-kB signaling, inflammatory response, angiogenesis and hypoxia.
Different expression of the seven stromal genes
To examine tissue expression location of the seven genes, we explored the scRNA-seq data and immumohistochemical (IHC) staining results. UMAP plots displayed gene expression profile in each cell type within GC. Canonical markers were basically expressed in corresponding stromal cell type indicated its effectiveness (Supplementary Figure S2). As figure 7A-7G shown, the seven genes were mainly expressed in stromal cell clusters of fibroblast, pericyte and EC. SDC2 and PLOD2 were primarily expressed in both fibroblast and pericyte. GJA1 was primarily expressed in fibroblast and EC. SERPINE1 and BEX3 were strongly expressed in all stromal cell types. These results suggested a close relationship among stromal cells. Meanwhile, it’s very clearly that genes were not uniformly expressed in all cells of each cluster, which shown the heterogeneity in a cell population.
IHC results of the seven genes verified the expression profile from scRNA-seq analysis. All seven genes positively expressed in GC stroma (Figure 7H-7N). Such as, SLCO2A1 was almost exclusively expressed in ECs, which was consistent with its UMAP plot of scRNA-seq data (Figure 7I).