Results of the quality control for the single-cell data sets
Tumor cells in diffuse gastric cancer are reported to grow diffusely, with disrupted cell-cell adhesion and an invasive phenotype. To explore the molecular mechanisms underlying differences in gastric cancer cell invasion and depth of infiltration, the GSE167297 dataset was downloaded from the GEO database; it contains single-cell data from normal gastric tissues and from superficial and deep tumor tissues of diffuse gastric cancer. In total, 14 samples (30,365 single cells) derived from 5 patients were collected to analyze the expression of 32,738 genes; normal gastric tissue was not available for one patient. By tissue of origin, 4,298 single cells came from normal tissues, 13,986 from superficial tumor tissues, and 12,081 from deep tumor tissues of diffuse gastric cancer (Table 1 and Figure 1).
Quality control was then performed on the single cells from these tissues to remove low-quality cells and genes. First, all single cells were integrated and the number of detected genes in each cell was counted. Based on the overall distribution, cells with fewer than 500 detected genes were considered low quality and removed (Figure 2). Next, the percentage of cells in which each gene was expressed was calculated. Based on the overall distribution, genes expressed in fewer than 0.5% of cells were considered low quality and removed (Figure 2).
Lastly, the proportion of mitochondrial gene expression in each cell was calculated, and cells in which this proportion exceeded 10% were eliminated. After quality control, the numbers of genes and single cells were recounted for the 14 samples. A large number of cells were excluded from GSM5101025 (Table 2), indicating poor sequencing output for this sample.
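The three filtering steps above can be sketched on a toy count matrix. This is a minimal illustration, not the authors' pipeline: the function name `qc_filter`, the list-of-lists matrix layout, the use of an `MT-` prefix to recognize mitochondrial genes, and the choice to compute gene detection rates on the cells that survive the first step are all assumptions. Only the three thresholds (500 genes, 0.5%, 10%) come from the text.

```python
def qc_filter(counts, genes, min_genes=500, min_cell_frac=0.005, max_mito_frac=0.10):
    """Return (kept cell indices, kept gene names) for a cells-by-genes count matrix."""
    # Step 1: remove cells with fewer than `min_genes` detected genes.
    cells = [i for i, row in enumerate(counts)
             if sum(1 for c in row if c > 0) >= min_genes]

    # Step 2: remove genes expressed in less than `min_cell_frac` of the remaining cells
    # (computing this on the remaining cells is an assumption).
    kept_genes = []
    for j, gene in enumerate(genes):
        expressed = sum(1 for i in cells if counts[i][j] > 0)
        if cells and expressed / len(cells) >= min_cell_frac:
            kept_genes.append(gene)

    # Step 3: remove cells whose mitochondrial fraction exceeds `max_mito_frac`.
    # Mitochondrial genes are recognized by an "MT-" prefix (assumed convention).
    mito = [j for j, gene in enumerate(genes) if gene.upper().startswith("MT-")]
    final_cells = []
    for i in cells:
        total = sum(counts[i])
        mito_frac = sum(counts[i][j] for j in mito) / total if total else 0.0
        if mito_frac <= max_mito_frac:
            final_cells.append(i)
    return final_cells, kept_genes


# Hypothetical toy data; min_genes is lowered so that four genes are enough to demonstrate.
toy_genes = ["TFF1", "PGC", "MT-CO1", "CD68"]
toy_counts = [
    [5, 3, 1, 0],    # 3 genes detected, mitochondrial fraction 1/9
    [4, 0, 0, 0],    # 1 gene detected
    [10, 8, 1, 0],   # 3 genes detected, mitochondrial fraction 1/19
    [0, 6, 0, 0],    # 1 gene detected
]
toy_cells, toy_kept_genes = qc_filter(toy_counts, toy_genes, min_genes=2)
print(toy_cells, toy_kept_genes)
```

With real data the defaults would apply unchanged; the toy call lowers `min_genes` only because the example matrix has four genes.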
Characterization of the microenvironment in normal gastric tissues
Considering the inherent differences between cells from normal and tumor tissues, we first integrated the data from the 4 normal tissue samples (GSM5101013, GSM5101018, GSM5101021, and GSM5101024), obtaining expression profiles for 2,296 single cells and 9,926 genes. The standard analysis workflow of the ‘Seurat’ package was then applied to the resulting data object, including data standardization, PCA dimension reduction, and KNN-based clustering. The single cells were divided into 12 subgroups, and the cell type of each subgroup was then identified (Table 3, Figure 3).
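To make the standardization and nearest-neighbor steps concrete, the sketch below log-normalizes toy counts and looks up each cell's nearest neighbors. It is written in Python rather than R, and it is only an illustration under stated assumptions: the scale factor of 10,000 and the log1p transform are assumed settings (the text does not state which normalization was used), PCA is omitted for brevity, and the sketch stops at the neighbor lookup rather than reproducing the clustering that yields the subgroups. The function names are hypothetical.

```python
import math


def log_normalize(row, scale_factor=10_000):
    """Scale one cell's counts to a common depth, then apply log1p (assumed settings)."""
    total = sum(row)
    return [math.log1p(c / total * scale_factor) if total else 0.0 for c in row]


def k_nearest(profiles, k):
    """For each cell, return the indices of its k nearest cells by Euclidean distance."""
    neighbors = []
    for i, p in enumerate(profiles):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(profiles) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Hypothetical toy counts: cells 0 and 1 share one profile at different sequencing
# depths, as do cells 2 and 3.
toy_raw = [
    [10, 0, 1],
    [20, 0, 2],
    [0, 9, 1],
    [0, 18, 2],
]
toy_normalized = [log_normalize(row) for row in toy_raw]
toy_neighbors = k_nearest(toy_normalized, k=1)
print(toy_neighbors)
```

After normalization, cells that differ only in sequencing depth become near-identical, which is why depth scaling precedes any distance-based grouping.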
First, among the 12 cell populations, 6 immune subpopulations (Clusters 0, 1, 3, 4, 6, and 11) were identified by expression of CD45 (PTPRC). Because MS4A1 was highly expressed in Clusters 0, 3, and 6, we inferred that these three subpopulations were B cells. Because CD4 and CD8 were highly expressed in Clusters 1 and 4, we considered these two subsets to be T cells. In addition, evaluation of NKG7 expression showed that a large number of CD8-expressing single cells were mixed with NK cells. Lastly, because CD68 was highly expressed in Cluster 11, we considered Cluster 11 to be macrophages (Figure 4). Based on the proportions of the different immune cell types, a large number of immune cells had infiltrated the normal adjacent tissues collected from gastric cancer patients; CD8+ T cells and NK cells might play a predominant role, whereas macrophages, present in a small proportion, might play a secondary role.
Next, the cell types of the remaining 6 subsets were identified. Mucous cells with high expression of TFF1 and chief cells with high expression of PGC were identified in Cluster 5; these were considered the main cell types constituting gastric tissue. Because ENG was highly expressed in Cluster 7, we inferred that the cells in Cluster 7 were vascular endothelial cells. Cluster 9 was considered to be fibroblasts owing to high expression of COL1A1 and PDGFRA. Lastly, because expression of SLAMF7, TNFRSF17, and SDC1 was relatively high in Clusters 2 and 8, these clusters were identified as plasma cells (Figure 5).
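The marker-based reasoning in the two paragraphs above can be written down as a small lookup. The marker genes are those named in the text; everything else is an assumption made for illustration: the scoring rule (mean expression across a label's markers), the `min_score` threshold, the use of the gene symbol `CD8A` for "CD8", and the toy expression values. The cluster identities reported here may well have been assigned by visual inspection of the marker plots rather than by any such rule.

```python
# Marker genes as named in the text; "CD8A" stands in for "CD8" (assumed gene symbol).
MARKERS = {
    "B cell": ["MS4A1"],
    "T cell": ["CD4", "CD8A"],
    "NK cell": ["NKG7"],
    "macrophage": ["CD68"],
    "mucous cell": ["TFF1"],
    "chief cell": ["PGC"],
    "vascular endothelial cell": ["ENG"],
    "fibroblast": ["COL1A1", "PDGFRA"],
    "plasma cell": ["SLAMF7", "TNFRSF17", "SDC1"],
}


def annotate_cluster(mean_expr, min_score=1.0):
    """Pick the label whose markers have the highest average expression in a cluster.

    mean_expr maps gene symbol -> mean expression in the cluster; genes that are
    absent count as 0. Both the rule and the threshold are hypothetical.
    """
    scores = {
        label: sum(mean_expr.get(g, 0.0) for g in genes) / len(genes)
        for label, genes in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else "unassigned"


# Hypothetical per-cluster mean expression values.
print(annotate_cluster({"ENG": 2.4, "COL1A1": 0.1}))
print(annotate_cluster({"COL1A1": 3.0, "PDGFRA": 2.0, "ENG": 0.2}))
print(annotate_cluster({"ACTB": 5.0}))
```

Averaging over a marker set is a simplification: it penalizes clusters that express only one of several markers, which is one reason manual review of each cluster remains useful.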
Characterization of the TME in gastric tumor tissues
Data from the 10 samples of superficial and deep tumor tissues of diffuse gastric cancer (Table 1) were integrated using the ‘Seurat’ package, yielding expression profiles for 16,912 single cells and 9,926 genes. The standard analysis workflow of the ‘Seurat’ package was then applied to the resulting data object, including data standardization, PCA dimension reduction, and KNN-based clustering. The single cells were divided into 15 subgroups, and the cell type of each subgroup was then identified (Table 3, Figure 3).
First, among the 15 cell populations, 9 immune subpopulations (Clusters 0, 1, 2, 5, 6, 10, 11, 13, and 14) were identified by expression of CD45 (PTPRC). Clusters 0 and 14 were considered B cells owing to high expression of MS4A1. Because CD4 and CD8 were highly expressed in Clusters 1, 2, 5, 6, and 10, we considered these clusters to be T cells. As in normal tissues, a large number of CD8-expressing single cells were mixed with NK cells. Lastly, because CD68 was highly expressed in Clusters 3, 11, and 13, these clusters were considered macrophages (Figure 4). The number of macrophages in deep tumor tissues (n = 1696) was much higher than that in superficial tumor tissues (n = 593), indicating greater macrophage infiltration in deep tumor tissues of diffuse gastric cancer (Table 4, Figure 6).
Subsequently, stromal cell types in gastric tumor tissues were identified. Based on the abnormal expression of TFF1, MUC5AC, and PGC, the main cell types in Cluster 9 were found to be abnormal chief cells and mucous cells. Cluster 7 was considered to be vascular endothelial cells owing to high expression of ENG. Because COL1A1 and PDGFRA were highly expressed in Cluster 8, Cluster 8 was identified as fibroblasts. Lastly, Clusters 4 and 12 were considered plasma cells owing to high expression of SLAMF7, TNFRSF17, and SDC1.
EPCAM is regarded as an important biomarker of tumor stem cells in multiple types of malignant tumor [11]. Among single cells derived from normal and tumor tissues in the GSE167297 dataset, EPCAM was significantly highly expressed in Cluster 9, a cell subpopulation from gastric tumor tissues. In addition, EPCAM expression was also observed in Cluster 5, a cell subpopulation from normal gastric tissues (Figure 9). Differentially expressed genes were then screened between 157 single cells from normal gastric tissues (Cluster 5, EPCAM low) and 421 single cells from gastric tumor tissues (Cluster 9, EPCAM high). The mean expression of each gene in the two groups of single cells was calculated to obtain the fold change (FC), and significance was assessed by the Wilcoxon rank sum test. A total of 515 differentially expressed genes (|log2FC| > 2 and FDR < 1e-5) were identified and analyzed using DAVID to annotate their biological functions. As shown in Table 5 and Figure 10, the 515 differentially expressed genes were mainly associated with the following KEGG pathways: hsa04064 (NF-κB pathway), hsa04662 (B cell receptor signaling pathway), and hsa04142 (lysosome pathway).
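The screening described above combines a fold-change filter with a rank-based test and an FDR cut-off. The following is a minimal sketch of that combination, not the authors' code. Several details are assumptions because the text does not specify them: the rank sum p-value uses a tie-corrected normal approximation, FDR is taken to mean Benjamini-Hochberg adjustment, a small pseudocount guards against division by zero, and all function names and expression values are hypothetical.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(x + y)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2      # average of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


def screen_degs(ref, test, lfc_cut=2.0, fdr_cut=1e-5, pseudo=1e-9):
    """Genes with |log2FC| > lfc_cut and FDR < fdr_cut; ref/test map gene -> values."""
    genes = list(ref)
    mean = lambda v: sum(v) / len(v)
    lfc = {g: math.log2((mean(test[g]) + pseudo) / (mean(ref[g]) + pseudo)) for g in genes}
    fdr = dict(zip(genes, benjamini_hochberg([rank_sum_p(ref[g], test[g]) for g in genes])))
    return [g for g in genes if abs(lfc[g]) > lfc_cut and fdr[g] < fdr_cut]


# Hypothetical expression values for 8 reference and 8 test cells per gene.
toy_ref = {"EPCAM": [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3],
           "ACTB": [1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.9, 1.0]}
toy_test = {"EPCAM": [2.0, 2.5, 3.0, 2.2, 2.8, 2.6, 2.4, 2.9],
            "ACTB": [1.0, 1.2, 0.9, 1.1, 1.0, 1.1, 1.0, 0.9]}
# With only 8 cells per group the p-values cannot reach 1e-5, so the toy call relaxes the FDR cut-off.
toy_degs = screen_degs(toy_ref, toy_test, fdr_cut=0.05)
print(toy_degs)
```

With hundreds of cells per group, as in the comparison above, the default cut-offs from the text can be used directly.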
G protein subunit γ 11 (GNG11) was associated with prognosis in gastric cancer patients
To further explore the differences between single cells from superficial and deep tumor tissues of diffuse gastric cancer, differentially expressed genes were screened between 296 single cells from superficial tumor tissues and 131 single cells from deep tumor tissues. The mean expression of each gene in the two groups of single cells was calculated to obtain the fold change (FC), and significance was assessed by the Wilcoxon rank sum test. A total of 86 differentially expressed genes (|log2FC| > 1 and FDR < 0.01) were identified and further analyzed using DAVID to annotate their biological functions. As shown in Table 6 and Figure 11, the 86 differentially expressed genes were mainly associated with the following KEGG pathways: hsa04510 (focal adhesion), hsa04672 (intestinal immune network for IgA production), and hsa05200 (pathways in cancer).
GNG11, which was most highly expressed in deep tumor tissues (mean expression value: 0.1247; FC: 52.2109) compared with superficial tumor tissues (mean expression value: 0.0024), was selected for further analysis. According to the expression level of GNG11, 450 patients were divided into 2 groups: the GNG11 high-expression group (mean expression value > 48.5364) and the GNG11 low-expression group (mean expression value < 48.5364). The HR [95% CI] in the univariate Cox proportional hazards model was 1.811 [1.308-2.508], and the P value of the log-rank test was 0.00029 (Figure 12).
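The fold change quoted for GNG11 can be checked against the two mean expression values given above. The short calculation below uses only numbers from the text; the variable names are illustrative. The ratio of the rounded means comes out slightly below the reported FC of 52.2109, which would be consistent with the reported value having been computed from unrounded means (an assumption, since the text does not say).

```python
import math

mean_deep = 0.1247          # mean GNG11 expression in deep tumor cells (from the text)
mean_superficial = 0.0024   # mean GNG11 expression in superficial tumor cells (from the text)
reported_fc = 52.2109       # fold change reported in the text

fc_from_rounded_means = mean_deep / mean_superficial
log2_reported_fc = math.log2(reported_fc)

print(f"FC from rounded means: {fc_from_rounded_means:.2f}")
print(f"log2 of reported FC:   {log2_reported_fc:.2f}")
```

Either way, the log2 fold change is far above the |log2FC| > 1 threshold used in the screening.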
Disease-specific survival analysis was further performed on patients, who were divided according to GNG11 expression into the GNG11 high-expression group (mean expression value > 39.4713) and the GNG11 low-expression group (mean expression value < 39.4713). The HR [95% CI] in the univariate Cox proportional hazards model was 4.419 [1.399-13.96], and the P value of the log-rank test was 0.0056 (Figure 13), indicating that GNG11 was a significant risk factor in stomach adenocarcinoma (STAD) patients.
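The group comparison behind both survival analyses (split patients at an expression cut-off, then compare the two groups with a log-rank test) can be sketched as follows. This is an illustrative implementation of the two-group log-rank test only; it does not fit the Cox model or compute the hazard ratio. The function names are hypothetical and the patient data are invented for demonstration; only the cut-off value 48.5364 is taken from the text.

```python
import math


def split_by_cutoff(expression, cutoff):
    """Indices of patients above and below the cut-off (values equal to it are left out)."""
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x < cutoff]
    return high, low


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)   # at risk, group A
        n_b = sum(1 for tt, _, g in data if tt >= t and g == 1)   # at risk, group B
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        n = n_a + n_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, 1 df


# Hypothetical patients: GNG11 expression, follow-up time in months, event indicator.
toy_expr = [60.2, 55.1, 70.3, 49.0, 52.7, 30.1, 20.5, 41.0, 12.3, 45.9]
toy_time = [5, 8, 12, 14, 20, 18, 25, 30, 32, 40]
toy_event = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]

toy_high, toy_low = split_by_cutoff(toy_expr, cutoff=48.5364)
toy_chi2, toy_p = logrank_test(
    [toy_time[i] for i in toy_high], [toy_event[i] for i in toy_high],
    [toy_time[i] for i in toy_low], [toy_event[i] for i in toy_low],
)
print(f"chi-square = {toy_chi2:.2f}, p = {toy_p:.4f}")
```

In practice a survival analysis library would normally be used for both the log-rank test and the Cox model; the hand-written version is shown only to make the calculation explicit.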