Single-cell data set quality control.
We downloaded the GEO dataset GSE145370 and analyzed it using the Seurat R package. The number of minimal information units (MIUs) was not significantly correlated with mitochondrial gene percentage (Fig. 1A), but positively correlated with the number of mRNAs (Fig. 1B). We also analyzed the number of mRNAs, the readings of mRNA, and the distribution of mitochondrial genes (Fig. 1C), which revealed that most genes were distributed between 0 and 2,500 bp, mRNA readings were distributed between 0 and 1,000 bp, and mitochondrial percentages were below 15%. Further, we removed cells with more than 20% expression originating from mitochondrial genes and those with fewer than 200 genes. The expression spectrum of 33,538 genes from 103,223 cells was obtained, and the number of cells in each sample was counted (Table 1), including the number of filtered mRNAs, the readings of mRNA, and the distribution of mitochondrial genes (Fig. 1D). Highly variable genes (top 2,000) were identified for downstream analysis (Fig. 1E) after quality control.
Identification of cell subgroups by t-SNE analysis.
Considering that cells of the same type may be classified into different clusters when clustering because of different cell cycles, we conducted cell cycle regression analysis to observe such effects. We found that in each region, 3 cycles of cells were randomly distributed, and there was no difference in cell cycle status, so there was no need to eliminate the effects of cell cycles (Fig. 2A). Further, we used main component analysis to extract cell characteristics and the distance matrix, to identify 24 cell groups (Fig. 2B). There were different lymph node (LN) ratios in these cell groups, of which clusters 0, 1, 6, and 4 were mainly from tumor tissue, and clusters 5, 8, 2, and 3 were mainly from tissue peripheral to cancer (Fig. 2C). Further, applying rank and testing methods to identify differences in genes using the cell type annotation based on the CellMarker software package (diff_genes_wilcox_fdr.txt per cell group), these 24 clusters were eventually annotated to 16 cells (all.marker.anno .txt), which we defined as: Cell Cluster 0, cell monocytes from tumor tissue; Cell Cluster 1, Paneth cells; Cell Cluster 2, B cells; Cell Cluster 3, monocytes 1; Cell Cluster 4, exhausted CD8 plus T cells; Cell Cluster 5, CD8 + plus effector memory T (Tem) cells; Cell Cluster 6, natural killer (NK) cells; Cell Cluster 8, monocytes 2. In addition, we observed different proportions of tumor samples in different monocyte populations, with B cells derived mainly from normal samples and Paneth cells mainly from tumor samples (Fig. 2D), with significant individual differences observed in these cells in different patients.
Analysis of specific cell subgroups and functions in cancer and cancer side samples.
We observed the concentration of Paneth cells, NK cells, and exhausted CD8 + plus T cells in the tumor samples, and the concentration of CD8 + effector memory T (Tem) cells in the cancer side samples, which suggested the presence of specific cell subgroups in the cancer and cancer-side samples. We analyzed the differences in gene expression in each cell subgroup of cancer and cancer side samples, and we observed the differences in gene expression. Among the Panel cell, Natural killer cell, exhausted CD8 + T cell, the Natural killer cell has the most differential genes, and most of the difference genes in the Exhausted CD8 + T cell are contained by the Natural killer cell As shown in Fig. 3A. We selected the genes common to the three types of cells to include in the R software package clusterProfiler for KEGG Functional enrichment analysis which used a total of 29 pathways (step3_A_KEGG_enrich.txt), with the most significant top 10 shown in Fig. 3B. In addition to our analysis of the genetic differences between the CD8 + effector memory T (Tem) cell, B cell cancer, and cancer side sample tissues, we observed 2 types of genetic differences with a large intersection (Fig. 3C). Similarly, we identified common differences in genes using the R software package, clusterProfiler, for the KEGG functional enrichment analysis, which used a total of 22 pathways (step3_B_KEGG_enrich.txt), with the top 10 shown in Fig. 3D. We observed a significant overlap between the different gene functions in cells in the cancer and cancer side sample tissues (Fig. 3E). For example, genes with differences in the specific cell subgroups of cancer and cancer side samples were highly concentrated in the Ribosome and Parkinson's disease pathways, which showed that there were similarities of dysfunction in the specific cell subgroups of cancer and the cancer side samples.
Cell subgroups and disorder pathways for different clinically phased patients.
To identify prognostically related cell subgroups, we compared the differences between Stage II and Stage III patients across cell types. We observed significant differences in cell sources in all cell groups (step4.stage.count.txt), suggesting significant differences in tumor microenvironmental characteristics between samples with different clinical stages, the most significant of which were among the B cell and monocyte groups. The B cell count was significantly higher in Stage II than in Stage III, and the monocyte count was significantly higher in Stage II than in Stage III, which suggested unique immune cell immersion in the different stages. We compared the differences in gene expression between cancer and cancer side samples of the B cell and the monocyte, and observed that most of the degenerated genes in the monocyte intersected with 324 genes in the B cell (Fig. 4A). Further, using the R software package SNP Panel Identification Assay (SPIA), we analyzed differences in the pathways of gene participation between the 2 types of cells. In B cells, we observed a total of 40 pathways with disorders (e.g., step4_spia. A .txt), of which 24 were activation pathways (Fig. 4B) and 16 pathways were inhibition pathways (Fig. 4C). In monocytes, we observed a total of 42 offset pathways (e.g., step4_spia. B .txt), of which 25 were pathway activations (Fig. 4D) and 17 were inhibition pathways (Fig. 4E). Among them, 14 of the activated pathways are consistent, and 10 of the suppressed pathways are consistent, which indicates that the signaling pathways involved have some consistency overall. However, there are 6 pathways to the contrary, 5 of which are activated in tumor samples in B cells but inhibited in monocytes, including protein processing in the endoplasmic reticulum, RNA transport, influenza A, the mitogen-activated protein kinase (MAPK) signaling pathway, toxoplasmosis, and RNA degradation, mainly related to RNA transcription and degradation.
Transcription of classical mononucleosis.
We also observed the presence of 5 different mononucleocytes in the monocyte1 marker-based cell subgroups due to CD302, SLC43A2, BACH1, QKI, monocyte1-based RIN3, ODF3B, H CK, LAT2, FGR, SFT2D1, IQGAP1, and CLEC7A, the monocyte2 marker-based cell groups due to IL1B, S100A12, SERPINB2, VCAN, EREG, F3, THBS1, PTGS2, GCA, PROK2, GK, EHD1, SLC25A37, and JMJD1C, the monocyte3 marker-based cell groups due to G0S2, VIM, SH3 BGRL3, FBP1, CTSB, DUSP1, IL1RN, LGALS3, and S100A11, the monocyte4 marker-based cell groups due to IFI30, LST1, RNF144B, ACSL1, FPR2, C5AR1, CD55, CUX1, CD300E, C19orf38, IRAK3, and. NLRP3. These monocytes exhibited different cancer sample ratios, with high proportions of tumor samples in the Monoocyte3 group and high proportions of arranged samples in the Monoocyte1, Monocyte2, and Monocyte3 groups, which showed different mononucleosis components in the cancer and cancer side samples. Further, in our subgroup clustering of these monocytes, we obtained 6 cell subgroups (Fig. 5A), which included monocytes, CD8 + T cells, and microglial cells. We observed that microglial cells mainly originated from tumor samples, and Monocyte2 originated mainly from cancer side samples (Fig. 5B). Further comparison of the pathway enrichment levels between these cell clusters, such as by step5.spia .txt using the SPIA method showed that CD8 + T cell abnormalities had the least pathway impacts, and may represent a class of mononucleocytes that make up a small number of normal samples, in addition to several other classes that showed a large number of different path abnormalities (Fig. 5C), We observed multiple inhibition pathways in microglial cells with significant activation pathways in other cells, such as RNA transport and T cell receptor signaling pathways, which suggested the presence of a unique class of dysfunctional monocyte clusters in the tumor samples. In addition, clusters of microglial cells, CD8 + T cells, monocyte2 cells, and monocyte3 cells represented a higher level of major tissue compatibility complexes (MHCs).
mRNA expression and differences between B-cell subclasses.
The B lymphocytic dysfunction, an important component of adaptive immunity, is thought to be important in the pathogenesis of tumors, where 2 B cell groups are detected. There were 11,104 cells in the B cell group, labeled as CD69, FCER2, BANK1, FCMR, ADAM28, SIDT1, CXCR5, LY9, NGLY1, PARP15, BLK, and ARHGAP24. Of the cells in the B cell group, 1,312 were marker-based groups due to LRMP, TCL1A, BCAS4, CD22, VPREB3, SWAP70, and EZR. These B cells exhibited different proportions of tumor samples, and through t-SNE analysis, the B cells were divided into 6 clusters (Fig. 6A), where the CD8 + T cells, NK cells, and microglial cells were mainly located in the tumor samples and the microglial cells and neutrophils were similarly located in the tumor and cancer side samples. This suggested that there was a similar group of cells (microglial cell, neutrophil) in the cancer and cancer side samples. To compare the functional differences between the tumor samples and the cancer side samples in the microglial cells and neutrophils, we calculated the difference in the KEGG pathway enrichment of the tumors in the 2 cell groups using the GSEA method. We observed that a total of 106 pathways were enriched in microglial cells. These pathways were enriched in the cancer side samples (e.g., step6.gsea.kegg.txt), the most significant of which were CARDIAC_MUSCLE_CONTRACTION, OOCYTE_MEIOSIS, and HUNTINGTONS_DISEASE (Fig. 6B). Interestingly, the neutrophils had a total of 55 pathways (step6.gsea.kegg_ B.txt) which were concentrated in the cancer side samples, the most significant of which were the first 3 pathways ALZHEIMERS_DISEASE, SELENOAMINO_ACID_METABOLISM, and PARKINSONS_DISEASE (Fig. 6C), all associated with geriatric diseases. In addition, there was a significant intersection between the enriched pathways in the neutrophil and the rich pathways of microglial cells (Fig. 6D). This suggested that there were some similarities in the disordered pathways of the tumor samples at the pathway level in the different types of cells.
Identification of the genetic signature associated with prognosis.
To identify prognostically related genetic markers, we selected the cell group with the closest proportion of normal and tumor samples. We extracted the gene marker from the source cell subgroup, obtaining a total of 3,154 gene markers. First, we calculated that the 4,211 gene expression differences in Stage III, Stage II, and normal samples contained a total of 460 genes, (scRNAsub.exp.0 edgp .txt). We defined the sample feature vector as follows: normal 0, Stage II 2, and Stage III 3. We calculated the Spearman’s rank correlation coefficient of the expression of these genes in each cell and sample feature vector, selecting a false discovery rate (FDR) < 0.05 to obtain a total of 177 significantly related genes (scRNAsub.exp.edgp.cr .txt). In the functional enrichment analysis of these genes using the R software package, clusterProfiler, we observed that they were mainly concentrated in the KEGG pathways, such as the ribosome, leukocyte transendotal migration, and viral myocarditis pathways (Fig. 7A). In addition, they enriched the molecular functions of the structural constituents of the ribosome (Fig. 7B), and we observed that these genes were enriched into a large number of biological processes and cellular components, such as nuclear-transcribed mRNA catabolic processes and nonsense-mediated decay (Fig. 7C). In cellular components, such as cell-substrate-adhesin junctions and focal adhesions (Fig. 7D), these biological pathways are closely related to tumor proliferation and metastasis. We mapped these genes to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) version 11.0b (www.string-db.org) database to obtain protein interoperability networks (ppi.network.txt) of these genes, and analyze the interrelationships of these genes to the network degree distribution (ppi.attr.table.txt) (Fig. 7E) and the Closeness centrality distribution (Fig. 7F). The Betweenness centrality distribution curve (Fig. 7E). The proximity to centrality, and the greater the amount of centrality indicate the importance of the network. We selected the genes of the top 5 degrees, Closeness centrality, and Betweenness centrality, and then selected the intersection of the three to identify two important genes, CXCL8 and CCL2.Finally, we observed that the CXCL8 gene had higher expressions in multiple cell groups (Fig. 7H) and that the CCL2 gene expressed only weak signals in each cell population; hence, we considered CXCL8 an effective potential prognostic marker.
TCGA dataset validation.
To verify the prognostic correlation of CXCL8, we downloaded the TCGA-Expression Spectrum Data Set (TCGA-ESDS; TCGA-ESCA.exp.txt) and the corresponding clinical follow-up data (TCGA-ESCA.exp.os .txt), from which the expression spectrum of CXCL8 was extracted, and differences in expression between the cancer and cancer side tissue samples (Fig. 8A) could be observed. In the tumor samples, the expression of CXCL8 was significantly higher than in the cancer side samples. In addition, we compared the relationship between CXCL8 expression and prognosis, using the maxstat package of R, to group CXCL8 expression, and analyzed the prognosis by considering differences of overall survival (OS), disease-free survival (DFS), and progression-free survival (PFS) in patients with high expression and low expression. We observed significant differences in prognosis and the low expression group, which indicated that CXCL8 is a potential prognostic marker in EC.