Data Collection and Preprocessing
The gene expression profile of GSE69925 was searched in NCBI Gene Expression Omnibus (GEO) with "esophageal squamous cell cancer" and "Homo sapiens" as keywords. GSE69925 consists of 274 biopsy specimens of ESCC, which were processed by Affymetrix Gene Chip HG-U133Plus2.0 to get comprehensive gene expression profiling.(8) Of all 274 samples, 40 samples contain tumor staging information (3 samples of stage T2, 21 samples of stage T3 and 16 samples of stage T4).
Probes in this series were annotated by gene symbol information from soft formatted family files, the probe annotated by multiple genes was deleted, and the average value was recorded as the final expression value for the gene matched by multiple probes.
Implementation of Single-Sample Gene Set Enrichment Analysis (ssGSEA)
Immune-related gene sets were obtained from the cancer immunome database (TCIA).(9) The literature from TCIA contains 28 immunocyte-related gene sets, including innate immunity and adaptive immunity. The innate immunity consists of activated dendritic cell, immature dendritic cell, plasmacytoid dendritic cell, CD56dim natural killer cell, CD56bright natural killer cell, eosinophil, mast cell, macrophage, MDSC, monocyte, natural killer cell, natural killer T cell and neutrophil. While the adaptive immunity comprises of activated CD4 T cell, activated CD8 T cell, central memory CD4 T cell, central memory CD8 T cell, effector memory CD4 T cell, effector memory CD8 T cell, gamma delta T cell, activated B cell, immature B cell, memory B cell, T follicular helper cell, regulatory T cell, type 1 T helper cell, type 2 T helper cell and type 17 T helper cell.
Related infiltration levels for 28 immunocytes were quantified by the ssGSEA algorithm in R package GSVA, which could apply gene signatures expressed by immunocytes to every cancer samples.(10) Then the R package ESTIMATE was utilized to calculate the immune score, stromal score, tumor purity and estimate score of every tumor sample.(11)
Identification of Differentially Infiltrated Immunocytes between Different Stages
Based on T stages of samples containing tumor stage information, 40 samples were classified into two groups, T2+T3 and T4. Then we could identify immunocytes infiltrating differentially between two groups, which was analyzed by the Mann-Whitney test, and differences were considered statistically significant when P Value<0.05. The immunocyte with the greatest difference and clinical significance was selected as the target immunocyte for next analysis.
Identification of Infiltrating Immunocyte Related Gene List by Weighted Gene Co-expression Network Analysis (WGCNA)
The expressing profiling of all 274 samples was applied to this section. In this section, 5,000 genes with the highest average expression were selected, and the gene expression was calculated using the WGCNA algorithm.(12) First, the soft threshold for network construction was selected by calculating the scale-free topology fit index for several powers. Then, the scale-free network was constructed and the module eigengene (ME) was calculated. Each module contained a gene list with the most appropriate accounting of gene expression profile. And all 274 samples were divided into highly-infiltrating group and lowly-infiltrating group according to infiltrating scores of the targeting immunocyte, which was utilized as the clinicopathologic features for next step. Finally, the correlation between ME and clinicopathologic features in each module was calculated. According to module-trait relationships, the module displaying the most obvious correlation was considered a key module and selected for further calculation.(13)
Functional Enrichment Analysis
Metascape is a meta-analysis website integrating gene annotation and functional enrichment analysis. In this section, by using the “Express Analysis” module in Metascape, we explored the enriched biological processes and pathways of the gene list contained in the key module.(14)
Differentially Expressed Genes (DEGs) and Correlation Analysis
By DEGs analysis of gene list contained in the key module between highly-infiltrating and lowly-infiltrating group, we could further screen targeting genes associated with immunocyte infiltrating. DEGs were analyzed using package limma of R software with adjusted P Value<0.05 and |logFC| >1.(15)
Spearman’s correlation test was performed to evaluate potential correlations between mRNA expression of target genes and immunocyte infiltration levels.(16)
TIMER Database Analysis
TIMER is a comprehensive website for systematic analysis of infiltrating immunocytes across diverse carcinomas.(17) We analyzed expressions of target genes in different types of carcinoma and their correlations with 6 types of infiltrating immunocytes (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells) in esophageal carcinoma and other digestive system carcinomas. Gene expression levels against tumor purity is a critical factor that have an effect on the analysis of immunocyte infiltrating level and is shown in the left-most panel.(11) The correlations between mRNA expressions of target genes and that of related colony-stimulating factors (CSFs) were also analyzed in TIMER.