Sample collection and clinical information
In this study, we collected esophageal tissue wax blocks from 111 individuals, of which 41 included tumors and paired nontumor tissues and the other 70 were only tumor tissues (Fig. 1). Forty-one patients with ESCC were included, and microbiome samples from both tumor and normal tissues were collected. The clinical information of the 111 patients is shown in Table 1.
Figure 1. Diagram of study enrollment and eligible paired and total samples. ESCC indicates esophageal squamous cell carcinoma.
Microbiota diversity in ESCC
We generated 13,671,987 quality-filtered sequence reads, with an average of 48,311 reads per sample. Sequence reads were mapped to the bacteria in the SILVA database. In general, the 16S rRNA gene sequencing results of paired tumor and normal tissue microbial profiles show partial differences. For alpha-diversity, the Shannon index (4.93 vs 5.05, P=0.6233), PD_whole_tree (9.35 vs 10.73, P=0.2645) and the observed species (46.97 vs 52.29, P=0.1893) were lower in the ESCC tumor tissues than in the normal tissues, but these differences were not statistically significant; however, the overall alpha-diversity Chao 1 index (132.06 vs 185.86, P=0.0091) in tumor tissues decreased dramatically by approximately 25% compared with that in the matched normal tissues (Fig. 2A). For the microbial community composition (beta-diversity), significant clustering was detected for the weighted UniFrac distance between paired ESCC tumor and nontumor tissues (Fig. 2B).
Relative abundances of bacteria at the phylum level in both the normal and tumor groups of ESCC patients
To further determine the signature of microbial profiles in both the normal and tumor groups, the relative abundances of bacteria at the phylum level were determined, and the results are shown in Fig. 2C (Class level was shown in Fig. S1). The top 6 phyla across all samples were Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria, Proteobacteria and Spirochaetes. Differential abundance analysis revealed a similar conclusion: Bacteroidetes (P=0.03149), Fusobacteria (P=0.01796) and Spirochaetae (P=0.004155) were the top 3 significantly enriched taxa in the tumor group compared with the matched normal group samples (Fig. 2D). To identify the signature microbiota existing in tumor tissues, which potentially play roles during the carcinogenesis of ESCC, 16S rRNA sequencing data were annotated at the genus level. Butyrivibrio (P=0.0078) and Lactobacillus (P=0.00052) were closely associated with sugar and fiber fermentation and were found at significantly lower relative abundances in the tumor group than in the normal group. Streptococcus (P=0.0013) was discovered increased in tumor group. In addition, Fusobacterium is anaerobic, gram-negative bacteria and normally treated as pathogen, which discovered as increasing relative abundance in tumor groups (P=0.0052) (Fig. S2). The full list of signature microbiota profiles (genus level) is summarized in Table S1.
Figure 2. Microbial comparison of paired tumor and paired-normal tissues from patients with ESCC for α (A) and β diversity (B). The α diversity of 21 paired ESCC tumor and nontumor tissues was assessed with the Chao1 index, Shannon index, observed species and PD_whole_tree. The β diversity, based on the weighted UniFrac distance, was compared with PCoA plots for 21 paired tumor and nontumor tissues from patients with ESCC. PCoA, principal coordinate analysis. (C) The relative abundance of bacteria at the phylum level in both the normal and tumor groups of ESCC. (D) Differential abundance analysis revealed that Bacteroidetes (P=0.03149), Fusobacteria (P=0.01796) and Spirochaetae (P=0.004155) were the top 3 significantly enriched taxa in the tumor group compared with the matched normal group samples. **P-value<0.01.
Microbiota associated with clinical characters in ESCC
To explore whether there are regular differences among the compositions of microorganisms in ESCC, we grouped 21 pairs of samples according to their respective clinical stages and clustered trees based on the unweighted UniFrac distance matrix and UPGMA method and integrated the clustering results with the relative abundances of each sample species. The results of the histogram (Fig. 3A) and heatmap showed that the relative abundances of Fusobacterium (P=0.039) and Prevotella (P=0.0379) were correlated with clinical stage in ESCC, where they were higher in tumors than in the corresponding normal tissues, not with the IA stage.
To further explore the microbiome alterations that occur during the progression of esophageal cancer and identify signature species as diagnostic biomarkers, we performed differential abundance analyses on 16S rRNA sequencing data of tumor tissues only using DEseq2(27), and cross-referenced with both patient clinical tumor classification of malignant tumor.
Subjects in the pT1-T2 classes were designated Group a, and those in pT3-T4 were designated Group b. In the pT classification, α diversity measures indicated a decreased diversity in Group b (Fig. 3B). Although there was no significant difference in the Chao1 (1094.03 vs 843.77, P=0.0659) index, PD_whole_tree (493.81 vs 450.37, P=0.0771) and the observed species (790.09 vs 626.27, P=0.0819), the Shannon index (5.858 vs 5.219, P=0.0242) decreased significantly in Group b. We found that the relative abundances of the phyla Fusobacteria (P=0.0048) and Bacteroidetes (P=0.0035) increased significantly in Group b (Fig. S4). In addition, multiple comparisons between every pT stage were performed, and bacterial taxa with significant changes in abundance are summarized in Table S2. Many bacterial genera were significantly changed, including Lactobacillus, members of which show preventive effects in bacterial infection and inflammation (28). Moreover, data from clinical stage I-II subjects were grouped (Group c) and then compared with stage III subject tumor microbiome profiles (Group d). As illustrated in the figure (Fig. 3C), there were no statistically significant differences between groups c and d in α diversity as measured by the Chao1 index (1011.25 vs 984.73, P=0.743), Shannon index (5.67 vs 5.64, P=0.7048), PD_whole_tree (480.52 vs 464.56, P=0.2355) and the observed species (741.26 vs 720.91, P=0.624).The data showed that the relative abundance in stage III was higher than that in stage I-II; however, there were no statistically significant differences. In addition, the current analysis shows that the distribution of microflora diversity in tumor tissue is not related to the patient's age, gender, smoking history, drinking history, tumor location, lymph node metastasis or G stage (Table S3).
Figure 3. Microbiota associated with clinical tumor stage in ESCC. (A) Twenty-one pairs of samples according to their respective clinical stages and clustered trees based on the unweighted UniFrac distance matrix and UPGMA method; the clustering results were integrated with the relative abundance of each sample species. The α diversity at the clinical pT stage (B) and clinical tumor stage (C) was assessed with the Chao1 index, Shannon index, observed species and PD_whole_tree.
F. nucleatum was detected in 20 specimens of ESCC tissue by FISH. F. nucleatum stained with the FUSO probe (in red) was found to be enriched within the esophageal tumor tissue (median, 23; range, 5 to 69) (Figure 4).
Figure 4. Enrichment of Fusobacterium nucleatum in ESCC detected by FISH. Graph A: ESCC specimens stained with DAPI. Cell nuclei were stained in blue; B: ESCC specimens stained with both DAPI and a universal bacterial probe (EUB338). Bacterial conserved regions were stained in green; C: ESCC specimen stained with both DAPI and a Fusobacterium-specific probe (FUSO). The F. nucleatum-specific regions were stained in red; D: ESCC specimens triply stained with DAPI, FUSO and EUB338. Cell nuclei (in blue), bacterial conserved regions (in green) and F. nucleatum-specific regions (in red) were clearly visible (all × 400).
qPCR to verify the level ofFusobacterium nucleatum in ESCC
To verify the results of 16S rRNA gene sequencing, we collected more samples to check the relationship between F. nucleatum and clinical characteristics of ESCC. To evaluate the relative abundance of F. nucleatum in tumor tissues, the specific nusG gene of F. nucleatum was quantitatively analyzed by qPCR in samples from 98 ESCC patients. The results showed that of the 98 tumor tissue samples, 69.4% (68/98) were positive for F. nucleatum, and the relative abundance of F. nucleatum in tumor tissues was significantly higher than that in paired normal tissues (P=0.0262) (Fig. 5A). By univariate analysis, it was found that the abundance of F. nucleatum was highly correlated with the pT and clinical stages. The relative F. nucleatum levels in the pT3-4 stage were significantly higher than those in the pT1-2 stage (P=0.039), and there was greater enrichment in the III stage than in the I-II stage (P=0.0039) (Fig. 5B and C). This is consistent with the above analysis results showing that the relative F. nucleatum DNA levels in ESCC tumor tissues were not significantly changed patient age, gender, smoking history, drinking history, lymph node metastasis and G stage (Table 1 and Fig. S5).
Figure 5. Distribution of F. nucleatum in ESCC. (A) The relative abundance of F. nucleatum in tumor tissues was significantly higher than that in paired normal tissue samples in 21 patients with ESCC (P=0.0262). (B) The relative F. nucleatum levels in the pT3-4 stage were significantly higher than those in the pT1-2 stage (P=0.039), and the enrichment was higher in the III stage than in the I-II stage (P=0.0039).
Mechanism by which F. nucleatum affects the progression of ESCC
According to the qPCR results, we selected 20 pairs of ESCC samples for whole-exome sequencing (WES). Thirteen tumor tissues contained F. nucleatum, and the other 7 were F. nucleatum negative. GO functional enrichment analysis of all the high-risk mutant genes of the 13 F. nucleatum-positive samples was conducted, and the most significantly enriched functions are shown in Fig. 6A. The most significantly enriched functions included cell cycle (P=0.00071), positive regulation of apoptotic process (P=0.0063) and positive regulation of transcription, DNA-templated (P=0.0049). Moreover, the involved proteins were further classified by protein domain, and it was found that the epidermal growth factor-related domain (EGF-like domain) was significantly enriched (P=0.00029) (Fig. 6B). The enriched GO terms for high-risk mutant genes in F. nucleatum-positive specimens were significantly different from those for the 7 F. nucleatum-negative samples (Fig. S6).
F. nucleatum signiﬁcantly mutated genes in ESCC
MutSigCV analysis of 20 ESCC paired tissues revealed 20 signiﬁcantly mutated genes (Fig. 6C). ESCC samples were divided into two groups according to the abundance of F. nucleatum. Any sample with a Ct less than 37 was considered F. nucleatum-positive. Specimens were considered negative when the Ct value of the specimen was greater than 37 or no melt curve could be generated. TP53 mutations were present in all F. nucleatum-positive ESCCs. Most TP53 mutations were in the hotspot exons 4-8. TP53 is involved in the regulation of cell proliferation and apoptosis. Five of the 13 F. nucleatum-positive ESCCs harboring TP53 mutations had concomitant mutations in COL22A1. The frequency of F. nucleatum-negative ESCC harboring RBMXL3 and HDGFRP2 mutations was 57.14% and 42.85%, respectively, which are different from the values for the F. nucleatum-positive group. Notably, we also identiﬁed a number of genes as occurring in only the F. nucleatum-positive group, including COL22A1 (38.46%, 5/13), TRBV10-1 (23.07%, 3/13), CSMD3 (30.76%, 4/13), SCN7A (15.38%, 2/13) and PSG11 (7.69%, 1/13) (Fig. 6C).
Relationship between F. nucleatum and the mutational burden in ESCC
Of the 20 patients who provided samples for sequencing, 11 had distant metastasis within 6 months after the operation. There was no metastasis during 2 years of follow-up in the other 9 patients. Among these 11 patients, 72.8% (8/11) were F. nucleatum positive. A higher (mean ±SE) mutational burden was observed in the tumors of 11 patients with metastasis than those of patients without metastasis (141.5±16.94 vs. 124±16.31, P = 0.467), but there was no significant difference (Fig. 6D). The mutational burden per primary tumor was identified (median, 133.6; range, 33 to 256). The high-mutational-burden group was defined as patients with 120 or more mutations, and the low-mutational-burden group was defined as patients with less than 120 mutations. The results indicated that there was a significant correlation between the combination of the mutational burden and the F. nucleatum content in tumors and metastasis in ESCC. In F. nucleatum-positive ESCC, there were more patients with lymphatic and distal metastasis in the high-mutational-burden group than in the low-mutational-burden group.
Figure 6. (A) Gene Ontology (GO) enrichment results; (B) protein domain enrichment analysis results; (C) mutation frequencies and signatures, and significantly mutated genes in 20 ESCCs. The number of somatic mutations of each examined case (top) and the significantly mutated genes (SMGs) colored by the types of mutations and their mutational frequency (bottom). Columns, examined cases; rows, genes; N, F. nucleatum-negative group; P, F. nucleatum-positive group. (D) Combination of the mutational burden and the F. nucleatum content to predict metastasis in ESCC. The red horizontal line indicates the cutoff score for mutational burden. Clinical and pathological features that include the presence or absence of distant metastases in the surgical specimen are annotated for each patient. A sample was considered F. nucleatum positive if the qPCR Ct value was <37 and the melt curve could be generated; otherwise, it was considered negative.