2.1 Sample sequencing data
The sequencing output met the analysis requirements at the level of sequence tags, clean tags, and operational taxonomic units (OTUs). Clustering at 97% similarity yielded 4003 OTUs, of which 1290 were found in the HC group (group A) and 3763 in the ESCC group (group B); 1050 OTUs were shared by both groups (Fig. 1).
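The reported counts can be checked with simple set arithmetic. The sketch below is only a consistency check on the numbers in the text; the function name `venn_counts` is a hypothetical helper, not part of any pipeline used in the study.

```python
# OTU counts as reported in Section 2.1
hc_otus = 1290      # OTUs detected in the HC group (group A)
escc_otus = 3763    # OTUs detected in the ESCC group (group B)
shared_otus = 1050  # OTUs detected in both groups


def venn_counts(a, b, shared):
    """Return (unique to A, unique to B, total distinct) for two overlapping sets."""
    only_a = a - shared
    only_b = b - shared
    total = only_a + only_b + shared
    return only_a, only_b, total


only_hc, only_escc, total = venn_counts(hc_otus, escc_otus, shared_otus)
print(only_hc, only_escc, total)  # 240 2713 4003
```

The total of 4003 matches the number of OTUs obtained after clustering, so the group counts and the overlap are mutually consistent.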
2.2 Alpha diversity analysis
The Shannon and Simpson indices measure species diversity. Both indices were higher in the ESCC group than in the HC group, but the differences were not statistically significant (P > 0.05). The Chao1 index estimates the total number of species in a community. It differed significantly between the two groups (P < 0.05). Taken together, these results indicate that the diversity of microorganisms did not differ between the groups, whereas the estimated number of microbial species was higher in the ESCC group than in the HC group (Fig. 2).
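As an illustration of what these three indices compute, the following minimal sketch evaluates them on a vector of OTU counts for one sample. It assumes the natural-log Shannon index, the 1 - sum(p^2) form of the Simpson index, and the bias-corrected Chao1 estimator; the text does not state which variants were used, and the counts are toy values, not study data.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity in the form 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)   # observed OTUs
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


even_sample = [10, 10, 10, 10]       # toy counts, perfectly even community
rare_sample = [5, 2, 2, 1, 1, 1]     # toy counts with singletons and doubletons

h_even = shannon(even_sample)        # equals ln(4) for four equally abundant OTUs
d_even = simpson(even_sample)        # 0.75
richness_rare = chao1(rare_sample)   # 6 observed + 1 estimated unseen = 7.0
```

Shannon and Simpson depend on how evenly reads are spread across OTUs, whereas Chao1 depends only on observed richness and the number of rare OTUs, which is why the two kinds of index can disagree between groups.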
2.3 Beta diversity analysis
Beta diversity measures how similar the community structure is between samples; samples with similar communities cluster together. Principal coordinates analysis (PCoA) extracts the most important coordinates from the distance matrix so that differences between individuals or groups can be observed. PC1 and PC2 represent the main candidate factors driving the shift in microbial composition between the two groups. PC1 explained 16.62% of the variation (P = 0.0034 for PC1), suggesting that the composition of the microbial flora differed between the two groups (Fig. 3A). Non-metric multidimensional scaling (NMDS), which gives more stable ordination results than PCoA for data with complex structure, also separated the two groups (Fig. 3B).
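A percentage such as the one reported for PC1 is the share of the total positive eigenvalue mass carried by that axis. The sketch below shows classical PCoA on a distance matrix using NumPy; the distance metric used in the study is not stated, so the example assumes Euclidean distances between four toy points, and `pcoa` is a hypothetical helper name.

```python
import numpy as np


def pcoa(dist):
    """Classical PCoA: return (coordinates, proportion of variation per axis)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centering of the squared distance matrix
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-10                   # keep axes with positive eigenvalues
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    explained = eigvals[positive] / eigvals[positive].sum()
    return coords, explained


# Toy data: four points forming a 4 x 2 rectangle (not study data)
points = np.array([[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords, explained = pcoa(dist)
# explained is approximately [0.8, 0.2]: the long side of the rectangle dominates PC1
```

With real community data and a non-Euclidean dissimilarity, some eigenvalues can be negative; this sketch simply drops them, which is one of several possible conventions.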
2.4 Differential LEfSe analysis
Phascolarctobacterium, Dialister, Clostridiales, S24_7, Rikenellaceae, Odoribacter and other taxa were more abundant in the ESCC group than in the HC group, whereas Neisseria, Neisseriales, Bradyrhizobiaceae, F16, Helicobacteraceae and Helicobacter were more abundant in the HC group than in the ESCC group; the differences were statistically significant (P < 0.05, Fig. 4A and B).
2.5 Structural analysis of esophagus flora in two groups
Because the relative abundances of the esophageal mucosal flora were not normally distributed, the Wilcoxon test was used to compare the relative abundance of bacteria between the two groups. Significance was tested at each taxonomic level. Only the results at the phylum and genus levels are shown here; the results at the other levels are provided in Supplementary Tables S2–4.
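For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Wilcoxon rank-sum comparison of two independent groups. It assumes the normal approximation with tie correction and no continuity correction, which may differ from the implementation used in the study; the abundance values are hypothetical and `rank_sum_test` is an illustrative name.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation (tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i                            # size of this tie group
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0            # Mann-Whitney U for the first group
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                           # all values identical: no evidence of a shift
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return u1, z, p


# Hypothetical relative abundances of one taxon in two groups (not study data)
group_a = [0.01, 0.02, 0.03, 0.04, 0.05]
group_b = [0.06, 0.07, 0.08, 0.09, 0.10]
u_stat, z_score, p_value = rank_sum_test(group_a, group_b)
```

Because the test works on ranks rather than raw values, it makes no assumption of normality, which is the reason given above for choosing it.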
2.5.1 Analysis of microbial flora composition at the phylum level
The samples from the two groups comprised 12 phyla. The five most abundant phyla in the ESCC group were Firmicutes, Bacteroidetes, Proteobacteria, Fusobacteria and Actinobacteria; in the HC group they were Proteobacteria, Firmicutes, Bacteroidetes, Fusobacteria and Actinobacteria. The phyla that differed significantly between the two groups were Cyanobacteria, Proteobacteria, SR1 and TM7 (P < 0.05). Specifically, compared with the HC group, the abundance of Cyanobacteria was higher in the ESCC group, whereas the abundances of Proteobacteria, SR1 and TM7 were lower (P < 0.05, Table 1). This suggests that the colonizing flora changes dynamically during the occurrence and development of ESCC, which lays a foundation for further research on its role in that process.
Table 1
Significant differences in phylum levels between the two groups (mean ± sd)
Phylum | HC | ESCC | P-value* |
Cyanobacteria | 0.000 ± 0.001 | 0.003 ± 0.008 | 0.041 |
Proteobacteria | 0.332 ± 0.157 | 0.217 ± 0.181 | 0.022 |
SR1 | 0.003 ± 0.003 | 0.001 ± 0.004 | 0.023 |
TM7 | 0.005 ± 0.004 | 0.002 ± 0.004 | 0.001 |
Notes: HC: healthy controls; ESCC: esophageal squamous cell carcinoma; *: Wilcoxon test
2.5.2 Analysis of microbial flora composition at genus level
The top five genera in the ESCC group were Streptococcus, Prevotella, Haemophilus, Neisseria and Veillonella, while the top five genera in the HC group were Streptococcus, Neisseria, Veillonella, Prevotella and Haemophilus (P < 0.05). Compared with the HC group, the abundances of Bifidobacterium, Collinsella, Parabacteroides, Paraprevotella, Coprococcus, Lachnospira, Roseburia, Faecalibacterium, Ruminococcus, Dialister and Megamonas were higher in the ESCC group, and the abundances of Prevotella, Lysinibacillus, Streptococcus, Megasphaera, Veillonella, Leptotrichia, Ralstonia, Neisseria, Helicobacter, Haemophilus and Acinetobacter were lower (P < 0.05, Table 2).
Table 2
Significant differences in genus levels between the two groups (mean ± sd)
Genus | HC | ESCC | P-value* |
Actinomyces | 0.015 ± 0.012 | 0.003 ± 0.004 | 0.000 |
Rothia | 0.011 ± 0.014 | 0.007 ± 0.025 | 0.001 |
Bifidobacterium | 0.001 ± 0.003 | 0.013 ± 0.041 | 0.000 |
Collinsella | 0.000 ± 0.001 | 0.002 ± 0.004 | 0.046 |
Parabacteroides | 0.003 ± 0.008 | 0.007 ± 0.010 | 0.001 |
Porphyromonas | 0.015 ± 0.012 | 0.015 ± 0.027 | 0.005 |
Paraprevotella | 0.000 ± 0.000 | 0.001 ± 0.002 | 0.000 |
Prevotella | 0.048 ± 0.038 | 0.034 ± 0.066 | 0.002 |
Lysinibacillus | 0.015 ± 0.017 | 0.006 ± 0.014 | 0.013 |
Streptococcus | 0.120 ± 0.126 | 0.071 ± 0.081 | 0.005 |
Coprococcus | 0.000 ± 0.001 | 0.001 ± 0.003 | 0.001 |
Lachnospira | 0.002 ± 0.005 | 0.005 ± 0.007 | 0.044 |
Roseburia | 0.001 ± 0.002 | 0.004 ± 0.006 | 0.013 |
Faecalibacterium | 0.004 ± 0.011 | 0.020 ± 0.033 | 0.012 |
Ruminococcus | 0.001 ± 0.004 | 0.005 ± 0.008 | 0.001 |
Dialister | 0.001 ± 0.009 | 0.006 ± 0.013 | 0.004 |
Megamonas | 0.004 ± 0.012 | 0.008 ± 0.020 | 0.027 |
Megasphaera | 0.004 ± 0.006 | 0.002 ± 0.005 | 0.034 |
Phascolarctobacterium | 0.002 ± 0.003 | 0.008 ± 0.011 | 0.025 |
Veillonella | 0.051 ± 0.051 | 0.024 ± 0.032 | 0.001 |
Leptotrichia | 0.009 ± 0.010 | 0.006 ± 0.017 | 0.002 |
Sutterella | 0.001 ± 0.004 | 0.003 ± 0.005 | 0.005 |
Ralstonia | 0.011 ± 0.016 | 0.001 ± 0.004 | 0.000 |
Neisseria | 0.071 ± 0.096 | 0.025 ± 0.057 | 0.000 |
Helicobacter | 0.009 ± 0.014 | 0.002 ± 0.016 | 0.000 |
Haemophilus | 0.040 ± 0.035 | 0.032 ± 0.047 | 0.018 |
Acinetobacter | 0.003 ± 0.004 | 0.002 ± 0.009 | 0.030 |
Notes: HC: healthy controls; ESCC: esophageal squamous cell carcinoma; *: Wilcoxon test
Table 3
Comparison of Phenotype Classification Based on BugBase
BugBase | KS_pvalue |
Aerobic | 0.000826148 |
Anaerobic | 0.000439153 |
Contains_Mobile_Elements | 7.84E-05 |
Facultatively_Anaerobic | 0.03818119 |
Forms_Biofilms | 0.0180091 |
Gram_Negative | 0.08919101 |
Gram_Positive | 0.08919101 |
Potentially_Pathogenic | 0.009671794 |
Stress_Tolerant | 0.006169115 |
2.6 Random forest classification and ROC curve (genus level)
The random forest method was used to select the top 60 genera and build a classification model (Fig. 5). The ROC curve showed that the model reliably distinguished the two groups of samples (AUC = 0.90, Fig. 6).
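To make the AUC figure concrete, the sketch below computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The labels and scores are hypothetical stand-ins for a classifier's predicted ESCC probabilities, and `roc_auc` is an illustrative helper, not the routine used in the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = ESCC, 0 = HC; scores are predicted probabilities of ESCC
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)   # 8 of 9 positive-negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, so a value near 0.90 means most ESCC samples are ranked above most HC samples.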
2.7 Comparison of Phenotypic Classification Based on BugBase to Predict the Function of Microbial Metabolism
BugBase mainly performs phenotype prediction, covering Gram-positive and Gram-negative status, biofilm formation, pathogenicity, mobile elements, oxygen requirement (anaerobic, aerobic and facultatively anaerobic bacteria) and oxidative stress tolerance [20, 24]. Our results show that the ESCC group scored higher than the HC group for the Aerobic, Anaerobic, Contains_Mobile_Elements, Facultatively_Anaerobic, Forms_Biofilms, Potentially_Pathogenic and Stress_Tolerant phenotypes, and the differences were statistically significant (P < 0.05, Table 3).