Characteristics of Esophagus Flora in Patients with Esophageal Squamous Cell Carcinoma in Central China


Microecology may be involved in tumorigenesis and tumor development through the induction of chronic inflammation and may be related to esophageal squamous cell carcinoma (ESCC). This study aimed to characterize the esophageal flora in ESCC and to preliminarily analyze the key genera in ESCC. Esophageal tissue samples were collected from 72 ESCC patients and 20 healthy controls (HC). Genomic DNA was extracted, the V4 hypervariable region of the bacterial 16S rRNA gene was amplified by PCR, and bacterial characteristics were analyzed by Illumina MiSeq sequencing. The richness of the esophageal flora was significantly higher in ESCC patients than in HC tissue (Alpha diversity: Chao1 index, P = 0.0029), whereas the Shannon index (P = 0.5088) and Simpson index (P = 0.5894) did not differ significantly. The variability of the flora (Beta diversity: PC1 16.62%, P PC1 = 0.0034, P PC2 = 8.5e-08) was also significantly higher in ESCC patients than in HC tissue. The five most abundant phyla in the ESCC group were Firmicutes, Bacteroidetes, Proteobacteria, Fusobacteria and Actinobacteria, while those in the HC group were Proteobacteria, Firmicutes, Bacteroidetes, Fusobacteria and Actinobacteria. The top five genera in the ESCC group were Streptococcus, Prevotella, Haemophilus, Neisseria and Veillonella, while the top five genera in the HC group were Streptococcus, Neisseria, Veillonella, Prevotella and Haemophilus. The ESCC group was higher than the HC group in the predicted phenotypes Aerobic, Anaerobic, Contains-Mobile-Elements, Facultatively-Anaerobic, Forms-Biofilms, Potentially-pathogenic and Stress-Tolerant, and the differences were statistically significant (P < 0.05). The esophageal microbial community structure of ESCC patients differs from that of HC, suggesting that the corresponding changes in the esophageal microflora are correlated with ESCC. The detection of esophageal microorganisms therefore has potential as a means of early diagnosis or screening of ESCC in the future.


Introduction
Esophageal carcinoma (EC) ranks seventh in cancer incidence rate and sixth in mortality in the world [1].
EC has two pathological types, namely, esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC) [2]. China's EC incidence rate ranks in the top five globally [3], with ESCC spatial distribution and cannot be cultured. The flora from the oral cavity and esophagus to the rectum are diverse and differ in number [14].
Research on microecology and ESCC is still at an early stage. Knowledge of the functions of the microecology and its host is limited [15]. Exploring the microecological status of ESCC patients can support further studies on ESCC. In this paper, Illumina HiSeq technology was used to observe the characteristics of the ESCC flora and to preliminarily analyze the key genera in ESCC, providing a theoretical basis for the study of the ESCC flora.

Sample source
Seventy-two ESCC patients who were hospitalized in the digestive endoscopy center and the thoracic surgery department of Taihe Hospital from July 2018 to July 2019 were assigned to group B; all were confirmed by pathology. Twenty healthy individuals examined by endoscopy in the center during the same period were assigned to the healthy control group (HC, group A). There was no significant difference in gender or age between the two groups (Supplementary Table S1).
The study protocol was reviewed and approved by the Taihe Hospital Ethics Committee, and written consent was obtained from all patients before they participated in this research. Moreover, this study was conducted following the provisions of the Helsinki Declaration.

Inclusion and exclusion criteria
Inclusion criteria: (1) Age ≥ 18 years, and for patients in group B a histological diagnosis of ESCC; (2) Good general condition with no metabolic diseases, such as diabetes and hyperlipidemia, and no infectious diseases; (3) No use of antibiotics, acid inhibitors, or probiotics that affect the esophageal flora in the past 2 months; (4) No special eating habits; (5) No serious liver or kidney disease and no immune deficiency; and (6) Informed consent for this study.
Exclusion criteria: (1) Received treatments that affect microecology in the past 2 months; (2) Combined autoimmune diseases; (3) Past or present tumors other than esophageal cancer; (4) Incomplete information; and (5) Considered inappropriate for the study by the researchers.

Specimen Collection
Endoscopy center specimens: After fasting for 6-8 h, the patients underwent gastroscopy. They gargled with warm water before the examination, and 4-8 specimens were taken after esophageal lesions were found. Two specimens were labeled and immediately placed in liquid nitrogen, and the remaining tissues were routinely stored in formalin and sent to the pathology department. For group A, the researchers collected two esophageal mucosa specimens, which were processed as described above. Surgical specimens were frozen after resection and sent to the pathology department, and appropriate samples were selected for subsequent research in accordance with the inclusion criteria.

OTU clustering and species annotation
Vsearch version 2.4.4 was used to analyze operational taxonomic units (OTUs). OTUs were clustered at 97% similarity. The representative sequences were annotated against the SILVA 128 database, and the composition of the flora of all samples was counted at the phylum, class, order, family, and genus levels. The abundance and classification of the OTUs were recorded.
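As a rough illustration of what clustering at 97% similarity means, the sketch below groups sequences greedily around centroids. It is not the Vsearch algorithm: it assumes equal-length, pre-aligned sequences and uses a simple position-by-position identity, whereas Vsearch computes identity from pairwise alignments and orders sequences before clustering. The function names and the toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this simplified sketch only handles equal-length sequences")
    return sum(1 for p, q in zip(a, b) if p == q) / len(a)


def greedy_cluster(sequences, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns a list of clusters; the first member of each cluster is its centroid.
    """
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Toy 100-nt reads (made up for illustration, not study data).
base = "ACGT" * 25
one_mismatch = "T" + base[1:]        # 99% identical to base
six_mismatches = "T" * 8 + base[8:]  # 94% identical to base

clusters = greedy_cluster([base, one_mismatch, six_mismatches])
print([len(c) for c in clusters])
```

With these toy reads, the first two sequences fall into one OTU and the third starts a new one, because its identity to the first centroid is below the threshold.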

Bioinformatics analysis and statistical analyses
Sequence data analysis was mainly performed using QIIME (Quantitative Insights Into Microbial Ecology, version 1.8.0, http://qiime.org/) and R (www.r-project.org, version 3.2.0). The Alpha diversity indices at the OTU level were calculated using QIIME, and the OTU abundance and evenness of the samples were compared. Beta diversity examines the similarity of the community structure of different samples, with similar samples clustering together.
UniFrac distances were calculated using QIIME [16], and principal component analysis (PCA), principal co-ordinates analysis (PCoA), and nonmetric multidimensional scaling (NMDS) plots were generated to analyze the Beta diversity of the different samples.
The t-test and the Monte Carlo permutation test were used to compare the UniFrac distances between groups, and the results were drawn as box plots. The differences in flora structure between groups were evaluated by PERMANOVA (permutational multivariate analysis of variance) [17]. The R package vegan was used to visualize classified groups and abundance based on the MEGAN software [18, 19] and Graphical Phylogenetic Analysis (GraPhlAn) [20]. A Venn diagram was generated with the R package "VennDiagram" to visualize the OTUs shared between samples or groups. Differences at each classification level within or between groups were compared using the Kruskal method in R.
Linear discriminant analysis effect size (LEfSe) analysis [21] combines linear discriminant analysis with the Kruskal-Wallis and Wilcoxon rank sum tests to screen out intergroup biomarkers, such as species, with significant differences. For random forest analysis, the R package "randomForest" was used with default settings to compare the differences between groups. Microbial function was predicted on the basis of PICRUSt [22]. The Statistical Analysis of Metagenomic Profiles (STAMP) software package version 2.1.3 [23] was used for further analysis of the output file. Parallel-META 3 (version 3.3.2) was used to complete the Beta diversity analysis of species and functions based on the Meta-Storms distance. Statistical analysis was performed using SPSS version 21.0 (SPSS Inc., Chicago, USA). A difference was considered statistically significant when P < 0.05.

Sample sequencing data
The offline data met the test requirements after processing into sequence tags, clean tags, and OTUs. After clustering at 97% similarity, 4003 OTUs were obtained, of which 1290 were in the HC group (group A), 3763 were in the ESCC group (group B), and 1050 were in both groups (Fig. 1).
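The reported counts can be checked against each other with the two-set inclusion-exclusion rule. The short sketch below uses only the numbers stated in the text; the function name is hypothetical.

```python
def venn_counts(a, b, shared):
    """Return (only_a, only_b, union) for a two-set Venn diagram."""
    only_a = a - shared
    only_b = b - shared
    return only_a, only_b, only_a + only_b + shared


# OTU counts as reported in the text.
otus_hc = 1290      # OTUs detected in the HC group (group A)
otus_escc = 3763    # OTUs detected in the ESCC group (group B)
otus_shared = 1050  # OTUs detected in both groups

only_hc, only_escc, total = venn_counts(otus_hc, otus_escc, otus_shared)
print(only_hc, only_escc, total)  # 240 2713 4003
```

The union of 4003 matches the total number of OTUs reported, so the figures are internally consistent: 240 OTUs were found only in the HC group and 2713 only in the ESCC group.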

Alpha diversity analysis
The Shannon and Simpson indices estimate species diversity. Both indices were higher in the ESCC group than in the HC group, but the differences were not statistically significant (P > 0.05). The Chao1 index estimates the total number of species in a community. The Chao1 index differed significantly between the two groups (P < 0.05), indicating that the diversity of the microorganisms did not differ between the two groups, but the estimated species richness of the ESCC group was higher than that of the HC group (Fig. 2).
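For readers unfamiliar with these indices, the following sketch computes them from a vector of per-OTU read counts for one sample. It is an illustration, not the QIIME implementation: the logarithm base for the Shannon index, the 1 - sum(p^2) form of the Simpson index, and the bias-corrected form of Chao1 are assumptions, and the counts are made up.

```python
import math


def shannon(counts, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the form 1 - sum(p_i^2); higher means more diverse."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs observed exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2.0 * (doubletons + 1))


# Hypothetical OTU counts for a single sample (not study data).
sample = [10, 5, 2, 2, 1, 1, 1, 0]
print(shannon(sample), simpson(sample), chao1(sample))
```

Because Chao1 depends on the rare OTUs (singletons and doubletons) while Shannon and Simpson are dominated by the abundant ones, two groups can differ in Chao1 without differing in the other two indices, which is the pattern reported here.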

Beta diversity analysis
Beta diversity examines the similarity of the community structure of different samples, with similar samples clustering together. PCoA finds the most important coordinates in the distance matrix and shows the differences between individuals or groups. PC1 and PC2 represent suspected influencing factors for the shift in the microbial composition of the two groups. PC1 accounted for 16.62% of the variation (P PC1 = 0.0034), suggesting a difference in the composition of the microecological flora between the two groups (Fig. 3A).
For data with a complex structure, the ordination results of NMDS are more stable than those of PCoA. NMDS analysis also distinguished the two groups in this study (Fig. 3B).

Structural analysis of esophagus flora in two groups
Because the esophageal mucosa flora data did not conform to a normal distribution, the Wilcoxon test was used to compare the relative abundance of bacteria between the two groups. Significance analysis was performed at each of the taxonomic levels above. This study only shows the results at the phylum and genus levels; the analysis of the microbial flora composition at other levels is provided in Supplementary Tables S2-4.
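The rank-based comparison described here can be sketched as a two-sided Wilcoxon rank-sum test with a normal approximation and tie correction. This is a minimal standard-library version for illustration, without continuity correction or exact small-sample p-values; it is not the R routine used in the study, and the relative abundances are made up.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic of x, z score, two-sided p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return u1, z, p


# Hypothetical relative abundances of one taxon in two groups (not study data).
group_a = [0.10, 0.20, 0.30, 0.40]
group_b = [0.50, 0.60, 0.70, 0.80]
u, z, p = rank_sum_test(group_a, group_b)
print(u, z, p)
```

Because the test uses ranks rather than raw values, it does not require the relative abundances to be normally distributed, which is the reason given for choosing it.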

Analysis of microbial flora composition at the phylum level
The two groups together contained 12 phyla. The five most abundant phyla in the ESCC group were Firmicutes, Bacteroidetes, Proteobacteria, Fusobacteria and Actinobacteria. The five most abundant phyla in the HC group were Proteobacteria, Firmicutes, Bacteroidetes, Fusobacteria and Actinobacteria. The phyla that differed significantly between the two groups were Cyanobacteria, Proteobacteria, SR1 and TM7 (P < 0.05). Specifically, compared with the HC group, the abundance of Cyanobacteria in the ESCC group was increased, while the abundance of Proteobacteria, SR1 and TM7 was decreased (P < 0.05, Table 1). This indicates that the colonizing flora changes dynamically during the occurrence and development of ESCC, which lays a foundation for further research on the role of the colonizing flora in the occurrence and development of ESCC.
The random forest method was used to select the top 60 species to establish a model (Fig. 5), and the ROC curve was then used to verify that the model is reliable and can effectively distinguish the two groups of samples (AUC = 0.90, Fig. 6).
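To make the AUC figure concrete, the sketch below computes the area under the ROC curve directly from classifier scores as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, counting ties as one half. It does not reproduce the random forest model of the study; the labels and scores are made-up toy values and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g. ESCC), 0 for the negative class (e.g. HC).
    scores: classifier scores, higher meaning more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (not study data): three positive and two negative samples.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a value of 0.90 means that in roughly nine of ten ESCC-HC pairs the model assigns the higher score to the ESCC sample.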

Comparison of Phenotypic Classification Based on BugBase to Predict the Function of Microbial Metabolism
BugBase mainly performs phenotypic prediction, including Gram-positive, Gram-negative, biofilm formation, pathogenicity, mobile elements, oxygen requirement (anaerobic, aerobic, and facultative bacteria) and oxidative stress tolerance [20, 24]. Our study shows that the ESCC group was higher than the HC group in Aerobic, Anaerobic, Contains-Mobile-Elements, Facultatively-Anaerobic, Forms-Biofilms, Potentially-pathogenic and Stress-Tolerant, and the differences were statistically significant (P < 0.05, Table 4).

Discussion
The esophagus is located between the flora-rich oropharynx and the stomach, and has fewer species than the oral cavity [24]. In the 1990s, culture-based studies suggested that the esophagus is either sterile or contains only a small number of bacteria that are swallowed or introduced by gastric reflux [25]. Pei et al [26] confirmed that the esophageal flora mainly includes six phyla: Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, Fusobacteria and TM7. Yang et al [27] divided the esophageal flora into two types, I and II. The former is composed of Gram-positive bacteria mainly found in the esophagus of normal people, and the latter is composed of Gram-negative bacteria mainly found in patients with Barrett's esophagus (BE) or esophagitis.
Li et al [28] found that the normal esophageal flora is similar to the oral flora. Firmicutes is the most abundant phylum, and no specific flora exists in different parts of the esophagus.
Recent studies [15, 29] have shown that the abundance of archaea, phages, and other flora in the normal esophagus is low, and that the flora mainly includes Streptococcus, Prevotella, Veillonella, Clostridium, Haemophilus, Neisseria and Porphyromonas. In the present study, the HC group included 12 phyla. The most abundant classes were Bacteroidia, Bacilli, Gammaproteobacteria, Clostridia and Betaproteobacteria, which is basically consistent with these reports, apart from some differences that may reflect the impact of region on dietary structure and living environment.
Understanding the microecology of ESCC is highly important. Peters et al [30] found that oral Porphyromonas gingivalis increases the risk of ESCC. Chen et al reported that the oral flora diversity of ESCC patients is reduced and that the abundance of Lautropia, Bulleidia, Catonella, Corynebacterium, Peptococcus and Cardiobacterium in ESCC patients is lower than that in non-ESCC individuals. Wang et al [8] confirmed that, at the genus level, Actinomyces and Atopobium in saliva were associated with a high risk of ESCC, while Fusobacterium and Porphyromonas were associated with healthy people. Esophageal squamous dysplasia (ESD) is a precancerous lesion of ESCC. Yu et al [32] found that low microbial abundance and the composition of the upper digestive tract microbiota in the Chinese population are related to ESD. Yamamura et al [33] confirmed that Fusobacterium nucleatum is significantly higher in 23% of EC (EAC, ESCC, etc.) than in non-tumor tissue (P = 0.021) and is related to EC severity and prognosis. Liu et al [34] reported that the abundance of the esophageal phyla Bacteroidetes, Firmicutes, and Spirochaetes is increased and that of Proteobacteria is decreased in ESCC patients with positive lymphatic metastasis. Prevotella and Treponema were more abundant in metastasis-positive patients than in negative patients, were related to poor prognosis, and may be indicators of prognosis in ESCC. Gao et al [35] found that Porphyromonas gingivalis is often present in ESCC and ESD and that its infection is related to disease severity and prognosis, so it can be used as a marker for the diagnosis and outcome of ESCC. Studying the flora may therefore be beneficial to the early diagnosis, disease evaluation, and prognosis of ESCC.
In the present study, the abundance of Cyanobacteria in the ESCC group increased, while the abundance of Proteobacteria, SR1 and TM7 decreased. This suggests that Proteobacteria, SR1 and TM7 may have a protective effect on the normal esophagus. During the development of ESCC, the balance of the flora was broken, and Cyanobacteria may be involved in the development of ESCC; however, the detailed mechanism needs to be further studied.
We found that the abundance of Bifidobacterium, Collinsella, Parabacteroides, Paraprevotella, Coprococcus, Lachnospira, Roseburia, Faecalibacterium, Ruminococcus, Dialister, and Megamonas was higher in the ESCC group, and the abundance of Prevotella, Lysinibacillus, Streptococcus, Megasphaera, Veillonella, Leptotrichia, Ralstonia, Neisseria, Helicobacter, Haemophilus and Acinetobacter was lower than that in the HC group. Shao et al [36] reported that the ESCC flora is mainly composed of Firmicutes, Bacteroidetes and Proteobacteria. Clostridium and Streptococcus are less abundant in ESCC tissues than in non-tumor tissues. The abundance of Bifidobacteria, Clostridium prasium, and Ruminococcus aureus in the feces of EC patients was significantly lower than that of the HC population (P < 0.05) [36]. A study by the University of Chicago [38] found that malignant melanoma mice whose intestines contain a high abundance of Bifidobacterium have stronger CD8+ T cell activity and respond better to immune checkpoint inhibitors. Recent studies [39] have shown that Bifidobacterium can effectively fight cancer cells and is associated with substantial improvement of gastrointestinal cancer. Liu et al [34] found that Prevotella and Treponema are more abundant in ESCC patients with positive lymphatic metastasis and that Streptococcus and Prevotella are associated with poor prognosis in ESCC patients, suggesting that Streptococcus and Prevotella are prognostic indicators of ESCC.
In future research, cell culture systems and animal models should be added to examine the pathogenic role of microorganisms in ESCC.
Research on the above-mentioned flora may help to formulate new strategies for ESCC prevention, diagnosis, early intervention and treatment.
Alpha diversity analysis of the two groups of bacteria showed that the diversity of the ESCC group flora was higher than that of the HC group. This result may be related to factors such as the change in flora during the occurrence and development of ESCC and the relative specificity of different types of tumors.
In Beta diversity analysis, PCoA showed that the red dots representing the EC specimens and the blue dots representing the HC specimens overlap on the coordinate axes, and PC1, along which the two groups separate, accounted for 16.62% of the variation. NMDS analysis showed that the esophageal bacterial variation in the ESCC group was greater than that in the control group. These results suggest that the ESCC group had high abundance and large variation.
LEfSe analysis found that a high risk of ESCC may be related to Phascolarctobacterium, Dialister, Clostridiales, S24_7, Rikenellaceae, and so on. Currently, Helicobacter pylori (H. pylori) infection is generally associated with EAC [40], but whether ESCC is related to this infection remains controversial [41, 42]. The present study found that the abundance of Helicobacter in the EC group was reduced, but this genus has a large number of strains, and the correlation between H. pylori and ESCC needs to be studied.
We used the random forest method and selected the top 60 species to establish a model. The ROC curve was used to verify that the model is reliable and can effectively distinguish the two groups of samples. Further comparison of the BugBase phenotypes found that the ESCC group was higher than the HC group in metabolism-related phenotypes, such as Aerobic, Anaerobic, Contains-Mobile-Elements, Facultatively-Anaerobic, and Forms-Biofilms, suggesting that the related flora is abundant and that later work can be extended to bacterial functions. Although the pathogenesis of ESCC is not caused solely by the activity of the bacterial flora, homeostasis of the bacterial flora may play an important role in the occurrence and development of EC and provide important clues and references for ESCC.
Collectively, data from the present study reveal that the esophageal microbial community structure of ESCC patients is different from that of HC, and the corresponding changes in the esophageal microflora may be correlated with ESCC. This indicates that the detection of esophageal microorganisms has potential as a means of early diagnosis or screening of ESCC in the future. However, this study also has shortcomings. First, the ESCC patients included in the study were mostly male, and the number of ESCC patients was small. Second, the analysis of the flora of early esophageal cancer and precancerous lesions was lacking. Monitoring the changes in the composition of the flora during different stages of tumorigenesis and development will help clarify the mechanism of ESCC. The sample size and the range of pathological types will be increased in our future studies to explore further the role of the esophageal flora in ESCC.

proportional to the OTU abundance. Red represents the HC group and green represents the ESCC group.
The threshold for a significantly different logarithmic LDA score was set to 2. Only the groups with significant differences are shown in the figure.

Figure 5
Species importance map. Notes: The abscissa is the importance level, and the ordinate is the species name sorted according to importance. The figure shows the genera that play a major role in the classification effect of the classifier, arranged from largest to smallest. Error rate indicates the error rate of the random forest method for predicting classification using the features below. A higher error rate indicates that the classification accuracy based on the genus characteristics is low and that the genus characteristics may not differ clearly between the groups. Taking all levels as an example, the top 60 species are used for the drawing.

Figure 6
The ROC curve used to analyze the clinical accuracy of using differential bacteria obtained from the ESCC group and HC group for the diagnosis of ESCC. Notes: On the ROC curve, the point closest to the upper left of the graph is the critical value with higher sensitivity and specificity.