Characteristic and Interplay of Esophageal Microbiota in Esophageal Squamous Cell Carcinoma

Background: Esophageal squamous cell carcinoma(ESCC) is severe cancer in the world. The role of esophageal microbiota for ESCC is still uncertain. In the current study, 120 paired tissues from ESCC patients were collected, and 16s rRNA sequencing was performed to explore the esophageal microbiota. Results: The present investigation shows that the diversity and composition of the microbiota in ESCC cancerous tissues and para-cancerous tissues is signicantly different, this variation between subjects beta diversity mainly explained by regions and sampling seasons. Species R.Mucilaginosa, P.Endodontalis, unclassied species in genus Leptotrichia, genus Phyllobacterium, and genus Sphingomonas were enriched in cancerous tissue. On the other hand, class Bacilli, N.Subava, H.Pylori, A.Parahaemolyticus, A.Rhizosphaerae, unclassied species in genus Campylobacter and genus Haemophilus were increased in para-cancerous tissue. Compared with the co-occurrence network in cancerous tissue, a denser and more complex association network was formed in para-cancerous tissue. Moreover, the above differential taxa also participated in both co-occurrence network but played quite different roles. Finally, the functional association analyses revealed the altered signaling pathways in ESCCs were correlated to esophageal microbiota. Conclusion: Compared with para-cancerous tissues, microbiota in cancerous tissues showed signicant differences in diversity and composition. The alterations in microbial co-occurrence network and functional pathways in ESCC tissues may be involved in carcinogenesis and the maintenance of local microenvironment for ESCC. These discoveries of the esophageal microbiota for ESCC patients may contribute to the etiology for ESCC prevention, diagnosis, early intervention, and treatment. unclassied species in genus Campylobacter and unclassied species in genus Haemophilus were enriched in para-cancerous tissue. Next, to explore the confounding effect of regions, sampling seasons, tumor location and risk index on host microbiota, we added these four covariates into ANCOM2 analysis. The results showed that the relative abundance of unclassied species in genus Leptotrichia, unclassied species in genus Sphingomonas and A.Rhizosphaerae could be inuenced by regions; the relative abundance of unclassied species in genus Campylobacter could be inuenced by sampling seasons and tumor location. However, none of the differential taxa were signicantly various in either low or high risk index with ESCC. tumor, node, and metastasis; HE: haematoxylin-eosin; SDS: sodium dodecyl sulphate; NBC: Naïve Bayes Classier; QIIME: Quantitative Insights Into Microbial Ecology; ASVs: amplicon sequence variants; FDR: false discovery rate.

In multivariate Adonis test, the beta diversity of cancerous tissue was signi cantly different compared with para-cancerous tissue whichever measured by Jaccard distance(P=0.004), Unweighted unifrac distance(P=0.004) or Weighted unifrac distance(P=0.028) (Figure 2A). About 1%-2% of the variance in beta diversity was explained by tissue type. What is more, the differences of beta diversity between within paired cancerous and para-cancerous tissue were associated with regions(P=0.032) and sampling seasons(P=0.042) based on Unweighted unifrac distance( Figure 2B).

Microbial composition analysis
Pair-wise PCoA results were displayed in Figure 4A. Based on Bray Curtis, Jacarrd, and Unweighted unifrac distance, microbiota from the two tissue types were separated into clusters.Total 9,453 features were found after the sequence denoised in all samples( Figure 6A). More taxa were observed in para-cancerous compared with cancerous tissue(8,133 vs 6,533) and about 5,213 taxa were detected in both tissue( Figure 3B). The composition of esophageal microbiota between cancerous and para-cancerous tissue in level phylum and genus was shown in Figure 3C and Figure S1, repectively. To elucidate the phylogenetic relationship of the cancerous and para-cancerous microbiota, the heat trees of microbiota(the relative abundance >0.1%) were plotted in Figure 3D. The relative abundance of correspondent branches of bacteria in phylum Proteobacteria, Bacteroidetes, Fusobacteria and Firmicutes was similar between cancerous and para-cancerous tissue. However, some bacteria were enriched in different tissues( Figure 3E). In class Alphaproteobacteria, the branches of bacteria of order Rhizobiales and Sphingomonadales were enriched in cancerous tissue while order Rhodospirillles was higher in para-cancerous tissue. The most bacteria of class Alphaproteobacteria and its branch were enriched in par-cancerous tissue except family Klebsiella. Moreover, the microbiota from phylum TM7 and its branches of bacteria, class Bacilli, family Helicobacteraceae and its genus Helicobacter and its species pylori were higher in paracancerous tissue.
Differential abundance analysis A total of 56 differential taxa were picked out by ANCOM2 algorithm( Figure 3G). As the host region exerted the strongest effect on microbiota, we divided all participants into Zhangzhou group and other regions group (Table S1).There were 32, 36, 16 differential abundance taxa between cancerous and paracancerous tissue in all regions group, Zhangzhou city group and other regions group, respectively( Figure 3F). There were only three shared differential bacteria in cancerous and para-cancerous tissue from different regions named family Enterobacteriaceae, unclassi ed species from genus Sphingomonas and genus Phyllobacterium ( Figure 3G). It was clearly ascertained differential bacteria in phylum Firmicutes and Proteobacteria were enriched in para-cancerous tissue from Zhangzhou city. Moreover, we observed an interesting alteration that the unclassi ed species in genus Mycoplane was enriched in cancerous tissue from other regions whereas enriched in para-cancerous tissue from Zhangzhou city. Sampling seasons were another powerful factor that in uenced host microbiota( Figure 3G). The microbiota in phylum Cyanobacteria in cancerous tissue sampled in summer had a higher relative abundance. The relative abundance of bacteria in phylum Proteobacteria were signi cantly enriched in para-cancerous tissue when sampled in spring and summer from Zhangzhou city.
Hence, the dominant candidate differential taxa (Table 2) between cancerous and para-cancerous tissue were selected according to the grand means of relative abundance were exceeded 0.1% from above 56 differential taxa. They were species R.Mucilaginosa, P.Endodontalis, unclassi ed species in genus Leptotrichia, unclassi ed species in genus Phyllobacterium, and unclassi ed species in genus Sphingomonas, which enriched in cancerous tissue. On the other hand, class Bacilli, N.Sub ava, H.Pylori, A.Parahaemolyticus, A.Rhizosphaerae, unclassi ed species in genus Campylobacter and unclassi ed species in genus Haemophilus were enriched in para-cancerous tissue. Next, to explore the confounding effect of regions, sampling seasons, tumor location and risk index on host microbiota, we added these four covariates into ANCOM2 analysis. The results showed that the relative abundance of unclassi ed species in genus Leptotrichia, unclassi ed species in genus Sphingomonas and A.Rhizosphaerae could be in uenced by regions; the relative abundance of unclassi ed species in genus Campylobacter could be in uenced by sampling seasons and tumor location. However, none of the differential taxa were signi cantly various in either low or high risk index with ESCC.

Microbial co-occurrence networks
To understand the interaction among esophageal microbiota in cancerous and para-cancerous tissue, we illustrated the microbial co-occurrence networks of two groups. There were 3,089 positive and 348 negative correlations in cancerous tissue, while the para-cancerous samples had 3,761 positive and 355 negative correlations. Obviously, the microbial co-occurrence networks were distinct between the cancerous and para-cancerous tissue ( Figure 4A). But, widely correlations were investigated in family Lachnospiraceae, species C.aerofaciens and unclassi ed species in genus Blautia in both of the two tissue types.
To quantify such difference, we counted the number of node and its centrality in the microbial networks under different tissue type. As expected, the numbers of interacted microbiota in para-cancerous tissue(273 nodes) were higher than that in cancerous tissue(201 nodes)( Figure 4A and B). The centrality of cooccurrence networks was described with three dimensions which were degree, betweenness, and closeness centrality, respectively. Interestingly, the degree and closeness of shared nodes between cancerous and para-cancerous tissue were quite different. Next, the edges of the networks were evaluated ( Figure 4C).
Despite having a few overlapped edges, the distribution of rank of overlapped edges varies in cancerous and para-cancerous tissue.
To reveal the importance of above candidate differential taxa in the network, the heatmap was shown in Figure 4D. The most of differential taxa in the network were from phylum Proteobacteria and certain differential taxa did not emerged in the co-occurrence networks The role of differential taxa in different networks(cancer vs para-cancer) was also various. The discrepancies of microbial co-occurrence networks in two groups may be attributed to the speci c metabolism in different tissue types.
The association between esophageal microbiota and predict function Most of the differential taxa in para-cancer were negatively associated with EC 2.7.10.1 which regulated the EGFR, ERBB2, ERBB4, and FGFR1 signaling pathways. Different from this, the differential taxa in cancer were positively associated with EC2.  Among the eight MetaCyc metabolic pathways which were signi cantly differed between cancer and para-cacner tissues ( Figure 5B, Table S2), the relative abundance of PWY-3661 and PWY-7431 was increased in cancerous tissue, and other six pathways(PWY-1882, PWY-5265, PWY-6565, PWY-6731, PWY-6906, and PWY-7391) enriched in para-cancerous tissue. Notably the differential taxa played various roles in different tissues. For example, unclassi ed species from genus Phyllobacterium and Sphingomonas were positively associated with PWY-6906 and PWY-1882 in cancer, whereas negatively with PWY-6731and PWY-6565 in para-cancer. Similarly, the correlation was demonstrated between taxa which were A.Parahaemolyticus and A.Rhizosphaerae and pathways including PWY-7431, PWY-7391, PWY-6731, and PWY-1882 in cancer, but PWY-6565 in para-cancer. In addition, the unclassi ed species in genus Haemophilus only associated with the pathways in cancer.

Discussion
To characterize the esophageal microbiota in the ESCC, we compared the microbial diversity and composition of paired cancerous and para-cancerous tissues for ESCC in this study. Moreover, 56 candidate differential taxa were determined, of which 13 taxa have more abundant. Besides, we suggested that the microbial co-occurrence network was another critical aspect of the microbial community. Furthermore, the relationship between differential taxa and predicted functional ESCC-related gene pathway was investigated.
We observed the differences of microbial diversity between the cancerous and para-cancerous tissues for ESCC, which was similar to ndings of Yang[8] and Yu [9]. The microbial structure is modi ed due to the altered esophageal microenvironment in the progression of carcinogenesis. Interestingly, in general linear regression model, the within-pair differences in microbial diversity were only impacted by regions and sampling seasons. It is in agreement with many studies that environmental factors including sampling regions or seasons are the relevant factors affecting microbiota. Daphna et al. report that more than one-fth of the difference in microbiome beta diversity can be inferred from diet and lifestyle for different seasons [16]. An animal experiment also indicated that seasonal variation plays a major role in the temporal changes of the microbial community structure [17]. Moreover, Samuel et al. found that the human fecal microbiota altered due to season cycling[18]. He et al illustrated that geographical factor could strongly alter the composition of gut microbiota [13]. However, there are no other studies considering the effect of environmental factors on the esophageal ora yet, except our previous research [10]. This study demonstrated again that the impact of sampling regions or seasons on the diversity of esophageal microbiota cannot be ignored. Therefore, further studies involving microbiomics are needed to be taken into consideration the effect of the environmental factors on microbiota to ensure the generalizability.
Several differential bacteria were detected in para-cancerous tissue for ESCC. Of note, strains of R.Mucilaginosa were differentially abundant in oral wash with head and neck squamous cell carcinoma compared to healthy controls [19]. It could be inferred that higher R.Mucilaginosa was always presented in cancer patients. What is more, it is reported R.Mucilaginosa upregulated TNF-α and upregulation of CD36 in all cell lines in cocultures [20] and increased abundance of R.Mucilaginosa exhibited the ability to produce acetaldehyde [21,22]. The above functional could explain the ndings of the present study that higher R.Mucilaginosa in esophageal cancerous tissue than that in para-cancerous. Another differential bacterium, P.Endodontalis, was dominantly associated with oral diseases [23]. Interestingly, P.Endodontalis was the most representative taxa which were abundant in gastric adenocarcinoma microbiome [24]. In accordance with our results, Richard et al identi ed colorectal mucosal genus Sphingomonas particularly enriched in colitis-associated cancer [25]. Moreover, Sphingomonas were higher in the bladder mucosa cancerous tissues [26]. This bacteria were called passenger bacteria according to Wang's hypothesis [27]. If microbiota is involved in carcinogenesis, we can expect that it will be enriched in the para-cancerous tissue. Many studies indicated N.sub ava is signi cantly more present in healthy control, as found in our study [28,29]. In addition, similar observations were previously reported H.Pylori was increased in adjacent compared to tumor tissues for gastric cancer, which is in the line with our results [30][31][32].It is been proven that H.Pylori is pathogenic bacteria again. Campylobacter usually linked to gingivitis and periodontitis, has recently been associated with gastrointestinal diseases [33]. Leung and Wang demonstrated Campylobacter as the key genera of the tumor mucosa in colorectal cancer [27,34]. Sandra investigated Campylobacter species appear to be more prevalent and abundant in the esophagus of patients in the Barrett's esophagus [35], but its abundance decreased with the progression to EAC [36]. Our current study suggested this nding in ESCC.
A disease-associated microenvironment could be affected by a speci c microbial network. Several studies have shown ecological interaction networks of microbiota in cancer and adjacent mucosal tissues in gastric [32] and colorectal cancer [27]. The esophagus is a part of the gastrointestinal tract, we highlight the esophageal microbial co-occurrence network for ESCC to extend the digestive researches. Our microbial co-occurrence network analysis suggests bacterial compositions in different sample types show speci c correlation patterns. Compared with the cancerous group, more microbial taxa were involved in the paracancerous group. And, they formed a denser and more complex association network. Interestingly, strong co-occurrence interactions formed by family Lachnospraceae, C.aerofaciens, and unde ned species from genus Blautia, showed the centralities of these taxa in both two networks. This suggests that hub taxa may have a predominant impact on the structure of the microbiota in ESCC patients, which deserves further investigations. Notably, the interactions of B.adolescentis in the cancerous samples may contribute to the formation of a disease-speci c interaction network. Further, the markedly increased interactions of Roseburia and its species R.faecis in the para-cancerous network might also contribute to the maintenance of the esophageal microenvironment. Moreover, the above differential taxa also participated in the co-occurrence network and had various importance. It is implicated that the microbial interaction is complex, the full consideration of microbial interact network should be taken in further microbiome studies.
We have preliminarily revealed differences in the predicted microbiota functions in cancerous and para-cancerous tissues. The enriched differential taxa in cancerous tissue were positively associated with the MET(EC2.1.1.107) and PTEN(EC3.1.3.16) pathway. The increased differential taxa in para-cancerous tissue were negatively associated with the EGFR, ERBB2, ERBB4, FGFR1(EC2.7.10.1), and MTOR(EC2.1.11.1) pathway. Mutations in PTEN, ERBB2, ERBB4, MET, and MTOR are found in ESCCs. Interestingly, regardless of the presence or absence of mutations in these genes in ESCCs, their products (ERBB2, MET, and MTOR) are often overexpressed in tumor tissues, indicating their vital carcinogenic role in ESCC [37]. It is possible that the abnormal pro led abundance of differential bacteria causes changes in expression of PTEN, ERBB2, ERBB4, MET, and MTOR and stimulates tumor growth. Additionally, the enriched differential MetaCyc pathways in cancerous tissue are linked to amino acid degradation. It has been implicated that microbiota in dysplasia tissues perturbed amino acid metabolism probably involving in the process of tumor development [38]. On the other hand, a large proportion of increased pathways in paracancerous tissue are associated with bacteria-related biomolecules synthesis. This could explain why the more abundant microbiota were detected in paracancerous. We found that H. pylori was negatively associated with the polyamine biosynthesis which was in line with the Takashima [39] investigation that H. pylori signi cantly inhibited proliferation of enzymes of polyamine biosynthesis of cells. What is more, Rousseau [40] explored that Haemophilus in animals created changes in peptidoglycan metabolism, it is also consistent with our results. To develop a deeper understanding of esophageal carcinogenesis, further studies are needed to examine the signi cance of microbial functional variations in the ESCC microenvironment.

Advantages and Limitation
In this study, esophageal mucosa samples were obtained undergoing surgery from ESCC patients, thus avoiding possible oral microbial contamination that may occur during upper digestive endoscopy sampling. Our work provided insights into the composition, function and interaction network of the mucosaassociated bacterial community in the tumor microenvironment in ESCC.
However, our study had certain limitations. Firstly, our experiment protocol of extract the DNA of bacteria did not include a bead-beating step. This could lead to an overestimation of the proportion of Gram negative bacteria in the results. Additionally, this study did not include esophageal tissues from individuals without ESCC for comparison, the application of these results was limited. Furthermore, PICRUSt2 is a predictive tool to assessment the microbial function, this method may not accurately re ect the biological functions of microbiota.

Conclusion
Compared with para-cancerous tissues, microbiota in cancerous tissues showed signi cant differences in diversity and composition. The alterations in microbial co-occurrence network and functional pathways in ESCC tissues may be involved in carcinogenesis and the maintenance of local microenvironment for ESCC. Future research would concentrate on veri cation using a larger number of samples and multiple centric populations, as well as expanding our work into cell culture systems and animal models to examine the pathogenic roles of microbiota in ESCC. These discoveries of the esophageal microbiota for ESCC patients may contribute to the etiology for ESCC prevention, diagnosis, early intervention, and treatment.

Studying population
We performed a hospital-based retrospective study of 120 patients pathologically diagnosed with primary ESCC between February 2013 and October 2017 at

Demographic and clinical information
The basic information of all the participants was collected through a detailed questionnaire comprising of sociodemographic status, dietary habits, daily physical activity, smoking status, alcohol consumption, family history of cancer and gastrointestinal symptoms. Clinicopathological features (viz. differentiation status, location, and tumor, node, and metastasis (TNM) stage) for each patient were also collected from their respective medical records.

Sample collection and preservation
Paired cancerous and para-cancerous tissue samples were obtained from each patient immediately after surgical resection in the operating room. The paracancerous tissue samples were from an area at a distance of 3 cm from the cancerous tissue. The tissues samples were cut into small pieces and placed in autoclaved cryovials, stored in liquid nitrogen for 1 h, and, then transferred to a -80℃ refrigerator for storage. All samples were evaluated by pathological haematoxylin-eosin (HE) staining.

Bacterial DNA extraction and 16S rRNA sequencing
The sodium dodecyl sulphate (SDS) method was used to extract bacterial DNA from the samples. The extracted DNA was quantitatively detected by Qubit uorometer(Invitrogen, America), and the results were acceptable. Each extraction was performed with a blank buffer control to detect contaminants from either reagents, or other unintentional sources. However, the negative controls detected too few DNA to prepare library and hence were not sequenced.

Sequence data processing
Raw sequencing data from patients with ESCC were imported into Quantitative Insights Into Microbial Ecology (QIIME2-2020.02) [41] and processed using the DEBLUR algorithm to denoise and then inferred exact amplicon sequence variants (ASVs). The detailed analysis work ow was presented in Figure 6B. The curated ASVs were aligned and annotated by the Naïve Bayes classi er using the Greengenes (version 13.5) database, and were used for the subsequent construction of the phylogenetic tree. ASVs were submitted to a pre-trained Naïve Bayes Classi er (NBC) trained on full-length 99% Greengenes reference for the taxonomic classi cation. Before diversity analysis, the threshold for rarefaction depth was decided by minimizing sequences loss while maximizing the number of samples. At the depth of 10000 sequences per sample, the richness of observed communities had tended to be saturated, with 113 pairs of cancer and para-cancer samples were kept for the Alpha and Beta diversity metrics construction. For the purpose of providing better resolution and limiting false discovery rate (FDR) penalty on statistical tests, the low abundance features (with total ASVs counts less than 100 or detected in less than 20 samples) had been ltered before the differential abundance analysis(955 features left) (Table S3).

Statistical analysis
Questionnaires and clinicopathological data were double-entered into EpiData (version 3.1, Denmark). The demographic and baseline clinical features were displayed using n(%).The individuals' risk index of ESCC was calculated by variables included age, smoking, drinking, eating speed, hot food, pickled food, and fruit from questionnaire (Sup_ le 1). All statistical analyses were evaluated using R software(R version 4.0.2), and two-tailed P<0.050 was considered statistically signi cant.
Since the samples were paired, the Wilcoxon sign rank test was applied for comparisons of Alpha diversity (observed ASVs, Shannon index, and Faith's Phylogenetic Diversity) between cancer and matched para-cancer tissues. For Beta diversity (Bray-Curtis, Jacarrd, unweighted UniFrac and weighted UniFrac distances), the Adonis action of PERMANOVA tests were implemented to evaluate whether the variation of distances could be explained by other controlled variables (ESCC risk scores, sampling seasons, residential regions, tumor locations, and TNM stages).
The general linear models were used to test whether the aforementioned controlled variables could impact the diversity metrics. For Alpha diversity, the paired differences of diversity metrics within each paired sample were calculated by subtracting the diversity values from cancer tissues to the corresponding paracancer tissues and served as dependent variables. While for the Beta diversity, the dependent variables were de ned as the pairwise distances between cancer and matched para-cancer tissues (within-subject distances). Before linear regression, all dependent variables were checked for normality, and the natural logarithmic transformation was applied for reducing skewness. All P values for controlled variables were corrected by the Benjamini-Hochberg FDR procedure.
PCoA plotting which based on the Bray-Curtis, Jacarrd, unweighted UniFrac and weighted UniFrac distances were used to depict the microbiome composition. The ANCOM2 tests [42]were performed to detect the differential abundance in different tissue groups. It had been documented that the regional variation limits applications of microbial-disease association models, therefore, the data were split into Zhangzhou group(50 pairs of samples) and other Cities from Fujian Province group(70 pairs of tissues) according to patients' residential regions.
For the functional prediction, the PICRUSt2 [43] pipeline was used to generate predictions for EC numbers and MetaCyc pathways. The strength of edges of microbial co-occurrence network were assessed by SPARCC algorithm [44], and the interaction network diagram visualized with Cytoscape [45].The top hub taxa were assessed by plugin cytoHubba [46] in Cytoscape. Then, we selected four well established pathogenesis enzyme genes in ESCC according to Lin [37] from KEGG database. Then, Spearman correlation was performed to explore the association between the differential taxa and four enzyme genes. DESeq algorithm[47] was applied to calculate the differential MetaCyc pathways and visualized by volcano plot, then the Spearman correlation between differential taxa and differential MetaCyc pathways was calculated.

Declarations Ethical approval
All procedures performed in studies involving human participants were carried out by the ethical standards of the institutional and/or national research committee, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from participants, and the study was approved by the Institutional Review Board of Fujian Medical University (number: 201495).

Availability of data and materials
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Con icts of interest
The author(s) declare that there are no con icts of interest.  The median value of risk index in controls (see supplementary le 1) were used as the cut-off for low and high categories de nition.   The heat tree plot of of relative abundance(higher than 0.1%) of microbiota in cancer and para-cancer. (E) The heat trees of microbiota for log2 ratio of median relative abundance between cancer and para-cancer by univariate Wilconxon rank sum test. (F) The differential taxa between cancer and para-cancer in All regions(panel A), Zhangzhou city(panel B) and Other regions(panel C) after multivariate adjustment using ANCOM2. (G) Heatmap of relative abundance of differential microbiota(the red words of right side of heatmap is enriched differential taxa in cancer, the green is that in para-cancer).

Figure 4
The analyses of microbial co-occurrence networks (A) Co-occurrence network of the microbiota in cancer and para-cancer tissue(only Sparcc absolute r>=0.4 was shown). (B) Discrepancies of co-occurrence network nodes between cancer and para-cancer tissue and its measurement dimensions included degree, betweenness, and closeness centrality. (C) Discrepancies of co-occurrence edges between cancer and para-cancer tissue. (D) Discrepancies of the importance of differential taxa in the co-occurrence network.

Figure 5
The association between esophageal microbiota and predict function. (A) The association between the differential microbiota and ESCC-related functionalenzymes. (B) Volcano plot showing the differential MetaCyc metabolic pathways between cancerous and para-cancerous tissue. X-axis indicates the log2(fold change), and Y-axis indicates log10(FDR). The signi cant pathways which enriched in cancer (FDR < 0.05 and log2 fold change>2) are colored red dots, that are increased in para-cancer(FDR < 0.05 and log2 fold change<-2) are colored green dots. (C-D) The heatmap indicates the association between the differential MetaCyc metabolic pathways and microbiota in cancerous tissue(C) and para-cancerous tissue(D).