Elucidating the Gut Microbiome of Colorectal Cancer: 40 Fecal Bacteria as Non-invasive Biomarkers

Background: Colorectal adenocarcinoma (CRC) ranks among the five most lethal malignant tumors both in China and worldwide. Emerging evidence has revealed the importance of the gut microbiome in CRC, so the microbial community could serve as a potential screening tool for early diagnosis. Importantly, compared with analysis of the whole microbial community, a small number of bacterial genera used as non-invasive biomarkers with high sensitivity and specificity would cost less and be of greater clinical benefit. Methods: We analyzed the gut microbiome of 226 CRC patients and 156 healthy people by 16S rRNA sequencing and compared microbiome diversity between CRC patients and healthy controls. We used an ExtraTrees classifier to screen for biomarkers and a support vector machine (SVM) model to test the specificity and sensitivity of the biomarkers. Results: Compared with the healthy gut, the microbial composition in CRC was divergent, notably with an increase in some CRC-related bacteria and a decrease in some bacteria associated with health. Forty bacterial genera with high weight in classifying healthy versus CRC microbiomes were selected as biomarkers for CRC. In addition, the combination of the 40 biomarkers and FOBT showed outstanding sensitivity and specificity for discriminating CRC patients from healthy controls. Conclusion: The method could be used as a non-invasive approach for early diagnosis of CRC.


Introduction
As one of the most common gastrointestinal tumors worldwide, colorectal cancer (CRC) ranks third in the world among men and second among women, affecting more than 1.36 million people every year [1].
Most CRC patients display no symptoms at early stages, and the majority of CRCs develop slowly from adenomatous precursors [2]. It has been estimated that > 95% of CRC cases would benefit from curative surgery if diagnosed at early or intermediate stages [3][4][5][6]. Thus, early detection is of vital importance for improving the survival of CRC patients. Conventional screening methods, including barium enema, colonoscopy and sigmoidoscopy, are uncomfortable, invasive, time-consuming and expensive [7,8]. Fecal occult blood testing (FOBT) and the serum carcinoembryonic antigen (CEA) test are non-invasive, but both are compromised by low specificity [9][10][11][12][13]. Additional non-invasive screening methods with high specificity and sensitivity are needed for early detection of CRC.
Massive efforts in whole-genome sequencing and genome-wide association studies show that genetic factors explain only a small proportion of disease variance [14], and only about 5% of cancers occur in the setting of a known genetic predisposition syndrome [15]. It has been established that epigenetic regulation that alters gene expression, alone or in combination with inherited or somatic mutations, contributes importantly to CRC [16]. As a result, intensive effort has been devoted to early diagnosis of CRC, largely focused on detecting methylation of tumor DNA, sometimes combined with mutation detection in certain genes. However, methylation detection is compromised by low sensitivity and specificity as well as by the complexity of the assay [17][18][19]. More importantly, epigenetic alterations can be strongly affected by environmental factors, including dietary habits and chronic alcohol consumption, which also affect the human gut microbiota [20].
The gut microbiota survives and metabolizes on nutrients in the human body and works with the host to respond to external environmental factors, carrying out metabolic and immune activities and helping to maintain human health [21]. Studies have shown that changes in the structure and abundance of the gut microbiota, or its dysfunction, are closely related to impaired human health and colorectal carcinogenesis [22,23]. Studying the intestinal microbiome composition of CRC patients could therefore open up new approaches to tumor screening. Recent studies have suggested that microbiota profiles determined by high-throughput sequencing may be effective in predicting CRC [24]. Peptostreptococcus anaerobius, an anaerobic bacterium, has been reported to be enriched in the fecal and mucosal microbiota of CRC patients and to promote CRC [25]. In addition, a series of bacteria, including Bacteroides fragilis and a strain of Escherichia coli [26][27][28][29][30][31], Streptococcus bovis [32], Clostridium septicum [33] and Fusobacterium nucleatum [34,35], have been reported to be associated with CRC.
Furthermore, metagenomic analysis of the fecal microbiome has been performed, and several gene markers have been identified and validated as biomarkers for early diagnosis of CRC [36]. To improve accuracy, differences in gut microbiota between CRC patients and healthy people need to be combined with other methods, such as the fecal immunochemical test (FIT) and CEA, or with other risk factors, such as age and BMI [37,38].
We evaluated differences in bacterial communities in stool samples from CRC patients and non-cancer controls by 16S rRNA high-throughput sequencing. In addition, 40 microbial biomarkers were identified for early detection of CRC. We also evaluated the performance of the microbial biomarkers as non-invasive markers in a large cohort and compared their effectiveness with that of FOBT. The microbial biomarkers combined with FOBT could be used for non-invasive early diagnosis.

Results
The gut microbiome is dysbiotic in CRC patients
After quality filtering and primer trimming, a total of 5153 usable high-quality sequence reads were generated from 382 samples, with a length of about 468 bp. A total of 4728 OTUs were obtained from the CRC group and 4331 OTUs from the healthy control group, of which 3906 OTUs were shared between the two groups. The CRC group contained 822 unique OTUs, compared with 423 unique OTUs in the healthy group (Figure S1). Rarefaction curves of the CRC and control samples almost plateaued, suggesting that the sequencing depth was sufficient (Figure S2).
Based on the OTU counts, fecal microbial richness, as estimated by ACE and Chao1, was significantly decreased in CRC (both P < 0.001) (Fig. 1a, b). Fecal microbial diversity, as estimated by the Shannon and Simpson indices, did not differ significantly between control and CRC samples (Figure S3). When microbiota composition was compared between the CRC and healthy gut, beta diversity differed between the two groups (p = 0.001) (Fig. 1c, d). These results suggest dysbiosis of the gut microbiome in CRC patients.
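The richness and diversity indices named above can be illustrated with a minimal Python sketch that works on a vector of per-OTU read counts. It is not the QIIME implementation used in this study: the function names are illustrative, Chao1 is shown in its bias-corrected form, the Shannon index uses the natural logarithm (other tools may use base 2), and ACE is omitted for brevity.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)   # OTUs seen exactly once
    doubletons = sum(1 for c in counts if c == 2)   # OTUs seen exactly twice
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts):
    """Shannon diversity index, natural logarithm (an assumption of this sketch)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity index, defined here as 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical OTU counts for one sample (not data from this study).
sample = [10, 5, 2, 1, 1, 0]
richness = chao1(sample)
```

Chao1 depends only on the numbers of observed, singleton and doubleton OTUs, which is why it measures richness, whereas the Shannon and Simpson indices also weight OTUs by relative abundance.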
The divergent taxonomic composition and functional performance of microbiota in CRC and healthy gut
After quality filtering, sequences clustered at 97% sequence similarity were used for taxonomic composition analysis. In total, 21 bacterial phyla, 34 classes, 56 orders, 107 families, 209 genera and 268 species were identified (Table S1).
LEfSe (linear discriminant analysis effect size) analysis was performed to determine differences in bacterial taxonomy. The histogram and cladogram showed that phylum Bacteroidetes was overall highly enriched in CRC samples, while phylum Actinobacteria was overall less abundant in healthy samples. Divergent alterations were observed at lower taxonomic levels within the phyla Firmicutes and Proteobacteria.
We further compared the functional capacity of the gut microbiota between CRC patients and healthy controls. Transporter pathways, especially the ABC transporter pathway, were significantly increased in the CRC gut, whereas a large number of metabolism-related pathways, such as vitamin B6 metabolism, energy metabolism, amino sugar and nucleotide sugar metabolism, fructose and mannose metabolism, phosphonate and phosphinate metabolism, pyruvate metabolism, phenylalanine metabolism, D-glutamine and D-glutamate metabolism, sphingolipid metabolism, and nitrogen metabolism, were decreased in the CRC gut, with the exception of glycerophospholipid metabolism. These results suggest a metabolic disorder in CRC patients (Fig. 4).

Fecal microbial markers for CRC detection
The changes in the bacterial community between the two groups could be screened for biomarkers to assist in the diagnosis of colorectal cancer. To select the most relevant features that could serve as biomarkers for CRC, an ExtraTrees classifier was used to calculate feature importance scores. Forty features with significantly different abundance were selected for further analysis (Table 2).

Machine learning classification
To illustrate the diagnostic value of the selected biomarkers in the gut microbiome for colorectal cancer,  (Table 1). Because the FOBT on stool samples is widely used in diagnosis, we also ran the classifier with the 40 biomarkers together with the FOBT result. The sensitivity, specificity, precision and accuracy increased to 93.6%, 92.9%, 95.8% and 93.3%, respectively (Table 1). The combination of the 40 biomarkers and FOBT showed improved diagnostic performance compared with the 40 biomarkers alone, with the AUROC rising from 0.887 to 0.962 (Fig. 5).
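For readers unfamiliar with the four reported metrics and the AUROC, the following Python sketch shows how they are conventionally derived from a confusion matrix and from classifier scores. The function names, counts and scores are hypothetical and are not taken from Table 1; the AUROC is computed with the rank-based (pairwise) definition rather than with whatever software was used in the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, precision and accuracy as fractions."""
    return {
        "sensitivity": tp / (tp + fn),   # CRC cases correctly called positive
        "specificity": tn / (tn + fp),   # controls correctly called negative
        "precision": tp / (tp + fp),     # positive calls that are truly CRC
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auroc(case_scores, control_scores):
    """Rank-based AUROC: chance that a case outscores a control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical counts and classifier scores, for illustration only.
metrics = diagnostic_metrics(tp=90, fn=10, tn=80, fp=20)
area = auroc([0.9, 0.8], [0.1, 0.85])
```

Sensitivity and specificity are each computed within one group (cases or controls), so they do not depend on the case-to-control ratio, whereas precision and accuracy do.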

Discussion
The gut microbiome plays a major role in protecting the host against the overgrowth of pathogens and sustaining the health of the colon. There is extensive evidence of a close relationship between the gut microbiome and colonic diseases such as colorectal cancer [42][43][44]. In addition to its role in intestinal diseases, the gut microbiome also contributes to obesity, diabetes, allergic asthma and neuropsychiatric diseases [45][46][47]; thus, clinical monitoring of fecal bacteria can assist in the diagnosis of other diseases related to the gut microbiome. Furthermore, gut status could be improved by dietary intervention or intake of beneficial bacteria guided by changes in the gut microbiome [48], and the improvement could easily be detected from the fecal bacteria. Thus, the gut microbiome has become a hot spot in clinical research.
We performed high-throughput sequencing of the V3-V4 region of the bacterial 16S rRNA gene in stool and described the patterns of the gut microbiome in healthy individuals and CRC patients. Fecal microbial richness was decreased in CRC patients; in addition, the proportion of various beneficial bacteria decreased and the proportion of harmful bacteria significantly increased. A dozen opportunistic pathogens, including Bacteroides and Prevotella, were significantly increased in patients with colorectal cancer. Several pathogens with an established role in CRC induction, including Fusobacterium nucleatum [34,35], Peptostreptococcus anaerobius and enterotoxigenic Bacteroides fragilis [25,29], were highly enriched in CRC patients (Fig. 3; Supplemental table 1). These dysbiosis characteristics could serve as taxonomic biomarkers for CRC screening.
We performed machine learning with an SVM model for binary classification of the CRC and control cohorts. A variety of features, including taxonomic, functional [49] and k-mer-based [50] classification schemes, have been used in machine learning approaches. Here, we used as features the 40 bacterial genera that contributed most to distinguishing the CRC state from controls. In addition, the FOBT result was included as a feature because of its importance in clinical CRC diagnosis. Our machine learning results showed high performance in the CRC versus control models (Table 1). The high performance of fecal bacteria combined with FOBT on stool samples supports the establishment of a new non-invasive method for the examination of colorectal cancer.
Circulating tumor DNA (ctDNA) is extracellular DNA that originates from tumor cells and circulates in a number of bodily fluids, including blood, synovial fluid and cerebrospinal fluid [51]. Because the genetic and epigenetic information provided by ctDNA is similar to that of invasive tumor biopsies, ctDNA has been widely used to detect gene mutations and is regarded as a non-invasive diagnostic tool for several cancers [52]. In many tumors, increased methylation of tumor suppressor genes occurs at an early stage; thus, ctDNA methylation profiling can be used as an alternative non-invasive diagnostic tool [53][54][55]. Some specific DNA methylation sites, such as SEPT9, have been identified as biomarkers of CRC [56,57]. However, the extremely low level of ctDNA in blood and its lack of organ-specific information pose a great challenge to early diagnosis.
In clinical application, changes in the gut microbiome can be monitored regularly, and early detection and treatment of CRC can improve survival and reduce the cost of late-stage treatment. In this study, we profiled the gut microbiome and took 40 bacterial genera with high weight for classification between the CRC and healthy gut as biomarkers for early diagnosis of CRC. Combined with FOBT, our method showed excellent performance in early diagnosis of CRC. The method benefits those who cannot undergo colonoscopy in the short term and those who are unwilling to undergo colonoscopy. Compared with existing methods of CRC diagnosis, our method is non-invasive and painless; it requires no complex examination or preparation before sampling, and it improves on the sensitivity and specificity of FOBT alone.

Conclusion
Based on microbiome composition analysis of CRC patients and healthy controls, we selected 40 bacterial genera for classification between the CRC and healthy gut. Combined with FOBT, these 40 bacteria exhibited excellent sensitivity and specificity. We have proposed a non-invasive method for early diagnosis of CRC.

Study participants and stool sample collection
Stool samples were collected from 382 individuals undergoing colonoscopy at the endoscopy centers of Liaoning Cancer Hospital and Dongfang Hospital Affiliated to Tongji University, including 226 CRC patients and 156 healthy controls. To avoid potential alteration of the gut microbiota, the exclusion criteria were: (1) a past history of any cancer; (2) use of antibiotics within the past 3 months; (3) surgery or an invasive procedure within the past 3 months; (4) inflammatory bowel disease. All enrolled subjects were asked to maintain a steady diet and lifestyle and to leave a fecal sample of more than 1.0 g in the dedicated container before bowel preparation for any endoscopy or surgery. After collection by the patients, stool samples were stored at -80℃ until further analysis.
16S rRNA gene sequencing
DNA was extracted from stool samples using the QIAamp DNA Stool Mini Kit (Qiagen) according to the manufacturer's instructions. The quality and quantity of the extracted DNA were examined by electrophoresis on a 0.8% (wt/vol) agarose gel and with a NanoDrop 2000 spectrophotometer, respectively.
The hypervariable V3-V4 regions of the 16S rRNA gene were amplified using the primer set 338F (5'-ACTCCTACGGGAGGCAGCA-3') and 806R (5'-GGACTACHVGGGTWTCTAAT-3'). PCR amplification used Pfu high-fidelity DNA polymerase from TransGen Biotech, with the number of amplification cycles kept as low as possible and identical amplification conditions for all samples in the same batch. PCR amplification, purification of the amplified product, sequencing library preparation and paired-end 250 bp sequencing on the Illumina MiSeq platform were performed by Personal Biotechnology Co., Ltd. (Shanghai, China).
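The 806R primer contains IUPAC degenerate bases (H, V and W), so it represents a pool of concrete oligonucleotides rather than a single sequence. The Python sketch below, with illustrative function names that are not part of any pipeline used in the study, expands the standard IUPAC nucleotide codes into a regular expression and counts the concrete sequences a primer covers.

```python
import math
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex matching every concrete variant."""
    parts = []
    for base in primer.upper():
        allowed = IUPAC[base]
        parts.append(allowed if len(allowed) == 1 else "[" + allowed + "]")
    return re.compile("".join(parts))


def count_variants(primer):
    """Number of distinct non-degenerate sequences the primer represents."""
    return math.prod(len(IUPAC[base]) for base in primer.upper())


PRIMER_806R = "GGACTACHVGGGTWTCTAAT"  # primer sequence as given in the text
pattern_806r = primer_to_regex(PRIMER_806R)
n_variants_806r = count_variants(PRIMER_806R)
```

With H and V each covering three bases and W covering two, 806R corresponds to 3 × 3 × 2 = 18 concrete sequences, whereas 338F as written contains no degenerate positions.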

Sequence data processing
Raw sequencing data were processed using Quantitative Insights into Microbial Ecology (QIIME) v1.8.0 [58] and filtered by removing tags and primers. A quality cut-off was applied to discard reads that (1) were shorter than 150 bp, (2) had an average Phred score lower than 20, or (3) contained ambiguous bases. The filtered paired-end reads were then assembled using FLASH v1.2.7 with an overlap of > 10 bp between the paired-end reads. Chimeric sequences were filtered using USEARCH v5.2.236. After quality filtering and chimera removal, clean reads were clustered into operational taxonomic units (OTUs) at 97% sequence identity using UCLUST. Taxonomic classification was performed with the Greengenes database release 13.8. The alpha diversity indices Chao1, ACE, Simpson and Shannon were estimated. Beta diversity analysis was performed with UniFrac in QIIME. Non-metric multidimensional scaling (NMDS) based on the distance matrix was performed with an R package.
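The three read-level criteria in the quality cut-off can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the QIIME code that was actually run: the function names are hypothetical, and the sketch assumes FASTQ quality strings in Phred+33 encoding and treats any character other than A, C, G or T as an ambiguous base.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def passes_filter(sequence, quality_string, min_length=150, min_mean_phred=20):
    """Apply the three cut-offs: length, mean quality, and no ambiguous bases."""
    if len(sequence) < min_length:
        return False                      # (1) shorter than 150 bp
    if mean_phred(quality_string) < min_mean_phred:
        return False                      # (2) average Phred score below 20
    if any(base not in "ACGT" for base in sequence.upper()):
        return False                      # (3) contains an ambiguous base
    return True


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
good_read = passes_filter("ACGT" * 40, "I" * 160)
low_quality_read = passes_filter("ACGT" * 40, "#" * 160)
```

Reads that pass all three checks would then move on to paired-end assembly, chimera removal and OTU clustering as described above.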
Fecal occult blood test (FOBT)
All enrolled subjects were asked to provide a valid fecal occult blood test report issued by a community hospital or a general hospital within the previous 6 months. Stool samples from subjects without an FOBT result were examined using the Fecal Occult Blood Diagnostic Kit (Colloidal Gold) (Chemtrue®), which has been approved by the Chinese Food and Drug Administration. The cut-off value for a positive FOBT is 200 ng/ml according to the manufacturer's instructions.

Statistical analysis
Significant differences among groups were identified by one-way analysis of variance (ANOVA) followed by Tukey's test. Homogeneity of variance was tested first, and data with test values > 0.05 were used for ANOVA. All statistical analyses were performed using SPSS 19.0 (IBM, New York, USA), and significance levels were reported at p < 0.05 and p < 0.01.