Mutational signatures shift induced by chemotherapeutic agents, 5-Fluorouracil and Oxaliplatin, in the gut microbiome

We developed a powerful framework for taxonomy composition and genomic variation analysis to investigate the mutagenesis effect and proliferation inuence of chemotherapeutic agents, such as 5-Fluorouracil (5-FU) and Oxaliplatin (Oxi) on gut microbiota. Using the gut microbiome data of 68 time serial stool samples, we detected 1.45 million variations among the chemotherapy groups and found the drugs signicantly affected mutation signatures of gut microbiota. About 786 faecal metagenomes of 755 individuals from 5 different cohorts were analyzed to build the mutation pattern of gut microbiota from health samples. Oxi notably increase transversion rate, while 5-FU reduced the rate. We also performed in vitro experiments to conrm that chemotherapeutic agents could disrupt the pattern of genetic variant in the intestinal microorganisms. Post-chemotherapy samples had specic gut microbiome signatures with higher abundance of Bacilli and a lack of anaerobic bacteria. In addition, drug-associated functional alterations were also found: metabolism changes in the 5-FU group implied that gut microbiota could provide additional NAD + to inhibit cancer cell autophagy; in the Oxi group, the ribosome and lysine biosynthesis genes were obviously enriched. According to molecular evolution analysis, traits related to protein secretion system showed evidence of strong selection pressure from the drugs, which could be a novel potential treatment strategy for chemotherapy-induced diarrhea. Our study provides a blueprint for characterizing the role of microbes and drug-microbe interaction in the gut microbiota response to chemotherapy. The Wilcoxon signed-rank test was performed to evaluate differences in richness and diversity for KEGG and microbial taxa at various taxonomic ranks. Mutation rate comparisons were conducted to determine statistical signicance between two groups for incidences where differences were declared signicant at P-value < 0.05 with the Student’s T-test.


Introduction
Chemotherapeutics have long been used to treat a variety of human tumors [1]. As one of the commonly used chemotherapy protocols, the FOLFOX regimen contains different chemotherapeutic agents (e.g. 5-Fluorouracil (5-FU) and Oxaliplatin (Oxi)) with a broad range of cytotoxicity [2]. These drugs can effectively inhibit DNA replication of cancer cells via different mechanisms of action. For example, 5-FU produces FdUMP, which directly inhibits thymidylate synthase and thus causes thymine-less cell death [3]. In addition, the products of 5-FU may be converted into FUTP or FdUTP, leading to RNA or DNA damage, respectively. An understanding of the interaction of chemotherapeutics and gut microbial populations will probably trigger applications towards improve the effectiveness of chemotherapy.
Chemotherapeutic agents may affect gut microbial communities by disrupting the homeostatic balance among resident microorganisms; meanwhile, they can accelerate the microbe evolution at molecular level [4][5][6]. Many recent studies have explored the interaction between microbiota and anticancer drugs along with interventions aimed at shaping microbiota to optimize drug e cacy and reduce side effects [7].
However, previous studies mainly focused on the taxonomic characterization and functional compositions of different cohorts at the genus or species level [8]. Variation in taxonomic abundance as well as functions encoded by these gut microbiota have been described in the cohorts of in ammatory bowel disease (IBD) [9], type 2 diabetic (T2D) [10], hypertension [11] and liver cirrhosis [12]. However, genomic variations within species, which leads to their phenotypic diversity and adaptations to chemotherapeutic agents, have only been studied in a few taxa [13]. For example, in the common gut commensal bacteria Escherichia coli [14] just few point mutations can confer clinically relevant antibiotic resistance, and the natural variation in a single gene can lead to pathogenic adaptation.
Given the importance of the gut microbiota in human health and a growing number of studies reporting associations between gut microbiota and diseases [15], a better understanding of the interaction of gut microbiota with chemotherapeutic drugs during chemotherapy help to improve the e cacy of the anticancer chemotherapeutic agents. We recruited 37 cancer patients and collected pre-and postchemotherapy stool samples to study the taxonomic composition, metabolic capacity and molecular evolution during the chemotherapy. In order to avoid the bias caused by distinct diet, regional and genetic differences, we collected healthy samples from well-known studies about Han Chinese with the standard clinical assessments and same sequence platform to build the basic of microbiome structure and mutation pattern. We analyzed our cohort and stool samples from T2D [10], liver cirrhosis [12], and hypertension cohorts [11] with the same bioinformatics pipeline and parameters. Our goal was to build a power framework for metagenomics analysis to gather basic knowledge on the genomic variation landscape and taxonomic abundance shift in gut metagenomes during chemotherapy.

Results
Overall composition of gut microbiota during chemotherapy To characterize the diversity and composition of gut microbiome during the chemotherapy, we characterized the gut microbial composition of 68 samples from 37 patients (6 samples were excluded due to DNA extraction failure) using 16S rRNA gene sequencing. Detail information about stool samples and patients were listed in the Supplementary Table S1 and S2. This cohort consisted of 7 patients using 5-FU, 13 patients using 5-FU + Oxi, and 17 patients using Oxi. All the participants were from a cohort study in Beijing Hospital. Using samples collected at the start of chemotherapy (Day0) and 30 days later (Day30), we traced the dynamic changes of gut microbial species by ecological alpha and beta diversity measures. The chemotherapeutic drugs caused modest changes in the gut microbiome. Shannon diversity and Chao1 index were calculated to estimate the within-sample (α) diversity (Fig. 1A) and there is no signi cant alteration pattern. Weighted UniFrac principal coordinate analysis was used to compare community phylogenetic composition among samples and the result revealed a separation between Day0 and Day30 along axis 1 (explaining 41.5% of the variation in the data, Supplementary Figure S1).
Beta diversity represents how much the community changed in comparison to the baseline (Day0) and was calculated with Weighted UniFrac distance between Day0 and Day30 (Supplementary Table S3). ANOSIM with permutations con rmed signi cant separation of samples (P-value = 0.02, r = 0.24). No signi cant differences in microbial communities were observed when the Oxi and 5-Fu + Oxi groups were analyzed separately. Although the adjusted P-value (adjusted P-value = 0.06) is not signi cant, it indicated that there was a difference pattern in the microbial composition of Day0 and Day30 with 5-Fu. Next, we used LDA effect size (LEfSe) to analyze the microbial communities of different chemotherapeutic drugs. For all Day30 samples, we identi ed gut microbiome signatures with higher abundance of Bacilli (speci cally Streptococcaceae and Lactobacillaceae, Fig. 1B). For 5-Fu, seven species were discovered as biomarkers for separating gut microbiota between Day0 and Day30 (Supplementary Figure S2). Five of these species were higher, and two were lower in the Day0 than Day30. For example, the abundances of Bacteroides coprocola and Streptococcus anginosus were lower in Day30 (P-value = 0.03, two-sided Wilcoxon signed-rank test). Contrary to 5-FU, in the Oxi group, there were 8 of these species were higher and on was lower in the Day0 than Day30. No signi cant change was observed in the 5-FU + Oxi group. To gain further insights into the interaction between chemotherapeutic drugs and gut microbiota, we used the same gut microbiota DNA preparations for independent Illumina shotgun sequencing. The yielded sequences belonged to 10 phyla and 381 species, including several DNA viruses (Supplementary Table S4 and Figure S3). Firmicutes was the most abundant phylum, accounting for 40.9 ± 22.3% of the total reads, followed by Bacteroidetes (36.3 ± 23.7%), Proteobacteria (15.7 ± 23.4%) similar to previous studies [12]. Two commensal species including S. salivarius and L. salivarius were signi cant enriched after the chemotherapy. (Supplementary Figure S4) Correlation tests between the abundance estimates for bacterial taxa at genus level were performed by using the two methods (Pearson correlation coe cient: 0.84; P-value < 2.2 × 10 − 16 ; Fig. 1C and Supplementary Figure  S5) highlighted a highly positive correlation between 16S rRNA and shotgun metagenomics sequencing.

Functional alterations in gut microbiota during the chemotherapy
To determine the functional alterations with the chemotherapeutic agents, we use the HUMAnN2/MinPath [16] and Kyoto Encyclopedia of Genes and Genomes (KEGG) database [17] to evaluate gut microbial functions across groups in our study cohort. According to the HUMAnN2/STAMP analysis of the metabolic function pathways [18], most differences occurred in carbohydrate metabolism and energy related pathways in the 5-FU group (Fig. 2). The microbiome from Day30 included more genes involved in nicotinamide adenine dinucleotide (NAD) salvage pathway (PYRIDNUCSAL-PWY) than that from Day0 in the 5-FU group (Adjust P-value = 1.4 × 10 − 4 ). Previous study suggested an active role for the NAD salvage pathway in modulating cancer cell viability via the replenishing of the NAD reservoir [19], which is a novel strategy to protect cell against DNA damage. Moreover, amino acid biosynthesis, such as L-phenylalanine and arginine, UDP-N-acetyl-D-glucosamine biosynthesis pathway were more active in the samples in the 5-FU group. In other two groups, few pathways were found to be signi cantly different before and after chemotherapy (Supplementary Figure S6).
In order to deeply explore the functional role of the gut microbiota during the chemotherapy, all the genes were aligned to the KEGG database and the abundance of KEGG Ortholog groups (KOs) and pathways for each sample was estimated using the method similar to RPKM [20]. About 277 KOs were signi cantly enriched among the three groups (P-value < 0.05, two-side Wilcoxon rank sum test; Supplementary Table   S5). Overrepresentation of 3 KEGG pathways involved in metabolism was observed in the 5-FU group with hypergeometric distribution, which involved in the categories "Galactose metabolism" and "Phosphotransferase system (PTS)" (Supplementary Table S6). Enriched genes encoding PTS, which re ected increased representation of multiple carbohydrate transporters, was an important mechanism to confer resistance to selection pressure, such as antibiotic, as reported in the previous studies [21]. In the Oxi group, two KEGG pathways "Ribosome" and "Lysine biosynthesis" were signi cantly enriched.
Although Oxi, a platinum-based drug, is generally considered to be DNA-damaging agents, recent studies have unexpectedly shown that Oxi do not alter DNA integrity but instead inhibit rDNA transcription, leading to p53 induction, most likely through the "impaired ribosome biogenesis checkpoint" [22].

Genomic variation in intestinal microorganisms
To enable comparative analyses in multiple metagenomes and to identify the mutation signatures in the different groups, we modi ed the previous method [23] to identify single nucleotide polymorphisms (SNPs), short insertions / deletions (InDels, range 1 bp to 50 bp) and structural variations (more than 50 bp) in each sample. First, we used 3,934 prokaryotic genomes to generate a set of reference genomes (Supplementary Table S8) for the analysis of genomic variation in gut microbial species in 786 samples. Next, local realignment around high-quality InDels, which is a frequently used strategy in genomic variation analysis, was performed to correct mapping errors to improve the accuracy of variant calling [24]. We only considered variants with allele frequency larger than 1% and supported by more than 5 reads. To validate our SNP calling procedure, we used the approaches to calculate error rates in 40 essential single-copy marker genes, following the method mentioned previous [23]. False-positive rates were estimated at 3.6% (Supplementary Table S9).
We identi ed 1.45 million SNPs from 343 genomes (at least 3 SNPs for one genome), of which 1.38 million SNPs (95%) in 47 genomes (0.88% of the total 156 Mb) across 68 stool samples from 37 subjects (Supplementary Figure S7). We also identi ed 26,662 indels and 56,820 structural variants. Subsequent analyses were restricted to SNPs due to their orders of magnitude higher count over other variation types. Examination of the SNP distributions of the protein coding genes of the selected 47 gut bacteria revealed that 4189 genes with at least one non-synonumous mutation in the 47 microbe species had valid coverage (≥ 10X depth) with su cient prevalence (Supplementary Table S10). Among them, we identi ed 22 genes (0.5%) with signi cantly differentiated SNP densities between Day0 and Day30 samples (Wilcoxon signed-rank test, P-value < 0.05). Most of these genes (17/22) were found in Bacteroides thetaiotaomicron VPI-5482, the second most common human commensal bacteria, whose reference genome contains 4779 protein-coding genes [25] (Supplementary Table S11).
Then, we checked the substitution pattern shift between Day0 and Day30. A large portion of base substitutions were attributable to C > T substitution, which ranged from 29.5% to 43.0% in Day0, and from 35.0% to 42.3% in Day30. Comparisons of the mutation rates in other Chinese cohorts, including liver cirrhosis [12], hypertension [11] and T2D [10] (Fig. 3 and Supplementary Table S12) showed a signi cantly reduced transversion (Student's T test, adjust P-value = 0.0003) and transition rate (Student's T test, adjusted P-value = 0.01) the 5-FU group (Supplementary Table S13). By contrast, we found an increased ratio of transversion (Student's T test, adjusted P-value = 5.4 × 10 − 4 ) and transition (Student's T test, adjusted P-value = 8.2 × 10 − 7 ) in the group of Oxi. The similar mutation pattern was also observed the studies about the effect of antibiotic on the genome [26]. Thus, we conclude that genome-wide substitutions rates of gut microbiota were in uenced by chemotherapeutic drugs. To directly test whether selection might have biased the mutation rate of different regions in the genomes, we examined the synonymous and nonsynonymous mutation from coding region from all cohorts. Nonsynonymous / synonymous mutation ratio from Day30 were not signi cantly different from Day0 (Student's T test, Pvalue > 0.05 in all comparison), indicating that the vast majority of acquired amino acid-altering mutations were not selectively promoted by chemotherapeutic agents but simply accumulated in a neutral fashion.
In order to validate the shift of mutation pattern, we explored how stool-isolated microorganism mutation rates change in vitro when treated with 5-FU and Oxi. We exposed four gut species from four different patients (two Escherichia coli strains, Citrobacter sp. MGH104 and Enterococcus faecium), all of which were with signi cantly low growth rate limited by the chemotherapeutic agents, in different concentrations of 75 µM 5-FU and Oxi (Supplementary Figure S8). Most notably, the transition/transversion ratio of mutations do covary with agents concentrations (Supplementary Figure  S9).

Dn/Ds across gut species and individuals
To gain further insights on the molecular mechanisms driving the functional diversi cation of the gut microbiota, the gene families identi ed in the assembled metagenome were annotated based on the KEGG and we calculated, for each KO, the ratio between the number of nonsynonymous (Dn) and synonymous (Ds) changes, a proxy for evolutionary pressure (Supplementary Figure S10). Our analyses showed that the average Dn/Ds was 0.063 the median was 0.016 from 939 KO (Supplementary Table S14 and Figure S10). About 17.9% (17.6% in the 5-FU group, 17.7% in the 5-FU + Oxi group, and 18.3% in the Oxi group) of the gene families had signi cantly higher Dn values and lower Ds values than the mean value calculated over all annotated sequences (one-side Fisher test, FDR < 0.05), suggesting that they may be under positive selection (Supplementary Table S15 and Figure S11). A closer investigation of these gene families revealed that positive selection signatures markedly characterize diverse proteins involved in the cell division, as well as proteins essential for amino acid biosynthesis ( Fig. 4 and Supplementary Figure S12). According to the abundance analysis of KO items between Day0 and Day30, we found genes were under signi cant positive selection pressure, such as secY, which is a key in the protein secretion system and important for critical cell functions, like pathogens and virulence. In addition, as in the previous study [27], we also found the coding sequence of phage infection protein (yhgE) and genes related to transport system were under positive selection. Thus, the genes under positive selection might have a key role in the interaction between bacteria and chemotherapeutic drugs and provide the bacteria additional survival advantage during the chemotherapy.

Discussion
The human gut microbiota is highly complex and exists in a dynamic balance between symbiosis and pathogenesis, which can in uence almost any aspect of host physiology [28]. Growing evidence suggests that the gut microbiota not only plays a key role in carcinogenesis but also in uences the e cacy and toxicity of anticancer therapy [29,30]. The microbiota modulates the host response to chemotherapy via numerous mechanisms, such as alteration of community structure and immune microenvironment. Furthermore, exploitation of the microbiota offers opportunities for the personalization of chemotherapeutic regimens and the development of novel therapies.
Chemotherapeutic agents, Oxaliplatin and 5-FU, exert their cytotoxic effect mostly through DNA damage. When DNA damage is caused by chemotherapeutic drugs, the microbiota composition also changes, which can further affect drug e cacy and overall health. It's very important to systematic explore the interaction between chemotherapeutic drugs and gut microbiota not only from microbiome composition [31,32] but also genomic variation [23,33]. In this study, we rst used two different methods (16S rRNA and shotgun metagenomics sequencing) to investigate the disruption of the intestinal microbiome in terms of taxonomic composition. Correlation tests between the abundance estimates for bacterial taxa show that the two methods highlighted a highly positive correlation. According to the 16S rRNA sequence, we found that different chemotherapeutic agents had distinct effect on the gut microbiota. 5-FUassociated bacteriome shifts included depletion of common health-associated commensals from the genera Streptococcus and Bacteroides and enrichment of Gram-negative bacteria such as Clostridium hathewayi and Lachnospiraceae bacterium. Previous study has shown that 5-FU could improve T celldependent antitumor immunity [34] and Clostridia strains play a key role in enhancing T reg cell abundance and inducing important anti-in ammatory molecules, such as interleukin-10 [35]. In contrast, Oxi-associated bacteriome showed different shift patterns including depletion of commensals from the genus Lachnospiraceae noname and enrichment of genus Lactobacillus and Streptococcus.
Subsequently, we use shotgun metagenomics with a quantitative metagenomic species approach to identify which species signi cantly change during the chemotherapy with different agents. Two commensal species including S. salivarius and L. salivarius were signi cant enriched after the chemotherapy. According to the previous studies, these two species appeared to be highly resistant to 5-FU [36], which maybe the main driver for a community dysbiosis. We hypothesized that if 5-FU was responsible for the dysbiosis changes seen during chemotherapy, then depleted taxa should be susceptible to the drug while the enriched species would be resistant to 5-FU. The resultant damage to the intestinal barrier increases the risk of colitis, bacterial translocation and infection. Probiotic supplementation may have extra bene cial effects to correct dysbiosis of gut microbiota after the chemotherapy.
Concomitant with the alteration of gut microbial composition, we also observed a dysbiosis in bacterial gene functions and different chemotherapeutic agent showed distinct in uence patterns. Microbial metabolism may cause side effects severe enough to necessitate cessation of chemotherapy. The metagenome of 5-FU patients was enriched in genes associated with NAD salvage pathway. Since NAD + plays central roles in a variety of biological processes ranging from cellular metabolism and energy production in both human and microbiology. So, it is reasonable to hypothesis that gut microbiota could provide additional NAD + to inhibit cancer cell autophagy and enhance survival of cancer cells [19]. By contrast, the metagenome of Oxi patients were enriched in genes associated with Ribosome. Oxi, unlike cisplatin and carboplatin, kills cancer cells not only through the DNA-damage response, but also by inducing ribosome biogenesis stress [37]. In order to adapt the selection pressure from Oxi, only the species with high copy number of genes related to ribosome can survive during the chemotherapy similar to cancer cells, which may be why genes involved with ribosomes were signi cantly enriched.
Next, in order the study the effective of DNA damage caused by chemotherapeutic agents, we collected the metagenomics sequencing data of 768 stool samples from 755 Chinese human individuals with the standard clinical assessments and re-analyzed the shift of mutation pattern with uni ed bioinformatics pipeline. Consistent to our expectation, we found obvious shift in the mutation signatures. A signi cantly reduced transversion rate (G:C->T:A) were observed in the 5-FU group. In contrast, we found an increased ratio of transversion in the Oxi group. What's more, the genome-wide substitutions rates of gut microbiota were in uenced by chemotherapeutic drugs without bias among different genomic regions. To further con rm that the shift of mutation signatures were due to strong and diverse environmental selection, we analyzed the patterns of correlation between gene function and the Dn/Ds ratio across all the KO modules. The stable Dn/Ds ratios of most KEGG modules (80.1%) between Day0 and Day30 suggest that the core microbiome were evolutionarily conserved. However, there are some modules, such as genes involved with cell division, protein export and phage-related show high level selection pressure. A previous study has shown that phage-related genes usually have fast mutation rate and are under positive selection [38]. However, the other genes, such as zapA and mutT, which are housekeeping genes involved in cell ampli cation and previously are thought to be with slow rate of amino acid evolution, have remarkably high Dn/Ds ratio and are under positive selection pressure. On the other hand, in order to adaptive to pressure of chemotherapeutic agents, bacteria take different strategies to obtain the survival advantage during the therapy, for example by secreting an amount of protein to modify the host microenvironment with Sec system, which was a potential chemotherapeutic target to many human pathogen.

Conclusions
We have described the disordered pro les of gut microbiota in cancer patients treated with different chemotherapeutic agents, explored the mutation pattern of different drugs, and provided a new clue for the interaction between drugs and microbe. Our ndings suggest that a distinct intestinal microbiome pattern of gut dysbiosis during the chemotherapy is dominated by Bacilli and a lack of other bacteria and the potential restorative in uence of probiotic supplements should be investigated in the chemotherapeutic research. Positive selection on the protein export system provide additional survival advantage for gut microbiota and could be a potential therapy target for gut dysbiosis.

Methods & Materials
Cohort recruitment and sample collection Participants were recruited in the Oncology Department of Beijing Hospital, Beijing, China, in 2018. This study was approved by the Beijing Hospital Ethics Committee. Written informed consent was obtained from all patients. The exclusion criteria included: with a history of in ammatory bowel diseases; having been exposed to probiotics, prebiotics or broad-spectrum antibiotics within 30 days; or having received nasal-tube feeding or parenteral nutrition in the month prior to initiation of the study. Detail information about all patients were listed in the Supplementary Table S1. We collected two faecal samples from each participant. A faecal sample was collected on hospital in-patient admission (Day0), prior to administration of chemotherapy, and 30 days later immediately prior to chemotherapy (Day30). For each faecal sample, one gram of stool was collected into a sterile tube and then subsequently stored at -80 °C for molecular analysis. Total genomic DNA was extracted from 200-500 mg of fecal samples using a QIAamp DNA Stool Mini kit (Qiagen, Germany) according to the manufacturer's instructions. The concentration of the extracted DNA was measured by a NanoDrop2000 (Thermo Scienti c, USA), and then the DNA was stored at -80 °C.

16S ribosomal RNA sequencing and bioinformatics analysis
To develop the 16S rRNA amplicon libraries, the V3-V4 region of 16S rRNA gene was PCR-ampli ed with the primers 341F (5'-CCTAYGGGRBGCASCAG-3') and 806R (5'-GGACTACNNGGGTATCTAAT-3'), modi ed by adding barcodes for multiplexing [39]. Pooled amplicons were paired-end sequenced (PE 2 × 250 bp) by using an Illumina HiSeq 2500 platform according to the manufacture's protocol. Paired-end reads were merged and quality ltered using FLASH and QIIME [40]. Chimeras were detected and removed against the Gold reference database using the UCHIME algorithm [41]. Sequences with ≥ 97% similarity were assigned to the same OTUs. For each representative sequence, the GreenGenes Database was used based on RDP classi er algorithm to annotate. Alpha diversity analysis including Shannon and Chao1 were calculated with QIIME. Phylogenetic beta diversity distances, including unweighted and weighted UniFrac distances were measured using QIIME. Principal-Component Analysis (PCoA) was performed to visualize by ggplot2 package in R software (Version 3.5.0). Further, diversity analyses, such as Adonis and Anosim, were performed by running a work ow on QIIME.

Shotgun metagenomics sequencing library construction
For shotgun metagenomics sequencing, adaptive focused acoustics (Covaris) was used to shear a standard volume of 50 µl of the extracted DNA. DNA libraries were prepared using Illumina TruSeq Sample Preparation Kit (Illumina, USA) according to the manufacturer's protocol. Libraries were quanti ed using an Agilent Bioanalyzer 2100 (Agilent Technologies, USA). Paired-end sequencing (PE 2 × 150 bp) was performed on successful DNA libraries using an Illumina HiSeq X-Ten instrument at the Annoroad Genome Biotech at Beijing. Finally, we obtained 4.27 GB to 8.9 GB of raw data for each sample (average 5.98 GB, Supplementary Table S2).

Metagenomics bioinformatics analysis
Concatenated and human DNA sequences were removed using KneadData with default parameter (http://huttenhower.sph.harvard.edu/kneaddata). The remaining high-quality reads were assembled with IDBA-UD (Version: 2.04, parameters: --pre_correction --min_contig 200) [42]. We mapped the clean reads against scaffolds using Bowtie2 (Version: 2.3.4.2) [43]. Genes (minimum length of 100 nucleotides) were predicted on contigs longer than 500 bp and annotated using Prokka (Parameters: --metagenome) [44]. The gene abundance was determined by using the method similar to RPKM (reads per kilo bases per million reads) used for quantifying gene expression from RNA sequencing data. In brief, high-quality reads were counted with HTSeq-count [45]. For each gene, G i , the number of reads that aligned to it divided by the length of the gene was calculated as NG i and the relative abundance RNG i , of each gene in each sample (n genes) was computed using the following formula: Species-level quantitative taxonomic pro ling was performed using MetaPhlAn2 (Version: 9760413b180f) on the KneadData-ltered reads [46]. Community composition was calculated with MetaPhlan2 using the default settings. Taxonomic pro les included bacteria, archaea, microbial Eukaryotes, and viruses were inferred by MetaPhlAn2 using the 1M unique clade-speci c marker genes identi ed from 17,000 reference genomes (13,500 bacterial and archaeal, 3,500 viral, and 110 eukaryotic).

Functional annotation
Two different methods were used for functional alteration and pathway composition analysis. First, pathway-level composition was calculated with HUMANn2 using the UniRef90 database with default settings, which was followed by further statistical analysis and visualization in STAMP [18]. In order to obtain the detail information of each metabolic pathway, all genes in our genomes were aligned to the KEGG database using DIAMOND (Version 0.7.9.58, parameter: -k 50 -sensitive -e 0.00001) and KOBAS [47]. Each protein was assigned to the KEGG Orthology (KO) families by the highest scoring annotated hits containing at least one HSP scoring over 60 bits. The abundance of KEGG orthology / module was calculated using the methods as mentioned above. For KEGG orthology enrichment analysis, signi cant enriched KEGG pathway was de ned as the adjusted P-value (< 0.05) of the hypergeometric test as the previous study [48].

Generation of a reference genome set
We followed the previous method to build the reference genomes to study the genomic variation of gut microbiota [23]. About 3934 prokaryotic genomes were downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria) on August 1 2018. A set of 40 universal single copy marker genes was identi ed in these genomes using HMM pro les made for each marker gene from the corresponding orthologous group from eggnog database [49,50]. For each marker gene, pairwise DNA sequence identities between all genomes were calculated using BLASTn (Version 2.2.25, parameter: evalue < 10 − 5 -F F) [51]. For each genome pair, the median identity of all marker genes was used as a proxy for average nucleotide identity (ANI) between the two genomes. Using an operational 95% ANI recommended for identifying species, we generated 929 clusters of genomes. In order to select a reference genome from each cluster, high quality reads from a subset of shotgun metagenomes (bgi-BGI-06A, bgi-DLF001, bgi-DOF002 from T2B cohort, nHF611710, nHF411719 from HTN, HD-2, HD-31 from liver, 1-1, 4 − 1, 5 − 1 from this study) were mapped to the 3934 genomes using Bowtie2 with the options "--very-fast". Then, the genome with the highest read coverage was selected resulting in a set of 958 reference genomes, each likely representing a unique species (see Supplementary Table S8) Mapping to reference genomes and variant calling Illumina reads from 718 fecal samples, including the T2D, Liver and Hypertension cohorts were quality controlled using a KneadData. KneadData-ltered reads were mapping to reference genomes using Bowite2 (Version: 2.3.4.2) with default parameter. The number of reads mapping to reference genomes were counted and normalized by the genome size in order to obtain quantitative relative abundances of each genome in every sample.
To improve the accuracy of genomic variation, we modi ed the protocol of previous study [23]. First, the SAMtools/BCFtools suite was used for calling high quality InDel supported by more than 5 reads and at least one read on each strand [52]. We then performed local realignment of the aligned reads with those high-quality InDel using the Genome Analysis Toolkit (GATK) [53] to minimize the mismatch bases. GATK's HaplotypeCaller was used to call variants with default parameter. BreakDancer was used to detect the structural variation with at least three reads supported the event [54].

Positive selection analysis
CDSs were annotated based on their protein family membership using KOBAS [47]. With MUSCLE [55], a multiple sequence alignment (MSA) of the protein sequences was created using each KEGG Orthology.
Based on the MSA and the CDS nucleotide sequences, a codon-based alignment was constructed for each KO module with PAL2NAL using default parameters [56]. We then used FastTree, a relaxed neighbour joining algorithm, to reconstruct a phylogenetic tree for each protein family from the obtained MSA [57]. We excluded low-con dence positions in the alignment with a large number of gaps with Gblocks [58]. Dn/Ds was calculated with PAML [59]. A one-sided Fisher's test was performed to identify protein families with a signi cant enrichment of Dn versus Ds changes in comparison to the entire sample. The false discovery rate (FDR) was controlled using the Benjamini and Hochberg procedure and alpha set to 5%.
in vitro experiment to validate the mutation pattern shift A lter-sterilized stock solution of 100 mM 5-FU, Oxi and 5-FU + Oxi (1:1) (Sigma Aldrich) were prepared in DMSO and further diluted to 50, 75 and 100 mM. Stock solutions were further diluted (1:1000) in culture medium for the experiments. Four mono-isolated gut microorganism were obtained from the stools of recruited patients. LB agar plates used for passaging were prepared by adding stock 5-FU, Oxi and 5-FU + Oxi. Subsequently, cultures were transferred into fresh plates and allowed to grow 48 h for each batch.
After 20 batches, single colony on each plate were then scraped off and transferred to Eppendorf tubes with liquid LB culture and cultured overnight at 37 °C. Total DNA was then extracted using the QIAamp UCP Pathogen Mini Kit (Qiagen, German). The concentration of the extracted DNA was measured by a NanoDrop2000 (Thermo Scienti c, USA). Illumina DNA sequence libraries were prepared as the mentioned above. JSpeciesWS was used to assign taxonomy of each isolated strains [60]. Reads were then mapped to the closest reference genomes with BWA and variant calling was performed as mentioned above.

Statistical analysis
The Wilcoxon signed-rank test was performed to evaluate differences in richness and diversity for KEGG and microbial taxa at various taxonomic ranks. Mutation rate comparisons were conducted to determine statistical signi cance between two groups for incidences where differences were declared signi cant at P-value < 0.05 with the Student's T-test.

Data Availability
All sequences generated in this study are available in the NCBI sequence read archive under the accession numbers PRJNA551354.

Declarations
Ethics approval and consent to participate  Cladogram derived from LEfSe analysis of metagenomic sequences based on the shotgun sequencing compared Day0 with Day30. Green shaded areas indicate microbe orders that more consistently describe the fecal microbiome from Day30; red shaded areas indicate microbe orders that more consistently describe from Day0. The pre xes "c", "o", "f", "g" and "s" represent the annotated level of class, order, family, genus and species. C. Comparison of 16S rRNA and Metagenome Abundances The tree represents all taxonomically classi ed species from the shotgun metagenome survey as well as 16S rRNA sequence. The branches of the tree do not re ect evolutionary distances. The position of the dots in the tree corresponds to the taxonomic placement of the representative sequences in the NCBI taxonomy. Empty dots represent the phylotypes found in the shotgun metagenome classi cation, red dots were identi ed from both methods.