Flavonoid-modifying capabilities of the gut microbiome – an in silico study

Tobias Goris (  tobias.goris@dife.de ) German Institute of Human Nutrition Potsdam-Rehbruecke: Deutsches Institut fur Ernahrungsforschung Potsdam-Rehbrucke https://orcid.org/0000-0002-9977-5994 Rafael Cuadrat German Institute of Human Nutrition Potsdam-Rehbruecke: Deutsches Institut fur Ernahrungsforschung Potsdam-Rehbrucke Annett Braune German Institute of Human Nutrition Potsdam-Rehbruecke: Deutsches Institut fur Ernahrungsforschung Potsdam-Rehbrucke


Introduction
Flavonoids, which are plant secondary metabolites exclusively, are considered to bear many bene cial effects on health [1][2][3][4][5] and are assumed to contribute especially to a lower cardiovascular disease and cancer-related mortality [6,7]. Most avonoids are taken up as a part of human diet mainly as glycosides, with a few exceptions, such as avan-3-ols in tea or cocoa. A portion of dietary avonoids is modi ed in the small intestine, where e.g. O-deglycosylation by lactase-phlorizin hydrolase secreted by epithelial cells can occur. The produced aglycons may enter epithelial cells through passive diffusion. Absorbed avonoids then undergo phase II transformation such as glucuronidation, methylation, and sulfonation [8][9][10] before entering the bloodstream. Another part of dietary avonoids (or their phase I/II metabolites) reach the large intestine, where they are subject to transformation by gut bacteria [11][12][13]. Especially members of the genera Bi dobacterium and Lactobacillus were observed to O-deglycosylate, e.g. avanone and iso avone glycosides [11]. While O-deglycosylation was the most extensively described avonoid deglycosylation in the literature with many glycosides from avonols (e.g. rutin in several vegetables), avanones (often found in citrus fruits) and iso avones (present mainly in soy products) as substrates, several bacteria from the phyla Coriobacteriaceae and Lachnospiraceae families also mediate C-deglycosylation from compounds such as isovitexin, e.g. from buckwheat. Additional avonoid transformations involve demethylation, dehydroxylation, reduction, and C-ring cleavage [11].
The six avonoid O-deglycosylating hydrolases characterized so far were found in Bi dobacterium animalis, Bi. pseudocatenulatum and Bacteroides thetaiotaomicron. [14,15,35]. Several avonols and iso avones (e.g. genistin as a major avonoid of soy) were deglycosylated by these beta-glucosidases. Most of them show a high promiscuity, deglycosylating also several polysaccharides and several glycosylated phenolic compounds. As a subgroup of avonoid O-deglycosylating enzymes, rhamnosidases cleave usually the terminal rhamnose subunit of avonoid rhamnosides (e.g. rutin or naringin), resulting in the corresponding avonoid glucosides [36]. In contrast to these relatively simple one-step hydrolysis reactions, Cdeglycosylation is dependent on several enzymes for which two different operons were described. The Cdeglycosylation system described in strain PUE is dependent on at least three enzymes, DgpABC, which are part of a large gene cluster including transporters and other accessory genes whose function in Cdeglycosylation is unclear [23,24]. The C-deglycosylating enzyme system of Eubacterium cellulosolvens includes the ve enzymes DfgABCDE [22]. Both corresponding enzyme complexes include at least one glycosyltransferase and an oxidoreductase.
Daidzein-to-equol conversion in the human gut has been detected in many individuals (estimated as one half of the population) [37][38][39]. The involved enzymes and their encoding genes have been characterized in several different species such as Slackia iso avoniconvertens, Slackia sp. NATTS, Eggerthella sp. YY7918 and Lactococcus garviae [27,29,40,41]. Equol formation in the human gut is important, since this compound is more easily absorbed and has stronger estrogenic activities than daidzein [42]. At least three enzymes are involved in the complete conversion of daidzein to equol. All three were initially proposed to be reductases [27,29], although one of them was suggested to be a dismutase recently [40]. In addition, a didhydrodaidzein (DHD) racemase is employed for stereochemical conversion of DHD in some bacteria [28]. Horizontal gene transfer seemed to have occurred in the case of the closely related enzymes (more than 98% amino acid sequence identity [PID]) from Eggerthella sp. YY7918 and L. garviae [40]. Besides the complete conversion from daidzein to equol, a large number of individuals seem to carry out only the reduction of daidzein to Odesmethylangolensin [39,43], which is based on bacteria encoding only daidzein reductase.
Phloretin hydrolase (Phy), a Friedel-Crafts hydrolase, hydrolytically cleaves the C-C bond adjacent to the aromatic A-ring of phloretin and thereby produces phloroglucinol and 3-(4-hydroxyphenyl)propionic acid. The rst puri ed and characterized Phy was that of Eubacterium ramulus [26] in the apigenin and naringenin degradation pathway [44]. Only one other Phy sequence from the intestinal pathogenic Mycobacterium abscessis showing a 30% amino acid sequence identity to that of E. ramulus was published [25].
The NADH-dependent avanone-and avanol-cleaving reductase (Fcr), cleaving the heterocyclic C-ring of avanones and avanonols, was characterized in detail recently [45]. Suggested to be an enoate reductase earlier and acting in a concerted pathway together with a chalcone isomerase (CHI, [46]), the Fcr of E. ramulus was described to catalyze the conversion of e.g. naringenin to phloretin also without CHI. Still, CHIs [46][47][48][49] responsible for auronol production from avanonols [47], could play a substantial role in avonoid transformation in the human gut. In addition, the speci c reaction(s) carried out by CHI might be of biotechnological interest in auronol production. Very recently, the characterization of an ene-reductase (Flr), catalyzing the reduction of the B ring double bond of avones and avonols, was published [50] and the sequences of the two characterized enzymes were included in this analysis. All these studies were performed with few avonoid-modifying bacteria and the corresponding enzymes originate from a limited number of bacterial species. Isolation of these strains from fecal samples could be biased, because of selective enrichment of culturable bacteria. Therefore, the research on gut-bacterial avonoid modi cation published up to now does not display a complete picture of avonoid transformation in the human gut. Phenotypic studies on human fecal samples rely mainly on (partial) 16S rRNA gene sequencing and thus do not resolve the species level and miss avonoid-converting genera and quantitative studies of genes encoding avonoid-converting enzymes are scarce. A recent study on the quanti cation of daidzein-to-equol-converting bacteria relied on primers speci c for the different dihydrodaidzein reductase and tetrahydrodaidzein reductase genes [52]. In this pilot study, only 17 individuals were tested, of which less than half showed daidzein conversion, four assigned to Adlercreutzia equolifaciens and three to Slackia iso avoniconvertens (or related species). With the advent of metagenomic studies, a closer look into the gene catalog of the human gut microbiome became feasible [53], but still, quanti cation and especially taxonomic a liation of the genes of interest were di cult or impossible. Current bioinformatic methods offered the opportunity to assemble genomes from metagenomic data [54], which also was applied recently to human gut metagenome studies [55][56][57], opening up new ways to reveal avonoid modi cation in the human gut.
Here, we aim to identify and quantify avonoid-modifying bacteria in the human gut microbiota. For this, we screened the most recent combined human gut MAGs collection, the Uni ed Human Gastrointestinal Protein or Genome (UHGP/UHGG) catalog of more than 280,000 assembled genomes with a BLAST search using protein sequences of characterized avonoid-modifying enzymes (table 1) as queries. In addition, the presence, prevalence and abundance of bacterial species described to modify avonoids was investigated.
Based on the obtained/collected data, a possible scenario of avonoid modi cation by the different identi ed bacteria in the human gut, based on the experimental evidence from known avonoid-converting enzymes, is presented.

Results And Discussion
Distribution of avonoid-modifying bacterial species across human gut MAGs First, we screened the MAGs database accompanying metadata for the abundance of bacterial species described to transform avonoids. For this, a literature-based overview of currently known avonoidconverting bacterial strains present in the human gut was compiled, resulting in 45 distinct bacterial strains of 43 different species (Table 2). Of these, eleven were not taxonomically validly published, either due to lack of a 16S rRNA gene sequence or because the authors chose an informal taxonomic designation. If a 16S rRNA sequence was available, these unclassi ed isolates were provisionally assigned to the closest taxonomically described species using a 98% sequence identity cutoff to the respective strain. O-deglycosylation was the most abundant avonoid modi cation performed by human gut bacteria (31 strains). C-deglyocsylation, Cring cleavage, reduction, dehydroxylation or O-demethylation of avonoids were far less abundant (between three and seven strains each).
We then performed a text-based ltering and counting of species in the lineage column of the UHGG metadata with the aim to specify the MAG abundances of described avonoid-transforming species as percentage of the overall number of MAGs in the human gut. Important to note here is that the taxonomic assignments of these metadata are based on the genome taxonomy database (GTDB, https://gtdb.ecogenomic.org/). Thus, species without a sequenced genome (Catenibacillus scindens, Lactobacillus leichmanii, Slackia sp. NATTS) or species lacking taxonomic assignment in the GTDB to a given genome (S. iso avoniconvertens) could not be found. Besides the overall abundance of MAGs, we also calculated the prevalence of MAGs across individuals as percentage of the overall number of metagenome samples in the metadata.
The most prevalent avonoid-modifying bacteria were Bacteroides uniformis (18.5% prevalence across all 21,508 individuals), Parabacteroides distasonis (14.5%), Bi dobacterium infantis (8.4%), and Bi. adolescentis (7.4%; Table 2). All of these species were described to be capable of avonoid O-deglycosylation, mainly avonols, iso avones and avanones [11]. The remaining species were detected in less than 5% of individuals ( Table 2). The overall abundance across all MAGs were similarly distributed (Supplementary Table S1). The numbers for a single E. coli strain described to O-deglycosylate avonoids [58] should be handled with care, since this activity was as yet not con rmed for other E. coli strains and the MAGs database does not allow for strain level assignment. The lack of Bi. longum in the MAGs metadata is remarkable, but can be explained by the fact that Bi. longum ssp. infantis [59] is classi ed as Bi. infantis in the GTDB.
In addition, potential avonoid-modifying bacterial species from a recently published BRENDA enzyme reaction database search [60] were screened for their abundance in the MAGs metadata. However, these bacteria showed only very low abundance (Supplementary Table S2) and were not included in this study. Sequence identities are taken from [11] or the reference given in the species column. When the bacterium was classified into the genus, but not into a species, we performed a 16S rRNA gene analysis and list the original species/strain designation and the 16S rRNA sequence identity to the identified strain in brackets behind the species name. *: as per MIDI technique (Hur Lay et al., 2000). 1 : This number is the combined value from two phylogenetic groups (possibly subspecies) given in the GTDB.
Screening the MAGs amino acid sequence database for potential avonoid-modifying enzymes In the following, the MAGs database of human gut bacteria v1.0 was screened with sequences from characterized avonoid-modifying enzyme as queries. In total, 1,586,499 hits were found with an e-value of less than 1e-60, 1e-25 or 1e-20, ltered for a coverage of at least 75% and a PID of 30. In particular Oglycosidases and rhamnosidases gave a large number of hits, while O-demethylases and C-glycosidases resulted in rather low number of hits, because we ltered for the abundance of several genes from a larger Other bacterial genera besides Bacteroides or Bi dobacterium were detected at a PID of 50 to 60 especially to the three glucosidase sequences of Bi. pseudocatenulatum (Figure 1). At PID values of 55 to 60 to BpGluE, MAGs from Faecalicatena gnavus and the unclassi ed Gemmiger sp003476825 were highly abundant, with 3,170 and 8,329 overall occurrences, respectively (Supplementary Excel File). No Gemmiger bacterium was associated with the deglycosylation of avonoids up to now, but the high PID to BpGluE and the highest overall abundance of O-glycosidase hits make Gemmiger sp003476825 (one of the most prevalent bacteria in the human gut [61]), a potentially highly relevant species for avonoid deglycosylation. Two other taxonomically not described species with a high number of O-glycosidase hits with a PID of more than 50 were CAG-180 sp000432435 (Clostridiales, family Acutalibacteriaceae about 2,000 occurrences) and CAG-217 sp000436335 of the same family with about 1,000 occurrences. All of these show a PID of about 50 to Bi. pseudocatenulatum GluD. The high abundance and similarity to BpGluD indicates a potentially prominent role of these Acutalibacteriaceae members in deglycosylation of avonoids. Furthermore, Blautia wexlerae and two taxonomically not described Eubacterium spp. with a high PID to BpGluD, were abundant as well. More distantly related glucosidases (PID between 40 and 50 to the queries) include those of Faecalicatena gnavus, several Clostridium and Ruminococcus species and Gemmiger sequences more closely related to the Bi dobacterium breve enzyme and a second Blautia enzyme (Supplementary Figure 1). Since only iso avones were tested as substrates for the bi dobacterial enzymes, conclusions about the substrate speci city of the here detected homologs cannot be given. Usually also avonones and avonols are deglycosylated by Bi dobacterium spp. [11].
To unravel the phylogenetic relationships between the observed enzymes and to see whether the novel sequences can be classi ed into distinct phylogenetic classes with potential new catalytic properties, a phylogenetic tree was created. The largest part of the Bl. wexlerae sequences form a large distinct clade including sequences from Facealicatena gnavus as well, while the Gemmiger glucosidases forms another clade, also distinct from all clades harboring characterized avonoid O-glycosidases (Supplementary Figure  2). The latter forms one phylogenetic clade together with glucosidases of four Acutalibacteraceae species. Only four of the sequenced enzymes (BpGluA, BpGluB, BpGluD) and the Bacteroides sequence form larger clades, with the Bl. wexlerae, the CAG-180 and the F. prausnitzii clades being much larger, suggesting that the O-deglycosylating potential of the human gut microbiota is largely uncharacterized and awaits biochemical investigation.
A second class of O-glycosidases was observed in Catenibacillus scindens, where a cluster of two genes, dfgCD, was characterized to encode an enzyme that 7-O-deglycosylated iso avones and avones [22]. DfgCD homologs are found scarcely in human gut MAGs, mainly in Hungatella hathewayi, Faecalicatena species, and Eisenbergiella sp900066775, all with approximately 100 occurrences (Supplementary Figure 3).

Derhamnosylation of avonoids
The BLAST search with queries from six rhamnosidases of Lactobacillus, Bi dobacterium and Bacteroides species resulted mainly in hits with MAGs classi ed as Bi dobacterium and Bacteroides. The low number of hits to Lactobacillus MAGs (only L. plantarum was detected with a PID > 65 and Freq > 20) was expected, given the low abundance of Lactobacillus species in the gut MAGs metadata ( Table 2). One of the most abundant species was Bacteroides dorei, not yet described as rhamnoglycoside-hydrolyzing species ( Figure  2), whereas the species described to de-rhamnosylate e.g. rutin, such as Bi. pseudocatenulatum, Bi. dentium, Bi. breve, and Ba. thetaiotaomicron were of comparably lower numbers. The highest number of MAGs in this analysis was observed when using the metagenome-derived rhamnosidase 3 sequence as query. These MAGs could be assigned to Gemmiger sp003476825, which also contains a putative avonoid O-glucosidase (see above, no sequence similarities to the rhamnosidase described here). With a lower abundance (about 50 appearances), metagenome-derived rhamnosidase 2 orthologs were detected in MAGs of the genus CAG-170 (family Oscillospiraceae). Orthologs (PID of about 98) of the metagenome-derived rhamnosidase 1 were detected in only ve MAGs from the family Monoglobaceae, which could not be assigned to a genus yet and differ from the only described member of this genus, the pectin-degrading intestinal Monoglobus pectinilyticus [65,66], which does not encode the corresponding rhamnosidase. Below a 65 PID cutoff, mainly species from the genera Parabacteroides and Bacteroides and the species Alistipes onderdonkii (approximately 45 to 50 PID to MGR2) were found to be highly abundant (Supplementary Figure 4). Several distinct rhamnosidase families can be distinguished phylogenetically, of which the largest cluster does not include characterized ones and contains mainly sequence hits from Bacteroides and Parabacteroides genomes. Interestingly, most characterized rhamnosidases do not belong to phylogenetic nodes mainly representing the corresponding species: B. dentium is assigned to a cluster containing mostly B. pseudocatenulatum genomes, L. acidophilus belongs to a large node containing mostly Fusicatenibacter saccharivorans, the node including the characterized Bi dobacterium breve rhamnosidase is dominated by Parabacteroides merdae, while the LpR2 produced mainly hits from Bacteroides dorei and Parabacteroides species (Supplementary Figure 4). The B. thetaiotaomicron rhamnosidase and LpR1 cluster together in a large node dominated by Bacteroides dorei. These results show that the characterized isolate species often do not represent the species mainly responsible for the de-rhamnosylating activity in the human gut.

C-Deglycosylation of avonoids
The current gene clusters reported to play a role in C-deglycosylation are depicted in Figure 3. Of the Eubacterium cellulosolvens dfgABCDE gene cluster ( Figure 3A, [22]), all sequences were considered as queries in this study, because it is not completely resolved which gene products are involved in catalysis. Of the dgp gene cluster of strain PUE (Figure 3 B), we used only the rst three genes (dgpABC) as queries, since only these genes were shown to catalyze reactions in the C-deglycosylation pathway [23,24].
To respond to the involvement of more than one enzyme in this reaction, we ltered for genomes encoding all three catalytic subunits dgpABC or at least three of dfgABCDE. The latter resulted in only a few similar hits with a PID higher than 40 (less than 100, Supplementary Figure 4), most of them in the Faecalicatena genus (Supplementary Figure 4). Since dfgD was not found in the majority of MAGs containing dfgABCE, it is likely that DfgD is not involved in C-deglycosylation. The effect of a lack of dfgD in the dfgABCDE gene cluster in Cdeglycosylation was not speci cally investigated [22]. A higher number of hits to DfgA (more than 300) with a PID lower than 40 was observed in Faecalicatena species, but these were single genes not part of a dfgABCDE cluster (Supplementary Figure 5). The dgpABC cluster was much more prominent in the human gut MAGs than the dfgABCE cluster, with most hits in Agathobacter faecis (formerly Roseburia faecis, PID of 50 to 85%) plus several dozens in Blautia, Dorea, Enterococcus and Faecalicatena MAGs (Figure 4). However, only a low percentage of A. faecis MAGs carried the dgpABC cluster, i.e. 804 of the 2819 MAGs assigned to that species (~ 30%), so that the occurrence of A. faecis cannot be linked necessarily to the C-deglycosylation of avonoids. Even less speci c seems the occurrence of C-deglycosylating genes in Faecalicatena gnavus, with only 84 MAGs of 1197 encoding DgpABC (Supplementary Table 1).

Daidzein-to-equol conversion
The three enzymes responsible for daidzein-to-equol conversion (DZR, DDR and TDR) plus the optional dihydrodaidzein racemase encoded by Lactococcus garviae [28] are encoded in a single cluster, which is very similar in L. garviae, A. equolifaciens and S. iso avoniconvertens. Due to the high sequence similarities of the enzymes from the three species, we used only the S. iso avoniconvertens sequences (plus the L. garviae racemase sequence) as queries. Up-and downstream genes were not considered in this study, since they were suggested to play accessory roles [27]. Hits in Slackia, Lactococcus or Eggerthella were scarce, which is contradicting the literature reports on a widely distributed daidzein-to-equol conversion by these species in the human gut. Four hits of Slackia MAGs containing the daidzein-to-equol gene cluster were assigned to the same species represented by GUT_GENOME145587. A comparison of this representative genome to the available S. iso avoniconvertens DSM22006 genome (GenBank assembly number GCA_003725955.1) using ANI calculator [67] resulted in an ANI value of 95%. Therefore, the classi cation of these MAGs to S. iso avoniconvertens is justi ed. Since the other 261 MAGs representing this species do not encode the daidzein-to-equol pathway enzymes, S. iso avoniconvertens appears not to be a typical equol producer in the human gut. Therefore, conclusions of equol production in an individual should not be based on 16S rRNA analyses, which obviously cannot resolve the daidzein-to-equol transforming potential of Slackia spp. Another bacterial species described to transform daidzein to equol, Adlercreutzia equolifaciens, was more abundant, with 33 occurrences of the ddr gene, but also here, the percentage of MAGs carrying the daidzein-to-equol gene cluster was relatively low with less than ten percent (a total of 271 A. equolifaciens MAGs). This observation is strengthened by a report of a human gut A. equolifaciens isolate, lacking the cluster and not able to transform daidzein to equol (Valezquez et al. 2020). Another species whose MAGs frequently carried daidzein-to-equol cluster genes was CAG-1427 sp000435475 (Eggerthellaceae, Figure 5), of which nearly 50% (a number of 35) of the MAGs (Supplementary Table 1) carry the daidzein-to-equol gene cluster. Therefore, the occurrence of these bacterial species cannot give a reliable conclusion on the main equol producers in the human gut. Speci cally designed primers in a PCR-based method revealed that three of 17 arbitrarily selected individuals (of which 9 were "equol-producers") carried equol-producing Slackia, another four Adlercreutzia [52], which together with the present study hints towards an at least equal or higher importance of A. equolifaciens compared to S. equolifaciens in equol production. However, the higher percentage of equolproducing individuals is in contrast to the low number of MAGs from potential equol-producing bacteria in the present study. This might be explained by a low abundance of equol-producers in most individuals, preventing an e cient assembly of MAGs from metagenomes, which of course is much less sensitive compared to sensitive HPLC-based equol detection or the highly speci c PCR-based detection of the corresponding genes. Since two of the equol-producing individuals of the mentioned study did not result in a PCR product with the employed primer pairs, undiscovered equol-forming bacteria exist in the human gut. A high number of hits with a PID of 30 to 40 was identi ed in in Agathobacter rectalis (previously Eubacterium rectale) and several, mostly uncharacterized Collinsella species ( Figure 5). The occurrence of this latter genus in human gut microbiomes correlated with the ability to transform daidzein to equol in several studies [68][69][70]. However, since the PID to the characterized daidzein-transforming enzymes is rather low with only about 30 to 35 and the corresponding genes are not clustered, but spread out in the genome, this assumption warrants further research. A relatively high number of genomes showing only hits to the daidzein reductase sequence might be responsible for the large number of phenotypes transforming daidzein only to O-Desmethylangolensin. Especially bacteria assigned to Erysipelatoclostridium and Clostridium M genera carry putative dzr genes (Supplementary Figure 6, Supplementary Excel File).
Flavone/ avonol reduction, avanone/ avanonol ring cleavage, chalcone isomerization and phloretin hydrolysis Since these four reactions, the reduction of avons and avonols to avanones and avanonols, respectively (catalyzed by Flr), the C-ring cleavage of these products (via Fcr), isomerization of chalcones and auronols (via CHI), and the hydrolysis of the dihydrochalcone phloretin (via Phy) can be viewed as steps in the same pathway of avone/ avonol degradation (e.g. of apigenin via naringenin, Figure 6), the results are compiled in one subchapter.
The very recently characterized Flrs from Flavonifractor plautii and Clostridium ljungdahlii [50] resulted in a large number of hits in F. plautii and Clostridoides di cile MAGs (the latter interestingly with a higher PID to the sequence of F. plautii than to the sequence of C. ljungdahlii), and, with a lower PID, in MAGs of Olsenella, Eubacterium and Fusobacterium species (Figure 7). Most of the F. plautii MAGs (also the type strain DSM 6740) harbor two copies of the r gene, with one of them closer related (PID of ~ 60) to that of Clostridium ljungdahlii than to the "main" Flr of F. plautii. This second putative Flr of F. plautii didn't show any apigeninreducing activity [50], but might reduce other avonoid classes, which were not tested in the corresponding study.
The BLAST analysis with the characterized avanone-and avanonol-cleaving reductase (Fcr) from Eubacterium ramulus resulted in hits identi ed in F. plautii MAGs (523 overall occurrences, Supplementary Excel File), while E. ramulus MAGs revealed a lower number of 188 occurrences. Similar to the situation with Flr, several E. ramulus genomes contained two fcr genes, one of which with a higher PID to the F. plautii gene, which might re ect a different substrate spectrum of the encoded Fcrs. Besides F. plautii and E. ramulus, MAGs assigned to Anaerostipes hadrus frequently (138 MAGs) encoded an Fcr-like enzyme with approximately 85% identity to Fcr from E. ramulus. Besides these three main species, a number of MAGs from Clostridium boltae encoded a putative Fcr with a PID of approximately 42 to the Fcr from F. plautii. Most hits to Fcr (several hundred up to nearly 1000) showed around 35 PID (Figure 8) and frequently occurred in particular in Faecalicatena spp. Whether these putative enzymes have a similar function and whether Faecalicatena and other species contribute to avonoid degradation warrants further investigation.
Genes similar to that encoding CHI (Supplementary Figure 8) were highly abundant in MAGS assigned to Flavonifractor plautii (593 overall occurrences) and Clostridoides di cile (375) and, to a lower extent, present in Eubacterium ramulus (137), in which it was initially described. Below a PID of 40, only MAGs from Fusobacterium mortiferum were detected in a higher number (ca. 100).
The BLAST search using the phloretin hydrolase (Phy) sequence of E. ramulus resulted in several hits with a PID of more than 90, assigned to 159 E. ramulus MAGs (Figure 9). The most abundant Phy homologs with a PID of 46 (579 occurrences, Supplementary Excel le) were found in MAGs assigned to Flavonifractor plautii, which was described to hydrolyze phloretin [71]. Genes of two species never reported to transform avonoids, Dialister succinatiphilus (322 occurrences overall) and Anaerostipes hadrus (119 occurrences) were abundant with a PID of approximately 60 and 75 to the Phy query. Opposed to the Phy of E. ramulus, the BLAST search with the Phy of M. abscessis resulted in only few hits below 40 PID in a Dakarella species.
The co-occurrence of hits to all enzyme sequences (Flr, Fcr, CHI and Phy) was observed in most of the F. plautii and E. ramulus MAGs, emphasizing the co-functionality of these enzymes in a single pathway of avone degradation. F. plautii was by far the most abundant species in this analysis, pointing toward its major role in avone degradation in the human gut. Of the 610 MAGs assigned to F. plautii, most (509 to 610) contained at least one of the genes. In contrast, A. hadrus, containing only Fcr and Phy, showed only a minor amount of MAGs carrying avonoid-degrading enzymes (less than 5 % of the overall MAGs).

O-Demethylation of avonoids
Similar to C-deglycosylation, O-demethylation requires several proteins (MT1 and MT2, CP and AE) acting together in a complex reaction pathway. Since no speci c avonoid O-demethylation system has been characterized, the four enzyme sequences of the O-demethylase system from Eubacterium limosum, which acts on the polyphenolic lignan secoisolariciresinol [30], were used as queries. Above 70 PID, only a few dozen hits from the Eubacterium genus, notably from the species E. callenderi and to a lesser extent E.

Conclusion
The here presented in silico analysis of potential avonoid-modifying enzymes of human gut bacteria is a rst step to uncover the complete picture of avonoid transformation in the human intestine. The ndings constitute a basis for targeted testing of the detected potential novel enzymes and their bacterial sources. Figure 10 summarizes some representative avonoid-transformations, the prevalence of MAGs encoding the corresponding avonoid-transforming enzymes and the main bacterial species involved in these reactions. In addition to the already characterized gut bacteria involved in avonoid transformation, potential species and genera so far not associated with the modi cation of these polyphenols in the human gut were identi ed. Of these novel avonoid-modifying candidates, several species are as yet not isolated. These include for example Gemmiger sp003476825 potentially involved in O-deglycosylation and derhamnosylation or the CAG-1427 genus in O-deglycosylation. Heterologous expression of the potentially involved enzymes could reveal their avonoid deglycosylation potential. Some other bacteria were already known to play a crucial metabolic role in the human gut (e.g. production of short-chain fatty acids or polysaccharide degradation), but their ability to convert avonoids may have been overlooked. An example is Anaerostipes hadrus, so far mainly known for butyrate production [72]. An (albeit minor) portion of MAGs of this highly abundant human gut bacterium was observed to encode several putative avonoid-transforming enzymes including a Phy and an Fcr. A main avonoid-degrading bacterium seems to be Flavonifractor plautii: Most of its MAGs contain Flr, Fcr, Phy and CHI genes, and it is relatively abundant. Thus, F. plautii might serve as a good indicator of the avonoid degradation potential in 16S rRNA gene-based gut microbiota analyses.
On the other hand, our quantitative results on avonoid-modifying bacteria implies that several key avonoid modi cation processes in the human gut might have been so far assigned to bacteria, which are neither representative nor the main avonoid-transforming bacteria in this ecosystem. For example, equol formation from daidzein appears to be catalyzed in the human intestine mainly by Eggerthella and maybe by Collinsella species probably more often than by the thoroughly studied Slackia iso avoniconvertens and Adlercreutzia equolifaciens.
The observation of the wide array of uncharacterized, putative avonoid-transforming enzymes in human gut bacteria presented here might be a valuable source for identi cation of novel enzymes of biotechnological interest, especially for the production of unusual or di cult-to-produce avonoids. Several bacterial species seem to be specialized in metabolizing avonoids and possibly other polyphenols as well. For example, nearly all Flavonifractor plautii MAGs harbor the three enzymes Phy, Fcr and CHI, and therefore seem to be the main avanone and avanonol-degrading bacteria in the human gut. While further research on the effect of this degradation on human health is necessary, it might be one of the reasons, why the presence of F. plautii was associated with the development of colon cancer in an Indian population [73]. Finally, the outcome of the here presented in silico analysis could contribute to clarify the impact of gut bacteria on avonoid-mediated health effects in humans.

Methods
Retrieval of query sequences, MAGs data and set-up of UHGP BLAST database Besides extracting the most recent review on avonoid modi cation [11], a literature search on the databases from Web of Science and Pubmed and in the Google scholar search engine was performed. The employed keywords were " avonoid(s)" AND "modi cation" OR "deglycosylation" OR "demethylation" OR "cleavage". Furthermore, screening in the citation network of the retrieved publications was performed and manually reviewed before Genbank accession numbers and the corresponding amino acid sequences were retrieved from the NCBI Genbank database (Table 1).
A BLAST-searchable database including all non-redundant amino acid sequences from the UHGP catalog was set up as follows: The UHGP-100 version 1.0 non-redundant protein fasta le (uhgp-100.faa) from 286,997 MAGs assembled from gut metagenomes [61] was downloaded as the packed uhgp-100.tar.gz le from the MGnify ftp server ((http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v1.0/) on July 1 st , 2020 and subsequently extracted. With BLAST+ (stand-alone, version 2.9.0 [74]) installed on Linux Ubuntu 18.04 and the makeblastdb command, a BLAST database was created from the uhgp-100 le. For quanti cation purposes, the uhgp-100.tsv table including all of the redundant amino acid sequence IDs (all assigned to a non-redundant "representative" sequence ID corresponding to the identical amino acid sequence in other MAGs) was extracted from the same packed le (uhgp-100.tar.gz). The UHGP metadata which included MAGs taxonomy assigned via the genome taxonomy database (GTDB, release 89), genome quality values, study and sample information and others were downloaded from the same mgnify ftp-address (genomes-all_metadata.tsv le). Please note that there might be differences in the taxonomy, since the current release of the GTDB is no. 202 and the metadata le was not updated.
BLAST searches, data mangling and quanti cation The queries were searched against the UHGP-100 blast database using the blastp search function and an evalue cutoff of 1e -25 for pathways including short sequences (150 to 350 amino acids; daidzein-to-equol, CHI, O-demethylation, Phy, Flr, DfgCD) or 1e -60 for longer sequences (>350 amino acids; Fcr, O-glycosidases, rhamnosidases) and 1e-20 for the C-deglycosylation enzymes, which contained two sequences below 150 amino acids. The maximal number of target hits was set to 100,000 via the max_target_seqs argument. Blast table output format 6 was chosen with the speci ed columns (the Linux BASH commands are given in supplementary txt-le). The output les were concatenated and ltered for a PID of at least 30 and a query coverage value of at least 75 %. To avoid redundant hits to more than one query, only the hits with the best bitscore were kept. To retrieve the redundant amino acid sequence IDs for quanti cation of the hits, grep with a list of the non-redundant sequence IDs as input was used on the uhgp-100.tsv le. The table was merged with the metadata and subsequently ltered for MAGs only (isolate genomes also included in the uhgp catalog were discarded) and the occurrences of the redundant amino acid sequences were counted, these values accounting for the frequency in a given non-redundant sequence group were placed into a new column (Freq). For further ltering and data visualization, R (v 3.6.3) was used in RStudio (v 1.3) with ggplot2.

MAGs gene cluster analysis
To compare and analyze gene clusters among MAGs or isolate genomes, the corresponding genomes were downloaded from the NCBI database or the UHGP MAGs folder and uploaded to the rapid annotation subsystem technology (RAST) server (if possible with their original annotation preserved). The sequence comparison tool was used to compare up to ten genomes.

Construction of phylogenetic trees
Phylogenetic trees of the rhamnosidase and O-glycosidase sequences were constructed after clustering of the sequences with CD-hit (v 4.8.1 [75]) using standard settings (90% PID clustering threshold) and alignment with ClustalOmega (v 1.2.4, [76]) installed on Linux Ubuntu. The alignment was re ned with Clipkit (v.1.1.2, [77]) before the tree was generated with Fasttree (v 2.1.10, standard settings [78]). Tree visualization was performed via iTol (v 6, [79]) and manually re ned in Inkscape for Windows (v1.01). The size of the nodes triangles was drawn to scale using the number of CD-hit clustered sequences as well as the redundant sequence IDs of the uhgp-100.tsv le. Figure 1 Flavonoid O-glycosidase homologs in human gut MAGs. The PID (percent amino acid sequence identity) threshold to the queries (color code) was set to 50 (for abbreviations and details see Table 1). Hits were Page 25/29 ltered for at least 50 occurrences, so that each bubble represents a number of redundant sequences ranging from 50 to 772.

Figure 2
Rhamnosidase homologs in human gut MAGs. A PID threshold of 65 was chosen to the queries shown in the color code (for abbreviations and details see Table 1), those with a lower PID are shown in Supplementary   Figure 4). Hits were ltered for at least 20 occurrences, so that each bubble represents a number of redundant sequences ranging from 20 to 2176. Two characterized C-deglycosylation gene clusters. (A) Eubacterium cellulosolvens with ve genes involved in C-deglycosylation [22] and (B) strain PUE [23,24], with dgpA encoding an oxidoreductase and dgpBC as oxopuerarin-deglycosylating enzymes, of which all three are involved in C-deglycosylation. Genes encoding deglycosylating enzymes are shown in turquoise, accessory genes in violet. Full GenBank accession number is given for the rst gene and only variable digits for the downstream genes. Homologs to enzymes involved in avonoid C-degylcosylation using sequences of strain PUE. Hits were ltered for the co-occurrence of all three genes required for C-deglycosylation (dgpABC) in the same MAG. Hits were ltered for at least 2 occurrences, so that each bubble represents a number of redundant sequences ranging from 2 to 820.

Figure 5
Homologs to enzymes involved in daidzein-to-equol transformation in human gut MAGs. Hits were ltered for the co-occurrence of all three genes required for daidzein-to-equol conversion (dzr, ddr, tdr) in a single MAG.  Reductive degradation of avonoids depictured with apigenin as avone example.

Figure 7
Flr sequence hits of human gut MAGs. A PID threshold of 40 was chosen to the queries shown in the color code (for abbreviations and details see Table 1). Hits were ltered for at least 10 occurrences, so that each bubble represents a number of redundant sequences ranging from 10 to 245.

Figure 8
Fcr-like enzymes in human gut MAGs. Hits were ltered for at least 10 occurrences, so that each bubble represents a number of redundant sequences ranging from 10 to 804.

Figure 9
Distribution of Phy homologs in human gut MAGs. Depicted are MAGs with at least ve identical amino acid sequences. Hits were ltered for at least ten occurrences, so that each bubble represents a number of redundant sequences ranging from 10 to 578.