Compositional analysis of the bacterial community in colostrum samples from women with gestational diabetes mellitus and obesity

Background. Gestational diabetes mellitus (GDM) and obesity are threatening health conditions during pregnancy, as they affect the normal function of multiple systems, including neuro-hormonal networks, adipose tissue, liver, muscle and placenta. GDM and maternal obesity are main triggers of a vicious cycle of metabolic and cardiovascular diseases perpetuated trans-generationally. One of the first stages of this vicious cycle occurs during early lactation, as the infant feeds on breastmilk with a “disbalanced” microbiota and macromolecule content. Despite the importance of breastmilk microbiota for newborn development, few studies have characterized breastmilk microbiota in association with obesity and GDM. Maternal obesity decreases the diversity of breastmilk microbiota, with increased proportions of Staphylococcus compared to Bifidobacterium and Bacteroides. However, the extent to which GDM together with maternal obesity affects breastmilk microbiota is unknown. Here, we applied 16S rRNA high-throughput sequencing to characterize the colostrum microbiota of 43 mothers, including groups with GDM only and with obesity only, in order to address the impact of GDM/obesity on breastmilk microbiota. Results. We identified a total of 1,496 amplicon sequence variants (ASVs), with Proteobacteria and Firmicutes as the dominant phyla. We found Staphylococcus, Corynebacterium-1, Anaerococcus and Prevotella overrepresented in samples of women with obesity and women with GDM. Population diversity indicators, such as the Shannon index, Faith's phylogenetic diversity and UniFrac/robust Aitchison distances, show a distinct microbial composition for GDM (female-newborn subgroup) and obesity (male-newborn subgroup) compared to controls. Finally, differential abundance analysis showed that Rhodobacteraceae was distinct for GDM and five families (Bdellovibrionaceae, Halomonadaceae, Shewanellaceae, Saccharimonadales and Vibrionaceae) were distinct for obesity groups.
However, there is a strong influence of the infant's sex and the use of intrapartum antibiotics. To our knowledge, this study represents the first characterization of the taxonomical changes of colostrum in mothers with GDM. We think this study contributes to the future design of functional metagenomic studies aiming to understand the molecular mechanisms by which breastmilk from mothers with GDM/obesity affects the development and future adult physiology of their suckling infants.


Background
Breastfeeding during the first six months of life is crucial, as the newborn's gastrointestinal tract matures and develops determinants of future health [1,2]. In addition to nutritional components (proteins, oligosaccharides, short-chain fatty acids) [3], breastmilk contains a diverse microbial population that participates in educating the gastrointestinal immune system and affects the infant gut-brain axis [4-6].
Gestational diabetes mellitus (GDM) is a pregnancy complication with 17% incidence worldwide in which women develop chronic hyperglycemia during the gestational period [15]. Risk factors include advanced maternal age, dietary micronutrient deficiency, and overweight/obesity [16]. In addition, the long-term impact of GDM includes an increased risk of type 2 diabetes and cardiovascular disease in both the mother and the infant during adulthood. While the molecular mechanisms remain unknown, GDM and maternal obesity are strongly correlated with a higher incidence of allergic manifestations and metabolic-related diseases, including atopic dermatitis (7.5-fold increase) and allergen sensitization (5.9-fold increase) [17,18]. Most importantly, GDM is considered the starting point of a trans-generational vicious cycle of obesity and diabetes that affects the entire population and has a large economic impact on public health systems worldwide.
The molecular processes underlying the pathophysiology of GDM encompass multiple systems and organs, including neurohormonal networks, adipose tissue, liver, muscle and placenta. One of the best-known effects of GDM is β-cell dysfunction during pro-insulin synthesis, post-translational modifications, granule storage or sensing of peripheral glucose concentrations [16]. Another important molecular mechanism during GDM is the increased number of resident adipose tissue macrophages secreting proinflammatory cytokines such as TNF-α, IL-6 and IL-1β, which impair the release of insulin from β-cells [16]. Finally, it has been shown that the placenta of mothers with GDM contains a lower eosinophil count and lower expression of the immune mediators IL-10 and TIMP3 compared to normoglycemic controls [19].
Despite the importance of a well-balanced breastmilk microbiota for newborn development, few studies have focused on characterizing the breastmilk bacterial profile and its microenvironment in association with obesity and GDM. For instance, pre-conceptional obesity (BMI > 30 kg/m²) is linked to a less diverse breastmilk microbial population with increased proportions of Staphylococcus compared to Bifidobacterium and Bacteroides [20-22]. At the nutritional level, several studies show significant changes in the breastmilk of diabetic pregnancies through all the lactation phases. For example, altered levels of anti-infective proteins such as lactoferrin and altered glycosylation of sIgA suggest that glucose dysregulation has consequences for the macromolecular composition of breastmilk [23].
Unfortunately, there are no data available about microbiome changes in breastmilk from diabetic pregnancies.
Here, we report a compositional analysis of the microbiota of colostrum samples of Mexican individuals with GDM and obesity. We used a cohort study including mothers with obesity (non-GDM) and mothers with GDM and a body mass index lower than 30 kg/m² (non-obese) from Monterrey, the Mexican city most affected by GDM [24]. We sampled colostrum within the first 24 h after birth and used a 16S rRNA amplicon sequencing approach to identify changes at the taxonomical level. The work presented is pioneering in the characterization of the microbial population in colostrum of women with obesity and GDM. Most importantly, our results represent the first step towards a molecular mechanistic explanation of interactions within the breastmilk microbial population under GDM and obesity.

Results
Breastmilk samples (colostrum) from 43 Mexican individuals aged 20-32 years were used in this study.
All deliveries were at term with a mean of 39.4 weeks of gestation. A total of 18 samples from mothers with BMI < 25 kg/m² (non-obese) and no GDM symptoms were considered as controls. Twelve milk samples were collected from mothers with obesity (BMI ≥ 30 kg/m²) and without GDM, and 13 samples were obtained from mothers with GDM and a BMI < 30 kg/m². None of the participants in the gestational diabetes study group had obesity; however, eight of them were overweight (BMI ≥ 25 kg/m²). Most of our participants were multiparous and 25 women were given antibiotics during labor. Regarding delivery, 24 neonates were born by caesarean section and the sex distribution of the newborns was 48.8% males and 51.2% females. Clinical and demographic characteristics of the participants are summarized in Table 1.
Table 1. Clinical characteristics of subjects included in the study (n = 43).
After birth, breastmilk was collected within the first 24 hours and colostrum was stored at -20°C until analysis. Using NGS, 1,675,157 high-quality reads were obtained, with a mean of 38,957 ± 27,723 sequencing reads per sample. Removal of possible contaminants and rare taxa was performed through an exhaustive batch analysis of all samples processed. We identified a batch effect for Pseudomonas, Enterobacteriaceae, Ralstonia and Herbaspirillum across our dataset. Even though Pseudomonas and Enterobacteriaceae are reported in other breastmilk microbial compositional studies [8,25-28], we decided to remove them from our dataset because a clear batch effect was observed (Supplementary Figure 1).
After the removal of possible contaminants and rare taxa (≤ 25 reads in total), 1,496 amplicon sequence variants (ASVs) were assigned to 30 phyla, 58 classes, 133 orders, 217 families, 395 genera and 335 species using the Silva 132 database with a sequence identity level set at 99%. We found an increased prevalence of Staphylococcus, Corynebacterium 1, Anaerococcus and Prevotella in samples from participants with GDM and obesity. Detailed information about the abundance of ASVs found per sample can be found in Supplementary Table 1.
Colostrum compositional microbiota was dominated by Staphylococcus in all the samples.

Alpha and beta diversity metrics show a distinct microbial composition for GD-F, Ob-M and newborn sex-related samples
We fitted a general linear model (glm) to alpha diversity metrics at a sequencing depth of 6,130 (data not shown) in order to quantify the influence of GDM, obesity (BMI ≥ 30 kg/m²), mode of delivery, antibiotic exposure, multiparity and sex of the baby. Maternal physiopathology (GDM, obesity and healthy) and antibiotic exposure showed a statistically significant association (p ≤ 0.10) with the Shannon index and observed ASVs. The use of intrapartum antibiotics was related to a decrease in diversity. The sex of the baby showed a statistically significant association (p ≤ 0.05) with phylogenetic diversity and observed ASVs (Figure 3A-C). GDM subgroups presented the highest values in all alpha indexes. In addition, our results suggest that, in general, female subgroups had higher diversity compared to male subgroups. Fisher's comparisons indicate that the difference was statistically significant only between healthy-female (NW-F) and GDM-female (GD-F) for the Shannon index. Breastmilk samples from the obesity-male subgroup (Ob-M) had the lowest levels of alpha diversity and were statistically different from all other subgroups, including their female counterpart (Figure 3A-C).
We estimated microbiome beta diversity using the unweighted UniFrac distance (Figure 3D). Our results show that the obesity-male (Ob-M) subgroup clusters separately from the rest of the samples (PERMANOVA; p = 0.047; 999 permutations). Using the unweighted distance matrix, we generated a PCoA biplot showing that the clustering was significant for obesity-male (Ob-M; p < 0.05) compared to healthy-male (NW-M). Arrows in the plot represent the correlation at family level with the PCoA axes, indicating their contribution to the variation (Figure 3D). While samples from GDM-female (GD-F), GDM-male (GD-M), healthy-female (NW-F) and obesity-female (Ob-F) show high similarity in microbial composition, the unweighted measurement indicates a phylogenetic difference between obesity-male (Ob-M) and the rest of the subgroups (p < 0.05).
We used the compositional robust Aitchison distance as a beta-diversity metric in order to account for the compositional nature of the data (PERMANOVA; p = 0.002; 999 permutations) (Figure 3E). The robust principal component analysis (RPCA) biplot, which allows examination of the variation of both samples and taxa, did not show a clear separation of any subgroup. PERMANOVA tests and pairwise comparisons showed that obesity-male (Ob-M) was different from both GDM subgroups (p < 0.05) and that the obesity-female subgroup (Ob-F) was statistically different from its male counterpart (Ob-M) and from both healthy and GDM subgroups (p < 0.05). In addition, the GDM-male (GD-M) subgroup was different from its healthy counterpart (NW-M; p < 0.10). The 7 taxa presented as vectors in the plot are the most significant drivers of the location of samples (Figure 3E).
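As an illustration of two of the alpha-diversity metrics named above, the following minimal sketch computes the Shannon index and the number of observed ASVs from a vector of per-ASV read counts. It is not the QIIME 2 implementation used in the study; the function names and the choice of a base-2 logarithm are assumptions made for this example.

```python
import math


def shannon_index(counts, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) over ASVs with non-zero counts.

    `base` is an assumption of this sketch (base 2 here).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this ASV
            h -= p * math.log(p, base)
    return h


def observed_asvs(counts):
    """Richness: number of ASVs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


# Hypothetical sample: four ASVs, one of them absent
example = [40, 30, 30, 0]
print(shannon_index(example), observed_asvs(example))
```

A perfectly even sample with four ASVs gives the maximum value for that richness (2 bits), whereas a sample dominated by one taxon approaches 0.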
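To make the compositional distance concrete, the sketch below computes a plain Aitchison distance, that is, the Euclidean distance between centered log-ratio (clr) transformed samples. This is a simplification: the robust Aitchison distance used for RPCA handles zeros differently (it works only on observed entries), whereas this example adds a pseudocount, which is an assumption of the sketch. Function names are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform: log of each part minus the mean log.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between the clr-transformed samples a and b."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Hypothetical counts for three taxa in two samples
print(aitchison_distance([10, 0, 5], [1, 8, 5]))
```

Without a pseudocount, the distance depends only on ratios between taxa, so two samples that differ only in sequencing depth are at distance zero.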

Breastmilk core and differentially abundant taxa
We defined the breastmilk core microbiota as taxonomical families present in all samples with a minimum of 1% of total mean relative abundance. Overall, 8 families were identified as the core taxa and comprise 56.8% ± 11.3% of the total (Table 2). The most abundant were Staphylococcaceae with a general mean of 13.9% ± 16.5%, followed by Rhizobiaceae (10.3% ± 11.1%), Burkholderiaceae (9% ± 11.7%) and Streptococcaceae (7.1% ± 16.1%). These results demonstrate the high variability of the core bacteria among subgroups and individuals. The four most abundant core families described most of the variation in the ordination space of the unweighted PCoA biplot and were represented as arrows (Figure 3D). However, no clear contribution of families to the formation of subgroups was visible with UniFrac metrics.
Table 2. Breastmilk core microbiota at the taxonomical family level (% relative abundance ± standard deviation).
We used the ALDEx2 tool [29] in order to identify differences in ASV abundance between subgroups. We determined the taxa that were driving the difference between the subgroups and obtained effect plots (based on the effect size), which allowed us to visualize whether the variation was higher between or within subgroups. Given the high variability amongst samples, we only observed differentially abundant ASVs with a significant expected Benjamini-Hochberg corrected p-value of Welch's t test (q ≤ 0.1) in three sample pairs (Figure 4). In the GDM-female vs healthy-female (GD-F vs NW-F) comparison, the family Rhodobacteraceae was identified as different (q < 0.10; Figure 4A). In the healthy-male vs obesity-male (NW-M vs Ob-M) comparison, a total of 5 families (Vibrionaceae, Halomonadaceae, Shewanellaceae, Bdellovibrionaceae and Saccharimonadales) had an absolute effect size greater than 1 and a q-value ≤ 0.05 (Figure 4B). In the obesity-male vs obesity-female (Ob-M vs Ob-F) comparison, 10 taxa were significantly different, of which only 2 (Burkholderiaceae and Sphingobacteriaceae) corresponded to core families (Figure 4C). Based on the median difference between subgroups, we observed that in all comparisons obesity-male (Ob-M) had significantly higher abundance of the differential taxa found. In addition, GDM-female (GD-F) had a higher prevalence of Rhodobacteraceae compared to healthy-female (NW-F).
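The core definition above (present in every sample, mean relative abundance of at least 1%) can be sketched as follows. This is an illustrative reading of that rule, not the authors' script; the table layout (sample -> family -> read count), the function name and the toy data are assumptions.

```python
def core_families(table, min_mean_rel_abund=0.01):
    """Return families present in all samples with mean relative abundance >= threshold.

    `table` maps sample id -> {family name: read count} (hypothetical layout).
    """
    samples = list(table)
    families = sorted({fam for counts in table.values() for fam in counts})
    core = []
    for fam in families:
        rel = []
        for s in samples:
            total = sum(table[s].values())
            count = table[s].get(fam, 0)
            rel.append(count / total if total else 0.0)
        present_everywhere = all(r > 0 for r in rel)
        mean_rel = sum(rel) / len(rel)
        if present_everywhere and mean_rel >= min_mean_rel_abund:
            core.append(fam)
    return core


# Toy data: C is missing from one sample, E is ubiquitous but below 1% on average
toy = {
    "s1": {"A": 600, "B": 390, "C": 5, "E": 5},
    "s2": {"A": 500, "B": 497, "C": 0, "E": 3},
}
print(core_families(toy))
```

Both conditions matter: a family can be abundant on average yet absent from some samples, or ubiquitous yet too rare to pass the abundance threshold.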

Discussion
We present the first compositional study of the colostrum microbiota of individuals with obesity (BMI ≥ 30 kg/m²) and gestational diabetes mellitus. Colostrum is presumably the first postnatal maternal fluid in contact with the newborn's gastrointestinal system; therefore, identifying differences in its microbial composition could directly explain aberrant function of the infant's gastrointestinal immune system. Overall, our results show that the colostrum microbial composition is dominated by the Staphylococcaceae and Rhizobiaceae taxonomic families (Figure 1A). Staphylococcus, Chryseobacterium, Streptococcus and Sphingomonas were the most abundant genera in our data set, which agrees with the microbial composition of breastmilk from women from Mexico, Taiwan, Finland and China (Beijing area) [25,28,30]. Other datasets showed Pseudomonas and Staphylococcus as predominant taxa in breastmilk samples from Spanish and Irish individuals, in addition to commensal and obligate anaerobes such as Bifidobacterium and Bacteroides [27,31].
In general, low-biomass samples are unstable and their relative abundance profiles depend on subtle changes in sample handling or sequencing methodologies [32,33]. More importantly, low-biomass samples are prone to contaminants from DNA extraction and library preparation kits [34]. Different studies show that the presence of Pseudomonas and Ralstonia in breastmilk could be the result of reagent contamination, especially when culturomics approaches fail to isolate such microorganisms in selective culture media [31]. As an alternative to negative controls, batch processing of all the samples allows for visual inspection of contaminants. In our study, samples were processed for DNA purification in a total of 16 batches. Detailed analysis allowed us to identify a batch effect across our dataset for Pseudomonas, Enterobacteriaceae, Ralstonia and Herbaspirillum (Supplementary Figure 1). Even though such taxonomic groups have been reported in different breastmilk studies, including in a cohort of Mexican individuals [8,25,27,28], we decided to remove them from our dataset.
The candidate division WPS-2 accounted for 2.1% of total relative abundance in our samples. This phylum has been described in human and canine oral microbiota and in soil [35-39]. WPS-2 was incorporated in the Human Oral Microbiome Database (HOMD) in 2014 [40]. While it has not yet been described in the microbiota of the newborn oral cavity, WPS-2 was present in nasopharynx samples from infants under 6 months of age [41]. This is the first time that WPS-2 is reported in breastmilk samples with high prevalence, supporting the retrograde flux theory, in which microbiota from the neonate's mouth contributes to the settlement of breastmilk bacteria [42].
In addition, we identified that breastmilk microbial diversity was specific to the newborn's sex (p ≤ 0.05). Our results are in accordance with previous reports where BMI and the neonate's sex were related to enrichment of Staphylococcus, Stenotrophomonas and Burkholderia [8]. Regardless of the pathology (GDM or BMI), female-related colostrum samples showed higher alpha diversity compared to male subgroups, suggesting a more diverse microenvironment (Figure 3A-C). It has been demonstrated that gut and oral microbiota from children and maternal breastmilk biochemical composition differ between female and male infants, possibly due to variation in hormone recruitment and energetic demand during pregnancy [8,43-45]. Accordingly, we strongly suggest that the sex bias of the neonate microbiota should be an important consideration for further experimental designs that try to establish causality between microbial changes and any pathology.
Despite the sample heterogeneity, we guided our analysis by a general linear model to show that colostrum samples of individuals with obesity (Ob-F and Ob-M) or GDM (GD-F and GD-M) are enriched for Staphylococcus (p ≤ 0.10) compared to their healthy normal-weight counterparts (Figure 2). It has been observed that higher numbers of Lactobacillus and Staphylococcus were related to higher maternal BMI [21,46,47]. In addition, decreased Streptococcus abundance and an increase in breastmilk microbial diversity in Mexican-American subjects with high BMI (> 25 kg/m²) have been previously reported [10]. These findings agree with our results, where Streptococcus was less abundant and high values of observed ASVs were found in the obesity-male (Ob-M, 167 ASVs) and obesity-female (Ob-F, 295 ASVs) subgroups. However, other reports show that colostrum samples from obese mothers presented a less diverse microbiota compared to non-obese samples [21]. This variability can be attributed to differences in study populations (geographical location, diet, socioeconomic status), sample collection at different lactation stages and neonate sex bias.
We implemented the ALDEx2 tool, which performs a log-ratio transformation and replacement of zero values to create a matrix that allows the determination of significant differences in taxa between subgroups [29]. Interestingly, we observed Bdellovibrionaceae and Saccharimonadales, which are ultra-small parasitic bacteria, differentially present in Ob-M compared to its corresponding healthy (NW-M) contrast (Figure 4B). While further research is needed, this pattern could be attributed to resilience mechanisms of the breastmilk microenvironment to maintain a functional equilibrium through specific predatory interactions with Gram-negative bacteria such as Burkholderia and Chryseobacterium, which also appear to be in higher proportions in obesity-male (Ob-M; Figure 2). This may be explained by the detection of DNA fragments resulting from bacterial lysis. Bdellovibrionaceae has been found in soil, freshwater and the human gut of healthy subjects and patients suffering from inflammatory diseases [48]. This taxon is considered a potential probiotic, since it could modulate gut biodiversity by its predation of bacteria associated with chronic inflammatory diseases, such as obesity and Crohn's disease [49,50]. On the other hand, Saccharimonadales has been reported in the human oral cavity, intestines, skin and female genital tract [51-53]. Lif et al. [54] related birthing method to a higher prevalence of this novel phylum in oral biofilm samples of infants delivered vaginally compared to infants born by cesarean section. Even though Saccharimonadales remains difficult to cultivate, its presence in adult subgingival plaque, vagina and colon has been associated with human inflammatory mucosal diseases [55,56].
Our results indicate a differential prevalence of Burkholderiaceae and Sphingobacteriaceae in breastmilk samples from obesity-male (Ob-M) compared to its female contrast (Ob-F; Figure 4C). Similar results have been described in oral samples from male infants, which showed a higher abundance of Brachymonas and Sphingomonas [45]. We hypothesize that differences in breastmilk microbiota by infant sex influence the conditioning of the neonate's gut microenvironment for bacterial communities related to the metabolism of nutrients involved in sex-related neurodevelopment.
We observed a higher relative abundance of Staphylococcus, Anaerococcus and Prevotella in both GDM subgroups compared to their corresponding contrast control (NW). Interestingly, similar profiles have been reported for the gut microbiota of individuals with GDM [57-59]. Complementarily, a high prevalence of Prevotella has been observed in the oral cavity, amniotic fluid and gut microbiota of pregnant women with GDM, which confirms vertical mother-to-baby transmission and supports the enteromammary theory of breastmilk microbiota origin [57,60-62]. Functional metagenomic approaches are needed in order to determine the exact role of Prevotella as a keystone taxon under the specific microenvironment shaped by GDM. Rhodobacteraceae was observed in higher proportion in GDM-female (GD-F) compared to healthy-female (NW-F) (adjusted p-value < 0.1). Rhodobacteraceae has been mostly reported in soil [63], but also in breastmilk from healthy Mexican women [25], human skin [64], meconium [65] and fecal samples from patients suffering from diarrhea [66].
In order to provide a preliminary functional insight into the taxonomic profiles observed in our study, we used the Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2) software [67] and Linear Discriminant Analysis (LDA) Effect Size (LEfSe) [68]. A total of 24 metabolic pathways reported in the MetaCyc database [69] were considered significant (p-value < 0.05, LDA score > 3.5). The results suggested that bacterial pathways involved in amino acid and carbohydrate metabolism were significantly represented in the obesity and GDM subgroups (Figure 5).
Although higher levels of the branched-chain amino acids (BCAA) valine, leucine and isoleucine have been described in blood from women with GDM and have been linked to the early pathogenesis of type 2 diabetes [70-72], lactation for more than 3 months in women with GDM has been associated with changes in the BCAA profile. However, the mechanism that explains this protective role is still unclear [71,73]. Palmitate biosynthesis and inositol isomer degradation pathways were more prevalent in the GDM-female (GD-F) subgroup. Palmitate is the anion of palmitic acid, one of the major saturated fatty acids in colostrum; it provides around 25% of milk fatty acids and is involved in the absorption of fat and calcium [74]. Accordingly, GDM and obesity have been related to higher levels of this free fatty acid in placenta, umbilical cord and breastmilk [75-77]. Myoinositol has an important role in several biological processes related to cell survival, lipid metabolism, glycemic control and restoration of ovulation [78]. Women with GDM or polycystic ovary syndrome are deficient in inositol isomers [79,80]. Supplementation with myoinositol during pregnancy improves glucose metabolism, reduces the incidence and severity of GDM and decreases adverse neonatal outcomes [78-81]. We acknowledge that prediction of bacterial functional metabolic pathways based on 16S amplicon sequencing is strongly biased towards existing reference genomes, and caution must be taken in the interpretation of results.

Conclusions
The factors determining colonization of an infant's gut microbiota are important, and understanding them could lead to improved health policies. Amplicon sequencing using NGS technologies is the most reliable method for large-scale microbial compositional studies [82]; however, in order to obtain relevant functional insights, clinical protocols should include the dietary habits of the mother, the sex of the infant and a strong control of antibiotic usage. Our study indicates that GDM and obesity are related to a higher microbial diversity and to a significant overrepresentation of bacterial pathways for amino acid and carbohydrate metabolism.

Study population
Recruitment was done at Hospital Regional Materno Infantil, the main public perinatal medicine hospital of Servicios de Salud del Estado de Nuevo León, Mexico. We included mother-infant pairs, with maternal age between 20 and 32 years and a verified address in the Monterrey Metropolitan Area, who accepted the invitation to participate and signed the informed consent document. Exclusion criteria were: mothers who had a history of antibiotic usage in the 3 months prior to delivery; mothers who had prolonged exposure to antibiotics (more than 3 weeks) at any time during pregnancy; mothers who received immunosuppressive or immunomodulatory corticosteroid therapy; history of a vegan, ovolactovegetarian or exclusion diet (e.g., ketogenic diet); history of bariatric surgery or any complicated surgery; history of feeding disorders; exposure to antineoplastic drugs, histamine-H2 receptor antagonists or proton pump inhibitors and/or monoclonal antibodies; and/or history of diarrhea during the three weeks before delivery. In addition, those with an uncertain last menstrual period date or irregular periods that led to uncertain pregnancy dating were excluded. Elimination criteria were: antibiotics for more than 24 hours post-delivery; need for intensive care (mother or infant); and/or any condition that impeded collection of breast milk.
The selected mother-infant pairs were further allocated for analysis to one of three study groups, according to their BMI and health condition: 12 obese women (BMI ≥ 30 kg/m²), 13 women suffering from gestational diabetes (BMI ≤ 29.9 kg/m²) and 18 healthy control women (BMI ≤ 25 kg/m²). The study groups were further sub-divided by sex of the baby. The study protocol was approved by the Institutional Review Boards at Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, with the ID P000185-CarMicrobioLHum2018-CEIC-CR002, on May 6th, 2019.

Samples collection and processing
After gentle cleansing of the breasts with sterile water only, each mother performed a gentle circular massage of each breast until a few drops of colostrum appeared. These first drops were discarded, and the mother self-expressed approximately 5 mL (when possible) of colostrum into a sterile Falcon-type 20 mL polypropylene tube, under close medical supervision. The procedure was repeated on the opposite breast. Extreme care was taken to avoid milk contact with breast skin or fingers. Then, the tubes were closed and kept at -20°C for no more than 48 hours until DNA extraction.

DNA extraction
Genomic DNA was extracted from 1 mL of colostrum using an optimized phenol-chloroform protocol [83].
Samples were thawed on ice and centrifuged at 16,000 x g for 15 minutes, the fat rim was carefully removed, and PBS washes were performed to eliminate fat residues (0.5 mL sterile PBS, centrifugation at 16,000 x g for 10 minutes). The pellet was resuspended in 0.5 mL of extraction buffer (220 mM Tris-HCl pH 7.5, 110 mM EDTA, 1100 mM NaCl, 20% Triton X-100, 2% SDS) and 0.3 mL of 3 M sodium acetate. An additional step of mechanical lysis was performed by bead-beating with lysing matrix A using a FastPrep (MP Biomedicals, Santa Ana, CA) disruptor at a speed setting of 5.5 m/s for 25 s. The lysate was subjected to enzymatic lysis with 10 µL of proteinase K (10 mg/mL), 5 µL of lysozyme (10 mg/mL) and 10 µL of RNAse, and incubated at 60°C for 1 hour. After incubation, 100 µL of 1.5 M NaCl (filter sterilized) were added, carefully mixed and maintained at room temperature for 5 minutes. Following incubation, the mixture was centrifuged at 16,000 x g for 15 minutes and the supernatant was transferred into a new tube and extracted twice with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1). DNA was precipitated by the addition of 0.6 volumes of isopropanol and incubation at -80°C for 1 hour. Next, we centrifuged the samples (16,000 x g for 15 minutes), and the isopropanol was removed. The pellet was washed twice with 70% ethanol, air dried and resuspended in 50 µL of preheated nuclease-free water. The DNA was measured using a NanoDrop ND-1000 UV spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and DNA integrity was confirmed by agarose gel electrophoresis. Unless otherwise specified, all reagents were purchased from Sigma Aldrich.
In order to identify any contamination during DNA extraction, we processed all the samples in a total of 16 different batches using solutions freshly prepared before running the experiments. Microbial compositional information for all the batches is provided as Supplementary Figure 1.

DNA sequencing and analysis
DNA samples were sequenced at the Advanced Genomics Unit (LANGEBIO, CINVESTAV) using Illumina MiSeq (2x300) following the 16S rRNA amplicon sequencing library preparation (as per manufacturer recommendations) for the amplification of the V3-V4 hypervariable region with the universal primers 341F (5'-CCTACGGGNGGCWGCAG-3') and 785R (5'-GGACTACHVGGGTATCTAATCC-3'). We normalized the DNA for sequencing at 25 ng/µL.
Bioinformatic analyses were carried out using QIIME 2 v.2019.7 [84]. Sequencing reads were quality filtered using the q2-demux plugin with a minimum length of 270 nucleotides, followed by denoising with DADA2 [85]. Single-paired filtered reads were used for taxonomic profiling of amplicon sequence variants (ASVs) with the q2-feature-classifier [86] against the Silva 132 database with a sequence identity limit set at 99% [87]. Removal of potential contaminants included ASVs belonging to Cyanobacteria, Phyllobacterium, Chloroflexi, mitochondria/chloroplast and rare taxa (with less than 25 reads across the entire dataset). The resulting ASVs were aligned with mafft v2019.7 [88] and used to create a phylogeny with fasttree2 v2019.7 [89]. Rarefaction of sequences to 4,130 per sample was used to perform alpha and beta diversity analyses. Observed ASVs, Shannon index and Faith's phylogenetic diversity were used as alpha-diversity metrics; UniFrac (weighted and unweighted) and robust Aitchison distances were used for the creation of PCoA and RPCA, respectively. We defined as core microbiota the bacterial families present in all samples with a minimum overall relative abundance of 1%. ASVs that were assigned at family level were considered for differential abundance analysis with the ALDEx2 tool v1.14.1 [29]. We used the Phylogenetic Investigation of Communities by Reconstruction of Unobserved States 2 (PICRUSt2) v2.3.0-b software [67] with default options (picrust2_pipeline.py) to determine the potential link between the microbiome environment and functional metabolism based on metabolic pathways reported in the MetaCyc database [69]. Raw data were deposited and are available at the NCBI Sequence Read Archive (SRA) under accession number PRJNA638389.

Data processing
QIIME 2 α-diversity outputs and ASV count tables at the family and genus taxonomical levels were imported and processed in Minitab 17. The association of observed ASVs, Shannon index, Faith's phylogenetic diversity and transformed ASV count tables with maternal health condition, mode of delivery, antibiotic exposure, parity and sex of the neonate was assessed by a general linear model (glm) with a p-value of ≤ 0.10. β-diversity significance for UniFrac and robust Aitchison distances was calculated using permutational ANOVA (PERMANOVA) with 999 permutations. Differential bacteria were assessed with Welch's t test with a Benjamini-Hochberg false discovery rate (FDR) p-value correction after a centered log-ratio (clr) transformation with zero replacement of taxa counts. Functional metabolism prediction results from PICRUSt2 were further analyzed in the MicrobiomeAnalyst web-based platform [90] using the LEfSe method [68], which performs a Kruskal-Wallis rank sum test to determine the significantly different features between groups and a linear discriminant analysis (LDA) to estimate their effect size. A metabolic pathway was considered significant with a p-value ≤ 0.05 and an LDA score > 3.5.
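The rarefaction step can be pictured as drawing a fixed number of reads per sample without replacement, and dropping samples that fall below that depth. The sketch below is only an illustration of that idea, not the QIIME 2 routine used in the study; the function name, the dictionary layout and the toy counts are assumptions.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from one sample.

    `counts` maps ASV id -> read count (hypothetical layout).
    Returns None when the sample has fewer reads than `depth`.
    """
    total = sum(counts.values())
    if total < depth:
        return None  # sample would be excluded from diversity analyses
    rng = random.Random(seed)
    # Expand counts into one entry per read, then draw without replacement
    pool = [asv for asv, c in counts.items() for _ in range(c)]
    tally = Counter(rng.sample(pool, depth))
    return {asv: tally.get(asv, 0) for asv in counts}


# Toy sample rarefied to 100 reads (the study reports a depth of 4,130)
sample = {"asv1": 300, "asv2": 150, "asv3": 50}
print(rarefy(sample, depth=100, seed=1))
```

Expanding the counts into a read pool is simple but memory-hungry; for real tables a multivariate hypergeometric draw would be the more economical choice.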
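For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following minimal sketch converts a list of raw p-values into adjusted values (q-values). It illustrates the standard step-up procedure only and is not the ALDEx2 code; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical raw p-values from four family-level comparisons
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A taxon would then be called differentially abundant when its adjusted value falls at or below the chosen cutoff, such as the q ≤ 0.1 threshold used in this study.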

Consent for publication
All authors have read the manuscript and have provided their consent for publication.

Availability of data and materials
Raw data is available in the NCBI under ID number 638389 and Bioproject accession number PRJNA638389.

Competing interests
All authors have read the manuscript and declare no conflict of interest.

Funding
The project was funded under ITESM-0021C21064 seed grant.

Figure 4
Differential bacteria at family level between subgroups. Each panel shows a plot illustrating the differentially abundant taxa for each comparison and their median difference of centered log-ratio (clr) transformation, which indicates the magnitude of the difference in abundance. Taxa in bold letters represent members of the core microbiota. Dots in the plot represent the median difference of significant features after a Welch's t test (adjusted p-value ≤ 0.1) and effect size ≥ 1. Dots are colored according to the subgroup that contained a greater fraction. A) Comparison of GDM-female versus healthy-female. B) Comparison of GDM-male versus obesity-male. C) Comparison of healthy-male versus obesity-male. D) Comparison of obesity-female versus obesity-male. *q ≤ 0.1; **q ≤ 0.05.
