Bacteroidota and Lachnospiraceae Integration Into the Gut Microbiome at Key Time Points in Early Life are Critical for Neurodevelopment

Background: The early life microbiome plays critical roles in host development, shaping long-term outcomes including brain functioning. It is not known which initial infant colonizers elicit optimal neurodevelopment; thus, this study investigated the association between gut microbiome succession from the rst week of life and head circumference growth (HCG), the earliest validated marker for neurodevelopment. Faecal samples were collected weekly from a preterm infant cohort during their neonatal intensive care unit stay and subjected to 16S rRNA gene sequencing for evaluating gut microbiome composition, in conjunction with clinical data and head circumference measurements. Results: Preterm infants with suboptimal HCG trajectories had a depletion in the abundance/prevalence of Bacteroidota and Lachnospiraceae, independent of morbidity and caloric restriction. The severity of gut microbiome depletion matched the timing of signicant HCG pattern separation between study groups at 30 weeks postmenstrual age. Consideration of the clinical variables indicated that optimal infant microbiome succession is primarily driven by dispersal limitation (i.e., delivery mode) and secondarily by habitat ltering (i.e., antibiotics and enteral feeding). Conclusions: Bacteroidota and are known core taxa of the adult microbiome, with roles in dietary glycan foraging, metabolite production and immunity, and our work provides strong evidence that their integration into the gut microbiome needs to occur early for optimal SHCGT groups could not be signicantly distinguished from each other until the measurement end point of 36 weeks PMA. Together, these results indicate that SHCGT could generally be treated as a loss in HCZ of greater than 0.5 that occurred from 31-36 weeks PMA. Therefore, further analysis of the infant gut microbiome was conducted on the time windows of 24-30 weeks PMA (before SHCGT) and 31-36 weeks PMA (during SHCGT), which is also concordant with the gut microbiome successional pattern.

Studies to date have focused on nutritional interventions for improving HCG in infants [16][17][18][19][20][21][22][23]. However, associations between infant diet and neurodevelopment are not always clear; for example, a recent metaanalysis on human milk feeding for improving neurodevelopmental outcomes demonstrated only weak or inconclusive evidence for its bene cial effects [23]. One overlooked factor that is gaining attention is the ecosystem of microorganisms that populate the gastrointestinal tract, the gut microbiome. The gut microbiome diversi es in parallel with infant development [24][25][26][27], and antibiotic use in the rst year of life has been correlated with later developmental outcomes, including lower cognitive capabilities, attention de cit hyperactivity disorder (ADHD), anxiety and depression [28,29]. The gut microbiome also intersects with many environmental factors already associated with neurodevelopment [30][31][32][33], importantly including nutrition. Critical roles of the gut microbiome in digestion include breaking-down macronutrients, producing neuroactive metabolites after fermentation, and synthesizing micronutrients [5,34]. The gut microbiome additionally indirectly affects neurodevelopment via its interactions with the immune system and intestinal barrier, impacting systemic in ammation and circulation of metabolites [5,34]. Alterations in the composition and function of the gut microbiome have been associated with developmental disabilities in children including autism spectrum disorder (ASD) [35] and ADHD [36]. However, it is not known which early life microbes occurring prior to developmental disability onset are vital to optimal neurodevelopment, and no study to date has examined the relationship between the earliest neurodevelopmental marker, HCG, and the gut microbiome in infants over time.
To address this knowledge gap, prospectively collected longitudinal faecal samples from 58 infants born <34 weeks gestational age at the University of Chicago Comer Children's Hospital were subjected to 16S rRNA gene sequencing [37,38] to determine the microbiome composition and predicted functional pro le [39]. Preterm infants were selected as they represent a substantive cohort (1 in every 10 infants worldwide is born preterm [40]) at an increased risk for both developmental disability [41] and altered gut microbiome con gurations from clinical exposures such as antibiotics [24][25][26][27]. Further, preterm infants can be strictly monitored within the neonatal intensive care unit (NICU) environment, allowing statistical and machine learning models to be built that incorporate HCG trajectories, time, and clinical variables. We hypothesized that features of the early gut microbiome are associated with HCG from birth to termequivalent age in the NICU and found that infants with suboptimal HCG trajectories (SHCGT) experienced a concurrent loss in HCG and the abundances of Bacteroidota and Lachnospiraceae speci cally from 31-36 weeks postmenstrual age (PMA) independent of clinical morbidity and caloric restriction. Thorough examination of the clinical variables revealed preferential effects of infant microbiome successional drivers, with dispersal limitation (i.e., delivery mode) superseding habitat ltering (i.e., antibiotics and enteral feeding) in shaping optimum microbiome and thus host development.

Methods
The primary outcome was infant head circumference growth (HCG). HCG trajectory was evaluated by the difference between head circumference z-scores (HCZs) calculated from the Fenton 2013 growth curve [42] by the completed weeks method at birth and 36 weeks PMA. If the infant was discharged prior to 36 weeks PMA, the measurement at NICU discharge was taken instead (no infant was discharged prior to 34 weeks PMA). Infants that expired in the NICU were excluded. Head circumferences were measured weekly by nursing staff.
For assessment of the infant faecal microbiome, infant diapers were collected weekly by the nursing staff and stored immediately at -20 °C. Research staff aliquoted the faecal samples into sterile microcentrifuge tubes under an anaerobic atmosphere (5/5/90 H 2 /CO 2 /N 2 ). Samples were immediately cryopreserved at -80 °C after aliquoting. Clinical variables identi ed as possible confounders were gathered by abstracting information from patient charts using the electronic medical record system at the University of Chicago Comer Children's Hospital. Data collected included delivery mode, gestational age at birth, sex, birthweight, head circumference at birth, enteral feeding regimens, antibiotics administration, and clinical morbidities.
Enteral feeding data was extracted as the total volume in mL of formula, human milk and total enteral feeds (formula + human milk) for each day the infant was in the NICU up until the head circumference measurement end point. Human milk comprised of mother's own milk except for two infants that were fed donor human milk. Infants were additionally weighed daily by nurses, and the feeding volumes in mL/kg were computed using the bodyweight recorded. The daily feeding amounts were then averaged over the time periods of interest, and the percent of human milk was calculated by dividing the amount by the total enteral feeding amount. The number of days of total parenteral nutrition (TPN -0 mL/kg total enteral feeds) over the time periods of interest was also calculated, in addition to the rst day of life that full enteral feeds (120 mL/kg) was achieved.
Antibiotics data was extracted up until the head circumference measurement end point as the total number of days and longest number of consecutive days administered over the time periods of interest.
The administered antibiotics (via IV or injection unless otherwise noted) included antifungals (amphotericin B except for one infant that received oral nystatin), cephalosporins (3 rd generation cefotaxime, ceftazidime or ceftriaxone except for two infants that received 1 st generation cefazolin, four that received 4 th generation cefepime and one that received 1 st generation oral cephalexin), clindamycin, oral erythromycin, gentamicin, metronidazole, penicillins (ampicillin except for one infant that received penicillin, two that received piperacillin, two that received oxacillin and two that received oral amoxicillin) and vancomycin.
In addition to the total length of stay in the NICU and the PMA at discharge, the morbidities reported include bronchopulmonary dysplasia (BPD), severe brain injury (SBI), severe retinopathy of prematurity (SROP), necrotizing enterocolitis (NEC), seizures and sepsis. BPD was de ned as the need for supplemental oxygen at 36 weeks PMA [43]. SBI was de ned as the presence of periventricular leukomalacia and/or grade 3 or 4 intraventricular haemorrhage on a cranial ultrasonogram [44]. SROP was de ned as unilateral or bilateral retinopathy of prematurity (ROP) at stage 4 or 5 and/or ROP requiring treatment by laser or anti-vascular endothelial growth factor drugs [44]. NEC was de ned by the modi ed Bell's criteria, with at least stage 2 being requisite for disease classi cation [45]. Presence of seizures required both clinical and EEG-activity con rmation. Sepsis encompassed both blood culturepositive early-(<72 h) and late-(>72 h) onset.
Illumina 16S rRNA gene sequencing and processing: Patient faecal samples were submitted to the Environmental Sample Preparation and Sequencing Facility at Argonne National Laboratory (Lemont, IL, USA) for genomic DNA extraction and Illumina 16S rRNA gene sequencing [37,38]. FASTQ les received from the facility were imported into the QIIME2 version 2019.7 [46] pipeline per batch and demultiplexed using the demux tool. The demultiplexed samples were then denoised by the sample inference tool DADA2 [47] from amplicon data within QIIME2. Batches were then exported and merged prior to being imported into R statistical software version 3.6.2 for subsequent classi cation and post-processing. Amplicon sequence variants (ASVs) were classi ed to the genus level by the IDTAXA method [48] via R package DECIPHER version 2.14.0 using the Genome Taxonomy Database [49] version 89. ASVs were additionally classi ed into species-like groups by the online NCBI Nucleotide Basic Local Alignment Search Tool [50] (BLAST -https://blast.ncbi.nlm.nih.gov/Blast.cgi). Top hits were selected by rst the highest percentage identity, and second the lowest e-value. ASVs were amalgamated by identical lists of top hits, taking the sequence with the highest count in the dataset as the representative. Species were called if the top hit was of ≥97% identity and matched the genus identi ed by IDTAXA; multiple species calls were allowed. After classi cation, low quality samples with <1000 total sequence counts were removed, and then species-like groups that represented <0.1% mean abundance were culled.
The α-diversity metrics of richness and Shannon diversity were computed by R package iNEXT [51] version 2.0.20. For β-diversity analysis, the taxonomic levels of phylum, family, genus, and species were individually considered. These data were centre-log ratio (CLR) transformed using R package ALDEx2 [52] version 1.18.0 and taking the median of the Monte-Carlo instances as the value. This CLR transformation allowed standard statistical analysis to be conducted on the inherently compositional data. Predicted functional pro les were obtained via the R package Tax4Fun2[39] version 1.1.5 using their downloaded version 2 reference dataset from NCBI BLAST (Ref99NR) and KEGG [53]. As the output of Tax4Fun2 yields not raw counts but percent abundances, this data was CLR transformed manually, with zeroes imputed by the non-parametric multiplicative simple method using a detection threshold of 1x10 -12 through R package zCompositions version 1.3.4.

Statistical analysis:
All statistical analysis was done in R statistical software version 3.6.2 with plots generated by R package ggplot2 version 3.3.0. Original study groups were de ned by the loss in HCZ from birth to 36 weeks PMA: appropriate head circumference growth trajectory (AHCGT -≥0.5; n=28 patients, n=118 faecal samples), mildly SHCGT (<0.5-1; n=16 patients, n=67 faecal samples), moderately SHCGT (<1-1.5; n=8 patients, n=32 faecal samples) and severely SHCGT (<1.5; n=6 patients, n=23 faecal samples). Later analysis combined all SHCGT groups in comparison to the AHCGT group, and the moderately to severely SHCGT groups in comparison to the AHCGT to mildly SHCGT groups, for both the complete and limited morbidity (LM) datasets. The LM subset comprised of infants without SBI, SROP, NEC, seizures, or sepsis; BPD was not excluded as it had an equivalent risk ration between study groups (Table 2): AHCGT (n=23 patients, n=101 faecal samples), mildly SHCGT (n=13 patients, n=50 faecal samples) and moderately to severely SHCGT (n=8 patients, n=32 faecal samples).
Differences in the clinical characteristics of delivery mode, sex, and incidence of morbidities between study groups were assessed by the Fisher's exact test and presented as the percentages (number) per group. Differences in the remaining clinical variables between study groups were assessed by Welch's ANOVA (multiple groups) or Welch's t-test (two groups) and presented as the mean ± standard deviation per group. Post-hoc analysis was conducted by the Games-Howell method using R package userfriendlyscience version 0.7.2, and pairwise Cohen's Ds were computed by R package rstatix version 0.6.0 without assuming equal variances. Two-sample tests with two-sided p values were completed unless the mean and standard deviation for a study group was zero, which occurred for the clinical variables of clindamycin, erythromycin, and metronidazole administration. In these cases, a series of Welch's t-tests were conducted with the lowest p value reported, and one-sample tests with one-sided (greater) p values were completed in lieu of the all zero study groups. A standard p value <0.05 was considered statistically signi cant. Detailed statistics are provided for the complete dataset [see For multidimensional β-diversity analysis, redundancy analysis (RDA) was performed by R package vegan version 2.5.6 on the genus level data, with HCG trajectory and PMA as environmental variables.
Both overall signi cance of the model and term signi cance was determined by an ANOVA like permutation test, with constraints on the permutations by patient, through this R package. A standard p value <0.05 was considered statistically signi cant. Vector or factor averages of the environmental variables were similarly computed by this R package, for both reporting the R 2 values and locating the study group centroids on the RDA plot. Details of the RDA are provided [see Additional le 3].
For determining statistically signi cant differences in α-diversity metrics, individual microbial taxon abundances or predicted function (KEGG orthologies -KOs) abundances between study groups, mixedeffect linear models were constructed via R package lme4 version 1.1.23 using HCG trajectory and PMA as xed effects, and patient as a random effect. Additionally, an equivalent approach was utilized to test which microbial taxon abundances were resultant from vertical transmission, by replacing the xed effect HCG trajectory with delivery mode. Variables were standardized using R package standardize version 0.2.1 prior to tting the model, and p values were determined by Satterthwaite's method using R package lmerTest version 3.1.2. For the α-diversity metrics, a standard p value <0.05 was considered signi cant. To correct for multiple testing of the individual microbial taxa and KOs, the Benjamini-Hochberg method was utilized, and a false-positive rate of <1% was the set threshold for signi cance. The marginal coe cient of determination for generalized mixed-effect models (variance explained by xed effects) calculated by R package MuMIn version 1.43.15 was also reported as the R 2 . Change point analysis (see below) uncovered the key time windows of 24-30 weeks PMA and 31-36 weeks PMA. The differences in individual microbial taxon abundances and KO abundances between study groups were thus again evaluated during these time windows using the same method as above, except PMA was removed as a xed effect since it was already inherently controlled. The standardization step was also omitted as there would be no issues of scaling due to the model having only one xed effect.
Prevalence of individual microbial taxa or KOs amongst patients within study groups was also computed overall and separately for the key time windows of 24-30 weeks PMA and 31-36 weeks PMA. If a given microbial taxa or KO had >0 sequence counts in at least one faecal sample during the time window of interest, it was considered to be present in the patient. Statistical signi cance was then evaluated between study groups by the Fisher's exact test, with the Benjamini-Hochberg correction for multiple testing applied. A false-positive rate of <1% was the set threshold for signi cance. The ratios of study group percent prevalence are also reported.
Discussed key microbial taxa for HCG were both signi cantly differentially abundant or prevalent between study groups by the set threshold and were present in at least three patients of one study group. For KOs, statistically signi cant features were tallied by their respective KEGG [53] pathways using R package KEGGREST version 1.26.1. The importance of KEGG pathways for discussion was then determined through ranking them by their total number of signi cant features. Detailed statistics from all studied datasets and time windows are provided for both the abundance and prevalence of signi cant microbial taxa and KOs [see Additional le 4]. This same procedure was applied to discussed vertically transmitted microbial taxa that were signi cantly differentially abundant by delivery mode; detailed statistics are provided [see Additional le 5].
Change point analysis: Individual microbial taxa were selected for change point analysis if they signi cantly varied in abundance by PMA as determined from the above mixed-effect linear models. Change point analysis was conducted on the mean percent abundances of the microbial taxon over PMA, and separately for each study group, in case the microbial taxon had a distinct pattern over time by the clinical variable of interest. Only the patients for which the microbial taxon was present in at least one faecal sample (>0 sequence counts) were used for determining change points, as membership of individual microbial taxa was expectedly variant between patients, and thus the number of all zeroes would obscure the patterns of mean microbial taxon abundances over time. A data point for each PMA week from 24-36 was required for this analysis, and missing points were imputed by R package imputeTS version 3.0. Change points were evaluated using the pruned exact linear time (PELT) algorithm with a non-parametric cost function based on the empirical distribution of the data using R package changepoint.np [54] version 1.01, with the number of quartiles set to four times the log length of the time series. Change points were evaluated for a range of penalty values between a minimum of the log length of the time series and a maximum of ten times that value. The ideal penalty value (and thus number of change points) was selected from the diagnostic plot as outlined by Lavielle [55], i.e., the value at which the difference in test statistic (cost of penalty) has the largest improvement, or the elbow of the plot. This elbow was automatically detected through determination of the value that had the largest distance from a straight line between the minimum and maximum plot points. The change points were then tallied by PMA week for each examined taxonomic level (phylum, family, genus, and species), which revealed the time point at which the most changes in microbial taxon abundances were occurring.

Random forest classi cation:
To determine the relative importance of clinical factors versus faecal microbiome features in predicting infant HCG trajectories, the random forest classi cation approach was applied. The R package caret version 6.0.86 was utilized to build random forest classi ers with 500 trees, and the number of variables randomly sampled as candidates at each split (argument mtry) tuned from a random search using 10-fold adaptive cross-validation and 3 repeats. For evaluating feature importance, statistical signi cance was estimated for the decrease in Gini coe cient from permuting the response variable by R package pRF version 1.2. Features were then ranked by their permutation importance [56], or the number of permutations yielding a lower importance than observed out of 1001. The key time windows of 24-30 weeks PMA and 31-36 weeks PMA were each separately considered for building random forest classi ers to predict the binary outcome of AHCGT versus any SHCGT.
Classi ers were rst built using the faecal microbial taxonomic abundance data at the studied individual taxonomic levels on a per faecal sample basis with indexing to control against samples from the same patient being included in both the training and testing subsets. Microbial taxa that had a statistically signi cant (p<0.05) feature importance, were present in at least three patients' faecal samples and were classi ed to the respective taxonomic level examined were subsequently selected to be included in the nal multi-taxonomic level random forest classi ers. In order to be combined with the clinical data, it was necessary to reduce the microbial taxonomic abundance data to one value per patient, and thus the median abundance across each of the patient's faecal samples for the respective time windows was taken as the value. Clinical data included mode of delivery, gestational age at birth, sex, birthweight, birth head circumference, morbidities (BPD, SBI, SROP, NEC, seizures, sepsis, presence of any morbidity, presence of two or more morbidities and number of morbidities), enteral feeding (total in mL/kg, formula in mL/kg, human milk in mL/kg, percentage of human milk, and days of TPN over time window of interest), and antibiotics (all, penicillins, gentamicin, vancomycin, cephalosporins, antifungals, clindamycin and metronidazole in number of days and longest number of consecutive days over time windows of interest). Faecal microbial taxonomic prevalence data were also added to the multitaxonomic level random forest classi ers in the same fashion as the faecal microbial taxonomic abundance data, except it was not necessary to reduce these data to one value as the values were already in the format of microbial taxon presence versus absence per patient over the respective time windows.
After the multi-taxonomic level random forest classi ers were built, a further feature reduction step took place to remove redundant variables. Features were ranked by importance as previously described, and this information was used to select if morbidities were better indicated individually or by one of the summary variables, if enteral feeding-type was better indicated in mL/kg or by percentage, if enteral feeding withdrawal was better indicated by days of TPN or total enteral feeds in mL/kg, and if antibiotics were better indicated as a total or by separate classes and by total days or by longest number of consecutive days. Further, faecal microbial features were reduced by determining if a given microbial taxon was better indicated by prevalence or median abundance, and if a given higher taxonomic level was a better indicator than its lower taxonomies. The nal random forest classi ers accuracies are described by the subtraction of the out of bag error from 100, and features are displayed by their permutation importance. Details of the random forest classi ers are provided [see Additional le 6].

Moderation analysis:
To assess the impact of faecal microbial taxon abundances and clinical factors on the association between delivery mode and infant HCG trajectories, moderation analysis was conducted.
Cumulative link mixed regression models were built using R package ordinal version 2019.12.10, with infant HCG trajectory (AHCGT -> Mildly SHCGT -> Moderately SHCGT -> Severely SHCGT) as the outcome, PMA, delivery mode, the microbial taxon abundance/clinical factor and the interaction between delivery mode and the microbial taxon abundance/clinical factor as xed effects, and patient as a random effect. Data were standardized using R package standardize version 0.2.2 prior to tting the models, and models were tted with the adaptive Gauss-Hermite quadrature approximation with 10 quadrature points. Signi cance of the delivery mode and microbial taxon abundance/clinical factor interaction was assessed by the Wald statistic implemented in the ordinal package. McFadden's R 2 was calculated manually through dividing the log likelihood of the model by the log likelihood of the null model and subtracting this value from one. The null model was built through the same method as above except with removal of all xed effects.
The Benjamini-Hochberg method was used to correct for multiple testing, and a false-positive rate of <1% was the set threshold for signi cance. Additionally, only the microbial taxa that met the prevalence threshold for assessing the effects of delivery mode on microbial taxon abundances as described in the statistical analysis section were considered. As it was of speci c interest to determine which microbial taxa moderated the positive impact of vaginal delivery on infant HCG trajectories, Cohen's D effect sizes between vaginally delivered infants with AHCGT versus vaginally delivered infants with each of the three SHCGT groupings were calculated using R package rstatix version 0.7.0 without assuming equal variances, and microbial taxa that were consistently augmented or diminished in abundance across the three groupings were reported. Detailed statistics are provided for the signi cant microbial taxon abundances [see Additional le 7].
The clinical factors examined included gestational age at birth, birthweight, birth head circumference, sex, total days of all antibiotics, longest number of consecutive days of all antibiotics, total amount of enteral feeds, total days of TPN, day of life full enteral feeds achieved, total amount of human milk, total amount of formula and number of morbidities. Individual antibiotics or morbidities were not examined for this analysis, as some were too rare in incidence to be properly assessed. A standard p<0.05 was the set threshold for signi cance. Cohen's D effect sizes between infants with AHCGT versus any SHCGT were calculated separately for each delivery mode using R package rstatix without assuming equal variances.

Results
The gut microbiome differentiates suboptimal head circumference growth trajectories from appropriate head circumference growth trajectories in infants speci cally after 30 weeks PMA, the time point of head circumference growth trajectory divergence: It was not previously known if gut microbiome succession could be linked to HCG, the earliest marker of human neurodevelopment. This study not only determined that β-diversity of the gut microbiome signi cantly differentiated infants with SHCGT from AHCGT, but also that the key time point when most faecal microbial taxa exhibited a shift in mean abundance of 30 weeks PMA matched the onset of signi cant SHCGT. Preterm infant HCG trajectory was assessed as the difference in HCZ from birth to 36 weeks PMA by the Fenton growth curve [14,42], and study groups were strati ed by 0.5 interval losses in HCZ: AHCGT (≥0.5), mildly SHCGT (<0.5-1), moderately SHCGT (<1-1.5) and severely SHCGT (<1.5). Alterations in α-diversity between study groups were rst considered using ANOVA on multivariate regression with PMA as a covariate and patient as a random effect, but no signi cant differences were found [see Additional le 9]. Next, β-diversity (genus level) was examined by RDA with study groups and PMA as terms and permutations blocked by patient (Figure 1a). A signi cant RDA model was built (p=0.001), and all terms had a statistically signi cant effect on gut microbiome composition (p=0.001), listed from most to least impact (R 2 ): PMA (0.6391), AHCGT (0.3795), moderately SHCGT (0.2160), severely SHCGT (0.1447) and mildly SHCGT (0.1060).
De ning key time points of gut microbiome succession is critical, as previous research has shown that the inability of the microbiome to achieve a particular state on time (i.e., "mature") is associated with poor developmental outcomes such as weight gain failure in children [57]. To identify successional time points within the infant dataset, a novel exploitation of change point analysis was employed. Change point analysis presents a distinct advantage over previous time series methods applied to microbiome succession [24,27,57], as it is embedded in a statistical framework, does not require a priori knowledge of important microbial taxa or time ranges, and unlike machine learning strategies does not need a separate dataset for model training. Change point analysis has been successfully utilized in a human microbiome study to link the abundance of individual microbial taxa to a clinical state over time [58], but has been more widely used on the macroecological scale for detecting whole community changes [59,60]. Therefore, successional time points of the gut microbiome can be identi ed as the maxima of the number of change points in microbial taxon abundances similar to the macroecological approach, given relevant parametrization to account for the complexity and multi-community nature of the microbiome as described in the methods section. After application of this approach, 30 weeks PMA was determined to be the key time point of gut microbiome succession within our studied time range, regardless of taxonomic level (Figure 1b).
To observe how this key time point correlated with HCG patterns, the loss in HCZ from birth for each PMA week was plotted for each study group (Figure 1c). ANOVA with post-hoc analysis revealed that the severely SHCGT group signi cantly separated from the AHCGT group starting at 28 weeks PMA, but signi cantly differentiating all three SHCGT groups from AHCGT could not be done until 31 weeks PMA [see Additional le 10]. Additionally, all three SHCGT groups could not be signi cantly distinguished from each other until the measurement end point of 36 weeks PMA. Together, these results indicate that SHCGT could generally be treated as a loss in HCZ of greater than 0.5 that occurred from 31-36 weeks PMA. Therefore, further analysis of the infant gut microbiome was conducted on the time windows of 24-30 weeks PMA (before SHCGT) and 31-36 weeks PMA (during SHCGT), which is also concordant with the gut microbiome successional pattern.
Infants with suboptimal head circumference growth trajectories have a depleted abundance of faecal Bacteroidota and Lachnospiraceae from 31-36 weeks PMA: Having identi ed the critical time point of microbiome change relevant to head circumference growth, we next investigated which microbial taxa are putatively important for infant neurodevelopment. Bacteroidota and Lachnospiraceae were speci cally identi ed as biomarkers of AHCGT. Further, it was determined that their signi cant reduction in abundance occurred during the time period of HCZ loss (31-36 weeks PMA), solidifying a key time point for interventional strategies. Finally, through predictive metagenomic pro ling [39,53], the depletion of these taxa was found to result in a reduced carbohydrate utilization capacity of the gut microbiome, suggesting a mechanistic link in regards to energy resource utilization and short-chain fatty acid (SCFA) production [5,34]. The differences in microbial taxon and KO abundances between study groups were assessed by ANOVA on multivariate regression with patient as a random effect during the time windows of 24-30 weeks PMA (before SHCGT) and 31-36 weeks PMA (during SHCGT). The number of patients with microbial taxon or KO presence versus absence for each study group was statistically compared by the Fisher's exact test. Study group comparisons also included infants with any SHCGT versus AHCGT, and infants with moderately SHCGT to severely SHCGT versus AHCGT to mildly SHCGT. However, the latter yielded no statistically signi cant differences in abundance or prevalence of microbial taxa, and thus will not be discussed further.
From 24-30 weeks PMA (before SHCGT), few statistically signi cant differences in faecal microbial taxa were observed between study groups, and these taxa were relatively more abundant in the gut microbiome of infants with SHCGT. The relative abundance of Firmicutes was signi cantly higher (p=0.009, R 2 =0.10) in infants with any SHCGT versus AHCGT (Figure 2a). No speci c sub-taxon from within Firmicutes could account for the observed signi cant difference. Actinobacteriota was also signi cantly differentially abundant between study groups (p=0.02, R 2 =0.15) from 24-30 weeks PMA, mainly attributed to its relatively high abundance in the mildly SHCGT group (Figure 2b), speci cally the abundances of Mycobacteriaceae (p=0.0009, R 2 =0.20) and its genus Corynebacterium (p=0.001, From 31-36 weeks PMA (during SHCGT), several faecal microbial taxa were signi cantly less abundant or prevalent in infants with any SHCGT compared to AHCGT. The bacterial phylum Bacteroidota was found to be signi cantly differentially abundant (p=0.0009, R 2 =0.11) between infants with any SHCGT and AHCGT (Figure 2c), in addition to its family Bacteroidaceae (p=0.002, R 2 =0.087) and its genus Bacteroides_B (p=0.009, R 2 =0.087). The bacterial family Lachnospiraceae was also found to be both signi cantly differentially abundant (p=0.004, R 2 =0.095) and prevalent (p=0.009) between infants with any SHCGT and AHCGT (Figure 2d). No singular genera from within Lachnospiraceae could account for the observed signi cant differences. The prevalence of the bacterial family Ruminococcaceae was signi cantly different between study groups (p=0.007) during this timeframe, which could be attributed to the species Faecalibacterium prausnitzii that was signi cantly less prevalent (p=0.004) in infants with any SHCGT versus AHCGT (8% vs. 48%). Finally, two families from the bacterial phylum Firmicutes_C were additionally signi cantly less prevalent in infants with any SHCGT versus AHCGT: Megasphaeraceae (p=0.004; 0% vs. 25%) and Negativicoccaceae (p=0.009; 0% vs. 21%). Main genera present in the dataset from these bacterial families included Megasphaera and Anaeroglobus, and Negativicoccus, respectively.
To examine differences in function associated with these taxonomic differences, predictive metagenomic pro ling was performed by Tax4Fun2 [39]. Signi cant KOs (p<0.05; FP<1%) were tallied by their KEGG pathway classi cations to identify where the most changes in function were occurring. Out of the major categories, the greatest number of signi cant differences in both KO abundance and prevalence occurred in "Metabolism", and the majority of signi cant changes in metabolism were related to "Carbohydrate metabolism" ( Table 1). The highest number of these KOs signi cantly depleted in the gut microbiome of infants with any SHCGT compared to AHCGT belonged to the pathway "Starch and sucrose metabolism", and the second highest number belonged to the pathway "Amino sugar and nucleotide sugar metabolism" (Table 1). All KOs that were signi cantly less abundant or prevalent were only noted from 31-36 weeks PMA. The changes in starch and sucrose metabolism could be attributed to the loss of representatives from the phylum Bacteroidota, whereas most changes in amino sugar and nucleotide sugar metabolism could be related to the loss of Lachnospiraceae (with some overlap of Bacteroidota). importance for binary classi cation of infants into AHCGT versus any SHCGT with 77.5% and 84% accuracy, respectively. From 24-30 weeks PMA, the top patient demographic factor was gestational age at birth (ranked #13), antibiotic factor was total days of all antibiotics (ranked #16), and enteral feeding factor was percentage of enteral feeds as human milk (ranked #17), which were all in the 2/3rds ranked section of important features. From 31-36 weeks PMA, the top patient demographic factor was again gestational age at birth (ranked #29), antibiotic factors were total days of metronidazole (ranked #20), total days of erythromycin (ranked #24) and consecutive days of clindamycin (ranked #26), and enteral feeding factor was total days of TPN (ranked #30), which were all in the 2/3rds ranked section of important features. Further, no statistically signi cant differences in patient demographics (Table 2), or clinical care including antibiotics and enteral feeding were found between study groups [see Additional le 1]. Critically, these results indicate that changes in infant HCG trajectories were not due to caloric restriction.
The random forest classi ers identi ed different sub-taxa from within the found key microbial taxa for infant HCG as important predictors across the two time windows. From 24-30 weeks PMA, the top microbial taxa were the abundances of proteolytic Bacteroidota (Alistipes onderdonkii, Alistipes putredinis and family Marini laceae comprised of Odoribacter spp. and Butyricimonas spp.), with remaining important sub-taxa including the abundance of Bacteroides_B (phylum Bacteroidota), the presence of an unclassi ed Coprococcus species and abundance of Blautia wexlerae (family Lachnospiraceae), the presence of Actinobacteriota and the abundance of Firmicutes. Other unique microbial taxa identi ed amongst the top third of important features were the abundances of Veillonella dispar and an unclassi ed Klebsiella species, and the presence of the bacterial family Peptoniphilaceae. From 31-36 weeks PMA, the top microbial taxon was the presence of the genus Faecalibacterium, with remaining important sub-taxa including the abundance of Bacteroides vulgatus and presence of family Tannerellaceae (phylum Bacteroidota), the presence of families Lachnospiraceae and Megasphaeraceae, and the presence of Negativicoccus succinicivorans (family Negativicoccaceae). Other unique microbial taxa identi ed amongst the top third of important features were the abundances of bacterial family Enterococcaceae and Peptostreptococcus anaerobius, and the presences of bacterial family Actualibacteraceae and Gemmiger formicilis.
However, it was found that infants with any SHCGT experienced a signi cantly greater incidence of speci c morbidities in the NICU (Table 2). These results contrasted the random forest classi ers, as the model built from 24-30 weeks PMA had ranked the incidence of any morbidity amongst the bottom three predictors, and the model built from 31-36 weeks PMA had its highest ranked morbidities amongst the 2/3rds ranked features including incidence of seizures (ranked #16) and SROP (ranked #25). Part of the discrepancy may lie with the incidence of morbidities being correlated with the structure of the infant microbiome, as prior studies have associated speci c morbidities, such as NEC and sepsis, with the infant faecal microbiota [61]. Evaluating differences between the infant faecal microbiome amongst patients with and without speci c morbidities is beyond the scope of this study; it was instead important to ensure that morbidity did not signi cantly alter our found relationships between infant faecal microbial taxa and HCG trajectories. A robust permutation approach was utilized to select and rank features for the random forest classi ers [56], which more frequently identi ed microbiome characteristics over morbidities as important for predicting infant HCG trajectories. As an additional precaution, a limited morbidity (LM) subset was created to determine the dependence of the faecal microbial taxon abundance associations with HCG trajectories on this confounder. This LM subset comprised of infants without SBI, SROP, NEC, seizures, or sepsis. Not only were the previously established gut microbiome and infant HCG trajectory relationships retained after consideration of the clinical variables, but Actinobacteriota abundance newly emerged as a biomarker distinguishing moderately to severely SHCGT from mildly SHCGT and AHCGT. RDA analysis con rmed the signi cant difference (p=0.001) in gut microbiome β-diversity between study groups of the LM subset for both the model and terms, including PMA (R 2 =0.6829) and any SHCGT (R 2 =0.3706). For infants with any SHCGT versus AHCGT, signi cant reductions of Bacteroidota ( Figure  2c) and Lachnospiraceae (Figure 2d) abundance were also con rmed in the LM subset overall (p=0.008, R 2 =0.086 and p=0.005, R 2 =0.085 respectively). No sub-taxon from within Bacteroidota and Lachnospiraceae was signi cantly differentially abundant between study groups [see Additional le 11]. Furthermore, predicted metagenomic pro ling again revealed that the greatest number of changes in function occurred in "Metabolism", speci cally "Carbohydrate metabolism" and its subcategories "Starch and sucrose metabolism" and "Amino sugar and nucleotide sugar" metabolism [see Additional le 12].
A nding unique to the LM subset as compared to the complete dataset was that infants with moderately to severely SHCGT had a signi cant depletion in overall Actinobacteriota abundance (p=0.008, R 2 =0.086) compared to infants with AHCGT or mildly SHCGT. Again, no speci c sub-taxon could account for this signi cant difference [see Additional le 11].
Clinical characteristics were statistically analogous between the complete dataset and the LM subset [see Additional le 2]. However, a lower proportion of infants with any SHCGT in the complete dataset were vaginally delivered (Table 2), and this difference became statistically signi cant (p=0.02) in the LM subset [see Additional le 2]. Interestingly, vaginal delivery was also identi ed as a highly important predictor for the random forest classi ers across both time windows, ranking #5 from 24-30 weeks PMA and as the top feature from 31-36 weeks PMA. We hypothesized that the positive in uence of vaginal delivery on infant HCG trajectories resulted from the vertical transmission of bene cial microbes [62], and thus investigated the interaction of this clinical factor with the infant microbiome in further detail.
The impact of delivery mode on infant head circumference growth trajectories reveals dispersal limitation and supersedes habitat ltering as a driver of optimal infant microbiome succession: Vaginal delivery was signi cantly positively associated with infant HCG trajectories, and we hypothesized that this resulted from the vertical transmission [62] of microbes that elicit AHCGT. To address this hypothesis, it was rst evaluated if the effect of delivery mode on infant HCG trajectories was signi cantly moderated by the abundances of faecal microbial taxa. Such a result would indicate dependence of vaginal delivery's bene t to infant HCG trajectories on the structure of the microbiome, and indeed signi cant moderation by the abundances of faecal microbial taxa previously signi cantly associated with AHCGT was found, including Lachnospiraceae, Bacteroides_B, Faecalibacterium and Megasphaera. Next, it was investigated if the abundances of faecal microbial taxa were signi cantly associated with delivery mode to determine which microbes were more likely to be vertical transmitted. It was discovered that the abundance of Bacteroidota was signi cantly enhanced by vaginal delivery, which was one of the found key microbial taxa for infant HCG. Taken together, these results a rm our hypothesis that delivery mode can be considered as a dispersal limitation factor in uencing optimal infant microbiome succession, which impacts infant health and development. Understanding the relative importance of successional drivers is vital for identifying potential indirect routes of modifying the gut microbiome-brain axis, or variables that could alter the effectiveness of direct interventions. Despite clinical variables typically attributed as habitat ltering factors impacting microbiome succession, such as enteral feeding and antibiotics, not having a signi cant direct effect on infant HCG trajectories, it was interrogated if these clinical factors could signi cantly moderate the in uence of vaginal delivery. That was indeed found to be the case, therefore delineating that optimal infant microbiome succession is primarily driven by dispersal limitation and secondarily by habitat ltering, at least in the context of host neurodevelopmental markers.
To test the in uence of the microbiome on the relationship between delivery mode and infant HCG trajectories, moderation analysis was conducted by building cumulative link mixed regression models of the four categories of infant HCG trajectories (AHCGT -> Mildly SHCGT -> Moderately SHCGT -> Severely SHCGT) for evaluating the signi cance of the interaction between delivery mode and the abundance of a given faecal microbial taxon through the Wald statistic, with PMA, and delivery mode and the given faecal microbial taxon individually as xed effects, plus patient as a random effect. Signi cant moderation (p<0.05; FP<1%) by faecal microbial taxa that were consistently augmented or depleted in abundance (as determined by Cohen's D effect sizes) for vaginally delivered infants with AHCGT versus all three SHCGT categories are displayed in Table 3. Vaginally delivered infants with any SHCGT compared to vaginally infants with AHCGT had reductions in faecal microbial taxa previously found to be directly signi cantly associated with infant HCG trajectories from both standard statistics and machine learning modelling, including bacterial family Bacteroidaceae and its genus Bacteroides_B (also its species Bacteroides dorei and Bacteroides vulgatus), bacterial family Lachnospiraceae (also its genus Eubacterium_E and Hungatella e uvii), Faecalibacterium prausnitzii and bacterial genus Megasphaera (from family Megasphaeraceae). Additional diminished faecal microbial taxa that were selected as predictors of infant HCG trajectories for the random forest classi ers only included the bacterial family Acutalibacteriaceae and Alistipes putredinis. Faecal microbial taxa that were decreased in abundance in vaginally delivered infants speci cally across all SHCGT categories compared to AHCGT included Dialister invisus, the bacterial genera Erysipelatoclostridium and Cloacibacterium, and the bacterial family Mycobacteriaceae, whereas the bacterial genus Coprobacillus was the only faecal microbial taxon found to be increased speci cally in vaginally delivered infants across all SHCGT categories versus AHCGT. The large number of signi cant faecal bacterial modi ers, with many of these having a signi cant direct impact on infant HCG trajectories, indicate that the positive effect of vaginal delivery on infant HCG trajectories is dependent on the microbiome (Figure 3c).
To statistically test the hypothesis of vertical transmission, the differences in faecal microbial taxon abundances between vaginally delivered versus Caesarean-section delivered infants were assessed by ANOVA on multivariate regression with PMA as a xed effect and patient as a random effect.
Bacteroidota (p=0.01, R 2 =0.08), a microbial taxon directly signi cantly associated with infant HCG trajectories by both standard statistics and machine learning modelling, was found to be signi cantly increased in abundance in vaginally delivered infants compared to Caesarean-section delivered infants.
This result also included its genus Bacteroides_B (p=0.009, R 2 =0.08), as well as its family Tannerellaceae (p=0.008, R 2 =0.04), genus Parabacteroides (p=0.007, R 2 =0.05) and species Parabacteroides distasonis (p=0.009, R 2 =0.06). An additional faecal microbial taxon signi cantly elevated in abundance in vaginally versus Caesarean-section delivered infants but not found to be directly related to infant HCG trajectories was the bacterial family Methanobacteraceae (p=0.005, R2=0.03) and its genus Methanobrevibacter_A (p=0.009, R 2 =0.03). Therefore, Bacteroidota is likely vertically transmitted in vaginally delivered infants, which can be considered as a mediator of the positive in uence of vaginal delivery on infant HCG trajectories due to both its signi cant direct effects on infant HCG trajectories and its signi cance as a moderator of the relationship between delivery mode and infant HCG trajectories (Figure 3c). Other faecal microbial taxa that colonize the infant faecal microbiome less speci cally via vertical transmission additionally served as moderators, as previously described.
Finally, to test how clinical factors change infant HCG trajectory outcomes of vaginally delivered versus Caesarean-section delivered infants, moderation analysis equivalent to the approach used for the faecal microbial taxon abundances as above was performed. Time was found to be an important moderator of delivery mode and infant HCG trajectory relationships as demonstrated by the signi cance of gestational age at birth (p<2x10 -16 , R 2 =0.11), and its proxies' birthweight (p<2x10 -16 , R 2 =0.11) and birth head circumference (p=0.002, R 2 =0.08). The number of morbidities was expectedly also a signi cant modi er (p=3x10 -11 , R 2 =0.13), and sex was not (p=0.7, R 2 =0.02). Both time and morbidities were found to have a signi cant direct effect on infant HCG trajectories, and thus these results further evidenced our previous interrogation of critical time points (change point analysis) and morbidity as a confounder (LM subset). Intriguingly, several clinical factors known to affect the infant microbiome were found to be signi cant modi ers of the effect of delivery mode on infant HCG trajectories, and these had at least a ve-fold greater impact on vaginally delivered infants, whose Cohen's D effect sizes (AHCGT/any SHCGT) were all large, compared to Caesarean-section delivered infants, whose Cohen's D effect sizes were all small to negligible (Figure 3d). These clinical factors included both the total days (p=2x10 -5 , R 2 =0.18) and longest number of consecutive days (p=4x10 -8 , R 2 =0.13) of all antibiotics, and the total amount of enteral feeds (p<2x10 -16 , R 2 =0.14), along with the additional proxies of total days of TPN (p=2x10 -6 , R 2 =0.14) and the day of life full enteral feeding was achieved (p=0.002, R 2 =0.12). As for enteral feeding-type, the total amount of human milk was a signi cant moderator (p<2x10 -16 , R 2 =0.07), whereas the total amount of formula was not (p=0.1, R 2 =0.04). Notably, none of these clinical factors had a signi cant direct effect on infant HCG trajectories. As diet and antibiotics are known to be primary in uencers of the intestinal environment, these results demonstrate that delivery mode, i.e., a dispersal limitation factor from vertical transmission, supersedes habitat ltering as a driver of infant gut microbiome succession.

Discussion
There is a vital need to identify modi able environmental factors for reducing the incidence of developmental impairments at a time point early enough for successful intervention. Thus, it was the aim of this study to determine if infant gut microbiome composition was associated with HCG, the earliest marker of neurodevelopment [13,14]. This study is the rst to show that β-diversity of the gut microbiome was signi cantly distinct between infants with AHCGT versus any SHCGT, and that reduced abundances of Bacteroidota and Lachnospiraceae were speci cally associated with SHCGT independent of concurrent morbidities and caloric restriction. Notably, this study's novel application of change point analysis further linked microbiome and HCG by revealing that the timing of peak gut microbiome composition alteration exactly matched the timing of signi cant HCG separation between study groups at 30 weeks PMA. Clinical factors were additionally thoroughly examined to determine their in uence on the association between infant HCG trajectory and the gut microbiome, of which the signi cant effect of delivery mode provided further innovative insights into the primary drivers of optimal infant microbiome succession.
There exists evidence for the potential mechanistic impact of Bacteroidota and Lachnospiraceae on host neurodevelopment. Bacteroidota may affect neurodevelopment through altering intestinal barrier integrity and systemic metabolite availability. Hsiao and colleagues [63] have demonstrated that oral treatment of the maternal immune activation mouse model of ASD with human-derived Bacteroides fragilis both altered intestinal tight junction protein expression and the serum metabolite pro le, and ameliorated behavioural defects. Lachnospiraceae are key producers of SCFAs after fermentation, particularly butyrate [64,65], that are utilized by intestinal epithelial cells as a fuel source [66], have potent antiin ammatory effects [67], and promote immune system maturation by increasing the colonic population of T regulatory cells [68][69][70][71]. Stolp and colleagues [72] have shown that neonatal systemic in ammation in rats can lead to altered blood-brain barrier permeability and behaviour, and therefore, Lachnospiraceae could affect neurodevelopment through in uencing energy resources and immunity.
Vaginal delivery was the clinical factor signi cantly associated with improved HCG but was dependent upon the successful vertical transmission [62] of key microbial taxa [see Additional le 5], which could be impeded by certain clinical factors, including prolonged antibiotics, delayed enteral feeding, formula feeding and younger gestational age at birth leading to SHCGT even with vaginal delivery [see Additional le 8]. The signi cance of gestational age connects to the change point analysis demonstrating 30 weeks PMA to be a key transition point for the preterm infant gut microbiome, which complements other studies [24,27]. This transition is thought to occur due to intestinal maturation [73], at which point the preterm infant intestinal environment is like the term infant at birth. Studies in term infants using different neurodevelopmental metrics and timelines have also indicated Bacteroidota and Lachnospiraceae as potentially important taxa; Kelsey et al. [74] found functional brain network connectivity was associated with Bacteroides and Lachnospiraceae abundance, and Loughman and colleagues [75] associated behavioural problems at 24 months of age with a low Prevotella abundance at 12 months. Therefore, the results from this study taken in the context of the current literature interestingly suggest that Bacteroidota and Lachnospiraceae should broadly colonize infants born >30 weeks gestational age at birth through vaginal delivery.
Diet and antibiotics are known to have a large impact on the infant microbiome, exceeding stochasticity [33,62,76]. Further, a seminal study by Feng and colleagues has demonstrated that the in uence of relative microbial tness on infant microbiome succession appears to supersede historical contingency, i.e., the order of introduction of microbes [77]. These results have together indicated that habitat ltering, i.e., variables that shape the intestinal environment, is a dominant driver of infant microbiome succession. Other studies have additionally shown that delivery mode shapes the initial colonizers of the infant gastrointestinal tract [62,76], which includes faecal microbes for vaginally delivered infants, such as Bacteroidota [78,79] complementing our study, and skin microbes for Caesarean-section infants. However, infants delivered at term possess an advantage as they can access the home environment usually relatively quickly, whereas preterm infants usually remain in the hospital environment at length with sanitation protocols preventing normal microbial dissemination [80]. This difference allows the effect of dispersal limitation on infant microbiome succession to be more closely examined in preterm infants, and a major nding of this study is that its in uence exceeds habitat ltering. That could explain why studies on enteral feeding [23,76] and antibiotics [76,81,82] for their health bene ts and effects on microbiome composition have yielded somewhat inconsistent results for preterm infants, despite their known effects on the term infant microbiome.
However, this study's ndings do not preclude the importance of habitat ltering as a driver, since after considering dispersal limitation, signi cance for antibiotics, caloric restriction and human milk feeding was found. Human milk provides microbially fermentative human milk oligosaccharides [83], which are known to be degraded by Bacteroidota, Lachnospiraceae and Bi dobacterium [83,84] and additionally supported by our predicted functional pro ling analysis, as well as notably acting as a source of microbes and thus a secondary dispersal limitation factor [85]. Further, the maternal microbiome can also exhibit dysbiosis, as the faecal and vaginal microbiome composition of mothers that deliver preterm has exhibited signi cant differences compared to mothers that deliver at term in other studies [86,87]. The effects of delivery mode on preterm infant health outcomes and microbiome succession can thus also be masked depending on a given study's cohort distribution [76]. Therefore, it is recommend that clinical variables should be considered sequentially for infant microbiome studies, rst by dispersal limitation factors (e.g., delivery mode, environment such as hospital versus home or urban versus rural, probiotics) then by habitat ltering factors (e.g., diet, prebiotics, antibiotics).

Conclusions
Bacteroidota and Lachnospiraceae are both known core taxa of the adult gut microbiome[88], and our work provides strong evidence that Bacteroidota and Lachnospiraceae need to integrate into the gut microbiota early in infancy for optimal developmental outcomes. Ultimately, the promising ndings from this study encourage future research of microbiome modi cation to improve infant developmental trajectories, either from clinical investigations to verify microbial functions or metabolites, or animal model experiments for direct probing of the impact of Bacteroidota and Lachnospiraceae on neurodevelopment. Optimizing the gut microbiome in infancy could also reduce the incidence of developmental disability, as evidenced by the associations between early antibiotics and later cognitive outcomes [28,29] or the altered gut microbiome of children with ASD [35] and ADHD [36], and future work could continue to follow the faecal microbiome of infants with AHCGT and SHCGT over time to determine the potential life-long effectiveness of an infant microbiome based intervention.

Availability of data and material
The datasets generated and/or analysed during the current study are available in the NCBI SRA repository, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA739139/.

Competing interests
Bree Andrews, MD/MPH, is an equity partner in the B-Corporation PreeMe+You (PMY), a technology startup whose goal is to provide bedside technology to parents of NICU patients that ameliorates health disparities. There is no current income from the company or royalties and her research on the microbiome and neurodevelopment is not related to her work with PMY. There were no overlapping patients that were studied in the current manuscript who also interfaced with the technology from PMY. Dinanath Sulakhe is a co-founder and equity partner in Navipoint Genomics, LLC (NG), a genomics start-up that was incubated at the University of Chicago, whose goal is to provide a cloud-based genomic data analysis platform. NG currently has an active SBIR Phase-1 grant and Mr. Sulakhe is currently providing consulting services for this grant. Mr. Sulakhe is also a co-founder and equity partner in Navipoint Health, Inc (NH), a genetic testing start-up. There is no current income or royalties from NH. The research presented in the paper is not relevant to NG and NH, or his work at NG and NH. It will not have any impact in the form of nancial gain or loss to NG and NH. The remaining authors declare no competing interests.

Funding
We would like to acknowledge the National Institutes of Health UG3 OD023281 to EC and BA and R01 HD083481 to EC for providing funding. The funders did not contribute to the design of the study, writing, and the collection, analysis, or interpretation of data.
Authors' contributions KO conducted the classi cation, post-processing, and all analyses of the 16S rRNA gene sequencing data, generated all gures and tables, and primarily wrote the manuscript. MA and RY consented patients, and collected, prepared, and sent infant faecal samples for 16S rRNA gene sequencing; MA additionally contributed to the Methods section of the manuscript. MD'S and DS collected and prepared clinical data for analysis from the University of Chicago Comer Children's Hospital's electronic medical records system. PDH and AZW were the acting physicians that recruited patients and provided expert medical consultation for this manuscript. BX processed the 16S rRNA gene sequencing data and deposited it into the NCBI SRA repository. MEM provided expert consultation in neurodevelopmental assessment for this clinical study and manuscript. EC and BA are the grant holders that designed the clinical study, obtained IRB approval, and oversaw data interpretation and editing of the manuscript. All authors critically reviewed and approved of the nal manuscript.     Middle and right panels display mean percent abundance with standard deviation bars for both the complete and limited morbidity datasets, respectively. *p<0.05; FP<1%, ANOVA on multivariate regression.  In uence of clinical factors on faecal microbiome and infant head circumference growth relationships.
Random forest classi ers were built for predicting appropriate head circumference growth (HCG) trajectory (AHCGT) versus any suboptimal HCG trajectory (SHCGT) for infants as de ned in legend for  Table 3) were also both signi cantly directly associated with infant HCG trajectories and signi cantly moderated the effect of delivery mode on infant HCG trajectories but were not signi cantly (dashed grey) increased in abundance by VD. Several clinical factors signi cantly