Transmission of gut bacteria and viruses from mothers to infants

Seeding and development of the gut ecosystem are crucial for health, both in childhood and 1 later in life. While the composition of infant gut bacterial communities has been described, 2 the composition and origin of the infant gut virome remains under-studied. Here, we explore 3 mother-to-infant transmission of bacteria and viruses in 30 mother – infant pairs in a 4 longitudinal collection of faecal samples taken during pregnancy and the first 3 months after 5 birth. We demonstrate that infant bacterial strains resemble maternal strains more than 6 those of unrelated mothers. We quantify viromes using a complementary approach 7 examining both total metagenomes and viral metagenomes. The virome composition is highly 8 consistent between viral and total metagenomes. The infant gut viromes are dominated by 9 active temperate bacteriophages, which are more abundant in infants than mothers (p- 10 value=7.2e-06). We observe that the proportion of shared viruses between maternal and 11 infant gut is only 11.3% when considering the active virome fraction alone, but increases to 12 37.6% when taking into account temperate phages in the form of prophages. These findings 13 indicate that viruses are vertically transmitted from mothers to infants early in life and that 14 pioneering phages can reach the infant gut via vertical transmission of their bacterial hosts. Correlation of the proportion of shared bacterial species and the proportion of shared viruses in TMs of mother – infant pairs.


Introduction
The human early life gut ecosystem has garnered much interest in recent years because of its 16 links to health and disease later in life 1 , but the core aspects of its origin and development 17 remain less understood. Studies have characterised the development of the infant gut 18 microbiome through the first 2-3 years of life, after which the gut microbiome reaches a state 19 of high microbial richness equivalent to that of an adult 2,3 . Thus far, the focus of research has 20 been on the development of gut bacteria. However, the gut ecosystem also comprises viruses, 21 archaea and eukaryotes such as fungi and protists, whose role in early gut ecosystem 22 development has received less attention. 23

Both vertical transmission (during pregnancy and birth) and horizontal transmission 24
(from environmental sources like infant diet) have been described as sources of infant gut 25 microbiota 4 . Recent studies provide increasing support for the maternal gut bacterial 26 reservoir as a key source of microbial transmission from mothers to infants 5,6 Although the 27 transmission of infectious viruses such as cytomegalovirus and herpes simplex virus in the 28 context of maternal and infant morbidity has been established 7 , little is known about the 29 transmission of bacteriophages (bacterial viruses) from maternal to infant gut. This is partially 30 due to difficulties in isolating and annotating metaviromes 8 . As environmental studies have 31 clearly demonstrated that bacteriophages are key players in the modulation of bacterial 32 communities 9,10 , it is crucial to study them in the context of the developing human gut 33 ecosystem as the bacterial community is established in the months following birth. 34 A recent study in 20 healthy infants using viral-like particle (VLP) data provided 35 substantial functional evidence that the bacteriophages colonising the infant gut arise from 36 excisions from pioneering infant gut bacteria 11 . However, the origin of these pioneering 37 bacteriophages, as well as their hosts and their possible roots in the maternal gut ecosystem, 38 remain underexplored. We therefore sequenced 183 total metagenomes (TMs) obtained by 39 isolating all microbial DNA from stool and 66 viral metagenomes (VMs) using a VLP 40 enrichment isolation protocol from 30 mother-infant pairs for whom we have longitudinal 41 samples collected during pregnancy and the first 3 months of life. We first described the 42 bacterial and viral composition in both mothers and infants and elucidated the degree of 43 inter-individual and intra-individual variability. We then compared bacterial strains and 44 viruses across and within mother-infant pairs and found that infant bacterial strains resemble 45 their mother's strains more than those of unrelated mothers. We found that the proportion 46 of shared active viruses within mother-infant pairs was lower than the proportion of shared 47 bacteria. However, we also found significantly more viruses that were shared between 48 mother-infant pairs when taking into account bacteriophages incorporated in the bacterial 49 genomes of maternal TMs, which indicates that maternal prophages are a source of 50 pioneering infant gut bacteriophages. 51

Study sample and gut bacterial composition in mothers and infants 52
We first characterised the gut bacteriome of mothers and infants and described the degree 53 of sharedness of bacterial species in 183 TMs from 30 mothers and 32 infants, including 2 54 twin pairs (Figure 1a). Mothers collected their faeces at months 3 and 7 during pregnancy, at 55 birth and at 3 months after birth (Figure 1a). Faecal samples from infants were collected at 56 months 1, 2 and 3 after birth. Although meconium samples collected at birth were available, 57 only 3 of the 32 meconium samples had a sufficient number of microbial reads, reflecting a 58 low bacterial abundance in these samples, and these 3 meconium samples were therefore 59 excluded from the analysis. In this study, mothers had a median age of 32.0±4.4 years, a 60 median pre-pregnancy of BMI of 23.0±2.6 kg/m 2 (approximately 5 years prior to the 61 pregnancy) and a median gestational age of 39.6±1 week. The infants were 56% female and 62 had a median birth weight of 3,705±380.4 grams. 87.5% of the infants were vaginally 63 delivered, with one twin pair having different delivery modes, and 28% were born at home. 64 In terms of feeding, 53.6% were exclusively breastfed at month 1, 56.3% were exclusively 65 breastfed at month 2 and 53.3% were exclusively breastfed at month 3 (Supplementary Table  66 S1). 67 The infant gut bacteriome showed a remarkably lower species-level alpha-diversity 68 compared to the maternal gut bacteriome (mean mother=2.62, mean infant=1.26, p-69 value=1.12e-34, (Supplementary Fig. 1a)), consistent with previous findings 5 . Over the entire 70 study period, maternal alpha-diversity did not show any significant association patterns to 71 maternal phenotypes or timepoint ( Supplementary Fig. 1c), although this analysis may have 72 been hampered by low sample size. The alpha-diversity of bacterial species in infants also did 73 not change significantly during the first 3 months (Supplementary Fig. 1b). At the phylum 74 level, mothers and infants showed remarkably different compositions (Figure 1b, c), with 75 infants having more Actinobacteria and Proteobacteria and mothers having more 76 Bacteroidetes and Firmicutes (FDR<0.05) (Supplementary Table S2). 77 The species-level bacteriome composition was significantly different in mothers and 78 infants (permutational multivariate analysis of variance (PERMANOVA) p-value=0.001, 999 79 permutations, Figure 1d). The maternal bacteriome showed greater inter-individual variation 80 than intra-individual variation across timepoints ( Supplementary Fig. 2b). Infants also 81 demonstrated more inter-individual variation than intra-individual variation (Supplementary 82 Fig. 2a). We next explored the relation between host phenotypes and the abundance of 83 microbial species and for example found that infants who were exclusively breastfed had a 84 significantly higher abundance of Cutibacterium acnes, a bacterium typically known to be a 85 member of skin flora (FDR=0.001) (Supplementary Table S3). 86 a, Distribution of successfully sequenced samples across mothers and infants at month 3 (P3) and month 7 of pregnancy (P7), birth (B) and the first, second and third month after birth (M1, M2 and M3, respectively). b-c, Phylum-level relative abundance profiles for infant (b) and maternal (c) samples. d, PCoA analysis showing the differences in overall bacterial composition between mothers and infants calculated using Bray-Curtis dissimilarity matrix. Ellipses show 95% confidence intervals. e, Number of the bacterial species in infants shared with own mother. Number of dots represents the number of available timepoints for each infant.

Maternal to infant vertical transmission of bacteria
Although microbiome composition differed between mothers and infants, 43.6±10.0% of 87 bacterial species were shared across related motherinfant pairs (considering all timepoints 88 in mothers and infants, Figure 1e, Supplementary Table S4). We next aimed to explore 89 whether bacterial species shared between related mother-infant pairs are represented by 90 the same strains. For 12 of the 17 species for which strain-level data could be constructed for 91 both mothers and infants, we found at least one incidence of a mother-infant pair sharing an   a, pairwise distances between B. bifidum strains between mother-infant pairs delivered at the hospital (HS) and mother-infant pairs delivered at home (HM). b, PCoA based on phylogenetic distances in A. muciniphila strains in mother-infant pairs. Labels represent timepoints. Only strains present in at least one mother-infant pair are depicted c, Phylogenetic distances within all combinations of samples for related mothers and infants and distances between unrelated individuals. In mothers, 105 viral contigs were shared across at least 50% of the samples, which is in 164 contrast to only two viral contigs shared across at least 50% of the infant samples, 165

Gut viral composition in mothers and infants
demonstrating, yet again, the high individual-specificity of the infant gut virome. 166 In mothers, the number of shared viruses across two timepoints was, on average, 167 1,060, which comprised 26.0±18.6% of all viruses ever detected in a mother (Supplementary 168

Mother to infant transmission of viruses
A moderate proportion of all the viruses found in infants was shared with their mothers 175 (11.3±20.0%), which is in contrast to the proportion of bacterial species shared between 176 mothers and infants (43.6±10.0%). The proportion of shared viruses did not significantly 177 correlate to any available infant phenotypes. 178 Previous studies have shown that the adult gut ecosystem observes "Piggyback-the-179 Winner" dynamics, a state where bacteriophage lysogeny is increasingly favoured in a state 180 with a higher microbial density 11,14,15 . In contrast, the developing infant gut ecosystem is 181 thought to be characterised by a high frequency of bacterial lysis events induced by 182 bacteriophages 16 . We therefore sought to compare the cumulative relative abundance of 183 temperate phages between mothers and infants using the presence of integrase and 184 recombinase genes within each viral contig as a measure of temperate phages. The relative 185 abundance of active temperate phages in infants was significantly higher than in mothers (p-186 value=7.2e-06, Figure 5a) and decreased with timepoint (RPearson=-0.38, p-value=0.03, Figure  187   1200 1400 not shared with mother shared with mother

Viral contigs
Correlation of the proportion of shared bacterial species and the proportion of shared viruses in TMs of mother-infant pairs.

Discussion
This study describes the extent and nature of microbial transmission from mothers to infants, 213 with a focus on viral transmission. A limitation of our study design is that we did not analyse 214 RNA viruses, given their perceived low abundance in the healthy human gut, although future 215 studies should also investigate this dark aspect of the virome. However, while we 216 acknowledge that our sample size only allows us to describe a limited number of associations 217 with phenotypes and only indicates trends in the co-development of bacteriome and virome, 218 our study also has notable strengths. Our novel approach of complementary investigation of 219 both TMs and VMs in both mothers and infants led to novel findings. In the VLP-specific 220 analysis that selectively explores the active virome fraction in stool, we observe high 221 individual-specificity and lower diversity in infants compared to mothers. The average shared 222 fraction of active virome between infants and their mothers was 11.3%, which agrees with 223 the results of one of the few studies to investigate the sharedness of viruses amongst mothers 224 and their infants, where a low sharing of viruses in comparison with bacterial genera was 225 reported in 28 mother-infant twin pairs 16 . However, in our study, when we took prophages 226 into account, the average proportion of vertically transmitted viruses increased to 37.6% and 227 became comparable to the sharedness of gut bacterial species. The majority of the temperate 228 bacteriophages dominating the infant gut were not detected in maternal metaviromes, but 229 were prevalent in maternal TMs, suggesting that infants partly obtain their virome from their 230 mothers via vertically transmitted bacteria. 231 Our study also addressed previously unstudied factors for maternal to infant microbial 232 transmission such as the place of deliveries. As home deliveries constitute 12.7% of the total 233 deliveries (2018) in the Netherlands, and 28% in our samples, we possess the unique 234 opportunity to also reveal trends concerning possible novel associations regarding the effect 235 of place of birth on infant bacterial strains like Bifidobacterium bifidum which need to be 236 further investigated in a larger cohort 17 . An additional strength of our study was that we did 237 not use amplification techniques during the generation of VMs, which allowed accurate 238 quantification of viruses and led to minimal bias in our estimation and characterisation of 239 double stranded DNA viral families. 240 In conclusion, we provide evidence for mother-to-child viral and bacterial 241 transmission events at high-resolution and novel insights into the early colonisation of the 242 infant gut ecosystem. 243

Study design
The samples for this (pilot) study were obtained from the Lifelines NEXT cohort, a birth cohort 244 designed to study the effects of intrinsic and extrinsic determinants on health and disease in 245 a four-generation design. Lifelines NEXT is embedded within the Lifelines cohort study, a 246 prospective three-generation population-based cohort study recording the health and health-247 related aspects of 167,729 individuals living in Northern Netherlands 18 In Lifelines NEXT, we 248 aim to include 1,500 pregnant Lifelines participants and intensively follow them, their 249 partners and their children up to at least 1 year after birth. During the Lifelines NEXT study, 250 biomaterials including maternal and neonatal (cord) blood, placental tissue, faeces, breast 251 milk, nasal swabs and urine are collected from the mother and child at 10 timepoints. The 252 long-term health outcomes of these infants and their parents will be investigated. 253 Furthermore, data on medical, social, lifestyle and environmental factors are collected via 254 questionnaires at 14 different timepoints and via connected devices 19 . 255

Informed consent
The Lifelines NEXT study was approved by the Ethics Committee of the University Medical 256 Center Groningen, document number METC UMCG METc2015/600. Written informed 257 consent forms were signed by the participants or their parents/legal guardians. 258

Sample collection
Mothers collected their faeces during pregnancy at months 3 and 7, at birth and 3 months 259 after birth (Figure 1a). Faeces from infants were collected from diapers by their parents at 260

Profiling of total gut microbiome composition
Total metagenome sequencing reads were trimmed and Illumina sequence adaptor 299 sequences removed using KneadData tools (v0.7.4) and an average PHRED quality score of 300 33 21 . Following trimming, the KneadData-integrated Bowtie2 tool (v2.4.2) 22 was used to 301 remove reads that aligned to the human genome (GRCh37/hg19), and the quality of the 302 processed data was examined using the FastQC toolkit (v0.11.9) 23 . Samples with a clean read-303 depth <5 million were not considered in further analysis. Taxonomic composition of total 304 metagenomes was profiled using the MetaPhlAn3 tool with the MetaPhlAn database of 305 marker genes mpa_v30 and the ChocoPhlAn pan-genome database (201901). 306

Profiling of gut virome composition
VM sequencing reads were subject to quality trimming. Read mapping to the human 307 We annotated the gut virome composition using the de novo assembly-based 313 method. Whole metagenome de novo assembly was performed per VMs using SPAdes 314 (v3.11.1) in metagenomic mode (-meta) with default settings 24 . On average, 280,363 contigs 315 were assembled for maternal samples and 46,830 contigs were assembled for infant samples. 316 When pooled, 724,612 contigs from the whole dataset were subjected to a redundancy-317 removal procedure in which contigs with 90% nucleotide identity over 90% of the length of a 318 shorter contig were considered redundant, and the shorter contig was removed. Overall, 319 274,870 representative pooled contigs larger than 1 kbp were subject to validation as viral. 320 For the 274,870-representative pooled contigs, we predicted the Open Reading 321 Frames (ORFs) using Prodigal v2.6.3 in metagenomic mode 25  Orthologous Groups (pVOGs) 27 . Hits were considered significant at an e-value threshold of 325 10 -5 . Ribosomal proteins were identified using a BLASTp 28 search (e-value threshold of 10 -10 ) 326 against a subset of ribosomal protein sequences from the COG database (release 2014). We 327 used VirSorter v1.0.3 29 with its expanded built-in database of viral sequences ('-db 2' 328 parameter) in the decontamination mode as one of the steps for prediction of viral sequences. 329 Representative contigs larger than 1 kbp were considered viral if they fulfilled at least one of 330 six criteria (similar to those described in 12,13,30 ): (1) they produced BLASTn alignments to a 331 viral section of NCBI RefSeq (release 98) with e-value≤10 -10 , covering >90% of contig length at 332 >50% identity, (2) they had at least three ORFs, producing HMM-hits to the pVOG database 333 with an e-value≤10 -5 , with at least two ORFs per 10 kb of contig length, (3) they were 334 VirSorter-positive (all six categories, including suggestive), (4) they were circular 31 , (5) they 335 produced BLASTn alignments to 427 crAss-like reference genomes 32 & unpublished data) 336 with an e-value≤10 -10 covering >90% of contig length at >50% identity, or (6) they were longer 337 than 3 kbp with no hits to the nt database (release 235) (alignments >100 nucleotides with 338 90% identity and an e-value of 10 -10 ). 54,267 contigs fulfilled at least one of these six criteria, 339 and these contigs were subjected to clustering with 427 crAss-like phage contigs from 32 and 340 an unpublished custom database and genomes of the reference database 341 "ProkaryoticViralRefSeq97-Merged", using vConTACT2 0.9.15 with default parameters 33 .

342
Contigs assigned the status 'Overlap', 'Singleton' and 'Outlier' by vConTACT2 were treated as 343 viral clusters consisting of a single contig in all subsequent analyses. Contigs were further 344 subject to a decontamination procedure based on the following criteria. Contigs were kept if 345 they 1) were circular and contained at least 1 pVOGs hit, 2) were circular and VirSorter-346 positive, or 3) were VirSorter-positive and did not have ribosomal protein genes. Contigs were 347 excluded if they 1) had more than three ribosomal protein genes (similar to the 348 decontamination process described in 12,13 ), or 2) were longer than 200 kbp and co-clustered 349 with contigs possessing more than three ribosomal protein genes. 350 The final curated database of reconstructed viral sequences generated based on our 351 dataset included 51,455 representative contigs ranging in size from 1 kbp to >626 kbp, with 352 low-to-high k-mer coverage (1. Family-level taxonomic annotations were assigned to viral contigs using the Demovir 364 script (https://github.com/feargalr/Demovir) with default parameters and database. Manual 365 curation of Demovir taxonomy assignment and expansion on unassigned contigs using 366 vConTACT was performed as described previously 13 . Despite using sequencing adaptors 367 binding only dsDNA, a few genomes of ssDNA viruses assigned to the families Circoviridae, 368 Inoviridae, Microviridae and Parvoviridae were reconstructed. These could be either a result 369 of taxonomy misassignment or of catching these viruses during their replication in the duplex 370 form. Given the low bacterial contamination level, we did not exclude the putatively ssDNA 371 virus contigs from the further analysis, assuming that even in case of taxonomy 372 misassignment, these contigs are still of viral origin. 373

Virome annotation in TMs
Quality-filtered reads from 183 TMs were aligned to the custom viral database consisting of 374 51,455 viral-representative contigs on a per-sample basis using Bowtie2 v2.3.4.1 in 'end-to-375 end' mode 22 A count table was generated and transformed as with VMs. The resulting RPKM 376 count table was used to assess the number of shared viruses in related mother-infant pairs. 377 The increase in the number of shared viruses in mother-infant pairs in TMs compared to VMs 378 did not depend on the initial size of the virus pool or any of the infant or maternal phenotypes 379 studied (Spearman correlation, p-value>0.05). All comparisons regarding the VMs and TMs 380 viromes were performed on the 66 samples that had both VM and TM samples. 381

Ecological measurements and taxonomic comparisons
To assess bacterial alpha-diversity for each sample, we used species reported in the 387 MetaPhlAn3 output at >0.1% mean relative abundance and present in at least two samples. Differentially abundant viral taxa between adults and infants were defined using a linear 416 (mixed) model that took into account the origin of the sample (and the repeated 417 measurements). The choice of model for every viral family was performed by comparing the 418 suitability of the random effects, as described earlier. FDR correction was applied to correct 419 for multiple testing as described previously. 420

Bacterial species-specific strain analysis 421
Strain SNP haplotypes were generated using StrainPhlAn3 21,42 . This method is based on 422 reconstructing consensus sequence variants within species-specific marker genes and using 423 them to estimate strain-level phylogenies 42 . This method only takes into account the 424 dominant strain of species and hence our method misses overlaps in secondary strains. We 425 then performed multiple sequence alignment and used the Kimura 2-parameter method from 426 the 'EMBOSS' package 43 to calculate phylogenetic distance matrices that contain the pairwise 427 nucleotide substitution rate between strains. Using presence in more than 50 samples as a 428 cut-off, we were able to construct SNP haplotype differences in 17 bacterial species. To 429 identify distinct strain clusters within species, we performed hierarchical clustering using the 430 R function hcluster(). To define identical strains between mother and infant, a definition of 431 0.0 of Kimura strain distance value was used. The significance of the difference between 432 phylogenetic distances between related mother-infant pairs and unrelated mother-infant 433 pairs was tested using PERMANOVA (999 permutations) and a FDR<0.05 was considered 434 significant. Similarly, the effect of place of birth was tested using PERMANOVA (999 435 permutations) on the phylogenetic distances of Bifidobacterium bifidum correcting for 436 samples origin (mother or infant) and repeated measurements. 437

Association with phenotypes
We used R package 'Multivariate Association with Linear Models (MaAsLin) 2' 44 to associate 438 phenotype data with bacterial species. MaAsLin performs boosted, additive general linear 439 models of associations between phenotypes and the relative abundance of species. We 440 performed this analysis only in infant samples where we used a cut-off of 0.01% species 441 abundance and selected only species occurring in at least 5% of the samples. Prior to input 442 into MaAsLin, we performed a clr transformation. In MaAsLin, we set individual IDs as random effects and added read-depth, sex, delivery mode, infant feeding type, timepoint, birth 444 weight, place of birth, maternal age, maternal pre-pregnancy BMI and gestational age as fixed 445 effects. We defined statistical significance at FDR<0.05. Given the low sample size to study 446 association of viral contig abundances with phenotypes we did not perform this analysis. 447

Code availability
All codes used in this study can be found at: https://github.com/GRONINGEN-MICROBIOME-