1. Screening of Candidatus Saccharibacteria in human fecal samples
1.1 Standard polymerase chain reaction (PCR)
Standard PCR targeting Candidatus Saccharibacteria generated an amplicon of approximately 600 bp from each of the two fecal samples studied. After assembling and cleaning the sequences with ChromasPro, we obtained a first amplicon of 441 bp and a second amplicon of 468 bp. Both sequences had 100% identity with the GenBank sequence LR761313.1, annotated as a Candidatus Saccharibacteria bacterium partial 16S rRNA gene. This GenBank sequence corresponds to the first 16S ribosomal RNA gene sequence of Candidatus M. timonensis, originating from the faecal sample, which we deposited in the NCBI database.
1.2 Electron microscopy
Examination of bacterial symbionts in the faecal specimens by scanning electron microscopy yielded images compatible with the presence of Candidatus Saccharibacteria. Cells less than 300 nm in diameter, attached to the extremity or the side of bacteria, were observed in both faecal specimens (Fig. 2A and 2B). As the two specimens were positive for Candidatus Saccharibacteria (standard PCR, Sanger sequencing, and microscopy), whole genome sequencing was then carried out on both fecal samples.
2. Genome sequencing and description of Candidatus M. massiliensis and Candidatus M. timonensis
Based on the cut-offs used to delineate bacterial species, we concluded that the two genomes represent two different species. Indeed, the ANI values (44) and the dDDH values (45, 46) calculated between Candidatus M. massiliensis and Candidatus M. timonensis were far below the threshold values recommended to delineate bacterial species (i.e., 95–96% and 70%, respectively) (see "Genomic comparison" section below).
2.1 Genomic properties of Candidatus M. massiliensis
A total of 3,516,594 Illumina paired-end reads were assembled. A BLASTn of 45,409 contigs versus nr was performed, and we kept only the 191 contigs that belonged to "Candidatus Saccharibacteria." The deglycosylation pretreatment combined with extraction protocol 5 covered the largest percentage of the Candidatus Saccharibacteria genome (70%) (Table S2). In this study, the deglycosylation treatment had an important impact on the results obtained. Indeed, a deglycosylation step is also part of extraction protocol 5 itself (independently of the pretreatment), which could explain why this protocol was better suited to our research than the other protocols used. Here, we highlight that the removal of the biofilm and the release of the DNA by deglycosylation improved the DNA extraction and sequencing results.
We then obtained a genome of 1 scaffold and 91 contigs, 971,417 bp long, with 43.2% GC content. A graphical circular map of Candidatus M. massiliensis is shown in Fig. 2C. Gene prediction identified a total of 1,073 ORFs, and functional annotation showed that 991 (92.3%) were coding DNA sequences, 62 (5.8%) were tRNA-(amino acid) sequences, 3 (0.3%) were rRNA sequences, and 565 (52.6%) were assigned to clusters of orthologous groups (COGs). In addition, we found that enzymes involved in protein metabolism (27.8% = 68/245) and DNA metabolism (16.3% = 40/245) were the most numerous among the represented subsystems (Figure S1).
The mosaic representation of the Candidatus M. massiliensis genome showed that 63.8% of the genes overlapped with bacteria, 26.4% with other CPR groups, and 0.3% with archaea, and that 9.32% were ORFans (Fig. 2E). Moreover, we also detected one gene of viral origin in this genome. Finally, applying the same adapted antibiotic resistance strategy (40), we found that Candidatus M. massiliensis was resistant to tetracycline, glycopeptide and MLS (macrolide-lincosamide-streptogramin).
2.2 Genomic properties of Candidatus M. timonensis
A total of 17,610,272 Illumina paired-end reads and 67,937 Nanopore reads were assembled. A BLASTn of 50,159 contigs versus nr was performed, and we kept only the 400 contigs that belonged to "Candidatus Saccharibacteria." The combination of deglycosylation and 5% Tween 80 pretreatment with extraction protocol 5 covered the largest percentage of the genome (79%) (Table S2). As mentioned above, this confirms that deglycosylation has a beneficial impact on genome coverage. Still with the aim of reducing the presence of biofilm and freeing the DNA, we used Tween 80, which is known to be an emulsifier/dispersant. In this work, the Tween 80 treatment complemented with deglycosylation yielded better genome coverage than the other treatments.
A genome of 3 scaffolds and 197 contigs was obtained, with a genomic size of 755,196 bp and 43.7% GC content. In addition, Candidatus M. timonensis had 839 ORFs, including 777 (92.6%) coding DNA sequences. In parallel, 40 genes (4.8%) coded for tRNA-(amino acid), 3 (0.4%) coded for rRNA, and 478 (56.9%) were assigned to COGs. Moreover, as for Candidatus M. massiliensis, enzymes involved in protein metabolism (27.7% = 54/195) and DNA metabolism (15.3% = 30/195) belonged to the most common subsystems found in the genome of Candidatus M. timonensis (Figure S1). The graphical circular map of Candidatus M. timonensis is shown in Fig. 2D. The mosaic representation of the rhizome showed that 63.8% of the genes overlapped with bacteria, 26.1% with CPR, and 0.29% with archaea, and that 9.8% were ORFans (Fig. 2F). In parallel, one gene of viral origin was detected. Finally, Candidatus M. timonensis was resistant to beta-lactam, tetracycline and glycopeptide.
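The annotation percentages reported for the two genomes can be rechecked from the raw counts. The following is a minimal sketch, assuming that each percentage uses the total number of ORFs as its denominator (an assumption, although it is consistent with the reported values); the dictionary layout and function names are illustrative only. Values are printed with two decimals, so small last-digit differences from the text may simply reflect the rounding convention.

```python
# Feature counts as reported in the text for the two genomes.
GENOMES = {
    "Candidatus M. massiliensis": {
        "total_orfs": 1073,
        "features": {"CDS": 991, "tRNA": 62, "rRNA": 3, "COG-assigned": 565},
    },
    "Candidatus M. timonensis": {
        "total_orfs": 839,
        "features": {"CDS": 777, "tRNA": 40, "rRNA": 3, "COG-assigned": 478},
    },
}


def percent(part, total):
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def feature_percentages(genome):
    """Share of each feature class relative to the total ORF count (assumed denominator)."""
    total = genome["total_orfs"]
    return {name: percent(count, total) for name, count in genome["features"].items()}


if __name__ == "__main__":
    for name, genome in GENOMES.items():
        print(name)
        for feature, value in feature_percentages(genome).items():
            print(f"  {feature}: {value:.2f}%")
```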
3. Genomic comparison
3.1 Taxono-genomics classification
The 16S rRNA-based phylogenetic tree highlighting the position of Candidatus M. massiliensis and Candidatus M. timonensis relative to other closely related taxa shows that these genomes belong to the cluster of Candidatus Saccharibacteria (Fig. 3A). The maximum ANI value was 94 between Candidatus M. massiliensis and Candidatus M. timonensis (Fig. 3C). In addition, digital DNA-DNA hybridization (dDDH) (46, 47) was estimated, and the maximum value for the genomes of interest was 56.8 (confidence interval = [54.0–59.5]) between Candidatus M. massiliensis and Candidatus M. timonensis (Table S3). A phylogenetic tree based on whole genome sequences was then constructed and confirmed that Candidatus M. massiliensis and Candidatus M. timonensis belong to the phylum Candidatus Saccharibacteria (Fig. 3B). A heatmap representing the single nucleotide polymorphisms (SNPs) between the CPR genomes studied in this work is shown in Fig. 3D. A SNP is a germline substitution of a nucleotide at a specific position in the genome. The number of SNPs shared between the CPR genomes studied here varied from 162 between Candidatus Gracilibacteria bacterium 28_42_T64 chromosome (CP042461.1) and Candidatus Gracilibacteria bacterium HOT-871 (CP017714) to 2,213 between Candidatus Saccharibacteria bacterium oral taxon 488 strain AC001 (CP040003) and Candidatus Woesebacteria bacterium GW2011_GWF1_31_35 (CP011214). The genomes of Candidatus M. massiliensis and Candidatus M. timonensis shared 184 SNPs, highlighting the close link between these strains.
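The species delineation reasoning (ANI threshold of 95–96%, dDDH threshold of 70%) can be written as a simple threshold check. This is a minimal sketch, not the procedure used by the authors: it assumes the reported ANI of 94 is a percentage, uses the lower bound of the 95–96% range, and the rule that both criteria must agree is an assumption made for illustration. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

ANI_THRESHOLD = 95.0   # lower bound of the 95-96% range quoted in the text
DDDH_THRESHOLD = 70.0  # dDDH species threshold quoted in the text


@dataclass(frozen=True)
class PairwiseComparison:
    ani: float                     # average nucleotide identity, assumed to be in %
    dddh: float                    # digital DNA-DNA hybridization estimate, in %
    dddh_ci: Tuple[float, float]   # confidence interval of the dDDH estimate


def delineation(cmp, ani_threshold=ANI_THRESHOLD, dddh_threshold=DDDH_THRESHOLD):
    """Report, criterion by criterion, whether the pair falls below the species thresholds."""
    return {
        "ani_below_threshold": cmp.ani < ani_threshold,
        "dddh_below_threshold": cmp.dddh < dddh_threshold,
        # Stricter check: even the upper bound of the interval is below the threshold.
        "dddh_ci_below_threshold": cmp.dddh_ci[1] < dddh_threshold,
    }


def distinct_species(cmp):
    """Assumption: call two genomes distinct species only when both criteria agree."""
    result = delineation(cmp)
    return result["ani_below_threshold"] and result["dddh_below_threshold"]


# Values reported for Candidatus M. massiliensis vs Candidatus M. timonensis.
pair = PairwiseComparison(ani=94.0, dddh=56.8, dddh_ci=(54.0, 59.5))

if __name__ == "__main__":
    print(delineation(pair))
    print("distinct species:", distinct_species(pair))
```

Returning the per-criterion results separately keeps the sketch useful when ANI and dDDH disagree, a case the text does not address.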
3.2 Comparison of genomic properties and metabolism of CPR
The genomic properties of the CPR genomes analysed in this study are listed in Table 1. Briefly, the size of the available CPR genomes varied between 1,450,269 bp (for Candidatus Saccharibacteria division) and 602,646 bp (for Candidatus Kazan division), and the coding ratio varied between 20.9% (for Candidatus Gracilibacteria) and 94.8% (for Candidatus Beckwithbacteria). For Candidatus M. massiliensis and Candidatus M. timonensis, the genomic sizes were 971,417 bp and 755,196 bp and the coding ratios were 85.7% and 78.7%, respectively.
Furthermore, the number of coding proteins in these genomes ranged from 2,293 (for Candidatus SR1 division) to 642 (for Candidatus Kazan division), and the average size of the proteins ranged from 311.2 amino acids (for Candidatus Kazan division) to 74.1 (for Candidatus Gracilibacteria phylum). For Candidatus M. massiliensis and Candidatus M. timonensis, the numbers of coding proteins were 1,009 and 796, and the average protein sizes were 280 amino acids and 255 amino acids, respectively.
Moreover, the proportion of ORFan genes ranged from 35.37% (= 504/1515) for the Candidatus Gracilibacteria division to 0% for five complete genomes from the environmental microbiome (Table 1). Regarding Candidatus M. massiliensis and Candidatus M. timonensis, the ORFan genes represented 5.0% (= 40/796) and 2.7% (= 27/1,008), respectively. In parallel, all the genomes described here possessed rRNA and tRNA sequences, but only two human CPR genomes showed the presence of CRISPR systems. In terms of GC%, there was a variation of 26.7% between the different CPR genomes studied here, from 51.8% for the Candidatus Kazan division to 25.1% for the phylum Candidatus Gracilibacteria (Table 1). For Candidatus M. massiliensis and Candidatus M. timonensis, the GC% values were 43.2% and 43.7%, respectively.
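For readers unfamiliar with the GC% metric compared above, it is the share of G and C bases among the unambiguous bases of a sequence. The sketch below is a generic illustration, not the tool used in the study; ignoring ambiguous bases such as N is an assumption, and the function names are hypothetical.

```python
def gc_percent(sequence):
    """GC content (%) over unambiguous bases; N and other ambiguity codes are ignored (assumption)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def gc_percent_assembly(contigs):
    """GC content of a multi-contig assembly, weighting each contig by its length."""
    return gc_percent("".join(contigs))


# Spread between the highest and lowest GC% reported in the text (51.8% and 25.1%).
gc_spread = round(51.8 - 25.1, 1)

if __name__ == "__main__":
    print(gc_percent("ATGCGC"))
    print(gc_spread)
```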
In addition, the functional classes of predicted genes based on orthologous protein groups fell mainly into the following groups: 1) translation, 2) replication, recombination, and repair, 3) general function prediction only, and 4) cell wall/membrane biogenesis. Indeed, in Candidatus M. massiliensis and Candidatus M. timonensis, these groups included more than half of the COGs: 55.75% (315/565) and 55.02% (263/478), respectively (Figure S2). The most represented Kyoto Encyclopedia of Genes and Genomes (KEGG) categories of the CPR genomes described here were "translation," followed by "carbohydrate metabolism" and "replication and repair" (Fig. 3E). In Candidatus M. massiliensis and Candidatus M. timonensis, these KEGG categories included 48.0% (160/333) and 50.6% (154/304) of the genes, respectively. In parallel, the categories underrepresented in the CPR genomes were "signaling molecules and interaction" and "lipid metabolism" (Fig. 3E).
3.3 Pangenomic comparison
Pangenome analysis of 10 genomes of Candidatus Saccharibacteria, including Candidatus M. massiliensis and Candidatus M. timonensis, showed that 136 common genes were shared (percentage identity = 50%) (Fig. 3F). However, pangenome analysis against 18 complete genomes of various CPR phyla identified only 3 common genes: rpoB, tuf, and rpsJ, coding for DNA-directed RNA polymerase subunit beta, elongation factor Tu, and 30S ribosomal protein S10, respectively (percentage blastp identity = 50%) (Fig. 3G). Results for other blastp identity percentages (65%, 80%, and 95%) are shown in Figure S3.
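The notion of "common genes" in a pangenome can be illustrated as a set intersection. This is only a sketch of the concept, not the pipeline used here: real pangenome tools first cluster proteins at a chosen blastp identity threshold, whereas this example assumes that clustering has already been done and that each gene family is identified by a name. The genome labels and all gene content in the toy data are hypothetical, except that the three core genes named in the text are reused for illustration.

```python
def core_genes(gene_sets):
    """Gene families present in every genome (the core genome)."""
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def accessory_genes(gene_sets):
    """Gene families present in at least one genome but not in all of them."""
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        return set()
    return set.union(*sets) - set.intersection(*sets)


# Hypothetical toy data: gene families per genome after clustering.
TOY_GENOMES = {
    "genome_A": {"rpoB", "tuf", "rpsJ", "recA", "ftsZ"},
    "genome_B": {"rpoB", "tuf", "rpsJ", "recA"},
    "genome_C": {"rpoB", "tuf", "rpsJ", "gyrB"},
}

if __name__ == "__main__":
    print(sorted(core_genes(TOY_GENOMES)))
    print(sorted(accessory_genes(TOY_GENOMES)))
```

Adding more divergent genomes can only shrink the intersection, which is consistent with the drop from 136 shared genes within Candidatus Saccharibacteria to 3 across CPR phyla.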
4. Culture of Candidatus Saccharibacteria from a human faecal sample
Candidatus Saccharibacteria were cocultured with potential host bacteria (Schaalia odontolytica and Arachnia propionica), as suggested by Murugkar et al. (12).
Follow-up by standard PCR targeting Candidatus Saccharibacteria showed positive results after filtration at 0.22 µm and ultracentrifugation of the two human faecal samples studied here. In addition, all initial cocultures were negative in standard PCR targeting the 16S rRNA gene (primers TM7-1177R and TM7-580F).
4.1 Attempts to cultivate Candidatus M. massiliensis from faecal sample 1
For the human digestive sample from which the Candidatus M. massiliensis genome was reconstructed, all the subcultures performed from the "Candidatus Saccharibacteria"-Schaalia odontolytica coculture were negative by standard PCR (Table S4). Nevertheless, subcultures P1 and P5 under anaerobic conditions and subculture P4 under aerobic conditions of the potential CPR coculture with the host Arachnia propionica were positive by molecular biology techniques (Table S4). Sanger sequencing confirmed the infection of host cells with Candidatus Saccharibacteria (Table 2). Surprisingly, each coculture yielded a different top hit, suggesting the presence of more than one species of Candidatus Saccharibacteria in faecal sample 1 (Table 2). In parallel, electron micrographs showed the presence of these structures post-infection in the subcultures that were positive by PCR (Figure S4).
Table 2. Summary of blastn results (best hits) obtained from positive amplicons of Candidatus Saccharibacteria-host cocultures from human digestive samples. P1: passage 1; P3: passage 3; P4: passage 4; P5: passage 5; S1: digestive sample 1 (M. massiliensis); S2: digestive sample 2 (M. timonensis).

| Positive coculture in standard PCR and electron microscopy | Blastn results | Accession number | % Coverage | % Identity | E-value |
| --- | --- | --- | --- | --- | --- |
| P1-S1-A. propionica (anaerobic condition) | Uncultured bacterium clone 071072_210 16S ribosomal RNA gene, partial sequence | JQ477137.1 | 98% | 99.54% | 0.0 |
| P5-S1-A. propionica (anaerobic condition) | Uncultured bacterium clone TM7 BBM10 16S ribosomal RNA gene, partial sequence | KP326382.1 | 100% | 99.78% | 0.0 |
| P4-S1-A. propionica (aerobic condition) | Uncultured candidate division TM7 bacterium clone 09_3_D06 16S ribosomal RNA gene | GU227156.1 | 100% | 98.85% | 0.0 |
| P3-S2-A. propionica (aerobic condition) | Candidatus Saccharibacteria bacterium oral taxon 488 strain CM002 chromosome, complete genome | CP039998.1 | 100% | 95.96% | 0.0 |
| P5-S2-A. propionica (aerobic condition) | Uncultured bacterium clone TM7 BBM10 16S ribosomal RNA gene, partial sequence | KP326382.1 | 100% | 100% | 0.0 |
| P4-S2-S. odontolytica (anaerobic condition) | Candidatus Saccharibacteria bacterium partial 16S rRNA gene (= Minimicrobia timonensis) | LR761313.1 | 100% | 99.78% | 0.0 |
4.2 Attempts to cultivate Candidatus M. timonensis from faecal sample 2
For the second human digestive sample, from which the Candidatus M. timonensis genome was reconstructed, all the subcultures of the "Candidatus Saccharibacteria"-Arachnia propionica cocultures in an anaerobic atmosphere were negative by standard PCR (Table S4). However, subcultures P3 and P5 of the "Candidatus Saccharibacteria"-Arachnia propionica coculture under aerobic conditions and subculture P4 of the coculture of potential "Candidatus Saccharibacteria" with Schaalia odontolytica were positive by molecular techniques (Table S4). A blastn search of the sequences from the DNA bands obtained on agarose gels revealed the infection of host strains by Candidatus Saccharibacteria (Table 2). Surprisingly, the blastn search of the sequence obtained from subculture P4 of the coculture with Schaalia odontolytica gave a hit with 100% coverage and 99.78% identity with the Candidatus M. timonensis that we targeted (Table 2). Once again, each coculture yielded a different top hit, suggesting the presence of more than one species of Candidatus Saccharibacteria (Table 2). In parallel, microscopic observation highlighted the presence of bacterial symbionts post-infection that were absent pre-infection (Figure S4).
5. Metagenomic analysis
The MetaGX database is based on 16S rRNA, and the two genomes described here are very close to each other (99.7% identity of the 16S rRNA gene). Therefore, to obtain a preliminary view of prevalence and distribution, we used only the reference sequence of Candidatus M. massiliensis. The number of reads corresponding to OTUs associated with Candidatus M. massiliensis ranged from 20 to 41,243. The maximum relative abundance was 9.31% in human respiratory samples from France. In total, we found 12.8% (= 610/4,756) positive samples in the MetaGX database (44). More precisely, the frequency of Candidatus M. massiliensis OTUs was 85.71% (= 24/28) in human oral samples, 17.7% (= 396/2,233) in human gut samples, 13.87% (= 117/843) in human respiratory samples, 12% (= 42/350) in human breast milk samples, 11.9% (= 14/117) in human skin samples, 1.97% (= 15/760) in human urine samples and 1.5% (= 2/128) in environmental samples. However, these OTUs were absent from human blood, animal digestive tract, human genital tract, human brain abscess, human bone, insects, planaria, and yogurt microbiota (Figure S5). Among the positive samples (n = 610), 64.9% (= 396/610) came from the human digestive tract, 19.1% (= 117/610) from the human respiratory system, 6.88% (= 42/610) from human breast milk, 3.93% (= 24/610) from human oral samples, 2.45% (= 15/610) from human urine, 2.29% (= 14/610) from human skin and 0.32% (= 2/610) from environmental specimens (Fig. 4A). The highest relative abundances were found in the oral (1.76%) and respiratory (0.53%) microbiota (Fig. 4B). In parallel, there were significant differences between the relative abundance in the oral samples and that in breast milk (p < 0.0001), urine (p < 0.05), skin (p < 0.01), and human digestive samples (p < 0.0001) (Fig. 4B). In addition, we found a significant difference between the relative abundances in respiratory samples and those in human digestive (p < 0.0001) and breast milk samples (p = 0.0001) (Fig. 4B).
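The two kinds of frequency reported above (prevalence within each sample type, and the share of each sample type among the 610 positive samples) use different denominators and are easy to confuse. The sketch below recomputes both from the counts given in the text; the variable and function names are illustrative. The per-type totals listed here do not add up to 4,756 because sample types with no positive sample are not listed with their totals in the text.

```python
# (positive samples, total samples) per sample type, as reported in the text.
SAMPLES = {
    "oral": (24, 28),
    "gut": (396, 2233),
    "respiratory": (117, 843),
    "breast milk": (42, 350),
    "skin": (14, 117),
    "urine": (15, 760),
    "environmental": (2, 128),
}
TOTAL_SAMPLES = 4756  # all samples screened in the database, as reported


def prevalence_by_type(samples):
    """Percentage of positive samples within each sample type."""
    return {name: 100.0 * pos / total for name, (pos, total) in samples.items()}


def share_of_positives(samples):
    """Percentage of all positive samples contributed by each sample type."""
    total_positive = sum(pos for pos, _ in samples.values())
    return {name: 100.0 * pos / total_positive for name, (pos, _) in samples.items()}


total_positive = sum(pos for pos, _ in SAMPLES.values())
overall_prevalence = 100.0 * total_positive / TOTAL_SAMPLES

if __name__ == "__main__":
    print(f"positive samples: {total_positive}, overall: {overall_prevalence:.2f}%")
    for name, value in prevalence_by_type(SAMPLES).items():
        print(f"prevalence in {name}: {value:.2f}%")
    for name, value in share_of_positives(SAMPLES).items():
        print(f"share of positives from {name}: {value:.2f}%")
```

The oral samples illustrate the difference: they have the highest prevalence (24 of 28) but contribute a small share of the positive samples (24 of 610).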
Candidatus M. massiliensis OTUs were more frequently detected in samples from France (51.64% = 315/610) than in those from Mali (38.85% = 237/610) and Senegal (9.50% = 58/610) but were absent from microbiomes from Vietnam, Niger and Côte d’Ivoire (Figure S5). The relative abundance found in samples from France was significantly higher than in those from Mali (p < 0.0001) (Figure S5). In addition, the frequencies of positive samples from Europe and Africa were 51.64% (= 315/610) and 48.36% (= 295/610), respectively (Fig. 4C), but these OTUs were absent from the samples from Asia (Figure S5). We found a significant difference between the relative abundance of samples from Europe and those from Africa, with abundances in European samples being higher (p < 0.0001) (Fig. 4D). Conversely, a significant difference was found between the relative abundances in human breast milk samples from Europe and Africa, with higher values in African human breast milk samples (p < 0.01) (Figure S5). In parallel, no significant differences were found between the relative abundances of other microbiota according to their continental origin (Figure S5).
Furthermore, following a thorough metagenomic analysis, we found a significant proportion of Candidatus M. massiliensis (7,024 reads with a relative abundance of 0.9%) in a fecal metagenome from a patient who had been spontaneously cured of HIV infection (25). Mapping against the Candidatus M. massiliensis genome covered only 88% of the genome, and we detected a second novel genome of a Candidatus Saccharibacteria species, which we named Candidatus M. timonensis.