Collinsella provencensis sp. nov., a new species identified from healthy human gut microbiota.

A Gram-positive, anaerobic bacterium was isolated from the stool sample of a healthy French volunteer. A taxonogenomics approach was adopted to characterize it. Cells are rod-shaped, non-spore-forming and non-motile. Growth of strain Marseille-P3740 occurs at 37°C in an anaerobic atmosphere at pH 7. The 16S ribosomal RNA gene sequence of the novel strain Marseille-P3740 displayed 96.31% nucleotide sequence similarity with Collinsella intestinalis strain JCM 10643, the phylogenetically closest related species with standing in nomenclature. C16:0, C18:1n9 and C18:2n6 are the major cellular fatty acid components. The genome of strain Marseille-P3740T is 1.74 Mbp long with 59.1 mol% of G + C content. Based on the taxonogenomic description and the phenotypic and biochemical properties of this bacterium, we propose the strain Marseille-P3740T (= CSUR P3740 = CCUG 70947) as the type strain of a new species, Collinsella provencensis sp. nov.


Introduction
The Collinsella genus was first described in 1999 by Kageyama et al. 1, following the transfer of Eubacterium aerofaciens to the new genus on the basis of a detailed analysis of 16S rRNA gene sequence divergence from other Eubacterium species. At the time of writing, the Collinsella genus contains five validly described species: C. aerofaciens 1, C. intestinalis 2, C. massiliensis 3, C. stercoris 2 and C. tanakaei 4. Species belonging to the Collinsella genus are Gram-positive anaerobic bacilli. They have been isolated from the human intestinal microbiota and can sometimes be associated with diseases 5,6.
Bacterial diversity is believed to play a role in normal physiological functions and in disease, a role that needs to be better understood 7. The culturomics concept developed in recent years expands our knowledge of the human microbiota by multiplying culture conditions, which has led to the discovery of bacteria never cultivated before [8][9][10][11]. Once the bacterium was isolated, we used a taxonogenomic approach including matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MALDI-TOF MS), phylogenetic analysis, main phenotypic description and genome sequencing to establish its description 12,13.

Strain isolation and identification
As part of a culturomics study investigating the human microbiome, we isolated bacterial strain Marseille-P3740 from the fresh stool of a 32-year-old male volunteer living in France. The volunteer signed an informed consent form, and the study was authorized by the ethics committee of the Institut Federatif de Recherche IFR48 under number 2016-010.
To recover bacteria by culturomics, the stool sample was diluted in phosphate-buffered saline and then incubated at 37°C in an anaerobic culture flask (BD BACTEC®, Plus Anaerobic/F Media, Le Pont de Claix, France) supplemented with 5% (v/v) sheep blood and 5% (v/v) sterile-filtered cow rumen. To obtain distinct bacterial colonies, the culture was subcultured onto 5% sheep blood-enriched Columbia agar medium (bioMérieux, Marcy l'Etoile, France) and incubated at 37°C for 48 hours.
Initial identification was attempted with MALDI-TOF mass spectrometry (Bruker Daltonics, Bremen, Germany) as previously reported 15. The spectra generated from this strain were analyzed using Biotyper 3.0 software by comparing them to those in the regularly updated local URMS database (https://www.mediterranee-infection.com/urms-data-base). Because MALDI-TOF MS could not correctly identify strain Marseille-P3740, a molecular investigation was carried out by amplifying the 16S rRNA gene using the primer pair fD1 and rP2 (Eurogentec, Angers, France). The amplicon was then sequenced using the Big Dye® Terminator v1.1 Cycle Sequencing Kit and a 3500xL Genetic Analyzer capillary sequencer (Thermo Fisher, Saint-Aubin, France), following the protocol previously reported 16. Using CodonCode Aligner software (http://www.codoncode.com), sequences were assembled and analyzed to obtain a consensus sequence, which served as the reference sequence of the type strain and was submitted to the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nucleotide/). A comparative nucleotide analysis was performed with BLASTn, and only the sequences of species phylogenetically close to the novel strain were retrieved to build the phylogenetic tree.

Phenotypic characterization
Strain Marseille-P3740 was cultured on 5% sheep blood-enriched Columbia agar medium (bioMérieux) in aerobic, microaerophilic and anaerobic atmospheres (Thermo Scientific, Dardilly, France), at different temperatures (25, 28, 37, 45 and 56°C) and at varied pH (5 to 8.5) to evaluate its growth under these conditions. Gram staining, catalase and oxidase tests, and the spore-formation test were performed as previously described 17. In addition, to study the biochemical characteristics of strain Marseille-P3740, API ZYM and API 50 CH (bioMérieux) strips were used according to the manufacturer's instructions. To reveal the shape of the bacterial cells, a negative staining of strain Marseille-P3740 was carried out and observed under a scanning electron microscope (Hitachi High-Technologies, Tokyo, Japan) following the protocol described by Belkacemi et al., 2019 18.

Genome characteristics
Genomic DNA was extracted using the EZ1 DNA tissue kit (Qiagen, Hilden, Germany) on the EZ1 biorobot. It was sequenced on the MiSeq instrument (Illumina Inc, San Diego, CA, USA) using the Nextera Mate Pair and Nextera XT Paired End (Illumina) sample preparation kits, following the protocol previously used 19. Three assemblers were used to assemble the genome: SPAdes 20, Velvet 21 and SOAPdenovo 22. Reads were either left untrimmed (MiSeq software only) or trimmed with Trimmomatic 23. In addition, we used the GGDC (Genome-to-Genome Distance Calculator) web server (http://ggdc.dsmz.de) to calculate genomic similarities 24, which yielded digital DNA-DNA hybridization (DDH) values for the compared genomes. The average nucleotide identity (OrthoANI) was also assessed using the OAT software 25.

Ethical approval
The volunteer freely gave signed informed consent for further studies to be carried out on the collected sample. In addition, all methods used in this study were performed in accordance with the relevant guidelines and regulations and conformed to the Declaration of Helsinki.

Results
Strain identification and phylogenetic analysis
MALDI-TOF mass spectrometry could not correctly identify strain Marseille-P3740, so the reference spectrum obtained was added to the local database. Similarity analysis of the 16S rRNA gene sequence of strain Marseille-P3740 against GenBank showed the highest sequence identity, 96.31%, with Collinsella intestinalis strain JCM 10643 (GenBank accession number: NR_113165.1), which was the phylogenetically closest species. This value is below the 98.65% threshold recommended for delineating bacterial species 24,26. Therefore, strain Marseille-P3740 was considered a potentially new species belonging to the genus Collinsella within the family Coriobacteriaceae in the phylum Actinobacteria (Figure 1).
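The comparison against the 98.65% threshold can be illustrated with a minimal Python sketch. The function names and the toy aligned sequences are hypothetical and are not part of the study's pipeline; the identity reported in the text came from BLASTn, whereas the sketch simply counts matching columns in two sequences that are assumed to be already aligned.

```python
# Illustrative sketch only: helper names and toy sequences are hypothetical.
SPECIES_16S_THRESHOLD = 98.65  # percent, threshold cited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Columns where both sequences carry a gap ('-') are ignored; a gap
    facing a nucleotide counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def below_species_threshold(similarity_pct: float,
                            threshold: float = SPECIES_16S_THRESHOLD) -> bool:
    """True when the 16S similarity is below the species threshold."""
    return similarity_pct < threshold


# Value reported in the text for strain Marseille-P3740 vs C. intestinalis.
reported_similarity = 96.31
is_candidate_new_species = below_species_threshold(reported_similarity)
```

With the reported value of 96.31%, `is_candidate_new_species` evaluates to `True`, which matches the reasoning in the paragraph above. A 16S similarity below the threshold only flags a candidate; the genomic comparisons are still needed to support the proposal.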

Genome sequencing and comparison
The genome of Collinsella provencensis strain Marseille-P3740 is 1,737,922 bp long with a 58.2 mol% G+C content.
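For readers less familiar with these summary statistics, the following Python sketch shows how a G+C percentage and a genome size in Mbp are commonly derived from sequence data. The function name and the toy sequence are hypothetical; this is not the tool used by the authors, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
# Illustrative sketch only: gc_content and the toy input are hypothetical.
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; other symbols
    such as N are ignored (an assumption of this sketch).
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Genome length reported in the text, converted from bp to Mbp.
genome_length_bp = 1_737_922
genome_length_mbp = round(genome_length_bp / 1_000_000, 2)

toy_gc = gc_content("GGCCAT")  # toy example, not real genome data
```

Here `genome_length_mbp` evaluates to 1.74, which is the rounded genome size quoted elsewhere in the paper.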
The genome size, the number of proteins and the number of genes of strain Marseille-P3740 were lower than those of the other genomes studied here (Table 3). Its G+C content is higher than that of C. bouchesdurhonensis but lower than those of C. intestinalis, C. vaginalis, C. stercoris and C. tanakaei (62.4, 64.4, 62.7 and 60.2 mol%, respectively). OrthoANI values among the closely related species ranged from 72.57% between C. bouchesdurhonensis and C. vaginalis to 82.09% between C. intestinalis and C. stercoris. For C. provencensis compared with the other Collinsella species, values ranged from 72.90% with C. bouchesdurhonensis to 78.01% with C. stercoris. All OrthoANI values among the closely related species (Figure 3) were therefore below the 95% threshold recommended for delineating prokaryotic species. Likewise, none of the DDH values calculated between the genomes of the Collinsella species studied here came close to the 70% threshold used to delimit bacterial species: the highest DDH value was 27.3%, between C. stercoris and C. phocaeensis, while the lowest (21.6%) was between C. vaginalis and C. aerofaciens (Table 4).
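The threshold reasoning in this paragraph can be summarized in a short Python sketch. Only the values quoted in the text are used; the variable names are hypothetical, and the full pairwise matrices (Figure 3, Table 4) are not reproduced here.

```python
# Illustrative sketch only: names are hypothetical, values are those
# quoted in the text.
ORTHOANI_THRESHOLD = 95.0  # percent, species boundary for OrthoANI
DDH_THRESHOLD = 70.0       # percent, species boundary for digital DDH

# OrthoANI values quoted for C. provencensis against two relatives.
orthoani_vs_c_provencensis = {
    "C. stercoris": 78.01,
    "C. bouchesdurhonensis": 72.90,
}

# Highest DDH value quoted among all Collinsella genomes compared.
highest_ddh_reported = 27.3

# Every quoted OrthoANI value must fall below the species boundary.
all_ani_below = all(
    value < ORTHOANI_THRESHOLD
    for value in orthoani_vs_c_provencensis.values()
)

# If the highest DDH value is below the boundary, all others are too.
all_ddh_below = highest_ddh_reported < DDH_THRESHOLD

supports_distinct_species = all_ani_below and all_ddh_below
```

Both conditions hold for the quoted values, so `supports_distinct_species` is `True`. Requiring both criteria is a conservative choice made for this sketch, not a rule stated by the authors.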

Conclusion
The results of the phenotypic, phylogenetic and genomic analyses, including the 16S rRNA sequence similarity and OrthoANI values lower than 95%, strongly support the conclusion that strain Marseille-P3740T represents a new bacterial species, for which the name Collinsella provencensis sp. nov. is proposed.
Collinsella provencensis (pro.ven.cen'sis, N.L. fem. adj. provencensis, pertaining to Provence, the region of France where the strain was isolated). Optimum growth of colonies was obtained at 37°C on 5% sheep blood-enriched Columbia agar after 48 hours. Colonies appear small and transparent. C. provencensis is a Gram-positive, rod-shaped bacterium with a mean length of 1 μm and a mean diameter of 0.5 μm.
Strain Marseille-P3740T gave positive reactions for alkaline phosphatase, acid phosphatase, naphthol-AS-BI-phosphohydrolase and D-trehalose. Negative reactions were observed for trypsin, β-galactosidase, α-glucosidase, glycerol, D-arabinose, D-ribose, D-xylose, D-glucose, D-fructose, D-mannose, L-rhamnose, D-lactose, D-saccharose, glycogen, D-fucose and D-arabitol. Strain Marseille-P3740T is catalase-negative and oxidase-negative. The genome of Collinsella provencensis strain Marseille-P3740T is about 1.74 Mbp long with 58.1 mol% G+C content. The GenBank accession number for the 16S rRNA gene sequence of strain Marseille-P3740T is LT722680 and that for the whole genome shotgun project is FZRI00000000. This strain was isolated from the fresh stool of a healthy French man.

Declarations
Conflict of interest: None to declare.

24. Meier-Kolthoff, J. P., Auch, A. F., Klenk, H. P. & Göker, M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14, 60 (2013).

Figure 1
Phylogenetic tree highlighting the position of Collinsella provencensis sp. nov., based on 16S rRNA gene sequences, relative to the most closely related type species within the genus Collinsella. GenBank accession numbers are given in parentheses. Sequences were aligned using MUSCLE with default parameters, and phylogenetic inference was performed using the maximum likelihood method in the MEGA X software. Numbers at the nodes are percentages of bootstrap values obtained by repeating the analysis 1000 times to generate a majority consensus tree. The scale bar indicates a 5% nucleotide sequence divergence.

Figure 2
Scanning electron microscopy of stained Collinsella provencensis sp. nov. (Hitachi TM4000). Scales and acquisition settings are shown on the figure.