Genomic molecular signatures determined characterization of Mycolicibacterium gossypii sp. nov., a fast-growing mycobacterial species isolated from cotton field soil

A Gram-positive, acid-fast, rapidly growing rod, designated S2-37ᵀ, that forms yellowish colonies was isolated from a soil sample collected from a cotton field in the Xinjiang region of China. Genomic analyses indicated that strain S2-37ᵀ harbored a type VII secretion system (T7SS) and was very likely able to produce mycolic acid, both typical features of pathogenic mycobacterial species. Phylogenetic analysis based on the 16S rRNA gene indicated that strain S2-37ᵀ was closely related to species of the genus Mycolicibacterium, which was further confirmed by pan-genome phylogenetic analysis. Digital DNA-DNA hybridization and average nucleotide identity values were highest with M. litorale CGMCC 4.5724ᵀ, at 39.1% (35.7–42.6%) and 81.28%, respectively. Characterization of conserved molecular signatures further supported the placement of strain S2-37ᵀ in the genus Mycolicibacterium. The main fatty acids were identified as C16:0, C18:0, C20:3ω3 and C22:6ω3. In addition, the polar lipid profile was mainly composed of diphosphatidylglycerol, phosphatidylethanolamine and phosphatidylinositol. Phylogenetic analyses, together with distinct fatty acid and antimicrobial resistance profiles, indicated that strain S2-37ᵀ was genetically and phenotypically distinct from its closest phylogenetic neighbour, M. litorale CGMCC 4.5724ᵀ. Here, we propose a novel species of the genus Mycolicibacterium: Mycolicibacterium gossypii sp. nov., with the type strain S2-37ᵀ (= JCM 34327ᵀ = CGMCC 1.18817ᵀ).


Introduction
The genus Mycolicibacterium is the second largest group of Gram-positive, acid-fast, rod-shaped microbes in the family Mycobacteriaceae (Parte et al. 2020). At the time of writing, the genus comprises 92 recognized species with published names (https://lpsn.dsmz.de) (Parte et al. 2020), primarily rapid-growing species isolated from a diverse range of environments, including river water, marine sediment and soil (Butler et al. 1993; Brown-Elliott et al. 2010; Zhang et al. 2013). A relatively high proportion of members of the genus have also been isolated from clinical specimens, indicating potential pathogenicity to humans and animals (Shojaei et al. 2000; Brown-Elliott and Wallace 2002).
It is well documented that mycobacterial species display relatively high diversity in their genomic features at the species level (Fedrizzi et al. 2017), which makes it more challenging to determine the taxonomic position of members of this group. Much effort has been devoted to delineating the different evolutionary branches of this group using well-established approaches based on analyses of the 16S rRNA gene, the 16S-23S spacer, and concatenated multilocus sequences of housekeeping genes (Roth et al. 1998; Mignard and Flandrois 2008; Magee and Ward 2012). However, the reliability of these methods for distinguishing sub-groups of mycobacterial species (e.g., slow- and fast-growing species) remains of concern. Recently, Gupta et al. (2018) developed a robust method that consistently supports the existence of five distinct monophyletic sub-groups of mycobacterial species, designated as the genera Mycobacterium, Mycolicibacterium, Mycolicibacter, Mycolicibacillus and Mycobacteroides. They identified representative molecular markers, in the form of conserved signature indels and proteins, that are uniquely shared by members of each of the five clades.
In this study, a putative novel mycobacterial species (strain S2-37ᵀ) was isolated from a soil sample from a cotton field in Xinjiang, PR China. Phylogenetic analyses based on 16S rRNA genes and genomic sequences consistently indicated a close relationship between strain S2-37ᵀ and species of the genus Mycolicibacterium. Finally, the genome sequence of strain S2-37ᵀ was compared against the previously documented molecular features specific for each genus (Gupta et al. 2018); together with the polyphasic approach, this supported classification of the strain into the genus Mycolicibacterium.

Isolation of the novel strain and cultivation
Strain S2-37ᵀ was isolated from a soil sample collected from a cotton field (86°20′ N 44°62′ E, Xinjiang, China). The isolation procedure followed a previous report with some modifications (Hopkins et al. 1991). Soil samples (5 g) were first air-dried, then mixed with 100 ml of sterilized water in 250 ml flasks. Samples were shaken at 160 r/min at 30°C for 30 min, followed by homogenization for 1 min using a sonicator (XO-3200DT, Nanjing Xianou laboratory equipment Co., Ltd) at a frequency of 40 kHz. Then, 100 μl of serially diluted samples were plated on commercially available Gause's synthetic No.1 agar (G1; 20 g soluble starch; 1 g KNO3; 0.5 g K2HPO4; 0.5 g MgSO4·H2O; 0.5 g NaCl; 0.01 g FeSO4; 20 g agar; 1000 ml distilled water; pH 7.2-7.4) and incubated at 30°C for 14 days. Bacterial colonies were sub-cultured on G1 agar three times to obtain pure isolates. Strain S2-37ᵀ was maintained aerobically on G1 agar at 30°C, and stored at -80°C in G1 broth supplemented with an equal volume of 50% (v/v) sterilized glycerol for preservation (Prakash et al. 2013).
Phylogenetic analysis based on 16S rRNA gene and genomic sequences
Genomic DNA of strain S2-37ᵀ was extracted and its 16S rRNA gene was obtained by PCR amplification using universal primers 27F and 1492R as described by Fan et al. (2008). The sequence was then aligned using the BLAST function of the NCBI database (Federhen 2012) to preliminarily determine the taxonomic position of strain S2-37ᵀ. The GGDC web server (available at http://ggdc.dsmz.de/) was employed for phylogenetic and similarity analysis between the 16S rRNA gene sequences of S2-37ᵀ and all published Mycolicibacterium type strains collected from the LPSN database (Parte et al. 2020). Phylogenetic trees were reconstructed based on the 16S rRNA gene sequences of strain S2-37ᵀ and its 22 close phylogenetic relatives using the neighbor-joining (NJ) (Saitou and Nei 1987), maximum-parsimony (MP) (Sourdis and Nei 1988) and maximum-likelihood (ML) (Steel and Rodrigo 2008) algorithms implemented in the software package MEGA 7.0 (Kumar et al. 2016), with Bacillus subtilis NTCC 6051ᵀ as the outgroup. The topology of the phylogenetic trees was evaluated by bootstrap analysis based on 1000 replications (Felsenstein 1985).
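To illustrate what a bootstrap percentage measures, the following minimal Python sketch resamples alignment columns with replacement and counts how often a query sequence is closer to one candidate than to a rival. It is a simplified stand-in and not the MEGA 7.0 procedure: it assumes uncorrected p-distances and a three-sequence comparison rather than a full tree, and the function names and toy sequences are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_columns(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def neighbour_support(alignment, query, candidate, rival, replicates=1000, seed=0):
    """Fraction of replicates in which `query` is closer to `candidate` than to `rival`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_columns(alignment, rng)
        if p_distance(rep[query], rep[candidate]) < p_distance(rep[query], rep[rival]):
            hits += 1
    return hits / replicates


# Toy, made-up aligned sequences (not real 16S rRNA data).
toy_alignment = {
    "query": "ACGTACGTACGTACGT",
    "candidate": "ACGTACGTACGAACGT",
    "rival": "ACGAACTTACGAACGA",
}
support = neighbour_support(toy_alignment, "query", "candidate", "rival")
print(f"Bootstrap support: {100 * support:.1f}%")
```

A value below 50% from such a procedure would correspond to the kind of weakly supported node that is left unlabelled in a tree figure.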
Genomic DNA was extracted from strain S2-37ᵀ using the Bacterial Genomic DNA Rapid Extraction kit (Cat. No. B518225) supplied by Sangon Biotech (Shanghai, China). Library construction, quality control and analysis were performed following the methods described by Zhu et al. (2001). The draft genome of S2-37ᵀ was sequenced using Illumina HiSeq 4000 PE150 (Patnaik et al. 2016) at Beijing TSINGKE Bioinformatics Technology Co., Ltd, and assembled separately using SOAPdenovo (Luo et al. 2012), SPAdes (Bankevich et al. 2012) and ABySS (Simpson et al. 2009). The final genome assembly was obtained by integrating the three assemblies with CISA (Lin and Liao 2013). The whole-genome sequences of strain S2-37ᵀ and its 22 close phylogenetic relatives were used for pan-genome analysis with the bacterial pan genome analysis pipeline (BPGA) (Chaudhari et al. 2016), in order to derive the exact phylogenetic affiliation. The ONE CLICK MODE of BPGA was used, so that all analyses were performed in a single step with default parameters (sequence identity cut-off = 50% and number of iterations for pan-genome profile calculation = 20).

Genomic analysis
Functional categories of the strain S2-37ᵀ genome were predicted using the Clusters of Orthologous Groups of proteins (COG) database (Tatusov et al. 2003). AntiSMASH was employed to predict biosynthetic gene clusters of strain S2-37ᵀ (Blin et al. 2019). Functional gene annotation and metabolic pathway prediction were performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa et al. 2016). The virulence factor database (VFDB) was employed to predict virulence factors of strain S2-37ᵀ (Liu et al. 2019). Digital DNA-DNA hybridization (dDDH) analysis was performed using the Type (Strain) Genome Server (TYGS; http://tygs.dsmz.de/), an automated high-throughput platform for genome-based taxonomy (Meier-Kolthoff et al. 2013), with the genome of strain S2-37ᵀ as the only query sequence. Confidence intervals were calculated using the recommended settings of GGDC 2.1 (Meier-Kolthoff et al. 2013). Additionally, average nucleotide identity (ANI) was calculated with the OrthoANIu algorithm using the server-based software (available at http://www.ezbiocloud.net/tools/ani) (Yoon et al. 2017).

Divergence analysis of conserved molecular signatures
Conserved molecular signatures, including indels (CSIs) and unique proteins (CSPs), specific for the genera Mycolicibacterium and Mycobacterium were taken directly from the work of Gupta et al. (2018). Conserved protein sequences containing CSIs were retrieved from different species by BLASTp searches against the NCBI non-redundant (nr) database (Altschul et al. 2005), and the identified orthologous proteins with a minimum sequence identity of 50% were used for divergence analyses. Phylogenetic analyses were performed using the software package MEGA 7.0 after multiple alignment of the sequence data with CLUSTAL W (Kumar et al. 2016). Phylogenetic trees were reconstructed using the neighbor-joining (NJ) algorithm (Saitou and Nei 1987). Partial sequence alignments of a conserved region of orthologous proteins from different species were displayed in the same format as in the previous work (Gupta et al. 2018). Species phylogenetically closely related to strain S2-37ᵀ (Mycolicibacterium litorale CGMCC 4.5724ᵀ, Mycolicibacterium monacense DSM 44395ᵀ, Mycobacterium neglectum CECT 8778ᵀ and Mycobacterium lehmannii SN 1900ᵀ) and randomly chosen species (Mycolicibacterium austroafricanum DSM 44191ᵀ, Mycolicibacterium smegmatis NCTC 8159ᵀ, Mycobacterium asiaticum DSM 44297ᵀ, Mycobacterium avium 104ᵀ and Mycobacterium bohemicum DSM 44277ᵀ) were employed for the divergence analyses. The CSP profile of strain S2-37ᵀ was identified using BLASTp searches against the NCBI non-redundant (nr) database, in order to evaluate its phylogenetic relationship with the genera Mycolicibacterium and Mycobacterium. Full names are used throughout the text for species belonging to the genus Mycobacterium, in order to avoid confusion when referring to species from the genus Mycolicibacterium.
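The 50% identity filter applied to the BLASTp hits can be sketched as follows. This is a minimal Python illustration under stated assumptions: identity is computed here over gap-free aligned positions only (BLAST reports identity over the full alignment length, so values may differ slightly), and the function name, cut-off constant and peptide fragments are hypothetical.

```python
MIN_IDENTITY = 50.0  # percent; the cut-off stated in the text


def percent_identity(aligned_a, aligned_b):
    """Percent identity over positions where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Made-up aligned fragments: reference protein versus two candidate hits.
reference = "MKT-LLAGVR"
hits = {
    "hit_1": "MKTALIAGVR",
    "hit_2": "QRSAPWEGNK",
}
retained = {
    name: round(percent_identity(reference, seq), 1)
    for name, seq in hits.items()
    if percent_identity(reference, seq) >= MIN_IDENTITY
}
print(retained)
```

Only hits passing the cut-off would then be carried forward into the multiple alignment and tree reconstruction.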

Morphology, physiology, and biochemical analysis
Gram staining was performed using the methods described by Smibert and Krieg (1994). Acid-fast staining was performed using the methods described by Berd (1973). Cell morphology was determined after growth on Bennett's agar (10 g glucose; 2 g N-Z amine; 1 g beef extract; 1 g yeast extract; 15 g agar; 1000 ml distilled water; pH 7.1-7.5) at 30°C for 7 days using a light microscope (OLYMPUS BX43F; Olympus Corporation, Tokyo, Japan) and a cold field emission scanning electron microscope (SEM; Hitachi SU8010, Tokyo, Japan). SEM analysis was performed as described by Koon et al. (2019).
Growth at various NaCl concentrations (0-3% at 0.5% intervals and 3-7% at 1% intervals, w/v) and at different temperatures (10, 18, 25, 30, 37, 40 and 45°C) was examined by growing the strains on Bennett's medium as the basal medium. Growth at different pH values (4.0-10.0, at intervals of 1.0 pH unit) was examined on Bennett's medium using the buffer system described by Xu et al. (2005). Catalase activity was determined using 3% H2O2, and gas production was scored as a positive reaction. Urease activity, nitrate reduction and Tween 80 hydrolysis were determined using the methods described by Kent and Kubica (1985). Resistance to antibiotics was determined using impregnated filter-paper discs (Goodfellow and Orchard 1974) containing cefalotin, cefoxitin, amikacin, ciprofloxacin, clarithromycin, doxycycline, tobramycin and sulfamethylisoxazole. Other biochemical properties of strain S2-37ᵀ were tested using the API 20 NE and API ZYM systems (bioMérieux) according to the manufacturer's instructions, and the results are summarized in the species description.
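For bookkeeping, the test conditions listed above can be enumerated explicitly. The short Python sketch below simply expands the stated ranges into lists; the variable names are hypothetical, and it assumes each factor was varied on its own rather than in combination, which the text does not specify.

```python
# NaCl series: 0-3% in 0.5% steps, then 4-7% in 1% steps (w/v).
nacl_percent = [i * 0.5 for i in range(7)] + [float(i) for i in range(4, 8)]

# Incubation temperatures in degrees Celsius, as listed in the text.
temperatures_c = [10, 18, 25, 30, 37, 40, 45]

# pH series: 4.0-10.0 in steps of 1.0 pH unit.
ph_values = [float(p) for p in range(4, 11)]

for label, series in (
    ("NaCl (% w/v)", nacl_percent),
    ("Temperature (°C)", temperatures_c),
    ("pH", ph_values),
):
    print(f"{label}: {len(series)} conditions -> {series}")
```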

Chemotaxonomic characterization
The polar lipid profile of strain S2-37ᵀ was determined using standard thin-layer chromatographic procedures (Minnikin et al. 1984). In addition, cellular fatty acids were extracted from freeze-dried biomass of the strain and were saponified and methylated to produce fatty acid methyl esters (FAMEs) following the procedure described by Kuykendall et al. (1988). The FAMEs were analyzed by gas chromatography (Agilent 6890 instrument) and the resultant peaks were automatically integrated (Lisec et al. 2006). The fatty acid profile was determined using the standard Microbial Identification (MIDI) System, version 4.5, and the Myco 6 database (Sasser 2001).

Results and discussion
Phylogenetic and genomic analyses
Phylogenetic analyses showed that strain S2-37ᵀ had the highest 16S rRNA gene sequence similarity with Mycolicibacterium pyrenivorans DSM 44605ᵀ (98.5%) and Mycobacterium neglectum CECT 8778ᵀ (98.5%), followed by Mycolicibacterium austroafricanum DSM 44191ᵀ (98.3%) (Table S1). The NJ phylogenetic analysis placed strain S2-37ᵀ in a clade adjacent to M. pyrenivorans DSM 44605ᵀ, Mycolicibacterium aurum NCTC 10437ᵀ and M. austroafricanum DSM 44191ᵀ, with low bootstrap support (< 50%) (Fig. 1). This phylogenetic pattern was also reproduced in the MP and ML trees (Fig. S1, available in the online version of this article). Pan- and core-genome phylogenetic analyses gave topologies for the tested mycobacterial species that differed from the tree reconstructed from 16S rRNA genes, but consistently indicated that strain S2-37ᵀ was more closely related to species belonging to the genus Mycolicibacterium (Fig. S2).
The assembled genome of S2-37ᵀ was 5.9 Mbp in 12 contigs (all > 500 bp, with an N50 length of 495,170 bp) and the sequencing coverage was approximately 100×. The total length was 5,843,440 bp and the G + C content was 68.43 mol%. Functional annotation based on the COG database showed that, apart from genes with general function prediction only, genes involved in lipid transport and metabolism accounted for the largest proportion (Fig. S3). The biosynthetic gene clusters of S2-37ᵀ predicted by antiSMASH are shown in Supplementary table 2 and could help guide the screening of active secondary metabolites. According to the KEGG analysis, the majority of genes were involved in cell metabolism (Fig. S4), and some genes were classified into pathways associated with antimicrobial resistance and bacterial infection (Table S3). According to the VFDB analysis, strain S2-37ᵀ was predicted to be able to produce mycolic acid (see details in Table S4), which is considered a virulence factor shared by a large number of mycobacteria (Tortoli 2003). In addition, a type VII secretion system (T7SS) was identified; this system has been shown to be involved in the secretion of virulence-associated proteins, the interaction between pathogens and hosts, and zinc/iron homeostasis in microbes (Cao et al. 2016). In particular, PE/PPE proteins, classified as members of the T7SS, were detected in the genome of strain S2-37ᵀ. These proteins have been shown to localize at the cell surface and/or be secreted, inducing strong immune responses in the host and playing crucial roles in the virulence and pathogenesis of Mycobacterium tuberculosis (Choudhary et al. 2003; Sampson 2011). Taken together, these results suggest that strain S2-37ᵀ is potentially pathogenic to humans, a feature shared by many mycobacterial pathogens.
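The assembly statistics quoted above (N50 and G + C content) follow standard definitions, which can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the contigs are short made-up strings rather than the S2-37ᵀ assembly, and ambiguous bases are assumed to be excluded from the G + C denominator.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


def gc_content(sequence):
    """G + C percentage over unambiguous bases (A, C, G, T) only."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Made-up toy contigs for demonstration.
toy_contigs = ["GGCCATGCGC", "GCGCAT", "ATGC"]
print("N50:", n50([len(c) for c in toy_contigs]))
print("G+C (%):", round(gc_content("".join(toy_contigs)), 2))
```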
We then determined that strain S2-37ᵀ showed its highest dDDH values (average, with confidence interval in parentheses) of 39.1% (35.7-42.6%) and 34.9% (35.0-38.4%) with M. litorale CGMCC 4.5724ᵀ and M. monacense DSM 44395ᵀ, respectively. This corroborated the phylogenetic trees constructed from pan- and core-genomes, suggesting close phylogenetic relationships of strain S2-37ᵀ with M. litorale CGMCC 4.5724ᵀ and M. monacense DSM 44395ᵀ. The ANI values between strain S2-37ᵀ and M. litorale CGMCC 4.5724ᵀ and M. monacense DSM 44395ᵀ were 81.28% and 81.09%, respectively. Overall, all calculated values were below the suggested thresholds for the delineation of a novel species (Chun et al. 2018).
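The comparison against species-delineation thresholds can be made explicit with a short Python sketch. The cut-off values are not stated in the text above; the sketch assumes the commonly cited boundaries of 70% dDDH and 95% ANI (the lower end of the widely used 95-96% range), and the function and variable names are hypothetical.

```python
# Assumed species-boundary cut-offs (commonly cited values, not taken from the text).
DDDH_SPECIES_CUTOFF = 70.0  # percent
ANI_SPECIES_CUTOFF = 95.0   # percent


def below_species_boundary(dddh, ani,
                           dddh_cutoff=DDDH_SPECIES_CUTOFF,
                           ani_cutoff=ANI_SPECIES_CUTOFF):
    """True when both indices fall below the assumed same-species cut-offs."""
    return dddh < dddh_cutoff and ani < ani_cutoff


# (dDDH %, ANI %) values reported for strain S2-37 against its two closest relatives.
comparisons = {
    "M. litorale CGMCC 4.5724": (39.1, 81.28),
    "M. monacense DSM 44395": (34.9, 81.09),
}
results = {
    name: below_species_boundary(dddh, ani)
    for name, (dddh, ani) in comparisons.items()
}
for name, distinct in results.items():
    print(f"{name}: distinct species under assumed cut-offs = {distinct}")
```

Under these assumed cut-offs both comparisons fall well short of the same-species range, consistent with the conclusion drawn in the text.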

Divergence analysis of conserved molecular signatures
Fig. 1 Phylogenetic trees reconstructed by the neighbour-joining method based on 16S rRNA genes, showing the phylogenetic relationships between strain S2-37ᵀ and its closely related species. Bootstrap percentages (based on 1000 replications) above 50% are shown at the nodes. The GenBank accession numbers for the 16S rRNA gene and genomic sequences are shown in parentheses. Bacillus subtilis subsp. subtilis JCM 1465ᵀ was employed as the outgroup. Filled circles indicate that the corresponding nodes are also recovered in trees reconstructed by the ML and MP algorithms. Bar, 0.02 substitutions per nucleotide position
Four (LacI family transcriptional regulator, cyclase, CDP-alcohol phosphatidyltransferase and phosphatidylserine synthase) and two (UPF0182 family protein and 23S rRNA (guanosine(2251)-2'-O)-methyltransferase RlmB) conserved proteins with CSIs specific for most members of the genera Mycolicibacterium and Mycobacterium, respectively, were employed for the divergence analysis. Phylogenetic analyses based on the whole sequences of the conserved proteins indicated that strain S2-37ᵀ was more closely related to members of the genus Mycolicibacterium than to those of the genus Mycobacterium. In the partial sequence alignments of the conserved proteins, the amino acid insertion patterns of strain S2-37ᵀ were consistent with the molecular signatures (Fig. S5a, c, e, f) identified by Gupta et al. (2018) as specific for most members of the genus Mycolicibacterium. However, the conserved regions of two protein sequences did not give a clear indication of where strain S2-37ᵀ should be placed taxonomically (Fig. S5b, d). The CSP profile of strain S2-37ᵀ is summarized in Table 1. Strain S2-37ᵀ clearly possessed some of the CSPs identified as specific for members of the genus Mycolicibacterium, but none of those specific for members of the genus Mycobacterium and other slow-growing mycobacterial species.
Thus, we concluded that strain S2-37ᵀ represented a novel bacterial species in the genus Mycolicibacterium.
Intriguingly, we observed that Mycobacterium neglectum CECT 8778ᵀ and Mycobacterium lehmannii SN 1900ᵀ displayed CSI patterns similar to those of members of the genus Mycolicibacterium. These two species have been reported to form colonies within 7 days (Nouioui et al. 2017, 2018), and their closest phylogenetic relatives (e.g., Mycobacterium aurum, Mycobacterium mageritense and Mycobacterium vanbaalenii) have been reclassified into the genus Mycolicibacterium (Gupta et al. 2018). Therefore, we suggest that Mycobacterium neglectum and Mycobacterium lehmannii should be reclassified as Mycolicibacterium neglectum and Mycolicibacterium lehmannii, respectively.
Table 1 footnotes: Gupta et al. (2018); -, not detected
Morphology, physiology, and biochemical analysis
Strain S2-37ᵀ grew more robustly on Bennett's agar than on G1 and R2A agar, and consistently formed round, yellowish colonies on Bennett's agar. Strain S2-37ᵀ was Gram-indefinite in Gram staining but positive in acid-fast staining (Fig. S6), which is frequently observed in mycobacterial species (Nakamura et al. 1991). SEM observation revealed that strain S2-37ᵀ is rod-shaped, approximately 1.0-1.5 μm in length and 0.4 μm in diameter (Fig. 2). Strain S2-37ᵀ was positive for catalase activity and negative for Tween 80 hydrolysis; other results obtained for strains S2-37ᵀ, M. litorale CGMCC 4.5724ᵀ and M. monacense DSM 44395ᵀ are summarized in Table 2. All of these tests were carried out in duplicate using the standard inoculum.
In conclusion, the phenotypic, chemotaxonomic and phylogenetic data support the delineation of strain S2-37ᵀ as a novel species of the genus Mycolicibacterium. We propose the name Mycolicibacterium gossypii sp. nov. for the species.
Description of Mycolicibacterium gossypii sp. nov.
Cells are aerobic, Gram-positive, acid-fast, non-spore-forming short rods. Smooth, yellowish colonies form on Bennett's agar within 7 days at 30°C. Cells are approximately 0.4 μm in diameter and 1.0-1.5 μm long. Positive for catalase activity and negative for Tween 80 hydrolysis. In API 20NE tests, arginine dihydrolase activity is positive. Negative for nitrate reduction, indole production, β-galactosidase, β-glucosidase, gelatinase and urease activities, as well as fermentation of D-glucose. In addition, results are negative for assimilation of D-glucose, L-arabinose, D-mannitol, maltose, gluconate, N-acetyl-glucosamine, adipic acid, malic acid, capric acid and phenylacetic acid, but positive for citrate. In API ZYM tests, alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine arylamidase, valine arylamidase, cystine arylamidase, acid phosphatase, naphthol-AS-BI-phosphohydrolase and α-glucosidase activities are positive. Negative for lipase (C14), trypsin, α-chymotrypsin, α-galactosidase, β-galactosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, α-mannosidase and α-fucosidase activities. Susceptible to amikacin, ciprofloxacin, clarithromycin, doxycycline, tobramycin and sulfamethylisoxazole, but resistant to cefalotin and cefoxitin. The major fatty acids (> 10%) are C16:0, C18:0, C20:3ω3 and C22:6ω3. The polar lipid profile is mainly composed of diphosphatidylglycerol, phosphatidylethanolamine and phosphatidylinositol. The genome size of strain S2-37ᵀ is 5.9 Mbp and the DNA G + C content of the type strain is 68.4 mol%. The GenBank/EMBL/DDBJ accession number for the 16S rRNA gene sequence is MW295419. This Whole Genome Shotgun project has been deposited at GenBank/EMBL/DDBJ under the accession number JAFEVR000000000. The version described in this paper is version JAFEVR010000000.

Declarations
Conflict of interest The authors declare no conflicts of interest.
Availability of data and material The GenBank/EMBL/DDBJ accession number for the 16S rRNA gene sequence of strain S2-37ᵀ is MW295419. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAFEVR000000000. The version described in this paper is version JAFEVR010000000.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent All authors agree to publish this work.