Rothia santali sp. nov., endophytic bacteria isolated from sandalwood (Santalum album L.) seedling

A novel, mustard yellow-pigmented, aerobic bacterial strain designated AR01T was isolated from the hypocotyl tissue of a sandalwood seedling from Bangalore, India. The 16S rRNA gene of strain AR01T showed the highest sequence similarity (98.97%) with Rothia halotolerans YIM 90716T (KCTC 19172), followed by Rothia kristinae PM 129T (NBRC 15354T) (97.31%) and Rothia koreensis P31T (JCM 15915) (97.11%). Strain AR01T was coccoid-shaped, non-motile, non-spore-forming, oxidase-negative and catalase-positive. Strain AR01T has a genome size of 3.31 Mb containing 2993 protein-coding genes including 48 tRNA and 10 rRNAs spread across 84 contigs. The genomic DNA G + C content was 71.77 mol%. The calculated dDDH value was 31.10% and the OrthoANI value was 85.27% when compared with its closest related type strain, Rothia halotolerans YIM 90716T. The predominant cellular fatty acids were C16:0 iso, C15:0 anteiso and C17:0 anteiso. The major polar lipids of strain AR01T include diphosphatidylglycerol and phosphatidylglycerol. The distinct physiological and biochemical characteristics and the genotypic relatedness indicated that AR01T represents a novel species of the genus Rothia, for which the name Rothia santali sp. nov. (type strain AR01T = MCC 4800T = JCM 35593T) is proposed.


Introduction
The genus Rothia was initially classified as Nocardia (Onishi 1949) and was later renamed Rothia (Georg and Brown 1967). Rothia was originally placed in the family Actinomycetaceae and was then transferred to the family Micrococcaceae by Stackebrandt et al. (1997). They described cells of Rothia as usually coccoid and Gram-stain-positive, forming raised, smooth colonies. The cell wall mostly comprises peptidoglycan of type A3α and contains alanine, glutamic acid and lysine. Accurate identification of Rothia by conventional methods may not be straightforward because of its high similarity to genera such as Kocuria and Nocardia. In such cases, a polyphasic approach to taxonomic characterization is employed for more reliable and accurate identification of a novel species.

Communicated by Erko Stackebrandt.
The GenBank/EMBL/DDBJ accession number for the reference 16S rRNA gene sequence of strain AR01T is OM838448. The accession number of the whole genome of AR01T is JANAFB000000000.

Isolation and culture conditions
Strain AR01T was isolated from the hypocotyl tissue of sandalwood seedlings showing symptoms of stunted and abnormal growth, collected from an insect-free sandalwood nursery at the Indian Wood Science and Technology (IWST) institute located in Bangalore, India. The sandalwood seedling was surface-sterilized by washing with sodium dodecyl sulphate (L3771; Sigma Aldrich, Germany) and processed to obtain endophytic microbes.

Morphological, physiological and biochemical characteristics
The Gram-staining reaction of AR01T was determined by observing a culture smear under a light microscope (Model BX53; Olympus, USA) using a Gram staining kit (K001-KT; HiMedia, India). The size and shape of the cells were determined with a scanning electron microscope (Carl Zeiss, EVO 18, Version 6.02). Before imaging, the bacterial samples were fixed in 2.5% glutaraldehyde solution (G5882; Merck, Germany) and dehydrated through a graded alcohol series, once each in 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% and twice in 100%, for 10 min each (Kannan 2018). The dehydrated samples were coated with gold particles using a sputter coater (model SC7620; Quorum Technologies, UK). For media optimization, cells of strain AR01T were grown on plates of different media, including R2A, LA, NA and TSA, and incubated at 28 °C for 24 h. Physiological characteristics (temperature, pH and salt tolerance) were optimized in LB over 7 days each. Cell growth was tested at temperatures from 5 to 40 °C at intervals of 5 °C on LB medium. The ability of the cells to sustain growth at different pH values was checked over the range pH 4-12. Buffer systems were prepared as follows: acetate buffer (pH 4-5), phosphate buffer (pH 6-7) and bicarbonate-carbonate buffer (pH 8-12). Salt tolerance was checked by inoculating the culture into LB with NaCl concentrations from 0% to 5% in increments of 0.5% and examining growth at 28 °C for 7 days.
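The growth-test design described above can be written out as an explicit list of conditions. The following is a minimal sketch only: the function names are hypothetical, and the one-unit pH step is an assumption, since the text gives the pH range and buffer systems but not the interval.

```python
def growth_conditions():
    """Enumerate the temperature, pH and NaCl levels described in the text."""
    temperatures_c = list(range(5, 45, 5))         # 5 to 40 °C in 5 °C steps
    ph_values = list(range(4, 13))                 # pH 4 to 12; unit steps are an assumption
    nacl_percent = [i * 0.5 for i in range(11)]    # 0% to 5% NaCl in 0.5% increments
    return temperatures_c, ph_values, nacl_percent


def buffer_for_ph(ph):
    """Return the buffer system named in the text for a given pH value."""
    if 4 <= ph <= 5:
        return "acetate"
    if 6 <= ph <= 7:
        return "phosphate"
    if 8 <= ph <= 12:
        return "bicarbonate-carbonate"
    raise ValueError(f"pH {ph} is outside the tested range 4-12")


if __name__ == "__main__":
    temperatures_c, ph_values, nacl_percent = growth_conditions()
    for ph in ph_values:
        print(ph, buffer_for_ph(ph))
```

Listing the conditions this way makes it easy to check that every pH level falls under exactly one buffer system before the plates are prepared.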
The ability of AR01T to produce spores was examined by the Schaeffer-Fulton staining technique (Schaeffer and Fulton 1933) with methylene blue as the primary stain and 0.5% safranin as the secondary stain. Cell motility was checked by stabbing the culture into a motility test medium (M260; HiMedia, India). The oxidase and catalase activities of strain AR01T were checked using oxidase discs (DD018; HiMedia, India) and 3% (v/v) hydrogen peroxide (31642; Sigma Aldrich, Germany), respectively.
Utilization of carbon sources, assimilation and additional biochemical characteristics were determined using the API 20 NE (bioMérieux, France) and GEN III MicroPlate (BIOLOG, USA) assays according to the manufacturers' standard instructions. Enzyme activity of strain AR01T against different substrates was tested using API ZYM (bioMérieux, France). The type strain R. halotolerans YIM 90716T (KCTC 19172) was used as a reference strain for all the tests mentioned above.

Chemotaxonomic characterization
Cellular fatty acids of strain AR01T were analyzed with the Sherlock™ Microbial ID System (MIDI, Version 6.1; USA) using the RTSBA6 library as per the manufacturer's instructions (Sasser 2001). Cell mass of strain AR01T and R. halotolerans YIM 90716T (KCTC 19172) was harvested after growth on TSA medium at pH 7.3 ± 0.2 for 48 h. For polar lipid analysis, cell mass was harvested from cultures in the logarithmic phase. Methanol/chloroform/0.3% sodium chloride (2:1:0.8, by vol.) was used for the extraction of polar lipids (Bligh and Dyer 1959) with the modifications of Card (1973). Two-dimensional chromatography was used for separation, with chloroform-methanol-water (65:25:4, by vol.) in the first dimension and chloroform-acetic acid-methanol-water (40:7.5:6:2, by vol.) in the second dimension, on silica gel TLC plates (Kieselgel 60 F254; Merck) (Minnikin et al. 1984). The plates were dried and sprayed with 5% ethanolic phosphomolybdic acid for visualization of total lipids. Further characterization was done by spraying the plates with ninhydrin (for amino groups), molybdenum blue (for phosphates), Dragendorff reagent (for quaternary nitrogen) or α-naphthol (for sugars) (Card 1973).

Sequencing of the 16S rRNA gene and phylogenetic analysis
Genomic DNA of strain AR01T was extracted using the CTAB DNA extraction protocol (Ausubel 1989). Its quality and quantity were checked using a NanoDrop One UV spectrophotometer (Thermo Fisher Scientific, USA) and a Qubit fluorometer (Thermo Fisher Scientific, USA). The partial 16S rRNA gene was amplified using the bacterial universal primers 27F and 1492R. The purified PCR products were further sequenced using primers 343R, 704F, 907R and 1028F (Baker et al. 2003). The obtained 16S rRNA gene sequence (OM838448, 1485 bp) of AR01T was compared with the closest known sequences with valid names only, according to the EzBioCloud database (Yoon et al. 2017). Phylogenetic trees were constructed using the neighbour-joining (NJ), maximum-likelihood (ML) and maximum-parsimony (MP) methods to infer the position of AR01T and its closest known relatives, using the MEGA7 software (Kumar et al. 2016) with 1000 bootstrap replications. The type strain R. halotolerans YIM 90716T (KCTC 19172), the closest phylogenetic relative, was procured from its culture collection for comparative polyphasic characterization and maintained on LA medium at pH 7 and 28 °C.

Genome sequencing and analysis
The genomic DNA of strain AR01T was sequenced on the Illumina MiSeq and Oxford Nanopore Technology (ONT) platforms to obtain its genome sequence. The Illumina MiSeq raw reads were checked for quality using FastQC v0.10.1 (Brown et al. 2017). The ONT sequencing data were base-called with quality filtering (> Q7) using Guppy v3.5.4. The quality-filtered Illumina MiSeq and ONT reads were assembled using SPAdes v3.15.3 (Antipov et al. 2016; Prjibelski et al. 2020) to obtain a hybrid genome assembly of strain AR01T. The quality and completeness of the assembled genome were checked using QUAST v5.0.2 (Gurevich et al. 2013) and CheckM v1.1.3 (Parks et al. 2015). The rRNA and tRNA screening was done by running RNAmmer (version 1.2) as described by Lagesen et al. (2007). The DNA G + C content of strain AR01T was calculated. Digital DNA-DNA hybridization (dDDH) values were calculated using the Genome-to-Genome Distance Calculator (GGDC) web server with the recommended formula 2, and average nucleotide identity (ANI) was calculated with the OrthoANI calculator (Meier-Kolthoff et al. 2014; Auch et al. 2010; Yoon et al. 2017). Further, the genome of strain AR01T was annotated with the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (Zhao et al. 2012). The functional groups of strain AR01T were analyzed using Clusters of Orthologous Groups (COG) in eggNOG-mapper v2 (Cantalapiedra et al. 2021) from the PGAP-annotated protein sequences. A comparison of orthologous gene clusters amongst the genomes under study was made using OrthoVenn2 (Xu et al. 2019).
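The text does not state which tool was used to calculate the DNA G + C content. As an illustration only, the value can be computed directly from the assembled contigs as the fraction of G and C among unambiguous bases. The function names and the toy FASTA input below are hypothetical and are not taken from the AR01T assembly.

```python
def parse_fasta(text):
    """Return a dict mapping each record name to its full sequence."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # keep the identifier, drop the description
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def gc_content_percent(sequences):
    """G + C content (mol%) over all contigs, ignoring ambiguous bases such as N."""
    gc = at = 0
    for seq in sequences:
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy two-contig input for illustration; a real run would read the assembly file.
    toy = ">contig_1 toy\nGGCC\nAT\n>contig_2\nGCGCNN\n"
    contigs = parse_fasta(toy)
    print(round(gc_content_percent(contigs.values()), 2))
```

Pooling the counts over all contigs, rather than averaging per-contig percentages, weights each contig by its length, which is the usual convention for a draft genome.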
To validate and confirm the taxonomic position of the new strain AR01T, phylogenetic trees based on the core genome were constructed using BPGA, version 1.3.0 (Chaudhari et al. 2016) and UBCG, version 3.0 (Na et al. 2018) for 10 type strains phylogenetically related to strain AR01T. The BPGA pipeline generated orthologous protein clusters using the integrated USEARCH algorithm (Edgar 2010). The resulting protein sequences were then aligned and concatenated. Finally, the phylogenetic tree was reconstructed from the BPGA concatenated sequences by the neighbour-joining method in MEGA 7. A total of 18,883 amino acid positions were included in the final phylogenetic tree. In addition, UBCG identified the bacterial core genes of strain AR01T and its closely related species. The identified genes were concatenated, aligned and then processed for reconstructing the phylogenetic tree. The length of the concatenated alignment was 93,480 bp, containing 92 marker genes identified using HMMER (Potter et al. 2018) and predicted using Prodigal (Hyatt et al. 2010).
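The concatenation step used by both pipelines can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the BPGA or UBCG implementation: the function name is hypothetical, each per-gene alignment is assumed to be a mapping from taxon to aligned sequence, and taxa missing from a gene are padded with gaps.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments into one supermatrix, padding missing genes with gaps."""
    supermatrix = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon lacking this gene contributes a run of gap characters.
            supermatrix[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


if __name__ == "__main__":
    # Toy data: two genes, with the second gene missing from taxon "B".
    genes = [{"A": "ATG", "B": "ATA"}, {"A": "CC"}]
    print(concatenate_alignments(genes, ["A", "B"]))
```

Every taxon ends up with a sequence of the same total length, which is what tree-building software expects from a concatenated alignment.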

Phylogenetic and genotypic analysis
The 16S (Fig. 1). Genome sequencing of strain AR01T generated 3,375,449 reads from the Illumina MiSeq and 3,399,524 reads from the ONT platform. A sequencing depth of 80× was obtained, and the genome was 99.01% complete, comprising 84 scaffolds with the largest contig of 220,631 bp. The DNA G + C content was 71.77 mol%. The final genome assembly of strain AR01T was deposited in NCBI GenBank under accession number JANAFB000000000. The digital DNA-DNA hybridization (dDDH) value for strain AR01T against R. halotolerans YIM 90716T (KCTC 19172) was 31.10% and the OrthoANI value was 85.27%. A phylogenetic tree constructed from the core genome using BPGA and UBCG further supported the phylogeny derived from the 16S rRNA gene sequences (Fig. 2).
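The reported dDDH and OrthoANI values can be compared with the cut-offs commonly used for delineating prokaryotic species, namely about 70% for dDDH and 95-96% for ANI. These thresholds are general conventions and are not stated in the text above; the function name below is hypothetical, and the sketch uses the lower end of the ANI range.

```python
DDDH_SPECIES_CUTOFF = 70.0   # percent; widely used convention (assumption, not from the text)
ANI_SPECIES_CUTOFF = 95.0    # percent; lower end of the commonly cited 95-96% range (assumption)


def supports_novel_species(dddh_percent, ani_percent):
    """Return True when both values fall below the species-level cut-offs."""
    return dddh_percent < DDDH_SPECIES_CUTOFF and ani_percent < ANI_SPECIES_CUTOFF


if __name__ == "__main__":
    # Values reported for AR01T against R. halotolerans YIM 90716T.
    print(supports_novel_species(31.10, 85.27))
```

Both reported values lie well below these cut-offs, which is consistent with the proposal of AR01T as a separate species.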
A total of 2993 protein-coding genes were predicted including 48 tRNA and 10 rRNA. A total of 1670 genes had specific functional distributions according to the COG categories. The functional genes were assigned to 18 functional categories and 1 category (458 genes) with unknown function. The Venn diagram represents 1415 gene clusters shared by AR01T alone with its closely related type strains while 207 gene clusters were shared by all strains (Fig. 3a). The genes responsible for amino acid transport and metabolism (224 genes), transcription (199 genes) and translation, ribosomal structure and biogenesis (154 genes) were the most abundant, followed by genes responsible for inorganic ion transport and metabolism (139 genes) and energy production and conversion (136 genes) (Fig. 3b).

Morphological, physiological and biochemical characteristics
Colonies of strain AR01T were mustard yellow-pigmented and smooth, with an entire margin; cells were non-endospore-forming and non-motile. The cells of strain AR01T were Gram-positive and coccoid-shaped, 0.8 to 1 μm in diameter, as seen in scanning electron micrographs. Strain AR01T grew optimally on LA at 25-30 °C (optimum 28 °C) within 2 days. The cells of AR01T were able to grow in a salinity range of 0.0% to 5.0% at a neutral pH of 7.0. The new strain was also able to grow at a high pH of 10.0 (Table 1). Strain AR01T was found to be catalase-positive and oxidase-negative. In the API 20 NE test, strain AR01T was negative for nitrate reduction and could not hydrolyze esculin (β-glucosidase), whereas R. halotolerans YIM 90716T (KCTC 19172) showed positive results. Assimilation of L-arabinose, D-mannose, N-acetyl-glucosamine and potassium gluconate was not detected in strain KCTC 19172 but was detected in AR01T. The API ZYM test showed positive alkaline phosphatase and α-glucosidase activity for strain AR01T and negative activity for KCTC 19172. However, esterase activity was positive for strain KCTC 19172 but not for strain AR01T. In the BIOLOG GEN III MicroPlate assays, strains AR01T and YIM 90716T (KCTC 19172) utilised various substrates; all related data, including the morphological, physiological and biochemical characteristics of strains AR01T and YIM 90716T (KCTC 19172), are given in Table 1 and supplementary table S1.

Fig. 2 A combined pan-genome phylogenetic tree of strain AR01T, derived from orthologous protein and gene sequences, constructed using the BPGA and UBCG tools with the available genome sequences. No genome sequences are available for the type strains of "Rothia arfidiae" SMC-2244T, "Rothia nasisuis" 1a5R-CH16T and "Rothia marina" JSM 078151T. The BPGA- and UBCG-based phylogenetic trees were reconstructed in MEGA 7 using the neighbour-joining and maximum-likelihood methods, respectively, with 1000 bootstrap replicates. The figures at branch points are bootstrap values observed in the trees obtained using BPGA and UBCG, respectively. The draft genome sequence of Zhihengliuella flava H85-3T (JADOTZ000000000) was used as an outgroup. Bar indicates the number of substitutions per site.