Comparative genomic analysis of the genus Marinomonas and taxonomic study of Marinomonas algarum sp. nov., isolated from red algae Gelidium amansii

Members of the genus Marinomonas are known for their environmental adaptation and metabolic versatility, with abundant proteins associated with antifreeze activity, osmotic pressure resistance, carbohydrase activity and multiple secondary metabolites. Comparative genomic analysis focusing on secondary metabolites and orthologous proteins was conducted with 30 reference genome sequences of the genus Marinomonas. In this study, a Gram-stain-negative, rod-shaped, non-flagellated and strictly aerobic bacterium, designated strain E8T, was isolated from the red alga Gelidium amansii collected from the coast of Weihai, China. Optimal growth of strain E8T was observed at 25–30 °C, pH 6.5–8.0 and 1–3% (w/v) NaCl. The DNA G + C content was 42.8 mol%. The predominant isoprenoid quinone was Q-8 and the major fatty acids were C16:0, summed feature 3 and summed feature 8. The major polar lipids were phosphatidylglycerol (PG) and phosphatidylethanolamine (PE). Based on the data obtained from this polyphasic taxonomic study, strain E8T should be considered a novel species of the genus Marinomonas, for which the name Marinomonas algarum is proposed. The type strain is E8T (= KCTC 92201T = MCCC 1K07070T).

In this study, a novel species of the genus Marinomonas was isolated from the surface of a marine red alga, and a polyphasic taxonomic approach was used to determine the taxonomic position of the novel strain E8T. We also carried out a preliminary analysis of the pan-genome and orthologous proteins of the genus and predicted the biosynthetic gene clusters related to secondary metabolism.

Isolation, cultivation and maintenance
Strain E8T was isolated from the marine red alga Gelidium amansii, collected from the coast of XiaoShi Island, Weihai, China (37.5°N, 122.1°E) during low tide and brought to the laboratory in a cold sterilized chamber. Samples were crushed into small pieces and homogenized. Then 1 g of homogenate was weighed out and mixed thoroughly with 9 ml sterilized seawater to give a 10⁻¹ dilution. After serial dilution to 10⁻² and 10⁻³, 100 µl of each of the three dilutions was spread on marine agar 2216 (MA, Becton Dickinson) and incubated at 25 °C for 5 days. A single colony of strain E8T was obtained in pure culture after transfer to fresh MA by the previously described method (Xu et al. 2020), and the strain was stored at -80 °C in sterile 20% (v/v) glycerol supplemented with 1% (v/v) NaCl. The phylogenetically related reference strains Marinomonas communis JCM 20766T and Marinomonas pontica DSM 17793T were obtained from Prof. Du. All sampling and isolation experiments were performed in April 2021.

16S rRNA gene sequence and phylogenetic analysis
To identify the taxonomic position of strain E8T, the 16S rRNA gene was amplified by PCR using the universal primers 27f and 1492r (Liu et al. 2014). The purified PCR products were ligated into the pGM-T vector and transformed into E. coli DH5α cells. Positive clones were selected and sequenced by BGI Co. Ltd (Qingdao, PR China) using the ABI 3730XL system. The 16S rRNA gene sequence of strain E8T was submitted to the GenBank database, and 16S rRNA gene sequence similarities were calculated using the BLAST algorithm at NCBI (https://www.ncbi.nlm.nih.gov) and the EzBioCloud server (http://www.ezbiocloud.net/) (Yoon et al. 2017). MEGA version 7.0 (Kumar et al. 2016) was used to reconstruct phylogenetic trees with the neighbor-joining, maximum-likelihood and maximum-parsimony algorithms. The stability of the topology was assessed by bootstrap analyses based on 1000 replications (Felsenstein 1985).
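The resampling step behind a bootstrap analysis can be illustrated with a short Python sketch. It is not the MEGA implementation: it only shows how one pseudo-replicate alignment is drawn by sampling alignment columns with replacement, and the toy alignment and function name are hypothetical. Tree reconstruction on each replicate is omitted.

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw one bootstrap pseudo-replicate by resampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # The same column indices are applied to every taxon so that sites stay aligned.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


# Hypothetical toy alignment (three taxa, six sites), for illustration only.
toy_alignment = {"taxon_a": "ACGTAC", "taxon_b": "ACGTTC", "taxon_c": "ACCTTC"}

# 1000 replicates, matching the number of replications stated in the text.
replicates = [bootstrap_alignment(toy_alignment, random.Random(seed)) for seed in range(1000)]
```

In a full analysis, a tree would be inferred from each replicate and the bootstrap value of a branch would be the percentage of replicate trees that contain it.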

Genome sequencing and function analysis
The genomic DNA of strain E8T was extracted from a culture grown in MB (Becton Dickinson) for 48 h using a Bacteria DNA Kit (Omega) according to the manufacturer's recommendations, and the purified DNA was sent to Beijing Novogene Bioinformatics Technology Co., Ltd. The sequencing library was prepared using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's recommendations and sequenced using the paired-end 350 bp sequencing protocol on the Illumina PE150 platform. To obtain clean data, the raw sequencing data were processed using readfq (Version 10) and the Illumina base-calling software CASAVA v1.8.2 (http://www.support.illumina.com/) according to the corresponding manual. All good-quality paired reads were assembled using SOAPdenovo (Li et al. 2008). To verify the authenticity of the 16S rRNA gene, the complete 16S rRNA gene sequence retrieved from the draft genome using the RNAmmer 1.2 server (http://www.cbs.dtu.dk/services/RNAmmer/) was compared with the sequence obtained by PCR amplification. The G + C content of the chromosomal DNA was calculated from the genome sequence. Genome components were predicted with the GeneMarkS program (Besemer et al. 2001), tRNAscan-SE (Version 1.3.1) (Lowe and Eddy 1997), RNAmmer (Version 1.2) (Lagesen et al. 2007), the Rfam database (Gardner et al. 2009), the IslandPath-DIMOB program (Hsiao et al. 2003), PHAST (Version 2.3) (Zhou et al. 2011) and CRISPRFinder (Grissa et al. 2007) to identify coding genes, tRNAs, rRNAs, snRNAs, genomic islands, prophages and clustered regularly interspaced short palindromic repeats (CRISPRs), respectively. Then, a whole-genome BLAST search (E value less than 1e-5, minimal alignment length percentage larger than 40%) was performed to predict gene function using Gene Ontology (GO) (Ashburner et al. 2000), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al. 2015), Clusters of Orthologous Groups (COG) (Galperin et al. 2015), the Rapid Annotations using Subsystem Technology (RAST) server (Aziz et al. 2008) and the Transporter Classification Database (TCDB) (Saier et al. 2014). To further examine the taxonomic relationship between strain E8T and other members of the genus Marinomonas, phylogenomic trees based on the bac120 marker set inferred from the genomes via GTDB-Tk (Chaumeil et al. 2020) were reconstructed using FastTree (Price et al. 2010) with JTT + CAT parameters and IQ-TREE (Trifinopoulos et al. 2016) with the LG + F + I + G4 model and 1000 bootstrap replicates.
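The hit-filtering criteria stated above (E value below 1e-5, alignment length percentage above 40%) can be sketched in Python as follows. This is a minimal illustration and not the annotation pipeline used in the study. It assumes a hypothetical tab-separated input with the columns query id, subject id, alignment length, query length and E value, and it interprets the alignment length percentage as alignment length divided by query length. Both points are assumptions.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    align_len: int
    query_len: int
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hits in the assumed five-column layout."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        query, subject, align_len, query_len, evalue = line.split("\t")
        hits.append(Hit(query, subject, int(align_len), int(query_len), float(evalue)))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float = 1e-5, min_align_pct: float = 40.0) -> List[Hit]:
    """Keep hits with E value < max_evalue and alignment percentage > min_align_pct."""
    return [
        h for h in hits
        if h.evalue < max_evalue and 100.0 * h.align_len / h.query_len > min_align_pct
    ]


# Hypothetical example records, for illustration only.
example = [
    "gene1\tK00001\t300\t320\t1e-50",  # passes both criteria
    "gene2\tK00002\t50\t400\t1e-20",   # alignment covers only 12.5% of the query
    "gene3\tK00003\t250\t300\t1e-3",   # E value too high
]
kept = filter_hits(parse_hits(example))
```

Only the first record survives the filter here. The other two each fail one criterion.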

Morphological, physiological, and biochemical analysis
The morphological and physiological features of strain E8T were examined with cells grown on MA at 28 °C for 48 h. Cell morphology, size and the presence of flagella were examined by light microscopy (E600, Nikon) and transmission electron microscopy (JEM-1200; JEOL) at the State Key Laboratory of Bio-Fibers and Eco-Textiles (Qingdao University, China). Gram staining was performed as described previously (Smibert and Krieg 1994). Gliding motility was tested in marine broth 2216 (MB; BD) supplemented with 0.3% agar, and motility was examined using the hanging-drop method according to Bernardet et al. (2002). Cells of strain E8T were incubated on MA for 14 days with or without 0.1% (w/v) KNO3 in an anaerobic jar (10% H2, 10% CO2 and 80% N2) to determine growth under anaerobic conditions. The temperature range and optimum for growth were determined on MA at 4, 10, 15, 20, 25, 28, 30, 33, 37, 40 and 45 °C. The NaCl range for growth was tested in the following medium (per liter: 1 g yeast extract, 5 g peptone, 0.1 g ferric citrate), prepared with artificial seawater (per liter: 3.2 g MgSO4, 2.2 g MgCl2, 1.2 g CaCl2, 0.7 g KCl, 0.2 g NaHCO3) containing 0, 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10% (w/v) NaCl at 28 °C, and colony growth was recorded every 12 h. The pH range for growth was tested in marine broth 2216 (MB; BD) with the addition of appropriate buffers, including MES (pH 5.5 and 6.0), PIPES (pH 6.5 and 7.5), HEPES (pH 7.5 and 8.0), Tricine (pH 8.5) and CAPSO (pH 9.0, 9.5 and 10.0), at a concentration of 20 mM, and bacterial growth was monitored using a spectrophotometer at 600 nm. Oxidase activity was assessed using an oxidase reagent kit (bioMérieux) according to the manufacturer's instructions, and catalase activity was assessed by the production of bubbles after the addition of a drop of 3% (v/v) H2O2.
Nitrate reduction and the degradation of agar, starch, cellulose, alginate, casein and lipids (Tween 20, 40, 60 and 80) were examined according to the methods described by Tindall et al. (2007). Other physiological and biochemical tests were performed using the API 20E, API ZYM and API 50CHB kits (all from bioMérieux, France) and Biolog GEN III MicroPlates according to the manufacturers' instructions, except that the salinity was adjusted to 3%. All tests were carried out at least twice, in parallel with the related type strains. Susceptibility to antibiotics was tested by measuring the diameter of the inhibition zone produced by different antibiotic discs on MA at 28 °C for 3 days. A cell suspension (0.5 McFarland standard) was swabbed over MA to create a uniform lawn before aseptic placement of the antibiotic discs onto the surface. CLSI standards were strictly followed for cultivation and the reading of inhibition zone diameters (CLSI 2012).

Chemotaxonomic characterization analysis
To analyse the cellular fatty acid, respiratory quinone and polar lipid compositions, cells of strain E8T and the two related type strains were harvested from MB after growth at 28 °C for 3 days and freeze-dried. The cellular fatty acids were saponified, methylated and extracted according to the standard protocol of MIDI (Sherlock Microbial Identification System, version 6.0B) and analysed with an Agilent HP6890 gas chromatograph (Sasser 1990). The designations and percentages of the fatty acids were determined with the TSBA40 database using the MIS standard software. The respiratory quinones were extracted from 300 mg of freeze-dried cell material with a chloroform/methanol (2:1, v/v) mixture and separated by TLC (Tindall et al. 2007). The content of each quinone type was subsequently analysed by HPLC according to the method described previously (Kroppenstedt 1982). The polar lipids were extracted according to the procedure described by Minnikin et al. (1984) and separated by two-dimensional silica-gel TLC using chloroform/methanol/water (65:25:4, by vol) for the first dimension and chloroform/methanol/acetic acid/water (80:12:15:4, by vol) for the second dimension. The lipids were stained using 10% molybdatophosphoric acid solution (total lipids), molybdenum blue solution (phosphates), α-naphthol sulfuric solution (carbohydrates) and ninhydrin (amines), respectively. The detailed experimental procedure followed the method described by Tindall et al. (2007).

Comparative genome analysis and pan-genome analysis
Comparative genomics of strain E8T and other species of the genus Marinomonas was performed using 29 reference genomes available in the GenBank database. The general features of the genomes were obtained from the results of the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). To determine the genomic distance between genomes, the Mash distance was computed using Mash (Ondov et al. 2016). Pairwise whole-genome average nucleotide identity (ANI) and average amino acid identity (AAI) values were calculated according to Konstantinidis and Tiedje using the enveomics scripts (http://enve-omics.gatech.edu/). The digital DNA-DNA hybridization (dDDH) values were determined using the Genome-to-Genome Distance Calculator (GGDC 2.1) (http://ggdc.dsmz.de).
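The Mash distance is derived from the Jaccard index j of the k-mer sets of two genomes as D = -(1/k) ln(2j / (1 + j)) (Ondov et al. 2016). The Python sketch below illustrates this relationship under simplifying assumptions: it computes the exact Jaccard index of complete k-mer sets rather than the MinHash sketch estimate that Mash uses, it ignores reverse complements, and the toy sequences and small k are hypothetical. The default k = 21 is the Mash default. The k-mer size used in the study is not stated.

```python
import math
from typing import Set


def kmer_set(sequence: str, k: int) -> Set[str]:
    """Return all distinct k-mers of a sequence (forward strand only)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard index of two k-mer sets (Mash estimates this from sketches)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j: float, k: int = 21) -> float:
    """Convert a Jaccard index into a Mash distance, capped at 1."""
    if not 0.0 <= j <= 1.0:
        raise ValueError("Jaccard index must lie in [0, 1]")
    if j == 0.0:
        return 1.0  # no shared k-mers: maximal distance
    return min(1.0, -math.log(2.0 * j / (1.0 + j)) / k)


# Hypothetical toy sequences with k = 4, for illustration only.
seq_a = "ACGTACGTTGCA"
seq_b = "ACGTACGATGCA"
toy_distance = mash_distance(jaccard(kmer_set(seq_a, 4), kmer_set(seq_b, 4)), k=4)
```

Identical k-mer sets give a distance of 0 and disjoint sets give 1, so smaller values indicate more closely related genomes.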
The protein sequences were clustered and compared using CD-HIT (Li and Godzik 2006) and aligned using MAFFT (Katoh et al. 2002). The protein alignments were converted into the corresponding codon alignments and trimmed automatically using trimAl (Capella-Gutiérrez et al. 2009). Then, the single-copy protein tree was constructed with FastTree (Price et al. 2010). To identify conserved single-copy gene families, the genomes of strain E8T and the 29 reference strains were analysed using OrthoFinder (Emms and Kelly 2019) with default parameters, and a phylogenetic tree of single-copy orthologue proteins was constructed with FastTree (Price et al. 2010).
The pan-genome analysis was conducted using BPGA (Bacterial Pan Genome Analysis tool) (Chaudhari et al. 2016) to identify the core, accessory and unique genes. The pipeline generated a pan-genome phylogenetic tree based on pan-matrix data, without an outgroup, to provide further evidence of the relationships among Marinomonas species. Furthermore, the OrthoVenn2 web server (Xu et al. 2019) was used to analyse the protein sequences for comparison and annotation of the orthologous clusters of strain E8T and selected type strains. To further compare genomic functions and metabolism, Prokka (Seemann 2014) was used to annotate all 30 reference genome sequences. The COG database (Galperin et al. 2015) was used to determine the clusters of orthologous groups of proteins and to predict their functions, and the results for each strain were clustered into a relative abundance heatmap. COG annotations with an E value below 10⁻⁵ were retained. The genomes of the reference strains were analysed using antiSMASH version 6.0 (Blin et al. 2021) with the "strict" detection criteria enabled and default parameters to predict the biosynthetic gene clusters (BGCs). The BGC predictions were plotted as a stacked bar graph depicting the number of each BGC type using the R package "ggplot2" (Villanueva and Chen 2019). The BAGEL4 web server (van Heel et al. 2018) was used to further mine potential RiPPs and bacteriocins across all reference strains of the genus Marinomonas.
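The partition of gene families into core, accessory and unique genes can be sketched in Python from a presence/absence table. This is a minimal illustration rather than the BPGA implementation, and the toy data are hypothetical. It assumes that a core family is present in every genome, a unique family in exactly one genome, and an accessory family in any other number of genomes, and that an "exclusively absent" gene is a family missing from the focal genome but present in all others.

```python
from typing import Dict, List, Set


def classify_for_strain(presence: Dict[str, Set[str]], genomes: List[str], strain: str) -> Dict[str, int]:
    """Count core, accessory, unique and exclusively absent families for one strain.

    presence maps each gene family to the set of genomes that carry it.
    """
    n = len(genomes)
    counts = {"core": 0, "accessory": 0, "unique": 0, "exclusively_absent": 0}
    for members in presence.values():
        k = len(members)
        if strain in members:
            if k == n:
                counts["core"] += 1        # present in every genome
            elif k == 1:
                counts["unique"] += 1      # present only in the focal strain
            else:
                counts["accessory"] += 1   # shared with some, but not all, genomes
        elif k == n - 1:
            counts["exclusively_absent"] += 1  # missing only from the focal strain
    return counts


# Hypothetical toy data: three genomes and five gene families.
genomes = ["A", "B", "C"]
presence = {
    "fam1": {"A", "B", "C"},
    "fam2": {"A", "B"},
    "fam3": {"A"},
    "fam4": {"B", "C"},
    "fam5": {"C"},
}
counts_a = classify_for_strain(presence, genomes, "A")
```

For genome A the toy data give one family in each of the four categories.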

Phylogenetic and phylogenomic analyses
A nearly full-length 16S rRNA gene sequence (1425 bp) of strain E8T obtained by PCR amplification was contained within the 16S rRNA gene sequence assembled from the genome, which contained only one complete 16S rRNA gene. The BLAST search at NCBI revealed that strain E8T exhibited the highest similarities to M. vulgaris A79T (98.4%) and M. pontica 46-16T (97.5%). The similarity between strain E8T and Marinomonas communis JCM 20766T, the type species of the genus Marinomonas, was 94.9%. In the maximum-likelihood phylogenetic tree based on 16S rRNA gene sequences, strain E8T clustered with M. vulgaris A79T and formed a separate branch (Fig. 1). The neighbour-joining and maximum-parsimony trees also supported this branch with high bootstrap values (Fig. S2 and Fig. S3). In addition, the phylogenomic tree based on the genome sequences reconstructed with IQ-TREE showed that the clade formed by strain E8T and M. vulgaris A79T could be distinguished from other members of the genus, confirming that strain E8T is a member of the genus Marinomonas (Fig. 2).

General genomic features
The draft genome of strain E8T was assembled into 79 contigs with a total length of 3,285,337 bp, a contig N50 value of 126,031 bp, a contig L50 value of 8 and a mean coverage of 150×. The longest and shortest contigs were 379,893 bp and 618 bp, respectively. The DNA G + C content was 45.1 mol%, which was in the middle of the range of most of the related species. A total of 3085 genes were predicted, comprising 2988 protein-coding genes and 97 RNA genes, including 8 5S rRNAs, 4 16S rRNAs, 4 23S rRNAs, 77 tRNAs and 4 ncRNAs. The genome sequence of strain E8T included 31 long interspersed nuclear elements (LINEs), 23 short interspersed nuclear elements (SINEs), 4 genomic islands (GIs) and 2 CRISPR-associated genes. Further comparative general features of the 30 genome sequences are shown in Table S2.
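For readers unfamiliar with the assembly statistics reported above, the N50 and L50 values can be computed from a list of contig lengths as in the following Python sketch. The contig lengths in the example are hypothetical and are not those of the E8T assembly.

```python
from typing import List, Tuple


def n50_l50(contig_lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the cumulative length, counted
    from the longest contig downwards, first reaches half of the assembly.
    L50 is the number of contigs needed to reach that point.
    """
    if not contig_lengths:
        raise ValueError("at least one contig is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: cumulative length always reaches half the total")


# Hypothetical contig lengths (bp), for illustration only.
toy_n50, toy_l50 = n50_l50([80, 70, 50, 40, 30, 20])
```

In the toy example the two longest contigs (80 + 70 = 150 bp) already cover half of the 290 bp assembly, so the N50 is 70 bp and the L50 is 2.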
According to the predicted genomic functions, strain E8T was annotated with complete glycolysis, gluconeogenesis, citrate cycle (TCA cycle) and pentose phosphate pathways for central carbohydrate metabolism, as well as UDP-N-acetyl-D-glucosamine biosynthesis, as were M. vulgaris A79T and M. communis JCM 20766T. Succinate dehydrogenase, cytochrome bd ubiquinol oxidase, cytochrome o ubiquinol oxidase and F-type ATPase were predicted in strain E8T and M. vulgaris A79T, but both lacked the cytochrome c oxidase and cytochrome bc1 complex respiratory units, which were present in the JCM 20766T genome. The ackA gene was absent from the genomes of strains E8T and M. vulgaris A79T, which are therefore predicted to be unable to convert acetyl-CoA into acetate, a carbon fixation pathway that was complete in the JCM 20766T genome. Sulfate reduction ability was predicted in M. vulgaris A79T, which carries the cysNC, cysN, cysD, cysC, cysH, cysJ and cysI genes to convert sulfate into sulfide, but the cysC gene was absent from the genome of strain E8T, which may only convert sulfite into sulfide. Strain E8T was predicted to metabolise many amino acids, including serine, threonine, valine, isoleucine, leucine, lysine, ornithine, arginine, proline and tryptophan. For the metabolism of cofactors and vitamins, the biosynthesis pathways of pyridoxal-P (EC 1.4.3.5), NAD (EC 6.3.5.1), pantothenate (EC 6.3.2.1), coenzyme A (EC 6.3.2.5), biotin (EC 2.8.1.6), lipoic acid (EC 2.8.1.8), molybdenum cofactor (EC 2.10.1.1), siroheme (EC 4.99.1.4), heme (EC 1.3.5.3) and cobalamin (EC 6.3.1.10) were complete in the genome sequence of strain E8T. The genomes of the three strains contain several genes responsible for choline uptake and betaine biosynthesis as osmoprotectants, including choline dehydrogenase (betA), betaine-aldehyde dehydrogenase (betB) and glycine betaine ABC transport system permease (ProU). All genes for the synthesis of PG and PE were found in the genome of strain E8T, as in other strains of the genus Marinomonas.

Fig. 2 Phylogenomic tree reconstructed by IQ-TREE showing the position of strain E8T and related strains of the family Oceanospirillaceae with reference genome sequences. GenBank accession numbers are given in parentheses. Bootstrap values (expressed as percentages of 1000 replications) above 70% are shown at branch points. Zobellella endophytica 59N8T (GenBank accession number PXYG00000000) was used as an outgroup. Bar, 0.2 substitutions per nucleotide position

Chemotaxonomic characterisation
The fatty acid profiles of strain E8T and the reference strains M. pontica DSM 17793T and M. communis JCM 20766T are shown in Table S3. The major fatty acids (> 10%) were C16:0, summed feature 3 (C16:1 ω7c and/or C16:1 ω6c) and summed feature 8 (C18:1 ω7c and/or C18:1 ω6c). Summed feature 8 was the most abundant cellular fatty acid of strain E8T (47.8%) and also of the related type strains M. pontica DSM 17793T and M. communis JCM 20766T (35.3 and 45.7%, respectively). Strain E8T could be distinguished from the other type strains by its C17:0 content of 3.1%, compared with less than 0.5% in the other type strains. The proportion of summed feature 8 in strain E8T was considerably higher than that in M. pontica DSM 17793T but similar to that in M. communis JCM 20766T. The polar lipids were mainly composed of phosphatidylglycerol (PG), phosphatidylethanolamine (PE) and three unidentified lipids (L1-L3), a profile similar to that of the other type strains, which also had PG and PE as the major polar lipids (Fig. S4). The isoprenoid quinone detected in strain E8T was Q-8, consistent with the other related type strains.

Comparative genome analysis
The genomic distances between genome sequences showed that strain E8T was closest to M. vulgaris A79T and distant from M. agarivorans QM202T, M. flavescens ANRC-JHZ47T and M. posidonica IVIA-Po-181T (Fig. S5). The ANI and AAI values between strain E8T and M. vulgaris A79T were 78.7 and 84.3%, which are below the recommended cut-off value of 95-96% (Richter and Rossello-Mora 2009) and the proposed cut-off for a species boundary of 85-90% (Qin et al. 2014), respectively. Strain E8T also shared a low dDDH value with M. vulgaris A79T (22.0%), below the 70% threshold for the species boundary (Goris et al. 2007; Meier-Kolthoff et al. 2013). Moreover, the pairwise comparisons of the dDDH values are shown in Table S4 and the pairwise genome comparisons at the nucleotide and translated protein levels (ANI and AAI values) are shown in Fig. 3. Together, these results suggest that strain E8T represents a putative novel species of the genus Marinomonas.
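The species-delineation logic applied above can be written out as a short Python check. The values are those reported in the text for strain E8T versus M. vulgaris A79T. Using the lower bound of each cut-off range (95% for ANI, 85% for AAI) as the decision threshold is an assumption made here for illustration, and the function name is hypothetical.

```python
from typing import Dict

# Thresholds taken from the cut-offs cited in the text; where a range is given,
# its lower bound is used here (an assumption for this sketch).
THRESHOLDS: Dict[str, float] = {"ANI": 95.0, "AAI": 85.0, "dDDH": 70.0}


def below_species_boundary(values: Dict[str, float], thresholds: Dict[str, float] = THRESHOLDS) -> Dict[str, bool]:
    """Report, per metric, whether the value falls below the species boundary."""
    return {metric: values[metric] < cutoff for metric, cutoff in thresholds.items()}


# Values reported in the text (in %).
e8_vs_a79 = {"ANI": 78.7, "AAI": 84.3, "dDDH": 22.0}
result = below_species_boundary(e8_vs_a79)
supports_novel_species = all(result.values())
```

All three metrics fall below their thresholds, which is the basis for treating strain E8T as distinct from its closest relative.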
The single-copy core protein tree showed that strain E8T and M. vulgaris A79T formed a branch close to the trunk of the tree formed by all 30 reference strains (Fig. S6). The strains M. agarivorans QM202T and M. algicola SM1966T were separated from the trunk by long branches, which may indicate rather distant phylogenetic relationships. The phylogenetic tree constructed from single-copy orthologue proteins was consistent with this tree (Fig. S6). In the analysis of single-copy gene families, 104,049 genes were assigned to 6806 orthogroups and 3629 genes remained unassigned, and 1379 orthogroups were present in all 30 species.
The pan-genome orthologous groups (POGs) of the 30 Marinomonas strains indicated that strain E8T had 959 core genes, 1661 accessory genes, 219 unique genes and 10 exclusively absent genes. The core-pan plot showed that the pan-genome of Marinomonas can be considered "open" (Fig. S7). The pipeline generated a pan-genome phylogenetic tree without an outgroup based on pan-matrix data, which highlighted distinct groups that were confirmed in the core-genome tree (Fig. S8, S9). According to the KEGG annotation, the strains of the genus Marinomonas showed activity in amino acid metabolism, carbohydrate metabolism, membrane transport, metabolism of cofactors and vitamins, and xenobiotics biodegradation and metabolism. Unique genes were abundantly distributed among amino acid metabolism, carbohydrate metabolism and xenobiotics biodegradation (Fig. S10). The selected type strains (12 species) associated with marine plants formed 5297 clusters, 3916 of which were orthologous clusters (containing at least two species) and 1381 of which were single-copy gene clusters (Table S5). Five genomes shared 1510 protein-coding regions, and the genome of strain E8T encoded 2904 proteins, of which 2604 were in clusters and 261 were singletons (Fig. 4).
Strain E8T, M. aquimarina CECT 5080T and M. vulgaris A79T showed a low frequency of proteins for carbohydrate transport and metabolism (COG function category G), which may be associated with their oligotrophic environment. Compared with strain A79T, strain E8T showed a relatively high count of proteins in categories J, X and D, which are related to genetic information transmission and transcription (Fig. 5). Proteins relevant to secondary metabolite biosynthesis were present in all strains. According to the antiSMASH annotations, abundant BGCs were predicted in the genome sequences of the 30 strains. The genomes of M. mediterranea MMB-1T, M. spartinae CECT 8886T, M. posidonica IVIA-Po-181T and M. primoryensis MPKMM3633T contained the highest numbers of BGCs, as shown in Fig. 6. In general, the genomes of most of the strains were predicted to encode ectoine, non-ribosomal peptide synthetases (NRPS), betalactone and some other unspecified ribosomally synthesised and post-translationally modified peptide product

Taxonomic conclusion
Phylogenetic analysis based on 16S rRNA gene and genome sequences showed that strain E8T clustered with M. vulgaris A79T and belongs to the genus Marinomonas, which was consistent with the chemotaxonomic results (Table S3). However, the low levels of DNA relatedness shared with closely related species based on digital analyses (Fig. 3 and Table S4) and the phenotypic characteristics (Table 1) distinguished strain E8T from other type strains of the genus Marinomonas. These results demonstrate that strain E8T represents a novel species of the genus Marinomonas, for which the name Marinomonas algarum sp. nov. is proposed.
The GenBank accession number for the 16S rRNA gene sequence is OK444097, and the draft genome sequence of strain E8T has been deposited at GenBank under accession number JAJATW000000000.