Description and genome analysis of Microvirga antarctica sp. nov., a novel pink-pigmented psychrotolerant bacterium isolated from Antarctic soil

A novel pink-pigmented bacterium, designated strain 3D7T, was isolated during an investigation of potential psychrotolerant species from Antarctic soil. Cells of the isolate were rod-shaped (0.7–0.9 × 1.0–2.2 µm), Gram-stain-negative and non-motile. The strain grew at 4–32 °C, at pH 7.0–10.0 and in the presence of 0–3% (w/v) NaCl. Phylogenetic analysis based on 16S rRNA gene sequences showed that strain 3D7T belongs to the genus Microvirga and is most closely related to ‘Microvirga brassicacearum’ CDVBN77T (98.3%), Microvirga subterranea DSM 14364T (96.8%), Microvirga guangxiensis 25BT (96.5%) and Microvirga aerophila DSM 21344T (96.5%). The predominant quinone was ubiquinone 10 (Q-10), and the major fatty acids were summed feature 8 (C18:1ω7c and/or C18:1ω6c) and C19:0 cyclo ω8c. The predominant polar lipids were phosphatidylcholine and phosphatidylethanolamine. The genomic DNA G+C content of strain 3D7T was 63.5 mol%. Its genome sequence contained genes encoding phosphatases and lipases. Genetic machinery related to carbohydrate-active enzymes and secondary metabolites was also observed. The average nucleotide identity and digital DNA–DNA hybridization values based on whole genome sequences of strain 3D7T and its closely related species were below the threshold range for species determination. Phenotypic, chemotaxonomic, phylogenetic and genomic analyses suggested that strain 3D7T represents a novel species of the genus Microvirga, for which the name Microvirga antarctica sp. nov. is proposed. The type strain is 3D7T (= CGMCC 1.13821T = KCTC 72465T).


Introduction
Living in an extremely cold and oligotrophic environment, Antarctic microorganisms have evolved unique physiological and biochemical properties through long-term natural selection (Niederberger et al. 2008). Many strains have the genetic machinery to degrade multiple compounds as a source of nutrients and can produce low-temperature (cold-active) enzymes and antibacterial and anti-cancer active substances (Zhang et al. 2004), which are valuable in many fields, including environmental engineering, agricultural biotechnology, and the pharmaceutical and enzyme industries. Accordingly, we carried out a programme to explore potential sources of psychrotolerant species from Antarctic soil, during which a putatively novel strain (3D7T) of the genus Microvirga was isolated. The genus Microvirga was proposed by Kanso and Patel (2003), with Microvirga subterranea as the type species. It belongs to the family Methylobacteriaceae of the order Rhizobiales. At the time of writing, the genus Microvirga contains 18 species with validly published names listed on LPSN (List of Prokaryotic Names with Standing in Nomenclature: www.bacterio.net). They are distributed widely in various ecological habitats, such as air (Weon et al. 2010), natural, domestic and contaminated soils (Dahal et al. 2017; Tapase et al. 2017; Zhang et al. 2009; Zhang et al. 2019a, b), geothermal water (Kanso et al. 2003), Tibet hot spring sediments (Liu et al. 2020), human stool (Caputo et al. 2016), nodules of native legumes and cowpea (Ardley et al. 2012; Radl et al. 2017; Safronova et al. 2017) and roots of rapeseed plants (Jiménez-Gómez et al. 2019). Most members of the genus Microvirga are moderately thermophilic. Some studies have reported the genetic potential of strains classified within the genus Microvirga for arsenic oxidation (Tapase et al. 2017) and for the production of pigments, amylolytic enzymes (Radl et al. 2017), phosphatases and exopolysaccharides (Jiménez-Gómez et al. 2019).
Based on polyphasic taxonomic characterisation, we propose the description of Microvirga antarctica sp. nov., classified as a novel psychrotolerant member of the genus Microvirga with phosphatase and lipase activities. Moreover, analysis of the sequenced genome of M. antarctica strain 3D7T revealed genes encoding proteins with potential biotechnological or industrial applications.

Materials and methods
Isolation and culture conditions
Strain 3D7T was isolated from a soil sample collected from the surface of Deception Island (62°55′09″ S, 60°34′46″ W), Antarctica. The collected soil sample (0.9 g) was suspended in 8.1 mL sterile water and stirred for 30 min to give a 10⁻¹ dilution, which was then serially diluted to 10⁻². A 150 µL aliquot of the 10⁻² dilution was spread on Reasoner's 2A agar (R2A; AOBOX) medium (pH 7.5). After 7 days of incubation at 15 °C, representative colonies were picked and purified by repeated streaking. A pink-coloured isolate, designated strain 3D7T, was selected and further purified by plate streaking. The purified strain was stored in 20% (v/v) glycerol suspensions at −80 °C.
DNA amplification and determination of 16S rRNA gene sequence
The genomic DNA of strain 3D7T was extracted using TIANamp Bacteria DNA kits (TianGen) according to the manufacturer's instructions. The 16S rRNA gene was amplified by PCR with the bacterial universal forward primer 27F and reverse primer 1525R (Li et al. 2006). The PCR products were purified and sequenced by BGI (The Beijing Genomics Institute). The resulting 16S rRNA gene sequence of strain 3D7T was used for similarity searches on the EzBioCloud server (www.ezbiocloud.net/identify) (Yoon et al. 2017a, b). The phylogenetic tree was constructed with the neighbour-joining (NJ) algorithm (Saitou et al. 1987) and supported by the minimum-evolution (ME) (Rzhetsky et al. 1992) and maximum-likelihood (ML) algorithms (Felsenstein 1981) in the MEGA X program (Kumar et al. 2018). Kimura's two-parameter model was used to calculate the evolutionary distances (Kimura 1980). Bootstrap values were determined based on 1000 replications (Felsenstein 1985).
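For readers unfamiliar with Kimura's two-parameter model, the distance it assigns to a pair of aligned sequences can be sketched as follows. This is only an illustrative re-implementation, not the MEGA X code used in this study; the function name `k2p_distance` and the choice to skip gapped or ambiguous sites are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption for this sketch: ignore gaps and ambiguous bases.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A pairwise matrix of such distances is the input that a neighbour-joining or minimum-evolution tree is built from.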

Genome sequencing, assembly and function analysis
The genome of strain 3D7T was sequenced using the Illumina HiSeq system with paired-end sequencing technology. The sequencing data were filtered to remove reads containing adaptor sequences and low-quality data, and the resulting clean data were used for subsequent analysis. The genome was assembled with SOAPdenovo (version 2.04) (Li et al. 2008, 2010). The assembly results were submitted to the NCBI (www.ncbi.nlm.nih.gov). The functions of coding genes in the assembled genome were annotated using Gene Ontology (GO) (Ashburner et al. 2000), Clusters of Orthologous Groups (COG) (Galperin et al. 2015) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Minoru et al. 2016). The carbohydrate-active enzymes (CAZymes) were analysed using HMMER annotation (Zhang et al. 2018), and gene clusters related to secondary metabolite production were analysed using the antiSMASH 5.0 webserver (Blin et al. 2019).

DNA-DNA hybridization and genome-based phylogenetic analysis
The genomic information of related strains of the same genus or neighbouring genera was obtained from the EzTaxon and NCBI databases. A phylogenetic tree based on the whole genome sequences of strain 3D7T and related species was constructed using the Composition Vector (CV) approach. Average nucleotide identity (ANI) based on the BLAST algorithm (ANIb) and on the MUMmer ultra-rapid aligning tool (ANIm), as well as the correlation indexes of tetranucleotide signatures (Tetra), were calculated through the JSpeciesWS website (http://jspecies.ribohost.com/jspeciesws/) (Richter et al. 2016). The orthoANIu values were estimated using the EzBioCloud web service (www.ezbiocloud.net/tools/ani) as described by Yoon et al. (2017a, b). Digital DNA-DNA hybridization (dDDH) values were calculated using the Genome-to-Genome Distance Calculator (GGDC; version 2.1) with the recommended Formula 2 (http://ggdc.dsmz.de/distcalc2.php) provided by the DSMZ website (Meier-Kolthoff et al. 2013).

Chemotaxonomic analyses
Cellular fatty acids of strain 3D7T and its reference strains were analysed using colonies grown on R2A medium at 28 °C for 3 days. The fatty acid methyl ester mixtures were separated and analysed using the standard protocol of the Sherlock Microbial Identification System (MIDI Sherlock software package, version 6.0) (Kämpfer et al. 1996; Sasser 1990). For analyses of quinones and polar lipids, cells were collected at the exponential phase by centrifugation, washed three times with sterilized water and freeze-dried. Isoprenoid quinones were extracted and purified by the methods of Collins (1985) and then analysed by reversed-phase HPLC. Polar lipids were extracted from freeze-dried cells and loaded onto thin-layer silica gel 60 plates (Merck). Two-dimensional migration was performed on each plate using chloroform-methanol-water (65:25:4, by vol.) as the first solvent and chloroform-acetic acid-methanol-water (80:15:12:4, by vol.) as the second (Minni et al. 1979; Collins et al. 1980). Total polar lipids were detected by spraying with phosphomolybdic acid solution followed by heating at 110 °C for 10 min. Aminolipids were detected by spraying the plate with a 0.4% (w/v) solution of ninhydrin in butanol saturated with water followed by heating at 105 °C for 10 min. Phospholipids were detected by spraying with the reagent of Dittmer and Lester.

Phylogenetic characteristics
The nearly complete 16S rRNA gene sequence of strain 3D7T (1396 bp) was determined and compared with the corresponding sequences of related taxa. It shared the highest 16S rRNA gene similarity with 'Microvirga brassicacearum' CDVBN77T (98.3%), followed by Microvirga subterranea DSM 14364T (96.8%), Microvirga guangxiensis 25BT (96.5%) and Microvirga aerophila DSM 21344T (96.5%). The NJ analysis (Fig. 1) showed that strain 3D7T shared a branching node with 'M. brassicacearum' CDVBN77T, which was consistent with the ME tree (Fig. S1) and the ML tree (Fig. S2). These results place strain 3D7T within the genus Microvirga.

Genome composition and DNA-DNA hybridisation
The draft genome sequence of strain 3D7T was 4,457,992 bp in length with 31 contigs. The coverage, N50 and DNA G+C content were 410×, 431,466 bp and 63.5 mol%, respectively. The genome data met the proposed minimal standards for the use of genome data for the taxonomy of prokaryotes (Chun et al. 2018). The genome had 4321 protein-coding genes and 49 RNAs (Table 1). Genomic analyses showed that strain 3D7T and 'M. brassicacearum' CDVBN77T yielded ANIb and dDDH values of 77.5% and 22.2%, respectively. The ANI values between strain 3D7T and other species of the genus Microvirga are detailed in Table 1; all were below the standard criterion for classifying strains as different species (95-96%) (Kim et al. 2014). The dDDH values between strain 3D7T and other species of the genus Microvirga are also detailed in Table 1, and they were far below the 70% cut-off value generally recommended for species differentiation (Wayne et al. 1987). A genome-based phylogenetic tree is shown in Fig. S3 and indicated that strain 3D7T was affiliated to the genus Microvirga. These data confirmed that strain 3D7T represents a novel species of the genus Microvirga.
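Two of the quantities above follow simple rules that a short sketch may make concrete: the N50 of an assembly, and the comparison of ANI and dDDH values against the species cut-offs. The code below is a minimal illustration, not the pipeline used in this study. The function names are hypothetical, and the lower bound of the 95-96% ANI range is taken as the cut-off as an assumption.

```python
ANI_SPECIES_CUTOFF = 95.0   # assumed: lower bound of the 95-96% range
DDDH_SPECIES_CUTOFF = 70.0  # cut-off value cited in the text


def n50(contig_lengths):
    """Length of the contig at which the cumulative length, summed from
    the longest contig downwards, first reaches half of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")


def below_species_cutoffs(ani_percent, dddh_percent):
    """True when both values fall below the species-level cut-offs,
    i.e. the two genomes are treated as different species."""
    return (ani_percent < ANI_SPECIES_CUTOFF
            and dddh_percent < DDDH_SPECIES_CUTOFF)


# Values reported above for strain 3D7T versus 'M. brassicacearum' CDVBN77T.
print(below_species_cutoffs(77.5, 22.2))
```

The individual contig lengths of the draft genome are not given here, so `n50` would need the real assembly as input to reproduce the reported 431,466 bp.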

Genome features and function prediction
Gene Ontology database analysis of strain 3D7T reflected a complex metabolic and regulatory network: 1022 genes were related to biological processes, accounting for about 23.7%; 981 genes were related to cell components, accounting for about 22.7%; and 2427 genes were related to molecular functions, accounting for about 56.2% (Fig. S4). Among the 20 general COG functional categories, the most populated were as follows: Amino acid transport and metabolism, 497 genes; Inorganic ion transport and metabolism, 293 genes; Energy production and conversion, 204 genes; Transcription, 202 genes; Carbohydrate transport and metabolism, 200 genes. Detailed information on the COG functional categories is presented in Fig. S5. KEGG metabolic pathways were classified according to the relationship between KO (KEGG Orthology) and Pathway. Functional annotation of genes by comparison against the manually curated KEGG GENES database revealed 62 genes related to the biosynthesis of other secondary metabolites, 122 genes related to the biodegradation and metabolism of xenobiotics, 374 genes related to the metabolism of carbohydrates and 45 genes related to the metabolism of terpenoids and polyketides (Fig. S6). The genome annotations showed genes encoding proteins with phosphatase activity, such as alkaline phosphatase (EC 3.1.3.1), acid phosphatase (EC 3.1.3.2), inorganic triphosphatase (EC 3.6.1.25) and pyrophosphatase (EC 3.6.1.1). Genes for some enzymes related to triglyceride lipase production, including lysophospholipase (EC 3.1.1.5) and an unidentified phospholipase, were also observed. These capabilities were tentatively supported by physiological tests and have potential applications in agricultural biotechnology, the washing industry and low-temperature environmental remediation.
Analysis of the genome sequence of strain 3D7T showed 132 genes encoding different CAZymes in six different classes, including: glycoside hydrolases (GHs), enzymes that catalyze the hydrolysis of glycosidic linkages (27 genes); glycosyltransferases (GTs), involved in the formation of glycosidic bonds (47 genes); carbohydrate esterases (CEs), which hydrolyze carbohydrate esters (35 genes); auxiliary activities (AAs), redox enzymes that act in conjunction with CAZymes (20 genes); and polysaccharide lyases (PLs), which perform non-hydrolytic cleavage of glycosidic bonds (2 genes) (Table S1). AntiSMASH output revealed four biosynthetic gene clusters (BGCs) involved in the secondary metabolism of the bacterium. One of these clusters is a terpene BGC related to the synthesis of isoindolinomycin. The other clusters are an arylpolyene, a homoserine lactone and a second terpene BGC, none of which has been linked to the production of a known molecule. These genetic characteristics indicated that strain 3D7T may have biotechnological potential for biomass degradation and for the pharmaceutical industry.
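The Gene Ontology percentages quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes that the denominator is the 4321 protein-coding genes reported for the genome; the text does not state this explicitly, although the resulting figures agree with it. The helper name `percent_of_genes` is hypothetical.

```python
PROTEIN_CODING_GENES = 4321  # assumed denominator, from the genome summary

go_gene_counts = {
    "biological process": 1022,
    "cellular component": 981,
    "molecular function": 2427,
}


def percent_of_genes(count, total=PROTEIN_CODING_GENES):
    """Share of annotated genes as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


for category, count in go_gene_counts.items():
    print(f"{category}: {count} genes, {percent_of_genes(count)}%")
```

The three shares add up to slightly more than 100%, presumably because a single gene can carry GO terms from more than one category.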

Chemotaxonomic characteristics
The major cellular fatty acids of strain 3D7T (> 10%) were summed feature 8 (C18:1ω7c and/or C18:1ω6c) (36.2%) and C19:0 cyclo ω8c (21.7%), a profile similar to those of closely related species of the genus Microvirga. Minor qualitative and quantitative differences could be used to distinguish strain 3D7T from its closest relatives in the genus Microvirga. Compared with 'M. brassicacearum' CDVBN77T, strain 3D7T possessed higher amounts of C16:0, summed feature 2 (C14:0 3-OH and/or iso-C16:1 I) and summed feature 3 (C16:1ω6c and/or C16:1ω7c), and lower amounts of C18:0, C18:0 3-OH, C18:1ω7c 11-methyl and summed feature 8 (C18:1ω7c and/or C18:1ω6c). C14:0 and C17:0 cyclo were detected in strain 3D7T but not in 'M. brassicacearum' CDVBN77T (Table 3). The predominant respiratory quinone of strain 3D7T was Q-10, in good agreement with other species of the genus Microvirga. The polar lipids of strain 3D7T consisted of phosphatidylcholine and phosphatidylethanolamine as the major components, plus one unidentified aminophospholipid, two unidentified aminolipids and three unidentified lipids (Fig. S8). Strain 3D7T shared the same major polar lipids with most of the described species of the genus Microvirga.
In conclusion, all phenotypic, chemotaxonomic, phylogenetic and genomic analyses suggested that strain 3D7T should be considered to represent a novel species of the genus Microvirga, for which the name Microvirga antarctica sp. nov. is proposed.
Description of Microvirga antarctica sp. nov.
Microvirga antarctica (ant.arc'ti.ca. L. fem. adj. antarctica southern, pertaining to Antarctica, where the type strain was isolated).
The type strain, 3D7T (= CGMCC 1.13821T = KCTC 72465T), was isolated from a soil sample collected from Deception Island, Antarctica. The GenBank/EMBL/DDBJ accession number for the 16S rRNA gene sequence of strain 3D7T is MH561859. The Whole Genome Shotgun project of strain 3D7T has been deposited at DDBJ/ENA/GenBank under the accession number JAGEMM000000000.
Author's contributions ZJL and ZL designed research and project outline. ZL and ZY performed isolation, deposition and polyphasic taxonomy. ZL, CY and PWW performed genome analysis. ZL, PWW and ZSY drafted the manuscript. ZJL revised the manuscript. All authors read and approved the final manuscript.
Funding This research was supported by the National Key Research and Development Program of China (2016YFC0501302) and by the National Natural Science Foundation of China (NSFC, Grant No. 31070002).

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Summed features are fatty acids that cannot be resolved reliably from another fatty acid using the chromatographic conditions chosen. The MIDI system groups these fatty acids together as one feature with a single percentage of the total. Summed feature 2 comprised C14:0 3-OH and/or iso-C16:1 I; summed feature 3 contained C16:1ω7c and/or C16:1ω6c; summed feature 8 contained C18:1ω7c and/or C18:1ω6c.