Comparative Genome Analysis of Carbohydrate-Active Enzymes and Virulence Factors in Lichen-Associated Variovorax Sp. PAMC28711


Background: Variovorax sp. PAMC28711 is a cold-adapted microorganism isolated from the Antarctic lichen Himantormia. The complete genomes of six Variovorax strains, including PAMC28711, were analyzed and compared. The genomic information was collected from the NCBI and PATRIC databases. CAZyme annotation (dbCAN2 meta server) was performed to predict the CAZyme families responsible for trehalose synthesis and trehalose degradation. The trehalose metabolic pathway was analyzed via the KEGG database. OrthoANI software was used to analyze similar genes in different strains of the same genus, and MEGA X was used for evolutionary analysis of conserved genes. Results: The complete genome of V. sp. PAMC28711 was found to comprise the CAZyme families GH (10), GT (9), CBM (1), AA5 (1), and CE (1). The three trehalose synthetic pathways (OtsA/OtsB, TS, and TreY/TreZ) together with the trehalose degradation pathway (TreF) were identified only in V. sp. PAMC28711 among the Variovorax strains studied, whereas the other strains carry one or two trehalose biosynthesis pathways and no trehalose degradation pathway. Strain PAMC28711 carries a cytoplasmic trehalase (TreF) as its trehalose-degrading enzyme; it belongs to the CAZyme family GH37, which was not identified in the other Variovorax strains. Conclusions: To date, strain PAMC28711 has not been reported to exhibit CAZyme activities such as trehalase or to express virulence factors; nevertheless, the results based on the PATRIC database showed that the strain carries a few virulence genes. Further, this study provides additional information regarding trehalase as one of the factors facilitating bacterial survival under extreme environments, and this enzyme has shown potential for application in biotechnology.


Background
The genus Variovorax comprises Gram-negative, motile bacteria belonging to the family Comamonadaceae [1], with cells that are straight to slightly curved rods. Due to the presence of carotenoid pigments, Variovorax forms yellow, slimy, and shiny colonies. Many strains belonging to the family Comamonadaceae thrive in polluted environments and degrade complex organic compounds [2]. The genus Variovorax generally inhabits soil and water [3]. Variovorax sp. PAMC28711 was isolated from Himantormia sp., an Antarctic lichen collected from the Barton Peninsula, King George Island, Antarctica [4].
Antarctica is the fifth-largest continent and is larger than Europe. Several extreme locales, such as regions of volcanic activity, hypersaline lakes, subglacial lakes, and even the ice itself, harbor specific extremophiles [5]. Numerous microorganisms have specifically adapted to survive in this wide range of extreme environments, giving rise to novel biodiversity, much of which has yet to be elucidated [3]. Another key feature of the Antarctic ecosystem is the extreme variation in physical conditions, ranging from freshwater lakes (some of the most oligotrophic environments on Earth) to hypersaline lakes [6]. Microorganisms found under extreme environmental conditions such as those of Antarctica are ideal candidates for the study of eco-physiological and biochemical adaptations [5]. Antarctica is one of the most physically and chemically challenging terrestrial environments for habitation [7]. Habitats with permanently low temperature dominate the biosphere and have been successfully colonized by a wide variety of organisms that are collectively termed psychrophiles or cold-adapted organisms [8]. Lichens, in particular, are generally defined by a mutualistic symbiosis between fungi and algae (Chlorophyta or Cyanobacteria); however, they also contain internal bacterial communities [9]. Bacteria associated with lichens were first reported in the first half of the 20th century [10]. Lichen-associated microorganisms have been reported to carry genes involved in the degradation of polymers [11].
Carbohydrate-active enzymes (CAZymes) are a large class of enzymes involved in the synthesis, modification, and breakdown of complex carbohydrates in the cell. Based on their amino acid sequences, they are classified into families with conserved catalytic mechanism, structure, and active-site residues, but differing in substrate specificity [12]. They are responsible for carbohydrate synthesis through glycosyltransferases (GTs); degradation of complex carbohydrates via glycoside hydrolases (GHs), polysaccharide lyases (PLs), and carbohydrate esterases (CEs); auxiliary activities (AAs); and recognition (carbohydrate-binding modules, CBMs) [13]. The CAZy database maintains a continuously updated list of GH families [12]. The GHs are the largest class of CAZymes; they hydrolyze the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety, with overall inversion or retention of the anomeric configuration [14].
Virulence is a microbial property that is observed only in susceptible hosts. Virulence is not absolute and is always measured relative to a standard, usually another microbe or host [15]. Virulence factors are microbial gene products with the potential to cause disease in a host. They include bacterial toxins, cell surface proteins that mediate bacterial attachment, cell surface carbohydrates, proteins that protect bacteria, and hydrolytic enzymes that may contribute to bacterial pathogenicity (VFanalyzer; http://www.mgc.ac.cn/VFs/main.htm) [16]. Some of the identified virulence factors facilitate physiological and metabolic adaptation of the bacteria in adverse environments [17].
In this study, the lichen-associated, cold-adapted aerobic bacterium V. sp. PAMC28711, belonging to the family Comamonadaceae, was selected. Most Variovorax species are flagellated and motile. Other species of Variovorax were isolated from soil and grow optimally at mesophilic temperatures. By contrast, V. sp. PAMC28711, isolated from Antarctica, can tolerate temperature variation. Although CAZymes have been studied in various microorganisms, they have yet to be reported in Variovorax. This study was therefore carried out to compare the CAZyme families in the complete genomes of Variovorax species. In addition, the virulence factors of the six Variovorax strains were compared and analyzed in silico to identify the enzymes involved in bacterial virulence.

Phylogenetic tree and average nucleotide identity (ANI)
A phylogenetic tree of the 16S rRNA gene sequences from the complete genomes of the genus Variovorax was constructed using the neighbor-joining method, with bootstrap values shown at the branch points. Average nucleotide identity across the complete genome sequences of the genus Variovorax was analyzed using the OrthoANI software [34].
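Neighbor-joining operates on a matrix of pairwise evolutionary distances. The following is a minimal sketch of how such a matrix could be computed from aligned sequences. It is not the procedure used in this study: the analysis here relied on MEGA X with the Maximum Composite Likelihood method, whereas the sketch substitutes the simpler Jukes-Cantor correction. The strain names and sequences are hypothetical toy data.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned):
    """Corrected distance for every pair of sequences in a name -> sequence dict."""
    return {
        (a, b): jukes_cantor(p_distance(aligned[a], aligned[b]))
        for a, b in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment, not real 16S rRNA data.
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACGAACG-AT",
}

if __name__ == "__main__":
    for pair, d in distance_matrix(toy_alignment).items():
        print(pair, round(d, 4))
```

The resulting dictionary of pairwise distances is the kind of input that a neighbor-joining implementation would then cluster into a tree.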

Comparative genome analysis
The complete genomes of all six Variovorax strains were analyzed using bioinformatics tools, such as CAZyme annotation (dbCAN2 meta server; http://bcb.unl.edu/dbCAN2/). Each genome was annotated using DIAMOND, HMMER, and Hotpep against the CAZy, dbCAN, and PPR databases, respectively [35]. The dbCAN2 meta server allows submission of nucleotide sequences for prokaryotic and eukaryotic genomes, although protein sequences are preferred. The server uses three tools: DIAMOND (for fast BLAST hits in the CAZy database), HMMER (for annotating CAZyme domain boundaries according to the dbCAN CAZyme domain HMM database), and Hotpep (for conserved short motifs in the PPR library). To analyze the trehalose metabolic pathway of V. sp. PAMC28711, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database was used [36,37]. In addition, the PATRIC database (https://patricbrc.org/) was used for genomic information, as well as for the number of virulence factors in the respective strains. PATRIC draws on the virulence factor database (VFDB), Victors, and PATRIC_VF. The VFDB is an integrated and comprehensive online resource for curating information about the virulence factors of bacterial pathogens [32].
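Because the annotation combines three independent tools, one way to summarize the output is to keep only the CAZyme calls supported by more than one tool and then tally them by class. The sketch below illustrates that idea. The two-of-three threshold is a commonly used convention and an assumption here, not a rule stated in this study, and the gene identifiers, dictionary layout, and labels are hypothetical rather than the actual dbCAN2 output format.

```python
import re
from collections import Counter

# Matches a CAZy class prefix followed by a family number, e.g. "GH37" or "GH13_16".
FAMILY_RE = re.compile(r"^(CBM|GH|GT|PL|CE|AA)(\d+)")


def family_of(label):
    """Reduce a tool label such as 'GH13_16' or 'GH37(30-520)' to its family ('GH13', 'GH37')."""
    match = FAMILY_RE.match(label.strip())
    return match.group(1) + match.group(2) if match else None


def consensus_calls(predictions, min_tools=2):
    """Keep genes whose most frequent family is reported by at least `min_tools` tools.

    `predictions` maps gene id -> {tool name: label or None}.
    """
    calls = {}
    for gene, by_tool in predictions.items():
        families = [family_of(label) for label in by_tool.values() if label]
        votes = Counter(f for f in families if f)
        if not votes:
            continue
        family, count = votes.most_common(1)[0]
        if count >= min_tools:
            calls[gene] = family
    return calls


def class_tally(calls):
    """Count consensus calls per CAZy class (GH, GT, PL, CE, AA, CBM)."""
    return Counter(FAMILY_RE.match(family).group(1) for family in calls.values())


# Hypothetical predictions for four genes; not taken from the genomes in this study.
toy_predictions = {
    "gene_1": {"HMMER": "GH37(30-520)", "DIAMOND": "GH37", "Hotpep": None},
    "gene_2": {"HMMER": "GT20", "DIAMOND": "GT20", "Hotpep": "GT20"},
    "gene_3": {"HMMER": None, "DIAMOND": "CE4", "Hotpep": None},
    "gene_4": {"HMMER": "GH13_16", "DIAMOND": "GH13", "Hotpep": "GH13"},
}

if __name__ == "__main__":
    calls = consensus_calls(toy_predictions)
    print(calls)
    print(class_tally(calls))
```

Raising `min_tools` to 3 makes the filter stricter, while setting it to 1 keeps every prediction made by any single tool.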

Results and discussion
General information of the complete genome of genus Variovorax
The genome of V. sp. PAMC28711 comprises 4,316,152 bp and has a GC content of 65.97%, which is lower than that of the other complete genomes of Variovorax strains. Table 1 summarizes the general genomic information of all six Variovorax strains, including GC content (percentage), chromosome number, contigs, and genome length. Likewise, Table 2 provides a comparative summary of the isolation source, isolation information, host information, and phenotype information of the six complete genomes of Variovorax species.

Table 2. Comparison of isolation source, isolation information, host information, and phenotype information of six complete genomes belonging to Variovorax species. PAMC 28711 = V. sp., EPS = V. paradoxus, S110 = V. paradoxus, B4 = V. paradoxus, J1 = V. boronicumulans, PMC = V. sp., and N/A = not available.

Phylogenetic and ANI analysis within the genus Variovorax
A phylogenetic tree was constructed from the 16S rRNA gene sequences of the Variovorax strains using the neighbor-joining method [18]. The branches show the relationships of the species in the genus Variovorax (Fig. 1A). The percentage of replicate trees in which the associated taxa cluster together in the bootstrap test (1,000 replicates) is shown next to the branches [19]. The evolutionary distances were computed using the Maximum Composite Likelihood method [20] and are expressed as the number of base substitutions per site. This analysis involved six nucleotide sequences of Variovorax strains. Evolutionary analyses were conducted in MEGA X [21]. ANI analysis was conducted with the complete genome sequences of the six Variovorax strains (Fig. 1B). ANI varied in the range of 81.10-93.35% among the six strains. Strain PAMC28711 shares 81.10-81.68% identity with the other strains, which is lower than the identity among the other strains. Likewise, V. paradoxus B4 and V. paradoxus S110 show high identity, i.e., 95.35%.

(Tables S2 and S3 of the Supplementary Information). Among the six strains analyzed, V. sp. PAMC28711 carries nearly equal numbers of GHs and GTs (10 and 9, respectively) (Fig. 2). V. sp. PAMC28711 carries three pathways of trehalose synthesis (TPS/TPP, TS, and TreY/TreZ) (Table 3). The enzymes of the TPS/TPP pathway are trehalose-6-phosphate synthase and trehalose-6-phosphate phosphatase, which constitute the GT20 CAZyme subfamily (Table 4). Additionally, the TS pathway comprises a trehalose synthase that belongs to the GH13 CAZyme subfamily. The TS pathway is reversible, interconverting maltose and trehalose, and therefore covers both the biosynthesis and the degradation of trehalose (Table 4).
The CAZyme subfamily GH37 (trehalase, EC 3.2.1.28) (Figs. 3A and C) was present in V. sp. PAMC28711, but not in the other Variovorax strains used for comparison. In addition, the GH37 trehalase of PAMC28711 predicted using the CAZyme database was found to be a cytoplasmic trehalase based on the results of Rapid Annotation using Subsystem Technology (RAST) annotation [22]. Figure 3B and Table S2 (Supplementary Information) show the overall CAZyme subfamilies present in the six strains of Variovorax. There are several alternative pathways for the degradation of trehalose [23]. Interestingly, bacterial trehalases are not as widely distributed as the trehalose biosynthetic pathway, since trehalose-6-phosphate synthases/phosphatases (TPSs/TPPs) occur in diverse living forms, ranging from micro- to macro-organisms [24]. The enzymes involved in trehalose degradation include alpha,alpha-trehalose phosphorylase (EC 2.4.1.64) and alpha,alpha-trehalase (EC 3.2.1.28). E. coli strain K12 contains two trehalases (cytoplasmic trehalase TreF and periplasmic trehalase TreA) [25]. TreF was predicted in V. sp. PAMC28711 via the KEGG pathway map. TreF degrades the disaccharide alpha,alpha-trehalose, yielding two glucose units [26]. The enzyme exists in a wide variety of organisms, and its sequence is highly conserved throughout evolution [27].
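The pathway comparison above amounts to checking, for each genome, whether every enzyme of a given trehalose pathway is annotated. A minimal sketch of that check is given below, keyed on EC numbers. Only EC 3.2.1.28 (trehalase) is taken from the text; the remaining EC numbers are standard enzyme nomenclature added for illustration and are not quoted from this study, and the example annotation sets are hypothetical rather than the contents of Table 3.

```python
# Enzymes required for each trehalose pathway, keyed by EC number.
# Assumption: EC numbers other than 3.2.1.28 come from standard enzyme
# nomenclature, not from the text of this study.
TREHALOSE_PATHWAYS = {
    "TPS/TPP (OtsA/OtsB)": {"2.4.1.15", "3.1.3.12"},  # T6P synthase, T6P phosphatase
    "TS (TreS)": {"5.4.99.16"},                       # trehalose synthase (maltose <-> trehalose)
    "TreY/TreZ": {"5.4.99.15", "3.2.1.141"},          # TreY, TreZ
    "TreF (trehalase)": {"3.2.1.28"},                 # trehalose + H2O -> 2 glucose
}


def complete_pathways(ec_numbers):
    """Return the pathways whose required EC numbers are all present in a genome."""
    present = set(ec_numbers)
    return [
        name
        for name, required in TREHALOSE_PATHWAYS.items()
        if required <= present
    ]


def compare_strains(ec_by_strain):
    """Map each strain to the list of complete trehalose pathways."""
    return {strain: complete_pathways(ecs) for strain, ecs in ec_by_strain.items()}


# Hypothetical annotation sets for two made-up strains.
toy_annotations = {
    "strain_X": {"2.4.1.15", "3.1.3.12", "5.4.99.16", "5.4.99.15", "3.2.1.141", "3.2.1.28"},
    "strain_Y": {"2.4.1.15", "3.1.3.12", "5.4.99.15"},  # TreZ missing, so TreY/TreZ is incomplete
}

if __name__ == "__main__":
    for strain, pathways in compare_strains(toy_annotations).items():
        print(strain, pathways)
```

Requiring every enzyme of a pathway to be present is a deliberately strict choice; a pathway with one missing annotation is reported as absent, so a gap may reflect incomplete annotation rather than true absence.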

Virulence factor in genus Variovorax
Virulence factors are gene products that let bacteria colonize on or within a host organism, resulting in an enhanced risk of disease [32]. Based on the PATRIC results (Fig. 6), all the Variovorax strains studied carry virulence factors. Among the six strains, V. paradoxus S110 and V. boronicumulans J1 carry a higher number of virulence factors than the other strains. Lee et al. [33] reported a similar result for V. paradoxus S110 (GenBank CP001635.1), which is consistent with the study of virulence genes in oil-contaminated seawater. V. sp. PAMC28711, V. paradoxus B4, and V. sp. PMC12 ranked second in the number of virulence factors (Table 6). The PATRIC database reports virulence factors by integrating three virulence factor databases: VFDB, PATRIC_VF (PATRIC virulence factors), and Victors. Based on the PATRIC results (Table 6), the Victors database showed a higher number of virulence genes than the PATRIC_VF and VFDB databases. According to the PATRIC results, all six strains share one virulence factor gene, which encodes the RNA-binding protein Hfq.
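The two comparisons described above, merging hits from several virulence factor databases and finding the genes shared by every strain, can be sketched with plain set operations. The example below is illustrative only. The source names follow the text (VFDB, Victors, PATRIC_VF), but apart from hfq, which the text reports as shared by all six strains, the gene names, strain names, and hit lists are hypothetical placeholders, not PATRIC output.

```python
def merge_sources(hits_by_source):
    """Combine hits from several databases into gene -> set of reporting sources."""
    merged = {}
    for source, genes in hits_by_source.items():
        for gene in genes:
            merged.setdefault(gene, set()).add(source)
    return merged


def hits_per_source(hits_by_source):
    """Number of distinct genes reported by each database."""
    return {source: len(set(genes)) for source, genes in hits_by_source.items()}


def shared_across_strains(genes_by_strain):
    """Genes present in every strain (empty set if no strains are given)."""
    gene_sets = [set(genes) for genes in genes_by_strain.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical hits for one strain; only 'hfq' is named in the text.
toy_hits = {
    "VFDB": ["hfq"],
    "Victors": ["hfq", "gene_a", "gene_b"],
    "PATRIC_VF": ["gene_a"],
}

# Hypothetical per-strain virulence gene lists.
toy_strains = {
    "strain_1": ["hfq", "gene_a", "gene_b"],
    "strain_2": ["hfq", "gene_c"],
    "strain_3": ["hfq", "gene_a"],
}

if __name__ == "__main__":
    print(merge_sources(toy_hits))
    print(hits_per_source(toy_hits))
    print(shared_across_strains(toy_strains))
```

Keeping track of which source reported each gene makes it easy to see when a count is driven mostly by a single database, as the text notes for Victors.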

Conclusions
In this study, the complete genome of Variovorax sp. strain PAMC28711 was compared with those of five other strains: EPS, S110, B4, J1, and PMC12. The comparative analysis showed that only strain PAMC28711 carries three trehalose biosynthesis pathways (TPS/TPP, TS, and TreY/TreZ) as well as the trehalose degradation pathway, TreF. The trehalose degradation pathway includes a trehalase enzyme that belongs to the CAZyme subfamily GH37 and was found only in strain PAMC28711. Based on the results of AZCL screening, strain PAMC28711 thrived at 25 °C, even though it was isolated from a cold-adapted lichen. Significantly, strain PAMC28711 has the potential to survive in diverse temperatures ranging from psychrophilic to mesophilic habitats, which is consistent with a role for trehalose metabolism in this strain. In addition, this finding suggests that, although it was isolated from a polar region, strain PAMC28711 survives temperature variation, which may explain the existence of different pathways for trehalose synthesis in this strain. Among the six strains, strain PAMC28711 shows one of the highest numbers of virulence proteins. The results also show the limitations of the bioinformatics tools used in this study for genome analysis, even though they rely on popular databases available online. This finding indicates that although bioinformatics tools are essential for prediction, they are not completely reliable, and the predicted results can only be validated experimentally. This preliminary comparative study of Variovorax suggests the need for additional investigations into V. sp. PAMC28711 in the future.

Virulence factors present in various strains of the complete genome of Variovorax species based on the PATRIC database.