Comparative genomic analysis of Stenotrophomonas maltophilia unravels their genetic variations and versatility trait

Stenotrophomonas maltophilia is a species with immensely broad phenotypic and genotypic diversity that is widely distributed in natural and clinical environments. However, little attention has been paid to its genome plasticity across diverse environments. In the present study, the genetic diversity of 42 sequenced genomes of S. maltophilia isolated from clinical and natural sources was systematically explored by comparative genomic analysis. The results showed that S. maltophilia has an open pan-genome and strong adaptability to different environments. A total of 1612 core genes were identified, accounting for an average of 39.43% of each genome, and these shared core genes might be necessary to maintain the basic characteristics of the S. maltophilia strains. Based on the phylogenetic tree, the ANI values, and the distribution of accessory genes, genes associated with the fundamental processes of strains from the same habitat were found to be mostly conserved in evolution. Isolates from the same habitat had a high degree of similarity in COG categories, and the most significant KEGG pathways were mainly involved in carbohydrate and amino acid metabolism, indicating that genes related to essential processes were mostly conserved in evolution in both clinical and environmental settings. Meanwhile, the number of resistance and efflux pump genes was significantly higher in the clinical setting than in the environmental setting. Collectively, this study highlights the evolutionary relationships of S. maltophilia isolated from clinical and environmental sources, shedding new light on its genomic diversity.


Introduction
The genus Stenotrophomonas is ubiquitous in diverse ecological environments and belongs to the order Xanthomonadales in the phylum Proteobacteria (Brooke 2021). Stenotrophomonas maltophilia is a Gram-negative, non-spore-forming, rod-shaped bacterium with a high GC content. S. maltophilia includes numerous recognized strains found in soil, water, human and animal feces, and other settings, which clearly demonstrates its strong environmental adaptability (Choi et al. 2018). Some S. maltophilia isolates were reported to be capable of degrading alkanes, aromatics, and heavy metals, and thus play an important role in the degradation of pollutants and in environmental remediation (Xiong et al. 2020; Venkidusamy and Megharaj 2016; Jauhari et al. 2014). For example, S. maltophilia strain W18 was isolated from crude oil-contaminated soil as a typical PAH-degrading strain of this species (Xiao et al. 2021). As for isolates from clinical sources, the transfer of antibiotic resistance genes (ARGs) and virulence factors (VFs) has frequently been reported in the recent literature (Cruz-Cordova et al. 2020). However, well-defined phenotypic properties and genomic analyses of this species in relation to its diversity and taxonomy are missing.
Genome sequencing has become routine practice in biological research thanks to the rapid development of high-throughput sequencing technology and the declining cost of sequencing, and a complete sequence is an excellent way to examine all the genetic information of an organism (Schurch et al. 2018). As more and more genomes are openly released in the NCBI genome database, comparative genomic analysis and construction of a pan-genome from the genome data of available bacterial strains can unravel the genetic diversity, phylogeny, evolution, and adaptability of specific bacteria. The core genome from a pan-genome analysis generally encodes the basic functions necessary for the common niche of the sampled isolates, while the accessory genome is associated with ecological differentiation and adaptive variation (Fang et al. 2021). Tian et al. (2016) compared the pan- and core genomes of 31 Streptomyces strains and concluded that isolates from marine sources possess more transporters and functional genes related to adaptation to cold conditions. An increasing number of S. maltophilia genomes from diverse ecosystems have been deposited in the NCBI database, giving researchers the opportunity to understand the adaptability and phylogeny of this species during evolution.
In the present study, 41 S. maltophilia isolates from different habitats, together with 1 isolate (accession number SAMN28157417) identified by our laboratory, were used to evaluate genetic diversity and to document taxonomic and phylogenetic relationships by coupling a large collection of complete genomes with bioinformatic analysis.
Comparative analyses were performed to shed new light on evolutionary divergence in biological function and selective pressure between the components of the S. maltophilia pan-genome.

Genomic sequences and annotation of S. maltophilia isolates
Based on the genome entries deposited in the NCBI database, 42 complete genome sequences of S. maltophilia were downloaded, including 1 isolate named WGB211 that was identified in our study (https://www.ncbi.nlm.nih.gov/nuccore/2178746124). Among them, 22 strains were classified as clinical isolates, 19 strains were classified as environmental isolates, and one isolate was of unknown origin. To better understand the evolution of isolates of environmental and clinical origin, the general features of the relevant strains are described in Table S1. All coding sequences (CDSs) were predicted by Glimmer version 3.02 before identification of homologous clusters, and CDSs longer than 300 bp were queried against the Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/) for functional annotation (Kanehisa and Sato 2020). Genome-wide tRNAs were identified using tRNAscan-SE v2.0, and rRNAs were determined using RNAmmer v1.2.

Determination of core and pan-genomes
The pan-genome was described as consisting of the core genome, the accessory genome, and the unique (strain-specific) genome (Costa et al. 2020). Forty-two S. maltophilia genomes were analyzed using Bacterial Pan Genome Analysis (BPGA) and clustered using its search algorithm, and the number of gene families was then fitted as a function of the number of genomes based on the clustering results (Chaudhari et al. 2016). Gnuplot 4.6.6 was used to plot the characteristic curves of the pan-genome and to obtain the core genome of S. maltophilia. The fitted pan-genome curve followed Heaps' law, described as $n = k \times N^{-\alpha}$ (Hyun et al. 2022), where n represents the number of genes for a given number of genomes (N) and k is a constant associated with the pan-genome curve. Based on this equation, the pan-genome is considered "open" when α < 1, while α > 1 indicates a closed pan-genome. Gene families shared among all isolates
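As an illustration of the curve fitting described above, the following minimal sketch estimates k and α for the stated form $n = k \times N^{-\alpha}$ by ordinary least squares on log-transformed values. This is a substitute technique chosen for simplicity, not the gnuplot fit used by BPGA. The input data are synthetic, the constant 3000 is a hypothetical value, and the function names are assumptions introduced only for this example.

```python
import math

def fit_power_law(genome_counts, gene_counts):
    """Fit n = k * N**(-alpha) by least squares on log n = log k - alpha * log N."""
    xs = [math.log(N) for N in genome_counts]
    ys = [math.log(n) for n in gene_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    alpha = -slope                           # exponent as defined in the text
    k = math.exp(mean_y - slope * mean_x)    # intercept back-transformed
    return k, alpha

def classify_pan_genome(alpha):
    """Apply the rule stated in the text: alpha < 1 open, alpha > 1 closed."""
    if alpha < 1:
        return "open"
    if alpha > 1:
        return "closed"
    return "boundary"  # alpha == 1 is not covered by the rule in the text

# Synthetic data: 42 genomes, hypothetical k = 3000, alpha = 0.302 as reported.
genomes = list(range(1, 43))
synthetic = [3000.0 * N ** -0.302 for N in genomes]
k, alpha = fit_power_law(genomes, synthetic)
print(round(k, 2), round(alpha, 3), classify_pan_genome(alpha))
```

With real data, the per-genome counts produced by the clustering step would replace the synthetic list.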

Phylogenetic analysis and average nucleotide identity calculation
The genome sequences of the relevant S. maltophilia strains were downloaded from NCBI, and a rapid average nucleotide identity (ANI) tool was applied to calculate the similarity of the matching regions between their genomes. Forty-one strains of S. maltophilia and WGB211 had an ANI of around 90%, so these strains were selected as the study strains, and a heat map package in R was used to cluster and visualize the calculated results for the 42 S. maltophilia strains (Dai et al. 2022). The phylogenetic tree was generated by the neighbor-joining algorithm built into BPGA based on single-copy core genes. The gene matrix was calculated using the similarity or dissimilarity of gene contributions to orthologous gene clusters. For the core genome-based phylogenetic tree, BPGA first extracts protein sequences from 20 random orthologous gene clusters (excluding paralogs) and then uses MUSCLE to perform multiple sequence alignments automatically. All alignments were concatenated, and neighbor-joining phylogenetic trees were constructed (Chaudhari et al. 2016). The obtained phylogenetic trees were annotated using Easyfig software (Sullivan et al. 2011).
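The clustering step behind an ANI heat map can be sketched as follows. This is a minimal pure-Python average-linkage clustering on the distance 100 − ANI, offered as an illustration rather than as the R procedure used in the study. The ANI values of 98.65% and 95% are taken from the results reported in this paper; the strain `clinical_A` and its 90% values are hypothetical.

```python
from itertools import combinations

def average_linkage(names, ani):
    """Agglomerative clustering; ani maps frozenset({a, b}) to percent identity."""
    clusters = [(name,) for name in names]
    merges = []

    def dist(c1, c2):
        # Mean of (100 - ANI) over all pairs drawn from the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(100.0 - ani[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, round(dist(c1, c2), 2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

names = ["1800", "SJTH1", "WGB211", "clinical_A"]  # clinical_A is hypothetical
ani = {
    frozenset({"SJTH1", "WGB211"}): 98.65,
    frozenset({"1800", "SJTH1"}): 95.0,
    frozenset({"1800", "WGB211"}): 95.0,
    frozenset({"1800", "clinical_A"}): 90.0,   # hypothetical value
    frozenset({"SJTH1", "clinical_A"}): 90.0,  # hypothetical value
    frozenset({"WGB211", "clinical_A"}): 90.0, # hypothetical value
}
merges = average_linkage(names, ani)
for left, right, distance in merges:
    print(left, right, distance)
```

The merge order gives the row and column ordering that a clustered heat map would display, with the most similar genomes placed next to each other.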

Functional and statistical analysis
Fig. 3 Core genome tree of 42 S. maltophilia isolates. The bars on the right correspond to the core, accessory, and unique gene content of each strain. Nineteen environmental strains are marked in blue, one strain of unknown origin is marked in orange, and the remaining twenty-two strains are clinical isolates.

To characterize the functional classification of the pan-genome, the amino acid sequences of the unique genes were queried against the Clusters of Orthologous Groups of proteins (COGs, v2021) database using BLAST (Galperin et al. 2019). When a gene was assigned to more than one COG category, each category was counted separately. The difference between the environment-derived and clinical-derived groups was evaluated with an independent samples t-test (SPSS 26.0), with p < 0.05 indicating a significant difference.
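The counting rule and the two-group comparison described in this section can be illustrated with a short sketch. The gene identifiers, category strings, and sample values below are hypothetical, and the function computes only the pooled-variance t statistic; it does not reproduce the SPSS output.

```python
import math
from collections import Counter
from statistics import mean, variance

def count_cog_categories(assignments):
    """Count each COG letter separately when a gene carries several letters."""
    counts = Counter()
    for categories in assignments.values():
        for letter in categories:
            counts[letter] += 1
    return counts

def pooled_t_statistic(a, b):
    """Independent samples t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical assignments: gene2 belongs to both E and H.
cog_counts = count_cog_categories({"gene1": "J", "gene2": "EH", "gene3": "K"})
print(dict(cog_counts))

# Hypothetical per-strain values for one category in two groups.
t_value = pooled_t_statistic([1, 2, 3], [2, 4, 6])
print(round(t_value, 3))
```

A p-value additionally requires the t distribution with na + nb − 2 degrees of freedom, which a statistics package such as SciPy (`scipy.stats.ttest_ind`) provides.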

Identification of resistance genes in the selected strains
The Comprehensive Antibiotic Resistance Database (CARD) was used as the reference database for resistance genes, and the RGI software was installed on Ubuntu Linux (x86-64) for resistance gene identification in the strains (Jia et al. 2017). The results of the RGI resistance gene analysis were displayed with the pheatmap package (Alcock et al. 2020).

General genomic characteristics of S. maltophilia strains
Fig. 4 Average nucleotide identity (ANI) plot of 42 S. maltophilia isolates. The shade of color represents the magnitude of the ANI value: the closer the color is to red, the higher the ANI value and the higher the correlation.

Genomic characterization of 22 clinical and 19 environmental S. maltophilia isolates was performed as shown in Fig. 1A and B. The genome size of S. maltophilia ranged from 4.4 to 5.0 Mb in the environment-derived isolates and from 4.4 to 5.1 Mb in the clinical ones. Additionally, the G + C content of environmental S. maltophilia spanned a wide range of 65.00-67.5%, whereas the G + C content of clinical strains was concentrated at 66.00-67.00%. The CDS content of the environmental isolates was concentrated between 3900 and 4300 and was relatively stable, whereas it was much more dispersed in the clinical-derived strains, ranging from 3800 to 4700, probably because those strains showed genetic variability and diversity in different hosts. No remarkable differences were observed between environmental and clinical isolates in terms of genome size, CDS number, and G + C content (Fig. 1C). Although previous studies have shown that bacterial genome size is related to the selection pressure of the survival environment (Yu et al. 2018), no significant difference in distribution was observed among the 42 isolates, which might be related to the fact that the isolates selected in this study came from multiple biomes. In contrast, significant differences in genome size and CDS number depending on strain origin have been reported for Streptomyces (Tian et al. 2016).

Core gene and pan-genome analysis
In general, pan-genomes can be divided into open and closed pan-genomes according to their characteristics. The open or closed nature of the pan-genome reflects the genomic diversity of a strain and its ability to adapt to environmental changes and acquire new traits through the transfer of genetic material under environmental selection. As shown in Fig. 2A, the size of the pan-genome increased with the addition of genomes, while the size of the core genome decreased as the number of genomes increased. The fitted curve followed the relationship $n = 3655.62.479 \times N^{-0.302}$, wherein the α value of 0.302 (α < 1) implied that S. maltophilia had an open pan-genome. The core genome approached a steady state more closely than the pan-genome did, implying that isolates of this species had sizable potential to take up foreign genetic material and to increase genetic diversity through other evolutionary mechanisms, including mutation and horizontal transfer. As a result, S. maltophilia might adapt to its environment by gaining and losing accessory and unique genes.
The pan-genome consists mainly of the core genome, the dispensable genome, and strain-specific genes, as shown in Table 1. The core genome is essential for the basic lifestyle of bacteria, while the dispensable genome provides species diversity, environmental adaptation, and other characteristics (Wang et al. 2008). To better understand the pan-genome composition of the total of 171,727 genes in the selected isolates, the CDS sequences were clustered. As shown in Fig. 2B, core genes accounted for an average of 39.43% of each genome, accessory genes for 58.56%, and unique genes for 1.8%. These results indicated that accessory genes were important for bacterial survival and formed the basis of genomic diversity and environmental adaptation. Therefore, the study of accessory genes in S. maltophilia might help to explain genetically how the organism changed to adapt to different environments.
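The partition into core, accessory, and unique genes, and the arithmetic behind the reported core share, can be sketched as follows. The presence/absence data and family names are hypothetical toy values; only the figures 1612, 42, and 171,727 come from the text.

```python
def partition_gene_families(presence, n_strains):
    """Classify gene families by the number of strains that carry them."""
    groups = {"core": [], "accessory": [], "unique": []}
    for family, strains in presence.items():
        if len(strains) == n_strains:
            groups["core"].append(family)       # present in every strain
        elif len(strains) == 1:
            groups["unique"].append(family)     # present in a single strain
        else:
            groups["accessory"].append(family)  # present in some strains
    return groups

# Hypothetical toy data for three strains.
presence = {
    "famA": {"s1", "s2", "s3"},
    "famB": {"s1", "s2"},
    "famC": {"s3"},
}
groups = partition_gene_families(presence, n_strains=3)
print(groups)

# Core share reported in the text: 1612 core genes in each of 42 genomes,
# out of 171,727 genes in total.
core_share = 1612 * 42 / 171727 * 100
print(round(core_share, 2))
```

The second calculation shows that the 39.43% figure corresponds to the pooled core share across all 42 genomes.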

Evolutionary analysis of S. maltophilia
To clarify the relationships among the 42 S. maltophilia isolates, a phylogenetic tree was constructed based on single-copy core genes. As shown in Fig. 3, the 42 isolates were not strictly distributed according to their isolation sources, except that strain 1800, isolated from a contaminated environment, and strains SJTH1 and WGB211, isolated from shale oil, fell on the same evolutionary branch. A similar pattern appeared in the ANI clustering heat map of the 42 isolates presented in Fig. 4, wherein the ANI value between strains SJTH1 and WGB211 reached 98.65% and that among the three strains 1800, SJTH1, and WGB211 was 95%, which suggests that they retained some characteristics from their environment. Combined with previous analyses of the evolutionary relationships of related S. maltophilia strains, it could be inferred that isolates from similar environmental sources clustered together, whereas this conclusion did not apply to clinical isolates because of differences among the hosts themselves. Our result was consistent with the investigation of S. maltophilia by Xiao et al. and the genomic analysis of Aeromonas veronii by Song et al. (Song et al. 2021; Xiao et al. 2021), suggesting that there was no evolutionary correlation between the genome and its ecological niche adaptation.

Functional annotation of S. maltophilia
To further investigate their functional properties, COG category and KEGG analyses were performed on the core and accessory (non-essential) genomes of 19 environmental and 22 clinical strains, respectively. The COG categories of core genes were mainly related to translation, ribosomal structure and biogenesis, transcription, amino acid transport and metabolism, and energy production and conversion, which were essential for cell growth and/or rapid and efficient response to nutritional sources in the environment (Fig. S1). These capabilities conferred a survival advantage in the ever-changing environment (Ying et al. 2019). The accessory genes of environmental and clinical strains were mainly concentrated in transcription and cell wall/membrane/envelope formation, and there was no significant difference between the proportions of these categories in the clinical and environmental settings. However, there were significant differences in energy production and conversion as well as auxin transport and metabolism.

Table 1 (partial). Core, accessory, and unique gene counts per strain

Strain             Core   Accessory   Unique   Source
...                1612   2503        106      Clinical
291                1612   2324        55       Clinical
D457               1612   2469        78       Clinical
ISMMS3             1612   2307        103      Clinical
MER1               1612   2251        104      Clinical
PL12               1612   1453        56       Clinical
13637              1612   2724        75       Clinical
UHH_PC240          1612   2674        1        Clinical
SKK55              1612   2320        96       Clinical
ICU331             1612   2690        100      Clinical
UHH_PEG_13_68_68   1612   2279        25       Clinical
UHH_PC239          1612   2675        1        Clinical
UHH_454            1612   2444        114      Clinical
2013_SM24          1612   2294        32       Clinical
2013_SM12          1612   2321        89       Clinical
ZT1                1612   2132        56       Clinical
2013_SM4           1612   2270        66       Clinical
SM_866             1612   2543        279      Clinical
PEG_42             1612   2605        87       Clinical
NEB515             1612   2552        63       Clinical
Col1               1612   2246        41       Clinical
142                1612   2566        65       Clinical
NCTC10498          1612   2397        53       ...
In addition to the close relationship between accessory gene activity and the environmental adaptation and tolerance of bacteria, clinical and environmental variation also led to some differences in bacterial function (Bakermans 2018). Among the core genome pathways in KEGG, those of both clinical and environmental isolates were concentrated in metabolism, such as carbohydrate metabolism, amino acid metabolism, energy metabolism, and metabolism of cofactors and vitamins. Carbohydrate metabolism, signal transduction, and amino acid metabolism are related to basic life activities of bacteria (Fig. 5). The result suggested that the gene functions of the core genome, which were related to essential life activities and physiological functions, were mostly conserved in environmental and clinical settings. Extensive signal transduction in the environment allowed the isolates to respond better to changes in their surroundings, thus giving them specific abilities such as degrading diverse pollutants. The diversity of genes present in the different pools indicated that the strains selected herein adopt different strategies to adapt to diverse environments (Papon and Stock 2019).

(Genes with asterisks (*) appear multiple times because they belong to more than one AMR Gene Family category in the antibiotic resistance ontology.)

Resistance genotypic diversity within S. maltophilia genomes
The genomes of 22 clinical and 19 environmental isolates were searched against the database (Table S1) to identify the resistance groups and to assess the distribution of known antibiotic resistance and efflux pump genes across the S. maltophilia genomes. As presented in Fig. 6, 12 resistance genes were identified in the 19 environmental isolates, mainly falling into three major classes: resistance-nodulation-cell division (RND) antibiotic efflux pump, aminoglycoside 3′-N-acetyltransferase AAC(3′), and aminoglycoside 6′-N-acetyltransferase AAC(6′). The clinical isolates showed a greater diversity of antibiotic resistance: 23 resistance genes were identified among the 22 clinical isolates, falling into 10 categories such as CARB beta-lactamase, L1 family beta-lactamase, Erm 23S ribosomal RNA methyltransferase, major facilitator superfamily (MFS) antibiotic efflux pump, ANT(3′), APH(3′), APH(6′), and sulfonamide-resistant sul. The greater variety of antibiotic resistance in clinical isolates was probably due mainly to resistance developing within clinical hosts, and host diversity also gave rise to different resistance profiles, which complicates the treatment of S. maltophilia infections. Thus, the selection pressure of the living environment might play a key role in the uneven distribution of antibiotic resistance determinants (Zhong et al. 2019). On the other hand, environmental strains were found to contain only the RND antibiotic efflux pump, while clinical strains contained an MFS antibiotic efflux pump in addition to the RND antibiotic efflux pump. This might be due to exposure to antibiotics in the clinical setting, where a high-antibiotic environment allowed efflux pump genes to function in antibiotic resistance under selective pressure (Dong et al. 2022).
Taken together, our results indicate that extensive gene gain and loss events in S. maltophilia genomes produced a consistent relationship between the fraction of homologs and evolutionary relatedness, which was likely a crucial factor leading to genetic diversity.

Conclusion
In-depth bioinformatic analysis is a powerful tool for classifying the 42 S. maltophilia strains and for uncovering their adaptability to different habitats from the perspective of functional and acquired genes related to genome plasticity. Overall, S. maltophilia is adaptable to different environments, but its essential genes remained mostly constant throughout evolution and were not affected by environmental changes. However, isolates of clinical origin harbored many more antibiotic resistance genes and efflux pump genes, which allow them to adapt to human hosts and to the intense selection pressure of antimicrobial medications. The genome contents experienced extensive gene gain and loss events, which might be one of the mechanisms that help explain how S. maltophilia filled different ecological niches during evolution.