Genomic features of M. luteus
A total of 76 high-quality and non-redundant Micrococcus genomes were downloaded from the NCBI GenBank database and analyzed (Table S1). Based on the average nucleotide identities (ANI) (Figure 1A) and the type strain genomes, 72 genomes (94.7% of the total 76 genomes) could be assigned to three published species with the inner ANI > 97.4%: M.luteus (n=66), M. lylae (n=5), and M. terreus (n=1). The remaining four strains (‘Micrococcus luteus’ AS2, ‘Micrococcus luteus’ DE0384, ‘Micrococcus flavus’ BCRC 80069, and ‘Micrococcus luteus’ DE0113) probably represented three another species which showed ANI values < 89.7% to others. Based on the near full-length 16S rRNA gene sequences of type strains, strains AS2 and DE0384 should be reclassified into M. endophyticus (type strain: YIM 56238, identity > 99.5%, 1432 nt) [40], strain DE0113 should be reclassified into M. cohnii (type strain: WS4601T, identity = 100%, 1109 nt) [41], and strain BCRC 80069 was of M. flavus (type strain: LW4, identity = 99.9%, 1402 nt) [42]. Strains of M. luteus shared < 89.4% ANI with the other Micrococcus strains whereas, within the species, ANI increased up to 97.8%, averagely, suggesting the higher similarity within but a clear separation between species. The phylogenomic relationship of Micrococcus further confirmed the species classification (Figure 1B).
Within M. luteus, the 66 M. luteus genomes showed > 96.8% ANI values with one another, despite the fact that they were derived from strains inhabiting distinct habitats, including marine, soil, food, plant- and human-associated, and any other common sources (Table S1). The average genome size for all M. luteus strains is 2.52 Mb with a minimum of 2.41 Mb (strain OG2) and a maximum of 2.74 Mb (strain DE0230) (Figure S1A). The average genome size was obvious smaller than M. lylae, M. endophyticus (2.68 and 2.78 Mb, respectively, p < 0.01, Student's t-test), and M. terreus (3.09 Mb), but larger than M. cohnii (2.24 Mb). In addition, the percent G + C content of M. luteus was 72.9% (ranged from 72.3% to 73.3%), larger than M. lylae (71.3%, p < 0.01, Student's t-test), M. cohnii (70.8%) and M. terreus (68.9%), but smaller than M. endophyticus (mean 73.4%, p < 0.01, Student's t-test) and M. flavus (73.5%) (Figure S1B). Taken together, the differences in ANI values, genome size, and G + C content, suggest an apparent divergence between M. luteus and the other species of Micrococcus.
The average number of coding sequences (CDSs) was about 2,287, ranging from 2,148 (strain OG2) to 2,501 (strain SGAir0127). On average, 94.6% of which could be annotated and functionally categorized using the eggNOG database [43]. ISFinder analysis [44] revealed a large number of insertion sequence (IS) elements in the genomes (7.22%~13.4% of the predicted CDSs, mean 8.51%), indicating strains in this species had undergone frequent genomic exchanges, and had a high genomic plasticity to adapt to the various environments [45, 46]. There was also a weak correlation between genome size and the number of ISs among M. luteus strains (r = 0.31, p = 0.01). In addition, 57 strains (86.4%) encoded Rpf, a secreted protein with an N-terminal transglycosylase-like domain (PF06737) and a C-terminal LysM domain (PF01476), which can be used to resuscitate bacteria from a profoundly dormant state [12].
Potential virulence factors and antibiotic resistance genes of M. luteus
To evaluate the potential of M. luteus as pathogens, virulence factors (VFs) and antibiotic resistance (AR) genes were annotated using the Virulence Factors Database (VFDB) [47] and Comprehensive Antibiotic Resistance Database (CARD) [48], respectively. As a result, a total of 30 different putative VF genes were found within the M. luteus genomes (Figure S2). Among these, 12 (40%) were shared by all strains, forming a core set of VF genes, of which three (clpC, clpP, katA) involved in stress response, three (ideR, phoP, relA) in regulation, two (lirB, CBU_1566) in secretion system, and the remaining four (htpB, gnd, bauE, icl) in adherence, immune evasion, iron uptake and metabolic adaptation, respectively. Other VF genes (n=18) belonged to the accessory genome, of which five (cps4l, yhxB/manB, galE, rffG, hasC) were highly conserved in at least 85% M. luteus strains, and the rest skewedly distributed within the species, including three (msbA, bplG, mbtA) strain-specific genes (singletons, 10%). On average, each genome contains 22 VF genes, ranging from 19 (strains RIT608 and 1058_MLUT) to 31 (strain SK58). Furthermore, extensive copy number variation of the VF genes was also found. For example, genes ureABG, which are involved in bacteria urea hydrolysis and have been reported to be crucial to pathogenic bacteria virulence and defense against host immunity [49, 50], have shown an obvious gene expansion (three to six copies) in strain SK58 (isolated from human skin); and genes msrA/B(pilB), the protein of which can promote the successful infection of humans and also respond to adverse conditions [51], has also shown a gene expansion (n=3) in five widely distributed strains (strains DE0459, UMB0038, UMB0189, DE0566 and J28).
In addition, a total of 22 distinctive putative AR genes were identified (Figure S3). Overall, three pervasive AR genes were identified in all strains, including genes associated with macrolide- and penam-resistance (mtrA, with two copies), fosfomycin-resistance (murA) and rifamycin-resistance (rbpA). Additionally, an aminoglycoside phosphotransferase encoded by strA, which acted as aminoglycoside-resistance, was found in 59.1% (n=39) of the strains. The remaining genes (n=18, 81.8%, including 8 strain-specific genes) only presented in certain strains. These genes included 13 antibiotic efflux pump-encoding genes (abaF, baeR, cmx, efrA, efrB, emrB, golS, macB, msbA, qacA, qepA2, tet(A), tet(C)), three antibiotic inactivation enzymes encoding genes (aac(3)-IIb, CTX-M-141, blmS), and the other two for antibiotic target alteration (rmtF) and protection (msrE).
The widespread occurrence, and also high variability in terms of copy number and composition of the potential VF and AR genes among M. luteus strains, emphasized their importance in the pathogenic features. It has been reported that M. luteus could cause severe infections in immunocompromised populations, including peritonitis, brain abscesses, pneumonia, septic arthritis and so on [21-24]. The presence of several virulence determinants and antibiotic-resistance genes in M. luteus should be paid attention to, the profiles of which will provide guidance for the future treatment of M. luteus infections.
Core- and pan-genome of M. luteus
To explore the entire genomic repertoire of the M. luteus population, estimates of the cloud, shell, and (soft-) core pangenome components were generated using GET_HOMOLOGUES [52]. The 66 M. luteus strains had a pan-genome of 8,077 genes and a core genome of 1,134 (14.0%) genes, including 991 single-copy core genes (Figure 2). The core genome represented 45.3% to 52.8% of the gene content of each strain, illustrating a relatively high degree of genomic diversity. This core gene ratio was much lower than that of the actinobacterial species Streptomyces albidoflavus (65.3% to 73.0%, recalculated with the same clustering algorithm and parameters, intraspecies ANI > 98.1%) [31]. Furthermore, the cloud genes (rare genes presented only in one or two strains) contained more than half of the pan-genome (4,210 genes, 52.1%; 3.36% for each strain, averagely), of which 3,301 were strain-specific genes (singletons, 40.9%; 2.14% for each strain, averagely), indicating an exceptionally flexible genome of M. luteus. Correspondingly, the pan-genome for M. luteus still increased with approximately 50 new genes after addition of a 66th genome (Figure 2C). Analysis of the pan-genome curve using a power-law regression model confirmed that the pan-genome was open (Bpan = 0.5), as the curve did not approach a constant as more genomes were selected. This ‘open’ pan-genome is considered as indicator of high horizontal gene transfer rates, especially for strains living in multiple niches [53, 54], suggesting that M. luteus has high capacity to absorb and integrate external genetic elements from environments. All these results above showed a considerable intraspecies heterogeneity of M. luteus, which has been preliminarily proposed based on the macro-restriction analysis using pulsed-field gel electrophoresis [55]. This high genome variability may contribute to the functional diversity of M. luteus species thriving in various habitat.
We next performed a Clusters of Orthologous Groups (COG) functional classification for each orthologous group (OG) to define possible differences in the functions encoded by the soft-core (including genes contained in at least 62 genomes), cloud (including rare genes presented only in one or two strains) and shell (the remaining genes presented in 3~61 genomes) genomes of M. luteus (Figure 3). As a result, a number of functional categories were enriched in the soft-core genome, including Energy production and conversion (COG C), Amino acid transport and metabolism (COG E), Nucleotide transport and metabolism (COG F), Coenzyme transport and metabolism (COG H) and Translation, ribosomal structure and biogenesis (COG J) categories (p < 0.01, Fisher’s exact test). On the other hand, genes of Replication, recombination and repair (COG L) and Defense mechanisms (COG V) categories were more represented within the cloud and shell genomes (p < 0.01, Figure 3B). The enrichment of COG L genes in cloud and shell genomes may be responsible, at least in part, for the intraspecies heterogeneity of the species, as these genes have been proved to play important roles in the successful acquisition of foreign DNA [56]. This will be useful for M. luteus to acquire new physiological traits to adapt to new niches.
Phylogenomic relationship and Population structure of M. luteus
To uncover the evolutionary process that led to the genomic diversity, we constructed a highly robust phylogenomic maximum-likelihood (ML) tree of the 66 M. luteus strains based on concatenated 922 single-copy core genes, using the two M. endophyticus strains (‘Micrococcus luteus’ AS2 and ‘Micrococcus luteus’ DE0384) as outgroups (Figure 4). Based on the phylogenomic tree, M. luteus could be divided into three basal clades: Clade I, II, and III. Further analyses revealed clear inter-clade genomic boundaries. Firstly, the ANI heatmap and clustering results (Figure 1A) suggested that the species could be further divided into three groups, with high intra-group ANI and relatively low inter-group ANI. Comparison of this clustering result with the phylogenomic tree indicated that the three clades corresponded nicely with the three ANI groups, indicating the existence of inter-clade ANI boundaries. Secondly, in order to investigate whether there are gene content differences among the three clades, a hierarchical clustering tree based on the content of dispensable genes was constructed (Figure S4). This tree clearly showed three clades, which were the same as those in the core genome tree, indicating the existence of inter-clade gene content boundaries. These results suggested that the separation of the three clades has emerged, both in the core and accessory genes.
The intraspecies difference was also studied using methods, which are widely used in the population structure analyses. By using Fastbaps [57], the entire M. luteus population can be partitioned into three subpopulations at Level 2, based on core genome single nucleotide polymorphisms (SNPs) (Figure S5). These three subpopulations corresponded to the three major clades on the phylogenomic tree. At Level 3 or higher, one additional subpopulation emerged, which corresponded to only strain NCTC 7563 in Clade II. Furthermore, an admixture model implemented in STRUCTURE [58] software was applied, which showed maximal posterior probability at K = 4, indicating the existence of four ancestral subpopulations (Figure 4 and Figure S5). Among them, three ancestral subpopulations (POP1, POP2, and POP3) were mainly represented by the three major clades. It was noted that, the fourth ancestral subpopulation (POP4, yellow) existed in high proportions in all the three clades, with the highest proportion in the fourth subpopulation uncovered by the above Fastbaps analysis (i.e., strain NCTC 7563 in Clade II). However, since POP4 accounted for only 47.5% of the genome of NCTC 7563, it is unclear whether POP4 can be represented by the branch of strain NCTC 7563. It is possible that POP4 corresponded to an unsampled clade. In summary, while the population structure analysis supported the inter-clade differences, it suggested the existence of an unknown ancestor that contributed to the diversity of the species or the existence of the fourth, yet unsampled, clade.
Gain and loss of genes during the evolution of M. luteus
To unravel the evolutionary events that contributed to the intraspecies gene content differences, gene contents of all ancestral nodes on the phylogenomic tree were predicted and the numbers of gene gain and loss events occurred on all branches were calculated using a parsimony method (Figure S6). As a result, the last common ancestor of M. luteus was inferred to possess 2,028 gene families. The gene family numbers had only slightly increased during evolution, as the gene gain events were a bit more numerous than the gene loss events in most stages during the evolutionary process. Also, a relative high number of gene content variances occurred at the divergence of the three major clades (77, 64 and 55 for Clades I to III, respectively), consistent with inter-clade gene content differences mentioned above (Figure S4). Interestingly, 88.3% of total gene gain events and 82.8% of total gene loss events occurred at terminal nodes, suggesting that the pan-genome diversity of M. luteus is largely due to frequent but recent strain-specific gene gain/loss events.
High level of homologous recombination in M. luteus species
Recombination, especially homologous recombination, is one of the main forces shaping bacterial evolution [59, 60]. By recombination, bacteria can receive DNA fragments from both same and other species and integrate them into their chromosomes. Here, a series of analyses were used to evaluate the extent of homologous recombination in core genome of M. luteus. Firstly, NeighborNet network [61] based on the concatenated single-copy core genes showed a reticulate structure, indicating a high-level non-vertical inheritance in phylogenies (Figure 5A). Meanwhile, the pairwise homoplasy index (PHI) [62] statistic provided highly significant evidence for recombination within M. luteus species (p = 0.00).
We also used mcorr [63] to calculate the probability that a pair of genomes differs at one locus conditional on having differences at the other locus. The resulting correlation profile exhibited a monotonic decay (Figure 5B), indicating the presence of recombination. Similar decaying correlation profiles have also been shown in other recombining bacteria, such as Helicobacter pylori, Pseudomonas aeruginosa, Salmonella enterica and Cronobacter sakazakii [63-65]. Meanwhile, the mcorr showed the mean fragment size (f̅) of a recombination event in M. luteus species was 874.45 bp; the recombination coverage (c), which indicates the proportion of sites in the genome whose diversity has come from recombination events since its last common ancestor, was 0.46; and the ratio γ/μ, the relative rate of recombination to mutation, was 10.87 (Figure 5C). All these recombination parameters above further confirmed a high recombination rate within M. luteus. A summary of the parameters of M. luteus and other typical species reported was shown in Table 1.
Table 1
Recombination parameters of different species inferred by mcorr.
Species
|
f̅
|
c
|
γ/μ
|
References
|
Acinetobacter baumannii
|
860
|
0.40
|
1.30
|
[63]
|
Campylobacter jejuni
|
1,000
|
0.32
|
3.40
|
[63]
|
Cronobacter sakazakii
|
816
|
0.53
|
1.61
|
[65]
|
Klebsiella pneumoniae
|
5,800
|
0.27
|
4.20
|
[63]
|
Micrococcus luteus
|
874
|
0.46
|
10.87
|
This study
|
Micrococcus luteus (NHA only)
|
903
|
0.45
|
10.41
|
This study
|
Micrococcus luteus (HA only)
|
849
|
0.49
|
10.93
|
This study
|
Mycobacterium abscessus
|
1,200
|
0.21
|
13.00
|
[63]
|
Pseudomonas aeruginosa
|
590
|
0.52
|
11.00
|
[63]
|
Salmonella enterica
|
-
|
0.46
|
6.16
|
[64]
|
Staphylococcus aureus
|
550
|
0.36
|
1.00
|
[63]
|
Staphylococcus pseudintermedius
|
407
|
0.24
|
3.97
|
[108]
|
Yersinia pestis
|
530
|
0.03
|
3.00
|
[63]
|
-, information unavailable.
We further detected the recombination events happened in individual genes. Among the 991 single-copy core genes, more than three-fifths (628 genes, 63.37%) of the genes showed significant evidence of genetic recombination, with P-values (computed from 1,000 permutations) lower than 0.05 in at least two of the three methods implemented in PhiPack [62]. This result was further confirmed by fastGEAR [66], by which a total of 708 genes (71.44%) were detected having suffered recent or ancestral recombination events (Figure 5D). The most frequently recombining genes included pabB, cstA, betT, and comEC: pabB encodes an enzyme that catalyzes the two-step biosynthesis of anthranilate and has experienced trans-kingdom gene fusions [67]; cstA is involved in peptide utilization during carbon starvation [68]; betT encodes a choline-glycine betaine transporter, which could help to overcome osmotic stress by the accumulation of compatible solutes [69, 70]; and comEC encodes a transformation protein and has been shown to be absolutely required for DNA uptake and recognition [71].
The results above are consistent with the high proportion of admixture inferred from population assignment and numerous insertion sequences and transposases detected in the genomes, which could allow frequent gene exchange between M. luteus strains and other organisms. Recombination has been proved as an important driver of the evolution of most prokaryotes, and acquisition of novel alleles of existing genes will also accelerate ecological adaptation of bacterial populations [72-74].
Habitat-associated accessory OGs underlying putative differential adaptation
M. luteus is widely distributed in various environments and a few strains were isolated as human pathogens or from the mammalian skin or other tissues. In order to investigate whether the genomic differences among clades are relevant to the habitat, the isolation resources of all strains were mapped to the phylogenomic tree (Figure 4). Strains from habitats related to mammals were not clustered on the tree but rather widely distributed in all clades, indicating the inter-clade genetic difference is irrelevant to the habitat transition. It was noted that strains from habitats related to mammals located on the lately diverged branches, and that strains on the early diverged branches as well as the closest characterized relatives of M. luteus (M. endophyticus and M. flavus) were all isolated from NHA habitats. These results suggested that the last common ancestor of M. luteus was more likely to be genuine dwellers in free-living niches, and a few descendants transited to a mammal-associated lifestyle recently. This proposition is different from an early speculation that M. luteus was primarily adapted to (mammalian) skin, and its presence in water or soil might come from contamination by skin flakes [3].
Lifestyle changes are drastic events followed by gene gain and loss [31, 75, 76]. We seek to find whether strains from different habitats enrich different sets of genes. All genomes whose isolation sources were clear were classified into two groups: 18 from mammal hosts (host-associated, HA) and 47 from free-living and plant (non-host-associated, NHA) sources. Pan-genome-wide association study (pan-GWAS) revealed 101 accessory genes that were present in at least half of the strains from one habitat but no more than half from the other. Among the 13 genes whose associations were statistically significant (Table S2 and Figure 6, Benjamini-Hochberg P-value < 0.1 and Empirical P-value < 0.05), eleven were enriched in NHA strains and two were enriched in HA strains. It was noted that the two HA genes (OG_2452 and OG_2453) and one of the NHA genes (OG_1888) were located at the same genomic locus (Figure 7A). Genes of OG_2452 and OG_2453 encoded a sortase and an excalibur calcium-binding domain-containing protein, respectively. Sortases are enzymes responsible for covalent anchoring of specific proteins to the peptidoglycan of the cell wall of Gram-positive bacteria, performing critical biological functions that are required for the colonization and invasion of host tissues [77, 78]. Sortases have been considered as important virulence factors, as many of the proteins that they display have key roles in the infection process [79]. Thus, consistent with their association with HA habitats, genes of OG_2452 and OG_2453 probably enabled HA strains a better colonization on mammalian hosts. On the other hand, the genes of OG_1888 encoded a protein that was a member of COG0739 of membrane proteins related to metallo-endopeptidases. However, the function of OG_1888 in the adaption to the NHA habitats is unknown. Considering the flanking genes of these genes were relatively conserved (Figure 7A), it is possible that the integration of the HA genes was a result of (homologous) recombination. However, the presence of a transposase gene inside the region suggested that the transposase may play a role in the horizontal gene transfer (HGT). Furthermore, the OG_2452-OG_2453 gene cluster existed in HA strains located on different clades, indicating that different HA lineages obtained this gene cluster separately and that there may be horizontal gene transfer (HGT) between different HA lineages.
Most NHA genes were clustered within two regions. One such region contained six NHA genes (OG_1971 to OG_1976) (Figure 7B). In this region, three genes, fadA, fadB, and fadE, responsible for the degradation and recycling of fatty acids via β-oxidation, were identified. Fatty acids are essential components of membranes and are important sources of metabolic energy in all organisms [80]. The other region contained genes encoding a type IV toxin-antitoxin 'innate immunity' bacterial abortive infection (Abi) system, consisted of a bicistronic operon encoding an AbiEi antitoxin and an AbiEii toxin (OG_1985 and OG_1986, Figure 7C). The Abi system, abbreviated for the phage abotive infection system, provides a post-infection resistance mechanism that could block phage multiplication and result in the death of the infected bacterial cell upon phage infection [81-83], and also might act as stress-response elements that help cells survive unfavorable growth conditions [84]. Beside these two genomic regions, there were also two genes (OG_1915 and OG_2057) showing a significant association with NHA habitat. One of these (OG_2057) encoded a lactoylglutathione lyase (COG0346 / PF18029), which has been reported to play a role in the detoxification of methylglyoxal in bacterial cell metabolism when suffering environmental stresses [85]. Thus, these NHA genes probably enhanced the adaptation to the complex environments.
As shown in Figure 4, the HA strains emerged from different clades, suggesting that different strains may have different HA genes. Consistently, we further detected a total of 23 genes that could strictly separate the two lifestyles when limiting the comparison in subclades (each containing almost equal number of HA or NHA strains, Figure 6 and Table S3). It was noted that, among these, nine genes also showed a weak habitat association (p < 0.05, Fisher’s exact test) at the species level. Functional annotation revealed a gene (OG_1965) coding for an ABC transporter permease protein (COG0577). Consistent with its distribution in HA strains in subclade III-b, it has been reported that it is associated with isolates from human gut compared with free-living environments in Oceanobacillus species [86]. However, the potential functions of most genes detected in the association with HA/NHA habitats remain unclear.
The above habitat-associated genes suggested that the nascent ecological differential has already been initiated and pathogenic strains and lineages in this species are merging. However, it is unclear whether the differentiation has been completed. It has been proposed that microbial speciation is usually driven by natural selection for adaptation to distinct ecological niches, and the distinctness is further maintained by barriers to gene flow. This process is not necessarily continuous and complete, and may be terminated at any time [25, 26, 87]. In fact, we did not detect any clues that recombination barriers have emerged between habitats, as there was no obvious difference in recombination parameters between datasets of HA, NHA, and all strains (Table 1). Here, using M. luteus, we have shown another case to depict the process of bacterial ecological differentiation, whether or not it proceeds to completion: niche-specifying (adaptive) variant has already emerged, but probably because of the high-level recombination and the strong diffusibility between different niches of the species, the separation process seems to be suppressed.