Isolation and characteristics of E. faecalis
A total of 537 E. faecalis genomes, 52 of which were obtained from isolates collected at our hospital, were subjected to genomic analysis. The corresponding strains were isolated from blood (42 strains), intra-abdominal sites (34 strains), the GI tract (369 strains), the urinary tract (55 strains), the respiratory tract (3 strains), wounds (19 strains), the eye (5 strains) and unknown sources (10 strains) (Additional file 1).
The E. faecalis genomes had low G+C contents, ranging from 37.0% to 38.0%, and an average genome size of 2.98 Mb. The largest genome size was 3.10±0.19 Mb and the smallest was 2.90±0.13 Mb (Figure 1). Given that the average gene length in E. faecalis is ~1 kb, this 0.20 Mb difference corresponds to roughly 200 genes, indicating considerable genomic diversity in E. faecalis.
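The ~200-gene estimate is simple arithmetic: the genome size difference in kb divided by an assumed mean gene length of ~1 kb. The minimal sketch below reproduces it; the function name is illustrative, and the calculation ignores intergenic sequence, so it is only a rough estimate.

```python
def estimate_gene_difference(size_large_mb: float, size_small_mb: float,
                             mean_gene_kb: float = 1.0) -> int:
    """Approximate number of genes separating two genome sizes.

    Assumes every ~mean_gene_kb of sequence is one gene and ignores
    intergenic space, so the result is an order-of-magnitude estimate.
    """
    diff_kb = (size_large_mb - size_small_mb) * 1000.0  # Mb -> kb
    return round(diff_kb / mean_gene_kb)


# Sizes reported above: 3.10 Mb (largest) and 2.90 Mb (smallest)
print(estimate_gene_difference(3.10, 2.90))  # -> 200
```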
We then compared genome size and GC content among strains from different sources and found that bloodstream isolates had the largest genome size but the lowest GC content. Genome size differed significantly between blood isolates and isolates from two other sources (blood vs. GI tract: p-value<0.01; blood vs. intra-abdominal: p-value<0.01), and GC content differed significantly between blood and intra-abdominal isolates (p-value<0.01). These results indicate that E. faecalis isolated from different body sites has different genomic characteristics.
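The statistical test behind these p-values is not named here. As one possible approach, and only as an assumption, the sketch below runs a two-sided permutation test on the difference in mean genome size between two source groups. The genome sizes are hypothetical values, not the study's data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the fraction of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # small tolerance so float rounding does not hide exact ties
        if diff >= observed - 1e-12:
            hits += 1
    # +1 correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical genome sizes (Mb), NOT the study's data
blood = [3.21, 3.05, 3.30, 3.12, 3.18, 2.98, 3.25]
gi_tract = [2.88, 2.95, 2.84, 2.91, 3.00, 2.86, 2.93]
print(permutation_test(blood, gi_tract))
```

A permutation test is used here because it makes no normality assumption about genome size; a rank-based test would be an equally reasonable stand-in.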
The phylogeny and MLST structures of E. faecalis
To investigate the phylogenetic relationships among the 537 strains, we performed a whole-genome comparative analysis. The core genome contained 1425 genes with 55167 core-genome SNPs (cgSNPs). Phylogenetic analysis based on these 55167 cgSNPs divided the 537 strains into three main clades (Figure 2). Clade A comprised 189 strains isolated from six of the source groups, all except the eye and respiratory tract, whereas Clades B and C, each including strains from all eight source groups, contained 167 and 181 strains, respectively.
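The tree above was built from cgSNPs, but the alignment and tree-inference tools are not stated. The sketch below is therefore not the authors' pipeline. It only illustrates, on a hypothetical concatenated core-genome alignment, how polymorphic core sites (cgSNPs) can be identified and turned into pairwise SNP distances, which is one possible input for building a distance-based tree. Skipping columns that contain gaps or Ns is a common convention but an assumption here.

```python
from itertools import combinations


def variable_sites(alignment):
    """Indices of alignment columns that are polymorphic (candidate cgSNPs).

    Columns containing a gap ('-') or ambiguous base ('N') are skipped.
    """
    seqs = list(alignment.values())
    sites = []
    for i in range(len(seqs[0])):
        column = {s[i] for s in seqs}
        if column & {"-", "N"}:
            continue
        if len(column) > 1:
            sites.append(i)
    return sites


def snp_distance_matrix(alignment, sites):
    """Pairwise SNP distances restricted to the given variable sites."""
    dist = {}
    for (a, sa), (b, sb) in combinations(alignment.items(), 2):
        d = sum(sa[i] != sb[i] for i in sites)
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical concatenated core-genome alignment (4 strains, 12 bp)
core = {
    "strain1": "ATGCTAGCTAGA",
    "strain2": "ATGCTAGCTAGG",
    "strain3": "ATGATAGCTCGG",
    "strain4": "ATGATAG-TCGG",
}
sites = variable_sites(core)
print(len(sites), sites)  # -> 3 [3, 9, 11]
print(snp_distance_matrix(core, sites)[("strain1", "strain3")])  # -> 3
```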
Further MLST analysis of the 537 isolates from different sources and geographic locations revealed 106 different sequence types (STs), grouped into nine major clonal complexes (CC4, CC21, CC25, CC30, CC40, CC64, CC179, CC476 and CC483) (Figure 3). The nine CCs were widely distributed among strains of different origins, and almost all of them contained strains from the GI tract. Among strains from body sites not open to a natural orifice, bloodstream isolates belonged to ST9, ST742, ST959, ST743, ST674, ST79 and ST32, whereas intra-abdominal isolates mainly belonged to ST428, ST858, ST857, ST872, ST483, ST721 and ST479. Among strains from sites open to a natural orifice, urinary tract isolates belonged to ST619, ST856, ST689, ST747, ST470, ST590, ST745, ST744, ST525 and ST943. The GI tract strains, the largest group, comprised 68 different STs.
Taken together, the scattering of strains from different sources across the three phylogenetic clades and across the MLST-defined population structure of E. faecalis suggests that the correlation between isolation niche and phylogeny may be weak.
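MLST assigns an ST from the allele numbers at seven housekeeping loci, and clonal complexes group closely related STs. The method used to define the CCs above is not stated. The sketch below uses a simplified, eBURST-style rule as an assumption: STs linked by chains of single-locus variants (6 of 7 alleles shared) fall into the same complex, with no founder assignment. All allele profiles are hypothetical.

```python
# Hypothetical allele profiles: ST -> alleles at the 7 MLST loci
PROFILES = {
    1: (1, 1, 1, 1, 1, 1, 1),
    2: (1, 1, 1, 1, 1, 1, 2),   # single-locus variant of ST1
    3: (1, 1, 1, 1, 1, 3, 2),   # single-locus variant of ST2
    4: (5, 6, 7, 8, 9, 9, 9),   # unrelated singleton
}


def assign_st(profile, profiles=PROFILES):
    """Return the ST whose allele profile matches exactly, else None (novel ST)."""
    for st, alleles in profiles.items():
        if alleles == tuple(profile):
            return st
    return None


def clonal_complexes(profiles=PROFILES, max_diff=1):
    """Group STs linked by chains of single-locus variants (union-find)."""
    parent = {st: st for st in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sts = list(profiles)
    for i, a in enumerate(sts):
        for b in sts[i + 1:]:
            diff = sum(x != y for x, y in zip(profiles[a], profiles[b]))
            if diff <= max_diff:
                parent[find(a)] = find(b)

    groups = {}
    for st in sts:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(g) for g in groups.values())


print(assign_st([1, 1, 1, 1, 1, 3, 2]))  # -> 3
print(clonal_complexes())                # -> [[1, 2, 3], [4]]
```

Note that ST1 and ST3 differ at two loci yet share a complex, because both are single-locus variants of ST2; this chaining is what lets a CC contain many STs.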
The accessory genome of E. faecalis
Pan-genome analysis of the 537 E. faecalis genomes revealed an accessory genome of 17243 genes. To investigate differences in gene content among the eight sources, we performed pairwise comparisons of gene presence/absence between sources and identified 2546 accessory genes whose distribution differed significantly in at least one source. Functional annotation showed that 53.1% of these genes had unknown functions and 9.4% were associated with replication, recombination and repair, the most frequent category in the right part of Figure 4. We then constructed a matrix of the significant p-values between the eight sources and identified three clusters: Cluster A (low similarity, LS), Cluster B (high similarity, HS) and Cluster C (middle similarity, MS).
Cluster A showed that strains from blood and the GI tract differed extensively in gene content from the other six isolation groups in the information storage and processing, metabolism, and cellular processes and signaling categories. More specifically, the differing genes related to cell wall/membrane/envelope biogenesis, cell motility, defense mechanisms, and coenzyme transport and metabolism were mostly involved in the virulence and drug resistance of E. faecalis. Conversely, strains from the eye and respiratory tract (Cluster B) differed considerably less from the other six isolation groups. Cluster C served only as a control group because it contained samples of unknown source.
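The pairwise presence/absence comparisons described in this subsection do not name a statistical test. As one plausible choice, and an assumption here, the sketch below applies a two-sided Fisher's exact test to a single gene's counts in two source groups. It is written with the standard library only. How multiple testing across thousands of genes was handled is not stated, so no correction is shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are two source groups; columns are gene present / gene absent.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more likely than the observed table.
    """
    row1 = a + b          # isolates in source group 1
    col1 = a + c          # isolates carrying the gene
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # probability that x of the gene carriers fall in group 1
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # relative tolerance guards against float ties with p_obs
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: gene in 3 of 4 isolates from one source, 1 of 4 from another
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # -> 0.4857
```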
Antibiotic resistance genes
Next, we analyzed antibiotic resistance genes in the isolates by comparison with the antibiotic resistance genes recorded in multiple databases. A total of 59 antibiotic resistance genes were found in the investigated E. faecalis genomes (Figure 5), and their presence percentages varied greatly between isolation groups. Based on the average presence percentage of antibiotic resistance genes, the isolation groups could be divided into two classes: Group A, consisting of strains from the GI tract, eye and respiratory tract, and Group B, consisting of strains from blood, wound, intra-abdominal sites and urinary tract. In general, Group B (represented by blood isolates) had a clearly higher average presence percentage and a larger number of antibiotic types than Group A (represented by GI tract isolates).
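"Average presence percentage" is not defined precisely above. The sketch assumes it means the mean, across resistance genes, of the percentage of isolates in a source group that carry each gene. The gene names and detections are hypothetical placeholders.

```python
from collections import defaultdict


def mean_presence_by_source(isolates, genes):
    """Mean (over genes) of the per-gene presence percentage within each source.

    isolates: list of (source, set_of_detected_genes) pairs.
    genes: the resistance genes to summarise.
    """
    by_source = defaultdict(list)
    for source, detected in isolates:
        by_source[source].append(detected)

    result = {}
    for source, members in by_source.items():
        # percentage of isolates in this source carrying each gene
        pct = [100.0 * sum(g in m for m in members) / len(members) for g in genes]
        result[source] = sum(pct) / len(pct)
    return result


# Hypothetical data: 2 blood and 2 GI-tract isolates, 4 placeholder genes
genes = ["gene_1", "gene_2", "gene_3", "gene_4"]
isolates = [
    ("blood", {"gene_1", "gene_2", "gene_3"}),
    ("blood", {"gene_1", "gene_3", "gene_4"}),
    ("GI tract", {"gene_1"}),
    ("GI tract", {"gene_1", "gene_4"}),
]
print(mean_presence_by_source(isolates, genes))
# -> {'blood': 75.0, 'GI tract': 37.5}
```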
Moreover, we found two types of vancomycin resistance genes: the vanA-cluster (vanYA, vanZA, vanSA, vanHAX2 and vanRA) and the vanB-cluster (vanHBX1, vanHBX2, vanSB, vanVB, vanWB, vanYB and vanRB). The vanB-cluster genes had clearly lower presence percentages in all source groups, whereas the vanA-cluster genes showed higher percentages in Group B, with blood isolates having the highest presence percentage. Although the distribution of antibiotic resistance genes varied greatly between strains, six genes were present in all investigated isolates: lsaA (ABC-F subfamily protein), efrA and efrB (heterodimeric ABC transporter efflux pump), emeA (multidrug efflux pump), dfrE (chromosome-encoded dihydrofolate reductase) and mphD (PTS system fructose-specific EIID component).
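Finding the resistance genes shared by every isolate amounts to intersecting the per-isolate gene sets. The minimal sketch below shows this on hypothetical detections. It reuses gene names from the text for readability only, and its input is not real data.

```python
from functools import reduce


def genes_in_all_isolates(gene_sets):
    """Genes detected in every isolate (intersection of per-isolate sets)."""
    if not gene_sets:
        return set()
    return reduce(set.intersection, map(set, gene_sets))


# Hypothetical per-isolate detections
detections = [
    {"lsaA", "efrA", "efrB", "emeA", "dfrE", "mphD", "vanHBX1"},
    {"lsaA", "efrA", "efrB", "emeA", "dfrE", "mphD", "vanRA"},
    {"lsaA", "efrA", "efrB", "emeA", "dfrE", "mphD"},
]
print(sorted(genes_in_all_isolates(detections)))
# -> ['dfrE', 'efrA', 'efrB', 'emeA', 'lsaA', 'mphD']
```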
Virulence genes
We analyzed virulence factors in the isolates by comparison with the E. faecalis virulence genes recorded in VFDB. Thirty-six genes encoding six groups of common E. faecalis virulence factors were identified: adherence-related genes, antiphagocytosis-related genes, biofilm formation-associated genes, exoenzyme-encoding genes, quorum sensing system-associated genes and toxin-encoding genes (Figure 6). The adherence-related genes (srtC, ebpC, efaA, ebpA and ebpB) and the biofilm formation-associated gene bopD were detected in almost all of the eight source groups. Capsular polysaccharide biosynthesis in E. faecalis is encoded by the cps operon, which includes 11 open reading frames (cpsA to cpsK). Nine of these genes (cpsC, cpsD, cpsE, cpsF, cpsG, cpsH, cpsI, cpsJ and cpsK) had higher presence percentages in isolates from blood, wound, intra-abdominal sites and urinary tract, and lower presence percentages in isolates from the GI tract, eye and respiratory tract. Moreover, some members of the cytolysin (cyl) operon were detected in the E. faecalis genomes. The cyl operon normally comprises eight genes (cylA/B/I/L/M/R1/R2/S); six of them (cylI/L/M/R1/R2/S) were more prevalent in isolates from the eye, urinary tract and wound but less prevalent in strains from blood and intra-abdominal sites. Finally, we focused on the quorum sensing system-associated genes (fsrA, fsrB, fsrC, gelE and sprE) and observed that their presence was higher in blood isolates than in GI tract isolates, especially for fsrA and fsrB.
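Database comparisons such as the VFDB search above are commonly run by aligning genomes against reference gene sequences and calling a gene present when a hit passes identity and coverage cut-offs. Neither the alignment tool nor the cut-offs are stated here. The sketch below uses hypothetical 80% identity / 80% coverage placeholders, and its hit records, including the lengths, are made up.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str            # reference gene name
    identity: float      # percent identity of the alignment
    aln_len: int         # alignment length (bp)
    ref_len: int         # length of the reference gene (bp)


def call_present(hits, min_identity=80.0, min_coverage=80.0):
    """Return the set of genes with at least one hit passing both cut-offs.

    Coverage is alignment length as a percentage of reference length.
    The 80/80 defaults are placeholders, not the study's values.
    """
    present = set()
    for h in hits:
        coverage = 100.0 * h.aln_len / h.ref_len
        if h.identity >= min_identity and coverage >= min_coverage:
            present.add(h.gene)
    return present


# Hypothetical hits (lengths are illustrative only)
hits = [
    Hit("fsrB", 99.1, 732, 741),    # passes both cut-offs
    Hit("gelE", 97.5, 300, 1533),   # partial hit: coverage ~19.6%
    Hit("cylM", 72.0, 2900, 2982),  # identity too low
]
print(call_present(hits))  # -> {'fsrB'}
```

Requiring coverage as well as identity keeps short, high-identity fragments (such as the partial gelE hit) from being counted as intact genes.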
Diversity of the fsr-sprE region of E. faecalis
To explore the relationship between the fsr quorum sensing system and E. faecalis bloodstream infections, we analyzed the genomic characteristics of the fsr-sprE region and the gene presence rates in all eight isolation groups. Based on the presence or absence of five genes (fsrA, fsrB, fsrC, gelE and sprE), we classified the genomic patterns of this region into three types, Cluster I, II and III (Figure 7, Table 1). Each pattern was defined by comparing the strain's fsr-gelE region with that of the blood-derived E. faecalis V583 strain. In the Cluster I pattern, all five genes aligned well with V583 and the upstream and downstream flanking regions also aligned well, with no deletions or insertions. In the Cluster II pattern, one of the five genes was deleted, a gene cluster was inserted upstream of the region and a gene cluster was deleted downstream. In the Cluster III pattern, three or more of the five genes were deleted, and gene-cluster deletions were found downstream of the region.
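The three pattern definitions can be written as a small rule set. The sketch below encodes them as stated, assuming the comparison with V583 has already been reduced to the set of the five genes present plus two flanking-region flags; this is a simplification, since other flanking changes are not modelled. The function name and flags are hypothetical. Patterns the definitions do not cover, such as exactly two missing genes, return None.

```python
FSR_GENES = ("fsrA", "fsrB", "fsrC", "gelE", "sprE")


def classify_fsr_region(present_genes, upstream_insertion=False,
                        downstream_deletion=False):
    """Assign Cluster I/II/III relative to the V583 fsr-gelE region.

    present_genes: genes from FSR_GENES found in the strain's region.
    The two flags summarise the flanking-region comparison with V583.
    Returns None for patterns that none of the definitions cover.
    """
    missing = sum(g not in present_genes for g in FSR_GENES)
    if missing == 0 and not upstream_insertion and not downstream_deletion:
        return "I"
    if missing == 1 and upstream_insertion and downstream_deletion:
        return "II"
    if missing >= 3 and downstream_deletion:
        return "III"
    return None


print(classify_fsr_region(set(FSR_GENES)))                             # -> I
print(classify_fsr_region({"fsrA", "fsrB", "gelE", "sprE"}, True, True))  # -> II
print(classify_fsr_region({"gelE", "sprE"}, downstream_deletion=True))   # -> III
```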
Overall, the fsr-gelE region of strains from all eight sources mainly showed the Cluster I or Cluster III pattern, whereas Cluster II occurred only in strains from the urinary tract and GI tract. E. faecalis less commonly causes infections of the eye and respiratory tract; excluding these two uncommon infection sites, the detection rate of the Cluster I pattern decreased in the order blood, wound, intra-abdominal, urinary tract and GI tract (Figure 8).
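The per-source detection rate of a pattern is the share of strains in each source assigned to that cluster. The sketch below computes it and ranks sources by it. The records are hypothetical and do not reproduce the study's counts.

```python
from collections import Counter


def cluster_rates(records, cluster="I"):
    """Percentage of strains in each source whose fsr-gelE pattern is `cluster`."""
    totals = Counter(src for src, _ in records)
    hits = Counter(src for src, pat in records if pat == cluster)
    return {src: 100.0 * hits[src] / n for src, n in totals.items()}


# Hypothetical (source, pattern) records, NOT the study's counts
records = [
    ("blood", "I"), ("blood", "I"), ("blood", "III"),
    ("wound", "I"), ("wound", "III"),
    ("GI tract", "I"), ("GI tract", "III"), ("GI tract", "III"), ("GI tract", "II"),
]
rates = cluster_rates(records)
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)  # -> ['blood', 'wound', 'GI tract']
```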