Not all of the more than 2,600 Salmonella serovars are equally pathogenic to humans. Specific serovars or strains of Salmonella are more likely to cause invasive infections in humans and/or animals, which suggests that these isolates may harbor specific VFs crucial for infection. In this project, a database containing a non-redundant, relatively comprehensive list of Salmonella VFs was developed and evaluated, together with two accompanying tools, VF Profile Assessment and VF Profile Comparison. The database was created by compiling existing datasets and conducting an extensive literature review to capture VFs not represented in those datasets. The current version of the database contains 594 VFs or putative VFs, including approximately 157 predicted to be located in an SPI, 19 on SGI-1, and 21 commonly located on plasmids (Supplemental Table 3). Some of these plasmid-associated genes, such as the sit operon, can also be located on the chromosome. Genes from all 24 currently identified SPIs are included, and their functions were recently reviewed in detail [12]. To build the Virulence and Plasmid Transfer Factor Database and facilitate the prediction of virulence genes, the nucleotide and amino acid sequences of the reference genes, along with related information such as the predicted product, locus tag, and accession number, were extracted from GenBank to create the backend reference VF dataset used by the analysis tools.
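The backend reference dataset described above can be pictured as one record per VF. The Python sketch below is a hypothetical schema, not the database's actual implementation: the class name `ReferenceVF`, its field names, and the location labels are illustrative assumptions, and all annotation and sequence values in the example records are placeholders rather than GenBank data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceVF:
    """Hypothetical record for one reference virulence factor (VF)."""
    gene: str
    product: str           # predicted product
    locus_tag: str
    accession: str
    nt_sequence: str       # reference nucleotide sequence
    aa_sequence: str       # reference amino acid sequence
    locations: frozenset   # genomic contexts, e.g. {"SPI"} or {"plasmid", "chromosome"}


def index_by_gene(records):
    """Map gene name -> record, rejecting duplicates to keep the set non-redundant."""
    index = {}
    for rec in records:
        if rec.gene in index:
            raise ValueError(f"redundant entry for {rec.gene}")
        index[rec.gene] = rec
    return index


def count_by_location(records):
    """Count VFs per genomic context; a VF with several contexts is counted in each."""
    return Counter(loc for rec in records for loc in rec.locations)


# Placeholder records: annotation and sequence values are not real GenBank data.
records = [
    ReferenceVF("invA", "placeholder", "placeholder", "placeholder", "ATG", "M",
                frozenset({"SPI"})),
    ReferenceVF("spvR", "placeholder", "placeholder", "placeholder", "ATG", "M",
                frozenset({"plasmid"})),
    ReferenceVF("sitA", "placeholder", "placeholder", "placeholder", "ATG", "M",
                frozenset({"plasmid", "chromosome"})),
]
gene_index = index_by_gene(records)
location_counts = count_by_location(records)
```

Allowing several locations per record reflects the point above that some plasmid-associated genes, such as the sit operon, can also be chromosomal; keying the index on gene name is one simple guard against redundant entries.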
Among the tools developed, the VF Profile Assessment tool predicts the presence of VFs and provides detailed information on the nucleotide percent identity to, and the location of matches with, the reference virulence genes in the uploaded WGS data. The results can be viewed online in the program and/or downloaded and exported to a spreadsheet for further data analysis. To evaluate the utility of the tool, WGS data from 810 strains of 14 different serovars were combined and analyzed using the VF Profile Assessment tool (Supplemental Table 3). The genetic diversity observed among individual virulence genes across strains and serovars could offer valuable insights into their effects on host and/or tissue specificity, gene expression, and related factors. For example, variation across serovars in the percent identity of fimbrial genes (e.g., the bcf operon, Supplemental Table 3) to the reference genes may influence how these strains interact with host epithelial cells. Indeed, Fig. 4 shows that, for many serovars, the VF similarity profiles of strains from the same serovar group closely together yet are distinct from those of other serovars. This dispersion of serovars resembles that observed with gene presence/absence alone, but the identity-based profiles resolve greater genotypic diversity. These data go beyond the presence or absence of a particular VF and underscore the importance of genetic diversity in shaping the pathogenicity and virulence of Salmonella strains.
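This section does not describe how the VF Profile Assessment tool searches the uploaded WGS data or which cutoffs it applies. As a sketch under that uncertainty, assuming a pairwise alignment between a reference gene and the matching genomic region is already available (for example from a BLAST search), nucleotide percent identity can be computed as identical aligned positions divided by alignment length, and a hypothetical presence call can combine identity with coverage of the reference gene. The function names and the 90%/80% thresholds are assumptions for illustration only.

```python
def percent_identity(ref_aln: str, query_aln: str) -> float:
    """Identical aligned positions / alignment length (gap columns included) x 100."""
    if len(ref_aln) != len(query_aln) or not ref_aln:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for r, q in zip(ref_aln.upper(), query_aln.upper())
                    if r == q and r != "-")
    return 100.0 * identical / len(ref_aln)


def reference_coverage(ref_aln: str, ref_length: int) -> float:
    """Percentage of the reference gene's bases that fall inside the alignment."""
    aligned_bases = sum(1 for r in ref_aln if r != "-")
    return 100.0 * aligned_bases / ref_length


def call_presence(ref_aln, query_aln, ref_length, min_identity=90.0, min_coverage=80.0):
    """Hypothetical presence call; both thresholds are illustrative, not the tool's."""
    return (percent_identity(ref_aln, query_aln) >= min_identity
            and reference_coverage(ref_aln, ref_length) >= min_coverage)


# Toy alignment: 5 identical columns out of 7 (one gap, one mismatch).
identity = percent_identity("ATGC-TA", "ATGCGTT")
```

Reporting the identity value itself, rather than only a yes/no call, is what allows the finer-grained comparisons of fimbrial genes such as the bcf operon discussed above.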
The VF Profile Comparison tool allows multiple sequences to be uploaded and compared simultaneously. It outputs a binary matrix indicating the presence or absence of each VF gene across the uploaded sequences. The results can be compared online in the program window or downloaded for further analysis using other software. To evaluate this tool, WGS data from 43,853 Salmonella isolates of 14 different serovars were analyzed. The output was collated in a spreadsheet and imported into BioNumerics for further analysis. The resulting principal component analysis (PCA) showed that isolates of the same serovar predominantly clustered together (Fig. 5), indicating highly similar VF profiles within serovars and notable diversity between serovars, with some exceptions (e.g., S. Typhimurium and S. I,4,[5],12:i:-). These results are consistent with the clustering obtained with the VF Profile Assessment tool.
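The binary matrix and PCA described above can be sketched in plain Python. The published analysis was run in BioNumerics; the code below is a stand-in, not that software's method, that builds a presence/absence matrix and computes only the first principal component by power iteration on the gene covariance matrix. The function names and toy profiles are hypothetical, and a dataset of tens of thousands of isolates would call for an optimized linear algebra library.

```python
import math
import random


def presence_matrix(profiles, genes):
    """Rows = isolates (insertion order of `profiles`), columns = `genes`; 1 = present."""
    return [[1 if gene in detected else 0 for gene in genes] for detected in profiles.values()]


def pc1_scores(matrix, iterations=500, seed=0):
    """Score of each row on the first principal component, via power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]
    # Sample covariance matrix of the genes (columns).
    cov = [[sum(r[i] * r[j] for r in centered) / (n_rows - 1) for j in range(n_cols)]
           for i in range(n_cols)]
    # A random start vector is almost surely not orthogonal to the leading eigenvector.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_cols)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance left to capture
        v = [x / norm for x in w]
    # The sign of a principal component is arbitrary; only relative positions matter.
    return [sum(r[j] * v[j] for j in range(n_cols)) for r in centered]


# Toy data: two hypothetical serovars with complementary gene sets.
profiles = {"A1": {"g1", "g2"}, "A2": {"g1", "g2"}, "B1": {"g3", "g4"}, "B2": {"g3", "g4"}}
matrix = presence_matrix(profiles, ["g1", "g2", "g3", "g4"])
scores = pc1_scores(matrix)
```

Isolates with identical profiles receive identical scores, so same-serovar isolates land together on the axis while the two toy serovars fall on opposite sides of zero.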
To assess the utility of the database for Salmonella characterization, the VF profiles of strains from the different serovars were compared in detail. Not surprisingly, the PCA results show that S. Typhi isolates separated from the NTS serovars. Twenty-eight genes, belonging to three gene clusters, were present in more than 97% of S. Typhi isolates but absent from most of the other serovars; their functions are listed in Table 7. These included genes encoding S. Typhi-specific fimbriae, a type VI pilus, and the Vi antigen. Studies have shown that these VFs all play important roles in bacterial pathogenesis [17–19]. Notably, Vi antigen production differentiates S. Typhi from NTS and plays an important role in S. Typhi virulence [19].
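The selection of serovar-distinguishing genes can be expressed as a filter on per-serovar presence rates. This is a sketch under an assumption: the phrase "absent from most of the other serovars" is interpreted here as a presence rate below 50% in every other serovar, and the 97% cutoff mirrors the figure quoted above. Function names and toy gene names are hypothetical.

```python
def presence_rates(isolates_by_serovar):
    """Fraction of isolates in each serovar that carry each gene."""
    rates = {}
    for serovar, isolates in isolates_by_serovar.items():
        genes = set().union(*isolates)
        rates[serovar] = {g: sum(g in iso for iso in isolates) / len(isolates) for g in genes}
    return rates


def serovar_specific_genes(rates, target, min_target_rate=0.97, max_other_rate=0.5):
    """Genes carried by more than `min_target_rate` of the target serovar's isolates
    and by fewer than `max_other_rate` of the isolates of every other serovar."""
    specific = []
    for gene, rate in rates[target].items():
        if rate <= min_target_rate:
            continue
        if all(other.get(gene, 0.0) < max_other_rate
               for name, other in rates.items() if name != target):
            specific.append(gene)
    return sorted(specific)


# Toy data with hypothetical gene names.
data = {
    "Typhi": [{"typhi_gene", "shared_gene"}] * 3,
    "Other1": [{"shared_gene"}, {"shared_gene", "typhi_gene"}, {"shared_gene"}],
    "Other2": [{"shared_gene"}] * 2,
}
rates = presence_rates(data)
specific = serovar_specific_genes(rates, "Typhi")
```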
The differences between the VF profiles of S. Typhimurium and S. I,4,[5],12:i:- isolates are shown in Table 3. Although 31 virulence genes differ between the predominant virulence profiles of these two serovars, the presence rates of 30 of these genes (all except fljA) across all isolates of each serovar are not significantly different. This further confirms that the monophasic variant of S. Typhimurium, S. I,4,[5],12:i:-, is closely related to S. Typhimurium. Four genes (allD, gip, hyi, and STM0520) are absent from the predominant virulence profile of S. Typhimurium, yet each is present in more than 65% of all S. Typhimurium isolates analyzed in this study. Conversely, the other 27 VFs are absent from the predominant profile of S. I,4,[5],12:i:- and present in the predominant virulence profile of S. Typhimurium, but their presence rates among all S. Typhimurium isolates analyzed are relatively low: except for fljA, each of the remaining 26 genes is present in fewer than 60% of S. Typhimurium isolates (Table 3). This apparent discrepancy arises because the VF profiles of S. Typhimurium are notably diverse: 227 distinct VF profiles were identified among the 1,081 isolates analyzed, and the predominant profile is present in only 27.94% (302/1,081) of them. Notably, most of the 27 genes absent from the predominant VF profile of S. I,4,[5],12:i:- are located on the pSLT virulence plasmid or SGI-1. The spv locus (spvABCD and spvR), which is strongly associated with strains that cause NTS bacteremia and is absent from typhoid strains, is missing in the majority (around 70%) of S. I,4,[5],12:i:- isolates and in 40% of S. Typhimurium isolates. The spv operon is associated with the survival and proliferation of Salmonella spp. within macrophages [20] and encodes the primary virulence factors associated with serovar-specific virulence plasmids in S. enterica.
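The gap between a serovar's predominant profile and its overall presence rates follows from how a predominant profile is defined: it is simply the most frequent exact combination of genes, which can cover only a minority of isolates when profiles are diverse. The sketch below, with hypothetical function names and toy data, finds the predominant profile, counts distinct profiles, and lists the genes that differ between two profiles; the last line reproduces the 302/1,081 share quoted above.

```python
from collections import Counter


def predominant_profile(profiles):
    """Return (most common VF profile, its count, number of distinct profiles).
    Ties are broken by first occurrence, as Counter.most_common does."""
    counts = Counter(frozenset(p) for p in profiles)
    profile, count = counts.most_common(1)[0]
    return profile, count, len(counts)


def genes_differing(profile_a, profile_b):
    """Genes present in exactly one of the two profiles (symmetric difference)."""
    return sorted(set(profile_a) ^ set(profile_b))


# Toy isolates with hypothetical gene names.
isolates = [{"g1", "g2"}, {"g1", "g2"}, {"g1"}, {"g2", "g3"}]
top, top_count, n_distinct = predominant_profile(isolates)

# Share of S. Typhimurium isolates carrying the predominant profile, from the text.
share = 100 * 302 / 1081  # about 27.94%
```

A gene can therefore be absent from the predominant profile while still being carried by most isolates, which is the situation described above for allD, gip, hyi, and STM0520.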
Loss of the spv region abolishes the virulence phenotype of these serovars in their animal hosts and, frequently, in the mouse model. However, introducing a pSV (Salmonella virulence plasmid) into a serovar that naturally lacks it does not enhance the virulence of the strain, which implies that other, chromosomally encoded factors are also essential for the virulence phenotype [21]. The low presence rate of the spv locus observed in this study is consistent with earlier research indicating that only a small fraction of Salmonella serovars carry this virulence operon [22, 23]. Another operon that is missing from the predominant virulence profile of S. I,4,[5],12:i:- and has a low presence rate in both serovars is the pef (plasmid-encoded fimbriae) operon, which mediates the adhesion of Salmonella spp. to the surface of various cell lines [12, 24]. Because most plasmids impose fitness costs on their hosts, the loss of plasmid-encoded VFs may confer an evolutionary advantage that has contributed to the emergence of such isolates over the past decade. In addition, genes located on SGI-1, a genomic island containing an antibiotic resistance gene cluster, are missing in 99% of S. I,4,[5],12:i:- isolates and present in only around 31% of S. Typhimurium isolates. Other VFs with lower presence rates in S. I,4,[5],12:i:- include fljA, rck, and traT. fljA encodes an inhibitor of fliC, the gene encoding the phase 1 flagellin FliC, which is important for flagellar motility and biofilm formation [25]. This result is consistent with the previous finding that S. I,4,[5],12:i:- is closely related to S. Typhimurium but lacks the expression of fljA and fljB (encoding the phase 2 flagellin), which are common to all S. Typhimurium isolates [26].
rck is located close to the pef operon on pSV and encodes a protein that confers resistance to complement killing; it can recruit various complement inhibitors to resist attack by the innate immune system and has been implicated in the invasion of epithelial cells [12]. traT encodes a 27-kDa protein that imparts weak resistance to serum killing and is a component of the plasmid transfer region [12].
The major difference between the VF profiles of S. I,4,[5],12:i:- and S. Saintpaul is the presence of a fimbrial gene cluster (stkABCDEFG) in about 67% of S. Saintpaul isolates (Supplemental Table 4). The stk operon encodes putative Stk fimbriae and was initially reported to be specific to S. Paratyphi A [27]. However, subsequent studies revealed this operon in other NTS serovars, such as S. Heidelberg and S. Kentucky [27]. In this study, the operon had high presence rates in S. Hadar and S. Indiana (more than 99%) and in S. Heidelberg (more than 97%) (Supplemental Table 4), whereas its presence rate in the remaining serovars analyzed was around 1% or lower. Of the four genes that are missing from almost all S. Saintpaul isolates but present in the great majority of S. I,4,[5],12:i:- isolates, two are the T3SS effector genes sseK1 and sseK3. These genes encode SseK proteins, which belong to a larger family that includes NleB1, a glycosyltransferase that modifies host proteins with N-acetyl-D-glucosamine to inhibit antibacterial and inflammatory host responses [12, 28]. A third gene of this family, sseK2, has also been identified in S. enterica strains. Whereas sseK1 and sseK3 are present in only about 3% of S. Saintpaul isolates, sseK2 is detected in more than 99% of the isolates of every serovar analyzed in this study, including S. Saintpaul (Supplemental Table 4). Across all isolates analyzed, the presence rates of sseK1 and sseK3 are concordant: within any given serovar, both are either more than 99% or less than 3%. This pattern is consistent with the cooperative inhibition of the NF-κB signaling pathway by SseK1 and SseK3 during Salmonella infection of macrophages [19].
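The joint pattern reported for sseK1 and sseK3 can be checked per serovar by classifying each serovar according to whether both genes are common (above 99%), both rare (below 3%), or neither, using the cutoffs quoted above. A complementary isolate-level check counts how often the two genes are both present or both absent in the same isolate. Function names and the toy data are hypothetical.

```python
def presence_rate(isolates, gene):
    """Fraction of isolates carrying `gene`."""
    return sum(gene in iso for iso in isolates) / len(isolates)


def co_presence_pattern(isolates_by_serovar, gene_a, gene_b, high=0.99, low=0.03):
    """Classify each serovar: both genes common, both rare, or neither."""
    patterns = {}
    for serovar, isolates in isolates_by_serovar.items():
        rate_a = presence_rate(isolates, gene_a)
        rate_b = presence_rate(isolates, gene_b)
        if rate_a > high and rate_b > high:
            patterns[serovar] = "both common"
        elif rate_a < low and rate_b < low:
            patterns[serovar] = "both rare"
        else:
            patterns[serovar] = "discordant or intermediate"
    return patterns


def isolate_level_agreement(isolates_by_serovar, gene_a, gene_b):
    """Fraction of all isolates in which the two genes are both present or both absent."""
    all_isolates = [iso for isolates in isolates_by_serovar.values() for iso in isolates]
    same = sum((gene_a in iso) == (gene_b in iso) for iso in all_isolates)
    return same / len(all_isolates)


# Toy data: X carries both genes, Y neither, Z only one of them.
data = {"X": [{"sseK1", "sseK3"}] * 100, "Y": [set()] * 100, "Z": [{"sseK1"}] * 10}
patterns = co_presence_pattern(data, "sseK1", "sseK3")
agreement = isolate_level_agreement(data, "sseK1", "sseK3")
```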
Although SseK2 can inhibit TNF-α-induced NF-κB reporter activation, its impact on the NF-κB pathway during Salmonella infection of macrophages is minimal [12]. Further research is needed to explore the role of SseK2 in Salmonella virulence, given its high prevalence in S. enterica strains. The remaining two of the four genes, gogB and STM2585, encode T3SS effectors involved in the inflammatory response [29].
The differences in VF profiles between S. Enteritidis and S. Dublin isolates involve 12 genes (Table 5). Seven of these are missing from the majority of S. Dublin isolates; among them, pefA, pefC, pefD, and rck are also missing from the majority of S. I,4,[5],12:i:- isolates, as noted above. Of the five genes (finP, sciR, sciS, tssA, and xis) that are missing from the majority of S. Enteritidis isolates relative to S. Dublin, sciR, sciS, and tssA encode components of a type VI secretion system (T6SS), a contact-dependent contractile apparatus that contributes to Salmonella competition with the host microbiota and to its interactions with infected host cells [30–33]; tssA encodes a core component of the T6SS [31, 32]. These findings indicate that most S. Enteritidis isolates lack this T6SS, which is consistent with the previous finding that several genomic islands are absent or degenerate in S. Enteritidis [34].
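Statements that gene presence rates do or do not differ significantly between two serovars (as for fljA in the S. Typhimurium comparison, or for the 12 genes separating S. Enteritidis and S. Dublin) rest on a comparison of two proportions. This section does not name the test used; a two-sided Fisher's exact test on a 2×2 table of isolates carrying or lacking the gene is one standard choice and is assumed in the sketch below, whose function name is hypothetical. The counts in the usage line are illustrative, not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                   gene present   gene absent
        serovar 1       a              b
        serovar 2       c              d

    Sums the probabilities of all tables with the same margins that are no more
    likely than the observed one. Integer arithmetic avoids floating-point ties."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric numerators C(row1, x) * C(row2, col1 - x) for each feasible top-left cell x.
    weights = [comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)]
    observed = comb(row1, a) * comb(row2, c)
    return sum(w for w in weights if w <= observed) / comb(row1 + row2, col1)


# Illustrative counts only: 8 of 10 isolates of one serovar vs 1 of 6 of another.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

When many genes are tested at once, the p-values would normally be adjusted for multiple comparisons, for example with a Bonferroni correction.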
Salmonella virulence is complex and involves many genes. Numerous VFs, including adhesion molecules, invasins, lipopolysaccharides, polysaccharide capsules, iron acquisition factors, host defense-subverting mechanisms, and toxins, have been identified in Salmonella, and these VFs play different roles during infection, enabling the bacteria to colonize the host, disseminate, and cause disease [12]. Differences in the presence or absence of virulence genes among isolates and serovars might indicate their relative virulence to humans or other animal species, and the genetic variation and diversity within and among Salmonella serovars may help explain the observed clinical disparities.