Screening and Identification of Key Genes for the Survival and Multiplication of Salmonella typhimurium in the Host

Background Salmonella typhimurium is an important intracellular pathogen that poses a health threat to humans. The key to studying the pathogenesis of Salmonella is to clarify the mechanisms responsible for its survival and reproduction in macrophages. In this study, RNA was extracted from S. typhimurium reference strain (ATCC 14028) and S. typhimurium isolated from the spleen of infected mice for RNA high-throughput sequencing analysis, based on the BGISEQ-500 platform. Results A total of 1,340 significant differentially expressed genes ( DEGs ) were screened through quantitative gene analysis and various analyses based on gene expression. Of these, 16 genes were randomly selected for fluorescence quantitative PCR verification. Two pairs of primers, 16S and pyridoxol 5ʹ-phosphate synthase ( pdxJ ), were used as internal references. The coincidence rate was determined to be 93.75%, which was consistent with the RNA-seq data, proving the reliability of the RNA-seq data. Functional annotation revealed DEGs associated with regulation, metabolism, transport and binding, pathogenesis, and motility. Through data mining and literature retrieval, 26 of the 58 upregulated DEGs (FPKM >10) were not reported to be related to the adaptation to intracellular survival, and were classified as candidate key genes ( CKGs ) for survival and proliferation in vivo . Among the CKGs, five were biotin synthetic bio family proteins. BioF is one of the first enzymes in the protein synthesis pathway. To evaluate the potential role of bioF in survival and proliferation, bioF mutants of Salmonella were constructed, and the virulence/attenuation was evaluated in vivo . Through the infection of BALB/c mice, High-level, constitutive expression of the mgtC gene confers

bioF deficiency was found to significantly reduce the bacterial load and the fatality rate of mice.

Conclusions
Our results indicated that the bioF gene plays an important role in the survival and proliferation of S. typhimurium in vivo. These data contribute to our understanding of the mechanisms used by Salmonella to regulate virulence gene expression whilst replicating inside mammalian cells.

Background
Salmonella is a Gram-negative pathogen that poses a health threat to humans and livestock. According to the World Health Organization (WHO), salmonellosis is the most common food-borne disease, with nearly one-tenth of the world's population becoming infected each year and around 33 million deaths (http://www.who.int/mediacentre/factsheets/fs139/en/). Among the more than 2600 documented serotypes, Salmonella typhimurium is the predominant serotype associated with salmonellosis worldwide [1,2].
The mechanism of survival and reproduction in macrophages may be key to studying the pathogenesis of S. typhimurium. During the infection process, S. typhimurium must adjust and adapt to rapidly changing intracellular and extracellular environments: downregulating the expression of non-essential genes, upregulating the expression of genes necessary for survival, and expressing essential virulence genes [3][4][5][6]. Many of these adaptive processes are regulated at the transcriptional level [7]. In recent years, a large number of studies have confirmed that many bacteria that induce the high expression of genes in vivo are pathogenic [8][9][10][11][12]. In studies investigating the pathogenesis of Salmonella, the induction of highly expressed genes in vivo is a research focus [13][14][15][16].
In this study, we systematically screened and identified the significant differentially expressed genes of intracellular bacteria in host cells. We studied the bacterial transcriptome rather than the proteome to avoid difficulties in separating out host cell proteins, and applied a new generation of high Flux BGISEQ-500 expression profiling [17] to obtain S. typhimurium transcriptome data for intracellular and extracellular cultures. RNA sequencing (RNA-seq) analysis showed that the transcription levels of genes were significantly altered in the pathogenic bacteria in vivo. We screened and identified key factors in the intracellular proliferation of S. typhimurium by comparative transcriptome analysis combined with literature mining. Our results will help to reveal novel mechanisms employed by S. typhimurium to adapt and survive in vivo and yielded novel candidate key genes (CKGs) for the survival and proliferation of this pathogen in vivo.

Gene expression profiles
A total of six samples were analyzed by high-throughput sequencing using the BGISEQ-500 platform, with an average yield of 21.95 M of data per sample. TN6_2A, TN6_3A, and TN6_4A were three in vivo experimental replicates, and TW6_2A, TW6_3A, and TW6_4A were three in vitro controls. The quality of the filtered reads is shown in the filtered reads quality statistics table (Table 1). After obtaining the clean reads, we used HISAT to compare the clean reads to the reference genome sequence. A total of 4,479 genes were detected. Venn diagrams were generated to quantify the overlap between the three biological replicates. Among them, 4,477 transcripts were detected in the in vitro group, and 4,460 genes were identified among the three biological replicates (Fig. 1a). In total, 3,737 transcripts were detected in the in vivo group, of which 2,853 genes were identified among the three biological replicates (Fig. 1b). To determine the correlation in gene expression between the samples, the Pearson correlation coefficient of all gene expression levels between each sample was calculated. The correlation coefficient between in vivo samples was more than 84.2%, compared with more than 93.3% for the in vitro samples (Fig. 2a). To show the number of genes in each FPKM group (FPKM ≤ 1, FPKM 1-10, FPKM ≥ 10) for each sample, we performed statistical analysis and the results are shown in Fig. 2b.

Identification of DEGs
To determine the DEGs among the 4,479 genes detected by RNA-seq, the DEGseq software was used, with the difference multiple set to more than twice and a Qvalue ≤ 0.001 set as the screening parameter. Comparative transcriptomic analysis (in vivo vs. in vitro) revealed a total of 1,340 DEGs (Additional file 1: Table S3), of which 633 were upregulated in expression, and 707 were downregulated in expression (Fig. 3).

Gene ontology (GO) enrichment analysis for DEGs
Gene ontology divided the genes into three major functional categories: molecular function, cellular component, and biological process (Fig. 4a). The molecular function classification results for the DEGs showed that they were mainly distributed into the categories: binding, catalytic activity, transporter activity, and transcription regulator activity. The cellular component classification results for the DEGs showed that they were mainly distributed into the categories: membrane, membrane part, protein-containing complex, and extracellular region. The biological process classification results for the DEGs showed that they were mainly distributed into the categories: cellular process, metabolic process, localization, and response to stimulus.
The enriched bubble map shows the enrichment of GO terms from three dimensions.
The figure below shows the GO enrichment results for the DEGs (Fig. 4b). GO functional enrichment analysis of DEGs revealed that the DEGs were significantly enriched in the categories: pathogenesis, interspecies interaction between organisms, multi-organism process, locomotion, and extracellular region(Additional file 2: Table S4).

Pathway enrichment analysis for DEGs
The gene-involved KEGG metabolic pathway is divided into seven branches: Cellular Processes, Environmental Information Processing, Genetic Information Processing, Human Disease, Metabolism, Organismal Systems, and Drug Development. Further classification statistics are performed under each branch (Additional file 3: Table   S5). The KEGG pathway annotation classification results for the DEGs are shown in KEGG pathway enrichment analysis showed that the DEGs were significantly enriched in multiple pathways (Additional file 4: Table S6), such as: microbial metabolism in diverse environments, bacterial chemotaxis, Salmonella infection, bacterial secretion system, two-component system, and flagellar assembly (Fig.   5b).

Validation of the RNA-seq data by qPCR
Sixteen genes were randomly selected for fluorescence quantitative PCR verification ( Fig. 6a). Using the 16S gene (Fig. 6b) and the pdxJ gene (Fig. 6c) as internal controls, qRT-PCR analysis was performed to validate the expression profiles of DEGs in S. typhimurium-infected mice. Fifteen genes, with the exception of menG, were significantly upregulated or downregulated, consistence with the RNA-seq data. The overall coincidence rate was 93.75%.

Screening of CKGs for survival and proliferation in vivo
A further in-depth analysis of the DEGs was conducted, and 58 genes that were more than 10 times upregulated in vivo (FPKM >10) were excluded. Through a literature review, we eliminated the genes reported to be associated with intracellular survival and proliferation in Salmonella . Finally, 26 of the 58 genes were classified as CKGs for survival and proliferation in vivo ( Table 2).

Virulence of bioF mutant strains in mice
When mice were challenged with the invA and bioF deletion strains of S. typhimurium, mortality occurred two days later than those challenged with the reference strain (Fig. 7a). No mortalities occurred among the control mice injected with sterile saline. Similar survival rates in mice were observed following challenge with the deletion strains as observed following challenge with a knockout strain of the virulence gene invA. Knockout of the bioF gene, significantly reduced the mortality rates among mice following bacterial challenge (P < 0.05). On day 1, the S. typhimurium load in the spleen was significantly lower for the ΔinvA group than for the DbioF group and the wild-type group (P < 0.05), and the DbioF group showed a lower bacterial load in the spleen than the reference strain group, although this difference was not significant (Fig. 7b). The bacterial loads showed an upward trend with time, but the knockout groups DbioF and DinvA remained significantly lower than the wild-type group (P < 0.05).

Discussion
In this study, a total of 1,340 DEGs were screened through quantitative gene analysis and were subjected to various analyses based on gene expression levels (principal component, correlation, differential gene screening). These DEGs were further analyzed by GO function significant enrichment analysis, pathway significant enrichment analysis, and clustering analysis, and most of the genes found to be enriched were classified into: cellular processes and catalytic activity, structural constituent of ribosome, structural molecule activity, pathogenesis, locomotion, bacterial chemotaxis, flagellar assembly, and Salmonella infection. These DEGs therefore appeared to be functionally closely related to the adaptation of Salmonella to in vivo microenvironments, ensuring the survival of bacteria, and promoting bacterial proliferation. On this basis, we optimized the screening conditions and screened 58 significantly upregulated genes (FPKM > 10). These DEGs mostly encoded type III secretion system (T3SS) proteins, some enzymes, and the SPI2 effector proteins. The T3SS effectors provide Gram-negative bacteria with a unique virulence mechanism enabling them to inject bacterial effector proteins directly into the host cell cytoplasm, bypassing the extracellular milieu [18]. Salmonella replicates within both nonphagocytic epithelial cells and macrophages in Salmonella-containing vacuoles (SCVs) [19]. T3SS secretion effectors alter vacuole positioning by acting on host cell actin filaments, microtubule motors, and components of the Golgi complex. Once positioned, maturation is stalled and bacterial replication is initiated. The T3SS effectors could block phagocytosis or promote bacterial invasion of non-phagocytic cells, altering membrane trafficking, and modulating innate and adaptive immune responses [20,21]. The natural immune system is the first line of defense against bacterial disease [22]. These effectors, which are injected into host cells through a secretory system (such as T3SS), employ complex and sophisticated strategies to block and control the host's signal transduction pathways, particularly those that have important functions in host innate immunity. For example, Salmonella AvrA can be secreted into the host cytoplasm, where it inhibits inflammation and cellular pro-apoptotic responses [23,24]. SseL protein can inhibit the intracellular NF-kB pathway and enhance the pathogenicity of Salmonella pullorum [16]. These activities enable bacteria to survive and replicate, causing disease. To ensure intracellular survival, Salmonella must not only avoid removal by the host immune system but also strive to obtain nutrients, which requires the adjustment of gene expression levels to adapt to a rapidly changing environment [6], and these processes are mostly dependent on certain special biological macromolecules. At present, it is still unclear which regulatory networks are affected by effector proteins, the SPI, and transport processes, and their role in adapting to the SCV microenvironment in the host cell [6].
To date, the biotin synthesis pathway has been studied in detail in E. coli, Bacillus sphaericus, Bacillus subtilis, Saccharomyces cerevisiae, and plants [34]. Studies have shown that after S. typhimurium infection of macrophages, SPI2-mediated secretion of the T3SS proteins SsaQ and SseE and expression of the biotin biosynthesis proteins BioB and BioD increased. After loss of the bioB gene, the ability of the bacteria to proliferate in phagocytes was reduced, indicating that biotin plays a role in the survival of S. typhimurium in macrophages [35]. In our experiments, not only bioB and bioD were found to be upregulated, but also bioA, bioC, and bioF. BioF is an enzyme that catalyzes the first step in the biotin synthesis pathway. Its structure and function have been well studied in microorganisms such as E. coli, B. subtilis, and B. sphaericus [36,37], but not in S. typhimurium [35].
In this study, we used homologous recombination to construct derivatives of S. typhimurium in which the bioF and invA genes were disrupted. S. typhimurium invA mutants (ΔinvA) were included as a control. Gene disruption had no evident effect on growth in vitro or on cellular morphology. The parent strains (WT) and their bioF or invA mutant derivatives were inoculated into mice. The ΔbioF strain was significantly attenuated compared with the WT strain. After knocking out the candidate gene bioF, the mortality of the mice was significantly reduced (P < 0.05).
Unlike ΔinvA, there was no significant difference in the bacterial load in the spleen between the initial stage of ΔbioF infection and infection with the parental strain.
Consistent with the ΔinvA strain, the bacterial load was significantly lower with the ΔbioF strain than with the parental strain as the infection time became prolonged.
The reason for this may be that invA is a virulence gene that has been confirmed to be involved in the invasion of the T3SS. The invA gene plays a role in the process of bacterial invasion, and deletion of this gene directly reduces the number of invading cells [38]; by contrast, bioF is a gene involved in bacterial metabolism and its function does not affect the invasion stage of bacteria pathogenesis. Later on in the infection, the bacterial load was significantly lower with the mutant strain than with the reference strain, suggesting that in a microenvironment of intracellular nutrient deficiency, the loss of bioF affects the proliferation of the bacteria. These results Physiological, metabolic, and effector protein-mediated adaptation strategies allow the bacteria to replicate within the SCVs, and to form persister cells [39,41]. Sulfur is an essential element for microorganisms that can be obtained from a variety of compounds, with sulfate being the preferred source [42]. Sulfate uptake is carried out by sulfate permease belonging to the SulT (CysPTWA), SulP, CysP / (PiT), and CysZ families. The main proteins of the sulfur metabolism transport pathway are significantly upregulated in cells, which may be an adaptation to the SCV microenvironment. Among them, CysA is a sulfate/thiosulfate transfer ATP-binding protein, which is involved in the formation of the ABC transporter complex CysA WTP located inside the cell membrane, and is responsible for energy coupling with the transport system. Within the bacterial cytoplasm, CysH is a phosphate adenosine phosphate reductase (thioredoxin), which is a member of the PAPS reductase family. The three substrates of the CysH enzyme are adenosine 3', 5'diphosphate, sulfite, and thioredoxin disulfide, and the two products are 3'-adenosyl sulfate and thioredoxin(https://www.prospecbio.com/cysh_ecoli). In a study of the intracellular pathogenic bacteria Mycobacterium tuberculosis, it was confirmed that cysA deficiency leads to sulfate auxotrophy [43]. The activity of cysH can protect bacteria against free radicals during chronic infection, and nitrosation and oxidation exert excitatory resistance, which may be the mechanism that guarantees chronic persistent infection [44]. Nitrosation and oxidative stress play a key regulatory role in inflammation and the immune response. Nitric oxide produced by the host's several immune cell NO synthases has a severe impact on the major carbon metabolic pathways of Salmonella [45]. Following Salmonella infection in mice, both cysA and cysH were upregulated, suggesting that Salmonella has a similar mechanism of action to M. tuberculosis and plays an important role in regulating host cell nitrosation and oxidative stress. SspH2 belongs to a growing class of bacterial effector proteins that use and disrupt the eukaryotic ubiquitination pathway, a central eukaryotic regulatory mechanism that controls a variety of biological processes, such as programmed cell death, cell cycle progression, and signal transduction, and when deleted, can lead to cancer and neurodegenerative diseases [50].
Through our literature review of these genes, we obtained a total of 26 DEGs that have not yet been reported to be related to intracellular survival and proliferation.
These genes are highly likely to be associated with such functions and are candidate key proteins for intracellular survival and proliferation. In future studies, we will further elucidate the expression of these proteins in host cells and their possible interaction with host cell proteins, enabling us to explore the immune signaling pathways that are involved in vivo. This will help us to clarify key issues such as the pathogenesis of intracellular bacteria and the mechanisms of evading host immune responses. Furthermore, such studies may provide target proteins for vaccine development and drug development, which may aid efforts to control infections by Salmonella as well as other important intracellular pathogens.

Conclusion
Comparative transcriptomic analysis of S. typhimurium in vivo and in vitro has highlighted a series of CKGs for bacterial survival and proliferation in vivo, including bioF. Inactivation of this gene in S. typhimurium ATCC 14028 led to significant attenuation in vivo. Future research will need to address the complex role of bioF in the life cycle of S. typhimurium. Further studies will also be required to identify other CKGs promoting intracellular survival and proliferation.

MPTP-induced PD motor deficits
We evaluated the exercise capacity of MPTP-induced mice by pole test and rotarod test. Fig. 1A, B showed the time required for the movement of the mice in the pole test and rotarod test. In the pole test, MPTP-induced mice were used longer time than the control group (Fig. 1a). Compared with the control group, the MPTP-induced mice had poor coordination and the rotarod test time was significantly shortened (Fig. 1b). This indicates that MPTP-induced PD mice were successfully constructed.

Summary of RNA-Seq data sets
To understand the transcriptome information of different brain regions in PD mice, 24 libraries were constructed in the CC, HP, ST, and CB regions. Subsequently, the libraries were sequenced on the Illumina HiSeq 2500 sequencing platform. The raw reads of each sample ranged from 27.4 to 63.2 million (Additional file 1: Table S1).
After quality control and filtering, clean reads were obtained. We compared the clean reads to the reference genome with Hisat2. The reads for each gene were calculated using HTSeq-count. Principal component analysis (PCA) was performed based on the DESeq2 method to analyze whether the same brain region samples were together. The results showed that the four brain regions of the control group were divided into different clusters (Fig. 2a). Compared with the CC and HP regions, the transcriptional characteristics of the CB region were significantly different. In addition, the transcriptome characteristics of the ST were significantly different from those of the CC and HP regions. However, the transcriptome characteristics of the CC and HP regions were not significantly different. The four brain regions of the model group were divided into different clusters (Fig. 2b). The difference between the CB region and the other three brain regions was the most obvious.

Differentially expression gene in different brain regions
The transcriptome characteristics of the model group and control group mice were analyzed. DEG analysis explained the difference between the transcription in the model group and control group. According to the values of fold change > 2 and P < 0.05, DEGs were obtained using the DESeq2 R package.
Cluster analysis was performed using the correlation distance and hierarchical algorithm to show gene expression patterns in different brain regions of the control group and model group ( Fig. 3a and b). In the same pathological state, the gene expression in the CB was significantly different from that in the CC, HP, and ST regions.
To analyze the effects of gene expression in different brain regions on PD, the DEGs in the model and control groups in the same brain region were compared. As shown in Fig. 3c, 110 DEGs were obtained in the CC region, of which 64 genes were upregulated and 46 genes were downregulated. A total of 179 DEGs were obtained in the HP region, of which 99 genes were upregulated and 80 genes were downregulated. A total of 521 DEGs were obtained in the ST region, of which 485 genes were upregulated and 36 genes were downregulated. There were 96 DEGs in the CB region, of which 67 were upregulated and 29 were downregulated. Fig. 3d shows the overlap of DEGs in the four brain regions. Of the four brain regions, only two genes (Gxylt1 and C920006O11 Rik) were co-expressed.

Differentially expressed genes in HP and ST regions of Parkinson mice
To further understand the transcript information in the HP and ST regions of mice with PD, RNA sequencing analysis was undertaken. Based on the results, unbiased hierarchical clustering analysis was performed for all genes in the HP region. The data of all genes were visualized using a heat map (Fig. 4a). For data quality control, we performed PCA (Fig. 4b). The results showed that the gene expression in the HP region of the model group was different from that in the HP region of the control group. A total of 179 DEGs were found in the HP region in the model group.
Additional file 1: Table S2 lists the top 10 upregulated genes and top 10 downregulated genes with the most significant differences in the HP region. In addition, hierarchical clustering analysis of the DEGs showed that there were differences in gene expression in the ST region of the PD model mice (Fig. 4c).
Additional file 1: Table S3 lists Table S4 and Table   S5 listed the top 10 up-regulated genes and the top 10 down-regulated genes in CC and CB regions, respectively. A heat map of the top 100 genes differentially expressed between the control group and model group in the CB region is provided in Additional file 2: Figure S1.

Gene ontology enrichment analysis of the DEGs in HP and ST regions
The biological functions of DEGs were identified by GO analysis. The GO analysis database can classify DEGs, including "biological processes," "cellular components," and "molecular functions." The GO term corrected by P < 0.05 was defined as significant enrichment of DEGs. In Fig. 5, the significant enrichment of GO terms in the three main categories (cellular component, biological process, and molecular function) is shown. In the HP region, the main five subcategories of biological process were "DNA methylation or demethylation," "posttranscriptional regulation of gene expression," "rhythmic process," "protein localization to organelle," and "regulation of DNA metabolic process," and the two subcategories for molecular function were "guanyl-nucleotide exchange factor activity" and "protein serine/threonine kinase activity" (Fig. 5a). In the ST region, the main five subcategories of cell component were "post synapse," "dendritic tree," "presynapse," "perikaryon", and "endoplasmic reticulum subcompartment." The five subcategories for biological processes were "positive regulation of nervous system development," "synapse organization," "regulation of transmembrane transport," "regulation of exocytosis," and "regulation of intracellular transport"; and five subcategories for molecular function, which are "protein domain specific binding," "protein kinase binding," "protein heterodimerization activity," "GABA receptor binding," and "guanyl-nucleotide binding" (Fig. 5d). Fig. 5b, 5c and 5e, 5f show the top 20 clusters of significantly enriched terms in the HP and ST regions, respectively.

Gene ontology enrichment analysis of the DEGs in CC and CB regions
GO enrichment analysis was performed on DEGs in the CC and CB regions to determine the effects of PD on these two brain regions. The results showed that 10 GO terms were enriched in the CC region (Fig. 6a), in which the cell components were significantly enriched in two GO terms ("axon" and "filopodium"), the biological process was significantly enriched in six GO terms ("negative regulation of protein kinase activity," "regulation of neuron death," "negative regulation of cell cycle," "ribosome biogenesis," "chromatin organization," and "organophosphate biosynthetic process"), and the molecular function was significantly enriched in two GO terms ("myosin binding" and "lipase activity"). A total of 14 GO terms were enriched in the CB region (Fig. 6b), in which the cellular components were significantly enriched in two GO terms, the biological process was significantly enriched in nine GO terms, and the molecular function was significantly enriched in three GO terms. In the CC region, the top five significantly enriched GO terms were "axon," "associative learning," "myosin binding," "negative regulation of protein kinase activity," and "regulation of neuron death" (Fig. 6c). In the CB region, the top five significantly enriched GO terms were "negative regulation of transmembrane transport," "translation factor activity, RNA binding," "histone lysine methylation," "response to glucose," and "dioxygenase activity" (Fig. 6d).

KEGG enrichment analysis
KEGG pathway analysis of DEGs in the HP and ST regions of the model group helped identify the biological pathways related to DEGs. The P < 0.05 pathway was defined as the DEG enrichment pathway. A total of 179 and 521 DEGs in the HP and ST regions were enriched in 122 and 213 KEGG pathways, respectively. Table 1 shows the pathways and related genes that were significantly enriched in the HP region. In the ST region, the top 10 significantly enriched signaling pathways and related genes are shown in Table 2. There were three signaling pathways ("dopaminergic synapse," "glutamatergic synapse," and "adrenergic signaling in cardiomyocytes"), which was consistent with the study by Fu et al. (2019). However, only a few KEGG pathways were enriched in the CC and CB regions. Only the "focal adhesion" and "regulation of actin cytoskeleton" pathways were significantly enriched in the CC region (Table 3), and only the "adrenergic signaling in cardiomyocytes" and "RNA transport" pathways were significantly enriched in the CB region (Table 3).

PPI networks
There were 427 proteins in the PPI network and the nodes represented proteins (Fig.   7a). The most interacting proteins in the PPI network were LRRK2, DA receptor D2 (DRD2), protein kinase A catalytic subunits (PRKACA), PPP2R5C, insulin-like growth factor 1 (IGF-1), PIK3R1, guanine nucleotide binding protein alpha inhibiting 1 (GNAI1), and guanine nucleotide binding protein alpha inhibiting 3 (GNAI3). A total of 15 proteins that may be associated with PD were extracted from the PPI network and the protein network interactions were reconstructed (Fig. 7b).
The P value was corrected by FDR, and when FDR ≤ 0.01, this was regarded as significant enrichment.