Whole-genome sequencing and genomic-based acid tolerance mechanisms of Lactobacillus delbrueckii subsp. bulgaricus LJJ

The probiotic efficacy and fermentative ability of Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus), a widely used probiotic, is majorly affected by its acid tolerance. Here, we conducted whole-genome sequencing of the high acid-tolerant L. bulgaricus LJJ stored in the laboratory. Compared with the whole genome of low acid-tolerant strain L. bulgaricus ATCC11842, the results show that 16 candidate acid-tolerant genes may be involved in the regulation of the acid tolerance of L. bulgaricus LJJ. Association analysis of candidate acid-tolerant genes and acid-tolerant traits of different L. bulgaricus strains revealed that the three genes dapA, dapH, and lysC are the main reasons for the strong acid tolerance of L. bulgaricus LJJ. The results of real-time quantitative PCR (RT-qPCR) supported this conclusion. KEGG pathway analysis showed that these three acid-tolerant genes are involved in the synthesis of lysine; the synthesis of lysine may confer L. bulgaricus LJJ strong acid tolerance. This study successfully revealed the acid tolerance mechanism of L. bulgaricus LJJ and provides a theoretical basis for the subsequent selection of strains with high acid tolerance for improved probiotic functions. • Three genes are identified as acid-tolerant genes, respectively, lysC, dapA, and dapH. • LysC and dapA are the major key genes in the synthesis of lysine. • The synthesis of lysine may confer L. bulgaricus LJJ strong acid tolerance.


Introduction
As a recognized safe edible microorganism, Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is widely used in the industrial production of fermented dairy products such as yogurt and cheese. It functions in regulating the gut microflora of the human body, maintaining the intestinal microecological balance, inhibiting the growth of pathogenic bacteria (Gordon et al. 1957), and improving the immunity of the human body (Knight and Girling 2003). In 1962, Bogdanov and others isolated three glycopeptides with antitumor activity from L. bulgaricus, which first demonstrated that L. bulgaricus may have potential antitumor effects (Bogdanov et al. 1962). M. Del Piano believed that probiotics should be able to tolerate gastric acid, bile, and pancreatic secretions and can colonize the human intestine (Del Piano

L. bulgaricus LJJ gene function analysis and annotation
The Glimmer 3.02 software was used to predict the gene of the bacteria, and the open reading frame (ORF) with a length of more than 30 amino acids was selected. The predicted gene was searched for homology in the NCBI NR database, and the results were sorted to obtain the ORF of L. bulgaricus LJJ genome (Kelley et al. 2012). In this study, Circos V0.64 software was used in drawing the genome-wide circle.
Comparative genomics study of L. bulgaricus LJJ and type strain ATCC11842 The comparative genomics analysis involves the genomewide comparison, genomic homology, and collinearity, as well as determining differential genes between L. bulgaricus LJJ and L. bulgaricus type strain (ATCC11842). Gene Ontology (GO) annotation and Cluster of Orthologus Groups (COG) functional cluster analysis were performed on unique genes, and candidate genes probably involved in acid-tolerant regulation were searched from L. bulgaricus LJJ unique genes.

Verification of acid-tolerant genes
First, based on the results of the comparative genomics, the candidate acid-tolerant gene was initially screened from the unique genes of L. bulgaricus LJJ and L. bulgaricus ATCC11842. Based on the conserved region of genes, we designed primers for the candidate acid-tolerant genes ( Table 1). The whole genome of seven L. bulgaricus strains with different acid tolerances was used as templates to carry out gene amplification.
According to the amplification results and the length of candidate acid-tolerant gene of the seven strains, the genes with slight differences were excluded. Genes with significant differences in different acid-tolerant strains were identified as acid-tolerant genes. If only the high acid-tolerant strain contains a complete gene, while the low acid-tolerant strains do not, the gene can be preliminarily identified as a key gene for acid tolerance.

Real-time quantitative PCR
SYBR Green I RT-qPCR method was used to detect the transcription level of target gene mRNA in L. bulgaricus samples. Total RNA was isolated from L. bulgaricus LJJ and L. bulgaricus ATCC11842 using a Total RNA isolation kit (DP405-02, TIANGEN Biotech Co., Ltd) according to the manufacturer's instructions. Both L. bulgaricus LJJ and L. bulgaricus ATCC11842 were cultured in MRS medium with pH = 3. And cDNA was synthesized using 1 μg RNA from each sample. Reverse transcription was performed using a PrimeScript™ RT reagent Kit with gDNA Eraser (RR047B, Takara Biotechnology Co., Ltd.). RT-qPCR was conducted using an SYBR® Premix Ex Taq™ II (RR82LR, Takara Biotechnology Co., Ltd.). Following an initial polymerase activation and denaturation step at 95°C for 30 s, the samples in each group underwent 40 amplification cycles of 95°C for 5 s and After amplification, a melting curve analysis was performed to confirm the homogeneity of all products. RT-qPCR results were quantified using the 2 −ΔΔCt method (Livak and Schmittgen 2001). In the present study, 16S rDNA was used as reference genes for the normalization of target genes. The expression levels of the genes analyzed were normalized to their expression levels in the corresponding control group. For RT-qPCR and luciferase assays, all experiments were performed at least three times (biological repeats) and each PCR was performed in triplicate. Error bars represent standard error of the mean (SEM). One-way ANOVA was performed using GraphPad Prism 8 to determine statistical significance. Statistically significance was set at *p < 0.05, **p < 0.01, and ***p < 0.005.

Sequence and function analysis of acid-tolerant genes
The sequence of the identified acid-tolerant genes was analyzed by MEGA7, and the deletion or insertion was analyzed in order to identify the key reasons for the acid tolerance of the strains. Then, we searched the metabolic pathway of identified acid-tolerant genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, revealing the reason why L. bulgaricus LJJ has strong acid tolerance.

Protein structure prediction
The protein structure prediction was performed by Swissmodel. Swiss-model predicts protein structure based on homology modeling. The amino acid sequence or the UniProtKB accession code of the target protein is required as input in Automated Mode (Benkert et al. 2011;Waterhouse et al. 2018). The automatic pipeline identifies suitable templates based on BLAST (Camacho et al. 2009) and HHblits (Remmert et al. 2012). The generated templates are ranked according to the expected quality of the generated models. In this case, multiple templates are selected automatically and different models are built accordingly. The topranked model will be selected as the predicted structure of protein.

Functional genomic analysis of L. bulgaricus LJJ
The raw sequencing data of L. bulgaricus LJJ was subjected to quality control based on the three generations of the sequencing data. A supplementary file shows this in more detail (Supplementary file 1). The final result of genome assembly is available in Table 2. The gene prediction results of L. bulgaricus LJJ showed that a total of 2003 ORFs were predicted, with a total length of 1,598,697 bp, an average length of 798 bp, accounting for 84.54% of the genome (Supplementary file 1). The L. bulgaricus LJJ genome-wide circle map is shown in Supplementary File (Fig. S2). The results of GO annotation, COG classification, and KEGG classification are available in Supplementary files (Fig. S3).

Comparative analysis of genomic sequences and basic features of L. bulgaricus LJJ and ATCC11842
Although the basic characteristics of the genome of strains of L. bulgaricus are similar, slight differences may also be possible among variations. These differences may be attributed to evolutionary variability as a result of environmental differences (Table 3).
The results of genome-wide homology analysis of L. bulgaricus LJJ and L. bulgaricus ATCC11842 are shown in Fig. 1a. There were a total of 1441 genes with homology, with 445 genes unique to L. bulgaricus LJJ, while 126 genes were unique to L. bulgaricus ATCC11842. The homology of L. bulgaricus LJJ and L. bulgaricus ATCC11842 gene is relatively high, reaching 84%. Also, a small number of genes are significantly different and no homology was found.
Then, we conducted a genome-wide analysis of L. bulgaricus LJJ and type strain ATCC11842. The results showed that there was no large sequence rearrangement between the genomes of L. bulgaricus LJJ and L. bulgaricus ATCC11842, indicating that the collinearity was good (Fig.  1b). Insertion, deletion, inversion, and translocation between L. bulgaricus LJJ and L. bulgaricus ATCC11842's genome occur in A, B, and C three locally collinear blocks (LCBs) (Darling et al. 2004). Short gene fragment insertion or inversion implies that the two strains of L. bulgaricus may have undergone genetic recombination or metastasis during the evolution process. Therefore, it is speculated that the insertion or deletion of these fragments is likely to result in different acid tolerances between the two strains (Ming et al. 2016). Comparative GO annotations of unique genes in L. bulgaricus LJJ and ATCC11842 The GO annotation of the unique genes of L. bulgaricus LJJ and L. bulgaricus ATCC11842 is mainly focused on three levels: cellular component, molecular function, and biological process. In comparison to L. bulgaricus ATCC11842, L. bulgaricus LJJ has an obvious advantage in the subclassification of molecular function and biological processes, such as binding, catalytic activity, cellular processes, singleorganism processes, and metabolic processes. The synthesis of cell membrane is an important cell biological property, in which a large number of genes related to the function of cell components are required to be expressed (Anisimova et al. 2017). The acid tolerance is the most important biological function of lactic acid bacteria against acid stress. The acid stress process involves stimulating cell information interaction, gene expression regulation, substance transportation metabolism, etc. (Kandola et al. 2016). This may be the main reason why high acid-tolerant L. bulgaricus LJJ has advantages in the sub-classification of molecular function and biological process.
COG comparative analysis of the unique genes in L. bulgaricus LJJ and ATCC11842 In comparison with L. bulgaricus ATCC11842, L. bulgaricus LJJ has a larger proportion in the following aspects: (L) replication, recombination, and repair, (M) cell membrane biosynthesis and outer membrane proteins, (C) energy production and conversion, (E) amino acid transport and metabolism, and (V) defense mechanisms (Fig. S3). The number of functional genes related to cell protection in L. bulgaricus LJJ is significantly higher than those of L. bulgaricus ATCC11842 and could be responsible for its higher acid tolerance level. Without strong energy metabolism and amino acid transport system, it is difficult for L. bulgaricus LJJ to ingest enough nutrients and energy under acid stress to make it survive normally (Huang et al. 2016). In the absence of strong membrane and membrane protein synthesis or self-healing systems, it is difficult for even acid-tolerant probiotics to survive in acidic environment (Wu et al. 2012).

Preliminary screening of L. bulgaricus LJJ acidtolerant genes
Based on the results of the comparative genomics and previously reported acid tolerance mechanisms of lactic acid bacteria like two-component regulatory system (Dhiman et al. 2014;Landete et al. 2010), proton pump (Cotter and Hill 2003;Higuchi et al. 1997;Koebmann et al. 2000;Olsen et al. 1991;Small and Waterman 1998), protein protection and repair (Cox et al. 2000;Lim et al. 2000), and alkali synthesis (Liu and Pilone 1998;Maghnouj et al. 1998), the acidtolerant gene was initially screened from the 445 unique genes of L. bulgaricus LJJ (Table 4). Most of the candidate acidtolerant genes initially screened are related to amino acid metabolism, suggesting the unique role of basic amino acids like arginine and lysine in bacterial acid tolerance.

PCR amplification and verification of acid-tolerant genes
Based on the results of whole-genome sequencing, we got the sequence of 16 candidate acid-tolerant genes in L. bulgaricus LJJ. According to the conserved region of genes, we designed 16 primers (Table 1). Using the designed P1, P2 … P16 as amplification primers and genomes of 7 different L. bulgaricus strains as templates, 16 candidate acid-tolerant genes of L. bulgaricus LJJ were amplified by PCR. The amplification results are shown in Fig. 2a. The result showed that the 7 different strains of L. bulgaricus contained the target genes corresponding to the P1, P7, P10, P11, P14, and P16 primers. Most fragments were also similar in size, implying "-" indicates no related information that the target genes had no fragment insertion and loss. However, the target gene was not successfully amplified, indicating that the strain does not contain the gene fragment. A successful amplification would imply that the amplified fragments were different, thus proving that the genes had insertion and deletion in different strains. Then, the amplified fragment products were sequenced ( Table 5). The sizes of the same gene were different in different strains. If we take the size of 16 genes in L. bulgaricus LJJ as a standard, some fragments are obviously shorter, which may have occurred as a result of gene deletion during evolution. However, a few other fragments were significantly longer than those of the target gene. The possible reason for these variations is that the fragment insertion occurs during the evolution of the gene, resulting in different fragment sizes, which ultimately leads to genetic functional changes. According to the amplification results and the length of candidate acidtolerant gene of the seven strains, dapA, dapH, and lysC were preliminarily identified as acid-tolerant genes. Therefore, it is speculated that the related acid-tolerant protein cannot be normally translated due to the deletion or lack of acid-tolerant gene. The acid-tolerant metabolic pathway functions abnormally resulting in a relatively weak acid tolerance.
To determine whether the expression levels of the three identified acid-tolerance genes are significantly different in the two strains, we conducted RT-qPCR comparing L. bulgaricus LJJ vs. L. bulgaricus ATCC11842 (Fig. 2a; Table 6). In the acidic environment (pH = 3), the expression levels of these three acidtolerance genes in L. bulgaricus LJJ were significantly higher than those in L. bulgaricus ATCC11842 (p < 0.05). It is inferred that these three genes confer L. bulgaricus LJJ a survival advantage in acidic environment.

Sequence alignment analysis of acid-tolerant genes
The sequence of identified acid-tolerant genes (nos. 6, 12, and 13) was further analyzed, and the acid-tolerant genes dapA, dapH, and lysC in different strains were also analyzed to identify the differences. The results of the alignment analysis are shown in Fig. 1c (a), (b), and (c). Figure 1 c (a) shows the alignment of dapA in different strains. The sequence alignment showed that large fragment deletions of dapA occur in JB, M13, ATCC11842, and GMC. Figure 1 d (a) shows the predicted structure of the protein encoded by the complete dapA gene, Fig. 1d (b) shows the predicted structure of the protein encoded by the partially deleted dapA gene, and Fig. 1d (c) shows that the deleted gene is likely to encode Haem_oxygenase_2 domain. It is speculated that the active center of the enzyme may be inactivated due to the deletion of the DNA fragment; it results in the restriction of lysine synthesis pathway and the significant decrease of lysine synthesis. Therefore, the acid tolerance of JB, M13, ATCC11842, and GMC was weakened. Figure 1 c (b) and (c) show the amplification sequence alignment of dapH and lysC in different strains. Sequence alignment analysis showed that the gene was contained in three strains with relatively strong acid tolerance (LJJ, SY13, and YL5) and had high homology. Based on the inferences from our study, dapH and lysC could be regarded as acidtolerant genes of L. bulgaricus LJJ.

Analysis of metabolic regulation of acid-tolerant genes of L. bulgaricus LJJ
According to previous research, the acid tolerance of lactic acid bacteria includes the regulation of intracellular H + (Wu et al. 2013), the regulation of cell membrane (Paradeshi et al. 2018), and stress response protein dapA Lysine metabolic pathway Alkali synthesis mechanism 7 urdA Cycle of urea Alkali synthesis mechanism 8 mumM Cysteine, methionine metabolic pathway Biofilm synthesis mechanism 9 argC Two-component regulatory system Biofilm synthesis mechanism 10 LuxS Two-component regulatory system Biofilm synthesis mechanism 11 Cation transporter Maybe H + transporter Transfer H + 12 dapH Lysine metabolic pathway Alkali synthesis mechanism 13 lysC Aspartic acid, lysine metabolic pathway Alkali synthesis mechanism 14 proB Arginine, proline metabolic pathway Alkali synthesis mechanism 15 RecA -Repair DNA damage 16 arcD Arginine metabolic pathway Alkali synthesis mechanism "-" indicates no related information expression (Chen et al. 2009). Likewise, the regulation of amino acid metabolism plays an important role in the regulation of intracellular pH, such as the glutamate decarboxylase system (Schwenk and Donovan 2011) and arginine deiminase (ADI) pathway (Schwenk and Donovan 2011).
In this study, we obtained three genes (dapA, dapH, and lysC) related to the acid tolerance of L. bulgaricus  1, 2, 3, 4, 5, 6, and 7 respectively indicate that 7 different L. bulgaricus are JB, GMC, SY3, LJJ, ATCC11842, M13, and YL; M indicates that the marker strip is 100 bp, 250 bp, 500 bp, 750 bp, 1 k, 2 k, 3 k, and 5 k from bottom to top. P1-P16 respectively indicate different primers referred to in Table 1. b In the acidic environment (pH = 3), mRNA expression levels of three acid-tolerant genes in L. bulgaricus LJJ (n = 3) and L. bulgaricus ATCC11842 (n = 3). 16S rDNA was used as reference genes for the normalization of acid-tolerant genes. The expression levels of the genes analyzed were normalized to their expression levels in the corresponding control group. Data are represented as mean ± SEM of at least three independent experiments. *p < 0.05, **p < 0.01 LJJ. The analysis of the acid-tolerant metabolic regulation was performed by KEGG. The regulation process of lysine synthesis by dapA, dapH, and lysC is shown in Fig. 1e. These results hint at possible mechanisms of acid tolerance in L. bulgaricus LJJ (see "Discussion").
Three important acid-tolerant genes dapA, dapH, and lysC mainly regulate the synthesis of lysine and also participate in the lysine metabolic pathway. The lysC gene encodes an aspartate kinase, while the dapA gene encodes a 4-hydroxy-tetrahydrodipicolinate synthase. These two enzymes are important for the synthesis of lysine from aspartate and are involved in all metabolic pathways such as succinyl-diaminopimelate (DAP) pathway, acetyl-DAP pathway, DAP dehydrogenase pathway, and DAP aminotransferase pathway. The dapH gene encodes a tetrahydrodipicolinate N-acetyltransferase, a key enzyme in the acetyl-DAP pathway.

Discussion
In this study, the acid tolerance of different strains of L. bulgaricus was compared and then screened to verify strains with strong acid tolerance (LJJ) and weak tolerance (ATCC11842). Genome-wide sequencing was performed on L. bulgaricus LJJ, to further predict the specific genes of the strain. 2003 ORFs with a total length of 1,598,697 bp and an average length of 798.15 bp, accounting for 84.54% of the genome was detected in our results. The GO gene function was annotated on the predicted genes, and the predicted gene of L. bulgaricus LJJ was annotated with more genes in the binding, catalytic activity, cell part, cell, cellular process, biological adhesion, and metabolic process, which further confirmed that the biological characteristics were strong in this aspect. Analysis of the COG functional classification of proteins predicted by the L. bulgaricus LJJ genome indicated that "-" indicates that the strain does not contain the gene; "1" and "7" in the upper right corner of the strain indicate the rank of acid tolerance. "1" means strong acid tolerance, and "7" means weak acid tolerance. According to the screening principle of acid-tolerant genes, genes 6, 12, and 13 are directly related to the strength of acid tolerance among the 2003 genes, 1251 genes were assigned functional annotations with significant biological significance. Our results also showed that the genes were outstanding in amino acid transport metabolism, translation, ribosome structure and biosynthesis, recombination, repair, cell wall, and cell membrane synthesis. Based on the GO and COG annotation results, we further analyzed the classification of L. bulgaricus LJJ's metabolic pathways, which were enriched in the biosynthesis and transport of metabolites.
Through the comparative analysis of homology and collinearity, between genomes of L. bulgaricus LJJ (high acidtolerant) and L. bulgaricus ATCC11842 (low acid-tolerant), the related regulation pathways of acid-tolerant genes were explored. A comparative analysis of GO annotations and COG classifications between unique genes of L. bulgaricus LJJ and L. bulgaricus ATCC11842 was determined, and the results showed that the number of genes in L. bulgaricus LJJ during acid metabolism, biofilm synthesis, and protein synthesis was significantly higher than those of L. bulgaricus ATCC11842. A comprehensive analysis of the homologous clustering of L. bulgaricus LJJ and L. bulgaricus ATCC11842 strains revealed unique acid-tolerant genes from L. bulgaricus LJJ. Overall, 16 genes associated with L. bulgaricus LJJ acid tolerance were initially screened based on the results of wholegenome sequencing and gene annotation.
Due to the difficulty of genetic manipulation in L. bulgaricus LJJ, understanding of their acid tolerance development has been largely based on comparative genomics and differences in the laboratory type strains. Through the amplification and sequencing analysis of 16 candidate acid-tolerant genes in 7 strains, the acid-tolerant genes were screened and identified as dapA, dapH, and lysC. The survival advantage in acidic environment conferred by the three acid-tolerant genes was further demonstrated by RT-qPCR results.
Analysis of KEGG metabolic pathways for dapA, dapH, and lysC indicated that they are all involved in lysine synthesis. Among them, lysC and dapA are the major key genes in the synthesis of lysine and are involved in all synthetic pathways, such as succinyl-DAP pathway, acetyl-DAP pathway, DAP dehydrogenase pathway, and DAP aminotransferase pathway.
Moreover, lysC encodes aspartate kinase, which converts aspartate to phosphorylated aspartate. Phosphorylated aspartate is not only involved in lysine metabolism but also involved in amino acid metabolisms such as glycine, serine, threonine cysteine and phenylalanine, 2-oxocarboxylate acid metabolism, and various secondary metabolic pathways (Lee et al. 2007). It is one of the important genes that enable strains to cope with changes in the external environment.
Although we have not found the gene lysC in L. bulgaricus ATCC11842, we found its compensatory gene ask, which belongs to the same aspartic kinase family as lysC. The interaction about the compensation gene ask is available in Supplementary files (Table S1). Both dapA and dapH are unique genes, and no compensation has been found in L. bulgaricus ATCC11842, which may result in lysine synthesis always being at a lower level.
As a kind of basic amino acid, lysine can regulate the acidbase balance and help the strains' survival under acid stress. The acetylation modification of a protein generally refers to the transfer of an acetyl group from acetyl coenzyme A (Acyl-CoA) to a protein-specific lysine ε-amino group to form an acetylated lysine. The acetylation modification of lysine is also an important way to regulate the metabolic activities of prokaryotes (Kouzarides 2000). Most of the proteins with acetylation are cytoplasmic proteins, which play a role closely related to cell metabolism, transcription, translation, etc. The acetylation of lysine can sense the energy state of cells and regulate the metabolism of cells through Acyl-CoA and nicotinamide adenine dinucleotide (NAD + ). It is a conservative regulation of metabolism in both eukaryotic and prokaryotic organisms (Wang et al. 2010). Recent researches have shown that the addition of amino acids such as glutamic acid, arginine, lysine, and isoleucine can also increase the acid tolerance of cells (Reeve and Reid 2016;Wu et al. 2013;Wu et al. 2012). Therefore, we inferred that the three acid-tolerant genes grant L. bulgaricus LJJ with relatively strong acid-tolerance by regulating the synthesis of lysine. patently stored in China General Microbiological Culture Collection Center (CGMCC No. 2687). The datasets supporting the results of this article are included within the article and its supplementary files. Requests to access any strains shown in this manuscript should be directed to Xiaoyang Pang (pangxiaoyang@163.com) and Jiaping Lv (lvjiapingcaas@126.com).

Compliance with ethical standards
Conflict of interest The authors declare that they have no competing interests.
Ethics approval Not applicable.