Genome mining discovery of hydrogen production pathway of Klebsiella sp. WL1316 fermenting cotton stalk hydrolysate

Genome sequencing was used to identify key genes for the generation of hydrogen gas through cotton stalk hydrolysate fermentation by Klebsiella sp. WL1316. Genome annotation indicated that the genome size was 5.2 Mb with GC content 57.6%. Xylose was metabolized in the pentose phosphate pathway via the conversion of xylose to xylulose in Klebsiella sp. WL1316. This strain contained diverse formate-hydrogen lyases and hydrogenases with gene numbers higher than closely related species. A metabolic network involving glucose, xylose utilisation, and fermentative hydrogen production was reconstructed. Metabolic analysis of key node metabolites showed that glucose and xylose metabolism influenced biomass synthesis and biohydrogen production. Formic acid accumulated during fermentation at 24–48 h but decreased sharply after 48 h, illustrating the splitting of formic acid to hydrogen gas during early-to-mid fermentation. The Kreb’s cycle was the main competitive metabolic branch of biohydrogen synthesis at 24 h of fermentation. Lactic and acetic acid fermentation and late ethanol accumulation competed the carbon skeleton of biohydrogen synthesis after 72 h of fermentation, indicating that these competitive pathways are regulated in middle-to-late fermentation (48–96 h). This study is the first to elucidate the metabolic mechanisms of mixed sugar utilisation and biohydrogen synthesis based on genomic information.


Introduction
The traditional fossil fuels meet 88% of our energy needs, which is likely to increase by 50% within the next five decades. However, at the present rate of consumption, these reservoirs may be depleted by the year 2150 (Patel et al. 2021a, b). Moreover, burning of fossil fuels, emissions of greenhouse gases, and generation of large quantum of wastes have caused serious environmental pollution issues (Patel et al. 2021a, b). So under such combined pressure of decreasing fossil energy reserves and increasing environmental pollution, making full use of lignocellulose and other renewable resources to develop clean energy has attracted the attention of the scientific community worldwide (Boodhun et al. 2017;Rodionova et al. 2017;Tian et al. 2019). Hydrogen energy is the cleanest renewable energy on the earth, due to its non-polluting end-product (water) of combustion and high energy efficiency (122 kJ/g) (Patel et al. 2019). Among current production methods, hydrogen produced by the dark fermentation of lignocellulosic materials by microorganisms is one of the most prominent strategies (Kamaraj et al. 2020;Kumar et al. 2019;Shanmugam et al. 2020). In this method, the selection of the fermentation strain is of great importance. However, fermentative hydrogen production by single wild strains is relatively low. Therefore, measurements, such as metabolic pathway analysis and engineering modification, are needed for strains with clear genome information (Lu et al. 2016;Jo and Cha 2015;Lee et al. 2019;Zhang et al. 2020) in order to reconstruct the hydrogen production metabolic pathways and increase the yield of hydrogen produced by hydrogen-producing strains.
In recent years, with the development of genome sequencing, biosynthesis networks of biofuel-generating microorganisms have been analysed and regulated using genomics (Bruce et al. 2016;Delegan et al. 2019;Mekanik et al. 2019). Genome-scale information related to microbial synthesis metabolic pathways is necessary for the analysis the metabolic network for the accurate synthesis and metabolic regulation of biofuels. As early as 2003, Kalia et al. analysed the KEGG metabolic pathways of several hydrogengenerating microorganisms based on 84 genomic sequences and 92 partial genomic sequences to screen for new strains with a high efficiency for hydrogen production. In the field of bioenergy, several reports have elucidated the biofuel synthesis pathway and regulatory mechanisms using genomics information. For example, the whole genome sequencing of Clostridium butyrate INCQS635, which is able to produce ethanol and butanol fuel, revealed the metabolic mechanism for the conversion of complex sugars into biofuels. As for hydrogen-producing microbes, genomic information has been applied in the following ways: (1) to search for novel hydrogenase genes of highly efficient hydrogen-producing microbes (Kamalaskar et al. 2016); (2) to analyse metabolic pathways for microbial hydrogen synthesis (Khanna et al. 2013;Castro et al. 2013);and (3) to reveal the regulatory genes for biohydrogen synthesis in wild and mutant hydrogen-producing strains via comparative omics analysis (Gökçe et al. 2017).
Klebsiella is a facultative anaerobic hydrogen-producing bacterium with a high hydrogen production potential, for example, Klebsiella sp. WL1316 was reported to obtain hydrogen yield of 1.442 ± 0.075 mol mol −1 sugar consumed byE our group, which was higher than that obtained by the reported use of pure cultures (Li et al. 2018). At present, research on hydrogen-producing bacteria has mainly focused on the optimisation of the fermentation conditions and the identification of cheap substrates. By contrast, studies on the analysis and regulation of its hydrogen-producing metabolic pathways remain relatively scarce. In general, there is a lack of conclusive results regarding the hydrogen production metabolism of Klebsiella. Therefore, the metabolic regulation of hydrogen production pathways in Klebsiella is not as well understood as in E. coli (Shekhar et al. 2021;Soo et al. 2017), with most studies choosing to focus on the overexpression of key enzyme genes in specific metabolic pathways (Bai et al. 2012;Wu et al. 2014;Jawed et al. 2016). However, some strains of Klebsiella have undergone genome sequencing, including Klebsiella oxytoca SA2 with nitrogen fixation , Klebsiella sp. BRl6-2 with lignin degradation (Woo et al. 2014), Klebsiella pneumoniae with drug resistance (Hua et al. 2014), and Klebsiella oxytoca HKOPL1 isolated from faeces of giant pandas (Jiang et al. 2014). However, there have been no reports on the wholegenome sequencing of hydrogen-producing strains of Klebsiella, despite the fact that it would provide insights into its hydrogen production metabolic pathways.
In this study, the whole genome of Klebsiella sp. was sequenced with the aim of reconstructing the metabolic network for hydrogen production from cotton stalk hydrolysate to identify key metabolic pathways for glucose and xylose utilisation and biohydrogen synthesis. Moreover, the essential enzyme activities were measured, and metabolic flux analysis was developed by detecting the metabolites in the node of key pathways to confirm the genome-scale reconstructed metabolic pathways and provide a new interpretation of the metabolic basis for hydrogen production from cotton stalk hydrolysate.

Microorganism
In this study, Klebsiella sp. WL1316 was used for genomic DNA extraction and fermentative hydrogen production, which obtained the ability to utilize cotton stalk hydrolysate for fermentative hydrogen production (Li et al. 2018. The accession number of this strain's 16S rDNA gene sequence had been reported as KT321102 (Li et al. 2018). The medium, comprised of 10 g/L xylose, 10 g/L glucose, 5 g/L beef extract, 10 g/L peptone, 5 g/L NaCl, 0.5 g/L KH 2 PO 4 , and 1 g/L MgSO 4 ·7H 2 O, was used to activate the strain and cultivate the seed culture (Li et al. 2018).

Genomic DNA extraction, purification, and sequencing
Genomic DNA was extracted and purified using an Ezup Column Bacteria Genomic DNA Purification Kit (Sangon, Shanghai, China) following the manufacturer's instructions. The purified genomic DNA was sent to Beijing Novogene Biological Information Technology Co. Ltd. for wholegenome sequencing using the MiSeq sequencing platform. The sequencing data were deposited into the NCBI database under the BioProject accession number PRJNA611005.

Genome assembly and preliminary analysis
GC and depth correlation analysis and K-mer analysis were performed before assembly. Further, SOAPdenovo software was used to assemble the processed reads, and SOAPaligner sequence alignment software was used to align the reads with the assembly results. The de novo prediction of the assembly results was developed using the Glimmer 3.0 gene prediction software, and GeneMarkS software was used to predict the coding gene; tRNAscan software was used to predict the tRNA region and secondary structure; sRNA was predicted using Rfam software; RepeatMasker software was used to predict the scattered repeats; and TRF was used to search the tandem repeat sequences (Zerbino and Birney 2008;Tarailo-Graovac and Chen 2009;Ye et al. 2015;Xu et al. 2016).

Bioinformatics analysis based on the genome sequencing results of Klebsiella sp. WL1316
The genomic data of Klebsiella sp. WL1316 were blasted through the GO (Gene Ontology), COG (Cluster of Orthologous Groups of proteins), and KEGG (Kyoto Encyclopedia of Genes and Genomes) databases for functional gene prediction (Yu et al. 2006;Horton et al. 2007;Jamialahmadi et al. 2016). As a result, the KEGG metabolic pathways and functional proteins related to glucose metabolism and fermentative hydrogen production by of Klebsiella sp. were identified. Further comparative genomic analysis was performed using five bacteria, Klebsiella pneumoniae subsp. HS11286, Klebsiella pneumoniae subsp. NTUH-K2044, Klebsiella oxytoca KONIH1, Enterobacter ludwigii EcWSU1, and Enterobacter sp. R4-368, which were selected as comparative objects, of which the last two strains were reported to be hydrogen-producing bacteria. The genome characterisation of the five reference strains is illustrated in Supplementary Table 1. The genes encoding the hydrogenase functional proteins were manually mined based on the annotated genome data. The gene numbers of different functional proteins were statistically analysed, and a heatmap was constructed using HemI software.

Cultivation and sampling
After activation, a single colony strain was picked and inoculated into the seed culture medium. The seed was then agitated at a speed of 120 r/min for 15-16 h under 37 °C. The OD 600 of the inoculum was adjusted to approximately 1.0 and inoculated into the fermentation medium at an inoculation size of 10% (v/v). The fermentation was carried out using a self-designed fermenter and following the optimized conditions previously reported by our group: initial sugar concentration of 40 g/L, fermentation temperature 37 °C, initial pH 8.0 (Li et al. 2018).

Measurement
The volume of the hydrogen gas was examined by 1 mol/L NaOH displacement in an inverted burette, and the biohydrogen concentration was measured using a hand-held hydrogen detector (KP810H20; Zhong'an Electronic Detection, Henan, China). The volume of hydrogen gas was measured at 6-h intervals and added after 24 h of daily hydrogen production. The hydrogen production volume measured every 24 h was accumulated and converted into the cumulative hydrogen production in the corresponding fermentation period. At the end of fermentation, the fermentative broth was collected and centrifuged at 8000 × g for 10 min and filtered through syringe filters with a 0.22-μm membrane before analysis. The glucose and xylose concentrations were measured using a glucose detection kit (Comin, Suzhou, China) and a xylose detection kit (ZZStandard, Shanghai, China). The concentrations of key node metabolites, such as pyruvic acid, citric acid, succinic acid, lactic acid, acetic acid, formic acid, and ethanol, were measured using detection kits (ZZStandard, Shanghai, China) using an automatic microplate reader (Multiskan FC; Thermo Scientific, Massachusetts, USA).

Results and discussion
General features of the genome of Klebsiella sp.

WL1316 and functional gene prediction
In this study, genome sequencing and characterisation were conducted on Klebsiella sp. WL1316 to gain a comprehensive understanding of the functional genes of this strain, especially those responses to glucose and xylose metabolism and hydrogen production. The genome characteristics are listed in Table 1. The genome of Klebsiella sp. WL1316 is a chromosomal DNA with a total length of 5.2 Mb and a GC content of 57.6%. A total of 4915 coding genes were predicted, up to 86.7% of the whole genome, and 76.7% of the predicted coding genes encoded products possessed definite functions, while the rest were hypothesized proteins or unknown proteins. The functional protein-coding genes were annotated by aligning the sequences to the COG, GO, and KEGG databases. The COG and GO annotated proteincoding gene numbers reached 3768 and 3622, respectively, representing 76.7% and 73.7% of the total gene numbers, respectively. The COG and GO annotated functional categories are illustrated in Supplementary Fig. 1 and Supplementary Fig. 2. In addition, the KEGG annotated protein-coding genes reached 3408 genes, which were involved in 152  Fig. 3).

Annotation of the metabolic pathways for glucose and xylose utilisation and bioconversion
In a previous study, we reported that Klebsiella sp. WL1316 intrinsically possesses the ability to utilize glucose and xylose produced by lignocellulosic hydrolysate (Li et al. 2018). Therefore, we firstly focused on the sugar metabolic pathway and found that glucose could be metabolized and subsequently converted in the fermentation process via the EMP and PPP pathways, as illustrated in Supplementary Fig. 4. However, there was no reference pathway for xylose metabolism that could be blasted from the KEGG database. Therefore, several functional proteins related to xylose transportation and metabolism were further mined based on the gene annotation results.
As a result, we discovered that the Klebsiella sp. WL1316 was equipped with the ribose/xylose/arabinose/galactoside ABC-type transport system, as well as the D-xylose transport system ATP-binding component, D-xylose ABC transporter, and periplasmic D-xylose-binding protein, which are conducive to the transmembrane transport of xylose in bacteria. Furthermore, in terms of xylose metabolism, this strain was found to possess xylose isomerase and xylulokinase, which isomerises xylose to xylulose and further phosphorylates xylulose-5-phosphate to enter the pentose phosphate pathway for metabolism. The schematic pathways are illustrated in Fig. 1. Such xylose metabolism way was similar to the genome-annotated xylose-metabolizing

Identification of the genes coding for key functional proteins for hydrogen production
According to the literature, Klebsiella is a facultative anaerobic bacterium that is able to generate hydrogen gas via mixed acid fermentation pathway, in which the key enzyme for hydrogen production is formate-hydrogen lyase (Jawed et al. 2016). In addition, hydrogenase is also an important enzyme that catalyses the hydrogen production process. As such, we analysed the hydrogenase and formate-hydrogen lyase-related functional proteins of Klebsiella sp. WL1316 based on genome annotation results. As shown in Fig. 2, this hydrogen-producing bacterium possesses diverse hydrogenase and formate-hydrogen lyase-related functional proteins. The hydrogenase functional proteins included the protein subunits of hydrogenase, the components of the Fe-S cluster and nickel-iron ion binding protein in the hydrogenase active center, and the assembly protein in the metal ion binding center of hydrogenase. The hydrogenase-related gene cluster is illustrated in Fig. 2a. The formate-hydrogen lyase functional proteins included multiple protein subunits, formate-hydrogen lyase regulatory proteins and transcriptional activators, and formate-hydrogen lyase Fe-S protein (Fig. 2b). The distributions of gene clusters of hydrogenase and formate-hydrogen lyase were all located in the genome (Fig. 2c). The annotated functional proteins and their Gene_ ids are listed in Supplementary Table 2. The diversity of hydrogenase and hydrogen formate lyase indicated that the strain had a strong hydrogen production potential at the nucleic acid and protein levels. Hydrogenase is the most important functional enzyme that catalyses the hydrogen production process in hydrogen-producing bacteria (Shekhar et al. 2021;Wang and Yin 2021). Therefore, we further compared the difference in the category and number of hydrogenase functional proteins between Klebsiella sp. WL1316 and the five reference bacteria based on the bacterial genome information (Fig. 3); here, Klebsiella sp. WL1316 and the five reference bacteria were belonged to Enterobacteriaceae, especially Enterobacter ludwigii EcWSU1 and Enterobacter sp. R4-368 were reported hydrogen-producing bacteria, so the comparative genomic analysis not only ensured the similarity of these strains in genetic relationship, but also ensured the effective comparison of hydrogen production information. Figure 3a reveals  which is close to the hydrogen-producing bacterium Enterobacter sp. R4-368 (Fig. 3b). This result indicated that Klebsiella sp. WL1316 might obtain a high hydrogen production potential from the genomic level, which got the highest diversity and quantity of genes encoding the hydrogenase proteins among the tested bacteria.

Construction of the genome scale metabolic model for hydrogen production
Based on the above analysis of glucose metabolic pathways, xylose metabolism-related enzymes and functional proteins, as well as enzymes and functional proteins related to biological hydrogen synthesis, the metabolic network for hydrogen production by Klebsiella sp. WL1316 from the fermentation of cotton stalk hydrolysate was reconstructed. This metabolic network covers not only the glucose and xylose metabolic pathways, but also the biohydrogen synthesis pathway and some competitive metabolic branches, as illustrated in Fig. 4. The biohydrogen synthesis pathway included the mixed acid fermentation pathway and hydrogenase catalytic pathway, while the competitive metabolic branches included the succinic, lactic, acetic acids, and ethanol generation pathways, which acted as the branches of the mixed acid fermentation pathway.

Metabolic analysis for hydrogen production of Klebsiella sp. WL1316 from the fermentation of cotton stalk hydrolysate
As a facultative anaerobic bacterium, Klebsiella sp. typically generates hydrogen gas via mixed acids fermentation pathway (Sivaramakrishnan et al. 2021), so the measurement of organic acids produced in this pathway is basic to monitor the distribution of hydrogen production flow. Pyruvic acid is an important intermediate metabolite of Klebsiella sp. WL1316 for the metabolism of glucose and xylose in cotton stalk hydrolysate. This strain enters the mixed acid pathway for fermentative hydrogen production, which was initiated by the decarboxylation of pyruvate. Therefore, pyruvate decarboxylase (PDC) is the key enzyme involved in this metabolic pathway. Supplementary Fig. 5 shows that PDC activity was relatively high after 24 h of fermentation, indicating that this enzyme promoted pyruvate decarboxylation and then entered the fermentation pathway. Thereafter, PDC activity decreased gradually to 96 h, and the PDC activity declined to a relatively low level, almost 0, after 120 h of fermentation. The lactate dehydrogenase (LDH) activity was low at 24-48 h and increased rapidly to the highest level (0.027 U/10 4 cells) at 72 h, indicating that high LDH activity in 72-h fermentation may lead to a large accumulation of lactic acid and compete for the distribution of metabolic carbon skeleton in the biohydrogen synthesis pathway. Such a competitive effect for biohydrogen synthesis controlled by LDH was consistent with that reported by Jung et al. (2012). The aldehyde dehydrogenase (ALDH) activity reached the highest level after fermenting 48-96 h, indicating that the metabolic pathway of acetic acid and ethanol synthesis mainly competed for the metabolic carbon skeleton of the biohydrogen synthesis in the mid-to-late fermentation stage. The metabolic analysis was first developed toward the key metabolites in the conversion of sugars, such as glucose, xylose, and xylulose (Fig. 5). Glucose and xylose consumption rates decreased with fermentation time during the fermentation stage, and the results showed that glucose was metabolized effectively, which promoted the synthesis and fermentation of cell material. According to the biohydrogen synthesis metabolic network of Klebsiella sp. WL1316, Fig. 4 The metabolic network for fermentative hydrogen production involving glucose and xylose utilisation, biohydrogen synthesis, and several competitive metabolic pathways xylose was metabolized by converting xylose to xylulose through the PPP pathway; therefore, the generation rate of xylulose was further monitored. Although the generation rates of xylulose were lower than consumption rates of xylose during the whole fermentation stage, its generation rates also gradually decreased, indicating that xylulose was also rapidly metabolized via the PPP pathway, which was provided as a carbon skeleton for the fermentation process.
A survey of the literature showed that key node metabolites, such as succinic, lactic, acetic, formic acids, and ethanol, are typically present in the mixed acid fermentation pathway (Jenni et al. 2013;Jung et al. 2012;Yoshida et al. 2005;Lu et al. 2009;Zhao et al. 2009). Therefore, the time-dependent generation profiles of these key node metabolites were analysed (Fig. 6). Pyruvic acid is the most important metabolic intermediate, which initiates the mixed acid fermentation pathway, especially for hydrogen gas generation. Pyruvate is also a key node metabolite for mixed acid fermentation. In this study, the pyruvate node obtained a high generation rate at 24 h of fermentation and then decreased rapidly, indicating that pyruvate underwent rapid decarboxylation transformation, with more carbon skeletons flowing to the fermentation process. The splitting of formic acid is the most important hydrogen-producing pathway for facultative anaerobes Lu et al. 2009;Yoshida et al. 2005). The results revealed that the generation rate of formic acid was relatively low at 24 h of fermentation but reached a peak value at 48 h, indicating that a large amount of formic acid accumulated between 24 and 48 h and that the metabolic carbon skeleton mostly flowed to the mixed acid fermentation pathway at this fermentation stage. Subsequently, the generation rate of formic acid decreased sharply, indicating that formic acid may be used for biohydrogen synthesis in large quantities during the mid-to-late fermentation stage. The shift in citric acid concentration revealed that there was a high generation rate after 24-48-h fermentation at the citric acid metabolic node, which enabled many of the metabolic carbon skeletons to originate from the flow of pyruvate nodes to the TCA cycle. The generation rates of succinic acid were relatively low during the whole fermentation stage, indicating that succinic acid accumulated less under anaerobic fermentation conditions, indicating that it may compete for the metabolic carbon skeleton of biohydrogen synthesis with relatively weak competitiveness. The generation rate of the lactic acid node plays an important role in the whole fermentation stage, which was enhanced sharply after fermenting for 48 h, and reached a peak value at 72 h, indicating that the lactic acid production pathway may be the main competitive branch for biohydrogen synthesis. In addition, the generation rates of the ethanol and acetic acid in the same metabolic branch also significantly affected biohydrogen synthesis. The generation rate of the ethanol remained at a high level during the whole fermentation stage and decreased gradually with the fermentation time but increased to a certain extent at 120 h of fermentation. This indicates that ethanol may accumulate more in the late fermentation period at this time, and it may be an important competitive branch for biohydrogen synthesis. The metabolic flux of acetic acid was higher during the fermentation period of 48-72 h, indicating that the acetic acid production was higher in this fermentation stage, which may be strongly competitive for the metabolic carbon skeleton. In general, acetic acid was produced in a large quantity after 48 h of fermentation, and ethanol was accumulated in large quantities in this  Fig. 6 Metabolic analysis of the key node metabolites for biohydrogen production process in Klebsiella sp. WL1316 metabolic branch, indicating that this metabolic branch is an important branch in competing for the carbon skeleton of biohydrogen synthesis after 48 h of fermentation.

Conclusions
Overall, this study obtained novel results with regards to three aspects: (1) demonstrating the glucose and xylose metabolic pathways at the gene level of Klebsiella sp. WL1316, which was conformed to this strain's fermentative performance that utilize the glucose and xylose in cotton stalk hydrolysate for hydrogen production; (2) revealing the high effective hydrogen production potential of Klebsiella sp. WL1316 from the gene diversity level and confirming the occurrence of multiple pathways affecting the efficient production of hydrogen gas that are commonly found in facultative anaerobes; and (3) constructing a genome scale of the metabolic network model (GSMM) for hydrogen production and verifying the occurrence of the competitive and hydrogen biosynthesis pathways in Klebsiella sp. via measurement of consumption or generation rates, which is crucial for understanding the hydrogen production potential of this strain. On this basis, improvements in the reducing sugar utilisation pathways and channelling the metabolic flux of the competitive and hydrogen biosynthesis pathways will be needed to fine-tune the metabolic capabilities of this strain for highly efficient hydrogen production. Hence, the GSMM constructed in this study could play a pivotal role in the design and acceleration of the metabolic engineering approaches to harness the potential of this hydrogenproducing bacterium in subsequent studies.