Whole genome sequencing and bioinformatics analysis of Photobacterium kishitanii FJ21


 Background: Photobacterium kishitanii FJ21, isolated and purified in our laboratory, is a Gram-negative bacterium that emits blue-green light under normal conditions. To understand its lux operon, which confers luminescence activity, the whole genome of Photobacterium kishitanii FJ21 was sequenced with Nanopore technology and analyzed bioinformatically. The analyses comprised genome assembly, GC content, gene prediction and functional annotation, phylogenetic tree construction, genome collinearity analysis, prediction of secondary metabolite biosynthetic gene clusters, and prediction of the secondary and tertiary structures of the proteins encoded by the lux genes. Results: Photobacterium kishitanii FJ21 carries the luxC, luxD, luxA, luxB, luxF, luxE and luxG genes, and the proteins they encode are predicted to be somewhat hydrophilic. The genome contains a chromosome with a total length of 4853277 bp and a GC content of 39.23%. In total, 3141, 1769, 2472, 4070, 3514 and 1413 genes were annotated in the COG, KEGG, GO, RefSeq, Pfam and TIGRFAMs databases, respectively. Three types of secondary metabolite gene clusters were predicted: RiPP-like, betalactone and arylpolyene. A total of 870 genes were annotated in the PHI database, and the antibiotic resistance genes of the strain were annotated with the CARD database. Conclusion: This study reports the whole-genome sequence of Photobacterium kishitanii FJ21; the results provide a basis for further study of its lux genes and luminescent activity and for mining other functional genes of this strain.


Background
Luminescent bacteria are Gram-negative bacteria that emit blue-green light under normal conditions [1]; they live mainly in the marine environment as free-living organisms or parasites.
The bioluminescence of luminescent bacteria results from an enzyme-catalyzed reaction encoded by the lux genes. The luxA and luxB genes encode the α and β subunits of luciferase, respectively; luxC, luxD and luxE encode the fatty acid reductase complex that synthesizes the long-chain aldehyde substrate; and luxG encodes a flavin reductase [2]. The conserved genes luxC, luxD, luxA, luxB, luxE and luxG are present in all luminescent bacteria discovered so far [3], and additional genes such as luxI, luxR and luxF also occur [4]. Although luminescent bacteria share the same lux genes, they differ greatly in growth behavior, luminescence intensity and regulation of bioluminescence [5]. In the luminescence reaction, a specific intracellular luciferase uses molecular oxygen to oxidize FMNH2 and a long-chain aldehyde (RCHO) to FMN and the corresponding fatty acid (RCOOH), releasing blue-green light [6]:

$$\mathrm{FMNH_2 + RCHO + O_2 \xrightarrow{\text{luciferase}} FMN + RCOOH + H_2O} + h\nu \quad (\lambda_{\max} \approx 490\ \mathrm{nm})$$

Luminescent bacteria can act as biosensors [7] and can also produce antibacterial compounds [8], lipase [9], asparagine [10] and esterase [11]. Because the luminescent bacteria assay is highly sensitive, simple to perform, fast and suitable for real-time monitoring, it has been widely used to monitor water toxicity and environmental pollutants [12] and to test the acute and chronic toxicity of heavy metal mixtures [13]. The luminescent bacteria currently used in water quality and environmental monitoring are mainly Photobacterium phosphoreum, Vibrio fischeri and Vibrio qinghaiensis [14]. Photobacterium phosphoreum is the test organism in the Chinese national standard GB/T 15441-1995 for determining the acute toxicity of water. The acute toxicity test relies on the sensitivity of the luminescence process: whenever the respiration or another physiological process of the bacteria is disturbed, their luminescence intensity changes [15].
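The toxicity principle above is usually quantified as the percentage loss of light relative to an untreated control. The following is a minimal Python sketch under stated assumptions: luminescence readings are in arbitrary units, and the EC50 (concentration giving 50% inhibition) is linearly interpolated on a log10 concentration axis. This is a common convention chosen for illustration, not a calculation procedure taken from GB/T 15441-1995, and the function names are hypothetical.

```python
import math
from typing import Sequence


def inhibition_percent(sample_lum: float, control_lum: float) -> float:
    # Percent reduction in light output of an exposed sample relative to an untreated control.
    if control_lum <= 0:
        raise ValueError("control luminescence must be positive")
    return (1.0 - sample_lum / control_lum) * 100.0


def ec50_log_interp(concs: Sequence[float], inhibitions: Sequence[float]) -> float:
    # Assumption: concentrations are positive, sorted ascending, and inhibition rises with dose.
    # Finds the first interval that brackets 50 % and interpolates on log10(concentration).
    points = list(zip(concs, inhibitions))
    for (c1, i1), (c2, i2) in zip(points, points[1:]):
        if i1 <= 50.0 <= i2 and i2 != i1:
            frac = (50.0 - i1) / (i2 - i1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    raise ValueError("50 % inhibition is not bracketed by the data")


if __name__ == "__main__":
    # Hypothetical readings: control = 100 units, three concentrations of a test substance.
    control = 100.0
    readings = {1.0: 80.0, 10.0: 60.0, 100.0: 20.0}
    concs = sorted(readings)
    inhib = [inhibition_percent(readings[c], control) for c in concs]
    print(inhib, ec50_log_interp(concs, inhib))
```

Interpolating on a log scale reflects the roughly sigmoidal dose-response seen when inhibition is plotted against log concentration; a fitted logistic model would be the more rigorous alternative.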
With the development of high-throughput sequencing technology, the genomes of many microorganisms have been sequenced. Whole-genome sequencing is an important foundation for research on microbial molecular mechanisms and for their development and use. To better understand the structure of the proteins encoded by the lux genes of Photobacterium kishitanii and some of its functional genes, this study performed whole-genome sequencing and bioinformatics analysis of Photobacterium kishitanii FJ21 using Nanopore sequencing technology. The results will provide a basis for in-depth study of the lux genes, luminescent activity and other functional genes of Photobacterium kishitanii FJ21.

Genome Sequencing and Assembly
The sequencing data are shown in Table 1 and the assembly results in Table 2. The genome size was 4853277 bp, with 4131 predicted coding genes and an N50 of 3252201 bp. A, T, G and C accounted for 30.49%, 30.29%, 19.72% and 19.50% of all bases, respectively, and the GC content was 39.23%. The sequencing depth distribution is shown in Figure 1, and the circular genome map in Figure 2.
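GC content and N50 follow standard definitions. The sketch below is a minimal Python illustration of those definitions, not the pipeline used in this study: GC content is computed over unambiguous A/C/G/T bases only (an assumption; tools differ in how they treat N and other IUPAC codes), and N50 is the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length.

```python
from typing import Iterable, List


def gc_content(seq: str) -> float:
    # Percent G+C among unambiguous bases; N and other IUPAC codes are ignored.
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def n50(lengths: Iterable[int]) -> int:
    # Sort contigs longest first and return the length at which the running
    # total first reaches at least half of the assembly size.
    ordered: List[int] = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    print(gc_content("ATGCGCNNAT"), n50([2, 3, 4, 5, 6]))
```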
The genome sequence of Photobacterium kishitanii FJ21 has been submitted to the GenBank database under accession number SRX10356131.

Genome Structure and Function Annotation
The genome contains 4131 CDSs with a total length of 4027962 bp, and one CRISPR sequence (clustered regularly interspaced short palindromic repeats, found in many bacteria and archaea). No genomic islands were predicted. The results of genome structure prediction are shown in Table 3. In total, 1769, 3141, 2472, 4070, 3514 and 1413 genes were annotated in the KEGG pathway, COG, GO, RefSeq, Pfam and TIGRFAMs databases, respectively.
The predicted protein sequences were compared with the COG database to classify homologous genes, assigning the coding genes to categories covering information storage and processing, cellular processes and signaling, metabolism, and unknown functions [34]. As shown in Figure 3, 3141 proteins received COG annotations, accounting for 76.03% of all predicted genes. The categories energy production and conversion; amino acid transport and metabolism; carbohydrate transport and metabolism; translation, ribosomal structure and biogenesis; transcription; cell wall/membrane/envelope biogenesis; and inorganic ion transport and metabolism contained 232, 328, 210, 265, 266, 267 and 194 genes, respectively.
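The annotation percentages quoted for COG (76.03%), GO (59.84%) and KEGG (42.82%) are each database's annotated gene count divided by the 4131 predicted coding genes. The short Python sketch below reproduces that arithmetic from the counts given in the text; it only recomputes ratios and performs no annotation itself.

```python
TOTAL_CDS = 4131  # predicted coding genes reported for strain FJ21

# Annotated gene counts per database, as reported in the text.
ANNOTATED = {
    "COG": 3141,
    "KEGG": 1769,
    "GO": 2472,
    "RefSeq": 4070,
    "Pfam": 3514,
    "TIGRFAMs": 1413,
}


def annotation_rate(n_annotated: int, total: int = TOTAL_CDS) -> float:
    # Share of predicted genes with at least one hit in a database, in percent (2 d.p.).
    return round(100.0 * n_annotated / total, 2)


if __name__ == "__main__":
    for db, n in ANNOTATED.items():
        print(f"{db}: {annotation_rate(n):.2f}%")
```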

GO function classification
As shown in Figure 4, 2472 genes received GO annotations, accounting for 59.84% of all predicted genes. GO terms are divided into three categories: molecular function, biological process and cellular component [16]. In molecular function, many genes were annotated with molecular transducer activity, antioxidant activity and transporter activity. In biological process, many genes were annotated with metabolic process, positive regulation of biological process and negative regulation of biological process. In cellular component, many genes were annotated with cell, cell part and membrane part. GO annotation therefore helps clarify the biological significance of these genes.

KEGG pathway analysis
A total of 2945 genes were mapped to 208 KEGG metabolic pathways (Figure 5), and 1769 genes were effectively annotated, accounting for 42.82% of all predicted genes. The pathways fall into five categories: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes and Organismal Systems. Within Metabolism, most genes were annotated to carbohydrate metabolism, energy metabolism and amino acid metabolism. The main pathways were oxidative phosphorylation (ko00190, 40 genes), arginine and proline metabolism (ko00330, 16 genes), glycolysis/gluconeogenesis (ko00010, 31 genes) and the citrate cycle (TCA cycle; ko00020, 19 genes). Organismal Systems had the fewest annotated genes.
Analysis of interaction genes between pathogen and host
In total, 870 genes were annotated in the PHI database (Figure 6), of which 326 genes (55.72%) were associated with reduced virulence. There were 43 increased-virulence genes, 217 unaffected-pathogenicity genes, 82 loss-of-pathogenicity genes and 103 effector genes, while genes associated with resistance to chemicals and sensitivity to chemicals were the fewest. Most genes in this annotation therefore belonged to the reduced-virulence and unaffected-pathogenicity classes. Effector genes are associated with pathogenicity, but the increased-virulence genes are the key genes for assessing virulence.

Annotation of Resistance Genes in the CARD Database
Table 4 lists the CARD annotation results, including the ARO classification, identity, antibiotic class, resistance mechanism and AMR gene family. The highest identity reached 100%. The resistance mechanisms were antibiotic efflux and antibiotic target alteration. A phylogenetic tree was then constructed from the whole-genome sequence (Figure 7); Photobacterium kishitanii, Photobacterium phosphoreum, Photobacterium aquimaris, Photobacterium malacitanum and Photobacterium carnosum clustered together in the tree.
Physicochemical properties of the proteins encoded by lux genes
The online ExPASy tools (https://www.expasy.org/) were used to predict the physicochemical properties of the proteins encoded by luxC, luxD, luxA, luxB, luxF, luxE and luxG (Table 5). Each protein carries fewer positively charged residues than negatively charged residues, and the isoelectric points range from 4.98 to 5.43; proteins are least soluble, and thus most prone to precipitation, near their isoelectric points. Only the luxF product is predicted to be unstable; the proteins encoded by the other genes are predicted to be stable. In addition, all have negative hydropathicity values, indicating that these proteins are somewhat hydrophilic.
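Two of the properties above can be illustrated directly from a protein sequence. The Python sketch below computes the grand average of hydropathicity (GRAVY, the mean Kyte-Doolittle value per residue; a negative value indicates a hydrophilic protein) and estimates the isoelectric point by bisection on the Henderson-Hasselbalch net charge. It is a simplified sketch, not ProtParam's implementation: the pKa set (EMBOSS values) is an assumption and differs from the position-dependent Bjellqvist values ProtParam uses, so pI estimates can deviate from Table 5. The instability index is not reimplemented here; ProtParam predicts a protein as stable when that index is below 40.

```python
# Kyte-Doolittle hydropathy values used for GRAVY.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed pKa values (EMBOSS set), not ProtParam's Bjellqvist values.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def gravy(seq: str) -> float:
    # Mean hydropathy per residue; raises KeyError for non-standard residues.
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KD[aa] for aa in seq) / len(seq)


def net_charge(seq: str, ph: float) -> float:
    # Henderson-Hasselbalch charge of ionizable side chains plus both termini.
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    # Net charge decreases monotonically with pH, so bisection finds the zero crossing.
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    peptide = "MKDEELLKAAGSDE"  # hypothetical acidic peptide, not a lux protein
    print(round(gravy(peptide), 3), round(isoelectric_point(peptide), 2))
```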

Prediction of secondary and tertiary structures of proteins encoded by lux genes
The secondary and tertiary structures of proteins of unknown structure can be predicted from their amino acid sequences. The secondary structures of the proteins encoded by the lux genes were predicted with the online tool PSIPRED (Figure 8); residues shown in pink form α-helices and those shown in yellow form β-strands. The tertiary structures were predicted with the SWISS-MODEL repository (http://swissmodel.expasy.org/repository/) (Figure 9), which builds models by homology modeling. When the sequence similarity between the target protein and the template protein exceeds 30%, homology modeling can generate a tertiary structure with a prediction accuracy of about 90%.
Except for the luxG product, all proteins encoded by the lux genes shared more than 30% sequence similarity with their templates, so their predicted tertiary structures are likely to be close to the real structures. The predicted secondary and tertiary structures show that the α and β subunits of luciferase, encoded by luxA and luxB, adopt a β-barrel fold, which may reflect the important role of these two genes in the luminescence of luminescent bacteria. Knowing the tertiary structure of a protein is valuable for studying structure-function relationships, for example through molecular docking and virtual screening.
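The 30% threshold above refers to the sequence similarity between target and template. The Python sketch below shows one common way to compute percent identity from a pairwise alignment and to apply such a threshold. It assumes the two sequences are already aligned with "-" as the gap character and counts identity only over columns where neither sequence has a gap; definitions of identity vary between tools, and SWISS-MODEL reports its own identity and model-quality scores (GMQE, QMEAN), which this sketch does not reproduce. The function names are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    # Identity over aligned columns where neither sequence has a gap ("-").
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


def template_is_usable(identity: float, threshold: float = 30.0) -> bool:
    # Rule of thumb from the text: homology models are more reliable above ~30 %.
    return identity > threshold


if __name__ == "__main__":
    target = "MKV-LSTAE"    # hypothetical aligned fragments
    template = "MRVALS-AD"
    pid = percent_identity(target, template)
    print(round(pid, 1), template_is_usable(pid))
```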

Genome collinearity analysis
The basic characteristics of genomes
The genome sizes of the six strains are similar, ranging from 4380538 bp to 4853277 bp, and the number of coding genes ranges from 3739 to 4131 (Table 6). Different strains of the same species have more similar genome characteristics, and strain FJ21 is closest to Photobacterium kishitanii in genome size. Compared with the other strains, FJ21 had more predicted coding genes and RNA genes.
Collinearity analysis
MUMmer (version 3.23) was used to align strain FJ21 with Photobacterium kishitanii, Photobacterium phosphoreum, Photobacterium aquimaris, Photobacterium malacitanum and Photobacterium carnosum. The collinearity and structural variation of the genome sequences are shown in Figure 10. There were 218, 744, 717, 748 and 708 alignment blocks between strain FJ21 and s__Photobacterium kishitanii, s__Photobacterium phosphoreum, s__Photobacterium aquimaris, s__Photobacterium malacitanum and s__Photobacterium carnosum, respectively, covering 86.92%, 70.27%, 64.82%, 65.16% and 56.32% of the FJ21 genome. Overall, the genomes show good collinearity, with a small number of rearrangements such as inversions and translocations, indicating that the six strains still differ considerably in their evolution.
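The text does not state exactly how the coverage percentages were computed from the MUMmer alignments. One plausible approach, sketched here in Python as an assumption, is to take the alignment block coordinates on the FJ21 genome (for example as reported by MUMmer's show-coords; parsing is not shown), merge overlapping blocks so that no base is counted twice, and divide the covered length by the genome length. Coordinates are treated as 1-based and inclusive, and reversed coordinates from inverted blocks are normalized first.

```python
from typing import Iterable, Optional, Tuple


def covered_fraction(blocks: Iterable[Tuple[int, int]], genome_length: int) -> float:
    # Normalize reversed (inverted) blocks, then sort by start position.
    intervals = sorted((min(s, e), max(s, e)) for s, e in blocks)
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Gap before this block: close the current merged interval.
            if cur_start is not None and cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent block: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    # Hypothetical blocks on a 100-bp reference; the last one is inverted.
    print(covered_fraction([(1, 10), (5, 20), (40, 31)], 100))
```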

Prediction and Analysis of Secondary Metabolite Gene Clusters
Genes encoding secondary metabolites are usually clustered in the genome and encode complex multifunctional enzymes. antiSMASH (version 6.0.0) was used to predict the gene clusters in the assembled genome. Three types of secondary metabolite gene clusters, RiPP-like, betalactone and arylpolyene, were predicted in the FJ21 genome (Table 7). The arylpolyene cluster showed 90% similarity to the APE Vf biosynthetic gene cluster.
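As we understand antiSMASH's KnownClusterBlast output, the "similarity" reported for a known cluster such as APE Vf is the percentage of genes in the reference (MIBiG) cluster that have a significant BLAST hit in the query region; it is not a nucleotide identity. The Python sketch below illustrates only that ratio; the gene names and the 10-gene cluster size are hypothetical and are not the actual composition of the APE Vf cluster.

```python
from typing import Iterable


def known_cluster_similarity(reference_genes: Iterable[str],
                             genes_with_hits: Iterable[str]) -> float:
    # Percentage of reference-cluster genes that have a significant hit in the query region.
    ref = set(reference_genes)
    if not ref:
        raise ValueError("reference cluster has no genes")
    matched = ref & set(genes_with_hits)
    return 100.0 * len(matched) / len(ref)


if __name__ == "__main__":
    # Hypothetical 10-gene reference cluster with hits to 9 of its genes.
    reference = [f"ape_{i}" for i in range(1, 11)]
    hits = reference[:9]
    print(known_cluster_similarity(reference, hits))  # 90.0
```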

Discussion
The whole-genome sequence and complete genome map of Photobacterium kishitanii FJ21 were obtained by Nanopore sequencing. In the COG annotation, amino acid transport and metabolism was the most abundant category. In the GO annotation, molecular transducer activity, antioxidant activity and transporter activity accounted for many genes, suggesting that these molecular functions are well represented in the strain. Metabolism-related genes were the largest group in the KEGG pathway annotation, suggesting that the strain has a strong metabolic capacity. Most genes annotated in the PHI database were reduced-virulence or unaffected-pathogenicity genes, indicating that the strain is weakly pathogenic or nonpathogenic. Most luminous bacteria are nonpathogenic [18], although Vibrio harveyi and the two subspecies of Photobacterium damselae are pathogens of many aquatic organisms [19,20]. Many species of Vibrio and Photobacterium are resistant to multiple common antibiotics. We therefore annotated the antibiotic resistance genes of Photobacterium kishitanii FJ21 with the CARD database and found that the strain is predicted to be resistant to eight kinds of antibiotics, through antibiotic efflux and target alteration.
A BLASTN comparison of the FJ21 16S rRNA sequence against the NCBI database showed that FJ21 belongs to the genus Photobacterium, and a phylogenetic tree was then constructed from the whole-genome sequence. The strain contains the luxC, luxD, luxA, luxB, luxF, luxE and luxG genes, and the predicted secondary and tertiary structures of their products show that the α and β subunits of luciferase, encoded by luxA and luxB, have a β-barrel fold, which may be related to the important role of these two genes in the luminescence of luminescent bacteria. However, the function and effect of luxF remain uncertain and require further study. A recent study of the effect of luxF on light intensity in bacterial bioluminescence showed that luxF significantly affects the maximum light intensity [21].
In the genome collinearity analysis, the six strains had similar genome sizes and basic characteristics and showed good collinearity, with a small number of genome rearrangements such as inversions and translocations. Inversions and translocations may cause gene fusions that alter certain functions of a species [22], which may be one reason why different strains of the same species differ. Photobacterium kishitanii FJ21 shows the highest homology with Photobacterium kishitanii ATCC:BAA-1194. A comparison of gene counts showed that FJ21 had more predicted coding genes, tRNAs and rRNAs than the other five strains, which may be related to strain evolution and environmental adaptation.
The whole-genome sequence provides complete genomic information on Photobacterium kishitanii FJ21 and a basis for further study of its lux genes, luminescent activity and other functional genes. Future work could construct recombinant luminescent bacteria or clone the lux genes into a vector for monitoring acutely toxic substances; for example, such a vector can be constructed and introduced into E. coli [23] so that E. coli expresses the luminescence genes [24].

Conclusions
Whole-genome sequencing, genome collinearity analysis and bioinformatics analysis identified strain FJ21 as Photobacterium kishitanii and characterized its lux genes. The strain contains a chromosome with a total length of 4853277 bp and a GC content of 39.23%. Based on the predicted secondary and tertiary structures of the lux gene products, the strain contains the luxC, luxD, luxA, luxB, luxF, luxE and luxG genes. However, the function and effect of luxF remain uncertain. Understanding the lux genes will help clarify the luminescent activity and mechanism of Photobacterium kishitanii FJ21, and the strain could therefore be used for water quality testing or monitoring of acutely toxic substances.

Materials And Methods
Strain and Growth Conditions.
Strain FJ21 was isolated in the Microbiology Laboratory of the College of Biological Science and Technology, Hunan Agricultural University. Freeze-dried powder of Photobacterium kishitanii FJ21 was resuspended in 2 ml of 3% NaCl solution. After three rounds of streaking and plate isolation, the strain was transferred to liquid medium (beef extract 0.5%, tryptone 0.5%, Na2HPO4 0.6%, KH2PO4 0.275%, glycerol 0.3% and NaCl 3%). After 24 h of culture at 25 ℃ and 180 rpm, the strain was preserved in glycerol and sent to a sequencing company for genome sequencing.

Genome Sequencing and Assembly
Nanopore sequencing technology [25] was used for genome scanning and sequencing of the strain. First, high-quality DNA was extracted with a Qiagen kit and a 1D library was constructed. The DNA was then sequenced as single molecules on an Oxford Nanopore Technologies PromethION instrument to obtain the raw sequencing data. After quality control of these data, whole-genome scanning of the strain was completed by bioinformatics analysis.
Assembly: the quality-controlled third-generation (Nanopore long-read) data were assembled with Flye and corrected with Racon using the same long reads. A script was then used to check whether each corrected contig formed a closed circle. After redundant overlapping ends were removed, the start of each sequence was moved to the replication initiation site of the genome with Circlator [26] to obtain the final genome sequence. Sequencing depth analysis: after assembly, the quality-controlled reads were mapped back to the genome with minimap2 [27].

Genome Structure and Function Annotation
Genome structure prediction included coding gene prediction, non-coding gene prediction, CRISPR prediction and genomic island prediction. Coding genes were predicted with Prodigal [28]. For non-coding genes, RNAmmer [29] and tRNAscan-SE 2.0 [30] were used to predict rRNAs and tRNAs, respectively. Other non-coding RNAs (ncRNAs) were predicted by searching the Rfam 13.0 database [32] with Infernal 1.1 [31], retaining predictions longer than 80% of the corresponding sequence length in the database. CRISPRs were predicted with MinCED, and genomic islands with Islander [33].
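The assembly and depth-mapping steps described above can be sketched as an ordered list of command lines. The methods do not give the exact commands, so everything below is an assumption for illustration: the read type (`--nano-raw`), a single Racon round, thread count, file names, and the output prefix are hypothetical choices, and the circularity check and redundant-end trimming are not shown. The Python sketch only builds the commands; `run` executes them with `subprocess` and writes stdout to a file where a tool prints its result there (minimap2 and Racon).

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# Each step: (argv, optional path that receives the tool's stdout).
Step = Tuple[List[str], Optional[str]]


def assembly_steps(reads: str, outdir: str, threads: int = 8) -> List[Step]:
    flye_dir = f"{outdir}/flye"
    draft = f"{flye_dir}/assembly.fasta"  # Flye's default assembly output name
    paf = f"{outdir}/overlaps.paf"
    polished = f"{outdir}/racon.fasta"
    return [
        # Long-read de novo assembly (assumes raw Nanopore reads in FASTQ).
        (["flye", "--nano-raw", reads, "--out-dir", flye_dir,
          "--threads", str(threads)], None),
        # Read-to-draft overlaps in PAF format, used as Racon input.
        (["minimap2", "-x", "map-ont", "-t", str(threads), draft, reads], paf),
        # One round of Racon polishing: <reads> <overlaps> <target>.
        (["racon", "-t", str(threads), reads, paf, draft], polished),
        # Rotate circular sequences to start at the replication origin region (dnaA).
        (["circlator", "fixstart", polished, f"{outdir}/fixstart"], None),
        # Map reads to the final assembly (SAM) for sequencing-depth analysis.
        (["minimap2", "-ax", "map-ont", "-t", str(threads),
          f"{outdir}/fixstart.fasta", reads], f"{outdir}/depth.sam"),
    ]


def run(steps: List[Step]) -> None:
    # Execute each step in order, stopping on the first non-zero exit status.
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            Path(stdout_path).parent.mkdir(parents=True, exist_ok=True)
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)
```

Building the commands separately from running them keeps the sketch inspectable (and testable) without the tools installed; call `run(assembly_steps("reads.fastq", "asm"))` only on a system where Flye, minimap2, Racon and Circlator are available.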
The genome-encoded proteins were extracted and annotated with InterProScan 5 [34], and the TIGRFAMs [35], Pfam [36] and GO [37] annotations were extracted. The encoded proteins were also aligned with BLASTP against the KEGG [38], RefSeq [39] and COG [40] databases, and the best hit with more than 30% coverage was retained as the annotation result. Pathogen-host interaction genes were annotated with the PHI database, and antibiotic resistance genes with the CARD database.
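The "best hit with more than 30% coverage" rule can be sketched as a filter over BLAST+ tabular output. The Python sketch below assumes a custom tabular layout requested with `-outfmt "6 qseqid sseqid pident qstart qend qlen evalue bitscore"`, interprets coverage as query coverage of the single HSP ((qend - qstart + 1) / qlen), and ranks hits by bit score; whether the original analysis used query or subject coverage, and how hits were ranked, is not stated, so these are assumptions.

```python
import csv
import io
from typing import Dict, TextIO

# Assumed column order, produced by e.g.:
#   blastp -query proteins.faa -db refdb \
#          -outfmt "6 qseqid sseqid pident qstart qend qlen evalue bitscore"
COLUMNS = ["qseqid", "sseqid", "pident", "qstart", "qend", "qlen", "evalue", "bitscore"]


def best_hits(handle: TextIO, min_qcov: float = 30.0) -> Dict[str, dict]:
    """Keep the highest-bitscore hit per query among hits with query coverage > min_qcov %."""
    best: Dict[str, dict] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        rec: dict = dict(zip(COLUMNS, row))
        qcov = 100.0 * (int(rec["qend"]) - int(rec["qstart"]) + 1) / int(rec["qlen"])
        if qcov <= min_qcov:
            continue  # alignment covers too little of the query protein
        rec["qcov"] = qcov
        rec["bitscore"] = float(rec["bitscore"])
        current = best.get(rec["qseqid"])
        if current is None or rec["bitscore"] > current["bitscore"]:
            best[rec["qseqid"]] = rec
    return best


if __name__ == "__main__":
    demo = io.StringIO(
        "q1\ts1\t50.0\t1\t50\t100\t1e-10\t80\n"
        "q1\ts2\t60.0\t1\t20\t100\t1e-5\t90\n"
    )
    for query, hit in best_hits(demo).items():
        print(query, hit["sseqid"], round(hit["qcov"], 1))
```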
Phylogenetic tree analysis
A phylogenetic tree of Photobacterium kishitanii FJ21 and related strains was constructed with MEGA 7.0 using the neighbor-joining method.
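MEGA is an interactive application, so the Python sketch below does not reproduce its implementation; it is a textbook Saitou-Nei neighbor-joining sketch that shows how the method turns a pairwise distance matrix into an unrooted tree. It assumes the distances have already been computed (how distances were derived, and any bootstrap support, are not shown), and the five-taxon matrix in the example is a standard hypothetical teaching example, not data from this study. At each step the pair minimizing the Q criterion, Q(i, j) = (n - 2) d(i, j) - Σk d(i, k) - Σk d(j, k), is joined.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], matrix: List[List[float]]) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree as a Newick string."""
    # Working distance table keyed by node label (leaves start as taxon names).
    d: Dict[str, Dict[str, float]] = {
        a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to the joined pair.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.4f},{b}:{lb:.4f})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist_c = (d[a][c] + d[b][c] - d[a][b]) / 2
                d[new][c] = dist_c
                d[c][new] = dist_c
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    # The last edge connects the two remaining subtrees.
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]  # hypothetical taxa
    dist = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, dist))
```

For this matrix the sketch first joins a and b (branch lengths 2 and 3) and later d and e (2 and 1), which is the expected neighbor-joining result for this textbook example.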
Prediction of the structures of proteins encoded by lux genes
Protein secondary structures were predicted with the online tool PSIPRED, and tertiary structures with SWISS-MODEL.

Genome collinearity analysis
The whole-genome sequence of Photobacterium kishitanii FJ21 was compared for collinearity with other genome sequences in the NCBI database showing 95% similarity. MUMmer (version 3.23) was used to rapidly align the genomes of Photobacterium kishitanii FJ21 and five closely related strains (s__Photobacterium kishitanii, s__Photobacterium phosphoreum, s__Photobacterium aquimaris, s__Photobacterium malacitanum and s__Photobacterium carnosum). Each contig of the genome was drawn as a brown box with the ggplot2 package in R. Yellow lines between genomes represent collinear regions, and blue lines represent inversions.
Prediction of secondary metabolite gene clusters
The assembled genome was analyzed with antiSMASH version 6.0.0, with the taxon parameter set to bacteria.

Declarations
Ethics approval and consent to participate
No formal ethics approval was required in this particular case.

Consent for publication
Not applicable

Availability of Data and materials
The complete genome sequence has been deposited in the GenBank database under accession number SRX10356131.

Competing interests
The authors declare that they have no competing interests in this paper.

Funding
This study was supported by the National Natural Science Foundation of China (31672457) and the Project of Hunan Provincial Department of Science and Technology (2019TP2004, 2020NK2004, 2021JJ30008).

Figure 7
Phylogenetic analysis of Photobacterium kishitanii FJ21 and evolutionarily closely related strains.

Figure 8
Prediction of protein secondary structure encoded by lux genes.

Figure 9
Prediction of protein tertiary structure encoded by lux genes.