Genome Sequencing and Assembly
The sequencing statistics are shown in Table 1 and the assembly statistics in Table 2. The genome size was 4853277 bp, the number of coding genes was 4131, and the contig N50 was 3252201 bp. A, T, G, and C accounted for 30.49%, 30.29%, 19.72%, and 19.50% of all bases, respectively, and the GC content was 39.23%. The sequencing depth distribution is shown in Figure 1, and the circular genome map is shown in Figure 2. The genome sequence of Photobacterium kishitanii FJ21 has been submitted to the GenBank database under accession number SRX10356131.
Table 1
Sequencing data statistics.
Name | Bases (bp) | Longest read (bp) | Mean length (bp) | Mean quality | Number of reads | Read N50 (bp)
Raw reads | 2200768252 | 172754 | 22420.21 | 8.74 | 98160 | 32173
Filtered reads | 1823049947 | 172754 | 23391.03 | 9.58 | 77938 | 32731
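The mean read lengths in Table 1 follow directly from the base and read counts, and the same counts give a rough estimate of sequencing depth. The sketch below is only an illustration: the function names are hypothetical, and the depth estimate assumes that reads are spread evenly over the 4853277 bp assembly.

```python
# Figures copied from Table 1 (reads) and Table 2 (assembly size).
GENOME_SIZE_BP = 4_853_277
READ_SETS = {
    "raw": {"bases": 2_200_768_252, "reads": 98_160},
    "filtered": {"bases": 1_823_049_947, "reads": 77_938},
}


def mean_read_length(bases: int, reads: int) -> float:
    """Average read length in bp."""
    return bases / reads


def sequencing_depth(bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average fold coverage, assuming reads cover the genome evenly."""
    return bases / genome_size


for name, stats in READ_SETS.items():
    length = mean_read_length(stats["bases"], stats["reads"])
    depth = sequencing_depth(stats["bases"])
    print(f"{name}: mean length {length:.2f} bp, depth {depth:.1f}x")
```

The computed mean lengths reproduce the "Mean length" column of Table 1 to two decimal places.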
Table 2
Assembly result statistics.
Total bases (bp) | Contig number | Contig N50 (bp) | Longest contig (bp) | Shortest contig (bp)
4853277 | 2 | 3252201 | 3252201 | 1601076
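For readers unfamiliar with N50, the following sketch shows how the value in Table 2 can be recomputed from the two contig lengths. It uses the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly); the function name is hypothetical and not taken from the assembler used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum past half is the N50.
        if running >= half_total:
            return length


# Contig lengths from Table 2 (longest and shortest contig).
contigs = [3_252_201, 1_601_076]
print("total bases:", sum(contigs))
print("contig N50:", n50(contigs))
```

With only two contigs, the longer one already covers more than half of the assembly, which is why the N50 equals the longest contig.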
Genome Structure and Function Annotation
The genome contains 4131 CDSs with a total length of 4027962 bp and one CRISPR sequence (clustered regularly interspaced short palindromic repeats, which are found in many bacteria and archaea). No genomic islands were predicted in the genome. The results of genome structure prediction are shown in Table 3. A total of 1769, 3141, 2472, 4070, 3514, and 1413 genes were annotated in the KEGG pathway, COG, GO, RefSeq, Pfam, and TIGRFAMs databases, respectively.
Table 3
Genome structure prediction results.
Type | Number | Length (bp) | % of genome
tRNA | 226 | 18001 | 0.37
rRNA (16S) | 22 | 33990 | 0.70
rRNA (23S) | 22 | 63668 | 1.31
rRNA (5S) | 24 | 2784 | 0.06
CDS | 4131 | 4027962 | 82.99
CRISPR | 1 | 807 | 0.02
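The "% of genome" column in Table 3 is the total length of each feature type divided by the assembly size. A minimal sketch of that calculation, with a hypothetical helper name, is:

```python
GENOME_SIZE_BP = 4_853_277  # total bases from Table 2

# Total length (bp) of each feature type, copied from Table 3.
FEATURE_LENGTHS = {
    "tRNA": 18_001,
    "rRNA (16S)": 33_990,
    "rRNA (23S)": 63_668,
    "rRNA (5S)": 2_784,
    "CDS": 4_027_962,
    "CRISPR": 807,
}


def percent_of_genome(length_bp: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Share of the genome occupied by a feature type, in percent."""
    return 100 * length_bp / genome_size


for feature, length in FEATURE_LENGTHS.items():
    print(f"{feature}: {percent_of_genome(length):.2f}%")
```

Rounded to two decimal places, the results agree with the percentages reported in the table.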
COG function classification
The predicted protein sequences were compared with the COG database to classify homologous genes, covering information storage and processing, cellular processes and signaling, basic metabolism, and unknown functions[34]. As shown in Figure 3, a total of 3141 proteins obtained COG functional annotations, accounting for 76.03% of the predicted genes. The main categories were Energy production and conversion (232 genes), Amino acid transport and metabolism (328 genes), Carbohydrate transport and metabolism (210 genes), Translation, ribosomal structure and biogenesis (265 genes), Transcription (266 genes), Cell wall/membrane/envelope biogenesis (267 genes), and Inorganic ion transport and metabolism (194 genes).
GO function classification
As shown in Figure 4, a total of 2472 genes received GO functional annotations, accounting for 59.84% of the predicted genes. GO divides functions into three domains: molecular function, biological process, and cellular component[16]. Within molecular function, many genes were annotated with molecular transducer activity, antioxidant activity, and transporter activity. Within biological process, many genes were annotated with metabolic process, positive regulation of biological process, and negative regulation of biological process. Within cellular component, many genes were annotated with cell, cell part, and membrane part. GO functional annotation therefore helps to clarify the biological significance of the genes.
KEGG pathway analysis
A total of 2945 genes in the KEGG annotation were assigned to 208 metabolic pathways (Figure 5), and the number of effectively annotated genes was 1769, accounting for 42.82% of the predicted genes. The pathways fall into five categories: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, and Organismal Systems. Within Metabolism, the most heavily annotated subcategories were carbohydrate metabolism, energy metabolism, and amino acid metabolism. The main pathways were oxidative phosphorylation (ko00190, 40 genes), glycolysis/gluconeogenesis (ko00010, 31 genes), the citrate cycle (TCA cycle) (ko00020, 19 genes), and arginine and proline metabolism (ko00330, 16 genes). Organismal Systems had the fewest annotated genes.
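The annotation rates quoted for COG, GO, and KEGG are the number of annotated genes divided by the 4131 predicted coding genes. The sketch below recomputes them from the counts given in the text; the dictionary and function names are hypothetical.

```python
TOTAL_PREDICTED_GENES = 4131

# Number of genes annotated per database, as reported in the text.
ANNOTATED_GENES = {
    "COG": 3141,
    "GO": 2472,
    "KEGG": 1769,
}


def annotation_rate(annotated: int, total: int = TOTAL_PREDICTED_GENES) -> float:
    """Percentage of predicted genes that received an annotation."""
    return 100 * annotated / total


for database, count in ANNOTATED_GENES.items():
    print(f"{database}: {annotation_rate(count):.2f}% of predicted genes")
```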
Analysis of interaction genes between pathogen and host
A total of 870 genes were annotated in the PHI database (Figure 6), of which 326 genes (55.72%) were associated with reduced virulence. There were also 43 increased-virulence genes, 217 unaffected-pathogenicity genes, 82 loss-of-pathogenicity genes, and 103 effector genes, while genes for resistance to chemical and sensitivity to chemical were the fewest. Most annotated genes therefore belonged to the reduced-virulence and unaffected-pathogenicity categories. Effector genes are associated with pathogenicity, but increased-virulence genes are the key genes.
Annotation of Resistance Genes in the CARD Database
The annotation results are shown in Table 4, including the ARO classification, the percent identity, the drug class, the resistance mechanism, and the AMR gene family. The highest identity reached 100%. The resistance mechanisms were antibiotic efflux and antibiotic target alteration.
Table 4
Antibiotic resistance gene annotation.
ARO | % Identity | Drug class | Resistance mechanism | AMR gene family
rsmA | 89.29 | fluoroquinolone antibiotic; diaminopyrimidine antibiotic; phenicol antibiotic | antibiotic efflux | resistance-nodulation-cell division (RND) antibiotic efflux pump
vanWB | 100 | glycopeptide antibiotic | antibiotic target alteration | vanW, glycopeptide resistance gene cluster
CRP | 94.29 | macrolide antibiotic; fluoroquinolone antibiotic; penam | antibiotic efflux | resistance-nodulation-cell division (RND) antibiotic efflux pump
Haemophilus influenzae PBP3 conferring resistance to beta-lactam antibiotics | 46.85 | cephalosporin; cephamycin; penam | antibiotic target alteration | Penicillin-binding protein mutations conferring resistance to beta-lactam antibiotics
Analysis of carbohydrate-related enzymes (CAZy)
Carbohydrates are the main source of energy for sustaining life and are the most widely distributed organic compounds in nature. The carbohydrate-active enzymes (CAZy) database groups enzymes into six categories: glycoside hydrolases (GHs), glycosyltransferases (GTs), carbohydrate esterases (CEs), carbohydrate-binding modules (CBMs), auxiliary activities (AAs), and polysaccharide lyases (PLs)[17].
In this database, Photobacterium kishitanii FJ21 contains 51 carbohydrate-related enzymes. Glycoside hydrolase (GH) genes were the most numerous, with 43 genes of 19 types, accounting for 34.4%. GHs hydrolyze glycosidic bonds, and the products can enter biological metabolic pathways. Glycosyltransferases (GTs) comprised 32 genes of 12 types, accounting for 25.6%. GTs participate in a variety of cellular activities by transferring activated monosaccharides to proteins, lipids, sugars, and nucleic acids, that is, by glycosylation. Carbohydrate esterases (CEs) and carbohydrate-binding modules (CBMs) each accounted for 16%. Auxiliary activity (AA) and polysaccharide lyase (PL) genes were the fewest.
Phylogenetic tree
BLAST alignment of the 16S rRNA sequence against the NCBI database showed that FJ21 belongs to the phylum Proteobacteria, class Gammaproteobacteria, order Vibrionales, family Vibrionaceae, and genus Photobacterium.
A phylogenetic tree was then constructed from the whole-genome sequences (Figure 7). In this tree, Photobacterium kishitanii, Photobacterium phosphoreum, Photobacterium aquimaris, Photobacterium malacitanum, and Photobacterium carnosum clustered together.
lux genes analysis
The sequencing results showed that the luxC (1437 bp), luxD (921 bp), luxA (1074 bp), luxB (987 bp), luxF (696 bp), luxE (1122 bp), and luxG (705 bp) genes are present in Photobacterium kishitanii FJ21.
Physicochemical properties of the protein encoded by lux genes
The online tool Expasy (https://www.expasy.org/) was used to predict the physicochemical properties of the proteins encoded by the luxC, luxD, luxA, luxB, luxF, luxE, and luxG genes (Table 5).
Table 5
Prediction of physical and chemical properties of protein encoded by lux genes.
Property | luxC | luxD | luxA | luxB | luxF | luxE | luxG
Number of base pairs | 1437 | 921 | 1074 | 987 | 696 | 1122 | 705
Number of amino acids | 478 | 306 | 357 | 328 | 231 | 373 | 234
Relative molecular mass | 54165.62 | 34417.89 | 40536.85 | 37494.22 | 26616.19 | 42929.59 | 26147.99
Theoretical isoelectric point | 5.38 | 4.98 | 5.21 | 5.26 | 5.07 | 5.09 | 5.43
Number of negatively charged residues | 62 | 42 | 51 | 42 | 32 | 52 | 27
Number of positively charged residues | 47 | 28 | 36 | 27 | 25 | 38 | 23
Extinction coefficient | 77155 or 76780 | 34045 or 33920 | 47245 or 46870 | 30620 or 30370 | 29005 or 28880 | 47705 or 47330 | 31775 or 31400
Instability index | 37.57 (stable) | 38.63 (stable) | 34.29 (stable) | 30.53 (stable) | 41.84 (unstable) | 36.90 (stable) | 33.71 (stable)
Hydrophobicity | -0.223 | -0.153 | -0.381 | -0.32 | -0.362 | -0.389 | -0.029
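Two rows of Table 5 can be understood through simple formulas. The sketch below assumes that the values come from Expasy's ProtParam tool, that the hydrophobicity row is the grand average of hydropathicity (GRAVY) on the Kyte-Doolittle scale, and that the two extinction coefficients correspond to all cysteine pairs forming cystines versus all cysteines reduced. The toy sequences are hypothetical and are not lux protein sequences.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean hydropathy per residue; negative values indicate a hydrophilic protein."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def extinction_coefficients(sequence: str) -> tuple[int, int]:
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1).

    Returns (all Cys pairs as cystines, all Cys reduced), using
    5500 per Trp, 1490 per Tyr, and 125 per cystine.
    """
    reduced = 5500 * sequence.count("W") + 1490 * sequence.count("Y")
    cystines = sequence.count("C") // 2
    return reduced + 125 * cystines, reduced


# Hypothetical toy peptides, for illustration only.
print(round(gravy("AIL"), 3))
print(extinction_coefficients("MWYCC"))
```

Under these assumptions, the difference between the two extinction coefficients of a protein is 125 times the number of possible cystines, which is consistent with the paired values in the table differing by multiples of 125.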
For every protein, the number of positively charged residues is smaller than the number of negatively charged residues, and the isoelectric points lie between 4.98 and 5.43, indicating that the proteins tend to precipitate at pH values in this range. Only the protein encoded by luxF is predicted to be unstable; the proteins encoded by the other genes are predicted to be stable. In addition, all hydrophobicity values are negative, indicating that these proteins are hydrophilic to some degree.
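The observations above can be checked directly against Table 5. The sketch below makes two assumptions that are not stated in the text: each gene length includes one stop codon, so the protein length is one third of the gene length minus one, and a protein is called stable when its instability index is below 40, the usual ProtParam cutoff. All names are hypothetical.

```python
# Values copied from Table 5: gene length, negatively and positively charged
# residues, instability index, and hydrophobicity.
LUX = {
    "luxC": {"bp": 1437, "neg": 62, "pos": 47, "instability": 37.57, "hydro": -0.223},
    "luxD": {"bp": 921, "neg": 42, "pos": 28, "instability": 38.63, "hydro": -0.153},
    "luxA": {"bp": 1074, "neg": 51, "pos": 36, "instability": 34.29, "hydro": -0.381},
    "luxB": {"bp": 987, "neg": 42, "pos": 27, "instability": 30.53, "hydro": -0.32},
    "luxF": {"bp": 696, "neg": 32, "pos": 25, "instability": 41.84, "hydro": -0.362},
    "luxE": {"bp": 1122, "neg": 52, "pos": 38, "instability": 36.90, "hydro": -0.389},
    "luxG": {"bp": 705, "neg": 27, "pos": 23, "instability": 33.71, "hydro": -0.029},
}

STABILITY_CUTOFF = 40.0  # assumed cutoff: index below 40 means predicted stable


def protein_length(gene_bp: int) -> int:
    """Amino acid count, assuming the gene length includes one stop codon."""
    return gene_bp // 3 - 1


def is_stable(instability_index: float) -> bool:
    return instability_index < STABILITY_CUTOFF


protein_lengths = {gene: protein_length(v["bp"]) for gene, v in LUX.items()}
unstable = [gene for gene, v in LUX.items() if not is_stable(v["instability"])]
more_acidic = all(v["neg"] > v["pos"] for v in LUX.values())
all_hydrophilic = all(v["hydro"] < 0 for v in LUX.values())

print(protein_lengths)
print("unstable:", unstable)
print("acidic excess in all proteins:", more_acidic)
print("all hydrophilic:", all_hydrophilic)
```

An excess of acidic over basic residues is also what one would expect for proteins whose isoelectric points fall below 7.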
Prediction of secondary and tertiary structures of proteins encoded by lux genes
For uncharacterized proteins, secondary and tertiary structures can be predicted from the amino acid sequence. The online analysis software PSIPRED was used to predict the secondary structure of the proteins encoded by the lux genes (Figure 8). Residues shown in pink form α-helices, and residues shown in yellow form β-strands. The SWISS-MODEL database (http://swissmodel.expasy.org/repository/) was used to predict the tertiary structure of the proteins (Figure 9). Protein structures in this database are predicted by homology modeling. When the sequence similarity between the target protein and the template protein exceeds 30%, homology modeling can generate a tertiary structure with a prediction accuracy of about 90%.
Except for the protein encoded by luxG, the sequences encoded by the lux genes shared more than 30% similarity with their templates after alignment, so their predicted tertiary structures are likely to be close to the real structures. The predicted secondary and tertiary structures show that the α and β subunits of luciferase, encoded by luxA and luxB, have β-barrel structures, which suggests that these two genes play an important role in the luminescence activity of luminescent bacteria. Knowledge of the tertiary structure of proteins is of great value for functional and structural studies such as molecular docking and virtual screening.
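The 30% criterion can be illustrated with a simple percent-identity calculation on a pairwise alignment. This is only a sketch: SWISS-MODEL computes its own target-template scores, the function names are hypothetical, the toy alignment is not a real lux sequence, and identity here is counted over aligned columns in which neither sequence has a gap.

```python
MODELING_THRESHOLD = 30.0  # percent, the cutoff quoted in the text


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100 * identical / compared


def suitable_for_homology_modeling(identity: float) -> bool:
    return identity > MODELING_THRESHOLD


# Hypothetical toy alignment of a target and a template.
identity = percent_identity("MKT-LLV", "MKSALLV")
print(f"{identity:.2f}% identity, suitable: {suitable_for_homology_modeling(identity)}")
```

Other conventions, for example dividing by the full alignment length or by the shorter sequence, give somewhat different percentages, so the denominator should always be stated alongside the value.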
Genome collinearity analysis
The basic characteristics of genomes
The genome sizes of the six strains are similar, ranging from 4380538 bp to 4853277 bp, and the number of coding genes ranges from 3739 to 4131 (Table 6). Strains of the same species have more similar genome characteristics: in genome size, strain FJ21 is closest to Photobacterium kishitanii ATCC BAA-1194. The numbers of coding genes and RNA genes predicted for FJ21 were higher than those of the other strains.
Table 6
Genomic information of strains.
Species name | Strain | Genome size (bp) | Coding genes | tRNAs | 5S, 16S, 23S rRNA
Photobacterium kishitanii | ATCC BAA-1194 | 4732354 | 4087 | 159 | 6, 6, 5
Photobacterium phosphoreum | JCM 21184 | 4550107 | 3840 | 73 | 3, 0, 1
Photobacterium aquimaris | LC2-065 | 4525475 | 3826 | 83 | 1, 0, 1
Photobacterium malacitanum | CECT 9190 | 4380538 | 3739 | 186 | 10, 15, 12
Photobacterium carnosum | TMW 2.2021 | 4559543 | 3984 | 122 | 8, 11, 8
Photobacterium kishitanii | FJ21 | 4853277 | 4131 | 226 | 24, 22, 22
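The comparison in Table 6 can be summarized programmatically. The sketch below stores the table as plain tuples and derives the ranges and the reference strain closest to FJ21 in genome size; the variable and function names are hypothetical.

```python
# (strain, genome size in bp, coding genes, tRNAs, (5S, 16S, 23S) rRNA counts)
STRAINS = [
    ("P. kishitanii ATCC BAA-1194", 4_732_354, 4087, 159, (6, 6, 5)),
    ("P. phosphoreum JCM 21184", 4_550_107, 3840, 73, (3, 0, 1)),
    ("P. aquimaris LC2-065", 4_525_475, 3826, 83, (1, 0, 1)),
    ("P. malacitanum CECT 9190", 4_380_538, 3739, 186, (10, 15, 12)),
    ("P. carnosum TMW 2.2021", 4_559_543, 3984, 122, (8, 11, 8)),
    ("P. kishitanii FJ21", 4_853_277, 4131, 226, (24, 22, 22)),
]

size_range = (min(s[1] for s in STRAINS), max(s[1] for s in STRAINS))
gene_range = (min(s[2] for s in STRAINS), max(s[2] for s in STRAINS))


def strain_with_max(column: int) -> str:
    """Name of the strain with the largest value in the given tuple column."""
    return max(STRAINS, key=lambda s: s[column])[0]


most_rrna = max(STRAINS, key=lambda s: sum(s[4]))[0]

# Reference strain whose genome size is closest to that of FJ21.
fj21_size = STRAINS[-1][1]
closest = min(STRAINS[:-1], key=lambda s: abs(s[1] - fj21_size))[0]

print("genome size range (bp):", size_range)
print("coding gene range:", gene_range)
print("most coding genes:", strain_with_max(2))
print("most tRNAs:", strain_with_max(3))
print("most rRNAs:", most_rrna)
print("closest genome size to FJ21:", closest)
```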
Collinearity analysis
MUMmer (version 3.23) software was used to compare strain FJ21 with Photobacterium kishitanii, Photobacterium phosphoreum, Photobacterium aquimaris, Photobacterium malacitanum, and Photobacterium carnosum. The collinearity and structural variation of the genome sequences are shown in Figure 10. There were 218, 744, 717, 748, and 708 aligned blocks between strain FJ21 and Photobacterium kishitanii, Photobacterium phosphoreum, Photobacterium aquimaris, Photobacterium malacitanum, and Photobacterium carnosum, respectively, covering 86.92%, 70.27%, 64.82%, 65.16%, and 56.32% of the FJ21 genome. These results indicate good overall collinearity between the genomes, with a small number of rearrangement events such as inversions and translocations. The six strains have nevertheless diverged considerably during evolution.
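The percentage of the genome covered by aligned blocks can be obtained by merging overlapping block coordinates and dividing the merged length by the genome length. The sketch below illustrates that calculation on hypothetical coordinates. It assumes 1-based inclusive positions and allows a block to be listed with its start greater than its end (a reverse-strand match); it does not parse real MUMmer output.

```python
def covered_fraction(blocks, genome_length):
    """Fraction of the genome covered by the union of aligned blocks.

    blocks: iterable of (start, end) pairs, 1-based and inclusive.
    """
    # Normalize orientation so that start <= end, then sort by start.
    intervals = sorted((min(s, e), max(s, e)) for s, e in blocks)
    covered = 0
    current = None
    for start, end in intervals:
        if current is None:
            current = [start, end]
        elif start <= current[1]:
            # Overlaps the current merged interval: extend it.
            current[1] = max(current[1], end)
        else:
            covered += current[1] - current[0] + 1
            current = [start, end]
    if current is not None:
        covered += current[1] - current[0] + 1
    return covered / genome_length


# Hypothetical blocks on a hypothetical 1000 bp genome.
toy_blocks = [(1, 300), (500, 250), (700, 900)]
print(f"{100 * covered_fraction(toy_blocks, 1000):.1f}% covered")
```

Merging before summing matters because overlapping blocks would otherwise be counted twice and inflate the covered percentage.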
Prediction and Analysis of Secondary Metabolite Gene Clusters
Genes encoding secondary metabolites are usually clustered in the genome and encode complex multifunctional enzymes. AntiSMASH (version 6.0.0) software was used to predict gene clusters in the assembled genome. Three types of secondary metabolite gene clusters, RiPP-like, betalactone, and arylpolyene, were predicted in the FJ21 genome (Table 7). The arylpolyene cluster showed 90% similarity to the APE Vf biosynthetic gene cluster.
Table 7
Gene clusters of secondary metabolite of FJ21.
Cluster type | Start gene and end gene | Ratio of genes showing similarity (%) | Most similar known cluster (%)
RiPP-like | 01645-01653 | NZ_LN794352 (100%); NZ_PYNJ01000046 (100%); NZ_MSCC01000006 (100%) | -
Betalactone | 01929-01947 | NZ_LN794352 (100%); NZ_MSCC01000001 (100%); NZ_MSCQ01000002 (100%) | -
Arylpolyene | 03136-03175 | NZ_LN794352 (97%); NZ_MSCQ01000002 (94%); NZ_MSCC01000002 (83%) | BGC0000837: APE Vf (90%); BGC0002008: arylpolyenes (50%); BGC0000836: APE Ec (42%)