3.1. Sequence and similarity information
We selected a hypothetical protein (HP; accession No. QIH20856.1) of 478 amino acids from Neisseria gonorrhoeae. Its amino acid sequence was retrieved from the NCBI database in FASTA format. BlastP searches against the non-redundant protein sequences (nr) database (Table 2) and the UniProt/Swiss-Prot (SwissProt) database (Table 3) were performed to identify sequence similarity with known proteins. According to the nr database, the HP is similar to other MobA/MobL family proteins. A phylogenetic tree showing the relatedness among the sequences obtained from the nr database was constructed in MEGA X by the neighbor-joining method with 1000 bootstrap replications (Fig. 1).
Table 2
Similar proteins obtained from the non-redundant protein sequences (nr) database
Description | Scientific Name | Max Score | Total Score | E value | Percent identity | Accession
MobA/MobL family protein [Proteobacteria] | Proteobacteria | 984 | 984 | 0 | 100 | WP_032490546.1
MobA/MobL family protein [Haemophilus parainfluenzae] | Haemophilus parainfluenzae | 978 | 978 | 0 | 99.37 | WP_197561055.1
MobA/MobL family protein [Haemophilus haemolyticus] | Haemophilus haemolyticus | 977 | 977 | 0 | 99.16 | WP_140450219.1
MobA/MobL family protein [Neisseria gonorrhoeae] | Neisseria gonorrhoeae | 936 | 936 | 0 | 96.86 | WP_127514845.1
MobA/MobL family protein [Haemophilus parainfluenzae] | Haemophilus parainfluenzae | 907 | 907 | 0 | 99.11 | MBS6191364.1
Table 3
Similar proteins obtained from the UniProt/Swiss-Prot (SwissProt) database
Description | Scientific Name | Max Score | Total Score | E value | Percent identity | Accession
[Escherichia coli] | Escherichia coli | 219 | 219 | 1.00E-62 | 46.96 | P07112.4
[Salmonella enterica subsp. enterica serovar Typhimurium] | Salmonella enterica subsp. enterica serovar Typhimurium | 154 | 154 | 2.00E-41 | 41.01 | P14492.1
[Acidithiobacillus ferridurans] | Acidithiobacillus ferridurans | 86.7 | 86.7 | 3.00E-17 | 27.91 | P20085.1
[Bifidobacterium longum NCC2705] | Bifidobacterium longum NCC2705 | 73.2 | 73.2 | 2.00E-12 | 26.32 | Q8GN32.1
[Agrobacterium tumefaciens] | Agrobacterium tumefaciens | 65.9 | 65.9 | 5.00E-10 | 24.58 | Q44363.1
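As a rough illustration of how the Swiss-Prot hits can be screened, the sketch below sorts the Table 3 entries by E value and keeps those above an identity cutoff. The 30 percent cutoff and the names `hits` and `close_homologs` are assumptions made for this example only; they are not part of the analysis reported here.

```python
# Swiss-Prot hits transcribed from Table 3:
# (accession, E value, percent identity)
hits = [
    ("P07112.4", 1.00e-62, 46.96),
    ("P14492.1", 2.00e-41, 41.01),
    ("P20085.1", 3.00e-17, 27.91),
    ("Q8GN32.1", 2.00e-12, 26.32),
    ("Q44363.1", 5.00e-10, 24.58),
]

# Illustrative assumption: treat hits with at least 30% identity as close homologs.
IDENTITY_CUTOFF = 30.0


def close_homologs(hits, cutoff=IDENTITY_CUTOFF):
    """Return accessions with identity >= cutoff, ordered by ascending E value."""
    ordered = sorted(hits, key=lambda hit: hit[1])
    return [accession for accession, _, identity in ordered if identity >= cutoff]


if __name__ == "__main__":
    print(close_homologs(hits))
```

With this hypothetical cutoff only the Escherichia coli and Salmonella entries are retained; a different cutoff would change the selection.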
3.2. Physicochemical properties
According to the ExPASy ProtParam server, the protein comprises 478 amino acids (Table 4). Its amino acid composition was Ala (37), Arg (30), Asn (23), Asp (26), Cys (3), Gln (47), Glu (55), Gly (20), His (10), Ile (26), Leu (34), Lys (53), Met (7), Phe (17), Pro (11), Ser (28), Thr (15), Tyr (20), Trp (5), and Val (11), with Glu, Lys, and Gln being the most abundant residues. Its molecular weight is 56206.84 Da. The HP has an instability index of 45.45; because ProtParam classifies values above 40 as unstable, the protein is predicted to be unstable. The numbers of negatively charged (Asp + Glu) and positively charged (Arg + Lys) residues were 81 and 83, respectively. The aliphatic index was 63.37; higher values of this index are associated with stability over a wider temperature range. The GRAVY score of -1.179 suggests that the protein is hydrophilic and therefore likely water-soluble. The theoretical pI was 8.07, indicating that the protein is basic in nature (pI > 7). The molecular formula of the HP was C2461H3884N716O774S10. The estimated half-life of the HP was 30 hours in mammalian reticulocytes (in vitro), > 20 hours in yeast (in vivo), and > 10 hours in E. coli (in vivo).
Table 4
ProtParam tool analysis result for the HP of Neisseria gonorrhoeae F0T10 13280
Number of amino acids | 478
Molecular weight | 56206.84
Theoretical pI | 8.07
Total number of negatively charged residues (Asp + Glu) | 81
Total number of positively charged residues (Arg + Lys) | 83
Formula | C2461H3884N716O774S10
Instability index (II) | 45.45
Aliphatic index | 63.37
Grand average of hydropathicity (GRAVY) | -1.179
Estimated half-life | 30 hours (mammalian reticulocytes, in vitro); > 20 hours (yeast, in vivo); > 10 hours (Escherichia coli, in vivo)
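Several of the ProtParam values can be re-derived from the reported amino acid composition alone. The sketch below is a consistency check rather than a reimplementation of ProtParam: it recomputes the sequence length, the charged-residue totals, the sulfur count of the formula, and the aliphatic index using Ikai's formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)). The variable and function names are chosen for this example only.

```python
# Amino acid counts as reported in Section 3.2.
counts = {
    "Ala": 37, "Arg": 30, "Asn": 23, "Asp": 26, "Cys": 3,
    "Gln": 47, "Glu": 55, "Gly": 20, "His": 10, "Ile": 26,
    "Leu": 34, "Lys": 53, "Met": 7, "Phe": 17, "Pro": 11,
    "Ser": 28, "Thr": 15, "Tyr": 20, "Trp": 5, "Val": 11,
}

total_residues = sum(counts.values())
negative = counts["Asp"] + counts["Glu"]   # negatively charged residues
positive = counts["Arg"] + counts["Lys"]   # positively charged residues
sulfur_atoms = counts["Cys"] + counts["Met"]  # one S atom per Cys or Met


def aliphatic_index(counts):
    """Ikai's aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    total = sum(counts.values())

    def mole_percent(residue):
        return 100.0 * counts[residue] / total

    return (
        mole_percent("Ala")
        + 2.9 * mole_percent("Val")
        + 3.9 * (mole_percent("Ile") + mole_percent("Leu"))
    )


if __name__ == "__main__":
    print(total_residues, negative, positive, sulfur_atoms)
    print(round(aliphatic_index(counts), 2))
```

The recomputed values agree with Table 4: 478 residues, 81 negatively and 83 positively charged residues, 10 sulfur atoms, and an aliphatic index of 63.37.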
3.3. Subcellular localization prediction
The environments in which proteins operate are determined by their subcellular localization, so knowing the localization is crucial for understanding protein function. Predicting an unknown protein's subcellular localization also provides valuable information for genome annotation and drug design [56]. CELLO predicted the HP to be cytoplasmic, with a localization score of 1.680. PSORTb v3.0.3 and PSLpred were used to verify this result. PSORTb v3.0.3 also identified the protein as cytoplasmic, with a score of 8.96, and PSLpred likewise predicted a cytoplasmic protein, with a score of 64.47.
3.4. Secondary structure prediction
Protein secondary structure prediction (helix, sheet, turn, and coil) is an essential first step toward predicting tertiary structure. It also provides details on protein activity, interactions, and functions. According to SOPMA, alpha helices were the most frequent structural element in the HP (69.87 percent) (Figure 2), followed by random coil (19.67 percent), extended strand (5.65 percent), and beta-turn (4.81 percent). Cross-checking with PSIPRED gave a similar result (Figure 3).
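The SOPMA percentages can be converted into approximate residue counts for the 478-residue sequence. This is a small arithmetic sketch; the dictionary keys and variable names are chosen for this example, and the counts are obtained by rounding rather than taken from the SOPMA output.

```python
SEQ_LENGTH = 478  # length of the HP

# Secondary structure composition reported by SOPMA (percent of residues).
sopma_percent = {
    "alpha helix": 69.87,
    "random coil": 19.67,
    "extended strand": 5.65,
    "beta turn": 4.81,
}

# Approximate number of residues in each state, rounded to the nearest integer.
residues_per_state = {
    state: round(percent * SEQ_LENGTH / 100)
    for state, percent in sopma_percent.items()
}

if __name__ == "__main__":
    for state, n in residues_per_state.items():
        print(f"{state}: {n}")
```

The four percentages sum to 100, and the rounded counts add up to the full sequence length.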
3.5. Homology modelling, quality assessment of the 3D model and visualization
The 3D structure of a protein is closely related to its function. The 3D structure of the HP was obtained from the HHpred server by homology modelling and then refined with the YASARA energy minimization server to stabilize it; the reported energy was -48,361.0 kJ/mol before and -11,487.9 kJ/mol after minimization (Table 6). The model was visualized with PyMOL v2 (Fig. 4) and validated with PROCHECK's Ramachandran plot analysis, Verify3D, and ERRAT. According to the Ramachandran plot statistics (Fig. 5.A), the model was considered acceptable, with 93.6 percent of residues in the most favoured regions (Table 5), compared with 90.8 percent before energy minimization. ERRAT (Fig. 5.B) indicated that the model was of good quality, with an overall quality factor of 95.5556 after energy minimization, compared with 78.453 before. After energy minimization, Verify3D (Fig. 5.C) showed that 96.30 percent of the residues had an averaged 3D-1D score ≥ 0.2, indicating that the model's environmental profile is good. All quality measures of the predicted structure before and after energy minimization are summarized in Table 6.
Table 5
Ramachandran plot statistics of the predicted 3D model for the studied protein
Ramachandran plot analysis | No. (%)
Residues in the most favoured regions [A, B, L] | 159 (91.9%)
Residues in the additional allowed regions [a, b, l, p] | 13 (7.5%)
Residues in the generously allowed regions [-a, -b, -l, -p] | 1 (0.6%)
Residues in the disallowed regions | 0 (0.0%)
No. of non-glycine and non-proline residues | 173 (100.0%)
No. of end-residues (excl. Gly and Pro) | 2
No. of glycine residues (shown in triangles) | 8
No. of proline residues | 6
Total No. of residues | 189
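The percentages in Table 5 are computed over the non-glycine, non-proline residues only, which the following sketch makes explicit. It simply recomputes the table's percentages and the residue total from the listed counts; the names used are chosen for this example.

```python
# Residue counts per Ramachandran region, transcribed from Table 5.
region_counts = {
    "most favoured": 159,
    "additional allowed": 13,
    "generously allowed": 1,
    "disallowed": 0,
}

END_RESIDUES = 2   # end-residues (excl. Gly and Pro)
GLYCINES = 8
PROLINES = 6

# Percentages are taken over non-glycine, non-proline residues only.
non_gly_pro = sum(region_counts.values())
region_percent = {
    region: round(100.0 * n / non_gly_pro, 1)
    for region, n in region_counts.items()
}

# Adding back end-residues, glycines and prolines gives the model size.
total_residues = non_gly_pro + END_RESIDUES + GLYCINES + PROLINES

if __name__ == "__main__":
    print(non_gly_pro, total_residues)
    print(region_percent)
```

The counts reproduce the percentages listed in Table 5 and the total of 189 residues in the model.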
Table 6
Quality assessment score before and after energy minimization
Criteria | Before energy minimization | After energy minimization
Energy | -48361.0 kJ/mol | -11487.9 kJ/mol
Quality factor (ERRAT) | 78.453 | 95.5556
Ramachandran plot (PROCHECK) | 90.8% | 93.6%
VERIFY 3D | 98.41% of the residues have averaged 3D-1D score ≥ 0.2 | 96.30% of the residues have averaged 3D-1D score ≥ 0.2
3.6. Functional annotation
Using the NCBI conserved domain search tool, two functional domains of the HP were identified. The first belongs to the MobA/MobL protein family (accession No. pfam03389), which includes the MobA protein from the E. coli plasmid RSF1010 and the MobL protein from the Thiobacillus ferrooxidans plasmid PTF1. These mobilization proteins are required for the transfer of specific plasmids. The second belongs to the Smc (chromosome segregation ATPase) superfamily, which is involved in cell cycle control, cell division, and chromosome partitioning. Plasmid transfer, cell division, cell cycle regulation, and chromosome partitioning are essential aspects of genetic engineering and biotechnological approaches. Cell cycle regulation is critical for cell survival and proliferation, and a lack of cell cycle maintenance can result in harmful mutations, leading to cell death and cancer [57]. The result was cross-checked using InterPro, MOTIF, and Pfam, which produced similar findings, with the domain spanning amino acid residues 23 to 211 and an e-value of 3.5e-29.
3.7. Active site detection
Identifying the active sites of proteins is becoming increasingly important, because the position of the active site is pivotal for a variety of purposes, including structural identification, functional site comparison, molecular docking, and de novo drug design [25]. The CASTp server was used to examine the active site of the HP and the amino acids involved (Figure 6). CASTp revealed that the best active site comprised 16 amino acid residues, with an area of 63.924 and a volume of 57.845.
3.8. Prediction of CTL epitopes and analysis of the MHC I binding alleles
The NetCTL server predicted 13 T cell epitopes from the selected protein sequence: QSAQAKNDY, LTDKNQGFL, GMEVEITQY, DSGSNKLPY, HTDKNNHNP, QANQALEQY, KQAQGMGKY, FAEDNPQEF, NQALEQYGY, LDDLQFSGY, AIYHLNVRY, DLQRIQGDY, and TVDSGSNKL, with a specificity of 0.940 and a sensitivity of 0.89. The MHC-I alleles for which the epitopes showed higher affinity (IC50 < 500 nM) are shown in Table 7.
Table 7
T cell epitopes predicted by NetCTL server along with their MHC I binding alleles
Epitope | Interacting MHC I alleles
QSAQAKNDY | HLA-A*30:02
LTDKNQGFL | HLA-A*01:01
GMEVEITQY | HLA-A*30:02
DSGSNKLPY | HLA-B*35:01
HTDKNNHNP | None
QANQALEQY | HLA-B*35:01, HLA-B*58:01
KQAQGMGKY | HLA-A*30:02, HLA-B*15:01
FAEDNPQEF | HLA-B*35:01, HLA-B*53:01
NQALEQYGY | HLA-A*30:02, HLA-B*15:01
LDDLQFSGY | HLA-A*01:01
AIYHLNVRY | HLA-A*30:02, HLA-A*32:01, HLA-B*15:01, HLA-A*03:01, HLA-A*11:01
DLQRIQGDY | HLA-A*30:02
TVDSGSNKL | None
3.9. Epitope selection for docking and epitope prioritization
Among the 13 T cell epitopes, AIYHLNVRY interacted with the highest number of MHC I alleles and was therefore selected for vaccine design. This epitope interacted with five MHC I alleles: HLA-A*30:02, HLA-A*32:01, HLA-B*15:01, HLA-A*03:01, and HLA-A*11:01. The VaxiJen 2.0, ToxinPred, and AllerTop 2.0 servers identified the epitope as a putative antigen (antigenicity score 1.5783), non-toxic, and non-allergenic, respectively. Together, these results identify the epitope as a suitable vaccine candidate.
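The prioritization step amounts to ranking the epitopes by the number of MHC I alleles they bind. The sketch below reproduces that ranking from the epitope-allele pairs in Table 7; the dictionary `epitope_alleles` and the function `best_epitope` are names introduced for this example, and ties (none occur here) would be broken by table order.

```python
# Epitope -> interacting MHC I alleles, transcribed from Table 7.
epitope_alleles = {
    "QSAQAKNDY": ["HLA-A*30:02"],
    "LTDKNQGFL": ["HLA-A*01:01"],
    "GMEVEITQY": ["HLA-A*30:02"],
    "DSGSNKLPY": ["HLA-B*35:01"],
    "HTDKNNHNP": [],
    "QANQALEQY": ["HLA-B*35:01", "HLA-B*58:01"],
    "KQAQGMGKY": ["HLA-A*30:02", "HLA-B*15:01"],
    "FAEDNPQEF": ["HLA-B*35:01", "HLA-B*53:01"],
    "NQALEQYGY": ["HLA-A*30:02", "HLA-B*15:01"],
    "LDDLQFSGY": ["HLA-A*01:01"],
    "AIYHLNVRY": ["HLA-A*30:02", "HLA-A*32:01", "HLA-B*15:01",
                  "HLA-A*03:01", "HLA-A*11:01"],
    "DLQRIQGDY": ["HLA-A*30:02"],
    "TVDSGSNKL": [],
}


def best_epitope(table):
    """Return the epitope that binds the largest number of alleles."""
    return max(table, key=lambda epitope: len(table[epitope]))


if __name__ == "__main__":
    top = best_epitope(epitope_alleles)
    print(top, len(epitope_alleles[top]))
```

Applied to Table 7, the ranking returns AIYHLNVRY with five alleles; no other epitope binds more than two.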
3.10. Molecular docking analysis
The docking analysis revealed that the predicted epitope formed a total of nine hydrogen bonds involving the residues Tyr9, Arg8, Val7, Ala1, Tyr3, Ile2, Asn6, Leu5, and His4. The binding energy between the epitope and the HLA-B*15:01 receptor was -7.5 kcal/mol. The three-dimensional structure of the peptide and its binding interactions with HLA-B*15:01 after docking were visualized with Discovery Studio 2021 and are shown in Figure 7.
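Assuming the residues listed for the hydrogen bonds refer to positions within the epitope itself, their labels can be derived directly from the sequence AIYHLNVRY. The sketch below does this with a three-letter lookup restricted to the residues that occur in this peptide; the function name `residue_labels` is introduced for this example.

```python
# Three-letter codes for the residue types present in the epitope.
THREE_LETTER = {
    "A": "Ala", "I": "Ile", "Y": "Tyr", "H": "His",
    "L": "Leu", "N": "Asn", "V": "Val", "R": "Arg",
}


def residue_labels(peptide):
    """Label each residue as three-letter code plus 1-based position."""
    return [
        f"{THREE_LETTER[residue]}{position}"
        for position, residue in enumerate(peptide, start=1)
    ]


if __name__ == "__main__":
    print(residue_labels("AIYHLNVRY"))
```

Under that assumption, the nine labels correspond to the nine positions of the epitope, with histidine at position 4.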