Structural and Functional Annotation and Molecular Docking Analysis of a Hypothetical Protein from Neisseria gonorrhoeae, An In-silico Approach

doi:10.21203/rs.3.rs-1679635/v1

Research Article

https://doi.org/10.21203/rs.3.rs-1679635/v1

This work is licensed under a CC BY 4.0 License

Version 1

Background

The threat of sexually transmitted diseases (STDs) is a significant public health concern. Blood, sperm, vaginal, and other bodily fluids can transport bacteria, viruses, and parasites that cause sexually transmitted diseases from one person to another. Neisseria gonorrhoeae is one of the microorganisms responsible for sexually transmitted diseases (STDs). It is an aerobic gram-negative bacterium with a large genome that contains numerous proteins, some of which are considered hypothetical.

Methods

In this study, the hypothetical protein F0T10_13280 of Neisseria gonorrhoeae was chosen for analysis, and an in-silico approach was used to explore its physicochemical characteristics, subcellular localization, secondary structure, 3D structure, and functional annotation. Finally, a molecular docking analysis was performed to support the design of an epitope-based vaccine against this protein.

Results

This study identified a potential role of the hypothetical protein (HP) in plasmid transfer, cell cycle control, cell division, and chromosome partitioning. The theoretical pI, thermal stability, cytoplasmic localization, and other physicochemical properties of the protein were also determined. Molecular docking analysis demonstrated that one of the T cell epitopes of the protein has a significant binding affinity for the human leucocyte antigen HLA-B*15:01.

Conclusions

The in-silico characterization of this protein will help us understand its molecular mechanism of action and get an insight into novel therapeutic identification processes. This research will, therefore, enhance our knowledge to find new medications to tackle this potential threat to humankind.

Bioinformatics

Neisseria gonorrhoeae

hypothetical proteins

phylogenetic characterization

functional annotation

in-silico characterization

molecular docking

Gonorrhoea is a contagious disease that spreads quickly, with about 87 million new infections reported every year. This sexually transmitted disease has already emerged as a major problem in low- and middle-income countries in Africa, Asia, Latin America, and the Caribbean [1, 2, 3]. Neisseria gonorrhoeae, the etiological agent of gonorrhoea, was first isolated in 1878. It is a gram-negative, encapsulated bacterium [5], 0.6-1 µm in diameter [4], that belongs to the Neisseriaceae family [1, 6]. These kidney-shaped diplococci can infect both men and women [2, 7]. The organism is fastidious [3], non-acid-fast [8], oxidase-positive [9], and non-spore-forming [10]. In addition, it is a non-motile [11], obligate human pathogen [12] that can grow aerobically, or anaerobically in the presence of nitrite [13]. Gonorrhoea can be asymptomatic or symptomatic. In men, it can manifest as urethritis, with complications such as epididymitis, urethral stricture, and prostatitis. In women, it can manifest as urethritis or cervicitis, with complications including tubal infertility, chronic pelvic discomfort, severe pelvic inflammatory disease sequelae, and ectopic pregnancy [4, 14]. Oropharyngeal and anorectal gonococcal infections can be transmitted from one person to another through kissing and during oral-anal intercourse. Furthermore, gonorrhoea can be caused by contamination via cervical fluids [14, 15]. No gonococcal vaccine is currently available, and WHO recommends azithromycin and ceftriaxone as a dual therapy [2]. However, Neisseria gonorrhoeae has shown resistance to penicillins, tetracyclines, sulphonamides, fluoroquinolones, macrolides (including azithromycin), and ceftriaxone [2, 16, 17]. Because this resistance is eroding the effectiveness of current treatments, new approaches must be developed, such as the discovery of novel antibacterial drugs or the development of alternative therapies [18].

The genome size of Neisseria gonorrhoeae varies from strain to strain and is about 2001 ± 197 kbp [19]. For example, the genome of Neisseria gonorrhoeae NCCP11945 comprises one circular chromosome of 2232.025 kbp that encodes 2662 predicted open reading frames (ORFs) and a plasmid of 4153 bp that encodes 12 predicted ORFs [20]. Additionally, Neisseria gonorrhoeae is known to encode several proteins with unknown functions, known as hypothetical proteins (HPs). HPs are predicted to be expressed in an organism, but there is no experimental or chemical proof that they exist [21, 22, 23]. Although there is no empirical evidence for the existence of these proteins, they are predicted from open reading frames [23, 24]. In most genomes, HPs account for approximately half of the protein-coding regions [21], and their roles are still unknown [21, 24, 25]. As a result, the functional annotation of hypothetical proteins has attracted considerable interest [25]. Hypothetical proteins can be categorized as uncharacterized protein families (UPFs) or domains of unknown function (DUFs) [23]. UPFs have been experimentally confirmed to exist, although they have yet to be characterized or connected to a known gene.

On the other hand, DUFs are proteins that have been found experimentally but have no known functional or structural domains [23]. Even though HPs have not been characterized, elucidating their structure and function can lead to the identification of new domains and motifs, pathways and cascades, structural conformations, and protein networks [21, 22]. They are crucial for understanding biochemical and physiological pathways, for example in identifying pharmaceutical targets [21, 22, 25], and they can aid early detection as well as proteomic and genomic studies [21]. Hypothetical proteins are now easier to analyze with bioinformatics tools that offer 3D structural conformation prediction, identification of new domains and pathways, phylogenetic profiling, and functional annotation [22, 23]. In this study, we characterized the hypothetical protein F0T10_13280 (plasmid) of Neisseria gonorrhoeae with several bioinformatics tools and databases to gain insight into its physical and structural properties and its potential functions. A molecular docking study was also performed to support the design of an epitope-based vaccine.

2.1. Sequence retrieval and phylogeny analysis:

The amino acid sequence (accession No. QIH20856.1) was selected by searching the NCBI protein database for HP of Neisseria gonorrhoeae. The sequence was obtained in FASTA format. To identify sequence similarity, BlastP [26] was performed. MUSCLE v3.6 [27] was used to perform multiple sequence alignment. Phylogenetic analysis was carried out using MEGA X [28]. Table 1 depicts the entire framework, which includes all the tools used to annotate the structural and functional properties of HP of Neisseria gonorrhoeae.

Table 1

List of bioinformatics tools and databases used in this study for structural and functional analysis of the HP
S. N.	TOOLS/ SERVER	URL	FUNCTION	REF
Sequence similarity search
1.	BLAST	http://www.ncbi.nlm.nih.gov/BLAST/	Find similar sequences in protein databases	26
2.	MUSCLE		Multiple sequence alignment prediction	27
3.	MEGA X		Phylogenetic tree analysis	28
Physicochemical characterization
4.	ExPASy – ProtParam	http://web.expasy.org/protparam/	Used for predicting physicochemical properties	29
Sub-cellular Localization
5.	PSORT B v3.0	http://www.psort.org/psortb/	predict subcellular localization	32
6.	PSLpred	http://www.imtech.res.in/raghava/pslpred/	predict subcellular localization	33
7.	CELLO	http://cello.life.nctu.edu.tw/	predict subcellular localization	31
Secondary structure prediction
8.	SOPMA	https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html	predict the secondary structure of the protein	34
9.	PSIPRED	http://bioinf.cs.ucl.ac.uk/psipred/	predict secondary structure	35
3D structure prediction and quality assessment
10.	HHpred	https://toolkit.tuebingen.mpg.de/tools/hhpred	detect protein homology	36
11.	YASARA	http://www.yasara.org/minimizationserver.htm	Utilized to increase the stability of the 3D model structure	37
12.	PROCHECK	https://saves.mbi.ucla.edu/	Used for Ramachandran plot analysis	39
13.	Verify3D	https://saves.mbi.ucla.edu/	Structure verification	41
14.	ERRAT	https://saves.mbi.ucla.edu/	Used to analyze the statistics of nonbonded interactions between different atoms and verify protein structures	40
Functional characterization
15.	Conserved domain database	http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi	Used to search functional domains in a sequence	45
16.	Pfam	http://pfam.xfam.org/	Family relationship identification	44
17.	INTERPRO	http://www.ebi.ac.uk/interpro/	Used to search InterPro for motif discovery	42
18.	MOTIF	http://www.genome.jp/tools/motif/	Motif discovery	43
Active site identification
19.	CASTp	http://sts.bioe.uic.edu/castp/	Used to find, outline, and estimate inward surface regions on protein 3D structure	46

2.2. Physicochemical properties analysis:

The physicochemical properties of the target protein sequence were investigated using ExPASy's ProtParam program [29]. The molecular weight, atomic composition, estimated half-life, theoretical isoelectric point (pI), extinction coefficient, amino acid composition, aliphatic index, instability index, total number of positively and negatively charged residues, and grand average of hydropathicity (GRAVY) were all analyzed using this tool.

2.3. Subcellular localization prediction:

Knowing the subcellular localization of a protein is crucial for fully understanding its function [30]. Several computational tools for predicting protein subcellular localization have been developed. CELLO v.2.5 [31] was first used to predict the subcellular localization of the hypothetical protein F0T10_13280 (plasmid) of Neisseria gonorrhoeae. PSORTb v3.0.3 [32] was then used as a second predictor. To cross-check the results, we used PSLpred [33], a web server for predicting the subcellular localization of gram-negative bacterial proteins.

2.4. Secondary structure prediction:

Secondary structure predictions of the hypothetical protein were performed using the SOPMA server [34]. The PSIPRED server [35] was also used to ensure the accuracy of the SOPMA results.

2.5. 3D structure prediction and quality assessment:

A 3D model of the protein was generated with the HHpred server [36]. Energy minimization was carried out with the YASARA server [37] (http://www.yasara.org/minimizationserver.htm). PyMOL v2 [38] was used to visualize the final model and perform structural analysis. The quality assessment tools of the SAVES server (https://services.mbi.ucla.edu) were used to assess the reliability of the predicted 3D structural model. A Ramachandran plot was built with PROCHECK [39] to visualize the backbone dihedral angles of the amino acid residues. The overall quality of the 3D structure was evaluated with ERRAT [40]. Verify3D [41] was used to check whether the atomic (3D) model was compatible with its amino acid sequence and to compare the results with standard structures.
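
The backbone dihedral angles that a Ramachandran plot displays can be computed from four consecutive backbone atoms. The following is a minimal sketch of that geometry only, not PROCHECK's implementation; the function name and the toy coordinates are assumptions made for illustration, and the sign convention should be checked against a reference tool before use. For residue i, phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # vector from the second point back to the first
    b1 = sub(p2, p1)          # central bond
    b2 = sub(p3, p2)          # vector from the third point to the fourth
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical, not taken from the model in this study).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perpendicular = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying such a function to every residue of a model yields the (phi, psi) pairs that are then binned into favoured, allowed, and disallowed regions.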

2.6. Functional annotation:

To make reliable functional predictions for the HP, we used several databases and tools: INTERPRO [42], MOTIF [43], Pfam [44], and the Conserved Domain Database of NCBI [45].

2.7. Active site detection:

The shape and size of protein pockets and cavities are crucial for active site assessment and structure-based ligand design. The Computed Atlas of Surface Topography of proteins (CASTp) was used in this study to detect possible binding sites, pockets, and cavities in the 3D structure of the target protein [46].

2.8. Prediction of CTL epitope and MHC I binding allele analysis:

To design an epitope-based vaccine against the hypothetical protein, cytotoxic T lymphocyte (CTL) epitopes were predicted using the NetCTL server [47]. The threshold parameter was set to 0.4, with a sensitivity of 0.89 and a specificity of 0.94. To analyse the MHC I binding alleles, all CTL epitopes were evaluated with the Immune Epitope Database (IEDB) using the SMM method [48]. The MHC I alleles for which the epitopes showed high affinity (IC50 < 500 nM) were selected for further analysis.

2.9. Epitope selection for docking and epitope prioritization:

Among all the CTL epitopes, one epitope was selected based on its interaction with the maximum number of MHC I binding alleles. The suitability of this epitope for vaccine construction was cross-checked with the VaxiJen 2.0 [49], ToxinPred [50], and AllerTop 2.0 [51] servers to investigate its antigenicity, toxicity, and allergenicity, respectively. The threshold parameter of the VaxiJen 2.0 server was set to 0.4, and all the parameters of the ToxinPred and AllerTop 2.0 servers were left at their defaults.

2.10. Peptide designing and docking analysis:

The three-dimensional structure of the epitope was constructed with the APPTEST server [52], a peptide tertiary structure prediction tool that combines a neural network architecture with simulated annealing. A molecular docking experiment was performed to examine the binding interaction between the epitope and the receptor molecule. The crystal structure of HLA-B*15:01 (PDB ID: 1XR8) was retrieved from the RCSB database [53]. Docking between the peptide (ligand) and the human receptor HLA-B*15:01 was performed using AutoDock Vina [54]. The grid box size was set to 12.702, 31.843, and 18.307 for X, Y, and Z, respectively. The binding interactions and the residues at the interacting surface between the peptide and receptor were investigated with Discovery Studio 2021 [55].
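
As an illustration of how such a docking run might be set up, the sketch below writes an AutoDock Vina configuration file from Python. Only the three box dimensions come from the text above. The file names are hypothetical, and the box centre is a placeholder because the study does not report it, so this is a template under stated assumptions rather than the configuration the authors used.

```python
def build_vina_config(receptor, ligand, center, size,
                      out="docked.pdbqt", exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file."""
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {center[0]}",
        f"center_y = {center[1]}",
        f"center_z = {center[2]}",
        f"size_x = {size[0]}",
        f"size_y = {size[1]}",
        f"size_z = {size[2]}",
        f"exhaustiveness = {exhaustiveness}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


# Grid box dimensions as reported in Section 2.10 (X, Y, Z).
BOX_SIZE = (12.702, 31.843, 18.307)
# Hypothetical placeholder: the box centre is not given in the study.
BOX_CENTER = (0.0, 0.0, 0.0)

config = build_vina_config(
    receptor="receptor_1XR8.pdbqt",   # hypothetical file name
    ligand="AIYHLNVRY.pdbqt",         # hypothetical file name
    center=BOX_CENTER,
    size=BOX_SIZE,
)
```

The resulting text could be saved to a file and passed to Vina with its `--config` option once a real box centre and prepared PDBQT files are available.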

3.1. Sequence and similarity information

We selected a hypothetical protein (accession No. QIH20856.1) of Neisseria gonorrhoeae. This protein contains 478 amino acids. Its amino acid sequence was retrieved from the NCBI database in FASTA format. BlastP searches against the non-redundant protein sequences (nr) database (Table 2) and the UniProt/Swiss-Prot (SwissProt) database (Table 3) were used to identify similarity with other known proteins. According to the nr database, the HP is similar to other MobA/MobL family proteins. A phylogenetic tree showing the relatedness among the sequences obtained from the nr database was constructed in MEGA X by the neighbor-joining method with 1000 bootstrap replicates (Fig. 1).

Table 2

Similar protein obtained from non-redundant protein sequences (nr) database
Description	Scientific Name	Max Score	Total Score	Percent identity	Accession
MobA/MobL family protein [Proteobacteria]	Proteobacteria	984	984	100	WP_032490546.1
MobA/MobL family protein [Haemophilus parainfluenzae]	Haemophilus parainfluenzae	978	978	99.37	WP_197561055.1
MobA/MobL family protein [Haemophilus haemolyticus]	Haemophilus haemolyticus	977	977	99.16	WP_140450219.1
MobA/MobL family protein [Neisseria gonorrhoeae]	Neisseria gonorrhoeae	936	936	96.86	WP_127514845.1
MobA/MobL family protein [Haemophilus parainfluenzae]	Haemophilus parainfluenzae	907	907	99.11	MBS6191364.1

Table 3

Similar proteins obtained from the UniProt/Swiss-Prot (SwissProt) database
Description	Scientific Name	Max Score	Total Score	E value	Per. ident	Accession
[Escherichia coli]	Escherichia coli	219	219	1.00E-62	46.96	P07112.4
[Salmonella enterica subsp. enterica serovar Typhimurium]	Salmonella enterica subsp. enterica serovar Typhimurium	154	154	2.00E-41	41.01	P14492.1
[Acidithiobacillus ferridurans]	Acidithiobacillus ferridurans	86.7	86.7	3.00E-17	27.91	P20085.1
[Bifidobacterium longum NCC2705]	Bifidobacterium longum NCC2705	73.2	73.2	2.00E-12	26.32	Q8GN32.1
[Agrobacterium tumefaciens]	Agrobacterium tumefaciens	65.9	65.9	5.00E-10	24.58	Q44363.1

3.2. Physicochemical Properties:

According to the ExPASy ProtParam server (Table 4), the protein comprises 478 amino acids. Its amino acid composition was Ala (37), Arg (30), Asn (23), Asp (26), Cys (3), Gln (47), Glu (55), Gly (20), His (10), Ile (26), Leu (34), Lys (53), Met (7), Phe (17), Pro (11), Ser (28), Thr (15), Tyr (20), Trp (5), and Val (11); Glu, Lys, and Gln were the most abundant residues. Its molecular weight is 56206.84 Da. The instability index of 45.45 is above the threshold of 40 that ProtParam uses to separate stable from unstable proteins, so the protein is classified as unstable. The numbers of negatively charged (Asp + Glu) and positively charged (Arg + Lys) residues were 81 and 83, respectively. The aliphatic index was 63.37, suggesting that the protein is stable over a wide temperature range. The GRAVY score of -1.179 indicates that the protein is hydrophilic and therefore likely water-soluble. The theoretical pI was 8.07, which lies above pH 7 and indicates that the protein is slightly basic in nature. The molecular formula of the HP was C2461H3884N716O774S10. The estimated half-life was 30 hours in mammalian reticulocytes (in vitro), > 20 hours in yeast (in vivo), and > 10 hours in E. coli (in vivo).

Table 4

ProtParam tool analysis result for the HP of Neisseria gonorrhoeae F0T10_13280
Number of amino acids	478
Molecular weight	56206.84
Theoretical pI	8.07
Total number of negatively charged residues (Asp + Glu)	81
Total number of positively charged residues (Arg + Lys)	83
Formula	C₂₄₆₁H₃₈₈₄N₇₁₆O₇₇₄S₁₀
Instability index (II)	45.45
Aliphatic index	63.37
Grand average of hydropathicity (GRAVY)	-1.179
Estimated half-life	30 hours (mammalian reticulocytes, in vitro); > 20 hours (yeast, in vivo); > 10 hours (Escherichia coli, in vivo)
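
Several of the ProtParam values in Table 4 follow directly from the amino acid composition, so they can be re-derived as a consistency check. The sketch below is an illustration, not the ProtParam code. It uses the reported residue counts, the standard Kyte-Doolittle hydropathy scale for GRAVY, and the usual aliphatic index formula, mole% Ala + 2.9 × mole% Val + 3.9 × (mole% Ile + mole% Leu). The variable names are assumptions.

```python
# Residue counts reported for the HP (one-letter codes).
COUNTS = {
    "A": 37, "R": 30, "N": 23, "D": 26, "C": 3, "Q": 47, "E": 55,
    "G": 20, "H": 10, "I": 26, "L": 34, "K": 53, "M": 7, "F": 17,
    "P": 11, "S": 28, "T": 15, "Y": 20, "W": 5, "V": 11,
}

# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "Y": -1.3,
    "W": -0.9, "V": 4.2,
}

length = sum(COUNTS.values())


def mole_percent(residue):
    """Mole percentage of one residue type in the protein."""
    return 100.0 * COUNTS[residue] / length


# GRAVY: mean hydropathy over all residues; negative means hydrophilic.
gravy = sum(KYTE_DOOLITTLE[aa] * n for aa, n in COUNTS.items()) / length

# Aliphatic index from the mole percentages of Ala, Val, Ile and Leu.
aliphatic_index = (mole_percent("A")
                   + 2.9 * mole_percent("V")
                   + 3.9 * (mole_percent("I") + mole_percent("L")))

negative_residues = COUNTS["D"] + COUNTS["E"]   # Asp + Glu
positive_residues = COUNTS["R"] + COUNTS["K"]   # Arg + Lys
sulfur_atoms = COUNTS["C"] + COUNTS["M"]        # one S per Cys or Met

print(f"length={length}, GRAVY={gravy:.3f}, AI={aliphatic_index:.2f}")
```

The recomputed GRAVY is negative, in agreement with the value in Table 4, and the slight excess of positively charged residues is consistent with a theoretical pI above 7.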

3.3. Subcellular localization prediction

The environment in which a protein operates is determined by its subcellular localization, which is therefore crucial for understanding protein function. Predicting the subcellular localization of an unknown protein also provides valuable information for genome annotation and drug design [56]. CELLO predicted the protein to be cytoplasmic, with a localization score of 1.680. PSORTb v3.0.3 and PSLpred were used to verify this result. PSORTb v3.0.3 also identified the protein as cytoplasmic, with a score of 8.96, and PSLpred likewise predicted a cytoplasmic protein, with a score of 64.47.

3.4. Secondary structure prediction

Protein secondary structure prediction (helix, sheet, turn, and coil) is an essential first step toward predicting tertiary structure. It also provides details on protein activity, interactions, and functions. Alpha helices were the most frequent structural element in the HP when examined by SOPMA (69.87 percent) (Figure 2). Random coil accounted for 19.67 percent, followed by extended strand at 5.65 percent and beta-turn at 4.81 percent. We cross-checked the results using PSIPRED, which gave a similar result (Figure 3).
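
To make the SOPMA percentages easier to interpret, they can be converted into approximate residue counts for the 478-residue protein. This is a small arithmetic sketch using only the figures quoted above; the dictionary keys and variable names are assumptions.

```python
PROTEIN_LENGTH = 478

# SOPMA secondary structure composition, in percent of residues.
SOPMA_PERCENT = {
    "alpha helix": 69.87,
    "random coil": 19.67,
    "extended strand": 5.65,
    "beta turn": 4.81,
}

# Convert each percentage to the nearest whole number of residues.
residue_counts = {
    state: round(percent / 100.0 * PROTEIN_LENGTH)
    for state, percent in SOPMA_PERCENT.items()
}

total_percent = round(sum(SOPMA_PERCENT.values()), 2)
total_residues = sum(residue_counts.values())

for state, count in residue_counts.items():
    print(f"{state}: {count} residues")
```

The four states account for the full sequence, which confirms that the reported percentages are mutually consistent.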

3.5. Homology modelling, quality assessment of the 3D model and visualization

The 3D structure of a protein is closely related to its function. The 3D structure of the HP was obtained from the HHpred server using homology modelling. The YASARA energy minimization server was then used to stabilize the model; the energy was reported as -48,361.0 kJ/mol before and -11,487.9 kJ/mol after minimization. The 3D structure was visualized with PyMOL v2 (Fig. 4) and validated with PROCHECK's Ramachandran plot analysis, Verify3D, and ERRAT. According to the Ramachandran plot statistics (Fig. 5.A), the model was considered acceptable, with 93.6 percent of residues in the most favoured regions (Table 5), compared with 90.8 percent before energy minimization. After energy minimization, ERRAT (Fig. 5.B) gave an overall quality factor of 95.5556, compared with 78.453 before, indicating a good-quality model. Verify3D (Fig. 5.C) showed that, after energy minimization, 96.30 percent of the residues had an averaged 3D-1D score ≥ 0.2, indicating that the model's environmental profile is good. The quality factors of the predicted structure before and after energy minimization are summarized in Table 6.

Table 5

Ramachandran plot statistics of the predicted 3D model for studied protein
Ramachandran plot analysis	No. (%)
Residues in the most favoured regions [A, B, L]	159 (91.9%)
Residues in the additional allowed regions [a, b, l, p]	13 (7.5%)
Residues in the generously allowed regions [-a, -b, -l, -p]	1 (0.6%)
Residues in the disallowed regions	0 (0.0%)
No. of non-glycine and non-proline residues	173 (100.0%)
No. of end-residues (excl. Gly and Pro)	2
No. of glycine residues (shown in triangles)	8
No. of proline residues	6
Total No. of residues	189
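
The percentages in Table 5 can be reproduced from the residue counts, which also makes the quality criterion explicit. The sketch below assumes the commonly cited PROCHECK rule of thumb that a good-quality model has more than 90% of its non-glycine, non-proline residues in the most favoured regions. The names are illustrative.

```python
# Residue counts from Table 5 (non-glycine, non-proline residues).
REGION_COUNTS = {
    "most favoured": 159,
    "additional allowed": 13,
    "generously allowed": 1,
    "disallowed": 0,
}
END_RESIDUES = 2
GLYCINE_RESIDUES = 8
PROLINE_RESIDUES = 6

# Assumed rule of thumb for a good-quality model (percent most favoured).
GOOD_QUALITY_THRESHOLD = 90.0

classified = sum(REGION_COUNTS.values())
percentages = {
    region: round(100.0 * count / classified, 1)
    for region, count in REGION_COUNTS.items()
}

total_residues = (classified + END_RESIDUES
                  + GLYCINE_RESIDUES + PROLINE_RESIDUES)
passes_threshold = percentages["most favoured"] > GOOD_QUALITY_THRESHOLD

for region, percent in percentages.items():
    print(f"{region}: {percent}%")
```

The counts add up to the 189 residues of the model, and the most favoured fraction clears the 90% threshold.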

Table 6

Quality assessment score before and after energy minimization
Criteria	Before energy minimization	After energy minimization
Energy	-48361.0 kJ/mol	-11487.9 kJ/mol
Quality factor (ERRAT)	78.453	95.5556
Ramachandran plot (PROCHECK)	90.8%	93.6%
VERIFY 3D	98.41% of the residues have averaged 3D-1D score ≥ 0.2	96.30% of the residues have averaged 3D-1D score ≥ 0.2

3.6. Functional annotation

Using the NCBI conserved domain search tool, two functional domains of the HP were identified. The first belongs to the MobA/MobL protein family (accession No. pfam03389). This family includes the MobA protein from the E. coli plasmid RSF1010 and the MobL protein from the Thiobacillus ferrooxidans plasmid PTF1. These are mobilization proteins, which are required for the transfer of particular plasmids. The second belongs to the Smc (chromosome segregation ATPase) superfamily, which is involved in cell cycle control, cell division, and chromosome partitioning. Plasmid transfer, cell division, cell cycle regulation, and chromosome partitioning are important in genetic engineering and biotechnology. Cell cycle regulation is critical for cell survival and proliferation. Failure of cell cycle control can result in harmful mutations, leading to cell death and cancer [57]. This result was cross-checked using INTERPRO, MOTIF, and Pfam, all of which produced similar findings, with the domain spanning amino acid residues 23 to 211 and an e-value of 3.5e-29.

3.7. Active site detection

The CASTp server was used to examine the protein's active site. Identifying active sites on proteins is increasingly important: the position of the active site is pivotal for structural identification, functional site comparison, molecular docking, and de novo drug design [25]. In this study, we evaluated the active site region and the number of amino acids involved (Figure 6). CASTp revealed that the active site comprised 16 amino acid residues, and the best pocket had an area of 63.924 Å² and a volume of 57.845 Å³.

3.8. Prediction of CTL epitope and analysis of the MHC I binding alleles:

The NetCTL server predicted 13 T cell epitopes from the selected protein sequence: QSAQAKNDY, LTDKNQGFL, GMEVEITQY, DSGSNKLPY, HTDKNNHNP, QANQALEQY, KQAQGMGKY, FAEDNPQEF, NQALEQYGY, LDDLQFSGY, AIYHLNVRY, DLQRIQGDY, and TVDSGSNKL, with a specificity of 0.940 and a sensitivity of 0.89. The MHC I alleles for which the epitopes showed high affinity (IC50 < 500 nM) are shown in Table 7.

Table 7

T cell epitopes predicted by NetCTL server along with their MHC I binding alleles
Epitope	Interacting MHC I alleles
QSAQAKNDY	HLA-A*30:02
LTDKNQGFL	HLA-A*01:01
GMEVEITQY	HLA-A*30:02
DSGSNKLPY	HLA-B*35:01
HTDKNNHNP	None
QANQALEQY	HLA-B*35:01, HLA-B*58:01
KQAQGMGKY	HLA-A*30:02, HLA-B*15:01
FAEDNPQEF	HLA-B*35:01, HLA-B*53:01
NQALEQYGY	HLA-A*30:02, HLA-B*15:01
LDDLQFSGY	HLA-A*01:01
AIYHLNVRY	HLA-A*30:02, HLA-A*32:01, HLA-B*15:01, HLA-A*03:01, HLA-A*11:01
DLQRIQGDY	HLA-A*30:02
TVDSGSNKL	None

3.9. Epitope selection for docking and epitope prioritization:

Among the 13 T cell epitopes, AIYHLNVRY interacted with the highest number of MHC I alleles and was selected for vaccine design. This epitope interacted with five MHC I alleles: HLA-A*30:02, HLA-A*32:01, HLA-B*15:01, HLA-A*03:01, and HLA-A*11:01. The VaxiJen 2.0, ToxinPred, and AllerTop 2.0 servers identified the epitope as a probable antigen (antigenicity score 1.5783), non-toxic, and non-allergenic, respectively. Together, these results identify the epitope as a suitable vaccine candidate.
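
The selection rule described here, picking the epitope that binds the largest number of MHC I alleles, can be sketched in a few lines using the allele assignments listed in Table 7. This is an illustration of the rule only; the data structure and names are assumptions, and ties would be resolved by listing order.

```python
# Epitope -> MHC I alleles with IC50 < 500 nM, as listed in Table 7.
EPITOPE_ALLELES = {
    "QSAQAKNDY": ["HLA-A*30:02"],
    "LTDKNQGFL": ["HLA-A*01:01"],
    "GMEVEITQY": ["HLA-A*30:02"],
    "DSGSNKLPY": ["HLA-B*35:01"],
    "HTDKNNHNP": [],
    "QANQALEQY": ["HLA-B*35:01", "HLA-B*58:01"],
    "KQAQGMGKY": ["HLA-A*30:02", "HLA-B*15:01"],
    "FAEDNPQEF": ["HLA-B*35:01", "HLA-B*53:01"],
    "NQALEQYGY": ["HLA-A*30:02", "HLA-B*15:01"],
    "LDDLQFSGY": ["HLA-A*01:01"],
    "AIYHLNVRY": ["HLA-A*30:02", "HLA-A*32:01", "HLA-B*15:01",
                  "HLA-A*03:01", "HLA-A*11:01"],
    "DLQRIQGDY": ["HLA-A*30:02"],
    "TVDSGSNKL": [],
}

# Rank epitopes by the number of alleles they bind (most first).
ranked = sorted(EPITOPE_ALLELES.items(),
                key=lambda item: len(item[1]),
                reverse=True)

best_epitope, best_alleles = ranked[0]
non_binders = [e for e, alleles in EPITOPE_ALLELES.items() if not alleles]

print(best_epitope, len(best_alleles))
```

The same ranking also highlights the two predicted epitopes that bound no allele below the affinity cutoff.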

3.10. Molecular docking analysis:

The docking analysis revealed that the predicted epitope formed a total of nine hydrogen bonds involving the residues Tyr9, Arg8, Val7, Ala1, Tyr3, Ile2, Asn6, Leu5, and His4. The binding energy between the epitope and the HLA-B*15:01 receptor was -7.5 kcal/mol. The three-dimensional structure of the peptide and the binding interactions between the peptide and HLA-B*15:01 after docking were visualized with Discovery Studio 2021 and are shown in Figure 7.

In-silico studies can help researchers save the time and cost required for experimental work. In this study, we investigated a hypothetical protein from the bacterium Neisseria gonorrhoeae using several bioinformatics tools. Our analysis identified several physicochemical and functional properties of the protein. For instance, it was predicted to be a thermostable, cytoplasmic protein with a theoretical pI of 8.07 and potential functions in gene transfer and cell cycle regulation. This study may improve our understanding of the structure and function of proteins with unknown functions. In addition, the computational approach to vaccine development against this pathogen may serve as a basis for further in-vivo and in-vitro research and may help other researchers to carry out similar in-silico studies.

HP: Hypothetical protein

CTL: Cytotoxic T lymphocyte

Ethics approval and consent to participate

Not applicable. This study did not involve human or animal subjects.

Consent for publication

Authors have no conflict of interest.

Competing interests

All authors declare that they have no competing interests.

Availability of data and materials

The dataset(s) supporting the conclusions of this article is (are) included within the article.

Funding

This study did not receive any funding from any funding agency or research institution.

Author contributions

LM designed the study and the experimental work. MRH and KF collected the necessary data and performed the data analysis. MRH, KF, LM and MZI participated in drafting the manuscript. LM supervised the work, reviewed the draft, and thoroughly checked and revised the manuscript for necessary changes in format. LM also handled all correspondence. All authors read and approved the final version of the manuscript.

Acknowledgements

All the authors are thankful to the Department of Microbiology, Faculty of Life and Earth Sciences, Jagannath University, Dhaka, Bangladesh.

Quillin, S. J., & Seifert, H. S. (2018). Neisseria gonorrhoeae host adaptation and pathogenesis. Nature Reviews Microbiology, 16(4), 226–240. https://doi.org/10.1038/nrmicro.2017.169
Lin, E. Y., Adamson, P. C., & Klausner, J. D. (2021). Epidemiology, Treatments, and Vaccine Development for Antimicrobial-Resistant Neisseria gonorrhoeae: Current Strategies and Future Directions. Drugs, 81(10), 1153–1169. https://doi.org/10.1007/s40265-021-01530-0
Dillard, J. P. (2011). Genetic Manipulation of Neisseria gonorrhoeae. Current Protocols in Microbiology, 23(1). https://doi.org/10.1002/9780471729259.mc04a02s23
Unemo, M., Seifert, H. S., Hook, E. W., Hawkes, S., Ndowa, F., & Dillon, J. A. R. (2019). Gonorrhoea. Nature Reviews Disease Primers, 5(1). https://doi.org/10.1038/s41572-019-0128-6
James, J. F., & Swanson, J. (1977). The capsule of the gonococcus. Journal of Experimental Medicine, 145(4), 1082–1086. https://doi.org/10.1084/jem.145.4.1082
Alturki, Y. D., Albalawi, S. M. S., Alyami, B. A., Alahmari, S. A. M., Al Hashim, A. A., Alalyani, N. S. J., … Althobaiti, H. S. (2020). An Overview on Gonorrhea Diagnosis and Management in Primary Health Care Centre. International Journal of Pharmaceutical Research & Allied Sciences, 9(4).
Liu, Y. H., Wang, Y. H., Liao, C. H., & Hsueh, P. R. (2019). Emergence and Spread of Neisseria gonorrhoeae Strains with High-Level Resistance to Azithromycin in Taiwan from 2001 to 2018. Antimicrobial Agents and Chemotherapy, 63(9). https://doi.org/10.1128/aac.00773-19
Makinia, W. E., Ojunga, M., & Ayayo, Z. N. O. (2020). Efficacy of Securidaca longipedunculata Fresen (Polygalaceae) against Two Standard Isolates of Neisseria gonorrhoeae. International Journal of Biochemistry Research & Review, 61–68. https://doi.org/10.9734/ijbcrr/2020/v29i630199
Danielsson, D., & Kronvall, G. (1974). Slide Agglutination Method for the Serological Identification of Neisseria gonorrhoeae with Anti-Gonococcal Antibodies Adsorbed to Protein A-Containing Staphylococci. Applied Microbiology, 27(2), 368–374. https://doi.org/10.1128/am.27.2.368-374.1974
Aslanzadeh, J., & Jones, M. (2002). Comparison of M4 and M4RT media for transporting cervical swab samples for PCR detection of Chlamydia trachomatis and Neisseria gonorrhoeae. Annals of clinical and laboratory science, 32(1), 61–64.
Mansoor, I. (2014). Prevalence of Gonorrhea among adult male with urethritis in Erbil City. Zanco Journal of Medical Sciences, 18(2), 692–696. https://doi.org/10.15218/zjms.2014.0018
Spence, J. M., Wright, L., & Clark, V. L. (2008). Laboratory Maintenance of Neisseria gonorrhoeae. Current Protocols in Microbiology, 8(1). https://doi.org/10.1002/9780471729259.mc04a01s8
Turner, S., Moir, J., Griffiths, L., Overton, T., Smith, H., & Cole, J. (2005). Mutational and biochemical analysis of cytochrome c′, a nitric oxide-binding lipoprotein important for adaptation of Neisseria gonorrhoeae to oxygen-limited growth. Biochemical Journal, 388(2), 545–553. https://doi.org/10.1042/bj20041766
Ng, L. K., & Martin, I. E. (2005). The Laboratory Diagnosis of Neisseria gonorrhoeae. Canadian Journal of Infectious Diseases and Medical Microbiology, 16(1), 15–25. https://doi.org/10.1155/2005/323082
Fairley, C. K., Hocking, J. S., Zhang, L., & Chow, E. P. (2017). Frequent Transmission of Gonorrhea in Men Who Have Sex with Men. Emerging Infectious Diseases, 23(1), 102–104. https://doi.org/10.3201/eid2301.161205
Matoga, M., Chen, J. S., Krysiak, R., Ndalama, B., Massa, C., Bonongwe, N., Mathiya, E., Kamtambe, B., Jere, E., Chikaonda, T., Golparian, D., Unemo, M., Cohen, M. S., Hobbs, M. M., & Hoffman, I. F. (2021). Gentamicin Susceptibility in Neisseria gonorrhoeae and Treatment Outcomes for Urogenital Gonorrhea After 25 Years of Sustained Gentamicin Use in Malawi. Sexually Transmitted Diseases, 49(4), 251–256. https://doi.org/10.1097/olq.0000000000001580
Unemo, M., & Shafer, W. M. (2014). Antimicrobial Resistance in Neisseria gonorrhoeae in the 21st Century: Past, Evolution, and Future. Clinical Microbiology Reviews, 27(3), 587–613. https://doi.org/10.1128/cmr.00010-14
Suay-García, B., & Pérez-Gracia, M. (2018). Future Prospects for Neisseria gonorrhoeae Treatment. Antibiotics, 7(2), 49. https://doi.org/10.3390/antibiotics7020049
Fogel, G., Collins, C., Li, J., & Brunk, C. (1999). Prokaryotic Genome Size and SSU rDNA Copy Number: Estimation of Microbial Relative Abundance from a Mixed Population. Microbial Ecology, 38(2), 93–113. https://doi.org/10.1007/s002489900162
Chung, G. T., Yoo, J. S., Oh, H. B., Lee, Y. S., Cha, S. H., Kim, S. J., & Yoo, C. K. (2008). Complete Genome Sequence of Neisseria gonorrhoeae NCCP11945. Journal of Bacteriology, 190(17), 6035–6036. https://doi.org/10.1128/jb.00566-08
Paul, S., Saha, M., & Bhoumik, N. C. (2015). In silico structural and functional annotation of Mycoplasma genitalium hypothetical protein MG_377. International Journal Bioautomation, 19, 15–24.
Ashrafi, H., Siraji, M. I., Showva, N. N., Hossain, M. M., Hossan, T., Hasan, M. A., Shohael, A. M., & Shawan, M. M. A. K. (2019). Structure to function analysis with antigenic characterization of a hypothetical protein, HPAG1_0576 from Helicobacter pylori HPAG1. Bioinformation, 15(7), 456–466. https://doi.org/10.6026/97320630015456
Rahman, A., Susmi, T. F., Yasmin, F., Karim, M. E., & Hossain, M. U. (2020). Functional annotation of an ecologically important protein from Chloroflexus aurantiacus involved in polyhydroxyalkanoates (PHA) biosynthetic pathway. SN Applied Sciences, 2(11). https://doi.org/10.1007/s42452-020-03598-x
Mazumder, L., Hasan, M., Rus’d, A. A., & Islam, M. A. (2021b). In-silico characterization and structure-based functional annotation of a hypothetical protein from Campylobacter jejuni involved in propionate catabolism. Genomics & Informatics, 19(4), e43. https://doi.org/10.5808/gi.21043
Islam, M. S., Shahik, S. M., Sohel, M., Patwary, N. I. A., & Hasan, M. A. (2015). In Silico Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139. Genomics & Informatics, 13(2), 53. https://doi.org/10.5808/gi.2015.13.2.53
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340
Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Molecular biology and evolution, 35(6), 1547–1549. https://doi.org/10.1093/molbev/msy096
Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R. D., & Bairoch, A. (2003). ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic acids research, 31(13), 3784–3788. https://doi.org/10.1093/nar/gkg563
Homology Modelling, Bioinformatics Analysis and Insilico Functional Annotation of an Antitoxin Protein from Streptomyces coelicolor A3 (2). (2015). Journal of Proteomics & Computational Biology, 2(1), 01–07. https://doi.org/10.13188/2572-8679.1000006
Yu, C. S., Lin, C. J., & Hwang, J. K. (2004). Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein science: a publication of the Protein Society, 13(5), 1402–1406. https://doi.org/10.1110/ps.03479604
Yu, N. Y., Wagner, J. R., Laird, M. R., Melli, G., Rey, S., Lo, R., Dao, P., Sahinalp, S. C., Ester, M., Foster, L. J., & Brinkman, F. S. (2010). PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics (Oxford, England), 26(13), 1608–1615. https://doi.org/10.1093/bioinformatics/btq249
Bhasin, M., Garg, A., & Raghava, G. P. (2005). PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics (Oxford, England), 21(10), 2522–2524. https://doi.org/10.1093/bioinformatics/bti309
Secondary structure analysis of a protein using SOPMA. Ettimadai: Amrita Vishwa Vidyapeetham Virtual Lab, 2012. Accessed 2021 Nov 30. Available from: https://vlab.amrita.edu/?sub=3&brch=275&sim=1454&cnt=1
Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. Journal of molecular biology, 292(2), 195–202. https://doi.org/10.1006/jmbi.1999.3091
Zimmermann, L., Stephens, A., Nam, S. Z., Rau, D., Kübler, J., Lozajic, M., Gabler, F., Söding, J., Lupas, A. N., & Alva, V. (2018). A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. Journal of Molecular Biology, 430(15), 2237–2243. https://doi.org/10.1016/j.jmb.2017.12.007
Krieger, E., Joo, K., Lee, J., Lee, J., Raman, S., Thompson, J., Tyka, M., Baker, D., & Karplus, K. (2009). Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8. Proteins: Structure, Function, and Bioinformatics, 77(S9), 114–122. https://doi.org/10.1002/prot.22570
Likova E, Petkov P, Ilieva N, Litov L. The PyMOL Molecular Graphics System, version 2.0. New York: Schrödinger, LLC, 2015.
Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography, 26(2), 283–291. https://doi.org/10.1107/s0021889892009944
Colovos, C., & Yeates, T. O. (1993). Verification of protein structures: Patterns of nonbonded atomic interactions. Protein Science, 2(9), 1511–1519. https://doi.org/10.1002/pro.5560020916
Lüthy, R., Bowie, J. U., & Eisenberg, D. (1992). Assessment of protein models with three-dimensional profiles. Nature, 356(6364), 83–85. https://doi.org/10.1038/356083a0
Blum, M., Chang, H. Y., Chuguransky, S., Grego, T., Kandasaamy, S., Mitchell, A., Nuka, G., Paysan-Lafosse, T., Qureshi, M., Raj, S., Richardson, L., Salazar, G. A., Williams, L., Bork, P., Bridge, A., Gough, J., Haft, D. H., Letunic, I., Marchler-Bauer, A., . . . Finn, R. D. (2020). The InterPro protein families and domains database: 20 years on. Nucleic Acids Research, 49(D1), D344–D354. https://doi.org/10.1093/nar/gkaa977
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S. R., Griffiths-Jones, S., Howe, K. L., Marshall, M., & Sonnhammer, E. L. (2002). The Pfam Protein Families Database. Nucleic Acids Research, 30(1), 276–280.
Mistry, J., Chuguransky, S., Williams, L., Qureshi, M., Salazar, G., Sonnhammer, E. L. L., Tosatto, S. C. E., Paladin, L., Raj, S., Richardson, L. J., Finn, R. D., & Bateman, A. (2020). Pfam: The protein families database in 2021. Nucleic Acids Research, 49(D1), D412–D419. https://doi.org/10.1093/nar/gkaa913
Lu, S., et al. (2020). CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Research, 48(D1), D265–D268.
Tian, W., Chen, C., Lei, X., Zhao, J., & Liang, J. (2018). CASTp 3.0: computed atlas of surface topography of proteins. Nucleic Acids Research, 46(W1), W363–W367. https://doi.org/10.1093/nar/gky473
Larsen, M. V., Lundegaard, C., Lamberth, K., Buus, S., Lund, O., & Nielsen, M. (2007). Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics, 8(1). https://doi.org/10.1186/1471-2105-8-424
Peters, B., & Sette, A. (2005). Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics, 6(1). https://doi.org/10.1186/1471-2105-6-132
Doytchinova, I. A., & Flower, D. R. (2007). VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics, 8(1). https://doi.org/10.1186/1471-2105-8-4
Gupta, S., Kapoor, P., Chaudhary, K., Gautam, A., Kumar, R., & Raghava, G. P. S. (2013). In Silico Approach for Predicting Toxicity of Peptides and Proteins. PLoS ONE, 8(9), e73957. https://doi.org/10.1371/journal.pone.0073957
Dimitrov, I., Bangov, I., Flower, D. R., & Doytchinova, I. (2014). AllerTOP v.2–a server for in silico prediction of allergens. Journal of molecular modeling, 20(6), 2278. https://doi.org/10.1007/s00894-014-2278-5
Timmons, P. B., & Hewage, C. M. (2021). APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Briefings in Bioinformatics, 22(6), bbab308. https://doi.org/10.1093/bib/bbab308
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Research, 28(1), 235–242. https://doi.org/10.1093/nar/28.1.235
Eberhardt, J., Santos-Martins, D., Tillack, A. F., & Forli, S. (2021). AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling, 61(8), 3891–3898. https://doi.org/10.1021/acs.jcim.1c00203
Biovia, D.S. (2015) Discovery Studio Modeling Environment. Dassault Syst. Release, San Diego, 4.
Homology Modelling, Bioinformatics Analysis and Insilico Functional Annotation of an Antitoxin Protein from Streptomyces coelicolor A3 (2). (2015c). Journal of Proteomics & Computational Biology, 2(1), 01–07. https://doi.org/10.13188/2572-8679.1000006
Shackelford, R. E., Kaufmann, W. K., & Paules, R. S. (1999). Cell cycle control, checkpoint mechanisms, and genotoxic stress. Environmental Health Perspectives, 107(suppl 1), 5–24. https://doi.org/10.1289/ehp.99107s15
