Homo-oligomer models prediction, functional characterization, and Ab initio docking analysis of hypothetical protein HP33 from Vibrio harveyi strain

DOI: https://doi.org/10.21203/rs.3.rs-1488433/v1

Abstract

HP33, a unique hypothetical protein of Vibrio harveyi strain Y6, has been reported to be antigenic and to be associated with scale drop and muscle necrosis disease (SDMND) in Lates calcarifer. This distinct protein has been found only in SDMND-associated V. harveyi, and the disease causes 40-50% mortality. Despite these devastating losses, there is still no cure or vaccine for the disease. The present work therefore sought to elucidate the roles of HP33. Sequence similarity searches across the available bioinformatics databases were used to find homologous proteins. Based on its physicochemical properties and subcellular localization, HP33 is predicted to be a stable, hydrophilic outer membrane protein. Functional annotation tools predicted that the protein has a single domain, and random coil was predominant in its predicted secondary structure. The secondary and tertiary structures were predicted; after YASARA energy minimization, the 3D model became more stable, as confirmed by several quality assessment methods. The active site and interacting proteins were also examined. Notably, HP33 contains a single functional domain that may be associated with a biosynthetic gene cluster. In addition, protein-ligand docking analysis identified two candidate ligands that may bind HP33. Finally, the homo-oligomer models and ab initio docking results provide insights for further immunological studies and may be relevant to future genetic research on V. harveyi.

1. Introduction

Asian sea bass (Lates calcarifer), or barramundi, is a valuable fish in marine aquaculture. Asian sea bass aquaculture production increased from 11,000 to 76,000 tonnes per year between 1990 and 2015. Nevertheless, Asian sea bass farms have suffered significant economic losses due to disease outbreaks. In 2017, outbreaks of a disease called scale drop and muscle necrosis (SDMN) were reported in sea bass farms in Vietnam (Dong et al., 2017). Affected cages showed a relatively high mortality rate (40-50%), and diseased fish exhibited scale loss, fin rot, and severe muscle necrosis. A strain of Vibrio harveyi, designated SDMN-Y6 (or Y6), was consistently found in diseased fish specimens. Histopathological examination of the fish revealed collapsed renal tubules and sloughing epithelial cells, unique lesions suggesting the involvement of bacterial toxin(s) (Dong et al., 2017), comparable to toxin-producing Vibrio species that induce acute hepatopancreatic necrosis in shrimp (Sirikharin et al., 2015). Genome analysis of the SDMN-associated V. harveyi Y6 revealed several putative virulence genes and an intact prophage carrying a toxin gene (Kayansamruaj et al., 2019). SDMND was recently reported in farmed Asian sea bass in Thailand, where V. harveyi was the dominant bacterium and the HP33 protein was implicated in the disease (Kwankijudomkul et al., 2021). The same study reported that HP33 is a crucial protein that allows the pathogen to penetrate host cells, making it a suitable target antigen for vaccine development (Kwankijudomkul et al., 2021). In addition, the gene encoding HP33, or the protein itself, could be used as a biomarker for early detection of SDMND-associated V. harveyi in Asian sea bass. BLAST searches of HP33 against the GenBank database found this gene in only seven of the 48 available V. harveyi genomes, suggesting that HP33-carrying V. harveyi is a relatively new variant (Kwankijudomkul et al., 2021). 
Currently, the function of HP33 is unknown. Like HP33, many proteins of this bacterium are classified as HPs because their structures and biological activities are unknown. Such proteins can be extremely useful, and annotating them can yield new insights into their structures, pathways, and activities. Bioinformatics approaches can therefore be used to predict and analyze the structures of these HPs, their biological functions, and their interactions with other proteins. Structural and functional annotation of HPs may also uncover novel biomarkers and pharmaceutical targets (Lubec et al., 2005). Several bioinformatics databases and techniques have been used to successfully annotate the roles of putative proteins in a variety of pathogenic bacteria (Turab Naqvi et al., 2017). 

Over the last several years, massive volumes of genomic and transcriptomic data have been deposited in online databases. Computational analysis of gene structure and nucleic acid sequences predicts that a considerable fraction of these data encode proteins, yet there is no evidence for their in vivo expression. Such proteins are called hypothetical proteins (HPs), and they may make up as much as half of the proteome of some organisms (Minion et al., 2004). HPs have been recognized as crucial for bridging the gap between genomic and proteomic data (Ijaq et al., 2022). Characterizing the structures and functions of these HPs is a challenge for functional genomics and for biology in general; addressing it could connect genomic sequence data with structural and functional genomics and proteomics. Functional annotation of HPs is also important for understanding a wide variety of diseases, developing potential antibacterial agents against pathogens, and understanding drug resistance, infection, and other biological pathways (Naveed et al., 2016). 

As in silico methods have advanced, assigning functions to HPs with multiple bioinformatics tools has become easier. We aimed to better understand HP33 of V. harveyi and to identify potential drug targets by assigning structural and biological functions to this hypothetical protein. We predicted its subcellular localization, secondary structure, and active site, and investigated its protein-protein interactions. In addition, homology modeling combined with ab initio techniques was used to build a good-quality model of HP33.

2. Methods

2.1 Identification of similarity and retrieval of sequences

The amino acid sequence of hypothetical protein HP33 of the Vibrio harveyi Y6 strain was retrieved from Kwankijudomkul et al. (2021), saved in FASTA format, and submitted to multiple prediction servers for in silico analysis. To obtain a first indication of the protein's function, a BLASTp similarity search (Johnson et al., 2008) was run against the NCBI non-redundant protein database (Boeckmann et al., 2003) to find proteins that may have similar characteristics.
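Preparing the query in FASTA format is a small but error-prone step before submitting it to BLASTp or the other servers. The following is a minimal standard-library sketch of writing and re-reading a FASTA record; the header `HP33_placeholder` and the short sequence are hypothetical placeholders, not the actual HP33 sequence.

```python
from typing import Dict, List


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record, wrapping residues at `width` per line."""
    lines = [">" + header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, ignoring blank lines."""
    chunks: Dict[str, List[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


# Hypothetical placeholder record (NOT the real HP33 sequence).
record = to_fasta("HP33_placeholder", "MKTAYIAKQRQISFVKSHFSRQ", width=10)
sequences = parse_fasta(record)
```

Wrapping at 60 residues per line is a common convention; most servers also accept unwrapped sequences.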

2.2. Phylogeny analysis and multiple sequence alignment

Multiple sequence alignment of HP33 and proteins with similar structural characteristics was performed with the BioEdit biological sequence alignment editor (Alzohairy, 2011). Phylogenetic analysis was performed using an older version of Molecular Evolutionary Genetics Analysis (MEGA) (https://megasoftware.net/).
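Distance-based tree building in tools such as MEGA starts from pairwise distances computed on the alignment. The distance model and tree method used in this study are not stated, so the sketch below only illustrates the simplest option, the uncorrected p-distance with pairwise deletion of gapped columns; the toy alignment is hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every pair of aligned sequences."""
    return {(i, j): p_distance(alignment[i], alignment[j])
            for i, j in combinations(alignment, 2)}


# Hypothetical toy alignment (not the HP33 data).
toy = {"query": "MKT-AYIA", "hitA": "MKTQAYIA", "hitB": "MRT-GYLA"}
dists = distance_matrix(toy)
```

A matrix like this is the input to clustering methods such as neighbor-joining or UPGMA; corrected models (e.g. Poisson) adjust the p-distance for multiple substitutions at the same site.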

2.3. Physicochemical properties analysis

The ExPASy ProtParam tool was used to compute physical and chemical parameters, including molecular weight, theoretical pI, instability index, extinction coefficient, atomic composition, estimated half-life, aliphatic index, and grand average of hydropathicity (GRAVY) (Gasteiger et al., 2003). 

2.4. Subcellular localization analysis

Subcellular localization was predicted with CELLO (Yu & Hwang, 2008), and the results were compared with predictions from PSORTb (Yu et al., 2010), PSLpred (Bhasin et al., 2005), and SOSUIGramN. Membrane topology was predicted with TMHMM (Möller et al., 2001), HMMTOP (Tusnády & Simon, 2001), and CCTOP (Dobson et al., 2015), which report predicted membrane-spanning hydrophobic segments as transmembrane regions.
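TMHMM, HMMTOP, and CCTOP use statistical models that are not reproduced here. To illustrate the simpler, classic idea behind hydrophobic transmembrane segment detection, the sketch below slides a Kyte-Doolittle hydropathy window along a sequence; the 19-residue window and 1.6 threshold follow Kyte and Doolittle's suggestion for membrane-spanning segments and are assumptions, not the settings of the servers used in this study.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Return merged 1-based (start, end) spans of windows whose mean
    hydropathy exceeds `threshold`. Only the 20 standard residues are accepted."""
    seq = seq.upper()
    segments: List[Tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        if mean > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # extend an overlapping span
            else:
                segments.append((start, end))
    return segments


# Toy sequence: a 20-residue Leu stretch flanked by Asp residues.
print(hydrophobic_segments("D" * 10 + "L" * 20 + "D" * 10))  # [(6, 35)]
```

Because merged windows extend slightly into the flanking residues, the reported span is wider than the hydrophobic stretch itself; HMM-based predictors give sharper boundaries.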

2.5. Virulence factor prediction

MP3, VICMpred, and the VFDB server were used to predict whether HP33 is a virulence factor. MP3 integrates support vector machine (SVM) and hidden Markov model (HMM) approaches, allowing proteins of pathogenic origin to be identified in both metagenomic and genomic datasets with greater accuracy and efficiency (Gupta et al., 2014). VICMpred applies SVM-based methods to the patterns and amino acid and dipeptide composition of bacterial protein sequences, with a reported overall accuracy of 71.75% (Saha & Raghava, 2006). To reliably detect probable pathogenic strains, VFDB uses the VFanalyzer pipeline to perform a systematic and comprehensive sequence similarity search against hierarchical prebuilt datasets (Liu et al., 2019).
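Composition-based classifiers such as VICMpred turn a sequence into a fixed-length numeric vector before the SVM sees it. VICMpred's exact encoding and trained model are not reproduced here; the sketch below only shows how a 400-dimensional dipeptide composition vector of the kind described above can be computed, using a hypothetical toy sequence.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 keys


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each of the 400 dipeptides among all overlapping dipeptides."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


features = dipeptide_composition("ACDAC")  # hypothetical toy sequence
```

The vector always has the same length regardless of protein size, which is what makes it usable as SVM input.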

2.6. Identification of conserved domains, motifs, folds, families, and superfamilies

The functions of HP33 were predicted using several functional databases and tools, including CDD, Pfam, InterProScan, and SMART. The NCBI Conserved Domain Database (CDD) was searched for conserved domains (Marchler-Bauer et al., 2005). The MEME Suite was used to analyze the motifs of HP33 (Bailey et al., 2009). The evolutionary relationships of the protein were assigned using the Pfam (Finn, 2005) and SUPERFAMILY (Wilson et al., 2006) databases. For functional analysis, the protein sequence analysis and classification software InterProScan (Hunter et al., 2008) was used; it scans input sequences against the InterPro protein signature databases (Jones et al., 2014). SMART compares input sequences with database entries and identifies sequences with comparable domain architecture and profiles (Letunic et al., 2012). The PFP-FunDSeqE server (Shen & Chou, 2009) was used to recognize protein fold patterns.

2.7. Evaluation of performance

To assess the accuracy of the in silico methods used for functional prediction and domain characterization of HP33, receiver operating characteristic (ROC) curve analysis was performed (Bradley, 1997). Three levels were evaluated to estimate efficiency for each of the five tools. The input data had two columns. In the first column, true negative predictions were coded as binary 0 and true positive predictions as binary 1. In the second column, integer ratings from one to five were assigned, with a larger number indicating greater confidence. The input data were submitted in input format 1 to the ROC analysis server (http://www.jrocfit.org) (Alemayehu & Zou, 2012), which returned the accuracy, sensitivity, specificity, and area under the ROC curve (AUC).
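The two-column input described above is enough to compute the reported metrics. As a hedged illustration, the sketch below computes accuracy, sensitivity, and specificity at one rating cut-off (a threshold of 3 is an assumption) and the empirical, nonparametric AUC (Mann-Whitney form). JROCFIT instead fits a binormal ROC model by maximum likelihood, so its AUC can differ from this estimate; the toy data are hypothetical.

```python
from typing import Dict, Sequence


def roc_summary(truth: Sequence[int], ratings: Sequence[int],
                threshold: int = 3) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity at one rating cut-off, plus the
    empirical (Mann-Whitney) area under the ROC curve.

    truth   -- 1 for actually positive cases, 0 for actually negative cases
    ratings -- confidence ratings on a 1-5 scale (higher = more confident)
    """
    if len(truth) != len(ratings):
        raise ValueError("truth and ratings must have the same length")
    predicted = [1 if r >= threshold else 0 for r in ratings]
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, predicted))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, predicted))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, predicted))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, predicted))

    positives = [r for t, r in zip(truth, ratings) if t == 1]
    negatives = [r for t, r in zip(truth, ratings) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # AUC = P(random positive rated above random negative); ties count 1/2.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)

    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": wins / (len(positives) * len(negatives)),
    }


# Hypothetical toy input in the two-column layout described above.
summary = roc_summary(truth=[1, 1, 0, 0], ratings=[5, 2, 4, 1])
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes performance over all cut-offs.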

2.8. Prediction of secondary structure

The secondary structure of HP33 was predicted using the PROTEUS Structure Prediction Server 2.0 (Montgomerie et al., 2006), whose algorithm employs artificial neural networks and machine learning techniques. PROTEUS is a web server that predicts a protein's secondary structure (α-helices, β-sheets, and coils) from its primary sequence. 

2.9. 3D structure prediction, Refinement, Validation, and Assessment of model quality

The 3D structure of the target protein was predicted using the RaptorX server (http://raptorx.uchicago.edu/) (Xu et al., 2021) and refined using GalaxyWeb. Because homology modeling relies on experimentally determined 3D protein structures, validating the resulting model is a vital step. The model was first submitted to ProSA-web, which computes a z-score reflecting the overall quality of the model (Mou et al., 2021); a z-score outside the range characteristic of native proteins indicates an erroneous structure (Islam & Mou, 2022). A Ramachandran plot analysis was performed with the Ramachandran Plot Server (https://zlab.umassmed.edu/bu/rama/) to establish the overall quality of the protein model. The predicted three-dimensional structure was subsequently evaluated with PROCHECK, Verify3D, and the ERRAT structure evaluation server.
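A Ramachandran plot places each residue by its backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral angle from four atom positions using the standard atan2 formula; it is an illustration of the geometry, not the code used by the Ramachandran Plot Server or PROCHECK.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Sequence[float], p1: Sequence[float],
             p2: Sequence[float], p3: Sequence[float]) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: for phi, pass C(i-1), N(i), CA(i), C(i) from a PDB model.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))
```

Using atan2 rather than an arccosine keeps the sign of the angle, which is what distinguishes, for example, right-handed helical (negative φ) from left-handed regions of the plot.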

2.10. Protein-Protein Interaction Analysis

STRING (http://string-db.org/) is a pre-computed global resource for collecting and analyzing protein-protein associations, built on the premise that contextual information is relevant for inferring protein function (von Mering et al., 2003). Its distinctive scoring scheme benchmarks each type of association against a common reference set and combines them into a single confidence score per prediction. A graphical network of weighted protein interactions gives a high-level view of functional linkage and makes it easier to analyze modularity in biological processes (Sivashankari & Shanmughavel, 2006). In this study, STRING was used to identify known and predicted protein interactions based on physical and functional associations. These associations are derived from genomic context, high-throughput experiments, (conserved) co-expression, and prior knowledge, which the database integrates quantitatively (Szklarczyk et al., 2015).
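To make "combined into a single confidence score" concrete, the sketch below follows the naive-Bayes-style combination described in STRING's documentation: each evidence channel's score has a prior probability of random association removed, the corrected scores are combined as if independent, and the prior is added back once. The prior value of 0.041 and the toy channel scores are assumptions for illustration; STRING's actual per-channel scores come from its own benchmarking.

```python
from typing import Iterable

PRIOR = 0.041  # assumed prior probability of a random association


def combine_scores(channel_scores: Iterable[float], prior: float = PRIOR) -> float:
    """Combine per-channel scores (0-1) into one score, assuming independent channels."""
    remaining = 1.0  # product of (1 - corrected score) over all channels
    for s in channel_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("channel scores must lie in [0, 1]")
        corrected = max(0.0, (s - prior) / (1.0 - prior))  # remove the prior
        remaining *= 1.0 - corrected
    combined = 1.0 - remaining
    return combined + prior * (1.0 - combined)  # add the prior back once


# Hypothetical channel scores, e.g. co-expression 0.5 and text mining 0.5.
print(round(combine_scores([0.5, 0.5]), 3))
```

With a single channel the function returns that channel's score unchanged, and each additional independent channel can only raise the combined score.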

2.11. Protein disulfide bonds

The formation of disulfide bonds between cysteine residues is critical for folding a protein into a functional and stable shape. To gain insight relevant to experimental structure determination and protein stability, CYSPRED and DIANA were used to predict disulfide bonds in the hypothetical protein. CYSPRED is a neural network-based predictor trained to discriminate the bonding state of each cysteine in a query protein, that is, whether it forms a disulfide bridge (Grützner et al., 2009). DIANA was also used because it predicts disulfide connectivity, i.e., which cysteine pairs are linked, from the input sequence. Accurately estimating disulfide bridges is important both for understanding the function of a hypothetical protein and for tertiary structure prediction (Ferrè & Clote, 2005). Knowledge of the tertiary structure can, in turn, help identify potential docking sites on hypothetical proteins, moving a step closer to developing drugs against diseases associated with the hypothetical gene. 
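Connectivity predictors such as DIANA score candidate cysteine pairs, so it helps to see how small the candidate space is. The sketch below enumerates every cysteine pair with its sequence separation; fed the three cysteines listed in Table 4 (Cys23, Cys212, Cys251), it yields separations of 189, 228, and 39 residues, matching the "Distance" values reported by DIANA. The helper names are illustrative.

```python
from itertools import combinations
from typing import List, Tuple


def cysteine_positions(seq: str) -> List[int]:
    """1-based positions of cysteine residues in a sequence."""
    return [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]


def candidate_pairs(positions: List[int]) -> List[Tuple[int, int, int]]:
    """All cysteine pairs with their sequence separation (b - a)."""
    return [(a, b, b - a) for a, b in combinations(sorted(positions), 2)]


# Cys23, Cys212 and Cys251 are the cysteines listed in Table 4.
for a, b, sep in candidate_pairs([23, 212, 251]):
    print(f"Cys{a}-Cys{b}: separation {sep}")
```

With an odd number of cysteines, at most one intramolecular disulfide can form among three residues, leaving at least one cysteine free.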

2.12. Ligand binding site prediction

The GalaxySite server was used to predict protein-ligand binding sites of the hypothetical protein. GalaxySite predicts the ligand-binding sites of a query protein by protein-ligand docking on its tertiary structure, which may be either an experimental structure (with or without a bound ligand) or a model structure. If a protein sequence is provided, GalaxySite first predicts the structure with the GalaxyTBM method, without a refinement step. The binding ligands are predicted from the complex structures of similar proteins detected by HHsearch, and the protein-ligand complex structures are then predicted with a ligand docking method called LigDockCSA (Heo et al., 2014).

2.13. Homo-oligomer models prediction

The GalaxyHomomer server (http://galaxy.seoklab.org/homomer) predicts the homo-oligomer structure of a protein from its monomer amino acid sequence or structure. Homo-oligomerization is abundant in nature and is frequently related to physiological functions such as metabolism, signaling, and immunity, so information on homo-oligomer structure is crucial for a molecular-level understanding of protein activity and regulation (Baek et al., 2017). Given an amino acid sequence, GalaxyHomomer tries sequence similarity-based, structure similarity-based, and ab initio docking methods until five models are generated. For model 1, less reliable loop or terminal regions (unreliable local regions, ULRs) are automatically detected and remodeled, and the entire oligomer structure is relaxed with GalaxyRefineComplex. If a monomer structure is provided instead, the structure similarity-based method and ab initio docking are used to build five oligomer models, and any less reliable regions specified by the user are remodeled during oligomer modeling.

2.14. Detecting active sites

The active site of HP33 was predicted with the Computed Atlas of Surface Topography of Proteins (CASTp) (Dundas et al., 2006), a web-based tool for identifying, delineating, and measuring concave surface regions on 3D protein structures. CASTp provides detailed, comprehensive, and quantitative descriptions of a protein's topographic features, precisely detecting and measuring pockets on the protein surface and cavities buried in the interior of the 3D structure. It has therefore become a widely used tool for predicting the regions and critical residues of proteins that interact with ligands (Islam & Mou, 2022; W. Tian et al., 2018).

3. Results and discussion

3.1. Similarity identification, Multiple sequence alignment, and phylogeny analysis

BLASTp against the non-redundant database identified proteins similar to HP33 (Table 1). The FASTA sequences of HP33 and the identified homologs were aligned by multiple sequence alignment, and phylogenetic analysis was used to corroborate the homology assessments at the complex and subunit levels. A phylogenetic tree built from the alignment and BLAST results indicates how HP33 relates to these proteins (Figure 1); branch lengths were also taken into account.

Table 1. Proteins similar to HP33 identified by BLASTp against the non-redundant (nr) database (query: HP33)

Protein ID | Organism | Protein name | Identity (%) | E-value
WP_009697735.1 | Vibrio harveyi | hypothetical protein | 99.33 | 0.00
WP_045488642.1 | Vibrio harveyi | hypothetical protein | 99.00 | 0.00
WP_045455267.1 | Vibrio campbellii | hypothetical protein | 96.99 | 0.00
WP_104035359.1 | Vibrio jasicida | hypothetical protein | 65.02 | 0.00
WP_045495220.1 | Vibrio hyugaensis | hypothetical protein | 64.29 | 0.00

3.2. Physicochemical features

ProtParam calculates the molecular weight of a protein by summing the average masses of its amino acid residues and adding the mass of one water molecule (Wilkins et al., 1999). HP33 consists of 299 amino acids; the most abundant is Ser (31), followed by Leu (27), Glu (23), Gly (20), Thr (20), Asp (19), Val (18), Ile (18), Lys (18), Gln (17), Ala (17), Asn (15), Pro (12), Phe (11), Met (8), Arg (8), Tyr (8), Cys (3), His (3), and Trp (3). The computed molecular weight was 32987.00 Da, and the theoretical pI of 4.63 indicates that the protein carries a net negative charge at neutral pH. The pI is computed from the pKa values of the ionizable groups, which are determined by the amino acid side chains, and it plays a crucial role in a protein's pH-dependent properties (Pace et al., 2009). The protein contains 26 positively charged (Arg + Lys) and 42 negatively charged (Asp + Glu) residues, consistent with its acidic pI. The instability index estimates how stable a protein is in a test tube: a value below 40 predicts a stable protein, whereas a value above 40 suggests that the protein may be unstable (Gamage et al., 2019). With a computed instability index of 37.34, HP33 is therefore classified as stable. The aliphatic index is the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine) (Nehete et al., 2013); the value of 81.84 suggests that the protein is stable over a wide temperature range. The GRAVY value was -0.370. GRAVY is calculated by summing the hydropathy values of all residues and dividing by the sequence length; the more positive the score, the more hydrophobic the protein (Zhang et al., 2016). A negative GRAVY therefore indicates that HP33 is hydrophilic. The estimated half-life was 30 hours in mammalian reticulocytes (in vitro), more than 20 hours in yeast, and more than 10 hours in Escherichia coli. 
The molecular formula of the protein was C1447H2286N382O475S11.
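Several of the ProtParam values can be recomputed from the residue composition alone, which is a useful consistency check. The sketch below recomputes the aliphatic index (Ikai's formula with mole percentages), GRAVY (Kyte-Doolittle scale), and the molecular formula. The His count of 3 is required for the composition to total 299 residues and to reproduce the reported formula. The instability index is not reproduced because it needs the full sequence (dipeptide weights), which is not given here.

```python
from typing import Dict

# Residue counts of HP33 (His = 3 makes the total 299 and matches the formula).
COMPOSITION: Dict[str, int] = {
    "S": 31, "L": 27, "E": 23, "G": 20, "T": 20, "D": 19, "V": 18,
    "I": 18, "K": 18, "Q": 17, "A": 17, "N": 15, "P": 12, "F": 11,
    "M": 8, "R": 8, "Y": 8, "C": 3, "W": 3, "H": 3,
}

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Elemental formula (C, H, N, O, S) of each residue, i.e. amino acid minus H2O.
RESIDUE_FORMULA = {
    "A": (3, 5, 1, 1, 0), "R": (6, 12, 4, 1, 0), "N": (4, 6, 2, 2, 0),
    "D": (4, 5, 1, 3, 0), "C": (3, 5, 1, 1, 1), "Q": (5, 8, 2, 2, 0),
    "E": (5, 7, 1, 3, 0), "G": (2, 3, 1, 1, 0), "H": (6, 7, 3, 1, 0),
    "I": (6, 11, 1, 1, 0), "L": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "M": (5, 9, 1, 1, 1), "F": (9, 9, 1, 1, 0), "P": (5, 7, 1, 1, 0),
    "S": (3, 5, 1, 2, 0), "T": (4, 7, 1, 2, 0), "W": (11, 10, 2, 1, 0),
    "Y": (9, 9, 1, 2, 0), "V": (5, 9, 1, 1, 0),
}


def aliphatic_index(comp: Dict[str, int]) -> float:
    """Ikai: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    n = sum(comp.values())
    mol = {aa: 100.0 * comp.get(aa, 0) / n for aa in "AVIL"}
    return mol["A"] + 2.9 * mol["V"] + 3.9 * (mol["I"] + mol["L"])


def gravy(comp: Dict[str, int]) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues."""
    n = sum(comp.values())
    return sum(KYTE_DOOLITTLE[aa] * count for aa, count in comp.items()) / n


def molecular_formula(comp: Dict[str, int]) -> str:
    """Sum residue formulas and add one H2O for the free N- and C-termini."""
    c, h, n, o, s = (sum(RESIDUE_FORMULA[aa][k] * count for aa, count in comp.items())
                     for k in range(5))
    return f"C{c}H{h + 2}N{n}O{o + 1}S{s}"


print(sum(COMPOSITION.values()))               # 299
print(round(aliphatic_index(COMPOSITION), 2))  # 81.84
print(round(gravy(COMPOSITION), 3))            # -0.37
print(molecular_formula(COMPOSITION))          # C1447H2286N382O475S11
```

All three recomputed values agree with the ProtParam output reported above.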

3.3. Hypothetical protein functional annotation

The conserved domain search tool showed that the HP33 sequence contains only one domain, which belongs to a family of unknown function (accession No. DUF6068). Two further domain search tools, InterProScan and Pfam, were used to verify this result. The Pfam server predicted the Sodium: a family of unknown function at residues 1-65 and 193-276, with e-values of 6.8e and 15e, respectively. This domain represents a member of a biosynthetic gene cluster (BGC), which MIBiG classifies under the non-ribosomal peptide (NRP) and polyketide biosynthesis classes. The family includes a protein from the benzamide biosynthetic gene cluster of Myxococcus virescens and appears to occur predominantly in the Cystobacterineae (Wenzel et al., 2015). The PFP-FunDSeqE server predicted an immunoglobulin-like fold for the protein. An immunoglobulin (Ig)-like domain resembles the Ig domains of immunoglobulins in amino acid sequence and structure. Ig domains have a characteristic immunoglobulin fold of 70-110 amino acids, and conventional Ig-like domains contain 7-10 β-strands distributed across two sheets with typical structure and connectivity. The MEME Suite was used to examine the motifs of the V. harveyi Y6 HP33 protein, and five motifs were searched for. MEME also reports motif width, site locations, and E-values. The site distribution was set to zero or one occurrence (of a contributing motif site) per sequence, and motif widths between 6 and 50 residues (inclusive) were allowed. The E-value was 7.59e-30. Figure 2 shows the motif analysis for HP33.

3.3 Evaluation of performance

ROC analysis indicated a high overall level of reliability for the five in silico tools and servers (Kumar et al., 2017), although SMART had a ROC area of only 0.4. A prediction for HP33 was considered high-confidence when two or more tools agreed. The tools used for functional annotation achieved an average accuracy, sensitivity, and specificity of 95.96%, 96.804%, and 97.762%, respectively (Table 2 and Figure 3), which supports the validity of the results (Ahmed, 2022).

Table 2. ROC curve assessment of the functional annotation tools

No. | Tool/server | Accuracy (%) | Sensitivity (%) | Specificity (%) | ROC area
1 | InterProScan | 100 | 100 | 100 | 1.00
2 | Pfam | 100 | 100 | 100 | 0.98
3 | CDD | 96.7 | 97.1 | 98.1 | 0.79
4 | SMART | 89.3 | 90.0 | 93.7 | 0.40
5 | MEME Suite | 93.8 | 96.92 | 97.01 | 0.85
Average | | 95.96 | 96.804 | 97.762 | 0.804

3.4. Nature of subcellular localization

Understanding the function of a hypothetical protein requires knowing its subcellular localization, because different cellular locations imply different roles. This information can also inform the design of drugs that target the protein (Wang et al., 2005). The CELLO prediction was confirmed by PSORTb and PSLpred, all of which placed HP33 in the outer membrane (Table 3). TMHMM and CCTOP predicted no transmembrane helices, whereas HMMTOP predicted one (residues 6-25). Taken together, these findings suggest that HP33 is an outer membrane protein (Islam, Sanjida, et al., 2022).

Table 3: Subcellular localization of the hypothetical protein predicted by different servers

No. | Analysis | Result
1 | CELLO 2.5 | Outer membrane
2 | PSORTb | Outer membrane
3 | PSLpred | Outer membrane
4 | TMHMM 2.0 | No transmembrane helices present
5 | HMMTOP | One transmembrane helix present (residues 6-25)
6 | CCTOP | Not a transmembrane protein

3.5. Virulence factor prediction

Pathogenic organisms produce virulence factors that allow them to evade host defense mechanisms, a capacity that is critical for causing disease. Understanding the molecular basis of virulence is therefore critical for vaccine development and reverse vaccinology (Chaudhuri & Ramachandran, 2014). MP3 and VFDB confidently predicted HP33 to be a virulence factor and a pathogenic protein, whereas VICMpred confidently assigned it to the cellular process functional class (Ahmed, 2022).

3.6. Secondary structure analysis

Secondary structure arises from hydrogen bonding between the amide and carbonyl groups of the polypeptide backbone. The two most important secondary structures in proteins are α-helices and β-sheets. The α-helix is a right-handed helix stabilized by hydrogen bonds between the carbonyl (C=O) group of one residue and the amino (N-H) group of the residue four positions further toward the C-terminus. β-sheets are made up of β-strands linked laterally by hydrogen bonds (Han et al., 2017). According to the PROTEUS Structure Prediction Server 2.0, HP33 contains 22% α-helix, 32% β-sheet, and 45% coil; one sequence alignment was used for the ab initio prediction, and the overall confidence value was 75.3%.
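Secondary structure servers typically return one state per residue, and the percentages above are simple counts over that string. The sketch below assumes a per-residue string using H (helix), E (strand), and C (coil); PROTEUS's actual output format and symbols may differ, and the 10-residue string is hypothetical. Rounded percentages need not sum to exactly 100, which is why 22 + 32 + 45 gives 99%.

```python
from collections import Counter
from typing import Dict


def ss_fractions(states: str) -> Dict[str, float]:
    """Percentage of residues predicted as helix (H), strand (E) or coil (C)."""
    if not states:
        raise ValueError("empty prediction string")
    counts = Counter(states.upper())
    n = len(states)
    return {name: 100.0 * counts[code] / n
            for code, name in (("H", "helix"), ("E", "strand"), ("C", "coil"))}


# Hypothetical 10-residue prediction string.
print(ss_fractions("HHEECCCCCC"))  # {'helix': 20.0, 'strand': 20.0, 'coil': 60.0}
```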

3.7. 3D structure prediction, model quality refinement, and assessment

The biochemical or biophysical functions of hypothetical proteins may be inferred from their structures (Bernstein et al., 1977), so a 3D structure can help assign functions to uncharacterized and hypothetical proteins. Because protein folding patterns are frequently conserved during evolution, structure-based comparisons can find homologs where sequence-based comparisons fail (Sivashankari & Shanmughavel, 2006). Structure-based assignment of molecular function is therefore a promising strategy for large-scale biochemical annotation of proteins and the discovery of novel motifs. The three-dimensional structure of the target protein was predicted with the RaptorX server (http://raptorx.uchicago.edu/), and model 1 was chosen. RaptorX, developed by the Xu group, predicts 3D structures even for protein sequences without close homologs in the Protein Data Bank (PDB). From a sequence input, RaptorX predicts secondary and tertiary structure, solvent accessibility, and disordered regions (Källberg et al., 2014). The tertiary model and refined model 1 were selected and visualized in Discovery Studio (Supplementary Figure 1A). PROCHECK assessed the stereochemical quality of the GalaxyWeb-refined model through a Ramachandran plot, which shows the distribution of φ and ψ angles within the allowed limits (Supplementary Figure 1B); 84.9% of residues lie in the most favoured regions, consistent with a valid model. The 3D model was further validated with Verify3D and ERRAT. In Verify3D, 82.23% of residues have an average 3D-1D score of ≥ 0.2, indicating that the model has an excellent environmental profile, and an ERRAT overall quality factor of 71.6157 indicates that the model is good. The 3D structure was subsequently energy-minimized with the YASARA energy minimization server. 
Before energy minimization, the computed energy was -61,130.4 kJ/mol; after three rounds of steepest descent minimization, it decreased to -254,513.5 kJ/mol, making the modeled structure more stable. In addition, ProSA-web analysis gave a Z-score of -4.09, supporting the validity of the model (Supplementary Figure 1C).
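Steepest descent repeatedly moves the coordinates a small step against the energy gradient, so the energy can only go down. The sketch below shows the idea on a toy two-variable quadratic "energy" with its minimum at (1, -2); YASARA's force field and minimization protocol are not reproduced, and the step-halving rule for overshoots is an assumption added for robustness.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def steepest_descent(energy: Callable[[Sequence[float]], float],
                     gradient: Callable[[Sequence[float]], Vector],
                     x0: Sequence[float], step: float = 0.1,
                     n_steps: int = 100) -> Tuple[Vector, List[float]]:
    """Move coordinates against the gradient; halve the step if energy rises."""
    x = list(x0)
    history = [energy(x)]
    for _ in range(n_steps):
        g = gradient(x)
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        e_trial = energy(trial)
        if e_trial > history[-1]:
            step /= 2.0  # overshoot: reject the move and shrink the step
            continue
        x = trial
        history.append(e_trial)
    return x, history


def toy_energy(p: Sequence[float]) -> float:
    """Toy energy surface with its minimum at (1, -2); purely illustrative."""
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 2.0) ** 2


def toy_gradient(p: Sequence[float]) -> Vector:
    return [2.0 * (p[0] - 1.0), 4.0 * (p[1] + 2.0)]


x_min, energies = steepest_descent(toy_energy, toy_gradient, [5.0, 5.0])
```

Steepest descent is robust for removing large strains early in a minimization but converges slowly near the minimum, which is why it is often used for a few initial rounds.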

3.8. Analysis of protein-protein interactions and protein disulfide bonds

To fulfill a shared function, proteins frequently interact with one another in a mutually dependent manner; transcription factors, for example, interact with one another to initiate transcription. The functions of proteins can therefore be deduced from their interacting partners (Sivashankari & Shanmughavel, 2006), since interactions between residues determine protein functionality. We used the STRING 10.0 database to predict the possible functional interactions of HP33 [31]. The identified functional partners and their scores are shown in Supplementary Table 1. Most of the predicted functional partners of HP33 are not yet annotated in the database, which suggests that HP33 may be a unique functional protein of the V. harveyi Y6 strain. Disulfide bonds are important for protein folding and play important structural and functional roles, so disulfide bond analysis is critical for revealing the higher-order structure and biological roles of proteins. Furthermore, improper disulfide bond formation or exchange can cause antibody aggregation, making disulfide bonding crucial for protein characterization in biopharmaceutical manufacturing (Zhang et al., 2011). CYSPRED predicted Cys23 and Cys212 to be in the bonding state and Cys251 in the non-bonding state (Islam, Mou, et al., 2022a). DIANA scored all three possible cysteine pairs in the protein (Table 4), assigning by far the highest score (0.991) to the Cys23-Cys251 pair, so the two servers disagree on whether Cys251 participates in a disulfide bond. These predictions suggest that HP33 may contain a disulfide bond that could stabilize its higher-order structure or form part of an active center for its bioactivity.

Table 4: Cysteine residues predicted by CYSPRED and DIANA to be important in disulfide bonding

CYSPRED
Cysteine | Prediction | Reliability
Cys23 | Bonding state | 4
Cys212 | Bonding state | 8
Cys251 | Non-bonding state | 9

DIANA
Bonded cysteine pair (sequence windows) | Distance (residues) | Score
IALYGCGGGGS-ADNAQCKTTWS | 189 | 0.01073
IALYGCGGGGS-TYNLVCDGMEL | 228 | 0.99116
ADNAQCKTTWS-TYNLVCDGMEL | 39 | 0.01104

3.9. Ligand binding interactions

GalaxySite ligand-binding site predictions were made by matching the target models with the PDB file of the best-predicted domain A model. The server predicted three models with different ligands and summarized the results in three parts: predicted ligand-binding residues, predicted binding poses, and templates for the protein-ligand complex (Table 5; Fig. 4 (A, B)). Interactions at the predicted ligand-binding site were analyzed with LIGPLOT, and the details of the protein-ligand interactions are given in Table 5. The most probable binding poses and the template models for the protein-ligand complexes are shown in Figure 4. Which residues count as ligand-binding depends on the definition of residue-ligand contact (Islam, Mou, et al., 2022b): a residue is considered a binding-site residue if the distance between one of its atoms and a ligand atom is less than the sum of the van der Waals radii of the two atoms plus 0.5 Å (Chen et al., 2016). The ligand HEC is a small molecule commonly known as ferroheme C (DrugBank accession number DB03317), with a molecular weight of about 684.65 Da, a monoisotopic mass of 684.152734 Da, and the chemical formula C34H36FeN4O4S2. Valine (DrugBank accession number DB00161) is a branched-chain essential amino acid with stimulant activity that promotes muscle growth and tissue repair; it is also a precursor in penicillin biosynthesis (Arakawa et al., 2010). Its chemical formula is C5H11NO2, with a molecular weight of 117.15 Da and a monoisotopic mass of 117.078978601 Da.
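The contact rule above (atom-atom distance below the sum of van der Waals radii plus 0.5 Å) can be applied directly to model coordinates. In the sketch below, the Bondi radii are an assumption, since the radii set behind the cited definition is not stated, and elements such as Fe must be added before use; the atom tuples and coordinates are hypothetical toy values, not the HP33 model.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]

# Bondi van der Waals radii in Å (assumed radii set; add Fe etc. before use).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def binding_residues(protein_atoms: Iterable[Tuple[int, str, Point]],
                     ligand_atoms: List[Tuple[str, Point]],
                     tolerance: float = 0.5) -> List[int]:
    """Residue numbers with any atom closer to a ligand atom than
    r_vdW(protein atom) + r_vdW(ligand atom) + tolerance."""
    hits = set()
    for res_num, p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            cutoff = VDW_RADII[p_elem] + VDW_RADII[l_elem] + tolerance
            if math.dist(p_xyz, l_xyz) < cutoff:
                hits.add(res_num)
                break  # one contact is enough for this atom
    return sorted(hits)


# Hypothetical toy coordinates (Å), not the HP33 model.
protein = [(3, "N", (0.0, 0.0, 0.0)), (4, "C", (10.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0))]
print(binding_residues(protein, ligand))  # [3]
```

`math.dist` requires Python 3.8 or later. For a full model, the brute-force double loop is usually replaced by a spatial grid or k-d tree.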

Table 5: Predicted ligand-binding residues

| Ligand name | Molecular weight (g/mol) | DrugBank name | Binding residues |
|---|---|---|---|
| HEC | 618.50 | Ferroheme C | 272F 273I 274E 275Q 276V 277E 287R 288E 289T 290K 292T |
| VAL | 117.15 | Valine | 3K 4R 7L 25S 26G 27S 90Q 92D 93P |
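The molecular weights and monoisotopic masses quoted for the ligands can be computed directly from their chemical formulas. The Python sketch below assumes flat formulas without parentheses or charges. It uses abridged standard atomic weights and the monoisotopic masses of the most abundant isotopes. For valine (C5H11NO2) it gives 117.15 g/mol, the value listed in Table 5. For C34H36FeN4O4S2 it gives about 684.65 g/mol. The monoisotopic masses agree with the DrugBank values quoted in the text to within about 0.00001 Da. `parse_formula` and `formula_mass` are illustrative helpers and are not functions of DrugBank or of any cheminformatics library.

```python
import re

# Standard atomic weights (g/mol), abridged values.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
                "S": 32.06, "Fe": 55.845}
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                     "O": 15.99491461956, "S": 31.97207100, "Fe": 55.9349375}


def parse_formula(formula):
    """Parse a flat formula such as 'C5H11NO2' into element counts.

    Parentheses, hydrates and charges are not supported in this sketch.
    """
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if "".join(el + n for el, n in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for element, n in tokens:
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def formula_mass(formula, masses):
    """Sum element masses from `masses` weighted by their counts in `formula`."""
    return sum(masses[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    for name, formula in [("Valine", "C5H11NO2"), ("Ferroheme C", "C34H36FeN4O4S2")]:
        avg = formula_mass(formula, AVERAGE_MASS)
        mono = formula_mass(formula, MONOISOTOPIC_MASS)
        print(f"{name} {formula}: average {avg:.2f} g/mol, monoisotopic {mono:.6f} Da")
```

Small differences in the last decimal places come from the exact isotope masses each database uses.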

 

3.10. Homo-oligomer models prediction and Ab initio docking results analysis

The GalaxyHomomer server (Baek et al., 2017) uses the Naccess program to calculate the interface area (Å²) between one chain and the other chains. Figure 9 shows the sequence identity between the query and template proteins, which ranges from 0 (completely different) to 100 (identical). Template-based models are shown with their sequence identity or structural similarity, whereas models predicted by ab initio docking are shown with their docking score. Table 6 lists the number of subunits, the interface area, and the docking score of each model; a higher docking score indicates a better model. Structural similarity between the query and template proteins is measured with TM-align. When a sequence is supplied, up to five templates are selected in order of their score S. Only candidates whose S exceeds both 0.2 times the maximum S over all templates and 0.7 times the maximum S for the specified oligomeric state are considered. If this sequence-based search detects fewer than five templates, additional templates are selected using the monomer structure predicted by template-based modeling (Ko et al., 2012). These structure-based templates are ranked by S among candidates that are in the specified oligomeric state and whose monomer structures are similar to the given monomer structure (TM-score > 0.5, computed with TM-align; Zhang & Skolnick, 2005). Run time depends on the total number of residues in the homo-oligomer. In most cases, homo-oligomer structure prediction finishes within 2 hours (Baek et al., 2017). In this study, the template-based method was used (red dots in Figure 8). Figure 9 depicts the five homo-oligomer models of HP33 with chains A and B.
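The two-stage template-selection rule described above can be sketched in Python. This is an illustrative reading of the text, not GalaxyHomomer's code. The score S is treated as an opaque number, and both relative-score cut-offs are assumed to apply together. As a simplification, sequence-based and structure-based candidates are drawn from a single pool in which each hypothetical `Template` already carries its TM-score against the query monomer model.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str              # hypothetical template identifier
    oligomer_state: int    # number of subunits in the template, e.g. 2
    s: float               # template score S, treated here as an opaque number
    tm_to_model: float     # TM-score of the template monomer vs. the query monomer model


def select_templates(candidates, state, max_templates=5):
    """Sketch of the two-stage template selection described in the text."""
    if not candidates:
        return []
    s_max_all = max(t.s for t in candidates)
    in_state = [t for t in candidates if t.oligomer_state == state]
    if not in_state:
        return []
    s_max_state = max(t.s for t in in_state)

    # Stage 1 (sequence-based): keep templates passing both relative-S cut-offs.
    passing = [t for t in in_state if t.s > 0.2 * s_max_all and t.s > 0.7 * s_max_state]
    selected = sorted(passing, key=lambda t: t.s, reverse=True)[:max_templates]

    # Stage 2 (structure-based): top up with monomer-similar templates (TM-score > 0.5).
    if len(selected) < max_templates:
        taken = {t.name for t in selected}
        similar = [t for t in in_state if t.tm_to_model > 0.5 and t.name not in taken]
        similar.sort(key=lambda t: t.s, reverse=True)
        selected += similar[: max_templates - len(selected)]
    return selected


if __name__ == "__main__":
    # Hypothetical candidate pool, not real GalaxyHomomer templates.
    pool = [Template("T1", 2, 100.0, 0.9), Template("T2", 2, 80.0, 0.3),
            Template("T3", 2, 60.0, 0.6), Template("T4", 4, 90.0, 0.8),
            Template("T5", 2, 10.0, 0.4)]
    print([t.name for t in select_templates(pool, state=2)])  # ['T1', 'T2', 'T3']
```

In this toy pool, T3 fails the 0.7 x S cut-off but is added in the structure-based stage because its TM-score exceeds 0.5. T4 is excluded because it belongs to a different oligomeric state.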

Table 6. Ab initio docking results of the five homo-oligomer models of HP33

| Model No. | Number of subunits | Interface area (Å²) | Docking score |
|---|---|---|---|
| 1 | 2-mer | 1200.9 | 2024.170 |
| 2 | 2-mer | 2721.9 | 1951.639 |
| 3 | 2-mer | 1179.9 | 1855.651 |
| 4 | 2-mer | 1469.9 | 1852.032 |
| 5 | 4-mer | 6666.6 | 1269.693 |
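Interface areas such as those in Table 6 are usually derived from solvent-accessible surface areas (SASA) of the kind Naccess computes. The minimal Python sketch below assumes one common convention: the interface area is half the SASA buried when one chain packs against the other chains. The exact definition GalaxyHomomer applies is not stated here. The function names and the SASA values in the example are hypothetical and are not HP33 results.

```python
def buried_surface_area(sasa_chain, sasa_others, sasa_complex):
    """SASA (Å^2) lost when one chain packs against the other chains.

    sasa_chain: SASA of the chosen chain on its own.
    sasa_others: SASA of the remaining chains taken together.
    sasa_complex: SASA of the full homo-oligomer.
    """
    buried = sasa_chain + sasa_others - sasa_complex
    if buried < 0:
        raise ValueError("complex SASA exceeds the sum of its parts")
    return buried


def interface_area(sasa_chain, sasa_others, sasa_complex):
    """Interface area taken as half the buried SASA (one common convention)."""
    return buried_surface_area(sasa_chain, sasa_others, sasa_complex) / 2.0


if __name__ == "__main__":
    # Hypothetical SASA values for a symmetric dimer, not HP33 results.
    print(interface_area(12000.0, 12000.0, 21600.0))  # 1200.0
```

Under a different convention, the full buried area is reported instead of half of it. That choice doubles the number, so values are only comparable when computed the same way.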

3.11. Active Site Detection

As predicted by the CASTp v3.0 algorithm, the modeled protein contains 41 putative active sites (pockets) (Supplementary Figure 2 (A)). CASTp is a database and web server that identifies pockets on protein surfaces and voids buried within proteins, delineates their boundaries, and measures their area and volume. Pockets and voids are defined and measured using both the solvent-accessible surface (Richards surface) and the molecular surface (Connolly surface). CASTp can therefore be used to examine the functional regions and surface properties of proteins. It provides an interactive graphical user interface with on-the-fly computation for user-submitted structures (Tian et al., 2018). The top-ranked active site of the model protein was identified on the basis of its area (2602.319 Å²) and volume (4124.259 Å³) (Supplementary Figure 2 (B)).

4. Conclusion

According to this analysis, the single domain of the hypothetical protein HP33 may be associated with a biosynthetic gene cluster and with immunoglobulin-like function. HP33 was also predicted to be a nonpolar protein with a single exposed domain. The existence and distribution of this hypothetical protein domain across V. harveyi strains suggest that the protein has unique characteristics that could support the development of new antibacterial drugs. Protein-ligand docking analyses were also carried out to identify the representative amino acids involved in ligand binding. The ab initio docking and homo-oligomer studies of HP33 provide immunological insights that may interest researchers seeking to develop new drugs against SDMND.

Declarations

Funding 

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest 

The authors declare that they have no conflict of interest.

References

  1. Ahmed, S. (2022). In Silico Characterization of Essential Hypothetical Proteins from Francisella tularensis Schu S4 Strain. https://doi.org/10.26434/chemrxiv-2022-mwj1g-v2
  2. Alemayehu, D., & Zou, K. H. (2012). Applications of ROC Analysis in Medical Research: Recent Developments and Future Directions. Academic Radiology,19(12), 1457-1464. https://doi.org/10.1016/j.acra.2012.09.006
  3. Alzohairy, A. (2011). BioEdit: An important software for molecular biology. GERF Bulletin of Biosciences,2, 60-61.
  4. Arakawa, M., Yanamala, N., Upadhyaya, J., Halayko, A., Klein-Seetharaman, J., & Chelikani, P. (2010). The importance of valine 114 in ligand binding in beta(2)-adrenergic receptor. Protein science: a publication of the Protein Society,19(1), 85-93. https://doi.org/10.1002/pro.285
  5. Baek, M., Park, T., Heo, L., Park, C., & Seok, C. (2017). GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure. Nucleic Acids Research,45(W1), W320-W324. https://doi.org/10.1093/nar/gkx246
  6. Bailey, T. L., Boden, M., Buske, F. A., Frith, M., Grant, C. E., Clementi, L., Ren, J., Li, W. W., & Noble, W. S. (2009). MEME Suite: tools for motif discovery and searching. Nucleic Acids Research,37(suppl_2), W202-W208. https://doi.org/10.1093/nar/gkp335
  7. Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F., Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T., & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol,112(3), 535-542. https://doi.org/10.1016/s0022-2836(77)80200-3
  8. Bhasin, M., Garg, A., & Raghava, G. P. S. (2005). PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics,21(10), 2522-2524. https://doi.org/10.1093/bioinformatics/bti309
  9. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.-C., Estreicher, A., Gasteiger, E., Martin, M. J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S., & Schneider, M. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research,31(1), 365-370. https://doi.org/10.1093/nar/gkg095
  10. Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition,30(7), 1145-1159. https://doi.org/10.1016/S0031-3203(96)00142-2
  11. Chaudhuri, R., & Ramachandran, S. (2014). Prediction of virulence factors using bioinformatics approaches. Methods in molecular biology (Clifton, N.J.),1184, 389-400. https://doi.org/10.1007/978-1-4939-1115-8_22
  12. Chen, D., Oezguen, N., Urvil, P., Ferguson, C., Dann, S. M., & Savidge, T. C. (2016). Regulation of protein-ligand binding affinity by hydrogen bond pairing. Science advances,2(3), e1501240-e1501240. https://doi.org/10.1126/sciadv.1501240
  13. Dobson, L., Reményi, I., & Tusnády, G. E. (2015). CCTOP: a Consensus Constrained TOPology prediction web server. Nucleic Acids Research,43(W1), W408-W412. https://doi.org/10.1093/nar/gkv451
  14. Dong, H. T., Taengphu, S., Sangsuriya, P., Charoensapsri, W., Phiwsaiya, K., Sornwatana, T., Khunrae, P., Rattanarojpong, T., & Senapin, S. (2017). Recovery of Vibrio harveyi from scale drop and muscle necrosis disease in farmed barramundi, Lates calcarifer in Vietnam. Aquaculture,473, 89-96. https://doi.org/10.1016/j.aquaculture.2017.02.005
  15. Dundas, J., Ouyang, Z., Tseng, J., Binkowski, A., Turpaz, Y., & Liang, J. (2006). CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Research,34(suppl_2), W116-W118. https://doi.org/10.1093/nar/gkl282
  16. Ferrè, F., & Clote, P. (2005). DiANNA: a web server for disulfide connectivity prediction. Nucleic Acids Res,33(Web Server issue), W230-232. https://doi.org/10.1093/nar/gki412
  17. Finn, R. D. (2005). Pfam: the protein families database. In Encyclopedia of Genetics, Genomics, Proteomics, and Bioinformatics. https://doi.org/10.1002/047001153X.g306303
  18. Gamage, D. G., Gunaratne, A., Periyannan, G. R., & Russell, T. G. (2019). Applicability of Instability Index for In vitro Protein Stability Prediction. Protein Pept Lett,26(5), 339-347. https://doi.org/10.2174/0929866526666190228144219
  19. Gasteiger, E., Gattiker, A., Hoogland, C., Ivanyi, I., Appel, R. D., & Bairoch, A. (2003). ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Research,31(13), 3784-3788. https://doi.org/10.1093/nar/gkg563
  20. Grützner, A., Garcia-Manyes, S., Kötter, S., Badilla, C. L., Fernandez, J. M., & Linke, W. A. (2009). Modulation of titin-based stiffness by disulfide bonding in the cardiac titin N2-B unique sequence. Biophysical journal,97(3), 825-834. https://doi.org/10.1016/j.bpj.2009.05.037
  21. Gupta, A., Kapil, R., Dhakan, D. B., & Sharma, V. K. (2014). MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data. PLoS One,9(4), e93907-e93907. https://doi.org/10.1371/journal.pone.0093907
  22. Han, L., Zhang, K., Ishida, H., & Froimowicz, P. (2017). Study of the Effects of Intramolecular and Intermolecular Hydrogen-Bonding Systems on the Polymerization of Amide-Containing Benzoxazines. Macromolecular Chemistry and Physics,218(18), 1600562. https://doi.org/10.1002/macp.201600562
  23. Heo, L., Shin, W. H., Lee, M. S., & Seok, C. (2014). GalaxySite: ligand-binding-site prediction by using molecular docking. Nucleic Acids Res,42(Web Server issue), W210-214. https://doi.org/10.1093/nar/gku321
  24. Hunter, S., Apweiler, R., Attwood, T. K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Das, U., Daugherty, L., Duquenne, L., Finn, R. D., Gough, J., Haft, D., Hulo, N., Kahn, D., Kelly, E., Laugraud, A., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J., McAnulla, C., McDowall, J., Mistry, J., Mitchell, A., Mulder, N., Natale, D., Orengo, C., Quinn, A. F., Selengut, J. D., Sigrist, C. J. A., Thimma, M., Thomas, P. D., Valentin, F., Wilson, D., Wu, C. H., & Yeats, C. (2008). InterPro: the integrative protein signature database. Nucleic Acids Research,37(suppl_1), D211-D215. https://doi.org/10.1093/nar/gkn785
  25. Ijaq, J., Chandra, D., Ray, M. K., & Jagannadham, M. V. (2022). Investigating the Functional Role of Hypothetical Proteins From an Antarctic Bacterium Pseudomonas sp. Lz4W: Emphasis on Identifying Proteins Involved in Cold Adaptation [Original Research]. Frontiers in Genetics,13. https://doi.org/10.3389/fgene.2022.825269
  26. Islam, S., & Mou, M. (2022). Functional Annotation of Uncharacterized Protein from Photobacterium damselae subsp. piscicida (Pasteurella piscicida) and Comparison of Drug Target Between Conventional Medicine and Phytochemical Compound Against Disease Treatment in Fish: An In-silico Approach. Genetics of Aquatic Organisms,6, 453. https://doi.org/10.4194/GA453
  27. Islam, S., Mou, M., Sanjida, S., & Mahfuj, M. s. E. (2022a). Functional Annotation and Characterization of a Hypothetical Protein from Pseudoalteromonas spp. Identify Potential Biomarker: An In-silico Approach. Aquatic Food Studies,2, 57. https://doi.org/10.4194/AFS57
  28. Islam, S., Mou, M., Sanjida, S., & Mahfuj, M. s. E. (2022b). An In-silico Approach for Identifying Phytochemical Inhibitors Against Nervous Necrosis Virus (NNV) in Asian Sea Bass by Targeting Capsid Protein. Genetics of Aquatic Organisms,6, 487. https://doi.org/10.4194/GA487
  29. Islam, S., Sanjida, S., Mou, M., Mahfuj, M. s. E., & Nasir, S. (2022). In-silico functional annotation of a hypothetical protein from Edwardsiella tarda revealed Proline metabolism and apoptosis in fish. International Journal of Life Sciences and Biotechnology,5, 78-96. https://doi.org/10.38001/ijlsb.1032171
  30. Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., & Madden, T. L. (2008). NCBI BLAST: a better web interface. Nucleic Acids Research,36(suppl_2), W5-W9. https://doi.org/10.1093/nar/gkn201
  31. Jones, P., Binns, D., Chang, H. Y., Fraser, M., Li, W., McAnulla, C., McWilliam, H., Maslen, J., Mitchell, A., Nuka, G., Pesseat, S., Quinn, A. F., Sangrador-Vegas, A., Scheremetjew, M., Yong, S. Y., Lopez, R., & Hunter, S. (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics,30(9), 1236-1240. https://doi.org/10.1093/bioinformatics/btu031
  32. Källberg, M., Margaryan, G., Wang, S., Ma, J., & Xu, J. (2014). RaptorX server: a resource for template-based protein structure modeling. Methods in molecular biology (Clifton, N.J.),1137, 17-27. https://doi.org/10.1007/978-1-4939-0366-5_2
  33. Kayansamruaj, P., Soontara, C., Unajak, S., Dong, H. T., Rodkhum, C., Kondo, H., Hirono, I., & Areechon, N. (2019). Comparative genomics inferred two distinct populations of piscine pathogenic Streptococcus agalactiae, serotype Ia ST7 and serotype III ST283, in Thailand and Vietnam. Genomics,111(6), 1657-1667.
  34. Ko, J., Park, H., & Seok, C. (2012). GalaxyTBM: template-based modeling by building a reliable core and refining unreliable local regions. BMC Bioinformatics,13, 198. https://doi.org/10.1186/1471-2105-13-198
  35. Kumar, A., Maan, P., Singh, G., & Kaur, J. (2017). In-Silico Characterization of a Hypothetical Protein, Rv1288 of Mycobacterium tuberculosis Containing an Esterase Signature and an Uncommon LytE Domain. Curr Comput Aided Drug Des,13(2), 101-111. https://doi.org/10.2174/1573409912666161124144725
  36. Kwankijudomkul, A., Dong, H. T., Longyant, S., Sithigorngul, P., Khunrae, P., Rattanarojpong, T., & Senapin, S. (2021). Antigenicity of hypothetical protein HP33 of Vibrio harveyi Y6 causing scale drop and muscle necrosis disease in Asian sea bass. Fish Shellfish Immunol,108, 73-79. https://doi.org/10.1016/j.fsi.2020.11.034
  37. Letunic, I., Doerks, T., & Bork, P. (2012). SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res,40(Database issue), D302-305. https://doi.org/10.1093/nar/gkr931
  38. Liu, B., Zheng, D., Jin, Q., Chen, L., & Yang, J. (2019). VFDB 2019: a comparative pathogenomic platform with an interactive web interface. Nucleic Acids Res,47(D1), D687-d692. https://doi.org/10.1093/nar/gky1080
  39. Lubec, G., Afjehi-Sadat, L., Yang, J. W., & John, J. P. (2005). Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog Neurobiol,77(1-2), 90-127. https://doi.org/10.1016/j.pneurobio.2005.10.001
  40. Marchler-Bauer, A., Anderson, J. B., Cherukuri, P. F., DeWeese-Scott, C., Geer, L. Y., Gwadz, M., He, S., Hurwitz, D. I., Jackson, J. D., Ke, Z., Lanczycki, C. J., Liebert, C. A., Liu, C., Lu, F., Marchler, G. H., Mullokandov, M., Shoemaker, B. A., Simonyan, V., Song, J. S., Thiessen, P. A., Yamashita, R. A., Yin, J. J., Zhang, D., & Bryant, S. H. (2005). CDD: a Conserved Domain Database for protein classification. Nucleic Acids Research,33(suppl_1), D192-D196. https://doi.org/10.1093/nar/gki069
  41. Minion, F. C., Lefkowitz, E. J., Madsen, M. L., Cleary, B. J., Swartzell, S. M., & Mahairas, G. G. (2004). The genome sequence of Mycoplasma hyopneumoniae strain 232, the agent of swine mycoplasmosis. J Bacteriol,186(21), 7123-7133. https://doi.org/10.1128/jb.186.21.7123-7133.2004
  42. Möller, S., Croning, M. D. R., & Apweiler, R. (2001). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics,17(7), 646-653. https://doi.org/10.1093/bioinformatics/17.7.646
  43. Montgomerie, S., Sundararaj, S., Gallin, W. J., & Wishart, D. S. (2006). Improving the accuracy of protein secondary structure prediction using structural alignment. BMC bioinformatics,7(1), 301. https://doi.org/10.1186/1471-2105-7-301
  44. Mou, M., Islam, S., & Mahfuj, M. s. E. (2021). In Silico Functional Annotation of VP 128 Hypothetical Protein from Vibrio parahaemolyticus. https://doi.org/10.4194/AFS37
  45. Naveed, M., Kazmi, K., Anwar, F., Arshad, F., Dar, T., & Zafar, M. (2016). Computational Analysis and Polymorphism study of Tumor Suppressor Candidate Gene-3 for Non Syndromic Autosomal Recessive Mental Retardation. Journal of Applied Bioinformatics & Computational Biology,5. https://doi.org/10.4172/2329-9533.1000127
  46. Nehete, J. Y., Bhambar, R. S., Narkhede, M. R., & Gawali, S. R. (2013). Natural proteins: Sources, isolation, characterization and applications. Pharmacognosy reviews,7(14), 107-116. https://doi.org/10.4103/0973-7847.120508
  47. Pace, C. N., Grimsley, G. R., & Scholtz, J. M. (2009). Protein ionizable groups: pK values and their contribution to protein stability and solubility. The Journal of biological chemistry,284(20), 13285-13289. https://doi.org/10.1074/jbc.R800080200
  48. Saha, S., & Raghava, G. P. (2006). VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition. Genomics Proteomics Bioinformatics,4(1), 42-47. https://doi.org/10.1016/s1672-0229(06)60015-6
  49. Shen, H.-B., & Chou, K.-C. (2009). Predicting protein fold pattern with functional domain and sequential evolution information. Journal of Theoretical Biology,256(3), 441-446. https://doi.org/10.1016/j.jtbi.2008.10.007
  50. Sirikharin, R., Taengchaiyaphum, S., Sanguanrut, P., Chi, T. D., Mavichak, R., Proespraiwong, P., Nuangsaeng, B., Thitamadee, S., Flegel, T. W., & Sritunyalucksana, K. (2015). Characterization and PCR Detection Of Binary, Pir-Like Toxins from Vibrio parahaemolyticus Isolates that Cause Acute Hepatopancreatic Necrosis Disease (AHPND) in Shrimp. PLoS One,10(5), e0126987. https://doi.org/10.1371/journal.pone.0126987
  51. Sivashankari, S., & Shanmughavel, P. (2006). Functional annotation of hypothetical proteins - A review. Bioinformation,1(8), 335-338. https://doi.org/10.6026/97320630001335
  52. Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K. P., Kuhn, M., Bork, P., Jensen, L. J., & von Mering, C. (2015). STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res,43(Database issue), D447-452. https://doi.org/10.1093/nar/gku1003
  53. Tian, W., Chen, C., Lei, X., Zhao, J., & Liang, J. (2018). CASTp 3.0: computed atlas of surface topography of proteins. Nucleic Acids Res,46(W1), W363-w367. https://doi.org/10.1093/nar/gky473
  55. Turab Naqvi, A. A., Rahman, S., Rubi, Zeya, F., Kumar, K., Choudhary, H., Jamal, M. S., Kim, J., & Hassan, M. I. (2017). Genome analysis of Chlamydia trachomatis for functional characterization of hypothetical proteins to discover novel drug targets. Int J Biol Macromol,96, 234-240. https://doi.org/10.1016/j.ijbiomac.2016.12.045
  56. Tusnády, G. E., & Simon, I. (2001). The HMMTOP transmembrane topology prediction server. Bioinformatics,17(9), 849-850. https://doi.org/10.1093/bioinformatics/17.9.849
  57. von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., & Snel, B. (2003). STRING: a database of predicted functional associations between proteins. Nucleic Acids Res,31(1), 258-261. https://doi.org/10.1093/nar/gkg034
  58. Wang, J., Sung, W. K., Krishnan, A., & Li, K. B. (2005). Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines. BMC Bioinformatics,6, 174. https://doi.org/10.1186/1471-2105-6-174
  59. Wenzel, S. C., Hoffmann, H., Zhang, J., Debussche, L., Haag-Richter, S., Kurz, M., Nardi, F., Lukat, P., Kochems, I., Tietgen, H., Schummer, D., Nicolas, J. P., Calvet, L., Czepczor, V., Vrignaud, P., Mühlenweg, A., Pelzer, S., Müller, R., & Brönstrup, M. (2015). Production of the Bengamide Class of Marine Natural Products in Myxobacteria: Biosynthesis and Structure-Activity Relationships. Angew Chem Int Ed Engl,54(51), 15560-15564. https://doi.org/10.1002/anie.201508277
  60. Wilkins, M. R., Gasteiger, E., Bairoch, A., Sanchez, J.-C., Williams, K., Appel, R., & Hochstrasser, D. F. (1999). Protein Identification and Analysis Tools in the ExPASy Server. Methods in molecular biology (Clifton, N.J.),112, 531-552. https://doi.org/10.1385/1-59259-584-7:531
  61. Wilson, D., Madera, M., Vogel, C., Chothia, C., & Gough, J. (2006). The SUPERFAMILY database in 2007: families and functions. Nucleic Acids Research,35(suppl_1), D308-D313. https://doi.org/10.1093/nar/gkl910
  62. Xu, J., McPartlon, M., & Li, J. (2021). Improved protein structure prediction by deep learning irrespective of co-evolution information. Nat Mach Intell,3, 601-609. https://doi.org/10.1038/s42256-021-00348-5
  63. Yu, C., & Hwang, J. (2008). Prediction of Protein Subcellular Localizations. 2008 Eighth International Conference on Intelligent Systems Design and Applications, 26-28 Nov. 2008.
  64. Yu, N. Y., Wagner, J. R., Laird, M. R., Melli, G., Rey, S., Lo, R., Dao, P., Sahinalp, S. C., Ester, M., Foster, L. J., & Brinkman, F. S. L. (2010). PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics,26(13), 1608-1615. https://doi.org/10.1093/bioinformatics/btq249
  65. Zhang, J., Jia, H., Li, J., Li, Y., Lu, M., & Hu, J. (2016). Molecular evolution and expression divergence of the Populus euphratica Hsf genes provide insight into the stress acclimation of desert poplar. Scientific Reports,6(1), 30050. https://doi.org/10.1038/srep30050
  66. Zhang, L., Chou, C., & Moo-Young, M. (2011). Disulfide bond formation and its impact on the biological activity and stability of recombinant therapeutic proteins produced by Escherichia coli expression system. Biotechnology advances,29, 923-929. https://doi.org/10.1016/j.biotechadv.2011.07.013
  67. Zhang, Y., & Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res,33(7), 2302-2309. https://doi.org/10.1093/nar/gki524