I. ORTHOLOG DETECTION
a. Searching for true orthologs:
Database search phase: In this step, the targets most likely to produce a meaningful alignment with the query are chosen (Table 1). Ortholog organisms were obtained from the Candida Genome Database. Four orthologous Candida species were found in the database, namely C. parapsilosis, C. dubliniensis, C. auris and C. glabrata (Table 1). EFG1 (Accession ID: C1_08590C_A) and NRG1 (Accession ID: C7_04230W_A) were then carried forward to the alignment phase (Table 1).
b. Alignment phase: During this phase, the most promising targets in the database are aligned to the query sequence and scored. The Basic Local Alignment Search Tool (BLAST), one of the most common algorithms for finding homologs, was used for this purpose (Table 3). During the database search, BLAST decomposes the query sequence into short words. These words are compared with words in the database to find significant target sequences for alignment. NCBI's BLASTP was run with a maximum E-value threshold of 3. Candida albicans, model (XM/XP) sequences and uncultured/environmental sample sequences were excluded from the BLASTP results (Table 3).
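The word-decomposition step described above can be illustrated with a minimal Python sketch. It assumes a word length of 3, the classic BLASTP value. As a simplification, it keeps only exact word matches. Real BLAST also collects "neighbourhood" words that score above a threshold T under a substitution matrix such as BLOSUM62, and then extends and scores the seeds; neither step is shown here. The function names are illustrative and are not part of any BLAST interface.

```python
from collections import defaultdict


def words(seq, w=3):
    """Split a sequence into overlapping words of length w."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def index_words(seq, w=3):
    """Map each word of a database sequence to the positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(words(seq, w)):
        index[word].append(pos)
    return index


def seed_hits(query, target, w=3):
    """Return (query_pos, target_pos) pairs where a query word exactly matches a target word.

    Simplification: exact matches only; no neighbourhood words and no seed extension.
    """
    target_index = index_words(target, w)
    hits = []
    for q_pos, word in enumerate(words(query, w)):
        for t_pos in target_index.get(word, []):
            hits.append((q_pos, t_pos))
    return hits


if __name__ == "__main__":
    # Toy peptides for illustration, not EFG1 or NRG1 sequences.
    print(words("MKVL"))                # ['MKV', 'KVL']
    print(seed_hits("MKVL", "AMKVLA"))  # [(0, 1), (1, 2)]
```

Each returned position pair is a seed that a full BLAST search would go on to extend into a scored local alignment.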
Table 1- List of ortholog organisms for EFG1 and NRG1
Species Name | Source of Information | Accession ID | Gene | Function | Ortholog Organisms
C. albicans SC5314 | Candida Genome Database | C1_08590C_A | EFG1 | Squalene epoxidase, epoxidation of squalene to 2,3(S)-oxidosqualene; ergosterol biosynthesis; allylamine antifungal drug target; NADH reducing cofactor but S. cerevisiae EFG1 uses NADPH; flow model biofilm induced; Spider biofilm repressed | C. parapsilosis, C. dubliniensis, C. auris, C. glabrata
C. albicans SC5314 | Candida Genome Database | C7_04230W_A | NRG1 | Transcription factor/repressor; regulates chlamydospore formation/hyphal gene induction/virulence and rescue/stress response genes; affects both Tup1-dependent and -independent regulation; flow model biofilm induced; Spider biofilm repressed | C. parapsilosis, C. dubliniensis, C. auris, C. glabrata
Using this gene information, the gene and protein sequences of EFG1 and NRG1 from the well-characterized Candida albicans SC5314 were retrieved from the Candida Genome Database (Table 2). The location and length of each gene and protein are given in Table 2.
Table 2- EFG1 and NRG1 gene sequences and their respective protein sequences.
Only BLASTP hits whose alignments covered at least 30% of the query sequence were retained (Table 3). In total, 10 proteins were found for EFG1 and 8 for NRG1. Query coverage ranged from 99% to 100% for EFG1 and from 30% to 100% for NRG1 (Table 3). Sequence identity ranged from 58.35% to 80.73% for EFG1 and from 40.80% to 68.75% for NRG1. The accession ID, query coverage, E value and identity of each ortholog are listed in Table 3. For EFG1, the highest query coverage and identity were observed with Candida maltosa (strain Xu316) (Accession ID: EMG45685.1). For NRG1, the highest query coverage (100%) was also observed with Candida maltosa (strain Xu316) (Accession ID: EMG47643.1). The highest NRG1 identity (68.75%), however, was observed with RCK65478.1, which covers only 40% of the query (Table 3).
Table 3- EFG1 and NRG1 BLASTP results. Only hits whose alignments cover at least 30% of the query sequence are included.
Protein | Accession ID of orthologs | Query coverage | E value | Identity
EFG1 | EMG45685.1 | 100% | 0.0 | 80.73%
EFG1 | RCK64968.1 | 100% | 0.0 | 79.82%
EFG1 | RLV94007.1 | 99% | 0.0 | 73.85%
EFG1 | CCE43404.1 | 100% | 0.0 | 68.88%
EFG1 | PSK78524.1 | 99% | 0.0 | 64.07%
EFG1 | CCE88912.1 | 100% | 0.0 | 60.55%
EFG1 | KAA8900417.1 | 99% | 0.0 | 60.69%
EFG1 | SGZ50148.1 | 99% | 0.0 | 61.56%
EFG1 | QBM86330 | 100% | 0.0 | 58.35%
EFG1 | GEQ72255.1 | 100% | 0.0 | 61.19%
NRG1 | EMG47643.1 | 100% | 2e-53 | 45.02%
NRG1 | RCK65478.1 | 40% | 1e-43 | 68.75%
NRG1 | RLV92796.1 | 91% | 1e-40 | 41.70%
NRG1 | SGZ55399.1 | 32% | 4e-33 | 64.52%
NRG1 | CCE81639.1 | 44% | 2e-32 | 53.10%
NRG1 | GEQ72681.1 | 36% | 2e-29 | 60.00%
NRG1 | QBM89661.1 | 66% | 3e-29 | 40.80%
NRG1 | QFZ30513.1 | 30% | 2e-28 | 67.11%
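The retention rule and the coverage and identity ranges can be checked directly against the rows of Table 3. The Python sketch below does this on values transcribed from the table; it is not a BLAST output parser. The 30% coverage cut-off and the E-value limit of 3 are the thresholds stated in the text. Each range is taken as the minimum and maximum over all retained hits for one query protein.

```python
# (protein, ortholog accession, query coverage %, E value, identity %) from Table 3.
TABLE3 = [
    ("EFG1", "EMG45685.1", 100, 0.0, 80.73),
    ("EFG1", "RCK64968.1", 100, 0.0, 79.82),
    ("EFG1", "RLV94007.1", 99, 0.0, 73.85),
    ("EFG1", "CCE43404.1", 100, 0.0, 68.88),
    ("EFG1", "PSK78524.1", 99, 0.0, 64.07),
    ("EFG1", "CCE88912.1", 100, 0.0, 60.55),
    ("EFG1", "KAA8900417.1", 99, 0.0, 60.69),
    ("EFG1", "SGZ50148.1", 99, 0.0, 61.56),
    ("EFG1", "QBM86330", 100, 0.0, 58.35),
    ("EFG1", "GEQ72255.1", 100, 0.0, 61.19),
    ("NRG1", "EMG47643.1", 100, 2e-53, 45.02),
    ("NRG1", "RCK65478.1", 40, 1e-43, 68.75),
    ("NRG1", "RLV92796.1", 91, 1e-40, 41.70),
    ("NRG1", "SGZ55399.1", 32, 4e-33, 64.52),
    ("NRG1", "CCE81639.1", 44, 2e-32, 53.10),
    ("NRG1", "GEQ72681.1", 36, 2e-29, 60.00),
    ("NRG1", "QBM89661.1", 66, 3e-29, 40.80),
    ("NRG1", "QFZ30513.1", 30, 2e-28, 67.11),
]


def passes(hit, min_coverage=30, max_evalue=3.0):
    """Keep a hit that covers enough of the query and is under the E-value threshold."""
    _, _, coverage, evalue, _ = hit
    return coverage >= min_coverage and evalue <= max_evalue


def summary(hits, protein):
    """Number of hits plus (min, max) coverage and identity for one query protein."""
    rows = [h for h in hits if h[0] == protein]
    coverages = [h[2] for h in rows]
    identities = [h[4] for h in rows]
    return len(rows), (min(coverages), max(coverages)), (min(identities), max(identities))


kept = [hit for hit in TABLE3 if passes(hit)]
for protein in ("EFG1", "NRG1"):
    print(protein, summary(kept, protein))
```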
II. PHYSICO-CHEMICAL PROPERTY ANALYSIS
The EFG1 and NRG1 proteins were analysed with the ProtParam tool. ProtParam reports properties such as the number of amino acids, molecular weight and theoretical pI (Table 4). EFG1 and NRG1 contain 550 and 310 amino acids, respectively. The most frequent residue is glutamine in EFG1 (18.7%) and proline in NRG1 (11.3%). Their molecular weights are 59611.71 Da and 34299.22 Da (Table 4), and their theoretical pI values are 9.37 and 9.92. EFG1 contains 17 negatively charged (Asp + Glu) and 28 positively charged (Arg + Lys) residues, whereas NRG1 contains 10 and 31, respectively. The molecular formula gives the numbers of carbon, hydrogen, nitrogen, oxygen and sulfur atoms in each protein (Table 4). The extinction coefficients of EFG1 and NRG1 at 280 nm in water are 64640 M−1 cm−1 and 29715 M−1 cm−1, respectively, with an Abs 0.1% (= 1 g/L) value of 0.995 (Table 4). The instability index is inversely related to protein stability. A value above 40 predicts that a protein is unstable, so both EFG1 (59.30) and NRG1 (57.59) are predicted to be unstable. The estimated half-life of both proteins is 30 hours, the ProtParam estimate for mammalian reticulocytes in vitro (Table 4).
Table-4: Computational analysis of physicochemical parameters of EFG1 and NRG1 Proteins of C. albicans.
Sr. No. | Parameter | EFG1 | NRG1
1. | Theoretical pI | 9.37 | 9.92
2. | Molecular formula | C2553H3933N767O864S13 | C1491H2327N451O463S10
3. | Total number of atoms | 8130 | 4742
4. | Molecular weight (Da) | 59611.71 | 34299.22
5. | Number of amino acids | 550 | 310
6. | Amino acid composition (highest) | Gln (Q) 103 (18.7%) | Pro (P) 35 (11.3%)
7. | Amino acid composition (lowest) | Pyl (O) & Sec (U) 0 (0.0%) | Pyl (O) & Sec (U) 0 (0.0%)
8. | Extinction coefficient at 280 nm in water (M−1 cm−1) | 64640 | 29715
9. | Total number of negatively charged residues (Asp + Glu) | 17 | 10
10. | Total number of positively charged residues (Arg + Lys) | 28 | 31
11. | Estimated half-life (h) | 30 | 30
12. | Instability index | 59.30 | 57.59
13. | Aliphatic index | 43.40 | 49.52
14. | Grand average of hydropathicity (GRAVY) | -1.005 | -0.958
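Several Table 4 parameters follow simple published formulas. The sketch below re-implements three of them in plain Python so that they can be recomputed from a sequence:

- GRAVY, the mean Kyte-Doolittle hydropathy.
- The aliphatic index (Ikai, 1980).
- The 280 nm extinction coefficient, using the Trp = 5500, Tyr = 1490 and cystine = 125 M−1 cm−1 values that ProtParam uses.

This is an illustration, not ProtParam's code. The instability index is only thresholded at 40 here, because its dipeptide weight table is too long to reproduce. The sketch assumes sequences that contain only the 20 standard amino acids.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids (used for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Ikai (1980): X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)
    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))


def extinction_coefficient(seq, all_cys_in_cystines=True):
    """Molar extinction coefficient at 280 nm in water (M-1 cm-1).

    Trp = 5500, Tyr = 1490, cystine = 125; the cystine term assumes that every
    pair of Cys residues forms a disulfide bond.
    """
    eps = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if all_cys_in_cystines:
        eps += 125 * (seq.count("C") // 2)
    return eps


def predicted_unstable(instability_index):
    """An instability index above 40 predicts an unstable protein."""
    return instability_index > 40.0


# Toy peptides for illustration; they are not EFG1 or NRG1 sequences.
print(gravy("AVIL"), aliphatic_index("AVIL"))
print(extinction_coefficient("WYCC"))                         # 7115
print(predicted_unstable(59.30), predicted_unstable(57.59))  # True True
```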
III. PHYLOGENETIC ANALYSIS
Multiple sequence alignments of EFG1 (Figure 1) and NRG1 (Figure 2) were performed with ClustalW in MEGA. The EFG1 alignment comprised 10 deduced protein sequences. The EFG1-encoding gene (Accession ID: CR_07890W_A) shows significant similarity to its homologs. When the EFG1 sequence was compared with database sequences, high similarity to several sequences and conserved regions were found (Figure 1). The alignment suggests five conserved residue patches and motifs. All aligned members share the conserved CDGIYSFR, SYF and GHV regions (Figure 1). Conserved proline (P), alanine (A), histidine (H), leucine (L), tyrosine (Y) and arginine (R) residues were also found (Figure 1).
The NRG1 alignment comprised 8 deduced protein sequences. The NRG1-encoding gene (Accession ID: C7_04230W_A) shows significant similarity to its homologs. When the NRG1 sequence was compared with database sequences, high similarity to several sequences and conserved regions were found (Figure 2).
The alignment suggests three conserved residue patches and motifs. All aligned members share the conserved FTT, GHLA and RQDNC regions (Figure 2). Conserved cysteine (C) and glutamine (Q) residues were also found (Figure 2).
Phylogenetic tree construction by MEGA: The four parameters used in tree construction for EFG1 were as follows:
- Statistical method- Neighbor-Joining method.
- Substitution type- Amino acid.
- Model/Method- p-distance.
- Rates among sites- Uniform rates.
The optimal tree, with a sum of branch lengths of 1.61318370, is shown in Figure 3. Evolutionary distances were computed using the p-distance method and are in units of the number of amino acid differences per site (Figure 3). This analysis involved 11 amino acid sequences.

Figure 3 shows the dendrogram of Candida albicans EFG1 (Accession No: CR_07890W) with:
- Candida viswanathii (Accession No: RCK62494.1)
- Candida tropicalis (Accession No: AJT59418.1)
- Candida maltosa Xu316 (Accession No: EMG50168.1)
- Candida parapsilosis (Accession No: CCE45150.1)
- Meyerozyma sp. JA9 (Accession No: RLV83154.1)
- Millerozyma farinosa CBS 7064 (Accession Nos: CCE89067.1 and CCE79776.1)
- Candida intermedia (Accession No: SGZ46625.1)
- Clavispora lusitaniae (Accession Nos: QFZ26919.1 and OVF10431.1)
- Diutina rugosa (Accession No: KAA8901616.1)

EFG1/CR_07890W shows distinct phylogenetic features among the compared members. The tree reveals two subgroups that diverged from each other with respect to EFG1 (Figure 3).
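The two EFG1 tree settings, p-distance and Neighbor-Joining, can be made concrete with a short sketch. It computes p-distances (the proportion of aligned sites that differ) and builds a Neighbor-Joining tree from a distance matrix in plain Python. It illustrates the standard Neighbor-Joining algorithm and is not MEGA's implementation. Two inputs are assumptions for illustration:

- Gap handling by pairwise deletion.
- A textbook five-taxon toy matrix, used instead of the EFG1 alignment.

The total of the returned branch lengths corresponds to the "sum of branch length" statistic that MEGA reports for the optimal tree.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (pairwise deletion),
    an assumption made for this illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if gap in (a, b):
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def neighbor_joining(names, dist):
    """Build an unrooted NJ tree and return (parent, child, branch_length) edges.

    names: at least three taxon labels; dist: {frozenset({x, y}): distance}.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        totals = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}

        def q(pair):
            # Q(i, j) = (n - 2) d(i, j) - S_i - S_j; the smallest Q is joined next.
            x, y = pair
            return (n - 2) * d[frozenset(pair)] - totals[x] - totals[y]

        i, j = min(combinations(nodes, 2), key=q)
        dij = d[frozenset((i, j))]
        li = dij / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        u = f"node{next_id}"
        next_id += 1
        edges += [(u, i, li), (u, j, dij - li)]
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Join the last three nodes to a central node.
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    centre = f"node{next_id}"
    edges += [(centre, a, (dab + dac - dbc) / 2),
              (centre, b, (dab + dbc - dac) / 2),
              (centre, c, (dac + dbc - dab) / 2)]
    return edges


# Textbook five-taxon example with additive distances (not the EFG1 data).
TAXA = ["a", "b", "c", "d", "e"]
MATRIX = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
DIST = {frozenset((x, y)): MATRIX[i][j]
        for i, x in enumerate(TAXA) for j, y in enumerate(TAXA) if i < j}
tree = neighbor_joining(TAXA, DIST)
total_length = sum(length for _, _, length in tree)  # "sum of branch length"
print(tree)
print(total_length)
```

Because the toy distances are additive, the tied Q values in the second round do not change the tree, and its total length is 17.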
The four parameters used in tree construction for NRG1 were as follows:
- Statistical method- Maximum Likelihood with the JTT (Jones-Taylor-Thornton) matrix-based model. Initial tree(s) were obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model, and then selecting the topology with the superior log-likelihood value.
- Substitution type- Amino acid.
- Model/Method- JTT model.
- Rates among sites- Uniform rates.
This analysis involved ten comparative protein sequences (Figure 4). Figure 4 shows the dendrogram of Candida albicans NRG1 (Accession No: C7_04230W_A|NRG1) with:
- Candida maltosa (strain Xu316) (Accession No: EMG47643.1)
- Candida viswanathii (Accession No: RCK65478.1)
- Spathaspora sp. JA1 (Accession No: RLV92796.1)
- [Candida] intermedia (Accession No: SGZ55399.1)
- Pichia sorbitophila (Accession No: CCE81639.1)
- Metschnikowia sp. JCM 33374 (Accession No: GEQ72681.1)
- Metschnikowia aff. pulcherrima (Accession No: QBM89661.1)
- Diutina rugosa (Accession No: QFZ30513.1)

Among the compared members, C7_04230W_A|NRG1 clusters with Candida maltosa (strain Xu316) (Figure 4). At a glance, the tree shows two subgroups that diverged from each other with respect to NRG1 (Figure 4).
IV. PREDICTION OF THREE-DIMENSIONAL STRUCTURE AND ACTIVE SITE
Active site prediction for EFG1 (Q59X67) and NRG1 (Q5A0E5) was done using COACH.
PROTEIN STRUCTURES AND ACTIVE SITES: EFG1 and NRG1: COACH is a meta-server approach to protein-ligand binding site prediction. Starting from the structure of the target protein, COACH compares binding-specific substructures and sequence profiles. It generates complementary ligand-binding site predictions with two comparative methods, TM-SITE and S-SITE, which recognize ligand-binding templates from the BioLiP protein function database. These predictions are combined with the results of other methods (including COFACTOR, FINDSITE and ConCavity) to generate the final ligand-binding site predictions. Alternatively, a primary sequence can be given as input. In that case, I-TASSER first generates 3D models, which are then fed into the COACH pipeline. The generated results are shown below:
The predicted ligand-binding residues in the active site of EFG1 are glycine (GLY) at position 256 and glutamic acid (GLU) at position 261 (Figure 5). The features of the five predicted target-ligand binding sites are listed in Table 5.
Table 5: The features for predicted ligand-binding sites for Efg1.
C-score | Cluster size | PDB Hit | Binding Ligand | Ligand Binding Site Residues
0.06 | 4 | 4uuxB | MG | 256, 261
0.03 | 2 | 1q9uB | ZN | 220, 295
0.03 | 2 | 3tv5C | RCP | 214, 218, 219, 238
0.03 | 2 | 1mtbB | NTB | 207, 276
0.03 | 2 | 2wieC | CVM | 243, 328
The C-score is the confidence score of the prediction. It ranges from 0 to 1, and a higher score indicates a more reliable prediction. In Table 5, the best predicted site is the first one (C-score = 0.06), with ligand-binding residues 256 and 261. Table 5 also gives the cluster size, i.e., the total number of templates in each cluster. Information on each possible binding ligand can be viewed in the BioLiP database. Two kinds of complex were also generated (Table 5):
- A complex with one binding ligand: the single complex structure with the most representative ligand in the cluster.
- A complex with multiple binding ligands: complex structures with all potential binding ligands in the cluster.
The predicted ligand-binding residues in the active site of NRG1 are cysteine (CYS) at positions 230 and 233 and histidine (HIS) at positions 246 and 250 (Figure 6). The features of the five predicted target-ligand binding sites are listed in Table 6.
Table 6: Features of the predicted ligand-binding sites of Nrg1.
C-score | Cluster size | PDB Hit | Binding Ligand | Ligand Binding Site Residues
0.29 | 10 | 2ytoA | ZN | 230, 233, 246, 250
0.09 | 3 | 2i13B | GOL | 242, 245, 269
0.06 | 3 | 2i13B | Nucleic acid | 157, 159, 160, 163, 167, 170, 186, 190, 191, 194, 195, 198, 205, 207, 209, 212, 216, 219, 237, 239, 242, 246, 249, 267, 269, 272, 276, 279
0.03 | 1 | 2i13A | ZN | 246, 250
0.02 | 1 | 2i13B | GOL | 272, 275
The C-score is the confidence score of the prediction. It ranges from 0 to 1, and a higher score indicates a more reliable prediction. In Table 6, the best predicted site is the first one (C-score = 0.29), with ligand-binding residues 230, 233, 246 and 250. Table 6 also gives the cluster size, i.e., the total number of templates in each cluster. Information on each possible binding ligand can be viewed in the BioLiP database (Table 6). Two kinds of complex were also predicted (Table 6):
- A complex with one binding ligand: the single complex structure with the most representative ligand in the cluster.
- A complex with multiple binding ligands: complex structures with all potential binding ligands in the cluster.
Protein Homology Modeling: EFG1
Enhanced filamentous growth protein 1 (EFG1) from Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast), with UniProt ID Q59X67, was modeled with I-TASSER using multiple templates (Table 7).
Table 7: Table summarizing protein homology modeling features for EFG1.
Protein Name | Enhanced filamentous growth protein 1
Organism | Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
UniProt ID | Q59X67
Model Generated using | I-TASSER
Templates used | Multiple
1. Modeling Method:
I-TASSER was used for protein modeling. I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Modeling starts from structure templates identified by LOMETS from the PDB library. Table 8 lists the top 10 template-query alignments generated by LOMETS. Template significance is measured by the Z-score, i.e., the difference between the raw and average alignment scores in units of standard deviation. A normalized Z-score > 1 indicates a good alignment (Table 8).
Table 8: The top ten threading templates for EFG1 and their modeling parameters.
Rank | PDB Hit | Identity 1 | Identity 2 | Coverage | Normalized Z-score
1 | 2nbiA | 0.08 | 0.16 | 0.89 | 1.83
2 | 4ux5A | 0.27 | 0.08 | 0.20 | 1.18
3 | 5jcsS | 0.04 | 0.31 | 0.99 | 2.50
4 | 1l3g | 0.21 | 0.07 | 0.22 | 5.19
5 | 1l3g | 0.21 | 0.07 | 0.22 | 4.16
6 | 2nbiA | 0.06 | 0.16 | 0.89 | 4.12
7 | 2nbiA | 0.08 | 0.16 | 0.81 | 2.98
8 | 2nbiA | 0.17 | 0.16 | 0.79 | 1.67
9 | 4ux5A | 0.27 | 0.08 | 0.19 | 3.98
10 | 2nbiA | 0.06 | 0.16 | 0.90 | 1.25
Identity 1 is the sequence identity between the template and the query sequence in the threading-aligned region. Identity 2 is the sequence identity between the whole template chain and the query sequence. Both are given as fractions (Table 8). The coverage of a threading alignment is the number of aligned residues divided by the length of the query protein. All ten templates have a normalized Z-score above 1, indicating good alignments (Table 8).
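The Z-score definition and the coverage ratio above translate directly into code. This Python sketch assumes that a background distribution of raw threading scores is available and uses the population standard deviation, both assumptions for illustration. LOMETS computes its own backgrounds per threading program and normalizes Z by program-specific cut-offs; neither step is reproduced here. The Table 8 values are used only to apply the "normalized Z > 1" rule.

```python
import statistics


def z_score(raw_score, background_scores):
    """(raw - mean) / standard deviation of a background score distribution."""
    mean = statistics.mean(background_scores)
    sd = statistics.pstdev(background_scores)
    return (raw_score - mean) / sd


def alignment_coverage(aligned_residues, query_length):
    """Fraction of the query covered by the threading alignment."""
    return aligned_residues / query_length


# (rank, PDB hit, normalized Z-score) transcribed from Table 8.
TABLE8 = [(1, "2nbiA", 1.83), (2, "4ux5A", 1.18), (3, "5jcsS", 2.50),
          (4, "1l3g", 5.19), (5, "1l3g", 4.16), (6, "2nbiA", 4.12),
          (7, "2nbiA", 2.98), (8, "2nbiA", 1.67), (9, "4ux5A", 3.98),
          (10, "2nbiA", 1.25)]

good = [row for row in TABLE8 if row[2] > 1.0]   # "Z-score > 1" rule
best = max(TABLE8, key=lambda row: row[2])       # most significant template
print(len(good), best)
```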
2. Predicted structure of EFG1: structural details of the target protein:
I-TASSER selects the final model based on pair-wise structural similarity (Table 9). The confidence of each model is quantified by the C-score. The C-score is calculated from the significance of the threading template alignments and the convergence parameters of the structure-assembly simulations (Table 9). It typically lies in the range [-5, 2], and a higher value indicates a model of higher confidence (Table 9). The model's C-score is -0.56. From this C-score and the protein length, using the correlation observed between these quantities, the model has an estimated TM-score of 0.49 ± 0.15 and an estimated RMSD of 11.9 ± 4.4 Å (Table 9).
Table 9: Estimated quality of the selected EFG1 model, derived from the correlation observed between different modeling quantities.
Name | C-score | Exp. TM-score | Exp. RMSD | No. of decoys | Cluster density
Model2 | -0.56 | 0.49 ± 0.15 | 11.9 ± 4.4 Å | 600 | 0.2156
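The two quality estimates in Table 9 refer to standard structure-comparison measures. I-TASSER predicts them from the C-score rather than measuring them against a known structure. When a reference structure is available, however, they can be computed directly, as in the Python sketch below. It assumes that the model and reference have already been aligned residue by residue and optimally superposed. The search for the best superposition, which the TM-score program performs, is omitted. The sketch takes the resulting per-residue distances as input.

```python
import math


def tm_d0(target_length):
    """Length-dependent distance scale d0 (Angstrom) of the TM-score."""
    d0 = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # floor for very short chains


def tm_score(distances, target_length):
    """TM-score from distances (Angstrom) of aligned, already superposed residue pairs.

    Target residues without an aligned partner simply contribute nothing.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# For a 550-residue protein (the length of EFG1), d0 is a little over 8 Angstrom.
print(round(tm_d0(550), 2))
```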
ERRAT Analyses: The statistics of non-bonded interactions between different atom types are analysed. The value of the error function is plotted against the position of a 9-residue sliding window (Figure 8).
The resulting plot indicates the confidence and overall quality of the model. Two lines on the error axis indicate the confidence with which regions exceeding that error value can be rejected (Figure 9). The overall quality factor is the percentage of the protein for which the calculated error value falls below the 95% rejection limit. With a quality factor of 77.9116%, the structure can be considered reliable (Figure 9).
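The quality factor quoted above is a simple proportion over sliding windows. The minimal Python sketch below assumes that ERRAT has already produced the per-window error values. Computing those values requires ERRAT's own non-bonded contact statistics, which are not reproduced here. The rejection limit is a parameter. The constant 11.526 is an assumption: it is the 95% line commonly printed on ERRAT plots, so replace it with the value shown on your own ERRAT output if that differs.

```python
def window_starts(n_residues, size=9):
    """Start positions of every full sliding window of `size` consecutive residues."""
    return range(max(n_residues - size + 1, 0))


def quality_factor(window_errors, rejection_limit):
    """Percentage of windows whose error value falls below the rejection limit."""
    below = sum(1 for err in window_errors if err < rejection_limit)
    return 100.0 * below / len(window_errors)


# ASSUMPTION: 95% rejection line as commonly printed on ERRAT plots.
ERRAT_95 = 11.526

print(len(window_starts(550)))                          # 542 windows for 550 residues
print(quality_factor([1.0, 2.0, 20.0, 5.0], ERRAT_95))  # 75.0
```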
3. Model Validation: The structure was validated with a Ramachandran plot (RC plot), which shows the backbone torsion angles phi (φ) and psi (ψ) of the residues (amino acids) in the structure (Figure 10). Glycine residues are marked separately by triangles, because they are not restricted to the regions of the plot appropriate to the other side-chain types (Figure 10). The colouring/shading on the plot represents the different regions. The darkest areas (shown here in red) correspond to the "core" regions, which represent the most favorable combinations of phi-psi values (Figure 10). Ideally, over 90% of the residues should lie in these "core" (favoured) regions (Figure 10). The percentage of residues in the core region is one of the better guides to stereochemical quality. Residues in the disallowed regions should ideally make up no more than 0.2%. The RC plot was generated using PDBsum, which uses PROCHECK for plot generation. The plot for the selected final model is shown in Figure 10.
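The φ and ψ angles on a Ramachandran plot are ordinary torsion (dihedral) angles, each defined by four consecutive backbone atoms. The Python sketch below computes a signed dihedral from four 3D points with the standard atan2 formulation and the IUPAC sign convention. Tools such as PROCHECK read the atom coordinates from the PDB file and apply the same geometry, but this is not their code. Coordinates are passed as plain tuples, an assumption made to keep the example dependency-free.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (range -180 to 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# For residue i, with backbone atom coordinates taken from a structure file:
#   phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
#   psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # trans: 180.0
```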
Table 10: Ramachandran plot (RC plot) statistics, generated by PDBsum, for the selected final EFG1 model.
Region | No. of residues | Percentage (%)
Most favoured regions [A,B,L] | 258 | 56.1
Additional allowed regions [a,b,l,p] | 160 | 34.8
Generously allowed regions [~a,~b,~l,~p] | 26 | 5.7
Disallowed regions [XX] | 16 | 3.5
Non-glycine and non-proline residues | 460 | 100
End-residues (excl. Gly and Pro) | 2 | -
Glycine residues | 39 | -
Proline residues | 49 | -
The analysis was based on 118 structures of at least 2.0 Å resolution and R-factor no greater than 20%. In conclusion, based on the overall modeling parameters, this is the most appropriate model that can be built with the available template(s), and it can be used for further experiments (Table 10).
4.3: Protein Homology Modeling: NRG1. Transcriptional regulator NRG1 from Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast), with UniProt ID Q5A0E5, was modeled with I-TASSER using multiple templates (Table 11).
Table 11: Table summarizing protein homology modeling features for NRG1
Protein Name | Transcriptional regulator NRG1
Organism | Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
UniProt ID | Q5A0E5
Model Generated using | I-TASSER
Templates used | Multiple
1. Modeling Method:
I-TASSER was used for protein modeling. Modeling starts from structure templates identified by LOMETS from the PDB library. Table 12 lists the top 10 template-query alignments generated by LOMETS. Template significance is measured by the Z-score, i.e., the difference between the raw and average alignment scores in units of standard deviation. A normalized Z-score > 1 indicates a good alignment (Table 12).
Table 12: The top ten threading templates for NRG1 and their modeling parameters.
Rank | PDB Hit | Identity 1 | Identity 2 | Coverage | Normalized Z-score
1 | 5v3jE | 0.15 | 0.18 | 0.85 | 3.70
2 | 5wjqA | 0.13 | 0.18 | 0.88 | 1.55
3 | 5v3jE | 0.12 | 0.18 | 0.77 | 4.24
4 | 5und | 0.17 | 0.10 | 0.49 | 1.03
5 | 5v3j | 0.14 | 0.18 | 0.85 | 1.42
6 | 5v3jE | 0.14 | 0.18 | 0.78 | 6.37
7 | 5wjqD | 0.14 | 0.18 | 0.86 | 1.83
8 | 5v3j | 0.23 | 0.18 | 0.35 | 1.26
9 | 2nbiA | 0.18 | 0.21 | 0.98 | 1.54
10 | 2ebtA | 0.33 | 0.10 | 0.29 | 4.27
Identity 1 is the sequence identity between the template and the query sequence in the threading-aligned region. Identity 2 is the sequence identity between the whole template chain and the query sequence. Both are given as fractions. The coverage of a threading alignment is the number of aligned residues divided by the length of the query protein. All ten templates have a normalized Z-score above 1, indicating good alignments (Table 12).
2. Predicted structure of NRG1: structural details of the target protein:
I-TASSER selects the final models with the SPICKER program, based on pair-wise structural similarity. It reports up to five models, corresponding to the five largest structure clusters (Table 13). The confidence of each model is quantified by the C-score. The C-score is calculated from the significance of the threading template alignments and the convergence parameters of the structure-assembly simulations (Table 13). It typically lies in the range [-5, 2], and a higher value indicates a model of higher confidence (Table 13). The model's C-score is -2.04. From this C-score and the protein length, using the correlation observed between these quantities, the model has an estimated TM-score of 0.47 ± 0.15 and an estimated RMSD of 11.0 ± 4.6 Å (Table 13).
Table 13: Estimated quality of the selected NRG1 model, derived from the correlation observed between different modeling quantities.
Name | C-score | Exp. TM-score | Exp. RMSD | No. of decoys | Cluster density
Model1 | -2.04 | 0.47 ± 0.15 | 11.0 ± 4.6 Å | 2319 | 0.0477
ERRAT Analyses: The statistics of non-bonded interactions between different atom types are analysed. The value of the error function is plotted against the position of a 9-residue sliding window.
The resulting plot indicates the confidence and overall quality of the model (Figure 12b). Two lines on the error axis indicate the confidence with which regions exceeding that error value can be rejected (Figure 12b). The overall quality factor is the percentage of the protein for which the calculated error value falls below the 95% rejection limit. With a quality factor of 77.815%, the structure can be considered reliable (Figure 12b).
3. Model Validation:
The structure was validated with a Ramachandran plot (RC plot), which shows the backbone torsion angles phi (φ) and psi (ψ) of the residues (amino acids) in the structure (Figure 13). Glycine residues are marked separately by triangles, because they are not restricted to the regions of the plot appropriate to the other side-chain types (Figure 13). The colouring/shading on the plot represents the different regions. The darkest areas (shown here in red) correspond to the "core" regions, which represent the most favorable combinations of phi-psi values (Figure 13). Ideally, over 90% of the residues should lie in these "core" (favoured) regions (Figure 13). The percentage of residues in the core region is one of the better guides to stereochemical quality (Figure 13). Residues in the disallowed regions should ideally make up no more than 0.2%. The RC plot was generated using PDBsum, which uses PROCHECK for plot generation. The plot for the selected final model is shown in Figure 13.
Table 14: Ramachandran plot (RC plot) statistics, generated by PDBsum, for the selected final NRG1 model, highlighting the backbone conformations within the model.
Region | No. of residues | Percentage (%)
Most favoured regions [A,B,L] | 170 | 65.4
Additional allowed regions [a,b,l,p] | 70 | 26.9
Generously allowed regions [~a,~b,~l,~p] | 15 | 5.8
Disallowed regions [XX] | 5 | 1.9
Non-glycine and non-proline residues | 260 | 100
End-residues (excl. Gly and Pro) | 2 | -
Glycine residues | 13 | -
Proline residues | 35 | -
The analysis was based on 118 structures of at least 2.0 Å resolution and R-factor no greater than 20%. This is the most appropriate model that can be built with the available template(s), and it can be used for further experiments (Figure 13, Table 14).
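The percentages in Tables 10 and 14 are the residue counts in each region divided by the number of non-glycine, non-proline residues. The short Python sketch below recomputes them from the tabulated counts. It also compares them with the ideal targets quoted in the validation paragraphs: over 90% of residues in the core regions and at most 0.2% in the disallowed regions. The thresholds are parameters, so a different quality criterion can be substituted.

```python
# Non-glycine, non-proline residue counts per region, transcribed from Tables 10 and 14.
EFG1_COUNTS = {"most_favoured": 258, "additional": 160, "generous": 26, "disallowed": 16}
NRG1_COUNTS = {"most_favoured": 170, "additional": 70, "generous": 15, "disallowed": 5}


def region_percentages(counts):
    """Share of non-glycine, non-proline residues in each region, rounded to 0.1%."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_ideal(percentages, core_min=90.0, disallowed_max=0.2):
    """Ideal targets from the text: >90% in core regions and <=0.2% disallowed."""
    return (percentages["most_favoured"] > core_min
            and percentages["disallowed"] <= disallowed_max)


efg1_pct = region_percentages(EFG1_COUNTS)
nrg1_pct = region_percentages(NRG1_COUNTS)
print(efg1_pct, meets_ideal(efg1_pct))
print(nrg1_pct, meets_ideal(nrg1_pct))
```

For these counts, the recomputed percentages match the reported values. Neither model reaches the 90% core-region target.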
4.4 Selection of inhibitors for pharmacophore modelling with PHARMAGIST:
Ten potential ligands (known inhibitors) of the target protein Efg1 were docked with AutoDock 4.2 (Table 15). Retigeric_acid_B_CID_53319374 binds most strongly, with a binding energy of -7.43 kcal/mol. It is followed by Biatriosporin_D_CID_132523661, with a binding energy of -6.91 kcal/mol (Table 15).
Table 15: Inhibitors (ligands) of EFG1 and their binding energies computed with AutoDock 4.2.
Ligand | Target | Binding energy (kcal/mol)
Biatriosporin_D_CID_132523661 | EFG1 | -6.91
amoxapine_CID_2170 | EFG1 | -5.69
TrichostatinA_CID_444732 | EFG1 | -5.46
chlorpromazine_CID_2726 | EFG1 | -4.58
Clozapine_CID_135398737 | EFG1 | -5.77
Fluperlapine_CID_49381 | EFG1 | -6.42
loxapine_CID_3964 | EFG1 | -5.5
farnesol_CID_445070 | EFG1 | -4.61
Retigeric_acid_B_CID_53319374 | EFG1 | -7.43
suberoylanilidehydroxamic-acid_CID_5311 | EFG1 | -4.85
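Binding energies are easier to compare once they are ranked. They can also be converted to an estimated inhibition constant through the thermodynamic relation Ki = exp(ΔG/RT), the relation behind the estimated Ki that AutoDock 4.2 reports. The Python sketch below ranks the Table 15 ligands and applies that conversion at an assumed temperature of 298.15 K. The resulting Ki values are model estimates derived from docking scores, not measured affinities.

```python
import math

R_KCAL = 1.98719e-3  # gas constant, kcal/(mol*K)

# Binding energies (kcal/mol) transcribed from Table 15.
TABLE15 = {
    "Biatriosporin_D_CID_132523661": -6.91,
    "amoxapine_CID_2170": -5.69,
    "TrichostatinA_CID_444732": -5.46,
    "chlorpromazine_CID_2726": -4.58,
    "Clozapine_CID_135398737": -5.77,
    "Fluperlapine_CID_49381": -6.42,
    "loxapine_CID_3964": -5.5,
    "farnesol_CID_445070": -4.61,
    "Retigeric_acid_B_CID_53319374": -7.43,
    "suberoylanilidehydroxamic-acid_CID_5311": -4.85,
}


def inhibition_constant(delta_g, temperature=298.15):
    """Estimated Ki (mol/L) from a binding free energy in kcal/mol: Ki = exp(dG / (R*T))."""
    return math.exp(delta_g / (R_KCAL * temperature))


# Most negative (most favourable) binding energy first.
ranked = sorted(TABLE15.items(), key=lambda item: item[1])
for ligand, energy in ranked:
    ki_um = inhibition_constant(energy) * 1e6
    print(f"{ligand:42s} {energy:6.2f} kcal/mol   Ki ~ {ki_um:8.1f} uM")
```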
On the basis of binding energy, 6 of the 8 compounds were selected to build the pharmacophore (Figure 14).
EFG1 and selected inhibitors pharmacophore
The two known inhibitors of the target protein Nrg1 were docked with AutoDock 4.2 (Table 16). Mdivi_CID_3825829 binds most strongly, with a binding energy of -5.01 kcal/mol. It is followed by Doxycycline_CID_54671203, with a binding energy of -2.57 kcal/mol (Table 16).
Table 16: Inhibitors of NRG1 and their binding energies computed with AutoDock 4.2.
Ligand | Target | Binding energy (kcal/mol)
Doxycycline_CID_54671203 | NRG1 | -2.57
Mdivi_CID_3825829 | NRG1 | -5.01
Since only two known inhibitors are available for the NRG1 target, a pharmacophore could not be built. The SwissSimilarity database was used instead (Figure 16).
SwissSimilarity: A Web Tool for Low to Ultra High Throughput Ligand-Based Virtual Screening
SwissSimilarity is a web tool for rapid ligand-based virtual screening of small to ultralarge libraries of small molecules. Screenable compounds include drugs, bioactive and commercial molecules. They also include 205 million virtual compounds that are readily synthesizable from commercially available synthetic reagents. The user interface and backend were designed for simplicity and ease of use, to provide proficient virtual screening capabilities to specialists and non-experts alike (Figure 17).
We selected the ZINC drug-like molecule database for the fingerprint search. Screening can be performed using the following approaches:
- FP2 molecular fingerprints from Open Babel.
- Electroshape 5D (including atomic partial charges and lipophilicity contributions) for fast non-superpositional shape-based virtual screening.
- Spectrophores: a fast non-superpositional shape-based virtual screening method developed by Silicos-IT and implemented in Open Babel 2.3.2. It uses one-dimensional descriptors generated from the property fields surrounding the molecules.
- Shape-IT: a shape-based alignment tool developed by Silicos-IT. It represents molecules as sets of atomic Gaussians and performs molecular alignment as described by Grant and Pickup (J. Phys. Chem. 1995, 99, 3503).
- Align-IT: a pharmacophore-based tool from Silicos-IT that aligns molecules by representing pharmacophoric features as Gaussian 3D volumes, as described by Grant and Pickup (J. Phys. Chem. 1995, 99, 3503).
Combined score
In addition to the scores of the methods above, a consensus 2D/3D screening can be performed. It uses a score based on both the FP2 Tanimoto coefficient (s1) and the Electroshape-5D Manhattan distance (s2). This combined score, f(s1, s2), was developed for reverse screening in the SwissTargetPrediction web interface (Figure 18).
It was obtained by logistic regression as f(s1, s2) = 1 / (1 + exp(-a0 - a1·s1 - a2·s2)). Here a0, a1 and a2 are parameters learned by the model to predict possible protein targets of a small molecule from its molecular similarity to known bioactive compounds (Figure 18). f(s1, s2) ranges from 0 for totally dissimilar molecules to 1 for identical molecules (Figure 18). For drug-like molecules, this combined score was found to perform significantly better than similarity assessed by FP2 or Electroshape-5D separately. The combined score was therefore used to screen the database (Figures 18 and 19).
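The combined score is a logistic function of two similarity terms. The Python sketch below shows its general form together with a Tanimoto coefficient on fingerprint bit sets. Two points in it are assumptions made for illustration:

- The coefficients A0, A1 and A2 are hypothetical, because the fitted SwissSimilarity values are not given in this document.
- s2 is passed in as a distance-derived similarity, 1 / (1 + mean absolute descriptor difference), a USR-style mapping. Whether the real score uses the raw Manhattan distance or such a similarity only changes the sign and size of a2.

This is not SwissSimilarity's code.

```python
import math


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0


def manhattan_similarity(desc_a, desc_b):
    """ASSUMPTION (USR-style): 1 / (1 + mean absolute difference of shape descriptors)."""
    distance = sum(abs(x - y) for x, y in zip(desc_a, desc_b))
    return 1.0 / (1.0 + distance / len(desc_a))


def combined_score(s1, s2, a0, a1, a2):
    """Logistic combination f(s1, s2) = 1 / (1 + exp(-a0 - a1*s1 - a2*s2))."""
    return 1.0 / (1.0 + math.exp(-a0 - a1 * s1 - a2 * s2))


# HYPOTHETICAL coefficients, for illustration only.
A0, A1, A2 = -10.0, 8.0, 6.0

s1 = tanimoto({1, 4, 7, 9}, {1, 4, 7, 12})                   # 3 shared bits of 5 -> 0.6
s2 = manhattan_similarity([0.2, 1.1, 3.0], [0.3, 1.0, 2.8])  # toy descriptor vectors
print(s1, round(s2, 3), round(combined_score(s1, s2, A0, A1, A2), 3))
```

With positive a1 and a2, the score rises toward 1 as both similarities increase, which is the behaviour that ranking by the combined score relies on.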
The calculated scores range from 0.958 to 0.998. ZINC08578218 obtained the maximum score and ZINC07045454 the minimum (Figure 19). Eight potent hits were obtained from SwissSimilarity.
Molecular docking statistical result
Table 17: Docking results and binding energies computed for the top ZINC ligands for EFG1 and NRG1.
Target | Ligand | Binding energy (kcal/mol) | Interacting amino acid | No. of hydrogen bonds
EFG1 | ZINC31165359 | -11.3 | TYR101 | 1
EFG1 | ZINC08765154 | -11.1 | No interaction | -
EFG1 | ZINC28582890 | -10.6 | GLN126 | 1
EFG1 | ZINC62001396 | -10.5 | GLN126 | 1
EFG1 | ZINC31165363 | -10.1 | GLN126 | 1
NRG1 | ZINC20134767 | -7.4 | LYS231 | 1
NRG1 | ZINC16430849 | -7.2 | ARG235 | 1
NRG1 | ZINC16430852 | -7.2 | ARG235 | 1
NRG1 | ZINC20134767 | -7.4 | LYS231 | 1
NRG1 | ZINC12297005 | -6.1 | ARG248 | 1
NRG1 | ZINC12297006 | -6.1 | ARG248 | 1
The docking results and binding energies computed for the top ZINC ligands of EFG1 and NRG1 are shown in Table 17. EFG1 was docked with five thermodynamically favourable ligands (Table 17, Figure 20). The best EFG1 result was ZINC31165359, which has a binding energy of -11.3 kcal/mol and forms a single hydrogen bond with residue Tyr101 (Table 17, Figure 20). For NRG1, a total of 800 ligands screened by structural similarity were docked into the active site. ZINC20134767 gave the best result, with a binding energy of -7.4 kcal/mol and an interaction with Lys231 (Table 17, Figure 20).