3.1 Phylogenetic tree
The phylogenetic tree of the HIV-1 sequences is given in Supplementary Data 6. The tree showed two main clusters involving sub-clusters which show variation depending on the origins of the subtypes. The subtypes with the same origin are placed in the same sub-cluster and branch based on the gene cluster. This data showed that HIV-1 variants have genetic diversity according to their geographical dissemination. This result provided the genomic base involving a large-scale variety to explore a common region in the HIV-1 genome.
3.2 Whole-genome alignment and BLASTx
The results from the whole-genome alignment are provided in Fig. 2. The alignment is visualized such that each LCB is clustered together, and the alignment is zoomed out so that each clustered LCB is shown as a continuous black bar, a region with a length of around 2.4 kb within the ≈ 2.8–5.3 kb range from the consensus sequence is highly conserved with almost no gaps in most of the sequences. In addition, after the alignment, using the progressive MAUVE algorithm enabled us to observe the longest highly conserved common regions within HIV-1 genomes aligned with the same gap region in the sequences (Supplementary Fig. 1, adenine in pink, guanine in yellow, thymine in green, and cytosine in blue, respectively).
The BLASTx results from the consensus sequence have indicated that the highly conserved region belongs to the HIV-1 pol gene (NCBI GenBank: QMX87928.1), hence nominating the functional proteins from the HIV-1 pol gene (R.T., IN, and late-phase protease) as a promising target for developing therapeutics, therefore, further therapeutic screening and analysis were performed on one of the main pol gene products, the reverse transcriptase enzyme [52].
3.3 Molecular docking-based virtual screening
The virtual screening of the ligand dataset against the HIV-1 RT enzyme nominated 7 compounds with significantly high affinities (≥ -8.5 kcal/mol), among them, only 4 compounds successfully achieved the same affinity in 3 subsequent runs, hence only these 4 compounds [(1) 4-hydroxy-3-[5-[5-(2-hydroxyphenyl)-1H-pyrazol-4-yl]-4,5-dihydro-1H-pyrazol-3-yl]chromen-2-one (C21H16N4O4), (2) Artoindonesianin P (C20H16O7), (3) 12a-Hydroxydolineone (C19H12O7), (4) Phomoarcherin B (C23H28O5)] were selected and further analysed, the top docking poses' for these 4 compounds are shown in Fig. 3A, a summary of each compound along with their chemical structures is given in Table 1.
Table 2
A summary of all interactions between the RNase H active site and the respective lead compounds is visualized in Fig. 3B.
Interacting compound
|
Interacting residue
|
Distance (Å)
|
Interaction type
|
Compound 1
|
A445
|
3.80
|
Hydrophobic
|
Q500
|
3.69
|
Hydrophobic
|
Y501
|
3.72
|
Hydrophobic
|
S499
|
3.59
|
Hydrogen bond
|
Q500
|
2.58
|
Hydrogen bond
|
Y501
|
2.28
|
Hydrogen bond
|
Mn2+
|
2.40
|
Ionic
|
Mn2+
|
2.60
|
Ionic
|
Compound 2
|
Q475
|
3.62
|
Hydrophobic
|
R448
|
2.64
|
Hydrogen bond
|
Q500
|
2.47
|
Hydrogen bond
|
Y501
|
2.03
|
Hydrogen bond
|
Mn2+
|
2.50, 3.30*
|
Ionic
|
Mn2+
|
2.50
|
Ionic
|
Compound 3
|
Q500
|
3.23
|
Hydrophobic
|
H539
|
3.22
|
Hydrophobic
|
Mn2+
|
2.21, 2.64*
|
Ionic
|
Mn2+
|
2.73, 2.91*
|
Ionic
|
Compound 4
|
A445
|
3.85
|
Hydrophobic
|
Q500
|
3.96
|
Hydrophobic
|
A538
|
3.66
|
Hydrophobic
|
Q500
|
2.50
|
Hydrogen bond
|
Y501
|
2.25
|
Hydrogen bond
|
Mn2+
|
2.41
|
Ionic
|
Mn2+
|
2.60
|
Ionic
|
* Lead compounds making more than one interaction with the same Mn2+ cation have their distances mentioned within the same cell separated by a comma.
|
3.4 Protein-ligand interaction profiling
The interaction between the R.T.'s RNase H catalytic site with the divalent cation and the top 4 potential lead compounds (as shown in Fig. 3) was closely analysed and visualized, Fig. 3B (a-d) shows all the potential hydrophobic interactions (with yellow dashes) and hydrogen bonds (with magenta dashes) between each residue and lead compound within the RNase H active site, the interacting residues from the R.T. backbone are further expanded (stick representations in dark wild willow) to visualize the interacting atoms. The right columns in Fig. 3B (e-h) show all the potential interactions between the lead compounds and the cofactor Mn2+ cations (cyan beads, dark blue dashes), the residues D443, E478, D498, and D549 (DEDD motif) interacting with the cofactor cations are also expanded to visualize the proximity (sky blue sticks) of the interactions, a summary table of all interactions between the lead compounds within the RNase H active site has been listed in Table 2 along with their distances.
3.5 Molecular dynamics simulation
The stability of the interaction between the four compounds and HIV-1 Reverse Transcriptase with the Inhibitor beta-Thujaplicinol Bound at the RNase H Active Site was systematically investigated by simulation of 100ns molecular dynamics in Fig. 4. The RMSD values allowed us to determine if the simulation had stabilized and if there were any significant conformational changes. We found that the RMSD values of the protein and ligand stabilized after about 20 ns of simulation, indicating that the system has reached equilibrium. We found that some residues had higher RMSF values than others, suggesting that certain protein regions are more resilient. These highly flexible regions could play a role in ligand binding to the protein. When we traced the ligand contacts, we found that the ligands interacted with different residues in the protein, and the specific interactions varied depending on the ligand. When we examined the secondary structure of the protein throughout the simulation, we found that its secondary structure remained constant throughout the simulation without any significant change in the distribution of the secondary structural elements. Finally, in the results of the behavior of counterions on the system, we found that counterions were present in the solvent medium, and their concentrations remained relatively constant throughout the simulation. Our findings show that ligands interact with different residues in the protein, and the specific interactions vary depending on the ligand. We also found that certain protein regions are more flexible than others. As shown in Fig. 5, during the molecular dynamics simulation of the four complexes, the structure of HIV-1 Reverse Transcriptase with the Inhibitor beta-Thujaplicinol Bound at the RNase H Active Site showed a stable trend after 20 ns, and for compound 4-protein, the fluctuation of the RMSD value was significantly higher than for the other three compound groups. As seen in Fig. 6, higher RMSF values in compound 4 mean that the ligand moves more in that region. In this way, it was revealed that the interactions of compound 4 on the protein were higher.
3.6 Binding free energy calculation via MM/PBSA
Using the single trajectory approach for MM/PBSA calculation, the 8 energy terms (〖∆E〗_elec, 〖∆E〗_vdw, 〖∆G〗_PB, 〖∆G〗_SA, 〖∆G〗_gas, 〖∆G〗_sol, 〖∆G〗_pol, 〖∆G〗_npol) were calculated for the R.T. enzyme, lead compound, and RT-lead complex separately from each production simulation, and their total sum was used in Eq. (4) to calculate the binding free energy 〖∆G〗_(bind/mmpbsa), the sums of each energy term are provided in Table 3 along with their standard deviations, the values of 〖∆G〗_mmpbsa indicate the spontaneity of the interaction between the R.T. and lead compounds (i.e. more negative = more spontaneous). Detailed values for each energy term for the protein, ligand, and complex are provided separately provided in Supplementary Data 7.
Table 3
MM/PBSA energy terms for 〖∆G〗_complex calculated for each RT-lead pair of HIV-1, energies were calculated from the last 10 ns of the production trajectory using the single trajectory approach.
\({\varDelta E}_{elec}\)
|
\({\varDelta E}_{vdw}\)
|
\({\varDelta G}_{PB}\)
|
\({\varDelta G}_{SA}\)
|
\({\varDelta G}_{Gas}\)
|
\({\varDelta G}_{sol}\)
|
\({\varDelta G}_{pol}\)
|
\({\varDelta G}_{npol}\)
|
\({\varDelta G}_{mmpbsa}\)
|
|
|
|
|
Compound1
|
|
|
|
|
-28.73 ± 3.76
|
-49.67 ± 3.40
|
41.65 ± 2.85
|
-5.10 ± 0.12
|
-78.40 ± 5.07
|
36.56 ± 2.81
|
12.92 ± 4.48
|
-54.77 ± 3.44
|
-41.84 ± 3.90
|
|
|
|
|
Compound2
|
|
|
|
|
-3.68 ± 3.15
|
-50.89 ± 2.67
|
30.76 ± 3.84
|
-5.38 ± 0.10
|
-54.58 ± 3.97
|
25.38 ± 3.83
|
27.08 ± 4.11
|
-56.28 ± 2.64
|
-29.20 ± 4.80
|
|
|
|
|
Compound3
|
|
|
|
|
-2.27 ± 1.23
|
-48.92 ± 2.66
|
13.21 ± 1.20
|
-4.90 ± 0.11
|
-51.20 ± 3.00
|
8.32 ± 1.18
|
10.94 ± 1.52
|
-53.818 ± 2.70
|
-42.88 ± 3.21
|
|
|
|
|
Compound4
|
|
|
|
|
-2.83 ± 1.95
|
-54.86 ± 2.56
|
27.58 ± 3.64
|
-5.48 ± 0.09
|
-57.69 ± 3.02
|
22.11 ± 3.62
|
24.76 ± 4.22
|
-60.33 ± 2.54
|
-35.58 ± 4.84
|
± indicates the standard deviations.
All values given energy values are for the difference between the complex and the sum of protein and ligand, \({\varDelta X}_{y}= {\varDelta X}_{y\left(complex\right)}-({\varDelta X}_{y\left(protein\right)}+{\varDelta X}_{y\left(ligand\right)})\).
All values are in kcal/mol unit.
|
3.7 Analysis of calculated physicochemical and ADMET properties
The physicochemical properties of the 4 lead compounds are listed in Table 4 along with their drug-likeness results, all 4 lead compounds passed the Lipinksi rule of 5, Ghose filters, Veber filter, Egan filter, and Muegge filter without any violations. The ADMET profiles of each lead compound are also summarized in Table 5 based on the results from admetSAR and ADMETlab.
Table 4
Physicochemical and drug-likeness properties of the lead compounds based on the SwissADME results.
Physicochemical properties
|
Compound 1
|
Compound 2
|
Compound 3
|
Compound 4
|
Molecular weight (g/mol)
|
388.38
|
368.34
|
352.29
|
384.47
|
No. heavy atoms
|
29
|
27
|
26
|
28
|
No. aromatic heavy atoms
|
21
|
16
|
15
|
6
|
No. rotatable bonds
|
3
|
0
|
0
|
0
|
No. H-bond acceptors
|
6
|
7
|
7
|
5
|
No. H-bond donors
|
4
|
4
|
1
|
1
|
Log S (ESOL)
|
-4.15
|
-4.31
|
-3.92
|
-4.68
|
Solubility (mg/mL)
|
2.72e-02
|
1.82e-02
|
4.24e-02
|
8.12e-03
|
Solubility class*
|
Moderately soluble
|
Moderately soluble
|
Soluble
|
Moderately soluble
|
Lipophilicity (Log Po/w)×
|
2.30
|
2.42
|
2.15
|
3.67
|
Lipinksi rule of 5#
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Ghose filters#
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Veber filters#
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Egan filters#
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Pass (0)
|
Muegge filters
|
Pass
|
Pass
|
Pass
|
Pass
|
*Based on the Log S (ESOL) scale, insoluble < -10 < poor < -6 < moderate < -4 < soluble < -2 < very < 0 < highly soluble.
× The values are average of iLOGP, XLOGP3, WLOGP, MLOGP, and SILICOS-IT.
ᶫ Based on the BOILED-Egg model [59].
# Numbers within parentheses indicate the number of violations of the respective filter/rule.
|
Table 5
ADMET profiles of the lead compounds as per results from admetSAR and ADMETlab.
ADMET properties
|
Compound 1
|
Compound 2
|
Compound 3
|
Compound 4
|
|
|
Absorption
|
|
|
Gastrointestinal absorptionᶫ
|
High
|
High
|
High
|
High
|
Blood-brain barrier permeationᶫ
|
None
|
None
|
None
|
Yes
|
|
|
Distribution
|
|
|
Plasma binding protein×
|
96.58%
|
98.23%
|
95.53%
|
94.06%
|
Fraction unbound in plasma
|
2.52%
|
3.30%
|
5.60%
|
7.13%
|
|
|
Metabolism
|
|
|
CYP450 2C9 Substrate
|
Non-substrate
|
Non-substrate
|
Non-substrate
|
Non-substrate
|
CYP450 2D6 Substrate
|
Non-substrate
|
Non-substrate
|
Non-substrate
|
Non-substrate
|
CYP450 3A4 Substrate
|
Non-substrate
|
Substrate
|
Non-substrate
|
Substrate
|
CYP450 1A2 Inhibitor
|
Inhibitor
|
Inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
CYP450 2C9 Inhibitor
|
Inhibitor
|
Inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
CYP450 2D6 Inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
CYP450 2C19 Inhibitor
|
Inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
CYP450 3A4 Inhibitor
|
Inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
Non-inhibitor
|
CYP Inhibitory Promiscuity
|
High
|
High
|
Low
|
Low
|
|
|
Excreation×
|
|
|
T1/2 (hours)¶
|
< 3 (0.349)
|
> 3 (0.52)
|
< 3 (0.113)
|
> 3 (0.70)
|
|
|
Toxicity
|
|
|
Rat acute (LD50, mol/kg)
|
2.43
|
2.38
|
2.41
|
2.71
|
TP* (pIGC50, ug/L)
|
0.46
|
1.04
|
0.57
|
1.38
|
Acute oral (LD50 mg/kg) ᶫ
|
III
|
III
|
III
|
III
|
Carcinogenicity
|
None
|
None
|
None
|
None
|
× Based on the predictions of the ADMETLab 2.0 [46].
¶ Probability of half-life being greater than 3 h is given within parentheses, below 0.5 was considered to have T1/2 < 3.
* Tetrahymena pyriformis toxicity.
ᶫ Class I ≤ 50 mg/kg, class II > 50 mg/kg, class III > 500 mg/kg, and class IV > 5000 mg/kg.
The SMILES notation of the lead compounds was used as the input to calculate each property (provided in Supplementary Data 8).
|
3.8 Additional docking studies
FIV also exhibits a similar DEDD motif (PDB: 5OVN) and was investigated using the same docking approach similar binding poses with an affinity of -9.0 kcal/mol was observed in detailed results included in Supplementary Data 9. Furthermore, docking of Phomoarcherin B with RNase H of Bacteriophage T4 (PDB: 1TFR) and monomeric reverse transcriptase of MLV (PDB: 4MH8) produced affinities of -8.1 kcal/mol and − 8.3 kcal/mol respectively, further indicating the potency of Phomoarcherin B as a potential antiviral RNase H candidate (detailed log files of the docking experiment are in Supplementary Data 10 and Supplementary Data 11, respectively).
The highest affinity was − 5.08 kcal/mol with conducted docking analyses using local computational analysis of the transmembrane domain of HIV-1 gp41 (PDB: 5JYN) and Phomoarcherin B. These results were also confirmed using SwissDock server; the highest value was − 6.36 kcal/mol. The highest number of elements involving affinity as the cluster was considered (detailed log files of the docking experiment are in Supplementary Data 12).
As well known, HIV infections have kept on claiming lives ever since its emergence. While modern antiviral and HAART therapies provide some relief and support for patients, they also have disadvantages and limitations. This study aimed to perform an extensive computational analysis to discover and evaluate potent novel inhibitors of HIV-1 replication within the host by targeting the most coherent target, providing therapeutic options for the patients while accelerating the drug development processes by providing potential leads.
The mutation rate for HIV-1 has been reported to be 10 − 4 to 10 − 2 mutants/clones, and with the estimated production of 109 virions/day within an infected individual,
the virus mutates efficiently to develop resistance and evade the immune system [60]. However, not all these mutants are expected to survive and replicate, as mutations occurring on some genes could be lethal. The most coherent target for drug discovery and development efforts would be the phenotypes that mutate less frequently, as their chance of developing resistance or evasion is lower than their highly mutating counterparts. All this information is useful for determining regions with low mutation rates within the HIV-1 genome. The comparative genomics approach considers the correlations and differences between the genotype (genome) of closely related species or even different variants of the same specie to answer the reasons behind their characteristic phenotypes. This method has also been widely used to discover resistance genes in several bacterial genomes [53, 54].
In this study, we used high-quality HIV-1 sequences from the Los Alamos database and performed whole-genome alignment with MAUVE to indicate the regions within the HIV-1. The genome shows the highest level of consensus among all the selected sequences; as visualized in Fig. 2, a genomic fragment of around 2.4 kb was the longest genomic fragment (the green bars on top of the sequences in Fig. 2) with the least variation among the selected sequences. BLASTx results have given the pol gene encoded 3 essential functional proteins of the HIV-1; the viral reverse transcriptase, integrase, and late-phase protease, which are major targets for drug targeting. The FDA has already approved NNRTI and NRTI with inhibitory functions on the DNA polymerase activity of the R.T. Therefore, our study has focused on discovering and analysing potential R.T. RNase H inhibitors, critical for viral replication due to their polymerase activity [55–58].
The computational analysis reported in this paper and the lead selection criteria applied, compound 4 is the best-performing lead compound, with a docking score of -8.5 kcal/mol, several hydrophobic, hydrogen bond, and ionic interactions with active site residues of the HIV-1 RNase H and the cofactor Mn2 + cations, less than 1 Å deviation from its initial docked pose throughout the 30 ns molecular dynamic simulation, binding free energy of ≈ -35.58 ± 4.84 kcal/mol and near-perfect scores on each ADMET profile. Further confirming its potential RNase H inhibitory activity, the R.T. enzyme's backbone RMSD also reached a plateau following the 20 ns time-lapse of the production simulation [59, 60].
Compound 4 (Phomoarcherin B) is a natural compound found in the endophytic fungus Phomopsis archeri, and it was first isolated and characterized as a pentacyclic aromatic sesquiterpene via spectroscopic analysis by Hemtasin et al. [61]. They were tested for antimalarial activity against Plasmodium falciparum and anticancer activities against cholangiocarcinoma cell lines. Bedi et al. (62) also stated the anticancer activity, whereas no in vitro or in vivo assay was performed regarding its antiviral or RNase H inhibitory activity. As known, major antiretroviral drugs cannot penetrate through the blood-brain barrier, which reduces their efficiency against HIV in the brain and results in the formation of reservoirs [62]. As shown in Table 5, Phomoarcherin B can penetrate the blood-brain barrier, increasing its drug-candidate potential. However, further in vitro assays and clinical trials are needed to confirm its pharmaceutical potential as an HIV-1 RNase H inhibitor.
We suggest that the additional docking studies of Phomoarcherin B with FIV and monomeric reverse transcriptase of MLV belong to the Retroviridae family and RNase H of Bacteriophage T4, which belongs to Myoviridae family, have also provided promising results. Moreover, assumed anti-RNase H activity will provide more knowledge on the effect of Phomoarcherin B with additional in vitro and in vivo studies (details provided in Supplementary Data 9, Supplementary Data 10, and Supplementary Data 11) (Fig. 5). However, we should stress that computational studies are not sufficient alone to show the inhibitory effect of our suggested candidate compound. Even the affinity of Phomoarcherin B to nucleic acid could be higher than the other three compounds over the active site. We also need some experimental studies and future works for validation.
The results showed the affinity of Phomoarcherin B for a critical target protein with the transmembrane domain of HIV-1 gp41 confirmed by docking server analysis (Supplementary Data 12) (Fig. 8a).