3.1. Multiple sequence alignment of complete genomes
The complete genome of coronavirus SARS-CoV-2, Bat coronavirus RaTG13, Pangolin coronavirus and SARS-CoV obtained as blast hits were aligned using Clustal Omega tool available in EBI tools and the phylogenetic tree was constructed (Figure 1).
The MSA has demonstrated the molecular similarities between the organisms. RaTG13 is identified as a neighbour genome for SARS-CoV-2 and justified the hypothesis that, the infection may be transfected from Bats. Meanwhile, the subsequent neighbours were Pangolin MP789 and SARS-CoV. This preliminary sequence alignment has enabled the understanding of sequence similarities and evolutionary information which is very much fundamental in the process of drug discovery.
3.2. Topological analysis and target validation
A thorough analysis of virus-host interactomes may reveal insights into viral infection and pathogenic strategies. Due to lac of interaction data on SARS-CoV-2, the significant identity between SARS-CoV and SARS-CoV-2 proteomes was considered and the SARS-CoV-Human interactome was built by screening domain interactions between SARS-CoV-Human protein-protein interactions (PPIs), and then the network distribution, topological and function analysis was performed (Figure 2). The circular shapes corresponds to proteins (Nodes) which are labeled by Uniprot_ID and the details on nodes are listed in Table 1.
Among 14 proteins on SARS-CoV, major proportion of SARS-CoV-Human interaction is reserved by 5 non-structural proteins (NS3B, NS6, NS7A, NS7B and NS8A with 4, 3, 8, 3 and 1 human proteins correspondingly), 3 Open Reading Frame (ORF) poly proteins (ORF9B, A7J8L3 and A7J8L2 with 4, 1 and 1 human proteins correspondingly), 2 Replicase proteins (R1A and R1AB with 2 and 1 human proteins correspondingly), Membrane protein (VME1 with human IKKB), Envelop membrane protein (VEMP with human B2CL1), Nucleoprotein (NCAP with 4 human proteins) and Spike glycoprotein (SPIKE with human ACE2). With this observations, we can find the higher specificity of Membrane, Envelop and Spike glycoprotein to interact with host through specific entry points. Hence, these three SARS-CoV proteins can be a potential targets to inhibit the pathogen-host interaction specifically while other interactions were versatile. To ensure the impact of inhibition of IKKB, B2CL1 and ACE2 mediated interaction, the landscape of SARS-CoV-Human interaction was further analysed for degree and betweenness centrality distributions of the host as shown in Table 2.
Degree measures how many neighbours a node direct connect to while betweenness measures how often nodes occur on the shortest paths between other nodes. In the PPIs network the nodes with high degree defined as hub protein and the nodes with high betweenness defined as bottleneck protein, both are key or important proteins. In the current topological analysis, the node with low degree of distribution 1 and betweenness centrality 0.0 was A2A3R6. But, the molecular function of A2A3R6 (Uniprot ID: A2A3R6) is not well understood in both human physiology or pathology. Hence, the second node ACE2 with degree of distribution 4 and betweenness centrality 2271 was next significant node as shown in Figure 3 and it is identified as a key node or key player in the SARS-CoV-host interaction. Hence, the SARS-CoV Spike protein interacting with human_ACE2 is identified as a potential drug target. As the information over SARS-CoV-2-Human interaction is not available, the SARS-CoV-Human interaction data can be used. We studied the similarity between SARS-CoV and SARS-CoV-2 by sequence analysis and RBD prediction.
3.3 Sequence analysis and RBD predictions
As depicted in Figure 4, the alignment between S-protein from SARS-CoV-2 and Bat coronavirus RaTG13 was seen closer when compare with SARS- CoV. The alignment at RBD site from 317 residue to 569 found more than 80 percent similar to SARS-CoV and RaTG13 particularly, at major residues, Tyr436, Thr486, Gly488 and Tyr491 except at Arg426 and Pro462 as shown in Figure 5. Considering the evolution, the available crystal structure of SARS- CoV in complex with ACE2 (PDB ID: 6CS2) was used as templet for homology modelling.
The residues involved in interaction of SARS-CoV with ACE2 were predicted using prime module available in Schrodinger small molecule suit and the major interactions are tabulated in table 3 and shown in Figure 6. Very strong interaction was seen between a smallest amino acid Gly488 with Lys353, Gly354 and Asp355 facilitated by 1 hydrogen bond and 94.9% of buried SASA. Remaining other residues have also shown significantly strong interactions with ACE2. Hence, the same residues were made centric to generate the grid.
3.4. Homology modelling and validation of SARS-CoV-2 S-Protein
The modelling of SARS-CoV-2 was performed using the crystal structure of SARS-CoV spike glycoprotein as template, which was 97% identical to the query. The modelled protein shown in Figure 7A was validated for quality and prepared for molecular docking studies using protein preparation wizard available in Schrodinger small molecule suite. The Ramachandran plot generated using protein preparation wizard has confirmed the quality of modelled structure by plotting >95% residues in allowed region, as shown in Figure 7B.
3.5. Molecular docking of small molecules with SARS-CoV-2 S
The repurposing of small molecules for the treatment of COVID-19 requires the knowledge on interaction of therapeutic with SARS-CoV-2 S. Initial HTVS screening suggested 142 molecules with reasonable interaction with SARS-CoV-2 S and further the shortlisted molecules were docked in SP mode where the accuracy of prediction is improved. The docking in SP mode has suggested 15 top molecules listed in table 4 as lead molecules. As hydroxychloroquine is identified to treat COVID-19, it was also subjected for subsequent docking in XP mode. All 15 molecules showed better interaction than hydroxychloroquine with SARS-CoV-2 S. The three molecules Streptomycin, Ciprofloxacin and Glycyrrhizic acid, with low interaction penalties and displaying better interactions with ACE2 binding site on RBD of SARS-CoV-2 S as shown in Figure 8A-C respectively. The three molecules were selected based on their reported anti-viral activity, safety, availability and affordability.
For SARS-CoV-2 S, from glide generated docking model, streptomycin could bind to the receptor, and the binding mode was highly similar to that of ACE2 interaction. The binding pocket of streptomycin was in the RBD site which was observed as an acceptor for ACE2. The streptomycin was well fitted with the shape of the pocket as shown in Figure 8A and 9A with XP score -6.5, where it formed total five hydrogen bonds, among them two hydrogen bonds were seen by donating electrons from N31 and N32 atoms to the Glu493 side chain atoms, simultaneously other two hydrogen bonds were observed between backbone atoms of Leu501 by receiving electrons from hydroxyl groups at 5th and 6th carbon atom of S1 six carbon ring on streptomycin. Remaining one hydrogen bond was formed between backbone atom of Ser503 and hydroxyl group at 6th carbon atom at G3 group of streptomycin. However, the stability of the interaction cannot be pronounced without molecular dynamic simulations.
The docking model of ciprofloxacin illustrated its binding mode on RBD site which was observed as a key site to interfere the virus-host interaction. The ciprofloxacin found reasonably fitted with steric complementarity on RBD pocket as shown in Figure 8B and 9B with XP score -5.31. The interaction of ciprofloxacin with SARS-CoV-2 S was facilitated by two hydrogen bonds with Val492 and Phe499 each by receiving and donating the electrons respectively from hydroxyl and ketone groups.
The docking model of Glycyrrhizic acid illustrated its binding mode on RBD site which was observed as a key site to interfere the virus-host interaction. The Glycyrrhizic acid fitted with steric complementarity on RBD pocket as shown in Figure 8C and 9C with XP score -7.474. The interaction of Glycyrrhizic acid with SARS-CoV-2 S was facilitated by three hydrogen bonds with Leu464, Val492 and Glu493 by receiving electrons from hydroxyl groups of Glycyrrhizic acid. Additionally, the ketone group of Glycyrrhizic acid has formed hydrogen bond with backbone atoms of Phe499 receiving the electrons.
3.6. Molecular dynamics simulation of protein ligand complex
As the receptor SARS-CoV-2 S has 1273aa, it requires enormous computational time to perform MD simulation for whole range of protein, hence, we confined this study only to the RBD portion from 317th residue to 569th residue was trimmed for MD Simulation.
3.6.1. Root Mean Square Deviation (RMSD) analysis of protein ligand complex
The RMSD is used to measure the average change in displacement of a selection of atoms for a particular frame with respect to a reference frame. It is calculated for all frames in the trajectory. The plots in Figure 10 shows the RMSD evolution of a protein (left Y-axis). All protein frames are first aligned on the reference frame backbone, and then the RMSD is calculated based on the atom selection. Monitoring the RMSD of the protein can give insights into its structural conformation throughout the simulation. RMSD analysis can indicate if the simulation has equilibrated its fluctuations towards the end of the simulation are around some thermal average structure. Changes of the order of 1-3 Å are perfectly acceptable for small, globular proteins. Changes much larger than that, however, indicate that the protein is undergoing a large conformational change during the simulation.
The RMSD plot for Streptomycin/SARS-CoV-2 S complex shown in Figure 10A has attained the equilibrium at 5ns and there after shown a stability with maximum RMSD of 1 Å (2.5Å-3. Å) upto 55ns. After 55ns the change in equilibrium state was observed, however the RMSD was seen with in 1.5 Å which is acceptable. Similarly, the Ligand RMSD (right Y-axis) indicates how stable the ligand is with respect to the protein and its binding pocket. The RMSD values for Streptomycin were observed significantly larger than the RMSD of the SARS-CoV-2 S at RBD domain, then it is likely that the Streptomycin is likely to diffuse away from its initial binding site after 48ns.
The RMSD plot for Ciprofloxacin/SARS-CoV-2 S complex shown in Figure 10B has attained the equilibrium at 2ns and there after shown a stability with maximum RMSD of 1.8 Å (2.4Å-4.2Å) upto 58ns. After 58ns the sudden change in equilibrium state was observed, however the RMSD was seen with in 2 Å which is acceptable. On the other hand, the RMSD values for Ciprofloxacin were observed significantly in align with the RMSD of the SARS-CoV-2 S at RBD domain, then it is likely that the Ciprofloxacin can retain in its initial binding site upto 100ns.
The RMSD plot for Glycyrrhizic acid/SARS-CoV-2 S complex shown in Figure 10C has attained the equilibrium till 100ns. Compared to SARS-CoV-2 S in complex with Glycyrrhizic acid to SARS-CoV-2 S in complex with Streptomycin and Ciprofloxacin it found stable till the end of the simulation without any drift in equilibrium. On the other hand, the RMSD values for Glycyrrhizic acid were observed significantly in align with the RMSD of the SARS-CoV-2 S at RBD domain in almost all the frames, Hence it is likely to retain in its initial binding site upto 100ns and predicted to inhibit SARS-CoV-2 S at RBD domain comparatively better than Streptomycin and Ciprofloxacin for long duration but its contact with key ligands has to be confirmed through RMSF and protein ligand contact analysis.
3.6.2. Root Mean Square Fluctuations (RMSF) and secondary structure elements
The RMSF is useful for characterising local changes along the protein chain. In RMSF plot, peaks indicate areas of the protein that fluctuate the most during the simulation. Typically, the tails fluctuate more than any other part of the protein. Secondary structure elements like alpha helices and beta strands are usually more rigid than the unstructured part of the protein, and thus fluctuate less than the loop regions. The secondary structure of RBD on SARS-CoV-2 S has the same secondary structure elements on RBD from SARS_CoV with 74% homologous residues. However, 2019-CoV has a distinct loop with flexible glycyl residues replacing rigid prolyl residues in SARS-CoV. Molecular modelling revealed that 2019-CoV RBD has a stronger interaction with angiotensin converting enzyme 2 (ACE2). A unique phenylalanine Phe486 in the flexible loop likely plays a major role because its penetration into a deep hydrophobic pocket in ACE2. In the trimmed RBD structure this loop starts from 148th residue and ends at 172nd residue. Since the ligand binding site is located at this loop region higher RMSD was noticed. The protein residues that interact with the ligand are marked with green-colored vertical bars on RMSF plots. In the RMSF plot for RBD domain of Streptomycin/SARS-CoV-2 S complex shown in Figure 11A, the RMSF at loop region was 5.6Å with very high ligand contacts (green-colored vertical bars). This was in par with molecular docking interactions. In the RMSF plot for RBD domain of Ciprofloxacin/SARS-CoV-2 S complex shown in Figure 11B, the RMSF at loop region was 5.6Å with a few ligand contacts (green-colored vertical bars) and justifies the interactions seen in molecular docking. Further, In the RMSF plot for RBD domain of Glycyrrhizic acid/SARS-CoV-2 S complex shown in Figure 11C, the RMSF at loop region was 6.3Å with good number of ligand contacts (green-colored vertical bars) and justifies the interactions seen in molecular docking. Though the ligand contacts are seen in interactions, their simulation time coverage determines their stability.
3.6.3. Protein-ligand contacts
Protein interactions with the ligand can be monitored throughout the simulation. These interactions can be categorised by type and summarised, as shown in the plots given in Figure 12A-12C. Protein-ligand interactions are categorised into four types: Hydrogen Bonds, Hydrophobic, Ionic and Water Bridges. Each interaction type contains more specific subtypes, which can be explored through the 'Simulation Interactions Diagram' panel. The stacked bar charts are normalised over the course of the trajectory: for example, a value of 0.7 suggests that 70% of the simulation time the specific interaction is maintained. Values over 1.0 are possible as some protein residue may make multiple contacts of same subtype with the ligand.
In the protein ligand contact plot for Streptomycin/SARS-CoV-2 S complex shown in Figure 12A, residues Glu493 and Lys544 were seen with the maximum interactions fraction i.e 0.20 facilitated by hydrogen bonds and water bridges, which suggest that 20% of the simulation time the specific interaction is maintained and such a interactions are not promising. Hence, Streptomycin cannot be a potential inhibitor of SARS-CoV-2 S to offer anti-COVID19 activity.
In the protein ligand contact plot for Ciprofloxacin/SARS-CoV-2 S complex shown in Figure 12B, residues Phe465, Tyr482, Tyr498 and Phe499 were seen with the interactions fractions 0.75, 0.6, 0.35 and 0.39 respectively facilitated by hydrophobic, hydrogen bonds and water bridges, which suggest that 70%,60%, 35% and 39% of the simulation time the specific interaction is maintained by respective residues and such a interactions are good. Hence, Ciprofloxacin may be a potential inhibitor of SARS-CoV-2 S to offer anti-COVID19 activity.
In the protein ligand contact plot for Glycyrrhizic acid/SARS-CoV-2 S complex shown in Figure 12C, residues Val492, Glu493, Asn496, Cys497 and Phe499 were seen with the interactions fractions 0.78, 1.12, 0.80, 0.60 and 0.80 respectively facilitated by hydrophobic, hydrogen bonds and water bridges, which suggest that 78%,100%, 80%, 60% and 80% of the simulation time the specific interaction is maintained by respective residues and such a interactions are excellent and promising. Hence, Glycyrrhizic acid can be a potential inhibitor of SARS-CoV-2 S to offer anti-COVID19 activity.