A novel peptide analogue of spike glycoprotein shows antiviral properties against SARS-CoV-2: An in silico approach through molecular docking, molecular dynamics simulation and MM-PB/GBSA calculations

At the very beginning of the new decade, the COVID-19 pandemic has badly hit modern human societies. SARS-CoV-2, the causative agent of COVID-19 carries dozens of new mutations in its genome. Herein, we made an effort to nd new antiviral peptides (AVPs) against SARS-CoV-2. Gladly, with the help of Machine Learning algorithms, and Supported Vector Machine, we have invented three new AVPs against the SARS-CoV-2. Antiviral peptides viz., Seq12, Seq12m, and Seq13m can block the receptor binding domain (RBD) of the SARS-CoV-2, necessary for communication with the angiotensin-converting enzyme 2 (ACE2). In addition, these AVPs retain their antiviral properties, even after the insertion of dozens of new mutations (Rosetta, and FoldX based) in the RBD. Further, Seq12, and Seq12m showed negligible cytotoxicity. Besides, the binding free energy calculated using MM-PB/GBSA method is also in agreement with the molecular docking studies performed using HADDOCK. Furthermore, the molecular interactions between AVPs and the viral membrane protein (M) also showed a thermodynamically favorable interaction, suggesting it could eventually inhibit the viral re-packaging process. In conclusion, this study suggests AVPs viz., Seq12, Seq12m, and Seq13m embrace importance as a potential anti-SARS-CoV-2 therapeutic. These AVPs could also aid virus diagnostic tools in the future. TYR543. Five polar uncharged amino acid residues three


Introduction
At present, the entire world is facing challenges in handing the COVID-19 pandemic [1]. Reports are coming out from different countries which suggest repurposing known antiviral drugs against COVID-19 could be fruitful [2][3][4]. However, none of them have reached a nal de nitive treatment for COVID-19. Now, multiple countries are in the race to nd a successful vaccine against COVID-19 [5]. But SARS-CoV-2, the causative agent of COVID-19, contains dozens of new mutations in its genome. And it is one of the main concerns for all anti-COVID-19 efforts [6]. Moreover, varieties of new regional strains are appearing in a very short time frame [7]. SARS-CoV-2 encodes a spike glycoprotein and for function, they form a trimer. The notorious entry of SARS-CoV-2 inside a human host is primarily mediated via a protein-protein interaction between angiotensin-converting enzyme 2 (ACE2), (host cell), and the receptor binding domain (RBD), located at the spike glycoprotein of the SARS-CoV-2. Therefore, RBD embraces importance as a potential pharmaceutical target.
Herein, we made an effort to nd out new antiviral peptides against SARS-CoV-2. In such an effort we utilized Machine Learning and Supported Vector Machine for antiviral peptide predictions. Thereafter, we opt for residue resistant molecular docking, molecular dynamics simulations, and MM-PB/GBSA analysis to characterize the antiviral properties. Best of our knowledge, this is the rst report of antiviral peptides against the SARS-CoV-2 derived from the RBD of the SARS-CoV-2.

Materials And Methods
Data sampling and analysis Nucleotide sequences of the SARS-CoV-2 were obtained from NCBI viruses ( ). The complete genome sequence of the SARS-CoV-2 strain Wuhan-Hu-1 (GenBank Sequence Accession: MN908947) was used as the query sequence. The viral spike glycoprotein (S-Protein), and membrane protein (M-Protein), (QHD43416.1) were studied using PSI-BLAST. Homology models of the M-and S-Protein were built using i-TASSER and SWISS-MODEL respectively [8,9] followed by structure validation using PROCHECK [10].

Motif search
The aligned sequences of spike protein resulted from the PSI-BLAST were used to discover motifs using the MEME suite [11,12]. The primary query included a set of 69 protein sequences, between 16
In silico mutant model of the receptor binding domain Crystal structures of the receptor-binding domain (6W41) of the SARS-CoV-2 was obtained from RCSB-PDB [23]. We manually select Chain C of the 6W41 as it is annotated as RBD for ACE2. The in silico multipoint mutant models were build using FireProt [24]. The combined mutant model was further re ned by manually (based on FoldX, and Rosetta). A detailed structural analysis of the mutant models was carried out using DALI [25], and UCSF Chimera [26].

Molecular dynamics (MD) simulation
MD-simulation studies were performed using GROMACS (v2020) with following modi cations [29]. The MD-simulation productions of the antiviral peptides were carried out for 100 ns [30]. A detailed method of this MD-simulation using GROMACS (v2020) is described in the supplementary section. Furthermore, MDsimulation studies of the antiviral peptides complex with the wild and mutant type RBD were performed using NAMD software [2]. Input les were generated using the web-based server CHARMM-GUI (http://www.charmm-gui.org/). All systems were neutralized with KCl at 0.15 Molar, using the Monte-Carlo ion placing method. Systems were solvated using TIP3 water model, and CHARMM36m force eld was used to assign charges. The Energy minimization was performed for 20,000 steps by steepest descent method followed by restrained 5 ns-equilibration at NVT ensemble and unrestrained 100 nsproduction at 310 K. RMSD, RMSF, Radius of gyration (Rg), H-bonds, and SASA were analyzed using VMD. MM-PBSA calculations were performed using the Calculation of Free Energy (CaFE) tools [31]. Last 10 ns (the most stable region) were selected from each system and subjected for the MM-PBSA calculations, Internal dielectrostatic was set to 4.0, and the external dielectrostatic was set to 80.0. The Reciprocal of grid spacing was set to 2.0. In addition, for MM-GBSA calculations all-atom MD simulations were performed using the AMBER 16 package with the FF99SB for the protein and peptide molecules [32].
The protein-peptide systems were solvated with the TIP3P water models and neutralized with Na + and Cl − ions using the tLEaP input script available from the AmberTools. Long-range electrostatic interactions were applied via the particle-mesh Ewald method [33]. The SHAKE algorithm was used to constrain the length of covalent bonds, including hydrogen atoms [34]. The Langevin thermostat was implemented to equilibrate the temperature of the systems at 300 K. A 2.0 fs time step was used for all simulations. The minimization and equilibration protocols were run for 100,000 steps and 200 ps. Finally, 100 ns of classical MD simulations with no constraints were performed for each of the protein-peptide complexes using the MM-GBSA approach. According to the standard protocol, we used the explicit solvation model for all MD simulations and later the implicit solvation as a postprocessing end-state method to calculate binding free energies of molecules in solution utilizing the Python script. Additionally, HawkDock was also used for MM-GBSA calculations per amino acid residues [35]. The results were analyzed using GraphPad Prism 6 (San Diego, CA, USA).

Characterization of immunogenic properties
Antiviral peptides were checked for their immunogenic properties using the IEDB epitope analysis tool which includes sequential B-cell epitopes [36], T-cell (MHC-II), NetCTL-1.2 [37] and, mapped within the predicted epitopes.

Results And Discussion
SARS-CoV-2 is a novel coronavirus strain and it is new to the scienti c communities. Therefore, in-depth monitoring of all aspects of the COVID-19 has been started. For example, signs and symptoms [38], mode of transmission [39], WHO-solidarity trials [40], contact tracing by mobiles apps such as "Arogya Setu" by India [41], CRISPR-Cas based rapid diagnostic of SARS-CoV-2 [42], and also monitoring daily cases by crowd-sourcing from https://www.covid19india.org/. Moreover, several reports are coming out as preprints and formal peer-review publications about repurposing known drugs [2] and antiviral drugs against the SARS-CoV-2 [43]. Reports suggesting possible anti-COVID-19 effects of different phytochemicals by in silico screening methods [4]. Indeed to address the urgent need for a safe and e cacious vaccine against the SARS-CoV-2 several vibrant initiatives have been started as never before. For example, vaccine manufacturing front-runner come-up with mRNA vaccines [44], viral vector vaccine [45], classical attenuated vaccine etc. [46]. However, reports show COVID-19 vaccines are not going to be a silver bullet for the immunization of a community. And, for the long-run alternatives to the traditional therapeutics would be necessary as before [47,48].
An antiviral peptide is one of such new alternatives [49]. And, antiviral peptides are successful to combat the SARS-CoV, and MERS-CoV [50,51]. Therefore, with a similar kind of anticipation for the SARS-CoV-2, we have invented three new antiviral peptides against the SARS-CoV-2 (Figure 1a-c). Initially, the peptide sequences were interpreted as an antiviral peptide using AVPred [14], and Meta-iAVP [13]. The amino acid sequences of the predicted antiviral peptides were then mapped within the spike glycoprotein of the SARS-CoV-2. Figure 1d shows that antiviral peptides namely, Seq12, Seq12m, and Seq13m are in fact analogous peptides of the spike glycoprotein. Fasta sequences of the antiviral peptides are available in the supplementary section, Figure S1.
AVPred is an antiviral peptide prediction server that is based on a few sequence features viz., motifs, alignment, amino acid composition, and physicochemical properties [14]. Finally, the prediction of the antiviral peptide is made during 5-fold cross-validation using Supported Vector Machine (SVM). Antiviral peptides were predicted by the AVPred (based on the physicochemical model) can achieve up to 85% prediction accuracy with 0.70 Matthew's Correlation Coe cient (MCC). However, the experimental validation dataset shows 86% prediction accuracy with 0.71 MCC.
Conversely, Meta-iAVP is based on a novel sequence-based meta-predictor with an effective features representation derived from Machine Learning Algorithms (MLA) and MLA types of features [13]. Interestingly, the use of MLA and MLA types features has increased the overall prediction accuracy, and MCC of 95.20%, and 0.90 respectively. At present more than 15 peptide-based drugs are in the pipeline of clinical trials [13].
The most worrying concern of an antiviral peptide is its immunogenic pro le [52]. A low immunogenic pro le is the desired characteristic of an antiviral peptide because a low immunogenic pro le reduces the chances of elimination by the host defense system [53]. Therefore, we also investigate the immunogenic pro le of the AVPs. Hopefully, none of the AVPs showed immunogenic properties to humans as a host. Furthermore, to intensify the immunogenic pro le of the AVPs we have performed epitope prediction using IEDB tools. Soon after, epitope prediction, AVP sequences were mapped within the aligned sequence of the epitopes (only most frequently human alleles were chosen). Results showed that none of the epitopes have signi cant similarities with the AVPs ( Figure S2). A few amino acid residues were apparently similar to the predicted epitopes. However, the binding postures (AVP-RBD) suggest these apparently similar amino acid residues were engaged with the interactions (AVP-RBD). Therefore, the probability of elimination by the host defense system is not considerable.
Furthermore, the AVPs in this present study are capable of inducing an anti-in ammatory cytokine, IL-4 [54]. But, they are not capable to induce IL-10 (a pro-in ammatory cytokine). In addition, AVPs were also non-allergenic as they do not have any known epitope for human IgE [20]. Other physicochemical properties viz., estimated half-life, instability index, water-solubility, theoretical IC 50 values, etc. were also summarized in Table 1. Antiviral peptides usually rupture the viral capsid, and eventually inhibit the viral replication cycles [66]. As mentioned earlier, the interactions between RBD-ACE2 is essential for viral entry inside the human host [67]. And recently, a research group led by Prof. Qiwei Zhang at the School of Public Health, Southern Medical University, China, has identi ed the molecular interactions between RBD-ACE2 [68]. Results suggest RBD includes six aromatic amino acid residues viz., TYR449, PHE456, PHE486, TYR489, TYR505, and TYR543. Five polar uncharged amino acid residues ASN487, GLN493, GLN498, THR500, and ASN501. And three non-polar aliphatic amino acid residues, LEU455, GLY496, and GLY502.
We utilized this information in molecular docking studies to zoom-in on the molecular interactions of the AVPs-RBD complexes. However, there is an increasing amount of fear of mutations in the RBD, that would probably make all anti-COVID-19 efforts nil [69]. Indeed, it is very true that if mutations incorporated into the RBD then therapeutics have to evolve accordingly. Therefore, we have constructed a few in silico mutant models of the receptor-binding domain (RBDm) of the SARS-CoV-2, such as energy mutant, evolutionary mutant, and combined mutant ( Figure 2). The amino acid substitutions were incorporated based on Rosetta and FoldX [24]. And the results are summarized in Table 2. The combined mutant model suggested by the FireProt was further re ned by the manual incorporations of amino acid residues based on the best energy substitutions suggested by FoldX, and Rosetta [24]. This combined approach has led to an increase in the free energy of the RBDm from -27.63 kcal/mol (17 mutations) to -32.67 kcal/mol. And among the important amino acid residues of the RBD [67] only ASN487 is conserved ( Figure S3). The structural analysis of the receptor-binding domains (RBD) of the SARS-CoV-2 has been summarized in Figure 2. The RRDis Map analysis shows that RBDm and RBD can be overlapped. Moreover, the RRDis Map showed that RBD and RBDm had an average distance of 51.74 (SD:1.486, a secreted site of the map), suggesting they are structurally different ( Figure S5). Moreover, the RBDb/c/e structurally distinguished from RBDm/RBD and they can not be overlapped with either RBD or RBDm. The RRDis map showed that the average distance of the RBDb/c/e was 28.884 (SD:0.032, a secreted site of the map), suggesting they are structurally close to each other. In addition, the correspondence analysis also suggests RBDb/c/e are structural neighbors and, RBD and RBDm are structurally distinguished from RBDb/c/e. The structural tree (Supplementary Figure S4)   Results from molecular docking studies showed that AVPs-RBD had a thermodynamically favorable interaction (Table 4). Moreover, the AVPs were engaged with nearly all-important amino acid residues of the RDB as mentioned in the earlier report [67] . Out of three antiviral peptides of this present study, Seq12 (47 mer) showed an astonishing HADDOCK score of -111.2 kcal/mol followed by Seq13m (81.4 kcal/mol), and Seq12m (76.8 kcal/mol) suggesting AVPs are well-docked with the RBD (Figure 3, S6). In addition, molecular docking studies between RBDm and the antiviral peptides (RBDm-AVP) summarized in Table 4. Results suggest Seq12, Seq12m, and Seq13m can retain the thermodynamically favorable binding properties with the mutant RBD. Moreover, Seq12 surprisingly showed an improved binding free energy (kcal/mol) in comparison to the wild type RBD (kcal/mol). Further, Seq12, and Seq12m have negligible cytotoxicity (0.17; while 1.0 or higher is considered as toxic and can rapture the human RBCs), (Table 1). These ndings combinedly justify the novelty of the AVPs, and its possible therapeutic use in the future. SARS-CoV-2 has a structural membrane protein called Protein-M [70]. Protein-M is a tri-pass transmembrane protein (Figure 4) which is essential for re-assembling viral structural units into a mature virus [71]. Therefore, Protein-M also embraces importance as a potential therapeutic target. Results from our investigation indicate AVPs are also participating in a thermodynamical favorable interactions with the TM-1 region followed by TM-2 and TM-3 respectively (Figure 4a). In addition, the molecular docking studies between the AVPs and the active site of the viral RNA-dependent RNA polymerase (RdRp) were not thermodynamically favorable interaction (Supplementary section). We speculate that the antiviral peptide Seq12, Seq12m, and Seq13m acts as an anti-SARS-CoV-2 peptide by two possible mechanisms.
Firstly, by inhibiting the RBD-ACE2 interaction. And secondly, by binding with M-protein followed by an eventual inhibition of the viral re-assembly/re-packaging.
Results, derived from MD-simulation of the studies of the AVPs are summarised in Figure 5. The root mean square deviation (RMSD) of alpha carbon atoms of all systems are analyzed to detect their stabilities. It is observed from Figure 6 that Seq12m, has the lowest RMSD value than Seq12 and Seq13m respectively. Antiviral peptides namely Seq12, Se12m, and Seq13m have ≈ 0.274 nm, ≈ 0.274 nm, and ≈ 0.286 nm of RMSD values at 1 ns. The highest RMSD values are respectively ≈ 0.852 nm, ≈ 0.908 nm, and ≈ 1.074 nm for Seq12, Seq12m, and Seq13m. A closer look at the RMSD plot suggests Seq12m has the highest uctuation from 5 ns -10 ns and gets stabilized throughout the trajectory. However, at the start, RMSD-uctuation is lower in the case of Seq12 and the highest RMSD-uctuation is observed between 30 ns -40 ns. Quite similarly, the RMSD-uctuation of Seq13m is highest between 20 ns -40 ns. Root means square uctuation (RMSF) helps to understand the exibility of each amino acid residue [72].
Seq13m is found to have the highest RMSF from the 5 th to 7 th and 25 th to 30 th position of the amino acid sequence.
The lower degree of uctuation with its consistency through the simulation indicates the greater compactness and rigidity of a system [72]. Rg of Seq13m is uctuated from 10 ns -40 ns and reaches the highest value of 1.331 at ≈ 25 ns. However, Rg values of Seq12 and Seq12m are very similar at the beginning and remain same up to 10 ns. Further, Rg uctuations are prominent after 10 ns and sustain such dissimilarities throughout the trajectory. The lowest Rg values are ≈ 1.041 nm (50 ns), ≈ 1.380 nm ( before 10 ns), and ≈ 0.944 nm ( before 10 ns) respectively for Seq12, Seq12m, and Seq13m. In summary systems i.e., Seq12 and Seq12m showed better stability throughout the complete trajectory. The number of inter water-peptide hydrogen bonds in the simulated systems were also compared. The highest number of hydrogen bonds are formed respectively by Seq12 (40 -45 ns), Seq12m (15 ns), and Seq13m (before 5 ns).
In addition, MD-simulation studies of the AVP-RBD complex systems (antiviral peptide + RBD/RBDm) of the 100 ns MD production for piror to the MM-PBSA analyses were summarised in Figure 6. RMSD, and RMSF were stable throughout all MD-trajectory and were most stable in the case of Seq12-RBD/RBDm, and Seq12m-RBD/RBDm complex. However, RMSD values uctuates in the case of Seq13m-RBD/RBDm complex but the these values were within acceptable range of 3nm suggesting extension of the simulation time scale is not required. RMSF value is a measures of the avarage deviations particular atoms or group of atoms from the initial reference structure [73]. Results showed that, in case of the Seq13m+RBDm, RMSF uctuation was high from about 138 th -158 th position and last twenty ve residues of the RBD. Similarly, in the case of Seq13m+RBD the RMSF value highly uctuates from about 200 th to rest of the residues of the RBD. However, only two important interacting amino acid residues namely, PHE486 and TYR489 was remain within the utuating regions ( Figure 6b). Besides, the ΔG bind of Seq13m+RBDm and Seq13+RBD were also very good in comparison to the other RBD-AVP complex systmes (Table 5)  The results obtained from MM-PB/GBSA analysis are also in agreement with the molecular docking studies (Table 5). Moreover, the binding free energies (MM-GBSA) per amino acid residues (only the top twenty participating amino acid residues) are summarized in Figure 7. It shows that TYR489 was an important amino acid residue of the RBD because TYR489 was a common contributor to the lowest binding free energies of Seq12-RBD (-9.30 kcal/mol), Seq12m-RBD (-6.55kcal/mol), and Seq13m-RBD (-4.91kcal/mol). On the other hand, GLU24 (-9.20 kcal/mol) of Seq12, THR20 (-3.36kcal/mol) of Seq12m, and PRO25 (-6.56 kcal/mol) of Seq13m were critically important for the respective antiviral peptides for the same reasons. Further, GLN498 (-0.22 kcal/mol), and GLN493 (-0.28 kcal/mol) were poor contributors for the antiviral peptide-RBD (AVP-RBD) interactions respectively for Seq12 and Seq12m. However, GLY496 of the RBD poorly contributes to the Seq13m-RBD interaction.
Conversely, Seq12-RBDm, Seq12m-RBDm, and Seq13m-RBDm interactions reveal a different picture (Figure 7). For example, amino acid residues of mutant RBD namely, TYR501 (-6.32 kcal/mol), MET494 (-5.11 kcal/mol), and TYR-450 (-5.24 kcal/mol) contributes lowest binding free energy changes while interacting with Seq12, Seq12m, and Seq13m respectively. It is indeed, noticeable that in all cases (wild and mutant RBD) tyrosine contributes the lowest binding free energy irrespective of its positions in the RBD. The biochemical and biophysical properties of tyrosine residue enable itself for this phenomenon.
However, tyrosine has not participated in the top twenty amino acid residues in the case of Seq12m-RBDm. On the contrary, LEU493 was a poor contributor to the binding free energies of Seq12-/ Seq12m-RBDm interactions. However, ASN449 was another poor contributor in the case of Seq13m-RBDm interaction. Overall, the molecular details of AVP-RBD/RBDm interactions calculated using MM-GBSA provide evidence for the fact that AVPs were occupied the important amino acid residues of the RBD which are critical for the RBD-ACE2 interactions [67].  (Table S1).

Conclusion
In summary, peptide analogs of spike glycoprotein namely, Seq12, Seq12m, and Seq13m showed antiviral properties against the SARS-CoV-2. Molecular docking suggests Seq12, Seq12m, and Seq13m can inhibit RBD-ACE2 interactions. In addition, these antiviral peptides can retain their antiviral properties even after the insertion of dozens of new mutations in the RBD. Furthermore, the AVPs can interfere with viral membrane protein M, and it could eventually inhibit viral re-packaging cycles. MD-simulation studies of the AVP-RBD/RBDm complex of Seq12 and Seq12m showed stable RMSD and RMSF throughout the complete MD-trajectory suggesting a stable interaction. Moreover, the binding free energy calculations using MM-PB/GBSA are also in agreement with the molecular docking studies. Furthermore, AVPs namely, Seq12, and Seq12m showed negligible cytotoxicity and they are non-allergenic for humans. The AVPs in this present study might be a potential therapeutic against the SARS-CoV-2. However, more studies are required for clinical or diagnostic use.

Supplementary Section
Detailed MD-simulation method. Table S1. Binding free energy (ΔG bind ) of different AVP-RBD complex systems calculated using MM-GB/PBSA calculated for 50 ns *, # Figure S1. Fasta sequence of antiviral peptides. Figure S2. Predicted epitopes for major histocompatibility complex class-II (MHC-II) against the spike glycoprotein. Antiviral peptides (black frame) were mapped within predicted epitopes and presented respectively as Seq12, Seq12m, and Se13m (upwards to downwards).