Emerging Strains of SARS-CoV-2 and Their Inhibition by the Use of Phytochemicals: An In-Silico Analysis

The recently identified coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a single-stranded positive-sense RNA virus with a genome of 29.9 kb encoding 14 open reading frames (ORFs) and 27 structural and non-structural proteins. Among the structural proteins, the trimeric spike glycoprotein mediates entry of the SARS-CoV-2 genome into host cells by binding with high affinity to human angiotensin-converting enzyme 2 (ACE2) receptors on the cell surface. The spike protein is therefore considered a prime target for the development of drugs against COVID-19. Viruses can mutate, and SARS-CoV-2 is no exception. Since the first whole genome of SARS-CoV-2 was published in February 2020, at least 4400 amino acid substitutions and several thousand mutations have been identified. To date, more than 3500 new variants of SARS-CoV-2 have been sequenced, with a high spreading and infectivity rate that makes the virus more contagious. These new variants have spread to several countries, including the United States (US), United Kingdom (UK), Brazil, South Africa, and India. Herein, we analysed the new SARS-CoV-2 strains, constructed 3D homology models of the Brazil P.1 and Indian B.1.617 variants, and screened them against 100 phytochemicals with previously identified antiviral activity. Our study revealed that the top three phytochemicals for each of the new strains might serve as potential anti-SARS-CoV-2 agents for further drug discovery and development to tackle COVID-19.


Introduction
The recently identified coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a single-stranded positive-sense RNA virus whose genomic organization is similar (99.8% sequence identity) to that of the genes of the previously identified SARS-CoV (1). Recent studies showed that the SARS-CoV-2 genome is approximately 29.9 kb in size and encodes 14 open reading frames (ORFs) and 27 structural and non-structural proteins. The structural proteins, namely the spike glycoprotein (S), envelope protein (E), nucleocapsid protein (N), and matrix protein (M), are noted to have a key role in binding to host receptors. The non-structural proteins, such as papain-like protease (nsp3), main protease (nsp5), RNA-dependent RNA polymerase (nsp12), helicase (nsp13), and 2′-O-methyltransferase (nsp16), are involved in processes ranging from transcription and replication to pathogenesis during the life cycle of SARS-CoV-2 (2). Among them, the spike protein (S-protein), which is trimeric in shape, is responsible for the entry of the SARS-CoV-2 genome into host cells by interacting with high affinity with human angiotensin-converting enzyme 2 (ACE2) receptors present on the cell surface. The S-protein has two subunits: the N-terminal S1 domain and the membrane-proximal S2 domain. The S1 domain recognizes the cell surface ACE2 receptor through the receptor-binding domain (RBD), while the S2 domain mediates the fusion of viral particles with the host cell membrane. The RBD and other binding domains of the S-protein are known targets of the neutralizing antibodies formed against SARS-CoV-2 (3). As a result, the S-protein and the human ACE2 receptor can be considered prime targets for the development of drugs against COVID-19. Notably, most of the vaccines that are in clinical trials or in use against SARS-CoV-2 infection are based on S-protein sequences.
Among the anti-SARS-CoV-2 vaccines, several candidates use the RBD site as the sole antigen. Viruses can mutate, and SARS-CoV-2 is no exception. Since the first whole genome of SARS-CoV-2 was published in February 2020, at least 4400 amino acid substitutions and several thousand mutations have been identified in the virus (4). Among them, a total of 725 mutations have occurred in the S-protein, 89 of them in the RBD segment, suggesting that the RBD is a site more prone to mutation (5). The S-protein mutation D614G, a single amino acid substitution at position 614 from aspartic acid (D) to glycine (G), appeared prominently in many places around the world early in the pandemic. This mutation has been considered potentially more transmissible and infective (6). Recent data provided some evidence that the D614G mutation is associated with a higher load of viral nucleic acid in the upper respiratory tract. Although this mutation is potentially more transmissible and infective, the disease severity remains low (7). As a consequence of numerous mutations, more than 3500 new variants of SARS-CoV-2 have been sequenced, with a high spreading and infectivity rate that makes the virus more contagious. As of year-end 2020, these new variants had spread to several countries, including the United States (US), United Kingdom (UK), Brazil, and South Africa. A highly mutated variant (20I/501Y.V1 or lineage B.1.1.7) was identified in early September 2020 in Eastern England and is hence also called the UK variant. Early evidence showed that this variant may be associated with increased transmissibility and a higher risk of death compared with wild-type variants (8). In Nelson Mandela Bay, South Africa, another variant (20H/501Y.V2 or B.1.351) with multiple mutations was also identified.
Preliminary studies suggested that it has a low impact on disease severity but may affect the efficacy of vaccines because of the E484K mutation in the spike protein (9). A Brazil variant (known as lineage P.1), first detected in four travellers from Brazil at an airport in Japan, is receiving more attention because it may allow immune escape, as evidenced in a report. In October 2020, a new Indian variant, B.1.617, was detected in Maharashtra state, and in April 2021 the number of new infections rose rapidly and in an unprecedented manner because of double mutations in the spike glycoprotein, as predicted by a preliminary study. This variant has already spread to other countries, such as Singapore, Australia, Germany, Belgium, the United Kingdom, and the United States, and some experts are deeply concerned that it may turn into a super-spreader mutant that will continue to spread to other countries of the world. Health workers are uncertain whether the Indian B. Current COVID-19 vaccines are generally based on the spike protein of SARS-CoV-2, so for emerging strains with mutations in the spike protein it is a matter of concern how efficacious these vaccines remain compared with their efficacy against wild-type SARS-CoV-2, because the new strains seem to be more transmissible and deadlier. To date, at least 308 vaccine candidates are in various clinical stages of development, including 24 in Phase I trials, 33 in Phase II trials, and 16 in Phase III development. Most of them showed efficacy as high as 95% in preventing symptomatic COVID-19 infections. As of April 2021, regulatory authorities in different countries had approved 13 vaccines for public use. Depending on their mode of action, these vaccines are divided into four groups: two RNA vaccines, five conventional inactivated vaccines, four viral vector vaccines, and two protein subunit vaccines (11).
Medicinal plants and herbs employed in traditional medicine have attracted significant attention because they contain bioactive molecules that may act as therapeutic agents for the prevention and treatment of several diseases with no or minimal side effects (12). In the last few months, several studies, including our own, have used computational tools to show the promising role of natural products or phytochemicals in inhibiting SARS-CoV-2 by targeting its structural and non-structural proteins, including the spike protein (13)(14)(15)(16). Therefore, the aim of the present study was to gain insight into the new strains of SARS-CoV-2 and their inhibition by phytochemicals from different medicinal plants and herbs for the prevention of COVID-19 by using computational tools.

Data Collection
All the data related to mutations that gave rise to new strains of SARS-CoV-2 were searched and collected online.
The experimental 3D structures of the SARS-CoV-2 spike glycoprotein of the Brazil strain P.1 and the Indian variant B.1.617 were unavailable in the RCSB Protein Data Bank (https://www.rcsb.org/). Therefore, their 3D structures were modelled with the SWISS-MODEL server (https://swissmodel.expasy.org/). The target sequences of the two strains, GenBank QRX39425.1 (https://www.ncbi.nlm.nih.gov/protein/QRX39425) and QUA70603.1 (https://www.ncbi.nlm.nih.gov/protein/QUA70603.1), were retrieved in FASTA format from the NCBI website (https://www.ncbi.nlm.nih.gov/). The amino acid sequences of the targeted strains were then submitted to the SWISS-MODEL server to build their 3D models (17).
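As a minimal sketch of how a sequence retrieved in FASTA format can be read before submission to a modelling server, the following Python function parses FASTA text into header and sequence pairs. The function name `parse_fasta` and the example record are hypothetical; the residues shown are arbitrary placeholders, not the QRX39425.1 or QUA70603.1 sequences.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Placeholder record (arbitrary residues, for illustration only).
example = """>example_protein hypothetical record
MKTAYIAKQR
QISFVKSHFS
"""
records = parse_fasta(example)
print(records)
```

The wrapped sequence lines are joined into a single string per record, which is the form usually pasted into a web form or passed to downstream tools.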
Template selection
BLAST and HHblits were used to search for template structures in the SWISS-MODEL template library. Templates for the target sequences of the Brazil strain P.1 and the Indian variant B.1.617 were selected on the basis of the coverage of the target sequence by the template, sequence identity (similarity between them), and the global model quality estimation (GMQE). The PDB entries 6ZWV and 7KRS were used as templates for the Brazil strain P.1 and Indian B.1.617 spike glycoprotein sequences, respectively. GMQE ranges from 0 to 1, with higher values indicating higher expected model accuracy (18, 19).
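The coverage and identity criteria can be illustrated with a short Python sketch that computes both values from a pairwise alignment. The definitions used here (identity over aligned residue pairs, coverage as the fraction of target residues aligned to a template residue) are common conventions and are an assumption; they are not necessarily the exact definitions used by SWISS-MODEL. The function name and the toy alignment are hypothetical.

```python
def identity_and_coverage(target_aln, template_aln):
    """Return (percent identity, percent coverage) for two aligned sequences.

    Both strings must have equal length and use '-' for gaps.
    Identity: identical residues / columns where both sequences have a residue.
    Coverage: target residues aligned to a template residue / target length.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = target_len = 0
    for t, m in zip(target_aln, template_aln):
        if t != "-":
            target_len += 1
        if t != "-" and m != "-":
            aligned += 1
            if t == m:
                identical += 1
    if target_len == 0 or aligned == 0:
        raise ValueError("alignment contains no aligned residue pairs")
    return 100.0 * identical / aligned, 100.0 * aligned / target_len


# Toy alignment (arbitrary residues, for illustration only).
identity, coverage = identity_and_coverage("MKT-AYIAKQ", "MKTLAY-ARQ")
print(f"identity = {identity:.1f}%, coverage = {coverage:.1f}%")
```

In this toy case, 8 of the 9 target residues are aligned and 7 of those 8 pairs are identical.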

Protein structure validation
The quality of the predicted models of the Brazil strain P.1 and the Indian variant B.1.617 was analysed with the PROCHECK Ramachandran plot. Ramachandran statistics were used to determine the proportion of amino acids in the favoured, allowed, and disallowed regions. The overall quality score of the generated models was obtained from the ERRAT server, available in SAVES v6.0. The root-mean-square deviation (RMSD) values between the template and modelled structures were obtained by aligning them in PyMOL.
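For reference, the RMSD itself is a simple quantity once two structures are superposed. The sketch below computes it in Python for two equally sized lists of matched atom coordinates. It assumes the coordinates are already superposed and paired one to one; it does not reproduce the sequence alignment, superposition, or outlier rejection that PyMOL performs. The function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two lists of (x, y, z) tuples of matched atoms.

    Assumes the two coordinate sets are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms: the second set is the first shifted by 1 along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
value = rmsd(model, template)
print(value)
```

A value close to zero indicates that the modelled backbone closely follows the template, while larger values point to regions where the two diverge.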

Protein preparation
The spike protein structures of the UK strain B.1.1.7 and the South Africa strain B.1.351 were available and were downloaded from the Protein Data Bank website. All spike proteins (Brazil strain P.1, Indian variant B.1.617, UK strain B.1.1.7, and South Africa strain B.1.351) were pre-processed and refined using the Protein Preparation Wizard in the Schrödinger suite. Each receptor was pre-processed by adding missing hydrogen atoms. All hydrogen bonds were optimized using sample water orientations, and energy minimization of the receptor was performed with the default RMSD value of 0.30 Å and the OPLS3e force field.

Phytochemical hits screening, preparation and optimization as ligands
A literature review of phytochemicals with antiviral activity was performed, and a dataset of 100 such phytochemicals was prepared. The chemical structures of all the selected phytochemicals were retrieved from the PubChem website (https://pubchem.ncbi.nlm.nih.gov/) along with their PubChem IDs. Their structure-data files in SDF format were used for prediction purposes. The ligand structures were corrected for bond lengths and bond angles, and missing hydrogen atoms were added. The geometry of all ligands was optimized with the OPLS3e force field using the LigPrep module of the Schrödinger software. The optimized structures were then used for docking studies.

Molecular docking
Glide v8.8 (Schrödinger, LLC, New York) was used for docking studies to identify the binding affinities of ligands within the binding pockets of the target proteins. For each ligand, the binding affinity was expressed as a Glide score. Discovery Studio was used to visualize the docked poses. The best poses were ranked on the basis of an energy function that combines empirical and force-field terms.
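Selecting the top-ranked ligands from a table of docking scores can be sketched in a few lines of Python. The sketch assumes the usual convention that a more negative Glide score indicates more favourable predicted binding. The function name, ligand names, and scores are hypothetical placeholders and are not results from this study.

```python
def top_hits(scores, n=3):
    """Return the n (ligand, score) pairs with the most negative scores."""
    return sorted(scores.items(), key=lambda item: item[1])[:n]


# Placeholder values only, not results from this study.
placeholder_scores = {
    "ligand_A": -5.2,
    "ligand_B": -7.9,
    "ligand_C": -6.4,
    "ligand_D": -8.3,
    "ligand_E": -4.1,
}
best = top_hits(placeholder_scores)
for name, score in best:
    print(name, score)
```

Running the same selection once per target protein gives a top-three list for each strain.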

Results And Discussion
Analysis of mutation patterns
The UK strain (20I/501Y.V1 or lineage B.1.1.7) originated from the replacement of asparagine with tyrosine at position 501 in the RBD of the spike protein (N501Y), along with several other mutations, including the 69/70 deletion and P681H near the S1/S2 furin cleavage site. The South Africa strain (20H/501Y.V2 or B.1.351) was found to have numerous mutations in the spike protein, including K417N, E484K, and N501Y. The first Brazilian strain of SARS-CoV-2, P.1, contains 10 spike protein mutations, including N501Y, E484K, and K417T, along with 17 other unique mutations. In contrast, the second Brazilian lineage, P.2, was identified as having three mutations in the spike protein, namely E484K, N501Y, and K417T. As per evidence, the E484K mutation has been noticed in the South African variant but not in the UK variant (10). The Indian B.1.617 variant carries two mutations in the spike glycoprotein, E484Q and L452R, both of which were already in circulation globally. Given the current surge, scientists are concerned that the B.1.617 variant is not merely a double mutant but may carry additional mutations, including E154K, P681R, and Q1071H. The three mutations E484Q, L452R, and P681R have been found in other variants of concern from the UK, South Africa, and Brazil. The mutation E484Q is similar to the E484K mutation previously seen in the Brazilian and South African variants. L452R, on the other hand, is also seen in the California variant, an immune-escape strain, and may therefore affect vaccine efficacy. Further, the P681R mutation resembles the P681H mutation seen in the United Kingdom variant. Furthermore, other strains present in many countries exhibited different mutations in their spike glycoproteins, as shown in Table 1.
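The overlap between variants described above can be checked with a short Python sketch that parses substitution labels such as N501Y and groups them by spike position. Only the substitutions named in this section are included; deletions such as 69/70 are omitted, and the additional B.1.617 substitutions are those the text describes as possible. The function names and dictionary keys are hypothetical.

```python
import re

# Wild-type residue, position, mutant residue (for example N501Y).
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'N501Y' into ('N', 501, 'Y')."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


# Substitutions as listed in the text above (deletions omitted).
variants = {
    "UK B.1.1.7": ["N501Y", "P681H"],
    "South Africa B.1.351": ["K417N", "E484K", "N501Y"],
    "Brazil P.1": ["K417T", "E484K", "N501Y"],
    "India B.1.617": ["L452R", "E484Q", "E154K", "P681R", "Q1071H"],
}


def shared_positions(variants, min_variants=2):
    """Map each position mutated in at least min_variants variants
    to a dict of {variant name: substitution label}."""
    by_position = {}
    for variant, labels in variants.items():
        for label in labels:
            _, position, _ = parse_substitution(label)
            by_position.setdefault(position, {})[variant] = label
    return {
        position: hits
        for position, hits in sorted(by_position.items())
        if len(hits) >= min_variants
    }


shared = shared_positions(variants)
for position, hits in shared.items():
    print(position, hits)
```

Grouping by position rather than by exact label is a deliberate choice here, since it surfaces related substitutions such as E484K and E484Q, or P681H and P681R, that affect the same residue with different replacements.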

Generation of 3D Protein Structures: Validation and Analysis
Homology modelling is a key procedure for predicting protein structures in relation to their functions. The SWISS-MODEL server builds protein structures by homology modelling. Based on query coverage, identity, and GMQE scores, the BLAST and HHblits searches indicated that PDB IDs 6ZWV and 7KRS may be the best available templates for homology modelling of the spike glycoprotein of Brazil Strain P. The overall molecular structures of the Brazil P.1 and Indian B.1.617 strains were inconsistent with the crystal structure of the spike glycoprotein of wild-type SARS-CoV-2 (PDB ID: 3M3V).

Declarations
Conflict of interest
The authors declare that they have no conflict of interest.