Distribution analysis of SARS-CoV-2 in different geographic regions
A total of 9,761 SARS-CoV-2 genomes were retrieved from The GISAID database (https://www.gisaid.org) that contain 3 sequences from bat (Betacoronavirus) and 9 sequences from Malayan Pangolin (Manis javanica) (Data Supplementary Table 1). Out of 9,761 genome sequences, 2,301 complete genome sequences of SARS-CoV-2 were selected randomly, aligned and compared with Wuhan SARS-CoV-2 (NC_045512.2) reference genome. We have divided our dataset into 6 different geographic areas: Europe (20.31%), North America (21.13%), Asia (35.37%), Oceania (20.86%), South America (16.63%) and Africa (10.35%). The European group comprises of SARS-CoV-2 infected patient data from the following countries: Austria, Belgium, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Netherlands, Poland, Portugal, , Slovakia, Slovenia, Spain, Sweden, Switzerland, and United Kingdom. The North American group contains genomes from the United States and Canada. The Asian group comprises genomes obtained from patients located in China, Indonesia, Pakistan, Philippines, Taiwan, Turkey, Kuwait, Georgia, South Korea, Japan, Iran, India, Thailand, Hong Kong, Malaysia, Singapore, and Vietnam. The Oceanian group comprises genomes from Australia and New Zealand. South America includes Brazil, Peru, Chile, Colombia, Argentina, and Ecuador (Fig. 2A – C).
Sequences from bat-SARS-CoV and Pangolin-SARS-CoV were aligned and compared to the Wuhan SARS-CoV-2 (NC_045512.2) as a reference genome. To determine the evolutionary relationship among bat-CoV, Pangolin-CoV and SARS-CoV-2, we estimated a phylogenetic tree based on the nucleotide sequences of the whole-genome sequence. Bat-SARS-CoV and SARS-CoV-2 were grouped together and were observed to share >96% similarity, whereas the Pangolin-SARS-CoV was closest evolutionary ancestor (Fig. 2D). Isolate of human Wuhan SARS-CoV-2 (NC_045512.2) shared 85.98% identity with Pangolin-SARS-CoV which suggests that Pangolin may be associated with SARS-CoV-2 evolution or subsequent outbreak [41, 42].
Identification of hotspot mutations in SARS-Cov-2 complete genome from South American and African regions and analysis of main protease (3CLpro) sequence
Recently, Pachetti et al [43] has reported eight novel recurrent mutations of SARS-Cov-2 that have been identified in positions 1397, 2891, 14408, 17746, 17857, 18060, 23403 and 28881 in Asian, Oceanic, European and North American outbreaks. However, SARS-CoV-2 mutations from South American and African patient isolates are not yet reported. We confirmed the occurrence of these mutations in South Americans and Africans located at positions 3036, 8782, 11083,14408, 23403, 28144 and 28881 as reported in previous literature [43]. Our study highlights the presence of additional “conserved mutations” in the South American and African communities, considering only those occurring ≥ 5 times in our database. We report here 12 new mutations that have evolved in the SARS-Cov2 sequence in South American and African populations. These are located at positions 14805, 25563, 26144, 28882, 28883, 9477, 28657, 28863, 1059, 15324, 28878 and 29742 sites. The high tendency of the virus to demonstrate genetic variability is evident from the fact that even within these variants, three variations 9477 (nsp4), 28657 and 28863 (ORF9, structural protein) were uniquely identified in isolates from South American patients while four novel mutations viz. 1059 (nsp2), 15324 (RdRp), 28878 (ORF9, structural protein), and 29742 (stem-loop II-like motif) were detected only in isolates from African patient samples (Figure 3B). Interestingly, some mutations were identified to be common between these two separate sets of sequences that have been reported from the two distinct geographical locations viz. 14805, 25563, 26144, 28882 and 28883, belonging to gene ORF1ab (14805 RNA-dependent RNA polymerase (RdRp), ORF3a (25563 and 26144 ORF3a protein) and ORF9, N gene (28882 and 28883 nucleocapsid phosphoprotein) sequences, respectively (Figure 3A). An interesting finding of this analysis is the concurrence of 14805 mutation with 14808 mutation in the same locus. This double point mutation was observed in RdRp genome from isolates of both South American and African patients. In contrast, 28882/ 28883 mutation locus corresponded with another previously reported mutation 28881, and this triple point mutation was also present in both the South American and African genomic sequences. Identification of point mutations at the same locus indicates the high susceptibility of these genetic regions to change as the virus evolves.
For its actions, single-stranded SARS-CoV-2 RNA viral genome encodes two protease polyproteins (i) papain-like cysteine-protease (PLpro) and (ii) the chymotrypsin-like cysteine protease known as 3C-like protease (3CLpro). 3CLpro, which is a main protease and therefore important in order to examine the incidence of any mutation in SARS-CoV-2 3CLpro. Multiple sequence alignment of the SARS-CoV-2 genome collected from patients in six different geographical locations exhibited 100% similarity and no discernible variations in sequences obtained from diverse geographical regions, for this enzyme.
SARS-CoV and SARS-CoV-2 similarity
SARS and SARS-CoV-2 complete genomes were collected from NCBI, GenBank database (NC_004718 and NC_045512). Protease nucleotide sequences were extracted from SARS (NC_004718) and were aligned with SARS-CoV-2 (NC_045512). Clustal Omega alignment of 918 SARS nucleotides showed around 95% similarity with SARS-CoV-2 (Supplementary Table 2). Higher amino acid sequence identity was also observed in SARS-CoV and SARS-CoV-2 main protease (3CLpro) derived from Wuhan and US patients. SARS-CoV and SARS-CoV-2 3CLpro showed highly conserved region in both the catalytic sites, His41 and Cys145 [44] and substrate binding region of the enzyme (163-167 and 187-192) [45] (Fig. 4A), inferring that these proteases exhibit high similarities. Furthermore, 12 variant positions (Thr35Val, Ala46Ser, Ser65Asn, Leu86Val, Arg88Lys, Ser94Ala, His134Phe, Lys180Asn, Leu202Val, Ala267Ser, Thr285Ala and Ile286Leu) were observed in SARS-CoV-2 3CLpro (Fig. 4B, C). The effects of mutations and potential resultant amino acids on SARS-CoV-2 3CLpro structure are expected to conserve the polarity and hydrophobicity, except when the resulting amino acid is Leucine at 286 position. However, it is important to mention that these 12 variants are not present in catalytic and substrate binding regions which are involved in critical proteolytic activity of the SARS-CoV-2 protease molecule.
Docking study of SARS-CoV-2 3CLpro inhibitors
The SARS-CoV-2 3CLpro receptor binding pocket was determined by superimposing SARS and SARS-CoV-2 3CLpro with their respective inhibitors (Fig. 4). Interestingly, Needleman-Wunsch alignment algorithm and BLOSUM-62 matrix analysis revealed 94.44 % sequence identity between SARS (Fig. 5A, grey) and SARS-CoV-2 3CLpro (Fig. 5A, Cyan). Cys-His catalytic dyad (Cys145 and His41) comprises the active catalytic binding site in SARS-CoV-2 3CLpro (Fig. 5A’ and B) and indicated the strong possibility that intended pharmacological inhibitors of SARS-CoV-2 3CLpro may also suppress the activity of SARS-CoV-2 3CLpro viral enzymes. Docking protocol for the Autodock 4.2 program was optimized by extracting and re-docking the alpha-ketoamide inhibitor named O6K in the binding pocket of SARS-CoV-2 3CLpro. The lowest binding energy -6.45 kcal/mol and 18.72 µM inhibitory constant (Ki) was predicted for alpha-ketoamide inhibitor (shown in Table 1). Re-docking of O6K inhibitor occupied the similar docking pose in the SARS-CoV-2 3CLpro catalytic dyad active site as previously reported in the crystal structure (PDB ID: 6Y2G) (Fig. 5C, D).
Seven flavonoids and biflavonoid, three anti-malarial compounds, seven anti-viral drugs and three vitamin molecules were subjected to automated docking within the active site of SARS-CoV-2 3CLpro catalytic-dyad. The superimposition of all docked flavones and biflavones (Fig. 6a), anti-malarial drugs (Fig. 6b), anti-viral drugs (Fig. 6c) and vitamins (Fig. 6d) are shown in Fig. 6 and various binding parameter have been tabulated in detail in Table 2.
Amentaflavone, a biflavonoid showed the highest binding energy (-8.49 kcal/mol) implicating a strong affinity with SARS-CoV-2 3CLpro. This corresponded with previously reported enzyme inhibitory assays with amentaflavone that showed the highest IC50 value at low concentrations of the molecule, 8.3±1.2 µM [46]. However, bilobetin demonstrated the lowest IC50 value at a higher concentration of 72.3±4.5 µM in SARS-CoV enzyme activity assays [46]. In contrast, our docking studies revealed that bilobetin, predicted almost comparable binding energy with that of amentaflavone (-8.29 kcal/mol) suggesting that mutation in SARS-CoV-2 3CLpro could potentially disrupt hydrogen bonding or induce some conformational change that could result in alterations in the binding site thus affecting inhibitor interactions with the enzyme active site residues. Amentaflavone showed H-bond interactions with the catalytic dyad residues (Cys145 and His41) as well as noteworthy interactions with the SARS-CoV-2 3CLpro residues Thr26, Ser46, Ser144 and Glu166 whereas His164, and Gln189 amino acids contributed to the hydrophobic interactions for the SARS-CoV-2 3CLpro inhibitors (Fig. 7A).
Three antimalarial drugs were then selected to study their inhibitory actions on SARS-CoV-2 3CLpro. We found, Artemisinin, a natural compound derived from Chinese herb Artemisia annua produces the highest docking score (-6.40 kcal/mol) as compared to O6K, chloroquine (-4.95 kcal/mol) and hydroxychloroquine (-5.77 kcal/mol) anti-malarial molecules. Importantly, Artemisinin has demonstrated broad anti-viral activity against human cytomegalovirus, herpes simplex virus type 1, Epstein-Barr virus, hepatitis B virus, hepatitis C virus, and bovine viral diarrhea virus [47]. Artemisinin was shown to exhibit hydrogen bonding with His41, Leu141, Asn142, Gly143, Ser144 and Glu166 SARS-CoV-2 3CLpro amino-acid residues (Fig. 7B).
Amongst the seven antiviral drugs, Ritonavir showed the highest binding energy (-7.45 kcal/mol) and lowest inhibitory constant Ki value (3.49 µM). Ritonavir produced hydrogen bond interactions with Thr26, His41 and Cys145 SARS-CoV-2 amino acids (Fig. 7C). A combination of two HIV-1 protease inhibitors, lopinavir and ritonavir, were given to critically ill SARS-CoV 2 infected patients [48]. However, the combination therapy of lopinavir and ritonavir was also stopped early in 13 patients (total recruitment 99 patients) due to associated gastrointestinal adverse events [48].
The severity of antiviral therapy adverse events has led researchers to explore the potential of macro-, micro- and phytonutrients that can potentially promote an immune response and suppress viral induced effects. Vitamins are known to modulate the host immune functions by providing anti-oxidants and anti-inflammatory activity [49, 50]. Therefore, we selected vitamins, ascorbic acid (vitamin C), cholecalciferol (vitamin D) and alpha-tocopherol (vitamin E) to investigate their potential interactions with the enzyme SARS-CoV-2 3CLpro. Our docking results interestingly, showed that vitamin D has the lowest binding energy and Ki (-7.75 kcal/ mol and 2.08 µM respectively) as compared to vitamin C and vitamin E. Amino acid residues Thr24, Thr26, His41 and Cys145 of SARS-CoV-2 3CLpro showed hydrogen bond formation with vitamin D (Fig. 7D). Amino acid Thr is extensively involved in intracellular signalling changes through phosphorylation changes, and here we observed that cholecalciferol formed a strong hydrogen bond with Thr residues and could potentially block the phosphorylation of Thr residue in SARS-CoV-2 3CLpro enzyme. There is evidence that serious SARS-CoV-2 infected cases have reported severe vitamin D deficiency and thus therapeutic concentrations of this molecule could potentially be used clinically in SARS-CoV-2 cases [51, 52].