More than 750 whole-genomes of SARS-CoV-2 strains collected from different divisions of Bangladesh have been sequenced using NGS (NextSeq 550) in BCSIR, Bangladesh and submitted to GISAID. From them seventeen cases hCoV-19/Bangladesh/BCSIR-NILMRC_392, hCoV-19/Bangladesh/BCSIR-NILMRC_398, hCoV-19/Bangladesh/BCSIR-NILMRC_376A, hCoV-19/Bangladesh/BCSIR-NILMRC_390, hCoV-19/Bangladesh/BCSIR-NILMRC_391, hCoV-19/Bangladesh/BCSIR-NILMRC_393, hCoV-19/Bangladesh/BCSIR-NILMRC_396, hCoV-19/Bangladesh/BCSIR-NILMRC_397, hCoV-19/Bangladesh/BCSIR-NILMRC_411, hCoV-19/Bangladesh/BCSIR-NILMRC_420, hCoV-19/Bangladesh/BCSIR-NILMRC_421, hCoV-19/Bangladesh/BCSIR-NILMRC_422, hCoV-19/Bangladesh/BCSIR-NILMRC_423, hCoV-19/Bangladesh/BCSIR-NILMRC_424, hCoV-19/Bangladesh/BCSIR-NILMRC_426, hCoV-19/Bangladesh/BCSIR-NILMRC_427 and hCoV-19/Bangladesh/BCSIR-NILMRC_428 were analyzed in this experiment. Almost complete (29,903 nucleotides) genomes of all cases were obtained. All these seventeen isolates were placed in GR clade according to GISAID analyses and 2B clade according to Nextclade. Except hCoV-19/Bangladesh/BCSIR-NILMRC_376A and hCoV-19/Bangladesh/BCSIR-NILMRC_390, all isolates are belongs to B.1.1.25 lineage (Table-1). This B.1.1.25 recognized as the Bangladeshi lineage (https://cov-lineages.org/lineage.html?lineage=B.1.1.25) and B.1.1 is its parent lineage.
The age interval of the patients from these Bangladeshi samples, from whom the viral isolates were taken, were between the age range of 22–74 years (Fig. 1a) and the sex distribution was 88.23% male and 11.76% female (Fig. 1b). But this range is very specific in these samples only, not the whole scenario of the country. All of the uploaded samples were came from Dhaka and Naraynganj of the country.
To identify clusters inside the diversity of Severe acute respiratory syndrome-related coronavirus SARS-CoV-2 (2019 outbreak), the coronavirus typing tool applies phylogenetic analysis. This typing tool has prudently selected reference sequences that denote the diversity of each well-defined phylogenetic cluster (detailed in nextstrain). Additionally, this tool executed extensive testing to be sure that their reference strains accurately classify other sequences [10, 11]. If the sequence is identified as part of the SARS-CoV-2 cluster, an additional phylogenetic analysis is performed to identify one of the following lineages: B.1.351_501Y.V2_20H (Beta variant): A cluster of concerns first originated in South Africa (report-COVID-19 B.1.351 (501Y.V2) Variant of Concern – What We Know So Far; Public Health Ontario, 02.07.2021); B.1.1.7_501.V1_201: A cluster of concern first discovered in the UK by A. Rambaut et al. (report) P.1 aka 501Y.V3 aka 20J: A cluster of concerns first discovered in Brazil by N. Faria et al. (report); Y453F.Cluster5_20B: A cluster of concern first discovered in Denmark, believed to have spread from mink infections; B.1.1.70_501Y_20B [12].
Sequences, not part of any cluster of concern will be designated as International AB Diversity 10, 11]. It was only possible to make the original version of this tool weeks after the virus was first sequenced because the sequence of a novel coronavirus from Wuhan using data generated by the Shanghai Public Health Clinical Center & School of Public Health, the National Institute for Viral Disease Control and Prevention, the Institute of Pathogen Biology, and the Wuhan Institute of Virology shared via GISAID, was released before publication. We, from our deepest heart, acknowledge their contribution.
The B.1.351 (501Y.V2) a variant, which is out cluster of concern emerged in late 2020 in Eastern Cape Province, South Africa, and subsequently spread throughout South Africa and to over 40 countries worldwide, including Canada. One B.1.351 case has been detected as of February 4, 2021, in Ontario (report-COVID-19 B.1.351 (501Y.V2) Variant of Concern – What We Know So Far; Public Health Ontario, 02.07.2021). During the first wave of the pandemic, epidemiological and modeling studies show that, the B.1.351 variant is more transmissible compared to lineages circulating. At present, there is ambiguity concerning the ability of B.1.351 to impact COVID-19 severity. Phylogenetic analysis showed that all the cases of SARS-CoV-2 are belonging to GR Clade.
Assembly was performed by aligning reads to a pre-defined Wuhan reference genome (NC_045512.2). Genome assembly of the raw data was done by Assembly toolkit in the EzCOVID19 and EDGE COVID-19 cloud service provided on the EzBioCloud website and EDGE bioinformatics website respectively [13, 14]. To retain the exceptional variations of raw data using the EDGE COVID19 cloud service a consensus genome was generated [13]. The consensus genome was then compared against the same reference genome to calculate single nucleotide variations (SNV) with positions. The SNVs were compared against GISAID clade variation markers. The genome presented here belonged to B.1.1.25 lineage of GR clade. This result was again confirmed by analyzing the raw data using the default parameters of the coronavirus typing tool offered by Genome Detective [15].
Single-nucleotide variations between the input assembly and the SARS-CoV-2 reference NC_045512.2 (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2) were shown in table-2. When the SARS-CoV-2 strains were isolated from Bangladeshi patients, compared to the reference viral sequence (NC 045512.1), Nonsynonymous and synonymous mutations were identified. These studies have been done using EZBioCloud (https://www.ezbiocloud.net/tools/sc2/?id=4024a5b2-40bb-40b5-acd6-c4dd65b38d76). The queried dataset covers 99.84% of the SARS-CoV-2 genomic sequence. The nucleotide sequences were indicated starting from the 5’ UTR, while the corresponding amino acid changes were mentioned separately for each protein-coding region. In more than 30% of the viral isolates, nucleotide substitutions of C>T were observed, whereas, G>T, and G>A showed 20% and 15% respectively. Among these mostly seen substitutions, C>T, which are present in the ORF1ab, S gene, and ORF3a gene (Table-2). G>T is found in ORF1ab, ORF3a, and in the M gene whereas, G>A present in the S gene and maximum cases in the N gene.
The emergence of mutations in SARS-CoV-2 evolution has been characterized by, in the context of ‘variants of concern’, that influence virus characteristics, including transmissibility and antigenicity. These mutations impact probably in response to the changing immune profile of the human population [16]. The amino acid change D614G in spike protein was noted to be increasing in frequency in April 2020 and to have emerged several times in the global SARS-CoV-2 population [16]. The coding sequence reveals a high dN/dS ratio, suggesting positive selection at the codon position 614 [17, 18]. Consequent studies indicated that, D614G confers a moderate advantage for infectivity and transmissibility. There is another mutations P681R in spike protein have now arisen and are discussed in this article. Similarly, Bioinformatics analyses reveal that the P681R mutation in the spike protein of certain isolates are highly conserved in this lineage. P681R mutations were highly conserved in the B.1.1.25 lineage, and remarkably, the P681R mutation was the most representative mutation in this lineage.
Each of the 17 viral isolates were also evaluated utilizing the mutations they harbor. Most of the viral isolates were found to have different amino acid substitution combinations (Table-3). However, the same amino acid substitutions combination Spike_D614G and Spike_P681R were observed for the samples BCSIR-NILMRC_393, BCSIR-NILMRC_396, BCSIR-NILMRC_421, BCSIR-NILMRC_422, and in BCSIR-NILMRC_424. In this study we have emphasized these two amino acid mutations and hence we will discuss elaborately with these two amino acid mutations in the spike protein later on. In every isolates Spike_D614G is the common mutation found in the spike protein (Table-3). The analyzed viral isolates were found to harbor 11 to19 mutations per isolate compared to the reference sequence.
B.1.1.25 Lineage with S:P681R Reports are given in the figures number 3, 4 and 5. For this study we have used outbreak.info tool (https://outbreak.info/situation-reports?pango=B.1.1.25&muts=S%3AP681R&selected=BGD&loc=USA&loc=USA_US- CA&loc=BGD). Concerns surrounding new strains of SARS-CoV-2 (hCoV-19), the virus behind the COVID-19 pandemic, have been developing. The prevalence of the B.1.1.25 Lineage with S:P681R in the world, how it is changing over time, and how its prevalence varies across different locations specially in Bangladesh, has been outlined in this report. All these reports are presented in the following figures (Fig. 3, supplementary figure 1 and 2). We depend on shared virus sequences from the GISAID to describe the prevalence of sets of mutations in Mutation Situation Reports. We also rely on the accuracy of the sequences and sample metadata deposited in GISAID, while we apply filters to eliminate some low quality sequences and unreasonable metadata. These sequences are a sample of the total number of cases — often a biased sample and may not represent the true prevalence of the mutations in the population. In general, it is important to note that case numbers for any given lineage/mutation can be significantly affected by overall case numbers and rates of genomic sequencing at any given location (www.Outbraek.info) [19].
Lineage B.1.1.25 is a Bangladeshi lineage and B.1.1 is its parent lineage. Most common countries of this lineage are Bangladesh 50.0%, Canada 27.0%, United States of America 11.0%, United Kingdom 5.0%, and Australia 2.0% (https://cov-lineages.org/lineage.html?lineage=B.1.1.25). All the spike protein sequences along with reference sequence from Wuhan were aligned using the multiple sequence alignment platform of CLUSTAL Omega for Sequence alignments and structure. After finishing the alignment, the generated file was viewed using MView and variations in the sequence or the amino acid changes were noted. For the prediction of secondary structures of SARS-CoV2 spike protein, CFSSP (Chou and Fasman secondary structure prediction) server was used in this experiment. To check for the presence of similarities or differences, all these spike protein sequences were first aligned in CLUSTAL Omega. We found that at position 614, all the isolates from Bangladesh have a common mutation D614G whereas, in the reference genome (Wuhan) it is ‘D’ instead of ‘G’. So, in this position for all the isolates amino acid D (aspartic acid) changed into G (Glycine). Also, at position 681, four of the isolates BCSIR-NILMRC_393, BCSIR-NILMRC_396, BCSIR-NILMRC_422, BCSIR-NILMRC_424 showing mutation. In this position within these four isolates amino acid P (Proline) changed into amino acid R (Arginine) (Fig. 4). For comparison, the original Wuhan sequence as the wild type were considered. We found these two significant differences at amino acid positions in spike protein that were mutated in these isolates overall (https://www.britannica.com/science/amino-acid/Standard-amino-acids).
In the S1 domain, one of these mutations was D614G. This mutation posited near the receptor-binding domain at a downstream position. Another mutation, P681R harboring further downstream in the S protein in the same domain. The D614G mutation which is a conserved mutation in every 17 isolates, in its spike (S) protein has emerged in the spring of 2020, a SARS CoV-2 derivative harboring and quickly become predominant [20]. The D614G-bearing variant has quickly swept out the original strain, as because the D614G mutation increases viral infectivity, fitness, and inter-individual transmissibility [21]. Interestingly, the P681R mutation in the spike protein is unique and recently recognized as the VOCs so far in Bangladesh and the P681R mutation is located in the proximity of the furin cleavage site of the SARS-CoV-2 S protein [22].
For secondary structure analyses of those mutant proteins, we have used CFSSP: Chou & Fasman Secondary Structure Prediction server (http://www.biogem.org/tool/chou-fasman). This is a very effective server which predicts secondary structure of protein from the amino acid sequence. In this server, Chou & Fasman algorithm has been implemented [23]. Secondary structure was analyzed for the mutation D614G of BCSIR-NILMRC-422 and mutation P681R of BCSIR-NILMRC-424. In the case of mutation D614G, Glycine (G) are non-polar with aliphatic R groups and Aspartate (D) are polar amino acids with a carboxylic acid on its side chain that gives it acidic (proton-donating) properties (https://www.britannica.com/science/amino-acid/Standard-amino-acids) and have side chain while Glycine has no side chain. On the other hand, in mutation P681R, P (Proline) non-polar with aromatic R group changed into R (Arginine), which is a basic amino acid, occur with an overall charge of +1 at physiological pH. Another alteration from Proline to Arginine can thus potentially interrupt the local folding of that particular protein. In figure 5, the secondary structure prediction are showed changes in and around the site of mutations. In the spike region, where mutation occurred, there were loss of turn structures in positions 614 and 681. The alternation took place in secondary structure might cause a variation in the function of S1. In this region, S1 helps the fusion process of the spike protein and consequently mutation in S1 may have altered receptor spike interactions and hence infectivity [2].
To correlate, if changes in secondary structure are also reflected in the dynamics of the protein in its tertiary structure, we performed normal mode analyses and studied protein stability and flexibility. For this purpose we have used Dynamut (http://biosig.unimelb.edu.au/dynamut/results_prediction/16263266400). This tool provides users implementing two distinct, well established normal mode approach that can use to analyze and visualize protein dynamics. The impact of mutations on protein dynamics and stability resulting from vibrational entropy changes supposed to be done by sampling conformations and assessment.
Our mutations of interest D614G and P681R on spike proteins of SARS-CoV-2 isolates from Bangladeshi patients, of these, maximum are destabilizing (ΔΔG < 0.0 kcal/mol) and few stabilizing (ΔΔG > 0.0 kcal/mol). Change in vibrational entropy energy (ΔΔSVib ENCoM) in case of mutation D614G between the wild type Wuhan isolate and the BCSIR-NILMRC- 422 isolate was 0.856 kcal.mol-1.K-1, while in case of mutation P681R it was 0.018 kcal.mol-1.K-1 (Fig. 6). In the case of mutation D614G the ΔΔG (Gibbs free energy) was -0.511 kcal/mol and the ΔΔG ENCoM was -0.684 kcal/mol whereas, in case of mutation P681R it was 0.224 kcal/mol (Stabilizing) and the ΔΔG ENCoM was -0.014 kcal/mol. All these findings recommended a stabilizing mutation in these types of the spike.
The interatomic interactions of mutation D614G in isolate BCSIR-NILMRC-422 have been shown in Figure 7. Studies of atomic fluctuations and deformation energies for D614G in case of isolate BCSIR-NILMRC-422 showed noticeable changes (Fig. 9). Atomic fluctuations compute the measure of absolute atomic motion whereas deformation energies detect the measure of flexibility of a protein. Figure 9 shows the photographic representations of the atomic fluctuation and deformation energies where positions that could be visibly detected to be altered have been marked for D614G in isolate BCSIR-NILMRC-422.
The interatomic interactions in case of mutation P681R in isolate BCSIR-NILMRC-424, have been shown in Figure 8. Analyses of atomic fluctuations and deformation energies for P681R in isolate BCSIR-NILMRC-424 showed detectable changes (Fig. 10). Deformation energies identify the amount of flexibility of a protein and atomic fluctuations calculate the measure of absolute atomic motion. The visual representations of the atomic fluctuation and deformation energies were shown in Figure 10, where positions that could be visibly identified to be different have been marked in mutation P681R in isolate BCSIR-NILMRC-424.
The mutations of P681R is the first report for Bangladesh in the isolates of the BCSIR-NILMRC-393, BCSIR-NILMRC-393, BCSIR-NILMRC-422 and BCSIR-NILMRC-424 and additional sequencing followed by adequate annotation would help to expand the acquaintance about mutation of spike protein in human SARS-CoV-2. These variations might lead to the diversification of particular virus and the subsequent emergence of variants, strains, serotypes and antibody escape mutants. Similarly, mutations might help the virus to adjust to the host environment better and expand its tissue tropism.