Unique and rare mutations in Noakhali strains
The surface spike glycoprotein recognizes host cell receptors and mediates the virion’s binding to and fusion with the host cell membrane, thus facilitating the virus’s entry into cells. Furthermore, this protein increases the adhesion of infected cells to neighbouring non-infected cells, allowing the virus to spread between cells [19]. This protein has been the target of much investigation. Most of the strains sequenced and analyzed in Bangladesh belong to the GR clade, a derivative of the G clade: 274 of the 324 sequenced strains belonged to the GR clade (GISAID). The G clade is the parent clade of GR and GH and is defined by the mutation D614G [20]. This mutation was first detected in Europe in February 2020, with an especially high incidence in Italy [21]. The D614G mutation is dominant in Bangladesh, reflecting the global trend, and has been linked with higher infectivity [22]. Every strain we sequenced exhibited the D614G mutation in the spike protein. However, 21 of the 56 sequenced strains also showed other mutations in the spike protein, which may have consequences for the protein’s structure and function (Fig. 1).
Table 1: Unique and rare mutations found in the spike protein of SARS-CoV-2 strains isolated from Noakhali, Bangladesh.
| Protein | Strain Name | Mutation | No. of occurrences (global) | No. of countries with this mutation | No. of BD strains with this mutation | Notes |
|---|---|---|---|---|---|---|
| Structural proteins | | | | | | |
| S | NGRI-NSTU-2 | G769V | 617 | 31 | 4 | |
| | NGRI-NSTU-21 | Q675H | 464 | 34 | 4 | |
| | NGRI-R2-4 | H245Y | 233 | 12 | 0 | |
| | NGRI-R2-O10 | G261R | 59 | 8 | 1 | |
| | NGRI-R2-8 | G261R | 59 | 8 | 1 | |
| | NGRI-R2-18 | S46L | 8 | 5 | 0 | |
| | NGRI-R3-2 | K1191N | 155 | 22 | 1 | Non-structural (C-term) |
| | NGRI-R3-3 | K1191N | 155 | 22 | 1 | Non-structural (C-term) |
| | NGRI-R3-12 | A684S | 6 | 5 | 1 | |
| | | E654V | 15 | 3 | 0 | |
| | NGRI-R3-14 | C1247S | 3 | 2 | 0 | Non-structural (C-term) |
| | NGRI-R3-15 | C1247S | 3 | 2 | 0 | Non-structural (C-term) |
| | NGRI-R3-16 | Y204F | No data | No data | No data | |
| | NGRI-R3-O1 | D80Y | 1447 | 25 | 0 | |
| | NGRI-R3-O5 | C1247F | 124 | 14 | 0 | Non-structural (C-term) |
| | NGRI-R3-O15 | S1252F | 203 | 13 | 0 | Non-structural (C-term) |
| | NGRI-R4-OR18 | Y204F | No data | No data | No data | |
| | NGRI-R4-OR19 | L54F | 965 | 34 | 1 | |
| | NGRI-R4-O17 | R21K | 106 | 7 | 0 | |
| | | Q1201H | 24 | 6 | 0 | Non-structural |
| N | NGRI-NSTU-2 | G18C | 66 | 10 | 1 | |
| | NGRI-NSTU-3 | P67L | 57 | 10 | 1 | |
| | NGRI-R2-22 | G238C | 261 | 23 | 0 | |
| | NGRI-R2-O9 | R203S | 71 | 10 | 0 | |
| | NGRI-R3-10 | A211T | 13 | 4 | 0 | |
| | NGRI-R3-11 | A211T | 13 | 4 | 0 | |
| | NGRI-R3-17 | M210I | 743 | 25 | 1 | |
| E | NGRI-R2-O9 | L19F | 5 | 5 | 0 | |
| M | NGRI-R2-17 | H125Y | 537 | 26 | 7 | |
| | NGRI-R4-O17 | D209Y | 476 | 20 | 1 | |
| Non-structural proteins | | | | | | |
| NS3 | NGRI-R2-4 | N119Y | No data | No data | No data | |
| NS7a | | S81del, V82del | No data | No data | No data | |
| NS7a | NGRI-R2-27 | E92stop | No data | No data | No data | |
| NSP3 | NGRI-R3-17 | L1259R | No data | No data | No data | |
A total of 17 different mutations besides D614G were found in our strains, four of which appeared in more than one strain. Of these 17, the A262S mutation found in strain NGRI-R2-O9 has not been included in Table 1 as it is not an uncommon mutation, having appeared more than 3000 times across 29 countries. All other additional mutations were found to be globally rare. Two of the four recurring mutations, K1191N (strains NGRI-R3-2 and NGRI-R3-3) and C1247S (strains NGRI-R3-14 and NGRI-R3-15), are non-structural mutations located in the C-terminal region of the spike protein. One of the recurring mutations, Y204F in strains NGRI-R4-OR18 and NGRI-R3-16, has not previously been recorded in strains from any other country in the GISAID database. Another, C1247S, has been recorded only 3 times across 2 countries in the GISAID database, and never before in Bangladesh.
Some of the mutations that appear only once are also quite rare, including the A684S in strain NGRI-R3-12 which was previously seen 6 times in 5 countries, and the S46L mutation in strain NGRI-R2-18, which has only been recorded 8 times from across 5 countries. The mutation A262S in NGRI-R2-O9 was the most common worldwide, with 3003 previously recorded cases from 29 countries. However, it has only been reported once before in Bangladesh.
Of the 17 mutations included in Table 1, 7 were reported in Bangladesh previously. The mutations found in the strains NGRI-NSTU-2 and NGRI-NSTU-21 were both reported 4 times, while the other 5 appeared previously only once each. For the remaining 10 mutations, our sequenced strains mark their first recorded appearance in Bangladesh.
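The counts of distinct and recurring spike mutations quoted above can be re-derived from Table 1 with a short tally. The following is a minimal sketch, not the authors' pipeline: the rows are transcribed by hand from the spike section of the table, the helper names are hypothetical, and the two rows with a blank strain cell (E654V and Q1201H) are assumed to belong to the strain listed directly above them. A262S is absent because the text excludes it from the table.

```python
from collections import defaultdict

# Spike-protein rows transcribed from Table 1 as (strain, mutation).
# Assumption: E654V and Q1201H belong to the strain in the row above them.
SPIKE_ROWS = [
    ("NGRI-NSTU-2", "G769V"), ("NGRI-NSTU-21", "Q675H"),
    ("NGRI-R2-4", "H245Y"), ("NGRI-R2-O10", "G261R"),
    ("NGRI-R2-8", "G261R"), ("NGRI-R2-18", "S46L"),
    ("NGRI-R3-2", "K1191N"), ("NGRI-R3-3", "K1191N"),
    ("NGRI-R3-12", "A684S"), ("NGRI-R3-12", "E654V"),
    ("NGRI-R3-14", "C1247S"), ("NGRI-R3-15", "C1247S"),
    ("NGRI-R3-16", "Y204F"), ("NGRI-R3-O1", "D80Y"),
    ("NGRI-R3-O5", "C1247F"), ("NGRI-R3-O15", "S1252F"),
    ("NGRI-R4-OR18", "Y204F"), ("NGRI-R4-OR19", "L54F"),
    ("NGRI-R4-O17", "R21K"), ("NGRI-R4-O17", "Q1201H"),
]


def strains_by_mutation(rows):
    """Group strain names under each mutation label."""
    groups = defaultdict(list)
    for strain, mutation in rows:
        groups[mutation].append(strain)
    return dict(groups)


def recurring(groups):
    """Keep only mutations carried by more than one strain."""
    return {m: s for m, s in groups.items() if len(s) > 1}


groups = strains_by_mutation(SPIKE_ROWS)
shared = recurring(groups)
print(len(groups), "distinct spike mutations in the table")
for mutation, strains in sorted(shared.items()):
    print(mutation, "->", ", ".join(strains))
```

Adding A262S to the 16 tabulated mutations gives the 17 discussed in the text, and the four recurring entries match those named above.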
The circular nucleocapsid proteins bind the viral genome, which is roughly 30,000 nucleotides long, to form a ribonucleoprotein core that helps in the virus’s entry into the host cell. The protein is also important in viral RNA synthesis. Following entry, it affects cellular processes that are responsible for inflammation in the lungs, progression of the cell cycle, immune responses and viral degradation in the host body [19]. The most common mutations found across our sequenced strains were the GR clade-defining RG203KR mutations and the S194L mutation, which are among the globally dominant mutations for this protein [20]. These mutations are also known to cause major changes in the protein structure and intraviral interactions [23]. However, 6 different unusual mutations also appeared across our strains, and these are the focus of this report. Of these, 3 are being reported for the first time in Bangladesh.
The mutation A211T appeared twice, in the strains NGRI-R3-10 and NGRI-R3-11. While this mutation has been reported 13 times globally across 4 countries, it has not previously been reported in Bangladesh (GISAID). The other mutations being reported for the first time in Bangladesh are G238C (which has appeared 261 times in 23 countries) and R203S (which has appeared 71 times in 10 countries). Uncommon mutations that have appeared in Bangladesh previously and also appeared in our strains include G18C (66 times in 10 countries previously), P67L (57 times in 10 countries previously), and M210I (807 times in 30 countries previously). Each of these has been recorded once in Bangladesh so far.
The envelope protein is a small membrane protein that plays a role in viral assembly and release [19]. It is the most conserved structural protein both globally [20] and across our strains (GISAID). The only mutation that appeared in this protein in our strains was L19F, found in strain NGRI-R2-O9, which had previously been seen only 5 times in 5 countries.
The membrane protein is abundant in the virus and makes up the membrane to which the spike proteins are attached. It also acts as a scaffold for viral assembly and has important consequences in host cells, including enhancing viral pathogenesis and promoting apoptosis [19]. The most common mutation in the membrane protein globally was reported to be T175M [20]. This did not appear in our strains. Only 2 mutations were found in the membrane protein across our sequenced strains: H125Y in strain NGRI-R2-17 and D209Y in strain NGRI-R4-O17. Both have appeared in Bangladesh before, the former recorded 7 times across the country and the latter once, in Dhaka. Globally, the H125Y mutation has previously been reported 580 times in 27 countries, and the D209Y mutation 494 times in 22 countries (GISAID).
Besides the mutations in the structural proteins, 5 unique mutations were found in the non-structural proteins of 3 of our strains; 3 of these mutations occurred in the same strain (NGRI-R2-4). The mutations are listed in Table 1.
Functional implication of unique mutation Y204F:
Using PROVEAN, we obtained a score of -1.297 with 693 supporting sequences for the prediction (Supplementary Data S1 and Supplementary Table S2). As this score is above the threshold value, the mutation is predicted to be neutral, i.e. it is not expected to have a significant effect on protein function.
However, PolyPhen-2 gave quite different results. Using the human divergence model HumDiv, we obtained a score of 0.995 (sensitivity 0.68, specificity 0.96), and using the human variation model HumVar, we obtained a score of 0.993 (sensitivity 0.47, specificity 0.96). As both scores are close to 1 (Fig. 1), they indicate that the mutation is probably damaging [23].
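The disagreement between the two predictors comes down to how each score is compared with a cutoff. The sketch below illustrates that logic only; it does not call either tool. The PROVEAN cutoff of -2.5 is that tool's widely used default, but the text does not state which threshold was applied, so it is an assumption here. The PolyPhen-2 bands (above 0.85 "probably damaging", above 0.15 "possibly damaging") are likewise assumed for illustration, since the exact cutoffs depend on the model and version. All function and constant names are hypothetical.

```python
# Assumed cutoffs; the source reports scores but not the thresholds used.
PROVEAN_CUTOFF = -2.5        # scores at or below this are called deleterious
POLYPHEN_PROBABLY = 0.85     # illustrative band edge, not model-specific
POLYPHEN_POSSIBLY = 0.15     # illustrative band edge, not model-specific


def provean_call(score, cutoff=PROVEAN_CUTOFF):
    """Return the PROVEAN-style label for a substitution score."""
    return "deleterious" if score <= cutoff else "neutral"


def polyphen_call(score):
    """Return a PolyPhen-2-style label for a probability-like score in [0, 1]."""
    if score > POLYPHEN_PROBABLY:
        return "probably damaging"
    if score > POLYPHEN_POSSIBLY:
        return "possibly damaging"
    return "benign"


def predictors_agree(provean_score, polyphen_score):
    """True when both tools point the same way (harmless vs harmful)."""
    provean_harmful = provean_call(provean_score) == "deleterious"
    polyphen_harmful = polyphen_call(polyphen_score) != "benign"
    return provean_harmful == polyphen_harmful


# Scores reported for Y204F in the text.
print("PROVEAN:", provean_call(-1.297))
print("PolyPhen-2 HumDiv:", polyphen_call(0.995))
print("PolyPhen-2 HumVar:", polyphen_call(0.993))
print("Predictors agree:", predictors_agree(-1.297, 0.995))
```

Under these assumed cutoffs the two tools disagree for Y204F, which is the conflict described above.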
Noakhali shows high prevalence of B.1.36 and B.1.36.16 lineages
Phylogenetic tree and lineage analysis enabled the pattern of lineage distribution to be observed for the district of Noakhali for the period of July to November 2020 (Figure 2a). The majority of the strains belonged to the GR clade parent lineage B.1.1 (n = 21; 37.5%) and its descendant lineages, which together accounted for ~60.7% (n = 34) of Noakhali’s SARS-CoV-2 lineages (Supplementary Table S3). The rest were of B.1.36.16 (n = 12; 21.4%) and its parent lineage B.1.36 (n = 9; 16.1%), both of which belong to the GH clade. All three dominant lineages, i.e., B.1.1, B.1.36, and B.1.36.16, were found throughout the sampling period, so no shift in the prevalence of the dominant lineages appears to have occurred during the studied timeframe.
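The lineage shares follow directly from the strain counts and the total of 56 sequenced strains. A minimal sketch of that arithmetic is given below, with the counts taken from the paragraph above; the variable and function names are illustrative only.

```python
TOTAL_STRAINS = 56  # Noakhali strains sequenced in this study

# Strain counts per lineage group, as stated in the text.
LINEAGE_COUNTS = {
    "B.1.1": 21,
    "B.1.1 and descendants": 34,
    "B.1.36.16": 12,
    "B.1.36": 9,
}


def percent_share(count, total=TOTAL_STRAINS):
    """Share of the total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: percent_share(n) for name, n in LINEAGE_COUNTS.items()}
for name, pct in shares.items():
    print(f"{name}: n = {LINEAGE_COUNTS[name]}, {pct}%")
```

Note that "B.1.1 and descendants" already contains the 21 B.1.1 strains, so the four rows are not meant to sum to 100%.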
Notably, two of the strains from the study, NGRI-R4-2 and NGRI-R4-3, were found to be of lineages B.1.1.141 and B.1.186, neither of which has been reported from Bangladesh thus far. While globally B.1.1.141 is predominantly found in the UK, it is also dispersed in other European countries, Russia and Brazil. B.1.186, on the other hand, originated in mid-March 2020 in Saudi Arabia, where it is the most prevalent (57.3% of total cases). The lineage also shows some representation in Italy (11.0%) and India (8.5%). Both of these lineages (B.1.186 and B.1.1.141) were isolated in the first week of November 2020 and were therefore possibly introduced at the beginning of the second wave of the COVID-19 pandemic in the country.
This study has made >50 strains of SARS-CoV-2 from the Noakhali district available, enabling lineage analysis of Bangladeshi SARS-CoV-2 strains at a smaller geographic scale. Two lineages of the GH clade, B.1.36 and B.1.36.16 were particularly abundant in the Chittagong division when compared to the rest of the country as shown in Figure 2b. Chittagong Division alone contributed to 57.1% of Bangladesh’s B.1.36.16 lineage distribution, a percentage that rose to 72.7% after inclusion of Noakhali’s strains of our study into the dataset. Similarly, while prior to the inclusion of our strains, 46.2% of the B.1.36 lineage strains were shown to have originated from Chittagong Division, the updated dataset propelled this statistic to 68.2%. While B.1.1 was relatively rare in Chittagong Division for the observed time period, contributing to only 10.8% of the national aggregate, it was noticeably prolific among Noakhali’s strains as it accounted for 40.7% (n = 22) of lineages from the district.
Correlation of disease severity with lineage and mutations
Metadata of patients infected by Noakhali’s strains revealed that the most prolific lineages from the district, B.1.1, B.1.36 and B.1.36.16, produced varying degrees of severity in their hosts, ranging from asymptomatic cases to severe infections (Supplementary Table S4 and Supplementary Table S5). However, none of the deceased patients carried strains of B.1.36 or its descendant lineages. Only one death occurred among the 21 patients infected with the B.1.1 lineage, while one strain belonging to the B.1.1.10 lineage also resulted in death. Notably, however, two strains belonging to the B.1.1.25 lineage caused patient deaths while the third produced severe symptoms in its host.
Of the two patients carrying the unique spike mutation Y204F, one of them (NGRI-R3-16) was asymptomatic, while the other (NGRI-R4-OR18) showed moderate symptoms. Therefore, it is unlikely that this mutation affects disease severity. Of all the rare mutations identified, three patients carrying them were deceased. They were the hosts of the strains NGRI-NSTU-3 containing the nucleocapsid protein mutation P67L, NGRI-R2-17 containing the membrane protein mutation H125Y, and NGRI-R2-8 containing the spike protein mutation G261R.
Furthermore, some of the patients carrying the listed rare mutations in the structural proteins also had severe symptoms. Patients with severe symptoms who were infected by strains with rare spike protein mutations included those carrying strains NGRI-R2-O10 (G261R), NGRI-R3-O15 (S1252F), NGRI-R2-18 (S46L), NGRI-R3-2 (K1191N), and NGRI-R2-4 (H245Y). It is interesting to note that the strain NGRI-R2-4 also had 3 unique mutations in non-structural proteins. Of the patients carrying strains with a rare mutation in the membrane protein, only one displayed severe symptoms; this patient carried the strain NGRI-R4-O17 with the mutation D209Y.
Finally, the strains with rare mutations in the nucleocapsid protein that caused severe symptoms include strains NGRI-R2-22 containing the mutation G238C, NGRI-R3-17 containing the mutation M210I (this strain also had a unique mutation in a non-structural protein), and NGRI-R3-11 containing the mutation A211T. It is interesting to note that the other patient carrying this A211T mutation (in strain NGRI-R3-10) showed only moderate symptoms. The patient carrying the only strain containing a mutation in the envelope protein, strain NGRI-R2-O9, was asymptomatic.
Phylodynamic inferences from Bangladeshi strains
We analysed 546 genomes from Bangladesh to estimate the evolutionary rate and the time to the most recent common ancestor (TMRCA) of the genomes. The evolutionary rate, measured in substitutions per site per year (subs/site/year), is the rate at which the virus accumulates changes along its lineage. The MRCA is the point at which all the sampled viruses were in the same host, whether human or non-human, so its timing can indicate when the epidemic began to diverge.
The analysis under the Coalescent Exponential Growth model estimated the average evolutionary rate of the Bangladeshi genomes at a clock rate of 1.66 × 10⁻⁴ subs/site/year (95% BCI: 7.059 × 10⁻⁵ – 3.7889 × 10⁻⁴) (Table 2). For comparison, the global estimated rate of SARS-CoV is between 0.80 and 2.38 [24]. With appropriate time-scale settings, the node age of the root is 2020.1538 in decimal years, which places the first appearance of the most recent common ancestor in Bangladesh around February 25, 2020 (95% BCI: October 11, 2019 – October 10, 2020), consistent with other global estimates. Moreover, the first positive SARS-CoV-2 case was detected around March 8, 2020, within two weeks of the predicted introduction. Figure 3 and Figure 4 show the maximum clade credibility (MCC) tree representing the distribution of 564 genomes from Bangladesh.
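The step from a root age in decimal years to a calendar date can be made explicit. The sketch below uses one common convention, in which the fractional part of the year is multiplied by the number of days in that year and added to January 1. This convention is an assumption; the text does not say how the conversion was done, and other conventions shift the result by about a day. The function name is hypothetical.

```python
import calendar
from datetime import datetime, timedelta


def decimal_year_to_datetime(decimal_year):
    """Convert a decimal year (e.g. 2020.1538) to a calendar datetime.

    Assumed convention: fraction of the year times the number of days in
    that year (366 for leap years), counted from January 1 at 00:00.
    """
    year = int(decimal_year)
    days_in_year = 366 if calendar.isleap(year) else 365
    elapsed = timedelta(days=(decimal_year - year) * days_in_year)
    return datetime(year, 1, 1) + elapsed


root_age = 2020.1538  # node age of the root reported in the text
root_date = decimal_year_to_datetime(root_age)
print(root_date.strftime("%B %d, %Y"))
```

Under this convention the root falls in late February 2020, within about a day of the date quoted in the text.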
Phylodynamic inferences from Noakhali strains
Analysis of the Noakhali genomes together with strains from all over Bangladesh gave an average evolutionary rate of 1.065 × 10⁻⁵ subs/site/year (95% BCI: 8.26 × 10⁻⁹ – 8.42 × 10⁻⁵) and a most recent common ancestor (TMRCA) date of February 11, 2020 (Table 2). The time-scaled phylodynamic tree of this analysis is shown in Figure 5 and Figure 6.
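A per-site rate is easier to interpret when scaled to the whole genome. The following sketch multiplies the two reported mean rates by an assumed genome length of 29,903 nucleotides, the length of the Wuhan-Hu-1 reference sequence. The alignment length actually used in the analysis is not given in the text, so the resulting figures are only indicative. The names are hypothetical.

```python
# Assumption: length of the Wuhan-Hu-1 reference genome, not a value
# taken from the source; the alignment used in the analysis may differ.
GENOME_LENGTH_NT = 29_903

# Mean clock rates reported in the text, in substitutions/site/year.
MEAN_RATES = {
    "Bangladesh": 1.66e-4,
    "Noakhali": 1.065e-5,
}


def subs_per_genome_per_year(rate_per_site, genome_length=GENOME_LENGTH_NT):
    """Expected substitutions per genome per year under a strict clock."""
    return rate_per_site * genome_length


per_genome = {k: subs_per_genome_per_year(v) for k, v in MEAN_RATES.items()}
for analysis, value in per_genome.items():
    print(f"{analysis}: {value:.2f} substitutions per genome per year")

# Ratio of the two mean rates, independent of the assumed genome length.
rate_ratio = MEAN_RATES["Bangladesh"] / MEAN_RATES["Noakhali"]
print(f"Bangladesh / Noakhali rate ratio: {rate_ratio:.1f}")
```

The ratio at the end does not depend on the genome-length assumption and shows that the two mean estimates differ by more than an order of magnitude.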
Table 2: Summary of the evolutionary rates (in subs/site/year) and estimated MRCA
| Analysis | Clock Model | Coalescent Model | Estimated Rate (Mean Rate) | Substitution Rate (95% HPD) | Estimated MRCA | 95% HPD Interval |
|---|---|---|---|---|---|---|
| Bangladesh | Strict Clock | Exponential Growth | 1.66E-04 | 7.059 × 10⁻⁵ – 3.7889 × 10⁻⁴ | February 25, 2020 | October 11, 2019 – October 10, 2020 |
| Noakhali | Strict Clock | Exponential Growth | 1.07E-05 | 8.2648 × 10⁻⁹ – 8.4286 × 10⁻⁵ | February 11, 2020 | - |