SARS-CoV-2 Genome Variants Epidemiology Surveillance in Ethiopia and Targeted Mutational Changes of Structural S Spike Proteins Through Computational Analysis

Objective: This study aims to identify the SARS-CoV-2 variants that were circulating in Ethiopia and to detect dynamic mutational changes affecting spike antigenicity, based on genome data analysis, in order to inform preventive measures against the pandemic. Results: The genomes from Ethiopia were confirmed to be evolutionarily related to RaTG13 and SL-bat coronaviruses, and the spike receptor sites were conserved. The genomes were distributed among clades GH, GR and other (O), which may indicate new variants. Three samples from female patients were identified as the variant of concern VUI202012/01 GRY (B.1.1.7); Pango lineage B.1.1.7 originated in the UK. Among 21 notable mutations, D614G (71%), D614X (28%), N501Y (35%) and NSP5 S284G (21%) occurred predominantly in our genome samples and could affect antigenicity and infectivity. The N440K mutation was observed in one sample and may confer resistance to neutralization by an antibody that contacts it through SER-52, and hence vaccine escape.


Introduction
The novel coronavirus has become a major threat to the world, killing around 4 million people out of about half a billion confirmed cases as of June 2021 [1]. According to the Ministry of Health, the virus first entered Ethiopia in March 2020 with a Japanese passenger who flew into the capital, Addis Ababa. The number of people infected continues to increase: as of June 2021, the country had registered 4,220 deaths and 273,175 confirmed cases, and 1,894,717 vaccine doses had been administered [1,2].
Since the virus entered and spread in Ethiopia, diagnostic techniques such as the SARS-CoV-2 antibody IgG test have been used in the country [3], but genomic studies have not yet been carried out. Although a number of whole-genome sequences of Ethiopian SARS-CoV-2 have been sequenced and deposited in the GISAID database, no previous study has characterized them genomically or identified the variants in the country. It is important to know which structural mutations and strains are circulating in Ethiopia, so that the potential of the variants can be assessed and dynamic mutations can be monitored and controlled. At present, SARS-CoV-2 carrying the D614G and N501Y mutations is the most frequently detected form globally. Patients infected with the D614G mutant have higher viral loads, and the virus is difficult to detect with the usual techniques [4]. Neutralization of the B.1.1.7 variant has been examined with monoclonal antibodies (mAbs), with sera from patients, and with sera from recipients of the Oxford-AstraZeneca and Pfizer-BioNTech vaccines [5]. The B.1.351 variant, which carries N501Y, has also been confirmed to resist therapeutic monoclonal antibodies [6]. A low-income country such as Ethiopia will therefore find it very difficult to cope with these dynamic mutations without genomic surveillance.
In this study, we reconstructed the evolutionary relatedness of the SARS-CoV-2 genomes and detected the variants of concern that were circulating in Ethiopia. We identified the conserved sites of the receptor-binding domain (RBD) and the mutational dynamics of the structural spike (S) glycoprotein, and assessed their potential effect on monoclonal antibody (mAb) neutralization. Our study provides evidence on the variants and dynamic mutational changes that occurred in Ethiopia and can serve as baseline research for tracking the epidemiology of the virus.

Materials And Methods
Fourteen (N = 14) whole-genome sequences of SARS-CoV-2 from COVID-19-positive patients in Ethiopia were retrieved from GISAID and compared by blastn, and 67 (n = 67) complete genomes from other regions were purposively selected for further analysis. The multiple sequence alignment (MSA) was generated with CLUSTALX, and the phylogenetic tree was constructed with MEGA X (v. 10.1.7) [7].
Variants, clade distribution, mutational changes in the structural spike and non-structural proteins, and other epidemiological analyses were obtained through the GISAID CoVsurver platform (https://www.gisaid.org/). Protein amino acid sequences were analyzed using Jalview (version 2.

Phylogenetic tree analysis related to precursor
To study the evolutionary relationships of the virus that entered Ethiopia, a phylogenetic tree was constructed in MEGA using the neighbor-joining method and the Poisson model, with a bootstrap test of 1000 replicates. Bat coronavirus RaTG13 (MN996532.1), previously isolated in Yunnan province, was the closest ancestral relative of all Ethiopian SARS-CoV-2 genomes, followed by bat-SL-CoVZC45 (MG772933.1) and bat-SL-CoVZXC21 (MG772934.1). In contrast, the MERS coronaviruses (KT806053.1 and MK564474.1) were genetically distant, as inferred from Fig. 1A. This finding is provisionally consistent with [10,11], which reported that Rhinolophus affinis is considered a natural reservoir of the coronavirus and a potential source of future epidemics.
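The Poisson model mentioned above converts the observed proportion of differing amino acid sites, p, into an estimated number of substitutions per site, d = -ln(1 - p). The following is a minimal sketch of that correction, not the MEGA implementation: the function names are hypothetical, the input sequences are toy strings, and skipping gap columns pair by pair is an assumption about gap handling.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (an assumed pairwise-deletion treatment of gaps).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # every site differs; the correction is undefined
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not taken from the study data).
print(poisson_distance("ACDEFGHIKL", "ACDEFGHIKV"))
```

A matrix of such pairwise distances is what a neighbor-joining algorithm takes as input; bootstrap support is then obtained by resampling alignment columns and rebuilding the tree.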
Although 11 of the SARS-CoV-2 genome samples from Ethiopian patients grouped at the same genetic distance (marked in red on the tree), EPI_ISL_1897884, EPI_ISL_1899281 and EPI_ISL_1899047 clustered in a separate sub-group that was associated by genotype with SARS-CoV-2 samples from patients in Nigeria, Tunisia, Gabon and Reunion (yellow nodes in the phylogenetic tree of Fig. 1A).
Four of the samples (EPI_ISL_1169919, EPI_ISL_1170958, EPI_ISL_1170957 and EPI_ISL_1170956) were assigned to the clade other (O), which has not yet been defined. Seven were classified in clade GH (EPI_ISL_1899485, EPI_ISL_1898814, EPI_ISL_1898554, EPI_ISL_1898258, EPI_ISL_1897648, EPI_ISL_1897646 and EPI_ISL_1897644). The remaining three (EPI_ISL_1899281, EPI_ISL_1899047 and EPI_ISL_1897884) were clearly distinct: they were classified in clade GR and confirmed as the new strain VUI202012/01 GRY (B.1.1.7). Pango lineage B.1.1.7 was first detected in the UK and is now circulating in more than 134 countries, including Ethiopia.
We also compared the spike glycoprotein with those of its ancestral relatives, the Wuhan genome and SARS-like viruses, which showed gradual mutational change in the spike protein relative to its ancestors. According to GISAID (2020), the first four genome samples (EPI_ISL_1169919, EPI_ISL_1170956, EPI_ISL_1170957 and EPI_ISL_1170958) had about 30 amino acid changes and 97.9% identity to bat/Yunnan/RaTG13/2013, 232 changes and 81.36% identity to SARS-like/Bat/Nanjing/SL-CoVZXC21/2015, and 226 changes and 81.86% identity to SARS-like/Bat/Nanjing/SL-CoVZC45/2017, but only 4 changes and 99.6% identity to hCoV-19/Wuhan/WIV04/2019 (Fig. 1: A2, B2, C2 and D2). This implies that mutational change in the structural spike began decades ago and continues in present-day SARS-CoV-2. The finding shows how the spike protein has changed over time compared with earlier coronaviruses, and suggests that it will probably continue to change substantially.
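The comparison above reports, for each reference, a count of amino acid changes and a percent identity. As a rough illustration of how the two quantities relate, the sketch below lists substitutions between two aligned, gap-free sequences and derives the identity from them. The function name, the toy fragments and the start position are hypothetical and chosen only for illustration; this is not the CoVsurver procedure, which may handle gaps and alignment length differently.

```python
def compare_aligned(reference, query, start=1):
    """Compare two aligned, gap-free protein sequences of equal length.

    Returns (substitutions, percent_identity), where each substitution is
    written as <reference residue><position><query residue>, e.g. 'D614G'.
    `start` is the residue number of the first aligned position.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    substitutions = [
        f"{r}{pos}{q}"
        for pos, (r, q) in enumerate(zip(reference, query), start=start)
        if r != q
    ]
    identical = len(reference) - len(substitutions)
    return substitutions, 100.0 * identical / len(reference)


# Toy 8-residue fragments; the numbering is illustrative only.
subs, identity = compare_aligned("QDVNCTEV", "QGVNCTEV", start=613)
print(subs, identity)  # ['D614G'] 87.5
```

Identity figures computed this way depend on the alignment length used as the denominator, which is one reason reported percentages can differ slightly between tools.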

SARS CoV 2 Spike dynamic mutational changes and its antigenicity
A total of 14 SARS-CoV-2 genomes from Ethiopia had been sequenced and deposited in GISAID as of April 2021 [12,13], and a number of structural and non-structural mutational changes were evaluated. In particular, this analysis considered changes in the S gene (spike protein) that affect the receptor-binding site and can alter host receptor binding or antigenicity.
In total, 21 notable mutational variations with probable effects on antigenicity were recorded: G142A; Y144del(143); E180Q; E309Q; P330L; N440K; N501Y; A520S; A570D; D614G; P681H(674); T716I; A771S; S939F; S982A; T1006I; D1118H; K1073N; A1078S; and the non-structural S13T (N-term) and M1229I (C-term), as depicted in Fig. 2. The Spike D614G, Spike N501Y, Spike M1229I and NSP12 P323L mutations were highly prevalent among the SARS-CoV-2 genomes circulating in Ethiopia. Some mutational changes occurred exclusively in Ethiopia, such as Spike T478X, Spike H69X, Spike N440K and Spike D614X. Genetic mutational changes are therefore expected to vary depending on the immune response of the host cells and to fluctuate continually with each patient's reaction. Although most mutations may be harmless, the spike glycoprotein mutation D614G helps strengthen the survival capability of the virus [14]. The N501Y mutation might alter the viral phenotype and its fusion with, or degradation of, receptor cells, whereas NSP5 S284G lies in non-structural protein 5, which plays a role in replication of the viral genome in the host and in the formation of viral factories.
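Mutation labels such as D614G or Y144del(143) encode the reference residue, the position and the change, and prevalence is the share of genomes carrying a given label. The sketch below shows one way to parse such labels and tally prevalence across samples. The regular expression, the function names and the toy sample table are assumptions made for illustration; they are not part of the GISAID tooling, and the toy data are not the study's data.

```python
import re
from collections import Counter

# <ref residue><position><alt residue or 'del'>, optionally followed by a
# parenthesised note such as '(143)' or '(N-term)'. Assumed label format.
MUTATION_RE = re.compile(
    r"^(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z]|del)(?:\((?P<note>[^)]*)\))?$"
)


def parse_mutation(label):
    """Split a label like 'D614G' into (ref, position, alt, note)."""
    match = MUTATION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return (
        match.group("ref"),
        int(match.group("pos")),
        match.group("alt"),
        match.group("note"),
    )


def prevalence(samples):
    """Percent of samples carrying each mutation.

    `samples` maps a sample identifier to an iterable of mutation labels.
    """
    counts = Counter()
    for labels in samples.values():
        counts.update(set(labels))  # count each mutation once per sample
    total = len(samples)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical toy table of four samples.
toy_samples = {
    "sample_1": ["D614G", "N501Y"],
    "sample_2": ["D614G"],
    "sample_3": ["N440K"],
    "sample_4": ["D614G", "N501Y"],
}
print(parse_mutation("Y144del(143)"))
print(prevalence(toy_samples))
```

By the same arithmetic, a mutation present in 10 of 14 genomes would come out at about 71%.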

RBD of Spike conserved region and mAb neutralizations against to Spike
It is important to investigate the function of the viral receptor-binding domain (RBD), its motif and the targeted host receptor, because the interplay of amino acid sequence and gene machinery gives sufficient information about how the virus enters and evades receptor cells, and about how inhibitor drugs, neutralizing antibodies and vaccines can act. Searching for conserved domains of the spike protein is therefore one technique for revealing the molecular evolution of genes and organisms. The N-terminal domain (NTD) and C-terminal domain (CTD) of the S1 and S2 subunits of the spike (S) protein (reference accession QSM35600.1, from a sample taken from an Ethiopian patient) were confirmed to be conserved among the betacoronaviruses of the subgenus Sarbecovirus (lineage B). These viruses use the C-domain to bind their receptor, with higher binding affinity than the earlier SARS and MERS coronaviruses; the binding polypeptide sites of the RBD and the receptor-binding motif (RBM) were also noted. This enables the virus to attach tightly to receptor cells [15].
Consistent with the conserved-domain finding, the fusion peptide site of the S spike protein in our sample (Ethiopia QSM35600.1) is highly conserved with respect to pangolin coronavirus (QIQ54048.1), bat-SL-CoVZXC21 (AVP78042.1), HKU3 (Q3LZX1.1) and SARS-CoV-2 (PDB: 6VSB_A and 6ACC_A, 2021). However, only SARS-CoV-2 has a functional polybasic (furin) cleavage site (R, arginine / S, serine), which is absent in SARS coronavirus (Fig. 3). Changes in the receptor domain affect binding affinity, so the virus is more likely to be transmitted to humans than the earlier coronaviruses. Structural modeling of the annotated spike protein (Ethiopia QSM35588.1) and ACE2 confirmation were done with Modeller, using an analogous crystallographic structure from the Protein Data Bank. Specifically, sample EPI_ISL_1898258 carried the N440K amino acid change, which is expected to affect neutralization by human antibodies and to provide a vaccine escape mechanism: in the PyMOL analysis of PDB 7K8T (Fig. 3, B1 & B2), ASN-440 of the SARS-CoV-2 glycoprotein interacts with SER-52 of the antibody, so the substitution might change the mode of action entirely.
This indicates how the virus could re-invade the community by evading existing antibodies or vaccines. Consequently, the total number of confirmed cases and deaths has increased according to the national COVID-19 dashboard, as shown in the graphs of Fig. 3 (F & G).

Conclusion And Discussion
The complete SARS-CoV-2 genomes from COVID-19-positive patients in Ethiopia are most closely related in evolutionary terms to bat coronavirus RaTG13, followed by bat-SL-CoVZC45 and bat-SL-CoVZXC21, and are genetically distant from MERS coronavirus. The genomes were assigned to clades GR, GH and other (O), which may point to new strains. The variant of concern VUI202012/01 GRY (B.1.1.7) was detected, and VOC GH/501Y.V2 (B.1.351) may also be present, since we recorded the Spike N501Y mutation multiple times. Because deaths, cases and hospitalizations have increased in Ethiopia, variants of concern are expected to be involved. The spike protein of SARS-CoV-2 has been conserved from its precursor coronaviruses and uses the same binding sites for ACE-2. The D614G, D614X, N501Y, T478X, N440K and NSP5 S284G mutations, in particular, had a significant impact in the country. Thus, existing RT-PCR primer diagnostic kits and vaccines on the market may not be effective, and the virus could escape the immune system. The study strongly recommends that the responsible government bodies set strict preventive measures to control the variants circulating in Ethiopia, and that further clinical and genomic studies be conducted.

Limitations
The study has the following limitations. Few genome sequences from Ethiopia were available at the time of the study. The 67 selected genomes are a small sample compared with the more than one hundred thousand sequences available worldwide.

Author's contributions
Abaysew Ayele: proposed the idea, analyzed the data using software and wrote the manuscript. Rita Majumdar: supervision and advice. Yohannes Stitotaw: data retrieval. Tesfaye Adisu: literature review.

Figure 1
Phylogenetic tree of pandemic SARS-CoV-2, bat SARS-like coronaviruses and MERS coronavirus, constructed using the neighbor-joining method with a single amino acid substitution model (Poisson model) and a bootstrap test of 1000 replicates (A).

GISAID analysis of the interaction of ACE-2 (deep green) with the dynamic amino acid changes in the 14 samples from Ethiopia. Mutations colored blue are common (more than 100 occurrences); those colored yellow have not occurred multiple times in the amino acid sequence and are responsible for attachment, fusion, antigenicity and neutralization in host cells; those in black are not yet defined; insertions or deletions of amino acid residues are colored cyan; and mutations that remove a potential glycosylation site are colored magenta, on the S spike, with the genome sequences of Ethiopian COVID-19-positive patients as query and SARS- [...] 1) as reference, where the S1/S2 cleavage region is shown in blue, the S2' furin cleavage site of membrane-mediated fusion in green, the hydrophobic interaction with host cells in black, and conserved regions highlighted in yellow (C); the interaction between the SARS-CoV-2 glycoprotein mutation N440K (red, B2) and the neutralizing antibody (purple, PDB 7K8T) (E); total number of COVID-19 cases to date (F) and total number of confirmed deaths (G) in Ethiopia, as per the COVID-19 dashboard.

Supplementary Files
This is a list of supplementary files associated with this preprint. TableS1.docx