Evolutionary and genomic analysis of four SARS-CoV-2 isolates circulating in March 2020 in Sri Lanka; Additional evidence on multiple introductions and further transmission

doi:10.21203/rs.3.rs-44134/v1

Abstract

Understanding the molecular epidemiology of the virus and mapping its spread help to explain the evolution of an epidemic and to apply control measures quickly. This study provides genomic evidence of multiple SARS-CoV-2 introductions into Sri Lanka and of virus evolution during circulation. Whole-genome sequences of four SARS-CoV-2 strains obtained from COVID-19 positive patients reported in Sri Lanka during March 2020 were compared with sequences from Europe and elsewhere. Phylogenetic analysis showed that the sequence from the first local patient, sampled on 10th March, who had been in contact with tourists from Italy, clustered with SARS-CoV-2 strains collected in Italy, Germany, France and Mexico. The sequence of the isolate obtained on 19th March clustered in the same group, together with samples collected in March and April from Belgium, France, India and South Africa. The other two Sri Lankan strains segregated from the main cluster: the sample collected on 31st March clustered with a sequence from England, and the sample collected on 16th March showed the highest genetic divergence from the Wuhan, China reference isolate. Here we report the first molecular epidemiological study of SARS-CoV-2 circulating in Sri Lanka. The findings demonstrate the robustness of molecular epidemiological tools and their usefulness for tracing possible exposures in disease transmission during the pandemic.

Veterinary Epidemiology

COVID-19

SARS-CoV-2

molecular evolution

phylogeny

Sri Lanka

1. Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in late 2019 as the cause of coronavirus disease 2019 (COVID-19). The virus spread rapidly throughout the world, with a disruptive effect on human livelihoods and the economy. As of 11th May 2020, 4,063,525 confirmed cases, including 282,244 deaths, had been reported worldwide from more than 212 countries, according to the European Centre for Disease Prevention and Control [1]. The World Health Organization (WHO) declared a global health emergency at the end of January 2020 [2]. In Sri Lanka, the first local case of COVID-19 was recorded on 11th March 2020 in a 58-year-old male. As of 9th May 2020, there were 835 confirmed cases of COVID-19, including 9 deaths and 255 recoveries [3].

A robust surveillance system, an understanding of the molecular epidemiology of the virus, and mapping of its spread can help to explain the evolution of an epidemic and to apply control measures quickly [3,4]. Genomic epidemiology mapping of the COVID-19 virus has shown that the virus is undergoing mutation [5]. Therefore, in addition to confirming the presence of the virus, the WHO recommends regular sequencing of a percentage of specimens from clinical cases to monitor viral genome mutations that might affect medical countermeasures, including diagnostic tests [4]. The whole genomes of four SARS-CoV-2 strains obtained from COVID-19 positive local patients were sequenced and deposited in the Global Initiative on Sharing All Influenza Data (GISAID) EpiCoV™ database. This study was conducted to investigate the evolution of SARS-CoV-2 strains in Sri Lanka and their genetic relatedness to other reported SARS-CoV-2 strains.

2. Materials and Methods

2.1 Sri Lankan SARS-CoV-2 sequences

The whole-genome sequences of four SARS-CoV-2 strains obtained from COVID-19 positive patients, including the first local positive case reported on 11th March 2020, were deposited in the GISAID EpiCoV™ database⁵ and were used for this study (Table 1).

2.2 Selection of SARS-CoV-2 isolates

To further understand the molecular epidemiology of the COVID-19 outbreak in Sri Lanka, 46 isolates were selected from GenBank (National Center for Biotechnology Information, NCBI) using the nucleotide Basic Local Alignment Search Tool (BLASTn), based on the highest identity and lowest expected value (E-value) against the Sri Lankan isolates. The dataset used for this analysis comprised these 46 SARS-CoV-2 complete genomes from countries in Asia, Africa, Australia, Europe and North America, together with the four Sri Lankan isolates retrieved from GISAID⁵ by 28th April 2020. The strain isolated in Wuhan in December 2019 (NCBI accession number NC_045512.2) was used as the reference genome.
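The selection rule above (highest identity first, then lowest E-value) can be sketched in Python. This is a minimal sketch, assuming the BLASTn hits were exported in the default 12-column BLAST+ tabular format (`-outfmt 6`); the paper describes the selection only in prose, and the helper names `parse_outfmt6` and `select_top_subjects` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    query: str     # qseqid: query sequence (e.g. a Sri Lankan genome)
    subject: str   # sseqid: database sequence (e.g. a GenBank accession)
    pident: float  # percent identity
    evalue: float  # expected value


def parse_outfmt6(lines):
    """Parse BLAST+ tabular output in the default -outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        hits.append(BlastHit(cols[0], cols[1], float(cols[2]), float(cols[10])))
    return hits


def select_top_subjects(hits, n):
    """Keep each subject's best hit across all queries, then rank subjects by
    highest percent identity first and lowest E-value second."""
    best = {}
    for hit in hits:
        score = (-hit.pident, hit.evalue)  # a smaller tuple means a better hit
        if hit.subject not in best or score < best[hit.subject]:
            best[hit.subject] = score
    # Ties on (identity, E-value) are broken alphabetically by subject ID.
    ranked = sorted(best.items(), key=lambda item: (item[1], item[0]))
    return [subject for subject, _ in ranked[:n]]
```

Ranking on identity before E-value follows the order in which the criteria are listed above; for near-identical full-length genomes BLAST typically reports E-values of 0.0, so in practice identity does most of the discriminating.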

2.3 Whole-genome sequence alignment and Phylogenetic analysis

Sequences were aligned with Multiple Sequence Comparison by Log-Expectation (MUSCLE) [6]. Following alignment, single nucleotide polymorphisms (SNPs) and amino acid variations were analysed in MEGA X [7], using as reference the first SARS-CoV-2 sequence from Wuhan, China (GenBank accession number NC_045512), deposited in GenBank in December 2019. The evolutionary history was inferred with Molecular Evolutionary Genetics Analysis version X (MEGA X) [7] using the Neighbour-Joining method with the Maximum Composite Likelihood method and the Hasegawa-Kishino-Yano (HKY) model as the best-fitting model [8], with 1000 bootstrap replicates.
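For readers unfamiliar with the HKY model named above, its instantaneous rate matrix can be written as follows. This is the standard textbook form of the model [8], not a record of the authors' MEGA X settings. With equilibrium base frequencies π_A, π_C, π_G, π_T, a transition/transversion rate ratio κ and an overall rate scaling μ, the rate of change from nucleotide i to nucleotide j is:

```latex
q_{ij} =
\begin{cases}
  \mu\,\kappa\,\pi_j, & i \neq j \text{ and } i \to j \text{ is a transition (A}\leftrightarrow\text{G or C}\leftrightarrow\text{T)},\\
  \mu\,\pi_j,         & i \neq j \text{ and } i \to j \text{ is a transversion},\\
  -\sum_{k \neq i} q_{ik}, & i = j.
\end{cases}
```

Setting κ = 1 reduces HKY to the F81 model, and additionally setting every π_j = 1/4 gives the Jukes-Cantor model; κ is what allows transitions to occur at a different rate from transversions.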

3. Results

3.1. Phylogenetic tree analysis

The maximum likelihood phylogenetic tree in Figure 1 shows that two of the Sri Lankan SARS-CoV-2 isolates (GISAID accession IDs EPI_ISL_428671 and EPI_ISL_428672), collected on 10th March 2020 and 19th March 2020 respectively, cluster with isolates from Italy, Germany, France and Mexico that were collected before 10th March 2020. The Sri Lankan isolate EPI_ISL_428673, collected on 31st March 2020, clustered with an isolate obtained in England on 9th February 2020, while the Sri Lankan isolate EPI_ISL_428670, collected on 16th March 2020, showed the highest evolutionary distance from the SARS-CoV-2 sequence originating in Wuhan, China (GenBank accession number NC_045512.2).
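Ranking isolates by divergence from the Wuhan reference, as in the comparison above, can be illustrated with an uncorrected p-distance. This is a minimal sketch, assuming the sequences are already aligned to equal length; the tree itself was built from model-based distances in MEGA X, which correct for multiple substitutions at the same site, so these values are only an approximation. The function names and any example sequences are hypothetical.

```python
VALID_BASES = frozenset("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, counting
    only columns where both sequences carry an unambiguous base (A, C, G, T)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            differences += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def rank_by_divergence(aligned, reference_id):
    """Return (sequence_id, p-distance to reference) pairs, most divergent first."""
    reference = aligned[reference_id]
    distances = {
        seq_id: p_distance(seq, reference)
        for seq_id, seq in aligned.items()
        if seq_id != reference_id
    }
    return sorted(distances.items(), key=lambda item: (-item[1], item[0]))
```

Because p-distance ignores repeated substitutions at one site, it underestimates divergence between distant sequences; for closely related genomes it tends to give a similar ordering, but that is not guaranteed.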

Figure 1. Phylogenetic analysis of the four Sri Lankan SARS-CoV-2 complete genome sequences retrieved in this study, together with selected complete genome sequences from different countries (n = 50 genome sequences). Strain names are given as name followed by country of origin, GISAID accession number and sample collection date. All Sri Lankan SARS-CoV-2 isolates are indicated by red triangles (▲). The main clusters are highlighted in different colors. The Wuhan reference genome (GenBank accession number NC_045512.2) is shown in a larger font. Filled circles mark the main supported clusters, and bootstrap support values are indicated at the nodes. The tree was built with the best-fitting substitution model (HKY) in MEGA X [7]. Sequence data were obtained from GISAID. GISAID: Global Initiative on Sharing All Influenza Data; HKY: Hasegawa, Kishino and Yano; MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2.
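The bootstrap support values shown at the nodes come from resampling alignment columns with replacement and counting how often a grouping reappears. The sketch below implements only the resampling step faithfully; as a deliberate simplification, it replaces tree reconstruction (Neighbour-Joining in MEGA X) with a hypothetical "mutual nearest neighbour" check on Hamming distances, so the percentages it returns are not comparable to the values in Figure 1. All function names are illustrative.

```python
import random


def hamming(a, b):
    """Number of mismatched columns between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (nonparametric bootstrap)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must be aligned to the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def nearest_neighbour(alignment, name):
    """Closest other sequence by Hamming distance; ties are broken by name."""
    others = (other for other in alignment if other != name)
    return min(others, key=lambda other: (hamming(alignment[name], alignment[other]), other))


def pair_support(alignment, a, b, replicates=1000, seed=1):
    """Percentage of bootstrap replicates in which a and b are mutual nearest
    neighbours (a toy stand-in for 'the clade {a, b} is recovered')."""
    rng = random.Random(seed)
    recovered = 0
    for _ in range(replicates):
        replicate = bootstrap_replicate(alignment, rng)
        if nearest_neighbour(replicate, a) == b and nearest_neighbour(replicate, b) == a:
            recovered += 1
    return 100.0 * recovered / replicates
```

The default of 1000 replicates mirrors the number of bootstrap replicates reported in the Methods; a fixed seed makes a run reproducible.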

3.2. SNPs Analysis

Fifteen SARS-CoV-2 genome sequences that cluster mainly with the four Sri Lankan strains were compared with the Wuhan reference to identify viral genome mutations and amino acid variations. The SNPs found along the whole genome are listed in Table 2 (positions refer to the reference sequence, GenBank accession number NC_045512). The genome sequence EPI_ISL_428671 from the first local patient differed from the reference genome at six nucleotide positions, while the other three Sri Lankan sequences, EPI_ISL_428670, EPI_ISL_428672 and EPI_ISL_428673, differed at six, five and four nucleotide positions, respectively (Table 2). EPI_ISL_428671 and EPI_ISL_428672, which clustered with the main group of European isolates in the phylogenetic tree, shared three SNPs, at positions 3037, 14408 and 23403 (Table 2).
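Listing SNPs with positions "referred to the reference sequence", as in Table 2, requires mapping alignment columns back to reference coordinates and then intersecting the per-sample lists to find shared sites. A minimal sketch, assuming each query has been aligned against NC_045512.2 and both are given as gapped strings of equal length; the function names are hypothetical, and MEGA X performs this step through its own interface rather than with this code.

```python
BASES = frozenset("ACGT")


def call_snps(aligned_ref, aligned_query):
    """Return (position, ref_base, alt_base) tuples in 1-based reference
    coordinates. Columns where the reference has a gap (insertions in the
    query) have no reference coordinate and are skipped; ambiguous bases
    such as N and deletions in the query are not reported as SNPs."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for r, q in zip(aligned_ref.upper(), aligned_query.upper()):
        if r == "-":
            continue
        ref_pos += 1
        if r in BASES and q in BASES and r != q:
            snps.append((ref_pos, r, q))
    return snps


def shared_snps(*snp_lists):
    """SNPs present in every sample, sorted by reference position."""
    common = set(snp_lists[0]).intersection(*snp_lists[1:])
    return sorted(common)
```

Comparing the SNP lists of two genomes with `shared_snps` is the kind of comparison behind the shared sites reported above for EPI_ISL_428671 and EPI_ISL_428672.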

3.3 Amino acid variations

Table 3 shows the resulting changes at amino acid positions in the derived proteins (positions refer to the reference sequence, GenBank accession number NC_045512). In the four Sri Lankan whole-genome sequences, only SNPs in the Open Reading Frame (ORF) 1ab, S, ORF3a, M and N genes resulted in amino acid changes at the corresponding positions of the translated proteins; the remaining SNPs did not change the amino acid sequence (Table 3).

Apart from the first Sri Lankan isolate, collected on 10th March (EPI_ISL_428671), the other three Sri Lankan isolates carried a total of six mutations in the ORF1ab protein relative to the reference (Table 3). The Sri Lankan strains EPI_ISL_428671 and EPI_ISL_428672 share a mutation in the S protein at amino acid position 614 (nucleotide 23403). Strain EPI_ISL_428673 carries a single mutation in the ORF3a protein, at amino acid position 251 (nucleotide 26144). In strain EPI_ISL_428670, the N protein shows one mutation, at amino acid position 398 (nucleotide 29465), while strain EPI_ISL_428671 carries N mutations at amino acid positions 203 (nucleotide 28882) and 204 (nucleotide 28883) relative to the reference strain.
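Converting a genome position into a codon number and an amino acid change, as done for Table 3, is offset arithmetic once the start of the coding sequence is known. A minimal sketch using the standard genetic code; the S coding sequence start (21563) comes from the NC_045512.2 annotation, and the function names are hypothetical. The simple arithmetic does not hold for ORF1ab positions downstream of its -1 ribosomal frameshift, and substitutions affecting several bases of one codon (such as the adjacent changes at 28881 to 28883 in N) must be applied to the codon together rather than one at a time.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order at each position ("*" = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_index(genome_pos, cds_start):
    """Map a 1-based genome position to (1-based codon number, 0-based
    position within the codon) for a coding sequence without frameshifts."""
    offset = genome_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3


def classify_change(ref_codon, codon_pos, alt_base):
    """Return (ref_aa, alt_aa, is_synonymous) for a single-base substitution."""
    alt_codon = ref_codon[:codon_pos] + alt_base + ref_codon[codon_pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return ref_aa, alt_aa, ref_aa == alt_aa


if __name__ == "__main__":
    # S protein: position 23403 lies in codon 614, second base; reference codon GAT.
    codon, pos = codon_index(23403, 21563)
    print(codon, classify_change("GAT", pos, "G"))  # 614 ('D', 'G', False)
```

Here the A-to-G change turns GAT into GGT, which corresponds to the shared S protein mutation at position 614 described above.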

4. Discussion

In this study, the virus strain from the first local patient (EPI_ISL_428671), collected on 10th March 2020 from a tour guide who had direct contact with Italian tourists [3], clustered together with isolates from Italy, Germany and Mexico. This evolutionary evidence shows that the first Sri Lankan SARS-CoV-2 sequence had the highest genomic similarity to Italian and other European isolates, supporting the reported exposure of the first patient to tourists from Italy. Although the history and origin of infection of the remaining patients were not reported, the clustering of the other Sri Lankan isolates with sequences from the database provides clues to the possible sources of infection. Furthermore, the genomic relatedness of the Sri Lankan SARS-CoV-2 sequences further supports the exposure histories of the patients reported by the Epidemiology Unit, Ministry of Health, Sri Lanka [6]. More importantly, this study highlights the importance of tracking the history of infection to trace the contacts of infected people, particularly asymptomatic patients in Sri Lanka. The mutations found in the viruses identified in Sri Lanka relative to the Wuhan reference strain, and the resulting amino acid changes, should be monitored further to determine whether they affect the virulence of the virus or the clinical manifestations of the disease. This study had limitations, mainly the lack of epidemiological information for the genomes available in the database and the small number of Sri Lankan genome sequences available at the time of analysis. Nevertheless, the information obtained may assist in understanding the evolutionary dynamics and local transmission of SARS-CoV-2 circulating in Sri Lanka.

5. Conclusion

In conclusion, the results of the present study indicate that SARS-CoV-2 sequences from Sri Lanka have the highest genomic similarity to isolates from Italy and elsewhere in Europe. This was a preliminary study, and further studies are needed to increase our knowledge of SARS-CoV-2 isolates in Sri Lanka. Because mutational variants can alter the presentation of COVID-19, the molecular epidemiological tools used in this study can help to trace possible exposures, support epidemiological analysis, and inform the development of effective treatments, including vaccines.

Abbreviations

SARS-CoV-2	Severe Acute Respiratory Syndrome Coronavirus 2
COVID-19	Coronavirus Disease 2019
WHO	World Health Organization
GISAID	Global Initiative on Sharing All Influenza Data
NCBI	National Center for Biotechnology Information
BLASTn	Basic Local Alignment Search Tool (nucleotide)
MUSCLE	Multiple Sequence Comparison by Log-Expectation
SNPs	Single Nucleotide Polymorphisms
MEGA X	Molecular Evolutionary Genetics Analysis version X
HKY	Hasegawa-Kishino-Yano model
ORF	Open Reading Frame

Declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

The authors declare consent for publication

Availability of data and materials

Data and material are provided in the supplementary material and available at www.gisaid.org

Competing interests

The authors declare there are no financial or non-financial competing interests.

Funding

No funding was received for this work

Authors' contributions

Dilan Amila Satharasinghe: Conceptualization, Design of the work, Analysis, Interpretation of data, Resources, Supervision, Writing - original draft, Writing -review & editing. Parakatawella Mudiyanselage Shalini Daupadi Kumari Parakatawella: Analysis, Writing - original draft. Jayasekara Mudiyanselage Krishanthi Jayarukshi Kumari Premarathne: Design of the work, Writing - review & editing. L.J.P. Anura P. Jayasooriya, Gamika A. Prathapasinghe: Writing - review & editing. Swee Keong Yeap: Supervision, Writing - review & editing.

Acknowledgements

We gratefully acknowledge the Authors, the Originating and Submitting Laboratories for their sequence and metadata shared through GISAID and NCBI, on which this research is based. All submitters of data may be contacted directly via www.gisaid.org. The Acknowledgments Table for GISAID and NCBI is reported as Supplementary material.

[1] Shereen M, Khan S, Kazmi A, Bashir N, Siddique R. COVID-19 infection: origin, transmission, and characteristics of human coronaviruses. Journal of Advanced Research. 2020;24:91-98. doi:10.1016/j.jare.2020.03.005.
[2] Mahase E. China coronavirus: WHO declares international emergency as death toll exceeds 200. BMJ. 2020;m408. doi:10.1136/bmj.m408.
[3] Cheng V, Wong S, Chen J, Yip C, Chuang V, Tsang O, et al. Escalating infection control response to the rapidly evolving epidemiology of the coronavirus disease 2019 (COVID-19) due to SARS-CoV-2 in Hong Kong. Infection Control & Hospital Epidemiology. 2020;41(5):493-498. doi:10.1017/ice.2020.58.
[4] Wang Y, Wang Y, Chen Y, Qin Q. Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (COVID-19) implicate special control measures. Journal of Medical Virology. 2020;92(6):568-576. doi:10.1002/jmv.25748.
[5] Pachetti M, Marini B, Benedetti F, Giudici F, Mauro E, Storici P, et al. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. Journal of Translational Medicine. 2020;18(1). doi:10.1186/s12967-020-02344-6.
[6] Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5(1):113. doi:10.1186/1471-2105-5-113. PMID: 15318951.
[7] Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Molecular Biology and Evolution. 2018;35(6):1547-1549. doi:10.1093/molbev/msy096.
[8] Hasegawa M, Kishino H, Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution. 1985;22(2):160-174.

Table 1. Four Sri Lankan SARS-CoV-2 virus strains available in the GISAID EpiCoV™ database

Virus name	Passage history	Patient details/history	Accession ID	Collection date	Length	Host	Specimen source	Originating lab	Acknowledgment
hCoV-19/Sri Lanka/COV53/2020	Original	1st local patient in Sri Lanka; age range 55-60 years	EPI_ISL_428671	2020-03-10	29,903	Human	Sputum	Centre for Dengue Research, University of Sri Jayewardenepura, Sri Lanka	GISAID (https://www.gisaid.org/, last accessed 28th April 2020; Supplementary material)
hCoV-19/Sri Lanka/COV38/2020	Original	Original Age range 30-33 years	EPI_ISL_428670	2020-03-16	29,903	Human	Sputum
hCoV-19/Sri Lanka/COV91/2020	Original	Original Age range 35-40 years	EPI_ISL_428672	2020-03-19	29,903	Human	Sputum
hCoV-19/Sri Lanka/COV486/2020	Original	Original Age range 45-50 years	EPI_ISL_428673	2020-03-31	29,903	Human	Oropharyngeal swab

Table 2. Single nucleotide polymorphisms (SNPs)^a, indicated in red, identified by comparing four Sri Lankan whole-genome sequences of SARS-CoV-2 with selected SARS-CoV-2 sequences (n = 15 compared sequences)

SARS-CoV-2 sequence ID (country from which the sequence originated)	313	1059	1397	1762	2480	2558	3037	3467	4201	5986	6466	10265	11083	12778	14408	14805	16376	17247	18984	20483	22258	22374	23403	25563	26144	26181	26527	28396	28688	28881	28882	28883	29451	29465
Gene	ORF1ab gene	ORF1ab gene	S gene	ORF3a gene	M gene	N gene
China/Wuhan/NC_045512.2	C	G	C	A	C	G	C	A	G	C	T	G	C	T	A	G	T	C	G	T	G	C
China/Wuhan/EPI_ISL_402119	C	G	C	A	C	G	C	A	G	C	T	G	C	T	A	G	T	C	G	T	G	C
Sri_Lanka/EPI_ISL_428673	C	G	T	A	C	G	C	A	G	C	T	C	G	C	T	A	G	T	C	G	T	G	C
Sri_Lanka/EPI_ISL_428672	C	G	C	A	C	T	G	T	C	A	G	C	T	C	T	G	C	T	A	G	T	G	T	G	C
Sri_Lanka/EPI_ISL_428671	C	G	C	A	C	T	G	C	A	G	C	T	C	T	G	C	T	A	G	T	C	G	T	A	C
Sri_Lanka/EPI_ISL_428670	C	A	C	A	C	G	T	A	G	T	C	T	C	T	A	G	C	G	C	G	C	T
England/EPI_ISL_412116	C	G	C	G	T	C	G	C	A	G	C	T	C	T	G	C	T	A	G	T	C	G	T	G	C
India/EPI_ISL_426415	C	G	C	A	C	T	G	C	G	C	T	C	T	G	C	T	G	T	C	A	T	G	T	C
Germany/EPI_ISL_406862	C	G	C	A	C	T	G	C	A	G	C	T	G	C	T	A	G	T	C	G	T	G	C
Italy/EPI_ISL_412973	C	G	C	A	C	T	G	C	A	G	C	T	C	T	G	C	T	A	G	T	C	G	T	G	C
France/EPI_ISL_420049	C	T	G	C	A	C	T	G	C	A	G	C	T	C	T	G	C	T	A	G	T	G	T	C	G	T	G	C
South Africa/EPI_ISL_421575	C	G	C	A	C	T	G	C	A	G	A	T	C	T	G	C	G	A	G	T	C	G	T	G	C
Mexico/EPI_ISL_412972	C	G	C	A	C	T	G	C	A	G	C	T	C	T	G	C	T	A	G	T	C	G	T	A	C
Germany/EPI_ISL_412912	C	G	C	A	C	T	G	C	A	G	C	T	C	T	G	C	T	A	G	T	C	G	T	A	C
Australia/EPI_ISL_417404	T	C	G	C	A	C	T	G	C	A	G	C	T	C	T	G	C	T	A	G	T	C	G	T	A	C

N: nucleocapsid protein; ORF: Open Reading Frame; ORF1ab: ORF encoding polyprotein; S: surface glycoprotein; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; SNP: single nucleotide polymorphism; UTR: untranslated region.

^a SNPs are shown according to nucleotide positions in the genome sequence and gene location.

SNPs are indicated in red color

Table 3. Amino acid variations^a identified by comparing translations of four Sri Lankan whole-genome sequences of SARS-CoV-2 with those of selected SARS-CoV-2 sequences (n = 15 compared sequences)

SARS-CoV-2 sequence ID (country from which the sequence originated)	265	378	739	765	1068	1312	3334	3606	4171	4847	5371	5661	6240	271	614	57	251	2	203	204	393	398
Protein	ORF1ab	S glycoprotein	ORF3a	M protein	N protein
China/Wuhan/NC_045512.2	T	V	I	P	G	M	G	L	Y	T	R	V	W	Q	D	Q	G	A	R	G	T	A
China/Wuhan/EPI_ISL_402119	T	V	I	P	G	M	G	L	Y	T	R	V	W	Q	D	Q	G	A	R	G	T	A
Sri_Lanka/EPI_ISL_428673	T	V	I	P	G	M	G	L	Y	I	R	A	W	Q	D	Q	V	A	R	G	T	A
Sri_Lanka/EPI_ISL_428672	T	V	I	P	G	I	G	L	Y	T	R	V	W	Q	G	Q	G	V	R	G	T	A
Sri_Lanka/EPI_ISL_428671	T	V	I	P	G	M	G	L	Y	T	R	V	W	Q	G	Q	G	A	K	R	T	A
Sri_Lanka/EPI_ISL_428670	T	I	P	G	M	G	F	Y	T	R	V	S	Q	D	Q	G	A	R	G	T	S
England/EPI_ISL_412116	T	V	S	G	M	G	?	Y	I	R	V	W	Q	D	Q	V	A	R	G	T	A
India/EPI_ISL_426415	T	V	I	P	*	M	G	L	Y	T	R	V	W	R	G	Q	G	A	R	G	I	A
Germany/EPI_ISL_406862	T	V	I	P	G	M	G	L	Y	T	R	V	W	Q	G	Q	G	A	R	G	T	A
Italy/EPI_ISL_412973	T	V	I	P	G	M	G	L	Y	T	R	V	W	Q	G	Q	G	A	R	G	T	A
France/EPI_ISL_420049	I	V	I	P	G	M	G	L	T	R	V	W	Q	G	H	G	A	R	G	T	A
South Africa/EPI_ISL_421575	T	V	I	P	G	M	G	L	*	T	C	V	W	Q	G	Q	G	A	R	G	T	A
Mexico/EPI_ISL_412972	T	V	I	P	G	M	G	L	Y	T	R	V	W	Q	G	Q	G	A	K	R	T	A
Germany/EPI_ISL_412912	T	V	I	P	G	M	S	L	Y	T	R	V	W	Q	G	Q	G	A	K	R	T	A
Australia/EPI_ISL_417404	T	V	I	P	G	M	G	L	Y	T	R	V	W	Q	G	Q	G	A	K	R	T	A

N: nucleocapsid protein; ORF: Open Reading Frame; ORF1ab: ORF encoding polyprotein; S: surface glycoprotein; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; SNP: single nucleotide polymorphism; UTR: untranslated region.

^a SNPs are shown according to nucleotide positions in the genome sequence and gene location. The amino acid positions refer to those in each respective protein sequence of the Wuhan reference (GenBank accession number: NC_045512.2), starting from the first methionine.

* Stop codon

? Possible sequencing error. Amino acid variations are indicated in red color.

SupplementaryFileS1.docx
