Molecular characterisation and tracking of the severe acute respiratory syndrome coronavirus 2, Thailand, 2020–2022

Coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first detected in China in December 2019. To date, there have been approximately 3.4 million reported cases and over 24,000 deaths in Thailand. This study investigated the molecular characteristics and evolution of SARS-CoV-2 identified during 2020–2022 in Thailand. Two hundred and sixty-eight SARS-CoV-2 strains, collected mostly in Bangkok from COVID-19 patients, were characterised by partial genome sequencing. Moreover, 5,627 SARS-CoV-2-positive samples were typed as viral variants [B.1.1.7 (Alpha), B.1.617.2 (Delta), B.1.1.529 (Omicron/BA.1) and B.1.1.529 (Omicron/BA.2)] by multiplex real-time reverse-transcription polymerase chain reaction (RT-PCR) assays. The results revealed that B.1.36.16 caused the predominant outbreak in the second wave (December 2020–January 2021), B.1.1.7 (Alpha) in the third wave (April–June 2021), B.1.617.2 (Delta) in the fourth wave (July–December 2021), and B.1.1.529 (Omicron) in the fifth wave (January–March 2022). The evolutionary rate of the viral genome was 2.60 × 10⁻³ (95% highest posterior density [HPD], 1.72 × 10⁻³ to 3.62 × 10⁻³) nucleotide substitutions per site per year. Continued molecular surveillance of SARS-CoV-2 is crucial for monitoring emerging variants with the potential to cause new COVID-19 outbreaks.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes COVID-19, which was declared a pandemic by the World Health Organization (WHO) on 11 March 2020 and continues to impact public health and world economies [1]. As of 26 December 2021, there were over 278 million confirmed COVID-19 cases, with over 5 million fatalities globally [2]. SARS-CoV-2 is an enveloped virus with a positive-sense, single-stranded RNA genome of approximately 27–32 kb in size [3]. The genome encodes four structural proteins [spike (S), envelope (E), membrane (M), and nucleocapsid (N)] and 16 non-structural proteins (nsp1–nsp16) involved in viral function and replication. The massive circulation of SARS-CoV-2 worldwide and inequitable global vaccine distribution have placed evolutionary pressure on the virus and led to the emergence of new variants [4,5]. An abundance of SARS-CoV-2 genome sequences has been generated rapidly and deposited in a global archive, the Global Initiative on Sharing All Influenza Data (GISAID) database. As of 2 March 2021, significant mutations in the viral genome had led to the classification of the virus into nine clades (S, L, V, G, GH, GK, GR, GV, and GRY) [6]. An epidemiologically relevant phylogenetic cluster of SARS-CoV-2 is further defined as a lineage by the Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) tool [7]. Based on enhanced transmissibility, increased virulence, and reduced neutralisation by antibodies elicited by natural infection or vaccination, all attributed to significant amino acid substitutions, several SARS-CoV-2 variants are classified as variants of concern (VOCs) [8]. To assist public communication and avoid stigmatisation, letters of the Greek alphabet (e.g. Alpha, Beta, Gamma, Delta, Omicron) are used to designate SARS-CoV-2 variants [8].
The first confirmed COVID-19 case in Thailand was reported on 12 January 2020 in a traveller from China [9]. The number of infected individuals surged during March–May 2020 because of transmission linked to boxing events and entertainment venues in Bangkok; the infection then spread throughout Thailand in what was considered the first COVID-19 wave [10]. To contain the viral spread, public health and social measures including mask wearing, physical distancing, movement restrictions, workplace and school closures, and city lockdowns were implemented. The third wave began in early April 2021 with an upsurge of COVID-19 cases linked to an entertainment venue in the Thonglor district of Bangkok [11]. This severe and deadly wave was driven by the emergence of the more transmissible B.1.1.7 (Alpha) SARS-CoV-2 variant of concern (VOC), leading to rising hospitalisations and overwhelmed healthcare facilities [12].
With the exception of the city lockdown, all protective measures implemented by the government were still in effect at this time. Moreover, field hospitals were set up to handle patient isolation. Amid the third wave, the mass vaccination campaign was rolled out on 7 June 2021 to slow down transmission.
However, the supply of vaccines was limited, and only two vaccines, CoronaVac and Vaxzevria, were available at that time. As of 1 July 2021, there were 52,052 confirmed COVID-19 cases, with 1,971 patients classified as having severe illness, of whom 566 required ventilator support [13]. As of 8 July 2021, COVID-19 cases had been reported in all 77 provinces of Thailand. The Centre for COVID-19 Situation Administration (CCSA) declared the fourth wave of the COVID-19 pandemic in Thailand, driven by the highly contagious Delta variant, which is more transmissible than previous SARS-CoV-2 variants [13]. By December 2021, Thailand was experiencing its fifth COVID-19 wave, with approximately 3.4 million confirmed COVID-19 cases and over 24,000 deaths as of the end of March 2022 [14].
In our previous molecular epidemiological investigation of SARS-CoV-2 in Thailand during the first wave of the outbreak in 2020, 40 nasopharyngeal and/or throat swab specimens from 40 patients were found to contain SARS-CoV-2 of clades L, GH, GR, O, and S [15]. Here, we aimed to monitor and track emerging variants of SARS-CoV-2 circulating in Thailand between March 2020 and March 2022. The present study describes epidemiological patterns of SARS-CoV-2 in Thailand that could have implications for more effective disease surveillance and public health preparedness.

Ethics statement
The study protocol was approved by the Institutional Review Board of the Faculty of Medicine of Chulalongkorn University (IRB number 178/64). The institutional review board of the Ethics Committee for human research waived the requirement for written informed consent because the samples were anonymous. All methods were performed in accordance with the relevant guidelines and regulations.

Sample collection and RNA extraction
All nasopharyngeal swab samples were collected as part of outbreak investigations during the first through fifth waves and from state quarantine (SQ) between March 2020 and March 2022 (Fig. 1). During the study, 5,627 selected nasopharyngeal swab samples submitted to the collaborating hospitals and the Institute of Urban Disease Control and Prevention (IUDC) had tested positive for SARS-CoV-2 in routine multiplex real-time reverse transcription polymerase chain reaction (RT-PCR) assays described earlier [15].
Total nucleic acid was extracted from 200 µL of supernatant using a magLEAD 12gC instrument (Precision System Science, Chiba, Japan) according to the manufacturer's instructions prior to analysis in our laboratory.

Genome sequencing
Briefly, partial sequences of the SARS-CoV-2 S, RNA-dependent RNA polymerase, and N genes were amplified using the SuperScript III Platinum One-Step RT-PCR System (Invitrogen, Carlsbad, CA, USA) and 10 sets of oligonucleotide primers (Table S1). The following amplification conditions were used: one cycle of 3 min at 94 °C; 40 cycles of 30 s at 94 °C, 30 s at 53 °C, and 90 s at 68 °C; and a final extension of 7 min at 68 °C. Amplicons were purified with the HiYield Gel/PCR DNA Fragment Extraction kit (RBC Bioscience Co, Taipei, Taiwan) and then subjected to Sanger sequencing. The nucleotide sequences of SARS-CoV-2 were deposited in GenBank under accession numbers OK083891–OK084640, OM984745–OM984850, and OM996047–OM996083 (Table S2).
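As a rough planning aid, the nominal run time implied by the amplification conditions above can be tallied as follows. This is only a sketch: the `Hold` class and `programme_hold_seconds` function are hypothetical names introduced for illustration, the calculation counts programmed hold times only (temperature ramping is ignored), and it covers only the steps listed in the text, so any reverse-transcription step of the one-step system is not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    """One temperature hold in a thermocycler programme (illustrative type)."""
    temp_c: float
    seconds: int


def programme_hold_seconds(initial: Hold, cycle: list[Hold],
                           n_cycles: int, final: Hold) -> int:
    """Sum the programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


# Values transcribed from the amplification conditions stated in the text.
initial = Hold(temp_c=94, seconds=3 * 60)
cycle = [Hold(94, 30), Hold(53, 30), Hold(68, 90)]
final = Hold(temp_c=68, seconds=7 * 60)

total_s = programme_hold_seconds(initial, cycle, n_cycles=40, final=final)
print(f"{total_s} s = {total_s / 60:.0f} min")
```

Under these assumptions the listed holds sum to 6,600 s (110 min); the actual instrument time will be longer because of ramping.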

Multiplex real-time RT-PCR assay
The primers and probes specific to B.1.1.7 (Alpha), B.1.617.2 (Delta), B.1.1.529 (Omicron/BA.1) and B.1.1.529 (Omicron/BA.2) were designed to target the S gene. The sequences of primers and probes were selected from conserved regions of sequences available in the GISAID database (http://www.gisaid.org/). To identify lineages B.1.1.7 (Alpha), B.1.617.2 (Delta) and B.1.1.529 (Omicron BA.1 and BA.2), one-step multiplex real-time RT-PCR was carried out in a total volume of 20 µL containing RNA template, the SensiFAST Probe One-Step RT-PCR system (Bioline, London, UK) including RNase inhibitor, Mg²⁺, and two pairs of primers and probes. The multiplex sets of primers and TaqMan probes are described in Table S3. One-step multiplex real-time RT-PCR was performed on the LightCycler 480 real-time PCR system (Roche, Mannheim, Germany). The thermocycler conditions included reverse transcription at 42 °C for 30 min and activation of the hot-start Taq DNA polymerase at 95 °C for 10 min, followed by 45 cycles of denaturation at 95 °C for 15 s and annealing/extension at 60 °C for 30 s. Multiple fluorescent signals were acquired once per cycle upon completion of the extension step. Data acquisition and analysis of the real-time PCR assay were performed using LightCycler 480 SW1.5 software (Roche).

Phylogenetic analysis
The spike sequence dataset was constructed with the BioEdit v7.2.6 software [16] and aligned using CLUSTAL W on the European Bioinformatics Institute (EBI) webserver [17]. The diversity of SARS-CoV-2 lineages was analysed with the maximum-likelihood phylogenetic method available in the MEGA program (v7) [18]. The Tamura 3-parameter model with gamma distribution was selected as the substitution model in the analyses. Statistical support for tree nodes was assessed by the bootstrap method (500 replicates).
A time-scaled phylogenetic tree for complete genome sequences was reconstructed with the BEAST version 1.10.4 program [19]. The Bayesian phylogenetic analyses used an uncorrelated log-normal prior distribution of nucleotide substitution rates among lineages. The general time-reversible (GTR) model was selected as the nucleotide substitution model. Bayesian Markov chain Monte Carlo analysis was run for 100 million steps, sampling every 1,000 steps from the posterior distribution, with the first 10% removed as burn-in. The Tracer version 1.7.1 tool (http://tree.bio.ed.ac.uk/software/tracer/) was used to assess the convergence of all parameters (effective sample size > 200). The maximum clade credibility (MCC) tree was calculated with the TreeAnnotator v1.10.4 tool (http://beast.bio.ed.ac.uk/treeannotator).

Results
Distribution of SARS-CoV-2 outbreaks in Thailand
From March 2020 to May 2021, 113 confirmed cases (first wave, N = 8; second wave, N = 40; state quarantine, N = 65) were successfully genotyped by partial SARS-CoV-2 genome sequencing. In this study, the first wave of the outbreak (March–May 2020) in Thailand was characterised by two different lineages, A and B.1 (Figs. 1 and S1). During the period of SQ from May 2020 through May 2021, we received 65 clinical specimens obtained from travellers and Thais who presented with or without symptoms of COVID-19 and were admitted to a hospital or hotel in Bangkok and Chon Buri province. Among these, lineage B.1 was the most frequently detected genotype, accounting for 47.7% of the strains (31/65), followed by lineage B.1.1 at 29.2% (19/65). Of the remaining strains, 11 were classified as lineage B.1.1.7 (Alpha) and another three as lineage B.1.177 (Figure S1 and Table S2). All of these strains from SQ were imported from the Americas (12.3%), Asia (41.5%), Europe (23.1%), and unknown locations (23.1%). The results showed that lineage B.1.1.7 (Alpha) was imported from the United States, France, Slovenia, the United Kingdom (UK), and Northern Ireland. Lineage B.1.177 was imported from the UK and the United Arab Emirates (UAE). Lineage B.1.1 was predominantly imported from Asian countries (Qatar, India, Philippines, Japan, and Bahrain), the UK, and Italy. Lineage B.1 was imported from Asian and European countries. During the second wave of the outbreak (October 2020–March 2021), lineage B.1 became the predominant virus.
The evolutionary history of the structural region of SARS-CoV-2 was investigated by performing Bayesian analysis with a SARS-CoV-2 S glycoprotein sequence data set. The mean evolutionary rate was 2.60 × 10⁻³ (95% highest posterior density [HPD], 1.72 × 10⁻³ to 3.62 × 10⁻³) substitutions per site per year (Fig. 4). The most recent common ancestor of all SARS-CoV-2 clades was dated to September 2019.

Discussion
Since early 2020, SARS-CoV-2 has been circulating in Thailand. The fourth wave is the largest known SARS-CoV-2 outbreak in Thailand, with over 800,000 recorded cases at the end of December 2021 [14]. This study characterised the partial genome sequences of 268 Thai strains, performed phylogenetic analyses, and analysed their molecular evolution to investigate their relationships with previously described viruses.
The first outbreak began in Thailand in 2020, with the first imported case detected in late January 2020, and spread to several other provinces, with Bangkok most severely affected [9]. This outbreak was attributed to lineage A.6 (S) variants, which were responsible for 67.5% of all SARS-CoV-2 strains [15]. This study showed that clade GH rapidly became the predominant variant throughout Thailand during the second wave of the COVID-19 epidemic (late December 2020). In November 2020, the B.1.1.7 (Alpha) variant emerged for the first time in the United Kingdom and caused a higher mortality rate [20]. A recent study in Malaysia showed that the lineage B.6 (O)-associated groups and B.1.524 (G) were the predominantly detected variants throughout the country during the second and third epidemic waves, respectively [23]. In Vietnam, an increase in two clusters of SARS-CoV-2 carrying a novel, shared mutation in nsp9 was reported as the waves of infection progressed during July 2020–February 2021 [24].
Several mutations in the S protein identified in SARS-CoV-2 isolates occur in variants of concern (VOCs) and could possibly enhance transmissibility and viral infectivity [25][26]. As is already known, the D614G mutation in the S protein increases the ability of the virus to replicate in the upper respiratory tract, causes a possible conformational change in the S1 subunit, and enhances furin cleavage efficiency at the S1/S2 site [27][28]. The D614G mutation in the S protein was first detected outside of China in a small outbreak in Germany in January 2020 [29]. Our analysis showed that almost 90% of the Thai variants contained the D614G mutation. Spike mutations N501Y and K417N were first recorded in the B.1.1.7 (Alpha) and B.1.351 (Beta) variants, respectively; the presence of these substitutions in the receptor-binding domain (RBD) of S increases the binding affinity of the virus for the ACE2 receptor [30][31][32]. As recently reported, the E484K substitution was first detected in the B.1.351 (Beta) variant and was associated with escape from antibody neutralisation by directly reducing antibody binding affinity [33][34]. The mutational analysis of the B.1.617.2 (Delta) variant showed that the variant contained a T to G transversion at nucleotide position 22,917, corresponding to the L452R mutation in the S protein, which is also found in the B.1.617.1 (Kappa) and the B.1.427 and B.1.429 (Epsilon) variants. The L452R substitution is located in the RBD of SARS-CoV-2 and is involved in the reduction of antibody neutralising activity [35].
Our study showed that the mean evolutionary rate early in the epidemics was 2.60 × 10⁻³ nucleotide substitutions per site per year. This rate was approximately 4.6 times as high as that averaged from genomic sequences of SARS-CoV-2 strains from Pakistan outbreaks, at around 5.68 × 10⁻⁴ substitutions per site per year [36]. Our estimate was similar to previously published estimates for SARS-CoV-2 (0.99–1.8 × 10⁻³ substitutions per site per year) [37][38][39][40].
A limitation of this study is that partial genome sequencing data were not available for all outbreak samples, since S region typing is not yet routinely performed by our laboratory. We successfully obtained SARS-CoV-2 typing data by using a multiplex real-time RT-PCR assay between the third and fifth epidemic waves, which occurred primarily in the last year of the study. The present study highlights the importance of molecular typing for a complete understanding of the diversity and circulation of SARS-CoV-2.
In summary, a SARS-CoV-2 outbreak has been ongoing in Thailand for more than 2 years, with a total of five epidemic waves. Clade B.1.36.16 (GH)

Figures

Figure 1

Figure 2

Figure 3