Genome analysis and prevalence of SARS-CoV-2 Indonesian variants and the correlation with the outbreak timeline

Since its first appearance in China at the end of 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has mutated into several variants. Genome analysis of the SARS-CoV-2 spike is necessary to identify the mutations in each variant. In this study, we obtained whole genome sequences of wild-type SARS-CoV-2 and its variants from GenBank (National Center for Biotechnology Information, USA) and the GISAID EpiCoV database (Germany). We analyzed the spike glycoprotein gene sequences using the Basic Local Alignment Search Tool (BLAST) and identified the nucleotide and amino acid changes. We also discuss the prevalence of SARS-CoV-2 variants and the correlation between the submission time of SARS-CoV-2 variants on the GISAID database and the outbreak timeline. Our analysis showed 9 amino acid changes in the spike of the SARS-CoV-2 alpha, beta and delta variants, and 3 amino acid changes in the gamma variant. As of November 21, 2021, there were 8,861 submissions of Indonesian variants on the GISAID database. The comparison between the submission time of SARS-CoV-2 variants and the outbreak timeline suggests that the SARS-CoV-2 delta (B.1.617.2) variant potentially caused the sudden increase in confirmed COVID-19 cases from July to September 2021.


Introduction
The World Health Organization (WHO) classifies SARS-CoV-2 variants as variants of interest (VOI), variants of concern (VOC), and variants under monitoring (VUM). Variants of interest are variants with genetic mutations that can affect virus characteristics (such as transmissibility, disease severity, immune escape, or diagnostic or therapeutic escape) and that can cause significant community transmission or multiple COVID-19 clusters. Variants of concern meet the criteria for variants of interest and are also associated with an increase in transmissibility or a detrimental change in COVID-19 epidemiology; an increase in virulence or a change in clinical disease presentation; or a decrease in the effectiveness of public health measures or of available diagnostics, therapeutics or vaccines. Variants under monitoring carry genetic mutations that are suspected to affect virus characteristics and may pose a future risk, but the evidence of their phenotypic or epidemiological impact is still unclear [15].
As of November 26, 2021, the variants of concern according to WHO [15] were:
• Alpha variant, originated from the United Kingdom, September 2020
• Beta variant, originated from South Africa, May 2020
• Gamma variant, originated from Brazil, November 2020
• Delta variant, originated from India, October 2020
The most recently detected variant of concern is the omicron variant, which originated from South Africa and Botswana in November 2021 [15] [16].
In this study, we analyzed the nucleotide and amino acid changes in the spike glycoprotein gene sequences of the SARS-CoV-2 alpha, beta, delta and gamma variants in Indonesia. We also discuss the prevalence of SARS-CoV-2 variants in Indonesia and the correlation between the submission time of SARS-CoV-2 variants on the GISAID database and the outbreak timeline.
According to Table 6, the amino acid mutations in the spike glycoprotein of the beta (B.1.351) variant were D80A, D215G, T240P, K417N, E484K, N501Y, K529K, D614G, and A701V. The presence of the K417N and E484K mutations suggests that the SARS-CoV-2 B.1.351 variant may overcome the polyclonal antibody response by reducing neutralization by class 1 and class 2 RBD-specific antibodies, with K417N playing an important role in viral escape [20]. The N501Y mutation of the spike glycoprotein allows the B.1.351 variant to bind the ACE2 receptor more strongly [18]. The infectivity of the beta variant is also increased by the D614G mutation [3, p. 6].
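Each label above follows the single-letter convention of reference residue, spike residue number, and substituted residue (for example, D614G). As a minimal sketch, assuming every label is a single-residue substitution and that X stands for an unknown residue, the hypothetical helper `parse_substitution` below splits the beta labels into their parts and flags K529K, where the residue is unchanged, as synonymous at the protein level.

```python
import re
from typing import NamedTuple

# Reference residue (one-letter code), residue number, new residue.
# X is accepted only as the new residue and read as "unknown" (an assumption).
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWYX])$")


class Substitution(NamedTuple):
    ref: str
    position: int
    alt: str

    @property
    def is_synonymous(self) -> bool:
        # Same residue before and after, e.g. K529K
        return self.ref == self.alt

    @property
    def is_ambiguous(self) -> bool:
        return self.alt == "X"


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'D614G' into ('D', 614, 'G')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    ref, pos, alt = match.groups()
    return Substitution(ref, int(pos), alt)


# Beta (B.1.351) spike changes as listed in the text (Table 6)
BETA = ["D80A", "D215G", "T240P", "K417N", "E484K",
        "N501Y", "K529K", "D614G", "A701V"]

parsed = [parse_substitution(m) for m in BETA]
synonymous = [m for m, s in zip(BETA, parsed) if s.is_synonymous]
print(len(parsed), synonymous)
```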
Meanwhile, according to Table 3, the amino acid mutations in the spike glycoprotein of the delta (B.1.617.2.55 or AY.55) variant were T19R, G142D, E156K, A222V, L452R, T478K, G504X, D614G, and P681R. The T19R and G142D mutations of the delta variant may disrupt the binding of some anti-NTD neutralizing antibodies raised against the wild-type spike [22]. The A222V mutation was present in the Delta Plus and Delta variants at prevalences of 58% and 9%, respectively [23]. The L452R mutation causes a structural change that reduces the intramolecular and intermolecular contacts involved in ACE2 binding [24]. In addition, another study reported that the neutralizing activity of 14 out of 35 RBD-specific monoclonal antibodies (mAbs), including three clinical-stage mAbs, was decreased or eliminated by the L452R mutation [25]. Another report revealed that L452R allows the virus to evade HLA-A24-restricted cellular immunity while also increasing viral infectivity, potentially promoting viral replication [26]. The T478K mutation in the RBD, found in the delta B.1.617.2.55 variant, lies in the epitope region and may affect neutralization by 'Class 1' monoclonal antibodies [27]. The D614G mutation makes the spike protein more stable, allowing the virus to enter host cells more efficiently [17]. The P681R mutation in the furin cleavage site increases the basicity of the polybasic stretch, which might promote a higher rate of membrane fusion and internalization and hence improved transmissibility [24].
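The delta mutations discussed above fall in different regions of the spike (NTD, RBD, and the furin cleavage site). The following Python sketch groups them by region; the boundaries in `DOMAINS` are assumed approximate ranges in Wuhan-Hu-1 spike numbering, not values taken from this study, and published boundaries differ slightly between sources.

```python
# ASSUMED approximate spike regions (Wuhan-Hu-1 residue numbering).
# Published boundaries vary; adjust to the reference you use.
DOMAINS = [
    ("NTD", 14, 305),
    ("RBD", 319, 541),
    ("furin-site region", 681, 685),
]


def spike_region(position: int) -> str:
    """Return the first assumed region containing `position`, else 'other'."""
    for name, start, end in DOMAINS:
        if start <= position <= end:
            return name
    return "other"


def position_of(label: str) -> int:
    # "T19R" -> 19; assumes single-residue substitution labels
    return int(label[1:-1])


# Delta (B.1.617.2.55 / AY.55) spike changes as listed in the text (Table 3)
DELTA = ["T19R", "G142D", "E156K", "A222V", "L452R",
         "T478K", "G504X", "D614G", "P681R"]

by_region: dict[str, list[str]] = {}
for label in DELTA:
    by_region.setdefault(spike_region(position_of(label)), []).append(label)

for region, labels in by_region.items():
    print(f"{region}: {', '.join(labels)}")
```

With these assumed boundaries, T19R and G142D land in the NTD, L452R and T478K in the RBD, and P681R at the furin site, consistent with the regions the text associates with them.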

Analysis of cumulative confirmed cases and deaths and submitted variants on GISAID database
Based on Table 2, the cumulative number of confirmed COVID-19 cases increased steadily from September 2020 to January 2021, by around 300,000 every 3 months. However, from January to March 2021, the number of confirmed cases increased by around 590,000, from 758,473 cases on January 2, 2021 to 1.35 million cases on March 2, 2021. The number of confirmed cases increased sharply from July to September 2021, from 2.23 million on July 2, 2021 to 4.10 million on September 2, 2021, while the number of deaths caused by COVID-19 also rose sharply, from 59,534 to 133,676. By November 21, 2021, the cumulative number of COVID-19 cases had reached 4.25 million, with 143,739 deaths. The increase in confirmed COVID-19 cases and deaths in Indonesia can be seen in Figure 2.
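The increases quoted above can be reproduced from the reported cumulative counts. This Python sketch hard-codes only the values given in the text; because 1.35, 2.23 and 4.10 million are rounded there, the differences involving them are approximate.

```python
# Cumulative counts for Indonesia as quoted in the text (Table 2).
# 1.35, 2.23 and 4.10 million are the rounded figures given there.
cases = {
    "2021-01-02": 758_473,
    "2021-03-02": 1_350_000,
    "2021-07-02": 2_230_000,
    "2021-09-02": 4_100_000,
}
deaths = {"2021-07-02": 59_534, "2021-09-02": 133_676}


def increase(series: dict[str, int], start: str, end: str) -> int:
    """Absolute change in a cumulative series between two dates."""
    return series[end] - series[start]


jan_mar = increase(cases, "2021-01-02", "2021-03-02")
jul_sep = increase(cases, "2021-07-02", "2021-09-02")
jul_sep_deaths = increase(deaths, "2021-07-02", "2021-09-02")
print(jan_mar, jul_sep, jul_sep_deaths)
```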
Based on Table 1, there was a sudden increase in the number of Indonesian delta variant sequences submitted to the GISAID EpiCoV database, from 2 submissions in March-May 2021 to 382 submissions in May-July 2021. The number increased to 1,889 submissions in July-September 2021 and continued to rise to 2,707 submissions in September-November 2021. The increasing number of Indonesian delta variant submissions, as well as of the other Indonesian variants submitted to the GISAID EpiCoV database, can be seen in Figure 1.
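Because Table 1 reports submissions per interval, the cumulative counts used in the comparison with case numbers are running totals of these values. The sketch below, assuming no Indonesian delta submissions before March 2021, accumulates the per-interval counts; the totals after May-July and July-September (384 and 2,273) match the cumulative figures reported for July 1 and September 1, 2021.

```python
from itertools import accumulate

# Indonesian delta submissions per interval on GISAID, as given in the text
# (Table 1). ASSUMPTION: no delta submissions before March 2021.
per_interval = {
    "Mar-May 2021": 2,
    "May-Jul 2021": 382,
    "Jul-Sep 2021": 1_889,
    "Sep-Nov 2021": 2_707,
}

# Running total after each interval (dicts preserve insertion order)
cumulative = dict(zip(per_interval, accumulate(per_interval.values())))
for interval, total in cumulative.items():
    print(f"{interval}: {total}")
```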
To analyze the correlation between the cumulative COVID-19 cases and deaths and the cumulative submissions of SARS-CoV-2 Indonesian variants on the GISAID database, we directly compared the data, as shown in Table 9, in which notable changes are highlighted in pale yellow. The notable change occurred between July 1, 2021 and September 1, 2021. On July 1, 2021, the cumulative number of SARS-CoV-2 delta variant submissions from Indonesia on the GISAID database was 384, which increased to 2,273 on September 1, 2021. Within the same time range, cumulative COVID-19 cases and deaths also increased drastically: on July 1, 2021, the cumulative numbers of COVID-19 cases and deaths in Indonesia were 2.23 million and 59,534, respectively, rising to 4.10 million cases and 133,676 deaths on September 1, 2021. This temporal coincidence suggests a possible correlation between the increase in the delta variant of SARS-CoV-2 and the sudden increase in cumulative COVID-19 cases and deaths in Indonesia. It is consistent with the reported characteristics of the SARS-CoV-2 delta (B.1.617.2) variant: (1) increased transmissibility, (2) potentially reduced vaccine effectiveness against symptomatic COVID-19, and (3) potentially reduced neutralization by monoclonal antibody therapies. Moreover, B.1.617.2 may increase disease severity, as suggested by hospitalization rates [28].
The limitations of this analysis include: (1) a lack of proof due to limited data (especially on the precise number of COVID-19 cases caused by each SARS-CoV-2 variant), (2) the lack of a statistical significance analysis that could lead to a conclusive result, and (3) minor date mismatches between submissions and cases/deaths, because the data came from different sources.
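The direct comparison over July-September 2021 can also be summarized as the fold change of each cumulative series across the same window. This Python sketch uses only the figures quoted above (cases rounded to millions); it measures co-occurring growth and is not a formal correlation statistic.

```python
# Cumulative values at the start (July 1) and end (September 1) of the window,
# as quoted in the text; case counts are rounded to millions there.
window = {
    "delta submissions (GISAID)": (384, 2_273),
    "COVID-19 cases": (2_230_000, 4_100_000),
    "COVID-19 deaths": (59_534, 133_676),
}


def fold_change(start: float, end: float) -> float:
    """Ratio of the end value to the start value of a cumulative series."""
    return end / start


for name, (start, end) in window.items():
    print(f"{name}: x{fold_change(start, end):.2f}")
```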

Conclusion
In conclusion, this study analyzed the genomes and prevalence of SARS-CoV-2 Indonesian variants, and the correlation between the submissions of the SARS-CoV-2 Indonesian alpha, beta, delta and gamma variants on the GISAID database and the outbreak timeline. The genome analysis identified the amino acid changes in the spike glycoprotein of the SARS-CoV-2 variants. The analysis of submission time and outbreak timeline, supported by the reported clinical characteristics associated with the mutations, showed that the SARS-CoV-2 delta (B.1.617.2) variant might have caused the sudden increase in confirmed COVID-19 cases and deaths from July to September 2021.

5.1. Whole genome sequences of SARS-CoV-2
In this study, SARS-CoV-2 whole genome sequences (WGS) were obtained from the GISAID (Global Initiative on Sharing All Influenza Data) EpiCoV database and from GenBank (National Center for Biotechnology Information). Registration with GISAID was required to access the database.
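Sequences downloaded from GISAID or GenBank are typically distributed as (multi-)FASTA text. A minimal standard-library reader, with a hypothetical function name and toy records rather than real downloads, could look like this:

```python
from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a (multi-)FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # normalize soft-masked lowercase
    if header is not None:
        yield header, "".join(chunks)


# Toy records standing in for a downloaded file
example = StringIO(">isolate_A\nATGTTTGTT\nTTTCTT\n>isolate_B\natgtttgtt\n")
records = dict(read_fasta(example))
print({name: len(seq) for name, seq in records.items()})
```

In practice, Biopython's `Bio.SeqIO.parse(path, "fasta")` provides the same functionality with more complete format handling.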

5.2. SARS-CoV-2 submission timeline analysis
The submission time of SARS-CoV-2 variants was obtained from the GISAID EpiCoV database. The time range used in this analysis was divided into 3-month (quarterly) intervals from 01 March 2020 to 21 November 2021. The location was set to "Indonesia", and the variant was set to one of the options below:
- VOC Delta GK/478K.V1 (B.
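Counting submissions per time interval amounts to binning each submission date. The sketch below assumes half-open intervals whose boundaries are modeled on the interval labels used with Table 1 (for example, March-May 2021); the exact cut-off dates and the sample dates are hypothetical, not the ones used in this study.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date
from typing import Optional

# HYPOTHETICAL boundaries; interval i is [BOUNDARIES[i], BOUNDARIES[i + 1]).
# The last boundary is exclusive, so 21 November 2021 is still counted.
BOUNDARIES = [date(2021, 3, 1), date(2021, 5, 1), date(2021, 7, 1),
              date(2021, 9, 1), date(2021, 11, 22)]


def interval_index(submitted: date) -> Optional[int]:
    """Return the index of the interval containing `submitted`, or None."""
    i = bisect_right(BOUNDARIES, submitted) - 1
    if 0 <= i < len(BOUNDARIES) - 1:
        return i
    return None  # outside the analysed time range


def count_per_interval(dates: list[date]) -> list[int]:
    counts = Counter(interval_index(d) for d in dates)
    return [counts.get(i, 0) for i in range(len(BOUNDARIES) - 1)]


# Made-up submission dates, only to show how the counting works
sample = [date(2021, 4, 10), date(2021, 6, 30), date(2021, 7, 1),
          date(2021, 8, 15), date(2022, 1, 5)]
print(count_per_interval(sample))
```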

5.3. Analysis of cumulative confirmed COVID-19 cases and deaths in Indonesia
The cumulative numbers of confirmed COVID-19 cases and deaths in Indonesia were obtained from the 'Our World in Data' website at https://ourworldindata.org/coronavirus-data?country=~IDN. The country was set to 'Indonesia' and the data were recorded every 3 months.
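Once exported, the case and death series can be reduced to the sampled dates. This sketch assumes a CSV export with `iso_code`, `date`, `total_cases` and `total_deaths` columns, modeled on the Our World in Data COVID-19 dataset; the two sample rows repeat figures quoted in this paper (cases rounded) and stand in for a real download.

```python
import csv
from io import StringIO

# ASSUMED column layout modeled on the Our World in Data COVID-19 CSV.
# Rows repeat figures quoted in the text; replace with a real export.
SAMPLE_CSV = """iso_code,date,total_cases,total_deaths
IDN,2021-07-02,2230000,59534
IDN,2021-09-02,4100000,133676
"""


def snapshot(csv_text: str, iso_code: str,
             dates: set[str]) -> dict[str, tuple[int, int]]:
    """Return {date: (total_cases, total_deaths)} for the requested dates."""
    result = {}
    for row in csv.DictReader(StringIO(csv_text)):
        if row["iso_code"] == iso_code and row["date"] in dates:
            # int(float(...)) also accepts values written as e.g. "2230000.0"
            result[row["date"]] = (int(float(row["total_cases"])),
                                   int(float(row["total_deaths"])))
    return result


print(snapshot(SAMPLE_CSV, "IDN", {"2021-07-02", "2021-09-02"}))
```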

5.4. Genetic Composition Analysis
The genetic composition of the SARS-CoV-2 spike glycoprotein sequences was analyzed based on their nucleotide variants and amino acid mutations. In this study, the complete genome of SARS-CoV-2 isolate Wuhan-Hu-1 was used as the reference sequence [17]. The results were then compared with previously published studies.
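Comparing a query spike coding sequence with the Wuhan-Hu-1 reference codon by codon yields amino acid changes in the notation used in the results. The Python sketch below is a simplified stand-in for this step, assuming both sequences are already aligned, in frame, and free of insertions or deletions (deletions would need a proper alignment first); it uses the standard genetic code and reports codons containing ambiguous bases as X, an assumed convention.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon: str) -> str:
    # Codons with ambiguous bases (e.g. N) are reported as X in this sketch
    return CODON_TABLE.get(codon.upper(), "X")


def amino_acid_changes(ref_cds: str, query_cds: str) -> list[str]:
    """Compare two in-frame, gap-free, equal-length CDSs codon by codon.

    Returns changes such as "D614G"; a changed codon that still encodes
    the same residue (synonymous) is reported as e.g. "K529K".
    """
    if len(ref_cds) != len(query_cds) or len(ref_cds) % 3:
        raise ValueError("sequences must be equal-length and a multiple of 3")
    changes = []
    for i in range(0, len(ref_cds), 3):
        ref_codon, query_codon = ref_cds[i:i + 3], query_cds[i:i + 3]
        if ref_codon.upper() != query_codon.upper():
            position = i // 3 + 1  # 1-based residue number
            changes.append(f"{translate_codon(ref_codon)}{position}"
                           f"{translate_codon(query_codon)}")
    return changes


# Toy 3-codon sequences (not real spike data): GAT->GGT and AAA->AAG
print(amino_acid_changes("ATGGATAAA", "ATGGGTAAG"))
```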

5.5. Similarity Analysis
To analyze the similarity between the reference and query sequences, the nucleotide Basic Local Alignment Search Tool (BLASTn) provided by NCBI was used. The 'Pairwise with dots for identities' alignment view was used to locate nucleotide mutations. In addition, the coding sequence (CDS) feature was used to analyze the amino acid sequences of the isolates.
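The 'dots for identities' view prints the query with each base identical to the reference replaced by a dot, so that only differing bases remain visible. The following sketch reproduces that display for two sequences that are assumed to be already aligned and gap-free; it does not perform the BLAST alignment itself, and the sequences are toy examples.

```python
def dots_for_identities(reference: str, query: str) -> tuple[str, list[str]]:
    """Mimic a 'dots for identities' view for two pre-aligned sequences.

    Returns the query with identical bases replaced by '.', plus the
    differences written as ref<position>alt (1-based positions).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    masked = []
    differences = []
    for pos, (r, q) in enumerate(zip(reference, query), start=1):
        if r == q:
            masked.append(".")
        else:
            masked.append(q)
            differences.append(f"{r}{pos}{q}")
    return "".join(masked), differences


ref = "ATGGATAAA"
qry = "ATGGGTAAG"
view, diffs = dots_for_identities(ref, qry)
print(ref)
print(view)
print(diffs)
```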