In this study, we provide a cross-sectional overview of SARS-CoV-2 genome variations from Turkey and their potential impact. The genomes were accessed from the GISAID database and originated from specimens collected during a 10-month period in 2020 and 2021. As an update, we further included additional S region sequences submitted after our initial database searches, representing viruses circulating in early 2021. The findings are therefore based on datasets of 410 complete and 206 partial (spike) sequences, which constitute the most comprehensive analysis performed in Turkey so far [23-25].
In the complete SARS-CoV-2 genomes, we identified 1200 individual nucleotide variations, with a median frequency of 12 (range: 4-36) per genome (Table 1). Moreover, the temporal distribution of the variations indicated a statistically significant accumulation of variations during the 10-month period examined (Figure 1). Comparable findings were reported in globally distributed virus isolates, with more than 3000 specific point mutations detected and an increased frequency of variation during the course of the pandemic [8]. However, SARS-CoV-2 isolates from Turkey were proposed to exhibit an elevated variation rate in a study of 166 virus genomes accessed during July 2020 [25], in which the frequently detected variations C14408T and C18877T, affecting the viral polymerase (nsp12) and exoribonuclease (nsp14), respectively, were suggested as possible precipitating factors [27,28]. These variations were also noted in our study at varying rates (Table 2, Figure 2). In addition, the co-detection of the C14408T and A23403G variations was suggested to be associated with increased diversity [26,27]. In parallel with global isolates, the SARS-CoV-2 genome variations in Turkey are mostly missense or silent mutations, frequently involving the replication enzymes and co-factors encoded in ORF1a/1b, or the S region of the virus genome [8,17,26].
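The per-genome summary reported above (median and range of variations per genome) can be illustrated with a minimal sketch. It is not the pipeline used in this study. It assumes that each genome is already aligned to the reference at equal length, that every differing position counts as one variation (so a multi-nucleotide change such as GGG28881AAC would count as three), and that ambiguous bases and gaps are ignored. The function names and the toy sequences are hypothetical.

```python
from statistics import median


def count_variations(reference, genome):
    """Count positions where an aligned genome differs from the reference.

    Assumption: both sequences are pre-aligned and of equal length.
    Positions holding anything other than A, C, G or T (for example N
    or a gap) are skipped rather than counted as variations.
    """
    if len(reference) != len(genome):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for ref_base, base in zip(reference.upper(), genome.upper())
        if ref_base in "ACGT" and base in "ACGT" and ref_base != base
    )


def summarize(reference, genomes):
    """Return the median, minimum and maximum variation count per genome."""
    counts = [count_variations(reference, seq) for seq in genomes.values()]
    return {"median": median(counts), "min": min(counts), "max": max(counts)}


# Toy data for illustration only; these are not real SARS-CoV-2 sequences.
reference = "ACGTACGTAC"
genomes = {
    "toy_1": "ACGTACGTAC",  # identical to the reference
    "toy_2": "ATGTACGTAC",  # one substitution
    "toy_3": "ATGTACNTAT",  # two substitutions; the N is skipped
}

summary = summarize(reference, genomes)
print(summary)
```

In practice the counts would come from a variant-calling or alignment step rather than a position-by-position comparison, and testing for temporal accumulation would additionally require the collection dates.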
In this study, the most frequently detected variations, namely the A23403G, C14408T and GGG28881AAC mutations, which result in amino acid substitutions in the corresponding virus proteins, were also reported in previous analyses from Turkey [23-25]. However, they appear to be positively selected in the local virus population, as their abundance seems to have increased. For example, the A23403G variation was reported at frequencies as low as 56.2% in previous reports, whereas it was detected in 92.9% of the complete genomes and 99% of the S region sequences in this study (Figure 2, Figure 4). This trend is also evident in global genome data, where viruses with the A23403G and C14408T variations steadily increased in frequency during the course of the pandemic and became the majority in late 2020 [8]. The amino acid substitutions resulting from these variations, namely P323L, D614G and RG203KR, were also associated with a more severe COVID-19 clinical presentation [8]. Moreover, the D614G mutation, a defining component of the variant of concern B.1.1.7, is also likely to affect immune responses to the S protein. In addition to the high frequency of this substitution in the study, we identified other variations that might affect T and B cell epitopes, albeit at lower rates.
Throughout the pandemic, the availability of virus genome sequences and powerful online tools have enabled nearly real-time monitoring of SARS-CoV-2 molecular epidemiology [8,17]. Previous reports on SARS-CoV-2 genetic diversity have described particular virus lineages and clades, mostly in overall agreement but lacking a uniform nomenclature [8,29]. The size of the accumulating sequence data further warrants approaches to indicate phylogeographic relationships that are more practical than standard phylogenetic reconstruction. Here, we adopted a previously reported mutation-annotated reference strategy to describe the intraspecific phylogeny of SARS-CoV-2. We observed that the majority of the SARS-CoV-2 genomes from Turkey belong to haplogroup A (98.3%), with the main subhaplotype diversification into A2 (Figure 3). SARS-CoV-2 haplogroup A isolates constitute the ancestral node and the predominant clade across the world. They are frequently represented in isolates from Europe (97%), Africa (93%) and Asia (77%), but are less frequent in South America (68%) and North America (53%) [8]. Among global haplogroup A subclades, A2 and A2a appear as the majority, with phylogeographic inferences indicating a European origin. We observed haplogroup B at a much lower frequency, with the B4a subhaplotype representing the majority within this group (Figure 3). Haplogroup B viruses have been identified on all continents, with higher prevalence in North America (47%), South America (32%), Asia and Oceania (23%), and with all major and minor subclades present in Asia [8]. Haplogroup B viruses were likely introduced into Turkey through travel to endemic regions, and further local spread was presumably prevented by isolation. Our analyses employing highly diversified subclades of each main haplogroup did not identify any evidence for recombination among local SARS-CoV-2 genomes.
Overall, the findings on virus genome diversity in Turkey suggest several introductions originating from multiple sources, followed by local adaptation, as also noted in previous reports using smaller datasets [23].
The emergence and rapid spread of SARS-CoV-2 variants have raised significant concern due to their potential for enhanced transmissibility, altered clinical progression and escape from the protective immune response induced by previous infection or widely available vaccines [20]. Dubbed variants of concern (VOCs), these viruses exhibit a wide array of amino acid changes accumulated in several regions of the virus genome, including the spike protein [20-22]. The rapid spread of particular VOCs in several countries during fall 2020 called for more stringent public health measures as well as targeted monitoring, which has also been initiated and is currently ongoing in Turkey. We detected three major VOCs in the study group, with an increased prevalence of B.1.1.7 and B.1.351 in the more recent dataset (Table 3). Moreover, the detection of P.1 in this group suggests not only an elevated prevalence but also a broader repertoire of variants in the population. These findings justify the efforts to identify and monitor known and potentially emerging virus variants.
Particular limitations of this study need to be addressed. An important issue is the heterogeneity in the temporal and spatial distribution of the samples used for genome sequencing, which suggests the lack of an organized sampling strategy for screening. In addition, missing demographic and location data prevented further evaluations in many instances. Therefore, it is not possible to assess whether the current dataset fully represents the epidemiology and diversity of circulating viruses in Turkey. A continuous and organized surveillance strategy, considered in conjunction with local transmission dynamics and infection epidemiology, will provide a better understanding of SARS-CoV-2 molecular epidemiology in Turkey.
In conclusion, in this analysis of complete and partial SARS-CoV-2 genome sequences covering almost the first year since the emergence of the virus, we described the main variations associated with epidemiology and immune response, and observed an increased incidence of VOCs in Turkey. With the ongoing pandemic and accelerated vaccination campaigns, such investigations should be performed periodically for precise screening and coordination of control measures.