PCR-Sequencing Approaches to Assess Informative Mutations in SARS-CoV-2 Spike (S) and ORF7, ORF8 and N Genes Characterizing Variants of Concern and Variants of Interest

Background: The high infectivity rates of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the prolonged duration of the coronavirus disease 2019 (COVID-19) pandemic have contributed to the emergence of viral variants endowed with evolutionary advantages, leading to enhanced infectivity. The tracking of these lineages is urgent. However, the need to sequence whole viral genomes through next-generation sequencing (NGS) represents a barrier hampering the large-scale identification of these variants. Therefore, in the current study, we developed Sanger-sequencing approaches targeting regions of interest containing numerous lineage-defining mutations in the SARS-CoV-2 S gene and ORF8 region, allowing for unambiguous identification of all SARS-CoV-2 variants of concern (VOCs) and of interest (VOIs). Methods and results: Primers were designed for polymerase chain reaction (PCR) and nested PCR to amplify and sequence samples with a low viral burden. Conservation of the primers' annealing sites was checked in a large group of sequences. Amplification protocols were standardized, and sequencing reactions were performed in a cohort of samples for validation. The primers were highly efficient, and the sequences of the targeted regions matched those generated by NGS in the same samples. The sequencing results allowed for the unambiguous identification of B.1.1.7, P.1 and P.2 samples, and would also allow the identification of the B.1.617.2, B.1.351 and B.1.427/B.1.429 lineages, which were absent in our cohort. Conclusion: Implementing Sanger-sequencing-based approaches to identify SARS-CoV-2 lineages may represent an alternative that allows more laboratories around the world to track these variants, providing valuable molecular and epidemiologic information to inform health systems.


Introduction
The coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first reported in Wuhan (China) at the end of 2019 [1] and has rapidly spread around the world. In March 2020, with more than 118 thousand cases in 114 countries, the World Health Organization (WHO) declared the outbreak a pandemic [2]. To date, more than 200 million cases and around 4.5 million deaths from the disease have been reported worldwide [3].
The prolonged duration of the pandemic and the high infection rates by the virus contributed to the emergence of new variants with potential evolutionary advantages, resulting in increased infectivity, which represents a challenge for COVID-19 control, even in highly vaccinated countries [4][5][6].
Variants of concern (VOCs) that have emerged so far include the B.1.1.7 (α) lineage, first detected in the United Kingdom [7,8], the B.1.351 (β) lineage, first detected in South Africa [9], the P.1 (γ) variant, first reported in Brazil [10], and the B.1.617.2 (δ) lineage, first identified in India [11]. Variants of interest (VOIs) have also been described, including the P.2 (ζ) lineage, first described in Brazil [12], and B.1.621 (μ), first detected in Colombia [13]. Other variants that harbor mutations suspected to impact the disease course are classified as variants under monitoring (VUMs) by the WHO; these currently include the [...]
The efficient identification of these variants is crucial to implement measures aiming to control their spread [14]. However, economic and technological barriers to sequencing complete viral genomes through next-generation sequencing (NGS) hamper its practical implementation as a surveillance method in many countries and communities [15][16][17].
In this context, Sanger sequencing of regions of interest (ROIs) in the viral genome that include lineage-defining mutations emerges as a suitable strategy that may allow more laboratories around the world to track these mutations in SARS-CoV-2 genomes of diagnosed individuals, contributing relevant molecular and epidemiological information [15,18,19].

Samples
Samples for the present work were selected from an epidemiological genomic surveillance project of SARS-CoV-2 lineages in Paraná state, Brazil. Nasopharyngeal swab samples were collected from patients with active SARS-CoV-2 infection and stored in viral transport medium (VTM) at -80 °C. RNA was extracted from VTM using the QIAamp Viral RNA Mini Kit (QIAGEN®, USA) in QIAcube equipment and quantified in a NanoDrop® One spectrophotometer (ThermoFisher Scientific®) and a Qubit™ Fluorometer (Invitrogen™) using the Qubit RNA HS Assay kit (Invitrogen™).
Libraries for viral genome sequencing were constructed using the AmpliSeq™

Primer design
Polymerase chain reaction (PCR) primers were designed to target ROIs in the S and ORF7a/ORF8/N genes using the Primer-BLAST tool. Nested-PCR protocols were planned with two primer pairs designed for each ROI to amplify and sequence samples with a low viral burden. The first (outer) pair amplifies a fragment of 600-800 bp, and the second (inner) pair amplifies a fragment of 400-600 bp inside the first amplicon.
Conservation of annealing sites was checked in the NCBI nucleotide database using the BLAST tool and in 159 newly generated SARS-CoV-2 genomes aligned in MEGA 8.0 software. Primer pairs whose annealing sites contained no mutation in the analyzed sequences were tested in silico for potential secondary structures using the OligoAnalyzer™ tool (IDT).
For the S gene, the selected primer pairs were predicted to amplify a 737 bp fragment spanning nucleotides 22563 to 23299 of the SARS-CoV-2 reference genome in the first PCR round and a 544 bp fragment spanning nucleotides 22670 to 23213 in the nested-PCR round (Supplementary table S1). In this setting, both primer pairs flanked a large portion of the S-protein receptor-binding domain (RBD) and included several mutations of interest in the S gene, which combined could differentiate all VOCs and VOIs (Supplementary table S2).
Another primer set was designed to target mutations around the ORF8 region. For the first PCR round, a primer pair amplifying a 784 bp fragment between nucleotides 27593 and 28377 of the SARS-CoV-2 reference genome was selected. This fragment spanned from codon 67 of ORF7a to codon 31 of the N gene, flanking the entire coding regions of the ORF7b and ORF8 genes.
The nested-PCR primers amplified a 432 bp fragment from nucleotides 27915 to 28347 of the SARS-CoV-2 genome (Supplementary table S1), spanning from codon 8 of the ORF8 gene to codon 25 of the N gene. The regions flanked by both inner and outer primers included several lineage-defining mutations, allowing for unambiguous identification of all VOCs and VOIs (Supplementary table S3).
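As a quick consistency check of the coordinates quoted above, the following minimal sketch computes each amplicon length with inclusive 1-based endpoints and confirms that each inner amplicon lies within its outer amplicon. The `Amplicon` class and its method names are hypothetical helpers introduced only for illustration. Under inclusive counting the S-gene values reproduce the stated 737 bp and 544 bp, whereas the ORF7a/ORF8/N coordinates give 785 bp and 433 bp, one more than the stated 784 bp and 432 bp; this may simply reflect a different endpoint convention for that primer set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    """Hypothetical helper: an amplicon given by 1-based reference coordinates."""
    name: str
    start: int  # first nucleotide of the amplicon
    end: int    # last nucleotide of the amplicon (treated as inclusive here)

    def length(self) -> int:
        # Inclusive counting: both endpoints belong to the fragment.
        return self.end - self.start + 1

    def contains(self, other: "Amplicon") -> bool:
        # True when `other` lies entirely inside this amplicon.
        return self.start <= other.start and other.end <= self.end


# Coordinates as quoted in the text (SARS-CoV-2 reference genome).
S_OUTER = Amplicon("S outer", 22563, 23299)
S_INNER = Amplicon("S inner", 22670, 23213)
ORF8_OUTER = Amplicon("ORF7a/ORF8/N outer", 27593, 28377)
ORF8_INNER = Amplicon("ORF7a/ORF8/N inner", 27915, 28347)

for outer, inner in [(S_OUTER, S_INNER), (ORF8_OUTER, ORF8_INNER)]:
    print(f"{outer.name}: {outer.length()} bp")
    print(f"{inner.name}: {inner.length()} bp, nested: {outer.contains(inner)}")
```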

Polymerase chain reaction
For PCR standardization, three viral samples were randomly selected. The amplification protocol was established by testing two MgCl2 concentrations (1 mM and 1.5 mM) and four annealing temperatures (56 °C, 58 °C, 60 °C, and 62 °C) in a Veriti™ 96-well thermal cycler (Applied Biosystems™). All PCR reagents were purchased from Invitrogen®, including the recombinant Taq DNA polymerase (5 U/μL) with the accompanying 10X PCR Buffer and 50 mM MgCl2 solutions (Cat. 10342020), the 100 mM dNTP set (Cat. 10297018), and all the primers. Amplified fragments were analyzed by electrophoresis in 2% agarose gels stained with SYBR Safe® (Invitrogen™) and run in Tris-Acetate-EDTA (TAE) buffer. Gels were visualized using an iBright system (Applied Biosystems™).
Primers were highly specific, amplifying a single fragment of the predicted size at all tested MgCl2 concentrations and annealing temperatures. The best amplification of the S and ORF7/ORF8/N fragments was achieved with 1.5 mM MgCl2 and an annealing temperature of 62 °C.
Therefore, all PCRs were performed in 25 μL of reaction mix, with the following final reagent concentrations: 1X PCR Buffer, 1.5 mM MgCl2, 0.1 μM of each primer, 0.1 mM of each dNTP, and 1 U of Taq DNA polymerase. For the nested protocol, 2 μL of cDNA was used in the first PCR round, and 1 μL of the PCR product was added to the second PCR reaction. Ultrapure water was used to complete the final volume. Temperature cycling consisted of an initial denaturation at 95 °C for 5 minutes; 40 cycles of denaturation at 95 °C for 30 seconds, annealing at 62 °C for 30 seconds and extension at 72 °C for 45 seconds; and a final extension at 72 °C for 7 minutes.
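The pipetting volumes implied by these final concentrations follow from the dilution relation C1·V1 = C2·V2. The sketch below works them out for a single 25 μL first-round reaction. The buffer, MgCl2 and Taq stock concentrations are those stated above; the working concentrations of the dNTP mix (10 mM each) and of the primers (10 μM) are not given in the text and are assumptions made only for illustration, as is the helper name `stock_volume_ul`.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, final_volume_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1; both concentrations must use the same unit."""
    return final_conc * final_volume_ul / stock_conc


REACTION_UL = 25.0
TEMPLATE_UL = 2.0           # cDNA added in the first PCR round (from the text)
ASSUMED_DNTP_STOCK_MM = 10  # assumption: working mix with 10 mM of each dNTP
ASSUMED_PRIMER_STOCK_UM = 10  # assumption: 10 uM primer working stocks

volumes_ul = {
    "10X PCR Buffer": stock_volume_ul(1, 10, REACTION_UL),
    "50 mM MgCl2": stock_volume_ul(1.5, 50, REACTION_UL),
    "dNTP mix": stock_volume_ul(0.1, ASSUMED_DNTP_STOCK_MM, REACTION_UL),
    "forward primer": stock_volume_ul(0.1, ASSUMED_PRIMER_STOCK_UM, REACTION_UL),
    "reverse primer": stock_volume_ul(0.1, ASSUMED_PRIMER_STOCK_UM, REACTION_UL),
    "Taq (5 U/uL)": 1 / 5,  # 1 U per reaction from a 5 U/uL stock
    "cDNA template": TEMPLATE_UL,
}

# Ultrapure water completes the final volume.
water_ul = REACTION_UL - sum(volumes_ul.values())

for reagent, volume in volumes_ul.items():
    print(f"{reagent:>15}: {volume:5.2f} uL")
print(f"{'water':>15}: {water_ul:5.2f} uL")
```

With different stock concentrations only the two assumed constants need to change; the water volume adjusts automatically.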

Sanger sequencing
For Sanger sequencing, PCR fragments were amplified using the first (outer) primer pairs for both the S and ORF7/ORF8/N regions of interest with Platinum™ Taq DNA High Fidelity (Invitrogen™), and amplicons were excised and purified from agarose gels using the Wizard® SV Gel and PCR Clean-Up System (Promega™). Gel-purified PCR products were quantified in a NanoDrop spectrophotometer (ThermoFisher Scientific™) and diluted to 4 ng/μL.
PCR products were sequenced on both forward and reverse strands using the BigDye™ Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems™) with the nested-PCR inner primers. The reaction was performed in a final volume of 10 μL using 5 μL of purified PCR product at 4 ng/μL, 2 μL of 2.5 μM primer (forward or reverse), 1 μL of 5X BigDye Sequencing Buffer, and 2 μL of BigDye™ Terminator v3.1 Ready Reaction Mix, following the manufacturer's recommendations.
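The quantities entering each sequencing reaction can be derived from the volumes above. The following sketch confirms that the components add up to the 10 μL final volume and computes the template mass and primer amount per reaction. The conversion of template mass to femtomoles uses the common approximation of 660 g/mol per base pair and the 737 bp S-gene outer amplicon as an example; both choices are illustrative assumptions rather than values reported for the sequencing step.

```python
# Component volumes per sequencing reaction, in microliters (from the text).
components_ul = {
    "purified PCR product (4 ng/uL)": 5.0,
    "primer (2.5 uM)": 2.0,
    "5X BigDye Sequencing Buffer": 1.0,
    "BigDye Terminator v3.1 Ready Reaction Mix": 2.0,
}
total_ul = sum(components_ul.values())

# Template mass: volume (uL) x concentration (ng/uL).
template_ng = 5.0 * 4.0

# Primer amount: uM x uL = pmol; final concentration after dilution to total_ul.
primer_pmol = 2.5 * 2.0
primer_final_um = primer_pmol / total_ul

# Assumptions for the molar estimate: ~660 g/mol per bp of double-stranded DNA,
# and the 737 bp S-gene outer amplicon as the template.
AVG_BP_MASS_G_PER_MOL = 660
AMPLICON_BP = 737
template_fmol = template_ng * 1e6 / (AMPLICON_BP * AVG_BP_MASS_G_PER_MOL)

print(f"total volume:   {total_ul:.1f} uL")
print(f"template:       {template_ng:.0f} ng (~{template_fmol:.0f} fmol)")
print(f"primer:         {primer_pmol:.1f} pmol ({primer_final_um:.2f} uM final)")
```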
The reactions took place in a Veriti™ thermal cycler using the universal BigDye™ protocol: 96 °C for 1 minute, followed by 35 cycles of 15 seconds at 96 °C, 15 seconds at 50 °C, and 4 minutes at 60 °C; importantly, the decrease from 96 °C to 50 °C was done at a rate of 1 °C per second, as recommended by the sequencing kit supplier (Applied Biosystems™).
Sequencing reaction products were purified using the BigDye XTerminator™ Purification kit (Applied Biosystems™, Cat. 4376486) following the manufacturer's instructions and submitted to capillary electrophoresis in a SeqStudio Genetic Analyzer (Applied Biosystems™) using the medium run protocol with default parameters. Chromatograms were inspected using BioEdit 7.2 software.

S-gene region of interest sequencing
Ten samples from different lineages were sequenced for the S-gene region of interest. All samples were amplified using the outer primers, and sequencing reactions were performed on the amplified fragments using the nested-PCR inner primers. The sequenced samples included two samples from the P.1 (γ) lineage, two samples classified as P.2 (ζ), one sample classified as B.1.1.7 (α), one sample characterized as N.9 and four samples classified as B.1.1.28. Interestingly, according to the FASTA sequence of the whole viral genome, one of the B.1.1.28 samples harbored the K417T (22812 A>C), E484K (23012 G>A) and N501Y (23063 A>T) mutations in the S gene, which are characteristic of P.1. This sample is hereafter referred to as P.1-like.
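The correspondence between the genome coordinates and the amino-acid positions quoted above can be checked with simple codon arithmetic. The sketch below assumes that the S coding sequence starts at nucleotide 21563 of the reference genome (the annotation of NC_045512.2); that start coordinate is not stated in the text, and the function name `codon_of` is a hypothetical helper.

```python
S_CDS_START = 21563  # assumption: first nucleotide of the S gene in NC_045512.2


def codon_of(genome_pos: int, cds_start: int) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    offset = genome_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1


# Genome coordinates as quoted in the text.
mutations = {"K417T": 22812, "E484K": 23012, "N501Y": 23063}

for label, pos in mutations.items():
    codon, codon_pos = codon_of(pos, S_CDS_START)
    print(f"{label}: nt {pos} -> codon {codon}, codon position {codon_pos}")
```

Each coordinate maps to the codon number carried in the mutation's name, which is a useful sanity check when adding further lineage-defining positions to such a table.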
All sequenced fragments were concordant with the whole genomes sequenced through NGS, and the fragments correctly displayed the mutations of interest characteristic of each lineage. The B.1.1.7 lineage sample showed an A>T transversion at nucleotide 372 from the inner forward primer start, corresponding to N501Y, and a C>A transversion at nucleotide 580, corresponding to the A570D mutation. P.2 samples showed a G>A transition at nucleotide 321, corresponding to the E484K mutation. Lastly, P.1 and P.1-like samples showed an A>C transversion at nucleotide 121, corresponding to the K417T mutation, in addition to the E484K and N501Y mutations. Figure 1 shows a representative chromatogram for a P.1 sample harboring the three mutations mentioned above.
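Positions within the sequenced fragment can be converted to reference-genome coordinates once the offset between the two numbering schemes is known. The sketch below derives that offset from the three mutations for which the text gives both a fragment position and a genome coordinate (K417T, E484K, N501Y), checks that they agree, and applies it to position 580 (A570D). The inferred offset places fragment position 1 at 22 nucleotides downstream of the inner amplicon start (22670), which would be consistent with numbering that begins after the forward primer; the primer length is not stated here, so this interpretation is an assumption.

```python
# (fragment position, genome coordinate) pairs quoted in the text.
REPORTED = {
    "K417T": (121, 22812),
    "E484K": (321, 23012),
    "N501Y": (372, 23063),
}

S_INNER_START, S_INNER_END = 22670, 23213  # inner amplicon (from the text)
S_OUTER_END = 23299                        # outer amplicon end (from the text)

# All reported pairs should imply the same offset between the two numberings.
offsets = {genome - fragment for fragment, genome in REPORTED.values()}
assert len(offsets) == 1, "reported positions are not mutually consistent"
OFFSET = offsets.pop()


def fragment_to_genome(fragment_pos: int) -> int:
    """Convert a position counted along the sequenced fragment to a genome coordinate."""
    return fragment_pos + OFFSET


a570d_genome = fragment_to_genome(580)
skipped_nt = fragment_to_genome(1) - S_INNER_START

print(f"offset: {OFFSET}")
print(f"fragment position 1 lies {skipped_nt} nt after the inner amplicon start")
print(f"A570D (fragment position 580) -> genome nt {a570d_genome}")
print(f"beyond inner amplicon end: {a570d_genome > S_INNER_END}")
print(f"within outer amplicon:     {a570d_genome <= S_OUTER_END}")
```

The last two lines illustrate why A570D is still readable: the template is the outer amplicon, so a read primed with the inner forward primer can extend past the inner reverse primer site.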
No mutation was found in the classical B.1.1.28 and N.9 lineages. We did not have any sample belonging to other VOCs, VOIs or VUMs in our cohort; however, these would be characterized by the mutations depicted in Supplementary table S2 at the corresponding positions of the sequenced fragments.
For the ORF7a/ORF8/N region of interest, all Sanger-generated sequences again matched those generated using NGS, confirming the method's accuracy. For B.1.1.7 samples, the following mutations could be detected: a C>T transition at position 35 from the inner forward primer start site, corresponding to the Q27* ORF8 mutation; a G>T transversion, corresponding to the R52I ORF8 mutation; an A>G transition representing the Y73C mutation; an adenine deletion at position 334, in a noncoding region of the SARS-CoV-2 genome between the ORF8 and N genes; and, finally, a three-nucleotide mutation from GAT to CTA at positions 343-345, corresponding to the D3L mutation in the N gene (Figure 2). P.1 samples showed a G>A transition at nucleotide 230 from the inner primer, representing the E92K mutation in ORF8, while P.2 samples showed a C>T transition at nucleotide 316, characterizing the silent mutation at nucleotide 28253 of the SARS-CoV-2 genome (F120-) (Figure 3).
Importantly, the lack of the E92K mutation in the P.1-like sample was confirmed by the current protocol. This finding argues against the hypothesis that its absence derived from an NGS sequencing error and suggests that this sample might be an evolutionary intermediate between the B.1.1.28 and P.1 lineages, harboring some but not all P.1-defining mutations. Supplementary table S3 depicts mutations in the targeted region that characterize other VOCs, VOIs and VUMs that are not included in our sample.

Discussion
Since its emergence in late 2019 in the city of Wuhan, China, COVID-19 has rapidly spread throughout the world, being declared a pandemic less than six months after its first report. Owing to widespread transmission and the prolonged duration of the pandemic, new SARS-CoV-2 lineages have emerged, some of which harbor mutations conferring evolutionary advantages, resulting in increased transmissibility and predominance in the communities where they are found [4].
The WHO classifies these lineages as VOIs and VOCs according to the level of evidence that they result in increased transmissibility or virulence, or are associated with a decrease in the effectiveness of public health measures, diagnostic tests, vaccines, or therapeutics (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/, accessed on 7 November 2021). Given their epidemiological relevance, the early detection and tracking of these variants are crucial to inform health systems worldwide, allowing the adoption of adequate measures to contain their spread [14]. The current tools for lineage assignment, such as the PANGOlin web resource, require whole-genome sequencing of the infecting SARS-CoV-2. Despite the technological advances that have decreased the time and cost of generating whole genomes through NGS in recent years, the implementation of these technologies beyond large centers and reference laboratories is still a challenge, owing to technological and economic barriers as well as the need for specialized and well-trained staff for sample processing, data acquisition, and analysis. Therefore, studies have suggested Sanger sequencing of ROIs in the SARS-CoV-2 genome, mainly in the S gene, as a feasible method to identify VOCs and VOIs [15,18,19].
In the current work, we provide an additional set of primers allowing the amplification and sequencing of regions of interest in the S and ORF7/ORF8/N genes containing informative mutations that characterize all the currently listed VOIs and VOCs. Further, two additional primer pairs were designed and validated for each region for use in nested-PCR protocols, allowing the robust amplification of both regions of interest even for samples with a low viral burden. Importantly, all primer pairs were checked for annealing-site conservation in a large number of samples, and the absence of mutations suggests that highly conserved regions were targeted, reducing the likelihood of amplification failure.
Since Sanger sequencing is widespread and requires fewer resources than NGS, the approach described in the current study may assist the identification of VOCs and VOIs by a larger number of diagnostic and research laboratories, serving health systems with valuable information regarding the spread of these variants. Furthermore, Sanger sequencing has advantages over other non-NGS-based techniques aiming to identify SARS-CoV-2 mutations and lineages, such as probe-based qRT-PCR methods [20]. These advantages include the identification of several mutations in the targeted region in a single run, allowing for SARS-CoV-2 haplotyping and lineage assignment, and the possibility of discovering new mutations in that region.
In conclusion, the present work provides alternative methods based on PCR-sequencing to identify informative mutations and classify SARS-CoV-2 VOIs and VOCs. The implementation of these methods by more laboratories may help to overcome the technical and economic bottlenecks involved in the identification of these variants through NGS and provide a more realistic picture of their emergence and spread throughout geographical regions that are currently underrepresented in SARS-CoV-2 whole-genome databanks, allowing for more effective public policies for their containment.
