Two-Pool Multiplex Long-Amplicon Amplification and Full-Genome Sequencing of SARS-CoV-2




Abstract

Background
Since the first public genome of SARS-CoV-2 was released, over 170,000 genome sequences of the virus have been shared by researchers worldwide (as of November 1, 2020). Multiplex PCR targeting SARS-CoV-2 followed by massively parallel sequencing (MPS) and/or nanopore sequencing is a widely used strategy to recover the genome from primary samples. However, amplification bias among different amplicons should not be ignored, as it can lead to uneven sequencing coverage of the viral genome.

Methods
We aimed to develop a novel multiplex PCR panel that achieves improved coverage evenness for SARS-CoV-2. We adopted long amplicons (~1000 bp) for the panel and thus reduced the number of primer pairs. The panel was validated with clinical samples, which were sequenced on MPS systems and on the portable nanopore sequencing device MinION. We evaluated the full-genome coverage evenness of the long-amplicon panel and its dependence on viral load, and then compared it with a 98-plex panel provided by the ARTIC network. We also assessed the accuracy of identifying viral genomic variations with the panel followed by MinION sequencing.

Results
We developed a two-pool 36-plex panel for full-genome sequencing of SARS-CoV-2, with amplicon sizes ranging from 880 to 1027 bp. For samples with a Ct value <30, >90% of the viral genome could be recovered at a high sequencing depth (>20% of the mean depth) by using the long-amplicon panel (n = 36), compared with 79-88% of the genome for the ARTIC panel (n = 5). The coverage evenness of the long-amplicon panel was also less affected by low viral titers and did not depend on the amount of sequencing data. With MinION sequencing, the consensus viral genomes could be reliably recovered. However, a high false positive rate was observed when identifying sub-clonal genomic variations with a frequency <0.6.

Conclusion
A novel multiplex PCR panel for full-genome sequencing of SARS-CoV-2, with improved coverage evenness and a low data throughput requirement, was validated with clinical samples. Amplification of SARS-CoV-2 with the panel followed by MinION sequencing could generate reliable consensus genome sequences, but the detection of non-dominant viral populations within the host is error-prone.

Background
Genomic surveillance plays an important role in the control and prevention of the COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). As a novel pathogen, the whole genome of SARS-CoV-2 was rapidly revealed in January 2020 [1]. Thereafter, numerous genome sequences of SARS-CoV-2 were obtained and shared by researchers worldwide. As of November 1, 2020, over 170,000 genome sequences of SARS-CoV-2, detected in samples collected in over 200 countries, had been deposited in public databases such as GISAID [2]. Consequently, the global spread and phylodynamics of SARS-CoV-2 have been under surveillance [3][4][5][6][7], and lineages of the virus have been designated [8,9].
As the pandemic goes on, recovering the full viral genome from primary samples is useful for tracing the source and spread of viral variants as well as for monitoring viral genetic variations, which greatly facilitates informed public health decision-making [7,10]. Virus-specific multiplex PCR followed by sequencing is a widely used strategy to recover the viral genome. Because of concerns about viral RNA degradation, PCR panels for SARS-CoV-2 usually have amplicons <600 bp in size, shorter than the maximum read length of the Illumina MiSeq sequencing apparatus. However, as bias in PCR efficiency among different amplicons is inevitable, highly multiplexed PCR in a single pool is more likely to generate uneven coverage. Distributing primers across more pools might reduce the bias [11,12], but it increases the labor and cost of the experiment.
To improve the sequencing evenness of a multiplex PCR panel, one approach is to reduce the number of primers in the panel. In our previous study to recover the genome of Ebola virus from clinical samples, two panels with ~1000-bp and ~500-bp amplicon sizes, respectively, were both implemented [13]. The panel with longer amplicons is preferable as it can achieve higher coverage and evenness; if it fails for highly degraded samples, the short-amplicon panel can be used instead. A long-amplicon panel is well suited to the Oxford Nanopore MinION apparatus, which generates long reads and can be deployed outside conventional laboratories [14,15]. MinION provides an important alternative to massively parallel sequencing (MPS) devices and has been used for sequencing the SARS-CoV-2 genome [12,16].
In this study, we developed a new two-pool 36-plex panel for SARS-CoV-2 with ~1000-bp amplicons, which could generate high coverage evenness across SARS-CoV-2 genomes. Consequently, more samples could be sequenced on one MinION flow cell, greatly reducing the cost.

Nucleic acid test of SARS-CoV-2
A real-time quantitative reverse transcription PCR (qRT-PCR) assay was performed on oropharyngeal swab specimens to confirm SARS-CoV-2 infection. Total RNA was isolated from viral transport

Primer design and synthesis
The PCR primers for multiplex SARS-CoV-2-specific amplification were designed by using the Primal Scheme v1.3.2 tool (http://github.com/aresti/primalscheme) [14], with the genome of the SARS-CoV-2 Wuhan-Hu-1 strain (GenBank accession MN908947.3) as the reference. The amplicon sizes were set to 1000 bp (-a 1000) and 2000 bp (-a 2000), respectively. The primer panel with a ~3000-bp amplicon size was manually selected and re-paired from the ARTIC V3 primer panel (http://artic.network/ncov-2019), excluding primers with strong bias. The primers were synthesized by Sangon Biotech (Shanghai, China). The primer pairs of the 1000-bp, 2000-bp and 3000-bp panels are provided in the Additional file.

Multiplex viral amplification
Extracted total RNA was first reverse-transcribed to cDNA by using the NEBNext Ultra II RNA First Strand Synthesis Module (New England Biolabs, USA). The first-strand cDNA was used as the template for SARS-CoV-2-specific amplification with the different multiplex PCR primer panels and NEBNext High-Fidelity 2X PCR Master Mix (New England Biolabs) following the manufacturer's instructions. The thermal cycling program for the multiplex PCR primer panels followed the ARTIC protocol: 30 sec at 98 °C; 35 cycles of 15 sec at 98 °C and 5 min at 65 °C; then hold at 4 °C. The PCR products were pooled and cleaned with the GeneJET Gel Extraction Kit (Thermo Fisher Scientific, USA) according to the manufacturer's instructions. PCR negative controls were included during the amplification.

Library preparation for sequencing by using MPS
The PCR products of the ARTIC primer panel had an amplicon size of ~400 bp and were prepared into libraries for MiSeq sequencing without fragmentation. The products with longer amplicon sizes were enzymatically fragmented for 20 min at 37 °C followed by 30 min at 65 °C. The barcoded libraries were prepared by using the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (New England Biolabs). The MiSeq system (Illumina, USA) and the NovaSeq 6000 system (Illumina) were used to generate paired-end reads.

Library preparation for nanopore sequencing by using MinION
For nanopore sequencing, the cDNA was treated with the NEBNext Ultra II End repair/dA-tailing Module (New England Biolabs) and ligated with barcodes from the native barcoding kit (Oxford Nanopore Technologies, ONT) by using NEB Blunt/TA Ligase Master Mix (New England Biolabs). The adapter from the ligation sequencing kit (Oxford Nanopore Technologies, UK) was then ligated by using the NEBNext Quick Ligation Module (New England Biolabs). Sequencing was then performed using the Nanopore sequencer MinION
For the MinION sequencing data, reads of the desired length (between 750 and 1100 bases for the 1K-panel amplification) were selected, and 30 bases were trimmed from the start and the end of each read. NGMLR v0.2.7 [21] was used for alignment with the default settings. Depth profiles were generated with SAMtools v1.9, and SNV identification was conducted by using the home-made 'iSNV-calling' workflow.
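The read selection and trimming step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `filter_and_trim`, the inclusive length bounds, and the choice to apply the length filter before trimming are all assumptions made for illustration, and the toy reads are synthetic.

```python
def filter_and_trim(reads, min_len=750, max_len=1100, trim=30):
    """Select reads by length, then clip `trim` bases from both ends.

    `reads` is an iterable of (name, sequence) pairs. The bounds are treated
    as inclusive and are checked on the untrimmed read (both assumptions).
    """
    kept = []
    for name, seq in reads:
        if min_len <= len(seq) <= max_len:
            # Drop the first and last `trim` bases of the read.
            kept.append((name, seq[trim:len(seq) - trim]))
    return kept


# Synthetic reads: only the second one falls in the 750-1100 base window.
reads = [
    ("too_short", "A" * 500),
    ("in_range", "C" * 30 + "G" * 900 + "T" * 30),
    ("too_long", "A" * 1500),
]
trimmed = filter_and_trim(reads)
for name, seq in trimmed:
    print(name, len(seq))
```

In practice the reads would come from a FASTQ file and the quality strings would need to be trimmed in the same way; that part is omitted here.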

Results
We sought to improve the coverage evenness of multiplex PCR panels targeting SARS-CoV-2 by increasing the size of the amplicons. To find a favorable amplicon size, we designed and synthesized three two-pool panels consisting of ~1,000-bp, ~2,000-bp, and ~3,000-bp amplicons, respectively (Additional file). One nasopharyngeal swab sample was used for preliminary validation of the three panels. The multiplex PCR products were sequenced with MinION. The results show that both the 1000-bp and 2000-bp panels could recover the full viral genome, but amplification with the 3000-bp panel failed because most of the RNA/cDNA had degraded into short fragments. We then compared the coverage profiles of the 1000-bp and 2000-bp panels. The 2000-bp panel had a much larger coverage bias among amplicons than the 1000-bp panel (Figure 1A). For the 2000-bp panel, 30.6% of the sequencing data were assigned to a single amplicon. Therefore, we selected the panel of ~1000-bp amplicons, referred to as the 1K-panel, for the following analyses. The 1K-panel contains 36 primer pairs in two pools (18 pairs each), and the amplicon sizes range from 880 bp to 1027 bp with an average overlap of 112 bp.
We next compared the evenness of viral coverage between the 1K-panel and a widely used 98-plex primer panel provided by the ARTIC network (http://artic.network/ncov-2019, version V3). RNA was extracted from six nasopharyngeal swabs with varied SARS-CoV-2 titers (Ct values, 22.9-31.0), and aliquots were amplified with the 1K-panel and the ARTIC panel, respectively, followed by sequencing with MiSeq. We defined the viral genome region with a depth >20% of the mean depth as the high-coverage region, and used the proportion of the genome in high-coverage regions to quantify coverage evenness.
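The evenness metric defined above can be written down in a few lines. The following Python sketch assumes a per-position depth list covering every genome position (for example, parsed from the output of `samtools depth -a`); the function name `high_coverage_proportion` and the toy depth values are illustrative and not taken from the study.

```python
def high_coverage_proportion(depths, fraction=0.2):
    """Fraction of positions whose depth exceeds `fraction` x mean depth.

    `depths` must contain one value per genome position, including zeros
    for uncovered positions, so that the mean is taken over the full genome.
    """
    if not depths:
        return 0.0
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        return 0.0
    threshold = fraction * mean_depth
    n_high = sum(1 for d in depths if d > threshold)
    return n_high / len(depths)


# Toy profile of 10 positions: mean depth is 81, so the 20% threshold is 16.2.
depths = [100] * 8 + [10, 0]
print(high_coverage_proportion(depths))                # 8 of 10 positions pass
print(high_coverage_proportion(depths, fraction=0.1))  # 9 of 10 positions pass
```

Because the threshold scales with the sample's own mean depth, the metric compares the shape of the coverage profile rather than the absolute amount of data.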
We found that the 1K-panel generated more even sequencing coverage than the ARTIC panel (Figure 1B). With the 1K-panel, the proportion of high-coverage regions averaged 93.0% (SD=1.7%) across the six samples, whereas it was 80.6% (SD=8.7%) for the ARTIC panel. The coverage evenness depended on the viral titer of the samples, and we found that the 1K-panel was less affected by low viral titers than the ARTIC panel (Figure 2A). As the Ct values increased, the proportion of the high-coverage region decreased slightly from 95.3% to 90.5% for the 1K-panel, compared with a decrease from 88.1% to 62.4% for the ARTIC panel. We also evaluated a higher cutoff (>30% of the mean depth) to define the high-coverage region, and the 1K-panel maintained its advantage (Figure 2A).
Next, we included another 49 nasopharyngeal swabs to further evaluate the performance of the 1K-panel in recovering the viral genome. We observed comparable efficiency of viral amplification and genome coverage evenness (Figure 2B). Of the 29 samples (out of 49) with a Ct value <30, all but one had a viral genome recovery with a >90% high-coverage region (mean 93.8%, SD 8.1%). The evenness decreased for samples with a Ct value >30, but five of the six samples with a Ct value of 30-33 had >70% high-coverage regions.
The amplicon sizes of the 1K-panel exceed the maximum read length of MPS but are favorable for nanopore sequencing, for which fragmentation of the PCR products is not needed. Recent studies have shown that nanopore sequencing with MinION can generate accurate consensus genomes of SARS-CoV-2 [12,16]. Thus, we focused on assessing the level of intrinsic noise in identifying viral single nucleotide variations (SNVs), especially sub-clonal variations, via 1K-panel amplification followed by MinION sequencing. We sequenced the 1K-panel PCR products from eight samples by using both MinION (ligation-based kit and R9.4.1 flow cell) and MiSeq (enzymatic fragmentation and paired-end 2×300 bp flow cell). A high depth of coverage, which is required to identify sub-clonal SNVs, was obtained with both sequencing devices (Table 1). We then applied a home-made bioinformatics workflow based on piled-up sequencing bases (see Methods) to identify SNVs. We included SNVs with a mutated allele frequency (MuAF) larger than 0.2 in the assessments. With this MuAF cutoff, the MiSeq-based identification of SNVs should be reliable, as previous studies have shown [17,22].
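To make the MuAF cutoff concrete, here is a minimal Python sketch of how a mutated allele frequency could be derived from piled-up base counts at each position. It is not the authors' 'iSNV-calling' workflow, whose details are not given here: the function names, the input layout (a dictionary of base counts per position), the choice of the most frequent non-reference base as the mutated allele, and the toy counts are all assumptions.

```python
def major_alt_allele(base_counts, ref_base):
    """Return (alt_base, MuAF) for the most frequent non-reference base.

    MuAF is computed as alt count / total depth at the position (assumed
    definition). Returns (None, 0.0) if no non-reference base is present.
    """
    depth = sum(base_counts.values())
    alts = {b: n for b, n in base_counts.items() if b != ref_base and n > 0}
    if depth == 0 or not alts:
        return None, 0.0
    alt = max(alts, key=alts.get)
    return alt, alts[alt] / depth


def call_snvs(pileup, reference, min_muaf=0.2):
    """List (position, ref, alt, MuAF) for positions with MuAF > min_muaf."""
    calls = []
    for pos in sorted(pileup):
        alt, muaf = major_alt_allele(pileup[pos], reference[pos])
        if alt is not None and muaf > min_muaf:
            calls.append((pos, reference[pos], alt, muaf))
    return calls


# Made-up counts at three positions (not real SARS-CoV-2 variants).
reference = {10: "C", 20: "C", 30: "A"}
pileup = {
    10: {"C": 5, "T": 95},   # near-fixed variant
    20: {"C": 70, "T": 30},  # sub-clonal variant
    30: {"A": 99, "G": 1},   # below the 0.2 cutoff, likely noise
}
calls = call_snvs(pileup, reference)
for call in calls:
    print(call)
```

A real workflow would also apply base-quality, strand, and depth filters before counting; those are deliberately left out of this sketch.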
The SNVs identified with MinION and with MiSeq sequences are shown in Figure 3A. Based on the distributions of the MuAFs of these SNVs, we found that, with a MuAF≥0.8 threshold to identify SNVs, 96.8% (30/31) of the MiSeq-based SNVs could be identified by using MinION. One SNV (G26526T, in sample S1) was missed because of a much reduced MuAF (MiSeq 0.95, MinION 0.51). No false positive SNV with a MuAF ≥0.8 was observed with MinION sequencing, which is consistent with previous studies [12,16].
However, MinION-based SNVs with a lower MuAF were highly error-prone. Benchmarked against the MiSeq-based SNVs with a MuAF of 0.2-0.8, the MinION-based SNVs had a recall of 100% (4 of 4) but a precision as low as 12.9% (4/31). We illustrated these artificial SNVs and the true positives along the viral genome ( Figure 2B). Most artificial SNVs were shared among samples, which is consistent with the systematic nature of nanopore sequencing errors [23,24]. No obvious enrichment of artificial SNVs in particular regions of the viral genome was observed. The result suggests that sub-clonal SNVs cannot be reliably identified based on MinION sequencing.
As the 1K-panel generates even sequencing coverage of the SARS-CoV-2 genome and there is no need to detect minor viral populations, the large amount of MinION sequencing data per sample (about 300 to 500 Mbp) appears excessive. Therefore, to test the limit of the 1K-panel in recovering the SARS-CoV-2 genome, we re-sequenced the eight samples at ultra-low throughput by using MinION (Flongle R9.4.1). We obtained 1.21 to 6.14 Mbp of data per sample, and the majority of each viral genome was covered by sequencing reads (Figure 4). With a cutoff of ≥5-fold sequencing depth, 29 (93.5%) of the 31 high-MuAF SNVs could be identified, and no false positive SNV was observed. Two SNVs (T11158C and T28144C) could not be identified because of insufficient or no coverage in sample S7, which had the least sequencing data (1.21 Mbp). We furthermore examined the coverage evenness. In 7 of the 8 samples, >90% (90.5%-96.6%) of the viral genome had high coverage, and for sample S8 the proportion was 81.6%. Thus, the coverage evenness obtained with the 1K-panel and MinION sequencing was not affected by the extremely low data input.
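The recall and precision quoted above follow directly from set overlap between the benchmark (MiSeq) and test (MinION) call sets. The Python sketch below reproduces only the reported counts (4 benchmark SNVs, 31 MinION calls, 4 shared); the SNV identifiers are placeholders, since the individual sub-clonal calls are not listed in the text.

```python
def precision_recall(called, truth):
    """Precision and recall of `called` SNVs against a `truth` set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else float("nan")
    recall = true_positives / len(truth) if truth else float("nan")
    return precision, recall


# Placeholder identifiers standing in for (sample, position, alt) tuples.
miseq_subclonal = {("true_snv", i) for i in range(4)}
minion_subclonal = miseq_subclonal | {("artifact", i) for i in range(27)}

precision, recall = precision_recall(minion_subclonal, miseq_subclonal)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

High recall with low precision is the signature of a caller that finds every real variant but buries it among artifacts, which is why a stricter MuAF cutoff helps here.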

Discussion
As the COVID-19 pandemic goes on, full-genome-based surveillance of SARS-CoV-2 is likely to become a routine approach for the control and prevention of the pandemic. Even in countries where the outbreak appears to be leveling off, such as China, recurrent regional resurgences of COVID-19 have been seen [25]. Rapid recovery of the viral genome followed by phylogenetic analyses can determine the lineage of the virus and elucidate how the virus was introduced into and spread within local communities.
Multiplex PCR of SARS-CoV-2 followed by sequencing is the most widely used strategy to recover the viral genome from primary samples. The portable sequencer MinION enables sequencing in the field and could achieve near real-time genomic surveillance of SARS-CoV-2 in an outbreak. The performance of the multiplex primer panel that targets SARS-CoV-2 is essential for the efficiency of recovering the viral genome. A panel with low amplification bias among amplicons can provide high coverage evenness across the viral genome. This reduces the chance of failing to identify lineage-determining viral variations, which is important for placing the virus in the global phylogeny. Moreover, high coverage evenness greatly decreases the sequencing cost per sample. In our evaluation of the 1K-panel, approximately 5 Mbp of sequencing data per sample could generate an accurate consensus viral genome from samples with a Ct value <30. Thus, we speculate that sequencing 96 or more barcoded samples in a batch on one MinION flow cell is feasible, with about 20-30 Mbp of data generated for each sample. This is especially useful when a large number of primary samples need to be analyzed, such as for rapid lineage assignment of samples from screening tests or environmental contamination.
To improve the coverage evenness, we adopted ~1000-bp amplicons for the panel, which are longer than those of some widely used panels such as the ARTIC primer panel. Very recently, Moore et al. proposed a multiplex primer panel with amplicons ranging from 956 bp to 1450 bp [11]. However, their primer pairs are distributed across six multiplex pools, and the amplification bias of certain amplicons is obvious, which results in more uneven coverage of the viral genome than with the 1K-panel. Another major concern about long-amplicon panels is their applicability to clinical samples, as RNA is easily degraded. Although we did not observe reduced efficiency of viral amplification among the clinical samples in this study, it should be noted that these samples were all tested within two weeks. Compared with a short-amplicon panel, a long-amplicon panel would be more affected by the degradation of RNA/cDNA, especially for samples stored for a long time. Quality control of RNA integrity before amplification could be helpful, but it requires extra equipment and procedures. We suggest that the long-amplicon panel be used for the first round of viral amplification, and that samples failing to yield sufficient PCR products be further amplified with a short-amplicon panel.
With 1K-panel amplification followed by MinION sequencing, the consensus viral genome could be accurately recovered, but identifying viral sub-clonal variation is infeasible, possibly because of the systematic error of nanopore sequencing. Moreover, based on our assessment, a MuAF>0.8 cutoff for defining consensus SNVs from MinION sequencing is more reliable than a >0.5 cutoff. For studies that analyze the intra-host diversity of viral genomes, MPS is necessary. As fragmentation of the 1K-panel PCR products is needed for MPS, one can choose between the 1K-panel and short-amplicon panels for viral amplification by weighing the higher coverage evenness against the extra experimental procedures.
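The batch-size estimate in the paragraph above is simple arithmetic, sketched here in Python. The only inputs are figures stated in the text (96 samples, 20-30 Mbp per sample, ~5 Mbp needed per sample); the total flow cell yield is derived from them rather than taken from any specification, and the function name is illustrative.

```python
def implied_flowcell_yield_mbp(n_samples, per_sample_mbp):
    """Total yield (Mbp) implied by an even split across barcoded samples."""
    return n_samples * per_sample_mbp


n_samples = 96
required_mbp = 5  # approximate data needed per sample (Ct value <30)

for per_sample_mbp in (20, 30):
    total = implied_flowcell_yield_mbp(n_samples, per_sample_mbp)
    margin = per_sample_mbp / required_mbp
    print(f"{per_sample_mbp} Mbp/sample -> {total} Mbp total, "
          f"{margin:.0f}x the ~{required_mbp} Mbp requirement")
```

The 4- to 6-fold margin leaves room for uneven barcode representation, which an even-split calculation like this one does not model.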

Conclusions
We developed and validated a new two-pool, long-amplicon multiplex PCR primer panel for full-genome sequencing of SARS-CoV-2 from primary samples. The panel generates more even coverage of SARS-CoV-2 than the ARTIC short-amplicon panel and consequently requires less sequencing data. For samples with a Ct value <30, ~5 Mbp of MinION data provided >95% viral genome coverage at >10-fold depth. Meanwhile, our assessment shows that nanopore sequencing with MinION enables reliable identification of the dominant viral population (consensus viral genome) within samples, but is highly error-prone in discovering minor viral populations. SNVs with a MuAF <0.6 should be regarded as unreliable.