Multiplex sequencing of SARS-CoV-2 genome directly from clinical samples using the Ion Personal Genome Machine (PGM).

Various methods have been developed for rapid and high-throughput full genome sequencing of SARS-CoV-2. Here, we describe a protocol for targeted multiplex full genome sequencing of SARS-CoV-2 genomic RNA directly extracted from human nasopharyngeal swabs using the Ion Personal Genome Machine (PGM). This protocol involves concomitant amplification of 237 gene fragments encompassing the SARS-CoV-2 genome to increase the abundance and yield of virus-specific sequencing reads. Five complete and one near-complete genome sequences of SARS-CoV-2 were generated with a single Ion PGM sequencing run. The sequence coverage analysis revealed two amplicons (positions 13751-13965 and 23941-24106) that consistently gave low sequencing read coverage in all isolates except 4Apr20-64-Hu. We analyzed the potential primer binding sites of these low-coverage regions and noted that 4Apr20-64-Hu possesses C at positions 13730 and 23929, whereas the other isolates possess T at these positions. The nucleotide variations observed suggest that naturally occurring variations present in actively circulating SARS-CoV-2 strains affected the target enrichment performance of the Ion AmpliSeq™ SARS-CoV-2 Research Panel. The possible impact of other genome nucleotide variations warrants further investigation, and an improved version of the Ion AmpliSeq™ SARS-CoV-2 Research Panel should therefore be considered.


Introduction
The Coronavirus Disease 2019 (COVID-19) pandemic is a serious public health crisis [1]. The disease was first reported as a flu-like illness causing pneumonia in Wuhan, Hubei Province, China, in late December 2019 [2,3]. A novel betacoronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), initially named 2019 novel coronavirus (2019-nCoV), is the etiologic agent of this ongoing COVID-19 pandemic [4]. Similar to other coronaviruses, SARS-CoV-2 is primarily transmitted between people via close contact with an infected person or exposure to infectious respiratory droplets or contaminated surfaces [5]. Genetically, SARS-CoV-2 clusters closely with viruses recovered from bats and pangolins, suggesting a potential zoonotic origin [6,7]. This SARS-CoV-2-related lineage, together with severe acute respiratory syndrome coronavirus (SARS-CoV) and the SARS-CoV-related bat viruses, forms the sarbecovirus subgenus [8].
The first SARS-CoV-2 complete genome sequence, Wuhan-Hu-1 (MN908947), was derived from RNA extracted directly from a patient's bronchoalveolar lavage fluid using a whole-genome metatranscriptomics approach [4]. This unbiased sequencing approach allows culture-free detection of novel [4] or known pathogens [9]. The metatranscriptomics approach, however, is not always able to produce the entire genome sequence, due in part to the presence of host RNAs that contribute the majority of the generated sequencing reads [4]. Its performance also relies heavily on the virus titer and the type of sample [10]. The use of the metatranscriptomics approach for sequencing a large number of SARS-CoV-2 samples is therefore limited. Since the release of the first SARS-CoV-2 complete genome sequence, extensive efforts have been made to develop various sequence-based next-generation sequencing (NGS) library preparation methods that increase the yield of virus-specific sequencing reads [11][12][13]. PCR amplification and hybridization are two commonly used sequence-based approaches that enable SARS-CoV-2 target enrichment [14]. Target enrichment is an effective step to ensure successful complete genome sequencing, especially in samples with low viral titer. The Ion AmpliSeq™ SARS-CoV-2 Research Panel (Ion Torrent, Thermo Scientific) is an amplicon-based SARS-CoV-2 target enrichment approach for full genome sequencing of SARS-CoV-2. The published workflow was designed to be used with an Ion GeneStudio™ S5 Series System (MAN0019277 Rev.A.O).
Here, we describe 1) a modified Ion Torrent library preparation for the Ion AmpliSeq™ SARS-CoV-2 Research Panel (Ion Torrent, Thermo Scientific), 2) the sequencing workflow using the Ion PGM, and 3) data analysis using Torrent Suite™ Software (Ion Torrent, Thermo Scientific). We generated six complete and near-complete SARS-CoV-2 genome sequences from RNA samples extracted from patient nasopharyngeal swabs in a single Ion PGM run. Our results reveal the possible impact of genetic sequence variations on the performance of the Ion AmpliSeq™ SARS-CoV-2 Research Panel. An improved version of the Ion AmpliSeq™ SARS-CoV-2 Research Panel should be considered to allow better and faster sequencing of the SARS-CoV-2 complete genome in a single run.

Results
Overall, 5.1 million usable reads with a mean read length of 209 bp were generated for six SARS-CoV-2 isolates in a single run on the Ion 318 V2 chip BC. The Ion AmpliSeq™ SARS-CoV-2 Research Panel comprises 237 amplicons specific to the SARS-CoV-2 genome and ten amplicons specific to five human expression controls. The sequence coverage on target (SARS-CoV-2 and human expression controls) was 99.99% for all six SARS-CoV-2 isolates used in this study, with an average mean depth of more than 1000X and coverage uniformity of approximately 96% and above. Amplicon target dropout occurred in the human expression controls (Supplementary Table 1). To reflect the actual read distribution on the SARS-CoV-2 genome, we reanalyzed the data after removing the human expression control amplicons from the reference genome file. As expected, the mean base depth and base coverage uniformity for all six isolates improved, with 100% on-target mapping and more than 97% base coverage uniformity (Table 1).
To assess the amplification uniformity, we examined the total number of reads mapped onto each amplicon target region. Among the 237 SARS-CoV-2-specific amplicon targets, 13 had less than 20% of the average reads per amplicon in at least one of the SARS-CoV-2 isolates (Table 2 and Table 3). Of these 13 amplicon targets, two (r1_1.14.786182 and r1_1.25.388943, corresponding to positions 13751-13965 and 23941-24106, respectively) showed extremely low coverage (<5% of the average reads per amplicon) in most of the isolates except 4Apr20-64-Hu (Table 3). To investigate the potential contribution of genome variations, we analyzed the two low-coverage regions and the primer binding sites for all six isolates. Since the primer sequences were not made available, we assumed the 30 bp upstream and downstream of the amplicons to be the potential primer binding sites.
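The two thresholds used above can be illustrated with a minimal Python sketch. The read counts and amplicon names below are made-up placeholders, not values from Table 2 or Table 3, and the function name `flag_low_coverage` is hypothetical; only the 20% and 5% cut-offs relative to the average reads per amplicon come from the text.

```python
from statistics import mean

def flag_low_coverage(reads_per_amplicon, low=0.20, very_low=0.05):
    """Flag amplicons whose read count falls below a fraction of the
    per-sample average reads per amplicon (hypothetical helper)."""
    avg = mean(reads_per_amplicon.values())
    flags = {}
    for amplicon, reads in reads_per_amplicon.items():
        fraction = reads / avg
        if fraction < very_low:
            flags[amplicon] = "extremely low"   # <5% of the average
        elif fraction < low:
            flags[amplicon] = "low"             # <20% of the average
    return flags

# Placeholder read counts for one isolate (illustrative only).
example = {
    "amplicon_A": 1200,
    "amplicon_B": 1000,
    "amplicon_C": 1100,
    "amplicon_D": 900,
    "amplicon_E": 100,
    "amplicon_F": 20,
}

flagged = flag_low_coverage(example)
print(flagged)
```

Because the thresholds are relative to each sample's own average, the same classification can be applied to isolates with very different total read counts.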
Based on these assumptions, we found that 4Apr20-64-Hu possessed C at positions 13730 (21 bp upstream of amplicon r1_1.14.786182) and 23929 (12 bp upstream of amplicon r1_1.25.388943), while the other isolates possessed T at these positions (Figure 1), suggesting the presence of genetic variations within the potential primer binding sites.
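The positional reasoning above can be checked with a short Python sketch. It uses the amplicon coordinates and variant positions reported in the text together with the assumed 30 bp flank; the function name `upstream_offset` is hypothetical, and the actual primer coordinates remain unknown.

```python
def upstream_offset(variant_pos, amplicon_start, flank=30):
    """Return how many bp a variant lies upstream of an amplicon start,
    or None if it falls outside the assumed primer-binding flank."""
    offset = amplicon_start - variant_pos
    return offset if 1 <= offset <= flank else None

# Amplicon coordinates and variant positions as reported in the text.
amplicons = {
    "r1_1.14.786182": (13751, 13965),
    "r1_1.25.388943": (23941, 24106),
}
variants = {"r1_1.14.786182": 13730, "r1_1.25.388943": 23929}

offsets = {
    name: upstream_offset(variants[name], start)
    for name, (start, _end) in amplicons.items()
}
print(offsets)  # {'r1_1.14.786182': 21, 'r1_1.25.388943': 12}
```

Both variants fall within the assumed 30 bp upstream window, which is consistent with the offsets of 21 bp and 12 bp stated above.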

Discussion
The number of SARS-CoV-2 genome sequences in public databases is growing rapidly [15]. The exponential growth of publicly available SARS-CoV-2 genome sequences is attributable to rapid genome sequencing, the development of data analysis workflows, and data sharing by researchers worldwide [11][12][13]. Currently, most sequencing workflows have been created for the Oxford Nanopore and Illumina sequencing platforms [16][17][18]. The ARTIC protocol is one of the most widely used sequencing methods on the Oxford Nanopore platform [11]. In some cases, researchers have used both the Oxford Nanopore and Illumina sequencing platforms to generate consensus genome sequences [18]. The Ion Torrent sequencing platform, a popular platform that has been extensively used for viral genome sequencing [19][20][21], has however not been widely used for SARS-CoV-2 sequencing. Several Ion Torrent-based SARS-CoV-2 sequencing workflows have been reported [12,22,23], but they are not very popular among researchers. Unlike the ARTIC protocol, the published protocols for some of these in-house Ion Torrent-based assays were not detailed enough to be replicated in other laboratory settings [12,23]. The recently launched Ion AmpliSeq™ SARS-CoV-2 Research Panel user guide contains a detailed and optimized sequencing protocol for the Ion S5 sequencing platform (MAN0019277 Rev.A.O).
In the current study, we adopted and modified the protocol from the Ion AmpliSeq™ SARS-CoV-2 Research Panel user guide and applied it to the Ion PGM sequencing platform. When adopting or establishing a new protocol, it is critical to harmonize the steps written in the user manual with the existing Standard Operating Procedures (SOP) of the laboratory to ensure high-quality data and results. If a platform or reagents different from those in the user manual are used, the specific SOP should be carefully optimized and assessed to maximize the output and data reproducibility of the newly established protocol. In a single sequencing run with the Ion 318 chip, five complete and one near-complete genome sequences of SARS-CoV-2 were generated from RNA samples directly extracted from human nasopharyngeal swabs. According to the manufacturer's protocol, 1 million sequencing reads are recommended for every sample (MAN0019277 Rev.A.O). Our sequencing run, however, generated approximately 180,000 to 1,500,000 reads per sample, suggesting that fewer reads were sufficient to generate a complete genome sequence. Hence, the number of multiplexed samples can be increased to reduce the sequencing cost using the Ion PGM.
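The multiplexing argument can be made concrete with a back-of-the-envelope Python sketch. It takes the 5.1 million usable reads reported for this run and the manufacturer's recommended 1 million reads per sample; the lower per-sample target of 250,000 reads is a hypothetical value chosen for illustration, and the calculation assumes reads are split evenly across barcodes, which was not the case in this run (180,000 to 1,500,000 reads per sample).

```python
def samples_per_run(total_usable_reads, reads_per_sample):
    """Estimate how many samples fit on one chip, assuming reads are
    distributed evenly across barcodes (a simplifying assumption)."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return total_usable_reads // reads_per_sample

TOTAL_USABLE_READS = 5_100_000   # usable reads reported for this run

recommended = samples_per_run(TOTAL_USABLE_READS, 1_000_000)  # manufacturer's target
reduced = samples_per_run(TOTAL_USABLE_READS, 250_000)        # hypothetical lower target

print(recommended, reduced)  # 5 20
```

In practice, uneven barcode representation means the usable number of samples per run would be lower than this idealized estimate.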
For NGS, an ideal protocol should generate results with high on-target specificity and read coverage uniformity. In this study, 99.9% of the generated reads of all six isolates were mapped to SARS-CoV-2 genomes with more than 96% coverage uniformity. Uneven read distribution is a common issue and an intrinsic factor affecting the data quality of NGS [24].
In fact, uneven read distribution was also reported in the previous version of the SARS-CoV-2 tiling PCR amplification method [11]. Two target regions (r1_1.14.786182 and r1_1.25.388943) of the Ion AmpliSeq™ SARS-CoV-2 Research Panel consistently resulted in low sequencing read coverage in most samples isolated from humans and from infected cell culture supernatant (unpublished data). This problem is therefore unlikely to be random or sample-type dependent. Generally, increasing the amount of sequencing output is the easiest way to improve the coverage in low read depth regions. The isolate 4Apr20-64-Hu, with 216,151 sequencing reads, had good sequence coverage (>500X) for both regions. Neither the sample with fewer sequencing reads (21Apr20-209-Hu) nor the samples with a higher number of sequencing reads demonstrated good coverage at these two regions. A higher number of sequencing reads will only lead to oversequencing of the adequately covered regions, resulting in higher sequencing costs. Therefore, increasing the overall number of sequencing reads is not a suitable solution to the read depth problem for r1_1.14.786182 and r1_1.25.388943. Other factors, such as genetic variation and variability of GC content, commonly affect the efficiency of the target enrichment process. We observed genetic variations between 4Apr20-64-Hu (>500X read coverage at both regions) and the other isolates at positions 13730 and 23929, located at the potential primer binding sites of the two target regions, respectively. Hence, the low coverage regions reported herein could be genome dependent, and the specific genetic variations may lead to inefficient primer annealing during the multiplex amplification process. An improved version of the Ion AmpliSeq™ SARS-CoV-2 Research Panel tackling these two low coverage regions or other genetic variations present in the circulating SARS-CoV-2 strains should be considered.
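The oversequencing argument can be sketched numerically in Python. The example assumes that amplicon depth scales linearly with total reads, which is a simplification. The mean depth of 1000X and the 5% fraction echo the orders of magnitude reported in the Results, while the 500X target and the function name `read_scale_factor` are illustrative choices rather than values prescribed by the protocol.

```python
def read_scale_factor(mean_depth, amplicon_fraction, target_depth):
    """Fold-increase in total reads needed for one amplicon to reach
    target_depth, assuming depth scales linearly with total reads."""
    if amplicon_fraction <= 0:
        raise ValueError("amplicon_fraction must be positive")
    current_depth = mean_depth * amplicon_fraction
    # No increase is needed if the amplicon already meets the target.
    return max(1.0, target_depth / current_depth)

# Illustrative values: mean depth 1000X, amplicon at 5% of the mean,
# desired depth 500X for the poorly covered amplicon.
factor = read_scale_factor(mean_depth=1000, amplicon_fraction=0.05, target_depth=500)
print(f"Total reads would need to increase about {factor:.0f}-fold")
```

Under these assumptions, the well-covered amplicons would end up at roughly ten times their already adequate depth, which illustrates why adding reads is a costly way to rescue two dropout regions.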
Taken together, we report a rapid complete genome sequencing protocol for SARS-CoV-2 to be used with the Ion PGM. Far fewer sequencing reads than the recommended 1 million reads were sufficient to produce a complete SARS-CoV-2 genome sequence. Six samples or more can be included in a single sequencing run using the Ion 318 chip. Our findings nonetheless revealed that two potential dropout regions can occur when using the Ion AmpliSeq™ SARS-CoV-2 Research Panel, and that increasing the number of sequencing reads would not resolve them. An improved version of the Ion AmpliSeq™ SARS-CoV-2 Research Panel that addresses the potential genetic variations at the primer binding sites could improve the read coverage uniformity and usability of the sequencing data.

Ethics statement
The study was approved by the Medical Ethics Committee of the University Malaya Medical Centre (MREC ID no.: 20201228-9626), and informed consent was waived on the basis that this is a retrospective study using anonymous samples and data. All methods were carried out in accordance with relevant guidelines and regulations.

SARS-CoV-2 RNA samples
The

Genome assembly and analysis using Torrent Suite Software
The generated sequencing reads were analyzed using the in-built workflow and reference file provided by the manufacturer in Torrent Suite™ Software (Ion Torrent, Thermo Scientific). The generated reads were mapped to SARS-

Table 3: Amplicon targets with low read coverage