Genomic Analysis of SARS-CoV-2 Variants of Concern Circulating in Hawai’i to Facilitate Public-Health Policies

Using genomics, bioinformatics, and statistics, we demonstrate the effect of statewide and nationwide quarantine on the introduction of SARS-CoV-2 variants of concern (VOC) in Hawai’i. To define the origins of introduced VOC, we analyzed 260 VOC sequences from Hawai’i and 301,646 VOC sequences worldwide, deposited in GenBank and the Global Initiative on Sharing All Influenza Data (GISAID), and constructed phylogenetic trees. The trees define the most recent common ancestor as the origin. Further, the multiple sequence alignment used to generate the phylogenetic trees identified the consensus single nucleotide polymorphisms in the VOC genomes. These consensus sequences allow for VOC comparison and identification of mutations of interest in relation to viral immune evasion and host immune activation. Of note, the P71L substitution within the E protein, the protein sensed by TLR2 to produce cytokines, is found in the B.1.351 VOC and may diminish the efficacy of some vaccines. Based on the phylogenetic trees, the B.1.1.7, B.1.351, B.1.427, and B.1.429 VOC have been introduced into Hawai’i multiple times since December 2020 from several definable geographic regions. The time from the first worldwide report of a VOC in GenBank and GISAID to its first arrival in Hawai’i averages 320 days with quarantine and 132 days without quarantine. Quarantine is thus shown to significantly affect the time to arrival of VOC in Hawai’i, both during and following quarantine. Further, the collective 2020 quarantine of 43 states in the United States had a profound impact in delaying the arrival of VOC in states that did not practice quarantine, such as Utah. Our data demonstrate that at least 76% of all definable SARS-CoV-2 VOC have entered Hawai’i from California, with the B.1.351 variant in Hawai’i originating exclusively from the United Kingdom.
These data provide a foundation for policy-makers and public-health officials to apply precision public health genomics to real-world policies such as mandatory screening and quarantine.


Introduction
Precision public health genomics is a public health policy tool to track the spread of viruses. In the age of fast-evolving digital information, precision public health genomics became prominent during the West [...] converts the vast number of sequences into smaller collections of pre-defined similar sequences. These can further generate multiple sequence alignments (MSA) to produce phylogenetic trees efficiently and at low cost. Similarly, all geographically similar sequences reported in GenBank were downloaded using the search term SARS-CoV-2 and state abbreviation (e.g., "SARS-CoV-2 HI") and the sequence length filter (20, [...]
Align sequences using the multiple alignment using fast Fourier transform (MAFFT) program or server (https://mafft.cbrc.jp/alignment/server/add_fragments.html?frommanualnov6) [11][12][13] with MN908947 as a reference; do not remove any uninformative sequences, and set all parameters to "same as input."
8) Remove the newly added MN908947 sequence that MAFFT places at the beginning of the alignment using AliView, Geneious Prime, or a text editor. Otherwise, sRNAtoolbox will remove the MN908947 sequence during the duplicate removal step, and Lineage B will not serve as an ancestral root in the phylogenetic tree.
9) Import the MSA file into Geneious Prime or AliView [10], search for the orf1a 5' start of the entire alignment (5'-atggagagccttgtccctggtttca-3'), and remove the 5' untranslated region (UTR) by deleting the upstream region (~265 bp) from the MSA. Next, search for the ORF10 3' end (5'-tgtagttaactttaatctcacatag-3') and remove the entire 3' UTR by deleting the downstream region (~229 bp) from the MSA.
10) Create a duplicate file for the MN908947 sequence and remove the 5' UTR and 3' UTR from MN908947 as described above.
11) Using MAFFT, align the trimmed MSA with the trimmed MN908947 as a reference and delete sequences with uncalled nucleotides 'n'. Set the "remove uninformative sequences" parameter in MAFFT to >0%.
12) Using the sRNAtoolbox program or server (https://arn.ugr.es/srnatoolbox/helper/removedup/) [14], load the updated alignment to remove duplicate sequences and merge the identifications (also referred to as sequence accession numbers) of duplicates. This merger will create "appendages" in the phylogenetic tree where sRNAtoolbox lines up identical sequences together with equal signs (=).
13) Import the final alignment into Geneious Prime and create an approximately maximum-likelihood phylogenetic tree using the FastTree program [15]. Alternatively, FastTree can run as standalone software, and FastTreeMP is appropriate when multiple CPU cores/threads are available.
14) Root the tree with Lineage A (EPI_ISL_406801), which should then be the most recent common ancestor (MRCA) to Lineage B (MN908947) if performing phylogenetics on a Lineage B subgroup. Identify the MRCA of each sequence of interest.
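The trimming, filtering, and duplicate-merging steps above can be illustrated with a minimal Python sketch. It is not the code used by AliView, Geneious Prime, MAFFT, or sRNAtoolbox; the function names are hypothetical, and it assumes that every row of the alignment has the same length and that the reference row contains both motifs without alignment gaps inside them.

```python
# Motifs quoted in step 9 of the method.
ORF1A_START = "atggagagccttgtccctggtttca"  # first 25 nt of orf1a
ORF10_END = "tgtagttaactttaatctcacatag"    # last 25 nt of ORF10


def trim_utrs(reference_row, records):
    """Keep only the alignment columns from the orf1a start to the ORF10 end.

    reference_row: the aligned reference sequence (assumed: no gaps in motifs)
    records: dict mapping sequence ID -> aligned sequence (equal lengths)
    """
    ref = reference_row.lower()
    start = ref.find(ORF1A_START)
    end_motif_at = ref.rfind(ORF10_END)
    if start == -1 or end_motif_at == -1:
        raise ValueError("motif not found in the reference row")
    end = end_motif_at + len(ORF10_END)
    return {seq_id: seq[start:end] for seq_id, seq in records.items()}


def drop_uncalled(records):
    """Discard sequences that contain the uncalled nucleotide 'n'."""
    return {i: s for i, s in records.items() if "n" not in s.lower()}


def merge_duplicates(records):
    """Collapse identical sequences and join their IDs with '='."""
    ids_by_sequence = {}
    for seq_id, seq in records.items():
        ids_by_sequence.setdefault(seq.lower(), []).append(seq_id)
    return {"=".join(ids): seq for seq, ids in ids_by_sequence.items()}


# Toy alignment (hypothetical IDs and bases, for illustration only).
reference = "aa" + ORF1A_START + "cccc" + ORF10_END + "ggg"
alignment = {
    "x1": reference,
    "x2": reference,
    "x3": "aa" + ORF1A_START + "ccnc" + ORF10_END + "ggg",
}
final = merge_duplicates(drop_uncalled(trim_utrs(reference, alignment)))
print(list(final))
```

In this toy run, `x3` is discarded for its uncalled base and the two identical sequences are merged under the joined ID `x1=x2`, which is the kind of "appendage" label described in step 12.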
For the B.1.1.7 VOC, which began with 272,732 sequences, we partitioned the sequences into seven sub-MSAs of ~50,000 sequences and performed the above method on each sub-MSA. After ambiguous sequences and duplicates were removed from each group, the sub-MSAs were recombined using AliView [10]. Duplicates were removed after each recombination of two sub-MSAs, except for the final MSA, due to its size.
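The partition-and-recombine strategy can be sketched as follows. This is an illustrative outline only, with hypothetical function names and toy data; it assumes sequences are held in a dictionary of ID to aligned sequence, and it mimics the described behaviour of removing duplicates after each recombination except the last.

```python
def merge_duplicates(records):
    """Collapse identical sequences and join their IDs with '='."""
    ids_by_sequence = {}
    for seq_id, seq in records.items():
        ids_by_sequence.setdefault(seq, []).append(seq_id)
    return {"=".join(ids): seq for seq, ids in ids_by_sequence.items()}


def partition(records, chunk_size):
    """Split the records into consecutive sub-MSAs of at most chunk_size rows."""
    items = list(records.items())
    return [dict(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


def recombine(sub_msas, dedup_final=False):
    """Merge sub-MSAs one at a time, deduplicating after each merge.

    The last merge is left as-is unless dedup_final is True, mirroring the
    exception made for the final MSA because of its size.
    """
    if not sub_msas:
        return {}
    combined = dict(sub_msas[0])
    for index, sub_msa in enumerate(sub_msas[1:], start=1):
        combined.update(sub_msa)
        is_final = index == len(sub_msas) - 1
        if dedup_final or not is_final:
            combined = merge_duplicates(combined)
    return combined


# Toy data (hypothetical IDs and sequences).
records = {"s1": "acgt", "s2": "acgt", "s3": "aaaa",
           "s4": "acgt", "s5": "cccc", "s6": "cccc"}
cleaned = [merge_duplicates(chunk) for chunk in partition(records, 2)]
result = recombine(cleaned)
print(result)
```

Because each sub-MSA is cleaned before it is merged, the intermediate alignments stay small, which is the practical reason for splitting a very large lineage.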

Identifying the Consensus of Each VOC
To identify consensus SNPs of SARS-CoV-2 VOCs, assign Lineage B as the reference sequence and use the Geneious Prime "Find Variations/SNPs" function under Annotate and Predict. Input the SNPs into SnapGene (Insightful Science, snapgene.com) to identify the nucleotide and amino acid number and substitution as described previously [16].

Evaluating the Effect of Quarantine
To test the hypothesis that the 67-day (2020-03-25 to 2020-05-31) [17] quarantine in Hawai'i, and the collective 43-state quarantine in the United States that occurred from 2020-03-11 to 2020-06-16, significantly delayed the arrival of VOC, we partitioned the VOC into two categories. The first category, "quarantine," comprises those VOC (B.1.1.7, B.1.351, and B.1.429) that emerged worldwide before or while the 43-state collective mandatory quarantine was in effect. The second category, "post-quarantine," comprises those VOC (B.1.427 and P.1) that emerged worldwide after all quarantines were lifted.

[...] the Lineage A reference sequence (EPI_ISL_406801) [2]. These trees were generated using FastTree in MANA HPC and Geneious Prime [15]. Based on this analysis, 228 of the 260 (87.69%) VOC found in Hawai'i have identifiable origins. Figure 2 shows the states in the continental United States, as well as the countries worldwide, that were identified as being the source of the B. [...] into Hawai'i using phylogenetic analysis (Figure 1C). Using this method, we were able to identify the origin of 10 of 14 B.1.1.7 sequences introduced into Hawai'i (Figure 2 [...] Figure 1D). Using this method, we were able to identify the origin of all five B.1.351 sequences introduced into Hawai'i (Figure 2). [...] Figure 4D).
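The arithmetic behind the quarantine comparison can be sketched in Python. The quarantine dates and the 228-of-260 count come from the text; the lineage rows are placeholder values invented purely to show the calculation, not observed report or arrival dates, and the function names are hypothetical.

```python
from datetime import date
from statistics import mean


def days_between(first_report, first_arrival):
    """Days from first worldwide report to first local arrival."""
    return (first_arrival - first_report).days


def mean_delay_by_category(rows):
    """Average the report-to-arrival interval within each category."""
    delays = {}
    for category, first_report, first_arrival in rows:
        delays.setdefault(category, []).append(
            days_between(first_report, first_arrival))
    return {category: mean(values) for category, values in delays.items()}


# Figures stated in the text.
quarantine_length = days_between(date(2020, 3, 25), date(2020, 5, 31))
identifiable_pct = round(228 / 260 * 100, 2)

# PLACEHOLDER rows (hypothetical dates, not data from this study).
placeholder_rows = [
    ("quarantine", date(2020, 1, 1), date(2020, 1, 31)),
    ("quarantine", date(2020, 2, 1), date(2020, 2, 11)),
    ("post-quarantine", date(2021, 1, 1), date(2021, 1, 6)),
]
averages = mean_delay_by_category(placeholder_rows)
print(quarantine_length, identifiable_pct, averages)
```

With real first-report and first-arrival dates substituted for the placeholder rows, the same grouping yields the per-category averages that the comparison relies on.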

Discussion
Precision public health genomics has been a tool in past outbreaks but has yet to be applied to the COVID-19 pandemic. These data and the method serve as a foundation for policy-makers to apply precision public health genomics tools by discerning trends related to the source of SARS-CoV-2 introductions. By identifying the origin of SARS-CoV-2, policy-makers can construct policies on evidence-based decisions. [...] loss of a proline in the envelope protein [28]. The envelope protein was recently shown to interact with TLR2 and initiate an inflammatory response [29]. Prolines are known to be involved in beta-turns, and the P71L substitution could significantly change the secondary and tertiary protein structures. This proline loss is striking for vaccines, since some vaccines effective against wild-type SARS-CoV-2 have diminished efficacy against the B.1.351 variant [30,31]. Efforts [...] Table 2, are enigmatic, and warrant further studies. What is certain is that tracking the spread of these VOC, and determining the effects of their substitutions, is paramount in the effort to control the pandemic.

Ambiguous Sequences
This method demonstrates the need for high-quality sequencing, enrichment, and deep coverage. For example, the first B.1.429 sequence deposited from Hawai'i, from a sample collected on December 31, 2020 (EPI_ISL_967766), is presently unusable. Without resequencing the whole genome or filling in the gaps with Sanger sequencing, this sequence cannot be used in phylogenetics and origin determination because of ambiguous nucleotides in the S gene. While uninformative sequences may be useful for tracking the emergence of individual mutations, they are not useful in tracking VOC.
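A simple screen for such sequences can be sketched as below. It is an illustration rather than part of the published method: the function name is hypothetical, the S gene coordinates (21563 to 25384, 1-based) are taken from the MN908947 reference annotation, and the sketch assumes the input is in reference coordinates with no insertions. Any character other than a, c, g, or t, including alignment gaps, is counted as ambiguous.

```python
# S gene coordinates on the MN908947 reference (1-based, inclusive).
# Assumption: the query sequence shares the reference coordinate system.
S_GENE_START = 21563
S_GENE_END = 25384


def ambiguous_positions(sequence, start=S_GENE_START, end=S_GENE_END):
    """Return 1-based positions in [start, end] that are not a, c, g, or t."""
    region = sequence[start - 1:end].lower()
    return [start + offset
            for offset, base in enumerate(region)
            if base not in "acgt"]


# Toy sequences (hypothetical): one 'n' inside the S gene, one outside it.
inside = "a" * 21999 + "n" + "a" * 8000
outside = "a" * 99 + "n" + "a" * 29900
print(ambiguous_positions(inside), ambiguous_positions(outside))
```

A sequence that returns a non-empty list would be set aside for resequencing before it enters the alignment.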

Public Policy Recommendations and Impacts
As the scientific community continues to understand the vaccine and healthcare consequences of VOC, it is crucially important to control and limit the spread of these VOC. Policy-makers must first ascertain the source of the spread before they can control and limit it. After policy-makers contain the source, healthcare providers will treat patients with the most effective therapy, and vaccines will retain their efficacy. However, without such precision public health genomics in practice, society risks losing progress in the fight against this pandemic. As only 5.4% of the global population is fully [...] Policies should encourage research focusing on developing pseudoviruses [47] and infectious clones [48] to evaluate kinetics, virulence, anatomical localization, transmission, and neutralization by mAbs, convalescent sera, and vaccine sera. Funding to conduct this research should be directed at local and national levels.

Limitations
As SARS-CoV-2 sequences continue to be submitted retrospectively, these data will evolve. As a tool for precision public health genomics, this method's highest value lies in the trends it elucidates. The quarantine data will also change; this analysis is a snapshot in time.

Conclusions
These methods demonstrate the ability of precision public health genomics to identify the origin of SARS-CoV-2, the success of quarantine in Hawai'i, and the concern of emerging VOC. The conclusion from defining the origin of VOC in Hawai'i is that California is the primary source of VOC circulating in Hawai'i. Additional screening and quarantine of travelers from California vacationing in Hawai'i will protect the local population from evasive SARS-CoV-2 VOC. A tool was needed to evaluate and make use of the vast worldwide sequencing effort, and the tool described herein fills that need. Moreover, our methodology demonstrates the ability of sequencing and phylogenetic analysis to provide precision public health genomics in policy-making decisions. As SARS-CoV-2 VOC spread asymptomatically across the United States and worldwide, it is essential to use fast and accurate SARS-CoV-2 VOC, lineage, and origin assignment for making evidence-based public-policy decisions.

Figure 1 Phylogenetic