The most recent common ancestor (MRCA) branches of the sequences from Hawai’i indicate that the B.1.429 variant was introduced into Hawai’i independently, at different times, and from several locations in the continental United States (California, Colorado, Louisiana, New Jersey, Tennessee, Utah, and Washington). The largest cluster, comprising 40 SARS-CoV-2 B.1.429 sequences from Hawai’i with samples collected between January and early March 2021, originated from a SARS-CoV-2 strain (EPI_ISL_753448) collected in California on November 30, 2020, at the Children’s Hospital Los Angeles. However, the first identified B.1.429 sequence in Hawai’i that is ancestral, unambiguous, and unique originated from a SARS-CoV-2 strain (EPI_ISL_855068) sampled on January 6, 2021, in San Juan Capistrano, California.
Within the S gene of the B.1.429 variant, the consensus sequence carries four non-synonymous substitutions (S13I, W152C, L452R, D614G). The effects of the S13I and W152C substitutions have yet to be determined. Neither of these mutations was found in more than 5% of all published GISAID sequences as of February 2021. In North America, the L452R substitution originated in Los Angeles (EPI_ISL_1303471), and it is shared by two variants of concern (B.1.427 and B.1.429).12 The CDC notes that these variants are associated with ~20% increased transmissibility of SARS-CoV-2,13 and with reduced neutralization by convalescent sera, sera from vaccinated individuals, and sera from patients treated with therapeutics.12 The D614G substitution, nearly ubiquitous among SARS-CoV-2 sequences, is noted for increasing the fitness of SARS-CoV-2.14
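As an illustration of how the four spike substitutions named above could be checked in a single sequence, the following minimal sketch tests a translated spike protein for S13I, W152C, L452R, and D614G. It is not the pipeline used in this study. The function names are hypothetical, and the sketch assumes the protein string is already in reference spike coordinates (no insertions or deletions upstream of position 614) and that unresolved residues are written as `X`.

```python
import re

# Spike substitutions reported for the B.1.429 consensus in the text.
B1429_SPIKE_SUBSTITUTIONS = ("S13I", "W152C", "L452R", "D614G")


def parse_substitution(label):
    """Split a label such as 'L452R' into (reference, position, alternate)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def call_substitutions(spike_protein, labels=B1429_SPIKE_SUBSTITUTIONS):
    """Report, per substitution, True (present), False (absent), or None.

    None means no call is possible: the position lies beyond the end of
    the sequence, or the residue is unresolved ('X').
    Assumption: spike_protein uses reference spike numbering (1-based).
    """
    calls = {}
    for label in labels:
        _ref, pos, alt = parse_substitution(label)
        if pos > len(spike_protein):
            calls[label] = None
            continue
        residue = spike_protein[pos - 1].upper()
        if residue == "X":
            calls[label] = None
        else:
            calls[label] = (residue == alt)
    return calls
```

A sequence for which every call is `True` carries the full set of substitutions listed above. A `None` call marks a position that cannot be resolved, which matters for the discussion of incomplete sequences that follows.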
This method demonstrates the critical importance of high-quality sequencing and the need for enrichment and deep coverage, as over 76% of published B.1.429 sequences worldwide are phylogenetically uninformative because they are incomplete. For example, the first B.1.429 sequence deposited from Hawai’i came from a sample collected on December 31, 2020 (EPI_ISL_967766). Because of ambiguous nucleotides in the S gene, this sequence cannot currently be used in phylogenetic analysis unless the whole genome is resequenced or the ambiguous region is targeted with Sanger sequencing. While uninformative sequences may be useful for tracking specific mutations, they are not useful for tracking variants. Moreover, our methodology demonstrates that sequencing and phylogenetic analysis can bring precision public health genomics to policy-making decisions. That is, as SARS-CoV-2 variants spread asymptomatically across the United States, fast and accurate methods for SARS-CoV-2 variant, lineage, and origin assignment are important.
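The completeness criterion discussed above can be sketched as a simple filter that counts ambiguous nucleotides within the S gene. This is an illustrative sketch, not the filter applied in this study. It assumes each genome has been aligned to the Wuhan-Hu-1 reference (NC_045512.2), where the S gene spans positions 21563 to 25384. The function names and the zero-tolerance threshold are hypothetical choices, and alignment gaps are counted as ambiguous.

```python
# S gene coordinates on NC_045512.2 (1-based, inclusive).
S_GENE_START = 21563
S_GENE_END = 25384

UNAMBIGUOUS_BASES = frozenset("ACGT")


def count_ambiguous(aligned_genome, start=S_GENE_START, end=S_GENE_END):
    """Count bases in [start, end] that are not A, C, G, or T.

    IUPAC ambiguity codes (N, R, Y, ...) and gap characters all count.
    Assumption: aligned_genome is in reference coordinates.
    """
    region = aligned_genome[start - 1:end].upper()
    return sum(1 for base in region if base not in UNAMBIGUOUS_BASES)


def is_s_gene_complete(aligned_genome, max_ambiguous=0,
                       start=S_GENE_START, end=S_GENE_END):
    """Return True if the region is fully covered and within tolerance.

    max_ambiguous=0 is a hypothetical, deliberately strict threshold.
    """
    if len(aligned_genome) < end:
        # The sequence stops before the end of the region.
        return False
    return count_ambiguous(aligned_genome, start, end) <= max_ambiguous
```

Sequences that fail such a check could still be screened for individual mutations at resolved positions, but under the stated assumptions they would be excluded before tree building.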