Recurrent independent emergence and transmission of SARS-CoV-2 Spike amino acid H69/V70 deletions

SARS-CoV-2 Spike amino acid replacements in the receptor binding domain (RBD) occur relatively frequently and some have a consequence for immune recognition. Here we report recurrent emergence and significant onward transmission of a six-nucleotide deletion in the S gene, which results in loss of two amino acids: H69 and V70. Of particular note, this deletion, ΔH69/V70, often co-occurs with the receptor binding motif amino acid replacements N501Y, N439K and Y453F. One of the ΔH69/V70 + N501Y lineages, B.1.1.7, comprises over 4000 SARS-CoV-2 genome sequences from the UK and includes eight S gene mutations: RBD (N501Y and A570D), S1 (ΔH69/V70 and Δ144/145) and S2 (P681H, T716I, S982A and D1118H). Some of these mutations have presumably arisen as the virus evolved under immune selection pressure in infected individuals, and at least one lineage, B.1.1.7, potentially arose from a chronic infection. Given our recent evidence that ΔH69/V70 enhances viral infectivity (Kemp et al. 2020), its effect on virus fitness appears to be independent of the RBD changes. Enhanced surveillance for the ΔH69/V70 deletion with and without RBD mutations should be considered as a priority. Permissive mutations such as ΔH69/V70 have the potential to enhance the ability of SARS-CoV-2 to generate new variants, including vaccine escape variants, that would otherwise have significantly reduced viral fitness.

Background
SARS-CoV-2's Spike surface glycoprotein engagement of hACE2 is essential for virus entry and infection 1, and the receptor is found in respiratory and gastrointestinal tracts 2. Despite this critical interaction and the constraints it imposes, it appears the RBD, and particularly the receptor binding motif (RBM), is relatively tolerant to mutations 3,4, raising the real possibility of virus escape from vaccine-induced immunity and monoclonal antibody treatments. Spike mutants exhibiting reduced susceptibility to monoclonal antibodies have been identified in in vitro screens 3,5,6, and some of these mutations have been found in clinical isolates 7. Due to the susceptibility of the human population to this virus, the acute nature of infections and limited use of vaccines to date, there has been limited selection pressure placed on SARS-CoV-2 8; as a consequence, few mutations that could alter antigenicity have increased significantly in frequency.
The unprecedented scale of whole genome SARS-CoV-2 sequencing has enabled identification and epidemiological analysis of transmission and surveillance, particularly in the UK 9. As of December 18th, there were 270,000 SARS-CoV-2 sequences available in the GISAID Initiative (https://gisaid.org/). However, geographic coverage is very uneven, with some countries sequencing at higher rates than others. This could result in novel variants with altered biological or antigenic properties evolving and not being detected until they are already at high frequency.
Studying SARS-CoV-2 chronic infections can give insight into virus evolution that would require many chains of acute transmission to generate. This is because the majority of infections arise as a result of early transmission during pre-symptomatic or asymptomatic phases prior to peak adaptive responses, and virus adaptation is not observed as the virus is usually cleared by the immune response 10,11. We recently documented de novo emergence of antibody evasion mutations mediated by S gene mutations in an individual treated with convalescent plasma (CP) 12. Dramatic changes in the prevalence of the Spike variants ΔH69/V70 (an out-of-frame six-nucleotide deletion) and D796H followed repeated use of CP, and in vitro the ΔH69/V70 + D796H double mutant displayed reduced susceptibility to CP whilst retaining infectivity comparable to wild type 12. The ΔH69/V70 deletion itself conferred a two-fold increase in Spike-mediated infectivity using pseudotyped lentiviruses. Worryingly, other deletions in the N-Terminal Domain (NTD) have been reported to arise in chronic infections 7 and provide escape from NTD-specific neutralising antibodies 13.
Here we analysed the available GISAID Initiative data for circulating SARS-CoV-2 Spike sequences containing ΔH69/V70 and performed phylogenetic and structural modelling from the pandemic data and across sarbecoviruses in different species. We find that, while occurring independently, the Spike ΔH69/V70 often emerges after a significant RBM amino acid replacement that increases binding affinity to hACE2.
Protein structure modelling indicates this mutation could also contribute to antibody evasion as suggested for other NTD deletions 13 .

Results
The deletion ΔH69/V70 is present in over 6000 sequences worldwide, 2.5% of the available data (Fig. 1, Supplementary Fig. 1), and largely in Europe, from where most of the sequences in GISAID are derived (Fig. 1C-D). Many of the sequences are from the UK and Denmark, where sequencing rates are high compared to other countries. ΔH69/V70 occurs in variants observed in different global lineages, representing multiple independent acquisitions of this SARS-CoV-2 deletion (Fig. 1A). While variants with deletions in this region of Spike are observed in GISAID 13, the earliest unambiguous sequence that includes ΔH69/V70 was detected in Sweden in April 2020, an independent deletion event relative to other ΔH69/V70 variants. The prevalence of ΔH69/V70 has increased since August 2020 (Fig. 1C-D). Further analysis of sequences revealed, firstly, that single deletions of either 69 or 70 were uncommon and, secondly, that some lineages carried ΔH69/V70 alone (Fig. 1A), while others carried ΔH69/V70 in the context of other mutations in Spike, specifically those in the RBM (Fig. 1A, E, F).
To gauge the importance of this part of the Spike molecule, we examined the 69/70 region of Spike in a set of other known sarbecoviruses, relatives of SARS-CoV-2 (Fig. 2). We observe substantial variability in the region, specifically caused by indels, with some viruses including SARS-CoV having 6-7 amino acid deletions (Fig. 2B). This is indicative of plasticity in this protein region that could allow the sarbecoviruses to alter their Spike conformation. The second closest relative to SARS-CoV-2 for this region after RaTG13 is the cluster of 5 CoVs sampled in trafficked pangolins in the Guangxi province 14. Looking at the 69/70 region in these virus sequences raises the interesting observation that one of the five viruses in the cluster, P1E, has amino acids 69H and 70L present, while the other four have a double amino acid deletion (Fig. 2B). Given that SARS-CoV-2 and RaTG13 have the homologous HV insertion at these positions, the most parsimonious explanation is that the proximal common ancestor between SARS-CoV-2 and the Guangxi pangolin cluster had the insertion, which was then lost while circulating in the pangolin population, similar to what we now see with SARS-CoV-2 in humans. Interestingly, the double amino acid deletion in the pangolin viruses is in-frame, in contrast to what is seen in SARS-CoV-2 (e.g. lineage B.1.1.7, Fig. 2C).
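The distinction between an in-frame and an out-of-frame six-nucleotide deletion can be illustrated with a minimal sketch. The nucleotide fragment below is an assumption: it is taken to correspond to Spike codons 67-73 (A, I, H, V, S, G, T) of the reference genome and should be checked against the reference sequence before reuse. The deletion offsets and the helper names `translate` and `delete` are likewise illustrative and do not come from the analysis pipeline used here.

```python
# Standard genetic code, built in the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(nt: str) -> str:
    """Translate a nucleotide string in frame 0, ignoring trailing bases."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def delete(nt: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return nt[:start] + nt[start + length:]


# ASSUMPTION: fragment taken to span Spike codons 67-73 (A I H V S G T).
REF = "GCTATACATGTCTCTGGGACC"

# Out-of-frame: starts at the second base of codon 68 and ends at the
# first base of codon 70, removing "TACATG".
out_of_frame = delete(REF, 4, 6)

# In-frame: removes codons 69 and 70 exactly, "CATGTC".
in_frame = delete(REF, 6, 6)

print(translate(REF))           # reference peptide
print(translate(out_of_frame))  # peptide after out-of-frame deletion
print(translate(in_frame))      # peptide after in-frame deletion
```

Under these assumptions both deletions remove H69 and V70 and leave the flanking residues unchanged, because the out-of-frame deletion rebuilds an isoleucine codon (ATC) at position 68 from the remaining bases. The two events differ at the nucleotide level but not at the protein level.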
To estimate the structural impact of ΔH69/V70, the protein structure of the NTD possessing the double deletion was modelled. The ΔH69/V70 deletion was predicted to alter the conformation of a protruding loop comprising residues 69 to 76, pulling it in towards the NTD (Fig. 3A). In the post-deletion structural model, the positions of the alpha carbons of residues either side of the deleted residues, Ile68 and Ser71, were each predicted to occupy positions 2.9 Å from the positions of His69 and Val70 in the pre-deletion structure. Concurrently, the positions of Ser71, Gly72, Thr73, Asn74 and Gly75 are predicted to have changed by 6.5 Å, 6.7 Å, 6.0 Å, 6.2 Å and 8 Å, respectively, with the overall effect of these residues moving inwards, resulting in a less dramatically protruding loop. The position of this loop in the structure prior to the occurrence of ΔH69/V70 is shown in the context of the wider NTD in Fig. 3B. The locations of the main RBD mutations observed with ΔH69/V70 are shown in Fig. 3C and D. Residues belonging to a similarly exposed, nearby loop that form the epitope of a neutralising, NTD-binding antibody are also highlighted.
We next examined the lineages where S gene mutations in the RBD were identified at high frequency, in particular co-occurring with N439K (Fig. 3C,D), an amino acid replacement reported to be defining variants increasing in numbers in Europe and other regions 3 (Fig. 1E, Supplementary Fig. 2). N439K appears to have reduced susceptibility to a small subset of monoclonals targeting the RBD, whilst retaining affinity for ACE2 in vitro 3. The proportion of viruses with ΔH69/V70 increased only from August 2020, when it appeared with the second N439K lineage, B.1.141 3 (Fig. 1E). As of November 26th, remarkably, there were twice as many cumulative sequences with the deletion as with N439K alone, indicating it may be contributing to the success of this lineage (Fig. 1E). Owing to its high sampling rate, the country with the highest proportion of N439K+ΔH69/V70 versus N439K alone is England. The low levels of sequencing in most countries indicate N439K's prevalence could be relatively high 3. In Scotland, where early growth of N439K was high (forming N439K lineage B.1.258 that subsequently went extinct with other lineages after the lockdown 3), there is now an inverse relationship, with 546 versus 177 sequences for N439K and N439K+ΔH69/V70 respectively (Fig. 1E). These differences therefore likely reflect differing epidemic growth characteristics and timings of the introductions of the N439K variants with or without the deletion.
The second significant cluster with ΔH69/V70 and RBD mutants involves Y453F, another Spike RBD mutation that increases binding affinity to ACE2 4 (Fig. 3C,D) and has been found to be associated with mink-human infections 15. In one SARS-CoV-2 mink-human sub-lineage, termed 'Cluster 5', Y453F and ΔH69/V70 occurred with F486L, N501T and M1229I, and the sub-lineage was shown to have reduced susceptibility to sera from recovered COVID-19 patients (https://files.ssi.dk/Mink-cluster-5-short-report_AFO2). Y453F has been described as an escape mutation for mAb REGN10933 16. ΔH69/V70 was first detected in the Y453F background on August 24th and thus far appears limited to Danish sequences (Supplementary Fig. 3).
A third lineage containing the same deletion, ΔH69/V70, has arisen with another RBD mutation, N501Y (Fig. 4A, C, Supplementary Fig. 4). Based on its location it might be expected to escape antibodies similar to COV2-2499 5 (Fig. 3C, D). In addition, when SARS-CoV-2 was passaged in mice for adaptation purposes for testing vaccine efficacy, N501Y emerged and increased pathogenicity 17. Sequences with N501Y alone were isolated in the UK, Brazil and USA in April 2020, and recently in South Africa 18. A newly described N501Y-derived lineage in South Africa (B.1.351, also termed 501Y.V2; Fig. 1A) is characterised by eight mutations in the Spike protein, including N501Y and substitutions at two other important RBM residues (K417N and E484K) 18. The positions of residues 417, 484, and 501 proximal to the bound hACE2 are shown in Fig. 4C. The E484K substitution has been identified as antigenically important, being reported as an escape mutation for several monoclonal antibodies including C121, C144 19, REGN10933 and REGN10934 16. The increase in hACE2 binding affinity caused by N501Y is permissive of the mutation K417N; specifically, the increase in ACE2 affinity conferred by N501Y (+0.24 log10 Kd) may compensate for the decrease caused by K417N (-0.45 log10 Kd) 20. Residue K417 is also identified as antigenically significant, with K417E facilitating escape from mAb REGN10933 16. N501Y + ΔH69/V70 sequences were first detected in the UK on 20th September 2020, with the cumulative number of N501Y + ΔH69/V70 mutated sequences now exceeding the single mutant N501Y lineage (Fig. 1F). On closer inspection these sequences were part of a new lineage (B.1.1.7), termed VOC 202012/01 by Public Health England as they are associated with relatively high numbers of infections (Fig. 4A-C, Supplementary Fig. 4). In addition to RBD N501Y + NTD ΔH69/V70, this new variant had five further S mutations across the RBD (A570D) and S2 (P681H, T716I, S982A and D1118H), as well as NTD Δ144 21 (Fig. 4B). The variant has now been identified in a number of other countries, including Hong Kong, Japan, Australia, France, Spain, Singapore, Israel, Switzerland, and Italy. This lineage has a relatively long branch due to 23 unique mutations (Fig. 4A and Supplementary Fig. 4), consistent with within-host evolution and spread from a chronically infected individual 21. Notably, a sequence can be identified that contains ΔH69/V70, N501Y and D1118H (Fig. 4A, black box). However, the available sequences did not enable us to determine whether the B.1.1.7 mutations N501Y + ΔH69/V70 arose as a result of a N501Y virus acquiring ΔH69/V70 or vice versa.
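The affinity figures quoted above for N501Y and K417N can be combined in a short worked calculation. This is only a sketch under a strong assumption that the source does not make: that the two effects add on the log10 scale, with no epistasis between the sites. The sign convention follows the text (positive values mean higher ACE2 affinity), and the function names are illustrative.

```python
def net_log10_change(*changes: float) -> float:
    """Sum per-mutation changes expressed in log10 units.

    ASSUMPTION: effects are additive on the log10 scale (no epistasis).
    """
    return sum(changes)


def fold_change(log10_change: float) -> float:
    """Convert a change in log10 units to a linear fold change."""
    return 10 ** log10_change


# Values as quoted in the text; positive means higher ACE2 affinity.
N501Y = 0.24
K417N = -0.45

net = net_log10_change(N501Y, K417N)
fold = fold_change(net)

print(f"net change: {net:+.2f} log10 units, fold change {fold:.2f}")
```

Under the additive assumption, N501Y offsets roughly half of the affinity cost attributed to K417N rather than all of it. Measured affinities for the double mutant would be needed to confirm whether the real interaction departs from additivity.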
The B.1.1.7 lineage has some notable features. Firstly, the Spike Δ144 mutation could lead to loss of binding of the S1-binding neutralising antibody 4A8 13 (Fig. 3B). The Y144 sidechain is itself around 4.5 Å from the nearest atoms of 4A8 complexed with Spike, and the deletion is expected to alter the positions of neighbouring residues that directly interact with 4A8; contacting residues (145, 146, 147, 150, 152, 246 and 258) 22 are shown in magenta in Fig. 3B. Secondly, the P681H mutation lies within the furin cleavage site. Furin cleavage is a property of some more distantly related coronaviruses, and in particular is not found in SARS-CoV-1 23. When SARS-CoV-2 is passaged in vitro it acquires mutations in the furin cleavage site, suggesting the cleavage is dispensable for in vitro infection 24. The significance of furin site mutations may be related to potential escape from the innate immune antiviral IFITM proteins by allowing infection independent of endosomes 25. The significance of the multiple S2 mutations is unclear at present, though D614G, also in S2, was found to lead to a more open RBD orientation to explain its higher infectivity 26. T716I and D1118H occur at residues located close to the base of the ectodomain (Fig. 4B) that are partially exposed and buried, respectively. Residue 982 is located centrally, in between the NTDs, at the top of a short helix (approximately residues 976-982) that is completely shielded by the RBD when Spike is in the closed form, though it becomes slightly more exposed in the open conformation.
Residue 681 is part of the loop (residues 676-689) containing the furin cleavage site, the structure of which is disordered in both cleaved and uncleaved forms 27, though the surface-exposed locations of modelled residues 676 and 689 (orange in Fig. 4B) indicate that the unmodelled residues 677-688 form a prominently exposed loop, the significant structural flexibility of which has prevented inclusion in structural models 22,27.

Discussion
We have presented data demonstrating multiple, independent, and circulating lineages of SARS-CoV-2 variants bearing a Spike ΔH69/V70. This deletion, mostly arising from an out-of-frame loss of six nucleotides, has frequently followed receptor binding amino acid replacements (N501Y, N439K and Y453F, which have been shown to increase binding affinity to hACE2 and reduce binding with monoclonal antibodies), and its prevalence is rising internationally. Interestingly, the presence of sequence at site 69/70 appears to be unique to SARS-CoV-2, the closest bat sarbecovirus, RaTG13, and one of the pangolin sequences. We speculate it may have been lost in the other pangolin viruses, as these viruses presumably originated in bats and infected the pangolins after their importation to China.
The ΔH69/V70 deletion was also shown to increase Spike-mediated infectivity by two-fold over a single round of infection, and appeared to occur with a mutation that conferred reduced susceptibility to neutralising antibodies 28. Over the millions of replication rounds per day in a SARS-CoV-2 infection, even modest reductions in antibody susceptibility could be significant. Therefore, ΔH69/V70 may be a 'permissive' mutation that enhances replication 28, with the potential to enhance the ability of SARS-CoV-2 to generate immune/vaccine escape variants that would otherwise have significantly reduced viral fitness. In the case of the UK B.1.1.7 lineage, with multiple Spike mutations that included the key mutations N501Y and ΔH69/V70, we were able to detect a sequence basal in the phylogeny of B.1.1.7 that had both mutations as well as D1118H in S2.
The potential for SARS-CoV-2 to evolve and fix mutations rapidly is exemplified by D614G, an amino acid replacement in S2 that alters linkages between S1 and S2 subunits on adjacent protomers as well as RBD orientation, infectivity, and transmission 26,29,30. The example of D614G also demonstrates that mechanisms directly impacting important biological processes can be indirect. Similarly, a number of possible mechanistic explanations may underlie ΔH69/V70. For example, the fact that it sits on an exposed surface and is estimated to alter the conformation of a particularly exposed loop might be suggestive of immune interactions and escape, although allosteric interactions could alternatively lead to the higher infectivity recently reported 31.
The finding of a lineage (B.1.1.7), termed VOC 202012/01, bearing eight S gene mutations across the RBD (N501Y, A570D), S1 (ΔH69/V70 and Δ144) and S2 (P681H, T716I, S982A and D1118H) in the UK requires urgent experimental characterisation. The detection of a high number of novel mutations suggests this lineage has either been introduced from a geographic region with very poor sampling, or viral evolution may have occurred in a single individual in the context of a chronic infection 12. This variant bears some concerning features. Firstly, the ΔH69/V70 deletion increases infectivity two-fold 31. Secondly, Δ144 may affect binding by antibodies related to 4A8 13. Thirdly, VOC 202012/01 bears the N501Y mutation, which may have higher binding affinity for ACE2 and which has arisen independently in other countries, including South Africa, where it has led to establishment and explosive transmission of a multi-mutated lineage 18. Finally, the VOC 202012/01 lineage has a second RBD mutation, A570D, that could alter Spike RBD structure, and a mutation near the furin cleavage site that could represent further adaptive change. The emergence of multi-mutated variants in the UK and South Africa may herald an era of re-infection and threaten future vaccine efficacy if left unchecked.
Given the emergence of multiple clusters of variants carrying RBD mutations and the ΔH69/V70 deletion, limitation of transmission takes on a renewed urgency. Continued emphasis on testing/tracing, social distancing and mask wearing is essential, with investment in other novel methods to limit transmission 32. In concert, comprehensive vaccination efforts in the UK and globally should be accelerated in order to further limit transmission and acquisition of further mutations. If variants are geographically limited, then focussed vaccination may be warranted. Research is vitally needed into whether lateral flow devices for antigen and antibody detection can detect emerging strains and the immune responses to them. Finally, detection of the deletion and other key mutations by rapid diagnostics should be a research priority, as such tests could be used as a proxy for antibody escape mutations to inform surveillance at global scale.
To reconstruct a phylogeny for the 69/70 Spike region of the 17 sarbecoviruses examined in Fig. 2, RDP5 38 was used on the codon Spike alignment to determine the region between amino acids 1 and 256 as putatively non-recombinant. A tree was reconstructed using the protein alignment of this region with FastTree (default parameters) 39. Alignment visualisation was done using BioEdit 40.

Structural modelling
The structure of the post-deletion NTD (residues 14-306) was modelled using I-TASSER 41, a method involving detection of templates from the protein data bank, fragment structure assembly using replica-exchange Monte Carlo simulation and atomic-level refinement of structure using a fragment-guided molecular dynamics simulation. The structural model generated was aligned with the Spike structure possessing the pre-deletion conformation of the 69-77 loop (PDB 7C2L 22) using PyMOL (Schrödinger).

Declarations
Conflicts of interest
RKG has received consulting fees from UMOVIS lab, Gilead Sciences and ViiV Healthcare, and a research grant from InvisiSmart Technologies.

The deleted residues H69 and V70 and the residues involved in amino acid substitutions (501, 570, 716, 982 and 1118) and the deletion at position 144 are coloured red on each monomer and labelled on the

Figures

Figure 1

Figure 2

Figure 4