The deletion 𝚫H69/V70 is present in over 6000 sequences worldwide, 2.5% of the available data (Fig. 1, Supplementary Fig. 1), and is found largely in Europe, from where most of the sequences in GISAID are derived (Fig. 1C-D). Many of the sequences are from the UK and Denmark, where sequencing rates are high compared with other countries. 𝚫H69/V70 occurs in variants observed in different global lineages, representing multiple independent acquisitions of this SARS-CoV-2 deletion (Fig. 1A). While variants with deletions in this region of Spike are observed in GISAID13, the earliest unambiguous sequence that includes 𝚫H69/V70 was detected in Sweden in April 2020, an independent deletion event relative to other 𝚫H69/V70 variants. The prevalence of 𝚫H69/V70 has increased since August 2020 (Fig. 1C-D). Further analysis of sequences revealed, firstly, that single deletions of either residue 69 or residue 70 were uncommon and, secondly, that some lineages carried 𝚫H69/V70 alone (Fig. 1A), while others carried 𝚫H69/V70 together with other mutations in Spike, specifically those in the RBM (Fig. 1A, E, F).
To gauge the importance of this region of the Spike molecule, we examined the 69/70 region of Spike in a set of other known sarbecoviruses related to SARS-CoV-2 (Fig. 2). We observe substantial variability in this region, largely caused by indels, with some viruses, including SARS-CoV, having 6–7 amino acid deletions (Fig. 2B). This indicates plasticity in this protein region that could allow sarbecoviruses to alter their Spike conformation. After RaTG13, the second closest relative to SARS-CoV-2 for this region is the cluster of five CoVs sampled in trafficked pangolins in Guangxi province14. In these sequences, one of the five viruses in the cluster, P1E, has amino acids 69H and 70L present, while the other four have a double amino acid deletion (Fig. 2B). Given that SARS-CoV-2 and RaTG13 have the homologous HV insertion at these positions, the most parsimonious explanation is that the proximal common ancestor of SARS-CoV-2 and the Guangxi pangolin cluster had the insertion, which was then lost while circulating in the pangolin population, similar to what we now see with SARS-CoV-2 in humans. Interestingly, the double amino acid deletion in the pangolin viruses is in-frame, in contrast to what is seen in SARS-CoV-2 (e.g. lineage B.1.1.7, Fig. 2C).
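To make the in-frame versus out-of-frame distinction concrete, the Python sketch below uses a hypothetical 15-nucleotide toy reading frame whose five codons are labelled A67 to S71 for illustration only; the triplets are not claimed to be the actual SARS-CoV-2 genome sequence or coordinates. It shows that a 6-nt deletion aligned to codon boundaries and a 6-nt deletion straddling three codons can both produce the same two-residue loss at the protein level, provided the codon reassembled at the junction is synonymous. The helper names `translate` and `delete` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): codons enumerated in
# TCAG order with the third base varying fastest; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate codon by codon from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - len(nt) % 3, 3))


def delete(nt: str, start: int, length: int) -> str:
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return nt[:start] + nt[start + length:]


# Hypothetical toy reading frame: A67 I68 H69 V70 S71.
ref = "GCT" "ATA" "CAT" "GTC" "TCT"

# In-frame: removes exactly the two codons CAT and GTC (indices 6-11).
in_frame = delete(ref, 6, 6)
# Out-of-frame: removes 'ACATGT' (indices 5-10), spanning the last base of
# codon 68 through the second base of codon 70.
out_of_frame = delete(ref, 5, 6)

print(translate(ref))           # AIHVS
print(translate(in_frame))      # AIS
print(translate(out_of_frame))  # AIS (junction codon ATC still encodes Ile)
```

The two deleted nucleotide strings differ (`GCTATATCT` versus `GCTATCTCT`), so the distinction is only visible at the nucleotide level, which is why it can help separate independent deletion events that look identical in the protein alignment.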
To estimate the structural impact of 𝚫H69/V70, we modelled the protein structure of the NTD carrying the double deletion. 𝚫H69/V70 was predicted to alter the conformation of a protruding loop comprising residues 69 to 76, pulling it in towards the NTD (Fig. 3A). In the post-deletion structural model, the alpha carbons of the residues on either side of the deletion, Ile68 and Ser71, were each predicted to lie 2.9 Å from the positions of His69 and Val70 in the pre-deletion structure. Concurrently, the positions of Ser71, Gly72, Thr73, Asn74 and Gly75 are predicted to have changed by 6.5 Å, 6.7 Å, 6.0 Å, 6.2 Å and 8 Å, respectively, the net effect being an inward movement of these residues and a less prominently protruding loop. The position of this loop in the structure prior to the deletion is shown in the context of the wider NTD in Fig. 3B. The locations of the main RBD mutations observed with 𝚫H69/V70 are shown in Fig. 3C and D. Residues belonging to a similarly exposed nearby loop, which form the epitope of a neutralising NTD-binding antibody, are also highlighted.
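The displacements quoted above are distances between alpha-carbon positions in the pre- and post-deletion models. A minimal Python sketch of that calculation follows, assuming (as an illustration, not as the modelling procedure used here) that both models have already been superposed into a common reference frame. The function name `ca_displacements` and all coordinates are hypothetical.

```python
import math

# A 3-D coordinate in ångströms.
Coord = tuple[float, float, float]


def ca_displacements(pre: dict[str, Coord], post: dict[str, Coord]) -> dict[str, float]:
    """Return the C-alpha displacement (Å) of every residue present in both models.

    Assumes both models already share one reference frame (i.e. they were
    superposed beforehand); residues missing from either model, such as
    deleted ones, are skipped.
    """
    shared = pre.keys() & post.keys()
    return {res: math.dist(pre[res], post[res]) for res in sorted(shared)}


# Hypothetical toy coordinates, not values from the NTD models discussed above.
pre_model = {"His69": (5.0, 0.0, 0.0), "Ser71": (0.0, 0.0, 0.0), "Gly72": (1.0, 2.0, 2.0)}
post_model = {"Ser71": (3.0, 4.0, 0.0), "Gly72": (1.0, 2.0, 2.0)}

disp = ca_displacements(pre_model, post_model)
print(disp)  # {'Gly72': 0.0, 'Ser71': 5.0}
```

The same `math.dist` call gives cross-residue distances, such as the separation between a post-deletion Ile68 or Ser71 and the pre-deletion His69 or Val70 position. The superposition step matters: without a common frame (for example, a least-squares fit on the unchanged NTD core), the per-residue values would mix rigid-body shifts with the local loop rearrangement.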
We next examined the lineages in which S gene mutations in the RBD were identified at high frequency, in particular those co-occurring with N439K (Fig. 3C,D), an amino acid replacement reported to define variants that are increasing in number in Europe and other regions3 (Fig. 1E, Supplementary Fig. 2). N439K appears to reduce susceptibility to a small subset of monoclonal antibodies targeting the RBD, whilst retaining affinity for ACE2 in vitro3. The proportion of viruses with 𝚫H69/V70 only increased from August 2020, when the deletion appeared with the second N439K lineage, B.1.141 (ref. 3; Fig. 1E). Remarkably, as of November 26th there were twice as many cumulative sequences with N439K+𝚫H69/V70 as with N439K alone, indicating that the deletion may be contributing to the success of this lineage (Fig. 1E). Owing to its high sampling rate, England is the country with the highest proportion of N439K+𝚫H69/V70 versus N439K alone. The low levels of sequencing in most countries indicate that the true prevalence of N439K could be relatively high3. In Scotland, where early growth of N439K was high (forming N439K lineage B.1.258, which subsequently went extinct along with other lineages after the lockdown3), there is now an inverse relationship, with 546 versus 177 sequences for N439K and N439K+𝚫H69/V70, respectively (Fig. 1E). These differences therefore likely reflect differing epidemic growth characteristics and timings of introduction of the N439K variants with or without the deletion.
The second significant cluster with 𝚫H69/V70 and RBD mutants involves Y453F, another Spike RBD mutation that increases binding affinity to ACE2 (ref. 4; Fig. 3C,D) and has been found to be associated with mink-human infections15. In one SARS-CoV-2 mink-human sub-lineage, termed ‘Cluster 5’, Y453F and 𝚫H69/V70 occurred with F486L, N501T and M1229I, and this sub-lineage was shown to have reduced susceptibility to sera from recovered COVID-19 patients (https://files.ssi.dk/Mink-cluster-5-short-report_AFO2). Y453F has been described as an escape mutation for mAb REGN10933 (ref. 16). 𝚫H69/V70 was first detected in the Y453F background on August 24th and thus far appears limited to Danish sequences (Supplementary Fig. 3).
A third lineage containing the same deletion, 𝚫H69/V70, has arisen with another RBD mutation, N501Y (Fig. 4A, C, Supplementary Fig. 4). Based on its location, N501Y might be expected to confer escape from antibodies such as COV2-2499 (ref. 5; Fig. 3C, D). In addition, when SARS-CoV-2 was passaged in mice for adaptation purposes to test vaccine efficacy, N501Y emerged and increased pathogenicity17. Sequences with N501Y alone were isolated in the UK, Brazil and the USA in April 2020, and more recently in South Africa18. A newly described N501Y-derived lineage in South Africa, B.1.351 (Fig. 1A; also termed 501Y.V2), is characterised by eight mutations in the Spike protein, including N501Y and substitutions at two other important RBM residues, K417N and E484K (ref. 18). The positions of residues 417, 484 and 501 proximal to the bound hACE2 are shown in Fig. 4C. The E484K substitution has been identified as antigenically important, having been reported as an escape mutation for several monoclonal antibodies, including C121, C144 (ref. 19), REGN10933 and REGN10934 (ref. 16). The increase in hACE2 binding affinity caused by N501Y is permissive of K417N: specifically, the gain in ACE2 affinity conferred by N501Y (+0.24 𝚫log10 Kd) may partially compensate for the loss caused by K417N (−0.45 𝚫log10 Kd)20. Residue K417 has also been identified as antigenically significant, with K417E facilitating escape from mAb REGN10933 (ref. 16).
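As a worked illustration of this compensation, assume (purely for illustration; the cited measurements do not establish it) that the two effects combine additively on the log scale, i.e. with no epistasis between sites 417 and 501. Positive values denote increased ACE2 affinity, as in the text above:

$$
\Delta\log_{10}K_d^{\,\mathrm{N501Y+K417N}} \approx \Delta\log_{10}K_d^{\,\mathrm{N501Y}} + \Delta\log_{10}K_d^{\,\mathrm{K417N}} = (+0.24) + (-0.45) = -0.21
$$

Under this additive assumption, N501Y would offset roughly half (0.24/0.45 ≈ 53%) of the affinity loss attributed to K417N, leaving a net change of about −0.21; any epistatic interaction between the two sites would shift this estimate.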
N501Y + 𝚫H69/V70 sequences were first detected in the UK on 20th September 2020, and the cumulative number of N501Y + 𝚫H69/V70 sequences now exceeds that of the single-mutant N501Y lineage (Fig. 1F). On closer inspection, these sequences were part of a new lineage (B.1.1.7), termed VOC 202012/01 by Public Health England because of its association with relatively high numbers of infections (Fig. 4A-C, Supplementary Fig. 4). In addition to RBD N501Y and NTD 𝚫H69/V70, this new variant carried further S mutations, including P681H, T716I, S982A and D1118H, as well as NTD 𝚫144 (ref. 21; Fig. 4B). The variant has now been identified in a number of other countries, including Hong Kong, Japan, Australia, France, Spain, Singapore, Israel, Switzerland and Italy. This lineage has a relatively long branch owing to 23 unique mutations (Fig. 4A and Supplementary Fig. 4), consistent with within-host evolution and spread from a chronically infected individual21. Notably, a sequence can be identified that contains 𝚫H69/V70, N501Y and D1118H (Fig. 4A, black box). However, the available sequences did not enable us to determine whether the B.1.1.7 mutations N501Y + 𝚫H69/V70 arose as a result of an N501Y virus acquiring 𝚫H69/V70 or vice versa.
The B.1.1.7 lineage has some notable features. Firstly, the Spike 𝚫144 mutation could lead to loss of binding of the S1-binding neutralising antibody 4A8 (ref. 13; Fig. 3B). The Y144 side chain is itself around 4.5 Å from the nearest atoms of 4A8 complexed with Spike, and the deletion is expected to alter the positions of neighbouring residues that directly interact with 4A8; the contacting residues (145, 146, 147, 150, 152, 246 and 258)22 are shown in magenta in Fig. 3B. Secondly, the P681H mutation lies within the furin cleavage site. Furin cleavage is a property of some more distantly related coronaviruses but, notably, is not found in SARS-CoV-1 (ref. 23). Passage of SARS-CoV-2 in vitro results in mutations in the furin cleavage site, suggesting that cleavage is dispensable for infection in vitro24. The significance of furin site mutations may relate to potential escape from the innate antiviral IFITM proteins by allowing endosome-independent infection25. The significance of the multiple S2 mutations is unclear at present, though D614G, which also lies outside the RBD, was found to lead to a more open RBD orientation, which may explain its higher infectivity26. T716I and D1118H occur at residues located close to the base of the ectodomain (Fig. 4B) that are partially exposed and buried, respectively. Residue 982 is located centrally, between the NTDs, at the top of a short helix (approximately residues 976–982) that is completely shielded by the RBD when Spike is in the closed form but becomes slightly more exposed in the open conformation. Residue 681 is part of the loop (residues 676–689) containing the furin cleavage site, whose structure is disordered in both cleaved and uncleaved forms27; however, the surface-exposed locations of modelled residues 676 and 689 (orange in Fig. 4B) indicate that the unmodelled residues 677–688 form a prominently exposed loop, whose considerable structural flexibility has prevented its inclusion in structural models22,27.