Coronaviruses infect a range of mammalian and avian species1. SARS-CoV-2, the agent of the COVID-19 pandemic8,9, belongs to the Sarbecovirus subgenus of betacoronaviruses, members of which mostly infect bats10,11. Hence, bat coronaviruses were identified as a likely evolutionary precursor of SARS-CoV-28,9. It remains unknown how SARS-CoV-2 could have evolved to infect humans, but two mechanisms have been proposed: selection in an animal host before zoonotic transfer (possibly via an intermediate host such as the Malayan pangolin (Manis javanica)12); or natural selection in humans following a zoonotic transfer, including direct transmission from bats13. To better understand this zoonosis, we have characterised the spike protein of SARS-CoV-2 and that of its closest relative, RaTG13. In addition to substitutions in the receptor binding domain (RBD), a second difference between the spike proteins from human and bat viruses is the presence of a four-amino-acid polybasic cleavage site, PRRA, between the S1 and S2 domains2. Similar cleavage sites have been found in related coronaviruses that infect humans, including HKU1 and MERS6,7,14, and their acquisition is associated with increased pathogenicity in other viruses such as influenza15.
To examine the evolutionary origin of SARS-CoV-2, we first characterised the spike (S) protein of the furin-cleaved human pandemic SARS-CoV-2 virus by cryoEM (Fig. 1a). We produced a form of the human S protein with the furin-cleavage site intact. This protein, which we expressed in mammalian cells, was secreted in a partially cleaved form, presumably owing to the naturally expressed proteases within these cells6 (Fig. S1A). We further cleaved this protein using exogenous furin for structural and biochemical characterisation (Fig. S1A). The particles analysed from cryo-electron micrographs fell into three populations: a closed form (34%), an intermediate form (39%) and an open form (27%) with an upright Receptor Binding Domain (RBD) (Fig. 1a). The overall structure of the closed conformation of the S trimer is three-fold symmetric and similar to structures described previously using uncleaved material2,16 (Fig. 1a). In the closed conformation, the surface of the RBD that would interact with the ACE2 receptor is buried inside the trimer and not accessible for receptor binding. In the intermediate form (Fig. 1a), two of the three RBDs maintain a similar interaction to the closed form, but the third RBD displays increased mobility and has shifted slightly away from the trimer axis. In the open form (Fig. 1a), two of the RBDs remain fairly closely associated, as in the closed and intermediate forms, but the third RBD rotates by approximately 60˚ such that the ACE2 interacting surface is now fully exposed at the top of the assembly. The changes in domain orientations between the closed and open forms are shown for a selected monomer in Fig. 1b.
In this protease-cleaved material, there is a higher proportion of the S proteins in an open conformation: 27% compared to 17% in the uncleaved human S trimer described below. The observation here of a substantially populated (39%) intermediate form, in which one of the RBDs has separated from the other two of the trimer, suggests that this conformation, possibly transient, will also lead to a receptor-binding competent form. Thus, protease cleavage is likely a selected feature of the human virus in that it leads to a higher proportion of S proteins on the virus surface capable of binding to receptor. Although the loop containing the cleavage site (residues 676-689) is disordered in both cleaved and uncleaved forms, cleavage likely introduces additional conformational plasticity in this part of the structure. This plasticity is propagated through the molecule by successively larger domain rearrangements, ultimately facilitating the ~60˚ rotation of the RBD.
Next, we determined the cryoEM structure of S from the closest known bat virus (RaTG13) as well as uncleaved human S (Fig. S1B). The bat protein was expressed in mammalian cells but was found to be unstable during preparation of EM grids and required chemical cross-linking to produce particles for data collection and analysis. The resulting micrographs yielded a single particle reconstruction at 3.1 Å resolution. The uncleaved human S sample was particularly stable and gave rise to the best quality density maps at 2.6 Å (Fig. S2), enabling us to model 15% more of the receptor binding domain (RBD, 100% complete) and 25% more of the N-terminal domain (NTD, 98% complete) than earlier studies2,16, which impacts the appearance of the trimer. The overall structure of the bat S protein is similar to that of the uncleaved human closed form (Fig. 2a, d). Presumably the chemical cross-linking required to image bat S is responsible for all particles being in the closed conformation. Comparison of the sequence of this bat S protein with the human one reveals a high degree of conservation overall (97.8% identity in the ectodomain) but with a relatively high proportion of substitutions in the RBD (89.6% identity) (Fig. 2b). The substitutions are clustered at two interfaces: the ACE2 receptor binding surface (considered below), and the RBD/RBD interfaces of the trimeric S. Analysis of the latter interface in the human trimer reveals an extensive network of potential intra-trimer hydrogen bonds, including Arg-403, Gln-493 and Tyr-505 from one subunit interacting with Ser-373, Ser-371 and Tyr-369 from another (Fig. 2c). The corresponding residues in the bat structure, and other inter-subunit contacts, suggest a lower surface complementarity. Of note, the bat S protein has an N-glycosylation site at Asn-370, where a bulky fucosylated glycan wedges between adjacent domains (Fig. S3).
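The identity figures quoted above (97.8% for the ectodomain, 89.6% for the RBD) are the kind of number obtained by counting matching columns in a pairwise alignment. The text does not state how gaps were treated, so the following is only a minimal sketch under the assumption that gap columns are excluded from the comparison. The function name `percent_identity` and the two toy sequences are hypothetical and are not real spike sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gap columns ('-') are skipped; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (NOT real spike sequences).
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFGHIRL"
print(f"toy identity: {percent_identity(toy_a, toy_b):.1f}%")
```

Applied separately to the ectodomain and RBD regions of a real alignment, a calculation of this form would show whether substitutions are concentrated in one domain, which is the comparison the text draws.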
Indeed, surface contact area calculations show that in the bat S trimer, the monomer/monomer interactions account for 5200 Å² (of which 485 Å² is between the RBDs), while the equivalent contact area in the closed structure of the SARS-CoV-2 S trimer is 6100 Å² (with 550 Å² between the RBDs). To further investigate the relative stability of the human and bat S trimers, we carried out thermal denaturation experiments (Fig. S1C). These data show that the uncleaved human S trimer has a markedly higher thermal stability than the bat protein, while the cleaved human protein has a similar stability to the (uncleaved) bat protein. Perhaps the higher stability of human S is required to offset some of the loss of stability that occurs upon cleavage. These structural and biochemical data together suggest that the human virus acquired an advantage by having a polybasic cleavage site, which facilitates a higher proportion of the open, receptor-binding competent conformation.
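To put the contact areas quoted above on a common scale, the short sketch below expresses the human values as a percentage increase over the bat values. It uses only the four areas given in the text; the dictionary keys and the helper `percent_increase` are illustrative names, not part of the original analysis.

```python
# Interface areas quoted in the text, in square angstroms.
bat_area = {"monomer/monomer": 5200.0, "RBD/RBD": 485.0}
human_area = {"monomer/monomer": 6100.0, "RBD/RBD": 550.0}


def percent_increase(reference: float, value: float) -> float:
    """Return how much larger `value` is than `reference`, in percent."""
    return 100.0 * (value - reference) / reference


for interface in bat_area:
    gain = percent_increase(bat_area[interface], human_area[interface])
    print(f"{interface}: human interface is {gain:.1f}% larger than bat")
```

Both interfaces come out larger in the human trimer by a low double-digit percentage, which is in line with the higher surface complementarity described for the human RBD/RBD contacts.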
As outlined above, the second region with a high proportion of sequence differences between the bat and human RBDs is at the receptor binding site. To quantitate the impact of these differences on binding to the human ACE2 receptor, we measured binding by biolayer interferometry. Spike protein, either human or bat, was immobilised onto the surface of a sensor, and purified ACE2 was flowed over the surface to measure binding. Amplitude analysis suggests that the human S binds ACE2 approximately 1000-fold more strongly than the bat protein, with Kd values of <100 nM and >40 μM, respectively (Fig. 3a).
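Because the two Kd values above are reported as bounds, their ratio gives a lower limit on the affinity difference rather than a point estimate. The sketch below works through that arithmetic and converts a fold difference into a binding free-energy difference using ΔΔG = RT ln(fold). The temperature of 298.15 K is an assumption made for illustration (the measurement temperature is not given in this section), and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_difference(kd_weak_molar: float, kd_strong_molar: float) -> float:
    """Ratio of dissociation constants (weaker binder over stronger binder)."""
    return kd_weak_molar / kd_strong_molar


def ddg_kcal_per_mol(fold: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference for a given fold change in Kd.

    Assumption: 298.15 K unless another temperature is supplied.
    """
    return R_KCAL * temperature_k * math.log(fold)


kd_human_upper = 100e-9  # human S: Kd < 100 nM (upper bound from the text)
kd_bat_lower = 40e-6     # bat S:   Kd > 40 uM  (lower bound from the text)

# Ratio of the bounds: the true fold difference is at least this large.
min_fold = fold_difference(kd_bat_lower, kd_human_upper)
print(f"minimum fold difference from the bounds: {min_fold:.0f}")
print(f"ddG at that minimum: {ddg_kcal_per_mol(min_fold):.2f} kcal/mol")
print(f"ddG at 1000-fold:    {ddg_kcal_per_mol(1000.0):.2f} kcal/mol")
```

The bounds alone imply a difference of at least about 400-fold, which is compatible with the approximately 1000-fold figure from the amplitude analysis.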
Previous studies have determined the structural interaction of the isolated RBD of SARS-CoV-2 S with human ACE217,18. This information (PDB: 6VW117) enables us to model and compare the ACE2 domain bound to our human and bat S trimers. In the case of the human S/ACE2 complex, there is a buried surface area of 840 Å². As well as a series of specific salt bridges and hydrogen bonds, another notable feature is that Phe-486(HS) inserts into a hydrophobic pocket on the surface of ACE2 formed by residues including Phe-28(ACE2), Leu-79(ACE2), Met-82(ACE2) and Tyr-83(ACE2). In contrast, in the bat spike protein, the hydrophobic Phe-486 is replaced by a less bulky leucine residue (Leu-486(BS)) (Fig. 3b), which accounts in part for the smaller buried surface of the bat S/ACE2 complex of 760 Å². Structural comparison also suggests another substitution that likely contributes to the greatly enhanced affinity of human S binding to ACE2: Gln-493(HS) makes a potential hydrogen bond with Glu-35(ACE2), which is salt bridged to Lys-31(ACE2), which in turn salt bridges with Glu-484(HS). In contrast, the residue equivalent to Gln-493(HS) in the bat protein is a tyrosine that sterically clashes with Lys-31(ACE2) and does not hydrogen bond to Glu-35(ACE2), while Glu-484(HS) is replaced by a threonine that would not bond to Lys-31(ACE2) (Fig. 3c). Moreover, the glutamine at position 498 is replaced by Thr-234(BS), which could not hydrogen bond to Tyr-41(ACE2).
Together, our structural and biochemical data indicate that a bat virus similar to RaTG13 would not be able to bind effectively to the ACE2 receptor and would be unlikely to infect humans directly. Given the modular nature of the human and bat spike glycoproteins, and the number and structural locations of the amino-acid sequence differences between them, our observations could be interpreted to support the involvement of recombination12 between distinct coronavirus genomes in the generation of SARS-CoV-2. The structure of the SARS-CoV-2 spike protein presented here is at high resolution and nearly complete, with many more external loops included, and thus provides important insights for vaccine design. Further, our study suggests that the presence of the polybasic cleavage site in the S of SARS-CoV-2 leads to enhanced virus transmissibility, as it increases the proportion of RBDs on the virus surface able to bind receptor.