SARS-CoV-2 antibodies recognize 23 distinct epitopic sites on the receptor binding domain

The COVID-19 pandemic and SARS-CoV-2 variants have dramatically illustrated the need for a better understanding of antigen (epitope)-antibody (paratope) interactions. To gain insight into the immunogenic characteristics of epitopic sites (ES), we systematically investigated the structures of 340 Abs and 83 nanobodies (Nbs) complexed with the Receptor Binding Domain (RBD) of the SARS-CoV-2 spike protein. We identified 23 distinct ES on the RBD surface and determined the frequencies of amino acid usage in the corresponding CDR paratopes. We describe a clustering method for analysis of ES similarities that reveals binding motifs of the paratopes and that provides insights for vaccine design and therapies for SARS-CoV-2, as well as a broader understanding of the structural basis of Ab-protein antigen (Ag) interactions.

(VOC). Our analysis of available structures may aid in understanding which Abs may be of value for emerging variants and contribute to evolving strategies for prophylaxis, treatment, and immunization.

Ab-protein antigen (Ab-Ag) interfaces have been a focus of immunologists and protein chemists for more than 80 years 6, not only because of the important role of Abs in defense against infection 7, but also due to the general interest in understanding protein-protein interactions 8. High resolution structural analysis of protein-protein complexes, based initially on X-ray crystallography and more recently on cryogenic electron microscopy (cryo-EM), provides an objective basis for understanding not only the biophysical principles that determine affinity and specificity, but also for elucidating biological and evolutionary rules that govern immunological molecular recognition of foreign molecules and pathogens 9,10. With an ever-

Table 1). A receptor binding motif (RBM) has been defined as those RBD residues that specifically interact with ACE2 15. Binding analysis of Nbs and human mAbs derived from patients along with a limited number of protein structures assigned five surface regions of the RBD reflecting its antigenic anatomy 16. Epitopic analysis was further extended by the definition of seven "communities" of Abs that bind to the RBD surface 17. Recent analysis of anti-RBD antibodies in the context of evolving escape mutations has taken advantage of these earlier classification schemes 18-21.

Although these classification schemes have been valuable and adopted widely in the analysis of Abs as to how they bind to RBD and spike, particular Abs and Nbs may not be unambiguously classified (Supplementary Figure 1). The previous summaries were based on a relatively small number of available structures and focused on the relative superposition of the Abs in the complexes, rather than on a comparison of the epitopic contacts of the RBD surface.

In particular, the original distinction between Class 1 and Class 2 seemed clear based on the initial structures. However, as more structural models became available, apparent inconsistencies

In this work, we focus on complexes of Abs and Nbs bound to the RBD of the spike protein to generate a comprehensive structural framework to further our understanding of Ab- and Nb-RBD recognition. Using a large database, we offer a structure-based classification exploiting quantitatively defined contacting amino acid residues on the RBD as well as a clustering analysis.

To identify common features of ES of the RBD, we systematically investigated structures of Abs (as Fabs and Fvs, Ab fragments that confer antigen binding activity) and of Nbs (as VHH or synthetic library-derived sybodies) in complex with the spike protein or its RBD as collected in the CoV-AbDab 23 and the Protein Data Bank (PDB) 24,25. Abs and Nbs that bind the SARS-CoV-2 RBD are summarized in Table 1 and serve as the basis of our structural analysis.

Evaluation of the biophysical properties that contribute to protein-protein interactions may be based on different criteria, including calculation of free energy terms of interacting residues 26, measurement of shape complementarity (Sc 27), and calculation of buried or accessible surface area 28-32. We elected to simplify this analysis first by calculating interatomic contacts between Ab (paratopic) and Ag (epitopic) residues at the interface because the biophysical basis of binding (due to charge, hydrophobicity, hydrogen bonding and van der Waals interactions) is reflected in such contacts. We calculated distances between Ab and Ag interface residues with a cut-off of 5.0 Å (see Methods) and we plotted the numbers of Ab (paratope) contacts as hits versus the residue number of the RBD (epitope) for the Ab heavy (H) (Figure 1a) and light (L) (Supplementary Figure 2a) chains individually, and also overall for both H and L chains together (Supplementary Figure 2b). We also plotted the number of hits of the 83 Nbs to each RBD residue (Figure 1c) Ab, H chain, or Nb are by and large, the same, the relative distribution of hits varies for several regions. In particular, the region from RBD residue 368 to 386 is recognized more frequently by Nbs, while other contiguous surfaces are seen equivalently (Figure 1a & 1c). The numbers of hits for Ab H chains are represented graphically as a heat map on the RBD surface in Figure 1b, and the heat maps for the Nbs are shown in Figure 1d.

Several contiguous stretches of amino acids of the RBD that make Ab contact were apparent, although the frequency of hits varied considerably for different regions on the surface of the RBD. A fine-grained tabulation of regions of the RBD consisting of three to nine residues defines each individual ES as shown in Table 2a.
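
The contact tally described above can be sketched in a few lines. This is a minimal illustration, not the procedure given in Methods: it assumes that a "hit" is one contacting Ab residue per RBD residue and that any interatomic distance at or below 5.0 Å counts as a contact. The function name `count_hits`, the residue labels, and the coordinates are hypothetical.

```python
from collections import Counter
from math import dist

CUTOFF = 5.0  # Å; the cut-off stated in the text


def count_hits(ab_atoms, rbd_atoms, cutoff=CUTOFF):
    """Count, per RBD residue, how many distinct Ab residues contact it.

    Each atom is (residue_label, (x, y, z)). Counting one hit per
    contacting residue pair is an assumption of this sketch.
    """
    pairs = set()
    for ab_res, ab_xyz in ab_atoms:
        for rbd_res, rbd_xyz in rbd_atoms:
            if dist(ab_xyz, rbd_xyz) <= cutoff:
                pairs.add((ab_res, rbd_res))
    return Counter(rbd_res for _, rbd_res in pairs)


# Toy coordinates (hypothetical, not taken from any PDB entry).
ab_atoms = [
    ("H:Y33", (0.0, 0.0, 0.0)),
    ("H:Y33", (1.0, 0.0, 0.0)),
    ("H:S52", (0.0, 4.0, 0.0)),
]
rbd_atoms = [
    (417, (0.0, 0.0, 2.0)),
    (455, (0.0, 8.0, 0.0)),
    (486, (20.0, 0.0, 0.0)),
]
hits = count_hits(ab_atoms, rbd_atoms)
```

Summing such counters over all complexes would give a per-residue histogram of the kind plotted in Figure 1a. For real structures, the all-pairs loop would normally be replaced by a spatial index.
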
Each of these ES may be assigned to one of the four major classes identified earlier or to the RBM recognized by the ACE2 receptor (Table 2b). These regions include distinct secondary structural features such as strands, loops, turns, and ES recognized by Abs and Nbs are that ES8, 13, 16, and 18 are more frequently seen by Abs while ES4, 5, 6, 7, 11, and 20 are more frequently identified by Nbs. For example, ES16 was recognized by 10% of Abs and by 0.16% of Nbs. This difference may be explained since ES16 forms a solvent exposed convex structure which may not be conducive to recognition by Nbs. By contrast, ES4, 5, and 6 form a contiguous patch, recognized more frequently by Nbs, a region that is not exposed to solvent in the complete spike when the RBD is in the down position. Thus, Nbs may be better able to access such hidden surfaces, perhaps because of their relatively small size (12 kD compared to ~25 or 50 kD for Fv and Fab respectively or ~150 kD for complete bivalent IgG, with corresponding three-dimensional volumes) 33. Alternatively, since many Nbs were identified based on binding to isolated RBD, some epitopes identified from such screens may be partially hidden in the complete spike protein. In comparing L chains with H chains, as shown in Figure 2d, L chains generally contribute less to these ES. Nevertheless, L chains seem to preferentially contact ES7, 20 and 21. We note that some ES (e.g. ES7, 8, 9, and 23) could not be placed into  (Table 2b) 14. In addition, the RBM of the RBD 15 may be defined in terms of the ES that overlap the ACE2-RBD interface (i.e. ES8, 11, 12, 13, 16, 18, 19, 20, 21, and 22 (Table 2b)). With these 23 fine-grained ES, we extend the prior classification for Class 1 to now include ES8 and 9 (Table 2b).
Each ES surface area or footprint is illustrated by a color map of the RBD surface (  proportion of those residues that interact with the RBD, reflecting a major role for CDR3 in RBD recognition.

We plotted the frequency of particular amino acids used by Abs and Nbs (paratopic frequency of tyrosine usage 37. We also observed that tryptophan is more frequently used in Nbs as compared with Abs (Figure 4c). The usage of CDR3 amino acids is plotted in Figure 4d. To illustrate the predominance of particular paratopic residues of the Ab H chains that contact specific ES, we also grouped these as WebLogo plots 38 (Supplementary Figure 3).

Cluster analysis of epitopic sites and binding motifs

Having identified the sets of ES bound by each Ab and Nb (see Supplementary Table   Table 4a) and 10 distinct clusters for Nbs, N1 to N10 (Supplementary Table 4b). Although Abs within a single cluster bind the same subset of ES, they may, or may not address the RBD from the same angle or utilize CDR of the same length or composition. These differences are illustrated in Figure 5a for clusters A1, A3, and A11 for H chains and in Figure 5b  we analyzed a subset of interfaces from cluster A1, designated A1S1, that recognized ES with a similarity of ≥ 0.9. A1S1 consists of 28 members (cluster A1 has 56 members of similarity ≥ 0.85).
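
The effect of tightening the similarity threshold from 0.85 to 0.9 can be illustrated with a small sketch. The similarity score used for the clustering is not defined in the text available here, so this example assumes a Jaccard index on ES sets and a simple single-linkage grouping. Both choices, and the binder names `Ab_a`, `Ab_b`, `Ab_c`, and `Nb_x`, are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two ES sets (an assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(es_sets, threshold):
    """Single-linkage grouping: binders linked by any chain of pairs
    whose similarity is >= threshold end up in the same cluster."""
    names = list(es_sets)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(es_sets[a], es_sets[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical binders; the first ES set is the A1S1 set named in the text.
binders = {
    "Ab_a": {8, 9, 12, 13, 16, 18, 19},
    "Ab_b": {8, 9, 12, 13, 16, 18, 19},
    "Ab_c": {8, 9, 12, 13, 16, 18},
    "Nb_x": {4, 5, 6},
}
loose = cluster(binders, 0.85)
strict = cluster(binders, 0.90)
```

Under these assumptions, `Ab_c` (six of seven ES shared, similarity about 0.857) joins the cluster at 0.85 but is excluded at 0.9, which mirrors how a stricter cut-off yields a smaller, more homogeneous subgroup.
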

All the members of A1S1 recognize the same ES set (ES8, 9, 12, 13, 16, 18, and 19) (Figure 5c), utilize the same CDR loops, and superpose well. Analysis of the residues of CDR1, 2, and 3 that contact the RBD indicated those residues that are preferentially utilized by this stringently selected cluster of Abs. For the binding motifs of CDR1, 2, and 3 of A1S1, the favored residues are summarized in a WebLogo plot (Figure 5c). Remarkably, Y, S, G, and T predominate for all CDR except CDR3 which exploits R in most instances. Thus, application of a more stringent ES similarity score helps to identify the preferred binding motif utilized by the Ab of the same subgroup. This stringent grouping of Abs and Nbs, based on high similarity score of their respective ES, may prove a useful adjunct in structure prediction based on amino acid sequence and antibody competition.
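
A per-position tally of contacting residues, of the kind a sequence logo summarizes, might be sketched as follows. This computes plain residue frequencies rather than the information content (bits) that WebLogo displays, and it assumes the contacting CDR segments have already been aligned to equal length. The four input motifs are invented for illustration and are not data from the paper.

```python
from collections import Counter


def position_frequencies(motifs):
    """Return, for each aligned position, a dict of residue -> frequency."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("motifs must be aligned to the same length")
    freqs = []
    for column in zip(*motifs):
        counts = Counter(column)
        freqs.append({aa: n / len(motifs) for aa, n in counts.items()})
    return freqs


def consensus(motifs):
    """Most frequent residue at each position (ties: first seen wins)."""
    return "".join(max(col, key=col.get) for col in position_frequencies(motifs))


# Hypothetical aligned CDR contact motifs (not taken from the paper).
motifs = ["GYSS", "GYTS", "SYSG", "GYSS"]
freqs = position_frequencies(motifs)
motif_consensus = consensus(motifs)
```
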

To extend the utility of our ES definitions, we set out to determine broad biophysical  XBB.1.5 has substitutions of P and S for V445 and G446, respectively, which are contained in ES11, preserves the same substitutions, but also substitutes R for L452 in ES12. Figure 6a illustrates the location of these variants on the RBD surface for Omicron and their mutation sites are matched to one or more of the 23 ES. Strikingly, Omicron escape mutations are distributed throughout several distinct ES of the RBD (Table 6a, Figure 6a- ES2 (Table 2a, Table 6a, Figure 6a), and those Abs that recognize ES2 may be further evaluated for their ability to bind the mutants that harbor the R->T substitution. Supplementary Table 5a lists a number of Abs and Nbs whose structures are known that interact with ES2, and analysis of several Abs which may potentially resist the escape mutation (Supplementary Figure 5a) (Supplementary Figure 5b).

Our analysis of ES recognized by Abs and Nbs and the identification of specific ES affected by mutations in VOC provides an explanation for the ineffectiveness of some Abs that have been tested therapeutically. One example, Evusheld™, which consists of two Abs, tixagevimab (AZD8895) and cilgavimab (AZD1061), illustrates this point. These Abs have been studied by X-ray crystallography (tixagevimab, PDB 7L7D, and cilgavimab 7L7E 52) and by cryo-EM 53. By our analysis, tixagevimab interacts with ES13, 16, 18, 19, and 20 and cilgavimab with ES2, 10, 11, and 12. As shown in Table 6a, residues in every one of these ES are mutated in the Omicron variant.

components" of biophysical properties, or "mature information".
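
Matching variant substitutions to ES and then to an Ab footprint can be sketched as a simple lookup. The residue-to-ES table below is deliberately partial: it contains only the three memberships stated in the text (V445 and G446 in ES11, L452 in ES12), because the full definitions in Table 2a are not reproduced here. An empty result therefore means "no overlap with the listed residues", not "no overlap". The function names are hypothetical.

```python
# Partial mapping: only memberships stated in the text (see Table 2a for all).
RESIDUE_TO_ES = {445: 11, 446: 11, 452: 12}

# ES footprints as reported in the text.
AB_FOOTPRINT = {
    "tixagevimab": {13, 16, 18, 19, 20},
    "cilgavimab": {2, 10, 11, 12},
}


def residue_number(mutation):
    """Extract the position from a substitution such as 'V445P'."""
    return int(mutation[1:-1])


def affected_es(mutations):
    """ES hit by the given substitutions, as far as the partial table knows."""
    return {
        RESIDUE_TO_ES[residue_number(m)]
        for m in mutations
        if residue_number(m) in RESIDUE_TO_ES
    }


def footprint_overlap(antibody, mutations):
    """ES in the antibody footprint that contain a listed substitution."""
    return AB_FOOTPRINT[antibody] & affected_es(mutations)


# Substitutions named in the text.
substitutions = ["V445P", "G446S", "L452R"]
cilgavimab_hits = footprint_overlap("cilgavimab", substitutions)
```

With a complete residue-to-ES table, the same intersection could be run for every Ab or Nb to flag which footprints a given variant touches.
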
