Quantifying how single dose Ad26.COV2.S vaccine efficacy depends on Spike sequence features

It is of interest to pinpoint SARS-CoV-2 sequence features defining vaccine resistance. In the ENSEMBLE randomized, placebo-controlled phase 3 trial, estimated single-dose Ad26.COV2.S vaccine efficacy (VE) was 56% against moderate to severe–critical COVID-19. SARS-CoV-2 Spike sequences were measured from 484 vaccine and 1,067 placebo recipients who acquired COVID-19 during the trial. In Latin America, where Spike diversity was greatest, VE was significantly lower against Lambda than against Reference and against all non-Lambda variants [family-wise error rate (FWER) p < 0.05]. VE also differed by residue match vs. mismatch to the vaccine-strain residue at 16 amino acid positions (4 FWER p < 0.05; 12 q-value ≤ 0.20). VE significantly decreased with physicochemical-weighted Hamming distance to the vaccine-strain sequence for Spike, receptor-binding domain, N-terminal domain, and S1 (FWER p < 0.001); differed (FWER ≤ 0.05) by distance to the vaccine strain measured by 9 different antibody-epitope escape scores and by 4 NTD neutralization-impacting features; and decreased (p = 0.011) with neutralization resistance level to vaccine recipient sera. VE against severe–critical COVID-19 was stable across most sequence features but lower against viruses with greatest distances. These results help map antigenic specificity of in vivo vaccine protection.

Main Text
Initial SARS-CoV-2 vaccine candidates were based on the virus's original lineage, as represented by the Wuhan-Hu-1 index strain with Spike D614 (NC_045512). As the virus has evolved, 1-4 the efficacy of these vaccines against symptomatic infection has waned, 5,6 and new vaccine inserts have been developed.
Using data from a randomized, placebo-controlled vaccine efficacy (VE) trial on clinical outcomes and on pathogen sequences isolated from participants who experienced those outcomes, sieve analysis assesses how VE depends on pathogen sequence features. 7,8 Pajon et al. 9 and Sadoff et al. 10 showed that VE against symptomatic COVID-19 was lower against certain variants than against the Reference strain in the phase 3 COVE trial of two doses of Moderna's mRNA-1273 vaccine and the phase 3 ENSEMBLE trial of a single dose of Janssen's Ad26.COV2.S vaccine, respectively. [As in ref. 10, Reference is defined as the basal outbreak lineage B.1, which bears the D614G mutation.] Cao et al. showed that VE was higher in COVID-19 VE trials where circulating viruses had shorter Spike sequence Hamming distances to the vaccine strain. 11 These sieve analyses considered only Spike viral variation as captured by the WHO-defined variant category or the unweighted Spike protein distance. They did not assess how VE depends on other Spike sequence features, such as individual mutations or features that impact immunological functions such as anti-SARS-CoV-2 neutralization, 12-17 which is relevant given the strong evidence of neutralizing antibodies as a cross-platform correlate of protection. 18-20
We report here the results of a sieve analysis of the ENSEMBLE trial, which enrolled over 40,000 participants and was conducted in the US, South Africa, and six countries in Latin America. The sieve analysis considers baseline SARS-CoV-2 seronegative per-protocol participants and the primary endpoint (moderate to severe-critical COVID-19), as well as the severe-critical COVID-19 endpoint, during the double-blinded period of follow-up. We focus the main text on the Latin America results because they provide the greatest information for sieve analysis, as noted below.

SARS-CoV-2 sequence data
A total of 1,345 SARS-CoV-2 Spike amino acid sequences were obtained from 1,224 participants experiencing the moderate to severe-critical primary endpoint. All sequences were variant-typed to either the Reference lineage or to one of nine different WHO-defined variants (Fig. 1A) (Table S5). Lineages that circulated at the beginning of the study period, e.g., Reference, were closer to the sequence from the vaccine insert than later emerging lineages, with Lambda the most distant (Fig. 1B-C).

Greater SARS-CoV-2 Spike diversity in Latin America than in South Africa and the US
Most sequences were obtained from participants in Latin America (n = 776), with additional sequences from the US (n = 323) and South Africa (n = 125) (Table S6). Five main variants circulated in Latin America (Reference, Zeta, Gamma, Lambda, Mu), while the South African sequences were 76% Beta and 17% Delta, and the US sequences were 85% Reference (Fig. 1A). There was greater Spike AA sequence diversity in Latin America than in South Africa and the US (Rao's Q = 10.1 vs. 7.7 vs. 3.3, respectively; Fig. S1).
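As a rough illustration of the diversity index quoted above, Rao's quadratic entropy can be written as Q = sum over i, j of p_i p_j d_ij, where p_i is the weight of sequence i and d_ij is a pairwise distance. The sketch below assumes equal weights and a plain (unweighted) Hamming distance on aligned sequences; the distance actually used for Fig. S1 is not stated in this section, and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rao_q(sequences: list[str]) -> float:
    """Rao's quadratic entropy with equal weights p_i = 1/n.

    Q = sum_{i,j} p_i * p_j * d_ij. Because d_ii = 0 and d_ij = d_ji,
    this equals 2 * (sum over unordered pairs of d_ij) / n^2.
    ASSUMPTION: d_ij is the unweighted Hamming distance.
    """
    n = len(sequences)
    if n == 0:
        raise ValueError("need at least one sequence")
    total = sum(hamming(a, b) for a, b in combinations(sequences, 2))
    return 2.0 * total / (n * n)


# Hypothetical toy alignment (not trial data).
toy = ["ACDE", "ACDF", "GCDF"]
q = rao_q(toy)
```

A region dominated by one lineage yields many near-zero pairwise distances and hence a small Q, which matches the ordering reported for the US relative to Latin America.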
The succession of distinct co-circulating variants in Latin America, the resulting broadest dynamic range of inter-individual sequence diversity, and the greatest number of COVID-19 endpoints together imply that sieve analyses of the Latin America region have the greatest statistical power. In contrast, the domination of the Reference lineage in the US and of the Beta and Delta lineages in South Africa constrained the dynamic range of sequence diversity and limited the power of these sieve analyses. Therefore, we focus on the results from Latin America, with the US and South Africa results reported in the Supplementary
Alternatively, we defined putative antibody footprint site sets (including whole Spike) based on structures of SARS-CoV-2 in complex with antibodies available from the PDB. Each sequence was assigned an escape score based on a class of epitopes (see Supplementary Materials). These features are referred to as PDB1 through PDB14, with the first 12 clusters in the RBD and PDB13 and PDB14 in the NTD. Vaccine efficacy significantly decreased (q-value ≤ 0.20) with the escape scores for PDB4, PDB7, PDB8, and PDB13 (FWER p ≤ 0.05) as well as for PDB1 and PDB3 (q-value ≤ 0.20 and FWER p > 0.05) (Table S15). Tables S13 and S14 show inferences about differences in mean escape scores of vaccine vs. placebo sequences.
To interpret the DMS and PDB results, we focus on the epitope-specific features with FWER p ≤ 0.05 that carry the greatest amount of independent information based on inter-correlation and hierarchical clustering analysis (Supplementary Text, Figs. S13 and S14): DMS2, PDB7, PDB8, and PDB13. The sieve analysis results are similar across these four features, with estimated VE at 60-70% against viruses with escape score zero and decreasing to 0-20% against viruses with maximum escape score. PDB8 and PDB13 rank highest for discriminating VE, with a slightly greater span of VE point estimates over the range of escape scores (spans 20-60%, 16-60%, 21-69%, and 1-57% for DMS2, PDB7, PDB8, and PDB13, respectively) (Fig. 4A-D). Figure 4E lists the Spike AA residues in each epitope footprint and the visualizations in

Lower vaccine efficacy against COVID-19 with NTD features hypothesized to abrogate neutralization
Seven dichotomous NTD features (see Supplementary Materials) were assessed for a sieve effect in the same way as the vaccine-match vs. vaccine-mismatch binary features. Six of the 7 NTD features significantly impacted VE (q-value ≤ 0.20): NTD4, NTD6, NTD1, NTD3, NTD5, and NTD7 (where the last four also had FWER p ≤ 0.05) (Fig. 5). Figure S32 shows the spatial locations in the NTD of the features that impacted VE (FWER p ≤ 0.05).

Vaccine efficacy greater against lineages with lower variant-neutralization resistance to Ad26.COV2.S vaccine recipient sera
All of the sieve analyses study how VE depends on Spike AA features except one: a neutralization sieve analysis that scores each virus's lineage by its experimentally measured sensitivity to neutralization by Ad26.COV2.S vaccinee sera. 29,30 VE decreased with this variant-neutralization resistance score (p = 0.011) (Fig. 5B). Under one model for the neutralization assay being a perfect correlate of protection, the estimates of VE for each of the five lineages would fall on the curve of VE by variant-neutralization resistance score. Lambda had evidence of deviating from the curve, with VE 55% (48, 62%) based on its measured neutralization sensitivity compared to VE 11% (-35, 41%) based on direct analysis of Lambda ignoring neutralization data. In contrast, the weighted Hamming distance analyses yielded VE estimates at Lambda-variant distance values that are closer to the VE 11% figure.

Assessing the severe-critical COVID-19 endpoint
Differential VE against severe-critical COVID-19 by lineage could only be assessed for Latin America, with VE of 83% (64, 92%) against Reference, 64% (26, 83%) against Gamma, 94% (-27, 100%) against Zeta, 62% (-31, 89%) against Lambda, and 84% (42, 96%) against Mu (Table S16). There was no evidence of variation in VE across the lineages (p = 0.50) (Tables S16 and S17). The estimates of VE were stable across AA positions with vaccine-matched vs. vaccine-mismatched residue, with all unadjusted p-values for differential VE above 0.05 (Fig. S35).
For the key positions 452 and 490 found to show sieve effects for the primary COVID-19 endpoint, the results for the severe-critical COVID-19 endpoint were VE 79% (68, 87%) against 452-matched virus compared to VE 70% (3, 91%) against 452-mismatched virus (p = 0.58 for differential VE), and VE 80% (68, 87%) against 490-matched virus compared to VE 62% (-31, 89%) against 490-mismatched virus (p = 0.34 for differential VE). For the DMS antibody escape score distances, the data support stable VE across the distances (Table S18). Similarly, the data support stable VE across RBD and PDB Spike-antibody escape scores (Table S19). VE was stable by variant-neutralization resistance score, with VE = 84% (67, 92%) for the most sensitive lineage (ancestral) and VE = 73% (50, 85%) for the least sensitive lineage (Mu) (p = 0.33, Fig. S36).

Discussion
Sieve analysis compares genotype-specific or immunophenotype-specific COVID-19 incidence between randomized study groups, therefore directly assessing causal effects of vaccination and providing inferences for how vaccine efficacy depends on SARS-CoV-2 features. In addition to the strength of a randomized, double-blinded placebo-controlled phase 3 trial, the present sieve analysis of ENSEMBLE had ample statistical precision due to the large number of SARS-CoV-2 Spike sequences (measured from more than 1,200 participants) and the broad proteomic variability of the SARS-CoV-2 Spike sequences causing these endpoints. Consequently, the sieve analysis could provide many insights into how the efficacy of the Ad26.COV2.S vaccine, evaluated in baseline SARS-CoV-2 negative individuals, depended on virus features.
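The comparison of genotype-specific incidence between arms can be illustrated with a deliberately simplified sketch: VE against a given genotype is taken as one minus the ratio of genotype-specific incidence rates (vaccine vs. placebo). This is an assumption made for illustration only; the trial's actual estimators are specified in the SAP, and the counts and person-years below are hypothetical, not ENSEMBLE data.

```python
def vaccine_efficacy(cases_vaccine: int, py_vaccine: float,
                     cases_placebo: int, py_placebo: float) -> float:
    """VE = 1 - (vaccine incidence rate) / (placebo incidence rate)."""
    if py_vaccine <= 0 or py_placebo <= 0:
        raise ValueError("person-years must be positive")
    if cases_placebo == 0:
        raise ValueError("VE is undefined with zero placebo cases")
    rate_vaccine = cases_vaccine / py_vaccine
    rate_placebo = cases_placebo / py_placebo
    return 1.0 - rate_vaccine / rate_placebo


def genotype_specific_ve(cases_by_genotype: dict[str, tuple[int, int]],
                         py_vaccine: float, py_placebo: float) -> dict[str, float]:
    """Apply the rate-ratio VE separately to each genotype.

    cases_by_genotype maps a genotype label to
    (vaccine-arm cases, placebo-arm cases) caused by that genotype.
    All participants contribute person-time to every genotype,
    because each is at risk of any circulating virus.
    """
    return {
        genotype: vaccine_efficacy(cv, py_vaccine, cp, py_placebo)
        for genotype, (cv, cp) in cases_by_genotype.items()
    }


# Hypothetical counts for a residue-matched vs. residue-mismatched feature.
counts = {"matched": (20, 100), "mismatched": (45, 50)}
ve = genotype_specific_ve(counts, py_vaccine=1000.0, py_placebo=1000.0)
# ve["matched"] is about 0.80 and ve["mismatched"] about 0.10
```

A sieve effect corresponds to the genotype-specific VE values differing, as in the toy numbers above; because both arms are randomized, the placebo arm supplies the counterfactual genotype distribution.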
In the Latin American cohort, VE against the moderate to severe-critical COVID-19 primary endpoint significantly declined with Spike sequence distance as measured in myriad ways, including lineage, weighted Hamming distances calculated for Spike, RBD, NTD, and S1, scores reflecting degree of escape from epitope-specific antibodies computed using deep mutational scanning or based on crystal structures in the Protein Data Bank (PDB), and NTD features previously shown to impact neutralization.
Estimates of VE by lineage were consistently ordered by the distances of the different lineages to the vaccine strain. VE declined similarly with Spike, RBD, NTD, and S1 distances (VE about 70% against viruses closest to the vaccine and 20% against viruses beyond the 90-95th percentile of distances) but did not depend on S2 distances. This may be explained by S2's relative conservation when compared to S1. As such, almost all variant-characteristic mutations lie outside S2, and none of the prescribed antibody epitope footprint clusters included S2 positions (only rare epitopes in PDB mapped to S2), reflecting S2's 'stalk' location and relative lack of exposure to the immune system. VE significantly declined with 14 of the 20 evaluable antibody epitope escape scores. Six antibody-epitope clusters had no evidence of impacting VE: DMS3, PDB2, PDB5, PDB6, PDB9, and PDB14. Of the 14 clusters with a sieve effect, 9 include at least one site that harbors a characteristic mutation of Lambda, whereas 3 include site 417, which harbors a characteristic mutation of Mu and Gamma, 1 includes site 501, which harbors a characteristic mutation of Gamma, Alpha, and Mu, and 1 includes both sites 417 and 501. Thus the 9 sieve-effect clusters appear to be driven by the differential VE by Lambda vs. not-Lambda, whereas the other 5 appear to be driven by mutations at the important sieve-effect sites 417 and 501 that impact neutralization. Of the 6 non-sieve-effect clusters, only one (PDB14) included a site harboring a characteristic mutation of Lambda, site 75, which was a sieve-effect site with FWER p ≤ 0.05. The potential for sieve effects in different epitope sets depends on many factors, including level of accessibility to neutralizing antibodies, conservation, and the narrowness of the footprints on the three-dimensional structure they target (Fig. S31).
Neutralizing antibody assays have performed well at predicting vaccine efficacy against COVID-19 and severe-critical COVID-19 across SARS-CoV-2 lineages. 19,20,32 Importantly, one of the sieve analyses in the present work scored viruses by their lineage's directly measured resistance to neutralization by sera from ENSEMBLE Ad26.COV2.S vaccine recipients, providing a way to study a neutralization correlate of protection (CoP) in a complementary way to individual-level and population-level immune correlates analyses. 33-35 VE significantly declined against lineages with greater neutralization resistance scores, providing validation of pseudovirus neutralization titer as a CoP. However, the lineage scores were estimated from only eight ENSEMBLE vaccine recipients, although the scores are supported by additional data from 17 Ad26.COV2.S vaccine recipients in the COV2001 phase 1/2a study. 36 The relative prevalence of SARS-CoV-2 lineages changed over time (Fig. 1A and Fig. 1 of ref. 10); in Latin America, the median (range) number of days from enrollment until the COVID-19 endpoint among placebo recipients was 48 (15, 197)
The fact that sieve analysis predicted currently relevant mutations could be expected since SARS-CoV-2 has shown remarkable patterns of convergent evolution since the initial appearance of variants, with numerous recurrent mutations, especially in the RBD, shared across lineages over time. 38 A strength of this study was that it was conducted in three separate geographic regions with different circulating lineages, which contribute insights based on these lineages and their characteristic signature mutations, and with different distributions of genetic distances of circulating sequences to the vaccine strain.
The analyses of Latin American study sites provided the greatest insights given that 63% of primary COVID-19 endpoints with sequence data were in Latin America, where the circulating SARS-CoV-2 sequences were the most diversified. All features showing sieve effects in the US also showed sieve effects in Latin America, constituting independent replication of results. The result of no sieve effects in South African study sites can likely be explained by the vast majority of circulating sequences being Beta or Delta variants with limited dynamic range of genetic distances within each variant and a lack of Reference viruses that are close to the vaccine strain.
Another strength of this study was that VE against severe-critical COVID-19 could be assessed. The results support that VE against this endpoint also declines with Spike sequence distance as measured in multiple ways, yet with VE starting higher against viruses closest to the vaccine strain and diminishing less rapidly with increasing degrees of sequence mismatch. Overall, the finding that protection against severe-critical COVID-19 is more invariant to sequence changes than protection against less-symptomatic COVID-19 may have clinical implications for planning updates of vaccines with new variants. The severe-critical classification covers a broad spectrum of clinical phenotypes ranging from individuals with only repeated low partial pressure of oxygen to severe pneumonia requiring respiratory support. Protection against hospitalization with severe consequences is clinically most important, but sieve analysis specific to this outcome could not be performed given small numbers of cases. Yet, ENSEMBLE and post-approval trials have shown high Ad26.COV2.S efficacy against this outcome, especially in South Africa after a 6-month boost, suggesting that neutralization resistance and sequence variation may be playing a less dominant role in vaccine-induced protection against the most serious disease, perhaps due to CD8+ T cells. 39

Methods
Trial design, study cohort, and COVID-19 endpoints
Trial enrollment began on September 21, 2020. The end of the double-blind period varied by country; the data cutoff for this analysis was July 9, 2021. The main endpoint for sieve analysis is the same COVID-19 primary endpoint (moderate to severe-critical) as in the primary analyses, 10,40 restricting to endpoints starting 14 days post vaccination. Sieve analyses were also conducted for severe-critical COVID-19, again using the same definition as used in the primary papers. 10,40 Analyses were conducted in the per-protocol baseline seronegative cohort. 40 See Section 1 of the Statistical Analysis Plan (SAP, provided in ref. 41 and as supplementary material) and the Supplementary Materials for further details.

SARS-CoV-2 sequencing and sequence data
SARS-CoV-2 Spike sequences were generated and variant-typed as described. 40 Sequences were selected for analysis if they were obtained within 36 days following the first RNA-positive timepoint associated with the first moderate to severe-critical COVID-19 primary endpoint. See the Supplementary Materials for further details.

Neutralizing antibody titers
Neutralizing antibody titers were measured to a panel of Spike antigens representing the Reference strain B.1.D614G and several variants. 29,30 Each variant was assigned a score defined as the log10-transformed ratio of the geometric mean titer of vaccinee sera against the variant to the geometric mean titer of vaccinee sera against the Reference strain.
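The score definition above can be sketched in a few lines. This follows the literal wording (variant GMT in the numerator, Reference GMT in the denominator), under which a less-neutralized variant receives a more negative value; whether the analysis uses this sign or its negative as the "resistance" score is not stated here and is left as an assumption. The titers are hypothetical.

```python
import math


def geometric_mean(titers: list[float]) -> float:
    """Geometric mean of strictly positive titers."""
    if not titers or any(t <= 0 for t in titers):
        raise ValueError("titers must be non-empty and strictly positive")
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


def neutralization_score(variant_titers: list[float],
                         reference_titers: list[float]) -> float:
    """log10 of GMT(variant) / GMT(Reference), as worded in the text.

    ASSUMPTION: sign convention taken literally from the definition;
    negate the result if a larger value should mean more resistance.
    """
    ratio = geometric_mean(variant_titers) / geometric_mean(reference_titers)
    return math.log10(ratio)


# Hypothetical titers from the same vaccinee sera (not trial data).
reference = [100.0, 400.0]   # geometric mean 200
variant = [10.0, 40.0]       # geometric mean 20
score = neutralization_score(variant, reference)   # approximately -1
```

Working on the log10 scale makes the score additive: a tenfold drop in geometric mean titer shifts the score by one unit regardless of the absolute titer level.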
