Calibration of Two Validated SARS-CoV-2 Pseudovirus Neutralization Assays for COVID-19 Vaccine Evaluation

Vaccine-induced neutralizing antibodies (nAbs) are key biomarkers considered to be associated with vaccine efficacy. In United States Government-sponsored phase 3 efficacy trials of COVID-19 vaccines, nAbs are measured by two different validated pseudovirus-based SARS-CoV-2 neutralization assays, with each trial using one of the two assays. Here we describe and compare the nAb titers obtained in the two assays. We observe that one assay consistently yielded higher nAb titers than the other when both assays were performed on the World Health Organization’s anti-SARS-CoV-2 immunoglobulin International Standard, COVID-19 convalescent sera, and mRNA-1273 vaccinee sera. To overcome the challenge this difference in readout poses in comparing/combining data from the two assays, we evaluate three calibration approaches and show that readouts from the two assays can be calibrated to a common scale. These results may aid decision-making based on data from these assays for the evaluation and licensure of new or adapted COVID-19 vaccines.


Introduction
Multiple vaccines have shown short-term efficacy against coronavirus disease-19  in phase 3 randomized placebo-controlled trials [1][2][3][4][5][6][7] ; one has been fully approved 8 and three have been issued an Emergency Use Authorization 9 by the US Food and Drug Administration (FDA), and six an Emergency Use Listing from the World Health Organization (WHO). 10 To expediently meet the worldwide demand for sufficient doses (including booster doses) of safe and efficacious vaccines, and for the licensure of new or modified vaccines, the FDA and the WHO have recommended that immunobridging trials may be used to assess the effectiveness of booster vaccines, vaccines in new populations (e.g., pediatric) or against emerging SARS-CoV-2 variants, 11,12 thus sidestepping the need for additional large-scale placebo-controlled randomized trials (which are becoming increasingly difficult to conduct, especially in high-income countries 13 ) or active-control randomized trials. 14 In such immunobridging trials, vaccineinduced immune responses of an established immune marker surrogate endpoint [15][16][17] are typically assessed to ensure that levels deemed sufficient for clinical efficacy are achieved.
In each of the five US Government (USG)-sponsored phase 3 trials, nAb responses are assessed by one of two different pseudovirus-based SARS-CoV-2 neutralization assays, one performed at Duke University and one at Monogram Biosciences. Each pseudovirus neutralization assay has been rigorously validated in accordance with the ICH/FDA guideline 27,28 and these assays are planned for use in additional efficacy trials or immunobridging trials of COVID-19 vaccine candidates for regulatory approvals. If the Duke and Monogram assays do not render comparable readouts or are not calibrated (cross-validated) to a common scale, nAb response data from these assays may misguide regulatory decisions on these COVID-19 vaccine candidates.
To facilitate the harmonization of SARS-CoV-2 nAb data across multiple laboratories and assay types, the WHO developed an International Standard (WHO IS) for anti-SARS-CoV-2 immunoglobulin consisting of a pool of 11 COVID-19 convalescent plasma samples and provided as lyophilized material that, after reconstitution, was assigned a unitage of 1,000 international units per mL (IU/mL) for neutralization assays. 29 Here we describe the Duke and Monogram nAb assays, and compare titer readouts between the two assays using a diverse set of samples that include the WHO IS, COVID-19 convalescent serum samples, and serum samples from Moderna mRNA-1273 vaccine recipients. Because the ultimate goal is to ensure comparability of nAb readouts in vaccine recipients, we also present and compare statistical approaches for calibrating readouts from the two assays to support future vaccine evaluation and development. Table 1 summarizes and compares the Duke and Monogram assays. Similarities of both assays include utilizing lentiviral vector that 1) is pseudotyped with full-length G614 Spike protein (the SARS-CoV-2 surface protein responsible for virus entry via interaction with the human ACE2 receptor 30,31 ) and 2) contains a firefly luciferase (Luc) reporter gene. Thus, in both assays a quantitative luminescence readout is directly proportional to the amount of pseudovirus entry into target cells. In both assays, neutralizing antibodies in the assayed sample inhibit pseudovirus entry and reduce luminescence signal in a dose-dependent manner, providing the basis for quantifying the neutralizing activity of the sample. Both assays are established to measure ID50 and ID80 neutralizing titers and are fully validated in terms of all required assay performance parameters 27 for convalescent serum and partially validated in terms of specificity, linearity, precision and limits of quantitation of the assay for vaccine incurred samples, in order to extend validation for the purpose of testing clinical trial specimens.

Results
Differences in pseudovirus preparation, cell lines, plate layouts, and dilution schemes are also noted in Table 1. Among the notable differences is the way that Transmembrane Protein Serine 2 (TMPRSS2) is introduced into the assay. SARS-CoV-2 fuses with host cells by binding to the ACE2 receptor; the host cell TMPRSS2 protease cleaves the Spike protein at the S1/S2 sites to facilitate fusion. 32 In the Duke assay, TMPRSS2 is expressed during pseudovirus production, whereas in the Monogram assay it is expressed on the surface of HEK 293T/ACE2 target cells.
This difference can impact relative neutralization titers measured in these assays, with higher titers seen in HEK 293T/ACE2/TMPRSS2 cells (Montefiori, unpublished).
For assay comparison and establishment of the calibration algorithms, nAbs were measured in the first WHO IS for anti-SARS-CoV-2 immunoglobulin (20/ and Day 57 titers were on average more than 20-fold higher than Day 29 nAb responses in both assays ( Figure 1, Table 2). Specifically, geometric mean ID50 titers (standard deviation of loge-  Figure S1). Day 29 and Day 57 nAb titers were treated separately in the calibration process, in order to cover different dynamic ranges of possible assay readouts.
Prior to calibration, we observed that ID50 and ID80 titer readouts from the Monogram lab were consistently higher than those from the Duke lab measured in convalescent sera, vaccine recipient sera, and the WHO IS sample ( Figure 1, Table 2 in convalescent samples, 3.7 (0.5) and 4.6 (0.2) in Day 29 and Day 57 vaccine recipient samples, respectively, and the fold difference of the geometric (arithmetic) mean ID80 titers across replicates of the WHO IS sample was 7.5 (6.6). These differences in readouts between the two assays are an example of the need for calibration of the responses to the same scale prior to cross-assay comparisons or cross-assay merging of results. The fact that fold differences between the two assays are largely consistent across convalescent samples and vaccine recipient samples suggests that the differences between the assays are not specific to naturally acquired or vaccine-elicited nAbs. In addition, the range of readouts in convalescent samples generally covers that in vaccine recipient samples; this supports the use of those convalescent samples to develop the algorithms for the calibration of titers in the vaccine recipient samples. On the other hand, the WHO IS, which is a single sample comprised of a pool of convalescent plasma samples, does not cover the entire range of titers and hence resulted in an expected narrower range of replicate values.
We evaluated three different approaches for the calibration of vaccine-induced neutralizing antibody response titers measured by the Duke and Monogram labs. Approach 1 uses tiers of the WHO IS sample to `standardize' nAb titers measured by each lab to a common international unit (IU/ml), 33 whereas Approaches 2 and 3, via two different statistical algorithms (details in Methods), use titers of an independent pool of clinical (e.g. convalescent) samples measured by both labs to calibrate titers measured by each lab to a common scale still in titer units.
For Approach 1, calibration factors were calculated using the arithmetic mean, geometric mean or median titers of the WHO IS sample measured by both labs (Table S1). Using the arithmetic mean-based calibration factor, calibrated ID50 (cID50) and ID80 (cID80) readouts of the Day 29 and Day 57 vaccine-induced responses from the two labs rendered the highest agreement, compared to the geometric-and median-based factors, with a concordance correlation coefficient (CCC) of 0.75 (95% CI: 0.60, 0.85) for cID50, and 0.72 (0.56, 0.82) for cID80 ( Figure   S2). Therefore, Approach 1 using the arithmetic mean-based calibration factor is used to compare with Approaches 2 and 3 ( Figure 2).

Discussion
Neutralizing antibody data obtained from multiple trials of different vaccines can support the establishment of nAbs as a correlate of protection, and if demonstrated, may facilitate approval of future vaccines through immunobridging. It is not often possible for samples from different vaccine trials to be evaluated in a single assay and laboratory. In the case of SARS-CoV-2, nAb data were generated at one of two independent laboratories for each of the five USGsponsored vaccine efficacy trials; this approach is planned for future trials, as well. Because assay readouts in vaccine recipient sera from these two laboratories are on average 3-5 fold different, our study addresses this imperative need via calibrating data from the two assays to the same scale.
Our first main conclusion is that the WHO IS sample (Approach 1) demonstrated satisfactory performance comparable to that of the panels of convalescent sera in calibrating COVID-19 vaccine-induced nAb titers measured by these laboratories. This finding has important implications for future development and evaluation of COVID-19 vaccines because the WHO IS is more widely available and hence provides a straightforward way to compare and combine data across different platforms, and/or different developers and laboratories. Another main conclusion is that any of the three different calibration approaches can be used to bring readouts from different assays to the same scale for further analyses across studies, but that, due to the distinct advantages and disadvantages of each approach (summarized in Table S2), the specific approach that will be best suited for a given analysis depends on the availability of data to establish the calibration algorithms. For our motivating applicationcross-protocol immunogenicity and correlates analyses of the 5 USG COVID-19 trials, where all nAb data are obtained by either the Duke or the Monogram assay -Approach 2 is recommended. However, for analyses including nAb data not measured by either the Duke or Monogram assay, then Approach 1 (using International Units) is likely the most feasible option, because not all labs have access to the convalescent samples that are necessary for Approach 2. We discuss other implementation considerations for the three Approaches below.
Among the three approaches, Approach 1 is the easiest to implement and likely requires the fewest resources as long as data on the WHO IS sample are available from the lab from a sufficient number of vials (e.g., > 20-30) and replicates. One requirement is to choose an appropriate summary statistic to calculate the calibration factor. In the dataset presented in this study, the calibration factor based on the arithmetic mean of ID50 or ID80 titers rendered the best performance. This is largely due to the fact that the corresponding WHO standard-based calibration factor for each lab translates to conversion factors, i.e., the ratio of readouts in international units of 3.7 for ID50 and 6.6 for ID80 titers between Monogram vs. Duke labs, that are in line with the fold-differences observed in the convalescent patient samples and in the vaccine recipient samples. This explains why this simple multiplicative-factor approach rendered reasonable calibration performance. Meanwhile it also highlights that external neutralizing antibody titers from convalescent or vaccine sera generated by both labs on a common set of samples can inform decisions on which summary statistic to use in Approach 1 by evaluating the conversation factor against the additional data sources. Approach 1 is currently being used in the analyses of nAb data generated from two of the USG trials, one set of data from the Duke lab and the other by the Monogram lab. These analyses provide an important validation that assay-specific differences are minimized in the resulting calibrated data based on Approach 1 across the two USG trials, so that potential differences or similarity between the vaccine candidates in terms of their immune responses are not masked by differences between the two assays.
For both Approaches 2 and 3, a common pool of convalescent or vaccine samples will need to be assayed by both labs in order to establish the calibration algorithm, where Approach 2 requires readouts from both labs to be calibrated to a common scale, and Approach 3 calibrates the readouts from one lab to the other reference lab. We find that Approach 2 rendered the best calibration performance, followed by comparable performance between Approaches 1 and 3 for both ID50 and ID80 titers. These results suggest that precision in calibration can be improved by accounting for assay measurement error and the differences between the two assays across a range of response magnitudes (i.e., as done in Approach 2). In our case, measurement errors were characterized in assay validation experiments using both convalescent samples and vaccine samples by both labs, and assay differences in magnitude were characterized using convalescent samples with dynamic ranges of titers that were comparable to the ranges of titers of vaccine trial samples requiring calibration. Therefore, Approach 2 is recommended when paired-sample data are available from the two assays/labs (as is the case for the analyses of assay data from the 5 USG COVID-19 vaccine trials, where both the Duke and Monogram labs assayed a common set of convalescent samples for the calibration of their readouts in vaccine samples).
There are several limitations of this study. First, for evaluating the performance of the calibration approaches, we only considered a limited number of samples from trials of the mRNA-1273 vaccine. This potentially limits our ability to generalize findings from the calibration study to other COVID-19 vaccine platforms. We plan to assay a subset of samples from phase 3 trials of other COVID-19 vaccines to further validate the calibration approaches when those data become available. Second, to develop the calibration algorithms, both labs must have their assays performed on a common pool of independent samples that are expected to cover a similar dynamic range as the readouts requiring calibration. Access to such a common pool may be challenging for many labs. In that case, Approach 1 might be the most feasible option as long as the WHO IS sample is available in sufficient quantity. Third, for all three approaches, we focused the calibration only for titers above the lower limit of detection (LLoD) of the reference assays. Methods that account for calibration of responses below the LLoD, especially in the context of two assays having different LLoDs, are currently under development. Fourth, this study only concerns data from two specific labs measuring nAbs. Further validation is needed to investigate the performance of the three approaches applied to the calibration of other assays and/or immune responses. Although the described calibration approaches are only applied to the calibration of assays measuring anti-SARS-CoV-2 nAb responses induced by candidate vaccines, a more general use is amenable when assay data from multiple sources are of interest for comparison or combination. Fifth, the two nAb assays used in this study were Our study has a number of strengths. First, three distinct statistical approaches for calibration, each with different data requirements and/or data assumptions, are discussed to provide researchers versatile tools to help guide decisions for which approach may be optimal for calibrating a given data set. Second, each assay was formally validated for the assessment of COVID-19 vaccine-induced or naturally acquired neutralizing antibody immune responses, according to current regulatory guideline and standards. Third, to our knowledge, this is the first study that uses convalescent sera and the WHO IS sample to establish calibration algorithms and evaluate the performance of different calibration approaches for samples from COVID-19 vaccine trials. When more data become available, each approach could be readily updated to improve calibration accuracy and precision. Lastly, although the described calibration approaches were only illustrated for the analysis of data from two labs, all three approaches can be easily generalized to the calibration of data from more than two assays or labs.
In summary, correlates and immuno-bridging studies that could accelerate the development and evaluation of new and modified vaccines are at the heart of COVID-19 prevention efforts. We presented calibration approaches to support valid comparisons and combination of key immune response data from these studies to minimize the influence of assay-specific differences and optimize the characterization of future vaccine candidates. Table 1 provides a comparison of the assays' components and major steps. Further details are provided for each assay below.

Duke
Neutralization was measured in a formally validated assay that utilized lentiviral particles pseudotyped with full-length SARS-CoV-2 Spike protein and containing a firefly luciferase (Luc) reporter gene for quantitative measurements of infection by relative luminescence units (RLU).
The assay was performed in 293T/ACE2.MF provided by Drs. Michael Farzan and Huihui Mu.
Pseudoviruses were prepared and titrated for infectivity essentially as described previously. 35 Briefly, an expression plasmid encoding codon-optimized full-length spike of the Wuhan-1 strain and TMPRSS2 expression plasmids, were detached from 10 cm 2 plates using trypsin/EDTA and re-suspended in culture medium to a final concentration that accommodated the addition of 10,000 cells per well. Cell suspension was added to the serum-virus mixtures and assay plates were incubated at 37˚C in 7% CO2 for 3 days. On the day of assay read, Steady Glo (Promega) was added to each well. Reactions were incubated briefly and luciferase signal (RLU) was measured using a luminometer. Neutralization titers represent the inhibitory dilution (ID) of serum samples at which RLUs were reduced by either 50% (ID50) or 80% (ID80) compared to virus control wells (no serum wells).
The Monogram assay employs a specificity control which is created using the same HIV backbone/Luc sequence used in the SARS-CoV-2 pseudovirus. The envelope is 1949 Influenza A H10N3. It is unlikely for human sera to have antibodies against this rare avian influenza virus. The specificity control is designed to detect non-antibody factors (e.g., ART therapy) that could inhibit SARS-CoV-2 pseudovirus and result in false positive measurements of antibody neutralization. Positive anti-SARS-CoV-2 nAb activity was defined as an anti-SARS-CoV-2 nAb titer >3 times greater than the titer of the same serum tested with the specificity control.

mRNA-1273 vaccine recipients:
Neutralization activity was assayed in a total of 90 serum samples collected in a phase 1 trial 37,38 (NCT04283461) that used the same mRNA-1273 vaccine, dose, and schedule as that used in the Moderna phase 3 trial 3 (NCT04470427). Sera were collected from 30 vaccine recipients at day 0, day 29 (4 weeks post-first dose), and day 57 (4 weeks post-second dose).

Convalescent patients:
Neutralization activity was also assayed in a total of 248 serum samples collected in HVTN

WHO anti SARS-CoV-2 immunoglobulin International Standard (IS) sample
In December 2020, the WHO released a well-characterized international standard for the purpose of improving comparability of results among different assays in different laboratories and reducing interlaboratory variability of anti-SARS-CoV-2 antibody assays. 33  titers multiplied by the calibration factor, which is calculated as 1,000 (IU/ml) divided by the geometric mean, median or arithmetic mean ID50 or ID80 titers of the WHO IS sample that each lab measured in multiple vials. During the validation step, the calibration factor based on one of the above summary statistics that renders the highest agreement between calibrated responses from the two labs for the same set of samples is selected.
In Approach 2, both the Duke and Monogram responses are calibrated to a common scale using the Monogram lab as the reference based on methods described in Huang et al. 40 Briefly, both readouts are assumed to share the same underlying true values but carry lab-specific measurement errors, following a bivariate normal distribution. Based on the assay validation experiments from both labs, a 30% coefficient of variation is assumed for the measurement error for ID50 and ID80 titers in both assays. The calibrated values are then calculated as the expected conditional means derived from the joint bivariate distribution as   The arithmetic mean-based calibration factor was used in Approach 1. The concordance correlation coefficient (CCC) and 95% confidence intervals indicate the level of agreement between the x-and y-axis values.  cID50, cID80: ID50, ID80 titers calibrated to the WHO International Standard, expressed in International Units per ml (IU/ml) (Approach 1).  cBIVN-ID50, cBIVN-ID80: ID50, ID80 titers from each lab calibrated to a common scale using an independent pool of clinical samples, assuming a bivariate normal distribution for the readouts from the two labs (Approach 2).  cREG-ID50, cREG-ID80: ID50, ID80 titers from the non-reference lab calibrated to the reference lab using an independent pool of clinical samples, based on regressing titers from the reference lab on the non-reference lab (Approach 3).