Inter-assay consistency in reduction in parasite liver burden studies
Two active human mAbs, AB311 and AB317, that bind to the NANP repeats of P. falciparum CSP were tested. These mAbs have previously been shown to bind with high affinity and to have in vivo functional activity at a single tested dose (16). AB317 has also been tested previously across a range of doses (30-300 µg), and this guided the selection of dose levels for the present study. The mAb AB1245, which binds to the ookinete antigen Pfs25, was used as an IgG1 isotype-matched negative control. The main features of the in vivo functional assays used here have been described, and experimental details of the assays, including the use of luciferase-expressing parasites to measure liver burden by bioluminescence, have been recently reported.
To determine the consistency of the reduction in parasite liver burden assay, seven independent experiments were conducted on separate days with different preparations of infectious sporozoites. AB311 was delivered i.v. in a uniform volume of 200 µl at the indicated doses, and the sporozoite challenge was administered i.v. after mAb delivery. In all seven studies, mice that received AB311 showed a dose-dependent reduction of transgenic sporozoite infection in the liver, as measured by the total flux of bioluminescence (Table 1). There was no reduction in transgenic sporozoite infection in mice that received the negative control AB1245 (Table 1). The inter-assay consistency results are shown in Table 1. In mice receiving no antibody treatment (untreated infected), the total flux measures were consistent across studies in overall level (range of log10 flux 6.95 to 7.62). Using random effects models to estimate the variance of log10 flux, inter- and intra-assay variation were similar and limited among the untreated, infected mice (estimated standard deviations of 0.16 and 0.10 log10 flux or 1.17 and 1.11-fold changes, respectively). These results demonstrate very consistent levels of infectious sporozoite delivery, implying good reproducibility in the preparation, handling, and infectivity of sporozoite batches. Consistency in the total level of infection was also seen across experiments for treated groups receiving AB311 (Table 1). Using linear mixed-effects models adjusting for dose variation, the estimated inter-assay variation for treated animals was similar to the inter-assay variation for the infected controls (standard deviation of 0.13 log10 flux or 1.14-fold). For intra-assay variation, the standard deviation for log10 flux within all groups receiving AB311 ranged from a high of 0.35 to a low of 0.05 across experiments (Table 1), with a model-estimated standard deviation of 0.20 log10 flux (1.22-fold).
Taken together, the consistency of total flux data for the treatment groups also implies reproducible handling and delivery of mAbs as well as consistent measurement of liver infection by bioluminescence.
To set the background level of luminescence, each study included two naïve mice that received neither challenge parasites nor antibodies, only the substrate D-luciferin. A level of 5.03-5.25 log10 total flux was observed across experiments (Table 1). This sets the lower limit, or minimum log10 total flux, that can be measured in the liver burden assay. The 600 µg AB311 dose group reached a log10 flux of 5.40, indicating nearly complete inhibition.
Consistency of mAb performance in the parasite liver burden assay was assessed by comparing ID50 estimates. The ID50 is the dose at which log10 flux is reduced by 50% of the span between the upper and lower limits of the liver burden measurements. The ID50 was modelled using a four-parameter logistic (4PL) regression analysis of the reduction in log10 flux at different dose levels (Fig. 1). In the seven studies, the estimated ID50 for AB311 ranged from 103 µg to 160 µg, with overlapping 95% confidence intervals among the experiments. These results show that, for a single antibody, the reduction in liver burden assay can be performed in a highly reproducible fashion and that results are comparable across studies.
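The definition of the ID50 given above can be made concrete with a minimal sketch of a 4PL dose-response curve. The parameterisation and every parameter value below are assumptions for illustration only: the values are picked from within the ranges reported in the text and are not fitted estimates, and the Hill slope in particular is hypothetical.

```python
def four_pl(dose, lower, upper, id50, hill):
    """Four-parameter logistic: log10 flux as a decreasing function of dose.

    lower/upper are the asymptotes, id50 is the dose giving the midpoint,
    and hill controls the steepness of the curve.
    """
    return lower + (upper - lower) / (1.0 + (dose / id50) ** hill)


# Hypothetical parameters (illustration only, not estimates from the study)
LOWER = 5.2    # background log10 flux, within the reported 5.03-5.25 range
UPPER = 7.3    # untreated log10 flux, within the reported 6.95-7.62 range
ID50 = 130.0   # micrograms, within the reported 103-160 range
HILL = 2.0     # assumed slope

# At dose == ID50 the curve sits halfway between the two asymptotes.
midpoint = four_pl(ID50, LOWER, UPPER, ID50, HILL)

for dose in (30, 100, 300, 600):
    print(f"{dose:>4} ug -> log10 flux {four_pl(dose, LOWER, UPPER, ID50, HILL):.2f}")
```

In this form the ID50 is read directly off the fitted curve as one of its four parameters, which is why ID50 estimates can be compared between experiments or antibodies.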
Comparison of functional activity of two mAbs using reduction in parasite liver burden assay
An important application of the liver burden assay is to identify mAbs with higher potency than a control. To model such a test, AB311 and AB317 were compared in the same study. Mice were dosed with AB311 or AB317 and challenged with transgenic sporozoites under identical assay conditions. Two distinct experiments are shown in Fig. 2. Control groups that received no mAb treatment had a mean log10 flux of 7.2 in both studies. The assay standard deviation within all groups receiving treatment was similar to that shown in Table 1, ranging from a minimum of 0.05 to a maximum of 0.28 (Supplement Table 1).
To compare potency, differences in ID50s were tested comparing AB311 and AB317 using the 4PL model (Table 2). Compared to AB311, the ID50 for AB317 was lower by 1.5-fold (p=0.07) and 1.6-fold (p=0.04) for each experiment, and 1.6-fold lower (p < 0.01) pooling the data across both experiments.
Circulating mAb serum concentrations one hour prior to the time of challenge were measured using a CSP ELISA (Fig. 3a). The IC50 was determined using a 4PL model, and results are reported in Table 2. While there was a trend toward higher potency of AB317 compared to AB311, differences in IC50 between the two mAbs were not statistically significant within either experiment or when pooling the data across experiments.
Inter-assay consistency of measuring protection from parasitaemia following mosquito bite challenge
An endpoint that mimics clinical application is the ability of a mAb to protect from blood-stage parasitaemia following exposure via mosquito bite challenge. In three experiments performed on different days, groups of seven mice were administered AB311, AB317, or the human IgG isotype control AB1245. Infection by mosquitoes (Anopheles stephensi) carrying transgenic sporozoites occurred 16 hours after antibody administration. To assess parasitaemia, blood smears were performed daily between 4 and 12 days post-challenge (Table 3). The control antibody AB1245 conferred no protection in any study. At the 600 µg dose, both AB311 and AB317 conferred protection in all animals tested. Animals receiving the 300 µg dose showed partial protection, and at the 100 µg dose only animals receiving AB317 were protected.
For each mAb, an ID50 was estimated using a two-parameter logistic (2PL) regression curve (Table 4) to explore the differences in antibody functional activity. Based on the liver burden experiments, AB317 was tested for superior potency over AB311. Compared to AB311, the ID50 for AB317 was lower in all three experiments (range: 1.0- to 1.8-fold lower), indicating a trend toward increased functional activity. However, a significant difference was detected only after pooling the data from all experiments (1.4-fold lower, p = 0.02).
As with the liver burden experiments, circulating mAb serum concentrations one hour prior to challenge were measured using a CSP ELISA (Fig. 3b). The IC50 was then estimated for each mAb using a 2PL model (Table 5). The AB311 IC50 ranged from 61 µg/ml to 87 µg/ml, with a pooled estimate of 74 µg/ml. The IC50 estimate for AB317 was lower, ranging from 41 µg/ml to 59 µg/ml, with a pooled estimate of 48 µg/ml. As with the ID50 comparisons, the IC50 for AB317 was consistently lower than that for AB311, with a similar range of reduction (1.0- to 1.9-fold lower). The fold reduction was statistically significant for one experiment (1.9-fold lower, p < 0.01 in Study 2) and when pooling the data across all experiments (1.52-fold lower, p = 0.02).
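As a rough illustration of how an ID50 or IC50 follows from a 2PL fit to binary infection outcomes, the sketch below assumes the curve is a logistic function of log dose with an intercept and a slope. That parameterisation is an assumption (the text does not state whether dose enters on a log scale), and the coefficient values are hypothetical.

```python
import math


def prob_infected(dose, intercept, slope):
    """Two-parameter logistic in log dose: probability that a mouse is infected."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * math.log(dose))))


def id50_from_coefficients(intercept, slope):
    """Dose at which the fitted probability of infection equals 0.5."""
    return math.exp(-intercept / slope)


# Hypothetical coefficients (illustration only, not estimates from the study)
INTERCEPT = 17.0
SLOPE = -3.0  # negative: infection risk falls as dose rises

id50 = id50_from_coefficients(INTERCEPT, SLOPE)
print(f"ID50 = {id50:.0f} ug, P(infected at ID50) = {prob_infected(id50, INTERCEPT, SLOPE):.2f}")
```

The same arithmetic applies to an IC50 when serum concentration replaces dose as the predictor.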
Single dose comparisons using time to infection
Assuming that the protective activity of a mAb is correlated with a delay in infection, time to infection was evaluated for its ability to discriminate antibody functional activity. Based on results from the liver burden experiments, superiority of AB317 over AB311 was tested (using one-sided tests) for both protection and time to infection. In all three experiments, the 600 µg dose of both AB311 and AB317 induced full protection and therefore identical results, so only the 100 µg and 300 µg dose results were compared. The proportion of infected mice at the 300 µg dose was significantly lower for AB317 in one of the three studies using Barnard’s exact test (Table 6). The pooled data also showed a significant difference in protection at this dose level. Next, using the exact log-rank test, significant delays in infection time for AB317 compared to AB311 were found in each study for at least one of the doses, and for pooled data at both dose levels (Table 6, Figure 4). Of note, in Study 2 at the 100 µg dose, AB317 induced a significant delay in infection time even though 100% of mice became infected.
Potency comparisons across all doses
A logistic regression model was used to compare the risk of infection for mice treated with AB317 and AB311, adjusting for dose and using one-sided tests (i.e., testing for superiority of AB317 over AB311). Mice dosed with AB317 had significantly lower odds of infection (indicating superior potency) than those dosed with AB311 in one experiment; the difference was at the border of significance in a second experiment, and the model failed to discriminate between the mAbs in the third (Table 7). As with the comparison testing using a single dose, results were significant when the experimental data were pooled, increasing overall sample size (21 mice).
In the logistic regression model, the odds ratio (OR) represents the odds of infection for animals in the AB317 group relative to the AB311 group, adjusting for dose. An OR below 1 indicates superior protection for AB317 over AB311. Tests for superiority were performed using one-sided 95% confidence intervals of the OR. Statistically significant comparisons (upper bound below 1) are bolded.
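The decision rule described above (upper bound of the one-sided 95% confidence interval below 1) can be sketched as follows. The sketch assumes a Wald-type interval built from a logistic regression coefficient and its standard error; the text does not say which interval method was used, and the coefficient values are hypothetical.

```python
import math

# 95th percentile of the standard normal, used for a one-sided 95% bound
Z_ONE_SIDED_95 = 1.6448536269514722


def odds_ratio_with_upper_bound(log_or, se):
    """Return (OR, one-sided 95% upper bound) from a log odds ratio and its SE."""
    return math.exp(log_or), math.exp(log_or + Z_ONE_SIDED_95 * se)


def is_superior(log_or, se):
    """Superiority is declared when the one-sided upper bound of the OR is below 1."""
    _, upper = odds_ratio_with_upper_bound(log_or, se)
    return upper < 1.0


# Hypothetical mAb coefficients from a dose-adjusted logistic model
for label, log_or, se in (("example A", -1.2, 0.6), ("example B", -0.5, 0.6)):
    odds_ratio, upper = odds_ratio_with_upper_bound(log_or, se)
    print(f"{label}: OR = {odds_ratio:.2f}, upper bound = {upper:.2f}, "
          f"superior = {is_superior(log_or, se)}")
```

With small groups, an exact or profile-likelihood interval may behave differently from the Wald interval assumed here.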
Experimental design of comparison studies using both assays
The data were explored to determine how the two assays might be used in the future to identify potent mAbs among a number of candidate mAbs. For the liver burden model, the dose-response curve indicates that the best discrimination at a single dose is obtained at the midpoint of the dynamic range, near the ID50. At the ID50, the resulting measurements are furthest from the asymptotes, where the signal-to-noise ratio may be reduced. Using the results from this study, candidates could be compared against the AB311 reference control at a single dose near the AB311 ID50, estimated here to lie in the range of 103-160 µg (Fig. 1). The power to detect differences will depend on the group size and the expected variation (SD) in the data obtained. Using a range of three standard deviation estimates observed in the data, the power to detect differences from AB311 was calculated across a range of potential fold reductions in flux induced by a putatively more potent candidate mAb (Fig. 5). A candidate mAb with functional activity similar to AB317 (1.5-fold reduction in ID50 compared to AB311) can be expected to have a 1.5- to 2-fold reduction in flux compared to AB311 at the given dose. As shown in Fig. 5, the power to detect such a difference in functional activity is relatively low (< 50%) when using five animals at a single dose. The current experimental designs, using five animals per group, are powered at 80% to detect candidate antibodies with a 2.9-fold reduction in flux (2.4-fold ID50 change) compared to the reference control, at the average estimated standard deviation determined in our experiments (SD = 0.225). To detect antibodies that are only slightly more potent (e.g., 2-fold reductions similar to AB317 vs. AB311), larger group sizes would be required under the assumption of average intra-assay standard deviation (N = 10 for 80% power).
Intra-assay variability has a large effect on discrimination: studies are only powered to detect large differences in potency using five animals per group (>5-fold flux reduction for 80% power) using the highest observed standard deviation for intra-assay variability (SD=0.35). Furthermore, for a candidate antibody with an underlying 3-fold improvement in ID50, the power is >90% using the average standard deviation but drops below 80% at the highest observed standard deviation for intra-assay variability. At this level of variation, 80% power to detect differences in ID50 of 2-fold is not achieved even with large group sizes (N > 12). This calculation clearly highlights the importance of maintaining good control over assay consistency when screening for antibodies with different levels of activity as exemplified by AB311 and AB317. Additionally, power might be improved by adjusting the hypothesis testing, setting non-inferiority margins, and increasing the false positive rate.
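The dependence of power on group size, fold reduction, and intra-assay SD can be explored with a small Monte Carlo sketch. It assumes normally distributed log10 flux with equal SD in both groups and a two-sided two-sample t-test at the 5% level; the text does not state which test underlies Fig. 5, so this is an assumed stand-in, and the function name is hypothetical.

```python
import math
import random

# Two-sided 5% critical value of Student's t with 8 degrees of freedom
# (two groups of five animals); taken from standard t tables.
T_CRIT_DF8 = 2.306


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def simulate_power(fold_reduction, sd, n_per_group, t_crit, n_sim=10000, seed=1):
    """Fraction of simulated experiments in which the t-test rejects equality."""
    rng = random.Random(seed)
    delta = math.log10(fold_reduction)  # true difference in log10 flux
    rejections = 0
    for _ in range(n_sim):
        reference = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        candidate = [rng.gauss(-delta, sd) for _ in range(n_per_group)]
        m_ref, v_ref = mean_and_variance(reference)
        m_cand, v_cand = mean_and_variance(candidate)
        pooled_var = (v_ref + v_cand) / 2.0  # equal group sizes
        std_err = math.sqrt(pooled_var * 2.0 / n_per_group)
        if abs((m_ref - m_cand) / std_err) > t_crit:
            rejections += 1
    return rejections / n_sim


power_2_9_fold = simulate_power(2.9, 0.225, 5, T_CRIT_DF8)
power_2_fold = simulate_power(2.0, 0.225, 5, T_CRIT_DF8)
power_high_sd = simulate_power(2.9, 0.35, 5, T_CRIT_DF8)
print(power_2_9_fold, power_2_fold, power_high_sd)
```

Under these assumptions the estimate for a 2.9-fold reduction at SD = 0.225 should fall in the neighbourhood of the 80% quoted above, and it drops when either the fold reduction shrinks or the SD rises to 0.35. A noncentral t calculation would give the same quantity without simulation.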
An analysis was also conducted to guide the testing of candidate antibodies in the protection from parasitaemia assay. Again, AB311 was considered as the reference (Fig. 6), and only the 300 µg dose data were used because no protection was observed at the 100 µg dose. Power was calculated for candidate antibodies with increasing protective efficacy (proportion of animals uninfected), and estimates of ID50 were approximated from protection using the logistic model’s fit to the pooled experimental data (Fig. 6). Future experiments are likely to be designed to test mAbs for superiority; therefore, one-sided tests can help improve statistical power. This analysis suggests that using the protection from parasitaemia assay to discriminate levels of functional activity is likely to be difficult unless the underlying difference is substantial. As shown in Fig. 6, with 40% protection achieved using AB311 as a reference, a superior antibody is detected with near 80% power only when the expected protective efficacy of the candidate mAb is greater than 90% (> 1.75-fold reduction in ID50) and group sizes of greater than 10 are used. For experiments with a candidate similar to AB317 (1.5-fold reduction in ID50), approximately 33% power is expected using seven animals, which could be improved to around 50% using 10 animals. As with comparisons using liver burden reduction, power could be improved by considering non-inferiority margins or increasing the false positive rate. Additionally, experiments could be powered based on the infection time outcome as more is learned about the consistency of this endpoint through experimentation.
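To illustrate why small groups limit power for a binary protection endpoint, the sketch below enumerates every possible outcome of a two-group experiment and sums the probability of those that a one-sided exact test would call significant. Fisher's exact test is used here as a simple, self-contained stand-in; it is not necessarily the test behind Fig. 6, and it is generally more conservative than Barnard's test, so the resulting numbers should be read as indicative only. All function names are hypothetical.

```python
from math import comb


def fisher_one_sided_p(prot_cand, prot_ref, n):
    """One-sided Fisher p-value that the candidate protects more (equal group size n)."""
    total_prot = prot_cand + prot_ref
    denom = comb(2 * n, total_prot)
    upper = min(n, total_prot)
    tail = sum(comb(n, k) * comb(n, total_prot - k) for k in range(prot_cand, upper + 1))
    return tail / denom


def binom_pmf(k, n, p):
    """Probability of k protected animals out of n with protection probability p."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def exact_power(p_ref, p_cand, n, alpha=0.05):
    """Probability that the one-sided test rejects, summed over all outcomes."""
    power = 0.0
    for cand in range(n + 1):
        for ref in range(n + 1):
            if fisher_one_sided_p(cand, ref, n) <= alpha:
                power += binom_pmf(cand, n, p_cand) * binom_pmf(ref, n, p_ref)
    return power


# Reference protection of 40% (as for AB311 at 300 ug); 90% is a hypothetical candidate
for n in (7, 10, 20):
    print(n, round(exact_power(0.4, 0.9, n), 3))
```

Because the outcome space is discrete, power from an exact test rises in steps rather than smoothly with group size, which is one reason modest increases in the number of animals may yield little gain.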