The use of biomarker information in clinical trials has great potential for efficiently identifying the patients most helped by specific treatments.26,28 These biomarkers must prove their clinical validity in a prospective randomized trial with a superiority design in the enriched population.29 However, single-arm phase II trials are the first screens for efficacy of new therapeutic agents in humans. These trials are important milestones towards testing and adapting the biomarker strategies validated in preclinical stages.27 Although the NI question can be relevant in the phase II setting,5 it is not usually considered when designing single-arm clinical trials in early clinical development.30,31 Previous forays into precision medicine have shown that correctly identifying the target population and associated predictive biomarkers may be more critical for treatment success than a simple demonstration of superior efficacy against an alternative.22,23 Consequently, designing a proof-of-concept phase II study that permits NI analysis when the superiority criteria cannot be met will allow for more informed decisions that consider efficacy together with other parameters, such as safety, cost, and biomarker strategy. Moreover, the NI analysis provides the steering committee with additional information on the magnitude of the observed activity without increasing target accrual.7
A prior systematic review evaluating the characteristics of phase II trials that best predict phase III outcome selected 270 single-arm phase II trials conducted between 1981 and 2012. All of the selected studies led to a phase III clinical trial. The meta-analysis showed that 168 single-arm trials were not positive, and that 61 of these (36.3%) achieved a positive phase III result despite the negative result in the proof-of-concept study.22 Additionally, Jardin DL, et al. (2016) reported that 10% of FDA anti-cancer drug approvals between 01/01/2009 and 06/30/2014 had a negative result in prior phase II clinical trials.32 In some of these examples, despite the low response rate, investigators considered the response to be no worse than that of conventional treatment, and they decided to continue with a phase III trial based on other parameters, such as prolonged duration of clinical benefit or safety.19,33–35 In effect, investigators switched their primary superiority criterion to a non-inferiority objective and weighed their decision against other relevant parameters. As a result, they achieved a positive phase III trial, and the FDA approved the therapies.32 This suggests that the NI question can be relevant in the phase II setting and that it is not so rare. However, the NI hypothesis was not preplanned, and it was only considered as a way to deal with negative findings in proof-of-concept trials. This strategy increases the probability of type I error and the number of false positives.6 Accordingly, both systematic reviews reported a high rate of negative phase III trials after a negative phase II superiority trial (between 64% and 85%).22,32 However, if the decision to switch to non-inferiority had been preplanned and included in the statistical design, investigators would have had additional information about the magnitude of the RR (non-inferior or superior), the duration of clinical benefit or other parameters without type I error inflation.
In addition, non-inferiority comparison with a historical control has proved its validity in identifying subgroups of patients in the adjuvant setting who can avoid the toxic effects of chemotherapy.20
The most popular design for phase II cancer clinical trials is the single-arm two-stage Simon’s design.11 Numerous extensions have been proposed for Simon’s design, including randomized multi-arm trials designed to select the winner among the proposed therapeutics (pick-the-winner study design).36 However, the inference procedures used in two-stage designs are often not corrected to account for the adaptive nature of these designs. The maximum likelihood estimator of the RR (the number of responses divided by the total number of patients) is biased. Because of the possibility of early termination, CIs and p-values should not be computed as if the data had been obtained in a single stage.10 Different methods have been proposed to obtain proper inference from Simon’s two-stage design.10 The use of the UMVUE to estimate the RR is recommended, as it covers situations in which the actual number of patients recruited is equal to or different from the preplanned values.3 The calculation of the RR, p-values and CIs has been incorporated in open-access statistical libraries (packages “clinfun” and “OneArmPhaseTwoStudy”, R statistical software) based on previously published methods.9,12
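The bias of the maximum likelihood estimator, and its removal by the UMVUE, can be illustrated with a minimal Python sketch. It is not the code of the R packages cited above. It assumes a design that stops only for futility (continue to stage 2 when the stage 1 responses exceed `r1`), uses the conditional-expectation (Rao-Blackwell) form of the UMVUE, and the function names and the example boundaries (`r1 = 1`, `n1 = 10`, `n = 29`) are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def umvue(x1, x2, r1, n1, n2):
    """UMVUE of the response rate in a two-stage design with a futility stop.

    x2 is ignored when the trial stopped after stage 1 (x1 <= r1).
    Assumption: conditional expectation of the first patient's response
    given the stopping stage and the total number of responses.
    """
    if x1 <= r1:
        return x1 / n1
    s = x1 + x2
    lo, hi = max(r1 + 1, s - n2), min(n1, s)
    num = sum(comb(n1 - 1, j - 1) * comb(n2, s - j) for j in range(lo, hi + 1))
    den = sum(comb(n1, j) * comb(n2, s - j) for j in range(lo, hi + 1))
    return num / den


def expected_estimates(p, r1, n1, n2):
    """Exact expectation of the naive MLE and of the UMVUE at true rate p."""
    e_mle = e_umvue = 0.0
    for x1 in range(n1 + 1):
        pr1 = binom_pmf(x1, n1, p)
        if x1 <= r1:  # trial stops for futility after stage 1
            e_mle += pr1 * x1 / n1
            e_umvue += pr1 * umvue(x1, None, r1, n1, n2)
            continue
        for x2 in range(n2 + 1):  # trial continues to stage 2
            pr = pr1 * binom_pmf(x2, n2, p)
            e_mle += pr * (x1 + x2) / (n1 + n2)
            e_umvue += pr * umvue(x1, x2, r1, n1, n2)
    return e_mle, e_umvue


if __name__ == "__main__":
    # Illustrative design: r1/n1 = 1/10, total n = 29, true RR = 0.30
    mle, unbiased = expected_estimates(0.30, r1=1, n1=10, n2=19)
    print(f"E[MLE] = {mle:.4f}, E[UMVUE] = {unbiased:.4f}")
```

Under these assumptions the expected value of the naive estimator falls below the true rate, whereas the expected value of the UMVUE equals it, which is why inference should not treat the data as single-stage.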
We observed some differences between calculated values (p-value) and simulated values (type I error). However, they were not relevant for the design constraints most commonly used in phase II single-arm trials (type I errors of 0.01 to 0.1 and type II errors of 0.1 to 0.2). Using the UMVUE-based calculation method, we showed that the level of agreement between calculated and simulated values (type I and II errors) is the same for two-stage Simon’s superiority designs and for designs in which switching to NI is allowed. Importantly, our findings suggest that the proposed method for analyzing NI when superiority cannot be met does not introduce bias.
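The kind of comparison between a calculated and a simulated error rate described here can be sketched as follows. This is a minimal Python illustration for the superiority test only, not the simulation code used for the results above; the function names, the seed, the number of simulated trials and the example design (`r1/n1 = 1/10`, `r/n = 5/29`, `p0 = 0.10`) are assumptions chosen for illustration.

```python
import random
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_reject_prob(p, r1, n1, r, n):
    """Exact probability of declaring activity (more than r responses in total)
    in a two-stage design that stops for futility when x1 <= r1."""
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r + 1 - x1, 0)  # stage 2 responses still required
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        total += binom_pmf(x1, n1, p) * tail
    return total


def simulated_reject_prob(p, r1, n1, r, n, n_sim=20000, seed=1):
    """Monte Carlo estimate of the same rejection probability."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x1 = sum(rng.random() < p for _ in range(n1))
        if x1 <= r1:
            continue  # stopped for futility, H0 not rejected
        x2 = sum(rng.random() < p for _ in range(n - n1))
        if x1 + x2 > r:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    design = dict(r1=1, n1=10, r=5, n=29)  # illustrative boundaries
    print("exact type I error:    ", exact_reject_prob(0.10, **design))
    print("simulated type I error:", simulated_reject_prob(0.10, **design))
```

Evaluating the same functions at the response rate assumed under the alternative hypothesis gives the power, so one minus that value is the type II error.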
The major limitations of this method stem from combining the inherent complexities of a study with historical controls and those of an NI analysis. However, these limitations are shared with comparative NI designs, because the NIM must also be established from historical control evidence.6 Additionally, some biases, such as selection of inappropriate patients, poor compliance and insufficient follow-up, which in a comparative study can lead to the erroneous conclusion that a treatment is not inferior to placebo, work against a positive result when the comparison is made against a theoretical efficacy rate derived from historical controls. As in comparative NI designs, to declare a therapy non-inferior in a single-arm trial we need to demonstrate assay sensitivity based on adequate trial design and conduct.13 Accordingly, the smaller sample sizes of phase II single-arm trials are no more challenging for NI analyses than for superiority analyses, provided the trial is properly preplanned and conducted. Our results suggest that the same levels of alpha and beta errors can be preserved after switching to NI analysis without increasing the sample size.
Switching to NI analysis with the UMVUE-based calculation method may be extended to two-stage designs with both futility and superiority boundaries (as outlined in the Supplementary Methods).21 We limited our results to two-stage designs, which are the most popular for phase II cancer clinical trials, but the methods discussed in this article could be extended to phase II trials with any number of stages.9 Likewise, a single-arm time-to-event study with switching to NI analysis can be designed based on the exponential maximum likelihood estimator, one-sample log-rank tests or other approximations to the Kaplan-Meier estimates.21,37 Assuming the same number of patients as in the superiority analysis, the NI rejection hazard rate (λ0NI) would be formulated from the hazard rate assumed under H0 in the superiority analysis (λ0) and the NIM estimated from historical studies,6 as λ0NI = λ0 / NIM.
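As a worked illustration of the relation λ0NI = λ0 / NIM, the following minimal Python sketch converts a historical median into a hazard rate under an exponential model and derives the NI rejection hazard. It assumes, as an interpretation not stated in the text, that the NIM is expressed as a ratio in (0, 1], so that the NI hazard is larger than λ0; the function names and the numerical values (median of 12 months, NIM of 0.8) are hypothetical.

```python
from math import log


def hazard_from_median(median):
    """Hazard rate of an exponential distribution with the given median."""
    return log(2) / median


def median_from_hazard(hazard):
    """Median of an exponential distribution with the given hazard rate."""
    return log(2) / hazard


def ni_null_hazard(lambda0, nim):
    """NI rejection hazard, lambda0_NI = lambda0 / NIM.

    Assumption: NIM is a ratio in (0, 1], so lambda0_NI >= lambda0.
    """
    if not 0 < nim <= 1:
        raise ValueError("this sketch assumes 0 < NIM <= 1")
    return lambda0 / nim


if __name__ == "__main__":
    lambda0 = hazard_from_median(12.0)         # hypothetical historical median (months)
    lambda0_ni = ni_null_hazard(lambda0, 0.8)  # hypothetical NIM
    print("lambda0    =", lambda0)
    print("lambda0_NI =", lambda0_ni)
    print("median under the NI null =", median_from_hazard(lambda0_ni))
```

Under the exponential assumption, dividing the hazard by the NIM is equivalent to multiplying the historical median by the NIM, which may be an easier scale on which to discuss the margin with clinicians.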