In clinical trials gathering evidence about the effectiveness of a medical intervention, it is necessary to specify a primary endpoint. An endpoint should represent how patients respond after being given the treatment; its distribution should be expected to be more favourable if a treatment is effective than if it is ineffective. In many disorders it is difficult to specify a single endpoint, as an intervention may have a variety of effects that cannot be adequately captured by one measurement. For this reason, it is common in many conditions to combine multiple distinct endpoints (which we will refer to as components) into a composite endpoint.
Composite endpoints have been recommended when there is large variability in the disease manifestation, e.g. complex multisystem diseases, allowing multiple equally relevant outcomes to be considered without the need to correct for multiplicity. They have also been advocated for rare diseases, where they might improve the power by increasing the number of events observed. On the other hand, they have been criticised for making trial results more difficult to interpret (1).
One specific type of composite endpoint is a composite responder endpoint, which divides patients into responders and non-responders on the basis of the set of components. Some of these components may be binary (present or absent), while others may be continuous. In the case of continuous components, some dichotomisation is necessary, so that patients are responders only if the continuous component falls on the favourable side of a specified threshold. In Table 1, we provide examples of some commonly used responder-based endpoints and their definitions. In some cases (such as tumour response in Table 1), a patient must meet all the criteria to be a responder; in other cases (such as Rheumatoid Arthritis in Table 1) a patient must meet a set number of them. Some responder endpoints are not composite and are formed simply from a single dichotomised continuous endpoint.
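To make the two decision rules concrete, they can be sketched in a few lines of code (a hypothetical illustration; the function names are ours, and the 30% threshold mirrors the tumour-response example rather than any specific trial's definition):

```python
def is_responder(shrinkage, no_new_lesions, threshold=0.30):
    """All-criteria rule (tumour-response style): the patient responds only
    if the sum of target lesion diameters shrank by at least `threshold`
    (dichotomised continuous component) AND no new lesions appeared
    (binary component)."""
    return shrinkage >= threshold and no_new_lesions

def meets_set_number(criteria, required):
    """Set-number rule (rheumatoid-arthritis style): the patient responds
    if at least `required` of the individual criteria are met."""
    return sum(criteria) >= required
```

Under the all-criteria rule a patient with 35% shrinkage but a new lesion is a non-responder, whereas under a set-number rule missing one criterion need not prevent response.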
Responder endpoints are appealing as they simplify several (potentially complex) pieces of information into one responder/non-responder variable. The proportion of patients who are responders then serves as an easy-to-interpret measure of the effectiveness of a treatment.
From a statistical point of view, however, this appealing simplicity comes at a cost when one or more of the components is continuous. Dichotomising continuous variables loses information, a point which has been made several times (e.g. (2–4)). For a single continuous endpoint, it is therefore substantially more efficient to analyse it as a continuous variable than to dichotomise it and test it as a binary variable. As a rule of thumb, the minimum cost of dichotomisation is a 35% higher sample size for the same level of statistical precision (2).
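The efficiency loss can be seen in a small Monte Carlo sketch (illustrative only; the arm sizes, effect size and normality assumption here are our own choices, not taken from (2–4)): testing a difference between arms on the continuous outcome rejects the null far more often than testing the proportions obtained after dichotomising the very same data.

```python
import math
import random

random.seed(0)

def simulate_powers(n=50, delta=0.5, threshold=0.0, n_sims=2000):
    """Compare empirical power of (a) a two-sample z-test on a continuous
    outcome with (b) a two-proportion z-test after dichotomising the same
    data at `threshold`. Control arm ~ N(0,1), treated arm ~ N(delta,1)."""
    reject_cont = reject_bin = 0
    for _ in range(n_sims):
        control = [random.gauss(0.0, 1.0) for _ in range(n)]
        treated = [random.gauss(delta, 1.0) for _ in range(n)]
        # Continuous analysis: difference in means over its standard error
        mc, mt = sum(control) / n, sum(treated) / n
        vc = sum((x - mc) ** 2 for x in control) / (n - 1)
        vt = sum((x - mt) ** 2 for x in treated) / (n - 1)
        z_cont = (mt - mc) / math.sqrt(vc / n + vt / n)
        # Binary analysis: dichotomise, then compare response proportions
        pc = sum(x > threshold for x in control) / n
        pt = sum(x > threshold for x in treated) / n
        p_pool = (pc + pt) / 2
        se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
        z_bin = (pt - pc) / se if se > 0 else 0.0
        reject_cont += abs(z_cont) > 1.96
        reject_bin += abs(z_bin) > 1.96
    return reject_cont / n_sims, reject_bin / n_sims
```

With these settings the continuous analysis has noticeably higher empirical power than the binary analysis, consistent in direction with the sample-size rule of thumb above.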
Assuming that avoiding dichotomisation is desirable, it is not obvious how to do so when the responder endpoint consists of a mix of continuous and binary components. One option would be the approach of Lachenbruch (5) or Hu and Proschan (6), which uses separate test statistics for each component and forms an overall test through appropriate weighting. However, this loses the clinical interpretability of the endpoint and does not allow efficient estimation of the probability of response. Even in the case of a single continuous component, there may be compelling clinical reasons to keep a responder endpoint dichotomised (7): ease of interpretation to researchers and patients, wide acceptance as important, or correspondence to a meaningful clinical diagnosis (e.g. diabetes or hypertension).
This motivates statistical methods that retain the clinically relevant quantity (the proportion of patients who are responders) while utilising the information contained in continuous components to improve efficiency. For a single dichotomised continuous component, this idea dates back to the 1990s, when Suissa and Blais (8,9) proposed methods for doing exactly this. To our knowledge, these methods are rarely applied in practice despite their advantages over analysing the endpoint as binary. More recently, an approach known as the augmented binary method has been developed that allows composite responder endpoints (consisting of at least one continuous component) to be analysed in a more efficient way, whilst maintaining the definition of the endpoint.
In this paper (and associated supplementary material) we first describe the augmented binary method, focusing on its advantages and drawbacks. The main novel contribution of the paper is a review that identifies new clinical areas where trial efficiency can be improved through use of the augmented binary method. Finally, we discuss some further developments to the method that are motivated by the review.
The augmented binary method - intuition, benefits and drawbacks
The augmented binary method extends previous work focused on a single dichotomised continuous endpoint (8,9) to composite responder endpoints with a mixture of continuous and binary components. The original motivation was solid-tumour oncology (10,11), but subsequent papers have developed the methodology for rheumatology (12) and for rare diseases using composite endpoints (13).
For simplicity we focus on the case of a composite responder endpoint that combines a dichotomised continuous component with a binary component. For example, response in solid-tumour oncology requires the sum of target lesion diameters to shrink by at least 30% from a baseline scan (dichotomised continuous) and no new tumour lesions to appear on a scan (binary). The traditional binary analysis works only with whether or not each patient is a responder: if a patient meets the criteria they are a responder, otherwise not. When analysing a randomised controlled trial (RCT), one might then test for a difference between arms in the proportion of patients who are responders with an established method that gives an effect size, confidence interval and p-value (e.g. logistic regression).
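A minimal sketch of this standard binary analysis is a Wald interval on the difference in response proportions (in practice logistic regression would typically be used, particularly when adjusting for covariates; the function name and counts below are hypothetical):

```python
import math

def risk_difference_ci(responders_t, n_t, responders_c, n_c, z=1.96):
    """Difference in response proportions between treated and control arms,
    with an approximate (Wald) 95% confidence interval."""
    pt, pc = responders_t / n_t, responders_c / n_c
    diff = pt - pc
    se = math.sqrt(pt * (1 - pt) / n_t + pc * (1 - pc) / n_c)
    return diff, (diff - z * se, diff + z * se)
```

For example, 30/50 responders on treatment versus 20/50 on control gives an estimated risk difference of 0.20 with its accompanying interval.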
A detailed description of how to fit the method is provided in the supplementary material, including R code that can be used for the case of a composite responder endpoint formed from a single continuous and single binary component. The main intuition behind the method is to first fit a more sophisticated model to the data from the different components, and second to use this model to estimate a probability of response and test for a difference between arms. The second step can be thought of as assigning each patient a fractional response, with this fraction depending on how close the continuous component was to the threshold. This is demonstrated in Figure 1, where patients are measured on a continuous and a binary component. The continuous measurement must be above 1 for the patient to be a responder; however, patients must also meet additional binary criteria. The binary method treats the information as 0s and 1s, whereas the augmented binary method uses a ‘response weight’ which is determined from the underlying model and increases with the continuous component. The supplementary material contains a link to an R package that can be used to fit the model.
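The response-weight idea can be sketched as follows (a simplified illustration of the intuition only, not the actual joint model fitted by the method: here we assume a normal model for the continuous component and, purely for simplicity, independence from the binary criterion; all names are hypothetical):

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def response_weight(pred_mean, resid_sd, threshold, prob_binary_met):
    """Model-based probability that a patient meets both criteria: the
    probability that the continuous component exceeds `threshold` (given its
    predicted mean and residual SD) times the probability that the binary
    criterion is met."""
    prob_continuous_met = 1.0 - normal_cdf((threshold - pred_mean) / resid_sd)
    return prob_continuous_met * prob_binary_met
```

A patient whose predicted continuous value sits exactly at the threshold of 1 receives a weight of 0.5 times the binary-criterion probability, rather than a hard 0 or 1, and the weight rises smoothly as the predicted continuous value increases.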
The benefit of the method is primarily increased power. By making better use of the available information, the proportion of patients who respond (and therefore any difference between arms in an RCT) can be estimated more precisely. In more statistical language, the variance of the estimate is lower and the width of the confidence interval (CI) is narrower. Simulation studies presented in (10) found that the average gain in power was equivalent to increasing the sample size by at least 30%. The gain can be considerably higher depending on the scenario, predominantly on how well the dichotomisation point divides patients. This gain in power has been confirmed in the analysis of a real RCT in rheumatoid arthritis, where the reduction in CI width was equivalent to an increase in sample size of >50%. It should be emphasised that this gain does not rely on additional data being collected – it comes purely from using the existing data more efficiently.
There are some additional benefits of the approach. First, because an underlying model is fitted, the method better accommodates missing data on individual components (10) (it is generally not obvious how to handle missing data on a specific component of a composite outcome). This is especially true when some components may have more missing data than others. Second, it may also help address misclassification due to measurement error: if a patient is truly close to the responder threshold, a measurement error can flip their classification, with a potentially very large impact on the binary method but only a small impact on the augmented binary method.
There are also drawbacks. First, the method is undoubtedly more complex to apply than standard binary approaches. Some code is available (see supplementary material) for applying the method in specific cases, but a more generic implementation in commonly used statistical software is a high priority for the future. Second, the method makes more assumptions, for instance that the distribution of the continuous components is normal. It is therefore necessary to check this prior to analysing the data and to use a suitable correction if the assumption is not met, such as applying a Box-Cox transformation (14) so that the continuous component is approximately normally distributed. Third, if the number of components or the number of timepoints at which the endpoint is measured is large, applying the method can require substantial computational time. This is generally not an issue for the analysis of a single trial; however, assessing the performance of the method over a large number of computer simulations can become infeasible.
Up to now, the method has been applied to datasets in solid tumour oncology(10,11), rheumatoid arthritis(12) and systemic lupus erythematosus (SLE)(15). Based on personal experience of peer-reviewing clinical trial papers and discussion with a wider group of clinicians, we hypothesised that there might be a much greater number of diseases where the augmented binary method could be useful. We decided that a more systematic attempt to identify these clinical areas was warranted.