Comparing Methods for Handling Missing Cost and Outcome Data in Clinical Trial-based Cost-effectiveness Analysis

OBJECTIVES: This study compares methods for handling missing data when conducting cost-effectiveness analysis in the context of a clinical study. METHODS: A long-term clinical trial with staggered recruitment was used as a case study (EVRA, NIHR-HTA project 11/129/197). Patients had between 1 year and 5.5 years (median 3 years) of follow-up under "early" or "deferred" treatment. The methods compared were complete-case analysis (CCA), multiple imputation using linear regression (MILR), multiple imputation using predictive mean matching (MIPMM), a Bayesian parametric approach using the R package missingHE (BPA), and a repeated measures mixed model (RMM). The outcomes were total mean costs and total mean quality-adjusted life years (QALYs) at different time horizons (1 year, 3 years and 5 years). RESULTS: No method found a statistically significant difference in cost at the 5% level at any time horizon, and all methods found statistically significantly greater mean QALYs at year 1. At year 3, BPA showed a statistically significant difference in QALYs whereas the other methods did not; no method found a significant difference at year 5. Standard errors differed substantially between the methods employed. CONCLUSION: MIPMM seemed to perform better than MILR, confirming findings from simulation studies. RMM and BPA might be feasible options, though they did not perform better than MIPMM in this dataset. Further simulation studies and applications should continue to compare these methods. RMM: repeated measures mixed model; CCA: complete-case analysis; MIPMM: multiple imputation using predictive mean matching; MILR: multiple imputation using linear regression; BPA: Bayesian parametric approach; QALY: quality-adjusted life year; ICER: incremental cost-effectiveness ratio; CEAC: cost-effectiveness acceptability curve; SE: standard error; MAR: missing at random; MCAR: missing completely at random

1. Introduction
Missing data occur when one or more variables are unobserved for a given subject. This often happens in longitudinal studies and can be a particular problem in within-study cost-effectiveness analysis (CEA), because accurate estimates of total mean costs and quality-adjusted life years require full data to be collected on each subject at each follow-up time point (1-3).
Handling missing resource use and outcomes requires assumptions about the missing data mechanism, specifically, the factors that might have influenced data being missing and whether individuals with missing data are similar to, or in some way different from, those with fully observed data. An inappropriate method can bias the results, make inefficient use of the data available or reduce the overall statistical power of the study (4)(5)(6). Cost-effectiveness analysis adds extra dimensions that need to be considered. First, because total costs and total quality-adjusted life years (QALYs) are a weighted sum of multiple item measurements at each follow-up, estimation of these variables over the total time horizon for any subject requires complete data over the full follow-up, so methods which discard subjects with any missing data or do not make use of all the observations are highly inefficient. Second, any analytic method has to take account that there are two dependent variables (costs and QALYs) that will have a correlated, bivariate distribution.
The purpose of this study is to compare methods for handling missing data in cost-effectiveness analysis, using as a case study a 5-year clinical trial: Early Endovenous Ablation in Venous Ulceration (EVRA) (7). With a given dataset, there are several ways in which the chosen missing data approach might influence the results: different subjects used in the analysis, different numbers of observations used per subject, different statistical models of the missing data mechanism and the latent correlation between observed and missing observations, or different estimation models for total mean costs and QALYs and the correlation between them. The aim of this study is to compare results using different methods for handling missing data. The dataset provides an interesting case study because, despite very complete overall follow-up of patients, there was considerable item missingness arising from the way the trial was designed. In each case we use a common estimation model for costs and QALYs, and standard off-the-shelf software. Previous tutorial papers (1-3) compared complete-case analysis (CCA) (8-10), multiple imputation (MI) (8,10-12) and repeated measures mixed models (RMM) (13). This study extends this previous methodological work by including a Bayesian parametric approach using the R package missingHE (BPA) (14) and examining the results over different time horizons (and hence with different quantities of missing data) of 1, 3 and 5 years. In each case we calculate the mean incremental total cost and QALY, standard errors, the incremental cost-effectiveness ratio (ICER) and the cost-effectiveness acceptability curve (CEAC).
The key assumption for CCA to be unbiased is that individuals with complete data are representative of those with missing data, conditional on the variables included in the analysis model. This assumption is known as "missing completely at random" (MCAR). Furthermore, CCA is inefficient in studies with more than one follow-up because all the information from individuals with at least one assessment missing is discarded (list-wise deletion). Also, CCA is not 'intention-to-treat' because some randomised patients with follow-up data are excluded (1,15). MI operates within the more realistic "missing at random" (MAR) framework and is usually fairly straightforward to implement using standard software.
Furthermore, it allows considerable flexibility in how the posterior distributions of missing data are specified, including semi-parametric approaches such as predictive mean matching (PMM), and it separates the imputation of missing data from the estimation of the analysis model, which helps practitioners understand the intuition behind the approach. However, some have criticised the rather "ad-hoc" nature of MI, arguing that it does not account for the complexities of the data or the time dependence between the responses and does not correspond to a valid probabilistic model (16). BPA using the package missingHE attempts to provide a more explicit mathematical framework for making inferences from incomplete data; that is, assumptions about the missing data, and the uncertainty they carry, can be made explicit through prior distributions (17). The package missingHE also provides models to explore missing not at random (MNAR) scenarios. Nevertheless, missingHE is still not widely used in CEA, owing to practitioners' unfamiliarity with the software and its reliance on a limited range of parametric models. In summary, MI replaces each missing observation with a set of plausible imputed (predicted) values, drawn from a posterior predictive distribution of the missing data given the observed data. BPA has some similarities with MI, but instead of explicitly creating multiple datasets, it treats missing items as if they were parameters and estimates a posterior distribution directly for each of them.
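To make the PMM idea concrete, the following minimal sketch (not the trial's Stata implementation) imputes a cost variable by borrowing observed values from the k nearest donors on the predicted mean; the variable names and toy data are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def pmm_impute(y, X, k=5):
    """One predictive-mean-matching draw: fit a linear model on the observed
    rows, then replace each missing y with the *observed* value of a donor
    whose predicted mean is among the k closest to the missing case's."""
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
    pred = Xd @ beta
    y_imp = y.copy()
    for i in np.where(~obs)[0]:
        dist = np.abs(pred[obs] - pred[i])      # distance on predicted means
        donors = np.argsort(dist)[:k]           # k nearest observed donors
        y_imp[i] = y[obs][rng.choice(donors)]   # borrow a real observed value
    return y_imp

# Toy data: cost depends on baseline ulcer size; roughly 30% of costs missing.
size = rng.normal(10, 3, 200)
cost = 500 + 40 * size + rng.normal(0, 50, 200)
cost[rng.random(200) < 0.3] = np.nan
completed = pmm_impute(cost, size.reshape(-1, 1))
```

Because imputed values are drawn from observed donors, PMM cannot produce implausible values such as negative costs, which is one intuition for why it can outperform imputation by linear regression for skewed cost data.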
RMM, also known as a variance-components model, random-effects model, hierarchical linear model, or multi-level model, relies on maximum likelihood to estimate the covariance within subjects and, if specified correctly, is valid for data that are MAR. In the mixed model, information from observed data is used to implicitly impute the missing data (18). In this paper we use the parameters estimated from this model to calculate total incremental mean costs and total mean QALYs for each treatment group. This is achieved by dividing the follow-up into a series of discrete time periods, estimating the mean incremental cost per person (treatment-by-time interaction) for each time period using the mixed model, and then estimating the total mean incremental cost as the sum of these coefficients over the whole time horizon. The process is repeated in the same way to estimate the total mean incremental QALY. The correlation between total mean incremental cost and total mean incremental QALY is preserved by repeatedly running the routines using resampling with replacement (bootstrap). In this paper, we assume the simplest form of this model, a two-level variance-components specification (Equation 2).
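A minimal sketch of this estimation-and-bootstrap logic is given below, using simulated toy data rather than the EVRA dataset: per-period treatment effects are summed over the time horizon, and subjects are resampled with replacement so that each replicate yields a joint (cost, QALY) pair, preserving their correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 100, 4                                  # subjects, discrete periods

# Toy balanced panel of per-period costs and QALYs plus a treatment
# indicator (all numbers simulated for illustration, not EVRA data).
treat = rng.integers(0, 2, n)
period_cost = rng.gamma(2.0, 100.0, (n, T)) + 50 * treat[:, None]
period_qaly = rng.normal(0.2, 0.03, (n, T)) + 0.01 * treat[:, None]

def incremental(idx):
    """Total mean incremental cost and QALY for a (re)sample of subjects:
    per-period treatment effects summed over the whole time horizon."""
    t = treat[idx]
    c, q = period_cost[idx], period_qaly[idx]
    d_cost = (c[t == 1].mean(axis=0) - c[t == 0].mean(axis=0)).sum()
    d_qaly = (q[t == 1].mean(axis=0) - q[t == 0].mean(axis=0)).sum()
    return d_cost, d_qaly

# Resample subjects with replacement; each replicate is a (cost, QALY) pair,
# so the empirical joint distribution keeps their correlation.
boot = np.array([incremental(rng.integers(0, n, n)) for _ in range(500)])
se_cost, se_qaly = boot.std(axis=0, ddof=1)
corr = np.corrcoef(boot[:, 0], boot[:, 1])[0, 1]
```

The 500 bootstrap pairs can then be fed directly into a CEAC by computing, at each willingness-to-pay threshold, the proportion of replicates with positive incremental net benefit.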

Data
The Early Endovenous Ablation in Venous Ulceration (EVRA) randomised clinical trial evaluated the cost-effectiveness of "early" versus "deferred" endovenous ablation to treat venous leg ulcers. The trial methods and patients are described elsewhere (7). Briefly, resource use items in hospital, primary and community care, and medications related to the treatment of venous ulceration, adverse events or complications were collected by case-note review and questionnaires completed at baseline and monthly thereafter up to one year, plus one further telephone follow-up between October 2018 and March 2019.
The baseline covariates included in all the estimation models were as follows. TREAT is the randomised treatment ("early" coded as 1, "deferred" coded as 0). The time variable, coded as a set of categorical (dummy or factor) variables, represents the week after randomisation at which data are observed, from t=0 (baseline) to t=16 (week 260). SIZE, AGE and DURATION are the ulcer size (cm²), subject's age (years) and length of time with the ulcer (years), respectively, measured at baseline and centred at their means. SITE was coded as a factor variable.
Each item of resource use was multiplied by UK unit costs obtained from published literature, NHS reference costs, and manufacturers' list prices to calculate overall costs within each of these categories for each patient (7). The costs for each individual over their follow-up (from randomization to date of censoring for that individual) were assigned or apportioned into discrete time periods, that corresponded to 12 monthly periods during the first year (as follow-ups were monthly) and then yearly periods thereafter. This allowed discounting to be applied (3.5% per year), and facilitated analysis using the MI and mixed model in long format (see below).
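The apportion-and-discount step can be sketched as follows. The 12-monthly-then-yearly period structure follows the text; the convention of discounting from the start of each yearly period is an assumption made here for illustration (the trial's exact convention is described in reference 7).

```python
RATE = 0.035  # annual discount rate stated in the text

def discounted_total(period_costs, period_years):
    """period_costs[j] is the cost apportioned to period j; period_years[j]
    is the number of whole years from randomisation to that period's start."""
    return sum(c / (1 + RATE) ** y
               for c, y in zip(period_costs, period_years))

# 12 monthly periods in year 1 (year index 0), then yearly periods.
costs = [100.0] * 12 + [900.0, 800.0]   # months 1-12, year 2, year 3
years = [0] * 12 + [1, 2]
total = discounted_total(costs, years)
```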
EQ-5D-5L was collected at baseline, 6 weeks, 6 months and 12 months, plus one further telephone follow-up between October 2018 and March 2019, and a utility index was calculated at each time point using a published tariff (26). Due to rigorous follow-up procedures, there were very few withdrawals or dropouts from the study. Nevertheless, due to the study design, data were not observed for every subject for every period (an unbalanced panel) for two reasons. First, patients were followed over different time horizons (from 3 years to 5 years) due to staggered recruitment into the study. Second, to keep the cost of the research study low, quality of life was collected at only one time point after the first year.
Patients who died during the study were assigned zero costs and HRQOL thereafter. Code and example data are available in the Supplementary data, http://dx.doi.org/10.17632/j8fmdwd4jp.5.

Examination of pattern of missing data and hypothesis about missing data mechanism
Data are incomplete in this study for three reasons. First, staggered recruitment meant that patients were followed for different lengths of time, so later recruits have no data for the later years. Second, because quality of life was collected at only one time point after the first year, a typical subject has complete data for the first year, missing data at year 2, one follow-up at year 3, and missing data for years 4 and 5. It would be reasonable to consider this pattern of missing data to be MCAR, as the choice of date of final follow-up was independent of other variables. This mainly affected collection of EQ-5D, because in the absence of questionnaire data, most types of resource use could be obtained from case-notes.
Third, a relatively small amount of data were incomplete due to early dropout and item missingness (such as failure to complete an interim questionnaire). These data are usually considered MAR.
The pattern of missingness was examined using descriptive statistics and via a logistic regression model (Equation 1):

logit(π_it) = β0 + β1·TREAT_i + β2·SIZE_i + β3·AGE_i + β4·DURATION_i + β5·SITE_i + β6·t    (Equation 1)

where π_it denotes the probability that an observation is missing for individual i at time t. This can help to distinguish MAR from MCAR, although it cannot rule out the possibility that data are missing not at random (MNAR).
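A self-contained sketch of fitting such a missingness model is given below, here by Newton-Raphson on simulated data whose missingness depends on length of follow-up; the covariates and coefficient values are illustrative assumptions, not the trial's.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
week = rng.integers(0, 261, n).astype(float)     # length of follow-up
age = rng.normal(65.0, 10.0, n)                  # baseline age

# Simulated MAR mechanism: missingness depends only on follow-up length.
p_true = 1 / (1 + np.exp(-(-2.0 + 0.01 * week)))
missing = (rng.random(n) < p_true).astype(float)

# Logistic regression of the missingness indicator, fitted by Newton-Raphson.
X = np.column_stack([np.ones(n), week, age - age.mean()])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (missing - p)                   # score vector
    H = (X * (p * (1 - p))[:, None]).T @ X       # observed information
    beta += np.linalg.solve(H, grad)             # Newton step
```

A clearly non-zero coefficient on a covariate (here, follow-up length) is evidence against MCAR; as the text notes, however, the fitted model cannot distinguish MAR from MNAR.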

Repeated measure mixed model framework
We considered a RMM where the outcome could vary over subject, centre, and time point, after controlling for covariates. A two-level variance component model was specified as Equation 2 (long format data).
Y_it is the outcome variable (one model for costs during each period t, and another for the EQ-5D tariff at the end of each period t) for each subject i at time point t. Hence, for the model where the dependent variable is cost, β0 is set to zero for all subjects, β1 is the cost for patient i during the first 4 weeks, β2 is the cost between the 4th and the 8th week, and so on up to β12 (week 52). After that, the periods are yearly, so that β13 is the cost between week 52 and week 104 (year 2), and so on up to β16 (year 5, or week 260). ζ_i is the random deviation of subject i's mean costs or EQ-5D tariff from the overall mean β0, and ε_it, often called the within-subject residual across time, is the random deviation of Y_it from subject i's mean costs or EQ-5D tariff (19,28). Uncertainty was estimated by bootstrapping incremental mean costs and QALYs (29) and shown by the CEAC.

Multiple imputation model

MI replaces each missing observation with M plausible values, and each of the M completed datasets is analysed in the usual way. In the pooling step, the results obtained from the M completed-data analyses are combined into a single multiple-imputation result using Rubin's rules (30). The pooled estimate of a given coefficient is the average of the estimates across imputations, β* = (1/M) Σ_{m=1}^{M} β_m. The combined variance-covariance matrix incorporates both within-imputation variability (uncertainty about the results from one imputed dataset) and between-imputation variability (reflecting the uncertainty due to the missing information).
The same long-format data were used in RMM and MI. Analyses were implemented using the mi suite of commands in Stata 15.
For each of the M complete datasets, total cost and total QALYs over 1 year, 3 years and 5 years for each subject were imputed passively using the same formulas given in the section on RMM. The difference between RMM and MI is that, in the RMM approach, estimates of total mean cost and QALYs for the group as a whole are made by linear combination (lincom) of the coefficients, whereas MI imputes a total cost and QALY for each subject and then estimates mean incremental costs and QALYs for the group as a whole using bivariate normal regression (sureg in Stata 15). Coefficients from this regression were then combined across the multiply imputed datasets using Rubin's rules (mi estimate) (34). The bootstrap was not used with MI, as bootstrapping combined with multiple imputation can be very complex (41). The CEAC was calculated parametrically from the coefficients and covariance matrix of the bivariate normal regression.
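Rubin's rules as used above can be sketched in a few lines; the five point estimates and variances below are made-up numbers for illustration, not trial results.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool M point estimates and their within-imputation variances.
    Total variance = mean within-variance + (1 + 1/M) * between-variance."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    M = len(est)
    pooled = est.mean()                    # beta* = (1/M) * sum of estimates
    within = var.mean()                    # average within-imputation variance
    between = est.var(ddof=1)              # between-imputation variance
    return pooled, within + (1 + 1 / M) * between

# Hypothetical incremental-cost estimates from M = 5 imputed datasets
# (point estimate, squared standard error); not the EVRA results.
est = [210.0, 195.0, 220.0, 205.0, 200.0]
var = [45.0**2, 44.0**2, 46.0**2, 45.5**2, 44.5**2]
pooled, total_var = rubin_pool(est, var)
```

Note that the total variance always exceeds the average within-imputation variance: the extra between-imputation term is precisely the penalty for the information lost to missingness.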

Complete case analysis.
This method assumes covariate-dependent MCAR (missing completely at random) (1).

Bayesian Parametric Approach (BPA)
In principle, BPA allows the model of interest to be fitted while simultaneously handling all the different issues related to the missing data, and it correctly propagates and quantifies uncertainty. Each unobserved quantity in the model is handled as if it were a parameter (17,42-44). Costs and QALYs were estimated through a joint distribution based on Markov chain Monte Carlo (MCMC) using the R function selection within the missingHE package (17). BPA requires the specification of four models: the first two are the estimation models for the total QALY and total cost variables (Y), and the last two are auxiliary models, fitted similarly to Equation 1, which estimate the probability that Y is missing using logistic regression. All four models include the covariates TREAT, DURATION, SIZE, AGE and SITE, and the last two models also include the length of follow-up, as the probability of missingness increases with time since baseline.
The missingness mechanism is assumed MAR and the distribution of costs and QALY is assumed normal. Non-informative priors were used for the precision of the dependent variables, which were varied from 0.001 to 0.01 in sensitivity analysis. Incremental mean costs and QALY were computed from the first two models and the CEAC was calculated parametrically.
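The parametric CEAC calculation used here and in the MI analysis can be sketched as the probability that incremental net benefit is positive under a bivariate normal distribution; the numerical inputs below are illustrative, not the EVRA estimates.

```python
import math

def ceac_point(d_cost, d_qaly, var_c, var_q, cov_cq, wtp):
    """P(incremental net benefit > 0) under a bivariate-normal assumption.
    INB = wtp * dQALY - dCost is normal, with mean wtp*d_qaly - d_cost and
    variance wtp^2 * var_q + var_c - 2 * wtp * cov_cq."""
    mean = wtp * d_qaly - d_cost
    var = wtp**2 * var_q + var_c - 2 * wtp * cov_cq
    return 0.5 * (1 + math.erf(mean / math.sqrt(2 * var)))  # normal CDF

# Illustrative inputs: incremental cost 150, incremental QALY 0.05,
# with their sampling variances and covariance.
p_ce = ceac_point(d_cost=150.0, d_qaly=0.05, var_c=200.0**2,
                  var_q=0.02**2, cov_cq=0.5, wtp=20000.0)
```

Evaluating ceac_point over a grid of willingness-to-pay values traces out the full CEAC.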

Ethics and Consent
The trial was approved by the South West-Central Bristol Research Ethics Committee, and trial oversight was provided by an independent trial steering committee and an independent data and safety monitoring committee. Written informed consent was obtained from all participants. Details of the trial design and implementation are provided in the published protocol of the EVRA study (45). The study was conducted in accordance with the recommendations for physicians involved in research on human subjects adopted by the 18th World Medical Assembly, Helsinki 1964 and later revisions.

Pattern of missingness
No baseline data were missing. 74% of subjects had complete data (costs and EQ-5D) at 1 year, 10% at year 3 and 25% at year 5 (Table 2). This pattern arises from the staggered recruitment and because the final questionnaire was administered at a fixed calendar point irrespective of when the subject was recruited.
[Table 2 about here (missing data pattern)] The logistic model showed that the probability that a value is missing in costs and EQ-5D is related to the length of follow-up (week), age at baseline and site (p < 0.0001); see supplementary tables S1 and S2. On this basis we assume MAR, although it cannot be ruled out that data might be MNAR.
Only subjects with complete data were used in CCA: year 1, n=338; year 3, n=44; and year 5, n=147. BPA included all 450 subjects in the analysis, one observation (row) per subject. The data for RMM and MI included all 6,861 available observations structured as an unbalanced panel, so that for any subject i at time t with missing dependent variable Y_it (period cost or EQ-5D), data from earlier or later follow-up times informed an estimate of that value, either indirectly through the variance-covariance matrix (RMM) or directly through proper imputation (MI). The data for BPA were in wide format (N=450) and included the period costs and EQ-5D at each follow-up time point as covariates in the analysis models.

Cost effectiveness analysis
[Table 3 about here (results)] Table 3 summarises the results of the cost-effectiveness analysis with the five different approaches at each time point. All methods agreed that there was no statistically significant difference in cost at the 5% level at any time horizon. Early intervention was associated with statistically significantly greater mean QALYs at year 1 under all methods. BPA showed a statistically significant difference at year 3, while the other methods showed a trend towards greater QALYs for the intervention that did not reach statistical significance.
At 3 years early intervention dominated according to both RMM and BPA methods. The ICER according to CCA was £14/QALY; MI computed £489/QALY using PMM and £521/QALY using linear regression.
All methods suggested that early intervention is cost-effective at a threshold of £20,000 per QALY at the 1-, 3- and 5-year time horizons. At this threshold, the estimated probability that the intervention was cost-effective was 93% using RMM and 53% using CCA; see Figure 2. [Figure 2 (CEAC) and Figure 3 (SE) about here]

Discussion.
Staggered recruitment is common in clinical trials and causes patients to have different lengths of follow-up, meaning that the dataset is an unbalanced panel with a considerable amount of missing data.
Any chosen statistical model should aim to use data efficiently, be robust to potential sources of bias, and provide credible estimates of uncertainty.
This paper compared five methods for handling missing data empirically, some in common use and others less so, using a real dataset with several follow-up points over a long time period. Since we do not know the true values of the missing data, this approach does not allow us to generalise about which method is "correct". Nevertheless, based on the results in this paper, we can make some general comments about which methods might be more appropriate for challenges similar to the ones presented in this case study. We have attempted to use a similar estimation model in each case, so that differences arise mainly from the number of subjects and observations per subject that comprise the data, and the assumed latent correlation between observed and missing data.
CCA is the simplest method to implement. However, it is the least efficient, as subjects with any incomplete observations are discarded. This is seen in the large standard errors, particularly at the longer time horizons, where few subjects have complete data (n=44 at year 3).

Conclusion
This study has empirically compared methods for handling missing data in cost-effectiveness analysis.
Despite rigorous trial procedures for following up subjects by telephone and in person, there was still considerable missing data arising from staggered recruitment and item missingness. The longitudinal dataset employed in the case study is likely to be fairly typical of the data encountered in other well-managed clinical trials. CCA and BPA did not use all the relevant outcome variables for each subject and hence can be considered relatively inefficient. MIPMM seemed to perform better than MILR, confirming findings from simulation studies (46). RMM seems to be a feasible option and has some points in its favour, at least in theory, though it did not perform better than MIPMM in this dataset. Further simulation studies and applications should be undertaken to compare these methods.

Figure 1. Schematic relation between recruitment date and missing data pattern for 3 hypothetical patients.
Figure 2. Cost-effectiveness acceptability curves at 3 years. Note: RMM: repeated measures mixed model; MIPMM: multiple imputation using predictive mean matching; MILR: multiple imputation using linear regression; CCA: complete-case analysis; BPA: Bayesian parametric approach.