Methods
A systematic review was conducted to identify studies for the simulation study. It aimed to identify all factorial RCTs with economic evaluations published before 2010 that evaluated any intervention or comparator in any patient group. The protocol is available in Additional file 1. MEDLINE (including daily update and old MEDLINE), EMBASE, Econlit and Journals@Ovid were searched through Ovid on 9th February 2010. On the same date, we also searched www.bmj.com, the Tufts CEA registry (https://research.tufts-nemc.org/cear/Default.aspx), Wiley Interscience, the National Institute for Health Research (NIHR) publications list (http://www.hta.ac.uk) and the Centre for Reviews and Dissemination (CRD) database (http://www.crd.york.ac.uk/crdweb). The review was not updated because the original review was sufficient to identify a representative sample of studies and provide the basis for the simulation study. The review followed PRISMA guidelines [9].
Search terms to identify factorial trials (e.g. “factorial”, “2 x 2”, “2 by 2”, “two by two”, or “2 x 3”) were combined with search terms to identify economic evaluations (“cost-effect*” or “economic evaluation”; see Additional file 1). Some papers on factorial trial-based economic evaluations do not describe the design as factorial. We therefore flagged clinical papers on factorial trials that were picked up in the main database searches and mentioned plans for an economic evaluation or collection of cost data. Additional targeted literature searches were then conducted to identify papers reporting economic evaluations of these specific factorial trials.
One author (HD) examined titles and abstracts to assess whether they met all of the following inclusion criteria:
- Described the methods and/or results of a cost-effectiveness, cost-utility, cost-consequence or cost-benefit analysis quantifying the costs and benefits of interventions designed to improve health or affect healthcare systems.
- Used patient- or cluster-level data from a factorial RCT, as defined in Additional file 1.
- Published at least brief details of the methods and/or results of the trial-based economic evaluation on or before 31st December 2009. Studies were not excluded from the review based on language, provided that at least an English abstract was available. For completeness, protocols published as journal articles by 31st December 2009 were also included, to give information on intended analytical methods.
The same author extracted data on study characteristics, study design, statistical methods and results (See Additional file 2). Mean costs and mean health benefits within each cell of the factorial design and their standard deviations were extracted if reported. These data were used in the simulation study and to estimate the magnitude, influence and (where possible) statistical significance of interactions.
Interactions were placed in one of four categories:
- super-additive: where the effect of the combination is greater than the sum of the parts;
- sub-additive: where the effect of the combination is less than the sum of the parts, but the interaction does not change the direction of effects;
- qualitative: where at least one of the treatments under investigation changes sign (not just magnitude) depending on whether or not the other therapy is given; and
- mixed: we developed the “mixed” category to reflect situations where one factor decreases outcome while the other increases it, such that the interaction has the same sign as one treatment effect, but the opposite sign from the other.
To measure the magnitude of interactions relative to between-group differences, we developed the interaction:effect ratio,[1] which indicates both the size of interactions and whether the interaction is super-additive, sub-additive/mixed or qualitative (see Interaction in the Supplementary Files).
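The four categories can be illustrated with a minimal sketch that works from the four cell means of a 2x2 design. The function name, the sign-based rules and the handling of a zero interaction are assumptions made for illustration; the exact definition of the interaction:effect ratio is given in the supplementary material and is not reproduced here. Simple effects that are exactly zero are not treated specially.

```python
def classify_interaction(y00, y10, y01, y11):
    """Classify the interaction in a 2x2 factorial from four cell means.

    y00: neither treatment, y10: A only, y01: B only, y11: A and B.
    Hypothetical helper; the sign-based rules are an illustrative reading
    of the category definitions in the text.
    """
    a = y10 - y00                    # simple effect of A without B
    b = y01 - y00                    # simple effect of B without A
    inter = y11 - y10 - y01 + y00    # departure from additivity

    if inter == 0:
        return "none"
    a_with_b = a + inter             # effect of A when B is also given
    b_with_a = b + inter             # effect of B when A is also given
    if a * a_with_b < 0 or b * b_with_a < 0:
        return "qualitative"         # a treatment effect changes sign
    if a * b < 0:
        return "mixed"               # simple effects point in opposite directions
    if inter * (a + b) > 0:
        return "super-additive"      # interaction reinforces both effects
    return "sub-additive"            # interaction dampens both effects
```

For example, cell means of 0, 1, 1 and 3 give simple effects of 1 each and an interaction of 1, which this sketch labels super-additive.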
Searches identified 1,671 references (Figure 1, Additional file 1). Of these, 40 completed studies presenting economic evaluation results, 13 published protocols and one prematurely-terminated study[1] met the inclusion criteria. Additional file 2 gives details of all included studies.
Of the completed studies, 23% (9/40) allowed for interactions between factors when analysing the primary clinical endpoint, 53% (21/40) assumed no interaction, while 25% (10/40) did not clearly state their methods (Table 1). Twenty studies (50%) used regression methods for the primary endpoint, of which five included interaction terms, seven did not and eight did not clearly describe their methods. Four studies used inside-the-table analysis and 14 used at-the-margins. Only three studies (8%) observed statistically significant interactions for the primary endpoint, although nine others (23%) observed large or qualitative interactions that did not reach statistical significance or for which significance was not reported. Interaction results were not clearly reported for 15 studies.
By contrast, 53% (21/40) of completed studies allowed for interactions in their base case economic evaluation: more than twice the number allowing for interactions in the primary endpoint. Studies were also more likely to report sufficient information to identify whether interactions were taken into account for cost-effectiveness than primary endpoints, although in most cases it was necessary to infer the methods used from the tables reported. Only five studies analysed economic results using regression analyses, while two used event-based cost-effectiveness analysis, 17 inside-the-table and 14 at-the-margins; this may reflect the difficulties associated with regression-based economic evaluation identified previously [1].
Fifteen completed studies (38%) presented the probability of treatment being cost-effective within the text or as cost-effectiveness acceptability curves. Of these, nine studies presented pair-wise comparisons giving the probability that one treatment is cost-effective compared with a single comparator; three studies presented figures showing how the probability of each treatment evaluated in the trial having the highest NMB varies with the ceiling ratio; and a further three studies presented acceptability curves for both pair-wise and multiple comparisons. Six further studies quantified uncertainty in other ways (e.g. scatter graphs or confidence intervals). One study also presented the value of information [11-13].
Sixteen studies (40%) reported results inside-the-table in sufficient detail that interactions for both costs and health benefits could be directly evaluated (see Additional file 3).[2] Large interactions arose frequently: 33% (24/72) of interactions had an absolute magnitude larger than one or more simple effects (interaction:effect ratios >1 or <-1; Table 2). Interaction:effect ratios varied between -44 and 232. Overall, 33% (23/72) of interactions were super-additive, 49% (35/72) were sub-additive or qualitative, and 17% (12/72) were mixed (Table 2). Large and qualitative interactions occurred at least as commonly for health benefits as for costs and NMB. Among the studies measuring health in units other than QALYs, 50% (7/14) of interactions were larger than simple effects. However, although 29% (7/24) of interactions for NMB were qualitative, the interaction changed the treatment adoption decision in only one case [14].
Six studies (reporting nine interactions) reported standard deviations around both costs and health benefits in each group [14-19]. Within these studies, 56% (5/9) of interactions for cost were statistically significant (p<0.05), although there were no statistically significant interactions for health benefits or NMB.
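One way to test an interaction from reported cell means and standard deviations is a normal-approximation test on the difference of differences. The sketch below assumes independent cells and a large-sample z-test; the function name and the example numbers are hypothetical and the approach is not necessarily the one used in the review.

```python
import math

CELLS = ("00", "10", "01", "11")  # neither, A only, B only, A and B


def interaction_z_test(means, sds, ns):
    """Two-sided z-test for a 2x2 interaction from summary statistics.

    means, sds, ns: dicts keyed by the cell labels in CELLS.
    Assumes independent cells and a normal approximation (illustrative only).
    """
    inter = means["11"] - means["10"] - means["01"] + means["00"]
    # Variance of a sum/difference of independent means is the sum of variances
    se = math.sqrt(sum(sds[c] ** 2 / ns[c] for c in CELLS))
    z = inter / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return inter, se, z, p


# Hypothetical cost data: mean cost, SD and sample size in each cell
means = {"00": 100.0, "10": 120.0, "01": 130.0, "11": 190.0}
sds = {c: 50.0 for c in CELLS}
ns = {c: 100 for c in CELLS}
inter, se, z, p = interaction_z_test(means, sds, ns)
```

With skewed cost data or small cells, a t-test or bootstrap may be more appropriate than this approximation.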
Simulation study
Methods
The six studies reporting standard deviations for each group [14-19] were used in simulation work to evaluate the different criteria for identifying which interactions should be included in economic analyses. Simulated data offer three advantages: (a) whereas a real trial provides only one sample, simulation can generate multiple samples and show how performance varies between them; (b) we specify the true data-generating mechanism and can therefore compare the conclusions from each sample against the true answer; and (c) we can vary the characteristics of the data-generating mechanism (e.g. interaction size and sample size) and observe the impact on the results. For simplicity, simulations focused on balanced 2x2 full factorial designs with no covariates or missing data. We therefore included only the first two levels of each factor evaluated by Hollis et al. [19] and the Alexander Technique, Exercise And Massage (ATEAM) trial [16].
In addition to the original studies, five variants of each trial were simulated using interaction terms that were 0%, 50% or 200% of the size observed in the original study, and using double the sample size with either the original interaction or zero interaction (See Additional file 3). The analysis used Stata version 12 (College Station, Texas) to simulate and analyse 300 samples of each of the 36 scenarios from the six trials. The data-generation methods and Stata code are shown in Additional file 3 and use the data in Additional file 5.
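The scenario structure (the original trial plus five variants) can be sketched as follows. This is an illustration only, not the Stata code in Additional file 3: it assumes a single normally distributed outcome with a common standard deviation, and all function and parameter names are hypothetical.

```python
import random


def scenarios(n_per_cell, interaction):
    """Return (sample size per cell, interaction) for the six scenarios."""
    return {
        "original": (n_per_cell, interaction),
        "interaction x0": (n_per_cell, 0.0),
        "interaction x0.5": (n_per_cell, 0.5 * interaction),
        "interaction x2": (n_per_cell, 2.0 * interaction),
        "double n, original interaction": (2 * n_per_cell, interaction),
        "double n, zero interaction": (2 * n_per_cell, 0.0),
    }


def simulate_sample(rng, n_per_cell, base, effect_a, effect_b, interaction, sd):
    """Draw one balanced 2x2 sample; returns {(a, b): [outcomes]}."""
    sample = {}
    for a in (0, 1):
        for b in (0, 1):
            # Cell mean under the assumed data-generating mechanism
            mu = base + a * effect_a + b * effect_b + a * b * interaction
            sample[(a, b)] = [rng.gauss(mu, sd) for _ in range(n_per_cell)]
    return sample


rng = random.Random(2010)  # fixed seed so the draws are reproducible
grid = scenarios(n_per_cell=50, interaction=10.0)
n, inter = grid["interaction x2"]
sample = simulate_sample(rng, n, base=100.0, effect_a=20.0,
                         effect_b=30.0, interaction=inter, sd=50.0)
```

Repeating `simulate_sample` 300 times per scenario would mirror the number of samples described above; costs and benefits would in practice need to be drawn jointly to preserve their correlation.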
The costs and benefits for each sample were analysed using four mixed models with different combinations of interaction terms: no interactions; interaction for costs only; interaction for health benefits only; and interactions for costs and benefits. The mixed models implemented seemingly-unrelated regression allowing for correlations between costs and benefits by predicting outcomes (which could be either costs or benefits) with random effects by patient. However, separate constants, treatment effects and (where appropriate) interactions were estimated for costs and benefits and unstructured residuals were used. This approach gives identical results to the sureg command [20]. The log-likelihood, degrees of freedom and coefficients and their standard errors were recorded for each model.
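For a single outcome in a balanced 2x2 design with no covariates, the difference between including and omitting the interaction term can be shown directly from the cell means. The sketch below is a simplification under those assumptions, not the mixed-model specification described above, and the function names are hypothetical. The model with an interaction reproduces the four cell means (inside-the-table), whereas the least-squares fit without an interaction averages each treatment effect over the levels of the other factor (at-the-margins).

```python
def fit_with_interaction(m):
    """Saturated model: coefficients reproduce the four cell means.

    m: cell means keyed by (a, b), where a and b are 0 or 1.
    """
    return {
        "const": m[(0, 0)],
        "a": m[(1, 0)] - m[(0, 0)],
        "b": m[(0, 1)] - m[(0, 0)],
        "ab": m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)],
    }


def fit_no_interaction(m):
    """Least-squares main-effects fit, valid for equal cell sizes only."""
    # Each effect is averaged over both levels of the other factor
    eff_a = ((m[(1, 0)] - m[(0, 0)]) + (m[(1, 1)] - m[(0, 1)])) / 2
    eff_b = ((m[(0, 1)] - m[(0, 0)]) + (m[(1, 1)] - m[(1, 0)])) / 2
    grand_mean = sum(m.values()) / 4
    const = grand_mean - eff_a / 2 - eff_b / 2
    return {"const": const, "a": eff_a, "b": eff_b, "ab": 0.0}


cell_means = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 3.0}
with_int = fit_with_interaction(cell_means)
without_int = fit_no_interaction(cell_means)
```

In this example the main-effects fit attributes half of the interaction to each treatment effect (1.5 rather than 1.0), which is why omitting a real interaction can change the apparent ranking of treatments.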
The coefficients estimated in mixed models were used to calculate NMB. For simplicity, all costs were interpreted as though they were in pounds Sterling. Results focus on ceiling ratios of £20,000/QALY [21] for the five studies measuring benefits in QALYs, and £5,000 per unit of benefit for the remaining study.
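The step from model coefficients to NMB can be sketched as below, using the standard definition of NMB as the ceiling ratio multiplied by health benefit, minus cost. The coefficient names and the example values are hypothetical and are not taken from any included study.

```python
def arm_means(coef):
    """Predicted mean in each arm from constant, main effects and interaction."""
    return {
        "neither": coef["const"],
        "A only": coef["const"] + coef["a"],
        "B only": coef["const"] + coef["b"],
        "A and B": coef["const"] + coef["a"] + coef["b"] + coef["ab"],
    }


def net_monetary_benefit(cost_coef, benefit_coef, ceiling_ratio=20000.0):
    """NMB per arm: ceiling ratio x mean benefit - mean cost."""
    costs = arm_means(cost_coef)
    benefits = arm_means(benefit_coef)
    return {arm: ceiling_ratio * benefits[arm] - costs[arm] for arm in costs}


def best_arm(nmb):
    """Arm with the highest NMB, i.e. the treatment that would be adopted."""
    return max(nmb, key=nmb.get)


# Hypothetical coefficients (costs in pounds, benefits in QALYs)
cost_coef = {"const": 1000.0, "a": 500.0, "b": 300.0, "ab": 100.0}
qaly_coef = {"const": 0.70, "a": 0.05, "b": 0.01, "ab": -0.02}
nmb = net_monetary_benefit(cost_coef, qaly_coef, ceiling_ratio=20000.0)
chosen = best_arm(nmb)
```

Setting the "ab" coefficient to zero for costs, benefits or both corresponds to the models that omit the respective interaction term.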
We evaluated 15 criteria for determining which interactions should be taken into account (Table 3) and applied these to each simulated trial sample. We compared the results of each analysis against the “true” results for each dataset, which (for the purposes of this simulation study) were assumed to equal the mean values for treatment effects and interactions shown in Additional file 3, Table 3.3. The sensitivity and specificity for identifying interactions, the probability of adopting the best treatment and the opportunity cost of making the wrong decision [1] were evaluated for each of the 15 criteria (Table 4).
We used the opportunity cost as the primary measure of which criterion works best, since it focuses on the central question of economic evaluation: namely maximising health gains from a finite budget. Coverage, statistical power and bias were also calculated (Additional file 4).
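The performance measures can be sketched as follows. The opportunity cost is taken here to be the true NMB of the best treatment minus the true NMB of the treatment actually adopted; this is an assumed definition for illustration (the definition used in the study is given in [1]), and all names and numbers are hypothetical.

```python
def opportunity_cost(true_nmb, chosen_arm):
    """True NMB forgone by adopting chosen_arm rather than the best arm."""
    return max(true_nmb.values()) - true_nmb[chosen_arm]


def summarise_criterion(true_nmb, chosen_arms):
    """Average performance of one criterion across simulated samples."""
    best = max(true_nmb, key=true_nmb.get)
    n = len(chosen_arms)
    return {
        "p_best": sum(arm == best for arm in chosen_arms) / n,
        "mean_opportunity_cost":
            sum(opportunity_cost(true_nmb, arm) for arm in chosen_arms) / n,
    }


def sensitivity_specificity(truth, flagged):
    """truth[i]: interaction truly present; flagged[i]: criterion included it."""
    pairs = list(zip(truth, flagged))
    true_pos = sum(t and f for t, f in pairs)
    true_neg = sum((not t) and (not f) for t, f in pairs)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    sens = true_pos / n_pos if n_pos else float("nan")
    spec = true_neg / n_neg if n_neg else float("nan")
    return sens, spec


# Hypothetical true NMB per arm and the arms adopted in four simulated samples
true_nmb = {"neither": 13000.0, "A only": 13500.0,
            "B only": 12900.0, "A and B": 12900.0}
summary = summarise_criterion(true_nmb, ["A only", "A only", "neither", "A and B"])
sens, spec = sensitivity_specificity([True, True, False, False],
                                     [True, False, False, True])
```

A criterion can have modest sensitivity for detecting interactions yet a low mean opportunity cost, if the interactions it misses rarely change which treatment has the highest NMB.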
[1] One study meeting inclusion criteria was terminated early due to poor recruitment but was published as a monograph without analysis of economic results; this is considered in the review alongside protocols.
[2] Since four of these studies were larger than 2x2 or reported results by subgroup, this gave 24 interactions for each of three outcomes (cost, QALYs and NMB): 72 interactions in total.