In this section, we provide an example of how Trial Sequential Analysis successfully used data from feasibility and pilot RCTs that tested MiQuit, a text-message, self-help smoking cessation intervention for pregnant women, to justify research funds to undertake a third, more adequately powered RCT.
Previous MiQuit trials
Smoking during pregnancy increases the risk of miscarriage, stillbirth, low birth weight, premature birth, perinatal morbidity and mortality, and sudden infant death, as well as adverse infant behavioural outcomes (25, 26). Pregnancy is a life event that motivates cessation attempts amongst smokers, and over 50% of pregnant women who smoke attempt to quit during this time (27); consequently, pregnancy is an opportune moment to offer smoking cessation support. Text-message, self-help smoking cessation programmes developed for non-pregnant smokers are effective, but such programmes are inappropriate for use during pregnancy (28-30). To address the lack of acceptable self-help cessation support programmes for pregnant smokers in the UK, MiQuit was developed (7). MiQuit delivers individually tailored text messages to pregnant smokers, with the aim of encouraging them to stop smoking (7). Further details on MiQuit can be found elsewhere (7).
A MiQuit feasibility RCT was conducted, including 207 women. Biochemically-validated, 7-day point prevalence cessation at 12 weeks post randomisation (~6 months gestation) was 12.5% in the experimental MiQuit group, compared with 7.8% in the control group (odds ratio (OR) 1.68, 95% confidence interval (CI) 0.66 to 4.31) (7). Although the trial was small, and the cessation period brief, the trial provided an estimate suggesting that MiQuit could have a positive impact in addition to routine care.
The feasibility RCT led to minor changes to the intervention, before a pilot RCT was conducted to investigate the feasibility of undertaking a fully powered multi-centre RCT in UK National Health Service (NHS) settings (8). The pilot MiQuit RCT recruited 407 pregnant women who smoked, who had largely similar baseline characteristics to those in the feasibility RCT. Self-reported abstinence from 4 weeks post-randomisation until late-pregnancy follow-up (approximately 36 weeks gestation), biochemically validated at follow-up, was 5.4% in the experimental MiQuit group versus 2.0% in the control group (OR 2.70, 95% CI 0.93 to 9.35) (8). This trial also suggested a beneficial effect of MiQuit.
As MiQuit is a cheap intervention and can be disseminated widely, it was anticipated that even a 1% to 2% absolute effect on smoking cessation in pregnancy could be clinically important and cost effective (8). The results from the feasibility and pilot trials suggested that an impact of this size was attainable; however, an adequately powered RCT would still be needed to determine whether MiQuit is effective and to guide future routine clinical practice.
Conventional meta-analysis
The conventional way to determine whether an intervention is effective is to use the naïve alpha of 5% and the naïve 95% confidence interval (10). Since both the feasibility and pilot trials used almost the same design as that planned for the new RCT, they can be considered as pilots and it would be appropriate to meta-analyse these trials’ findings together. Using a random-effects model, a traditional meta-analysis of the pilot and feasibility studies’ data found that women randomised to MiQuit were more than twice as likely to be abstinent in their pregnancy (pooled OR 2.26, 95% CI 1.04 to 4.93; I2=0%, p=0.041). This result appears significant according to conventional assessment (p<0.05). However, it should be interpreted with caution because, as described above, meta-analyses based on only two small RCTs can produce spurious findings due to type I error (11, 12, 22) (please see below).
In the next sections, we use conventional sample size estimation methods to estimate the sample size for an RCT which, on its own, would have enough power to show whether MiQuit might be effective, using a plausible treatment effect estimate derived from the conventional meta-analysis above. We also calculate a second sample size estimate for one or more further RCTs which, when pooled with data from the feasibility and pilot trials using Trial Sequential Analysis methods, would be similarly decisive.
Conventional sample size estimation
As the pilot trial (8) was considered at lower risk of bias compared to the feasibility trial (7), a traditional sample size calculation using smoking cessation rate estimates derived from the pilot trial suggests a new trial would require a total sample size of 1292 participants. This estimate has 90% power (10% type II error) and 5% significance (2-sided test; type I error) to detect a 3.4% absolute difference in prolonged abstinence from smoking from 4 weeks after enrolment until 36 weeks gestation between the MiQuit and control groups (5.4% versus 2.0%) (8).
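The calculation above can be sketched in a few lines of Python. This is an illustrative sketch only: the source does not state which formula or software was used, so the normal-approximation formula with a pooled variance under the null hypothesis is an assumption, and the function name is hypothetical. Under that assumption the sketch gives 646 participants per group, which agrees with the total of 1292 quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p_treat, p_ctrl, alpha=0.05, power=0.90):
    """Participants per group for a two-sided comparison of two proportions.

    Assumed method (not stated in the source): normal approximation with a
    pooled variance under the null and unpooled variance under the alternative.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    p_bar = (p_treat + p_ctrl) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl))
    n = ((z_alpha * null_sd + z_beta * alt_sd) / (p_treat - p_ctrl)) ** 2
    return ceil(n)


# Cessation rates from the pilot trial: 5.4% (MiQuit) versus 2.0% (control).
per_group = two_proportion_sample_size(0.054, 0.020)
total = 2 * per_group
print(per_group, total)
```

Other formulae (for example, with a continuity correction) would give somewhat different totals, so the agreement here should not be read as confirmation of the exact method used.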
Trial Sequential Analysis
Figure 1.I illustrates a Trial Sequential Analysis incorporating findings from the MiQuit feasibility (A) (7) and pilot (B) (8) trials. In this Trial Sequential Analysis output, the x-axis represents the number of participants and marked on this are the numbers of participants recruited to each trial. The y-axis represents the z-score, where a positive z-score favours the MiQuit intervention and a negative z-score favours the control.
The z-score is the test statistic used to decide whether or not to reject the null hypothesis. Very high positive or very low negative z-scores are associated with very small p-values. The critical z-score values for a two-sided test at the 5% significance level, which are known as the ‘conventional test boundaries’, are -1.96 and +1.96. If the z-score is between -1.96 and +1.96, the two-sided p-value will be larger than 0.05, and the null hypothesis of no difference between intervention groups is not rejected. The z-curve represents the cumulative z-score as each RCT is added to the analysis. In Figure 1.I, when trial B is added to the analysis, the z-curve crosses the conventional test boundary (p=0.05). This is consistent with the results from the conventional meta-analysis for MiQuit, where we found p=0.041.
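The link between the pooled estimate, its z-score and the p-value can be illustrated with a short sketch. It assumes that the published 95% confidence interval is symmetric on the log-odds scale (a Wald-type interval); the function name is hypothetical and the inputs are the rounded figures reported above, so the result is only approximate.

```python
from math import log
from statistics import NormalDist


def z_and_p_from_or(or_estimate, ci_low, ci_high, level=0.95):
    """Approximate z-score and two-sided p-value from an OR and its CI.

    Assumes a Wald-type interval, symmetric on the log-odds scale.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)       # 1.96 for 95%
    se_log_or = (log(ci_high) - log(ci_low)) / (2 * z_crit)  # back-calculated SE
    z = log(or_estimate) / se_log_or
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Pooled result from the conventional meta-analysis: OR 2.26 (1.04 to 4.93).
z, p = z_and_p_from_or(2.26, 1.04, 4.93)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The back-calculated z-score lies just above +1.96 and the p-value is close to the reported 0.041; the small discrepancy is expected because the inputs are rounded.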
The required information size is represented by the vertical red line in Figure 1. The required information size was estimated using the same variables as used for the conventional sample size estimation above (90% power, 5% significance, to detect a 3.4% absolute difference) (8); although this estimate could take into account observed heterogeneity, there was none in this meta-analysis (I2 = 0% and D2 = 0). Consequently, the estimated required information size of 1296 participants is only slightly different to that using conventional sample size estimation due to rounding errors. The estimate would be larger if heterogeneity were present.
As the cumulative z-curve does not cross the upper trial sequential monitoring boundary for benefit, this Trial Sequential Analysis shows that further information is required before any firm conclusion can be reached about MiQuit efficacy. Although the conventional meta-analysis suggested, with borderline significance, that pregnant women randomised to MiQuit were more than twice as likely to be abstinent from smoking in late pregnancy, the Trial Sequential Analysis shows that this finding is not sufficiently robust. The Trial Sequential Analysis-adjusted confidence interval for cessation using MiQuit (pooled OR 2.26, Trial Sequential Analysis-adjusted CI 0.66 to 7.70) is much wider than that of the conventional meta-analysis (pooled OR 2.26, 95% CI 1.04 to 4.93).
Without Trial Sequential Analysis having been undertaken, an interpretation of the conventional meta-analysis would have been that MiQuit is effective. However, Trial Sequential Analysis indicates that one cannot be secure in this interpretation and further trial data should be collected to eliminate the possibility that this is a false positive result, which can occur early in intervention evaluation when small trials are undertaken.
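One way to see why a z-score just above 1.96 is not sufficient at this stage is to look at how much of the overall 5% type I error may be 'spent' when only part of the required information has accrued. The sketch below uses the Lan-DeMets O'Brien-Fleming-type alpha-spending function; this choice is an assumption, as the text does not state which spending function underlies the monitoring boundaries, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def obrien_fleming_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction t.

    Lan-DeMets O'Brien-Fleming-type spending function (assumed here):
    alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)), for 0 < t <= 1.
    """
    if not 0 < information_fraction <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z_alpha / sqrt(information_fraction)))


# 605 participants accrued out of a required information size of 1296.
fraction = 605 / 1296
print(f"information fraction = {fraction:.2f}")
print(f"alpha spent so far   = {obrien_fleming_alpha_spent(fraction):.4f}")
print(f"alpha spent at 100%  = {obrien_fleming_alpha_spent(1.0):.4f}")
```

Under this assumption, well under 1% of the type I error is available at less than half of the required information size, which is why the monitoring boundary sits far above the conventional boundary of 1.96 early on and converges towards it only as the required information size is approached. The boundary values themselves require numerical integration and are not computed here.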
Calculating sample size for a third MiQuit RCT
Trial Sequential Analysis has demonstrated that further RCT data are required before a firm conclusion about MiQuit efficacy can be determined. As the initial two trials were sufficiently similar to be combined in Trial Sequential Analysis, we will now demonstrate how Trial Sequential Analysis can be used to estimate the sample size for (a) further trial(s) – data from which, when combined with the previous two trials in the Trial Sequential Analysis software, would be expected to provide a more decisive answer regarding MiQuit efficacy. We will also demonstrate how exemplar theoretical findings from future trials which are both in favour and against MiQuit having a positive effect would impact the Trial Sequential Analysis result.
Trial Sequential Analysis sample size estimation: The Trial Sequential Analysis estimated the required information size as 1296 participants. From the feasibility and pilot studies, 605 women have already been recruited and randomised; therefore, the required sample size for further RCTs can be estimated as the required information size minus the number of women already recruited into the previous trials. Thus, a sample size of 691 women (346 per intervention group) would be needed, assuming a 1:1 ratio.
Figure 1.II shows the Trial Sequential Analysis output after adding a theoretical third trial (C) with a sample size of 630 women (315 per trial group), where an absolute difference of 3.17% was observed in favour of the MiQuit group versus the control group. The Trial Sequential Analysis clearly shows the cumulative z-curve crossing the upper trial sequential monitoring boundary, which indicates that MiQuit is effective. As the trial sequential monitoring boundary has been crossed, the Trial Sequential Analysis z-curve does not need to reach the required information size of 1296. In the present scenario, we can firmly conclude that MiQuit is effective for smoking cessation compared with control (provided that all trials are valid and not influenced by systematic errors (bias) or other errors).
When a theoretical third trial (D) with a negative outcome is included in the Trial Sequential Analysis (figure 1.III), we observe a different output. Here, the third trial of sample size 630 was intentionally given a negative outcome (absolute difference of -0.63%, in favour of control). The z-curve drops below the conventional test boundary, and in a conventional meta-analysis we would have concluded that MiQuit was not effective. However, in the Trial Sequential Analysis, the futility boundary is not crossed, so we are unable to conclude decisively that MiQuit is no more effective than control for smoking cessation. Because adding trial D introduces diversity between the trials, the required information size has increased to 1941, meaning future trials will need a further 706 participants.
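The subtraction described above can be written out as a small helper. This is a minimal sketch; the function name is hypothetical, and equal allocation between arms is assumed, as in the text.

```python
from math import ceil


def remaining_sample_size(required_information_size, already_randomised, arms=2):
    """Participants still needed, in total and per arm (equal allocation)."""
    remaining = max(required_information_size - already_randomised, 0)
    per_arm = ceil(remaining / arms)  # round up so no arm is under-recruited
    return remaining, per_arm


# Required information size of 1296, with 605 women already randomised.
remaining, per_arm = remaining_sample_size(1296, 605)
print(remaining, per_arm)  # 691 346
```

The same helper applies whenever the required information size or the number of accrued participants changes after a further trial is added.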
A conservative approach to sample size estimation using Trial Sequential Analysis: In the above example, the required information size was derived using the smoking cessation effect from the pilot trial (8). Therefore, it can be contested whether data from the pilot trial should be included in subsequent Trial Sequential Analysis. Consequently, one could exclude the data from the pilot trial from the Trial Sequential Analysis and re-estimate the total number required (figure 2.I). Using this approach, to provide a conclusive result, either a single trial of 1098 participants (549 per intervention group, assuming a 1:1 ratio) or multiple trials cumulating to a total of 1098 participants, would be needed. This figure, although conservative, is still less than the estimate from the conventional sample size calculation.
Figures 2.II and 2.III also show the Trial Sequential Analysis outputs if theoretical trials C and D were included in the analyses. In both situations further information is needed, despite the z-curve coming close to the upper trial sequential monitoring boundary in figure 2.II and the futility boundary in figure 2.III.
Sensitivity analysis
The modelled scenario, in which there is no heterogeneity between trials in a meta-analysis, is rare; in most situations where the described approach is used, some heterogeneity between studies is to be expected. Trial Sequential Analysis provides 95% confidence intervals for heterogeneity (D2) within meta-analyses. One way to allow fully for heterogeneity is to perform a sensitivity analysis using the upper limit of the 95% confidence interval for the between-trial heterogeneity variance estimate. This would increase the required information size. In our example, the program could not calculate the 95% confidence interval surrounding the D2 of 0% as there were fewer than three included studies. In this case it is possible to input an estimate for heterogeneity into the Trial Sequential Analysis software.
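The effect of such a sensitivity analysis on the required information size can be sketched as follows. The sketch assumes the usual diversity adjustment, in which the unadjusted information size is multiplied by 1 / (1 - D2) with D2 expressed as a proportion; the function name and the D2 values in the loop are hypothetical and chosen purely for illustration.

```python
from math import ceil


def diversity_adjusted_information_size(unadjusted_size, diversity):
    """Inflate an information size by 1 / (1 - D^2), with D^2 as a proportion."""
    if not 0 <= diversity < 1:
        raise ValueError("diversity (D^2) must be in [0, 1)")
    # Round before taking the ceiling to guard against floating-point overshoot.
    return ceil(round(unadjusted_size / (1 - diversity), 6))


# Hypothetical D^2 values entered as sensitivity-analysis inputs.
for d2 in (0.0, 0.10, 0.25, 0.50):
    adjusted = diversity_adjusted_information_size(1296, d2)
    print(f"D^2 = {d2:.0%}: required information size = {adjusted}")
```

With D2 = 0, as in the MiQuit example, the adjusted and unadjusted sizes coincide; any positive value increases the number of participants required.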