In this section, we provide an example of how Trial Sequential Analysis was applied to data from feasibility and pilot RCTs that tested MiQuit, a text-message, self-help smoking cessation intervention for pregnant women, to justify research funding for a third, more adequately powered RCT.

## Previous MiQuit trials

Smoking during pregnancy increases the risk of miscarriage, stillbirth, low birth-weight, premature birth, perinatal morbidity and mortality, and sudden infant death, as well as adverse infant behavioural outcomes (23, 24). Pregnancy is a life event which motivates cessation attempts amongst smokers, and over 50% of pregnant women who smoke attempt to quit during this time (25); consequently, pregnancy is an opportune moment to offer smoking cessation support. Text-message, self-help smoking cessation programmes developed for non-pregnant smokers are effective, but such programmes are inappropriate for use during pregnancy (26–28). MiQuit was developed to address the lack of acceptable self-help cessation support programmes for pregnant smokers in the UK (5). It delivers individually-tailored text messages to pregnant smokers, with the aim of encouraging them to stop smoking (5). Further details on MiQuit can be found elsewhere (5).

A MiQuit feasibility RCT was conducted, including 207 women. Biochemically-validated, 7-day point prevalence cessation at 12 weeks post randomisation (~ 6 months gestation) was 12.5% in the MiQuit group, compared with 7.8% in the control group (odds ratio (OR) 1.68, 95% confidence interval (CI) 0.90 to 3.16) (5). Although the trial was small, and the cessation period brief, the trial provided an estimate suggesting that MiQuit could have a positive impact in addition to routine care.

Next, we conducted a pilot RCT to investigate the feasibility of undertaking a fully-powered multi-centre RCT in UK National Health Service (NHS) settings (6). The pilot MiQuit RCT recruited 407 pregnant smokers; the rate of prolonged abstinence from smoking, validated in late pregnancy, was 5.4% in the MiQuit group versus 2.0% in the control group (OR 2.70, 95% CI 0.93 to 9.35) (6). This trial also suggested a beneficial effect of MiQuit.

As MiQuit is a cheap intervention and can be disseminated widely, we anticipated that even a 1–2% absolute effect on smoking cessation in pregnancy could be clinically important and cost effective (6). The results from the feasibility and pilot trials suggested that an impact of this size was attainable; however, an adequately powered RCT would still be needed to determine whether MiQuit is effective and to guide future routine clinical practice.

## Conventional meta-analysis

The conventional way to determine whether an intervention is effective is to use the naïve alpha of 5% and the naïve 95% confidence interval (8). Since both the feasibility and pilot trials used virtually the same design as that which would be used in any new RCT, they can be considered as pilots and it would be appropriate to meta-analyse these trials’ findings together. Using a random-effects model, a traditional meta-analysis of the pilot and feasibility studies’ data found that women randomised to MiQuit were more than twice as likely to be abstinent in late pregnancy (pooled OR 2.26, 95% CI 1.04 to 4.93; I² = 0%, p = 0.041). This result appears significant according to conventional assessment (p < 0.05). However, it should be interpreted with caution because, as described above, meta-analyses based on only two small RCTs can produce spurious findings due to type I error (9, 10, 21).

In the next sections, we use conventional sample size estimation methods to estimate the sample size for an RCT which, on its own, would have enough power to show whether MiQuit might be effective, using a plausible treatment effect estimate derived from the conventional meta-analysis above. We also calculate a second sample size estimate for one or more further RCTs which, when pooled with data from the feasibility and pilot trials using Trial Sequential Analysis methods, would be similarly decisive.

## Conventional sample size estimation

As the pilot trial (6) was considered at lower risk of bias than the feasibility trial (5), we based a traditional sample size calculation on the smoking cessation rates observed in the pilot trial. This calculation suggests that a new trial would require a total sample size of 1292 participants. This estimate has 90% power (10% type II error) and 5% significance (2-sided test; type I error) to detect a 3.4% absolute difference between the MiQuit and control groups (5.4% versus 2.0%) in prolonged abstinence from smoking from 4 weeks after enrolment until 36 weeks gestation (6).
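The total of 1292 can be reproduced with a standard two-proportion calculation. The sketch below is illustrative only and is not necessarily the calculation the trial team ran: it assumes the pooled-variance normal approximation with equal allocation and no continuity correction, and the function name `two_proportion_sample_size` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sided comparison of two proportions.

    Assumption: pooled-variance normal approximation, 1:1 allocation,
    no continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    p_bar = (p1 + p2) / 2                          # average event proportion
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Pilot trial abstinence rates: 5.4% (MiQuit) versus 2.0% (control)
per_group = two_proportion_sample_size(0.054, 0.020)
total = 2 * per_group
print(per_group, total)  # 646 1292
```

Other formula variants (for example, an unpooled variance or a continuity correction) give slightly different totals, so the agreement here depends on the stated assumptions.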

## Trial Sequential Analysis

The z-score is the test statistic used to decide whether to reject the null hypothesis. Very high positive or very low negative z-scores are associated with very small p-values. The critical z-score values when using a 95% confidence level, which are known as the ‘conventional test boundaries’, are −1.96 and +1.96; these correspond to a two-sided p-value of 0.05. If the z-score lies between −1.96 and +1.96, the p-value will be larger than 0.05, and the null hypothesis of no difference between intervention groups is not rejected. The z-curve represents the cumulative z-score as each RCT is added to the analysis. In Fig. 1.I, when trial B is added to the analysis, the z-curve crosses the conventional test boundary (p = 0.05). This is consistent with the results from the conventional meta-analysis for MiQuit, where we found p = 0.041.
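The correspondence between z-scores and two-sided p-values can be checked with a few lines of Python. This is a minimal sketch using the standard normal distribution; the helper names `two_sided_p` and `z_from_two_sided_p` are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z):
    """Two-sided p-value for a given z-score."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def z_from_two_sided_p(p):
    """Absolute z-score corresponding to a two-sided p-value."""
    return STANDARD_NORMAL.inv_cdf(1 - p / 2)


print(round(two_sided_p(1.96), 3))          # 0.05
print(round(z_from_two_sided_p(0.041), 2))  # 2.04
```

A p-value of 0.041 therefore corresponds to a cumulative z-score of roughly 2.04, which lies just beyond the conventional boundary of 1.96.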

The required information size is represented by the vertical red line in Fig. 1. The required information size was estimated using the same parameters as used for the conventional sample size estimation above (90% power, 5% significance, to detect a 3.4% absolute difference) (6); although this estimate could take into account observed heterogeneity, there was none in this meta-analysis (I² = 0% and D² = 0). Consequently, the estimated required information size of 1296 participants is only slightly different from that obtained using conventional sample size estimation, due to rounding errors; the estimates would be larger if heterogeneity were present.
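As a rough check on the figure of 1296, the sketch below uses one commonly cited form of the unadjusted required information size, 4(z₁₋α/₂ + z₁₋β)² · p̄(1 − p̄) / δ², where p̄ is the average of the two event proportions and δ their difference. Treating this as the formula behind the reported value is an assumption, and the function name `required_information_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_information_size(p_control, p_intervention, alpha=0.05, power=0.90):
    """Unadjusted required information size (total participants).

    Assumption: variance term based on the average event proportion,
    with no adjustment for heterogeneity or diversity.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_intervention) / 2
    delta = p_intervention - p_control
    return 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2


ris = required_information_size(0.020, 0.054)
print(ceil(ris))  # 1296
```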

As the cumulative z-curve does not cross the upper trial sequential monitoring boundary, which would indicate that MiQuit is effective, this Trial Sequential Analysis shows that further information is required before any firm conclusion can be reached about MiQuit efficacy. Although the conventional meta-analysis suggested, with borderline significance, that pregnant women randomised to MiQuit were more than twice as likely to be abstinent from smoking in late pregnancy, Trial Sequential Analysis indicates that this finding is not sufficiently robust. The Trial Sequential Analysis-adjusted confidence interval for cessation using MiQuit (pooled OR 2.26, Trial Sequential Analysis-adjusted CI 0.66 to 7.70) is much wider than that of the conventional meta-analysis (pooled OR 2.26, 95% CI 1.04 to 4.93).

Without Trial Sequential Analysis having been undertaken, an interpretation of the conventional meta-analysis would have been that MiQuit is effective. However, Trial Sequential Analysis indicates that one cannot be secure in this interpretation and further trial data should be collected to eliminate the possibility that this is a false positive result, which can occur early in intervention evaluation when small trials are undertaken.

## Calculating sample size for a third MiQuit RCT

Trial Sequential Analysis has demonstrated that further RCT data are required before a firm conclusion about MiQuit efficacy can be determined. As the initial two trials were sufficiently similar to be combined in Trial Sequential Analysis, we will now demonstrate how Trial Sequential Analysis methods can be used to estimate the sample size for (a) further trial(s) – data from which, when combined with the previous two trials in the Trial Sequential Analysis, would be expected to provide a more decisive answer regarding MiQuit efficacy. We will also demonstrate how exemplar theoretical findings from future trials which are both in favour and against MiQuit having a positive effect would impact the Trial Sequential Analysis result.

Trial Sequential Analysis sample size estimation: The Trial Sequential Analysis estimated the required information size as 1296 participants. From the feasibility and pilot studies, 605 women have already been recruited and randomised; therefore, the required sample size for further RCTs can be estimated as the required information size minus the number of women already recruited into the previous trials. Thus a sample size of 691 women (346 per intervention group) would be needed, assuming a 1:1 ratio.
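The subtraction can be written out explicitly. This is a small illustrative sketch; `remaining_sample_size` is a hypothetical helper, and the per-group figure is rounded up so that both groups are equal in size.

```python
from math import ceil


def remaining_sample_size(required_information_size, already_randomised, groups=2):
    """Participants still needed in total and per group (equal allocation)."""
    remaining = max(required_information_size - already_randomised, 0)
    per_group = ceil(remaining / groups)
    return remaining, per_group


# Required information size of 1296, with 605 women already randomised
print(remaining_sample_size(1296, 605))  # (691, 346)
```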

When a theoretical third trial (D) with a negative outcome is included in the Trial Sequential Analysis (Fig. 1.III), we observe a different output. The third trial, of sample size 630, was intentionally given a negative outcome (an absolute difference of 0.63% in favour of control). The z-curve now drops below the conventional test boundary, and in a conventional meta-analysis we would have concluded that MiQuit was not effective. However, in the Trial Sequential Analysis, the futility boundary is not crossed, so we are unable to say decisively that MiQuit is no more effective than control for smoking cessation. Because of the diversity now present between trials, the required information size has increased to 1941, meaning future trials will need a further 706 participants.

A conservative approach to sample size estimation: In the above example, the required information size was derived using the smoking cessation rate from the pilot trial (6). It can therefore be contested whether data from the pilot trial should also be included in the subsequent Trial Sequential Analysis. To address this, one could exclude the pilot trial data from the Trial Sequential Analysis and re-estimate the total number required (Fig. 2.I). Using this approach, to provide a conclusive result, either a single trial of 1098 participants (549 per intervention group, assuming a 1:1 ratio) or multiple trials cumulating to a total of 1098 participants would be needed. This figure, although conservative, is still less than the estimate from the conventional sample size calculation.

## Sensitivity analysis

The modelled scenario, in which there is no heterogeneity between trials in a meta-analysis, is rare; in most situations where the described approach is used, some heterogeneity between studies might be expected. Trial Sequential Analysis provides 95% confidence intervals for heterogeneity (I²) within meta-analyses. One way to allow fully for heterogeneity is to perform a sensitivity analysis using the upper boundary for heterogeneity, which would increase the required information size. In our example, the program could not calculate the 95% confidence interval surrounding the I² of 0% as there were fewer than three included studies. In this case, it is possible to input an estimate for heterogeneity into the TSA software.
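To illustrate how an assumed level of heterogeneity inflates the required information size, the sketch below multiplies an unadjusted estimate by 1 / (1 − H), where H is the assumed heterogeneity expressed as a proportion. Both the unadjusted formula and the adjustment factor are assumptions made for illustration, the heterogeneity values are hypothetical rather than estimates from the MiQuit trials, and `adjusted_information_size` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist


def adjusted_information_size(p_control, p_intervention, heterogeneity,
                              alpha=0.05, power=0.90):
    """Heterogeneity-adjusted required information size (total participants).

    Assumptions: unadjusted size = 4 (z_a + z_b)^2 p_bar (1 - p_bar) / delta^2,
    inflated by 1 / (1 - heterogeneity), with heterogeneity in [0, 1).
    """
    if not 0 <= heterogeneity < 1:
        raise ValueError("heterogeneity must be in the interval [0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_intervention) / 2
    delta = p_intervention - p_control
    unadjusted = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return ceil(unadjusted / (1 - heterogeneity))


# Hypothetical heterogeneity values chosen only to show the trend
for h in (0.0, 0.25, 0.50):
    print(h, adjusted_information_size(0.020, 0.054, h))
```

Under these assumptions the required information size rises from 1296 with no heterogeneity to 1728 at 25% and 2591 at 50%, which shows why the choice of heterogeneity estimate matters for planning further trials.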