(1) The literature search yielded 489 articles, of which 21 were retained (S2). Literature on the ACS-to-birth interval is summarised in Table S1.1,26-45

Two RCTs on ACS reported on the ACS-to-birth interval: the initial RCT1 and the RCT of Schutte et al.26 Schutte et al. performed a study that was partially an RCT (ACS vs placebo) and partially observational (all women who received betamethasone) to analyse the incidence of RDS in groups of preterm-born children at different time intervals after admission of the mother. Although the goal was to investigate the relevance of time factors, randomisation was to ACS or placebo, not to a time interval. In the subgroup analysis of the time intervals, the number of patients per interval was very small. Finally, the focus was mainly on the admission-to-birth interval rather than on the ACS-to-birth interval.26

All remaining 19 studies that focused on neonatal outcome according to the ACS-to-birth interval were observational (17 in retrospective27-41,44,45 and two in prospective cohorts42,43). Four studies27-30 reported no superiority of any particular interval, three studies35,38,42 concluded that an interval of 0-7d was superior to other intervals, and eight studies31,34,36,37,41,43-45 confirmed an optimal interval of 1 or 2 to 7 days. The remaining four studies32,33,39,40 suggested a different interval (Table S1).

(2) We assessed the risk of bias in the Liggins and Howie trial using the *RoB 2* tool.1,17 Table S2 presents the risk of bias assessment for the primary research question of the RCT (ACS vs placebo) and for the research question of interest for this article (ACS-to-birth interval). For both research questions, the risk of bias was considered high. The same conclusion applies to the study of Schutte et al.26

(3) Observational studies on the ACS-to-birth interval were prone to drawing causal conclusions from associational research questions and methodologies. A summary is provided in Table S3.27-45 None of the observational studies used causal inference techniques; however, 89.5% (17/19) used causal language. In 52.6% (10/19) of articles, multiple regression was performed to account for confounding. However, when the aim is to describe an association, there is no need to adjust for confounders.22 In our opinion, Peaceman et al. and Fuller et al. were the only authors who did not use any terms that implied causality.31,40

(4) We attempted to address the following research question in a hypothetical trial: *‘Does the 24h-7d interval between ACS administration and birth cause a higher reduction in RDS compared to other ACS-to-birth intervals (<24h, 8-14d, >14d)?’*

*(a) Research question*

An interval is not an intervention; it is an observation. An intervention would be to give ACS or not, or to repeat ACS or not. To transform the ACS-to-birth interval into an intervention, patients could be randomised to one of the four groups and labour could be induced so that birth occurs within the assigned interval (changing the endpoint of the interval). Of course, inducing labour to ensure that a patient delivers preterm within a certain interval would be unethical. An alternative would be to postpone ACS administration (changing the start point of the interval). However, omitting ACS administration based on false-negative predictions of birth within a certain interval would again be unethical. So, the research question cannot be answered, at least not if the actual interval, known only a posteriori, is used. A possible solution would be to use the predicted interval at the time of ACS administration (see key components 6 and 7).

*(b) The 3 key principles of causal inference applied to the ACS-to-birth interval*

Exchangeability

To achieve (conditional) exchangeability between the patients in the four ACS-to-birth intervals considered, confounders need to be identified. Confounders are shared causes of treatment and outcome. A directed acyclic graph (DAG) is one way to represent these confounders; an example is presented in Figure 1.
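The definition of a confounder as a shared cause can be made concrete with a small sketch. The graph below is purely hypothetical: its node names (`exposure`, `rds`, `gestational_age`, `indication`, `fetal_sex`) are illustrative assumptions and do not reproduce Figure 1. The helper functions `causes_of` and `shared_causes` are likewise invented for this illustration.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical DAG written as: node -> set of its direct causes (parents).
# Node names are assumptions for illustration, NOT the content of Figure 1.
PARENTS: Dict[str, Set[str]] = {
    "exposure": {"gestational_age", "indication"},
    "rds": {"exposure", "gestational_age", "indication", "fetal_sex"},
    "gestational_age": set(),
    "indication": set(),
    "fetal_sex": set(),
}


def causes_of(node: str, parents: Dict[str, Set[str]],
              blocked: FrozenSet[str] = frozenset()) -> Set[str]:
    """Return all direct and indirect causes of `node`,
    without walking through any node listed in `blocked`."""
    found: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent in blocked or parent in found:
                continue
            found.add(parent)
            stack.append(parent)
    return found


def shared_causes(exposure: str, outcome: str,
                  parents: Dict[str, Set[str]]) -> Set[str]:
    """Variables that cause the exposure and also cause the outcome
    through a path that does not pass through the exposure."""
    causes_of_exposure = causes_of(exposure, parents)
    # Block the exposure so that variables affecting the outcome
    # only via the exposure are not counted as confounders.
    causes_of_outcome = causes_of(outcome, parents,
                                  blocked=frozenset({exposure}))
    return causes_of_exposure & causes_of_outcome


confounders = shared_causes("exposure", "rds", PARENTS)
```

In this toy graph, `fetal_sex` affects only the outcome and is therefore not returned as a confounder, whereas `gestational_age` and `indication` are.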

Consistency

Different regimens of ACS are possible and different intervals can be considered. One regimen needs to be chosen for the purpose of the target trial. For example: two intramuscular injections of 12 mg betamethasone (Celestone Chronodose®, betamethasone acetate (3 mg/1 mL) and phosphate (4 mg/1 mL)) with a 24h interval. The ACS-to-birth intervals taken into consideration could be: <24h, 24h-7d, 8-14d, and >14d.

Positivity

There should be patients in each predefined ACS-to-birth interval.
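As a minimal sketch of how positivity could be checked in a dataset, the snippet below assigns each patient to one of the four predefined intervals and reports any interval without patients. It assumes the interval is recorded in hours; the exact cut-offs at 7 and 14 days (both inclusive here) and the function names are assumptions made for illustration, since the text does not specify how boundary cases are handled.

```python
from collections import Counter
from typing import Dict, Iterable, List

# The four predefined ACS-to-birth intervals.
INTERVALS = ("<24h", "24h-7d", "8-14d", ">14d")


def classify_interval(hours: float) -> str:
    """Map an ACS-to-birth interval in hours to one of the four categories.
    Assumption: 7 d (168 h) and 14 d (336 h) are inclusive upper bounds."""
    if hours < 0:
        raise ValueError("interval cannot be negative")
    if hours < 24:
        return "<24h"
    if hours <= 7 * 24:
        return "24h-7d"
    if hours <= 14 * 24:
        return "8-14d"
    return ">14d"


def positivity_counts(hours_list: Iterable[float]) -> Dict[str, int]:
    """Count patients per interval, including intervals with zero patients."""
    counts = Counter(classify_interval(h) for h in hours_list)
    return {label: counts.get(label, 0) for label in INTERVALS}


def empty_intervals(hours_list: Iterable[float]) -> List[str]:
    """Intervals with no patients, i.e. where positivity is violated."""
    return [label for label, n in positivity_counts(hours_list).items()
            if n == 0]
```

For example, `empty_intervals([5, 30, 500])` flags `"8-14d"` as the only interval without patients.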

*(c) Target trial protocol*

1: __eligibility__

Eligibility criteria could be:

- All women at risk of spontaneous or iatrogenic PTB.
- Presenting at a gestational age of 23 weeks and 5 days to 33 weeks and 5 days.

Exclusion criteria: Antenatally detected fetal congenital malformations with an effect on neonatal outcome and intra-uterine fetal death at presentation.

2: __treatment strategies__

The use of the ACS-to-birth interval as an intervention is problematic and makes it impossible to establish a time zero. Birth occurs after the baseline of the study and is therefore not known at the start of the study. Immortal time bias could arise when fetal death occurs during the ACS-to-birth interval.

Another issue is the possibility of repeating ACS. A large proportion of patients diagnosed with preterm labour do not give birth immediately (and sometimes not even preterm). When experiencing a new episode of threatened PTB, a repeat course of ACS could be given. This further complicates the set-up of a target trial, since it requires the availability of confounders for each administration of ACS (which might not all be available in the data) and the use of g-methods for time-varying treatments.

3: __assignment procedures__

We cannot randomly assign patients to an interval, since we are interested in an ‘intervention’ only known post-randomisation.

4: __follow-up period__

The follow-up period of the mothers is until delivery. The follow-up of the neonates is until discharge from the neonatal unit. Criteria for discharge need to be set out. Neonates can be transferred back to the referring centre and complications might still occur there. However, a diagnosis of RDS is very unlikely to be made after discharge from the neonatal intensive care unit.

5: __outcome__

When neonatologists know that a mother has received ACS, they might be less likely to diagnose RDS. We think neonatologists focus on whether ACS were given rather than on the interval within which birth occurred. Therefore, we do not expect a problem with outcome validity for RDS. A sound, uniform definition of RDS, however, is mandatory.

6: __causal contrasts of interest__

If we were able to know the ACS-to-birth interval in advance, either by predicting when birth will occur (if only we could!) or by ensuring that a patient delivers within the interval to which she is randomly assigned (unethical), we would be able to assign patients to one of the four intervals. If, however, patients delivered earlier or later than expected, they would belong to another interval. The intention-to-treat effect would compare the outcomes of patients assigned to each interval; the per-protocol effect would compare the outcomes of patients who actually delivered in each interval.

7: __analysis plan__

In the intention-to-treat analysis, we would explore the effect of the *predicted* ACS-to-birth interval on RDS. We can also do a per-protocol analysis by looking at the effect of the ACS-to-birth interval in which the patient actually delivered on RDS. For this, we not only need to adjust for confounding at baseline, but also for post-randomisation confounding. Post-baseline prognostic factors need to be identified and adjusted for. These post-baseline prognostic factors might not always be readily available in the data, which will lead to residual confounding.
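The difference between the two groupings can be illustrated with a small sketch. The records below are invented toy data, not study results, and the function `rds_risk_by` is a hypothetical helper. Grouping by the predicted interval mirrors the intention-to-treat comparison; grouping by the interval in which birth actually occurred gives only a crude, unadjusted comparison, which would still need adjustment for baseline and post-randomisation confounding before it could be read as a per-protocol effect.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Invented toy records for illustration only:
# (predicted interval at ACS administration, actual interval at birth, RDS diagnosed)
Record = Tuple[str, str, bool]
RECORDS: List[Record] = [
    ("24h-7d", "24h-7d", False),
    ("24h-7d", "<24h", True),
    ("24h-7d", "8-14d", False),
    ("<24h", "<24h", True),
    ("<24h", "24h-7d", False),
    ("8-14d", "8-14d", True),
]

PREDICTED, ACTUAL, RDS = 0, 1, 2


def rds_risk_by(records: List[Record], key_index: int) -> Dict[str, float]:
    """Crude RDS risk per group, grouping on the chosen record field."""
    totals: Dict[str, int] = defaultdict(int)
    cases: Dict[str, int] = defaultdict(int)
    for record in records:
        group = record[key_index]
        totals[group] += 1
        cases[group] += int(record[RDS])
    return {group: cases[group] / totals[group] for group in totals}


# Intention-to-treat style grouping: by the interval predicted at baseline.
risk_by_predicted = rds_risk_by(RECORDS, PREDICTED)

# Crude grouping by the interval actually observed (unadjusted, so NOT yet
# a valid per-protocol estimate).
risk_by_actual = rds_risk_by(RECORDS, ACTUAL)
```

In these toy data, the same patients yield different risks for the 24h-7d group depending on whether they are grouped by predicted or by actual interval, which is why the two analyses answer different questions.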

**Comments**

The research question of interest cannot be formulated causally based on *observed* ACS-to-birth intervals. It is impossible to set a time zero, a time at which eligibility, assignment to intervention and start of follow-up coincide. Only associational studies and conclusions are possible, exploring an association between the different intervals and the neonatal outcome. Children born at a certain interval can have different characteristics than children born at other intervals. This was also addressed by Gates and Brocklehurst, who discussed that the evidence on the relevance of the ACS-to-birth interval is based on unsound subgroup analyses.46

Conducting multiple tests carries a risk of obtaining significant (false positive) results due to chance alone (familywise error rate).36 When subgroups are not specified before an RCT is conducted, there is a high risk that the analyses result from searching for differences between subgroups with the goal of finding any significant result. One should always remain vigilant about post-hoc and subgroup analyses, and interpret them with great caution. Subgroup effects gain credibility when the number of subgroups examined is small, the subgroups are prespecified, the difference in treatment effect between the subgroups is large, the sample sizes are large, and the observed effect is clinically plausible. Preferably, the effects are consistent and replicated in other studies.37

The outcome of the Liggins RCT was considerably biased by the post-randomisation subdivision of the study population into etiologic groups of PTB.1 By doing this, the authors introduced a high risk of compromising the exchangeability between the intervention groups within each etiologic group, which eliminated the advantages of doing an RCT. Moreover, by comparing multiple groups, they created smaller sample sizes and introduced a risk of inflated familywise error rates.36 As concerns the interval, the significant difference in RDS incidence in the 24h-7d group seems most likely to be due to chance. The groups compared were small (28 neonates in the treatment group, 24 in the control group), and the probability of a type I error was large: the p-value was .03 without correction for multiple testing, whereas a Bonferroni correction, which reduces the risk of a type I error, would have required a p-value below .0125.1 Finally, the subgroup analysis was not prespecified and concerned a post-randomisation event.

Hirji and Fagerland described four concerns regarding outcome-based subgroup analysis: reduced power, overdone post-hoc analysis, selective reporting, and overinterpretation. The subgroup analysis of the ACS-to-birth interval in the Liggins trial is prone to all of these concerns owing to, respectively, a small subset, an analysis that was not prespecified, the highlighting of significant findings, and an overt overinterpretation of the result, which is reflected in the importance the interval gained in obstetrical literature and practice.49 In general, whenever subgroups are considered in RCTs, there is a risk of post-randomisation confounding because exchangeability between the treatment and non-treatment arms may be lacking within the subgroup.50
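The arithmetic behind the multiple-testing argument can be sketched as follows. The sketch assumes four comparisons (one per ACS-to-birth interval), a nominal significance level of .05, and, for the familywise error rate only, independence between the tests; the function names are illustrative.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each performed at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05       # nominal significance level (assumed)
N_INTERVALS = 4    # <24h, 24h-7d, 8-14d, >14d
REPORTED_P = 0.03  # p-value reported for the 24h-7d subgroup

fwer = familywise_error_rate(ALPHA, N_INTERVALS)        # about 0.185
threshold = bonferroni_threshold(ALPHA, N_INTERVALS)    # 0.0125
significant_after_correction = REPORTED_P < threshold   # False
```

Under these assumptions, the chance of at least one false positive across the four intervals is roughly 18.5%, and the reported p-value of .03 does not fall below the corrected threshold of .0125.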

The cohort studies are also marked by a high risk of selection bias. Cohort follow-up did not start when ACS were administered; instead, study participants were selected based on gestational age or weight at birth. Follow-up of women who delivered at term is absent. The risk of collider-stratification bias is high and even more complex than in its pure form, since the collider is partially part of the ‘intervention’ (Figure 1).51,52 Controlling for confounders in non-randomised studies in order to explore causality requires insight into which variables are confounders. In a DAG designed on the basis of subject-matter knowledge, confounders are shared causes of exposure and outcome. Once confounders are identified, they can be used in a propensity score model, often a multivariable logistic regression model, which aims to achieve exchangeability of the treatment groups. When the principles of positivity and consistency are not violated, one can then use causal inference methods to explore whether the exposure has a (causal) effect on the outcome. Interestingly, even the design of a DAG is challenging when considering the interval. An alternative to Figure 1 is to add the interval as a node, which has implications for which variables are considered confounders. Alternatively, variables can be included in a multivariable logistic regression model to enable outcome prediction, without aiming to explore a causal relationship between the exposure/predictor and the outcome.