Relevance of the antenatal corticosteroids-to-delivery interval in the prevention of neonatal respiratory distress syndrome through the eyes of causal inference: a review and target trial

To critically analyse the literature on the antenatal corticosteroids (ACS)-to-birth interval from a causal point of view and to present a solution to the problem of bias caused by post hoc analysis. Because the ACS-to-birth interval is only known post hoc, a randomised controlled trial (RCT) of ACS versus placebo cannot examine the importance of the interval. When an RCT is not feasible, for whatever reason, a target trial can be specified and an attempt can be made to answer the causal question of interest using observational data. We attempted to specify a target trial that would make it possible to examine the causal effect of the ACS-to-birth interval on neonatal outcomes, and we analysed the current literature on the ACS-to-birth interval. The majority of studies aimed to examine the causal effect of the interval, but their study design only allowed associations between the interval and neonatal outcomes to be identified. Barriers to setting up a target trial are highlighted. Evidence on the superiority of any ACS-to-birth interval is lacking, and the question can only be addressed causally and become clinically relevant if baseline randomisation to ACS-to-birth intervals is made possible.


Background
Preterm birth (PTB), defined as delivery prior to 37 weeks' gestation, remains a challenging problem within the field of perinatology. In 1972, Liggins and Howie published a randomised controlled trial (RCT) on antenatal corticosteroids (ACS) for the prevention of respiratory distress syndrome (RDS) in preterm born infants [1]. This trial was one of the most important trials in the history of perinatology. The introduction of ACS brought about a significant drop in neonatal morbidity and mortality [2]. Based on a post hoc analysis, the authors suggested that the greatest reduction of RDS incidence was seen in the group of children exposed between 24 h and 7 days (24h-7d) before birth [1]. Crowley et al. confirmed the importance of the 24h-7d interval in a secondary analysis of a meta-analysis on ACS [3]. This presumed optimal interval influenced obstetrical management, which is reflected in, for instance, biomarker tests being designed to predict who will deliver within 7 days after presentation, or (weekly) repeat doses of ACS [4][5][6][7][8][9]. In spite of that, women often deliver at term after having received ACS during an episode of threatened PTB or even prophylactically based on risk factors [10][11][12][13]. There are concerns that these children are at risk of being small for gestational age and of developing neurodevelopmental disorders [10,14,15]. It remains to be determined, however, whether these outcomes are caused by ACS exposure or by the underlying etiology of preterm labour [14,16]. Repeat courses of ACS are also a result of the presumed optimal interval. They are associated with an improved short-term neonatal outcome, which could support the presumed optimal 24h-7d interval [8,9].

However, is this interval really the optimal interval? Is the decrease in neonatal morbidity greater when the ACS-to-birth interval is 24h-7d compared to other intervals? This is not an easy question to answer, because it is a question implying causality. To answer this type of question, one should perform an RCT. However, how can one randomise patients into an interval when the endpoint of the interval is unknown at the time of randomisation? This has resulted in a considerable amount of literature in which the ACS-to-birth interval is considered in post hoc analyses of RCTs or in observational cohort studies. This article is a reflection on the reported literature on the ACS-to-birth interval, and an attempt to design a hypothetical trial examining the influence of the interval on neonatal outcome.

Methods
(1) A literature search on the relevance of the ACS-to-birth interval was performed in the PubMed database. The PICO and search string are available in S1.
(2) A risk of bias assessment was performed on the Liggins and Howie RCT using the revised Cochrane risk-of-bias tool for randomised trials (RoB 2) [1,17]. This tool defines bias as "a systematic deviation from the effect of intervention that would have been observed in a large randomised trial without any flaws". We explored the bias of their post hoc analysis on the ACS-to-birth interval. The research question under consideration could be formulated as follows: 'Does the 24h-7d interval between ACS administration and birth cause a higher reduction in RDS compared to other ACS-to-birth intervals (< 24h, 8-14d, > 14d)?'.
(3) For observational studies on the ACS-to-birth interval, we explored whether the research question and methodology were intended to look for association or causation, whether the results section was free of interpretation, and whether discussion and conclusion were reported accordingly [18]. We analysed whether associational and causal language were used appropriately. Associational language was defined as expressions implying association without insinuating causation, for example 'is associated with', 'an association is observed', etc. Causal language was defined as expressions that carry an assumption of causation, for example 'causes', 'changes', 'significantly reduced', 'explains', 'effect', 'attributes', etc. 'Confounding' is also a concept that only exists in causal studies.
(4) Following the approach of Hernán and Robins for enabling causal inference from observational studies, an attempt was made to formulate a target trial about the effect of the ACS-to-birth interval on the incidence of neonatal outcomes (for this exercise, RDS was chosen) [19]. A target trial is the description of a hypothetical randomised controlled trial that one would design to analyse the causal effect of a treatment. Causal inference techniques can then be used in an attempt to emulate the target trial with observational data [19,20]. Hernán and Robins put a large emphasis on a well-described research question (a), described three key principles of causal inference (b), and described seven key components for adequately formulating a target trial (c) [19].

(a) Formulating the research question of the target trial
A causal research question requires an intervention [22]. If we give treatment A, what is the effect on outcome Y?

(b) Key principles of causal inference

Exchangeability
In randomised experiments, the treated and untreated are exchangeable because the randomisation process ensures that independent predictors of the outcome are equally distributed between the treated and untreated groups [21]. In an observational study, the researcher needs to define these independent predictors based on expert knowledge and ensure an equal distribution of these confounders between the treated and untreated groups by applying particular statistical techniques (matching, stratification, standardisation, or inverse probability weighting), which we will not discuss in detail. It is not possible to know whether the researchers' assumption about which variables are confounders is correct; therefore, there is always a risk of residual confounding.
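As an illustration of one of the techniques named above, the following is a minimal sketch of inverse probability weighting with a single binary confounder. The function names and the toy data are invented for illustration only and do not come from any study discussed here. In the toy data, treatment has no effect within either confounder stratum, yet the crude comparison suggests a benefit because treatment is given more often in the low-risk stratum.

```python
from collections import defaultdict


def make_rows(stratum, treated, n, events):
    """Build n toy records (stratum, treated, outcome) with the given number of events."""
    return [(stratum, treated, 1)] * events + [(stratum, treated, 0)] * (n - events)


def crude_risk_difference(rows):
    """Unadjusted risk in the treated minus unadjusted risk in the untreated."""
    risk = {}
    for arm in (0, 1):
        outcomes = [y for _, a, y in rows if a == arm]
        risk[arm] = sum(outcomes) / len(outcomes)
    return risk[1] - risk[0]


def ipw_risk_difference(rows):
    """Risk difference after weighting each record by 1 / Pr[A = a | L = l]."""
    n_stratum = defaultdict(int)
    n_stratum_arm = defaultdict(int)
    for l, a, _ in rows:
        n_stratum[l] += 1
        n_stratum_arm[(l, a)] += 1

    weighted_events = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for l, a, y in rows:
        propensity = n_stratum_arm[(l, a)] / n_stratum[l]  # estimated Pr[A = a | L = l]
        weight = 1.0 / propensity
        weighted_events[a] += weight * y
        weight_total[a] += weight
    return weighted_events[1] / weight_total[1] - weighted_events[0] / weight_total[0]


# Invented toy data: stratum 0 has a risk of 0.5 in both arms, stratum 1 a risk of 0.
toy_rows = (
    make_rows(0, 0, 8, 4) + make_rows(0, 1, 2, 1)
    + make_rows(1, 0, 2, 0) + make_rows(1, 1, 8, 0)
)
crude = crude_risk_difference(toy_rows)
weighted = ipw_risk_difference(toy_rows)
```

The weighted estimate is only unbiased if the single stratum variable captures all confounding, which is exactly the assumption that cannot be verified from the data.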

Consistency
A treatment or intervention should be well defined. When a treatment is not well defined, multiple versions of the treatment are possible. Each version can have a different causal effect on the outcome, and the average causal effect of the treatment will then depend on the proportion of individuals who received each version [21]. The treatment therefore needs to be specified in detail.

Positivity
The probability of being assigned to each level of treatment should be greater than zero [21]. Positivity is required for the confounding variables that are required for exchangeability: Pr[A = a | L = l] > 0 for all values l with Pr[L = l] ≠ 0 in the population of interest, with A the treatment under consideration and L the measured covariates considered to be confounders.
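The positivity condition can be checked empirically by tabulating treatment levels within each observed confounder stratum. The sketch below assumes a hypothetical data layout of (stratum, treatment) pairs; the stratum labels and records are invented for illustration.

```python
from collections import Counter
from itertools import product


def positivity_violations(records, treatment_levels):
    """Return (stratum, treatment) pairs with no observations.

    Only strata that occur in the data (Pr[L = l] > 0) are checked, so every
    returned pair is an empirical violation of Pr[A = a | L = l] > 0.
    """
    counts = Counter(records)
    observed_strata = {stratum for stratum, _ in counts}
    return sorted(
        (stratum, level)
        for stratum, level in product(observed_strata, treatment_levels)
        if counts[(stratum, level)] == 0
    )


# Hypothetical treatment levels and invented records.
levels = ["<24h", "24h-7d", "8-14d", ">14d"]
toy_records = [
    ("GA<28w", "<24h"), ("GA<28w", "24h-7d"), ("GA<28w", "8-14d"), ("GA<28w", ">14d"),
    ("GA>=28w", "<24h"), ("GA>=28w", "24h-7d"), ("GA>=28w", "8-14d"),
]
violations = positivity_violations(toy_records, levels)
```

An empty result does not prove positivity in the underlying population; it only shows that no empty cell was found for the measured strata.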

(c) Drafting the protocol
Seven key components of a target trial have been postulated [19]. For this exercise, we will consider RDS as the outcome.

Eligibility
As for RCTs, eligibility criteria, defining who is included into or excluded from the trial, need to be formulated.

Treatment strategies
Treatment should be assigned at the moment the eligibility criteria are met, and outcomes need to be counted from that moment onwards. These three elements (the time when eligibility criteria are met, the time when treatment is assigned, and the time when study outcomes begin to be counted) need to be synchronised (time zero). If not, the target trial emulation can fail due to the introduction of selection bias and immortal time bias. Immortal time bias can occur when treatment assignment precedes eligibility and patients die before being eligible [23].

Assignment procedures
To emulate the random assignment of strategies at baseline, we need to ensure exchangeability of the groups by adjusting for all confounders [19].

Follow-up period
The follow-up period needs to be defined.

Outcome
Independent outcome validation might be warranted in observational research [19].

Causal contrasts of interest
The two common causal contrasts are the intention-to-treat effect and the per-protocol effect. The intention-to-treat effect is the comparative effect of being assigned to a treatment strategy at baseline, regardless of whether the strategy is subsequently followed. The per-protocol effect is the comparative effect of following the treatment strategies as specified in the protocol [19].

Analysis plan
The analysis plan specifies an intention-to-treat analysis, a per-protocol analysis, or both.

Results
(1) The literature search yielded 489 articles, of which 21 were retained (S2). Literature on the ACS-to-birth interval is summarised in Table S1 [1][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43]. Two RCTs on ACS reported on the ACS-to-birth interval: the initial RCT of Liggins and Howie and the RCT of Schutte et al. [1,24]. Schutte et al. performed a study (partially RCT (ACS vs placebo), partially observational (all women who received betamethasone)) to analyse the incidence of RDS in groups of preterm born children at different time intervals after admission of the mother. Although the goal was to investigate the relevance of time factors, the randomisation was for ACS or placebo. In the subgroup analysis of the time intervals, the number of patients per interval was very small. Finally, the focus was less on the ACS-to-birth interval than on the admission-to-birth interval [24].
(2) We assessed the risk of bias in the Liggins and Howie trial using the RoB 2 tool [1,17]. In Table S2, a risk of bias assessment is available for the primary research question of the RCT (ACS vs placebo) and for the research question of interest for this article (ACS-to-birth interval). For both research questions, the risk of bias was considered to be high. The same conclusion applies to the study of Schutte et al. [24].
(3) Observational studies on the ACS-to-birth interval were prone to causal conclusions based on associational research questions and methodologies. A summary is provided in Table S3 [25][26][27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42][43]. None of the observational studies used causal inference techniques; however, 89.5% used causal language. In 52.6% of articles, multiple regression was done to account for confounding. However, when considering an association, there is no need to correct for confounders [21]. In our opinion, Peaceman et al. and Fuller et al. were the only ones who did not use any terms that implied causality [29,38].
(4) We attempted to address the following research question in a hypothetical trial: 'Does the 24h-7d interval between ACS administration and birth cause a higher reduction in RDS compared to other ACS-to-birth intervals (< 24h, 8-14d, > 14d)?'.

(a) Research question
An interval is not an intervention. It is an observation. An intervention would be to give ACS or not, or to repeat ACS or not. To transform the ACS-to-birth interval into an intervention, patients could be randomised to one of the four groups and labour could be induced so that birth occurs within one of the four intervals (changing the endpoint of the interval). Of course, inducing labour to ensure that a woman delivers preterm within a certain interval would be unethical. An alternative would be to postpone ACS administration (changing the start point of the interval). However, omitting ACS administration based on false negative predictions for birth within a certain interval would again be unethical. So, the research question cannot be answered, at least not if the actual interval, which is only known a posteriori, is used. A possible solution would be to use the predicted interval at the time of ACS administration (see the key components 'Causal contrasts of interest' and 'Analysis plan').

(b) Key principles

Exchangeability
To achieve (conditional) exchangeability between the patients in the four ACS-to-birth intervals considered, confounders need to be explored. Confounders are shared causes of treatment and outcome. A directed acyclic graph (DAG) is a way to present these confounders; an example is presented in Fig. 1.
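The definition of a confounder as a shared cause can be made concrete with a small graph search. The sketch below uses a hypothetical DAG whose nodes and arrows are invented for illustration; it is not a reproduction of Fig. 1. A node is reported as a shared cause when it has a directed path to the exposure and a directed path to the outcome that does not pass through the exposure.

```python
def ancestors(edges, target, blocked=frozenset()):
    """All nodes with a directed path to target that avoids the blocked nodes."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)

    found, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in found and parent not in blocked:
                found.add(parent)
                stack.append(parent)
    return found


def shared_causes(edges, exposure, outcome):
    """Causes of the exposure that also cause the outcome other than via the exposure."""
    return ancestors(edges, exposure) & ancestors(edges, outcome, blocked={exposure})


# Hypothetical arrows (cause, effect); assumptions for illustration only.
hypothetical_edges = [
    ("cause_of_preterm_labour", "exposure"),
    ("cause_of_preterm_labour", "rds"),
    ("gestational_age", "exposure"),
    ("gestational_age", "rds"),
    ("exposure", "rds"),
    ("fetal_sex", "rds"),  # affects the outcome only, so not a shared cause
]
confounders = shared_causes(hypothetical_edges, "exposure", "rds")
```

The result depends entirely on the arrows drawn, which is why the DAG has to be built from subject-matter knowledge rather than derived from the data.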

Consistency
Different regimens of ACS are possible and different intervals can be considered. One regimen needs to be chosen for the purpose of the target trial. For example: two intramuscular injections of 12 mg betamethasone [Celestone Chronodose ® , betamethasone acetate (3 mg/1 mL) and phosphate (4 mg/1 mL)] with a 24 h interval. The ACS-to-birth intervals taken into consideration could be: < 24h, 24h-7d, 8-14d, and > 14d.
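A well-defined exposure also requires unambiguous category boundaries. The sketch below maps an interval in hours to the four categories; the reference time (hours since the first ACS injection) and the handling of the boundaries at exactly 24 h, 7 days, and 14 days are assumptions made here for illustration, since the text does not specify them.

```python
def classify_interval(hours_since_first_dose):
    """Map an ACS-to-birth interval in hours to one of four categories.

    Assumed boundaries: exactly 24 h and exactly 7 days fall in '24h-7d';
    anything later, up to and including 14 days, falls in '8-14d'.
    """
    if hours_since_first_dose < 0:
        raise ValueError("birth cannot precede ACS administration")
    if hours_since_first_dose < 24:
        return "<24h"
    if hours_since_first_dose <= 7 * 24:
        return "24h-7d"
    if hours_since_first_dose <= 14 * 24:
        return "8-14d"
    return ">14d"


# Example classification of a few invented intervals.
examples = {hours: classify_interval(hours) for hours in (6, 24, 168, 169, 336, 400)}
```

A different choice of boundaries or reference time would define a different exposure, which is the point of the consistency principle.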

Positivity
There should be patients in each predefined ACS-to-birth interval.

(c) Target trial protocol

Eligibility
Eligibility criteria could be:
(1) All women at risk of spontaneous or iatrogenic PTB.
(2) Presenting at a gestational age of 23 weeks and 5 days to 33 weeks and 5 days.
Exclusion criteria: antenatally detected fetal congenital malformations with an effect on neonatal outcome, and intrauterine fetal death at presentation.

Treatment strategies
The use of the ACS-to-birth interval as intervention is problematic and makes it impossible to establish a time zero. Birth falls after the baseline of the study and therefore is not known at the start of the study. Immortal time bias could arise when fetal death occurs during the ACS-to-birth interval.
Another issue is the possibility of repeating ACS. A large proportion of patients with the diagnosis of preterm labour does not give birth immediately (sometimes not even preterm). When experiencing a new episode of threatened PTB, a repeat course of ACS could be given. This further complicates the set-up of a target trial since this requires the availability of confounders for each administration of ACS (which might not all be available in the data) and the use of g-methods for time-varying treatments.

Assignment procedures
We cannot randomly assign patients to an interval, since we are interested in an 'intervention' only known post-randomisation.

Follow-up period
The follow-up period of the mothers is until delivery. The follow-up of the neonates is until discharge from the neonatal unit. Criteria for discharge need to be set out. Neonates can be transferred back to the referring centre and complications might still occur there. However, a diagnosis of RDS is very unlikely to be made after discharge from the neonatal intensive care unit.

Outcome
When the neonatologist knows that a mother has received ACS, they might be less likely to diagnose RDS. We think neonatologists focus on whether ACS were given rather than on the interval within which birth occurred. Therefore, we do not expect a problem with outcome validity for RDS. A sound, uniform definition of RDS, however, is mandatory.

Causal contrasts of interest
If we were able to know the ACS-to-birth interval [by being able to predict when birth will occur (if only we could!) or by ensuring that a patient delivers within the interval she is randomly assigned to (unethical)], we would be able to assign patients to one of the four intervals. If, however, the patients delivered earlier or later than expected, they would belong to another interval. The intention-to-treat effect would compare the outcome of patients assigned to a certain interval; the per-protocol effect would compare the outcome of patients who delivered in a certain interval.

Analysis plan
In the intention-to-treat analysis, we would explore the effect of the predicted ACS-to-birth interval on RDS. We can also do a per-protocol analysis by looking at the effect of the ACS-to-birth interval in which the patient actually delivered on RDS. For this, we not only need to adjust for confounding at baseline, but also for post-randomisation confounding. Post-baseline prognostic factors need to be identified and adjusted for. These post-baseline prognostic factors might not always be readily available in the data, which will lead to residual confounding.
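The difference between the two groupings can be sketched as follows. The records are invented toy data, and the field names `predicted`, `actual`, and `rds` are hypothetical. Grouping by the predicted interval mirrors the intention-to-treat contrast; grouping by the actual interval only gives observed risks, which are not a valid per-protocol estimate without the adjustment for post-baseline confounding described above.

```python
from collections import defaultdict


def risk_by(records, key):
    """Proportion of neonates with RDS within each level of records[key]."""
    events = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        events[record[key]] += record["rds"]
    return {group: events[group] / totals[group] for group in totals}


# Invented toy records: predicted interval at ACS administration, actual
# interval observed at birth, and RDS (1 = yes, 0 = no).
toy_records = [
    {"predicted": "24h-7d", "actual": "24h-7d", "rds": 0},
    {"predicted": "24h-7d", "actual": ">14d", "rds": 0},
    {"predicted": "24h-7d", "actual": "<24h", "rds": 1},
    {"predicted": "<24h", "actual": "<24h", "rds": 1},
    {"predicted": "<24h", "actual": "24h-7d", "rds": 0},
    {"predicted": "<24h", "actual": "<24h", "rds": 0},
]

risk_by_predicted = risk_by(toy_records, "predicted")  # intention-to-treat grouping
risk_by_actual = risk_by(toy_records, "actual")        # unadjusted observed grouping
```

In this toy example the two predicted groups have the same RDS risk, while the observed groups differ, illustrating how patients who move between intervals after baseline can create a contrast that the assignment itself does not show.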

Comments
The research question of interest cannot be formulated causally based on observed ACS-to-birth intervals. It is impossible to set a time zero, a time at which eligibility, assignment to intervention, and start of follow-up coincide. Only associational studies and conclusions are possible, exploring an association between the different intervals and the neonatal outcome. Children born at a certain interval can have different characteristics than children born at other intervals. This was also addressed by Gates and Brocklehurst, who discussed that the evidence on the relevance of the ACS-to-birth interval is based on unsound subgroup analyses [44].
Conducting multiple tests is associated with a risk of obtaining significant (false positive) results due to chance alone (familywise error rate) [45]. When subgroups are not specified before doing an RCT, there is a high risk that the analyses result from searching for differences between subgroups with the goal of finding any significant result. One should always remain vigilant about post hoc and subgroup analyses, and interpret them with great caution. Subgroup effects gain credibility when the number of subgroups examined is small, the subgroups are prespecified, the difference in treatment effect between the subgroups is large, the sample sizes are large, and the observed effect is clinically plausible. Preferably, the effects are consistent and replicated in other studies [46]. The outcome of the Liggins RCT was considerably biased due to post-randomisation subdivision of the study population into etiologic groups of PTB [1]. By doing this, the authors introduced a high risk of hampering the exchangeability between the intervention groups per etiologic group, which eliminated the advantages of doing an RCT. Moreover, by comparing multiple groups, they created smaller sample sizes and introduced a risk of inflated familywise error rates [34]. As concerns the interval, the significant difference in RDS incidence in the 24h-7d group seems most likely to be due to chance. The groups compared were small (28 neonates in the treatment group, 24 in the control group), and the probability of a type I error was large (p value of 0.03 without correction for multiple testing; with Bonferroni correction, the significance threshold would have been 0.0125) [1]. Finally, the subgroup analysis was not prespecified and concerned a post-randomisation event. Hirji and Fagerland described four concerns regarding outcome-based subgroup analysis: reduced power, overdone post hoc analysis, selective reporting, and overinterpretation. The subgroup analysis of the ACS-to-birth interval in the Liggins trial is prone to all these concerns due to, respectively, a small subset, an analysis that was not prespecified, highlighting of significant findings, and an overt overinterpretation of the result of the subgroup analysis, which is reflected in the importance the interval gained in obstetrical literature and practice [47]. In general, whenever subgroups are considered in RCTs, there is a risk of post-randomisation confounding due to a possible lack of exchangeability between the treatment and non-treatment arm within the subgroup [48].
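The multiple-testing arithmetic referred to above can be written out as a short sketch. It assumes a conventional significance level of 0.05 and four comparisons (one per ACS-to-birth interval), which is the combination that yields the threshold of 0.0125; the familywise error rate formula additionally assumes independent tests, which is a simplification.

```python
alpha = 0.05          # assumed conventional significance level
n_comparisons = 4     # assumed: one comparison per ACS-to-birth interval
p_observed = 0.03     # p value reported for the 24h-7d subgroup

# Bonferroni correction: divide the significance level by the number of tests.
bonferroni_threshold = alpha / n_comparisons

# Probability of at least one false positive across the tests when no
# correction is applied, assuming the tests are independent.
familywise_error_rate = 1 - (1 - alpha) ** n_comparisons

significant_uncorrected = p_observed < alpha
significant_after_bonferroni = p_observed < bonferroni_threshold
```

Under these assumptions the observed p value falls below 0.05 but not below the corrected threshold, and the uncorrected familywise error rate is roughly 0.19 rather than 0.05.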
The cohort studies are also marked by a high risk of selection bias. Cohort follow-up did not start when ACS were administered; instead, study participants were selected based on gestational age or weight at birth. Follow-up of women who delivered at term is absent. The risk of collider-stratification bias is high and even more complex than its pure form, since the collider is partially part of the 'intervention' (Fig. 1) [49,50]. Controlling for confounders in non-randomised studies in order to explore causality requires insight into which variables are confounders. In a DAG designed on the basis of subject-matter knowledge, confounders are shared causes of exposure and outcome. Once confounders are identified, they can be used in a propensity score model, often a multivariable logistic regression model, with the aim of achieving exchangeability of the treatment groups. When the principles of positivity and consistency are not violated, causal inference methods can then be used to explore whether the exposure has a (causal) effect on the outcome. Interestingly, even the design of a DAG is challenging when considering the interval. An alternative to Fig. 1 is adding the interval as a node, which has implications for which variables are considered to be confounders. Alternatively, variables can also be included in a multivariable logistic regression model to enable outcome prediction, without aiming to explore a causal relationship between the exposure/predictor and the outcome.
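Collider-stratification bias in its pure form can be shown with an exact calculation on a deliberately simplified model, which is an assumption for illustration and not the structure of Fig. 1. Two independent binary causes, an exposure X and another factor U, each occur with probability 1/2, and the selection variable (the collider) is present when either cause is present. Restricting to selected individuals induces an association between X and U that does not exist in the full population.

```python
from fractions import Fraction
from itertools import product


def prob_u_given_x(x_value, select_on_collider):
    """Exact Pr[U = 1 | X = x_value], optionally restricted to collider = 1."""
    numerator = Fraction(0)
    denominator = Fraction(0)
    for x, u in product((0, 1), repeat=2):
        probability = Fraction(1, 4)  # X and U independent, each with Pr = 1/2
        collider = x or u             # selected if either cause is present
        if x != x_value:
            continue
        if select_on_collider and not collider:
            continue
        denominator += probability
        numerator += probability * u
    return numerator / denominator


# Full population: U is equally likely whatever the value of X.
full_population = (prob_u_given_x(0, False), prob_u_given_x(1, False))

# Selected population: U now depends on X, although X does not cause U.
selected_only = (prob_u_given_x(0, True), prob_u_given_x(1, True))
```

In cohort studies that select participants on gestational age or weight at birth, the selection variable is affected by both the exposure and other causes of the outcome, so an analogous distortion is possible, with the added complication that the collider overlaps with the 'intervention' itself.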

Conclusions
In our opinion, it is not known whether the ACS-to-birth interval influences the outcomes of preterm born children. An RCT comparing ACS with placebo, with a sufficiently large sample size and a prespecified subgroup analysis of the interval that corrects for multiple testing, could shed more light on the importance of the interval. However, due to the widespread use of ACS, and for financial and ethical reasons, such a trial is unlikely to happen. Moreover, the utility of the ACS-to-birth interval in daily practice is questionable, since the date of birth is not known at the moment of ACS administration.
The main problem with the ACS-to-birth interval is that its endpoint, birth, is a future event whose time of occurrence is unknown at the moment the patient presents with threatened preterm labour. This makes it impossible to translate the interval into an intervention, which would be needed to formulate a well-defined research question and select an appropriate study design. A 'simple' RCT or cohort study is not able to tackle this problem. More advanced techniques are mandatory, if it is at all possible to disentangle this web. If we had an accurate prediction tool for time of delivery after ACS administration, patients could be randomised to predicted ACS-to-birth intervals. While awaiting such a tool, an intermediate approach could be to investigate the risk of (for example) RDS according to the probability of birth at a certain time point, starting from the time of ACS administration. Another possibility would be to work with dynamic treatment strategies. Observational data and causal inference techniques can be used for this approach, and risk of bias can be assessed with the Risk Of Bias In Non-randomised Studies of Interventions tool [51]. We plan to do this "disentangling" analysis and invite research groups with data concerning the ACS-to-birth interval to join us.