The impact of unmeasured confounding in observational studies: a plasmode simulation study of targeted maximum likelihood estimation

Unmeasured confounding can cause considerable problems in observational studies and may threaten the validity of the estimates of causal treatment effects. There has been discussion on the amount of bias in treatment effect estimates that can occur due to unmeasured confounding. We investigate the robustness of a relatively new causal inference technique, targeted maximum likelihood estimation (TMLE), in terms of its robustness to the impact of unmeasured confounders. We benchmark TMLE’s performance with the inverse probability of treatment weighting (IPW) method. We utilize a plasmode-like simulation based on variables and parameters from the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT). We evaluated the accuracy and precision of the estimated treatment effects. Though TMLE performed better in most of the scenarios considered, our simulation study results suggest that both methods performed reasonably well in estimating the marginal odds ratio, in the presence of unmeasured confounding. Nonetheless, the only remedy to unobserved confounding is controlling for as many as available covariates in an observational study, because not even TMLE can provide safeguard against bias from unmeasured confounders.


Background
Observational studies of health care interventions and treatments are critical for assessing treatment effects and making patient-centred treatment decisions. Apart from the infeasibility of conducting randomized experiments due to the cost and ethical reasons, observational studies, unlike randomized trials, answer questions of safety and comparative effectiveness that are of direct interest to health care providers and patients faced with a treatment decision [1]. In some randomized trials, patients are exposed to treatments without fully understanding their benefits and risks [2].
The increasingly available observational data sources, including large health care databases, provide an opportunity to efficiently evaluate new treatments and compare with existing alternatives. Like every observational study, these evaluations require confounding control or adjustment. Confounders are referred to as variables or factors associated with both treatment assignment and outcome. While randomized experiments, via the randomization mechanism, balance both measured and unmeasured confounders between treatment groups, observational studies usually suffer from confounding effects. When substantial imbalances exist between the treated and control groups in an observational study, biased treatment effect estimates are produced.
Various techniques, including multivariable regression adjustment, propensity score matching, stratification and weighting, have been commonly used to account for observed or measured confounders [3]. However, these techniques are not entirely immune to the effects of unmeasured confounding, which is not uncommon in clinical and medical researches; hence it is still a vexing problem. For instance, a large cohort study found evidence of the effectiveness of ascorbic acid (vitamin C) in reducing all-cause mortality by 52% (relative risk: 0.48; 95% CI: 0.33 -0.70) [4], after adjusting for age, sex, body mass index, smoking status, systolic blood pressure, serum cholesterol, diabetes, and vitamin C supplement use. This effect was, however, not found in randomized trials (relative risk: 1.00; 95% CI: 0.94 -1.06, for vitamin C supplementation versus placebo) [5]; hence it was disputed whether the disparity of the results was due to the observational study being biased due to unmeasured confounding by, say, socioeconomic status or dietary habits [6][7][8].
Though several strategies, including sensitivity analysis, have been proposed and adopted to address the potential impact of unobserved confounders on causal inference estimation, they are not model estimation methods and do not have inbuilt facilities of determining the robustness to the impact of uncontrolled confounders. Sensitivity analysis is a systematic assessment of the changes in model estimation when the potential unobserved confounders are present versus when they are absent. This technique has been widely and increasingly utilized [9][10][11].
Despite extensive studies of the performance of the methods for confounding control, the robustness of these methods to uncontrolled confounders has received almost no attention. Much recently, targeted maximum likelihood estimation (TMLE) -a doubly-robust semiparametric method, has gained the attention of applied researchers [12][13][14][15]. TMLE estimates exposure effects or associations without relying on model specification. It combines semiparametric estimation, using machine learning algorithms, with an additional estimation process to optimize a parameter of interest (e.g. risk difference, risk ratio, and odds ratio) [16,17]. We sought to examine the performance of TMLE in terms of its robustness to the impact of unmeasured confounders on estimated treatment or exposure effects.
Using the inverse probability of treatment weighting (IPW) method, the most commonly used weighting method by applied researchers and practitioners, especially in the medical and health sciences, as a benchmark, we evaluated TMLE in a plasmode-based simulation study. Our plasmode simulation was inspired by a real empirical cohort study that is typical of health care interventions and treatments. The plasmode approach preserves the number and type of observed covariates in the original study, as well as the correlation structure among covariates, and attempts to simulate exposure and outcome to reflect the observed associations of these variables with those from the real data [18].

Methods
Assume W to be a vector of covariates, and T, Y denote the exposure (or treatment) indicator and observed outcome, respectively. Let Y (T = t) = Y (t) indicate an individual 's potential outcome under treatment T=t. A valid causal inference using IPW or TMLE requires some necessary assumptions. First, the time ordering assumption is required: the covariates W precede treatment T, T precedes Y in time, and Y depends on T and W, while T depends only on W. Secondly is the consistency assumption, which requires that an individual's potential outcome under the actually received treatment equals the observed outcome; Y (T = t) = (Y | T = t). The third is the positivity assumption, which requires that (within strata of W), every individual had a non-zero probability of receiving one of the two treatment conditions, i.e., 0 < P (T = 1|W) < 1. Finally, the assumption of no unmeasured confounding (also known as the conditional exchangeability assumption) is of ultimate interest in this study. This assumption states that the potential outcomes Y (t) are independent of the observed treatment T given measured covariates; Y (T = t) ⊥ T | W. It implies that (within the strata of W), the outcome under the potential treatment T = 1, i.e, P (Y (1) = 1 | T = 1, W) equals the outcome under treatment condition T = 0, i.e, P (Y (1) = 1 | T = 0, W). In other words, all confounders have been measured.

Targeted maximum likelihood estimation
We briefly describe the implementation of TMLE with binary outcome and exposure or treatment for the estimation of the marginal odds ratio. Two probability models are initially specified and estimated based on the observed data on individuals i = 1, 2, …, n. First, the initial conditional probability of outcome Y, given the treatment and covariates, Q 0 (T, W) = 0 ( | , ) is estimated. Second, the conditional distribution of the treatment, given covariates W, The estimate Q 0 ( , ) and the predictions Q 0 (1, ) and Q 0 (0, ) can be estimated with the standard logistic regression; machine learning tools like the Super Learner may also be used. Super Learner is an ensemble learner of a pre-specified library of algorithms with parameters. It uses cross-validation to adaptively create an optimally weighted combination of estimates from candidate algorithms [19].
The estimates Q 0 (1, ), and Q 0 (0, ) are then plugged-in into the substitution estimator of the parameter of interest, log odds ratio, to obtain an untargeted estimate: Initial estimates of Q 0 (T, W) are then updated along a path of some fluctuation parameters, incorporating additional information from the propensity score function to reduce residual confounding in Q 0 (T, W). This updating involves two steps: Firstly, , is used in a clever covariate * (T, W) to define a parametric working model to fluctuate Q 0 (T, W). * (T, W) = ( For each individual with =1 and =0, the clever covariates are calculated as * (1, ) =

(| )
and , respectively. In addition to adding the columns * (1, ) and * (0, ), these values are then combined to form a column * ( , ) in the data matrix.
In the final step, the fluctuation parameter ℇ is estimated by fitting an intercept-free logistic regression of Y on * (T, W) with the logit of Q 0 (T, W) being an offset (fixed quantity), where ℇ is the resulting coefficient of the clever covariate * (T, W). Accordingly, the estimate Q 0 is updated into a new estimate Q 1 of Q 1 (T, W): Consequently, the predictions from the updated model are used to estimate the probabilities of the counterfactual outcomes: The updated estimates Q 1 (1, ) and Q 1 (0, ) are empirically averaged and used to compute the targeted estimator the marginal odds ratio: The standard error can be estimated with the efficient influence function [20], and the Wald-type confidence intervals can be calculated correspondingly.

Inverse probability of treatment weighting
Marginal treatment effects can be estimated in the presence of confounding with the IPW estimator. IPW reduces confounding by correcting each ith individual's contribution by weight equal to the inverse of the estimated probability of exposure or treatment receipt [21]. This probability is usually estimated by logistic regression. By applying IP weighting, a pseudopopulation, where the distribution of covariates is comparable across treatment groups, is created [22]. Therefore, by contrasting the marginal outcome between the two treatment conditions in the created pseudo-population, via a weighted unadjusted logistic regression for the outcome, an unbiased estimate of the marginal treatment effect is produced. We used a robust sandwich-type variance estimator [22] to estimate the standard error of the IPW estimator.

Decription of simulated datasets
We adopted a plasmode-like simulation design to emulate real-world datasets with complex correlation and effect structures. This approach allows us to specify different prediction models for the exposure and outcome variables based on the covariance matrix and original estimates of the regression coefficients. We based our simulations on a previously published cohort study on the effectiveness of right heart catheterization (RHC) for critically ill patients [23,24]. The study, known as the SUPPORT study, was a five-centre study of decision making and outcomes in hospitalized, severely sick, hospitalized adult patients. Amongst other things, we chose this cohort due to its relatively high number of confounder variables, which can thus provide a challenging dataset for selecting potential confounders. The parameter values for intercepts α 0 and β 0 were unaltered to preserve the treatment and outcome prevalences in the real data.
To study the impact of unmeasured confounding, we assumed the model that adjusts for the full covariate set, to contain the true treatment effect. We then partition the covariate space into two subsets of potential confounders that were used to assess the robustness of TMLE to the impact of not adjusting for the full covariate set. In the first set (W1), also known as the simple set, we adjusted for predefined important confounders, including demographic characteristics. These covariates include sex, age, race, primary disease category, type of insurance, and APACHE score.
In the second set, known as the moderate set (W2), we included variables from the simple set (W1) and combined literature knowledge with variables associated with RHC use from the original analysis [24]. These covariates include cancer status, support model estimate of surviving two months, comorbidities, mean blood pressure, white blood cell count, albumin, sodium, pH levels, do-not-resuscitate DNR status on day 1, heart rate, respiratory rate, and temperature. In each of the two covariate sets, other measured confounders in the full set were intentionally omitted.
Apart from the degree of unmeasured confounding (the two covariate sets size), other factors we varied were sample size (n = 1000, 5000) and confounding weight (1, 1.15, 1.3, 1.45). By confounding weight, we mean the multipliers that were multiplied to the estimated covariate coefficients to increase the amount of confounding in the simulated data. In total, we have 2 sample sizes x 4 confounding weights x 2 covariate sets = 16 different simulation scenarios. We then simulated two potential outcomes: one assuming that the patient received the RHC treatment Y (1) and one assuming that the patient did not receive the RHC treatment Y(0). For each of the 16 scenarios, we generated 1000 simulated datasets by repeating this process 1000 times, and we averaged two sets of potential outcomes and calculated the true marginal odds ratio.

Analyses of simulated datasets
After simulation of the datasets, we investigated the impact of unmeasured confounding on TMLE, relative to IPW, in terms of accuracy and precision of treatment effect estimates. For each of the simulation scenarios, we applied each of the TMLE and IPW methods of treatment effect estimation. For TMLE, due to computational simplicity, we used generalized linear models (GLMs) to estimate both the PS and outcome models. For both methods, the covariates were included as main terms in both the PS and outcome model. Interactions between covariates were not considered.
For each method, we estimated the marginal odds ratio and computed the bias, mean squared error (MSE), model-based standard errors, and 95% confidence interval (CI) coverage of the estimated treatment effects. CI coverage is the proportion of times the estimated confidence intervals contain the specified parameter value. TMLE was implemented using the tmle package [27] in R version 4.0.0.

Results
We present the simulation results based on the scenarios and performance metrics explained in the earlier section. We focus on the performance of the TMLE method, using IPW as a threshold for evaluating the results. For both estimation approaches, except for the 95% CI coverage, there were no substantial differences in the bias, MSE, and standard error of estimated treatment effects, given the small and moderate covariate set adjustments. In all considered scenarios, it was observed that the MSE increased as the confounding weight increases, irrespective of the sample size. The same pattern was observed for the standard error and 95% CI coverage. For the bias, an opposite pattern was observed. The absolute bias was consistently decreasing with increasing confounding weights.
However, this pattern was clear and distinct for the large sample size (n = 5000) only.
In Figure 1, the results of the absolute bias of estimated treatment effects are presented for the different scenarios explained in the earlier section. The bias is generally low in the moderate-sized sample size (values ranged from 0.013 -0.240) and is generally high given high sample size (values ranged from 0.119 -0.158). For moderate sample size (n = 1000), TMLE produced slightly lower bias only when the confounding strength was higher than 1.15. However, no notable difference was observed given a large sample size (n = 1000).
TMLE maintained lower MSE between the two methods, for moderate sample (n = 1000), with the difference in MSE being increasingly larger as the confounding weight increased (Figure 2).
However, indistinguishable estimates were observed for large sample size (n = 5000).
In Figure 3 In terms of the 95% CI coverage, there was no clear pattern of superiority between the two methods; both techniques generally achieved reasonably high coverages overall (Figure 4). For moderate sample size (n = 1000), both methods produced 95% CI above the nominal coverage rate, while for large sample size (n = 5000), higher coverage rates were achieved at confounding weights higher than 1.15. An excellent causal inference in observational studies is as good as the controlled, observed confounders. Accordingly, an excellent observational study strives to control for a rich set of covariates -as many as available, so as to minimize the risk of unmeasured confounding. In this study, we investigate a relatively new method of estimating causal treatment effects, TMLE, in terms of its robustness to the impact of unmeasured confounding. Using a plasmode-like simulation study, we compared its performance with IPW, a relatively popular and commonly used method in applied studies. While focusing on TMLE, we summarize our findings and place them in the context of existing literature, where necessary.  We used parametric models for the initial treatment and outcome models estimation due to computational simplicity. However, to take optimal advantage of the properties of TMLE, dataadaptive methods, such as Super Learner [19], are recommended when implementing TMLE. The potential integration of data-adaptive methods while retaining valid inference is a unique strength of TMLE.
Our study is not devoid of limitations. Although we covered a number of scenarios, our simulated data were based on one real cohort, exemplifying only one type of complex data structure from the data. Even a Monte Carlo simulation study can never capture the whole space of possible data scenarios as they occur in reality, the generalizability of results is a potential limitation in our plasmode simulation study. Assignment of a broad range of covariate distributions or parameter values is not usually possible; the covariates parameter values are instead adopted directly from a real-world setting and fixed for simulation. Therefore, our results cannot necessarily be generalized to settings that have not been evaluated; hence we cannot draw definitive conclusions on the comparative performance of the TMLE and IPW estimators. However, this specific data setting is typical of medical and health studies and has already provided useful insights and evaluation regarding the performance of TMLE in the presence of (multiple) unmeasured confounders. Additionally, we did not explore other outcome types, including continuous and time-to-event outcomes, in the context of this study. We look forward to filling the gaps left by these limitations.
In conclusion, unmeasured confounding is still a vexing problem in observational studies. Not even TMLE can provide safeguard against bias from unmeasured confounders. Therefore, more efforts should be devoted to collecting and controlling for as many as available covariates in an observational study. If data on any confounder is available, it should be controlled for in the estimation of effects. For instance, in a study of the relationship between plasma ascorbic acid level and mortality, it was noted that, despite being recorded, information on the social class and physical activity of the participants was not used in the analysis [4]. Lawlor and colleagues [7] showed that if these variables were indeed confounders of the ascorbic acid and mortality association, even a moderate effect of any of them would have resulted in considerable residual confounding in the reported estimates. We further support the recommendation that sensitivity analysis of (multiple) unmeasured confounders should routinely perform to guide the discussion on the possible direction, magnitude, and its impact [9,[28][29][30][31].

List of abbreviations
HIV: TMLE: targeted maximum likelihood estimation; IPW: inverse probability of treatment weighting

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
The dataset used in this study is available from the corresponding author upon reasonable request.