3.2. Outcome-wide longitudinal causal framework
The chi-squared test is useful for establishing whether there is an association between teetotal status and social outcomes. However, an association need not be causal: people who do not socialise may have less opportunity to consume alcohol (reverse causality), or other factors, such as health status, may affect both social relationships and alcohol consumption (omitted variable bias/confounding).
To address these issues of reverse causality and confounding, we use the outcome-wide longitudinal framework specified by VanderWeele et al. (2020). This framework advocates including a large number of covariates so that the estimated treatment effect is as free from omitted variable bias as possible.
The framework also advocates exploiting the temporal ordering of covariates, exposure and outcome that longitudinal data make possible. Exposure data should come from a period that precedes the outcome. This ordering is necessary for the suggested causal relationship to be plausible: if exposure and outcome are measured at the same time, it is impossible to untangle cause and effect.
Similarly, covariate data should come from a period that precedes the exposure. This reduces the risk of accidentally controlling for a mediator of the relationship between exposure and outcome, which would bias the estimated treatment effect.
The framework further suggests that, depending on data availability, pre-exposure levels of the outcome should be controlled for. Controlling for the outcome at baseline can help mitigate (but not fully rule out) reverse causation: in this instance, it allows us to look at the effect of changes in drinking behaviour on subsequent socialising, conditional on previous socialising. If drinkers turn out to be more social, this helps rule out the explanation that it is simply because more sociable people are more likely to drink.
The framework also advocates controlling for prior (pre-baseline) exposure, to reduce the risk of both reverse causality and confounding. Figure 3 illustrates how controlling for prior exposure reduces the risk that uncontrolled confounding is the sole driver of the relationship. Once a prior measure of the exposure is included, a set of unobserved confounders could explain away the entire relationship only if it were associated with both the outcome and the main exposure, independently of its relationship with the prior exposure (VanderWeele et al., 2020). As shown in Fig. 3, taken from VanderWeele et al. (2020), the relationships between U (unmeasured confounders) and the prior exposure (\(A_{prior}\)), and between U and the final outcome (\(Y_k\)), would both have to be present and substantial.
Following the guidance of the VanderWeele framework, we estimate logit regression models for multiple outcome variables, using outcome data from wave 9, exposure data from wave 7, covariate data from wave 6, prior outcome data from wave 6, and prior exposure data from wave 5. We were only able to include prior outcome data for our main outcome variable of interest, seeing friends socially, as this was the only outcome variable for which data were collected before wave 9.
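One way to keep this wave structure explicit, and to check that it respects the ordering rules of the framework, is to encode it as a small configuration object. The Python sketch below is a hypothetical illustration: the class `WaveDesign` and its field names are our own labels rather than variables from the survey data, and the checks cover only the ordering rules stated in this section (exposure before outcome; covariates, prior outcome and prior exposure before exposure).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaveDesign:
    """Hypothetical record of which survey wave supplies each variable role."""
    outcome: int         # final outcome, e.g. seeing friends socially
    exposure: int        # main alcohol exposure
    covariates: int      # baseline covariates
    prior_outcome: int   # baseline (pre-exposure) value of the outcome
    prior_exposure: int  # earlier value of the exposure

    def ordering_problems(self) -> list[str]:
        """Return one message per temporal-ordering rule that the design violates."""
        rules = [
            (self.exposure < self.outcome, "exposure must precede the outcome"),
            (self.covariates < self.exposure, "covariates must precede the exposure"),
            (self.prior_outcome < self.exposure, "prior outcome must precede the exposure"),
            (self.prior_exposure < self.exposure, "prior exposure must precede the exposure"),
        ]
        return [message for ok, message in rules if not ok]


# The waves described above: outcome 9, exposure 7,
# covariates and prior outcome 6, prior exposure 5.
design = WaveDesign(outcome=9, exposure=7, covariates=6, prior_outcome=6, prior_exposure=5)
print(design.ordering_problems())  # []

# Measuring exposure and outcome in the same wave is flagged.
print(WaveDesign(7, 7, 6, 6, 5).ordering_problems())  # ['exposure must precede the outcome']
```

Writing the design down this way makes a later change of waves (for example, moving the covariates to a later wave) fail loudly instead of silently introducing a possible mediator.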
3.2.1. E-values
As a form of sensitivity analysis, we calculate E-values for our point estimates and confidence intervals; these can be viewed as a measure of the robustness of associations to uncontrolled confounding (VanderWeele et al., 2020). VanderWeele and Ding (2017) define the E-value as the ‘minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates.’ An advantage of E-values is that, unlike other forms of sensitivity analysis, they require minimal assumptions (VanderWeele and Ding, 2017).
For example, if the E-value for an estimated relative risk between an exposure and an outcome, after controlling for observed confounders, were 2, an unmeasured confounder would have to be associated with both the treatment and the outcome by a risk ratio of at least 2 each, conditional on the measured covariates, to explain away the estimated relationship (VanderWeele and Ding, 2017). The lowest possible E-value is 1, which means that no unmeasured confounding is required to explain away the exposure-outcome relationship.
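The example can be checked against the bounding factor of Ding and VanderWeele (2016). If an unmeasured confounder U is associated with the exposure by a risk ratio \(RR_{EU}\) and with the outcome by a risk ratio \(RR_{UD}\) (symbols introduced here for illustration), it can fully explain away an observed risk ratio no larger than the factor \(B\) below. With both associations equal to 2, \(B = 4/3\), and 4/3 is exactly the observed risk ratio whose E-value is 2:

\[ B = \frac{RR_{EU} \times RR_{UD}}{RR_{EU} + RR_{UD} - 1} = \frac{2 \times 2}{2 + 2 - 1} = \frac{4}{3} \]

\[ \text{E-value}\left(\tfrac{4}{3}\right) = \tfrac{4}{3} + \sqrt{\tfrac{4}{3} \times \left(\tfrac{4}{3} - 1\right)} = \tfrac{4}{3} + \sqrt{\tfrac{4}{9}} = \tfrac{4}{3} + \tfrac{2}{3} = 2 \]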
E-values depend on the magnitude of the association between exposure and outcome, so, unlike p-values, which can be made arbitrarily small by increasing the sample size, the E-value for a point estimate cannot be inflated simply by collecting more data. Precision still matters, however. A more precise estimate yields a narrower 95% confidence interval whose limit closest to the null lies further from 1, so the E-value for the confidence interval will be larger, although it can never exceed the E-value for the point estimate.
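This behaviour can be illustrated numerically. The Python sketch below holds a hypothetical odds ratio of 1.8 fixed (treated as approximating a risk ratio, i.e. assuming a rare outcome) and shrinks its standard error on the log scale, as a larger sample would. The helper names `evalue_rr` and `evalue_ci` and all numbers are illustrative assumptions, not results from this study.

```python
import math


def evalue_rr(ratio: float) -> float:
    """E-value for a risk ratio (or rare-outcome odds ratio); protective ratios are inverted."""
    if ratio < 1:
        ratio = 1 / ratio
    return ratio + math.sqrt(ratio * (ratio - 1))


def evalue_ci(point: float, lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null; 1 if the interval contains 1."""
    if point == 1 or lower <= 1 <= upper:
        return 1.0
    return evalue_rr(lower if point > 1 else upper)


point = 1.8  # hypothetical odds ratio, held fixed
for se in (0.25, 0.10, 0.05):  # SE of log(OR): larger samples give smaller SEs
    lower = math.exp(math.log(point) - 1.96 * se)
    upper = math.exp(math.log(point) + 1.96 * se)
    print(f"SE={se:.2f}  95% CI=({lower:.2f}, {upper:.2f})  "
          f"E-value point={evalue_rr(point):.2f}  E-value CI={evalue_ci(point, lower, upper):.2f}")
```

With these inputs the point-estimate E-value stays at 3.00 throughout, while the E-value for the confidence interval rises as the interval narrows, approaching but never exceeding 3.00.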
E-values do not explicitly account for the number of covariates used, but E-value estimates can be interpreted as more robust when more potential confounders are controlled for, as fewer unmeasured confounders then remain that could explain away the relationship (VanderWeele and Ding, 2017).
We use the E-value formulas for risk ratios, which also apply to odds ratios (OR) when the binary outcome is rare (< 15%) (VanderWeele and Ding, 2017).
E-value equations if OR < 1:
Point estimate:
E-value = \(OR^{*} + \sqrt{OR^{*} \times (OR^{*} - 1)}\) (1)
CI:
If \(UL \geq 1\), then E-value = 1 (2)
If \(UL < 1\), then E-value = \(UL^{*} + \sqrt{UL^{*} \times (UL^{*} - 1)}\) (3)
where \(OR^{*} = 1/OR\) and \(UL^{*} = 1/UL\).
E-value equations if OR > 1:
Point estimate:
E-value = \(OR + \sqrt{OR \times (OR - 1)}\) (4)
CI:
If \(LL \leq 1\), then E-value = 1 (5)
If \(LL > 1\), then E-value = \(LL + \sqrt{LL \times (LL - 1)}\) (6)
We specify the formulas we use for E-values for both our point estimates and confidence intervals in Equations 1–6, where OR is the odds ratio, UL is the upper limit of the 95% confidence interval, and LL is the lower limit. Proofs of the results underlying E-values can be found in Ding and VanderWeele (2016). We calculate E-values using the E-value calculator created by Mathur et al. (2018), following VanderWeele and Ding (2017).
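Equations 1–6 can be written as a single function. The Python sketch below is an illustration rather than the Mathur et al. (2018) calculator itself: the function name `evalues_for_odds_ratio` is hypothetical, and, like the equations, it treats the odds ratio as a risk ratio, which is only appropriate for a rare outcome. It also returns 1 for an odds ratio of exactly 1, a case the equations do not cover.

```python
import math


def _evalue_from_ratio(ratio: float) -> float:
    """Core E-value formula for a ratio >= 1: ratio + sqrt(ratio * (ratio - 1))."""
    return ratio + math.sqrt(ratio * (ratio - 1))


def evalues_for_odds_ratio(or_: float, ll: float, ul: float) -> tuple[float, float]:
    """Return (E-value for the point estimate, E-value for the CI) following Equations 1-6.

    Assumes a rare outcome (< 15%), so that the odds ratio approximates a risk ratio.
    """
    if or_ < 1:
        point = _evalue_from_ratio(1 / or_)                   # Eq. 1 with OR* = 1/OR
        ci = 1.0 if ul >= 1 else _evalue_from_ratio(1 / ul)   # Eqs. 2-3 with UL* = 1/UL
    elif or_ > 1:
        point = _evalue_from_ratio(or_)                       # Eq. 4
        ci = 1.0 if ll <= 1 else _evalue_from_ratio(ll)       # Eqs. 5-6
    else:
        point, ci = 1.0, 1.0                                  # OR = 1: nothing to explain away
    return point, ci


print(evalues_for_odds_ratio(0.5, 0.4, 0.625))  # approximately (3.41, 2.58)
print(evalues_for_odds_ratio(2.0, 1.6, 2.5))    # approximately (3.41, 2.58)
```

The two calls give the same E-values because the reciprocal transformation in Equations 1 and 3 makes a protective odds ratio of 0.5 exactly as robust as a harmful odds ratio of 2.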
3.2.2. Regression analysis
\(Y_{i9} = \tau A^{j}_{i7} + \rho A^{j}_{i5} + \delta Y_{i6} + \beta X_{i6} + u_i\) (7)
Equation 7 details our estimation strategy. \(Y_{i9}\) represents our final outcome variables (socialisation outcomes) for individual \(i\) in wave 9. \(A\) represents our alcohol exposure: \(A^{j}_{i7}\) is the exposure in wave 7 (post-baseline), and \(A^{j}_{i5}\) is the exposure in wave 5, a prior (pre-baseline) value of the exposure. The superscript \(j\) indicates which exposure we are using: \(j = 1\) denotes the extensive margin (a binary drinker/teetotal variable), and \(j = 2\) denotes the intensive margin (categorised frequency of alcohol consumption). \(Y_{i6}\) is the baseline value of the outcome (from wave 6), \(X_{i6}\) is our vector of covariates measured at baseline (wave 6), and \(u_i\) is the random error for individual \(i\). Our coefficient of interest is \(\tau\), which, conditional on the identifying assumptions holding, represents the causal effect of the alcohol exposure (teetotalism, or drinking frequency) on later social outcomes, on the log-odds scale of the logit model.
We estimate our results using logit regression with robust standard errors. We report the odds ratio associated with each variable, i.e. the multiplicative change in the odds of the outcome.
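A minimal sketch of this estimation step, assuming Python with NumPy: the logit model is fitted by Newton-Raphson, and the robust standard errors are the HC0 (Huber-White sandwich) form without a small-sample correction, so they may differ slightly from the defaults of statistical packages. The data are simulated to mimic the structure of Equation 7 with a binary exposure; the function name `fit_logit_robust`, all variable names, effect sizes and the sample size are hypothetical and are not the study's data or code.

```python
import numpy as np


def fit_logit_robust(X, y, max_iter=50, tol=1e-10):
    """Fit a logit model by Newton-Raphson; return coefficients and HC0 robust SEs.

    X must already include a column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # predicted probabilities
        score = X.T @ (y - p)                           # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # information matrix (minus Hessian)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    bread = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    meat = X.T @ (X * ((y - p) ** 2)[:, None])          # sum of outer products of score terms
    robust_cov = bread @ meat @ bread                   # sandwich (HC0) covariance
    return beta, np.sqrt(np.diag(robust_cov))


# Simulated data with the structure of Equation 7 (all values hypothetical).
rng = np.random.default_rng(42)
n = 50_000
x6 = rng.normal(size=(n, 2))                            # baseline covariates (wave 6)
y6 = rng.binomial(1, 0.3, size=n)                       # baseline outcome (wave 6)
a5 = rng.binomial(1, 0.2, size=n)                       # prior exposure (wave 5)
a7 = rng.binomial(1, np.where(a5 == 1, 0.7, 0.1))       # exposure (wave 7), persistent over time
true = np.array([-1.0, np.log(0.6), 0.2, 1.0, 0.3, -0.2])  # intercept, tau, rho, delta, two betas
X = np.column_stack([np.ones(n), a7, a5, y6, x6])
y9 = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true)))   # outcome (wave 9)

beta, se = fit_logit_robust(X, y9)
print(f"exposure OR = {np.exp(beta[1]):.2f} (true {np.exp(true[1]):.2f}), "
      f"robust SE of tau = {se[1]:.3f}")
```

Because the simulation contains no unmeasured confounder, the estimated odds ratio for the exposure should sit close to its true value of 0.6; in applied work a tested statistical package with a robust-variance option would normally replace the hand-written solver.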
For our exposure, we report the point-estimate odds ratio, the E-value for the point estimate, and the E-value for the 95% confidence interval.
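Because the logit model is estimated on the log-odds scale, these quantities are obtained by exponentiating the coefficient on the exposure and the limits of its 95% confidence interval, then applying Equations 1–6. The Python sketch below assumes a normal-approximation (Wald) interval built from the robust standard error and a rare outcome; the function name `report_exposure` and the example inputs are hypothetical.

```python
import math


def report_exposure(coef: float, robust_se: float, z: float = 1.96) -> dict:
    """Convert a logit coefficient and its robust SE into the reported quantities.

    Returns the odds ratio, its Wald 95% CI, and E-values for the point estimate
    and the CI (Equations 1-6, assuming a rare outcome).
    """
    odds_ratio = math.exp(coef)
    ll = math.exp(coef - z * robust_se)  # exponentiated lower limit
    ul = math.exp(coef + z * robust_se)  # exponentiated upper limit

    def evalue(ratio: float) -> float:
        ratio = 1 / ratio if ratio < 1 else ratio  # OR* = 1/OR for protective estimates
        return ratio + math.sqrt(ratio * (ratio - 1))

    if odds_ratio < 1:
        evalue_ci = 1.0 if ul >= 1 else evalue(ul)
    elif odds_ratio > 1:
        evalue_ci = 1.0 if ll <= 1 else evalue(ll)
    else:
        evalue_ci = 1.0
    return {"odds_ratio": odds_ratio, "ci": (ll, ul),
            "evalue_point": evalue(odds_ratio), "evalue_ci": evalue_ci}


result = report_exposure(math.log(0.5), 0.1)  # hypothetical coefficient and robust SE
print(f"OR={result['odds_ratio']:.2f}  CI=({result['ci'][0]:.2f}, {result['ci'][1]:.2f})  "
      f"E-value point={result['evalue_point']:.2f}  E-value CI={result['evalue_ci']:.2f}")
```

For these hypothetical inputs the odds ratio is 0.5, the point-estimate E-value is about 3.41 and the confidence-interval E-value is about 2.67.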