A Negative Control Outcome Regression Accounting for Unobserved Confounding and Lagged Causal Effects

Background: Epidemiologists are increasingly interested in using negative controls to eliminate unobserved confounding. In particular, the difference-in-differences method, which uses pre-exposure outcomes as negative control outcomes, is widely used. However, it yields biased estimates when the pre-exposure outcome has a lagged causal effect on the post-exposure outcome.

Methods: Taking advantage of pre-exposure outcomes as negative control outcomes, Negative Control Outcome Regression (NCOR) is proposed to eliminate unobserved confounding. The intercept term of NCOR provides an unbiased estimate of the causal effect of exposure on the post-exposure outcome, and the slope minus 1 estimates the lagged causal effect of the pre-exposure outcome on the post-exposure outcome. We then illustrate the potential of NCOR in a challenging application: estimating the causal association of PM₂.₅ with all-cause mortality rates (AMR) and the lagged causal effect of pre AMR on post AMR.

Results: Both theoretical justifications and simulation studies validate that the causal effect of exposure on outcome, along with the lagged causal effect of outcomes, is identifiable and can be estimated by the proposed NCOR model. The application results demonstrate that the previously estimated association between PM₂.₅ and AMR can be attributed to unobserved confounding. Furthermore, the NCOR model reveals that pre AMR has no causal association with post AMR.

Conclusion: The proposed NCOR model can obtain unbiased and robust estimates of the causal effect of exposure on outcome and of the lagged causal effect of outcomes. The proposed NCOR is implemented as an R package, called NCOR, and is freely available on GitHub.


Unobserved confounding is a well-known threat to identifying and estimating causal effects, and it can rarely be ruled out with certainty in observational studies [1,2]. The use of negative controls to detect or eliminate unobserved confounding has gained increasing acceptance and popularity [3][4][5]. In general, a variable that is related to the unobserved confounding factors but not

Studies that used negative controls to detect potential confounding can be traced back to Rosenbaum, who detected potential confounders using an auxiliary outcome [13]. In

In this study, following the idea of meta-regression, we propose a negative control outcome regression (NCOR) approach to accurately estimate the causal effect of exposure on the future outcome using only a pre-exposure outcome. Both theoretical proofs and simulation studies are performed to validate the effectiveness of NCOR. Furthermore, we illustrate the potential of NCOR by estimating the causal association of PM₂.₅ with all-cause mortality rates (AMRs).

In addition, we provide an R package that the research community can use to implement the NCOR model, which is freely available on GitHub (https://github.com/yuyy-shandong/NCOR).

Difference-in-differences model

The DID model is most commonly used when pre- and post-exposure outcomes can be observed.

The causal effect can be estimated by comparing the changes in outcomes over time between two different groups. Figure 1

For Assumption 1(a), a simple causal model supposes that $Y(t)$ follows a simple linear model that depends on the exposure ($X$) and unmeasured confounders ($U$). Assumption 2(a) states that $Y(0)$ does not affect the future exposure $X$, and the future exposure $X$ does not affect the pre-exposure outcome $Y(0)$, as reflected by the absence of an arrow between $Y(0)$ and $X$.

For Assumption 3(a), the association of the unobserved confounder with the outcome is assumed to be equal across exposure groups and constant over time.
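The role of these assumptions can be made concrete with a short worked derivation. It is only a sketch under an assumed linear model with a binary exposure; the symbols $\alpha_t$, $\beta$, $\gamma$, $\delta$ and $\varepsilon_t$ are illustrative notation introduced here and are not taken from the original text. It shows why the DID contrast removes a time-constant confounding term, and why a lagged effect of $Y(0)$ on $Y(1)$ brings a bias term back.

```latex
% Assumed model (illustrative notation): the confounder U enters both
% outcomes with the same coefficient \delta, as in Assumption 3(a).
\begin{align*}
Y(0) &= \alpha_0 + \delta U + \varepsilon_0, \\
Y(1) &= \alpha_1 + \beta X + \delta U + \varepsilon_1,
\qquad E[\varepsilon_t \mid X] = 0 .
\end{align*}

% The DID contrast: \delta U cancels in Y(1) - Y(0).
\begin{align*}
\Delta_{\mathrm{DID}}
 &= E[\,Y(1) - Y(0) \mid X = 1\,] - E[\,Y(1) - Y(0) \mid X = 0\,] \\
 &= (\alpha_1 - \alpha_0 + \beta) - (\alpha_1 - \alpha_0) \\
 &= \beta .
\end{align*}

% If Y(0) also has a lagged effect \gamma on Y(1), the contrast is biased.
\begin{align*}
Y(1) &= \alpha_1 + \beta X + \gamma Y(0) + \delta U + \varepsilon_1, \\
\Delta_{\mathrm{DID}}
 &= \beta + \gamma \,\bigl\{ E[\,Y(0) \mid X = 1\,] - E[\,Y(0) \mid X = 0\,] \bigr\} \\
 &= \beta + \gamma \delta \,\bigl\{ E[\,U \mid X = 1\,] - E[\,U \mid X = 0\,] \bigr\} .
\end{align*}
```

Under this assumed model the bias vanishes only when $\gamma = 0$ or when $U$ is balanced across exposure groups.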

Under the assumptions above [9, 11], the formula can be used to estimate the causal effect of exposure on post-exposure outcomes.

We consider $J$ independent studies from different spatial areas or research centers. For each study $j$,

Assumption 3(b). (Parallel trend assumption)
For each study $j$, 1,

For each study $j$, autoregressive process with a fixed auto-correlation coefficient (i.e. the effect of $U_j(0)$ on $U_j(1/2)$ is the same as the effect of $U_j(1/2)$ on $U_j(1)$) and fixed variance for each study $j$. In order to satisfy Assumption 3(b), we prefer that the time points of $Y(0)$ and $Y(1)$ are symmetrical with respect to that of $X$.
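The meta-regression idea behind NCOR can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the authors' NCOR R package: it uses a single time-invariant confounder per subject, a continuous exposure, and hypothetical names (`ols`, `simulate_study`, `ncor_sketch`) and parameter values chosen only for illustration. In each simulated study the exposure is regressed on the pre-exposure and post-exposure outcomes separately, and the post-exposure slopes are then regressed on the pre-exposure slopes across studies. Under this assumed model the intercept approximates the causal effect and the slope minus 1 approximates the lagged effect.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the simple least-squares fit y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_study(rng, n, confounding, beta, gamma):
    """Simulate one study under an assumed linear model (illustrative only).

    u  : unobserved confounder, shared by exposure and both outcomes
    x  : exposure, confounded by u with study-specific strength
    y0 : pre-exposure outcome (negative control outcome)
    y1 : post-exposure outcome, with causal effect beta of x
         and lagged effect gamma of y0
    """
    x, y0, y1 = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        xi = confounding * u + rng.gauss(0, 1)
        y0i = u + rng.gauss(0, 1)
        y1i = beta * xi + gamma * y0i + u + rng.gauss(0, 1)
        x.append(xi)
        y0.append(y0i)
        y1.append(y1i)
    return x, y0, y1


def ncor_sketch(n_studies=40, n=1000, beta=0.5, gamma=0.3, seed=0):
    """Return (estimated causal effect, estimated lagged effect)."""
    rng = random.Random(seed)
    pre_slopes, post_slopes = [], []
    for _ in range(n_studies):
        # Confounding strength varies across studies; this variation is
        # what lets the across-study regression separate bias from effect.
        confounding = rng.uniform(0.0, 1.0)
        x, y0, y1 = simulate_study(rng, n, confounding, beta, gamma)
        pre_slopes.append(ols(x, y0)[1])   # association of X with Y(0)
        post_slopes.append(ols(x, y1)[1])  # association of X with Y(1)
    intercept, slope = ols(pre_slopes, post_slopes)
    return intercept, slope - 1.0


if __name__ == "__main__":
    beta_hat, gamma_hat = ncor_sketch()
    print(f"causal effect estimate: {beta_hat:.3f} (true value 0.5)")
    print(f"lagged effect estimate: {gamma_hat:.3f} (true value 0.3)")
```

In this sketch the pre-exposure slope in each study equals the confounding bias, and the post-exposure slope equals the causal effect plus (1 + lagged effect) times that bias, which is why a straight line fitted across studies recovers both quantities. Because the per-study slopes are themselves estimated with noise, the recovered values are approximate.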
The sample size of each study ($n_j$) required for level and power is

More details about the sample size calculation

the simulated data is generated as follows.

i) LR-AC model:

In the first simulation study, we compare the biases, standard errors (SEs) and mean square

iii) we change the size of

To illustrate the NCOR model in epidemiology studies, we assess the potential causal effect of

Nevertheless, the NCOR model reveals that $\mathrm{AMR}_{\mathrm{Year}-1}$ has no causal association with $\mathrm{AMR}_{\mathrm{Year}+1}$.

The results indicate that the association between $\mathrm{AMR}_{\mathrm{Year}-1}$ and $\mathrm{AMR}_{\mathrm{Year}+1}$ is most likely a spurious association caused by unobserved confounders.

A significant challenge in observational studies is to control for potential confounders between exposure and outcome [1,2]. Negative control analysis aims to identify the presence of residual confounding and further correct for unmeasured confounders [7,19]. The DID method can be regarded as a negative control outcome approach through a monotonicity hypothesis of unmeasured confounding effects [9]. Theoretically, the formula

The authors declare that they have no competing interests.

Supplementary Files
This is a list of supplementary files associated with this preprint. SupplementaryMaterial.docx