This is a double negative control study [35]. A pair of negative control outcome and exposure variables was employed to estimate the short-term PM-lung cancer hospitalization associations from the time-series data, as shown in Fig. 1. Briefly, a transformation of the negative control outcome identified by a negative control exposure, referred to as the confounding bridge, is used to capture and reduce unmeasured confounding under several key assumptions. (1) Conditional on a sufficient set of measured and unmeasured confounders, the exposed subjects would experience the same average outcome as the unexposed subjects. This is the fundamental assumption required for inferring causality in observational studies [40]. (2) The negative control outcome is associated with both the measured and unmeasured confounders but is not causally affected by the exposure. To this end, we used the number of lung cancer cases in week \(t-1\) as the negative control outcome, because it cannot be causally affected by the mean PM concentration in week \(t\); any association between the two can arise only in the presence of unmeasured confounding. (3) There exists a confounding bridge function such that the unmeasured confounding effect on the outcome at each exposure level is identical to that on the confounding bridge function. In practice, this can be achieved by prespecifying a suitable parametric or non-parametric model [35]. (4) The negative control exposure is independent of both the outcome and the negative control outcome after adjusting for the exposure and both the measured and unmeasured confounders. Notably, in time-series studies, the mean PM concentration in week \(t+1\) cannot affect the number of lung cancer cases in week \(t\) and is thus a well-defined negative control exposure. It has also been used in previous time-series studies to test for unmeasured confounding [37, 38].
Consequently, any non-zero association between the negative control exposure and either the outcome or the negative control outcome is driven entirely by unmeasured confounding. Furthermore, there are no restrictions on the association of the negative control exposure with the exposure or on the association of the negative control outcome with the outcome. The Supplementary Material provides more details on the double negative control design. The study was conducted in accordance with relevant guidelines and regulations.
We did not have a formal prospective analysis plan because this is a hypothesis-driven study. This study involved only record data and did not involve human subjects directly. It was exempt from review by the institutional review board (IRB) of the Cancer Hospital, Chinese Academy of Medical Sciences, Shenzhen Center.
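The lag and lead structure of the two negative controls described above can be illustrated with a minimal sketch. It assumes two aligned weekly series (case counts and mean PM concentration); the function name, dictionary keys, and example values are hypothetical and are not taken from the study data.

```python
def build_negative_controls(cases, pm):
    """Align weekly series so that each row holds, for week t:
    the outcome (cases in week t), the exposure (PM in week t),
    the negative control outcome (cases in week t-1), and
    the negative control exposure (PM in week t+1)."""
    if len(cases) != len(pm):
        raise ValueError("cases and pm must cover the same weeks")
    rows = []
    # The first and last weeks are dropped: they lack a lag or a lead.
    for t in range(1, len(cases) - 1):
        rows.append({
            "week": t,
            "outcome": cases[t],
            "exposure": pm[t],
            "nc_outcome": cases[t - 1],
            "nc_exposure": pm[t + 1],
        })
    return rows


# Hypothetical weekly values, for illustration only.
cases = [12, 15, 9, 14, 11]
pm = [30.0, 42.5, 38.0, 27.5, 33.0]
rows = build_negative_controls(cases, pm)
```

Each analysis row therefore uses three consecutive weeks of data, so a series of \(n\) weeks yields \(n-2\) usable rows under this construction.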
Study Area
Shenzhen, a typical immigrant city in the Pearl River Delta region of China, has experienced the fastest urbanization and transportation development over the past four decades. It has a permanent population of 13,026.6 thousand (defined as those who have lived in Shenzhen for over half a year, comprising 4,547.0 thousand registered and 8,479.7 thousand non-registered residents) living in an area of approximately 1,997.5 square kilometers [41]. Shenzhen has a subtropical maritime climate, is economically advanced, and is relatively less polluted.
Data Collection
Cancer-specific case certificates with de-identified personal information were extracted from the Cancer Hospital, Chinese Academy of Medical Sciences, Shenzhen Center for the period between January 1, 2018, and December 31, 2019. Cancer types were recorded according to the International Classification of Diseases, 9th Revision (ICD-9) or 10th Revision (ICD-10), with lung cancer coded as 162 in ICD-9 or C34 in ICD-10. In this study, we included only lung cancer cases at their initial record and excluded recurrent cases, mainly because of their treatment strategies and poor prognosis. We aggregated the eligible cases by week and further stratified them by sex and age group (i.e., <65, 65–74, and 75+ years). No sample size calculation was performed.
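The weekly aggregation and stratification described above might look like the following sketch. It assumes that cases are grouped by ISO week (the text states this convention only for the pollutant data) and that each record carries a date, sex, and age; the record layout, function names, and example records are hypothetical.

```python
from collections import Counter
from datetime import date


def age_group(age):
    """Map age in years to the three strata used in the text."""
    if age < 65:
        return "<65"
    if age < 75:
        return "65-74"
    return "75+"


def weekly_counts(records):
    """Count cases per (ISO year, ISO week, sex, age group)."""
    counts = Counter()
    for recorded_on, sex, age in records:
        iso = recorded_on.isocalendar()  # (ISO year, ISO week, weekday)
        counts[(iso[0], iso[1], sex, age_group(age))] += 1
    return counts


# Hypothetical de-identified records: (record date, sex, age).
records = [
    (date(2018, 1, 1), "F", 58),
    (date(2018, 1, 3), "F", 61),
    (date(2018, 1, 8), "M", 70),
    (date(2019, 12, 31), "M", 80),
]
counts = weekly_counts(records)
```

Note that under the ISO convention the last record falls in week 1 of ISO year 2020, because that week starts on Monday, December 30, 2019. Weeks at the boundaries of the study period may therefore be incomplete.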
Daily time-series data on air pollutants, including PM10 (\(\mu\text{g}/\text{m}^{3}\), 24-hour average), PM2.5 (\(\mu\text{g}/\text{m}^{3}\), 24-hour average), ozone (O3, \(\mu\text{g}/\text{m}^{3}\), maximum 8-hour average), nitrogen dioxide (NO2, \(\mu\text{g}/\text{m}^{3}\), 24-hour average), sulfur dioxide (SO2, \(\mu\text{g}/\text{m}^{3}\), 24-hour average), and carbon monoxide (CO, \(\text{mg}/\text{m}^{3}\), 24-hour average), were obtained from the National Air Quality Real-Time Publishing Platform (http://106.37.208.233:20035/). This platform is administered by the Chinese Ministry of Environmental Protection and has displayed real-time concentrations of air pollutants from controlled monitoring sites since January 2013. Weekly mean concentrations of these pollutants were calculated as simple averages over all monitoring sites in Shenzhen (Supplementary Material) and over the days of each week, with weeks defined by the ISO week date system. To account for the potential effects of weather conditions, including temperature (℃) and relative humidity (%), we also obtained daily mean temperature and mean relative humidity for Shenzhen from the National Climatic Data Center (NCDC; available at https://www.ncei.noaa.gov/; Air Force station BAOAN [ID 594930]). Weekly mean temperature and relative humidity were computed by averaging the daily monitoring data across each week.
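The weekly averaging of pollutant concentrations can be sketched as follows. The sketch assumes one reading per site per day, averages across sites within each day first and then across days within each ISO week, and skips missing readings; the order of averaging, the handling of missing values, and the site identifiers are assumptions, since the text describes only a simple average.

```python
from collections import defaultdict
from datetime import date


def weekly_mean(readings):
    """readings: iterable of (day, site_id, value); value may be None.
    Returns {(ISO year, ISO week): mean concentration}."""
    daily = defaultdict(list)
    for day, _site, value in readings:
        if value is not None:  # skip missing site readings
            daily[day].append(value)
    weekly = defaultdict(list)
    for day, values in daily.items():
        iso = day.isocalendar()
        # City-wide daily mean across the sites reporting that day.
        weekly[(iso[0], iso[1])].append(sum(values) / len(values))
    return {week: sum(v) / len(v) for week, v in weekly.items()}


# Hypothetical PM2.5 readings from two sites.
readings = [
    (date(2018, 1, 1), "site_A", 30.0),
    (date(2018, 1, 1), "site_B", 40.0),
    (date(2018, 1, 2), "site_A", 25.0),
    (date(2018, 1, 2), "site_B", None),
    (date(2018, 1, 8), "site_A", 50.0),
]
pm25_weekly = weekly_mean(readings)
```

Averaging within days before averaging within weeks keeps a day with many reporting sites from outweighing a day with few. Pooling all site-day readings directly would give a different result whenever readings are missing.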
Statistical Analysis
Under the double negative control analytical framework, we employed a generalized additive linear confounding bridge function, which has been widely used in previous studies [34, 42]. We obtained the PM-lung cancer hospitalization associations via a modified two-stage least squares estimator [35]. In stage 1, we regressed the negative control outcome on the negative control exposure, the observed confounders, and the exposure, and obtained the predicted value of the negative control outcome. In stage 2, we regressed the primary outcome on the predicted value of the negative control outcome, the observed confounders, and the exposure. The coefficient of the exposure in stage 2 is then the causal estimate of interest. The corresponding standard errors were estimated using the heteroscedasticity and autocorrelation consistent (HAC) covariance method [35, 43, 44].
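The two stages described above can be illustrated on simulated data. The following is a minimal sketch under stated assumptions, not the AER-based implementation used in the study: the data-generating model, effect sizes, and all names are hypothetical, measured confounders are omitted for brevity, and no HAC standard errors are computed.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Simulated data with an unmeasured confounder U (all values hypothetical).
rng = random.Random(2024)
n, beta = 2000, 0.5                      # beta: true exposure effect
U = [rng.gauss(0, 1) for _ in range(n)]
A = [u + rng.gauss(0, 1) for u in U]     # exposure
Z = [u + rng.gauss(0, 1) for u in U]     # negative control exposure
W = [u + rng.gauss(0, 1) for u in U]     # negative control outcome
Y = [beta * a + u + rng.gauss(0, 0.5) for a, u in zip(A, U)]

# Stage 1: regress W on (intercept, Z, A) and compute fitted values.
X1 = [[1.0, z, a] for z, a in zip(Z, A)]
b1 = ols(X1, W)
W_hat = [sum(b * x for b, x in zip(b1, row)) for row in X1]

# Stage 2: regress Y on (intercept, W_hat, A); the coefficient of A
# is the double negative control estimate.
X2 = [[1.0, w, a] for w, a in zip(W_hat, A)]
dnc_estimate = ols(X2, Y)[2]

# Naive regression of Y on A alone, which ignores U.
naive_estimate = ols([[1.0, a] for a in A], Y)[1]
```

In this simulated setting the naive estimate is biased upward by the unmeasured confounder, whereas the stage 2 coefficient should lie close to the true value of `beta`. The fitted negative control outcome acts as a proxy for the confounder that the exposure alone cannot separate out.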
We estimated the short-term effects of PM10 and PM2.5 on lung cancer hospitalization risk, using the square root of the number of lung cancer cases for normalization and variance stabilization [35, 45]. We excluded week 40 of 2018, in which lung cancer case data were missing, because it coincided with the seven-day National Day holiday of the People’s Republic of China. We explored the delayed effects (i.e., the exposure affects the outcome after a lapse of time beyond the event period) and quantified net effects over a predefined lag period. For the main model, we focused on the estimated associations with the PM level in the current week and included a discrete Fourier transform of time to control for the underlying time trends in lung cancer hospitalization risk, an indicator of the month to account for short-term monthly variations, and natural spline functions with nine degrees of freedom (df) for temperature and three df for relative humidity to control for the potentially non-linear effects of weather conditions. We reported the point estimates and 95% confidence intervals as the change in lung cancer hospitalization risk per 10 \(\mu\text{g}/\text{m}^{3}\) increase in weekly mean PM10 or PM2.5 concentration.
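Three of the ingredients above (the square-root transformation, the Fourier terms of time, and the rescaling to a 10 \(\mu\text{g}/\text{m}^{3}\) increment) can be sketched as follows. The period of 52 weeks, the number of harmonics, and the use of a normal-approximation interval are assumptions made for illustration; the text does not specify them, and the example numbers are hypothetical.

```python
import math


def fourier_terms(week_index, period=52.0, harmonics=2):
    """Sine/cosine pairs of time used as smooth trend regressors.
    period and harmonics are illustrative choices, not study settings."""
    terms = []
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * week_index / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def per_10_units(coef, se, z=1.96):
    """Rescale a per-unit coefficient and its standard error to a
    10-unit increase, with an approximate 95% confidence interval."""
    estimate = 10.0 * coef
    half_width = z * 10.0 * se
    return estimate, estimate - half_width, estimate + half_width


# Square-root transformation of hypothetical weekly case counts.
weekly_cases = [16, 9, 25]
sqrt_cases = [math.sqrt(c) for c in weekly_cases]

# Hypothetical per-unit coefficient and standard error.
estimate, lower, upper = per_10_units(0.02, 0.005)
```

Because the outcome is modeled on the square-root scale, a rescaled coefficient of this kind describes the change in the square root of weekly cases per 10-unit increase in PM concentration.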
In addition to the main model, we fitted two-pollutant models that each adjusted for one additional gaseous pollutant (O3, NO2, SO2, or CO). The PM10- and PM2.5-lung cancer hospitalization associations were considered robust if consistent causal estimates were obtained from both single- and two-pollutant models, as determined by a paired z-test [46]. We also carried out a set of sensitivity analyses to explore the delayed effects by examining various lag structures (Supplementary Material). We then conducted stratified analyses according to sex and age group to investigate potential effect modification. Finally, we performed confounding tests and estimated the corresponding effects of unmeasured confounding, additionally adjusting for the PM10 or PM2.5 concentration, time, month indicators, temperature, and relative humidity in the same manner as in the main model.
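A comparison of single- and two-pollutant estimates of the kind described above could be sketched as follows. The formula used here, the difference of the two estimates divided by the square root of the sum of their squared standard errors, is a common form of such a test and is an assumption; the exact procedure in [46] may differ. The function name and example numbers are hypothetical.

```python
import math


def compare_estimates(b1, se1, b2, se2):
    """z statistic and two-sided p-value for the difference between
    two estimates, treating them as independent (an approximation)."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


# Hypothetical single-pollutant vs. two-pollutant estimates.
z_stat, p_value = compare_estimates(0.20, 0.03, 0.10, 0.04)
```

Estimates from single- and two-pollutant models are fitted to the same data and are therefore correlated, so this independence-based statistic is only a rough guide to whether the two estimates differ.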
All statistical analyses were performed using R software, version 3.6.3 (R Foundation for Statistical Computing), with the AER package for the two-stage least squares estimator and the stats package for the ordinary least squares estimator. Sample code is available from the first author on request. A P-value \(<0.05\) was considered statistically significant. In reporting the associations of weekly mean PM10 and PM2.5 concentrations with lung cancer hospitalization risk, we adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement for reporting observational studies [47].