Our study explores spatial associations between long-term average concentrations of DPM, as a metric for past air pollution exposure, and COVID-19 mortality across each pandemic wave and throughout 2020 in the U.S. The objectives of the study are 1) to assess whether living near DPM sources increased the risk of death from COVID-19, 2) to estimate how associations between mortality and long-term exposure to DPM may have changed over time with changes in the coronavirus and in the population’s behavior, and 3) to test whether accounting for spatial autocorrelation improves model estimates. Data on air pollution, health, demographics, and social determinants of health were merged for this analysis, and both global and local models were applied to examine these relationships.
Model Runs
We tested the association between COVID-19 mortality and long-term DPM concentrations across the contiguous United States for time periods coinciding with each COVID-19 wave in 2020: January 1-May 31, 2020, June 1-September 30, 2020, and October 1-December 31, 2020. We also ran the model for the entire year: January 1-December 31, 2020.
We used regression analysis to examine spatial non-stationarity in the relationship between COVID-19 and DPM while accounting for potentially confounding effects. This work is similar to spatial modeling approaches used by Sun et al. (2020) and Rahman et al. (2020). Sun et al. (2020) investigated different spatial regression models and compared them with an ordinary least squares (OLS) regression model to explain the transmission pattern of COVID-19. County-level race/ethnicity and socio-economic covariates were included in their models. We adapted their approach by focusing on associations of COVID-19 mortality with DPM and by investigating different time periods. Three global models, OLS, spatial lag model (SLM), and spatial error model (SEM), were run to produce a nationwide effect estimate. One local model, geographically weighted regression (GWR), produced effect estimates at the county scale. The R Statistical Software version 3.6.3 was used to run all code. We performed spatial regression modeling with the following libraries: spdep, spgwr, and spatialreg.
OLS models are designed to minimize the sum of squared differences between the observed values and the model predictions across the dataset (Goldberger 1964). Mollalo et al. (2020) studied county-level variations of COVID-19 incidence in the U.S. From a list of 35 demographic, socio-economic, topographic and environmental variables, they used a stepwise forward selection procedure and then checked for multicollinearity to determine the most significant predictors of COVID-19. Then, using the same selected explanatory variables, they tested their model using OLS and several spatial models including SEM, SLM, and GWR (described below). Accounting for spatial autocorrelation in their model improved performance over OLS. Karaye and Horney (2020) also compared OLS to spatial regression models to analyze the impact of social vulnerability on COVID-19 cases. Spatial autocorrelation of the residuals may compromise the validity of the OLS model and produce biased estimators (LeSage and Pace 2009, Loonis and De Bellefon 2018). The model assumptions of zero mean, independence, homoscedasticity, and normal distribution of the errors are met when OLS is a complete and correct model in which the variables capture all of the spatial variation without specifying spatial positions (DeAngelis and Yurek 2017, Schabenberger and Gotway 2017). Spatial autocorrelation in residuals may occur due to an omitted variable.
SLMs estimate an autocorrelation parameter (“spatial lag”) using a weighted average of the response variable across neighboring areas, testing whether neighboring observations affect one another (LeSage and Fischer 2008, Sun et al. 2020). As the autocorrelation parameter approaches zero, the SLM approaches the OLS model (LeSage and Fischer 2008). In SEMs, errors across neighboring areas are autocorrelated (“spatial error”) (Le Gallo et al. 2005). SEMs estimate the relationship between the residuals in a spatial region and those in adjacent regions (Sun et al. 2020). Here the spatial structure is in the residuals, which suggests that some important predictors have been omitted from the model (Chi and Zhu 2020).
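In generic textbook notation (the symbols are introduced here for illustration and do not appear in the study), the two global spatial models can be sketched as follows. Here y is the vector of county-level COVID-19 deaths, X holds DPM concentration and the confounders, W is the spatial weights matrix, and ε is an independent error term.

```latex
\begin{aligned}
\text{SLM:}\quad & y = \rho W y + X\beta + \varepsilon \\
\text{SEM:}\quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{aligned}
```

Setting ρ = 0 in the SLM, or λ = 0 in the SEM, recovers the OLS model y = Xβ + ε.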
SLM and SEM have only one spatial dependence parameter. The single-valued characteristic makes it impossible for global spatial models to reveal local spatial patterns (Chi and Zhu 2020, Fotheringham et al. 2003). Another limitation of global spatial models is that the model is dependent on the spatial weighting matrix (Chi and Zhu 2020). In contrast, GWR allows for local models to be fit to each observation using spatial distance as a weighting factor for the influence of all other points (Fotheringham et al. 2003). To determine local associations between COVID-19 cases in the U.S. and demographic, socio-economic, topographic and environmental parameters, Mollalo et al. (2020) examined two local models including GWR. The variables incorporated in the model are the same set used for OLS, SLM, and SEM. Similarly, Karaye and Horney (2020) compared GWR to OLS to understand the spatially varying effect in the relationship between social vulnerability and COVID-19 case counts. The main advantage of GWR as a local model is the ability to test for spatial variability among the effects of different variables in the model (Chi and Zhu 2020, LeSage and Pace 2009, Fotheringham et al. 2003). Another strength is that GWR has the same model structure as the OLS, which facilitates comparison between the two models (Fotheringham et al. 2003).
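The idea of fitting a separate distance-weighted regression at each location can be illustrated with a minimal Python sketch. It is not the study's R code: the Gaussian kernel, the fixed bandwidth, the single predictor, and the toy coordinates and values are all assumptions made for illustration.

```python
import math

def weighted_fit(x, y, w):
    """Weighted least-squares intercept and slope for a single predictor."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def gwr_slopes(coords, x, y, bandwidth):
    """Local slope at every location, using Gaussian distance weights (assumed kernel)."""
    slopes = []
    for ui, vi in coords:
        w = [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
             for uj, vj in coords]
        slopes.append(weighted_fit(x, y, w)[1])
    return slopes

# Hypothetical toy data: ten locations on a west-east line, where the effect
# of x on y grows steadily from west to east.
coords = [(float(i), 0.0) for i in range(10)]
x = [1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 3.0, 2.0, 5.0, 4.0]
y = [(1.0 + 0.2 * i) * xi for i, xi in enumerate(x)]

local = gwr_slopes(coords, x, y, bandwidth=1.5)   # slopes vary by location
flat = gwr_slopes(coords, x, y, bandwidth=1e6)    # weights ~1 everywhere: one global slope
print(local)
print(flat)
```

With a small bandwidth the estimated slope rises from west to east, which a single global coefficient cannot show. With a very large bandwidth every location receives nearly the same weights, and the local fits collapse to the global OLS slope.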
For our spatial autoregressive models, we estimated spatial relationships between regions based on contiguous boundaries shared between two or more counties, assuming that COVID-19 spread in a county is influenced by adjacent counties. For GWR, the bandwidth that defines the weight matrix was chosen by a cross-validation function that minimizes the root mean square prediction error. We evaluated spatial autocorrelation among contiguous cells in the model residuals using Moran’s I (Moran 1950). A statistically significant Moran’s I indicates either positive or negative spatial autocorrelation among neighboring units. Additionally, we used Lagrange multiplier test statistics to understand whether the spatial lag or spatial error pattern is more important for interpreting the local results.
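As a rough illustration of the Moran’s I statistic under contiguity-based weights, the Python sketch below computes it for a hypothetical row of four counties. The neighbor layout and the values (which stand in for model residuals) are invented, the row standardization of the weights is an assumption, and the significance test is omitted.

```python
def row_standardize(neighbors):
    """Turn a neighbor list into row-standardized weights (each row sums to 1)."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items()}

def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = values - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical layout: four counties in a row, each sharing a border
# only with its immediate neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = row_standardize(neighbors)

clustered = morans_i([1.0, 2.0, 3.0, 4.0], weights)    # similar values sit side by side
alternating = morans_i([1.0, 4.0, 1.0, 4.0], weights)  # high and low values alternate
print(clustered, alternating)
```

The smoothly increasing values give a positive statistic (about 0.4), and the alternating values give a negative one (about -1), corresponding to positive and negative spatial autocorrelation.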
The urgency of the COVID-19 outbreak meant that policy decisions and health interventions were made under uncertainty and in compressed timeframes, alongside the complex social, economic, and political events of 2020 (Lancaster et al. 2020). The importance of specific variables could therefore have differed between pandemic waves, so a different set of covariates was included in the model for each time period. To determine which covariates to include in the regression models of COVID-19 mortality, we applied a stepwise selection algorithm for each time period (Table 1). The same covariates were then incorporated in the best model for OLS, SLM, SEM, and GWR for each specific wave (Table 2), based on the following framework:
COVID-19 deaths = DPM concentration + Confounder variables + error term (1)
Table 2
Model framework for each wave modeled.
Wave Dates | Models
Jan 1-May 31, 2020 | COVID-19 deaths = DPM concentration + Fraction Black + Fraction American Indian + Fraction who take public transportation to work + Fraction average time to work + Fraction uninsured + Fraction smoking + Fraction Income inequality + Population density (2)
Jun 1-Sep 30, 2020 | COVID-19 deaths = DPM concentration + Fraction Black + Fraction Hispanic + Fraction American Indian + Fraction who take public transportation to work + Fraction reporting inactivity + Fraction Incomplete school + Population density (3)
Oct 1-Dec 31, 2020 | COVID-19 deaths = DPM concentration + Fraction Black + Fraction American Indian + Fraction working in a mining or agricultural occupation + Fraction average time to work + Fraction reporting inactivity + Fraction obese + Fraction over 65 + Fraction homelessness (4)
Jan 1-Dec 31, 2020 | COVID-19 deaths = DPM concentration + Fraction Black + Fraction Hispanic + Fraction American Indian + Fraction Pacific Islander + Fraction working in a mining or agricultural occupation + Fraction reporting inactivity + Fraction with a severe housing burden + Fraction Income inequality (5)
The confounder selection procedure was based on minimizing the Akaike information criterion (AIC) after controlling for multicollinearity. We used the same process for each of the three waves and for all of 2020 to find the best-fitting models for estimating the nationwide and local associations between COVID-19 mortality and DPM concentration.
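AIC-based selection can be sketched in Python as a forward stepwise search: starting from an intercept-only model, add the covariate that lowers AIC the most, and stop when no addition lowers it. This is a simplified illustration, not the study's procedure. The search direction, the Gaussian AIC formula (stated up to an additive constant), the variable names, and the toy numbers are assumptions, and the multicollinearity screen is left out.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols_aic(columns, y):
    """Fit OLS with an intercept; return n*ln(RSS/n) + 2*(number of parameters)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, row))) ** 2
              for i, row in enumerate(design))
    return n * math.log(rss / n) + 2 * (k + 1)  # +1 counts the error variance

def forward_select(candidates, y):
    """Greedy forward selection that stops when no candidate lowers AIC."""
    selected, best = [], ols_aic([], y)
    while len(selected) < len(candidates):
        trials = {name: ols_aic([candidates[s] for s in selected + [name]], y)
                  for name in candidates if name not in selected}
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        selected.append(name)
        best = trials[name]
    return selected, best

# Hypothetical toy data: y depends on "dpm" plus small noise; the other two
# candidates are unrelated to y by construction.
candidates = {
    "dpm": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "covariate_a": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "covariate_b": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.3, -0.1, -0.2]
y = [2.0 * d + e for d, e in zip(candidates["dpm"], noise)]

selected, best_aic = forward_select(candidates, y)
print(selected, best_aic)
```

In this toy example "dpm" is picked first because it explains almost all of the variation in y. Whether either unrelated covariate is then added depends on whether it happens to lower AIC, which is why such a search is usually paired with a multicollinearity check.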