Study area:
Lima is located on the central coast of Peru, at an average of 150 meters above sea level, covers a geographical area of 2819 Km2 and a population density of 3392 inhabitants/km2. Lima has a population of 9,562,000 representing about 30% of the national population. [21]. Lima is comprised of 43 districts and divided into four zones: North Lima, Central Lima, East Lima, and South Lima. We excluded 4 districts at high altitude (all above 570 meters average altitude) due to uncertainty about the PM2.5 model predictions in these districts. The uncertainty was largely driven by the fact that ground monitoring stations providing inputs to the PM2.5 model were all located below 375 meters, requiring a large extrapolation to these four high districts, using the model’s prediction of the altitude effect. These districts (district numbers 150106, 150107, 150109, and 150118) represented only 4% of the total population of Lima.
Mortality Data
Data on daily mortality were obtained from the Ministry of Health (MoH). Variables included in this data were age, gender, district of residence, district of occurrence of death, cause of death with respective International Classification of Disease 10th revision (ICD-10). Deaths from respiratory (J00-J99) and circulatory (I00-I99) disease were considered for the study. The database included 109,951 recorded deaths from respiratory and circulatory disease, between January 2010 and December 2016. We excluded four districts (10,228 deaths) because the model may be inaccurate above 375 meters (as noted above), and also excluded some other observations because pollution on some days could not be estimated due to lack of satellite coverage (12,753 deaths) (see below); the remaining sample was 86,970.
Meteorological and Ambient PM2.5 Data
Ground-monitoring PM2.5 data in Lima were available from March 2010 through December 2016, from ten stations from the Servicio Nacional de Meteorología e Hidrología del Perú (SENAMHI, Ministry of the Environment), and 6 stations operated from 2011-2012 by Johns Hopkins University [22]. However, these data were not available on a daily basis during our study period, covering only about 10% of days. Hence, the ground-monitoring network was considered too sparse to adequately capture the spatiotemporal variability in PM2.5 levels that occurs in Lima. Thus, we based our PM2.5 exposure data from a model developed by Vu et al. [20]. Briefly, daily PM2.5 concentrations at a 1 km2 spatial resolution for 2010-2016 were estimated using a combination of the available ground measurements plus aerosol optical depth (AOD) data from satellites, and meteorological and land use data chemical transport models. AOD was obtained from NASA, using the MAIAC (Multi-Angle Implementation of Atmospheric Correction) algorithm. Meteorological fields (temperature, wind, and barometric pressure) were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) and the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem). A random forest model was used to regress the available ground measurements with 14 variables, including MAIAC AOD, meteorological variables from WRF-Chem and ECMWF, and land use variables. The overall cross-validation R2 value (and root mean square prediction error) was 0.70 (5.97 µg/m3), comparing predicted to observed ground level data. The mean difference between ground and predicted measurements was -0.09 µg/m3. This regression model was then used to predict daily PM2.5 levels for each km2 grid across Lima. These estimates were then used in epidemiologic analyses, in which daily deaths were aggregated by district, and daily population-weighted average PM2.5 levels were calculated for each district from the 1 km2 data.
On every 16th day throughout the study period, we were unable to estimate PM2.5 due to lack of satellite coverage. Furthermore, PM2.5 estimates for October 15 to December 31, 2015 could not be made because the WRF-Chem model failed to estimate data within reasonable bounds for that period. Hence, we had PM2.5 estimates for 2236 days (91%) out of the 2465 days during the study period. The daily weather data (temperature and relative humidity) also was provided by SENAMHI.
The protocol was approved by Ethics Review Committee of Cayetano Heredia University (SIDISI code 104102)
Statistical analysis:
For each death we had the district of residence, which we used to assign PM2.5 daily exposure (or a lagged exposure) to that death. Using our daily estimates of PM2.5 at a 1 square km resolution, as well as estimated population in that same area, we created daily population-weighted PM2.5 averages by district, which were in turn assigned to all daily deaths in that district. Daily deaths in each district were grouped for a Poisson regression analysis.
We used generalized linear models with Poisson regression to estimate associations between daily district-level PM2.5 levels and daily counts of deaths for the outcomes of interest. PM2.5 effects were assessed using same day (lag 0), previous day (lag 1), two day (lag 2), three days (lag 3) average PM2.5 for two (0-2), three days (0-3) as well as the prior 30 days average; lag 1 was eventually chosen based on superior fit to the data using Akaike’s Information Criteria (AIC). To control for spatially varying factors and allow the analysis to be based on temporal contrasts only, the models included indicator variables for district to represent the geographical area over which deaths counts were spatially aggregated; this also controlled for spatial autocorrelation in the baseline deaths across the districts [23]. The models also included variables included for day of week, daily relative humidity, and maximum daily temperature. We compared two methods for controlling for long-term trends, either via parametric cubic splines with monthly knots, or with variables for month, year and an interaction between these variables (month*year). The latter fit appreciably better via the AIC and was used. The continuous PM2.5 variable was also categorized into quintiles: Q1st: 11.27 – 17.07 µg/m3; Q2nd: 17.08 – 18.60 µg/m3: Q3rd: µg/m3 18.61 – 20.58), Q4th: 20.59 – 25.23 µg/m3; and Q5th: 25.24 - 60.18 µg/m3. Mortality was analyzed as a whole, and also stratified by three age groups (<18, 18-64, 65 years or more). We analyzed combined respiratory and circulatory deaths, and separately, respiratory deaths (ICD10 codes J00-J45), and circulatory mortality deaths (ICD10 codes I00-I99), as well as infectious respiratory disease (IRD)(ICD10 codes 00-J06, J09-J22) and cardiovascular disease (CVD)(ICD10 codes ICD10: I20-I22, I24, I25, I46-I50, I63-I67, I70, I73-I75, I77, I79, G45). Models for several other sub-categories with fewer daily deaths did not converge. Standard errors of coefficients were adjusted for over-dispersion, which generally was very modest. Analyses were conducted using SAS v9.4 PROC GENMOD (SAS Institute Inc., Cary, NC, USA).