Data sources
We studied the synergistic effects of two climate-related hazards, extreme heat and wildfire smoke, in California ZIP codes from 2006 to 2019 that satisfied the following criteria: 1) having a population larger than 1000 in the 2010 U.S. decennial census (statistical power consideration); and 2) having at least one day in each of the four exposure categories (required for ZIP code-specific effect estimates; details in the climate hazards section below). We chose the ZIP code as the spatial unit for analysis given the availability of health data.
Health outcome
We calculated the ZIP code-specific daily sum of unscheduled hospitalizations by cause using the Patient Discharge Data and Emergency Department Data collected by the California Department of Health Care Access and Information55. We estimated the counts of cardiorespiratory hospitalization, the sum of circulatory and respiratory hospitalizations, based on primary diagnosis codes (see Supplementary materials for the list of codes included to identify the cause-specific hospitalizations).
Climate hazards
We identified wildfire smoke ZIP code-days as days in which the daily wildfire-specific PM2.5 concentration in the ZIP code is higher than a pre-specific threshold. We explored four threshold values—>0 µg/m3, 5 µg/m3 (2021 air quality guideline for annual PM2.5 by World Health Organization48), 15 µg/m3 (2021 air quality guideline for 24-hour PM2.5 by World Health Organization48), and 35 µg/m3 (National Ambient Air Quality Standard for 24-hour PM2.5 by U.S. Environmental Protection Agency56). Although adverse health impacts were observed below such thresholds57,58, maintaining PM2.5 concentrations lower than these threshold values are considered sufficient to protect public health from a regulatory perspective and serves as a good starting point to dichotomize the wildfire-specific PM2.5. Most local air quality management agencies also issue air quality warnings or promote “action days” if the standards were exceeded23. The daily time-series dataset of ZIP code-specific wildfire-specific PM2.5 concentration was previously developed and details were reported elsewhere59. Briefly, this dataset estimated daily total and non-wildfire-specific PM2.5 concentration using an ensemble model-based novel multiple imputation approach with a prediction R2 of 0.78 in the test set by incorporating data from U.S. Environmental Protection Agency’s Air Quality System monitors, aerosol optical depth from NASA satellite measurements, smoke plume observations from the NOAA Hazard Mapping System, meteorological variables from the Gridded Meteorological (GRIDMET) reanalysis product, and other land-use variables from the National Land Cover Database.
We identified extreme heat ZIP code-days as days in which the daily maximum heat index of the ZIP code is higher than a ZIP code-specific threshold, which is similar to the definition of heatwave day threshold used by the U.S. Environmental Protection Agency49. We calculated the ZIP code-specific threshold as the 85th or 95th percentile of the summertime (July and August) daily maximum heat index from the year 1980 to the year 2005 in the ZIP code, which captured unusually hot days in each ZIP code and allowed adaptation and acclimation to local climates. In other words, this definition emphasizes the influence of climate change in recent decades by allowing ZIP codes with larger increases in heat index during the study period compared to the historical period to have more extreme heat days. Besides, we utilized the daily maximum heat index as the threshold, which combines maximum absolute temperature and minimum relative humidity, proxies biological heat stress, and is of higher health relevance than absolute temperature60,61. We obtained daily temperature and humidity from the GRIDMET reanalysis product at a 4km resolution for the calcualtion62. Since most heat index algorithms produce numerically similar results, we choose the National Weather Service’s Weather Prediction Center’s adaptation of the Rothfusz regression model to calculate heat index63,64. To avoid implausible threshold values, we also replaced ZIP code-specific threshold values higher than 105°F or lower than 80°F with absolute cutoff values of 105°F or 80°F to define extreme heat days47. Details of the implementation of this algorithm in Google Earth Engine were reported elsewhere47.
Combining designations of extreme heat ZIP code-day and wildfire smoke ZIP code-day, we categorized ZIP code-days within California into four types of ZIP code-days: extreme heat alone, wildfire smoke alone, compound exposure to both hazards and no exposure to either hazard (control/unexposed). These exposure categories were used in selecting ZIP codes for the study and in ZIP code-level matched design to evaluate spatial heterogeneity in synergistic effects. We summarized spatial distribution and the total number for the three types of exposed days.
Community characteristics
To explore the effect modification of spatial heterogeneities in synergistic effects from extreme heat and wildfire smoke by community characteristics, we obtained ZIP code level community characteristics from the U.S. Decennial Census in 201065 and the Healthy Places Index (HPI) report version 3.066,67. All community characteristics were coded in a way that higher values indicate better socioeconomic status or fewer racial and ethnic minority residents. Specifically, we included variables for race/ethnicity (proportion of White, Black, Asian, Latino, Native American, Pacific Islander), employment (proportion of employment among those ages 20–64), education attainment (proportion of 25 and older with a bachelor degree or higher), health insurance coverage (proportion of insured among those aged 18–64), income (proportion of the population with an income that is greater than 200% of the federal poverty level and per capita income in U.S. dollars), population density per 10,000 population, assets and utilities (percentage of households with central air conditioning [AC] or access to an automobile), and tree canopy coverage (population-weighted percentage of area with tree canopy). Most socioeconomic variables are based on the averages of the American Community Survey in 2015–2019 summarized by the HPI while racial/ethnical variables are based on the Decennial Census survey in 2010. Details of all data sources are summarized in eTable 4. Since California has a highly varied climate with drastically different average temperatures, we also obtained the designation of ZIP codes into 16 climate zones in California from the Pacific Energy Center for stratified analysis68.
Although the HPI provided community characteristics at ZIP code level, they did not provide details on their crosswalk from the census tract (the spatial unit where most survey data was measured) to the ZIP code (the spatial unit for our health outcome). As a sensitivity analysis, we calculated the ZIP code level community characteristics manually from census-tract values except for population density and AC prevalence, which were available at ZIP code level from other data sources (see eTable 4). We downloaded the census tract-level HPI data for the relevant indicators67, census tract population sizes from the U.S. Decennial Census in 201065, and the U.S. Department of Housing and Urban Development crosswalk file69. Specifically, we calculated the ZIP code-level values as weighted averages of census tract-level values. The weight is the census tract population within the ZIP code, which was calculated as the product of the census tract population and the ratio of the census tract addresses that fall within the ZIP code. For automobile ownership, we created an ordinal variable with ten categories based on deciles of ZIP code’s automobile ownership percentile among census tracts within each ZIP code.
Statistical analyses
State-level case-crossover analysis
We first evaluated the synergistic effect of extreme heat and wildfire smoke on the risk of daily cardiorespiratory hospitalizations at the state level using the time-stratified case-crossover design and the metric relative excess risk due to interaction (RERI). The time-stratified case-crossover design is a widely used method in evaluating short-term health impacts of environmental hazards, which compared exposures in days with an event to exposures in the same weekdays in other weeks of the month without the event to account for time-invariant confounding and estimates odds ratio (OR)70,71. An OR larger than one indicates an increase in the risk of hospitalization from exposure to environmental hazards. RERI quantifies the relative difference between the joint effect of two co-occurring hazards (extreme heat and wildfire smoke) and the sum of individual effects of the two hazards72. A positive RERI value indicates an increase in risk due to interaction between the two hazards. We used > 15 µg/m3 daily wildfire-specific PM2.5 concentration as the threshold for wildfire smoke and 85th percentile historical daily maximum heat index as the threshold for extreme heat in the definition of hazards for main analysis.
Specifically, for each ZIP code-day that had cardiorespiratory hospitalizations, we first identified controls as the same weekdays from other weeks of the same month and year in the same ZIP code. Next, we applied conditional logistic models to estimate the statewide OR for the effect of extreme heat alone, wildfire smoke alone, and the combined exposure to both hazards on cardiorespiratory hospitalizations by including an interaction term between indicators for either hazard. The conditional logistic model accounted for the matching procedure by comparing the exposures of each matched case and control set and calculating a weighted average across all matched sets. The OR approximates the effect of the climate hazard (compound or individual) on the risk of hospitalization; in other words, an OR larger than one indicates a detrimental effect of the climate hazard (compound or individual) on the risk of hospitalization. Since this study design controls for covariates that do not vary markedly within a month, such as age, race, and sex, as well as time-varying covariates like season73,74, we did not include covariates other than indicators for the hazards of interest and an interaction term between the two hazards. Effects of individual hazard alone (ORH0S1 and ORH1S0) were exponentiated coefficients from the conditional logistic model, while the joint effect of both hazards (ORH1S1), or effect of compound climate hazards, was calculated as the exponentiated sum of three coefficients (two for individual hazards and one for the interaction term) in the conditional logistic model. Since assessing interaction on the additive scale has been recommended as the appropriate scale to inform potential public health benefits of hypothetical interventions72,75, we evaluated whether a synergistic effect exists between extreme wildfire smoke and heat on the additive scale using the RERI in our main analysis. We calculated the RERI as (ORH1S1 -1) - (ORH0S1-1) - (ORH1S0-1), where subscript H1 indicates days with extreme heat and S0 indicates days without wildfire smoke. We also assessed interaction on the multiplicative scale using the ratio of the odds ratio (exponentiated coefficient for the interaction term) as a secondary analysis. We calculated the 95% confidence intervals for RERI and the joint effect using the delta method72. We used the “survival” for conditional logistic regression and the “msm” package for the delta method76,77.
ZIP code-level matched design with spatial Bayesian hierarchical model
Since the state-level analysis assumed the same effect of extreme heat and wildfire smoke (individual, joint, or synergistic) across all ZIP codes while spatial heterogeneity might exist, we applied a modified version of the previously developed method that utilizes within-community matched design to estimate ZIP code-specific effects and spatial Bayesian hierarchical model (BHM) to incorporate considerations of spatial autocorrelation78.
In the within-community matched design, we quantified the ZIP code-specific effects of the climate hazards on cardiorespiratory hospitalizations for each ZIP code separately. For each ZIP code, we first identified matched controls for three types of exposed days separately (extreme heat alone, wildfire smoke alone, and compound exposure to both hazards). For each exposed day, controls are days of the same year and ZIP code, not exposed to either hazard and within the window of 30 calendar days before or after the exposed day78. To reduce the spill-over effect, we excluded days within three days of any exposed days (either extreme heat or wildfire smoke) from the potential controls79,80. Next, we calculated the rate ratio for each exposed day as the ratio of hospitalization counts in the exposed day divided by the weighted average of the counts of hospitalization in all selected control days, with weights equal to the inverse distance to the exposed day (i.e., 1/number of days to exposed day). Since this study design resembles a matched cohort design, we directly calculated the ZIP code-specific rate ratios for extreme heat alone (RRH1S0), wildfire smoke alone (RRH0S1), and combined exposure to both hazards (RRH1S1) as the average rate ratios from all corresponding pairs of matched exposed days and control days in each ZIP code. We used the ratio of the hospitalization counts to approximate the rate ratio because the population size of a ZIP code is unlikely to change dramatically within two months. Last, we calculated the ZIP code-specific synergistic effects, RERI, as (RRH1S1 -1) - (RRH0S1-1) - (RRH1S0-1). This within-community matched design accounted for potential confounding that does not vary remarkably within two months by design (e.g., population composition regarding sex and age) while giving more weight to control days closer to the exposed day in calculation reduced the potential influence of seasonal trend. We removed ZIP codes from the next-step analysis if the estimation of individual or joint effects failed.
Since areas closer together are more likely to be exposed to the same wildfire smoke or extreme heat events, we expect the ZIP code-specific RERIs to be more similar among closer ZIP codes. To increase the precision of our ZIP code-specific estimates, we used a spatial BHM to leverage this spatial autocorrelation in our estimates. The theoretical development of this model was described elsewhere78. Based on the empirical semivariogram, we selected the spherical shape for the covariance structure and specified starting values of the covariance structure as 2, 0.75, and 2 for sill, nugget, and range, respectively. We also specified the priors as inverse gamma distributions (2 for shape and 1/starting value for scale) for the sill and nugget, and uniform distribution (0.001 to 6) for the range; the tuning parameters as one-twentieth of the starting values. We used 10,000 Monte Carlo Markov chain samples with 75% burn-in to estimate parameters and sample weights in the spatial BHM. We recovered sample weights in the BHM to obtain pooled ZIP code-specific RERIs, which incorporated information from spatial autocorrelation into the pre-pooling ZIP code-specific RERIs. We also computed the signal-to-noise ratio (SNR) as the ratio between the mean of the recovered samples (either for the parameters or for the weights) and the corresponding standard deviation of the recovered samples, representing the precision of the estimates. This method assumes isotropy (i.e., the same spatial relationship in all directions). We used the “spBayes” package in R for the spatial BMH81.
Effect modification by community characteristics
To evaluate the potential effect modification of community characteristics on the synergistic effect, we ran meta-regressions of the pooled ZIP code-specific RERIs on each community characteristic selected separately. We reported results as the increase in RERI per interquartile range change in the community characteristics. For AC prevalence, we also evaluated the effect modification for each climate zone separately because AC prevalence varies dramatically across climate zones in California (eFigure 4A). We conducted meta regressions using the “meta” package82.
Sensitivity analyses
To evaluate the robustness of our state-level estimates towards the inclusion of ZIP codes, we conducted the time-stratified case-crossover analysis using all California ZIP codes (1772) after removing two ZIP codes due to exposure data missingness (one for extreme heat and the other for wildfire smoke). We also explored seven combinations of hazard definitions other than the definition used in the main analysis (threshold of > 0 µg/m3, 5 µg/m3, 15, and 35 µg/m3 daily wildfire-specific PM2.5 concentration for wildfire smoke; threshold of 85th and 95th percentiles of historical daily maximum heat index for extreme heat). Last, we conducted the above analyses using the exposure of the previous day (lag1).
For ZIP code-specific RERIs, we explored three different within-community matched designs with varying assumptions (yearly weighting, monthly Poisson, and yearly Poisson) other than the within-community matched design described in the main analysis (monthly Poisson). The first part of the method name describes the control identification method while the second part describes the estimation method. Specifically, in the “yearly” methods, we adopted the control identification method used by Liu et al. 2017 studying wildfire smoke events79. We identified matched controls for each exposed day as days of the same ZIP code, not exposed to either hazard and within the window of seven calendar days before or after the exposed day in a different year. This control identification method provides better control for seasonal trends by design than the “monthly” method used in the main analysis but might experience more inter-year confounding. For “yearly weighting”, the calculation of the rate ratio is the same as the main analysis except that the unit for the distance between matched control and the exposed day is year instead of day. We also adopted a new rate ratio calculation method modified from Liu et al. 201779, which takes a random sample of four controls from the matched controls of each exposed day and runs a Poisson regression of the count of hospitalization on an indicator of exposed among all matched pairs in each ZIP code (“Poisson”). Compared to the weighting method that gives more weight to days closer to the event, the Poisson method treats all matched controls equally. We leveraged spatial information into these estimations through the spatial BHM while restricting the analyses to ZIP codes that succeeded in all four matched designs and had a plausible RERI (< 50). We also evaluated effect modification by community characteristics for all matched designs explored.
Last, we evaluated the robustness of the combination of our spatial BHM and effect modification evaluation method by conducting three analyses other than the main analysis: 1) we used the HUD crosswalk dataset for meta-regression to confirm the robustness towards community characteristic dataset (HUD crosswalk); 2) we ran the spatial BHM using flat priors for the sill and nugget to introduce minimal prior information into the Bayesian model (inverse gamma distributions with shape and scale equal to 0.001) (flat prior); and 3) we ignored the potential spatial autocorrelation and ran linear regressions of ZIP code RERI from the within-community matched design and community characteristics (non-spatial linear).
All analyses were performed in R Studio with R version 4.1.083. The R code to replicate these analyses is available at the following link: https://github.com/suthlam/synergistic_effect_wildfire_extremeheat_ca.git