Study design
We implemented a longitudinal design, gathering retrospective data about the cohort for the eight-year period from 2012 through 2019. We used data from patients treated at Oregon Health & Science University’s Cystic Fibrosis Care Center (for adults) and Doernbecher Children’s Hospital (for children). The Cystic Fibrosis Foundation Patient Registry (CFFPR) database was used to identify individuals seen at these centers during the target period. Individuals were eligible for the study if their record was present in the CFFPR at any time between 2012 and 2019 and if they had provided a valid zip code to the registry. All patients with CF seen at both centers consented to be included in the CF Foundation’s Patient Registry. The primary endpoint was the number and frequency of severe pulmonary exacerbations (PEx). “Severe pulmonary exacerbation” was defined as a clinical event with respiratory symptoms requiring medical intervention (home IV antibiotics or hospitalization). Episodes occurring within two weeks of a previous exacerbation were assumed to be part of the same episode of care.
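The two-week rule for collapsing exacerbations into episodes of care can be sketched as follows. This is an illustrative sketch, not the study code: the function name `merge_episodes` is hypothetical, and it assumes that each event is compared against the most recent prior event and that a gap of exactly 14 days still counts as the same episode.

```python
from datetime import date, timedelta

def merge_episodes(start_dates, window_days=14):
    """Collapse exacerbation dates into distinct episodes of care.

    Assumption: an event within `window_days` of the immediately
    preceding event belongs to the same episode as that event.
    """
    episodes = []
    previous = None
    for current in sorted(start_dates):
        if previous is None or (current - previous) > timedelta(days=window_days):
            episodes.append(current)  # gap is long enough: a new episode starts
        previous = current  # the next event is compared against this one
    return episodes

# Made-up dates: the first three events chain into one episode.
events = [date(2017, 8, 1), date(2017, 8, 10), date(2017, 8, 20), date(2017, 9, 15)]
episode_starts = merge_episodes(events)
```

Comparing against the episode start rather than the most recent event would be an equally defensible reading of the rule; the text does not say which was used.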
Exposure
Daily fine particulate matter (PM2.5) measurements for the period between 2012 and 2019 were obtained from the EPA’s Air Quality System DataMart, a publicly available database. Exposure was determined by linking confirmed patient addresses to the EPA’s air quality data; based on their residential zip code, individuals were assigned exposure data from the air quality monitor located nearest their home. To identify exposures, daily air quality in an individual’s residential zip code was categorized as either “Acceptable” or “Poor.” These categories were created using the EPA’s Air Quality Index (AQI) definitions, where an AQI over 100, equivalent to a 24-hour PM2.5 reading of 35.5 µg/m3, is considered “unhealthy for sensitive groups” [9]. Individuals residing in areas with a daily AQI beyond the limit set by the EPA as “unhealthy for sensitive groups” were considered exposed.
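As a minimal sketch of the daily categorization, the cutoff can be applied directly to the 24-hour PM2.5 value. The names below are hypothetical, and treating a reading of exactly 35.5 µg/m3 as “Poor” is an assumption based on the equivalence stated above.

```python
# 24-hour PM2.5 value (µg/m3) that the text equates with an AQI over 100.
POOR_PM25_THRESHOLD = 35.5

def classify_air_quality(pm25_24h):
    """Label one monitor-day as "Poor" or "Acceptable".

    Assumption: the boundary value itself counts as "Poor".
    """
    return "Poor" if pm25_24h >= POOR_PM25_THRESHOLD else "Acceptable"

# Made-up daily readings for a single monitor.
readings = [8.2, 35.4, 35.5, 112.0]
labels = [classify_air_quality(value) for value in readings]
```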
Data were limited to dates occurring during a typical wildfire season in the Northwestern United States. Although the exact dates of the wildfire season are event-driven and therefore vary from year to year, wildfires in the Northwest occur almost exclusively during the dry summer months, i.e., between June 1st and September 30th. Additionally, because exacerbations secondary to viral infections are more likely during the cold winter months, focusing on summertime exacerbations allowed us to minimize confounding events that may have been triggered by viral illness rather than by smoke exposure.
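The seasonal restriction amounts to a simple date filter. The sketch below assumes the season bounds (June 1st through September 30th) and study years (2012 through 2019) given in the text; the function name is hypothetical.

```python
from datetime import date

def in_study_window(day, first_year=2012, last_year=2019):
    """True for dates from June 1 through September 30 of a study year.

    June and September both have 30 days, so a month check covers
    exactly the June 1 to September 30 range.
    """
    return first_year <= day.year <= last_year and 6 <= day.month <= 9

# Made-up dates: only the second and third fall inside the window.
candidates = [date(2017, 5, 31), date(2017, 6, 1), date(2017, 9, 30), date(2017, 10, 1)]
kept = [d for d in candidates if in_study_window(d)]
```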
Air quality monitors measure the amount of particulate in the air, but cannot identify the source of the particulate or tell whether air pollutants are coming from vehicles, industrial sources, or wildfire. However, because prior research has found that the substantial majority of fine particulate matter air pollution detected in the Pacific Northwest during wildfire season can be attributed to wildfire [10, 11], the location of our treatment center allowed us to use summer days with poor air quality as an acceptable surrogate for the presence of wildfire smoke. Liu found that in the western United States 71.3% of total PM2.5 could be attributed to wildfires on days with poor air quality [10]. Other research found that in the Pacific Northwest during the 2017 wildfire season, a period included in our study, an estimated 85% of the PM2.5 was due to wildfire [11]. While more efficient vehicles and greater regulations have reduced anthropogenic contributions to air pollution, global warming has caused an increase in the frequency and duration of wildfires. As a result, a greater proportion of air pollution in the northwestern United States is due to wildfire smoke than to anthropogenic sources such as vehicles and industry [12]. Non-wildfire sources of PM2.5, such as vehicle emissions, are more likely to affect air quality in the winter months, when thermal inversions trap particulate matter close to the ground.
Statistical analysis
We used patient-days as the unit of analysis, and matched participants to environmental data to determine whether the likelihood of experiencing an exacerbation was different for those who had been exposed to wildfire smoke-driven air pollution within the previous 30 days. We queried the CFFPR database to find the residential addresses of study participants, which are confirmed annually. To reduce the likelihood of exposure misclassification, we updated the location of participants for every year they participated in the study. The first author created a bespoke k-nearest neighbors algorithm with a ball tree construction to match patients to the air quality monitor nearest them using zip code [13]. Implausible data for age (> 70 years or < 1 year) and outliers for distance to the nearest air quality monitor (> 50 miles) were removed from the dataset prior to analysis.
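The monitor-matching step can be illustrated with a small sketch. It is not the authors' ball tree implementation: for brevity it swaps in a brute-force search over great-circle (haversine) distances, which returns the same nearest monitor but scales less well. The use of zip code centroid coordinates, the function names, and the example coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_monitor(zip_centroid, monitors, max_miles=50.0):
    """Return (monitor_id, miles) for the closest monitor.

    Returns None when the closest monitor is farther than `max_miles`,
    mirroring the 50-mile outlier exclusion described in the text.
    """
    lat, lon = zip_centroid
    best_id, best_dist = None, math.inf
    for monitor_id, (mlat, mlon) in monitors.items():
        dist = haversine_miles(lat, lon, mlat, mlon)
        if dist < best_dist:
            best_id, best_dist = monitor_id, dist
    if best_id is None or best_dist > max_miles:
        return None
    return best_id, best_dist

# Hypothetical monitor locations (approximate city coordinates).
monitors = {"portland": (45.52, -122.68), "eugene": (44.05, -123.09)}
match = nearest_monitor((45.50, -122.70), monitors)
too_far = nearest_monitor((40.00, -100.00), monitors)
```

A ball tree returns the same neighbour while avoiding a full scan of every monitor for every query, which matters once the number of patient locations and monitors grows.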
We tabulated the number of wildfire-exposed patient-days and unexposed patient-days and determined whether each patient-day contained the start of an exacerbation episode; episodes occurring within 30 days following an exposure were considered exposed exacerbations. Both exacerbations and exposure status were treated as binary variables, and a contingency table was created to determine summary statistics [14]. We ran the more conservative Fisher’s exact test to confirm that our result would still be statistically significant even if the assumption of normality was incorrect.
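A minimal sketch of this tabulation for a single patient is given below, together with a two-sided Fisher's exact test computed from the hypergeometric distribution. The function names are hypothetical, and the sketch assumes that a patient-day counts as exposed when a poor-air-quality day falls on that day or on any of the 30 preceding days.

```python
import math
from datetime import date, timedelta

def exposed_on(day, poor_days, lookback_days=30):
    """True if a poor-air-quality day falls on `day` or in the prior 30 days."""
    window = timedelta(days=lookback_days)
    return any(timedelta(0) <= day - poor <= window for poor in poor_days)

def tabulate(patient_days, poor_days, exacerbation_starts):
    """2x2 table: rows exposed/unexposed, columns episode start/no start."""
    table = [[0, 0], [0, 0]]
    for day in patient_days:
        row = 0 if exposed_on(day, poor_days) else 1
        col = 0 if day in exacerbation_starts else 1
        table[row][col] += 1
    return table

def _comb(n, k):
    # Binomial coefficient via factorials (exact integer arithmetic).
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

def fisher_exact_two_sided(table):
    """Sum the probabilities of all tables no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = _comb(row1 + row2, col1)

    def prob(x):
        return _comb(row1, x) * _comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)

# Made-up data: 60 patient-days, one poor day, two episode starts.
days = [date(2017, 8, 1) + timedelta(days=i) for i in range(60)]
table = tabulate(days, {date(2017, 8, 10)}, {date(2017, 8, 5), date(2017, 8, 20)})
p_value = fisher_exact_two_sided(table)
```

For tables with realistic patient-day counts, a library routine such as `scipy.stats.fisher_exact` would be the more practical choice; the factorial-based version here is only meant to make the calculation explicit.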
A binary logistic regression model of the relationship between exposure and exacerbation was used to compute the univariate odds ratio and its associated confidence interval. Additional variables were added to create a multivariate logistic regression model in order to generate adjusted odds ratios. Age and distance to the nearest air quality monitor were treated as continuous variables; zip code, race, insurance status (a proxy for socio-economic status), month, care center at which the patient was typically seen, and patient state of residence were all considered categorical variables. We conducted sensitivity analyses to see which variables influenced the relationship between the exposure and the outcome. Variables were retained in the multivariate model based on significance testing, which was determined with the Wald z value and deviance [15]. Variables that reduced the deviance of the model and had a Wald z ≥ 2 were retained. For adults, only the patient age and year variables were significant enough to be retained in the final multivariate model. Patient age, year, and distance to the nearest air quality monitor were significant for pediatric patients.
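For a single binary exposure, the odds ratio from a logistic regression equals the cross-product ratio of the 2x2 table, and the Wald standard error of its logarithm is the square root of the summed reciprocal cell counts. The sketch below uses that identity to show the univariate calculation; it is not the authors' code, the counts are made up, and the adjusted (multivariate) model requires an iterative fit that is not sketched here.

```python
import math

def univariate_odds_ratio(table, z_crit=1.96):
    """Odds ratio, 95% Wald interval, and Wald z from a 2x2 table.

    Table layout: [[exposed & event, exposed & no event],
                   [unexposed & event, unexposed & no event]].
    All four cells must be non-zero.
    """
    (a, b), (c, d) = table
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)
    wald_z = log_or / se  # statistic compared against the retention cutoff
    return odds_ratio, lower, upper, wald_z

# Made-up counts for illustration only.
odds_ratio, lower, upper, wald_z = univariate_odds_ratio([[20, 80], [10, 90]])
```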
Variables were also tested for multicollinearity using the variance inflation factor (VIF) with a threshold of 2.5. The age variable was found to be multicollinear with care center, most likely because the dataset contains information from two hospitals, one of which treats adult patients ages 21 and older and the other children under the age of 21. To reduce multicollinearity, patient age was used to stratify the subjects into two categories (< 21 years vs. ≥ 21 years) by care center. All statistical analysis was conducted using Python version 3.7.7.
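To illustrate why age and care center would fail a VIF check, the sketch below computes the VIF for the special case of two predictors, where it reduces to 1 / (1 - r²) with r the Pearson correlation between them. The general case regresses each predictor on all the others. The function names and the six made-up patients are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(xs, ys):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

def age_stratum(age_years):
    """Stratify at the 21-year cutoff described in the text."""
    return "adult" if age_years >= 21 else "pediatric"

# Made-up patients: care center is coded 1 for the adult center.
ages = [5, 12, 18, 25, 34, 47]
adult_center = [0, 0, 0, 1, 1, 1]
vif = vif_two_predictors(ages, adult_center)
exceeds_threshold = vif > 2.5
strata = [age_stratum(age) for age in ages]
```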