Study design
The P4O2-COVID-19 study is a multicenter, prospective cohort study of 95 patients with a confirmed SARS-CoV-2 infection. The full details of patient recruitment are described elsewhere (Baalbaki et al., 2023). In short, patients were recruited from five hospitals across the Netherlands. Participants attended two study visits: the first took place three to six months after the acute infection and the second approximately nine months after the first. Written consent was obtained from each participant during the first study visit. Several measurements were collected during these hospital visits, and between the two visits personal and residential PM2.5 measurements were performed in the participants' home environment. The current paper focuses on the residential outdoor PM2.5 measurements.
The deployed sensor was developed by the private company SODAQ and is equipped with a Sensirion SPS30 dust sensor to measure PM2.5 and a Bosch BME680 sensor to capture ambient temperature and relative humidity (SODAQ, 2021). The Sensirion sensor measures PM2.5 using a light-scattering principle (Sousan et al., 2021). The manufacturer calibrated the sensor before use. PM2.5 concentration was reported in µg/m3.
The PM2.5 sensors were adapted for continuous stationary monitoring, which included connection to the participants’ household power supply, and were deployed outside a window of the participants’ homes. An image of the sensor attached to an outside window is shown in Figure S1 (Appendix A). The sensor transmitted data every minute via GSM to Utrecht University, where the data were centrally collated. Some data were lost due to server disruptions; the affected dates and timeframes are provided in Table S1 in Appendix B. Details on sensor placement were recorded, such as the floor level and the direction the sensor was facing (e.g. northeast).
Sensor data management
From February 2022 to May 2023, 73 sensors were deployed. The average active time per sensor was 131 days (min. 1.5 days, max. 300 days), allowing temporal trends and seasonal variations to be examined. Overall, the deployed sensors produced a dataset of 14,370,411 data rows, each containing a PM2.5, temperature, and humidity measurement at a one-minute resolution.
Figure 1 presents the sensor data cleaning process and its various steps. After data cleaning, 95.7% (13,698,302 1-minute observations) of the original dataset remained.
Because the sensors had an internal battery, they continued to transmit data after disconnection from the power source, while being transported back to the laboratory. As these data points do not reflect residential exposures, they were removed (186,797 rows, 1.30% of total). Next, data rows with erroneous dates and times (i.e., outside the deployment period) were removed, excluding N=2,247 data rows (0.016% of total). Data logs were then reviewed for sensor errors (an overview of all error codes and their meaning can be found in Table S2 in Appendix C). Rows with any error code other than 0 (no error) or 32 (GPS error, irrelevant because only stationary measurements were collected) were eliminated, removing 405,771 data rows (2.86% of total). Erroneous measurements, defined as one-minute PM2.5 values of 500 µg/m3 or higher or registered as 0 µg/m3, were also removed, as both were considered very unlikely in the Netherlands based on expert elicitation. This led to the removal of 8,487 rows with PM2.5 values of 500 µg/m3 or higher (0.062%) and 1,674 rows with a value of 0 µg/m3 (0.012%). Further cleaning involved the removal of rows with temperatures exceeding 50°C or falling below -20°C, eliminating 5,055 (0.037%) and 2,978 (0.022%) data rows, respectively. We also removed data points exhibiting a tenfold difference from both the preceding and the subsequent value, as such high temporal variance indicates signal noise (i.e. erroneous “spikes”). This resulted in the exclusion of 1,315 data rows (0.0096%). During the performance evaluation of each sensor, one sensor showed declining performance over time, with markedly higher measurements recorded after 14-09-2022 (Figure S2, Appendix D). Data following this timepoint were removed (N=48,827 rows of data, 0.35% of total). No similar patterns were detected in the remaining sensors.
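The row-level checks described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names (`pm25`, `temp`, `error_code`) are hypothetical, the thresholds are those reported in the text, and the "tenfold difference" rule is interpreted here, as an assumption, as a value at least ten times higher than both of its neighbours.

```python
from typing import List

# Thresholds taken from the text; field names are hypothetical.
VALID_ERROR_CODES = {0, 32}        # 0 = no error, 32 = GPS error (kept)
PM_MAX = 500.0                     # µg/m3; values at or above this are dropped
TEMP_MIN, TEMP_MAX = -20.0, 50.0   # °C


def passes_row_checks(row: dict) -> bool:
    """Return True if a single one-minute record survives the range checks."""
    if row["error_code"] not in VALID_ERROR_CODES:
        return False
    if row["pm25"] == 0 or row["pm25"] >= PM_MAX:
        return False
    if row["temp"] < TEMP_MIN or row["temp"] > TEMP_MAX:
        return False
    return True


def is_spike(prev: float, cur: float, nxt: float, factor: float = 10.0) -> bool:
    """Assumed spike rule: value is >= factor times both neighbouring values."""
    return cur >= factor * prev and cur >= factor * nxt


def clean(rows: List[dict]) -> List[dict]:
    """Apply the range checks first, then drop isolated spikes."""
    kept = [r for r in rows if passes_row_checks(r)]
    out = []
    for i, r in enumerate(kept):
        has_neighbours = 0 < i < len(kept) - 1
        if has_neighbours and is_spike(kept[i - 1]["pm25"], r["pm25"], kept[i + 1]["pm25"]):
            continue
        out.append(r)
    return out


# Synthetic example records (not study data).
rows = [
    {"pm25": 5.0, "temp": 12.0, "error_code": 0},
    {"pm25": 80.0, "temp": 12.0, "error_code": 0},   # spike relative to 5 and 6
    {"pm25": 6.0, "temp": 12.0, "error_code": 32},   # GPS error code is kept
    {"pm25": 0.0, "temp": 12.0, "error_code": 0},    # zero reading
    {"pm25": 7.0, "temp": 55.0, "error_code": 0},    # temperature out of range
    {"pm25": 8.0, "temp": 12.0, "error_code": 4},    # other error code
    {"pm25": 600.0, "temp": 12.0, "error_code": 0},  # implausibly high PM2.5
    {"pm25": 9.0, "temp": 12.0, "error_code": 0},
]
cleaned = clean(rows)
```

In this sketch the spike check compares each value with its neighbours after the range checks have been applied; whether the original analysis used the raw or the filtered neighbours is not stated in the text.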
Lastly, sensors with fewer than 14 days of data were removed, which excluded three sensors with a combined total of 8,328 data rows (0.06%). After these steps, the final dataset contained 13,698,302 (95.7% of the original) rows of data.
Figure 1: Flow chart of the sensor data cleaning process
Data analysis
The accuracy of the sensor data was evaluated by comparing it to measurements from the national air quality monitoring network (LMN), which is overseen by the National Institute for Public Health and the Environment (RIVM) and provides PM2.5 concentrations at an hourly resolution. Figure 2 shows the approximate locations of the sensors (circles) and the official monitoring stations (squares) in the Netherlands. In all analyses, the nearest monitoring station was used. Before comparing the sensor data to the measurements from the regulatory stations, we explored the influence of humidity and temperature on the sensor data. We ran log-linear regression models to evaluate whether the PM2.5 signal was affected by either of these variables. We used the sensor's internal ambient temperature and relative humidity measurements and, as a second source, meteorological data from the Royal Netherlands Meteorological Institute (KNMI). We also regressed the PM2.5 concentrations measured at the regulatory stations on temperature and humidity, because the actual PM2.5 concentration is itself influenced by meteorological conditions, including temperature and relative humidity. The results of these models, summarized in Table S7 in Appendix G, show small β values and low adjusted R² values for ambient temperature and relative humidity, suggesting that the PM2.5 signal is not significantly influenced by either variable. Therefore, the sensor data were not corrected for ambient temperature or relative humidity. The sensor data were compared with the official monitoring data in several ways, using daily and hourly PM2.5 averages. The sensor data, transmitted every minute, were converted into hourly and daily averages using the arithmetic mean to enable comparison with the LMN data.
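The aggregation from one-minute values to hourly and daily arithmetic means can be sketched as follows. This is an illustration under assumptions: the helper `average_by` and the example values are hypothetical, daily means are computed directly from the minute values, and no minimum data-completeness requirement is applied because the text does not state one.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

HOURLY = "%Y-%m-%d %H:00"   # bucket key for hourly averages
DAILY = "%Y-%m-%d"          # bucket key for daily averages


def average_by(records, fmt):
    """Group (timestamp, value) pairs by a strftime key and take the arithmetic mean."""
    buckets = defaultdict(list)
    for ts, value in records:
        buckets[ts.strftime(fmt)].append(value)
    return {key: fmean(vals) for key, vals in sorted(buckets.items())}


# Synthetic one-minute PM2.5 values in µg/m3 (not study data).
minute_data = [
    (datetime(2022, 3, 1, 10, 0), 4.0),
    (datetime(2022, 3, 1, 10, 1), 6.0),
    (datetime(2022, 3, 1, 11, 0), 10.0),
    (datetime(2022, 3, 2, 9, 30), 20.0),
]

hourly = average_by(minute_data, HOURLY)
daily = average_by(minute_data, DAILY)
```

Averaging the minute values directly gives a daily mean that can differ slightly from the mean of the hourly means when hours contain unequal numbers of observations; which variant was used in the study is not specified.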
Both daily and hourly averages were used to construct boxplots, and daily averages were used to construct time-series graphs. As our participants were located in the provinces of North Holland, South Holland, Flevoland, Limburg, and Zeeland (Figure 2), only the LMN measuring stations from those areas were used. A complete overview of the LMN stations used in the analysis, including their locations and characteristics, can be found in Table S3 in Appendix E. Summarized sensor measurements were compared against the nearest station (hourly), the nearest background station (hourly and daily), and the daily average of all background stations in the study provinces. The latter was used because the Netherlands consists of one airshed, meaning that the dispersion and movement of air pollutants are influenced by similar atmospheric conditions across the entire country (Strickland et al., 2011). Background network stations were analyzed separately because they best matched the residential locations of the deployed sensors: few of our residential sensors were placed in industrial or heavy-traffic locations.
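Selecting the nearest station, or the nearest background station, for a sensor can be sketched as below. The text does not state how distance was computed; great-circle (haversine) distance is assumed here, and the station identifiers, coordinates, and the `type` field are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(sensor, stations, background_only=False):
    """Return the closest station, optionally restricted to background stations."""
    candidates = [
        s for s in stations
        if not background_only or s["type"] == "background"
    ]
    return min(
        candidates,
        key=lambda s: haversine_km(sensor["lat"], sensor["lon"], s["lat"], s["lon"]),
    )


# Hypothetical stations and sensor location (not real LMN stations).
stations = [
    {"id": "A", "type": "traffic", "lat": 52.11, "lon": 5.11},
    {"id": "B", "type": "background", "lat": 52.30, "lon": 5.00},
    {"id": "C", "type": "background", "lat": 51.50, "lon": 4.50},
]
sensor = {"lat": 52.10, "lon": 5.10}

closest_any = nearest_station(sensor, stations)
closest_background = nearest_station(sensor, stations, background_only=True)
```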
For each individual sensor comparison, the Pearson correlation coefficient, the average difference, the average absolute difference, and the average distance to the LMN station were calculated. The average difference was obtained by subtracting the LMN PM2.5 values from the sensor values. The distribution of individual Pearson correlation coefficients and sensor-network differences was evaluated. The agreement between the sensors and LMN stations was further evaluated visually using Bland-Altman plots. Plots were constructed for the sensor versus the closest LMN measuring station and for the sensor versus the closest LMN background station, each using hourly and daily measurements, resulting in four plots.
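The per-sensor comparison statistics can be sketched as follows. The difference is defined as sensor minus LMN, as in the text. The Bland-Altman limits of agreement (mean difference ± 1.96 standard deviations) follow the usual convention and are an assumption here, since the text only mentions that the plots were inspected visually. Function names and example values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def agreement(sensor, reference):
    """Summarize agreement between paired sensor and reference averages."""
    diffs = [s - r for s, r in zip(sensor, reference)]  # sensor minus LMN
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "pearson_r": pearson_r(sensor, reference),
        "mean_diff": bias,
        "mean_abs_diff": fmean([abs(d) for d in diffs]),
        "loa_lower": bias - 1.96 * sd,  # conventional Bland-Altman limits
        "loa_upper": bias + 1.96 * sd,
    }


# Synthetic paired daily averages in µg/m3 (not study data).
sensor_daily = [10.0, 12.0, 14.0, 16.0]
lmn_daily = [8.0, 11.0, 12.0, 15.0]

stats = agreement(sensor_daily, lmn_daily)
```

In a Bland-Altman plot, each pair would be drawn with the mean of the two measurements on the horizontal axis and their difference on the vertical axis, with horizontal lines at `mean_diff`, `loa_lower`, and `loa_upper`.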
Sensitivity analyses were conducted to account for potential variations resulting from the sensor's deployment at different heights (floor level of the house), micro-locations (e.g. facing the backyard or street), and traffic intensities. For all analyses, daily averages were compared against the nearest background site. The sensitivity analyses using hourly data and comparisons using the average of all background stations are available in Tables S4-S6 in Appendix F. The dataset was stratified by floor level, sensor direction, and traffic intensity, allowing potential differences between the specific deployment conditions to be examined. Floor levels were divided into a binary classification: ground level versus higher floors (Zauli Sajani et al., 2018). Sensor direction was categorized as either facing the street or the backyard. The average number of vehicles per hour on the nearest road was calculated to assess traffic intensity. As most sensors were deployed in areas with a relatively low traffic intensity, a cut-off point based on the 75th percentile was used, which was 580 vehicles per hour. Sensors in locations with fewer than 580 vehicles per hour on the nearest road were compared to those with more than 580 vehicles per hour on the nearest road.
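The stratification used in the sensitivity analyses can be sketched as below. The 580 vehicles-per-hour cut-off is the value reported in the text; the field names, the coding of ground level as floor 0, and the assignment of a sensor at exactly 580 vehicles per hour to the low-traffic group are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict

TRAFFIC_CUTOFF = 580  # vehicles per hour; 75th-percentile cut-off from the text


def strata(sensor: dict) -> Dict[str, str]:
    """Assign a sensor to its floor, direction, and traffic stratum."""
    return {
        # Assumption: floor_level 0 denotes ground level.
        "floor": "ground" if sensor["floor_level"] == 0 else "higher",
        "direction": sensor["direction"],  # "street" or "backyard"
        # Assumption: exactly 580 vehicles per hour falls in the low group.
        "traffic": "high" if sensor["vehicles_per_hour"] > TRAFFIC_CUTOFF else "low",
    }


def group_ids(sensors, factor):
    """Group sensor ids by one stratification factor."""
    groups = defaultdict(list)
    for s in sensors:
        groups[strata(s)[factor]].append(s["id"])
    return dict(groups)


# Hypothetical sensors (not study data).
sensors = [
    {"id": "s1", "floor_level": 0, "direction": "street", "vehicles_per_hour": 120},
    {"id": "s2", "floor_level": 2, "direction": "backyard", "vehicles_per_hour": 900},
    {"id": "s3", "floor_level": 1, "direction": "street", "vehicles_per_hour": 580},
]

by_traffic = group_ids(sensors, "traffic")
by_floor = group_ids(sensors, "floor")
```

The agreement statistics would then be computed separately within each group and compared across groups.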
Figure 2: Map of the Netherlands showing the sensor measurement locations (circles) and the official LMN measurement stations (squares). For privacy reasons, an offset has been applied to the sensors' locations.