This is the first study to develop LUR models for multiple cities in a Latin American country, providing small-area estimates of air pollutant concentrations for use in health risk assessments, epidemiological studies of long-term exposure to air pollution, and mitigation evaluation. The LUR models developed to estimate PM2.5 and NO2 concentrations in five of the largest Colombian cities showed moderate to high explained variance. Generally, the models showed higher explained variance for PM2.5 than for NO2. Among the cities, the lowest explained variance was obtained for Bogotá, while the highest was recorded for Medellín and Bucaramanga.
The LUR models for PM2.5 showed relatively small errors in the predicted concentrations (RMSE < 1.7 µg/m3) in the cities, except for Barranquilla. Moreover, the performance of the LUR models developed for PM2.5 was higher than that reported in previous studies in Colombia. Previous LUR models were available only for PM10 and PM2.5 in the city of Medellín, with an explained variability of 79% for PM10 (Londoño & Cañon, 2015) and monthly variations between 26% and 79% for PM2.5 (Grisales, 2020), using data from 2007 and 2018, respectively. Our selected LUR model for PM2.5 in Medellín explained 82% of the variability, the highest of the five cities, using a combination of meteorological, land use, population density and traffic volume variables. The high performance of the LUR models for PM2.5 in Medellín compared to other cities might be explained by the wide range of estimated concentrations in the city and the influence of the topography and meteorology of the Valley of Aburrá, where Medellín is located, as well as the important contribution of vehicular emissions to local concentrations, as has been described in studies of PM2.5 characterization in the city (Area Metropolitana del Valle de Aburrá & Politecnico Colombiano Jaime Isaza Cadavid, 2021). In contrast, the low performance of the LUR models for PM2.5 in Bogotá compared to other cities might be explained partially by the lower contribution of vehicular emissions and the increased contribution of enriched fugitive dust (resuspension of crustal material and soil dust) and secondary PM (Ramírez et al., 2018). A similar profile has also been documented for Barranquilla, with an important contribution of ocean aerosols (Nuñez Blanco, 2019), secondary organic aerosols, and the effect of exposed land resuspension and road dust (Gómez-Plata et al., 2022), which was represented in the LUR model developed for this city.
Additional unexplained variability in PM2.5 concentrations in the cities might be related to contributions from regional wildfires, which have been substantial in northern South America and particularly in Bogotá (Ballesteros-González et al., 2020; Casallas et al., 2022).
The variation in explained variability reported for the Colombian cities is comparable to that of PM2.5 in other Latin American and European countries. In Ecuador, Alvarez-Mendoza et al. (2019) developed LUR models for PM10 using remote sensing data, and the models showed an explained variability of 68% at its highest. Sangrador et al. (2008) developed LUR models for PM2.5 during the rainy season in 2003 for Mexico City, which showed an explained variability of 60%. Later, Son et al. (2018) developed LUR models for the same city for different temporal scales, and the best explained variability for monthly PM2.5 models was 76%. In Europe, the ESCAPE project developed LUR models for PM2.5 in 20 study areas, where the explained variability varied from 35% in Manchester, UK, to 89% in Paris, France (Eeftens et al., 2012).
As expected, the best predictor variables in our LUR models for NO2 were road and traffic variables. However, the performance of the LUR models developed for NO2 was lower than that for PM2.5 and than that reported in previous studies in other countries. In São Paulo, an annual LUR model developed for NO2 explained 66% of the variability in urban concentrations, with variations for the summer (75%) and winter (52%) seasons (Luminati et al., 2021). For Western European countries, Vienneau et al. (2013) developed LUR models for NO2 with and without satellite-based NO2 and obtained explained variability between 48% and 58% without satellite-based NO2 and a modest additional improvement of 5% when adding satellite-based data. Although our NO2 models included several traffic and road variables and metrics, they could not capture more of the variability in concentrations, which suggests that secondary reactions might be an important source of NO2 in the cities. Although our NO2 LUR models explained less variability than other reported city models, LUR models explain more variability than simple road proximity metrics or interpolation methods based on monitoring station data, and similar variability to dispersion models, as demonstrated in previous studies of exposure assessment for epidemiological studies (Allen et al., 2011; de Hoogh et al., 2014; Jerrett et al., 2007).
LUR models have been used in exposure assessment and health research on long-term exposure to air pollutants. By incorporating data on local sources of pollution, such as traffic or industrial activity, these models can provide more accurate and precise exposure estimates than traditional monitoring methods (Hoek et al., 2008). This is particularly important for assessing the health effects of chronic exposure to air pollution, which has been linked to a range of adverse health outcomes, including respiratory and cardiovascular disease, cancer, and neurological disorders (Chen et al., 2013; Herting et al., 2019; Knibbs et al., 2018; Lamichhane et al., 2017; Stafoggia et al., 2022). LUR models can also identify areas with high pollution levels and vulnerable populations, helping to inform policy and intervention strategies to reduce exposure and improve public health (Vienneau et al., 2013).
Alternative methods for estimating surface concentrations of air pollutants have been developed recently using satellite-based models and models based on mobile air pollutant measurements. A study conducted at the municipality level in Colombia compared air quality models based on satellite measurements for PM2.5 between 2014 and 2019. It showed that the Copernicus Atmosphere Monitoring Service Reanalysis (CAMSRA) and the Atmospheric Composition Analysis Group (ACAG) models had a low correlation with, and tended to overestimate, surface concentrations when both models were compared to surface data from 28 cities in 2019. However, ACAG outperformed CAMSRA in terms of mean model bias and the spatial representation of the highest concentrations (Rodriguez-Villamizar et al., 2022). Using a mobile monitoring campaign in the city of Bucaramanga in 2019, within-city spatial variations in ultrafine particle and black carbon concentrations were predicted using a combination of LUR and convolutional neural networks trained on satellite and street-level images, showing that a hybrid approach improves prediction (Lloyd et al., 2021). Following this hybrid approach, our locally developed LUR models can be further used to develop hybrid models with satellite or mobile data, to produce better spatially calibrated models for estimating long-term exposure to PM2.5 and NO2 in the main cities in Colombia, and to explore their potential transferability across cities.
Our study has some strengths that are worth mentioning. First, there was good agreement between the PM2.5 measurements made with UPAS and the concentrations reported by the local monitoring stations in the cities. For NO2, there were too few monitoring sites to conduct a valid comparison in all cities, but data from local government stations in Bogotá showed good agreement with the concentrations measured with the Palmes tubes. Second, we followed the same standardized procedure for measuring pollutants during the two campaigns in each city, and the simultaneous measurement within cities avoids the potential error related to using measurements taken at different times. Third, we included basic predictor variables for developing LUR models in the cities (land use, roads, traffic, population, and meteorology) that are available for cities in Colombia and might be used further to develop multi-city models such as those developed for Europe (Wang et al., 2014).
One limitation of the LUR models developed for the cities is the limited number of sampling sites, which was 20 for PM2.5 and 40 for NO2, except for Bogotá, where the number was doubled. These numbers are below the lower range of recommended monitoring sites (between 80–100) for modeling intraurban variations in complex urban settings using LUR (Basagaña et al., 2012). As a result, the models developed using many predictors might have had more unstable performance, as was observed in the cross-validation. A second limitation of this study is the absence of valid traffic data for the cities during the measurement campaigns; such data have been shown to improve LUR model performance, particularly for NO2 (Beelen et al., 2013). To overcome this limitation, we used traffic speed derived from satellite instruments and previously available traffic count data for the largest cities to calculate density functions, which were then transferred to the other cities to estimate traffic density. Although the density functions seemed to reflect the traffic patterns in the cities and were included as significant predictive variables, their inclusion did not help to explain a higher variability in the models for NO2. Third, we did not include meteorological variables in the development of LUR models for the cities of Bucaramanga and Barranquilla due to the limited number of meteorological stations and data to produce a valid estimated surface. Although the models' performance for PM2.5 was good, particularly for Bucaramanga, including meteorological variables might have increased the models' performance, as they have been reported as important predictors of intraurban variations in other countries (Cheewinsiriwat et al., 2022; Olvera Alvarez et al., 2018). Another limitation of our study is that we did not include local emission sources and regional sources (such as forest fires) in the prediction models.
These variables have been shown to influence the concentration of particles in the cities (Casallas et al., 2022). Moreover, street-level NO2 concentrations may vary with building density or location, which influence dispersion. Also, some atmospheric chemical reactions may reduce or transform NO2 concentrations. In urban areas, traffic within a radius of 100–300 m has been correlated with NO2 concentrations, although the high reactivity of NO2 and its rapid photodissociation may transform this pollutant within a short period (Agudelo-Castañeda et al., 2020).