Estimating visibility and understanding factors influencing its variations at Bangkok airport using machine learning and a game theory-based approach

doi:10.21203/rs.3.rs-4104582/v1


Research Article


https://doi.org/10.21203/rs.3.rs-4104582/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 05 Aug, 2024

Read the published version in Environmental Science and Pollution Research →

You are reading this latest preprint version

In this study, a range of machine learning (ML) models, including random forest, adaptive boosting, gradient boosting, extreme gradient boosting, light gradient boosting, cat boosting, and a stacked ensemble model (SEM), were employed to predict visibility at Bangkok airport. Furthermore, the impact of influential factors was examined using the Shapley method, an interpretable ML technique based on cooperative game theory. Air pollutant data from seven Pollution Control Department monitoring stations, visibility and meteorological data from the Thai Meteorological Department's weather station at Bangkok airport, meteorological data from the ERA5-Land and ERA5 datasets, and time-related dummy variables were considered. Daytime visibility (here, 8–17 local time) was screened for rainfall, and ML models were developed for visibility prediction during the dry season (November–April). The light gradient boosting model was identified as the most effective individual ML model, with superior performance in three out of four evaluation metrics (i.e., highest ρ, zero MB, second lowest ME, and lowest RMSE). However, the SEM outperformed all the individual models in visibility prediction at both hourly and daily time scales. The seasonal mean and standard deviation of meteorologically normalized visibility are lower than those of the original visibility, indicating that meteorology had more influence than emission reduction on visibility improvement. The Shapley analysis identified relative humidity (RH), PM_2.5, PM₁₀, day of the season year, and O₃ as the five most important variables. At low RH, RH has no notable impact on visibility; beyond a threshold, however, RH and visibility are negatively correlated. An inverse correlation between visibility and both PM_2.5 and PM₁₀ was identified. Visibility is negatively correlated with O₃ at low to moderate concentrations, with diminishing impact at very high concentrations. The day of the season year (referred to as Julian day, JD) exhibits an initially negative and later positive association with visibility, suggesting a periodic effect. The dependence of the Shapley values of PM_2.5 and PM₁₀ on RH, together with the equal step-size method used to examine RH effects, suggests an effect of hygroscopic growth of aerosols on visibility. Findings from this research suggest the feasibility of employing ML techniques for predicting visibility and understanding the factors influencing its fluctuations. Based on these findings, certain policy-related implications and directions for future work are suggested.

Visibility

Meteorological Normalization

Explainable Machine Learning

Shapley Value

Hygroscopic growth

Atmospheric visibility refers to the maximum horizontal distance at which the human eye can perceive and recognize a prominent dark object against its background with sufficient visual contrast (Watson 2002). It is routinely measured at standard surface weather stations. Visibility is an important meteorological parameter because its degradation can affect various modes of transportation (e.g., traffic congestion, delayed flights, and on-road accidents) and reduce tourism and recreational activities by impairing scenic views (Watson 2002). In a polluted environment, visibility degradation is mainly caused by the scattering and absorption of light by particulate matter and gaseous pollutants in the atmosphere. Because of this direct link with air pollution, visibility is used as a surrogate for particulate matter (PM) in long-term air quality studies when long-term air quality data are not available (Aman et al. 2019; Singh et al. 2017). In addition to air pollutants, visibility also depends, directly or indirectly, on various meteorological variables (Aman et al. 2019, 2023; Singh et al. 2017; Majewski et al. 2022). As most aerosols are hygroscopic, an increase in humidity can increase aerosol size, which in turn increases light scattering. Other meteorological variables affect visibility indirectly by affecting aerosol concentrations (Watson 2002). Greater Bangkok (GBK) is one of the largest urban and industrial agglomerations in Southeast Asia. It consists of Bangkok, the capital of Thailand, and five surrounding provinces. Elevated levels of PM_2.5 are experienced in GBK every dry season (November to April), as reported by PCD (2023). This is linked to emissions from both industrial and vehicular sources (ChooChuay et al. 2020; Narita et al. 2019), as well as biomass burning in the area (ChooChuay et al. 2020; Narita et al. 2019; Phairuang et al. 2019). Additionally, stagnant meteorological conditions induced by cold surges and sea breezes in GBK contribute to the persistence of this pollution (Aman et al. 2020, 2023).

The increase in particulate pollution has also led to a decrease in visibility in GBK (Aman et al. 2022). Visibility and its relationship with particulate pollution and meteorological variables in Bangkok have been investigated by many researchers using observational analysis as well as numerical modeling (Vajanapoom et al. 2001; Ruangjun and Exell 2008; Aman et al. 2022; Lee et al. 2017). Vajanapoom et al. (2001) reported a linear inverse relationship between visibility and PM₁₀ in Bangkok. Ruangjun and Exell (2008) developed a regression model using meteorological variables to predict fog and visibility at Don Muang Airport in Bangkok. Lee et al. (2017) used the WRF-Chem model to investigate the effect of biomass-burning aerosol on low-visibility events in Bangkok and other cities in Southeast Asia. Aman et al. (2022) reported two visibility events in the winter of 2014 and 2015 that were affected by the synergistic effect of particulate pollution and meteorology. Recently, visibility-related studies across the world have started using machine learning (ML) models for visibility prediction, with particulate matter and meteorological variables as input data (Kim et al. 2022a; Kim et al. 2022b; Penov and Guerova 2023), and the importance of different features has been identified with feature importance plots (Kim et al. 2022a; Penov and Guerova 2023). However, feature importance offers limited insight, as it gives no information about the direction of each feature's relationship with visibility or about interactions between variables. It is also important to quantify and separate the effect of meteorology on visibility in order to support air quality-related studies that rely on visibility data. Traditionally, meteorological adjustment of visibility or particulate pollution has been done using statistical models (Aman et al. 2019; Barmpadimos et al. 2011). More recently, Grange et al. (2018) developed a meteorological normalization technique to quantify and separate the effect of meteorology on air pollutants (discussed in the section on meteorological normalization). This method has been used in multiple studies to quantify the effect of meteorology on different air pollutants (Grange et al. 2018; Qu et al. 2020; Wang et al. 2022). To understand the directional relationships and interaction effects of different variables, their effect on visibility must be understood for each instance. This is done by first developing an ML model for visibility prediction and then explaining each prediction by coupling the model with another mathematical concept; the combination is referred to as explainable machine learning (XML). Shapley additive explanation (SHAP) is one such concept from cooperative game theory; it can be coupled with any ML model to explain and quantify the contribution of each predictor variable to each prediction (Lundberg and Lee 2017). This helps in developing a better understanding of the factors affecting air pollution, both overall and for episode-level investigation (Wu et al. 2022a; Hou et al. 2022; Wang et al. 2023a; Wang et al. 2023b), or visibility (Yao and Li 2023).

Various studies have used ML models for PM_2.5 prediction in GBK and Thailand (Gupta et al. 2021; Thongthammachart et al. 2023; Wongnakae et al. 2023; Aman et al. 2024) as well as in other parts of the world (Hu et al. 2017; Fathollahi et al. 2023; Wei et al. 2021; Tian et al. 2023; Kumar et al. 2023). However, no study has explored ML models for visibility prediction or used methods such as meteorological normalization and XML to quantify and understand the effect of meteorology on visibility degradation in GBK. Based on these facts and motivation, the objectives of this study are: (a) evaluation and comparison of different ML models for visibility prediction, (b) quantification of the effect of meteorology on visibility using ML model-based meteorological normalization, and (c) understanding of the drivers of visibility degradation using a game theory-based XML method, i.e., the Shapley method. The ML models use different air pollutants, meteorological data, and time-related variables as input variables. The best-identified ML model was used both for the meteorological normalization technique and for integration with the XML method, in order to identify the important variables affecting visibility and the effect of hygroscopic growth of aerosols on visibility.

Study area and general climate

Greater Bangkok consists of Bangkok (13.7°N, 100.5°E), the capital city of Thailand, and its five neighboring provinces (Samut Prakan, Nonthaburi, Pathumthani, Nakhon Pathom, and Samut Sakhon) (Fig. 1). It is located in the lower part of Central Thailand, with a geographical area of 7762 km² and a registered population of 11 million (DOPA 2023). It serves as Thailand's economic hub, contributing approximately 47% of the nation's gross domestic product (NESDB 2022). The cityscape is characterized by an intricate blend of commercial, residential, agricultural, and industrial zones (LDD 2016), situated across a predominantly flat terrain with an average elevation of less than 10 meters above mean sea level. The prevailing climate in GBK is tropical and humid, influenced by the northeast monsoon (November-February) and the southwest monsoon (May-October) (TMD 2023). The northeast monsoon introduces cool, dry air from continental mid-latitudes, marking the winter season, while the southwest monsoon brings moist air from the Gulf of Thailand and the Indian Ocean, leading to the widespread rain that defines the wet or rainy season. The transitional period (March-April) between these monsoons typically experiences higher average temperatures and constitutes the summer season. The winter and summer seasons are collectively known as the dry season.

Insert Fig. 1

Data collection and processing

Hourly human-observed visibility (VIS, km), temperature (TEMP, °C), relative humidity (RH, %), wind speed (WS, m s^–1), and wind direction (WD, deg.) data for the surface weather station at Don Mueang airport (T01) were obtained from the Thai Meteorological Department (TMD) (Fig. 1 and Table 1). Hourly air pollutant datasets were requested and obtained from PCD for seven air quality stations (P05, P08, P19, P27, P52, P53, P61; Fig. 1) for the years 2017–2022. These air quality monitoring stations are located in three provinces of GBK and are used here to represent the air pollution level in GBK affecting visibility. The air pollutant datasets included PM_2.5, PM₁₀, sulphur dioxide (SO₂), nitrogen oxides (as the sum of nitrogen oxide and nitrogen dioxide, i.e., NO_x = NO + NO₂), ozone (O₃), and carbon monoxide (CO). It should be noted that not all air pollutant datasets are available at all seven stations. All surface measurement data are from near-surface monitoring, except for winds, which are measured at about 10 m above ground level (AGL). Surface pressure (SP), global radiation (GR), and rainfall (RN) data were downloaded for GBK from ERA5-Land, a fifth-generation reanalysis product of the European Centre for Medium-Range Weather Forecasts (ECMWF), at a spatial resolution of 0.1°. Cloud cover (CC) and planetary boundary layer height (PBLH) data were downloaded from ERA5 at a spatial resolution of 0.25°. Both reanalysis datasets were obtained from the Climate Data Store (CDS; https://cds.climate.copernicus.eu), which is an operational service of the Copernicus Climate Change Service implemented by ECMWF. The values of the meteorological variables from ERA5-Land and ERA5 were extracted for the location of Don Mueang Airport using bilinear interpolation. Only daytime data (here, 08 LT-17 LT) for visibility and other variables were considered. In previous studies on visibility (Aman et al. 2019; Aman et al. 2022), visibility observations under certain meteorological conditions, e.g., high RH (RH > 90%), rain, fog, and mist, were excluded to understand the effect of anthropogenic activities on visibility. In this study, however, we excluded only visibility observations under rainfall and used data under all other weather conditions for visibility estimation. In our quality checking, the following probable ranges or detection limits were applied to the initial data: PM_2.5 and PM₁₀ (0 to 1,000 µg m^–3), SO₂ (0 to 1,000 ppb), NO_x (0 to 1,000 ppb), O₃ (0 to 1,000 ppb), CO (0 to 1,000 ppm), TEMP (–5 to 50°C), RH (0 to 100%), WS (0 to 50 m s^–1), WD (0° to 360°), RN (0 to 1,000 mm h^–1), and GR (0 to 1,000 W m^–2) (Aman et al. 2023). No improbable values were found in the data. In addition to the air pollutants and meteorological variables, three time-related variables were also considered: the day of the season year (referred to here as Julian day or JD), day of the week (DOW), and hour of the day (HOD). For the day of the season year, 1st November is assigned as day 1, and so on. HOD and DOW were used as surrogates for local traffic emissions, while JD was used as a proxy for the seasonal and long-term variability in visibility that is caused by changes in emissions and is not accounted for by the meteorological variables. HOD also represents hourly sky conditions due to the changing position of the sun. To understand the variability of visibility and other variables on a monthly scale, their monthly average values were assessed. This showed that visibility degradation is mainly a problem in the dry season (discussed in Results and Discussion), and hence the dry season was chosen for this study. The strength and direction of the association between visibility and other variables were examined through a correlation heatmap.
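The time-related variables and the data screening described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the DOW coding (Monday = 0) is an assumption, and the rain threshold of 0 mm is an assumed reading of "excluded visibility observations under rainfall". The season-year labelling follows the footnote of Table 1 (November 2016 to October 2017 is season year 2017).

```python
from datetime import datetime

DRY_MONTHS = {11, 12, 1, 2, 3, 4}  # November to April
DAY_HOURS = range(8, 18)           # 08 to 17 local time, inclusive


def season_year(ts: datetime) -> int:
    """Season year label: November 2016 to October 2017 is season year 2017."""
    return ts.year + 1 if ts.month >= 11 else ts.year


def time_features(ts: datetime) -> dict:
    """Return JD (1 November = day 1), DOW (assumed Monday = 0), and HOD."""
    season_start = datetime(season_year(ts) - 1, 11, 1)
    return {
        "JD": (ts.date() - season_start.date()).days + 1,
        "DOW": ts.weekday(),
        "HOD": ts.hour,
    }


def keep_record(ts: datetime, rain_mm: float, rain_threshold: float = 0.0) -> bool:
    """Keep dry-season, daytime hours without rain (threshold is an assumption)."""
    return (
        ts.month in DRY_MONTHS
        and ts.hour in DAY_HOURS
        and rain_mm <= rain_threshold
    )
```

Counting JD from 1 November keeps the whole dry season on one continuous axis instead of wrapping around at the calendar new year.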

Table 1

Data and variables used in the study
Datasets	Product/Station Name^a	Variable^b	Spatial Resolution	Source^c
Meteorological data	T01	VIS, TEMP, RH, WS, WD	In situ	TMD
Air Quality data	P05, P08, P19, P27, P52, P53, P61	PM₁₀, PM_2.5, SO₂, NO_x, O₃, CO	In situ	PCD
Reanalysis data	ERA5-Land	SP, GR, RN	0.1° × 0.1°	ECMWF
Reanalysis data	ERA5	CC, PBLH	0.25° × 0.25°	ECMWF
^aT01: Don Muang Airport (WMO484560), P05 (Thai Meteorological Department Bang Na), P08 (Vocational Rehabilitation Center for Persons with Disabilities Phra Pradaeng), P19 (Bang Phil National Housing Authority), P27 (Samut Sakhon Witthayalai School), P52 (Metropolitan Electricity Authority Substation Thonburi), P53 (Chokchai Police Station Ladprao Road), P61 (Bodindecha (Sing Singhaseni) School). ERA5: European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis V5. ^bVIS: Visibility; TEMP: Air temperature; RH: Relative humidity; WS and WD: Wind speed and direction; PM₁₀ and PM_2.5: Particulate matter with size smaller than or equal to 10 µm and 2.5 µm, respectively; SO₂: Sulphur dioxide; NO_x: Nitrogen oxides; O₃: Ozone; CO: Carbon monoxide; SP: Surface pressure; GR: Global radiation; RN: Rain; CC: Cloud cover; and PBLH: Planetary boundary layer height. ^cTMD: Thai Meteorological Department, PCD: Pollution Control Department, ECMWF: European Centre for Medium-Range Weather Forecasts. All datasets are hourly and available for season years 2017–2022 (November 2016 to October 2022).

Insert Table 1

Visibility prediction with machine learning (ML) models

Visibility was estimated using the following six individual ML models: random forest (RF), adaptive boosting or AdaBoost (ADB), gradient boosting (GB), extreme gradient boosting or XGBoost (XGB), cat boosting or CatBoost (CB), and light gradient boosting machine or LightGBM (LGB). The technical details of these individual ML models are given elsewhere (Breiman 2001; Freund and Schapire 1999; Friedman 2001; Chen and Guestrin 2016; Prokhorenkova et al. 2018; Ke et al. 2017). A set of hyperparameter values for each ML model was selected based on the authors' experience and judgment (here, Aman et al. 2024) and a literature review. Various combinations of hyperparameters were then explored to optimize visibility estimation (Table 2). Model validation employed a nested cross-validation technique with a double loop. In the inner loop, hyperparameters were optimized using the random search method and 5-fold cross-validation. In the outer loop, 10-fold cross-validation was performed, with 90% of the data used for training and 10% for testing the developed model. This procedure was repeated 10 times so that all data points were used for both training and testing. Four statistical metrics, namely the correlation coefficient (ρ), mean bias (MB), mean absolute error (ME), and root mean square error (RMSE), were used to evaluate the ML models. They are defined as follows:

$\rho = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(\bar{O}-O_{i})}{\sigma_{o}}\times \frac{(\bar{P}-P_{i})}{\sigma_{p}}$	(1)
$MB = \frac{\sum_{i=1}^{n}(P_{i}-O_{i})}{n}$	(2)
$ME = \frac{\sum_{i=1}^{n}\left|P_{i}-O_{i}\right|}{n}$	(3)
$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(P_{i}-O_{i})^{2}}{n}}$	(4)
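As a concrete reading of Eqs. (1) to (4), the four metrics can be computed as in the sketch below, where `obs` holds the observed values O_i and `pred` the predicted values P_i. The function name `evaluate` is illustrative, and the use of sample standard deviations (divisor n − 1) for σ_o and σ_p is an assumption chosen to match the 1/(n − 1) factor in Eq. (1).

```python
import math


def evaluate(obs, pred):
    """Return rho, MB, ME, and RMSE for paired observed and predicted values."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    # Sample standard deviations (assumed), consistent with 1/(n - 1) in Eq. (1).
    s_o = math.sqrt(sum((o - o_bar) ** 2 for o in obs) / (n - 1))
    s_p = math.sqrt(sum((p - p_bar) ** 2 for p in pred) / (n - 1))
    rho = sum((o_bar - o) * (p_bar - p) for o, p in zip(obs, pred)) / (
        (n - 1) * s_o * s_p
    )
    mb = sum(p - o for o, p in zip(obs, pred)) / n
    me = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    return {"rho": rho, "MB": mb, "ME": me, "RMSE": rmse}
```

MB keeps the sign of the error (positive values indicate overprediction), whereas ME and RMSE measure error magnitude only, which is why a model can have zero MB and still a nonzero ME.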

Table 2

Hyperparameters used by ML model
Model	Hyperparameter	Space
Random forest	n_estimators	[100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
	max_depth	[3, 5, 7, 9, 10, 12, 14, 16, 18, 20]
	max_features	['sqrt', 'log2']
	min_samples_split	[2, 4, 6, 8, 10]
	min_samples_leaf	[1, 2, 4, 6, 8]
	criterion	['squared_error']
Adaptive boosting	n_estimators	[100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
	learning_rate	[0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0]
	loss	['linear', 'square']
Gradient boosting	n_estimators	[100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
	max_depth	[3, 5, 7, 9, 10, 12, 14, 16, 18, 20]
	max_features	['sqrt', 'log2']
	min_samples_split	[2, 4, 6, 8, 10]
	min_samples_leaf	[1, 2, 4, 6, 8]
	learning_rate	[0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
	criterion	['squared_error']
	subsample	[0.5, 0.7, 0.9, 1]
Extreme gradient boosting	n_estimators	[100, 200, 400, 500, 600, 700, 800, 900, 1000, 1200, 1400]
	colsample_bytree	[0.4, 0.5, 0.6, 0.7, 0.8]
	max_depth	[3, 5, 7, 9, 10, 12, 14, 16, 18, 20]
	learning_rate	[0.01, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
	subsample	[0.5, 0.7, 0.9, 1]
	min_child_weight	[1, 3, 5, 7, 9]
	gamma	[0.1, 0.5, 1, 3, 5]
Cat boosting	n_estimators	[100, 200, 400, 600, 800, 1000, 1100, 1400, 1500]
	colsample_bylevel	[0.6, 0.8, 1]
	max_depth	[4, 6, 8, 10]
	learning_rate	[0.01, 0.03, 0.1, 0.2, 0.3]
	subsample	[0.6, 0.8, 1]
	min_child_samples	[1, 3, 5]
	max_leaves	[31, 62]
	l2_leaf_reg	[0.1, 1, 3, 5, 7, 10]
Light gradient boosting	n_estimators	[100, 200, 300, 500, 700, 900, 1000]
	colsample_bytree	[0.5, 0.8, 0.9]
	max_depth	[5, 10, 15, 20, 25, 30]
	learning_rate	[0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 1.0]
	subsample	[0.5, 0.7, 0.9, 1]
	min_child_samples	[10, 20, 30, 40, 50]
	num_leaves	[31, 60, 90, 100, 150, 200]

Here, O̅ and P̅ are the means of the observed and predicted values, σ_o and σ_p are their standard deviations, O_i and P_i denote the individual observed and predicted values, and n is the number of data points. The model that showed the best performance in visibility estimation was selected and tuned a final time using the full dataset. Model development and hyperparameter optimization were done using the scikit-learn library (Pedregosa et al. 2011).
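The nested cross-validation described in this section (random search with 5-fold CV in the inner loop, 10-fold CV in the outer loop) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins, the search space is a reduced subset of Table 2 chosen so the example runs quickly, and the RMSE scoring function and `n_iter=4` are assumed choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_predict

rng = np.random.default_rng(0)
# Synthetic stand-in data (hypothetical): columns mimic PM2.5, RH, and TEMP.
X = rng.uniform([5, 30, 20], [120, 100, 38], size=(300, 3))
y = 12 - 0.04 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(0, 0.3, 300)

# Reduced search space for speed; Table 2 lists the full spaces.
param_space = {
    "n_estimators": [20, 40],
    "max_depth": [3, 5, 7],
    "min_samples_leaf": [1, 2, 4],
}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)   # tuning loop
outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)  # evaluation loop

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_space,
    n_iter=4,                               # assumed number of random draws
    cv=inner_cv,
    scoring="neg_root_mean_squared_error",  # assumed tuning criterion
    random_state=0,
)

# Each outer fold re-runs the inner search on its 90% training split and
# predicts the held-out 10%, so every sample gets one out-of-fold prediction.
y_pred = cross_val_predict(search, X, y, cv=outer_cv)
rmse = float(np.sqrt(np.mean((y_pred - y) ** 2)))
```

Because tuning happens only inside each outer training split, the outer-loop error is not biased by the hyperparameter search.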

Insert Table 2

Meteorological normalization of visibility

Meteorological normalization is a technique that was first introduced by Grange et al. (2018) and later adopted by other studies, with some adjustments, to decouple the effect of meteorology from air pollutant concentrations such as PM_2.5. The decoupling is achieved by predicting the concentration for a specific time many times, each time with randomly selected meteorological variables, and averaging the predictions to give the normalized concentration. Here, we have adopted this method to decouple the effect of meteorology on visibility. Grange et al. (2018) and some other studies (Hou et al. 2022; Mallet 2021; Qu et al. 2020) repeated the prediction 1000 times for each hour, while Liu et al. (2022) repeated it 500 times. Vu et al. (2019) proposed normalizing only weather conditions, without normalizing seasonal and diurnal variation, by randomly resampling meteorological variables only from the same hour as the prediction and within a 4-week window (2 weeks before and 2 weeks after) across the entire study period. Wang et al. (2022) resampled both meteorological and time variables for each hourly prediction, so the result could not be used to compare seasonal or diurnal variation with observed PM_2.5. Wu et al. (2022b) compared three meteorological normalization methods, (a) resampling meteorological variables, (b) resampling meteorological and time variables, and (c) no resampling, and found that resampling meteorological variables only gives the best results. Here, Vu et al. (2019) was followed: only meteorological variables were resampled, keeping air pollutants and time-related variables constant. This can be represented as follows:

$${VIS}_{norm}= \frac{1}{1000} \times \sum_{i=1}^{1000}{VIS}_{i,(prd)}$$

Here, VIS_norm is the meteorologically normalized visibility, and VIS_{i,(prd)} is the visibility predicted by the ML model for the i-th resampled set of predictor variables. The mean levels of observed and meteorologically normalized visibility were calculated, and their seasonal averages were then compared.
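The averaging in the equation above can be sketched as follows. This is a simplified illustration: meteorological values are resampled from the whole dataset, whereas the approach of Vu et al. (2019) followed in this study restricts resampling to the same hour within a 4-week window. The function name, the toy linear model, and the synthetic data are all hypothetical stand-ins for the trained ML model and the real predictors.

```python
import numpy as np


def normalize_visibility(predict, X, met_cols, n_resamples=1000, seed=0):
    """Average predictions over randomly resampled meteorological columns."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    total = np.zeros(n)
    for _ in range(n_resamples):
        X_new = X.copy()
        # Draw whole donor rows so resampled meteorological variables stay
        # mutually consistent; pollutant and time columns are left unchanged.
        donors = rng.integers(0, n, size=n)
        X_new[:, met_cols] = X[donors][:, met_cols]
        total += predict(X_new)
    return total / n_resamples


# Toy example: column 0 mimics PM2.5 (kept fixed), column 1 mimics RH (resampled).
rng = np.random.default_rng(1)
pm25 = rng.uniform(10, 100, 200)
rh = rng.uniform(40, 95, 200)
X = np.column_stack([pm25, rh])


def toy_model(A):
    """Hypothetical stand-in for the trained visibility model."""
    return 12 - 0.05 * A[:, 0] - 0.03 * A[:, 1]


vis_pred = toy_model(X)
vis_norm = normalize_visibility(toy_model, X, met_cols=[1])
```

In this toy case the normalized series varies less than the raw predictions because the RH-driven part of the variability has been averaged out, which mirrors the lower standard deviation of VIS_norm reported in the abstract.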

SHapley Additive exPlanation (SHAP): A game theory-based approach

The average importance of different predictor variables for visibility and their overall effect on the output of an ML model can be determined using feature importance plots and partial dependence plots, respectively. However, these methods cannot capture the complex interactive effects of different variables on visibility, and neither plot can give the contribution of each variable to an individual prediction. To address these two aspects, this study used the SHapley Additive exPlanation (SHAP) method, which is based on the Shapley value from cooperative game theory, introduced by Nobel laureate Lloyd Shapley. In cooperative game theory, the Shapley value quantifies the contribution of each player to the outcome of a cooperative game (Lundberg and Lee 2017; Lundberg et al. 2020). In the context of ML model interpretability, Shapley values attribute the prediction for a specific instance to the individual predictor variables. They describe the impact of each predictor variable on the model output by considering all possible combinations of predictor variables and calculating the average marginal contribution of the variable across these combinations. A positive Shapley value indicates a positive effect on visibility, while a negative value indicates a negative effect. A detailed explanation of the SHAP method can be found in Lundberg and Lee (2017). In this study, Shapley values of each predictor variable for each prediction were computed with the best-identified ML model, i.e., the light gradient boosting model (see the sub-section on ML model evaluation in Results and Discussion). Next, the mean Shapley values for each feature were calculated and represented by SHAP feature importance plots for the dry, winter, and summer seasons. A list of important predictor variables was identified based on mean SHAP values being higher than a threshold value (here set as 0.1). To understand the directional relationship of visibility with the selected important predictor variables and the distribution of Shapley values for these variables, SHAP summary plots (also called beeswarm plots) were used. Summary plots show the Shapley values (both positive and negative) for each feature and each instance, but they do not reveal the exact form of the relationship between visibility and its predictor variables. To understand this, dependence plots of the Shapley values of the important predictor variables were also analyzed.
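The phrase "average marginal contribution across all possible combinations" can be made concrete with a brute-force Shapley computation for a tiny model. The sketch below is illustrative only: it replaces absent features with a fixed baseline value, which is a simplification, and it is not the tree-based estimation that SHAP applies to models such as LightGBM. The toy visibility function, its coefficients, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest use the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_visibility(z):
    """Hypothetical model: z = [PM2.5, RH, TEMP], with a PM2.5-RH interaction."""
    pm25, rh, temp = z
    return 12 - 0.04 * pm25 - 0.02 * rh - 0.0005 * pm25 * rh + 0.05 * temp


x = [80.0, 90.0, 30.0]         # instance to explain
baseline = [40.0, 60.0, 28.0]  # reference conditions
phi = shapley_values(toy_visibility, x, baseline)
```

The values in `phi` sum to the difference between the prediction at `x` and the prediction at `baseline`, and the PM2.5-RH interaction term is split equally between those two features. The cost grows exponentially with the number of features, which is why practical SHAP implementations rely on model-specific algorithms.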

Effect of hygroscopic growth on visibility with SHAP approach

RH is considered an important predictor variable that can affect the relationship of visibility with PM_2.5 and PM₁₀, because the hygroscopic growth of aerosols changes light scattering (Watson 2002; Singh et al. 2017; Aman et al. 2019). To understand this interaction, Shapley values for PM_2.5 and PM₁₀ were plotted against their respective concentrations under different RH intervals. A total of seven RH intervals were used: 0–40%, 40–50%, 50–60%, 60–70%, 70–80%, 80–90%, and 90–100%. In addition to the SHAP approach, the equal step-size method adopted by Aman et al. (2019) was also used to understand the effect of the hygroscopic growth of aerosols. In this method, RH data were divided into 13 intervals: 0–40%, 40–45%, 45–50%, 50–55%, 55–60%, 60–65%, 65–70%, 70–75%, 75–80%, 80–85%, 85–90%, 90–95%, and 95–100%. The average values of the original and meteorologically normalized visibilities, PM_2.5, and PM₁₀ were calculated for each interval to assess how effectively meteorological normalization nullifies the effect of meteorology on visibility.
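The equal step-size binning can be sketched as below, using the interval edges listed above (0–40%, then 5% steps up to 100%). The function names are hypothetical, and treating each interval as closed on the left and open on the right (with RH = 100% assigned to the last interval) is an assumption, since the text does not state how boundary values are handled.

```python
from bisect import bisect_right
from collections import defaultdict

# Edges 0, 40, 45, ..., 100: a wide first interval, then equal 5% steps.
RH_EDGES = [0] + list(range(40, 101, 5))


def rh_bin(rh):
    """Return the (lower, upper) RH interval containing rh (left-closed, assumed)."""
    idx = min(bisect_right(RH_EDGES, rh) - 1, len(RH_EDGES) - 2)
    return (RH_EDGES[idx], RH_EDGES[idx + 1])


def bin_means(rh_values, values):
    """Average `values` (e.g., visibility or PM2.5) within each RH interval."""
    groups = defaultdict(list)
    for rh, v in zip(rh_values, values):
        groups[rh_bin(rh)].append(v)
    return {b: sum(v) / len(v) for b, v in sorted(groups.items())}
```

Calling `bin_means` once with the original visibility and once with the normalized visibility gives two curves over RH that can be compared interval by interval.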

Monthly variation of visibility, air pollutants, and other variables

The monthly variation of visibility, air pollutants, and meteorological variables is shown in Fig. 2. Visibility is lower in the dry season than in the wet season (Fig. 2a). This is due to poor air quality in the dry season, as suggested by the higher concentrations of PM_2.5, PM₁₀, SO₂, NO_x, O₃, and CO (Fig. 2a-2c). Although RH and rainfall are high in the wet season because of the southwest monsoon (Fig. 2d-2e), they do not visibly degrade visibility, because air pollutant concentrations are low in the wet season owing to wet scavenging by rain and to strong winds (Fig. 2e-2f). Temperature is low in winter (November to February), high in summer (March and April), and mild in the wet season (Fig. 2d). Surface pressure is higher during winter (November-February) because of the presence of cooler and denser air, and it decreases as temperature increases during summer and the wet season (Fig. 2f). Global radiation is higher during March and April in summer and gradually decreases during the wet season (Fig. 2g), possibly because of cloud and rain (Fig. 2e). Planetary boundary layer height is also relatively higher during March-April because of warm conditions and a lack of strong subsidence. It is lower at the end of the wet season in September and October and during winter (November-February) because of cool weather (Fig. 2g). Overall, because visibility degradation is a concern only in the dry season, this study focuses on the dry season.

Insert Fig. 2

Correlation between visibility and predictor variables

The correlation coefficients between visibility and its predictor variables are shown in the heat map in Fig. 3. Visibility shows a negative correlation with PM_2.5, PM₁₀, SO₂, NO_x, CO, RH, SP, CC, UWIND, VWIND, DOW, and JD. Conversely, it shows a positive correlation with O₃, TEMP, PBLH, GR, and HOD. The strengths of these correlations range from weak to moderate to strong across the different variables. The negative correlation with air pollutants is expected, given the scattering and absorption of light by these pollutants. The positive correlation between visibility and O₃, however, arises from the negative correlation of O₃ with PM_2.5 and PM₁₀: increased light scattering and absorption by particulate matter (and the accompanying decrease in visibility) reduces photochemical reaction rates and hence O₃. The negative relationship between visibility and RH suggests an effect of hygroscopic growth of aerosols (Watson 2002; Aman et al. 2019). An increase in SP can lead to the development of stagnant weather conditions that favor air pollutant accumulation and visibility degradation. The presence of high clouds can diminish the amount of sunlight reaching the Earth's surface, consequently affecting visibility. However, the association between visibility and cloud cover can be influenced by specific cloud properties such as cloud type and cloud height, and this variation in cloud characteristics accounts for the weak correlation between visibility and CC. The weak relationship of visibility with the u and v components of the wind (UWIND and VWIND) suggests that factors beyond horizontal wind play a dominant role in determining visibility. Warmer temperatures and a higher PBLH enhance vertical mixing of air pollutants, which improves visibility. Higher GR can improve visibility by reducing haze and increasing contrast, contributing to a more stable and well-mixed atmosphere. Visibility showed an overall weak negative correlation with JD and DOW, but a linear correlation may not be sufficient to capture these relationships because of the periodic effects of time-related variables. The complexity of these relationships indicates that the impact of the variables on visibility is not straightforward but involves intricate and potentially confounding effects. Given the goal of achieving accurate visibility predictions using ML models, it is valuable to incorporate all these predictor variables collectively, recognizing their interplay and cumulative influence on visibility.

Insert Fig. 3

Performance evaluation and comparison of machine learning models

The performance of the six individual ML models (RF, ADB, GB, XGB, CB, and LGB) under k-fold cross-validation (CV) is presented in Table 3. No individual ML model performed best across all the evaluation metrics. For hourly prediction, LGB performed best on three of the four metrics, with the highest ρ (0.86), the lowest MB (zero), and the lowest RMSE (0.81 km), together with the second-lowest ME (0.48 km). The corresponding values for daily visibility prediction are ρ = 0.92, MB = 0.0 km, ME = 0.3 km, and RMSE = 0.43 km. The ADB model has the lowest ME (0.42 km) but also the lowest ρ (0.84) and the highest MB (0.15 km) and RMSE (0.88 km). The RF model can be categorized as a poorly performing model, with the lowest ρ (0.84), an MB of −0.01 km, the highest ME (0.55 km), and the highest RMSE (0.88 km) for hourly prediction. The corresponding values for daily visibility prediction are ρ = 0.87, MB = 0.0 km, ME = 0.4 km, and RMSE = 0.55 km. All ML models performed better at the daily scale than at the hourly scale. Overall, no single ML model can be identified as the best for visibility prediction. Each model has its own strengths and weaknesses, and its performance depends on the nature of the dataset, such as the sample size and the number of features. Different studies suggest different ML models as the best choice for visibility estimation (Uyanık et al. 2021; Kim et al. 2022a). Uyanık et al. (2021) compared six ML models for visibility estimation over the Istanbul Strait and found that the GB model performed best. Kim et al. (2022a) compared six ML models for visibility estimation over Seoul, South Korea, and found that the XGB model performed best.

SEM outperformed all the individual ML models for visibility estimation, with the highest ρ (0.88), the lowest MB (zero), the second-lowest ME (0.44 km), and the lowest RMSE (0.75 km) for hourly prediction. The corresponding values for daily visibility prediction are ρ = 0.93, MB = 0.0 km, ME = 0.27 km, and RMSE = 0.4 km. The superior performance of SEM is likely due to its combination of individual models that use different algorithms and techniques. No study using SEM for visibility estimation was found. However, various studies on PM_2.5 prediction report the best performance for SEM (Chen et al. 2019; Aman et al. 2024). Here, LGB was considered the best-suited individual ML model because it performed best among the selected individual models for this dataset. The final hyperparameters for the LGB model are a) n_estimators = 500, b) max_depth = 10, c) min_child_samples = 10, d) num_leaves = 100, e) learning_rate = 0.1, f) subsample = 0.9, and g) colsample_bytree = 0.8.
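The final LGB hyperparameters reported above can be written down as a configuration. The following is a minimal sketch, assuming the LightGBM Python package and its scikit-learn style `LGBMRegressor` class were used (the text does not name the implementation); `LGB_PARAMS` and `build_lgb_regressor` are hypothetical names introduced for illustration.

```python
# Final LGB hyperparameters as reported in the text.
LGB_PARAMS = {
    "n_estimators": 500,
    "max_depth": 10,
    "min_child_samples": 10,
    "num_leaves": 100,
    "learning_rate": 0.1,
    "subsample": 0.9,
    "colsample_bytree": 0.8,
}


def build_lgb_regressor(params=None):
    """Return an untrained LightGBM regressor.

    Assumption: the `lightgbm` package is installed. It is imported lazily so
    that the configuration above can be inspected without it.
    """
    from lightgbm import LGBMRegressor

    return LGBMRegressor(**(params or LGB_PARAMS))
```

In LightGBM's scikit-learn interface, row subsampling takes effect only when `subsample_freq` is set to a positive value; the text does not report that setting, so it is left at the library default here.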

Table 3

Performance of different machine learning models for visibility prediction (MB, ME, and RMSE in km)
	Hourly visibility				Daily visibility
Model	ρ	MB	ME	RMSE	ρ	MB	ME	RMSE
Random Forest	0.84	−0.01	0.55	0.88	0.87	0.00	0.40	0.55
AdaBoost	0.84	0.15	0.42	0.88	0.88	0.16	0.34	0.53
Gradient Boosting	0.85	0.00	0.51	0.84	0.90	0.01	0.34	0.48
XGBoost	0.84	0.00	0.54	0.86	0.87	0.02	0.39	0.54
Cat Boosting	0.85	0.01	0.53	0.85	0.88	0.02	0.37	0.52
Light Gradient Boosting	0.86	0.00	0.48	0.81	0.92	0.00	0.30	0.43
Stacked Ensemble Model	0.88	0.00	0.44	0.75	0.93	0.02	0.27	0.40
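
The statement that LGB is best on three of the four metrics can be checked directly against the hourly columns of Table 3. The sketch below is only a bookkeeping aid: the scores are copied from the table, the helper `best_models` is a hypothetical name, and bias is compared by absolute value (an assumption about how "lowest MB" is meant).

```python
# Hourly scores of the six individual models, copied from Table 3.
HOURLY = {
    "RF":  {"rho": 0.84, "MB": -0.01, "ME": 0.55, "RMSE": 0.88},
    "ADB": {"rho": 0.84, "MB": 0.15,  "ME": 0.42, "RMSE": 0.88},
    "GB":  {"rho": 0.85, "MB": 0.00,  "ME": 0.51, "RMSE": 0.84},
    "XGB": {"rho": 0.84, "MB": 0.00,  "ME": 0.54, "RMSE": 0.86},
    "CB":  {"rho": 0.85, "MB": 0.01,  "ME": 0.53, "RMSE": 0.85},
    "LGB": {"rho": 0.86, "MB": 0.00,  "ME": 0.48, "RMSE": 0.81},
}


def best_models(scores, metric):
    """Return the names of the models that are best on one metric (ties kept)."""
    if metric == "rho":
        key = lambda s: -s["rho"]       # higher correlation is better
    elif metric == "MB":
        key = lambda s: abs(s["MB"])    # bias closest to zero is better
    else:
        key = lambda s: s[metric]       # lower error is better
    best = min(key(s) for s in scores.values())
    return sorted(name for name, s in scores.items() if key(s) == best)


winners = {m: best_models(HOURLY, m) for m in ("rho", "MB", "ME", "RMSE")}
lgb_wins = sum("LGB" in names for names in winners.values())
```

With these inputs LGB is among the winners for ρ, MB, and RMSE (sharing the zero bias with GB and XGB), while ADB alone has the lowest ME.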


Seasonal average original and meteorologically normalized visibilities

The seasonal average original visibility (VIS_org) and meteorologically normalized visibility (VIS_norm) are shown in Table 4. No consistent increasing or decreasing tendency can be identified: seasonal average VIS_org decreases from 9.6 km in 2017 to 8.9 km in 2021 but increases to 9.5 km in 2022. VIS_norm likewise decreases from 9.2 km in 2017 to 8.9 km in 2021 but increases to 9.3 km in 2022. VIS_norm is less than or equal to VIS_org in all years, indicating an overall positive effect of meteorology on the seasonal average visibility. This suggests that the increase in visibility is attributable to meteorology rather than to a decrease in emissions. The standard deviation of VIS_org is almost twice that of VIS_norm, which indicates the dominant effect of meteorology on the variation of visibility at different time scales. We found no literature on the use of ML for meteorological adjustment of visibility. However, Aman et al. (2019) used a generalized linear model for meteorological adjustment to investigate the long-term trend of visibility over eastern Thailand and reported less year-to-year fluctuation in meteorologically adjusted visibility than in the original visibility.
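
The idea behind meteorological normalization can be illustrated with a small sketch. It assumes the resampling scheme described by Grange et al. (2018), in which the meteorological predictors of each record are repeatedly replaced by values drawn from other records and the model predictions are averaged; this section does not spell out the exact procedure used, so the function `meteorologically_normalize`, the toy model `toy_predict`, and all numbers below are hypothetical.

```python
import random


def meteorologically_normalize(predict, rows, met_keys, n_resamples=500, seed=0):
    """Average predictions over randomly resampled meteorology for each row."""
    rng = random.Random(seed)
    normalized = []
    for row in rows:
        total = 0.0
        for _ in range(n_resamples):
            donor = rng.choice(rows)        # another record, drawn at random
            sample = dict(row)              # keep the non-meteorological predictors
            for key in met_keys:
                sample[key] = donor[key]    # swap in the donor's meteorology
            total += predict(sample)
        normalized.append(total / n_resamples)
    return normalized


def toy_predict(x):
    """Made-up visibility response (km); not the fitted model from the study."""
    return 12.0 - 0.05 * x["PM2.5"] - 0.04 * x["RH"]


# Toy records with identical pollution but different humidity.
toy_rows = [{"PM2.5": 30.0, "RH": float(rh)} for rh in (40, 50, 60, 70, 80, 90)]
vis_org = [toy_predict(r) for r in toy_rows]
vis_norm = meteorologically_normalize(toy_predict, toy_rows, met_keys=["RH"])
```

In this toy case the original values span about 2 km because of humidity alone, while the normalized values are nearly constant, which mirrors the smaller spread of VIS_norm relative to VIS_org.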

Table 4

Original (VIS_org) and meteorologically normalized (VIS_norm) seasonal visibility (mean ± standard deviation, km)
Year	VIS_org (km)	VIS_norm (km)
2017	9.5 ± 1.3	9.2 ± 0.6
2018	8.8 ± 1.8	8.8 ± 0.9
2019	9.1 ± 1.5	8.9 ± 0.9
2020	9.0 ± 1.7	8.9 ± 0.8
2021	8.9 ± 1.7	8.9 ± 0.8
2022	9.5 ± 1.2	9.3 ± 0.5
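
The two comparisons drawn from Table 4 (VIS_norm never exceeds VIS_org, and the standard deviation of VIS_org is roughly twice that of VIS_norm) can be reproduced with simple arithmetic. The values below are copied from the table; the variable names are hypothetical, and the ± values are assumed to be standard deviations.

```python
# year: (VIS_org mean, VIS_org sd, VIS_norm mean, VIS_norm sd), all in km, from Table 4.
TABLE4 = {
    2017: (9.5, 1.3, 9.2, 0.6),
    2018: (8.8, 1.8, 8.8, 0.9),
    2019: (9.1, 1.5, 8.9, 0.9),
    2020: (9.0, 1.7, 8.9, 0.8),
    2021: (8.9, 1.7, 8.9, 0.8),
    2022: (9.5, 1.2, 9.3, 0.5),
}

# Ratio of the original to the normalized standard deviation, per year.
sd_ratios = {year: sd_org / sd_norm
             for year, (_, sd_org, _, sd_norm) in TABLE4.items()}
mean_sd_ratio = sum(sd_ratios.values()) / len(sd_ratios)

# Difference between original and normalized seasonal means, per year (km).
mean_gaps = {year: round(m_org - m_norm, 1)
             for year, (m_org, _, m_norm, _) in TABLE4.items()}
```

The yearly ratios range from about 1.7 to 2.4 and average close to 2, and every yearly gap between the means is zero or positive.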


Influences of factors on visibility through SHAP values

The overall importance of the factors affecting visibility was analyzed with a global feature importance (bar) plot showing the mean absolute SHAP values of all features for the dry, winter, and summer seasons (Fig. 4a). RH is identified as the main predictor variable affecting visibility, with mean absolute SHAP values of +0.59, +0.59, and +0.6 for the dry, winter, and summer seasons, respectively. Particulate matter (PM_2.5 and PM₁₀) is identified as the second most important factor, with mean absolute SHAP values of +0.3, +0.31, and +0.26 for PM_2.5 and +0.23, +0.25, and +0.19 for PM₁₀ during the dry, winter, and summer seasons, respectively. JD, which here refers to the day of the year, is the next most important factor and represents the change in emissions due to seasonal or long-term variability. Its mean absolute SHAP values are +0.18, +0.21, and +0.12 for the dry, winter, and summer seasons, respectively. O₃ is the fifth most important variable, with mean absolute SHAP values of +0.11, +0.11, and +0.12 for the dry, winter, and summer seasons, respectively.

In addition, a beeswarm plot was used to show how the different features affect visibility (Fig. 4b). Red points (high feature values) on the left and blue points (low feature values) on the right indicate a negative relationship of visibility with RH, PM_2.5, PM₁₀, JD, and O₃. To understand the direction and strength of the influence of these features on visibility, partial dependence plots for the five main variables are shown in Fig. 5. At low RH (up to approximately 50%), RH has no significant effect on visibility. However, once RH crosses this threshold, a negative association with visibility appears. Both PM_2.5 and PM₁₀ show a consistent negative relationship with visibility. O₃ shows a negative correlation with visibility at low and moderate levels, while the correlation diminishes at very high O₃ levels. JD indicates a periodic effect, initially showing a negative association with visibility and later a positive one.
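
To make the game-theoretic basis of these SHAP values concrete, the following sketch computes exact Shapley values for a toy two-feature model by enumerating all coalitions. It is a simplified illustration, not the study's procedure: absent features are replaced by a fixed baseline rather than averaged over background data as tree-based SHAP explainers do, and `toy_visibility`, `BASELINE`, and all coefficients are made up.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, players):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


BASELINE = {"RH": 40.0, "PM2.5": 20.0}  # hypothetical reference conditions


def toy_visibility(x):
    """Made-up visibility response (km) with an RH threshold at 50%."""
    excess_rh = max(x["RH"] - 50.0, 0.0)
    return 12.0 - 0.03 * x["PM2.5"] - 0.05 * excess_rh - 0.001 * x["PM2.5"] * excess_rh


def explain(sample):
    """Shapley value of each feature for one sample, relative to BASELINE."""
    def value(coalition):
        mixed = {k: (sample[k] if k in coalition else BASELINE[k]) for k in BASELINE}
        return toy_visibility(mixed)
    return shapley_values(value, list(BASELINE))


humid_polluted = explain({"RH": 80.0, "PM2.5": 60.0})
dry_moderate = explain({"RH": 45.0, "PM2.5": 35.0})
```

In this toy setting the contributions of each sample add up to the difference between its prediction and the baseline prediction, and RH contributes nothing while it stays below the assumed 50% threshold. Averaging the absolute contributions over many samples gives the kind of global importance shown in a bar plot such as Fig. 4a.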

Insert Fig. 4

Insert Fig. 5

Effect of hygroscopic growth of aerosols on visibility

Scatter plots of the Shapley values of PM_2.5 and PM₁₀ against the PM_2.5 and PM₁₀ concentrations under different RH classes are shown in Fig. 6a. The negative relationship of PM_2.5 and PM₁₀ with their Shapley values indicates a negative effect of aerosols on visibility. At lower concentrations of PM_2.5 and PM₁₀, the Shapley values of both variables are positive, but they become negative at higher concentrations. Also, for the same aerosol concentration, more negative Shapley values are found under higher RH, suggesting the effect of hygroscopic growth of aerosols on visibility. The average original and meteorologically normalized visibilities and the average PM_2.5 and PM₁₀ concentrations under different RH intervals are shown in Fig. 6b. The original visibility is not sensitive to RH at the lower RH intervals, ranging from 9.6 km at the 0–40% interval to 9.4 km at the 60–65% interval. However, as RH increases beyond 60–65%, visibility decreases sharply, reaching 4.5 km at 90–100%. Average PM_2.5 and PM₁₀ first decrease and then increase with increasing RH, but the change is not as sharp as that of visibility. This sharp decrease in visibility without much increase in particulate pollution also confirms the effect of hygroscopic growth of aerosols on visibility. As expected, meteorologically normalized visibility is not sensitive to changes in RH because it represents only the effect of emissions. At low RH intervals, the difference between the original and meteorologically normalized visibilities is small, and it widens as RH increases.

The synergetic effect of air pollution and humidity on visibility, caused by the increase in scattering efficiency as aerosols grow under humid conditions, has been reported in various studies (Aman et al. 2019, 2022; Wang et al. 2019). Aman et al. (2022) reported the synergetic effect of particulate matter and humidity on low-visibility events in Bangkok. Aman et al. (2019), using an equal step-size method, suggested that visibility initially increases with RH but that, once a threshold RH value is reached, any further increase in RH leads to a decrease in visibility over Chonburi and Rayong provinces in Thailand. Wang et al. (2019) also suggested, in a study in Beijing, China, that RH has little effect on the variation of visibility under low-RH conditions, whereas once RH exceeds 40%, visibility is inversely related to RH under polluted conditions.
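
The RH-interval averages described for Fig. 6b amount to a grouped mean. A minimal sketch follows, assuming left-closed RH intervals with the upper edge of the last interval included; the text names only the 0–40%, 60–65%, and 90–100% intervals, so the remaining edges in `RH_EDGES`, the helper `mean_by_rh_interval`, and the toy values are all hypothetical.

```python
from bisect import bisect_right


def mean_by_rh_interval(rh_values, values, edges):
    """Mean of `values` within each RH interval [edges[i], edges[i+1])."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for rh, value in zip(rh_values, values):
        if rh < edges[0] or rh > edges[-1]:
            continue                                   # outside the covered range
        idx = min(bisect_right(edges, rh) - 1, n_bins - 1)  # top edge joins last bin
        sums[idx] += value
        counts[idx] += 1
    return {
        (edges[i], edges[i + 1]): sums[i] / counts[i]
        for i in range(n_bins)
        if counts[i] > 0
    }


RH_EDGES = [0, 40, 60, 65, 90, 100]        # illustrative interval edges (%)
toy_rh = [30, 35, 62, 95, 100]             # made-up RH observations (%)
toy_vis = [10.0, 9.0, 9.0, 4.0, 5.0]       # made-up visibility values (km)
toy_means = mean_by_rh_interval(toy_rh, toy_vis, RH_EDGES)
```

Applying the same helper to the original visibility, the normalized visibility, and the PM concentrations would give the interval means that are compared in the text.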

Insert Fig. 6

Conclusions

This study used machine learning (ML) approaches to estimate visibility and to understand the factors affecting its variation at Bangkok Airport. Six individual ML models and a stacked ensemble model (SEM) that combines them were used. The individual ML models are random forest, adaptive boosting, gradient boosting, extreme gradient boosting, light gradient boosting, and cat boosting. Moreover, the influence of key factors was analyzed with the Shapley method, an explainable ML technique inspired by game theory. The predictor variables include air pollutants, meteorological variables, and time-related dummy variables as surrogates for emissions and their variability. Air pollution data were collected from seven air quality monitoring stations of the Pollution Control Department. Visibility and other meteorological data were collected from the weather station operated by the Thai Meteorological Department at Bangkok Airport and were supplemented with ERA5_LAND and ERA5 reanalysis data. Visibility data were screened for rainfall, and only daytime hours (here, 08–17 LT) were considered. Visibility degradation is greater in the dry season than in the wet season because of poor air quality. Hence, only the dry season was selected for visibility prediction and for understanding its influencing factors.

No single model performed best across all the evaluation metrics. The light gradient boosting model performed best on three of the four evaluation metrics and can therefore be considered the best-performing individual model. SEM outperformed all the individual models for both hourly and daily visibility predictions. All models performed better in estimating daily visibility than hourly visibility. No consistent increase or decrease in seasonal average visibility was found. The seasonal average and the standard deviation of meteorologically normalized visibility are lower than those of the original visibility, suggesting that meteorology contributed more than emission reduction to any visibility improvement.

Shapley analysis identified relative humidity (RH) as the dominant factor affecting visibility, followed by PM_2.5, PM₁₀, day of the year (JD), and O₃; together these are the five most important variables. Visibility is not sensitive to RH at low RH, but once RH surpasses a threshold value, RH shows a negative relationship with visibility. Both PM_2.5 and PM₁₀ show a negative relationship with visibility. O₃ shows a negative relationship with visibility at low to moderate concentrations, but it has no apparent effect at very high concentrations. JD shows a periodic effect: it is negatively associated with visibility initially and positively associated later. The dependence of the Shapley values of PM_2.5 and PM₁₀ on RH, with a stronger impact at the same pollution level under more humid conditions, suggests the effect of the hygroscopic growth of aerosols on visibility. This effect is also confirmed by the sharp decrease in average visibility under high RH, in contrast to the relatively low sensitivity of meteorologically normalized visibility and particulate pollution to RH. The results of this study indicate the potential of ML techniques to estimate visibility and to assist in understanding the factors that influence its variation.

Based on the above findings, two important policy-related implications are identified. Firstly, the effects of meteorology on visibility should be considered when visibility is used as a proxy for air quality trends. Secondly, the synergetic effect of humidity and air pollution poses a challenge to improving visibility by emission reduction alone in Bangkok, which lies next to the Gulf of Thailand, a major moisture source. Thus, the importance of visibility forecasting should be emphasized as a preparatory measure that helps the public cope with unfavorable conditions (i.e., very low visibility). Recommended future work to expand or enhance the current scope includes spatiotemporal visibility mapping over GBK, the use of sensor-based visibility measurements and additional predictor variables such as large-scale meteorological factors (e.g., air-mass back-trajectories and synoptic weather patterns), the use of advanced data-driven models (i.e., deep learning models), and the forecasting of low-visibility events.

Declarations

Acknowledgements

The authors thank the Thai Meteorological Department (TMD) and the Pollution Control Department (PCD) for providing the meteorological and air quality data. This research is supported by the Ratchadapisek Somphot Fund for Postdoctoral Fellowship, Chulalongkorn University, Bangkok, Thailand.

Funding This research received no external funding.

Data availability

The TMD and PCD data are owned by the respective agencies, which generally require an official request to obtain or access the data. The other data are available through the corresponding author.

Ethics approval Not applicable.

Consent to participate Not applicable.

Consent for publication Not applicable.

Competing interests The authors declare no competing interests.

References

Aman N, Manomaiphiboon K, Pala-En N, Devkota B, Inerb M, Kokkaew E (2023) A study of urban haze and its association with cold surge and sea breeze for Greater Bangkok. Int J Environ Res Public Health 20:3482. https://doi.org/10.3390/ijerph20043482.
Aman N, Manomaiphiboon K, Pala-En N, Kokkaew E, Boonyoo T, Pattaramunikul S, Devkota B, Chotamonsak C (2020) Evolution of urban haze in Greater Bangkok and association with local weather and synoptic characteristics during two recent haze episodes. Int J Environ Res Public Health 17:9499. https://doi.org/10.3390/ijerph17249499.
Aman N, Manomaiphiboon K, Pengchai P, Suwanathada P, Srichawana J, Assareh N (2019) Long-term observed visibility in eastern Thailand: Temporal variation, association with air pollutants and weather factors, and trends. Atmosphere 10:122. https://doi.org/10.3390/atmos10030122.
Aman N, Manomaiphiboon K, Suwattiga P, Assareh N, Limpaseni W, Suwanathada P, Soonsin V, Wang Y (2022) Visibility, aerosol optical depth, and low-visibility events in Bangkok during the dry season and associated local weather and synoptic patterns. Environ Monit Assess 194:322. https://doi.org/10.1007/s10661-022-09880-2.
Aman N, Manomaiphiboon K, Xian D, Tian L, Gao L, Wang Y, Pala-En N, Wang Y, Wangyao K (2024) Spatiotemporal estimation of hourly PM2.5 using AOD derived from geostationary satellite Fengyun-4A and machine learning models for Greater Bangkok. Air Qual Atmos Health. https://doi.org/10.1007/s11869-024-01524-3.
Barmpadimos I, Hueglin C, Keller J, Henne S, Prévôt ASH (2011) Influence of meteorology on PM₁₀ trends and variability in Switzerland from 1991 to 2008. Atmos Chem Phys 11:1813–1835. https://doi.org/10.5194/acp-11-1813-2011.
Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324.

Chen J, Yin J, Zang L, Zhang T, Zhao M (2019) Stacking machine learning model for estimating hourly PM2.5 in China based on Himawari 8 aerosol optical depth data. Sci Total Environ 697:134021. https://doi.org/10.1016/j.scitotenv.2019.134021.
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD ’16, San Francisco, CA, USA, 13–17 August 2016, 785–794.
ChooChuay C, Pongpiachan S, Tipmanee D, Suttinun O, Deelaman W, Wang Q, Xing Li, Li G, Han Y, Palakun J, Cao J (2020) Impacts of PM2.5 sources on variations in particulate chemical compounds in ambient air of Bangkok, Thailand. Atmos Pollut Res 11:1657–1667. https://doi.org/10.1016/j.apr.2020.06.030.
Department of Provincial Administration (DOPA) (2023) Statistic of Population by Province in 2022. (In Thai). https://stat.bora.dopa.go.th/new_stat/webPage/statByYear.php (accessed on 28 September 2023).
Fathollahi L, Wu F, Maleki R, Pongracic B (2023) PM2.5 concentrations estimation using machine learning methods with combination of MAIAC - MODIS AOD product - a case study in western Iran. Air Qual Atmos Health 16:1529–1541. https://doi.org/10.1007/s11869-023-01354-9.

Freund Y, Schapire RE (1999) A short introduction to boosting. J Japan Soc Artif Intell 14:771–780.
Friedman JH (2001) Greedy function approximation: A gradient boosting machine. Ann Stat 29:1189–1232.
Grange SK, Carslaw DC, Lewis AC, Boleti E, Hueglin C (2018) Random forest meteorological normalisation models for Swiss PM₁₀ trend analysis. Atmos Chem Phys 18:6223–6239. https://doi.org/10.5194/acp-18-6223-2018.
Gupta P, Zhan S, Mishra V, Aekakkararungroj A, Markert A, Paibong S, Chishtie F (2021) Machine learning algorithm for estimating surface PM2.5 in Thailand. Aerosol Air Qual Res 21:210105. https://doi.org/10.4209/aaqr.210105.
Hou L, Dai Q, Song C, Liu B, Guo F, Dai T, Li L, Liu B, Bi X, Zhang Y, Feng Y (2022) Revealing drivers of haze pollution by explainable machine learning. Environ Sci Technol Lett 9:112−119. https://doi.org/10.1021/acs.estlett.1c00865.
Hu X, Belle JH, Meng X, Wildani A, Waller LA, Strickland MJ, Liu Y (2017) Estimating PM2.5 concentrations in the conterminous United States using the random forest approach. Environ Sci Technol 51:6936−6944. https://doi.org/10.1021/acs.est.7b01210.
Kim BY, Cha JW, Chang KH, Lee C (2022a) Estimation of the visibility in Seoul, South Korea, based on particulate matter and weather data, using machine-learning algorithm. Aerosol Air Qual Res 22:220125. https://doi.org/10.4209/aaqr.220125.
Kim J, Kim SH, Seo HW, Wang YV, Lee YG (2022b) Meteorological characteristics of fog events in Korean smart cities and machine learning based visibility estimation. Atmos Res 275:106239. https://doi.org/10.1016/j.atmosres.2022.106239.
Kumar V, Malyan V, Sahu M, Biswal B, Pawar M, Dev I (2023) Spatiotemporal analysis of fine particulate matter for India (1980–2021) from MERRA-2 using ensemble machine learning. Atmos Pollut Res 14:101834. https://doi.org/10.1016/j.apr.2023.101834.
Land Development Department (LDD) (2016) Land use and land cover data for Thailand for the Years 2012–2016; CD–ROM Product; Land Development Department: Bangkok, Thailand, 2016.
Lee HH, Bar-Or RZ, Wang C (2017) Biomass burning aerosols and the low-visibility events in Southeast Asia. Atmos Chem Phys 17:965–980. https://doi.org/10.5194/acp-17-965-2017.

Liu B, Wang Y, Meng H, Dai Q, Diao L, Wu J, Shi L, Wang J, Zhang Y, Feng Y (2022) Dramatic changes in atmospheric pollution source contributions for a coastal megacity in northern China from 2011 to 2020. Atmos Chem Phys 22:8597–8615. https://doi.org/10.5194/acp-22-8597-2022.
Lundberg S, Lee SI (2017) A unified approach to interpreting model predictions. arXiv 1705.07874.
Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee SI (2020) From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2:56−67. https://doi.org/10.1038/s42256-019-0138-9.
Mallet MD (2021) Meteorological normalisation of PM₁₀ using machine learning reveals distinct increases of nearby source emissions in the Australian mining town of Moranbah. Atmos Pollut Res 12:23–35. https://doi.org/10.1016/j.apr.2020.08.001.
Majewski G, Rogula-Kozłowska W, Szeląg B, Anioł E, Rogula-Kopiec P, Brandyk A, Walczak A, Radziemska M (2022) New insights into submicron particles impact on visibility. Environ Sci Pollut Res 29:87969–87981. https://doi.org/10.1007/s11356-022-21781-y.
Narita D, Oanh NTK, Sato K, Huo M, Permadi DA, Chi NNH, Ratanajaratroj T, Pawarmart I (2019) Pollution characteristics and policy actions on fine particulate matter in a growing Asian economy: The case of Bangkok Metropolitan Region. Atmosphere 10:227. https://doi.org/10.3390/atmos10050227.

National Economic and Social Development Board (NESDB) (2022) Gross Regional and Provincial Product, Chain Volume Measures, 2021 edition; Office of the National Economic and Social Development Board: Bangkok, Thailand, 2022. Available online: https://www.nesdc.go.th/main.php?filename=gross_regional (accessed on 28 September 2022).

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830.
Peng-In B, Sanitluea P, Monjatturat P, Boonkerd P, Phosri A (2022) Estimating ground-level PM2.5 over Bangkok Metropolitan Region in Thailand using aerosol optical depth retrieved by MODIS. Air Qual Atmos Health 15:2091–2102. https://doi.org/10.1007/s11869-022-01238-4.
Phairuang W, Suwattiga P, Chetiyanukornkul T, Hongtieab S, Limpaseni W, Ikemori F, Hata M, Furuuchi M (2019) The influence of the open burning of agricultural biomass and forest fires in Thailand on the carbonaceous components in size-fractionated particles. Environ Pollut 247:238–247. https://doi.org/10.1016/j.envpol.2019.01.001.
Penov N, Guerova G (2023) Sofia airport visibility estimation with two machine-learning techniques. Remote Sens 15:4799. https://doi.org/10.3390/rs15194799.
Pollution Control Department (PCD) (2023) Annual Report 2022, Pollution Control Department, Bangkok, Thailand (in Thai). https://www.pcd.go.th/wp-content/uploads/2023/04/pcdnew-2023-05-02_04-27-17_828080.pdf (accessed on 6th June 2023).
Qu L, Liu S, Ma L, Zhang Z, Du J, Zhou Y, Meng F (2020) Evaluating the meteorological normalized PM2.5 trend (2014-2019) in the "2+26" region of China using an ensemble learning technique. Environ Pollut 266:115346. https://doi.org/10.1016/j.envpol.2020.115346.
Singh A, Bloss WJ, Pope FD (2017) 60 years of UK visibility measurements: impact of meteorology and atmospheric pollutants on visibility. Atmos Chem Phys 17:2085–2101. https://doi.org/10.5194/acp-17-2085-2017.

Thongthammachart T, Shimadera H, Araki S, Matsuo T, Kondo A (2023) Land use regression model established using Light Gradient Boosting Machine incorporating the WRF/CMAQ model for highly accurate spatiotemporal PM2.5 estimation in the central region of Thailand. Atmos Environ 297:119595. https://doi.org/10.1016/j.atmosenv.2023.119595.
Thai Meteorological Department (TMD) (2023) The climate of Thailand. Thai Meteorological Department. https://www.tmd.go.th/en/archive/thailand_climate.pdf. (accessed on 28 September 2023).
Tian L, Chen L, Zhang P, Hu B, Gao Y, Si Y (2023) The ground-level particulate matter concentration estimation based on the new generation of FengYun geostationary meteorological satellite. Remote Sens 15(5):1459. https://doi.org/10.3390/rs15051459.
Uyanık T, Karatug C, Arslanoglu Y (2021) Machine learning based visibility estimation to ensure safer navigation in strait of Istanbul. Appl Ocean Res 112:102693. https://doi.org/10.1016/j.apor.2021.102693.

Vu TV, Shi Z, Cheng J, Zhang Q, He K, Wang S, Harrison RM (2019) Assessing the impact of clean air action on air quality trends in Beijing using a machine learning technique. Atmos Chem Phys 19:11303–11314. https://doi.org/10.5194/acp-19-11303-2019.
Yao T, Li J (2023) Environmental sustainability performance assessment in relation to visibility in African regions with interpretable machine learning. J Clean Prod 428:139414. https://doi.org/10.1016/j.jclepro.2023.139414.

Wang M, Zhang Z, Yuan Q, Li X, Han S, Lam Y, Cui L, Huang Y, Cao J, Lee SC (2022) Slower than expected reduction in annual PM2.5 in Xi'an revealed by machine learning-based meteorological normalization. Sci Total Environ 841:156740. https://doi.org/10.1016/j.scitotenv.2022.156740.

Wang S, Ren Y, Xia B (2023a) PM2.5 and O₃ concentration estimation based on interpretable machine learning. Atmos Pollut Res 14:101866. https://doi.org/10.1016/j.apr.2023.101866.
Wang S, Ren Y, Xia B (2023b) Estimation of urban AQI based on interpretable machine learning. Environ Sci Pollut Res 30:96562–96574. https://doi.org/10.1007/s11356-023-29336-5.
Wang X, Zhang R, Yu W (2019) The effects of PM2.5 concentrations and relative humidity on atmospheric visibility in Beijing. J Geophys Res Atmos 124:2235–2259. https://doi.org/10.1029/2018JD029269.

Watson JG (2002) Visibility: Science and regulation. J Air Waste Manag Assoc 52:628–713. https://doi.org/10.1080/10473289.2002.10470813.

Wei J, Li Z, Pinker RT, Wang J, Sun L, Xue W, Li R, Cribb M (2021) Himawari-8-derived diurnal variations in ground-level PM2.5 pollution across China using the fast space-time light gradient boosting machine (LightGBM). Atmos Chem Phys 21:7863–7880. https://doi.org/10.5194/acp-21-7863-2021.

Wongnakae P, Chitchum P, Sripramong R, Phosri A (2023) Application of satellite remote sensing data and random forest approach to estimate ground-level PM2.5 concentration in Northern region of Thailand. Environ Sci Pollut Res 30:88905–88917. https://doi.org/10.1007/s11356-023-28698-0.
Wu Y, Lin S, Shi K, Ye Z, Fang Y (2022a) Seasonal prediction of daily PM2.5 concentrations with interpretable machine learning: a case study of Beijing, China. Environ Sci Pollut Res 29:45821–45836. https://doi.org/10.1007/s11356-022-18913-9.
Wu Q, Li T, Zhang S, Fu J, Seyler BC, Zhou Z, Deng X, Wang B, Zhan Y (2022b) Evaluation of NOx emissions before, during, and after the COVID-19 lockdowns in China: A comparison of meteorological normalization methods. Atmos Environ 278:119083. https://doi.org/10.1016/j.atmosenv.2022.119083.


Editorial decision: Major Revision
28 Apr, 2024
Reviewers agreed at journal
28 Mar, 2024
Reviewers invited by journal
28 Mar, 2024
Editor assigned by journal
22 Mar, 2024
First submitted to journal
21 Mar, 2024


\(\rho = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(\bar{O}-O_{i})}{\sigma_{o}}\times\frac{(\bar{P}-P_{i})}{\sigma_{p}}\)	(1)
\(MB = \frac{\sum_{i=1}^{n}(P_{i}-O_{i})}{n}\)	(2)
\(ME = \frac{\sum_{i=1}^{n}\left|P_{i}-O_{i}\right|}{n}\)	(3)
\(RMSE = \sqrt{\frac{\sum_{i=1}^{n}(P_{i}-O_{i})^{2}}{n}}\)	(4)
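
The four evaluation metrics in Eqs. 1–4 can be computed as in the sketch below. It assumes that O and P denote observed and predicted visibility and that σ_o and σ_p are sample standard deviations (divisor n − 1, matching the factor in Eq. 1); the function name `evaluation_metrics` and the toy values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return rho, MB, ME, and RMSE for paired observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sample standard deviations (assumption: divisor n - 1, as in Eq. 1).
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / (n - 1))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / (n - 1))
    rho = sum(
        (mean_o - o) / sd_o * (mean_p - p) / sd_p
        for o, p in zip(observed, predicted)
    ) / (n - 1)                                                        # Eq. 1
    errors = [p - o for o, p in zip(observed, predicted)]
    mb = sum(errors) / n                                               # Eq. 2
    me = sum(abs(e) for e in errors) / n                               # Eq. 3
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)                  # Eq. 4
    return {"rho": rho, "MB": mb, "ME": me, "RMSE": rmse}


# Made-up visibility values (km), for illustration only.
example = evaluation_metrics([8.0, 9.0, 10.0, 7.0], [8.5, 8.5, 9.5, 7.5])
```

In this toy case the positive and negative errors cancel, so MB is zero even though ME and RMSE are not, which is why bias alone cannot rank the models.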