Curvilinear and gamma generalized linear models for modeling the dependency of air pollution levels on meteorological conditions in Kathmandu valley


The effects of meteorological parameters on air pollution levels, including the confounding effects of seasonality and lag effects, are investigated and quantified for Kathmandu valley, Nepal, using daily local data for 2017-2020. Air pollution data come from monitoring by the Department of Environment and the US Embassy, Kathmandu, Nepal, and meteorological data from monitoring by the Department of Hydrology and Meteorology, Kathmandu, Nepal. Data are modeled with exponential, Box-Cox transformed and Gamma generalized linear models, since air pollution levels are found to be non-normal with non-constant error variance under linear modeling. The fitted models explain high proportions of the observed variation in air pollution (79%-85%). A 1 °C increase in average temperature is associated with around a 5% reduction in PM10 and PM1 levels and with a significant increase in surface O3 level (0.177 in Box-Cox transformed value). Similarly, decreases of around 0.7% and 2% in PM1 and PM10 per 1% increase in relative humidity, a 0.032 decrease in the transformed value of PM2.5 per 1 mm increase in rainfall, and a 7.3% decrease in PM10 per 1 m/s increase in wind speed are detected. Other statistically significant effects of relative humidity and wind are quantified in terms of Box-Cox transformed values. In conclusion, meteorological conditions are found to be significant contributing factors in determining air pollution levels, as demonstrated by statistical modeling. In the long run, atmospheric conditions can play a vital role in shifts in the air pollution situation, mainly due to climate change characterized by changes in meteorological values.


Introduction
Air pollution levels in the ambient air depend primarily upon sources such as vehicular and industrial emissions, domestic fuel combustion and solid waste. Apart from these sources, temporal variation in air pollution can also be attributed to local topography (plain terai, hilly or mountainous regions), to weather or meteorological conditions assessed by parameters such as temperature, rainfall, humidity, and wind speed and direction, and to regional transport and atmospheric chemistry. Even though many studies in other parts of the world have assessed the dependence of air pollution levels on weather parameters through model building, very few such studies have been conducted in Kathmandu so far. Because few studies have actually quantified the relationship between air pollution levels and atmospheric conditions using local daily data, the present study was carried out to fill this gap through statistical modeling of locally available air pollution and meteorological data. Moreover, the air pollution situation in Kathmandu valley has been an important environmental and public health concern for its inhabitants, as shown by many previous studies [2,3,5,8,14,18,19]. According to the world air quality report 2019, Nepal is among the top 10 most air polluted nations in the world, ranked eighth worst by population weighted ambient PM2.5 level, and Kathmandu ranked sixth worst among the capital cities of the world [7]. Many earlier studies conducted at different places have shown an association between air pollution and meteorological parameters.
A study conducted at North Chennai, a coastal city in India, during the monsoon, post-monsoon, summer, and pre-monsoon seasons of 2010-11 used regression analysis to assess the influence of temperature and relative humidity on ambient SO2, NOx, suspended particulate matter (SPM) and respirable SPM concentrations. The results showed that both SO2 and NOx were negatively correlated with temperature in summer and moderately and positively correlated with it during the post-monsoon season. Weak to moderate correlations existed between temperature and ambient pollutant concentrations during all seasons, indicating the influence of inconstant thermal variation in the coastal region. Statistically significant negative correlations were found between humidity and particulates in all four seasons [9]. Similarly, a study conducted in Dhaka, Bangladesh during 2013-2017 using linear and polynomial regression showed that PM2.5 and PM10 were negatively related to temperature and relative humidity. While the relationships between PM and temperature were negative in all other seasons, that study found a positive relationship during the monsoon season, which implied that high temperature combined with higher humidity during this season contributes to the suspension of PM [12]. Another study used real-time hourly average concentrations of six pollutants (CO, NO2, O3, PM10, PM2.5 and SO2) recorded in January 2013 at air quality monitoring stations in major Chinese cities to analyze air pollution characteristics and their relation to multi-scale meteorological conditions. Meteorological conditions were the primary factor determining day-to-day variations in pollutant concentrations, explaining more than 70% of the variance of daily average pollutant concentrations over China [6].
Moreover, a study conducted in Turkey during 2003-2005 investigated the relationship between monitored air pollutant concentrations, namely SO2 and total suspended particles (TSP), and meteorological factors such as wind speed, temperature, relative humidity and atmospheric pressure during the months of October-March. According to the results of linear and non-linear regression analysis, there was a weak to moderate relation between the air pollutant concentrations and the meteorological factors [1].
A study conducted in Kano metropolis, Nigeria during 2018 monitored ambient day-time concentrations of NO2, PM10, SO2, H2S and CO in a dry month (April) and a wet month (August), with corresponding meteorological data collected from the Nigerian Meteorological Agency. The washout or scavenging effect of temperature, relative humidity and precipitation on the pollutants was analyzed quantitatively. The results showed that concentrations of the pollutants in the atmosphere were lower under conditions of increased precipitation, low temperature and increased humidity than in the dry season [16]. A similar study investigated multi-timescale meteorological effects on urban air pollution using measurements of PM10, SO2, NO2, CO, and O3 and meteorological variables over the period 1999-2016 in Seoul, South Korea. The long-term air quality data were decomposed into trend-free short-term components and long-term trends, and the effects of meteorology and emissions were quantitatively isolated using multiple linear regression with meteorological variables. In terms of short-term variability, inter-correlations among the pollutants and meteorological variables showed that warm and stagnant conditions in the migratory high-pressure system are related to high PM10, while strong irradiance and low NO2 caused by high winds at the rear of a cyclone are related to high O3. In terms of long-term trends, meteorology-related trends contributed largely to the decrease in PM10 (−1.75 µg m⁻³ yr⁻¹) and the increase in O3 (+0.88 ppb yr⁻¹) in Seoul [17].
Very few studies have associated air pollution with meteorological parameters in Kathmandu valley. One of these, conducted during 2003-2005, found an association between meteorological conditions (temperature, rainfall, humidity, atmospheric pressure, wind direction and speed) and elemental concentrations of PM10 in Kathmandu valley. Rainfall, temperature and humidity were negatively correlated with average PM10 concentration (r = -0.358 with rainfall and maximum temperature, r = -0.539 with humidity), whereas atmospheric pressure (r = 0.237) and wind speed (r = 0.162) were positively correlated with it [4]. Similarly, a more recent study conducted by NHRC in 2014/15 associated ambient PM2.5 with meteorological parameters and found negative associations with temperature (-0.711), rainfall (-0.345) and humidity (-0.207) based upon daily average data for one whole year [14].
Bhaisipati, Pulchowk and Bhaktapur were incorporated for the analysis. Data for Kirtipur station was unavailable.

Meteorological data
Daily meteorological data on temperature (maximum and minimum), rainfall, relative humidity and wind speed were collected from the Department of Hydrology and Meteorology (DHM), Government of Nepal (GoN), Kathmandu, covering three years of daily data from 2017 to early 2020. The data include eight stations spread over all three districts of Kathmandu valley and are used mainly for associating air pollution with atmospheric conditions. The stations are Bhaktapur, Nagarkot, Changunarayan, Godavari, Khokana, Khumalatar, Panipokhari and Kathmandu Airport.

Analysis
Descriptive analysis and subsequent assessment are based upon monthly averages of meteorological parameters and corresponding air pollution averages, with graphical representations. The effects of meteorological parameters, seasonality, trend and the autoregressive nature of the time series variables are quantified through statistical models, including curvilinear models and the Gamma generalized linear model (GLM), based upon daily averages.

Models
Regression models are built to associate air pollution concentration levels with meteorological variables and confounders such as seasonality and trend. Additionally, autoregressive terms are explored and added to account for autoregressive effects, since time series of daily pollution levels can be autoregressive; this was confirmed by computing autocorrelation coefficients at different lags. Various regression models, namely curvilinear models, Box-Cox transformed models [13] and Gamma GLM, are explored for their suitability, since pollution concentration levels are found to be highly skewed and non-normally distributed, which rules out linear regression models. The functional forms of the models considered are given below.

Box-Cox Transformed Model
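The Box-Cox transformation is shown below, where $\lambda$ is a constant determined by goodness of fit and model adequacy test results.

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln y, & \lambda = 0 \end{cases}$$

The corresponding model is:

$$y^{(\lambda)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

Here also $y$ is the response variable, the $\beta_i$ are unknown parameters, the $x_i$ are predictors (meteorological variables and confounders), and the $\varepsilon$ are residuals.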
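The Box-Cox transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `box_cox` and `inverse_box_cox`, the sample PM2.5 values and the value of lambda are all hypothetical and are not taken from the study, which selects the constant from goodness of fit and model adequacy results.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values y for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return np.log(y)  # limiting case of the power transform
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map transformed values back to the original scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Hypothetical daily PM2.5 values (µg/m³); not data from the study.
pm25 = np.array([12.0, 35.5, 64.0, 80.0])
lam = 0.25  # illustrative value only

z = box_cox(pm25, lam)
back = inverse_box_cox(z, lam)
print(z)
print(back)
```

Because effects in a Box-Cox model are expressed on the transformed scale, a helper such as `inverse_box_cox` is one way to translate fitted values back to concentration units when interpreting results.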

Gamma Generalized Linear Model
A generalized linear model for a Gamma distributed dependent variable can be used instead of curvilinear models with variable transformations when the response variable is positive, continuous, non-normal and positively skewed, provided that the model performs relatively better on diagnostic tests and data transformations are to be avoided because of difficulty in interpretation. In the model building process, the option of using a Gamma GLM is also explored, since air pollution levels are found to be non-normal and positively skewed, and dependency of error variance on the mean is detected even after using curvilinear and Box-Cox transformed models. Additionally, the assumption of homoscedasticity of residuals in linear regression is relaxed when using a Gamma GLM [15]. Finally, the model is used if it is suitable with regard to goodness of fit and other major model adequacy tests.
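As a rough illustration of how a Gamma GLM can be fitted, the sketch below uses iteratively reweighted least squares with a log link on synthetic data. The log link, the function name `fit_gamma_glm_log`, the predictors and the coefficient values are assumptions made for the example; the source does not state which link function or software was used.

```python
import numpy as np


def fit_gamma_glm_log(X, y, n_iter=50, tol=1e-10):
    """Fit a Gamma GLM with log link by iteratively reweighted least squares.

    X must already contain an intercept column. With the log link and the
    Gamma variance function V(mu) = mu^2, the IRLS weights are constant, so
    each step is an ordinary least-squares fit to the working response.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Starting values: least squares on the log scale.
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        new_beta = np.linalg.lstsq(X, z, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Synthetic data (hypothetical, not the study data).
rng = np.random.default_rng(0)
n = 2000
temp = rng.uniform(5, 28, n)   # average temperature, °C
rh = rng.uniform(50, 95, n)    # relative humidity, %
X = np.column_stack([np.ones(n), temp, rh])
true_beta = np.array([6.0, -0.05, -0.02])  # illustrative coefficients
mu = np.exp(X @ true_beta)
shape = 5.0
y = rng.gamma(shape, mu / shape)  # Gamma response with mean mu

beta_hat = fit_gamma_glm_log(X, y)
print(beta_hat)
```

The Gamma family lets the variance grow with the mean, which is the behavior the text describes for the pollution series, without transforming the response.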

Descriptive analysis
The monthly averages of air pollution levels (PM10, PM2.5 and ozone) and corresponding meteorological parameters in Kathmandu valley during 2017-2020 were assessed. For PM1, monthly averages were assessed only for 2019 because data were unavailable for the remaining years of the 2017-2020 period. The annual averages of temperature, rainfall, humidity and wind are found to be 18.5 °C, 1431.5 mm, 77.6% and 1.3 m/s, respectively. The monthly variation shows the lowest average temperature (10.6-13.4 °C) in winter, when particulate air pollution levels are found to be high; relatively warmer conditions, with averages between 15.6 and 19.6 °C, in most spring/autumn months (March, April, October, November); and the highest averages, between 21.4 and 23.9 °C, during most of the summer/monsoon months (April-September), when particulate air pollution levels are found to be relatively low. This indicates that temperature and particulate air pollution are inversely related. Similarly, rainfall is distinctly very high in the monsoon (July-August), with monthly averages between 334.4 and 404.2 mm, moderate in May, June and September (116-181 mm) and low in the other, dry months (1-91 mm), with the lowest in November. For relative humidity, monthly averages show that the monsoon and near-monsoon months (July-September) were the most humid, with averages between 80% and 85%, while the lowest averages occurred in March and April (70-72%), when rainfall was relatively low. Regarding wind, monthly averages show that the windiest months in Kathmandu valley were March to June, when the average wind speed ranged between 1.6 and 1.8 m/s, whereas the lowest speeds occurred in and around winter (November-January), with averages between 0.7 m/s and 0.95 m/s.
Considering particulate air pollution, monthly averages are found to be highest in the winter months, at approximately 85-99 µg/m³, 64-80 µg/m³ and 36-41 µg/m³ for PM10, PM2.5 and PM1, respectively. The averages clearly indicate an inverse relationship between temperature and particulate air pollution levels. In contrast, the averages are found to be lowest in the summer/monsoon season, at approximately 21-23 µg/m³, 13-18 µg/m³ and 5-17 µg/m³ for PM10, PM2.5 and PM1, respectively. However, the pattern of monthly and seasonal variation in ozone is found to be very different from that of particulate air pollution. Ozone levels are found to be highest in warm temperatures during the spring/summer months (March-June), with values between 67 and 79 µg/m³, and lowest during most of the winter (December-January), with values between 31 and 33 µg/m³.

Models
Air pollution levels measured by PM2.5, PM10, PM1 and O3 are modeled on different predictors, including meteorological parameters, seasonal effects and a daily trend variable. Additionally, lag effects are explored since the data used for modeling are essentially time series. Data exploration showed significant positive skewness for all the air pollution parameters considered. This suggested the suitability of curvilinear models, models with transformation, and nonlinear models including the Gamma generalized linear model (GLM), rather than linear models, since the normality assumption cannot be accepted for substantially skewed variables. The exponential model with logarithmic transformation, the Box-Cox transformed model (response variable with Box-Cox transformation) and the Gamma GLM were explored for their suitability, and the best among them was chosen considering various model adequacy tests: goodness of fit assessed by adjusted R² or the Omnibus test, heteroscedasticity by residual plot, normality by the Kolmogorov-Smirnov (KS) test, autocorrelation by plots up to a sufficient lag, and multicollinearity by the Variance Inflation Factor (VIF). Models with estimated parameters, 95% confidence intervals and p values are shown in Table 2(A) and

Effects

Temperature
The effect of temperature on pollution levels is found to be the most evident among the predictors for all air pollutants and statistical models explored. A 1 °C increase in average temperature is associated with 5.1% and 4.6% decreases in PM10 and PM1 levels, respectively. Similarly, a 1 °C increase in average temperature is associated with a 0.083 decrease in the Box-Cox transformed unit of PM2.5 and, in contrast, a 0.177 increase in the Box-Cox transformed unit of ozone. The negative association between temperature and particulate air pollutants shows that cold weather, mainly in winter, significantly increases particulate air pollution in the ambient air compared to warm atmospheric conditions, mainly because particulate pollutants are trapped near the ground during colder, calmer months by temperature inversion. During a temperature inversion, smoke and dust particles cannot easily rise and disperse in the atmosphere, which is very evident in a place like the bowl shaped Kathmandu valley, characterized by low wind flow. In contrast, ground level ozone is found to be relatively higher in warm than in cold temperatures, primarily because pollutants emitted by vehicles, industries and other sources react chemically in the presence of sunlight to produce ozone, which is therefore most likely to reach unhealthy levels on hot sunny days in urban environments.
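The percentage effects quoted above can be related to log-scale coefficients with a short calculation. The sketch assumes that the percentages come from a model that is linear in the logarithm of the pollutant level, so that a unit increase in a predictor multiplies the level by exp(beta); the source reports only the percentages, so the coefficient derived here is illustrative and the function names are hypothetical.

```python
import math


def percent_change_per_unit(beta):
    """Percent change in the response per unit increase in a predictor,
    for a model that is linear in log(response)."""
    return (math.exp(beta) - 1.0) * 100.0


def coefficient_from_percent(pct):
    """Log-scale coefficient implied by a given percent change per unit."""
    return math.log(1.0 + pct / 100.0)


# Reported effect: 5.1% decrease in PM10 per 1 °C increase in temperature.
beta_temp = coefficient_from_percent(-5.1)

# Effects compound multiplicatively, so a 5 °C rise is not simply 5 x 5.1%.
five_degree_change = percent_change_per_unit(5 * beta_temp)
print(beta_temp, five_degree_change)
```

Under this assumption, a 5 °C rise corresponds to a reduction of roughly 23% rather than 25.5%, because each degree acts on an already reduced level.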

Rainfall
Rainfall is found to be statistically associated with PM2.5 level: a 1 cm increase in rainfall is associated with a 0.32 decrease in the Box-Cox transformed value of PM2.5, which indicates that rainfall decreases air pollution concentration in the ambient air. Except for PM2.5, rainfall is strikingly found to be statistically insignificant even at the 15% level for the rest of the air pollutants considered. This may be primarily due to multicollinearity among the meteorological parameters when all of them are included in modeling since, for instance, the monsoon season with high rainfall is also characterized by warm temperatures and high relative humidity compared to the winter season. When monthly averages of particulate air pollution are assessed instead, pollution levels are found to be lowest during the monsoon, including in Kathmandu valley, which is strong evidence that rainfall washes dust particles out of the air. Nevertheless, since other major parameters such as temperature and relative humidity are found to be statistically significant in the models for particulate air pollution levels, the rainfall effect was not found to be statistically significant.

Relative Humidity
Similar to temperature, relative humidity is found to be statistically associated with air pollution levels in all four models considered. In terms of direction and magnitude, an increase in relative humidity decreases ambient air pollution levels: a 1% increase in relative humidity is associated with 1.9% and 0.7% decreases in PM10 and PM1 levels, respectively. Similarly, a 1% increase in relative humidity is associated with 0.026 and 0.183 decreases in the Box-Cox transformed units of PM2.5 and ozone, respectively. The reduction in ozone level associated with an increase in relative humidity has also been found in other studies [10,11].

Wind
Wind is another important weather parameter that affects air pollution levels significantly, as shown by many studies (NHRC, 2016). Wind disperses air contaminants away from their source and therefore, generally, higher wind speed is associated with lower air pollution concentration. The present study also found a negative association of PM10 and PM2.5 with wind speed: a 1 m/s increase in wind speed is associated with a 7.3% decrease in PM10 and a 0.148 decrease in the Box-Cox transformed value of PM2.5.

Seasonal effects
Along with atmospheric parameters, seasonality can be a major contributing factor to variation in pollution levels. Because seasons are characterized by joint effects of meteorological parameters, air pollution levels tend to differ significantly between seasons. Modeling shows that in the Winter season, characterized by cool temperature, lower relative humidity and low wind flow, PM10 is higher by 16.6% and PM2.5 by 0.278 in Box-Cox transformed value compared to Summer. Similarly, in the Spring season, characterized by dry and relatively cool conditions, PM10 is higher by 59.7%, and PM2.5 and O3 are higher by 0.594 and 1.716 in Box-Cox transformed values, respectively, compared to Summer. Similar results are obtained for the Autumn season for PM10 and PM2.5 compared to Summer. For PM1 the Autumn effect appears to be different, which could be due to a season specific joint effect in Autumn that differs from that in Spring. For ozone, the effect of the Autumn season is found to be statistically insignificant.

Lag effects
Models based upon time series data are often affected by lagged variables, so autoregressive terms may be required to explain variation in the response variable when modeling the major predictors. In the present model building process, this was explored, and the first and second lagged values of the responses are found to be statistically significant in explaining variation in air pollution levels. All the particulate air pollution parameters and O3 are found to be affected by their own values at Lag1, with a statistically significant positive association at the 1% level. For PM1, the Lag2 value is additionally found to be statistically significant.
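The lag terms discussed above can be built by shifting the response series and appending the shifted copies to the design matrix. The sketch below shows one way to do this, together with a lag autocorrelation coefficient, on a synthetic AR(1) series; the function names `autocorr` and `add_lags`, the series and the coefficient 0.7 are hypothetical and only illustrate the mechanics.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[lag:] * x[:-lag]) / np.sum(x * x))


def add_lags(y, X, n_lags):
    """Drop the first n_lags rows and append y at lags 1..n_lags as columns."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    lagged = [y[n_lags - k: len(y) - k] for k in range(1, n_lags + 1)]
    design = np.column_stack([X[n_lags:]] + lagged)
    return y[n_lags:], design


# Synthetic AR(1) series standing in for a daily pollutant series.
rng = np.random.default_rng(1)
n = 1000
noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.7 * series[t - 1] + noise[t]

r1 = autocorr(series, 1)

# Intercept-only predictors plus Lag1 and Lag2 of the response.
X0 = np.ones((n, 1))
y_t, design = add_lags(series, X0, n_lags=2)
coef = np.linalg.lstsq(design, y_t, rcond=None)[0]
print(r1, coef)
```

In a real analysis the meteorological and seasonal predictors would take the place of the intercept-only matrix `X0`, and the significance of each lag coefficient would decide how many lags to keep.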

Model adequacy tests
Goodness of fit, normality, heteroscedasticity, multicollinearity and autocorrelation checks were performed as the model adequacy requirements for accepting the fitted models. The models performed well on goodness of fit, with 79%-85% of the variance in the response variables explained in the curvilinear models and a highly significant Omnibus test for the Gamma GLM. Regarding normality and heteroscedasticity, residuals are found to be slightly non-normal and variances fairly constant. In order to obtain normal residuals, other curvilinear/nonlinear models were also explored, but residuals were still found to be non-normal or slightly non-normal. Consequently, the fitted models are accepted, with the indication that further research could be required to achieve more accurate modeling results. Regarding autocorrelation, the models are not much affected by high autocorrelations (-1<r<0.3), considering the large sample size used for modeling.
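Of the checks listed above, the Variance Inflation Factor is simple to compute directly: each predictor is regressed on the others and VIF = 1 / (1 - R²). The sketch below is a minimal illustration on synthetic predictors; the function name `vif`, the variables and their correlation structure are assumptions chosen to mimic correlated weather variables, not values from the study.

```python
import numpy as np


def vif(X):
    """Variance Inflation Factor for each column of X (no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, target, rcond=None)[0]
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        out[j] = ss_tot / ss_res  # equals 1 / (1 - R^2)
    return out


# Synthetic predictors (hypothetical): rainfall deliberately tracks temperature.
rng = np.random.default_rng(2)
n = 500
temp = rng.normal(18, 5, n)
rain = 2.0 * temp + rng.normal(0, 5, n)
wind = rng.normal(1.3, 0.4, n)

vifs = vif(np.column_stack([temp, rain, wind]))
print(vifs)
```

Predictors that move together, as temperature and rainfall do in this synthetic example, show inflated values, while an unrelated predictor stays close to 1.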

Conclusion
Though the major sources that govern the overall average of air pollution in a local environment are anthropogenic emissions from vehicular, industrial, domestic, solid waste and other sources, natural characteristics such as local topography and atmospheric conditions (themselves also influenced by human activities) also determine air pollution levels and their temporal variation. The present analysis focused on atmospheric conditions as determined by meteorological parameters: temperature, rainfall, relative humidity and wind. Exploring the dependency of temporal air pollution variation through curvilinear and nonlinear statistical models, namely exponential, Box-Cox transformed and Gamma GLM, revealed statistically significant associations between air pollution levels and the meteorological parameters, after accounting for time series related confounding variables such as seasonality and autoregressive dependence (lagged effect), with high proportions of variance in air pollution levels explained (79% to 85%). Given the lack of studies that have quantified these effects in Kathmandu valley using local data, the present analysis should be useful for quantifying meteorological effects on air pollution levels. Results showed around a 5% reduction in particulate air pollution (PM10 and PM1) per 1 °C increase in average temperature and a significant increase in surface O3 air pollution (0.177 in Box-Cox transformed value) per 1 °C increase in average temperature. Similarly, decreases of around 0.7% and 2% in PM1 and PM10 per 1% increase in relative humidity and a 7.3% decrease in PM10 per 1 m/s increase in wind speed are detected. Other statistically significant effects of rainfall, relative humidity and wind are quantified in terms of Box-Cox transformed values.
In conclusion, meteorological conditions are significant contributing factors in determining air pollution levels, as demonstrated by statistical modeling of local data in Kathmandu valley. In the long run, atmospheric conditions can play a vital role in shifts in the air pollution situation, mainly due to climate change characterized by changes in meteorological parameter values.

Author Declarations
Funding: The paper is based upon the study funded by WHO under urban health initiative (UHI) program implemented in Kathmandu. WHO reference no. of the study is 202576053.

Conflict of Interests / Competing Interests:
The author declares that there is no conflict of interest. The author has no relevant financial or non-financial interests to disclose. The author has no competing interests to declare that are relevant to the content of this article.

Ethical approval:
The study does not involve human or animal participant data. Analysis and modeling are based upon secondary data. Available air pollution data for Kathmandu valley were acquired from DataPlatform, World Air Quality project. Upon request, the data were sent by the provider through email on the conditions that they be used only for the WHO study and that the data provider be acknowledged. The remaining air pollution data were downloaded from the AirNow website of the US Embassy and the Open Data portal of Nepal, both of which provide freely downloadable data. Similarly, meteorological data were obtained (purchased) from the Department of Hydrology and Meteorology (DHM), Kathmandu, upon formal request and after fulfilling the conditions required for their use.

Data availability/transparency
Monthly meteorological and air pollution data are provided in the manuscript itself. Air pollution monitoring data are available from the websites mentioned in the ethical approval section, and results can be verified from those websites. The author is neither entitled nor permitted to supply the raw data, which may be used only for the analysis in this study.
Code availability: Not applicable since raw data is not entitled to be supplied.

Authors' Contribution:
The paper has a sole author. Data collection, study design, analysis, interpretation, manuscript writing and all other tasks were carried out by the author. The submission is approved by the author.