Prediction of PM2.5 concentrations using soft computing techniques for the megacity Delhi, India

Over the past few years, the concentration of fine particulate matter (PM2.5) in Delhi’s atmosphere has progressively increased, resulting in smog episodes and affecting people’s health. Therefore, accurate and reliable forecasting of PM2.5 concentration is essential to guide effective precautions before and during extreme pollution events. In this work, soft computing techniques, including Artificial Neural Network and Gaussian Process Regression are employed to predict PM2.5 concentrations in Delhi. Four models, namely, multi-layer feed-forward neural network (MLFFNN), General regression neural network, Gaussian process regression with ARD squared exponential kernel (GPARD_sqexp) and Gaussian process regression with ARD rational quadratic kernel (GPARD_rat_quad) are built using meteorological and air quality data corresponding to a two-year period (2015–2016). The results of the study suggested that MLFFNN showed the best prediction performance among the four models, with testing correlation coefficient (R) 0.949, Root mean square error 30.193, Nash–Sutcliffe efficiency index 0.892 and Mean absolute error 18.388. Moreover, sensitivity analysis performed to understand the importance of different input variables reported that PM10, wind speed, air quality index and aerodynamic roughness coefficient (Z0) are the most critical parameters influencing MLFFNN model forecasts. On the whole, the work has demonstrated that the artificial neural network model is more capable of dealing with PM2.5 forecasting in Delhi urban area than the Gaussian process regression model.


Introduction
Fine particulate matter (PM2.5) is a heterogeneous mixture of organic chemicals, reactive metals and other inorganic elements (Wang et al. 2018). It is regarded as a major contributor to the growing phenomenon of acute air pollution episodes in developing economies. In recent years, the high concentration of PM2.5 has become a critical environmental, economic and health problem in Delhi due to the city's fast-paced economic and industrial development and unprecedented growth in the number of motor vehicles. From 2013 to 2019, the average annual PM2.5 concentrations in the city ranged from about 100 µg/m³ to 150 µg/m³ (Li et al. 2021), which is almost three times more than the national ambient air quality standards (60 µg/m³) prescribed by the Central Pollution Control Board (CPCB 2009). A study reported that an estimated 14,844 premature deaths occurred in Delhi in 2015 due to high levels of PM2.5 (Maji et al. 2017). In addition, PM2.5 induces oxidative stress in some plant species, which results in their abnormal growth and development (Bench 2004). High PM2.5 concentrations can result in extreme air pollution episodes that disrupt air and road traffic and trigger respiratory illnesses (Guo et al. 2019). Starting from 2012, Delhi experienced multiple prolonged and extreme smog episodes characterized by remarkably high PM2.5 levels (> 700 µg/m³), which forced the closure of schools and offices and led to an increase in the number of hospital admissions. Under such circumstances, predicting PM2.5 concentrations has become a major focus of air pollution forecasting, especially for Delhi. Precise modeling of PM2.5 concentration is therefore vital to provide scientific support to decision-makers and stakeholders in taking the necessary action against pollution (Masood et al. 2017).
Forecasting techniques can be categorized into three primary clusters, i.e., numerical methods, statistical methods and soft computing methods. A number of researchers have applied these methods in PM2.5 forecasting (Frohn et al. 2002; Lei et al. 2019; Akhtar et al. 2018). However, it has been observed that the soft computing methods are more efficient in handling the underlying non-linear associations between the variables as compared to other techniques.
More recently, with the advent of quantum computing, researchers have successfully developed and used various soft computing techniques, such as artificial neural network (ANN), Gaussian process regression (GPR), and other techniques to perform short- and long-term forecasting studies. For instance, Wang and Wang (2019) applied an ANN model based on a backpropagation (BP) algorithm for providing real-time PM2.5 concentrations in China. The simulation outcome indicated that the proposed ANN model incorporating a genetic algorithm (GA) was able to better capture the trends in the pollutant concentration data and showed more reliable and accurate results. Similarly, Dai et al. (2019) evaluated the ability of an MLFFNN model to predict daily levels of PM2.5 using multi-scale meteorological data in China. The outcome revealed that the BP algorithm-based MLFFNN model successfully simulated the PM2.5 concentration levels with improved accuracy. Suleiman et al. (2019) evaluated the ability of three ML approaches (Artificial Neural Networks, Support Vector Machines and Boosted Regression Trees) to forecast PM2.5 levels from pollutant concentrations, meteorological variables and traffic parameters. It was concluded that the prediction performance of the ANN model using the K-fold cross-validation technique was stable and accurate relative to the other approaches. In another study, Jang et al. (2020) used GPR models with Periodic, RBF and Matern kernels for the prediction of PM2.5 concentrations in Seoul. The inputs for the model consisted of various meteorological and air quality parameters. Results showed that the models were capable of reproducing accurate PM2.5 concentrations with less average prediction error. Sun and Sun (2017) comparatively assessed the LSSVM and GRNN models for daily prediction of PM2.5 from pollutant concentration and temperature inputs. The results of that study showed that the LSSVM model performs better in predicting PM2.5 concentrations than the GRNN model.
A number of studies have been carried out in different cities and climate zones to investigate the performance of soft computing models in the field of PM2.5 forecasting (Kang et al. 2018). However, based on the relevant literature, the study of soft computing in PM2.5 forecasting has been limited in Delhi, though some studies were conducted to estimate the concentration of PM2.5 (Agarwal et al. 2020). Additionally, to the best of the authors' knowledge, some state-of-the-art soft computing models (such as GPR and GRNN) have not been applied to forecast PM2.5 concentrations in Delhi. Besides, in most forecasting studies, researchers have applied only classical meteorological parameters such as wind speed, wind direction, cloud cover, atmospheric pressure, vertical wind speed etc., to predict PM2.5 concentrations. Parameters such as aerodynamic roughness coefficient (Z0), precipitation, evaporation, maximum and minimum temperature have not been used, even though they are known to influence the behavior of PM2.5 in the atmosphere (Srivastava et al. 2018; Ashworth et al. 2016). Hence, in view of the above observations, the present study has been conducted to achieve the following objectives:
• To design and develop four soft computing models, i.e., MLFFNN, GRNN, GP ARD_sqexp and GP ARD_rat_quad, for forecasting PM2.5 concentrations in Delhi.
• To compare the forecasting performances of these soft computing models in the context of PM2.5 prediction.
• To perform sensitivity analysis and determine the most critical input parameters influencing PM2.5 forecasting.
The rest of the paper is organized as follows. Section 2 presents the methodology, covering the study region, data collection and the estimation of the aerodynamic roughness coefficient (Z0). Section 3 presents the fundamental concepts behind the applied soft computing techniques and the adopted performance metrics. Section 4 presents the modeling performances and the comparative and sensitivity analyses, along with a detailed discussion of the results. Finally, conclusions are presented in Section 5 along with the limitations and future recommendations.

Study area and data
Delhi city is located in the northern part of India, at 28°42′ N latitude, 77°18′ E longitude, and an altitude of 216 m above mean sea level (Kumar and Tewary 2021). The region has a subtropical steppe climate, characterized by hot, humid summers and cold winters. The annual mean temperature varies from 12.9 °C to 34.8 °C, the annual relative humidity is about 60%, and the annual mean precipitation varies from 600 to 800 mm (Ramachandran 2007). Being landlocked, the city experiences high-level invasion of PM2.5 throughout the year, causing severe environmental and health problems. The study area, along with the air quality monitoring (R K Puram) and meteorological (Safdarjung airport) stations monitored by the Delhi Pollution Control Committee (DPCC) and Indian Meteorological Department (IMD), is shown in Fig. 1.
In this study, the daily averaged data of eleven pollutants (CO, Ozone, PM10, PM2.5, NO, NO2, NOx, NH3, SO2, Benzene, and Toluene) were collected from the DPCC monitoring station at R K Puram. In addition, daily measurements of eight meteorological parameters (Temp_max, Temp_min, WS_avg, WD, rainfall, evaporation, humidity, and atmospheric pressure) were procured from the IMD meteorological station at Safdarjung airport. The air quality index and the aerodynamic roughness coefficient were considered as additional variables. A dataset of 731 observations corresponding to a period of two years (i.e., 2015-2016) was utilized for model development and validation. Out of the 731 observations, 548 (75%) were used for the training process, whereas the remaining 183 (25%) were used for the testing process (validation). The characteristics of both the training and testing data sets are shown in Table 1. In general, the training process exposes the models to the input data and uses the design algorithms to learn patterns from that data in order to make generalized predictions. The validation process evaluates the predictive accuracy of a trained model against observations that were not used for training. It also helps to detect model overfitting, which normally occurs when the model starts learning from the noise and inaccurate values present in the dataset.
The data for the input and output parameters were normalized to the numerical range [-1, 1] through the Min-Max normalization technique using Eq. (1):

$$V_i = 2\,\frac{y_i - \min(y)}{\max(y) - \min(y)} - 1 \tag{1}$$

where V_i = ith normalized value; y_i = ith observed value for the variable y; min(y) = minimum value in the dataset; max(y) = maximum value in the dataset.
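As an illustration of the scaling step, the following minimal sketch assumes the usual linear Min-Max form, V_i = 2 (y_i − min(y)) / (max(y) − min(y)) − 1, which maps the smallest value to -1 and the largest to 1. The function name `minmax_scale` and the sample readings are hypothetical and are not taken from the study.

```python
import numpy as np

def minmax_scale(y, lo=-1.0, hi=1.0):
    """Map y linearly onto [lo, hi] using its own minimum and maximum."""
    y = np.asarray(y, dtype=float)
    y_min, y_max = y.min(), y.max()
    if y_max == y_min:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo + (hi - lo) * (y - y_min) / (y_max - y_min)

# Hypothetical PM2.5 readings in µg/m³ (illustrative values only).
pm25 = np.array([40.0, 120.0, 200.0, 360.0])
scaled = minmax_scale(pm25)
```

In practice the minimum and maximum would typically be taken from the training subset and reused for the testing subset, so that no information from the test period enters the scaling; whether the study did this is not stated.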

Aerodynamic roughness coefficient (Z 0 )
The aerodynamic roughness coefficient is usually defined as the height (in meters) above the ground surface at which the theoretical wind profile reaches zero wind speed. It is an important aerodynamic property linked with the exchange of momentum, energy, and trace gases between the atmosphere and land surfaces. High-precision estimation of aerodynamic roughness (Z0) in urban areas plays a crucial role in air quality forecasting and in identifying potential ventilation paths. A number of methods have been proposed in the literature for the parameterization of Z0 in urban areas (Verkaik 2000; Davenport 1960; Wieringa 1992). Of these, the two most prevalent are the micrometeorological and morphometric approaches. In this study, the local value of Z0 under near-neutral conditions has been evaluated micrometeorologically using the logarithmic wind profile equation:

$$U(Z) = \frac{U_*}{K}\,\ln\!\left(\frac{Z}{Z_0}\right) \tag{2}$$

where U(Z) is the wind speed at height Z, U_* is the friction velocity, Z is the observation height (m), K is the Karman constant (0.4) and Z_0 is the aerodynamic roughness coefficient. The friction velocity can be inferred from σ_h, a parameter defining horizontal turbulence, which is calculated from σ_u and σ_v, the standard deviations of the horizontal wind components in an orthogonal coordinate system (Weber 1999). Finally, from Eq. (2), Z_0 has been computed as

$$Z_0 = Z\,\exp\!\left(-\frac{K\,U(Z)}{U_*}\right)$$

Applied soft computing techniques

General regression neural network (GRNN)
GRNN is a powerful non-parametric regression technique with a dynamic architecture that displays strong non-linear forecasting ability in atmospheric sciences. It is a variation of the radial basis function (RBF) network that is based on a one-pass learning approach and does not require an iterative training procedure (Goodband et al. 2008). It approximates an arbitrary function between the input and output vectors by deducing the function estimate directly from the training data. The GRNN model consists of four layers: input, pattern, summation, and output, as shown in Fig. 2. The first layer, i.e., the input layer, receives the input signals and transfers them to the pattern layer. The pattern layer stores the training patterns and non-linearly transforms the input information, producing for each stored pattern an activation that reflects its closeness to the input. The summation layer incorporates two summation neurons, namely S1 and S2. S1 generates the arithmetic sum of the pattern layer outputs, and S2 produces the weighted sum of the pattern layer outputs. Finally, the output layer divides the weighted sum (S2) by the arithmetic sum (S1) and produces the final outcome of the network.
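The four layers can be sketched in a few lines. This is a minimal illustration of the standard GRNN estimator with a Gaussian pattern-layer kernel, in which the prediction is the target-weighted sum of the pattern activations divided by their plain sum. The function name, the toy data and the use of a spread of 0.35 on these toy inputs are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, spread):
    """GRNN estimate: kernel-weighted average of the training targets."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pattern layer: squared distance from each query to each stored pattern,
    # turned into a Gaussian activation controlled by the spread.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * spread ** 2))
    # Summation layer: plain sum of activations and target-weighted sum.
    plain_sum = w.sum(axis=1)
    weighted_sum = w @ y_train
    # Output layer: ratio of the weighted sum to the plain sum.
    return weighted_sum / plain_sum

# Toy, already-normalized data (illustrative values only).
X_train = np.array([[-1.0], [0.0], [1.0]])
y_train = np.array([-0.5, 0.2, 0.9])
pred = grnn_predict(X_train, y_train, np.array([[0.0], [0.5]]), spread=0.35)
```

Because the estimate is a weighted average, every prediction lies between the smallest and largest training target. A smaller spread makes the model follow individual training points more closely, while a larger spread smooths the response.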

Gaussian process regression (GPR)
Gaussian process regression is a state-of-the-art kernel-based soft computing technique that uses kernel functions, alone or in combination, to approximate real data closely. The technique is nonparametric, meaning that the complexity of the model grows as more data points are incorporated. It can represent complex input and output relationships by applying an infinite number of variables and allows the data to define the complexity level through Bayesian inference (Chu et al. 2005). One of the most significant attributes of the Gaussian process is the variety of kernel functions, which leads to the generation of functions with different degrees or different types of continuous structures. Moreover, the technique provides a natural measure of the uncertainty of its predictions. Due to its simplicity and desirable prediction performance, GPR has been widely applied in the field of air pollution forecasting.
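To make the role of the kernel concrete, the sketch below writes out the two ARD kernels named in this study in their textbook forms, together with the standard zero-mean GP posterior. With r² = Σ_d (x_d − x'_d)² / ℓ_d², the squared exponential kernel is σ_f² exp(−r²/2) and the rational quadratic kernel is σ_f² (1 + r²/(2α))^(−α). All function names, the toy data, the hyperparameter values and the reading of the noise setting as a standard deviation of 0.1 are assumptions. Hyperparameter fitting is omitted.

```python
import numpy as np

def _scaled_sqdist(A, B, length_scales):
    """Squared distances after dividing each input dimension by its length scale."""
    A = np.asarray(A, dtype=float) / length_scales
    B = np.asarray(B, dtype=float) / length_scales
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def ard_sqexp(A, B, length_scales, sigma_f=1.0):
    """ARD squared exponential kernel."""
    return sigma_f ** 2 * np.exp(-0.5 * _scaled_sqdist(A, B, length_scales))

def ard_rat_quad(A, B, length_scales, alpha=1.0, sigma_f=1.0):
    """ARD rational quadratic kernel; alpha controls the mix of length scales."""
    r2 = _scaled_sqdist(A, B, length_scales)
    return sigma_f ** 2 * (1.0 + r2 / (2.0 * alpha)) ** (-alpha)

def gp_posterior(kernel, X, y, X_new, noise_sd=0.1, **kw):
    """Posterior mean and variance of a zero-mean GP with Gaussian noise."""
    K = kernel(X, X, **kw) + noise_sd ** 2 * np.eye(len(X))
    K_s = kernel(X_new, X, **kw)
    mean = K_s @ np.linalg.solve(K, y)
    cov = kernel(X_new, X_new, **kw) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Toy inputs with two features (illustrative values only).
X = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0.1, 0.4, -0.2, 0.3])
ls = np.array([1.0, 2.0])  # one length scale per input dimension (ARD)
mean, var = gp_posterior(ard_rat_quad, X, y, X, length_scales=ls, alpha=1.0)
```

The per-dimension length scales are what "ARD" (automatic relevance determination) refers to: an input with a very large fitted length scale has little effect on the kernel and hence on the prediction. For large α the rational quadratic kernel approaches the squared exponential kernel.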

Multilayer feed forward neural network (MLFFNN)
MLFFNN is an interconnection of perceptrons where the input signal only propagates in the forward direction, first through input nodes, then through hidden nodes, and finally through output nodes (no back-loops). These nodes are connected through weighted linkages and are arranged in a sequence of three different layers, namely an input layer, a hidden layer, and an output layer (Fig. 3). Moreover, the networks have one or more than one hidden layer in their configuration, allowing them to have more flexible frameworks for performing non-linear function estimation tasks like air pollution forecasting (Masood and Ahmad 2021).
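The forward-only signal flow can be illustrated with a short sketch. It assumes tanh activations in the hidden layers and a linear output node, with randomly initialized weights; the input width of 20 is a placeholder, and the two hidden layers of 25 and 5 neurons mirror the configuration reported later in the paper. Training (for example by Levenberg-Marquardt) is not shown.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Create (weights, bias) pairs for consecutive layers with small random weights."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, X):
    """Propagate inputs through the hidden layers and a linear output layer."""
    a = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)   # hidden layers: weighted sum, then non-linearity
    W, b = params[-1]
    return a @ W + b             # output layer: linear combination

# Placeholder sizes: 20 inputs, hidden layers of 25 and 5 neurons, 1 output.
params = init_network([20, 25, 5, 1])
X_demo = np.zeros((4, 20))       # four dummy, already-normalized samples
out = forward(params, X_demo)
```

Each sample passes through the layers exactly once, with no feedback connections, which is what distinguishes this architecture from recurrent networks.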

Model evaluation metrics
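In this study, four widely used statistical metrics, namely the Nash-Sutcliffe efficiency index (NSE), root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R), have been used to assess the performance of the developed PM2.5 forecasting models. These metrics are expressed mathematically as follows:

1. Nash-Sutcliffe efficiency index (NSE)

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (I_{o,i} - I_{p,i})^2}{\sum_{i=1}^{N} (I_{o,i} - \bar{I}_o)^2}$$

2. Root mean square error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (I_{p,i} - I_{o,i})^2}$$

3. Mean absolute error (MAE)

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| I_{p,i} - I_{o,i} \right|$$

4. Correlation coefficient (R)

$$R = \frac{\sum_{i=1}^{N} (I_{o,i} - \bar{I}_o)(I_{p,i} - \bar{I}_p)}{\sqrt{\sum_{i=1}^{N} (I_{o,i} - \bar{I}_o)^2 \, \sum_{i=1}^{N} (I_{p,i} - \bar{I}_p)^2}}$$

where I_p and I_o represent the predicted and observed values, \bar{I}_p and \bar{I}_o are their respective means, and N is the number of observed values.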
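For reference, the four metrics can be computed as in the following minimal sketch, which uses their standard definitions. The function name `evaluate` and the sample values are hypothetical.

```python
import numpy as np

def evaluate(observed, predicted):
    """Return NSE, RMSE, MAE and Pearson R for paired observations and predictions."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = p - o
    return {
        # 1 minus the ratio of the squared error to the variance of the observations
        "NSE": 1.0 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        "R": np.corrcoef(o, p)[0, 1],
    }

# Illustrative values only (not data from the study).
scores = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

NSE equals 1 for a perfect fit and drops to 0 when the model is no better than predicting the mean of the observations. RMSE and MAE are in the units of the target variable, and RMSE penalizes large errors more heavily than MAE.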

Z 0 estimation
The aerodynamic roughness coefficient (Z0) was determined micrometeorologically using the logarithmic wind profile equation [Eq. (2)]. The daily average wind speed data (731 observations) collected at a height of 10 m were considered for the computation of Z0. The temporal variation in the daily mean roughness coefficient values is shown in Fig. 4. According to Fig. 4, the value of Z0 varies from 2.31 × 10⁻⁴ to 8.894 m, with a long-term mean (calculated over the entire two-year period) of 2.514 m. This computed value of Z0 is consistent with the literature (2-3 m) (Slade 1969; Grimmond and Oke 1999). Furthermore, the relationship of Z0 with wind speed has been analyzed and plotted using the observed data (Fig. 5). As shown in Fig. 5, for low wind speeds (0-0.5 m/s), a high-frequency distribution of low Z0 values (0.5-2.5 m) is observed. As the wind speed increases, the Z0 values become more sparsely scattered and tend to decrease. Overall, the Z0 values deviated less at low wind speeds (< 1 m/s), likely reflecting stable atmospheric conditions that form during the night.

Performance of MLFFNN model
The MLFFNN model proposed for this work is the result of an iterative optimization approach. A series of trials was carried out to determine the learning rate, momentum, number of hidden layers and number of neurons in each hidden layer; a classical Levenberg-Marquardt algorithm with a learning rate of 0.1 for 2500 epochs was applied in training the model. As a result, the MLFFNN 1-25-5-1 model, with two hidden layers, proved to be the best predictor for PM2.5 in this study. Figure 6 shows the forecasting performance of the MLFFNN model for both the development and verification stages (training and testing). Together, Table 2 and the agreement plot (Fig. 6) suggest that the accuracy of the MLFFNN model is acceptable for predicting PM2.5 concentrations.

Performance of GP model
The kernel-based GP model is constructed using a trial-and-error approach. Two kernel functions, i.e., ARD squared exponential and ARD rational quadratic, were utilized to prepare the model. A standard Gaussian noise value of 0.1 was set for both kernels. The forecasting performances of both models, GP ARD_sqexp and GP ARD_rat_quad, are presented in Table 2. The results show that the GP ARD_rat_quad model outperforms GP ARD_sqexp. The performance metrics computed for the GP ARD_rat_quad model are listed in Table 2. The agreement plots (Fig. 7a-d) for both models suggest that the predicted values were in close agreement with the observed PM2.5 values. Overall, each model produced comparable results in the training and testing stages, indicating that the GP ARD_sqexp and GP ARD_rat_quad models are also suitable for the prediction of PM2.5.

Performance of GRNN model
Similar to MLFFNN and GP models, the GRNN model development is also based on an iterative procedure that includes a certain amount of trial and error. The design of the GRNN model involves setting up an optimal value for the user-defined parameter (i.e. spread value). In this study, a spread value of 0.35 was found to work well for the adopted GRNN model. The simulation performance of the GRNN model for both the development and verification stages (training and testing) is shown in Fig. 8. As observed in Fig. 8, there is a minor deviation of the forecasted values from the agreement line or observed values.
To evaluate the performance of this model, the performance metrics for both the training and testing stages are presented in Table 2.

Performance comparison of the models
In this section, the forecasting performances of the four developed soft computing models in estimating PM2.5 concentrations are compared for both the training and testing phases. The comparison was performed using the standard performance metrics (NSE, RMSE, MAE and R) and a Taylor diagram (Taylor 2001), as presented in Fig. 10. The Taylor diagram gives a graphical representation of three standard statistical metrics, i.e., the correlation coefficient (R), standard deviation (SD) and root-mean-square error (RMSE), so that their simultaneous variation can be compared across models within a single 2-D plot. It delineates the model performances on the basis of the distance between the observed and the predicted results. The observed data are displayed as a point on the x axis at R = 1 and normalized SD = 1. The centered normalized RMS difference (E) between the simulated and observed patterns is the distance to this point. As shown in Fig. 10, the MLFFNN model presents a higher correlation with a

Sensitivity analysis
Sensitivity analysis is a technique that quantifies the impact of uncertainties in one or more inputs on the model outcome. It is an important component of the modeling process, as it is used to systematically study the complex interactions between the model variables. Moreover, it highlights the critical parameters that significantly influence the model predictions. These parameters should be measured most accurately in order to maximize the precision of the model, and they give a general indication of the reliability of the model predictions. The sensitivity analysis was carried out to single out the most influential input parameters in predicting PM2.5 concentrations using the best performing technique (MLFFNN model). For this analysis, different combinations of the testing data set have been used to identify the critical input parameters. The effect of each input parameter on the output parameter (PM2.5) was represented in terms of RMSE and R. The results from the

Discussion
In this study, four soft computing models, i.e., MLFFNN, GRNN, GP ARD_sqexp and GP ARD_rat_quad, were applied to predict daily PM2.5 concentrations based on temporal, meteorological and air quality variables. The training period ranges from January 1, 2015 to June 30, 2016, containing 548 observations, and the testing period ranges from July 1, 2016 to December 31, 2016, containing 183 observations. The forecasting results revealed that the developed models were able to predict PM2.5 concentrations and to efficiently handle the complex and non-linear relationships that exist between the air quality input variables.
The comparative assessment of the models revealed that the MLFFNN model was the most stable and accurate in PM2.5 forecasting, relative to the other soft computing models used in this study. The scatter diagrams (Figs. 6, 7, 8) demonstrated a high value of the coefficient of determination (R²) for the MLFFNN model in both the training and testing stages, which affirmed the superiority of the MLFFNN model. Further comparison among the soft computing models on the basis of the performance metrics (Table 2) revealed that the MLFFNN model presented the lowest RMSE and MAE and the highest NSE and R among all models. The Taylor diagram (Fig. 10) compared the prediction performance of the models by identifying the pattern correlation, RMSE and variability between the observed and the predicted data. Fig. 10 shows that the MLFFNN model was located nearest to the observed data point. This indicated that the MLFFNN model offered better performance in terms of PM2.5 forecasting in comparison with the other soft computing models.
The findings of this study were also compared to previous studies (Perez and … for both training and testing phases, respectively.
• The sensitivity investigation was performed to single out the most important parameters with respect to PM2.5 concentrations. As per the investigation, wind speed has the maximum influence on PM2.5 concentrations, followed by air quality index (AQI) and aerodynamic roughness coefficient (Z0).
• It can be concluded that the proposed MLFFNN model is highly efficient in modeling complex non-linear phenomena such as fine particulate matter (PM2.5) air pollution. Accordingly, this model can be applied by scientists, decision-makers and local air quality management agencies to provide a wider assessment of the state of air quality in terms of potential human exposure, airborne concentrations and deposition of PM2.5.
Despite the higher accuracy of the MLFFNN model, there still exist some limitations that inhibit the use of MLFFNN in forecasting PM2.5 concentrations.
• The data for model training and validation are limited to a short period of two years.
• Other novel meteorological and traffic-related parameters that affect PM2.5 concentrations, such as mixing height, clearness index, traffic volume, and motorization rate, are not included in the model.
• Efficient optimization algorithms for hyperparameter tuning, which could have further improved the prediction performance, have not been applied.
Therefore, in future research, the applied soft computing models can be improved by using extended historical air quality and meteorological datasets with more explanatory parameters for PM2.5 forecasting. Besides, the models can be used to generate daily forecasts of gaseous air pollutants such as NO2, CO, SO2 and O3. It is also recommended that future works incorporate state-of-the-art optimization algorithms for hyperparameter sampling and network pruning, which may improve the model performance even further.