The air quality forecasting model is developed using two methods: an artificial neural network (ANN) and multiple linear regression (MLR). Because the parameter data form a time series, both models are built for time-series data. In this study, the input parameters are PM10, SO2, and NO2, and the output parameter is the air quality index (AQI). Constructing the ANN and MLR models for air quality prediction requires independent data sets for training, testing, and validation.
Accordingly, five years of data (2019–2023) for Kesarganj Road, Meerut, and Khora Colony, Ghaziabad, were taken from the website of the Uttar Pradesh Pollution Control Board.
3.1 Statistical Method
Regression analysis is one of the most widely used statistical methods. Multiple regression analysis is a multivariate statistical method that examines the relationship between one dependent variable and a set of independent variables. Its purpose is to predict the single dependent variable from independent variables whose values are known. The regression (or response) function f expresses the influence of the independent variables on the response mathematically:
$$y=f({x}_{1},{x}_{2},{x}_{3},\dots ,{x}_{n};{b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n})$$
\(y\) - dependent variable.
\({x}_{1},{x}_{2},{x}_{3},\dots ,{x}_{n}\) - independent variables.
\({b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n}\) - regression parameters (unknown).
\(f\) - regression function, whose form is usually assumed to be known.
For the observed response variable \(z\), the regression model is
$$z=y+\varepsilon =f({x}_{1},{x}_{2},{x}_{3},\dots ,{x}_{n};{b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n})+\varepsilon$$
where \(\varepsilon\) is the error in \(z\).
The method of least squares can be used to determine the unknown regression parameters \({b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n}\):
$$E({b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n})=\sum _{j=1}^{n}{\left({z}_{j}-{y}_{j}\right)}^{2}=\sum _{j=1}^{n}{\left({z}_{j}-f({x}_{1},{x}_{2},{x}_{3},\dots ,{x}_{n};{b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n})\right)}^{2}$$
where \(E({b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n})\) is the error function, and \({z}_{j}\) and \({y}_{j}\) are the observed and modelled responses for the \(j\)-th observation.
To estimate \({b}_{1},{b}_{2},{b}_{3},\dots ,{b}_{n}\), we solve the system of equations that minimizes \(E\):
$$\frac{\partial E}{\partial {b}_{i}}=0,\quad i=1,2,\dots ,n$$
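For the linear case used in MLR, and assuming f takes the form b0 + b1·x1 + b2·x2 + b3·x3 with a constant term, the conditions ∂E/∂b_i = 0 reduce to a linear system (the normal equations). The following is a minimal sketch under that assumption. The function names `solve_linear_system`, `fit_mlr`, and `predict_mlr` are illustrative, and the sample readings are made up rather than taken from the study's data.

```python
def solve_linear_system(A, y):
    """Solve A b = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]  # augmented matrix [A | y]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * b[c] for c in range(i + 1, n))
        b[i] = (M[i][n] - s) / M[i][i]
    return b


def fit_mlr(X, z):
    """Return [b0, b1, ..., bk] minimizing sum_j (z_j - b0 - sum_i b_i * x_ij)^2."""
    rows = [[1.0] + list(x) for x in X]  # leading 1 carries the constant term
    k = len(rows[0])
    # Normal equations (A^T A) b = A^T z, equivalent to dE/db_i = 0 for every i.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Atz = [sum(r[i] * zj for r, zj in zip(rows, z)) for i in range(k)]
    return solve_linear_system(AtA, Atz)


def predict_mlr(b, x):
    """Evaluate the fitted linear model at one observation x."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


# Made-up (PM10, SO2, NO2) readings and AQI values, for illustration only.
X_demo = [[120.0, 10.0, 30.0], [200.0, 15.0, 45.0], [90.0, 8.0, 25.0],
          [260.0, 20.0, 60.0], [150.0, 12.0, 28.0], [180.0, 9.0, 50.0]]
z_demo = [118.0, 195.0, 92.0, 250.0, 142.0, 176.0]
b_hat = fit_mlr(X_demo, z_demo)
```

Solving the normal equations directly keeps the sketch dependency-free. For larger or poorly conditioned data sets, a QR- or SVD-based least-squares routine from a numerical library would usually be the safer choice.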
The values of the MLR model parameters for Meerut and Ghaziabad are given in Table 1 and Table 2, respectively. Frequency plots of the regression standardized residuals for Meerut and Ghaziabad are shown in Fig. 1 and Fig. 2, respectively.
Table 1
Model | R | R Square | Adjusted R Square | Standard Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
1 | 0.998 | 0.996 | 0.996 | 2.64435 | 0.996 | 3861.144 | 3 | 47 | < 0.001 |
Predictors: (Constant), NO2, SO2, PM10. Dependent Variable: AQI.
Table 2
Model | R | R Square | Adjusted R Square | Standard Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
1 | 0.995 | 0.989 | 0.988 | 5.95011 | 0.989 | 1271.044 | 3 | 42 | < 0.001 |
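The summary statistics in Tables 1 and 2 are linked by standard formulas, which gives a quick consistency check. The sketch below assumes that the number of observations equals df1 + df2 + 1 (a model with a constant term) and uses the textbook definitions of adjusted R² and the overall F statistic. The function names are illustrative. Because the tabulated R² values are rounded to three decimals, the recomputed F values only approximate the reported ones.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r2, df1, df2):
    """Overall F = (R^2 / df1) / ((1 - R^2) / df2)."""
    return (r2 / df1) / ((1.0 - r2) / df2)


# (R^2, df1, df2) as reported in Table 1 (Meerut) and Table 2 (Ghaziabad).
tables = {"Meerut": (0.996, 3, 47), "Ghaziabad": (0.989, 3, 42)}

for city, (r2, df1, df2) in tables.items():
    n_obs = df1 + df2 + 1  # assumed: predictors + residual df + constant
    adj = adjusted_r2(r2, n_obs, df1)
    f_val = f_statistic(r2, df1, df2)
    print(f"{city}: adjusted R^2 = {adj:.3f}, F (from rounded R^2) = {f_val:.1f}")
```

Under these assumptions, the recomputed adjusted R² rounds to the tabulated 0.996 and 0.988.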
3.2 Multilayer Perceptron
An artificial neural network is a computational model inspired by the architecture and function of biological neural networks (Kumar and Sharma 2017). Such networks are generally characterized by their neuron activation functions, training approach, network structure, connection pattern, and data-processing capability. The multilayer perceptron (MLP) is the most widely used neural network model. Because it needs a desired output in order to learn, it is referred to as a supervised network. Its objective is to use historical data to build a model that accurately maps the inputs to the outputs, so that the model can then produce the output when the desired output is unknown. As the inputs pass from the input layer to the hidden layer, they are multiplied by the interconnection weights, summed, and then passed through a nonlinear function (often the hyperbolic tangent) inside the hidden layer. If there are several hidden layers, the data leaving the first hidden layer are again multiplied by the connection weights, summed, and processed by the second hidden layer, and so forth.
Finally, the output of the neural network is produced by multiplying the data by the connection weights and processing them once more in the output layer. Training the neural network for any task requires a set of input-output examples. These data are among the most crucial components for a trained network to produce dependable results. Consequently, the training sample must be fairly large and cover a wide range of process parameters and experimental settings so that it includes all the necessary information. Figures 3 and 4 show the structure of the neural network for Meerut and Ghaziabad, respectively, with inputs PM10, SO2, and NO2, one hidden layer, and AQI as the output.
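The forward pass described above (multiply by the weights, sum, apply a nonlinear function in the hidden layer, then combine again in the output layer) can be sketched for a network with three inputs and one hidden layer. This is only an illustration: the weights, biases, number of hidden units, input scaling, and identity output activation are all assumptions chosen by hand, not values or settings from the trained networks in this study.

```python
import math


def forward(x, W_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron: weighted sum -> tanh -> weighted sum."""
    # Hidden layer: each unit sums its weighted inputs, adds a bias, applies tanh.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_hidden, b_hidden)]
    # Output layer: weighted sum of hidden activations (identity activation assumed).
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical parameters for a network with 3 inputs and 2 hidden units.
W_hidden = [[0.5, -0.2, 0.1],
            [-0.3, 0.8, 0.4]]
b_hidden = [0.0, 0.1]
w_out = [1.5, -0.7]
b_out = 0.2

# Hypothetical PM10, SO2, NO2 values after scaling to a common range.
x_scaled = [0.6, 0.1, 0.3]
aqi_scaled = forward(x_scaled, W_hidden, b_hidden, w_out, b_out)
```

Training would adjust `W_hidden`, `b_hidden`, `w_out`, and `b_out` so that the outputs match the known AQI values in the training data. That step is omitted here.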
Figures 4 and 5 show the model summary with the relative error in training and testing for both cities.