PM2.5 concentration forecasting using Long Short-Term Memory Neural Network and Multi-Level Additive Model

Background: PM2.5 concentration prediction can help protect public health through early warning. Although many methods are available, comparisons between the multi-level additive model (AM) and the long short-term memory (LSTM) neural network in predicting PM2.5 concentration are limited. This study aimed to compare the performance of multi-level AM and LSTM in predicting hourly and daily PM2.5 concentration.
Methods: Air pollution data from Jul 1, 2016 to Dec 31, 2017 were obtained from the Beijing Municipal Environmental Monitoring Center, and meteorological data were derived from the National Meteorological Science Data Sharing Service. Multi-level AM and LSTM were developed to estimate the regional hourly and daily concentration of PM2.5.
Results: In the prediction of hourly PM2.5 concentrations, LSTM performed better than multi-level AM (range of R²: 0.76-0.92 for LSTM, 0.59-0.78 for multi-level AM; range of root mean square error (RMSE): 6.20-17.58 μg/m³ for LSTM, 19.19-30.81 μg/m³ for multi-level AM; range of mean absolute error (MAE): 4.50-13.42 μg/m³ for LSTM, 13.55-22.35 μg/m³ for multi-level AM; range of mean absolute percentage error (MAPE): 0.18%-0.55% for LSTM, 0.50%-0.87% for multi-level AM). In the prediction of daily PM2.5 concentrations, by contrast, multi-level AM showed a higher predictive accuracy than LSTM (range of R²: 0.43-0.93 for LSTM, 0.74-0.98 for multi-level AM; range of RMSE: 32.46-46.82 μg/m³ for LSTM, 4.83-20.98 μg/m³ for multi-level AM; range of MAE: 24.32-34.89 μg/m³ for LSTM, 3.67-16.33 μg/m³ for multi-level AM; range of MAPE: 0.92%-1.74% for LSTM).


Background
Fine particulate matter (particulate matter with an aerodynamic diameter less than or equal to 2.5 μm, PM2.5) can increase the incidence of and mortality from multiple respiratory [1,2] and circulatory diseases [3,4], and can increase the morbidity and mortality of tumors [5,6]. In addition, heavy metals carried by PM2.5 can accumulate in the human body and cause chronic hazards. Microorganisms adhering to PM2.5 can cause sensitization, which does great harm to the human body [7].
The ambient air quality standards revised in 2012 added PM2.5 to the list of monitored air pollutants. The monitoring data published by the Beijing Municipal Environmental Protection Bureau showed that the average annual concentrations of PM2.5 in Beijing from 2013 to 2017 were 89.5 μg/m³, 85.9 μg/m³, 80.6 μg/m³, 73 μg/m³, and 58 μg/m³, respectively. Although the annual mean concentration of PM2.5 showed a declining trend, the decline was small, and the concentration remained much higher than the ambient air quality secondary standard, which is an annual mean concentration of 35 μg/m³. In addition, the annual mean concentration limit for PM2.5 is 25 μg/m³ in the European Union and 15 μg/m³ in the United States. PM2.5 pollution in Beijing therefore cannot be ignored, and accurate PM2.5 concentration prediction is very important for controlling air pollution and preventing the health hazards it causes.
Present PM2.5 prediction methods can generally be divided into two categories. One is deterministic methods [8,9], among which the Community Multiscale Air Quality (CMAQ) model [10] and the latest WRF-Chem model [11] are currently used internationally, while the Nested Air Quality Prediction Modeling System (NAQPMS) [12] is commonly used in China. The mechanism of deterministic methods is clear, but they require complex prior knowledge and are subject to various application restrictions: for example, the forecasting process takes a long time, and the potential effects of the factors associated with PM2.5 are not fully considered [13,14]. The other is statistical methods, which avoid sophisticated theoretical models and can predict the concentration of air pollutants rapidly, because they do not need to describe the physical and chemical processes of pollutants. Statistical methods predict the concentration of air pollutants, including PM2.5, by analyzing air quality related data, and have received extensive attention from scholars.
Among these models, LSTM has nearly all the advantages of the artificial neural network (ANN) and the recurrent neural network (RNN), such as the ability to perform nonlinear mapping, adaptability and robustness, and high performance in time series prediction. In addition, LSTM [29] neural networks can learn long time series without being affected by the vanishing gradient problem. These features are important in PM2.5 concentration prediction, because the PM2.5 concentration is related to its previous concentrations. Li et al [30] extended the LSTM model to predict the hourly PM2.5 concentration based on data from 12 air quality monitoring stations from January 2014 to May 2016. The generalized additive model (GAM) can identify nonlinear relationships and interactions between the associated factors and PM2.5, and is suitable for time series analysis, including PM2.5 concentration prediction. The results of two studies showed that GAM significantly improved prediction efficiency compared with simple linear regression models [31,32].
However, in most studies collinearity has not been well resolved, and the choice of the degrees of freedom of each variable in GAM needs to be improved. Since LSTM and AM both have several advantages, and comparisons between them are limited, this study aimed to compare the performance of multi-level AM and LSTM in hourly and daily PM2.5 concentration prediction based on data from 16 districts of Beijing from July 2016 to December 2017.

Air Pollution and meteorological data
The air pollution data, including PM2.5, CO, NO2, SO2 and O3, were obtained from the Beijing Municipal Environmental Monitoring Center. Meteorological data were obtained from the National Meteorological Science Data Sharing Service (http://data.cma.cn/) and comprised a total of 15 variables: air pressure, sea level pressure, maximum pressure, minimum pressure, maximum wind speed, instant maximum wind speed, 10-minute mean wind speed, wind direction, temperature, maximum temperature, minimum temperature, relative humidity, minimum relative humidity, vapor pressure and precipitation. This study collected hourly and daily mean values for the 16 districts in Beijing from Jul 1, 2016 to Dec 31, 2017.

Statistical Analysis
The median and inter-quartile range were used to describe the distribution of variables that were not normally distributed. Spearman rank correlation analysis was used to assess the association between meteorological factors and PM2.5.
The general form of multi-level AM can be specified as [33]:

$$Y = \sum g(X) + \alpha \quad (1)$$

Multi-level AM in this study was modified as follows:

$$Y = \sum \beta(X) + \sum s(X) + \alpha \quad (2)$$

Here $Y$ refers to the dependent variable, $X$ refers to the independent variables, $g(X)$ denotes the fitting function, $\beta(X)$ denotes the linear function, $s(X)$ denotes the smoothing function, and $\alpha$ is the intercept term.
In this study, PM2.5 concentration with a certain time lag was selected as the dependent variable, the date was used as a time series to control for temporal confounding, and season, rainfall, holidays and weekends were used as categorical variables. Only selected meteorological factors entered the multi-level AM, in order to address multi-collinearity among the factors associated with PM2.5.
A penalized cubic spline smoothing function was selected for the non-parametric fitting of gaseous pollutants and most meteorological factors, while a thin plate spline smoothing function was selected for wind speed and direction because of their interaction. With time as level 1 and districts and counties as level 2, a multi-level AM including weekends, holidays and season was established to select the variables. The degrees of freedom of the variables were selected based on the partial autocorrelation function.
The LSTM model is composed of an input layer, an output layer, and a series of cyclically connected hidden layers, namely memory blocks. Each block consists of several self-recurrent memory cells and three multiplicative units (input gates, output gates, and forget gates), which provide continuous read, write, and reset functions for the cells. Self-recurrent memory cells can block external disturbances, so the state can remain unchanged from one step to the next, which enables LSTM to overcome the vanishing gradient problem. Forget gates can reset the memory block when its state is out of date and also prevent the gradient from exploding. In addition, the input gate allows the incoming signal to modify the state of the cells, and the output gate allows or prevents the state of the cells from affecting other cells. The basic structure of LSTM is shown in S1 Fig.
Hourly or daily PM2.5 concentration was used as the dependent variable, and the independent variables, namely the input features, included 4 gaseous pollutants, 15 meteorological factors, and 5 time variables. The number of LSTM layers, fully connected layers, cells in each layer, batch size and epochs were adjusted according to the training and testing losses. Dropout with a probability of 0.01 was applied to the network connections to avoid over-fitting. The loss function was the mean absolute error (MAE), the optimizer was Adam, and the time step was set to 1, which means that the PM2.5 concentration at the next time point (next hour or day) was predicted based on historical data.
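The gate mechanics described above can be illustrated with a minimal sketch of one forward step of a single memory cell. This is not the network fitted in the study: the function `lstm_step`, the parameter layout and all weights are hypothetical, and the sketch uses the standard textbook LSTM update with one scalar cell.

```python
import math


def sigmoid(x):
    """Logistic function used by the three multiplicative gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a single LSTM memory cell (illustrative only).

    x       -- list of input features at the current time point
    h_prev  -- previous cell output (scalar)
    c_prev  -- previous cell state (scalar)
    params  -- hypothetical layout: gate name -> (input weights, recurrent weight, bias)
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return sum(w * v for w, v in zip(w_x, x)) + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))        # input gate: how much new signal is written
    f = sigmoid(pre_activation("forget"))       # forget gate: how much old state is kept
    o = sigmoid(pre_activation("output"))       # output gate: how much state is exposed
    g = math.tanh(pre_activation("candidate"))  # candidate value to write

    c = f * c_prev + i * g   # self-recurrent state update
    h = o * math.tanh(c)     # gated output
    return h, c


# Forget gate fully open and input gate closed: the state passes through unchanged.
keep = {
    "input": ([0.0], 0.0, -50.0),
    "forget": ([0.0], 0.0, 50.0),
    "output": ([0.0], 0.0, 0.0),
    "candidate": ([1.0], 0.0, 0.0),
}
h_keep, c_keep = lstm_step([3.0], 0.0, 0.7, keep)

# Forget gate closed as well: the state is reset to (almost) zero.
reset = dict(keep, forget=([0.0], 0.0, -50.0))
h_reset, c_reset = lstm_step([3.0], 0.0, 0.7, reset)
```

With the forget gate open and the input gate closed, the cell state is carried across the step without change, which is the property that lets the state persist over long sequences; closing the forget gate resets it.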
Multi-level AM and LSTM were established on the data of each district and county in Beijing to predict hourly and daily PM2.5 concentration, respectively. The first two-thirds of the data were used as the training set, and the remaining one-third was used as the testing set. The performance of the two types of models was evaluated with the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
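The split and the four evaluation measures can be sketched as follows. The source does not state its formulas, so this sketch assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, and MAPE expressed as a percentage. The function names and the toy values are hypothetical.

```python
import math


def chronological_split(series):
    """First two-thirds for training, remaining one-third for testing (order preserved)."""
    cut = (2 * len(series)) // 3
    return series[:cut], series[cut:]


def evaluate(observed, predicted):
    """Return R2, RMSE, MAE and MAPE (in percent) under the assumed definitions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an observed value is zero; the toy data avoid that case.
        "MAPE": 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
    }


train, test = chronological_split(list(range(9)))

# Toy observed and predicted concentrations (made-up numbers, not study data).
scores = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 36.0])
```

The split keeps the time order intact, so the testing set always lies after the training set, which matches the forecasting setting described above.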
The descriptive statistical analysis of the pollutant and meteorological data was performed using ArcGIS 10.2 and R 3.4.3. The multi-level AM was fitted using R 3.4.3, and the LSTM was fitted using Python 3.6.

Results
The distribution of air pollutants and meteorological factors is described in S1 Table. Daily PM2.5 concentration levels appeared to fluctuate randomly, although there was a declining trend during the study period.
The associations between PM2.5 and meteorological factors are shown in S2 Table. All the correlation coefficients were statistically significant. When the Spearman rank correlation coefficients (r_s) among several correlated meteorological factors were above 0.60, the factor with the highest Spearman rank correlation coefficient with PM2.5 was selected. As a result, minimum relative humidity (r_s = 0.29, P < 0.001), sea level pressure (r_s = -0.09, P < 0.001), maximum wind speed (r_s = -0.16, P < 0.001) and temperature (r_s = 0.01, P < 0.001) were considered for inclusion in the multi-level AM.
The result of multi-level AM is shown in Table 1, which demonstrates that every variable was statistically correlated with PM2.5 (β = -15.05-33.73, P < 0.05). Finally, minimum relative humidity, sea level pressure, maximum wind speed and wind direction, temperature, rainfall, CO, NO2, SO2, and O3 were selected for the construction of the multi-level AM.
Table 1 note: All estimates are from the multi-level additive model.
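The correlation-based screening of meteorological factors (among factors whose mutual r_s exceeds 0.60, keep the one most strongly correlated with PM2.5) can be sketched as below. Several details are assumptions rather than the authors' documented procedure: absolute values of r_s are compared, factors are considered greedily in order of decreasing correlation with PM2.5, and the function names and toy series are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def screen_factors(factors, target, threshold=0.60):
    """Greedy screening (assumed interpretation of the rule in the text).

    Factors are visited from strongest to weakest |r_s| with the target, and a
    factor is kept only if its |r_s| with every kept factor is <= threshold.
    """
    strength = {name: abs(spearman(values, target)) for name, values in factors.items()}
    kept = []
    for name in sorted(strength, key=strength.get, reverse=True):
        if all(abs(spearman(factors[name], factors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy series (made-up numbers, not study data).
pm25 = [1, 2, 3, 4, 5, 6]
toy_factors = {
    "temperature": [1, 2, 3, 4, 6, 5],
    "maximum_temperature": [2, 1, 3, 4, 6, 5],
    "maximum_wind_speed": [3, 6, 1, 5, 2, 4],
}
selected = screen_factors(toy_factors, pm25)
```

In the toy data the two temperature variables are highly correlated with each other, so only the one more strongly related to the target is retained, while the weakly related wind variable is kept because it is not collinear with it.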
In multi-level AM, the degrees of freedom of the independent variables were determined one by one according to the principle of minimizing the partial autocorrelation function (PACF). One example is given in S4 Fig: the PACF was at its minimum when k was 20, so the degrees of freedom of this variable in the model were set to 20.
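A rough sketch of this kind of selection is given below. The source does not specify which series the PACF is computed on or how it is summarized, so the sketch assumes that each candidate k yields a residual series and that the k with the smallest sum of absolute PACF values is chosen. The PACF is computed with the Durbin-Levinson recursion; the function names and the toy residuals are hypothetical.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    c0 = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / c0
        for lag in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson recursion)."""
    rho = autocorrelation(series, max_lag)
    phi_prev = []
    values = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        values.append(phi_kk)
        phi_prev = phi
    return values


def choose_k(residuals_by_k, max_lag):
    """Pick the candidate k whose residuals have the smallest sum of |PACF| (assumed criterion)."""
    return min(
        residuals_by_k,
        key=lambda k: sum(abs(v) for v in pacf(residuals_by_k[k], max_lag)),
    )


# Toy residual series (made-up): a trending series is strongly autocorrelated,
# while the second series has almost no lag-1 autocorrelation.
toy_residuals = {5: list(range(24)), 20: [1, 1, -1, -1] * 6}
best_k = choose_k(toy_residuals, max_lag=1)
```

In practice the residual series would come from refitting the model for each candidate k, which is outside the scope of this sketch.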
In LSTM, stable training and testing losses indicated that the model was well trained. As shown in Fig 1, the predicted hourly PM2.5 concentrations from LSTM were more consistent with the observed values than those from multi-level AM, which also suggests that LSTM performs better than multi-level AM in hourly PM2.5 concentration prediction. As shown in Fig 2, the predicted daily PM2.5 concentrations from multi-level AM were more consistent with the observed values than those from LSTM, which also suggests that multi-level AM performs better than LSTM in the prediction of daily PM2.5 concentration.
A limitation of this study is that, in the prediction of daily PM2.5 concentration, the sample size was small and the time series was not long enough. Nevertheless, this study applied LSTM and multi-level AM to hourly and daily PM2.5 concentration prediction at the same time. Given the difference between hourly and daily PM2.5 concentration, daily PM2.5 concentration should be predicted over longer time series with LSTM and multi-level AM to verify the conclusions of this study.

Conclusion
It can be concluded that LSTM performed better than multi-level AM when a large amount of data was available (hourly prediction), while multi-level AM performed better than LSTM when the amount of data was relatively small (daily prediction).