PM2.5 concentration prediction in Lanzhou, China, using hyperchaotic cuckoo search-extreme learning machine

High concentrations of PM2.5 cause environmental problems and serious health effects, including heart and lung disease. To assess PM2.5 levels with high accuracy, an advanced extreme learning machine (ELM) model is proposed that combines the advantages of a hyperchaotic system and the cuckoo search algorithm. It improves the accuracy of the original ELM and avoids manual adjustment of parameters. The model is applied to PM2.5 concentrations in Lanzhou, China, with daily predictions at four stations in 2018. It is also compared with other methods, such as the original ELM, multiple linear regression, and long short-term memory, and the results show that the proposed model achieves better forecasting performance than the others in terms of root mean squared error and the coefficient of determination (R²). In addition, the model is applied to make short-term forecasts for the four stations, predicting hourly PM2.5 concentrations over the next week based on fourteen days of monitoring data. The forecasts agree closely with the monitored PM2.5 concentrations. Our research indicates that the proposed model can facilitate effective measures to avoid exposure to high concentrations of PM2.5. It also provides a novel way to predict air pollution.


Introduction
Air pollution has been a growing problem since the 20th century. PM2.5 is atmospheric particulate matter with an aerodynamic equivalent diameter of up to 2.5 microns that can have an important impact on air quality. Many studies have confirmed that asthma, chronic obstructive pulmonary disease, and lung cancers are closely linked to PM2.5, which can also affect the cardiovascular system (Coleman et al. 2020; Ye et al. 2018). In China, rapid urban and industrial development in recent years has led to poor air quality in some regions. Lanzhou, a major city in Northwest China, is also plagued by air pollution. Therefore, this research focuses on air quality in Lanzhou and predicts long-term and short-term PM2.5 concentrations. The long-term prediction refers to the assessment of daily PM2.5 levels during a year based on different variables, including temperature, humidity, sulfur dioxide, and nitrogen dioxide. The short-term forecast predicts the hourly PM2.5 concentrations during a week. Achieving early warnings of air pollution by accurately predicting PM2.5 concentrations is an important step in the prevention and control of air pollution.
In addition to deterministic modeling based on physics and chemistry (Gilani et al. 2016), some scholars have adopted statistical methods for air quality prediction. Compared to the former, the statistical modeling approach has received increasing attention, as it does not need to take into account complex chemical reactions and movement processes such as the dispersion of pollutants. Statistical models include not only traditional statistical models but also artificial neural network (ANN) models and composite models. Traditional statistical models, such as autoregressive (Zhou and Goh 2017) and multiple linear regression (MLR) models (Lesar and Filipcic 2021), mostly make predictions by analyzing patterns in the data. Variations in PM2.5 levels are influenced by a variety of factors, so simple linear models have limitations in assessing PM2.5 concentrations.
Artificial neural networks, with their nonlinear mapping and adaptive learning capabilities, have been widely used in air environment prediction, including feed-forward neural networks (Fu et al. 2015), recurrent neural networks (Biancofiore et al. 2017), and deep neural networks (Lightstone et al. 2021). With the development of computer science, the performance of composite models has improved considerably, which has attracted attention in the development of various prediction models. For example, convolutional back-propagation neural networks (BPNNs) (Kow et al. 2020; Park et al. 2020) and a weighted bagging-based neural network combining image contrast-sensitive features (Qiao et al. 2020) can be used to predict air pollution. Hybrid convolutional neural networks that combine high-level features extracted from convolutional layers with ground truth PM2.5 can learn haze-related classification mappings and train support vector regression, offering a cheap, quick, and convenient approach (Li et al. 2020). Moreover, neural networks can derive a kind of multivariate Bayesian uncertainty processor to predict the probabilities of PM2.5 (Zhou et al. 2020) or develop air quality early-warning systems through phase-space reconstruction and multi-objective optimization (Wang et al. 2020). In addition, the aerosol optical depth-PM2.5 model (Xu et al. 2021) used a combination of MLR, BPNN, classification and regression trees, and random forest estimation methods to forecast PM2.5 concentrations in eastern China, and the accuracy was improved by adding meteorological elements that vary over time and height.
The deterioration of air quality has affected public health in China (Wang et al. 2020; Jin et al. 2020; Schraufnagel et al. 2019), so a number of studies on the prediction of PM2.5 concentrations in Chinese cities have been presented. Xiong et al. (2020) proposed a hybrid model that can rapidly estimate particulate pollution based on a data-driven ANN. At the regional level, Zhao et al. (2020) used recurrent neural networks to analyze hourly air quality in Northwest China. In addition, Wang et al. (2021) exploited an ANN to predict and analyze the PM2.5 concentration in Chongqing, China. In terms of small-scale predictions, Tong et al. (2020) and Xu and Liu (2020) used machine learning to analyze PM2.5 concentrations in households in Hong Kong and near the Beijing railway station, respectively.
Most existing algorithms are aimed at predicting either short-term or long-term pollutant concentrations. Few algorithms perform well in both long-term and short-term PM2.5 concentration forecasting. Therefore, it is meaningful to put forward a model that can accurately predict both the long-term and short-term PM2.5 concentrations. Neural networks are complicated to compute, occupy many system resources, and have poor prediction accuracy for PM2.5 concentrations. There is a correlation between the PM2.5 concentration and time series, so an extreme learning machine (ELM) approach is employed to predict the PM2.5 concentration. To avoid manual adjustment, the cuckoo search (CS) algorithm can be applied to calculate the optimal number of hidden layer neurons of the ELM. Owing to the adequate convergence and randomness of hyperchaotic systems (Li et al. 2019), incorporating one can reduce the computational cost of the CS algorithm and improve the accuracy of the ELM (Jia et al. 2020). Inspired by the above, this research proposes the hyperchaotic CS-ELM (HCCS-ELM) model to predict both long-term and short-term PM2.5 concentrations.
The main contributions of this research are as follows: (1) a new machine learning algorithm is proposed for measuring the daily and hourly concentrations of PM2.5, which can predict pollutant levels for the coming year and make short-term predictions for the coming week; (2) the CS algorithm is optimized by a hyperchaotic system, and the improved CS algorithm and hyperchaotic series are applied to the original ELM, so that this research innovatively applies the hyperchaotic system to PM2.5 concentration prediction; (3) the model improves the accuracy of pollutant concentration predictions while effectively simulating the trend of pollutants over time.
The remainder of this article is organized as follows. Section 2 describes the relevant methods and models, as well as the research area and data sources. Section 3 presents the results of the experiments comparing HCCS-ELM with other models. The experimental results are discussed in Sect. 4. Section 5 concludes the findings of this research.
The ELM is used for daily and hourly PM2.5 concentration prediction, considering the close connection between pollutant levels and time series. In addition, the hyperchaotic system and the CS algorithm are used to improve the performance of the ELM. The study area and data sources are introduced in the following subsections. Then, the hyperchaotic system and its application to the CS algorithm and the ELM are described in detail. Finally, the improved CS algorithm and the hyperchaotic ELM are combined to form the HCCS-ELM, as shown in the last subsection.

Study Area
Lanzhou is the capital city of Gansu, located at 36°03′N and 103°40′E. The terrain is high in the southwest and low in the northeast. The urban area is surrounded by mountains to the north and south, and the Yellow River passes through the city, forming a long, narrow valley between gorges and basins. The frequencies of calm winds and temperature inversions in the Lanzhou River Basin are high, which is not conducive to the diffusion of endogenous pollutants. The transfer of external dust into the surrounding area can lead to high concentrations of PM2.5.
Based on the air quality records of Lanzhou from 1 January 2016 to 31 December 2021, the annual average PM2.5 concentrations for Lanzhou from 2016 to 2021 were 54 μg/m³, 45 μg/m³, 38 μg/m³, 37 μg/m³, 35 μg/m³, and 32 μg/m³, respectively; the annual numbers of days with good or excellent air quality (air quality index less than 100) were 240, 237, 249, 298, 310, and 296, respectively. In Fig. 1, the red dashed line represents the Chinese PM2.5 concentration limits, and the blue line represents Lanzhou's PM2.5 concentration, which has severely exceeded China's standards in recent years.

Air quality monitoring and meteorological data
The meteorological data are from the CMA dataset (available from http://data.cma.cn/), and the pollution data are from the MEE dataset (available from http://www.mee.gov.cn/). In this research, the data were collected from four stations whose geographic locations are shown in Fig. 2. Station 1 is located in Xigu District, Lanzhou City, next to Lanzhou Petrochemical, and has relatively low traffic. Station 2 is located on the periphery of the city of Lanzhou, with low traffic and no nearby factories. Stations 3 and 4 are located in residential areas of Lanzhou, with heavy traffic and no nearby factories. The four stations are unobtrusive and do not affect their surroundings. If people knew where the stations were, they might stay away, reducing traffic flow and resulting in inaccurate data. Daily and hourly meteorological data, such as wind direction, wind speed, temperature, and relative humidity, and pollutant (CO, SO₂, NO₂, O₃, and PM2.5) data from 2016 to 2018 are selected as the study samples (Feng 2020; Barmpadimos et al. 2012; Lou et al. 2017).

Hyperchaotic system
In this section, a hyperchaotic real system and its corresponding complex system are used to generate the input weights and hidden layer biases of the HCCS-ELM to obtain better prediction results than comparable models. The hyperchaotic real system, System (1), follows Li et al. (2019) and Jia et al. (2020); in it, $a$, $b$, $c$, and $d$ are real constants, and $n_1$, $n_2$, $n_3$, and $n_4$ are real variables. Dots denote derivatives with respect to time $t$. In the corresponding hyperchaotic complex system, System (2), the state variables, including $g_4 = u_6 + j u_7$, are complex, and $j = \sqrt{-1}$. An overbar represents the complex conjugate of a variable. The real version of System (2) is given by System (3). The hyperchaotic sequences in System (1) are shown in Fig. 3 for the $n_3$-$n_2$-$n_1$ and $n_2$-$n_1$-$n_4$ planes.
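To make the idea of sequence generation concrete, the following minimal sketch integrates a four-dimensional system numerically and rescales each state variable to [0, 1]. The equations of System (1) are not reproduced in this text, so the vector field `lorenz4d` below is a stand-in: a four-dimensional Lorenz-type system that is assumed here to behave hyperchaotically for the chosen parameter values. The function names, step size, burn-in length, and initial state are likewise illustrative assumptions.

```python
from typing import Callable, List, Sequence

State = List[float]


def lorenz4d(s: Sequence[float], a: float = 10.0, b: float = 8.0 / 3.0,
             c: float = 28.0, d: float = -1.0) -> State:
    """Stand-in 4-D Lorenz-type vector field (an assumption, not System (1))."""
    x, y, z, w = s
    return [a * (y - x) + w, c * x - y - x * z, x * y - b * z, -y * z + d * w]


def rk4_step(f: Callable[[Sequence[float]], State], s: Sequence[float],
             h: float) -> State:
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f([si + 0.5 * h * ki for si, ki in zip(s, k1)])
    k3 = f([si + 0.5 * h * ki for si, ki in zip(s, k2)])
    k4 = f([si + h * ki for si, ki in zip(s, k3)])
    return [si + h / 6.0 * (p + 2.0 * q + 2.0 * r + u)
            for si, p, q, r, u in zip(s, k1, k2, k3, k4)]


def chaotic_sequences(n_samples: int, h: float = 0.01, burn_in: int = 1000,
                      s0: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> List[List[float]]:
    """Return four sequences (one per state variable), each rescaled to [0, 1]."""
    s = list(s0)
    for _ in range(burn_in):          # discard the initial transient
        s = rk4_step(lorenz4d, s, h)
    trajectory = []
    for _ in range(n_samples):
        s = rk4_step(lorenz4d, s, h)
        trajectory.append(s)
    sequences = []
    for column in zip(*trajectory):   # one column per state variable
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0       # guard against a constant column
        sequences.append([(v - lo) / span for v in column])
    return sequences
```

Calling `chaotic_sequences(200)` yields four lists of 200 values each. Separate lists could then play the roles of the different sequences used by the cuckoo search and the ELM, although how the original work assigns state variables to those roles is not specified here.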

Hyperchaotic cuckoo search
The CS algorithm is a behavior simulation proposed by Yang and Deb (2009) based on the interesting breeding behavior, such as brood parasitism, of certain species of cuckoos. They also drew on the characteristics of the Lévy flights of some birds and fruit flies. In nature, cuckoos fly in random or quasi-random ways to find nests suitable for laying eggs. The CS algorithm takes three idealized conditions as its premise (Yang and Deb 2009; Gandomi et al. 2011):
1. Each cuckoo lays only one egg at a time and randomly selects a nest location for hatching.
2. In a randomly selected set of nests, the high-quality nests will be retained for the next generation.
3. The number of available nests is fixed, and the probability of the nest owner finding the foreign egg is $P_a \in [0, 1]$. When the owner finds a foreign cuckoo egg, it will throw the egg away or build a new nest.
Based on the above, the operation steps of the hyperchaotic cuckoo algorithm are as follows:
(1) Set parameters such as the population size, search space dimension, and maximum number of iterations. The position of each nest is initialized randomly as $X_i$, $i \in [1, n]$, and the objective function is defined as $f(x)$, $x = [x_1, x_2, \ldots, x_n]^T$, where $T$ denotes the transpose.
(2) Calculate the objective function value of each nest location and compare the values to obtain the current optimal function value.
(3) Use the hyperchaotic sequence to update the position and state of every nest except the optimal one, calculate the objective function value, and compare the obtained value with the current optimal value. If it is better, the current optimal value is updated.
(4) After the position update, compare a random number $r$ with $P_a$. If $r > P_a$, the nest position is randomly updated once. Otherwise, it remains unchanged.
(5) If the maximum number of iterations is reached or the search accuracy requirement is met, proceed to the next step; otherwise, go back to Step (3).
(6) Output the location of the globally optimal nest.
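The six steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions rather than the authors' implementation: the chaotic values come from a logistic map used as a stand-in for the hyperchaotic sequence, the move in Step (3) is a chaotic step scaled by the distance to the best nest, candidate positions are accepted only when they improve the nest, and the best nest is never abandoned. All function and parameter names are hypothetical.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple


def logistic_sequence(n: int, x0: float = 0.3) -> List[float]:
    """Stand-in chaotic sequence in [0, 1]; the paper uses a hyperchaotic system."""
    seq, x = [], x0
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)
        seq.append(x)
    return seq


def chaotic_cuckoo_search(f: Callable[[Sequence[float]], float], dim: int,
                          bounds: Tuple[float, float], seq: Sequence[float],
                          n_nests: int = 15, p_a: float = 0.25,
                          max_iter: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    lo, hi = bounds
    pool = list(seq)
    rng.shuffle(pool)                 # values are drawn without repetition...
    draws = itertools.cycle(pool)     # ...until the pool is exhausted, then reused

    # Step (1): random initial nests.
    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    # Step (2): evaluate every nest and locate the current best.
    fit = [f(x) for x in nests]
    best = min(range(n_nests), key=lambda i: fit[i])

    def try_replace(i: int, cand: List[float]) -> None:
        """Greedy acceptance (an assumption): keep cand only if it is better."""
        val = f(cand)
        if val < fit[i]:
            nests[i], fit[i] = cand, val

    for _ in range(max_iter):         # Step (5): fixed iteration budget
        for i in range(n_nests):
            if i == best:
                continue
            # Step (3): chaotic step in [-1, 1] replaces the Levy flight.
            step = 2.0 * next(draws) - 1.0
            cand = [min(hi, max(lo, x + step * (x - xb)))
                    for x, xb in zip(nests[i], nests[best])]
            try_replace(i, cand)
            # Step (4): if r > P_a, try one random re-positioning of the nest.
            if rng.random() > p_a:
                try_replace(i, [rng.uniform(lo, hi) for _ in range(dim)])
        best = min(range(n_nests), key=lambda i: fit[i])
    # Step (6): report the best nest found.
    return nests[best], fit[best]
```

For example, `chaotic_cuckoo_search(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0), seq=logistic_sequence(500))` minimizes a simple sphere function. In the setting of this paper, the objective would instead score a candidate number of hidden neurons, for instance by the validation error of the resulting ELM.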

Hyperchaotic extreme learning machine
The hyperchaotic ELM is divided into three parts: input, hidden, and output layers. In the network, the neurons of the input, hidden, and output layers are fully connected. Assume that the model has $n$ input layer neurons, corresponding to $n$ input variables, $j$ neurons in the hidden layer, and $m$ neurons in the output layer, corresponding to $m$ output variables (Huang et al. 2006). $W = [w_{ji}]$ is the connection weight matrix of the input and hidden layers (Eq. 4), where $w_{ji}$ is the connection weight between the $i$th input neuron and the $j$th hidden neuron. $\beta = [\beta_{kj}]$ is the connection weight matrix between the hidden and output layers (Eq. 5), where $\beta_{kj}$ is the weight from hidden neuron $j$ to output neuron $k$. Both $W$ and the thresholds of the hidden layer neurons are generated using the above-mentioned hyperchaotic system.
Hyperchaotic sequences generated using the above hyperchaotic functions are used to replace the hidden layer threshold matrix that the ELM would otherwise generate randomly. The running steps of the hyperchaotic ELM are as follows:
(1) Import the data and start training.
(2) Generate the connection weight matrix between the input and hidden layers randomly.
(3) Generate the threshold matrix of the hidden layer using the hyperchaotic function.
(4) Calculate the connection weight matrix between the hidden and output layers using the least squares method.
(5) Import the test data into the hyperchaotic ELM and output the prediction results after the training.
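The training steps above amount to fixing the hidden layer and solving a linear least squares problem for the output weights. The sketch below illustrates this for a single output, under several assumptions: the input weights and thresholds are both read from a caller-supplied sequence of values in [0, 1] (where a hyperchaotic sequence could be plugged in), the least squares step is solved through the normal equations with a small ridge term added purely for numerical stability, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))      # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(X: Matrix, W: Matrix, b: Sequence[float]) -> Matrix:
    """Hidden outputs H[s][j] = sigmoid(W[j] . X[s] + b[j])."""
    return [[sigmoid(sum(w * x for w, x in zip(w_row, x_row)) + b_j)
             for w_row, b_j in zip(W, b)] for x_row in X]


def solve(A: Matrix, y: Sequence[float]) -> List[float]:
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def train_elm(X: Matrix, y: Sequence[float], seq: Sequence[float],
              n_hidden: int = 10, ridge: float = 1e-8) -> Tuple[Matrix, List[float], List[float]]:
    """Fix W and b from seq (values in [0, 1]), then fit the output weights."""
    n_in = len(X[0])
    vals = iter(seq)                  # needs n_hidden * (n_in + 1) values
    W = [[2.0 * next(vals) - 1.0 for _ in range(n_in)] for _ in range(n_hidden)]
    b = [2.0 * next(vals) - 1.0 for _ in range(n_hidden)]
    H = hidden_layer(X, W, b)
    # Normal equations (H^T H + ridge * I) beta = H^T y.
    HtH = [[sum(h[i] * h[k] for h in H) + (ridge if i == k else 0.0)
            for k in range(n_hidden)] for i in range(n_hidden)]
    Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    return W, b, solve(HtH, Hty)


def predict_elm(X: Matrix, W: Matrix, b: Sequence[float],
                beta: Sequence[float]) -> List[float]:
    return [sum(h_j * b_j for h_j, b_j in zip(h, beta))
            for h in hidden_layer(X, W, b)]
```

Because only the output weights are fitted, training reduces to one linear solve, which is the main reason ELMs are cheap to train. A pseudo-inverse computed by a numerical library would be the more usual choice in practice; the hand-written solver is used here only to keep the sketch dependency-free.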

HCCS-ELM
Figure 4 shows the structure diagram of the HCCS-ELM. First, the training data are imported. Then, the chaos equation is used to generate the hyperchaotic sequences. Sequence 1 is selected to replace the Lévy flight in the cuckoo algorithm. It realizes the jump by randomly selecting values from the sequence without repetition. Sequence 2 is selected to replace the randomly generated connection weight matrix of the input and hidden layers. Sequence 3 is used to generate the hidden-layer threshold matrix in the ELM. The improved cuckoo algorithm is used to traverse the sample data to generate the number of hidden layer neurons in the ELM. After the training process of the ELM, the test data are imported, and the prediction results are obtained.
To reduce the computational cost of the system, the hyperchaotic system is used to generate a sequence instead of the Lévy flight. The hyperchaotic system is only required to generate a sequence once, and then each iteration randomly selects a value from the sequence, which greatly reduces the resource occupation. In addition, the CS algorithm is exploited to traverse the samples to generate the number of hidden layer neurons, which effectively improves the adaptability of the algorithm and avoids repeated parameter adjustment. Considering the convergence phenomenon of chaos, the hyperchaotic sequence is used to replace the hidden layer threshold matrix in the ELM, and regression can be understood as a convergence phenomenon. Therefore, the use of convergent sequences instead of random sequences improves the accuracy of the ELM, yields highly accurate results in PM2.5 concentration prediction, and saves system resources.

Experiments and results
In this section, the HCCS-ELM is used to predict the daily PM2.5 concentrations at the four stations and is compared with other models in terms of forecasting performance over one year and over each quarter. Then, the HCCS-ELM is used for hourly PM2.5 concentration prediction. Finally, the HCCS-ELM is compared with the ELM in terms of time and memory consumption for daily pollutant level forecasting.

Long-term prediction
In this research, HCCS-ELM, BPNN, MLR, ELM, long short-term memory (LSTM), and Prophet are used to predict the PM2.5 concentration at four observation stations. The results are compared and analyzed in a manner similar to Amanollahi and Ausati (2019). Seventy percent of the historical observation data are used to train the model, and the remaining 30% are used to test it. PM2.5 levels are also predicted for different seasons, since the PM2.5 concentration varies across seasons. Because instrument failures and other unavoidable events leave the historical observations incomplete, records are deleted if one or more variables are missing. However, special days, such as rainy days, holidays, or nights, are not criteria for filtering data. Thus, only two months in each season are selected for prediction. Table 1 presents the amount of filtered data. For PM2.5 prediction at Station 1, there are 500 samples in the training set and 215 samples in the test set. For Stations 2, 3, and 4, there are 511 samples in the training set and 219 samples in the test set. The input parameters include eight variables: temperature, humidity, sulfur dioxide, nitrogen dioxide, ozone, carbon monoxide, instantaneous wind speed, and wind direction. The predicted PM2.5 concentration is produced at the output layer. The HCCS-ELM and ELM share the same settings: the sigmoid function, the "regression" model, and 10 hidden layer neurons. The BPNN has 10 hidden layer neurons, 100 epochs, a learning rate of 0.1, and a training goal of 0.4. All parameters of the six models are kept constant during the experiment. Root mean squared error (RMSE) and the coefficient of determination (R²) are selected as the evaluation criteria. In the formulas for RMSE and R², $n$ is the number of samples, $X_{obs,i}$ is the observed value of PM2.5 concentration, $X_{model,i}$ is the predicted value of PM2.5 concentration, and $\mathrm{Var}(\cdot)$ is the variance. The closer R² is to 1 and the smaller the RMSE, the better the prediction. MLR, LSTM, and Prophet are run with Python. Figure 5 shows the results predicted by HCCS-ELM. Tables 2 and 3 present the RMSE and R² of HCCS-ELM, ELM, BPNN, MLR, LSTM, and Prophet. The tables show that the RMSE of HCCS-ELM is smaller than that of the other five models and its R² is larger, so it predicts the concentration and the trend of PM2.5 more accurately. Table 2 shows the results obtained with the daily concentrations from 2016 to 2017 as the training set and the 2018 daily concentrations as the test set. Table 3 uses the first two months of each quarter in 2018 as the training set and the last month as the test set.
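The two evaluation criteria can be computed in a few lines. The sketch below uses the usual definitions, with RMSE as the square root of the mean squared difference between observed and predicted values, and R² as one minus the ratio of the residual sum of squares to the total sum of squares of the observations (equivalently, one minus the mean squared error divided by the population variance of the observations). The exact variance-based expression used in the original formulas is an assumption here, and the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], model: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    n = len(obs)
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, model)) / n)


def r_squared(obs: Sequence[float], model: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - m) ** 2 for o, m in zip(obs, model))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a perfect prediction gives an RMSE of 0 and an R² of 1, while always predicting the mean of the observations gives an R² of 0, which is why values of R² closer to 1 together with a smaller RMSE indicate a better model.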

Short-term prediction
Short-term, extremely high PM2.5 concentrations are usually a focus of attention for the meteorological department. High PM2.5 concentrations are harmful to human health. In this research, data with PM2.5 concentrations exceeding China's national standard are selected, and the data division is consistent with the above. Figure 6 shows that the HCCS-ELM remains effective for predicting short-term extremely high PM2.5 concentrations. Due to missing data, the hourly concentration data are incomplete. In Fig. 6, the test data are the hourly concentrations on the date shown in the figure, and the training data are the valid hourly concentrations from the fourteen days before the test date.
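The data division described above, with one test day preceded by a fourteen-day training window and missing hours dropped, can be sketched as follows. The record layout (a timestamp paired with a PM2.5 value, or `None` when the reading is missing) and the function name are assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (timestamp, PM2.5 value or None when the hourly reading is missing)
Record = Tuple[datetime, Optional[float]]


def split_short_term(records: List[Record], test_day: datetime,
                     train_days: int = 14) -> Tuple[List[Record], List[Record]]:
    """Return (train, test): valid hours in the window before, and on, test_day."""
    day_start = datetime(test_day.year, test_day.month, test_day.day)
    train_start = day_start - timedelta(days=train_days)
    day_end = day_start + timedelta(days=1)
    valid = [(t, v) for t, v in records if v is not None]   # drop missing hours
    train = [(t, v) for t, v in valid if train_start <= t < day_start]
    test = [(t, v) for t, v in valid if day_start <= t < day_end]
    return train, test
```

With complete data, the training window would hold 14 × 24 = 336 hourly values and the test day 24; any missing hours simply shorten the two lists.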

Performance evaluation
The running time and memory usage are used to evaluate the performance of the HCCS-ELM and ELM. Tables 4 and 5 present the time and memory results. Table 4 shows the performance of the two algorithms in processing the daily concentration data of each station from 2016 to 2018, and Table 5 shows the performance of the two algorithms in processing the daily concentration data of the four stations in each quarter. The results show that the HCCS-ELM needs more running time than the ELM, while the two occupy similar amounts of memory. This is because the HCCS-ELM needs to generate hyperchaotic sequences during operation. The differences in running time and memory usage of the same model can be attributed to the software.

Discussion
From the perspective of environmental protection, people should improve the environmental protection system, adopt an environmentally friendly lifestyle, and insist on the harmonious coexistence of humans and nature. To improve the global ecological environment, it is necessary to establish and practice the concept that clear waters and green mountains are as valuable as mountains of gold and silver. Machine learning offers fast and accurate prediction and is commonly used to predict trends in pollutant concentrations. Accurate prediction of pollutant concentrations can help local governments take more effective preventive measures. The HCCS-ELM is therefore built on the ELM in two steps: (1) values from the chaotic sequences generated by the hyperchaotic system are randomly chosen in place of Lévy flights at each iteration to reduce the computational effort of the CS algorithm; (2) the improved CS algorithm is used to determine the number of hidden layer neurons of the ELM, which avoids manual adjustment and improves the accuracy of the ELM algorithm. The proposed model has the following advantages in analyzing PM2.5 concentrations in Lanzhou. First, HCCS-ELM simplifies the analysis process, whereas traditional numerical weather forecast methods are very complex and require many calculations. It also reduces computational complexity when predicting PM2.5 compared to deterministic numerical prediction methods, for example, those that model the relationship between satellite aerosol optical depth and ground-based PM2.5 observations. In the HCCS-ELM, the CS algorithm and hyperchaotic system need some time to operate. As a result, compared to the ELM, the running time increases, but the accuracy is improved and the prediction error is reduced. Like other neural networks, the HCCS-ELM does not require an explicit description of the complex relationship between the parameters and outputs. Instead, it relies on adjusting weights so that the parameters and outputs are closely related, and tedious mathematical modeling is avoided. Second, the HCCS-ELM generalizes well. Since there is no mathematical modeling process, given similar and available data, HCCS-ELM can theoretically be applied to many cities in China that have set up air quality monitoring stations, without being affected by differences in the geographical environment and the level of economic development. Finally, considering the data processing characteristics of HCCS-ELM, the concentration of other pollutants can be predicted by simply transforming the columns in the data matrix, avoiding the complex adjustment process in the meteorological equations. Furthermore, the addition of a hyperchaotic system provides new ideas for improving the performance of neural networks. There are some limitations to this research. The model considers the temporal variation in PM2.5, but the spatial distribution is ignored. Furthermore, the model needs to be improved in terms of operational cost and efficiency.

Conclusion
This research proposes a novel model, HCCS-ELM, which combines the CS algorithm and a hyperchaotic system with the ELM. The model is applied at each of the four stations to predict daily and hourly PM2.5 concentrations based on monitored levels of other pollutants. The CS algorithm can calculate the optimal number of hidden layer neurons of the ELM without manual adjustment, while the hyperchaotic system reduces the computational cost of the CS algorithm and improves the accuracy of the ELM due to its adequate convergence and randomness. The prediction results show that HCCS-ELM achieves better performance than the original ELM, MLR, BPNN, LSTM, and Prophet in predicting daily concentrations of pollutants. Furthermore, the model's predictions of hourly PM2.5 concentrations during a week are highly consistent with the monitored data. With its generalizability and high prediction accuracy, the HCCS-ELM model is of great significance for air pollution forecasting. Future work will focus on adding the spatial characteristics of pollutants to the HCCS-ELM to obtain more detailed information on the distribution of PM2.5.

Fig. 5 Daily PM2.5 concentration prediction results of HCCS-ELM in a year

Table 1 The amount of data on daily pollutant concentrations of four stations in each season

Table 2 Comparison of RMSE and R² of different methods (Year)

Table 4 Performance comparison of HCCS-ELM and ELM (Year)

Table 5 Performance comparison of HCCS-ELM and ELM (Season)

Funding This work was supported in part by the Fundamental Research Funds for the Central Universities (Grant No. lzujbky-2019) and the Natural Science Foundation of Gansu Province (20JR10RA606).

Data source Meteorological data: http://data.cma.cn/. Pollutant data: http://www.mee.gov.cn/