The impact of air quality on its Baidu index: grey model analysis

To investigate the relationship between air quality and its Baidu index, we collect the annual Baidu index of air pollution hazards, causes and responses. Grey correlation analysis, particle swarm optimization and a grey multivariate convolution model are used to simulate and forecast the comprehensive air quality index. The results show that excessive growth of the comprehensive air quality index leads to an increase in the corresponding Baidu index. The number of searches for the causes of air pollution has the closest link with the comprehensive air quality index. Strengthening public awareness of air pollution is conducive to improving air quality. The results provide a reference for relevant departments to prevent and control air pollution.


Introduction
Air quality prediction underpins environmental protection work and plays a crucial role in it [1]. It provides scientific data for environmental management, environmental assessment and environmental planning [2]. Air quality is an important standard for measuring environmental quality and directly affects people's health [3,4]. It is therefore particularly important to strengthen air pollution prediction and keep track of air quality in a timely manner. Only by strengthening the prevention and control of air pollution can air quality be improved further [5]. Better prediction also reveals the development trends of air pollution.
Combined with the current status of air pollution, the prediction results can provide data support for air pollution treatment [6]. Continuous optimization and improvement of air pollution control programs can make control measures more effective. In recent years, as the concept of environmental protection has gained popular support, the country has paid increasing attention to environmental protection, and efforts in air pollution prediction and control in particular have been steadily intensifying.
However, air pollution in some Chinese cities remains serious, and methods for predicting and improving air quality still need to be improved [7].
The widespread use of big data has benefited air quality prediction [8,9]. The traditional data for air quality prediction come mostly from statistical reports of the government and environmental protection departments. Because these reports take time to complete, the collection and release of the data often lack timeliness and cannot directly reflect current problems. This lack of data has been a major obstacle to prediction [10]: no matter how advanced the prediction methods and tools, the data source remains the bottleneck of air quality prediction. Data that reflect air quality more promptly and directly are therefore expected to improve prediction considerably. In recent years, network data from search engines have been widely used in prediction research in various fields; tourism, financial and hospitalization forecasts based on network search data have all achieved higher accuracy [11,12]. The real-time nature of network data compensates for the lag of traditional forecasting data, which makes web search data applicable to the prediction of social and economic activities. Extracting rules from large volumes of search data and analyzing their trends can offer effective ways to address real problems [13]. With the growing popularity of the Internet in China, the spread of information is no longer limited by time and space, and the Internet has become an important source of information for the public [14]. Web search data can therefore be used to mine the public's perception of and response to air quality, and proper use of such data is conducive to reliable air quality prediction.
At present, most air quality forecasts are short-term, covering hours or days. For example, the Light Gradient Boosting Machine model has been used to predict the PM2.5 concentration in Beijing over the next 24 hours [15]. An improved neural-network-based model has been used to predict PM2.5 pollution at air quality monitoring stations 48 hours ahead [16]. Another study predicts air quality from hourly PM2.5 concentrations collected at 1,233 air quality monitoring stations in Beijing [17]. An analysis and prediction system composed of complexity analysis, data preprocessing and optimized prediction modules has been used to predict the hourly AQI series of eight Chinese cities [18]. In contrast, this paper performs an annual air quality forecast in order to propose longer-term air quality improvement methods. Because annual data are limited, traditional statistical methods are unsuitable, so the grey model is adopted.
Grey prediction theory was proposed by Professor Deng to deal with uncertain data and small samples [19]. Grey forecasting models have attracted much attention because of their remarkable performance [20-22]. In a case study of e-waste data in Washington, the nonlinear grey Bernoulli model with a convolution integral was optimized by the particle swarm optimization algorithm, which improved the accuracy of the model [23]. The unbiased fractional discrete multivariate grey model has been used to predict power consumption [24]. The grey model has been optimized by changing the fractional order to better predict electricity usage [25]. The ant lion optimization algorithm has been designed to determine the optimal accumulation coefficient and further improve the predictive accuracy of the grey model [26]. To discuss the relationship between CO2 emissions and economic growth, an unequal gap grey Verhulst model was derived [27]. Many scholars have also applied grey system theory to the study of air quality [28,29]. However, few studies apply the grey model to the relationship between web search information and air quality.
Compared with the air quality index (AQI), the comprehensive air quality index (CAI) is more convincing in assessing air quality [30].
In this paper, the relationship between the Baidu index and CAI is analyzed with a grey multivariable convolution model.
The rest of this article is organized as follows. Section 2 describes the research area and data sources. Section 3 introduces the research methods. Section 4 analyzes the relationship between the Baidu index and CAI in Beijing. Section 5 draws the conclusions and implications.

Research area and data source
Beijing is China's capital and the center of its political, cultural and economic development (Fig. 1). However, rapid economic development and population growth, coupled with the special terrain and atmospheric characteristics of the Beijing-Tianjin-Hebei region, have left Beijing with a serious air pollution problem. This problem not only endangers public health and worsens the natural environment, but also disrupts normal economic production and social order. It is therefore particularly important to predict and improve air quality in Beijing.

Fig. 1. Location of Beijing
Internet search information is generated by the public's spontaneous search behavior and directly reflects the public's intentions. Because it is real-time and public in scale, it can closely track changes in air quality. Facing air pollution, the public often lacks relevant knowledge and information and therefore seeks it from others. Public attention to the problem and the need to understand it thus give rise to corresponding search and query behavior. When searching for air quality issues, Internet users enter specific search terms, such as "what can be done to reduce air pollution". Such terms can be combined into sets of network search keywords for use in specific prediction models. Common sources of web information include search engines, portals, forums, microblogs and other social software. The Baidu index is a data reference platform provided by Baidu, and its search index is calculated from the search volume of keywords entered by Internet users. We therefore use the Baidu index as the data source of this study.
People perceive air quality problems when their bodies are affected by air pollution. According to protection motivation theory [31], perceived risk determines the willingness to respond. We therefore classify the network search data into the hazards, causes and responses of air pollution. First, "bronchitis", "smog" and "environmental protection" are selected as benchmark keywords after consulting relevant experts and considering keyword availability. These keywords are then expanded with the hot-word recommendation feature of the search engine. The keyword list is shown in Table 1.

Grey multivariable convolution model
Because of its good predictive performance, the GMC(1,N) model is widely used in predictive analysis in various fields. To improve its predictions, fractional-order accumulation is used here.
Its definition is as follows.
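Definition. Let the original nonnegative sequences be

$$X_i^{(0)}=\left(x_i^{(0)}(1),x_i^{(0)}(2),\ldots,x_i^{(0)}(n)\right),\qquad i=1,2,\ldots,N,$$

where $X_1^{(0)}$ is the system characteristic sequence and $X_2^{(0)},\ldots,X_N^{(0)}$ are the driving sequences. The $r$-order accumulated sequence $X_i^{(r)}$ is

$$x_i^{(r)}(k)=\sum_{j=1}^{k}\binom{k-j+r-1}{k-j}x_i^{(0)}(j),\qquad \binom{k-j+r-1}{k-j}=\frac{\Gamma(r+k-j)}{\Gamma(k-j+1)\,\Gamma(r)}.$$

The whitened equation of the GMC(1,N) model is written as

$$\frac{dx_1^{(r)}(t)}{dt}+b_1x_1^{(r)}(t)=b_2x_2^{(r)}(t)+b_3x_3^{(r)}(t)+\cdots+b_Nx_N^{(r)}(t)+u.$$

The grey derivative and the background values are taken as

$$\frac{dx_1^{(r)}(t)}{dt}\approx x_1^{(r)}(k)-x_1^{(r)}(k-1),\qquad z_i^{(r)}(k)=\frac{1}{2}\left(x_i^{(r)}(k)+x_i^{(r)}(k-1)\right),\qquad k=2,3,\ldots,n.$$

$b_1,b_2,\ldots,b_N$ and $u$ are parameters. They are calculated by least squares as

$$\left[b_1,b_2,\ldots,b_N,u\right]^{T}=\left(B^{T}B\right)^{-1}B^{T}Y,$$

$$B=\begin{bmatrix}-z_1^{(r)}(2)&z_2^{(r)}(2)&\cdots&z_N^{(r)}(2)&1\\ \vdots&\vdots&&\vdots&\vdots\\ -z_1^{(r)}(n)&z_2^{(r)}(n)&\cdots&z_N^{(r)}(n)&1\end{bmatrix},\qquad Y=\begin{bmatrix}x_1^{(r)}(2)-x_1^{(r)}(1)\\ \vdots\\ x_1^{(r)}(n)-x_1^{(r)}(n-1)\end{bmatrix}.$$

Then, the approximate time-response function of GMC(1,N) is

$$\hat{x}_1^{(r)}(1)=x_1^{(0)}(1),$$

$$\hat{x}_1^{(r)}(k)=x_1^{(0)}(1)e^{-b_1(k-1)}+\sum_{\tau=2}^{k}e^{-b_1\left(k-\tau+\frac{1}{2}\right)}\cdot\frac{1}{2}\left[f(\tau)+f(\tau-1)\right],\qquad k\ge 2,$$

$$f(\tau)=\sum_{i=2}^{N}b_ix_i^{(r)}(\tau)+u.$$

Applying the inverse accumulation operator (the $(-r)$-order accumulation), the predicted value is

$$\hat{x}_1^{(0)}(k)=\sum_{j=1}^{k}(-1)^{k-j}\binom{r}{k-j}\hat{x}_1^{(r)}(j).$$

In this paper, the mean absolute percentage error (MAPE) is used to evaluate the models. It is calculated as

$$\mathrm{MAPE}=\frac{1}{n}\sum_{t=1}^{n}\left|\frac{A_t-F_t}{A_t}\right|\times 100\%,$$

where $A_t$ is the actual value, $F_t$ is the simulated value, and $n$ is the number of years. The smaller the MAPE, the closer the simulated values are to the actual values and the more accurate the forecasting model.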
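The definition above can be sketched in code as follows. This is a minimal, stdlib-only illustration assuming the standard GMC(1,N) formulation (trapezoidal background values, trapezoidal convolution in the time response); it is not the authors' implementation, and the function names `frac_accumulate`, `solve`, `gmc_fit`, `gmc_simulate` and `mape` are hypothetical. The accumulation coefficients are built with a recurrence instead of Gamma functions, which also covers the inverse operator (order $-r$) without evaluating $\Gamma$ at negative integers.

```python
import math


def frac_accumulate(x, r):
    """r-order accumulation of a sequence; a negative r gives the inverse operator.

    x^(r)(k) = sum_{j<=k} C(k-j+r-1, k-j) * x(j); the coefficients are built
    with the recurrence c[m] = c[m-1] * (m - 1 + r) / m.
    """
    n = len(x)
    c = [1.0] * n
    for m in range(1, n):
        c[m] = c[m - 1] * (m - 1 + r) / m
    return [sum(c[k - j] * x[j] for j in range(k + 1)) for k in range(n)]


def solve(a, b):
    """Solve the square system a @ p = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, m):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= factor * aug[col][j]
    p = [0.0] * m
    for i in range(m - 1, -1, -1):
        p[i] = (aug[i][m] - sum(aug[i][j] * p[j] for j in range(i + 1, m))) / aug[i][i]
    return p


def gmc_fit(x1, drivers, r):
    """Least-squares estimate of b1, [b2..bN] and u on r-order accumulated data."""
    X1 = frac_accumulate(x1, r)
    Xs = [frac_accumulate(xi, r) for xi in drivers]
    rows, y = [], []
    for k in range(1, len(x1)):
        z1 = 0.5 * (X1[k] + X1[k - 1])                  # background value of x1
        zs = [0.5 * (X[k] + X[k - 1]) for X in Xs]      # background values of drivers
        rows.append([-z1] + zs + [1.0])                 # one row of matrix B
        y.append(X1[k] - X1[k - 1])                     # grey derivative (vector Y)
    m = len(rows[0])
    # Normal equations (B^T B) p = B^T Y
    btb = [[sum(row[i] * row[j] for row in rows) for j in range(m)] for i in range(m)]
    bty = [sum(row[i] * yk for row, yk in zip(rows, y)) for i in range(m)]
    p = solve(btb, bty)
    return p[0], p[1:-1], p[-1]


def gmc_simulate(x1, drivers, r, b1, bs, u):
    """Time response by trapezoidal convolution, then inverse (-r)-order accumulation.

    The driver series may be longer than x1; the extra values then yield forecasts.
    """
    Xs = [frac_accumulate(xi, r) for xi in drivers]
    n = len(Xs[0])
    f = [sum(b * X[k] for b, X in zip(bs, Xs)) + u for k in range(n)]
    X1_hat = [x1[0]]
    for k in range(1, n):
        conv = sum(math.exp(-b1 * (k - t + 0.5)) * 0.5 * (f[t] + f[t - 1])
                   for t in range(1, k + 1))
        X1_hat.append(x1[0] * math.exp(-b1 * k) + conv)
    return frac_accumulate(X1_hat, -r)


def mape(actual, fitted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)
```

In use, one would call `gmc_fit` on the training years for a chosen order, pass driver series extended with assumed future values to `gmc_simulate`, and compare the overlapping years with `mape`. Building $B^{T}B$ explicitly is adequate for the handful of parameters and annual observations involved here, though a QR-based solver would be more robust for larger systems.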

Particle swarm optimization
The particle swarm optimization (PSO) algorithm is a commonly used parameter optimization algorithm. It iteratively updates an initialized set of particles until the required precision is reached or the maximum number of iterations is exhausted. In this paper, it is used to optimize the fractional accumulation order of the GMC(1,N) model.
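Step 1: The positions and velocities of the first-generation particles are generated randomly.
Step 2: The fitness value of each particle is calculated.
Step 3: The velocity and position of each particle are updated as follows:

$$v_i^{k+1}=\omega v_i^{k}+c_1r_1\left(pbest_i^{k}-x_i^{k}\right)+c_2r_2\left(gbest^{k}-x_i^{k}\right),$$

$$x_i^{k+1}=x_i^{k}+v_i^{k+1},$$

where $\omega$ is the inertia factor, $k$ and $i$ index the generation and the particle, $c_1$ and $c_2$ are learning factors, and $r_1$ and $r_2$ are random numbers in $[0,1]$.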
Step 4: Each particle is compared with its personal best position pbest, and the better of the two becomes the current pbest. Each pbest is then compared with the global best gbest of the previous iteration, and the better one becomes the current gbest.
Step 5: If the required precision is reached or the number of iterations is exhausted, the algorithm stops. Otherwise, it returns to Step 2.
The relevant parameters of PSO in this paper are shown in Table 2.
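Steps 1 to 5 can be sketched as a one-dimensional PSO that searches for the order minimizing an objective. This is an illustrative, stdlib-only sketch, not the authors' code: the function name `pso_minimize`, the swarm size, iteration count, inertia factor, learning factors, initial-velocity range and the clamping of positions to the search bounds are all assumptions, not the values in Table 2. In the paper's setting the objective would be the fitting MAPE of the GMC(1,N) model as a function of the fractional order.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, target=None, seed=0):
    """Minimise a scalar objective over [lower, upper] with a basic particle swarm."""
    rng = random.Random(seed)
    span = upper - lower
    # Step 1: random positions and velocities of the first generation
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [rng.uniform(-0.1 * span, 0.1 * span) for _ in range(n_particles)]
    # Step 2 (first pass): fitness of each particle; initial pbest and gbest
    pbest = pos[:]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Step 3: velocity and position update
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lower), upper)  # clamp to the search range
            # Step 2 (later passes) and Step 4: evaluate, then update pbest
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
        # Step 4: compare the pbests with the previous gbest
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g], pbest_val[g]
        # Step 5: stop early if the target precision is reached
        if target is not None and gbest_val <= target:
            break
    return gbest, gbest_val
```

The global best is updated once per generation, after all pbest values have been refreshed, which mirrors Step 4's comparison with the previous iteration's gbest; an asynchronous variant that updates gbest immediately is also common.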

Empirical study
We discuss the relationship between the Baidu index and CAI from three aspects: the hazards, causes and responses of air pollution, whose Baidu indices are denoted H-BI, C-BI and R-BI, respectively. First, grey correlation analysis is applied to the Baidu index of each keyword and CAI. The keyword series are then summed, with their correlation degrees as weights, to obtain the input data sequence of the model. Finally, the GMC(1,N) model and the PSO algorithm are used to obtain the prediction results and MAPE values. The specific process is as follows (Table 4).

When λ = 0.10, the parameters are estimated by ordinary least squares, and the simulated values and errors of fitting and verification are shown in Table 6. The MAPEs of fitting and verification are 0.501% and 0.809%, respectively, which shows that the model predicts CAI well. CAI from 2019 to 2023 is then predicted under assumed growth rates of H-BI.

Table 8 shows the values of CAI and C-BI from 2013 to 2018. The prediction process is the same as in Section 4.1, and the results are shown in Table 9. As Table 9 shows, CAI from 2019 to 2023 follows different trends under five different growth rates of C-BI, which Fig. 4 illustrates more intuitively. CAI increases when the growth rate of C-BI increases and decreases when it decreases. Moreover, CAI rises faster as the growth rate becomes larger and falls faster as it becomes smaller. This indicates that C-BI has a significant effect on CAI.

Table 10 shows the values of CAI and R-BI from 2013 to 2018. The prediction process is the same as in Section 4.1, and the results are shown in Table 11.
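The data preparation described above (grey correlation analysis, correlation-weighted summation, and growth-rate scenarios for the driving series) can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's procedure: the paper does not give its normalization, distinguishing coefficient or weighting details, so dividing each series by its first value, using Deng's grade with ρ = 0.5, normalizing the weights to sum to one, and extending a series at a constant compound annual growth rate are all assumptions. The function names are hypothetical.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Deng's grey relational grade of each candidate series with the reference.

    Each series is made dimensionless by dividing by its first value (an assumed
    normalisation). rho is the distinguishing coefficient.
    """
    ref = [v / reference[0] for v in reference]
    deltas = [[abs(a - v / cand[0]) for a, v in zip(ref, cand)] for cand in candidates]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0:  # every candidate coincides with the reference
        return [1.0] * len(candidates)
    grades = []
    for d in deltas:
        coeffs = [(d_min + rho * d_max) / (dk + rho * d_max) for dk in d]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def weighted_composite(candidates, grades):
    """Sum the keyword series with weights proportional to their relational grades."""
    total = sum(grades)
    return [sum(g / total * cand[k] for g, cand in zip(grades, candidates))
            for k in range(len(candidates[0]))]


def project_growth(series, rate, years):
    """Extend a series by `years` values growing at a constant annual `rate`."""
    out = list(series)
    for _ in range(years):
        out.append(out[-1] * (1 + rate))
    return out
```

For instance, with CAI as `reference` and the keyword indices of one aspect as `candidates`, the composite series would serve as the driving sequence, and `project_growth` would supply its 2019 to 2023 values for each assumed growth rate (a negative `rate` gives a decrease scenario).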

Comparative analysis
In the above process, CAI simulation and prediction results were obtained from the Baidu indices of hazards, causes and responses. The correlation degrees between all keywords and CAI are high, which indicates that the Baidu index reflects CAI to a certain extent. The simulation errors are small (for example, fitting and verification MAPEs of 0.501% and 0.809% for the hazards model), which suggests that the grey model is well suited to annual CAI prediction and that the prediction results are reliable.
The forecasting results show that the faster the air quality-related Baidu index grows, the larger CAI becomes. When the growth rate of the Baidu index is high, CAI rises sharply; when it is low, CAI falls sharply. In other words, CAI changes at an accelerating pace.
The overall trend for C-BI is similar to that for H-BI, but C-BI affects CAI much more strongly, as Fig. 6 and Fig. 7 show more intuitively. Both figures show that the influence of C-BI on CAI is dominant. The impact of R-BI on CAI is similar to that of H-BI, which suggests an internal connection between the two. According to protection motivation theory, perceived risk determines the willingness to respond; that is, people who search for the hazards of air pollution also tend to search for ways to respond to it.

Fig. 6. The CAI value when the decrease rate of the Baidu index is 10%
Fig. 7. The CAI value when the growth rate of the Baidu index is 20%

Conclusion and implication
We use grey correlation analysis, a grey multivariate convolution model and particle swarm optimization to discuss the relationship between CAI and the Baidu index. The analysis shows that the faster the air quality-related Baidu index grows, the worse air quality becomes, and when the growth of the Baidu index is high, air quality deteriorates further. Searches related to the causes of air pollution reflect air quality most strongly: searches for keywords such as smog, PM2.5 and PM10 all closely track changes in air quality.
The Baidu index reflects how well the public grasps air quality information and the channels through which it is obtained. Faced with air pollution, most people take action, and they can be roughly divided into two groups. Those who understand air pollution well can act directly without searching for related knowledge, whereas those with little knowledge of air pollution search for it. The former group is better able to deal with air pollution and contributes more to improving air quality.
Because the latter group understands air pollution poorly or not at all, its response consists of searching for air pollution-related information, and when air pollution occurs, this group contributes little to improving air quality. Air quality can therefore be improved by reducing the size of the latter group. Based on the above discussion, this article makes the following suggestions for improving air quality.
1. Improving the public's ability to perceive and respond to air pollution
Environmental risk perception helps the public stay alert to air pollution and understand its processes, manifestations and coping methods, so that pollution episodes can be detected immediately. Stronger response knowledge not only helps people reduce their own exposure and avoid the hazards of air pollution, but also contributes to improving air quality.
Individual-centered prevention and control by the whole population thus forms the first barrier against air pollution.

2. Strengthening effective communication between environmental protection departments and the public
Because information sources are complex and measurement methods differ, the information released by different parties is sometimes inconsistent, which has raised public doubts. Air quality information is the main basis on which the public makes decisions about air pollution. An authoritative government information release system and communication channels for air pollution information should therefore be established as soon as possible. Only in this way can the public obtain air quality information promptly and accurately. With information about the nature, causes and potential hazards of air pollution, the public can prevent and reduce air pollution in a targeted manner.
3. Establishing a public participation system for air pollution governance
The governance of air pollution requires the joint participation of the government, non-governmental organizations, enterprises and the public. The public is both a beneficiary and a promoter of air pollution governance. It is therefore urgent to foster the public's awareness of themselves as active participants in air pollution governance, which motivates them to take substantive action to protect the environment, such as low-carbon travel, tree planting and energy conservation, all of which help improve air quality.