Estimation Model of PM2.5 Concentrations Based on Spatiotemporal Adaptability and Satellite Remote Sensing


Satellite remote sensing AOD makes it possible to estimate and monitor continuous changes in PM2.5, which overcomes the limitation of discrete monitoring at traditional ground stations. However, traditional empirical regression models, such as the geographically weighted regression (GWR) model and the geographically and temporally weighted regression (GTWR) model, do not consider four-dimensional spatiotemporal heterogeneity. To address this four-dimensional spatiotemporal nonstationarity, this article proposes and constructs a spatiotemporally adaptive fine particulate matter (PM2.5) concentration estimation model, 4D-GTWR, by introducing a digital elevation model (DEM) and time effects into the GWR model. The method handles the heterogeneity across three-dimensional space and one-dimensional time by constructing a four-dimensional spatiotemporal kernel function and deriving its weights. Based on PM2.5 ground observation data and meteorological data collected from December 2017 to February 2018 in Zhengzhou City, Henan Province, PM2.5 estimates were obtained from MODIS MYD-3K AOD data using the GWR, TWR, GTWR and 4D-GTWR models. The results showed that the mean absolute error (MAE) of the 4D-GTWR model was 54.13%, 54.06% and 37.90% lower than those of the GWR, TWR and GTWR models, respectively, and that the PM2.5 concentrations predicted by the 4D-GTWR model were closest to the measured values. The R2 (the correlation coefficient) of the 4D-GTWR model was 0.9496, which was higher than those of the GWR (R2 = 0.7761), TWR (R2 = 0.7763) and GTWR (R2 = 0.8811) models. The 4D-GTWR model can not only improve the precision of PM2.5 estimates but also reveal the four-dimensional spatiotemporal heterogeneity of PM2.5 concentrations and the differing influence of the DEM across the spatial dimensions.

from 1998 to 2012. Kumar et al. (2007) established a multivariate linear regression relationship with PM2.5 by using MODIS AOD data and meteorological data obtained in Delhi, India, and the correlations between the estimated results and the actual observed values varied between 0.60 and 0.81. Bai et al. (2016) proposed using AOD, relative humidity, wind speed and temperature data as well as other indicators to invert PM2.5 concentrations. Mao et al. (2012) added AOD data to the traditional land use regression (LUR) model to estimate the PM2.5 concentrations in Florida, USA. The correlation coefficient of the model was 0.63. By including a variety of variables, the multiple linear regression model achieved a higher coefficient of determination than the single linear regression model and thus performed better.
To account for changes in PM2.5 concentrations over time and space, researchers have considered the spatial and temporal heterogeneity of PM2.5 concentrations. For example, Song et al. (2014) studied the spatial nonstationarity of PM2.5 concentrations in the Pearl River Delta from May 2012 to September 2013 using a geographically weighted regression (GWR) method as the research model and found that its predictive ability was significantly better than that of the multiple linear regression method and the semiempirical model. Huang et al. (2010) proposed a geographically and temporally weighted regression (GTWR) method for predicting housing prices in their research area, and the results showed that the goodness of fit of the GTWR model was better than that of the GWR model, confirming that nonstationarity exists in both space and time. Chu et al. (2015) used PM data collected in Taiwan from 2005 to 2009 to study the relationship between PM10 and PM2.5 using multiple linear regression, GWR and GTWR models. The results showed that the outputs of the GTWR and GWR models were relatively consistent, but the GTWR model fitted the data better and had stronger spatial and temporal interpretation abilities. Zhao Yangyang et al. (2016) combined collaborative training with a spatiotemporal geographically weighted regression technique and proposed a collaborative spatiotemporal geographically weighted regression method. Taking the PM2.5 concentrations in Beijing, Tianjin and Hebei from March to July 2015 as an example, GTWR models with different kernel functions were compared experimentally. The results showed that the method can improve the accuracy of PM2.5 concentration estimates when the number of spatiotemporal samples is insufficient.
Rao Lanlan (2017) used the geographically and temporally weighted regression (GTWR) model with an adaptive bandwidth to estimate near-surface nitrogen dioxide concentrations and compared the results with those obtained using the traditional linear regression model, time-weighted regression model and geographically weighted regression model. The results showed that the near-surface nitrogen dioxide concentrations estimated by the GTWR model correlated best with the ground observation values and that this model also had the smallest error. In short, the GWR and GTWR models are widely used in the fields of social economics, agricultural production, urban geography, meteorology, etc. (Danlin, 2007). In recent years, the geographically and temporally weighted regression model has attracted wide attention from researchers, leading to abundant theoretical and applied results in many fields.
However, the GWR and GTWR models do not take into account the four-dimensional nonstationarity that arises from three-dimensional space combined with one-dimensional time. In specific applications such as the inversion of PM2.5 concentrations, the model parameters in the study area vary in four-dimensional space-time, so four-dimensional spatiotemporal nonstationarity exists. Therefore, building on the GWR and GTWR models and using PM2.5 concentration data, this paper proposes a 4D-GTWR model to estimate PM2.5 concentrations. First, the traditional GWR model is extended to a four-dimensional geographically and temporally weighted regression (4D-GTWR) model, and this new model is applied to estimate PM2.5 concentrations in four-dimensional space-time. Second, the goodness of fit of the models is compared using the model evaluation indices. Finally, taking the research area in Zhengzhou, Henan Province, as an example, the accuracies of the PM2.5 inversion results output by the models are compared and analyzed.

Materials and methods
Model definition and design

The natural geographical environment is usually controlled and influenced by topography as it develops, and global- or regional-scale high-resolution digital elevation models (DEMs) play an extremely important role in studies of climatic and environmental changes such as seismic and geological natural disasters and atmospheric environmental pollution (Li Zhenhong et al., 2018). The GTWR model does not take into account the four-dimensional spatiotemporal nonstationarity that arises when a DEM is added; namely, the heterogeneity across three-dimensional space and one-dimensional time is not considered. Therefore, in this paper, the DEM is integrated into the calculation of spatial and temporal distances, and a method for calculating four-dimensional spatiotemporal distances is proposed to establish a four-dimensional geographically and temporally weighted regression (4D-GTWR) model. The expression of the 4D-GTWR model is as follows:

$$y_i = \beta_0(u_i, v_i, z_i, t_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i, z_i, t_i)\, x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n$$

where $(u_i, v_i, z_i, t_i)$ denotes the planar coordinates, DEM elevation and observation time of sample point i, $\beta_k(u_i, v_i, z_i, t_i)$ is the regression coefficient parameter of each independent variable of sample point i, and $\varepsilon_i$ is the random error of sample point i, which is independently and identically distributed as $N(0, \sigma^2)$.

Using the same calculation as that applied in the GTWR model, the regression coefficients at the i-th sample point of the 4D-GTWR model are estimated as follows:

$$\hat{\beta}(u_i, v_i, z_i, t_i) = \left[ X^{T} W(u_i, v_i, z_i, t_i)\, X \right]^{-1} X^{T} W(u_i, v_i, z_i, t_i)\, y$$

where $W(u_i, v_i, z_i, t_i)$ represents the four-dimensional spatiotemporal weight matrix. The estimated value, $\hat{y}_i$, of the dependent variable of sample point i is calculated as follows:

$$\hat{y}_i = X_i\, \hat{\beta}(u_i, v_i, z_i, t_i) = X_i \left[ X^{T} W(u_i, v_i, z_i, t_i)\, X \right]^{-1} X^{T} W(u_i, v_i, z_i, t_i)\, y$$

where $X_i$ represents the i-th row vector of matrix X. Therefore, the dependent variable regression vector, $\hat{y}$, of all sample points can be calculated as follows:

$$\hat{y} = S\, y$$

where the matrix S is the hat matrix of the 4D-GTWR model, whose i-th row is $X_i \left[ X^{T} W(u_i, v_i, z_i, t_i)\, X \right]^{-1} X^{T} W(u_i, v_i, z_i, t_i)$.
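According to the observed value $y$ and the fitted value $\hat{y}$, the residual sum of squares (RSS) of the 4D-GTWR model can be calculated as follows:

$$\mathrm{RSS} = (y - \hat{y})^{T} (y - \hat{y}) = y^{T} (I - S)^{T} (I - S)\, y$$

where S is the hat matrix of the 4D-GTWR model and I is the identity matrix.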
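Similarly, assuming that the fitted value $\hat{y}$ is an unbiased estimate of $E(y)$, the expectation of the RSS can be expressed as follows:

$$E(\mathrm{RSS}) = \sigma^{2} \left[ n - 2\,\mathrm{tr}(S) + \mathrm{tr}(S^{T} S) \right]$$

where n is the number of sample points and tr denotes the trace of a matrix. Therefore, the unbiased estimate, $\hat{\sigma}^{2}$, of the random error variance in the 4D-GTWR model is as follows:

$$\hat{\sigma}^{2} = \frac{\mathrm{RSS}}{n - 2\,\mathrm{tr}(S) + \mathrm{tr}(S^{T} S)}$$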
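The estimation steps described in this section can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the code used in this study: it assumes NumPy, a design matrix whose first column is the intercept, and precomputed weight vectors (the diagonals of the four-dimensional spatiotemporal weight matrices). The function name local_fit and its argument names are hypothetical.

```python
import numpy as np

def local_fit(X, y, W_list):
    """Fit one weighted least-squares regression per sample point.

    X      : (n, p) design matrix; first column assumed to be ones.
    y      : (n,) observed values.
    W_list : n weight vectors; W_list[i][j] is the weight of sample j
             when fitting at sample i (the diagonal of W_i).
    Returns local coefficients, hat matrix S, fitted values, RSS and
    the estimated error variance.
    """
    n, p = X.shape
    betas = np.empty((n, p))
    S = np.empty((n, n))
    for i in range(n):
        XtW = X.T * W_list[i]               # X^T W_i using only the diagonal
        A = np.linalg.solve(XtW @ X, XtW)   # (X^T W_i X)^{-1} X^T W_i
        betas[i] = A @ y                    # local coefficients at point i
        S[i] = X[i] @ A                     # i-th row of the hat matrix
    y_hat = S @ y
    resid = y - y_hat
    rss = float(resid @ resid)
    # Effective residual degrees of freedom: n - 2 tr(S) + tr(S^T S)
    dof = n - 2.0 * np.trace(S) + np.trace(S.T @ S)
    sigma2 = rss / dof
    return betas, S, y_hat, rss, sigma2
```

When every weight equals one, each local fit coincides with ordinary least squares and the degrees of freedom reduce to n minus the number of coefficients, which gives a convenient sanity check for the sketch.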

Four-dimensional spatiotemporal kernel function
Similar to the construction of the GTWR model, the core of the 4D-GTWR model involves constructing a four-dimensional space-time kernel function to calculate the four-dimensional space-time weight. When considering the aspect of spatiotemporal proximity, it is necessary to consider the fusion of the three-dimensional spatial distance and the one-dimensional temporal distance, that is, the change in the regression coefficient with the change to four-dimensional space-time. Therefore, it is necessary to construct four-dimensional space-time kernel functions on the basis of spatial kernel functions. The following describes the construction method of the temporal distance and spatial distance of the 4D-GTWR model.
The DEM mainly affects the spatial dimension. In the study of the spatial and temporal variations in PM2.5 in this paper, the pollution processes of PM2.5 and other pollutants change in four-dimensional space-time; that is, they are nonstationary in four spatiotemporal dimensions. For any set of monitoring points, the three-dimensional spatial coordinates differ. Because two- and three-dimensional spaces may have different scale effects, scale factors $\lambda$ and $\mu$ are introduced into the 4D-GTWR model to express the scale differences between the distances. Therefore, according to the Euclidean spatial distance, the three-dimensional spatial distance $d^{S}$ between monitoring stations A and B can be expressed as follows:

$$(d^{S})^{2} = \lambda\, (d^{2D})^{2} \otimes \mu\, (d^{H})^{2}$$

where $d^{2D}$ is the two-dimensional (horizontal) distance, $d^{H}$ is the elevation difference obtained from the DEM, and $\otimes$ can represent various operators. On this basis, by combining the spatiotemporal distance construction method, the four-dimensional spatiotemporal distance can be expressed as follows:

$$(d^{ST})^{2} = (d^{S})^{2} \oplus \nu\, (d^{T})^{2}$$

where $d^{T}$ is the temporal distance. The scale-effect symbols $\otimes$ and $\oplus$ usually adopt addition; that is, a linear combination can be used to obtain the four-dimensional spatiotemporal distance of the 4D-GTWR model as follows:

$$(d^{ST})^{2} = \lambda\, (d^{2D})^{2} + \mu\, (d^{H})^{2} + \nu\, (d^{T})^{2}$$

The parameters $\lambda$, $\mu$ and $\nu$ are four-dimensional spatiotemporal distance adjustment factors that are used to balance the scale differences among the various distances. By using the Euclidean spatial distance to expand the above formula, the four-dimensional spatiotemporal distance between sample points i and j can be obtained as follows (Figure 1):

$$(d_{ij}^{ST})^{2} = \lambda \left[ (u_i - u_j)^{2} + (v_i - v_j)^{2} \right] + \mu\, (z_i - z_j)^{2} + \nu\, (t_i - t_j)^{2}$$

Similarly, by taking the Gaussian kernel function as an example, the four-dimensional spatiotemporal weight function of the 4D-GTWR model can be obtained as follows:

$$w_{ij} = \exp\left\{ -\frac{(d_{ij}^{ST})^{2}}{h_{ST}^{2}} \right\} = \exp\left\{ -\left[ \frac{(u_i - u_j)^{2} + (v_i - v_j)^{2}}{h_{S}^{2}} + \frac{(z_i - z_j)^{2}}{h_{D}^{2}} + \frac{(t_i - t_j)^{2}}{h_{T}^{2}} \right] \right\} = w_{ij}^{S} \times w_{ij}^{D} \times w_{ij}^{T}$$

where $h_{S}$, $h_{D}$, $h_{T}$ and $h_{ST}$ represent the two-dimensional spatial bandwidth, the spatial bandwidth of the DEM, the temporal bandwidth and the four-dimensional spatiotemporal bandwidth parameters, respectively, with $h_{S}^{2} = h_{ST}^{2}/\lambda$, $h_{D}^{2} = h_{ST}^{2}/\mu$ and $h_{T}^{2} = h_{ST}^{2}/\nu$.
Similarly, $w_{ij}^{S}$, $w_{ij}^{D}$ and $w_{ij}^{T}$ represent the two-dimensional spatial weight, the spatial weight of the DEM and the temporal weight, respectively. Therefore, the spatiotemporal weight matrix of the 4D-GTWR model is expressed as follows:

$$W(u_i, v_i, z_i, t_i) = \mathrm{diag}\left( w_{i1}, w_{i2}, \dots, w_{in} \right)$$
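The construction of the four-dimensional distance and the Gaussian weight can be sketched as follows. This is an illustrative sketch only: the symbols lam, mu and nu (scale factors for the horizontal, elevation and temporal terms) and h_st (the four-dimensional bandwidth) are assumed names, the linear-combination form of the squared distance is assumed, and the coordinates in any example call are hypothetical rather than taken from the study data.

```python
import math

def st_distance_sq(a, b, lam=1.0, mu=1.0, nu=1.0):
    """Squared 4D distance between samples a and b, each (u, v, z, t).

    Assumed form: lam * horizontal^2 + mu * elevation^2 + nu * time^2.
    """
    du, dv, dz, dt = (a[k] - b[k] for k in range(4))
    return lam * (du * du + dv * dv) + mu * dz * dz + nu * dt * dt

def gaussian_weight(a, b, h_st, lam=1.0, mu=1.0, nu=1.0):
    """Gaussian kernel weight exp(-d^2 / h_st^2) for one pair of samples."""
    return math.exp(-st_distance_sq(a, b, lam, mu, nu) / h_st ** 2)

def weight_matrix(points, h_st, lam=1.0, mu=1.0, nu=1.0):
    """Row i holds the weights of all samples when fitting at sample i."""
    return [[gaussian_weight(p, q, h_st, lam, mu, nu) for q in points]
            for p in points]
```

Because the squared distance is a sum of three terms, each weight factorizes into a horizontal, an elevation and a temporal weight, so the three component bandwidths follow directly from h_st and the scale factors.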

Research data
Zhengzhou, the capital city of Henan Province, is an essential central city in central China, a leading comprehensive transportation hub and a core city of the Central Plains Economic Zone. It lies between 112°42′E and 114°14′E and between 34°16′N and 34°58′N. Compared with other urban areas, the atmosphere in Zhengzhou is relatively stable. Because air circulation is weak, fine particles accumulate and remain suspended in the air, which hinders the diffusion of pollutants and causes increasingly severe haze pollution.
More than 30 automatic air quality monitoring stations have been established in Zhengzhou city to monitor the contents of NO2, SO2, NO, PM10, PM2.5 and other pollutants in the air. These stations include monitoring stations in the Gongyi, Dengfeng, Xinmi, Xinzheng, Xingyang, Zhongmou and Zhengzhou districts. These stations are mainly used to monitor the air quality in vehicle traffic environments and the impact of air pollution on pedestrians. Because the data of some stations are not available, only the hourly mean PM2.5 concentration monitoring data of 31 evenly distributed stations (9 state-controlled monitoring stations and 22 city-controlled monitoring stations) were selected in this paper. The distribution of air pollution monitoring stations is shown in Figure 2, and the basic information of the stations is shown in Table 2. The PM2.5 concentration data recorded at each site were downloaded from the website of the National Meteorological Information Center of China (http://data.cma.cn/), and some air quality source data information is shown in Table 1. Meteorological factors, including the air pressure (hPa), temperature (°C), relative humidity (%), rainfall (mm), wind speed (km/h) and wind direction (°), all have different effects on changes in PM2.5 concentrations. Because the geographic locations of the meteorological stations differ from those of the PM2.5 monitoring stations, the meteorological factor values at the PM2.5 stations are derived from the data recorded at surrounding meteorological stations via kriging interpolation. The meteorological data are the hourly mean data provided by the Henan Meteorological Bureau for seven meteorological monitoring stations. The distribution of the meteorological data is shown in Figure 1, and the information of the stations is shown in Table 3.
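The kriging step mentioned above can be illustrated with a small ordinary kriging sketch. The variogram model and its parameters are not reported here, so the exponential semivariogram with zero nugget, the sill and range defaults, and the function names are all assumptions made for illustration; NumPy is assumed, and any station coordinates and values passed in are hypothetical.

```python
import numpy as np

def exp_variogram(h, sill=1.0, rng=10.0):
    """Exponential semivariogram with zero nugget (assumed model)."""
    return sill * (1.0 - np.exp(-h / rng))

def ordinary_kriging(xy, values, target, sill=1.0, rng=10.0):
    """Interpolate one value at `target` from station data.

    xy     : (n, 2) station coordinates.
    values : (n,) station measurements (e.g. temperature).
    target : (2,) location of a PM2.5 station.
    """
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    # Pairwise distances between stations.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Kriging system [[Gamma, 1], [1^T, 0]] [w, m]^T = [gamma_0, 1]^T,
    # where m is the Lagrange multiplier enforcing sum(w) = 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, sill, rng)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(xy - target, axis=1), sill, rng)
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ values)
```

With a zero nugget, the interpolator reproduces the measured value exactly at a station location, and the unit-sum constraint on the weights means a constant field is interpolated as that constant.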
The DEM data were obtained from the Geospatial Data Cloud Platform (http://www.gscloud.cn) of the Computer Network Information Center of the Chinese Academy of Sciences, with a spatial resolution of 30 m.

Comparative analysis of model results
In this paper, a total of 77 observation datasets were selected, and the GWR, TWR, GTWR and 4D-GTWR models were used to estimate and predict PM2.5 concentrations. Moreover, the correlation coefficient (R2), root mean square error (RMSE) and mean absolute error (MAE) were selected to evaluate the accuracy of the models. The Pearson correlation coefficients (R2) between the predicted PM2.

Tables 7-10 show the minimum (Min), lower-quartile (LQ), median (Med), upper-quartile (UQ) and maximum (Max) values of the coefficients of variables in the GWR, TWR, GTWR and 4D-GTWR models, respectively. These values describe how the coefficients of the variables vary through time and space. In terms of the median values, the coefficients of the AOD, relative humidity and wind speed are positive, indicating that they are positively correlated with PM2.5, while the coefficients of the air pressure, air temperature and wind direction are negative, indicating that they are negatively correlated with PM2.5. These results are consistent with the variation trends of the variables shown in Figure 4. The estimation accuracies of the GWR and TWR models are comparable, probably because the experimental data cover 31 spatial locations and span only 3 months, so the effects of temporal and spatial changes are relatively balanced. The estimation accuracies of the GTWR and 4D-GTWR models are significantly higher than those of the GWR and TWR models. Therefore, of the four models, the 4D-GTWR model fits the four-dimensional spatiotemporal nonstationary relationship best.
At the same time, the four-dimensional spatiotemporal nonstationarity that includes the DEM is verified to be more significant than separate temporal and spatial nonstationarity, indicating that the three-dimensional spatial distance plays an important role in the calculation of the four-dimensional spatiotemporal distance in the 4D-GTWR model and that the DEM factor has an important influence on changes in the PM2.5 concentrations. It is likewise verified that accounting for four-dimensional spatiotemporal nonstationarity with the DEM performs better than accounting for spatiotemporal nonstationarity alone.
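The three evaluation indices used in this comparison can be computed as in the following sketch. It assumes that R2 is the squared Pearson correlation between observed and predicted values, which is one reading of the description above; the function names are hypothetical, and no values from this study are reproduced.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def pearson_r2(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)

def percent_decrease(baseline, new):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - new) / baseline
```

The last helper expresses how a statement such as "the MAE decreased by a given percentage relative to another model" would be derived from two MAE values.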

PM2.5 estimation results
Taking February 26, 2018 as an example, the parameter coefficients used by each model to invert the PM2.5 concentrations in Zhengzhou City, Henan Province, are shown in Table 11. The spatial and temporal prediction distributions (at a 3-km resolution) of the PM2.5 concentrations inverted by each model are shown in Figures 5-8. Overall, the spatial distribution of PM2.5 is patchy and has obvious regional characteristics. Moderate and high concentrations of PM2.5 are mainly distributed in the eastern part of Zhengzhou City, including Zhongmou County and Xinzheng City. The PM2.5 concentrations in these regions are generally higher than 75 µg/m³. In other areas, the pollution is relatively low, with PM2.5 values generally less than 75 µg/m³, and the air quality is good. The area proportions of the five PM2.5 pollution levels inverted by each model were compared with the source data obtained from the PM2.5 stations, as shown in Table 12 and Figure 9. It can be concluded that the 4D-GTWR model has the best inversion effect at the fine, moderate and severe pollution levels, but the differences among models at the good and light pollution levels are not obvious. Compared with the other models, the 4D-GTWR model inversion predicts a larger range of PM2.5. Comparison with the source PM2.5 data shows that the 4D-GTWR model inversion predicts PM2.5 concentrations that are closer to the true values than those output by the other models.

Figure 9. Statistics of the area proportions of PM2.5 pollution levels

Conclusion
In this paper, six variables were selected as the influencing factors of PM2.5 concentrations, including the AOD, relative humidity, temperature, wind speed, wind direction and air pressure. Based on inversion methods used for PM2.5 concentrations in China and globally, GWR, TWR and GTWR models were constructed, and a 4D-GTWR model was proposed. The PM2.5 concentration and its influencing factors in Zhengzhou City, Henan Province, were analyzed as an example. The main conclusions are as follows.
(1) Combined with the basic theory and estimation methods of the GWR, TWR, GTWR and 4D-GTWR models, we can further grasp the spatiotemporal differentiation among different influencing factors. In the analysis and research of PM2.5 concentration inversions, the proposed model can better reflect the impacts of different factors on PM2.5 concentration inversions in the spatial and temporal dimensions with a high goodness of fit.
(2) A four-dimensional geographically and temporally weighted regression method is proposed in this study. The GWR and GTWR models usually use Euclidean spatial distance measurements. This paper introduces a DEM to construct a four-dimensional spatiotemporal distance measurement that better reflects the four-dimensional distribution characteristics of PM2.5 and provides a more realistic spatial distance for analyzing the actual situation. Empirical research was carried out on 77 groups of measured data from December 2017 to February 2018 in Zhengzhou City, Henan Province. The goodness of fit R2, MAE and RMSE values of the models were used as evaluation criteria for analysis and comparison. The results showed that the 4D-GTWR model was superior to the traditional GTWR and GWR models in which the Euclidean distance is used. In the analysis and study of PM2.5 concentration inversions, the four-dimensional spatiotemporal variation in PM2.5 can be explained more objectively using the proposed model. Therefore, it is meaningful to introduce four-dimensional spatiotemporal nonstationarity into the GWR model, and the 4D-GTWR model provides an effective method for estimating the mass concentration of PM2.5.
However, this study still has some limitations that warrant further study. For example, only the DEM data of 31 sites and PM2.5 concentration data covering 3 months are used, so the data may not capture enough four-dimensional spatiotemporal heterogeneity, which may degrade the performance of the 4D-GTWR model.

Data and codes availability statement
The data and codes that support the findings of this study are available in figshare with the identifier doi: 10.6084/m9.figshare.9974168.