3.1 Statistics of PM2.5 concentration in Jiaozuo
Histograms of the daily mean and daily maximum PM2.5 in Jiaozuo show skewed distributions (Fig), indicating that Pearson’s correlation coefficient and ordinary linear regression are not suitable for the raw PM2.5 data. A transformation is usually applied to convert such skewed variables into more nearly Gaussian ones (Qian et al., 2015). Here, we use a logarithmic transformation.
The log-transformed data appear more Gaussian. All subsequent modelling is based on these log-transformed data. Although ordinary linear regression could be used here, we still prefer the GLM, since the GLM is also suitable for normally distributed data.
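The effect of the logarithmic transformation on skewness can be illustrated with a minimal sketch. The data below are hypothetical lognormal draws standing in for daily PM2.5, not the Jiaozuo observations, and the `skewness` helper is introduced only for this illustration.

```python
import math
import random

def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical stand-in for daily PM2.5 (ug/m3): right-skewed lognormal draws.
rng = random.Random(0)
pm25 = [rng.lognormvariate(4.4, 0.6) for _ in range(2000)]
log_pm25 = [math.log(v) for v in pm25]

skew_raw = skewness(pm25)      # clearly positive for right-skewed data
skew_log = skewness(log_pm25)  # close to zero after the transformation
print(f"skewness raw: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

A skewness close to zero after the transformation is consistent with a more symmetric, Gaussian-like distribution, although it does not by itself prove normality.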
The frequencies of the hour at which the daily maximum PM2.5 occurs (Fig. 2) show that the daily peak can occur at any hour, but most peaks fall within two periods: 9:00-12:00 and 20:00-24:00. The frequency of maxima at night (20:00-24:00) is much higher than that in 9:00-12:00. Similarly, the hourly average PM2.5 shows two peaks, at 10:00-11:00 and 20:00-21:00, and the latter is usually higher than the former.
3.2 Correlation maps
The CCs were calculated between the daily mean/maximum PM2.5 of Jiaozuo and the time series extracted at each grid box from the LSVs, and the correlation maps were then produced. According to these maps, the high-correlation areas are not necessarily close to Jiaozuo or in its surrounding area.
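The construction of a correlation map can be sketched as follows. This is an illustrative sketch only: the nested-list layout `field[t][i][j]`, the function names, and the toy 2x2 grid are assumptions made for the example and do not reflect the actual data format used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_map(target, field):
    """Correlate `target` with the time series at every grid box.

    field[t][i][j] is the large-scale variable at time t, row i, column j.
    """
    n_rows, n_cols = len(field[0]), len(field[0][0])
    return [
        [pearson(target, [field[t][i][j] for t in range(len(field))])
         for j in range(n_cols)]
        for i in range(n_rows)
    ]

# Toy example: five days and a 2x2 grid with known relations to the target.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
field = [[[v, -v], [v * v, (v - 3.0) ** 2]] for v in target]
cc_map = correlation_map(target, field)
print(cc_map)
```

In the toy grid, the first box varies with the target, the second varies against it, and the last is symmetric about the target mean, so the map contains a positive center, a negative center and an uncorrelated box.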
3.2.1 Geopotential height
The correlation maps between PM2.5 and the geopotential height show that, at the 500 hPa layer, heavily polluted days correspond to a negative-correlation area over the Mongolian Plateau and a positive-correlation area stretching from the east of North China to the Korean Peninsula (Fig. 3). At the 850 hPa and 1000 hPa layers, a negative-correlation area covers the southern part of China, and positive-correlation areas cover the Northwest Pacific and the Indian peninsula. This shows that the main circulation on heavy air pollution days in Jiaozuo is characterized by relatively low pressure over the Mongolian Plateau and relatively high pressure from the east of North China to the Korean Peninsula, indicating that cold air is only weakly transported southward from the Mongolian Plateau. Low pressure over South Central China appears in the lower troposphere. At the location of Jiaozuo, relatively high pressure appears in the upper troposphere and relatively low pressure appears in the lower troposphere. The relatively high pressure over the Indian subcontinent and the Pacific Ocean indicates that the lower atmosphere tends toward convergent circulation; under such conditions, pollution diffusion in eastern China is weak.
According to the correlation maps, there are one negative-correlation center (Mongolia-Siberia) and one positive-correlation center (North China-Korean Peninsula) at the 500 hPa layer, while there are one negative-correlation center (the south-eastern part of China) and two positive-correlation centers (the Indian subcontinent and the Northwest Pacific) at the 1000 hPa layer. The geopotential heights at these centers were used as predictors to construct the SD models. The correlation centers at the 700 hPa and 850 hPa layers were not used because these two layers can be regarded as a transition from the 500 hPa layer to the 1000 hPa layer and are not typical enough.
3.2.2 Relative humidity and specific humidity
The correlation maps calculated from relative humidity and specific humidity at the four pressure levels are generally similar and show a fragmented structure of high-correlation areas that pass the significance test at the 0.05 level. There are many such small patches of high or low correlation. Nevertheless, the fragmented structure in specific humidity is not as apparent as that in relative humidity.
In the correlation map of relative humidity at 500 hPa, there is a negative-correlation belt stretching from the Indian subcontinent to Southwestern China and a tiny high-correlation region in the lower Yellow River Basin. At the 700 hPa level, there are negative-correlation areas over India, Northeastern China and the Northwest Pacific, a large positive-correlation area over the Mongolia-Siberia region, and a small positive-correlation area between the lower reaches of the Huai River and the Yangtze River (the Jiang-Huai region). The specific humidity at 700 hPa shows patterns similar to those of the relative humidity at 700 hPa, except that there is no apparent positive-correlation area over the Mongolia-Siberia region and no negative-correlation area in Northeastern China. At the 700 hPa level, the positive-correlation region over the Jiang-Huai region is larger for specific humidity than for relative humidity. At the 850 hPa and 1000 hPa levels, the positive-correlation area covering the Jiang-Huai region becomes larger and stretches to the south of China. A negative-correlation center is also apparent over the west Pacific Ocean. These patterns indicate that heavy air pollution in Jiaozuo is related to high humidity over the southeast of China in the lower troposphere. From the lower to the upper levels of the troposphere, the high-correlation region surrounding Jiaozuo gradually moves northward and becomes smaller.
We select the three high-correlation centers from the relative humidity at 700 hPa as model predictors: the negative-correlation area over the Indian subcontinent, the negative-correlation area in Northeastern China and the positive-correlation area in the Jiang-Huai region. Two correlation centers from the relative humidity at 1000 hPa are also selected: the negative-correlation center over the Northwest Pacific and the positive-correlation center over the eastern part of China.
3.2.3 Temperature
The correlation maps obtained from air temperature show a strongly positive center in the east of China and a negative region over Mongolia-Siberia, indicating that heavy PM2.5 pollution in Jiaozuo is related to relatively warm air in the area surrounding Jiaozuo and relatively cold air in the Mongolia-Siberia region, which implies stationary conditions with weak diffusion of air pollution. Usually, in winter, when cold waves move through China, the air temperature in eastern China drops and heavy air pollution can be blown away in a relatively short time. The circulation on cold-wave days is thus opposite to the above air temperature patterns. From the lower to the upper levels of the troposphere, the high-correlation region surrounding Jiaozuo gradually moves northward, indicating that the warm air mass reaches Eastern China earlier in the upper troposphere than in the lower troposphere. The positive-correlation area of air temperature at 1000 hPa in Eastern China appears broken or weakened in a belt along the Taihang Mountains, and this weaker belt stretches southward to the Yangtze River basin. Considering all the above patterns, we selected the centers in both the Mongolia-Siberia region and eastern China as model predictors.
3.2.4 Wind
In the correlation maps of U-wind, a long belt of negative correlation stretches zonally from North China to Japan at the 500 hPa and 700 hPa levels, indicating a westward (easterly) wind anomaly. In the lower troposphere (850 hPa and 1000 hPa), the negative area is connected with a large region over the Northwest Pacific, implying that heavy air pollution in Jiaozuo is related to a westward wind anomaly that exists widely over East Asia. Meanwhile, at the upper levels (500 hPa and 700 hPa), there is also a large and long positive-correlation area over the north of the Asian continent, covering Central Asia, Mongolia, Northeastern China and Siberia, indicating a strong westerly wind anomaly in the upper troposphere. This pattern does not appear at the lower levels, implying that it is related to the stable westerly jet.
At the upper levels of the V-wind, there is a large positive-correlation area extending from the whole of East China to the Mongolian Plateau and Siberia, indicating a strong northward wind anomaly. Meanwhile, a negative-correlation area exists over the Northwest Pacific, indicating a southward wind anomaly there. At the lower levels, this positive-correlation area becomes smaller and weaker, and the negative-correlation area appears to move westward and cover Northeastern China.
All the above patterns show that heavy air pollution is related to a low-pressure system with weakened wind blowing from the Mongolia-Siberia region to the surrounding areas of Eastern China. Under this atmospheric condition, the strong westerlies in the subarctic area can prevent cold air masses from moving to mid-latitude East Asia. For the wind variables, we select the negative center (South China) and the positive center (North China) of U850, V500 and V850 as predictors.
3.2.5 Main circulations in high/low PM2.5 days
A comparison of the 500 hPa geopotential height between low-PM2.5 days and high-PM2.5 days shows that low-PM2.5 days have comparatively low pressure and strong northwesterly wind over the southeast of China. On both high-PM2.5 and low-PM2.5 days, westerly wind dominates the area north of 35°N, indicating the stable westerly jet. Comparatively, on low-PM2.5 days, the wind in Southeastern China becomes more southerly.
At 850 hPa, on both high-PM2.5 and low-PM2.5 days, there is a high-pressure belt stretching from southwest to northeast across China, covering the Sichuan Basin, the Loess Plateau, North China and Northeastern China. A low-pressure area also exists over the Northwest Pacific. Within the high-pressure belt, the area over North China on high-PM2.5 days shows a weak and divergent air flow toward the surrounding areas.
Comparatively, on high-PM2.5 days, the pressure in this belt is not as high as on low-PM2.5 days, and the pressure over the Northwest Pacific is not as low as on low-PM2.5 days. Therefore, the land-sea pressure gradient on low-PM2.5 days is larger than that on high-PM2.5 days, producing a strong wind that is conducive to the dissipation of PM2.5 in the eastern part of China.
Combining the geopotential heights at the upper level (500 hPa) and the lower level (850 hPa), high PM2.5 concentration usually corresponds to comparatively high pressure aloft and low pressure near the surface over North China, which implies a statically stable stratification of the atmosphere. On low-PM2.5 days, in contrast, comparatively low pressure aloft and high pressure near the surface imply a strong downward air flow, which is usually related to cold waves coming from the polar region.
Table 1
Screened predictors based on correlation maps. All these predictors are anomalies relative to the interannual mean during the winter months of 1979-2018.
Predictor | Variable | Correlation sign | Area | Longitude | Latitude
X1 | Z500 | negative | Mongolia-Siberia | 73°E | 61°N
X2 | Z500 | positive | North China, Korean Peninsula | 119°E | 36°N
X3 | Z1000 | negative | Southern China | 113°E | 29°N
X4 | Z1000 | positive | India subcontinent | 75°E | 19°N
X5 | Z1000 | positive | Northwest Pacific | 137°E | 33°N
X6 | RH500 | negative | India subcontinent | 84°E | 13°N
X7 | RH700 | negative | Northeast China | 125°E | 45°N
X8 | RH700 | negative | India subcontinent | 84°E | 15°N
X9 | RH700 | positive | Yangtze River | 115°E | 32°N
X10 | RH1000 | positive | North China | 115°E | 32°N
X11 | RH1000 | negative | Northwest Pacific | 142°E | 30°N
X12 | T700 | negative | Mongolia-Siberia | 73°E | 61°N
X13 | T700 | positive | North China | 119°E | 36°N
X14 | T850 | negative | Mongolia-Siberia | 73°E | 61°N
X15 | T850 | positive | North China | 120°E | 30°N
X16 | U850 | negative | North China | 113°E | 35°N
X17 | U850 | positive | South China | 111°E | 26°N
X18 | V500 | positive | Loess Plateau | 107°E | 36°N
X19 | V500 | negative | Northwest Pacific | 140°E | 38°N
X20 | V850 | positive | South China | 116°E | 29°N
X21 | V850 | negative | Northwest Pacific | 143°E | 23°N
3.3 Performance of the downscaling models
3.3.1 Parameter estimation and the final models
The LASSO algorithm was used during GLM training (parameter estimation), so some of the predictor coefficients were set to zero. The parameters were estimated separately on four different sample sets, and the resulting RMSE and CCs (based on the log-transformed PM2.5) are listed in Tables 2-3. The predictor set including wind (21 predictors) gives better model performance than the predictor set without wind variables (15 predictors). Generally, predictors extracted as averages over multiple grid boxes give poorer model performance than predictors based on a single grid box. The simulated PM2.5 series under the four combinations of the comparison schemes (single grid box with 15 predictors, average of multiple grid boxes with 15 predictors, single grid box with 21 predictors, and average of multiple grid boxes with 21 predictors) differ only slightly and share similar variations (Fig. 6).
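How LASSO sets some coefficients exactly to zero can be illustrated with a small coordinate-descent sketch. It assumes a Gaussian GLM with identity link on the log-transformed predictand, which reduces to penalized least squares; the penalty value `lam`, the function names and the toy data are hypothetical, and this is not the fitting code used in the study.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; return 0.0 when |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=50):
    """Minimize (1/(2n)) * sum((y - b0 - X b)^2) + lam * sum(|b|).

    Plain coordinate descent; assumes no column of X is identically zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # sum of squares of predictor j
            for i in range(n):
                pred_without_j = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
        # The intercept is not penalized: refit it as the mean residual.
        intercept = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, beta

# Toy data: two orthogonal predictors, one strong and one very weak.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [4.4 + 0.2 * a + 0.01 * b for a, b in X]
intercept, beta = lasso_fit(X, y, lam=0.05)
print(intercept, beta)
```

In this toy case the weak predictor is dropped entirely while the strong one is kept with a shrunken coefficient, which mirrors how some of the candidate predictors in Table 1 can disappear from the fitted models.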
Table 2
Pearson’s Correlation coefficients between the GLM simulated logarithmic daily PM2.5 and the observed ones in Jiaozuo, during the winter months of 2014-2018.
Predictand | Validation period | Single grid box, 15 predictors (calibration) | Single grid box, 15 predictors (validation) | Single grid box, 21 predictors (calibration) | Single grid box, 21 predictors (validation) | Area-averaged multiple grid boxes, 15 predictors (calibration) | Area-averaged multiple grid boxes, 15 predictors (validation) | Area-averaged multiple grid boxes, 21 predictors (calibration) | Area-averaged multiple grid boxes, 21 predictors (validation)
Mean PM2.5 | 2014 | 0.6156 | 0.6490 | 0.6344 | 0.6756 | 0.6067 | 0.6441 | 0.6360 | 0.6731
Mean PM2.5 | 2015 | 0.6370 | 0.4286 | 0.6567 | 0.4756 | 0.6529 | 0.4803 | 0.6605 | 0.4709
Mean PM2.5 | 2016 | 0.6271 | 0.5257 | 0.6405 | 0.5768 | 0.6298 | 0.5373 | 0.6320 | 0.5678
Mean PM2.5 | 2017 | 0.6227 | 0.6572 | 0.6490 | 0.6445 | 0.6293 | 0.6258 | 0.6515 | 0.6204
Maximum PM2.5 | 2014 | 0.5415 | 0.5924 | 0.5709 | 0.6171 | 0.5610 | 0.6150 | 0.5679 | 0.6048
Maximum PM2.5 | 2015 | 0.5857 | 0.4120 | 0.5981 | 0.4453 | 0.5955 | 0.4303 | 0.5987 | 0.4216
Maximum PM2.5 | 2016 | 0.5647 | 0.4852 | 0.5637 | 0.5277 | 0.5619 | 0.4845 | 0.5578 | 0.4986
Maximum PM2.5 | 2017 | 0.5722 | 0.5772 | 0.5842 | 0.5688 | 0.5729 | 0.5357 | 0.5833 | 0.5305
Table 3
RMSE obtained between the GLM simulated logarithmic daily PM2.5 and the observed ones in Jiaozuo, during the winter months of 2014-2018.
Predictand | Validation period | Single grid box, 15 predictors (calibration) | Single grid box, 15 predictors (validation) | Single grid box, 21 predictors (calibration) | Single grid box, 21 predictors (validation) | Area-averaged multiple grid boxes, 15 predictors (calibration) | Area-averaged multiple grid boxes, 15 predictors (validation) | Area-averaged multiple grid boxes, 21 predictors (calibration) | Area-averaged multiple grid boxes, 21 predictors (validation)
Mean PM2.5 | 2014 | 4.4775 | 4.5896 | 4.4695 | 4.5722 | 4.4747 | 4.5679 | 4.4729 | 4.5609
Mean PM2.5 | 2015 | 4.5305 | 4.3888 | 4.5228 | 4.3934 | 4.5197 | 4.4044 | 4.5201 | 4.4007
Mean PM2.5 | 2016 | 4.4312 | 4.7066 | 4.4353 | 4.6845 | 4.4379 | 4.6796 | 4.4359 | 4.6859
Mean PM2.5 | 2017 | 4.5331 | 4.3628 | 4.5201 | 4.3936 | 4.5265 | 4.3833 | 4.5161 | 4.4097
Maximum PM2.5 | 2014 | 4.9451 | 5.0767 | 4.9442 | 5.0714 | 4.9471 | 5.0635 | 4.9472 | 5.0631
Maximum PM2.5 | 2015 | 4.9834 | 4.9432 | 4.9820 | 4.9420 | 4.9796 | 4.9525 | 4.9821 | 4.9440
Maximum PM2.5 | 2016 | 4.9152 | 5.1689 | 4.9124 | 5.1772 | 4.9206 | 5.1491 | 4.9146 | 5.1727
Maximum PM2.5 | 2017 | 5.0233 | 4.8221 | 5.0138 | 4.8280 | 5.0150 | 4.8286 | 5.0101 | 4.8419
According to the CCs between the simulated and observed values, the models trained with samples from 2015-2017 give the largest CCs for both the mean PM2.5 and the maximum PM2.5, with averaged CCs of 0.642 and 0.584, respectively. The models trained with the samples of 2015-2017 are as follows:
ln(PM2.5,mean) = 4.4483 - 0.0878X1 + 0.0698X4 + 0.0036X5 - 0.0648X6 - 0.0195X8 + 0.2354X10 - 0.1X11 - 0.0478X14 + 0.0083X15 + 0.0345X17 + 0.049X18 - 0.0182X19
ln(PM2.5,max) = 4.9244 - 0.0568X1 + 0.0679X4 - 0.0239X6 - 0.0397X8 + 0.1914X10 - 0.0649X11 - 0.0439X14 + 0.0041X15 + 0.0318X17 + 0.054X18 - 0.0176X19
Here, the predictors X1−X19 are listed in Table 1.
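As an illustration of how the daily-mean model above is evaluated, the following sketch applies the printed coefficients to a set of predictor anomalies and back-transforms the result with the exponential. The function name and the example input are hypothetical, and the sketch assumes that the predictor values are supplied in the same (possibly standardized) units that were used for fitting, which this section does not state.

```python
import math

# Coefficients copied from the daily-mean model; predictors that do not
# appear in the equation have an implicit coefficient of zero.
MEAN_MODEL = {
    "intercept": 4.4483,
    "X1": -0.0878, "X4": 0.0698, "X5": 0.0036, "X6": -0.0648,
    "X8": -0.0195, "X10": 0.2354, "X11": -0.1, "X14": -0.0478,
    "X15": 0.0083, "X17": 0.0345, "X18": 0.049, "X19": -0.0182,
}

def predict_pm25(model, predictors):
    """Return PM2.5 from ln(PM2.5) = intercept + sum(coef * predictor).

    Predictors missing from `predictors` are treated as zero anomalies.
    """
    log_pm25 = model["intercept"]
    for name, coef in model.items():
        if name != "intercept":
            log_pm25 += coef * predictors.get(name, 0.0)
    return math.exp(log_pm25)

# All anomalies zero: the prediction reduces to exp(intercept).
baseline = predict_pm25(MEAN_MODEL, {})
# Hypothetical case: a one-unit positive anomaly in X10 (RH1000) only.
humid = predict_pm25(MEAN_MODEL, {"X10": 1.0})
print(baseline, humid, humid / baseline)
```

Because the model is linear in ln(PM2.5), each coefficient acts multiplicatively on the concentration: a one-unit increase in X10 scales the prediction by exp(0.2354), regardless of the other predictors.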
We use the models based on the 2015-2017 samples to produce results for further analysis, since relatively good model performance was obtained with these samples. Two additional PM2.5 series observed at two other sites in Jiaozuo (Gaoxinqu and Yingshicheng) were also used to validate the models.
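The averaged CCs of 0.642 and 0.584 quoted for the 2015-2017 samples appear to correspond to the mean of the eight values in the 2014 validation row of Table 2 (calibration and validation CCs over the four predictor schemes). This reading is an inference from the numbers rather than something stated in the text; the short check below reproduces it.

```python
# Table 2, validation period 2014: calibration/validation CCs for the four
# predictor schemes (single or area-averaged grid boxes, 15 or 21 predictors).
mean_pm25_row = [0.6156, 0.6490, 0.6344, 0.6756, 0.6067, 0.6441, 0.6360, 0.6731]
max_pm25_row = [0.5415, 0.5924, 0.5709, 0.6171, 0.5610, 0.6150, 0.5679, 0.6048]

avg_cc_mean = sum(mean_pm25_row) / len(mean_pm25_row)
avg_cc_max = sum(max_pm25_row) / len(max_pm25_row)
print(round(avg_cc_mean, 3), round(avg_cc_max, 3))  # prints 0.642 0.584
```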
For the daily mean PM2.5 models, the CCs between the observations and the modeled results are 0.52 at both additional sites (Gaoxinqu and Yingshicheng). The average of the daily mean PM2.5 over the three sites in Jiaozuo has a CC of 0.55 with the modeled result. All these CCs are lower than the CC obtained at Huanbaoju (0.56).
For the maximum PM2.5, the CCs obtained are 0.50, 0.48 and 0.52 for Huanbaoju, Gaoxinqu and Yingshicheng, respectively, and 0.54 for the average over the three stations. Interestingly, although the model was trained on data from the Huanbaoju station, so that the highest CC would theoretically be expected there, the CC for this station is smaller than those for the three-station average and for Yingshicheng. This suggests that the Huanbaoju station has some local variability related to local conditions.
The simulated daily mean/maximum PM2.5 values have variations that roughly correspond to the observed values (Fig. 6). The simulated peaks and valleys often lead or lag the observed ones by one day. This inconsistency may be related to local atmospheric conditions that affect pollution transport. Heavy PM2.5 episodes commonly last for several days, and the simulated PM2.5 in this study can reflect such multi-day variations.
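A one-day lead or lag of this kind can be diagnosed with a lagged correlation, sketched below. The series are hypothetical toy values in which the simulation is constructed to lead the observations by exactly one day; the function names and the sign convention for `lag` are choices made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_cc(sim, obs, lag):
    """CC between sim[t] and obs[t + lag].

    A positive `lag` means the simulation leads the observations by `lag` days.
    """
    if lag >= 0:
        a, b = sim[:len(sim) - lag], obs[lag:]
    else:
        a, b = sim[-lag:], obs[:len(obs) + lag]
    return pearson(a, b)

# Hypothetical toy series: the simulation shows each value one day early.
obs = [3.0, 5.0, 9.0, 4.0, 2.0, 6.0, 8.0, 3.0, 1.0, 4.0]
sim = obs[1:] + [obs[-1]]

ccs = {lag: lagged_cc(sim, obs, lag) for lag in range(-2, 3)}
best_lag = max(ccs, key=ccs.get)
print(best_lag, ccs[best_lag])
```

For real series, the lag with the highest CC would indicate the typical timing offset, while the zero-lag CC is the one comparable to the values reported in Table 2.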
3.3.2 Variance inflation
The simulation underestimated the daily variance of both the mean PM2.5 and the maximum PM2.5 (Fig. 7). Some specific peak values were not reproduced by the models. To match the simulated variance with the observed variance, variance inflation was used. The simulated standard deviation is 0.43 times the observed standard deviation. Considering that the emission levels differ from year to year, we use a different inflation factor for each year.
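A minimal sketch of variance inflation follows, assuming the common form in which deviations from the simulated mean are rescaled by the ratio of observed to simulated standard deviation. The exact formulation used in the study is not given here, and the input values are hypothetical; in practice the function would be applied separately to each winter to obtain a per-year factor.

```python
import statistics

def inflate_variance(simulated, observed_sd):
    """Rescale deviations from the mean so the standard deviation matches.

    Returns the inflated series and the inflation factor. The mean is kept,
    and the correlation with any other series is unchanged because the
    transformation is linear with a positive slope.
    """
    sim_mean = statistics.fmean(simulated)
    sim_sd = statistics.pstdev(simulated)
    factor = observed_sd / sim_sd
    inflated = [sim_mean + factor * (v - sim_mean) for v in simulated]
    return inflated, factor

# Hypothetical simulated log PM2.5 for one winter, with an observed standard
# deviation chosen so that the simulated one is 0.43 times the observed one.
sim = [4.2, 4.4, 4.5, 4.3, 4.6]
obs_sd = statistics.pstdev(sim) / 0.43
inflated, factor = inflate_variance(sim, obs_sd)
print(factor, statistics.pstdev(inflated))
```

With a ratio of 0.43 the inflation factor is about 2.3, so simulated departures from the mean are more than doubled while the ordering of days is preserved.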
After the inflation, the magnitude of the simulated variations matches the observed one; however, very few peaks are reproduced by the simulation. The mean PM2.5 is simulated better than the maximum PM2.5, and the underestimation of the maximum PM2.5 is also more pronounced than that of the mean PM2.5. Among the years, the models perform worst in winter 2015 and best in winter 2014. The performance in 2017 is similar to that in 2014.
After inflating the variance, the increase in RMSE is smaller than 0.01 µg/m3, while the relative biases are reallocated among the winters, and the average absolute biases are reduced from 0.64 to 0.59 and from 0.72 to 0.57 for the mean PM2.5 and the maximum PM2.5, respectively. Variance inflation does not change the CCs and causes only a very small increase in RMSE, but it can improve risk warning on peak PM2.5 days.
3.3.3 Outputs in special periods
During the special periods of the APEC Blue and SCO Blue sky events, the simulated PM2.5 is also comparatively low, indicating that these blue-sky days were mainly caused by favorable atmospheric circulation (Fig. 8). Therefore, this study cannot conclude whether the emission control measures had a significant impact on reducing PM2.5 concentrations in the two special periods. According to numerical transport simulation experiments, the emission control measures explain a PM2.5 reduction of 21.8% in Beijing (Gao et al., 2017).
Similarly, in December 2017, the odd-even traffic control in multiple urban areas shows no significant impact on PM2.5, since the observed PM2.5 also has several peak days corresponding to the downscaled results.
During the three Spring Festivals of 2015 (in winter 2014), 2016 (in winter 2015) and 2017 (in winter 2016), the PM2.5 values are relatively small. However, according to the downscaling results, there should be a peak during the Spring Festivals of 2015 and 2016. During the Spring Festival of 2016, a peak was actually observed, but it is lower than the corresponding peak simulated by the downscaling model (Fig. 9). During the Spring Festival of 2017, although fireworks were banned in the main urban areas, they were still allowed in wide areas elsewhere, and their emissions certainly affect PM2.5. However, comparison with the simulated PM2.5 shows that this effect is also not significant.
3.3.4 Downscaled annual PM2.5 during 1979-2017
In this study, the historical simulation for the winters of 1979-2017 was produced purely from large-scale atmospheric predictors, without considering the annual changes in emissions. The simulated mean daily PM2.5 and maximum PM2.5 are shown in Fig. 10. Mann-Kendall tests (Kendall, 1955; Mann, 1945) show that both the simulated mean and maximum PM2.5 have decreasing trends during 1979-2017 that pass the test at the 95% confidence level. This result implies that, under the same assumed anthropogenic and natural emission levels, the PM2.5 concentration is generally dropping because of large-scale atmospheric factors. This result is contrary to the recently observed trend, which is largely caused by increased pollutant emissions. The dropping trend results mainly from the lower values after 2005.
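The Mann-Kendall trend test can be sketched as below, using the standard S statistic with the normal approximation and continuity correction. For brevity this sketch omits the tie correction to the variance, and the input series is a hypothetical strictly decreasing example, not the downscaled PM2.5.

```python
import math

def mann_kendall(series):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test.

    Uses the normal approximation without a correction for tied values.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return s, z, p_value

# Hypothetical strictly decreasing series of ten winter means.
values = [90.0, 88.0, 87.0, 85.0, 84.0, 80.0, 79.0, 75.0, 74.0, 70.0]
s, z, p_value = mann_kendall(values)
print(s, z, p_value)
```

A negative Z with a p-value below 0.05 indicates a decreasing trend that is significant at the 95% confidence level, which is the criterion applied to the simulated series.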
Theoretically, the relation between pan evaporation (EVP) and PM2.5 is very complex. Nevertheless, pan evaporation on high-PM2.5 days is comparatively low, because on hazy days the high relative humidity and weak air flow reduce water evaporation. In this study, the CC between the observed EVP (represented by the data observed at Zhengzhou station) and the observed PM2.5 during the winters of 2014-2017 is -0.44, while that between the observed EVP at Zhengzhou and the simulated PM2.5 is -0.42 during the same period (-0.39 during 1979-2018).
3.3.5 Comparison between the downscaling models and the local meteorological data based models
Table 4
Correlation coefficients (R) and root mean squared errors (RMSE) obtained by the daily mean PM2.5 concentration models based on local meteorological data
Metric | Calibration period | Zhengzhou (calibration) | Zhengzhou (validation) | Jiaozuo (calibration) | Jiaozuo (validation)
R | Nov. 2014-Feb. 2015 | 0.613 | 0.638 | 0.627 | 0.669
R | Nov. 2015-Feb. 2016 | 0.615 | 0.518 | 0.666 | 0.487
R | Nov. 2016-Feb. 2017 | 0.614 | 0.536 | 0.624 | 0.623
R | Nov. 2017-Feb. 2018 | 0.565 | 0.720 | 0.413 | 0.705
R | Nov. 2018-Feb. 2019 | 0.613 | 0.553 | 0.644 | 0.531
RMSE | Nov. 2014-Feb. 2015 | 4.491 | 4.783 | 4.446 | 4.653
RMSE | Nov. 2015-Feb. 2016 | 4.545 | 4.551 | 4.518 | 4.330
RMSE | Nov. 2016-Feb. 2017 | 4.526 | 4.634 | 4.426 | 4.734
RMSE | Nov. 2017-Feb. 2018 | 4.573 | 4.425 | 4.400 | 4.361
RMSE | Nov. 2018-Feb. 2019 | 4.556 | 4.496 | 4.488 | 4.462
The models for daily mean PM2.5 concentrations based on local meteorological data were trained with the LASSO-GLM algorithm. In all cases, for every sample set of the cross validation, the coefficients of air temperature and pan evaporation were set to zero by the LASSO algorithm, indicating that the useful signals carried by these two variables can be represented by the signals in the other variables. The PM2.5 models for Zhengzhou and Jiaozuo (both based on the meteorological data observed in Zhengzhou) are as follows:
Zhengzhou: Log(PM2.5) = 4.4785 - 0.0374Prs + 0.1857RH - 0.1997wind - 0.0847SSD
Jiaozuo: Log(PM2.5) = 4.4410 - 0.0086Prs + 0.2657RH - 0.1493wind - 0.0562SSD
The parameters of the above two models were obtained from the sample set of the winter months of 2015-2018, which generally gave the largest average CCs over the calibration and validation periods among the sample sets of the cross validation (Table 4). The values of R and RMSE obtained by these models based on local meteorological data are close to those obtained by the downscaling models.
The outputs of the downscaling models and of the models based on local meteorological data are shown in Fig. 11(a). The observed daily PM2.5 values in Jiaozuo and Zhengzhou are generally consistent with each other, indicating that the air quality in these two cities is largely controlled by the same weather systems and that using the meteorological data observed in Zhengzhou to represent the weather in Jiaozuo is reasonable. The outputs of the local model and the downscaling model show similar variations, although they are not identical, and both underestimate the observed values. Based on the comparison between the modeled outputs and the observations, the downscaling model for Jiaozuo gives a larger R value than the PM2.5 model based on local meteorological data, and the R value obtained by the local model for Jiaozuo is slightly smaller than that for Zhengzhou.