3.1 Analytical sample
This paper covers the three major urban agglomerations along the Yangtze River: the Chengdu-Chongqing urban agglomeration, the middle reaches of the Yangtze River urban agglomeration, and the Yangtze River Delta urban agglomeration.
Based on data availability and research comparability, three county-level cities (Xiantao, Qianjiang and Tianmen) are excluded. The sample comprises 70 cities at the prefecture level and above. The statistical data come from the China Demographic and Employment Statistical Yearbook, the China Urban Statistical Yearbook, the China Economic and Social Big Data Research Platform, the EPS Data Service Platform and the State Railway Administration. Interpolation is used to fill in individual missing pollution emission values.
3.2 Calculation method
The core dependent variable is the pollution emission reduction index. To facilitate the follow-up mechanism analysis, four indicators (per capita industrial sulfur dioxide emission, per capita industrial smoke and dust emission, per capita industrial wastewater emission and per capita industrial NOx emission) are treated as negative indicators and range-standardized. The entropy method then weights them objectively, which avoids information duplication between variables, and the pollution emission reduction index is calculated from the weighted indicators. The calculation steps of the panel data entropy method are as follows:
Range standardization:
$${Z}_{tik}=\frac{{z}_{max}-{z}_{tik}}{{z}_{max}-{z}_{min}}$$
Normalization:
$${r}_{tik}={Z}_{tik}/\sum _{t=1}^{m}\sum _{i=1}^{n}{Z}_{tik}$$
Information entropy of each pollution index:
$${e}_{k}=-\frac{1}{\ln(m\cdot n)}\sum _{t=1}^{m}\sum _{i=1}^{n}{r}_{tik}\ln\left({r}_{tik}\right)$$
Redundancy:
$${d}_{k}=1-{e}_{k}$$
Weight of each indicator:
$${w}_{k}={d}_{k}/\sum _{k=1}^{h}{d}_{k}$$
Pollution reduction level:
$${PR}_{ti}=\sum _{k=1}^{h}{r}_{tik}\cdot {w}_{k}$$
Where \({z}_{tik}\) refers to the value of pollution indicator \(k\) for city \(i\) in year \(t\), \({r}_{tik}\) is the share of that observation in the total of the standardized indicator \(k\), \(t\in \left\{1,2,\dots ,m\right\}\), \(i\in \left\{1,2,\dots ,n\right\}\), \(k\in \left\{1,2,\dots ,h\right\}\), \(m=13\), \(n=70\), \(h=4\). \({z}_{max}\) and \({z}_{min}\) represent the maximum and minimum values of each pollution indicator across all city samples. After the above treatment, \({PR}_{ti}\) is the pollution emission reduction level of city \(i\) in year \(t\). The larger the value, the more pronounced the pollution emission reduction effect of the city.
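The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB or Stata code: the function name `entropy_index` and the toy data are hypothetical, the redundancy is taken as \(1-{e}_{k}\) (the usual definition in the entropy method), \(0\cdot \ln 0\) is treated as 0, and the composite index sums \({r}_{tik}{w}_{k}\) over the indicators.

```python
import math

def entropy_index(z):
    """z[t][i][k]: per capita emission of pollutant k for city i in year t.

    Returns (w, pr): the indicator weights and the composite index pr[t][i].
    """
    m, n, h = len(z), len(z[0]), len(z[0][0])
    r = [[[0.0] * h for _ in range(n)] for _ in range(m)]
    d = []
    for k in range(h):
        # Pool all city-year observations of indicator k.
        col = [z[t][i][k] for t in range(m) for i in range(n)]
        z_max, z_min = max(col), min(col)
        # Range standardization as a negative indicator: lower emissions score higher.
        big_z = [(z_max - v) / (z_max - z_min) for v in col]
        total = sum(big_z)
        shares = [v / total for v in big_z]
        # Assumption: terms with a zero share contribute 0 to the entropy.
        e_k = -sum(p * math.log(p) for p in shares if p > 0) / math.log(m * n)
        d.append(1.0 - e_k)  # assumed redundancy: 1 - e_k
        for idx, p in enumerate(shares):
            r[idx // n][idx % n][k] = p
    d_sum = sum(d)
    w = [v / d_sum for v in d]
    pr = [[sum(r[t][i][k] * w[k] for k in range(h)) for i in range(n)]
          for t in range(m)]
    return w, pr

# Hypothetical toy panel: 2 years, 2 cities, 2 pollutants.
z = [[[1.0, 4.0], [2.0, 3.0]],
     [[3.0, 2.0], [5.0, 1.0]]]
w, pr = entropy_index(z)
print(w)
print(pr)
```

Because every indicator is pooled over all city-year observations before standardization, the resulting index is comparable across both cities and years, which is the point of the panel version of the method.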
The independent variables include the development of high-speed rail (HSR) and the level of producer service agglomeration (PAL). At present, most studies on high-speed rail set a dummy variable based on whether a city has opened high-speed rail, thus ignoring the dynamic change in the level of high-speed rail service. This paper measures the development level of HSR by the number of HSR lines passing through the city. If a city is first the endpoint of an HSR line and another section of the same line later opens from that city, the city is counted as having opened two routes successively.
In calculating the producer service agglomeration level (PAL), to make it feasible to obtain the number of employees in all relevant industries during the sample period, the scope of producer services is set as the high-knowledge-intensity industries of technical services, scientific research, finance, software, computer services, leasing and business services, information transmission and geological survey, together with the low-knowledge-intensity industries of transportation, storage and postal services. The existing indexes for identifying the level of industrial agglomeration include the Herfindahl-Hirschman index, the EG index, the Moran index, the spatial Gini coefficient and location entropy (also known as the location quotient). The Herfindahl-Hirschman index and the EG index require enterprise-level data. Based on the availability and integrity of data, this research calculates the global Moran index and location entropy to represent the spatial agglomeration level of producer services and selects the value of location entropy as the explanatory variable.
The calculation formula is as follows:
$${PAL}_{iv}=\left(\frac{{p}_{iv}}{{p}_{i}}\right)/\left(\frac{{P}_{v}}{P}\right)$$
\({PAL}_{iv}\) represents the producer service agglomeration level of city \(i\). The larger the value of \({PAL}_{iv}\), the higher the agglomeration. \({PAL}_{iv}>1\) indicates that producer services are agglomerated in the sample city, while \({PAL}_{iv}<1\) indicates that the sample city's PAL is lower than the regional average level. \({p}_{iv}\) is the number of people engaged in industry \(v\) in city \(i\), \({p}_{i}\) is the total number of workers in city \(i\), \({P}_{v}\) is the total number of people engaged in industry \(v\) in the urban agglomeration, and \(P\) is the total number of workers in the urban agglomeration.
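As a small worked illustration of the location entropy formula, the sketch below computes PAL for two hypothetical cities that together form one urban agglomeration. The function name `pal_by_city`, the city labels and the employment figures are all made up for the example.

```python
def pal_by_city(service_emp, total_emp):
    """Location entropy of producer services for each city.

    service_emp[c]: producer-service employment in city c (p_iv)
    total_emp[c]:   total employment in city c (p_i)
    The agglomeration totals P_v and P are the sums over all cities.
    """
    big_p_v = sum(service_emp.values())
    big_p = sum(total_emp.values())
    return {c: (service_emp[c] / total_emp[c]) / (big_p_v / big_p)
            for c in service_emp}

# Hypothetical employment counts (same units for both dictionaries).
service = {"city_A": 30.0, "city_B": 10.0}
total = {"city_A": 100.0, "city_B": 100.0}
print(pal_by_city(service, total))
```

Here the agglomeration-wide share of producer services is 40/200 = 0.2, so a city with a 30% share has a PAL of 1.5 (agglomerated) and a city with a 10% share has a PAL of 0.5 (below the regional average).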
The control variables include the city's economic development level (PGDP), the opening-up level (FDI) calculated using the annual average exchange rate, the degree of government intervention (GI), urban scale (CS), technology investment (TI), environmental regulation (ER), information level (INL) and industrial structure (IND). Table 1 shows the descriptive statistics of each variable.
Table 1
| Variable | Description | Obs | Mean | Std.Dev. | Min | Max |
|---|---|---|---|---|---|---|
| PR | pollution reduction level | 910 | 0.850 | 0.148 | 0.127 | 1 |
| HSR | HSR lines | 910 | 0.884 | 1.278 | 0 | 8 |
| PAL | PAS level | 910 | 0.762 | 0.711 | 0.153 | 4.780 |
| PGDP | per capita GDP | 910 | 5.130 | 3.437 | 0.589 | 19.902 |
| FDI | ratio of actually used foreign capital to GDP | 910 | 0.026 | 0.034 | 0 | 0.889 |
| GI | ratio of fiscal expenditure to GDP | 910 | 0.159 | 0.058 | 0.058 | 0.675 |
| CS | logarithm of total population at the end of the year | 910 | 6.052 | 0.626 | 4.299 | 8.136 |
| TI | ratio of science and technology expenditure to fiscal expenditure of the current year | 910 | 0.023 | 0.019 | 0.002 | 0.163 |
| ER | comprehensive utilization rate of industrial solid waste | 910 | 0.873 | 0.168 | 0.05 | 1.432 |
| INL | logarithm of Internet households | 910 | 4.053 | 1.121 | 1.347 | 8.551 |
| IND | ratio of tertiary industry to secondary industry | 910 | 0.799 | 0.304 | 0.313 | 2.693 |
3.3 Model specification
3.3.1 Spatial correlation analysis
This study tests the spatial correlation using the global Moran’s I index to demonstrate the rationality and necessity of using a spatial econometric model for empirical analysis. The Moran index is calculated using the row-normalized spatial weight matrix:
$$I=\frac{\sum _{i=1}^{n}\sum _{j=1}^{n}{w}_{ij}({x}_{i}-\bar{x})({x}_{j}-\bar{x})}{\sum _{i=1}^{n}{({x}_{i}-\bar{x})}^{2}}$$
\({x}_{i}\) is the observed value in region \(i\), \({x}_{j}\) is the observed value in region \(j\), and \({w}_{ij}\) is an element of the row-normalized weight matrix \(W\). The value range of \(I\) is [-1, 1]. \(I>0\) represents positive spatial autocorrelation (similar values cluster together), while \(I<0\) represents negative spatial autocorrelation (high values are adjacent to low values). When \(I\) is close to 0, the observed values are distributed randomly in space without spatial correlation.
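A minimal sketch of this statistic in Python follows. The helper names `row_normalize` and `morans_i`, the four-city contiguity matrix and the data values are hypothetical and chosen only to make the arithmetic easy to check; they are not the weights used in this paper.

```python
def row_normalize(w):
    """Scale each row of a weight matrix so that it sums to 1."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s > 0 else 0.0 for v in row])
    return out

def morans_i(x, w):
    """Global Moran's I for values x and a row-normalized weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in dev)
    return num / den

# Hypothetical example: four cities on a line, each adjacent to its neighbours.
w_raw = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
w = row_normalize(w_raw)
print(morans_i([1.0, 2.0, 3.0, 4.0], w))    # smooth gradient: positive I
print(morans_i([1.0, -1.0, 1.0, -1.0], w))  # alternating values: negative I
```

The first series rises steadily along the line, so neighbours resemble each other and the statistic is positive; the second alternates between high and low, which gives a negative value.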
According to Tobler's First Law, the closer objects are geographically, the more similar they are. Competition and cooperation among objects, imitation behaviour, spillover behaviour and unclear boundary definitions are all sources of spatial dependence. In this study, the economic-geographical spatial weight matrix is calculated by combining geographical and economic distances. The calculation formula is as follows:
$${W}_{ij}={W}_{ij}^{d}\times {W}_{ij}^{e}$$
$${W}_{ij}^{d}=\begin{cases}1/{d}_{ij}, & i\ne j\\ 0, & i=j\end{cases}\qquad {W}_{ij}^{e}=\begin{cases}1/\left|\bar{e}_{i}-\bar{e}_{j}\right|, & i\ne j\\ 0, & i=j\end{cases}$$
The variables are defined as follows:
\({W}_{ij}^{d}\) : The weight matrix of geographical distance
\({d}_{ij}\) : The geographical distance between city \(i\) and city \(j\), calculated from their longitude and latitude
\({W}_{ij}^{e}\) : The weight matrix of economic distance over the sampling period
\(\bar{e}_{i}\) : Per capita GDP of city \(i\) over the sampling period
\(\bar{e}_{j}\) : Per capita GDP of city \(j\) over the sampling period
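The construction of the economic-geographical matrix can be sketched as follows. Several choices here are assumptions rather than details given in the paper: the great-circle (haversine) formula is used for the distance from longitude and latitude, row normalization is used as the form of standardization, pairs with zero distance or identical per capita GDP (where the inverse is undefined) are given weight 0, and the coordinates and GDP values are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed distance measure)."""
    radius = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def row_normalize(m):
    """Scale each row so that it sums to 1 (assumed form of standardization)."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s if s > 0 else 0.0 for v in row])
    return out

def econ_geo_weights(coords, gdp_pc):
    """coords[i] = (lat, lon); gdp_pc[i] = per capita GDP over the sample period."""
    n = len(coords)
    wd = [[0.0] * n for _ in range(n)]
    we = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal elements stay 0
            d = haversine_km(coords[i][0], coords[i][1], coords[j][0], coords[j][1])
            gap = abs(gdp_pc[i] - gdp_pc[j])
            # Assumption: an undefined inverse (zero distance or zero gap) gets weight 0.
            if d > 0:
                wd[i][j] = 1.0 / d
            if gap > 0:
                we[i][j] = 1.0 / gap
    wd, we = row_normalize(wd), row_normalize(we)
    # Element-wise product of the two matrices, then standardize the result.
    w = [[wd[i][j] * we[i][j] for j in range(n)] for i in range(n)]
    return row_normalize(w)

# Hypothetical coordinates (lat, lon) and per capita GDP values for three cities.
coords = [(30.0, 104.0), (30.5, 114.3), (31.2, 121.5)]
gdp_pc = [5.0, 6.5, 9.0]
for row in econ_geo_weights(coords, gdp_pc):
    print(row)
```

Under this construction a pair of cities receives a large weight only when the two are both geographically close and economically similar.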
\({W}_{ij}^{d}\), \({W}_{ij}^{e}\) and \(W\) are all standardized. This paper uses MATLAB R2018b and Stata 15.1 for measurement and calculation. As shown in Table 2, the global Moran index is significant at the 1% level in every year of the sample period. From 2007 to 2019, PR and PAL in the prefecture-level cities of the three major city clusters show evident spatial agglomeration. Therefore, a spatial econometric model is the more appropriate choice for the analysis.
Table 2
Moran index test results of PR and PAL
| Year | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|
| PR | 0.505*** (7.152) | 0.503*** (7.114) | 0.543*** (7.733) | 0.516*** (7.372) | 0.415*** (5.905) | 0.394*** (5.672) | 0.417*** (5.926) |
| PAL | 0.363*** (5.411) | 0.415*** (6.372) | 0.375*** (5.690) | 0.362*** (5.513) | 0.408*** (5.995) | 0.495*** (7.232) | 0.476*** (7.080) |

| Year | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|
| PR | 0.399*** (5.739) | 0.373*** (5.346) | 0.43*** (6.122) | 0.466*** (6.643) | 0.242*** (3.637) | 0.457*** (6.516) |
| PAL | 0.439*** (6.428) | 0.412*** (5.992) | 0.343*** (5.038) | 0.387*** (5.705) | 0.393*** (5.837) | 0.398*** (5.912) |
3.3.2 Model specification
The general expression for the spatial panel model is:
$${PR}_{it}=\lambda W{PR}_{it}+\gamma {X}_{it}+\eta W{X}_{it}+{\mu }_{it}+{\nu }_{it}+{\epsilon }_{it}$$
$${\epsilon }_{it}=\rho W{\epsilon }_{it}+{\delta }_{it}$$
In the model, \({PR}_{it}\) represents the PR level of city \(i\) in year \(t\), \(W\) refers to the spatial weight matrix, \({X}_{it}\) refers to the set of control variables, \({\mu }_{it}\) and \({\nu }_{it}\) represent the time effect and individual effect, respectively, \(\lambda\) is the spatial autoregression coefficient, \(\rho\) is the spatial autocorrelation coefficient of the error term, \(\gamma\) and \(\eta\) are the parameters to be estimated, and \({\epsilon }_{it}\) is the random disturbance term. When \(\rho =0\) and \(\eta =0\), the model reduces to the spatial autoregressive model (SAR):
$${PR}_{it}=\lambda W{PR}_{it}+\gamma {X}_{it}+{\mu }_{it}+{\nu }_{it}+{\epsilon }_{it}$$
The SAR model considers the influence of adjacent areas on pollution emission, which is consistent with the results of spatial correlation analysis.
When \(\lambda =0\) and \(\eta =0\), the general spatial panel model reduces to the spatial error model (SEM), which captures how the error terms of adjacent areas affect the local explained variable through the spatial transmission mechanism, as well as potential interference factors such as omitted variables. The model is as follows:
$${PR}_{it}=\gamma {X}_{it}+{\mu }_{it}+{\nu }_{it}+{\epsilon }_{it}$$
$${\epsilon }_{it}=\rho W{\epsilon }_{it}+{\delta }_{it}$$
When \(\rho =0\), the general spatial panel model reduces to the spatial Durbin model (SDM), which considers the influence of both local core variables and the core variables of adjacent regions on the local explained variable. The model is as follows:
$${PR}_{it}=\lambda W{PR}_{it}+\gamma {X}_{it}+\eta W{X}_{it}+{\mu }_{it}+{\nu }_{it}+{\epsilon }_{it}$$
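To make the terms of the SDM concrete, the sketch below builds the spatially lagged outcome \(W{PR}_{it}\) and the spatially lagged covariates \(W{X}_{it}\) year by year for a small panel. The function names `spatial_lag` and `sdm_terms`, the three-city weight matrix and the data are hypothetical. The sketch only constructs the regressors; because \(W{PR}_{it}\) is endogenous, the coefficients themselves are normally estimated by maximum likelihood or a similar method in dedicated software rather than by ordinary least squares.

```python
def spatial_lag(w, v):
    """Return W v: for each city, the weighted average of its neighbours' values."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]

def sdm_terms(w, y, x):
    """Build the SDM terms for a panel.

    w:          n x n row-normalized spatial weight matrix (same in every year)
    y[t][i]:    outcome of city i in year t
    x[t][i][c]: covariate c of city i in year t
    Returns one record per city-year with Wy, X and WX.
    """
    n = len(w)
    rows = []
    for t in range(len(y)):
        wy = spatial_lag(w, y[t])
        n_cov = len(x[t][0])
        # Lag each covariate column separately within the year.
        wx_cols = [spatial_lag(w, [x[t][i][c] for i in range(n)])
                   for c in range(n_cov)]
        for i in range(n):
            rows.append({"t": t, "i": i, "y": y[t][i], "Wy": wy[i],
                         "X": list(x[t][i]),
                         "WX": [col[i] for col in wx_cols]})
    return rows

# Hypothetical example: three cities, one year, one covariate.
w = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
y = [[1.0, 2.0, 3.0]]
x = [[[10.0], [20.0], [30.0]]]
for rec in sdm_terms(w, y, x):
    print(rec)
```

Setting the WX columns aside gives the regressors of the SAR model, while dropping Wy and WX and moving the spatial dependence into the error term corresponds to the SEM.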
The opening of HSR has an exogenous policy impact on the economy. This paper groups samples based on whether the city has high-speed rail or not. To estimate the net effect of HSR opening and the spatial correlation between HSR and PR, the basic SDM-DID equation is specified as follows:
$${PR}_{it}={\lambda }_{0}W{PR}_{it}+{\alpha }_{0}DID+{\beta }_{0}WDID+{\gamma }_{0}{X}_{it}+{\eta }_{0}W{X}_{it}+{\mu }_{it}+{\nu }_{it}+{\epsilon }_{it}$$
(1)
In order to verify the intermediary role of PAL between HSR and PR, the stepwise regression models are as follows:
$${PAL}_{it}={\lambda }_{1}W{PAL}_{it}+{\alpha }_{1}DID+{\beta }_{1}WDID+{\gamma }_{1}{X}_{it}+{\eta }_{1}W{X}_{it}+{\mu }_{it}+{\nu }_{it}+{\epsilon }_{it}$$
(2)
$${PR}_{it}={\lambda }_{2}W{PR}_{it}+{\theta }_{1}{PAL}_{it}+{\theta }_{2}W{PAL}_{it}+{\alpha }_{2}DID+{\beta }_{2}WDID+{\gamma }_{2}{X}_{it}+{\eta }_{2}W{X}_{it}+{\mu }_{it}+{\nu }_{it}+{\epsilon }_{it}$$
(3)
DID is the interaction term between the policy dummy variable and the time dummy variable. When the sample city has opened high-speed rail, the policy dummy variable of the current year is 1; otherwise, it is 0. This paper uses the number of HSR lines in place of the policy dummy variable to represent the HSR level of city \(i\) in year \(t\). \({\alpha }_{0}\) to \({\alpha }_{2}\) and \({\beta }_{0}\) to \({\beta }_{2}\) are the estimated coefficients of the double-difference terms.
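The two versions of the treatment variable described here, and the spatial lag WDID, can be sketched as follows. The function names, the weight matrix and the line counts are hypothetical; the sketch assumes only that a city counts as treated in a year when at least one HSR line passes through it.

```python
def spatial_lag(w, v):
    """Return W v for one year of data."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]

def did_terms(lines, w):
    """lines[t][i]: number of HSR lines passing through city i in year t.

    Returns three panels indexed [t][i]:
      dummy     - 1 if the city has HSR in that year, else 0
      intensity - the number of lines, used in place of the dummy
      w_did     - spatial lag of the intensity variable (WDID)
    """
    dummy = [[1 if v > 0 else 0 for v in row] for row in lines]
    intensity = [[float(v) for v in row] for row in lines]
    w_did = [spatial_lag(w, row) for row in intensity]
    return dummy, intensity, w_did

# Hypothetical panel: three years, three cities.
w = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
lines = [[0, 0, 0],
         [1, 0, 0],
         [2, 1, 0]]
dummy, intensity, w_did = did_terms(lines, w)
print(dummy)
print(intensity)
print(w_did)
```

The intensity version keeps the information that the dummy discards: in the last year of the toy panel the first city has two lines and the second has one, whereas the dummy treats both as equally treated.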
The LM and RLM test results show that the SDM model is suitable for the existing data, and the LR and Wald test results indicate that it cannot degenerate into SAR or SEM (Table 3). The Hausman test statistic is 73.70 (P = 0.000), indicating that the fixed-effects specification is preferred to the random-effects specification.
Table 3
Model selection test results
| test | statistic | P-value | test | statistic | P-value |
|---|---|---|---|---|---|
| LM-error | 126.140 | 0.000 | LR Test SAR | 66.87 | 0.000 |
| LM-lag | 175.531 | 0.000 | LR Test SEM | 84.00 | 0.000 |
| RLM-error | 0.111 | 0.739 | Wald Test SAR | 24.33 | 0.004 |
| RLM-lag | 49.503 | 0.000 | Wald Test SEM | 26.02 | 0.001 |
The results of the counterfactual test and the parallel trend test illustrate the feasibility of using the DID method in the model. In the multi-period DID model, pre2 to pre6 are the dummy variables for the periods before the HSR opening; pre1 is removed from the estimation because it is the base period. The results (Table 4) show that the “pseudo-policy” variables have no significant effect on PAL and PR. As shown in Fig. 2, the trends of PAL and PR in the control group (CG) and the treatment group (TG) during the sample period are almost the same, indicating that the data meet the assumption of the DID model both statistically and by visual inspection.
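One common way to construct such pre-period dummies is sketched below. The exact definition used in this paper is not spelled out, so the sketch assumes that pre\(k\) equals 1 exactly \(k\) years before a city's HSR opening year, that pre1 is omitted as the base period, and that cities without HSR receive 0 throughout. The function name `pre_dummies` and the years are hypothetical.

```python
def pre_dummies(year, open_year, leads=(2, 3, 4, 5, 6)):
    """Return {'pre2': 0/1, ..., 'pre6': 0/1} for one city-year.

    year:      calendar year of the observation
    open_year: year the city opened HSR, or None if it never did
    Assumption: pre_k = 1 exactly k years before opening; pre1 is the
    omitted base period, so it never appears in the output.
    """
    if open_year is None:
        return {"pre%d" % k: 0 for k in leads}
    return {"pre%d" % k: int(open_year - year == k) for k in leads}

# Hypothetical city that opened HSR in 2013.
for yr in (2008, 2010, 2012, 2014):
    print(yr, pre_dummies(yr, 2013))
```

If the pre-treatment trends of the two groups are parallel, the coefficients on these dummies should be statistically indistinguishable from zero, which is what the test examines.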
Table 4
Parallel trend hypothesis test results
| variable | PR | PAL |
|---|---|---|
| pre6 | -0.014 (-0.76) | 0.033 (-0.52) |
| pre5 | 0.008 (-0.46) | -0.032 (-0.71) |
| pre4 | 0.001 (-0.07) | -0.041 (-0.92) |
| pre3 | -0.016 (-1.41) | -0.062* (-1.71) |
| pre2 | 0 (-0.03) | -0.035 (-1.16) |
| Constant | -0.653 (-1.49) | 1.199 (-0.63) |
| control variables | YES | YES |
| Year FE | YES | YES |
| City FE | YES | YES |
| R2 | 0.837 | 0.892 |
| Observations | 910 | 910 |

Notes: t-values in parentheses