3.1 Spatial analysis of the cumulative number of confirmed cases
The cumulative number of confirmed cases in each prefecture-level city was log-transformed and mapped, as shown in Figure 1. Colors indicate the cumulative number of confirmed cases in each city: the darker the color, the larger the number. The map shows that cumulative confirmed cases across the country are clearly clustered. Case levels vary greatly between cities and are polarized: the closer a city is to Wuhan, the higher its cumulative number of confirmed cases, and the farther away, the lower. Most confirmed cases are concentrated in Central China, South China and East China.
Next, local Moran's I (LISA) was used to explore the spatial autocorrelation pattern of the cumulative number of confirmed cases, as shown in Figure 2. The LISA map shows clear positive spatial autocorrelation: Central China and East China form high-value clusters, while Northwest and Southwest China form low-value clusters.
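As an illustration of the statistic behind a LISA map, the following minimal sketch computes local Moran's I with row-standardized binary weights and assigns each unit to a quadrant (HH, LL, HL, LH). The function name, the `neighbors` dictionary and the toy data are assumptions made for this example and are not taken from the study; the permutation test that decides significance is omitted.

```python
from statistics import fmean, pstdev


def local_morans_i(values, neighbors):
    """Local Moran's I with row-standardized binary weights.

    values    : list of floats, one per spatial unit
    neighbors : dict {unit index: list of neighbor indices} (hypothetical input)
    Returns a list of (I_i, quadrant) tuples.
    """
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation
    z = [(v - mean) / sd for v in values]
    result = []
    for i, zi in enumerate(z):
        nbrs = neighbors.get(i, [])
        # Spatial lag: mean standardized value of the neighbors
        lag = fmean([z[j] for j in nbrs]) if nbrs else 0.0
        if zi >= 0 and lag >= 0:
            quadrant = "HH"  # high value surrounded by high values
        elif zi < 0 and lag < 0:
            quadrant = "LL"  # low value surrounded by low values
        elif zi >= 0:
            quadrant = "HL"
        else:
            quadrant = "LH"
        result.append((zi * lag, quadrant))
    return result


# Toy example: six units along a line, high values on the left
toy_values = [10, 9, 8, 1, 2, 1]
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
toy_lisa = local_morans_i(toy_values, toy_neighbors)
print([quadrant for _, quadrant in toy_lisa])
```

Units in the HH and LL quadrants have positive local statistics, which corresponds to the high-value and low-value clusters described above.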
3.2 Results of OLS model
To test the overall validity of each factor, an OLS model was fitted first; the results are shown in Table 2. At the 5% significance level (95% confidence), six independent variables have a significant effect on the cumulative number of confirmed cases. Ranked by the absolute value of their coefficients, in descending order of importance, they are In, Out, NumDoc, NumTaxi, NumHos, and GDP. The coefficients of GDP, NumDoc, and Out are positive, while those of NumHos, NumTaxi, and In are negative.
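The ranking and the sign split described above can be reproduced from the OLS columns of Table 2. The sketch below is only a check of the arithmetic: it takes In as negative, consistent with its reported t value and with the text, and it assumes the variables are on comparable scales so that absolute coefficients can be ranked.

```python
# Significant OLS terms as (B, SE), transcribed from Table 2.
# In is entered as negative, following its reported t value and the text.
significant = {
    "GDP": (0.229, 0.103),
    "NumHos": (-0.537, 0.163),
    "NumDoc": (0.921, 0.413),
    "NumTaxi": (-0.666, 0.310),
    "Out": (2.694, 0.456),
    "In": (-2.958, 0.521),
}

# The t statistic is the coefficient divided by its standard error
t_values = {name: b / se for name, (b, se) in significant.items()}

# Rank by absolute coefficient size, largest first
ranking = sorted(significant, key=lambda name: abs(significant[name][0]), reverse=True)

positive = sorted(name for name, (b, _) in significant.items() if b > 0)
negative = sorted(name for name, (b, _) in significant.items() if b < 0)

print(ranking)
print(positive, negative)
```

The recomputed t values agree with the reported ones to within rounding of B and SE.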
Table 2 Statistical descriptions of regression coefficients
| Variable | OLS B | OLS SE | OLS t | GWR Min | GWR Median | GWR Max | MGWR Min | MGWR Median | MGWR Max | MGWR Bandwidth |
|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | 3.17*** | 0.072 | 43.886 | -0.599 | 0.244 | 0.761 | -0.933 | -0.058 | 1.251 | 43 |
| Pop | -0.001 | 0.265 | -0.005 | -0.748 | -0.313 | 0.658 | -0.314 | -0.309 | -0.29 | 293 |
| GovBudget | -0.201 | 0.304 | -0.663 | -1.304 | -0.157 | 0.592 | -0.144 | -0.142 | -0.125 | 293 |
| GDP | 0.229* | 0.103 | 2.22 | -0.145 | 0.046 | 0.893 | -0.103 | 0.056 | 0.295 | 161 |
| NumUnemploy | 0.058 | 0.143 | 0.406 | -0.256 | 0.094 | 0.26 | 0.079 | 0.084 | 0.094 | 293 |
| NumCollege | 0.717 | 0.44 | 1.629 | -0.011 | 0.338 | 0.704 | -0.023 | -0.016 | 0.006 | 293 |
| NumVocational | 0.116 | 0.181 | 0.641 | -0.302 | 0.064 | 0.551 | -0.152 | -0.137 | -0.119 | 293 |
| PopCollege | -0.006 | 0.408 | -0.014 | -0.424 | 0.118 | 0.818 | 0.319 | 0.326 | 0.346 | 293 |
| PopVocational | -0.344 | 0.19 | -1.808 | -0.505 | -0.066 | 0.162 | 0.091 | 0.111 | 0.118 | 292 |
| NumHos | -0.537** | 0.163 | -3.287 | -0.385 | -0.184 | 0.029 | -0.122 | -0.114 | -0.096 | 293 |
| NumBed | 0.317 | 0.403 | 0.788 | -0.66 | 0.223 | 1.101 | 0.651 | 0.653 | 0.672 | 293 |
| NumDoc | 0.921* | 0.413 | 2.23 | -0.259 | 0.523 | 1.786 | 0.248 | 0.255 | 0.266 | 293 |
| NumMeInsure | -0.082 | 0.115 | -0.713 | -0.759 | 0.276 | 2.461 | -0.086 | -0.081 | -0.072 | 293 |
| NumUnInsure | 0.497 | 0.442 | 1.125 | -2.097 | 0.288 | 1.206 | 0.429 | 0.434 | 0.443 | 293 |
| NumBus | 0.175 | 0.233 | 0.752 | -0.787 | -0.07 | 0.515 | 0.211 | 0.216 | 0.228 | 293 |
| NumPassenger | -0.074 | 0.304 | -0.244 | -1.217 | 0.168 | 1.021 | -0.692 | -0.233 | 0.212 | 63 |
| NumTaxi | -0.666* | 0.31 | -2.151 | -1.156 | -0.339 | 1.072 | -0.201 | -0.193 | -0.175 | 293 |
| Out | 2.694*** | 0.456 | 5.904 | 0.374 | 2.025 | 3.817 | 1.503 | 1.531 | 1.576 | 291 |
| In | -2.958*** | 0.521 | -5.673 | -4.765 | -2.6 | -0.545 | -3.4 | -1.822 | -1.088 | 43 |
Note: * p < 0.05, ** p < 0.01, *** p < 0.001
The diagnostic indicators of the OLS model are shown in Table 3, where R² is the coefficient of determination and AICc is the corrected Akaike information criterion, a small-sample correction of AIC. According to R², the model explains approximately 44.8% of the total variation of the dependent variable across the 294 units. The Moran's I of the OLS residuals is 0.3740 (p < 0.001), indicating that the residuals are significantly spatially autocorrelated and spatially clustered. Because the OLS parameter estimates are therefore unreliable, the GWR model was used for further analysis.
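For reference, global Moran's I of a set of residuals can be computed as in the minimal sketch below. The weights dictionary and the toy residuals are hypothetical; the value 0.3740 reported above comes from the study's own data and spatial weights, which are not reproduced here, and the significance test is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values  : list of floats (e.g. regression residuals), one per unit
    weights : dict {(i, j): w_ij} with i != j (hypothetical spatial weights)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())  # sum of all weights
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    m2 = sum(d * d for d in dev)
    return (n / s0) * (cross / m2)


# Toy example: six units along a line, binary weights between adjacent units
chain = {}
for i in range(5):
    chain[(i, i + 1)] = 1.0
    chain[(i + 1, i)] = 1.0

clustered = morans_i([2.0, 1.5, 1.0, -1.0, -1.5, -2.0], chain)
alternating = morans_i([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], chain)
print(clustered, alternating)
```

Residuals of similar sign that sit next to each other give a positive value, as in the clustered toy case, while an alternating pattern gives a negative one.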
Table 3 Diagnostic indicators
| Model | R² | Adjusted R² | AIC | AICc |
|---|---|---|---|---|
| OLS | 0.448 | 0.412 | 697.646 | 702.723 |
| GWR | 0.741 | 0.673 | 560.445 | 593.542 |
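The OLS adjusted R² in Table 3 can be checked against the reported R² using the standard adjustment for a linear model. The sketch assumes n = 294 units and k = 18 predictors, the counts given in the text.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for a linear model with n observations and k predictors
    (intercept not counted in k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


ols_adjusted = adjusted_r2(r2=0.448, n=294, k=18)
print(round(ols_adjusted, 3))  # 0.412
```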
3.3 Results of GWR model
Because of these shortcomings of the OLS fit, a GWR model was specified and fitted (Oshan et al., 2019). Its diagnostic indicators are shown in Table 3.
Following Fotheringham et al. (1998), GWR is considered superior to OLS if its AICc is lower than that of OLS by more than 3. As shown in Table 3, the AICc values of the two models differ by 109.181, so the GWR model, which has the lower AICc, fits better. Compared with the OLS model, the R² and adjusted R² of the GWR model are substantially higher (0.741 and 0.673 versus 0.448 and 0.412), indicating that the residuals are effectively reduced and the GWR fit is preferable.
The results of the GWR model are summarized in Table 2. The estimated coefficients of the 18 factors vary across spatial units, and the regions where each factor is significant also differ markedly. The coefficient ranges are wide, and for most factors the local coefficients take both positive and negative values. By contrast, the OLS coefficients represent only the overall average level. The relationship between the cumulative number of confirmed cases and the influencing factors is therefore not constant over space but strongly non-stationary. GIS maps of the spatial variation in the influence of each significant factor are shown in Figure 3.
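The AICc rule used here reduces to a simple comparison. The helper below is a hypothetical illustration of that rule with the threshold of 3, applied to the AICc values reported in Tables 3 and 4.

```python
def compare_aicc(aicc_a, aicc_b, threshold=3.0):
    """Return (difference, verdict).

    Model a is preferred when its AICc is lower than that of model b by more
    than the threshold, and vice versa; otherwise the result is a tie.
    """
    diff = aicc_b - aicc_a
    if diff > threshold:
        verdict = "a"
    elif diff < -threshold:
        verdict = "b"
    else:
        verdict = "tie"
    return diff, verdict


# GWR (a) against OLS (b)
gwr_vs_ols, gwr_verdict = compare_aicc(aicc_a=593.542, aicc_b=702.723)
# MGWR (a) against classic GWR (b)
mgwr_vs_gwr, mgwr_verdict = compare_aicc(aicc_a=483.579, aicc_b=593.542)
print(round(gwr_vs_ols, 3), gwr_verdict)
print(round(mgwr_vs_gwr, 3), mgwr_verdict)
```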
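To make the idea of location-specific coefficients concrete, the sketch below fits a weighted least squares line at a single location, using an adaptive bisquare kernel whose bandwidth is the distance to the k-th nearest unit. It uses one covariate and synthetic data; the kernel choice, the function names and the data are assumptions for illustration, not the study's configuration.

```python
import math


def bisquare_weights(dists, k):
    """Adaptive bisquare kernel: the bandwidth is the k-th smallest distance
    (the regression point itself counts as the first)."""
    h = sorted(dists)[k - 1]
    return [(1 - (d / h) ** 2) ** 2 if d < h else 0.0 for d in dists]


def local_fit(coords, x, y, i, k):
    """Weighted least squares intercept and slope at location i."""
    dists = [math.dist(coords[i], c) for c in coords]
    w = bisquare_weights(dists, k)
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Synthetic data: the slope is 1 in the west (first five units) and 3 in the east
coords = [(float(i), 0.0) for i in range(10)]
x = [float(i % 3) for i in range(10)]
y = [(1.0 if i < 5 else 3.0) * x[i] for i in range(10)]

_, slope_west = local_fit(coords, x, y, i=0, k=4)
_, slope_east = local_fit(coords, x, y, i=9, k=4)
print(slope_west, slope_east)
```

A single global regression on these data would return one compromise slope, whereas the local fits recover the two regimes; this is the sense in which OLS coefficients represent only an average level.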
3.4 Results of MGWR model
The GWR fit was superior to the OLS fit. However, classic GWR applies a single bandwidth to all variables (173 in this case), so it cannot capture relationships that operate at different spatial scales and does not make optimal use of the data. The MGWR model was therefore used (Oshan et al., 2019). Table 4 shows that the R² of MGWR is higher than that of classic GWR and its AICc is lower, so MGWR fits better. MGWR also has a smaller number of valid (effective) parameters and a much smaller residual sum of squares, indicating that it achieves a closer fit to the observed values with fewer parameters. The MGWR model is therefore preferred over the classic GWR model in this case.
Table 4 Indexes of classic GWR and MGWR models
| Indicators of model | MGWR | Classic GWR |
|---|---|---|
| R² | 0.770 | 0.673 |
| AICc | 483.579 | 593.542 |
| Number of valid parameters v1 | 56.585 | 60.409 |
| Residual sum of squares | 54.669 | 76.266 |
The coefficient statistics of MGWR are summarized in Table 2. MGWR directly reflects the different scales at which individual variables act, whereas classic GWR reflects only an average scale across all variables. The bandwidth of classic GWR is 173, or 59% of the total sample. The MGWR bandwidths show that the scales of action of different variables vary greatly. In the MGWR results, the regression coefficients of eight terms are significant: Intercept, Pop, GDP, NumBed, NumBus, NumPassenger, Out and In; the others are not. The Intercept captures the influence of location on the cumulative number of confirmed cases when the other independent variables are held fixed. GIS maps of the spatial variation in the influence of each significant factor are shown in Figure 4.
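One way to read the bandwidths is as a share of the sample, as the text does for classic GWR. The sketch below applies the same calculation to the MGWR bandwidths of the eight significant terms, transcribed from Table 2 and assuming a sample of 294 units.

```python
n_units = 294  # number of units reported with the OLS diagnostics

# MGWR bandwidths of the eight significant terms, transcribed from Table 2
mgwr_bandwidth = {
    "Intercept": 43, "Pop": 293, "GDP": 161, "NumBed": 293,
    "NumBus": 293, "NumPassenger": 63, "Out": 291, "In": 43,
}

gwr_share = 173 / n_units  # single bandwidth of classic GWR
shares = {name: bw / n_units for name, bw in mgwr_bandwidth.items()}

# List the terms from the most local to the most global scale
by_scale = sorted(shares, key=shares.get)
for name in by_scale:
    print(f"{name:13s} {shares[name]:6.1%}")
print(f"{'classic GWR':13s} {gwr_share:6.1%}")
```

Under this reading, Intercept, In and NumPassenger act at a local scale, GDP at an intermediate scale, and the remaining terms at a scale close to the whole sample.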
3.5 Summary of regression coefficients of significant factors of each model
The regression coefficients of the significant influencing factors of the OLS model, GWR model and MGWR model were summarized, and the results are shown in Table 5.
Table 5 Summary of regression coefficients of significant factors
| Categories | Variable | OLS | GWR | MGWR | Whether basically consistent |
|---|---|---|---|---|---|
| Constant | Intercept | + | strong+, weak- | equal strength | × |
| Economics | Pop | | strong-, weak+ | strong- | √ |
| | GovBudget | | strong- | | |
| | GDP | + | strong+ | strong+ | √ |
| | NumUnemploy | | strong+ | | |
| Education | NumCollege | | | | |
| | NumVocational | | strong+, weak- | | |
| | PopCollege | | strong+ | | |
| | PopVocational | | strong- | | |
| Medical | NumHos | - | strong- | | √ |
| | NumBed | | strong+ | strong+ | √ |
| | NumDoc | + | strong+ | | √ |
| Insure | NumMeInsure | | strong+ | | |
| | NumUnInsure | | strong+ | | |
| Traffic | NumBus | | | strong+ | |
| | NumPassenger | | strong-, weak+ | strong- | √ |
| | NumTaxi | - | strong-, weak+ | | |
| | Out | + | strong+ | strong+ | √ |
| | In | - | strong- | strong- | √ |
Notes: ① In the third column, "+" means that the factor is positively significant at the 95% confidence level in the OLS model, "-" means negatively significant, and blank means not significant.
② Columns 4 and 5 refer to the GWR model and the MGWR model, respectively, with significance judged at the 95% confidence level. "strong+, weak-" means that both positive and negative significance occur but positive significance dominates; "strong-, weak+" means that both occur but negative significance dominates; "equal strength" means that the difference between positive and negative is small; "strong+" means that most are positively significant; "strong-" means that most are negatively significant; blank means not significant.
③ In the sixth column ("Whether basically consistent"), "√" indicates that the regression coefficients of two or three models are close; "×" indicates that they are not close; blank means that the factor is not significant at the 95% confidence level in any of the three models, or is significant in only one model.
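The labels in columns 4 and 5 could be derived from counts of significantly positive and significantly negative local coefficients. The following sketch shows one possible rule; the function, its reading of "strong+" as positive significance only, and the `balance_tol` cutoff are all hypothetical, since the notes do not give numeric thresholds.

```python
def significance_label(n_pos, n_neg, balance_tol=0.2):
    """Map counts of significant local coefficients to a Table 5 style label.

    n_pos, n_neg : number of units with significantly positive / negative
                   local coefficients
    balance_tol  : hypothetical cutoff; if the two counts differ by no more
                   than this share of their total, the label is
                   "equal strength"
    """
    total = n_pos + n_neg
    if total == 0:
        return ""  # not significant anywhere
    if n_neg == 0:
        return "strong+"
    if n_pos == 0:
        return "strong-"
    if abs(n_pos - n_neg) / total <= balance_tol:
        return "equal strength"
    return "strong+, weak-" if n_pos > n_neg else "strong-, weak+"


print(significance_label(120, 0))
print(significance_label(80, 10))
print(significance_label(50, 45))
```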