The Pattern of COVID-19 Transmission In China Based On The Data From 294 Prefecture-Level Cities

This paper analyzes the impact of 18 factors on the spread of COVID-19 in five aspects: population and economy, education, medical care, insurance, and migration. The research is based on the cumulative number of confirmed cases in 294 prefecture-level cities in China and on geographically weighted regression models. The main conclusions are as follows: (1) Population and economic factors are positively correlated with the epidemic overall, but the opposite holds in the economically developed southeast region, possibly because the outbreak coincided with the Spring Festival, when a large number of people flowed out of the southeast and economic activity decreased, resulting in a small number of confirmed cases; (2) Educational factors have some impact on the spread of the epidemic, but the characteristics of the impact are uncertain; (3) Medical and insurance factors have a positive relation with the epidemic overall, contrary to common sense and to the vast majority of studies. This may be explained by the fact that at the initial stage of the outbreak, a large number of patients were admitted to regular wards without adequate protection, resulting in an increase in the number of infectious patients in hospitals; (4) Migration factors have a negative relation with the overall suppression of the epidemic, indicating that migration should be strictly controlled during an outbreak.


Introduction
Coronavirus disease 2019 (COVID-19) is caused by the SARS-CoV-2 virus, and the rapid spread of the disease has made it a global health issue. On December 31, 2019, the Chinese government first reported the outbreak of COVID-19 in Wuhan. Within a few weeks, the virus had spread rapidly throughout China. To bring the epidemic under control, the Chinese authorities took drastic and effective measures and achieved remarkable results. On March 19, 2020, the number of new domestic cases was "zero" for the first time. By March 19, 2020, 81,252 cases of COVID-19 had been officially confirmed in mainland China, including 3,252 deaths. At the same time, the epidemic is still spreading in many other countries and regions, and the situation of epidemic prevention and control remains severe. Therefore, an in-depth analysis of the spread and control of COVID-19 in China is of great significance for international epidemic prevention and control.
At the early stage of the outbreak, most research on COVID-19 was medical. Chan (Chan et al., 2020) observed infected patients and confirmed the presence of polymerase and surface spike protein in the COVID-19 virus, indicating that this is a new type of coronavirus close to SARS. Guan summarized the clinical characteristics of patients by observing 1,099 confirmed patients during the first two months of the outbreak. Zhou used logistic regression to show that older patients and patients with higher sequential organ failure assessment scores have a higher risk of death. Ai (Ai et al., n.d.) discussed the diagnostic value of chest CT, using RT-PCR as the reference standard, and confirmed that chest CT is highly sensitive for the diagnosis of COVID-19.
COVID-19 is a new, highly contagious infectious disease. At the early stage of the outbreak, there were very few studies on its transmission patterns, so studies on the transmission of other infectious diseases serve as the main reference. The spread of an infectious disease requires three links at the same time: a source of infection, a route of transmission, and a susceptible population. Therefore, effective prevention of the spread of an epidemic lies in controlling the source of infection and cutting off the route of transmission. In terms of sources of infection, large-scale population movements are an important factor leading to the rapid spread of an epidemic. In 2005, Woolhouse (Woolhouse and Gowtage-Sequeria, 2005) pointed out that repeated outbreaks of infectious diseases were closely related to the surge of tourist traffic. Population mobility was often due to economic trade, so frequent economic trade was considered to be an important driving force for the spread of infectious diseases (Weiss and McMichael, 2004). Cutting off the route of transmission is an effective means to control the spread of an epidemic, and it is crucial to improve residents' awareness of infectious disease prevention and control. Factors such as income and expenditure and education level significantly affect this awareness. Therefore, the level of education was closely related to the spread of the epidemic (Zhong et al., 2020). Whether residents are insured after the outbreak of an epidemic directly affects their income and expenditure, and the spread of infectious diseases was closely related to insurance, especially medical insurance (Ridde et al., 2020). Areas with better public medical and health security can detect an epidemic in time and take measures to stop its spread.
The vast majority of studies showed that the medical and health system has an important inhibitory effect on the spread of epidemics (Dalziel et al., 2018). As the epidemic prevention and control situation has developed, research on the transmission pattern of COVID-19 has steadily increased. By studying 425 confirmed cases in Wuhan, Li pointed out that the average incubation period of COVID-19 is 5.2 days and that the epidemic doubled in size every 7.4 days at the early stage. Wang pointed out that environmental conditions such as humidity and temperature would affect the spread of COVID-19. Research by Anderson (Anderson et al., 2020) showed that isolating infected people could curb the epidemic. Bruinen (Bruinen de Bruin et al., 2020) analyzed and categorized the measures taken by countries around the world to confront the COVID-19 epidemic.
Evidence has shown that the spread of the epidemic has obvious spatial characteristics. The epidemic in China spread outward from Wuhan, Hubei Province, but there are few specific studies of this pattern. Understanding the spatial characteristics of the spread of the COVID-19 epidemic is essential for predicting local outbreaks and formulating public health policies at an early stage. Up to now, research on the spatial spread of COVID-19 in China has been very limited. This paper uses multi-scale geographically weighted regression to study the spatial characteristics of the spread of COVID-19, shedding light on how to prevent and control it.

Data
(1) Epidemic data. Data on the cumulative number of confirmed COVID-19 cases in March were collected by prefecture-level city and date. Before March, China's epidemic prevention and control period had been relatively short, and the cumulative number of confirmed cases was small; after March, too many factors intervened and affected China's epidemic prevention and control. In addition, on March 19, China reported zero new domestic cases for the first time, and the focus of epidemic prevention and control shifted to preventing imported cases from abroad.
(2) Related factor data. The selection of factors follows the study on the spatial transmission of SARS (Fang et al., 2009) and the factors known to drive infectious disease transmission. To study the impact of the source of infection, variables at the population and economic level and at the migration and traffic level are included. To study the impact of transmission routes, variables at the higher education level, medical level and insurance level are included. The description of variables is shown in Table 1.

Table 1 Description of variables
Except for the emigration scale index and immigration scale index, which come from Baidu Migration, the variable data were derived from the China Urban Statistical Yearbook and the China Statistical Yearbook for 2019. To eliminate the impact of magnitude and unit, the dependent variable (the cumulative number of confirmed cases) was log-transformed, with prefecture-level cities that had 0 confirmed cases assigned a value of 1, and the 18 independent variables were standardized.
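The preprocessing described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the text does not specify the logarithm base or the standardization method, so the natural logarithm and z-scores (population standard deviation) are assumed here, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_cases(cases):
    """Log-transform cumulative case counts.

    Cities with 0 confirmed cases are set to 1 first, so their log value is 0.
    Assumption: natural logarithm (the base is not stated in the text).
    """
    return [math.log(c if c > 0 else 1) for c in cases]


def standardize(values):
    """Z-score standardization of one independent variable.

    Assumption: (x - mean) / population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sigma for v in values]


# Hypothetical values for three cities, for illustration only.
y = log_cases([0, 1, 10])
x = standardize([1.0, 2.0, 3.0])
```

After this step the dependent variable is on a log scale and every independent variable has mean 0 and unit variance, so regression coefficients can be compared by absolute value.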

Data analysis
Ordinary Least Squares (OLS)
The OLS (ordinary least squares) model is a multivariate linear function used to explain the relationship between the dependent variable and the independent variables. In this paper, the OLS model is used to analyze the linear relationship between the cumulative number of confirmed cases and the various influencing factors (Mollalo et al., 2020). The calculation formula of OLS is expressed as formula (1):
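A standard way to write formula (1) is given here, with notation that is assumed rather than taken from the paper: $y_i$ is the log-transformed cumulative number of confirmed cases in city $i$, $x_{ik}$ is the $k$-th standardized independent variable, $p$ is the number of independent variables (18 in this study), $\beta_0$ is the intercept, $\beta_k$ are the regression coefficients, and $\varepsilon_i$ is the error term.

```latex
y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i \tag{1}
```

In this form each coefficient $\beta_k$ is a single global value, the same for every city.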

Geographically Weighted Regression (GWR)
A general linear regression model summarizes relationships that hold over the whole study area, but the actual relationships vary with geography. Therefore, a geographically weighted regression model should be used to eliminate the estimation errors caused by spatial heterogeneity (Kang et al., 2020). The calculation formula of geographically weighted regression is expressed as formula (2):
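A standard way to write formula (2) is given here, again with assumed notation: $(u_i, v_i)$ are the spatial coordinates of city $i$, $y_i$ is the dependent variable, $x_{ik}$ is the $k$-th independent variable, $p$ is the number of independent variables, and $\varepsilon_i$ is the error term. The only change from a global linear model is that the intercept and each coefficient become functions of location.

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{2}
```

Each local coefficient $\beta_k(u_i, v_i)$ is estimated from nearby observations, weighted by their distance to city $i$ within a single bandwidth shared by all variables.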

Multi-scale Geographically Weighted Regression (MGWR)
The classic GWR uses a single fixed bandwidth for all variables, so the data cannot be used optimally. The multi-scale geographically weighted regression model overcomes this drawback (Oshan et al., 2019). The calculation formula of multi-scale geographically weighted regression is expressed as formula (3):
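A common way to write formula (3) is given here, with assumed notation: $(u_i, v_i)$ are the spatial coordinates of city $i$, $y_i$ is the dependent variable, $x_{ik}$ is the $k$-th independent variable, $p$ is the number of independent variables, $\varepsilon_i$ is the error term, and the subscript $bwk$ indicates that the $k$-th coefficient surface is calibrated with its own bandwidth.

```latex
y_i = \beta_{bw0}(u_i, v_i) + \sum_{k=1}^{p} \beta_{bwk}(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{3}
```

Allowing a separate bandwidth per variable lets some factors act at a near-global scale while others vary locally.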

Model and software
In this study, ArcGIS was used to visualize the cumulative number of confirmed cases and to draw the LISA (Local Indicator of Spatial Association) map of the cumulative number in prefecture-level cities. GeoDa was used to fit the OLS model and to calculate the global autocorrelation index, Moran's I, of the OLS model residuals. Finally, MGWR 2.2 was used to set up and analyze the GWR and MGWR models, and ArcGIS was used for visual analysis of the model results.
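To make the residual check concrete, the following is a minimal sketch of global Moran's I in plain Python. It is not the GeoDa implementation: the function name, the binary adjacency weights and the sample values are all hypothetical, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix.

    values  : one number per location (for example, OLS residuals)
    weights : weights[i][j] > 0 when locations i and j are neighbours,
              with zeros on the diagonal (hypothetical binary adjacency here)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)    # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    total = sum(d * d for d in z)
    return (n / s0) * (cross / total)


# Four hypothetical locations on a line; each is adjacent to the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)          # similar neighbours
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)   # dissimilar neighbours
```

A positive value indicates that neighbouring locations have similar values; residuals with a clearly positive Moran's I suggest that a global model has left spatial structure unexplained.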

Spatial analysis of the cumulative number of confirmed cases
The cumulative number of confirmed cases in each prefecture-level city was log-transformed and mapped, as shown in Figure 1. Different colors indicate the cumulative number of confirmed cases in different prefecture-level cities: the darker the color, the larger the number. The cumulative number of confirmed cases in prefecture-level cities across the country shows obvious clustering. Specifically, the levels vary greatly between cities and are polarized. The closer a city is to Wuhan, the greater its cumulative number of confirmed cases, and the farther away a city is from Wuhan, the lower its cumulative number. Most of the confirmed cases are concentrated in Central China, South China and East China.
Next, local Moran's I (LISA) was used to explore the spatial autocorrelation pattern of the cumulative number of confirmed cases, as shown in Figure 2. The LISA map shows an obvious positive spatial autocorrelation: Central China and East China are high-value clusters, while Northwest and Southwest China are low-value clusters.
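The idea behind local Moran's I can be sketched as follows. This is an illustrative sketch only, not the ArcGIS procedure: it assumes row-standardized weights built from a hypothetical neighbour list, it omits the permutation test that a real LISA map uses to decide which clusters are significant, and all names and values are made up.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I and cluster type for each location.

    values    : one number per location
    neighbors : neighbors[i] is the list of indices adjacent to location i
                (every location is assumed to have at least one neighbour)
    Returns a list of (local_i, label) where label is HH, LL, HL or LH:
    the first letter is the location itself, the second its neighbourhood.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    results = []
    for i in range(n):
        # Spatial lag: average deviation of the neighbours (row-standardized).
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        local_i = z[i] * lag / m2
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, label))
    return results


# Six hypothetical cities on a line: three high values, then three low values.
cities = [10.0, 9.0, 8.0, 1.0, 2.0, 1.0]
line = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
lisa = local_morans_i(cities, line)
labels = [label for _, label in lisa]
```

Locations labelled HH or LL have a positive local statistic and correspond to high and low agglomeration areas respectively; HL and LH mark locations that differ from their surroundings.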

Results of OLS model
To check the overall validity of each factor, the OLS model was fitted first, and the results are shown in Table 2. Table 2 shows that, at the 95% confidence level, 6 independent variables have a significant impact on the cumulative number of confirmed cases. Ranked by the absolute values of their coefficients, the descending order of importance is In, Out, NumDoc, NumTaxi, NumHos, and GDP. The coefficients of GDP, NumDoc, and Out are positive, while the coefficients of NumHos, NumTaxi, and In are negative.

Results of GWR model
Since the OLS fitting results had drawbacks, the GWR model was set up and used for analysis (Oshan et al., 2019). The diagnostic indicators of the GWR model are shown in Table 3.
Following Fotheringham (Fotheringham et al., 1998), if the difference in AICc between the GWR fitting results and the OLS fitting results is greater than 3, the GWR is considered superior to the OLS. The results of the GWR model are shown in Table 2.
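The AICc rule can be written as a one-line check. The sketch below assumes that "difference" means the OLS AICc minus the GWR AICc (a lower AICc being better); the function name and the numbers are hypothetical and are not results from this study.

```python
def gwr_preferred(aicc_ols, aicc_gwr, threshold=3.0):
    """Return True when GWR's AICc is lower than OLS's by more than `threshold`.

    Assumption: the comparison is aicc_ols - aicc_gwr > threshold, so a GWR fit
    that is worse (higher AICc) than OLS is never preferred.
    """
    return (aicc_ols - aicc_gwr) > threshold


# Hypothetical AICc values, for illustration only.
clear_improvement = gwr_preferred(500.0, 490.0)    # gap of 10
marginal_difference = gwr_preferred(500.0, 498.0)  # gap of 2
```

The threshold guards against preferring the more complex local model when the improvement in fit is too small to be meaningful.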

Results of MGWR model
The fitting result obtained from the GWR model was superior to that from the OLS model. However, since the GWR uses a fixed bandwidth, calculated here to be 173, the data could not be optimally utilized. The MGWR model was therefore used for analysis (Oshan et al., 2019). Table 4 shows that the goodness of fit R² of MGWR is higher than that of the classic GWR, and its AICc is lower. The result of MGWR is therefore better than that of the classic GWR. The MGWR also has a smaller effective number of parameters and a much smaller residual sum of squares, indicating that it uses fewer parameters to obtain a regression result closer to the true value. The statistical description of each MGWR coefficient is shown in Table 2.
Table 5
Notes: In the third column, "+" means that the regression factor is positively significant at the 95% confidence level in the OLS model, "-" means negatively significant, and blank means not significant.
Columns 4 and 5 describe the regression factors in the GWR model and the MGWR model respectively. "Strong +, Weak -" means that, at the 95% confidence level, both positive and negative significance exist, but positive significance dominates; "Strong -, Weak +" means that negative significance dominates; "equal strength" means that there is little difference between positive and negative; "Strong +" means that most are positively significant; "Strong -" means that most are negatively significant; and blank means not significant.
In the sixth column ("Whether basically consistent"), "√" indicates that the regression coefficients of two or three models are close; "×" indicates that the regression coefficients of two or three models are not close; and blank means that the factor is not significant at the 95% confidence level in any of the three models, or is significant in only one model.
March 2020, a critical period for the prevention and control of the epidemic in China. The number of doctors was severely insufficient, and large numbers of doctors from across the country rushed to help Hubei Province. Where doctors were in severe shortage, a larger number of doctors could detect more cases, and because medical conditions in the cities of western China were relatively backward, the role of doctors became more obvious there.

The impact of insurance factors on the cumulative number of confirmed cases
Insurance factors are positively related to the overall spread of the epidemic. The NumMeInsure played a significant role in promoting the spread of the epidemic in the central and southeastern regions, and the correlation effect gradually decreased from west to east. This result is contrary to common knowledge. It can be explained by the fact that, in order to get the epidemic under control quickly, China adopted a policy of free medical care on top of medical insurance reimbursement, with the remaining part subsidized by the national finance. This practice encouraged more people to seek medical examination on their own initiative, which increased the number of confirmed cases for a certain period of time but controlled the subsequent spread of the epidemic. The NumUnInsure played a significant role in promoting the spread of the epidemic in East China, and the correlation effect gradually decreased from south to north. Comparing the distribution of the number of people participating in unemployment insurance shows that, in regions with a large number of people covered by unemployment insurance, the cumulative number of confirmed cases rose slowly. This might be explained by the fact that unemployment insurance reduced unnecessary economic activity and mobility among the unemployed, but was not enough to fully cover living expenses, so the cumulative number of confirmed cases would still increase.

The impact of migration and traffic factors on the cumulative number of confirmed cases
Migration and traffic factors have a negative impact on the overall suppression of the epidemic. The NumBus played a significant role in promoting the spread of the epidemic in Xinjiang, but only three prefecture-level cities were marked, and they showed no regular pattern. The NumPassenger had a significant impact on the spread of the epidemic in North, Central and South China, but the characteristics of the impact were relatively uncertain. In North and Central China, where the epidemic first broke out, the higher the NumPassenger, the greater the cumulative number of confirmed cases. In southern China, the number of confirmed cases was reduced because the Chinese government locked down cities and banned bus operations after the outbreak. The NumTaxi showed a significant restraining effect on the epidemic in North, Central and East China. Except in Xinjiang, the higher the NumTaxi, the lower the cumulative number of confirmed cases. This might be explained by the fact that taxis carry fewer people than public transport, which lowers the cumulative number of confirmed cases.
The Out showed a significant promoting effect on the spread of the epidemic. The higher the emigration scale index, the higher the urban mobility, and the spread of the epidemic intensified with population movement. The positive correlation increased from coastal cities to central and western cities, because most of the labor force in China migrates from central and western cities to coastal cities. The In showed a significant inhibitory effect on the spread of the epidemic. Cities with a high immigration scale index were generally metropolises or tourist cities, and they took relatively strict prevention measures at the initial stage of the epidemic. The GWR model shows that the negative correlation strengthened from coastal cities to central and western cities. This might be explained by the fact that coastal cities are larger and more populated than central and western cities, so it was more difficult for them to prevent and control the spread of the disease, although their impact coefficient was still negative. The MGWR model shows that the area with a weak negative correlation extended north from Beijing to the northeast, Inner Mongolia and Xinjiang, seemingly foreshadowing the further outbreaks in Heilongjiang, Inner Mongolia, Xinjiang and other regions in May. It is worth noting that the emigration scale index and the immigration scale index were significant in many areas, indicating that emigration and immigration are crucial to the spread of the epidemic. To better prevent and control the epidemic, tight control should be imposed on emigration and immigration.
Figure 1 The cumulative number of confirmed cases in prefecture-level cities
Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.
Figure 2 LISA map of the cumulative number of confirmed cases in prefecture-level cities