Determinants on COVID-19 Case Fatality Rates of Cities in China: A Logit-NB Hurdle Model Analysis

Background: The ongoing Coronavirus Disease 2019 (COVID-19) pandemic, marked by high infectiousness and high mortality, has seriously threatened human health and life and caused enormous economic losses. This study investigates the factors influencing the case fatality rate (CFR) of COVID-19 at the city level in China. Methods: A logistic-negative binomial (Logit-NB) hurdle model is employed to examine the determinants of the probability of death and of the CFR value, based on COVID-19 confirmed cases and deaths by 13 March 2020 and 25 January 2021 at the city level in China, together with related environmental, demographic, and socioeconomic data. Results: We found that the odds of death from COVID-19 increase by 1% with each additional confirmed case and by 4% with a 1-unit rise in the air quality index. CFR increases feebly with the number of confirmed cases, with an estimator of 2.81E-05. As the number of doctors increases by 10,000, CFR decreases by 0.18%. Each 1% increase in humidity leads to a 0.02% decrease in CFR, and each 1-unit increase in population density causes a 0.09% decline in CFR. The comparison between the two research periods confirms the robustness of the results. Conclusions: The number of confirmed cases and air quality are closely associated with the probability of death, while the number of confirmed cases, medical resources, humidity, and population density significantly affect the CFR. Furthermore, air quality and population density stand out in the first wave of the outbreak but become non-significant in the second wave.

The fatality rate of COVID-19 is affected by multiple factors, including air pollution, climatic conditions, demographic characteristics, socioeconomic factors, and control measures. Many scholars have focused on the close association between air pollution and COVID-19 cases and mortality rates [4][5][6][7][8][9][10]. As an essential environmental factor, climatic conditions, mainly measured by temperature and air humidity, also influence COVID-19 death rates [11][12][13][14]. Demographic characteristics have remarkable effects on the mortality of patients with COVID-19: age is the dominant factor, and gender, race, ethnicity, medical history (such as comorbidity and obesity), and neighborhood characteristics also play a significant role in determining the CFR [15][16][17][18]. Socioeconomic factors, including income, unemployment, inequality, poverty, total population, population density, human mobility, and medical resources, exert specific impacts on the spread of COVID-19 [17,19,20,21]. Finally, government actions (such as containment measures, travel restrictions, and social distancing policies) have proved effective in mitigating the spread of the disease and reducing confirmed cases and deaths [22][23][24].
Scholars have employed traditional statistical methods (such as multivariate and panel regression) to reveal the effects of demographic and clinical characteristics on the mortality of patients with COVID-19 [25][26][27][28]. At the regional level, scholars have used GIS-based spatial analysis methods and spatial regression models to evaluate the impacts of environmental conditions, socioeconomic factors, and demographic features (age, sex, racial, and ethnic structure) on the spatial distribution of COVID-19 cases and deaths, using country-, state-, county-, or city-level data [16,17,20,22,29,30].
The CFR of COVID-19 in China (4.79% on 25 January 2021) is more than double the worldwide CFR (2.15% on the same date). One reason why China's CFR is so high may be that little was known about the virus early in the epidemic. Studies on the determinants of COVID-19 mortality in China have focused mostly on the individual level, from the patients' perspective, in Wuhan, Hubei Province [28,31,32,33].
There are relatively few studies at the city level in China, and they mainly concern the impacts of air pollution, climatic factors, and medical resources [34][35][36]. Analyzing the influencing factors at the city level is important, because Chinese city governments have played an important role in taking timely measures to mitigate the spread of the epidemic. Besides environmental and medical factors, demographic characteristics and socioeconomic factors also need to be considered. Because China had successfully controlled the spread of the virus by 13 March 2020, the CFRs of COVID-19 in many cities outside Hubei Province are zero, so zero-inflated models fit the data better [37,38]. This study employs a Logit-NB hurdle model to examine the determinants of COVID-19 CFRs of cities in China and provides evidence for responding to future public health crises. Specifically, we address the following questions: (1) Do the probability of death from COVID-19 and the CFR value belong to two different processes?
(2) If they are different processes, what are the respective determinants of each?

Data Collection
Data on cumulative confirmed COVID-19 cases and deaths by 13 March 2020 and by 25 January 2021 are collected from the China National Health Commission (CNHC, http://www.nhc.gov.cn) and the provincial Health Commissions. The dataset covers 280 prefecture-level cities with publicly available data online. The dependent variable in this study, the case fatality rate of COVID-19 (CFR-spring and CFR-2021), is measured as the number of deaths per 100 confirmed COVID-19 cases by 13 March 2020 and by 25 January 2021, respectively. The cumulative CFRs on these two dates represent the first and second waves of COVID-19 spread. In the first wave of this massive outbreak, Chinese governments had no experience in dealing with the epidemic, and it took 2 months to control the spread of the virus; in the second wave, Chinese governments had accumulated sufficient experience in prevention and control measures, which contributed to the rapid containment of sporadic outbreaks. Comparing the confirmed cases and CFR between these 2 research periods can better reveal the factors influencing CFR over the whole course of the COVID-19 response, as well as the effect of accumulated disease-control experience. While an epidemic is still ongoing, the current CFR does not reflect the real situation, because some infected people may die later. By 13 March 2020, however, the first wave of the epidemic had been curbed in China, and the numbers of confirmed cases and deaths were no longer growing considerably. Between 13 March 2020 and 25 January 2021, confirmed COVID-19 cases grew very slowly, and the CFR changed little. Therefore, the CFR is a reasonable indicator of the state of the epidemic. The CFR by 13 March 2020 varies significantly from city to city: 65 cities have a non-zero CFR and 215 have a zero CFR. By 25 January 2021, the number of cities with a non-zero CFR had increased to 68.
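The CFR definition above (deaths per 100 confirmed cases, with cities then split into zero and non-zero CFR) can be sketched as follows. The city names and counts are made up for illustration; the `CityCounts` record and the helper functions are hypothetical, not the authors' code.

```python
from typing import NamedTuple


class CityCounts(NamedTuple):
    """Cumulative counts for one city at a cut-off date (hypothetical record)."""
    city: str
    cases: int
    deaths: int


def cfr_percent(cases: int, deaths: int) -> float:
    """CFR as the number of deaths per 100 confirmed cases."""
    if cases <= 0:
        raise ValueError("CFR is undefined for a city with no confirmed cases")
    return 100.0 * deaths / cases


def summarize(cities: list[CityCounts]) -> tuple[dict[str, float], int, int]:
    """Return per-city CFRs and the numbers of zero-CFR and non-zero-CFR cities."""
    cfrs = {c.city: cfr_percent(c.cases, c.deaths) for c in cities}
    n_zero = sum(1 for value in cfrs.values() if value == 0.0)
    return cfrs, n_zero, len(cfrs) - n_zero


# Made-up example data, NOT the study's records.
sample = [
    CityCounts("city_A", cases=400, deaths=16),
    CityCounts("city_B", cases=50, deaths=0),
    CityCounts("city_C", cases=8, deaths=1),
]
cfrs, n_zero, n_nonzero = summarize(sample)
print(cfrs, n_zero, n_nonzero)
```

The hypothetical `city_C` shows how a single death among only 8 cases already yields a CFR of 12.5%, which is why small-caseload cities can show extreme values.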
The spatial distribution of COVID-19 CFRs is shown in Fig. 1. Fig. 1a and Fig. 1b show that the distributions at the two dates are similar, and CFR exhibits significant spatial autocorrelation. The highest CFR values are concentrated in the cities of Hubei Province, ranging from 2% to 7% on 13 March 2020. The spatial distribution of CFR in cities outside Hubei Province is relatively random and fluctuates considerably, from 0 to 15%, because the small numbers of deaths make statistical inference more uncertain. Confirmed case counts in those cities are relatively small (sometimes in single digits), which can easily produce extremely high CFR values (the red spots in Fig. 1). COVID-19 cases versus deaths in Hubei Province and in other provinces are shown in Fig. 2, where the slope represents the average CFR. The average CFR in Hubei Province increased from 4.9% to 8.0%, while in other provinces it decreased from 0.83% to 0.4%. Cities in Hubei Province had many more cases and a much higher CFR than other cities. In Hubei Province there is a significant linear relationship between the numbers of confirmed cases and deaths, with a high R² (Fig. 2a), indicating that the CFR is consistent across Hubei cities. The scatterplot of cities outside Hubei Province shows a roughly linear relationship but with considerable scatter in CFR, because the number of deaths in these cities ranges only from 0 to 6.
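Reading the slope in Fig. 2 as an average CFR can be illustrated with a short sketch. It assumes the fitted line passes through the origin (deaths = b × cases); the text does not say whether the figure's fit includes an intercept. The data points below are invented.

```python
def pooled_cfr(cases: list[float], deaths: list[float]) -> float:
    """Total deaths divided by total confirmed cases across cities."""
    return sum(deaths) / sum(cases)


def slope_through_origin(cases: list[float], deaths: list[float]) -> float:
    """Least-squares slope b of the line deaths = b * cases (no intercept)."""
    sxy = sum(x * y for x, y in zip(cases, deaths))
    sxx = sum(x * x for x in cases)
    return sxy / sxx


def r_squared(cases: list[float], deaths: list[float], slope: float) -> float:
    """Centered R^2 = 1 - SSR / SST for the fitted line deaths = slope * cases."""
    mean_d = sum(deaths) / len(deaths)
    ssr = sum((y - slope * x) ** 2 for x, y in zip(cases, deaths))
    sst = sum((y - mean_d) ** 2 for y in deaths)
    return 1.0 - ssr / sst


# Invented city-level counts, NOT the study's data.
cases = [100.0, 200.0, 1000.0, 3000.0]
deaths = [4.0, 11.0, 48.0, 152.0]

pooled = pooled_cfr(cases, deaths)
slope = slope_through_origin(cases, deaths)
r2 = r_squared(cases, deaths, slope)
print(f"pooled CFR={pooled:.4f}, slope={slope:.4f}, R^2={r2:.4f}")
```

The through-origin slope weights each city by the square of its case count, so it is dominated by the largest cities, much like a pooled CFR is. This helps explain why the Hubei slope is stable while cities with 0 to 6 deaths scatter widely around it.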

Statistical analysis
Because the number of deaths was zero in 215 of the 280 Chinese cities by 13 March 2020, regressions of the CFR face an obvious zero-inflation problem. However, existing studies on the CFR have often ignored this problem, which can lead to biased estimates [37,38]. There are two reasons for the zero inflation: (1) The epidemic in China was under control. By 13 March 2020, 84% of the cases had occurred in Hubei Province, and the average numbers of confirmed cases per city inside and outside Hubei Province were 5916.6 and 46.3, respectively. The slope in Fig. 2b, which represents the CFR outside Hubei Province, is 0.83%. Hence, the average number of deaths per city outside Hubei was 46.3 × 0.83% ≈ 0.38. With fewer than one death per city on average, most cities recorded no deaths.
(2) Some cities lacked the medical capacity to treat critically ill patients, and many such patients were transferred to surrounding cities with better medical resources.
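The arithmetic behind reason (1) can be checked directly. As an extra illustration that is not part of the study, the sketch also assumes that deaths in every city outside Hubei follow a Poisson distribution with that same mean; case counts are in fact highly skewed across cities, so this is only a rough indication of how often zero deaths arise.

```python
import math

mean_cases_outside_hubei = 46.3  # average confirmed cases per city outside Hubei
cfr_outside_hubei = 0.0083       # 0.83%, the slope in Fig. 2b

# Expected deaths per city = average cases x CFR.
expected_deaths = mean_cases_outside_hubei * cfr_outside_hubei

# Illustrative assumption only: deaths ~ Poisson(expected_deaths) in every city,
# so P(no deaths) = exp(-mean).
p_zero_deaths = math.exp(-expected_deaths)

print(f"expected deaths per city: {expected_deaths:.2f}")
print(f"P(zero deaths) under the Poisson assumption: {p_zero_deaths:.2f}")
```

Under this simplified assumption, roughly two thirds of such cities would record no deaths at all, so a large share of zero CFRs arises even without any difference in underlying risk.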
We employ a hurdle model to deal with the zero-inflation problem. A hurdle model is a two-part model that specifies one process for zero counts and another for positive counts. The first part is a binary logistic model, which estimates the probability that a city has a non-zero CFR and identifies the predictors of that probability. The second part is a truncated negative binomial regression model, which identifies the predictors of the non-zero CFR values. A negative binomial specification is preferred to a Poisson one because the CFR data may be overdispersed (variance exceeding the mean), which a Poisson model cannot accommodate. In the resulting Logit-NB hurdle model, Y_1 indicates whether there is a death case in a city (1 if yes, 0 otherwise); Y_2 denotes the CFR; y_i is the dependent variable of city i; x_i is the vector of explanatory variables of city i; and α and β are parameter vectors. The medical, environmental, demographic, and socioeconomic factors considered as explanatory variables are listed in Table 1.
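For reference, the two parts can be written in a standard Logit-NB hurdle form using the symbols defined above. This is a reconstruction under the usual assumptions (an NB2 negative binomial with mean μ_i and dispersion θ); θ, μ_i, π_i, and f_NB are our notation, and the study's own equation may be parameterized differently.

```latex
\begin{aligned}
\pi_i &= \Pr(Y_{1i} = 1 \mid x_i) = \frac{\exp(x_i'\alpha)}{1 + \exp(x_i'\alpha)}, \\
\Pr(Y_{2i} = y \mid Y_{1i} = 1, x_i) &= \frac{f_{\mathrm{NB}}(y;\, \mu_i, \theta)}{1 - f_{\mathrm{NB}}(0;\, \mu_i, \theta)}, \quad y > 0, \qquad \mu_i = \exp(x_i'\beta), \\
\Pr(y_i = 0 \mid x_i) &= 1 - \pi_i, \qquad
\Pr(y_i = y \mid x_i) = \pi_i \, \frac{f_{\mathrm{NB}}(y;\, \mu_i, \theta)}{1 - f_{\mathrm{NB}}(0;\, \mu_i, \theta)}, \quad y > 0, \\
f_{\mathrm{NB}}(y;\, \mu, \theta) &= \frac{\Gamma(y + \theta)}{\Gamma(\theta)\, \Gamma(y + 1)}
\left(\frac{\theta}{\theta + \mu}\right)^{\theta} \left(\frac{\mu}{\theta + \mu}\right)^{y}.
\end{aligned}
```

Because the zero/non-zero decision (α) and the size of positive values (β) have separate parameters, the two parts can be driven by different explanatory variables.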
Based on existing literature, the factors influencing the CFR of COVID-19 comprise medical factors, environmental factors, demographic characteristics, and socioeconomic factors. The medical factors include the number of confirmed COVID-19 cases and the number of doctors. The former is gathered from the National Health Commission and the provincial Health Commissions and is closely associated with the CFR, as shown in Fig. 2. The latter is a good proxy for healthcare capacity (medical resource availability and accessibility), which helps explain differences in mortality rates across regions [39]. The environmental factors comprise air quality (or pollution) and climatic conditions: air quality is represented by the air quality index (AQI), and climatic conditions are measured by humidity and temperature. Demographic characteristics, which are directly connected to COVID-19 mortality [18], comprise age, ethnic, and gender structure, the proportion of the population with non-agricultural hukou, and population density. Socioeconomic factors include GDP per capita, the percentage of unemployment, and insurance coverage, which reflect people's socioeconomic status and thus influence their health outcomes [20]. Regarding specific data sources, the number of doctors, population density, GDP per capita, the percentage of unemployment, and the percentage of employees enrolled in the urban basic medical insurance system are collected from the China City Statistical Yearbook 2019, which records the most recent available city data (for 2018). Daily AQI observations are acquired from the Ministry of Ecology and Environment of the People's Republic of China, and the average daily value for each city during each of the 2 research periods (from 1 January 2020 to 13 March 2020, and from 1 January 2020 to 1 January 2021) is calculated. Similarly, the average humidity (%) and average temperature (°C) during each research period are obtained from the China Meteorological Administration.
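Averaging daily AQI over each research period can be sketched as follows. The readings are invented. Treating both period boundaries as inclusive and skipping missing days are assumptions, since the text does not say how either is handled; `period_mean` is a hypothetical helper.

```python
from datetime import date
from statistics import mean
from typing import Optional


def period_mean(daily: list[tuple[date, Optional[float]]], start: date, end: date) -> float:
    """Mean of the daily values observed from start to end (both inclusive).

    Days with a missing reading (None) are skipped, which is an assumption.
    """
    values = [v for day, v in daily if start <= day <= end and v is not None]
    if not values:
        raise ValueError("no observations in the requested period")
    return mean(values)


# Hypothetical daily AQI readings for one city (not real data).
daily_aqi = [
    (date(2020, 1, 1), 80),
    (date(2020, 2, 1), 60),
    (date(2020, 2, 2), None),  # missing reading
    (date(2020, 3, 13), 40),
    (date(2020, 6, 1), 100),
]

# First research period: 1 January 2020 to 13 March 2020.
aqi_spring = period_mean(daily_aqi, date(2020, 1, 1), date(2020, 3, 13))
# Second research period: 1 January 2020 to 1 January 2021.
aqi_2020 = period_mean(daily_aqi, date(2020, 1, 1), date(2021, 1, 1))
print(aqi_spring, aqi_2020)
```

The same helper applies unchanged to daily humidity and temperature series.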
The average age of residents, the proportion of ethnic minorities, the percentage of males, and the proportion of the population with non-agricultural hukou are derived from the Sixth National Population Census of China. This census records demographic data for 2010 and is the latest publicly available census, because China conducts a national population census every ten years. All data supporting the findings of this study are secondary data, and no human participants are involved.

Results
To test for multicollinearity, we calculated the variance inflation factor (VIF) and found that each variable's VIF is less than 3, indicating no serious multicollinearity in our model. A series of control variables have been incorporated into the model, and the dependent variable is lagged relative to the independent and control variables, which reduces possible endogeneity to some extent. Descriptive statistics for all variables are shown in Table 2. In both parts of the Logit-NB hurdle model, we estimated separate models with the CFR by 13 March 2020 and the CFR by 25 January 2021 as dependent variables. The Cases, AQI, Humidity, and Temperature variables differ between the 2 research periods, while the others are constant. The first part of the hurdle model is a binary logistic model, which explains whether a city has a COVID-19 death; its regression results are displayed in Table 3. Model 1 uses the CFR by 13 March 2020 (CFR-spring), and Model 2 uses the CFR by 25 January 2021 (CFR-2021). We used the average AQI, humidity, and temperature from 1 January 2020 to 13 March 2020 for Model 1 (AQI-spring, Humidity-spring, and Temperature-spring), and the yearly averages in 2020 for Model 2 (AQI-2020, Humidity-2020, and Temperature-2020). The R² of Model 1 is 0.32, indicating that the binary logistic model has some explanatory power for whether a city has a death case. Among the explanatory variables, the coefficients of the number of confirmed cases (Cases-spring) and the air quality index (AQI-spring) are significant, while the other variables are not. The estimator of Cases-spring is 1.01, meaning that each additional confirmed case multiplies the odds of a death by 1.01.
That is, the odds of at least one death increase by about 1% with each additional confirmed case. This suggests that the mortality rate for each new case in a city is about 1%, close to the CFR of 0.83% in Fig. 2b. The air quality index has a positive effect on the occurrence of death: a 1-unit rise in the air quality index raises the odds of death by 4% relative to the original odds.
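Because the logistic part reports odds ratios (the text reads 1.01 as "the new odds will be 1.01 times the original odds"), the implied change in the probability of a death depends on the baseline probability. The sketch below converts an odds ratio into a probability change. The baseline probabilities are hypothetical and `prob_after_change` is an illustrative helper, not part of the study.

```python
import math


def prob_after_change(p0: float, odds_ratio: float, delta: float = 1.0) -> float:
    """Probability after a predictor rises by `delta` units in a logistic model.

    `odds_ratio` is exp(coefficient) for a 1-unit increase, so the baseline
    odds are multiplied by odds_ratio ** delta.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    odds0 = p0 / (1.0 - p0)
    odds1 = odds0 * odds_ratio ** delta
    return odds1 / (1.0 + odds1)


coef_aqi = math.log(1.04)  # logit coefficient implied by an odds ratio of 1.04
p_base = 0.20              # hypothetical baseline probability of at least one death

p_plus1 = prob_after_change(p_base, math.exp(coef_aqi))  # AQI + 1 unit
p_plus10 = prob_after_change(p_base, 1.04, delta=10)     # AQI + 10 units
p_cases = prob_after_change(0.5, 1.01)                   # one more case at a 50% baseline
print(f"{p_plus1:.4f} {p_plus10:.4f} {p_cases:.4f}")
```

A 4% rise in the odds is not a 4-percentage-point rise in probability. At a 20% baseline, one extra AQI unit moves the probability to about 20.6%.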
This suggests that worsening air quality (air pollution) increases mortality risk, consistent with existing research [40,41]. In the logistic part of the hurdle model, the results of Model 2 are very similar to those of Model 1, which supports the robustness of our results; the only difference between the 2 models is that air quality is not significant in Model 2. In the truncated negative binomial part of the hurdle model, the estimator of Cases-spring is 0.00003, signifying that the CFR increases only feebly with the number of confirmed cases. As the number of doctors increases by 10,000, the CFR decreases by 0.18%, demonstrating that better medical resources reduce mortality risk. Humidity-spring has a negative effect on death rates: each 1% increase in Humidity-spring leads to a 0.02% decrease in CFR-spring. Therefore, all other factors being equal, CFR-spring will be relatively lower in the more humid southeastern region. Population density (Density) also has a negative effect on CFR-spring. This may be because larger cities with higher population density usually have better healthcare systems [42], making it easier for infected people to obtain timely detection and treatment. Model 2 is similar to Model 1, except that Density is significant in Model 1 but non-significant in Model 2, which confirms the robustness of the results. The absolute values of the estimators of Cases, Doctors, and Humidity in Model 2 are all larger than those in Model 1, suggesting that accumulated experience in responding to the public health crisis has helped reduce CFR.
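The multicollinearity check reported at the start of the Results (every VIF below 3) can be reproduced with the usual definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing explanatory variable j on all the others plus an intercept. The pure-Python sketch below uses invented columns, not the study's data. The helper names are hypothetical, and in practice a statistics library would normally be used instead.

```python
def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: perfectly collinear regressors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def r_squared(y: list[float], regressors: list[list[float]]) -> float:
    """R^2 of an OLS regression of y on an intercept plus the given columns."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in regressors]
    k = len(cols)
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    y_bar = sum(y) / n
    sse = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    sst = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - sse / sst


def vif(columns: list[list[float]]) -> list[float]:
    """VIF of each column, regressing it on all the other columns."""
    out = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        r2 = r_squared(col, others)
        out.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return out


# Invented columns: uncorrelated pair vs. strongly correlated pair.
print(vif([[1, -1, 1, -1], [1, 1, -1, -1]]))
print(vif([[1, 2, 3, 4], [2, 4, 6, 9]]))
```

Uncorrelated columns give a VIF of 1, whereas nearly proportional columns give VIFs far above common rule-of-thumb cut-offs such as 5 or 10.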

Discussion
Given the enormous damage that the spread of COVID-19 has caused to human society, robust scientific evidence, especially evidence drawn from China's successful disease prevention and control experience, will contribute significantly to epidemic responses. It is therefore crucial to clarify the factors that significantly affect the CFR of COVID-19 through a multi-city study in China.
In this study, a Logit-NB hurdle model is employed to deal with the zero-inflation problem, since nearly three quarters of cities (215 of 280) have a zero CFR; this reduces estimation bias and improves the explanatory power and goodness of fit of the model. The Logit-NB hurdle model also reflects 2 distinct processes that determine the CFR: whether there is a COVID-19 death in a city, and how high the non-zero CFR in a city is. The influencing factors differ between these 2 processes. Applying the Logit-NB hurdle model in COVID-19 CFR research can provide methodological guidance for epidemic response.
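The two processes can be made concrete with a small sketch of how a fitted hurdle model combines them. The logistic part gives π = P(y > 0). The zero-truncated negative binomial part (NB2 form with mean μ and dispersion θ) gives the distribution of positive values. The unconditional mean is then π · μ / (1 − f_NB(0)). All parameter values below are hypothetical. The sketch treats the outcome as a count, and how the study handles the non-integer CFR inside the NB part is not stated.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: maps a linear predictor x'alpha to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """NB2 probability mass at integer y >= 0 with mean mu > 0 and dispersion theta > 0."""
    log_p = (
        math.lgamma(y + theta)
        - math.lgamma(theta)
        - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def hurdle_pmf(y: int, pi: float, mu: float, theta: float) -> float:
    """Hurdle probability: 1 - pi at zero, zero-truncated NB scaled by pi above zero."""
    if y == 0:
        return 1.0 - pi
    return pi * nb_pmf(y, mu, theta) / (1.0 - nb_pmf(0, mu, theta))


def hurdle_mean(pi: float, mu: float, theta: float) -> float:
    """E[y] = P(y > 0) * E[y | y > 0] = pi * mu / (1 - f_NB(0))."""
    return pi * mu / (1.0 - nb_pmf(0, mu, theta))


# Hypothetical linear predictors for one city (not estimates from the study).
pi = logistic(math.log(1 / 3))  # first process: P(at least one death) = 0.25
mu = math.exp(math.log(3.0))    # second process: untruncated NB mean exp(x'beta) = 3
theta = 1.5                     # NB dispersion parameter

print(f"P(y > 0) = {pi:.3f}, E[y] = {hurdle_mean(pi, mu, theta):.3f}")
```

Changing the logistic predictor moves only the probability of crossing the hurdle. Changing the NB predictor moves only the size of positive outcomes. This separation is what allows different variables to matter in each part.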
Regarding the determinants, the number of confirmed cases is the only variable that is significant in both parts of the Logit-NB hurdle model, which is unsurprising since it is the denominator of the CFR. In the first part of the model, air quality affects the probability of death, while medical resources, humidity, and population density matter in the second part. Air quality is widely considered to play a vital role in the spread of COVID-19 because aerosols are a potential transmission route, reflected in the level of airborne PM pollution [34]. Air quality affected the probability of death during 2020, probably through its direct effects on the number of confirmed cases. Medical resources and humidity are both crucial factors in determining CFR across cities. The timely supply of medical resources (including medical staff and facilities) was key to effectively controlling the 2 waves of outbreaks in China. As an important climatic factor, humidity has a significant negative influence on CFR, consistent with previous findings [40]. However, socioeconomic factors and demographic characteristics, except for population density, do not significantly affect the probability of death or the CFR. Free medical treatment for COVID-19, available to everyone in China regardless of city tier or population group, likely contributes significantly to reducing CFR.
Air quality and population density have significant impacts on mortality risk and CFR in the first wave of the outbreak but are non-significant in the second wave. The underlying reason is complicated, and we explored it using the newly confirmed cases and deaths between 13 March 2020 and 25 January 2021. We found that the new deaths are concentrated in Hubei Province, with very few in other provinces (only single digits). Hubei Province recorded 360 newly confirmed cases from 13 March 2020 to 25 January 2021 but 1435 new deaths during the same period. This indicates that most of the new deaths recorded by 25 January 2021 resulted from cases already confirmed by 13 March 2020; in other words, COVID-19 deaths in the second research period lag behind the confirmed cases. Critical patients in the first period were mainly affected by this acute respiratory disease and were more sensitive to air quality, so worsening air quality directly increased their mortality risk. In contrast, critical patients in the second period had been suffering from COVID-19 for some time and may have died from comorbidities that are less influenced by air quality. Regarding population density, in the initial stage of the outbreak, when experience in responding to the epidemic was lacking, cities with higher population density were usually larger cities where people had better access to medical resources, leading to a relatively lower CFR. By the second period, cities in China had accumulated enough experience and were better prepared with the necessary medical resources, so population density no longer mattered.
The main findings of this study have several policy implications. First, given the importance of the number of confirmed cases in determining CFR, the proposal to "flatten the curve" [43] through lockdown and social distancing measures remains vital for saving lives and decreasing CFR in the countries most affected by COVID-19, where medical resources are limited. Second, considering the 2 different processes in the Logit-NB hurdle model, governments should attend to air quality when seeking to prevent deaths from the disease, and emphasize the availability and accessibility of medical resources when aiming to reduce the mortality rate. Finally, the estimators in Model 2 show larger effects than those in Model 1, indicating the influence of accumulated experience. China's containment policies, including immediate lockdown, community containment, self-quarantine, contact tracing, and free testing and medical treatment for all residents, have proved very effective in controlling the spread of the virus and reducing the CFR [44], and can provide valuable references for other countries in the fight against the pandemic.
Despite its methodological contributions and practical implications, this study has some limitations. The dependent variable reflects the confirmed cases and CFR on 13 March 2020 and 25 January 2021, while some explanatory variables reflect only annual averages owing to limited data accessibility. Moreover, the city-level socioeconomic data from the China City Statistical Yearbook 2019 describe the situation in 2018, and the demographic data from the Sixth National Population Census describe conditions in 2010; these are the most recent data publicly available in China.

Conclusion
Despite these limitations, this study shows that whether a death occurs and how high the CFR is are 2 different processes. The determinants of the probability of death are the number of confirmed cases and air quality, while the determinants of the CFR include the number of confirmed cases, medical resources, humidity, and population density. Besides, air quality and population density stand out in the first wave of the outbreak but become non-significant in the second wave.