In the present study, we identified five features that may affect the rebound and severity of the COVID-19 epidemic and constructed models to predict the severity of COVID-19 rebounds in China.
In China, COVID-19 rebounds have caused tremendous hardship and challenges. As of February 3, 2021, 33.1 million COVID-19 vaccine doses had been distributed in mainland China (13). Nevertheless, COVID-19 rebounds are still occurring, and each rebound differs in the number of infected people and in its duration. Several studies have analyzed the associations between COVID-19 outcomes and socioeconomic and environmental factors (14, 15). Based on this empirical evidence, we hypothesized that socioeconomic and environmental factors may contribute to the regional heterogeneity of COVID-19 rebounds.
First, we used dimensionality reduction and clustering methods to analyze external factors that may cause COVID-19 rebounds in China. In the principal component analysis (PCA), a larger eigenvector value (loading) indicates a stronger correlation between the corresponding factor and COVID-19 rebound. The PCA results showed that the number of neighbouring countries and the total number of COVID-19 cases in neighbouring countries were strongly associated with COVID-19 rebound. Additionally, cities with above-average case numbers were concentrated among tier 2 and tier 3 cities. Notably, the differences between tier 2 or tier 3 cities and tier 1 cities are reflected in internal factors such as medical level, demographic characteristics, and meteorological characteristics. Second, we explored the correlations among factors and found strong correlations among the internal factors, as well as between transportation-related factors and internal factors. We therefore grouped these factors into nine categories (“air”, “cargo”, “serve”, “foreign”, “unemployed”, “older”, “urbanratio”, “sexratio”, “aquatic”) to characterize each city from different aspects.
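The loading interpretation above can be illustrated with a minimal NumPy sketch. It assumes a city-by-factor matrix and runs PCA on the correlation matrix (columns z-scored first, so that counts, ratios, and meteorological measurements are comparable). The function name `pca_loadings` and the toy data are hypothetical and do not reproduce the study's inputs or results; the point is only that factors which move together share large-magnitude loadings on a leading component.

```python
import numpy as np


def pca_loadings(X):
    """Return (explained_variance_ratio, loadings) for the columns of X.

    X has shape (n_cities, n_factors). Columns are z-scored so that PCA
    operates on the correlation matrix rather than the raw covariance.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)           # equals the correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort components by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # share of total variance per component
    return ratio, eigvecs                    # column k = loadings of component k


# Hypothetical toy data: factors a and b are strongly correlated, c is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)
c = rng.normal(size=200)
ratio, loadings = pca_loadings(np.column_stack([a, b, c]))
print(np.round(ratio, 3))
print(np.round(np.abs(loadings[:, 0]), 2))  # a and b dominate the first component
```

Working on the correlation matrix rather than the raw covariance keeps a factor with large numeric values (for example, a total case count) from dominating the first component purely because of its units.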
To measure the severity of a COVID-19 epidemic, we should consider not only the absolute number of infected people but also the density of the infected population and the duration of the epidemic. We therefore chose incidence density as the response variable and selected five optimal features (“urbanratio”, “unemploy”, “older”, “serve”, and “air”) based on their importance scores in the Random Forest (RF) and XGBoost models.
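A minimal Python sketch of the response variable and the feature-selection step follows. The text does not give the exact formula, so incidence density is assumed here to be cases per 100,000 person-days (cases divided by population times epidemic duration). The rule that rescales each model's importance scores to sum to 1 and averages them across models is likewise an assumption, as are the function names and the placeholder feature names and scores, which are not the study's results.

```python
from typing import Dict, List


def incidence_density(cases: int, population: float, duration_days: float,
                      per: float = 100_000) -> float:
    """Cases per `per` person-days (assumed definition, not taken from the study)."""
    if population <= 0 or duration_days <= 0:
        raise ValueError("population and duration_days must be positive")
    return cases / (population * duration_days) * per


def select_top_features(importances: List[Dict[str, float]], k: int = 5) -> List[str]:
    """Average per-model importances (each rescaled to sum to 1) and keep the top k.

    The equal-weight averaging rule is a hypothetical choice for illustration.
    """
    combined: Dict[str, float] = {}
    for scores in importances:
        total = sum(scores.values())
        for name, value in scores.items():
            combined[name] = combined.get(name, 0.0) + value / total / len(importances)
    return sorted(combined, key=combined.get, reverse=True)[:k]


# Hypothetical example: 50 cases in a city of 1,000,000 over a 20-day rebound.
print(incidence_density(50, 1_000_000, 20))  # 0.25 cases per 100,000 person-days

# Placeholder importance scores from two models (not the study's values).
rf_scores = {"feat_a": 0.5, "feat_b": 0.3, "feat_c": 0.2}
xgb_scores = {"feat_a": 0.2, "feat_b": 0.2, "feat_c": 0.6}
print(select_top_features([rf_scores, xgb_scores], k=2))  # ['feat_c', 'feat_a']
```

Rescaling before averaging matters because Random Forest and XGBoost report importances on different scales; without it, one model's scores could dominate the combined ranking.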
We defined “air” as an index reflecting comprehensive meteorological characteristics. Associations between meteorological factors and COVID-19 have been discussed in previous studies (16–18). Chien et al. found that temperature, minimum relative humidity, and precipitation capture the impact of meteorological factors on COVID-19, and that increasing temperature and precipitation may reduce the risk of COVID-19 (16). Raza et al. also reported that precipitation and humidity may contribute to a reduction in COVID-19 cases, whereas temperature may be associated with an increase in cases (18). In this study, we found that meteorological factors can serve as predictors of the incidence density of COVID-19 rebounds.
The synthetic feature “serve” is defined as an indicator of the overall development level of a city, including its economic, medical, and service levels. This feature had the highest importance score in both the Random Forest and XGBoost models, indicating that “serve” contributed most to predicting the incidence density of COVID-19. In general, because of large population flows, prosperous cities are more likely to have imported cases, which can lead to COVID-19 rebounds. However, prosperous cities also tend to have higher medical levels, more medical resources, and better prevention and control management, which may help them avoid a serious COVID-19 epidemic.
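The text does not specify how sub-indicators are combined into a synthetic feature such as “serve”. One simple option, shown here purely as an illustration and not as the authors' construction, is an equal-weight average of z-scored indicators. The function name `composite_index` and the indicator names and values below are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def composite_index(indicators: Dict[str, List[float]]) -> List[float]:
    """Equal-weight average of z-scored indicators, one value per city.

    Each indicator list holds one value per city, in the same city order.
    """
    z_columns = []
    for values in indicators.values():
        mu, sd = mean(values), stdev(values)
        z_columns.append([(v - mu) / sd for v in values])  # standardize each indicator
    return [mean(city) for city in zip(*z_columns)]         # average across indicators


# Hypothetical sub-indicators for three cities (illustrative values only).
toy = {
    "gdp_per_capita": [1, 2, 3],
    "hospital_beds_per_1000": [2, 4, 6],
    "service_sector_share": [10, 20, 30],
}
print(composite_index(toy))  # [-1.0, 0.0, 1.0]
```

Standardizing first puts indicators measured in different units on a common scale; weights derived from PCA loadings would be a natural alternative to equal weights.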
Additionally, the features “sexratio” and “urbanratio” reflect the demographic characteristics of cities. Understanding sex differences in COVID-19 severity requires recognition of both biological and social factors. Sex hormones play an important role in immune regulation, and females can mount stronger inflammatory and humoral immune responses to viral infections (19–23), which may contribute to sex differences in the severity of COVID-19. In addition, gender differences in smoking rates have been suggested to contribute to differences in susceptibility to the virus and in disease progression (24). Moreover, population mobility and agglomeration are often greater in cities and towns than in rural areas. Thus, a higher urbanization level may be associated with a higher epidemic risk and faster virus transmission.
Using public data, we demonstrated that prediction models based on machine learning methods can provide insights for planning health care resources to address the burden of the COVID-19 pandemic. We constructed three concise models based on the five optimal features. The RF model had the largest AUC and the traditional logistic regression model the smallest, indicating that, in our study, the RF model was the best for predicting the severity of COVID-19 rebounds. In China, sporadic SARS-CoV-2-positive cases are still emerging across the country. Our study may provide novel insight into an urgent problem: how to balance the effectiveness of control measures against their negative impacts.
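For reference, the AUC used to compare the models equals the probability that a randomly chosen severe rebound receives a higher predicted score than a randomly chosen non-severe one, with ties counted as one half. The minimal sketch below assumes a binary severity label (1 = severe); the labels and the two sets of predicted probabilities are hypothetical and are not outputs of the study's models.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities from two models.
y = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.2, 0.3, 0.6, 0.9]
print(roc_auc(y, model_a))  # 0.75
print(roc_auc(y, model_b))  # 1.0
```

The pairwise loop is quadratic in sample size but easy to verify by hand; for larger datasets, scikit-learn's `roc_auc_score` computes the same quantity.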