Modelling and forecasting the spread tendency of the COVID-19 in China

To forecast the spread tendency of COVID-19 in China and provide effective strategies to prevent the disease, an improved SEIR model was established. The model parameters were estimated from data issued by the National Health Commission of China (NHCC) from January 10 to March 3, and the model was then used to forecast the spread tendency of the disease. The key factors influencing the epidemic were explored by varying the parameters, including the removal rate, the average number of the infected contacting the susceptible per day and the average number of the exposed contacting the susceptible per day. From January 10 to February 15, the correlation between the model output and the NHCC data is 99.9% for the infected and 99.8% for the removed. From February 16 to March 3, the average forecasting error rates of the infected and the removed are 0.78% and 0.75%, respectively. The peak time of the epidemic forecast by our model coincided with that in the NHCC data. Therefore, the established mathematical model has high accuracy. The aforementioned parameters significantly affected the trend of the epidemic, suggesting that the exposed and the infected populations should be strictly isolated. If the removal rate increases to 0.12, the epidemic will come to an end on May 25. In conclusion, the proposed mathematical model accurately forecasts the spread tendency of COVID-19 in China, and it can be applied to other countries with appropriate modifications.


Background
, and the elderly and those with chronic diseases are most susceptible to severe forms of COVID-19 [3,4]. Multiple strategies have been developed to fight the spread of COVID-19, including strict isolation, early diagnosis and supportive treatment. Currently, there is no specific medicine against the novel coronavirus. Therefore, forecasting the spread tendency of this acute infectious disease from epidemiological data is a crucial step. Mathematical modeling is one of the most effective methods for the epidemiological study of such acute infectious diseases and for forecasting their outbreaks, and it can yield valuable insights into how future control efforts may be improved.
Since the outbreak, several groups have established mathematical models to forecast the spread of COVID-19 [5][6][7][8][9][10][11][12][13][14]. Wu et al. [9] estimated the basic reproduction number ($R_0$) of the novel coronavirus pneumonia to be 2.68 using a susceptible-exposed-infected-removed (SEIR) model, and forecast that the number of infected people in Wuhan would be 75,815 on January 25, 2020. They also forecast the number of infected people imported from Wuhan into Chongqing, Beijing, Shanghai, Guangzhou and Shenzhen. However, their estimate differs greatly from the number of infections issued by NHCC (1870 cases). Zhou et al. [10] used an SEIR compartment model to characterize the early spread of COVID-19 and estimated $R_0$ to be between 2.2 and 3.0. However, their model does not consider that the susceptible can be infected by patients during the incubation period, a transmission route that cannot be ignored.
Besides, Tang et al. [5] established a more complex model and estimated that the control reproduction number ($R_c$) may be as high as 6.47 (95% CI 5.71-7.23). With their estimated parameter values, the number of the infected would reach its peak around March 10, 2020, and the peak number of the infected is . However, according to NHCC, the number of the infected peaked at 58,016 on February 17.
Therefore, a more accurate mathematical model is needed to forecast the spread tendency of COVID-19.
In this study, we re-established an SEIR model of COVID-19 based on its transmission mechanism. The aim was to obtain a more accurate mathematical model to forecast the spread tendency of the epidemic and to provide guidance for controlling the spread of COVID-19.

Theoretical mathematical model
The most classical model for studying infectious diseases is the SIR compartment model, proposed by Kermack and McKendrick in 1927 in their study of plague [15]. The SIR model divides the population into three categories: Susceptible, the uninfected persons who lack immunity; Infective, the persons who are capable of spreading the disease to susceptible persons; and Removed, the persons who have died or recovered [16]. The basic model is as follows:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta S I,\\
\frac{dI}{dt} &= \beta S I - \gamma I,\\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\tag{1}
$$

where $S$, $I$ and $R$ are the susceptible, the infected and the removed, respectively, $\beta$ is the infection rate and $\gamma$ is the removal rate.
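As a minimal sketch of the SIR system (1), the Python code below integrates the equations with SciPy's `solve_ivp`. It assumes the unnormalized infection term $\beta S I$, so $\beta$ has to be scaled to the population size; the population, rates and helper names (`sir_rhs`, `simulate_sir`) are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp


def sir_rhs(t, y, beta, gamma):
    """Right-hand side of the SIR system; infection term assumed to be beta*S*I."""
    S, I, R = y
    new_infections = beta * S * I
    return [-new_infections, new_infections - gamma * I, gamma * I]


def simulate_sir(S_init, I_init, R_init, beta, gamma, days):
    """Integrate the SIR system and report the state once per day."""
    t_eval = np.arange(days + 1)
    sol = solve_ivp(sir_rhs, (0, days), [S_init, I_init, R_init],
                    args=(beta, gamma), t_eval=t_eval, rtol=1e-8, atol=1e-8)
    return sol.t, sol.y


# Hypothetical example: 1,000 people, one initial case, beta * N / gamma = 3.
t, (S, I, R) = simulate_sir(999, 1, 0, beta=0.3 / 1000, gamma=0.1, days=160)
print(f"peak infected about {I.max():.0f} on day {t[I.argmax()]:.0f}")
```

Because every person leaves one compartment only by entering another, $S + I + R$ stays constant, which is a convenient sanity check on any implementation.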
Gradually, researchers realized that the incubation period, i.e., the time between exposure and the onset of symptoms, should be taken into consideration [17][18][19].
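Therefore, the SIR model was extended to the classic SEIR model by adding an exposed compartment (symbols as defined for model (3) below):

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{r\beta I S}{N},\\
\frac{dE}{dt} &= \frac{r\beta I S}{N} - \alpha E,\\
\frac{dI}{dt} &= \alpha E - \gamma I,\\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
\tag{2}
$$

Based on the classic SEIR model, this study divided the population into four groups: susceptible ($S$), exposed ($E$), infected ($I$) and removed ($R$). Because the susceptible can be infected both by the exposed and by confirmed COVID-19 patients, we proposed the transmission mode shown in Figure 1.
Figure 1: The transmission mechanism of COVID-19.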
Because the period considered is short, births and natural deaths in the population were ignored. Because patients can infect the susceptible during the incubation period, we established the following mathematical model:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{r\beta I S}{N} - \frac{r_2\beta_2 E S}{N},\\
\frac{dE}{dt} &= \frac{r\beta I S}{N} + \frac{r_2\beta_2 E S}{N} - \alpha E,\\
\frac{dI}{dt} &= \alpha E - \gamma I,\\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\tag{3}
$$

where $S$, $E$, $I$ and $R$ are the susceptible, the exposed, the infected and the removed, respectively. $r$ represents the average number of the infected contacting the susceptible per day, and $\beta$ represents the probability of infection by the infected.
$r_2$ represents the average number of the exposed contacting the susceptible per day, and $\beta_2$ represents the probability of infection by the exposed. $\alpha$ is the probability of the exposed becoming infected. $\gamma$ is the removal rate, which includes the cure rate and the death rate. $N$ is the total population and $N = S + E + I + R$.
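A minimal Python sketch of the right-hand side of model (3) follows, integrated with SciPy's `solve_ivp`. It assumes the infection terms have the form $(r\beta I + r_2\beta_2 E)S/N$; the helper names and every numerical value in the example are hypothetical placeholders chosen only so the code runs, not the parameters estimated in this study.

```python
import numpy as np
from scipy.integrate import solve_ivp


def seir_rhs(t, y, r, beta, r2, beta2, alpha, gamma):
    """Model (3): the susceptible are infected by both the infected and the exposed."""
    S, E, I, R = y
    N = S + E + I + R  # no births or natural deaths, so N stays constant
    # New infections per day, from contacts with the infected and the exposed.
    infections = (r * beta * I + r2 * beta2 * E) * S / N
    return [
        -infections,               # dS/dt
        infections - alpha * E,    # dE/dt
        alpha * E - gamma * I,     # dI/dt
        gamma * I,                 # dR/dt
    ]


def simulate(y0, params, days):
    """Integrate model (3); params = (r, beta, r2, beta2, alpha, gamma)."""
    t_eval = np.arange(days + 1)
    sol = solve_ivp(seir_rhs, (0, days), y0, args=params,
                    t_eval=t_eval, rtol=1e-8, atol=1e-6)
    return sol.t, sol.y


# Hypothetical example values (not fitted to NHCC data).
N = 1_000_000
y0 = [N - 43.0, 0.0, 41.0, 2.0]
params = (20.0, 0.01, 20.0, 0.002, 0.079, 0.05)
t, (S, E, I, R) = simulate(y0, params, days=365)
print(f"peak infected about {I.max():.0f} on day {t[I.argmax()]:.0f}")
```

Keeping the parameters in one tuple makes it easy to swap in a different set for each stage or scenario.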

Data
Next, to obtain an accurate model, we estimated the parameters from the numbers of the infected, the cured and the deaths issued by NHCC daily at 12 pm. The statistics of the data issued by NHCC from January 10, 2020 to March 3, 2020 are shown in Table 1.

Estimation of model parameters
To estimate the six parameters in equation (3) from the NHCC data, we used the fmincon and lsqnonlin functions in MATLAB R2017b. First, all parameters were constrained to be nonnegative and bounded, because each parameter has a specific physical meaning. Second, based on the real data, the fmincon function was used to estimate the approximate range of each parameter, and these estimates were taken as initial values. Finally, the lsqnonlin function was used to refine the estimates and achieve the best fit between the simulated curve and the real data curve.
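The two-stage procedure above can be sketched in Python, with SciPy's bounded `minimize` (method `L-BFGS-B`) standing in for `fmincon` and `least_squares` standing in for `lsqnonlin`. This is a swapped-in toolchain, not the authors' MATLAB code. For brevity the sketch assumes that only $\beta$ and $\gamma$ are estimated, fixes the other parameters at hypothetical values, uses a fixed-step RK4 integrator so the objective is smooth for finite-difference gradients, and fits synthetic data generated from the model itself rather than NHCC data.

```python
import numpy as np
from scipy.optimize import least_squares, minimize

# Hypothetical fixed parameters; only beta and gamma are estimated here.
FIXED = {"r": 20.0, "r2": 20.0, "beta2": 0.002, "alpha": 0.079}


def rhs(y, r, beta, r2, beta2, alpha, gamma):
    """Model (3) right-hand side for a state array y = [S, E, I, R]."""
    S, E, I, R = y
    infections = (r * beta * I + r2 * beta2 * E) * S / y.sum()
    return np.array([-infections, infections - alpha * E,
                     alpha * E - gamma * I, gamma * I])


def daily_path(params, y0, days, steps_per_day=4):
    """Fixed-step RK4 integration of model (3), sampled once per day."""
    h = 1.0 / steps_per_day
    y = np.array(y0, dtype=float)
    path = [y.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = rhs(y, *params)
            k2 = rhs(y + 0.5 * h * k1, *params)
            k3 = rhs(y + 0.5 * h * k2, *params)
            k4 = rhs(y + h * k3, *params)
            y = y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(y.copy())
    return np.array(path)


def model_IR(theta, y0, days):
    """Daily infected and removed curves for theta = (beta, gamma)."""
    beta, gamma = theta
    params = (FIXED["r"], beta, FIXED["r2"], FIXED["beta2"], FIXED["alpha"], gamma)
    path = daily_path(params, y0, days)
    return path[:, 2], path[:, 3]


def residuals(theta, y0, I_obs, R_obs):
    """Model-minus-data residuals for I and R, each scaled by its maximum."""
    I_mod, R_mod = model_IR(theta, y0, len(I_obs) - 1)
    return np.concatenate([(I_mod - I_obs) / I_obs.max(),
                           (R_mod - R_obs) / R_obs.max()])


def fit_two_stage(y0, I_obs, R_obs, x0, bounds):
    """Stage 1: bounded coarse search; stage 2: bounded least-squares refinement."""
    args = (y0, I_obs, R_obs)
    stage1 = minimize(lambda th: np.sum(residuals(th, *args) ** 2), x0,
                      method="L-BFGS-B", bounds=bounds)
    lower, upper = np.array(bounds).T
    stage2 = least_squares(residuals, stage1.x, bounds=(lower, upper), args=args)
    return stage1.x, stage2.x


# Synthetic "observations" from known parameters (37 daily points).
N = 1_000_000
y0 = [N - 43.0, 0.0, 41.0, 2.0]
true_theta = np.array([0.01, 0.05])
I_obs, R_obs = model_IR(true_theta, y0, days=36)
coarse, refined = fit_two_stage(y0, I_obs, R_obs, x0=[0.02, 0.1],
                                bounds=[(1e-4, 0.1), (1e-4, 1.0)])
print("coarse:", coarse, "refined:", refined)
```

Scaling each residual series by its maximum keeps the infected and removed curves on comparable footing, so neither dominates the fit simply because its counts are larger.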

Average forecasting error rate
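The average forecasting error rate (AFER) is defined as

$$
\mathrm{AFER} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert x_i - \hat{x}_i\rvert}{x_i}\times 100\%,
\tag{4}
$$

where $x_i$ is the real value, $\hat{x}_i$ is the forecast value, and $n$ is the number of data points to be forecast.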
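Equation (4) can be computed directly. The sketch below assumes AFER is the mean of the absolute relative errors between real and forecast values, expressed as a percentage; the function name `afer` and the numbers in the example are made up for illustration.

```python
import numpy as np


def afer(actual, forecast):
    """Average forecasting error rate in percent: mean(|x - x_hat| / x) * 100."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if actual.shape != forecast.shape or actual.size == 0:
        raise ValueError("actual and forecast must be non-empty and the same shape")
    if np.any(actual == 0):
        raise ValueError("relative error is undefined when an actual value is 0")
    return 100.0 * np.mean(np.abs(actual - forecast) / np.abs(actual))


# Made-up example: both forecasts are off by 1% of the real value.
print(afer([100, 200], [101, 198]))
```

Because each error is divided by its own real value, early days with small counts weigh as much as later days with large counts.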

Parameters estimation and forecast
Based on Table 1, a data-driven modeling approach was adopted [20,21]. To combat the epidemic, the Chinese government adopted a series of measures, such as building Huoshenshan Hospital, Leishenshan Hospital and Fangcang Hospital, quarantining Wuhan, home isolation, and dispatching medical teams. Accordingly, the simulation was divided into nine stages. The detailed steps are as follows.
The initial values of the exposed, the infected and the removed are $E(0) = 0$, $I(0) = 41$ and $R(0) = 2$, and the initial susceptible population is $S(0) = N - E(0) - I(0) - R(0)$, where $N$ is the total population. The estimated average number of the infected contacting the susceptible per day ($r$) is 20, and the estimated probability of infection by the infected ($\beta$) is . Therefore, is similar to that reported in the literature [22]. The average number of the exposed contacting the susceptible per day ($r_2$) is 20, and because the probability of infection by the exposed is lower than that by confirmed infected patients, we set . The probability that an exposed person becomes a confirmed infected patient ($\alpha$) is 0.079, which is similar to that reported in the literature [21]. The removal rate ($\gamma$) is 0.001, consistent with the reported literature [14]. From March 4, 2020, with the increase of the removal rate and stronger isolation, the parameter $\gamma$ will become larger, and $r$ and $r_2$ will become smaller.
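Because the interventions change the parameters over time, a staged simulation amounts to integrating model (3) with piecewise-constant parameters and passing the end state of one stage to the next. The Python sketch below shows this mechanism under the assumption that parameters switch at whole days; the helper `simulate_stages`, the stage boundaries and the parameter values are hypothetical placeholders, not the nine stages used in this study.

```python
import numpy as np
from scipy.integrate import solve_ivp


def seir_rhs(t, y, r, beta, r2, beta2, alpha, gamma):
    """Model (3) right-hand side."""
    S, E, I, R = y
    infections = (r * beta * I + r2 * beta2 * E) * S / (S + E + I + R)
    return [-infections, infections - alpha * E, alpha * E - gamma * I, gamma * I]


def simulate_stages(y0, stages, total_days):
    """Integrate model (3) with piecewise-constant parameters.

    stages: list of (start_day, params) with strictly increasing start days,
    the first starting at day 0; params = (r, beta, r2, beta2, alpha, gamma).
    Returns daily times and daily states with shape (total_days + 1, 4).
    """
    starts = [start for start, _ in stages]
    if starts[0] != 0 or any(b <= a for a, b in zip(starts, starts[1:])):
        raise ValueError("stage start days must begin at 0 and strictly increase")
    if starts[-1] >= total_days:
        raise ValueError("the last stage must start before total_days")
    ends = starts[1:] + [total_days]
    y = np.asarray(y0, dtype=float)
    times, states = [0.0], [y]
    for (start, params), end in zip(stages, ends):
        t_eval = np.arange(start + 1, end + 1)
        sol = solve_ivp(seir_rhs, (start, end), y, args=tuple(params),
                        t_eval=t_eval, rtol=1e-8, atol=1e-6)
        times.extend(sol.t)
        states.extend(sol.y.T)
        y = sol.y[:, -1]  # the state at the stage boundary seeds the next stage
    return np.array(times), np.array(states)


# Hypothetical two-stage example: contacts cut and removal sped up from day 30.
N = 1_000_000
y0 = [N - 43.0, 0.0, 41.0, 2.0]
before = (20.0, 0.01, 20.0, 0.002, 0.079, 0.05)
after = (2.0, 0.01, 2.0, 0.002, 0.079, 0.1)
t, states = simulate_stages(y0, [(0, before), (30, after)], total_days=120)
print(f"infected on day 120: {states[-1, 2]:.0f}")
```

Restarting the integrator at each boundary, rather than switching parameters inside one call, keeps the solver from stepping across the discontinuity in the right-hand side.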

Correlation and average forecasting error rate
Based on the nine stages above, January 10, 2020 was taken as the starting point, and the data from January 10 to February 15, 2020 were used as the training set for parameter estimation. The correlation between the model output and the NHCC data is 99.9% for the infected and 99.8% for the removed. The data from February 16, 2020 to March 3, 2020 were used to test the forecasts of the model.
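Assuming the reported correlation is the Pearson correlation coefficient between the model curve and the reported curve, expressed as a percentage, it can be computed with NumPy's `corrcoef`. The helper name `correlation_percent` and the series in the example are invented for illustration.

```python
import numpy as np


def correlation_percent(model, observed):
    """Pearson correlation between model output and observations, in percent."""
    model = np.asarray(model, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if model.shape != observed.shape or model.size < 2:
        raise ValueError("need two equally long series with at least 2 points")
    return 100.0 * np.corrcoef(model, observed)[0, 1]


# Invented daily counts: the model tracks the observations closely.
observed = [10, 20, 35, 50, 80]
model = [12, 19, 33, 52, 78]
print(f"{correlation_percent(model, observed):.2f}%")
```

A high correlation shows that the two curves rise and fall together; the AFER complements it by measuring how far apart their values are.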
The numbers of the infected forecast by the model and issued by NHCC are shown in Table 2. The AFER of the infected was 0.78%, and the AFER of the removed was 0.75% (see Table 2).

The dynamic trends of the susceptible, the exposed, the infected and the removed
Based on the nine stages above, the dynamic trends of the susceptible, the exposed, the infected and the removed over 54 days (from January 10, 2020 to March 3, 2020) were simulated. Figure 2 shows that the susceptible population, starting from its initial value $S(0)$, drops sharply from January 19 to February 5, 2020 and then remains relatively stable. The number of the exposed increased steadily, reached its peak on February 4 and then gradually decreased (see Figure 3). When the government implemented intervention 2, the growth rate of the infected population decreased significantly. When the diagnostic criteria changed, namely when characteristic CT imaging findings were added to the criteria for confirmed cases, the number of the infected increased suddenly. The number of the infected reached its peak on February 17 and then gradually decreased (see Figure 4). The number of the removed increased continuously, which is consistent with the data issued by NHCC.

Influence of the removal rate ($\gamma$) on the epidemic
From the 55th day (March 4, 2020), the influence of the removal rate ($\gamma$) on the infected and the removed was investigated by varying $\gamma$ while keeping the other parameters fixed.
The removal rate $\gamma$ was gradually increased from 0.02 to 0.12. As shown in Figure 6, the dynamic trends of the infected under different removal rates suggest that the larger the removal rate, the better the control effect. When $\gamma$ is small, large numbers of infected individuals remain 200 days later (July 29), whereas when $\gamma = 0.12$, there will be no infected individuals 135 days later (May 25) (see Figure 6).
Similarly, when $\gamma$ is small, the number of the removed is still increasing 200 days later (July 29), namely, many patients still need to be treated, whereas when $\gamma = 0.12$, the number of the removed becomes stable (Figure 7). This indicates that if the removal rate can be improved, the epidemic will be effectively controlled and terminated earlier. Figure 7 also shows that the larger the removal rate, the shorter the time it takes for the removed to stabilize, which corresponds to the shorter time needed to reach the stable point in Figure 6. Since the government sent a large number of medical teams to support Wuhan, the epidemic was effectively controlled and the number of the infected decreased, which is consistent with our forecasting results.
the number of the infected will increase, and the epidemic will be out of control (Figure 8). Figure 9 shows the trends of the removed for different values of $r$. If $r$ is small enough, the removed will stop increasing and remain stable after 160 days (June 18). But with weaker isolation (larger $r$), it will take a long time for the epidemic to be effectively controlled, or the epidemic may even get out of control; in the extreme case, the number of the infected increases continually.
This is because more people will be infected and will need treatment. Therefore, timely isolation of the infected and their close contacts, together with the establishment of Huoshenshan Hospital, Leishenshan Hospital and Fangcang Hospital, is very effective.
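The removal-rate experiment can be sketched as a parameter sweep: start from a fixed state, vary only $\gamma$, and record the first day on which the number of the infected falls below one person. The starting state, the other parameter values, the one-person threshold and the helper `days_until_cleared` are hypothetical choices for illustration, not the values behind Figures 6 and 7.

```python
import numpy as np
from scipy.integrate import solve_ivp


def seir_rhs(t, y, r, beta, r2, beta2, alpha, gamma):
    """Model (3) right-hand side."""
    S, E, I, R = y
    infections = (r * beta * I + r2 * beta2 * E) * S / (S + E + I + R)
    return [-infections, infections - alpha * E, alpha * E - gamma * I, gamma * I]


def days_until_cleared(y0, params, horizon=1000, threshold=1.0):
    """First whole day on which I drops below `threshold`, or None within `horizon`."""
    t_eval = np.arange(horizon + 1)
    sol = solve_ivp(seir_rhs, (0, horizon), y0, args=params,
                    t_eval=t_eval, rtol=1e-8, atol=1e-6)
    below = np.nonzero(sol.y[2] < threshold)[0]
    return int(sol.t[below[0]]) if below.size else None


# Hypothetical state at the start of the sweep, with strict-isolation contacts.
N = 1_000_000
E_start, I_start, R_start = 1.0e4, 5.0e4, 2.0e4
y0 = [N - E_start - I_start - R_start, E_start, I_start, R_start]
for gamma in (0.02, 0.05, 0.12):
    params = (0.5, 0.01, 0.5, 0.002, 0.079, gamma)
    print(f"gamma = {gamma:.2f}: infected below 1 after "
          f"{days_until_cleared(y0, params)} days")
```

Under these assumed values, a larger $\gamma$ clears the infected compartment sooner, mirroring the qualitative trend described above.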

Influence of the average number of the exposed contacting the susceptible per day ($r_2$) on the epidemic
In this section, the influence of the average number of the exposed contacting the susceptible per day ($r_2$) on the epidemic was studied by numerical simulation. When $r_2 = 0.1$ and the other parameters are fixed, the infected will disappear 160 days later (June 18). If $r_2$ increases from 0.1 to 3, the epidemic will become uncontrollable. For example, when $r_2 = 3$, the number of infections will first decrease and then increase gradually, and the epidemic will break out again (see Figure 10). Similarly, more patients will need to be treated (see Figure 11).
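One way to see why reducing $r$ and $r_2$ or raising $\gamma$ shortens the epidemic, offered here as an illustrative analysis rather than a result of this study, is the next-generation matrix of model (3) at a given susceptible fraction $S/N$. Taking $E$ and $I$ as the infected compartments, new infections enter $E$ at rate $(r_2\beta_2 E + r\beta I)S/N$, the exposed move to $I$ at rate $\alpha E$ and the infected are removed at rate $\gamma I$. The spectral radius of $FV^{-1}$ is then $\frac{S}{N}\left(\frac{r_2\beta_2}{\alpha} + \frac{r\beta}{\gamma}\right)$, and, for $S$ held near its current value, the infected population grows when this quantity exceeds 1 and declines when it is below 1. The Python sketch computes it numerically; the function name and parameter values are hypothetical.

```python
import numpy as np


def effective_reproduction_number(r, beta, r2, beta2, alpha, gamma, s_frac=1.0):
    """Spectral radius of F V^{-1} for the infected compartments (E, I) of model (3)."""
    # F: rates of new infections entering E, caused by E (column 0) and by I (column 1).
    F = s_frac * np.array([[r2 * beta2, r * beta],
                           [0.0, 0.0]])
    # V: transitions out of E (alpha*E, into I) and removal out of I (gamma*I).
    V = np.array([[alpha, 0.0],
                  [-alpha, gamma]])
    K = F @ np.linalg.inv(V)
    return float(np.max(np.abs(np.linalg.eigvals(K))))


# Hypothetical values: raising r2 raises the threshold quantity.
for r2 in (0.1, 3.0):
    value = effective_reproduction_number(0.5, 0.01, r2, 0.002, 0.079, 0.12)
    closed_form = r2 * 0.002 / 0.079 + 0.5 * 0.01 / 0.12
    print(f"r2 = {r2}: {value:.4f} (closed form {closed_form:.4f})")
```

The two terms separate the contributions of the exposed ($r_2\beta_2/\alpha$) and of the infected ($r\beta/\gamma$), which is why isolating both groups matters.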

Discussion
Due to the rapid spread of COVID-19 and the lack of a vaccine or effective treatment, the World Health Organization declared the epidemic a Public Health Emergency of International Concern. Even though the Chinese government has made massive efforts to fight COVID-19, new cases of infection still appear every day. To scientifically forecast the spread tendency of the disease, we established a mathematical model of COVID-19, and its forecasting accuracy has been confirmed by the data issued by NHCC.
In this study, we first modified a classical SEIR model and established a mathematical model according to the possible transmission mechanism of COVID-19.
Based on the official data issued by NHCC, the model parameters were estimated and the trends of the infected and the removed populations were forecast in the short term.
The average forecasting error rates of the infected and the removed were 0.78% and 0.75%, respectively. Next, a series of parameters were studied, including the removal rate ($\gamma$), the average number of the infected contacting the susceptible per day ($r$) and the average number of the exposed contacting the susceptible per day ($r_2$). Compared with previously reported models, the main advantages of our model are that the correlation between the model output and the NHCC data is as high as 99.9% for the infected and 99.8% for the removed, and the AFERs of the infected and the removed are as low as 0.78% and 0.75%, respectively. Therefore, the forecasts of our model are approximately validated by the real data issued by NHCC.
We further explored how the parameters influence the epidemic by numerical simulations. The results showed that increasing the removal rate ($\gamma$) while keeping the other parameters fixed ends the epidemic earlier. This theoretical inference has been confirmed by the increase in the number of the cured after the government dispatched medical teams to Hubei province.
Besides, if the average number of the infected contacting the susceptible per day ($r$) can be effectively reduced, the epidemic can be controlled earlier. For instance, if $r$ is reduced from 5 to 0.01, the epidemic will disappear by May 25, whereas for larger $r$ the number of the infected will keep increasing. In addition, if the average number of the exposed contacting the susceptible per day ($r_2$) is 3, the epidemic will break out again, and if $r_2$ is controlled below 0.1, the epidemic will soon terminate. These results are consistent with the observation that the number of the infected reached its peak and then decreased after the government made unprecedented efforts to deal with COVID-19, such as setting up specialized hospitals for COVID-19 patients, namely Huoshenshan Hospital, Leishenshan Hospital and Fangcang Hospital.
In conclusion, our mathematical model can provide theoretical guidance for the effective prevention and control of COVID-19 in China.
With appropriate modifications, it could be applied to other countries currently affected by the epidemic.

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
All data generated or analysed during this study are included in this published article.