The method of parameter reassessment for prediction of NCP (2019-nCoV) spread

This paper proposes a conversion rate prediction method and a parameter reassessment method based on the Logistic curve (S-curve) to predict the spread of NCP (novel coronavirus pneumonia). Using the published statistical data, we first apply the conversion rate prediction method to predict the spread of NCP; the relative error of most predictions is within 5%. By fitting the cumulative number of NCP sufferers with the Logistic curve, we propose an average estimation method for the limit number, which is used to predict the spread of NCP and the limit number of sufferers. This paper also assesses the effectiveness of prevention and control measures through dynamic estimation of the infection probability of NCP. Based on the Markov property, the parameter reassessment method proposed in this paper avoids over-fitting the theoretical curve and improves the accuracy of prediction. This research idea is suitable not only for Logistic curve regression but also for other regression prediction problems.


Introduction
In the past two decades, breakthroughs have been made in network science. In 1998, Watts and Strogatz established the Small World (SW) network model [1,2]; in 1999, Barabasi and Albert proposed the concept of the Scale Free (SF) network [2,3]. The discovery of small-world effects and scale-free properties has universal significance for complex networks and has therefore attracted widespread attention. The theory and application of complex networks have developed rapidly [4,5], and these discoveries have important implications for the understanding of disease spread [4].
On December 29, 2019, the health departments of Hubei Province and Wuhan received a report from a local hospital on clustered cases of pneumonia of unknown cause [6]. A metagenomic analysis of a patient's lung lavage fluid revealed the presence of a coronavirus, which was temporarily named 2019-nCoV. On January 8, 2020, the new coronavirus was identified as the pathogen of this outbreak. On January 15, the Chinese National Health Commission issued materials on the diagnosis, treatment and prevention of pneumonia associated with the new coronavirus infection. On January 20, 2020, the Chinese State Council agreed to bring the new coronavirus pneumonia under the management of the Infectious Diseases Law and the Sanitary Quarantine Law, and to start nationwide emergency prevention and control work [6]. On February 8, at the press conference of the Joint Prevention and Control Mechanism of the Chinese State Council, the spokesperson announced that the English name of the pneumonia caused by the novel coronavirus is "Novel coronavirus pneumonia", abbreviated "NCP".
If each person is regarded as a node, and an edge exists between two people whenever one can inhale the other's breath, the result is a small-world and scale-free network. Respiratory diseases can easily spread on such networks. NCP broke out in China during the Spring Festival, when the large flow of people in all directions turned a relatively static network into a dynamic one in which random connections are added to the original scale-free network with greater probability. Such a network has a huge number of high-degree nodes, and the disease spreads very quickly on it. Although Wuhan and other cities have suspended urban public transport since January 23, 2020, the number of NCP sufferers has still risen sharply. As of 24:00 on February 6, 2020, a total of 31,161 cases had been confirmed, 1,540 cases had been discharged, 636 people had died, and there were 26,359 suspected cases; 314,028 close contacts had been traced, of whom 186,045 were still under medical observation.
With the rapid spread of NCP, there is an urgent need to predict the speed of transmission and the limit number of people infected, and many scholars have already studied this problem. Based on 425 confirmed cases, [7] studied the characteristics of the novel coronavirus pneumonia, estimated its doubling time and basic reproduction number, and analyzed the distribution of epidemiological delay times. The paper [8] estimated the reproduction number of the novel coronavirus pneumonia based on the SEIR model and predicted the development of the epidemic. The paper [9] predicted the number of potential cases in Wuhan at the very beginning of the outbreak. The epidemiological characteristics of 2019-nCoV transmission were studied in [10, 11]. Based on the characteristics of 2019-nCoV development up to February 1, the paper [12] modified the SIR model to solve the dynamic equation of virus evolution. Using the reproduction number of susceptible individuals, the current infection rate and the latent infection rate, that paper also studied the trend in the number of infected sufferers and analyzed the influence of government administrative actions on that trend.
The purpose of this article is to establish a transmission model for the novel coronavirus, to predict the cumulative limit number of confirmed NCP sufferers, and at the same time to enrich the theoretical research on novel coronavirus pneumonia. It should be noted that the confirmed sufferers in this article are all cases that tested positive with the nucleic acid kit. This paper is organized as follows. Section 2 describes data collection and interpretation. Section 3 proposes a method for predicting the spread of NCP by defining a conversion rate from suspected to confirmed cases. Section 4 uses the Logistic curve (also known as the S-curve) to estimate the limit number of confirmed NCP sufferers. Section 5 evaluates the effectiveness of prevention and control measures through dynamic estimation of the infection probability. Section 6 proposes a secondary estimation method for the Logistic curve parameters to predict the daily trend of cumulative NCP sufferers. Section 7 concludes the paper.

Data sources and collection
We collected the numbers of close contacts, suspected cases, infected individuals, cured cases, deaths and people under medical observation published on the websites of the National Health Commission of the People's Republic of China and the Wuhan Municipal Health Commission from January 15, 2020 to February 6, 2020. They are shown in Table 1.

Conversion rate prediction method of NCP spread
The Chinese government has adopted strict prevention and control measures and publishes statistics on the numbers of suspected and confirmed patients every day. With this transparent information, we can use the rate at which suspected cases convert to confirmed cases to make predictions. Let $I$ denote the cumulative number of confirmed NCP sufferers (those who tested positive in the nucleic acid test), $I_1$ the cumulative number of infections on the current day, $I_0$ the cumulative number on the previous day, and $y$ the number of suspected sufferers on the current day. The conversion rate $\eta$ is expressed in terms of $I_1$, $I_0$ and $y$, and the prediction $P$ of the cumulative number of infected individuals for the next day is obtained from $\eta$. It can be seen from Table 2 that the conversion rate method predicts NCP propagation accurately: the relative error between most predicted values and the real data is within 5%. The shortcoming of the conversion rate prediction method is that it cannot be used for long-term prediction. We discuss medium- and long-term prediction of the number of NCP sufferers later.
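The conversion rate idea can be illustrated with a minimal Python sketch. The defining formulas are not preserved in this text, so the sketch assumes the simplest reading consistent with the definitions above: $\eta = (I_1 - I_0)/y$ and $P = I_1 + \eta y$. Both formulas, the function names and the sample counts are assumptions made for illustration, not the paper's stated equations or data.

```python
def conversion_rate(confirmed_today, confirmed_yesterday, suspected):
    """Assumed definition: newly confirmed cases per suspected case."""
    if suspected <= 0:
        raise ValueError("suspected count must be positive")
    return (confirmed_today - confirmed_yesterday) / suspected


def predict_next_day(confirmed_today, eta, suspected_today):
    """Assumed prediction: today's total plus the expected conversions."""
    return confirmed_today + eta * suspected_today


def relative_error(predicted, actual):
    """Relative error between a prediction and the reported value."""
    return abs(predicted - actual) / actual


# Hypothetical counts, not taken from Table 1.
eta = conversion_rate(confirmed_today=1200, confirmed_yesterday=1000, suspected=2000)
forecast = predict_next_day(confirmed_today=1200, eta=eta, suspected_today=2500)
```

Because the forecast relies only on the latest day's counts, it reaches one day ahead, which matches the limitation noted in the text.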

NCP spreading model and limit number estimation
Let S denote the susceptible state, the state of an individual before being infected; such an individual may be infected by a neighbor. Let I denote the infected state, in which an individual has been infected and can infect its neighbors with a certain probability $\beta$. Since we are predicting the cumulative number of confirmed sufferers, it can be assumed that people remain in the infected state once they are infected. This is the classic SI model of infectious diseases [4]. From this model, it follows that the cumulative number of sufferers obeys the Logistic growth curve. Since the Logistic curve is S-shaped, it is also called the "S-curve". The Logistic curve equation is $I(t) = \frac{L}{1 + a e^{-\beta t}}$, where $t$ is the time index; $I(t)$ is the cumulative number of confirmed sufferers at time $t$; $\beta$ is the probability that an individual becomes infected with the 2019-nCoV virus; the coefficient $a$ is a constant; and $L$ is the limit number of sufferers. Once the values of the parameters $a$, $\beta$ and $L$ are determined, the model can be used to predict the number of confirmed NCP sufferers. We can estimate $L$ with the three-point method or the four-point method [13].
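As an illustration of how the three parameters might be obtained, the following Python sketch assumes the standard three-parameter form $I(t) = L / (1 + a e^{-\beta t})$, writing $\beta$ for the infection probability. It uses one common form of the three-point estimate for $L$ (three equally spaced observations), followed by a least-squares line through the linearized relation $\ln(L/I - 1) = \ln a - \beta t$. The variant of the three-point method in [13] may differ; the function names and the synthetic data are hypothetical.

```python
import math


def logistic(t, limit, a, beta):
    """Three-parameter logistic curve I(t) = L / (1 + a * exp(-beta * t))."""
    return limit / (1.0 + a * math.exp(-beta * t))


def three_point_limit(y0, y1, y2):
    """Estimate L from three equally spaced observations of a logistic curve."""
    denominator = y0 * y2 - y1 ** 2
    if denominator == 0:
        raise ValueError("points are consistent with pure exponential growth")
    return (2.0 * y0 * y1 * y2 - y1 ** 2 * (y0 + y2)) / denominator


def fit_a_beta(times, counts, limit):
    """Least-squares line through ln(L / I - 1) = ln(a) - beta * t."""
    z = [math.log(limit / c - 1.0) for c in counts]  # requires 0 < c < limit
    n = len(times)
    t_mean = sum(times) / n
    z_mean = sum(z) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxz = sum((t - t_mean) * (v - z_mean) for t, v in zip(times, z))
    slope = sxz / sxx
    intercept = z_mean - slope * t_mean
    return math.exp(intercept), -slope


# Synthetic data generated from known parameters (hypothetical values).
times = list(range(0, 21))
counts = [logistic(t, limit=1000.0, a=50.0, beta=0.3) for t in times]
limit_hat = three_point_limit(counts[0], counts[10], counts[20])
a_hat, beta_hat = fit_a_beta(times, counts, limit_hat)
```

On noise-free data the three estimates recover the generating parameters; on real counts the three-point estimate is sensitive to which points are chosen, which motivates the averaging discussed next.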
What is the maximum number of people who can be infected by the novel coronavirus pneumonia? Wuhan "closed the city" on January 23, 2020, which indicates that the country attached great importance to the epidemic from January 20 (according to academician Nanshan Zhong on live TV) and that all measures were basically implemented by January 27. Therefore, we estimate the limit number based on the data of January 27 as L = 84754. Linear regression analysis then gives the remaining parameters. Using the data from January 15 to February 6, 2020 in Table 1, the limit number obtained by the three-point method is L = 36457. However, as of February 9, 2020, the cumulative number of confirmed sufferers had reached 40171, which already exceeds this limit. This indicates that the theoretical curve is over-fitted and is not suitable for predicting future data, and it raises the question of how to estimate the limit number reasonably. Let $C_n$ denote the cumulative number of confirmed sufferers on day $n$. We propose the average estimation method: starting from the first three data points, estimate $L_i$, $i = 1, 2, ..., n$, one by one with the three-point or four-point method [7]; take the last non-positive estimate $L_m$ as the benchmark; and average the limit numbers that follow it as the estimate of $L$. This can be expressed as $L = \frac{1}{n-m}\sum_{i=m+1}^{n} L_i$.
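The average estimation method described above can be sketched in Python as follows. The text does not fix how the successive three-point windows are chosen, so this sketch assumes runs of three consecutive daily counts; that windowing, the function names and the sample values are assumptions for illustration only.

```python
import math


def three_point_limit(y0, y1, y2):
    """Three-point estimate of L from equally spaced logistic observations."""
    denominator = y0 * y2 - y1 ** 2
    if denominator == 0:
        # Degenerate triple: treat it as an unusable (non-positive) estimate.
        return float("-inf")
    return (2.0 * y0 * y1 * y2 - y1 ** 2 * (y0 + y2)) / denominator


def sequential_limits(cumulative):
    """L_i from each run of three consecutive counts (an assumed windowing)."""
    return [
        three_point_limit(*cumulative[i:i + 3])
        for i in range(len(cumulative) - 2)
    ]


def average_limit(estimates):
    """Average the estimates that follow the last non-positive one."""
    last_bad = -1
    for index, value in enumerate(estimates):
        if value <= 0:
            last_bad = index
    tail = estimates[last_bad + 1:]
    if not tail:
        raise ValueError("no positive estimates after the benchmark")
    return sum(tail) / len(tail)


# Hypothetical estimates, not values from the paper.
example = [-50.0, 120.0, -10.0, 100.0, 110.0, 90.0]
limit = average_limit(example)  # averages 100, 110 and 90

# Noise-free logistic counts with L = 1000: every window gives a positive
# estimate, so all of them are averaged.
counts = [1000.0 / (1.0 + 50.0 * math.exp(-0.3 * t)) for t in range(10)]
limit_synthetic = average_limit(sequential_limits(counts))
```

Discarding everything up to the last non-positive estimate removes the early, near-exponential phase, where the three-point formula is unstable.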

Evaluating the effectiveness of prevention and control through dynamic estimation of infection probability
The incubation period of NCP is taken as 14 days. The number of infected individuals has been counted since January 15, 2020, so we dynamically estimate the infection probability from January 28 onward.

Fig. 2. Dynamic estimation of infection probability
After the outbreak of the new coronavirus, a level-one response to major public health emergencies has been in effect across the country since January 29, 2020. It can be seen from Figure 2 that since January 28, 2020, the probability of new coronavirus infection has been decreasing over time, indicating that the national joint prevention and control measures have taken effect.
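The text does not spell out how the infection probability is re-estimated each day. One possible reading, sketched below in Python, refits the slope of the linearized Logistic relation $\ln(L/I - 1) = \ln a - \beta t$ on all data available up to each day, starting from the 14th observation (January 28 if January 15 is day 0). The expanding-window scheme, the fixed limit $L$, the symbol $\beta$ and the synthetic series are all assumptions.

```python
import math


def fit_beta(times, counts, limit):
    """Negative slope of the least-squares line through ln(L / I - 1) vs t."""
    z = [math.log(limit / c - 1.0) for c in counts]  # requires 0 < c < limit
    n = len(times)
    t_mean = sum(times) / n
    z_mean = sum(z) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxz = sum((t - t_mean) * (v - z_mean) for t, v in zip(times, z))
    return -sxz / sxx


def dynamic_beta(counts, limit, first_day):
    """Re-estimate beta each day from all data available up to that day."""
    times = list(range(len(counts)))
    return [
        fit_beta(times[:day + 1], counts[:day + 1], limit)
        for day in range(first_day, len(counts))
    ]


# Synthetic series (hypothetical): beta is 0.4 through day 13, then 0.2.
limit = 1000.0
z = [math.log(50.0) - 0.4 * t for t in range(14)]
z += [z[13] - 0.2 * (t - 13) for t in range(14, 23)]
counts = [limit / (1.0 + math.exp(v)) for v in z]

betas = dynamic_beta(counts, limit, first_day=13)
```

In this synthetic example the first estimate equals the early rate and later estimates drift toward the lower rate, the qualitative pattern a declining curve like Figure 2 would show.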

Parameter reassessment method and prediction of NCP sufferers
When a theoretical model is fitted to actual data, the aim is to minimize the average distance between each observed point and the corresponding theoretical point. The advantage is that the observed data lie near the theoretical curve on average; the disadvantage is that a good fit does not guarantee good predictions. This major public health emergency broke out in Wuhan. Medical resources were temporarily strained, and the testing reagents, equipment and personnel initially could not keep up. As medical resources became available, the number of confirmed sufferers gradually increased, and such data have the Markov property. Suppose that we have collected $n$ cumulative counts of confirmed cases. By the Markov property, the number of confirmed sufferers in the future does not depend on the cumulative counts before observation $m$. At observation $i$, the cumulative number of confirmed sufferers is $y_i$, and the theoretical number of sufferers at time $t_i$ is $I(t_i)$. Assuming that the first parameter estimation has determined $a$ and $\beta$, the limit parameter $L$ is given by equation (5), whose solution is equation (6). Assuming that the first parameter estimation has determined $L$ and $\beta$, the parameter $a$ is obtained by solving equation (8). Assuming that the first parameter estimation has determined $L$ and $a$, the parameter $\beta$ is obtained from the corresponding equation. Following Table 1, we take n = 23 and m = 16. Using equation (6) to estimate $L$, we obtain the predicted cumulative number of confirmed NCP cases shown in Table 3. The fit between the actual and predicted cumulative numbers of sufferers is shown in Figure 3, which is a graphical display of Table 3. At first sight, the fit in Fig. 3 is not as good as that in Fig. 1.

However, the five most recent data points fit better in Fig. 3 than in Fig. 1, so its predictive ability is stronger. When the model in this article was written, the NCP data had only been updated to February 6, 2020. The data have since been updated to February 12, so we can use the newly published data to test the actual prediction performance of the Logistic curve prediction method and the parameter reassessment method. It should be noted that on February 12, 13332 clinically diagnosed cases in Hubei were included in the cumulative number of confirmed sufferers for the first time, whereas previously all confirmed patients had been identified by the nucleic acid kit. As a result, the cumulative number of diagnoses increased by 15,152 on that day. To keep the data consistent and to test the prediction models in this paper fairly, the cumulative number of confirmed patients on February 12 in our statistics does not include the clinically diagnosed cases. The results are shown in Tab. 4 and Tab. 5. Comparing Tab. 4 and Tab. 5, we can clearly see that the relative error of the Logistic curve method is unstable and reaches 23.06% on February 12, while the relative errors of the parameter reassessment method are all within 5%. The parameter reassessment method therefore predicts considerably better.
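The reassessment equations themselves are not preserved in this text. A minimal Python sketch of one plausible reading is given below: with $a$ and $\beta$ fixed by the first estimation, $L$ is re-estimated by least squares over only the observations after the first $m$, which gives $L = \sum y_i f_i / \sum f_i^2$ with $f_i = 1/(1 + a e^{-\beta t_i})$. This closed form, the function names and the synthetic series are assumptions and may differ from equations (5) and (6).

```python
import math


def logistic_shape(t, a, beta):
    """f(t) = 1 / (1 + a * exp(-beta * t)), so that I(t) = L * f(t)."""
    return 1.0 / (1.0 + a * math.exp(-beta * t))


def reassess_limit(times, counts, a, beta, m):
    """Least-squares L using only the observations after the first m."""
    shape = [logistic_shape(t, a, beta) for t in times[m:]]
    recent = counts[m:]
    numerator = sum(y * f for y, f in zip(recent, shape))
    denominator = sum(f * f for f in shape)
    return numerator / denominator


def relative_error(predicted, actual):
    """Relative error between a prediction and the reported value."""
    return abs(predicted - actual) / actual


# Synthetic series (hypothetical): the first m counts are under-reported
# by 30%, later counts follow the curve with L = 1000.
a, beta, n, m = 50.0, 0.3, 23, 16
times = list(range(n))
truth = [1000.0 * logistic_shape(t, a, beta) for t in times]
counts = [0.7 * y if t < m else y for t, y in zip(times, truth)]

limit_all = reassess_limit(times, counts, a, beta, m=0)     # every point
limit_recent = reassess_limit(times, counts, a, beta, m=m)  # last n - m points
forecast = limit_recent * logistic_shape(n, a, beta)        # next-day value
```

In this example the estimate based on all points is pulled down by the early under-reported counts, while the estimate based on the last n - m points recovers the generating limit, mirroring the argument that early data distorted by strained testing capacity should not drive the forecast.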

Conclusion
This paper assumes that the statistical standards for NCP remain unchanged. We use the Logistic model to study the spreading trend of NCP. By proposing a parameter reassessment method, we analyze and predict the spreading trend of the novel coronavirus pneumonia. The effectiveness of prevention and control measures is evaluated by dynamic estimation of the infection probability. The prediction results in this article indicate that the spread of NCP will start to stabilize in mid-February 2020 and that the epidemic will be basically controlled in late February or early March. Comparing the prediction results of the Logistic curve method and the parameter reassessment method verifies the validity of the parameter reassessment method proposed in this paper. The results of this article have reference value for the formulation of joint prevention and control policies. This paper proposes

Fig. 1. The cumulative number of infected individuals and Logistic curve fitting graph.

Tab. 1. Statistics for the spread of NCP.

Tab. 3. Prediction of the cumulative number of confirmed sufferers by parameter reassessment method.

Tab. 4 and 5. Actual prediction effect of Logistic curve.