Estimation of the critical points of an epidemic by means of a logistic growth model

Epidemiological models have become a very important tool for understanding an epidemic's development, mainly because they help researchers find more efficient strategies in the fight against its spread. Several models have been proposed up to now: some use fractional calculus to solve differential equations, while others borrow from other areas, such as predator-prey models. The SIR and SEIR models, among others, mainly focus on the response variable and on epidemiological parameters such as the basic reproduction number (R0) and the infection rate per unit of time; nevertheless, they do not focus on the variable 'time'. We propose the use of time as the main variable, by means of a reparametrization of the logistic model, since it leads to an understanding of the epidemic as it progresses over time. Moreover, this model is important because it allows the estimation of the points of acceleration and deceleration, the point of maximum growth and the asymptotic point of the epidemic. This is only possible when the epidemic curve is stable and has an 'S' shape. In this work we use the variable 'accumulated cases' of COVID-19 in China and Italy and point out the main socioeconomic facts that occurred in each period between the critical points estimated from the logistic growth model.


Introduction
The first official statement by the World Health Organization (WHO) on the COVID-19 pandemic acted as a trigger for the scientific community, prompting a real effort to study and publish papers in order to understand what has been happening and, therefore, to help and guide the fight against the virus. In this way, new mathematical models like those proposed by Sales [1], Shaikh [2], Zhang [3] and Li [4] were added to the well-known epidemiological models: SIR, SEIR, the generalized logistic model, the generalized Richards model and the generalized growth model.
The logistic models, among the mathematical models proposed for epidemics, stand out because they present an S-shaped curve and therefore make it possible to obtain useful information about the curve, such as the absolute and relative growth rates, the points of acceleration and the asymptote, whose parameters may have a biological interpretation (see Seber [5]). All logistic models proposed in the literature up to now are derived from the Richards model.
In the case of epidemics, Chowell [6] used the logistic model, in addition to the generalized growth model and the generalized Richards model, to characterize the contagiousness, forecast patterns and final burden of Zika epidemics. However, the authors were vague about the logistic model used, since they only mention the use of a model but do not write down any mathematical equation to characterize it.
On the other hand, Wu [7] explicitly wrote down the parameters of the logistic model used to describe and characterize an epidemic. Nonetheless, the authors expressed the logistic model as a solution of differential equations, as is common in epidemiological models. Only Zhou [8] presented the logistic growth model in the form of a function, which is the closest to the model proposed in this work.
All models proposed so far either study the dynamics of virus transmission under some specific scenario or estimate non-critical points, as in Zhang [3] and Li [4]. However, none of these studies presented a proposal to determine when acceleration or deceleration occurs in the number of infected individuals, or when that number stabilizes. Time is a very important factor, mainly to avoid economic and social catastrophes [9]. We do not try to predict the future under predetermined settings. Instead, we start from an already established scenario, fit a logistic model in order to estimate its critical points, and then try to understand how political, social or environmental factors are related to those critical points.
Our proposal is to place the variable time as the main factor using the classic logistic model. Derived from the Richards model, this model is commonly used to describe the growth of animals [10]. To avoid any confusion with the logistic models used in epidemiological studies, which are also derived from the Richards model, we will call our model a 'non-epidemiological logistic growth model'.
In short, the aim of our work is to estimate the critical points of the non-epidemiological logistic growth model, namely the point of maximum acceleration, the inflection point, the point of maximum deceleration and the asymptotic deceleration point, using the number of accumulated cases as the dependent variable. Furthermore, we use the main stock exchange indices and some of the main political facts in China and Italy to show possible relationships with the critical points of the curve. China and Italy were chosen because they showed a stable epidemic curve during the period of our study, that is, from the beginning until the end of our data collection.

Data collection
The database used in this paper is available on the website githubusercontent.
The data we analyze range from 12/31/2019 to 06/30/2020. The countries included in the analysis are China and Italy, because those countries managed to stabilize contamination by COVID-19 by the end of the data period. We must point out that the numbers of new cases reported on 02/13/2020 (China) and 06/20/2020 (Italy) were very atypical compared to the previous day and the day after. Therefore, these values were replaced by the average of the values of the previous day and the day after.
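The replacement rule for an atypical value can be sketched as follows. This is a minimal Python illustration and not the authors' R code; the function name `replace_with_neighbor_mean` and the toy counts are hypothetical.

```python
def replace_with_neighbor_mean(series, index):
    """Return a copy of `series` in which the value at `index` is replaced
    by the mean of the values on the previous and the following day."""
    if index <= 0 or index >= len(series) - 1:
        raise ValueError("index needs both a previous and a following day")
    cleaned = list(series)
    cleaned[index] = (series[index - 1] + series[index + 1]) / 2
    return cleaned


# Made-up daily new-case counts with an atypical spike at position 2.
daily_cases = [100, 120, 900, 140, 150]
print(replace_with_neighbor_mean(daily_cases, 2))  # [100, 120, 130.0, 140, 150]
```

The accumulated series would then be rebuilt from the cleaned daily counts, so that the correction propagates to every later day.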
The indices of the Chinese stock exchange (Dow Jones Shanghai) and of the Italian stock exchange (FTSE MIB) were obtained from the website www.investing.com. Since market data are not available on weekends and holidays, the indices on these days were replaced by those of the immediately preceding day with available data, so that the evaluated period of the COVID-19 cases coincided with that of the stock indices.
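Carrying the last available index value forward over weekends and holidays can be sketched as below. This is an illustrative Python sketch under the assumption that missing days are marked with `None`; the function name `forward_fill` and the index values are hypothetical.

```python
def forward_fill(values):
    """Replace each missing value (None) with the most recent known value.
    Leading missing values stay None because nothing precedes them."""
    filled = []
    last_known = None
    for value in values:
        if value is not None:
            last_known = value
        filled.append(last_known)
    return filled


# Made-up index values: Thursday, Friday, Saturday, Sunday, Monday.
index_by_day = [2900.0, 2885.5, None, None, 2910.2]
print(forward_fill(index_by_day))  # [2900.0, 2885.5, 2885.5, 2885.5, 2910.2]
```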
Estimation of the Autoregressive Integrated Moving Average (ARIMA) model
Since the accumulated variable (the object of this study) is the accumulation of the daily cases, and these in turn form a time series whose values and residuals are correlated over time, we fitted an ARIMA model for each country using the auto.arima function from the forecast package, version 8.12 [11].
The ARIMA model has the following parameters: p corresponds to the autoregressive part (AR) and is the number of lags necessary for the residuals to become uncorrelated; d corresponds to the integrated part (I) and is the number of differencing steps necessary to transform a series into a stationary one; q corresponds to the order of the moving average (MA) and means that a given observation can be explained by the errors of the q previous observations. In order to be considered white noise, the errors (residuals) of the ARIMA model must fulfill the following assumptions: they must be normally distributed, with zero mean and constant variance, and be independent and therefore uncorrelated. However, since in our model the residuals did not meet those assumptions, we applied a Box-Cox transformation.
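As an illustration of the transformation mentioned above, the standard Box-Cox formula can be sketched in Python. The source does not report the value of the transformation parameter, so the `lam` values used here are purely hypothetical, and the function name `box_cox` is introduced only for this example.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transformation to strictly positive values:
    (y**lam - 1) / lam when lam != 0, and log(y) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical example with lam = 0.5 (a square-root-like transformation).
print(box_cox([1.0, 4.0, 9.0], 0.5))  # [0.0, 2.0, 4.0]
```

Values of `lam` below 1 compress large observations more than small ones, which is why the transformation is often used to stabilize a variance that grows with the level of the series.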
A stationary time series is one whose values fluctuate around the same mean, that is, its theoretical mean is constant over time, its variance does not change over time and its autocovariance is finite and does not change over time. Transforming a time series into a stationary one makes it possible to estimate its parameters.
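The differencing operation counted by the parameter d can be sketched as follows. This is a minimal Python illustration with made-up numbers; the function name `difference` is hypothetical.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: each pass replaces the series
    by the changes between consecutive observations."""
    result = list(series)
    for _ in range(d):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Made-up accumulated counts: one pass gives daily increments,
# a second pass gives the change in the daily increments.
accumulated = [1, 3, 6, 10, 15]
print(difference(accumulated, 1))  # [2, 3, 4, 5]
print(difference(accumulated, 2))  # [1, 1, 1]
```

In this toy series the mean of the accumulated values and of the daily increments both drift upward, while the twice-differenced series is constant, which is the kind of behavior the order d is meant to capture.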
Since several combinations of the parameters p, d and q are possible, the best model was chosen using the Akaike information criterion (AIC) [12].
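For reference, the criterion can be written as below. This is the standard textbook definition rather than a formula stated in the source; the symbols m (number of estimated parameters) and \hat{L} (maximized value of the likelihood) are introduced here only for illustration.

```latex
\mathrm{AIC} = 2m - 2\ln\hat{L}
```

Among the candidate models, the one with the smallest AIC is preferred: the first term penalizes additional parameters, while the second rewards a better fit.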

Estimation of the logistic growth model
We first fitted the ARIMA model in order to determine the estimated accumulated values and then used them to fit the logistic model. The parameters of the reparametrized logistic model (model 1) are:
A: the parameter that represents the model's asymptote as x → ∞;
B: a parameter without biological interpretation;
k: the parameter that represents the maturity index of an organism.
The parameters of the logistic model were estimated with the function nls of the stats package, version 4.0.0, using the Gauss-Newton algorithm. The maximum acceleration point (map), inflection point (ip), maximum deceleration point (mdp) and asymptotic deceleration point (adp) were determined from model 1. The points of map and mdp were determined using the third derivative, the point of ip using the second derivative and the point of adp using the method of non-significant difference [13]. The critical points are obtained as follows:
1. For map and mdp, we take the third derivative of model 1 and find the points where it is equal to zero.
2. For the ip, we take the second derivative of model 1 and find the point where it is equal to zero.
3. The adp is found by using the non-significant difference method. In this method, A is the estimated asymptote of the logistic model, p is a proportion of the asymptote (a fixed value between 0 and 1), $f(x_i)$ is the value of the logistic function (eq. 1) at point $x_i$, and $\Sigma$ is the variance-covariance matrix obtained from the estimated logistic regression model. In step (c) of the algorithm, for each $x_i$ the statistic $T = \Delta_{x_i} / S^{2}(\Delta_{x_i})$ is computed.
The 95% confidence intervals were calculated for all parameters and critical points [5,14].
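To make the derivative-based definitions of map, ip and mdp concrete, the following worked derivation may help. It assumes one common way of writing the logistic curve, f(x) = A / (1 + B e^{-kx}) with B > 0 and k > 0. This specific form is an assumption made for illustration and may differ from the reparametrization used in model 1, although the spacing between the three points depends only on k.

```latex
% Assumed parametrization (hypothetical, not confirmed by the source)
\begin{align}
f(x)    &= \frac{A}{1 + B e^{-kx}}, \qquad u = B e^{-kx} \\
f''(x)  &= A k^{2}\,\frac{u\,(u - 1)}{(1 + u)^{3}} = 0
           \;\Rightarrow\; u = 1
           \;\Rightarrow\; x_{ip} = \frac{\ln B}{k} \\
f'''(x) &= A k^{3}\,\frac{u\,(u^{2} - 4u + 1)}{(1 + u)^{4}} = 0
           \;\Rightarrow\; u = 2 \pm \sqrt{3} \\
x_{map} &= \frac{\ln B - \ln(2 + \sqrt{3})}{k}, \qquad
x_{mdp}  = \frac{\ln B + \ln(2 + \sqrt{3})}{k}
\end{align}
```

Under this assumption, map and mdp lie symmetrically around ip at a distance of ln(2 + √3)/k ≈ 1.317/k, so a larger k means that the acceleration and deceleration phases are closer together in time.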
All analyses were performed using the R software [15].

Results
The values of the ARIMA model for each country are shown in table 1. The results show that the residuals of the time series for China were more correlated than those for Italy, since the model orders given by the parameters p and q were both higher. As for non-stationarity, a greater number of differencing steps was necessary for Italy than for China. Table 2 shows the point estimates and confidence intervals of the parameters and critical points. The point of map was estimated at 34.93 days for China and 50.92 days for Italy, counting from the first officially reported case in each country. The estimated value of the point of ip was 41.68 and 65.53 days for China and Italy, respectively. For the point of mdp, the estimates were 48.43 days for China and 80.14 days for Italy. Finally, the estimated points of adp for China and Italy were 57 and 94 days, respectively.
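The reported estimates can be cross-checked against a known property of logistic curves: when the exponent is linear in x with rate k, the zeros of the third derivative (map and mdp) lie symmetrically around the inflection point at a distance of ln(2 + √3)/k. The Python sketch below checks this symmetry and computes the rate implied by the spacing. The implied k values are derived here under that assumption and are not figures quoted from the source; the function name `implied_rate` is hypothetical.

```python
import math

# Distance from ip to map (or mdp), in units of 1/k, for a logistic curve.
HALF_WIDTH = math.log(2 + math.sqrt(3))


def implied_rate(map_day, mdp_day):
    """Return the rate k implied by the spacing between map and mdp."""
    return 2 * HALF_WIDTH / (mdp_day - map_day)


# (map, ip, mdp) in days, as reported in the text above.
reported = {"China": (34.93, 41.68, 48.43), "Italy": (50.92, 65.53, 80.14)}

for country, (map_day, ip_day, mdp_day) in reported.items():
    symmetric = math.isclose(ip_day - map_day, mdp_day - ip_day, abs_tol=0.01)
    k = implied_rate(map_day, mdp_day)
    print(f"{country}: symmetric around ip = {symmetric}, implied k = {k:.3f}")
```

For both countries the two gaps agree (6.75 days for China and 14.61 days for Italy), which is consistent with the points having been obtained from a logistic curve.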
We now consider the estimates of the parameters A, B and k. They are not the subject of this study, but are worth interpreting since they refer to the dependent variable (number of accumulated cases). The estimate of the parameter A can be interpreted as a limit on the number of accumulated cases, that is, beyond the value of that estimate no significant increase in the number of cases occurs. In the case of China, the estimated value ranged from 73,048.35 to 73,499.35 accumulated cases at 72 days (we consider the lower limit of that confidence interval) since the beginning of
(a) Wuhan began to classify patients and adopted centralised patient management, and a dedicated hospital was built to fight against COVID-19;
(b) The Huoshenshan makeshift hospital was put into use;
(c) The first 3 makeshift hospitals were put into use.
3. One fact stands out in the period between the points of ip and mdp (6 days) [19]:
(a) Following a change in the diagnosis criteria of confirmed cases, the Chinese mainland reported 15,152 new confirmed COVID-19 cases, including 13,332 in Hubei Province.
4. In the period between the points of mdp and adp (9 days), the following facts stand out [19]:
(a) The number of newly discharged patients began to surpass the number of new confirmed cases;
(b) Provinces and regions across the Chinese mainland began to downgrade the COVID-19 emergency response;
(c) The China-WHO joint expert team held a press conference in Beijing.
5. Finally, from the point of adp until 06/30/2020 (126 days), the following facts stand out [19]:
(a) Hubei Province gradually revoked the outbound travel restrictions;
(b) The National Health Commission began to report the asymptomatic cases daily;
(c) All inbound travellers to China were mandated to undergo a compulsory nucleic acid test;
(d) Wuhan lifted outbound travel restrictions;
(e) Beijing and its nearby provinces downgraded the emergency response to the second level;
(f) Hubei Province downgraded its emergency response to the second level.
The Dow Jones Shanghai, which is the main index of the Chinese stock exchange, fell sharply until the point of map (figure 2). Then the index rose until shortly after the point of adp, the stabilization point of the number of people infected. This was followed by a sharp drop lasting 27 days and a resumption of growth until the last evaluation day.
In Italy, the main facts during the periods shown in figure 2
(a) The prime minister announced a starter plan for the so-called 'phase 2', which would start from 4 May [27].
5. Finally, from the point of adp until 06/30/2020 (59 days), the following facts stand out:
(a) The prime minister announced the government plan for the easing of restrictions: starting from 18 May, most businesses could reopen, and free movement was granted to all citizens within their region [28];
(b) Movement across regions was still banned for non-essential motives [28].
The main index of the Italian stock exchange (FTSE MIB) dropped until a few days before the first critical point (map), then remained stable until the last day of our study.

Discussion
Several statistical techniques have been proposed so far in order to understand and control an epidemic outbreak. Some techniques, like that of James [29] or the one by Manchein [9], are used to analyze the data without modeling the epidemic curve. The first applied cluster analysis to time series; the second used the correlation distance to assess the relationship between power-law curves in countries such as Brazil, China, France, Germany, Italy, Japan, South Korea, Spain and USA. Manchein [9] showed that power-law curves are highly correlated between countries, and these results strongly suggest that governmental strategies used to flatten power-law growth in one country can be successfully applied in other countries and continents. Other studies, like the one proposed in this paper, put forward the idea of modeling the curve to find points of interest such as the critical points.
The critical points estimated through the classic non-epidemiological logistic model are the only ones allowing the researcher to speak in terms of 'acceleration' and 'asymptote' of a given epidemic. Although many studies mention the terms 'transmission speed' and 'curve stabilization', none of them uses these terms in a statistically rigorous way.
The models proposed hitherto were intended to describe the outbreak of the epidemic, that is, how variables such as the number of infected people and the number of recovered people should behave as a function of epidemiological parameters. Some researchers applied new models, as in the case of Sales [1], who applied the predator-prey model proposed by Lotka and Volterra to COVID-19, and Giordano [30], who used the SIDARTHE model to discriminate between detected and undetected cases of infection and between different severities of illness (SOI): non-life-threatening cases (asymptomatic and paucisymptomatic; minor and moderate infection) and potentially life-threatening cases (major and extreme) that require ICU admission. Shaikh [2] proposed a mathematical model using a fractional derivative. Established epidemiological models were also applied, such as SIR [31,32,33], SEIR [8,34,35], the generalized logistic model [7], the generalized Richards model [7] and the generalized growth model [7].
Our approach is unique because it estimates when (time) the main critical points of an epidemic occur using a simple model, that is, without differential equations that require deep mathematical skills. Li [4] used an unnamed model with a parameter to estimate the turning point of the number of infected people in China. This parameter is similar to the critical point ip proposed in our work. The authors estimated the turning point as occurring 12 days after the beginning of their data collection (January 20, 2020), that is, on February 1, 2020. This date differs from the one estimated in our work (02/10/2020). The difference is probably due to the different periods considered in the two works: we started on 12/31/2019, whereas Li [4] started on 01/20/2020. It is likely that Li [4] underestimated the turning point, as the authors disregarded the 20 days following the official start of the pandemic (12/31/2019), a period in which the number of infected people was accelerating, with the acceleration reaching its maximum on 02/03/2020 according to the data presented in this article. Zhang [3] used the segmented Poisson model to estimate the turning point, which is similar to the inflection point in this study. They estimated March 26, 2020 as the turning point for Italy, a difference of ten days from the critical point ip estimated in our work (04/05/2020). Since both studies of Italy started on the same date, several hypotheses could explain that difference. The most plausible is that different statistical techniques were used in the two studies. It is important to point out that this study is the only one using the term "critical point" in precise mathematical terms.
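The calendar dates quoted in this comparison can be reproduced from the estimated day counts with a few lines of Python. The sketch assumes that the fractional part of an estimate is truncated when it is converted to a date; the source does not state which convention was used, and the function name `day_offset_to_date` is hypothetical.

```python
from datetime import date, timedelta


def day_offset_to_date(start, days):
    """Return the calendar date reached `days` days after `start`,
    truncating any fractional part of `days` (an assumed convention)."""
    return start + timedelta(days=int(days))


china_start = date(2019, 12, 31)
print(day_offset_to_date(china_start, 41.68))     # 2020-02-10, ip for China
print(day_offset_to_date(china_start, 34.93))     # 2020-02-03, map for China
print(day_offset_to_date(date(2020, 1, 20), 12))  # 2020-02-01, turning point in Li [4]
print((date(2020, 4, 5) - date(2020, 3, 26)).days)  # 10, gap for Italy
```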
The model proposed in this paper allows us to answer questions like: 'How many days are necessary to stop the epidemic?' [36] In China, it took 57 days to reach the point of adp; in Italy, 94 days. Other questions may arise in the course of a study, and any researcher interested in the field could use the tool proposed above to answer them. For instance, why was the time between the beginning of the outbreak and the point of map shorter in China (35 days) than in Italy (51 days)? As already pointed out by other studies, the lockdown was indeed an effective measure in the battle against the spread of an epidemic [18]. In Wuhan, the epicenter of the pandemic at that time, the lockdown was implemented 23 days after the first officially reported case. In Lombardy and 14 additional northern and central provinces in Piedmont, Emilia-Romagna, Veneto and Marche, the lockdown was implemented 37 days after the first occurrence. Moreover, air pollution may have been another major factor in Italy, helping the virus spread and keeping the acceleration up, as shown by Coccia [36]. Another interesting result was the proximity between the lockdown and the point of map: in China this interval was 12 days, while in Italy it was 14 days. Does this hold in other countries?
The logistic growth model complements the studies already proposed to understand a given epidemic, because it makes the variable time the most important one. Our proposal has the limitation that it can be applied only after the epidemic has stabilized. On the other hand, it allows the estimation of more precise parameters that have not yet been addressed in the literature, and it does not make conjectures about the future and its various scenarios, as epidemiological models do.
However, it does not mean that epidemiological models that make future projections cannot use the technique proposed here. On the contrary, it would further help to understand how quickly we could stabilize a curve given different social scenarios and government strategies. Manchein [9] for example, used a variation of the SEIR model to show how different government strategies and social actions can flatten the power-law curve. Our proposal could be added to such a technique, as we would know how such strategies could affect the critical points of the curve.
Researchers studying epidemics that are in a second, third or later wave may also use the technique we propose to analyze a specific wave, simply by identifying the beginning and end of the wave under study. James [37] has developed an algorithm that defines a second wave.
For instance, after estimating the critical points of the logistic growth model, the researcher could try to understand why the time until the peak of infection or the time until stabilization differed among countries, and which health, social and economic measures or environmental factors may decrease or increase the time between the critical points.

Conclusion
The logistic growth model proves to be an important tool in modeling epidemics. Despite the shortcomings pointed out above, it is powerful because it accurately estimates the time of occurrence of the critical points, allowing public health workers and managers to get a better understanding of the whole process of an epidemic. Political authorities are thus able to get a better understanding of time, and not just of cause and effect, as has been the approach in scientific works so far.
Another interesting application would be to calculate the critical points of the logistic growth model using the number of accumulated deaths and so determine which health-related measures occurred in each period between the critical points.