Forecasting the Stability of COVID-19 on Indian Dataset with Prophet Logistic Growth Model

COVID-19, a highly infectious disease, has recently been declared a pandemic by the WHO, and since then researchers all over the world have been attempting to predict its likely progression using various mathematical models. In this paper, we use a logistic growth model to assess when the pandemic will stabilize and the Prophet model to forecast the total number of confirmed COVID-19 cases in India.


Subject
Decision Sciences

Specific subject area
The data prediction focuses on the stability of confirmed COVID-19 cases in India, using Prophet with a logistic (S-shaped) growth curve.

Data format
The raw data was collected as a CSV file, which was cleaned for prediction and analysis.
The CSV file has been uploaded.

Parameters for data collection
The total number of confirmed cases in India for the period from January 30th, 2020 to April 29th, 2020 was collected and used to find the corresponding carrying capacity in

Value Of The Data
This data offers the scientific community, researchers, and academicians a new approach to forecasting when a pandemic is likely to stabilize.

Introduction To Data Description
The COVID-19 data from December 31st, 2019 to May 19th, 2020 was collected from the official website https://ourworldindata.org/coronavirus-source-data, where it is available as a CSV file [1]. The dataset consists of 19 features. Of these, the feature 'total confirmed cases' was forecast using the Prophet model. Its values show an increasing trend over time. Hence, they were treated independently as a univariate time series, and a forecast was made for the number of confirmed cases after 19th May. Figure 1 shows the increase in the actual number of confirmed cases over time together with the number of confirmed cases predicted by the logistic growth model. This plot was made using the parameters listed in Table 1. Here a is a hyperparameter (hp), b is the multiplicative factor (mf) and c is the maximum number of people who might get infected by this virus, which is the carrying capacity (cc) in our case. Figure 2 shows the actual and the forecast numbers of confirmed cases. We show that after 10th July the growth of confirmed cases will reach a stable point (refer Table 6).
The Prophet model gives very good results when the data shows non-linear trends or non-linear growth patterns. Fundamentally, Prophet uses an additive regression concept in which the trend can be modelled either as piecewise linear or as a logistic growth curve [5]. We tried both options and found that the logistic trend gives much better estimates of the spread of COVID-19. Even intuitively, modelling the COVID-19 spread with a logistic curve is more realistic. The motivation of this analysis is to identify whether COVID-19 in India is still growing exponentially or whether its growth has started to follow a flat curve [6].
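
The two Prophet trend options mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names build_rows and fit_prophet are hypothetical, and fit_prophet assumes that the prophet and pandas packages are installed. Prophet's logistic trend requires a cap column holding the carrying capacity in both the training and the future data frame.

```python
from datetime import date, timedelta


def build_rows(start, values, cap=None):
    """Turn daily values into Prophet-style records with 'ds' and 'y' keys.

    When cap is given, a 'cap' key is added, which Prophet's logistic
    trend requires. (Hypothetical helper, not taken from the paper.)
    """
    rows = []
    for offset, value in enumerate(values):
        row = {"ds": start + timedelta(days=offset), "y": float(value)}
        if cap is not None:
            row["cap"] = float(cap)
        rows.append(row)
    return rows


def fit_prophet(rows, periods, growth="linear"):
    """Fit Prophet with the chosen trend and forecast `periods` days ahead."""
    # Imported lazily so that build_rows works without these packages.
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame(rows)
    history["ds"] = pd.to_datetime(history["ds"])
    model = Prophet(growth=growth)
    model.fit(history)
    future = model.make_future_dataframe(periods=periods)
    if growth == "logistic":
        # The same carrying capacity must also be supplied for future dates.
        future["cap"] = history["cap"].iloc[0]
    return model.predict(future)
```

With placeholder values, `rows = build_rows(date(2020, 1, 30), [1, 1, 2, 3], cap=194401.71)` followed by `fit_prophet(rows, periods=30, growth="logistic")` would produce a logistic-trend forecast, and passing `growth="linear"` gives the piecewise linear alternative for comparison.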
It is necessary to fit India's data to the logistic model because if a country is in the initial part of the curve, COVID-19 is still growing at a fast rate, whereas if it is in the second part of the curve, growth is approaching its saturation point. To ascertain which is the case, the growth of COVID-19 in the available data is modelled with a logistic function, and the growth over the next few days is forecast with Prophet.
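
To make the distinction between the initial and the second part of the curve concrete, the sketch below assumes the common three-parameter logistic form y(t) = c / (1 + a * exp(-b * t)). The equation is not reproduced in this text, so this form, the function names and the parameter values are assumptions used only for illustration. Under this form the fastest growth day (the inflection point) is t* = ln(a) / b, at which the curve reaches c / 2.

```python
import math


def logistic(t, a, b, c):
    """Assumed logistic form: c / (1 + a * exp(-b * t))."""
    return c / (1.0 + a * math.exp(-b * t))


def fastest_growth_day(a, b):
    """Day index at which daily growth peaks (the inflection point)."""
    return math.log(a) / b


def curve_phase(current_day, a, b):
    """Return 'first part' before the inflection point, else 'second part'."""
    if current_day < fastest_growth_day(a, b):
        return "first part"
    return "second part"


# Illustrative parameters only (not the fitted values from Table 1).
a, b, c = math.exp(6.0), 0.1, 200_000.0
peak = fastest_growth_day(a, b)
print("fastest growth day:", peak)
print("cases at that day:", logistic(peak, a, b, c))
print(curve_phase(40, a, b), "/", curve_phase(80, a, b))
```

A country whose current day lies before the inflection point is still in the fast-growing part of the curve, while one past it is approaching saturation.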
Implementation Steps:
a. The data frame used in Prophet needs two columns: (i) ds, which stores the dates of the time series, and (ii) y, which stores the corresponding values of the time series. The carrying capacity parameter (c) represents the maximum number of infections that can be caused by the virus, i.e. the saturation point of virus growth, and is estimated via the logistic model equation.
b. To model the COVID-19 spread with this equation, the three parameters a, b and c are computed by nonlinear least squares estimation using the curve_fit function of SciPy's optimization library in Python. The logistic curve is then fitted to the dataset using these parameters and compared with the actual number of confirmed cases at each time instant (refer Table 1 and Figure 1).
c. Before feeding the dataset to the Prophet model to forecast growth, we need to find the estimated carrying capacity c, which is computed as follows:
Rule 1: When the fastest growth day is still ahead, growth is still increasing. In this case, add ten days after finding the fastest growth day.
Rule 2: When the fastest growth day is in the past, growth has stabilized. In this case, use the current day and add ten days to find the estimated highest number of infections.
d. The fastest growth day identified through our logistic model is 18th May, which is in the past because we analyzed data up to 19th May; therefore Rule 2 applies to our model. The parameter c is 194401.71436567788, which means the maximum number of infections in India would be approximately 194,402. The Prophet model was then fitted using this value of c (194401.71436567788, shown by the horizontal line at the top of Figure 2). From the forecast curve (Figure 2) it can be inferred that the growth of COVID-19 in India is expected to stabilize after July 10th (refer Table 2).
In order to evaluate the model fit, we used cross-validation. The first case of COVID-19 in India was confirmed on 30th January and the data was collected up to 19th May, so we had 110 days of data. The first 100 days (training data) were used to train the model and the remaining 10 days (test data) were used to evaluate it (refer Table 3). The outcomes are shown in Table 4, which lists the predicted and actual values with the lower and upper bounds of the confidence interval; data up to 9th May was used for training, hence the cutoff date is 2020-05-09. The error diagnostics of the model are shown in Table 5. The minimum error is reported at a horizon of 8 days, that is, for forecasts made 8 days ahead of the cutoff date.
Lockdown 4.0 runs until May 31st. The confirmed cases data follows a logistic curve. This data was collected in a controlled environment (lockdown, social distancing, etc.). From the logistic curve it can be inferred that India is in the second part of the logistic growth curve; therefore, the number of confirmed cases is going to stabilize soon, provided the controlled conditions are sustained. This suggests that the preventive measures have been successful in the controlled setup. The maximum number of people that can be infected by this virus, as inferred from the logistic model, is 194401.71436567788.
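
The parameter estimation and the two carrying-capacity rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the logistic form c / (1 + a * exp(-b * t)), uses synthetic noiseless data instead of the Indian dataset, and encodes one possible reading of Rules 1 and 2 (evaluate the fitted curve ten days after the fastest growth day if that day is still ahead, otherwise ten days after the current day). The function names fit_logistic and estimated_capacity are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, a, b, c):
    """Assumed logistic form: c / (1 + a * exp(-b * t))."""
    return c / (1.0 + a * np.exp(-b * t))


def fit_logistic(days, cases):
    """Estimate (a, b, c) by nonlinear least squares."""
    cases = np.asarray(cases, dtype=float)
    # Rough starting values: capacity above the largest observation, a
    # moderate growth rate, and `a` chosen to match the first observation.
    c0 = 1.5 * cases.max()
    a0 = c0 / max(cases[0], 1.0) - 1.0
    params, _ = curve_fit(logistic, days, cases, p0=(a0, 0.15, c0), maxfev=5000)
    return params


def estimated_capacity(params, current_day, extra_days=10):
    """One reading of Rules 1 and 2 (an assumption, see the text above)."""
    a, b, c = params
    peak_day = np.log(a) / b  # fastest growth day of the fitted curve
    # Rule 1: peak still ahead, so start from the peak day.
    # Rule 2: peak in the past, so start from the current day.
    reference_day = peak_day if peak_day > current_day else current_day
    return float(logistic(reference_day + extra_days, a, b, c))


# Synthetic example: 110 days of data with the fastest growth on day 60.
days = np.arange(110, dtype=float)
true_params = (np.exp(6.0), 0.1, 200_000.0)
cases = logistic(days, *true_params)

params = fit_logistic(days, cases)
cap = estimated_capacity(params, current_day=days[-1])
print("fitted a, b, c:", params)
print("estimated capacity for Prophet:", cap)
```

The resulting value would then be supplied to Prophet as the cap for the logistic trend. On real, noisy data the fit can be sensitive to the starting values, so the initial guesses here should be treated as a starting point only.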
This means that if the controlled conditions continue to prevail, the growth of COVID-19 will become stable after 10th July and the number of confirmed cases will come close to the carrying capacity. As is also evident from the figures below, the number of confirmed cases on May 31st will be 144,335 (refer Table 6), which is much less than the carrying capacity of the logistic model (194401.71436567788); therefore, the strict preventive measures should be continued at least until 10th July.
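
The holdout evaluation described above (100 training days, 10 test days, errors reported per horizon) can be illustrated with a small plain-Python sketch. Prophet ships its own diagnostics utilities for this purpose; the version below only shows the idea. The function names and all numbers are hypothetical and are not the values from Tables 3 to 5.

```python
def split_holdout(values, test_days=10):
    """Keep the last `test_days` observations for evaluation."""
    return values[:-test_days], values[-test_days:]


def ape_by_horizon(actual, predicted):
    """Absolute percentage error per horizon (1 = first day after the cutoff)."""
    return {
        horizon: abs(a - p) / abs(a)
        for horizon, (a, p) in enumerate(zip(actual, predicted), start=1)
    }


def best_horizon(errors):
    """Horizon (in days after the cutoff) with the smallest error."""
    return min(errors, key=errors.get)


# Made-up series of 110 daily values, split as in the text.
series = list(range(1, 111))
train, test = split_holdout(series, test_days=10)
print(len(train), "training days,", len(test), "test days")

# Made-up actual and predicted values for three horizon days.
errors = ape_by_horizon([100, 200, 400], [110, 190, 400])
print(errors, "-> best horizon:", best_horizon(errors))
```

A horizon here counts days ahead of the cutoff date, so the best horizon is the forecast distance at which the model was most accurate on the held-out data.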

Conclusion
The data in this paper provides a simple way to understand the spread of COVID-19 and to verify whether preventive measures such as social distancing, quarantine, and contact tracing are effective. The authors have forecast the number of confirmed coronavirus cases in India. Figure 2 shows the forecast curve of confirmed cases. We have shown that after 10th July, the growth of confirmed cases will reach a stable point.

Declarations
Competing Interests

Figure 1: Logistic Curve of Confirmed Cases
Figure 2: Forecast Curve of Confirmed Cases