Forecasting the Stability of COVID-19 on Indian Dataset with Prophet Logistic Growth Model

doi:10.21203/rs.3.rs-32472/v1


Research Article

This work is licensed under a CC BY 4.0 License

Version 1

COVID-19, a highly infectious disease, was recently declared a pandemic by the WHO, and researchers all over the world have since been attempting to predict its likely progression using various mathematical models. In this paper, we use a logistic growth model to assess the stability of the pandemic and the Prophet model to forecast the total number of confirmed COVID-19 cases in India.

Infectious Diseases

COVID-19 Pandemic

Prophet Model

Prediction

Logistic curve.

Subject	Decision Sciences
Specific subject area	Prediction of the stabilization of confirmed COVID-19 cases in India using the Prophet logistic (S-shaped) growth curve.
Type of data	Table Graph Figure
How data were acquired	Coronavirus Source Data: https://covid.ourworldindata.org/data/owid-covid-data.csv
Data format	The raw data were collected as a CSV file, which was cleaned for prediction and analysis. The CSV file has been uploaded.
Parameters for data collection	The total number of confirmed cases in India for the period from January 30th, 2020 to April 29th, 2020 was collected and used to find the corresponding carrying capacity in the Prophet logistic growth model.
Description of data collection	Daily COVID-19 data for India during the above-mentioned period were collected from the links given on the official COVID-19 websites: https://www.mygov.in/corona-data and https://ourworldindata.org/coronavirus-source-data. Two different sources were used to safeguard the authenticity of the data and of the prediction. The prediction indicates the level at which the curve stabilizes. The data were analysed to evaluate the number of new confirmed cases of COVID-19.
Data source location	Oxford Martin School, University of Oxford, UK.
Data accessibility	The raw data were taken from the GitHub repository of the coronavirus source data: https://covid.ourworldindata.org/data/owid-covid-data.csv

These data offer the scientific community, researchers, and academics a new approach to forecasting when a pandemic is likely to stabilize.
These data give valuable insight into the progression of COVID-19, so organizations dealing with infectious diseases can use them for early prevention and to monitor, manage, and control the spread of the pandemic.
These data support decision-makers in different domains of society, such as national leaders determining which policies to implement, as well as decisions on pharmaceutical interventions, environmental impact, and effects on the economy.

The COVID-19 data from 31st December 2019 to 19th May 2020 were collected from the official website https://ourworldindata.org/coronavirus-source-data, where they are available as a CSV file [1]. The dataset consists of 19 features. Of these, the feature ‘total confirmed cases’ was forecast using the Prophet model. Its values show an increasing trend over time, so they were treated as a univariate time series and a forecast was made for the number of confirmed cases after 19th May. Figure 1 shows the increase in the actual number of confirmed cases over time as well as the number of confirmed cases predicted by the logistic growth model. This plot was made using the parameters listed in Table 1. Here a is a hyperparameter (hp), b is the multiplicative factor (mf), and c is the maximum number of people who might be infected by the virus, which is the carrying capacity (cc) in our case. Figure 2 shows the actual number of confirmed cases as well as the forecast confirmed cases. We show that the growth of confirmed cases will reach a stable point after 10th July (refer to Table 6).

Table 1: Parameters Computed Using the Nonlinear Least Squares Estimation Method

a (hp)	b (mf)	c (cc)
11331.051527372085	0.08525654103333764	194401.71436567788
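As an illustration of how the parameters in Table 1 define the fitted curve, the following minimal sketch evaluates a logistic function at a few day indices. The functional form c / (1 + a·exp(−b·t)) and the convention that t counts days from the first data point are assumptions made for this example; they are chosen to be consistent with the roles of a, b and c described above.

```python
import math

# Values reported in Table 1.
a = 11331.051527372085    # a (hp)
b = 0.08525654103333764   # b (mf)
c = 194401.71436567788    # c (cc): carrying capacity


def logistic(t):
    """Assumed logistic form; t is the day index (0 = first data point)."""
    return c / (1.0 + a * math.exp(-b * t))


# Print the curve at a few day indices to show the S-shape.
for day in (0, 30, 60, 90, 110, 150, 200):
    print(f"day {day:3d}: {logistic(day):12.1f}")
```

Under these assumptions the printed values rise slowly at first, then quickly, and finally level off just below c.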

We used a logistic curve to model the spread of COVID-19. The Prophet logistic curve is a sigmoidal (S-shaped) curve used to model quantities whose growth can be divided into three phases: (i) a first phase, in which growth is gradual; (ii) a second phase, in which growth is faster; and (iii) a third phase, in which growth slows gradually and approaches a saturation level (referred to as the carrying capacity) [2][3][4]. A sigmoidal curve of this type is often used to describe biological growth patterns that begin with a period of exponential growth and subsequently reach a maximum level. The pattern followed by COVID-19 is similar: a large number of people are infected in the early stages, and growth subsequently slows as various preventive measures are implemented.
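The three phases can be read off a logistic function directly. The form below is an assumption made for illustration, written with the parameter names a, b and c of Table 1 and with t as the day index; it is a sketch of the standard logistic curve, not necessarily the exact parameterisation used by the authors.

```latex
\begin{align}
y(t) &= \frac{c}{1 + a\,e^{-bt}}, \\
\frac{dy}{dt} &= b\,y(t)\left(1 - \frac{y(t)}{c}\right), \\
t^{*} &= \frac{\ln a}{b}, \qquad y(t^{*}) = \frac{c}{2}.
\end{align}
```

While y(t) is much smaller than c, the growth rate is approximately b·y(t), which is exponential growth. The growth rate peaks at t*, where a·e^{−bt} = 1 and y equals c/2, and then falls towards zero as y(t) approaches the carrying capacity c.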

The Prophet model gives very good results when the data show non-linear trends or non-linear growth patterns. Fundamentally, Prophet uses an additive regression model whose trend can be modelled either as piecewise linear or as a logistic growth curve [5]. We tried both options and found that the logistic trend gives much better estimates of the spread of COVID-19. Even intuitively, modelling the spread of COVID-19 with a logistic curve is more realistic. The motivation for this analysis is to identify whether COVID-19 in India is still growing exponentially or whether growth has started to flatten [6]. It is necessary to fit India’s data to the logistic model because, if a country is in the initial part of the curve, COVID-19 is still growing at a fast rate, whereas if it is in the second part of the curve, growth is approaching its saturation point. To ascertain this, the growth of COVID-19 in the available data is modelled with a logistic function and the growth over the next few days is forecast with Prophet.

Implementation Steps:

a. The data frame used in Prophet needs two columns: (i) ds, which stores the dates of the time series, and (ii) y, which stores the corresponding values of the time series. The carrying capacity parameter (c) represents the maximum number of infections that can be caused by the virus, that is, the saturation point of the virus growth. It is estimated via the logistic model using the equation:

y(t) = c / (1 + a * e^(-b*t))

b. To demonstrate the spread of COVID-19 using this equation, the three parameters a, b and c are computed by nonlinear least squares estimation using the curve_fit function of SciPy's optimize module in Python. The logistic curve is then fit to the dataset using these parameters and compared with the actual number of confirmed cases at each time instant (refer to Table 1 and Figure 1).

c. Before feeding the dataset to the Prophet model to forecast growth, we need to estimate the carrying capacity c, which is computed as follows:

Rule 1: When the fastest growth day is still ahead, growth is still increasing. In this case, take the fastest growth day and add ten days.

Rule 2: When the fastest growth day is in the past, growth has stabilized. In this case, take the current day and add ten days to find the estimated highest number of infections.

d. The fastest growth day identified by our logistic model is 18th May, which is in the past because we analysed data up to 19th May; therefore Rule 2 applies to our model. The parameter c is 194401.71436567788, which is the maximum limit for the number of infections in India. The Prophet model was then fit using this value of c (194401.71436567788, shown by the horizontal line at the top of Figure 2). From the forecast curve (Figure 2) it can be inferred that the growth of COVID-19 in India is expected to stabilize after July 10th (refer to Table 2).
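The choice between Rule 1 and Rule 2 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the logistic form c / (1 + a·exp(−b·t)), for which the fastest growth day is ln(a)/b, and it assumes that day 0 is 30th January 2020, the date of the first confirmed case. The function names are hypothetical. Note that the text reports fitting Prophet with c itself; the sketch only shows which day each rule selects.

```python
import math
from datetime import date, timedelta

a, b = 11331.051527372085, 0.08525654103333764  # from Table 1
first_day = date(2020, 1, 30)  # assumed day 0: first confirmed case in India
last_day = date(2020, 5, 19)   # last day of data analysed


def fastest_growth_day(a, b):
    """Inflection point of c / (1 + a*exp(-b*t)), as a whole day index."""
    return int(math.log(a) / b)


def capacity_reference_day(t_fastest, t_today, offset=10):
    """Rule 1: fastest growth still ahead -> fastest day + offset.
    Rule 2: fastest growth not ahead (past or today) -> current day + offset."""
    if t_fastest > t_today:
        return "Rule 1", t_fastest + offset
    return "Rule 2", t_today + offset


t_today = (last_day - first_day).days
t_fastest = fastest_growth_day(a, b)
rule, ref_day = capacity_reference_day(t_fastest, t_today)

print("fastest growth date:", first_day + timedelta(days=t_fastest))
print(rule, "reference date:", first_day + timedelta(days=ref_day))
```

With the Table 1 parameters and the day-0 assumption above, the fastest growth date falls on 18th May 2020, one day before the last day of data, so Rule 2 is selected.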

Table 2: Growth Stabilization of COVID-19 in India

Carrying Capacity	Expected Stabilization Period
194401.71436567788	After July 10th
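A hedged sketch of how a carrying capacity is passed to Prophet is given below. Prophet's logistic trend expects a `cap` column next to `ds` and `y`, both in the training data and in the future data frame. The helper names `to_prophet_rows` and `forecast_logistic` and the toy counts are hypothetical, and the import name depends on the installed version (`prophet` in recent releases, `fbprophet` in 2020-era ones).

```python
from datetime import date, timedelta

CAP = 194401.71436567788  # carrying capacity c from Table 1 / Table 2


def to_prophet_rows(start, counts, cap=CAP):
    """Build Prophet-style records: ds (date), y (value), cap (ceiling)."""
    return [
        {"ds": start + timedelta(days=i), "y": y, "cap": cap}
        for i, y in enumerate(counts)
    ]


def forecast_logistic(rows, periods, cap=CAP):
    """Fit Prophet with a logistic trend and forecast `periods` days ahead."""
    # Imported lazily so the data-preparation helper works without them.
    import pandas as pd
    from prophet import Prophet  # older installs: from fbprophet import Prophet

    df = pd.DataFrame(rows)
    df["ds"] = pd.to_datetime(df["ds"])
    model = Prophet(growth="logistic")
    model.fit(df)
    future = model.make_future_dataframe(periods=periods)
    future["cap"] = cap  # the cap must also be set on the future rows
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical toy counts, only to show the expected input shape.
rows = to_prophet_rows(date(2020, 1, 30), [1, 1, 2, 3, 3])
print(rows[0])
```

In practice `forecast_logistic` would be called with the full series of confirmed cases; the five toy values here are far too few to fit a model and only illustrate the column layout.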

To evaluate the model fit, we used cross-validation. The first case of COVID-19 in India was confirmed on 30th January and data collection ended on 19th May, so we had 110 days of data. The initial 100 days (training data) were used to train the model and the remaining 10 days (test data) were used to evaluate it (refer to Table 3). The outcomes obtained are shown in Table 4.

Table 3: Parameters of Cross Validation

Initial	Period	Horizon
100 days	3 days	10 days
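To make the roles of the three values in Table 3 concrete, the following sketch lists the cutoff dates that a rolling-origin evaluation with these settings would produce. It is a simplified illustration of the idea behind Prophet's cross-validation, not the library's own code; the function name `rolling_cutoffs` is hypothetical, and the first and last dates are taken from the text.

```python
from datetime import date, timedelta


def rolling_cutoffs(first, last, initial, period, horizon):
    """Cutoff dates for rolling-origin evaluation, newest first.

    Each cutoff leaves `horizon` days of held-out data after it and at
    least `initial` days of training data before it; consecutive cutoffs
    are `period` days apart.
    """
    cutoffs = []
    cutoff = last - timedelta(days=horizon)
    while cutoff >= first + timedelta(days=initial):
        cutoffs.append(cutoff)
        cutoff -= timedelta(days=period)
    return cutoffs


cutoffs = rolling_cutoffs(
    first=date(2020, 1, 30),  # first confirmed case in India
    last=date(2020, 5, 19),   # last day of collected data
    initial=100, period=3, horizon=10,  # values from Table 3
)
print(cutoffs)
```

With 110 days of data, an initial window of 100 days and a horizon of 10 days, only one cutoff fits, so the 3-day period never comes into play.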

Table 4 compares the predicted and actual values, together with the lower and upper bounds of the confidence interval. Data up to 9th May were used for training, hence the cutoff date is 2020-05-09. The error diagnostics of the model are shown in Table 5. The minimum error is reported at a horizon of 8 days, meaning that the number of confirmed cases was predicted most accurately 8 days ahead of the cutoff.
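The kind of error diagnostics summarised in such a table can be computed from paired actual and predicted values. The sketch below shows four common measures (MSE, RMSE, MAE and MAPE); which measures the authors actually report is not stated here, so the selection is an assumption, and the numbers are hypothetical and not taken from Table 4.

```python
import math


def error_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE for paired observations."""
    errors = [p - y for y, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / y) for e, y in zip(errors, actual)) / n
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


# Hypothetical values for illustration only.
y = [100.0, 200.0, 400.0]
yhat = [110.0, 190.0, 420.0]
metrics = error_metrics(y, yhat)
print(metrics)
```

Computing these measures separately for each forecast horizon (1 day ahead, 2 days ahead, and so on) shows at which horizon the model is most accurate.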

Table 4: Cross validation: Predicted (yhat) vs. Actual (y) values of Confirmed Cases

Table 5: Error Diagnostics of the Confirmed Cases

Significant Inferences from Data Analysis:

Lockdown 4.0 runs until 31st May. The confirmed-case data fit a logistic curve. These data were collected in a controlled environment (lockdown, social distancing, etc.). From the logistic curve it can be inferred that India is in the second part of the logistic growth curve, so the number of confirmed cases is going to stabilize soon, provided the controlled conditions are sustained. The preventive measures have therefore been successful in the controlled setup. The maximum number of people who can be infected by this virus, as inferred from the logistic model, is 194401.71436567788. This means that if the controlled conditions continue to prevail, the growth of COVID-19 will become stable after 10th July and the number of confirmed cases will come close to the carrying capacity. Also, as is evident from the figures below, the number of confirmed cases on 31st May will be 1,44,335 (refer to Table 6), which is much less than the carrying capacity of the logistic model (194401.71436567788); therefore, the strict preventive measures should be continued at least until 10th July.

Table 6: Forecast of Confirmed Cases in India

In this paper, the data provide a simple way to understand the spread of COVID-19 and to verify whether preventive measures such as social distancing, quarantine, and contact tracing are effective. The authors have forecast the number of confirmed coronavirus cases in India. Figure 2 shows the forecast curve of confirmed cases. We have shown that the growth of confirmed cases will reach a stable point after 10th July.

Competing Interests

The authors confirm that they have no known involvement in any organization or other entity with any financial interest. The authors also declare that they have no known personal relationships that could influence the work reported in this paper.

[1] Coronavirus Source Data, Oxford Martin School, 2020. https://covid.ourworldindata.org/data/owid-covid-data.csv
[2] https://facebook.github.io/prophet/docs/quick_start.html
[3] https://towardsdatascience.com/forecasting-with-prophet-d50bbfe95f91
[4] https://www.analyticsvidhya.com/blog/2018/05/generate-accurate-forecasts-facebook-prophet-python-r/
[5] Sean J. Taylor, Benjamin Letham, Forecasting at Scale, Facebook, Menlo Park, California, United States. https://peerj.com/preprints/3190.pdf
[6] Işil Yenidoğan, Aykut Çayir, Ozan Kozan, Tuğçe Dağ, Çiğdem Arslan, Bitcoin Forecasting Using ARIMA and PROPHET, 2018 3rd International Conference on Computer Science and Engineering (UBMK). DOI: 10.1109/UBMK.2018.8566476
