World Health Organization (WHO) time series data were used for this empirical study. The data cover the period from January 22, 2020 to May 28, 2020 and include confirmed cases, deaths and recovered cases for all countries [8]. This article uses these data to analyze and forecast COVID-19 worldwide in terms of confirmed cases, deaths and recoveries. The following time series methods were used to analyze and forecast the number of COVID-19 patients worldwide.
We have about five months of data (January 2020 to May 2020), and from these data we forecast the number of deaths.
Data Preprocessing
To create the training and test sets for modeling:
- The first four months (January 2020 to April 2020) are used as training data, and the following month (May 2020) is used as test data.
- The data set is aggregated on a daily basis.
The training and test portions of the data set cover different time periods, as shown in Figure 1.
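The split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_date` and the placeholder values are assumptions, and only the date range and the 1 May cut-off come from the text.

```python
from datetime import date, timedelta


def split_by_date(series, split_date):
    """Split (date, value) pairs into train (before split_date) and test (from split_date on)."""
    train = [(d, v) for d, v in series if d < split_date]
    test = [(d, v) for d, v in series if d >= split_date]
    return train, test


# Placeholder daily series (hypothetical values, not WHO data): 22 Jan to 28 May 2020.
start, end = date(2020, 1, 22), date(2020, 5, 28)
n_days = (end - start).days + 1
daily = [(start + timedelta(days=i), float(i)) for i in range(n_days)]

# January to April is used for training, May for testing.
train, test = split_by_date(daily, date(2020, 5, 1))
print(len(train), len(test))  # 100 28
```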
Naïve Method
The naive method forecasts the next day by taking the value of the last observed day and assuming that the next day's value will be the same [9]. In other words, we assume that the next predicted point is equal to the last observed point, i.e.
$$\hat{y}_{t+1} = y_t$$
We now apply the naive method to predict the worldwide deaths observed in the test data set. In Figure 2, the y-axis shows the number of deaths and the x-axis shows time (months).
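As a minimal sketch of the naive method, the last training value is simply repeated over the forecast horizon. The function name and the sample values are illustrative assumptions, not the study's data.

```python
def naive_forecast(train, horizon):
    """Repeat the last observed training value for every future step."""
    if not train:
        raise ValueError("train must not be empty")
    return [train[-1]] * horizon


train_values = [10.0, 12.0, 15.0, 19.0]  # illustrative values only
print(naive_forecast(train_values, 3))  # [19.0, 19.0, 19.0]
```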
Simple Average Method
In some cases, the values in the data rise and fall randomly by small amounts while the average remains constant. Although the data set varies slightly throughout the period, the average at each point in time remains unchanged [10]. We can then forecast the next day's value as being close to the average of all previous days. This forecasting method, in which the expected value equals the average of all previously observed points, is called the simple average method. We take all previously known values, compute their average, and use it as the next value, i.e.
$$\hat{y}_{t+1} = \frac{1}{t}\sum_{i=1}^{t} y_i$$
Figure 3 shows the resulting forecast graphically.
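A minimal sketch of the simple average method, assuming a flat forecast equal to the mean of all training values. The function name and sample values are illustrative.

```python
def simple_average_forecast(train, horizon):
    """Forecast every future step as the mean of all training observations."""
    if not train:
        raise ValueError("train must not be empty")
    mean = sum(train) / len(train)
    return [mean] * horizon


train_values = [10.0, 12.0, 15.0, 19.0]  # illustrative values only
print(simple_average_forecast(train_values, 2))  # [14.0, 14.0]
```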
Moving Average Method
In this data set, the values rose or fell sharply over several time ranges. The simple average method uses the mean of all past observations, so values from the early period strongly influence the forecast. As an improvement, we take the average over only the most recent time periods.
A forecasting technique that uses a sliding window of time periods to calculate the average is called the moving average method [11].
Using a simple moving average model, we forecast the next value(s) in a time series based on the average of a fixed finite number p of the previous values. Thus, for all i > p
$$\hat{y}_i = \frac{1}{p}\left(y_{i-1} + y_{i-2} + \cdots + y_{i-p}\right)$$
Figure 4 plots the deaths against time for the moving average method.
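The moving average forecast can be sketched as below, assuming the forecast is held flat at the mean of the last p training values. The function name, the window size and the sample values are illustrative assumptions.

```python
def moving_average_forecast(train, p, horizon):
    """Forecast every future step as the mean of the last p training observations."""
    if p < 1 or p > len(train):
        raise ValueError("p must be between 1 and len(train)")
    window_mean = sum(train[-p:]) / p
    return [window_mean] * horizon


train_values = [10.0, 12.0, 15.0, 19.0]  # illustrative values only
print(moving_average_forecast(train_values, p=2, horizon=3))  # [17.0, 17.0, 17.0]
```

With p equal to the length of the training set, this reduces to the simple average method; with p = 1 it reduces to the naive method.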
Simple Exponential Smoothing Method
It may be sensible to attach larger weights to more recent observations than to observations from the distant past. The technique that works on this principle is called simple exponential smoothing [12].
Forecasts are calculated using weighted averages in which the weights decrease exponentially as the observations come from further in the past; the smallest weights are associated with the oldest observations:
$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha)\, y_{T-1} + \alpha(1-\alpha)^2\, y_{T-2} + \cdots$$
Figure 5 plots the deaths against time for simple exponential smoothing.
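The weighted average above is usually computed recursively. The following is a minimal sketch under the assumption that the smoothed level is initialised with the first observation; the function name and sample values are illustrative.

```python
def ses_forecast(y, alpha):
    """One-step-ahead simple exponential smoothing forecast.

    Uses the recursion level = alpha * y_t + (1 - alpha) * level,
    which unrolls into exponentially decaying weights on past observations.
    """
    if not y:
        raise ValueError("y must not be empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    level = y[0]  # assumption: initialise the level with the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level


print(ses_forecast([10.0, 12.0, 15.0, 19.0], alpha=0.5))  # 16.0
```

With alpha = 1 the forecast equals the last observation (the naive method); smaller values of alpha give more weight to older observations.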
Holt’s Linear Trend Method
We need a method that can capture the trend accurately without any assumptions. A method that takes the trend of the data set into account is called Holt’s linear trend method [13]. Every time series data set can be decomposed into its components, which are trend, seasonality and residual.
We can see from Figure 6 that this data set follows an increasing trend. Hence we can use Holt’s linear trend method to forecast future values.
To forecast data with a trend we need three equations: one for the level, one for the trend, and one that combines level and trend to give the expected forecast ŷ.
Forecast: $$\hat{y}_{t+h|t} = l_t + h\, b_t$$
Level: $$l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1})$$
Trend: $$b_t = \beta\,(l_t - l_{t-1}) + (1-\beta)\, b_{t-1}$$
In the three equations above, the level and the trend are added to produce the forecast equation.
Similarly, for the model shown in Figure 7, the trend equation is a weighted average of the estimated trend at time t, based on $l_t - l_{t-1}$, and $b_{t-1}$, the previous estimate of the trend.
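The level, trend and forecast equations can be sketched in a few lines. The initialisation (first observation as the level, first difference as the trend) is an assumption made for illustration, as are the function name and the sample values.

```python
def holt_linear_forecast(y, alpha, beta, horizon):
    """Holt's linear trend method: returns forecasts for steps 1..horizon."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialisation: level = first value, trend = first difference.
    level, trend = y[0], y[1] - y[0]
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (prev_level + trend)  # level equation
        trend = beta * (level - prev_level) + (1 - beta) * trend   # trend equation
    return [level + h * trend for h in range(1, horizon + 1)]      # forecast equation


# On perfectly linear data the method continues the line.
forecasts = holt_linear_forecast([2.0, 4.0, 6.0, 8.0], alpha=0.8, beta=0.2, horizon=3)
print([round(f, 6) for f in forecasts])  # [10.0, 12.0, 14.0]
```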
Holt-Winters Method
The Holt-Winters method applies exponential smoothing to the seasonal components in addition to the level and the trend [14].
The Holt-Winters method takes the seasonality factor into account. The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations: one for the level $L_t$, one for the trend $b_t$ and one for the seasonal component $S_t$, with smoothing parameters α, β and γ.
Level: $$L_t = \alpha\,(y_t - S_{t-s}) + (1-\alpha)(L_{t-1} + b_{t-1})$$
Trend: $$b_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\, b_{t-1}$$
Seasonal: $$S_t = \gamma\,(y_t - L_t) + (1-\gamma)\, S_{t-s}$$
Forecast: $$F_{t+k} = L_t + k\, b_t + S_{t+k-s}$$
where s is the length of the seasonal period and 0 ≤ α ≤ 1, 0 ≤ β ≤ 1 and 0 ≤ γ ≤ 1.
As illustrated in Figure 8, the level equation is a weighted average between the seasonally adjusted observation at time t and the non-seasonal forecast.
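A minimal sketch of the additive Holt-Winters recursions follows. The initialisation is a simple heuristic chosen for illustration (mean of the first season as the level, change between the first two seasonal means as the trend), not necessarily what the study used; the function name and sample values are also assumptions.

```python
def holt_winters_additive(y, s, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecasts for steps 1..horizon with season length s."""
    if len(y) < 2 * s:
        raise ValueError("need at least two full seasons of data")
    # Assumed heuristic initialisation from the first two seasons.
    level = sum(y[:s]) / s
    trend = (sum(y[s:2 * s]) - sum(y[:s])) / (s * s)
    seasonal = [y[i] - level for i in range(s)]
    for t in range(s, len(y)):
        prev_level = level
        level = alpha * (y[t] - seasonal[t - s]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal.append(gamma * (y[t] - level) + (1 - gamma) * seasonal[t - s])
    n = len(y)
    # Reuse the most recent seasonal estimates, wrapping around for k > s.
    return [level + k * trend + seasonal[n - s + (k - 1) % s]
            for k in range(1, horizon + 1)]


# A purely seasonal series with period 2 and no trend is reproduced.
forecasts = holt_winters_additive([1.0, 3.0, 1.0, 3.0, 1.0, 3.0], s=2,
                                  alpha=0.5, beta=0.5, gamma=0.5, horizon=3)
print([round(f, 6) for f in forecasts])  # [1.0, 3.0, 1.0]
```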
Root Mean Squared Error (RMSE)
In regression-based prediction, we predict the average y value associated with a given x value and then need a measure of the spread of the y values around that prediction. To construct the RMS error, we first determine the residuals. A residual is the difference between the actual value and the predicted value [15]. A residual may be positive or negative, depending on whether the predicted value is lower or higher than the actual value. We square the residuals, average the squares, and then take the square root to obtain the RMS error. The RMS error, which is always non-negative, then serves as a measure of the spread of the y values around the predicted y values.
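$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
where $y_i$ is the observed value for the ith observation, $\hat{y}_i$ is the predicted value, and n is the number of observations.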
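The RMSE computation (square the residuals, average, take the square root) can be sketched as follows. The function name and the sample values are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_residuals = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_residuals) / len(actual))


# Every residual is -2, so the RMSE is 2.
print(rmse([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]))  # 2.0
```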
The models above can be compared by their RMSE scores, listed in Table 1.
Table 1: Comparison of models by RMSE values on test data
| Model | RMSE |
|---|---|
| Naïve Method | 99.98448367289042 |
| Simple Average | 655.4500199405554 |
| Moving Average | 565.8570072290203 |
| Simple Exponential Smoothing | 110.09483260989167 |
| Holt’s Linear Trend | 277.164232654063 |
| Holt-Winters | 236.48593103685542 |
ARIMA
ARIMA stands for autoregressive integrated moving average. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the correlations within the data [16]. An extension of ARIMA is seasonal ARIMA, which accounts for the seasonality of the data set just as the Holt-Winters method does. The general ARIMA prediction equation for y is:
$$\hat{y}_t = \mu + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}$$
The moving average parameters (θ) are defined here so that their signs are negative in the equation. The first-order parameters appear as ar.L1 and ma.L1 in Table 2. A stationary series may still have autocorrelated errors, which indicates that some number of AR terms (p ≥ 1) and/or MA terms (q ≥ 1) are also needed in the prediction equation.
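To make the prediction equation concrete, the sketch below evaluates it for the simplest case p = q = 1 on a series that is assumed to be already stationary. It only applies given parameters and does not estimate them; the function name, the zero starting error and the sample values are assumptions for illustration.

```python
def arma11_one_step(y, mu, phi1, theta1):
    """In-sample one-step predictions y_hat_t = mu + phi1*y_{t-1} - theta1*e_{t-1}."""
    predictions = []
    prev_error = 0.0  # assumption: the error before the first prediction is zero
    for t in range(1, len(y)):
        y_hat = mu + phi1 * y[t - 1] - theta1 * prev_error
        predictions.append(y_hat)
        prev_error = y[t] - y_hat  # e_t = observed minus predicted
    return predictions


print(arma11_one_step([1.0, 2.0, 4.0], mu=0.0, phi1=1.0, theta1=0.5))  # [1.0, 1.5]
```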
Table 2: SARIMAX Results
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| ar.L1 | 1.9024 | 5.909 | 0.322 | 0.747 | -9.679 | 13.484 |
| ma.L1 | 65.0254 | 2.84e+04 | 0.002 | 0.998 | -5.55e+04 | 5.56e+04 |
| sigma2 | 9.1957 | 8028.624 | 0.001 | 0.999 | -1.57e+04 | 1.57e+04 |
The coef column shows the weight (importance) of each term and how it affects the time series [17]. The P>|z| column reports the significance of each weight. Here, the p-values in Table 2 (0.747, 0.998 and 0.999) are well above 0.05, so the individual weights are not statistically significant at the 5% level and should be interpreted with caution; we nevertheless retain all terms in our model.
Figure 9 shows the model diagnostics, which we examine for any unusual behavior.
The diagnostics suggest that the model residuals are normally distributed, based on the following:
- In the histogram with estimated density plot, the red KDE line closely follows the N(0,1) line, where N(0,1) is the standard notation for a normal distribution with mean 0 and standard deviation 1. This indicates that the residuals are normally distributed.
- The Q-Q plot shows that the ordered distribution of residuals (blue dots) follows the linear trend of samples taken from a standard normal distribution N(0,1). This again indicates that the residuals are normally distributed.
- There is no obvious seasonal variation in the standardized residuals over time; they appear to be white noise.
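The visual checks above can be complemented by a simple numerical check: for white noise, the sample autocorrelation of the residuals at any non-zero lag should be close to zero. The sketch below is an illustration only; the function name and the sample residuals are assumptions and are not taken from the fitted model.

```python
import statistics


def lag_autocorrelation(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must be between 1 and len(x) - 1")
    mean = statistics.fmean(x)
    denominator = sum((v - mean) ** 2 for v in x)
    if denominator == 0:
        raise ValueError("x must not be constant")
    numerator = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, len(x)))
    return numerator / denominator


# A strictly alternating series is strongly negatively autocorrelated, so it is not white noise.
print(lag_autocorrelation([1.0, -1.0, 1.0, -1.0], lag=1))  # -0.75
```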
Although the fit is satisfactory, some parameters of our seasonal ARIMA model could be changed to improve the model fit.
Forecasting Visualization
In the last step, we used our seasonal ARIMA time series model to forecast future values, as shown in Figure 10 [18].
The forecasts and their associated confidence intervals can be used to further understand the time series. Our forecasts indicate that the time series is expected to continue growing at a steady pace.
As we forecast further into the future, it is natural to become less confident in the predicted values. The confidence intervals produced by our model reflect this: they grow wider the further ahead we forecast.
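The widening of the intervals can be illustrated with the simplest possible case, a random walk, for which the forecast variance at horizon h is h times the one-step variance. This is a simplified stand-in chosen for illustration, not the seasonal ARIMA model fitted in this study; the function name, the step standard deviation `sigma` and the sample values are assumptions.

```python
import math


def random_walk_interval(last_value, sigma, horizon, z=1.96):
    """Approximate 95% forecast intervals for a random walk, for steps 1..horizon."""
    intervals = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)  # standard error grows with sqrt(h)
        intervals.append((last_value - half_width, last_value + half_width))
    return intervals


for h, (low, high) in enumerate(random_walk_interval(100.0, 1.0, 4), start=1):
    print(h, round(low, 2), round(high, 2))
```

The interval at step 4 is twice as wide as the interval at step 1, since the half-width scales with the square root of the horizon.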