The earliest Covid-19 patients were recorded in the data set on January 22, 2020. We use the records from January 22, 2020 to June 29, 2020. The data set consists of 160 instances and five attributes: the date of recording, confirmed cases, recovered cases, deaths, and the growth (increase) rate of Covid-19 cases. The following estimates are computed from the data set to explore it and extract useful information.
Correlation coefficients
The correlation coefficient is a statistical measure of the strength of the linear relationship between the relative movements of two variables. Its values range from -1 to +1. A computed value greater than +1 or less than -1 indicates an error in the correlation measurement. A value of -1 indicates a perfect negative correlation, a value of +1 indicates a perfect positive correlation, and a value of 0.0 indicates no linear relationship between the two variables [24].
Correlation statistics can be used to describe the relationship between different attributes of the disease. We calculated correlation coefficients to determine the level of correlation among confirmed cases, recovered cases, deaths, and the increase rate under the current pandemic situation, as shown in Table 1 and Figure 3. We found that confirmed cases and recovered cases are highly positively correlated (0.986), whereas the increase rate is negatively correlated with the other three attributes (between -0.34 and -0.40).
Table 1: Correlation Coefficients of attributes
| Attribute | Confirmed | Recovered | Deaths | Increase rate |
|---|---|---|---|---|
| Confirmed | 1.000000 | 0.986051 | 0.988177 | -0.378478 |
| Recovered | 0.986051 | 1.000000 | 0.950569 | -0.337027 |
| Deaths | 0.988177 | 0.950569 | 1.000000 | -0.401742 |
| Increase rate | -0.378478 | -0.337027 | -0.401742 | 1.000000 |
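The computation behind a correlation matrix such as Table 1 can be sketched as follows. This is a minimal illustration that assumes Pearson's r (the text does not name the variant used). The helper name `pearson_r` and the sample series are hypothetical placeholders, not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Illustrative placeholder values (NOT the study's data).
confirmed = [10, 20, 40, 80, 160]
recovered = [1, 3, 9, 20, 45]
increase_rate = [0.9, 0.7, 0.5, 0.4, 0.2]

r_conf_rec = pearson_r(confirmed, recovered)       # strongly positive
r_conf_rate = pearson_r(confirmed, increase_rate)  # negative
print(round(r_conf_rec, 3), round(r_conf_rate, 3))
```

A coefficient close to +1, as between confirmed and recovered cases in Table 1, means the two series rise together almost linearly. A negative coefficient means one tends to fall as the other rises.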
ARIMA Model Results
An ARIMA model requires choosing the parameters p, d, and q [28]. Rather than selecting them by inspecting plots, we use auto_arima to find appropriate values. The auto_arima function works by conducting differencing tests (Kwiatkowski–Phillips–Schmidt–Shin, Augmented Dickey-Fuller, or Phillips–Perron) to determine the order of differencing, d, and then fitting models within the defined start_p, max_p, start_q, and max_q ranges [25]. If the seasonal option is enabled, auto_arima also seeks to identify the optimal P and Q hyperparameters after conducting the Canova-Hansen test to determine the optimal order of seasonal differencing, D. Figure 4 shows the parameters obtained by the auto_arima model.
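The role of the differencing order d can be illustrated with a small sketch. It only shows what differencing does to a trending series. It does not reproduce how auto_arima selects d, which relies on the statistical tests named above. The helper `difference` and the quadratic sample series are hypothetical.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out

# Illustrative series with a quadratic trend (NOT the study's data): y_t = t^2.
series = [t * t for t in range(8)]

first = difference(series, d=1)   # still increasing, so not yet stationary
second = difference(series, d=2)  # constant, so the trend has been removed
print(first)
print(second)
```

Each round of differencing shortens the series by one observation. A series whose trend only vanishes after two rounds is a candidate for d = 2.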
Figure 5 shows the residual plots from the auto_arima model.
The output of the auto_arima model is interpreted as follows:
Standardized residual: The residual errors fluctuate around a mean of zero and have a uniform variance.
Histogram and density plot: The density plot suggests that the residuals are roughly normally distributed around a mean of zero.
QQ-plot: All blue dots (the ordered distribution of residuals) fall on the red line; any significant deviation would imply a skewed distribution. The residuals are therefore approximately normally distributed, following N(0, 1).
Correlogram: The correlogram, or ACF plot, shows that the residual errors are not autocorrelated. Any autocorrelation would imply that there is some pattern in the residual errors that is not explained by the model.
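The correlogram check can be sketched with a hand-written sample autocorrelation function. This is an illustration only: the helper `autocorrelation`, the simulated noise, and the trending series are hypothetical stand-ins for the model residuals. The bound 1.96/sqrt(n) is the usual approximate 95% band for white noise.

```python
import math
import random

def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a positive lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
    return numer / denom

# Simulated white noise as a stand-in for well-behaved residuals (NOT the study's data).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
# A trending series as a stand-in for residuals that still contain a pattern.
trend = [float(t) for t in range(100)]

bound = 1.96 / math.sqrt(len(noise))
noise_acf = [autocorrelation(noise, k) for k in range(1, 11)]
trend_acf1 = autocorrelation(trend, 1)

print("approximate 95% bound:", round(bound, 3))
print("largest |ACF| of the noise, lags 1-10:", round(max(abs(r) for r in noise_acf), 3))
print("lag-1 ACF of the trending series:", round(trend_acf1, 3))
```

Autocorrelations that stay near zero are consistent with residuals that carry no remaining structure. A value close to 1, as for the trending series, would indicate a pattern the model has not captured.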
The optimal values of p, d, and q obtained by the auto_arima model are 1, 2, and 2, respectively. We then fit an ARIMA model with these parameters (1, 2, 2); the results are shown in Figure 6.
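For reference, an ARIMA(1, 2, 2) model can be written in backshift notation as below. The symbols are standard notation assumed here rather than taken from the source: B is the backshift operator (B y_t = y_{t-1}), \phi_1 is the autoregressive coefficient, \theta_1 and \theta_2 are the moving-average coefficients, and \varepsilon_t is white noise. A constant term is omitted, and the sign convention for the moving-average terms varies between textbooks and software.

```latex
(1 - \phi_1 B)\,(1 - B)^2\, y_t = (1 + \theta_1 B + \theta_2 B^2)\,\varepsilon_t
```

The factor (1 - B)^2 corresponds to d = 2: the series is differenced twice before the autoregressive and moving-average terms are applied.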
Figure 6 shows the significance of the terms in the ARIMA model. Here we focus on the coefficient table. The coef column shows the weight of each term and how it affects the time series. The P>|z| column indicates the significance of each weight. Here, the p-value of each weight is less than or close to 0.05, so it is reasonable to retain each term in our model.
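The relationship between a coefficient, its standard error, and the reported P>|z| value can be sketched as follows. This assumes the usual large-sample z-test with a standard normal reference distribution. The helper `two_sided_p_value` and the input numbers are hypothetical and are not the values in Figure 6.

```python
import math

def two_sided_p_value(coef, std_err):
    """Return (z, p) for the test of coef = 0 under a standard normal reference."""
    z = coef / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical coefficient and standard error (NOT taken from Figure 6).
z, p = two_sided_p_value(coef=0.5, std_err=0.2)
print(round(z, 2), round(p, 4))

# A z statistic of about 1.96 corresponds to the conventional 0.05 threshold.
_, p_threshold = two_sided_p_value(coef=1.96, std_err=1.0)
print(round(p_threshold, 3))
```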
These observations suggest that our model provides a good fit, which can help us understand the time series and forecast future values. Although the fit is reasonable, some parameters of the ARIMA model could be changed to improve it further. Having obtained a model for the time series, we can now use it to produce forecasts [26]. We first compare the predicted values with the actual values of the time series, which helps us assess the accuracy of the forecasts. The forecasts and their associated confidence intervals can then be used to further understand the time series and anticipate what to expect. Our data suggest that the time series maintains a consistent growth rate.
As we forecast further into the future, it is natural to become less confident in the predicted values. This is reflected in the confidence intervals generated by our model, which grow wider as we move further into the future. We predict death cases on the test data set with a 95% confidence level. Figure 7 shows the prediction results.
In Figure 7, the actual deaths in the training data set are shown by the blue line, and the predicted deaths are shown by the red line. The predicted deaths on the red line decline, which suggests that the incidence of deaths will decrease in the future, as more and more people recover quickly and people maintain social distancing during the pandemic.
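The widening of the confidence interval with the forecast horizon can be illustrated with the simplest case, a random walk, where the interval half-width grows with the square root of the horizon. This is a sketch under that simplifying assumption. The exact interval of an ARIMA(1, 2, 2) model follows a different formula, and the helper `random_walk_interval` and its inputs are hypothetical.

```python
import math

def random_walk_interval(last_value, sigma, horizon, z=1.96):
    """Approximate 95% forecast interval for a random walk, `horizon` steps ahead."""
    # The forecast variance of a random walk grows linearly with the horizon.
    half_width = z * sigma * math.sqrt(horizon)
    return last_value - half_width, last_value + half_width

widths = []
for h in (1, 5, 10, 20):
    # Hypothetical last observation and innovation standard deviation.
    low, high = random_walk_interval(last_value=100.0, sigma=2.0, horizon=h)
    widths.append(high - low)
    print(h, round(low, 2), round(high, 2))
```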
Using statistical measures, we created summary metrics that condense the residuals into single values related to the model's predictive ability.
To evaluate the prediction results, we apply commonly used accuracy measures; the results are shown in Table 2.
Table 2: Accuracy measures of the ARIMA model
| Measures of Accuracy | Value |
|---|---|
| Mean Absolute Error (MAE) | 0.12044588473307338 |
| Mean Squared Error (MSE) | 0.023012953284359018 |
| Root Mean Squared Error (RMSE) | 0.15170020858376898 |
| Mean Absolute Percentage Error (MAPE) | 0.009196691386663233 |
The MAE of our model is 0.1204, which is quite small given that the death-case values in our data start at 0.01.
The MSE, 0.0230, is smaller than the MAE. This is expected here: because the typical error is well below 1, squaring the errors shrinks them, so the MSE is about five times smaller than the MAE.
The RMSE, 0.1517, is analogous to a standard deviation and measures how widely the residuals are spread.
A MAPE of about 0.92% implies that the model is about 99.08% accurate in predicting the test set observations.
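The four accuracy measures can be computed as in the following sketch. The helper `accuracy_metrics` and the sample values are hypothetical placeholders, not the study's data. The final check uses only the MSE and RMSE values reported in Table 2.

```python
import math

def accuracy_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (as a fraction) for paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined if any actual value is zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Illustrative placeholder values (NOT the study's data).
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 14.0, 22.0]
metrics = accuracy_metrics(actual, predicted)
print(metrics)

# Consistency check on the values reported in Table 2: RMSE = sqrt(MSE).
reported_mse = 0.023012953284359018
reported_rmse = 0.15170020858376898
rmse_matches = math.isclose(math.sqrt(reported_mse), reported_rmse, rel_tol=1e-6)
print(rmse_matches)
```

Because MAPE is returned as a fraction, it is multiplied by 100 to express it as a percentage.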
Regression Model Results
To find out which factors have the most significant influence on the predicted output and how the various factors relate to each other, we consider the input features "confirmed cases", "recovered cases", and "increase rate". Based on these features, we predict the deaths of Covid-19 patients. The data set is split 80%:20% into training and testing sets, respectively.
In multiple linear regression, the model selects the best-fitting coefficients for all attributes [27]. The coefficients of the regression model are shown in Table 3.
Table 3: Coefficients of the regression model
| Attributes | Coefficient |
|---|---|
| Confirmed | 0.103305 |
| Recovered | -0.100568 |
| Increase rate | 69.616876 |
From Table 3, it is clear that, with the other attributes held fixed, an increase of 1 unit in “recovered case” corresponds to a decrease of 0.1006 units in “death case”, and vice versa. Similarly, an increase of 1 unit in “confirmed case” or “increase rate” corresponds to an increase in “death case” of 0.1033 units or 69.6169 units, respectively.
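The regression procedure (an 80%:20% split followed by a least-squares fit) can be sketched as below. This is a minimal illustration under stated assumptions: it uses NumPy's `numpy.linalg.lstsq`, a shuffled split, and synthetic data with hypothetical coefficients. The source does not state which library, split procedure, or intercept was used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic placeholder data (NOT the study's data): 160 rows, 3 features
# standing in for confirmed cases, recovered cases, and increase rate.
n = 160
X = rng.uniform(0.0, 100.0, size=(n, 3))
true_coef = np.array([0.5, -0.3, 2.0])  # hypothetical coefficients
true_intercept = 4.0                    # hypothetical intercept
y = X @ true_coef + true_intercept + rng.normal(0.0, 0.1, size=n)

# Shuffle the row indices and split them 80% / 20% into training and test sets.
idx = rng.permutation(n)
split = int(0.8 * n)
train, test = idx[:split], idx[split:]

# Ordinary least squares; the appended column of ones estimates the intercept.
A_train = np.column_stack([X[train], np.ones(len(train))])
params, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
coef, intercept = params[:3], params[3]

# Predict on the held-out rows and compare with the actual values.
y_pred = X[test] @ coef + intercept
max_abs_error = float(np.max(np.abs(y[test] - y_pred)))
print(coef, intercept, max_abs_error)
```

Each fitted coefficient is read as in Table 3: the expected change in the target for a one-unit change in that feature while the others are held fixed.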
We now predict on the test data and compare the actual values with the predicted values; the differences are shown in Table 4.
Table 4: Difference between the actual value and predicted value
| Instance Number | Actual Value | Predicted Value |
|---|---|---|
| 110 | 286697 | 221975.301362 |
| 112 | 297539 | 286646.565236 |
| 143 | 430047 | 423127.482077 |
| 7 | 133 | -6528.684075 |
| 44 | 3459 | -2713.950271 |
| 101 | 244129 | 236968.993751 |
| 122 | 342565 | 329894.990367 |
| 66 | 31990 | 47224.597929 |
| 85 | 148157 | 160515.287829 |
| 86 | 157022 | 167041.159151 |
| 133 | 386298 | 376198.729391 |
| 92 | 193926 | 198189.689192 |
| 26 | 1868 | -1385.556916 |
| 146 | 443685 | 438945.896459 |
| 119 | 328483 | 318945.015040 |
| 62 | 19026 | 25233.066196 |
| 51 | 5411 | 808.770349 |
| 97 | 221109 | 221511.564448 |
| 128 | 365380 | 355638.073651 |
| 90 | 180475 | 187102.115303 |
Figure 8 plots the actual values against the predicted values for comparison.
As Table 4 and Figure 8 show for the multiple regression model, the predicted number of deaths exceeds the actual number in several of the earlier test instances, but further along in the data, from May 2, 2020 onward, the predicted number of deaths falls below the actual number.
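The comparison of actual and predicted values can be made explicit by ordering the rows of Table 4 by instance number and checking the sign of each difference. The sketch below uses the values listed in Table 4. The variable names and the over/under labels are illustrative choices.

```python
# (instance number, actual deaths, predicted deaths) as listed in Table 4.
rows = [
    (110, 286697, 221975.301362), (112, 297539, 286646.565236),
    (143, 430047, 423127.482077), (7, 133, -6528.684075),
    (44, 3459, -2713.950271), (101, 244129, 236968.993751),
    (122, 342565, 329894.990367), (66, 31990, 47224.597929),
    (85, 148157, 160515.287829), (86, 157022, 167041.159151),
    (133, 386298, 376198.729391), (92, 193926, 198189.689192),
    (26, 1868, -1385.556916), (146, 443685, 438945.896459),
    (119, 328483, 318945.015040), (62, 19026, 25233.066196),
    (51, 5411, 808.770349), (97, 221109, 221511.564448),
    (128, 365380, 355638.073651), (90, 180475, 187102.115303),
]

# Sort by instance number so the rows follow the order of the data set.
rows.sort(key=lambda row: row[0])

for instance, actual, predicted in rows:
    diff = predicted - actual
    label = "over" if diff > 0 else "under"
    print(f"{instance:>4}  actual={actual:>7}  predicted={predicted:>14.2f}  {label}")

# Instances where the model predicts more (or fewer) deaths than observed.
over = [i for i, a, p in rows if p > a]
under = [i for i, a, p in rows if p <= a]
print("over-predicted:", over)
print("under-predicted:", under)
```

Listing the rows in this order shows where the sign of the prediction error changes, and it also exposes the negative predictions for the earliest instances, which a linear model without constraints can produce.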
Overall, this study shows that the reduction in deaths worldwide is a good sign for human society.