Prediction of COVID-19 Cases in Afghanistan Using ARIMA Model

The novel coronavirus disease 2019 (COVID-19) is spreading rapidly in Afghanistan. The country is currently in the third wave of the pandemic, and the government recently recorded the highest number of confirmed cases since the pandemic began. To prepare and to make sound decisions, we need to predict the future course of the outbreak. Past information is used to make such predictions, although no prediction can match the real future exactly. The data used for modelling cover twenty first March 2021 to fifteenth July 2021, and the forecasting period runs for 40 days from sixteenth July 2021 to twenty fourth August 2021. To examine the stationarity of the data, we used the Augmented Dickey-Fuller (ADF) unit-root test. An autoregressive integrated moving average (ARIMA) model was used to predict future COVID-19 cases, and the best ARIMA parameters were selected with Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The data were obtained from the Ministry of Public Health. We found that ARIMA (0, 2, 1) is the best-fitting model. Using ARIMA (0, 2, 1), we predicted COVID-19 cases at the 80 and 95 per cent confidence levels. At the upper bounds of the 95 and 80 per cent confidence levels, COVID-19 positive cases reach 216,159 and 203,979 respectively after the 40-day period. At the lower bounds of the 95 and 80 per cent confidence levels, positive cases reach 145,780 and 157,961 respectively. This is the first study to predict COVID-19 cases in Afghanistan, so the government of Afghanistan, non-governmental organizations (NGOs) and scholars can use it to plan for an uncertain future and protect people against the virus.


Introduction
The first case of COVID-19 was found in China in December 2019, and in March 2020 the World Health Organization (WHO) declared COVID-19 a pandemic, that is, a disease prevalent over a whole country or the world (The New York Times, 2021). Globally, 188,928,123 cases had been detected from the first positive case until fifteenth July 2021, and 4,066,605 people had died (Johns Hopkins Coronavirus Resource Center, 2021). In Afghanistan, 137,853 COVID-19 cases have been confirmed and 5,983 people have died since 24 February 2020 (WHO, 2021). The Afghanistan government announced that, up to 29 June 2021, 1,024,168 vaccine doses had been administered and 253,939 people had been fully vaccinated, which is 1.3% of the population (Reuters, 2021). On 26 June 2021, the Ministry of Public Health announced that, of 19 samples taken, 11 were the Delta variant of the COVID-19 virus and the remainder were the Alpha variant (Tolo News, 2021). According to the data, the Delta variant is 40 to 60 per cent more transmissible than the Alpha variant, which was itself 50 per cent more transmissible than the initial variant first detected in Wuhan, China (Scientific American, 2021). As COVID-19 positive cases rose, the Afghanistan government announced a three-week extension of the lockdown in the capital city, Kabul (Garda World, 2020). The extension, which occurred on 2 May 2020, included a ban on inter-city travel and a reduction of all non-essential services.
In this study, we used a time series model, the ARIMA model, which is a statistical model for forecasting time series data. With this model, we forecast COVID-19 cases in Afghanistan.
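For readers unfamiliar with the notation, a general ARIMA (p, d, q) model can be written as below. This is the standard textbook form, not a formula taken from this study; the symbols B, c, \phi_i, \theta_j and \varepsilon_t are introduced here for illustration only.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right)(1 - B)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right)\varepsilon_t
```

Here y_t is the observed series, B is the backshift operator (B y_t = y_{t-1}), p is the number of autoregressive terms, d is the number of differences, q is the number of moving average terms and \varepsilon_t is white noise. Some texts write the moving average coefficients with the opposite sign.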
The remainder of the study is structured as follows. The second section reviews the existing literature on the application of various forecasting models to the pandemic. The third section sets out the objectives of the present study. The fourth section describes the data, the research methodology and the selection of the best ARIMA model using Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The fifth section presents the results and discussion. The sixth section summarises the results and implications.

Literature review
Abolmaali S. (2021) used the Susceptible-Infected-Recovered (SIR) model, linear regression, the logistic function and the ARIMA model to forecast COVID-19 cases in India, Russia, the United States of America and Brazil. The study compares these four models in terms of accuracy and magnitude of error. Banik A. (2020) analysed the causes of the death rate during the COVID-19 period in 29 developed and developing countries, examining the factors that affect the fatality rate. Benvenuto D. (2020) used the Augmented Dickey-Fuller (ADF) unit-root test to determine whether the data are stationary, and used an ARIMA model to predict the prevalence and incidence of COVID-19 on eleventh and twelfth Feb 2020. Claris S. (2020) used an ARIMA model to forecast daily COVID-19 cases in South Africa. The model predicted positive cases for 20 days and indicated that cases would rise during this period and reach 1,744.
Dahesh T. (2020) used 41 days of past data to build an ARIMA model and used the Augmented Dickey-Fuller (ADF) unit-root test to assess the stationarity of the data. The ARIMA model forecasted COVID-19 cases for 17 days in five different countries. Earnest A. (2005) proposed an ARIMA model to predict bed occupancy in Singapore during the SARS outbreak. The researchers found that ARIMA (1,0,3) was suitable, with a training MAPE of 5.7%, and advised that the model could be used for bed-capacity planning during infectious disease outbreaks.
Elsheikh H. A. (2020) forecasted COVID-19 cases in Saudi Arabia using a long short-term memory (LSTM) network and compared it with nonlinear autoregressive artificial neural networks (NARANN) and the ARIMA model. To assess the accuracy of the models, the researchers used root mean square error (RMSE), coefficient of determination, mean absolute error (MAE), efficiency coefficient (EC), overall index (OI), coefficient of variation (COV) and coefficient of residual mass (CRM). Furthermore, the authors used the LSTM model to predict confirmed cases and deaths in five more countries. The coefficients of determination for total cases and total deaths are 0.976 and 0.944, respectively.
Fanelli D. (2020) analysed and forecasted COVID-19 dynamics in China, Italy and France. The authors applied the susceptible-infected-recovered-deaths (SIRD) model, which gives the recovery, infection and death rates in all three countries. Gupta R. (2020) analysed COVID-19 cases in India and other South Asian countries and proposed a time-series model to predict cases in India. According to this study, cases would reach a million in 30 days.
Kufel T. (2020) used six ARIMA models for each of 32 European countries to predict the dynamics of COVID-19 cases. ARIMA (1,2,0) predicted cases for 7 days. The aim of the study was to assess the usefulness of the ARIMA model for predicting the dynamics of COVID-19 cases.
Katoch R. used the ARIMA model to predict COVID-19 cases in four states of India and in India as a whole. The Augmented Dickey-Fuller (ADF) test was used to check stationarity. The authors used different ARIMA parameters, for instance ARIMA (4,2,7) for India and ARIMA (0,2,1) for Tamil Nadu. Khaliq R. (2020) proposed an ARIMA model to forecast COVID-19 cases in Jammu & Kashmir. ARIMA (1,2,3), ARIMA (0, 2, 2) and ARIMA (0, 2, 2) were applied to forecast confirmed cases, recovered cases and deceased cases, respectively. The data span 9th March 2020 to 30th September 2020, and the ARIMA model predicted COVID-19 cases for one month (October 2020). The study indicates that cases would rise during the forecast period. Kim Y. (2018) studied the transmission dynamics of MERS-CoV in South Korea. The authors used an agent-based model for prediction. This type of model is used for diseases that spread rapidly.
Malki Z. (2020) developed a SARIMA model to predict the end of COVID-19 and the existing cycle of the virus. SARIMA (9,0,8) × (0,0,0,3) was considered the best-fitting parameter set. The model suggests that a second rebound of the virus will occur if prevention guidelines and precautions are not followed. Moein S. (2021) suggested that the SIR model is inefficient at predicting the actual spread of COVID-19 in the long run. This study analysed and forecasted COVID-19 cases in Isfahan, a province of Iran. Namasudra S. (2021) considered a Nonlinear Autoregressive Neural Network Time Series (NAR-NNTS) model trained with three different algorithms, Scaled Conjugate Gradient (SCG), Levenberg Marquardt (LM) and Bayesian, and compared the resulting models using Root Mean Square Error (RMSE), Mean Square Error (MSE) and the correlation coefficient, for the case of India. After comparison, the researcher found that the best model is the NAR-NNTS trained with the Levenberg Marquardt algorithm. Perone G. (2020) fitted ARIMA models to forecast the final size and spread of COVID-19 in Italy. The author used ARIMA (0, 2, 1) for Emilia Romagna, ARIMA (4, 2, 2) for Italy and ARIMA (1, 2, 1) for Lombardy, along with two other ARIMA parameter sets. The study indicates that cases will reach zero in 50 days, that the final size of the COVID-19 outbreak will be between 254,000 and 272,000 cases and that the death toll will be between 31,318 and 33,538. Petropoulos F. (2020) proposed a simple time series method. The objective of this study is to analyse and forecast the death rate, confirmed cases and recoveries of COVID-19 globally over a specific time span. The study forecasted COVID-19 cases in five rounds, from second Feb 2020 to twenty first Mar 2020.
Roy S. (2020) used weighted overlay analysis and an ARIMA model to forecast COVID-19 cases in different states of India. Mean absolute error (MAE) and root mean square error (RMSE) were used to evaluate the models. The study shows that two parts of India, the south and the west, are more vulnerable.
Saba T. (2020) used several forecasting methods (random forests, K-nearest neighbors, support vector regression, ARIMA model, SARIMA model, decision trees, Holt winter model, polynomial regression and gradient boosting regressor) to predict COVID-19 cases under different lockdown strategies. The study examines three lockdown strategies, partial, herd and complete, and suggests that the herd strategy is the best of the three.
Sujath R. (2020) proposed machine learning models (linear regression (LR), multilayer perceptron (MLP) and vector autoregression (VAR)) to forecast COVID-19 cases in India. The data used in the study came from Kaggle. The study suggests that MLP predicts cases better than the LR and VAR models.

Objectives and rationale of the study
The main objective of this study is to predict COVID-19 positive cases. The forecast covers a period of 40 days starting on sixteenth July 2021. Afghanistan is currently witnessing the third wave of the COVID-19 pandemic. COVID-19 hospitals are full and have no space left for admissions, so patients are being treated in hospital yards. There is therefore a great need to estimate and forecast the prevalence of this virus. The Afghanistan government is trying to break the transmission chain of the virus, and to do so it has imposed a lockdown in the capital city, Kabul, and in other big cities. In this unprecedented situation, forecasting the future of the pandemic is critical for the government and for all non-governmental organizations that work in the health sector.

Data description
For this study, we used daily confirmed cases of COVID-19 in Afghanistan from twenty first March 2021 to fifteenth July 2021. Data were collected from the Ministry of Public Health (MoPH), Afghanistan. RStudio and Excel were used to build the model and predict the confirmed cases.

Results and Discussion
AIC and BIC were used to determine the ARIMA parameters. In this part of the study, we discuss the steps that led to the ARIMA (p, d, q) model and the results of the model. We used data from twenty first March 2021 to fifteenth July 2021 (119 days).
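The selection step can be illustrated with a minimal Python sketch of how AIC and BIC rank candidate ARIMA orders. It is not the R code used in this study: the log-likelihood values are made-up placeholders, the parameter count k = p + q + 1 is one common convention, and the sample size assumes the 119 observations reported above minus two lost to second differencing.

```python
import math

def aic(log_likelihood, k):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical candidates: (p, d, q) -> maximised log-likelihood.
# These numbers are placeholders, not results from the study.
candidates = {
    (0, 2, 1): -850.0,
    (1, 2, 1): -849.6,
    (2, 2, 2): -849.1,
}
n_obs = 117  # assumption: 119 observations minus two lost to differencing

def score(order, log_likelihood):
    p, _, q = order
    k = p + q + 1  # AR and MA coefficients plus the innovation variance
    return aic(log_likelihood, k), bic(log_likelihood, k, n_obs)

scores = {order: score(order, ll) for order, ll in candidates.items()}
best_by_aic = min(scores, key=lambda order: scores[order][0])
best_by_bic = min(scores, key=lambda order: scores[order][1])
print(best_by_aic, best_by_bic)
```

Lower values are better for both criteria. BIC penalises additional parameters more heavily than AIC once the sample has more than about eight observations, so it tends to favour smaller models.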
To build a time series model we need stationary data, so the Augmented Dickey-Fuller (ADF) unit-root test was used to check the stationarity of the data. Inspecting the time series plot also shows whether the data are stationary. Figure 1 illustrates a dramatic increase in COVID-19 cases in Afghanistan, and it is clear from this figure that the data are not stationary. In the ADF unit-root test, the null hypothesis is that the time series is non-stationary and the alternative hypothesis is that it is stationary. The ADF test statistic is 0.77 with lag order 4 and a p-value of 0.96, so the null hypothesis cannot be rejected, which is consistent with the data being non-stationary.
Table 1 depicts the ARIMA parameters: AR (p) is 0, I (d) is 2 and MA (q) is 1. An autoregressive order of 0 means that no lagged values of the series are needed. As mentioned previously, the data are not stationary, so we take second-order differences. The moving average term in a time series model captures past forecast errors; in our model its order is 1, so one lagged error is used.
Figure 2 illustrates the COVID-19 positive cases for two periods. The first period runs from 21 March 2021 to 15 July 2021, before the forecast period, and the second runs from 16 July 2021 to 24 August 2021, which is the 40-day forecast period. The blue line indicates the forecast values, the darker shaded area shows the 80% confidence level and the lighter shaded area shows the 95% confidence level. It is clear from the graph that COVID-19 cases started rising before 30 May 2021 and continue to rise until the end of the forecast period. On 16 July 2021, total cases were 138,930, and after 40 days the forecast reaches 180,970. The outcome of the ARIMA model gives no hint of a reduction in confirmed cases. According to data from 2017, there are 0.39 beds for every thousand people in Afghanistan (WHO, 2021). These data show the severity of the pandemic situation in Afghanistan.
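To make the roles of d = 2 and q = 1 concrete, the following is a minimal Python sketch of second-order differencing and of the point-forecast recursion implied by an ARIMA (0, 2, 1) model. It is an illustration under stated assumptions, not the study's R code: the case counts, the moving average coefficient theta and the last residual are hypothetical values chosen for easy arithmetic.

```python
def difference(series, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def forecast_arima_021(y, theta, last_residual, horizon):
    """Point forecasts for y_t - 2*y_{t-1} + y_{t-2} = e_t + theta*e_{t-1}."""
    prev, last = y[-2], y[-1]
    forecasts = []
    for h in range(1, horizon + 1):
        # Only the first step uses the last observed residual;
        # future errors are replaced by their expected value of zero.
        ma_term = theta * last_residual if h == 1 else 0.0
        nxt = 2 * last - prev + ma_term
        forecasts.append(nxt)
        prev, last = last, nxt
    return forecasts

# Hypothetical cumulative case counts (not data from the study).
cumulative = [100, 130, 170, 220, 280, 350]
print(difference(cumulative, order=2))  # [10, 10, 10, 10]

# Hypothetical theta and residual, for illustration only.
print(forecast_arima_021(cumulative, theta=-0.5, last_residual=4.0, horizon=3))
# [418.0, 486.0, 554.0]
```

After the first step the forecasts increase by a constant amount per day, so the point forecast of an ARIMA (0, 2, 1) model follows a straight line. This is why such a forecast cannot show a turning point or a decline in cases unless the most recent trend is already downward.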

Conclusions
Afghanistan is a developing country and, as of August 2021, it is in the third wave of the COVID-19 pandemic. Like other countries, Afghanistan was not prepared to confront a pandemic.
In this study, we used the ARIMA (0, 2, 1) model to forecast COVID-19 cases in Afghanistan for 40 days from 16 July 2021 to 24 August 2021. COVID-19 is a challenge not only for Afghanistan but for the whole world. The results of this study show that confirmed COVID-19 cases rise dramatically. At the upper bound of the 95% confidence level, cases soar from 139,437 confirmed cases to 216,159 confirmed cases.
Scholars who would like to work on this topic are recommended to investigate the effect of lockdown and other protection measures on the prevalence of the outbreak.

Significance statements
Before this study, no study had predicted future COVID-19 cases in Afghanistan, so there is a great need for such work. This study helps the government, policy makers and non-governmental organizations to plan ahead in order to combat the pandemic.

Funding
The authors received no financial support for the research, authorship and/or publication of this article.