Forecasting of COVID-19 Reproduction Number by ARIMA Methodology and Quantile Estimation Based on Best-Fit Distribution by L-moments for Top-10 Affected Countries

We used weekly average estimates of the COVID-19 reproduction number from March 2020 to March 2021. Using the ARIMA and L-moments methodologies, we produce short- and long-term forecasts of R0 so that government officials and public health experts can take timely policy measures to control the spread of the novel coronavirus. The study can also help medical staff gauge the expected demand for COVID-19 vaccine doses. We fitted various ARIMA models to each country's data and selected the best one based on RMSE, AIC, and BIC for point and interval forecasting. The L-moments technique selected the GLO, GEV, and GNO distributions, which were used for quantile estimation and return-period calculations. The forecasts show that the mean R0 exceeds 1 in most countries, which remains a serious threat and can lead to a health disaster. The forecasts indicate an alarming situation in the coming months for India, France, Turkey, and Spain; health experts should take strict measures because the high forecast R0 implies rising cases. For the USA, Russia, and the UK, the mean R0 is not expected to increase suddenly; these countries have been consistent in controlling R0. We find that, even though the selected countries differ significantly in population, R0 remains high in most of them, so strict control actions are urgently needed to reduce R0 for public safety.


Introduction
Since its start, more than 200 countries have been fighting the COVID-19 pandemic. COVID-19 originated in Wuhan, China, at the end of 2019, spread across the world, and hit more than 200 countries hard (WHO 2021). At the end of January 2020, the WHO declared a public health emergency for all member countries, which it characterized as a pandemic on March 11, 2020. The coronavirus pandemic changed the psyche of the world. Within just two months, the virus had spread to every continent except Antarctica.
The numbers of suspected cases, confirmed infections, and deaths started to increase worldwide in early March 2020. Many countries began to follow the successful Chinese model for controlling the spread of the coronavirus: provincial and local quarantines, tracing of suspected cases, limits on movement and public activities, face masks, and other preventive measures. China, initially the country with the most recorded cases, applied quarantine, local control, and preventive measures, and it now ranks 55th in the world by reported cases. Countries that were late in implementing pandemic control policies and restrictions faced serious problems and more deaths. At the start, COVID-19 hit China, Italy, Iran, the United States, South Korea, and Japan badly. Countries that adopted policies efficiently and planned properly are now largely free of the pandemic threat, while other countries still struggle to control the spread of the coronavirus. The countries reporting the highest numbers of cases with high reproduction numbers are the subject of this study: the USA, Brazil, India, Russia, France, the UK, Italy, Spain, Turkey, and Germany, which form the global top 10 (WHO-COVID-19-meter, March 31, 2021). These countries have different policies and control programs, different populations, and different geographical locations. The spread of the virus depends on its reproduction number, which can be reduced by effective lockdown policies. The top-10 countries, with their latest population, reported cases, and percentage of population affected, are listed in Table 1 below.

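[Table 1: population, reported cases, and percentage of population affected for the top-10 countries, as of 04-Apr-21]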
The reproduction number is defined as the mean number of people infected by a single case (Fraser 2007). The second most important use of R0 is to estimate the amount of vaccination required for the population to attain herd immunity. The percentage of the population requiring immunization is calculated by the formula 1 − 1/R0 (1). During this period, many countries adopted policies such as stay-at-home restrictions, lockdowns at the regional, provincial, and national levels, and smart lockdowns for specific areas. The effective reproduction number R(t) is calculated at day t, i.e., the number of individuals that a person infected at time t would go on to infect if other factors remained unchanged. A data-based predictive model known as the autoregressive integrated moving average (ARIMA) has proved helpful in producing short-term forecasts of dengue fever, hemorrhagic fever with renal syndrome, and tuberculosis (Darapaneni 2020). ARIMA has proven to be more effective than comparable models such as the support vector machine (SVM) and the wavelet neural network. Sharma (2020) demonstrated that the ARIMA methodology can be used to forecast the future occurrence of COVID-19. The study's findings helped to evaluate the pandemic dynamics and indicated the epidemiological state of the countries studied. Ceylon (2020) applied the ARIMA methodology to estimate COVID-19 occurrence patterns in France, Italy, and Spain. L-skewness and L-kurtosis, measured using the L-moments methodology (Hosking 1990), quantify how far case counts over a defined period deviate from a bell-shaped distribution in surveillance data; they measure the overall intensity of an outbreak and distinguish consistent, expected seasonal behavior from potential outbreaks.
Outbreaks, including seasonal rises, are well captured by the L-skewness and L-kurtosis coefficients (Simpson et al. 2020). Both methodologies are widely applied in many fields; here they are applied to COVID-19 reproduction number data, leading to estimation tools based on these methods. Financial advisors in the insurance sector also apply expected-loss models to measure the scale of claims in flood or health insurance. To properly value the implications of unpredictable circumstances, the insurance sector computes exceedance probability functions. Such functions estimate the likelihood that casualties from an uncertain event, such as an influenza or Ebola pandemic, reach a defined amount within a given period (Fan et al. 2018). In a probabilistic context, the expected loss is expressed as the loss generated by a pandemic of a given severity multiplied by the likelihood of a pandemic of that severity occurring in the coming year.
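The herd immunity formula 1 − 1/R0 can be evaluated directly. The following is a minimal Python sketch, assuming a fully effective vaccine and homogeneous mixing; the function name and the R0 values are illustrative and are not taken from the study's tables.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune: 1 - 1/R0.

    Returns 0 when R0 <= 1, since the epidemic then declines on its own.
    """
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    return max(0.0, 1.0 - 1.0 / r0)


# Illustrative R0 values (assumptions, not the study's estimates).
for r0 in (0.9, 1.5, 2.0, 3.0):
    print(f"R0 = {r0:.1f} -> immunization threshold = {herd_immunity_threshold(r0):.1%}")
```

A higher R0 raises the threshold quickly: under these assumptions, doubling R0 from 1.5 to 3.0 doubles the required coverage from about one third to two thirds.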

The SIR and SEIR model
The fundamental difference between the SIR and SEIR models is that the former contains three compartments: susceptible, infected, and recovered. SEIR is derived from SIR and has four compartments: susceptible, exposed, infected, and recovered. The total population N = S + I + R, shown in Figure 1, generally progresses from susceptible to infectious to recovered. The SIR model has been applied to several diseases, particularly measles, rubella, and mumps, which are airborne diseases with lifetime immunity upon recovery. The model is therefore practically applicable to the projection of infectious, human-to-human transmissible diseases. Since the emergence of COVID-19, researchers have used it to calculate the expected number of patients and to estimate the reproduction number of the novel coronavirus. The susceptible, exposed, infectious, and recovered (SEIR) model depicted in Figure 2 is derived from the elementary SIR model. Its components are the susceptible individuals (S), the individuals undergoing a long incubation period (E), the total number of infected persons (I), and the recovered patients (R). The SEIR model therefore differs from the SIR model fundamentally by the inclusion of an incubation (exposed) period. Both models yield the same reproduction number.
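To make the compartment structure concrete, the SEIR dynamics can be stepped forward numerically. This is a minimal sketch using simple Euler integration without births or deaths; the parameter names (beta for the transmission rate, sigma for the incubation rate, gamma for the recovery rate) and all numerical values are assumptions chosen for illustration. Under these assumptions the basic reproduction number is beta/gamma, the same expression as in the SIR model.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, rec0, days, dt=0.1):
    """Euler integration of the SEIR equations; returns the final (S, E, I, R)."""
    s, e, i, r = float(s0), float(e0), float(i0), float(rec0)
    n = s + e + i + r  # total population, constant over time
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / n * dt   # S -> E
        new_infectious = sigma * e * dt       # E -> I
        new_recovered = gamma * i * dt        # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return s, e, i, r


def basic_reproduction_number(beta, gamma):
    """R0 = beta / gamma for SIR and for SEIR without demography."""
    return beta / gamma


# Hypothetical parameters: 1000 people, 10 initially infectious.
final = simulate_seir(beta=0.5, sigma=0.2, gamma=0.2,
                      s0=990, e0=0, i0=10, rec0=0, days=100)
print("final S, E, I, R:", [round(x, 1) for x in final])
print("R0:", basic_reproduction_number(0.5, 0.2))
```

Setting sigma very large makes the exposed stage negligible and the trajectory approaches that of the SIR model.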

Auto-Regressive Integrated Moving Average Methodology (ARIMA)
The ARIMA(p, d, q) method applies first- or second-order differencing if the data are nonstationary; if the data are stationary without differencing, ARMA(p, q) is an alternative. Here p is the autoregressive (AR) order and q is the moving average (MA) order, i.e., the number of lagged forecast errors in the model. Subtracting the previous value from the current value is the most common way to make a series stationary. Depending on the nature of the univariate series, one or more rounds of differencing may be needed. Consequently, d is the smallest number of differences needed to make the series stationary; if the series is already stationary without differencing, then d = 0. The identification step begins by examining the autocorrelation function (ACF) and partial autocorrelation function (PACF) through the correlogram (Brockwell et al. 2003; Awazuzu et al. 2008), together with the Jarque-Bera test for normality, a unit root test for stationarity, and the Ljung-Box Q test. Appropriate models are then estimated, with the AR and MA orders selected from the ACF and PACF of the series. Based on the spikes and decay in the ACF and PACF plots, (p, q) are identified and the best model is fitted.
After selecting the best model, forecasting is performed using the parameters (p, d, q) of that model. Diagnostic evaluation assesses the adequacy of the fitted model using statistical measures such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the mean square error (MSE). The model with the smallest MSE, AIC, and BIC is used for forecasting. The ARIMA methodology described above is summarized in the flowchart in Figure 3 below.
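The differencing and correlogram steps can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names and the short series are hypothetical, and the study itself used SPSS, EViews, and R rather than this code.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: x_t - x_(t-1)."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


# Hypothetical weekly reproduction numbers (not the study's data).
weekly_r = [1.10, 1.18, 1.25, 1.31, 1.40, 1.38, 1.47, 1.52, 1.60, 1.58]
first_diff = difference(weekly_r, d=1)
print("ACF of raw series:     ", [round(v, 2) for v in acf(weekly_r, 3)])
print("ACF of first difference:", [round(v, 2) for v in acf(first_diff, 3)])
```

A slowly decaying ACF in the raw series that drops off after differencing is the usual visual cue that d = 1 is sufficient.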
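The selection criteria named above can be computed from the residuals of each candidate model. The sketch below assumes Gaussian residuals and counts the residual variance as one parameter; the candidate names, residuals, and parameter counts are hypothetical and serve only to show the comparison.

```python
import math


def rmse(residuals):
    """Root mean square error of the residuals."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood, with variance estimated as SSE/n."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * loglik


# Hypothetical residuals and parameter counts for two candidate models.
candidates = {
    "ARIMA(1,1,0)": ([0.10, -0.20, 0.15, -0.05, 0.10, -0.10], 2),
    "ARIMA(0,1,1)": ([0.30, -0.40, 0.35, -0.25, 0.30, -0.30], 2),
}
scores = {}
for name, (res, k) in candidates.items():
    ll = gaussian_loglik(res)
    scores[name] = (rmse(res), aic(ll, k), bic(ll, k, len(res)))
best = min(scores, key=lambda name: scores[name][1])  # smallest AIC
print(scores)
print("selected:", best)
```

When the three criteria disagree, BIC penalizes extra parameters more heavily than AIC, so it tends to favor the more parsimonious model.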

Linear Moments Technique
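L-moments (LM) are calculated from sample values. An additional property of probability weighted moments (PWM) is that they have unbiased sample estimators; in the standard form of Hosking (1990), for an ordered sample $x_{1:n} \le x_{2:n} \le \cdots \le x_{n:n}$, the estimator $b_r$ of $\beta_r$ is
$$b_r = \frac{1}{n}\sum_{j=r+1}^{n}\frac{(j-1)(j-2)\cdots(j-r)}{(n-1)(n-2)\cdots(n-r)}\,x_{j:n}, \qquad r = 0, 1, 2, \ldots$$
The connection between LM and PWM is
$$\lambda_1 = \beta_0,\qquad \lambda_2 = 2\beta_1 - \beta_0,\qquad \lambda_3 = 6\beta_2 - 6\beta_1 + \beta_0,\qquad \lambda_4 = 20\beta_3 - 30\beta_2 + 12\beta_1 - \beta_0.$$
After an initial screening of the data, the L-moments procedure follows four steps, like the ARIMA(p, d, q) methodology.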
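Sample L-moments can be obtained from the unbiased PWM estimators using the standard relations of Hosking (1990). The following is a minimal sketch; the function names are illustrative, the sample is hypothetical, and at least four observations are required.

```python
def sample_pwms(data, order=4):
    """Unbiased sample PWMs b_0..b_(order-1) from the ordered sample."""
    x = sorted(data)
    n = len(x)
    if n < order:
        raise ValueError("need at least as many observations as PWMs")
    pwms = []
    for r in range(order):
        total = 0.0
        for j in range(r + 1, n + 1):      # j is the 1-based rank
            weight = 1.0
            for m in range(1, r + 1):
                weight *= (j - m) / (n - m)
            total += weight * x[j - 1]
        pwms.append(total / n)
    return pwms


def sample_lmoments(data):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis."""
    b0, b1, b2, b3 = sample_pwms(data, 4)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


# Hypothetical weekly reproduction numbers (not the study's data).
print(sample_lmoments([0.8, 0.9, 1.1, 1.2, 1.3, 1.6, 2.4]))
```

Because each L-moment is a linear combination of the ordered observations, a single extreme value influences the result far less than it would influence the conventional skewness or kurtosis.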

Selecting the Appropriate Distribution
The best-fit distribution is selected using the L-moments ratio diagram (LMRD; Hosking 1997) together with the Z-statistic. Simulation studies are carried out with the L-skewness and L-kurtosis for the at-site or at-cluster situation. The best-fitting distribution is the one whose $Z^{DIST}$ is lowest in absolute value, i.e., closest to zero; $|Z^{DIST}| \le 1.64$ is the criterion at the 90% confidence level, and under this criterion more than one distribution may qualify.
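The acceptance rule based on the Z-statistic amounts to a simple filter and ranking. In the sketch below the Z values are entirely hypothetical placeholders, not the study's results, and the function name is an assumption.

```python
def acceptable_distributions(z_scores, threshold=1.64):
    """Keep distributions with |Z| <= threshold, ranked by |Z| (closest to zero first)."""
    accepted = {name: z for name, z in z_scores.items() if abs(z) <= threshold}
    return sorted(accepted, key=lambda name: abs(accepted[name]))


# Hypothetical Z-statistics for the five candidate distributions.
z_scores = {"GLO": 0.21, "GEV": -0.95, "GNO": -1.30, "PE3": -2.10, "GPA": -3.75}
print(acceptable_distributions(z_scores))
```

More than one distribution can pass the threshold, which is why the ranking by absolute value is needed to pick a first choice.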

Estimations by Selected Distributions
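For a given country $i$ in the cluster, the quantile estimates are obtained by combining the index (mean) reproduction number estimate $\hat{\mu}_i$ with the cluster quantile function $\hat{q}(\cdot)$:
$$\hat{Q}_i(F) = \hat{\mu}_i\,\hat{q}(F), \qquad 0 < F < 1,$$
where $F$ is the nonexceedance probability of the quantile estimate, $Q_i(F)$ denotes the at-country quantile function, and $q(F)$ denotes the cluster quantile function.
Let $\hat{\mu}_i$ be the scale factor at country $i$. The cluster contains $N$ countries, and country $i$ has sample size $n_i$ and observed data $Q_{ij}$, $j = 1, \ldots, n_i$; the rescaled data are then $q_{ij} = Q_{ij}/\hat{\mu}_i$. Setting the cluster mean equal to 1 ($\lambda_1^R = 1$), the distribution is fitted by equating its L-moment ratios $\lambda_1, \tau, \tau_3, \tau_4$ to the cluster average L-moment ratios $1, t^R, t_3^R, t_4^R$, where
$$t^R = \frac{\sum_{i=1}^{N} n_i\, t^{(i)}}{\sum_{i=1}^{N} n_i}, \qquad t_r^R = \frac{\sum_{i=1}^{N} n_i\, t_r^{(i)}}{\sum_{i=1}^{N} n_i}, \quad r = 3, 4.$$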
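The two preparatory operations of this cluster approach, rescaling each country's data by its own mean and averaging the at-country L-moment ratios with sample-size weights, can be sketched as follows. The function names and all input values are hypothetical.

```python
def rescale_by_mean(values):
    """Return the country mean (index value) and the dimensionless rescaled data."""
    mean = sum(values) / len(values)
    return mean, [v / mean for v in values]


def regional_average(site_ratios, sample_sizes):
    """Sample-size weighted average of at-country L-moment ratios."""
    total = sum(sample_sizes)
    return sum(n * t for n, t in zip(sample_sizes, site_ratios)) / total


# Hypothetical at-country L-skewness values and sample sizes (weeks of data).
t3_by_country = [0.12, 0.20, 0.16]
weeks = [58, 58, 50]
print("cluster L-skewness:", round(regional_average(t3_by_country, weeks), 4))

mean, scaled = rescale_by_mean([1.0, 1.5, 2.0])
print("index value:", mean, "rescaled data:", scaled)
```

After rescaling, every country's series has mean 1, so the fitted cluster distribution describes relative rather than absolute reproduction numbers.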

Results and discussions
To check the assumptions of stationarity and no autocorrelation, the data were examined using SPSS 25, EViews 10, and R. The augmented Dickey-Fuller test, applied to all data series, indicated that the series were free of unit roots and stationary, and therefore suitable for further analysis. The exception was France (p-value = 0.5091), which was accepted after passing the other tests. The Ljung-Box test, autocorrelation function (ACF), and partial autocorrelation function (PACF) showed that the USA, Brazil, India, the UK, France, and Spain were stationary at the first difference and Russia and Italy at the second difference, whereas Turkey and Germany were stationary without differencing. The correlogram of the ACF and PACF for the USA, Brazil, India, and Russia is shown in Figure 4 below.

Auto-Regressive Integrated Moving Average Methodology
A comprehensive data set for the top-10 countries in coronavirus cases, deaths, and reproduction numbers (WHO, COVID-19 meter) was investigated. The estimated reproduction numbers, computed with the methodology of Fraser (2007) and routinely updated over the 58 weeks from February 2020 to March 31, 2021, were obtained from Johns Hopkins University and used to forecast each country's reproduction number in the upcoming months with the ARIMA methodology. The orders p and q were selected by judgment from the ACF and PACF. If the ACF shows a sharp cutoff, or the autocorrelations at the first difference are not positive, the model includes an MA process. The pattern of the PACF indicates how many AR terms should be used. Because trends, spread, control policies, and case counts vary across countries, each country yields different differencing, AR, and MA orders, presented in Table 2. For the USA, the best model used the first difference without AR or MA terms. The UK appeared similar to the USA, but ARIMA(1,1,0) and ARIMA(0,1,1) performed very well. The best model was selected on the minimum root mean square error (RMSE), Akaike information criterion (AIC; Akaike 1974), and Bayesian information criterion (BIC). For Russia, among the three suggested models ARIMA(1,2,1), ARIMA(2,2,1), and ARIMA(2,2,2), ARIMA(2,2,2) performed very well. For Italy, ARIMA(1,2,1) and ARIMA(2,2,0) performed best. For Turkey, AR(1) was the best model, and for Germany, MA(1). The auto.arima function in R 3.6.3 further verified the proposed model for each country, which was then approved for forecasting the reproduction number. The forecasts from the selected models for the next four months are displayed. The mean reproduction number of the US is approximately 1, Russia is the only country with an estimated reproduction number below 1, and Spain's is above 2.
Figure 4. Correlogram of ACF and PACF for countries.
The forecasts for all countries are given in Table 3 below, with point estimates of the reproduction number for the coming months and the lower and upper limits of the 95% confidence interval. Cases in France tend to increase in the near term (April-June), with an increase in the reproduction number forecast. The reproduction numbers of Russia and the UK are forecast to decrease in June and July. A rise in cases is also predicted for Turkey and Italy in later months. Overall, the forecast reproduction number is greater than 1 in all these countries except Russia. India and Germany are predicted to be above 1, and both in particular may face a sudden spike in cases. Reproduction numbers above 1.5 or 2, as forecast for some countries in May-July 2021, are high enough to pose severe threats to the countries' health systems. These countries can use such evidence-based predictions to take precautionary measures to control the spread.
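For the simplest model mentioned above, a first difference with no AR or MA terms (ARIMA(0,1,0), a random walk), the point and interval forecasts have a closed form. The sketch below assumes no drift and Gaussian errors; the series is hypothetical and the function is not the routine used in the study.

```python
import math


def random_walk_forecast(series, horizon, z=1.96):
    """ARIMA(0,1,0) forecasts without drift.

    Returns a list of (point, lower, upper) for steps 1..horizon, where the
    point forecast is the last observation and the interval half-width is
    z * sigma * sqrt(h), with sigma estimated from the first differences.
    """
    diffs = [curr - prev for prev, curr in zip(series, series[1:])]
    sigma2 = sum(d * d for d in diffs) / len(diffs)
    last = series[-1]
    forecasts = []
    for h in range(1, horizon + 1):
        half_width = z * math.sqrt(h * sigma2)
        forecasts.append((last, last - half_width, last + half_width))
    return forecasts


# Hypothetical weekly reproduction numbers (not the study's data).
for h, (point, lo, hi) in enumerate(random_walk_forecast([1.10, 1.05, 1.12, 1.08, 1.15], 4), 1):
    print(f"h={h}: {point:.2f} [{lo:.2f}, {hi:.2f}]")
```

The interval widens with the square root of the horizon, which is why long-range ARIMA intervals for differenced series become wide quickly.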

Linear Moments Methodology
The linear moments technique was applied to the data of these 10 countries, and comparable results were obtained. The countries, considered as one cluster, may follow a single common distribution in their reproduction number trend, or different distributions selected from the family of five extreme value distributions. Each country's weekly mean reproduction number was therefore analyzed individually and cross-checked using the cluster. The countries were treated as one cluster because of their previous outcomes in coronavirus cases and the behavior of their reproduction numbers, after the basic assumptions of mean and variance consistency over time were satisfied. The linear moments methodology (LMM) includes five extreme value distributions, each with three parameters: the generalized logistic (GLO), generalized normal (GNO), generalized Pareto (GPA), generalized extreme value (GEV), and Pearson type 3 (PE3) distributions. The discordancy measure of each of the top-10 countries was compared with the threshold value of 2.76 for the 10 countries considered as a cluster. All countries' discordancy values were under the limit, mostly below 1, so all were considered part of one cluster. A separate analysis treating each country individually was performed to check the robustness of the results for each country. The Z-fit and L-moments ratio diagram results are shown separately in Appendix Figure 1. Both the lowest Z-statistic criterion and the linear moment ratio diagram (LMRD) in Figure 6 can be used to choose the most appropriate distribution (Hosking 1990). The detailed analysis for each country identified the generalized logistic distribution (GLO) as the best fit for the quantile estimates of the COVID-19 reproduction number (see Table 4) and for estimating the return periods, which also give the probability of exceedance.
For further validation, based on the lowest Z-statistic threshold, the generalized extreme value (GEV) and generalized normal (GNO) distributions were also selected; the Pearson type 3 (PE3) and generalized Pareto (GPA) distributions were excluded because of their infrequent selection and higher Z-statistics.
After choosing the best fit, the next step is to estimate the quantiles for multiple return periods. The quantile estimates for the cluster for return periods of 1, 2, 5, 10, 20, and 50 are presented in Figure 5. Because the separate analysis of each country and the analysis of the top-10 countries as a cluster gave the same best-fit distribution, the finalized distributions were used to calculate the cluster quantile estimates. The quantile estimate of the reproduction number for the i-th country and a specific return period can then be derived and forecast. For the USA, which has a mean reproduction number of 1.368, it is obtained by multiplying the mean by the cluster quantile estimate for the selected distribution. With the table value q̂(10) = 2.598, the predicted reproduction number is 1.368 × 2.598 = 3.554 once in the coming ten weeks (for this return period), with nonexceedance probability 0.99 and exceedance probability 0.01. For France, the table value is q̂(10) = 0.923 and the mean reproduction number is 2.334, so the reproduction number predicted once in the coming ten weeks with exceedance probability 0.01 is 2.334 × 0.923 = 2.154. For Spain, with q̂(50) = 2.037 and a mean estimated reproduction number of 2.632, the predicted reproduction number is 5.361 once in the coming year, with nonexceedance probability 0.95, provided all other lockdown and spread control policies are followed. For India, q̂(10) = 2.598 and the mean reproduction number is 1.204, from which we predict an R0 of about 3.128 in the coming 10 weeks with a probability of exceedance of 95%. The relative index for all other countries can be constructed similarly for the coming weeks and months. If a homogeneous cluster satisfies the criteria, all countries within it are represented by a single probability distribution with common parameters after the country data are rescaled by their at-country means. This rescaled, dimensionless probability distribution is called the regional growth curve.
The Monte Carlo simulation technique introduced by Hosking (1997) was applied to assess the accuracy of the estimated quantiles and growth curves. Monte Carlo simulation with 10,000 repetitions produced the root mean square error (RMSE), relative bias (RB), and relative absolute bias (RAB) presented in Table 5, along with the 0.05 lower and 0.95 upper bounds for the growth curves.
France opted for lockdown in March, then eased it, and again opted for a national lockdown in November. Nevertheless, cases and the death toll were still rising in France until March 2021, and the reproduction number is estimated to surpass 1.69 in July. Despite not having a very large population, France now faces severe threats from the increase in cases, showing a high reproduction number with an increasing trend. Russia also opted for lockdown in March 2020 but gradually reopened social activities after two months; it now ranks 4th in the world in coronavirus spread and cases. However, the forecasts of its reproduction number indicate that the situation will be under control in the upcoming months. Turkey recorded its highest reproduction number in November 2020, and its cases are still accumulating. Spain has an estimated average reproduction number above 2, a serious sign that problems may arise in the coming months, with mean R0 surpassing 2.85 in July 2021.
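Quantile estimation with the GLO distribution relies on its quantile function and on the relations between its parameters and the L-moments. The sketch below uses the parameterization commonly given for regional frequency analysis (location xi, scale alpha, shape k, with k = −t3); treat it as an assumption to be checked against the source used for the study, and note that the input L-moments are hypothetical.

```python
import math


def glo_params_from_lmoments(l1, l2, t3):
    """Estimate GLO (xi, alpha, k) from the L-location, L-scale, and L-skewness."""
    if not -1.0 < t3 < 1.0:
        raise ValueError("L-skewness must lie strictly between -1 and 1")
    k = -t3
    if abs(k) < 1e-9:                      # logistic limit
        return l1, l2, 0.0
    alpha = l2 * math.sin(k * math.pi) / (k * math.pi)
    xi = l1 - alpha * (1.0 / k - math.pi / math.sin(k * math.pi))
    return xi, alpha, k


def glo_quantile(f, xi, alpha, k):
    """Quantile x(F) of the GLO distribution for nonexceedance probability F."""
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    odds = (1.0 - f) / f
    if abs(k) < 1e-9:
        return xi - alpha * math.log(odds)
    return xi + alpha * (1.0 - odds ** k) / k


# Hypothetical cluster L-moments for rescaled data (mean set to 1).
xi, alpha, k = glo_params_from_lmoments(l1=1.0, l2=0.2, t3=0.1)
for f in (0.5, 0.8, 0.9, 0.98):
    print(f"F = {f:.2f} -> growth factor = {glo_quantile(f, xi, alpha, k):.3f}")
```

With the cluster mean set to 1, the quantiles printed here are dimensionless growth factors that are later multiplied by a country's own mean.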
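The country-level predictions above are products of the country mean and the cluster quantile, and the arithmetic can be checked in a few lines. The means and table values are those quoted in the text; the helper that converts a return period T into a nonexceedance probability uses the conventional relation F = 1 − 1/T, which is an assumption here and may differ from the convention behind the probabilities quoted above.

```python
def country_quantile(country_mean, cluster_quantile):
    """At-country quantile: country mean times the cluster growth factor."""
    return country_mean * cluster_quantile


def nonexceedance_probability(return_period):
    """Conventional relation F = 1 - 1/T (assumed, not taken from the text)."""
    if return_period <= 1:
        raise ValueError("return period must exceed 1")
    return 1.0 - 1.0 / return_period


# (country, mean R, cluster quantile, return period) as quoted in the text.
cases = [
    ("USA", 1.368, 2.598, 10),
    ("France", 2.334, 0.923, 10),
    ("Spain", 2.632, 2.037, 50),
    ("India", 1.204, 2.598, 10),
]
for name, mean_r, q_hat, period in cases:
    print(f"{name}: {country_quantile(mean_r, q_hat):.3f} "
          f"(T = {period}, F = {nonexceedance_probability(period):.2f})")
```

Keeping the multiplication in one helper makes it easy to repeat the calculation for the remaining countries once their means and the relevant table values are available.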
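The accuracy measures obtained from the Monte Carlo simulation (relative bias, relative absolute bias, and RMSE) can be illustrated with a small experiment. This sketch is a simplification and not the regional simulation procedure of the study: it draws samples from a logistic distribution (the GLO with shape zero) with hypothetical parameters, estimates one quantile empirically, and reports the RMSE in relative form, which is an assumption.

```python
import math
import random


def accuracy_measures(estimates, true_value):
    """Return (relative bias, relative absolute bias, relative RMSE)."""
    rel = [(est - true_value) / true_value for est in estimates]
    m = len(rel)
    rb = sum(rel) / m
    rab = sum(abs(r) for r in rel) / m
    rrmse = math.sqrt(sum(r * r for r in rel) / m)
    return rb, rab, rrmse


def logistic_quantile(f, xi, alpha):
    """Quantile of the logistic distribution (GLO with k = 0)."""
    return xi - alpha * math.log((1.0 - f) / f)


def simulate_quantile_accuracy(f=0.9, xi=1.0, alpha=0.2, sample_size=58,
                               n_reps=10000, seed=0):
    """Repeatedly sample, estimate the F-quantile empirically, and score it."""
    rng = random.Random(seed)
    true_q = logistic_quantile(f, xi, alpha)
    rank = max(1, math.ceil(f * sample_size))   # order statistic used as estimate
    estimates = []
    for _ in range(n_reps):
        sample = sorted(
            logistic_quantile(min(max(rng.random(), 1e-12), 1.0 - 1e-12), xi, alpha)
            for _ in range(sample_size)
        )
        estimates.append(sample[rank - 1])
    return accuracy_measures(estimates, true_q)


rb, rab, rrmse = simulate_quantile_accuracy()
print(f"RB = {rb:.4f}, RAB = {rab:.4f}, relative RMSE = {rrmse:.4f}")
```

The sample size of 58 mirrors the 58 weeks of data and the 10,000 repetitions mirror the simulation count mentioned in the text; both can be changed through the function arguments.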

Conclusion
The L-moments methodology for quantile estimation on the weekly mean estimated COVID-19 reproduction number provided estimates analogous to the ARIMA forecasts. The method consists of four steps, the first being the discordancy measure, for which all countries satisfied the threshold criterion. The discordancy values for critical countries such as France, Spain, and the UK were also within limits. The heterogeneity measure H for the 10 countries gave good results: a value of H less than or equal to 1 showed that the cluster is acceptably homogeneous for fitting the distribution. A country-wise analysis treating each country individually was also carried out; because it gave matching results, only selected outcomes are presented here. Based on L-skewness and L-kurtosis, the Z-statistic and the LMRD identified the best fit for the whole cluster. Among the five distributions GLO, GNO, GEV, GPA, and PE3, three were selected by the goodness-of-fit measure as the best fit for all countries in the cluster to predict the COVID-19 reproduction number for 2021 and beyond. Based on the growth curve, RMSE, relative bias, and relative absolute bias, the GLO and GEV distributions are nominated for further quantile estimation. The resulting order of recommendation is the GLO distribution first, the GEV distribution second, and the GNO distribution third.
The recommended models, forecasts by the ARIMA methodology and distributions by L-moments with quantile estimates and return periods of the mean reproduction number, add to the development of new control applications, amendments to policies, and more detailed insight for further planning and control of the virus. The two methodologies are comparable to each other, and their agreement supports the authenticity and accuracy of the forecasts. L-moments nevertheless have an edge over ARIMA forecasting because of the built-in probability weighted moments and the short- to long-term return periods with probabilities of exceedance. L-moments also have an edge because they give more weight to lower values and less weight to larger values when ordering the data, which makes the technique more robust and less error-prone in longer-period estimation. Both methodologies can be applied to forecast any country's estimated reproduction number in a similar scenario. Together, the two methods cover short- and long-term forecasting of the reproduction number. Every country should therefore take precautionary measures accordingly, act strictly against violations to control the spread, drive R0 toward zero, and reduce the damage to public health, education, economies, and the growth of society. These recommendations play a vital role in policy development to reduce R0 and help public health authorities take precautionary measures in good time.