Trends in COVID-19 Infected Cases and Deaths Based on Parametric and Nonparametric Regression Models


The present investigation studies the trends in COVID-19 infected cases and deaths based on parametric, exponential smoothing and nonparametric regression models, using the cumulative numbers of COVID-19 infected cases and of deaths due to infection. The statistically most suitable parametric models are selected based on the highest adjusted R2, significant regression coefficients and the coefficient of determination (R2). The appropriate model is then selected based on model performance measures, namely the Root Mean Square Error, Mean Absolute Error and Mean Absolute Percentage Error, and on the assumptions of normality and independence of residuals. Nonparametric estimates of the underlying growth functions are computed at each time point.


Preamble
The COVID-19 pandemic, which began in 2019, has received much attention, as it affected most economies worldwide and resulted in a very large number of deaths. Because no antiviral drugs or vaccines existed, the number of new coronavirus cases increased tremendously, and many people died. The development of methodologies to analyse these pandemic data has therefore become an important research area for predicting future coronavirus cases. Jiang et al. (2000), by applying a time series-based kinetic model for infectious diseases, obtained trends and short-term predictions for the transmission of COVID-19.

Review of Literature
Al-Rousan and Al-Najjar (2020) studied the effect of various factors, such as sex, region, infection mode and birth year, on recovered and deceased COVID-19 cases in South Korea. Gondauri et al. (2020) employed the chain-binomial type of Bailey's model to study and analyse the correlation between the total numbers of COVID-19 cases and recoveries in different countries. Most studies have investigated COVID-19 cases using regression and time series models because these models are frequently applied to examine the growth or trend of diseases.
Katris (2021) aimed to generate a time series-based procedure to track outbreaks. In the first stage, he used univariate time series models to present the evolution of the reported cases. He also used combinations of the models to provide more accurate and robust results and considered statistical probability distributions to generate future scenarios. The final step was to build and use an epidemiological model (the time series susceptible-infected-recovered [tsiR] model) and to calculate the epidemiological ratio (R0) to estimate the termination of the outbreak. In addition to feed-forward artificial neural networks and multivariate adaptive regression splines from the machine learning toolbox, the time series models deployed included the classical exponential smoothing and ARIMA approaches. The combinations included the simple mean, Granger-Newbold and Bates-Granger approaches.
Kumar and Roy (2020) used Bailey's model with secondary data to calculate the removal rate, which is the percentage of removed persons in the infected population. They then performed a regression analysis to show the linear relationship of this indicator with total infection rates. Finally, they described how the model could be linked with decision making.
Mittal (2020) carried out an exploratory data analysis with the aim of elaborating a statistical model to better understand COVID-19 in India by thoroughly studying the cases reported in the country through 22 April 2020. The results of the study showed the impact of COVID-19 in India at the daily and weekly level and drew parallels between India and neighboring countries as well as severely affected countries.
Ogundoun et al. (2020) adopted the ordinary least squares (OLS) estimator to measure the impact of travel history and contact with travelers on the spread of COVID-19 in Nigeria and created forecasts using data spanning 31, 2020, to May 29, 2020, extracted from the Nigeria Centre for Disease Control (NCDC) website. The model assessed the period before and after travel restrictions were enforced by the federal government of Nigeria. The fitted model described the dataset well and showed no validity violations in the diagnostic checks conducted. The results show that the government made the right decision in enforcing travel restrictions, with travel history and contact with travelers found to increase the chances of people being infected with COVID-19 by 85% and 88%, respectively; the authors concluded that the government should enforce this policy to contain COVID-19.
Nesteruk (2020) used a simple mathematical model to predict the characteristics of the epidemic caused by the coronavirus in mainland China. The optimal values of the SIR model parameters were identified using a statistical approach. The numbers of infected, susceptible and removed persons versus time were predicted and compared with the new data obtained after February 10, 2020, when the calculations were completed.
Rajarathinam and Tamilselvan (2021) studied the short- and long-term cointegration relationships between the cumulative number of COVID-19 infections and the cumulative number of deaths due to COVID-19 by employing an autoregressive distributed lag model and bounds cointegration tests. The stability of the estimated model was also assessed. The cumulative sum of recursive residuals test and the cumulative sum of squares of recursive residuals test were used to assess the consistency of the model's parameters.
Takele (2020) used stochastic modelling to predict COVID-19 prevalence patterns in East African countries, mainly Ethiopia, Djibouti, Sudan, and Somalia. The study results showed that in the four months following June 30, 2020, the number of COVID-19-positive people in Ethiopia could rise from 5,846 to 56,610 in the average rate scenario.

Objectives of the study
Based on the above, the present study aims to assess the trends in the number of COVID-19 cases, i.e., whether the number of cases increases or decreases, based on parametric, exponential smoothing and nonparametric regression models, using the data set of cumulative COVID-19 infected cases and deaths due to infection.

Materials
The cumulative total numbers of COVID-19 infections and deaths as of 31st June 2021, starting on 9th March 2020, were collected from the official website maintained by the Health and Family Welfare Department, Government of Tamil Nadu. Statgraphics Centurion XVI Ver. 16.1.12 was used to estimate the model parameters, to carry out error diagnostics and to study the stability of the estimated model.

Methods
Among the parametric models, different trend models, viz., linear trend, quadratic trend, exponential trend and S-curve trend (Montgomery et al., 2003), and different smoothing models, viz., simple exponential smoothing, are considered. Residual analysis is carried out to test the randomness as well as the normality of the residuals. A relative growth rate is calculated based on the best-fitted models.
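The trend-fitting and model-comparison step can be illustrated with a minimal Python sketch. It is not the Statgraphics procedure used in the study: the S-curve is assumed here to have the form y = exp(a + b/t), the quadratic trend is omitted for brevity, the helper names (`ols`, `fit_trends`, `accuracy`) are hypothetical, and the series is synthetic rather than the Tamil Nadu data.

```python
import math

def ols(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_trends(y):
    """Fit three linearizable trends to y observed at t = 1..n (y must be > 0)."""
    t = list(range(1, len(y) + 1))
    log_y = [math.log(v) for v in y]
    a, b = ols(t, y)                          # linear: y = a + b*t
    linear = [a + b * ti for ti in t]
    a, b = ols(t, log_y)                      # exponential: y = exp(a + b*t)
    expo = [math.exp(a + b * ti) for ti in t]
    a, b = ols([1 / ti for ti in t], log_y)   # S-curve (assumed form): y = exp(a + b/t)
    s_curve = [math.exp(a + b / ti) for ti in t]
    return {"linear": linear, "exponential": expo, "s_curve": s_curve}

def accuracy(y, fitted):
    """RMSE, MAE and MAPE (in percent) of fitted values against observations."""
    errors = [yi - fi for yi, fi in zip(y, fitted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / yi) for e, yi in zip(errors, y)) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}

# Synthetic, purely illustrative series that grows exponentially.
y = [10.0 * math.exp(0.2 * t) for t in range(1, 21)]
fits = fit_trends(y)
scores = {name: accuracy(y, fitted) for name, fitted in fits.items()}
best = min(scores, key=lambda name: scores[name]["RMSE"])
print(best, scores[best])
```

Ranking by RMSE alone is a simplification; the study also considers MAE, MAPE and the residual diagnostics before choosing a model.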

Nonparametric Regression (Härdle, 1990; Takezawa, 2006)
Nonparametric regression for functional estimation has become increasingly popular as a tool for data analysis. The technique imposes only a few assumptions about the shape of the function and may therefore be more flexible than the usual parametric approaches. In many situations the exact functional form is unknown, and sometimes no parametric functional form can represent the data. In such situations the nonparametric technique, which depends entirely on the data, is more suitable. The method is based on the local regression smoother, and its only assumption concerns the form of the trend. Nonparametric techniques have comparable merit: they broadly indicate the direction of the growth rates, whereas the usual statistical analysis provides both the direction and the dimension of the growth rates.

Estimation of trend and growth rate (Jose et al., 2008)
The nonparametric regression model with additive error is of the form

$$Y_i = m(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where $Y_i$ is the observation at time point $x_i$, $m$ is the unknown trend function and $\varepsilon_i$ is the random error. The kernel weighted linear regression smoother (Fan, 1992) is used to estimate the trend function nonparametrically. The value of the local linear regression smoother at time $x$ is the solution for $a_0$ of the following weighted least squares problem:

$$\min_{a_0,\, a_1} \sum_{i=1}^{n} \left\{ Y_i - a_0 - a_1 (x_i - x) \right\}^2 K\!\left( \frac{x_i - x}{h} \right),$$

where $K$ is a bounded symmetric kernel density function and $h$ is the bandwidth. Let $\hat{a}_0$ and $\hat{a}_1$ be the solutions to the weighted least squares problem. The estimate of the trend function $m(x)$ is given by

$$\hat{m}(x) = \hat{a}_0.$$

The optimum bandwidth $h$ can be obtained by the method of cross-validation. The slope $m'(x)$ of $m(x)$ can be considered as the simple linear growth rate at the time point $x$. The estimate of $m'(x)$ is given by

$$\hat{m}'(x) = \hat{a}_1.$$

Under the assumption that the trend function $m$ is smooth and $m(x) \neq 0$ for all $x \in [0, 1]$, the value of the relative growth rate at time $x$ can be written as

$$r_x = \frac{m'(x)}{m(x)},$$

and a consistent estimate of the relative growth rate $r_x$ is given by

$$\hat{r}_x = \frac{\hat{m}'(x)}{\hat{m}(x)}.$$

Taking the arithmetic mean of these estimates over the time points, the requisite compound growth rate over a given time period may be obtained.
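The local linear smoother, the cross-validated bandwidth and the relative growth rate described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: a Gaussian kernel is assumed (the text only requires a bounded symmetric kernel density), cross-validation is taken to be leave-one-out over a small hypothetical grid of bandwidths, and the data are synthetic.

```python
import math

def gaussian_kernel(u):
    """Gaussian kernel density (an assumed choice of K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def local_linear(x0, xs, ys, h, skip=None):
    """Solve the kernel-weighted least squares problem at x0.

    Returns (a0, a1), the estimates of m(x0) and m'(x0).
    The observation with index `skip` is left out (used for cross-validation).
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        d = xi - x0
        w = gaussian_kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1          # determinant of the 2x2 normal equations
    a0 = (s2 * t0 - s1 * t1) / det
    a1 = (s0 * t1 - s1 * t0) / det
    return a0, a1

def cv_score(xs, ys, h):
    """Leave-one-out mean squared prediction error for bandwidth h."""
    n = len(xs)
    return sum((ys[i] - local_linear(xs[i], xs, ys, h, skip=i)[0]) ** 2
               for i in range(n)) / n

def relative_growth_rates(xs, ys, h):
    """Estimated relative growth rate m'(x)/m(x) at every time point."""
    rates = []
    for x0 in xs:
        a0, a1 = local_linear(x0, xs, ys, h)
        rates.append(a1 / a0)
    return rates

# Synthetic series on [0, 1] whose true relative growth rate is 2 everywhere.
n = 30
xs = [i / (n - 1) for i in range(n)]
ys = [math.exp(2.0 * x) for x in xs]

grid = [0.05, 0.1, 0.2]              # hypothetical candidate bandwidths
h_opt = min(grid, key=lambda h: cv_score(xs, ys, h))
rates = relative_growth_rates(xs, ys, h_opt)
compound_rate = 100 * sum(rates) / len(rates)   # arithmetic mean, in percent
print(h_opt, round(compound_rate, 2))
```

Because time is rescaled to [0, 1], the rates are per unit of rescaled time; converting them to a daily rate would require dividing by the number of days spanned.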

Descriptive Statistics
Descriptive statistics were calculated to examine the distribution of the study variables and are reported in Table 1. For both variables, the p-values of the Jarque-Bera statistic are significant, indicating that the study variables are not normally distributed. The maximum number of COVID-19 infected cases, 532,529, was registered at Chennai, and the minimum, 11,056, at Perambalur; the maximum cumulative number of deaths due to COVID-19 infection, 8,187, was registered at Chennai, and the minimum, 164, at Nilgiris.
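For readers unfamiliar with the Jarque-Bera test, the statistic can be computed directly from the sample skewness and kurtosis, as in the following sketch. The function name and the sample values are hypothetical and purely illustrative; they are not the figures in Table 1. The p-value uses the asymptotic chi-square distribution with two degrees of freedom, whose upper tail probability is exp(-JB/2).

```python
import math

def jarque_bera(data):
    """Jarque-Bera statistic and asymptotic p-value (chi-square, 2 d.f.)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n   # central moments
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)   # survival function of chi-square with 2 d.f.
    return jb, p_value

# Synthetic, strongly right-skewed counts (illustrative only).
sample = [2 ** k for k in range(1, 21)]
jb, p_value = jarque_bera(sample)
print(round(jb, 2), p_value < 0.05)
```

A p-value below 0.05 leads to rejecting normality at the 5% level, which is the sense in which the study variables are described as not normally distributed.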

Trends in cumulative number of deaths due to COVID-19 based on parametric model
The results presented in Table 3 reveal that, among the parametric models, the ARIMA (2,1,0) time series model has the lowest values of RMSE (1221.02), MAE (608.53) and MAPE (148.83). Hence, among the parametric models, ARIMA (2,1,0) is found suitable to fit the trend in the cumulative number of deaths due to COVID-19 infection.
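To make the structure of an ARIMA (2,1,0) model concrete, the sketch below differences the series once and fits an AR(2) model with a constant to the differences by conditional least squares. This is a simplification chosen for illustration: the estimation method used by Statgraphics is not stated in the text and may differ, whether the fitted model includes a constant is an assumption, the helper names are hypothetical, and the series is synthetic.

```python
def solve(A, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_arima_210(y):
    """Conditional least squares for d_t = c + phi1*d_{t-1} + phi2*d_{t-2} + e_t,
    where d_t = y_t - y_{t-1} is the first difference."""
    d = [b - a for a, b in zip(y, y[1:])]
    rows = [[1.0, d[t - 1], d[t - 2]] for t in range(2, len(d))]
    target = d[2:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(3)]
    return tuple(solve(xtx, xty))

def forecast_next(y, c, phi1, phi2):
    """One-step-ahead forecast of the cumulative series."""
    return y[-1] + c + phi1 * (y[-1] - y[-2]) + phi2 * (y[-2] - y[-3])

# Synthetic cumulative series whose differences follow a noise-free AR(2).
d = [1.0, 4.0]
for _ in range(13):
    d.append(2.0 + 0.5 * d[-1] - 0.3 * d[-2])
y = [100.0]
for step in d:
    y.append(y[-1] + step)

c, phi1, phi2 = fit_arima_210(y)
print(round(c, 4), round(phi1, 4), round(phi2, 4), round(forecast_next(y, c, phi1, phi2), 4))
```

With real data the residuals are not zero, so the fitted values would then be scored with RMSE, MAE and MAPE in the same way as the other candidate models.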

Trends in cumulative numbers of COVID-19 infected cases based on nonparametric regression model
The nonparametric regression model is employed to fit the trend in the number of COVID-19 infected cases. Nonparametric estimates of the underlying growth function are computed at each time point. Residual analysis showed that the assumption of independence of errors is not violated at the 5% level of significance. The RMSE, MAE and MAPE values are 62090.77, 31047.79 and 65.78, respectively. These values are much lower than those obtained through the parametric models, indicating the superiority of this approach over the parametric approach. The nonparametric regression model is therefore selected as the best-fitted trend function for the number of COVID-19 infected cases and is depicted in Fig. 3.
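The text does not name the test used to check the independence of the residuals. As one possible illustration, the sketch below applies a runs test above and below the median with a normal approximation; this choice of test, the function name and the residual values are all assumptions made for the example.

```python
import math
from statistics import median

def runs_test(residuals):
    """Runs test above/below the median; returns (runs, z, two-sided p-value)."""
    med = median(residuals)
    signs = [r > med for r in residuals if r != med]   # drop ties with the median
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal tail
    return runs, z, p_value

# Illustrative residuals: one steadily trending set, one irregular set.
trending = [float(v) for v in range(-10, 10)]
irregular = [0.3, -1.2, 0.8, 1.1, -0.4, -0.9, 1.5, -0.2, 0.6, -1.0, -0.7, 0.9]

print(runs_test(trending))
print(runs_test(irregular))
```

A p-value above 0.05 means the hypothesis of random, independent residuals is not rejected at the 5% level, which corresponds to the statement that the independence assumption is not violated.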

Trends in cumulative numbers of deaths due to COVID-19 based on nonparametric regression model
The nonparametric regression model is employed to fit the trend in the cumulative number of deaths due to COVID-19. Nonparametric estimates of the underlying growth function are computed at each time point. Residual analysis showed that the assumption of independence of errors is not violated at the 5% level of significance. The RMSE, MAE and MAPE values are 980.47, 464.70 and 78.49, respectively. These values are much lower than those obtained through the parametric models, indicating the superiority of this approach over the parametric approach. The nonparametric regression model is therefore selected as the best-fitted trend function for the cumulative number of deaths due to COVID-19 and is depicted in Fig. 4.

Conclusion
The results reveal that none of the parametric models is suitable for studying the trends in the cumulative number of COVID-19 infected cases and the number of deaths due to COVID-19 infection.