Modeling and Monitoring COVID-19 Monthly Infected Cases and Deaths

The effects of the novel coronavirus (COVID-19) pandemic could not have been more profound, with the world encountering health crises as well as enormous economic crises. This paper studies the relationships and trends in the number of new COVID-19 infected cases and the number of deaths due to COVID-19 in all 37 districts of Tamil Nadu state, India, during the period 3rd July 2020 to 31st March 2021, based on a panel regression model.


Background of the study
The COVID-19 pandemic that began in 2019 has received much attention, affecting most of the world economies and leading to countless deaths. In the absence of antiviral drugs and vaccines, the number of new COVID-19-infected cases has increased tremendously and has caused many deaths. The deployment of various methodologies to analyze pandemic data has become a particularly important research area to forecast coronavirus infected cases and deaths.

Review of literature
Some of the research carried out by various authors on the modeling of COVID-19 data is reviewed below.
Al-Rousan and Al-Najjar (2020) pointed out that on 1st February 2020 the COVID-19 coronavirus outbreak was announced to the public and classified as an epidemic. Although the disease was discovered and concentrated in Hubei province, China, it was exported to all other Chinese provinces and spread globally. At the time of their study, all plans to contain the novel coronavirus disease had failed, and it continued spreading across the world, exceeding 98,000 cases globally, of which 80,000 were in mainland China. Their manuscript aimed to study and interpret the effect of environmental and meteorological variables on the spread of coronavirus disease in 30 Chinese provinces. They found that increasing both the temperature and shortwave radiation variables would increase the number of confirmed, death, and recovered cases.
Al-Rousan and Al-Najjar (2020) stated that the coronavirus epidemic led to the declaration of an emergency in South Korea. The virus started with one infected case on January 20, 2020, and 9583 cases had been reported by March 29, 2020. This indicates that the number of confirmed cases was increasing rapidly, which could cause a national crisis for South Korea. The aim of their study was to fill a gap between previous studies and the current development of the spread by extracting a relationship between the independent variables and the dependent variable. The research statistically analyzed the effect of sex, region, infection reasons, birth year, and released or deceased date on the reported numbers of recovered and deceased cases. The results showed that sex, region, and infection reasons affected both recovered and deceased cases, while birth year affected only deceased cases. Besides, no deceased cases were reported among released cases, while 11.3% of deceased cases were confirmed positive only after death. An unknown reason of infection was the main category detected in South Korea, accounting for more than 33% of total infected cases.
Arumugam and Rajathi (2020) asserted that the Markov chain model is mainly used in business, manpower planning, share markets and many other areas, and that predictions based on a Markov chain need to be efficient. COVID-19 infection poses a major challenge for people as well as governments. Their paper focuses on predicting coronavirus infection with a Markov chain model. The model was used to predict COVID-19 based on secondary data as of 13th March 2020. First-order Markov models were used to predict the impact of the coronavirus using probability matrices and Monte Carlo simulation. To present the applications of this model, the 2020 coronavirus pandemic in India by state and union territory was used as a case study. The authors expect the model to be useful for predicting COVID-19 in the future.
Bertozzi et al. (2020) stated that the coronavirus disease 2019 (COVID-19) pandemic has placed epidemic modeling at the forefront of worldwide public policy making. Nonetheless, modeling and forecasting the spread of COVID-19 remains a challenge. They detailed three regional-scale models for forecasting and assessing the course of the pandemic. Their work demonstrates the utility of parsimonious models for early-time data and provides an accessible framework for generating policy-relevant insights into its course. They also showed how these models can be connected to each other and to time series data for a particular region. Capable of measuring and forecasting the impacts of social distancing, these models highlight the dangers of relaxing nonpharmaceutical public health interventions in the absence of a vaccine or antiviral therapies.
Mahanty et al. (2020) presented a medical stance on research studies of COVID-19, in which they estimated a time-series-based statistical model using Prophet to understand the trend of the pandemic after July 29, 2020, using global-level data. Prophet is an open-source framework developed by the Data Science team at Facebook for carrying out forecasting operations. It helps automate the development of accurate forecasts and can be customized to the use case at hand. According to the authors, the Prophet model is easy to work with because its official repository is live on GitHub and open for contributions, and the model can be fitted effortlessly. The statistical data presented in the paper refer to the number of officially confirmed daily cases for the period January 22, 2020, to July 29, 2020. The estimates produced by the forecast models can then be used by governments and medical care departments of various countries to manage the existing situation and to try to flatten the curve, since the authors believed there was minimal time to do this. The inferences made using the model can be clearly comprehended without much effort. Furthermore, the model gives an understanding of past, present, and future trends by showing graphical forecasts and statistics. Compared to other models, the authors argued that Prophet holds its own importance and innovativeness, as the model is fully automated and generates quick and precise forecasts that can additionally be tuned.
Tiwari et al. (2020) stated that COVID-19 was rapidly spreading in South Asian countries, especially in India, which was the fourth most COVID-19-affected country as of July 10, 2020.
With limited medical facilities and a high transmission rate, COVID-19 progression and its subsequent trajectory need to be analyzed in India. Epidemiologic mathematical models have the potential to predict the epidemic peak of COVID-19 under different scenarios. Lockdown is one of the most effective mitigation policies adopted worldwide to control the transmission rate of COVID-19 cases. In their study, the authors used an improvised five-compartment mathematical model, i.e., Susceptible (S)-Exposed (E)-Infected (I)-Recovered (R)-Death (D) (SEIRD), to investigate the progression of COVID-19 and predict the epidemic peak under the impact of lockdown in India.
The aim of their study was to provide a more precise prediction of the epidemic peak and to evaluate the impact of lockdown on the epidemic peak shift in India. For this purpose, they examined the most recent data (from January 30, 2020 to July 10, 2020, i.e., 160 days) to enhance the accuracy of the outcomes obtained from the proposed model. The model predicted that the total number of COVID-19 active cases would be around $5.8 \times 10^5$ on August 15, 2020 under the circumstances at that time. In addition, their study indicated the existence of under-reported cases, i.e., $10^5$, during the post-lockdown period in India. Consequently, the study suggested that a nationwide public lockdown would lead to epidemic peak suppression in India. The authors expected the results to be beneficial for determining further COVID-19 mitigation policies not only in India but globally as well.
Rosario et al. (2020) aimed to evaluate the relationship between weather factors (temperature, humidity, solar radiation, wind speed, and rainfall) and COVID-19 infection in the State of Rio de Janeiro, Brazil. Solar radiation showed a strong negative correlation (-0.609, p < 0.01) with the incidence of the novel coronavirus (SARS-CoV-2). Temperature (maximum and average) and wind speed also showed negative correlations (p < 0.01). Therefore, in this tropical state, high solar radiation can be indicated as the main climatic factor that suppresses the spread of COVID-19, with high temperature and wind speed as further potential factors. The findings of the study can help improve the organization of strategies to combat the pandemic in the State of Rio de Janeiro, Brazil, and in other tropical countries around the world.
Zuo (2020) asserted that in the current scenario, the outbreak of the pandemic disease COVID-19 is of great interest.
A broad statistical analysis of this event is still to come, but it is immediately necessary to evaluate the disease dynamics in order to arrange appropriate quarantine activities and to estimate the required number of places in hospitals, the level of individual protection, and the rate of isolation of infected persons, among others. The article provides a convenient method of data comparison that can be helpful for both governmental and private organizations. Up-to-date facts and figures on the total confirmed cases, daily confirmed cases, total deaths, and daily deaths reported in the Asian countries are provided. Furthermore, a statistical model is suggested to provide the best description of the COVID-19 total death data in the Asian countries.
Chu (2021) reported that the novel coronavirus (COVID-19), first known at the end of 2019, has impacted almost every aspect of life as we know it. The paper focuses on the incidence of the disease in Italy and Spain, two of the first and most affected European countries. Using two simple mathematical epidemiological models, the Susceptible-Infectious-Recovered model and the log-linear regression model, the author modeled the daily and cumulative incidence of COVID-19 in the two countries during the early stage of the outbreak and computed estimates for basic measures of the infectiousness of the disease, including the basic reproduction number, growth rate, and doubling time. Estimates of the basic reproduction number were found to be larger than 1 in both countries, with values between 2 and 3 for Italy and between 2.5 and 4 for Spain. Estimates were also computed for the more dynamic effective reproduction number, which showed that the severity has generally been decreasing since the first cases were confirmed in the respective countries.
The log-linear regression model was found to give a better fit, and simple estimates of the daily incidence were computed for both countries.

Gecili et al. (2021) stated that the novel coronavirus (COVID-19) is an emergent disease that initially had no historical data to guide scientists in predicting or forecasting its global or national impact over time. The ability to predict the progress of this pandemic has been crucial for decision making aimed at fighting the pandemic and controlling its spread. The authors considered four different statistical/time series models that are readily available from the 'forecast' package in R.
They performed novel applications with these models, forecasting the number of infected cases (confirmed cases, and similarly the numbers of deaths and recoveries) along with the corresponding 90% prediction intervals to estimate the uncertainty around pointwise forecasts. Since the future may not repeat the past for this pandemic, no prediction model is certain. However, any prediction tool with acceptable prediction performance (or prediction error) could still be very useful for public-health planning to handle the spread of the pandemic, and could support policy decision-making and facilitate the transition to normality. These four models were applied to publicly available data on the COVID-19 pandemic for both the USA and Italy. The authors observed that all models reasonably predicted the future numbers of confirmed cases, deaths, and recoveries of COVID-19. However, for the majority of the analyses, the autoregressive integrated moving average (ARIMA) time series model and the cubic smoothing spline model both had smaller prediction errors and narrower prediction intervals than the Holt model and the Trigonometric Exponential smoothing state space model with Box-Cox transformation (TBATS). Therefore, the former two models were preferable to the latter two. Given the similarities in the performance of the models in the USA and Italy, the corresponding prediction tools can be applied to other countries grappling with the COVID-19 pandemic, and to any pandemics that may occur in the future.
The speed of adjustment is found to be 9.9% (Rajarathinam and Tamilselvan, 2021).

Objectives of the present study
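Based on the above discussion, the present study aimed to study the relationships and trends in the number of new COVID-19 infected cases and the number of deaths due to COVID-19 in all 37 districts of Tamil Nadu state, India, during the period 3rd July 2020 to 31st March 2021, based on a panel regression model.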

Panel data model
Panel data are a type of data that contain observations of multiple phenomena collected over different time periods for the same group of individuals, units or entities. In short, econometric panel data are multidimensional data collected over a given period.

A simple panel data regression model is specified as
$$Y_{it} = \alpha + \beta X_{it} + u_{it}$$
Here, $Y$ is the dependent variable, $X$ is the independent or explanatory variable, $\alpha$ and $\beta$ are the intercept and slope, $i$ stands for the $i$th cross-sectional unit and $t$ for the $t$th month. $X$ is assumed to be non-stochastic and the error term to follow the classical assumptions, namely, $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$. In this study, the number of cross-sections (districts) is 37 ($i = 1, 2, 3, \ldots, 37$), and $t = 1, 2, 3, \ldots, 9$.
Detailed discussions of panel data modeling can be found in Baltagi (2001). By combining time series of cross-sections of observations, panel data provide "more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency" (Baltagi, 2001).

Materials
The COVID-19 dataset was collected from the official Tamil Nadu government website (www.stopcorona.tn.gov.in) for the period 3rd July 2020 to 31st March 2021 (the study period). Different econometric tools related to panel data regression modeling were employed to investigate the research questions of the present study. Several methodologies for panel data regression modeling are discussed in the methods section. EViews Version 11 was used for the calculations.

Unit root tests
Unit roots in panel data can be tested for using either the Levin et al. (2002) test or the Hadri (2000) Lagrange multiplier (LM) stationarity test. For the Levin et al. test, the null hypothesis is that the panels contain unit roots, and the alternative hypothesis is that the panels are stationary; if the p value is less than 0.05, the null hypothesis is rejected in favor of the alternative. The Hadri test reverses these hypotheses and takes stationarity as the null.

Pooled Regression OLS model or Constant Coefficient Model
The pooled model with constant coefficients (the usual assumption for cross-sectional analysis) is specified as
$$Y_{it} = \alpha + \beta X_{it} + u_{it}$$
Here, $i = 1, 2, 3, \ldots, 37$ and $t = 1, 2, 3, \ldots, 9$, where $i$ stands for the $i$th cross-sectional unit (district) and $t$ for the $t$th month. It is assumed that $X$ (the independent variable) is non-stochastic and that the error term follows the classical assumptions, namely, $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$.
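As a minimal sketch of the pooled (constant coefficients) estimator, the Python code below stacks all district-month observations and fits a single intercept and slope by ordinary least squares. The function name `pooled_ols` and the toy numbers are illustrative assumptions, not the study data; the estimates reported in this paper were obtained with EViews.

```python
from statistics import fmean


def pooled_ols(x, y):
    """Fit y = alpha + beta * x by OLS on stacked panel observations.

    x, y: equal-length sequences holding every (district, month) pair.
    Returns (alpha, beta, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, 1.0 - ss_res / ss_tot


# Hypothetical toy panel: 2 districts x 3 months, stacked into one sample.
ncase = [100, 200, 300, 150, 250, 350]
death = [2, 4, 6, 3, 5, 7]
alpha, beta, r2 = pooled_ols(ncase, death)
```

Because the district and month labels are ignored, every observation shares the same intercept and slope, which is exactly the restriction the later tests examine.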

Individual-specific effects model
We assume that there is unobserved heterogeneity across individuals, captured by $\alpha_i$. The main question is whether the individual-specific effects $\alpha_i$ are correlated with the regressor. If they are correlated, we have a fixed effects (FE) model. If they are not correlated, we have a random effects (RE) model.

FE least squares dummy variable (LSDV) model (Gujarati et al., 2017)
The term fixed effects is used because although the intercept may vary across districts, each entity's intercept does not vary over time; that is, it is time invariant. In other words, the individual-specific effects are the leftover variation in the dependent variable that cannot be explained by the regressor. By using the dummy variable technique, one can allow the fixed effects intercept to vary among the districts.
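The LSDV slope can also be obtained without creating the dummy variables, by subtracting each district's mean from its observations (the within transformation); the two approaches give the same slope. The sketch below illustrates this under that equivalence. The function name `fixed_effects` and the toy panel are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import fmean


def fixed_effects(district, x, y):
    """Within (demeaning) estimator; its slope equals the LSDV slope.

    Returns (beta, intercepts), where intercepts maps district -> alpha_i.
    """
    groups = defaultdict(list)
    for d, xi, yi in zip(district, x, y):
        groups[d].append((xi, yi))
    # District-level means of the regressor and the dependent variable.
    means = {
        d: (fmean([p[0] for p in obs]), fmean([p[1] for p in obs]))
        for d, obs in groups.items()
    }
    sxx = sxy = 0.0
    for d, obs in groups.items():
        x_bar, y_bar = means[d]
        for xi, yi in obs:
            sxx += (xi - x_bar) ** 2
            sxy += (xi - x_bar) * (yi - y_bar)
    beta = sxy / sxx
    # Each district's intercept is recovered from its own means.
    intercepts = {d: y_bar - beta * x_bar for d, (x_bar, y_bar) in means.items()}
    return beta, intercepts


# Hypothetical toy panel: common slope, different district intercepts.
district = ["A", "A", "A", "B", "B", "B"]
ncase = [100, 200, 300, 100, 200, 300]
death = [3, 5, 7, 12, 14, 16]
beta, alpha_i = fixed_effects(district, ncase, death)
```

In this toy panel the two districts share one slope but start from different levels, so the estimated intercepts differ while the slope is common.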

RE model
Rho ($\rho$) is the intraclass correlation of the error, that is, the fraction of the variance in the error due to the individual-specific effects. It approaches 1 if the individual effects dominate the idiosyncratic error.
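Read as a variance share, Rho is the variance of the individual-specific effects divided by the total error variance. The short sketch below computes it from the two variance components; the function and argument names are assumptions chosen for illustration.

```python
def rho(sigma2_individual, sigma2_idiosyncratic):
    """Share of the error variance due to individual-specific effects."""
    total = sigma2_individual + sigma2_idiosyncratic
    if sigma2_individual < 0 or sigma2_idiosyncratic < 0 or total == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return sigma2_individual / total


no_random_effect = rho(0.0, 4.0)   # individual variance is zero
dominant_effect = rho(9.0, 1.0)    # individual effects dominate
```

A value of 0 means that the district-specific component contributes nothing to the error variance.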

Restricted F-test (Bhaumik,2017)
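In the restricted F-test, the null hypothesis is that the intercepts of all cross-sectional units are equal, that is, that there are no fixed effects.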
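In its common textbook form, assumed here since the formula is not shown in the text, the restricted F-test compares the $R^2$ of the pooled (restricted) model with that of the LSDV (unrestricted) model. The sketch below implements that form; the function name and the example inputs are assumptions. The inputs use the rounded $R^2$ values reported in the results (92% and 93%), so the output is only indicative.

```python
def restricted_f(r2_restricted, r2_unrestricted, m, n, k):
    """F = ((R2_ur - R2_r) / m) / ((1 - R2_ur) / (n - k)).

    m: number of restrictions (dummy coefficients set to zero)
    n: number of observations
    k: number of parameters in the unrestricted model
    """
    if n <= k or m <= 0:
        raise ValueError("need n > k and m > 0")
    numerator = (r2_unrestricted - r2_restricted) / m
    denominator = (1.0 - r2_unrestricted) / (n - k)
    return numerator / denominator


# Assumed setup: 37 districts x 9 months = 333 observations,
# 36 district dummies tested, 38 parameters in the LSDV model.
f_star = restricted_f(0.92, 0.93, m=36, n=333, k=38)  # roughly 1.17
```

The statistic is compared with the tabulated F value with (m, n - k) degrees of freedom; a calculated value below the tabulated one means the dummy coefficients add no significant explanatory power.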

Unit root tests
In analyses of time series data, it is important that the study variables are stationary, which means that their means and variances remain constant over time. Accordingly, Levin-Lin-Chu unit root tests were carried out to test the stationarity of the study variables, viz., the number of COVID-19-infected patients (NCASE) and the number of deaths due to COVID-19 (DEATH). The results are reported in Table 1.
The test results presented in Table 1 reveal that the two variables under study, NCASE and DEATH, are stationary in level, since the Levin, Lin and Chu t-statistics are highly significant (p < 0.0001).

Fig. 1 depicts the number of COVID-19-positive patients registered in different districts of Tamil Nadu from 3rd July 2020 to 31st March 2021.
Further, the number of COVID-19-positive patients registered follows a third-degree polynomial trend, with $R^2$ equal to 99%. The model is highly significant, and the parameter values are also significant at the 5% level. The residuals of this model are normally distributed, because the Shapiro-Wilk test statistic (test for normality) is non-significant. The residuals are also independent, as the runs test statistic is non-significant. Hence, the model is well specified, and the results obtained from it are statistically valid.
Further, the number of deaths due to COVID-19 registered follows an exponential model, with $R^2$ equal to 95%. The model is highly significant, and the parameter values are also significant at the 5% level. The residuals of this model are normally distributed, because the Shapiro-Wilk test statistic is non-significant, and they are independent, as the runs test statistic is non-significant. Hence, this model is also well specified, and the results obtained from it are statistically valid.
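One common way to fit an exponential trend of the form y = a·exp(b·t) is to regress log(y) on t by least squares. The sketch below shows that approach; it is an assumption about the fitting method, which the text does not specify, and the monthly values are synthetic rather than the study data.

```python
import math
from statistics import fmean


def fit_exponential(t, y):
    """Fit y = a * exp(b * t) by least squares on log(y); requires y > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("all y values must be positive")
    log_y = [math.log(v) for v in y]
    t_bar, l_bar = fmean(t), fmean(log_y)
    stt = sum((ti - t_bar) ** 2 for ti in t)
    stl = sum((ti - t_bar) * (li - l_bar) for ti, li in zip(t, log_y))
    b = stl / stt
    a = math.exp(l_bar - b * t_bar)
    return a, b


# Synthetic monthly series generated from a = 1000, b = -0.4.
months = [1, 2, 3, 4, 5]
deaths = [1000 * math.exp(-0.4 * m) for m in months]
a, b = fit_exponential(months, deaths)
```

The residuals from such a fit are what a normality test and a runs test would then be applied to.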
The highest number of deaths due to COVID-19 (3,387) occurred in August 2020, and the lowest number (140) occurred in February 2021. In total, 11,022 deaths due to COVID-19 were registered in Tamil Nadu during the study period.

Fig. 2. The number of COVID-19-positive patients registered in different districts of Tamil Nadu from 3rd July 2020 to 31st March 2021.

Variations between months
To determine the variations across the months of the study period in the number of COVID-19-positive infected cases and deaths due to COVID-19, ANOVA tests were carried out for each of the study variables, NCASE and DEATH, and the results are presented in Table 2. The ANOVA tests are highly significant (p < 0.0001) for both study variables. This means that the differences between months in the number of positive infections and in the number of deaths registered are highly significant at the 1% level of significance.
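The one-way ANOVA F statistic behind this comparison is the ratio of the between-month mean square to the within-month mean square. The sketch below computes it for groups of observations, where each group would hold the district values for one month; the function name and the toy groups are illustrative assumptions.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of observation groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = fmean([v for g in groups for v in g])
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: three "months" with three observations each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With 9 months and 37 districts per month, the degrees of freedom in this layout would be 8 and 324.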

Pooled OLS regression Model or Constant Coefficients Model
The panel least squares method is employed with the number of deaths due to COVID-19 as the dependent variable and the number of new COVID-19-infected patients as the independent variable. The regression results based on EViews, Version 11, are presented in Table 3.
The estimated model is
DEATH = -4.915224 + 0.0161 NCASE ($R^2$ = 92%)
The results reveal that the intercept and slope are very highly significant, and the model F-statistic is also highly significant, with a remarkably high $R^2$ of 92%; that is, the regressor NCASE explains 92% of the variation in DEATH. Additionally, for every unit increase in NCASE, DEATH increases by about 0.016 units, which corresponds to roughly 1.6 deaths per 100 new infected cases.
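The slope can be read directly as the expected number of additional deaths per additional infected case. The short calculation below uses the estimated coefficients reported above; the function name and the example case counts are assumptions for illustration.

```python
ALPHA = -4.915224  # estimated intercept of the pooled model
BETA = 0.0161      # estimated slope on NCASE


def predicted_deaths(ncase):
    """Fitted number of deaths for a district-month with `ncase` new cases."""
    return ALPHA + BETA * ncase


# Effect of 100 additional cases on the fitted number of deaths.
extra_per_100_cases = predicted_deaths(1100) - predicted_deaths(1000)
fitted_at_1000 = predicted_deaths(1000)
```

The difference of about 1.61 deaths per 100 additional cases matches the slope expressed as a percentage, about 1.6%.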
The major problem with this model is that it does not distinguish between the districts, nor does it tell us whether the response of total COVID-19 deaths to the explanatory variable over time is the same for all districts.
The results presented in Table 4 reveal that the FE model is highly significant, with a high $R^2$ of 93%. The slope coefficient for new COVID-19 infections is also found to be highly significant, which shows that new COVID-19 infections explain a significant share of the variation in deaths due to COVID-19. All the dummy variable coefficients are found to be non-significant, indicating that the pooled regression model may be informative and appropriate. Additionally, the slope coefficient in Table 4 is almost the same as that of the pooled model and is highly significant in both models. These two inferences indicate that the constant coefficients model (CCM) seems to be a better fit than the FE model.
The results in Table 5 reveal that, since the value of Rho is 0, the absence of random effects is confirmed.

Hausman test
The Hausman test evaluates whether there is a significant difference between the FE and RE estimators. The results presented in Table 6 reveal that, since the estimated chi-square value is significant, we reject the hypothesis that there is no significant difference in the estimated coefficients of the two models. It seems that there is correlation between the error term and one or more regressors. Hence, we can reject the random effects model in favor of the fixed effects model. Note, however, as the last part of Table 6 shows, that not all coefficients differ between the two models (fixed and random).
Again, to confirm the inference obtained through the redundant fixed effects test, the restricted F-test was carried out and is discussed below.
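With a single regressor, the Hausman statistic reduces to the squared difference between the FE and RE slopes divided by the difference of their variances, and it is compared with a chi-square distribution with one degree of freedom. The sketch below implements this single-regressor case; the function name and all input values are hypothetical and are not taken from Table 6.

```python
import math


def hausman_one_regressor(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p value for one slope coefficient.

    H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), chi-square with 1 df.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical slopes and variances, for illustration only.
h, p = hausman_one_regressor(0.0165, 0.0161, 4.0e-8, 2.0e-8)
```

A small p value, as in this made-up example, would lead to rejecting the RE model in favor of the FE model.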

The Restricted F-test
The restricted F-test discussed in Section 2.2.7 was employed, and the calculated F* was found to be less than the tabulated F value, indicating the absence of fixed effects; the intercepts of the cross-sectional units are non-significant at the 5% level of significance, as per the results given in the

4. CONCLUSIONS
During the study period (3rd July 2020 to 31st March 2021), the highest number of infections (181,817) and the highest number of deaths (3,387) were registered in August 2020, and the lowest were in February 2021. Overall, during the study period, 786,990 infected cases and 11,022 deaths were registered in Tamil Nadu. The interesting result obtained in this paper is that, even though the data are of panel type, none of the panel regression models was found suitable, whereas the constant coefficients model (pooled regression model) was found suitable for studying the relationship between the number of COVID-19 infections and deaths. The average death rate due to COVID-19 was about 1.6%.

Figure 1. The number of COVID-19-positive patients registered in different districts of Tamil Nadu from 3rd July 2020 to 31st March 2021.

Figure 2. The number of COVID-19-positive patients registered in different districts of Tamil Nadu from 3rd July 2020 to 31st March 2021.