Estimating Trends in COVID-19 Infected Cases Based on Panel Data Regression Modelling

Background and Objective: The novel coronavirus pandemic, known as COVID-19, could not have been more predictable; thus, the world encountered health crises and substantial economic crises. This paper analysed the trends in COVID-19 cases in October 2020 in four southern districts of Tamil Nadu state, India, using a panel regression model. Materials and Methods: Panel data on the number of COVID-19-infected cases were collected from daily bulletins published through the official website www.stopcorona.tn.gov.in maintained by the Government of Tamil Nadu state, India. Panel data regression models were employed to study the trends. EViews Ver. 11 software was used to estimate the model and its parameters. Results: In all four districts, the COVID-19-infected case data followed a normal distribution. The maximum numbers of COVID-19-infected cases were registered in Kanniyakumari, followed by Tirunelveli, Thoothukudi and Tenkasi districts. The fewest COVID-19 cases were registered in Tenkasi, followed by Tirunelveli, Thoothukudi and Kanniyakumari districts. A random effects model was found to be an appropriate model to study the trend. Conclusion: The panel data regression model was found to be more appropriate than traditional models. The Hausman test and Wald test confirmed the selection of the random effects model. The Jarque-Bera normality test ensured the normality of the residuals. In all four districts under study, the number of COVID-19 infections showed a decreasing trend at a rate of 1.68% during October 2020.


INTRODUCTION
The 2019 COVID-19 pandemic has received much attention, as it has affected most economies worldwide and resulted in innumerable deaths. Because no antiviral drugs or vaccines exist, the number of new coronavirus-affected cases has increased tremendously, and many people have died. The development of methodologies to analyse these pandemic data has become an especially important research area. The topic is important because trend estimates help decision makers decide on lockdown-related matters, the number of hospital beds needed to admit infected patients, the quantity of oxygen needed, and so on. The results of the present study would help the government and healthcare communities to initiate appropriate measures to control the outbreak in advance. Trend analysis using sound statistical models is important for investigating the pattern in the spread of a disease.
Hidden models and trends of COVID-19 infected cases in all 37 districts of Tamil Nadu State, India, during the period from 1st August 2020 to 31st December 2020 have been estimated using different curve fitting tools. The results showed a decreasing trend in all the districts 1. Lockdowns and the effectiveness of the reduction in contacts in Italy have been measured using a modified model. The results showed a decrease in infected people due to stay-at-home orders and tracing quarantine interventions 2. The number of COVID-19 cases in India has been predicted based on a differential equation model. The model shows that the imposition of a countrywide lockdown plays an important role in restraining the spread of the disease 3. A log-linear model has been used to estimate the progression of the COVID-19 infection in Tamil Nadu, India. The result indicates that the outbreak is showing decay in the number of infections, which highlights the effectiveness of the controlling measures 4. A study analysed the spatial distribution of COVID-19 incidence in Brazil's municipalities (counties) and investigated its association with sociodemographic determinants, to better understand the social context and the epidemic's spread in the country, using ordinary least squares (OLS), the spatial autoregressive model (SAR), the conditional autoregressive model (CAR) and the local regression model called multiscale geographically weighted regression (MGWR). The MGWR model fit improved when compared to OLS, SAR and CAR 5. A mathematical epidemic model inspired by the classic Susceptible (S)-Infected (I)-Recovered (R) (SIR) model has been developed as a compartmental model with ten compartments to study coronavirus dynamics in India, including factors related to face mask efficacy, contact tracing and testing, along with quarantine and isolation 6. An autoregressive integrated moving average (ARIMA) modelling approach has been used for projecting coronavirus (COVID-19) prevalence patterns in East African countries, mainly Ethiopia, Djibouti, Sudan and Somalia 7. Dynamic panel data modelling and surveillance of COVID-19 in metropolitan areas in the United States has been carried out using longitudinal trend analysis and the Wald test 8.
In ARIMA time series modelling, the variable under study should be stationary. When the variable is not stationary, the original data must be converted to first or second differences. Once the original series has been made stationary, the estimated trend model applies to the transformed series, not to the original series. This is one of the main drawbacks of ARIMA time series modelling.
To avoid this problem, panel data regression modelling is used in this study. By combining time series of cross section observations, panel data give "more informative data, more variability, less collinearity among variables, more degrees of freedom with more efficiency".
As Baltagi 9 stated, compared to pure cross-sectional and ARIMA time series analysis, panel data regression estimation is better able to identify and measure the effects of independent variables on dependent variables, which cannot be measured using time series or cross-section data alone. In this functional form, the slope coefficients give the percentage change in the dependent variable 9. In this way, the statistical methodology used in this paper is more specific and advances new knowledge in COVID-19 trend analysis.
Based on the above discussion, the main purpose of the present study is to examine the trends in the number of COVID-19 infected cases, i.e., whether the number of cases decreased or increased. Additionally, the impacts of COVID-19 in four southern districts of Tamil Nadu, namely, Kanniyakumari, Tenkasi, Thoothukudi and Tirunelveli (cross-sections), were investigated by using panel data models. These are the four districts that come under Manonmaniam Sundaranar University, Tirunelveli, India, which has nearly ninety affiliated colleges. Based on the COVID-19 infection trends in these four districts, the University authorities can take appropriate academic decisions, such as whether to hold online classes, online examinations or offline classes.

Materials:
The COVID-19 dataset was collected from the official Tamil Nadu Government website www.stopcorona.tn.gov.in. In this study, only four southern districts of Tamil Nadu, namely Kanniyakumari, Tenkasi, Thoothukudi and Tirunelveli, were considered, for October 2020.
Methods: Panel data are a type of data that contain observations of multiple phenomena collected over different time periods for the same group of individuals, units, or entities. In short, econometric panel data are multidimensional data collected over a given period 9,10. A simple panel data regression model is specified as

$$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it} \qquad (1)$$

where $\varepsilon_{it}$ are the estimated residuals from the panel regression analysis. Here, $Y$ is the dependent variable, $X$ is the independent or explanatory variable, $\alpha$ and $\beta$ are the intercept and slope, $i$ stands for the $i$-th cross-sectional unit and $t$ for the $t$-th time period (here, day), and $X$ is assumed to be non-stochastic and the error term to follow the classical assumptions, namely, $\varepsilon_{it} \sim N(0, \sigma^2)$. In this study, the number of cross-sections is 4 ($i$ = 1, 2, 3, 4), and $t$ = 1, 2, 3, …, 30. Detailed discussions of panel data models are given in 11.
Unit Root Test: Unit roots for the panel data can be tested using either the Levin-Lin-Chu 12 test or the Hadri 13 LM stationarity test. For the Levin-Lin-Chu test, the null hypothesis is that the panels contain unit roots, and the alternative hypothesis is that the panels are stationary; the Hadri test reverses these roles, taking stationarity as the null hypothesis. In the Levin-Lin-Chu results, if the p-value is less than 0.05, one can reject the null hypothesis and accept the alternative hypothesis.
Similarly, the unit root for the first difference can also be tested using a similar method.
The Constant Coefficients Model: The Constant Coefficients Model (CCM) assumes that all coefficients (intercept and slope) remain unchanged across cross-sectional units and over time. In other words, the CCM ignores the space and time dimensions of panel data: the cross-sectional units are assumed to be homogeneous, so that the values of the intercept and slope coefficients are the same irrespective of the cross-sectional unit being considered. Accepting this homogeneity assumption (also called the pooling assumption), the CCM uses the panel (or pooled) data set and applies the ordinary least squares (OLS) method to estimate the unknown parameters of the model. Thus, the CCM is nothing but a straightforward application of OLS to a given panel or pooled data set.
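The pooling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy panel, the unit labels "A" and "B", and the function name `pooled_ols` are hypothetical and are not taken from the study, which used EViews.

```python
from statistics import mean

def pooled_ols(x, y):
    """OLS fit of y = alpha + beta * x on stacked (pooled) observations."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical toy panel: two units observed on four days, as (day, cases).
panel = {
    "A": [(1, 10.0), (2, 8.0), (3, 6.0), (4, 4.0)],
    "B": [(1, 20.0), (2, 18.0), (3, 16.0), (4, 14.0)],
}

# Pooling ignores which unit each observation came from.
x = [day for obs in panel.values() for day, _ in obs]
y = [cases for obs in panel.values() for _, cases in obs]

alpha, beta = pooled_ols(x, y)
print(alpha, beta)  # one common intercept and one common slope
```

In this toy data both units share the slope but differ in level, so the single pooled intercept falls between the two unit-specific levels. That is the kind of distortion the homogeneity assumption can introduce.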

Individual Specific-Effect Model:
Here, it is assumed that there is unobserved heterogeneity across individuals, captured by $\alpha_i$. The main question is whether the individual-specific effects $\alpha_i$ are correlated with the regressor; if they are correlated, a fixed effects model is appropriate.
If they are not correlated, a random effects model is appropriate.
Fixed-Effect OR Least-Square Dummy Variable Regression Model: One way to consider the individuality of each district or each cross-sectional unit is to let the intercept vary for each district but still assume that the slope coefficients are constant across districts. The model is written as

$$Y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it} \qquad (2)$$

The difference in the intercepts may be due to the COVID-19 precautionary measures followed in each of the four districts.
Model (2) is known as the fixed effect model (FEM). The term "fixed effects" is due to the fact that, although the intercept may differ across districts, each district's intercept does not vary over time; that is, it is time invariant.
These fixed effects models can be implemented with the dummy variable technique.
Therefore, the fixed effects model can be written as 10

$$Y_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \beta X_{it} + \varepsilon_{it}$$

Rho is the intraclass correlation of the error, that is, the fraction of the variance in the error term due to individual-specific effects. This value approaches 1 if individual effects dominate the idiosyncratic error 9,10.
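The fixed effects fit can be illustrated with a short Python sketch. It uses the within (demeaning) transformation, which gives the same slope and unit intercepts as regressing on one dummy per unit; the toy panel and the name `within_estimator` are hypothetical and are not part of the study's EViews workflow.

```python
from statistics import mean

def within_estimator(panel):
    """Fixed effects fit of y_it = alpha_i + beta * x_it.

    Demeaning each unit's data and running OLS on the deviations yields the
    same slope as the dummy-variable (LSDV) regression.
    """
    num = den = 0.0
    unit_means = {}
    for unit, obs in panel.items():
        x_bar = mean(x for x, _ in obs)
        y_bar = mean(y for _, y in obs)
        unit_means[unit] = (x_bar, y_bar)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    beta = num / den
    # Each unit's intercept is recovered from its own means.
    alphas = {u: y_bar - beta * x_bar for u, (x_bar, y_bar) in unit_means.items()}
    return alphas, beta

# Hypothetical toy panel: two units observed on four days, as (day, cases).
panel = {
    "A": [(1, 10.0), (2, 8.0), (3, 6.0), (4, 4.0)],
    "B": [(1, 20.0), (2, 18.0), (3, 16.0), (4, 14.0)],
}
alphas, beta = within_estimator(panel)
print(alphas, beta)
```

Here the two units get separate intercepts (12 and 22) with a common slope of -2. The difference between the intercepts, 10, corresponds to a differential intercept coefficient in the dummy-variable form.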
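Hausman Test: The choice between the fixed effect and random effect models can be tested using the Hausman statistic

$$H = (\hat{\beta}_{RE} - \hat{\beta}_{FE})' \left[ \mathrm{Var}(\hat{\beta}_{FE}) - \mathrm{Var}(\hat{\beta}_{RE}) \right]^{-1} (\hat{\beta}_{RE} - \hat{\beta}_{FE})$$

Here $\hat{\beta}_{RE}$ and $\hat{\beta}_{FE}$ are the vectors of parameter estimates of the random effect and fixed effect models, respectively. Under the null hypothesis, this statistic has asymptotically the chi-squared distribution with the number of degrees of freedom equal to the rank of the matrix $\mathrm{Var}(\hat{\beta}_{FE}) - \mathrm{Var}(\hat{\beta}_{RE})$.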
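With a single slope coefficient, as in this study, the Hausman statistic reduces to a scalar with one degree of freedom. The sketch below assumes that scalar case; the function name `hausman_scalar` and the input values are hypothetical and are not the study's estimates.

```python
from math import erfc, sqrt

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (1 degree of freedom)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # In finite samples the variance difference may not be positive,
        # in which case the statistic is not usable in this simple form.
        raise ValueError("Var(FE) - Var(RE) must be positive")
    h = (b_fe - b_re) ** 2 / var_diff
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = erfc(sqrt(h / 2.0))
    return h, p_value

# Hypothetical estimates, for illustration only.
h, p = hausman_scalar(b_fe=-1.70, b_re=-1.60, var_fe=0.05, var_re=0.03)
print(h, p)
```

A large p-value means the null hypothesis is not rejected, which favours the random effect model; a small p-value favours the fixed effect model.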

H0: There is constant variance among the residuals.

RESULTS AND DISCUSSION
The … October decreased to 42 cases at the end of the month. Very peculiar upward and downward trends have been observed in this district.

Fig. 1(d): Daily COVID-19 infected cases in Thoothukudi District
In the case of Tirunelveli District, Fig. 1(e) depicts a stepwise declining trend, with the highest number of 86 infected cases on 1st October declining to 15 at the end of the month.

Fig. 1(e): Daily COVID-19 infected cases in Tirunelveli District
The pattern of COVID-19-infected cases of Kanniyakumari and Tirunelveli followed a similar declining trend, whereas the same upward and downward trend was noted in the cases of Tenkasi and Thoothukudi.

Fig. 2. Comparisons of COVID-19 infection trends in all four districts
The results presented in

Constant Coefficient Model (Panel OLS).
The CCM estimation method is employed considering the number of new cases (NCASE) as the dependent variable and time, X, as the independent variable; the results are presented in Table 4. The results reveal that the intercept and slope are highly significant at the 1% level of significance. The slope is negative, which indicates that the number of COVID-19-infected cases decreased by 1.68% in October 2020. The model is highly significant at the 1% level of significance but has a low R² value of 29%.
Additionally, the estimated Durbin-Watson value of 0.242273 is quite low, which suggests the presence of autocorrelation in the data. The estimated model assumes that the slope coefficients of the time variable X are identical in all four districts. Therefore, despite its simplicity, the CCM may distort the true relationship between the dependent variable, the number of cases (NCASE), and the independent variable, time (X), across the four districts.
Raymundo et al. 5 reported that the MGWR model was better suited than the OLS (CCM) model in a spatial analysis of COVID-19 incidence and the sociodemographic context in Brazil.

Fixed-Effect OR Least-Square Dummy Variable Regression Model:
The results presented in Table 5 reveal that the fixed effect model explains 82% of the variation in the dependent variable. The model is highly significant at the 1% level of significance. The dummy variables were also highly significant at the 1% level of significance. The root mean square error is 11.4868, and the S.E. of regression is 11.7256. Based on the statistical significance of the estimated coefficients at the 1% level and the substantial increase in the R² value to 82%, one can conclude that the fixed effects model, or LSDV regression model, performs better than the panel least-squares regression model (CCM). The diagrammatic representation of the fixed effects in the four districts is depicted in Fig. 3.

Fig. 3. Fixed effect in different districts
To confirm the presence of the fixed effect, the redundant fixed effect test was carried out, and the results are presented in Table 7. The test results reveal that the cross-section F and chi-square statistic values are significant at the 1% level of significance, indicating that the fixed effects differ from one district to another.
Random-Effect Model: Finally, the random-effect model is estimated, and the results are presented in Table 9. The results reveal that the model is highly significant at the 1% level of significance, with an R² value of 62%, an S.E. of regression of 1.7257 and a root MSE of 11.6307. As in the case of the fixed effect model, the random-effect model coefficients, intercept and slope, are highly significant at the 1% level of significance.
The value of the slope is -1.675101, which is highly significant, indicating that new COVID-19 infections are decreasing at the rate of 1.68%.
Baskar et al. 4 reported that the progression of the COVID-19 infection in Tamil Nadu, India, is showing decay in the number of infections, which highlights the effectiveness of the controlling measures.
The rho value is 0.7915, which indicates that about 79% of the variance in the error term is due to the individual effects of the cross-sections. The cross-sectional random effects in the context of the random effect model are calculated and presented in Table 10. The results reveal that in Kanniyakumari district, the effect (24.04562) is positive and high in comparison with that in the other three districts. This may be due to the high infection rate. In Tenkasi District, it is -30.81070, the lowest of the four, which reflects the low rate of infections in this district. In Thoothukudi District, the effect is 4.342125, which is low in comparison with that of Kanniyakumari District. In Tirunelveli district, it is 2.422954, which is low in comparison with those of Kanniyakumari and Thoothukudi districts. The diagrammatic representation of the random effects in the four districts is depicted in Fig. 4. Based on this result, the presence of random effects in all four districts is confirmed.
In future, this model can be extended by including other parameters, such as age and the population with respiratory ailments.

CONCLUSION
In this study, the panel data regression model was found to be suitable for assessing the trends in COVID-19-infected cases. The random effects model was found to be suitable to study the trend. The study results reveal that the highest number of new cases was registered in Kanniyakumari, followed by Thoothukudi, Tirunelveli and Tenkasi districts. The lowest number of cases was observed in Tenkasi District. This difference may be due to the precautionary measures taken by the district administration. In general, the number of COVID-19 infected cases was found to decrease in all four districts by 1.68% in October 2020.
Significance statement: This study revealed that COVID-19-infected cases showed a decreasing trend. The study would be useful to administrators and decision-makers in taking precautionary measures to stop COVID-19 infections. By knowing the COVID-19 infection trends, academic personnel could take appropriate decisions.

$D_{2i}$ = 1 if the observation is from Tenkasi District and is 0 otherwise, $D_{3i}$ = 1 if the observation is from Thoothukudi and is 0 otherwise, and $D_{4i}$ = 1 if the observation is from Tirunelveli and is 0 otherwise. Here, $\alpha_1$ represents the intercept of Kanniyakumari, and $\alpha_2$, $\alpha_3$ and $\alpha_4$ are differential intercept coefficients, which tell by how much the intercepts of Tenkasi, Thoothukudi and Tirunelveli differ from that of Kanniyakumari District. Since the dummies are used to estimate the fixed effects, the model is also known as the least-squares dummy variable (LSDV) model; hence, one can conclude that the restricted panel regression model is invalid and that the LSDV model is valid.
Random-Effect (RE) Model: The RE model assumes that the individual-specific effects $\alpha_i$ are random and should be included in the error term. Each cross-section has the same slope parameter and a composite error term. So model (1) becomes the random-effect model (REM)

$$Y_{it} = \alpha + \beta X_{it} + (\alpha_i + \varepsilon_{it})$$

where $\varepsilon_{it}$ and $\alpha_i$ are normally distributed with zero means and constant variances.

Fig. 1(a): Total number of COVID-19 infected cases in all four districts in October 2020

Fig. 2 depicts the date-wise comparison of the trends exhibited in the four districts, in which the trend curve of Tenkasi District is at the lowest level, followed by those of Thoothukudi, Tirunelveli and Kanniyakumari. From this trend pattern, it is clear that the highest number of infected cases was registered in Kanniyakumari District and the lowest number in Tenkasi District.

Fig. 5 depicts and confirms that the coefficients of intercept and slope lie in the 99% confidence interval (CI).

Fig. 6. Characteristics of residuals based on the random effects model.

Table 1 : Characteristics of Summary and Category wise Statistics
The results obtained in this paper by applying different statistical tools related to panel regression models are discussed in subsequent sections. This is the first work of its kind based on COVID-19 infected case data sets; hence, the current findings are not compared with existing results available in the literature.
Summary Statistics: The results presented in Table 1 reveal that the highest number of COVID-19 infected cases was registered in Kanniyakumari (118), followed by Tirunelveli (86), Thoothukudi (77), and Tenkasi (49) districts. The lowest number of COVID-19 cases was registered in Tenkasi (3), followed by Tirunelveli (15), Thoothukudi (17), and Kanniyakumari (25) districts. In all four districts, the number of COVID-19-infected cases follows a normal distribution, since the Jarque-Bera statistic values are nonsignificant at the 5% level of significance.
Category wise Statistics: The categorical number of COVID-19-infected cases is given at the bottom of Table 1. The results show that the number of days with infected cases in the category [0, 50) is 7 in Kanniyakumari district, 31 in Tenkasi, and 15 each in Thoothukudi and Tirunelveli districts. In the category [50, 100), there are 22 days in Kanniyakumari district and 16 days in each of the Thoothukudi and Tirunelveli districts, whereas cases in the category [100, 150) were registered on 2 days, in Kanniyakumari district only. Fig. 1(a) depicts the total number of COVID-19 cases in all four districts in October 2020. The highest number of COVID-19-infected cases (2199) was registered in Kanniyakumari district, followed by Thoothukudi (1583), Tirunelveli (1523) and Tenkasi (484). In Tenkasi district, the number is low, which may be due to greater awareness among the people and to precautionary measures taken by the district administration to prevent COVID-19 infections.
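The normality check used in the summary statistics can be sketched as follows. This assumes the usual textbook form of the Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4), with a chi-squared reference distribution on 2 degrees of freedom; the function name `jarque_bera` and the sample data are hypothetical, and the exact small-sample conventions used by EViews are not stated in the paper.

```python
from math import exp

def jarque_bera(data):
    """Jarque-Bera statistic and asymptotic p-value (chi-squared, 2 df)."""
    n = len(data)
    m = sum(data) / n
    # Central moments with divisor n (no degrees-of-freedom correction).
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For 2 degrees of freedom the chi-squared survival function is exp(-x/2).
    return jb, exp(-jb / 2.0)

# Hypothetical sample, for illustration only.
jb, p = jarque_bera([1, 2, 3, 4, 5])
print(jb, p)
```

A p-value above 0.05 means normality is not rejected at the 5% level, which is the sense in which the district series are described as following a normal distribution.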

The results presented in Table 2 reveal that the ANOVA F-test value (44.29316) and the Welch F-test value (64.31809), each with a p-value of 0.0000, are highly significant at the 1% level of significance, indicating that the number of COVID-19-infected cases differs from one district (cross-section) to another.
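The equality-of-means comparison can be illustrated with a minimal one-way ANOVA F statistic. The sketch covers only the classical F-test, not the Welch variant, and the function name `one_way_anova_f` and the groups are hypothetical rather than the district series.

```python
def one_way_anova_f(groups):
    """Classical one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    ss_within = sum((v - gm) ** 2
                    for g, gm in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical groups, for illustration only.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

A large F value relative to the F distribution with (k − 1, N − k) degrees of freedom indicates that at least one group mean differs from the others.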

Table 2 : Analysis of variance test for equality of means for NCASE.
and Sudan respectively. By means of those models, a four-month-ahead forecast of COVID-19 prevalence (July until October 30, 2020) was made. The models in this study are on par with those of Rajarathinam et al. 1.
Unit Root Test: Before estimating the panel data regression model, it is necessary to determine the stationarity of the variable under study. The unit root test result presented in Table 3 reveals that the Levin, Lin & Chu t statistic value of -3.76884 is significant at the 1% level of significance, since the p-value is 0.0001; hence, the study variable, NCASE, is stationary at the level, i.e., the variable is I(0).

Table 3 : Characteristics of the unit root test.
of panel type and hence the Levin, Lin & Chu t-test is employed.

Table 10 : Cross-Section Random Effects Values
Table 11 indicates that the Breusch-Pagan LM test statistic value of 30.35083 and the Pesaran scaled LM test statistic value of 7.029479 are highly significant at the 1% level of significance, since both p-values are equal to 0.0000, indicating that the null hypothesis of the test, "H0: There is constant variance among residuals", is rejected. Hence, the above random effect model has the problem of heteroscedasticity.
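The two statistics reported in Table 11 are linked by a simple rescaling, which gives a quick consistency check. The sketch assumes the textbook definitions, in which the Breusch-Pagan LM statistic sums T times the squared pairwise residual correlations and is usually described as a test of cross-section dependence in the residuals (as the caption of Table 11 also suggests), and the Pesaran scaled LM statistic standardises that sum. The function name `pesaran_scaled_lm` is hypothetical.

```python
from math import sqrt

def pesaran_scaled_lm(bp_lm, n_units):
    """Pesaran scaled LM statistic derived from the Breusch-Pagan LM statistic.

    With N units there are N(N-1)/2 pairs; the scaled statistic is
    (LM - N(N-1)/2) / sqrt(N(N-1)).
    """
    n_pairs = n_units * (n_units - 1) // 2
    return (bp_lm - n_pairs) / sqrt(n_units * (n_units - 1))

# Reported Breusch-Pagan LM value with the four districts as units.
scaled = pesaran_scaled_lm(30.35083, 4)
print(round(scaled, 6))
```

With N = 4, the reported LM value of 30.35083 reproduces the reported scaled LM value of about 7.0295, so the two figures in Table 11 are mutually consistent.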

Table 11 : Characteristics of the residual cross-section dependence test
The Hausman test result presented in Table 12 reveals that the Chi-Sq. statistic value of 0.0000 with 1 degree of freedom is nonsignificant at the 5% significance level, and the null hypothesis "H0: Random Effect Model" is accepted. So, among the three models, viz. the CCM, fixed effect and random effect models, the random effect model has emerged as the appropriate model.