Panel Data Modelling of COVID-19 Infected Cases


Background and Objective: The novel coronavirus pandemic, known as COVID-19, could not have been predicted; thus, the world encountered health crises and substantial economic crises. This paper analyses the trends in COVID-19 cases in October 2020 in four southern districts of Tamil Nadu state, India, using a panel regression model. Materials and Methods: Panel data on the number of COVID-19-infected cases were collected from daily bulletins published through the official website www.stopcorona.tn.gov.in maintained by the Government of Tamil Nadu state, India. Panel data regression models were employed to study the trends. EViews version 11 software was used to estimate the model and its parameters. Results: In all four districts, the COVID-19-infected case data followed a normal distribution. Maximum numbers of COVID-19-infected cases were registered in Kanniyakumari, followed by Tirunelveli, Thoothukudi and Tenkasi districts. The fewest COVID-19 cases were registered in Tenkasi, followed by Tirunelveli, Thoothukudi and Kanniyakumari districts. A random effects model was found to be an appropriate model to study the trend. Conclusion: The panel data regression model is found to be more appropriate than traditional models. The Hausman test and Wald test confirmed the selection of the random effects model. The Jarque-Bera normality test ensured the normality of the residuals. In all four districts under study, the number of COVID-19 infections showed a decreasing trend at a rate of 1.68% during October 2020.


INTRODUCTION
The COVID-19 pandemic, which began in 2019, has received much attention, as it has affected most economies worldwide and resulted in innumerable deaths. Because no antiviral drugs or vaccines were available, the number of new coronavirus cases increased tremendously, and many people died. Developing methodologies to analyse these pandemic data has therefore become an especially important research area for predicting future coronavirus cases.
One investigation estimated hidden models and trends of infected cases in all 37 districts of Tamil Nadu State, India, from 1st August 2020 to 31st December 2020 using different curve-fitting tools. It reported a decreasing trend in all the districts 1. Another study analysed the spatial distribution of COVID-19 incidence in Brazil's municipalities (counties) and investigated its association with sociodemographic determinants to better understand the social context and the epidemic's spread in the country. The study period was 25 February to 26 September 2020, and the analysis used ordinary least squares (OLS), the spatial autoregressive model (SAR), the conditional autoregressive model (CAR) and a local regression model called multiscale geographically weighted regression (MGWR). The MGWR model fit improved on those of the OLS, SAR and CAR models 5.
A pandemic is an epidemic spread over a huge geographical area. COVID-19 is the fifth such pandemic documented since the 1918 flu pandemic. One study framed a mathematical epidemic model inspired by the classic Susceptible (S)-Infected (I)-Recovered (R) (SIR) model and developed a compartmental model with ten compartments to study coronavirus dynamics in India and three of its most affected states, namely, Maharashtra, Karnataka, and Tamil Nadu, including factors related to face mask efficacy, contact tracing and testing, along with quarantine and isolation 6. Based on the above information, the present study aimed to assess the trends in the number of cases related to COVID-19, i.e., whether the number of cases decreased or increased.
Additionally, the impacts of COVID-19 in four different southern districts of Tamil Nadu, namely, Kanniyakumari, Tenkasi, Thoothukudi and Tirunelveli, were investigated by using panel data models.

Materials:
The COVID-19 dataset was collected from the official Tamil Nadu Government website www.stopcorona.tn.gov.in. In this study, only four southern districts of Tamil Nadu, namely, Kanniyakumari, Tenkasi, Thoothukudi and Tirunelveli, were considered, and the data cover October 2020.
Methods: Panel data contain observations of multiple phenomena collected over different time periods for the same group of individuals, units, or entities. In short, econometric panel data are multidimensional data collected over a given period. 9,10 A simple panel data regression model is specified as

$$Y_{it} = \alpha + \beta X_{it} + \varepsilon_{it} \qquad (1)$$

where $\varepsilon_{it}$ is the error term, estimated by the residuals from the panel regression analysis. Here, $Y$ is the dependent variable, $X$ is the independent or explanatory variable, $\alpha$ and $\beta$ are the intercept and slope, $i$ stands for the $i$-th cross-sectional unit and $t$ for the $t$-th time period. $X$ is assumed to be nonstochastic and the error term to follow the classical assumptions, namely, $E(\varepsilon_{it}) = 0$ and $\operatorname{Var}(\varepsilon_{it}) = \sigma^2$. In this study, the number of cross-sections is 4 ($i$ = 1, 2, 3, 4), and $t$ = 1, 2, 3, …, 30. Detailed discussions of panel data models are given in 11.
Unit Root Test: Unit roots in the panel data can be tested using either the Levin-Lin-Chu 12 test or the Hadri 13 LM stationarity test. In the Levin-Lin-Chu test, the null hypothesis is that the panels contain unit roots, and the alternative hypothesis is that the panels are stationary; in the Hadri LM test, the hypotheses are reversed, with stationarity as the null. If the p value of the Levin-Lin-Chu test is less than 0.05, one can reject the null hypothesis of a unit root in favour of the alternative.
The first-differenced series can be tested for a unit root in the same way.

The Constant Coefficients Model:
The Constant Coefficients Model (CCM) assumes that all coefficients (intercept and slope) remain unchanged across cross-sectional units and over time.
In other words, the CCM ignores the space and time dimensions of panel data. Put differently, under the CCM, the cross-sectional units are assumed to be homogeneous such that the values of intercept and slope coefficients are the same irrespective of the cross-sectional unit being considered. Accepting this homogeneity assumption (also called the pooling assumption), the CCM uses the panel (or pooled) data set and applies the ordinary least squares (OLS) method to estimate unknown parameters of the model. Thus, the CCM is nothing but a straightforward application of OLS to a given panel or pooled data to obtain estimates for unknown parameters of the model.
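As a minimal sketch of the CCM, the pooled OLS estimates for a single regressor can be computed in closed form. The Python snippet below is illustrative only: the function name `pooled_ols` and the toy numbers are assumptions, not the study's EViews workflow or data.

```python
def pooled_ols(x, y):
    """Pooled OLS for y = a + b*x, ignoring which unit each row belongs to."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-product and sum of squares around the overall means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy panel: two units observed on days 1..3, stacked into one sample.
days = [1, 2, 3, 1, 2, 3]
cases = [50, 47, 45, 30, 29, 26]
intercept, slope = pooled_ols(days, cases)
print(round(intercept, 3), round(slope, 3))
```

Because the two hypothetical units are simply stacked, the fit returns one intercept and one slope for both, which is exactly the homogeneity (pooling) assumption described above.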

Individual Specific-Effect Model:
Here, it is assumed that there is unobserved heterogeneity across individuals, captured by $\alpha_i$. The main question is whether the individual-specific effects $\alpha_i$ are correlated with the regressor; if they are correlated, a fixed effects model is appropriate.
If they are not correlated, a random effects model is appropriate.

Fixed-Effect OR Least-Square Dummy Variable Regression Model:
One way to take into account the individuality of each district or cross-sectional unit is to let the intercept vary for each district but still assume that the slope coefficients are constant across districts. The model is written as 9,10

$$Y_{it} = \alpha_i + \beta X_{it} + \varepsilon_{it} \qquad (2)$$

The difference in the intercepts may be due to the precautionary measures against COVID-19 infection followed in each of the four districts.
Model (2) is known as the fixed effects model (FEM). The term "fixed effects" reflects the fact that, although the intercept may differ across districts, each district's intercept does not vary over time; that is, it is time invariant.
These fixed effects models can be implemented with the dummy variable technique. Therefore, the fixed effects model can be written as

$$Y_{it} = \alpha_1 + \gamma_2 D_{2i} + \gamma_3 D_{3i} + \gamma_4 D_{4i} + \beta X_{it} + \varepsilon_{it}$$

where $D_{ji} = 1$ if the observation belongs to district $j$ and 0 otherwise, and $\gamma_j = \alpha_j - \alpha_1$ is the differential intercept of district $j$ relative to the first district. Rho is the intraclass correlation of the error, that is, the fraction of the variance in the error term due to individual-specific effects:

$$\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}$$

It approaches 1 if individual effects dominate the idiosyncratic error 9,10.
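The fixed effects fit can be sketched without building dummy variables explicitly: demeaning the data within each unit (the within estimator) yields the same slope and unit intercepts as the dummy variable regression. The Python code below is a hedged illustration; the names `within_estimator` and `rho` and the toy data are assumptions introduced here, not part of the study.

```python
def within_estimator(units, x, y):
    """Fixed effects fit of y_it = alpha_i + beta * x_it via unit-wise demeaning."""
    groups = {}
    for u, xi, yi in zip(units, x, y):
        groups.setdefault(u, []).append((xi, yi))
    # Mean of x and mean of y for each unit.
    means = {
        u: (sum(p[0] for p in obs) / len(obs), sum(p[1] for p in obs) / len(obs))
        for u, obs in groups.items()
    }
    # Common slope from deviations around each unit's own means.
    sxy = sum((xi - means[u][0]) * (yi - means[u][1]) for u, xi, yi in zip(units, x, y))
    sxx = sum((xi - means[u][0]) ** 2 for u, xi in zip(units, x))
    beta = sxy / sxx
    # Unit-specific intercepts implied by the common slope.
    alphas = {u: my - beta * mx for u, (mx, my) in means.items()}
    return beta, alphas


def rho(var_individual, var_idiosyncratic):
    """Share of the error variance due to individual-specific effects."""
    return var_individual / (var_individual + var_idiosyncratic)


# Hypothetical toy panel: units "A" and "B" observed on days 1..3.
units = ["A", "A", "A", "B", "B", "B"]
days = [1, 2, 3, 1, 2, 3]
cases = [50, 47, 45, 30, 29, 26]
beta, alphas = within_estimator(units, days, cases)
print(round(beta, 3), {u: round(a, 3) for u, a in alphas.items()})
```

In this toy panel the regressor takes the same values in every unit, so the fixed effects slope coincides with the pooled slope and only the intercepts differ between units.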

Hausman test 14 :
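The null hypothesis of the Hausman test is that the preferred model is the random effects model rather than the fixed effects model. The test determines whether the unique errors ($\alpha_i$) are correlated with the regressor; under the null hypothesis, they are not. Under this null hypothesis, the random effects estimator is the more efficient one, so it should be used if the Hausman test supports it. The Hausman test statistic can be calculated only for time-varying regressors and is given as follows:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\operatorname{Var}(\hat{\beta}_{FE}) - \operatorname{Var}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$

which, under the null hypothesis, follows a chi-square distribution with degrees of freedom equal to the number of time-varying regressors.
For the Breusch-Pagan Lagrange multiplier test, the null hypothesis H0 states that there is constant variance among residuals.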
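With a single time-varying regressor, as in this study, the Hausman statistic reduces to the squared difference between the fixed and random effects slope estimates divided by the difference of their variances, and it is compared with a chi-square distribution with 1 degree of freedom. The sketch below assumes that scalar case; the function name `hausman_single` and the input numbers are hypothetical and are not the values reported in this paper.

```python
import math


def hausman_single(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for one regressor (chi-square, 1 df)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("variance difference must be positive")
    stat = (b_fe - b_re) ** 2 / var_diff
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical slope estimates and variances from the two models.
stat, p_value = hausman_single(b_fe=-1.70, var_fe=0.010, b_re=-1.65, var_re=0.008)
print(round(stat, 3), round(p_value, 3))
```

A p-value above 0.05 means the null hypothesis of random effects is not rejected; a statistic close to zero arises when the two slope estimates nearly coincide.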

RESULTS AND DISCUSSION
The results obtained in this paper by applying different statistical tools related to panel regression models are discussed in the subsequent sections. This is the first work of its kind based on COVID-19 infected case datasets; hence, the current findings are not compared with existing results available in the literature.

Fig.1(a). Total number of COVID-19 infected cases in all four districts in October 2020
In Kanniyakumari District, Fig. 1 … Roy and Bhattacharya 3 also asserted that the imposition of a countrywide lockdown plays an especially important role in restraining the spread of COVID-19 infections.

Fig. 1(d) for Thoothukudi District shows that the highest number of cases, 77, was registered on 9th October and that the number decreased to 42 cases at the end of the month. Irregular upward and downward movements were observed in this district.

Fig. 1(d): Daily COVID-19 infected cases in Thoothukudi District
In the case of Tirunelveli District, Fig. 1(e) depicts a stepwise declining trend, with the highest number of 86 infected cases on 1st October declining to 15 at the end of the month.

Fig. 1(e): Daily COVID-19 infected cases in Tirunelveli District
The patterns of COVID-19-infected cases in Kanniyakumari and Tirunelveli followed a similar declining trend, whereas Tenkasi and Thoothukudi shared a similar pattern of upward and downward movements.

Fig. 2. Comparisons of COVID-19 infection trends in all four districts
The results presented in Table 2 … Additionally, the estimated Durbin-Watson value of 0.242273 is quite low, which suggests the presence of autocorrelation in the data. The estimated model assumes that the slope coefficient of the time variable X is identical in all four districts. Therefore, despite its simplicity, the CCM may distort the true relationship between the dependent variable (the number of cases, NCASE) and the independent variable (time, X) across the four districts.
Raymundo et al. 5 reported that the MGWR model was better suited than the OLS (CCM) model in their spatial analysis of COVID-19 incidence and the sociodemographic context in Brazil.
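The Durbin-Watson statistic quoted for the CCM is the sum of squared successive residual differences divided by the residual sum of squares; values near 2 indicate no first-order autocorrelation, and values near 0 indicate positive autocorrelation. The Python sketch below computes it for a single ordered residual series; the residual values are hypothetical, and panel software may apply the formula within each cross-section, which this sketch does not attempt.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for one ordered residual series."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals: long runs of the same sign mimic positive autocorrelation.
smooth = [3.0, 2.5, 2.0, 1.0, -1.0, -2.0, -2.5, -3.0]
# Hypothetical residuals that flip sign every period mimic negative autocorrelation.
alternating = [3.0, -3.0, 3.0, -3.0, 3.0, -3.0, 3.0, -3.0]
print(round(durbin_watson(smooth), 3), round(durbin_watson(alternating), 3))
```

The slowly drifting series gives a value well below 2, the same qualitative pattern as the low value reported for the CCM.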

Fixed-Effect OR Least-Square Dummy Variable Regression Model:
The results presented in Table 5 reveal that the fixed effects model explains 82% of the variation in the dependent variable. The model is highly significant at the 1% level of significance. The dummy variables are also highly significant at the 1% level of significance. The root mean square error is 11.4868, and the S.E. of regression is 11.7256.

NCASE=97.7370-1.6751*X-55.3225*(D2)-19.8709*(D3)-21.8064*(D4)
Based on the statistical significance of the estimated coefficients at the 1% level and the substantial increase in the $R^2$ value to 82% (significant at the 1% level of significance), one can conclude that the fixed effects model, or LSDV regression model, performs better than the panel least-squares regression model (CCM). The cross-sectional fixed effects (as deviations from the common intercept) under the fixed effects model are calculated and presented in Table 6. In Kanniyakumari District, the effect of 24.2500 is positive and high in comparison to that in the other three districts. This may be due to extremely high infection rates. In Tenkasi District, the effect is -31.07258, which is exceptionally low, because a very low rate of infections is noted in this district. In Thoothukudi District, the effect of 4.379032 is exceptionally low in … The diagrammatic representation of fixed effects in the four districts is depicted in Fig. 3.

Fig.3. Fixed effect in different districts
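The fixed effects quoted from Table 6 can be recovered, up to rounding, from the estimated LSDV equation by forming each district's intercept and subtracting the average intercept. The Python sketch below does this arithmetic. It assumes that Kanniyakumari is the base district and that D2, D3 and D4 denote Tenkasi, Thoothukudi and Tirunelveli, a mapping inferred from the reported effects rather than stated in the text; the helper `fitted_cases` is hypothetical.

```python
# Coefficients copied from the estimated LSDV equation for NCASE.
base_intercept = 97.7370
slope = -1.6751
# Assumed mapping of dummy variables to districts (inferred, not stated in the text).
dummy_coefs = {
    "Kanniyakumari": 0.0,      # assumed base district
    "Tenkasi": -55.3225,       # D2
    "Thoothukudi": -19.8709,   # D3
    "Tirunelveli": -21.8064,   # D4
}

# District-specific intercepts and their deviations from the average intercept.
district_intercepts = {d: base_intercept + c for d, c in dummy_coefs.items()}
mean_intercept = sum(district_intercepts.values()) / len(district_intercepts)
effects = {d: a - mean_intercept for d, a in district_intercepts.items()}


def fitted_cases(district, day):
    """Fitted number of cases for a district on a given day of the month."""
    return district_intercepts[district] + slope * day


for d in dummy_coefs:
    print(d, round(district_intercepts[d], 4), round(effects[d], 4))
```

Under these assumptions, the computed deviations for Kanniyakumari, Tenkasi and Thoothukudi agree with the quoted values of 24.2500, -31.07258 and 4.379032 to about four decimal places, and the remaining deviation corresponds to Tirunelveli.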
To confirm the presence of fixed effects, the redundant fixed effects test was carried out, and the results are presented in Table 7. The test results reveal that the cross-section F and chi-square statistic values are significant at the 1% level of significance, indicating that the fixed effects differ from one district to another.
The rho value is 0.7915, which indicates that the individual effects of the cross-sections account for about 79% of the error variance. This random effect value is exceptionally low in comparison to that of Kanniyakumari and Thoothukudi districts. The diagrammatic representation of random effects in the four districts is depicted in Fig. 4. Based on this result, the presence of random effects in all four districts is confirmed.

Breusch-Pagan Lagrange-Multiplier Test (Heteroskedasticity Test):
The result presented in Table 11 indicates that the Breusch-Pagan LM test statistic value of 30.35083 and the Pesaran scaled LM test statistic value of 7.029479 are highly significant at the 1% level of significance, since both p-values are equal to 0.0000, indicating that the null hypothesis of the test, "H0: There is constant variance among residuals", is rejected. Hence, the above random effects model has the problem of heteroskedasticity.
Table 12 reveals that the Chi-Sq. statistic value of 0.0000 with 1 degree of freedom is nonsignificant at the 5% significance level, so the null hypothesis "H0: Random Effect Model" is accepted. Thus, among the three models, viz. the CCM, the fixed effects model and the random effects model, the random effects model has emerged as the appropriate model.

Fig.5. Confidence interval
Additionally, Fig. 6 shows that the residuals are normally distributed, because the Jarque-Bera test value of 3.1389 is nonsignificant at the 5% level of significance.
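The Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = (n/6)(S² + (K - 3)²/4) and is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-JB/2). The Python sketch below is illustrative: the function names and the small sample are assumptions, and only the statistic 3.1389 is taken from the text.

```python
import math


def jarque_bera(values):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with divisor n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Hypothetical small sample, only to exercise the formula.
sample_stat = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
# p-value implied by the statistic reported in the text.
reported_p = chi2_sf_2df(3.1389)
print(round(sample_stat, 4), round(reported_p, 3))
```

For the reported statistic, the implied p-value is roughly 0.21, above 0.05, which is consistent with not rejecting normality of the residuals.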

FUTURE STUDY
In future, this model can be extended by including several other parameters, such as age and the population with respiratory ailments.

CONCLUSION
In this study, the panel data regression model was found to be suitable for assessing the trends in COVID-19-infected cases. The random effects model was found to be suitable to study the trend. The study results reveal that the highest number of new cases was registered in Kanniyakumari, followed by Thoothukudi, Tirunelveli and Tenkasi districts. The lowest number of cases was observed in Tenkasi District. This difference may be due to the precautionary measures taken by the district administration. In general, the number of COVID-19 infected cases was found to decrease in all four districts by 1.68% in October 2020.
Significance statement: This study revealed that COVID-19-infected cases showed a decreasing trend. The study would be useful to administrators and decision-makers in taking precautionary measures to stop COVID-19 infections.