Modeling of the Weekly Variation of the Reported COVID-19 Cases as a Potential Indicator of the Surveillance System Accuracy

Periodic daily variation in the number of reported COVID-19 cases within weeks is a common observation in global and national statistics. This variation may imply that the day of the week plays a significant role in the number of reported cases. We compared this pattern across countries with acceptable surveillance systems. Data from 18 European and North American countries between 6 Mar and 8 Nov 2020 were extracted. Harmonic regression models were used to quantify the peak day, the absolute intensity, and the average of coefficient of variation within weeks (ACVW) for each country. In eight countries, the within-week variation was statistically significant. Overall, the maximum and minimum numbers of reported cases occurred on Saturday and Monday, respectively; however, this pattern varied among countries. The maximum ACVW was observed in Belgium and France, and the minimum in Russia. The level of intensity of infection had a positive association with the ACVW (r=0.54, p-value=0.021). The observed variation and its pattern may show that the coverage or the tidiness of the COVID-19 surveillance system fluctuates across the days of the week. In addition, we suggest that the level of this fluctuation might be used as an accuracy indicator of the surveillance system.


Introduction
COVID-19, as an emerging infectious disease, has affected almost all countries around the world, with more than 1.7 million deaths by the third week of December 2020 [1]. The rapid spread of this novel coronavirus has also highlighted the need for rapid mitigating responses. Moreover, available data can be used to estimate the epidemiological trends of the disease, model the infection, and decide on resource allocation. In this regard, time-trend analysis can provide a dynamic view of changes in a population's health status. Time trends may focus on the pattern of changes in an indicator over time and on how quickly or slowly these changes increase or decrease. Furthermore, drawing incidence rates and related death trends can help describe and interpret the outbreak's situation. To achieve these goals, access to clean, valid, reliable, generalizable, and timely data is essential [2,3]. Sources of data and data collection approaches might vary across providers, technical requirements, and population-based surveys; however, having a standard and valid process to report epidemiological data is vital, since any bias in either data collection or the reporting process may affect the quality of data interpretation.
When determining the trend of health outcomes and interpreting the causality between the various factors purported to affect incidence and mortality rates, we may find different patterns on weekdays and weekends due to the weekend effect. The weekend effect is a type of bias that produces a weekly pattern in some health outcomes, such as admissions, case detection, or deaths, and can refer to a selection bias for weekend versus weekday patients. Some large-scale studies have shown that hospital inpatients admitted during the weekend have a significantly increased risk of death compared with those admitted on a weekday, even though weekend patients may not be sicker than weekday patients and severity of illness does not differ significantly [4][5][6]. The weekend is defined differently in various countries. For instance, in most European and American countries, it is typically defined as the period from Friday midnight to Sunday midnight. The weekend could have important implications for patients, healthcare providers, and policymakers [4].
Moreover, understanding the seasonal and temporal variations of disease occurrence is essential in epidemiology. These characteristics help detect emerging outbreaks, determine the success of intervention programs, and forecast future behavior [7][8][9]. For COVID-19, there are also appropriate parametric and non-parametric approaches for investigating the pattern of daily reported cases.
Although numerous well-established complex models are available for time series analysis, researchers often prefer regression-based methods because of their diversity and their flexibility in accommodating amendments during model building [10][11][12][13][14]. In light of the above, the harmonic regression model provides a good fit for trends and periodic patterns with relatively symmetric rises and falls. This model can also yield indices that indirectly reflect the accuracy of the health care and disease management systems [11]. Therefore, this study aimed to investigate the trend of COVID-19 cases using a harmonic regression model in 18 countries with the most detected cases across the globe. Comparisons between the groups would help draw conclusions about the incidence patterns of the disease in different populations and possibly rate countries by these indices.

Data Sources
Data were first obtained from the Our World in Data (OWID) website (https://ourworldindata.org/coronavirus), which reports the daily number of newly confirmed cases of COVID-19. We selected 18 European and North American countries whose reported cumulative cases exceeded 50,000. For each country, data were included from the time it reported more than 100 cumulative cases until 8 Nov 2020. Next, all the collected data were cleaned and checked against the reference source of data from ECDC. In all countries, zero counts of new daily cases were removed and only complete weeks were considered (the weekly cycle begins on Monday). Finally, a time-series analysis of the weekly pattern of reported COVID-19 cases was performed using a harmonic regression model.
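The cleaning step above can be sketched in a few lines. The snippet is only an illustration in Python (the analysis itself was run in R), and it assumes one reading of the text: days with zero new cases are dropped, and a week is kept only if all seven days from Monday to Sunday remain. The function name `keep_complete_weeks` and the input layout are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def keep_complete_weeks(records):
    """Return complete Monday-to-Sunday weeks as lists of 7 daily counts.

    records: iterable of (datetime.date, new_cases) pairs for one country.
    Assumption: a day with zero (or negative) new cases is discarded, and a
    week missing any day after that step is discarded as incomplete.
    """
    weeks = defaultdict(dict)
    for day, cases in records:
        if cases <= 0:
            continue  # drop zero entries before grouping
        monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
        weeks[monday][day.weekday()] = cases
    return [
        [counts[d] for d in range(7)]
        for monday, counts in sorted(weeks.items())
        if len(counts) == 7
    ]


# Hypothetical input: 20 consecutive days starting Thursday 5 Mar 2020,
# with one zero report on 18 Mar 2020.
start = date(2020, 3, 5)
records = [(start + timedelta(days=i), 100 + i) for i in range(20)]
records[13] = (records[13][0], 0)
complete = keep_complete_weeks(records)
```

In this toy input only the week starting Monday 9 Mar 2020 survives: the first and last weeks are partial, and the week of 16 Mar loses a day to the zero report.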

Conceptual framework of seasonality
"Seasonality" is defined as systematic or periodic oscillations in a variable of interest over a specific course of time [15]. It can be characterized by the amplitude and phase angle of a seasonal fluctuation via the following base model:

$$Y(t) = \beta_0 + \gamma \cos(2\pi\omega t + \psi) + e(t) \qquad (1)$$

where $Y(t)$ is the time series of the outcome of interest measured at time $t$, $\beta_0$ is the constant baseline of $Y(t)$, $\omega$ is the frequency of the periodic component, $e(t)$ is the error term, and $\gamma$ is the amplitude of the fluctuations, reflected in the difference between the maximum and minimum of a one-cycle seasonal curve. In this equation, if $\gamma = 0$, there is no seasonal increase. Also, $\psi$ is the phase angle reflecting the peak timing relative to the origin. For example, in an annual cycle, the origin can be set on either Jan. 1st or any other day. For ease of estimation, model (1) can be reformulated as follows:

$$Y(t) = \beta_0 + \beta_1 \sin(2\pi\omega t) + \beta_2 \cos(2\pi\omega t) + e(t) \qquad (2)$$

where $\beta_1 = -\gamma \sin(\psi)$ and $\beta_2 = \gamma \cos(\psi)$ are the model coefficients. The frequency is defined by $\omega = 1/M$, where $M$ is the number of observation units in one cycle. For an annual cycle, $M$ equals 4, 12, and 365.25 for quarterly, monthly, and daily data, respectively. The amplitude ($\gamma$) and phase angle ($\psi$) of model (1) and their variances can be calculated from the estimated parameters of model (2) by the delta method (Appendix A). The harmonic regression model (2) can be applied to a variety of actual data. In this study, the outcome time series variable was the daily count of newly confirmed cases of COVID-19 in 18 countries in Europe and North America. We also adapted model (2) for the Poisson-distributed outcome to form our GLM as follows:

$$\log E(Y_{ti}) = \beta_0 + \beta_1 \sin(2\pi\omega t) + \beta_2 \cos(2\pi\omega t)$$

where $Y_{ti}$ is the count on the $t$-th day of the $i$-th week, $t$ ranges from 1 to 7, and $i$ ranges from 1 to $L$ (the number of weeks each country was studied). In this model, $\omega$ is the frequency of the within-week cycle; for our daily data $M = 7$, so $\omega = 1/7$. Thus, the above equation can be rewritten as:

MODEL A: $$\log E(Y_{ti}) = \beta_0 + \beta_1 \sin(2\pi t/7) + \beta_2 \cos(2\pi t/7)$$

Model A can be extended to a more efficient model (Model B) by replacing the sine and cosine functions with two-wave Fourier transformation functions.
First, we transformed time in model A into $t' = 2\pi\omega(t - \psi)$, where $\psi$ is the phase angle. Next, we used two models (C and D) as extensions of model A to capture slight shifts in the phase angle ($\psi$). Models B, C, and D are shown in Appendix A. All four models (A, B, C, and D) were applied separately to each country's data. Then, for each country under study, we chose the most appropriate model according to the Akaike Information Criterion (AIC). All analyses were conducted in R, version 4.0.1. All the notations and equations used for estimation are presented in Table 1.
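As a minimal sketch of the fitting and selection step described above, the Python code below (for illustration only; the analysis itself was run in R) fits a Poisson log-linear harmonic model by Newton's method, compares a one-harmonic model with a two-harmonic variant by AIC, and recovers the amplitude and peak day from the sine and cosine coefficients using β1 = −γ sin ψ and β2 = γ cos ψ. The two-harmonic design is only an assumed stand-in for Model B, whose exact form is given in Appendix A. All function names are hypothetical, and the amplitude here is on the log scale of the expected count.

```python
import math


def design(t_values, harmonics, period=7):
    """Design matrix: intercept plus sin/cos pairs for each harmonic."""
    rows = []
    for t in t_values:
        row = [1.0]
        for k in range(1, harmonics + 1):
            angle = 2 * math.pi * k * t / period
            row += [math.sin(angle), math.cos(angle)]
        rows.append(row)
    return rows


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson(X, y, iters=50):
    """Poisson GLM with log link, fitted by Newton-Raphson.

    Returns (coefficients, maximized log-likelihood).
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
                 for yi, mi in zip(y, mu))
    return beta, loglik


def aic(loglik, n_params):
    """Akaike Information Criterion; lower is better."""
    return 2 * n_params - 2 * loglik


def amplitude_and_peak(b1, b2, period=7):
    """Invert b1 = -gamma*sin(psi), b2 = gamma*cos(psi).

    The curve gamma*cos(2*pi*t/period + psi) peaks where the angle is 0,
    so the peak time is (-psi * period / (2*pi)) modulo the period.
    """
    gamma = math.hypot(b1, b2)
    psi = math.atan2(-b1, b2)
    peak = (-psi * period / (2 * math.pi)) % period
    return gamma, psi, peak
```

A typical use would be to fit `design(t, 1)` and `design(t, 2)` to one country's daily counts, keep the model with the lower `aic`, and pass the first-harmonic coefficients to `amplitude_and_peak`. The variances from the delta method are not covered by this sketch.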

Results
The average of coefficient of variation within weeks (ACVW)
Table 2 shows the summary statistics for each country. In this table, the daily average of newly confirmed cases is shown in rows for each country, and the lowest to the highest averages are represented by pale to dark red color. The last row of the table shows the total average of daily new confirmed cases for all countries. Further, we calculated the coefficient of variation (CV) of newly reported cases during each week in each country under study. The average of coefficient of variation within weeks (ACVW) and its standard error (SE) were also calculated: Belgium and France had the maximum ACVW [0.44 (0.06)], and Russia had the minimum ACVW [0.08 (0.01)]. A greater ACVW indicates more variability in daily reported cases. The number of weeks contributed to the study differed between countries. Italy had the maximum number with 37 weeks, while Spain had the minimum with 17 weeks.
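The ACVW calculation can be illustrated with a short Python sketch (the analysis itself was run in R). It assumes that the CV of each week is the sample standard deviation divided by the mean of that week's seven daily counts, and that the SE is the sample standard deviation of the weekly CVs divided by the square root of the number of weeks. The paper does not state which estimators were used, so both choices and the function name `acvw` are assumptions.

```python
import math
import statistics


def acvw(weeks):
    """Average coefficient of variation within weeks and its standard error.

    weeks: list of weeks, each a list of daily counts (Monday to Sunday).
    Assumption: sample (n - 1) standard deviations are used throughout.
    """
    cvs = [statistics.stdev(week) / statistics.mean(week) for week in weeks]
    mean_cv = statistics.mean(cvs)
    if len(cvs) > 1:
        se = statistics.stdev(cvs) / math.sqrt(len(cvs))
    else:
        se = float("nan")  # SE is undefined for a single week
    return mean_cv, se


# Hypothetical counts: one perfectly flat week and one steadily rising week.
example_weeks = [[10, 10, 10, 10, 10, 10, 10], [1, 2, 3, 4, 5, 6, 7]]
example_acvw, example_se = acvw(example_weeks)
```

A flat week contributes a CV of zero, so a country that reports nearly the same number every day of the week gets an ACVW close to zero, while strong day-to-day swings push it up.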

Harmonic regression model
The harmonic regression model was expected to show cyclic variations within each week. Table 3 shows the best-fitted model, estimated coefficients, intensity, and the peak day for each country. The weekend effect was statistically significant for eight countries (Belgium, Canada, Czech Republic, Sweden, Switzerland, Ukraine, United Kingdom, and the United States). The newly reported cases on weekends were 2.51 times those on weekdays in Belgium, 1.20 times in Canada, 1.22 times in the Czech Republic, 1.70 times in Sweden, 1.24 times in Ukraine, 1.12 times in the United Kingdom, and 1.13 times in the United States. The estimated peak day in these countries fell around the third, fourth, fifth, or sixth day of the week. In Switzerland, we found a negative weekend effect: the newly reported cases decreased by 26% on weekends. In this country, the estimated peak day was around the second day of the week.

Visualizing real data and fitted data
In addition, Figure 1 shows the time series plot of the observed daily new confirmed cases (black points) and their fitted values (red points) for all 18 countries. As can be seen, some countries have more fluctuations than others. For example, the estimated peak days of the seven countries whose weekend effects were significantly positive are clearly visible. Belgium has an overall peak on the fourth day of the week and dips on the second day of the week, and the model estimate of its peak day is almost 4 (3.64 in Table 2). In Canada's chart, there were slight variations in the number of confirmed cases, but the peak day was close to the fourth day of the week and the model estimate of the peak day was about 3.5 (3.49 in Table 2). In the Czech Republic plot, on the other hand, the peak day in most weeks was around the fifth day, and the model estimate of the peak day was slightly above 4 (4.36 in Table 2). The Sweden chart shows high variation with an overall upward trend in reported new confirmed cases. Based on this chart, the peak day was around the fifth day of the week and the model estimate of the peak day was 4.22. In the Ukraine plot, the sixth day of the week appeared as the peak day, and the model estimate of the peak day was about 5 (5.04 in Table 2). Likewise, in the United Kingdom chart, the peak fell close to the sixth day of the week and the estimated peak day was almost 5. The United States chart showed more fluctuation than other countries, with an observed peak on the sixth day and an estimated peak day near the fifth day. In most of the 18 countries, the reported number of newly confirmed cases started from small values on the first three days of the week and ended in larger values on the weekend.

Grouping countries by their peak days and intensity
Another characteristic estimated by the model was the intensity, which is an indicator of disease severity. The severity index uses cyclical regression to measure the intensity of baseline disease. In Figure 2, the 18 countries are grouped by their peak days and the intensity is shown on the y-axis. Except for Belgium, with an intensity value of 236, the countries had intensities ranging between 12 and 96. France, the United Kingdom, the United States, and Ukraine had their peaks on the 5th day of the week. The peak day of the Czech Republic, Germany, Spain, Belgium, and Sweden was the 4th day of the week, and so on for the remaining countries. There was no statistically significant correlation between peak day and intensity (r=0.22, p-value=0.39). In contrast, the ACVW and intensity showed a significant positive correlation (r=0.54, p-value=0.021).

Discussion
In the present study, the weekend effect was investigated in the reported new cases of COVID-19 in 18 countries in Europe and North America. These countries are well known for their registration and health care systems [16]. Freemantle et al. stated that in healthcare, a pervasive phenomenon known as 'the weekend effect' suggests that patients admitted to hospitals on Saturday and Sunday have an increased risk of death [17]. On this account, we examined the weekend effect on the reporting of new cases of COVID-19 to answer the question of whether more new confirmed cases are reported on the weekend than on workdays.
To capture the seasonal behavior in time series data, we used harmonic regression. This model is well accepted in the epidemiological and biostatistical communities as a standard procedure for examining seasonal patterns in disease occurrence [18]. To this end, we first built a base model and then extended it using sine and cosine transform functions. An adapted harmonic regression methodology is now well established for exploring trends and finding peak timing [11]. We used a negative binomial model with harmonic terms rather than the Poisson model because of the overdispersion confirmed by a statistical test. During the COVID-19 pandemic, taking a closer look at the rates of new cases and understanding the reasons for their fluctuations allows health policymakers to assess their policies and practices and control this disease in the best way.
Moreover, we estimated some indices, such as peak timing and ACVW, to determine whether the rate of newly reported cases of COVID-19 on the weekend was higher than on workdays. In most of the 18 studied countries, periodic changes were noticeable. Some countries, including the United States, Canada, Sweden, Switzerland, Russia, and Belarus, had more fluctuations. Other countries had a smooth curve, especially from May to September. Furthermore, in most countries, the coefficient of the weekend effect was not significant, probably because in these countries new cases of COVID-19 were reported almost uniformly throughout the week, independently of the day of the week. The countries with significant weekend effects also displayed more fluctuations, which may have been caused by problems with their care and registry systems. In other words, countries without a significant weekend effect and with fewer fluctuations may not have clearly reported the actual number of affected cases, which would make their data appear too uniform and smooth.
A common way to measure the magnitude of cyclic patterns is simply to divide the standard deviation by the mean, which gives the coefficient of variation (CV) [19]. In the present research, we calculated the average of coefficient of variation within weeks (ACVW) of newly confirmed cases and its standard error for each country as a scale to measure variations in a cyclic pattern. The ACVW and intensity have a significant positive correlation (r=0.54, p-value=0.021). Wenger et al. analyzed 13 influenza seasons with a harmonic model and found the peak week of each season. They also detected a positive correlation between peak week and intensity, meaning that the earlier the peak in an influenza season starts, the more intensely the season is experienced [10]. In our study, there was no statistically significant correlation between peak day and intensity (r=0.22, p-value=0.39). Ramanathan et al. analyzed four time-series datasets by introducing four types of harmonic regression models. In the end, they selected the best model as the one with low Root Mean Square Error (RMSE) and Bayesian Information Criterion (BIC). They also calculated the peak timing of each dataset [11]. In this study, we used the same four types of harmonic regression models as Ramanathan et al. and chose the best model as the one with the minimum Akaike Information Criterion (AIC).
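The overdispersion check mentioned above is not specified further, so the following Python sketch (for illustration only; the analysis itself was run in R) shows one common diagnostic rather than the test actually used: the Pearson dispersion statistic of a fitted Poisson model. Values well above 1 suggest that the variance exceeds the mean and that a negative binomial model may be more suitable. The function name and the toy numbers are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by the residual degrees of freedom.

    observed: daily counts; fitted: Poisson fitted means for the same days;
    n_params: number of estimated regression coefficients.
    Under a well-specified Poisson model the ratio is close to 1.
    """
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / (len(observed) - n_params)


# Hypothetical toy values, not taken from the study data.
toy_ratio = pearson_dispersion([8, 12, 9, 11], [10.0, 10.0, 10.0, 10.0], 1)
```

A formal test would compare the Pearson chi-square with a chi-square distribution on the residual degrees of freedom, or compare the Poisson and negative binomial fits directly.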

Study limitations and suggestions for future studies
Only a few countries from Europe and North America were included in the present study. In all of these countries, the weekend consists of Saturday and Sunday, whereas the weekend varies among other countries. It should also be pointed out that the daily new confirmed cases may be reported with a time lag, which is probably not the same for all countries. The countries do not have the same accuracy in recording data, either. These differences did not cause serious problems in our modeling because we focused on finding a pattern in the changes in reported new cases during the week; therefore, the absolute numbers of new cases were not of importance. Additionally, the only real disadvantage of harmonic regression compared with a seasonal ARIMA model is that the seasonality is assumed to be fixed (the seasonal pattern is not allowed to change over time). To overcome this limitation, we used extensions of the base model A (models B, C, and D). For future studies, we suggest applying harmonic regression to other countries of the world, including countries with different weekend days.

Conclusions
This article can be a guide for health policymakers. It is necessary to consider whether the countries claiming to have an advanced health system are successful in correctly and transparently reporting COVID-19 data. Some countries may be purposefully unclear in their reporting and so produce data that are too smooth. Meanwhile, some countries may see a significant increase in new cases on weekends due to differences in how health care and registry services are organized during the week. The regular periodic pattern in the reporting of new cases over the weekend (with less intensity in some countries and more in others), whichever of these two reasons applies, calls for reflection, review, and resolution of the possible problems.