Finding the onset of COVID-19: a correlation analysis with influenza epidemic around the world


Background. COVID-19 and seasonal influenza have similar and overlapping symptoms, which makes them difficult to distinguish, especially in the early stage of an outbreak. The confluence of the two diseases might result in considerable morbidity, and it is unclear whether COVID-19 had already affected the morbidity of influenza before it was first reported.
Methods. We conducted the Kolmogorov-Smirnov test and the Kruskal-Wallis test to examine the seasonal and regional distributions of influenza and COVID-19. Cluster analysis was used to explore possible influencing factors. The Spearman test was carried out to analyze correlations between the two diseases. We employed an ARIMA model to forecast the time series of WMI, and used the Mann-Whitney U test to compare the forecasted and the original time series of influenza from 2019 to 2021. We then identified the first abnormal peaks in the time series to trace back when COVID-19 began to affect influenza, and compared that time with the time of the first report.
Results. WMI and WMC varied significantly across the four seasons, the five continents and the ten selected countries. Cluster analysis divided the data into two groups according to country, continent, population and morbidity. The WMI of China, Israel, Honduras, Morocco and Nigeria was correlated with WMC. The forecasted and the original time series of influenza from 2019 to 2021 were significantly different. Compared with the forecasted series, abnormal peaks first appeared in the original time series of influenza around Dec. 31st, 2018 in Austria, Norway, Morocco and Nigeria; Jan. 28th, 2019 in South Africa; Apr. 8th, 2019 in the Marshall Islands; Jul. 7th, 2019 in America; Sep. 30th, 2019 in China and Israel; and Mar. 11th, 2020 in Honduras.
Conclusions. Winter and autumn were the high-incidence seasons for influenza and COVID-19, respectively. Oceania and the Americas had the highest incidence rates of influenza and COVID-19, respectively. Human immunity, continents, countries’ policies and population were possible influencing factors. Only in Honduras did the first reported COVID-19 case coincide with the abnormal value of ILI. In the rest of the included countries, COVID-19 might have occurred earlier than its first reports. Among these regions, COVID-19 might have first affected Africa in the first week of 2019.


Introduction
COVID-19 was first reported in the winter of 2019, with symptoms including coughing and fever [1]. Controlling this disease effectively has been urgent because it is readily communicable (with higher transmissibility than seasonal influenza and SARS-CoV) and has reached pandemic scale [2][3][4][5].
Many countries have recommended or imposed various policies, and the effectiveness of their implementation relied on people's awareness, knowledge and attitude towards COVID-19 [6][7][8]. This was one of the reasons why the status of epidemic control varied greatly among countries around the world. Most respiratory infectious diseases are self-limiting [9], except for a few serious consequences in specific populations [10][11][12][13][14][15][16], which makes them difficult to detect and diagnose in the early stage. At the same time, their symptoms are often similar and partially overlapping; for example, seasonal influenza and COVID-19 share fever and respiratory symptoms [17][18][19], which makes them difficult to distinguish. Moreover, as a new disease, COVID-19 initially lacked prior diagnostic experience and prompt awareness. These facts led us to explore whether COVID-19 had been misdiagnosed as seasonal influenza before its first report around the world.
Recently, the traceability of COVID-19 has attracted much attention. It has been reported that COVID-19 cases might have appeared earlier than the first reports in some countries [20][21][22]. A study confirmed that the confluence of COVID-19 and influenza might result in considerable morbidity [23]. To trace back to the earliest misdiagnosis and confluence, we conducted statistical analysis and time-series analysis. We explored the distribution characteristics of COVID-19 and influenza, the correlation between the two diseases, the possible influencing factors, and abnormalities in the time series of seasonal influenza in recent years. In the process, we addressed difficulties such as the stationarity transformation of the time series and the unification of data scales. To the best of our knowledge, the earliest time of incidence of COVID-19 is still an open problem. We hope our results will provide guidance for finding the possible sources of COVID-19 and for carrying out prevention and control measures.

Collection of Data
To reflect the situation of all continents in the world, the number of countries and the total population of each continent were used as references for sampling in this study. On the one hand, the number of countries included from each continent should be consistent with the proportion of countries among the five continents (Asia : Europe : Americas : Oceania : Africa = 48:48:51:21:58 ≈ 2:2:2:1:3). On the other hand, the relative size of the population included from each continent should be consistent with the relative population sizes of the five continents (Asia > Africa ≈ Americas > Europe > Oceania).
In addition, the countries included had to provide both COVID-19 case data for 2020 and influenza-like illness (ILI) data from 2011 to 2020. Countries with too much missing data were excluded.
According to the above criteria, China, Israel, Austria, Norway, America, Honduras, the Marshall Islands, Morocco, Nigeria and South Africa were included in this study. Data for these ten countries were obtained from the WHO [24,25], Chinese National Influenza Center [26] and Country meters [27] websites.
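The rounding behind the 2:2:2:1:3 ratio can be checked with a few lines of Python. This is only an illustrative sketch: the function name `allocate_sample` is hypothetical, and the sample size of ten is taken from the number of countries included in the study.

```python
# Number of countries per continent, as quoted in the ratio above.
COUNTRIES_PER_CONTINENT = {
    "Asia": 48,
    "Europe": 48,
    "Americas": 51,
    "Oceania": 21,
    "Africa": 58,
}


def allocate_sample(counts, sample_size):
    """Allocate sample_size countries across continents in proportion to counts.

    Each proportional quota is simply rounded. For the figures above the
    rounded quotas sum to the sample size, but that is not guaranteed for
    arbitrary inputs.
    """
    total = sum(counts.values())
    return {name: round(sample_size * n / total) for name, n in counts.items()}


allocation = allocate_sample(COUNTRIES_PER_CONTINENT, 10)
print(allocation)
```

With ten countries in total, the rounded quotas reproduce the 2:2:2:1:3 split across Asia, Europe, the Americas, Oceania and Africa.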

Data Processing
Missing ILI data were handled according to their periodic features: if a whole year of new-case data was missing, the data of the nearest year were used in its place; if only part of a year was missing, the missing values were replaced with 0. We used the formulas listed below to calculate WMI and WMC. We used cluster analysis to explore possible influencing factors behind the distribution differences of the two diseases. After applying the Kolmogorov-Smirnov test and the Kruskal-Wallis test to verify the differences among the ten countries included, we conducted K-means cluster analysis to look for the features of the different clusters during the epidemics.
The Spearman test was carried out to assess correlations between WMI and WMC within the same country. Once the correlations were confirmed, we proceeded to time-series analysis to explore the possible appearance of COVID-19.
IBM SPSS Statistics 24 was used to conduct the statistical analysis. P<0.05 was considered significant.
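The two rules for missing ILI data stated at the start of this subsection can be sketched as follows. The data layout (a dictionary from year to a list of weekly counts, with `None` marking a missing week), the function name `impute_ili` and the tie-breaking toward the earlier year are all assumptions made for illustration; the study does not describe its implementation.

```python
def impute_ili(weekly_by_year):
    """Fill missing weekly ILI counts (None) using the two rules in the text.

    Raises ValueError if every year is entirely missing.
    """
    def is_all_missing(weeks):
        return all(value is None for value in weeks)

    observed_years = [
        year for year, weeks in weekly_by_year.items() if not is_all_missing(weeks)
    ]
    filled = {}
    for year, weeks in weekly_by_year.items():
        if is_all_missing(weeks):
            # Rule 1: a whole year is missing, so copy the nearest year with data.
            # Ties are broken toward the earlier year (an assumption).
            nearest = min(observed_years, key=lambda y: (abs(y - year), y))
            source = weekly_by_year[nearest]
        else:
            source = weeks
        # Rule 2: remaining isolated missing weeks are replaced with 0.
        filled[year] = [0 if value is None else value for value in source]
    return filled


# Made-up numbers purely for illustration; not data from the study.
example = {
    2011: [5, None, 7],
    2012: [None, None, None],
    2013: [1, 2, 3],
}
filled_example = impute_ili(example)
```

Note that replacing isolated missing weeks with 0 lowers the apparent morbidity for those weeks, which is worth keeping in mind when reading the later time series.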
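For readers unfamiliar with the Spearman test, the coefficient is the Pearson correlation computed on ranks. The sketch below is a plain-Python illustration of that definition, not the SPSS procedure used in the study; it returns only the coefficient and no P value, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Made-up weekly values purely for illustration; not data from the study.
wmi_example = [0.8, 1.1, 1.5, 2.0, 2.4]
wmc_example = [0.1, 0.3, 0.2, 0.6, 0.9]
rho_example = spearman_rho(wmi_example, wmc_example)
```

Because only ranks enter the calculation, the coefficient is unaffected by the very different scales of WMI and WMC.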
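Although the analysis itself was run in SPSS, the Kruskal-Wallis statistic is simple enough to sketch in plain Python. The example below assumes that the "grade" values reported in the Results are the mean ranks of each group, which the text does not state explicitly. The function name `kruskal_wallis` is hypothetical, and the H statistic is computed without the tie correction, so it agrees with statistical packages only when there are no tied values.

```python
def kruskal_wallis(groups):
    """Return (H, mean_ranks) for a dict mapping group name to a list of values.

    H is the uncorrected Kruskal-Wallis statistic (no tie correction).
    """
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, name) for name, values in groups.items() for value in values)
    n = len(pooled)
    rank_sums = {name: 0.0 for name in groups}
    i = 0
    while i < n:
        j = i
        # Tied values share the average of the ranks they occupy.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    mean_ranks = {name: rank_sums[name] / len(groups[name]) for name in groups}
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[name] ** 2 / len(groups[name]) for name in groups
    ) - 3 * (n + 1)
    return h, mean_ranks


# Made-up values purely for illustration; not data from the study.
h_example, mean_ranks_example = kruskal_wallis(
    {"spring": [1, 2, 3], "winter": [4, 5, 6]}
)
```

Under this reading, the group with the largest mean rank is the one whose values tend to be highest, which is how the highest "grade" is interpreted in the Results.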

Time-series Analysis
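In this study, Python 3 was used to conduct the time-series analysis. We employed an ARIMA model to forecast the time series of WMI, compared the forecasted and the original time series of influenza from 2019 to 2021 with the Mann-Whitney U test, and traced the first abnormal peaks to estimate when COVID-19 first affected the influenza time series.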
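The comparison of an original series with a forecasted one can be illustrated with a short sketch. The text does not state how an abnormal peak was defined, so the rule used here (a local maximum of the observed series that exceeds the forecast by more than a fixed threshold) is an assumption made for illustration, as are the function name `first_abnormal_peak`, the threshold value and the sample numbers.

```python
from datetime import date, timedelta


def first_abnormal_peak(week_starts, observed, forecast, threshold):
    """Return the start date of the first local peak in `observed` that exceeds
    `forecast` by more than `threshold`, or None if no week qualifies.

    The first and last weeks are never reported because they have only one
    neighbour to compare against.
    """
    for i in range(1, len(observed) - 1):
        is_local_peak = observed[i - 1] < observed[i] >= observed[i + 1]
        if is_local_peak and observed[i] - forecast[i] > threshold:
            return week_starts[i]
    return None


# Made-up numbers and dates purely for illustration; not data from the study.
start = date(2000, 1, 3)
week_starts = [start + timedelta(weeks=k) for k in range(6)]
observed = [10, 11, 12, 11, 30, 12]
forecast = [10, 11, 12, 11, 12, 12]
peak = first_abnormal_peak(week_starts, observed, forecast, threshold=5)
print(peak)
```

In practice the forecast would come from the fitted ARIMA model, and the result is sensitive to the chosen threshold, so any date obtained this way should be read as approximate.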

Quality Control
In this study, countries were selected by stratified random sampling according to the inclusion and exclusion criteria. All results were verified repeatedly by two investigators.

Distribution Features of Influenza and COVID-19
The WMI of the eight countries in the northern hemisphere and of the two countries in the southern hemisphere looked different across the four seasons (Figure 1). The K-S test showed a non-normal distribution (P<0.001, Figure 2, Table 3), so we conducted the Kruskal-Wallis test, which showed that the mean WMI differed significantly among the four seasons (P<0.001, Figure 2, Table 3), with season 4 having the highest grade (winter, grade=3211.77, Figure 2, Table 3).
In Figure 3, the WMC curves of the eight countries in the northern hemisphere and of the two countries in the southern hemisphere also looked different across the four seasons. The K-S test showed a non-normal distribution (P<0.001, Figure 4, Table 3), so we applied the Kruskal-Wallis test, which confirmed that the mean WMC differed significantly among the four seasons (P<0.001, Figure 4, Table 3), with season 3 having the highest grade (autumn, grade=334.91, Figure 4, Table 3).
The WMI curves of the ten countries trended differently across the five continents (Figure 5). The K-S test showed a non-normal distribution (P<0.001, Figure 5, Table 3), so we conducted the Kruskal-Wallis test, which showed a significant difference in mean WMI among the five continents (P<0.001, Figure 5, Table 3). Continent 4 had the highest grade (Oceania, grade=4003.91, Figure 5, Table 3). Figure 6 depicts the differing WMC curves of the ten countries across the five continents.

Possible Influence Factors of Influenza and COVID-19
Kolmogorov-Smirnov tests showed that the WMI and WMC of the ten countries were all non-normal (P<0.001, Table 4); hence Kruskal-Wallis tests were chosen, which showed significant differences in the mean WMI (P<0.001, Table 5) and the mean WMC (P<0.001, Table 5) among the ten countries. The highest grades in the two Kruskal-Wallis tests belonged to country 7 (Marshall Islands, grade=4003.91, Table 5) and country 5 (America, grade=405.85, Table 5), respectively.
Cluster analysis of the possible influencing factors of influenza divided the data into two clusters with different cluster centers for country, continent, population and morbidity (cluster 1: country 1, continent 1, population 1406170204, morbidity 0.0000420151; cluster 2: country 6, continent 3, population 69267633, morbidity 0.0101980371; Table 5). The same analysis for COVID-19 also divided the data into two clusters (cluster 1: country 6, continent 3, population 73603300, morbidity 0.0477510497; cluster 2: country 1, continent 1, population 1436838423, morbidity 0.0001248127; Table 6). As can be seen, China was placed in a different cluster from the other countries.

Time-series Analysis
We  Table 8.
K-S tests showed that the original and the forecasted time series of influenza from 2019 to 2021 were non-normally distributed (P<0.001, Table 9), apart from the forecasted time series of Nigeria (P=0.2000, Table 9). The Mann-Whitney U test then showed that the two time series were significantly different for America (P=0.012) and for the other nine countries (P<0.001) (Table 9). So far, the forecasted time  In Nigeria, the first abnormal peak started on Dec. 31st, 2018; in South Africa, the first abnormal peak started on Jan. 28th, 2019 (Table 10).

Discussion
The WMI and WMC of the ten countries differed significantly across the four seasons. Since season 4 had the highest grade for WMI, influenza appeared to be more prevalent in winter; since season 3 had the highest grade for WMC, COVID-19 appeared to be more prevalent in autumn. A previous study showed that a high incidence of influenza occurs mainly in autumn and winter in temperate zones, and can occur throughout the year in tropical zones [28]. Among the countries involved in our study, most of China, Israel, Austria, Norway, America, Morocco and South Africa is located in the temperate zone, while the other three countries are tropical. This supports, to a certain extent, our conclusions about the high-incidence seasons of influenza and COVID-19. In addition, one study summarized that respiratory viruses worldwide, including influenza virus and human coronavirus, spread more easily in winter [29], which is consistent with our results.
Nevertheless, we found that COVID-19 was more prevalent in autumn than in winter worldwide, which indirectly suggests that it might have occurred in autumn, earlier than its first observation in winter.
In this study, the WMI of the ten countries was also shown to differ significantly among the five continents, with continent 4 having the highest grade, indicating that influenza was more prevalent in Oceania. One study speculated that new seasonal influenza A viruses might emerge in Asia and then spread first to Oceania, with rapid variation and reproduction [30]. However, our study showed that COVID-19 has so far been more prevalent in the Americas, as continent 3 had the highest grade in the variance analysis of the WMC of the ten countries across the five continents. WHO data showing that, as of Apr. 2021, the number of COVID-19 diagnoses in the Americas was the largest among the five continents appear to confirm this as well [25].
We then explored the influencing factors of the two epidemics to try to explain these differences to some extent. Firstly, the seasonal variations of the two epidemics might be attributed to seasonal human immunity. For example, it has been confirmed that seasonal solar radiation [31] and vitamin D level [32,33] can influence human immunity. As solar radiation is lowest in winter in the temperate zone, vitamin D would be deficient during this time [34][35][36][37]. Secondly, we have found that the mean WMI of the ten countries were significantly different in the highest grade of country 7 the largest population [27] , the first reported COVID-19 case as well as rapid government actions, we also considered influence factors from aspect of country's policies and population. Several studies have shown that the morbidity of influenza can be decreased after NPIs are implemented [38,39]. Another study analyzed whether the governmental policies adopted in 177 countries to address the COVID-19 pandemic were appropriate, found that these countries responded to different degrees, and suggested that governments should improve their response to COVID-19 [40]. Therefore, we speculated that a country's policy was one of the influencing factors. Furthermore, we noticed that WHO had stated that influenza spreads easily in crowded places [41]. A paper also reported that population-level interest in telehealth in America was positively correlated with increased COVID-19 cases, but that the present level of telehealth might not satisfy the population's demand [42]. These studies supported our hypothesis about the impact of population on influenza and COVID-19. Therefore, we believed that human immunity, continents, policies and population, which differ between countries, were possible influencing factors of influenza and COVID-19.
Returning to the question of whether COVID-19 occurred earlier than its first observation, we conducted correlation analysis and time-series analysis. Figure 9 and were correlated with WMC in 2020-2021, we speculated that in these five countries, influenza might also impact COVID-19 itself, in addition to the misdiagnosis caused by the similar and overlapping symptoms of the two diseases [17][18][19][43]; thus the abnormal peaks in these five countries might result from the correlation between the two diseases as well as from misdiagnosis. The impact of COVID-19 on influenza in the other five countries (Austria, Norway, America, the Marshall Islands and South Africa) was not as strong, but misdiagnosis still appeared to exist according to the abnormal peak values. Among the five correlated countries, abnormal peaks started earliest in Morocco and Nigeria, on Dec. 31st, 2018. Both are African countries, which suggests that Africa was perhaps affected earlier than other continents. More data from other continents and countries are needed to verify this.
There were still some limitations in our study. The original influenza data on the WHO website were partly missing and did not distinguish between influenza types. Nevertheless, given that we handled the missing values and included ten years of data, which is long enough, the overall trend of influenza would still show its seasonality even without typing in the original data; we therefore believed that these data were meaningful.

Conclusion
The high-incidence season of influenza was winter, and Oceania had the highest incidence rate. The high-incidence season of COVID-19 was autumn, and the Americas had the highest incidence rate. Human immunity, continents, countries' policies and population possibly influenced the distribution differences.
In addition, COVID-19 might have occurred earlier than its first reports in China, Israel, Austria, Norway, America, the Marshall Islands, Morocco, Nigeria and South Africa.

Table 10 Comparison between the dates of the first abnormal peak and the first report of COVID-19

Countries
When COVID-19 was first reported*