Comparison of COVID-19 patterns between severely struck Italy and the rest of the world: an observational study

Background: Italy was placed on lockdown when its number of COVID-19 cases increased exponentially. Which countries/regions show outbreak patterns most similar to Italy's is of great concern, and a comparison of patterns among regions worldwide, assuming homogeneity of variance within datasets, has become necessary. No results on identifying COVID-19 patterns among countries/regions were found in the literature. We were therefore motivated to propose an appropriate mathematical method, using a specific country/region (e.g., Italy) as an example, for detecting similar patterns at the respective peaks of the outbreak. A visual display highlighting COVID-19 patterns, built on vector mathematics, is proposed in this study.
Methods: We downloaded daily numbers of confirmed COVID-19 cases in countries/regions from the GitHub website. The peak point was identified for each country/region. Next, thirteen time points were assigned before and after that point. COVID-19 patterns were assessed by inspecting both similarity and distance, using the angle (i.e., cosine theta in trigonometry) and Chi-square statistics, respectively. Two sets of data on confirmed cases, one based on cases per 100,000 population and the other on raw counts, were compared using rankings as well as four quadrants divided by similarity and distance. An app was developed to display regions with COVID-19 patterns similar to the selected country.
Results: The top four countries with COVID-19 patterns most similar to Italy's were Switzerland, Norway, Iceland, and Luxembourg, with Chi-square statistics per degree of freedom of 0.12, 0.37, 0.58, and 0.80, respectively, based on cases per 100,000 population. Visualizations with four quadrants and a world map were shown on Google Maps. The rank correlation between the two sets of confirmed cases, with and without the per-100,000-population basis, was -0.03.
Conclusion: We proposed and demonstrated a method that uses the two criteria of similarity and distance between vectors to identify the countries most similar to Italy in terms of the pattern at the peak of the outbreak. An app was developed to display the countries/regions with COVID-19 patterns most similar to the targeted country/region on a dashboard laid over Google Maps. This method can be generalized and applied to other countries/regions in the current pandemic and to other infectious diseases in history.


Background
The disease caused by the novel coronavirus was named COVID-19 by the World Health Organization (WHO), reflecting 2019 as the year in which the first positive case was identified [1]. The origin of this virus was traced back to a seafood market and said to be related to bats [2]; COVID-19 has since spread globally and was declared a pandemic by the WHO in March 2020 [3]. As of April 14, 2020, COVID-19 had affected 185 countries and regions around the world, with more than 1,979,477 confirmed cases and 125,910 deaths [4]. Attention was drawn to the countries most severely affected in terms of case counts, and comparisons were frequently made, such as the US (608,377 and

Comparisons of COVID-19 epidemics and the limitations
Comparisons of COVID-19 epidemics among countries have been discussed in several articles [5][6][7]. The assumption of homogeneity of variance should be verified before groups are compared [8][9][10]. For instance, the transmission dynamics differ between the city of Wuhan (China) and the rest of Hubei province, where the number of confirmed cases remains lower [7]. Similarly, it is unrealistic to assume that Italy would follow the pattern seen in Hubei or, more directly, to compare the pattern in Greater Wuhan, home to 19 million people, with that in Lombardy, the most affected region of Italy, with a population of 9 million [7]. Observing discrepancies and similarities in outbreak patterns among countries and regions would be beneficial for understanding COVID-19 further.

Comparisons of confirmed cases based on a population of 100,000
We identified 5,054 articles by searching the keywords "case fatality rate" on PubMed Central (PMC) [11]. The use of incidence per 100,000 population to describe a disease has been suggested in the literature [12][13][14][15][16]. The countries hit hardest by COVID-19 in terms of infectious density can be identified in this way.
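As an illustration of the per-100,000 adjustment, the following minimal sketch scales a case count by population. The function name `cases_per_100k` and the two regions with their figures are hypothetical and are not taken from the study data; they only show how the two bases can order regions differently.

```python
def cases_per_100k(cases: float, population: float) -> float:
    """Return confirmed cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that simple inputs give exact results.
    return cases * 100_000 / population


# Hypothetical regions: (confirmed cases, population).
regions = {
    "Region A": (5_000, 50_000_000),
    "Region B": (1_000, 500_000),
}

# Ranking by raw counts versus ranking by infectious density.
by_count = sorted(regions, key=lambda r: regions[r][0], reverse=True)
by_density = sorted(regions, key=lambda r: cases_per_100k(*regions[r]), reverse=True)

print(by_count)    # Region A first: larger raw count
print(by_density)  # Region B first: higher cases per 100,000
```

In this made-up example the region with the larger raw count ranks lower once population is taken into account, which is the kind of reordering the per-100,000 basis is meant to expose.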

Why Italy was hit so hard by COVID-19
Italy was placed on lockdown as its number of COVID-19 cases soared. With 21,067 deaths as of April 14, 2020, Italy's death toll was higher than that of any other country apart from the US (25,981). The reasons Italy and its people were at risk of more severe infections were identified as follows [17]: (1) an older population, with a greater percentage of adults over the age of 65; (2) more than 21% of Italians are smokers, compared with fewer than 14% in the US; (3) local customs, such as greeting friends and loved ones with a kiss on both cheeks, made it more difficult to adjust to social distancing regulations in the fight against COVID-19; (4) the high case fatality rate (CFR) in Italy might be due to many asymptomatic or minimally symptomatic patients not being counted because of the lack of testing. Whether countries with characteristics similar to Italy's follow the same outbreak pattern is a vital focus of our study.
In this study, we developed an algorithm for (1) comparing the number of infected patients at the peak of the COVID-19 outbreak, (2) visualizing the patterns most similar to Italy's, and (3) generalizing the similarity approach to observations of other targeted countries/regions in the COVID-19 epidemic.

Data sources
We downloaded COVID-19 outbreak data, including information on confirmed cases in infected countries/regions, from GitHub on April 16, 2020 [18]. All downloaded data are publicly available on the website [18]. Ethical approval was therefore not necessary for this study.

An algorithm for identifying similarity and distance in data
We applied a trigonometric function and vector mathematics to obtain the angle and distance between two datasets, using the 13 elements before and after the peak point (i.e., the maximal daily number of confirmed cases), with the time points of the two compared series indexed by their deviation from the peak point. For simplicity, we illustrate the idea with two three-element data strings, called vectors (e.g., {0, 1, 0} and {0, 0, 2}, based on the origin point at {1, 0, 0}). The angle between the directions of two vectors (e.g., representing the outbreaks in Italy and Spain) can be regarded as a quantified measure of similarity. The number of elements can be expanded arbitrarily; in this study, a series of daily numbers of confirmed cases was used, see Equation (1).

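$$\mathbf{A}\cdot\mathbf{B} = \lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert\cos\theta \qquad (1)$$
Thus, Eq. (1) can be rearranged as
$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
where $A_i$ and $B_i$ are the $i$-th elements of the two vectors and $n$ is the number of elements. The angle between the two directions therefore denotes the similarity, obtained by multiplying the elements of the two vectors in Eq. (1), see Figure 1.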
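The similarity step can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and taking 13 points on each side of the peak (27 points in total) is an assumption, since the window could also be read differently. Series that end too close to the peak are simply truncated here.

```python
import math


def peak_window(daily, half_width=13):
    """Slice `daily` around its maximum value.

    Assumption: `half_width` points are kept on each side of the peak;
    the window is truncated if the series ends too early.
    """
    peak = max(range(len(daily)), key=daily.__getitem__)
    start = max(0, peak - half_width)
    return daily[start:peak + half_width + 1]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def angle_degrees(a, b):
    """Angle between two vectors in degrees."""
    # Clamp to [-1, 1] to guard against floating-point rounding.
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(c))


# Hypothetical daily counts for two regions, windowed around their peaks.
region_a = peak_window([1, 2, 5, 9, 4, 3, 1], half_width=2)
region_b = peak_window([2, 4, 10, 18, 8, 6, 2], half_width=2)
print(cosine_similarity(region_a, region_b))
print(angle_degrees(region_a, region_b))
```

Because the second hypothetical series is exactly twice the first, the two windows point in the same direction: the cosine is close to 1 and the angle is close to 0 even though the magnitudes differ. This is why a separate distance measure is needed alongside the angle.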
The distance between two vectors describes their deviation and can be based on the Euclidean distance (e.g., $d_E = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^{2}}$) or the Manhattan distance (e.g., $d_M = \sum_{i=1}^{n} \lvert A_i - B_i \rvert$), where $A_i$ and $B_i$ are the $i$-th elements of the two vectors. Similarly, the Chi-square statistic (e.g., $\chi^{2} = \sum_{i=1}^{n} (O_i - E_i)^{2} / E_i$, where $O_i$ and $E_i$ are the observed and expected values) can be used to represent the distance, as in this study. A larger deviation means a more substantial difference between the two areas in the context of a pandemic. For instance, five regions shared a similar outbreak pattern with Italy at the peak period, but distinct differences were found in their log-transformed confirmed case counts (the mean Chi-square statistics are given in parentheses), see Figure 2, based on the natural logarithm (ln) of the daily confirmed cases [20]. The two domains (i.e., similarity on the Y axis and distance on the X axis) appeared suitable for classifying the features by similarity and distance into four quadrants.
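The distance measures named above can be sketched as follows. Several choices here are assumptions made for illustration and are not stated in the text: which series plays the role of the expected values in the Chi-square statistic, dividing by n - 1 degrees of freedom, and the cutoff values passed to the hypothetical `quadrant_label` helper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chi_square_per_df(observed, expected, df=None):
    """Pearson Chi-square statistic divided by the degrees of freedom.

    Assumption: df defaults to n - 1; pass `df` explicitly to use
    another divisor (e.g., n for a plain mean).
    """
    if len(observed) != len(expected):
        raise ValueError("vectors must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected values must be positive")
    total = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    if df is None:
        df = len(observed) - 1
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return total / df


def quadrant_label(similarity, distance, sim_cutoff, dist_cutoff):
    """Describe a region by its similarity (Y axis) and distance (X axis).

    The cutoffs are hypothetical inputs supplied by the caller.
    """
    sim = "high similarity" if similarity >= sim_cutoff else "low similarity"
    dist = "short distance" if distance <= dist_cutoff else "long distance"
    return f"{sim}, {dist}"


# Hypothetical log-scale values for a target region and a compared region.
target = [10.0, 10.0, 10.0]
other = [12.0, 8.0, 10.0]
print(euclidean(other, target), manhattan(other, target))
print(chi_square_per_df(other, target))
print(quadrant_label(0.95, chi_square_per_df(other, target), 0.9, 1.0))
```

The quadrant helper returns descriptive labels instead of quadrant numbers, so that it does not depend on how the axes are oriented on the dashboard.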

A dashboard on Google Maps to present the results
A dashboard was developed for the daily display of results across various regions.
An app was developed to display the similarity and distance of each region relative to the designated country (e.g., Italy or others).
In the current study, the Google Maps platform was used to present the Kano diagram as well as the world map, displaying the regions whose features are similar to those of the targeted country.
The top 20 countries/regions with the largest number of daily confirmed cases at the peak point were drawn using a pyramid plot.

Results
There are 15 countries/regions (shown as yellow bubbles) with patterns similar to, and a short distance from, Italy (shown as a black bubble in quadrant I). When the black bubble (i.e., Italy) is clicked, all bubbles appear on a world map, see Figure 5. The 15 yellow bubbles are located in Europe.
Readers are invited to click the bubble of interest and examine the countries with the most similar patterns and shorter distances shown in quadrant II.

===Figure 5 inserted here===
The correlation in rankings between the two datasets of confirmed cases, one based on cases per 100,000 population and the other not, was -0.03, indicating that analyses with and without the 100,000 population basis yield distinct rankings when countries affected by COVID-19 are compared, see Table 1.
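A rank correlation of this kind can be computed as sketched below. The text does not state which coefficient was used, so Spearman's coefficient (the Pearson correlation of the ranks, with ties given average ranks) is an assumption here, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical distances to a target country under the two bases.
distance_raw = [0.2, 1.5, 0.9, 3.1]
distance_per_100k = [2.8, 0.4, 1.9, 0.7]
print(spearman(distance_raw, distance_per_100k))
```

A value near zero, such as the -0.03 reported above, means that knowing a country's rank under one basis says almost nothing about its rank under the other.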

===Table 1 inserted here===
The top 20 countries/regions with the largest number of daily confirmed cases at the peak point are shown in Figure 6. France ranked first, followed by Hubei (China) and Spain. The peak date in France was April 5, 2020 (Table 1).
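The selection behind such a ranking can be sketched as follows. The function name `top_peaks`, the dictionary layout, and the sample figures are hypothetical; the study data are not reproduced here.

```python
def top_peaks(daily_by_region, n=20):
    """Return up to `n` (region, peak_index, peak_value) tuples,
    ordered by peak value from largest to smallest."""
    peaks = []
    for region, series in daily_by_region.items():
        if not series:
            continue  # skip regions without data
        idx = max(range(len(series)), key=series.__getitem__)
        peaks.append((region, idx, series[idx]))
    peaks.sort(key=lambda item: item[2], reverse=True)
    return peaks[:n]


# Hypothetical daily confirmed cases per region.
sample = {
    "Region A": [1, 5, 2],
    "Region B": [9, 3],
    "Region C": [],
}
print(top_peaks(sample, n=2))
```

The peak index can be mapped back to a calendar date when the series is stored alongside its reporting dates.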

What this finding adds to what we already knew
Numerous articles have discussed the use of CFRs alongside death tolls in comparisons across various diseases [11]. Incidence rates and CFRs can be expressed per 100,000 population to allow fair and reasonable comparisons [12][13][14][15][16]. We analyzed the data using confirmed cases both with and without the 100,000 population basis and found a distinct difference in rankings (Table 1).
Similarly, Figures 2 and 4 show that the countries with similar outbreak patterns at the peak and the adjacent data points differ depending on how confirmed cases are defined in the data. Surprisingly, all those countries share an odd phenomenon: the number of daily confirmed cases on the date just before the peak point was small (Figures 2 and 4), which is worth studying in the future. situations, particularly for countries with a similar pattern at the peak of the outbreak.

What it implies and what should be changed
It is worth mentioning that four countries (Switzerland, Norway, Iceland, and Luxembourg) have an outbreak pattern similar to Italy's at its peak. Whether the four features that put Italy and its people at higher risk of severe disease [17] are also present in these countries would be interesting to study in the future.

Strengths of this study
Three features are highlighted below: (1) a vector-based mathematical method was proposed to identify the COVID-19 patterns most similar to that of a designated country at the peak points, which has not been seen before in the literature but is important when observing COVID-19 situations in epidemiology; (2) two sets of confirmed cases, with and without the 100,000 population basis, were compared in terms of rankings using quadrants divided by similarity and distance, and a marked difference was found, providing evidence that the two bases yield distinct results; (3) two types of visualization, on the world map and in four quadrants, were provided to readers through an app, which is a unique and innovative way of comparing countries/regions with similar COVID-19 patterns, provided that the assumption of homogeneity of variance is ensured.

Limitations and future studies
Our study has some limitations. First, although the data were downloaded from GitHub [18] on a daily basis, we cannot guarantee that differences among regions in the criteria for determining confirmed cases did not affect the results. For instance, since Feb. 14, 2020, confirmed cases in Hubei have been based solely on clinically diagnosed cases [21].
Second, although we combined both similarity and distance on a dashboard (Figure 3), the cutoff points for these two axes affect the resulting classification in Table 1. Attempts to replicate this study in the future should take into account how the cutoff points used in Figure 3 are determined. Third, although we recommend using a population-based method, traditional confirmed case counts cannot be neglected, because the general public is more familiar with them and understands them easily, despite the fact that infectious density would provide a better interpretation of outbreak situations.
Finally, each government has its own strategy and policy against COVID-19. We

Conclusion
We proposed and demonstrated a method that uses the two criteria of similarity and distance between vectors to identify the countries most similar to Italy in terms of the pattern at the peak of the outbreak. An app was developed to display the countries/regions with COVID-19 patterns most similar to the targeted country/region on a dashboard laid over Google Maps. This method can be generalized and applied to other countries/regions in the current pandemic and to other infectious diseases in history.

List of abbreviations:
CC = correlation coefficient
CFR = case fatality rate
PMC = PubMed Central
Declarations
Ethics approval and consent to participate
Not applicable.
All data were downloaded from the website database at GitHub.
Additional files
Additional File 1: Xlsx file: study dataset