Time Course Analysis of Age-Gender Effect on Severity of COVID-19 Outbreak in Spain and Italy


 Several data science studies have analysed the gender effect on COVID-19 with at least one of the following limitations: 1) no comparison across countries; 2) data not age-stratified; 3) analysis based on a single date of the epidemic period instead of a time course; 4) few variables analysed; 5) gender bias not adjusted by the country's population in each stratum. Here, we address these limitations. A wide range of variables on the severity of COVID-19 in relation to gender and age is analysed over an extended time course, from March 2020 to the latest publicly available date. Only Spanish and Italian data are considered, because they are the only open-access data that are both comprehensive and harmonized in a comparable format. Altogether, our findings offer two key evidence-driven recommendations. First, since data collection is not harmonized across Europe, a European institute for standards in biomedical data collection could play a crucial role in the fast, open-source dissemination and analysis of harmonized data, which in turn could foster rapid and coordinated decision making in emergency periods. Second, since COVID-19 severity particularly affects males aged 60+, containment interventions might be more age/gender-adaptive and, to increase effectiveness and efficiency, focus on countering contagion in these at-risk categories.


Introduction
In December 2019, the virus SARS-CoV2, which causes COVID-19, was first identified in Wuhan City, China. In March 2020 COVID-19 was declared a pandemic. In the following weeks the virus spread across Europe and case numbers rose drastically. Spain and Italy in particular had high case numbers that overburdened their health care systems, and both countries have therefore been greatly impacted by this crisis. Analysing a small sample of cases from China in early March, and examining the early development of COVID-19 epidemics in other countries, studies have indicated that age plays an important role in COVID-19 severity [1], and that males and females are approximately equally likely to be infected, but males are much more likely to need hospital or ICU treatment and have a higher naïve case fatality rate [2-4]. Hence, there is evidence for a gender effect on the severity of COVID-19. For clarity, in this study we use the word gender to mean biological sex only, which is a binary interpretation of the more general definition of gender that also accounts for social constructions relating to behaviours. As the disease spread to more countries, further studies analysed the influence of gender using single-day data from Italy [5], the US [6], Europe [6,7] or the Republic of Korea [2], with similar results. These results were obtained by comparing female and male counts of key variables associated with the disease, such as hospital or ICU admission and, most importantly, confirmed cases and deaths, which together indicate the severity of the illness. A comparison of all available gender-divided data on COVID-19 cases led to the conclusion that males are not more likely to be infected but experience higher severity and fatality rates [8,9].
A similar pattern has been reported for SARS-CoV1, a virus closely related to SARS-CoV2: studies using data from China and infected mice found a higher mortality rate in males, especially in older age groups [3,10]. Studies examining the cause of the gender bias in COVID-19 have largely linked it to the two enzymes ACE2 [6] and TMPRSS2 [7,11]. Furthermore, a recent study highlighted crucial gender differences in the mechanisms of immune responses to SARS-CoV2 infection that may help explain the origin of this gender bias [21]. Unfortunately, existing data science studies analysing the influence of gender on the development of COVID-19 generally have at least one of these five limitations: 1) they rely on data from a single country and do not directly compare across countries; 2) they rarely consider age-stratified data; 3) they generally focus on data collected at one specific date of the epidemic period; 4) they consider only a few specific variables associated with the epidemic (such as confirmed cases and deaths) and do not analyse the differences between the time trends of these variables; 5) they do not adjust the gender bias in each age stratum for the country population in that stratum. This last point is particularly important because if, in a certain age range (for instance 90+), the number of females in the country population is significantly larger than the number of males, then the proportion of COVID-19-affected females might appear larger than that of males merely because of the imbalance in the population. Thus, the counts need to be adjusted by the country population before conclusions can be drawn. In this study, we try to address all of these limitations together.
Open-access data from Spain and Italy, including a wide range of variables on the severity of COVID-19 in relation to gender and age, are adjusted by country population and analysed over an extended time course, from March 2020 to the latest publicly available date, covering the main phases of the epidemic. For Spain, daily cumulative numbers of female and male confirmed cases, hospitalizations, ICU cases and deaths were compared and stratified by age. For Italy, the same type of data was available only every few days. Finally, we note that we had to restrict the analysis to Spanish and Italian data, because they are the only open-source data that are both comprehensive (in the sense that they include many variables) and harmonized in a comparable data format.

Data source
The Spanish daily cumulative data were obtained from the Spanish Ministry of Health, Consumer Affairs and Social Welfare [12]. Spanish confirmed cases, hospitalizations, ICU cases and deaths from 24th March to 18th May were used, divided by gender and by the age strata "0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89" and "90+". Additionally, the country population distribution by gender and age in January 2020 was used [13]. The same type of COVID-19 cumulative data was obtained for Italy, but only for the following dates: 12th, 16th, 19th, 23rd, 26th and 30th of March; 2nd, 6th, 9th, 16th, 23rd and 28th of April; 7th, 14th, 20th and 26th of May; 3rd, 9th, 16th, 23rd and 30th of June; 7th, 14th, 21st and 28th of July; 4th, 11th and 25th of August; 1st, 8th, 15th, 22nd and 29th of September; and 6th and 13th of October [14]. Furthermore, data describing the population distribution of Italy by gender in each age stratum in January 2019 were used [15].

Data Analysis
To analyse the data, five different figures were created. The cumulative data used in all figures are divided by the country population of the corresponding gender and age stratum, to correct for trends induced merely by the population distribution: for example, females have a higher life expectancy and are therefore more represented in older age strata, and older age strata make up a smaller proportion of the total population than middle age strata. Figures not adjusted by population can be found in the supplementary information. Each figure presents the difference between the male and female case numbers over time in the left panel and in total in the right panel, stratified by ten-year age strata. All formulas reported below are computed using the variables and the country population of each specific age stratum. Equation (6) is used to create plot C. Plot E is created using the formula

$$\frac{\text{male ICU cases}}{\text{male country population} \times \text{male hospitalizations}} - \frac{\text{female ICU cases}}{\text{female country population} \times \text{female hospitalizations}} \qquad (7)$$

The reason for also including the male or female country population as an adjustment factor in the denominator of each fraction is the following. Consider a first example in which 10 individuals die (or are hospitalized) out of 100 male confirmed cases, with a male country population of 1000 individuals; the male ratio is then 10 / (1000 × 100) = 0.0001. Consider a second example in which 10 individuals die out of 100 male confirmed cases, with a male country population of 100 individuals; the male ratio is then 10 / (100 × 100) = 0.001. In both examples the standard case fatality ratio (or hospitalization ratio), computed as deaths/confirmed cases (or hospitalizations/confirmed cases), would give the same value, but it would not account for the fact that in the second example the male country population is one order of magnitude smaller than in the first.
Therefore, in the second example the deaths (or hospitalizations) weigh much more heavily on the male population than in the first example, and this is appropriately reflected by our adjustment, which yields a higher value for the second example. The same logic, which is a general type of adjustment, also applies to the ICU/hospitalization ratio. In plots B, D and F, the time-course summations of the curves in plots A, C and E, respectively, are presented as bars sorted by value. Figures 4 and 5 show the Italian data formatted in the same way as the Spanish data, but only for confirmed cases, deaths and the naïve case fatality rate, since only these data were available. The formulas used for these plots are the same as those above for the Spanish data. For all plots, a Pearson chi-square test for the inequality of two proportions was applied, and the results were corrected for multiple hypothesis testing with the Benjamini-Hochberg procedure to determine the significance of the difference between male and female values.
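The adjustment, the bar-plot totals and the significance testing described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code, and all function names are hypothetical. It assumes that the chi-square test is applied to the 2x2 table of affected versus unaffected individuals in the male and female population of one stratum (the text does not specify the table), uses the uncorrected Pearson statistic with one degree of freedom, implements the Benjamini-Hochberg procedure by hand, and sorts the bars in descending order (the text only says "sorted by value").

```python
import math

def adjusted_ratio(events, population, denominator=1):
    """Population-adjusted ratio events / (population * denominator).

    denominator=1 gives the plain population-adjusted count
    (e.g. confirmed cases / population); passing confirmed cases or
    hospitalizations gives the adjusted fatality, hospitalization or
    ICU ratio, as in equation (7)."""
    return events / (population * denominator)

def gender_difference(male_events, male_pop, female_events, female_pop,
                      male_den=1, female_den=1):
    """Male minus female adjusted ratio for one age stratum on one day."""
    return (adjusted_ratio(male_events, male_pop, male_den)
            - adjusted_ratio(female_events, female_pop, female_den))

def ranked_totals(curves):
    """Sum each stratum's daily difference curve (bar plots B, D, F) and
    sort the strata by that total, largest first (assumed order)."""
    totals = {stratum: sum(values) for stratum, values in curves.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

def chi2_two_proportions(x1, n1, x2, n2):
    """Pearson chi-square p-value (1 df, no continuity correction) for
    the 2x2 table [[x1, n1 - x1], [x2, n2 - x2]]."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    rows = [n1, n2]
    cols = [x1 + x2, (n1 - x1) + (n2 - x2)]
    if 0 in cols:
        # No events (or no non-events) at all: no evidence of a difference.
        return 1.0
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Worked example from the text: 10 deaths on 100 male confirmed cases.
print(adjusted_ratio(10, 1000, 100))  # 0.0001
print(adjusted_ratio(10, 100, 100))   # 0.001
```

In practice one p-value would be computed per age stratum (and per plot) and the whole list passed to `benjamini_hochberg` at once; libraries such as statsmodels offer equivalent, tested routines.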
In each time-related plot, we highlight the effect of changes in social restrictions due to government intervention with grey vertical lines drawn at the first day of each intervention. In Spain, some restrictions were lifted and non-essential workers could return to work on the 13th of April [16]. In Italy, non-essential workers returned on the 3rd of May, and from the 18th of May citizens were able to leave the house freely again [17,18]. Finally, we report an anomaly and possible aberration in the Spanish data. In the day range 22 to 26, that is from the 14th to the 18th of April, a disproportionate spike in all age strata suggests that the reported numbers of female ICU cases are defective. These values were replaced using cubic spline interpolation so that they would not bias the results. Cubic spline interpolation was chosen because it is effective at reconstructing missing values in functional data. The interpolated data were used in Figures 2C, 2D, 3E and 3F. Plots of the original data can be found in the supplementary figures.
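The spike repair described above can be sketched in Python as follows. This is a minimal, standard-library illustration, not the authors' code, and all function names are hypothetical. It assumes that day 1 of the Spanish series is 24 March 2020 (consistent with days 22 to 26 falling on 14 to 18 April), that the spline is fitted to one cumulative female ICU curve per age stratum with the defective days left out, and that natural boundary conditions are used. The text does not state the boundary conditions; SciPy's `CubicSpline`, for instance, defaults to not-a-knot.

```python
import bisect
from datetime import date, timedelta

# Assumption: day 1 of the Spanish series is 24 March 2020.
SERIES_START = date(2020, 3, 24)

def day_to_date(day):
    """Convert a 1-based day index of the Spanish series to a calendar date."""
    return SERIES_START + timedelta(days=day - 1)

def natural_cubic_spline(xs, ys):
    """Fit a natural cubic spline (zero second derivative at both ends)
    through (xs, ys); xs must be strictly increasing. Returns an evaluator."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3 / h[i] * (ys[i + 1] - ys[i])
                    - 3 / h[i - 1] * (ys[i] - ys[i - 1]))
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the per-interval polynomial coefficients.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(t):
        # Pick the interval [xs[j], xs[j + 1]] containing t (clamped at the ends).
        j = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 1)
        dx = t - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate

def replace_defective_days(days, values, bad_days):
    """Fit the spline without the defective days, then replace their values."""
    bad = set(bad_days)
    xs = [day for day in days if day not in bad]
    ys = [v for day, v in zip(days, values) if day not in bad]
    spline = natural_cubic_spline(xs, ys)
    return [spline(day) if day in bad else v for day, v in zip(days, values)]
```

For example, `replace_defective_days(list(range(1, 57)), female_icu_80_89, range(22, 27))` would return the curve with days 22 to 26 replaced, where `female_icu_80_89` stands for a hypothetical list of the 56 daily cumulative values from 24 March to 18 May.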

Results
The study is composed of two parts. In the first part we analyse the Spanish cumulative data (Fig. 1-3), in the second part the Italian cumulative data (Fig. 4-5). Fig. 1A,B report the time-course analysis of confirmed cases in the Spanish COVID-19 outbreak. In Spain, in the age range 20-49 there are significantly more female than male confirmed cases. Age strata 0-9 and 10-19 show no significant difference between the numbers of male and female confirmed cases. In the age strata 50-59 (more females) and 90+ (more males) the gender difference in confirmed cases is visible but not significant. Age groups 60-89 have significantly more male cases than female cases.
The time-course curves in Fig. 1A show a stable monotonic trend for all age groups except 90+, whose curve has more male than female cases in the first days but changes drastically in mid-April, after which there are significantly more female than male confirmed cases. Fig. 1C,D report the time-course analysis of deaths in the Spanish COVID-19 outbreak. The older the age group, the greater the difference between the numbers of male and female deaths. The gender difference becomes visible and significant from age 60 onwards (Fig. 1D). In general, the gender difference grows substantially over time in the age groups 70+ compared with the groups under 70 (Fig. 1C). Fig. 2A,B report the time-course analysis of hospitalizations in the Spanish COVID-19 outbreak. Males significantly outnumber females in the age groups 40+ (Fig. 2B), where the older the age group, the greater the growth of the difference over time (Fig. 2A) and, consequently, the larger the respective area under the curve (Fig. 2B). Fig. 2C,D report the time-course analysis of ICU cases in the Spanish COVID-19 outbreak. Generally, male ICU cases are significantly more numerous in the age groups 40-79, which also show the most pronounced growth of the gender difference over time (Fig. 2C). The 80+ groups show a visible but not significant male prevalence. Age groups 0-39 do not show a significant gender difference. Fig. 3A,B report the time-course analysis of the naïve case fatality rate, and Fig. 3C,D that of the ratio of hospitalizations to confirmed cases in the Spanish COVID-19 outbreak. For the age groups 0-79, the gender differences in the naïve case fatality rate and in the hospitalization/confirmed cases ratio are slightly positive. The age group 80-89 has a greater difference, and 90+ the greatest. The differences in these ratios appear roughly constant over the time course. Fig. 3E,F report the time-course analysis of the ratio of ICU cases to hospitalizations in the Spanish COVID-19 outbreak.
The time-course analysis shows that, in the first 10 days of registered data, the 90+ curve has a marked drop in gender difference compared with all other curves, and then stabilizes (Fig. 3E). In general, a larger proportion of hospitalized males than of hospitalized females is admitted to ICU, regardless of age stratum, but the difference is significant only for 60+. Notably, the 0-9 age stratum shows a visible difference and is the first non-significant stratum in the ranking (Fig. 3F). Fig. 4A,B report the time-course analysis of confirmed cases in the Italian COVID-19 outbreak. There are significantly more female than male cases in the age stratum 40-49, and the opposite significant trend (more males than females) is confirmed for the range 60-89. Age strata 0-9 and 10-19 show no significant difference between the numbers of male and female confirmed cases. In the age strata 20-39 and 50-59 (more females) the gender difference in confirmed cases is visible but not significant. All these trends are in remarkable agreement with the Spanish data. Interestingly, the 90+ time-course curves in Italy (Fig. 4A) and in Spain (Fig. 1A) follow a very similar trend: in the first recorded days male confirmed cases outnumber female cases, but in mid-April this changes drastically and there are significantly more female than male confirmed cases. In both Italy and Spain, the 80-89 curve has a similar, although less pronounced, time trend (see Fig. 1A and 4A). Fig. 4C,D report the time-course analysis of deaths in the Italian COVID-19 outbreak. In the age strata 60+ there are significantly more male than female deaths. In all age strata the difference between male and female deaths grows over time, and the older the age group, the steeper the rise.
In the age strata 60+, the naïve case fatality rate in Italy is significantly higher for males than for females, as it is in Spain. The gender difference in the naïve case fatality rate for 90+ is drastically higher than in all other age strata. Also, only in the 90+ age stratum does the difference in the naïve case fatality rate rise over time.

Discussion
Overall, these results confirm that males have significantly greater numbers of deaths, hospitalizations and ICU cases, whereas the gender balance of confirmed cases depends more on age.
In Italy as well as in Spain there are more female confirmed cases in the age range 20 to 59, which could possibly be explained by the fact that professions likely to put workers in contact with COVID-19-positive people are more often carried out by females [19]. In the age group 60-89 the disease seems to have a greater impact on males than on females [6,7,20]. Furthermore, there are more male than female confirmed cases in this age group, and some studies suggest that females are more likely to have mild or no symptoms [6,7,20] and therefore might not realise they are infected. There are generally more male than female deaths, hospitalizations and ICU cases, which is consistent with studies showing that males are more affected by COVID-19 than females [6,7,20]. The difference between male and female deaths, hospitalizations and ICU cases rises over time, probably in connection with the general rise in cases. For deaths and hospitalizations, the older the age stratum, the greater the rise, indicating that older males are more affected than younger males and females. In both Italy and Spain the difference in the naïve case fatality rate is significant for the age strata 60+, and the 90+ stratum has a much higher difference than the other strata, which again shows that old males are the most affected by the virus. For the age strata under 60 years there is no significant difference between the male and female naïve case fatality rates, suggesting that in these age groups males are not more likely than females to die from COVID-19. As with the naïve case fatality rate, the difference between male and female hospitalizations adjusted by confirmed cases is significant in the older age strata. The difference is significant even for the younger strata above 20 years. The 90+ age stratum shows the highest difference.
In summary, this study offers strong evidence that infected males (especially older males) tend to have a more severe and more often fatal course of the disease. The time-course analysis shows that the gender difference in deaths, hospitalizations and ICU cases increases over time. For confirmed cases, whether more males or more females are affected depends on age, and the monotonic trends do not change drastically over time, except for the 90+ curve. Indeed, both in Spain and Italy, the 90+ curve strikingly shows a trend inversion (from male to female prevalence) around mid-April: a result that, to the best of our knowledge, has not been reported in previous literature and for which we do not yet have an explanation. One speculation is that, since females tend to have less severe symptoms, at the beginning of the epidemic testing of 90+ females was possibly deprioritized in favour of 90+ males. Possible factors that created this bias when the healthcare systems in Italy and Spain were overwhelmed might be: (i) testing was prioritized for more severe cases, hence for males; (ii) an insufficient number of hospital beds led to reduced admission of the 90+ age stratum in general and to a prioritization of tests for 90+ males. Altogether, our findings help offer two key evidence-driven recommendations to policymakers. First, since data collection is not harmonized across Europe, the European Union might consider establishing an institute for standards in biomedical data collection, which could play a crucial role in the rapid, open-source dissemination and analysis of data, especially during epidemics. This can be fundamental for rapid and coordinated decision making in emergency periods.
Second, since COVID-19 is particularly severe in the 60+ male category, social interventions (such as lockdown) and technological interventions (such as contact tracing) to counter the impact of COVID-19 on the population might be more age/gender-adaptive [1,5] and, to increase effectiveness and efficiency, concentrate on these at-risk categories.