Gender Disparity in Risk Factors of COVID-19 Mortality Rates

Incidence and mortality rates due to COVID-19 have varied widely in different parts of the world and placed a huge strain on hospital resources. Understanding the underlying reasons behind such variation is crucial to developing population-specific or even individual-specific management strategies. This paper presents a comprehensive analysis of incidence and mortality rates from data collected over a cumulative period of approximately 6.5 months from February to August 2020 across 411 districts of India, totalling over 2 million individuals. We identify the health factors that are positive or negative correlates of high mortality rates, using district-wise aggregated COVID-19 incidence and mortality data together with health data obtained from the National Family Health Survey (NFHS). To obtain robust indicators, we apply both machine learning techniques and classical statistical methods and show that the same factors are identified by both. We also identify positive and negative correlates at multiple population scales by dividing the cohort into sub-cohorts formed from two Indian states, which were further segregated by gender. We show that there is a disparity of risk factors between males and females. While obesity is the highest risk factor for men, anaemia is the highest risk factor for women. Hence, to better manage the health of a specific group of people, it is important to consider gender-wise heterogeneity in health risk factors, which could contribute to differing vulnerabilities.

In this observational study, the impact of the health parameters obtained from the NFHS data on the mortality rate of COVID-19 in the reproductive age group of 15-49 is studied at different population levels. The analysis of positive and negative correlates is performed on data spanning 411 districts across 23 states and 3 union territories. We also repeat the analysis on a sub-cohort of two Indian states to analyze possible heterogeneity of risk factors in different states. We perform aggregated as well as gender-wise analysis on these two states and show that there are disparities in the positive and negative correlates of COVID-19 among men and women. We also discuss how these compare with gender-related risk and protective factors in other diseases.

To our knowledge, this is the first study that considers both positive and negative correlates of mortality rates, which could correspond to the risk and protective factors. This is also the first study that considers a population at multiple levels of aggregation. Moreover, this is the first time such an analysis has been carried out on a large aggregate, totalling about 2,350,000 individuals.

The data processing pipeline for this study consists of collection, cleaning, and analysis. Data extraction from multiple sources constituted a significant portion of the effort. The Scikit-learn [15] and SciPy [16] modules in Python were used for the analysis.

Data related to incidence and mortality of COVID-19 was collected from multiple sources since no single published source of such data is available. The data collected for the period of January 30 to August 18 include those put out by government agencies, crowd-sourced data and daily media bulletins. Gender-wise data related to COVID-19 incidence and mortality is not published by the Ministry of Health and Family Welfare, Government of India. However, some state governments issue official daily bulletins through formal releases to the media. Data from these media bulletins were extracted for the period of April 15 to August 18, 2020 for the state of Karnataka and for the period of May 1 to August 18, 2020 for the state of Tamil Nadu. Samples of such bulletins can be found in [17,18]. The sources of data for this study include:
• The Open Government Data Platform India website [19] contains data officially published by the Government of India. Serial follow-up of people who tested positive for COVID-19 was done, and details of age, sex and status (hospitalized, discharged, tested negative) were captured for a few selected cities [20,21,22,23,24,25]. This site was last updated on July 30, 2020.
[...] from various issues during pregnancy, nutrition, population, literacy and more.

Details of total confirmed cases (which included the number of active, recovered, and deceased cases) and the number of deaths were available for 800 districts. Of these, 535 districts overlapped with the 640 districts for which NFHS data was available.
We considered the subset of these districts which had at least 5 deaths, resulting in a total of 411 districts, spanning 23 states and 3 union territories of India, on which the final analysis was done. These 411 districts represent a geographical extent of more than 3 million square kilometres and a cumulative total of 2,331,363 cases and 46,239 deaths.

In the sub-cohort of the states of Karnataka and Tamil Nadu, gender-wise numbers of COVID-19 positive cases were available only until the 20th of July and the 31st of May, 2020, respectively. Thereafter, only the total number of COVID-19 positive cases was published, and gender-wise segregation was not available. We compute the gender-wise fraction in the age group 15-49 from this initial data and estimate the gender-wise numbers for each district for later periods using this fraction and the total number of positive cases published in the bulletins as of August 18, 2020.

Among the 93 key indicators included in the NFHS data, we selected the ones corresponding to adult health indicators and further limited them to factors discussed in case reports and medical opinions in the literature. We also took into account chronic conditions that are unlikely to have changed in the period between the collection of the health data and the COVID-19 pandemic. The broad categories of factors considered were those related to low BMI, obesity, anaemia, blood pressure and diabetes. The set of health factors chosen for analysis is enumerated in Table 1.

The health factors extracted from the NFHS data are published gender-wise. To obtain the gender-desegregated (combined) population value, the weighted average of each factor was computed according to the sex ratio of each district. Through this process, aggregated health factors corresponding to the gender-desegregated population of each district were obtained.
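The district filter, the sex-ratio weighting and the gender-wise case estimate described above can be sketched as follows. This is a minimal illustration and not the authors' code: the `District` fields, the helper names and the sample numbers are hypothetical, and the sketch assumes that "sex ratio" means females per 1000 males and that the early male fraction is applied unchanged to later totals.

```python
from dataclasses import dataclass

MIN_DEATHS = 5  # inclusion threshold stated in the text


@dataclass
class District:
    # All field names are hypothetical, chosen for this sketch.
    name: str
    deaths: int
    sex_ratio: float      # females per 1000 males (assumed definition)
    factor_men: float     # NFHS health factor for men, in percent
    factor_women: float   # NFHS health factor for women, in percent


def aggregated_factor(d: District) -> float:
    """Sex-ratio-weighted average of a gender-wise health factor."""
    weight_men, weight_women = 1000.0, d.sex_ratio
    total = weight_men + weight_women
    return (d.factor_men * weight_men + d.factor_women * weight_women) / total


def estimate_gender_cases(early_male: int, early_female: int, total_cases: int):
    """Split a later total using the male fraction seen in the early data."""
    male_fraction = early_male / (early_male + early_female)
    est_male = total_cases * male_fraction
    return est_male, total_cases - est_male


# Made-up districts: "B" is dropped because it reports fewer than 5 deaths.
districts = [
    District("A", deaths=12, sex_ratio=1000.0, factor_men=10.0, factor_women=20.0),
    District("B", deaths=3, sex_ratio=950.0, factor_men=8.0, factor_women=16.0),
]
kept = [d for d in districts if d.deaths >= MIN_DEATHS]
for d in kept:
    print(d.name, aggregated_factor(d))

# Early bulletins: 600 male and 400 female cases; later total: 5000 cases.
print(estimate_gender_cases(600, 400, 5000))
```

Weighting by the sex ratio keeps the combined factor closer to the value of the more numerous gender in each district, which a plain mean of the two published values would not do.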

A Lasso regression [29] of the health factors on mortality rates calculated from the COVID-19 India data was conducted on districts that reported at least 5 deaths. The health data was standardized before the regression. For each Lasso test, the λ (regularisation parameter) with the best R² value was taken through a search of the results from the lasso_path function. Residual plots corresponding to this value of λ were inspected visually to ensure there was no bias.

Independently, factors which differed significantly between the districts with high mortality rates and those with low mortality rates were identified via the Mann-Whitney U test with a significance level of 0.05, corrected by the Bonferroni criterion, for each health factor. The districts were classified into two categories of low and high mortality, depending on whether they fell below or above the second quartile (the median) in mortality rates. The effect size was also calculated for all the factors between the two sub-groups using Cohen's d and interpreted according to the thresholds defined in [30], i.e. |d|≤0.2 is a 'negligible' effect size, 0.2<|d|≤0.5 is 'small', 0.5<|d|≤0.8 is 'medium' and otherwise 'large'.

The factors obtained from the Mann-Whitney U test and the Lasso test were compared, and the common factors were identified as the risk and protective factors of the population. The same procedure was also followed in the sub-cohort of the two states of Karnataka and Tamil Nadu.

Further, each state was analyzed independently and positive and negative correlative factors were identified. For the gender-wise correlates using the Mann-Whitney U test, the data of each gender across the two states were combined to yield significant numbers.
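The two-test procedure above can be sketched on synthetic data as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the data are randomly generated, the variable names are hypothetical, Cohen's d is computed with a pooled standard deviation, and the Bonferroni correction is assumed to divide 0.05 by the number of health factors tested.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for district data (not the study's data).
rng = np.random.default_rng(0)
n_districts, n_factors = 200, 5
X = rng.normal(size=(n_districts, n_factors))   # health factors
# Factor 0 raises mortality, factor 1 lowers it, the rest are noise.
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_districts)

# --- Lasso path: standardize, then pick the lambda with the best R^2 ---
Xs = StandardScaler().fit_transform(X)
yc = y - y.mean()                        # lasso_path fits no intercept
alphas, coefs, _ = lasso_path(Xs, yc)    # coefs: (n_factors, n_alphas)
pred = Xs @ coefs                        # one prediction column per lambda
r2 = 1 - ((yc[:, None] - pred) ** 2).sum(axis=0) / (yc ** 2).sum()
best = int(np.argmax(r2))                # in-sample R^2 favours small lambda
best_coefs = coefs[:, best]


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled


def effect_label(d):
    """Thresholds quoted in the text for interpreting |d|."""
    d = abs(d)
    if d <= 0.2:
        return "negligible"
    if d <= 0.5:
        return "small"
    if d <= 0.8:
        return "medium"
    return "large"


# --- Mann-Whitney U between high- and low-mortality districts ---
high = y > np.median(y)                  # split at the second quartile
alpha_corrected = 0.05 / n_factors       # Bonferroni (assumed divisor)
results = []
for j in range(n_factors):
    a, b = X[high, j], X[~high, j]
    _, p = mannwhitneyu(a, b, alternative="two-sided")
    d = cohens_d(a, b)
    results.append({"p": p, "d": d, "effect": effect_label(d),
                    "significant": p < alpha_corrected})

# Factors flagged by both tests; the sign of d separates positive
# correlates (higher in high-mortality districts) from negative ones.
selected = {j for j in range(n_factors)
            if best_coefs[j] != 0 and results[j]["significant"]}
for j in sorted(selected):
    print(j, round(best_coefs[j], 3), results[j]["effect"])
```

Requiring agreement between the regression and the rank test is what makes the final list conservative: a factor with a small nonzero Lasso coefficient is dropped unless its distribution also differs between the two mortality groups.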

A preliminary two-tailed test performed on the Open Government Data, the results of which are presented in Table 2, shows that there is a significant difference between male and female mortality rates in some regions, while other regions do not show a significant difference. Further, though female mortality rates are higher than male mortality rates in many regions, it is not possible to infer that this is uniformly true for all regions.

[...] correlation with mortality, with anaemia having the highest negative correlation at higher values of λ.

The results of the Mann-Whitney U test on the same data between the two categories of districts of high and low values of mortality rates are shown in Table 3. Anaemia and obesity emerged as statistically significant with small effect sizes.

By considering the factors that were identified by both the tests, we conclude that obesity (DM=-3. [...]

[...] we estimated the number of cases for 240,912 cases from Karnataka and 298,046 cases from Tamil Nadu. Only those districts that had reported at least 5 deaths for each gender in the age group 15-49 in the death data were considered for this analysis. This resulted in a total of 667 deaths from Karnataka and 556 deaths from Tamil Nadu. The distribution of the aggregated health factors in these districts can be seen in Figure 3.

From the Lasso plot in Figure 4, BMI below normal, high blood sugar level and very high blood pressure are the positive correlates, while obesity and anaemia are the negative correlates.

The results of the Mann-Whitney U test on the same data between the two categories of districts of high and low values of mortality rates are shown in Table 4. Obesity, BMI below normal and anaemia were found to be statistically significant with medium effect sizes.
By considering the factors that were identified by both the tests, it is seen that BMI below normal [...]

[...] Table 5 for females and Table 6 for males, none of the factors showed statistical significance. However, the factors of BMI below normal, obesity and anaemia for women and anaemia for men had non-negligible effect sizes.

By considering the factors that were identified by both the tests, it is seen that anaemia (DM=4.

The factors identified by each test and the common factors at each level of population aggregation are shown in Figure 8. [...] [13]. Pre-menopausal obesity has been shown to be associated with a lower risk of breast cancer [31,32], thus suggesting that fat distribution could play a significant role in health conditions [33] and that pre-menopausal obesity could serve as a protective factor.

We conclude through this study that anaemia is positively correlated with COVID-19-related mortality in women, but negatively correlated with mortality in men. The difference in the role played by anaemia in women vs. men can be explained by the fact that anaemia, when present in men, is mild or moderate, whereas the prevalence of severe anaemia is higher in women [34]. It is also evident from the gender-wise distribution in Figure 5 that anaemia is twice as prevalent in females as in males. Recent estimates of iron-deficiency anaemia (IDA) show that 52% of women aged 15-49 are anaemic [35]. This difference in prevalence could be significantly higher during menstruation and pregnancies. Severe anaemia has been associated with higher maternal mortality [36]. Severe anaemia has also been associated with higher rates of ICU admission in COVID-19 [37,38,39]. This first study of the effect of anaemia in COVID-19 suggests that haemodilution could play a role in COVID-19 mortality.

The findings indicate that risk factors for COVID-19 mortality are by themselves heterogeneous, and their effects need to be investigated in conjunction with gender, menopausal status, and severity of the condition to understand them better.

The authors declare no competing interests.

Financial disclosures
The authors received no financial support for the research, authorship, and/or publication of this article.

Factor (%)    Characteristic
Men whose Body Mass Index (BMI) is below normal    BMI < 18.5 kg/m²
Women whose Body Mass Index (BMI) is below normal    BMI < 18.