Why do we die? Uncovering the genetic connections causing COVID-19 disease severity

SARS-CoV-2 appears to strike unevenly, causing more deaths in certain populations, and the disease seems to kill males preferentially. This study investigates why by examining single nucleotide polymorphisms (SNPs) correlated with increased expression of proteins associated with viral infection of cells, which leads to disease proliferation. ACE2, the assumed host cell receptor for the virus, is believed to be assisted by the co-receptors ENPEP and ANPEP. TMPRSS2 cleaves the viral S protein into two subunits, allowing viral binding to the ACE2 receptor. ACE2, found on the X chromosome, has SNPs that increase ACE2 expression; these SNPs were found at frequencies greater than 50% in all male populations analyzed, which could account for the increase in male deaths. Females undergo X inactivation, so those carrying the SNPs would be protected from increased ACE2 expression in all their cells. ACE2, ENPEP, and TMPRSS2 were also found to have population-specific SNP patterns, which could account for the increased prevalence of disease among certain populations.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes COVID-19, a disease that has spread into a worldwide pandemic. As of April 19th, 2020, the Johns Hopkins Coronavirus Resource Center reports 2,394,291 cases of SARS-CoV-2 [1]. The World Health Organization (WHO) weekly surveillance report for Europe showed males accounting for 50% of cases and 60% of deaths [2]. The United States has 755,533 cases, by far the most of any nation. The United States reports 38,664 deaths, a 5.3% case-fatality rate [1].
Coronaviruses utilize the S protein to bind to host cells, facilitating entry and infection [3][4][5][6]. The S protein is cleaved into two subunits, S1 and S2, exposing the receptor binding domain (RBD), which is the mechanism for viral cell entry [4][5][6]. The SARS-CoV-2 RBD on S1 utilizes the angiotensin converting enzyme-2 (ACE2) receptor for attachment and entry to host cells. This process is mediated by the S2 subunit. Owing to modifications in the RBD, SARS-CoV-2 has 10x more affinity for ACE2 than SARS-CoV-1 [7].
Reports of lower levels of ACE2 in AT2 cells suggest SARS-CoV-2 utilizes co-receptors to facilitate entry [5,8]. Reports suggest glutamyl aminopeptidase (ENPEP), alanine aminopeptidase (ANPEP), and DPP4 may serve as co-receptors [8]. ENPEP is involved with the renin-angiotensin system (RAS), is associated with hypertension, and has higher expression in the kidneys [12][13][14][15]. Cardiac and kidney complications have been reported in SARS-CoV-2 patients. ENPEP receptors are suggested to play an important role in the RAS in the brain [12,16,17]. Neurological and central nervous system complications in patients have been reported [18]. ANPEP may serve as a binding co-receptor for coronaviruses and is expressed in lung, kidney, and intestine [8,19]. Acute kidney and cardiac issues have been reported in SARS-CoV-2 patients [20]. Studies suggest transmembrane protease serine 2 (TMPRSS2) cleaves the S protein prior to binding to ACE2, and both proteins were shown to be co-expressed in lungs [21]. TMPRSS2 has been suggested to have higher expression than ACE2 [9]. TMPRSS2 and ACE2 were shown to be co-expressed in the esophagus, colon, ileum, and nasal goblet and ciliated cells [9]. Co-receptor binding of SARS-CoV-2 may explain the seemingly random sudden onset of cardiac, kidney, and neurological symptoms in non-comorbid patients.
The ACE2 receptor gene is found on the X chromosome [22]. X chromosome inactivation (XCI) is the process in female cells of randomly inactivating one X chromosome to achieve proper dosage of X-linked genes [23,24]. The choice between inactivating the maternally or paternally inherited X chromosome is random, and the resulting inactivation pattern is passed on to daughter cells [23]. Heterozygous females carrying one wild type and one mutant allele will always have some cellular populations expressing the wild type allele and some cellular populations expressing the mutant allele [23,24]. These populations may be mixed within tissues. Reports of XCI involving ACE2 suggest variable heterozygous sex bias [22].
Single nucleotide polymorphisms (SNPs) and structural variants (SVs) are common forms of genetic variation. SNPs found near a gene regulatory region may affect gene expression. SVs and SNPs have been associated with genetic disorders such as autism, schizophrenia, and male infertility disorders [25][26][27][28]. Differential expression linked to SNPs across divergent world populations may reflect the prevalence of ACE2 receptors (the SARS-CoV-2 mechanism of entry) within tissues. Males with ACE2 polymorphisms have risk factors for essential hypertension. Increased ACE2 expression and SVs of ACE2 were found to increase or decrease binding affinity to the SARS-CoV-2 S1 protein [29][30][31]. Changes in allelic expression may contribute to the higher SARS-CoV-2 infection and mortality rate in males.
COVID-19 is a global pandemic that has caused millions to fall ill and thousands to die. People around the globe want to know why this disease seems to affect some populations and not others, and why males seem to be dying at a higher frequency than females. This study aims to address these two questions and to elucidate possible sex and population risk profiles using SNPs found to increase expression of ACE2, TMPRSS2, and ENPEP across tissues. Sex and population differences in the frequency of these SNPs may lead to increased expression of these important proteins, leading in turn to heightened infection and disease severity.

Death and infection frequencies
Males are dying at a higher frequency than females on a global scale. It is not expected that males would succumb to COVID-19 at a higher frequency than females; however, when compared against an expected frequency of 50% for both deaths and infections, New York showed a significant p-value and an observed male proportion greater than 0.50 for both male infection and male death by COVID-19 (Table 1a). Three states were found to have a significant p-value and an observed male proportion greater than 0.50 for male infections by COVID-19: New Jersey, Utah, and South Dakota (Table 1b). Twelve states were found to have a significant p-value and an observed male proportion greater than 0.50 for male deaths by COVID-19: Arizona, Colorado, Florida, Illinois, Indiana, Michigan, North Carolina, Ohio, Pennsylvania, Virginia, Washington, and Wisconsin (Table 1c).

[...] (Table 2a). Six countries had a significant p-value and an observed male proportion greater than 0.50 for male infections by COVID-19; however, these countries did not report death data aggregated by sex. These countries are Japan, Singapore, Panama, Algeria, Bangladesh, and Chile (Table 2b). Twenty countries had a significant p-value and an observed male proportion greater than 0.50 for male deaths by COVID-19 (Table 2c). Of the 43 countries reporting death frequencies for males, 32 had a frequency higher than 0.50. In Greece, the Dominican Republic, the Netherlands, Thailand, and Romania, the frequency of male deaths relative to male infections was more than twice the frequency of female deaths relative to female infections. Among the countries that reported both male infections and male deaths, all but India and Pakistan had a male death frequency (relative to male infections) at least 1.5 times higher than the corresponding female death frequency (Supplemental Table 1). [...] (Figure 3).
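The comparison of male and female death frequencies (deaths relative to infections within each sex) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names and the counts are hypothetical and are not taken from the study's tables, and the study itself reports using R.

```python
def death_frequency(deaths, infections):
    """Deaths divided by infections (a crude death frequency)."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return deaths / infections


def male_to_female_ratio(male_deaths, male_infections,
                         female_deaths, female_infections):
    """Ratio of the male death frequency to the female death frequency."""
    male = death_frequency(male_deaths, male_infections)
    female = death_frequency(female_deaths, female_infections)
    if female == 0:
        raise ValueError("female death frequency is zero; ratio undefined")
    return male / female


# Hypothetical counts for one country (assumption, not study data).
ratio = male_to_female_ratio(120, 2000, 60, 2000)
print(f"male/female death frequency ratio: {ratio:.2f}")
```

A ratio above 1.5 or 2 in such a calculation corresponds to the thresholds discussed in the text.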
Benjamini & Hochberg FDR analysis found 70 significant pairs across populations (q-value < 0.05). A SNP that increases expression of TMPRSS2, which is believed to cleave the viral S protein into S1 and S2 prior to virus entry into cells, is found at similar levels across the global population (shown in peach, Figure 2, Supplemental Table 4). For all populations the alternate allele was found at a frequency of at least 50%, with the highest frequency, 81.3%, found in Italy. Other populations with greater than 80% frequency for the alternate allele include Peruvians (80.2%), Esan in Nigeria (81.3%), and Southern Han Chinese (81%) (Figure 2, Supplemental Table 4). Chi square analysis across global populations did not find a significant difference in this SNP between males and females for TMPRSS2.
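For readers unfamiliar with the Benjamini & Hochberg adjustment mentioned above, the following Python sketch shows how q-values can be derived from a list of p-values. It is a minimal illustration, not the authors' code (the study reports using R); the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original order of p_values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over all ranks at or above its own rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four pairwise comparisons (not study data).
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_q]
print(example_q, significant)
```

Comparisons whose q-value falls below 0.05 would be counted as significant pairs under the threshold quoted in the text.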
SNPs near the ANPEP transcription start site (TSS) identified as eQTLs in GTEx are associated with decreased ANPEP expression. ANPEP is believed to be a co-receptor that may assist in SARS-CoV-2 entry into cells. Statistical analysis did not indicate significant differences in prevalence across the global populations for these SNPs. This likely indicates that, if ANPEP is acting as a co-receptor, no population has increased protection over the others, as no population has an increased prevalence of SNPs that would lower ANPEP expression. Chi square analysis did not find significant differences for these SNPs between males and females across populations. As ANPEP is not significant across populations, it is not included in Figure 2.
Notably, the East Asian populations analyzed (CDX, CHB, CHS, JPT, and KHV) do not carry the ENPEP SNPs (Figure 2, Supplemental Table 4). Chi square analysis did not show a significant difference by sex across populations.
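The chi square comparisons between males and females can be sketched for a single SNP in a single population as a 2x2 test on allele counts. The Python below is an illustration only: how the authors built their tables is not stated, so the table layout (rows for sex, columns for alternate and ancestral allele counts), the function name, and the counts are all assumptions.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Assumed layout: a, b = male alternate/ancestral allele counts;
    c, d = female alternate/ancestral allele counts.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts (not study data).
chi2, p_value = chi_square_2x2(40, 60, 45, 55)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05 in such a test corresponds to the "no significant difference by sex" outcome reported in the text.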

Discussion
ACE2 is found in a region of the X chromosome that is variable for genes escaping X inactivation. ACE2, once believed to be an escape gene, displays heterogeneous sex bias across tissues, with a propensity for increased expression in males in several tissues including lung, esophagus, and stomach, some of the very tissues involved in COVID-19 infection [22]. Increased expression of ACE2 may exacerbate this sex bias in tissues and may increase the risk COVID-19 poses to males. In fact, a study investigating ACE2 expression patterns among 8 lung transplant donors found no association between ACE2 expression and smoking but did find higher ACE2 expression in male donors [10]. SNPs near the TSS of the ACE2 receptor are associated with increased expression of this protein [32][33]. Males with these SNPs may have increased expression of the ACE2 receptor in all their cells expressing ACE2, as they have only one X chromosome (Figure 5). Females with similar expression patterns to these males would be those homozygous for the SNPs, meaning both of their X chromosomes carry the SNP alleles. However, X inactivation in these females would reduce their amount of ACE2 receptor to the same level as that of a male whose X chromosome contains the SNPs [22] (Figure 5). Heterozygous females, who have the SNPs on one X chromosome and not on the other, would not have as high a level of ACE2 expression as males with the SNPs [22][23]. This is due to random X inactivation in a woman's cells: only a percentage of her cells would have increased ACE2 expression, offering a level of protection (Figure 5). Women who are homozygous for no SNPs would have less ACE2 expression overall, as they would still undergo X inactivation, leading to cells expressing one X chromosome. This would be comparable to males who do not have the SNPs. In all populations examined in this study, the frequencies for males with the SNPs were greater than 50%. Females who were homozygous for the SNPs were found at much lower frequencies.

Lung cells of males have been shown to have a higher number of ACE2 receptors than those of females, which leads to increased infection [7,10]. This increased number of ACE2 receptors may be directly related to increased expression of the ACE2 gene, as more than 50% of males across the globe carry SNPs associated with increased expression of the ACE2 gene. It has been shown that polymorphisms in the ACE2 gene correlate with increased risk for hypertension; therefore, it is not unreasonable to believe that ACE2 polymorphisms could also affect one's risk for COVID-19 infection [29]. As ACE2 expression may be increased and is seen in lungs, heart, kidney, ileum, brain, esophagus, colon, and nasal goblet and ciliated cells, these tissues could be at risk for attack by the virus. Additionally, ACE2 plays an important role in the homeostasis of the renin-angiotensin system (RAS), which is important for regulating heart, kidney, and lungs both physiologically and pathologically. When SARS-CoV-2 binds ACE2 there is systematic deprivation of ACE2 to the tissues, leading to an increase in vasoconstriction, inflammation, and fibrosis [7].
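The X inactivation argument can be made concrete with a toy calculation of the fraction of a person's ACE2-expressing cells in which the active X chromosome carries the expression-increasing SNP alleles. The Python sketch below is a simplification under explicit assumptions that go beyond the text: inactivation is complete (no escape), every cell keeps exactly one active X chromosome, and the default probability of 0.5 reflects unskewed random inactivation. The function and parameter names are hypothetical.

```python
def fraction_cells_with_active_snp_allele(sex, snp_copies, p_snp_x_inactivated=0.5):
    """Expected fraction of cells whose active X chromosome carries the SNP alleles.

    sex: "male" or "female".
    snp_copies: number of X chromosomes carrying the SNP alleles.
    p_snp_x_inactivated: for heterozygous females, the probability that the
        SNP-carrying X is the one inactivated (0.5 = unskewed, an assumption).
    """
    if sex == "male":
        if snp_copies not in (0, 1):
            raise ValueError("males carry one X chromosome: snp_copies must be 0 or 1")
        return float(snp_copies)
    if sex == "female":
        if snp_copies not in (0, 1, 2):
            raise ValueError("females carry two X chromosomes: snp_copies must be 0, 1, or 2")
        if snp_copies == 1:
            return 1.0 - p_snp_x_inactivated
        return snp_copies / 2
    raise ValueError("sex must be 'male' or 'female'")


for sex, copies in [("male", 0), ("male", 1), ("female", 0), ("female", 1), ("female", 2)]:
    print(sex, copies, fraction_cells_with_active_snp_allele(sex, copies))
```

Under these assumptions, males with the SNPs and homozygous females reach the same fraction, while heterozygous females fall in between, which mirrors the reasoning in the paragraphs above.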
To investigate what makes some populations or individuals of an ancestry more susceptible to COVID-19, one needs to consider the genetic players contributing to infection and severity of disease, of which ACE2 is only one. TMPRSS2 was found to be highly co-expressed with ACE2 in nasal goblet and ciliated cells which could serve as locations for initial infection or as reservoirs for infectivity between individuals [9].
This co-expression is found in esophagus, ileum, and colon as well, which may lead to other routes of infection [9,21]. Populations with larger frequencies of SNPs increasing ACE2 and TMPRSS2 expression could have increased susceptibility to infection as well as increased severity of disease across multiple tissues. ENPEP, like ACE2, plays an important role within the RAS and in hypertensive conditions, as ENPEP has higher expression levels in the kidneys. Cardiac and nephrotic complications have been reported in SARS-CoV-2 patients. In fact, COVID-19 associated nephritis may serve as a marker of disease severity and of capillary leak syndrome, a predictor of fluid overload, respiratory failure, and death [34]. Additionally, ENPEP receptors are suggested to play an important role in the RAS in the brain [12,16,17]. Patients, including younger patients, are exhibiting brain clots and stroke related to COVID-19 infection. The Mount Sinai Health System treated 5 COVID-19 patients, all under 50 years of age, for large vessel stroke over a two-week period. The same health system reported an average of only 0.73 such patients per two-week period over the previous 12 months [35].
In total, populations that have increased frequencies of SNPs may have increased expression of all three of these formidable proteins for COVID-19 initiation and propagation, contributing to this insidious disease. This study may serve as a building block for future genetic studies to further investigate the role these variants, as well as other variants, may play in the severity of this disease.
Countries that may be winning the battle against COVID-19 may be doing so because of the genetic makeup of their people. Recent challenges found while investigating treatments for COVID-19 may be due to ancestral and sex differences leading to differential expression of these important proteins, thus confounding treatment effect. The susceptibility of individuals to this disease may depend on each person's particular genetic makeup, making treatment efficacy challenging to assess without taking the potential for personalized medicine into consideration.

Statistical Testing for male infection rates and death rates
A binomial test was used to compare the probability of males versus females who either contracted COVID-19 or died from the disease. R (version 3.6.3) was used to perform the binomial test [36]. In this code, x equaled the number of infected males or the number of males who died from COVID-19, and n equaled the total number of infections or deaths from COVID-19, which included both females and males. The expected P equaled 0.50 because the sex ratio in the United States and globally is close to 50%. Therefore, it was expected that males would account for 50% of COVID-19 deaths or infections. Yates correction for continuity was not applied to this statistical test, which was indicated by correct = FALSE. Yates correction for continuity adjusts the chi square test for counts smaller than 5, which would apply only to Wyoming; however, applying the correction did not change the p value for Wyoming. The significance level for this study was 0.05. Any state or country that had a p value lower than 0.05 with an observed male proportion greater than 0.5 was considered statistically significant.
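The test described above can be sketched in Python as follows. The mention of correct = FALSE and of a chi square statistic suggests a one-sample proportion test (in R, prop.test accepts a correct argument, whereas binom.test does not), so this sketch implements that chi square form without continuity correction; treating the analysis this way is an assumption, not a statement of the authors' exact code. The function name and the example counts are hypothetical.

```python
import math


def one_sample_proportion_test(x, n, p0=0.5):
    """Chi-square test (1 df, no Yates correction) of x successes in n trials against p0.

    x: number of male infections or male deaths.
    n: total infections or deaths (males plus females).
    Returns (observed proportion, chi-square statistic, two-sided p-value).
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    expected_male = n * p0
    expected_female = n * (1 - p0)
    chi2 = ((x - expected_male) ** 2 / expected_male
            + ((n - x) - expected_female) ** 2 / expected_female)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return x / n, chi2, p_value


# Hypothetical counts: 600 male deaths out of 1000 total (not study data).
proportion, chi2_stat, p_val = one_sample_proportion_test(600, 1000)
flagged = p_val < 0.05 and proportion > 0.5
print(proportion, chi2_stat, p_val, flagged)
```

The final line applies the decision rule from the text: a result is flagged only when the p value is below 0.05 and the male proportion exceeds 0.5.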
The data for states were provided by each state's department of public health and are documented in Supplemental Methods Table 5. In this study, 47 states reported COVID-19 infection values aggregated by sex, but only 24 states reported COVID-19 deaths by sex. Three states (Hawaii, Nebraska, and Connecticut) did not report COVID-19 infections or deaths by sex and were excluded from this study. It should be noted that some states or countries provided unknown categories. These values were excluded from the study, as the focus was to compare males versus females. These changes are listed in Supplemental Table 5. For some states, the department of public health reported deaths and infections as percentages. To find the number of males who died from COVID-19, the percentage was multiplied by the total number of deaths from COVID-19 in that state. Likewise, to find the number of males infected with COVID-19, the percentage was multiplied by the total number of infections from COVID-19 in that state. Once the number of male deaths or infections was recorded, the binomial test was used to determine the p value and chi square statistic (Supplemental Table 5).
The same binomial test was used for countries to compare male deaths or male infections versus females. The data collection for countries was provided by Global Health 50/50 and the sources for each country were documented in the Supplemental Tables (Supplemental Table 6a, Supplemental Table 6b). In this study, a total of 51 countries reported either deaths or infections separated by sex. Just like some states, some countries documented the data in percentages. For male infections, the total number of infections was multiplied by the percentage of males infected in that country. For male deaths, the total number of deaths was multiplied by the percentage of males that died in that country. If the countries reported unknown values, these values were subtracted from the total and are documented in the Supplemental Tables. After gathering the data, the binomial test was conducted in the programming software R (version 3.6.3) to analyze COVID-19 deaths and infections by sex in each country [36].
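The conversion from reported percentages to counts can be sketched as below. This Python snippet is illustrative only: it assumes the reported male percentage applies to the total remaining after unknown-sex records are subtracted and that the result is rounded to a whole number, neither of which is spelled out in the text. The function name and the numbers are hypothetical.

```python
def male_count_from_percent(total_reported, percent_male, unknown=0):
    """Convert a reported male percentage into a count.

    total_reported: total infections or deaths reported.
    percent_male: percentage of males among records with known sex (assumption).
    unknown: records with unknown sex, subtracted from the total.
    Returns (male count rounded to an integer, total with known sex).
    """
    known_total = total_reported - unknown
    if known_total < 0:
        raise ValueError("unknown cannot exceed total_reported")
    if not 0 <= percent_male <= 100:
        raise ValueError("percent_male must be between 0 and 100")
    male_count = round(known_total * percent_male / 100)
    return male_count, known_total


# Hypothetical report: 1050 deaths, 50 of unknown sex, 58.3% male.
males, known = male_count_from_percent(1050, 58.3, unknown=50)
print(males, known)
```

The two returned values correspond to x and n in the binomial test described in the preceding section.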
Lastly, the frequency of COVID-19 deaths for each country was calculated by dividing the total COVID-19 deaths by the total COVID-19 infections for that country. The frequency was also calculated for males and females by taking male deaths of COVID-19 [...] [38,39,40]. Additionally, SNPs chosen were found in eQTL associations across multiple tissues. SNPs (rs4646120, rs1978124, rs5934250, rs6632680) were chosen for the ACE2 receptor from GTEx eQTL data [32,33]. Upregulation of the ACE2 receptor was found with the G allele, the C allele, the G allele, and the C allele, respectively (Supplemental Figure 1a). SNPs (rs113104244, rs78606958, rs1552456) were chosen for the ENPEP receptor from GTEx eQTL data with eQTL associations in the lung [32,33]. Upregulation of the ENPEP receptor was found with the A allele, the G allele, and the G allele, respectively (Supplemental Figure 1b). SNPs (rs12442778, rs11635469, rs73478036, rs28565347) were chosen for the ANPEP receptor from GTEx eQTL data [32,33]. Downregulation of the ANPEP receptor was found with the G allele, the G allele, the G allele, and the T allele, respectively (Supplemental Figure 1c). For TMPRSS2, a GTEx eQTL association was found only in testes and did not meet our criterion of eQTL associations across multiple tissues. Therefore, the eQTL used to choose the SNP for analysis is from Sudmant et al., 2015 [41]. To confirm SNP choices, principal component analysis (PCA) was utilized to determine possible linkage disequilibrium for the SNPs chosen [36]. PCA demonstrated that the SNPs chosen would be expected to be inherited together within populations (Supplementary Figure 2). As only one SNP was chosen for TMPRSS2, ANOVA testing was not carried out; however, the frequency for populations and chi square for males and females within populations were calculated [36].
Frequencies for alleles which affect gene expression were calculated (in most cases the alternate allele frequency). These frequencies were used for subsequent statistical calculations. Welch's ANOVA testing was used for ACE2 SNPs for males and females as well as for ACE2 and ENPEP SNPs across populations, as data were not normally distributed and homogeneity of variance was violated. R (version 3. [...]

Declarations

Figure 1
Average frequencies of ACE2 receptor SNPs by sex. Male0 populations are males with the ancestral allele that decreased ACE2 expression and Male1 populations are males with SNPs for alternate alleles causing increased ACE2 receptors. Female00 populations are females that are homozygous for the ancestral allele (decreased ACE2 expression), Female01 populations are females that are heterozygous, containing one ancestral and one alternate allele, and Female11 populations are females who are homozygous for the alternate alleles (increased ACE2 expression). Globally, the frequency of occurrence of males with SNPs for the alternate alleles was higher than that of males or females with the ancestral allele in all populations. The mean alternate allele frequency for the ACE2 receptor was substantially higher for males (0.787 ± 0.142) than for females (0.599 ± 0.247). The frequency of alternate alleles causing increased ACE2 receptors in males ranged from 0.51 to 1, whereas the frequency of females homozygous for the alternate alleles ranged from 0.14 to 0.98.
Furthermore, the CDX population had the highest male frequency (1) and the JPT population had the highest female frequency (0.98) for the alternate allele. The frequency for the ancestral allele in males was lowest in the KHV population (0.005) and was highest in the TSI population (0.486). The frequency for homozygous ancestral allele in females was lowest in the KHV, CHS, JPT, and PEL (0) populations and highest in the CEU population (0.32). The frequency for heterozygous allele in females was lowest in the JPT (0.015) population and was highest in the FIN (0.610) population. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.

Figure 2
Averages of TMPRSS2, ACE2, and ENPEP SNPs. Globally there was an increase of frequency for ACE2 SNPs, with the CEU population having the lowest frequency (0.48) and JPT having the highest frequency (0.988). Furthermore, the BEB, ITU, GWD, LWK, MSL, ACB, PEL, CDX, CHB, KHV, JPT, and CHS populations all have frequencies greater than 0.8 for ACE2 SNPs. The lowest frequency (0.5) for TMPRSS2 SNPs was in the YRI and BEB populations and the highest frequency (0.813) was in the TSI and CEU populations.
Frequency of ENPEP SNPs was substantially lower in all populations. Populations CDX, JPT, KHV, CHS, and CHB all had a frequency of 0 for ENPEP SNPs. The highest frequency (0.140) was in the IBS population.

Representation of X chromosomes for males and females with and without SNPs. Row a. Purple bands represent the ACE2 gene with no alternate alleles (no SNPs) to increase gene expression. Females would undergo X inactivation of one X chromosome in all female cells, making gene dosage in male and female cells similar. Row b. The ACE2 gene with alternate alleles (with SNPs) is indicated in orange. Male cells contain an X chromosome with SNPs increasing ACE2 gene expression in all male cells. Some females contain one X chromosome with SNPs (orange) and one without SNPs (purple). X inactivation would be random in female cells, meaning some cells would inactivate the chromosome that does not contain SNPs (purple). Those cells would have increased expression of the ACE2 receptor, similar to the male cells in row b. Other cells would inactivate the X chromosome that does contain SNPs (orange) and would therefore not have increased expression of the ACE2 receptor. As some of the female's cells will not have increased ACE2 expression, they may have a level of protection. Row c. As viruses break out of their host cells and move to other cells for further infection, all male cells will be overexpressing ACE2, as they contain SNPs, and may be easily infected. Some female cells will overexpress ACE2 while others will not. This may limit the ability of the virus to infect new female host cells.
