Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). First appearing in Wuhan, China, in late 2019, COVID-19 has spread rapidly worldwide [1]. As of May 23, 2020, SARS-CoV-2 had infected more than 5 million people in over 200 countries and killed more than 330,000 [2]. Europe has been particularly affected: Spain and Italy have each recorded over 200,000 cases and more than 27,000 deaths, with maximum case fatality rates (CFRs) exceeding 10% [2]. East Asia, in contrast, has been far less severely affected; South Korea, for instance, reported a peak CFR of 2.4% [2]. Multiple factors could contribute to this difference, including the timing and stringency of lockdown measures [3], population age structure [4], healthcare resource availability [5], smoking rates [6,7], and early-life Bacillus Calmette–Guérin (BCG) vaccination against tuberculosis [8–10]. In principle, genetic factors may also underlie differential susceptibility to SARS-CoV-2 [11–13].
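The CFR figures above are ratios of cumulative deaths to cumulative confirmed cases, and a "maximum" or "peak" CFR is the largest value that ratio reaches over the course of the epidemic. The following is a minimal Python sketch, assuming daily cumulative counts are available as lists; the function names and the counts are hypothetical illustrations, not real surveillance data.

```python
def cumulative_cfr(cum_cases, cum_deaths):
    """Return the CFR (%) for each day: cumulative deaths / cumulative confirmed cases.

    Days with zero confirmed cases are skipped, since the ratio is undefined there.
    """
    return [100.0 * d / c for c, d in zip(cum_cases, cum_deaths) if c > 0]


def peak_cfr(cum_cases, cum_deaths):
    """Return the maximum daily cumulative CFR (%) over the whole series."""
    return max(cumulative_cfr(cum_cases, cum_deaths))


# Hypothetical daily cumulative counts for one country (illustrative only).
cases = [0, 100, 400, 1000, 2000, 2500, 3000]
deaths = [0, 1, 10, 60, 180, 250, 280]

print(cumulative_cfr(cases, deaths))
print(peak_cfr(cases, deaths))  # 10.0
```

In this toy series the cumulative CFR rises to 10% and then falls slightly, so the peak value depends on when during the epidemic the ratio is read.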
The proteins encoded by the genes for transmembrane serine protease 2 (TMPRSS2), angiotensin-converting enzyme 2 (ACE2), the cysteine proteases cathepsin B and cathepsin L (CatB, CatL), phosphatidylinositol 3-phosphate 5-kinase (PIKfyve), and two-pore channel subtype 2 (TPC2) play critical roles in SARS-CoV-2 infection [14,15]. Specifically, the virus uses the proteolytic activity of TMPRSS2 and CatB/L to prime its spike protein, and ACE2 serves as the receptor through which it enters host cells [14,15]. Because spike priming is a key determinant of successful entry into target cells, one study has proposed TMPRSS2 inhibition as a clinical strategy [15]. TMPRSS2 variants appear to show wide population-specific variation [16], and TMPRSS2 also carries a low mutation burden in certain populations, a characteristic that could partly explain high TMPRSS2 gene expression; high TMPRSS2 expression, in turn, has been associated with poor COVID-19 outcomes [16].
To understand the genetic basis of complex phenotypes in human populations, researchers commonly assess how those phenotypes correlate with allele frequency (AF) [16,17]. This approach has revealed a correlation between ancestral genetic composition and COVID-19 CFR [17]. However, few studies have examined specific variants, their frequencies, and their individual contributions to SARS-CoV-2 susceptibility, and some reports rely only on low-resolution intercontinental comparisons between Europeans and East Asians [16–18]. Moreover, little is known about the evolutionary history of SARS-CoV-2 susceptibility-associated variants, including when they arose and how their frequencies may have changed over time.
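The AF-correlation approach can be sketched in two steps: compute each population's alternate-allele frequency for a biallelic variant from diploid genotype counts, then measure the rank correlation between AF and CFR across populations. This is a hedged illustration under stated assumptions: the population labels, genotype counts, and CFR values are hypothetical, the function names are illustrative, and Spearman's rank correlation is one common choice rather than necessarily the statistic used in the cited studies.

```python
from math import sqrt


def allele_frequency(n_hom_alt, n_het, n_hom_ref):
    """Alternate-allele frequency of a biallelic variant in diploid individuals."""
    n_individuals = n_hom_alt + n_het + n_hom_ref
    return (2 * n_hom_alt + n_het) / (2 * n_individuals)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical genotype counts (hom-alt, het, hom-ref) and CFRs (%) per population.
genotypes = {"PopA": (5, 30, 165), "PopB": (20, 80, 100), "PopC": (45, 90, 65)}
cfr = {"PopA": 9.5, "PopB": 4.0, "PopC": 2.1}

pops = sorted(genotypes)
afs = [allele_frequency(*genotypes[p]) for p in pops]
rho = spearman_rho(afs, [cfr[p] for p in pops])
print(dict(zip(pops, afs)), rho)  # rho == -1.0 for this toy data
```

A negative coefficient here only means that, in these made-up numbers, populations with a higher AF have a lower CFR; a correlation across a handful of countries cannot by itself establish a causal effect of the variant.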
In this study, we investigated intercountry AF differences in TMPRSS2 variants, estimated the effects of these variants on the structural stability of the TMPRSS2 protein, and linked them to the average time-adjusted COVID-19 CFR (AT-CFR). We propose that the structural deviations introduced by these variants make TMPRSS2 less stable, reducing the overall infection rate and, in turn, the CFR in East Asians. We collected and analyzed 221,498 genomes from public databases [19–21] and 2,262 whole genomes from the Korean Genome Project [22]. We also traced the distribution of TMPRSS2 AFs in ancient populations by region and time period. Our aim was to improve understanding of the genetic variation underlying SARS-CoV-2 infection and to explain ethnic differences in CFR.
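The time adjustment behind AT-CFR is not defined in this section. One common way to adjust a CFR for time is to divide deaths by the cases confirmed a fixed number of days earlier, which accounts for the delay between diagnosis and death, and then average the resulting daily values. The following is a minimal sketch under that assumption only; the lag value, function names, and counts are hypothetical and may differ from the definition used in this study.

```python
def lagged_cfr(cum_cases, cum_deaths, lag_days):
    """Daily time-adjusted CFR (%): deaths by day t / cases confirmed by day t - lag_days.

    ASSUMPTION: a fixed confirmation-to-death delay of lag_days. Days whose
    lagged case count is zero are skipped because the ratio is undefined.
    """
    series = []
    for t in range(lag_days, len(cum_deaths)):
        lagged_cases = cum_cases[t - lag_days]
        if lagged_cases > 0:
            series.append(100.0 * cum_deaths[t] / lagged_cases)
    return series


def average_time_adjusted_cfr(cum_cases, cum_deaths, lag_days):
    """Mean of the daily lagged CFR values; NaN if no day qualifies."""
    series = lagged_cfr(cum_cases, cum_deaths, lag_days)
    return sum(series) / len(series) if series else float("nan")


# Hypothetical cumulative counts (illustrative only) and a hypothetical 1-day lag.
cases = [100, 200, 400, 800]
deaths = [0, 2, 4, 8]
print(lagged_cfr(cases, deaths, lag_days=1))                 # [2.0, 2.0, 2.0]
print(average_time_adjusted_cfr(cases, deaths, lag_days=1))  # 2.0
```

For comparison, the unadjusted cumulative CFR on the last day of this toy series is 1.0% (8/800), so ignoring the confirmation-to-death delay halves the estimate in this example.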