The Major Genetic Risk Factor for Severe COVID-19 Does Not Show Any Association Among Indian Populations

Growing evidence of variable human susceptibility to COVID-19 indicates that some genetic loci modulate the severity of the infection. Recent studies have identified several loci associated with greater severity. More recently, a study identified a 50 kb segment introgressed from Neanderthals that confers risk for severe COVID-19; this segment is present in 16% and 50% of people of European and South Asian origin, respectively. Contrary to this finding, our studies on ACE2 identified a haplotype present in 20% and 60% of European and South Asian populations, respectively, which appears to be responsible for the low case fatality ratio among South Asian populations. This result was also consistent with the real-time infection rate and case fatality ratio among various states of India. We revisited this issue using both contrasting datasets and compared them with the real-time infection rates and case fatality ratios in India. We found that the polymorphism present in the 50 kb introgressed segment (rs10490770) did not show any significant correlation with the real-time infection rate or case fatality ratio in India.


Introduction
Since the beginning of the COVID-19 pandemic, it has been observed that people of different ethnic backgrounds and countries or continents of origin show variable degrees of susceptibility. Although there are a few well-known factors for higher susceptibility, e.g. age and comorbidity [1,2], variability of the disease has also been reported among healthy people [3]. Recent genome-wide association studies have identified several loci on chromosome 3 associated with the risk of severe COVID-19 among Europeans [4,5].
Subsequently, Zeberg & Pääbo [6] identified a 50 kb risk haplotype introgressed from Neanderthals, which they called the 'Neanderthal core haplotype'. This risk haplotype was found to be present at an allele frequency of 30% among South Asians, 8% among Europeans and 4% among African Americans. The peak carrier frequency was estimated in the Bangladeshi population, where 63% carried at least one copy of this haplotype. The study also cited a twofold higher risk of mortality among people of Bangladeshi origin living in the UK compared with the native British population [7].
Conversely, three of our studies on ACE2, the gateway of SARS-CoV-2, identified a haplotype, shared among South Asians and East Eurasians, that likely protects them from severe risk [8-10]. Additionally, the spatial distribution of this haplotype showed a strong association with low infection rates as well as a low case fatality rate (CFR) [10]. To resolve the discrepancy between the two sets of findings and the associated claims, we extracted a SNP (rs10490770) reported to be associated with high risk for COVID-19 [6] from our published and unpublished genome-wide datasets (Supplementary Table 1), and looked for any association with the state-wise COVID-19 data of India.
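The allele frequencies and carrier frequencies quoted above are two different quantities. As a minimal sketch, and assuming Hardy-Weinberg proportions (an assumption made here for illustration, not a statement about how the cited study computed its figures), the fraction of people carrying at least one copy follows from the allele frequency p as 1 - (1 - p)^2. The function names below are hypothetical.

```python
import math


def carrier_frequency(p):
    """Fraction of individuals carrying at least one copy of an allele
    with frequency p, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** 2


def allele_frequency_from_carriers(c):
    """Inverse relation: allele frequency implied by a carrier
    frequency c, under the same Hardy-Weinberg assumption."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("carrier frequency must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - c)


# Allele frequencies quoted in the text for South Asians and Europeans.
for label, p in [("South Asians", 0.30), ("Europeans", 0.08)]:
    print(f"{label}: allele {p:.2f} -> carriers {carrier_frequency(p):.2f}")

# Carrier frequency quoted in the text for the Bangladeshi population.
print(f"Bangladesh: carriers 0.63 -> allele "
      f"{allele_frequency_from_carriers(0.63):.2f}")
```

Under this assumption, allele frequencies of 30% and 8% correspond to carrier frequencies of roughly 51% and 15%, in line with the approximately 50% and 16% carrier figures quoted for South Asians and Europeans.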

Materials And Methods
The Illumina genome-wide genotyping panels include both the rs2285666 and rs10490770 SNPs. Therefore, we searched the genotype datasets generated on this platform. The frequency data for both SNPs in various Indian populations were extracted using Plink 1.9 [11] from the 1000 Genomes Project phase 3 data [12], data published by the Estonian Biocentre [13-16], and our newly genotyped samples from various Indian states and Bangladesh (Supplementary Table 1). Compared with our previous study [10], more samples were added for the SNP rs2285666. The state-wise COVID-19 infection and CFR datasets were extracted from https://www.covid19india.org/. The regression estimates and plots were generated with https://www.graphpad.com/quickcalcs/linear1/ and reverified with Microsoft Excel regression calculations. We also used Pearson's correlation coefficient test [17] to evaluate the effect of both SNPs. The spatial distribution maps of both SNPs were drawn using the web tool available at https://www.datawrapper.de/.
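The correlation and regression step described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline (which used GraphPad QuickCalcs and Microsoft Excel): the function names are hypothetical, and the state-wise numbers are made-up placeholders rather than values from Supplementary Table 1. CFR is taken here as deaths divided by confirmed cases, expressed as a percentage.

```python
import math


def case_fatality_ratio(deaths, cases):
    """CFR in percent: deaths per 100 confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def _sums(x, y):
    """Means and centred sums of squares and cross-products."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences, n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept of y on x."""
    mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom.
    A p-value would come from the t distribution (not computed here)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical placeholder data: allele frequency and CFR (%) per state.
allele_freq = [0.30, 0.45, 0.50, 0.60, 0.70]
cfr_percent = [3.0, 2.4, 2.1, 1.5, 1.2]

r = pearson_r(allele_freq, cfr_percent)
slope, intercept = linear_fit(allele_freq, cfr_percent)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"t = {t_statistic(r, len(allele_freq)):.3f}")
```

A negative r on real data would correspond to the inverse relationship reported for rs2285666, while an r near zero with a small t statistic would correspond to the lack of association reported for rs10490770.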

Results And Discussion
In contrast to the conclusions drawn by Zeberg & Pääbo [6], our work on ACE2 identified a haplotype frequent among South Asians and East Eurasians [8-10]. This haplotype is defined by the polymorphism rs2285666, which is responsible for elevated expression of ACE2. We found a strong inverse correlation of this haplotype with the state-wise number of cases as well as the case fatality ratio among Indian populations [10]. This correlation was significant at various time points of the pandemic in India (Table 1). We verified the statistical tests with updated data up to December 2020 and found these data to be consistent with the previous observations (Fig. 1 and Supplementary Fig. 1). Thus, it is likely that the ACE2 SNP rs2285666 has played a significant role in modulating susceptibility to the disease among Indian populations.
In our search for the SNPs reported to be associated with high risk by Zeberg & Pääbo [6], we found rs10490770 in the genome-wide datasets [14,15,18-20]. We applied the same tests used for the ACE2 SNP (Fig. 1). The state-wise frequency variation of this SNP did not show any association with either the number of cases or the case fatality ratio (Table 1 and Supplementary Fig. 1). We repeated these regression tests for the number of cases as well as the case fatality ratio data obtained during all three months. However, none of them showed any association with rs10490770 (p > 0.3) (Table 1).
The lack of association is striking and suggests instead a complex susceptibility response among Indian populations.
Zeberg & Pääbo [6] used data on the higher susceptibility to the disease among the Bangladeshi population living in the UK [21] to support their findings. The higher mortality rate of the Bangladeshi population in the UK needs more detailed investigation of comorbidity, relative age, genetic admixture, and the local environment and socio-economic circumstances in their particular British context. More importantly, a similar trend has also been observed among African Americans, where some of the same qualifications may apply mutatis mutandis [22-24]. Furthermore, it is notable that among the Bangladeshi samples analysed by us, the tribal populations of Bangladesh showed an almost threefold lower frequency of rs10490770 (Supplementary Table 1). We therefore advise explicitly specifying caste and tribal populations when making any statement about South Asian populations. Significantly, our data also show that the rs2285666 allele reaches its highest frequency, 100%, in Indian populations such as the Nishi and Kokborok (Tripuri), who represent Trans-Himalayan language communities (Supplementary Fig. 1 and Supplementary Table 1). As a linguistic phylum, the Trans-Himalayan language family is widespread in parts of eastern Eurasia and includes languages such as Tibetan, Burmese, Mandarin, Cantonese and Hokkien.
Thus, our extensive analyses of real-time data did not show any association of the SNP rs10490770 with the state-wise infection rates or CFRs, suggesting that the risk allele for COVID-19 in Europe does not play a significant role in COVID-19 severity in India.

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
24. Hooper, M. W., Nápoles, A. M. & Pérez-Stable, E. J. COVID-19 and racial/ethnic disparities. JAMA (2020).

Tables
Table 1. Estimates of the Pearson correlation coefficient for rs2285666 and rs10490770 with the real-time COVID-19 cases as well as the case fatality ratio among Indian populations. Significant values are shown in bold.