Our pooled data yielded 248 high-quality polymorphisms (Supplementary Table 1). Europeans and Siberians had the highest numbers of private polymorphisms, which is likely due to the large number of samples from these groups (Supplementary Fig. 1). In the linkage disequilibrium (LD) plot analysis, all groups showed distinct LD patterns (Supplementary Fig. 2). We used a haplotype-based approach for the comparison. In contrast with genome-wide analyses [6–8], the Neighbour-Joining (NJ) tree based on Fst distances clustered South Asians with the Island and Mainland Southeast Asian populations (Fig. 1a). This unexpected result strongly suggests a closer genetic affinity of South Asians with East Eurasians at the ACE2 gene. The pairwise difference analysis suggested lower diversity in South Asian, Southeast Asian and Siberian populations (Fig. 1b). West Asian populations showed signs of genetic drift and low haplotype diversity for this gene (Figs. 1a, 1b and Supplementary Fig. 3).
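The clustering step can be illustrated with a minimal Python sketch. It assumes that pairwise Fst is computed from haplotype frequencies with Nei's (HT - HS)/HT formulation and that the tree is built with the standard Saitou and Nei neighbour-joining algorithm; the exact Fst estimator and software used in the study are not stated here, so this is an illustrative substitute rather than the authors' pipeline. The function names `pairwise_fst` and `neighbour_joining` and the haplotype counts in `populations` are hypothetical.

```python
from itertools import combinations


def _freqs(counts):
    """Convert haplotype counts to relative frequencies."""
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


def _het(freqs):
    """Haplotype heterozygosity 1 - sum(p_i^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(counts_a, counts_b):
    """Nei-style Fst = (HT - HS) / HT from the haplotype frequencies of two populations."""
    fa, fb = _freqs(counts_a), _freqs(counts_b)
    haps = set(fa) | set(fb)
    pooled = {h: (fa.get(h, 0.0) + fb.get(h, 0.0)) / 2 for h in haps}
    hs = (_het(fa) + _het(fb)) / 2  # mean within-population diversity
    ht = _het(pooled)               # diversity of the pooled population
    return 0.0 if ht == 0 else (ht - hs) / ht


def neighbour_joining(names, dist):
    """Saitou-Nei neighbour joining on a symmetric distance dict.

    Returns an unrooted Newick topology string (branch lengths omitted).
    """
    d = {}
    for (a, b), v in dist.items():
        d[(a, b)] = d[(b, a)] = v
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        new = f"({i},{j})"
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to each remaining node
                d[(new, k)] = d[(k, new)] = (d[(i, k)] + d[(j, k)] - d[(i, j)]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # The last three nodes meet at a single central node
    return "(" + ",".join(nodes) + ");"


# Hypothetical haplotype counts (illustrative only, not the study's data)
populations = {
    "West": {"ht1": 80, "ht2": 15, "ht3": 5},
    "Central": {"ht1": 70, "ht2": 20, "ht3": 10},
    "SouthAsia": {"ht1": 20, "ht2": 45, "ht3": 35},
    "EastAsia": {"ht1": 10, "ht2": 50, "ht3": 40},
}
names = list(populations)
fst = {(a, b): pairwise_fst(populations[a], populations[b])
       for a, b in combinations(names, 2)}
tree = neighbour_joining(names, fst)
print(tree)
```

With these toy counts, the two populations dominated by ht2 and ht3 end up as neighbours, separated from the ht1-dominated pair. Branch lengths are dropped to keep the sketch short.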
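The two diversity summaries mentioned above can be sketched as follows. The sketch assumes that the pairwise difference analysis refers to the mean number of pairwise differences between sampled haplotypes, and that haplotype diversity is Nei's unbiased estimator. The function names and the toy `diverse` and `drifted` samples are hypothetical. The drifted sample mimics a population in which one haplotype has risen to high frequency.

```python
from collections import Counter
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of sampled haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs)


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n / (n - 1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Hypothetical samples: each string is one chromosome's alleles at 4 SNPs
diverse = ["ACGT", "ACGA", "TCGT", "ACTT", "TCGA", "ACGT"]
drifted = ["ACGT", "ACGT", "ACGT", "ACGT", "ACGA", "ACGT"]
for label, sample in [("diverse", diverse), ("drifted", drifted)]:
    print(label, round(mean_pairwise_differences(sample), 3),
          round(haplotype_diversity(sample), 3))
```

Both statistics fall in the drifted sample (about 0.33 each) compared with the diverse sample (1.4 mean differences and a haplotype diversity of about 0.93).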
The phylogenetic analysis of haplotypes among the studied populations helped identify the SNPs responsible for the affinity of South Asians with East Eurasians (Fig. 1c). Three major distinct haplotypes were observed. Haplotype 1 (ht 1) was more common among West Eurasians, including Central Asian populations, whereas haplotype 2 (ht 2) was frequent among East Eurasian, South Asian and American populations (Fig. 1c). Haplotype 3 (ht 3) was carried mainly by East Eurasians and South Asians. Ht 2 was derived through the SNP rs446120, whereas ht 3 was derived through the SNP rs2285666. Both SNPs play a key role in distinguishing East and West Eurasian populations. Interestingly, the most frequent South Asian haplotypes occur on the background of these SNPs. A recent study has also reported the highest frequency of rs2285666 among Chinese populations (0.5), as well as significant frequency differences among the 1000 Genomes populations. In our study, we also found a high frequency (0.6) of this SNP among South Asians (Supplementary Table 1). Moreover, the synonymous coding-region variant rs35803318 was most frequent among Americans (0.15), followed by Europeans (0.055), Caucasians (0.051) and Central Asians (0.021), whereas it was not polymorphic in West Asian, South Asian, Southeast Asian and Siberian populations (Supplementary Table 1).
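Haplotype assignment and the per-population allele summaries reported in Supplementary Table 1 can be sketched as below. The sketch assumes phased chromosomes coded as SNP-to-allele dictionaries, with 1 marking the derived allele. It uses a simplified rule, not the full phylogeny of Fig. 1c: a chromosome is labelled ht 2 or ht 3 by the derived allele of its defining SNP (rs446120 or rs2285666) and "other" otherwise. The names `DEFINING_SNPS`, `assign_haplotype`, `allele_frequency` and `is_polymorphic` and the `sample` data are hypothetical.

```python
from collections import Counter

# Hypothetical lookup: defining SNP -> haplotype label, following the text
# (ht 2 derived through rs446120, ht 3 through rs2285666).
DEFINING_SNPS = {"rs446120": "ht 2", "rs2285666": "ht 3"}


def assign_haplotype(chrom):
    """Label one phased chromosome (SNP -> 1 if derived allele, else 0).

    Assumption: SNPs are checked in dict order, so a chromosome carrying
    both derived alleles would be labelled by the first match.
    """
    for snp, label in DEFINING_SNPS.items():
        if chrom.get(snp) == 1:
            return label
    return "other"


def allele_frequency(chroms, snp):
    """Frequency of the derived allele (coded 1) at `snp` among chromosomes."""
    return sum(c[snp] for c in chroms) / len(chroms)


def is_polymorphic(chroms, snp):
    """Polymorphic here means both alleles are observed (no frequency cut-off)."""
    return len({c[snp] for c in chroms}) > 1


# Hypothetical phased chromosomes for two toy populations
sample = {
    "PopA": [
        {"rs446120": 1, "rs2285666": 0, "rs35803318": 0},
        {"rs446120": 0, "rs2285666": 1, "rs35803318": 0},
        {"rs446120": 0, "rs2285666": 1, "rs35803318": 0},
        {"rs446120": 0, "rs2285666": 0, "rs35803318": 0},
    ],
    "PopB": [
        {"rs446120": 0, "rs2285666": 0, "rs35803318": 1},
        {"rs446120": 0, "rs2285666": 0, "rs35803318": 0},
        {"rs446120": 1, "rs2285666": 0, "rs35803318": 0},
        {"rs446120": 0, "rs2285666": 0, "rs35803318": 0},
    ],
}
for pop, chroms in sample.items():
    print(pop, Counter(assign_haplotype(c) for c in chroms),
          allele_frequency(chroms, "rs2285666"),
          is_polymorphic(chroms, "rs35803318"))
```

In this toy data, PopA resembles the pattern described for South Asians (rs2285666 common, rs35803318 monomorphic), while PopB carries rs35803318 as a polymorphism.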
Thus, the phylogenetic analysis suggests that the majority of South Asian samples share the monophyletic haplotypes 2 and 3 with East Eurasians, each defined by a unique polymorphism event (rs446120 and rs2285666, respectively). In addition, the synonymous coding-region variant rs35803318 was polymorphic among Americans and Europeans but not among South Asians. Hence, it is highly likely that the host susceptibility of South Asians to the novel coronavirus SARS-CoV-2 will be more similar to that of East/Southeast Asians than to that of Europeans or Americans.