The matrilineal ancestry of Nepali populations

The Tibetan plateau and high mountain ranges of Nepal are one of the challenging geographical regions inhabited by modern humans. While much of the ethnographic and population-based genetic studies were carried out to investigate the Tibetan and Sherpa highlanders, little is known about the demographic processes that enabled the colonization of the hilly areas of Nepal. Thus, the present study aimed to investigate the past demographic events that shaped the extant Nepalese genetic diversity using mitochondrial DNA (mtDNA) variations from ethnic Nepalese groups. We have analyzed mtDNA sequences of 999 Nepalese and compared data with 38,622 published mtDNA sequences from rest of the world. Our analysis revealed that the genomic landscapes of prehistoric Himalayan settlers of Nepal were similar to that of the low-altitude extant Nepalese (LAN), especially Newar and Magar population groups, but differ from contemporary high-altitude Sherpas. LAN might have derived their East Eurasian ancestry mainly from low-altitude Tibeto-Burmans, who likely have migrated from East Asia and assimilated across the Eastern Himalayas extended from the Eastern Nepal to the North-East of India, Bhutan, Tibet and Northern Myanmar. We also identified a clear genetic sub-structure across different ethnic groups of Nepal based on mtDNA haplogroups and ectodysplasin-A receptor (EDAR) gene polymorphism. Our comprehensive high-resolution mtDNA-based genetic study of Tibeto-Burman communities reconstructs the maternal origins of prehistoric Himalayan populations and sheds light on migration events that have brought most of the East Eurasian ancestry to the present-day Nepalese population.


Introduction
Nepal is geographically located along the southern slopes of the high Himalayan Mountain ranges. Politically, it is bordered by Tibet autonomous region in the north and India in the south, east, and west (Dhital 2015;Whelpton 2005). Between the Himalayas and the Terai plains, the hilly area stretches from west to east with an elevation rising up to 3000 m above sea level (masl). In many places, the mid-hill ranges are fragmented by the rivers flowing north to south. This rugged topography has been acting as a natural barrier, only permeable to small-scale human movement facilitating the existence of several isolated indigenous communities in Nepal. Pahadi (comprising mainly Brahmin, Chhetri and Kami), the most populous ethnic group in this region, speaks Indo-European (IE) language (Whelpton 2005). The other prominent ethnic populations such as Newar, Magar, Gurung, Tamang, Rai, and Limbu speak Tibeto-Burman languages (Dhital 2015;Whelpton 2005)  be native to the mid-hilly areas of the adjoining districts of Kathmandu valley. Since ancient times, this valley was a melting pot for different reasons, like fertile soil, a favorable climate, a malaria-free zone, and a primal trade center that linked India and Tibet. Newars are considered as the original inhabitant of the Kathmandu valley and its surrounding districts. According to available historical records, the Tibeto-Burman ethnic group (Kirats/Kirati) are the ancestors of the contemporary Newar (Malla 1981;Riccardi 1975;Shrestha 2007;Whelpton 2005). Kirats are considered to be the indigenous aboriginals of the Eastern Himalayan region, extending eastward from Nepal to India, and are primarily divided into two ethnic groups viz. Rai and Limbu (Malla 1981).
The oldest chronicle of Nepal, the Gopal Raj Vamshavali, states that the Kirats governed the Kathmandu valley for 32 generations (Malla and Vajracarya 1985). At present, there is no recorded history of the Kirats. There is just a corpus of inscriptions left by their successors, the Licchavi, who ruled Nepal after the Kirats (Malla 1981;Shrestha 2007). According to linguist van Driem, the Newar are a Tibeto-Burman people with a long history of Sanskritisation and a rich literary heritage in both Sanskrit and their native Tibeto-Burman language, Newari (van Driem 1992). The Supplementary material contains more details about the history of the Newar (Additional file 1: History of Newar). Newar makes up to 5.0% of the Nepal population (2011 census, Nepal). The Magar people are the third-largest indigenous ethnic group in Nepal, representing 7.1% of its total population (2011 census, Nepal). They speak a language belonging to the Tibeto-Burman family (Whelpton 2005). Archaeological records suggest that the Mesolithic settlers in the Eastern Terai region had unique features that show some striking similarities to those found in Southeast Asians, especially to the Hoabinhian culture of Vietnam (Corvinus 1996). Neolithic tools found in several parts of the country, including Kathmandu valley, dates back to the second millennium BC show affinities with the Assamese of Northeast India (Banerjee and Sharma 1969;Sharma 1988). The Himalayas, though considered being a natural barrier to gene flow, the presence of Tibeto-Burmans in Nepal could plausibly indicate dispersals, by circumventing the Himalayas, from Tibet directly or via Northeastern India. Among the Nepalese Tibeto-Burmans, Sherpa's migration from Tibet directly across the Himalayas to the high-altitude Nepalese valleys of Khumbu is well established (Bhandari et al. 2015;Zhang et al. 2017). However, little is known about the migration route, admixture and/or isolation events that brought most of the East Eurasian mtDNA lineages detected in LAN Tibeto-Burman groups. Currently available ancient DNA and genetic data studies from the Himalayan region have suggested contrasting hypotheses of Sherpa/Tibetan or low-altitude East Asian origin for this populations (Cole et al. 2017;Fornarino et al. 2009;Gayden et al. 2013;Gnecchi-Ruscone et al. 2017;Jeong et al. 2016;Wang et al. 2012), though this was most probably due to limited sample size and low-resolution mtDNA sequences. In the present study, we focused on dissecting the maternal ancestry of the Nepalese populations by performing the high-resolution mtDNA phylogeographic study to understand the demographic history of the Nepalese population. Additionally, we have also examined the distribution of the derived variant of the EDAR gene (rs3827760) to find out the East Asian ancestry in extant Nepalese populations.

Sampling
About 5.0 ml of blood samples was collected from 461 unrelated male individuals belonging to nine aboriginal Nepalese populations, representing the three ethnic groups, namely Newar, Magar, and Brahmin (

Mitochondrial DNA and EDAR 1540 T/C sequencing
Following the previously described phenol-chloroform method (Thangaraj et al. 2002), the genomic DNA was extracted from the collected blood samples. DNA quantification was done using NanoDrop® ND-1000 (ThermoFisher Scientific, USA). Polymerase chain reaction (PCR) amplification and DNA sequencing of partial/ complete mtDNA segment(s) (Rieder et al. 1998) were performed in dedicated clean rooms with negative control and blank samples. Initially, we have sequenced the hypervariable regions I and II (HVR I & II), followed by sequencing of selected coding region of mtDNA for accurate haplogroup assignment. Further, the whole mtDNA sequencing of the samples representing the dominant East Eurasian haplogroups was performed. In total, we obtained 139 complete mtDNA sequences from Nepalese individuals. The adaptive non-synonymous allele rs3827760, also known as EDAR 1540 T/C single nucleotide polymorphism (SNP), was genotyped by PCR and direct sequencing using forward (5'-3') TCT GCA CAC AAG GAC TCC AC and reverse (5'-3') GCG TTC TAG GTG TCG TTT GC primers to find East Asian genetic affinity. Altogether, this SNP was assayed in 461 Nepalese samples.

Data analyses
This study generated 461 mtDNA sequence data. Additionally, 538 Nepalese mtDNA sequences were collected from  Table S4 and Additional file 1: Fig. S1). To understand the genetic relationship of Nepalese with other populations, we performed PCA using a method developed by Richards et al. (Richards et al. 2002) in the MVSP (Multivariate Statistical Package for Windows, Ver. 3.22) software (Kovach 1999). The median-joining tree for the dominant haplogroups was constructed manually and confirmed by Network10.1 (www. fluxus-engin eering. com/ sharenet.htm) program. The most parsimonious phylogenetic tree of various haplogroups was reconstructed manually based on the complete mtDNA sequences and verified by mtPhyl software (Eltsov and Volodko 2009). The isofrequency maps for dominant haplogroups were constructed using the Kriging linear model in Surfer 16.0.3 (Golden Software, LLC). The coalescence age or time to the most recent common ancestor (TMRCA) of basal mtDNA haplogroup was estimated using ρ statistics method as described previously (Soares et al. 2009).

mtDNA lineages in Nepalese populations
A total of 999 individuals were classified into known mtDNA haplogroups (Table 1 and Additional file 2:

Genetic relationship of Nepalese populations and their surrounding populations
We calculated diversity indices and tests of selective neutrality of Nepalese populations based on the HVS-I sequences (Additional file 2: Table S5). We detect relatively lower nucleotide diversity and mean pair-wise differences among the Newar sub-caste, whereas the highest diversity was observed among the Kathmandu (Gayden et al. 2007) and Nepali-mix (mix population from Kathmandu and Eastern This differentiation across Nepalese subgroups was further tested by the analysis of the derived variant rs3827760 of the EDAR gene (Fig. 3). Scans of positive selection on genome-wide surveys from the global human population have identified the derived coding variant of the EDAR gene as an interesting candidate for the positive selection in East Asians (Fujimoto et al. 2008;Kamberov et al. 2013). This has resulted in a very high frequency of non-synonymous SNP rs3827760 (370A; 1540C) in East Asian and the Native American origin but virtually absent among most of the Indians (Dravidian and Indo-European speaking groups), European and African populations (Chaubey et al. 2011;Fujimoto et al. 2008). Interestingly, this derived allele was present in both the Tibeto-Burman (Magar and Newar) and IE (Brahmin) populations of Nepal. The highest 1540C allele frequency was observed in Magar (71%), followed by Newar (30%) and Brahmin (20%). The frequency observed in Magar was similar to those reported in HAAPs (Himalayan and adjoining populations) (Tamang et al. 2018) and Indian Tibeto-Burman populations (Chaubey et al. 2011). However, despite having nearly equal proportions of East Eurasian maternal components, Newar showed almost half of the derived EDAR allele frequency than that of Magar, reflecting their distinct population history. The presence of a substantial 1540c allele frequency in Brahmin implies admixture with the local Tibeto-Burman populations.

Phylogeographic patterns of the dominant East Eurasian haplogroups
To delve into the migratory and admixture events that might have occurred between the aboriginal Nepalese and other East Eurasians, we compared the dominant East Eurasian lineages in Nepalese (Table 2) and East Eurasians and analyzed haplotype networks tree and parsimonious phylogenetic tree of major East Eurasian haplogroups. Further, the frequency counter maps of shared haplogroups were constructed.

The phylogeography of haplogroup Z
Haplogroup Z is one of the infrequent mtDNA haplogroups with sporadic distribution in Russia, East Asia, Mainland Southeast Asia (MSEA), Central Asia, Eastern and Northern Europe, and South Asia (Fig. 4) (Chandrasekar et al. 2009;Fedorova et al. 2013). Among the members (Z1, Z2, Z3, Z4 and Z7) of haplogroup Z, Nepalese populations were characterized by rare clades Z3a1a and Z7, of which Z3a1a was the most frequent sub-clade in Newar (Additional file 1: Fig. S4). Lineage Z1 dominated most of the representatives of haplogroup Z in North Asia and Europe (Fedorova et al. 2013). In this region, we estimated the coalescent age of ~ 21.6 Kilo-years ago (Kya) for clade Z1 (Table 3), similar to the earlier estimate of ~ 20.4 Kya (Fedorova et al. 2013). Another clade Z3, found in East Asia, North Asia and MSEA, is the oldest member of haplogroup Z with an estimated age of ~ 25.4 Kya. Haplogroup Z has its highest clade diversity in China, with an overall prevalence of 2.8% (620/ 21,668) and is partially more frequent in Northern Han Chinese (3.1%). Likewise, the overall rate of haplogroup Z in Tibetans was 1.92% (of 6109 samples) (Qi et al. 2013). In another recent study of Tibetans, with comparatively less sample, haplogroup Z was characterized by clades Z3a (5/682), Z3a1a (1/682), Z4a (1/682) and Z4 (1/682) (Qi et al. 2013). Overall, haplogroup Z reaches the highest incidence in Tibeto-Burman-speaking populations from Thailand (Lisu_2, 20%) (Oota et al. 2001), Nepal (Newar, 18.6%) and Myanmar (Chin2, 12%) (Li et al. 2015), in which clade Z3a1a dominated most of the representatives of this haplogroup.
After constructing a phylogenetic tree ( Fig. 4A and Additional file 1: Fig. S5) based on 161 complete mtDNA sequences, we identified the basel nodes of Z3a were distributed mainly in Northern and Eastern Han Chinese, and thus most likely originated in these regions. This claim was further bolstered by the presence of Z3a1 lineage in Yakut of Sakha Republic (Additional file 1: Fig. S6). Terminal clade Z3a1a1 detected in Tibet, Myanmar, Nepal, India, Thai-Laos and Vietnam trace their ancestral roots to China with a coalescent age of ~ 8.4 Kya (Table 3). Basal variant 9452A with a coalescent age of ~ 4.4 Kya were only shared between the members of Z3a1a1 from Northeast India, Thailand and Vietnam but not with the Nepalese and Tibetans. Z3a1a1  (Table 3). Newly defined clade Z8 (Fig. 4C),  Table S4) having a frequency of Z > 0 were included in the analysis and were marked by the blue dots on the contour map so far, detected only in Newar ethnic group of Nepal with a coalescent age ~ 3.8 Kya (Table 3).

2016) of North India.
When we constructed a most parsimonious phylogenetic tree (Additional file 1: Fig. S6) based on 121 complete mtDNA sequences, we identified the ancestral node of clade F1d was mainly disseminated in Han Chinese with an estimated coalescent age of ~ 9.5 Kya (Table 3). Clade F1c, another sub-lineage of haplogroup F, attains its highest frequency in the Tibeto-Burman speaking group from Northeast India (Apatani,16.8% and Nishi,9.1%) and Myanmar (Chin 3: 15.4%, Naga 1: 6.3% and Rakhine 1: 8.3%). The estimated coalescent age of the sub-clade F1c1a2a and F1d1 was ~ 9.5 Kya and 7.8 Kya, respectively (Table 3). After the divergence from the ancestral Chinese F1d1 sequence, Tibeto-Burman groups (of Myanmar, Tibet, and Nepal) and Austroasiatic (AA) groups (of Thailand) shared a common variant 16284G (Fig. 5B) implying a common Tibeto-Burman ancestry. Intriguingly, these groups do not share any further basal variants and lineage expansion as denoted by star-like shape was confined to their respective geographic regions (Fig. 5B). Nepalesespecific branch (F1d1a1) coalesces at ~ 4.4 Kya, suggesting their ancient origin and lineage expansion in Nepal most probably via southeastern Tibet (Fig. 5C). Likewise, after bifurcating from ancestral Chinese/Vietnamese sequence, four basal variants (3970 T, 5300 T, 5663 T, and 12892C) defining clade F1g1a remained restricted within Nepalese populations (Additional file 1: Fig. S8). Analysis of other clades, F1c and F2b, also resulted in a similar mode of distribution, further verifying the notion of early entry into Nepal ( Fig. 5D and Additional file 1: Fig. S7 and S9).

The phylogeography of haplogroup A
Haplogroup A is one of the most prevalent haplogroups in northern and eastern Asia. It is relatively higher in Tibetans (14.63%) (Qi et al. 2013) than in Northern (8.53%) and Southern Han Chinese (6.54%) (Li et al. 2019b). Members of haplogroup A in Tibet are dominated by sub-clades A4 (7.8%) and A11 (5.5%), whereas the other sub-clades are absent or present in shallow frequency in Tibetans (< 2%) (Kang et al. 2016;Qi et al. 2013). Haplogroup A, dominated by its sub-lineage A15c1, was the most common maternal lineage in Sherpa of Nepal, with a frequency of 27.5% (Bhandari et al. 2015). This clade was not observed in the other Nepalese populations studied so far. In the examined  Table S4), only the populations having a frequency of F1d > 0 were included in the analysis and were marked by the blue dots in the Map LAN, lineage A was mirrored by its clades, A27, A14 and A17, of which A27 was the most abundant clade in Newar (3.99%). Newly defined clade A27 only discerned so far in Newar and Nepali-mix coalesce at ~ 8.4 Kya ( Fig. 6 and Table 3) suggesting their ancient origin and potentially insitu differentiation in Nepal. The other sub-clades, A14 and A17, were seen with shallow occurrence in Brahmin, Newar, Kathmandu and Nepali-mix. A17a lineage belonging to Nepalese, Chinese and Vietnamese shares the most recent common ancestor dating back to ~ 8.4 Kya (Fig. 5 and Table 3). Its sub-clade A17a1, branched off directly from the nodes occupied by the Chinese/Vietnamese lineages, is peculiar to Nepal with a coalesces time ~ 3.8 Kya.

The phylogeography of haplogroup D
Haplogroup D is found in Northeast Asia, and Central Asia; and is the second most dominant haplogroup in Tibetans with an average frequency of 16.5% in the Plateau (Comas et al. 2004;Kang et al. 2013;Kutanan et al. 2017;Li et al. 2015;Li et al. 2019b). Detected with noticeable frequency among the Nepalese, it also represents the most dominant maternal lineage in Tamang (26.1%) and Magar (24.3%). Lineage D4 and its representative D4i, D4j and D4e were the major sub-haplogroup in Nepalese, whereas D5 lineage was dominated by D5a2 and its sub-clades D5a2b and D5a2a1 (Additional file 1: Fig. S4). We detect distinct sub-clades of D4i distributed among the Nepalese populations. Magar and Nepali-mix shared the most recent common ancestor with Chinese, and Taiwanese D4i3 with a coalescent time of ~ 7.8 Kya (Table 3 and Additional file 1: Fig. S11). Whereas, D4i detected in Newar, newly named D4i6, shared its ancestral root with the Turkic ethnic group (Uyghur) of northwest China (near to Central Asia) and coalesces at ~ 14.7 Kya (Table 3). Its sub-clade D4i6a so far only sighted in the Nepalese populations coalesce at ~ 7 Kya. Interestingly, newly defined clade D4h2b detected in Thailand and Nepal share a most recent common ancestor and coalesce at ~ 4.1 Kya. Three Nepalese samples shared a most recent common ancestor with the Chinese/Vietnamese and Tibetan D5a2b lineage (Additional file 1: Fig. S12).
Nepal holds a crucial and significant location in the geology of the Himalaya, serving as a bridge between the Western and Eastern Himalaya (Dhital 2015). It stretches from low-lying foothills in the south to the Great Himalayas that extend to the Tibetan plateau in the north, featuring several of the world's highest peaks (Dhital 2015). The Himalayan region is also one of the most multi-lingually diverse regions in the world. It has populations speaking six different linguistic phyla-Austroasiatic, Altaic, Daic, Dravidian, Indo-European, and Tibeto-Burman-as well as two confirmed linguistic isolates (Burushaski and Kusunda) (van Driem 2001). At a genetic level, Himalayan arc was also suggested to have played a pivotal role in shaping human population history and genetic diversity in Asia, acting as a natural genetic barrier that helped to keep large cultural and genetic distinctions observable between South Asian and East Asian populations (Bhandari et al. 2015;Gayden et al. 2007;Gayden et al. 2013;Wang et al. 2012). Tibetan desert high plateau and the high-altitude transverse valleys (> 3,000 masl) of Nepal were among the challenging places inhabited by prehistoric humans. According to previous genetic studies, most of the Tibetan genetic components can be attributed mainly to the Neolithic immigrations initiated from northern China ~ 7 Kya (Qi et al. 2013;Zhao 2009). Archaeological data from the Tibetan plateau suggest an initial extensive human settlement in the northeastern plateau (< 2500 masl) ~ 5.2 Kya, long before their further expansion to high-altitude plateau areas (> 2500 masl) ~ 3.6 Kya (Chen et al. 2015). Among the Nepalese Tibeto-Burmans, Sherpa migration from Eastern to Central Tibet and later, approximately in the sixteenth century, to the Solukhumbu region of Nepal is well-documented (Oppitz 1974;Whelpton 2005). These historical records correlate adequately with both uniparental and genome-wide perspective (Bhandari et al. 2015;Jeong et al. 2016;Zhang et al. 2017). A recent study showed that two clades, M9a1a1c1b1a and A11a1a, played significant roles in shaping the genetic landscapes of contemporary Tibetans (Li et al. 2019a). However, our findings show that these lineages were limited to Tibetans and Sherpas, and were essentially non-existent in the LAN. Except for few LAN lineages that show connections to northeastern Tibet, e.g., D4i3, G2a1d2b, G3a1-most of them show settlements into Eastern Himalayas that extend from Eastern Nepal across Northeast India, Bhutan, Southeastern Tibet to Yunan in China and Northern Myanmar.
Hence, phylogeographic analysis of the East Asian haplogroups detected in LAN shows different characteristics than high-altitude Sherpa/Tibetans: (1) LAN share common Tibeto-Burman precursor with few ethnic groups of Southeastern Tibet including Lhasa and are prevalent in Myanmar, Thailand and Northeast India; (2) majority of the LAN maternal components can be traced back to Neolithic immigrations from China ~ 8 Kya, which is in a good agreement with previous studies (Qi et al. 2013;Zhao 2009); (3) Significantly, after the divergence from ancestral Chinese sequence, the majority of LAN mtDNA lineage do not share any additional variants with Tibetans, Northeast Indians and Burmese.
This genetic differentiation between Sherpa/ Tibetans and LAN is further supported by the mtDNA profile of eight ancient dental samples derived from the high-altitude Annapurna Conservation Area (ACA) of Nepal (Mustang District), situated near the Tibetan plateau (Jeong et al. 2016). These samples, spanning 3150-1250 years before present, yielded sub-clades (Z3a1a, F1c1a, F1d1, D4j1b, 1 3 and M9a) that were mostly present in the contemporary LAN and were virtually absent in high-altitude Sherpa (except M9a, whose terminal haplogroups was uncharacterized). Therefore, it is conceivable that precursors of the earliest inhabitant of the Southern slopes of Himalaya (e.g., those having occupied the mustang region of Nepal) were genetically closer to the contemporary low-altitude Nepalese and plausibly crossed the high-altitude Tibetan/ Nepalese passes. Our results rejected the previous claim of affiliation of ACA samples with the high-altitude contemporary Sherpas (Jeong et al. 2016). The differentiation between Sherpas/Tibetans and LAN (mainly Newar and Magar) is further supported by the linguistic study of the Tibeto-Burman populations (Bradley 1997). This shed new light on the migratory roadmap and admixture events that might have occurred on the southern slopes of the Himalayas. Furthermore, phylogeographic analysis of the majority of the East Eurasian lineages detected in LAN displayed de novo differentiation in contemporary Nepalese, albeit a few shows a recent genetic admixture with northeast Indians. Among the de novo differentiated lineages, most were > 3.8 Kya (Table 3) suggesting a subsequent entry into Nepal after the extensive human settlements on the plateau began.

Conclusion
The present study is the first extensive high-resolution mtDNA-based population genetic analysis of Nepalese populations, which provides new insights into migration and admixture events that occurred on the populations speaking Tibeto-Burman languages. Populations living in the South of Himalaya (mainly Newar and Magar) reveals distinct population history than contemporary high-altitude Tibetans/ Sherpas. In fact, they were similar to ACA samples (prehistoric Himalayan settlers) and their ancestry can be traced back to Neolithic immigration from East Asia around ~ 8 Kya. Finally, we argued that unlike Sherpas/ Tibetans, who remained confined in the high-altitude region, LAN primarily derived their East Asian ancestry from low-altitude dwelling proto Tibeto-Burmans who seems to have diffused from East Asia across the eastern Himalaya. The ancient genetic makeup being gradually reshaped by several admixture events along the migratory route from China to Nepal, the bearers of these lineages might have entered Nepal directly across the Himalayas, most probably via Southeastern Tibet around 3.8-6 Kya. Overall, our study provides a step forward in dissecting the complexity of the East Asian mtDNA landscape of the Himalayan populations residing in the South of the Himalaya and point to the need for further study using other genetic markers.