The primary objective of this study was to enhance our understanding of the evolutionary patterns of mtDNA in the Thai population and its correlations with neighboring countries, while also taking into consideration the impact of South Asian admixture. We observed a significant cluster that includes the haplogroups B5a1, F1a, F1f, and M7b1a1 (Fig. 2). Notably, the B5a1 and F1a haplogroups comprise a substantial proportion of the Cambodian samples, representing 38% in a previous study18 and 21% in our study. This suggests the presence of an autochthonous population in Cambodia. The high prevalence of B5a1 and F1a haplogroups in both populations indicates a potential origin of these haplogroups in Cambodia, followed by subsequent expansions into other regions, including Thailand. This could be attributed to historical migrations, interactions, or genetic admixture events between the Cambodian and Thai populations, leading to an increased frequency of individuals carrying the B5a1 and F1a haplogroups. Multiple Cambodian populations cluster alongside Indian populations (Fig. 3b), suggesting a significant maternal genetic affinity. Nevertheless, only three haplogroups are shared between them (Fig. 4b). The disparity between NMDS (Fig. 3b) and haplogroup sharing (Fig. 4b) highlights complex population interactions. NMDS clustering can be shaped by diverse factors, including ancient ancestry, genetic drift, migration, and recent admixture. In contrast, haplogroup sharing reflects more recent genetic admixture. The limited haplogroup sharing implies that while there might be common ancestry or genetic influence, specific genetic lineages have not been shared in recent times. Changmai et al.13 estimated SAS admixture in the Khmer population (indigenous people of Cambodia) at approximately 7.2% using f-statistic-based ADMIXTOOLS 2 with autosomal SNPs, and estimated the admixture date at around 4 CE during the Angkor period using haplotype-based GLOBETROTTER.
However, a subsequent study found significant South Asian admixture (ca. 40–50%) in a protohistoric individual from the Vat Komnou cemetery at the Angkor Borei site in Cambodia. Radiocarbon dating places this individual in the early period of Funan (1 CE), suggesting the possibility of South Asian migration to MSEA and intermarriage with local populations before or during the early stage of state formation14. This study provides additional evidence from mtDNA that contemporary Cambodian populations carry genetic contributions from South Asian ancestry. Moreover, the result suggests a substantial contribution of South Asian women to the genetic makeup of present-day Cambodian populations.
The Ede and Giarai are two distinct populations residing in Vietnam, both of which have Austronesian language roots due to cultural diffusion22. Machold et al.23 found that the Y-haplogroups R1a-M420 and R2-M479, associated with west Eurasia, occurred at low frequencies in the Ede population (8.3% and 4.2%, respectively) and in the Giarai population (3.7% and 3.7%). Changmai et al.13, using the qpAdm, qpGraph, and SOURCEFIND methods, consistently inferred the presence of South Asian ancestry in the Ede and Giarai populations (7.5 ± 2.1% and 7.4 ± 2.0%, respectively, as inferred by qpAdm). The combined genetic evidence from Y-haplogroups and autosomal DNA admixture strongly supports the relationship between the Ede, Giarai, and South Asian populations13,23. The NMDS plot (Fig. 3b) positions Ede and Giarai closer to South Asian populations than other Vietnamese groups are. However, the gap between them remains wider than that seen for the Cambodian and Myanmar populations. Therefore, our findings suggest that while Ede and Giarai show some maternal lineage affinity with South Asians, mtDNA evidence alone is insufficient to establish the contribution of South Asian women to their genetic makeup.
Building upon the data obtained from the study conducted by Kutanan et al.11, we have taken an additional step by visualizing and representing the genetic association between Mon and South Asian populations in a phylogenetic tree. Interestingly, Mon and Punjabi individuals in Lahore, Pakistan, share several haplogroups in the M clade, including M, M30, M45, M4a, and M5a, suggesting a common ancestral population. Mon also shares some haplogroups in the M clade, such as M40a1, M45a, and M5c1, with the Indian Telugu population in the UK. The Telugu-speaking people primarily reside in the states of Andhra Pradesh and Telangana in South India, while Punjabi-speaking people are predominantly found in the Punjab region spanning across India and Pakistan. It is surprising to note that Mon harbors haplogroups from both the South and North Indian subcontinents. The qpAdm analysis infers approximately 12% South Asian ancestry within the Mon population13. Additionally, the Mon group exhibits minor occurrences of West Eurasia-associated haplogroups, including J (5%) and R (16%)24. Furthermore, we observed that Mon shares specific haplogroups within the D clade (including D4j1a1, D4j1b, and D4g2a1c), with the Wanchoo people (an indigenous Tibeto-Burman ethnic group residing in the Northeast Indian state of Arunachal Pradesh), as well as with the Gallong people (a small tribal group in the West Siang district of Arunachal Pradesh, who speak a Tibeto-Burman language)25. This observation aligns with the findings of Changmai et al.13 and Kutanan et al.15, who reported the presence of additional ancestry from a Tibetan-related source in the Mon population. They hypothesized that this Tibetan-related ancestry in Mon might have been acquired through interactions with Sino-Tibetan speaking populations in Myanmar. 
Based on the aforementioned observations, we have performed additional analysis to further investigate the relationship between the Mon population and Sino-Tibetan speaking populations applying f4-statistics in the form of f4(Tibeto-Burman, Dai; Mons, other Thailand groups) and f4(Tibeto-Burman, Han; Mons, other Thailand groups). The result of f4-statistics (Suppl. Fig. S4-S5) supports the existence of the Tibeto-Burman genetic component in Mon populations as all f4-statistic values are positive (|Z| > 2 in most cases), which indicates that the Tibeto-Burman groups share more genetic drift with Mons than with other Thailand groups. Our results confirm the previous observations.
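The logic of the f4 test described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the allele frequencies are simulated, the helper names (`f4_terms`, `f4_with_z`, `simulate`) are hypothetical, and the jackknife blocks are simple contiguous index chunks rather than genomic-position blocks as used in dedicated software. The sketch only shows why a positive f4(Tibeto-Burman, Dai; Mon, other) with |Z| > 2 indicates excess shared drift between the first and third populations.

```python
import numpy as np

def f4_terms(p_a, p_b, p_c, p_d):
    """Per-SNP f4 terms (a - b) * (c - d) from allele-frequency arrays."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (p_a, p_b, p_c, p_d))
    return (a - b) * (c - d)

def f4_with_z(p_a, p_b, p_c, p_d, n_blocks=20):
    """Return the f4 estimate and a Z-score from a delete-one-block jackknife.

    Assumption: blocks are contiguous chunks of SNP indices (illustrative only).
    """
    terms = f4_terms(p_a, p_b, p_c, p_d)
    estimate = terms.mean()
    blocks = np.array_split(terms, n_blocks)
    total, n = terms.sum(), terms.size
    # Re-estimate f4 with each block removed in turn.
    leave_one_out = np.array(
        [(total - blk.sum()) / (n - blk.size) for blk in blocks]
    )
    # Standard jackknife standard error for (approximately) equal-size blocks.
    se = np.sqrt(
        (n_blocks - 1) / n_blocks
        * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    )
    return estimate, estimate / se

def simulate(n_snps=5000, seed=0):
    """Simulated frequencies: 'tb' and 'mon' share an extra drift component."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.3, 0.7, n_snps)      # ancestral frequencies
    shared = rng.normal(0.0, 0.05, n_snps)    # drift shared by tb and mon only
    tb = base + shared + rng.normal(0.0, 0.02, n_snps)
    mon = base + shared + rng.normal(0.0, 0.02, n_snps)
    dai = base + rng.normal(0.0, 0.02, n_snps)
    other = base + rng.normal(0.0, 0.02, n_snps)
    return tb, dai, mon, other

tb, dai, mon, other = simulate()
est, z = f4_with_z(tb, dai, mon, other)
print(f"f4(TB, Dai; Mon, other) = {est:.5f}, Z = {z:.1f}")
```

Swapping the last two populations flips the sign of the statistic, which is why the ordering in f4(Tibeto-Burman, Dai; Mons, other Thailand groups) determines that a positive value points to Mon rather than to the comparison group.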
Our study identifies an initial population expansion phase of around 22,000 to 26,000 YBP, contrasting with an earlier study proposing an expansion of ca. 50,000 YBP11. This discrepancy can be attributed to the different mitochondrial mutation rates used in the respective studies. The mutation rate of mtDNA is not constant but fluctuates over time. This fluctuation is influenced by factors such as population size and dynamics. In growing populations, there is a higher number of lineages, leading to an increased accumulation of mutations per lineage. These lineages also persist for longer periods, resulting in an accelerated evolutionary rate that surpasses the mutation rate. Conversely, in declining populations, the evolutionary rate decreases, as indicated in a previous study26, which observed evolutionary rates ranging from 1.91 × 10⁻⁸ (95% CI 1.72–2.10 × 10⁻⁸) mutations per site per year for the oldest period (40,160 ± 4658 years ago) to 4.33 × 10⁻⁸ (95% CI 3.90–4.82 × 10⁻⁸) for the most recent period. We used a mutation rate of 4.33 × 10⁻⁸ because we analyzed the present-day population. In contrast, Kutanan et al.11,12 used mtDNA data divided into coding and noncoding regions, with mutation rates of 1.708 × 10⁻⁸ and 9.883 × 10⁻⁸, respectively. When we changed the mutation rate to 2.285 × 10⁻⁸, as previously used by Maier et al.27, while keeping all other parameters fixed, the population expansion shifted to approximately 50,000 years ago, in line with the findings of Kutanan et al.11 (Suppl. Fig. S3). Bayesian methods provide a probabilistic framework for estimating coalescence times and inferring demographic history through the integration of prior information, such as mutation rates and evolutionary models, with observed genetic data. The mutation rate noticeably shapes the outcome, as shown by our empirical findings presented in Fig. 6, Suppl. Fig. S3, and Suppl. Table S4.
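The size of the shift reported above is roughly what a simple clock rescaling predicts. The following Python sketch assumes a strict molecular clock, under which a time estimate scales inversely with the assumed rate; it is a back-of-the-envelope check, not a substitute for rerunning the Bayesian analysis, and the function name `rescale_time` is hypothetical.

```python
def rescale_time(time_years, rate_used, rate_new):
    """Rescale a time estimate when the assumed clock rate changes.

    Under a strict clock the inferred divergence is fixed, so
    time * rate stays constant and time scales with rate_used / rate_new.
    """
    return time_years * rate_used / rate_new

RATE_THIS_STUDY = 4.33e-8   # mutations per site per year (rate used in this study)
RATE_MAIER = 2.285e-8       # rate previously used by Maier et al.

for t in (22_000, 26_000):
    rescaled = rescale_time(t, RATE_THIS_STUDY, RATE_MAIER)
    print(f"{t} YBP at 4.33e-8  ->  about {rescaled:,.0f} YBP at 2.285e-8")
```

Rescaling the 22,000 to 26,000 YBP window this way gives roughly 42,000 to 49,000 YBP, close to the ca. 50,000 YBP obtained with the lower rate.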
Different studies have adopted different mtDNA partitions and mutation rates, yielding varied results. When estimating the age of a haplogroup, a higher mutation rate leads to younger estimates, while a lower rate produces older ones, because the same number of observed mutations is accumulated in less time at a higher rate. Similarly, in the Bayesian skyline plot, the mutation rate calibrates the molecular clock used to estimate demographic features such as population size changes over time. The critical role of mutation rates in these analyses highlights the necessity for accurate estimation. However, precise estimation is challenging because the rate fluctuates over time, and various rates have therefore been adopted across studies. This variability can lead to differences in age estimates and demographic inferences. This predicament prompts us to explore innovative solutions.
The initial population expansion phase (Fig. 6) corresponds to the Last Glacial Maximum (LGM, 19,000–26,500 years ago)28. During this period, Earth underwent intense cold and widespread glaciation, affecting numerous regions and leading to harsh environmental conditions. However, Southeast Asia remained relatively more habitable. Furthermore, the sea level dropped significantly, reaching a minimum of ca. 120 meters below the present level29. The lowered sea level led to the emergence of habitable lands with abundant resources, creating favorable conditions for the expansion and admixture of modern human populations30. This environmental change facilitated migration across Southeast Asia31. Following the LGM (around 18,000 years ago), the flooding of the Sunda shelf (marine transgression)32,33 likely caused a contraction of inhabitable areas towards inland regions, stabilizing population expansion34,35. During the LGM, several new haplogroups emerged, providing additional evidence for population expansion. Notable among them are the Southeast Asian lineage M91 (25,600 years ago), the Northeast Indian haplogroups M49e (25,100 years ago) and M49e1 (20,200 years ago), and the Indochinese R9b (19,000 years ago)36,37. We also observed a second phase of population expansion occurring between 3,800 and 2,500 YBP, which had not been previously documented in studies on mtDNA in the Thai population. Moreover, the sudden increases in effective population size might indicate the arrival of new groups or gene flow from neighboring regions. Gignoux et al.38 demonstrated a global Neolithic expansion through mitochondrial lineage analysis. They also indicated that the population size of Southeast Asia began to expand around 4,700 years ago (CI: 3,000–5,700 years ago), which aligns with our findings.
This period corresponds to the Neolithic demographic transition, characterized by the adoption of agriculture by prehistoric societies, resulting in increased food production and decreased mobility compared to foraging practices. In Southeast Asia, this transition took place between 2500 and 1500 BCE and brought about significant changes in the biological, linguistic, and cultural evolution of the region39. Around 6000–4000 BCE, rice (Oryza sativa japonica) was fully domesticated in the mid-lower Yangtze River valley of central China40. Subsequently, rice and millet farmers from the Yangtze and Yellow River regions migrated southward through different routes, reaching Baiyangcun in Yunnan around 2650 BCE41. By 2200–2000 BCE, they had reached coastal Vietnam and Thailand, and by 1700–1500 BCE, they had expanded to the interior Khorat Plateau, as reported by Higham et al.42. Archaeological excavations have yielded domestic rice, pottery styles, and tools that demonstrate a connection to southern China. Notable examples include Khok Phanom Di (ca. 2000–1500 BCE), Ban Chiang, and Non Nok (ca. 1100–1000 BCE) as well as Tha Kae and Khok Charoen (ca. 2000/1800–1100 BCE)43. In addition to archaeological findings, studies of autosomal DNA have provided supporting evidence for a subsequent increase in the effective population size. Neolithic populations in MSEA were a mixture of deeply diverged East Eurasian lineages and East Asian agriculturalists who migrated from South China approximately 4000 years ago6,7, and the genetic structure of Bronze Age individuals from Northern Vietnam (ca. 2000 years ago) suggests another migration wave from southern China to MSEA6,7.
In summary, our study investigated mtDNA evolution within the Thai population, focusing on SAS admixture. We identified 20 novel haplogroups from 166 randomly selected individuals. Haplogroup distribution was found to vary across populations and countries. Maternal lineage affinities connect MSEA and South Asian populations, highlighting ancestral genetic association and the significant influence of South Asian women. Notably, f4-statistics indicate a Tibeto-Burman genetic element in Thailand's Mon population. Our findings indicate two population expansion phases, aligning with the LGM and Neolithic transition. Importantly, this work enhances our understanding of the mtDNA landscape of Thailand. Moreover, it also emphasizes the critical role of prior knowledge in the Bayesian framework for accurate inference of demographic history.