Ancient uniparental DNAs in distinguishing the competing theories of molecular evolution and modern human origins

Analyses of extant people have resulted in two models for the uniparental DNA phylogenetic trees of modern humans, rooted in either Africa or East Asia. The Africa model is based on the neutral theory. The Asia model is derived from the maximum genetic diversity (MGD) theory. To test the two competing theories, we examined published data of ancient uniparental DNAs. Many ancient samples belonging to a terminal haplogroup were found to have mutated in only some, but not all, of the sites that define a more basal haplogroup. This pattern was found for the non-controversial haplogroups shared by the two competing models, and also for the haplogroups specific to the Asia model. Furthermore, many ancient samples that do not belong to some of the haplogroups of the Africa model nonetheless had mutations in them, which makes it impossible to unambiguously assign them to a haplogroup within the Africa model. Finally, uniparental DNAs of archaic humans were found to carry some modern alleles present in the first uniparental DNAs in the Asia model, indicating convergent evolution. Therefore, the data from ancient DNAs have verified the MGD theory and the actual existence of the haplogroups specific to the Asia model.


Introduction
The field of molecular evolution began with analyses of protein sequence alignments in the early 1960s. The most astonishing finding of those studies is the genetic equidistance phenomenon, in which a simple species, e.g., fish, is approximately equidistant in amino acid identity to all the more complex species of land vertebrates such as reptiles, birds, pigs, and humans (Margoliash, 1963). This observation has been interpreted by the molecular clock hypothesis, which holds that all species have similar mutation rates (Kumar, 2005). The field then viewed this limited interpretation of an observed reality as the reality itself, and the neutral theory was next invented to try to explain the molecular clock "reality" (Kimura, 1968, 1969). The molecular clock and the neutral theory then served as the theoretical foundation for most of the phylogenetic results in the past 60 years, including the evolutionary trees of uniparental DNAs of modern humans with roots in Africa. A tacit assumption of this framework is that all observed genetic distances are still increasing with time and have not yet reached maximum saturation, since mutations always occur at new, not previously mutated, sites under this framework (the infinite site model). Other assumptions include that most DNAs are selectively neutral and that the number of neutral sites in a genome is infinite.
Despite the overwhelming number of studies based on this neutral framework, most lack independent confirmation, and both self-inconsistencies and contradictions among different studies are abundant. After careful reviews of the vast literature in the phylogenetics field, many researchers have concluded that the neutral theory is an unsatisfying explanatory framework for evolutionary phenomena (Huang, 2016; Kern and Hahn, 2018; Leffler et al., 2012; Ohta and Gillespie, 1996). The molecular studies of human origins are all about interpreting the genetic diversity patterns of humans, and yet the mystery of what determines genetic diversity remains to be fully solved (Huang, 2016; Leffler et al., 2012). The dogma that less conserved DNAs are less functional has been overturned (Kasinathan et al., 2020; Wang et al., 2020). There is a real possibility that both experimental and theoretical biology could in the near future demonstrate that most DNAs are not neutral (Dunham et al., 2012; Gates et al., 2021; Huang, 2016; Pouyet et al., 2018; Quinodoz et al., 2021; Tsuzuki et al., 2020). That would overthrow most of the results in the field. This might seem hard to believe, but it is not, as most results are far from proven beyond doubt or free from uncertain assumptions.
A competing alternative framework of molecular evolution, the maximum genetic diversity (MGD) theory, published in 2008, was inspired by an independent rediscovery of the same genetic equidistance phenomenon (Huang, 2009, 2016). It turns out that the equidistance phenomenon is in fact a result of maximum saturation that is completely independent of time and mutation rate but is directly related to organismal complexity, and so the molecular clock (and in turn the neutral theory) interpretation of it could not be further from the truth (Hu et al., 2013; Huang, 2008, 2012). Complex species have a lower fraction of sites that can freely tolerate substitutions, and so the maximum dissimilarity between a complex species and a simpler one is mostly defined by the higher fraction of sites in the simpler species that can freely tolerate substitutions. The distinguishing feature of maximum distance is the enrichment of recurrent mutation sites, where independent mutations occur at the same sites in different lineages but lead to different amino acids (Huang, 2010; Wang et al., 2020). The MGD theory accepts the proven virtues of the neutral theory as a good description of truly neutral variants still at the linear phase of sequence divergence prior to maximum saturation, but differs from it in several fundamental aspects: (1) most observed genetic distances or diversities are at maximum saturation; (2) convergent mutations, back mutations, and recurrent mutation sites are common; (3) most DNAs are functional or under selection. These notions of MGD are fully supported by known data (Huang, 2009, 2016). The differences between the two theories guarantee that phylogenetic inferences based on the MGD theory would be vastly different from those based on the neutral theory. Indeed, with regard just to humans, the MGD theory has produced two very dramatic results.
First, although the chimpanzee is the closest to humans in raw DNA sequence similarity, it is only among the closest to humans in phylogeny (Huang, 2012). All great apes within the pongid clade are equally related to humans in phylogeny, as was commonly accepted before the molecular era. The closer similarity in DNA of chimpanzees to humans in 70% of the genome, compared to other great apes, comes from convergent evolution. So does the closer similarity of gorillas to humans in 30% of the genome or the closer similarity of orangutans to humans in 1% of the genome, which has been explained away by the unprovable idea of incomplete lineage sorting (Hobolth et al., 2011; Scally et al., 2012). Secondly, different major human populations have independently evolved for close to 2 million years, and the roots of the uniparental DNAs of modern humans are in East Asia rather than in Africa (Yuan et al., 2017).
Two competing models of modern human origins termed "Multiregional" and "Recent Out-of-Africa" (OoA) have long been proposed (Brauer, 1982; Henn et al., 2018; Scerri et al., 2018; Stringer and Andrews, 1988; Wolpoff et al., 1984; Wu, 2004). The multiregional model considers extant people of any given region, e.g., East Asia, to be largely descended from ancient people living in the same region at ~200-2000 ky ago, such as Peking man, with some gene flow having occurred between populations in different regions (Wu, 2004). The model has support from fossils and cultural remains, but molecular evidence had been lacking until recently (Yuan et al., 2017). Analyses based on the MGD theory suggest multiregional origins for autosomes but root both uniparental DNAs in East Asia (Yuan et al., 2017). The rooting of the mtDNA tree in Asia independently confirms an earlier paper (Johnson et al., 1983), and has been verified by findings from ancient mtDNAs that show the earlier appearance of haplogroup R compared to N (Zhang and Huang, 2019a, b). It has also been found that all non-African Y haplogroups originated in southern East Asia, contradicting the serial founder model of OoA (Hallast et al., 2021).
The OoA model posits that modern humans originated in Africa and then migrated to Eurasia, largely replacing local archaic humans with limited genetic mixing (Brauer, 1982; Cann et al., 1987; Green et al., 2010; Henn et al., 2018; Scerri et al., 2018; Stringer and Andrews, 1988). The rooting of uniparental DNAs in Africa relies on the assumption of neutral mutations throughout the entire genome (Cann et al., 1987; Ke et al., 2001; Underhill et al., 2000). The infinite site assumption, which states that each mutation appears only once in evolutionary history, emerges from the neutral framework, and the related inference of derived alleles underlies the tree topology of the uniparental DNAs and the rooting of the Africa model (Ingman et al., 2000; Underhill et al., 2000). However, mutation saturation and natural selection are far more common than initially thought, which would invalidate the currently accepted inference of derived alleles (Chen et al., 2020; Huang, 2016; Lei et al., 2018; Teitz et al., 2018; Yuan et al., 2017; Zhu et al., 2015). The OoA model of Y lacks self-consistency, as many haplogroups contain derived alleles that would define other haplogroups, which violates the method underlying the Africa model in the first place. There are numerous incompatible alleles in the Y tree of OoA that show recurrent mutations, violating the infinite site model (Poznik et al., 2016). The mtDNA tree of OoA also fails the self-consistency test and contains a large number of back mutations that violate the infinite site assumption required to build the OoA tree in the first place. These mutations can be found in the official mtDNA tree, phylotree (http://www.phylotree.org/), where they are designated by an exclamation mark (!). The total number of these back mutations in phylotree is 1180 by our manual count. Models should be simplifications that remain consistent with reality. It is unacceptable to assume the infinite site model (no recurrent mutations) when recurrent mutations are common.
The more correct or realistic model is that recurrent mutations are common, which is the position of the MGD theory.
In addition to being derived from a different evolutionary theory, the Out of East Asia (OoEA) model of uniparental DNAs differs from the OoA model in several fundamental ways (Figure 1 and Figure 2). First, haplogroups in OoEA are defined by alleles shared by all members within a haplogroup, regardless of their derived status. Secondly, mutations that are shared by the terminal branches and form the basal branches come from both common ancestry and convergent mutations in OoEA but only from common ancestry in OoA (Figure 3).
It has been shown that different human populations share more alleles in fast evolving SNPs than in slower evolving ones, which indicates convergent mutations in fast evolving DNAs (Yuan et al., 2017). Finally, the rooting in OoEA relies on the reasoning that the original haplotype should be the common type shared by most individuals, since mutations leading to alternative types should be rare events (Johnson et al., 1983; Yuan et al., 2017; Zhang and Huang, 2019a, b). This reasoning has been independently arrived at twice (Johnson et al., 1983; Yuan et al., 2017). The ancestor type should have many alleles different from the outgroup to qualify as a modern type. Thus, alleles representing the early basal branches in OoA are all already present in the first uniparental ancestor in OoEA (Figure 4). Some of those alleles may revert to archaic alleles to give rise to new haplogroups as modern humans migrated to new environments and admixed with archaic humans (Figure 4). Coevolution alongside admixed autosomes may cause certain sites in modern uniparental DNAs to mutate back to archaic alleles, or cause certain sites in archaic uniparental DNAs to mutate to modern alleles. In contrast, the rooting in OoA relies on similarity to outgroups and on the assumption of constant mutation rates. Populations with the greatest genetic diversity are viewed as the earliest evolving.
Also, among polymorphic sites, populations that share more alleles with the outgroups are considered the closest to the root (Ingman et al., 2000; Underhill et al., 2000). Both of these lines of reasoning are invalid. Genetic diversity today is at maximum saturation and is not related to time of evolution. Greater genetic diversity or long branch length, such as Y haplogroup L00 and mtDNA haplogroup L0a in certain populations like the San people in Africa, may be a result of natural selection related to their hunter-gatherer lifestyle, which selects for greater immunity in response to a more primitive lifestyle (Nemat-Gorgani et al., 2018). The reality of convergent mutations or back mutations suggests that one cannot use sharing of ancestral alleles with the outgroups among polymorphic sites of humans to define the oldest branch in a phylogenetic tree.
Phylogenetic trees are largely speculation about history based on genetic variation patterns of present day samples. Nothing could be more powerful than ancient DNAs in testing whether these speculations are true. In particular, ancient DNAs could be used to test whether mutations in terminal branches do not occur until the basal branches have fully formed, as expected from the neutral theory and OoA. In contrast, OoEA and MGD expect some fraction of the mutations in the shared basal branches to occur only after the terminal branches have already formed, as a result of convergent mutations in different terminal branches. Therefore, ancient samples belonging to a terminal haplogroup are expected to have mutated in only some, but not all, of the sites that define a more basal haplogroup (Figure 3). Furthermore, if this pattern is indeed the case, it can be further used to distinguish a true haplogroup from a fake one and hence a true tree from a fake tree: a fake basal haplogroup should not show such a partial mutation pattern and should instead be found fully mutated in ancient samples. For example, Y haplogroup B belongs to either the AB basal haplogroup in OoEA or the BT basal haplogroup in OoA. If ancient B samples show partial mutations in AB-defining sites but always complete mutations in BT-defining sites, then AB is real and BT is fake. As can be seen from the OoEA tree, the first original modern Y already carried BT alleles in all BT-defining sites, and later mutations in these sites would lead to formation of the basal haplogroup A00A1b in OoEA (Figure 4A). BT alleles are already fully formed in the first modern Y chromosome and are expected to be fully present in all B samples regardless of whether they are ancient or present day samples.
The biggest difference between the two competing models of uniparental DNA trees comes from how they handle the basal mega-haplogroups in the Africa model upstream of G in the Y tree (Figure 1) and upstream of L3 in the mtDNA tree (Figure 2). These mega-haplogroups are thought to result from new mutations since the original type in the Africa model. In contrast, the Asia model considers the alleles of mega-haplogroups like CT of Y and R of mtDNA to be the ancestor type carried by the first modern human individual, and it is the non-CT haplotypes such as AB or non-R haplotypes such as NML that have acquired new mutations. We here used ancient DNAs to test these two competing models of modern human origins. The results strongly verified the Asia model and invalidated the Africa model.

Materials And Methods
Ancient samples were selected for analysis based solely on age, sequencing depth, and availability known to the authors. Whole genome sequencing data or, in a few cases, targeted capture array sequencing data of ancient DNAs in BAM, fastq, or vcf files were downloaded from links provided by the previous publications (see Supplementary Table S1 for a list of the ancient samples and their references). Targeted capture array sequencing data were less informative as they cover only a limited range of Y chromosome sites, but they would not bias the results toward either of the two competing origin models, nor would they cover more informative sites in one model than in the other. Most of these ancient DNA sequences were available for download as BAM files.
Some were in fastq format, and BWA 0.7.10 was used to align the fastq reads to the human genome GRCh37 using standard parameters. The mapped reads were then filtered and sorted using SAMtools 0.1.19 and Picardtools 1.107 (http://picard.sourceforge.net) (Li, 2011). Picardtools 1.107 was also used to remove duplicates. The genotypes of ancient DNAs were called under the haploid setting using GATK-3.2-2 or SAMtools, with the mapping quality threshold set at Q=30 and the base quality threshold at 30 (McKenna et al., 2010).
Haplogroup-defining SNPs were identified using the ISOGG Y chromosome tree version 15.73 updated on August 5, 2020 (https://isogg.org) and the 1kGP dataset (http://www.internationalgenome.org). Briefly, we downloaded the ISOGG Y chromosome SNP index file and cleaned it up by removing duplicated positions. We also downloaded the Y chromosome SNP data of the 1kGP, which had 60353 SNPs (Poznik et al., 2016). We merged the ISOGG sites with the 1kGP sites to produce haplotype assignments for SNPs in the 1kGP data. ISOGG only has SNP sites whose derived alleles (different from chimpanzee alleles) can define a haplotype.
In the Out of East Asia tree, haplotypes are defined by shared alleles regardless of whether they are derived. Thus, many SNPs that define a haplotype in the OoEA model are ancestral and would not be found in the ISOGG list of SNPs. To find these, we made use of the 1kGP dataset, in which the haplotype information of a SNP had been assigned based on the ISOGG data. 20257 of the 60353 SNPs in the 1kGP data were assigned a haplotype in this way. We next counted the number of minor alleles of each SNP in the 1kGP dataset among the 1233 male samples in 1kGP. If a SNP has as its minor allele an ancestral allele that is present in a group of individuals that share a haplotype as defined by the derived alleles of some SNPs present in ISOGG, it was assigned either that same haplotype or a haplotype specific to the Asia model. The OoEA-specific haplogroups are defined by both the derived alleles of certain SNPs and the ancestral alleles of some other SNPs whose derived alleles define a different haplotype specific to the Africa model. For example, 62 SNPs in 1kGP were assigned to haplotype F as defined by ISOGG, and their derived alleles are present in 849 samples belonging to F while their minor alleles are present in 384 samples that had A, B, C, D, or E haplotypes. In the OoEA model, the minor alleles of these SNPs would define the ABCDE mega-haplogroup. We found 93 SNPs that are not present in ISOGG as F-defining but have minor alleles present in the same set of 384 samples. These 93 SNPs were thus assigned to the ABCDE mega-haplogroup, thereby bringing the total number of ABCDE-defining SNPs to 155 (62+93). In total, 21659 SNPs in the 1kGP dataset were assigned a haplotype name as found in the Asia model.
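The carrier-set matching described above can be sketched as follows. This is a minimal illustration on toy data, not the pipeline actually used: the function name `assign_by_carrier_set`, the sample identifiers, and the SNP names are hypothetical, and exact equality of carrier sets is assumed as the matching criterion.

```python
from typing import Dict, FrozenSet


def assign_by_carrier_set(
    labeled: Dict[str, FrozenSet[str]],
    unlabeled: Dict[str, FrozenSet[str]],
    label: str,
) -> Dict[str, str]:
    """Give `label` to unlabeled SNPs whose minor-allele carriers match a labeled SNP.

    labeled:   SNP name -> samples carrying the minor allele, for SNPs already
               tied to the haplogroup through ISOGG (e.g. F-defining SNPs).
    unlabeled: SNP name -> samples carrying the minor allele, for SNPs that
               are absent from ISOGG.
    """
    reference_sets = set(labeled.values())
    return {
        snp: label
        for snp, carriers in unlabeled.items()
        if carriers in reference_sets
    }


# Toy data (hypothetical): five non-F samples stand in for the 384 A-E samples.
non_f_samples = frozenset({"s_A", "s_B", "s_C", "s_D", "s_E"})
labeled = {"snp_F_1": non_f_samples, "snp_F_2": non_f_samples}
unlabeled = {
    "snp_x": non_f_samples,               # same carrier set, so assigned
    "snp_y": frozenset({"s_A", "s_B"}),   # different carrier set, so skipped
}

new_assignments = assign_by_carrier_set(labeled, unlabeled, "ABCDE")
total_abcde = len(labeled) + len(new_assignments)
print(new_assignments, total_abcde)
```

With the counts reported in the text, the same bookkeeping gives 62 + 93 = 155 ABCDE-defining SNPs.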
The chi-squared test in GraphPad Prism 6 was used to test whether the number of unexpected mutations, or the absence of expected mutations, in a mega-haplogroup is significantly different from the background error rate in calling the genotypes. The background error rate of calling was determined by calling the sites defining all the haplogroups from L to T (the fraction of unexpected calls or mutations among all informative sites called). If a sample belonging to a particular haplotype is found to have the defining alleles or mutations of an unrelated haplotype, such alleles or mutations are deemed unexpected. If the sample has the non-defining alleles of an unrelated haplotype, such alleles are deemed expected. If the sample has the defining alleles of the haplotype to which it belongs, the alleles are deemed expected. If, on the other hand, it has the non-defining alleles of the haplotype to which it belongs, the alleles are deemed unexpected. Haplogroups L to T were selected as controls because most of the informative cases in this study involved the early basal haplogroups A to F.
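The four expected/unexpected rules above reduce to a single comparison, which the sketch below makes explicit. It is an illustration only: the function names and the boolean encoding of each call are assumptions, not part of the original analysis.

```python
from typing import Iterable, Tuple


def classify_call(belongs: bool, has_defining_allele: bool) -> str:
    """Label one genotype call at a haplogroup-defining site.

    belongs:             the sample belongs to the haplogroup the site defines.
    has_defining_allele: the sample carries the defining (mutated) allele.
    A call is expected exactly when the two flags agree.
    """
    return "expected" if belongs == has_defining_allele else "unexpected"


def unexpected_fraction(calls: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of unexpected calls among all informative sites called."""
    calls = list(calls)
    if not calls:
        raise ValueError("no informative sites were called")
    unexpected = sum(
        1 for belongs, has_allele in calls
        if classify_call(belongs, has_allele) == "unexpected"
    )
    return unexpected / len(calls)


# Example: a sample outside L-T with 2705 informative L-T sites called,
# 3 of which carry a defining allele (the 3/2705 case reported for I10871).
lt_calls = [(False, True)] * 3 + [(False, False)] * 2702
background_rate = unexpected_fraction(lt_calls)
print(background_rate)
```

Encoding each call as a pair of booleans keeps the rule table and the background-rate calculation in one place, so the same function can be reused for the mega-haplogroup sites that are compared against this background.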

Partial mutations in both terminal and basal haplogroups in ancient Y chromosomes
To study whether ancient samples of modern humans had mutated in only a fraction of the sites that define the present day haplogroups to which they belong, regardless of whether these are internal or terminal branches, we studied a total of 111 published ancient Y chromosome sequences of relatively high coverage (for the list of samples and references, see Supplementary Table S1) for informative sites (60353 SNPs) as found in the 1000 genomes project (1kGP) or in the Y-DNA haplogroup tree from the International Society of Genetic Genealogy (ISOGG, http://www.isogg.org, version 15.73) (Poznik et al., 2016). These samples were selected based solely on age (1000-45000 years ago), sequencing depth, and availability known to the authors. For all of these samples we called genotypes using downloaded BAM files or BAM files made from downloaded fastq files.
We first examined the non-controversial haplogroups that are shared by both the Africa and Asia models. We looked for ancient DNAs that showed mutations in some but not all of the sites that define a basal haplotype to which they belong. Of the 111 ancient samples, 32 were found to have partial mutations in a non-controversial basal haplogroup even though they all had mutations in a more terminal haplogroup (Table 1). The basal haplogroups showing partial mutations included B2, C, DE, D, E, G, IJ, I, J, NO, Q1b, and P. For example, I10873 had 57 of 59 informative sites mutated for the basal haplogroup B2-M182 and 2 out of 2 informative sites mutated for the more terminal branch B2b1, thus showing mutation in the terminal haplogroup B2b1 prior to completion of mutation in the basal haplogroup B2. To exclude the possibility of sequencing or calling errors, we analyzed all samples for mutations in sites that define the haplogroups from L to T (L-T). I10873 showed 0 unexpected mutations in 2027 informative sites defining L-T. Therefore, the 2 sites among the 59 informative sites for B2 that were found to be non-mutated are highly unlikely to be due to sequencing or calling errors (P < 0.01, 2/59 vs 0/2027). The results showed that ancient samples of a particular terminal haplogroup often had only partial mutations in the basal haplogroups to which they belong that are non-controversial or present in both the Africa and the Asia model.
Table 1 Mutation patterns of ancient Y chromosomes in non-controversial haplogroups. Haplogroup-defining SNPs were identified using ISOGG and the 1kGP dataset. The total combined number of sites from ISOGG and 1kGP was used in the analysis. The number of informative sites and the number of mutated sites are shown for the non-controversial haplogroups. * P<0.01, chi-squared test, comparing the fraction of non-mutated sites in a haplogroup and the fraction of unexpected mutations in haplogroups L-T. If a sample belonged to a haplogroup within L-T, such as O, Q, or R, the counting of sites within L-T excluded the relevant haplogroups to which the sample belongs, as shown.
We next examined the ancient samples for mutations in the basal mega-haplogroups that are differently classified by the two competing models. We reasoned that if a basal mega-haplogroup is real, it should show partial mutations in the haplogroup-defining sites. If it is not real, it would carry the complete set of haplogroup-defining alleles.
The Asia model differs from the Africa model in the mega-haplogroups ancestral to haplogroup F (Figures 1 and 2). To test which model is true, we examined 76 ancient DNAs of the haplotype A, B, C, D, E, or F. The results showed that 36 of these had partial mutation patterns for the basal branches specific to the Asia model such as A (A00-A1b), AB, ABDE, and ABCDE (Table 2). In contrast, all of these 36 ancient DNAs carried the complete set of haplotype-defining alleles for the basal haplogroups specific to the Africa model, including A1, A1b, BT, CT, and CF. All ancient samples showed a very low rate of unexpected mutations in haplogroups L-T, and so the observed absence of mutation in certain sites in the basal haplogroups in the Asia model is unlikely to be due to miscalls (P<0.01, Table 2). For example, the I10871 sample had haplogroup A00 and the following mutation patterns in the mega-haplogroups of the Asia model: 5 mutated among 5 informative sites in A00A0, 8 mutated among 10 sites in A00A1a indicating partial mutations, 146 among 146 in A00A1b, 98 among 100 in AB indicating partial mutations, 3 among 3 in ABDE, and 65 among 67 in ABCDE indicating partial mutations. So, this sample showed partial mutations in A00A1a, AB, and ABCDE, and has thus served to validate the actual existence of these mega-haplogroups. It had a very low rate of unexpected mutations (3/2705) in haplogroups L-T that were likely due to sequencing or calling errors, which was significantly lower than the rate of non-mutated sites in these mega-haplogroups (P<0.01). Thus, the partial mutation patterns observed here were unlikely to be due to sequencing or calling errors. A partial list of sites that were non-mutated in certain ancient samples is presented in Table 3. To further confirm that these sites were not called in error, we verified that the reads in the BAM files covering these sites were indeed aligned properly (representative screen shots for sample I10871 are shown in Figure 5).
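As a worked illustration of the I10871 example above, the short sketch below flags a mega-haplogroup as partially mutated when some, but not all, of its informative sites are mutated. The counts are those quoted in the text; the function name and the data layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def partial_haplogroups(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return haplogroups with some, but not all, informative sites mutated.

    counts maps a haplogroup name to (mutated sites, informative sites).
    """
    return [
        name
        for name, (mutated, informative) in counts.items()
        if 0 < mutated < informative
    ]


# Counts reported in the text for sample I10871 (haplogroup A00).
i10871_counts = {
    "A00A0": (5, 5),
    "A00A1a": (8, 10),
    "A00A1b": (146, 146),
    "AB": (98, 100),
    "ABDE": (3, 3),
    "ABCDE": (65, 67),
}

partial = partial_haplogroups(i10871_counts)
print(partial)  # ['A00A1a', 'AB', 'ABCDE']
```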
Table 2 Mutation patterns of ancient Y chromosomes in the controversial haplogroups. For numbers in the cells, the first refers to the number of mutations in the OoEA-specific haplogroup, the second refers to the number of mutations in the OoA-specific haplogroup, and the third represents the number of informative sites. Numbers in bold highlight cases where only a fraction of the sites defining a basal haplogroup had been mutated. * P<0.01, chi-squared test, comparing the fraction of non-mutated sites in an OoEA-specific haplogroup and the fraction of unexpected mutations in haplogroups L-T.

[Table 2 body not recovered; column headers include Sample and Year BP.]
... (Hajdinjak et al., 2021), it is more likely for F6-620 to be ABCDE rather than F.
We also examined 35 ancient DNAs carrying haplogroups that were under F, including G, H, I, K, P, Q, and R (see Supplementary Table S1). Only one sample, Iceman, was found to have partial mutations in the A1b haplogroup of the Africa model (32 mutations out of 33 informative sites). None of the other samples showed partial mutations for the mega-haplogroups specific to the Africa model to which they are supposed to belong, such as A1, A1b, BT, CT, CF, and F. Thus, the rare occurrence (1 among 35 samples examined) of partial mutation patterns in the OoA-specific mega-haplogroups upstream of G among ancient samples under F is significantly different from the common occurrence (36 among 76 samples examined) of partial mutations in ancient DNAs belonging to A to E (1/35 vs 36/76, P < 0.01). This cannot be explained by miscalls, as that would require the miscalls to cluster in the A to E samples, which is unreasonable.
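The comparison of two proportions above can be reproduced approximately with a Pearson chi-squared test on a 2x2 table. The sketch below is an assumption-laden illustration rather than the exact GraphPad Prism procedure: it applies no continuity correction (whether Prism applied one is not stated in the text), and it uses the identity that the chi-squared survival function with one degree of freedom equals erfc(sqrt(x/2)). For tables with very small expected counts, such as 2/59 vs 0/2027, an exact test may be more appropriate.

```python
import math
from typing import Tuple


def chi_squared_2x2(
    events_a: int, total_a: int, events_b: int, total_b: int
) -> Tuple[float, float]:
    """Pearson chi-squared test (1 degree of freedom) for two proportions.

    Returns (chi-squared statistic, two-sided p-value), with no
    continuity correction applied.
    """
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the 2x2 table sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Partial-mutation samples: 1 of 35 under F versus 36 of 76 in A to E.
chi2_samples, p_samples = chi_squared_2x2(1, 35, 36, 76)

# Non-mutated B2 sites versus unexpected L-T calls in I10873.
chi2_sites, p_sites = chi_squared_2x2(2, 59, 0, 2027)

print(chi2_samples, p_samples)
print(chi2_sites, p_sites)
```

Under these assumptions both comparisons give P < 0.01, in line with the significance level reported in the text.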
All together, we found that 59 out of 111 ancient samples examined (Table 1 and Table 2) showed partial mutations in a basal haplogroup despite carrying alleles defining a more terminal haplogroup. This is consistent with expectations based on the MGD theory or convergent mutations (Figure 3).
Table 3 List of sites not yet mutated in the OoEA-specific mega-haplogroups but mutated in the wrong OoA-specific mega-haplogroups. Bold bases indicate the alleles found in the ancient DNAs. Derived allele data is from ISOGG. n.a., not available.
Further invalidating the mega-haplogroups specific to the Africa model, many ancient DNAs showed unexpected mutations in certain OoA-specific mega-haplogroups to which they do not belong (Table 2). Partial mutation in the OoEA-specific mega-haplogroups automatically means unexpected mutations in the wrong haplogroups specific to the OoA model. For example, the I10871 sample had the A00 haplogroup but was found to have mutations that would group it with certain OoA-specific mega-haplogroups to which A00 does not belong, including 2 mutations that would group it with A1b, 2 mutations that would group it with CT, and 2 mutations that would group it with F (Table 2 and Figure 5). So, partial mutation of I10871-A00 in the OoEA-specific mega-haplogroup AB (98 mutations among 100 sites) automatically means unexpected mutations in the wrong OoA-specific mega-haplogroup CT (2 mutations among 100 sites). The finding of ancient A00 samples carrying CT alleles is highly unusual and unexpected. Under the OoA model, it would mean that certain sites have mutated more than once (once in the A00 branch and once in the CT branch), which would violate the infinite site assumption that makes the construction of the Africa model possible in the first place. Also, such widespread unexpected mutations in the wrong basal haplogroups in ancient DNAs make the unambiguous assignment of haplogroups impossible. In contrast, all these ancient DNAs can be cleanly assigned to a basal haplogroup specific to the OoEA model. Taken together, the data strongly validated the OoEA model and invalidated the OoA model.

Partial mutations in both terminal and basal haplogroups in ancient mtDNAs
We also studied ancient mtDNAs to similarly examine whether the partial mutation pattern could be found in real basal haplogroups common to both the Asia and Africa models, and whether it could be found only in basal haplogroups specific to the Asia model but not the Africa model. We used published sequences for most of these ancient mtDNAs (downloaded from NCBI). We also downloaded BAM files from published accession numbers of certain ancient sequences and called mtDNA sequences using stringent filters. Among 48 ancient mtDNAs examined (for the list of samples and references, see Supplementary Table S1), six were found to show partial mutations in basal haplogroups, either real ones common to both origin models or the OoEA-specific basal haplogroups (Table 4).
None of the ancient DNAs showed partial mutations in basal haplogroups specific to the Africa model. For example, the I2966 sample belonged to L0k2 but lacked some mutations in the real basal haplogroups common to both origin models, including the absence of 1 mutation in L0a'b'f'g'k and the absence of 2 mutations in L0k (Table 4). I2966 also lacked some mutations in the OoEA-specific basal haplogroups, including the absence of 1 mutation in NML, of 1 mutation in L5'1'0, and of 1 mutation in L1'0 (Table 4). In contrast, none of the numerous ancient samples of haplogroup U examined here (Supplementary Table S1) showed partial mutations in the OoA-specific basal haplogroups to which U belongs, including L1'2'3'4'5'6, L2'3'4'5'6, L2'3'4'6, L3'4'6, L3'4, L3, N, and R. Furthermore, the absence of a mutation in a basal haplogroup in the OoEA model automatically means the presence of a mutation in a wrong basal haplogroup specific to the OoA model. For example, I2966-L0k2 lacked the mutation 16223T in the haplogroup NML but instead had the 16223A allele that would classify it to the haplogroup R (Table 4). Therefore, if the OoA model is true, it would be impossible to unambiguously assign certain ancient mtDNAs to a specific terminal haplogroup. Such difficulty, however, does not arise in the OoEA model. Thus, the partial mutation patterns of ancient mtDNAs observed here strongly validated the mtDNA tree rooted in Asia and invalidated the one rooted in Africa.

Table 4 List of non-mutated sites in ancient mtDNAs in the non-controversial haplogroups and in the OoEA-specific mega-haplogroups. SNPs represent mutations in the OoEA model and mutations are given in the format [modern base][position number][archaic base], e.g. "A189G". Modern alleles in these sites are those that were carried by the first original modern mtDNA in the OoEA model. Archaic alleles are those that were carried by the archaic humans prior to any admixture with modern humans or prior to any convergent evolution (see Figure 4B).
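The mutation notation used in the table caption, [modern base][position number][archaic base], can be read programmatically. The following is a minimal sketch, assuming labels consist of exactly one base, a position, and one base; the function names `parse_snp` and `allele_state` are hypothetical and are not part of the original analysis.

```python
import re

# Assumed label shape: one base, a position number, one base (e.g. "A189G").
SNP_LABEL = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_snp(label):
    """Split a label into (modern_base, position, archaic_base)."""
    match = SNP_LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised SNP label: {label!r}")
    modern, position, archaic = match.groups()
    return modern, int(position), archaic


def allele_state(label, observed_base):
    """Classify an observed base at the labelled site.

    Returns "modern", "archaic", or "other" (a base matching neither allele).
    """
    modern, _, archaic = parse_snp(label)
    base = observed_base.upper()
    if base == modern:
        return "modern"
    if base == archaic:
        return "archaic"
    return "other"


if __name__ == "__main__":
    print(parse_snp("A189G"))
    print(allele_state("G263A", "G"))
```

Under this reading, a sample that carries the archaic base at a site defining a basal haplogroup it belongs to would count as non-mutated at that site.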

Modern alleles in archaic uniparental DNAs
As modern humans migrated to new places and admixed with local archaic humans, coevolution alongside admixed autosomes may have caused certain sites in modern uniparental DNAs to mutate back to archaic alleles, or certain sites in archaic uniparental DNAs to mutate into modern alleles. Such coevolution is a fundamental part of the Asia model but is not acknowledged by the Africa model, whose neutral and infinite site assumptions rule out coevolution. We examined four previously published high-coverage Y chromosome sequencing datasets (two Denisovans and two Neanderthals) to see whether these archaic Y chromosomes carry modern alleles at sites where, according to the Africa model, they should not (Petr et al., 2020).

Table 5 Modern alleles in archaic Y chromosomes. The total combined number of sites from ISOGG and 1kGP was used in the analysis. For numbers in the cells, the first refers to the number of mutations in the OoEA-specific haplogroup, the second refers to the number of mutations in the OoA-specific haplogroup, and the third represents the number of informative sites. Numbers in bold highlight cases where only a fraction of the sites defining a basal haplogroup had been mutated. * P<0.01, chi-squared test, comparing the fraction of non-mutated sites in an OoEA-specific haplogroup and the fraction of unexpected mutations in haplogroups L-T.

We merged the genotypes of the archaic Y chromosomes with the Y-DNA haplogroup tree from the International Society of Genetic Genealogy (ISOGG, http://www.isogg.org, version 15.73) and also the list of Y chromosome SNPs from the 1000 genomes project (Poznik et al., 2016). We counted the number of modern alleles among informative sites for the major mega-haplogroups of modern humans, as well as the unexpected mutations in L-T haplogroups (Table 5).
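The counting step described above can be sketched as follows. This is an illustration only: it assumes that a site is "informative" when the sample has a genotype call there, and the data layout (dictionaries keyed by position), the function name `count_modern_alleles`, and the toy positions are all hypothetical rather than taken from ISOGG or 1kGP.

```python
def count_modern_alleles(defining_sites, calls):
    """Count modern alleles among informative sites of one haplogroup.

    defining_sites: {position: (modern_allele, archaic_allele)}
    calls:          {position: base} for sites where the sample has a call
    Returns (n_modern, n_informative).
    """
    n_modern = 0
    n_informative = 0
    for position, (modern, _archaic) in defining_sites.items():
        base = calls.get(position)
        if base is None:
            continue  # no call at this site: treated as not informative
        n_informative += 1
        if base == modern:
            n_modern += 1
    return n_modern, n_informative


if __name__ == "__main__":
    # Toy data with made-up positions, for illustration only.
    sites = {101: ("A", "G"), 202: ("C", "T"), 303: ("G", "A"), 404: ("T", "C")}
    sample_calls = {101: "A", 202: "T", 404: "C"}  # position 303 has no call
    modern, informative = count_modern_alleles(sites, sample_calls)
    print(f"{modern}/{informative} informative sites carry the modern allele")
```

Fractions of this form (modern alleles over informative sites) are what the text reports per archaic sample.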
Archaic Y chromosomes were found to carry a high fraction of modern alleles in sites that, according to the Asia model, differentiate the original modern Y (not yet affected by admixture) from the original archaic Y (Table 5), including A0000A000 (or A00T in the Africa model; A0000 is the Denisovan haplotype and A000 is the Neanderthal haplotype), A00A1a (A1b), A00A1b (BT), AB (CT), and ABCDE (F). For example, for sites defining A00T, the Asia model has the original modern Y chromosome carrying A00T alleles and the original archaic Y carrying non-A00T alleles, or A0000A000 alleles (Figure 4A). Coevolution with admixed modern autosomes would result in mutation to modern alleles (A00T alleles) in individuals carrying the archaic Y chromosome A0000A000 alleles. Thus, among A00T-defining sites, the fraction of modern A00T alleles was 5/19, 8/59, and 5/69 for Denisova 4, Denisova 8, and Mezmaiskaya 2, respectively, which were all significantly higher than the fraction of unexpected mutations among sites defining L-T haplogroups (P<0.01, Table 5).

Table 6 Modern alleles among archaic Y chromosomes in sites defining haplogroup A0-T. The sites listed here define A0-T in the OoA tree but A00 in the OoEA tree. Modern alleles in these sites are those that were carried by the first original modern Y chromosome in the OoEA model or those carried by haplogroups A0-T (A0 to T). Archaic alleles are those that were carried by the archaic humans prior to any admixture with modern humans or prior to any convergent evolution (see Figure 4A). n.i., not informative.

The original modern mtDNA in the OoEA model carried modern alleles in most of the major basal mega-haplogroups that are different from the alleles carried by archaic humans. But the original modern mtDNA in the OoA model carried archaic alleles in most of the major basal mega-haplogroups that are the same as the alleles carried by archaic humans.
For example, sites defining the ML haplogroup in OoEA had modern alleles in the original modern mtDNA that are different from the archaic alleles carried by archaic humans, but the same sites define the N haplogroup in OoA and have archaic alleles in the original modern mtDNA. Evolution from the first modern mtDNA into an African haplogroup such as L0 involved many back mutations to archaic alleles (Figure 4B). Under the OoEA model, archaic mtDNAs had many modern alleles, i.e., alleles carried by the first modern mtDNA (Table 7). This sharing of modern alleles could be due to convergent evolution, either dependent on or independent of admixture between modern and archaic humans. For archaic humans contemporaneous with modern humans, such as Neanderthals and Denisovans, convergent evolution could be related to admixture. In contrast, convergent evolution is forbidden by the assumptions required to build the OoA tree, and it is therefore difficult for the OoA model to account for the presence of modern alleles in archaic mtDNAs.

Table 7 Modern alleles in archaic mtDNAs. Sites defining the basal mega-haplogroups are listed here. SNPs represent mutations in the OoEA model and mutations are given in the format [modern base][position number][archaic base], e.g. "G263A". Modern alleles in these sites are those that were carried by the first original modern mtDNA in the OoEA model. Archaic alleles are those that were carried by the archaic humans prior to any admixture with modern humans or prior to any convergent evolution (see Figure 4B). Archaic alleles are also shared between archaic humans and OoEA-specific haplogroups, e.g., 16223T is common to both Vindi.33.16 and the NML haplogroup (formation of NML is a result of back mutation to archaic alleles).

Discussion
The mutation pattern in the ancient Y chromosomes and mtDNAs revealed here confirms the expectation from the MGD theory that ancient samples belonging to a terminal haplogroup should be mutated in only a fraction of the sites that define a haplogroup they belong to, regardless of whether it is a terminal or basal haplogroup. Such a pattern is not expected from the popular neutral framework for phylogenetic inferences. Our study has systematically examined a large number of ancient samples for both uniparental DNAs, and strongly established the partial mutation pattern as a real phenomenon of ancient uniparental DNAs.
Sequencing of ancient DNA is known to be more error-prone. Three observations in the case of the Y chromosome indicate that such errors cannot explain our findings. First, as a negative control for the background error rate in our calling method, we determined for each ancient Y sample the rate of calling an unexpected allele among all informative sites that define the haplogroups from L to T. We showed in all cases that this rate was significantly lower than the rate of calling the absence of mutations that establishes the partial mutation pattern reported here. Secondly, partial mutation patterns in the controversial Y haplogroups almost all occurred in ancient samples belonging to the non-F haplogroups. Nearly all randomly sampled ancient DNAs belonging to haplogroup F failed to show partial mutation patterns in the OoA-specific haplogroups (A0-T, A1, A1b, BT, CT, and CF), which is significantly different from the high incidence of partial mutations in the OoEA-specific haplogroups (ABCDE, ABDE, AB, A00A1b, A00A1a, A00A0) among randomly selected non-F samples (1/35 vs 36/76, P < 0.01). It is unreasonable for miscalls to cluster within non-F samples. Finally, among the 76 non-F samples, 75 (all except I10871-A00) are informative for at least one basal haplogroup among the OoA-specific haplogroups in terms of having partial mutations, and yet all showed the complete set of haplotype-defining alleles, as fully expected by the OoEA model, which is inconsistent with a significant miscall rate. It is unlikely for miscalls to happen only in OoEA-specific haplogroups for these 75 non-F samples (35 out of 75 samples had partial mutations in OoEA haplogroups) while not happening in any OoA-specific haplogroups (0 out of 75 informative samples had partial mutations in OoA haplogroups).
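The comparison of proportions quoted above (1/35 vs 36/76) can be checked with a Pearson chi-squared test on a 2x2 table. The sketch below is a hedged illustration rather than the authors' pipeline: it assumes no continuity correction, uses only the standard library, and the function name `chi2_2x2` is hypothetical. For one degree of freedom, the P value equals `erfc(sqrt(chi2 / 2))`.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-squared distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # F samples: 1 of 35 with partial mutations; non-F samples: 36 of 76.
    chi2, p = chi2_2x2(1, 35 - 1, 36, 76 - 36)
    print(f"F vs non-F: chi2 = {chi2:.2f}, P = {p:.2e}")
    # Non-F samples: 35 of 75 partial in OoEA haplogroups vs 0 of 75 in OoA.
    chi2, p = chi2_2x2(35, 75 - 35, 0, 75)
    print(f"OoEA vs OoA: chi2 = {chi2:.2f}, P = {p:.2e}")
```

With small or zero cell counts, an exact test is a common alternative to the chi-squared approximation.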
Ancient DNAs are supposed to be more informative than extant DNAs with regard to past events and could thus serve as the best evidence to either verify or invalidate any phylogenetic trees that are built by using extant DNAs. It is therefore surprising that the field has yet to use the now abundant ancient DNAs to verify the standard model of modern human origins, the OoA model. Is it because the model cannot pass the ancient DNA test? Our study represents the first such test. Several observations here strongly invalidated the OoA model and its associated neutral theory and supported the competing OoEA model and its associated MGD theory. First, the partial mutation pattern is common in the non-controversial haplogroups, either terminal or basal. For an ancient terminal haplogroup to have partial mutations in a basal haplogroup is allowed only by the MGD theory and not by the neutral theory (Figure 3). The completion of mutations in all the sites defining a basal haplogroup in present day samples must entail convergent mutations from different terminal haplogroups. Just like a real tree, both the stem and the leaf branches grow together. Convergent mutations are to be expected in uniparental DNAs given their widespread presence in fast evolving autosomal DNAs that have reached mutation saturation (Huang, 2010; Wang et al., 2020; Yuan et al., 2017). Sharing of alleles among different human populations has been found to happen mostly in fast evolving SNPs and to be less common in slowly evolving ones, which is consistent with convergent evolution rather than admixture or recent common ancestry (Yuan et al., 2017). Convergent mutations have also been found to be very common in long term evolutionary experiments on microorganisms (Johnson et al., 2021; Katz et al., 2021). As mutations are stochastic, one does not expect all ancient samples to show partial mutation patterns in the basal haplogroups. Nonetheless, we observed a high rate of occurrence of partial mutations in the non-controversial Y chromosome basal haplogroups among the 111 ancient samples studied (32 out of 111, Table 1).
Secondly, the partial mutation pattern is also commonly found in the OoEA-specific mega-haplogroups but not found in the OoA-specific mega-haplogroups. For the Y chromosome, for example, 35 out of 75 non-F samples had partial mutations in OoEA haplogroups but none of these had partial mutations in OoA-specific haplogroups. Also, nearly none of the 35 F samples examined showed partial mutations in the OoA-specific Y haplogroups. Such a pattern is fully expected if the OoEA model is true and if the OoA model is false. If the partial mutation pattern is a real phenomenon, which appears to be the case, it would apply to any haplogroup. Thus, the absence of that pattern for the OoA-specific haplogroups strongly suggests that the OoA model is unrealistic. Ancient DNAs are either non-informative in distinguishing the two competing models (having all the expected alleles from either model) or informative in falsifying the OoA model and validating the Asia model. We did not find any informative cases in the ancient DNA record that would support OoA.
Thirdly, our analyses showed that the absence of a mutation in a site defining an OoEA-specific haplogroup automatically means the presence of a mutation in a site that defines an OoA-specific haplogroup to which the sample concerned does not belong. A sample, be it ancient or present, should not carry mutations that define haplogroups to which it does not belong. This expectation is met only by the Asia model and not by the Africa model. For example, the I10871-A00 Y chromosome carried alleles that would assign it to A1b, CT, and F of the Africa model, to which it does not belong. The I2966-L0k2 mtDNA carried alleles that would assign it to L2'3'4'6' and L2-L6 of the Africa model, to which it does not belong. It is, however, not surprising for ancient samples to fail to be cleanly assigned to a haplogroup within the OoA model if the model is simply incorrect.
We discuss in more detail the implications of our finding using the Mota-E1b1a1 Y chromosome as an example with regard to the ABCDE mega-haplogroup under the OoEA tree. Mota had partial mutations in the E1b1a1-M291 terminal haplogroup, mutating only 14 of 178 informative sites. Mota had two SNPs that would assign it to F. They are among a set of 155 SNPs that can divide all present day haplotypes into two mega-groups, F (F to T) and ABCDE (A to E). Mota was informative for 122 of these 155 SNPs. For the two alleles of each of these SNPs, A1 and A2, if one mega-group carries the A1 allele, the other mega-group would carry the A2 allele. Depending on which mega-group the first modern male is assumed to have belonged to, one gets either the OoA model or the OoEA model. In the OoA tree, the first original modern male is supposed to carry the alleles shared by all present day ABCDE samples in these 155 SNPs (Figure 4A). In the subsequent diversification of haplotypes under OoA, these ABCDE-defining alleles were mutated to form the F haplogroup or become F-defining alleles. It is therefore clear that all samples of the ABCDE mega-group, regardless of whether they are ancient or present, should carry the complete set of ABCDE-defining alleles in all these 155 SNPs, since the first modern male individual who was the ancestor to all ancient and present day male individuals was supposed to already have all of these alleles under the OoA model. Also, ancient F samples are expected to show partial mutations, having most (but not all) of the ABCDE-defining alleles changed into F-defining alleles. So, the finding of an ancient E (Mota) carrying two F-defining alleles would invalidate the OoA model or its assumption that the first modern male carried ABCDE-defining alleles in these 155 SNPs. Under the OoA model, it would imply that Mota-E1b was on its way to becoming F, having mutated 2 of the 122 informative SNPs into F-defining alleles, which is absurd. If the Mota E lineage were to survive to today, it would have to have these two SNPs mutated back to ABCDE-defining alleles. But back mutations are forbidden by the infinite site assumption (meaning no back mutations) that is required to build the OoA tree in the first place. Furthermore, all of the ancient F samples (28 of 28) studied here carried the complete set of F-defining alleles in these 155 SNPs, therefore showing no partial mutation patterns, which is unexpected if F was formed by mutating ABCDE alleles carried by the first Y. The two unexpected F alleles in Mota cannot be explained by miscalls, since we found as a control that there was not a single unexpected call for Mota among 5588 informative sites that define L to T haplotypes (2/122 vs 0/5588, P<0.0001).
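The miscall control above (2/122 vs 0/5588) involves a zero cell, for which an exact test is a common choice. The text does not state which test produced the reported P value, so the sketch below swaps in a one-sided Fisher's exact test as an assumption, and the exact figure it returns may differ from the reported one. The function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

        [[a, b],
         [c, d]]

    Returns P(X >= a), where X is the top-left count under the
    hypergeometric null with all row and column totals held fixed.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of "events" across both groups
    n = a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 2 F-defining calls among 122 informative SNPs, versus
    # 0 unexpected calls among 5588 informative L-T sites.
    p = fisher_one_sided(2, 122 - 2, 0, 5588)
    print(f"one-sided Fisher exact P = {p:.2e}")
```

The design choice here is to hold both margins fixed and ask how likely it is that both unexpected calls would fall among the 122 sites by chance alone.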
In contrast, in the OoEA tree, the first modern male is supposed to carry the alleles shared by all present day F samples in these 155 SNPs (Figure 4A). In the subsequent diversification of haplotypes under the OoEA model, these F-defining alleles were mutated to form the ABCDE mega-haplogroup or become ABCDE-defining alleles. It therefore follows that all F samples, regardless of whether they are ancient or present, should carry the complete set of F-defining alleles in all these 155 SNPs, since the first modern male already had all of them under the OoEA tree. Indeed, we found that all ancient F (F to T) samples (28 of 28 studied) carried the complete set of F-defining alleles in all these 155 SNPs, which is in stark contrast to the above-mentioned observation that many ancient A to E samples did not carry the complete set of ABCDE-defining alleles in these 155 SNPs. Many of these ABCDE samples (34 out of 76 studied, Table 2), including Mota, had partial mutations in these 155 SNPs, changing most but not all of the F-defining alleles into ABCDE-defining alleles, which is to be expected if the ABCDE mega-haplogroup is real and was formed by mutations from the first Y, which carried the F haplotype or the complete set of F-defining alleles in these 155 SNPs.
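The bookkeeping behind the 155-SNP argument can be illustrated with a small sketch. It assumes each SNP call for a sample has already been labelled as carrying the F-defining allele, the ABCDE-defining allele, or no call; the function name `summarize_pattern` and the ordering of calls in the example are hypothetical, and only the counts (155 SNPs, 122 informative, 2 F-defining) come from the text.

```python
from collections import Counter


def summarize_pattern(calls):
    """Summarise a sample across the SNPs separating F from ABCDE.

    calls: list with "F", "ABCDE", or None (no call) for each SNP.
    Returns counts plus a status: "complete F", "complete ABCDE",
    "partial" (a mixture of both allele sets), or "not informative".
    """
    counts = Counter(c for c in calls if c in ("F", "ABCDE"))
    informative = counts["F"] + counts["ABCDE"]
    if informative == 0:
        status = "not informative"
    elif counts["F"] == informative:
        status = "complete F"
    elif counts["ABCDE"] == informative:
        status = "complete ABCDE"
    else:
        status = "partial"
    return {
        "informative": informative,
        "F": counts["F"],
        "ABCDE": counts["ABCDE"],
        "status": status,
    }


if __name__ == "__main__":
    # Mota-like counts from the text: 155 SNPs, 122 informative, 2 F-defining.
    # The order of calls is arbitrary and chosen only for illustration.
    mota_like = ["F"] * 2 + ["ABCDE"] * 120 + [None] * 33
    print(summarize_pattern(mota_like))
```

Under this summary, the ancient F samples described above would be reported as "complete F", while a sample such as Mota would be reported as "partial".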
Finally, we found that certain alleles in modern uniparental DNAs are also present in archaic humans. Sharing of alleles between archaic and modern humans in most cases is more likely a result of coevolution due to common adaptation to a shared environment or physiology, or natural selection. As convergent evolution is common under the MGD theory but forbidden by the neutral theory, the presence of modern alleles in archaic humans is evidence for the MGD theory. Only the OoEA model can explain such presence by postulating coevolution of admixed autosomes and uniparental DNAs (Figure 4).
We used ancient DNAs to test the two different models of the human uniparental DNA phylogenetic trees and their respective underlying theories of evolution. The results confirm the actual existence of the haplogroups specific to the Asia model and support the MGD theory that underlies the model. As might be expected already from the failure of the OoA model to be self-consistent, ancient uniparental DNAs strongly invalidated the OoA model and the neutral theory underlying it. The great value of having two competing models is that it becomes unnecessary to invent ad hoc epicycles to help a model escape falsification by contradicting data. If one model is supported by new data while the other is contradicted by it, it is straightforward to recognize the correct model.

Figure 3
Expected pattern of mutations in terminal and basal haplogroups of uniparental DNAs from the MGD theory or the neutral theory. Before basal haplogroup C diverged into the terminal haplogroups C1 and C2, no sites in C1 and C2 had mutated and only some sites in C had mutated under the MGD theory. After C1 and C2 diverged, C1 (or C2) had mutations at some sites defining C1 (or C2). In ancient samples, formation of terminal haplogroups C1 and C2 can take place before complete mutation in sites defining the basal haplogroup C under the MGD theory but cannot under the neutral theory. Convergent mutations are supposed to be common under the MGD theory but non-existent under the neutral theory and would result in complete mutations in sites defining the basal haplogroup C in present day C1 or C2 haplogroups in the model based on the MGD theory.

Figure 4
Schematics of the allele patterns in uniparental DNAs in the Africa model versus the Asia model. Sequences are represented by horizontal lines and sites defining representative haplogroups are indicated by rectangular boxes (length not to scale). A. Y chromosome. A0000 represents the Denisovan haplotype and A000 represents the Neanderthal haplotype. The term "A00 to T" represents all individual haplotypes that are defined by haplotype-specific mutations not present in archaic humans. In the case of A0 and G, the small black box in the "A00 to T" box indicates A0- or G-specific mutations, respectively. A0 is a result of admixture while G is not. B. mtDNA. The term "L0 to Z" represents all individual haplotypes that are defined by haplotype-specific mutations not present in archaic humans. In the case of L0 and U, the small black box in the "L0 to Z" box indicates L0- or U-specific mutations, respectively. L0 is a result of admixture while U is not.

Figure 5
Screen shots of aligned reads covering the sites not yet mutated in the OoEA-specific haplogroups belonging to the ancient sample I10871. Files of aligned sequencing reads (BAM files) of sample I10871-A00 were downloaded from the published source and screen shots of the aligned reads covering the sites not yet mutated are shown. For each of the 6 sites shown here, the HG19 position, the allele for the OoA-specific haplogroup, and the allele for the OoEA-specific haplogroup are also indicated. The non-mutated site is marked by a red star.
