We collected samples from 92 unrelated individuals (81 males) from Baranja, Croatia, and from 81 unrelated individuals (40 males) from the Zobor region, Slovakia (Fig. 1). Most of these individuals came from villages predominantly inhabited by Hungarian-speaking minorities in both countries (see Fig. 1 for the distribution of samples). Every participant provided written consent at the time of sampling. Ancestral documentation spanning two generations was available for these individuals and showed that the majority of their ancestors were born in the same region of Slovakia or Croatia and self-identified as Hungarians.
We obtained novel genetic data consisting of 168 newly sequenced whole mitochondrial genomes, together with Y-STR (Y-chromosomal Short Tandem Repeat) haplotypes of 23 loci and Y-SNP (Single Nucleotide Polymorphism) profiles of over 40 markers from 121 males.
Y-chromosome diversity
To investigate the genetic variation among the Hungarian speakers, we used evolutionarily stable binary markers (SNPs) to define the haplogroup of each Y chromosome. Subsequently, we examined the Y-STR variation of the groups and performed specific phylogenetic analyses within eight haplogroups.
The Y haplogroup frequencies of the two populations are presented in Table 1 and in Fig. S1. The origins and current distribution peaks of the haplogroups, together with their ISOGG 2019-2020 names, can be found in Supplementary Table S3. The most frequent haplogroups in the Zobor region population were R1a-Z280 (32.5%), R1a-M458 (25%), R1b-P312 (15%) and G2a-L156 (7.5%).
In the Baranja males, the most frequent haplogroups were I2a-P37 (21.95%) and R1a-Z280 (17.07%). The overall pattern of Y-chromosomal haplogroup distribution was similar in the two studied populations, but haplogroups R1a-Z93, N1c-M46, C2-M217 and J2b-M12 appeared only in the Baranja population (Table 1). Haplogroups G2a-L156 and R1b-M343/P25 (L23) were observed more frequently in the Baranja population. We focused on the genetic history of these specific haplogroups (G2a-L156, R1b-L23) in addition to R1a-Z280 and I2a-P37, as they have previously been shown to represent phylogeographically relevant structures 25,26.
The Baranja group exhibited haplotype and haplogroup diversities of 0.99938 and 0.90586, respectively. In contrast, the Zobor region displayed lower values, with 0.98974 for haplotype diversity and 0.81154 for haplogroup diversity. The Y-STR and Y-SNP results for the 40 samples from the Zobor region (Slovakia) and the 81 samples from Baranja (Croatia) are detailed in Supplementary Tables S3 and S6. The lower diversity observed in the Zobor region might be attributed to the smaller sample size. However, this reduced diversity is still more pronounced (0.812 haplogroup diversity) than what was observed in the Váh valley group, represented by n=48 Y-STR haplotypes 26.
| Haplogroup (Baranja, Croatia) | Sample number (Baranja) | Frequency (Baranja) | Haplogroup (Zobor region, Slovakia) | Sample number (Zobor region) | Frequency (Zobor region) | Haplogroups (2019-2020 ISOGG nomenclature) |
|---|---|---|---|---|---|---|
| C-M216 | 1 | 1.22% | C-M216 | 0 | 0 | C (M217: C2) |
| E1b1b-M78 | 6 | 7.32% | E1b1b-M78 | 1 | 2.5% | E1b1b1a1 |
| E1b1-M123 | 1 | 1.22% | E1b1-M123 | 0 | 0 | E1b1b1b2a1 |
| G2a-L156 | 3 | 3.66% | G2a-L156 | 3 | 7.5% | G2 (P15: G2a) |
| I1-M253 | 8 | 9.76% | I1-M253 | 0 | 0 | I1 |
| I2a-P37 | 18 | 21.95% | I2a-P37 | 4 | 10.0% | I2a1a |
| I2b-M223 | 1 | 1.22% | I2b-M223 | 0 | 0 | I2a1b1 |
| J2b-M12 | 2 | 2.44% | J2b-M12 | 0 | 0 | J2b |
| L-M11 | 1 | 1.22% | L-M11 | 0 | 0 | L |
| N-VL29 | 1 | 1.22% | N-VL29 | 0 | 0 | N1a1a1a1a1a |
| R1a-M458 | 4 | 4.88% | R1a-M458 | 10 | 25.0% | R1a1a1b1a1a |
| R1a-Z280 | 14 | 17.07% | R1a-Z280 | 13 | 32.5% | R1a1a1b1a2 |
| R1a-Z93 | 2 | 2.44% | R1a-Z93 | 0 | 0 | R1a1a1b2 |
| R1b-M343* | 1 | 1.22% | R1b-M343* | 0 | 0 | R1b |
| R1b-M412 | 1 | 1.22% | R1b-M412 | 1 | 2.5% | R1b1a1b1a |
| R1b-P25* | 8 | 9.76% | R1b-P25* | 1 | 2.5% | L23: R1b1a1b1 |
| R1b-P312 | 5 | 6.10% | R1b-P312 | 6 | 15.0% | R1b1a1b1a1a2 |
| R1b-U106 | 4 | 4.88% | R1b-U106 | 0 | 0 | R1b1a1b1a1a1 |
| T-M70 | 1 | 1.22% | T-M70 | 0 | 0 | T1a |
| R1a-M198* | 0 | 0 | R1a-M198* | 1 | 2.5% | R1a1a |
Table 1 Haplogroup frequencies of the Y-chromosomal haplogroups from Baranja and Zobor region
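The haplogroup diversity reported for the Zobor region can be checked against the counts in Table 1. The following is a minimal sketch that assumes Nei's unbiased diversity estimator, H = n/(n-1) × (1 − Σp_i²); the text does not name the formula or software used, and the function name `nei_diversity` is illustrative only.

```python
def nei_diversity(counts):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies).

    Assumption: this estimator is used here for illustration; the paper
    does not state which formula produced its diversity values.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two samples are required")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)


# Zobor region haplogroup counts as listed in Table 1 (n = 40)
zobor_counts = {
    "E1b1b-M78": 1, "G2a-L156": 3, "I2a-P37": 4, "R1a-M458": 10,
    "R1a-Z280": 13, "R1b-M412": 1, "R1b-P25*": 1, "R1b-P312": 6,
    "R1a-M198*": 1,
}

zobor_hd = nei_diversity(zobor_counts.values())
print(round(zobor_hd, 5))  # 0.81154
```

Under this assumption the Table 1 counts reproduce the reported Zobor haplogroup diversity of 0.81154. The same function applies to Y-STR haplotype counts, which are listed only in the supplementary tables.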
Paternal genetic structure of the two populations
The Y haplogroup frequencies were combined with those of reference populations and used for a PCA plot (Supplementary Table S7 and Fig. S2). The location of the studied populations on the PCA plot is roughly consistent with the geographical distances between them. Populations from the same geographic region clustered together, Hungarian populations overlapped with the surrounding Slavic populations, and the Zobor region showed more connections to northern and northeastern populations. The resulting pattern, with a slight shift of the Zobor region sample set from the Baranja group, is primarily due to the relatively high I2a, I1 and E1b1 haplogroup frequencies in the Baranja population. Further differences may be due to the preponderance (25%) of R1a-M458 in the Zobor region population, which is common among Western Slavs, and the absence of the R1b subgroup U106, which is common in Western Europe. Interestingly, the occurrence of J is relatively low in the Baranja population compared to other Hungarian groups and the Székelys, and J is absent in the Zobor region group. Haplogroup Q, previously detected among the Székelys, is also missing in the two populations studied here. However, the small sample sizes of the tested groups might also influence these affinities.
Pairwise FST distances and p-values for 41 populations, including Baranja, the Zobor region and other Eurasian populations from published sources, were calculated (Supplementary Table S8) and presented in a clustered heatmap (Fig. S3). The Zobor region shows a significant genetic distance from almost every other group (p<0.05), whereas the Baranja group is at a non-significant distance from the pooled population of Hungary, the Székelys, Moldovans and Slovenians. While small sample sizes limit the scope for definitive conclusions, the clustering groups together populations with high genetic affinities to one another. Eastern Europeans and Hungarians from Hungary, Baranja and the Zobor region form one cohesive cluster. Another closely related clade comprises North Europeans. In contrast, populations with rather Southeastern European characteristics, including the Székelys and Csángós, constitute a distinct cluster (Supplementary Table S8).
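To make the idea of a frequency-based genetic distance concrete, the following minimal sketch computes Nei's G_ST, a simple FST analogue, between the two studied populations from the haplogroup counts in Table 1. This is an illustrative assumption: the text does not state which estimator or software produced the values in Supplementary Table S8, and the function names are hypothetical. The simplified statistic weights both populations equally and applies no sample-size correction, so its value is not expected to match the published one.

```python
def haplogroup_freqs(counts, haplogroups):
    """Relative frequencies of the given haplogroups in one population."""
    n = sum(counts.values())
    return [counts.get(h, 0) / n for h in haplogroups]


def nei_gst(counts_a, counts_b):
    """Nei's G_ST = (H_T - H_S) / H_T for two populations.

    H_S is the mean within-population diversity, H_T the diversity of the
    averaged frequencies. Assumption: equal weights, no bias correction.
    """
    haplogroups = sorted(set(counts_a) | set(counts_b))
    p_a = haplogroup_freqs(counts_a, haplogroups)
    p_b = haplogroup_freqs(counts_b, haplogroups)
    h_a = 1 - sum(p * p for p in p_a)
    h_b = 1 - sum(p * p for p in p_b)
    h_s = (h_a + h_b) / 2
    p_bar = [(a + b) / 2 for a, b in zip(p_a, p_b)]
    h_t = 1 - sum(p * p for p in p_bar)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Haplogroup counts as listed in Table 1
baranja_counts = {
    "C-M216": 1, "E1b1b-M78": 6, "E1b1-M123": 1, "G2a-L156": 3,
    "I1-M253": 8, "I2a-P37": 18, "I2b-M223": 1, "J2b-M12": 2,
    "L-M11": 1, "N-VL29": 1, "R1a-M458": 4, "R1a-Z280": 14,
    "R1a-Z93": 2, "R1b-M343*": 1, "R1b-M412": 1, "R1b-P25*": 8,
    "R1b-P312": 5, "R1b-U106": 4, "T-M70": 1,
}
zobor_counts = {
    "E1b1b-M78": 1, "G2a-L156": 3, "I2a-P37": 4, "R1a-M458": 10,
    "R1a-Z280": 13, "R1b-M412": 1, "R1b-P25*": 1, "R1b-P312": 6,
    "R1a-M198*": 1,
}

gst = nei_gst(baranja_counts, zobor_counts)
print(f"G_ST (Baranja vs Zobor region): {gst:.4f}")
```

A statistic of this kind is zero when two populations have identical haplogroup frequencies and increases as their frequency profiles diverge. Significance testing, as reported in the text, would additionally require a permutation procedure that is not shown here.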
We further investigated these inter-population affinities with Y-STR data by calculating RST genetic distances. We constructed a non-metric multidimensional scaling (MDS) plot based on Y-chromosomal haplotypes (n=7,287) collected from YHRD.org, comprising the 23 available STR loci from geographically relevant populations 42 (Fig. 2). The RST genetic distances and RST p-values of the studied populations are presented in Supplementary Table S9.
Whereas the Székely population still shows connections toward southern populations, to diverse groups of the Carpathian Basin (such as a nonsignificant distance from the Váh valley population and Baranja) and to the Slovenians, the Baranja population shows a stronger genetic similarity to the Bodrogköz, Váh valley, Slovenian and Czech populations, in addition to the Székelys. Among the Carpathian Basin groups, the Rétköz and Bodrogköz groups were the closest to the Zobor region, although the Polish population was also at a nonsignificant distance (Supplementary Table S9).
In summary, the studied populations do not separate from their neighboring groups, and although different trends are present in the two new datasets, a fine-scale geographic pattern is not decipherable through grouped Y-haplogroup analyses, 23-locus STR data or low-resolution SNP typing.
Phylogenetic analysis of the paternal lines
Based on the Y-STR haplotypes, median-joining networks were constructed for the two investigated regions (Fig. S4) and for the available 21 Y-STR datasets of the Carpathian Basin (Fig. S5). These networks show that paternal haplotypes are spread throughout the studied regions. Furthermore, based on the data reviewed to date, the modern male population of the Carpathian Basin does not display a specific Y-haplotype structure that corresponds with its geography, in line with the RST results. Only subtle differences are observable in the Székely population, along with their shared paternal ancestries in the Bodrogköz/Rétköz populations (through differences in the frequencies of J2 and N1a haplotypes).
Subsequently, we analyzed specific haplogroups that can be linked to ancient Hungarian data from phylogenetic aspects.
We constructed eight networks (R1a-Z93, N-M46, R1b-P25/M343, G2a-L156, R1a-Z280, J2b-M12, C-M217 and I2a-P37) that are potentially helpful in uncovering the genetic legacy of the populations being studied. Four of them (G2a-L156, R1a-Z280, J2b-M12, C-M217) can be found with descriptions in Supplementary Information as Fig. S6-S9.
Median Joining network of 123 R1a-Z93 haplotypes
An MJ network of 123 R1a-Z93 haplotypes from the 18 populations tested in this study or published previously from modern 23,30,43 or ancient sources 18,34,44 is shown in Fig. 3. On the left side of the R1a-Z93 network, three modern Hungarian samples and one Xiongnu period aDNA sample (TUK04), including one modern Baranja sample (DV 60), formed a common branch with Bashkirian Mari, Uzbek Khwarazm and Uzbek Fergana samples. On this branch, one modern Hungarian sample shared two haplotypes (one TUK25 from the Xiongnu period and one Bashkirian Mari) in haplotype cluster 1. Two Uzbek samples (one Khwarazm and one Fergana) and one Baranja sample can be derived from this cluster (see Fig. 3).
The other Baranja haplotype is on a Central- and Inner-Asian branch, with two Hungarian samples on the right side of the network.
The paragroup R1a-Z93* is nowadays most common in the Altai region of Southern Siberia, but it has also spread in Kyrgyzstan and in all Iranian populations 43. Furthermore, the R1a-Z93 haplogroup is also common in Tajik ethnic groups, in Afghan Pashtuns and in the Caucasus. The downstream haplogroup R1a-Z2125 occurs at the highest frequencies in Kyrgyzstan and among Afghan Pashtuns 43.
Keyser et al. (2021) demonstrated that ancient Xiongnu period samples from Mongolia (TUK45, TUK04, TUK25 and TUK09A) belonged to haplogroup R1a-Z93 (Z2125); these samples are also included in this network. Some of them shared the same haplotype at the 10 Y-STR level with a present-day Hungarian sample (see cluster 1 in Fig. 3) or with a Hungarian aDNA sample (Nagykőrös Gr2; see cluster 2 in Fig. 3), but they differ in deeper analyses when their full available Y-STR profiles are considered.
Ancient DNA studies showed that the Hungarian King Béla III and another sample from the Royal Basilica (II54) belonged to haplogroup R1a-Z93 18, and two other Z93 samples were found among the Hungarian Conqueror population as well 34. At the 15 Y-STR level they are three steps away from each other.
Although Y-STR analyses are complicated in aDNA research by DNA degradation, SNP data are accumulating through whole genome sequencing and genome-wide capture approaches. We obtained a more accurate haplogroup classification of the modern DV020 (Baranja) R1a-Z93 sample: DV020 is R1a1b2a2a3c2~ (FGC56440 terminal SNP), a subhaplogroup that is also found in the Hun period Carpathian Basin and in Kazakhstan 45. Based on whole genome data, a Hun period sample from Marosszentgyörgy, Romania, and three other middle-late Avar samples among the Z93 samples are classified as R-S23201*(xYP1348, Y65081), which may correspond to haplogroup R1a1a1b2a2a3c~ of ISOGG 15. Another Hun period sample (Budapest Vezér street) completely matches the DV020 sample in ISOGG classification; its Y haplogroup marker is R-S12380/etc*(xY73177), a subbranch of the DV020-specific FGC56440 branch on Yfull. Many examples of this haplogroup are known from ancient genomic studies, dated to the Bronze Age at Krasnoyarsk in Russia and Aktogay in Kazakhstan (1900-1400 BCE), and to the Early Iron Age Tasmola culture (700-500 BCE) 46. Some samples from (Middle) Late Bronze Age Mongolia and a few Xiongnu samples also show the Z2124 subgroup based on whole genome SNP data 47,48. The R1a1a1b2a2a3~ subhaplogroup is also present in an early medieval (second third of the 10th century to first third of the 11th century) Hungarian village cemetery at Homokmégy-Székes 7. We conclude that this Y haplogroup might have arrived in the Carpathian Basin with one of the eastern migrations and may have originated in the Kazakh steppe.
Median-joining network of 180 N-M46 haplotypes
Based on 180 haplotypes, an N-M46 (ISOGG 2019-2020: N1a1-M46/Tat) MJ network was constructed using populations previously studied by Bíró et al. (2015), Fehér et al. (2015), Szeifert et al. (2022), Pimenoff et al. (2008) and Ilumäe et al. (2016) (Fig. 4). The modal haplotype of the N-M46 network (formerly N1c-Tat) is shared by 32 samples from eight populations (haplotype cluster 1 in Fig. 4) and is recognized by researchers as the ancestral haplotype 52. Cluster 2, the largest cluster, includes 43 haplotypes from five populations and is located one molecular step (DYS391) from cluster 1. More than 70% of the studied Buryat samples belonged to cluster 2, indicating that the Buryats we studied belong to a young and isolated population 23,30.
From the perspective of Hungarians, haplotype cluster 3 is pertinent. Cluster 3 comprises three identical haplotypes: two Hungarian and one Northern Mansi, with the Baranja N-M46 haplotype included among them.
As can be seen in Fig. 4, the right branch of the network consists almost exclusively of representatives of the Finno-Ugric language group, with the exception of Bashkirs. Bashkirs have been living close to the Ugric peoples around the Ural Mountains, where admixture can be traced back for a millennium. This network branch can be derived from cluster 2, where the Buryats form the majority of the haplotypes.
Other population samples included in the network formed independent clusters (for example Bashkirian Mari, Finns, Bashkirians and Khanty from the study of Ilumäe et al. (2016)), shared haplotypes with other populations, or were scattered across the network.
Macrohaplogroup N-M231 is widespread from Scandinavia to the Kamchatka in North Eurasia and it is the most frequent haplogroup in Siberia 3,53. Based on the geographic distribution of the parahaplogroup N*-M231, it most likely originated in Southeast Asia, whereas its most widespread subgroup is N-M46 53. Other authors consider Southern Siberia as its geographical origin 52.
The present-day Hungarian Y-chromosomal gene pool contains only a small percentage of N-M46 (1%) and has a distribution typical of East-Central Europe 54. However, its incidence is higher among the Hungarian-speaking Bodrogköz in East Hungary (6.2%) and among Székelys from Miercurea Ciuc, Romania (6.3%) 23,25. According to the results of the Hungarian aDNA studies, N-M46 is detected at a higher frequency (17-36%) in the Hungarian Conquest Period population in the Carpathian Basin 16,34.
We also constructed this network using 15 Y-STRs, incorporating additional ancient samples related to ancient Hungarians from the Ural and Volga regions (Fig. S10). The Baranja sample diverges from a Bashkir haplotype by two mutational steps, while the other Hungarian males share their haplotype with medieval samples from the Volga and Ural regions. The Avar period samples belong to a different subgroup (N1a1a1a1a3a-F4205) of the N-M46 clade, as corroborated by genomic studies 17,45. Although substructures are observed in the 15 Y-STR network, we conclude that this lineage in the Baranja population has a Conquest Period origin in the Carpathian Basin.
Median-joining network of 196 I2a-P37 haplotypes
The MJ network of 196 I2a-P37 STR haplotypes from 14 populations is illustrated in Fig. 5. Greek, Irish 55, Catalan samples 56, Croatian samples from FTDNA 57, and Slovakian samples 58,59 were used in the analyses. One sample (Karos II, grave 16) came from the Hungarian Conquest Period 34, and all other samples originated from previously studied and published populations (Bíró et al. 2015; Pamjav et al. 2017; 2022; Borbély et al. 2023). In this network, Hungarian speakers residing in rural regions and adjacent countries were distinguished separately. While they are well documented, few data combining both haplotype and haplogroup information are available from other European populations.
Several haplotype clusters can be seen in Fig. 5, which show different connections among the populations included in this study. Many haplotypes are shared between populations (circles of mixed colors). Haplotype cluster 1 is shared by 16 males from eight populations (one Csángó, two Székely, four Hungarian, one Váh valley, two Bodrogköz, one Slovakia, one Greek and four Baranja). An ancient Hungarian haplotype from the Conquest Period (Karos II aH) is located one mutational step (DYS389II) away from cluster 1, which indicates that it is genetically closely related to the mixed samples of cluster 1.
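The notion of a haplotype lying "one mutational step" from a cluster can be illustrated with a short sketch. It assumes a stepwise mutation model in which the distance between two Y-STR haplotypes is the sum of absolute repeat-count differences over their shared loci. The repeat counts below are hypothetical placeholders, not values from the studied samples, and the function name is illustrative; the published networks were built with median-joining software, which this sketch does not reproduce.

```python
def mutational_steps(hap_a, hap_b):
    """Sum of absolute repeat differences over loci typed in both haplotypes.

    Assumption: single-step (stepwise) mutation model, all loci weighted equally.
    """
    shared = sorted(set(hap_a) & set(hap_b))
    if not shared:
        raise ValueError("the two haplotypes share no loci")
    return sum(abs(hap_a[locus] - hap_b[locus]) for locus in shared)


# Hypothetical repeat counts for illustration only
cluster_haplotype = {
    "DYS19": 16, "DYS389I": 13, "DYS389II": 31, "DYS390": 24, "DYS391": 11,
}
# Same haplotype with one additional repeat at DYS389II
neighbour_haplotype = dict(cluster_haplotype, DYS389II=32)

print(mutational_steps(cluster_haplotype, neighbour_haplotype))  # 1
```

Restricting the comparison to shared loci mirrors the situation described for ancient samples, where only the overlapping STR loci of a partially typed haplotype can be compared with modern data.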
Haplogroup I-M170 is a fundamental element of the European Y-DNA gene pool, representing, on average, 18% of the total male lineages. Its near absence in other regions, including the Near East, indicates its likely emergence in Europe, potentially before the Last Glacial Maximum (LGM) 60.
Haplogroup I-M170 has two major subgroups: I1-M253, which is common in Scandinavia, and I2-M438. The I2 subgroup I2a-P37 (formerly I1b, currently I2a1a) extends from the eastern Adriatic to eastern Europe and noticeably decreases towards the southern Balkans. I2a probably diffused from its homeland in Eastern Europe or the Balkans after the LGM 61.
In contrast, I2b-M223 (formerly I1c, currently I2a1b1) most likely arose in southern France/Iberia and, similarly to the other subclades, underwent a postglacial expansion 61. Taken together, these observations suggest that haplogroup I-M170 may have played a central role in the human recolonization of Europe from isolated refugia after the LGM, and that a comprehensive phylogeographic study should localize the in situ origin and spread of the major male founders 61. According to the study of Peričić et al. (2005), the I2a haplogroup corresponds to the historic expansion of the Slavs, which may have taken place in the middle of the first millennium AD and resulted in significant admixture with the substratum populations living in Eastern Europe. Subgroup I2a is widespread among the Slavs, especially the western South Slavs, but has also been detected at frequencies of 3-7.1% in populations of the Northern Caucasus 61.
Based on the studies we conducted, the frequency of I2a-P37 in Hungarian speakers was as follows: Hungarian (16%), Csángó in Ghimeş (26.6%), Székely from Csíkszereda (8%), Székelys around Székelyudvarhely (21.8%), Bodrogköz (19%), Rétköz (6.6%), Váh valley (16.67%) (Pamjav et al. 2022; Borbély et al. 2023; Völgyi et al. 2009; Bíró et al. 2015; Pamjav et al. 2017). In the present study, it was also prevalent in Baranja (19.75%) and Zobor region (10%) populations. I2a-P37 is found today with the highest frequency in Bosnians (40%), followed by Croatians (31.2%), other peoples in the Balkans, and Ukrainians (16.1%), and populations in Central and Eastern Europe 61.
Based on our network, there are admixtures and shared lineages among the Hungarian-speaking populations (clusters 1-11) and also with the populations of the neighbouring Slavic countries included in the study (clusters 1-4 and 7-8) (Fig. 5). Among the conqueror Hungarians, three samples had haplogroup I2a, but due to the lack of overlapping STR loci, only one sample could be included in the study. According to the authors, these three I2a samples were close relatives on the paternal lineage 34. In another Hungarian study, six Conquest Period individuals also carried the I2a haplogroup, but two of them (namely Karos II graves 16 and 52) were duplicates of those included in the studies of Fóthi et al. (2020) and Neparáczki et al. (2019). According to the archaeological records, the male from Karos II (grave 52) was a leader of the community 63 and belonged to subgroup I2a1a2b1a1a2 (I-Y4460*(xY5598, Y13498, Y16810)) 15. However, the heterogeneous genetic composition of the Karos community and the mixed genomic type of K2-52 (48-56% Eur_Core and 44-52% Alan/BMAC 15) leave room for local (Avar period) components in his ancestry.
Based on the YSEQ I2 panel, the deeper classification of the DV046 (Baranja) sample is I2-Y125026, which is I2a1b2a2b2b~ according to ISOGG 2019-2020. Modern-day Yfull data for this haplogroup are known from Hungary, Croatia, Romania, Serbia, Turkey, Montenegro, Greece, Poland and Russia. As the P37 subgroup was already prevalent in Southeastern Europe at the time of the Hungarian conquest (e.g. Olalde et al. 2023) and is rare or absent in the Volga-Ural Hungarian-related communities (Szeifert et al. 2022), we conclude that it most probably represents a local lineage in the modern Hungarian-speaking populations, just as it was a European component in the mixed population of the Conquest Period.
Median-joining network of 207 R1b-P25 haplotypes
An R1b-P25 MJ network (with R1b-L23 and R1b-M73 subbranches) was constructed using 207 samples from the present study, from FTDNA data and populations previously studied 23,64,65 (Fig. 6). The founding haplotype of the Eastern R1b*-P25 (L23, a subgroup of R1b-M269) was shared by 17 samples, including eight Hungarian and two Caucasian Avar, two German and five Czech samples, as shown in Fig. 6 (see cluster 1). All Hungarian haplotypes included in the network belong to the R1b-L23 cluster and most of them appear to be descended from the founding haplotype (cluster 1). The pattern of these haplotype clusters is starlike, representing a set of closely related haplotypes of Hungarian males. Out of ten samples from the present studied regions, six samples are located one or two mutation steps away from the founding haplotype cluster together with other Hungarian males indicating a close relationship with each other in space and time. In addition to the founding haplotype, some Hungarian haplotypes were shared with Europeans (cluster 2) or with Lezghians and Armenians from the Caucasus (cluster 3) as well as with Croatians (cluster 4). Hungarian, Belgian, Armenian, Croatian, German, and Scottish samples show the most similar haplotypes to Baranja and Zobor region samples within the R1b-L23 haplogroup if we compare them to samples from FTDNA at 17 or 21 STR level.
Haplogroup R1b-M269 is the most frequent Western European lineage. It was originally thought to have originated in the Paleolithic era, but more recent work suggests a Late Neolithic origin 6. Most R1b-M412 chromosomes belong to Western European males, whereas another subgroup, R1b-L23 (xM412, R1b1a1b1), commonly referred to as “Eastern European R1b”, is prevalent in the Caucasus, Turkey and the Urals, with about 10% frequency 64.
Olalde et al. (2018) confirmed the role of R1b-L23 subclades in the expansion of the eastern population of the Bell Beaker culture to Iberia. This haplogroup was also shown to be an important part of the Yamnaya-related Early Bronze Age paternal ancestry 67.
Based on recently published aDNA studies, haplogroup R1b-L23 was present in the territory of today’s Czechia and Poland in Corded Ware culture associated samples from 3000-2000 BC 68, and later, during the Migration Period, in Hun period individuals, Avars and Hungarian conquerors 16,17,34,45. Nowadays, examples of this subgroup are scattered throughout Europe, with the highest concentrations in the United Kingdom and Ireland, as per Yfull data.
In the two regions under investigation, five R1b-P25 samples were analyzed for the marker Z2103. All results fell under the Z2103 (R1b1a1b1b) subgroup. As this subgroup was both found in the 8-14th century Volga region (Szeifert et al. 2022) and in the local area in pre-Conquest times (Maróti et al. 2022, Olalde et al. 2023), we cannot estimate its time of arrival to the Carpathian Basin. However, Baranja haplotypes originating from cluster 3 pinpoint a separate population event from the other clusters, most likely originating from the Caucasus.
Further analyses of G2-L156, R1a-Z280, J2b-M12, and C-M217 median-joining networks can be found in Supplementary Information of this paper.
Evaluation of the mitochondrial DNA data
Haplogroup-based analyses
Altogether 168 newly reported high-coverage whole mitogenomes were analyzed in this study, 79 from Zobor region and 89 from Baranja with a mean mitogenome coverage of 209.05x, using Illumina NGS technology.
The mitochondrial haplogroup frequencies of the two populations are presented in Table 2 and in Fig. S1.
| Haplogroup mtDNA | n (absolute frequency, Zobor) | Frequency Zobor region | n (absolute frequency, Baranja) | Frequency Baranja region |
|---|---|---|---|---|
| H | 34 | 43.04% | 37 | 41.57% |
| K | 8 | 10.13% | 7 | 7.87% |
| U5a | 7 | 8.86% | 5 | 5.62% |
| U2 | 5 | 6.33% | 2 | 2.25% |
| J | 4 | 5.06% | 7 | 7.87% |
| T/T1 | 4 | 5.06% | 3 | 3.37% |
| U4 | 3 | 3.8% | 5 | 5.62% |
| HV | 2 | 2.53% | 3 | 3.37% |
| T2 | 2 | 2.53% | 4 | 4.49% |
| V | 2 | 2.53% | 3 | 3.37% |
| X | 2 | 2.53% | 2 | 2.25% |
| Y | 2 | 2.53% | 0 | 0% |
| D | 1 | 1.27% | 0 | 0% |
| N1 | 1 | 1.27% | 0 | 0% |
| U5b | 1 | 1.27% | 5 | 5.62% |
| L | 1 | 1.27% | 0 | 0% |
| R | 0 | 0% | 1 | 1.12% |
| U3 | 0 | 0% | 2 | 2.25% |
| W | 0 | 0% | 3 | 3.37% |
Table 2 Major mtDNA haplogroups and their frequencies in the Zobor region and Baranja populations. Subhaplogroup resolution is detailed in Supplementary Table S3.
In the Zobor region, 79 mitogenome sequences revealed 377 polymorphic sites, corresponding to 63 distinct haplotypes. These exhibited a haplotype diversity (Hd) of 0.9932. On the other hand, 89 mitogenome sequences of the Baranja region population displayed 447 variable sites, clustering into 78 unique haplotypes with a marginally elevated haplotype diversity of Hd = 0.9969 compared to the Zobor region.
The median-joining network of mitogenomes from the investigated regions showed a large variety of different haplogroups among the villages, without any unique pattern in either case (see Fig. 7). Most of the samples belong to the typically European H and U macrohaplogroups. Most of the haplogroups were shared among the villages, and almost all villages have diverse haplogroup distribution of the maternal lines in both studied regions. Notably, the U macrohaplogroup was absent in the samples from the Pohranice municipality in the Zobor region. This absence however might be attributed to the limited sample size from Pohranice. In the Baranja dataset, the majority of samples associated with haplogroup K originate from a single community, specifically Suza.
Due to the uneven and sometimes limited number of samples across villages, an AMOVA test for heterogeneity was not feasible. However, the variation both within and between villages is illustrated in Fig. 7.
A single aDNA study covering the 9th-12th centuries exists for the Zobor region, which served as a Hungarian-Slavic contact zone during that era. Although the ancient sample set is limited in size and restricted to hypervariable sequences, some parallels can be observed, notably within haplogroup U5a1b 71. From the Baranja region, mostly prehistoric sample sets have been published so far; these attest, among others, to the Neolithic presence of haplogroups T2a-b, K1a-b and K2b in the area, and to the prehistoric prevalence of J1c in the broader North Balkan area 72. These lineages are also found in today’s Baranja population.
Although most haplogroups in our samples align with those predominantly found in Europe, several outlier haplogroups were identified, including L1b, N1a, X2, Y1a, D4, U4b and U3b3. The appearance of the outlier maternal lineage L1b in the Zobor dataset is noteworthy. In Europe, mtDNA macrohaplogroup L represents less than 1% of the total population. The L1b subgroup, dated at about 10 kya, has its frequency maximum in West Africa 5. According to phylogeographic analyses carried out by Cerezo et al. 5, around 65% of the European L lineages are believed to have arrived during more recent historical periods, such as the Roman period, the Arab conquest of the Iberian Peninsula and Sicily, and the Atlantic slave trade era 5. Ancient DNA data from these periods of Europe are still scarce; therefore, the origin of this lineage in the Zobor dataset remains open.
Although the mitochondrial N1a haplogroup was prevalent among the ancient Hungarians, the N1a representative from the Zobor region belongs to the prehistoric branch of the haplogroup (N1a1a1a3). The closest parallel to this lineage is from the southern area of Transdanubia (Western Hungary) and dates to the transition between the 6th and 5th millennia BCE (sample I0176 in Haak et al. 2015 67).
Haplogroup X2 occurs in two cases in each region (X2c1 and X2b). X2 is more prevalent in the populations of the Near East, the Caucasus and Mediterranean Europe than in those of northern and northeastern Europe, and it is rare among Eastern European populations. Furthermore, it is virtually absent in the Finno-Ugric and Turkic-speaking peoples residing in the Volga-Ural region 9,49. Both detected subgroups have parallels in prehistoric Europe, where X2b was more frequent. Two X2c1 samples from the Zobor region have a close parallel in the Conquest Period Karos-Eperjesszög cemetery in northern Hungary (Karos 2/70) 14.
The rare mitochondrial haplogroup Y1a is most probably a sign of the maternal continuity of the Avar population in the Zobor region, based on parallels in Gnecchi et al. 2022 and Maróti et al. 2022 15,45. Besides the Avar period of the Carpathian Basin, aDNA haplogroup matches are only known from Mongolia and Kazakhstan 45,47,72. Other outlier haplogroups (D4, U4b, U3b3) are discussed together with the phylogenetic analyses in the subsequent section.
We used PCA to visualize the population genetic relatedness based on mtDNA profiles and haplogroup frequencies of 42 different populations (Supplementary Table S5, Fig. 8).
The PCA positions both the Baranja and Zobor region datasets within the European cluster, aligning closely with the Czech and Slovakian populations. A subtle difference is observed between the Székelys and other published “average” Hungarians on the one hand and the groups of this study on the other, in that most of the East Eurasian haplogroups and haplogroup I are missing in the latter (Supplementary Table S5).
Due to the applied resolution of the haplogroup data, finer differentiation within this European cluster is not discernible.
Sequence-based analyses of the mitogenomes
We examined complete mitogenomes (16,569 base pairs) at the DNA sequence level. Slatkin FST values were then computed (Supplementary Table S10), and a heatmap with clustering of the FST values was generated to show the genetic differentiation among the studied populations (Fig. S11). Interestingly, among the included Conquest Period aDNA datasets, the KL6 group (representing larger village cemeteries from the 10th-11th centuries 49) clusters with the Baranja, Hungarian and Székely datasets.
The differences between the FST values are very small, whole mitochondrial data are missing from some neighboring regions and the Slovakian and Czech datasets are also limited; therefore, the resolution of that analysis is restricted to a broader scale.
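For readers unfamiliar with the statistic, the sketch below shows the linearization commonly associated with Slatkin, FST / (1 − FST). It assumes that the "Slatkin FST values" reported here are of this linearized form, which the text does not state explicitly; the function name and the input value are hypothetical.

```python
import math


def slatkin_linearized(fst):
    """Slatkin's linearized FST: fst / (1 - fst).

    Assumption: the input is an ordinary pairwise FST in the range [0, 1).
    """
    if not 0 <= fst < 1:
        raise ValueError("FST must lie in the interval [0, 1)")
    return fst / (1 - fst)


# Hypothetical pairwise FST, for illustration only
example = slatkin_linearized(0.2)
print(math.isclose(example, 0.25))  # True
```

Because the transformation is close to the identity for small FST, very small distances such as those discussed above remain very small after linearization, which is consistent with the limited resolution noted in the text.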
We analyzed individual maternal lineages to discern the inter-regional relationships of contemporary Hungarian lineages and their ties to prehistoric and historic populations, among other associations. In the following we present those lineages that show diverse phylogenetic connections of the two study areas, including ancient reference samples as well (see references with non-NCBI IDs in Supplementary Tables S11).
On an excerpt of the phylogeographically very diverse, and therefore less informative, NJ tree of haplogroup T1a, one individual from Baranja (DV082) is found in close proximity to an individual from the archaeological site of Bolshie Tigany (Volga-Kama region, Early Medieval) (Fig. S12A). Another studied mitochondrial lineage, from the Zobor region (ZB006), is situated close to an early medieval lineage from the Bayanovo site in the Cis Ural region, associated with the late Lomovatovo culture, and to another sample from the 9th-10th century site of Bolshie Tigany, both located in Russia (Fig. S12B). Although the structure of the whole T1a tree does not allow firm phylogenetic conclusions, these proximities on the tree might hint at a common history of these people (see further maternal connections of ancient Hungarians with the populations of the Cis Ural and Volga regions in Szeifert et al. (2022)).
One sample from the Zobor region (ZB058) belongs to lineage D4b2b. While haplogroup D4 is predominantly found across East Asia, Southeast Asia, Siberia, Central Asia, and among the indigenous populations of the Americas, its presence in Europe is notably sparse 10. The D4 mitogenome NJ tree (Fig. S13) shows that the D4b2b subgroup is nowadays distributed mainly in Eastern Eurasia. Although the currently known medieval ancient data (such as a late medieval Mongolian sample) do not cluster with ZB058, this lineage could have reached the Carpathian Basin in the historically recorded migration waves of the 1st millennium BC.
The U2e phylogenetic tree highlights the diversity observed within the Zobor and Baranja regions (Fig. S14). While the Baranja sample DV023, classified under lineage U2e2a1a, demonstrates northern affiliations, two samples from the Zobor region do not fit neatly into any subgroup currently recognized in the phylotree (falling into the U2e1'2'3 category). Notably, samples from both the Zobor and Baranja regions share the U2e1b1a subgroup with individuals from the 10th-11th centuries in the Carpathian Basin. Furthermore, representatives from the Zobor region and the steppe, associated with the U2e1a1 subgroup, are also evident in the U2e tree (Fig. S14).
The U3 phylogenetic tree indicates that the U3b and U3b3 lineages in Baranja have connections primarily to the south and east (Fig. S15), with ancient haplogroup matches also coming from the Middle East and the Caucasus 72.
The U4 haplogroup evolved during the Last Glacial Period and spread in Northern Eurasia, having been a relatively common lineage among Mesolithic European hunter-gatherers 72. On the U4 phylogenetic tree, the Baranja samples have rather southern (Bulgarian, Serbian) connections, whereas the Zobor region lineages point toward Central and Eastern Europe (Fig. S16).
The U5a haplogroup, prevalent across Western Eurasia, is also well-represented in the modern Carpathian Basin. Notably, its U5a2 subclade establishes a clear link with ancient samples from the closer and wider region, with important examples from the 9th-11th century cemeteries of ancient Hungarians (see Fig. 9, Fig. S17).
The H13 haplogroup is present in both the Zobor region and Baranja, with pairs of individuals in each. However, their phylogeographic patterns differ strikingly (Fig. S18). In Baranja, the H13 lineages branch off basally, preceding most contemporary lineages. Conversely, in the Zobor region, lineages either match Northern European examples (as seen in the H13a1a1a lineage of ZB013) or are akin to a Roman-era sample from Dobrudja and to modern Polish and Russian samples (as observed in the H13a1a3 lineages of ZB042 and ZB047).
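The NJ trees discussed in the lineage descriptions above are built from pairwise distances between sequences. As a generic illustration of the neighbor-joining procedure (not the software, distance model or settings used in this study), a minimal Python sketch follows. The function name, the node labels and the five-taxon distance matrix are hypothetical; the matrix is a common textbook example, not mitogenome data.

```python
def neighbor_joining(labels, matrix):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (joins, last_edge). Each join is a tuple
    (node_a, node_b, new_node, branch_len_a, branch_len_b);
    last_edge is (node_x, node_y, length) for the final two nodes.
    """
    # Working copy of the distances as a nested dict keyed by node name.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels) if i != j}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimizes the Q criterion.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = "node%d" % (len(joins) + 1)
        # Distances from the new internal node to the remaining nodes.
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, u, len_a, len_b))
    x, y = nodes
    return joins, (x, y, d[x][y])

# Hypothetical toy distances between five taxa (not study data).
toy_labels = ["a", "b", "c", "d", "e"]
toy_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree_joins, last_edge = neighbor_joining(toy_labels, toy_matrix)
```

In the toy input, taxa a and b are joined first. Proximity of two samples on such a tree reflects small pairwise distances relative to the rest of the dataset, which is why closeness on a very diverse tree supports only cautious conclusions.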
Mitogenome sequences from Hungarian populations in Hungary, from Székely (Hungarian) people living near Odorheiu Secuiesc in Transylvania, Romania 27, and from the two populations presented here (Baranja and the Zobor region) were tested in Arlequin for population differentiation; the FST values were below 0.0035 with significant p-values.