Mitochondrial DNA sequence similarity of 36 primates
The result of the alignment of the 36 primate mtDNA sequences (plus Monodelphis domestica as an outlier) can be seen in the heatmap in Fig. 1. The Hopkins clustering statistic is 0.904, which indicates very good clustering. The species visibly form compact clusters. Based on k-means clustering, there are seven clusters corresponding to the seven primate genera selected for the study. H. heidelbergensis clearly clusters with modern H. sapiens and also Neanderthal and Denisovan, forming a distinct monophyletic group.
The mean sequence identity value between all four species in this group is 97.33 ± 0.77% (p-value = 2.5∙10− 30). In contrast, the mean sequence similarity between the four Homo individuals with Pan paniscus and P. troglodytes is only 90.7 ± 0.58%.
In the MDS plot in Supplementary Fig. 1, the seven monophyletic groups found in this study (plus M. domestica as an outlier) are coded with different colored dots. To the lower left, we have Homo, Gorilla, Pan, Hylobates, and Pongo. To the lower right, we have Macaca and Papio. The outlier, M. domestica is at the center top of the plot. The MDS coordinates for all 37 species are provided in Supplementary File 1.
Homo versus Pan
In a second study comparing the sequence similarity of H. heidelbergensis with human, 21 mtDNA sequences from modern humans were downloaded from NCBI and were analyzed with Neanderthal, Denisovan, and H. heidelbergenesis, and two mtDNA sequences from Pan paniscus and one from P. troglodytes.
In Fig. 2, two histograms showing the frequencies of mtDNA sequence similarities (MSS) are shown. In Fig. 2A, we see the frequency of MSS of 21 modern humans among modern humans, Neanderthal, Denisovan, H. heidelbergensis, and Pan. H. heidelbergensis is arguably the least similar archaic human compared to modern humans. It is the farthest away from modern humans, followed by Denisovan and then Neanderthal. This is to be expected, since H. heidelbergensis is the oldest individual from the Homo group, being a super-archaic human (Brunner and Lombard, 2020).
However, in Fig. 2B, we see a different picture. The MSS between P. troglodytes and P. paniscus is a mere 97.1%, with a variation of 2.5%. This appears to be the upper limit of mtDNA variation between members of the genus Pan. Despite their reproductive and morphological differences, chimpanzees and bonobos can hybridize in captivity (Vervaecke and Van Elsacker, 1992), and there is evidence of introgression of a small portion of nuclear genetic material between the two species based on a study of 75 wild-born chimpanzees and bonobos (de Manuel et al., 2016). The MSS variation of Homo seen in Fig. 2A is only around 3.5%, less than that within Pan. Modern humans, Neandertals, and Denisovans are thought to have interbred with one another in the past (Pennisi, 2013; Coll Macià et al., 2021; Gopalan et al., 2021; Larena et al., 2021). Furthermore, as we can see from Table 1, the MSS between H. heidelbergensis and Pan is the least among Homo, at 89.8%.
Group
|
Mean ± st.dev.
|
No. comparisons
|
Table 1
Mean sequence similarity value between modern humans, Neanderthal, Denisovan, H. heidelbergensis, and Pan.
All modern humans
|
0.997 ± 0.0017
|
210
|
Modern humans and Neanderthal
|
0.986 ± 7E-4
|
21
|
Modern humans and Denisovan
|
0.975 ± 5E-4
|
21
|
Modern humans and H. heidelbergensis
|
0.967 ± 6e-04
|
21
|
Modern humans and Pan
|
0.91 ± 9e-04
|
63
|
All Pan
|
0.971 ± 0.0248
|
3
|
Pan versus Neanderthal
|
0.912 ± 6e-04
|
3
|
Pan versus Denisovan
|
0.909 ± 6e-04
|
3
|
Pan versus H. heidelbergensis
|
0.898 ± 0.0012
|
3
|
To illustrate relationships within the human clade, a phylogenetic tree was created using the MEGA-X software shown in Fig. 3. The 25 modern human individuals are closely grouped together, whereas Neanderthal, Denisovan, and H. heidelbergensis are more basal in the tree, as they are more ancient humans, separated from all other sequences by more mutations.
Variant analysis of H. heidelbergensis and modern human
For each of the nine SRA samples from BioProject PRJEB10957 several variant statistics were called for H. heidelbergensis, as reported in Table 2. These same statistics are also provided for BioProject PRJEB22592 in Table 3, and are also provided in Supplementary Data File 3.
|
% aln hg38
|
% aln ntal
|
% aln pt6
|
No. hg38 vars
|
hg38 var. density
|
No. ntal vars
|
ntal var. density
|
No. pt6 vars
|
pt6 var. density
|
pt6/hg38 dens. prop.
|
pt6/ntal dens. prop.
|
hg38/ntal dens. prop.
|
Table 2
Variant calling statistics for PRJEB10957: reads from H. heidelbergensis to hg38, Neanderthal and pt6
Mean
|
0.966
|
0.959
|
0.742
|
16356.4
|
54.6
|
16904.556
|
51.1
|
35767.8
|
28.1
|
2.440
|
2.300
|
0.946
|
Std. dev.
|
0.034
|
0.035
|
0.161
|
32127.7
|
66.5
|
32951.654
|
62.2
|
59506
|
29
|
1.267
|
1.154
|
0.021
|
|
% aln hg38
|
% aln ntal
|
% aln pt6
|
No. hg38 vars
|
hg38 var. density
|
No. ntal vars
|
ntal var. density
|
No. pt6 vars
|
pt6 var. density
|
pt6/hg38 dens. prop.
|
pt6/ntal dens. prop.
|
hg38/ntal dens. prop.
|
Table 3
Variant calling statistics for PRJEB22592: reads from an archaic human to hg38, Neanderthal and pt6 (with USER treatment)
Mean
|
0.999
|
0.996
|
0.945
|
2521616.000
|
70.737
|
2975644.800
|
59.190
|
20786182.8
|
8.211
|
74.870
|
1.180
|
0.847418
|
Std. dev.
|
0.000
|
0.001
|
0.006
|
1066303.035
|
58.051
|
1322310.477
|
46.696
|
11109092.69
|
5.688
|
2.081
|
1.240
|
0.806394
|
Table 4. P-values coming from Student t-test comparing variant densities from SRA samples from H. heidelbergensis, and archaic human to hg38, ntal and pt6. The term “less” means that the variant density values from column 1 are less than those from column 2.
variant density 1
|
variant density 2
|
p-value
|
p-value (“less”)
|
H. heidelbergensis vs. hg38
|
archaic human vs. hg38
|
0.647
|
0.677
|
H. heidelbergensis vs. ntal
|
archaic human vs. ntal
|
0.790
|
0.395
|
H. heidelbergensis vs. pt6
|
archaic human vs. pt6
|
0.078
|
0.961
|
Proportion of aligned reads to WGS
The proportion of aligned reads from a DNA sample to the WGS of another species should be high if the other genome comes from a species from the same monophyletic group and lower if from another monophyletic group. The proportion of reads from H. heidelbergensis mapping to hg38 and the proportion of reads from the archaic human to hg38 sample should not differ significantly. For PRJEB10957, the mean proportion of H. heidelbergensis reads mapping to hg38 (96.6 ± 3.4%) is much higher than that mapping to pt6 (74.2 ± 16.1%). This proportion is also very similar to the mean proportion of reads mapping to the Neanderthal genome (95.9 ± 3.5%).
When comparing the proportion of reads mapping from H. heidelbergensis to hg38 to the proportion of the same reads mapping to pt6, there is a statistically significant difference, with a p-value of 0.003. When comparing the proportion of reads mapping from H. heidelbergensis to the Neanderthal WGS to the proportion of the same reads mapping to pt6, the p-value is 0.004. However, when the proportion of H. heidelbergensis reads mapping to hg38 is compared to the proportion of these reads mapping to the Neanderthal WGS, the p-value is insignificant at 0.675. This means that the H. heidelbergensis reads map in the same manner to both modern (hg38) and the archaic genome (Neanderthal); there is no statistically significant difference between the two modes of mapping.
Variant density
Another important measurement is the number of variants in the nine H. heidelbergensis SRA samples compared to the human genome. The more variants and the kinds of variants between two species’ genomes could help us tell whether they are from the same or different monophyletic groups. If the two species are from the same group, their genome would harbor fewer differences than if they were from different monophyletic groups.
The variant number can be converted to a variant density by dividing the number of reads in a given sample by the number of variants found in the sample when mapping the reads to a given genome. This number corresponds to the average length of DNA between two neighboring variants. The variant density has an inverted relationship to the number of variants. A lower density value corresponds to a higher number of variants, and vice-versa. This was done because the H. heidelbergensis reads do not cover the whole genome. Since we are also looking at the number of reads instead of the total length of the DNA from a given SRA sample, this is only a rough estimate of the variant density.
As can be seen from Table 2, the mean number of variants is about the same when the H. heidelbergensis reads are mapped to modern human (16,356.4 ± 32,127.7) and to Neanderthal (16,904.6 ± 32,951.77). However, the mean number of variants found when mapping the H. heidelbergensis reads to the chimpanzee genome is much greater (35,767.8 ± 59,506.8). The variant density when mapping the H. heidelbergensis reads to the genome of modern humans and Neanderthals is 54.6 ± 66.5 and 51.1 ± 62.2 as opposed to the same reads mapping to the chimp genome, 28.1 ± 29.0.
If H. heidelbergensis belongs to a clade defined by the continuity of archaic and modern humans, then the variant densities calculated from its nine SRA samples (when mapped to a particular genome) should correlate highly with the variant density of the five SRA samples (ERR2117984–ERR2117988) from PRJEB22592, which contains DNA from an archaic “Paleolithic” human, when both are mapped to the same genome.
On the other hand, if H. heidelbergensis comes from a non-human group, then its pattern of DNA variant differences should differ from that of a human-to-human comparison. In order to measure this, three Student’s t-tests were performed, to see whether there is a statistically significant difference when a.) mapping H. heidelbergensis reads to hg38 compared to mapping ancient human reads to hg38, b.) mapping H. heidelbergensis reads to the Neanderthal genome compared to mapping ancient human reads to the Neanderthal genome, and c.) mapping H. heidelbergensis reads to pt6 compared to mapping ancient human reads to pt6. These comparisons are summed up in Table 4.
When reads from H. heidelbergensis archaic human were both mapped to hg38, the difference in variant density between these two comparisons has a p-value of 0.677. When the target genome is the Neanderthal genome, the p-value is 0.395. When the target genome is the pt6 genome, the p-value is 0.961.
This means that there is no statistically significant difference between the two variant densities; the mapping behavior of the H. heidelbergensis reads behaves very much like the ones from archaic humans. This implies that H. heidelbergensis is just as human as the archaic human from PRJEB22592. All three of these comparisons imply that the H. heidelbergensis reads behave much like those of ancient human, meaning that H. heidelbergensis is human.
Lastly, the different variant densities can be compared to one another by dividing one by the other. This way we can calculate a relative variant density measure, to see how much denser a given genome is in the number of variants compared to another genome. These proportions were calculated for pt6/hg38, pt6/ntal, and hg38/ntal over all nine H. heidelbergensis SRA samples (see Fig. 4). Overall, variants called in the pt6 genome are 2.4 and 2.3 times denser than those variants in the hg38 and Neanderthal genomes. Samples ERR995362 and ERR996364 may have a variant density proportion less than 1 for pt6/hg38 and pt6/ntal, because these samples have a very small number of reads in them. This may be due merely to biological variation.
Skewed variant frequency
The results presented here must be taken with caution since sequencing aDNA does have its caveats. These include contamination with DNA from microbes or modern humans, and degradation of the aDNA (Thomas and Tomkins, 2014).
Proportionately slightly fewer H. heidelbergensis reads map to the hg38 genome, compared to reads from ancient (“Paleolithic”) humans. Also, there are relatively more variants when mapping H. heidelbergensis reads to hg38 as opposed to ancient human reads. There are two reasons for this. The first is that over time, DNA variants could have accumulated in the modern human genome compared to the ancient genome of H. heidelbergensis. The second could be due to deamination from C to T (mirrored on the reverse strand as G to A). The research group that isolated the DNA from the H. heidelbergensis samples claimed that the frequency of C > T deaminations rose from 12–17% at the 5’ end of the reads to 55–62% at the 3’ end of the reads! They found that this rate of deamination is also similar to the same rate as found in bear remains from the same site at Sima de los Huesos (Meyer et al., 2014).
The frequency of each of the twelve possible SNPs was counted for each of the nine H. heidelbergensis SRA samples when mapped to both the hg38 and the Neanderthal genomes. This information can be found in Supplementary File 3. The proportion of each of the twelve SNPs between the H. heidelbergensis reads and both hg38 and the Neanderthal genome for SRA samples ERR995367–ERR995361 were plotted and can be found in Supplementary Figs. 2a and 2b online. These five SRA samples were chosen, because the other four samples had a very low number of reads, and some of the twelve SNPs did not occur in those samples. It is clear from the results that the C > T/G > A SNPs are the most common SNPs, making up 33.2–47% for hg38 and 33.1–46.7% for Neanderthal. These proportions cannot be by mere chance.
Guo and Jamison (2005) calculated that about 33.2% of all SNPs are C > T/G > A. The present study also calculated the frequency of C > T/G > A substitutions from the dbSNP to be 32%. The average C > T/G > A substitution rate over the five H. heidelbergensis SRA samples is 40.1 ± 2.9%. Thus, the proportion of C > T/G > A substitutions between the H. heidelbergensis and modern human genomes is slightly elevated. This corresponds to a Z-score of 2.69, which denotes that the two distributions are significantly different. It means that around 6.9% (40.1% – 33.2%) of the C > T/G > A transitions are due to deamination.
The mean number and standard deviation of SNPs excluding C > T and G > A were calculated for these five samples (see “PRJEB10957 SNP hg38 dist.” tab, rows “z (C > T)” and “z (G > A)” in Supplementary File 3). A z-score was calculated for C > T and G > T in these five samples to see how extreme they are. For all five samples, the z-score was greater than 1.65, meaning that these results are significant at the 5% level. This means that the higher number of C > T and G > A variants are not occurring by random chance. Since the number of C > T and G > A variants are skewed, this means that the genetic distance between H. heidelbergensis and modern humans decreases, and strengthens the conclusion that H. heidelbergensis is human.
Percent dissimilarity of reads mapped to hg38 and pt6
As the last step, 10,000 read sequences from the fastq file of the samples from PRJEB10957 and PRJEB22952 were mapped to both the hg38 and the pt6 genomes using BLASTN. The mean percent mismatches were noted as well as their standard deviations. The normal curves with these mean and standard deviation values are plotted in Fig. 5, and the mean and standard deviations are shown in Table 5.
Table 5
Mean % ± std. dev dissimilarity of 10,000 sequencing reads from SRA samples PRJEB10957 and PRJEB22952 mapped to hg38 and pt6,
|
hg38
|
pt6
|
Homo heidelbergensis
|
0.0049 ± 0.0022
|
0.0283 ± 0.016
|
‘Paleolithic’ archaic human
|
0.0045 ± 0.003
|
0.0283 ± 0.014
|
Z-score
|
0.108
|
1.9E-4
|
The z-score, described in the Materials and Methods section describes how similar two normal distributions are. Both values listed in the last row of Table 5 show that the z-scores comparing H. heidelbergensis to hg38 and pt6 and also the ‘paleolithic’ archaic human to hg38 and pt6 are well below 2.0, meaning that these distributions are almost identical to one another. This means that the genetic distance according to read sequence mismatches between modern human and ‘paleolithic’ archaic human and between modern human H. heidlbergensis are virtually the same, also supporting the idea that H. heidelbergensis is human.