Run time
The run time of all 6 HA algorithms on 6 datasets (hg19 and hg38, each with 3 DP levels) is shown in Table 7 and Fig. 4. The table and figure provide the same information, but they show different details and perspectives. Table 7 gives specific run time, while Fig. 4 shows overall patterns. When comparing the run time of the 6 HA algorithms based on both hg19 and hg38 data’s 3 coverage levels, Table 7 and Fig. 4 show that the fastest HA is HapCUT2 for both hg19 and hg38 data; its run time is always under 2 minutes. SDhaP runs take longer when running hg19 data than hg38 data (1042 ~ 2534 minutes for hg19 vs 107 ~ 385 minutes for hg38). In fact, SDhaP is the slowest HA for hg19 but is near the average for hg38. Meanwhile, MixSIH and PEATH take longer when using hg38 data than hg19 data. Surprisingly, PEATH is actually the slowest HA for hg38, but it is one of the fastest (behind HapCUT2) for hg19, that is, taking < 10 minutes for hg19 but 291 ~ 881 minutes for hg38. In general, Fig. 4 shows that the DP30 runs (blue bars) take significantly less time than DP1 (green bars) and DP15 (yellow bars). This difference may be due to the fact that the input dataset of DP30 is much smaller after filtering based on coverage, i.e., with a much smaller number of DNA fragments as shown in Table 1. As the DP increases, the run time decreases (maybe because the dataset becomes smaller) or does not change much, but there are some outliers. These outliers include SDhaP’s run time on both the hg19 and hg38 datasets.
When running the 6 HA algorithms across the hg19 and hg38 data, there seems to be a longer run time for the hg38 dataset, especially for MixSIH and PEATH, but not for SDhaP. This longer time could be due to the fact that hg38 sequencing reads are longer (2X150 bp paired-end) than the hg19 sequencing reads (2X100 bp paired-end). The run time increases from 1 to 2 seconds (HapCUT2) to 800 minutes (PEATH, DP1). Some outliers of the run time include WhatsHap (DP1 and DP15), MAtCHap (DP1 and DP15), and SDhaP, where the hg38 data run time is around 8 to 700 minutes faster than the hg19 data. In addition to the above main conclusion, all HA methods, except HapCUT2, have faster run time with batch (or sbatch) files (that is to run the HA algorithm by submitting a sbatch job to our Linux server). The sbatch files run around 1 to 22 minutes faster than the files ran without using sbatch on the hg19 DP1 data. HapCUT2 has the fastest run time regardless of using sbatch or not. The run time shown in Table 7 is obtained based on the sbatch runs in our Linux server.
Table 7
Run time of the 6 HA algorithms on hg19 and hg38 data.
no.HI hg19
|
DP1
|
DP15
|
DP30
|
HapCUT2
|
0m0.020s
|
1m27.862s
|
0m50.749s
|
MixSIH
|
166m53.561s
|
167m55.217s
|
60m32.406s
|
PEATH
|
9m9.623s
|
9m17.898s
|
6m55.111s
|
WhatsHap
|
21m47.595s
|
18m32.676s
|
7m47.223s
|
SDhaP
|
1042m19.795s
|
2534m53.175s
|
2395m2.418s
|
MAtCHap
|
145m32.562s
|
74m7.937s
|
2m5.477s
|
no.HI hg38
|
DP1
|
DP15
|
DP30
|
HapCUT2
|
1m38.330s
|
1m27.657s
|
1m15.984s
|
MixSIH
|
533m21.192s
|
521m52.663s
|
323m4.853s
|
PEATH
|
881m41.440s
|
851m6.780s
|
291m25.599s
|
WhatsHap
|
13m18.916s
|
13m23.951s
|
9m34.321s
|
SDhaP
|
243m21.180s
|
385m9.746s
|
107m9.087s
|
MAtCHap
|
47m46.312s
|
42m7.180s
|
23m57.059s
|
In each cell, “m” means minute; “s” means second. For example, “0m0.020s” means 0 minute and 0.02 second. |
Summary of block (blk) and SNV numbers
Table 8 shows the summary of block and SNV numbers for the hg19 and hg38 at the DP1 level. Each row has the summary of the number of SNV per block for all haplotype blocks inferred by one HA. This table shows that for the hg19 DP1 data, WhatsHap infers about 50% more SNV positions (178534 vs 115000) and it has much smaller number of blocks (10132 vs 32000), but its block length is generally longer than the other HA algorithms (the longest block has 5194 SNVs for WhatsHap vs 770 SNVs for other HA algorithms). However, in the hg38 DP1 data, the total number of SNVs, blocks, and SNVs per block summary of WhatsHap is very similar to the majority of other HA methods. In the hg38 DP1 data, PEATH and MAtCHap’s block lengths and numbers of SNVs are similar except for a small number of extremely long blocks (longest block 6999 vs 2100 ~ 2400 for other HA methods). Similar outlying patterns of WhatsHap, PEATH, and MAtCHap can be seen in hg19 and hg38’s DP15 and DP30 data too. Please see the Supplemental Table 2 in the Additional File 1.
Table 8
Block and SNV number summary of the hg19 and hg38 DP1 data.
hg19.DP1.no.HI
|
filterSNV (no"-")
|
Block (no "-")
|
Min
|
Q1
|
Median
|
Mean
|
Q3
|
Max
|
HapCUT2
|
115215
|
32150
|
2
|
2
|
2
|
3.58
|
4
|
770
|
MixSIH
|
115790
|
32252
|
2
|
2
|
2
|
3.59
|
4
|
770
|
PEATH
|
115813
|
31355
|
2
|
2
|
2
|
3.69
|
4
|
770
|
WhatsHap
|
178523
|
10132
|
2
|
2
|
2
|
17.6
|
9
|
5194
|
SDhaP
|
115813
|
32252
|
2
|
2
|
2
|
3.59
|
4
|
770
|
MAtCHap
|
115813
|
31355
|
2
|
2
|
2
|
3.69
|
4
|
770
|
hg38 DP1.no.HI
|
filterSNV (no"-")
|
Block (no "-")
|
Min
|
Q1
|
Median
|
Mean
|
Q3
|
Max
|
HapCUT2
|
100686
|
21710
|
2
|
2
|
2
|
4.64
|
5
|
2187
|
MixSIH
|
100807
|
21726
|
2
|
2
|
3
|
4.64
|
5
|
2190
|
PEATH
|
100810
|
19660
|
2
|
2
|
3
|
5.13
|
4
|
6988
|
WhatsHap
|
100785
|
21654
|
2
|
2
|
3
|
4.65
|
5
|
2479
|
SDhaP
|
100810
|
21726
|
2
|
2
|
3
|
4.64
|
5
|
2190
|
MAtCHap
|
100810
|
19660
|
2
|
2
|
3
|
5.13
|
4
|
6988
|
The top 7 rows are for the hg19 DP1 data. The bottom 7 rows are for the hg38 DP1 data. The first 3 columns are HA names, total number of SNVs (after “-” being removed), and total number of haplotype blocks. “no.HI” in the first column means that no homozygous variants and indel sites are included. Columns 4 to 9 are a summary of the number of SNVs per block for all blocks listed in the third column. “Min” means minimum. “Max” means maximum. “Q1” is the 25th percentile. “Q3” is the 75th percentile. |
Comparing based on positions in known haplotypes
In order to compare the 6 HA algorithms, we first compare the chromosome positions from the known haplotypes with the SNV positions in the haplotypes inferred by each HA algorithm. The basic rational is that if the SNV or chromosome positions inferred by these HA algorithms are not in the positions listed in the known haplotypes or many inferred SNVs are not in the known haplotypes, then it is not useful to compare using the known haplotypes. That is, for example, if HapCUT2 infers 100 SNVs or positions in its output, and only 50 of them overlap with the known (or true) haplotype positions, it will not be meaningful to use the known haplotypes as a reference or standard. By comparing the SNV positions only, as shown in Table 9 we find that for the hg19 data, except WhatsHap, only 37–38% (for DP1 and DP15) and 18% (for DP30) of SNVs in the other 5 HA algorithms overlap with the known haplotype position or SNVs. For the hg38 data, the overlap is much larger, 81–84%. That is, there are still about 20% of SNV positions (roughly about 20,000 SNVs) that are not in the known haplotypes. Therefore, it is not proper to compare their haplotypes with known ones.
Table 9
Comparing SNVs in 6 HA algorithms with positions in known haplotypes.
HA methods
|
hg19.DP1
|
overlap
|
hg19.DP15
|
overlap
|
hg19.DP30
|
overlap
|
HapCUT2
|
115215
|
43868 (38.07%)
|
96357
|
35930 (37.29%)
|
18824
|
3517 (18.68%)
|
MixSIH
|
115790
|
44210 (38.18%)
|
96818
|
36211 (37.40%)
|
18885
|
3542 (18.76%)
|
PEATH
|
115813
|
44223 (38.18%)
|
96835
|
36225 (37.41%)
|
18887
|
3542 (18.75%)
|
WhatsHap
|
178523
|
97865 (54.82%)
|
152769
|
84094 (55.05%)
|
24628
|
7788 (31.62%)
|
WhatsHap.Filter
|
115745
|
44198 (38.19%)
|
96783
|
36204 (37.41%)
|
18864
|
3540 (18.77%)
|
SDhaP.100595
|
115813
|
44223 (38.18%)
|
96835
|
36225 (37.41%)
|
18887
|
3542 (18.75%)
|
MAtCHap
|
115813
|
44223 (38.18%)
|
96835
|
36225 (37.41%)
|
18887
|
3542 (18.75%)
|
HA methods
|
hg38.DP1
|
overlap
|
hg38.DP15
|
overlap
|
hg38.DP30
|
overlap
|
HapCUT2
|
100686
|
84197 (83.62%)
|
99572
|
83851 (84.21%)
|
63970
|
52185 (81.58%)
|
MixSIH
|
100807
|
84239 (83.56%)
|
99675
|
83887 (84.16%)
|
64023
|
52203 (81.54%)
|
PEATH
|
100810
|
84239 (83.56%)
|
99676
|
83888 (84.16%)
|
64025
|
52203 (81.54%)
|
WhatsHap
|
100785
|
84332 (83.68%)
|
99648
|
83983 (84.28%)
|
63999
|
52265 (81.67%)
|
WhatsHap.Filter
|
100625
|
84199 (83.68%)
|
99487
|
83846 (84.28%)
|
63917
|
52186 (81.65%)
|
SDhaP
|
100810
|
84239 (83.56%)
|
99676
|
83888 (84.16%)
|
64025
|
52203 (81.54%)
|
MAtCHap
|
100810
|
84239 (83.56%)
|
99676
|
83888 (84.16%)
|
64025
|
52203 (81.54%)
|
The columns are HA methods, numbers of SNVs obtained based on 3 DP level input data, and the corresponding overlap with 3591460 SNVs (for the hg19 data) and with 3632297 SNVs (for the hg38 data) in known haplotypes. |
Pairwise comparison based on SNV and block
As shown in the Table 8, the WhatsHap output for the hg19 DP1 data contains 178523 SNVs, around 50% more than the other HA algorithms, which makes it an "improper" or "unbalanced" comparison. In order to have a "proper" pairwise comparison, we filter the WhatsHap output to only contain chromosome positions found in the PEATH and MAtCHap output files. We use these two algorithms because they have identical 115813 SNV positions, and HapCUT2 and MixSIH have a similar number of SNVs. We then perform pairwise comparisons by including both the unfiltered and filtered WhatsHap (WhatsHap.Filter or WhatsHap.F) output with the other HA algorithms.
Table 10 is the pairwise comparison of 6 HA algorithms plus WhatsHap.Filter. Each row is the pairwise comparison result with that row HA as the reference (HA1). For example, the second row in the top panel is HapCUT2. That is, we use its output as a reference, i.e., HA1. Then the other HA algorithms listed in different columns are called HA2, where each HA2 is compared with HA1 to see how many blocks of HA2 (in the column) differ from HA1 (in the row). The comparison results are shown in the top panel (block disagreement), then we count the total number of SNVs in these disagreement blocks. These comparison results are shown in the bottom panel (SNV disagreement). We do this type of pairwise comparison for hg19 and hg38’s DP1, DP15, and DP30 data. That is, in total, there are 6 comparison tables, each with two panels, one for block disagreement and one for SNV disagreement as shown in Table 10. In order to avoid showing all these tables, we provide all pairwise comparison tables in the Additional File 2, which is an EXCEL file for hg19 and hg38 data and each with 3 sheets for DP1, DP15, and DP30 respectively.
Regarding the WhatsHap.Filter results, 115745/178523 SNVs (after removing blocks with 1 SNV) are left. That is, WhatsHap and other HA algorithms have a large number of common SNVs after the filter/selection. The filtered output has about a 2362/8149 = 28.99% block disagreement when compared with HapCUT2. These 2362 blocks consist of 94685 SNVs. This means that, on average, there are approximately 40 SNVs on each of these blocks. Thus, the output generated by WhatsHap usually has disagreements with other HA algorithms on long or large blocks. In addition, we find that both the SNV and block disagreements are much lower for the WhatsHap.Filter output than those for the unfiltered output as shown in Table 10.
Table 10
Pairwise comparison based on block and SNV disagreements of hg19 DP1 data.
Block-Disagree
|
HapCUT2
|
MixSIH
|
PEATH
|
WhatsHap
|
WhatsHap.
Filter
|
SDhaP
|
MAtCHap
|
HapCUT2
(32150 blocks)
|
0
|
459/32150 (1.43%)
|
188/32150 (0.58%)
|
1690/32150 (5.26%)
|
1690/32150 (5.26%)
|
4860/32150 (15.12%)
|
494/32150 (1.54%)
|
MixSIH
(32252 blocks)
|
885/32252 (2.74%)
|
0
|
655/32252 (2.03%)
|
1924/32252 (5.97%)
|
1924/32252 (5.97%)
|
5109/32252 (15.84%)
|
664/32252 (2.06%)
|
PEATH
(31355 blocks)
|
1005/31355 (3.21%)
|
976/31355 (3.11%)
|
0
|
2032/31355 (6.48%)
|
2032/31355 (6.48%)
|
5215/31355 (16.63%)
|
838/31355 (2.67%)
|
WhatsHap
(10132 blocks)
|
6353/10132 (62.70%)
|
6344/10132 (62.61%)
|
6328/10132 (62.46%)
|
0
|
|
6602/10132 (65.16%)
|
6347/10132 (62.64%)
|
WhatsHap.Filter (8149 blocks)
|
2362/8149 (28.99%)
|
2258/8149 (27.71%)
|
2227/8149 (27.33%)
|
-
|
0
|
2824/8149 (34.65%)
|
2255/8149 (27.67%)
|
SDhaP
(32252 blocks)
|
5207/32252 (16.14%)
|
5121/32252 (15.88%)
|
5109/32252 (15.84%)
|
5702/32252 (17.68%)
|
5702/32252 (17.68%)
|
0
|
5129/32252 (15.90%)
|
MAtCHap
(31355 blocks)
|
1274/31355 (4.06%)
|
954/31355 (3.04%)
|
838/31355 (2.67%)
|
2105/31355 (6.71%)
|
2105/31355 (6.71%)
|
5246/31355 (16.73%)
|
0
|
SNV- Disagree
|
HapCUT2
|
MixSIH
|
PEATH
|
WhatsHap
|
WhatsHap.
Filter
|
SDhaP
|
MAtCHap
|
HapCUT2 (115215 SNVs)
|
0
|
8881/115215 (7.71%)
|
6190/115215 (5.36%)
|
15960/115215 (13.85%)
|
15960/115215 (13.65%)
|
27098/115215 (23.52%)
|
9224/115215 (8.01%)
|
MixSIH
(115790 SNVs)
|
11034/115790 (9.53%)
|
0
|
10153/115790 (8.60%)
|
17307/115790 (14.95%)
|
17307/115790 (14.95%)
|
28183/115790 (24.34%)
|
10240/115790 (8.84%)
|
PEATH
(115813 SNVs)
|
12872/115813 (11.11%)
|
13671/115813 (11.80%)
|
0
|
19263/115813 (16.63%)
|
19263/115813 (16.63%)
|
31494/115813 (27.19%)
|
12508/115813 (10.80%)
|
WhatsHap (178523 SNVs)
|
165569/178523 (92.74%)
|
165603/178523 (92.76%)
|
165582/178523 (92.75%)
|
0
|
|
166693/178523 (93.37%)
|
165661/178523 (92.80%)
|
WhatsHap.Filter (115745 SNVs)
|
94685/115745 (81.80%)
|
93576/115745 (80.85%)
|
93258/115745 (80.57%)
|
-
|
0
|
96750/115745 (83.59%)
|
93491/115745 (80.77%)
|
SDhaP
(115813 SNVs)
|
28693/115813 (24.78%)
|
28261/115813 (24.40%)
|
28253/115813 (24.40%)
|
31441/115813 (27.15%)
|
31441/115813 (27.15%)
|
0
|
28452/115813 (24.57%)
|
MAtCHap (115813 SNVs)
|
15417/115813 (13.31%)
|
13563/115813 (11.71%)
|
12508/115813 (10.80%)
|
19750/115813 (17.05%)
|
19750/115813 (17.05%)
|
31801/115813 (27.46%)
|
0
|
Each cell consists of the count and percentages of blocks (top panel) and SNVs (bottom panel) that two HA methods disagree with. “-“ means that we do not compare WhatsHap with WhatsHap.Filter. |
To demonstrate the patterns we notice in the comparison Table 10 clearly, we create bar plots as shown in Fig. 5 and Fig. 6. In these bar plots, in the bottom along the x-axis, 6 HA algorithms plus WhatsHap.Filter are shown as HA1, and the “rainbow” colored bars are for each of the other HA algorithms, i.e., HA2, to compare. Figure 5 shows the following key patterns. First, the hg38 datasets (bottom 3 plots) show that all 6 HA algorithms have much better agreement rates than the hg19 datasets (top 3 plots). Second, for both hg19 and hg38, comparisons with SDhaP as HA2 seem to have the highest block disagreement, see the outstanding blue bars in Fig. 5. Third, for the hg19 data, WhatsHap, WhatsHap.Filter, and SDhaP have the most block disagreement when used as HA1, see the x-axis for tall bar clusters in the top panels. However, for the hg38 data, SDhaP generally has the most disagreement, see the blue bars in the bottom 3 plots in Fig. 5. We have the pairwise comparisons based on the total number of disagreement SNVs. Figure 6 shows similar patterns to Fig. 5. That is, for most comparisons, both hg19 and hg38 follow the same trend across three DP levels and have similar results, with hg19 having marginally higher SNV disagreement than hg38 (hg19 has around 10–20% while hg38 has less than 10% on average). With WhatsHap and WhatsHap.Filter as HA1, there is much larger SNV disagreement than all other comparisons in the hg19 data, but not the hg38 data. Disagreement with SDhaP as HA2 is again the largest, see the blue bars in Figs. 5 and 6.
Similar to the pairwise comparison based on the block and SNV disagreement, we also do the pairwise comparison using switch errors. As stated in the comparison analysis we use 12 different but related switch-distance metrics to compare those HA algorithms. For each metric, we plot the pairwise comparison results. Some of them have similar patterns and information. To avoid showing redundant figures, we choose to show 2 of those 12 metrics. These two metrics are total.sw and blk.w.0sw as shown in Figs. 7 and 8 respectively. The bar plot layout (order and color) in these figures is the same as Figs. 5 and 6. The bar plots of all 12 metrics are shown in the Additional File 3 (a PDF of 12 pages).
Pairwise comparison based on switch distance
Figure 7 is for the metric total.sw. That is, it shows the total number of switches an HA2 algorithm needs to match the HA1 haplotypes. In this figure, we find the following striking patterns. First, all 6 HA algorithms seem to agree with each other better in the hg38 data than in the hg19 data. Second, for both hg19 and hg38 data, comparisons with SDhaP as HA2 show the greatest difference (blue bars), that is, more switches needed, i.e., high disagreement bars. Third, for hg19, WhatsHap.Filter and SDhaP as HA1 have the largest number of switches (tall bar clusters). For hg38, only SDhaP has the largest number of switches (blue bars). Fourth, for the hg19 data, when WhatsHap is used as the HA1, the number of switches needed is the smallest (see the short bars above WhatsHap). This may be due to the fact that WhatsHap has long blocks, while the other HA method (i.e., HA2) does not have those long blocks to compare with. Thus, it has a small number of switches needed. This can be seen in Fig. 8, which shows low bars above WhatsHap. However, for the WhatsHap.Filter, after removing those SNVs not inferred by other HA algorithms, its haplotype block structures are altered, and then many switches are needed. Fifth, for hg19 data, DP1 and DP15 plots are very similar in size. The switches required for DP30 are much smaller, see low bars in Fig. 7 for total switches.
Figure 8 is for the metric blk.w.0sw. This figure shows the total number of blocks with 0 switches. That is, the figure shows the number of blocks where those HA algorithms agree with each other. We find the following patterns in this figure. First, for the hg19 data, the blocks with 0 switches are consistently around 30000 for DP1, 25000 for DP15, and 5000 for DP30, except when WhatsHap & WhatsHap.Filter is HA1 (see low bars). Second, for hg38, 6 HA algorithms’ block numbers are pretty consistent with around 20000 for DP1, 20000 for DP15, and 15000 for DP30. Third, the patterns for hg19 DP30 and hg38 DP1, DP15, and DP30’s numbers of blocks with 0 switches are similar. That is, 6 HA methods perform similarly when the DP level (i.e., DP30) is high, and the datasets have long reads (i.e., hg38 data).
WhatsHap has a much smaller number of blocks with 0 switches (sw = 0) and with sw (sw > 0) (Figs. 7 and 8). This is because it has a significantly larger average number of SNVs per block that cannot be checked due to different numbers of SNVs. In general, the number of SNV per block with switches is about 3 for hg19 and 4 for hg38. On average, for those blocks two HA methods disagree with each other, 1 to 2 switches are necessary per short block.
Overall, HapCUT2, MixSIH, PEATH, and MAtCHap have relatively low disagreement percentages for both hg19 and hg38 datasets with different filtering levels. This conclusion can be made based on the pairwise comparison analyses of haplotype blocks, SNVs, and switch error rates. However, both SDhaP and WhatsHap result in much higher disagreement percentages with the other algorithms in the hg19 data. For hg38, WhatsHap has similar performance compared with HapCUT2, MixSIH, PEATH, and MAtCHap. SDhaP still performs differently.