Alignment and voting pipeline
Using the BWA-MEM and Minimap2 aligners alongside custom R scripts, we developed a workflow for determining HLA haplotypes from Nanopore reads (Fig. 2A). Each sample file was aligned with parameters for ONT reads using either BWA-MEM or Minimap2. The resulting SAM files were imported into R as data frames, and the reference sequences (allele IDs at 4th-field resolution) were merged with allele group classification tables. Allele counts were then summed at 3rd-field resolution, and the two alleles with the highest counts for each gene were assigned as the haplotype. This approach was used to determine the HLA genotype of each patient (Fig. 2B).
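The counting and voting step can be sketched in Python as follows. This is a minimal illustration, not the authors' R implementation. It assumes that each SAM reference name (RNAME) is a 4th-field allele ID such as `A*01:01:01:01`, that unmapped, secondary and supplementary records are ignored, and that the allele group classification table can be represented as a hypothetical dictionary `group_table` mapping each 4th-field ID to its 3rd-field group. The function names are also hypothetical.

```python
from collections import Counter

def allele_counts_from_sam(sam_lines):
    """Count reads per reference allele (SAM column 3, RNAME)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        # assumption: drop unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & (0x4 | 0x100 | 0x800) or rname == "*":
            continue
        counts[rname] += 1
    return counts

def call_top_two(counts, group_table):
    """Sum 4th-field counts into 3rd-field groups per gene and keep the top 2."""
    per_gene = {}
    for allele_4f, n in counts.items():
        allele_3f = group_table[allele_4f]  # hypothetical classification lookup
        gene = allele_3f.split("*")[0]
        per_gene.setdefault(gene, Counter())[allele_3f] += n
    return {gene: [a for a, _ in c.most_common(2)] for gene, c in per_gene.items()}

if __name__ == "__main__":
    def rec(name, flag, ref):
        # minimal 11-column SAM record; only FLAG and RNAME are used here
        return "\t".join([name, str(flag), ref, "1", "60", "*", "*", "0", "0", "*", "*"])

    sam = ["@HD\tVN:1.6"]
    sam += [rec(f"a{i}", 0, "A*01:01:01:01") for i in range(3)]
    sam += [rec(f"b{i}", 0, "A*01:01:01:02") for i in range(2)]
    sam += [rec(f"c{i}", 0, "A*02:01:01:01") for i in range(4)]
    sam += [rec("d0", 0, "A*03:01:01:01"), rec("d1", 256, "A*03:01:01:01"), rec("u0", 4, "*")]
    group_table = {
        "A*01:01:01:01": "A*01:01:01", "A*01:01:01:02": "A*01:01:01",
        "A*02:01:01:01": "A*02:01:01", "A*03:01:01:01": "A*03:01:01",
    }
    print(call_top_two(allele_counts_from_sam(sam), group_table))
    # {'A': ['A*01:01:01', 'A*02:01:01']}
```

In the toy data, `A*02:01:01:01` has the most reads of any single 4th-field reference, but after summing at the 3rd field `A*01:01:01` ranks first because two of its 4th-field alleles share the reads. This is why voting at the 3rd field rather than on individual references matters.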
Duplicate Analysis
Genes with full-length amplicon coverage (A, B, C, DQB1, DPA1, DQA1) had a total of 4615, 5422, 5113, 780, 318, and 204 alleles, respectively. All sequences were unique to each allele, with no duplicate sequences at the 4th or 3rd field (Additional File 1: Table S2).
For genes split across amplicons, duplicates were identified at the 4th and 3rd fields; however, merging the amplicon-split sequences left no duplicates at the 3rd field (Additional File 1: Table S3). For DRB1, there were a total of 602 alleles. The DRB1 amplicon from the start of the gene to 2.5 kb contained 74 duplicates at the 4th field and 60 at the 3rd field, and the DRB1 amplicon from exon 2 to the end contained 27 duplicates at the 4th field and 1 at the 3rd field. Merging the two DRB1 amplicon sequences resulted in 21 duplicates at the 4th field and 0 at the 3rd field. For DPB1, there were a total of 940 alleles. The DPB1 amplicon from the start to exon 1 contained 11 duplicates at the 4th field and 10 at the 3rd field, and the DPB1 amplicon from exon 2 to the end contained 57 duplicates at the 4th field and 3 at the 3rd field. Merging the amplicon sequences resulted in 55 duplicates at the 4th field and 0 at the 3rd field. For DRB3, there were 36 alleles. The DRB3 amplicon from the start to 2.5 kb contained 7 duplicates at the 4th field and 3 at the 3rd field, and the amplicon from exon 2 to the end of the gene contained 2 duplicates at the 4th field and 0 at the 3rd field. Merging the amplicon sequences resulted in 2 duplicates at the 4th field and 0 at the 3rd field. For DRB5, there were a total of 11 sequences. The DRB5 amplicon from the start to 2.6 kb contained 3 duplicates at the 4th field and 2 at the 3rd field, and the amplicon from exon 2 to the end contained no duplicates at either field. Merging the amplicon sequences resulted in no duplicates at the 3rd or 4th field. DRB4 contained 28 sequences and only one amplicon region, spanning exon 2 to exon 3; three duplicates were found at the 4th field and none at the 3rd field (Additional File 1: Table S3).
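The text does not spell out exactly how a duplicate is counted. The Python sketch below assumes one plausible definition: a sequence counts as a duplicate at a given field when it is shared by alleles whose names differ at that field. Split amplicons are merged by pairing each allele's amplicon sequences into a tuple. The helper names and toy sequences are hypothetical.

```python
from collections import defaultdict

def truncate(allele, n_fields):
    """Trim an allele name to n fields, e.g. DRB1*01:01:01:01 -> DRB1*01:01:01."""
    gene, fields = allele.split("*")
    return gene + "*" + ":".join(fields.split(":")[:n_fields])

def count_duplicates(seqs, n_fields):
    """Count distinct sequences shared by more than one allele name at n_fields.

    seqs maps an allele name to its amplicon sequence (str) or, after merging
    split amplicons, to a tuple of amplicon sequences.
    """
    names_by_seq = defaultdict(set)
    for allele, seq in seqs.items():
        names_by_seq[seq].add(truncate(allele, n_fields))
    return sum(1 for names in names_by_seq.values() if len(names) > 1)

if __name__ == "__main__":
    # toy data: amplicon 1 cannot separate any of the three alleles
    amp1 = {"DRB1*01:01:01:01": "AAA", "DRB1*01:01:01:02": "AAA", "DRB1*01:02:01:01": "AAA"}
    # amplicon 2 separates DRB1*01:02 but not the two DRB1*01:01:01 alleles
    amp2 = {"DRB1*01:01:01:01": "CCC", "DRB1*01:01:01:02": "CCC", "DRB1*01:02:01:01": "GGG"}
    merged = {a: (amp1[a], amp2[a]) for a in amp1}
    for label, seqs in [("amplicon 1", amp1), ("amplicon 2", amp2), ("merged", merged)]:
        # prints: label, 4th-field duplicates, 3rd-field duplicates
        print(label, count_duplicates(seqs, 4), count_duplicates(seqs, 3))
```

In this toy example, merging removes the 3rd-field duplicate while a 4th-field duplicate remains, mirroring the pattern reported for DRB1, DPB1 and DRB3: the two `DRB1*01:01:01` alleles are identical across both amplicons, so they can only differ outside the amplified regions.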
The duplicate analyses demonstrated that the amplicon coverage of the NGSgo® MX11-3 PCR mix (GenDx) was sufficient for HLA typing up to 3rd-field resolution for all 11 characterized genes.
Allele concordance using the R10.4 development batch (first batch)
Using the development batch of 31 samples, our alignment- and voting-based pipeline with the BWA-MEM aligner achieved high overall allele concordance with our previous results: 97.8% at the 1st field, 93.9% at the 2nd field, and 93.8% at the 3rd field. The per-gene concordances at the 1st, 2nd and 3rd fields were: A (98.4%, 98.4%, 98.4%), B (100%, 96.8%, 96.8%), C (100%, 98.4%, 98.4%), DPA1 (100%, 96.8%, 96.8%), DPB1 (100%, 100%, 98.4%), DQA1 (100%, 98.4%, 98.4%), DQB1 (100%, 98.4%, 98.4%), DRB1 (83.9%, 64.5%, 64.5%), DRB3 (96%, 88%, 88%), DRB4 (100%, 100%, 100%) and DRB5 (100%, 100%, 100%) (Additional File 3: Data S1; Table 1).
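The per-gene percentages appear to be computed per allele rather than per sample: 31 samples give 62 alleles per gene, and 98.4% corresponds to 61 of 62 alleles. The Python sketch below computes concordance at a chosen field under that assumption, matching the two called alleles to the two reference alleles regardless of order. The function and argument names are hypothetical.

```python
from collections import Counter

def truncate(allele, n_fields):
    """Trim an allele name to n fields, e.g. A*02:01:01:01 -> A*02:01 for n_fields=2."""
    gene, fields = allele.split("*")
    return gene + "*" + ":".join(fields.split(":")[:n_fields])

def percent_concordance(calls, truth, n_fields):
    """Percent of reference alleles reproduced by the calls at n_fields resolution.

    calls and truth map a sample ID to the two allele names for one gene.
    Matching is order-independent (multiset intersection), so a homozygous
    reference typed as heterozygous scores 1 of 2.
    """
    matched = total = 0
    for sample, expected in truth.items():
        exp = Counter(truncate(a, n_fields) for a in expected)
        got = Counter(truncate(a, n_fields) for a in calls.get(sample, []))
        matched += sum((exp & got).values())
        total += len(expected)
    return 100.0 * matched / total

if __name__ == "__main__":
    truth = {"s1": ["A*01:01:01:01", "A*02:01:01:01"], "s2": ["A*03:01:01:01", "A*03:01:01:01"]}
    calls = {"s1": ["A*02:01:01:02", "A*01:01:01:01"], "s2": ["A*03:01:01:01", "A*11:01:01:01"]}
    for field in (1, 2, 3, 4):
        print(field, percent_concordance(calls, truth, field))
    print(round(100 * 61 / 62, 1))  # 61 of 62 alleles -> 98.4
```

Counting per allele explains why the percentages for the 31-sample batch move in steps of about 1.6% (one allele out of 62), while genes typed in fewer samples, such as DRB3, move in larger steps.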
Table 1
Percent concordance by gene for the 31 development batch samples aligned with the BWA-MEM and Minimap2 aligners
Gene | BWA-MEM 1st field (% success) | BWA-MEM 2nd field (% success) | BWA-MEM 3rd field (% success) | Minimap2 1st field (% success) | Minimap2 2nd field (% success) | Minimap2 3rd field (% success) |
A | 98.4% | 98.4% | 98.4% | 100.0% | 98.4% | 98.4% |
B | 100.0% | 96.8% | 96.8% | 98.4% | 93.5% | 93.5% |
C | 100.0% | 98.4% | 98.4% | 100.0% | 98.4% | 98.4% |
DPA1 | 100.0% | 96.8% | 96.8% | 100.0% | 98.4% | 93.5% |
DPB1 | 100.0% | 100.0% | 98.4% | 98.4% | 98.4% | 98.4% |
DQA1 | 100.0% | 98.4% | 98.4% | 91.9% | 80.6% | 80.6% |
DQB1 | 100.0% | 98.4% | 98.4% | 100.0% | 98.4% | 98.4% |
DRB1 | 83.9% | 64.5% | 64.5% | 83.9% | 64.5% | 64.5% |
DRB3 | 96.0% | 88.0% | 88.0% | 92.0% | 72.0% | 72.0% |
DRB4 | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
DRB5 | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
Overall | 97.8% | 93.9% | 93.8% | 96.5% | 90.8% | 90.3% |
Using the Minimap2 aligner, our pipeline obtained overall allele concordances of 96.5% at the 1st field, 90.8% at the 2nd field, and 90.3% at the 3rd field (Additional File 3: Data S1; Table 1). The per-gene concordances at the 1st, 2nd and 3rd fields were: A (100%, 98.4%, 98.4%), B (98.4%, 93.5%, 93.5%), C (100%, 98.4%, 98.4%), DPA1 (100%, 98.4%, 93.5%), DPB1 (98.4%, 98.4%, 98.4%), DQA1 (91.9%, 80.6%, 80.6%), DQB1 (100%, 98.4%, 98.4%), DRB1 (83.9%, 64.5%, 64.5%), DRB3 (92%, 72%, 72%), DRB4 (100%, 100%, 100%) and DRB5 (100%, 100%, 100%).
Comparing the two pipelines, BWA-MEM had higher overall concordance at all fields of resolution (1st field: 97.8% BWA-MEM vs 96.5% Minimap2; 2nd field: 93.9% vs 90.8%; 3rd field: 93.8% vs 90.3%). For non-DRB genes, Minimap2 exceeded 93% concordance at the 3rd field for every gene except DQA1 (80.6%), whereas BWA-MEM exceeded 96% for all non-DRB genes. DRB1 concordance was identical on both pipelines: 83.9% at the 1st field and 64.5% at the 2nd and 3rd fields. DRB3 performed better with BWA-MEM, at 88% at the 3rd field compared with 72% with Minimap2 (Table 1). McNemar’s test was performed on the 3rd-field results from BWA-MEM and Minimap2 to assess whether the differences in concordance were statistically significant. Overall, BWA-MEM had a significantly higher concordance than Minimap2 (93.8% vs 90.3%, p = 3.18E-04). At the gene level, only DQA1 differed significantly, with higher concordance with BWA-MEM (98.4%) than with Minimap2 (80.6%, p = 2.57E-03) (Table 2).
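The exact McNemar implementation is not stated. The pattern in Table 2 (NA where both aligners have identical concordance, p = 1.00 for HLA-A) would fit a chi-square McNemar test whose continuity correction is skipped when the two discordant counts are equal, as R's `mcnemar.test` does by default. The Python sketch below reproduces that behavior as an assumption; `b` and `c` are the numbers of alleles typed correctly by only one of the two pipelines, and the counts in the example are illustrative rather than taken from the data.

```python
import math

def mcnemar_p(b, c, correct=True):
    """Two-sided McNemar chi-square test (1 df) from discordant pair counts.

    b: alleles concordant with pipeline 1 only; c: concordant with pipeline 2 only.
    The continuity correction is applied only when b != c; None stands for NA
    when there are no discordant pairs.
    """
    n = b + c
    if n == 0:
        return None
    diff = abs(b - c)
    if correct and diff > 0:
        diff -= 1
    statistic = diff ** 2 / n
    # upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    return math.erfc(math.sqrt(statistic / 2))

if __name__ == "__main__":
    print(mcnemar_p(0, 2))  # about 0.48
    print(mcnemar_p(1, 1))  # 1.0
    print(mcnemar_p(0, 0))  # None (NA)
```

Only discordant alleles enter the test, so two pipelines with identical concordance can still yield p = 1 rather than NA when each types a different allele correctly.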
Table 2
Comparison of concordance between BWA-MEM and Minimap2 for development batch
Gene | P-Value | BWA-MEM Concordance | Minimap2 Concordance | Difference |
A | 1.00E+00 | 98.4% | 98.4% | ns |
B | 4.80E-01 | 96.8% | 93.5% | ns |
C | NA | 98.4% | 98.4% | ns |
DPA1 | 6.17E-01 | 96.8% | 93.5% | ns |
DPB1 | NA | 98.4% | 98.4% | ns |
DQA1 | 2.57E-03 | 98.4% | 80.6% | BWA-MEM higher concordance* |
DQB1 | NA | 98.4% | 98.4% | ns |
DRB1 | NA | 64.5% | 64.5% | ns |
DRB3 | 2.21E-01 | 88.0% | 72.0% | ns |
DRB4 | NA | 100.0% | 100.0% | ns |
DRB5 | NA | 100.0% | 100.0% | ns |
Overall | 3.18E-04 | 93.8% | 90.3% | BWA-MEM higher concordance* |
Allele concordance using the R10.3 test batch (second batch)
Using the test batch of 63 samples, our alignment- and voting-based pipeline with the BWA-MEM aligner again achieved high overall allele concordance with our previous results: 97.8% at the 1st field, 93.5% at the 2nd field, and 91.2% at the 3rd field. The per-gene concordances at the 1st, 2nd and 3rd fields were: A (100%, 96.8%, 88.1%), B (100%, 90.5%, 88.1%), C (100%, 99.2%, 99.2%), DPA1 (100%, 100%, 99.2%), DPB1 (98.4%, 97.6%, 92.9%), DQA1 (100%, 100%, 98.4%), DQB1 (100%, 97.6%, 96%), DRB1 (88.9%, 68.3%, 68.3%), DRB3 (97.3%, 94.6%, 94.6%), DRB4 (100%, 100%, 97%) and DRB5 (73.3%, 66.7%, 66.7%) (Additional File 3: Data S1; Additional File 1: Table S4).
Using the Minimap2 aligner, we obtained overall concordances of 97.4% at the 1st field, 90.8% at the 2nd field, and 89.0% at the 3rd field. The per-gene concordances at the 1st, 2nd and 3rd fields were: A (99.2%, 89.7%, 86.5%), B (98.4%, 88.1%, 85.7%), C (99.2%, 95.2%, 95.2%), DPA1 (100%, 98.4%, 97.6%), DPB1 (99.2%, 99.2%, 95.2%), DQA1 (94.4%, 88.9%, 87.3%), DQB1 (99.2%, 99.2%, 96.8%), DRB1 (88.9%, 69.0%, 69.0%), DRB3 (94.6%, 79.7%, 79.7%), DRB4 (100%, 100%, 95.5%) and DRB5 (100%, 93.3%, 93.3%) (Additional File 1: Table S4).

As with the development batch, we compared the two pipelines using McNemar’s test on the 3rd-field concordance results. Overall, BWA-MEM had a significantly higher concordance than Minimap2 (91.2% vs 89.0%, p = 3.37E-07). DQA1 (98.4% BWA-MEM vs 87.3% Minimap2, p = 1.15E-03) and DRB3 (94.6% BWA-MEM vs 79.7% Minimap2, p = 2.57E-03) had significantly higher concordance with BWA-MEM, whereas DRB5 had significantly higher concordance with Minimap2 (66.7% BWA-MEM vs 93.3% Minimap2, p = 1.33E-02) (Additional File 1: Table S5).

Because the BWA-MEM pipeline had a significantly higher overall concordance than Minimap2 in both datasets, as well as more genes with significantly higher concordance, we focused on the BWA-MEM-based pipeline for comparisons with other pipelines.
Comparing the performance of our pipeline with other publicly available tools
Using the HLA-LA pipeline, HLA haplotypes were obtained for all genes except DRB5; with Athlon2, haplotypes were obtained for all genes except DRB4. For the development batch, the HLA-LA pipeline achieved overall concordances of 99.1% (1st field), 89.4% (2nd field), and 84.7% (3rd field). The per-gene concordances at the 1st, 2nd and 3rd fields were: A (100%, 100.0%, 98.4%), B (100%, 98.4%, 91.9%), C (100%, 98.4%, 95.2%), DPA1 (100%, 93.5%, 80.6%), DPB1 (91.9%, 91.9%, 87.1%), DQA1 (100%, 67.7%, 56.5%), DQB1 (100%, 91.9%, 91.9%), DRB1 (100.0%, 88.7%, 88.7%), DRB3 (100.0%, 96.0%, 96.0%), and DRB4 (100%, 18.8%, 18.8%). For non-DRB genes, HLA-LA concordance exceeded 80% at the 3rd field for every gene except DQA1 (56.5%), and DRB1 concordance was 88.7% (Table 3; Additional File 3: Data S1).
Athlon2 achieved overall concordances of 98.5% (1st field), 98.3% (2nd field), and 98.1% (3rd field). The per-gene concordances at the 1st, 2nd and 3rd fields were: A (100%, 100.0%, 98.4%), B (100%, 100.0%, 100.0%), C (100%, 100.0%, 100.0%), DPA1 (100%, 100.0%, 100.0%), DPB1 (100.0%, 100.0%, 100.0%), DQA1 (100.0%, 100.0%, 100.0%), DQB1 (100%, 98.4%, 98.4%), DRB1 (96.8%, 96.8%, 96.8%), DRB3 (80.0%, 80.0%, 80.0%), and DRB5 (87.5%, 87.5%, 87.5%). Athlon2 exceeded 98% concordance for all non-DRB genes, and DRB1 concordance was 96.8% (Table 3; Additional File 3: Data S1).
Table 3
Percent concordance by gene for the 31 development batch samples typed with the HLA-LA and Athlon2 pipelines
Gene | HLA-LA 1st field (% success) | HLA-LA 2nd field (% success) | HLA-LA 3rd field (% success) | Athlon2 1st field (% success) | Athlon2 2nd field (% success) | Athlon2 3rd field (% success) |
A | 100.0% | 100.0% | 98.4% | 100.0% | 100.0% | 98.4% |
B | 100.0% | 98.4% | 91.9% | 100.0% | 100.0% | 100.0% |
C | 100.0% | 98.4% | 95.2% | 100.0% | 100.0% | 100.0% |
DPA1 | 100.0% | 93.5% | 80.6% | 100.0% | 100.0% | 100.0% |
DPB1 | 91.9% | 91.9% | 87.1% | 100.0% | 100.0% | 100.0% |
DQA1 | 100.0% | 67.7% | 56.5% | 100.0% | 100.0% | 100.0% |
DQB1 | 100.0% | 91.9% | 91.9% | 100.0% | 98.4% | 98.4% |
DRB1 | 100.0% | 88.7% | 88.7% | 96.8% | 96.8% | 96.8% |
DRB3 | 100.0% | 96.0% | 96.0% | 80.0% | 80.0% | 80.0% |
DRB4 | 100.0% | 18.8% | 18.8% | N/A | N/A | N/A |
DRB5 | N/A | N/A | N/A | 87.5% | 87.5% | 87.5% |
Overall | 99.1% | 89.4% | 84.7% | 98.5% | 98.3% | 98.1% |
Using McNemar’s test, we compared the 3rd-field concordances of HLA-LA and Athlon2 with those of the BWA-MEM alignment- and voting-based pipeline. Compared with HLA-LA, BWA-MEM had a significantly higher overall concordance (93.7% vs 84.7%, p = 2.60E-06). DPA1 (96.8% BWA-MEM, 80.6% HLA-LA, p = 9.37E-03), DPB1 (98.4% BWA-MEM, 87.1% HLA-LA, p = 4.55E-02), DQA1 (98.4% BWA-MEM, 56.5% HLA-LA, p = 9.44E-07), and DRB4 (100.0% BWA-MEM, 18.8% HLA-LA, p = 8.74E-04) had significantly higher concordance with BWA-MEM, whereas DRB1 had significantly higher concordance with HLA-LA (64.5% BWA-MEM, 88.7% HLA-LA, p = 3.51E-03) (Table 4). Compared with Athlon2, BWA-MEM had a significantly lower overall concordance (93.6% vs 98.1%, p = 1.91E-04). The only significant gene-level difference was DRB1, whose concordance was higher with Athlon2 (64.5% BWA-MEM, 96.8% Athlon2, p = 2.15E-05) (Table 5).
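Because HLA-LA does not report DRB5 and Athlon2 does not report DRB4, a paired test can only use alleles typed by both pipelines. That may explain why the overall BWA-MEM concordance differs slightly between Tables 2, 4 and 5 (93.8%, 93.7% and 93.6%). The Python sketch below shows one way to build the paired counts under that assumption; the allele keys and the function name are hypothetical, and the returned `b` and `c` are the discordant counts a McNemar test would take.

```python
def paired_outcomes(ref_ok, alt_ok):
    """Pair per-allele outcomes of two pipelines for a McNemar test.

    ref_ok and alt_ok map an allele key, e.g. (sample, gene, copy), to True when
    that pipeline matched the reference typing. Keys typed by only one pipeline
    (such as a gene it does not report) are left out of the comparison.
    Returns (b, c, ref_percent, alt_percent) over the shared keys.
    """
    shared = ref_ok.keys() & alt_ok.keys()
    if not shared:
        raise ValueError("no alleles typed by both pipelines")
    b = sum(1 for k in shared if ref_ok[k] and not alt_ok[k])
    c = sum(1 for k in shared if alt_ok[k] and not ref_ok[k])
    ref_pct = 100.0 * sum(ref_ok[k] for k in shared) / len(shared)
    alt_pct = 100.0 * sum(alt_ok[k] for k in shared) / len(shared)
    return b, c, ref_pct, alt_pct

if __name__ == "__main__":
    bwa = {("s1", "DRB1", 1): False, ("s1", "DRB1", 2): True,
           ("s1", "DRB5", 1): True, ("s1", "DRB5", 2): True}
    other = {("s1", "DRB1", 1): True, ("s1", "DRB1", 2): True}  # no DRB5 calls
    print(paired_outcomes(bwa, other))  # (0, 1, 50.0, 100.0)
```

In this toy case BWA-MEM is 75% concordant over all four of its alleles but 50% over the two alleles both pipelines typed, so the reference pipeline's "overall" figure depends on which comparison it belongs to.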
Table 4
Comparison of concordance between BWA-MEM and HLA-LA for development batch
Gene | P-Value | BWA-MEM Concordance | HLA-LA Concordance | Difference |
A | 1.00E+00 | 98.4% | 98.4% | ns |
B | 3.71E-01 | 96.8% | 91.9% | ns |
C | 6.17E-01 | 98.4% | 95.2% | ns |
DPA1 | 9.37E-03 | 96.8% | 80.6% | BWA-MEM higher concordance* |
DPB1 | 4.55E-02 | 98.4% | 87.1% | BWA-MEM higher concordance* |
DQA1 | 9.44E-07 | 98.4% | 56.5% | BWA-MEM higher concordance* |
DQB1 | 1.34E-01 | 98.4% | 91.9% | ns |
DRB1 | 3.51E-03 | 64.5% | 88.7% | HLA-LA higher concordance* |
DRB3 | 4.80E-01 | 88.0% | 96.0% | ns |
DRB4 | 8.74E-04 | 100.0% | 18.8% | BWA-MEM higher concordance* |
Overall | 2.60E-06 | 93.7% | 84.7% | BWA-MEM higher concordance* |
Table 5
Comparison of concordance between BWA-MEM and Athlon2 for development batch
Gene | P-Value | BWA-MEM Concordance | Athlon2 Concordance | Difference |
A | 1.00E+00 | 98.4% | 98.4% | ns |
B | 4.80E-01 | 96.8% | 100.0% | ns |
C | 1.00E+00 | 98.4% | 100.0% | ns |
DPA1 | 4.80E-01 | 96.8% | 100.0% | ns |
DPB1 | 1.00E+00 | 98.4% | 100.0% | ns |
DQA1 | 1.00E+00 | 98.4% | 100.0% | ns |
DQB1 | NA | 98.4% | 98.4% | ns |
DRB1 | 2.15E-05 | 64.5% | 96.8% | Athlon2 higher concordance* |
DRB3 | 7.24E-01 | 88.0% | 80.0% | ns |
DRB5 | 1.00E+00 | 100.0% | 87.5% | ns |
Overall | 1.91E-04 | 93.6% | 98.1% | Athlon2 higher concordance* |
For the test batch (second batch), the HLA-LA pipeline achieved overall concordances of 98.3% (1st field), 86.6% (2nd field), and 78.8% (3rd field). The per-gene concordances at the 1st, 2nd and 3rd fields were: A (97.6%, 95.2%, 88.1%), B (100%, 100.0%, 87.3%), C (99.2%, 96.8%, 88.9%), DPA1 (100%, 96.0%, 82.5%), DPB1 (93.7%, 93.7%, 85.7%), DQA1 (100%, 63.5%, 47.6%), DQB1 (97.6%, 82.5%, 81.7%), DRB1 (98.4%, 87.3%, 85.7%), DRB3 (100.0%, 100.0%, 94.6%), and DRB4 (97.0%, 28.8%, 28.8%). For non-DRB genes, HLA-LA concordance exceeded 80% at the 3rd field for every gene except DQA1 (47.6%), and DRB1 concordance was 85.7%.

Athlon2 achieved overall concordances of 95.3% (1st field), 95.0% (2nd field), and 93.3% (3rd field). The per-gene concordances at the 1st, 2nd and 3rd fields were: A (97.6%, 97.6%, 94.4%), B (100%, 100.0%, 95.2%), C (99.2%, 99.2%, 99.2%), DPA1 (100%, 99.2%, 99.2%), DPB1 (100.0%, 100.0%, 96.0%), DQA1 (100.0%, 100.0%, 99.2%), DQB1 (97.6%, 96.0%, 94.4%), DRB1 (97.6%, 96.8%, 96.0%), DRB3 (67.6%, 67.6%, 67.6%), and DRB5 (40.0%, 40.0%, 40.0%). Athlon2 exceeded 94% concordance for all non-DRB genes, and DRB1 concordance was 96.0% (Additional File 1: Table S6; Additional File 3: Data S1).
Compared with HLA-LA, BWA-MEM had significantly higher concordance for HLA-C (99.2% BWA-MEM, 88.9% HLA-LA, p = 8.74E-04), DPA1 (99.2% BWA-MEM, 82.5% HLA-LA, p = 3.04E-05), DPB1 (92.9% BWA-MEM, 85.7% HLA-LA, p = 2.65E-02), DQA1 (98.4% BWA-MEM, 47.6% HLA-LA, p = 3.41E-15), DQB1 (96.0% BWA-MEM, 81.7% HLA-LA, p = 5.20E-04), and DRB4 (97.0% BWA-MEM, 28.8% HLA-LA, p = 5.41E-11), whereas DRB1 had significantly higher concordance with HLA-LA (68.3% BWA-MEM, 85.7% HLA-LA, p = 6.58E-04) (Additional File 1: Table S7). Compared with Athlon2, BWA-MEM had significantly lower concordance for HLA-A (88.1% BWA-MEM, 94.4% Athlon2, p = 2.69E-02), HLA-B (88.1% BWA-MEM, 95.2% Athlon2, p = 7.66E-03), and DRB1 (68.3% BWA-MEM, 96.0% Athlon2, p = 2.28E-08), and significantly higher concordance for DRB3 (94.6% BWA-MEM, 67.6% Athlon2, p = 3.30E-04) and DRB5 (66.7% BWA-MEM, 40.0% Athlon2, p = 1.33E-02) (Additional File 1: Table S8).