3.1. Technical validation of the custom NGS panel
The performance of the custom NGS panel was evaluated using a set of reference standards in two identical but independent NGS runs. After library preparation, all samples were confirmed to be of sufficient quality and quantity. The majority of the constructed libraries were within the targeted size range of 400 bp for sequencing using the custom NGS panel (Supplementary Figure S1). Both NGS runs achieved cluster densities of 860 to 878 K/mm², with > 93% of clusters passing the quality filter and > 95% of read bases having quality scores above Q30. These values met or were close to the MiSeq System specifications of 865–965 K/mm² cluster density and > 80% of bases above Q30 (Supplementary Table S5). Each sample achieved ~ 99.5% on-target aligned reads and a minimum mean amplicon coverage depth of 4600x (Supplementary Table S6). Analysis of the coverage depth per amplicon region revealed that 98.6% of the targeted regions (n = 216/219 amplicons) had average coverage depths of > 1000x. Three amplicons fell below this threshold: AMPL89337 (DNMT3A exon 17, chr2:25464411–25464625) and AMPL1156 (TP53 exon 4/exon 5, chr17:7578360–7578579) had average coverage depths of < 1000x, while AMPL117202 (MPL exon 10, chr1:43814902–43815103) had a coverage depth of below 100x (Supplementary Figure S2).
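The per-amplicon coverage summary can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `fraction_above`, the placeholder amplicon IDs, and every depth value are assumptions made for illustration. Only the three low-coverage amplicon IDs and the 216/219 split come from the text.

```python
def fraction_above(depths, threshold):
    """Return (count, total, percent) of amplicons whose mean depth exceeds threshold."""
    count = sum(1 for depth in depths.values() if depth > threshold)
    total = len(depths)
    return count, total, round(100 * count / total, 1)


# Hypothetical depths: 216 placeholder amplicons above 1000x (values invented).
depths = {f"placeholder_{i}": 5000 for i in range(216)}
# The three amplicon IDs below are from the text; their depth values are invented
# and only respect the reported bounds (< 1000x, < 1000x, < 100x).
depths.update({"AMPL89337": 800, "AMPL1156": 900, "AMPL117202": 50})

count, total, percent = fraction_above(depths, 1000)
print(f"{count}/{total} amplicons above 1000x ({percent}%)")
```

With these inputs the summary reproduces the reported proportion of 216/219 amplicons (98.6%).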
Combined analysis of sequencing results with the DNA Amplicon and Pindel apps revealed that the DNA Amplicon app detected all known variants in the reference standards except for large duplications in FLT3, whereas Pindel detected all frameshift variants as well as large duplications in FLT3, but not single nucleotide variants (SNVs) (Fig. 1). Overall, the custom NGS panel had a sensitivity of 99.2%, a specificity of 96.3%, a positive predictive value of 97.7%, an average intra-run concordance of 98.8% [range 95.2–100%], an average inter-run concordance of 99.0% [range 95.2–100%], and a detection limit of 1% VAF (Fig. 1).
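For readers less familiar with these metrics, the sketch below shows the conventional definitions of sensitivity, specificity and positive predictive value from true/false positive and negative counts. It assumes the standard formulas; the text does not state how the values were derived, and the counts used here are hypothetical and are not the study data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Conventional definitions from a 2x2 table of variant calls."""
    return {
        "sensitivity": tp / (tp + fn),  # known variants that were detected
        "specificity": tn / (tn + fp),  # known-negative positions called negative
        "ppv": tp / (tp + fp),          # detected variants that are real
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=90, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```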
3.3. Identification of genetic variants in clinical MPN samples
An initial total of 314 unique variants was detected across all 10 clinical MPN samples (Fig. 2). After filtering out intronic and UTR variants as well as all variants with MAFs of ≥ 1%, 115 exonic variants remained, 95 of which were found to be sequencing errors and were subsequently excluded. A total of 20 unique variants across the 10 clinical MPN samples were thus identified, including known MPN driver mutations (Table 2).
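The filtering cascade described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: the `Variant` record, its field names, and the toy input are hypothetical, and retaining variants with no reported MAF is an assumption the text does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str
    region: str            # "exonic", "intronic" or "UTR" (hypothetical labels)
    maf: Optional[float]   # population minor allele frequency; None if unreported
    is_artifact: bool      # flagged as a sequencing error on review


def filter_variants(variants: List[Variant], maf_cutoff: float = 0.01) -> List[Variant]:
    """Keep exonic variants with MAF below the cut-off that are not artefacts."""
    exonic = [v for v in variants if v.region == "exonic"]
    # Assumption: variants without a reported MAF are retained.
    rare = [v for v in exonic if v.maf is None or v.maf < maf_cutoff]
    return [v for v in rare if not v.is_artifact]


# Toy input (not study data): one variant per exclusion reason plus one survivor.
toy = [
    Variant("GENE_A", "intronic", None, False),   # removed: intronic
    Variant("GENE_B", "exonic", 0.05, False),     # removed: MAF >= 1%
    Variant("GENE_C", "exonic", None, True),      # removed: sequencing error
    Variant("GENE_D", "exonic", 0.001, False),    # retained
]
kept = filter_variants(toy)
print([v.gene for v in kept])

# Counts reported in the text: 115 exonic variants minus 95 sequencing errors.
print(115 - 95)
```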
Table 2
Details of all NGS detected variants across the 10 clinical MPN samples.
Gene | cDNA change | Aa change | Consequence | dbSNP | ClinVar assertion/COSMIC ID | Sample | VAF (%) | Sanger |
CALR | c.1092_1143del | p.L367Tfs*45 | Fs del | NA | Pathogenic/COSM1738055 | 09, Overt-PMF | 10.6 | ✓ |
CALR | c.1154_1155insTTGTC | p.K385Nfs*46 | Fs ins | rs765476509 | COSM1738056 | 02, ET | 15.9 | ✓ |
CALR | c.1153_1154insTATGT | p.K385Ifs*46 | Fs ins | NA | COSM5985669 | 01, ET | 35.2 | ✓ |
CALR | c.A1154C | p.K385T | nSNV | rs1024435400 | NA | 01, ET | 35.5 | ✓ |
JAK2 | c.G1849T | p.V617F | nSNV | rs77375493 | Pathogenic/COSM12600 | 07, Pre-PMF | 12.1 | ⨉ |
| | | | | | 10, Overt-PMF | 12.2 | ⨉ |
| | | | | | 06, PV | 19.2 | ✓ |
| | | | | | 08, Pre-PMF | 53.7 | ✓ |
| | | | | | 05, PV | 66.5 | ✓ |
| | | | | | 04, PV | 72.9 | ✓ |
| | | | | | 03, ET | 88.0 | ✓ |
ABL1 | c.A1049G | p.N350S | nSNV | rs144448357 | NA | 07, Pre-PMF | 48.8 | ✓ |
ASXL1 | c.1927_1928insGGGGGGGGTGGCCCGGGTGGAGGTGGCGGCGGGGCCACCGATGAGGGGGGGGGCAGAGGCAGCAGCA | p.G646Wfs*10† | stopgain | rs750318549 | NA | 04, PV | 31.3 | ✓ |
ASXL1 | c.1772dupA | p.Y591fs*0 | stopgain | rs762036456 | COSM4169775, COSM4169776 | 10, Overt-PMF | 40.8 | ✓ |
ASXL1 | c.2190delC | p.L731Yfs*12† | Fs del | NA | NA | 09, Overt-PMF | 38.5 | ✓ |
ASXL1 | c.A4299G | p.Q1433Q† | sSNV | NA | NA | 03, ET | 49.2 | ✓ |
DNMT3A | c.G1155A | p.P385P | sSNV | rs368009374 | VUS/likely benign | 03, ET | 51.6 | NA |
RUNX1 | c.G924T | p.Q308H | nSNV | rs80314254 | Benign | 08, Pre-PMF | 49.6 | ✓ |
SF3B1 | c.A2098G | p.K700E | nSNV | rs559063155 | Likely pathogenic/COSM84677 | 08, Pre-PMF | 45.6 | ✓ |
TET2 | c.C911T | p.A304V | nSNV | NA | COSM5610834, COSM5610835 | 10, Overt-PMF | 50.1 | ✓ |
TET2 | c.T2604G | p.F868L | nSNV | rs147836249 | COSM87107 | 02, ET | 48.7 | ✓ |
TET2 | c.A3583G | p.I1195V | nSNV | rs568009712 | NA | 01, ET | 49.6 | ✓ |
TET2 | c.A3734G | p.Y1245C† | nSNV | NA | NA | 08, Pre-PMF | 44.9 | ✓ |
TET2 | c.3937delG | p.D1314Mfs*48 | Fs del | NA | COSM4383928 | 08, Pre-PMF | 46.3 | ✓ |
TET2 | c.A4538G | p.E1513G | nSNV | rs553669299 | NA | 02, ET | 51.6 | ✓ |
U2AF1 | c.A470C | p.Q157P | nSNV | rs371246226 | Likely pathogenic/COSM211534, COSM1318797 | 06, PV | 9.8 | ⨉ |
| | | | | | 10, Overt-PMF | 42.5 | ✓ |
† Putative novel variant; Aa change, Amino acid change; Fs del, Frameshift deletion; Fs ins, Frameshift insertion; ✓, Sanger detected; ⨉, Sanger undetected; NA, Data not available.
On average, the PMF samples appeared to harbour the highest number of variants, whereas the PV samples appeared to harbour the fewest (Fig. 3). Among the 20 variants, 13 were SNVs (synonymous SNV (sSNV), n = 2; nonsynonymous SNV (nSNV), n = 11) and 7 were indels (frameshift insertion (fs ins), n = 2; frameshift deletion (fs del), n = 3; stopgain, n = 2). Out of the 10 sequenced clinical samples, the JAK2 V617F driver mutation was identified in 7 samples, while a CALR driver mutation was identified in 3 samples. Aside from driver mutations, other variants were also detected, including an nSNV in CALR as well as variants in ABL1 (n = 1), ASXL1 (n = 4), DNMT3A (n = 1), RUNX1 (n = 1), SF3B1 (n = 1), TET2 (n = 6), and U2AF1 (n = 2) (Fig. 3). All NGS-detected variants with allele frequencies of ≥ 15% were confirmed via Sanger sequencing (Table 2), except for the DNMT3A P385P sSNV, which could not be confirmed due to nonspecific amplification (Supplementary Figure S3).
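The tallies in this paragraph can be rechecked directly from Table 2. The sketch below transcribes each call as (gene, amino acid change, consequence, VAF, Sanger result) and counts unique variants by consequence and by gene. The helper `sanger_misses` and the tuple layout are illustrative choices, not part of the study's analysis.

```python
from collections import Counter

# Transcribed from Table 2. Sanger: True = detected, False = undetected, None = no data.
CALLS = [
    ("CALR", "L367Tfs*45", "Fs del", 10.6, True),
    ("CALR", "K385Nfs*46", "Fs ins", 15.9, True),
    ("CALR", "K385Ifs*46", "Fs ins", 35.2, True),
    ("CALR", "K385T", "nSNV", 35.5, True),
    ("JAK2", "V617F", "nSNV", 12.1, False),
    ("JAK2", "V617F", "nSNV", 12.2, False),
    ("JAK2", "V617F", "nSNV", 19.2, True),
    ("JAK2", "V617F", "nSNV", 53.7, True),
    ("JAK2", "V617F", "nSNV", 66.5, True),
    ("JAK2", "V617F", "nSNV", 72.9, True),
    ("JAK2", "V617F", "nSNV", 88.0, True),
    ("ABL1", "N350S", "nSNV", 48.8, True),
    ("ASXL1", "G646Wfs*10", "stopgain", 31.3, True),
    ("ASXL1", "Y591fs*0", "stopgain", 40.8, True),
    ("ASXL1", "L731Yfs*12", "Fs del", 38.5, True),
    ("ASXL1", "Q1433Q", "sSNV", 49.2, True),
    ("DNMT3A", "P385P", "sSNV", 51.6, None),
    ("RUNX1", "Q308H", "nSNV", 49.6, True),
    ("SF3B1", "K700E", "nSNV", 45.6, True),
    ("TET2", "A304V", "nSNV", 50.1, True),
    ("TET2", "F868L", "nSNV", 48.7, True),
    ("TET2", "I1195V", "nSNV", 49.6, True),
    ("TET2", "Y1245C", "nSNV", 44.9, True),
    ("TET2", "D1314Mfs*48", "Fs del", 46.3, True),
    ("TET2", "E1513G", "nSNV", 51.6, True),
    ("U2AF1", "Q157P", "nSNV", 9.8, False),
    ("U2AF1", "Q157P", "nSNV", 42.5, True),
]

# One entry per unique (gene, change) pair, however many samples carry it.
unique = {(gene, change): cons for gene, change, cons, _, _ in CALLS}
by_consequence = Counter(unique.values())
by_gene = Counter(gene for gene, _ in unique)


def sanger_misses(calls, min_vaf):
    """Calls at or above min_vaf (%) that Sanger sequencing failed to detect."""
    return [c for c in calls if c[3] >= min_vaf and c[4] is False]


print(len(unique), dict(by_consequence))
print(dict(by_gene))
print(len(sanger_misses(CALLS, 15.0)), len(sanger_misses(CALLS, 0.0)))
```

Counting unique (gene, change) pairs gives 20 variants, with U2AF1 Q157P counted once although it occurs in two samples. The three calls that Sanger sequencing did not detect all have VAFs below 15%.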
The ABL1 point mutation identified in this study, N350S (Sample 07, pre-PMF), has been reported in dbSNP (rs144448357). However, owing to the lack of ClinVar and COSMIC data, it is unknown whether it has previously been identified in MPN (Table 2). The variant was identified in a patient diagnosed with pre-PMF at 50 years of age, who also carried the JAK2 V617F mutation. The patient presented with higher-than-normal platelet (859x10^9/L) and white cell counts (17.8x10^9/L), as well as constitutional symptoms (Table 1).
Four ASXL1 variants were identified in this study. Of these, the ASXL1 G646Wfs*10 (c.1927_1928insGGGGGGGGTGGCCCGGGTGGAGGTGGCGGCGGGGCCACCGATGAGGGGGGGGGCAGAGGCAGCAGCA) stopgain variant (Sample 04, PV) was located in the same dbSNP cluster (rs750318549) as the previously reported ASXL1 Gly646Trpfs*12 (c.1934dupG) stopgain variant, which is the most common ASXL1 mutation, accounting for > 50% of all identified ASXL1 mutations in myeloid malignancies [17] (Table 2). While ASXL1 Gly646Trpfs*12 results from the duplication of a G nucleotide within a homopolymer region of eight G nucleotides [17], the ASXL1 G646Wfs*10 variant identified in this study results from an insertion of 67 nucleotides at position chr20:31022442, making it a putative novel stopgain variant. Two other ASXL1 variants identified in this study, L731Yfs*12 (Sample 09, overt-PMF) and Q1433Q (Sample 03, ET), were also putative novel variants with no dbSNP, COSMIC or ClinVar data, whereas the ASXL1 Y591fs*0 variant (Sample 10, overt-PMF) has been reported in various diseases including ET and MF [9, 18], MDS [19], chronic myelomonocytic leukaemia [20], AML [21], mast cell neoplasm [22] and CNL [23], as well as breast cancer [24], but has not been previously reported in PV (Table 2).
Six TET2 variants were identified in this study. Of these, TET2 Y1245C (Sample 08, pre-PMF) was not reported in the dbSNP, COSMIC or ClinVar databases. The TET2 A304V (Sample 10, overt-PMF) and F868L (Sample 02, ET) variants have not been previously reported in MPN. TET2 A304V has previously been identified only in melanoma [25], while TET2 F868L has previously been identified in estrogen- and progesterone-receptor positive breast cancer [26], adult T cell lymphoma/leukaemia [27], and MDS [28, 29]. The TET2 D1314Mfs*48 variant (Sample 08, pre-PMF) has been reported in ET [30] and MDS [19]. Two other variants, TET2 I1195V (rs568009712) (Sample 01, ET) and E1513G (rs553669299) (Sample 02, ET), were reported in dbSNP, but the associated disease(s) are unknown.
Two of the most common mutations in SF3B1 and U2AF1 were identified in this study, namely U2AF1 Q157P (rs371246226, COSM211534, COSM1318797) and SF3B1 K700E (rs559063155, COSM84677) [31–33]. Both variants have been reported as likely pathogenic in the ClinVar database. A RUNX1 variant (Q308H, rs80314254) reported as benign in familial thrombocytopenia in the ClinVar database was also identified. The SF3B1 K700E and RUNX1 Q308H variants were found in a pre-PMF sample (Sample 08) which also harboured the TET2 Y1245C and TET2 D1314Mfs*48 variants, alongside the JAK2 V617F driver mutation. The patient was 50 years of age and presented with an abnormally high platelet count (1099x10^9/L), anaemia (Hb = 8.7 g/dL), and constitutional symptoms (Table 1). The U2AF1 Q157P variant was identified in PV and overt-PMF (Sample 06 and Sample 10, respectively) (Table 2). The overt-PMF sample (Sample 10) also harboured the ASXL1 Y591fs*0 variant in addition to the JAK2 V617F driver mutation. The patient was 63 years of age and presented with severe anaemia (Hb = 7.3 g/dL), neutropenia (WBC = 5.5x10^9/L) and thrombocytopenia (Platelet = 54x10^9/L), with constitutional symptoms (Table 1).