Advances in molecular profiling have led to a rapid expansion in the number of predictive molecular biomarkers and associated targeted therapies, heightening the need for large-scale, prospective tumor profiling assays across all cancer types. Most comprehensive next-generation sequencing (NGS)-based profiling methods use tumor tissue as the primary specimen for biomarker detection. Although widely used, obtaining an adequate tissue sample can be challenging in some cases because it requires invasive biopsies that may pose an excessive risk to the patient. Additionally, in our clinical experience, 8.8% of the tissue submitted for molecular analysis is inadequate for testing due to low tumor cellularity, low DNA yield, or poor DNA quality 1. Finally, a single tissue biopsy may not capture the full genetic heterogeneity of a patient’s cancer; consequently, clinically actionable biomarkers may be missed even with the most sensitive and specific genomic assay. Taken together, relying solely on tissue-based genomic profiling may not be comprehensive and may limit treatment options for cancer patients.
The successful detection of cancer drivers in circulating tumor DNA (ctDNA) within plasma cell-free DNA (cfDNA) 2 has provided a means to overcome the limitations of tissue profiling 3,4. cfDNA profiling can directly affect patient care by informing treatment decisions 5,6, monitoring response to therapy 7,8, revealing drug resistance mechanisms 9,10, and detecting minimal residual disease or relapse 11-13. Because collection is less invasive, cfDNA analyses also enable serial molecular profiling over the course of a patient’s disease 14,15. Plasma profiling may also capture inter- and intra-tumor heterogeneity across disease sites, especially in patients with advanced metastatic disease 16,17. In addition, recent studies have shown that ctDNA fragmentation profiles can facilitate cancer screening and early diagnosis 18.
The use of ctDNA as an analyte, however, has inherent limitations. ctDNA is usually present at low concentrations in plasma 19, which may reflect low disease burden in early-stage tumors, disease control in response to treatment, or low shedding of tumor DNA into the blood. Moreover, the vast majority of cfDNA is typically derived from normal hematopoietic cells, so the tumor-derived fraction is small and somatic mutations are present at very low allele frequencies. Highly sensitive single-mutation assays such as droplet digital PCR (ddPCR) 20 are not practical for broad clinical use, given the increasing number of genomic alterations that predict response to FDA-approved targeted therapies or are required for clinical trial enrollment. Given the low levels of ctDNA in a blood sample, a highly sensitive NGS assay that comprehensively covers all clinically actionable targets is crucial for detecting low-frequency alterations. Advances in NGS technologies, such as unique molecular identifiers (UMIs) and dual barcode indexing, have enabled ultra-deep sequencing of cfDNA while dramatically reducing background error rates, allowing high-confidence detection of mutations at very low allele frequencies 21. Further, improvements in sequencing library preparation have reduced the amount of input DNA required, allowing libraries to be generated efficiently from as little as 10 ng of DNA.
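Why low input and low VAF together limit sensitivity can be made concrete with a back-of-the-envelope sketch. It assumes roughly 3.3 pg of DNA per haploid human genome (a standard approximation, not stated in the text) and Poisson sampling of input molecules; the helper names are illustrative only.

```python
import math

# Assumption: ~3.3 pg of DNA per haploid human genome (common approximation).
PG_PER_HAPLOID_GENOME = 3.3


def haploid_genome_equivalents(input_ng: float) -> float:
    """Approximate number of copies of any single locus in `input_ng` nanograms of DNA."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def expected_mutant_molecules(input_ng: float, vaf: float) -> float:
    """Expected number of input molecules carrying a variant at allele fraction `vaf`."""
    return haploid_genome_equivalents(input_ng) * vaf


def prob_at_least_one_mutant(input_ng: float, vaf: float) -> float:
    """Poisson probability that the input contains at least one mutant molecule."""
    return 1.0 - math.exp(-expected_mutant_molecules(input_ng, vaf))


if __name__ == "__main__":
    for vaf in (0.01, 0.005, 0.001):
        lam = expected_mutant_molecules(10, vaf)
        p = prob_at_least_one_mutant(10, vaf)
        print(f"10 ng, VAF {vaf:.1%}: ~{lam:.1f} mutant molecules, P(>=1) = {p:.3f}")
```

Under these assumptions, 10 ng holds about 3,000 copies of each locus, so a 0.1% VAF variant is expected on only about 3 input molecules, with roughly a 5% chance that none is sampled at all, however deeply the library is sequenced.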
Herein, we describe the design, analytical validation, and clinical implementation of MSK-ACCESS (Memorial Sloan Kettering - Analysis of Circulating cfDNA to Examine Somatic Status) as a clinical test that can detect all classes of somatic genetic alterations (single nucleotide variants, indels, copy number alterations, and structural variants) in cfDNA specimens. This assay utilizes hybridization capture and deep sequencing (~20,000X raw coverage) to identify genomic alterations in selected regions of 129 key cancer-associated genes. MSK-ACCESS was approved for clinical use by the New York State Department of Health on May 31, 2019, and has since been used prospectively to guide patient care. We therefore also report our clinical experience utilizing MSK-ACCESS to prospectively profile 681 clinical blood samples from 617 patients, representing a total of 31 distinct tumor types.
Panel design and background error assessment
We used genomic data from over 25,000 solid tumors sequenced by MSK-IMPACT to select 826 exons from 129 genes, encompassing the most recurrent oncogenic mutations; variants that are targets of approved or investigational therapies according to OncoKB, an in-house institutional knowledge base of variant annotations 22; frequently mutated exons; entire kinase domains of targetable receptor tyrosine kinases; and all coding exons of selected tumor suppressor genes. This MSK-IMPACT-informed design covers an average of 3 non-synonymous mutations per tumor and at least 1 non-synonymous mutation in 84% of the 25,000 tumors previously sequenced with MSK-IMPACT, including 91% of breast cancers and 94% of non-small cell lung cancers (Figure 1A). To extend detection of copy number alterations and structural variants in 10 genes, we additionally targeted 560 common SNPs and 40 introns known to be involved in rearrangements. MSK-ACCESS incorporates unique molecular identifiers (UMIs) to increase the fidelity of sequencing reads. The overall process (Figure 1B) involves sequencing plasma cfDNA and genomic DNA from white blood cells (WBCs) to approximately 20,000X and 1,000X raw coverage, respectively, followed by collapsing read pairs into duplex (both strands of the original cfDNA molecule) or simplex (one strand of the original molecule) consensus sequences based on UMIs to suppress background sequencing errors (Figure 1C).
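The idea behind UMI-based collapsing can be illustrated with a toy sketch; it is not the MSK-ACCESS pipeline. It assumes reads sharing a UMI and start position come from one original molecule, that each read carries a strand label, and that all reads are already in reference orientation with equal length. Each strand family is collapsed by per-position majority vote, and a duplex consensus keeps only bases on which both strand consensuses agree. Real tools also model UMI sequencing errors, base qualities, and alignment, which this ignores; all names and the 0.7 majority threshold are hypothetical.

```python
from collections import Counter, defaultdict


def majority_consensus(seqs, min_frac=0.7):
    """Per-position majority vote over equal-length reads; ambiguous positions become 'N'."""
    consensus = []
    for bases in zip(*seqs):
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / len(bases) >= min_frac else "N")
    return "".join(consensus)


def collapse(reads, min_frac=0.7):
    """Collapse reads into simplex or duplex consensus sequences.

    reads: iterable of (umi, start, strand, seq) tuples, strand in {"+", "-"}.
    Returns {(umi, start): (consensus_seq, "duplex" or "simplex")}.
    """
    families = defaultdict(lambda: {"+": [], "-": []})
    for umi, start, strand, seq in reads:
        families[(umi, start)][strand].append(seq)

    result = {}
    for key, by_strand in families.items():
        per_strand = {s: majority_consensus(v, min_frac) for s, v in by_strand.items() if v}
        if len(per_strand) == 2:
            top, bottom = per_strand["+"], per_strand["-"]
            # Duplex: keep a base only where both strand consensuses agree.
            merged = "".join(a if a == b else "N" for a, b in zip(top, bottom))
            result[key] = (merged, "duplex")
        else:
            # Simplex: only one strand of the original molecule was observed.
            (only,) = per_strand.values()
            result[key] = (only, "simplex")
    return result
```

A single-read error on one strand is outvoted within its family, and an error present on only one strand of a duplex family is masked as "N", which is the intuition for why duplex consensus reads have the lowest background error.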
We first characterized the error rate of MSK-ACCESS using a cohort of 47 plasma samples from healthy donors. The donor plasma samples were sequenced to a mean raw coverage of 18,818X. After collapsing, the mean simplex and duplex coverage was 658X and 1,103X, respectively (Figure 1D, Supplemental Figure 1). Considering only targeted positions with background error (i.e. positions with at least one non-reference allele), we observed median error rates of 1.2 × 10⁻⁵ and 1.7 × 10⁻⁶ in simplex and duplex BAM files, respectively, compared to a median of 3.3 × 10⁻⁴ in the standard BAMs (Figure 1E). Whereas background error rates in standard BAMs were relatively uniform across substitution types on the HiSeq 2500 (Supplemental Figure 2), standard BAMs from the NovaSeq 6000 showed higher error rates for T>A, T>G, and C>A transversions (Figure 1E). Following collapsing, however, we observed lower and more uniform error rates on both sequencers. Moreover, while only 1% of targeted positions in the standard BAM had an error rate of zero, 92% of positions in the simplex BAM and 94% in the duplex BAM had no observed base mismatches (Figure 1F).
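The two background-error statistics above can be sketched from a per-position pileup. The input format, a list of (reference count, non-reference count) pairs per targeted position, is a hypothetical simplification of what would be extracted from a BAM; the error rate at a position is taken as the non-reference fraction of its depth.

```python
from statistics import median


def error_summary(pileup):
    """Summarize background error from per-position base counts.

    pileup: list of (ref_count, nonref_count) tuples, one per targeted position.
    Returns the median error rate over positions with any mismatch and the
    fraction of covered positions with zero mismatches.
    """
    rates = []
    zero_error = 0
    for ref_count, nonref_count in pileup:
        depth = ref_count + nonref_count
        if depth == 0:
            continue  # uncovered positions carry no error information
        if nonref_count == 0:
            zero_error += 1
        else:
            rates.append(nonref_count / depth)
    covered = zero_error + len(rates)
    return {
        "median_error_rate_nonzero": median(rates) if rates else 0.0,
        "fraction_zero_error": zero_error / covered if covered else 0.0,
    }
```

Running the same summary on standard, simplex, and duplex pileups of the same samples would give the kind of side-by-side comparison described in the text.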
Analytical validation
For the analytical validation of MSK-ACCESS, we assembled a cohort of 70 cfDNA samples with a total of 100 known SNVs and indels in AKT1, ALK, BRAF, EGFR, ERBB2, ESR1, KRAS, MET, PIK3CA, and TP53, identified in the same specimen by orthogonal cfDNA assays (ddPCR or a commercial NGS assay), to assess accuracy. Based on the orthogonal assays, the VAFs of the expected mutations ranged from 0.1% to 73%. We detected 94% of the expected variants (n = 94, 95% CI: 87.4%-97.8%) by genotyping and 82% (n = 82, 95% CI: 72.7%-88.7%) by de novo mutation calling (R² = 0.98) (Figure 1G, Supplemental Table 1A, B). Among the undetected mutations, leftover DNA was available for only one sample (orthogonal VAF = 0.16%), and ddPCR testing of this sample revealed no evidence of the alteration in our specimen. For mutations with VAF ≥ 0.5% by orthogonal assays (n = 83), we called 92% de novo (n = 76, 95% CI: 84%-96.5%) and detected 99% by genotyping (n = 82, 95% CI: 92.5%-99.9%).
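The text does not state how the 95% confidence intervals were computed. As one possibility, the sketch below computes the exact (Clopper-Pearson) binomial interval by bisection on the binomial CDF, using only the standard library; the function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Find p in (0, 1) with f(p) == target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for x successes in n trials."""
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | p) increases with p; solve P(X >= x) = alpha / 2.
        lower = _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | p) decreases with p; solve P(X <= x) = alpha / 2.
        upper = _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


if __name__ == "__main__":
    for label, x, n in [("genotyping", 94, 100), ("de novo", 82, 100)]:
        lo, hi = clopper_pearson(x, n)
        print(f"{label}: {x}/{n} = {x / n:.0%} (95% CI {lo:.1%}-{hi:.1%})")
```

For 94 of 100 this gives roughly 87.4%-97.8%, in line with the interval reported above; small differences for other counts would point to a different interval method.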
To determine the reproducibility of the assay, we prepared and sequenced seven samples harboring a total of 152 mutations three times within the same sequencing run and across four separate runs (Supplemental Table 2). By genotyping, we detected 99% (n = 151, 95% CI: 96.4%-100%) of the expected mutations, with an overall median coefficient of variation of 0.16 (range: 0.04-1.2), computed for each sample and alteration. To determine the limit of detection, we sequenced five dilution levels (5%, 2.5%, 1%, 0.5%, and 0.1%) of a positive control sample containing 19 known mutations. At the 0.1% dilution, 11% of the mutations (n = 2, 95% CI: 1.3%-33.1%) were called de novo and 74% (n = 14, 95% CI: 48.8%-90.9%) were detected by genotyping. All expected mutations were called de novo at the 0.5% dilution (Supplemental Table 3).
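A minimal sketch of the reproducibility metric, assuming the coefficient of variation is the sample standard deviation of replicate VAFs divided by their mean, computed once per sample/alteration pair and then summarized by its median and range. The input layout and names are hypothetical.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(vafs):
    """Sample standard deviation of replicate VAFs divided by their mean."""
    if len(vafs) < 2:
        raise ValueError("need at least two replicates")
    m = mean(vafs)
    return stdev(vafs) / m if m > 0 else float("nan")


def median_cv(replicate_vafs):
    """replicate_vafs: {(sample, mutation): [vaf_rep1, vaf_rep2, ...]}.

    Computes one CV per sample/alteration pair; returns the median and (min, max).
    """
    cvs = [coefficient_of_variation(v) for v in replicate_vafs.values()]
    return median(cvs), (min(cvs), max(cvs))
```

Because the CV divides by the mean VAF, low-VAF alterations, where a few molecules more or less shift the estimate noticeably, tend to produce the largest values in such a range.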
Finally, to assess specificity, we performed variant calling on 47 healthy donor plasma samples against their matched WBCs, and no mutations were called. We also used the samples from the accuracy analysis with orthogonal NGS results (n = 37) and considered all genomic positions interrogated by these assays (n = 1,620) (Supplemental Table 4). Using genotyping thresholds, MSK-ACCESS detected four potential false positives not reported by the orthogonal NGS assay (TP53 p.R253H at VAFs of 0.17% and 0.24%, and PIK3CA p.H1047R at VAFs of 0.05% and 0.07%), implying a specificity of at least 99.7% (95% CI: 99.3%-99.9%). With de novo mutation calling, we identified only one false positive mutation, for a specificity of 99.9% (95% CI: 99.6%-100%). Overall, the positive predictive agreement (PPA) was 94% (95% CI: 85%-98%) for genotyping and 98% (95% CI: 90%-99.9%) for de novo mutation calling. The negative predictive agreement (NPA) was 99.7% and 99.2% for genotyping and de novo calling, respectively.
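The specificity point estimates can be reproduced with simple arithmetic, assuming all 1,620 interrogated positions are treated as the negative denominator (an assumption that matches the reported values). The constant and function names are illustrative.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """TN / (TN + FP)."""
    return true_negatives / (true_negatives + false_positives)


# Assumption: every interrogated position counts as a negative for this estimate.
INTERROGATED_POSITIONS = 1620

if __name__ == "__main__":
    for mode, false_pos in (("genotyping", 4), ("de novo", 1)):
        true_neg = INTERROGATED_POSITIONS - false_pos
        print(f"{mode}: specificity = {specificity(true_neg, false_pos):.2%}")
```

This gives 1,616/1,620 ≈ 99.75% for genotyping and 1,619/1,620 ≈ 99.94% for de novo calling, consistent with the "at least 99.7%" and 99.9% figures.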
Clinical experience
Genomic landscape
Based on the analytical validation results above, MSK-ACCESS received approval for clinical use from the New York State Department of Health (NYS-DOH) on May 31, 2019 and was subsequently launched for routine clinical diagnostic use. Here, we describe the results from the first 617 patients prospectively sequenced in our clinical laboratory. A total of 687 blood samples were accessioned, and 681 (99%) yielded sufficient cfDNA and passed quality control metrics. Median raw coverage was 18,264X for plasma cfDNA and 1,273X for WBC DNA, and median duplex consensus coverage for plasma was 1,497X.
Of the 681 samples, 51% (n = 349) were from non-small cell lung cancer (NSCLC) patients, and prostate, bladder, pancreatic, and biliary cancers were the next most common cancer types (28%) (Figure 2A). We assessed the clinical actionability of genomic alterations detected by MSK-ACCESS using OncoKB: 41% (n = 278) of samples had at least one targetable alteration, defined as an OncoKB level 1-3B alteration. Level 1 OncoKB alterations were most frequent in bladder cancer, breast cancer, and NSCLC patients, at 48%, 37%, and 33%, respectively. Seventy-three percent (n = 498) of all samples had at least one alteration detected, with a median of 3 alterations per patient among those with detected alterations (range 1-28) (Figure 2B), and 56% of these samples harbored clinically actionable alterations.
Altogether, we clinically reported a total of 1,697 SNVs and indels in 486 samples from 435 patients, with a median VAF of 1.9% (range 0.02%-99%) (Figure 2C). Of these mutation calls, 95% (n = 1,606) were called de novo, without the aid of prior molecular profiling results for the tested patient. The remaining 91 variants were rescued by genotyping and had a median observed VAF of 0.08%. As expected, deeper coverage enabled the detection of mutations at lower allele fractions for both de novo and genotyping thresholds (Figure 2D). However, de novo calls of alterations that had been independently detected in prior tumor sequencing spanned the entire mathematically possible range of allele frequencies and coverage depths, given the minimum required number of alternate alleles (Figure 2C, 2D).
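The dependence of the detectable VAF on depth can be sketched with a simple binomial model. If a caller requires at least k alternate reads (k here is a hypothetical threshold, not the assay's actual setting), the lowest observable VAF at depth d is k/d, and the probability of reaching k reads for a true VAF f is a binomial upper tail. The model ignores sequencing error and the sampling of input molecules.

```python
from math import comb


def min_detectable_vaf(depth: int, min_alt_reads: int) -> float:
    """Lowest VAF that can satisfy a minimum alternate-read requirement at this depth."""
    return min_alt_reads / depth


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf)."""
    below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(min_alt_reads)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Hypothetical threshold of 3 alternate reads; a variant at 0.5% VAF.
    for depth in (500, 1000, 2000):
        print(depth, min_detectable_vaf(depth, 3), round(detection_probability(depth, 0.005, 3), 3))
```

Under this model, quadrupling depth from 500X to 2,000X lowers the VAF floor fourfold and raises the chance of detecting a 0.5% variant from under one half to near certainty, which is the pattern the depth-versus-VAF plots reflect.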
To confirm that the assay identifies the expected alterations, we examined the most frequently called mutations, copy number alterations, and SVs in lung cancer and the next five largest disease cohorts (Figure 2E). As expected, TP53 was the most commonly altered gene, with variants in 144 of the 248 (58%) NSCLC samples with detectable alterations. Of greater therapeutic relevance, MSK-ACCESS identified oncogenic, targetable driver mutations and amplifications in EGFR, KRAS, MET, ERBB2, and BRAF. As is characteristic of lung cancer, samples lacking known mitogenic drivers by MSK-ACCESS were found to harbor STK11 and KEAP1 mutations. EML4-ALK and KIF5B-RET fusions were also detected in this cohort, both de novo and by genotyping, along with ROS1 rearrangements involving multiple partners.
Clinically actionable and oncogenic alterations were similarly found in the next five most represented tumor types prospectively sequenced by MSK-ACCESS (Figure 2E). TP53 was again the most commonly altered gene, with both mutations and likely oncogenic deletions identified. FGFR2 mutations and fusions (most commonly with BICC1) were identified in 8 of the 24 intrahepatic cholangiocarcinomas with detectable ctDNA, including missense mutations in the FGFR2 kinase domain known to confer resistance to targeted therapies. Targetable alterations were also identified in IDH1 and PIK3CA. Alterations in FGFR3, ERBB2, AR, and KRAS were recurrently detected in bladder, breast, prostate, and pancreatic cancer, respectively. Overall, alteration rates in select genes and cancer types were comparable between MSK-ACCESS and MSK-IMPACT, with some notable exceptions, such as KRAS in pancreatic cancer and AR and TP53 mutations in prostate cancer (Figure 2F).
Concordance with MSK-IMPACT
To compare detection sensitivity and the spectrum of mutations observed between tumor tissue and plasma, we examined the concordance of mutation calls between MSK-IMPACT and MSK-ACCESS where both were available. For a consistent comparison across patients, we used plasma mutations from the first sample sequenced by MSK-ACCESS for patients with multiple time points, and the union of mutations across all tissue samples sequenced by MSK-IMPACT for each patient. Of the 617 patients tested with MSK-ACCESS, 383 also had clinical MSK-IMPACT results from 520 sequenced tumor tissues. A total of 1,212 mutations were reported in the target regions shared by both assays, and 58% (n = 702) were reported by both (Figure 3A). Tissue VAFs were slightly higher for shared mutations than for MSK-IMPACT-only calls (Mann-Whitney p value < 0.0001), but this effect was not observed for MSK-ACCESS-only calls (Figure 3B). Although the VAFs of shared mutations in tissue and plasma were only weakly correlated, we observed high-frequency tumor mutations at extremely low VAFs by MSK-ACCESS, and vice versa (Figure 3C).
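The per-patient comparison described above can be sketched with set operations: tissue calls are pooled across all tumor samples, only the earliest plasma sample is used, and an optional predicate restricts both sides to the overlapping target regions. Mutation keys, dates, and function names are hypothetical.

```python
from datetime import date


def first_plasma_calls(plasma_samples):
    """plasma_samples: list of (collection_date, set_of_mutations); returns the earliest set."""
    return min(plasma_samples, key=lambda s: s[0])[1]


def concordance(tissue_samples, plasma_samples, in_overlap=None):
    """Split one patient's mutations into shared, tissue-only and plasma-only sets.

    tissue_samples: list of sets of mutation keys (all tumor samples are pooled).
    plasma_samples: list of (date, set) pairs; only the earliest is used.
    in_overlap: optional predicate keeping mutations inside regions covered by both assays.
    """
    tissue = set().union(*tissue_samples)
    plasma = set(first_plasma_calls(plasma_samples))
    if in_overlap is not None:
        tissue = {m for m in tissue if in_overlap(m)}
        plasma = {m for m in plasma if in_overlap(m)}
    return {
        "shared": tissue & plasma,
        "tissue_only": tissue - plasma,
        "plasma_only": plasma - tissue,
    }
```

Summing the sizes of these three sets over all patients yields cohort-level shared, tissue-only, and plasma-only counts of the kind reported here.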
We next considered the alterations specific to one assay. Approximately 21% of mutations were reported only by MSK-IMPACT tumor sequencing (n = 260), and a similar proportion only by MSK-ACCESS plasma sequencing (n = 250) (Figure 3A). Interestingly, 61 of the 250 MSK-ACCESS-only mutations were present at sub-threshold levels in tissue by MSK-IMPACT, highlighting the potential gain in sensitivity from ultra-high depth of coverage and UMIs. Twenty-seven percent (n = 69) of the MSK-IMPACT-only mutations and 12% (n = 30) of the MSK-ACCESS-only mutations were clinically actionable (OncoKB Level 1-3) (Supplemental Figure 3), demonstrating the value of complementary tissue and cfDNA analyses. Moreover, for patients who did not undergo MSK-IMPACT testing (n = 234), MSK-ACCESS detected a total of 79 clinically actionable mutations in 26% (n = 61) of the patients.
To evaluate whether tumor content contributed to mutation discordance, we computationally estimated the tumor purity of tissue samples (n = 433 samples from the 331 of 383 patients for whom purity could be determined by FACETS; see Methods) and found no significant difference between patients with mutations detected only in plasma and those with mutations detected in both plasma and tissue (Mann-Whitney p value = 0.7904) (Figure 3D). This was also true when purity was compared considering only actionable mutations (Supplemental Figure 4). Next, we evaluated the clonality of mutations (see Methods) in tissue samples from patients tested on both MSK-IMPACT and MSK-ACCESS. A larger proportion of shared mutations than of tissue-only mutations were clonal, including when only actionable mutations were considered (p < 0.001) (Figure 3E), suggesting that clonality contributes to mutation concordance between tissue and plasma testing.
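The clonality method itself is described in Methods and may differ from what follows. As a common simplified formulation, a cancer cell fraction (CCF) can be estimated by inverting the expected VAF given tumor purity, local tumor copy number, and mutation multiplicity (quantities a tool such as FACETS can provide), assuming a diploid normal compartment. The 0.8 clonality threshold and all names are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, total_copy_number=2, multiplicity=1):
    """Estimate the fraction of tumor cells carrying a mutation.

    Inverts: expected VAF = purity * multiplicity * CCF
             / (purity * CN_tumor + 2 * (1 - purity)),
    assuming a diploid normal compartment.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    denom = purity * total_copy_number + 2 * (1 - purity)
    return vaf * denom / (purity * multiplicity)


def is_clonal(ccf, threshold=0.8):
    """Hypothetical rule: call a mutation clonal if its estimated CCF reaches the threshold."""
    return ccf >= threshold
```

For example, in a diploid region of a 50% pure tumor, a 25% VAF corresponds to a CCF of 1.0 (clonal), while a 10% VAF corresponds to 0.4 (subclonal); subclonal mutations confined to one region of the tumor are less likely to be shed at detectable levels or to be sampled by a single biopsy.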
We also assessed whether the time between tissue collection for MSK-IMPACT and blood collection for MSK-ACCESS (difference in date of procedure, ΔDOP) affected mutation concordance (see Methods). Overall, the mean ΔDOP was 48 weeks (median: 13 weeks, range: 0-518 weeks). Patients with mutations detected only by MSK-IMPACT (mean = 400 days; median = 279 days) or only by MSK-ACCESS (mean = 515 days; median = 232 days) had a longer ΔDOP than patients whose mutations were detected by both assays (mean = 218 days; median = 28 days; Mann-Whitney p value < 0.001, Figure 3F, Supplemental Table 5). Evaluating ΔDOP based only on actionable mutations across the three categories (MSK-IMPACT only, shared, and MSK-ACCESS only) yielded similar findings (Supplemental Figure 5). Taken together, these results underscore the importance of sample collection timing and tumor heterogeneity for mutation concordance between tissue- and plasma-based testing.
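A minimal sketch of the ΔDOP summary, assuming ΔDOP is the absolute number of days between the tissue and blood procedure dates (the exact definition is in Methods) and that each record is already labeled with its concordance category. Names are hypothetical.

```python
from datetime import date
from statistics import mean, median


def delta_dop_days(tissue_date: date, blood_date: date) -> int:
    """Days between tissue and blood procedures (absolute difference; an assumption)."""
    return abs((blood_date - tissue_date).days)


def summarize_by_category(records):
    """records: iterable of (category, tissue_date, blood_date).

    Returns {category: (mean_days, median_days)}.
    """
    by_category = {}
    for category, tissue_date, blood_date in records:
        by_category.setdefault(category, []).append(delta_dop_days(tissue_date, blood_date))
    return {c: (mean(v), median(v)) for c, v in by_category.items()}
```

Reporting both mean and median is useful here because a few very long intervals (up to 518 weeks in this cohort) pull the mean well above the median.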
Utility of matched WBC analysis
Similar to MSK-IMPACT, MSK-ACCESS uses matched WBC sequencing to confidently identify and remove germline variants from cfDNA results. To quantify the benefit of matched WBC sequencing, we performed plasma-only variant calling on all clinical cases, resulting in 24,561 variant calls. We then simulated the filtering criteria of unmatched sequencing, removing 14,508 variant calls (median: 14 ± 8 variants per sample) based on their presence in our curated plasma normal samples or in at least 0.5% of the population in gnomAD (Figure 4A, Supplemental Figure 6). We could further filter out 721 (7.2%) likely germline variants whose VAF fell within the heterozygous germline range (between 35% and 65% in both WBCs and cfDNA). However, this VAF-based filtering would improperly remove a total of 70 verified somatic mutations from the cfDNA callset, 15 of which were clinically actionable (Supplemental Table 6). Without it, 10,053 variants with a mean VAF of 4.7% (median: 0.05%) remained after database-driven filtering, highlighting the utility of patient-matched WBC profiling for definitively filtering out germline variants.
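The simulated unmatched filtering can be sketched as two steps: a database filter (curated plasma normals, or population allele frequency of at least 0.5% in gnomAD) followed by an optional heterozygous-VAF filter. For simplicity this sketch applies the 35%-65% window to plasma VAF only, and the input layout (variant ID plus VAF) is hypothetical.

```python
def database_filter(calls, panel_of_normals, gnomad_af, max_pop_af=0.005):
    """Drop calls seen in curated plasma normals or with population AF >= max_pop_af.

    calls: list of (variant_id, plasma_vaf); gnomad_af: {variant_id: population AF}.
    """
    kept, removed = [], []
    for variant_id, vaf in calls:
        if variant_id in panel_of_normals or gnomad_af.get(variant_id, 0.0) >= max_pop_af:
            removed.append(variant_id)
        else:
            kept.append((variant_id, vaf))
    return kept, removed


def heterozygous_vaf_filter(calls, low=0.35, high=0.65):
    """Drop calls whose plasma VAF falls in the heterozygous germline window.

    Somatic mutations can also fall in this window, so this step can
    remove true somatic calls.
    """
    kept = [(v, f) for v, f in calls if not (low <= f <= high)]
    removed = [v for v, f in calls if low <= f <= high]
    return kept, removed
```

The weakness the text describes is visible in the second function: it has no way to tell a heterozygous germline variant from a somatic mutation that happens to sit at a similar VAF, whereas a matched WBC sample tests the germline hypothesis directly.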
Notably, we used the sequenced WBC sample to correctly classify as germline several events that commercial providers had reported as somatic. For example, a commercial cfDNA test reported an ATM p.E522fs*43 mutation as somatic and suggested therapies for this alteration, but our matched analysis revealed the indel at ~50% in both the plasma and WBC, clearly identifying it as a germline event. We have similarly reclassified as germline variants mutations in TP53, BRCA2, and ROS1 that had previously been reported as somatic. Additionally, WBC sequencing revealed the germline origin of copy number deletions observed in ATM, BRCA2, and, for two patients with retinoblastoma, RB1, based on deletions in their matched WBC samples (Supplemental Figure 7).
As previous reports have demonstrated that tumor-derived cfDNA, normal-derived cfDNA, and genomic DNA may be distinguishable by fragment length 23,24, we sought to confirm this observation in MSK-ACCESS data and to use this information to better infer the origin of variants detected in cfDNA. The overall fragment length distribution exhibited the expected bimodal cfDNA peaks around 161 and 317 base pairs, after accounting for the trimming of 3 bases from read ends by the pipeline 25 (Figure 4B). Fragments harboring somatic tumor-derived mutations confirmed to be absent in WBCs (n = 1,558) were significantly shorter than those harboring the wild-type allele, consistent with their tumor origin (Figure 4C-I) (bootstrapped p value < 0.0001). For several variants with limited supporting evidence in WBC DNA but a substantially greater VAF in plasma cfDNA, we were able to establish a somatic, tumor-derived (ctDNA) origin based on a slight cfDNA-like insert size peak in the WBC sample. As shown in Figure 4C-II, these reads had shortened fragment lengths (bootstrapped p value < 0.0001), confirming that they originated from the cell-free compartment. In stark contrast, variant calls from the unmatched analysis that were filtered out as putative germline variants because of their presence in WBCs at high VAF had a fragment length distribution equivalent to that of wild-type alleles (Figure 4C-III) (bootstrapped p value = 0.94). By integrating fragment length analysis into MSK-ACCESS, we can therefore more confidently distinguish between tumor-derived somatic variants and normal-derived variants in cfDNA.
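The text reports bootstrapped p values without giving the procedure. One plausible form, sketched below under that assumption, resamples both groups with replacement from the pooled fragment lengths (the null hypothesis of a shared distribution) and asks how often the resampled difference in mean length is at least as large as the observed one. Using the mean rather than another summary statistic is also an assumption; names are hypothetical.

```python
import random
from statistics import fmean


def bootstrap_shorter_pvalue(mutant_lengths, wildtype_lengths, n_boot=9999, seed=0):
    """One-sided bootstrap p value that mutant-allele fragments are shorter than wild-type ones.

    Resamples both groups from the pooled lengths (null: one shared distribution)
    and counts resampled mean differences at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = fmean(wildtype_lengths) - fmean(mutant_lengths)
    pooled = list(mutant_lengths) + list(wildtype_lengths)
    n_mut, n_wt = len(mutant_lengths), len(wildtype_lengths)
    hits = 0
    for _ in range(n_boot):
        boot_mut = rng.choices(pooled, k=n_mut)
        boot_wt = rng.choices(pooled, k=n_wt)
        if fmean(boot_wt) - fmean(boot_mut) >= observed:
            hits += 1
    # Add-one correction so the p value is never exactly zero.
    return (hits + 1) / (n_boot + 1)
```

With 9,999 resamples the smallest attainable p value is 0.0001, so reporting "p < 0.0001" would require more resamples than this default.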
Assessing the filtering of putative clonal hematopoiesis (CH) mutations
Several recent studies have suggested that CH mutations present a filtering challenge for highly sensitive NGS-based liquid biopsy assays 26-29. We observed that using patient-matched normal WBC DNA in MSK-ACCESS eliminated 7,760 (77%) of variant calls below 10% VAF (Figure 4A-IV). We hypothesized that the majority of these calls represent potential CH mutations. Recent reports 30 have suggested that fragments supporting CH variants have length distributions similar to those of cfDNA derived from non-cancerous cells and distinct from ctDNA 26,28,31. Indeed, sequence reads harboring variants with plasma VAF <10% that were also present in WBCs exhibited fragment lengths indistinguishable from those of wild-type alleles and germline variants (Figure 4C-IV) (bootstrapped p value = 0.99), supporting the hypothesis that these were correctly filtered WBC-derived somatic mutations associated with clonal hematopoiesis. The alterations in Figure 4C-II, which had a lower frequency of supporting reads in the WBC sample than in the cfDNA sample, could also have been interpreted as having a CH origin. However, the shorter fragment length distribution of these mutations reaffirmed that they were likely tumor-derived, as originally postulated.
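The reasoning above can be condensed into a hypothetical triage rule for a variant that has support in both plasma and matched WBC DNA. The 10% plasma VAF ceiling follows the text; the WBC-to-plasma VAF ratio cutoff of 0.5 is an illustrative assumption, not an MSK-ACCESS parameter. Variants whose WBC support is much weaker than their plasma VAF are routed to fragment length review rather than being called CH outright.

```python
def classify_low_vaf_variant(plasma_vaf, wbc_vaf, max_plasma_vaf=0.10, min_wbc_ratio=0.5):
    """Hypothetical triage of a variant present in both plasma cfDNA and matched WBC DNA.

    Returns "possible_CH" when plasma VAF is low and WBC support is comparable,
    "review_fragment_lengths" when WBC support is much weaker than in plasma,
    and "outside_CH_range" when plasma VAF is at or above the ceiling.
    """
    if wbc_vaf <= 0:
        raise ValueError("variant must have WBC support for this triage")
    if plasma_vaf >= max_plasma_vaf:
        return "outside_CH_range"
    if wbc_vaf / plasma_vaf >= min_wbc_ratio:
        return "possible_CH"
    return "review_fragment_lengths"
```

For instance, a variant at 0.44% in plasma and 0.31% in WBC would be flagged as possible CH, while one at 2% in plasma with only 0.05% in WBC would be sent for fragment length review, mirroring the Figure 4C-II scenario.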
Given our ability to recognize CH from WBCs, we have reclassified several variant calls reported as somatic events by commercial vendors. While some of these calls were in genes commonly mutated in CH, such as DNMT3A, others were in less common genes. In one case, a patient with lung adenocarcinoma had an external report of KRAS p.G12S. However, we identified this alteration at comparable frequencies in the plasma and WBC (0.44% and 0.31%, respectively), suggesting that it most likely represents a CH mutation and underscoring the complexity of assigning such alterations to different compartments when considering the clinical presentation of the patient.