Cell-free DNA (cfDNA) comprises highly fragmented DNA molecules that are released into the circulatory system from cells1-3. Circulating tumor DNA (ctDNA) specifically refers to the cfDNA that originates from tumor cells4,5. During the early stages of cancer, ctDNA in the blood can be exceedingly scarce (as low as 0.01% of total cfDNA), posing a significant challenge for detection6-9. Contemporary diagnostic techniques based on Sanger sequencing or Next-Generation Sequencing (NGS) lack the sensitivity needed at such levels and often fail to detect ctDNA9,10. Because low ctDNA levels are common in many early-stage cancers and in minimal residual disease (MRD), it has been difficult to obtain reliable detection results in these cases9-14.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-based ctDNA enrichment methods have shown promising enhancement of detection sensitivity15-19. These methods deplete the abundant major alleles before NGS, thereby enriching for the minor alleles. Highly efficient CRISPR-based sequence depletion methods include CARM (Cas9 Assisted Removal of Mitochondrial DNA)15, DASH (Depletion of Abundant Sequences by Hybridization)16, MAD-DASH (miRNA and Adaptor Dimer-DASH)17, and CUT-PCR (CRISPR-mediated, Ultrasensitive detection of Target DNA using PCR)18. By depleting major allelic DNA, these methods enable accurate detection of minor allelic DNA with fewer NGS reads, suggesting considerable potential in molecular diagnostics.
However, while these methods were highly effective at selectively enriching the sequences of interest in NGS libraries, the applicability of current CRISPR-based mutant detection methods has been limited by specific sequence requirements15-18. Previous studies demonstrated that elimination required either substantial sequence differences between the alleles or a mutation positioned within the protospacer adjacent motif (PAM) site. A primary reason is that the specificities of current CRISPR systems have been shown to be insufficient to effectively distinguish single- or double-base mismatches in the target DNA sequence20-22. These inaccuracies in CRISPR cleavage constitute a significant barrier, both in vivo and in vitro, for medical and industrial applications that require exact base-pair discrimination18,19,23,24. This sequence limitation has been a major hindrance to applying CRISPR-based enrichment to diagnosis through the detection of rare mutant alleles in cfDNA. For Streptococcus pyogenes Cas9 (SpCas9), for example, accurate single-base discrimination is limited to mutations that change the 5’-NGG-3’ PAM sequence to 5’-NHH-3’ (H: A, T, or C).
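The PAM constraint for SpCas9 can be made concrete with a small check. The following is a minimal sketch, not part of the study's pipeline: the function names are hypothetical, PAMs are assumed to be given as three-letter strings read 5’ to 3’ on the PAM-containing strand, and the rule is implemented literally as stated above (wild-type PAM matches NGG, mutant PAM matches NHH).

```python
# IUPAC code H stands for A, C, or T (any base except G).
H_BASES = {"A", "C", "T"}


def is_ngg(pam):
    """True if a 3-nt PAM matches the SpCas9 consensus 5'-NGG-3'."""
    pam = pam.upper()
    return len(pam) == 3 and pam[1:] == "GG"


def is_nhh(pam):
    """True if a 3-nt PAM matches 5'-NHH-3' (no G at positions 2 and 3)."""
    pam = pam.upper()
    return len(pam) == 3 and pam[1] in H_BASES and pam[2] in H_BASES


def pam_discriminates(wild_type_pam, mutant_pam):
    """Hypothetical helper: does the mutation convert an NGG PAM to NHH?

    This encodes only the rule quoted in the text; it says nothing about
    mismatches that fall inside the protospacer.
    """
    return is_ngg(wild_type_pam) and is_nhh(mutant_pam)


print(pam_discriminates("TGG", "TAC"))  # both G positions changed
print(pam_discriminates("TGG", "TGA"))  # only the last G changed
```

Under this literal reading, only mutations that remove both G bases of the PAM qualify, which illustrates how few clinically relevant variants fall inside the usable window.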
Several studies reported enhanced CRISPR systems with improved specificity, such as eSpCas9 and SpCas9-HF20,22. However, while these engineered CRISPRs showed considerably higher accuracies compared to wild-type SpCas9 (SpCas9-WT), significant cleavage still occurred sporadically at off-targets that had single-base-pair mismatches. To overcome this precision issue, we sought to develop a CRISPR system capable of effectively distinguishing a single-base mismatch across the entire sgRNA target sequence. To this end, we developed a Francisella novicida Cas9 (FnCas9) with advanced fidelity (FnCas9-AF2) that could efficiently discriminate mutations at all positions of an sgRNA target sequence. In vitro cleavage and genome-wide analyses demonstrated that FnCas9-AF2 had undetectable off-target activity.
Next, we applied FnCas9-AF2 to develop a CRISPR-based sequence depletion approach for detecting low-frequency cancer-associated mutations, which we named MUTE-Seq (Mutation tagging by CRISPR-based Ultra-precise Targeted Elimination in Sequencing). We observed that MUTE-Seq markedly increased the detection sensitivity for low-frequency mutant alleles via wild-type allele depletion in biopsy samples from both acute myeloid leukemia (AML) and non-small cell lung cancer (NSCLC) patients. MUTE-Seq significantly increased the detection rates of low-frequency NRAS mutations in AML patients monitored for minimal residual disease (MRD). The increase in detected mutant allele frequencies was apparent in both Sanger sequencing chromatograms and NGS. Subsequently, we conducted simultaneous depletion of wild-type alleles in cfDNA using multiple sgRNAs with FnCas9-AF2, revealing that the system could be used in a multiplexed manner. Multiplexed MUTE-Seq (mMUTE-Seq) significantly enhanced the concordance of detected EGFR mutations between tissue and blood samples from NSCLC patients. Notably, mMUTE-Seq enabled effective detection of mutations that were present at very low frequencies in stage I NSCLC patients. The findings suggest that the mMUTE-Seq method has considerable potential for developing diagnostic panels aimed at detecting multiple low-frequency ctDNA mutations.
Comparison of Single-Base Mismatch Discrimination between SpCas9, SpCas9 Variants, and FnCas9
We sought to examine the mismatch tolerance of SpCas9, high-fidelity SpCas9 variants, and FnCas9. As a part of the study, 20 single guide RNAs (sgRNAs) were prepared with single-base mismatches, each at a different position of the KRAS target sequence. These were then tested in in vitro cleavage assays using SpCas9-WT, SpCas9-HF1, SpCas9-HF4, eSpCas9(1.0), eSpCas9(1.1), and FnCas9-WT20,22.
SpCas9-WT induced significant DNA cleavage not only with the perfectly matched sgRNA (sgRNA T), but also with sgRNAs containing a single-base mismatch at any of the 20 positions. SpCas9-HF1, SpCas9-HF4, eSpCas9(1.0), and eSpCas9(1.1), the engineered variants of SpCas9 designed for higher precision, also exhibited noticeable cleavage for most mismatched sgRNAs. In contrast, FnCas9 tended to show lower rates of in vitro cleavage with mismatched sgRNAs. Together, the in vitro cleavage assays suggested that FnCas9 could potentially discern single-base mismatches more efficiently than SpCas9 and its high-fidelity variants (Fig. 1a and Extended Data Fig. 1).
Engineering FnCas9 for Enhanced Sensitivity to DNA Mismatches
Following the initial studies, we aimed to engineer the FnCas9 protein to produce optimized variants with enhanced sensitivity to mismatches. We hypothesized that weakening the interactions between positively charged amino acids and the nucleotides implicated in the conformational shift during Cas9-sgRNA-DNA binding might destabilize R-loop formation on mismatched DNA. This instability could in turn curtail nuclease activity directed towards mismatched DNA sequences20,22,25,26. To examine this, 49 recombinant FnCas9 proteins were prepared, each containing a single amino acid substitution to alanine at a position expected to interact with the phosphate backbone of the target DNA (Extended Data Fig. 2a). Subsequently, in vitro cleavage assays were carried out to test the ability of these 49 FnCas9 single-substitution variants to cleave the target sequence with sgRNAs, each bearing a single-base mismatch at a different position. The assays revealed that certain amino acid substitutions in FnCas9 led to a significant reduction of cleavage with the single-base mismatched sgRNAs, while preserving on-target cleavage activity (Extended Data Fig. 2b).
To identify the most precise FnCas9 variant, specificity scores were calculated based on each variant's ability to discriminate single-base mismatches (Extended Data Fig. 2c). Eleven variants demonstrated specificity scores above 60%: six with modified residues in the REC lobe (R455, R785, K721, K789, R919, and R1241) that interacted with the phosphate backbone of the target DNA strand, and five with altered residues in the NUC lobe (R939, K941, K1189, R1226, and K1228) that interacted with the phosphate backbone of the non-target DNA strand (Fig. 1b). Of these 11 variants, the K1189A and R1241A mutants presented the highest specificity scores, and further optimization of FnCas9 was therefore based on these mutations (Fig. 1c and Extended Data Fig. 2c).
Multiple Amino Acid Substitutions Enhance Specificity of FnCas9 Across Diverse Target Sites and Mismatch Conditions
We next asked whether the specificities of the FnCas9 variants could be further enhanced. We observed that the single-substitution variants showed residual off-target cleavage at the NRAS target site. To improve the specificity, we introduced combinatorial alanine substitutions into FnCas9 at the identified amino acid positions and evaluated the variants for specificity at both the KRAS and NRAS target sites. As a result, two FnCas9 variants (FnCas9-K1189A/R1241A [termed FnCas9-AF1] and FnCas9-R785A/K1189A/R1241A [termed FnCas9-AF2]) demonstrated undetectable off-target cleavage while preserving on-target activity at both target sites (Fig. 2a, b).
The specificity of these variants was further assessed using 60 sgRNAs containing all possible single-base mismatches at all 20 positions within the NRAS target sequence. FnCas9-AF1 exhibited superior specificity to the target sequence in comparison to the K1189A single substitution. FnCas9-AF2 showed significantly reduced cleavage rates for all mismatched base positions and base mismatch types (Fig. 2b and Extended Data Fig. 3).
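The size of this sgRNA panel follows from simple enumeration: 20 positions times 3 alternative bases gives 60 mismatched guides. The sketch below illustrates that enumeration under stated assumptions. The function name is hypothetical, positions are numbered from the 5’ end of the spacer (the numbering convention is not specified here), and the example spacer is an arbitrary placeholder rather than the NRAS target used in the study.

```python
RNA_BASES = "ACGU"


def single_mismatch_spacers(spacer):
    """Return (position, substituted_base, sequence) for every single-base change.

    Positions are 1-based and counted from the 5' end of the spacer; this
    numbering is an assumption made for illustration.
    """
    spacer = spacer.upper().replace("T", "U")
    if any(base not in RNA_BASES for base in spacer):
        raise ValueError("spacer must contain only A, C, G, and U/T")
    variants = []
    for index, original in enumerate(spacer):
        for substitute in RNA_BASES:
            if substitute != original:
                mismatched = spacer[:index] + substitute + spacer[index + 1:]
                variants.append((index + 1, substitute, mismatched))
    return variants


# Placeholder 20-nt spacer; NOT the NRAS, KRAS, or EGFR target from the study.
example_spacer = "ACGUACGUACGUACGUACGU"
panel = single_mismatch_spacers(example_spacer)
print(len(panel))  # 20 positions x 3 alternative bases
```

Restricting the inner loop to one substitute per position would reproduce the smaller 20-guide design used for the initial KRAS comparison.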
Next, we asked whether the ability of FnCas9-AF2 to distinguish base mismatches could be generalized. To this end, we conducted in vitro cleavage assays using FnCas9-AF2 and sgRNAs with single-base mismatches at all 20 positions within the KRAS and EGFR target sequences (Extended Data Fig. 4). The specificities of FnCas9-AF2 with the KRAS and EGFR sgRNAs were significantly higher than those of FnCas9-WT, suggesting that its sensitivity to single-base mismatches is applicable to other target sequences.
Comparative Specificity Assessment between FnCas9-AF2 and SpCas9 Variants
We evaluated the precision of FnCas9-AF2 on targets comprising single nucleotide variants (SNVs) or an indel: EGFR c.2573T>G, EGFR c.2369C>T, EGFR c.2389T>A, KRAS c.35G>A, MET c.3028+1G>A, and EGFR c.2236_2250del. For each site, we designed sgRNAs targeting the wild-type sequence and carried out in vitro cleavage on both the wild-type and mutant DNAs. The wild-type DNA was completely cleaved by FnCas9-AF2 and was observed as two DNA fragments on the gel. In contrast, FnCas9-AF2 did not produce detectable cleavage of the mutant DNA for either the indel or the single nucleotide variants (Fig. 3a). We then performed a quantitative analysis of in vitro cleavage efficiency using FnCas9-AF2 and the SpCas9 variants on both wild-type and mutant targets, including the 5 SNVs and 1 indel. Compared to the SpCas9 variants, FnCas9-AF2 showed an extremely low cleavage rate on mutant DNA (0.48% on average) while preserving highly efficient cleavage of wild-type DNA (97.11% on average) (Fig. 3b and Extended Data Fig. 5).
Genome-wide Analysis of Off-target Effects in FnCas9 and SpCas9 Variants
We then sought to determine whether the higher mismatch sensitivity of FnCas9-AF2 is associated with reduced off-target effects across the whole genome. In Digenome-seq analyses27, SpCas9-WT exhibited 654 potential off-target sites. In contrast, FnCas9-WT had only 77 potential off-target cleavage sites, underscoring FnCas9’s enhanced specificity. The high-fidelity SpCas9 variants further reduced the number of off-target cleavage sites, with eSpCas9(1.1) registering 37 sites and SpCas9-HF4 registering 13. Remarkably, FnCas9-AF1 presented only one potential off-target cleavage site, while FnCas9-AF2 showed even higher precision, with no detectable off-target sites (Fig. 3d, e). Therefore, FnCas9-AF2 was selected for further analysis and application.
Precision Enrichment of Mutant Alleles in AML Patients Using MUTE-Seq: An FnCas9-AF2-Based Wild-type Allele Depletion Technique
Utilizing FnCas9-AF2, we introduce MUTE-Seq, a novel technique that overcomes the sequence limitations of minor-allele enrichment through highly precise elimination of the major allele. To evaluate its capacity, we conducted MUTE-Seq on human genomic DNA (gDNA) samples obtained from the bone marrow of eight AML patients who were monitored for MRD. Our primary aim was to identify specific NRAS mutations: G12D, G12C, and G13D. To accomplish this, we designed an sgRNA targeting the NRAS locus encompassing G12 and G13. The samples were divided into two groups: the MUTE-Seq group, which underwent in vitro cleavage with FnCas9-AF2 to eliminate wild-type DNA, and the control group, which remained untreated. Subsequently, we analyzed the variant allele frequencies (VAFs) in both the MUTE-Seq and control groups using Sanger sequencing (Fig. 4a).
In the control group, the VAFs of NRAS mutations in DNA samples from patients 1 through 6 ranged from 3% to 16.9%, whereas samples from patients 7 and 8 exhibited no detectable NRAS mutations (Fig. 4b). Chromatograms of control samples with detectable mutant alleles displayed smaller peaks for mutant bases than for wild-type bases. In contrast, the MUTE-Seq group exhibited distinct double peaks at the positions of mutant bases in the chromatograms (Fig. 4c). Moreover, VAFs were significantly higher in the MUTE-Seq group (Fig. 4d). Notably, samples with lower VAFs in the control group displayed a more pronounced increase in VAF following MUTE-Seq enrichment. For instance, the P1 sample, with a low initial VAF, showed an 8.1-fold increase post-enrichment, whereas P2, with a relatively higher initial VAF, exhibited a 3.3-fold enrichment.
Furthermore, a substantial level of agreement was observed between the VAFs obtained from the control group and MUTE-Seq group (Fig. 4e). Additionally, we employed NGS to measure the VAF of both control and MUTE-Seq groups, finding relatively similar detected VAFs (R²=0.958) between Sanger and NGS in both groups (Fig. 4f). Notably, for samples 7 and 8, both the control and MUTE-Seq groups exhibited VAFs below the detectable limit in both Sanger sequencing and NGS.
Next, we aimed to validate the detection limit of MUTE-Seq below 2.5% VAF, a range that is hard to discern with Sanger sequencing. For this test, we prepared blended gDNA samples by mixing NRAS wild-type and G13D-mutant gDNA at VAFs from 0.25% to 2.5%. Quantification of NRAS G13D allele frequencies showed that the MUTE-Seq group consistently exhibited elevated VAFs compared to controls: rising to 11.1% from 0.25%, 16.4% from 0.5%, 31.8% from 1.25%, and 44.6% from 2.5% (Fig. 4g, h). The increased mutant ratios in MUTE-Seq were high enough for definitive detection by Sanger sequencing. Notably, the correlation between pre- and post-enrichment VAFs was well maintained, with an R²=0.977 (Fig. 4h). We also observed higher enrichment folds for samples with lower initial VAFs, similar to the previous analyses of samples from AML patients. The results suggested that MUTE-Seq could surpass the detection limit of Sanger sequencing for identifying low-prevalence mutations in cancer samples.
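The trend in these spike-in values can be examined with a short calculation. The sketch below computes the fold enrichment for each reported pair and, as a purely illustrative assumption that is not taken from the study, fits a simple depletion model in which mutant molecules are untouched and a fraction r of wild-type molecules survives cleavage, so that the post-enrichment VAF equals v / (v + (1 - v) r) for an input VAF v. The function names are hypothetical.

```python
def fold_enrichment(vaf_before, vaf_after):
    """Ratio of post- to pre-enrichment VAF (both in the same units)."""
    return vaf_after / vaf_before


def surviving_wild_type_fraction(vaf_before, vaf_after):
    """Fraction r of wild-type molecules remaining under the simple model.

    Assumes (for illustration only) that mutant molecules are not cleaved and
    that amplification is unbiased, so vaf_after = v / (v + (1 - v) * r).
    Both arguments are fractions between 0 and 1, not percentages.
    """
    return vaf_before * (1.0 - vaf_after) / (vaf_after * (1.0 - vaf_before))


# (input VAF %, MUTE-Seq VAF %) pairs as reported for the NRAS G13D blends.
SPIKE_IN_PERCENT = [(0.25, 11.1), (0.5, 16.4), (1.25, 31.8), (2.5, 44.6)]

for before, after in SPIKE_IN_PERCENT:
    fold = fold_enrichment(before, after)
    remaining = surviving_wild_type_fraction(before / 100.0, after / 100.0)
    print(f"{before:>5}% -> {after:>5}%  fold={fold:.1f}  model r={remaining:.3f}")
```

The fold values fall from about 44 at 0.25% input to about 18 at 2.5% input. Under the assumed model, all four points imply that roughly 2 to 3% of wild-type molecules survive, which would explain why fold enrichment is larger at lower input VAFs: the fold approaches 1/r as v approaches zero and shrinks as the mutant fraction grows.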
Multiplexed MUTE-Seq (mMUTE-Seq) Approach for Simultaneous Enrichment of Low-Frequency Mutations in cfDNA from Non-Small Cell Lung Cancer (NSCLC) Patients
We asked whether MUTE-Seq could simultaneously enrich multiple mutant alleles in cfDNA in a multiplexed manner. We conducted quantitative analyses of multiplexed MUTE-Seq (mMUTE-Seq) by applying 10 sgRNAs simultaneously to reference materials containing NSCLC-associated mutations at VAFs of 1%, 0.1%, and 0% (Fig. 5a). We found that the VAFs of mutations present in the reference materials (EGFR exon 19 deletion, EGFR L858R, EGFR T790M, and KRAS G12D) were significantly increased by mMUTE-Seq (Fig. 5a). Notably, the 0% samples showed undetectable VAFs in both the unenriched control (Deep-Seq) and the mMUTE-Seq groups, suggesting that the VAF increase produced by mMUTE-Seq was specific. We further analyzed the average enrichment efficiencies of all detected VAFs and observed 34.2- and 66.2-fold increases in the 1% and 0.1% initial VAF samples, respectively (Fig. 5b). In contrast, multiplexed mutant enrichment using SpCas9-WT, eSpCas9(1.1), SpCas9-HF4, and FnCas9-WT showed only a slight increase in detected VAFs, and mutations in the 0.1% initial VAF sample remained undetectable (Extended Data Fig. 6).
To evaluate the sensitivity and specificity of mMUTE-Seq, we conducted statistical analyses on variants detected in reference materials at 1% and 0.1% VAF. We first determined the VAF cut-offs, calculated as three times the interquartile range above the third quartile (Q3), to minimize the influence of outliers. The cut-offs for Deep-Seq and mMUTE-Seq were determined as 0.21% and 0.55%, respectively. Subsequent analyses showed a notable difference in sensitivity between the two methods (Fig. 5c). Specifically, mMUTE-Seq achieved a sensitivity of 1, significantly outperforming Deep-Seq, which had a sensitivity of 0.65. Both methods, however, demonstrated comparable specificity, each scoring 0.95.
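The cut-off rule (Q3 plus three times the interquartile range) and the resulting sensitivity and specificity can be sketched as follows. This is a minimal illustration rather than the analysis code used here: which VAF distribution the quartiles were computed from is not stated above, so the sketch assumes a set of background VAFs; the quartile interpolation method is likewise an assumption; and all numbers in the example are hypothetical.

```python
import statistics


def outlier_cutoff(background_vafs, k=3.0):
    """Cut-off = Q3 + k * IQR of the background VAF distribution."""
    q1, _median, q3 = statistics.quantiles(background_vafs, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def sensitivity_specificity(vafs, truth, cutoff):
    """Call a variant when VAF > cutoff and score the calls against truth."""
    calls = [vaf > cutoff for vaf in vafs]
    tp = sum(c and t for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    fp = sum(c and (not t) for c, t in zip(calls, truth))
    fn = sum((not c) and t for c, t in zip(calls, truth))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical background VAFs (%) at positions with no expected mutation.
background = [0.00, 0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.12]
cutoff = outlier_cutoff(background)

# Hypothetical measured VAFs (%) and whether a mutation is truly present.
measured = [4.1, 0.9, 0.05, 0.30, 0.01, 12.0]
present = [True, True, False, True, False, True]
print(cutoff, sensitivity_specificity(measured, present, cutoff))
```

Using a multiplier of 3 rather than the more common 1.5 makes the threshold conservative, which fits the stated aim of limiting the influence of outliers on variant calls.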
Next, we asked if mMUTE-Seq could be applied to identify low-frequency ctDNAs in cfDNAs from the plasma of cancer patients. To test the clinical utility of mMUTE-Seq, we compared the VAFs from mMUTE-Seq and Deep-Seq of 10 NSCLC patients who were diagnosed positive for EGFR mutation. Importantly, when using the mMUTE-Seq, we observed an average 11.81-fold increase in the detected VAFs of mutations residing within the regions targeted by the sgRNAs (Fig 5d).
To evaluate the performance of mMUTE-Seq in liquid biopsy-based genotyping of cancer, we compared mutation profiles derived from tissue and cfDNA in NSCLC patients. We found notable concordance between the samples for key mutations commonly observed in NSCLC, including EGFR exon 19 deletion, EGFR L858R, and EGFR T790M (Fig. 5e). mMUTE-Seq identified tissue-specific mutations within cfDNA across the 10 patients, highlighting its utility for liquid biopsy-based detection of cancer mutations. Only two cases exhibited discordant mutational profiles: one case had a mutation identified only in the tissue, while the other had a mutation detected only in the cfDNA. In contrast, the Deep-Seq method exhibited lower concordance between tissue and cfDNA (Extended Data Fig. 7), with concordant detection observed in only 6 of the 10 pairs. Consequently, mMUTE-Seq achieved a significantly higher sensitivity of 0.91, compared to 0.55 for Deep-Seq. Both methods, however, displayed comparable specificity, each at 0.95 (Fig. 5f and Extended Data Fig. 8).
Previous studies have shown that detection of ctDNA tends to be less sensitive in early stages of cancer than in later stages4,5,28,29. We asked whether mMUTE-Seq could facilitate sensitive detection of stage I cancer in cfDNA. To this end, we generated receiver operating characteristic (ROC) curves from the above patient data for stages I to IV together and for stage I separately (Fig. 5g). The area under the ROC curve (AUC) for all stages was 0.96 for the mMUTE-Seq group, which was significantly higher than the AUC of 0.72 for the Deep-Seq group. Notably, for stage I only, the AUCs of the mMUTE-Seq and Deep-Seq groups were 1.0 and 0.70, respectively. The results suggested that mMUTE-Seq offers a sensitive method for detecting mutations in the early stages of cancer.
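For readers less familiar with the AUC metric, it can be computed directly as the probability that a randomly chosen true-positive observation receives a higher score than a randomly chosen true-negative one, with ties counted as one half. The sketch below implements that pairwise definition. It is an illustration only: the function name is hypothetical, the scores stand in for detected VAFs, and the example values are invented rather than taken from the patient data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: numeric detection scores (for example, detected VAFs).
    labels: truthy for sites where the mutation is truly present.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical detected VAFs (%) and tissue-confirmed mutation status.
example_scores = [0.10, 0.40, 0.35, 0.80]
example_labels = [False, False, True, True]
print(roc_auc(example_scores, example_labels))
```

An AUC of 1.0, as reported for the stage I subset with mMUTE-Seq, corresponds to every true mutation scoring above every negative site, so that some threshold separates the two groups perfectly.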