In earlier studies, we and others reported that FnCas9 has very high intrinsic specificity, resulting in dissociation from off-targets presented in vitro4,19. In contrast, SpCas9 and its high-fidelity variants remain bound to off-target sites in a cleavage-incompetent fashion, a property that might cause non-specific off-targeting outcomes from such regions20,21,22. To investigate whether FnCas9’s high DNA binding specificity is reflected at a genome-wide level, we constructed catalytically inactive (dead, d) dSpCas9 and dFnCas9 and targeted the c-Myc locus, where comparable cellular editing efficiencies between SpCas9 and FnCas9 were observed previously4. Using chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq)23–25, we found that although both dSpCas9 and dFnCas9 were tightly bound to the on-target sites, dSpCas9 showed promiscuous binding at multiple off-targets (27 sites, 0.01 FDR) across the genome, even at sites with up to 6 mismatches to the sgRNA (Supplementary Fig. 1A). Whereas all 27 dSpCas9 off-target sites showed greater enrichment than the on-target, dFnCas9 was bound to 6 off-target sites (0.01 FDR), all of which showed at least 1.2-fold lower enrichment than the on-target (Supplementary Fig. 1B-C; Supplementary Table 2). This high specificity of binding in vivo thus presented an attractive scenario for structure-guided engineering to enhance the activity of the FnCas9 enzyme at sites where editing was minimal.
Towards engineering the protein, we sought to stabilize FnCas9:DNA binding by introducing non-specific FnCas9:PAM interactions, based on recent mechanistic studies on SpCas9 highlighting directional PAM-duplex DNA unwinding as the rate-limiting checkpoint of Cas9 action for R-loop expansion (Supplementary Note 1)21, 26–31. Additionally, we investigated whether the FnCas9 sgRNA length might be a factor determining its DNA cleavage activity.
To discover the optimal gRNA length for FnCas9, we performed an in vitro cleavage assay on a previously reported target DNA harboring a stretch of guanines, using FnCas9 RNP containing gRNAs of variable length ranging from 20 to 24 nucleotides32 (nt, hereafter referred to as g20-g24). Interestingly, we observed the lowest activity with the canonical g20, while all extended-length gRNAs exhibited an enhanced DNA cleavage rate, with g21 inducing the fastest cleavage (Supplementary Fig. 2A). We used g21 in all subsequent assays unless stated otherwise.
Next, guided by its crystal structure, we engineered 49 different FnCas9 variants bearing mostly single amino acid substitutions in the WED-PI domain to introduce novel PAM duplex DNA contacts (Fig. 1A, Supplementary Table 3). We then measured the in vitro DNA cleavage activities of the engineered variants against a DNA target containing a GGG PAM (where FnCas9 was shown to be least active)3. Recent reports have suggested that high-fidelity SpCas9 variants have slower enzyme kinetics and concomitantly lower editing efficiencies5,6. Since FnCas9 is an enzyme with high fidelity, we focused on engineered (en) FnCas9 variants showing a combination of faster cleavage rate and minimal structural alterations, to ensure that the intrinsic specificity remains unchanged. A subset of nine enFnCas9 variants (containing single/combinatorial mutations) satisfying these criteria was selected for downstream experiments (Fig. 1B, Supplementary Note 1). Among them, three variants (en1, en15 and en31) had at least 2-fold higher cleavage rates than the wild type protein (Fig. 1B). Intrigued by this observation, we tested the cleavage efficiency of two of the enFnCas9 variants (en15 and en31) using super-extended gRNAs with 26 to 28-nt protospacers (g26-g28, hereafter referred to as sx-gRNA) and confirmed cleavage efficiencies similar to g21, which suggests that enFnCas9 variants are compatible with sx-gRNAs (Supplementary Fig. 2B). To our knowledge, similar observations have not been made so far for other Cas systems, and this might offer further enhancement of specificity and nucleobase accessibility away from the PAM, as shown later.
Earlier reports have shown that engineered SpCas9 variants often create additional phosphate backbone interactions that enable these proteins to recognize non-canonical PAMs33,34. To test if enFnCas9 variants show a similar relaxation in PAM recognition, we selected a subset of five enFnCas9 variants based on their enhanced activity on DNA substrates containing the non-canonical NGA PAM (Supplementary Fig. 2B-C) and performed an in vitro PAM discovery assay. Deep sequencing of the PAM-depleted library containing a randomized 8 bp sequence (4^8 = 65,536 combinations in total) revealed that enFnCas9 variants showed more flexible recognition at the second and third nucleotide positions as compared to FnCas9 (Supplementary Figs. 3,4). Importantly, for all the enFnCas9 variants tested, the NGG PAM was relaxed to NGR/NRG, thereby expanding (~ 3.5 fold over wild type FnCas9) the scope of accessibility across the human genome to just below SpCas9-RY35 and SpCas9-NG31 (Fig. 1D, Supplementary Fig. 6A, Supplementary Table 1).
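As a back-of-the-envelope illustration of the numbers above (not part of the original analysis), the following Python sketch enumerates sequence space: the size of a fully randomized 8-bp library and the number of 3-nt PAMs matching NGG versus NGR/NRG under IUPAC codes. The helper names `matches` and `pams_matching` are hypothetical. The simple count gives a 3-fold expansion in PAM sequence space; the ~ 3.5 fold figure quoted above refers to accessibility across the human genome, which presumably also depends on genome composition and is not modeled here.

```python
from itertools import product

# IUPAC nucleotide codes needed for the PAM patterns discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def matches(pam: str, pattern: str) -> bool:
    """True if a concrete PAM (A/C/G/T only) fits an IUPAC pattern such as 'NGR'."""
    return len(pam) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(pam, pattern)
    )


def pams_matching(patterns, length=3):
    """Set of all concrete PAMs of the given length that fit at least one pattern."""
    all_pams = ("".join(p) for p in product("ACGT", repeat=length))
    return {pam for pam in all_pams if any(matches(pam, pat) for pat in patterns)}


# A randomized 8-bp stretch has 4 choices per position.
library_size = 4 ** 8

wild_type = pams_matching(["NGG"])        # canonical PAM
relaxed = pams_matching(["NGR", "NRG"])   # relaxed PAMs reported for enFnCas9

print(library_size)                        # 65536
print(len(wild_type), len(relaxed))        # 4 12
print(len(relaxed) / len(wild_type))       # 3.0
```

NGR and NRG each match 8 trinucleotides and overlap on the 4 NGG sequences, which is why the union contains 12 rather than 16 PAMs.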
The remarkable intrinsic specificity of FnCas9 to single-nucleotide mismatches in the target has proven effective both in disease diagnostics and disease correction4. At the level of diagnostics, FnCas9 has been utilized for robust paper strip-based detection of nucleic acid targets through the FnCas9 Editor Linked Uniform Detection Assay (FELUDA) and Rapid Variant Assay (RAY) platforms36,37. In contrast to the collateral cleavage-based platforms employed by Type V effectors (such as Cas12a38 or Cas12f39) or Type VI effectors (such as Cas1340), FELUDA and RAY use the specificity of direct FnCas9:DNA binding as a lateral-flow readout through a combination of FAM-labeled FnCas9:sgRNA complex and paper strip chemistry (Fig. 1D)36,37. We anticipated that, in comparison to FnCas9-based FELUDA, enFnCas9 (with NRG/NGR PAM)-based FELUDA can now cover a ~ 2-fold higher number of reported Mendelian SNVs across the human genome, thereby increasing the scope of detection to more disease-causing variants (Fig. 1E). Expectedly, on a lateral flow strip, all enFnCas9 variants tested (complexed with 20-nt gRNA, g20) showed robust activity on a substrate carrying the non-canonical NGA PAM whereas FnCas9 did not show any signal (Supplementary Fig. 5A). Importantly, enFnCas9 variants showed similar resolution of single nucleotide variant (SNV) diagnosis (4.4-fold) as compared to AaCas12b (4.6-fold) and Cas14a1 (5.1-fold), both of which belong to type V DNA targeting Cas systems and have been reported to have higher intrinsic specificity than SpCas910,11,15,16,41,42, further establishing their utility as a diagnostic platform (Supplementary Fig. 5B).
Since enFnCas9 variants were constructed by altering residues that stabilize PAM duplex binding while keeping the DNA interacting domains (responsible for PAM distal mismatch sensitivity) untouched, we speculated that they should retain the high specificity of WT FnCas9. Indeed, upon performing a mismatch walking assay along the full sequence of the g20, the three highest activity enFnCas9 variants (en1, en15, and en31) all showed grossly similar specificity for mismatch tolerance as FnCas9 (Supplementary Fig. 5C). For all the enzymes, tolerance to mismatches was lowest at the most PAM proximal (1st and 2nd) and distal (15th-19th) bases. However, unlike FnCas9, the stringency for mismatch tolerance for all the variants was lower towards the middle part of the sgRNA (PAM distal 9–11 bases). This can be attributed to the faster cleavage rates of enFnCas9 variants, since even for FnCas9, longer incubation times can lead to cleavage of substrates with mismatches at these positions4. To determine if these changes in enFnCas9 variants might affect their diagnostic potential, we selected the enFnCas9 variant with the broadest activity at altered PAM sites (en31) and investigated if it was able to distinguish single mismatches in two targets with pathogenic mutations related to Sickle Cell Anemia and the SARS-CoV-2 Alpha VOC signature (N501Y). Remarkably, en31 accurately distinguished both the target SNVs on a lateral flow device (Fig. 2F, Supplementary Fig. 5D) with an improved signal discrimination (> 3.5-fold) as compared to FnCas9 (Supplementary Fig. 5E). We confirmed that the same specificity of SNV discrimination also extended to an NGA PAM-containing substrate (Supplementary Fig. 5F). Taken together, enFnCas9 variants have a very high specificity of mismatch discrimination similar to Cas12a or Cas12f but, due to their wider PAM accessibility, can potentially target more genomic sites and pathogenic SNVs for detection.
We next investigated if engineering FnCas9 by altering residues that interact with the PAM in the substrate had altered its binding affinity to DNA. Using catalytically inactive versions of two of the variants (en1 and en15), we performed microscale thermophoresis (MST) to determine their DNA binding affinities on a substrate (VEGFA) with a 20-nt gRNA as reported earlier4. We found that these variants showed stronger DNA binding (Kd = 91.33 ± 29.8 nM for en1, Kd = 49.16 ± 10.96 nM for en15) as compared to FnCas9 (Kd = 170 ± 31.53 nM), with en15 showing ~ 3.5-fold higher DNA binding affinity (Supplementary Fig. 6B,C). Interestingly, in our previous study4, we showed that FnCas9 bound the same substrate more weakly than SpCas9 (3.02-fold). Thus, engineering improved enFnCas9:DNA binding affinity, reaching levels similar to SpCas9 but with superior specificity.
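The fold changes quoted above follow from the ratio of dissociation constants, since a lower Kd means tighter binding. A minimal Python check using only the mean Kd values reported in this paragraph; the variable names are illustrative and the reported uncertainties are ignored:

```python
# Mean dissociation constants (nM) as reported in the text; error bars are ignored.
kd_nM = {"FnCas9": 170.0, "en1": 91.33, "en15": 49.16}

# Fold improvement in affinity = Kd(wild type) / Kd(variant).
fold_vs_wt = {
    name: kd_nM["FnCas9"] / kd for name, kd in kd_nM.items() if name != "FnCas9"
}

for name, fold in fold_vs_wt.items():
    print(f"{name}: {fold:.2f}-fold tighter binding than FnCas9")
# en1: 1.86-fold tighter binding than FnCas9
# en15: 3.46-fold tighter binding than FnCas9
```

The en15 ratio of about 3.46 is consistent with the ~ 3.5-fold figure stated in the text.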
The safety of therapeutic genome editing is determined by off-target interrogation of CRISPR effectors. Although Cas12a and Cas12f have higher specificity than SpCas9, their therapeutic success relies on minimal cleavage of ssDNA inside the cell, such as that formed during replication, homology-directed repair, or transcription38,43. Interestingly, Cas12a has been reported to nick off-target DNA substrates with up to four mismatches depending upon the crRNA sequences employed44. In contrast, enFnCas9 does not produce trans-cleavage products, and its high specificity both at the level of DNA interrogation and cleavage might be beneficial for safe nuclease-mediated genome editing. Although the construction of high-fidelity SpCas9 proteins has improved overall specificity, this is also accompanied by lower editing efficiencies5,45,46. We selected two such proteins (SpCas9-HF1 and eSpCas9) due to their balanced activity and specificity as reported in the literature5,45,46 and compared their cellular editing rates (insertions/deletions) with one of the enFnCas9 variants, en1. We used gRNAs containing 20-nt protospacers for which bona fide off-targets were identified either through in silico prediction or GUIDE-Seq4,47. Encouragingly, en1 showed higher editing rates than the wild-type protein or the SpCas9-HF1 and eSpCas9 variants at all the loci tested without any detectable editing at the corresponding off-targets (Fig. 2A, Supplementary Fig. 6D-E). Similarly, we confirmed successful genome editing by enFnCas9 variants (en1 and en15) in retinal pigmented epithelial cells (ARPE-19) and induced pluripotent stem cells (iPSCs) (Supplementary Fig. 6F-G). Notably, in iPSCs, en1 (18.6% indels) and en15 (23.0% indels) showed superior editing rates at the PAX6 locus when compared to even SpCas9 (13.8%) in unsorted cell populations (Supplementary Fig. 6F).
As seen in our in vitro studies, the editing rate with enFnCas9 variants went up dramatically when combined with g21, reaching ~ 90% at the therapeutically relevant sickle cell locus HBB in HEK293T cells (Fig. 2B). Similarly, g21 gave robust genome editing outcomes (up to 90%) with all the enFnCas9 variants at other loci too (EMX1 and FASN) (Supplementary Fig. 7A-B).
Next, we investigated if the high editing efficiency and DNA binding affinity compromised the single mismatch specificity of the enFnCas9 variants. To this end, we interrogated the FANCF site2 in HEK293T cells, for which a GUIDE-Seq validated off-target with a single PAM proximal mismatch was reported even for high-fidelity SpCas9 variants in independent studies21,45,48. Expectedly, we found off-target editing (25% and 27%) comparable to that at the on-target site (30% and 29%) by SpCas9-HF1 and eSpCas9 respectively (Fig. 2C). In sharp contrast, negligible (~ 1%) editing at the single mismatch off-target was observed for all the enFnCas9 variants when g20 was used, albeit with lower on-target editing (15–20% across the enFnCas9 variants), while FnCas9 did not induce substantial editing (~ 2%) (Fig. 2C). Interestingly, using g21 or g22 increased the on-target editing efficiency up to 45% with en15, but no increase in off-targeting was seen (Fig. 2C). A similar trend was seen for both en1 and en31, although en1 showed a small increase in off-target editing with g21/g22, which was still around three-fold lower than that of the high-fidelity SpCas9 variants tested (Fig. 2C). Taken together, this underscores the combinatorial action of enFnCas9 variants and extended length gRNAs for highly precise and robust editing.
The activity of enFnCas9 variants on non-canonical PAMs (NGR/NRG) observed in vitro prompted us to evaluate the genome editing efficiencies of these variants on such altered PAM targets in human cells. Given its highest in vitro rate of DNA cleavage at both canonical NGG and non-canonical NGA PAMs, en31 was additionally examined for cellular genome editing on targets with NGA/NAG PAMs. Two GUIDE-Seq validated gRNAs targeting an NGA PAM at RUNX1 and ZNF629, previously reported49 to have highly promiscuous off-targets, were investigated alongside an additional NGA-containing FANCF1 site2 gRNA. We confirmed robust editing at all three loci (~ 80% at FANCF1, ~ 60% at RUNX1 and ~ 20% at ZNF629) (Fig. 2D). Expectedly, g21 was able to induce editing where g20 failed to do so (ZNF629) (Fig. 2D). Remarkably, while previous reports had shown greater off-target editing than on-target activity with SpCas9 variants, with special emphasis on OT12 of the ZNF629 site, which is an identical stretch of the on-target site49, we were unable to detect any off-target editing with the en31 variant at any of these loci except at OT12 of the ZNF629 site (Fig. 2D). Despite being identical to the on-target of the ZNF629 site, off-targeting at OT12 was only marginally detected, in contrast to the SpCas9 variant49. Furthermore, we also confirmed robust editing by en31 at one (~ 70% at FANCF) out of three sites having a NAG PAM with g21 (Supplementary Fig. 7C). Our results suggest that the PAM preference of the en31 nuclease follows the order NGG > NGA > NAG while retaining superior specificity of DNA interrogation even at sites showing a preponderance of off-targeting by high-fidelity SpCas9 variants. Finally, we speculated that higher editing outcomes by enFnCas9 variants might be reflected in both higher NHEJ mediated indels and HDR mediated knock-in rates. 
Expectedly, we observed higher HDR mediated knock-in of a long donor template (4.1 kb) at the DCX locus in HEK293T cells for both en1 and en15 as compared to SpCas9-HF1 and eSpCas9 (Fig. 2E). Collectively, en1 nuclease showed a higher rate of gene editing (NHEJ/HDR) at all the target loci tested highlighting its suitability as a highly potent genome-editing protein.
Despite the promise of Cas9 nuclease-based gene editing approaches, on-target genotoxicity combined with complex gene rearrangements has raised concerns about their use in therapeutic settings50–54. In contrast, the development of double-strand break (DSB)-free editing approaches such as base editing and prime editing has shown tremendous promise as safer alternatives55. Nevertheless, both approaches suffer from guide-dependent off-targeting due to their reliance on enzymatically defective or inactive Cas9 for binding, an imperative feature for DSB free editing56,57. We sought to develop FnCas9/enFnCas9 base editors owing to the remarkable specificity of FnCas9 binding to cognate nucleobases both in vitro and in human cells (Fig. 2F). Among the enFnCas9 variants, en31 showed the broadest PAM flexibility and, coupled with its robust indel activity in human cells, appeared to be an ideal candidate for evaluation as a base editor. To this end, we generated adenine base editor variants for FnCas9/en31 following previously reported ABEmax (ABE8.17dV106W) configurations, which have been shown to be highly efficient with improved gRNA-independent editing profiles, a feature important to maintain transcriptome fidelity during base editing58,59. Given the larger share of ABE for pathogenic SNP correction55, we characterized FnCas9/en31-ABE for editing in human cells and compared it with SpNG-ABEmax8.17d, another PAM flexible ABE variant that has been widely reported in literature59. We chose the therapeutically relevant − 113/-116 sites of the HBG1/2 promoter responsible for hereditary persistence of fetal hemoglobin (HPFH)60, a rare genetic condition known to ameliorate the Sickle cell disease phenotype, and the commonly used EMX1 site in HEK293T. 
For both loci, we observed low A > G substitution (1.7%/0.0% A6/A9 of -113/-116 and 3.7% A9 of EMX1) with en31ABEmax8.17d with sg20 but drastically improved A > G substitutions (14%/2.5% A6/A9 of -113/-116 and 3.7%/10.7%/13.33%/12.7% of A9, A12, A15 of EMX1) with sg21 (Fig. 2G, Supplementary Fig. 7D). Notably, SpNG-ABEmax8.17d showed reduced editing at both loci (6.7%/5.7% A6/A9 of -113/-116 and 0% of A12, A15 of EMX1) while the off-targeting profile in the validated off-target of EMX1 (EMX1-OT1) was poor albeit the off-target editing was very low owing to natural specificity of ABEs (Fig. 2G, Supplementary Fig. 7D). However, wild type FnCas9ABEmax8.17d did not induce any appreciable A > G substitution over the baseline. We confirmed robust A > G substitution efficiency up to 72% with en31ABEmax8.17d at different sites of the therapeutically relevant HBG1/2 promoter (-111, -123/124, -175, -198) with g21 outperforming g20 in all the sites tested (Supplementary Fig. 7E, F, G). Thus, en31ABEmax8.17d with a g21 showed robust base editing in human cells with higher target base substitutions than SpNG-ABEmax8.17d in the tested loci.
We speculated that widened PAM accessibility coupled with extended length sgRNAs might offer en31ABE distinct possibilities of base editing where conventional base editors might not be able to target the desired base. Because protospacer length is restricted to 19/20-nt, SpCas9 base editors (ABE8s) can only target bases which are within the targeting window of the deaminase (PAM-distal 3rd to 9th bases counting PAM at positions 21–23)59. For editing other sites far away from the nearest available PAM, protein engineering to recognize a new PAM has been reported18. The en31ABE protein showed a wider editing window with respect to the PAM (PAM-distal 3rd to 14th bases counting PAM at positions 22–24) when interrogated at a locus with alternate adenine bases (Fig. 2I, Supplementary Fig. 7H-I). Moreover, since enFnCas9 can tolerate sx-gRNAs such as sg26 or sg28 (Supplementary Fig. 2B), we hypothesized that combining the two properties could facilitate shifting the adenine base editing window to target inaccessible bases away from the PAM. To validate this, we chose two loci in the human genome, EMX1 (NGG PAM) and SERPINI1 (NGA PAM), with a target base situated at inaccessible PAM-distal positions (3rd position for EMX1 and 1st position for SERPINI1) (Fig. 2J, Supplementary Fig. 8B). Remarkably, by systematic modulation of gRNA lengths (g22-g26), we were successful in gradually shifting the editing window to the desired target while the target base editing in the primary window got serially diluted (Fig. 2J, K; Supplementary Fig. 8A, B). Thus, combining PAM flexibility (NGR/NRG) and extended length sgRNAs (up to 26 nt) theoretically improves the target range of en31ABE to 99.39% of all human G > A pathogenic SNVs identified in ClinVar62,63 (Supplementary Fig. 7J).
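The position numbering used above (position 1 at the PAM-distal 5′ end of the protospacer, PAM immediately after the last protospacer base) can be made concrete with a small coordinate helper. The Python sketch below is only bookkeeping under that numbering convention, with hypothetical function names; it does not model deaminase activity or where the editing window actually falls. It shows why the PAM sits at positions 21–23 for a 20-nt guide and 22–24 for a 21-nt guide, and how a base too far upstream of the PAM to lie in a short protospacer receives a protospacer position once the guide is extended.

```python
def pam_positions(guide_len: int) -> tuple:
    """1-based positions of the 3-nt PAM when position 1 is the PAM-distal (5') end."""
    return tuple(range(guide_len + 1, guide_len + 4))


def position_in_protospacer(distance_from_pam: int, guide_len: int):
    """Map a base lying `distance_from_pam` nt upstream of the PAM
    (1 = the base adjacent to the PAM) to its 1-based protospacer position.
    Returns None when the guide is too short to reach that base."""
    if distance_from_pam < 1:
        raise ValueError("distance_from_pam must be >= 1")
    pos = guide_len - distance_from_pam + 1
    return pos if pos >= 1 else None


print(pam_positions(20))  # (21, 22, 23)
print(pam_positions(21))  # (22, 23, 24)

# Hypothetical example: a target base 24 nt upstream of the PAM.
for guide_len in range(20, 27):
    print(guide_len, position_in_protospacer(24, guide_len))
# guide lengths 20-23 give None; 24, 25 and 26 give positions 1, 2 and 3
```

Under this convention, each added nucleotide at the 5′ end of the guide increases the position number of every base by one, which is the arithmetic behind moving a distant base into a numbered protospacer position by lengthening the gRNA.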
Finally, towards proof-of-concept validation of en31ABEmax8.17-based disease correction, we tested this protein for an ophthalmic condition where transplantation of genetically corrected iPSC-derived Retinal Pigment Epithelium (RPE) sheets can be a viable therapeutic modality64. To this end, we isolated human dermal fibroblasts (HDFs) from a skin biopsy sample of a patient with retinitis pigmentosa and reprogrammed them to generate hiPSCs that were further characterized for genetic identity, stemness markers and pluripotency (Supplementary Fig. 8A-D). This patient was diagnosed with significant retinal thinning and an attenuated photoreceptor cell layer in Optical Coherence Tomography (OCT) due to the generation of a premature stop codon (p.Trp331Ter) at c.992 stemming from a single base substitution from G to A in exon 9 of RPE65 (TGG > TAG) (Fig. 2L). The hiPSC line was treated with en31ABE8.17d and the mutation specific sgRNA. We confirmed the successful installation of the A > G substitution at the disease mutation in an unsorted population (21% with sg21, NGG PAM). Since there were adjacent alternate PAMs present at this locus, we also validated successful A > G editing using these PAM sites (13% GGA and 8% AAG), conforming to the earlier observations of en31 nuclease activity NGG > NGA > NAG. Importantly, two of the clonally expanded iPSC lines derived from the edited cells showed 100% correction of the mutated base (A8) with very high base purity (undetectable bystander edits at A10-12) (Supplementary Fig. 8E). Thus, en31ABEmax8.17d can be successfully utilized for robust, precise nucleobase correction with undetectable bystander edits in therapeutic conditions.