Engineered PAM-flexible FnCas9 variants for robust and specific genome editing and diagnostics

The clinical success of CRISPR therapies is dependent on the safety and efficacy of Cas proteins. The Cas9 from Francisella novicida (FnCas9) has negligible affinity for mismatched substrates enabling it to discriminate off-targets in DNA with very high precision even at the level of binding. However, its cellular targeting efficiency is low, limiting its use in therapeutic applications. Here, we rationally engineer the protein to develop enhanced (enFnCas9) variants and expand its cellular editing activity to genomic loci previously inaccessible. Notably, some of the variants release the protospacer adjacent motif (PAM) constraint from NGG to NGR/NRG making them rank just below SpCas9-RY and SpCas9-NG in their accessibility across human genomic sites. The enFnCas9 proteins, similar to Cas12a and Cas12f, harbor high intrinsic specificity and can diagnose single nucleotide variants accurately. Importantly, they provide superior outcomes in terms of editing efficiency, knock-in rates, and off-target specificity over other engineered high-fidelity versions of SpCas9 (SpCas9-HF1 and eSpCas9). Broad targeting range coupled with remarkable specificity of DNA interrogation underscores the utility of these variants for safe and efficient therapeutic gene correction across multiple cell lines and target loci.

Main
Like the orthogonal Streptococcus pyogenes Cas9 (SpCas9) protein, FnCas9 interacts with the minimal NGG protospacer adjacent motif (PAM), yet it shows a much higher sgRNA sequence-dependent specificity when interrogated with DNA substrates [1][2][3][4]. Although high-fidelity versions of SpCas9 have been designed and validated in multiple systems, their editing efficiencies have generally dropped significantly compared to the wild-type enzyme 5,6. To circumvent these issues, alternate high-efficiency Cas systems from other microorganisms have been demonstrated for genome editing in recent years [7][8][9][10][11]. Notably, very few show editing efficiencies similar to or higher than SpCas9, and the majority of these enzymes have a PAM requirement that is more complex and less available in the human genome than that of SpCas9, limiting the number of possible sites accessible for therapeutic correction 12,13,14,15,16 (Supplementary Table 1).
In earlier studies, we and others reported that FnCas9 has a very high intrinsic specificity, resulting in dissociation from off-targets presented in vitro 4,17. In contrast, SpCas9 and its high-fidelity variants remain bound to off-target sites in a cleavage-incompetent fashion 18,19. Thus, although off-target cleavage has been reduced in these variants, they are still able to negotiate these loci at the level of binding, a feature that might cause non-specific off-targeting outcomes from such regions 20. We speculated that if the high specificity of off-target discrimination at the level of DNA binding is also retained in vivo, FnCas9 might present a more specific editing scope, which is particularly relevant in therapeutic applications. To compare their genome-wide binding propensities, we constructed Hemagglutinin (HA)-tagged catalytically inactive (dead, d) dSpCas9 and dFnCas9 and targeted a locus (c-Myc) where comparable editing efficiencies between SpCas9 and FnCas9 were observed previously 4 (Supplementary Figure 1A). Using chromatin immunoprecipitation followed by massively parallel sequencing, we found that both SpCas9 and FnCas9 were tightly bound to the on-target sites, explaining their high editing rates (Supplementary Figure 1A). Similar to earlier reports for other loci [21][22][23], dSpCas9 showed promiscuous binding at multiple off-targets (27 sites, 0.01 FDR) across the genome, even at sites with up to 6 mismatches in the sgRNA. Interestingly, all the 27 sites showed greater enrichment than the on-target (Supplementary Table 2). This high binding specificity in vivo thus presented an attractive scenario for structure-guided engineering to enhance the activity of the FnCas9 enzyme at sites where editing was minimal.
FnCas9 is evolutionarily divergent from SpCas9 and harbors some structural dissimilarities, such as unique interactions between the RuvC and REC3 domains, and between the PI and WED domains, with the latter sharing contacts with the REC1 and REC2 domains 1,3. However, PAM recognition is conserved among Cas9 orthologs and triggers directional target DNA unwinding, R-loop formation and expansion. This eventually leads to reorientation of the HNH endonuclease domain for DNA cutting and concomitant RuvC activation, resulting in concerted DNA cleavage 24. Recent mechanistic studies showed that directional PAM-duplex DNA unwinding serves as the rate-limiting checkpoint of Cas9 action and that a conformational switch discriminates Cas9 DNA binding and cleavage events 18,[25][26][27][28][29]. Moreover, the loss of nucleobase-specific interaction between the target DNA and Cas9 was reported to be rescued by base non-specific Cas9 interactions 3,30. Thus, we reasoned that stabilizing FnCas9:DNA duplex binding by introducing base non-specific interactions between the PAM duplex and the protein might improve FnCas9 nuclease activity without compromising its intrinsic specificity (Supplementary Note 1).
We engineered 49 different FnCas9 variants guided by its crystal structure, bearing mostly single amino acid substitutions in the WED-PI domain to introduce novel PAM duplex DNA contacts (Figure 1A, Supplementary Table 3). We then measured the in vitro DNA cleavage activities of these variants against a DNA target containing a GGG PAM (where FnCas9 was shown to be least active) 3. Recent reports suggested that high-fidelity SpCas9 variants result in lower cellular editing efficiency than SpCas9, a property attributed to their slower enzyme kinetics and in particular their ability to sense PAM distal mismatches 5,6. Since FnCas9 naturally has a very high sensitivity to PAM distal mismatches, we assayed for engineered FnCas9 (enFnCas9) variants with faster cleavage activity on a DNA substrate and selected a subset of nine enFnCas9 variants (containing single/combinatorial mutations) ranked based on the number of amino acid substitutions and their positions on the protein (Figure 1B Figure 1D) 34,35. We predicted that due to their broadened PAM accessibility, enFnCas9 variants (NRG/NGR) can now cover 82.2% of the reported Mendelian SNVs across the human genome (compared to 40.46% by FnCas9), thereby increasing the scope of detection to more disease-causing variants (Figure 1E). Expectedly, on a lateral flow strip, all enFnCas9 variants tested showed robust activity on a substrate carrying the non-canonical NGA PAM whereas FnCas9 did not show any signal (Supplementary Figure 5A).
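To make the PAM relaxation described above concrete, the following minimal sketch enumerates the trinucleotides matched by the canonical NGG pattern and by the union of NRG and NGR, using standard IUPAC codes (R = A or G). The function name `expand_pam` is hypothetical and chosen for illustration only. The count of matching trinucleotides is a purely combinatorial quantity; it does not reproduce the SNV coverage figures reported above, which depend on genome composition and variant positions.

```python
from itertools import product

# Subset of IUPAC nucleotide codes needed for the PAM patterns in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}

def expand_pam(pattern):
    """Return the set of concrete sequences matched by an IUPAC PAM pattern."""
    return {"".join(bases) for bases in product(*(IUPAC[c] for c in pattern))}

ngg = expand_pam("NGG")
relaxed = expand_pam("NRG") | expand_pam("NGR")  # union of both relaxed patterns

print(sorted(ngg))
print(sorted(relaxed))
print(len(ngg), len(relaxed))
```

Under these definitions the relaxed set contains every NGG trinucleotide plus the NAG and NGA families, so it is a strict superset of the canonical PAM.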
In recent studies, several type V DNA targeting Cas systems (such as Cas12a or Cas12f) have been characterized and engineered for genome editing 9,10,14-16,39,40. These Cas effectors have a naturally occurring high intrinsic specificity to mismatches 41,42,43,15 and induce non-specific single-stranded DNA (ssDNA) trans-cleavage upon target activation 36,37,44. Indiscriminate cleavage of ssDNA reporters has been utilized as a read-out for detecting point mismatches in targets. To compare the inherent specificities across these naturally occurring Cas proteins, we purified FnCas9 34, AaCas12b 44, and Cas14a1 37 and compared their single mismatch specificities using sgRNA design principles reported for each of the CRISPR proteins with their respective reporter system (trans-cleavage for Cas12/14 and affinity for FnCas9). All three Cas effectors were able to discriminate SNVs from the WT sequence with a signal resolution suitable for diagnosis (4.4-fold for FnCas9, 4.6-fold for AaCas12b, and 5.1-fold for Cas14a1), suggesting that they are all useful for in vitro discrimination of mismatched substrates and have very high intrinsic specificity (Supplementary Figure 5B).
Since enFnCas9 variants were constructed by altering residues that stabilize PAM duplex binding while keeping the DNA interacting domains (responsible for PAM distal mismatch sensitivity) untouched, we speculated that they should retain the same high specificity as WT FnCas9. Indeed, upon performing a mismatch walking assay along the full sequence of the sgRNA, the three highest activity enFnCas9 variants en1, en15, and en31 all showed grossly similar specificity for mismatch tolerance as FnCas9 (Supplementary Figure 5C). For all the enzymes, tolerance to mismatches was lowest at the most PAM proximal (1st and 2nd) and distal (15th-19th) bases. However, unlike FnCas9, the stringency for mismatch tolerance for all the variants was lower towards the middle of the sgRNA (PAM distal 9-11 bases). This can be attributed to the faster cleavage rates of enFnCas9 variants, since even for FnCas9, longer incubation times can lead to cleavage of substrates with mismatches in these positions 4.
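The mismatch walking design described above can be sketched as a small routine that produces one guide per position, each carrying a single substitution. This is an illustrative sketch, not the authors' design procedure: the function name `mismatch_walk` is hypothetical, the guide is written in the DNA alphabet for simplicity, and the substitution rule (swap each base for its complement) is an assumption, since the text does not state which base was introduced at each position. Position 1 is taken as the 3' end of the spacer, which lies next to the PAM for Cas9.

```python
# Assumed substitution rule: replace each base with its complement.
SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_walk(guide):
    """Return (position_from_PAM, mutated_guide) pairs, one mismatch each.

    Position 1 is the PAM-proximal (3') base of the spacer.
    """
    guide = guide.upper()
    n = len(guide)
    panel = []
    for pos_from_pam in range(1, n + 1):
        idx = n - pos_from_pam  # convert PAM-proximal numbering to string index
        mutated = guide[:idx] + SWAP[guide[idx]] + guide[idx + 1:]
        panel.append((pos_from_pam, mutated))
    return panel

# Example with a placeholder 20-nt spacer (not a sequence from the study).
for position, seq in mismatch_walk("ACGTACGTACGTACGTACGT")[:3]:
    print(position, seq)
```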
To determine if these changes in enFn variants might affect their diagnostic potential, we selected the enFnCas9 variant with the broadest activity at altered PAM sites We confirmed that the same specificity of SNV discrimination also extended to an NGA PAM-containing substrate (Supplementary Figure 5F). Taken together, enFnCas9 variants have a very high specificity of mismatch discrimination similar to Cas12a or Cas12f, but due to their wider PAM accessibility they can potentially target more genomic sites and pathogenic SNVs for detection (Figure 1C). We next investigated whether engineering FnCas9 by altering residues that interact with the PAM in the substrate had altered its binding affinity to DNA. To test this, we constructed  Figure 6A). Interestingly, in our previous study 4, we had seen that FnCas9 showed weaker binding to the same substrate than SpCas9 (3.02-fold). Thus, engineering improved enFnCas9:DNA binding affinity, reaching levels similar to SpCas9 but with superior specificity.
The safety of therapeutic genome editing is guided by off-target interrogation of CRISPR effectors. Although Cas12a and Cas12f have higher specificity than SpCas9, their therapeutic success relies on minimal cleavage of ssDNA inside the cell, such as that formed during replication, homology-directed repair, or transcription 36,45. Interestingly, Cas12a has recently been reported to nick off-target DNA substrates with up to four mismatches depending upon the crRNA sequences employed 46. In contrast, enFnCas9 does not produce trans-cleavage products, and its high specificity both at the level of DNA interrogation and cleavage might be beneficial for safe nuclease-mediated genome editing. Although SpCas9 has shown robust gene editing capabilities across different genomic loci, the intrinsic non-specific nature of the protein has warranted the development of high-fidelity versions for potential therapeutic editing 47,48. However, high-fidelity SpCas9 proteins generally show lower editing efficiencies compared to the wild-type protein 5. We selected two such proteins (SpCas9-HF1 and eSpCas9) due to their balanced activity and specificity as reported in the literature 5,49,50 and compared their cellular editing rates (insertions/deletions) with enFnCas9 variants en1 and en15 in HEK293T cells. Among the two, enFn1 showed higher editing rates than the high-fidelity SpCas9 proteins at all four loci investigated and no detectable editing at any of the off-target sites identified either through GUIDE-Seq or in silico prediction 4,47 (Figure 2B).
Expectedly, at all four loci, en1 had equal or higher editing efficiencies than the wild-type FnCas9 protein (Figure 2B). These results showed that en1 achieves higher genome editing efficiency than high-fidelity SpCas9 variants but retains similarly high on-target specificity. Similarly, we confirmed that both en1 and en15 variants showed successful genome editing in other human cell lines, such as induced pluripotent stem cells (iPSCs) and retinal pigmented epithelial cells (ARPE-19) (Supplementary Figure 6B,C). Notably, in the iPSC line, en1 (18.6% indels) and en15 (23.0% indels) showed superior editing rates at the PAX6 locus even when compared to SpCas9 (13.8%) (Supplementary Figure 6B). Taken together, these results established that enFnCas9 variants perform genome editing with higher efficiency and specificity than high-fidelity SpCas9 variants.
Finally, we investigated if the higher editing efficiency of the enFnCas9 variants can also lead to greater homology-directed repair (HDR) when presented with a double-stranded DNA (dsDNA) template. Here too, we observed higher HDR-mediated knock-in of a long donor template (4.1 kb) at the DCX locus in HEK293T cells for both en1 and en15 as compared to SpCas9-HF1 and eSpCas9 (Figure 2C). Collectively, enFn1 showed a higher rate of gene editing at all the target loci tested, both for insertions/deletions and for HDR-mediated knock-in, highlighting its suitability as a highly potent genome-editing protein.
In the present study, we have shown the remarkable efficacy and specificity of enFnCas9 variants in targeted genome editing and diagnostics. Two aspects of these variants would require further investigation. Firstly, the genome-wide editing specificity of enFnCas9 variants has not been explored and is a subject of ongoing experiments. Interestingly, the specificity of these variants appears to stem from the DNA interaction properties of FnCas9 independent of the engineered residues in the enzyme. Thus, we observed that even after substantially improving DNA binding affinity and activity, en15 showed minimal editing at a GUIDE-Seq validated off-target with a single mismatch at the PAM proximal end (Figure 2D). This is in sharp contrast to both eSpCas9 and SpCas9-HF1, which showed editing efficiencies comparable to the target site 51,52 (Figure 2D). These results show that enFnCas9 variants possibly negotiate off-targets through a different mechanism than high-fidelity SpCas9 proteins. Similarly, due to its inherent modularity, FnCas9 has options for further engineering to reduce its size. To this end, we constructed a truncated enFn1 by deleting its REC2 domain (which does not show any tertiary contacts with the rest of the protein) and reduced the size of en1 to ~170 kDa, closer to that of SpCas9 (~159 kDa). Remarkably, en1-ΔREC2 retains both its activity and specificity at levels similar to en1 (Figure 2E,F,G). This is in sharp contrast to SpCas9-ΔREC2, where a substantial loss of activity was seen upon deletion 1.
Our results indicate that engineering residues that regulate PAM duplex contacts in the Cas9 backbone can significantly improve editing efficiency without affecting specificity. This strategy can be potentially extended for other orthogonal Cas systems that possess higher intrinsic specificity but have low cellular activity. The enFn variants hold a lot of promise for safe and efficient nuclease mediated genome editing and also present potentially attractive avenues for double-strand break-free editing (such as base and prime editors) where the extent of off-target interrogation and concomitant nucleobase editing has not been understood to the fullest.

Plasmid construction
Point mutations and deletions were done by inverse PCR method on FnCas9 cloned in pE-SUMO vector backbone (LifeSensors) where intended changes were made on the forward primer and the entire plasmid was amplified by inverse PCR. Point mutations on the pET-His6-dFnCas9GFP and PX458-3xHA-FnCas9 (Addgene 130969) were done by essentially following the method described earlier 4 . gRNAs were cloned in the BbsI sites of PX458-3xHA-FnCas9, PX458-3xHA-en1FnCas9, PX458-3xHA-en15FnCas9, PX458-3xHA-SpCas9HF1 and eSpCas9(1.1) (Addgene 71814) for cellular genome editing assays by essentially following the method described earlier 53 . All of the constructs were sequenced before being used.

Protein Purification and sgRNA purification
The proteins used in this study were purified as reported previously 4,30. Briefly, plasmids for Cas9 from Francisella novicida were expressed in Escherichia coli and Cas14a1 were purified essentially by following the purification methods described earlier with some modifications 37,44. The concentration of purified proteins was measured using the Pierce BCA protein assay kit (Thermo Fisher Scientific). The purified proteins were stored at -80 °C until further use.
In vitro transcribed sgRNAs were synthesized using MegaScript T7 Transcription kit (Thermo Fisher Scientific) using T7 promoter containing template as substrates. IVT reactions were incubated overnight at 37°C followed by NucAway spin column (Thermo Fisher Scientific) purification as described earlier 4 . IVT sgRNAs were stored at -20°C until further use.

(i) in vitro cleavage (IVC) assay
The RNA substrates were reverse transcribed into cDNA (Qiagen) followed by PCR amplification, or the DNA substrates were only PCR amplified (Invitrogen), and further purified. The substrates were treated with a pre-assembled 500 nM en/FnCas9-sgRNA (1:1) RNP complex in a tube containing reaction buffer (20 mM HEPES, pH 7.5, 150 mM KCl, 1 mM DTT, 10% glycerol, 10 mM MgCl2) at 37 °C for 10 min. The reaction was inactivated using 1 µl of Proteinase K (Ambion) at 55 °C for 10 min, followed by the removal of residual gRNA by RNase A (Purelink) at 37 °C for 10 min. The cleaved products were visualized on a 2% agarose gel and quantified.

Fluorescence assay (dFnCas9)
250 nM biotin-labelled PCR amplicons carrying a 580 bp SARS-CoV-2 region with the N501Y mutation were attached to the wells of a streptavidin-coated plate by incubation for 10 min at room temperature. Wells were rinsed thrice with wash buffer (25 mM Tris-Cl, pH 7.2; 300 mM NaCl; 0.1% BSA; 0.05% Tween-20 detergent) to remove unbound amplicons before use in the binding assay.
Coulter) were used to separate out amplicons from free primers and primer dimers. Dual indexing was done using the Nextera XT V2 index kit followed by a second round of bead-based purification. The libraries were quantified using a Qubit dsDNA HS Assay kit (Invitrogen, Q32853) and were also loaded on an agarose gel for a qualitative check. Libraries were normalized, pooled and loaded onto the Illumina MiniSeq platform.

HDR assay at DCX locus in HEK293T
HEK293T cells were cultured in DMEM with GlutaMAX supplement (ThermoFisher Scientific Cat. No. 10566016) with 10% FBS. HEK293T cells at 70%-80% confluence were harvested from a 6-well plate using Trypsin-EDTA (0.05%) (ThermoFisher Scientific Cat. No. 25300062) and pipetted to make a single-cell suspension. For each electroporation reaction, a total of 15 µg plasmid was mixed in Resuspension buffer R, in which linearized donor plasmid DNA and Cas9-gRNA vector were taken in a 1:2 ratio.

ChIP Seq analysis
Raw sequencing reads were mapped to the human reference genome GRCh38 using bowtie2 55. Peaks were called over input samples using MACS2 56 with default parameters. Finally, scrambled sample peaks were used to remove background and false positive peaks from the dSpCas9 and dFnCas9 test samples. These filtered peaks were searched for off-targets based on sgRNA sequence homology with a maximum of 6 mismatches. On-target peak coverage plots were generated by the fluff profiles command with the 'remove duplicates' option 57. Overlap between the dSpCas9 and dFnCas9 ChIP peaks was calculated using bedtools 58 and plotted as weighted Venn diagrams with the help of Intervene 59.
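The homology search step above (candidate off-targets within filtered peaks, allowing at most 6 mismatches to the sgRNA) can be illustrated with a simple sliding-window Hamming distance scan. This is a minimal sketch under stated assumptions and not the pipeline used in the study: the function names are hypothetical, only the forward strand is scanned, PAM presence is not checked, and gaps or bulges are not considered.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def find_candidate_sites(peak_seq, protospacer, max_mismatches=6):
    """Scan a peak sequence for windows within max_mismatches of the protospacer.

    Returns a list of (start_index, window_sequence, mismatch_count) tuples.
    """
    peak_seq = peak_seq.upper()
    protospacer = protospacer.upper()
    k = len(protospacer)
    hits = []
    for start in range(len(peak_seq) - k + 1):
        window = peak_seq[start:start + k]
        mismatches = hamming(window, protospacer)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits

# Placeholder sequences for illustration only (not from the study).
guide = "ACGTACGTACGTACGTACGT"
peak = "TT" + "ACGTACGTACCTACGTACGA" + "GG"
print(find_candidate_sites(peak, guide))
```

A fuller implementation would also scan the reverse complement of each peak and require an adjacent PAM before reporting a site.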

PAM frequency analysis
PAM frequencies were calculated for more than 167 Cas systems (146 unique PAM sequences) from the human reference genome (GRCh38.p13) using an in-house Python script.
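The in-house script is not part of this text. As an illustration of how such a PAM frequency could be counted, the sketch below scans a sequence on both strands for a degenerate PAM pattern, counting overlapping matches. All names (`count_pam_sites`, `revcomp`) are hypothetical, only a subset of IUPAC codes is supported, and the sketch counts PAM occurrences alone without requiring a full-length protospacer next to each PAM.

```python
import re

# Regex character classes for the IUPAC codes used in the PAMs discussed here.
IUPAC_RE = {"A": "A", "C": "C", "G": "G", "T": "T",
            "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def count_pam_sites(seq, pam):
    """Count overlapping matches of a degenerate PAM on both strands."""
    seq = seq.upper()
    # A lookahead makes finditer report overlapping matches.
    pattern = re.compile("(?=" + "".join(IUPAC_RE[c] for c in pam) + ")")
    forward = sum(1 for _ in pattern.finditer(seq))
    reverse = sum(1 for _ in pattern.finditer(revcomp(seq)))
    return forward + reverse

# Placeholder sequence for illustration only.
example = "ACGGTTGAACCT"
print(count_pam_sites(example, "NGG"), count_pam_sites(example, "NGA"))
```

For a whole genome, the same function could be applied chromosome by chromosome and the counts summed; ambiguous bases such as N in the reference are never matched by these character classes.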

Amplicon sequencing analysis
Sequencing reads from different replicates were down-sampled prior to indel analysis for each target to remove sequencing read depth bias across the samples. Raw amplicon sequencing reads were subjected to indel frequency estimation using CRISPResso2 v2.0.45 60, with substitutions ignored and the minimum overlap between the forward and reverse reads set to 10 bp.
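The down-sampling step above can be sketched as random subsampling of every replicate to the depth of the shallowest one. This is a minimal illustration, not the procedure used in the study: the function name, the choice of the minimum depth as the common target, and the fixed random seed are all assumptions, and reads are represented here as simple in-memory lists rather than FASTQ files.

```python
import random

def downsample_to_common_depth(reads_by_sample, seed=0):
    """Randomly subsample each sample's reads to the smallest sample size.

    reads_by_sample maps a sample name to a list of reads; a new dict of
    equally sized lists is returned and the input is left unchanged.
    """
    depth = min(len(reads) for reads in reads_by_sample.values())
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return {name: rng.sample(reads, depth)
            for name, reads in reads_by_sample.items()}

# Toy example with integers standing in for reads.
replicates = {"rep1": list(range(10)), "rep2": list(range(4)), "rep3": list(range(7))}
balanced = downsample_to_common_depth(replicates, seed=1)
print({name: len(reads) for name, reads in balanced.items()})
```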

Immunostaining and confocal imaging
The ARPE-19 cells on glass coverslips were washed with phosphate-buffered saline (1X PBS) 48 hrs after transfection, fixed with 3.5% formaldehyde in 1X PBS for 10 minutes, followed by three washes. The cells were then permeabilized with 0.5% Triton X-100 in 1X PBS for 10 mins, followed by three washes, and then blocked with

Primer Sequences
All primer sequences used in the study are listed in Supplementary Table 4.

Data Availability
Deep sequencing data from ChIP and amplicon sequencing experiments were deposited as a BioProject under Project ID PRJNA766155.
Table 3). The variants en1, en15, en31 and en40 showed >80% substrate DNA cleavage within 0.5 min (Supplementary Figure 2A). Furthermore, en1, en15 and en31 showed around two-fold higher cleavage activity relative to the wild-type protein in the in vitro time kinetics assay (Figure 2B).

Details of structure-guided engineering
The crystal structure of FnCas9 showed that the PAM duplex is nestled in the FnCas9 WED-PI domain, and the major groove dG(2) and dG(3) of 5′-NGG-3′ PAM on the nontarget strand are recognized by R1585 and R1556 respectively through bidentate hydrogen bonds while dN(1) is free from any protein contacts. R1556 also recognizes dA(3) by a single hydrogen bond which signifies the very weak interaction with 5′-NGA-3′ PAM by FnCas9. Towards developing a PAM-relaxed FnCas9NG variant, earlier work reported that substituting R1556A abrogates the protein function and thus relaxing the PAM constraint to NG PAM cannot be done by this approach. Partial rescue of functional activity was achieved by incorporating base non-specific interactions E1369R and E1449H. This led to the creation of the RHA-FnCas9 variant with activity on YG PAM 1 . However, we observed that the RHA variant has very poor activity even on the canonical NGG PAM necessitating the development of alternate variants with improved activity (current manuscript) (Supplementary Note 1, Figure 1).
In previous studies, the major groove adenine:glutamine contact had been reported 2,3. Substituting R1556Q (en49) completely destroyed the protein function in vitro, recapitulating the earlier observation seen with R1556A. Expectedly, the R1556T (en17) substitution, designed for 5′-NGT-3′ PAM recognition considering the preference of threonine for the thymine base, failed to show any protein activity. This confirms that the interaction between R1556 and dG(3) of the non-target strand of the PAM duplex is indispensable for FnCas9 functional activity. Interestingly, the incorporation of novel base non-specific interactions with the WED-PI domain of the protein created a subset of three variants (en1, en15 and en31) which showed enhanced enzymatic activity on both the canonical NGG PAM and the non-canonical NGA PAM. This subset also showed NGR/NRG PAM recognition, which broadens the targeting scope in the genome. The crystal structure indicates that the en1 (E1369R) and en15 (E1603H) variants create additional interactions with the backbone phosphate group between dC(-2) and dA(-1) in the target DNA strand and between dG(2) and dT(1) in the non-target strand, respectively. In the triple mutant (E1369R/E1449H/G1243T, en31), E1369R/E1449H makes hydrogen bonding interactions with the phosphate backbone between dC(-2) and dA(-1) in the target DNA strand, and the phosphate-lock loop (PLL) substitution G1243T makes an additional hydrogen bonding interaction with the +1 phosphate in the target strand from where DNA unwinding ensues (Supplementary Note 1 Figure 2).
Notably, amino acid substitutions that introduce base non-specific interactions with the PAM duplex and result in enhanced kinetic activity (en48, en50) failed to rescue the activity of R1556Q FnCas9.
It is important to note that G1243T alone cannot enhance functional activity to a greater extent. However, G1243T in combination with E1369R and E1603H (en31) exhibited robust in vitro kinetic enhancement on both NGG and NGA PAM.