nCas9 (H840A) sometimes creates DSBs
Cas9 nuclease enables programmable genome engineering via NHEJ or HDR by creating DSBs at target sites. In contrast, more recently developed genome editing tools such as base editors and PEs use nCas9 (D10A) and nCas9 (H840A), which nick the target and non-target strands, respectively (Fig. 1A). To confirm the cleavage patterns generated by these enzymes, we treated supercoiled plasmid in vitro with purified Cas9, nCas9 (D10A), or nCas9 (H840A) proteins together with in vitro transcribed sgRNA targeted to the HEK4 site. The nicking endonuclease Nt.BbvCI and the restriction enzyme SpeI (which generates DSBs) were used as controls. Nicking of supercoiled plasmids generates an open circular form, which exhibits an apparent increase in size on agarose gels compared to the supercoiled form, whereas linearized plasmids exhibit an apparent decrease in size (Fig. 1B). These size differences could potentially be used to describe the functional cleavage patterns of Cas9-related proteins. Digestion of supercoiled plasmids with Cas9 leads to almost complete linearization (99.7%), whereas treatment with nCas9 (D10A) generates primarily the open circular form (84.0%). However, in contrast to our expectations, two major products resulted from treatment with nCas9 (H840A): both open-circular (56.7%) and linearized forms (43.4%) (relative band intensities were calculated using ImageJ software) (Fig. 1C).
To examine the cleavage patterns generated by these enzymes further, we performed Digenome-seq 17–19. Purified Cas9, nCas9 (D10A), and nCas9 (H840A) proteins, together with in vitro transcribed HEK4-targeting sgRNA, were incubated with genomic DNA isolated from HEK293T cells. Cleavage patterns of Cas9, nCas9 (D10A), and nCas9 (H840A) were examined by whole genome sequencing (WGS) and visualized with the IGV viewer. The anticipated cleavage patterns of these enzymes at the on-target site are presented in Fig. 1D. Cas9 and nCas9 (D10A) produced the expected cleavage patterns. Unexpectedly, however, nCas9 (H840A) completely cleaved the non-target strand and partially cleaved the target strand, resulting in DSBs (Fig. 1E and S1A-S1D).
Addition of an N863A mutation to nCas9 (H840A) results in an enzyme that catalyzes single-strand breaks in the non-target strand
Our experiments so far have demonstrated that nCas9 (H840A) can sometimes generate DSBs. We reasoned that because target strand cleavage is catalyzed by the Cas9 HNH domain, the H840A mutation may not be enough to completely disable this HNH activity. Therefore, to create a Cas9 nickase that cleaves only the non-target strand, additional engineering in the catalytic region of the HNH domain may be needed to completely inactivate it. To test this hypothesis, we added another mutation to nCas9 (H840A). Structural studies of the Cas9 HNH domain indicated that residue N863 makes functional contact with H840 (ref. 20). N863 plays a role in coordinating an Mg2+ ion required for catalysis when SpCas9 is in the cleavage state (state II). Therefore, we added the N863A mutation to nCas9 (H840A) in an effort to eliminate the function of the HNH domain. Next, purified Cas9, nCas9 (H840A), and nCas9 (H840A + N863A) proteins, together with in vitro transcribed sgRNAs (targeting the HEK4, EMX1, and RUNX1 sites), were incubated with genomic DNA isolated from HEK293T cells. The cleavage patterns of these enzymes were then examined by WGS and visualized using the IGV viewer. As hypothesized, nCas9 (H840A + N863A) did not cleave the target strand and instead generated clean single-strand breaks in the non-target strand at the HEK4, EMX1, and RUNX1 sites (Fig. 2A).
Because nCas9 (H840A) sometimes creates DSBs at on-target sites, it would also be expected to generate DSBs at genome-wide off-target sites; likewise, given that nCas9 (H840A + N863A) does not generate DSBs at on-target sites, it could avoid off-target DSBs. To investigate the pattern of genome-wide DSBs following treatment with nCas9 (H840A) and nCas9 (H840A + N863A), WGS data were subjected to Digenome-seq, a method that captures genome-wide off-target loci based on the in vitro cleavage pattern. DSBs and other base changes including C-to-U and A-to-I created by genome editing tools such as Cas9 nucleases, base editors, and PEs can be detected by this method17,21–24. Genomic DNA samples treated with wild-type (WT) Cas9, nCas9 (H840A), or nCas9 (H840A + N863A) targeted to the HEK4, RUNX1, and EMX1 sites were subjected to Digenome-seq, and the captured genome-wide DSB sites are shown using Circos plots (on-target sites are indicated by black arrowheads; the height of the black bars represents the Digenome score) (Fig. 2B). The number of DSB sites captured in the Cas9-treated samples ranged from 148 (EMX1-targeted sgRNA) to 454 (HEK4-targeted sgRNA). In the nCas9 (H840A)-treated samples, 8 (RUNX1) to 23 (HEK4) DSB sites were captured. In contrast, only 2 to 3 DSB sites were detected in the nCas9 (H840A + N863A)-treated samples in all cases (Fig. 2C). On average, 260 ± 100, 24 ± 10, 2.7 ± 0.3, and 2.0 ± 0.0 DSB sites were detected in the Cas9-, nCas9 (H840A)-, and nCas9 (H840A + N863A)-treated samples and the untreated control, respectively (Fig. 2D). In summary, the addition of the N863A mutation to nCas9 (H840A) almost completely eliminates the ability of this enzyme to create DSBs at both on- and off-target sites.
Additional mutations in the HNH domain further reduce indel formation
To engineer a Cas9 nickase that would only cleave the non-target strand, we mutated additional residues in the Cas9 HNH domain that have been implicated in target strand cleavage. Using the structure of SpCas9 in cleavage state II (PDB:6O0Y), amino acids within 5 Å of H840 were chosen for further engineering25 (Figure S2A). N854 and D839 in the HNH domain were selected and mutated to alanine (Figure S2B). We generated 14 different versions of nCas9, containing different combinations of the D839A, H840A, N854A, and N863A mutations.
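The text does not list the 14 variants individually. As a rough sketch of how such a combinatorial panel can be enumerated, the snippet below generates every non-empty combination of the four named mutations. Treating the panel as "all combinations except the original H840A single mutant" is an assumption made here only because it happens to give 14; it is not stated in the source, and the function name is hypothetical.

```python
from itertools import combinations

# The four HNH-domain substitutions named in the text.
MUTATIONS = ["D839A", "H840A", "N854A", "N863A"]

def mutation_combinations(mutations):
    """Return every non-empty combination of the given mutations."""
    combos = []
    for k in range(1, len(mutations) + 1):
        combos.extend(combinations(mutations, k))
    return combos

all_combos = mutation_combinations(MUTATIONS)

# ASSUMPTION (not from the source): the pre-existing H840A single
# mutant is excluded from the count of newly generated variants.
new_variants = [c for c in all_combos if c != ("H840A",)]

for combo in new_variants:
    print(" + ".join(combo))
print(len(all_combos), len(new_variants))
```

Four mutations give 15 non-empty combinations, so the count of 14 is compatible with this reading, but other panels of 14 are equally possible.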
We reasoned that if the HNH domain were made completely dysfunctional by additional mutagenesis, indel generation in HEK293T cells treated with the newly engineered Cas9 enzymes would decrease. To test this idea, we delivered plasmids encoding WT Cas9 or nCas9 variants and sgRNAs targeting 15 endogenous genomic sites into HEK293T cells. Indel frequencies induced by Cas9 and the nCas9 variants were then measured by targeted deep sequencing. We found that indel frequencies induced by Cas9 ranged from 24 to 80% (on average, 63 ± 2%), whereas frequencies induced by nCas9 (H840A) ranged from 0.050 to 15% (on average, 2.5 ± 0.6%), at the 15 endogenous sites. As expected, nCas9 (H840A + N863A) induced a significantly lower average indel frequency of 0.34 ± 0.06% at the 15 target sites. As we had hypothesized, we also identified versions of nCas9 with additional mutations that induced even lower indel frequencies than nCas9 (H840A + N863A). Three nCas9 variants (H840A + N854A, H840A + N863A + N854A, and H840A + N863A + D839A + N854A) induced indel frequencies of 0.03 ± 0.01%, 0.02 ± 0.01%, and 0.03 ± 0.01%, respectively, which represent significant reductions compared to that of nCas9 (H840A + N863A) (Figure S2C). Thus, additional mutations affecting catalytic amino acids in the HNH domain could further reduce indel generation in HEK293T cells.
PE2 variants containing improved versions of nCas9 induce fewer unwanted indels
Relatively high frequencies of unwanted indels are one of the problems associated with prime editing. Because PE includes nCas9 (H840A), we reasoned that the ability of nCas9 (H840A) to generate DSBs could be the source of this problem. Therefore, we incorporated our newly engineered nCas9 variants into PE2 to determine whether they could reduce the rate of unwanted indels.
We assessed the activity of these new PE2 systems, programmed to install single-base mutations, at 12 target sites in HEK293T cells (Fig. 3A). The frequency of intended base edits induced by PE2 (H840A) was 23 ± 6% on average (ranging from 4.2 to 65%). PE2 systems containing nCas9 with N863A, D839A, H840A + N863A, D839A + H840A, D839A + N863A, N854A + N863A, and D839A + H840A + N863A mutations in the HNH domain retained the desired single-base editing activity (19–23%, on average), which is comparable to that of PE2 (H840A). This finding shows that PE2 systems incorporating these HNH variants still cleave the non-target strand, which is essential for successful prime editing (Fig. 3B). Even the PE containing Cas9 nuclease exhibited an intended single-base editing activity of 6.1 ± 1.7% (ranging from 0.30 to 23%), although it also induced the highest frequency of indels (48 ± 4%). PE2 systems involving all other tested HNH domain mutations displayed intended editing efficiencies that were reduced by more than 80% compared to that of the conventional PE2 (H840A).
We then examined the frequency of unwanted indels. PE2 (H840A) induced unwanted indels at a frequency of 0.60 ± 0.17% (ranging from 0.023 to 1.7%). PE2 systems containing H840A + N863A, H840A + N854A, H840A + N854A + N863A, D839A + H840A + N854A, and D839A + H840A + N854A + N863A mutations reduced the frequency of unwanted indels by 2.6-, 3.8-, 4.2-, 3.8-, and 4.4-fold, respectively, compared to that induced by PE2 (H840A) (Fig. 3C). To select the PE2 variants that retained the ability to induce intended edits efficiently while generating fewer unwanted indels, we calculated the ratio of unwanted indels to total edits (intended edits + unwanted indels) in each case. The average ratio of unwanted indels for PE2 (H840A) at 12 endogenous target sites was 4.3%. Notably, we found that PE2 (H840A + N863A), PE2 (H840A + N854A), and PE2 (H840A + N854A + N863A) were associated with unwanted indel ratios of only 1.1%, 0.77%, and 1.1%, respectively (Fig. 3D). Thus, our improved versions of nCas9 incorporated into the PE2 system can reduce the ratio of unwanted indels, while maintaining intended editing outcomes.
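The ratio of unwanted indels to total edits described above is a simple calculation, sketched here for illustration. The function name `unwanted_indel_ratio` is hypothetical, and the inputs are the average frequencies quoted in the text for PE2 (H840A), not per-site data. Because the reported 4.3% is an average of per-site ratios, a ratio computed from the averaged frequencies is not expected to reproduce it exactly.

```python
def unwanted_indel_ratio(intended_pct: float, indel_pct: float) -> float:
    """Return unwanted indels as a percentage of total edits.

    Total edits = intended edits + unwanted indels, as defined in the text.
    """
    total = intended_pct + indel_pct
    if total <= 0:
        raise ValueError("total edit frequency must be positive")
    return 100.0 * indel_pct / total

# Average frequencies for PE2 (H840A) quoted in the text:
# 23% intended edits, 0.60% unwanted indels.
pe2_ratio = unwanted_indel_ratio(23.0, 0.60)
print(f"ratio of averages: {pe2_ratio:.2f}%")
```

The gap between this ratio of averages and the reported average of ratios arises because sites with low intended editing can carry high individual ratios and pull the per-site mean upward.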
Additionally, given that a deletion of the HNH domain of Cas9 is tolerated for its DNA binding function26, we constructed and tested HNH-deleted nCas9 variants. Four different fragments representing all or part of the HNH domain (residues 792–897, 765–908, 786–885, and 824–874) were deleted and replaced with variable linkers of 2, 5, or 10 amino-acid residues in length (Figure S3A). The resulting HNH-deleted nCas9 variants were then incorporated into PE2. We measured the frequencies of intended edits and unwanted indels induced by these PE2 variants targeted to 12 endogenous sites in HEK293T cells (Figure S3B). The Δ792–897 and Δ786–885 variants exhibited intended editing efficiencies about half that of PE2 (H840A), but all 12 HNH-deleted variants reduced the frequencies of both intended editing and unwanted indels (Figure S3C-E). Based on our data from PE2 variants containing HNH-substituted and HNH-deleted nCas9 variants, we focused on the three variants that resulted in the highest editing purity (H840A + N863A, H840A + N854A, and H840A + N854A + N863A) for further investigation (Fig. 3D).
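The count of 12 HNH-deleted variants is consistent with pairing each of the four deletions with each of the three linker lengths. The sketch below enumerates that design space under this assumption; the construct names and the `enumerate_constructs` helper are hypothetical labels, not the authors' nomenclature.

```python
from itertools import product

# Deleted residue ranges (inclusive) and linker lengths listed in the text.
DELETIONS = [(792, 897), (765, 908), (786, 885), (824, 874)]
LINKER_LENGTHS = [2, 5, 10]

def enumerate_constructs():
    """Pair every deletion with every linker length (assumed full factorial)."""
    constructs = []
    for (start, end), linker in product(DELETIONS, LINKER_LENGTHS):
        constructs.append({
            "name": f"d{start}-{end}_L{linker}",   # hypothetical naming scheme
            "residues_removed": end - start + 1,   # inclusive range
            "linker_length": linker,
        })
    return constructs

constructs = enumerate_constructs()
for c in constructs:
    print(c["name"], c["residues_removed"], c["linker_length"])
print(len(constructs))
```

Listing the number of residues removed alongside each linker length makes the net change in protein length explicit when comparing constructs.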
PE variants containing nCas9 variants reduce the frequency of unwanted indels
Next, we tested the selected nCas9 variants in the PE3 system. Because PE2 exhibits a relatively low efficiency of genome editing, PE3 uses an additional guide RNA, namely a nicking sgRNA, to nick the opposite strand of DNA, which increases the editing efficiency by stimulating cellular repair mechanisms. However, because the PE3 system uses two guide RNAs (pegRNA and sgRNA), PE3 (H840A) could generate DSBs at two sites via the activity of nCas9, increasing the yield of unwanted indels. Therefore, an appropriate nCas9 variant incorporated into PE3 should also reduce the frequency of unwanted indels induced by this system. To test our hypothesis, we examined the effect of using nCas9 variants in the PE3 system programmed to install five different single-base substitutions at three target sites (HEK3, RUNX1, and FANCF) (Fig. 4A-C). The frequency of intended edits induced by PE3 (H840A) ranged from 7.6 to 51% (on average, 32 ± 2%). Importantly, when PE3 (H840A + N863A) was used instead, the average frequency of correct edits (30 ± 2%) was not significantly different from that induced by PE3 (H840A), but the average frequency of unwanted indels dropped significantly, from 4.3 ± 0.4% for PE3 (H840A) to 2.6 ± 0.3% for PE3 (H840A + N863A) (Fig. 4D). PE3 (H840A + N854A) and PE3 (H840A + N854A + N863A) induced extremely low frequencies of unwanted indels, but also induced low frequencies of intended edits (average, 9.9 ± 1.0% and 6.9 ± 0.7%, respectively) (Fig. 4D).
To examine the purity of correctly edited sequences, we calculated relative editing purity ratios [the frequency of correct edits normalized to that induced by PE3 (H840A) / the frequency of unwanted indels normalized to that induced by PE3 (H840A)] and the average editing purity [(the number of sequencing reads containing the correct edit) / (the number of sequencing reads containing the correct edit + the number of sequencing reads containing unwanted indels) * 100]. PE3 (H840A + N863A), PE3 (H840A + N854A), and PE3 (H840A + N854A + N863A) increased the relative editing purity ratios by 1.8-, 9.5-, and 9.4-fold, respectively (Fig. 4E). The average editing purity was highest for PE3 (H840A + N854A) (95%), followed by PE3 (H840A + N854A + N863A) (95%) and PE3 (H840A + N863A) (90%), compared with 86% for PE3 (H840A) (Fig. 4F).
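The two purity metrics defined above can be expressed directly in code. The sketch below is a minimal illustration with hypothetical function names. It uses the average frequencies quoted in the text for PE3 (H840A) and PE3 (H840A + N863A) in place of per-site read counts, which is an assumption made for convenience. The reported values (1.8-fold, 86%, 90%) were presumably computed per site and then averaged, so the numbers printed here need not match them.

```python
def relative_purity_ratio(correct, indel, ref_correct, ref_indel):
    """(correct / ref_correct) / (indel / ref_indel), relative to a reference editor."""
    if ref_correct <= 0 or ref_indel <= 0 or indel <= 0:
        raise ValueError("frequencies must be positive")
    return (correct / ref_correct) / (indel / ref_indel)

def editing_purity(correct, indel):
    """Correct edits as a percentage of correct edits plus unwanted indels."""
    return 100.0 * correct / (correct + indel)

# Average frequencies (%) quoted in the text; used here instead of read counts.
REF = {"correct": 32.0, "indel": 4.3}      # PE3 (H840A)
VARIANT = {"correct": 30.0, "indel": 2.6}  # PE3 (H840A + N863A)

ratio = relative_purity_ratio(VARIANT["correct"], VARIANT["indel"],
                              REF["correct"], REF["indel"])
ref_purity = editing_purity(REF["correct"], REF["indel"])
variant_purity = editing_purity(VARIANT["correct"], VARIANT["indel"])

print(f"relative purity ratio: {ratio:.2f}")
print(f"purity, reference: {ref_purity:.1f}%  variant: {variant_purity:.1f}%")
```

By construction the reference editor has a relative editing purity ratio of 1, so values above 1 indicate that indels fell proportionally more than correct edits.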
Collectively, our data show that PE (H840A + N863A) can significantly reduce the rate of unwanted indels in both the PE2 and PE3 systems without sacrificing intended prime editing. Furthermore, PE3 (H840A + N854A) and PE3 (H840A + N854A + N863A) exhibited improved purity of intended editing, albeit with a lower range of intended edit frequencies than PE3 (H840A).
Prime editing with engineered pegRNAs
As an additional means of increasing the frequency of desired edits, we incorporated epegRNAs into the PE3 system (ePE3). epegRNAs were developed by adding a structured RNA motif, such as evopreQ1 or mpknot, to the 3’ end of the PBS sequence in pegRNAs. These RNA motifs, which are derived from virus sequences, protect pegRNAs from degradation and, thereby, improve both pegRNA stability and prime editing efficiency27. To test whether our PE variants can be applied to the epegRNA strategy, we assessed the efficiency of installing substitutions, insertions, and deletions by PE variants together with epegRNAs.
First, we generated epegRNAs encoding four to five different single-nucleotide substitutions at three different genomic loci, the FANCF, HEK3, and RUNX1 sites (Fig. 5A, S4A, and S4D). Because PE (H840A + N863A) and PE (H840A + N854A) induced the highest and second highest frequencies of correct edits in both the PE2 and PE3 systems, we tested these variants in combination with epegRNAs. The frequencies of correct substitutions induced by ePE3 (H840A + N863A) (average, 37 ± 2%) were comparable to those induced by ePE3 (H840A) (average, 38 ± 2%), whereas the average frequency of unwanted indels decreased significantly, from 14 ± 1% for ePE3 (H840A) to 9.6 ± 1.0% for ePE3 (H840A + N863A) (Figure S5A). Furthermore, the frequencies of correct substitutions induced by ePE3 (H840A + N854A) (average, 20 ± 2%) remained more than half of those induced by ePE3 (H840A), with minimal unwanted indels (average, 0.90 ± 0.14%) (Figure S5A). The purity of correct substitutions induced by ePE3 (H840A + N854A) was 14-, 6.6-, and 16-fold higher than that for ePE3 (H840A) for editing at the FANCF, HEK3, and RUNX1 sites, respectively (Fig. 5B, S4B, and S4E). The purity of editing with ePE3 (H840A + N854A) reached 96%, 92%, and 99% at the FANCF, HEK3, and RUNX1 sites, respectively (Fig. 5C, S4C, and S4F).
For further evaluation, we tested an epegRNA encoding a 24-bp Flag-tag insertion at four different genomic loci (HEK3, VEGFA, FANCF, and RUNX1). When ePE3 (H840A + N863A) was used, the average frequency of correct insertions (20 ± 1%) was not significantly different from that induced by ePE3 (H840A) (21 ± 2%) across the four loci, whereas the average frequency of unwanted indels was significantly lower, at 4.3 ± 0.7%, compared to that of ePE3 (H840A) (7.6 ± 1.2%) (Fig. 5D and S5B). Thus, the average purity of editing by ePE3 (H840A + N863A) was as high as 83%, whereas that of ePE3 (H840A) was 75% (Fig. 5F). In addition, the frequency of correct insertions induced by ePE3 (H840A + N854A) ranged from 5.2 to 13% (average, 8.2 ± 0.9%). Remarkably, the average frequency of unwanted indels induced by ePE3 (H840A + N854A) was 0.74 ± 0.11% (Figure S5B). The relative editing purity ratio for ePE3 (H840A + N854A) was 4.3-fold higher than that of ePE3 (H840A) (Fig. 5E). The highest editing purity (91%) for inserting the Flag-tag was achieved by ePE3 (H840A + N854A) (Fig. 5F).
Finally, we tested epegRNAs encoding a 15-bp deletion at four different loci (HEK3, VEGFA, FANCF, and RUNX1) (Fig. 5G). Similar to results from the substitution and insertion experiments, ePE3 (H840A) and ePE3 (H840A + N863A) induced the same frequencies of the correct deletion (on average, 57 ± 4% and 57 ± 4%, respectively), but the frequency of unwanted indels induced by ePE3 (H840A + N863A) tended to be lower at the four loci (Fig. 5G and S5C). In addition, when ePE3 (H840A + N854A) was tested, the frequency of correct deletions reached 32 ± 5%, on average, whereas the average frequency of unwanted indels was only 1.3 ± 0.3%. As a result, the relative editing purity ratio was up to 6.1-fold higher than that of ePE3 (H840A) (Fig. 5H); the average editing purity for the deletion induced by ePE3 (H840A + N854A) was 95%, whereas that of ePE3 (H840A) was 84% (Fig. 5I).
In general, prime editing with epegRNAs increased the frequencies of the correct edit as well as that of unwanted indels. To decrease the frequency of such indels, we tested ePE3 containing nCas9 variants. When PE3 (H840A + N863A) was used with epegRNAs, the frequency of correct edits was the same as that induced by ePE3 (H840A), but the frequency of unwanted indels was significantly reduced for substitutions and insertions, leading to a higher editing purity than that obtained for ePE3 (H840A). In addition, when the ePE3 (H840A + N854A) variant was used, the frequencies of correct edits were dramatically increased compared to those induced by PE3 (H840A + N854A), which induced relatively low editing frequencies. However, surprisingly, unlike the frequency of the correct edit, the frequency of unwanted indels was not increased even when epegRNAs were used. Thus, the highest average editing purity, up to 99%, and up to a 16-fold higher relative editing purity ratio, was achieved by ePE3 (H840A + N854A) for all substitutions, insertions, and deletions.