Biochemical characterization of CjCas9
To examine the optimal guide length for CjCas9, we performed in vitro cleavage experiments, using the purified CjCas9, sgRNAs with 20- to 23-nt guide segments, and linearized plasmid DNA containing a target sequence and the canonical T3AACAC PAM. CjCas9 with the 20-nt guide sgRNA did not cleave the target DNA (Supplementary Fig. 2a,b). In contrast, CjCas9 with the 21–23-nt guide sgRNAs efficiently cleaved the target DNA, and the 22-nt guide sgRNA was optimal (Supplementary Fig. 2a,b), consistent with a previous study showing that 22-nt guide sgRNAs are ideal for CjCas9-mediated genome editing in human cells 8.
Since we previously examined the DNA cleavage activities of CjCas9 with a 20-nt guide sgRNA 12, we re-analyzed them using the optimal 22-nt guide sgRNA toward target DNAs with 16 different PAMs, in which the fourth to eighth nucleotides in the canonical T3AACAC PAM were individually substituted. CjCas9 efficiently cleaved the target DNAs with the T3VACAC PAMs, but not that with the T3TACAC PAM (Fig. 1a, Supplementary Fig. 3a–c), confirming the importance of the fourth V in the PAM. CjCas9 cleaved the target DNAs with the T3ARCAC and T3AAYAC PAMs, but not those with the T3AYCAC and T3AARAC PAMs (Fig. 1a, Supplementary Fig. 3a–c), confirming the requirements of the fifth R and sixth Y for the PAM recognition. CjCas9 cleaved the target DNA with the T3AACAC PAM more efficiently than with the T3AACBD (B = T/G/C; D = A/T/G) PAMs, indicating the importance of the seventh A and eighth C. The crystal structure of the CjCas9–guide RNA–target DNA complex suggested that the seventh A:T and eighth C:G (modeled) pairs in the PAM duplex are recognized by Arg866 through hydrogen bonds with the PAM-complementary T and G nucleobases 12 (Supplementary Fig. 3d), explaining the preference for the seventh A and the eighth C in the PAM. Together, these results revealed that CjCas9 recognizes the N3VRYAC PAMs, and are essentially consistent with previous studies 8,12.
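The degenerate PAM notation used above can be checked mechanically. The following is a minimal sketch, assuming the standard IUPAC nucleotide codes and the shorthand in which a digit repeats the preceding letter (T3 = TTT); the function names expand_shorthand and pam_matches are hypothetical helpers introduced here for illustration and are not part of the original analysis.

```python
import re

# Standard IUPAC nucleotide codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "V": "[ACG]",
    "B": "[CGT]", "D": "[AGT]", "N": "[ACGT]",
}

def expand_shorthand(pam: str) -> str:
    """Expand the shorthand used in the text, e.g. 'T3AACAC' -> 'TTTAACAC'."""
    return re.sub(r"([A-Z])(\d+)", lambda m: m.group(1) * int(m.group(2)), pam)

def pam_matches(consensus: str, pam: str) -> bool:
    """Return True if a concrete PAM fits a degenerate consensus sequence."""
    pattern = "".join(IUPAC[c] for c in expand_shorthand(consensus))
    return re.fullmatch(pattern, expand_shorthand(pam)) is not None

# Canonical PAM: fits the N3VRYAC consensus.
print(pam_matches("N3VRYAC", "T3AACAC"))
# Fourth position is T, which is excluded by V (A/C/G).
print(pam_matches("N3VRYAC", "T3TACAC"))
```

This only tests membership in the consensus; it says nothing about relative cleavage efficiency among matching PAMs.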
To examine the potential preference of CjCas9 for specific sequences in the N3VRYAC PAMs, we measured the DNA cleavage activities of CjCas9 toward 12 different DNA targets encompassing all 12 possible nucleotide combinations at the fourth to sixth positions in the N3VRYAC PAMs. CjCas9 showed reduced activities toward the T3VGYAC targets (in particular, the T3CGYAC targets) relative to the T3VAYAC targets, although they are included in the N3VRYAC consensus sequences (Fig. 1b, Supplementary Fig. 3e–g). Collectively, these results indicated that CjCas9 disfavors some combinations in the N3VRYAC PAMs, such as N3RGCAC and N3CGYAC.
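The count of 12 follows from the degeneracy of each position (V = 3 bases, R = 2, Y = 2; 3 × 2 × 2 = 12). As an illustrative sketch only, the targets can be enumerated and split by the fifth nucleotide, which is the position the text associates with reduced activity; the helper names vry_pams and group_by_fifth are assumptions introduced here, and no activity values are implied.

```python
from itertools import product

# IUPAC classes at PAM positions 4, 5 and 6 of the N3VRYAC consensus.
V, R, Y = "ACG", "AG", "CT"

def vry_pams(prefix: str = "TTT", suffix: str = "AC") -> list[str]:
    """Enumerate every T3VRYAC PAM (positions 1-3 fixed as T, 7-8 as AC)."""
    return [prefix + "".join(p) + suffix for p in product(V, R, Y)]

def group_by_fifth(pams: list[str]) -> dict[str, list[str]]:
    """Split PAMs by the nucleotide at the fifth position (index 4)."""
    groups: dict[str, list[str]] = {"A": [], "G": []}
    for pam in pams:
        groups[pam[4]].append(pam)
    return groups

pams = vry_pams()
groups = group_by_fifth(pams)
print(len(pams))  # number of enumerated PAMs
for base, members in groups.items():
    print(f"T3V{base}YAC:", ", ".join(members))
```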
Engineering of the enhanced CjCas9 variant
To eliminate the bias of CjCas9-mediated PAM recognition, we sought to engineer a CjCas9 variant with enhanced activity. Previous studies revealed that additional interactions between Cas9 and the nucleic acids improve the DNA cleavage activity 19,20. We thus introduced mutations that could form new interactions with the guide RNA or the target DNA, based on the crystal structure of CjCas9 12 (Supplementary Fig. 5a). We purified more than 40 CjCas9 variants, and measured their cleavage activities toward the sub-optimal T3VGCAC targets (Supplementary Fig. 5b). Notably, the L58Y and D900K mutations enhanced the DNA cleavage activity, and the L58Y/D900K double mutation further improved the activity of CjCas9 (Supplementary Fig. 5c–f). The crystal structure suggested that Tyr58 (L58Y) and Lys900 (D900K) interact with the guide RNA and the target DNA, respectively (Supplementary Fig. 5a). We will hereafter refer to the L58Y/D900K variant as the enhanced CjCas9 (enCjCas9).
We next compared the in vitro cleavage activities of the wild-type CjCas9 (referred to as CjCas9 for simplicity) and enCjCas9 toward 23 DNA targets with different PAMs. Unlike CjCas9, enCjCas9 efficiently cleaved all of the T3VRYAC targets, including the sub-optimal T3CGYAC targets (Fig. 1c,d, Supplementary Fig. 5a–f). Furthermore, enCjCas9 cleaved some non-N3VRYAC targets, such as T3TACAC and T3AACAD (Fig. 1c, Supplementary Fig. 5a–c).
To comprehensively compare the PAM specificities of CjCas9 and enCjCas9, we performed in vitro PAM discovery assays, in which a DNA library containing the target sequence adjacent to a randomized 8-bp sequence was cleaved by the purified enzyme (CjCas9 or enCjCas9) with the 22-nt guide sgRNA, followed by deep sequencing of the cleavage products. The sequence logos of the 8-bp random sequences depleted in this assay revealed similar N3VRYAC PAM sequences for CjCas9 and enCjCas9, although enCjCas9 exhibited more relaxed nucleotide requirements for the fifth, seventh, and eighth PAM positions (Fig. 1e,f). However, a sequence logo representation lacks detailed information about the preferences for individual sequences within promiscuous PAMs, although it is widely used to identify functional PAMs. We thus expressed the obtained sequencing data as 2D profiles of the mean log2 PAM depletion values on all 1,024 sequences at the fourth to eighth PAM positions. The PAM profiles revealed that CjCas9 has preferences among the N3VRYAC PAMs (Fig. 1g), consistent with our in vitro cleavage data for the individual PAM targets. In contrast, enCjCas9 efficiently recognized the N3VRYAC PAMs and some non-N3VRYAC sequences, such as N3VAYTC and N3VAYGC (Fig. 1h).
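The 1,024 sequences correspond to all 4^5 combinations at the five PAM positions 4 to 8. The exact depletion metric is not defined in this section, so the sketch below rests on stated assumptions: the depletion value is taken as the log2 ratio of a sequence's normalized frequency in the Cas9-treated library to that in an untreated control, a pseudocount avoids division by zero, and the 16 × 64 grid layout (positions 4–5 by positions 6–8) is an arbitrary choice. Averaging over replicates is omitted, and all function names are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"
# All 4**5 = 1,024 sequences at PAM positions 4-8.
KMERS = ["".join(p) for p in product(BASES, repeat=5)]

def log2_depletion(control: dict, treated: dict, pseudocount: float = 1.0) -> dict:
    """Assumed metric: log2(treated frequency / control frequency) per 5-mer.

    Strongly negative values mean the sequence was depleted after treatment.
    """
    c_total = sum(control.get(k, 0) + pseudocount for k in KMERS)
    t_total = sum(treated.get(k, 0) + pseudocount for k in KMERS)
    values = {}
    for k in KMERS:
        c_freq = (control.get(k, 0) + pseudocount) / c_total
        t_freq = (treated.get(k, 0) + pseudocount) / t_total
        values[k] = math.log2(t_freq / c_freq)
    return values

def as_grid(values: dict) -> list[list[float]]:
    """Arrange the values as 16 rows (positions 4-5) by 64 columns (positions 6-8)."""
    rows = ["".join(p) for p in product(BASES, repeat=2)]
    cols = ["".join(p) for p in product(BASES, repeat=3)]
    return [[values[r + c] for c in cols] for r in rows]

# Toy counts (not real data): a uniform library in which one sequence is depleted.
control = {k: 100 for k in KMERS}
treated = dict(control, AACAC=5)
values = log2_depletion(control, treated)
grid = as_grid(values)
print(len(values), len(grid), len(grid[0]))
print(round(values["AACAC"], 2))
```

A 2D layout of this kind keeps every individual sequence visible, which is the information a sequence logo averages away.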
Genome editing in human cells
To assess the activities of CjCas9 and enCjCas9 in mammalian cells, we measured indel formation induced by CjCas9 or enCjCas9 at 38 endogenous target sites with the optimal N3VACAC, sub-optimal N3VGCAC, and non-N3VRYAC (N3TACAC and N3AACAD) PAMs in human embryonic kidney (HEK) 293Ta cells. CjCas9 induced indels at the target sites with the optimal N3VACAC, sub-optimal N3VGCAC, and non-N3VRYAC PAMs at 44.0–53.6% (48.5% on average), 10.1–20.7% (16.4% on average), and 3.1–20.8% (12.9% on average) frequencies, respectively (Fig. 2a,b). In contrast, enCjCas9 induced indels at the optimal N3VACAC, sub-optimal N3VGCAC, and non-N3VRYAC PAM sites at 58.4–68.7% (64.8% on average), 20.2–39.8% (32.3% on average), and 14.8–41.0% (29.0% on average) frequencies, respectively (Fig. 2a,b). These results demonstrated that, as compared with CjCas9, enCjCas9 exhibits higher cleavage activities and broader targeting ranges in human cells, consistent with our in vitro cleavage data. Next, we compared the genome-editing efficiencies of CjCas9 and enCjCas9 with those of SpCas9 at two target sites with NGGAACAC PAMs, which can be accessed by both CjCas9 (N3VRYAC PAM) and SpCas9 (NGG PAM). CjCas9, enCjCas9, and SpCas9 generated indels at these two sites with 70.3–86.6% (78.4% on average), 79.7–87.5% (83.6% on average), and 55.8–62.6% (59.2% on average) frequencies, respectively (Fig. 2c). These results indicated that CjCas9 with optimal 22-nt guide sgRNAs can induce indels at target sites with appropriate PAMs, at efficiencies comparable to or higher than those of SpCas9, as previously reported 8.
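Sites accessible to both enzymes must satisfy the NGG and N3VRYAC requirements at once, which amounts to the combined 8-nt consensus NGGVRYAC (NGGAACAC is one instance). The following is a rough sketch of how such sites could be located, assuming forward-strand scanning only and requiring room for a 22-nt protospacer upstream of the PAM; the function name shared_pam_sites and the example sequence are hypothetical and do not describe how the sites in this study were chosen.

```python
import re

# Standard IUPAC codes needed for the combined consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "V": "[ACG]", "N": "[ACGT]",
}

def to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus into a regex fragment."""
    return "".join(IUPAC[c] for c in consensus)

def shared_pam_sites(seq: str, guide_len: int = 22) -> list[tuple[int, str]]:
    """Return (pam_start, pam) for forward-strand PAMs fitting both NGG and NNNVRYAC.

    A lookahead is used so that overlapping 8-nt windows are all reported.
    Only PAMs with at least guide_len bases upstream are kept.
    """
    pattern = re.compile("(?=(" + to_regex("NGGVRYAC") + "))")
    return [
        (m.start(), m.group(1))
        for m in pattern.finditer(seq.upper())
        if m.start() >= guide_len
    ]

# Made-up sequence: 22 filler bases followed by a PAM of the NGGAACAC type.
example = "A" * 22 + "TGGAACAC" + "TT"
print(shared_pam_sites(example))
```

A complete search would also scan the reverse complement and would account for the different guide lengths typically used with SpCas9 and CjCas9.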
Base editing in human cells
Target-AID, comprising the SpCas9 D10A nickase mutant fused to the Petromyzon marinus cytosine deaminase 1 (PmCDA1) and the uracil DNA glycosylase inhibitor (UGI), mediates C-to-T conversion at target genomic sites 15. We replaced the SpCas9 D10A nickase in Target-AID (referred to as SpCas9-AID for comparison) with the CjCas9 D8A and enCjCas9 D8A nickases to create CjCas9-AID and enCjCas9-AID, respectively. We examined whether CjCas9-AID and enCjCas9-AID could mediate C-to-T conversions at 38 endogenous target sites (identical to those tested for indel formation) in HEK293 cells. Unexpectedly, CjCas9-AID did not induce C-to-T conversions at the tested target sites (Fig. 3a,b). In contrast, enCjCas9-AID induced C-to-T conversions at 20 target sites at >5% frequencies (the optimal N3VACAC, sub-optimal N3VGCAC, and non-N3VRYAC sites at 15.8–25.1% (21.5% on average), 4.3–13.2% (8.0% on average), and 1.2–12.4% (6.2% on average) frequencies, respectively) (Fig. 3a,b). We also compared the base-editing efficiencies of CjCas9-AID, enCjCas9-AID, and SpCas9-AID at the two target sites with the NGGAACAC PAMs. Unlike CjCas9-AID, enCjCas9-AID induced C-to-T conversions at these target sites, albeit at lower efficiencies than those of SpCas9-AID (Fig. 3c). enCjCas9-AID induced C-to-T conversions predominantly at the −21 to −8 positions in target sites (Supplementary Fig. 6). These data revealed that enCjCas9, but not CjCas9, can be harnessed for base editing technologies.