TaRGET (Transposase B/augment RNA-based Genome Editing Technology) system
Recently, several hypercompact ancestral genes have been reported to show programmable RNA-guided endonuclease activity. In particular, the IS200/IS605 transposon-encoded TnpB, which has a RuvC-like domain, can be used as a genome editor when complexed with a compatible gRNA17. Karvelis et al. showed that TnpB of Deinococcus radiodurans ISDra2 (ISDra2TnpB) is a programmable endonuclease that is guided by right element (RE)-derived RNA (reRNA) to cleave DNA next to the 5’-TTGAT transposon-associated motif (TAM)18. TnpB from A. macrosporangiidus (AmaTnpB) also showed omega gRNA-specific dsDNA cleavage in vitro with a TCAC TAM preference, though the in vivo indel efficiency was not explored18.
Type V Cas proteins, namely Cas12 family members, are likely to have evolved from TnpB, and UnCas12f1 is an early member of the TnpB-origin Cas effectors19. Because the IS200/IS605 family element transposase accessory protein TnpB from the Candidatus Woesearchaeota archaeon (hereafter TnpB) shares a perfectly matched nucleotide sequence with UnCas12f1 except for the 5’-terminal region encoding 28 amino acid residues (Supplementary Table 1), the TnpB endonuclease may share certain molecular properties with it, including endonuclease activity and RNA binding. To test this possibility, we investigated the RNA-guided programmable nuclease activity of TnpB. In a previous study14, we developed several engineered versions of sgRNA for Cas12f, including ge3.0, ge4.0, and ge4.1. Like Cas12f, TnpB showed no indel-formation activity with canonical gRNA in HEK293T cells. However, TnpB exhibited significantly increased indel activity with the engineered (augment) gRNAs (Supplementary Fig. 1). Despite slight target-dependent differences in indel levels between Cas12f and TnpB, the overall cleavage pattern was nearly identical, indicating that the augment RNAs are compatible with TnpB (TaRGET). Because base editing efficiency usually depends on the indel efficiency of the wild-type Cas effector protein, it is necessary to start with a system that shows sufficiently high indel efficiency. Thus, we compared the indel efficiencies of TaRGET, ISDra2TnpB, and AmaTnpB at the PCSK9 locus in HEK293T cells. Because the three systems recognize different PAM (or TAM) sequences, identical target sites could not be used; instead, 11 sites between exon 5 and exon 8 of the PCSK9 genomic sequence were selected. Interestingly, the TaRGET system showed significantly higher indel efficiency than ISDra2TnpB and AmaTnpB (Fig. 1a). Therefore, we concluded that the TaRGET system is a feasible platform for the development of a compact base editing system.
Feasibility of dTnpB-based adenine base editors
Based on information about the residues involved in the catalytic activity of Cas12f20, 21, we constructed four catalytically inactive mutants of TnpB (D354A, E450A, R518A, and D538A). Each mutant was tested to determine whether DNA cleavage activity was completely abolished while gene-targeting capability was preserved. An in vitro DNA digestion assay and an indel assay in HEK293T cells revealed that all of the mutants had null endonuclease activity (Supplementary Fig. 2), and we selected the dTnpB (D538A) mutant based on a previous CRISPRa experiment14.
A size-exclusion chromatogram was used to estimate the molecular mass of sgRNA-bound TnpB as ca. 194 kDa, suggesting that TnpB formed a homodimer in the presence of engineered gRNA, similar to UnCas12f120, 21 (Fig. 1b). In this case, the orientation of the deaminase fusion may affect the base-editing property. To test this possibility, we constructed TnpB-based adenine base editors by fusing the wild-type Tad-mutant Tad (Tad-Tad*) or Tad*-Tad to either the N- or C-terminus of dTnpB3 (Fig. 1c). These constructs were tested on two validated targets14, one with an A-rich sequence in the PAM-proximal region and the other with an A-rich sequence in the PAM-distal region (Fig. 1d). The constructs with the deaminase fused at the C-terminus (ABE-C1 and ABE-C2) showed substantial levels of A-to-G conversion activity, whereas the N-terminally fused modules (ABE-N1 and ABE-N2) showed only marginal conversion activity. Conversions were observed only in the PAM-proximal regions. Because ABE-C2 showed higher conversion rates than the other ABEs, it was used to identify the base-editing window. In an experiment where two PAM-proximal A-rich sequences were targeted, conversion was observed only within the window of A2 to A5, with the most prominent conversion activity at positions A3 and A4 (Fig. 1e). Base editors have evolved through various engineered versions of Tad22. Thus, we compared the A3-to-G3 conversion rates for all Tad variants developed thus far and found that the codon-optimized Tad-Tad* (V106W, D108Q)23,24 architecture showed the highest conversion rate of all forms tested (Supplementary Fig. 3a, Supplementary Table 1). We designated the optimized Tad dimer as Tad-Tad** and this ABE form as TaRGET-ABE-C3. The length of the linker and the position of the NLS did not affect the base editing efficiency (Supplementary Fig. 3b).
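The A2-to-A5 window described above can be turned into a simple target-screening step. The following is a minimal sketch, not the procedure used in this study: it assumes the protospacer is written 5’ to 3’ starting immediately after the PAM, so that position 1 is the PAM-adjacent base, and the function name and example sequence are hypothetical.

```python
def adenines_in_window(protospacer, window=(2, 5)):
    """Return the 1-based positions of adenines that fall inside the editing window.

    Assumption: `protospacer` is written 5'->3' starting right after the PAM,
    so position 1 is the PAM-proximal base.
    """
    first, last = window
    seq = protospacer.upper()
    return [
        pos
        for pos, base in enumerate(seq, start=1)
        if base == "A" and first <= pos <= last
    ]


# Hypothetical 20-nt protospacer with an A-rich PAM-proximal region.
example_protospacer = "CAAAGTCCGTAGCTGACCTT"
print(adenines_in_window(example_protospacer))  # [2, 3, 4]
```

The adenines at positions 11 and 16 of the example are ignored because they lie outside the window; passing a different `window` tuple would model a shifted or expanded window.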
TaRGET-ABE-C3 was compared with several adenine base editors with respect to the editing windows and conversion efficiency (Fig. 1f). As mentioned above, the base editing window of TaRGET-ABE-C3 spanned a relatively narrow range, similar to that of the recently reported ABEMINI15. However, the overall conversion efficiency of TaRGET-ABE-C3 was significantly higher than that of Cas12f-based ABEMINI, though it was lower than those of SpCas9 nickase-based ABEs, such as ABE7.10, ABE8e, and ABE9 (Fig. 1g).
In a previous study, we presented three different versions of sgRNA for UnCas12f1 (ge3.0, ge4.0, and ge4.1), which suggests that the choice of sgRNA version could affect the base-editing efficiency. To investigate this possibility, we selected 18 targets that show different indel activity outcomes depending on the sgRNA version. Consistent with earlier work, 15 of the 18 sites showed a correlation between the gRNA version and indel/conversion efficiency (Fig. 1h). That is, the most suitable gRNA version must be selected first to obtain the most desirable base-editing outcome. Taken together, TaRGET-ABE-C3 performs best when guided by the gRNA version best suited to the target site.
Expanding targetable sites via TnpB and Tad engineering
As with Cas12f1, the in vitro cleavage assay indicated that TnpB has a PAM preference for TTTR (TTTA and TTTG), which means that targetable sites are quite restricted (Fig. 2a). Thus, we attempted to develop TnpB mutants with a preference for non-TTTR PAMs and to apply these PAM variants to a wider range of sites. To do this, we initially constructed a PAM library vector by securing individual PAM clones (4^4 = 256 clones) and then mixing them at an equal molar ratio to ensure an even distribution of each. The PAM library vectors were digested with sgRNA ge_4.1 and the TnpB PAM variant proteins. The cleaved vectors were amplified by adaptor ligation and PCR. Deep sequencing analysis then matched each PAM variant to its PAM preference (Fig. 2b). To select PAM variants with retained dsDNA cleavage activity, we prepared different HEK293T clones, each carrying a different altered PAM at an NLRC4 locus, via homology-directed repair (for details, see Supplementary Fig. 4). This approach makes it possible to compare the indel efficiency of each PAM variant relative to wild-type TnpB.
Because TnpB shows an identical PAM preference and shows sequence conservation in the DNA-binding region with UnCas12f1, we selected candidate amino acids on TnpB based on the structural characterization of UnCas12f120, 21, in this case, S170, Y174, A184, S188, R191, Q225, Y230, V271, and Q272. Each candidate site was mutated to each of the 19 other amino acids, and each PAM variant candidate was tested for dsDNA cleavage activity in vitro against the altered PAMs as described in the scheme of Fig. 2b. The PAM-variant candidates were selected according to two criteria: 1) high total sequencing reads and 2) a high sequencing read ratio for a specific PAM. The in vitro cleavage and the deep sequencing analysis enabled the screening of PAM-variant candidates (Supplementary Fig. 5). The results indicate that several variants shared the same PAM preference. For instance, the S170T, S188Q, S188H, Q225T, Q225F, and Q272K variants showed a high TGTA PAM preference. Among these, the S188Q variant showed the highest indel frequency for the TGTA PAM when tested in PAM sequence-altered HEK293T cells as described in Supplementary Fig. 4. Likewise, the S188Q, S188K, and R191K mutants showed high indel frequencies for TCTG, TGTG, and TTTC PAMs, respectively (Fig. 2c). The S188K variant showed broader PAM specificity, in this case TTTT and TTTC as well as TTTA and TTTG, i.e., TTTN (Fig. 2d). To test the application of the PAM variants to adenine base editing in a non-TTTR PAM context, the A-to-G conversion activity of the PAM variants was tested at different sites with altered PAMs. As shown in Fig. 2e, each PAM variant showed different levels of A3-to-G3, A4-to-G4, and A5-to-G5 conversion activities. A suitable PAM variant can be selectively used for a specific sequence context, or a variant showing a multi-PAM preference, such as the S188K variant, can be deployed for multiplexed targeting.
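The two selection criteria (total sequencing reads and the read ratio for a specific PAM) can be illustrated with a short sketch. This is not the analysis pipeline used here: the thresholds, function names, and read counts below are hypothetical and serve only to show how the two quantities might be computed from per-PAM read counts over the 256-member library.

```python
from itertools import product

# All 4-nt PAM sequences in the library (4^4 = 256).
ALL_PAMS = ["".join(p) for p in product("ACGT", repeat=4)]


def summarize_pam_reads(read_counts):
    """Compute total reads and the dominant PAM's share of reads for one variant."""
    total = sum(read_counts.get(pam, 0) for pam in ALL_PAMS)
    if total == 0:
        return {"total_reads": 0, "top_pam": None, "top_ratio": 0.0}
    top_pam = max(ALL_PAMS, key=lambda pam: read_counts.get(pam, 0))
    return {
        "total_reads": total,
        "top_pam": top_pam,
        "top_ratio": read_counts.get(top_pam, 0) / total,
    }


def passes_criteria(summary, min_total_reads=1000, min_top_ratio=0.5):
    """Apply both criteria; the threshold values are illustrative assumptions."""
    return (
        summary["total_reads"] >= min_total_reads
        and summary["top_ratio"] >= min_top_ratio
    )


# Hypothetical read counts for a single variant (not data from this study).
hypothetical_counts = {"TGTA": 800, "TTTA": 150, "TTTG": 50}
summary = summarize_pam_reads(hypothetical_counts)
print(summary["top_pam"], summary["top_ratio"], passes_criteria(summary))
```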
Collectively, the engineering of TnpB expanded the occupancy of targetable base-editing sites from 0.78% to 3.12%.
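The occupancy figures can be read as fractions of the 256 possible 4-nt PAM sequences: TTTR covers 2 of 256 (0.78%), and 3.12% appears to correspond to 8 of 256 (3.125%). The sketch below shows this arithmetic under that interpretation; the helper names are hypothetical, and the second pattern list contains only the PAMs named in this section (seven sequences), since the full set behind the reported figure is not enumerated here.

```python
from itertools import product

# IUPAC codes needed for the PAMs discussed in this section.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def expand_pam(pattern):
    """Expand an IUPAC PAM pattern into the set of concrete sequences it matches."""
    return {"".join(bases) for bases in product(*(IUPAC[code] for code in pattern))}


def pam_fraction(patterns):
    """Fraction of all 4-nt sequences covered by the union of the given patterns."""
    covered = set()
    for pattern in patterns:
        covered |= expand_pam(pattern)
    return len(covered) / 4 ** 4


wild_type = pam_fraction(["TTTR"])  # 2 / 256
named_variants = pam_fraction(["TTTN", "TGTA", "TCTG", "TGTG"])  # 7 / 256

print(f"wild-type TTTR: {wild_type:.4f}")
print(f"PAMs named in this section: {named_variants:.4f}")
print(f"eight PAMs out of 256: {8 / 256:.5f}")
```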
Despite the expansion of targetable sites using PAM variants, the range of editable sites is still limited because the prominent editing window is confined to positions 3 and 4 (a feature that is occasionally favorable for specific editing). Expanding or shifting the window could therefore be an additional way to expand the applicability of the TaRGET-ABE system. Structural modeling of the TnpB-gRNA ribonucleoprotein complex identified possible mutation sites at Ile159 and Ser164. The model indicates that the bases at positions 5 and 6 are concealed in a pocket of the WED domain (Supplementary Fig. 6). We speculated that replacing Ile159 and Ser164 with a bulky amino acid would cause the bases at positions 5 and 6 to protrude more (Fig. 2f), which would make those bases more accessible to deaminases. We created the I159W and S164Y mutants and applied them to adenine base editing for several targets, each carrying an A at a different position. When we compared the editing efficiency of the variants with that of wild-type TaRGET-ABE-C3, the S164Y mutant dramatically reduced the A-to-G conversion rates at positions 3 and 4 without expanding the window. However, the I159W mutant upheld the conversion rates at positions 5 and 6 with retained A3 and A4 conversions (Fig. 2g). The last approach involved a different architecture of the deaminase module. While constructing various combinations of Tad variants, we fortuitously found that dTnpB-Tad-eTad modules showed a window expansion at position 2. The eTad sequence was originally used as a monomeric deaminase for the ABE8e version25. The fusion of the Tad-Tad8e dimer module to dTnpB (D538A), hereafter referred to as TaRGET-ABE-C3.1, dramatically boosted conversion at position 2 while sustaining conversion efficiency at positions 3 and 4 (Fig. 2h).
We attempted to validate the TaRGET-ABE-C3.1 system at 25 endogenous sites (Fig. 2i, Supplementary Fig. 7 and Supplementary Table 2). The distribution confirmed the most prominent base editing at positions 2-5 without the application of the I159W mutation. Nonetheless, we identified two sites at which positions 17 and/or 18 were edited at relatively high efficiency. Thus, this non-canonical editing should be monitored on a per-site basis. Taken together, the engineering and reconstruction of the TnpB and Tad modules substantially broadened the otherwise highly restricted base editing range by both expanding the targetable PAMs and shifting or expanding the base editing windows.
Adenine base editing via AAV delivery
nSpCas9 (D10A)-based adenine base editors enable highly efficient A-to-G conversions in eukaryotic cells when delivered by plasmid vectors3, 25. However, AAV delivery is limited by the oversized deaminase-dCas9 modules8. This limitation can be overcome by using split-AAV vector delivery12 or miniABE8e26. While all of these engineered ABEs compromise the full activity of Cas9-based ABE systems, our TaRGET-ABE system is sufficiently compact to be delivered in an all-in-one AAV vector. Furthermore, space remains for additional cargo within a payload size limit of ~4.7 kb. One application of the additional cargo space would be multiplexed base editing. We produced AAV2 particles in which the TaRGET-ABE-C3 system was loaded with one sgRNA (sgRNA1 for site3 or sgRNA2 for site5) or with paired sgRNAs simultaneously targeting site3 and site5 (Fig. 3a). HEK293 cells were transduced with the vector systems at a multiplicity of infection (MOI) of 100,000 for ten days; the cells were sub-cultured five days after the initial transduction, and the MOI was kept constant by additional treatment with AAV particles in fresh media. When a single sgRNA was loaded onto the AAV vector, target-specific base editing was achieved (Fig. 3b). Interestingly, we were able to perform multiplexed A-to-G conversions using paired sgRNAs in a single AAV particle. Moreover, the conversion efficiency at each site obtained using the paired sgRNA-AAV particles was not compromised compared to that obtained using AAV particles loaded with one sgRNA.
The capability of charging paired sgRNAs in an all-in-one AAV vector system can act as a ‘double-edged sword’ regarding substitution-based treatments of certain diseases. We illustrate this concept with a possible treatment strategy for cancer. Epidermal growth factor receptor 4 (ErbB4; HER4) is a kinase that stimulates oncogenesis and cancer progression in many cancer types, and chemical or biological inhibitors are used for the treatment of cancer27. We loaded two sets of sgRNAs together with TaRGET-ABE-C3 in an all-in-one AAV vector. sgRNA1 aims to induce exon skipping and a frame-shift mutation by substituting a splice acceptor consensus sequence (-AG-) with a splicing-skipping sequence (-GG-). sgRNA2 induces the skipping of the exon involved in the binding of growth factors. The use of either one of the two sgRNAs can produce non-functional receptors, but loading the two sgRNAs together can further increase the frequency of non-functional ErbB4 receptors (Fig. 3c). We screened two intron-exon interface targets that meet the requirements of the PAM and the reading frame: one (sgRNA1) is at the intron I-exon 2 junction and the other (sgRNA2) is at the intron III-exon 4 junction. AAV2 particles carrying either one of the two sgRNAs or both were produced in HEK293T cells and applied to H661 cells at an MOI of 10^5. TaRGET-ABE-C3 carrying sgRNA1 or sgRNA2 produced non-functional mRNAs at frequencies of 17.3±2.5% and 13.2±2.3%, respectively. However, the percentage of non-functional mRNAs increased to 26.8±3.7% for AAV particles carrying both sgRNAs (Fig. 3d). This ‘double-edged sword’ effect was manifested in the stalling of the growth of cancer cells, where the two sgRNAs collaborated to retard the growth of ErbB4-positive A549 cells (Fig. 3e). Taken together, these results suggest that the hypercompact TnpB-based adenine base editors provide a useful and precise genome editing tool that can be delivered by AAV.
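As a back-of-envelope check on the paired-sgRNA result, one can ask what combined frequency would be expected if the two sgRNAs acted independently on the same transcript pool. Independence is an assumption made only for this sketch and is not tested here; the calculation uses the mean frequencies reported above and ignores their uncertainties, and the function name is hypothetical.

```python
def combined_fraction(p1, p2):
    """Fraction affected by at least one of two events, assuming independence."""
    return 1 - (1 - p1) * (1 - p2)


# Mean non-functional mRNA frequencies reported for each single sgRNA.
sgrna1_fraction = 0.173
sgrna2_fraction = 0.132

expected_combined = combined_fraction(sgrna1_fraction, sgrna2_fraction)
print(f"expected under independence: {expected_combined:.3f}")
```

Under this assumption the expected combined frequency is about 28%, which lies within the reported range of 26.8±3.7% for AAV particles carrying both sgRNAs.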
It is important to note that a more universal PAM variant would expand the application of the exon-skipping strategy for gene knockout, particularly genes consisting of a few exons, including the transthyretin (TTR)28 and proprotein convertase subtilisin/kexin type 9 (PCSK9)29,30.