Adenine base editor engineering reduces editing of bystander cytosines

Adenine base editors (ABEs) catalyze specific A-to-G conversions at genomic sites of interest. However, ABEs also induce cytosine deamination at the target site. To reduce the cytosine editing activity, we engineered a commonly used adenosine deaminase, TadA7.10, and found that ABE7.10 with a D108Q mutation in TadA7.10 exhibited tenfold reduced cytosine deamination activity. The D108Q mutation also reduces cytosine deamination activity in two recently developed high-activity versions of ABE, ABE8e and ABE8s, and is compatible with V106W, a mutation that reduces off-target RNA editing. ABE7.10 containing a P48R mutation displayed increased cytosine deamination activity and a substantially reduced adenine editing rate, yielding a TC-specific base editing tool for TC-to-TT or TC-to-TG conversions that broadens the utility of base editors. Engineered variants of adenine base editors have reduced cytosine base editing or a specific C-to-G base editing activity.

Rational design of TadA7.10 to reduce cytosine editing. To identify key mutations in eTadA that would promote discrimination between adenine and cytosine, we first examined the amino acid sequences of TadA orthologs from various species, because some of these orthologs may have already evolved to avoid cytosine editing. Based on an amino acid sequence alignment of TadA orthologs (Fig. 1d) and the structure of Staphylococcus aureus TadA (saTadA) bound to a fragment of tRNA (Fig. 1e), we found that the identities of several residues in and around the active site vary substantially among the orthologs. For example, P48 in wtTadA is substituted by arginine in the majority of TadA orthologs, and D108 is changed to asparagine, glutamate or serine in other orthologs. In addition, the saTadA structure provided insight into what structural change in the RNA substrate is required for the deamination of cytosine, which is smaller than adenine. For adenine deamination activity, the hexagonal ring of adenine should be located deep inside the adenine-binding pocket, similar to what is shown in the saTadA structure with a purine base bound to the pocket. However, for cytosine deamination, its pyrimidine ring needs to be at the same position as that of the hexagonal ring of the purine base in the structure, and this consequently requires a shift of the sugar-phosphate backbone towards the rim of the pocket. Therefore, we rationally designed the amino acid changes (Supplementary Fig. 2) to include the following: (1) the substitution of P48 and D108 in TadA7.10 with bigger residues, which may prevent the DNA backbone from approaching the pocket rim, or with negatively charged residues, which may repel the backbone phosphate group; (2) F149A mutation, because F149 would closely interact with the DNA backbone across from D108 and might help a cytidine residue to go deep into the active site; and (3) mutation of V30 and F84, located in the adenine binding pocket, into isoleucine and leucine, which are found in the corresponding positions of many TadA orthologs (see Methods for details).

Fig. 1 (caption, beginning not recovered): [...] Graphs show the base substitution activity of each ABE at A4 and C6 in the FANCF and RNF2 sites (right panel). Residue numbering is based on the amino acid position in E. coli TadA (n = 3, mean ± s.d.). c, Schematic diagram indicating the constituents of adenine base editors (left panel). The bar graphs (right panel) show the substitution efficiency of each ABE at adenines and cytosines in the target windows (n = 3, mean ± s.d.). NLS, nuclear localization signal. d, Sequence alignment of wtTadA, TadA7.10, TadA8e, saTadA and 10 TadA orthologs from various species. The positions corresponding to P48 and D108 of wtTadA are highlighted by blue and red boxes. e, The apo structure of wtTadA from E. coli (green; PDB code 1Z3A) was superposed onto the holo structure of saTadA bound to RNA (pink and white; PDB code 2B3J). The catalytically important zinc ion in the wtTadA structure is represented as a gray ball. Two dotted circles highlight putative spaces between the RNA backbone and the active site of wtTadA.
Determination of key mutations that affect cytosine editing. We next introduced each candidate mutation into TadA7.10 in either ABEmax or ABEmax-m (Supplementary Table 3) and tested the nucleotide conversion activity of each resulting ABE variant at the target sites in the FANCF and RNF2 genes in HEK293T cells. We also tested mutations that have previously been shown to partly reduce RNA editing activities, including K20A/R21A, R47Q, D53E, V82G, V106W and F148A [5–7,11]. When we normalized the adenine and cytosine conversion rates of the ABE variants to that of ABEmax, we found that most mutations increased or decreased both the adenine and cytosine editing activities in concert (Fig. 2a and Supplementary Fig. 3). Nevertheless, we identified four mutations (V106W, D108Q, F148A and F149A) that substantially lowered the cytosine editing activity but maintained or slightly decreased the adenine editing activity, resulting in higher specificity for adenine editing (Fig. 2a). Therefore, we further tested the activities of the four ABE variants at sites in two more endogenous genes (ABLIM3 and CSRNP3) and concluded that ABEmax-m containing TadA7.10-D108Q (hereafter referred to as ABEmaxQ-m) showed the highest specificity for adenine editing (Fig. 2b and Supplementary Table 1). Conversely, we found that the P48R mutation in TadA7.10 of ABEmax-m substantially reduced adenine editing rates but increased cytosine editing rates, resulting in high specificity for cytosine editing (Fig. 2a and Supplementary Fig. 3). In summary, we identified three TadA7.10 variants: TadA7.10-D108Q and TadA7.10-F149A, which showed enhanced selectivity for adenine editing, and TadA7.10-P48R, which showed enhanced selectivity for cytosine editing.
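The normalization and specificity calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the replicate values are invented placeholders rather than data from this study, and specificity is taken to be the per-replicate ratio of adenine editing to cytosine editing.

```python
def normalize_to_reference(variant_rates, reference_rates):
    """Divide each replicate's editing rate by the matching reference replicate."""
    return [v / r for v, r in zip(variant_rates, reference_rates)]


def adenine_specificity(a_rates, c_rates):
    """Per-replicate ratio of adenine editing to cytosine editing.

    A replicate with no detectable cytosine editing yields infinity.
    """
    return [a / c if c else float("inf") for a, c in zip(a_rates, c_rates)]


# Hypothetical editing rates (% of reads) for three replicates; NOT study data.
abemax = {"A": [40.0, 42.0, 38.0], "C": [8.0, 10.0, 9.0]}
variant = {"A": [30.0, 31.5, 28.5], "C": [0.8, 1.0, 0.9]}

rel_a = normalize_to_reference(variant["A"], abemax["A"])  # adenine editing relative to ABEmax
rel_c = normalize_to_reference(variant["C"], abemax["C"])  # cytosine editing relative to ABEmax
spec = adenine_specificity(variant["A"], variant["C"])     # A editing / C editing

print(rel_a, rel_c, spec)
```

In this made-up example the variant retains about three quarters of the reference adenine editing but only about a tenth of the cytosine editing, which is the pattern the text describes for the adenine-selective mutations.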

D108Q decreases both cytosine and RNA deamination activity.
ABEmaxQ-m exhibited greatly reduced cytosine conversion activity, but its adenine conversion activity was also reduced compared to that of ABEmax or ABEmax-m. To compensate for this low editing activity, we next adapted TadA8e and TadA8.17, which were developed with the aim of enhancing editing activities [9,10], and tested them in place of TadA7.10. We introduced the D108Q mutation into ABE8e, ABE8e-V106W and ABE8s, thereby generating ABE8e-D108Q, ABE8e-V106W/D108Q, and ABE8s-D108Q, which are referred to hereafter as ABE8eQ, ABE8eWQ and ABE8sQ, respectively. We also introduced the F149A mutation into ABE8e-V106W and ABE8eWQ, which are referred to hereafter as ABE8eWA and ABE8eWQA, respectively. We tested all ABE variants at a total of four endogenous sites (in FANCF, RNF2, ABLIM3 and CSRNP3). High-throughput sequencing results showed that the ABE8eQ, ABE8eWQ, ABE8eWA and ABE8sQ variants all exhibited enhanced adenine editing activities with reduced cytosine editing activities (Fig. 3a and Supplementary Fig. 4). In particular, ABE8eWQ and ABE8eWA were determined to be the most optimized versions for both editing activity and specificity, indicating that V106W shows synergistic effects with D108Q and F149A.
To investigate the editing windows and other characteristics of the ABE8eWQ and ABE8eWA variants, we tested them at six additional endogenous target sites (in ABEsite5, ABEsite8, ABEsite10, ABEsite12, ABEsite16 and CCR5-10), which contained multiple adenines at different positions. High-throughput sequencing results showed that both ABE8eWQ and ABE8eWA variants exhibited adenine editing activity in slightly narrower editing windows (positions 4–7) compared to ABE8eW (positions 3–8) (Fig. 3b and Supplementary Fig. 5), similar to ABE7.10-F148A [7]. We also confirmed that both ABE variants retained adenine substitution activities with reduced cytosine substitution activities by testing them at eight additional endogenous target sites (in CLEC4E, EMB, ARHGEF38, CST9L, CLYBL, HOPX, SAE1 and SDS), which contained TC motifs at different positions and with different flanking nucleotides (Fig. 3c and Supplementary Fig. 6).
We next evaluated the ABE-mediated RNA off-target editing activity of all ABE variants except two variants (ABEmaxQ and ABE8eWQA) with relatively low DNA editing activities. We transfected each ABE variant into HEK293T cells and measured the A-to-I conversion frequencies in four representative RNA transcripts (CCNB1IP1, AARS1, PERP and TOPORS) [6,7]. High-throughput sequencing results revealed that the ABE8e and ABE8s versions showed increased RNA off-target effects compared to ABEmax, and that ABE8eW showed reduced RNA off-target effects compared to ABE8e (Fig. 3d), as previously reported [9,10]. ABE8eWA showed a similar level of RNA off-target effects to ABE8eW, whereas ABE8eQ, ABE8eWQ and ABE8sQ showed greatly decreased RNA off-target effects, consistent with the previous finding that several mutations affecting the 108th residue in ABE7.10 (or ABEmax) reduced RNA off-target effects [5]. D108Q could decrease the binding affinity of TadA8e for RNA, but not for DNA, because the carboxyl group of D108 forms a hydrogen bond with a 2ʹ hydroxyl group of the bound RNA in the saTadA-RNA structure. To determine whether the reduced RNA off-target effects held at the transcriptome-wide level, we further conducted RNA-seq experiments for one control (nCas9 only) and five ABE variants (ABE8e, ABE8eW, ABE8eWQ, ABE8s and ABE8sQ). The RNA-seq data revealed that ABE8eWQ and ABE8sQ exhibited significantly reduced RNA off-target effects, close to the control level (Fig. 3e). Taken together, our results indicate that the D108Q mutation affects a key residue, decreasing both cytosine editing and RNA deamination activities, so that ABE8eQ, ABE8eWQ and ABE8sQ are optimized versions of ABE; of these, ABE8eWQ is the best version.
ABEs with the P48R mutation work as TC-specific base editors. Next, we sought to develop TC-sequence-specific base editing tools by using the TadA7.10-P48R variant, which has increased cytosine editing activity with greatly decreased adenine editing activity. To this end, we linked two copies of uracil DNA glycosylase inhibitor (UGI) to the C terminus of ABEmax-P48R as had been done in AncBE4max, an optimized cytosine base editor (CBE) [12]. The addition of UGI may increase the C-to-T editing ratios rather than the C-to-G editing ratios, or vice versa. We ultimately prepared six base editing tools: AncBE4max and AncBE4max(ΔUGI) as CBEs, ABEmax and ABEmax-UGI as ABEs, and ABEmax-P48R and ABEmax-P48R-UGI as TC-specific base editors (Fig. 4a). We tested all of them at a target site in the CSRNP3 gene in HEK293T cells. High-throughput sequencing results showed that CBE and CBE(ΔUGI) converted all Cs (that is, C3, C6, and C7) within the editing window with the highest rates, and that ABE and ABE-UGI converted all As (that is, A4 and A8) and C6, whereas ABE-P48R and ABE-P48R-UGI dominantly converted C6 (Fig. 4b).
We further tested these tools at six additional endogenous sites (in FANCF, RNF2, ABLIM3, RHPN2, BRME1 and LOC101927151) containing TC and A nucleotides at different positions within the editing window (for example, ATC, TCA or TCNA). High-throughput sequencing results showed that editing activity tendencies were consistent with results at the CSRNP3 site and that ABE-P48R dominantly converted C-to-G but that ABE-P48R-UGI dominantly converted C-to-T as expected (Fig. 4c and Supplementary Fig. 7).
In addition, we tested both ABE-P48R and ABE-P48R-UGI at eight endogenous target sites (in CLEC4E, EMB, ARHGEF38, CST9L, CLYBL, HOPX, SAE1 and SDS), in which TC motifs were located at various positions. From these experiments, we confirmed that both ABE-P48R and ABE-P48R-UGI converted cytosines in a narrow editing window (positions 5–7) (Fig. 4d and Supplementary Fig. 8). Furthermore, we tested the two tools at nine endogenous target sites (… and HAGH) in which cytosine was fixed at the 6th position in different motifs (AC, GC and CC). From these experiments, we found that ABE-P48R and ABE-P48R-UGI also have a preference for a TC motif (Fig. 4e and Supplementary Fig. 9). Taken together, these results suggest that ABE-P48R and ABE-P48R-UGI can function as specific cytosine editing tools for TC-to-TG and TC-to-TT editing, respectively, with reduced bystander editing effects.

Fig. 2 (caption, beginning not recovered; axis label: A editing/C editing): [...] with each independent replicate normalized to the editing efficiency of ABEmax. The two far right bar graphs indicate specificities of adenine conversion. Each dot indicates the adenine conversion efficiency compared to the cytosine conversion efficiency for an independent replicate. Variants that contain V106W, D108Q, F148A or F149A mutations exhibit the highest specificity for adenine editing and are highlighted with yellow boxes. A variant containing P48R conversely displayed the highest specificity for cytosine editing and is highlighted with a light red box (n = 3 except the designated replicates, mean ± s.d.). b, The diagrams on the left indicate the mutations in the adenosine deaminases of seven ABE variants. Base conversion efficiencies are shown in a heat map format (log2[fold change]). Adenine editing efficiencies relative to cytosine editing efficiencies are shown in bar graphs. Each bar represents the mean of results from two independent replicates, and each dot represents the data from an independent experiment.

Precise TC-to-TG or TC-to-TT editing of pathogenic mutations.
To show the potential of ABE-P48R and ABE-P48R-UGI for treating genetic diseases, we inspected all targetable variations registered in the ClinVar database in silico. A total of 36,153 T-to-C mutations causing pathological phenotypes in the database are located in a canonical cytosine base editing window (positions 3–8) (Fig. 5a). Among them, 3,874 mutations are associated with a cytosine target motif within the ABE-P48R-UGI editing window. In addition, 3,248 of the 23,237 G-to-C mutations in the database can be targeted by ABE-P48R. In the case of a missense mutation in the TUBB6 gene (causing an F394S change in the protein) that is associated with congenital facial palsy, bilateral ptosis and velopharyngeal dysfunction [13], a TC sequence should be corrected to TT. We first established a cell line containing the appropriate mutation in the genome to mimic the disease situation (Supplementary Fig. 10), after which we transfected a CBE (AncBE4max) or ABE-P48R-UGI into the cell line. High-throughput sequencing results showed that relative to ABE-P48R-UGI, CBE generated higher total cytosine conversion rates but lower rates of exact corrections (<1% compared to 4.1% for ABE-P48R-UGI) (Fig. 5b). Similarly, we also tested a missense mutation in the PTPN11 gene (causing an N58Y change) that is found in a patient with juvenile myelomonocytic leukemia and Noonan syndrome [14], and observed consistent results (Fig. 5b). It is notable that ABE-P48R-UGI generated negligible bystander effects, whereas CBE generated abundant bystander effects, which suggests that ABE-P48R-UGI would be a useful TC-to-TT editing tool.
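The in silico targetability check at the start of this section can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: the protospacer is an uppercase 20-nt string written 5ʹ to 3ʹ, positions are counted from 1 at the PAM-distal end, and a cytosine counts as a TC target when the base immediately 5ʹ of it is a T. The function names and the example sequence are hypothetical, and PAM searching and ClinVar parsing are left out.

```python
def cytosines_in_window(protospacer, first, last):
    """Return 1-based positions of every C between `first` and `last` inclusive."""
    return [pos for pos in range(first, last + 1)
            if protospacer[pos - 1] == "C"]


def tc_cytosines_in_window(protospacer, first, last):
    """Return 1-based positions of Cs in the window that directly follow a T."""
    return [pos for pos in cytosines_in_window(protospacer, first, last)
            if pos >= 2 and protospacer[pos - 2] == "T"]


# Hypothetical 20-nt protospacer; not a sequence from the study.
protospacer = "GAATCCTAGCTGACGTTAGC"

canonical = cytosines_in_window(protospacer, 3, 8)   # Cs a CBE could reach (positions 3-8)
tc_hits = tc_cytosines_in_window(protospacer, 5, 7)  # TC targets in the narrower 5-7 window
bystanders = [p for p in canonical if p not in tc_hits]

print(canonical, tc_hits, bystanders)
```

For this example sequence, the cytosine at position 5 sits in a TC context inside the narrow window, while the cytosine at position 6 is a potential bystander for an editor with the canonical window.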
Conversely, for a missense mutation in the TPO gene (causing a Q660E change) that is found both in patients with nontoxic goiter and those with toxic goiter [15], a TC sequence should be corrected to TG. As before, we generated a cell line with the appropriate genomic mutation to mimic the disease condition (Supplementary Fig. 10), and then transfected the CBE (AncBE4max) or ABE-P48R into the cell line. High-throughput sequencing results showed that compared to ABE-P48R, CBE generated higher total cytosine conversion rates but lower rates of exact corrections (<1% compared to 3.1% for ABE-P48R) (Fig. 5c). Similarly, we also tested a missense mutation in the FBN1 gene (causing a C958S change) that may be associated with Marfan syndrome [16], and observed consistent results (Fig. 5c). It is notable that like ABE-P48R-UGI, ABE-P48R generated negligible bystander effects, whereas CBE generated abundant bystander effects, which suggests that ABE-P48R could be a useful TC-to-TG editing tool.
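The distinction between total cytosine conversion and exact correction drawn in this section can be made concrete with a short Python sketch. The definitions here are assumptions, since the text does not spell them out: a read counts as converted if the target base differs from the reference, and as an exact correction only if the target carries the desired base and every other position in the window matches the reference (no bystander edits). Reads are assumed to be pre-aligned strings of the same length as the reference window, and all names and sequences are hypothetical.

```python
def correction_rates(reads, reference, target_idx, desired):
    """Return (total conversion %, exact correction %) for aligned reads.

    target_idx is the 0-based index of the target base in `reference`.
    """
    # The only read sequence that counts as an exact correction.
    expected = reference[:target_idx] + desired + reference[target_idx + 1:]
    converted = sum(1 for r in reads if r[target_idx] != reference[target_idx])
    exact = sum(1 for r in reads if r == expected)
    total = len(reads)
    return 100.0 * converted / total, 100.0 * exact / total


# Hypothetical window with the target C at index 2 and a neighbouring C at index 3.
reference = "ATCCGA"
reads = (["ATCCGA"] * 6    # unedited
         + ["ATTCGA"] * 2  # exact TC-to-TT correction
         + ["ATTTGA"] * 1  # target edited together with a bystander C
         + ["ATGCGA"] * 1) # target converted to G instead of T

total_conv, exact_corr = correction_rates(reads, reference, target_idx=2, desired="T")
print(total_conv, exact_corr)
```

With these invented reads, 40% carry a converted target base but only 20% are exact corrections, which shows how an editor with higher total conversion can still yield fewer usable alleles when bystander edits are frequent.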

Discussion
In this study, through rational design we determined a key mutation in eTadA, D108Q, that is responsible for decreasing cytosine catalysis activity. By contrast, D108F and D108W mutations dramatically decreased both adenine and cytosine editing activities (Fig. 2a), suggesting that amino acids larger than glutamine may limit the accessibility of the sugar-phosphate backbone. D108E completely abolished the activity, probably owing to strong charge repulsion between the carboxyl group of glutamate and the backbone phosphate. However, D108K caused only a mild decrease, suggesting that the lysine-phosphate interaction may not have a notable effect on the conformational dynamics of the DNA backbone. Although the size of methionine is similar to that of glutamine, D108M decreased the editing rates much more than D108Q, suggesting that direct polar interactions or water-mediated interactions between glutamine and the sugar-phosphate backbone may be crucial for the catalytic activity. Conversely, we further identified a key eTadA mutation, P48R, that increases cytosine catalysis activity while reducing adenine editing activity. Given that other mutations that affected P48 also similarly decreased the ratio of adenine editing to cytosine editing (Fig. 2a), the cytosine specificity can probably be attributed to a change in the conformation of the main chain that consequently affects the orientations of neighboring residues. For example, a slight conformational change of N46 could have a dramatic effect on the editing activity because the residue closely contacts the purine hexagonal ring in the saTadA-RNA structure. Fortunately, in the case of P48K and P48R, the editing activity seemed to be restored by the interactions of the residue at position 48 with the backbone phosphate; the arginine-phosphate interaction might have an additional effect on the cytosine specificity by stabilizing a backbone conformation favorable to cytosine binding.
Recently, novel forms of DNA base editing, such as C-to-G editing [17–19] and simultaneous C and A editing [20–23], have been suggested, which would substantially expand the utility of DNA base editors. Along with the intense efforts to improve DNA base editors, our suggested tools, high-fidelity ABE variants that exhibit minimized cytosine catalysis and reduced off-target RNA editing, and TC-specific base editors with negligible bystander effects (summarized in Supplementary [...]

Electroporation. The ABE expression plasmid (500 ng) and sgRNA plasmids (170 ng) were electroporated into 2 × 10⁵ cells using a Neon Transfection System 10 μl Kit (Thermo Fisher Scientific, MPK1025). Appropriate electroporation parameters (1,500 V, 20 ms, 2 pulses for HEK293T) were used according to the manufacturer's protocol. Genomic DNA was isolated 72 h after transfection. This transfection method was adopted for the experiments shown in Figs. 1b,c, 2a,b, 3a,b and 4b,c, and Supplementary Fig. 3c. Although the total editing efficiencies of adenines and cytosines slightly varied depending on the transfection methods, the ratio of adenine editing to cytosine editing was similar (Supplementary Fig. 3b).

Rational mutation design strategy. For V30I and V30L, V30 closely interacts with the target purine base, and many TadA orthologs have Ile at this position. Leu was also tested simply because it is a similar amino acid. For P48H, a bigger residue may prevent the DNA backbone from approaching the pocket rim. For P48R and P48K, a bigger residue may prevent the DNA backbone from approaching the pocket rim. In addition, positively charged groups may interact with the phosphate group holding the DNA backbone in a specific conformation. Arginine at this position is frequently found in TadA orthologs. For P48D and P48E, negatively charged residues may repel the backbone phosphate group. For F84I and F84L, the F84L mutation, which is a back-mutation because L84 is found in wtTadA, was expected to decrease the ratio of adenine editing to cytosine editing. In the 3D structure of wtTadA superimposed on RNA-bound saTadA, L84 and D108 are closely located, and their side chains protrude towards the DNA backbone. Therefore, substituting L84 with a bigger residue could prevent the DNA backbone from approaching the active site, as we expected for D108Q. Given that F84 in TadA7.10 is a sufficiently big residue for this purpose, we wanted to test our hypothesis with the back-mutation F84L. For F84L/D108Q (LQ) and F84L/D108Q/F149A (LQA), F84, D108 and F149 would closely interact with the same region of the DNA backbone and mutations affecting these residues might have synergistic or complementary effects on the deamination specificity. Therefore, we chose these mutational combinations despite knowing that F84I and F84L decreased the ratio of adenine editing to cytosine editing. For mutations affecting D108, we tested residues with various sizes and properties, except for positively charged residues. Given the relatively long distance between D108 and the DNA backbone phosphate, arginine and lysine residues could draw the phosphate group on the DNA backbone into the active site. This feature would be more favorable for the cytidine deamination activity. Glutamate at this position, but not other tested amino acids, is found in a few TadA [...]

RNA-seq data analysis. The paired-end RNA-seq results were aligned to the reference genome (GRCh38) by using bwa mem with default options. The alignment results were sorted by samtools sort and information about substitutions was analyzed by reditools (https://github.com/BioinfoUNIBA/REDItools2). Before calculating the A-to-I editing frequency, we removed all nucleotides with a read coverage of less than 50. We then counted the number of positions with adenosines that were partly read as inosine. Finally, we calculated the A-to-I frequencies at all positions.
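The filtering and counting steps of the RNA-seq analysis can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes the per-position substitution counts have already been extracted into plain tuples (the actual REDItools2 output format is not reproduced here), that inosine is read as guanosine by the sequencer, and that strand orientation has already been resolved. The function name and the example records are hypothetical.

```python
MIN_COVERAGE = 50  # positions covered by fewer reads are discarded, as in the text


def a_to_i_summary(sites, min_coverage=MIN_COVERAGE):
    """Summarize A-to-I editing from (ref_base, coverage, g_reads) records.

    Returns the number of adenosine positions with at least one G read and
    the per-position A-to-I frequency for every retained adenosine position.
    """
    kept = [(cov, g) for ref, cov, g in sites
            if ref == "A" and cov >= min_coverage]
    n_edited = sum(1 for _, g in kept if g > 0)
    freqs = [g / cov for cov, g in kept]
    return n_edited, freqs


# Hypothetical records: (reference base, read coverage, reads showing G).
sites = [
    ("A", 100, 5),  # kept, edited
    ("A", 40, 10),  # dropped: coverage below 50
    ("A", 60, 0),   # kept, not edited
    ("C", 200, 3),  # ignored: reference base is not adenosine
    ("A", 50, 1),   # kept, edited (coverage of exactly 50 passes the filter)
]

n_edited, freqs = a_to_i_summary(sites)
print(n_edited, freqs)
```

Applying the coverage filter before counting keeps low-depth positions, where a single mismatched read would produce a large apparent frequency, from inflating the number of edited sites.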
Generation of knock-in cells containing mutated sequences. We prepared a donor plasmid containing a 150-bp gene segment with the intended mutation, the hph (hygromycin B phosphotransferase) gene to confer hygromycin resistance, and two sgRNA target sites for knock-in mediated by non-homologous end joining. The two target sequences are identical to a target site in the endogenous AAVS1 locus. After transfection of the donor plasmid and plasmids encoding Cas9 and sgRNA into cells, a Cas9-sgRNA complex cleaves the endogenous target site and the sites in the plasmid, dividing the plasmid into two parts. One of the parts contains the gene segment with the intended mutation and the hygromycin resistance gene and can be inserted at the site in the AAVS1 locus. For transfection, HEK293T cells (1 × 10⁵ cells per well) were cultivated in a 24-well plate. After 24 h, a mixture of 2 μl lipofectamine 2000 reagent (Thermo Fisher Scientific, 11668019), 1.5 μg plasmid DNA (750 ng Cas9 expression plasmid, 250 ng sgRNA expression plasmid and 500 ng donor plasmid) and serum-free medium was added to the cells. After 72 h, the transfected cells were cultivated in a 6-well plate with 200 μg ml⁻¹ hygromycin B (Thermo Fisher Scientific, 10687010) for 14 days. After 14 days, HEK293T cells were maintained in DMEM supplemented with 10% FBS, 1% ampicillin and 100 μg ml⁻¹ hygromycin B.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article. code availability