APOBEC3G (A3G) is a human single-stranded DNA (ssDNA) deaminase that catalyzes lethal mutations in the genomes of viruses such as HIV-1 [Reviewed in 6-9]. Since its discovery in 2002 [1], extensive research has been conducted globally to explore A3G functions and to overcome the HIV-1 pandemic by understanding its intrinsic antiviral mechanism. Expressed A3G binds to RNAs to form high-molecular-mass (HMM) ribonucleoprotein complexes in the cytoplasm [10, 11]. Upon infection by vif-deficient HIV-1, host A3G binds to viral genomic and nonviral RNAs and is incorporated into budding virions, together with viral components. Encapsidated A3G recognizes the nascent ssDNA reverse transcript of the viral genome in capsids and catalyzes cytosine deamination, i.e., 2'-deoxycytidine to 2'-deoxyuridine conversions in the ssDNA, impairing viral amplification [12-15]. Recent studies found that the viral genome is protected by the capsid assembly from viral entry until nuclear import [16, 17], indicating that A3G expressed in infected cells cannot access invading viral components, including viral RNA and DNA. Therefore, encapsidation is the only way for A3G to access viral ssDNA. HIV-1 viral infectivity factor (Vif) protein, however, disturbs A3G encapsidation (Fig. 1a) [1, 3-5]. Vif hijacks host core binding factor β (CBFβ) to form a stable Vif-CBFβ complex, which recruits A3G to the CUL5 ubiquitin ligase complex [18, 19]. The captured A3G is ubiquitinated and then degraded in proteasomes before its encapsidation. HIV-1 harboring a functional vif gene thereby produces 'healthy' virions without A3G, advancing the viral life cycle, i.e., successful amplification of HIV-1. The A3G-Vif interaction is therefore a key event of viral immune suppression to overcome host defenses, and accordingly, it is an attractive target for new therapeutics against HIV-1/AIDS.
A3G is one of seven APOBEC3 (A3) enzyme family members, A3A, A3B, A3C, A3D, A3F, A3G and A3H, and it is the most potent anti-HIV-1 factor in the absence of Vif [20, 21]. It has duplicated domains, the N-terminal domain (NTD) and the C-terminal domain (CTD), which share a tertiary structure with a preserved zinc-binding motif [22, 23]. Structure-based mutagenesis analyses have shown that A3G NTD amino acids are recognized by HIV-1 Vif [23, 24]. Specifically, aspartate-128 (D128) of A3G has been identified as a species-specific determinant, since the single mutation D128K makes human A3G insensitive to HIV-1 Vif-induced degradation, while Old World monkey A3Gs containing K128 are sensitive to SIV Vif [25-28]. The A3G NTD is also indispensable for incorporation into virions via RNA binding [29-32], whereas the A3G CTD is central to its deamination activity [33, 34]. The crystal structure of the A3G CTD in complex with ssDNA has revealed the molecular mechanism of substrate recognition [35]. The active site pocket next to the zinc-binding motif accepts the target cytosine base dC(0), converting it to uracil. Additionally, the preceding base dC(-1) is accommodated in a surface cavity of the A3G CTD to achieve strict sequence specificity; i.e., the dinucleotide dC(-1)dC(0) is essential for A3G recognition. Although structure-activity relationships of the CTD have been unveiled, molecular insights into the interactions among the A3G NTD, Vif and RNA are still lacking. The difficulty of sample preparation, such as target heterogeneity, has prevented acquisition of high-quality data [36-38].
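The dinucleotide rule described above, in which the target dC(0) must be immediately preceded by dC(-1), can be illustrated with a minimal sketch. The function names and the example sequence are hypothetical and chosen only for illustration. The sketch assumes the sequence is written 5' to 3', marks every qualifying site from the original sequence at once, and does not model kinetics, processivity, or any sequence context beyond the dinucleotide.

```python
def find_target_cytosines(ssdna: str) -> list[int]:
    """Return 0-based indices of cytosines preceded by a cytosine (dC(-1)dC(0)).

    The sequence is assumed to be written 5' -> 3'; this is an illustrative
    rule-of-thumb scan, not a model of A3G enzymology.
    """
    seq = ssdna.upper()
    return [i for i in range(1, len(seq)) if seq[i] == "C" and seq[i - 1] == "C"]


def deaminate(ssdna: str) -> str:
    """Replace every dC(0) found in the original sequence with U (dC -> dU)."""
    seq = list(ssdna.upper())
    for i in find_target_cytosines(ssdna):
        seq[i] = "U"
    return "".join(seq)


if __name__ == "__main__":
    example = "ACCGCCCT"  # hypothetical sequence, not taken from the paper
    print(find_target_cytosines(example))
    print(deaminate(example))
```

Because sites are identified on the unmodified sequence, a run such as CCC yields two candidate positions; whether both would be edited in practice is outside the scope of this sketch.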
Structure of the sA3G-VC-RNA20 complex
To overcome target protein heterogeneity, we prepared a solubility-enhanced human A3G variant, sA3G, and found that a short RNA oligomer, RNA20, increased sA3G-Vif complex formation in vitro (Fig. 1b-d, Table I and Supplementary Information). Importantly, sA3G has no mutations in the sA3G-Vif interfaces, as shown below, and is sensitive to Vif-induced degradation, as is wild-type A3G, and thus represents the A3G-Vif interaction in living cells (Fig. 1e and Extended Data Fig. 1f). The combination of our A3G variant with a previously optimized Vif-CBFβ (VC) construct [39] enabled us to prepare a stable sA3G-VC-RNA20 complex, which displayed a highly homogeneous particle distribution by negative-stain electron microscopy (Extended Data Fig. 3e). We collected electron cryo-microscopy (cryo-EM) data of the sA3G-VC-RNA20 complex in a frozen, hydrated state and reconstructed the three-dimensional (3D) electron-potential map by single-particle image processing. The refined C2-symmetrized map reached 2.8 Å resolution (Extended Data Fig. 6j) and allowed us to build an atomic model unambiguously (Fig. 1f, g and Extended Data Fig. 7).
Two heteromers of sA3G-VC-RNA20 face each other in a major sA3G-sA3G contact (Fig. 1f). This dimerization could be due to unexpected stabilization by amino acid replacements on the sA3G-sA3G dimeric interface. To the best of our knowledge, the biological relevance of this dimeric arrangement is unclear; therefore, it will not be emphasized in this paper. We focus on the asymmetric unit of this dimer, which represents the long-sought atomic structure of the A3G-Vif complex, including visualization of side chains at the interfaces (Fig. 1f, g and Fig. 2a, b). Unexpectedly, in the asymmetric unit, a single sA3G assembles with two VCs, named VCred and VCblue. Both VCs contact sA3G via their Vif protein and occupy a significant portion of the sA3G NTD surface area. The ligand, RNA20, participates in both interfaces, sA3G-Vifred and sA3G-Vifblue. It has been thought that the A3G NTD exclusively provides the Vif-binding surface [6, 7]. In our structure, however, the sA3G CTD also clearly interacts with Vifred.
The protein-protein interface of sA3G-Vifred is biologically pivotal for complex formation, because it includes A3G amino acid D128, which has been identified as a determinant of species-specific A3G-Vif interaction [25-28]. Our cryo-EM structure shows sA3G side chains D128 and D130 proximal to the side chain of R15 on Vifred helix α1 (Fig. 2a), suggesting that their rotamers form bifurcated hydrogen bonds. The aspartic acid dyad also faces W70/G71 on Vifred, and is thus essentially surrounded by amino acids from Vifred. The preceding sA3G residue, W127, makes contact with Vifred H43, forming a π-π interaction between the aromatic rings (Fig. 2a). Vifred Y44 contacts K270, located at the C-terminal end of helix α2 in the CTD (Fig. 2a, c). These interactions support previous findings that Vif residues 40-YRHHY-44 are critical for A3G binding [40]. Helix α2 of the CTD is arranged along helix α6 of the NTD and interacts mainly via hydrophobic interactions, such as the contact between M188 and F268. Although the sA3G CTD provides a limited interface for the sA3G-Vifred assembly, it is likely to have a significant impact on stabilizing the complex, considering that the C-terminal part of CTD helix α2 features the best-defined map resolution in the CTD, i.e., side chain rotamers of F268 and K270 were clearly resolved, as well as their interacting Vifred residues and RNA (Fig. 2a, c). The sA3G CTD and the sA3G-Vifred assembly support each other, and complex formation defines the CTD arrangement despite the flexible character of both A3G NTD and CTD in solution [36].
In the protein-protein interface of sA3G-Vifblue, on the other hand, Vifblue recognizes sA3G-NTD helix α1 (Fig. 2b). Amino acid D15 of sA3G contacts Vifblue K22 on helix α1. Vif amino acids K22 and K26, specifically the positive charge of K26, are required for Vif-induced degradation of A3G [41, 42]. Additionally, sA3G S18 and Y19 interact with Vifblue H43. The side chain of Y19 also faces the aromatic ring of Vifblue W70, forming a π-π interaction. Vif mutants W70A and W70R fail to neutralize the antiviral activity of A3G [43, 44]. It is noteworthy that Vifblue H43 and W70 contact sA3G Y19, whereas they face sA3G W127 and D128 in the sA3G-VCred interface. This mechanism of targeting by dual Vifs will be discussed in detail below.
RNA involvement in sA3G-Vif interaction
The areas that account for protein-protein interactions are ~550 Å² and ~590 Å² for sA3G-Vifred and sA3G-Vifblue, respectively. These relatively small interaction surfaces may not be sufficient to form a stable complex. Indeed, no sA3G-VC complex was captured when protein components alone were mixed together, whereas addition of a specific RNA oligomer induced stable complex formation (Extended Data Fig. 3a-d and Supplementary Information). The cryo-EM map explicitly shows that the ligand RNA20 interacts with both interfaces, sA3G-Vifred and sA3G-Vifblue (Fig. 1f, Extended Data Fig. 9). Specifically, we found two dinucleotides, rG6rA7 and rC17rA18, that were well defined in protein pockets of sA3G NTD and Vifblue, respectively (Fig. 2c-f).
The dinucleotide rG6rA7 is located at the sA3G-Vifred interface (Fig. 2c, d). A binding pocket for rG6 is formed by W127 (NTD), F268, and K270 (CTD) of sA3G and H43 and Y44 of Vifred. As described above, these amino acids contact each other to form the protein-protein interfaces between sA3G NTD-Vifred, sA3G CTD-Vifred and sA3G NTD-CTD (Fig. 2a). Additionally, the side chains of Vifred K22 and K26 are within hydrogen bonding range of a phosphate group on the RNA backbone (Fig. 2d). The well-resolved nucleotide, rA7, is buried deeper and is accommodated in a pocket of the sA3G NTD formed by NTD loops α1/β1 and β4/α4 (Fig. 2e). The refined atomic model indicates multiple hydrogen bonds between base rA7 and backbone atoms of the loops. In particular, atom N6 of base rA7 is within range to form bifurcated hydrogen bonds with carbonyl oxygens of both P25 and L123. This arrangement suggests a binding preference of the pocket for specific bases. Although the pocket is large enough to accept a purine base, guanine atom O6 would be repelled by the carbonyl oxygens of P25 and L123, resulting in selective accommodation of an adenine base. In addition, the 2'-hydroxyl group of rG6 can form bifurcated hydrogen bonds with side chains H42 and H43 of Vifred (Fig. 2c). These intermolecular interactions would compensate for the entropic penalty of complex assembly and confer base specificity on the RNA ligand.
On the other hand, rC17rA18 binds to Vifblue. The cryo-EM map shows that the base rA18 is accommodated in a cavity formed by amino acids R17, T20, L24 and P162 to K168 of Vifblue including the PPLP motif (Fig. 2f) [45]. Previous studies found that mutations of the Vif PPLP motif reduced A3G binding and increased A3G incorporation into virions. The refined model indicates hydrogen bonds between atom N6 of rA18 and the main chain carbonyl groups of Vifblue S165 (Fig. 2f). Once again, the molecular arrangement of this pocket likely excludes the O6 atom of guanine, whereas adenine satisfies the geometry and interactions with chemical moieties of the surrounding amino acids. In contrast, nucleotide rC17 is exposed to the protein exterior. The atomic model indicates that phosphate groups of rC17 and rA18 are within hydrogen bonding range of side chains from T20 and R17, respectively, emphasizing the critical role of this adenine nucleotide.
We found that complex formation is determined not only by the specific adenine base recognition described above, but also by non-specific sA3G-RNA interactions. Although the model assignment to the map was somewhat ambiguous for nucleotides rC1 to rU5 and rU8 to rA16, map features for the RNA20 ligand could be rationally interpreted. At a map contour level of 5σ, rC1 to rU5 and rU8 to rA16 were clearly visible at the sA3G-Vifred and sA3G-Vifblue interfaces, respectively (Extended Data Fig. 9). The cryo-EM map locates base rC1 close to rU8, and the 5′-half of RNA20, rC1 to rU8, adopts a loop conformation (Extended Data Fig. 9e, i, j). Interestingly, nucleotides rC1 and rG2 contact protein surfaces of Vifred and even CBFβblue (Extended Data Fig. 9e), apparently filling the gap between sA3G, Vifred and CBFβblue and increasing the interface area of the assembly. Nucleotides rU8 to rA16 seem to extend the polynucleotide chain on the surface of sA3G (Extended Data Fig. 9i, j, m, n). Amino acid Y59 of sA3G is not involved in either protein-protein interaction, but contacts rU13 and rU14 (Extended Data Fig. 9m, n). The Y59D mutant attenuated Vif-induced degradation, compared with that of wild-type A3G (Extended Data Fig. 1f, lanes 6 and 7). As expected, the introduced aspartic acid most likely excludes accommodation of the RNA due to a repulsive interaction between its negatively charged side chain and phosphates in the polynucleotide backbone. Shirakawa et al. reported that phosphorylation of A3G T32 disrupted Vif-induced degradation [46]. Amino acid residue T32 is located proximal to Y59 and contacts the ligand RNA (Extended Data Fig. 9n). The impact of T32 phosphorylation on degradation is also likely caused by exclusion of the ligand RNA. Although these amino acids are located away from the sA3G-Vif interface, they mediate complex formation indirectly through interactions with RNA.
Our cryo-EM map and available biochemical data demonstrate that RNA mediates both sA3G-Vifred and sA3G-Vifblue assemblies. The RNA20 ligand increases the interface areas by up to ~900 Å² and ~1,500 Å² for sA3G-Vifred-RNA20 and sA3G-Vifblue-RNA20, respectively, thus maximizing domain interactions and stabilizing the complex. This explains how RNA20 captures the sA3G-VC complex and enables structure determination by cryo-EM. Intriguingly, only two nucleotides, rA7 and rA18, participate in key base-specific interactions with their protein binding partners. We propose that sA3G-VC assembly is promoted by various RNAs. We further assessed base specificity and its impact on sA3G ubiquitination, as discussed below.
Mechanism of sA3G targeting by dual Vifs
Our cryo-EM reconstruction of the sA3G-VC-RNA20 complex reveals two Vif proteins, Vifred and Vifblue, that both bind to a single sA3G, i.e., each of them recognizes a different region of the sA3G surface (Fig. 1f, g). Intriguingly, both Vifred and Vifblue achieve this using the same region of their protein surfaces. To inspect how Vif alters its target recognition, we summarized all intermolecular contacts in Figure 3a. Amino acid determinants of sA3G form patches that can be identified as specific interfaces for Vifred, Vifblue or RNA (Fig. 3a, b, d); e.g., residue D128 of sA3G participates exclusively in the interface with Vifred (Fig. 3a, b). In contrast, Vifred and Vifblue share amino acid determinants that can bind to different portions of sA3G or RNA; e.g., amino acid R15 of Vifred contacts D128, D130 and Y131 of sA3G, and the same residue of Vifblue makes contacts with rU11 and rU14 (Fig. 3a, c, e). These groups of binding sites can switch between Vifred and Vifblue. Amino acids R15, W79, L81 and Q83 of Vifred interact with sA3G, while those of Vifblue interact with RNA. Conversely, amino acids R23, K26, Y30 and H42 of Vifred participate in the interface with the ligand RNA, while those of Vifblue are involved in sA3G interaction. Thus, these amino acid determinants of Vif form a complementary target pattern on each interface; i.e., they differ between the interfaces of sA3G-Vifred and sA3G-Vifblue, although both Vifred and Vifblue allocate the same face of their protein surface to the interface. Only Vif H43 and W70 interact with sA3G in both interfaces (Fig. 2a, b, and 3a). These two amino acids have been identified as critical determinants for A3G-Vif interaction [40, 44]. Interestingly, Vifred and Vifblue have identical polypeptide structures, except for different conformations at the interface peripheries.
How does Vif target different portions of sA3G without significant structural changes? To answer this question, we calculated the electrostatic surface distribution on sA3G and both Vifs (Fig. 3f-i). The electrostatic surface map shows that sA3G is dominated by negative potentials on both Vif binding patches (Fig. 3f), whereas the RNA binding site stands out as a positively charged area flanked by the binding interfaces of Vifred and Vifblue (Fig. 3f), supporting an RNA-binding mechanism employing electrostatic interactions. Upon RNA binding, the electrostatic surface of the sA3G-RNA20 complex becomes an extended region of predominantly negative potentials (Fig. 3g). This enhancement of negative potentials will promote an association with positive potentials on the interaction surfaces of Vifred and Vifblue (Fig. 3h, i). Thus, RNA mediates electrostatic complementarity between sA3G and Vifs, and this electrostatic complementarity is most likely the major factor driving assembly of sA3G and both Vifs. Intriguingly, a predicted model of wild-type A3G suggests that the original amino acids R14, E170 and E173 in the periphery of the sA3G-Vifblue interface increase the surface potentials and enhance the electrostatic complementarity between A3G, Vif and RNA even more than that of sA3G (Extended Data Fig. 8a, f, g). Taken together, specificity conferred by adenines rA7 and rA18 appears to govern not only RNA binding, but also sA3G-Vif assembly.
RNA promotes sA3G ubiquitination
How does RNA base specificity affect Vif-induced ubiquitination of A3G? To test this, we explored in vitro ubiquitination using the ubiquitin-activating enzyme E1, ubiquitin-conjugating enzyme E2 L3, NEDDylated CUL5, and VCBC proteins. In the presence of all required proteins, sA3G was polyubiquitinated, as expected, showing ladder-like bands on PAGE gels (Fig. 4a, lanes 8 and 9), whereas no ubiquitinated sA3G was detected when a protein component was missing (either substrate sA3G or ARIH2), or when using ubiquitin lacking glycine-76 at the C-terminus (Fig. 4a, lanes 1-4). Although the reaction occurred in the absence of the RNA20 ligand (Fig. 4a, lane 5), it was much less efficient than when RNA20 was present (Fig. 4a, lane 8). We also tested A3G amino acid replacements that were previously identified to reduce Vif-induced degradation, i.e., mutations D128K or K297R/K301R/K303R/K334R [25-27, 47]. These mutations appeared to decrease polyubiquitination (Fig. 4a, lanes 6 and 7).
We further assessed the effects of the RNA20 ligand and sA3G mutation on the ubiquitination reaction by monitoring the amount of intact sA3G remaining. Under our test conditions, sA3G was almost completely ubiquitinated within the monitoring period (Fig. 4b lanes 1-4, and 4c). As expected, mutation D128K slowed mono- and di-ubiquitination of sA3G (Fig. 4b lanes 5-8, and 4c). Interestingly, a DNA oligomer with a sequence corresponding to RNA20 showed no ability to promote sA3G ubiquitination (Fig. 4b lanes 9-12, and 4c). These results clearly indicate that RNA20 is the driver of sA3G ubiquitination. As mentioned above, the dinucleotide rG6rA7 of the ligand RNA20 forms a hydrogen bond network between sA3G and Vifred (Fig. 2c, e). This structure-activity relationship is important, since sA3G ubiquitination remains enhanced with an RNA oligomer that consists of poly-uridine except for rG6rA7 (U20-rGrA), although the reaction proceeds slowly (Fig. 4d lanes 1-4, and 4e, Table I). Whereas an oligomer with the rG6-to-rU6 replacement was still able to promote the reaction (Fig. 4d lanes 5-8, and 4e), an oligomer lacking nucleotide rA7 no longer enhanced the reaction significantly (Fig. 4e). Intriguingly, a pinpoint modification, U20-dGrA, i.e., removal of the 2'-hydroxyl group of rG6, led to a drastic loss of enhancement (Fig. 4d, lanes 9-12 and 4e). As described above, the rA7 base is accommodated in a deep pocket of sA3G and the 2'-hydroxyl group of rG6 can form bifurcated hydrogen bonds with the side chains of H42 and H43 on Vifred, so that a lack of either one severely weakens the interaction with sA3G-Vifred (Fig. 4e, f). The importance of these histidines was confirmed by introducing H42A/H43A mutations in Vif, which abolished degradation of wild-type A3G and sA3G in living cells (Extended Data Fig. 1f, lanes 5 and 15) [40].
We discovered two well-resolved dinucleotides, rG6rA7 and rC17rA18, in the cryo-EM map. Their characteristic molecular envelopes (Fig. 2c-f) and unique occurrence among all tested RNA ligands (Extended Data Fig. 3b) allowed unambiguous assignment. During RNA sequence optimization to capture a stable sA3G-VC complex, we found that rA7-to-rU7 replacement caused complete loss of complex formation (Extended Data Fig. 3b), while truncation of the ligand RNA, including removal of rC17rA18, could be partially compensated by rG10 (Extended Data Fig. 3d). The loss of rA7 could not be neutralized by introduction of nearby rA4rA5 (Extended Data Fig. 3b). Based on these ligand sequence optimization trials, we conclude that the dinucleotide rG6rA7 plays a critical role in complex formation and subsequent ubiquitination of sA3G. Other parts of the RNA oligomer, such as rG10 and rC17rA18, act in supportive roles. Interestingly, rG6rA7 is located at the interface of sA3G-VCred whereas rG10 and rC17rA18 interact with VCblue, implying that both sA3G-VCred and sA3G-VCblue are required for stable complex formation.
Taken together, the dinucleotide rG6rA7 serves an essential biological function in conferring local base preference upon the RNA ligand to promote A3G-Vif interaction. In addition, rG10 and rC17rA18 cooperatively enhance the assembly.