Reconstitution of vaccinia transcription initiation complexes
We purified FLAG-tagged21 complete vRNAP from HeLa cells infected with an engineered vaccinia strain and used it to reconstitute vRNAP pre-initiation and initially transcribing complexes. The transcriptionally active complete vRNAP was assembled on a vaccinia early promoter scaffold consisting of the critical region (CR), a non-complementary bubble including the transcription start site (+1) and a template cassette lacking G nucleotides (Fig. 1a). Incubation of complete vRNAP with this scaffold in the presence of NTPs resulted in complex formation (Fig. 1b). DNA-bound vRNAP complexes from a large-scale reconstitution were separated by gradient centrifugation (Supplemental Fig. 1a) and transferred onto holey carbon grids, and three cryo-EM datasets were collected (Supplemental Figs. 1b,c). After extensive 3D classification, several distinct particle classes could be separated (Supplemental Figs. 1d, 3a and 4a). These represent vRNAP complexes at transcription stages ranging from pre-initiation to co-transcriptional capping.
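A template cassette that lacks G encodes RNA that lacks C. If CTP were withheld, the polymerase could therefore transcribe through the cassette and halt at the first template G. The text does not state which NTPs were supplied, so the following is only a minimal Python sketch of this general halting logic under that hypothetical scenario. The function name `extend_until_stall`, the 3'->5' template input convention and the example sequence are illustrative assumptions.

```python
# Template base -> complementary ribonucleotide incorporated into RNA.
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def extend_until_stall(template_3to5: str, ntps: set[str]) -> str:
    """Return the RNA (5'->3') made before the first position whose NTP is missing.

    template_3to5: template strand written 3'->5', starting at +1 (hypothetical input).
    ntps: one-letter codes of the supplied NTPs, e.g. {"A", "U", "G"}.
    """
    rna = []
    for base in template_3to5.upper():
        if base not in PAIR:
            raise ValueError(f"unexpected template base: {base!r}")
        needed = PAIR[base]
        if needed not in ntps:
            break  # polymerase halts: the required NTP is absent
        rna.append(needed)
    return "".join(rna)


# Hypothetical G-free template cassette followed by a single template G.
# Omitting CTP lets synthesis run through the cassette and stop at that G.
cassette = "TACTTA" + "G"
print(extend_until_stall(cassette, {"A", "U", "G"}))  # AUGAAU
```

With all four NTPs supplied, the same call returns the full transcript `AUGAAUC`, which shows that the halt in the example is caused only by the missing CTP.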
Cryo-EM structure of the vaccinia pre-initiation complex
Biochemical studies had previously shown that vRNAP, VETF and Rap94 are required for early transcription initiation31,32,26. We identified one particle class in our reconstitution that contained these factors together with the DNA scaffold and thus represented the bona fide PIC31,32. Its single-particle reconstruction reached an overall resolution of 3.0 Å but showed only diffuse density for DNA and VETF. Signal subtraction and focused refinement resolved the VETF-DNA subcomplex (Supplemental Fig. 1e-i, Supplemental Tab. 1). The core vRNAP model was docked into the density, and the VETFl and VETFs chains as well as parts of Rap94 were traced de novo, allowing complete modelling of the PIC (Figs. 1c,d). Within the PIC, the promoter is positioned at the distal edge of the polymerase cleft. The upstream DNA contacts the protrusion domain of the polymerase subunit Rpo132, directly adjacent to the C-terminal domain (CTD) of Rap94. The downstream promoter region contacts the vRNAP core at the clamp head (Supplemental Fig. 2a). The melted promoter region is largely disordered but could be visualized with mild Gaussian filtering (Fig. 1e). It lies centrally above the opening of the cleft, forming a second contact zone with the clamp head (Supplemental Fig. 2a). Within this bubble region, the two DNA strands appear only minimally separated. The bubble joins the adjacent double-helical upstream and downstream sections at a 100° angle, accompanied by a 25 Å translational shift of the helix axes (Fig. 1e). We thus conclude that the DNA is in the initially melted state.
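The bend angle and translational shift quoted above describe the relative geometry of two helix axes. As a hedged illustration of how such values can be derived, the Python sketch below computes the angle between two axis directions and the shortest distance between the two axes. Several assumptions apply. Each axis is assumed to be given as a point plus a direction, for example from a line fit to base-pair centres. The helper names are hypothetical. The translational shift is treated as the closest-approach distance between the axes, which is one possible definition and not necessarily the one used here.

```python
import math

Vec = tuple[float, float, float]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec) -> float:
    return math.sqrt(_dot(a, a))


def axis_angle_deg(d1: Vec, d2: Vec) -> float:
    """Angle (degrees) between two helix-axis direction vectors.

    The value depends on the sign convention chosen for each direction vector.
    """
    c = _dot(d1, d2) / (_norm(d1) * _norm(d2))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding error


def axis_offset(p1: Vec, d1: Vec, p2: Vec, d2: Vec) -> float:
    """Shortest distance between two infinite axes, each given by a point and a direction."""
    w = (p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2])
    n = _cross(d1, d2)
    if _norm(n) < 1e-9:  # parallel axes: distance from p2 to axis 1
        return _norm(_cross(w, d1)) / _norm(d1)
    return abs(_dot(w, n)) / _norm(n)


# Synthetic example (not measured data): upstream axis along x, downstream axis
# tilted by 100 degrees in the xy-plane and displaced by 25 Å along z.
up_p, up_d = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
t = math.radians(100.0)
down_p, down_d = (0.0, 0.0, 25.0), (math.cos(t), math.sin(t), 0.0)
print(round(axis_angle_deg(up_d, down_d), 1), round(axis_offset(up_p, up_d, down_p, down_d), 1))  # 100.0 25.0
```

Bend angles are sometimes reported as the deviation from a straight helix (180° minus the inter-axis angle), so the direction convention should be stated whenever such numbers are compared.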
Of note, neither the B-cyclin domain nor the B-homology region of the early transcription factor Rap94 establishes direct DNA contacts in the PIC (Fig. 1d and Supplemental Fig. 2a). Instead, on the opposite side of the core vRNAP, VETFl and VETFs contact the DNA in the distal upstream and downstream promoter regions, respectively (Fig. 1f, Supplemental Figs. 2b,c). Because these contacts flank the initially melted region (IMR), which itself is not contacted, the VETF heterodimer appears to span the promoter like a bridge anchored on its upstream and downstream regions (Fig. 1d and Supplemental Fig. 2b).
Unique mode of DNA-binding by the VETF heterodimer
The structure of VETF allowed us to decipher the mechanism of vRNAP recruitment to the early promoter. VETFl folds into five distinct domains, termed NTD, TBPLD, CRBD, Domain 4 and CTD (Figs. 1c,f). Although no sequence homology was detectable a priori, the second domain adopts a bilobal TBP fold and hence constitutes a TBP-like domain (TBPLD). It is located centrally above the polymerase cleft and, unlike TBP in other PIC structures, contacts the promoter in a sequence-independent manner. Sequence-specific DNA binding in the vaccinia PIC is instead mediated by the neighbouring domain, which recognizes the CR (Figs. 1a,d,f). Based on its fold and binding mode, this module constitutes a novel type of double-stranded DNA-binding domain, which we term the Critical Region Binding Domain (CRBD). Although it contains few secondary structure elements, it is rigidified by three disulphide bridges that position a 3₁₀-helix ideally for insertion into the major groove of the DNA (Fig. 2a). The side-chain-to-base contacts of this helix are the major site of sequence-specific readout of the promoter (Figs. 2b,c). The DNA helix axis is only weakly bent in this region (Fig. 2a).
Together, the TBPLD and CRBD of VETFl establish specific contacts with the upstream promoter (Supplemental Fig. 2b). On the core vRNAP, this part of the promoter is anchored through the interaction of domain 2 of Rap94 with the NTD of VETFl (Fig. 1d). The remaining domains of VETFl (NTD, Domain 4 and CTD) form the structural backbone of VETF, with Domain 4 and the CTD constituting the interface to VETFs (Fig. 1f).
The downstream promoter interacts almost exclusively with VETFs (Figs. 1d, 2d and Supplemental Fig. 2b). The only additional, localized contact with the core vRNAP is made by the clamp head close to the TSS (Supplemental Fig. 2a). The first two domains of VETFs strikingly resemble the canonical helicase fold of SNF2-type chromatin-remodelling ATPases22,33, of which INO80 34 is the closest homologue. Like INO80 and the vRNAP-associated transcription factor NPH-I, VETFs contains an extended brace helix that stably bridges the N- and C-lobes of the helicase fold (Supplemental Fig. 6). The extensive DNA interaction of the VETFs helicase module is accompanied by strong bending of the DNA helix (Supplemental Fig. 2e). At the point of inflection, Phe271 intercalates via the minor groove, disrupting planar base stacking over roughly 3 base pairs on either side of the insertion site (Fig. 2d). Although the two DNA strands are not melted at this position, this mechanism bears some similarity to the 'scalpel' mechanism of strand-separating helicases35.
Promoter positioning and enforcement of transcription directionality in the PIC
We next asked how the DNA contacts established by the CRBD of VETFl control the initiation process. The 3₁₀-helix of the CRBD inserts into the major groove and thus acts as the reader head of VETF (hence termed the CRBD reader; Figs. 2a,b). The CR is essentially a consensus sequence of 15 A nucleotides interrupted by a TG dinucleotide30,36 (Figs. 1a and 2c). Arg370 and Gln375 form base-specific hydrogen bonds with the TG motif on the non-template strand and with the complementary AC dinucleotide on the template strand (Figs. 2b,c, Supplemental Video 1). In this way, VETFl anchors the promoter in a defined position relative to the polymerase cleft. The CR displays a high propensity for A nucleotides downstream of the TG motif (Figs. 1a and 2c). Accordingly, the C5 methyl groups of the complementary T nucleotides at positions -18 and -17 of the template strand interact cooperatively with the reader head by stacking with Tyr376. Binding of the promoter in the inverse orientation would instead bring Tyr376 into unfavourable contact with adenine bases (Figs. 2b,c), so that a single promoter orientation is enforced. The CRBD-DNA interaction thus ensures (i) identification of the CR, (ii) alignment of the CR relative to the polymerase cleft and (iii) enforcement of transcription directionality. The CRBD is therefore the main regulator of the transcription initiation process.
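The directional readout described above can be caricatured as a sequence rule: a TG dinucleotide followed by an A-rich stretch on the non-template strand. The Python sketch below is a deliberately simplified toy and does not model the observed contacts or their energetics. It scores each strand of a duplex by the number of A nucleotides in a short window 3' of a TG and picks the higher-scoring strand as the putative non-template strand. In the inverse orientation, the A-rich stretch reads as T-rich and the score drops, loosely mirroring how inverse binding would place adenines rather than thymine methyl groups against Tyr376. The window length, the scoring rule and all function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cr_score(strand: str, window: int = 4) -> int:
    """Toy score: max number of A in the `window` nt 3' of any TG (-1 if no TG)."""
    strand = strand.upper()
    best = -1
    for i in range(len(strand) - 1):
        if strand[i:i + 2] == "TG":
            best = max(best, strand[i + 2:i + 2 + window].count("A"))
    return best


def orient_promoter(top: str, window: int = 4) -> Optional[str]:
    """Guess which strand of a duplex (top strand given 5'->3') is the non-template strand.

    Returns "top", "bottom", or None when the toy rule cannot decide.
    """
    fwd, rev = cr_score(top, window), cr_score(revcomp(top), window)
    if fwd == rev:
        return None
    return "top" if fwd > rev else "bottom"


cr_like = "GCAAAAAAATGAAAAAAGC"  # hypothetical A-rich, CR-like stretch containing a TG
print(orient_promoter(cr_like))           # top
print(orient_promoter(revcomp(cr_like)))  # bottom
```

A real promoter-recognition model would weight individual positions (for example with a position weight matrix) instead of simply counting A nucleotides. The toy rule only illustrates why an asymmetric motif fixes a single binding orientation.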
Asymmetric DNA binding by the TBP-like domain of VETFl
Our structure identified VETFl as a TBP-like protein (TBPLP) whose TBPLD is engaged in an intricate contact network with the neighbouring domains of VETFl, VETFs and Rap94 (Fig. 3a). Members of the TBPLD family had previously been identified solely by sequence homology. VETFl, however, stands apart from previously known TBPLPs: its extremely divergent sequence had so far prevented its classification as such. Nevertheless, the TBPLD is structurally comparatively well conserved, and matching it to PDB entry 1TBP with PDBeFold 37 yields a Z-score of 4.2. To compare structures and binding modes, we aligned the TBPLD-upstream DNA module of VETFl (Fig. 3b) with the yeast TBP-TATA-box crystal structure (Fig. 3c). The TBPLD of VETFl features the characteristic saddle structure previously described for TBP38-41; however, the symmetry that is evolutionarily conserved in TBP42,43 appears broken. Consequently, unlike TBP, which contacts the TATA box symmetrically, VETFl binds the promoter asymmetrically and sequence-independently, solely through its C-terminal TBP lobe. Most strikingly, the TBPLD inserts into the DNA major groove, in contrast to the canonical binding mode of TBP, which relies on minor-groove insertion. In line with this, the two strictly conserved pairs of DNA-intercalating phenylalanine residues of TBP, one pair per lobe, are absent from the TBPLD38-41. Still, the TBPLD induces a pronounced DNA bend through intercalation of aliphatic rather than aromatic side chains (Fig. 3b). Consistent with this fundamentally different binding mode, vaccinia early promoters lack a consensus TATA box30.
Rearrangement of the complete vRNAP into the PIC
Complete vRNAP is the predominant vRNAP complex in infected cells and is necessary and sufficient for viral early transcription. We therefore previously speculated that it is incorporated into virions as a pre-assembled unit that enables transcription to restart in the next infection cycle21. To investigate how complete vRNAP is converted into the PIC, we compared both structures and their cryo-EM reconstructions. The VETF heterodimer is already present in complete vRNAP, but well-defined density was observed only for the CRBD of VETFl, whereas the remaining parts were mobile. Assuming that the adjacent TBPLD is flexibly linked to the CRBD, we docked the VETFl coordinates extracted from the PIC model into the diffuse residual density of the complete vRNAP reconstruction and obtained a reasonable fit. In the resulting structure (Fig. 1g), VETFl forms a flexible interface with tRNAGln. A comparison with the PIC structure reveals major rearrangements, including the release of all factors associated with complete vRNAP except for the VETF heterodimer and Rap94 (Supplemental Video 1). This underscores the importance of complete vRNAP as a pre-formed early transcription unit and the high plasticity of vaccinia transcription complexes (see also Supplemental Video 1 for a summary of core aspects of the PIC).
Structure of the late pre-initiation complex
The structural transition described above explains how complete vRNAP is recruited to the viral early promoter to form the PIC. We next determined the structures of vRNAP particle classes that represent bona fide transcription stages following pre-initiation. Based on biochemical evidence, such particles are expected to lack VETF but to contain Rap94. Particles of class 1, subclass 2 (Supplemental Fig. 3a), which yielded a reconstruction at 3.0 Å resolution (Supplemental Figs. 1b-d, Supplemental Tab. 2), fulfilled this criterion. The complete vRNAP model21 could be docked into the density. Disordered density corresponding to DNA is visible upstream next to the Rap94 CTD and within the downstream DNA channel. These sites roughly coincide with the DNA anchor points on the core vRNAP observed in the PIC (compare Fig. 1d). However, no density for the DNA transcription bubble or nascent RNA was detected in the active cleft (Fig. 4a). Instead, we found well-defined density for the highly phosphorylated stretch within the C-terminus of Rpo30 (termed the phospho-peptide domain, PPD; Fig. 4b). The PPD adopts a conformation similar to that in complete vRNAP21 and follows the path of the template and non-template strands in the elongation complex (EC). This allows it to pair with the B-reader of Rap94 (Figs. 4a,b) and enables single-strand capture at later stages (see below). We therefore conclude that this particle represents a late state of the PIC (lPIC) in which VETF has been expelled and the melted promoter has been handed over to the core vRNAP, but transcription has not yet been initiated.
PPD-assisted single-strand capture and formation of the ITC
Next, we investigated the structural basis of the conversion of the lPIC into an initially transcribing complex (ITC). Three vRNAP particle classes yielded reconstructions that were identified as different conformations of the ITC based on their composition and promoter positioning (ITC1-3, Supplemental Figs. 3a-d, Supplemental Table 3). The exact location of the polymerase on the promoter could be determined because the downstream blunt end of the DNA was readily visible in the density (Supplemental Fig. 5a). In contrast to the lPIC, we observed ordered density for DNA in the downstream DNA channel and for a DNA/RNA hybrid above the active site (Fig. 5a). The PPD of Rpo30, which occupied the position of the DNA/RNA hybrid in the lPIC, has been displaced by the template strand. Consequently, the B-homology region has become mobile and is not visible in the density (Fig. 5b). No density for upstream DNA was observed. The three ITCs superimposed well but differed in the positioning of the DNA within the downstream DNA channel (Fig. 5a) and in the state of the clamp (Fig. 7b). In ITC1, the clamp is closed and the DNA is bound firmly and deep in the downstream DNA channel. ITC2 and ITC3 display an open clamp, and their downstream DNA appears mobile and bound in a shallower position; in ITC3, this density is comparatively less ordered. No significant differences between the three ITCs were discernible in the DNA/RNA hybrid region. Together, the three ITC structures inform on the conformational flexibility of the ITC and, in concert with the lPIC structure, on the mechanism of template-strand capture.
Upstream promoter scrunching in the late initially transcribing complex
During 3D classification, one class stood out because it comprised particles considerably larger than the ITC (Supplemental Fig. 4a). Further focused classification of these particles on the extra density, followed by multibody refinement, yielded a reconstruction that allowed building a complete model (Fig. 6a, Supplemental Figs. 4b-d and Supplemental Tab. 4). This complex was classified as a late ITC (lITC) based on the positions of the blunt ends of the upstream and downstream promoter DNA segments, which are visible in the density (Supplemental Fig. 5b), and on the presence of a DNA/RNA hybrid. Apart from Rap94, the vRNAP complex adopted a conformation similar to that observed in the ITCs. The path of the downstream DNA best matched that observed in ITC3, indicating loose binding. The downstream blunt end of the DNA scaffold had advanced by roughly 5 base pairs in the downstream direction compared with the ITC (Supplemental Fig. 5a). Massive extra density above the cleft was unambiguously attributed to NPH-I bound to upstream DNA and to the NTD and B-cyclin domain of Rap94 (Fig. 6a). Strikingly, the Rap94 B-homology region, the NTD and adjacent linkers were entirely reconfigured compared with other vRNAP complexes (Supplemental Figs. 5d,e), and the entire path of the Rap94 chain was visible (Fig. 6b). We also note that the path of the upstream DNA in the lITC differs fundamentally from that observed in the vaccinia PIC and in the ITC of Pol II 44.
The blunt ends of the DNA promoter scaffold are visible in the EM density of the lITC (Supplemental Fig. 5b), which allows both the size of the transcription bubble and the position of vRNAP relative to it to be determined (Supplemental Fig. 5c). Strikingly, the upstream end of the scaffold can be accommodated within the lITC only if massive promoter scrunching is assumed. This scrunching involves 13 base pairs upstream of the artificial non-complementary region of the promoter scaffold that have been additionally melted compared with the ITC (Supplemental Fig. 5c). This scrunched state likely enables promoter escape and hence contributes to the transition from the initiation phase to productive elongation (Supplemental Video 2).