1H, 15N and 13C resonance assignments of a minimal CPSF73-CPSF100 C-terminal heterodimer

The initial pre-mRNA transcript in eukaryotes is processed by a large multi-protein complex in order to correctly cleave the 3’ end, and to subsequently add the polyadenosine tail. This cleavage and polyadenylation specificity factor (CPSF) is composed of separate subunits, with structural information available for both isolated subunits and also larger assembled complexes. Nevertheless, certain key components of CPSF still lack high-resolution atomic data. One such region is the heterodimer formed between the first and second C-terminal domains of the endonuclease CPSF73, with those from the catalytically inactive CPSF100. Here we report the backbone and sidechain resonance assignments of a minimal C-terminal heterodimer of CPSF73–CPSF100 derived from the parasite Encephalitozoon cuniculi. The assignment process used several amino-acid specific labeling strategies, and the chemical shift values allow for secondary structure prediction.

Cameron D. Mackereth (cameron.mackereth@inserm.fr)

Biological context
The initial pre-mRNA transcript in eukaryotes must be processed in order to create the functional mRNA. One of the major modifications is a precise endonucleolytic cleavage at the 3' end of the pre-mRNA, followed by the addition of a polyadenosine (or polyA) tail. This processing is carried out by a large multi-subunit complex known in metazoa as the cleavage and polyadenylation specificity factor (CPSF). Based on biochemical and genetic approaches, the endonuclease CPSF73 was determined to be directly responsible for the cleavage reaction (Ryan et al. 2004; Dominski et al. 2005; Dominski 2010). The activity is contained within the N-terminal half of CPSF73 from residues at the interface of the metallo-β-lactamase domain and the β-CASP inserted cassette (Mandel et al. 2006). The paralogue CPSF100 has a similar architecture to CPSF73, and although it is catalytically inactive it is required for CPSF73 activity (Mandel et al. 2006; Dominski 2010). The N-terminal domains of CPSF73 and CPSF100 contact each other, as evident from cryo-EM studies of human CPSF and the human histone pre-mRNA 3'-end processing machinery (Sun et al. 2020).
The C-terminal regions of CPSF73 and CPSF100 also display conserved domains, and previous studies had suggested that the C-termini of both proteins are in direct contact (Michalski and Steiniger 2015; Lin et al. 2017). The cryo-EM studies noted above provide some information on the nature of this C-terminal interaction (Sun et al. 2020). However, the lower resolution in the cryo-EM maps for the C-terminal domains of CPSF73 and CPSF100 prevents atomic insight into their structure and interaction surface. Nevertheless, the available density supported an overall architecture likely similar to that observed between the C-terminal regions of IntS11 and IntS9 (paralogues of CPSF73 and CPSF100, respectively) within the Integrator complex (Albrecht and Wagner 2012; Wu et al. 2017; Zheng et al. 2020; Pfleiderer and Galej 2021).
Using CPSF73 and CPSF100 from the parasite Encephalitozoon cuniculi, we were able to produce a soluble and stable minimal heterodimer formed by the C-terminal regions of each protein. The study of this minimal C-terminal heterodimer by NMR spectroscopy will allow for atomic investigation into the architecture of this interaction that is missing in the cryo-EM studies. As a first step towards this characterization, we have determined a near complete chemical shift assignment of backbone and sidechain 1H, 13C, and 15N nuclei.

Methods and experiments
Residues 452-567 of E. cuniculi CPSF73 (GenBank KMV65242.1) and residues 525-639 of E. cuniculi CPSF100 (GenBank NP_597379) were amplified by PCR from cDNA clones (Katinka et al. 2001). A ligation-independent cloning protocol used the NdeI and BamHI restriction sites present on modified pET vectors. The ecCPSF73(452-567) fragment was inserted into a pET-MCN ampicillin-resistant plasmid for expression following a His6-tag and tobacco etch virus (TEV) protease site. To allow for coexpression, the ecCPSF100(525-639) fragment used a pCDF-MCN plasmid for streptomycin resistance and was expressed without any purification tags.
Protein expression used Escherichia coli BL21(DE3) lysY (New England Biolabs) transformed with both expression vectors. For unlabelled protein, 500 mL of lysogeny broth (LB) with 100 mg/L ampicillin and 50 mg/L streptomycin was incubated with shaking at 37 °C, using bacteria from a 10 mL LB overnight culture also grown at 37 °C. Expression was induced with 0.25 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) at an OD600nm of 0.6, followed by overnight growth at 25 °C. Uniform incorporation of stable isotopes used the same protocol but with M9 media supplemented with 1 g/L [15N]-ammonium chloride and 2 g/L [13C]-D-glucose. Fractional [13C]-labelling used 0.2 g/L [13C]-D-glucose and 1.8 g/L natural abundance D-glucose, with 1 g/L [15N]-ammonium chloride also added. Amino acid-specific labelling of isoleucine, valine or leucine used 500 mL of a base medium of M9 media with 1 g/L [15N]-ammonium chloride and 2 g/L natural abundance D-glucose. Thirty minutes prior to induction, 100 mg/L [13C,15N]-Ile, [13C,15N]-Val, or [13C,15N]-Leu was added to the medium. Specific labelling of deuterated amino acids used a similar strategy, with either 100 mg/L [2H]-Phe in 500 mL M9, or a combination of 100 mg/L [2H]-Tyr and 100 mg/L [2H]-Trp in a culture size of 250 mL.
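The media recipes above are given as concentrations, so the amounts to weigh out depend on the culture volume. The following Python sketch converts the quoted concentrations into masses and checks the labelled fraction of the glucose in the fractional 13C medium. The function names are hypothetical, and the mass fraction is used as an approximation of the labelled fraction, ignoring the slightly higher molar mass of [13C]-glucose.

```python
def mass_for_culture(conc_per_litre: float, volume_ml: float) -> float:
    """Amount to add for a given culture volume, in the same mass unit as conc_per_litre."""
    if conc_per_litre < 0 or volume_ml <= 0:
        raise ValueError("concentration must be >= 0 and volume must be > 0")
    return conc_per_litre * volume_ml / 1000.0


def labelled_glucose_fraction(labelled_g_per_l: float, unlabelled_g_per_l: float) -> float:
    """Mass fraction of the glucose that is 13C-labelled (approximation, see lead-in)."""
    total = labelled_g_per_l + unlabelled_g_per_l
    if total <= 0:
        raise ValueError("total glucose concentration must be > 0")
    return labelled_g_per_l / total


# Fractional labelling medium from the text: 0.2 g/L labelled + 1.8 g/L unlabelled glucose.
fraction = labelled_glucose_fraction(0.2, 1.8)
print(f"Labelled glucose fraction: {fraction:.0%}")

# Labelled amino acid at 100 mg/L for the two culture sizes mentioned in the text.
for volume_ml in (500.0, 250.0):
    mg = mass_for_culture(100.0, volume_ml)
    print(f"{volume_ml:.0f} mL culture: {mg:.0f} mg of labelled amino acid")
```

With the values from the text, the glucose is about 10% labelled, and the 500 mL and 250 mL cultures require 50 mg and 25 mg of each labelled amino acid, respectively.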
Following expression, the bacteria were collected by centrifugation at 4500 × g for 15 min at 4 °C. The cell pellet was resuspended in Binding Buffer containing 20 mM TRIS (pH 7.5), 500 mM NaCl, 2.5 mM imidazole, and 5% (v/v) glycerol, supplemented with lysozyme, and stored at -80 °C. For purification, the frozen sample was thawed on ice then subjected to sonication on ice at 20% power for 5 min with alternating 30 s pulse and recovery periods. The lysate was cleared by centrifugation at 20,000 × g for 1 h at 4 °C, passed through a Whatman GD/X 25 filter, and added to a 1 mL Nuvia (Bio-Rad) Ni2+ column. After 15 mL of additional Binding Buffer, the column was washed with 7 mL of the same buffer but with 25 mM imidazole. Elution used the buffer with 500 mM imidazole, collected in 1 mL fractions. Samples containing protein were identified by a rapid Bradford assay, and 2.5 mL were pooled and exchanged back to the Binding Buffer by using PD-10 columns. After an overnight digest with 100 µg/mL His6-tagged tobacco etch virus protease (TEV) at 4 °C, the sample was added to a 1 mL Nuvia Ni2+ column and the flow-through was collected. An additional 1.5 mL of Binding Buffer was added to ensure that all cleaved protein was collected. The sample was concentrated to 500 µL using a 10 kDa cutoff Amicon Ultra 4 mL centrifugation device then changed to the final NMR buffer of 20 mM TRIS (pH 7.0), 150 mM NaCl, 2 mM DTT using a NAP-5 column. A final concentration of the sample again used a 10 kDa cutoff Amicon Ultra 4 mL centrifugation device. Analysis of the samples by SDS-PAGE confirmed sample quality and a 1:1 ratio of ecCPSF73(452-567):ecCPSF100(525-639). Heterodimer protein concentration was determined from absorbance at 280 nm and an extinction coefficient of 22,970 M−1 cm−1 calculated by using ProtParam (https://web.expasy.org/protparam/).
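The concentration determination described above follows the Beer-Lambert law, A = ε·c·l. The Python sketch below applies it with the extinction coefficient quoted in the text. The function names, the 1 cm default path length, and the dilution factor are illustrative assumptions and not details reported for the original measurements.

```python
# Extinction coefficient of the heterodimer at 280 nm, as quoted in the text (M^-1 cm^-1).
EXTINCTION_COEFF = 22970.0


def concentration_micromolar(a280: float, path_length_cm: float = 1.0,
                             dilution_factor: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), corrected for dilution, in micromolar."""
    if path_length_cm <= 0 or dilution_factor <= 0:
        raise ValueError("path length and dilution factor must be positive")
    molar = a280 * dilution_factor / (EXTINCTION_COEFF * path_length_cm)
    return molar * 1e6


def expected_a280(conc_micromolar: float, path_length_cm: float = 1.0) -> float:
    """Absorbance expected for an undiluted sample of the given concentration."""
    return conc_micromolar * 1e-6 * EXTINCTION_COEFF * path_length_cm


# A 350 uM sample would give A280 of about 8.04 in a 1 cm cell, far above the
# linear range of most spectrophotometers, so a dilution is assumed here.
undiluted = expected_a280(350.0)
print(f"Expected undiluted A280: {undiluted:.2f}")

# Hypothetical 20-fold dilution, then back-calculation of the stock concentration.
reading = undiluted / 20.0
print(f"Stock concentration: {concentration_micromolar(reading, dilution_factor=20.0):.0f} uM")
```

The dilution step is included because the NMR sample concentrations reported in this work (85 to 880 µM) correspond to absorbance values of roughly 2 to 20 in a 1 cm cell.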
The final samples included 10% (v/v) D2O added for the lock, except for the unlabelled and aromatic-specific deuterated samples, which were prepared in 100% D2O. Assignment spectra were collected at 298 K on a Bruker Neo Avance spectrometer at 700 MHz or 800 MHz, equipped with a standard triple resonance gradient probe or cryoprobe, respectively. NMR data were processed by using NMRPipe/NMRDraw software (Delaglio et al. 1995) and NMR spectra were visualized and analyzed using Sparky (T. D. Goddard & D. G. Kneller, University of California, San Francisco, USA).
The backbone 1HN, 1Hα, 13C', 13Cα, 13Cβ, and 15NH resonance assignments for the minimal C-terminal heterodimer were initially determined on a sample of 350 µM [13C,15N]-ecCPSF73(452-567)/ecCPSF100(525-639) by using the data from a two-dimensional (2D) 1H,15N-TROSY spectrum, and three-dimensional (3D) HSQC versions of HNCO, HN(CA)CO, HNCA, HNCACB, CBCA(CO)NH, HNHA, HA(CACO)NH, and N(COCA)NNH spectra. TROSY versions of 3D HNCO and HNCA spectra were also collected to improve the sensitivity of certain resonances. Due to the large number of residues in the complex, and the corresponding spectral overlap, a standard approach was insufficient to complete the assignment. Unfortunately, it is only possible to produce the complex by co-expression and thus labelling of a single peptide within the complex is not possible. We therefore opted to produce three additional 15N-labelled samples in which only isoleucine, valine or leucine were 13C-labelled. The concentration of the heterodimer in these 13C-Ile, 13C-Val and 13C-Leu samples was 625, 150 and 280 µM, respectively. TROSY 3D HNCO and HNCA spectra were acquired for the 13C-Ile, 13C-Val and 13C-Leu samples, and 2D 1H-15N planes of HSQC-HNCO and HSQC-HNCA spectra were also measured for the 13C-Ile and 13C-Val samples. Annotated overlays of these latter spectra are included as examples (Fig. 1A,B).
Isoleucine and valine sidechain resonances required 3D H(C)CH-TOCSY and (H)CCH-TOCSY spectra measured on the 625 µM 13C-Ile and 150 µM 13C-Val samples, respectively. In addition, a 3D (H)CCH-TOCSY spectrum was acquired on the 280 µM 13C-Leu sample. Methyl-centred constant-time 13C-HSQC spectra were collected for all four samples, and additionally on an 880 µM 10% 13C-labelled sample in order to allow for stereospecific methyl assignment (Senn et al. 1989). Since this latter sample was also uniformly 15N-labelled, it was used to collect a 3D 15N-HSQC-TOCSY (60 ms mixing time) spectrum to validate the connection between the sidechain assignments and backbone amide resonances.
Aromatic side chain resonances were assigned based on 2D 1H,1H-NOESY (120 ms mixing time), 1H,1H-TOCSY (60 ms mixing time) and double-quantum-filtered 1H,1H-COSY spectra, from a 458 µM unlabelled sample prepared in D2O. The assignment process was assisted by the use of two additional samples in which one or more deuterated aromatic amino acids were used during the expression. 1H,1H-TOCSY (60 ms mixing time) and double-quantum-filtered 1H,1H-COSY spectra were collected on a sample of 2H-Phe (150 µM), and a sample including both 2H-Tyr and 2H-Trp (85 µM), with each sample prepared in D2O. By using an overlay of all three spectra, the sidechain aromatic 1H spin systems were assigned (Fig. 1C,D). The aromatic assignments were subsequently connected to the aliphatic part of the residue by using the initial 2D 1H,1H-NOESY on the unlabelled heterodimer in D2O.
Fig. 1 Selected spectra from heterodimer samples with amino acid-specific labelling. (A,B) 2D HNCO spectra (red) corresponding to 1H-15N crosspeaks, overlaid onto 2D HNCA (black) for samples that are uniformly 15N-labelled with additional 13C-labelling for (A) isoleucine, or (B) valine. Crosspeaks are labeled with the residue name (single amino acid letter code) and number, with residues from ecCPSF73 in normal text, and those from ecCPSF100 in italics. (C) 1H,1H-TOCSY and (D) 2D double-quantum-filtered COSY of unlabelled heterodimer in D2O (black), overlaid with similar spectra collected from samples labelled with 2H-phenylalanine (red) or a combination of 2H-tyrosine and 2H-tryptophan (cyan). Example spin systems corresponding to aromatic protons in Tyr462 in ecCPSF73 and Phe625 in ecCPSF100 are annotated with dotted lines. The spectra were recorded at 298 K at a field strength of 700 MHz.

Extent of assignments and data deposition
By using four different samples, the backbone resonance assignment is essentially complete (Fig. 2). Missing backbone resonances are mainly due to line broadening in the N-terminal Gly-His-Met-Leu residues of the ecCPSF73(452-567) peptide, and for the ecCPSF100(525-639) peptide the N-terminal Met-Ser-Asp, residues Pro570-Arg571, residue Gly612, and several residues within a likely loop from Gly623-Tyr626. There are also fourteen other backbone resonances scattered throughout the two peptides that cannot be unambiguously assigned. The backbone assignments are 95% complete for 1HN, 93% for 15NH, 94% for 13C', 97% for 13Cα, and 95% for 13Cβ. Anomalous backbone chemical shift values for ecCPSF73 include upfield-shifted Ser507 1HN (5.79 ppm) and Asp485 15NH (112.96 ppm), and downfield-shifted Glu547 1Hα (5.24 ppm). For ecCPSF100, anomalous

Conflict of interest
The authors declare that they have no conflict of interest.

Chemical shift analysis
As a first step in the characterization of the minimal C-terminal heterodimer of ecCPSF73-ecCPSF100, secondary structure predictions were obtained by using TALOS-N (Fig. 3A,B) from the online server (https://spin.niddk.nih.gov/bax/nmrserver/talosn/; Shen and Bax 2013). Secondary chemical shift values (Δδ) were also determined (Fig. 3C,D) based on a comparison to backbone 13Cα and 13C' values predicted for the random coil by using the online ncIDP server (https://st-protein02.chem.au.dk/ncIDP/; Tamiola et al. 2010). The overall pattern of secondary structure elements is similar between the constructs from ecCPSF73 and ecCPSF100.
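The secondary chemical shift is the difference between the observed shift and the random coil reference, Δδ = δ(observed) − δ(random coil). In general, positive 13Cα and 13C' values indicate helical conformation and negative values indicate extended (β-strand) conformation. The Python sketch below illustrates the calculation. All shift values, the residue numbering, the function names, and the 0.7 ppm threshold are hypothetical illustrations, not data or parameters from this study. In practice, the random coil values would be taken from a sequence-specific predictor such as the ncIDP server.

```python
from typing import Dict, Tuple

ShiftTable = Dict[Tuple[int, str], float]  # (residue number, atom name) -> shift in ppm


def secondary_shifts(observed: ShiftTable, random_coil: ShiftTable) -> ShiftTable:
    """Return delta-delta = observed - random coil for every nucleus present in both tables."""
    return {key: observed[key] - random_coil[key]
            for key in observed if key in random_coil}


def tendency(delta_ppm: float, threshold_ppm: float = 0.7) -> str:
    """Rough label for a 13CA or 13C' secondary shift; the threshold is illustrative only."""
    if delta_ppm > threshold_ppm:
        return "helix-like"
    if delta_ppm < -threshold_ppm:
        return "strand-like"
    return "coil-like"


# Hypothetical values for two residues (not taken from the assignments reported here).
observed = {(10, "CA"): 58.5, (10, "C"): 178.5, (25, "CA"): 54.0, (25, "C"): 174.5}
random_coil = {(10, "CA"): 56.0, (10, "C"): 176.5, (25, "CA"): 56.0, (25, "C"): 176.0}

for (residue, atom), delta in sorted(secondary_shifts(observed, random_coil).items()):
    print(f"residue {residue} {atom}: {delta:+.2f} ppm ({tendency(delta)})")
```

A per-residue plot of these values for consecutive residues, rather than a single-residue label, is what reveals secondary structure elements, since isolated deviations can arise from local effects such as ring currents.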