CTCF ZF domain is enriched for somatic missense mutations in cancer
We analysed cancer genome sequencing databases and published mutation data to determine the distribution, frequency and nature of somatic mutations occurring in CTCF (Supplementary Table 1). The distribution and frequency of all known somatic mutations in CTCF are shown with recurrent mutant residues indicated (Fig. 1A). The recurrent T204fs*26 and T204fs*18 mutations in CTCF arise due to a high frequency of insertions or deletions within a 30 bp purine-rich (> 85%) region at c.1048–c.1077 encoding T204. Recurrent missense or nonsense mutations occur at H284, S354, R377, R448 and R457 within the ZF region of CTCF (Fig. 1A). Further analysis revealed that inactivating nonsense and frameshift mutations account for ~ 40% of somatic CTCF mutations (Fig. 1B). This result exceeds the ‘20/20 rule’ for tumour suppressor gene classification, which requires that > 20% of somatic mutations are inactivating5, and affirms our earlier work demonstrating CTCF’s role as a tumour suppressor31–33. CTCF mutations occur prominently within hormone-responsive cancers arising in the endometrium and breast (~ 48%) (Fig. 1C).
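The threshold comparison behind the ‘20/20 rule’ statement can be illustrated with a minimal sketch. The function names and the mutation counts below are hypothetical and chosen only to give a round number; they are not the counts in Supplementary Table 1, and the choice of denominator (all recorded mutations) is an assumption.

```python
def inactivating_fraction(counts):
    """Share of recorded mutations that are nonsense or frameshift."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations supplied")
    inactivating = counts.get("nonsense", 0) + counts.get("frameshift", 0)
    return inactivating / total


def exceeds_inactivating_threshold(counts, threshold=0.20):
    """True when the inactivating share is above the 20% threshold."""
    return inactivating_fraction(counts) > threshold


# Hypothetical counts for illustration only (not taken from the study).
example_counts = {"missense": 60, "nonsense": 15, "frameshift": 25}

fraction = inactivating_fraction(example_counts)
print(f"inactivating fraction = {fraction:.2f}")
print("exceeds 20% threshold:", exceeds_inactivating_threshold(example_counts))
```

With these illustrative counts the inactivating share is 40 of 100 mutations, which is above the 20% threshold.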
We next examined somatic missense mutations and SNPs reported for CTCF and compared their observed and expected occurrences (Supplementary Table 2). CTCF’s ZF domain is significantly enriched for somatic missense mutations relative to the number expected for its size, with an observed/expected (O/E) ratio of 1.47 (p < 0.0001). Furthermore, non-synonymous SNPs were depleted within the ZF domain (O/E = 0.48, p < 0.0001) (Fig. 1D, Supplementary Table 2). These results suggest that the human CTCF ZF region is intolerant to normal genetic variation, but is frequently inactivated in cancer. As ZF mutations would likely affect DNA binding, these are likely to have a significant impact on CTCF function. There is a concomitant paucity of missense somatic mutations within the N- and C-termini of CTCF (O/E = 0.63, p < 0.0001 and O/E = 0.65, p = 0.0269 respectively, Fig. 1D, Supplementary Table 2). Strikingly, the opposite trend is observed for SNPs in CTCF, with an enrichment of missense SNPs in the N-terminus (O/E = 1.27, p = 0.0269) and C-terminus (O/E = 1.63, p = 0.0032) (Fig. 1D, Supplementary Table 2). We then determined the potential functional impact of somatic missense mutations in CTCF using Polyphen analysis. Somatic missense mutations exhibited an overall greater functional impact than missense SNPs (0.80 ± 0.35 vs 0.49 ± 0.44, mean ± SD, p < 0.0001, Supplementary Fig. 1A). Further analysis indicated that the ratio of transition to transversion mutations was lower for missense somatic mutations than for SNPs (1.19 vs 2.24 respectively, p < 0.0001, Supplementary Fig. 1B). These data provide further support for the role of CTCF as a tumour suppressor that is frequently mutated and functionally impacted in cancer.
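As an illustration of how an O/E ratio of this kind can be computed, the sketch below takes the expected count to be the total number of mutations scaled by the fraction of the protein occupied by the region, and tests the deviation with an exact two-sided binomial test. The binomial test is an assumption made for this sketch; the statistical test used in the study is not stated in this section. All lengths and counts are hypothetical.

```python
from math import comb


def observed_expected(observed, total_mutations, region_length, protein_length):
    """Return (O/E ratio, expected count) for a region of a protein.

    The expected count assumes mutations fall uniformly along the protein.
    """
    expected = total_mutations * region_length / protein_length
    return observed / expected, expected


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than k.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


# Hypothetical inputs for illustration only (not taken from the study).
total = 100            # all missense mutations in the protein
in_region = 60         # those falling inside the region of interest
region_len = 280       # region length in residues
protein_len = 700      # protein length in residues

ratio, expected = observed_expected(in_region, total, region_len, protein_len)
p_value = binomial_two_sided_p(in_region, total, region_len / protein_len)
print(f"expected = {expected:.1f}, O/E = {ratio:.2f}, p = {p_value:.2e}")
```

An O/E ratio above 1 indicates enrichment and a ratio below 1 indicates depletion relative to the uniform expectation.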
As the majority of somatic missense mutations in CTCF occur within the ZF domain, we next analysed the distribution of missense mutations in specific ZFs of CTCF. We found that the greatest proportion of mutations occurred in ZF4 (~ 20%), followed by ZF3 (~ 15%) (Fig. 1E). ZFs 3–7 have been shown to be responsible for binding CTCF’s core 15 bp consensus, with other ZFs providing binding specificity depending on adjacent motifs15,43. A sequence logo depicting all 11 ZFs in CTCF (10 C2H2- and 1 C2HC-type) shows the conserved Cys and His residues that co-ordinate Zn2+ binding, an invariant hydrophobic Leu or Met residue at +4 and substantial amino acid variation at other positions (Fig. 1F). The proportion of mutations occurring at each position within ZFs was determined. This analysis revealed that inter-ZF mutations accounted for 31.5% of ZF missense mutations, Cys/His mutations for 17.7% and mutations affecting key DNA-binding residues (-1, +2, +3, +6) for 15.6%. Thus, approximately one-third of missense CTCF ZF mutations have an unknown impact but likely affect ZF folding and stability.
CTCF ZF mutations exhibit loss- and gain-of-function in cell growth phenotypes in vitro
To determine the functional consequences of CTCF ZF mutations, we examined missense mutations that had been detected in acute lymphoblastic leukaemia (ALL) samples: L309P (T-ALL; Mullighan unpublished), R339Q39, R377H44 and G420D (diagnosis and relapsed hyperdiploid B-ALL; Mullighan unpublished) (Fig. 2A, Supplementary Table 1). R377H occurs within the inter-ZF region, L309P affects the conserved intra-ZF Leu/Met residue, whilst G420D and R339Q occur at the key DNA-contacting residues +2 and +6, respectively (Fig. 2A). We included R339W as a positive control as it was first identified in Wilms’ tumour as a potential ‘change-of-function’ mutation that abrogated DNA binding to a subset of CTCF sites regulating genes involved in cell proliferation34. All five mutations exhibit high Polyphen scores, indicating they are predicted to significantly impact CTCF function (Fig. 2A). We introduced these mutations into HA epitope-tagged human CTCF within a lentiviral expression vector that co-expresses eGFP via a 2A peptide33. We transduced K562 erythroleukemia cells with CTCF WT and mutant constructs and showed that ectopic CTCF expression occurred at similar levels and above endogenous CTCF levels (Fig. 2B). Immunofluorescent staining for ectopic HA-tagged CTCF indicated that all CTCF mutants maintained nuclear localisation similar to WT CTCF (Fig. 2C). We next examined cell growth and showed that WT CTCF overexpression suppressed cellular proliferation (p < 0.0001), consistent with it being a tumour suppressor and as previously shown33 (Fig. 2D). Mutants L309P, R377H and R339W abrogated the tumour suppressive effect of CTCF and exhibited cellular proliferation similar to the empty vector control (all p < 0.0001 compared to WT), whilst R339Q had an intermediate effect on CTCF’s anti-proliferative function (p < 0.0001 compared to WT; p < 0.001 compared to control, Fig. 2D). K562 cells expressing CTCF G420D exhibited similar proliferation to WT CTCF (Fig. 2D).
We next performed clonogenicity assays and showed that WT CTCF suppressed the colony-forming abilities of K562 cells as expected (p < 0.0001, Fig. 2E). Again, L309P, R377H and R339W abrogated the suppressive effect of CTCF on colony formation (p < 0.0001) whilst R339Q had an intermediate effect compared to control (p < 0.0001) and a near-intermediate effect compared to WT (p = 0.052). Remarkably, G420D exhibited gain-of-function by further reducing the clonogenic capacity compared to WT (Fig. 2E).
CTCF ZF mutations disrupt transcriptional activity
We next examined the impact of ZF mutations on transcriptional regulation by CTCF. Frequently occurring N- and C-terminal somatic missense mutations (Y226C and R603C respectively) were included as additional controls. Y226 is a key anchoring residue in the interaction of CTCF with the SA2-SCC1 cohesin complex45, whilst R603 resides within the nuclear localisation signal. Lentiviral plasmids encoding WT or mutant CTCF (Fig. 3A) were transfected into HEK293T cells followed by quantitation of CTCF protein and eGFP fluorescence levels. All CTCF ZF mutants exhibited decreased levels of ectopic CTCF expression to levels ~ 10–20% of WT, whilst non-ZF mutants demonstrated levels comparable to, or higher than, WT control (Fig. 3B & C). WT CTCF suppressed eGFP expression compared to eGFP control (p < 0.0001, Fig. 3C) whilst decreased eGFP expression was observed with CTCF mutants R339Q, R339W, R377H and G420D compared to WT. L309P had no impact; however, non-ZF mutations (Y226C and R603C) exhibited higher eGFP expression than WT (p = 0.0248 and p = 0.0061 respectively), but similar to empty vector control (Fig. 3D). These data suggested that ectopic plasmid-encoded CTCF could regulate its own expression. This was supported by the prediction of over a dozen putative CTCF binding sites in the CTCF WT and mutant plasmid backbone, including within the CMV promoter that drives viral RNA expression (horizontal dashes, Fig. 3A). Accordingly, CTCF ZF mutants exhibited diminished CTCF protein expression and lower eGFP expression (Fig. 3B, C & D). Collectively, these data indicate that CTCF ZF mutants impact on CTCF’s normal role as a transcriptional regulator, which most likely results from disruption or destabilisation of DNA binding.
To examine this, we performed chromatin immunoprecipitation (ChIP) to determine if ZF-mutant disruption of transcriptional regulation leads to abrogation or alteration of DNA binding at CTCF target sites. Notably, we achieved equivalent levels of HA-tagged WT and ZF mutant CTCF in K562 cells after lentiviral transduction (~ 15–20% for all, Supplementary Fig. 2). We then performed ChIP using an anti-HA antibody, followed by PCR amplification of known CTCF target sites (Fig. 4). We observed both WT and mutant CTCFs still associating with archetypal CTCF target sites such as the H19 imprinting control region (ICR) and the β-globin hypersensitivity site HS5 (Fig. 4A). However, variegated CTCF mutant binding was detected at other cognate CTCF target sites proximal to the regulatory regions of BAG1, MAGEA1, XIST, BRCA1, PLK and APPβ (Fig. 4A). All CTCF ZF mutants exhibited a selective loss of DNA binding, with L309P, R339Q and R377H mutations exhibiting the greatest loss in binding (Fig. 4A–E). All CTCF mutants except G420D exhibited some loss of binding within the archetypal CTCF-regulated gene C-MYC (Fig. 4B). CTCF binding sites within known enhancers (Fig. 4C), insulator sites (Fig. 4D) and TAD boundaries (Fig. 4E) all showed selective binding by most CTCF ZF mutants. As CTCF binding is not completely abrogated, these data are consistent with CTCF ZF mutants displaying a change-of-function rather than loss-of-function.
Molecular dynamics (MD) simulations explain CTCF loss- and gain-of-function ZF mutant phenotypes
To gain insights into the structural impact of these mutations, we modelled them on the published crystal structure of CTCF’s ZF domain (ZFs 2–7) in complex with DNA43 and performed molecular dynamics (MD) simulations. The locations of the 4 mutated ZF residues were superimposed on the CTCF structure (Fig. 5A). The folding free energy change (ΔΔG) calculated for all 5 resulting ZF mutations indicates that each mutation is destabilising. L309P is predicted to have the most severe impact on CTCF folding (ΔΔG = 12.05 kcal/mol), compared to R339Q (ΔΔG = 6.87 kcal/mol), R339W (ΔΔG = 5.00 kcal/mol), R377H (ΔΔG = 5.64 kcal/mol) and G420D (ΔΔG = 1.91 kcal/mol) (Fig. 5B). Time evolution studies of secondary structure in the WT and mutant CTCF ZF domains indicate that structural elements are stable at the location of each mutation (Supplementary Fig. 3). However, β-sheet-forming elements (red) are disrupted by: L309P (ZF2), R339Q, R339W (ZF3) and R377H (ZF4-5) between aa 353–363 in ZF4; and R339W, R377H and G420D (ZF6) between aa 295–305 in ZF2. In all mutants, the β-sheet and turn structure at aa 408–418 (ZF6) is also disrupted (Supplementary Fig. 3).
To examine each mutation in more detail, we compared the superimposed structures of the WT and mutant CTCF ZF domains. L309 faces away from DNA and does not directly contact DNA either before or after mutation to Pro (Fig. 5C). Despite this, analysis of molecular interactions between neighbouring CTCF amino acid residues and DNA revealed that 7 existing bonds were lost, whereas 12 new bonds were formed (Supplementary Table 3). Root-mean-square deviation (RMSD) measurements showed that the L309P mutation induced a substantial increase in the deviation of the ZF2 backbone compared to WT over the 10 ns simulation run (Figs. 6A & B, p < 0.0001). Similarly, root-mean-square fluctuation (RMSF) measurements spanning the entire ZF 2–7 structure (Supplementary Fig. 4) indicated that there was a considerable increase in flexibility (p < 0.0001, Fig. 6C). As a consequence of these conformational changes, the distance of the ZF2 centroid from the DNA centroid was also increased (0.916 Å) in the L309P mutant (Fig. 6D).
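For readers unfamiliar with the two measures, the following minimal sketch shows the standard definitions of RMSD (deviation of one conformation from a reference) and RMSF (per-atom fluctuation about the mean position over a trajectory). It assumes the frames have already been superimposed on the reference, and the three-frame, two-atom trajectory is a toy input. It is not the analysis pipeline used for Fig. 6, which is not described in this section.

```python
from math import sqrt


def rmsd(frame, reference):
    """RMSD between two pre-aligned conformations.

    Each conformation is a list of (x, y, z) atom coordinates.
    """
    if len(frame) != len(reference):
        raise ValueError("conformations must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return sqrt(squared / len(frame))


def rmsf(trajectory):
    """Per-atom RMSF about each atom's mean position across all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(f[i][d] for f in trajectory) / n_frames for d in range(3)]
        mean_sq = sum(
            sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in trajectory
        ) / n_frames
        values.append(sqrt(mean_sq))
    return values


# Toy trajectory: atom 0 is rigid, atom 1 oscillates along y.
frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
frame2 = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0)]
trajectory = [frame0, frame1, frame2]

print("RMSD of frame1 vs frame0:", round(rmsd(frame1, frame0), 4))
print("RMSF per atom:", [round(v, 4) for v in rmsf(trajectory)])
```

A higher RMSD over time indicates that a structure drifts further from its starting conformation, whereas a higher RMSF at a given atom or residue indicates greater local flexibility.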
Arginine 339 at DNA-binding position ‘+6’ within ZF3 directly contacted guanine (G14) and cytosine (C13) residues on one DNA strand via two hydrogen bonds and one cation-π bond; however, mutation to Q (R339Q) or W (R339W) abolished these bonds (Fig. 5D & E, Supplementary Table 3). Remarkably, Q339 formed two new hydrogen bonds: firstly, between the Gln side chain carbonyl group and cytosine (C15); and secondly, between the side chain amide group and thymine (T7) on the complementary strand (Fig. 5D). Both mutations also disrupted the interaction of E336 with cytosine (C15), with 6 and 4 new bonds formed at neighbouring residues for Q339 and W339 respectively (Supplementary Table 3). MD simulations showed that R339Q triggered less conformational deviation than WT or R339W (Figs. 6A & B, both p < 0.0001); however, over the entire ZF structure R339Q and R339W both exhibited more flexibility than WT (Fig. 6C, p = 0.0018 & p < 0.0001 respectively). Consequently, R339Q shifted ZF3 towards DNA (2.342 Å), whereas R339W moved ZF3 away from DNA (3.021 Å) (Fig. 6D).
R377H, which occurs in an inter-ZF residue between ZF4 and ZF5, disrupted three hydrogen bonds that stabilise the interaction of R377 with the DNA phosphate moiety at guanine (G8) (Fig. 5F). Adding to this, 22 neighbouring molecular contacts are lost and 22 new bonds are formed (Supplementary Table 3). RMSD measurements show that R377H induced an increased deviation in the conformation over time (Figs. 6A & B, p = 0.0439) and increased flexibility in the entire ZF structure (Fig. 6C, p < 0.0001). As a result, both ZF4 and ZF5 were shifted away from the DNA phosphate backbone (1.643 Å & 2.718 Å respectively, Fig. 6D).
Finally, CTCF modelling confirms that glycine at position 420, which corresponds to DNA-binding position ‘+2’ in ZF6, does not directly contact DNA (Fig. 5G). However, when mutated to aspartic acid (G420D), a new hydrogen bond is formed between the side chain carbonyl group and cytosine (C16) in the core consensus sequence (Fig. 5G). A net loss of 4 bonds at neighbouring residues was also observed (Supplementary Table 3). RMSD measurements showed that G420D exhibited decreased structural deviation during the simulation run (Figs. 6A & B, p < 0.0001) and decreased RMSF values compared to WT, indicating reduced flexibility (p < 0.0001, Fig. 6C). Consequently, G420D resulted in ZF6 shifting 1.841 Å toward the DNA (Fig. 6D).
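The centroid displacements reported for each mutant (Fig. 6D) can be understood as a change in the distance between the geometric centre of a ZF and that of the DNA. The sketch below is one plausible way to compute such a shift, assuming unweighted centroids and a sign convention in which positive values mean movement away from DNA. Both are assumptions for illustration, and the coordinates are toy values rather than the simulated structures.

```python
from math import dist


def centroid(coords):
    """Unweighted geometric centre of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[d] for c in coords) / n for d in range(3))


def centroid_distance(group_a, group_b):
    """Euclidean distance between the centroids of two atom groups."""
    return dist(centroid(group_a), centroid(group_b))


def centroid_shift(wt_zf, mutant_zf, wt_dna, mutant_dna):
    """Change in ZF-to-DNA centroid distance (mutant minus WT).

    Positive: the ZF sits further from the DNA in the mutant.
    Negative: the ZF sits closer to the DNA in the mutant.
    """
    return centroid_distance(mutant_zf, mutant_dna) - centroid_distance(wt_zf, wt_dna)


# Toy coordinates in Å, for illustration only.
dna = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
wt_zf = [(1.0, 9.0, 0.0), (1.0, 11.0, 0.0)]
mutant_zf = [(1.0, 11.0, 0.0), (1.0, 13.0, 0.0)]

shift = centroid_shift(wt_zf, mutant_zf, dna, dna)
print(f"centroid shift = {shift:+.3f} Å")
```

In this toy case the ZF centroid moves from 10 Å to 12 Å from the DNA centroid, giving a shift of +2 Å (away from DNA).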
In summary, our data suggest that mutations R339W and R377H disrupted CTCF’s primary interactions with DNA and, along with the highly destabilising L309P mutation, are responsible for shifting ZF domains away from DNA. Importantly, R339Q and G420D both formed new primary bonds and the associated ZF domain moved nearer to the DNA.