Myeloid leukemia vulnerabilities at CTCF-enriched long noncoding RNA loci

The noncoding genome presents a largely untapped source of biological insights, including 2 thousands of long noncoding RNA (lncRNA) loci. While some produce bona fide lncRNAs, 3 others exert transcript-independent cis -regulatory effects, and the lack of predictive features 4 renders mechanistic dissection challenging. Here, we describe CTCF-enriched lncRNA loci 5 (C-LNC) as a subclass of functional genetic elements exemplified by MYNRL15, a pan-mye- 6 loid leukemia dependency identified by an lncRNA-based CRISPRi screen. MYNRL15 per- 7 turbation selectively impairs acute myeloid leukemia (AML) cells over hematopoietic stem / 8 progenitor cells in vitro , and depletes AML xenografts in vivo . Mechanistically, we show that 9 crucial DNA elements in the locus mediate its phenotype, triggering chromatin reorganization 10 and downregulation of cancer dependency genes upon perturbation. Elevated CTCF density 11 distinguishes MYNRL15 and 531 other lncRNA loci in K562 cells, of which 43-54% associate 12 with clinical aspects of AML and 18.4% are functionally required for leukemia maintenance. 13 Curated C-LNC catalogs in other cell types will help refine the search for noncoding onco- 14 genic vulnerabilities in AML and other malignancies. 15

tory elements (CREs), including H3K4Me1 and H3K27Ac histone marks, DNase hypersensi-69 tivity, and transcription factor occupancy (Fig. 2a). The screens uncovered two crucial DNA 70 regions whose accessibility and integrity were required for the maintenance of leukemic cells 71 (Fig. 2b, Extended Data Fig. 3a), and which enhanced reporter gene expression in dual lucif-72 erase assays (Extended Data Fig. 3b) -identifying the regions as functional sequences and 73 candidate cis-regulatory elements (cCREs C1 and C2). We note that the parallel screening 74 strategy using the dCas9 variant helps alleviate the risk of potential off-target DNA damage-75 driven phenotypes 28 , thus increasing the robustness of the screen. The CRISPR-Cas9 muta-76 genesis strategy also reiterates that leukemia cells do not appear particularly dependent on 77 the UNC45A and HDDC3 coding sequences, arguing against local enhancer functions on 78 these genes underlying the anti-leukemic effect of MYNRL15 perturbation. 79 Aiming to identify target genes and pathways controlled by the MYNRL15 cCREs, we next 80 performed RNA sequencing (RNA-seq) upon CRISPR-Cas9 mediated perturbation of the lo-81 cus. We opted for the CRISPR-Cas9 system in order to achieve a more targeted perturba-82 tion of MYNRL15 and attenuate effects on UNC45A and HDDC3 caused by CRISPRi. We 83 selected two guides from each cCRE, all of which robustly depleted K562 and ML-2 leuke-84 mia cells (Fig. 2c). This phenotype was underpinned by global changes in gene expression 85  To explore this option, we applied a sliding window approach to gene set enrichment analy-

Altered chromosome 15 architecture underlies the MYNRL15 perturbation phenotype 96
Given the deregulation of chromosome 15 neighborhoods upon MYNRL15 perturbation, we 97 explored whether MYNRL15 may be involved in chromatin conformation via next generation 98 Capture-C (NG Capture-C) 30 , using probes complementary to MYNRL15 cCRE C1 to enrich 99 for interactions involving the locus. This approach revealed extensive chromatin contacts be-100 tween MYNRL15 and sequences within a 500 kb radius, with weaker contacts occurring up 101 to 2 Mb away -a profile that was consistent across K562 and ML-2 cells (Fig. 3a-c). The lo- Integrating our findings regarding altered chromatin conformation and gene expression with 118 cancer dependency data, we eventually pinpointed the target genes of MYNRL15 through a 119 small-scale CRISPR-Cas9 screen of the 29 protein-coding genes located in the gained inter-120 action region. Combining these results with differential expression information, we identified 121 two downregulated protein-coding genes in the region that also score as leukemia depend-122 encies: IMP3 and WDR61 (Fig. 3f, Extended Data Fig. 6d-e). Notably, WDR61 is a compo-123 nent of the PAF1 complex (PAF1c), which is involved in important transcriptional programs 124 during hematopoiesis and leukemogenesis 31,32 . Gene expression changes induced by Paf1c 125 inactivation 33 were also detected upon MYNRL15 perturbation (Extended Data Fig. 6f). IMP3 126 encodes a component of the 60-80S U3 small nucleolar ribonucleoprotein, which is required 127 for cleavages in pre-18S ribosomal RNA processing 34 . It is a homolog of the yeast Imp3 pro-128 tein and has not been comprehensively studied in human cells to date. CRISPR-Cas9 medi-129 ated knockout of WDR61 and IMP3 robustly depleted K562 and ML-2 cells (Extended Data 130 Fig. 6g), recapitulating the MYNRL15 perturbation phenotype and positioning these genes 131 as targets of MYNRL15 (Fig. 3g). 132

AML specificity and potential therapeutic applicability of MYNRL15 133
To evaluate whether MYNRL15 dependency is specific to leukemic cells, we leveraged all-134 in-one lentiviral CRISPR-Cas9 constructs in primary human CD34 + HSPCs and blasts de-135 rived from two AML patients (see Supplementary Table 1 for clinical characteristics). The 136 transduced cells were sorted and seeded in methylcellulose-based colony-forming assays. 137 While MYNRL15 perturbation moderately attenuated colony formation in CD34 + HSPCs, it 138 had little effect on replating capacity and differentiation (Extended Data Fig. 7a). In contrast, 139 AML colony-forming units were virtually eradicated (Extended Data Fig. 7b) -implying that 140 MYNRL15 perturbation selectively impacts AML cells, and outlining a possible therapeutic 141 window (Fig. 4a). 142 To assess the therapeutic potential of MYNRL15 perturbation, we applied CRISPRi-based 143 two-color competitive xenotransplantation assays using AML cell lines and patient-derived 144 xenografts (PDXs) (Fig. 4b). Importantly, MYNRL15 perturbation significantly impaired the 145 propagation of two AML cell lines and two PDXs (Fig. 4b, Extended Data Fig. 7c-d) in the 146 hematopoietic organs of recipient mice, confirming its capacity to deplete leukemic cells in 147 vivo. Combined with the apparent AML-specific effect of MYNRL15 perturbation, these re-148 sults provide a proof-of-principle of how MYNRL15 perturbation may be leveraged as a ther-149 apeutic strategy. 150

MYNRL15 exemplifies a putative new class of CTCF-bound long noncoding RNA loci 151
Having implicated MYNRL15 in 3D genome organization and demonstrated its therapeutic 152 potential, we explored whether MYNRL15 may belong to a sub-category of biologically rele-153 vant lncRNA loci that have thus far been overlooked due to their lack of transcript-specific 154 functions. Given the effect of MYNRL15 on chromatin architecture and the multiple CTCF 155 binding sites in the locus, we explored CTCF density as a predictive metric for identifying 156 noncoding regulatory loci like MYNRL15 (Fig. 5a, Extended Data Fig. 8a-c). Interestingly, 157 log 10 -transformed values of this metric followed a near-normal distribution, and a cut-off of 158 two standard deviations from the median identified 654 genes with elevated CTCF density 159 which were highly enriched for lncRNAs (>80%, n=531 using K562 ChIP-seq data) (Fig. 5a,  160 Extended Data Fig. 8a-b). Inversely, the remainder comprised mainly of protein-coding loci 161 As a first step in determining the relevance of C-LNCs to myeloid leukemia, we tested their 170 association with clinical aspects in two AML patient cohorts 35,36 . This revealed that 43% and 171 54% of the identified C-LNCs associated with genetically-defined AML subgroups or patient 172 survival in the two AML cohorts, respectively ( Fig. 5a- ing that activity at these loci could underpin aberrant transcription factor programs and/or in-174 fluence patient outcomes. Furthermore, 18.4% functionally validated as essential for myeloid 175 leukemia maintenance in CRISPR-Cas9 screens tiling the CTCF sites in the loci (Fig. 5c, Ex-176 tended Data Fig 8f) -a hit identification rate that is substantially higher than what is typically 177 reported for lncRNA essentiality screens [37][38][39] (ranging from 2-6%), including our own (4.6%; 178 There is general agreement that the current lncRNA classification system leaves much to be 186 desired, necessitating extensive experimental labor in order to discern between the possible 187 modes of action for any given lncRNA 22  In our study, MYNRL15 perturbation resulted in the formation of a long-range chromatin in-200 teraction, leading to the downregulation of WDR61 and IMP3 and tumor suppression. Given 201 the accompanying reduction of CTCF occupancy, we expect this to occur through a mecha-202 nism similar to topologically associating domain (TAD) fusion [42][43][44] , although on a larger scale 203 than typically observed for TADs 45 . Alternatively, or perhaps concurrently, the attenuation of 204 CTCF binding upon MYNRL15 perturbation may strengthen compartmentalization 46 and pro-205 mote longer-range, higher-order architecture. We note that, while there is substantial overlap 206 between enhancer RNA (eRNA) and lncRNA annotations 47 , and while some of our data sup-207 port a local enhancer-like function for MYNRL15, we did not find evidence for locally-driven 208 phenotypes or RNA function, and the long-range architectural changes upon perturbation of 209 the locus especially separate MYNRL15 from classical eRNAs. 210 Given the attenuated impact of MYNRL15 perturbation on normal HSPCs compared to AML 211 cells, we surmise that distal connectivity may be the native conformation of the locus that is 212 lost during leukemic transformation; thus re-introducing it would selectively impair leukemic 213 cells. The oncogenic rewiring of 3D chromatin architecture through mutations and structural 214 variants has been reported in cancer 44,[48][49][50][51] . However, it is unlikely that genetic alteration un-215 derlies MYNRL15's role in leukemia, since the locus is required by cells from varied cytoge-216 netic and mutational backgrounds, and its perturbation drives matching chromatin changes 217 in two divergent cell lines. We speculate instead that MYNRL15 may be involved in unifying 218 leukemic genome organization signatures -a phenomenon that has long been established 219 for stemness-related expression and epigenetic signatures 24,25,39 . Recent works have begun 220 to implicate aspects of chromatin architecture in cell state transitions during hematopoiesis 52-221 55 and in the maintenance of leukemic transcription programs [56][57][58] . We expect future studies 222 will further reveal leukemic 3D genome organization signatures that underpin general onco-223 genic behaviors, irrespective of mutational drivers. 224 An important future direction will be to ascertain whether C-LNCs, like MYNRL15, contribute 225 to cancer-related chromatin architecture as a class. Based on their elevated CTCF densities 226 and other shared features, we hypothesize that many may function through similar mecha-227 nisms as MYNRL15. With our preliminary catalog of C-LNCs spanning various human cell 228 types and tissue contexts (www.C-LNC.org), we provide a resource that will help catalyze 229 future research and lay a foundation for unravelling principles of C-LNC function in healthy 230 and malignant cells. Given the high essentiality rate that we observed in myeloid leukemia, 231 C-LNCs could symbolize a major refinement in the search for both functional lncRNA loci 232 and noncoding oncogenic vulnerabilities across all types of cancer. 233  Table 4. 279

Gene expression analyses 379
RNA was isolated from cells using the Quick-RNA TM Microprep or Miniprep Kits (Zymo Re-380 search). RNA fractionation was performed as previously described 76 , except that we lysed 381 the nuclear pellet directly rather than isolating the nuclear-soluble and chromatin-associated 382 fractions separately. The TURBO DNA-free TM Kit (Invitrogen) was used for DNase treatment.  (150 bp paired-end reads) using polyA-enriched total cellular RNA. The raw sequence data 391 were processed by Novogene using a standard pipeline. In brief, reads were filtered using 392 in-house scripts and mapped to human reference genome hg38 using HISAT2 77 , and gene 393 expression was quantified using featureCounts 78 in R. The processed count data were sub-394 sequently analyzed in R using DESeq2 74 (Bioconductor). Gene sets from MSigDB v7.2 (H1, 395 C2, C3, C6), custom hematopoietic 16 and chromosome 15 gene sets, and PAF1c-knockout 396 gene expression signatures 33 were checked for enrichment via the Broad GSEA software 79 . 397 The custom positional gene sets were generated by walking a 1 Mb or 5 Mb window along 398 chromosome 15 and taking expressed genes within the windows. 399

NG Capture-C 400
Chromatin conformation capture with selective enrichment for MYNRL15-interacting se-401 quences was performed using next generation (NG) Capture-C as previously described 30

ATAC-seq 423
We performed assay for transposase accessible chromatin sequencing (ATAC-seq) as pre- end reads). The data processing was also performed by Novogene: In brief, raw reads were 428 trimmed and filtered using Skewer 88 and clean reads were aligned to hg19 with BWA 89 . Mito-429 chondrial reads were removed prior to subsequent analysis. Normalized pileups were gener-430 ated using deepTools 90 and viewed in the Integrated Genomics Viewer (IGV) 91 . 431

ENCODE datasets 445
The following K562 ChIP-seq data were used in this study: CTCF (ENCFF519CXF), SMC3 446 followed an approximately normal distribution; thus, we defined elevated CTCF density as 471 over two standard deviations above the median. Our analysis focused on loci that produce 472 long coding or noncoding transcripts (>200 nt) and included the following biotypes: protein 473 coding, lncRNA, lincRNA, processed transcript, and pseudogene. Refer to Supplementary 474 Table 11

TCGA/TARGET survival analysis 485
Event-free survival was defined as the time elapsed between diagnosis and the first event or 486 last follow-up. An event was defined as death from any cause, failure to achieve remission, 487 relapse, and secondary malignancy. Failure to achieve remission was considered an event 488 on day 0. Overall survival was defined as time elapsed between diagnosis and death from 489 any cause or last follow-up. We used the Kaplan-Meier method of estimating survival rates 490 and two-sided log-rank tests to compare differences in survival, as implemented in the sur-491 vival 99 and survminer 100 packages (base R). DESeq2 (Bioconductor) was used to normalize 492 and variance-stabilize read count data 74 from the TCGA 36 and NCI-TARGET 35

AML cohorts. 493
The NCI-TARGET dataset also required batch correction, for which we used sva 101 (Biocon-494 ductor). Normalized (and batch corrected) gene expression values were used for all subse-495 quent analyses. For patient stratification, optimal cut-offs were determined via maximally se-496 lected log-rank statistics as implemented in the maxstat package 102 (base R). 497

Statistical analyses and definitions 498
Statistical evaluations of experimental data were carried out in GraphPad Prism 8 using un-499 paired, two-tailed t tests. Data are presented as mean ± s.d. or s.e.m. as indicated in the fig-500 ure legends. Statistical analysis of gene expression data (RNA-seq) was performed in R us-501 ing DESeq2 (Bioconductor). Survival analysis was also done in R using the survival and sur-502 vminer packages (base R). CRISPR-Cas9 screening data were analyzed with the MAGeCK 503 suite, with the exception of the tiling screens, which were analyzed in R using DESeq2. In all 504 cases, measurements were taken on at least two biological replicates and P<0.05 was con-        downregulation upon MYNRL15 perturbation using sgRNA C1.1. f, Retrieval of PAF1c lossassociated gene sets upon MYNRL15 perturbation in our RNA-seq data. g, Individual proliferation assays validating depletion of K562 and ML-2 cells upon WDR61 and IMP3 knockout, compared to MYNRL15 perturbation using sgRNA C1.1 (n=4, mean ± s.e.m.; 4 guides each targeting WDR61 and IMP3). ****P<0.0001; all conditions shared the same P-value. Figure 7: Extended data from MYNRL15 perturbation experiments in primary and patient-derived cells.