Expression of the CIC-DUX4 fusion oncoprotein mimics human CIC-rearranged sarcoma in genetically engineered mouse models

CIC-DUX4 sarcoma (CDS) is a rare but highly aggressive undifferentiated small round cell sarcoma driven by a fusion between the tumor suppressor Capicua (CIC) and DUX4. Currently, there are no effective treatments, and efforts to identify and translate better therapies are limited by the scarcity of patient tumor samples and cell lines. To address this limitation, we generated three genetically engineered mouse models of CDS (Ch7CDS, Ai9CDS, and TOPCDS). Remarkably, chimeric mice from all three conditional models developed spontaneous tumors and widespread metastasis in the absence of Cre-recombinase. The penetrance of spontaneous (Cre-independent) tumor formation was complete irrespective of bi-allelic CIC function and the distance between loxP sites. Characterization of primary and metastatic mouse tumors showed that they consistently expressed the CIC-DUX4 fusion protein as well as other downstream markers of the disease, credentialing these models as CDS. In addition, tumor-derived cell lines were generated and ChIP-seq was performed to map fusion gene-specific binding using an N-terminal HA epitope tag. These datasets, along with paired H3K27ac ChIP-seq maps, validate CIC-DUX4 as a neomorphic transcriptional activator. Moreover, they are consistent with a model in which ETS family transcription factors are cooperative and redundant drivers of the core regulatory circuitry in CDS.


Introduction
The increased use of molecular diagnostics in cancer care has improved our understanding of tumor types that were previously difficult to classify. Ewing-like sarcoma, or undifferentiated small round cell sarcoma, is an example of a tumor that is both rare and difficult to distinguish from other small round blue cell tumors [1]. Historically named for its histological and phenotypic similarities to Ewing sarcoma, Ewing-like sarcoma was a diagnosis of exclusion [2]. Recently, it was discovered that a majority of Ewing-like sarcomas harbor a reciprocal translocation at t(4;19)(q35;q13), leading to a CIC-DUX4 gene rearrangement [3]. The resultant fusion protein contains the DNA binding domain from CIC, a canonical tumor suppressor, and the transcriptional activation domain (TAD) of DUX4, an early embryonic pioneer factor [4]. The CIC-DUX4 oncoprotein behaves as a transcriptional activator; however, the mechanisms behind CIC-DUX4-mediated tumorigenesis remain unclear [5].
Ectopic CIC-DUX4 expression transforms murine mesenchymal cells and osteochondrogenic progenitor cells with high efficiency [6]. Orthotopic allografts using these cells form aggressive undifferentiated small round cell tumors resembling human CIC-DUX4 driven sarcomas (CDS). To our knowledge, there is no existing autochthonous mammalian model of this malignancy. Our prior work has demonstrated the importance of co-evolution of the tumor and tumor microenvironment when evaluating novel therapeutics and interventions such as radiotherapy [7]. In soft tissue sarcomas, radiation therapy is a mainstay of treatment; however, treatment efficacy varies across tumor types and histologies. Clinically, compared to Ewing sarcoma, CDS is more resistant to chemotherapy and radiation therapy [8,9]. Because the disease is extremely rare and follows an aggressive course, identifying new therapies and testing them in clinical trials are especially difficult. Therefore, a model system that recapitulates this tumor and enables mechanistic studies of tumor initiation, progression, metastasis, immune evasion, and the response to treatment is of particular significance. In this work, we describe efforts to generate an autochthonous primary mouse model of CDS that mimics the human disease to serve as a preclinical platform for developing novel therapies for CDS.

Derivation of transgenic animals
All animal experiments were approved by the Duke University Animal Care and Use Committee, protocol number A014-22-01. Ch7CDS mice were generated using homology-independent CRISPR/Cas9-mediated targeted integration. A donor vector was designed to insert loxP sites and sequence from the human DUX4 TAD at the C-terminus of the endogenous Cic locus on chromosome 7 (ENSMUG00000005442). Suitable sgRNAs were identified (Supplementary data 4), validated in vitro, and then cloned into a CRISPR/Cas9 activator plasmid (Addgene #64073). G4 embryonic stem cells were co-transfected with the activator and donor plasmids and then selected in G418-containing media [10]. Embryonic stem (ES) cell clones containing the conditional knock-in were identified by PCR and validated with Sanger sequencing before injection into donor ICR mouse morulae. Morulae were transplanted into female nurse ICR mice for gestation and delivery of chimeric pups.
Rosa26 Lox-STOP-Lox (LSL) Ai9-HA-CIC-DUX4 (R26-Ai9) mice were generated by subcloning an N-terminal 3x HA-tagged CIC-DUX4 fusion gene from Yoshimoto et al. [6] into a Rosa26 targeting construct (Addgene #21714). The sequence-verified construct was then transfected into ES cells, which were selected in G418-containing media. Clones containing the knock-in were identified by PCR, sequence validated, and then injected into donor ICR mouse morulae, which were transplanted into female nurse ICR mice for gestation and delivery of chimeric pups.

Genotyping
Genomic DNA (gDNA) was purified from ES cells and tail clips using the Quick-DNA Miniprep Kit (Zymo Research). Initial validation of target insertion was performed with primers designed to amplify across the 5' and 3' integration sites (Supplementary data 1-4). Several positive ES cell clones were then selected (indicated by red font) and verified by Sanger sequencing prior to expansion and morulae aggregation. To look for recombination, gDNA was also purified from tumors and tumor-derived cell lines as above, and new primer pairs were designed to amplify across the entire region between the loxP sites (Supplementary data 4). PCR was performed using TaKaRa LA Taq and optimized for amplicon size.

Derivation of tumor cell lines
Mouse soft tissue sarcoma cell lines were generated as described previously [11]. Tumors were resected using aseptic technique from humanely euthanized animals. Tumor tissue was enzymatically and mechanically digested by serial pipetting, washed, and resuspended in sterile PBS. The cell suspension was filtered through a 70 μm cell strainer, pelleted, and resuspended in DMEM containing 10% FBS and 1X GA-1000 antibiotic (standard growth media). Cells were plated at high density in tissue culture-treated flasks and assessed for cell death the following morning. Viable adherent cells were maintained in standard growth media and passaged with 0.25% trypsin-EDTA.

Western blot
Cell lines were maintained in standard growth media until 80% confluency. Using trypsin-EDTA, the cells were lifted, collected in Hank's Balanced Salt Solution, and then pelleted by centrifugation (300 x g for 3 minutes). Lysates were made using RIPA buffer (supplemented with 1% SDS, HALT protease inhibitor (Thermo Fisher Scientific), and Benzonase) and quantified with Pierce BCA protein assay kits. Heat-denatured proteins were loaded onto a 10% Bis-Tris gel, run at 150-200 V in 1x MES buffer, and then wet-transferred at 350 mA for 1 hour at 4°C. Importantly, all above steps were completed in a single day due to the unstable nature of the CIC-DUX4 fusion protein. The CIC-DUX4 fusion (~260 kD) was probed using an anti-HA antibody (Cell Signaling, 3724) at 1:1000 dilution or an anti-DUX4 antibody (Abcam, ab124699) at 1:1000. Cre was detected using an anti-Cre antibody (Cell Signaling, 15036) at 1:1000 dilution, with β-actin at 1:2000 (Sigma, A1978) as a loading control. Images were acquired on a LI-COR Odyssey CLx and processed using the Image Studio Software.

RNA-sequencing
RNA was extracted and purified from flash-frozen tumors/normal tissues and cells using a Qiagen RNeasy kit following the manufacturer's instructions for fibrous tissue (Qiagen, Hilden, Germany). High-quality RNA (RIN >7) was divided into triplicates, from which 150 bp paired-end, rRNA-depleted libraries were made using the Illumina TruSeq RNA Library Prep Kit (Illumina, CA, USA). Libraries were quantified using the KAPA Library Quantification kit (KAPA Biosystems, MA, USA), multiplexed, clustered onto flowcells, and then sequenced using an Illumina HiSeq 4000 sequencer (or equivalent platform) by GENEWIZ (Azenta, NJ, USA).

ChIP-sequencing
ChIP-sequencing was performed in tumor-derived cell lines generated from Ai9CDS and TOPCDS mice using the Active Motif ChIP-IT High Sensitivity kit (ActiveMotif, CA, USA).
Cells were seeded into 150 mm dishes and cultured in standard growth medium. At 80% confluency, the cells were crosslinked in 37% formaldehyde (with methanol) for 15 minutes on the dish and quenched with glycine. Washed cell pellets were manually lysed using a dounce homogenizer and the chromatin was fragmented using a Q125 sonicator (Qsonica, CT, USA) with the following settings: 25% amplitude, 30 seconds 'ON'/30 seconds 'OFF' for 20 minutes total. Separate immunoprecipitation reactions using an anti-HA tag antibody (Abcam; ab9110) or anti-Histone H3K27ac antibody (ActiveMotif, 39134) were set up and incubated overnight at 4°C. ChIP DNA was bound to Protein G agarose beads, column purified, and eluted. Paired-end 150 bp DNA libraries for sequencing were prepared using the TruSeq DNA Library Prep Kit (Illumina, CA, USA) and quantified using the KAPA Library Quantification kit (KAPA Biosystems, MA, USA). Libraries were sequenced on an Illumina HiSeq 4000 sequencer (or equivalent platform) by GENEWIZ (Azenta, NJ, USA). Raw sequencing reads were trimmed using Trimmomatic v0.39 and then aligned to the mm10 reference genome using Bowtie 2 (-q -t --no-mixed --no-discordant). Duplicate reads were marked and removed using Picard tools v2.18.2 and peaks were called with MACS3 (-B -f BAMPE -g 1.87e9 -q 0.01). Peak files were filtered against the ENCODE blacklisted regions (https://github.com/Boyle-Lab/Blacklist/tree/master/lists) and then annotated using ChIPseeker v3.17 [12]. De novo motif enrichment analysis on HA-CIC-DUX4 peaks was performed with HOMER v4.11 [13]. Super enhancers were called from H3K27ac peaks using ROSE and used as input for CRCmapper to map the core regulatory circuitry [14,15]. Raw data are available on GEO under the accession number GSE241370.
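The blacklist filtering step described above can be illustrated with a minimal Python sketch. It assumes peaks and blacklisted regions are represented as half-open, BED-style (chromosome, start, end) tuples; the function names and the toy coordinates are hypothetical and are not taken from the study or from any of the tools named above.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end), half-open and BED-style. This representation is an
# assumption made for illustration, not the format used by the authors.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Drop every peak that overlaps any blacklisted region."""
    # Group blacklisted regions by chromosome so each peak is only compared
    # against regions on its own chromosome.
    by_chrom: Dict[str, List[Interval]] = {}
    for region in blacklist:
        by_chrom.setdefault(region[0], []).append(region)
    return [
        peak
        for peak in peaks
        if not any(overlaps(peak, region) for region in by_chrom.get(peak[0], []))
    ]


# Toy coordinates for illustration only.
peaks = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
blacklist = [("chr1", 250, 400), ("chr2", 150, 500)]
kept = filter_blacklisted(peaks, blacklist)
```

For genome-scale peak sets, a sorted or interval-tree based approach would scale better than this pairwise comparison; the sketch favors readability.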

Fusion of a human DUX4 C-terminal domain to endogenous Cic is sufficient to generate small round cell sarcomas in mice.
To investigate the ability of CIC-DUX4 to generate tumors in vivo, we used CRISPR/Cas9 to insert the C-terminal domain (CTD) of human DUX4 into exon 20, the most common breakpoint in human fusions, of the endogenous mouse Cic gene on chromosome 7 [16]. This strategy was chosen over a pure endogenous fusion model due to the repetitive structure of Dux/DUX4 alleles and weak CTD conservation [17]. Here, prior to Cre exposure, mice express two alleles of endogenous Cic. In this model, injecting the mice with an adenovirus expressing Cre or breeding the mice with a tissue-specific Cre driver would drive recombination between the loxP sites to excise the endogenous exon 20 and termination sequence and thereby express the Cic-DUX4 fusion (Figure 1a). By this method, one allele of the endogenous mouse Cic gene is converted to Cic-DUX4, which recapitulates the haploinsufficiency of CIC observed in naturally occurring CIC-DUX4 sarcomas. Screened and validated ES cell clones were successfully injected, and 38 viable chimeric pups were born to host ICR mother mice. As expected, chimeric animals showed a high contribution of donor genetic material (50-100%), and genotyping by tail clip at one week of age found that all 38 pups harbored an unrecombined transgenic allele (Supplementary data 1). Surprisingly, beginning at 3 weeks of age, in the absence of Cre recombinase, the chimeric animals spontaneously developed large and, in some cases, multifocal tumors involving the limbs, flank, back, abdomen, and head and neck region, which rapidly grew and metastasized (Figure 1b). The animals were humanely euthanized when evidence of extensive tumor burden or illness was observed, and all discernible tumors were harvested for analysis. By 5 weeks of age, all 38 chimeric animals were dead (Figure 1c). Although none of the animals were exposed to Cre recombinase, PCR amplification across the loxP sites demonstrated tumor-specific recombination (Figure 1d). To further characterize this model, several tumors were evaluated using an immunohistochemical panel for small round cell sarcomas including CD99, WT1, SOX10, Cytokeratin (CK), CD34, and Desmin. The tumors consistently showed focal/patchy CD99 expression and strong WT1 expression similar to human CDS (Figure 1e) [18-20]. Taken together, these data suggest that spontaneous (Cre-independent) CIC-DUX4 expression is sufficient to induce the formation of aggressive small round cell sarcomas in mice.

CIC haploinsufficiency is not required for CIC-DUX4 sarcomagenesis.
CIC is a highly conserved and canonical tumor suppressor that regulates MAPK effector gene expression [21]. In oligodendrogliomas, mutations in CIC are common and believed to be a key oncogenic event [22]. To test the necessity of CIC haploinsufficiency for CDS formation, we utilized a homologous recombination strategy to insert a 3x HA-tagged human CIC-DUX4 cDNA into the Rosa26 locus under control of a Lox-STOP-Lox cassette (Figure 2a). Here, the HA-CIC-DUX4 fusion is expressed without affecting the endogenous Cic alleles. In parallel with production of the Ch7CDS model, targeted ES cell clones were screened, validated, and implanted, giving rise to 32 chimeric pups. Animals showed 50-100% chimeric contribution, and genotyping at 1 week of age confirmed the presence of an unrecombined transgenic allele (Supplementary data 2). Again, beginning at 3 weeks of age, in the absence of Cre-recombinase, the chimeric Ai9CDS animals spontaneously developed tumors and rapidly succumbed to disseminated disease. By 6 weeks of age, all animals had died naturally or were humanely euthanized (Figure 2b) with tumors histologically resembling CDS. PCR amplification across the loxP sites demonstrated recombination of the loxP sites in the tumors (Figure 2c). Based on the rapid tumor onset despite two intact Cic alleles, we conclude that CIC haploinsufficiency is not required for CIC-DUX4 sarcoma formation.

Extended STOP cassette does not prevent Cre-independent recombination.
Genetic recombination is a fundamental biological process that is mediated, and exploited in genetic engineering, by the presence of direct repeat sequences such as loxP sites. In the Cre-loxP system, the likelihood and efficiency of recombination decreases with increasing distance between pairs of loxP sites [23]. To try to delay tumor onset, we engineered a third mouse model, TOPCDS, using the same strategy as the Ai9CDS mouse with one exception. In place of a short LSL cassette optimized for cloning, we utilized a long LSL cassette subcloned from a targeting construct used to control the expression of other potent oncogenes like Kras G12D (Figure 2d) [24]. Compared to the ~900 bp LSL cassette in the Ai9 vector, the TOPO LSL cassette spans ~5700 bp and contains two additional STOP sequences, making it less susceptible to spontaneous recombination. Forty-two viable chimeric pups were born to host ICR mothers. Animals showed 50-100% chimeric contribution, and genotyping at 1 week of age confirmed the presence of an unrecombined transgenic allele (Supplementary data 3). Beginning at 3 weeks, in the absence of Cre, the animals once again developed spontaneous tumors which rapidly and widely metastasized. By 9 weeks of age, all 42 animals had died naturally or were humanely euthanized (Figure 2e) with tumors identical in appearance and histology to the first two models. Examination of loxP sites using PCR amplification showed tumor-specific recombination (Figure 2f). Thus, although the extended LSL cassette modestly increased tumor latency, it was not sufficient to prevent Cre-independent recombination.
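To illustrate why PCR across the loxP sites distinguishes recombined from unrecombined alleles, the following Python sketch computes expected amplicon sizes. It is a simplified model under stated assumptions: the primer flank lengths (200 bp on each side) are hypothetical, and the ~900 bp and ~5700 bp cassette sizes are treated as the sequence lying between the two loxP sites, which the text does not specify. The only fixed value is the 34 bp length of a loxP site.

```python
LOXP_BP = 34  # length of a single loxP site in base pairs


def amplicon_sizes(flank5: int, flank3: int, floxed_insert: int) -> tuple:
    """Return (unrecombined, recombined) amplicon sizes in base pairs.

    flank5 / flank3: distance from each primer to the nearest loxP site
        (hypothetical values chosen for illustration).
    floxed_insert: sequence between the two loxP sites (assumed to be the
        STOP cassette length quoted in the text).
    """
    # Before recombination, both loxP sites and the intervening cassette remain.
    unrecombined = flank5 + LOXP_BP + floxed_insert + LOXP_BP + flank3
    # Recombination excises the cassette and one loxP site, leaving one loxP.
    recombined = flank5 + LOXP_BP + flank3
    return unrecombined, recombined


# Approximate cassette sizes from the text; flank lengths are assumptions.
ai9_sizes = amplicon_sizes(200, 200, 900)
top_sizes = amplicon_sizes(200, 200, 5700)
```

Under these assumptions, both models yield the same short recombined product, while the unrecombined product is several kilobases longer for the extended cassette, which is consistent with the use of a long-range polymerase and amplicon-size optimization described in the genotyping methods.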

Tumors that arise in chimeric transgenic animals are driven by CIC-DUX4 in the absence of Cre recombinase.
To confirm that the tumors in our chimeric mice were driven by CIC-DUX4, in situ expression of the fusion protein was assessed using immunohistochemistry. Here, antibodies against the N-terminal HA epitope tag as well as the C-terminus of human DUX4 were used to confirm expression of the fusion gene. Notably, in both Rosa26 models (Ai9CDS and TOPCDS), HA and DUX4 showed strong positive nuclear staining in the tumor cells but not in surrounding stroma or normal tissue (Figure 3a) [25]. As expected, Ch7CDS tumors exhibited strong nuclear DUX4 staining, and the HA epitope was not detected. Cre-recombinase was also not detected in any of the models. To rule out the possibility of transient Cre expression from a cryptic transgenic Cre allele, PCR genotyping was performed on genomic DNA isolated from ES cells, tails, and tumors from each of the three models. All assays for Cre, iCre, and strain-specific Cre (i.e. Meox2-Cre, Tie2-Cre, and Vil1-Cre) were negative. To facilitate mechanistic and genomic studies, tumor-derived cell lines were generated from each of the three mouse models. Like the parent tumors, all three cell lines expressed the full-length CIC-DUX4 fusion oncoprotein in the absence of Cre-recombinase (Figure 3b). To measure CIC-DUX4 transcriptional activity, RNA was harvested from tumors and tumor-derived cell lines for bulk RNA-sequencing. Using mouse KP (Kras G12D, p53 fl/fl) sarcoma tumors [26] and normal muscle as controls, all three CDS models strongly upregulated markers of CIC-rearranged sarcoma including ETV1/4/5, SHC3/4, DUSP4/6, and VGF, further credentialing them as CIC-DUX4 sarcomas [5,20,27] (Figure 3c).

CIC-DUX4 behaves as a neomorphic transcriptional activator.
Several studies have inferred a neomorphic function for CIC-DUX4 as a direct transcriptional activator based on gene expression changes in transformed cell lines and ChIP-sequencing with non-specific antibodies [24,25]. Using the N-terminal 3x HA-epitope tag, CIC-DUX4 fusion gene-specific ChIP-seq was performed in the Ai9CDS and TOPCDS cell lines, revealing 4,861 and 5,057 high confidence binding sites, respectively (Figure 4a). Binding was most enriched at gene promoters (observed/expected: 3-4, p < 1e-70) and associated with genes involved in notable oncogenic signaling pathways including Hippo, Wnt, and PI3K-AKT (Figure 4b). Because fusion proteins can acquire novel binding site capabilities, de novo motif enrichment analysis was performed on peaks shared between the two experiments (n=2,410). As anticipated, the most enriched motif matched the predicted binding site of CIC (CATT); however, a motif matching the consensus binding site for ETS transcription factors (GGAA) was also highly over-represented at HA-CIC-DUX4 binding sites (Figure 4c). To investigate the effect of CIC-DUX4 binding on local chromatin, ChIP-seq for H3K27ac was performed in parallel. Predictably, HA-CIC-DUX4 peaks strongly co-localized with H3K27ac, consistent with its function as a transcriptional activator (Figure 4d). Lastly, we sought to identify other core transcription factors that may co-operatively regulate the transcriptional circuitry in CDS. To do so, Ai9CDS and TOPCDS cell line H3K27ac ChIP-seq datasets were used to define Super Enhancers (SE), which were then interrogated for known transcription factor motifs. This analysis identified 11 transcription factors common between both cell lines, all of which, with the exception of Tead1 and Creb3l2, are highly upregulated in mouse and human CIC-DUX4 sarcomas (Figure 4e) [29]. Remarkably, 4 of the 11 predicted genes are ETS transcription factors (i.e. Etv5, Etv4, Etv1, and Ets1), adding to a body of literature which has previously implicated these factors as critical drivers of CIC-DUX4 sarcomagenesis [5,28,30].
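As a rough illustration of how the core sequences quoted above (CATT and GGAA) can be located in peak sequences on either DNA strand, the Python sketch below counts exact matches to a motif and its reverse complement. This is not the HOMER algorithm, which performs de novo discovery and tests enrichment against background sequences; the function names and toy sequences are hypothetical and chosen only for illustration.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count windows in seq that match the motif on either strand.

    Each window is counted at most once, so a palindromic motif is not
    double-counted.
    """
    seq = seq.upper()
    forward = motif.upper()
    targets = {forward, reverse_complement(forward)}
    k = len(forward)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


# Toy sequences for illustration only; not real peak sequences.
toy_peaks = ["TTCATTGGAAGC", "AATGTTCC"]
motif_counts = {
    motif: sum(count_motif(seq, motif) for seq in toy_peaks)
    for motif in ("CATT", "GGAA")
}
```

Because four-base sequences occur frequently by chance, raw counts such as these are only meaningful relative to a matched background, which is the comparison an enrichment tool performs.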

Discussion
CIC-DUX4 sarcoma (CDS) is a rare and highly aggressive undifferentiated small round cell sarcoma affecting adolescents and young adults [31]. Despite its clinical and morphological similarities to Ewing sarcoma, CDS is insensitive to conventional Ewing sarcoma therapies, translating to high rates of relapse and low survival [32]. To improve patient outcomes, the mechanisms underlying tumor initiation, maintenance, and metastasis need to be understood.
One obstacle to acquiring this knowledge is the scarcity of primary tissues from patients with CDS necessary for developing and testing specific hypotheses. To overcome this, an animal model with an intact immune system that mimics the aggressive properties of CDS would be a valuable resource to the field.
To this end, we worked to develop a genetically engineered mouse model of CDS. Surprisingly, the chimeric animals from three independent models, irrespective of Cic haploinsufficiency, developed aggressive tumors and metastatic disease that was uniformly lethal between 3 and 9 weeks of life. All tumors were similar to human CDS in appearance and expressed a CIC-DUX4 fusion gene in the absence of Cre-recombinase. Although spontaneous (Cre-independent) recombination has been reported in other genetically engineered mouse models with conditional oncogenes, the event rate was low and tumors were specific to a few susceptible tissues and organs [33]. In the CDS models, the penetrance was complete and widespread. These results raise the possibility that spontaneous, Cre-independent recombination may be a common event in other conditional mouse models that employ the Cre-loxP system for gene regulation, but may not be appreciated if the oncogene is not as potent as CIC-DUX4. This may have important implications for mouse models of cancer where unanticipated expression of a neoantigen in the absence of Cre recombinase could engage the immune system prior to tumor initiation with exogenous Cre delivery. Regardless, the complete tumor penetrance in the three models of CDS in the absence of Cre demonstrates a strong positive selective pressure for the CIC-DUX4 fusion oncogene. Consistent with prior work, the RNA-seq and ChIP-seq data suggest that fusion of the DUX4 CTD converts the CIC transcriptional silencer into a potent transcriptional activator. Likely, this is mediated by the recruitment of P300/CBP and/or other histone acetyltransferases to genes normally silenced by CIC [34]. The genes most responsible and essential for the aggressive properties of CIC-DUX4 tumors remain to be determined. Our analysis of the transcriptional network activated in mouse CDS points to ETV1, ETV4, ETV5, and ETS1 as conserved targets of CIC-DUX4 and, more importantly, as key cooperative (and potentially redundant) regulators of the CDS transcriptional circuitry (Figure 4f). The role of ETV4 in CDS has been investigated before, but the results are inconsistent. For example, in a transgenic zebrafish model of CDS, genetic loss of ETV4 impairs tumor formation [30]. In contrast, ETV4 knockdown in transformed CIC-DUX4-expressing NIH 3T3 cells had no effect on cell viability and tumor growth but did impair metastatic potential [28]. One explanation for the discrepancy may be a divergence in PEA3 subfamily (ETS transcription factors ETV1, ETV4, and ETV5) redundancy during evolution from zebrafish to human [35,36]. Another possibility relates to the species-specific distribution of ETS binding sites, which has forestalled numerous attempts to model Ewing sarcoma in mice [37]. To dissect the independent and overlapping roles of ETS1 and PEA3 subfamily genes in CDS, future work will use CRISPR/Cas9 to systematically and combinatorially delete these genes in cell lines and tumors. Of further interest is whether forced overexpression or stabilization of these same factors could be toxic to CDS cells, in keeping with the 'Goldilocks phenomenon' of ETS family transcription factors described in Ewing sarcoma [38,39]. Lastly, these results indicate that CDS, unlike Ewing sarcoma, can be modeled in the mouse, which is an important conceptual advance. However, innovative solutions to prevent spontaneous recombination and expression in the absence of Cre will be required to generate a CDS model that can be maintained through standard mouse breeding and lead to spatially and temporally restricted tumors as a platform for discovering and testing novel therapies.

Figure 1. Fusion of a human DUX4 C-terminal domain to endogenous Cic is sufficient to

Figure 3. Mouse tumors express CIC-DUX4 and a transcriptional signature consistent with CDS.