A Mesenchymal-like Program of Dormancy controlled by ZFP281 Serves as a Barrier To Metastatic Progression of Early Disseminated Cancer Cells

Increasing evidence shows that cancer cells can disseminate from early-evolved primary lesions much earlier than the classical metastasis models predicted. It is thought that a state of early disseminated cancer cell (early DCC) dormancy can precede genetic maturation of DCCs and metastasis initiation. Here we reveal at single cell resolution a previously unrecognized role of mesenchymal- and pluripotency-like programs in coordinating early cancer cell spread and a long-lived dormancy program in early DCCs. Using in vitro and in vivo models of invasion and metastasis, single cell RNA sequencing and human sample analysis, we provide unprecedented insight into how early DCC heterogeneity and plasticity control the timing of reactivation. We identify in early lesions and early DCCs the transcription factor ZFP281 as an inducer of mesenchymal- and primed pluripotency-like programs, which is absent in advanced primary tumors and overt metastasis. ZFP281 not only controls the early spread of cancer cells but also locks early DCCs in a prolonged dormancy state by preventing the acquisition of an epithelial-like proliferative program and consequent metastasis outgrowth. Thus, ZFP281-driven dormancy of early DCCs may be a rate-limiting step in metastatic progression functioning as a first barrier that DCCs must overcome to then undergo genetic maturation.


INTRODUCTION
The majority of cancer patients die of metastatic relapse, which frequently occurs years to decades after diagnosis and treatment. This happens because patients already carry numerous disseminated cancer cells (DCCs) that can remain dormant for long periods of time, only to later on give rise to metastasis 1 . Although cancer dormancy is a major concern in the clinic, our knowledge about the origin and nature of dormant DCCs and the mechanisms that allow these cells to remain quiescent, but still retain metastasis-initiating capacity, is still limited. Additionally, it was believed that the ability of cancer cells to disseminate and metastasize was exclusive to late stages of progression [2][3][4] when rare cells in primary tumors gained numerous genetic alterations considered necessary for spread and target organ colonization. However, increasing evidence has revealed that the barriers to activate invasion and motility programs may be enabled very early in cancer evolution resulting in early DCCs seeding organs over long periods of time 1 .
Early dissemination or intra-organ dispersion was reported in human breast [5][6][7][8][9] , pancreatic 10-12 , lung 12 , melanoma 13 and colorectal 14 cancer patients. In renal cell 15 , ovarian 16 , testicular 17 carcinomas and osteosarcoma 18 , early spreading clones were also reported as metastasis founders across different organs. However, the exact programs leading early spread and the persistence of early DCCs are not clear from these studies. In mouse models of breast 7,19 , pancreatic 10,20 and melanoma cancers 21,22 we and others identified early lesion cells with the ability to spread to secondary organs; however, while early DCCs can indeed found metastasis 7,19 , information on their post-dissemination phenotypes had not been explored. This is important because the field has not resolved whether the time it takes for early DCCs to manifest as metastasis is due simply to a slow genetic maturation process or if indeed there is a program that holds early DCCs in a dormant state before they can initiate slow or fast (explosive growth 1 ) proliferation and "mature" genetically.
Using HER2 and PyMT oncogene-driven mouse models, we modeled early cancer cell dissemination and found that HER2 signaling activates a partial epithelial-to-mesenchymal transition (EMT) program, leading to dissemination 19 . Activation of progesterone and Wnt signaling and inhibition of the p38 pathway 7,19 fuels early spread through a process that resembles mammary tree branching morphogenesis 23 . Similarly, early dissemination is further fueled by early lesion infiltrating CD206+/Tie2+ macrophages in these same models, a similar population of macrophages that regulates mammary tree development 24 . Using a binary set of mesenchymal vs. epithelial markers (TWIST1 and CDH1), we found that HER2+ early DCCs in secondary organs maintain a mesenchymal and long-lived dormant phenotype that preceded metastasis initiation 19 . Here we expanded this analysis and used single cell RNA sequencing (scRNAseq) to reveal the DCC heterogeneity and plasticity in lungs across the evolutionary spectrum of the disease. Surprisingly, we found that early DCCs activate a novel program with the dual function of fostering early dissemination and initiating dormancy of early DCCs in lungs. We reveal that the primed pluripotency transcription factor ZFP281 is a key regulator of early DCC spread and dormancy. Using both organoid and in vivo models we show that ZFP281 induces mesenchymal-like and primed pluripotency programs, which while not enabling proliferation in the primary or secondary site, allows for efficient dissemination to the lungs. After dissemination, if not downregulated, ZFP281 maintains early DCCs in a prolonged dormant state. Importantly, we show that even aggressive late cancer cells can be reprogrammed into dormancy and prevented from metastasizing by regaining ZFP281 expression.
Our efforts in understanding early and late DCC heterogeneity have yielded a novel mechanism and regulator of metastatic dormancy that would have been missed if we had only focused on advanced primary tumor biology and the classical view of the metastatic cascade. Our work opens the way for understanding how early DCC dormancy is likely a first barrier for genetically immature DCCs to overcome to further evolve. These data may also enable exploiting these mechanisms to eliminate DCCs or force them into an indolent and harmless dormant phenotype.

Early lesions activate a mesenchymal-like program that persists in early and late lung DCCs.
To investigate the mechanisms of early dissemination, dormancy and metastasis awakening, we used the MMTV-ErbB2/HER2/Neu mouse model 25 , a spontaneous breast cancer model of HER2+ cancer with luminal characteristics 26 . This mouse model shows slow tumor progression, providing a significant temporal window to study early stages of tumorigenesis and metastatic progression prior to the occurrence of primary tumors (Figure S1A, 19 ). To understand the gene programs present in early versus late MMTV-HER2 lesions, we performed RNA sequencing (bulk RNAseq) of early lesion (EL) and primary tumor (PT) spheres, which recapitulate the in vivo behavior of EL and PT lesions 19 . We identified 4290 differentially expressed genes (DEGs, adj. p-value<0.05 and FC>2 or <0.5), 2873 upregulated and 1417 downregulated, in EL vs. PT spheres ( Figure 1A and STable 1). Among the upregulated gene ontology programs enriched in EL cells, we found terms associated with TGFβ, ECM, collagen, focal adhesion, PI3K and β1 integrin signaling, pathways associated with EMT, adhesion, cellular morphogenesis and ECM remodeling 27,28 . In contrast, the top downregulated gene ontology term was tight junction regulating genes ( Figure S1B and STable 2, Enrichr analysis 29,30 ).
Further supporting a gain of a more mesenchymal program, GSEA analysis 31,32 revealed epithelial-tomesenchymal transition (EMT) as the most enriched hallmark of EL over PT cells (Figure 1B and STable 3). EL cells were also enriched in 'mammary luminal down' and 'mammary stem cell' signatures ( Figure 1B and STable 3, 33 . Together these results suggest that HER2 + EL cells activate mesenchymal-like (M-like) and basal/stem-like programs, which are subsequently silenced in advanced PT cells that appear to gain an epithelial-like (Ep-like) program.
We next sought to better understand the heterogeneity of globally M-like EL and Ep-like PT cells, as well as early and late lung (eL and LL) DCCs. To this end, we sorted HER2+ tumor cells from EL, PT, eL DCCs and LL DCCs (Figure 1F and S1C) and performed single cell RNA sequencing (scRNAseq). Gene expression profiles from 3686 cells were obtained and compared with the trends we observed in EL vs. PT by bulk RNAseq. As expected, in single cells sorted from EL cells, the expression of genes which were upregulated (containing mesenchymal and stem genes) in the bulk RNAseq of 7-day sphere cultures of EL over PT spheres was higher and reciprocally, in single cells sorted from PTs, the expression of downregulated genes (containing luminal genes) was higher ( Figure 1C). Interestingly, EL cells showed more heterogeneity (expanded cloud range) than PT cells.
Additionally, early and late lung (eL and LL) DCCs clustered further away from PT cells but closer to EL cells (Figure 1C), suggesting that perhaps DCCs represent a subpopulation of EL cells with a more mesenchymal and stem signature. Further unsupervised clustering on the DEGs, using a previously described batch-aware algorithm 34 , showed that although EL and PT cells clustered almost independently, clusters 3 and 5 contain both EL and PT cells (Figure 1D), arguing that some PT cells maintain an EL program or regain such program. Interestingly, lung DCCs clustered separately from EL and PT cells; however, cluster 9 is uniquely composed by EL cells, eL DCCs and LL DCCs, no PT cells ( Figure 1D), suggesting that ELs contain a subpopulation of cells that already carry a signature that allows them to disseminate and persist in lung DCCs. Our analysis also showed that DCCs from early and late lungs, while heterogeneous and distinct from EL and PT cells, were always contained in the same signature clusters (7-11) ( Figure 1D) suggesting that DCCs with early lesion signatures may persist in the late stages. This prevented distinguishing early DCCs from those DCCs populating late lungs, most likely because late lungs carry early DCCs (derived from ELs), DCCs derived from primary tumors and growing metastasis, all co-existing in the same lungs. Nonetheless, Ep-and Mlike signatures found in the primary early and late lesions (Figure 1A and 19 ) were also found in lung DCCs and these cells could be broadly grouped into Ep-(7-8) and M-like (9-11) clusters ( Figure 1E).
Interestingly, the Ep-like clusters 7 and 8 shared epithelial signatures more homogeneously, while the M-Like clusters 9-11 showed more non-overlapping mesenchymal transcriptional signatures and all populations co-exist in early or late stage lungs.
Analyses by both FACS (Figure 1F and S1D) and immunofluorescence ( Figure 1G) revealed that PT cells show high levels of HER2 and a predominant epithelial phenotype, characterized by strong EpCAM expression and non-invasive organoids. In contrast, EL cells showed a broader spectrum of HER2 expression and a mixture of epithelial EpCAM + cells and also mesenchymal Eng/CD105 + cell 6 populations ( Figure 1F and S1D) which correlated with a more invasive phenotype when in culture on top of matrigel (Figure 1G and 19 ). Endoglin (Eng/CD105) was a mesenchymal marker 35 selected from the scRNAseq data ( Figure 1E) due to its selective upregulation in M-like DCCs enriched in early lungs. FACS analysis confirmed that the M-like/Eng + and invasive phenotype found in EL cells, persists and increases in frequency in early lung DCCs (Figure 1F, S1D and 1G). In contrast, late lungs presented a smaller population of M-like DCCs (Eng + and invasive) and enrichment in Ep-like DCCs (EpCAM + and non-invasive), resembling PT cells. We conclude that a subpopulation of EL cells carries a M-like signature that is found in M-like clusters in lung DCCs, which revealed a previously unrecognized heterogeneity of cellular states of DCCs in early and late stage lungs.
Remarkably, late lungs are still populated by a significant fraction of DCCs with transcriptional programs found in early DCCs.

Early DCCs gain multiple M-like cellular states and display a dormant phenotype.
To gain further insight into the heterogeneity of M-and Ep-like phenotypes of lung DCCs, we performed additional scRNAseq profiling focusing exclusively on lung DCCs and increasing the number of sampled cells. HER2-negative non-cancer lung cells and HER2+ DCCs from early and late stage mice were analyzed, and a comprehensive analysis and clustering of 15,287 additional cells was performed ( Figure S2A). We identified 25 distinct clusters, 10 were excluded due to their high prevalence in normal lung cells resulting in 15 DCC clusters (with less than 16% of normal lung cells).
These DCCs were further sub-grouped in M-like (1 to 4), hybrid (5 to 8) and Ep-like (9 to 15) clusters based on canonical mesenchymal and epithelial signatures (Figure 2A and S2A). Ep-like scores are variable but it seems the vast majority of DCCs keep an epithelial identity, gaining or losing mesenchymal traits (Figure 2A and S2A). Of note, DCCs show a high degree of cellular plasticity but few cells undergo full EMT or MET, which made us use the terms M-and Ep-like, not strict categorizations. We also found that late stage mice with advanced disease carried more Ep-like DCCs, while early stage mice had more frequently DCCs with M-like and hybrid phenotypes ( Figure   2A). Interestingly, only one cluster, cluster 15, was enriched exclusively in LL DCCs and had a strong Ep-like signature. All other clusters had some representation of early DCCs and DCCs from latestage animals with Ep-(more frequent in late lungs) and M-like (more frequently in early DCCs) phenotypes (Figure 2A-B, S2B and S2F).
Analysis of gene-to-gene correlation of highly variable genes identified gene modules with strong co-expression patterns. The enrichment of TF targets that correlated with the expression of the modules (Enrichr analysis 29,30 ) revealed multiple programs activated in DCCs that are associated with pluripotency, mixed-lineage differentiation and EMT ( Figure 2B and STable 5). M-like DCCs from cluster 1, enriched in gene module A, express brain-and osteoblast-lineage genes, and this A signature revealed genes commonly regulated by the TFs Neurod1 (neurogenic differentiation 1), SOX8, SOX9 and SOX10 (embryonic development regulators) and Vim, Col4a1 and Col4a2 (EMTassociated genes). M-like DCCs from cluster 2, enriched in gene module B, share some of the above mentioned genes and also genes commonly regulated by the TFs and chromatin remodelers SUZ12, SOX17, SOX18, POU5F1/OCT4, well-known pluripotency regulators. M-like DCCs from cluster 3 and 4, still carried genes controlled by the above transcriptional regulators but also gained EMT genes (Zeb2 and Col3a1) and genes commonly regulated by the TFs Snai2, Twist1, Prrx1, Fbn1 (also EMT inducers) and SMADs. These data support that an M-like program initiated in EL cells (Figures 1 and   S1) is carried by early DCCs in the lung and even persists in DCCs in late lungs. In hybrid DCCs  Figure 2C). When analyzing the Ep-like clusters (9-15), we found that gene module I is homogeneously expressed by almost all clusters (9 to 14). Interestingly, cluster 15 is distinct and, as mentioned above, composed only by DCCs from late lungs (LL). These DCCs express luminal epithelial genes (EpCAM and Krt18), Ovol1, Ovol2, Grhl2 TFs (also epithelial genes), as well as mammary gland/lactation genes (Csn1s1, Csn1s2, and Csn3) ( Figure 2B and STable 5). These results suggest that cluster 15 corresponds to more luminal differentiated Ep-like proliferative DCCs/metastasis, as evidenced by an increase in CCND1 gene, which is almost absent in M-like and hybrid clusters ( Figure 2D).
This led us to better define which clusters contain dormant DCCs. We found that Cdkn1c/p57 Kip2 , NR2F1 and TGFβ2, all genes previously linked to quiescence and dormancy of DCCs, are more frequently expressed by M-like DCCs (Figure 2D), which as mentioned above show higher frequency of early DCCs (Figure 2A-C, negative for p-histone H3 and p-Rb 19 ). These data suggest that early DCCs activate gene programs linked to cell plasticity, progenitor-like, M-like and dormant programs and that the transition from these programs to an epithelial and more "differentiated program" associates with their ability to proliferate and form metastasis.

ZFP281 is a marker of the M-like programs in EL cells and early DCCs.
Since EL cells do not form tumors but disseminate efficiently and persist as DCCs in lungs 19 , we hypothesized that the gain of an M-like program found in the DCCs may be transcriptionally encoded already in the early lesions. To address this hypothesis we performed a transcription factor (TF) network analysis mining the bulk RNAseq data derived from EL versus PT spheres. This analysis identified 8 interconnected nodes where ZFP281 was the TF node with the highest number of DEGs in EL cells ( Figure 3A and STable 6). ZFP281 is a key transcriptional regulator of primed pluripotency in both mouse and human embryonic stem cells and functions as a barrier toward achieving naive pluripotency 36 . ZFP281 is absent in terminally differentiated human tissues and it was shown to counteract osteogenic 37 and muscle differentiation 38 , and we did not find it expressed in normal mammary gland cells ( Figure S3B). Further, ZFP281 promotes EMT in colorectal cancer cells by upregulation of SNAI1 and CDH1 39 . ZFP281 is upregulated during the naïve-to-primed pluripotent state transition 36 where EMT or partial EMT/epithelial plasticity was postulated to happen 40,41 . When overexpressed in mouse ESCs, ZFP281 also suppresses growth (unpublished data). The second largest node we identified, NR5A2 (also known as LRH-1) (Figure 3A), also plays an important role in maintaining stem cell pluripotency during embryonic development 42 but its link to EMT is still unclear 43,44 . Among other TFs, we found RARβ and RARγ, previously linked to dormancy 45 , that may also play a role in early lesions and early DCCs. Thus, EL cells seem to upregulate a set of TFs that are involved in pluripotent stem cell plasticity and invasion programs.
We focused on ZFP281 as a potential paradigmatic EMT parallel between normal stem cell and cancer development owing to its roles in regulating stem cell pluripotency, growth arrest and invasion.
We validated the increase in ZFP281 mRNA levels and its activity by measuring the changes of its predicted target genes ( Figure 3A) by qPCR in EL over PT spheres ( Figure S3A). This analysis shows that M-like genes such as TGFBRII, CDH11 and Eng are induced in EL over PT cells, while Ep-like genes such as CDH1 and EpCAM were downregulated, arguing that ZFP281 represses an Ep-like identity. Analysis of the expression of the predicted ZFP281 target genes ( Figure 3A) in the lung DCC clusters revealed that the predicted ZFP281 targets upregulated in EL cells are also frequently upregulated in M-like lung DCCs. In contrast, Ep-like lung DCCs do not show upregulation of these predicted ZFP281 targets or observed upregulated genes (scRNAseq, Figure 3B). At the protein level, we found even stronger differences in ZFP281 expression: in normal FvB mammary glands (fully differentiated tissue) ZFP281 is expressed only in 3% of the cells; whereas 30% of EL cells express ZFP281, which is then downregulated in PT cells (8% ZFP281 + ; Figure 3C and S3B).
Staining of EMT markers (E-cadherin and Twist1 as epithelial and mesenchymal markers, respectively) in sequential sections show that structures enriched in ZFP281 are Ecad low (less intense membrane staining) and Twist1 high ( Figure S3B). When monitoring ZFP281 expression in the early DCCs, we also found that 42% of single early DCCs are ZFP281 + , while only 5% of cells within proliferative metastasis are ZFP281 + (Figure 3D-E). This further supports that ZFP281 and its regulated programs turned on in EL cells persist in early lung DCCs. Supporting the notion that ZFP281 may drive a dormant phenotype, Ki67 and ZFP281 expression were found to be mutually exclusive in early lung DCCs ( Figure 3D). To determine if ZFP281 upregulation was also a property of early lesions in human breast cancer we stained ductal carcinoma in situ (DCIS) and invasive breast cancer (IBC) samples for ZFP281. Our analysis revealed that, among 14 human DCIS samples, 48% of the cells per lesion were ZFP281 + . In contrast, only 11% of the IBC cells were positive for ZFP281 ( Figure 3F and G). These data strongly support that ZFP281 is a novel TF in cancer mainly associated with early breast cancer progression by controlling EMT programs while suppressing active cell proliferation. These further suggest that EL cells that turn on ZFP281 would be candidates for systemic spread followed by a dormant phenotype in target organs.

ZFP281 regulates components of an M-like program and primed pluripotency in early DCCs
ZFP281 tightly coordinates cell fate through regulation of primed pluripotency programs in development 36 with possible participation of EMT/partial EMT 40,41 . To determine whether ZFP281 also regulates such programs in early mammary cancer cells, we compared RNAseq data from naïve versus primed mouse pluripotent stem cells (a transition regulated by ZFP281 36 ) and from EL versus PT spheres ( Figure 1A). This analysis revealed that DEGs in EL vs PT and in primed versus naïve stem cells were commonly positively correlated in the categories of EMT and Wnt signaling ( Figure   4A), suggesting that in EL cells ZFP281 may drive similar EMT and Wnt signaling as those found in primed pluripotent stem cells.
To gather deeper insight into the ZFP281 regulated programs in EL versus PT cells we performed chromatin immunoprecipitation sequencing (ChIPseq) in EL and PT cells, identifying 4018 ZFP281 targets in EL cells. Strikingly, comparison of the EL ChIPseq data with that derived from primed mEpiSCs showed a significant overlap in the categories of genes and actual genes that are regulated by ZFP281 in these two contexts ( Figure 4B). ZFP281 seems to regulate cell cycle arrest, EMT, Wnt and FGFR signaling both in EL and primed mEpiSCs, suggesting that HER2-driven early lesion cells activates distinct programs found very early in embryo development, even earlier than EMT in neural crest cells during gastrulation 46 .
When comparing our RNAseq and ChIPseq data from EL versus PT cells, we found 504 genes with high ZFP281 binding and high expression in EL cells ( Figure S4A-B). Some of these genes overlap with the putative ZFP281 target genes from the network analysis in Figure 3A (Figure S4B), but we also identified new ZFP281 target genes that were not computationally predicted. Among them are Snai1, Vim, Zeb1 (EMT inducers 28 ), Cdk2 and Cdkn1a (cell cycle related 47 ) and Tgfbr1, Nr2f1 and Bmp7 (dormancy associated genes 48 ) (Figure 4C and S4C). These genes were exclusively bound by ZFP281 and upregulated in EL cells or bound by ZFP281 in EL and PT cells but only upregulated in EL cells. In contrast to the genes that were bound by ZFP281 in EL cells, we also identified 118 genes with high ZFP281 binding and high expression in PT cells ( Figure S4B). This suggests that while ZFP281 expression decreases in PT cells it still binds and regulates a different ZFP281-dependent program of yet unknown function in PT cells.
To address the importance of ZFP281 and its target genes in lung DCCs, we examined their expression in our lung DCC scRNAseq data. Strikingly, we found that ZFP281 targets (from EL ChIPseq) score summarizing the averaged expression of ZFP281 targets has a bimodal distribution in lung DCCs ( Figure S4D However, some clusters like cluster 8 showed a drop in ZFP281 signature score, arguing that some hybrid cluster cells are moving from an M-like to Ep-like state. We conclude that ZFP281-regulated genes in EL cells are still active in M-like dormant DCCs and that they likely make M-like cells permissive to explore these mesenchymal states while repressing an epithelial state. These data also further support that the M-like program driven by ZFP281 is activated in the EL cells, carried over and sustained in DCCs in the target organs.

ZFP281 prevents the acquisition of an Ep-like state and maintains a M-like dormant phenotype in DCCs
Our data support a model whereby ZFP281 regulates programs of dissemination and primed pluripotency that lead to early DCC dormancy. Thus, we set out to functionally test whether indeed ZFP281 holds DCCs in a dormant state in lungs. EL cells are engaged in a M-like invasive program and, when cultured in suspension (in mammosphere medium) or in 3D (on top of Matrigel), EL cells form more invasive (M-like) spheres than PT cells, which grow into large and less invasive (Ep-like) spheres ( 19 and Figure S5). Using an inducible short hairpin for ZFP281 (shZFP281) we show that ZFP281 downregulation in EL spheres leads to a transition from an M-like to Ep-like phenotype, resembling PT spheres ( Figure 5A). FACS analysis shows that a population of M-like EL cells with the DOX-induced shZFP281 gain medium and high EpCAM expression, however they do not downregulate Eng expression, leading to an increase in a hybrid phenotype (EpCAM + /Eng + , dark blue) (Figure 5D-E). Additionally, although the frequency of spheres does not change significantly While the in vitro assays employed above are not optimal surrogates to read out dormancy mechanisms, they provide clues as to the phenotypic direction of certain genes in cancer cells. Thus, we next tested the gain and loss of function effects of ZFP281 on tumorigenesis, dissemination and metastasis. EL spheres transduced with the DOX-inducible shZFP281 system were injected in the mammary fat pad (MFP) of mice as reported 19 . Then mice were given control drinking water (-DOX), water with doxycycline from day 0 (+DOX) or starting one month after sphere injection (-DOX +DOX) until the end of the experiment, five months after spheres injection. As previously reported 19 few mice developed palpable slow-growing tumors with static kinetics, with no difference between conditions (data not shown); however, when the injection sites were analyzed after five months, HER2+ EL cells were still found in the MFP of all mice and ZFP281 expression was downregulated both in EL shZFP281 '+DOX' and '-DOX +DOX' groups ( Figure 6A). Even in the absence of primary tumors, after five months single DCCs and micro-metastasis were found in all lungs, supporting a 100% efficiency of dissemination by EL cells (Figure 6B). Comparison of the control and two DOX treatment groups showed that EL shZFP281 -DOX +DOX mice showed less single DCCs per mouse.
However, both groups of animals where ZFP281 was downregulated from the beginning ('+DOX') or after one month ('-DOX +DOX') displayed a significant increase in lung metastasis ( Figure 6C). While solitary HER2 + DCCs in all groups were Ki67 negative, the frequency of proliferative Ki67 + cells in metastasis increased upon ZFP281 downregulation, regardless of the treatment schedule ( Figure   S6A). Given that the M-like clusters mostly enriched in early DCCs were characterized by a ZFP281 enriched signature that also showed expression of dormancy and cell cycle arrest genes (Figure 2D), these data strongly support that the M-like dormant phenotype is induced and maintained by ZFP281.
Consistently, loss of ZFP281 signaled early DCCs reactivation from dormancy and switch to a proliferative phenotype.
Next, we studied the effect of PT spheres overexpressing ZFP281 (ZFP281-OE). The control or ZFP281-OE spheres were injected in the MFP of mice as described above. Although ZFP281-OE cells were not impaired in their ability to form the initial tumor, they were significantly slower in their growth kinetics, supporting the growth suppressive function of ZFP281 ( Figure 6E). Tumor sections showed a heterogeneous increase of ZFP281 expression in the PT ZFP281-OE condition ( Figure   6D). Nevertheless, even with slower growing tumors (Figure 6E), after two months mice carrying ZFP281-OE tumors showed a five-fold increase in the number of lung single-cell DCCs compared to control tumors ( Figure 6F). Importantly, the increase in single-cell DCCs in the ZFP281-OE group did not result in an increase in micro-metastasis at two months. Thus, ZFP281 suppresses growth of the primary tumor, but enhances dissemination without a subsequent increase in metastatic growth, which is consistent with its ability to induce dormancy. In a longer experiment that allowed reading out better overt metastasis, less PT Control or ZFP281-OE cells were injected and tumors were allowed to grow for 70 days, removed by surgery and then mice were followed and euthanized five months after injection. While no difference in number of lung single DCCs was found, a significant reduction in number and size of metastasis was observed in PT ZFP281-OE mice over PT control (Figure 6G-I as well as reduction of Ki67 + cells (Fig S6B). These results clearly corroborate the key role of ZFP281 in inducing a growth arrested dormant phenotype in DCCs that is intrinsically active in early DCCs but also enforceable in late DCCs.

DISCUSSION
Early dissemination was documented in various human and mouse studies [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] ; however, limited information is available as to what is the fate of the early DCCs once lodged in target organs and before metastasis grow out. Further, incomplete modeling of early dissemination biology has prevented determining whether early DCCs turn on active programs of dormancy that delay re-growth and/or if they simply lack sufficient "driver" mutations and require more time and slow proliferation to produce successful clones in target organs.
Our previous work revealed that oncogene and microenvironmental signals in early lesions conspire to activate an EMT program, which appeared to persist in non-proliferative DCCs as marked by a TWIST1 high E-Cadherin low p-Rb low p-histone-H3 low profile that regained E-cadherin to resume proliferation 7,19 . Similarly, early pancreatic DCCs also undergo EMT that persists in circulating pancreatic cells that seed the liver 20 . However, all these studies did not functionally link the EMT program to dormancy or reactivation. Nevertheless, together these data hinted at a modulation of M-like and Ep-like identities in DCCs as a driving mechanism of early DCC fate. Corroborating this, our current study reveals that early lesions driven by the HER2 oncogene activate an M-like program of motility and invasion linked to primed pluripotency that not only allows early lesion cells to spread but also enables them with a program of dormancy where stem cell-like plasticity is operational. Our analysis revealed that ZFP281 serves as a barrier for DCCs to adopt an Ep-like phenotype but also enables M-like DCCs to explore at least 4 major phenotypes described by transcriptional modules A-D. These M-like programs seem to associate with the expression of dormancy markers such as the Cdkn1c/p57 Kip2 , NR2F1 and TGFβ2, some of which are directly bound by ZFP281 and exclusively induced in EL cells. Hybrid clusters of DCCs appeared to downregulate ZFP281 activity and gain back Ep-like genes, supporting our hypothesis that ZFP281 is restraining this switch. Ep-like clusters are also more homogeneous, arguing that once the DCCs commit to a proliferative phenotype they are funneled into a more phenotypically uniform state, except for cluster 15 that seemed to veer further into a "differentiation state" where lactation genes were upregulated. Interestingly, we had described that early DCC-founded metastasis had a mixed histology with undifferentiated and glandular-like structures reminiscent of lactogenic acini 19 . We interpret that early DCCs enter the lung in a M-like state and can persist dormant until signals, yet to be determined (intrinsic or microenvironmental), cause a final switch. Early DCCs M-like states may allow DCCs to explore different programs (developmental/pluripotency and mixed-lineage differentiation) that best fit them to adapt and survive in secondary organs. In fact, mesenchymal M-like DCCs in these same mouse models have been linked to drug resistance 49 . Our analysis of HER2 + early DCCs in tissues showed that they are all vastly negative for Ki67 protein expression. Thus, Hybrid and Ep-like cells may also undergo a dormancy phase, but their transcriptional states may enable them to be more prone to reactivate, a measure we could not capture with Ki67 staining. Additionally, the ZFP281 knockdown suggests that simply reducing this TF can move early DCCs to a more Hybrid or Ep-like phenotype and this correlated with increased metastatic growth. Furthermore, Snail-and Zeb1-driven EMT was previously described to suppress cell-cycle progression through repression of cyclin D1 and D2 50,51 ; while mesenchymal-to-epithelial transition (MET) was associated with rapid relapse and reduced survival in metastatic castrate resistant prostate cancer patients 52 .
Previous studies also theorized that M-like states might produce dormant DCCs 28 . Further, Lawson et al. found that low-burden (dormant-like) breast cancer metastatic cells (i.e., upregulated CDKN1B, CHEK1, TGFBR3 and TGFB2) were mostly basal and pluripotent stem-like (i.e., upregulated POU5F1 and SOX2), while higher-burden DCCs were more luminal-like and proliferative (i.e., upregulated MYC, CDK2, MMP1 and CD24) 53 . Previously, we also reported that the lineage commitment regulators DEC2/BHLHE41 and NR2F1/COUP-TF1 coordinate stem-like and quiescence programs 45,54,55 . However, all the latter studies are in late evolution cancer models. Our data is the first to functionally map these basal/stem-like and developmental/pluripotency programs to such early stages of cancer evolution and associate it with an M-like dormant DCC phenotype.
Recently, Laughney et al. also reported that metastatic cells (from late evolution cancer models) recapitulate a primitive transcriptional program spanning stem-like to regenerative pulmonary epithelial progenitor states, such as the key endoderm and lung-specifying transcription factors, SOX2 and SOX9 56 . Similarly, we observed that early DCCs in our model have a cellular plasticity that may allow them to explore different programs (developmental/pluripotency and mixed-lineage differentiation) that best fit them to adapt and survive in secondary organs. Together, these data suggests that pluripotency and dedifferentiation programs may be common programs present in different cancer types and in early stages of cancer progression. Our findings support that early DCCs display a high degree of cellular plasticity through mesenchymal-like, primed pluripotency and dormancy programs that likely endow them with the necessary fitness to survive and undergo genetic maturation upon reactivation.
As mentioned above, close to 100% of early DCCs are Ki67 negative and when ZFP281 is downregulated while metastases emerge, the Ki67 frequency is very low. Thus, as proposed earlier, it is likely that once early DCCs break out from dormancy, they then initiate slow proliferation and genetic maturation. These gradual kinetics may contribute to the invisible phase of the metastatic disease 1 . Nevertheless, our work shows that ZFP281-regulated (or other) dormancy of early DCCs is a prior barrier to overcome before maturation ensues and together both steps may be very protracted.
A remarkable finding was that the early DCC dormancy program seems to be pre-encoded in the primary site early lesions via ZFP281 upregulation and thus, this TF may serve as a new marker of dormant early DCCs. ZFP281 suppresses a fully epithelial phenotype, inducing a growth arrested dormant phenotype in DCCs that is spontaneously operational in early DCCs. Furthermore, loss of ZFP281 leads to reactivation of DCCs and switch to a proliferative phenotype. Thus, an opportunity opens to identify lesions that may carry or not this dormancy program and determine if it informs on dissemination and relapse measures. Indeed, we showed that ZFP281 detection is prevalent in human DCIS samples and significantly decreased in advanced invasive tumors, further supporting the validity of our findings. Since ZFP281 seems to be quite specific for early lesions and early DCCs, and knowing that other TFs, such as NR2F1, when detected in prostate and breast cancer DCCs inform on patient prognosis 45,57 , similar studies could be performed for ZFP281. It would be interesting to test if ZFP281 may help measure the abundance of early-like DCCs in patients with early or advanced disease and if it may serve as a marker of relapse.
Several questions that remain unanswered will need additional studies. For example, we have no clear indication of what cues specifically induce ZFP281 in early lesions and early DCCs. The link to M-like phenotypes suggest that signaling like TGFβs, Wnts and BMPs may induce this TF. Further, the role of ZFP281 in PT cells remains unknown. While ZFP281 expression decreases in PT cells it still binds and regulates a different ZFP281-dependent program in PT cells, suggesting that different ZFP281-dependent regulatory networks may operate in EL and PT cells, due likely to the presence/absence of different ZFP281-interacting co-activators and/or repressors 36 . Last, our approach of single cell analysis could not specifically distinguish early DCCs from those exclusively arriving from late lesions. Nevertheless, our data provide unprecedented insight into early DCC fate, demonstrating that ZFP281 regulates an active program of dormancy that must be overridden and precedes a slow proliferation phase towards metastasis. Future studies would also need to pair the analysis of these mechanisms to determine how they influence genetic maturation of early DCCs.
Overall, we reveal a unique biology that expands our understanding of metastatic progression that may lead to new markers and strategies to prevent metastasis.

ACKNOWLEDGMENTS
We thank the Aguirre-Ghiso lab for helpful discussions and thank the expertise and assistance of the    (B) Distribution of ZFP281 predicted target scores, summarizing the UMI fraction of ZFP281 predicted targets (ZFP281 node in Figure 3A and STable 6), in all DCC clusters analyzed by scRNAseq ( Figure   2).   Figure 1A). Gene lists in STable 7 and 8.

(B) Venn diagrams for ZFP281 targets identified by ChIPseq from EL cells and primed mEpiSCs
ChIPSeq data, GSE93042 from 58 ) and cell cycle arrest, EMT, Wnt and FGF signaling genes.
Comparisons and statistics were done in pairs (ZFP281 targets in EL vs. Cell cycle arrest genes; ZFP281 targets in EL vs. EMT genes; ZFP281 targets in EL vs. Wnt genes; ZFP281 targets in EL vs.  See also Figure S5.   (C) Biological negative controls used for FACS gating strategy. FvB mammary gland (MG) was used to set the EL and PT gate and FvB lungs for eL and LL DCCs (see Figure 1F).

Supplementary
Venn diagrams on Figure 4B. Table 10. Gene lists of DEG in RNAseq and ChIP seq data from EL/PT cells.

Supplementary
Heatmap on Figure S4A.  with a pipeline that included the following steps. Reads were aligned to the transcriptome derived from UCSC genome assembly (((hg19))) using STAR 59 with default parameters. Read counts for each transcript was measured using featureCounts 60   Single-cell datasets (1 st and 2 nd experiments) were clustered separately using an unsupervised clustering algorithm previously described 34 . In this EM-like algorithm, parameters for multinomial celltype specific gene-expression models are learned together with a parameter for the fraction of background noise that is associated with cells in each sample. Mitochondrial genes, Malat1 were excluded from the clustering process. The clustering parameters were chosen to accommodate the different UMI and cell counts between the experiments. In the 1 st experiment, the minimum number of UMIs per cell was 300, the number of clusters k was 12, and (P 1 ,P 2 ) =(30 th ,60 th ) percentiles. In the 2 nd experiment, the minimum number of UMI per cell was 800, k was set to 20, and (P 1 ,P 2 ) =(0 th ,20 th ) percentiles. Gene modules and gene scores: Gene modules were based on a gene-covariance analysis as was applied as in 34 . Briefly, cells were down-sampled and variable genes were selected based on the variance to mean ratio. Gene-to-gene correlations were estimated per sample and were averaged following z-transformation. The averaged correlation matrices were hierarchically clustered into gene-"modules". Given a gene-list, we calculated its score per cell by summing up the UMIs of the genes in this cell and divided the sum by the total sum of UMI in the cell. The score therefore equals to the fraction of UMIs associated with the genes in the list. The aligned bam files were sorted by name using the parameter -n. We used the HTSeq software (v0.11.2) and mm10 annotation file from GENCODE (version M19) to count reads for each gene using parameters -r name -f bam, and BioMart 63 to retrieve corresponding genes names. Finally, read counts were normalized with the trimmed mean of M-values (TMM) method 64 for differential expression analysis using edgeR (v3.26.8) 65 . Public RNA-seq data were downloaded (refer to Key

Supplementary
Resource Table) and aligned to mm10, and followed with the same processing setting. Cell cycle arrest, EMT, FGF signaling, and Wnt signaling gene sets were downloaded from MsigDB (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) with systematic names M1134, M5930, M1090, and M7847, respectively. All processed and index sorted bam files of high-throughput sequencing data were converted to TDF files using count command of igvtools, followed by visualization using IGV software 66,67 . Immunofluorescence. Tissues slides (see 'Animal experiments' section) were dehydrated, followed by antigen retrieval 10 mM citrate buffer pH 6.0 (Na 3 H 6 H 5 O 7 ). Blocking was done using 0.5% BSA in PBS with 5% normal goat serum (Thermofisher PCN5000) for 1 hour. Antibodies and incubation conditions used are summarized in Supplementary Table 13. For ZFP281 detection, Alexa Fluor™ 488 Tyramide SuperBoost™ Kit goat anti-mouse IgG (Invitrogen) was used for amplification of the signal. All slides were mounted with ProLong Gold Antifade reagent with DAPI (Invitrogen P36931).
Images were obtained using a Leica SPE high-resolution spectral confocal microscope and Leica software.
Quantitative PCR. Spheres were processed using Cell-to-CT 1-Step Power SYBR Green kit (Invitrogen, A25600) and primers from Supplementary In vivo experiments. 300 EL or 150-300 PT spheres were injected per site into nude mice (BALB/cnu/nu, Charles River) in 100ul of a 1:1 PBS-Matrigel solution (Corning, growth factor reduced). Spheres were injected in the two fourth inguinal gland fat pad using a 27-gauge needle.
Mice injected with sh-TRIPZ-shZFP281 were given control drinking water (-DOX), water with doxycycline from day 0 (+DOX) or water with doxycycline starting 1 month after sphere injection (-DOX +DOX) until the end of the experiment, 5 months after spheres injection. In the case of mice injected with tumour-derived spheres, tumours were removed before reaching 1cm 3 , according to IAUCU regulations. Mice were euthanized and organs were collected and processed 2 or 5 month after cancer cell injections. Immunofluorescences were performed 2 sections per mice were used to quantify and characterize single DCCs and metastasis. H&E slides were scanned using NanoZoomer Data availability. The datasets generated during and/or analyzed during the current study are available within the paper (and its Supplementary Information) and/or from the corresponding author on reasonable request. All sequencing data will be made available in a public data repository.