ZFP281 drives a mesenchymal-like dormancy program in early disseminated breast cancer cells that prevents metastatic outgrowth in the lung

Increasing evidence shows that cancer cells can disseminate from early evolved primary lesions much earlier than the classical metastasis models predicted. Here, we reveal at a single-cell resolution that mesenchymal-like (M-like) and pluripotency-like programs coordinate dissemination and a long-lived dormancy program of early disseminated cancer cells (DCCs). The transcription factor ZFP281 induces a permissive state for heterogeneous M-like transcriptional programs, which associate with a dormancy signature and phenotype in vivo. Downregulation of ZFP281 leads to a loss of an invasive, M-like dormancy phenotype and a switch to lung metastatic outgrowth. We also show that FGF2 and TWIST1 induce ZFP281 expression to induce the M-like state, which is linked to CDH1 downregulation and upregulation of CDH11. We found that ZFP281 not only controls the early dissemination of cancer cells but also locks early DCCs in a dormant state by preventing the acquisition of an epithelial-like proliferative program and consequent metastases outgrowth. Aguirre-Ghiso and colleagues report that the pluripotency transcription factor ZFP281 promotes dormancy of early disseminated breast cancer cells in the lung by driving a mesenchymal-like gene expression program.

M ost cancer patients die of metastatic relapse, which frequently occurs years to decades after diagnosis and treatment and is initiated by DCCs that can remain clinically dormant for long periods 1 . Cancer dormancy is a major clinical problem. However, our knowledge about how cancer cells remain quiescent while retaining metastasis-initiating capacity is still limited. Additionally, it was believed that cancer cells could disseminate and metastasize only during late stages of progression [2][3][4] . However, increasing evidence supports that early DCCs seed organs over long periods of time, starting very early in cancer evolution 1 .
We previously found that HER2 and PyMT oncogene signaling, along with tissue resident macrophages, activates a partial epithelial-to-mesenchymal transition (EMT) program, leading to early cancer cell dissemination 7,20,24 . Additionally, HER2 + early DCCs in secondary organs maintain a TWIST1 + and long-lived dormant phenotype that preceded metastasis initiation 20 . Here, we used single-cell RNA sequencing (scRNAseq) to reveal the DCC heterogeneity and plasticity of lung DCCs across the spectrum of mammary cancer progression. We found that the primed pluripotency transcription factor (TF) ZFP281 is a key regulator of early DCC spread and dormancy. Using both organoid and in vivo models, we show that ZFP281 is induced by FGF2 and TWIST1. TWIST1 induces ZFP281, which in turn maintains expression of the former TF to induce M-like and primed pluripotency-like programs. These programs suppress proliferation in primary sites but allow for efficient dissemination (at least to the lungs). After dissemination, ZFP281 maintains early DCCs in a prolonged growth-arrested dormant state via the induction of the class II cadherin 11 (CDH11). Importantly, we show that even aggressive primary tumor (PT) cells, which show low levels of ZFP281, can be reprogrammed into dormancy and prevented from metastasizing by regaining ZFP281 or CDH11 expression. Our findings have yielded a previously unrecognized mechanism of metastatic dormancy that would have been missed if only advanced primary tumor biology and the classical view of the metastatic cascade were considered. a single cluster (9) is uniquely composed of EL cells, eL DCCs and LL DCCs, but not PT cells (Fig. 1d), suggesting that ELs contain a subpopulation of cells that already carry a signature similar to the one found in cells that disseminated and persisted in lungs. DCCs from early and late lungs, although heterogeneous and distinct from EL and PT cells, were always contained in the same transcriptional clusters (7)(8)(9)(10)(11) (Fig. 1d), suggesting that DCCs with EL signatures may persist in the late stages. This prevented distinguishing early DCCs from those DCCs populating late lungs, most likely because late lungs carry early DCCs (derived from ELs), PT-derived DCCs and growing metastasis, all coexisting in the lungs. Nonetheless, Ep-like and M-like signatures found in the primary early and late lesions ( Fig. 1a and Harper et al. 20 ) were also found in lung DCCs, and these cells could be broadly grouped into Ep-like (7)(8) and M-like (9-11) clusters (Fig. 1e). Ep-like clusters 7 and 8 shared epithelial signatures more homogeneously, whereas the M-Like clusters 9-11 showed nonoverlapping mesenchymal signatures. Of note, DCCs show a high degree of transcriptional heterogeneity, but few cells display full EMT or mesenchymal-to-epithelial transition (MET) signatures, which led the terms M-like and Ep-like rather than strict categorizations.
Analyses by both fluorescence-activated cell sorting (FACS) ( Fig. 1f and Extended Data Fig. 1f) and immunofluorescence ( Fig. 1g) revealed that freshly isolated PT cells show high levels of HER2 and a predominant epithelial phenotype, characterized by strong EpCAM expression and noninvasive organoids. In contrast, EL cells showed a broader spectrum of HER2 expression and a mixture of epithelial (41.6% EpCAM + ) and mesenchymal (17.2% Eng/CD105 + ) cell populations (Fig. 1f, representative plots, and Extended Data Fig. 1f, replicates combined), which correlated with a more invasive phenotype ( Fig. 1g and Harper et al. 20 ). Endoglin (Eng/CD105) was a mesenchymal marker 32 selected from the scRNAseq data due to its selective upregulation in M-like DCCs (clusters 10 and 11; Fig. 1e). We confirmed that the M-like/Eng + and invasive phenotype found in EL cells persists and increases in frequency in early lung DCCs (Fig. 1f,g, Extended Data Fig. 1f, ~56.13%). In contrast, late lungs presented a smaller population of M-like (~30% Eng + ) and invasive phenotype and increase in Ep-like (~30% EpCAM + ) and noninvasive phenotype, resembling PT cells (Fig. 1f,g and Extended Data Fig. 1f). We conclude that a subpopulation of EL cells express an M-like signature found in lung DCCs clustered with an M-like signature. Remarkably, late lungs are still populated by a significant fraction of DCCs with signatures found in early DCCs, similar to EL cells.

Early DCCs gain M-like states and a dormant phenotype.
To gain further insight into the heterogeneity of M-and Ep-like phenotypes of lung DCCs, we next performed additional scRNAseq profiling exclusively on lung HER2 + DCCs. HER2 − non-cancer lung cells and HER2 + DCCs from early-and late-stage mice (15,287 additional Table 1) detected by RNAseq from MMTV-HER2 EL and PT spheres cultured for 7 days showing 2,873 upregulated genes (red) and 1,417 downregulated genes (blue) (P value < 0.05 and fold change (FC) >2 or <0.5). Full table is shown in Supplementary Table 1. a, GSEA 28,29 of EMT (top hallmark hit) and mammary gland luminal, myoepithelial/basal and stem cell signatures 30 in MMTV-HER2 EL versus PT 7-day sphere bulk RNAseq (GSEA; Supplementary Table 3). ES, enrichment score; NES, normalized enrichment score; NOM P value, nominal P value < 0.05; FDR value, false discovery rate < 0.25. Full table is shown in Supplementary Table 3. b, Distribution of the gene expression signatures 'Up' (upregulated) and 'Down' (downregulated) in EL/PT spheres ( Fig. 1a and Supplementary Table 1) in MMTV-HER2 EL (teal), PT (red), eL (early lungs, blue) and LL (late lungs, orange) DCC scRNAseq. c, MMTV-HER2 EL, PT, eL (early lungs) and LL (late lungs) DCC scRNAseq sample distribution per cluster. Unsupervised clustering on the DEGs was performed using a previously described batch-aware algorithm 31 . d, Heatmap of unique molecular identifier (UMI) counts of epithelial (Ep) and mesenchymal (M) genes (Supplementary Table 4) in scRNAseq after unsupervised clustering on the DEGs using a previously described batch-aware algorithm 31 . Cell clusters were subgrouped as EL (1)(2)(3)(4), PT (5-6) and DCCs (7)(8)(9)(10)(11), according to the predominant cell type in each cluster. e, Representative plots of EpCAM (epithelial marker) and Eng (mesenchymal marker) CD45 − HER2 + MMTV-HER2 EL, PT cells and eL (early lungs) and LL (late lungs) DCCs after tissue dissociation. Quantification is shown in Extended Data Fig. 1f. f, Imaging of sorted CD45 − HER2 + EL, PT, eL and LL DCCs upon 7-day culture in 3D, on top of Matrigel. SSC, side scatter. g, Brightfield images (left; scale bars, 50 μm) and HER2 (red) expression (right; scale bars, 25 μm). See also Extended Data Fig. 1. cells) underwent comprehensive analysis and clustering (Extended Data Fig. 2a-c). Note that the oncogenic driver (for example, HER2) expressed downstream of the MMTV promoter serves as the tag for isolation, and DCCs low or negative for HER2 may be lost and were not considered in the analysis. We identified 25 distinct clusters; 10 were excluded due to their high prevalence in normal lung cells (Fig. 2b). The DCCs were further subgrouped in HER2+M-like (1 to 4), hybrid (5 to 8) and Ep-like (9 to 15) based on canonical mesenchymal and epithelial signatures (Fig. 2a-c   Epcam-PerCP-Cy5.5 Epcam-PerCP-Cy5.5 Epcam-PerCP-Cy5.5 Eng-PE-Cy7  An extreme Ep-like signature enrichment is cluster 15, which was enriched exclusively in LL DCCs (Fig. 2a). These data suggest that late-stage mice carry more Ep-like DCCs, whereas early-stage mice more frequently have DCCs with M-like and hybrid phenotypes. A caveat of this analysis is that the M-and hybrid-like signatures in LL DCCs may be contributed by eL DCCs that persist in lungs (Fig. 2a,b). Analysis of gene-to-gene correlation among highly variable genes identified gene modules with strong coexpression patterns ( Fig. 2c and Extended Data Fig. 3a). The enrichment of TF targets that correlated with the expression of the modules (Enrichr analysis 33,34 ) revealed multiple programs activated in DCCs that are associated with pluripotency, mixed-lineage differentiation and EMT ( Fig. 2d and Supplementary Table 5). Predicted TF enrichment motifs in M-like, hybrid and Ep-like cells were further confirmed by additional differential gene expression analysis between these DCC clusters (Supplementary Table 6). M-like DCCs from cluster 1, enriched in gene module A, express brain-and osteoblast-lineage genes, and this A signature revealed genes that Enrichr analysis identified as being regulated by the TFs Neurod1 (neurogenic differentiation 1), SOX8, SOX9 and SOX10 (embryonic development); and upregulation of Vim, Col4a1 and Col4a2 (EMT genes confirmed by DEG analysis of scRNAseq dataset comparing M-like, hybrid and Ep-like clusters; Supplementary Table 6 and Extended Data Fig. 3). M-like DCCs from cluster 2, enriched in gene module B, share genes mentioned above and genes that the Enrichr analysis identified as regulated by TFs and chromatin remodelers SUZ12, SOX17, SOX18 and POU5F1/OCT4 (pluripotency regulators confirmed by DEG analysis of scRNAseq clusters; Supplementary Table 6). M-like DCCs from clusters 3 and 4 still carried genes controlled by the above TFs but also upregulated EMT genes (Zeb2 and Col3a1) and genes regulated by Snai2, Twist1, Prrx1, Fbn1 (EMT inducers (confirmed by DEG analysis of scRNAseq clusters); Supplementary Table 6) and SMADs. These data support the notion that an M-like program initiated in EL cells ( Fig. 1 and Extended Data Fig. 1) is also expressed in eL DCCs and persists in LL DCCs. Hybrid DCCs (clusters 5 to 8), shifted toward genes regulated by GATA6, Tp63, Tp73 and KLF4, typical basal and luminal epithelium switch regulators, and epithelial markers Krt7 and Krt8 (Fig. 2c and Supplementary Tables 5 and 6). Thus, DCCs in hybrid clusters might be in transit between M-and Ep-like states. Cluster 8 (hybrid) is composed of distinct cell populations that express gene modules H (B, C and D) and I (  Table 6 and Extended Data Fig. 3b). Cluster 15 is distinct and, as mentioned above, composed only of DCCs from late lungs (LL). These DCCs express luminal epithelial genes (EpCAM and Krt18), Ovol1, Ovol2, Grhl2 TFs (epithelial genes confirmed by DEG analysis -Supplementary  Table 6). These data indicate that early DCCs activate gene programs of progenitor-, M-like and dormancy phenotypes and that the transition to an Ep-like program is associated with their ability to form proliferative metastasis. Thus, the M-versus Ep-like states reflects a dormant versus proliferative state of DCCs.

ZFP281 is associated with M-like states in ELs and early DCCs.
MMTV-HER2 EL cells do not form tumors but disseminate efficiently and persist as DCCs in lungs 20 . We hypothesized that the M-like program found in the DCCs may be transcriptionally encoded in ELs. To test this hypothesis, we performed a TF network analysis mining the bulk RNAseq data derived from EL versus PT spheres (from Fig. 1). This analysis identified eight interconnected nodes where ZFP281 was the TF node with the highest number of connected DEGs in EL cells ( Fig. 3a and Supplementary Table 7). ZFP281 is a key regulator of primed pluripotency in mouse and human embryonic stem cells and functions as a barrier to achieve naive pluripotency 35 , and we found only low expression in rare cells in normal mammary gland cells (Extended Data Fig. 4b).
Further, ZFP281 promotes EMT in colorectal cancer cells 36 and it is upregulated during the naive-to-primed pluripotent state transition 35 where partial EMT/epithelial plasticity was postulated to happen 37,38 . However, it is unclear how ZFP281 regulates EMT and how it is linked to early breast cancer progression. The second largest node, NR5A2/LRH-1 (Fig. 3a), regulates embryonic stem cell pluripotency 39 , but its link to EMT in breast cancer is unclear 40,41 . Among other TFs, RARβ and RARγ, previously linked to dormancy 42 , may also play a role in ELs and early DCCs.
We focused on ZFP281, as in embryogenesis, it regulates stem cell pluripotency, growth arrest and EMT genes. We validated the increase in ZFP281 in EL over PT cells and its computationally predicted target genes ( Fig. 3a) levels by quantitative polymerase chain reaction (qPCR) (Extended Data Fig. 4a). We found that M-like genes such as CDH11 and Eng are induced in EL over PT cells, whereas Ep-like genes such as CDH1 and EpCAM are downregulated, arguing that ZFP281 represses an Ep-like identity. Predicted ZFP281 target genes ( Fig. 3a) are frequently upregulated in M-like lung DCCs, whereas Ep-like lung DCCs do not express these genes (scRNAseq; Fig. 3b). ZFP281 expression itself is also higher in M-like and hybrid cluster cells (1-8) over Ep-like clusters (9-15) ( Fig. 3c). At the protein level, we found even greater differences in ZFP281 expression; in normal Friend leukemia virus B susceptible strain (FvB) mouse mammary glands, ZFP281 is  Table 4) in MMTV-HER2 lung DCC clusters after unsupervised clustering on the DEGs using a previously described batch-aware algorithm 31 . Cell clusters were subgrouped as M-like (1-4), hybrid (5)(6)(7)(8) and Ep-like (9)(10)(11)(12)(13)(14)(15). Dots color-coded by sample origin (early lung (eL) DCCs, blue; late lung (LL) DCCs, orange). Pearson's chi-squared test with Yates' continuity correction. b, HER2 − lung cells (gray) and HER2 + eL (blue) and LL (orange) DCC scRNAseq sample distribution per cluster. Unsupervised clustering on the DEGs was performed using a previously described batch-aware algorithm 31 . c, Heatmap of UMI counts of selected genes (gene list in Supplementary Table 4) in MMTV-HER2 HER2 − lung cells (gray) and HER2 + eL (blue), and LL (orange) DCC scRNAseq. N1-N10 are clusters enriched in HER2lung cells and excluded in further analysis. Clusters 1-15 have less than 16% than HER2 − lung cells, so these clusters were considered cancer cell clusters, but HER2 − lung cells were excluded in further analysis. Clusters 1-15 DEG modules identified in boxed letters. d, DEGs represented in the heatmap in panel c (black) and TFs (blue) enriched in each gene module (in brackets) predicted using Enrichr 33,34 . Full gene lists in Supplementary Table 5 expressed only in 3% of the cells, whereas 30% of MMTV-HER2 EL cells express ZFP281, which is then downregulated in PT cells (8% ZFP281 + ; Fig. 3d and Extended Data Fig. 4b). The ZFP281 + cells in PTs were mainly found in the tumor-stroma interface. Similarly, in the MMTV-PyMT mouse model, ZFP281 expression is also upregulated in EL over PT cells (Extended Data Fig. 4c). Staining of E-cadherin (Ep-marker) and Twist1 (M-marker) in sequential sections show that in the MMTV-HER2 mouse mammary tissue, HER2 + EL structures enriched in ZFP281 are Ecad low (less intense membrane staining) and Twist1 high (Extended Data Fig. 4b). When monitoring ZFP281 expression in the early HER2 + DCCs, we also found that 42% of single early DCCs are ZFP281 + , whereas only 5% of cells within proliferative metastasis are ZFP281 + (Fig. 3e,f). This finding further supports that ZFP281 upregulation in EL cells persist in early lung DCCs. Interestingly, Ki67 and ZFP281 expression were found to be mutually exclusive in early lung DCCs (Fig. 3f), suggesting an anti-proliferative function. Further, the majority of lung HER2 + DCCs are in contact with alveolar type II (AT2) cells within the alveoli both in eL (72%) and LL (79%) mice, and all HER2 + DCCs were adjacent (in contact or 1-2 cell diameter) to CD31 + vessels (Extended Data Fig. 4d,e). Among human ductal carcinoma in situ (DCIS) samples, 48% of the cells per lesion were  ZFP281 + , whereas only 11% of the invasive breast cancer (IBC) cells were positive for ZFP281 (Fig. 3g,h). Detection of Ki67 in these same samples showed that DCIS (ZFP281 + ) was less proliferative than IBC (ZFP281 − ) (Fig. 3i). These data support that ZFP281 is an early breast cancer progression TF, which coordinates EMT-like and growth arrest programs.   Fig. 3 | Identification of the tF ZFP281 as upregulated in ELs and early DCCs. a, TF (squares and large fonts) network analysis derived from the RNAseq DEGs from MMTV-HER2 EL and PT spheres. Red circles, upregulated genes; green circles, downregulated genes; blue, TFs. Lines connect the TF at the center of the node to target genes, indicating that the connected genes have predicted TF binding elements (validated or predicted) in the promoter regions (−500 to +2,500 bp from the transcription start site (TSS)). Full gene list is shown in Supplementary Table 7. b, Distribution of ZFP281 predicted target scores, summarizing the UMI fraction of ZFP281 predicted targets (ZFP281 node in Fig. 3a and Supplementary Table 7), in all DCC clusters analyzed by scRNAseq (Fig. 2). Two-tailed Wilcoxon signed-rank test, P < 10 −10 . c, Distribution of ZFP281 expression in all DCC clusters analyzed by scRNAseq (Fig. 2). Two-tailed Wilcoxon signed-rank test, P = 3.1 × 10 −4 . d, ZFP281 protein expression in HER2 + cells from FvB mammary gland and MMTV-HER2 EL and PT tissues. Graph shows n = 4 mice per group, median and two-tailed Mann-Whitney test. Representative pictures are shown in Extended Data Fig. 4b. e,f, ZFP281 (green) and Ki67 (gray) protein expression in HER2 + cells from MMTV-HER2 (HER2, red) early lung DCCs and late metastasis (mets). Scale bars, 20 μm. Graph shows n = 4 mice per group, median and two-tailed Mann-Whitney test. g,h, ZFP281 (green) protein detection in human DCIS and IBC samples. Scale bars, 25 μm. Graph shows n = 14 patients per group, median and two-tailed Mann-Whitney test. i, Frequency of Ki67 + cells in human DCIS and IBC samples. Graph shows n = 14 patients per group, median and nonsignificant two-tailed Mann-Whitney test. See also Extended Data Fig. 4.

Analysis of ZFP281-regulated programs in early DCCs.
To reveal the programs that ZFP281 regulates in early mammary cancer cells, we compared RNAseq data from naive versus primed mouse pluripotent stem cells (a transition regulated by ZFP281 (ref. 35 )) and MMTV-HER2 EL versus PT spheres (Fig. 1a). Interestingly, DEGs in EL/PT and primed/naive stem cells were enriched in hallmarks of EMT (Fig. 4a,b). Conversely, the EMT pathway is downregulated upon RNA interference (RNAi)-mediated ZFP281 downregulation in MMTV-HER2 EL cells (Fig. 4c (right) and Extended Data Fig. 5a), suggesting that in EL cells, ZFP281 may drive EMT.
Chromatin immunoprecipitation sequencing (ChIPseq) in EL and PT cells identified 4,018 ZFP281 targets in EL cells (Supplementary Table 8). ZFP281 preferentially binds to 5′ untranslated regions and promoter regions in both EL and PT samples (Extended Data Fig. 5b), and ontology analysis on ZFP281 targets shows EMT, cell cycle and Wnt signaling pathways in the top  Table 8) and primed mEpiSCs (ChIPseq data: GSE93042 from Huang et al. 68 ) and cell cycle arrest, EMT, Wnt and FGF signaling genes.
Comparisons and statistics were done in pairs. Overlapped genes between these paired comparisons were rare so for graphical simplification the four comparisons are displayed together.  Fig. 2). h, Distribution of gene modules B and D in all DCC clusters. Dots represent single cells color-coded by ZFP281 target (ChIP data) scores (low (red) to high (green)). Plot with all cells and clusters in Extended Data Fig. 5i,j. See also Extended Data Fig. 5.
enriched pathways (Extended Data Fig. 5c). Strikingly, in EL cells and primed mouse epiblast-derived stem cells (mEpiSCs) (Fig. 4d), ZFP281 seems to regulate overlapping cell cycle arrest, EMT, Wnt and FGFR signaling pathways. Thus, HER2-driven EL cells activate distinct programs found very early in embryo development 43 . When comparing MMTV-HER2 EL/PT RNAseq and ChIPseq data, we found 759 genes with high ZFP281 binding and high expression in EL cells (Extended Data Fig. 5d and Supplementary  Table 10), and gene ontology analysis showed that these genes (UP_UP, red) are enriched in extracellular matrix organization. In contrast, 177 genes have low ZFP281 binding and expression in EL cells (DW_DW, green) and are enriched in cell-cell junction organization, among other pathways (Extended Data Fig. 5e). Some of these genes overlap with the putative ZFP281 target genes from Fig. 3a (Extended Data Fig. 5f), but we also identified new ZFP281 target genes. Among them are Snai1, Vim, Zeb1 (EMT inducers 27 ), Cdk2, Cdkn1a (cell cycle related 44 ) and Tgfbr1 and Nr2f1 (dormancy-associated genes 45 ) (Fig. 4e, Extended Data Fig. 5g and Supplementary Table 8). These genes were exclusively bound by ZFP281 and upregulated in EL cells or bound by ZFP281 in EL and PT cells but only upregulated in EL cells. We also identified 118 genes with high ZFP281 binding and high expression in PT cells (Extended Data Fig. 5d). This finding suggests that although ZFP281 expression decreases in PT cells, it still binds and regulates a different set of genes of unknown function in PT phenotype. To further filter genes regulated by ZFP281 in EL cells, we compared the list of ZFP281 ChIPseq targets (Supplementary Table 8) with 929 DEGs upon RNAi-mediated ZFP281 downregulation in EL cells (Extended Data Fig. 5a). These data show that more genes are bound by ZFP281 than those detected as induced or repressed directly in EL spheres upon ZFP281 knockdown. Nonetheless, 79 genes are simultaneously bound by ZFP281 in EL cells and differentially expressed upon its downregulation (Fig. 4f).
To address the importance of ZFP281 and its ChIPseq-identified target genes in lung DCCs, we examined their expression in our lung DCC scRNAseq data. Strikingly, M-like and hybrid DCC clusters display the highest levels of ZFP281-regulated signatures (ZFP281 targets from ChIPseq), and these scores drop significantly in Ep-like DCCs (Fig. 4g,h and Extended Data Fig. 5h-j). Some clusters like cluster 8 showed a drop in the ZFP281 signature score, arguing that some hybrid cluster cells move from an M-like to an Ep-like state. Together, the data support that ZFP281-regulated genes are activated in EL cells, carried over and sustained in M-like dormant DCCs in secondary organs.

ZFP281 maintains early DCCs in an M-like dormant state.
MMTV-HER2 EL cells are engaged in an M-like invasive program, whereas PT cells have a proliferative phenotype (Harper et al. 20 and Extended Data Fig. 6). To functionally test whether ZFP281 holds DCCs in a dormant state in the lungs, we used an inducible short hairpin for ZFP281 (shZFP281, different targeting sequence from that used in Fig. 4c,f). ZFP281 downregulation in EL spheres leads to a transition from an M-like to Ep-like phenotype, resembling PT spheres (Fig. 5a). FACS analysis showed that upon ZFP281 downregulation, a population of M-like EL cells gain EpCAM expression but do not downregulate Eng expression, leading to an increase in cells with a hybrid phenotype (EpCAM + /Eng + , dark blue) (Fig. 5d,e). ZFP281 downregulation did not affect the frequency of spheres (Fig. 5b), but it increased the sphere size (number of cells per sphere) (Fig. 5c), supporting a switch to enhanced proliferation upon sphere formation. In Matrigel, the number of invasive acini is significantly lower in the EL shZFP281+DOX condition (Fig. 5f,g). Similar results were observed with an short interfering RNA targeting mouse ZFP281 (Extended Data Fig. 6h,i, targeting sequence used in Fig. 4c,f). Corroborating this partial MET, we observed a decrease in Twist1, Eng and CDH11 (mesenchymal markers) upon ZFP281 downregulation (Extended Data Fig. 4a, third column). ZFP281 is also downregulated in PTs from the MMTV-PyMT model (Extended Data Fig. 4c), but knockdown of ZFP281 in MMTV-PyMT EL spheres did not enhance proliferation (Extended Data Fig. 6j-l) (see Discussion). Overexpression of ZFP281 in MMTV-HER2 PT spheres (PT ZFP281-OE, Extended Data Fig. 4a, fourth column) induced an invasive M-like phenotype (Fig. 5a), confirmed by FACS (Fig. 5d,e) and qPCR (Extended Data Fig. 4a, fourth column), increased sphere formation (Fig. 5b), reduced sphere size and thus proliferation (Fig. 5c) and increased organoid invasive phenotype (Fig. 5f,g). Similarly, overexpression of ZFP281 in MMTV-PyMT PT spheres suppressed sphere size (Extended Data Fig. 6h-j). Thus, PyMT tumors remain responsive to ZFP281 growth-suppressive function when overexpressed.
We next tested the gain-and loss-of-function effects of ZFP281 on tumorigenesis, dissemination and metastasis in vivo. MMTV-HER2 EL spheres transduced with the DOX-inducible shZFP281 system were injected in the mammary fat pad (MFP) and mice were given vehicle drinking water (−DOX), water with doxycycline from day 0 (+DOX), or starting 1 month after sphere injection (−DOX +DOX) for 4 months. As reported previoously 20 , few mice developed tumors that were small and static; however, when the injection sites were analyzed after 3 and 5 months, HER2 + EL cells were still found in the MFP of all mice, and DOX treatment caused ZFP281 downregulation ( Fig. 6a and Extended Data Fig. 7a). Even in the absence of PTs, after 3 and 5 months (two independent experiments), single DCCs and micrometastasis were found in all lungs, supporting a 100% dissemination efficiency by EL cells (Fig. 6b,c and Extended Data Fig. 7b). Three months after downregulation of ZFP281, an increase in the number and area of lung metastasis is already observed compared with shZFP281 −DOX mice (Extended Data Fig. 7b). Additionally, both groups of animals in which ZFP281 was downregulated from the beginning (+DOX) or after 1 month (−DOX +DOX) displayed a significant increase in lung metastasis 5 months after injection (Fig. 6c). Importantly, EL shZFP281 −DOX +DOX mice showed fewer single lung DCCs than control mice. Although solitary HER2 + DCCs in all groups were Ki67 − , proliferative Ki67 + cell frequency in metastasis increased upon ZFP281 downregulation (Extended Data Fig. 7c). Additionally, the frequency of Twist1 + DCCs decreased upon downregulation of ZFP281, whereas Ecad + DCC frequency increased (albeit more variably (not significant)) (Extended Data Fig. 7d,e). Given that the M-like clusters mostly enriched in early DCCs were characterized by a ZFP281-enriched signature that also showed expression of dormancy and cell cycle arrest genes (Fig. 2c,f), these data strongly support that the M-like and dormant phenotypes are induced and maintained by ZFP281. Consistently, loss of ZFP281 signaled early DCC reactivation from dormancy.
Next, we studied the phenotype of PT spheres overexpressing ZFP281 (ZFP281-OE). Control or ZFP281-OE spheres were injected in the MFP, and mice were euthanized 2 or 5 months later (two independent experiments). Tumor sections confirmed an overall increase in ZFP281 expression in the PT ZFP281-OE condition ( Fig. 6d and Extended Data Fig. 7f), which resulted in significantly slower growth kinetics, but not tumor take (Fig. 6e), supporting a growth-suppressive function of ZFP281 ( Fig. 5c and Extended Data Fig. 6k). Interestingly, the animals with slower-growing ZFP281-OE tumors (Fig. 6e) showed a fivefold increase in the number of lung single-cell DCCs compared to control tumors (Fig. 6f), but this DCC frequency increase did not result in an increase in micrometastasis at 2 months. Thus, ZFP281 suppresses growth of the PT but enhances dissemination without a subsequent increase in metastatic growth. In a second longer experiment, fewer PT control or ZFP281-OE cells were injected, and tumors were allowed to grow for 70 days and removed by surgery, and then mice were followed and euthanized 5 months after injection. Although no difference in the number of lung single DCCs was found, a significant reduction in the number and size of metastasis was observed in PT ZFP281-OE mice over PT control (Fig. 6g-i), as well as reduction of Ki67 + cells (Extended Data Fig. 7g). Additionally, upon overexpression of ZFP281, the less proliferative DCCs and metastasis displayed higher frequency of TWIST1 + and lower frequency of Ecad + cells (Extended Data Fig. 7h,i). These results further support the key role of ZFP281 in inducing an M-like phenotype and a growth-arrested dormant phenotype in DCCs.

Upstream inducers and downstream targets of ZFP281.
To explore the ZFP281 mechanism of action and regulation, we focused on EMT, Wnt and FGF signaling, common pathways linked to ZFP281 targets in EL and mEpiSCs cells (Fig. 4). To this end, we tested several FGF and Wnt ligands (FGF2, FGF10, Wnt3a and Wnt5a) expressed in EL and early DCCs or known to regulate ZFP281 in the embryo. These data revealed that in EL cells, which are already M-like and ZFP281 high , only FGF2 could further induce ZFP281 expression (Fig. 7a). Furthermore, TWIST1 and ENG were also induced by FGF2 (Fig. 7a). These results are corroborated by a decrease in the Ep-like population, an increase in the M-like population by FACS (Fig. 7b) and an increase of invasive phenotype of EL cells upon treatment with FGF2 (Fig. 7c). Together, these data suggest that FGF2 is an upstream regulator of ZFP281.    Fig. 7f. e, Tumor volume growth kinetics of PT control and PT ZFP281-OE spheres injected in the MFP. Tumors were removed at day 71, and mice were killed 5 months after sphere injections (corresponding lungs in panel g). Graph shows n = 10 tumors per condition, median, interquartile range and multiple t-tests. f,g, Frequency of lung SCs and metastasis 2 (f) and 5 (g) months after sphere injections. Two lung slides with all lobules represented were scanned and quantified per mouse. Graph shows n = 5 mice per condition, median and two-tailed Mann-Whitney test. h, H&E images of mice lungs 5 months after sphere injections. Scale bars, 2 mm. i, Quantification of lung metastasis burden (images in panel b, one slide per mouse). Graph shows n = 5 mice per condition, median and two-tailed Mann-Whitney test. See also Extended Data Fig. 7. TWIST1, a mesenchymal marker 46 upregulated in dormant early DCCs 20 , was differentially expressed (mRNA and protein) in early and late DCCs and upon ZFP281 modulation ( Fig. 2 and Extended Data Fig. 4a). TWIST1 downregulation in EL spheres using RNAi led to a downregulation of ZFP281, an increase in CDH1 levels (Fig. 7d), as well as a decrease in M-like and invasive phenotypes (Fig. 7e,f). Importantly, these changes were rescued by ZFP281 overexpression (Fig. 7d-f). Our data support a model in which FGF2 signaling induces ZFP281 to induce TWIST1, which in turn is required to maintain ZFP281 expression to reinforce the M-like phenotype in MMTV-HER2 EL cells.
We next explored ZFP281 downstream mechanisms that might allow early DCCs to maintain an M-like phenotype. CDH11 was consistently upregulated in EL cells downstream of ZFP281, differentially expressed upon ZFP281 modulation and a direct ZFP281 binding target in ChIPseq analysis (Extended Data Figs. 4a,e and 5g). In advanced breast cancer models, CDH11 was found to be upregulated and associated with a mesenchymal phenotype [46][47][48] , but no reports link CDH11 to early dissemination and dormancy. Immunofluorescence in the MMTV-HER2 EL and PT lesions, confirmed increased expression of CDH11 in 43% of EL cells versus less than 10% positive cells in PT samples (Extended Data Fig. 8a,b), similar to ZFP281 expression frequency (Fig. 3d). A similar pattern of CDH11 staining and mRNA expression was found in the MMTV-HER2 EL and PT mammosphere cultures (Extended Data Fig. 8c). In testing the functional contribution of CDH11 to EL and PT mammosphere phenotypes, CDH11 downregulation significantly decreased the percentage of invasive EL organoids (Extended Data Fig. 8d,e), whereas overexpression of CDH11 in PT cells led to an increase in organoids with invasive phenotype (Extended Data Fig. 8f,g). This phenocopies the ZFP281 modulation in EL and PT cells, suggesting that ZFP281 may signal through CDH11 ( Fig. 4e and Extended Data Fig. 7g).
Similar to ZFP281, CDH11 also seems to have a growthsuppressive function in vivo. CDH11 overexpression in PT cells led to slower tumor growth of orthotopically injected organoids (Extended Data Fig. 8h,i) and significantly reduced outgrowth of lung metastases (Fig. 7g,i). Nevertheless, CDH11 overexpression did not seem to change dissemination to the lungs, as the number of single DCCs and metastasis did not change (Fig. 7h). These data support that CDH11 can to some extent phenocopy the ZFP281 maintenance of an M-like and dormant phenotype.

Discussion
Limited information is available on the fate of the early DCCs between lodging and metastatic outgrowth in target organs. Our publications revealed that oncogene and microenvironmental signals in ELs conspire to activate an EMT program, which persisted in nonproliferative DCCs 7,20 . Similarly, early pancreatic DCCs also undergo EMT and persist after seeding the liver 21 . Here, we reveal that ELs activate M-like programs linked to primed pluripotency that not only allow EL cells to spread but also enable them with a program of dormancy where stem-cell-like plasticity is operational (Fig. 7j). ZFP281 is expressed in EL cells, enabling dissemination and M-like lung DCCs to explore at least four major transcriptional phenotypes (modules A-D). These M-like programs carry a dormancy signature. Hybrid clusters of DCCs appeared to downregulate ZFP281 activity and regain Ep-like and growth-promoting genes, supporting our hypothesis that ZFP281 prevents DCCs to switch to an Ep-like proliferative phenotype. Interestingly, Ep-like clusters are more homogeneous, arguing that once the DCCs commit to a proliferative phenotype, they are funneled into a more phenotypically uniform state. We interpret that early DCCs enter the lungs in an M-like state and can persist dormant until signals, yet to be determined, cause a final switch. The analysis of CDH11 function suggests that this class II cadherin contributes to maintain the ZFP281-driven M-like program. We hypothesize that an interaction with other CDH11 + DCCs or CDH11 + stromal cells 47,48 may allow early DCCs to maintain the M-like dormant phenotype. Mechanistic analysis supports that FGF2 signaling derived from EL cells or other stromal compartments in the primary lesion may enable ZFP281 upregulation for dissemination. Tissue resident macrophages are required for intravasation and early dissemination 24 , but additional work is needed to determine whether they produce FGF2, or influence the EL or endothelial cells to produce FGF2 and induce ZFP281 expression. FGF2 produced by early DCCs or stromal cells in secondary organs may also maintain the dormancy of early DCCs. We also reveal that ZFP281 requires TWIST1 to maintain the M-like phenotype. This finding, together with the finding that other EMT TFs were also upregulated in early DCCs, argues that the M-like program may be quite robust, explaining the ability of these dormant early DCCs to persist long-term. Whether loss of FGF receptors, TWIST1 or other EMT TFs expressed by early DCCs could also awaken these cells from dormancy remains to be tested. ZFP281 knockdown switched early DCCs to a more hybrid or Ep-like phenotype, which correlates with increased metastatic reactivation. Consistently, Snail-and Zeb1-driven EMT was previously described to suppress cell cycle progression through repression of cyclin D1 and D2 (refs. 49,50 ). In contrast, MET was associated with rapid relapse and reduced survival in patients with metastatic castrate-resistant prostate cancer 51 . Further, Lawson et al. found that low-burden (dormant-like) breast cancer DCCs in different organs were mostly basal and pluripotent stem-like, whereas higher-burden DCCs were more luminal-like and proliferative 52 . We previously reported that the lineage commitment regulators DEC2/BHLHE41 and NR2F1/COUP-TF1 coordinate stem-like and quiescence programs 42,53,54 . However, those studies were in late-evolution cancer models. Our data functionally map these basal/stem-like and developmental/pluripotency programs to such Fig. 7 | Role of FGF2, tWISt1 and CDH11 in ZFP281-mediated regulation of early DCC fate. a-c, MMTV-HER2 EL spheres and organoids were treated daily with FGF2, FGF10, Wnt3a or Wnt5a for 7 days. Graphs show n = 3, median with range and two-tailed Mann-Whitney test. d-f, MMTV-HER2 EL spheres were transfected with siControl or siTwist1 at days 1 and 2 and cultured for 5 additional days. Graphs show n = 3, median with range and two-tailed Mann-Whitney test. a,d, mRNA expression of ZFP281, Twist1, CDH1 and Eng in MMTV-HER2 EL spheres. b,e, Fold change of Ep-like (EpCAM + Eng − ), hybrid (EpCAM + Eng + ) and M-like (EpCAM − Eng+) populations in MMTV-HER2 EL spheres. Graphs show n = 3, median with range and two-tailed Mann-Whitney test. c,f, Quantification of 3D-Matrigel invasive phenotype of MMTV-HER2 EL organoids. Graphs show n = 4, median with range and two-tailed Mann-Whitney test. g, H&E images of mice lungs 5 months after MFP injection of PT control or PT CDH11-OE spheres. Scale bars, 2 mm. h,i, Quantification of lung metastasis burden, shown as number of single cells (SCs) and metastasis per lung section (h) and metastasis area (i). Graphs show n = 6 PT control and 8 PT CDH11-OE mice, median and two-tailed Mann-Whitney test. j, Model of ZFP281-regulated dissemination and dormancy states in early DCCs. Early upon cancer initiation via HER2 signaling, EL cells activate the primed pluripotency TF ZFP281 (top). M-like and pluripotency-like programs lead EL cells to disseminate and to enter a prolonged dormancy as early DCCs in lungs (lower left, early stage). Over time, putative intrinsic and microenvironmental changes allow the dormant DCCs to disrupt ZFP281 function and adopt an Ep-like phenotype (lower right, late stage), which enables a proliferative state. Importantly, M-like, hybrid and Ep-like DCCs coexist in both early-and late-stage lungs, with predominance of M-like dormant DCCs in early-stage lungs. See also Extended Data Fig. 8. Ct, control. early stages of cancer evolution and functionally link them with an M-like dormant DCC phenotype. Laughney et al. also reported that metastatic cells (from late-evolution cancer models) recapitulate a primitive transcriptional program spanning stem-like to regenerative pulmonary epithelial progenitor states, such as the key endoderm and lung-specifying TFs SOX2 and SOX9 (ref. 55 ). Together, these data suggest that pluripotency and dedifferentiation programs may be common in different epithelial cancers and, importantly, already active in early stages of cancer progression. Our findings support that early DCCs display a high degree of cellular plasticity through M-like, primed pluripotency and dormancy programs that likely endow them with the necessary fitness to survive and undergo genetic maturation upon reactivation.
ZFP281 suppresses an epithelial phenotype, inducing a dormant phenotype in early DCCs. Thus, an opportunity opens to identify lesions that may carry or not this dormancy program and determine if it informs on dissemination and relapse. Given that both MMTV-HER2 and MMTV-PyMT models show high expression of ZFP281 in ELs and loss in late lesions, various oncogenic inputs may achieve ZFP281 downregulation with progression. We showed that ZFP281 detection is prevalent in human DCIS samples and significantly decreased in advanced invasive tumors, further supporting  the validity of our findings. ZFP281 seems to be quite specific for ELs and early DCCs. Other TFs, such as NR2F1, which also limits early dissemination 56 , when detected in prostate and breast cancer DCCs inform on patient prognosis 42,57 . Thus, similar studies could be performed for ZFP281, and its detection may help measure the abundance of early-like DCCs in patients with early or advanced disease and determine whether it serves as a marker of relapse.
More work is needed to validate ZFP281 and the M-like dormancy program in human DCCs, and our approach could not specifically distinguish early DCCs from those exclusively arriving from late lesions. Nevertheless, we provide unprecedented insight into early DCC fate, demonstrating that ZFP281 regulates an active program of dormancy that must be overridden and precedes a slow proliferation phase toward metastasis. Our data may enable exploiting these mechanisms to eliminate DCCs or force them into an indolent and harmless asymptomatic phenotype.

Methods
Animal experiments. All experimental procedures were approved by the Institutional Animal Care and Use Committee of Icahn School of Medicine at Mount Sinai. MMTV-HER2/Neu mice were maintained on FvB background and bred and crossed in our facilities. 14-to 18-week-old female mice were used as early ('premalignant') stage mice and 20-week-old or older females with palpable tumors were used as late stage of cancer progression. MMTV-PyMT mice on C57/ BL6 background were purchased from the Jackson Laboratory. Four-to 6-week-old female mice were used as early-stage mice and 8-week-old or older females with palpable tumor(s) were used as late stage. No randomization or blinding was used to allocate experimental groups. Tumors were not allowed to grow beyond the Institutional Animal Care and Use Committee allowed limit of 1 cm 3 per animal.
Mice were euthanized using isoflurane and cervical dislocation. All five pairs of mammary glands were checked for the presence of any visible small lesions or palpable tumors. Mice were perfused with PBS and organs were collected. For histopathology, organs were fixed in 4% paraformaldehyde (Thermo Fisher Scientific) for 24 h, processed and embedded in paraffin, and sections were cut. For FACS and cell culture preparations, whole mammary glands, PTs and/or lungs were digested in 0.15% Collagenase 1 A (Sigma, C-9891) 2.5% bovine serum albumin (BSA) at 37 °C with agitation for 30 min. Red blood cell lysis buffer (Lonza) was used for 2-5 min, and cells were filtered through a 40-μm filter, passed through a 25G needle and counted. CD45 depletion (MACS, mouse CD45 MicroBeads) was performed for some experiments following the manufacturer's instructions.

MMTV-HER2 mice sphere injection experiments.
A total of 300 EL or 150-300 PT spheres were injected per site into nude mice (BALB/c nu/nu, Charles River) in 100 μl of a 1:1 PBS-Matrigel solution (Corning, growth factor reduced). Spheres were injected in the two fourth inguinal gland fat pad using a 27-gauge needle. Mice injected with sh-TRIPZ-shZFP281 were given control drinking water (−DOX), water with doxycycline from day 0 (+DOX) or water with doxycycline starting 1 month after sphere injection (−DOX + DOX) until the end of the experiment, 5 months after sphere injection. In the case of mice injected with MMTV-HER2 PT spheres, tumors were removed before reaching 1 cm 3 , according to IAUCU regulations. Mice were euthanized and organs were collected and processed 2 or 5 months after cancer cell injections. Immunofluorescence was performed, and two sections per mice were used to quantify and characterize single DCCs and metastasis. H&E slides were scanned using a NanoZoomer S60 Digital slide scanner and NDP.view2 software (Hamamatsu), and metastasis area was calculated and normalized for the total area of the lungs.

RNAseq.
Total RNA from MMTV-HER2 EL and PT spheres (after 7 days in cultures) was extracted using RNeasy protocol (Qiagen) and sequenced using Illumina MiSeq. RNAseq data were analyzed using Basepair software (https://www. basepairtech.com/) with a pipeline that included the following steps. Reads were aligned to the transcriptome derived from UCSC genome assembly (((hg19))) using STAR 58 with default parameters. Read counts for each transcript was measured using featureCounts 59 . DEGs were determined using DESeq2 (ref. 60 ), and a cutoff of 0.05 on adjusted P value (corrected for multiple hypotheses testing) was used for creating lists and heatmaps. GSEA was performed on normalized gene expression counts, using gene permutations for calculating P value (public Gene Expression Omnibus records: GSE165431).
Total RNA was extracted from MMTV-HER2 EL siControl and siZfp281 using GeneJET RNA Purification Kit (K0732, Thermo Fisher Scientific), followed by mRNA library construction using Universal Plus RNA-Seq with NuQuant (0361-A01, TECAN). Libraries were sequenced on the HiSeq 4000 to obtain paired-end 150-nt read length. RNAseq reads were aligned to the mouse mm10 genome using Bowtie2 (v2.3.4.3). The aligned bam files were sorted by name using the parameter -n. We used HTSeq software (v0.11.2) and the mm10 annotation file from GENCODE (version M19) to count reads for each gene using parameters -r name -f bam and BioMart 61 to retrieve corresponding genes names. Finally, read counts were normalized with the trimmed mean of M-values method 62 for differential expression analysis using edgeR (v3. 26.8) 63 .
Network analysis. Bioinf2bio did this analysis. The genomic sequence corresponding to the promoter (ranging from 2,500 bp upstream from the TSS until 500 bp after the TSS) of each DEG was extracted using the database UCSC (mm10). Next, we retrieved from JASPAR all available position weight matrices corresponding to all known mouse TFs, and for the identification of the TF binding site (TFBS), we used the TFBSTools package 64 (http://bioconductor.org/ packages/release/bioc/html/TFBSTools.html) designed to be a computational framework for TF binding analysis. We screened each DEG promoter sequence for all putative TFBS predicted in each of the retrieved position weight matrices. TFBS were scanned in both strands with a 'min.score.percentage' parameter set to 95%. For the network construction, we have used a matrix built from all the TFBS predicted within the promoters of all DEGs selected. Of note, we detected 581,121 TFBSs with a P value <10 −4 , so we selected only the TFBS with high scores (>21).
scRNAseq. Mammary glands of early-stage mice, PTs from late-stage mice and lungs from early and late-stage mice were dissected and digested (see Animal experiments section). For the first scRNAseq experiment (Fig. 1) Samples were sequenced on an Illumina NextSeq 550 using the 75-cycle kit to a depth of 100 million reads per library. For single-cell clustering, single-cell datasets (first and second experiments) were clustered separately using an unsupervised clustering algorithm previously described 31 . In this expectation-maximization-like algorithm, parameters for multinomial cell-type specific gene-expression models are learned together with a parameter for the fraction of background noise that is associated with cells in each sample. Mitochondrial genes and Malat1 were excluded from the clustering process. The clustering parameters were chosen to accommodate the different UMI and cell counts between the experiments. In the first experiment, the minimum number of UMIs per cell was 300, the number of clusters k was 12 and (P 1 ,P 2 ) = (30 th ,60 th ) percentiles. In the second experiment, the minimum number of UMIs per cell was 800, k was set to 20 and (P 1 ,P 2 ) = (0 th ,20 th ) percentiles. Public GEO records were GSE165456 (first batch) and GSE165459 (second batch). For gene modules and gene scores, gene modules were based on a gene-covariance analysis as was applied previously 31 . Briefly, cells were down-sampled and variable genes were selected based on the variance to mean ratio. Gene-to-gene correlations were estimated per sample and were averaged following z-transformation. The averaged correlation matrices were hierarchically clustered into gene 'modules' . Given a gene list, we calculated its score per cell by summing up the UMIs of the genes in this cell and divided the sum by the total sum of UMIs in the cell. The score therefore equals to the fraction of UMIs associated with the genes in the list.
ChIPseq. ChIP was performed using EZ ChIP protocol (Millipore) with Zfp281 antibody (ab101318, Abcam) for EL and PT samples (after 7 days in cultures). High-throughput sequencing was then used to get the ChIPseq data. ChIPseq reads were aligned to the mm10 genome using bowtie2 (v2.3.4.3), followed by removing PCR duplicates using Picard with the parameter REMOVE_DUPLICATES = true. ChIPseq peaks were determined by the MACS program (v.2.1.2) using input ChIPseq as the control data, and all other parameters followed the default setting. Binding difference around the transcription start sites (−5 kb, +5 kb) between the EL and PT samples are analyzed using the DiffBind (v2.1.6). ChIPseq data were compared with RNA-seq data analysis after RNA-seq reads were aligned to the mouse mm10 genome using Bowtie2 (v2.3.4.3). The aligned bam files were sorted by name using the parameter -n. We used HTSeq software (v0.11.2) and the mm10 annotation file from GENCODE (vM19) to count reads for each gene using parameters -r name -f bam, and we used BioMart 61 to retrieve corresponding genes names. Finally, read counts were normalized with the trimmed mean of M-values method 62 for differential expression analysis using edgeR (v3.26.8) 63 . Public RNAseq data were downloaded (Key Resource Table) and aligned to mm10, followed with the same processing setting. Public GEO record is GSE165444. Cell cycle arrest, EMT, FGF signaling and Wnt signaling gene sets were downloaded from MsigDB (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) with systematic names M1134, M5930, M1090 and M7847, respectively. All processed and index sorted BAM files of high-throughput sequencing data were converted to TDF files using count command of igvtools, followed by visualization using IGV software 65,66 .
Flow cytometry and cell sorting. Mammary glands of early-stage mice, PTs from late-stage mice and lungs from early and late-stage mice were dissected and digested (see Animal experiments section). In case of cells in culture, single-cell suspensions were obtained by incubating the cells in Accutase (Sigma) for 20 min at 37 °C. Cells were stained using antibodies and conditions in the Reporting summary. All experiments were performed using BD FACSAria II sorter equipped with FACS Diva software (BD Biosciences) or analyzed using Aurora analyzer (Cytek Biosciences) equipped with SpectroFlo software. Dead cells and debris were excluded by FCS, SSC and DAPI (4,6-diamidino-2-phenylindole) (Thermo Fisher Scientific) staining profiles. Data were analyzed with FACS Diva (BD Biosciences) or FCS Express Cytometry 7 (De Novo) software.
Immunofluorescence. Tissue slides (see Animal experiments section) were dehydrated, followed by antigen retrieval 10 mM citrate buffer, pH 6.0 (Na 3 H 6 H 5 O 7 ). Blocking was done using 0.5% BSA in PBS with 5% normal goat serum (Thermo Fisher Scientific, PCN5000) for 1 h. Antibodies and incubation conditions used are summarized in the Reporting summary. For ZFP281 detection, Alexa Fluor 488 Tyramide SuperBoost Kit goat anti-mouse IgG (Invitrogen) was used for amplification of the signal. All slides were mounted with ProLong Gold Antifade reagent with DAPI (Invitrogen, P36931).
3D cultures were fixed with 4% paraformaldehyde for 20 min at room temperature, permeabilized with 0.5% Triton X-100 in PBS for 20 min, and blocking was done using 1 Å~ immunofluorescence PBS wash buffer (130 mM NaCl, 7 mM Na 2 HPO 4 , 3.5 mM NaH 2 PO 4 , 7.7 mM NaN 3 , 0.1 %BSA, 0.2% Triton X-100 and 0.05% Tween-20) containing 5% normal goat serum (Thermo Fisher Scientific, PCN5000) for 1 h. Antibodies and conditions used are summarized in the Reporting summary. Chambers were removed from slides, and wells were fixed and mounted with ProLong Gold Antifade reagent with DAPI (Invitrogen, P36931). Images were obtained using a Leica SPE high-resolution spectral confocal microscope and Leica software.
qPCR. Spheres were processed using the Cell-to-CT 1-Step Power SYBR Green kit (Invitrogen, A25600) and primers from Supplementary Statistical and reproducibility. No statistical methods were used to predetermine sample sizes, sample sizes were chosen empirically and no exclusion criteria were applied. Block randomization was used for all mice assignments. The investigators were not blinded to allocation during experiments, but quantifications were done in coded samples to reduce operator bias. Statistical analyses were done using Prism software, and differences were considered significant if P < 0.05. Exact P values are present in all significant differences; the absence of P values represent nonsignificant differences. Unless otherwise specified, three or more independent experiments were performed in vitro with at least two technical replicates per condition, and two or more independent experiments were performed in vivo with at least five mice per condition. Data distributions were assumed to be normal, but this was not formally tested. Unless otherwise specified, all values were included, median and interquartile range are shown and two-tailed Mann-Whitney U-tests were performed. Representative images were selected after three or more independent experiments were performed and imaged in vitro (Figs. 1 and 5 and Extended Data Fig. 6), and at least five mice per condition were imaged (Figs. 3, 6 and 7 and Extended Data Figs. 7 and 8). Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Source data are provided with this paper. All sequencing data are available in a public data repository (GSE165431, RNAseq of MMTV-Neu EL and PT spheres; GSE165444, ZFP281 ChIPseq of MMTV-Neu EL and PT spheres; GSE165456, scRNAseq of MMTV-Neu primary site and lung cancer cells in early and late stage; GSE165459, scRNAseq of MMTV-Neu lung cancer cells in early and late stage). All other data supporting the findings of this study are available from the corresponding authors on reasonable request.  Table 4.