Systematic identification of cell-fate regulatory programs using a single-cell atlas of mouse development

Waddington’s epigenetic landscape is a metaphor frequently used to illustrate cell differentiation. Recent advances in single-cell genomics are altering our understanding of the Waddington landscape, yet the molecular mechanisms of cell-fate decisions remain poorly understood. We constructed a cell landscape of mouse lineage differentiation during development at the single-cell level and described both lineage-common and lineage-specific regulatory programs during cell-type maturation. We also found lineage-common regulatory programs that are broadly active during the development of invertebrates and vertebrates. In particular, we identified Xbp1 as an evolutionarily conserved regulator of cell-fate determinations across different species. We demonstrated that Xbp1 transcriptional regulation is important for the stabilization of the gene-regulatory networks for a wide range of mouse cell types. Our results offer genetic and molecular insights into cellular gene-regulatory programs and will serve as a basis for further advancing the understanding of cell-fate decisions. Single-cell RNA-sequencing of seven mouse developmental stages identifies lineage-specific and shared regulatory programs controlling cell-fate decisions. Cross-species analysis associates differentiation potency with ribosomal protein gene expression.

diverse systems. Previously published E14.5 and adult data 11,14 represented approximately 30% of the cells in the entire dataset. Systemic mouse single-cell atlases of P0, P10 and P21 have not been depicted thus far. We projected all single cells on a t-distributed stochastic neighbor embedding (t-SNE) plot and obtained 95 transcriptionally distinct cell populations ( Fig. 1c and Supplementary Table 4). Clusters that were composed of multiple tissues included immune cells (C9, C16, C18 C25, C29, C34), stromal cells (C13, C20, C22, C26, C28), muscle cells (C31) and endothelial cells (C8), whereas epithelial cells differed across tissues and formed separate clusters Muc5b), which may be an intermediate cell type during airway epithelial differentiation (Extended Data Fig. 4a,b). In addition, some tissue-specific markers showed ectopic expression in other tissues. For example, we discovered hepatocyte-like cells (Afp, Alb) in the pancreas at both the P0 and the P10 stages, and immunofluorescence assays confirmed their existence (Extended Data Fig. 4c,d). They displayed different expression patterns from liver hepatocytes and showed high expression of early hepatic stem or progenitor marker Hnf4a 23,24 (Extended Data Fig. 4e,f). Together, progenitor pools with co-expression or ectopic expression patterns may widely present in developing organs, suggesting the complexity of the mammalian state manifolds before terminal differentiation.
Characterization of regulatory programs in MCDA. Highresolution MCDA offers a powerful resource for studying the molecular basis of cell-fate decisions through various lineages. To reveal organism-wide characteristics, we applied different potency models based on entropy to qualify the state manifold landscape [25][26][27][28][29] . Entropy decreased continuously along with organ maturation in the most assayed lineages using different computational methods (Fig.  3a,b and Extended Data Fig. 5a-d), revealing a decrease in transcriptional plasticity and an increase in transcriptional stability. Based on the principles of these methods, we inferred that cell-type maturation appears to be an event associated with more singular transcriptomes and biological processes.
Cell types represent high-dimensional attractor states of GRNs 30 . TFs function as important regulators in GRNs to specify cell types and differentiation patterns 31 . To identify critical TFs of cell identity, we took the advantages of both data-driven (SCENIC) 32 and database-derived (VIPER-DOROTHEA) 33 methods to estimate the activities of TFs. We achieved >75% sensitivity to detect tissue-specific TFs based on single-cell datasets (Extended Data Fig.  5e). Over 900 TFs were identified with confidence levels ranging from A (high confidence) to C (low confidence) (Supplementary Table 10). Aggregated heatmaps were constructed to display the specific and common relationships of the TFs and their enriched lineages during development ( Fig. 3c and Extended Data Fig. 5f). The neural lineage was characterized by Dlx1, Pou3f3 and Sox10. The Cebpa and interferon regulatory transcription factor genes marked the immune lineage, whereas the endothelial lineage exhibited prominent Sox17 and Sox18 expression. Strikingly, hierarchical clustering analysis showed two modules of lineage-sharing TFs, which were enriched in adult tissues and fetal tissues, respectively ( Fig. 3c and Extended Data Fig. 5g). Enrichment and occupancy of Hox and zinc-finger families in fetal tissues have previously been associated with embryonic development 34,35 . The ubiquitous expression cluster in adult tissues was shared for a wide range of lineages, with extensive representation of Xbp1, genes of the activator protein-1 (AP-1) family and other molecules. Only 78 out of 268 TFs in this adult multi-lineage cluster were housekeeping genes 36 (Extended Data Fig. 5h). Jun and Fos gene families can dimerize and form AP-1, which has been reported to act as a regulator in the differentiation of various cell types 37,38 . In addition, AP-1 family members have been recently suggested to act as central regulators of somatic cell fate 39,40 . These highlighted the important roles of AP-1 family members in cell-type differentiation and Heatmap showing the cell number of corresponding cell types at each timepoint. c,d, Feature plots in the t-SNE map of P0 intestine (c, n = 9,265 cells) and P0 brain (d, n = 9,101 cells). Cells are colored according to the expression of the indicated marker genes or two genes. The red boxes magnify the co-expressed cell types in the tissues. e,f, Immunofluorescence assay for the cells that co-expressed makers of myocytes (Myl9) and endothelial cells (Esam) in both intestine (e) and brain (f) at the P0 stage. The blue marks the cell nucleus using DAPI. The red boxes indicate the co-expressed locations. The experiment was replicated three times with similar results. Scale bar, 20 μm.  cell-identity maintenance. Moreover, these TFs exhibited increasingly upregulated gene expression levels during lineage maturation (Extended Data Fig. 5i), which coincided with decreased entropy in most lineages (Fig. 3a,b and Extended Data Fig. 5a-d). Taken together, these results suggest that these lineage-common TFs function as vital regulators during maturation across a range of mouse cell types.
To explore the common changes in cross-species development, we performed entropy analysis and found that entropy decreased in all seven species along with development, which suggested that the increase in transcriptional stability was evolutionarily conserved ( Fig. 4a-f and Extended Data Fig. 7a,b). For the molecular changes, we performed differential gene expression analysis between corresponding cell-type pairs and mapped homologous genes to the human gene symbols to find commonly regulated genes in multiple species (Extended Data Fig. 7c,d and Supplementary  Tables 12 and 13). For all species, the numbers of conserved downregulated genes were greater than those of conserved upregulated genes, which suggests that stem/progenitors have more convergent expression patterns than differentiated cell types 45 (Extended Data Fig. 7e). Both commonly downregulated and upregulated genes in at least three species tended to have more protein-protein interactions (PPIs) than other conserved and not conserved genes in at least three species, which indicated that the common regulators were evolutionarily older 46 (Extended Data Fig. 7f,g). The genes downregulated during development were enriched with ribosomal protein genes, mitochondrial ribosomal protein genes and small nuclear ribonucleoprotein genes (Fig. 4g, Extended Data Fig. 7h and Supplementary Table 14). Notably, Myc and Mycn, as regulators of ribosome biogenesis 47,48 , showed high activity scores in the early stages of mouse development (Fig. 4h). They were classified in the common (fetal) module (Supplementary Table 10). These findings were highly consistent with recent studies, which reported that ribosomal protein genes as central network hubs are robust markers of differentiation potency 5,49 . In our cross-species entropy analysis, the conserved driving genes (Methods) in cells with high differentiation potential were also enriched in ribosomal biogenesis 50 (Fig. 4i). Ribosomal protein genes are also suppressed during zebrafish hematopoiesis 51,52 . Our results suggest that ribosomal protein genes are a conserved feature of stemness and they are downregulated during cell-type differentiation. On the other hand, the upregulated genes were highly enriched for immunity pathways (Extended Data Fig. 7i,j and Supplementary Table 15), which was consistent with recent reports on human and mouse adult tissues 14,53 . Together, we present a catalog of common features during lineage development from invertebrates to vertebrates; particular ribosomal protein genes are enriched in the less differentiated cells.

Gene regulation networks of cell-fate decisions across species.
To search for lineage-specific regulators among different species, we systematically aligned homologous pairs of cell lineages from each species across large evolutionary distances. Two methods, SAMap 54 and MetaNeighbor 55 , were applied with different calculation principles and homologous gene-mapping methods. SAMap enables mapping single-cell transcriptomes between phylogenetically remote species based on the gene expression similarity whereas MetaNeighbor has high replicability in cell-type matching using homologous weighted gene matrices. High confidence thresholds (alignment scores with >0.5 in SAMap and Mean_AUROC >0.8 in MetaNeighbor) were adopted to obtain complementarily reliable cell-type matches across species. Some 47 of the 60 cell lineages from 7 species were characterized into 8 meta-lineages (Extended Data Fig. 8a). The Uniform Manifold Approximation and Projection (UMAP) embedding based on pseudo-bulk cells per species proved the rationality of meta-lineages, in which pseudo-bulk cells from the same meta-lineage were more intensively clustered (Extended Data Fig. 8b,c). Then, the specificity of TFs was characterized with the modified regulon-specific scores with TF expression count matrices as input per species 56,57 . Lineage-specific TFs displayed sequence similarity within the meta-lineage across species (Extended Data Fig. 8d-j). Vertebrates tended to have more conserved species-specific TFs than invertebrates.
For lineage-common regulators among different species, we found that several commonly upregulated TFs exhibited remarkable convergence, including XBP1, JUND, FOSB, JUN, BHLHE40 and others (Fig. 5a), consistent with the enriched TFs in various adult mouse tissues (Fig. 3c and Supplementary Table 10). These TFs also displayed strong negative correlations with TFs that were enriched in lineage-specific progenitor cells (GATA1, PAX6, NKX6-2, NEUROD1, SOX10, OLIG2) in the Human Cell Landscape (HCL), a comprehensive cell landscape for humans generated by Microwell-seq 14 (Fig. 5b and Supplementary Tables 16 and 17). We suspect that these TFs may function as evolutionarily conserved regulators to guide multi-lineage cells to differentiation and maturity. We found that only one TF, Xbp1, stands out in all seven species ( Fig. 5a and Extended Data Fig. 7d). Therefore, we attempted to further characterize the role of Xbp1 in cell-type maturation. Previous work has emphasized functions of the basic helix-loophelix TF Xbp1 for cell differentiation in various cell types, including secretory cells, plasma cells, T cells, neurons, hepatocytes and other cell types [58][59][60][61][62] . As a putative common regulator, Xbp1 showed an upregulated expression pattern in most lineages of MCDA (Fig. 5c). We further dissected its regulatory role from a cell atlas perspective and found that stem regulators such as SOX4, SON and HES1 are the most negatively correlated with XBP1 in the HCL (Fig. 5d). In addition, the XBP1-binding motif in hematopoietic progenitors and neural progenitors was less enriched than their corresponding mature cell types in the single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) data of the mouse and human [63][64][65] (Fig. 5e,f).

Xbp1 as a common regulator in multi-lineage progression.
To dissect the mechanistic roles of the potential lineage-common regulators Xbp1, we used clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 to disrupt the Xbp1 locus in mice (  Table 18). As most Xbp1 −/− embryos died at E13.5, we applied scRNA-seq to analyze embryos at E12.5 from Xbp1 +/− heterozygous crosses before massive embryonic lethality 62 Table 19). We found that increased cell groups after Xbp1 disruption were all related to progenitor and immature cells (for example, fetal mesenchymal progenitors, early primitive erythroid progenitor, muscle progenitors, radial glia, oligodendrocyte progenitors and immature neurons) (Extended Data Fig. 9d). In addition, when compared with wild-type (WT) cells, Xbp1 −/− cells displayed higher entropy in a broad range of lineages, which may be linked to the eventual failure of cell-type maturation ( Fig. 6d and Extended Data Fig. 9e,f). Then we performed differential expression analysis and observed that a group of ribosomal protein genes (for example, Rps3a1 and Rps7) were specifically upregulated in Xbp1 −/− cells. Moreover, progenitor markers such as Sox4, Id2, Son and the imprinted gene H19 were enriched in Xbp1 −/− cells. The lineage-common regulators Fosb and Jun were downregulated in Xbp1 −/− cells ( Fig. 6e and Supplementary Table 20). Thus, disruption of Xbp1 caused mouse embryos to acquire a more progenitor state.
To characterize the loss-of-function changes at protein levels, we performed liquid chromatography-mass spectrometry (LC-MS) proteomic analysis on both WT and knockout (KO) embryos (Supplementary Table 21). Xbp1 −/− embryos exhibited higher expression level of pluripotency-related proteins such as Lin28a, Lin28b 66 , Pcgf6 (ref. 67 ) and Jarid2 (ref. 68 ) and lower expression level of cell type-specific proteins such as Snca in neural cells, Clu in stromal cells, Afp in hepatocytes, C1qb in macrophage and Blvrb in erythroid cells ( Fig. 6f and Extended Data Fig. 10a).

NatuRE GENEtIcS
Normalized expression **** NS **** NS **** **** ****   In addition, canonical Xbp1 targets related to the unfolded protein response (UPR) [69][70][71] displayed no significant changes at the protein level (Extended Data Fig. 10b). Furthermore, Xbp1 disruption in mouse embryonic stem cells (mECSs) did not alter stem cell culture and pluripotent gene expression, indicating that Xbp1 transcriptional regulation of lineage decisions is not downstream of the UPR (Extended Data Fig. 10c,d and Supplementary Table 22). We applied VarID 72 to qualify lineage-determining factor changes in scRNA-seq datasets. These significantly variable TFs in Xbp1 −/− samples displayed an Xbp1-binding motif in both scATAC-seq and chromatin immunoprecipitation sequencing (ChIP-seq) data ( Fig. 6g and Supplementary Table 23). Our results indicate a direct role of Xbp1 in lineage maturation via transcriptional regulation, during which Xbp1 functions through a mechanism that is independent of the UPR.

Discussion
Overall, our comprehensive MCDA atlas of mouse cell differentiation and maturation offers a powerful resource for investigating cell-fate decisions. We characterize a general feature of decreased entropy in most lineages during development. Our analysis of GRN dynamics reveals both lineage-common and lineage-specific regulators that contribute to cell-fate decisions. We highlight that Xbp1 is a critical and conserved transcriptional regulator of cell-type differentiation in many lineages, as shown in our multi-omic analysis of Xbp1 KO mouse embryos. However, the regulatory mechanisms of lineage-common regulators still require further research and functional validation in other settings such as in vitro differentiation, de-differentiation and trans-differentiation.
In the present study, we propose a systematic view of the cross-species state manifold landscape. Cells gradually progress from a stem/progenitor state toward specific cell fates with decreased entropy. During the process, divergent GRNs function following cell differentiation, including lineage-specific and lineage-common regulators. Lineage-specific TFs probably direct cell fate as potential regulators for the emergence of each cell type, whereas lineage-common ones probably represent general regulators to stabilize cell fates across various cell types, such as gravity through the process of state manifold 18,73 . We identify examples of common GRNs as conserved regulators of cell-fate stabilization (Fig. 6h). Thus, our work introduces a new functional classification of gene-regulatory programs to improve state manifold representations.
Tissue development and maturation atlases can provide global views of the cell-fate decision process. Using our data, we identified new cell types with co-expression and ectopic expression patterns during mouse development. We verified a myoendothelial cell type that co-expressed makers of myocytes and endothelial cells, a cell type that co-expressed makers of club and goblet cells and hepatocyte-like cell types in the pancreas during mouse development.
We hypothesize that early transitional cell types may serve as a pool of progenitors to broadly support the normal progression of much functional tissue formation. We also observe new cell types in neonatal mice that will require further verification and characterization. By integrating developmental atlases across species, we describe common characteristics at varying evolutionary distances during development. Entropy is a concise, independent and robust measurement for differentiation potential and we further associate it with ribosomal protein gene expression in evolutionarily distant species.
Our single-cell analysis of KO mice provides a systematic insight into gene function at the organism level. Similar strategies can be applied to a series of KO embryos for the dissection of a functional GRN during development. It would also be interesting to compare quantitative gene function across different model systems and species. Specific combinations of functional regulatory networks across species may hint at evolutionary regularities of cell types. It is worth noting that limitations of single-cell technologies such as the digestion process, batch effects and sequencing depth should be taken into consideration during such analyses.
In conclusion, we constructed an MCDA and systematically characterized the cell-fate regulatory programs during development across species, which will lead to new understandings of cell-fate decisions and the cellular state manifolds.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/ s41588-022-01118-8. WT: n = 3, mean ± s.d.). A two-sided Student's t-test was performed to determine the statistical significance. The illustrative genes were manually selected from the full heatmap, which is shown in Extended Data Fig. 10a. g, Heatmap showing significantly variable TFs in Xbp1 −/− samples. Green and purple indicate the chromatin accessibility of the Xbp1-binding motif as determined by scATAC-seq and ChIP-seq, respectively. The Xbp1-binding motif of the mouse was from the CisBP database. Representative TFs are marked and were manually selected from Supplementary Table 23. Wilcoxon rank-sum test (two-sided) was performed to identify significantly variable TFs and p-value adjustment was performed using bonferroni correction (p adjusted values < 0.05 and fold change > 1.25). h, Schematic of cross-species state manifold landscape.

Methods
Mouse experiments to supplement the MCDA database. WT C57BL/6J mice were ordered from Shanghai Laboratory Animal Center. All mice were housed at Zhejiang University Laboratory Animal Center in a specific pathogen-free facility with individually ventilated cages. The room had a controlled temperature (20-22 °C), humidity (30-70%) and light program (12 h light:dark cycle). The mice were provided free access to a regular rodent chow diet.
To obtain embryonic samples (E10.5 embryos, E12.5 embryos), C57BL/6 mice were mated. Noon on the day the vaginal plug was visible was considered to be E0.5. Sex was not determined before tissue pooling for E10.5, E12.5 and P0 samples (except for the gonads). Embryos were collected from at least three independent litters (in total three to nine embryos) per development stage. For P10 and P21 samples, testes were collected from male mice and all the other tissues were collected from female mice.
All experiments performed in the present study were approved by the Animal Ethics Committee of Zhejiang University. All experiments conformed to the relevant regulatory standards at Zhejiang University Laboratory Animal Center.
Generation of Xbp1 KO mESC and mouse models. SgRNAs targeting exon 2 of Xbp1 were designed using the Zhang laboratory CRISPR design website tool (http://crispr.mit.edu). Oligonucleotides were synthesized and then cloned into an epiCRISPR-Cas9 vector 74 . The vector was extracted using an EndoFree Mini Plasmid Kit II (Tiangen Biotech, catalog no. 4992422) following the manual. Approximately 4 × 10 5 E14 mESCs were transfected with 2 μg of the vector with Lipofectamine 3000 (Life Technologies, catalog no. L3000001) based on an online protocol. At days 2-10, cells were selected with puromycin (0.5-1.0 μg ml −1 ). Then, single cells were reseeded in a 6-well plate and cultured in mESC media for 7-10 d. Individual colonies were picked and genotyped. The genomic RNA target sites and oligonucleotides used in the present study can be found in Supplementary Table 22.
Xbp1 KO C57BL/6J mice were generated by Nanjing Gempharmatech. Mice were genotyped by PCR using genomic tail DNA. To obtain live KO embryos at E12.5 for scRNA-seq, we used a Scientific Phire Animal Tissue Direct PCR Kit (Thermo Fisher Scientific, catalog no. F140WH) to genotype embryos quickly. All primers used for KO and genotyping are listed in Supplementary Table 18.
Immunofluorescent staining. Fresh mouse tissues were frozen in disposable molds containing optimal cutting temperature compound. Frozen sections were cut at 10 μm in CryoStar NX50 (Thermo Fisher Scientific), mounted on microscope slides and stored at −80 °C. Before staining, the sections were thawed for 20 min and 4% formaldehyde in phosphate-buffered saline (PBS) was added to cover the sections. Tissues were fixed for 15 min at room temperature. After fixation, sections were washed three times with PBS. Cells were permeabilized by covering the sections with 0.1% Triton X-100 in PBS for 10 min. Then, the sections were washed three times with PBS and blocked with 3% bovine serum albumin in PBS for 1 h at room temperature. Primary antibodies (anti-ESAM (1:50; Thermo Fisher Scientific, catalog no. MA5-24072), anti-Myl9 (1:400; Abcam, catalog no. ab187152), anti-Scgb1a1 (1:50, R&D, catalog no. MAB4218-SP), anti-tff2 (1:200; ProteinTech, catalog no. 13681-1-AP) and anti-AFP (1:200, Affinity, catalog no. AF5134)) diluted in blocking solution were added to cover the sections. The slides were placed in a wet box and incubated overnight at 4 °C. Relevant Alexa Fluor-488/594-conjugated secondary antibodies (1:1,000; Thermo Fisher Scientific, catalog nos. A-21208, A-21206 and A-11037) were used for labeling. The slides were then washed three times with blocking solution and stained with DAPI. Glass coverslips were then attached to the slides using mounting media. Immunofluorescence images were obtained using Olympus VS200.
Cell preparation. Mouse tissues were minced into pieces of ~1 mm on ice using scissors. The tissue pieces were transferred to a 15-ml centrifuge tube, rinsed twice with cold Dulbecco's (D)PBS and suspended in 5 ml of a solution containing dissociation enzymes. The samples were treated with various enzymes for different periods of time (Supplementary Table 3). During dissociation, the tissue pieces were pipetted up and down gently several times until no tissue fragments were visible. The dissociated cells were centrifuged at 300g for 5 min at 4 °C and then resuspended in 3 ml of cold DPBS. After passage through a 40-µm strainer (Biologix), the cells were washed twice, centrifuged at 300g for 5 min at 4 °C and resuspended at a density of 1 × 10 5 cells ml −1 in cold DPBS containing 2 mM EDTA.

ScRNA-seq.
Single-cell complementary DNA libraries were prepared using the Microwell-seq 11 . Briefly, cells were loaded on the microwell plate and extra cells were washed away gently using ice-cold PBS. Then bead suspension (sequences listed in Supplementary Table 2) was loaded on the plate and extra beads were washed away on a magnet. The plate was covered using cold lysis buffer (0.1 M Tris-HCl, pH 7.5, 0.5 M LiCl, 1% SDS, 10 mM EDTA and 5 mM dithiothreitol (DTT)) and incubated on ice for 12 min. Then, beads were collected and washed using 6 × saline sodium citrate and 50 mM Tris-HCl, pH 8.0. After washing, beads were resuspended in reverse transcription (RT) mix and incubated at 42 °C for 90 min. After RT, beads were washed in TE-TW (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 0.01% Tween20) and 10 mM Tris-HCl, pH 8.0. Beads were resuspended in exonuclease I mix and incubated at 37 °C for 30 min. Then, beads were washed in TE-SDS (1XTE + 0.5% sodium dodecyl sulfate), TE-TW and 10 mM Tris-HCl, pH 8.0. Beads were resuspended in PCR mix with TSO (template switch oligo) primer to amplify the cDNA. After PCR, beads were removed and cDNA products were purified using 0.8 × VAHTS DNA Clean Beads (Vazyme. catalog no. N411-01). A more detailed version of the Microwell-seq protocol is available in Han et al. 14  To eliminate primer dimers and large fragments, 0.55-0.15× VAHTS DNA Clean beads were then used to purify the cDNA library. The size distribution of the products was analyzed on an Agilent 2100 bioanalyzer, and a peak in the range 400-700 bp was observed. Finally, the samples were subjected to sequencing on the Illumina HiSeq (data for MDCA) or MGI DNBSEQ-T7 (data for Xbp1 KO experiment). For MGI sequencing, we applied the protocol provided by the VAHTS Circularization Kit for MGI (Vazyme, catalog no. NM201-01) to obtain single-stranded circular cDNA available for DNB (DNA Nanoball) generation. We also replaced the official R1 sequencing primers with our customized R1 sequencing primers A and B (listed in Supplementary Table 2) to ensure the completion of the sequencing.
Processing of Microwell-seq data. Microwell-seq datasets were processed as described 11 . Reads were aligned to the Mus_musculus. GRCm38.88 genome using STAR 75 (v.2.5.2a). The digital gene expression (DGE) data matrices were obtained using the Drop-seq core computational protocol (available at website http://mccarrolllab.org/dropseq) with the default parameters. For quality control, we filtered out cells with detection of < 500 transcripts. Cells with a high proportion of transcript counts (> 20%) derived from mitochondria-encoded genes were also excluded. Cells were also corrected for RNA contamination and background-removed DGE data were constructed 14 . The SCANPY 76 (v.1.6.0) python package and Seurat 77 (v.3.2.2) R package were used to load the cell-gene count matrix and perform downstream analysis.
Clustering of the single-cell data matrix. For clustering of the complete mouse tissue dataset (520,801 cells), qualified cells were processed using SCANPY (v.1.6.0) in a Python (v.3.6.9) environment. Background-removed DGE data for cells analyzed in each tissue and genes expressed in at least 20 cells were used as inputs 14 . Then, DGE data were ln(c.p.m./(100 + 1)) transformed (where c.p.m. is counts min −1 ). We selected approximately 3,000 highly variable genes according to their average expression and dispersion. We then regressed out unique molecular identifiers and gene numbers and scaled each gene to unit variance, and the values beyond an s.d. of 10 were clipped. For the mouse tissue dataset, we chose PCs for principal component analysis (PCA) according to elbow plots and 50 PCs were used to create a neighborhood graph for the cells. We then used Leiden clustering to cluster with resolution = 8 and k = 25. Marker genes were calculated using Wilcoxon's rank-sum test (two-sided) and p-value adjustment was performed using the Benjamini-Hochberg correction. For visualization, t-SNE was used.
For kidney data, bbknn 78 (v.1.4.0) was performed by using ridge regression to remove batch effects. For clustering of single tissues, the Seurat pipeline was used with the default parameters for fewer cells. Cell type and lineage information of each cell type were manually annotated according to the marker genes reported in a previous paper 11 . A hierarchical tree of the MCDA was computed using the correlations of average gene expression of 95 clusters with highly variable genes.

Estimation of the variance of the MCDA.
To estimate the variance in the data depending on age, tissue or sex, we first aggregated the gene expression for each tissue at multiple time points. Using the above metadata as input, we performed principal variance component analysis (PVCA) using R Package pvca (v.1.26.0, https://www.bioconductor.org/packages/release/bioc/html/pvca.html) with the default parameters. It leverages the strengths of two popular data analysis methods PCA and variance components analysis and integrates them into a new algorithm. It also uses the eigenvalues associated with their corresponding eigenvectors as weights, to quantify the magnitude of each source of variability. All factors as well as their interaction terms are treated as random effects in the mixed model for variance component estimation. It fits a linear mix-effects model to data. Items such as 'tissue' and 'gender' are variances explained by interactions of two factors instead of the union of two factors.
Inference of the TFs for MCDA. As a proof of principle, we applied experimentally verified, tissue-specific TFs from the literature 79 as the gold standard. We included both tissue-restricted TFs and nonuniformally expressed TFs in different tissues as tissue-specific TFs. For datasets used, we selected high-quality cells with > 800 gene numbers as single-cell datasets, and also aggregated every 20 single cells in each cell type to produce pseudo-cells to enrich our choices of input datasets. We compared SCENIC 32 (v.0.10.0) and VIPER-DOROTHEA 33 (viper v.1.28.0 and dorothea v.1.6.0) for inferring specific TFs in the tissues. The DOROTHEA database provided TFs from different types of evidence with a different confidence. We used ABCDE (1,113 TFs) categories of DOROTHEA TFs in our comparison. Regulon specificity scores (RSSs) 56 were calculated to represent TF specificity in the tissue for both VIPER-DOROTHEA and SCENIC. Then we employed the youden index (sensitivity + specificity − 1) to find the best performance of VIPER-DOROTHEA and SCENIC in classifying tissue-specific TFs in both sensitivity and specificity. These TFs were compared with the gold standard lists for four aspects: sensitivity, specificity, false-positive rate and area under the precision-recall curve.
To define regulatory programs in MCDA, SCENIC and VIPER-DOROTHEA were applied first to infer the GRN with default parameters using high-quality single cells with > 800 genes. For VIPER-DOROTHEA, ABCDE (1,113 TFs) categories of DOROTHEA TFs were used. Second, z-scaled RSSs for VIPER-DOROTHEA and z-scaled TF activity scores of SCENIC in each stage lineage were calculated as a TF-by-lineage matrix. Then, fuzzy c-means clustering was performed on the TF-by-lineage matrix calculated by SCENIC and VIPER-DOROTHEA, resulting in a TF-by-module 'membership matrix' and a lineage-by-module 'centers matrix' . The centers matrix with 15 modules was used to generate the heatmap. We defined a threshold membership score (threshold = 0.2) in which TFs were assigned to a module. With the fuzzy c-means heatmap, we identified which modules/TFs were lineage specific and which were lineage sharing. We assigned TFs into specific lineages according to the aggregated patterns of modules manually and the resulting TFs were classified into three collections with high to low confidence: collection A consisted of TFs from both methods, collection B were TFs only from SCENIC and collection C TFs only from VIPER-DOROTHEA (Supplementary Table 10).

Analysis of time-related genes during cell-type maturation.
Early organ formation in mice begins at E10.5 and cells undergo differentiation to reach maturity during development 80 . Thus, we identified time-related genes that showed upregulation patterns at the expression levels during the developmental processes by using Spearman's rank correlation analysis for different lineages in each tissue 81 . Spearman's rank correlation coefficient, which has low requirements on data distribution and a high tolerance for outliers, can directly reflect the monotonous relationship between variables, so we adopted it. We treated the seven-stage information (E10.5, E12.5, E14.5, P0, P10, P21 and adult) as the vectors labeled (1, 2, 3, 4, 5, 6 and 7), and then calculated the correlation between the gene expression levels across seven development stages and the vectors for each stage. The larger the absolute value of the correlation coefficient, the stronger the monotonicity of the gene expression level and timepoints. The TFs with Spearman's rank correlation coefficient ≥ 0.8 in at least four lineages in five tissues with a P ≤ 0.05 were retained as the common time-upregulated TFs during lineage maturation.
Single-cell entropy analysis. Single-cell entropy estimation was performed using three methods: CCAT 29 (SCENT v.1.0.2), SLICE 27 (v.0.99.0) and StemID 26 (RaceID v.0.2.2). To obtain the best performance, normalization was dependent on the computational methods. For CCAT, it is an approximation of network entropy. We applied CCAT to compute the correlations with the connectome and transcriptome based on the 'net13Jun12.m' PPIs. We performed CCAT analysis by using a weighted matrix to leverage all the homology genes between human and other species. The weighted matrix was obtained by converting the gene homology relationship (one-to-one, one-to-many, many-to-one and many-to-many) into a binary matrix and normalized it to one human gene. In StemID, it estimates the Shannon entropy of a cell's transcriptome directly based on the expression of each gene. We used StemID to infer entropies with default parameters. For SLICE, it established a kappa matrix of gene ontology (GO) annotations of the human or mouse to evaluate the probability distribution of the functional activation of each cell. SLICE was performed as a deterministic calculation of scEntropy of individual cells over the GO cluster activation profile with iter = 50. Cells were downsampled to 2,000 per tissue per stage to cut the calculation burden of SLICE and StemID. In summary, CCAT calculates the entropy-related values from the perspective of the network entropy of the gene interaction network. SLICE and StemID calculate the entropy values by using the activation of the gene pathway and the gene expression as probabilistic events, respectively. Although the principles of the three methods are different, their central idea is to couple entropy with developmental potential. They evaluate biological systems using physical concepts and reflect the physical properties of biological systems.
Construction of a cell-type hierarchy across species and gene regulation analysis. For invertebrates, to infer the topological relations of cell-type development, we first constructed a PAGA graph 43 per lineage using SCANPY (v.1.6.0). We processed the data following the steps suggested by SCANPY, including total count normalization, log(1P) transformation, highly variable gene extraction, potential regression of confounding factors of genes and counts, scaling to z-scores and PCA. Then, we computed a neighborhood graph among data points and used UMAP for topologically faithful embedding with min_dist = 0.1. Then, PAGA was performed with iter = 1,000. The cell-type tree layout was based on a minimum spanning tree fitted to edges weighted by inverse connectivity. Edges in an abstracted graph with a probability > 0.0005 were considered as possible connections of cell-type hierarchies. For S. mediterranea, cell-type hierarchies were obtained from the consolidated lineage tree, which was provided in a paper 12 and, for C. intestinalis, lineage and stage information were directly from a paper 16 . For complex vertebrates, we connected cell states across time according to gene expression similarity 44 . For each tissue, we asked each adult cluster to 'vote' on its most likely ancestor cluster from the fetal stage. To eliminate the influence of cell number, we randomly sampled 150 cells to embed them into the PCA space learned from the second timepoint only and kept nontrivial PCs as defined above. Then, in this embedding, for each cluster in the late timepoint, the cluster identities of the five nearest neighbors of each constituent cell from the previous timepoint were determined using a Euclidean distance metric. The percentages of votes cast for each possible ancestor were calculated and the maximum frequencies of votes (20-100%) of the cells in the cell group decided the ancestor group. For zebrafish datasets, we integrated the data using Seurat (v.4.0.1) which anchors integration functions to do the batch correction before the PCA. Sankey plots were generated using the networkD3 (v.0.4, https://christophergandrud.github.io/networkD3) R package. For atlas projects across species, we performed the same differential expression analysis for cells in each tissue-cell type/lineage-cell type separately according to the cell-type hierarchy using FindMarkers function in Seurat (v.4.0.1). Wilcoxon's rank-sum test was performed to determine the statistical significance and the Benjamini-Hochberg correction was used for the p-value adjustment. The top common DEGs (20-100% of total cell-type pairs, mean 60-94% lineages, p adjusted values < 0.1, log 2 (fold-change) (log 2 (FC)) ≥ 0.25, min_pct ≥ 0.1) were estimated according to the frequency of differential expression in all unstable-to-stable cell-type pairs across species. To match the two timepoints (fetal and adult) of humans, only the E14.5 and adult stages of mice and 24-h post-fertilization and 3-month stages of zebrafish were considered for cross-species analysis. Genes that display consistent patterns in at least three species were defined as commonly upregulated and downregulated genes. Genes that were either 'up-' or 'down-'regulated were excluded in the analysis.
The top 20 most negative TFs of the upregulated TFs were determined by Pearson's correlations based on single-cell datasets and visualized by Cytoscape (v.3.5.0) 82 . In the present study, we considered only one-to-one orthologous pairs with humans for commonly regulated genes. As for species-specific TFs, TFs of H. sapiens, M. musculus, D. rerio and C. elegans were downloaded from the AnimalTFDB 3.0 database 86 . Other species-specific TFs except H. vulgaris were obtained from a paper 57 . Genes from H. vulgaris were obtained with Swiss-Prot IDs of best hits. Thus, the TFs of H. vulgaris were defined by the genes annotated with the GO terms downloaded from the uniport website: DNA-binding TF activity or TF binding. Those Swiss-Prot IDs of best hits were also checked for TFs from AnimalTFDB 3.0 and used as a supplement to TFs.

Collection and prediction of orthologous genes and
Lineage-specific TFs analysis across species. We applied two methods to calculate the lineage evolution relationship across species with the pseudo-cell as inputs (aggregated every 20 cells from each cell type): SAMap 54 (v.0.3.0) and MetaNeighbor 55 (pyMN v.0.1.0). SAMap enables mapping single-cell transcriptomes between phylogenetically remote species based on the expression similarity whereas MetaNeighbor has high replicability in cell-type matching using homologous weighted gene matrices. For SAMap, it constructs a gene-gene bipartite graph with cross-species edges connecting homologous gene pairs, weighted by protein sequence similarity. For MetaNeighbor, we constructed weighted matrices to leverage all the homology genes between humans and other species. The weighted matrices were obtained by converting the gene homology relationship (one-to-one, one-to-many, many-to-one and many-to-many) into a binary matrix and normalized it to one human gene each. Lineage pairs with high confidence thresholds (alignment scores with > 0.5 in SAMap and Mean_AUROC > 0.8 in MetaNeighbor) were considered as highly reliable and biologically plausible matches from different aspects. The combined projection of seven species was obtained from the function ' AMAP.scatter' of SAMap. The specificity of TFs was characterized using modified regulon specificity scores in SCENIC with TF expression count matrices as input 56,57 . We then calculated the z-score-normalized TF specificity score to predict the essential TFs in each lineage. Development-related, lineage-specific TFs were intersected with upregulated genes across species. The sequence similarity score was determined by the National Center for Biotechnology Information's (NCBI's) BLAST with transcriptome or proteome data as inputs. An E-value threshold of 1 × 10 6 was set. It was also integrated into SAMap.
Pathway enrichment analysis. We used clusterProfiler 87 (v.3.14.3) to perform GO biological pathway enrichment analysis and orthologous genes were taken as the universe. Hypergeometric test was performed to identify significant GO terms and the Benjamini-Hochberg correction was used to adjust p-values. We considered biological pathways with p adjusted values < 0.05. We used REVIGO 88 to visualize the enrichment results. For Extended Data Fig. 1e, we used clusterProfiler to perform GO biological pathway enrichment analysis for DEGs at neighboring stages. We considered biological pathways with p adjusted values ≤ 0.01. For each stage, the enrichment terms, as determined by clusterProfiler, were used to manually combine into 13 'super terms' for biological processes. For Extended Data Fig. 1i, GO enrichment analysis was first computed using the DEGs of the kidneys. Then, the enrichment scores of the terms were calculated and aggregated for each stage using AUCell 32 .
PPI analysis. We downloaded the PPI resource of human genes from STRING 89 (v.11). Experimentally validated interactions from humans and transferred by homology from other species were used for the analysis. Then, we compared the log 10 (PPI no.) of four groups, the upregulated genes in at least three species, downregulated genes in at least three species, other conserved genes in at least three species and all other genes in the PPI resource. We also downloaded the gene functional assignments from the eggNOG database (v.5.0) and used the mammals' nonsupervised orthologous groups (maNOG) to assign genes into 26 categories 90 . The 26 gene categories were arranged by their average number of PPIs in ascending orders. Statistical analyses were done with R package ggpubr (v.0.4.0, https://rpkgs. datanovia.com/ggpubr) for two-tailed Wilcoxon's rank-sum test to determine the statistical significance of the differences between two groups.
Analysis of the CCAT-driving gene across species. CCAT directly measures the correlation between transcriptome and connectome and will therefore be positive if most of the network hubs are overexpressed in more potent cells 29 . Thus, we used the number of adjacent edges to evaluate the degree of each gene in the PPI network and the top 20% of genes were regarded as network hubs. We intersected them with the commonly downregulated genes we found in the manuscript (highly expressed in undifferentiated cells, Supplementary Table 13) in each species as CCAT-driving genes in more potent cells. Genes that appeared in at least five species were regarded as conserved CCAT-driving genes. We performed gene enrichment analysis using clusterProfiler on those conserved CCAT-driving genes. The biological processes related to ribosome biogenesis were marked red according to a previous paper 50 .
Analysis of Xbp1 expression pattern in MCDA. Given the low detection rate of TFs in the single-cell experiment, we chose high-quality cells with > 800 genes and calculated the average expression of Xbp1 by normalization to a group of stably expressed gene sets generated from scMerge R package (v.1.2.0, https:// bioconductor.org/packages/release/bioc/html/scMerge.html). We used linear regression to measure the expression trend of Xbp1 with a 95% confidence interval.
Cell-type composition analysis. Significant differences in cell-type composition between groups were assessed using a propeller test from the speckle R package (v.0.0.1, https://github.com/Oshlack/speckle/). We considered groups with false discovery rate (FDR) ≤ 0.01 to represent significantly changed cell types.
Gene expression variability analysis. To detect sensitive changes in weakly expressed genes, we calculated the gene expression variability using VarID 72 (RaceID, v.0.2.2). We ran VarID with regNB=FALSE, k = 10 for the pruning step, no_cores=10 and default parameters otherwise.
Analysis of global proteomics data. LC-MS proteomic analysis was carried out by PTM Bio 91 . Briefly, mouse embryos were ground into powder in liquid nitrogen and suspended in an ice-cold lysis buffer with 1% Triton X-100 and 1% protease inhibitor based on occasional sonication. The cell lysates were centrifuged at 12,000g and 4 °C for 15 min. The supernatants were collected and the protein concentration was measured. Proteins were precipitated using 20% trichloroacetic acid for 2 h at 4 °C and then centrifuged at 4,500g for 5 min. The precipitate was washed three times with cold acetone. The dried protein pellets were resuspended within 200 mM tetraethylammonium bromide based on occasional sonication and then digested with trypsin overnight. DTT was added to a final concentration of 5 mM and the supernatants were incubated at 56 °C for 30 min. Iodoacetamide was added to a final concentration of 11 mM and the supernatants were incubated in the dark for 15 min. Peptides were separated using NanoElute and analyzed using timsTOF Pro. The resulting MS-MS data were processed using the MaxQuant search engine (v.1.5.2.8, https://www.maxquant.org) and mapped to the Mus_ musculus_10090 database. The FDR was adjusted to < 1% and the minimum score for modified peptides was set to >40. Trypsin/P was defined as the cleavage enzyme, and up to two missing cleavages were allowed. For proteomic analysis, the first search range was set to 5 p.p.m. for precursor ions and the main search range was set to 5 p.p.m. and 0.02 Da for fragment ions. Carbamidomethylation of cysteines was defined as the fixed modification and oxidation on methionine was defined as the variable modification. The quantification method used was label-free quantification, the FDR was adjusted to <1% and the minimum score for modified peptides was > 40.
ScATAC-seq and ChIP-seq data analysis. We used ChromVAR 92 (v.1.12.0) to calculate the accessibility of the Xbp1 motif in scATAC-seq datasets for comparing the Xbp1 motif enrichment between differentiated states and undifferentiated states in both the human and the mouse. The mouse scATAC-seq data were downloaded from two papers 63,64 and human scATAC-seq data from another paper 65 . The motif PWM was downloaded from the CisBP database (http://cisbp. ccbr.utoronto.ca). For better visualization, we arranged the cells according to their differentiated states. This comparison was restricted to the cell-type annotations provided. As shown, the Xbp1 motif was less opened in undifferentiated cells in both human and mouse tissues in neuron cell types and hematopoietic cell types. ChIP-seq data for Xbp1 were downloaded from previous studies 60,71,93,94 . The target genes were binarized and integrated for visualization.

Statistics and reproducibility.
No statistical methods were used to predetermine sample size; 520,801 single cells were analyzed in total for a time-series MCDA construction. A total of 52 mouse tissues from different development stages were analyzed. Two to four replications were done for different tissues. The results of major cell-type clusters are reproducible. Experimental mice and embryos were randomized before sample preparation. Different single cells were randomly captured before analysis. For all experiments, investigators were blinded to group allocation during the data collection and analysis. All related statistical methods and sample size are described in the figure legends and Methods. Fig. 1 | Construction of the MCDa. a, Hierarchical trees showing the relationship between 95 cell types in MCDA, colored by lineage. b, t-SNE visualization of 520,801 single cells from seven developmental stages of mice, colored by lineage. They share the same color legend of lineages. c, t-SNE visualization of 520,801 single cells from different developmental stages of mice, colored by tissue. d, Heatmaps showing the number of differentially expressed genes (DEGs) in each developmental stage across the ten tissues of mice. DEGs between two stages of cells were identified using a Wilcoxon rank sum test. e, Summary of the GO enrichment analysis performed on the DEGs in each developmental stage. f, Visualization of the top 10 principal components of PCA in MCDA. Colors represent tissues, which is the same in Extended Data Fig. 1c. g, Lollipop chart displaying the gene expression variance explained by residuals (that is, biological and technical noise) or experimental factors such as tissue, stage, gender, and their respective combinations. Items like "tissue and gender" are variances explained by interactions of two factors instead of the union of two factors. h, UMAP visualization of 57,118 single cells in the kidneys at 7 different time points, colored by stage. i, Summary of the GO enrichment analysis performed on the DEGs in the kidneys across different stages. The red marks the go terms related to physiological functions of renal functions. Fig. 4 | Examples of novel cell populations. a, Feature plots in the t-SNE map of P10 lung (n = 11,314 cells). Cells are colored according to the expression of the indicated marker genes or two genes. The red boxes magnify the co-expressed cell types in the tissues. b, Immunofluorescence assay for the club cell marker gene Scgb1a1 (green) and goblet cell marker gene Tff2 (yellow) in P10 lung. The red boxes indicate the co-expressed locations. The experiment was replicated three times with similar results. Scale bar, 20 μm. c, d, Left: feature plots of Afp in the t-SNE map of P0 pancreas (c, n = 5,639 cells), P10 pancreas (d, n = 11,007 cells). Cells are colored according to the expression of Afp. Right: immunofluorescence assay for the hepatocyte marker gene Afp (green) in P0 (c) pancreas and P10 (d) pancreas. The experiment was replicated three times with similar results. Scale bar, 20 μm. e, Heatmap shows the differentially expressed genes between liver hepatocytes and pancreas hepatocyte-like cells at the P0 stage. Wilcoxon rank-sum test (two-sided) was performed to identify differentially expressed genes and p-value adjustment was performed using bonferroni correction (p adjusted values < 0.05, fold change > = 2). f, Heatmap shows the differentially expressed genes between liver hepatocytes and pancreas hepatocyte-like cells at the P10 stage. Wilcoxon rank-sum test (two-sided) was performed to identify differentially expressed genes and p-value adjustment was performed using bonferroni correction (p adjusted values < 0.05, fold change > 2). Fig. 5 | Entropy estimations of the MCDa using. a, Entropy measurement of cells in MCDA using the SLICE method. The color represents the stage. P-values are from a two -sided Wilcoxon rank sum test comparing entropies of two different development stages (n = 60,065 cells, ns: not significant, p-value > 0.05, * p-value ≤ 0.05, ** p-value ≤ 0.01, *** p-value ≤ 0.001, **** p-value ≤ 0.0001). The exact p-values were displayed in the Source Data. Box plots: center line, median; boxes, first and third quartiles of the distribution; whiskers, highest and lowest data points within 1.5 × IQR. The same statistical analysis was performed for Extended Data Fig. 5a-d. b, Entropy measurement of each lineage in MCDA using the SLICE method. The color represents the stage (epithelial: n = 13,642 cells, neuron: n = 3,638 cells, immune: n = 15,719 cells, muscle n = 2,592 cells, stromal: n = 8,541 cells, endothelial: n = 4,528 cells, other: n = 2,626 cells, erythroid: n = cells, proliferating: n = 3,442 cells, secretory: n = 2,892 cells, germline: n = 5,480 cells). c, Entropy measurement of cells in MCDA using the StemID method (n = 60,065 cells). The color represents the stage. d, Entropy measurement of each lineage in MCDA using the StemID method. The color represents the stage (epithelial: n = 13,642 cells, neuron: n = 3,638 cells, immune: n = 15,719 cells, muscle n = 2,592 cells, stromal: n = 8,541 cells, endothelial: n = 4,528 cells, other: n = 2,626 cells, erythroid: n = cells, proliferating: n = 3,442 cells, secretory: n = 2,892 cells, germline: n = 5,480 cells). e, Boxplots displaying the sensitivity, specificity, FPR (False Positive Rate), and PRAUC (Precision-Recall Area Under Curve) of two methods with different inputs to detect tissue-specific TFs in MCDA (n = 9 tissues per box). Methods represented are running VIPER-DOROTHEA with pseudo cells (pseudo_VIPER-DOROTHEA) or single cells (single_VIPER-DOROTHEA), running SCENIC with pseudo cells (pseudo_SCENIC) or single cells(single_SCENIC). The union of the two methods with single cells (single_union (ABC)) was the union of collection ABC. And the intersection of the two methods with single cells (single_intersection (A)) is the collection A. Box plots: center line, median; boxes, first and third quartiles of the distribution; point, tissues in MCDA. The results indicate SCENIC with single-cell datasets performs better in specificity and PRAUC than VIPER-DOROTHEA. The union of two methods achieves over 75% sensitivity in identifying regulatory programs while the intersection of two methods achieves the highest specificity. f, Heatmap of aggregated module activities of TFs clustered by fuzzy c-means showing variation by stage and lineage from VIPER-DOROTHEA. g, Boxplot showing the module activity scores in module 14 (n = 56 TFs) and module 15 (n = 36 TFs) per lineage per stage in SECNIC. Red lines mark the zero line. Colors from blue to yellow represent the 7 development stages from E10.5 to adult stage. Box plots: center line, median; boxes, first and third quartiles of the distribution; whiskers, highest and lowest data points within 1.5 × IQR. h, Venn diagrams of the numbers of overlapping genes between housekeeping TFs and commonly upregulated TFs (TFs in module 14, collection ABC) in MCDA. i, Heatmap showing commonly upregulated TFs (TFs in module 14, collection ABC) with regard to expression levels in MCDA. The color displays the Spearman correlation between aggregated TF expression levels in tissue-lineage against development stages (labeled as 1 to 7 to represent E14.5 to adult). Red blocks indicate the TFs display the upregulated expression patterns in the specific lineages of tissues. Fig. 9 | scRNa-seq revealed the changes in Xbp1 -/embryos. a, Western blot for the knockout experiment. The molecular weight markers were labeled. The experiment was replicated three times with similar results. b, A igv view of mapped reads in the Xbp1 gene in the sequencing data of the WT and KO embryos. The left one shows the entire Xbp1 gene. The right one shows the marked red region which is the exon1 and exon2 region of Xbp1. The exon2 region shows no read coverage, which indicates that the exon2 (97 bp) has been completely disrupted in KO embryos. The blue lines link the different parts of reads that, by definition, map on several exons. The left and right genome browser tracks share the same y axis. c, Xbp1 -/embryos at E12.5. The arrows represent dead embryos. d, Scatter plot showing the cell composition proportions of differential cell types between KO and WT embryos on E12.5 (WT: n = 4, KO: n = 5, FDR < 0.01). e-f, Entropy measurement of each cluster in Fig. 6b using the StemID (e, n = 93,246 cells) and SLICE (f, n = 93,246 cells) methods. They share the same text in the x coordinates. P-values are from a two-sided Wilcoxon rank sum test comparing entropies of two different groups from each cluster (ns: not significant, p-value > 0.05, * p-value ≤ 0.05, ** p-value ≤ 0.01, *** p-value ≤ 0.001, **** p-value ≤ 0.0001). The exact p values were displayed in the Source Data. Box plots: center line, median; boxes, first and third quartiles of the distribution; whiskers, highest and lowest data points within 1.5 × IQR.