Transcriptomics of early development in mouse and human. Nearly half of all autosomal genes are expressed during early embryogenesis, i.e., the first week of development (Figure 1A), in both sexes of mouse and human (Figure 1B). While the number of expressed autosomal genes remains relatively constant from the two-cell stage through later pre-implantation stages across sexes and species, the fraction of expressed sex-chromosomal genes varies more widely across mouse and human developmental stages. The X:A gene expression ratio differs between the two species, with a relatively lower fraction of X-linked genes expressed in mouse. In addition, a larger proportion of Y-linked genes is expressed in human males than in mouse males. However, in both species the expressed fraction of Y-linked genes is much lower than that of X-linked and autosomal genes (Figure 1B).
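The expressed-fraction comparison can be sketched as follows, assuming a hypothetical input layout (each gene mapped to its normalized expression values for one stage and sex, plus its chromosome) and a hypothetical expression threshold (any sample above 1.0); the study's actual expression criterion is not stated here. The X:A value computed below is the ratio of expressed fractions, which is one reading of the text; X:A ratios are often defined instead from median expression levels.

```python
from collections import defaultdict


def expressed_fractions(expr, chrom_of, threshold=1.0):
    """Fraction of genes counted as expressed in each chromosome class (A, X, Y).

    expr: dict gene -> list of normalized expression values for the samples of
          one stage and sex (hypothetical input layout).
    chrom_of: dict gene -> chromosome name ("1", "2", ..., "X", "Y").
    threshold: hypothetical cutoff; a gene is counted as expressed if any
               sample exceeds it.
    """
    total = defaultdict(int)
    expressed = defaultdict(int)
    for gene, values in expr.items():
        chrom = chrom_of[gene]
        cls = chrom if chrom in ("X", "Y") else "A"  # pool all autosomes as "A"
        total[cls] += 1
        if any(v > threshold for v in values):
            expressed[cls] += 1
    return {cls: expressed[cls] / total[cls] for cls in total}


def x_to_a_fraction_ratio(fractions):
    """Ratio of the X-linked expressed fraction to the autosomal expressed fraction."""
    return fractions["X"] / fractions["A"]
```

Computing these fractions separately for each stage and sex gives the kind of per-stage comparison summarized in Figure 1B.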
Among the 1,000 most highly expressed genes in mouse pre-implantation stages, the number encoding transcription factors (TFs) and epigenetic enzymes (EEs) peaked at the two-cell stage and declined steadily as development proceeded (Em1.5: 51 TFs, 27 EEs; Em3: 24 TFs, 12 EEs). Human embryos showed a similar trend for TFs (Em3: 49 TFs; Em7: 23 TFs), whereas the number of expressed EE genes remained almost constant across stages. Surprisingly, the orthologs of only 17 of the TFs and 6 of the EEs expressed in mouse were detected in human embryos. Most of these shared regulatory factors, such as Atf4, Elf3, Sall4, Tfap2c, Hdac1, Kdm5b and Tet1, were expressed at analogous stages of early embryonic development in both species.
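A minimal sketch of counting regulators among each stage's top-expressed genes, assuming per-stage mean expression values and pre-built TF and EE annotation sets (the annotation sources are not given here). The function names and input layout are hypothetical.

```python
def top_n_genes(mean_expr, n=1000):
    """Return the n genes with the highest mean expression (ties broken by name)."""
    ranked = sorted(mean_expr, key=lambda g: (-mean_expr[g], g))
    return set(ranked[:n])


def count_regulators(mean_expr_by_stage, tf_set, ee_set, n=1000):
    """Count TFs and EEs among each stage's top-n expressed genes.

    mean_expr_by_stage: dict stage -> dict gene -> mean expression
                        (hypothetical layout).
    tf_set, ee_set: sets of gene symbols annotated as TFs or EEs.
    """
    counts = {}
    for stage, mean_expr in mean_expr_by_stage.items():
        top = top_n_genes(mean_expr, n)
        counts[stage] = {"TF": len(top & tf_set), "EE": len(top & ee_set)}
    return counts
```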
Several of the so-called “pluripotency factors”32,33, which include transcription factors, epigenetic enzymes, and signaling molecules, were detected across the two datasets, several of them in only one of the two species. For example, Dnmt3b, Dnmt3l, Sall4, and Tead4 were detected in both mouse and human, whereas Nanog, Esrrb, Gata4, and Pou5f1 (Oct4) were not detected in human embryos, and Klf4, Myc, and Dnmt3a were not detected in mouse embryos.
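Cross-species comparisons of detected factors depend on an ortholog mapping. The sketch below uses a hypothetical `compare_detected` helper to partition detected genes into shared and species-specific sets; when no mapping is supplied it falls back to upper-casing mouse symbols, which is only a rough stand-in for a curated ortholog table. The example lists are limited to the factors named above.

```python
def compare_detected(mouse_detected, human_detected, mouse_to_human=None):
    """Split detected genes into shared, mouse-only and human-only sets.

    mouse_to_human: optional dict of mouse -> human ortholog symbols. Genes
    missing from it fall back to the upper-cased mouse symbol, a simplifying
    assumption that holds for many one-to-one orthologs but not all.
    """
    mouse_to_human = mouse_to_human or {}

    def to_human(symbol):
        return mouse_to_human.get(symbol, symbol.upper())

    mouse = {to_human(g) for g in mouse_detected}
    human = set(human_detected)
    return {
        "shared": mouse & human,
        "mouse_only": mouse - human,
        "human_only": human - mouse,
    }


# Illustration restricted to the pluripotency factors named in the text.
mouse_factors = ["Dnmt3b", "Dnmt3l", "Sall4", "Tead4", "Nanog", "Esrrb", "Gata4", "Pou5f1"]
human_factors = ["DNMT3B", "DNMT3L", "SALL4", "TEAD4", "KLF4", "MYC", "DNMT3A"]
groups = compare_detected(mouse_factors, human_factors)
```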
Relative contribution of developmental stage versus sex to gene expression in early embryogenesis. Normalized genome-wide expression counts clustered primarily by developmental stage in both mouse and human (Figure 2A-B). The first two principal components explained 63% and 50% of the total variance in gene expression in mouse and human, respectively. Although samples from each developmental stage clustered in a time-dependent manner, male and female samples were difficult to distinguish visually in the PCA. Samples appear to cluster weakly by sex at very early stages, and less so at later stages of pre-implantation embryogenesis. To quantify the contribution of sex across developmental stages, we fitted a linear regression model and found that sex explained nearly a quarter of the variance in gene expression at the earliest stages of embryogenesis in both mouse and human (Figure 2C-D), but that this contribution decreased rapidly. Thus, the relative role of sex in shaping the transcriptome diminishes rapidly and substantially over very early development. Genes with the highest and lowest loadings on each of the top 30 principal components of the mouse and human expression data are shown in Supplementary Figures S2 and S3.
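Both quantities in this paragraph can be sketched with NumPy under stated assumptions: the variance explained by the leading principal components follows from the singular values of the centered samples-by-genes matrix, and the sex contribution is computed here as the pooled R² of per-gene least-squares fits of expression on sex within one stage. The study's exact regression model (for example, any additional covariates) may differ, and the function names are hypothetical.

```python
import numpy as np


def pca_variance_fraction(X, k=2):
    """Fraction of total variance captured by the first k principal components.

    X: samples x genes matrix of normalized (e.g. log-scaled) expression.
    """
    Xc = X - X.mean(axis=0)                  # center each gene
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PC variances
    return float(var[:k].sum() / var.sum())


def variance_explained_by_sex(X, is_male):
    """Pooled R^2 of per-gene linear models expression ~ sex within one stage.

    X: samples x genes matrix restricted to a single developmental stage.
    is_male: sequence of 0/1 (or bool) sex labels, one per sample.
    """
    sex = np.asarray(is_male, dtype=float)
    design = np.column_stack([np.ones_like(sex), sex])  # intercept + sex
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)   # fits all genes at once
    ss_res = ((X - design @ beta) ** 2).sum()
    ss_tot = ((X - X.mean(axis=0)) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)
```

Applying `variance_explained_by_sex` stage by stage yields a curve of the kind shown in Figure 2C-D.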
Enrichment of sex-differentially expressed genes. Sex-differentially expressed genes (sexDEGs) show common enrichment patterns in males and females of both mouse and human (Table 1). As expected from the different sex-chromosome complements of males and females, X-linked sexDEGs are enriched in females and Y-linked sexDEGs are enriched in males in both species (Table 1). In contrast, autosomal genes are significantly under-represented among sexDEGs in both sexes at most early developmental stages in mouse and human. X-linked genes also have fewer sexDEGs than expected at early stages (although sample sizes are low); however, the X chromosome becomes over-enriched for sexDEGs at the blastocyst stage. In terms of functional annotation, although most DEGs are protein-coding, protein-coding genes are under-enriched for DEGs at many developmental stages; this effect is stronger in human, likely owing to the larger number of samples and hence greater statistical power (Table 1). Conversely, limited statistical power likely impedes our ability to detect enrichment of sexDEGs encoding epigenetic enzymes and transcription factors in early embryogenesis.
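Over- and under-representation calls like those in Table 1 are commonly based on a hypergeometric test. A minimal sketch, assuming that test (the study may have used Fisher's exact test or another variant), with hypothetical function names:

```python
from math import comb


def hypergeom_tails(N, K, n, k):
    """One-sided hypergeometric p-values for a gene category among sexDEGs.

    N: genes tested; K: genes in the category (e.g. X-linked);
    n: sexDEGs; k: sexDEGs that fall in the category.
    Returns (p_over, p_under) = (P[X >= k], P[X <= k]).
    """
    total = comb(N, n)

    def pmf(i):
        return comb(K, i) * comb(N - K, n - i) / total

    lo, hi = max(0, n - (N - K)), min(K, n)  # support of the distribution
    p_over = sum(pmf(i) for i in range(max(k, lo), hi + 1))
    p_under = sum(pmf(i) for i in range(lo, min(k, hi) + 1))
    return min(p_over, 1.0), min(p_under, 1.0)


def expected_count(N, K, n):
    """Expected number of category members among n sexDEGs with no enrichment."""
    return n * K / N
```

A category is over-represented when the observed `k` exceeds `expected_count` with a small `p_over`, and under-represented when `k` falls below it with a small `p_under`.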
Characterization of sex-specific differences in early mammalian embryogenesis. At the earliest stages of embryogenesis in both mouse and human, more genes appear to be expressed than not expressed (Figure 3A); however, this ratio quickly approaches 1:1. The number of sexDEGs in both mouse and human is small (Figure 3A). In mouse, the total number of sexDEGs increases from the four-cell stage onward and peaks at the sixteen-cell stage (Em3). Male-biased DEGs outnumber female-biased DEGs about eightfold at the four-cell and eight-cell stages, but at the sixteen-cell stage this ratio reverses, with about twice as many female-biased as male-biased DEGs. Our functional enrichment analyses of DEGs found that TFs were enriched at the four-cell, eight-cell, and 64-cell stages, while sex-chromosome (X, Y) genes were enriched at the eight-cell and 64-cell stages. We also found a significant difference across developmental stages between the number of female-biased and male-biased DEGs encoding TFs. The number of female-biased TF DEGs follows the same pattern as the total number of DEGs, whereas the number of male-biased TF DEGs remains constant across stages. In human, the number of sexDEGs is of comparable magnitude across stages, with a peak at the late blastocyst stage, where female-biased DEGs are twice as numerous as male-biased DEGs (Figure 3A).
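Splitting DEGs into male- and female-biased sets usually combines multiple-testing correction with a fold-change cutoff. The sketch below assumes Benjamini-Hochberg adjustment, a positive log2 fold change meaning higher expression in males, and hypothetical cutoffs (`alpha=0.05`, `min_lfc=1.0`); the study's DEG caller and thresholds are not specified here, and the gene values in the test are toy numbers.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_sex_biased(results, alpha=0.05, min_lfc=1.0):
    """Split genes into male-biased and female-biased DEG sets.

    results: dict gene -> (log2 fold change male/female, raw p-value).
    alpha, min_lfc: hypothetical significance and effect-size cutoffs.
    """
    genes = list(results)
    padj = benjamini_hochberg([results[g][1] for g in genes])
    male, female = set(), set()
    for g, q in zip(genes, padj):
        lfc = results[g][0]
        if q < alpha and abs(lfc) >= min_lfc:
            (male if lfc > 0 else female).add(g)
    return male, female
```

With these sets, the female-to-male ratio at a stage is `len(female) / len(male)`.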
Figure 3B presents heatmaps of the twenty most consistently female-biased (red, top half) and male-biased (blue, bottom half) genes across developmental stages in mouse (left heatmap) and human (right heatmap). Interestingly, only two orthologous genes showed similar sex-biased patterns in mouse and human: KDM5D and DDX3Y, both Y-linked. KDM5D encodes a histone lysine demethylase with a repressive transcriptional role, and DDX3Y encodes a DEAD-box RNA helicase.
Functional enrichment analysis of the selected female clusters identifies “nucleobase-containing small molecule metabolic process”, which is involved in nucleotide processing. In contrast, the same analysis of the selected male clusters identifies metabolic and catabolic processes. (The samples from each stage are shown in different colors.) Surprisingly, the network derived from male samples was very small, with only 51 nodes, whereas the female samples yielded a network of about 4,000 nodes. Almost all of the important nodes in each network (according to several centrality criteria) are present in the DEG list (networks not shown here).
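Checking whether a network's important nodes appear in the DEG list can be illustrated with degree centrality, the simplest of the usual criteria. The study's specific centrality measures are not named, so this choice is an assumption, as are the helper names.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality, degree / (n_nodes - 1), for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {node: len(nb) / (n - 1) for node, nb in neighbors.items()}


def hub_overlap(edges, degs, top=10):
    """Fraction of the `top` highest-degree nodes that are also in the DEG list."""
    cent = degree_centrality(edges)
    hubs = sorted(cent, key=lambda v: (-cent[v], v))[:top]  # ties broken by name
    if not hubs:
        return 0.0
    return sum(h in degs for h in hubs) / len(hubs)
```

Other criteria, such as betweenness or closeness centrality, would slot into `hub_overlap` in place of `degree_centrality`.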
Differences in sex-differentially expressed protein interactions between mouse and human. In early mouse development, protein-protein interactions (PPIs) are dominated by interactions among non-differentially expressed genes (non-DEGs) (Figure 4A). At the 16-cell stage, we observe a burst of male-specific interactions, and at the 32-cell stage a similar burst of female-specific interactions, while the number of interactions among non-DEGs decreases accordingly. During these bursts, interactions between male or female DEGs and non-DEGs also increase, and both decrease at the blastocyst stage. Humans show a different pattern. When the fraction of interactions is normalized to the total number of DEGs, we do not observe bursts of male- or female-specific interactions as in mouse, although some patterns remain. Interactions among non-DEGs are more frequent at earlier stages, decrease at the middle stages (E5 and E6), and increase again at the latest stage. We also observe an increase in female-female DEG interactions at E6, although it is much less pronounced than in mouse. Note that the human networks tend to be smaller because fewer human genes map to the PPI data.
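Classifying interactions by the sex-bias status of their two endpoints can be sketched as below, assuming an undirected PPI edge list and disjoint male-biased and female-biased DEG sets. The fractions here are relative to the total number of edges; the normalization to total DEGs used for the human comparison may differ, and the function name is hypothetical.

```python
from collections import Counter


def edge_category_fractions(edges, male_degs, female_degs):
    """Fraction of PPI edges in each endpoint-status category.

    Each protein is labeled "M" (male-biased DEG), "F" (female-biased DEG) or
    "N" (non-DEG); an edge's category is its sorted label pair, e.g. "F-N".
    """
    def label(gene):
        if gene in male_degs:
            return "M"
        if gene in female_degs:
            return "F"
        return "N"

    counts = Counter("-".join(sorted((label(a), label(b)))) for a, b in edges)
    total = sum(counts.values())
    return {cat: c / total for cat, c in counts.items()}
```

Computing these fractions per stage gives time courses of M-M, F-F, M-N, F-N and N-N interactions comparable in spirit to Figure 4A.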
Lack of conservation signal in orthologous genes across very early mouse and human embryonic stages. A UMAP embedding of the integrated gene expression data from all samples of both species allows us to align the two datasets and compare the structure of clusters between mouse and human across timepoints. Similar stages of mouse and human fall in the same region of the embedding, although direct comparison again remains difficult given the limited number of mouse samples. Cells are colored by general developmental stage (see key in Figure 5A), defined so that each general stage includes enough samples from both mouse and human. This grouping is essential for the analysis in Figure 5B-E, which identifies genes conserved between mouse and human within each general stage (S1 to S4); such a comparison requires at least some cells from both species at each timepoint.
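The requirement that every general stage contain cells from both species can be checked as sketched below. The stage-to-general-stage mapping is a hypothetical input (the actual grouping is defined by the key in Figure 5A), and the stage names in the test are placeholders.

```python
from collections import Counter


def shared_general_stages(samples, stage_map, species=("mouse", "human"), min_per_species=1):
    """General stages that contain at least `min_per_species` samples from every species.

    samples: iterable of (sample_id, species, stage) tuples.
    stage_map: dict (species, stage) -> general stage label such as "S1"
               (hypothetical mapping).
    """
    counts = Counter()
    for _, sp, stage in samples:
        general = stage_map.get((sp, stage))
        if general is not None:  # stages without a general label are skipped
            counts[(general, sp)] += 1
    generals = {g for g, _ in counts}
    return sorted(g for g in generals
                  if all(counts[(g, sp)] >= min_per_species for sp in species))
```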
Figure 5B-E shows the same UMAP embeddings with normalized expression levels overlaid as a color gradient. Each panel shows an example of a highly conserved gene (ranked by p-value) for one of the general stages in Figure 5A (see the table of top conserved genes for each general timepoint S1, S2, S3, and S4). Only cells in the top decile (top 10%) of expression in each species are colored on the scale; all others are shown in gray to contrast high expression at certain developmental timepoints with the rest.
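The gray masking of the color scale can be sketched as a per-species quantile cutoff. This assumes that the top decile refers to the top 10% of one gene's expression values within one species and uses a simple quantile convention; the function name is hypothetical.

```python
def top_decile_colors(values, q=0.9):
    """Keep values in the top (1 - q) fraction for one species; others become None.

    None marks cells to draw in gray. The cutoff is the value at sorted index
    floor(q * n), a simple quantile convention that may differ from the one
    used for the original figure; ties at the cutoff are kept.
    """
    if not values:
        return []
    ordered = sorted(values)
    cutoff = ordered[min(int(q * len(ordered)), len(ordered) - 1)]
    return [v if v >= cutoff else None for v in values]
```

Calling it separately on the mouse and human cells keeps one species' expression range from dominating the other's color scale.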
Conserved sex-biased genes and networks between mouse and human. We applied non-negative matrix factorization (NMF) to identify co-expressed genes across stages separately in (i) male mouse (Figure 6A), (ii) male human (Figure 6B), (iii) female mouse (Figure 6C), and (iv) female human (Figure 6D). NMF analysis of the gene expression data yields unsupervised clusters of genes with similar expression patterns across timepoints (Figure 6A-D; Supplementary Figures S4-S12).
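A minimal NumPy sketch of NMF-based gene clustering, assuming a non-negative genes-by-samples matrix, the Lee-Seung multiplicative updates for the Frobenius objective, and assignment of each gene to the factor with its largest loading. The study's NMF implementation, rank selection, and clustering rule are not stated here, so all three are assumptions.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative V (genes x samples) as W @ H.

    Uses Lee-Seung multiplicative updates that decrease ||V - W H||_F;
    eps avoids division by zero and keeps factors strictly positive.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def gene_clusters(W):
    """Assign each gene to the factor (metagene) with its largest loading."""
    return W.argmax(axis=1)
```

Each row of `H` then describes how one factor's activity changes across the timepoints, which is what groups genes with similar temporal patterns.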
A heatmap of gene set enrichment analysis (GSEA) results for the NMF clusters shows concordant enrichment between mouse and human clusters, including signals from male and female gametes that are possibly remnants of events during fertilization (Figure 6E; Supplementary Figures S13-S17).
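The enrichment score at the core of GSEA is a weighted running sum over a ranked gene list. The sketch below follows the original GSEA definition with weight exponent `p` (p=1 by default; p=0 gives the classic Kolmogorov-Smirnov statistic) and omits permutation-based significance testing. How the NMF clusters were ranked or tested in the study is not stated here, so the function name and any choice of ranking statistic are assumptions.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score.

    ranked: list of (gene, statistic) pairs sorted by statistic, descending.
    gene_set: set of genes (e.g. an annotated gene set).
    p: weight exponent applied to |statistic| for genes in the set.
    """
    hits = [abs(s) ** p if g in gene_set else 0.0 for g, s in ranked]
    hit_weight = sum(hits)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking but not cover all of it")
    running, best = 0.0, 0.0
    for (g, _), w in zip(ranked, hits):
        # step up by the normalized weight on a hit, down by 1/n_miss on a miss
        running += w / hit_weight if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best
```

For an NMF cluster, one hypothetical choice is to rank genes by their loading on the corresponding factor before scoring annotated gene sets.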