Gene expression level, alternative splicing and major isoform switching are very dynamic across embryonic development
To systematically investigate the gene expression changes from different levels during embryonic development, we first explored the changes of gene expression level, alternative splicing and major isoform switching between adjacent embryonic stages based on 1,529 scRNA-seq data of human early embryos from E3 to E7 stages. The software of SCDE was employed to carry out differential gene expression calling for a total of 58,302 genes (Ensembl gene annotation). Using the threshold of FDR < 0.05, we detected thousands of differentially expressed genes (DEGs) in each group of adjacent stage comparisons (Figure 1B). Interestingly, the number of stage-related DEGs were gradually decreased along embryonic development, where the largest number of genes were shown significant expression changes between E3 and E4 stages, while the lowest count of DEGs was identified in E6 versus E7 (Figure 1B). Moreover, more genes were changed in expression level in male embryos compared to that of female embryos during embryonic development except E6 versus E7. In adjacent stage comparisons, a large portion of DEGs shared between male (47.8% ~ 72.4%) and female (27.7% ~ 76.3%) embryos. Surprisingly, only 127 DEGs were common among those neighbouring stage comparisons, leaving most of the DEGs were different between any two groups (Figure 1C). Gene functional enrichment analysis showed that these 127 DEGs were mainly enriched in the pathways/biological processes of oxidative phosphorylation, autophagy-animal, aminoacyl−tRNA biosynthesis, aldosterone−regulated sodium reabsorption and lysosome (see Supplementary Figure 2A).
We further explored the alternative splicing changes between adjacent developmental stages using BRIE. Similar with the trend of stage-related DEGs, the amount of differential alternative splicing genes (DASGs) was also gradually reduced during embryonic development (Figure 1D). Specifically, the counts of DASGs between neighbouring stages range from 2,415 (E3 versus E4) to 1,183 (E6 versus E7). More genes were also exhibited differential alternative splicing in male embryos compared to that of female embryos in each neighbouring stage comparison. We also observed that a large portion of stage-related DASGs were common between male (28.8% ~ 57.3%) and female (24.2% ~ 51.8%) embryos, while the percentages of shared DASGs were smaller than that of corresponding stage-related DEGs. Those stage comparison groups only have 96 common DASGs, while each group possesses hundreds of unique DASGs (Figure 1E). These 96 genes are enriched in the pathways/biological processes of autophagy-animal, ErbB signaling pathway, adipocytokine signaling pathway, prolactin signaling pathway and mTOR signaling pathway (see Supplementary Figure 2B).
Considering that the major isoform (the isoform with the highest expression of a gene) switching is also an important aspect of gene expression changes, we then examined the isoform switching events between adjacent embryonic stages. Strikingly, the number of major isoform switching genes (MISGs) was gradually decreased with embryonic development as well, which was consistent with that of DEGs and DASGs (Figure 1F). Except the comparison of E3 versus E4, a bigger number of genes were switched their major isoforms in male embryos than that of female in other stage comparisons. Notably, only four MISGs (SLC39A7, TMC7, CLDN6 and DDR1) shared among distinct comparison groups (Figure 1G). SLC39A7 is zinc transporter, plays a critical role in regulating cell growth and death. Yan et al. found SLC39A7 is required for eye, brain, and skeleton formation during early embryonic development in zebrafish [15]. The function of TMC7 is not very clear. Recent study of Cheng et al. revealed that TMC7 is overexpressed in pancreatic carcinoma and contribute to tumor progression and metastasis [16]. CLDN6 is one of the earliest molecules expressed in embryonic stem cells committed to epithelial differentiation in both murine and human tissues [17, 18]. DDR1 has been proved to be functionally important in differentiation, cell motility, collagen synthesis and signaling [19]. Consequently, our results show that the changes of gene expression level, alternative splicing and major isoform switching are all decreased during early embryonic development, and the genes involved in these three types of changes varied greatly across different embryonic stages.
Male and female embryos have large differences in gene expression, alternative splicing and major isoform usage
To further check the expression changes associated with sex differences during early embryonic development, we further explored the DEGs, DASGs and MISGs between male and female embryos in each stage. E6 stage had the greatest amount of DEGs, while E3 stage exhibited the fewest number (only 47) of genes changed in expression for gender comparison (Figure 2A). Notably, 32 of those 47 DEGs (68.1%, 20 X chromosome genes and 12 Y chromosome genes) were from sex chromosomes, suggesting that most of the autosome genes do not varied in expression level in E3 stage (see Supplementary Table 1). In contrast, a number of genes were significantly changed in alternative splicing and major isoform switching between male and female embryos of E3. Specifically, 1,038 sex-related DASGs were detected in E3 which is the second largest compared to other stages (Figure 2B), and the largest number of sex-related MISGs (1,248) were identified in E3 (Figure 2C). Thus although only a small fraction of genes exhibited significant changes in expression level between male and female embryos of E3 stage, great differences exist in alternative splicing and major isoform switching. Only 17 DEGs, 13 DSAGs and 5 MISGs shared among different stages of sex comparison and a large portion of unique DEGs, DSAGs and MISGs were identified in each stage (Figure 2D-F). Interestingly, all these 17 DEGs are from sex chromosomes (six of them come from X chromosome and the remaining 11 are from Y chromosome), and 12 of the 13 DASGs are encoded by autosomes, and four of the 5 MISGs come from sex chromosomes (see Supplementary Table 1). Accordingly, the genes involved in expression level changes are largely distinct from those ones significantly changed in alternative splicing and isoform switching for gender comparison in early embryonic development.
Moreover, we found that a significant portion of sex-related DEGs are from sex chromosomes, but the great majority of sex-related DSAGs and MISGs are expressed by autosomes. Only 47 DEGs were detected in E3 stage between male and female, 68.1% of these sex-related DSAGs were from sex chromosomes (20 X chromosome genes and 12 Y chromosome genes) and E4 stage had 52.75% of sex-related DEGs encoded by sex chromosomes, while much smaller portion of sex-related DEGs were generated from sex chromosomes for E5 (17.49%), E6 (10.81%) and E7 (13.16%) stages, indicating that many autosome genes mainly exhibited gene expression level changes between male and female embryos after E4 stage. By contrast, lots of autosome genes showed significant alternative splicing variations and major isoform switching between different sexes from E3 stage, where only 2.28% ~ 4.07% of DASGs and 4.36% ~ 5.95% of MISGs were from sex chromosomes (see Supplementary table 1). Therefore, expression level profile of genes are much different from that of alternative splicing and major isoform switching between male and female embryos during early embryonic development.
Gene expression level, alternative splicing and isoform switching are complementary in profiling gene expression variations
Then we compared the DEGs, DASGs and MISGs resulted from the comparisons of embryonic development and different sexes. Strikingly, only a small portion of genes were common between any two types of DEGs, DASGs and MISGs for stage comparisons (Figure 3A-D), which is similar with the result of sex comparisons for distinct stages (Figure 3E-I). This could be explained by that DEGs, MISGs and DASGs mainly reflect the expression changes at levels of gene, isoform and exon, respectively. Exon level variation in expression may not affect the whole expression at gene and isoform levels, and isoform level changes do not necessarily result in the expression change at gene level as well. Moreover, for the genes shared among DEGs, DASGs and MISGs, the percentages for stage comparisons ranged from 3.9% to 11.4%, while the ratios were smaller for sex comparisons (0.5% ~ 5.3%). Combing the three types of genes, DEGs occupied the largest portion of genes followed by DASGs and MISGs along the embryonic development (Figure 3A-D). However, distinct results were observed between male and female embryos. The DEGs detected from sex comparisons took up the smallest fractions among these three types of genes in both E3 and E4 stages, but occupied the largest portions in sex comparisons of E5, E6 and E7 stages (Figure 3E-I). Therefore, the genes involved in changes of expression level, alternative splicing and major isoform switching are largely different across distinct embryonic developmental stages as well as between disparate sexes. Our results also show that DEGs, DASGs and MISGs reflect the expression changes of genes from distinct aspects, which are complementary for dissecting gene expression dynamics.
DEGs, DASGs and MISGs are involved in distinct but important pathways related to embryos
To gain insights into the functions of those DEGs, DASGs and MISGs, we first conducted gene functional enrichment analysis for stage-related genes using clusterProfiler [20]. As expected, a notable portion of the significantly enriched pathways were distinct among stage-related DEGs, DASGs and MISGs. Specifically, the stage-related DEGs are mainly involved in thermogenesis, oxidative phosphorylation and adherens junction (Figure 4A), whereas stage-related DASGs are mainly enriched in autophagy, ubiquitin mediated proteolysis and phosphatidylinositol signaling system (Figure 4B), and the top enriched pathways for stage-related MISGs are ribosome, cell cycle and HIF−1 signaling pathway. Intriguingly, we found that most of the significantly enriched pathways for stage-related MISGs and DASGs are common with that of stage-related DEGs (Figure 4D, P-value < 0.05). But only 10 significantly enriched pathways shared among the DEGs, DASGs and MISGs of stage comparisons (such as endocytosis, autophagy-animal, mitophagy, mTOR signaling pathway, Fc gamma R-mediated phagocytosis, endometrial cancer, choline metabolism in cancer, tight junction, AMPK signaling pathway and ubiquitin mediated proteolysis), and each type of stage-related genes enriched in some unique pathways (Figure 4D, P-value < 0.05).
Next, we carried out functional enrichment analysis for those DEGs, DASGs and MISGs identified in gender comparisons. The enriched pathways differ greatly among these three types of genes, which shows larger differences with that of stage comparisons. The top enriched pathways for sex-related DEGs are oxidative phosphorylation, protein processing in endoplasmic reticulum and thermogenesis (Figure 4E), while sex-related DASGs are mainly involved in ubiquitin mediated proteolysis, autophagy–animal, and phosphatidylinositol signaling system (Figure 4F). The sex related MISGs are mainly enriched in spliceosome, ribosome, cell cycle, and RNA transport (Figure 4G). Surprisingly, no significantly enriched pathways are common among sex related DEGs, DASGs and MISGs, suggesting that these three types of genes mainly function in distinct pathways.
Functionally important regulatory modules are identified in early embryonic development
To explore the gene expression regulations during early embryonic development, we further constructed the gene expression regulatory network in early embryos by employing SCENIC [21]. A total of 506 regulons that the modules are with significant motif enrichment of upstream regulating TFs were identified. Each regulon contains a transcription factor (TF) and the target genes likely regulated by this TF. Among these 506 TFs of regulons, 277 were stage-related DEGs, 80 were stage-related DASGs and 70 were stage-related MISGs. Thus most of those TFs were involved at least one level changes of gene expression, alternative splicing and major isoform switching. Interestingly, the embryonic developmental process from E3 to E7 can be clearly revealed based on those 506 regulons in t-SNE (t-Distributed Stochastic Neighbor Embedding), suggesting that the gene regulatory networks vary greatly among different developmental stages (Figure 5A). The male and female cells from the same embryonic stage mixed together in t-SNE, indicating that male and female embryos share similar regulatory network in the same stage. In order to identify stage specific regulons, we further examined the genes with significantly enriched expression in each stage using the method of analysis of variance (ANOVA). With the criteria of fold change > 2 and adjusted P-value < 0.01, E3 had the largest number (2,977) of genes with enriched expression (EEGs), whereas E6 possessed only 16 EEGs, which was the smallest. For E4, E5 and E7 stages, 530, 50 and 193 EEGs were identified, respectively (Figure 5B). Then we compared these EEGs with those 506 regulons and found that 56, 9, 1, and 9 specific regulons were separately detected E3, E4, E5 and E7 stages. No specific regulons were identified in E6 stage, which could be resulted from the fact that the gene expression profile in E6 was similar with that of E7 since that the smallest amount of DEGs, DASGs and MISGs were detected in E6 versus E7 compared to other adjacent stage comparisons.
The regulon of TF CREB1 is one of the specific regulons in E3 stage, where 602 genes potentially regulated by CREB1 were involved (Figure 5C). CREB1 is a critical TF that regulates ~25% of the eukaryotic genome and plays an important role in the synchronization of circadian rhythmicity, the differentiation of adipose cells, initial development of the nervous system, memory formation, and neuronal protection [22-24]. Some studies indicate that CREB1 mediates signals essential for maintaining cell viability during early embryonic development of mouse [25, 26]. Gene functional enrichment analysis showed that those 602 genes were mainly involved in the biological processes of histone modification, covalent chromatin modification and chromosome segregation (P-value < 0.05). The regulon of TF ATF3 and its 461 target genes were mainly expressed in E4 stages (Figure 5D). ATF3 is induced by a variety of signals, including cell cycling, neutrophil migration and sexual differentiation [27-29]. Cheng et al. revealed that ATF3 has an important role in regulating human endometrial receptivity and embryo attachment in vitro via up-regulation of leukemia inhibitory factor [30]. Those 461 genes were enriched in the biological processes of anion transmembrane transport, response to corticosteroid and response to glucocorticoid (P-value < 0.05). The only one specific regulon of E5 stage contains TF NFKB2 and its 42 target genes (Figure 5E). A central role for the NFKB2 gene is the maintenance of the peripheral B-cell population, humoral immune responsiveness, and splenic architecture [31]. But no previous studies show the association between NFKB2 and embryonic development, suggesting that NFKB2 is a novel TF specifically expressed E5 stage. Those 42 targeting genes of NFKB2 were mainly involved in epithelial tube morphogenesis, protein autophosphorylation and regulation of smooth muscle cell proliferation (P-value < 0.05). In E7 stage, TF GCM1 coupled with 38 genes formed a specific regulon (Figure 5F). Proteins encoded by GCM1 are crucial for mediating the differentiation of trophoblast cells along both the villous and extravillous pathways in placental development [32]. Those 38 targeting genes were mainly enriched in leukocyte migration and cell chemotaxis (P-value < 0.05).
The usage of transcription factor binding motifs for TFs is very dynamic along embryo development
Since the binding sequences of TFs are usually conserved, we further investigated the transcription factor binding motifs (TFBMs) for those 506 regulons by employing RcisTarget [21]. Interestingly, the counts of enriched TFBMs for those 506 regulons varied tremendously (range from 1 to 2,601). It is notable that TF GABPA has the largest number (2,601) of TFBMs. GABPA belongs to the E-twenty-six (ETS) family of DNA-binding factors and regulates a broad range of genes involved in cell cycle control, apoptosis, differentiation, hormonal regulation, and other critical cellular functions [33]. Most of the TFs in those regulons possessed 1~50 TFBMs (see Supplementary Figure 3). Over 91% (461 out of 506) of TFs had at least two TFBSs, suggesting that these TFs exhibit dynamic TFBS usage in early embryonic development. Specifically, the aforementioned stage-specific TFs CREB1, ATF3, NFKB2 and GCM1 for E3, E4, E5 and E7 stages possessed 6, 223, 48 and 8 TFBMs, respectively. Some of those TFBMs for the stage-specific TFs are shown in Figure 5C-F. Thus, our result show that the TFBM usage for those 506 TFs of regulons are very dynamic in early embryonic development and distinct TFBMs could be used in different developmental stages.