Single-Cell Analysis of Alternative Splicing and Gene Regulatory Network Reveals Remarkable Expression and Regulation Dynamics During Early Embryonic Development

Abstract


Background
The advent of single-cell RNA-seq (scRNA-seq) technologies enables the exploration of gene expression at single-cell resolution.To date, scRNA-seq has been widely applied to explore cell subtypes, gene expression dynamics and heterogeneity of diverse tissues in many species [1][2][3][4][5][6][7][8], which have revealed various meaningful results.By contrast, traditional bulk RNA-seq technologies mainly re ect the averaged gene expression of a large number of cells, which are inappropriate to study gene expression heterogeneity especially early embryonic development since only a small number of cells are contained.
With the innovation of scRNA-seq, an increasing number of different scRNA-seq protocols have been developed, but only the full-length transcript capturing scRNA-seq technologies (such as Smart-seq2 [9]) allow the analysis of alternative splicing rather than the 3'-end capturing protocols (e.g.drop-seq [10]) [8].However, the great majority of the published scRNA-seq studies mainly focused on the expression changes at gene level, which ignored the expression dynamics at isoform and exon levels.
Previously, Yan et al. rst investigated the transcriptome landscape of human early embryos from oocyte to blastocyst, where they revealed dynamic expression patterns across different developmental stages [11].Then Xue et al. further compared the gene expression pro les between humans and mice from oocyte to morula, and highlighted the evolutionary conservation of some expression networks in early embryonic development [12,13].Recently, Petropoulos et al. provided a single-cell transcriptome pro ling of a number of human preimplantation embryos across ve developmental stages (E3-E7) [14], but they did not explored the alternative splicing and gene regulatory pro les of early embryos.Both of the changes in gene expression level (gene-level) and alternative splicing (exon-level) are crucial for understanding the gene expression dynamics of early embryonic development.Moreover, isoform switching (isoform-level) is another important aspect of gene expression since most of human genes could encode multiple different isoforms, but none of these studies investigated and compared these different-level changes.Besides, a network view of dynamic gene expression regulations during embryonic development for unraveling developmental regulatory changes is still lacking.Consequently, it is necessary to further explore the expression of genes from different aspects (gene-level, isoform-level, exon-level and regulatory network) to better and more comprehensively understand the gene expression dynamics and the underlying mechanisms of embryonic development.
Here we further comprehensively analyzed the gene expression pro le of human early embryonic development process from distinct levels of gene, isoform, exon and regulatory network based on the scRNA-seq data of 1,529 cells from 88 male and female embryos.Besides differential gene expression analysis, we also investigated the changes of alternative splicing and major isoform switching along embryonic development as well as the sex differences between male and female embryos.Furthermore, we compared the genes and related pathways involved in differential expression, signi cantly alternative splicing and major isoform switching, and revealed large inconsistency among these aspects of gene expression changes.Finally, we constructed gene expression regulatory network of embryonic development and identi ed stage-speci c regulatory modules as well as enriched motifs bound by corresponding transcription factors.Our systematic analyses revealed interesting gene expression dynamics from diverse aspects, which deepens the understanding of underlying processes and mechanisms of human early embryonic development.

Gene expression level, alternative splicing and major isoform switching are very dynamic across embryonic development
To systematically investigate the gene expression changes from different levels during embryonic development, we rst explored the changes of gene expression level, alternative splicing and major isoform switching between adjacent embryonic stages based on 1,529 scRNA-seq data of human early embryos from E3 to E7 stages.The software of SCDE was employed to carry out differential gene expression calling for a total of 58,302 genes (Ensembl gene annotation).Using the threshold of FDR < 0.05, we detected thousands of differentially expressed genes (DEGs) in each group of adjacent stage comparisons (Figure 1B).Interestingly, the number of stage-related DEGs were gradually decreased along embryonic development, where the largest number of genes were shown signi cant expression changes between E3 and E4 stages, while the lowest count of DEGs was identi ed in E6 versus E7 (Figure 1B).Moreover, more genes were changed in expression level in male embryos compared to that of female embryos during embryonic development except E6 versus E7.In adjacent stage comparisons, a large portion of DEGs shared between male (47.8% ~ 72.4%) and female (27.7% ~ 76.3%) embryos.Surprisingly, only 127 DEGs were common among those neighbouring stage comparisons, leaving most of the DEGs were different between any two groups (Figure 1C).Gene functional enrichment analysis showed that these 127 DEGs were mainly enriched in the pathways/biological processes of oxidative phosphorylation, autophagy-animal, aminoacyl−tRNA biosynthesis, aldosterone−regulated sodium reabsorption and lysosome (see Supplementary Figure 2A).
We further explored the alternative splicing changes between adjacent developmental stages using BRIE.Similar with the trend of stage-related DEGs, the amount of differential alternative splicing genes (DASGs) was also gradually reduced during embryonic development (Figure 1D).Speci cally, the counts of DASGs between neighbouring stages range from 2,415 (E3 versus E4) to 1,183 (E6 versus E7).More genes were also exhibited differential alternative splicing in male embryos compared to that of female embryos in each neighbouring stage comparison.We also observed that a large portion of stage-related DASGs were common between male (28.8% ~ 57.3%) and female (24.2% ~ 51.8%) embryos, while the percentages of shared DASGs were smaller than that of corresponding stage-related DEGs.Those stage comparison groups only have 96 common DASGs, while each group possesses hundreds of unique DASGs (Figure 1E).These 96 genes are enriched in the pathways/biological processes of autophagyanimal, ErbB signaling pathway, adipocytokine signaling pathway, prolactin signaling pathway and mTOR signaling pathway (see Supplementary Figure 2B).
Considering that the major isoform (the isoform with the highest expression of a gene) switching is also an important aspect of gene expression changes, we then examined the isoform switching events between adjacent embryonic stages.Strikingly, the number of major isoform switching genes (MISGs) was gradually decreased with embryonic development as well, which was consistent with that of DEGs and DASGs (Figure 1F).Except the comparison of E3 versus E4, a bigger number of genes were switched their major isoforms in male embryos than that of female in other stage comparisons.Notably, only four MISGs (SLC39A7, TMC7, CLDN6 and DDR1) shared among distinct comparison groups (Figure 1G).SLC39A7 is zinc transporter, plays a critical role in regulating cell growth and death.Yan et al. found SLC39A7 is required for eye, brain, and skeleton formation during early embryonic development in zebra sh [15].The function of TMC7 is not very clear.Recent study of Cheng et al. revealed that TMC7 is overexpressed in pancreatic carcinoma and contribute to tumor progression and metastasis [16].CLDN6 is one of the earliest molecules expressed in embryonic stem cells committed to epithelial differentiation in both murine and human tissues [17,18].DDR1 has been proved to be functionally important in differentiation, cell motility, collagen synthesis and signaling [19].Consequently, our results show that the changes of gene expression level, alternative splicing and major isoform switching are all decreased during early embryonic development, and the genes involved in these three types of changes varied greatly across different embryonic stages.

Male and female embryos have large differences in gene expression, alternative splicing and major isoform usage
To further check the expression changes associated with sex differences during early embryonic development, we further explored the DEGs, DASGs and MISGs between male and female embryos in each stage.E6 stage had the greatest amount of DEGs, while E3 stage exhibited the fewest number (only 47) of genes changed in expression for gender comparison (Figure 2A).Notably, 32 of those 47 DEGs (68.1%, 20 X chromosome genes and 12 Y chromosome genes) were from sex chromosomes, suggesting that most of the autosome genes do not varied in expression level in E3 stage (see Supplementary Table 1).In contrast, a number of genes were signi cantly changed in alternative splicing and major isoform switching between male and female embryos of E3.Speci cally, 1,038 sex-related DASGs were detected in E3 which is the second largest compared to other stages (Figure 2B), and the largest number of sexrelated MISGs (1,248) were identi ed in E3 (Figure 2C).Thus although only a small fraction of genes exhibited signi cant changes in expression level between male and female embryos of E3 stage, great differences exist in alternative splicing and major isoform switching.Only 17 DEGs, 13 DSAGs and 5 MISGs shared among different stages of sex comparison and a large portion of unique DEGs, DSAGs and MISGs were identi ed in each stage (Figure 2D-F).Interestingly, all these 17 DEGs are from sex chromosomes (six of them come from X chromosome and the remaining 11 are from Y chromosome), and 12 of the 13 DASGs are encoded by autosomes, and four of the 5 MISGs come from sex chromosomes (see Supplementary Table 1).Accordingly, the genes involved in expression level changes are largely distinct from those ones signi cantly changed in alternative splicing and isoform switching for gender comparison in early embryonic development.
Moreover, we found that a signi cant portion of sex-related DEGs are from sex chromosomes, but the great majority of sex-related DSAGs and MISGs are expressed by autosomes.Only 47 DEGs were detected in E3 stage between male and female, 68.1% of these sex-related DSAGs were from sex chromosomes (20 X chromosome genes and 12 Y chromosome genes) and E4 stage had 52.75% of sexrelated DEGs encoded by sex chromosomes, while much smaller portion of sex-related DEGs were generated from sex chromosomes for E5 (17.49%),E6 (10.81%) and E7 (13.16%) stages, indicating that many autosome genes mainly exhibited gene expression level changes between male and female embryos after E4 stage.By contrast, lots of autosome genes showed signi cant alternative splicing variations and major isoform switching between different sexes from E3 stage, where only 2.28% 4.07% of DASGs and 4.36% ~ 5.95% of MISGs were from sex chromosomes (see Supplementary table 1).Therefore, expression level pro le of genes are much different from that of alternative splicing and major isoform switching between male and female embryos during early embryonic development.

Gene expression level, alternative splicing and isoform switching are complementary in pro ling gene expression variations
Then we compared the DEGs, DASGs and MISGs resulted from the comparisons of embryonic development and different sexes.Strikingly, only a small portion of genes were common between any two types of DEGs, DASGs and MISGs for stage comparisons (Figure 3A-D), which is similar with the result of sex comparisons for distinct stages (Figure 3E-I).This could be explained by that DEGs, MISGs and DASGs mainly re ect the expression changes at levels of gene, isoform and exon, respectively.Exon level variation in expression may not affect the whole expression at gene and isoform levels, and isoform level changes do not necessarily result in the expression change at gene level as well.Moreover, for the genes shared among DEGs, DASGs and MISGs, the percentages for stage comparisons ranged from 3.9% to 11.4%, while the ratios were smaller for sex comparisons (0.5% ~ 5.3%).Combing the three types of genes, DEGs occupied the largest portion of genes followed by DASGs and MISGs along the embryonic development (Figure 3A-D).However, distinct results were observed between male and female embryos.The DEGs detected from sex comparisons took up the smallest fractions among these three types of genes in both E3 and E4 stages, but occupied the largest portions in sex comparisons of E5, E6 and E7 stages (Figure 3E-I).Therefore, the genes involved in changes of expression level, alternative splicing and major isoform switching are largely different across distinct embryonic developmental stages as well as between disparate sexes.Our results also show that DEGs, DASGs and MISGs re ect the expression changes of genes from distinct aspects, which are complementary for dissecting gene expression dynamics.

DEGs, DASGs and MISGs are involved in distinct but important pathways related to embryos
To gain insights into the functions of those DEGs, DASGs and MISGs, we rst conducted gene functional enrichment analysis for stage-related genes using clusterPro ler [20].As expected, a notable portion of the signi cantly enriched pathways were distinct among stage-related DEGs, DASGs and MISGs.Speci cally, the stage-related DEGs are mainly involved in thermogenesis, oxidative phosphorylation and adherens junction (Figure 4A), whereas stage-related DASGs are mainly enriched in autophagy, ubiquitin mediated proteolysis and phosphatidylinositol signaling system (Figure 4B), and the top enriched pathways for stage-related MISGs are ribosome, cell cycle and HIF−1 signaling pathway.Intriguingly, we found that most of the signi cantly enriched pathways for stage-related MISGs and DASGs are common with that of stage-related DEGs (Figure 4D, P-value < 0.05).But only 10 signi cantly enriched pathways shared among the DEGs, DASGs and MISGs of stage comparisons (such as endocytosis, autophagyanimal, mitophagy, mTOR signaling pathway, Fc gamma R-mediated phagocytosis, endometrial cancer, choline metabolism in cancer, tight junction, AMPK signaling pathway and ubiquitin mediated proteolysis), and each type of stage-related genes enriched in some unique pathways (Figure 4D, P-value < 0.05).
Next, we carried out functional enrichment analysis for those DEGs, DASGs and MISGs identi ed in gender comparisons.The enriched pathways differ greatly among these three types of genes, which shows larger differences with that of stage comparisons.The top enriched pathways for sex-related DEGs are oxidative phosphorylation, protein processing in endoplasmic reticulum and thermogenesis (Figure 4E), while sex-related DASGs are mainly involved in ubiquitin mediated proteolysis, autophagy-animal, and phosphatidylinositol signaling system (Figure 4F).The sex related MISGs are mainly enriched in spliceosome, ribosome, cell cycle, and RNA transport (Figure 4G).Surprisingly, no signi cantly enriched pathways are common among sex related DEGs, DASGs and MISGs, suggesting that these three types of genes mainly function in distinct pathways.

Functionally important regulatory modules are identi ed in early embryonic development
To explore the gene expression regulations during early embryonic development, we further constructed the gene expression regulatory network in early embryos by employing SCENIC [21].A total of 506 regulons that the modules are with signi cant motif enrichment of upstream regulating TFs were identi ed.Each regulon contains a transcription factor (TF) and the target genes likely regulated by this TF.Among these 506 TFs of regulons, 277 were stage-related DEGs, 80 were stage-related DASGs and 70 were stage-related MISGs.Thus most of those TFs were involved at least one level changes of gene expression, alternative splicing and major isoform switching.Interestingly, the embryonic developmental process from E3 to E7 can be clearly revealed based on those 506 regulons in t-SNE (t-Distributed Stochastic Neighbor Embedding), suggesting that the gene regulatory networks vary greatly among different developmental stages (Figure 5A).The male and female cells from the same embryonic stage mixed together in t-SNE, indicating that male and female embryos share similar regulatory network in the same stage.In order to identify stage speci c regulons, we further examined the genes with signi cantly enriched expression in each stage using the method of analysis of variance (ANOVA).With the criteria of fold change > 2 and adjusted P-value < 0.01, E3 had the largest number (2,977) of genes with enriched expression (EEGs), whereas E6 possessed only 16 EEGs, which was the smallest.For E4, E5 and E7 stages, 530, 50 and 193 EEGs were identi ed, respectively (Figure 5B).Then we compared these EEGs with those 506 regulons and found that 56, 9, 1, and 9 speci c regulons were separately detected E3, E4, E5 and E7 stages.No speci c regulons were identi ed in E6 stage, which could be resulted from the fact that the gene expression pro le in E6 was similar with that of E7 since that the smallest amount of DEGs, DASGs and MISGs were detected in E6 versus E7 compared to other adjacent stage comparisons.
The regulon of TF CREB1 is one of the speci c regulons in E3 stage, where 602 genes potentially regulated by CREB1 were involved (Figure 5C).CREB1 is a critical TF that regulates ~25% of the eukaryotic genome and plays an important role in the synchronization of circadian rhythmicity, the differentiation of adipose cells, initial development of the nervous system, memory formation, and neuronal protection [22][23][24].Some studies indicate that CREB1 mediates signals essential for maintaining cell viability during early embryonic development of mouse [25,26].Gene functional enrichment analysis showed that those 602 genes were mainly involved in the biological processes of histone modi cation, covalent chromatin modi cation and chromosome segregation (P-value < 0.05).The regulon of TF ATF3 and its 461 target genes were mainly expressed in E4 stages (Figure 5D).ATF3 is induced by a variety of signals, including cell cycling, neutrophil migration and sexual differentiation [27][28][29].Cheng et al. revealed that ATF3 has an important role in regulating human endometrial receptivity and embryo attachment in vitro via up-regulation of leukemia inhibitory factor [30].Those 461 genes were enriched in the biological processes of anion transmembrane transport, response to corticosteroid and response to glucocorticoid (P-value < 0.05).The only one speci c regulon of E5 stage contains TF NFKB2 and its 42 target genes (Figure 5E).A central role for the NFKB2 gene is the maintenance of the peripheral B-cell population, humoral immune responsiveness, and splenic architecture [31].But no previous studies show the association between NFKB2 and embryonic development, suggesting that NFKB2 is a novel TF speci cally expressed E5 stage.Those 42 targeting genes of NFKB2 were mainly involved in epithelial tube morphogenesis, protein autophosphorylation and regulation of smooth muscle cell proliferation (Pvalue < 0.05).In E7 stage, TF GCM1 coupled with 38 genes formed a speci c regulon (Figure 5F).Proteins encoded by GCM1 are crucial for mediating the differentiation of trophoblast cells along both the villous and extravillous pathways in placental development [32].Those 38 targeting genes were mainly enriched in leukocyte migration and cell chemotaxis (P-value < 0.05).

The usage of transcription factor binding motifs for TFs is very dynamic along embryo development
Since the binding sequences of TFs are usually conserved, we further investigated the transcription factor binding motifs (TFBMs) for those 506 regulons by employing RcisTarget [21].Interestingly, the counts of enriched TFBMs for those 506 regulons varied tremendously (range from 1 to 2,601).It is notable that TF GABPA has the largest number (2,601) of TFBMs.GABPA belongs to the E-twenty-six (ETS) family of DNA-binding factors and regulates a broad range of genes involved in cell cycle control, apoptosis, differentiation, hormonal regulation, and other critical cellular functions [33].Most of the TFs in those regulons possessed 1~50 TFBMs (see Supplementary Figure 3).Over 91% (461 out of 506) of TFs had at least two TFBSs, suggesting that these TFs exhibit dynamic TFBS usage in early embryonic development.Speci cally, the aforementioned stage-speci c TFs CREB1, ATF3, NFKB2 and GCM1 for E3, E4, E5 and E7 stages possessed 6, 223, 48 and 8 TFBMs, respectively.Some of those TFBMs for the stage-speci c TFs are shown in Figure 5C-F.Thus, our result show that the TFBM usage for those 506 TFs of regulons are very dynamic in early embryonic development and distinct TFBMs could be used in different developmental stages.

Discussion
In this study, we systematically explored the gene expression pro le of human early embryos from the changes of gene-level (differential expression calling), isoform level (major isoform switching), exon-level (alternative splicing analysis) and expression regulatory network inference.Intriguingly, a common trend among DEGs, DASGs and MISGs is that their quantity is gradually decreased along the developmental stages, indicating that the expression changes is reduced during embryonic development.But for the comparison between male and female embryos of each stage, we did not observe similar trend as stage comparisons.The number of sex-related DEGs, DASGs and MISGs detected in different stages vary greatly as well.An interesting result is that only 47 sex-related DEGs were detected in E3 stage; however, 1,038 DASGs and 1,284 MISGs were identi ed between male and female comparison of E3.The result indicates that although the expression has change little between different sexes at gene level in E3 stage, great variations existed in alternative spicing and isoform switching of genes.Moreover, the stage-related DEGs, MISGs and DASGs had another similar tendency that the majority of each type of genes were distinct among the comparisons of E3-E4, E4-E5, E5-E6 and E6-E7, suggesting that the expression changes varied widely across different levels of gene, isoform and exon.Gene functional enrichment analysis results showed that those DEGs, DASGs and MISGs were mainly enriched different pathways, which is reasonable since the large differences of distinct types of genes.Therefore, combing expression level, alternative splicing and isoform switching can reveal a broader range of gene expression changes.
Alternative splicing enables the multi-exon genes to generate different protein-coding and/or noncoding isoforms, which largely increased the diversity of transcriptome and can in uence both of gene expression level and isoform switching events [34].When a gene executes speci c role/function in distinct conditions, it may employ disparate isoforms and change the expression levels of corresponding isoforms, resulting the switching of major isoform [34].In fact, gene expression level re ects the total expression changes of all the isoforms encoded by this gene, while isoform switching and alternative splicing separately show the expression variations of isoforms and exons.Therefore, the genes involved in expression changes among the levels of gene, isoform and exon could vary tremendously.Currently, most scRNA-seq studies mainly focused on the expression level changes of genes, which missed a large number of genes that signi cantly changed in alternative splicing and isoform switching [35].Our results indicate that the genes signi cantly changed in these three aspects are much different in early embryonic development, and they are complementary for elucidating the dynamic expression pro le of genes.To the best of our knowledge, our study is the rst to simultaneously explore the expression variations from gene expression level, alternative splicing and isoform switching.
Additionally, we identi ed 506 functional important regulons related to early embryonic development including some stage-speci c regulons (such as E3: MSX1, E4: SOX2 and E7: TFEB), and revealed dynamic usages of TFBMs for those involved TFs.MSX1 encodes transcription factors that control organogenesis and tissue interactions during embryonic development [36].Systemic deletion of MSX1 in mice results in perinatal lethality [37].SOX2 is necessary for the maintenance of pluripotency in epiblast and embryonic stem cells and knockout of SOX2 is early embryonic lethal [38,39].Later in development, SOX2 is required in various tissue stem cells and early progenitors, especially in the nervous system [40].Eiríkur et al. showed that TFEB plays a critical role in the signal transduction processes required for normal vascularization of the placenta [41].Therefore, the stage-speci c regulons identi ed by us are functionally important for embryonic development.

Conclusions
Collectively, our study gains insights into the dynamic changes of gene activities during early embryonic development from gene expression level, alternative splicing, isoform switching and gene expression regulatory network.The systematic analyses of gene expression pro le from different aspects can deepen the understanding of gene expression variations and the underlying molecular mechanisms.

Methods
Single-cell RNA-seq data processing and sex determination of early embryos A total of 1,529 scRNA-seq data of 88 human male and female early embryos (from stage E3 to E7) were rst downloaded from ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/) with accession number E-MTAB-3929.Then the scRNA-seq data were mapped to the human reference genome GRCh38 using HISAT2 [42] (version 2.1.0)with default parameters.Expression of genes and transcripts were quanti ed in Transcripts Per Kilobase Million (TPM) StringTie [43] (version 1.3.3b)with parameters of "-e -A" based on the human gene annotation le of Ensembl database in GTF format (version 91).In order to determine the sex of each cell, the sum of TPM values for seven Y chromosome genes (DDX3Y, EIF1AY, KDM5D, PRKY, RPS4Y1, UTY and ZFY) were used to distinguish male embryos (cut-off: >= 60 TPM) from female embryos (cut-off: <= 40 TPM) (see Supplementary Figure 1) [14].
Differential expression calling of scRNA-seq data Differential gene expression analyses between adjacent embryonic stages or between male and female embryos were conducted by employing the tool of Single-Cell Differential Expression (SCDE, version 2.8.0) [44].Then differentially expressed genes (DEGs) were determined using the criteria of | cZ | > 1.96 which corresponds to FDR < 0.05.

Differential alternative splicing of human early embryos
To identify the alternative splicing events that signi cantly differentially changed between distinct conditions, we used the software speci cally designed for scRNA-seq data of BRIE [45] (version 0.2.0) to analyze the differential alternative splicing events between male and female cells or between adjacent stages.Then differential alternative splicing genes (DASGs) were selected with the threshold of Bayes factor > 10.

Major isoform switching analysis
To investigate the genes switched the isoform with the highest expression (major isoform) in different conditions, the transcript expressions of each multi-isoform gene in each sample were ranked from large to small according to their expression level.Considering that each embryonic stage or each gender of a certain stage contains many single cells, we de ned the transcript that has the highest expression among all isoforms of a gene in at least 60% of the cells for a given condition as the major isoform.Then the major isoform of each multi-transcript gene in each condition were determined.The genes switched major isoforms were denoted as MISGs (major isoform switching genes).

Gene functional enrichment analysis GO terms and pathways
Gene Ontology (GO) enrichment analysis of process and KEGG pathway enrichment analysis were carried out for corresponding gene set using R package clusterPro ler [20] (version 3.8.1).Only those GO terms and pathways with P-value < 0.05 were considered as statistically signi cant.

Figures
Figures

Figure 1 Distribution
Figure 1

Figure 2 Distribution
Figure 2

Figure 5 Gene
Figure 5