Construction of an adapted identification pipeline for 2,071 human embryonic RNA-Seq datasets
Screening for systematically published datasets in the National Center for Biotechnology Information's (NCBI's) Gene Expression Omnibus (GEO) database [22] yielded a catalog of 2,071 samples in 29 groups defined by developmental stages and cell types related to human embryonic development (Fig. 1A and Supplementary Table 1). Because none of these samples have genotype available, we chose a stringent approach with the use of RNA-Seq-data alone [23] for the identification of edits. As an adaptation for RNA-Seq datasets containing data on several-cell (e.g., 4-cell) and single-cell (e.g., oocytes) samples, we further minimized false positives (possibly the result of genomic contamination) by excluding all detected variant sites that overlapped with known genomic variants from worldwide genotyping studies [24–27]. When tested on an independent dataset with paired DNA and RNA sequenced for each single cell [28] (Fig. 1C and Supplementary Notes 2), this pipeline generated very low false-positive rates after filtering (Fig. 1D), supporting its application to the collected embryonic RNA-Seq datasets.
Identification of systematic A-to-I editome profile for human embryonic development
The application of the stringent pipeline to all 2,071 curated samples resulted in the identification of a total of 989,191 editing sites in normal and other samples (Fig. 2A), with hundreds to tens of thousands of sites identified in each stage (Fig. 2B). Consistent with previous large-scale identifications of RNA editing [23], we detected a high proportion of A-to-G mismatches (Fig. 2C), a high proportion of Alu edits among all edits similar to those in adult human tissues (Fig. 2D), and a signature RNA-specific adenosine deaminase (ADAR)-binding motif across all of these sites (Fig. 2E). These results supported the reliability of this human embryonic editome in revealing the dynamics of editing sites throughout embryonic development (see Fig. 2F for the example of the well-studied BLCAP Y2C recoding site [29]).
Detection of thousands of organized REs throughout early embryonic development
A per-stage search revealed that thousands of REs were present in all early embryonic stages (Fig. 3A and B).
Compared with all observed edits, REs were more likely to be exonic (<50% vs. >75%; Fig. 3C), and most were located in 3'-UTR regions (Fig. 3D). In addition, rather than being dispersed randomly like biological noises, >50% of REs persisted through stage transitions until the 2-cell stage, and ~30% of REs persisted through the 2-to-4-cell transition (Fig. 3E). It is also worth noting that most REs did not disappear completely upon stage transition, although they were no longer REs (as indicated by the scarcity of "not detected" edits in Fig. 3E). These results suggest a consistent, stable pattern (and thus a possibly functional role) of 3'-UTR REs in early human embryonic development.
REs target similar genes enriched with DNA replication-related functions across early embryonic stages
To gain insight into the functions that REs might affect, we selected genes that are frequently targeted by REs for each stage separately (Methods). We discovered hundreds of frequently targeted genes, >50% of which were targeted primarily in 3'-UTR REs in early embryonic stages (Fig. 4A and B). Similar to the REs, these RE-targeted genes also displayed a large degree of overlap from the oocytes (GV) to the 2-cell stages, and most such genes observed in 4-cell embryos were also observed in the 2-cell stage (Fig. 4C). Given this consistent pattern, we investigated the specific functions that these genes share, and found that functions enriched across >=3 stages were mostly related to DNA replication, a phenomenon observed only on genes targeted in exonic (primarily 3'-UTR) regions (Fig. 4D). These observations suggest a consistent functional impact of REs in early human embryogenesis.
RE-matching edits undergo organized loss in embryos with uniparental disomy and those from elder mothers
To further investigate the functional importance of REs, we examined for each early developmental stage whether RE-matching edits underwent an organized loss in embryos with particular phenotypes indicative of low embryo quality. An initial scan revealed 107 edits on 76 genes (Supplementary Table 3) that were REs in normal embryos, but completely lost in the same stage in pathological embryos (GSE133854 [30]) and embryos from elder mothers (GSE95477 [31] ; Fig. 5A). Notably, one of these edits, chr9:132375956, induces missense recoding during the translation of transcription termination factor 1 for ribosomal gene transcription (TTF1; ENSG00000125482 [32]), and its complete lost in parthenogenetic zygotes suggests the potential clinical value of the detection of this edit in cases of uniparental disomy. In addition, gene ontology (GO) analysis of the genes with AG-lost REs revealed enrichment in various functions shared by four or more genes, and many of these functions were related to RNA metabolism, (Fig. 5B and Supplementary Table 4), suggesting a potential link between these REs and RNA metabolism in these pathological embryos.
The complete lost of certain RE-matching edits in these embryos suggests the possibility that most other REs also undergo a systematic loss; indeed, we detected fewer RE-matching edits in protein-coding genes of abnormal embryos and embryos from elder mothers (GSE133854, GSE95477, and GSE101571 [33]) than in stage-matched control embryos included in the same datasets, although most differences were not statistically significant, possibly due to limited sample size (Fig. 5C). To examine whether a subset of protein-coding genes actively regulated by REs underwent statistically significant loss in abnormal embryos or embryos from elder mothers, we examined the average change in REs per gene in targets of maternal mRNA clearance (using other maternal genes as controls) with the exclusion of samples that were potentially outliers (Supplementary Notes 3). In GSE95477 samples (Fig. 5D), we observed statistically significantly fewer RE-matching edits per gene in targets of maternal mRNA clearance on average in oocytes (GV) from elder mothers than in those from young mothers (Fig. 5E), suggesting that REs are favored in these targets at least at this developmental stage (see Supplementary Fig. 3 for results from other datasets). These results raise the possibility that the loss of certain REs is indicative of certain phenotypes in a manner that may be related to key embryonic processes such as maternal mRNA clearance.
Targets of maternal clearance had more RE-induced microRNA binding sites than did nontargets
Having gained a preliminary understanding what genes and functions RE might affect, we then asked how RE would affect these genes. Because most exonic REs are located in 3'-UTRs (Fig. 3D), the gene element containing most MBSs, many 3'-UTR REs may affect genes by interfering with MBSs and thereby the microRNA-based regulatory program (see Fig. 6A for an example), a mechanism that has been studied extensively for RNA editing [10, 11, 13]. To confirm this, we annotated all MBSs on all editing-targeted transcripts before and after editing by TargetScan (with edited inosine treated as guanosine), and analyzed their associations with 3'-UTR edits. The 3'-UTR REs were much more likely to alter MBSs than were other edits (~50% vs. ~25% or less; Fig. 6B); in particular, they were more likely to result in MBS gain (mean, ~1 vs. ~0; Fig. 6C), suggesting their potential role in the enhancement of the microRNA-mediated degradation of targeted transcripts.
Based on this observation, we speculated that REs help to degrade mRNAs targeted by maternal mRNA clearance (referred to as "clearance targets" hereafter) [34] by introducing more MBSs (Fig. 6A). This hypothesis was immediately validated by the observation that REs result in bringing more MBSs on clearance targets than on other maternal genes (Fig. 6D), even with consideration of the net MBS change (i.e., accounting for the loss of preexisting MBSs by RE; Supplementary Fig. 4). Specifically, SUV39H2, a known target of maternal mRNA clearance during human embryrogenesis [5], received eight new MBSs and lost one preexisting MBS, suggesting a potential RE-MBS-based mechanism of clearance regulation (Fig. 6E). These results suggest a novel role of REs (and possibly other RNA edits) in the enhancement of maternal mRNA clearance through the introduction of more MBSs.