Large-scale identication of recurrent RNA edits in human embryos suggests their role in enhancing maternal mRNA clearance

Posttranscriptional modication plays an important role in key embryonic processes. Adenosine-to-inosine RNA editing, a common example of such modications, is widespread in human adult tissues and has various functional impacts and clinical consequences. However, whether it persists in a consistent pattern in most human embryos, and whether it supports embryonic development, are poorly understood. To address this problem, we compiled the largest human embryonic editome from 2,071 transcriptomes and identied for the rst time thousands of Recurrent Edits (REs; >=50% chances of occurring in a given stage) for each early developmental stages. We found that these REs prefer exons consistently across stages, tend to target genes related to DNA replication, and undergo organized loss in abnormal embryos and embryos from elder mothers. In particular, REs are likely to enhance maternal mRNA clearance by introducing more microRNA binding sites to the 3'-untranslated regions of clearance targets. This study suggests a potentially important, if not indispensable, role of RNA editing in key human embryonic processes such as maternal mRNA clearance; the identied editome can aid further investigations.


Introduction
The successful development of human embryos is based on a well-regulated network that spans multiple omic layers [1] , among which several types of posttranscriptional modi cations have been con rmed to contribute to maternal mRNA clearance. The dysregulation of such clearance could lead to severe developmental defects in non-human model organisms [2][3][4], and has been observed frequently in arrested embryos from patients [5]. Few of these discoveries, however, have examined the famous adenosine-to-inosine (A-to-I) RNA editing (referred to simply as "RNA editing" thereafter) [6].
As one of the well-known posttranscriptional modi cations, RNA editing converts the adenosines into inosines [7]. Because inosines are more like guanosines than the original adenosines, such editing can have various functional consequences, including the generation of non-synonymous substitutions during translation ("recoding") [8] or novel protein isoforms due to altered splicing [9], the alteration of microRNAtarget binding a nity [10,11], and the disruption of long stem loops in endogenous mRNA that might aid the self-tolerance of innate immunity [12] . An estimated ~20% of human 3'-untranslated region (3'-UTR) edits may affect microRNA binding sites (MBSs), which possibly affects the targeting of many microRNAs [13]. In addition, previous studies have identi ed several disease-informative edits [14], suggesting their potential role in key developmental processes.
Previous studies, however, have been conducted with adult human tissues; whether and how RNA editing could consistently contribute to human embryonic development remains largely unclear. Several recent studies have been conducted to investigate edits in human embryos using pilot embryo RNA sequencing (RNA-Seq) datasets [15][16][17], but the sample sizes have been limited and whether their conclusions drawn apply to most embryos remains unclear. In addition, the rapid primate-speci c expansion of Alu elements in mRNAs [18,19], which are hotspots of RNA editing [20], hinders the determination of the functional role of RNA editing in human embryos by simple examination of their non-primate model organism counterparts [21].
In this study, we rst compiled the rst systematic A-to-I editome for human early embryonic development based on 2,071 early embryonic RNA-Seq samples. We then con rmed the existence of per-stage Recurrent Edits (REs; edits observed in >=50% of samples) along with several lines of evidences suggesting their potential functions in human early embryonic development. In particular, we discovered a likely supportive role of REs in enhancing maternal mRNA clearance through the regulation of microRNA-based mRNA decay.

Results
Construction of an adapted identi cation pipeline for 2,071 human embryonic RNA-Seq datasets Screening for systematically published datasets in the National Center for Biotechnology Information's (NCBI's) Gene Expression Omnibus (GEO) database [22] yielded a catalog of 2,071 samples in 29 groups de ned by developmental stages and cell types related to human embryonic development ( Fig. 1A and Supplementary Table 1). Because none of these samples have genotype available, we chose a stringent approach with the use of RNA-Seq-data alone [23] for the identi cation of edits. As an adaptation for RNA-Seq datasets containing data on several-cell (e.g., 4-cell) and single-cell (e.g., oocytes) samples, we further minimized false positives (possibly the result of genomic contamination) by excluding all detected variant sites that overlapped with known genomic variants from worldwide genotyping studies [24][25][26][27]. When tested on an independent dataset with paired DNA and RNA sequenced for each single cell [28] ( Fig. 1C and Supplementary Notes 2), this pipeline generated very low false-positive rates after ltering (Fig. 1D), supporting its application to the collected embryonic RNA-Seq datasets.
Identi cation of systematic A-to-I editome pro le for human embryonic development The application of the stringent pipeline to all 2,071 curated samples resulted in the identi cation of a total of 989,191 editing sites in normal and other samples ( Fig. 2A), with hundreds to tens of thousands of sites identi ed in each stage (Fig. 2B). Consistent with previous large-scale identi cations of RNA editing [23], we detected a high proportion of A-to-G mismatches (Fig. 2C), a high proportion of Alu edits among all edits similar to those in adult human tissues (Fig. 2D), and a signature RNA-speci c adenosine deaminase (ADAR)-binding motif across all of these sites (Fig. 2E). These results supported the reliability of this human embryonic editome in revealing the dynamics of editing sites throughout embryonic development (see Fig. 2F for the example of the well-studied BLCAP Y2C recoding site [29]).
Detection of thousands of organized REs throughout early embryonic development A per-stage search revealed that thousands of REs were present in all early embryonic stages ( Fig. 3A and B).
Compared with all observed edits, REs were more likely to be exonic (<50% vs. >75%; Fig. 3C), and most were located in 3'-UTR regions (Fig. 3D). In addition, rather than being dispersed randomly like biological noises, >50% of REs persisted through stage transitions until the 2-cell stage, and ~30% of REs persisted through the 2-to-4-cell transition (Fig. 3E). It is also worth noting that most REs did not disappear completely upon stage transition, although they were no longer REs (as indicated by the scarcity of "not detected" edits in Fig. 3E). These results suggest a consistent, stable pattern (and thus a possibly functional role) of 3'-UTR REs in early human embryonic development.
REs target similar genes enriched with DNA replication-related functions across early embryonic stages To gain insight into the functions that REs might affect, we selected genes that are frequently targeted by REs for each stage separately (Methods). We discovered hundreds of frequently targeted genes, >50% of which were targeted primarily in 3'-UTR REs in early embryonic stages ( Fig. 4A and B). Similar to the REs, these RE-targeted genes also displayed a large degree of overlap from the oocytes (GV) to the 2-cell stages, and most such genes observed in 4-cell embryos were also observed in the 2-cell stage (Fig. 4C).
Given this consistent pattern, we investigated the speci c functions that these genes share, and found that functions enriched across >=3 stages were mostly related to DNA replication, a phenomenon observed only on genes targeted in exonic (primarily 3'-UTR) regions (Fig. 4D). These observations suggest a consistent functional impact of REs in early human embryogenesis.

RE-matching edits undergo organized loss in embryos with uniparental disomy and those from elder mothers
To further investigate the functional importance of REs, we examined for each early developmental stage whether RE-matching edits underwent an organized loss in embryos with particular phenotypes indicative of low embryo quality. An initial scan revealed 107 edits on 76 genes (Supplementary Table 3) that were REs in normal embryos, but completely lost in the same stage in pathological embryos (GSE133854 [30]) and embryos from elder mothers (GSE95477 [31] ; Fig. 5A). Notably, one of these edits, chr9:132375956, induces missense recoding during the translation of transcription termination factor 1 for ribosomal gene transcription (TTF1; ENSG00000125482 [32]), and its complete lost in parthenogenetic zygotes suggests the potential clinical value of the detection of this edit in cases of uniparental disomy. In addition, gene ontology (GO) analysis of the genes with AG-lost REs revealed enrichment in various functions shared by four or more genes, and many of these functions were related to RNA metabolism, (Fig. 5B and   Supplementary Table 4), suggesting a potential link between these REs and RNA metabolism in these pathological embryos.
The complete lost of certain RE-matching edits in these embryos suggests the possibility that most other REs also undergo a systematic loss; indeed, we detected fewer RE-matching edits in protein-coding genes of abnormal embryos and embryos from elder mothers (GSE133854, GSE95477, and GSE101571 [33]) than in stage-matched control embryos included in the same datasets, although most differences were not statistically signi cant, possibly due to limited sample size (Fig. 5C). To examine whether a subset of protein-coding genes actively regulated by REs underwent statistically signi cant loss in abnormal embryos or embryos from elder mothers, we examined the average change in REs per gene in targets of maternal mRNA clearance (using other maternal genes as controls) with the exclusion of samples that were potentially outliers (Supplementary Notes 3). In GSE95477 samples (Fig. 5D), we observed statistically signi cantly fewer RE-matching edits per gene in targets of maternal mRNA clearance on average in oocytes (GV) from elder mothers than in those from young mothers (Fig. 5E), suggesting that REs are favored in these targets at least at this developmental stage (see Supplementary Fig. 3 for results from other datasets). These results raise the possibility that the loss of certain REs is indicative of certain phenotypes in a manner that may be related to key embryonic processes such as maternal mRNA clearance.
Targets of maternal clearance had more RE-induced microRNA binding sites than did nontargets Having gained a preliminary understanding what genes and functions RE might affect, we then asked how RE would affect these genes. Because most exonic REs are located in 3'-UTRs (Fig. 3D), the gene element containing most MBSs, many 3'-UTR REs may affect genes by interfering with MBSs and thereby the microRNA-based regulatory program (see Fig. 6A for an example), a mechanism that has been studied extensively for RNA editing [10,11,13]. To con rm this, we annotated all MBSs on all editingtargeted transcripts before and after editing by TargetScan (with edited inosine treated as guanosine), and analyzed their associations with 3'-UTR edits. The 3'-UTR REs were much more likely to alter MBSs than were other edits (~50% vs. ~25% or less; Fig. 6B); in particular, they were more likely to result in MBS gain (mean, ~1 vs. ~0; Fig. 6C), suggesting their potential role in the enhancement of the microRNAmediated degradation of targeted transcripts.
Based on this observation, we speculated that REs help to degrade mRNAs targeted by maternal mRNA clearance (referred to as "clearance targets" hereafter) [34] by introducing more MBSs (Fig. 6A). This hypothesis was immediately validated by the observation that REs result in bringing more MBSs on clearance targets than on other maternal genes (Fig. 6D), even with consideration of the net MBS change (i.e., accounting for the loss of preexisting MBSs by RE; Supplementary Fig. 4). Speci cally, SUV39H2, a known target of maternal mRNA clearance during human embryrogenesis [5], received eight new MBSs and lost one preexisting MBS, suggesting a potential RE-MBS-based mechanism of clearance regulation (Fig. 6E). These results suggest a novel role of REs (and possibly other RNA edits) in the enhancement of maternal mRNA clearance through the introduction of more MBSs.

Discussion
By curating and analyzing the largest human embryonic editome to date, we showed that the early embryonic stages harbour thousands of REs that are preferably exonic and highly shared between stages at the editing site and target gene levels. We also showed that these REs could potentially enhance maternal mRNA clearance, a process that has been found to be associated with RNA editing in mouse embryos [6], by introducing more MBSs to clearance targets than to other maternal genes.
The role of A-to-I RNA editing in human has been ambiguous for a long time. Although several studies have demonstrated the importance of certain editing events [35][36][37] and documented the adverse consequences of the disruption of the core editing enzyme ADAR1 [38][39][40][41], the possible functional roles of RNA editing in key embryonic developmental processes remains largely unclear. Based on our observation of associations among REs, MBSs, and maternal mRNA clearance, we propose a working model of how human embryos could take advantage of the RNA editing machine for better development: embryonic A-to-I RNA edits, including the REs discovered in this study and possibly other accompanying edits, occur and result in the introduction of MBSs to (at least some) clearance targets more often than to other maternal genes; these targets are then more e ciently targeted and degraded by the microRNA machinery than they were in unedited form, thereby enhancing the maternal mRNA clearance (and thus the embryonic development [42][43][44]) (Fig. 7, left). Recent research has revealed the impairment of RNA editing in mouse oocytes upon knockout of Cnot6l, a deadenylase in the carbon catabolite repression 4negative on TATA-less complex that is required for deadenylation-based maternal mRNA clearance [6]; although the roles of RNA editing in human and mice may not be directly comparable, this nding suggests that the microRNA-based effect of RNA editing on maternal mRNA clearance discovered in the present study might cooperate with other posttranscriptional modi cations [2][3][4], possibly in an additive way [2], to advance maternal mRNA clearance.
Apart from altering the MBS count in clearance targets, REs (and other edits) can, in theory, affect embryonic development in other ways (Fig. 7, right). In fact, in addition to the completely lost REmatching edits identi ed (Supplementary Table 3), we discovered a subset of RE-matching edits that are nearly lost in cases of uniparental disomy [30] (Supplementary Table 5); these edits may be of additional critical value for scienti c understanding and clinical applications. Likewise, one could also further examine the recoding edits (Supplementary Table 6) in the editome to identify additional edits with critical functional impacts. Potentially useful insights could also be gained from the examination of REtargeted genes (and their accompanying REs) in postimplantation stages (Fig. 4B). Although scarce, several RE-targeted genes (Supplementary Table 7) are frequently edited by certain REs; these REs could be of special research interest, provided that they are validated to be non-somatic mutations by, for example, the examination of additional postimplantation embryos from independent individuals.
We'd note that we may have missed a certain number of edits (or even REs) in the current editome due to the relatively low sequencing depth of early single-cell RNA-Seq techniques, although we sought to cover as many reliable edits as possible by screening a set of thousands of samples with application of a stringent pipeline for candidate RNA edits. More informative REs (and their additional functions) may be discovered with deeper sequencing. In addition, the functional relevance of this editome to embryonic development is far from being extensively studied; speci cally, REs might be functionally important in key embryonic processes other than maternal mRNA clearance, such as those involving DNA replication and repair (as suggested by the results illustrated in Fig. 4D).

Conclusions
In this study, we have introduced the rst large-scale A-to-I RNA editome for early human embryos, the analysis of which revealed a consistent early-stage editing pattern (of REs) with probable functional importance in microRNA-based maternal mRNA clearance. These discoveries, along with the editome itself, are valuable resources for further examination of the interplay between RNA editing and other mechanisms involved in maternal mRNA clearance, as well as the identi cation of additional roles of Ato-I RNA editing in early human embryonic development.

Compilation of human embryonic RNA-Seq datasets
In addition to including human embryonic RNA-Seq datasets whose A-to-I editomes have been studied previously [15,16], we used GEOmetadb [45] to search GEO [22] for all RNA-Seq samples submitted before October 1st, 2020, using the keyword "embryo" and the species restriction of Homo sapiens. We ltered the datasets identi ed by this search to identify paired-end RNA-Seq data with read length >= 75 × 2 bp, to increase the accuracy of A-to-I RNA editome identi cation [46]. For single-cell RNA-Seq datasets, we required that the sequencing technology not be based on cell barcoding. This process yielded a total of 2,071 samples (1,852 normal and 274 abnormal) from 18 datasets (Supplementary Table 1), which were sent to the A-to-I RNA editome identi cation pipeline.

Identi cation of the A-to-I editome and REs within it
We adapted a published pipeline [23] used in the Genotype-Tissue Expression A-to-I editome study [47] (Supplementary Notes 1). Brie y, we: 1) generated a new reference genome by concatenating the hg38 assembly and all sequence fragments spanning known junction sites from the version 32 annotation of GENCODE [48]; 2) aligned quality-controlled reads to this new reference; 3) mapped these alignments back onto hg38 coordinates; 4) called variants with GATK [49]; and 5) ltered for A-to-G variants that did not overlap with common genomic variants or regions prone to algorithmic errors, and with enough read and sample support. REs were then identi ed for each stage by ltering for those edits observed in >= 50% samples in that stage.
To reduce false positive results in this pipeline as much as possible, we expanded the set of genomic variants used in step 5 above. Speci cally, in addition to data from dbSNP version 151 [50] , the University of Washington Exome Sequencing Project (https://evs.gs.washington.edu/EVS/), and the 1000Genomes Project [26], we used data from the Genome Aggregation Databse [27] and the NCBI's Allele Frequency Aggregator project [25] which span more than hundreds of thousands of individuals to exclude variants that overlapped with population genomic variants found in these studies or projects.
Variants passing through this lter are very unlikely to be false positives arising from genomic variation.
Annotation of the A-to-I editome We obtained from the GATK variant call format output the chromosome and position for each A-to-I edit in each sample, as well as its read coverage (AN), the number of reads supporting the editing (AC), and the editing frequency AF = AC / AN. We then annotated these edits using SnpEff [51] with GENCODE version 32 annotation, and classi ed them according to their SnpEff 'Annotation' Field: coding sequence (CDS) regions, 5'-UTRs, 3'-UTRs, exonic regions of non-coding transcripts, introns, and intergenic regions (Supplementary Table 2). When a given edit was of different types on different transcripts of a given gene locus (e.g., in the CDS region of one transcript and the 3'-UTR of another), we assigned the edit type in the following order: CDS > 5'-UTR > 3'-UTR > non-coding exonic > intronic > intergenic.

Gene-level enrichment analysis
For GO term enrichment analysis, we used the "enrichGO" function in clusterPro ler [53] with the org.Hs.eg.db database [54] to analyze enriched terms for each type of genes in each stage. To correct for multiple hypothesis testing, we pooled all enrichment results and adjusted them using the Benjamini-Honchberg method.

Annotation of MBSs and effects of REs on them
We used TargetScan [55] to annotate MBSs in 3'-UTRs. The multi-species 3'-UTR input for each chromosome was generated by subsetting the UCSC 30-way alignment in MAF format (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz30way/) with the "interval_maf_to_merged_fasta.py" script from Galaxy tools [56] [https://github.com/galaxyproject/toolsiuc and https://github.com/galaxyproject/galaxy (release 21.01)] and the BED le describing the 3'-UTRs for that chromosome. To annotate the effect of each RE on MBSs, we modi ed the multi-species 3'-UTR input for TargetScan by replacing the adenine at the RE site in the human 3'-UTR sequence with guanine, and fed this modi ed multi-species 3'-UTR input to TargetScan again; in this way, one modi ed multispecies 3'-UTR input for TargetScan was generated for each RE. We then annotated an RE on a given combination of gene and microRNA family as follows:  The authors have no con ict of interests.      Genes targeted by maternal mRNA clearance have more MBS-gaining REs than do other maternal genes (also see Supplementary Fig. 4). All p-values were unadjusted and were derived from one-tailed Wilcoxon's Rank Sum test, with the alternative hypothesis being that mean value for the left (decay at 8cell) group would be greater than that for the right (others/baseline) group. The baseline group is just a value of 0; we plotted it here for the sake of visual clarity. (E) An example of RE-induced MBS gains on a known clearance target SUV39H2. Figure 7