Genome-wide profiling of DNA 5hmC in human foetal organs. We collected samples of 4 major organs including, heart, kidney, liver, and lung, from 8 human foetuses at 13–15, 18–19, 21–23 and 25–28 weeks with 2–4 replicates for each stage (Fig. 1a and Supplementary Table 1). Nano-5hmC-Seal36 was first conducted for DNA 5hmC profiling and displayed high reproducibility between replicates in different periods (Fig. S1a). To investigate genome-wide 5hmC features, we first examined the distributions of normalized reads of several development-related genes (Fig. 1b). The 5hmC enrichment of these genes prompted us to consider its organ-specific features in human foetuses at different stages.
To evaluate 5hmC global changes, we then illustrated 5hmC distributions along gene bodies (± 3 kb), which were similar to those of human adults17,25 (Fig. 1c). Most of the 5hmC peaks were localized on vital functional elements, such as promoters (Fig. S1b). These results showed that the distribution of DNA 5hmC was conserved in human foetuses and adults, similar to other mammals17,37. Moreover, the distinct signal intensity of 5hmC among stages in each organ indicated that 5hmC may play different roles during organ development (Fig. 1c). For instance, in the foetal heart, the 5hmC signal was relatively more stable than that of other three organs probably because it was almost formed at 8 weeks and displayed fundamental biological functions (Fig. 1a, c). In the liver, the 5hmC signal was higher when haematopoiesis began, and even higher when the liver showed a strong haematopoietic ability (Fig. 1a, c). Together, these results indicate that DNA 5hmC may dynamically regulate organ development by altering the activity of functional elements in an organ-specific manner.
We next identified differentially 5hmC-modified bins among four organs (ANOVA, fold change ≥ 1.5, P value < 0.05) (Fig. S1c), and the majority of these bins showed higher 5hmC levels than those of other organs (84.71–99.80%). Then adjacent bins were merged into one region called differentially hydroxymethylated regions (DhMRs) and 173,330–485,961 organ-specific DhMRs were obtained with around100–200 nt in length (Fig. S1d, S1e). No obvious preference of 5hmC modification was detected between gene body and intergenic regions, which was the same as for different chromosomes (Fig. S1f, g). Consistent with the annotation results of the 5hmC peaks in each organ, most of the organ-specific DhMRs were localized in functional elements (Fig. 1d). The merged top 1% least significant bins were defined as common hydroxymethylated regions (ChMRs) acting as controls. As expected, 5hmC enrichment in DhMRs was different across the 4 types of organs. For example, in the liver, 5hmC enrichment in promoters was higher than that of other organs (Fig. 1d). Moreover, most of the organ-specific DhMRs were located near transcription start sites (TSSs), with prominent in the liver (Fig. 1e). These results indicated that mammalian-conserved 5hmC modifications are widespread but show significant organ-specific feature in human foetuses.
We further identified 10,658–20,419 organ-stage-specific DhMRs (ANOVA, fold change ≥ 1.5, P value < 0.05) and found that samples from the same organ clustered more closely, whereas that samples from the same stage were scattered on the periphery (Fig. 1f and Supplementary Table 2). The results indicated that 5hmC could serve as a potential marker specific for human organs with different developmental stages. Moreover, cluster analysis showed that both 5hmC signal intensity and the corresponding functions of the DhMRs were both organ- and stage-specific (Fig. 1g, h). Taken together, these results suggest that organ- and stage-specific 5hmC modification is associated with morphogenesis and organ formation during early human development.
The reprogramming of DNA 5hmC during organ development. To further explore the dynamic sequential changes of 5hmC, we first classified the DhMRs into 6 clusters according to the overall changes of 5hmC signal in different stages, considering that it contains all types of nonrepetitive trends for 6 clusters (Figs. 2a and S2a). Most of the DhMR-containing genes belong to at least 2 clusters, suggesting that the same gene may be regulated through multiple pathways in a 5hmC-dependent manner (Fig. S2b). Therefore, we separated these DhMRs from different clusters into 2 groups: a single-regulation group (gene sets with DhMRs only belonged to 1 cluster) and a mixed-regulation group (gene sets with DhMRs belonged to at least 2 clusters) (Fig. 2b). Then, TF prediction was performed on the DhMRs from both groups, and the top 15 TFs significantly related to development and three most commonly enriched TFs are shown (Fig. 2b). To explore how TFs are involved in 5hmC-dependent organ developmental processes, we performed RNA-Seq on samples used for the Nano-hmC-Seal data (Supplementary Table 3). There was a strong correlation between replicates (Fig. S2c). In both the single- and mixed-regulation groups, development-related TFs, including T-box transcription factor 20 (TBX20), paired box protein Pax-8 (PAX8), Krueppel-like factor 1 (KLF1), and transcription factor 21 (TCF21), were highly enriched on DhMRs with organ-specific expression feature (Fig. 2b). Notably, a high 5hmC signal was detected on the DhMRs from cluster 5 in the heart at 21–28 weeks, cluster 3 in the kidney at 18–23 weeks, cluster 1 in the liver at 13–15 weeks, and cluster 3 in the lung at 18–23 weeks, which were specifically enriched by TBX20, PAX8, KLF1 and TCF21, respectively. For example, TBX20 is a transcriptional activator/repressor required for cardiac development38. The significant enrichment of this TF suggests its key role in the later stage of development, conferring full maturity of foetal heart. PAX8 was reported to be highly associated with the mesenchymal to epithelial transition involving metanephros morphogenesis39. From 18–23 weeks, corticomedullary differentiation is ongoing, and the nephron frequently increases simultaneously. PAX8 may mainly function during this period. A significantly higher binding of most TFs was also found in human adults, but the expression level of those TFs in foetuses was obviously higher than that in adults (Fig. S2d, e). Collectively, these results illustrate that TFs may participate in specific organ development with the assistance of sequential changes in 5hmC in foetuses.
DNA methylation is one of the major mechanisms adopted by the host to suppress the expression of transposable elements (TEs)40, some of which, however, could be used by their host genome as regulatory elements in certain organs during development41–43. Therefore, we investigated whether 5hmC modifications are preferentially installed on certain families of TEs in an organ- and stage-specific manner, which potentially facilitates the switch of the repressive state to the accessible state of the chromatin for TF embedding and subsequent regulation of gene expression in cis. We found that members of the Alu family of SINE elements show significant 5hmC enrichment on the DhMRs within cluster 3 and 4 in the foetal kidney (Figs. 2c and S2g–i). In particular, the binding sites of two TFs, namely CEBPB and TRPS1, are highly enriched in TSS-proximal Alu elements within these DhMRs (Figs. 2d and S2j), indicating that certain TFs preferentially bind to these Alu DhMRs, and their binding and cis-regulatory effect on target genes may be affected by differential 5hmC modifications of these Alu elements. The pattern of expression level changes of CEBPB target genes was consistent with its role as an activator in transcriptional regulation (Fig. 3e). Indeed, the expression levels of almost all CEBPB-targeted Alu-proximal genes (12/13) followed the 5hmC signals of the corresponding Alu DhMRs (Fig. 3e). Interestingly, approximately half of these CEBPB-targeted Alu-proximal genes (7/13) have been previously reported to be tightly associated with kidney development or responses to injury (Supplementary Table 4). This result indicates that 5hmC can epigenetically mark a transition state of Alu elements, through which to affect the binding of TFs and further downstream gene expression.
DNA 5hmC is correlated with gene expression homeostasis across human organs. Most DhMRs were located around the TSSs in all foetal organs (Fig. 1e) and the sequential changes of 5hmC signal on DhMRs showed a strong association with RNA abundance during organ development (Fig. 2b, e). We then asked whether the dynamic reprogramming of DNA 5hmC on promoters regulates gene expression at developmental stages. In support, a positive correlation between transcriptional activity and the 5hmC signal on promoters was observed in foetal organ samples (Fig. 3a), and most of the promoters contained 1–2 DhMRs (Fig. S3a). Importantly, their association was not set in stone during the development of each organ, and higher correlation coefficients appeared at the later stages of almost all four organs (Fig. 3a). Then we identified genes with significant 5hmC alterations in promoters across adjacent stages (changes ≥ 25%, fold change ≥ 2, Student’s t-test, P value < 1.03 × 10− 6) (Figs. 3b and S3b, c), and most of them were organ-specific promoters with obvious 5hmC changes (Fig. 3c). In particular, more than half of differentially expressed genes, such as those in the heart at 18–23 weeks (77.06%), liver at 18–23 weeks (74.09%) and 21–28 weeks (74.70%) (Figs. 3d and S3d), were subjected to 5hmC changes, and genes which were highly related to development were shown in Fig. 3d.
Additionally, we downloaded public DNase sequencing (DNase-Seq) data from the foetal heart, kidney and lung at 13–23 weeks44 and found that there was a high correlation among the 5hmC signal, chromatin openness on promoters and gene expression level, for which genes could be detected from all three omics datasets (Fig. 3e). This finding suggests that 5hmC overlaps with open chromatin and is positively associated with transcriptional activity. However, we did not observe the above similar results between adjacent stages (Fig. S3e). Then, we wondered whether TFs regulate the process across different organs through specific binding to 5hmC-enriched promoters. To discover corresponding 5hmC peaks based on known TF motifs, we performed pattern matching to reveal their one-to-one potential correspondence. To increase the reliability and specificity of the binding sequence, we used 30% of the single base proportion to obtain the optimal combinatorial sequences (Fig. S3f, g). Four kinds of random peaks/fragments were used to eliminate the effects of background noise (paired Student’s t-test, P value < 0.05) (Fig. S3h). According to the results, the number of 5hmC-modified target genes is different among distinct TF families. Cys2His2 zinc finger (C2H2 ZF), basic helix-loop-helix (bHLH), basic leucine zipper (bZIP), nuclear receptor, etc, are the main TF families in eukaryotes45, with more target genes than others in foetal organs (Fig. S3i). We further calculated the correlation of the 5hmC signal and the gene expression level of target genes from different TF families (Fig. 3f). The regulatory relationship of DNA 5hmC and transcription remained almost stable in C2H2 ZF, bHLH, bZIP, etc., whereas there were different regulatory roles in specific TF families, such as adaptor protein 2 (AP-2), early 2 factor (E2F), and homeodomain (Fig. 3f). All these findings indicate that 5hmC-modified promoters are highly associated with gene expression homeostasis during foetal organ development.
RNA m 5 C acts as a downstream regulator during foetal organ development. In the setting of organ-specific DNA 5hmC in the regulation of upstream gene expression, we wondered how RNA m5C drives organ-specific post-transcriptional regulatory networks, since it is one of the prevalent RNA post-transcriptional modifications associated with RNA stability31,32. RNA bisulfite sequencing (RNA-BisSeq) was performed on four foetal organs at three periods from 13 to 28 weeks. The samples had a dependable C-to-T conversion rate with good repeatability (Fig. S4a, b and Supplementary Table 5). Consistent with the findings in other vertebrates28,29,31,32, the majority of m5C sites were enriched near translation initiation sites (Fig. 4a). A total of 2,679–15,655 m5C sites within 316–1,622 mRNAs were identified at each stage during organ development (Fig. 4b). Eliminating the effect of continuous Cs, most m5C sites were strongly enriched in CG-rich and CDS regions (Fig. S4d, e). Considering that both the m5C methylation level and the number of m5C sites may affect the functions of corresponding mRNAs, we calculated the total methylation level across different organs (Figs. 4c and S4c). Globally, m5C levels dynamically change during the development of the kidney, liver and lung.
The global distributive features of m5C in human foetal organs prompted us to explore the potential role of m5C during development. Upon comparing RNA abundance changes between m5C-modified and unmodified mRNAs across adjacent stages, we found that the m5C-modified mRNAs were unstable before 18–19 weeks, whereas they became stabilized afterwards (P value < 5.40×10− 7), indicating that m5C modification has a dual role in the regulation of organ development (Fig. 4d). This association was independent of coverage changes (Fig. S4f).
Additionally, consistent with that of DNA 5hmC, organ-specific m5C-modified mRNAs were significantly enriched in the pathways related to cell differentiation, proliferation, and adhesion, and participated in the regulation of gene expression and translation during embryogenesis (Fig. S4g). Importantly, the organ-stage-specific m5C-modified mRNAs were closely related to RNA abundance in each organ (Fig. 4e and Supplementary Table 6). These genes revealed a high level of enrichment for organ physiology-related processes. Thus, these results indicate that as a post-transcriptional regulator, RNA m5C is indispensable during human foetal organ development.
DNA 5hmC and RNA m 5 C synergistically regulate organ-specific developmental genes which are critical in maintaining normal foetal growth. We next proposed that DNA 5hmC and RNA m5C may cooperate to spatiotemporally modulate the regulatory networks in foetal organs. Overall, promoters of genes with high transcriptional activity presented relatively higher 5hmC signals than the m5C methylation level on their corresponding mRNAs, suggesting a dominant role of DNA 5hmC at the transcriptomic level during development (Fig. S5a). To further investigate the respective functions of DNA 5hmC and RNA m5C, we divided the organ-specific modified genes into 4 groups: differentially expressed genes specifically regulated by both DNA 5hmC and RNA m5C (group A), differentially expressed genes regulated only by DNA 5hmC (group B), differentially expressed genes regulated only by RNA m5C (group C), and commonly expressed genes specifically regulated by DNA 5hmC and RNA m5C (group D) (Figs. 5a, S5b, c and Supplementary Table 7). As group D only accounts for 0.84% of the genes, these two types of modifications indeed affect the expression of the vast majority of organ-specific genes (Fig. S5b). Specifically, in group A, there were close associations among DNA 5hmC signal intensity, RNA m5C methylation level, and gene expression level (Fig. 5a).
Importantly, genes in all 3 groups (A, B and C) were involved in the process of organ formation and functional maturity (Fig. 5b and Supplementary Table 8). In the foetal heart, the genes from all three groups were significantly enriched in heart reconstruction, cardiomyocyte differentiation, and heart rate-related cardiomyocytes. The co-regulatory genes were mainly enriched in angiogenesis. In contrast to DNA 5hmC mainly regulating cardiac myofibril assembly, related ion transportation, and cell communication through electrical coupling, m5C modification is involved in atrial action potential and action potential depolarization, especially for SA and AV node cells. Nevertheless, both ventricular action potential and action potential repolarization need the involvement of both modifications. Although the foetal heart performs basic functions earlier than other organs, its gradual functional maturity and maintenance of normal development also require these two modifications. Additionally, RNA m5C was even more critical in cell differentiation of kidney, such as the regulation of mesenchymal epithelial cell transformation during metanephric morphogenesis, glomerular development, and renal vesicle morphogenesis. DNA 5hmC is the key driver of organic acid metabolic process in the liver. In the lung, both DNA 5hmC and RNA m5C synergistically regulate respiratory gaseous exchange, and they separately modified different gene sets to modulate lung morphogenesis. These results suggest that DNA 5hmC and RNA m5C generally play a synergistic role in the process of human foetal development, but in the foetal kidney and liver they exhibit differential and specific functions.
Interestingly, we found that 5hmC enrichment on promoters with m5C modification on their corresponding mRNAs was much higher than that without m5C (Fig. 5c), indicating a coordinated role of 5hmC and m5C in foetal organs. TET3 was reported to be the only existed DNA 5mC oxidase after fertilization, mediating the oxidation of most paternal nuclear cytosines and a large-scale removal of 5mC23,46. To determine the potential mechanisms underlying for the cross-talk between the two types of modifications, we calculated the correlation coefficients between the expression of TET3 and RNA m5C regulators. Apparently, the correlation coefficients of the expression levels of TET3 and NOP2/Sun RNA methyltransferase 2 (NSUN2)/Aly/REF export factor (ALYREF) were high (0.7652 and 0.7664) (Fig. 5d). However, no significant relationship was found between TET3 and Y-box binding protein 1 (YBX1). These results suggest that DNA 5hmC and RNA m5C might drive organ-specific regulatory networks through pre- and post-transcriptional mechanisms, respectively, which are mediated by TET3 and NSUN2/ALYREF. However, how they mechanistically contribute to developmental processes warrants further detailed investigation.