Genome-wide identification of ACRs during pig embryonic development
In order to examine the genome-wide ACRs involved in muscle development, we conducted ATAC-seq to profile chromatin accessibility at 45, 70 and 100 days during pig embryonic development. ATAC-seq was performed on 12 samples and a total of 152–170 million reads were uniquely mapped to the reference genome (Additional file 1: Table S1). We first assessed the quality of the libraries based on the peak signal distributions and the lengths of the inserts. Detailed information on the high-quality libraries derived from each individual can be found in Additional file 6: Fig. S1 and Additional file 6: Fig. S2, which show that all libraries exhibited the expected fragment length, with abundant nucleosome-free and span-mononucleosomal fragments (Additional file 6: Fig. S1), and the highest peak signal across the whole gene body in the TSS (Additional file 6: Fig. S2). Numerous peaks with high confidence were obtained based on the 12 libraries (Additional file 1: Table S1). As shown in Table S1, more peaks were acquired in samples from older embryos, indicating that chromatin is globally more accessible in older embryos than in younger embryos. To further verify the most representative accessible regions of the genome among samples, we merged the peaks that were shared among all four samples in the same group. Ultimately, 21638, 35447 and 60181 unique regions (or peaks) with 0.50%, 0.89% and 1.89% coverage for the swine genome were found at 45 dpc, 70 dpc and 100 dpc, respectively. All downstream analyses were based on these regions.
We investigated the relationship between peak numbers and chromosome length and found that chromatin with longer chromosomes displayed more peaks (r2=0.84 (LW45), r2=0.90 (LW70), r2=0.89 (LW100)) (Fig. 1B). To further profile the ACR distribution across the whole genome, we evaluated the relative positions of the peaks in the genome (Fig. 1C). At each stage, numerous peaks were annotated in intron and intergenic regions, accounting for approximately 2/3 of all peaks. Approximately 30%, 21%, and 14% of peaks were identified in the promoter region at 45 dpc, 70 dpc and 100 dpc, respectively. Among those peaks located in the promoter region, more than 91% of the peaks were annotated within -1 kb to 100 bp of the TSS. Few peaks were identified in transcriptional termination sites (TTSs) and exonic regions. In addition, the percentage of peaks located in the proximal promoter region (-1 kb to 100 bp away from the TSS) gradually decreased as the embryo developed, whereas the percentages of peaks within intron and intergenic regions increased. The ACRs among all stages were relatively enriched in promoter and exon regions, especially in the proximal promoter, instead of intergenic regions (Fig 1D).
We further investigated the average lengths of peaks in different genomic regions (Fig. 2A-2C) and found that peaks within proximal promoter regions were longer than those in other regions. Those in intergenic regions were the shortest.
Coordinate regulation of gene expression by ACRs
RNA-seq was conducted using the same individuals as ATAC-seq to explore the effect of ACRs on gene expression. The basic information of each RNA library is listed in Additional file 2: Table S2. Averages of 32.69 million reads per sample were uniquely mapped to the swine reference genome finally (Sus scrofa 11.1.94). The results of cluster analysis showed that the four replicates of each stage were all grouped together, indicating that our experiment had good repeatability (Additional file 6: Fig. S3). To explore the role chromatin accessibility in regulating gene expression, we performed further analysis on those peaks located at the proximal promoter. More than 90% of genes annotated by those peaks contained only one ACR in the proximal promoter region (Fig. 3A). We then analyzed the relationship between ACR number and gene expression level. The proportion of genes with high expression (FPKM>30) in the multiple-ACR group was greater than that in the one-ACR group (Fig. 3B). The genes that contained a single ACR in the proximal promoter region were sorted according to peak length and were then divided into 3 equal groups, namely, the top, middle and bottom groups, to investigate the connection between peak length and gene expression level. When several peak had the same length, we assigned the peak to the groups included the greatest numbers of peaks of similar length (i.g. Assuming that there are a total of 90 peaks. We divide them into 3 groups according their length. Thus, there should be 30 peaks in each group. However, when the length of the 29th and 30th peaks of the first group is equal to the first peak of the second group, we will assign the first peak in the second group to the first group). The results showed that the proportion of highly expressed genes with FPKM values greater than 30 gradually decreased from the top group to the bottom group in all of stages except in LW70 stage of one ACR group and LW45 stage of multiple group (based on total length), regardless of whether the genes have an ACR or multiple ACRs. And low-expressed genes with FPKM values lower than 5 of whole stages except in LW70 stage of multiple ACR group (based on max length) increased as the peak length declined (Fig. 4A-4C). These results demonstrate that genes with longer ACRs show comparatively higher expression levels.
Identification of stage-specific and differential ACRs during embryonic development
Although a large number of ACRs were detected at all stages examined in this research, differences were observed among stages (Fig. 5A) when we compared the genome-wide ACRs obtained from the different samples. In total, 1390, 2808 and 28153 stage-specific peaks were identified at LW45, LW70 and LW100, respectively (Fig. 5B). For example, XRCC1, which can manage a temporally responsive DNA repair process to advance the muscle differentiation program, contained an ACR in its proximal promoter region specifically at LW45 (Fig. 5C) [35], and ENPP2, which can regulate WNT/β-Catenin signaling to control myogenic differentiation, displayed an LW70-specific ACR in the proximal promoter region (Fig. 5C) [36]. In contrast, BCL6, which is related to the process of terminal differentiation in muscle cells, showed an LW100-specific ACR (Fig. 5C) [37]. These results demonstrate that ACRs vary dynamically. The changes in ACRs in each stage were consistent with the expression levels of the genes shown in Fig. 5E. The highest expression levels of XRCC1 and ENPP2 were observed at the LW45 and LW70 stages, respectively, whereas the highest expression level of BCL6 was observed at the LW100 stage, indicating that ACRs may regulate gene expression by altering the binding sites for TFs.
We also identified 17574 common peaks that were shared by all three embryonic stages in this study (Fig. 5B). However, the intensities of some of these peaks differed among the different groups. For example, in the gene body and promoter regions of MYOD1, an important regulator affecting muscle development, chromatin became more accessible from LW45 to LW100, especially in the LW100 stage whose peak intensity was significantly higher than other stages (p<0.05) (Fig. 5D) [38]. Correspondingly, the expression level of MYOD1 gradually increased as the embryonic stage progressed (Fig. 5E). This result suggests that the changes in the intensity of ACRs associated with this gene may also affect its expression (r2=0.37). Then, we performed DPI analysis for the common peaks among the different embryonic stages (Table 1). In total, 145, 1726 and 2687 ACRs were detected to have DPIs in the LW70 vs LW45, LW100 vs LW70 and LW100 vs LW45 comparisons, respectively. As shown in Fig 5B and Table 1, more peaks with significantly different intensities were observed in the LW100 vs LW70 comparison or in stage-specific ACRs at the LW100 stage than in other comparisons or stages, respectively, suggesting that a widespread change in chromatin accessibility occurs primarily during the formation of secondary fibers.
Distinct regulation of genes at different stages during skeletal muscle development
To further elucidate the regulatory roles of genes with stage-specific ACRs as well as with DPIs during embryonic muscle development, we investigated the peak-related genes and their functions. A total of 11344 genes were found to have stage-specific ACRs among the three stages (Fig. 6A). To assess the biological functions associated with the stage-specific ACR-related genes, GO analyses were performed (Additional file 3: Table S3). The GO analysis results showed that genes with stage-specific ACRs were significantly enriched for many different muscle-related biological processes (BPs). For example, genes associated with the LW45 were significantly enriched for the actin cytoskeleton organization and actin filament organization BPs [39, 40]. Genes with stage-specific ACRs at the LW70 were enriched for BMP signaling pathway and positive regulation of Wnt signaling pathway [41, 42]. And genes in LW100 stages were enriched for the muscle hypertrophy, striated muscle hypertrophy and muscle tissue development BPs. The results of KEGG analysis were displayed in Additional file 3: Table S3. Only one pathway, namely axon guidance, was found in LW70 stage and no significant pathways were enriched in LW45 stage. Three pathways were found in LW100 stage. The most significant enrichment pathway was Leukocyte trans-endothelial migration. Notably, 223 genes with stage-specific ACRs overlapped among the three stages, suggesting that these genes may be differentially regulated during different stages of embryonic muscle development (Fig. 6B) [43]. These overlapping genes were significantly enriched for muscle-related terms, such as skeletal muscle tissue development, actin filament organization and actin cytoskeleton organization (Additional file 3: Table S3).
For common peaks, we annotated these DPI found before, and then performed functional analysis on the annotated genes. A total of 2378 genes were observed among all comparisons (Fig. 6C). Similar to the results obtained for genes with stage-specific ACRs, genes annotated by DPIs in each comparison except in LW70 vs LW45 group were significantly enriched for many muscle-related related BPs (Additional file 4: Table S4), such as muscle organ development, muscle system process, skeletal muscle tissue development and skeletal muscle tissue regeneration. A total of 14 terms were unmasked in LW70 vs LW45 group, most of which were significantly associated with cardiac-related development (Additional file 4: Table S4). Several metabolic related pathways, such as fatty acid metabolism were significantly enriched for the genes with DPIs in the LW70 vs LW45 comparison [44]. Moreover, some custom developmental pathways, such as Wnt signaling pathway, MAPK signaling pathway and AMPK signaling pathway have been proven to play essential roles in muscle development and were enriched significantly in LW100 vs LW70 and LW100 vs LW45 comparisons [42, 45, 46]. In addition, among those genes with DPIs, we identified 63 genes with continuous significant changes in their peak intensities (Fig. 6D). A total of 10 terms were enriched significantly and related to cell fundamental functions (Additional file 4: Table S4).
Identification of regulatory DNA elements at different stages during embryonic development
The expression levels of genes were dramatically different at different stages during embryonic muscle development. We identified 2805 DEGs through pairwise differential expression analysis among the different stages. All of these DEGs could be divided into 6 clusters based on their expression patterns (Fig. 7A). The expression level of genes in cluster 1 decreased gradually from LW45 to LW100, while opposite trends of genes expression was detected in cluster 5. The genes in cluster 2 were highly expressed at the LW70 stage but displayed sharp changes at the LW45 and LW100 stages. Genes involved in cluster 3 were low expressed at LW45 and LW70 stage, while increased sharply in the LW100 stage. The expression level of cluster 4 was high in the LW45 and LW70 stages, while the expression level was the lowest in the LW100 stage. Genes in cluster 6 had a peak expression at LW45 and bottom expression at LW70 (Fig 7B).
For genes with stage-specific peaks in their proximal promoter regions, we integrated those genes with the DEGs revealed by RNA-seq and found that genes containing stage-specific ACRs in LW45 were most enriched in cluster 1, which had the highest expression level in the LW45 stage (Table 2). Genes with LW100 stage-specific ACRs were significantly enriched in cluster 3 and cluster 5, which displayed the highest expression level at LW100 (Table 2). Interestingly, genes associated with LW70 stage-specific ACRs were significantly enriched in cluster 4, which exhibited the peak expression level at LW45 (Table 2). These findings suggested that the alterations in gene expression may relate to the regulation of ACRs.
We further investigated the CREs within ACRs at the proximal promoter in cluster 1 and cluster 3-5 with HOMER software. Several known motifs associated with muscle development were significantly enriched in all stages (Additional file 6: Fig. S4-S7). For example, MEF2c and MyoG, which are well-known fundamental regulators of the skeletal muscle lineage during the embryonic period, were significantly enriched at the LW45 specific ACR in cluster 1 (Additional file 6: Fig. S4) [38, 47]. In addition, muscle development-related TFs, such as Mef2d and Mef2c, members of the MEF2 family of TFs, were significantly identified at the LW100 specific ACR in cluster 3 and 4 (Additional file 6: Fig. S6-S7) [47]. These results suggest that the identified TFs regulate a considerable proportion of genes and may play important roles during embryonic muscle development. The Nf1 being important roles in skeletal muscle development was detected at the LW70 stage (Additional file 6: Fig. S5) [48, 49]. These results indicate that the regulation of genes by TFs is dynamic during embryonic muscle development.
For genes with common peaks, we integrated the DPIs-related gens in ATAC-seq with the DEGs detected in RNA-seq (Table 3). A total of 637 differential peaks corresponding to 506 DEGs were revealed. Among those peaks mapped to DEGs, that approximately 80% showed the same expression trend (i.e genes containing DPIs in ATAC-seq were upregulated and the same genes in DEGs found by RNA-seq were also upregulated in the same comparison and vice versa.), suggesting ACRs might play important roles in regulating the expression of genes.
To further explore the functions of these genes in detail, GO and KEGG pathway analyses were performed (Additional file 5: Table S5). The comparison of LW100 vs LW70 and LW100 vs LW45 showed that several muscle-related terms were enriched, such as striated muscle contraction and muscle contraction. While the significantly enriched GO terms in LW70 vs LW45 comparison were mainly associated with synaptic-related functions, such as regulation of presynapse assembly and regulation of presynapse organization BPs. No pathways were significantly enriched in LW70 vs LW45 comparison. And two pathways, AMPK signaling pathway and calcium signaling pathway were simultaneously exposed in LW100 vs LW70 and LW100 vs LW45. And calcium signaling pathway was meanwhile the most enriched pathway in LW100 vs LW70 and LW100 vs LW45.
Several peaks mapped to DEGs exhibited different change trends (i.e genes containing DPIs in ATAC-seq were upregulated and the same genes in DEGs found by RNA-seq were downregulated in the same comparison and vice versa.) between the ATAC-seq and RNA-seq data shown in Table 3. We sought to identify whether these differentially altered ACRs were enriched for some particular transcriptional repression factor (Additional file 6: Fig. S8–S10). Ultimately, a known transcriptional repression factors CTCF, was discovered in both the LW100 vs LW70 and LW100 vs LW45 comparisons, suggesting that some of the genes in these two comparisons may be regulated by the same TFs during embryonic muscle development (Additional file 6: Fig. S9-S10).