Genome-wide identification of redundant and non-redundant miRNA-mediated cFFCs in Drosophila embryos.
The spatiotemporal regulation of miRNAs during development have been widely and precisely studied30–32. Redundant functions of miRNAs often arises from multiples tissues/organs or different developmental stages8. Therefore, it is crucial to confine the regulation to specific developmental stages when investigating the functions and evolutionary trajectories of miRNA-mediated redundancies within feedforward loops. Here, we select the Drosophila embryonic stage, for which ChIP-seq/ChIP-chip of most transcription factors (TFs) have been performed, to ensure clarity in regulations at the transcription level. We collected 220 ChIP-seq/ChIP-chip datasets of 74 TFs from GEO and modENCODE databases, constructing a comprehensive transcription regulation network during the embryonic stages of D. melanogaster, covering 0 to 24 hours post-egg-laying (Supplementary Table 1). A TF-target interaction was identified when the binding peak of the TF was located within 1.5kb upstream to 0.5kb downstream of a gene body. To enhance the accuracy of the identification, we also utilized single-cell ATAC-seq data from D. melanogaster embryos33 to adjust these ChIP-seq peaks with chromatin accessibility (see Methods). Out of the 69 TFs with identified targets by ChIP-seq/ChIP-chip data, 56 show over 80% validation of their target interactions, as evidenced by overlapping ATAC-seq peaks (Supplementary Table 2).
Next, we obtained miRNA regulatory data from TargetScan (Release 7.2), incorporating predictions from 94 conserved miRNA families34,35. To ensure accurate representation of the coordinated function of miRNA targeting with TFs, we selectively utilized miRNAs that expressed during the embryonic stage15 (see Methods, Supplementary Table 3). By integrating these TF regulations with miRNA targeting, we constructed a comprehensive gene regulatory network (GRN) that encompasses both transcriptional and post-transcriptional regulatory levels, including 69 TFs, 87 embryonic miRNA/families, and 11,180 target genes. In delineating the miRNA-mediated cFFCs, we identified 1,528 non-redundant motifs, each containing only one miRNA-mediated cFFC, as well as 5,712 redundant motifs involving multiple miRNA-mediated cFFCs (Fig. 1). Our analysis of redundant motifs indicated an average of 7.31 miRNAs per motif (Fig. 1). The maintenance of such redundancy, albeit costly, suggests that a moderate level of redundancy is typical in an evolutionarily stable system. This observation highlights the significance of redundancy in target gene regulation via cFFC formation and its integration with TF networks.
Purifying selection on miRNA target sites in redundant and non-redundant cFFCs.
Previous analyses suggested that cFFC redundancy may be evolutionarily unstable if one circuit simply serves as a backup for another, but it may be stable if the “redundant” circuits operate coordinately to execute biological functions28,29. The former scenario represents true redundancy and is likely to be relaxed from purifying selection, while the latter would be subjected to selective constraints. By detecting signals of nature selection, it may be possible to determine whether and how natural selection maintains such miRNA-mediated cFFC redundancy for evolutionary advantages. Given that cFFC motifs were identified based on conserved miRNA families, the mutations that lead to altered cFFCs predominantly arise from changes in miRNA targeting sites. Therefore, we performed evolutionary genetics analyses on miRNA target sites involved in cFFCs with varying degrees of redundancies to detect natural selection acting on these redundancies. We categorized the miRNA targeting into four distinct groups: 1) 40,154 non-motif sites not involved in any cFFC, of which 289 target sites were located on TFs (miRNA ->TF sites), 2) 1,579 non-redundant sites, each involved in only one cFFC, 3) 15,740 low-redundancy sites, each involved in fewer than 7.31 cFFCs (the average number of miRNAs per redundant motif), and 4) 35,344 high-redundancy sites, each involved in more than 7.31 cFFCs.
We detected purifying selection on miRNA-mediated cFFC redundancies at both between-species and within-species levels. Between-species conservation was assessed by determining the proportion of miRNAs targeting sites in D. melanogaster that are also present in other species (including D. simulans, D.yakuba, D. ananassae, D. pseudoobscura and D. virilis ) (Fig. 2a), which indicates long-term selective constraint. We found the highest between-species conservation in miR->TFs sites (with proportions of 86.2% in D. simulans, 75.1% in D. yakuba, 33.9% in D. ananassae, 29.1% in D. pseudoobscura, and 19.0% in D. virillis). High-redundant sites exhibited the next highest conservation (with proportions of 77.7% in D. simulans to 15.3% in D.virillis, P-value > 0.05 only in D. pseudoobscura and D. virillis, Fisher exact tests), followed by low-redundant sites (74.2% in D. simulans to 10.5% in D. virillis, P-value < 0.05, Fisher exact tests). As expected, non-redundant sites showed the lowest conservation proportion (ranging from 70.2% in D. simulans to 5.8% in D. virillis, all with P-value < 0.05, Fisher exact tests). Interestingly, the non-motif sites, which are assumed to be the foundation for composing cFFCs, exhibited higher conservation than non-redundant sites (ranging from 72.6% in D. simulans to 11.4% in D. virillis, all P-value < 0.05, Fisher exact tests), but less conserved than the low-redundant sites. This indicates that the cFFC redundancy, rather than the cFFCs themselves, may be crucial for biological functions.
Within-species selective constrains were inferred from Tajima’s D values36. Using D. simulans as an outgroup, we calculated the derived allele frequencies and Tajima’s D for miRNA target sites from different strains of D. melanogaster using data and methods from Dai et al37. Compared to the whole-genome fourfold sites, which are typically used as a neutral control, all miRNA target sites exhibited lower Tajima’s D values, indicating purifying selection (Mann-Whitney U tests, all P-values < 0.05; Fig. 2b). We also observed a decreasing trend in Tajima’s D values with increasing redundancy levels when comparing high-redundant, low-redundant, and non-redundant sites (Generalized linear model analysis, with all coefficients < 0, P-values < 0.05), suggesting a positive correlation between the degree of selective constraint and cFFC redundancies. However, the non-redundant sites exhibited higher Tajima’s D values compared to the non-TF sites (median − 0.715 vs -1.110, Mann-Whitney U test, P-value < 0.05), indicating the relaxation of functional constraint. Taken together, our results suggest cFFC redundancy is subject to selective constraint, and the degree of selective constraints correlates with redundancy levels.
The association between miRNA-mediated cFFC redundancy and expression stability.
It is thought that miRNAs canalize gene expression via widespread weak regulation8,9,17. We hypothesize that miRNA-mediated cFFC redundancies may also contribute to system stability. To test this hypothesis, we first checked the distribution of target redundancy in miRNA-mediated cFFCs. More than half of targets (5,712 out of 11,180) are regulated by redundant cFFCs, while 13.67% (1,528 out of 11,180) are regulated by non-redundant cFFCs, and the remaining 35.24% (3,940 out of 11,180) targets are regulated exclusively by miRNAs (categorized as non-motif targets). This distribution suggests the widespread prevalence of targets regulated by redundant cFFCs. We then examined 3’ UTR length for targets with different levels of redundancy, as cis-regulatory elements within the 3’UTR may influence mRNA stability. The longer 3’UTRs, the more cis-regulatory elements residing and more stable gene expression38–40. We found that the high-redundant targets have the longest 3’UTRs (peaking at 1,024 ~ 2,048 bp), followed by the low-redundant (peaking at 256 ~ 512 bp), non-redundant (peaking at 128 ~ 256 bp) and non-motif targets (peaking at 128 ~ 256 bp; Supplementary Fig. 1).
Next, we sought to investigate the functional consequences of cFFC redundancy on expression divergence between species. To do so, we sequenced the transcriptome of 0.5 h and 2 h embryos, each with two biological replicates, for both D. melanogaster and D. simulans. After mapping the raw reads to their respective genomes, gene expression level was quantified as Transcripts Per Million (TPM). Expression divergence between species was calculated as one minus the Spearman’ correlation coefficient (1 - ρ) of the TPM values of one-to-one orthologous gene pairs between D. melanogaster and D. simulans, as described previously41. To account for the impact of 3’UTR length, we classified target genes with varying levels of cFFC redundancy into three groups based on the length of their 3’UTRs (256 ~ 512 bp, 512 ~ 1,024 bp, and 1,024 ~ 2 048 bp; Fig. 3a). Within each target group with the same 3’UTR length, there is a general trend of decreased expression divergence as target redundancy increases in at least at one development stage (Mann-Whitney U tests, P-values < 0.05), despite the highest expression divergence observed in non-redundant targets with 3’UTR lengths of 1,024 ~ 2,048 bp in 2 h embryos (Fig. 3a). The high redundant targets almost always exhibited the lowest expression divergence, except those with 3’UTR lengths ranging from 256 to 512 bp (Mann-Whitney U tests, P-values < 0.05, Fig. 3a). Our results suggest that cFFC redundancy negatively correlates with gene expression divergence between species.
Finally, the impact of cFFC redundancy on system stability can also be assessed under perturbation. In this scenario, targets with cFFC redundancy are expected to show smaller variation compared to those without, and are also likely to return to the steady state more rapidly once the perturbations are removed. We subjected D. melanogaster larvae to heat shock at 30 ℃, and conducted RNA-seq analysis on both treated and control samples, each with three biological replicates, at 0 h, 3 h, 6 h, 9 h and 12 h post-treatment (see Methods). We indeed observed that the high-redundant cFFC targets showed minimal expression differences at 0 h after heat shock. However, statistical significance was only observed when compared to non-TF targets lacking cFFC regulation (Fig. 3b, Mann-Whitney U test, P-values < 0.05). While the precise time for these cFFC redundant targets to return to a steady state could not be determined, their expression fluctuated around the mean values at time points of 3 h, 6 h, 9 h and 12 h post-treatment (Fig. 3b), suggesting a rebound time of less than 3 h. Public data on heat shock in embryos and multiples stresses at 3- and 4-day larvae also provide evidence supporting the role of miRNA-mediated cFFC redundancy in buffering noise from environmental perturbations, to some extent (Supplementary Fig. 2).
MiRNA-mediated redundant cFFCs tend to target old genes or broadly expressed young genes.
If miRNA-mediated cFFC redundancy is evolutionarily significant, we anticipate that the associated targets are more likely to be evolutionarily conserved and exhibit widespread expression. We first checked the correlation of miRNA-mediated cFFC redundancy with gene ages. Genes were categorized into three age groups: Drosophilid (youngest), pre-Drosophilied (middle-aged), and pre-Bilateria (oldest), following the classification defined by Witt et al. and Kondo et al.42,43. The proportion of genes in each age group across different levels of cFFC redundancy was calculated (Fig. 4a). When compared to the entire gene set as a background, non-redundant and non-motif targets were found to be enriched in the Drosophilid group (with 30.9% for non-redundant targets, and 43.4% for non-motif targets, Fisher exact tests, P-value < 0.05; Fig. 4a). Conversely, targets in low- and high-redundant motifs were notably enriched in the pre-Bilateria group (accounting for 66.8% for low-redundant motifs, and 73.7% for high-redundant motifs, Fisher exact tests, P-values < 0.05; Fig. 4a). Additionally, targets in high-redundant motifs had a higher proportion of pre-Bilateria genes (73.7%) than those in low-redundant motifs (66.8%, Fisher exact test, P-value < 0.05). The trend of an increasing proportion of pre-Bilateria genes and a decreasing proportion of Drosophilid genes along with miRNA-mediate cFFC redundancies (Fig. 4a, ANOVA, P-values < 0.05) suggests that this type of redundancy leads to a preference in miRNA targeting towards older genes.
Next, we used τ, as defined by Witt et al to measure tissue expression specificity43 and compared τ values across genes with varying levels of cFFC redundancies (Fig. 4b). Our results indicated that genes regulated by non-motif miRNAs exhibited higher tissue specificity (median, τ = 0.725) compared to those regulated by the cFFC motifs (median, τ = 0.467; Mann-Whitney U test, P-value < 0.05; Fig. 4b). Furthermore, genes in non-redundant cFFC motifs (median, τ = 0.539) showed higher tissue specificity than those in redundant motifs (median, τ = 0.454, Mann-Whitney U test, P-value < 0.05; Fig. 4b). However, no significant difference was detected between low-redundant (median, τ = 0.458) and high-redundant target gene groups (median, τ = 0.449, Mann-Whitney U test, P-value > 0.05; Fig. 4b). We also examined the impact of cFFC redundancies on gene expression specificity within different age groups, revealing significant differences between non-motif and cFFC genes in the pre-Bilateria and pre-Drosophilid groups (P-values < 0.05, Mann-Whitney U Test; Fig. 4c), and among various levels of cFFC redundancies within the Drosophilid group (P-value < 0.05, ANOVA; Fig. 4c and Supplementary Fig. 3). This suggests a potential role of cFFC redundancy in influencing gene expression breadth, particularly in young genes. After accounting for the effects of gene ages and the interaction between cFFC redundancy and gene ages, we found that cFFC redundancy significantly affects gene expression breadth (ANOVA, P-value < 0.05, Fig. 4d). Additionally, we observed a notable enrichment of highly redundant genes among those most widely expressed (τ > 0.9, Fisher exact tests, P-value < 0.05, Fig. 4e). In conclusion, miRNA-mediated cFFC redundancy appears to be associated with widespread gene expression across tissues.