2.1. Mapping of sRNA in the buds of Juglans mandshurica
To identify novel miRNA and to investigate the role of miRNA in sex differentiation and gynoecium-androecium transformation of Juglans mandshurica, we constructed 12 small RNA libraries (three biological replicates per sample) in four types of protogyny female buds (T1), protogyny male buds (T2), protandry female buds (M1) and protandry male buds (M2). Deep sequencing of the small RNA library yielded average raw small RNA reads for the four Juglans mandshurica groups of 70099677, 42177432, 32725495 and 43655127 respectively. After removing the adapters and removing low quality sequences, reads in the 18-36nt range in length were selected (Clean Reads). A total of 20.0 G of clean data were obtained, with the average values for the four types being 38132413, 33541027, 25967944 and 35423485 (Table S1), which average of 73.70% of original data.
The distribution of sequence lengths before and after de-duplication showed that the sRNAs obtained were of different lengths, with the T1 group having the most abundant 33-nt RNA before de-duplication, while the other three groups had the most abundant 24-nt RNA. After de-duplication, all four groups of sRNAs were mainly distributed between 21-24nt in length, with 24nt RNA being the most abundant (Additional file 1). The number of unique sequences shared between different subgroups was counted at both the total and unique sequence levels. The total number of sequences in the four libraries was 316263574 (Fig. 1-A), and the total number of sequences after de-duplication was 1043444 (Fig. 1-B). The Clean Reads were compared to the reference genome and the results of the comparison are shown in Table S2, where on average more than a third of the sequences in the four sample sets were mapped to the reference genome. This indicates that the quality of the miRNA sequencing data is reliable and can be used for subsequent analysis. These sequences are distributed on 16 chromosomes of Juglans mandshurica, with chr01 being the most abundant and chr12 the least abundant. (Fig. 1-D)
Using Blast, Unique Reads were matched against the Rfam13 database to screen for the four known classes of ncRNA (rRNA, tRNA, snRNA and snoRNA). Reads that did not match the four types of ncRNA mentioned above were compared using Blast with mature miRNA sequences from plants in the miRBase22 database, from which conserved miRNA were screened. As a final summary of all the comparison results (Table S3), most of the RNAs were concentrated in the UNKNOWN category, with the second most abundant category being rRNA (Fig. 2). This suggests that there may be many more non-coding RNAs in the unknown category.
A total of 310 known miRNA and 278 novel miRNA were detected in the 12 libraries based on the comparison results of the miRBase 22.1 database and the miRDeep2 prediction results (Table S4). miRNA are generally conserved in plants and conserved miRNA have been identified in a total of 109 miRNAs among known miRNA and have been shown to be conserved in 68 plants. The largest miRNA family was MIR166 with 30 miRNA members, followed by MIR159, MIR167 and MIR156 with 24, 21 and 16 miRNA members, respectively (Fig. 1-E). PCA principal component analysis was performed on 12 replicates of samples from four groups of Juglans mandshurica (Fig. 1-C) and the results showed that replicates clustered together within the Juglans mandshurica group, indicating a low degree of variation between them, while the large variation between the Juglans mandshurica groups indicated significant differences between samples and could be used for differential expression analysis.
This sequencing was performed to analyze the base preference of pecan miRNA. The analysis showed a preference for uracil (U) for the first base of miRNA 19-23nt in length, adenine (A) for the first base of miRNA 24nt in length, and guanine (G) for the first base of miRNA 25nt in length (Fig. 3-A). Analysis of the base preferences for each site of the miRNA showed that U was relatively common in the first two positions and cytosine (C) in the first three positions. Bases 23 and 24 showed a strong preference for U and C, and base 25 showed a strong preference for G (Fig. 3-B). This is in agreement with the results of previous studies.
2.2. miRNA differential expression analysis
According to the screening criteria for differentially expressed genes in small RNAs of Juglans mandshurica buds, we obtained a comparative library of Juglans mandshurica DEGs: M1vsM2:107 differential miRNA, T1vsT2:85 differential miRNA; T1vsM1, 46 differential miRNA ; T2vsM2:65 differential miRNA, and counted the common and unique sequences between the groups ( Fig. 4-A). A total of 172 differential conserved miRNA were found in four groups of 12 libraries, and the abundance of differential miRNA was counted (Fig. 4-B). Clustering analysis was performed on the four groups of differential genes (Fig. 5).
2.3. Target gene prediction and GO, KEGG enrichment analysis
To better understand the function of DEGs, we predicted their target genes. A total of 638 genes were predicted as target genes and 2999 binding sites for miRNA to target genes were predicted (Table S5). The target genes were aligned with GO, KEGG, eggNOG, NR, Swissprot databases and blast to the Arabidopsis genome for observation of their function (Table S6). Target gene prediction revealed a total of 107 differentially expressed miRNA targeting 1022 potential mRNA loci in the M1vsM2 subgroup, with an average of 9.55 per mRNA/miRNA. Among the miRNA miR159b-3p and miR159a target the most mRNAs (32). Most miRNA target multiple loci, suggesting that a single miRNA can regulate multiple target genes. However, a single target gene can also be targeted by multiple miRNA. A total of 621 mRNAs can be regulated by more than one miRNA, with 12 mRNAs being targeted by 11 miRNA. According to the taxonomic annotation of the GO functions of DEG in M1vsM2, a total of 2954 functions can be observed, of which 2117 are Biological Processes (BP), 352 are Cellular Components (CC) and 485 are Molecular Functions (MF) (Table S7). The results showed that M1 and M2 were significantly enriched for 178 terms, of which 147 terms belonged to biological functions, 4 terms to cellular components and 27 terms to molecular functions. The most enriched term was binding (GO:0005488) while the enriched terms included stamen development (GO:0048443), androgen development (GO:0048466), and floral organ development (GO:0048438). This suggests that these genes may play a dominant role in M1vsM2 sexual differentiation (Fig. 6-A).
T1vsT2 had 85 differentially expressed miRNA targeting 906 mRNA loci with an average of 10.66 mRNA/miRNA. Of these miRNA annotated to target genes, miR159a-3p targeted the most miRNA (37), followed by miR159b-3p.1 (34). A total of 604 mRNAs could be regulated by more than 1 miRNA. Based on the results of the categorical annotation of GO functions in T1vsT2 (Table S7), it can be observed that a total of 2765 functions were annotated in this group, including 1980 biological functions, 307 cellular components and 477 molecular functions. Screening at a corrected P value (FDR) < 0.05, T1vsT2 significantly enriched a total of 238 terms. Of these, BP had 20, CC had 3 and MF had 26. The most enriched term was metabolic process (GO:0008152), followed by binding (GO:0005488) and cellular metabolic process (GO:0044237) (Fig. 6-B).
There were 46 differentially expressed miRNA targeting 416 mRNA loci in T1vsM1, with an average of 9.04 mRNAs/miRNA. Among these miRNA, miR159b-3p.1 and miR159c targeted the most mRNAs (34), followed by miR159a (33). A total of 247 mRNAs could be targeted by more than 1 miRNA. The categorical annotation results of GO functions showed that a total of 1653 functions were annotated in this group, including 1180 for BP, 183 for CC and 280 for MF (Table S7). Analysis of the significance of significant enrichment of GO functions showed that 163 functions were significantly enriched, including 130 biological functions, 5 cellular components and 19 molecular functions. The most enriched were cellular metabolic processes (GO:0044237), binding (GO:0005488) and organic cyclic compound binding (GO:0097159) (Fig. 6-C).
There were 65 differentially expressed miRNA targeting 656 mRNA loci under the T2vsM2 grouping, with an average of 10.09 per mRNA/miRNA. miR172b-5p targeted the most mRNAs (30), followed by miR156l (25). A total of 401 mRNA could be regulated by more than 1 miRNA, of which 12 mRNAs could be regulated by 11 miRNA. According to the categorical annotation results of GO functions, it can be observed that a total of 2476 functions were annotated in this group, including 1779 biological functions, 272 cellular components and 425 molecular functions (Table S7). 154 functions were screened for significant enrichment by GO function analysis, including 136 biological functions, 3 cellular components and 15 molecular functions. The most enriched term was metabolic process (GO:0008152), followed by binding (GO:0005488) and cellular metabolic process (GO:0044237) (Fig. 6-D).
The results of our differential expression analysis and GO and KEGG enrichment analysis for both protogyny and protandry types showed that there were 28 significantly differentially expressed miRNA, including 21 up-regulated genes and 7 down-regulated genes. GO enrichment analysis showed that a total of 299 functions were significantly enriched, with the most enriched function being cellular processes (GO:0009987), followed by metabolic processes (GO:0008152) and binding (GO:0005488).
KEGG enrichment analysis of the four groups (Table S8) showed that the most enriched target gene pathway in all four groups was Plant hormone signal transduction (ko04075), followed by Protein processing in endoplasmic reticulum (ko04141) and MAPK signaling pathway - plant (ko04016). Meanwhile, KEGG enrichment analysis of both protogyny and protandry types as a whole showed that the most enriched target gene pathways were Plant hormone signal transduction (ko04075), Amino sugar and nucleotide sugar metabolism (ko00520), and MAPK signaling pathway - plant (ko04016). This indicates that the target genes of the above pathways play a crucial role in the formation of both sex differentiation and heterosexualism mechanisms in Juglans mandshurica.
These target genes contain many transcription factors that are directly or indirectly involved in sex differentiation. For example, the SPL family targeted by miR156, the AP2 family targeted by miR172, the MYB family targeted by miR159 and miR171 can be used to regulate GA synthesis and signaling by targeting the SCL gene family, and the ARF gene family regulated by the miR160/167 family can regulate growth hormone signaling, among others.
2.4. qRT-PCR for validation of gene expression
To verify the expression pattern of the high-throughput sequenced miRNA, we randomly selected nine miRNA for real-time fluorescence PCR analysis(Table S9). The expression trends of the selected miRNA were consistent with the sequencing results (Fig. 7), indicating that the sRNA sequencing results were reliable.
2.5. Key miRNA screening
Based on the results of differential gene expression, GO and KEGG enrichment analyses, combined with other plant sex differentiation studies, we inferred key miRNA that have an important influence on sex differentiation and the mechanism of heterozygous heterozygosity formation in Juglans mandshurica (Table 1), and constructed a key miRNA-mRNA targeting regulatory network using cytoscape (Fig. 8).
Table 1
Four groups of key differentially expressed miRNA.
Group | miRNA | DEM | Log2 Fold Change |
M1vsM2 | miR159c-3p | Down | -4.182016741 |
miR399a | Down | -3.921752724 |
miR319a | Down | -3.133464842 |
miR171f-5p | Down | -2.485869043 |
miR172d | Down | -1.630965443 |
miR164-3p | UP | 8.294286129 |
miR167b | UP | 6.595958558 |
miR156b | UP | 6.455034958 |
miR156d-3p | UP | 5.339716891 |
miR171a | UP | 4.132868324 |
miR167c | UP | 3.641872628 |
miR396t | UP | 2.903334354 |
T1vsT2 | miR319a | Down | -4.000611515 |
miR159c-3p | Down | -3.305003284 |
miR319b | Down | -3.086782755 |
miR172d | Down | -2.073500685 |
miR167a | UP | 2.921788545 |
miR167b | UP | 2.901437113 |
miR171a | UP | 2.771554322 |
miR396t | UP | 2.409984627 |
miR319 | UP | 2.401037572 |
miR166b | UP | 2.274164037 |
miR5139a | UP | 2.039594749 |
T1vsM1 | miR166c | Down | -5.189040491 |
miR164a | UP | 2.779404952 |
T2vsM2 | miR166c | Down | -5.189040491 |
miR164-3p | UP | 10.00090635 |
miR171c-5p | UP | 6.912390786 |
miR171b | UP | 3.745594554 |
2.6. Combined analysis of miRNA sequencing and transcriptome sequencing
The miRNA-predicted target genes were compared with the significantly differentially expressed mRNAs in the transcriptome data to find the corresponding mRNAs (Fig. 9), which were subjected to GO functional enrichment analysis. The results showed that M1vsM2 enriched the most genes for the function of metabolic process (GO:0008152), T1vsT2 enriched the most genes for the function of regulation of biological process (GO:0050789), T1vsM1 enriched the most genes for the function of regulation of biological process (GO:0007275), T2vsM2 for the cellular macromolecule metabolic process (GO:0044260). All four groups were significantly enriched for functions related to sex differentiation, organic matter biosynthesis process (GO:1901576), reproduction (GO:0003006), flower development (GO:0009908), etc. This suggests that these miRNA and mRNAs play a key role in the direction of sexual differentiation.