Identification of 21- and 24-PHAS in Wheat
To identify the PHAS loci in wheat, we downloaded 261 small RNA datasets from the GEO database (Table 1), which included 12 seedling samples, 128 leaf samples, 12 root samples, one stem sample, one shoot sample, 29 young spike samples, two anther samples, one embryo sample, 17 spikelet samples, 12 rachis samples, 19 grain samples, 20 seed samples, 6 callus samples, and one mixed tissue sample. By comparing these small RNAs to the wheat genome using the Shortstack package [19] following the flowchart as shown in Supplementary Figure 1, a batch of PHAS (phasiRNA precursors) loci were identified with phased scores greater than 15, 20, 25 or 30 (Supplementary Table 1). In the reproductive tissues such as young spikes and anthers (Figure 1A-B && Supplementary Table 1), abundant 21- and 24-PHAS were identified. However, In the leaf, stem, root, spikelet, seed, callus, etc., few PHAS were identified (Supplementary Table 1 && Supplementary Figure 2). The abundance of phasiRNAs was not correlated with the total reads of the small RNA samples, but it was highly related to the types of tissues (Supplementary Figure 2).
In the downloaded small RNA samples of wheat, there were a group of datasets, which included the important development stage of young spike [20], i.e., DR stage (double-ridge stage) at the initiation of spike formation and spikelet development; FM stage, the stage of appearance of the floret meristems (FMs) glume primordia, and lemma primordia; AM stage (anther primordia stage), stamen and pistil primordia emerged from FMs with visible anther primordia for some florets; TS stage (tetrads stage), young florets began to differentiate with immature anthers and unelongated pistils, and the pollen mother cells completed meiosis to form the tetrads at this stage. In the DR and FM stages of young spike tissues, there were few PHAS loci, while in the AM and TS stages, there were abundant 21- and 24-PHAS. For example, ~3600 and ~4000 of the 21-PHAS loci with phased scores greater than 30, and 1200 and 1000 of the 24-PHAS loci with phased scores greater than 30 were identified in AM (SRR3690677 and SRR3690678) and TS (SRR3690679 and SRR3690680) samples, respectively (Supplementary Table 1). The number of 21-PHAS loci in anthers with lengths of 1.0 mm were more than in those with lengths of 1.5, 2.2 and 3.0 mm, while for the 24-PHAS, the number of PHAS loci were very similar among the different length of anther (Figure 1A-B). According to morphological development and stage determination of young spike and anther in wheat [21], six small RNA datasets were selected for further study that represented the early and later anther development stages. The early anther development stage included the floret meristem (FM, SRR5460941 and SRR5460949), anther primordia (AM, SRR5460967 and SRR5460972) and tetrad stages (TS, SRR5461176 and SRR5461177), and the later anther development stage included free haploid microspores (FHM, SRR449365 with 1.5 mm anther), mitosis (MIT, SRR449366 with 2.2 mm anther), and mature pollen (MP, SRR449367 with 3.0 mm anther). In the next section, these PHAS with score more than 30 were selected to do the further analysis.
For the 21-nt phasiRNAs in PHAS loci, the proportions of all 21-nt small RNAs were various in different inflorescence development. In the FM stage, there were few 21-nt phasiRNAs (300 TPM (transcripts per million)) with 0.6% out of the total 21-nt small RNAs, while in AM, a sharp increase (36054 TPM) was observed, comprising 23.96% out of the total. In the TS stage, the expression level continued to increase with 11410 TPM of 21-nt phasiRNAs (44.07% out of the totals) (Figure 1C & E). For the 24-nt phasiRNAs, the tendency of the proportion was similar to that of the 21-nt phasiRNAs in the three stages. The proportions of 24-nt phasiRNAs out of the total 24-nt siRNAs were 0%, 13.21% (101277 TPM), and 30.76% (156595 TPM) for the FM, AM, and TS stages, respectively (Figure 1C & E). This indicated that the 21- or 24-nt phasiRNAs and 21- or 24-PHAS loci were present in the AM stage and peaked in the TS stage. For the later anther development stage, the 21- and 24-nt phasiRNAs occurred in the FHM stage with proportions of 8.99% (13701 TPM) and 8.23% (52030 TPM), peaked in the MIT stage with proportions of 15.21% (28633 TPM) and 10.72% (60758 TPM), and then decreased with proportions of 12.9% (24562 TPM) and 8.74% (50908 TPM), respectively (Figure 1C & E). The proportions of 21- and 24-nt phasiRNAs in the three later anther development stage, were much lower than that of the AM or TS stage.
The synthesis of phasiRNAs in monocot reproductive tissues requires both PHAS precursors and their initiated miRNAs, such as miR2118 and miR2275 [16, 17]. The expression level of miRNAs was concertedly related to the synthesis of the phasiRNAs and PHAS loci. For 21- and 24-PHAS, the PHAS loci peaked both in AM and TS, and then rapidly decreased in the later anther development stage (Figure 1A-B). Both 21- and 24-nt phasiRNAs were expressed in AM, peaked only in TS and rapidly descended in the later anther development stage (Figure 1C & E). We then investigated the concert of the three elements including PHAS, phasiRNAs and their regulated miRNAs at the transcriptome level. The expression of miR2118 peaked in AM, which was similar to that of 21-PHAS, but occurred before the burst of 21-nt phasiRNAs. Then, the expression of miR2118 decreased in TS and disappeared in the later anther development stage. The expression of miR2275 peaked in TS, which was before that of 24-nt phasiRNAs but the same as that of 24-PHAS. Then, the expression of miR2275, 24-nt phasiRNAs and 24-PHAS rapidly decreased in the later anther development stage (Figure 1A-E). The expression of miR2275 was higher than that of miR2118 in the later anther development stage, which may be associate with the higher expression of 24-nt phasiRNAs than 21-nt phasiRNAs in these stages (Figure 1A-E). Together, the expression of the 21- and 24-nt phasiRNAs and their trigger miRNAs both burst in the early anther development stage.
The Distribution of PHAS Loci in the Wheat Genome
To investigate the genome distribution of PHAS in wheat, we used the Circos package to show the number of PHAS loci sliding the chromosome in window sizes of 500 kb. Here, we selected these small RNA datasets with abundant phasiRNAs in AM, TS, FHM, MIT and MP for further study. In the five developmental stages, the distribution of genome elements such as repeat sequences, gene body and intergenic regions were very similar in both 21- and 24-PHAS (Supplementary Figure 3). Few PHAS were located in the gene body or repeat sequence regions in the genome. Only 1~2% of these PHAS loci were distributed in gene body regions, and 12~21% of them were distributed in repeat sequence regions. In contrast, most of them (78~87%) were located in the intergenic regions. This result indicates that most PHAS loci may have independent transcript units that are not juxtaposed with the repeat sequences or coding genes.
According to the genome locations, we respectively merged all of the 21- and 24-PHAS in these samples derived from five inflorescence development stages. In total, there were 4850 and 3600 of unique 21- and 24-PHAS in these samples, respectively (Supplementary Figure 4A-B), most of which were common. In total, 94.93% (2042 out of 2151), 53.14% (2385 out of 4488), 98.61% (637 out of 646), 94.85% (1263 out 1198) and 95.91% (1056 out of 1101) of 21-PHAS (Supplementary Figure 4A), and 49.87% (993 out of 1991), 69.09% (1571 out of 2274), 98.26% (1019 out of 1037), 98.38% (1157 out of 1176) and 98.79% (1057 out of 1070) of 24-PHAS (Supplementary Figure 4B), were overlapped each other in at least two samples in AM, TS, FHM, MIT, and MP, respectively. The low number of common 21-PHAS in TS and 24-PHAS in AM indicated that there may be more tissue-specific PHAS loci in these two development stages, which may be associated with the transition of the development stage from floret meristem to meiosis.
These merged unique PHAS were plotted on the chromosome rainbows with black lines (Figure 2). Most of the PHAS loci were located at the end of the chromosomes, i.e. telomere regions. In most regions, the peaks of the 21-PHAS (red lines in Figure 2) were higher than those of the 24-PHAS (blue lines in Figure 2) in the representative tissues. Most of the peaks in both 21- and 24-PHAS were similar among the subA, subB and subD genomes in each sample. However, some peaks of 21- and 24-PHAS in local chromosomes were preferred among the three subgenomes in each sample (red and blue arrows in Figure 2 and Figure 3).
The Genome Plasticity of PHAS in Allohexaploid Wheat
Polyploidization is followed by genome partitioning or fractionation processes, i.e. a genome-wide diploidization, in which one or the other gene duplicate is lost. During this process, differently functional protein-coding genes have been shown to behave differently. Transcription factors or regulators are often retained as duplicated copies following whole genome duplications (WGDs), whereas others are progressively deleted back to a single copy (singleton) state [22-24]. For allohexaploid wheat (AABBDD, Triticum aestivum, 2n = 6x = 42), high sequence similarity and structural conservation are retained with limited gene loss after polyploidization [25]. However, at the transcript level, cell type- and stage-dependent genome dominance was observed in the local chromosome regions [26]. Studies of the roles of PHAS loci as a category of noncoding genes in the partitioning process of allohexaploid wheat have still not been performed. To investigate the location of PHAS in the subA, subB and subD of allohexaploid wheat, we performed a blast search against all genomes to identify the relationship of PHAS among the three subgenomes. With 80% identity and 80% matched sequence length, only 2.27%~6.40% of PHAS in the five samples retained the triplet copies in the three subgenomes, 11.11%~17.51% of PHAS retained the duplet copies in any two subgenomes, and 76.09%~86.22% of PHAS retained only singleton in any one subgenome (Supplementary Figure 5). The homoeologous relationship is shown in Figure 3 with the same color link lines in the homoeologous chromosomes. There were also some translocated homoeologs (the cyan links in Figure 3). For example, some PHAS in chr4A were homologous to those in chr5B/D and chr7A/D. These data showed that only partial PHAS retained the triple or duplet homoeologs, and most PHAS only possessed the singleton copy.
To investigate the subgenome distribution of PHAS loci in allohexaploid wheat, the total number of PHAS loci in each sub-chromosome was calculated, and there were no biased in each subA, subB and subD chromosomes. However, for the local chromosomes, there were some bias distribution in the local subA, subB and subD genomes. For 21-PHAS loci, in the bottom chromosomes of chr1, chr2, chr3 and chr4, and in the top and bottom chromosomes of chr4 and chr7, the PHAS loci were biased located in the chromosomes, but the tendency of preference was different, in either the top or bottom of one chromosome (Figure 2 (the red arrow regions) and Figure 4A). In chr1-b (b, bottom of the chromosome), significantly less 21-PHAS were located in the subA genome than in the subB and subD genomes (Fisher’s exact test, P-value < 0.05). In chr2-b, the number of 21-PHAS was significantly less in subB than in subA and subD (P-value < 0.05). In chr3-b, subB became the dominant genome with significantly more 21-PHAS than the subA and subD genomes (P-value < 0.05). In chr4-t (top of the chromosome), the number of 21-PHAS was less in subA than in subB and subD (P-value < 1.0e-5), while in chr4-b, the subA genome possessed numerous 21-PHAS, significantly more than the other subgenomes (P-value < 2.2e-16). In chr7-t, subB possessed less of the phased loci, next to the subA genome, and the subD possessed much more 21-PHAS than subA and subB (P-value < 0.001).
For 24-PHAS, only chr3-t, chr4-t/b and chr7-t in all five samples exhibited a biased distribution (Figure 2 (blue arrow regions) and Figure 4B). In chr4-t/b, the preference of 24-PHAS in chromosomes was very similar to that of 21-PHAS. In chr4-t, less PHAS loci were located in subA than in subB and subD (P-value < 1.0e-6), while in chr4-b, more PHAS loci were located in subA than in subB and subD (P-value < 2.2e-6). In chr3-t, more 24-PHAS were distributed in the subB genome than in the subA and subD genomes (P-value < 0.05). However, in chr7-t, far fewer 24-PHAS were located in the subB genome than in the subA and subD genomes (P-value < 1.0e-6). These data suggested that PHAS loci exhibited local chromosome preferences during the genome plasticity process.
Homoeologous PHAS Loci in Diploid, Tetraploid, and Hexaploid Wheat
The progenitors of allohexaploid wheat contain the diploid genomes AA, BB and DD, and the tetraploid genome AABB. The genomes of AA (Triticum urartu, 2n = 2x = 14), DD (Aegilops tauschii, 2n = 2x = 14), and AABB (Triticum turgidum, 2n = 4x = 28) were sequenced, and their genome assembly was nearly perfect at the chromosome level. This made it possible to investigate the evolution of 21- and 24-PHAS in the different ploidy of Triticum species. Thus, we downloaded the genome sequences of AA, DD and AABB, and then mapped the 21- and 24-PHAS with scores of greater than 30, which were identified in these samples from the five developmental stages of inflorescence, to these genome sequences by the BLAST program with 80% identity and 80% matched sequence length. Approximately 22.91%~25.98% of the 21-PHAS could be mapped to the AA genome, and 37.77%~44.12% of the 21-PHAS could be mapped to the DD genome, which was slightly higher than the 21-PHAS that were mapped to the AA genome. However, a greater proportion of PHAS (60.83%~62.72%) could be mapped to the AABB genome. For 24-PHAS, the mapped proportions to AA, DD and AABB were very similar to the mapped 21-PHAS. Approximately 22.52%~26.13%, 29.18%~37.42%, and 53.96%~58.05% of 24-PHAS could be aligned to the AA, DD and AABB genomes, respectively (Supplementary Table 2). In the tetraploid AABB, the mapping rates of 21- and 24-PHAS were much higher than the diploid species AA and DD. The scatter diagram showed that the mapping rate was positively correlated with the ploidy times in 21-PHAS with r2 = 0.8019, as determined by Pearson correlation test (P-value = 6.41e-6, Supplementary Figure 6A), and 24-PHAS with r2 = 0.8578 (P-value = 6.41e-6, Supplementary Figure 6B). This indicated that the expansion of the whole genome led to an increase of the PHAS loci.
For the mapped 21- and 24-PHAS to the AA genome, 67.55% and 75.73% of them were located in the subA genome of wheat, respectively. For the mapped 21- and 24-PHAS to DD genome, there were 72.78% and 77.53% of them located in subgenome D of wheat, respectively. For the mapped 21-PHAS to the AABB genome, 43.41% and 42.40% of them were from the subA and subB genome of hexaploid wheat, respectively, and for 24-PHAS, a similar proportion (i.e. 44.38% and 44.51% for subA and subB, respectively) was observed. The detailed genome distributions in the AABBDD genome for the mapped PHAS were clearly shown in each chromosome of Figure 5. For the PHAS mapped to the AA genome, there were more peaks (purple lines in Figure 5) from each chromosome of subA than from the other subgenomes. For the mapped PHAS to the DD genome, the peaks (green lines in Figure 5) were mostly distributed in the subD genome. For the mapped PHAS to the AABB genome, each chromosome of both subA and subB had more peaks (blue lines in Figure 5) than the subD genome. This finding indicated the orthologous relationship between each of subA and the AA genome, subB and BB genome, and subD and DD genome. The evolution independence of PHAS sequences among the three diploid species, AA, BB and DD, may suggest that the PHAS sequences in Triticum may diverge before tetraploid synthesis and may diverge after occurrence of the AA, BB and DD species.
Genome Distributions of MiR2118 and MiR2275 in Triticum
The production of phasiRNAs was initiated by miR2118 and miR2275 via cleavage of their PHAS precursors. MiR2118 and miR2275 possessed many copies in grasses. There were 18 members of miR2118 and four members of miR2275 in the rice genome and seven members of miR2118 and four members of miR2275 in the maize genome, based on the miRBase database [27]. To detect these miRNAs in Triticum, we mapped the mature sequences of miR2118 and miR2275 from the miRBase database to the Triticum genomes with perfect matches and found that there were 25, 30, 88, and 140 members of miR2118 in the DD, AA, AABB and AABBDD genomes, respectively, and 6, 5, 17 and 24 members of miR2275 in the four genomes, respectively. Most of these miRNAs were clustered on the chromosomes of the four species, as shown in Figure 6A and Supplementary Figure 7-8. The increase tendency of miR2118 and miR2275 was significantly correlated with ploidy, as determined by Pearson correlation test (P-value = 0.0015 and 0.0084, r2 = 0.9971 and 0.9833, respectively; Figure 6B-C). For miR2118, there was also a biased distribution in the three subgenomes of wheat. However, unlike the 21-PHAS, the tendency of miR2118 on the subgenome in each group of chromosomes was consistent (Figure 6A). Except for chr2, subB was significantly dominant, with many more members of miR2118 than the subA and subD genomes in chr1 (Fisher exactly test, P-value = 1.87e-5), chr4 (P-value = 3.90e-14) and chr5 (P-value = 3.65e-14). On chr2, there were also more members of miR2118 in the subB genome but not significantly with P-value > 0.05. In the tetraploid AABB genome (Supplementary Figure 7), miR2118 in subB was also dominant than the subA genome on chr1 (P-value = 2.06e-8), chr2 (P-value = 1.43e-2), chr4 (P-value = 3.72e-10), and chr5 (P-value = 1.43e-2). The similar dominant subgenome distribution of miR2118 in the AABB and AABBDD genomes indicated that the dominance of subB may have occurred before the synthesized hexaploidy of wheat. In the AA and DD genomes, there were also fewer miR2118 in chr1, chr2, chr4 and chr5 than in subB of the AABB and AABBDD genomes (Supplementary Figure 8A-B), which indicated that the expansion of miR2118 in the subB genome may have occurred before the synthesized tetraploidy of the AABB genome. This indicated the dynamic expansion of the trigger miRNAs following genome expansion or polyploidization.
PHAS Loci as the Targets of MiR2118 and MiR2275 in Wheat
The production of the PHAS loci could be initiated by miR2118 and miR2275, and then generate the 21- and 24-nt phasiRNAs, respectively. To identify whether miR2118 and miR2275 also target these PHAS transcripts in wheat, using the Targetfinder program [28], we aligned the two miRNA families to the PHAS sequences with a score less than 4. Taking these samples of the five development stages as examples, 35.43% (767 out of 2159), 34.06% (220 out of 646), 35.31% (446 out of 1263), 38.31% (824 out of 2151), and 35.09% (1575 out of 4488) of 21-PHAS sequences were predicted to be targeted by miR2118 in AM, ST, FHM, MIT, and MP, respectively. For 24-PHAS, 43.36% (575 out of 1323), 50.43% (523 out of 1037), 50.26% (591 out of 1176), 0.45% (9 out of 1991), and 48.86% (1111 out of 2274) were identified as the targets of miR2275, respectively. The alignment information between the miRNAs and PHAS loci were listed in Supplementary Table 3-7.
To validate whether these PHAS transcripts can be indeed cleaved by miR2118 and miR2275, we downloaded the degradome sequences from the GEO datasets for the young spike tissue under cold stress by Song et al [29], which corresponds to the small RNA datasets of the AM stage of young spike (control samples, SRR3680677 and SRR3680678; and cold stress samples at 0°C after 48 hours, SRR3680679 and SRR3690680). According to the abundance of reads along the whole transcripts, using the Cleaveland program [30] (P-value < 0.05 and the category <= 2), miR2118 and miR2275 were considered to be interacted between PHAS transcripts and miRNAs. The target plots of miR2118 and miR2275 characterized in the degradome datasets were shown in Figure 7A-D. And at the cleavage sites, the generated phasiRNAs with orientation from the positive or negative strand were shown in the bottom panel of Figure 7A-D. A total of 13.01% (520 out of 3996, SRR3680679) and 12.80% (525 out of 3996 in SRR3680680) of 21-PHAS were confirmed to be cleaved by miR2118 for the cold stressed samples. In the control samples, slightly less 21-PHAS were validated as cleaved targets, i.e. 9.86% (355 out of 3601 in SRR3680677) and 9.65% (353 out of 3659 in SRR3680678). For miR2275, few 24-PHAS were detected to be cleaved in the control and cold stressed samples. Only 0.18% (two out of 1122 in SRR3690680) and 0.38% (four out of 1052 in SRR3680679) of the 24-PHAS in the cold stressed samples were validated to be cleaved by miR2275, while in the control samples, there were slightly more target sites of 24-PHAS than in the cold stress samples, as confirmed by cleavage of miR2275, i.e. 3.16% (45 out of 1421 in SRR3680677) and 9.65% (38 out of 1203 in SRR3680678). This finding provided evidences that miR2118 and miR2275 could mediate the cleavage of PHAS transcripts in wheat. The cleavage information of the degradome for the PHAS loci were listed in Supplementary Table 8-10.
The Expression Level of PHAS Loci
The generation of phasiRNAs may depend on the expression of their precursors. To detect the expression of the PHAS precursors, we downloaded the RNA-seq datasets from the GEO database, including the reproductive tissues at DR, FM, AM and TS. The PHAS precursors were mostly expressed in the AM and TS stages, and only a few of them were expressed in the DR and FM stages (Figure 8A). It was coordinated with the expression of phasiRNAs. Furthermore, most of their expression levels were very low. Only 434 and 289 of PHAS with the expression level were more than one RPKM in the AM and TS stages of young spikes, respectively, and 175 of them were overlapped between the two stages (Figure 8C). This finding indicated the specific expression of PHAS loci in different developmental stages of young spikes.
PHAS transcripts were expressed specifically in the reproductive tissues, but whether they responded to abiotic stress was still unknown until now. Taking cold stress as an example, using the transcriptomes with polyA in the DR and AM stages of young spike samples including 0 (control), 6, 12, 24 and 48 h (hours) after cold stress at 0°C, we identified the expression of these PHAS loci. In the cold stressed samples, most of the PHAS loci were also mainly expressed in the AM stage, and the expression level also showed a difference (Figure 8B). Using the DEseq program [31] with the fold change (stressed vs control) more than two times and adjusting the P-value less than 0.01, 102, 104, 133 and 144 of PHAS were characterized as differentially expressed PHAS transcripts at 6, 12, 24 and 48 hours after cold stress at 0°C, respectively. For the differentially expressed PHAS, a total of 66.67% (68 out of 102), 73.08% (76 out of 104), 74.43% (99 out of 133) and 68.06% (98 out of 144) were common at the four stressed time points (Figure 8D). These results indicated that the PHAS transcripts responded to abiotic stress, such as cold stress.