Transcriptome sequencing of m6A modified Bombyx mori
The ovarian tissues of day-3 pupae of bivoltine strain, QiuFeng were collected from QFHT and QFLT group respectively, and 3 replicates were designed for each group. And their transcriptomes were sequenced according to the Input and IP classification. On average, about 43000000 raw reads were obtained from each Input sample, and about 40000000 raw reads were obtained from each IP sample. After filtering out low-quality data, the Input has a retention rate of 99.9% and the IP reads retention rate is approximately 95% (Additional File 1 and File 2: Figure S1 and Table S1). To visually display the data quality, analysis of the composition and quality distribution of bases was performed. The more balanced and higher quality the base composition is, the more accurate subsequent analysis will be (Additional File 3: Date S1 and Data S2). High quality clean reads were aligned to the ribosomal database using the alignment tool Bowtie2 to remove reads from ribosomal RNA (Additional File 2: Table S2). Finally, statistically valid reads for the silkworm genome data, the results show that the average valid comparison rates of the three replicates of Input-QFHT and IP-QFHT samples are about 64.6% and 64.0%, respectively, and that of Input-QFLT and IP-QFLT samples are about 68.7% and 67.0% (Additional file 2: Table S2), separately. The unique and de-duplicated reads were compared to each chromosome (positive and negative chains) on the genome, and the density was statistically mapped, so that the relationship between chromosome length and the number of reads compared were seen more intuitively (Additional file 4: Data S1).
The number of peaks, total peak length, average peak length, the number of reads in the peak as a percentage of the total number of reads in all alignments (FRiP) and the peak genome alignment rate were analyzed for the number of peaks performed in a single sample by intra-group sequencing. There is a good consistency in the data between single samples in each group (Additional file 2: Table S4). Statistical analysis of the number of peaks at different depths shows that the proportion of peak number of IP-QFHT in total peak is slightly higher than that of IP-QFLT in sequencing depth of 5-50 regions (Additional File 5: Data S1). The unique peak related reads obtained were analyzed and the number of reads to different regions of the transcript was counted. The results show that the relative transcript numbers of IP-QFHT reads in the 5'UTR and 3'UTR regions were higher (Additional file 6: Data S1). Further statistics on the distribution and width distribution (indicating the length of protein binding sequences) of peak on chromosomes showed that the peak number of m6A modification of IP-QFLT on chromosomes was slightly higher than that of IP-QFHT. In addition, the peak number of IP-QFLT is also higher in the protein-bound sequence length range of 100-300(Additional file 7: Data S1 and Data S2). Finally, the distribution of fold enrichment and q value of peak was analyzed, and the results showed that IP-QFLT was more significant both in enrichment multiple and significance degree, indicating that the peak expression of IP-QFLT was higher than that of IP-QFHT samples (Additional File 8: Data S1 and Data S2).
Filter peak among repeated samples within the group, and retain common peak which overlap > 50%. Statistical results show that the peak number of IP-QFLT is far more than that of IP-QFHT samples (Additional File 2: Table S5).
Distribution and enrichment of m6A modification
The peaks among three replicates in the same group with overlap>50% and co-occurring in at least two samples were filtered, and the statistical results showed that the number of peaks of IP-QFLT far exceeded that of IP-QFHT samples (Additional file 2: Table S5). In order to understand the difference of m6A modification in transcripts, the distribution number of peak on different gene functional elements such as 5'UTR, start codon, CDS, stop codon and 3'UTR in each group of samples was counted. Enrichment score of peak on functional elements of different genes was calculated for chi-square test with the formula : Enrichment score = n/(n× P)[5] (n: peak number on functional elements of each gene; N: total number of peak samples; P: proportion of genome length for each classification)[36]. The results showed that peak of QFLT and QFHT differed significantly in the distribution of functional elements and rich concentration. In terms of distribution, the proportions of QFLT and QFHT on functional components from high to low are CDS, start codon, stop codon, 5 'UTR and 3' UTR. The m6A modification in QFLT was more likely to distribute on CDS, and that in QFHT was more likely to distribute on the other gene functional elements except CDS (Figure 1A and 1B). In terms of enrichment, the highest enrichment rate of QFLT and QFHT was in the start codon area, followed by the stop codon area. But in the comparative analysis of the two groups of samples, the enrichment of QFLT in the CDS region was significantly higher than that of QFHT (Figure 1C and 1D). The m6A modification of QFLT and QFHT samples was highly similar in both distribution and enrichment of functional elements, indicating that m6A methylation modification was conservative in the silkworm. However, there were significant differences in the distribution and enrichment ratio of the same functional elements between the two groups, suggesting that m6A modification may have different effects on gene expression regulation in different diapause traits, and may have a potential regulation effect on diapause traits.
Peak combination between groups and multiple clusters
Principal component analysis (PCA) and correlation heat map method were used to conduct pair-way correlation analysis for all 6 samples from the two groups. The results showed that the peak data were significant different between the QFLT and QFHT groups, and the peak data difference within either group was relatively consistent, further indicating that the data can be used for subsequent analysis (Additional file 1: Figure S2 and S3). The peak filters of replicate samples between groups were merged and a Venn diagram was used to show the presence or absence of RNA methylation modifications between groups. Results showed that 659 peaks were unique to QFHT and 1563 peaks were unique to QFLT, with a total of 1,984 peaks (Figure 2A). According to the statistics on the number of peaks on related genes, more m6A modifications occurred in QFLT samples than in QFHT (Figure 2B). Motif analysis of MEME (detected 8~30 bp) and Dreme (detected ≤ 8 bp) in two different length ranges using the MEME Suite method showed that the motifs prefer “TCACT” sequences which is similar to the ubiquitous "RRACH" motifs in other species (Figure 2C), indicating that the motifs recognized by m6A modifications in silkworm were highly conserved and supported the presence of methylation modification mechanisms during silkworm development.
Difference of RNA methylation modification level between two groups
There were differences in the genes with m6A methylation modification between the two different treatment groups. Between groups, peaks without overlaps are defined as unique peaks, and peaks with overlaps as shared peaks. Venn diagram was used to show the relationship between groups with or without RNA m6A methylation modification. The analysis showed that IP-QFHT had 659 peaks mapped to 537 genes, and IP-QFLT had 1563 peaks mapped to 1042 genes. There were totally 1,884 peaks mapped to 1,275 genes in two groups (Figure 3A).
QFHT-specific peaks were enriched in 212 GO signaling pathways, of which GO_BP (biological process): 105, GO_CC (celluLar componen): 43, GO_MF (molecuLar function): 64 (Figure 3B); It mainly focuses on cellular process, metabolic process, individual development process, biological process regulation and etc. 249 KEGG signaling pathways (Figure. 3C) accounted for the largest proportion of environmental signal processing and signal transduction pathways, were enriched to 56 genes; followed by signaling molecules and interactions and membrane transport; In the cell process, the enrichment of cell growth and death was relatively more, followed by transport and catabolic pathways. In the body system, the enrichment from high to low were metabolic system, immune system and nervous system, respectively. The rest were concentrated in metabolic signaling pathways. In the process of genetic information, QFHT was more focused on translation, folding and degradation.
QFLT-specific peaks were enriched in 335 GO signaling pathways (Figure 3D), of which GO_BP: 105, GO_CC: 43, GO_MF: 64; mainly concentrated in cellular processes, metabolic process, individual process, biological process regulation and other pathways. Similar to QFHT, 280 KEGG signaling pathways (Figure 3E), in the environmental signal processing class, accounted for the largest proportion of signal transduction pathways followed by signal molecules and interaction signaling pathways, but the number of genes enriched in both was more than twice that in QFHT. In cell process, QFLT enrichment in cellular immune pathway was the most abundant compared with QFHT, followed by catabolism, and finally cell growth and death pathway. In the body system, the top three pathways of QFLT enrichment are metabolic, immune and nervous system, which are the same as QFHT, but the number of genes enriched is nearly three times that of QFHT. In the process of genetic information, QFLT focuses more on folding and degradation. These results indicated that the peak specific to QFLT and QFHT was significantly different and the m6A modified genes had a wider regulatory range. Compared with QFHT, the number of m6A modified gene transcripts occurred in QFLT was larger and involved in a broader depth of regulation.
The common peaks can be matched to 1275 genes, which can be enriched into 478 GO signaling pathways (Figure 3F), including GO_BP: 355, GO_CC: 33, GO_MF: 90. It mainly focuses on cellular process, metabolic process, individual process, biological process regulation and other pathways. There are 291 KEGG signaling pathways (Figure 3G), which mainly focus on metabolism, genetic information processing, environmental signal processing and cellular processes. The top 20 KEGG enrichment pathways mainly include PI3K-Akt signaling pathway, hippopotamus signaling pathway and MAPK signaling pathway, etc.
Analysis of differences in RNA methylation rates between two groups
MeRIP data and Input data were used to calculate the relative methylation rate of each peak. then the RNA methylation rate differences analysis filter peak using exomePeak at FDR < 0.05 and | log2FC | > 1. GO and KEGG functional enrichment analysis was performed on genes related to differential peak.
The results showed that compared with QFHT, there were 9 down-regulated peaks and 92 up-regulated peaks in QFLT, which could clearly show the significance of differential expression in the volcano map (Figure 4A). MeRIP-qPCR verification was performed for 10 genes with differentially expressed genes, and the results showed slightly different log2FC values from the sequenced, but the whole up-regulation or down-regulation trend was consistent (Additional file 2: Table S6). the 9 down-regulated peaks were matched to 4 genes, and no enrichment was found; 92 up-regulated peaks were mapped to 64 genes (Figure 4B). 19 genes were enriched in 43 KEGG signaling pathways (Figure 4C), and enriched in 41 GO signaling pathways (Figure 4D), which were GO_BP:23, GO_CC:4, GO_MF:14. Enrichment analysis indicated that these differentially methylated genes were mainly related to signal transduction. This result suggests that in QFLT (diapause egg producer) m6A modification involves in signal transduction in the pupal ovary through the occurrence of more m6A modifications, thereby affecting the phenotype of diapause in progeny.