Identication and Comparative Proling of microRNAs at Different Diapause Stages of Galeruca Daurica Adults

Background: MicroRNAs (miRNAs) are a class of small noncoding RNAs of approximately 22 nt in length, which regulate gene expression at the post-transcriptional level. Although the regulatory roles of miRNAs in various physiological processes throughout insect development have been investigated, it is almost unknown about the roles of miRNAs involved in the regulation of diapause in insects. Results: We constructed 12 small RNA libraries from Galeruca daurica adults at different diapause stages: pre-diapause (PD), diapause (D), post-diapause 1 (TD1), and post-diapause 2 (TD2). Using Illumina sequencing, a total of 95.06 million valid reads was obtained, and 230 miRNAs, including 143 conserved and 87 novel miRNAs, were identied from G. daurica. The expression proles of these miRNAs were assessed across different diapause stages and miRNAs that were highly expressed at different diapause stages were identied. Comparative analysis of read counts indicated that both conserved and novel miRNAs were differently expressed among the four different diapause stages, and the differential expression was validated via qRT-PCR. The 25, 11, 15, 14, 26, and one miRNAs were differentially expressed in D/PD, D/TD1, D/TD2, TD1/PD, TD2/PD, and TD2/TD1, respectively. The KEGG and GO analysis of the predicted target genes suggested the essential roles of miRNAs in the regulation of summer diapause in G. daurica, especially via the juvenile hormone, ribosome, MAPK signaling, mTOR signaling, Ca 2+ signaling, and G-protein coupled receptor signaling pathways. Conclusion: Our research results indicate that miRNAs may be involved in the regulation of summer diapause in G. daurica, and these results also provide an important new small RNA genomics resource for further studies on insect diapause.


Background
Diapause is a programmed dormancy that allows organisms to tolerate predictable periods of unfavorable conditions by temporarily stopping development and reducing metabolism [1]. Understanding the mechanism of diapause is essential not only for predicting the geographic and seasonal distributions of insects but also for the development of effective pest management strategies [2]. Although physiological and biochemical aspects of diapause are well understood and a growing body of literature reports the changes in gene expression that drive the diapause program in insects, many details of the regulatory networks remain largely unknown. One particular gap in our understanding of diapause is how regulatory noncoding RNAs regulate diapause [3].
Galeruca daurica (Coleoptera: Chrysomelidae) is a new pest on the grasslands of Inner Mongolia, China. Since its sudden outbreak in 2009, this pest has continually spread and caused great losses to pasture on the Inner Mongolia grasslands [30,31]. The leaf beetle occurs one generation each year, and adults survive summer as an obligatory diapause form. Summer diapause ends about three months later in autumn, and adults start to feed again, mate and lay eggs which overwinter under cow dung, stone and thick grass [32]. However, the regulation mechanism of summer diapause in this pest has little known until now. Chen et al. investigated the biochemical change in G. daurica adults during oversummering [32]. Ma et al. compared the change of proteomic levels in adults at the pre-diapause, diapause and post-diapause stages using iTRAQ, and found a large number of differentially-expressed proteins involved in metabolic process, stress response, cytoskeletal reorganization, and phagosome pathway [30]. Moreover, juvenile hormone regulation and Ca 2+ signaling may play an important role in the regulation of summer diapause in G. daurica. Our present research aims to discover potential contributions of miRNAs in regulating summer diapause in G. daurica adults.

Results
Analysis of sRNA data from libraries In order to obtain the microRNAs regulating summer diapause in G. daurica adults, 12 sequencing libraries were constructed from the adult samples at different diapause stages (PD, D, TD 1 , and TD 2 ). The raw data have been deposited in the National Center for Biotechnology Information (NCBI) Short Read Archive under BioProject ID PRJNA660157 (Accessions SRR12545577-SRR12545588). A total of 170.14 million raw reads were generated, ranging from 27.42 to 10.01 million reads from each library (Table S1). After removing 5′ and 3′ adapter sequences, low-quality reads, and RNAs less than 18 nucleotides and longer than 25 nucleotides, the remaining reads were searched against the Rfam database (http://rfam.janelia.org) and the Repbase database (http://www.girinst.org/repbase). Thereafter, sRNAs were classi ed into different categories according to their annotations as 3′ adapter (ADT) and length lter, junk reads, Rfam, repeats, rRNA, tRNA, snRNA, and snoRNA sequences, and other Rfam RNA sequences (Table S1). Finally, the total number of valid sequences was reduced to 95.06 million (Table S1). The number of unique sRNAs ranged from 2.16 to 0.82 million among the libraries (Table S2). The length distribution of unique sRNAs showed a bimodal pattern with peaks occurring at 21-22 and 25-26 nt (Table S2).
After the 83 miRNAs with very faint expression (reads <10) were excluded, the expression pro ling of remaining 147 miRNAs was analyzed. The heatmap showed the various expression patterns of miRNAs of G. daurica adults in different developmental stages, which could be further divided into 8 clusters (Fig. 1, Table S5). The numbers of miRNAs varied greatly in different clusters. The most abundant cluster D contained 39 miRNAs whereas the smallest cluster C just included four miRNAs.
Interestingly, the highly expressed miRNAs were much less during diapause (D) than those in the other stages (PD, TD1, and TD2), and only cluster A were highly expressed during diapause (D), including bmo−mir−6497−p3_1ss3AG, bmo−mir−6497−p3_1ss12AG, ame−miR−3759−3p_L−4R−2_1ss5CT, bmo−miR−2779_L−1_1ss2TA, bmo−miR−6497−5p_L−3_1ss11CT, bmo−miR−2779_L−2_1ss20AG, PC−5p−117266_63, PC−3p−361832_14, bmo−mir−6497−p5_1ss4AG, bmo−miR−2779_L−1R−1_1ss2TA, and bmo−mir−6497−p5_1ss18CT. The expression abundance of cluster B was highest in the TD2 stage, middle in the PD stage, and lowest in the D and TD1 stages. The clusters C were highly expressed in the TD2 stage, secondly in the TD1 stage, and lowest in the D and PD stages. The cluster D were highly expressed only in the TD1 stage while moderately or lowly in the other stages. The cluster E displayed high expression only in the PD stage while low expression in the other stages. The cluster F had the highest expression level in the PD stage, second in the TD2 stage, third in the TD1 stage, and lowest in the D stage. The ranking of expression abundance for the cluster G was TD1>PD>TD2>D, and PD>TD1>TD2>D for the cluster H.
Veri cation of differentially expressed miRNAs by qRT-PCR To con rm the results of small RNA sequencing, ten differentially expressed miRNAs were selected and the qRT-PCR analyses of ten miRNAs were performed (Fig. 5). Except for miR-970, nine out of ten miRNAs showed similar expression patterns as those revealed by small RNA sequencing, indicating the reliability of our small RNA sequencing data.

Discussion
In the present study, to explore characteristics and functions of miRNAs from G. daurica, 12 libraries from four developmental stages of adults were constructed. In total, 95.06 million valid reads were obtained, indicating G. daurica is rich in small RNAs. The length distribution of our small RNA libraries displayed a bimodal pattern with peaks occurring at 21-22 and 25-26 nt, similarly to previous studies in N. lugens [33], B. mori [34], M. destructor [16], and Bactrocera dorsalis [35], which have two peaks occurring at 22 and 25-27 nt. In contrast, many other species, such as L. migratoria [8], A. albopictus and C.quinquefasciatus [36], L. striatellus [25] and P. interpunctella [26] have unimodal patterns of length distributions with a peak occurring at 22 nt.
In the present study, the down-regulated differentially expressed miRNAs (20) were much more than the up-regulated differentially expressed miRNAs (5) during diapause (D/PD), which was in accord with our iTRAQ proteome data that the differentially overexpressed proteins (82) were more than the differentially underexpressed proteins (57) during diapause [30], because miRNA abundance has most often been negatively correlated with expression of its target genes. Reynolds et al. also got similar results that the majority of differentially expressed miRNAs (8) were down-regulated during diapause and only two miRNAs were up-regulated in diapausing pupae than in their non-diapausing counterparts in S. bullata [3].
Developmental arrest is a de ning feature of diapause, and many miRNAs have been implicated in regulation of developmental processes including developmental timing (let-7) and cell cycle progression (miR-100) [37,38]. Reynolds et al. indicated that let-7 abundance increased signi cantly 48 h post-diapause in S. bullata, and let-7 and miR-100 may be important regulators of developmental arrest in diapausing pupae [3]. In this study, let-7-5p_R+1 and miR-100_R-1 were also signi cantly up-regulated after diapause termination (TD2/D) while they were signi cantly down-regulated during diapause (D/PD) (Table 1). Therefore, these two miRNAs might play regulatory roles in insect diapause.
In many instances, miRNAs are essential to regulate the expression of their target genes to optimal levels [39]. Therefore, it is important to determine the target genes of miRNAs in order to understand their biological functions. In our study, a lot of miRNA target unigenes were predicted from the transcriptomes of G. daurica adults at different diapause stages. Although the accurate targets of these miRNAs are still unknown and much more work will be needed to con rm these exact target genes, these results could also provide us valuable information about the roles of miRNAs in the diapause regulation of G. daurica adults. Juvenile hormone (JH) is produced and secreted by the corpora allata in insects, and regulates various aspects of their developmental and reproductive processes [40]. The absence of JH induces adult diapause [41,42]. Hao et al. found the four differentially regulated genes involved in JH biosynthesis or degradation during summer diapause in D. antiqua [43]. During diapause maintenance in A. albopictus, three genes related to JH binding and metabolism were also differentially expressed [44]. The transcript encoding juvenile hormone epoxide hydrolase (JHEH) was signi cantly increased during adult diapause in Coccinella septempunctata [45]. Our proteomic analysis found that the juvenile hormone binding protein (JHBP) levels increased signi cantly in G. daurica adults developing from the PD to TD stages, whereas the juvenile hormone esterase (JHE) levels decreased [30]. qRT-PCR indicated that GdJHBP was expressed in various developmental stages of G. daurica with the highest expression in larvae and the lowest in eggs and pupae. Expression of GdJHBP was relatively low during summer diapause of G. daurica adults and was higher both before and after diapause [46]. The 150 putative target unigenes for 39 DEMs are involved in the regulation of juvenile hormone synthesis and catabolic process, such as juvenile hormone esterase, juvenile hormone acid methyltransferase, juvenile hormone binding protein5p2, juvenile hormone epoxide hydrolase-like protein 3, putative juvenile hormone-inducible protein, methoprene-tolerant isoform X1, and Kruppel-like protein 1 (Table S8). The main miRNAs targeting these genes included dpu-miR-100_R-1, tca-let-7-5p_R+1, aga-miR-13b, pca-miR-1175-5p_1ss20AT, tca-miR-11-3p, tca-miR-263a-5p_R-5_1ss10GA, tca-miR-277-3p_R-3, bmo-miR-9a-5p_R-2, tca-miR-2788-5p, tca-miR-2944c-3p_R+1, tca-miR-34-5p_R+1, and PC-3p-32095_277.
Ribosomal proteins are a kind of important proteins which are involved in the functions of protein synthesis, cellular metabolism, organism immunity, signal transduction and etc [47]. Robich et al. rstly found that RpS3a was signi cantly downregulated during diapause in Culex pipiens [48]. Kim et al. further found that RpS3a was consistently expressed in nondiapausing females of C. pipiens, but in females programmed for diapause, expression of the RpS3a transcript was dramatically reduced for a brief period in early diapause [49]. RNA interference against RpS3a in nondiapausing females stopped follicle development, mimicking the diapause state. Another ribosomal protein gene, RpS2, was also expressed continuously in nondiapausing females reared under long-day conditions, while it was strongly down-regulated 5-18 days after adult eclosion in females reared under the short-day conditions that induce diapause, and an exogenous application of JHIII could rescue the arrest in follicle growth caused by dsRpS2 [50]. These results suggest that ribosomal proteins may play a critical role in arresting the ovarian development associated with adult diapause. In our study, 184 putative target unigenes encoding ribosomal proteins for 49 DEMs were found (Table S9). The miRNAs targeting RpS2 included ame-miR-316-5p_R-1_1ss21CG, bmo-miR-9a-5p_R-2, tca-miR-87b-3p_R-2, and PC-3p-32095_277 while the miRNAs targeting RpS3a contained bmomir-6497-p5_1ss4AG, tca-miR-277-3p_R-3, ame-miR-316-5p_R-1_1ss21CG, bmo-miR-9a-5p_R-2, PC-5p-69765_116, and PC-3p-545_10300. Our study showed that GdRpS3a was expressed in different developmental stages with the highest expression levels after termination of adult diapause and the lowest during the egg stage and adult diapause of G. daurica (unpublished). Next, we will furtherly study the roles of these miRNAs in the regulation of summer diapause in G. daurica adults.

Conclusions
A total of 230 miRNAs was identi ed from G. daurica, and the expression analysis results demonstrated that the transcript levels of these miRNAs were changed in different developmental stages of G. daurica adults. The results of target prediction suggest that miRNAs may be involved in the regulation of summer diapause of G. daurica, especially via the juvenile hormone, ribosome, MAPK signaling, mTOR signaling, Ca 2+ signaling and G-protein coupled receptor signaling pathways, and these results provide an important new small RNA genomics resource for further studies on insect diapause.

Methods
Insect rearing, sampling, and sample preparation Eggs were collected from the Xilinguole grasslands of Inner Mongolia, China, on April 20, 2019. Egg collection was not necessary to be permitted. Eggs were incubated (RH: 70 ± 5%; temperature: 25 ± 1°C) until hatching. Larvae were transferred to natural conditions, and reared on Allium mongolium. As Ma et al. described [30], sampling was conducted at 3, 40, 90, and 100 days after adult emergence to obtain adult samples at the pre-diapause (PD, before diapause), diapause (D, during diapause), post-diapause (TD1, diapause terminated), and post-diapause (TD2, diapause terminated and the female abdomen expanded, indicating that the ovary began to develop) stages. Each sample (10 females and 10 males) for each treatment (PD, D, TD1 and TD2) with three replicates was ground in liquid nitrogen, and stored at -80°C until sRNA-seq and quantitative real-time PCR (qRT-PCR) was performed.
Small RNA isolation, cDNA library construction and deep sequencing Total RNA of each treatment was isolated using TRIzol Reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer's protocols. The quantity of the total RNA was accessed by NanoDrop™ 2000 spectrophotometer (Thermo Fisher, Waltham, MA, USA), and the integrity of the RNA samples were evaluated using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and only values of 28S/18S ≥ 0.7 and RIN ≥ 7.0 were considered quali ed for the subsequent small RNA library construction. The total RNA samples were size-fractionated, and 10-30 nt long small RNAs were isolated through gel separation. These small RNAs were puri ed and then ligated with 3′ and 5′ adapters. The fragments with adaptors on both ends were enriched by PCR after reverse transcription. After puri cation, the PCR ampli cation products were sequenced by the Lc-Bio Technologies Company, Ltd. (Hangzhou, China) using the Illumina Hiseq 2500 platform.
Sequence analysis, miRNAs identi cation, and data processing Raw reads were subjected to an in-house program, ACGT101-miR (LC Sciences, Houston, Texas, USA) to remove adapter dimers, junk, low complexity, common RNA families (rRNA, tRNA, snRNA, snoRNA) and repeats. Subsequently, unique sequences with length in 18-26 nucleotide were compared with known insect miRNAs in miRBase 22.0 by BLAST search to identify known miRNAs and novel 3p-and 5p-derived miRNAs. Length variation at both 3' and 5' ends and one mismatch inside of the sequence were allowed in the alignment. The unmapped sequences were BLASTed against the G. daurica transcriptome, and the hairpin RNA structures containing sequences were predicated from the ank 80 nt sequences using RNAfold software (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). The criteria for secondary structure prediction were: (1)  Data normalization followed the procedures with minor modi cation [62]. (1) Find a common set of sequences among all samples; (2) Construct a reference data set. Each data in the reference set is the copy number median value of a corresponding common sequence of all samples; (3) Perform 2-based logarithm transformation on copy numbers (Log2 (copy#)) of all samples and reference data set; (4) Calculate the Log2 (copy#) difference (∆Log2 (copy#)) between individual sample and the reference data set; (5) Form a subset of sequences by selecting |∆Log2 (copy#)| < 2, which means less than 4 (2 2 ) fold change from the reference set; (6) Perform linear regressions between individual samples and the reference set on the subset sequences to derive linear equations y = a i x + b i , where a i and b i are the slop and interception, respectively, of the derived line, x is Log2 (copy#) of the reference set, and y is the expected Log2 (copy#) of sample i on a corresponding sequence; (7) Calculate the mid value x mid = (max(x)-min(x))/2 of the reference set. Calculate the corresponding expected Log2 (copy#) of sample i, y i,mid = a i x mid + b i . Let y r,mid = x mid , let ∆y i = y r,mid -y i,mid , which is the logarithmic correction factor of sample i. We then derive the arithmetic correction factor f i = 2 ∆yi of sample i; (8) Correct copy numbers of individual samples by multiplying corresponding arithmetic correction factor to original copy numbers.

Analysis of differentially expressed miRNAs
Differential expression of miRNAs based on normalized deep-sequencing counts at different diapause stages was analyzed by using Student t-test. The signi cance threshold was set to be 0.05 in each test. miRNAs with P-value<0.05 between different stages (D vs PD, TD 1 vs D, TD 2 vs D, TD 1 vs PD, TD 2 vs PD, and TD 2 vs TD1) were considered as differentially expressed miRNAs (DEMs).

Prediction of target genes of miRNAs
The miRNA sequences were aligned against the G. daurica transcriptome database assembled in our lab (Accession number: SRP147970) to identify possible target sites by using the TargetScan 50 and Miranda 3.3a software packages [63]. The default parameters of miRanda were set score and energy threshold of 140 and −10 kcal/mol, respectively, which demanded strict 5′seed region (nucleotide 2-8 from the 5′end of miRNA sequence) pairing as well. The default parameters of TargetScan were that the 2-8 nt sequences of 5′ miRNA were selected as seed region to predict with 3′sequence of transcripts. Finally, the target genes simultaneously recommended by the two algorithms were de ned as the nal prediction.
Functional analysis of target genes To better understand the function of the target genes and their corresponding metabolic network at different diapause stages of G. daurica adults, annotations of target gene functions were performed using the Gene Ontology (GO, http://www.geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/) pathway databases. The most abundant DEM targets were annotated based on sequence similarity by performing a BLASTx search against the GO protein database. Furthermore, target genes were categorized according to their functions under biological processes, molecular functions, and cellular components using GO analysis. The enriched metabolic pathways or signal transduction pathways of potential miRNA target genes were validated using KEGG enrichment analysis to enrich the KEGG terms. Fisher's exact test was employed to lter the signi cant GO terms and KEGG pathways using R software 3.3.1., and P < 0.05 was considered statistically signi cant.

Validation of miRNA expression pro les
The RNA samples for small RNA library were also used for qRT-PCR to con rm the reliability of the sequencing results. For the analysis of miRNAs, cDNA was synthesized using the Mir-X miRNA First-Strand Synthesis Kit (TaKaRa, Dalian, China) according to the manufacturer's instructions. The qRT-PCRs were performed on FTC-3000 (Funglyn Biotech, Canada) using the GoTaq® qPCR Master Mix (2×) (Promega, USA). The reaction pro le was as follow: starting with a 10 min incubation at 95℃ to activate the hot-start polymerase followed by 40 cycles at 95 ℃ for 15 s and 60℃ for 1 min, and then at 95℃ for 15 s, at 60℃ for 15 s and at 95℃ for 15 s. We performed three biological replicates (4 females and 4 males for each replicate) and four technical replications of each reaction, and U6 snRNA was used as an internal control gene for qRT-PCR of miRNAs. The relative expression values of the randomly chosen DEMs were calculated based on the 2 -∆∆Ct method [64]. The primers were designed by using the Primer Premier 5.0 (http://www.premierbiosoft.com/primerdesign/index.html) (Table S11)    The number of differentially expressed miRNAs at different diapause stages of G. daurica adults. The differentially expressed genes were analyzed by comparing D to PD, TD1 to PD, TD2 to PD, TD1 to D, TD2 to D, and TD2 to TD1 based on a P-value of < 0.05 by t-test.