Molecular dominance investigation for large-sized parents of Chinese Mitten Crab ( Eriocheir sinensis ) based on ovarian transcriptome

Background: Germplasm degradation is one of the important issues for the healthy development of Chinese Mitten Crab (Eriocheir sinensis) industry. Nowadays, nursery units and crab farmers have realized the seedling superiority of large-sized parents ( ≥ 200g male, ≥ 150g female), but the underlying molecular mechanisms remain unclear. Here we investigated the advantage of breeding with large-sized parents based on ovarian transcriptome. Results: RNA isolation and transcriptome sequencing were performed using ovarian tissue obtained from three large-sized female E. sinensis and three medium-sized female E. sinensis. The differentially expressed genes (DEGs) were investigated between two groups, followed by the exploration of common DEGs (co-DEGs) among the DEGs in the current study and published genes for growth, reproduction, and immune response pathways. Quantitative real-time polymerase chain reaction (qRT-PCR) was performed to validate all co-DEGs. Finally, a pathway enrichment analysis was performed to investigate the potential pathways associated with the validated co-DEGs. A total of 5,307 up-regulated genes and 3,465 down-regulated genes were investigated between the two groups, of which 43 co-DEGs were explored when all the DEGs in the current study were integrated with 338 genes in 15 reproductive, immune, and growth pathways. A total of nine co-DEGs, including TRINITY_DN13931_c0_g1, TRINITY_DN1908_c0_g2, and TRINITY_DN6686_c0_g1, were enrolled for further study after qRT-PCR validation. These validated co-DEGs were mainly enriched in pathways such as apoptosis, insulin signaling, and mTOR signaling. Conclusion: Our original work might lay an important molecular foundation for further exploring the advantages of using large-sized parents for E. sinensis breeding in the future. Based on the comparative analysis of gene expression, the advantages of breeding large-sized female E. sinensis parents at the molecular level, such as growth, reproduction, and immunity, were analyzed for the rst time. The apoptosis pathways, insulin signaling pathways, and mTOR signaling pathway might be involved in the advantage of breeding large-sized female E.

1. Molecular advantage of large-sized female sinensis parent investigated 2. Apoptosis, insulin, and mTOR pathways related to large-sized advantage 3. Genes like TRINITY_DN13931_c0_g1 were vital for large-sized sinensis breeding 4. 8,772 differentially expressed genes between large-sized and medium-sized crabs Background The Chinese mitten crab Eriocheir sinensis is a traditional aquatic treasure in China [1] . It is considered a luxury aquatic food for Chinese people due to its delicate avor [2]. The arti cial culture of E. sinensis has become both a leading and pillar industry in the cultivation of new freshwater varieties in China, and plays an important role in promoting the sustainable and healthy development of freshwater aquaculture in China [3]. High quality germplasm resources are bene cial for improving the breeding and economic effects of E. sinensis [4], but the degradation of germplasm resources in recent years is one of the most critical factors restricting its production. For example, to reduce broodstock costs, nearly all commercial hatcheries in China select small or medium-sized E. sinensis (females: 60-100 g/ind.; males: 80-150 g/ind.) as parents [5].
Parents with excellent characteristics contribute to solving the problem of germplasm degradation in E. sinensis [6]. At present, aquaculture and scienti c research have increasingly recognized the in uence of parental size on crab production [7,8]. The selection of large E. sinensis female: ≥ 150 g/ind.; male: ≥ 200 g/ind. as parents for group selection is a key measure for puri cation and rejuvenation [9]. It was proven that the fecundity, egg holding capacity, and spawning capacity of offspring increased with the increase in parents' size [7,8]. Meanwhile, the offspring of large-sized parents result in a higher proportion of large-sized market crabs and higher economic bene ts [10]. Moreover, a previous study showed that the growth characters and nutritional status in the juvenile phase (coin size) of the offspring of largesized parents were better than those from medium-sized parents [11]. Although genome sequencing, assembly, and annotation provide a valuable resource for understanding the molecular expression of animal growth and development processes, obtaining the genome sequence of any crab species is very di cult [12].
Transcriptome analysis is an effective method for evaluating gene expression and its biological function [13]. A previous study showed that transcriptome pro ling of the testes during the reproduction in E. sinensis can be realized using Illumina sequencing [14]. The molecular mechanism of the process of growth, immunity, and reproduction in E. sinensis has become one of the main research directions in this eld [15,16]. By comparing the transcripts from the ovarian library, a previous transcriptome analysis successfully identi ed gonadal-associated genes, which was helpful in understanding the regulatory mechanism related to the reproductive pathway [17]. The immune response during the development of E. sinensis was also explored at the RNA expression level [18]. However, knowledge of the underlying molecular mechanisms associated with growth, reproduction, and immune responses in the regulation of large-sized E. sinensis is still very limited. To understand the advantage of using large-sized E. sinensis as parents at the molecular level in more detail, the regulatory pathways associated with the growth, reproduction, and immune response of E. sinensis need to be studied.
In this study, tissues were obtained from the ovaries of large E. sinensis and medium-sized E. sinensis, followed by RNA isolation and transcriptome sequencing. The differentially expressed genes (DEGs) were investigated based on the sequencing data between the two groups. Then, the common DEGs (co-DEGs) were explored among the DEGs identi ed in the current study and the genes in growth, reproduction, and immune response pathways that have been previously published. All co-DEGs were validated by quantitative real-time PCR (qRT-PCR). Finally, a pathway enrichment analysis was performed to investigate the potential pathways associated with the validated co-DEGs in the large E. sinensis. This study hopes to reveal the advantage of using large-sized E. sinensis parents in breeding from a molecular perspective.

Correlation analysis between samples
After quality control and data pre-processing, the sequencing data were enrolled for the correlation analysis between the F-O-D group and F-O-Z group to test the reliability of the experiment and reasonable sample selection. The PCA analysis showed that the samples between the two groups were distributed in a decentralized way, while the samples within groups were clustered (Fig. 1). The correlation clustering heatmap showed that the correlation of samples enrolled in this study was high (Fig. 2).
Differentially expressed genes analysis based on sequencing data Based on the sequencing data, a total of 5,307 up-regulated genes and 3,465 down-regulated genes were revealed between the F-O-D group and F-O-Z group. The volcano plot for the result of the DEGs analysis is shown in Fig. 3A, which directly indicates the relative expression values of genes among samples in different groups. The scatter plot for all these DEGs is shown in Fig. 3B. Furthermore, the results of the cluster analysis for gene expression patterns showed that the expression of genes in the same sample was aggregated, and the difference was obvious among the samples (Fig. 4).

Co-differentially expressed genes investigation and quantitative real-time polymerase chain reaction veri cation
Based on the reference retrieval, there were seven reproduction-associated pathways (retinol metabolism, cell cycle, mTOR signaling pathway, Wnt signaling pathway, insulin signaling pathway, ovarian steroidogenesis, and progesterone-mediated oocyte maturation), four growth-associated pathways (mTOR signaling pathway, PI3K-Akt signaling pathway, Wnt signaling pathway, and TGF-beta signaling pathway), and seven immune-associated pathways (cell cycle, lysosome, endocytosis, phagosome, apoptosis, cell adhesion molecules (CAMs), and natural killer cell mediated cytotoxicity) screened out. A total of 338 genes enriched in these 15 pathways (after removing overlapping pathways) were explored. Combined with the DEGs investigated in the current study, a total of 43 DEGs related to reproductive, immune, and growth processes were further revealed. Finally, the qRT-PCR analysis showed that a total of nine co-DEGs (TRINITY_DN6686_c0_g1, TRINITY_DN6296_c2_g1, TRINITY_DN2026_c1_g1, TRINITY_DN1908_c0_g2, TRINITY_DN605_c0_g2, TRINITY_DN2717_c0_g1, TRINITY_DN608_c5_g1, TRINITY_DN13931_c0_g1, and TRINITY_DN35647_c0_g1) showed the same trend as the sequencing results.

Discussion
Although the advantage of using large-sized E. sinensis in actual production has been proven, the detailed molecular mechanism for this is still unclear. In this study, based on the E. sinensis ovarian transcriptome data analysis, a total of 5,307 up-regulated DEGs and 3,465 down-regulated DEGs were revealed between the large and medium-size groups. A total of 43 co-DEGs were explored when all the DEGs in the current study were integrated with 338 genes from 15 reproductive, immune, and growthassociated pathways reported in the literature. After qRT-PCR validation, a total of nine co-DEGs, including TRINITY_DN13931_c0_g1, TRINITY_DN1908_c0_g2, and TRINITY_DN6686_c0_g1, were nally investigated. The pathway enrichment analysis showed that these validated co-DEGs were mainly enriched in pathways such as the insulin signaling pathway (a reproduction-associated pathway), mTOR signaling pathway (a growth-associated pathway), and apoptosis (an immune-associated pathway).
The close relationship between insulin signaling and reproduction has already been revealed in the body development of helminths and insects [19]. Insulin signaling is an important pathway that regulates developmental and differentiation functions such as reproduction in animals [20]. Also, the control of body size involves growth regulating pathways such as insulin [21]. A previous study showed that the insulin-like androgenic gland hormone gene extensively participates in the reproduction of the female mud crab [22]. With the help of transcriptome sequencing, the multiple insulin-like peptides that participate in the insulin signaling pathway have been identi ed in crustacean species [23]. Importantly, female crabs are considered an important platform for the investigation of the biological function of insulin-like peptides in crabs [24]. A previous study indicated that the inhibitory role of insulin in vitellogenesis and oocyte maturation in the female mud crab is visualized via certain signaling pathways, such as autocrine and paracrine pathways [20]. Interestingly, the stimulation of vitellogenin expression is realized via insulin in the crab [23] . These insulin-associated proteins, including insulin-like peptides, are not only involved in ovarian development [24], but also negatively regulate glucose metabolism and participate in the immune response of E. sinensis against pathogen infection [25] .
A previous study showed that the mTOR pathway is associated with the insulin signaling pathway during the regulation of animal development, life span, and the length of larval development [26]. mTOR is a member of the phosphatidylinositol 3-kinase-related kinase family, which regulates various biological processes including growth control [27]. Since crustaceans must shed their exoskeleton periodically to develop and grow, the mTOR signaling pathway, which regulates growth and molting, has been widely studied by researchers [28]. It was proved that mTOR signaling gene expression contributes to the growth associated function in Gecarcinus lateralis [29]. The mTOR signaling genes are involved in molt regulation during the thermal acclimation of juvenile Dungeness crabs [30]. A previous study showed that mTOR integrates intrinsic signals (molting hormones) and extrinsic signals (thermal stress) to regulate molting and growth in decapod crustaceans. The molting gland is the source of steroid hormone production and the consequent regulation of the molt cycle in decapod crustaceans, and is responsive to both external environmental and internal physiological signals in the crab [28]. In crustaceans such as crabs, mTOR activity either directly or indirectly controls the transcription of genes that drive the activation of the molting gland [31].
The apoptosis pathway has been shown to participate in the immune process [32]. Recently, the effector caspases such as Sp-caspase, which exhibit an immune response and cell apoptosis, have been identi ed in the mud crab [33]. In crabs, apoptosis can induce the activity of hemocytes, which play a vital role in defending against pathogen invasion after pathogen stimulation [34]. For E. sinensis, a previous study showed that the reduction of apoptosis via certain proteins (such as Es IAP1) can regulate the activity of cells during the growth process [35]. A recent transcriptomic analysis showed that the growth inhibition and survival rate reduction induced by various factors (such as hepatopancreatic necrosis disease) were signi cantly correlated with apoptosis in the development of E. sinensis [36]. Importantly, genes such as nm23 and caspase in the apoptosis pathway participate in resistance to adverse environments during the growth of E. sinensis individuals [37,38]. Thus, large-sized parents of E. sinensis might display a strong adverse environment resistance via gene expression in apoptosis, which further bene ts the improvement of individual survival rates and economic value. In this study, the sequencing and differential expression analysis revealed nine co-DEGs, including TRINITY_DN13931_c0_g1, TRINITY_DN1908_c0_g2, and TRINITY_DN6686_c0_g1, between large-sized E. sinensis and medium-sized E. sinensis. Importantly, these co-DEGs were enriched in pathways including apoptosis, insulin signaling, and mTOR signaling. Thus, we speculated that apoptosis, insulin signaling, and mTOR signaling pathways enriched by TRINITY_DN13931_c0_g1, TRINITY_DN1908_c0_g2, and TRINITY_DN6686_c0_g1 might be associated with the advantage of using large-sized parents in E. sinensis breeding.

Conclusions
Our original work might lay an important molecular foundation for further exploring the advantages of using large-sized parents for E. sinensis breeding in the future. Based on the comparative analysis of gene expression, the molecular-level advantages of using large-size female parents when breeding E. sinensis, such as growth, reproduction, and immunity, were analyzed for this study. Apoptosis pathways, insulin signaling pathways, and the mTOR signaling pathway might play important roles in the advantage of large-size parents in E. sinensis breeding.

Methods
Animal and sample collection A total of three large female E. sinensis (weight: 248.60 ± 3.12 g) (F-O-D group) and three medium-sized female E. sinensis (weight: 102.87 ± 1.52 g) (F-O-Z group) were provided by a local crab breeding unit in the mating season (November 30, 2019). All crabs were washed and the ovarian parts of their reproductive system were dissected for the comparative analysis of the transcriptome. Transcriptome sequencing was provided by Shanghai Origingene Biomedical Technology Co., Ltd.

RNA isolation and sequencing
Brie y, using the TRIzol kit (Takara, Dalian, Chian), the total RNA in different samples was extracted, followed by puri cation via NanoDrop 2000 (Thermo Fisher, Massachusetts, USA). The reverse transcription cDNA library of the total RNA (5 μg/sample) was implemented using the TruseqTM RNA sample prep kit for Illumina® (New England BioLabs, Massachusetts, USA). Then, after treatment with repaired end modi cation, the cDNAs were ampli ed via 15 cycles of the repair chain reaction. Finally, the Illumina Hiseq 4000 platform (Illumina) with paired-end method [39] was used for cDNA cluster sequencing.
Pretreatment of RNA-seq data Quality control of the raw reads was implemented using FastqStat.jar tools (version: 1.0) as follows: i) removal of barcode sequences and bases containing non-AGCT at the 5' end; ii) elimination of the paired reads with N > 10%; iii) exclusion of low-quality reads in which bases with Q < 20; and iv) exclusion of reads with a length < 25 bp. The quality control of clean reads was performed using fastqc (version 0.11.4) [40].

Transcriptome assembly and gene expression analysis
Trinity is a software suitable for de novo assembly without a reference genome based on the Illumina short fragment sequence (clean data) [41]. In this study, the de novo transcriptome assembly with sequence reads obtained from the Illumina platform was performed by Trinity Assembler (version: 2.6.6), followed by gene prediction using the ORF method in Trinity. Then, based on the RNA-seq data quanti cation software Kallisto (version: 0.43.1), the quantitative analysis of the gene expression levels was investigated among different samples [42]. Transcripts per million reads (TPM) were used as quantitative indicators in the current analysis.

Relation analysis among samples
A correlation analysis was performed to investigate the correlation among samples, especially among biological replicates. Furthermore, a principal component analysis (PCA) analysis was performed to identify the impact of a large sample based on the expression of the sample clustering.
Differentially expressed genes analysis The differential expression analysis of genes between the two groups was performed using edgeR (version: 3.24) [43]. A false discovery rate (FDR) < 0.05, and |log2 fold change (FC)| > 1 were selected as cutoff thresholds for the DEGs investigation. The results were visualized using a volcano plot and scatter plot. Then, the clustering analysis of DEGs was performed with a distance algorithm (between samples: Spearman correlation coe cient; between genes: Pearson correlation coe cient) to reveal genes with the same or similar expression patterns using plot_cluster_exp (version: 1.1.0). The results of clustering were visualized by a heatmap.
Co-differentially expressed genes exploration based on the literature search The pathways and enriched genes related to reproduction, growth, and immunity were screened for by reference retrieval. Electronic literature databases including Embase, PubMed, and the Cochrane Library were searched. In addition, a literature review was used to select relevant studies. Then, the genes enriched in these pathways were intersected with DEGs in the current study to obtain the co-DEGs associated with reproduction, growth, and immune responses.
Quantitative real-time polymerase chain reaction assay In order to screen the DEGs with the same trend as the sequencing results, quantitative veri cation based on a qRT-PCR was performed on 12 co-DEGs that had higher FC. Brie y, according to the manufacturer's instructions, total RNA (10 μl) was extracted and quanti ed from each sample in each group using TRIzol reagent (Invitrogen, California, USA), and reverse-transcribed using SYBR Green PCR Mastermix (Applied Biosystems, Woolston, Warrington, UK). Then, the qRT-PCR was performed with the help of the ABi-PrisMR 7500 sequence detection system (Applied Biosystem, Foster City, California, USA). The gene ACTB was used as a reference. The ampli cation transcription reaction conditions were as follows: 95°C for 30 s, 40 cycles at 95°C for 5 s, and 60°C for 34 s. At the end point of each extended period, the signal was obtained, and then the ampli cation curve was studied. The primer sequences are listed in Table 2.
The relative expression of the target genes was calculated using the 2 -∆∆CT method [44]. Table 2 The primers used in this study. Pathway enrichment analysis The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the co-DEGs veri ed by qRT-PCR were performed using go_enrichment (version: 2.1.0) software [45,46]. A P-value < 0.05 was considered a cutoff value for signi cant KEGG enrichment.

Declarations
Ethics approval and consent to participate Not applicable. The sampling location was not privately-owned or protected, and eld sampling did not involve protected species.

Consent for publication
Not applicable.
Competing Interest