3.1. Transcriptome sequencing and assembling based on reference genome
The twelve cDNA libraries were constructed through Illumina Nova seq 6000 platform. A total of 539,421,400 raw reads were obtained respectively. After data filtering, 522,003,548 clean reads (96.77%) and 78.30 Gb clean bases were acquired. About 5.82–7.18 Gb clean bases were produced for each library. The detailed information was shown in Table 2. Mapping to the reference genome of N. denticulata sinensis, 70% of clean reads on average were aligned to the reference genome and 68% of reads uniquely aligned to the reference genome (Table S2). The distribution of reads in the genomic region indicated that the proportion of reads aligned to exon was the highest, followed by the intergenic region and intron (Fig. 2). Finally, the spliced clean reads were clustered into 37,185 unigenes (27,905 known unigenes, 9,280 novel unigenes).
Table 2
Summary of data statistics by Illumina Sequencing
Sample | Raw_reads | Clean_reads | Clean_bases | Error_rate (%) | Q20 (%) | Q30 (%) | GC_pct (%) |
N1 | 46,261,934 | 44,523,362 | 6.68G | 0.03 | 97.03 | 91.73 | 39.67 |
N2 | 42,556,224 | 41,252,454 | 6.19G | 0.03 | 96.88 | 91.38 | 39.32 |
N3 | 47,477,422 | 46,848,198 | 7.03G | 0.03 | 96.98 | 91.6 | 39.58 |
CE1_1 | 39,774,562 | 38,798,422 | 5.82G | 0.03 | 96.87 | 91.36 | 36.94 |
CE1_2 | 42,786,830 | 41,798,550 | 6.27G | 0.03 | 96.99 | 91.6 | 38.13 |
CE1_3 | 46,992,542 | 46,145,040 | 6.92G | 0.03 | 97.11 | 91.85 | 37.75 |
CE2_1 | 46,386,434 | 45,470,132 | 6.82G | 0.03 | 97.07 | 91.76 | 38.10 |
CE2_2 | 44,255,572 | 43,083,416 | 6.46G | 0.03 | 96.51 | 90.59 | 35.89 |
CE2_3 | 50,150,176 | 47,890,160 | 7.18G | 0.03 | 96.79 | 91.20 | 35.08 |
Z1 | 46,401,008 | 42,915,382 | 6.44G | 0.03 | 96.85 | 91.35 | 36.84 |
Z2 | 41,897,708 | 40,867,828 | 6.13G | 0.03 | 97.07 | 91.81 | 39.75 |
Z3 | 44,480,988 | 42,410,604 | 6.36G | 0.03 | 97.08 | 91.85 | 40.00 |
3.2. Functional annotation of genes and quantification of gene expression
Five functional databases were performed to annotate the assembled 37,185 unigenes functionally and the results were as follows: PFAM (16,302 unigenes), GO (13,239 unigenes), COG (9,064 unigenes), KEGG (4,920 unigenes) and KO (4,750 unigenes). The annotation of known genes and new genes was shown in Table 3. A total of 9,064 unigenes (24.38%) were classified into 24 COG functional categories (Fig. 3a, Table S2). The function unknown (2,037 unigenes, 22.47%) was the predominant group, followed by signal transduction mechanisms (959 unigenes, 10.58%), posttranslational modification, protein turnover and chaperones (795 unigenes, 8.77%) (Table S2). No unigenes were functionally annotated to the remaining groups: X (Mobilome: prophages, transposons) and R (General function prediction only).
According to GO annotation analysis, 13,239 unigenes (35.60%) were classified into three functional categories: biological process, cellular component and molecular function (Fig. 3b). The result showed that membrane (GO:0016020), metabolic process (GO:0008152) and binding (GO:0005488) were the most prominent enriched terms in cellular component, biological process, and molecular function respectively (Table S4). Based on KEGG database, 4,920 (13.32%) unigenes were assigned to 27 pathways covering five major KEGG categories (Fig. 3c). Among them, transport and catabolism (834 unigenes, 16.95%), signal transduction (617 unigenes, 12.54%), translation (499 unigenes, 10.14%), global and overview maps (3428 unigenes, 69.67%) and organismal systems (75 unigenes, 1.52%) were separately the main enriched pathways in cellular processes, environmental information processing, genetic information processing, metabolism and organismal systems (Table S5).
To further explore the genes and pathways related to compound eye development, FPKM were used to show the gene expression. A lot of 11,970 co-expression genes were detected at a FPKM threshold of one in all four groups, while 657, 469, 502 and 2254 stage-specific genes were identified in N, CE1, CE2 and Z stages respectively (Fig. 4a). The correlation of samples was evaluated by Pearson correlation coefficient (Fig. 4b). The result revealed that CE1 and CE2 stage had similar expression patterns compared to N and Z stage. It meant that there might be no significant change in the biological process during the formation of compound eye pigment in the embryo.
Table 3
Summary of function annotation of N. denticulata sinensis transcriptome
| Number of Known Genes | Number of Novel Genes | Total Genes |
Annotated in PFAM | 15,638 | 664 | 16,302 |
Annotated in GO | 12,435 | 804 | 13,239 |
Annotated in KEGG | 4,744 | 176 | 4,920 |
Annotated in COG | 8,855 | 209 | 9,064 |
Annotated in KO | 4,580 | 170 | 4,750 |
(a)
(b)
(c)
Figure 3 Summary of gene annotation. (a) The COG annotation of assembled genes. The x-axis indicates content of each category of COG and the y-axis indicates number of genes annotated in each category. (b) The GO annotation of assembled genes. the x-axis indicates number of genes annotated in each term and the y-axis indicates names of each term. (c) The KEGG classification of assembled genes.
(a)
(b)
Figure 4 Quantitative analysis of gene expression. (a) Venn diagram of expressed genes across four groups. (b) Heatmap of the Pearson's correlation of different samples.
3.3 Identification and clustering analysis of differentially expressed genes (DEGs)
To explore the dynamic expression changes of genes, DEGs were obtained by comparing FPKM values of genes between adjacent groups. A total of 7,574 DEGs were differentially expressed. Among them, 4,898 genes showed differential expression from N to CE1 stage (3,234 up-regulated, 1,664 down-regulated), 1,101 DEGs from CE1 to CE2 stage (422 up-regulated, 679 down-regulated), and 3,389 DEGs from CE2 to Z stage (2,395 up-regulated, 994 down-regulated) (Fig. 5a). Furthermore, 289 genes were differentially expressed in all these comparison groups (Fig. 5b). In general, the comparison of N and CE1 stage had the largest number of DEGs, indicating that the transition from N to CE1 stage might be a critical stage for normal embryonic development.
According to the expression trend of DEGs, genes were divided into four clusters. The results of the cluster analysis were presented in Fig. 5c. A total of 1,659 DEGs were clustered into cluster1 (C1) and had the highest expression on N. Besides, the 1,275 DEGs clustered to C2 were highly expressed in CE1 and CE2. The C3 and C4 comprised 1,919 and 2,721 genes respectively. Except for N, the DEGs in C3 were significantly expressed in the other three developmental stages, with the most highly expressed genes in Z stage followed by CE2 and CE1. The DEGs in C4 were only highly expressed in Z stage. KEGG enrichment analysis revealed that the DEGs in C1 were mainly enriched to lipid metabolism-related pathways, whereas the DEGs in C2 were mainly enriched in pathways that maintained normal DNA replication. Additionally, the phototransduction - fly and insect hormone biosynthesis pathway were pathways to which both C3 and C4 were enriched.
Considering that the insect hormone biosynthesis pathway was related to the synthesis of juvenile hormones and ecdysteroids, we further explored key ecdysis-related members and their expression patterns during embryonic development, namely crustacean hyperglycaemic hormone (CHH), molt-inhibiting hormone (MIH), crustacean cardioactive peptide (CCAP), juvenile hormone epoxide hydrolase (JHEH) and juvenile hormone acid methyltransferase (JHAMT). The expression of JHEH gradually decreased, while the expression of other genes gradually increased (Fig. 5d, Table S6). The high expression of MIH, CHH and CCAP suggested that embryos could fulfil de novo ecdysteroid synthesis for embryonic hatching. In a word, there was strong cell proliferation in the embryo prior to compound eye development, meanwhile, the formation of the visual system and the hormonal regulation of hatching became the dominant biological events when compound eye development was initiated.
(a)(b)
(c)
(d)
Figure 5 Summary of differential gene analysis. The average FPKM values were used to plot heatmaps. (a) Statistics of differentially expressed genes (DEGs) between adjacent groups. Blue columns represent the total number of differentially expressed genes, red columns represent up-regulated genes and green columns represent down-regulated genes. (b) Overlap of DEGs between comparison groups. (c) The clustering analysis of DEGs. The figure consists of a boxplot, heatmap and KEGG annotation information. The boxplot and heatmap shows the show the gene expression in each cluster, while the pathways are shown on the right side of the heatmap. For the heatmap, red indicates high relative expression level and blue indicates low relative expression level. (d) Heatmap of genes related to molting. CHH, crustacean hyperglycemic hormones; MIH, molt-inhibiting hormone; CCAP, crustacean cardioactive peptide; JHEH, juvenile hormone epoxide hydrolase; JHAMT, juvenile hormone acid O-methyltransferase.
3.4 Gene ontology enrichment and KEGG pathway enrichment analysis for DEGs
GO functional enrichment analysis further detected significantly enriched GO terms in DEGs (padj < 0.05). In the comparison between N and CE1 stage, 619 DEGs were significantly enriched in 44 GO terms. Ion transport (GO:0006811) and extracellular region (GO:0005576) were the most highly enriched GO terms in biological process category and cellular component category, respectively. In molecular function category, ion channel activity (GO:0005216), channel activity (GO:0015267), passive transmembrane transporter activity (GO:0022803) and substrate-specific channel activity (GO:0022838) were the most enriched four GO terms (Fig. 6a). However, 170 DEGs were significantly enriched in 22 GO terms between CE1 and CE2 stage. The top four enriched GO terms of biological processes were related to metabolic processes comprising drug metabolic process (GO:0017144), chitin metabolic process (GO:0006030), amino sugar metabolic process (GO:0006040) and glucosamine-containing compound metabolic process (GO:1901071). As for cellular component category and molecular function category, extracellular region (GO:0005576) and chitin binding (GO:0008061) were the most enriched subclasses, respectively (Fig. 6b). In contrast to the other two comparison groups, 619 DEGs from CE2 to Z stage, were significantly enriched in 61 GO terms, with proteolysis (GO:0006508), extracellular region (GO:0005576) and peptidase activity, acting on L-amino acid peptides (GO:0070011) were the most significant GO terms in biological process, cellular component and molecular function separately (Fig. 6c). Chitin metabolic process (GO:0006030) was common to the three comparable groups, suggesting that the cuticle of embryos might be being constructed.
Performed by KEGG, the DEGs in N-CE1 stage were mainly enriched in pathways related to amino acid metabolism and carbon metabolism like beta-Alanine metabolism (ko00410) and fructose and mannose metabolism (ko00051) (Fig. 6d, Table S7). In addition to amino acid metabolism, some pathways were significantly enriched in CE1-CE2 stage, including glycerolipid metabolism (ko00561), glycosaminoglycan degradation (ko00531) and sphingolipid metabolism (ko00600). For CE2-Z stage, some KEGG pathways were associated with genetic information processing and energy metabolism, such as DNA replication (ko03030), homologous recombination (ko03440) and oxidative phosphorylation (ko00190). Besides, visual system-related pathway (phototransduction - fly) were significantly enriched in different comparison groups. Twenty-five DEGs were enriched for the phototransduction - fly pathway, and their expression profiles indicated that proliferating cells in the eye field might be gradually differentiated into photoreceptors at CE1 stage (Fig. 7).
(a)(b)
(c)(d)
Figure 6 Enrichment analysis of DEGs. (a) GO enrichment analysis from N to CE1. X-axis indicates -log10 p-value, and the Y-axis indicates the enriched GO terms. Three different colors of the columns represent three basic classifications of GO term. Each category displays five GO terms. (b) GO enrichment analysis from CE1 to CE2 stage. (c) GO enrichment analysis from CE2 to Z stage. (d) KEGG enrichment of DEGs. X-axis indicates the comparison groups and Y-axis indicates the pathway names. The different colors of the dots represent the p-value, and the number of DEGs in each pathway is represented by the size of the dots.
(a)
(b)
Figure 7 The summary of the phototransduction - fly signaling pathway. The average FPKM values were used to plot heatmaps. (a)The KEEG pathway of the phototransduction - fly signaling pathway. The green background indicates the unigenes annotated on the pathway. (b) The expression patterns of DEGs enriched in phototransduction - fly signaling pathway.
3.5 Identification of shared and stage-specific DEGs at four different development stages
Among the DEGs identified, 289 genes appeared in all four different stages of embryonic development. Functional annotation indicated that these genes were mainly involved in cuticle formation, muscle growth, and the establishment of immune system (Fig. 8). The detailed annotation information was shown in Table S8. Among the DEGs related to cuticle formation, there were eleven genes enriched in chitin metabolism pathway, fifteen genes annotated as cuticle proteins and one Bursicon gene. When the embryo developed to CE1 stage, the expression of related DEGs significantly increased and reached the maximum during Z stage (Fig. 8a). Moreover, there were nine DEGs related to muscle growth, including troponin C (2) and various types of actin (7). The results showed that, except for single transcript (evm.TU.CTG_13447.37), the expression of other genes gradually increased (Fig. 8b). The changes in the expression of these DEGs were consistent with the phenotype of appendage formation and muscle enlargement during four developmental stages. Besides, a total of 22 immune related genes were screened, involving immune effector factors, immune recognition receptors, and signaling pathways (Fig. 8c). It indicated that the embryonic immune system was constantly improving. Furthermore, three DEGs involved in phototransduction and eleven DEGs annotated as crustacyanin were found, with their expression levels continuously increasing (Fig. 8d). It was suggested that the crustacyanin genes should act in the formation of various pigment cells at later stages of embryonic development. For stage-specific genes, enrichment analysis revealed that 33 and 63 stage-specific genes were associated with chitin binding (GO:0008061) and structural constituent of cuticle (GO:0042302), respectively (Fig. 8e-f). And the number of these stage-specific genes gradually increased from N to Z stage.
(a)(b)
(c)(d)
(e)(f)
Figure 8 Clustering of DEGs identified in all three comparative groups. The average FPKM values were used to plot heatmaps. (a) Heat map of DEGs related to cuticle formation. Purple label indicates the genes related to chitin metabolism pathway, pink indicates putative cuticle gene, and green indicates the bursicon gene (b) Heatmap of DEGs related to muscle growth. (c) Heatmap of DEGs related to immune system. (d) Heatmap of crustacyanin genes. (e-f) Heatmap of stage-specific genes associated with chitin binding and structural constituent of cuticle.
3.6 Discovery of key genes related to eye development
Compound eye formation was a distinct and important developmental phenotype during embryonic development from N to Z stage. Therefore, it was necessary to uncover key members at the molecular level (Table 4). For the determination and differentiation of the eye field, the RD network had five key members, namely eyeless (ey), twin of eyeless (toy), sine oculis (so), eyes absent (eya) and dachshund (dac). Optix also participated in the formation of eye morphology. Based on the sequence information of D. melanogaster, a total of seven candidate homologs were identified, including two optix homologs. In eye color, we identified two Scarlet and five White based on the transcriptome, but there was no brown. In terms of visual signal transduction, opsins converted absorbed photons into electrical signal through the phototransduction mechanism. In this study, 14 putative opsins were identified, including 11 visual opsins and 3 non-visual opsins (Table S9). The phylogenetic analysis showed that 11 visual opsins were classified into three categories: long-wavelength-sensitive opsins (7), ultraviolet-sensitive opsins (3), and onychopsin (1) (Fig. 9a). When the embryo developed to Z stage, the expression level of visual opsins significantly increased (Fig. 9b). To respond to environmental stimuli, pigmentary-effector hormones could regulate retinal pigment and phototransduction processes. Four putative orthologues were found in the transcriptome, namely one RPCH (red pigment-concentrating hormone) and three PDHs (pigment-dispersing hormone). In short, the identification of related genes would help to further analyze the formation and neurogenesis of the visual system in N. denticulata sinensis.
Table 4
Summary of candidate genes related to eye development of N. denticulata sinensis
Gene annotation | Gene ID | Class |
Dachshund homolog 1 | evm.TU.CTG_40958.11 | RDGN |
Eyes absent homolog 2 | evm.TU.CTG_45744.3 | RDGN |
Paired box protein Pax-6, toy | evm.TU.CTG_5289.31 | RDGN |
Paired box protein Pax-6, eyeless | evm.TU.CTG_29246.4 | RDGN |
Sine oculis homeobox homolog 1a | evm.TU.CTG_58749.7_evm.TU.CTG_58749.2 | RDGN |
Homeobox protein SIX3, Optix | novel.5928 | RDGN |
Homeobox protein SIX3, Optix | novel.5927 | RDGN |
Scarlet | evm.TU.CTG_46350.11 | ABCG |
Scarlet | evm.TU.CTG_25458.44 | ABCG |
White | evm.TU.CTG_25458.44 | ABCG |
White | evm.TU.CTG_50190.14 | ABCG |
White | evm.TU.CTG_46350.9 | ABCG |
White | evm.TU.CTG_46350.10 | ABCG |
White | evm.TU.CTG_11631.10 | ABCG |
RPCH | evm.TU.CTG_47059.16 | Pigmentary-effector hormones |
PDH | evm.TU.CTG_40362.9 | Pigmentary-effector hormones |
PDH | evm.TU.CTG_48844.2 | Pigmentary-effector hormones |
PDH | evm.TU.CTG_48844.1 | Pigmentary-effector hormones |
Note: RDGN, retina determining gene network ABCG; ATP-binding cassette (ABC) transporter subfamily G; RPCH, red pigment-concentrating hormone; PDH, pigment-dispersing hormone. |
(a)
(b)
Figure 9 Analysis of visual opsins. (a) Phylogenetic tree of visual opsins. The opsins of N. denticulata sinensis is marked in the red typeface. All accession numbers of opsin sequences used in the phylogenetic tree are shown in Table S1. (b) Heatmap of putative visual opsins of N. denticulata sinensis. The average FPKM values were used to plot the heatmap.
3.7 The qRT-PCR validation of DEGs
To validate transcriptome sequencing data, partial DEGs in the phototransduction pathway and RDGN were selected for quantitative real-time PCR analysis. The result of qRT-PCR showed that expression changes of the eight selected DEGs were consistent with that of the transcriptome data, which confirmed the reliability of the transcriptome data (Fig. 10).