Sequencing and de novo assembly
Totally, 60,954,060, 57,468,802, 61,486,012, 57,611,124 paired-end reads were generated
by samples from brain, liver, ovary and testis of Collichthys luciduss, respectively. After removing low quantity reads and trimming adaptor reads, 60,322,004
(98.96%), 57,044,284 (99.26%), 60,867,978 (98.99%) and 57,087,688 (99.09%) clean reads
were obtained (8.0 G). De novo assembly of clean reads generated 189,181 transcripts
with an average length of 818 bp and a N50 length of 1463 bp. The longest transcript
of each gene was selected and yielded 131,168 unigenes. The average length of unigenes
was 644 bp and the N50 length was 1033 bp (Table 1). Most transcripts and unigenes
were 200 to 299 bp in length (Fig 1A, B). These unigenes were retained for subsequent
functional annotation.
Functional annotation
Unigenes were annotated using NR, GO, KEGG and Swissport databases, and a 10-5 e-value cut-off value was used. 51,590 (39.33%), 27,708 (21.12%), 12,883 (9.82%)
and 44,051 (33.58%) significant hits were produced respectively. Among these unigenes,
53,200 (40.56%) were annotated at least in one database and 8851 (6.75%) unigenes
were annotated in all five databases (Table 1). The similarity distribution of the
top hits showed that 61% of the mapped sequences had similarities higher than 80%,
while 37%of the hits had similarities ranging from 40% to 80%. (Fig 2A). The E-value
distribution had a comparable pattern with 51% of the mapped sequences with high homologies
(<1e-60), whereas 48% of the homological sequences ranged between 1e-5 to 1e-60 (Fig
2B). The species distribution of NR BLAST matches showed that the top two species
was Maylandia zebra (26%) and Oreochromis niloticus (21%) (Fig 2C).
GO and KEGG analysis of Collichthys lucidus transcriptomes
The roles of unigenes were investigated by GO and eggNOG analysis. In GO database,
27,708 unigenes were clustered into 67 terms of three different domains (biological
process (BP), cellular component (CC) and molecular function (MF). The number of unigenes
in each term are exhibited in Fig 3A. The top 3 terms ranked by number of unigenes
in BP were cellular process (17,477), single-organismal process (16,219) and biological
regulation (11,558). In CC, unigenes were mainly converged on cell (16,229), cell
part (16,151) and membrane (11,815). In MF, the most prominent terms were binding
(16,123), nucleic acid binding catalytic activity (10,504) and signal transducer activity
(1,974). On the other hand, eggNOG analysis annotated 48,042 (36.63%) unigenes, and
the result revealed that unigenes mainly contributed to signal transduction mechanisms.
The top 3 terms were signal transduction mechanisms, general function prediction only
and function unknow (Fig 3B).
To further explore the biological functions of unigenes, KEGG database was mapped
and 12,883 unigenes were distributed into 32 pathways, among which signal transduction
(1,877), endocrine system (849), immune system (820), signaling molecules and interaction
(814) and transport and catabolism (668) were the largest five groups (Fig 4).
Differentially expressed unigenes
The expression pattern of each unigene in brain, liver, ovary and testis was assessed
by expected number of Fragments Per Kilobase of transcript sequence per Millions base
pairs sequenced (FPKM). The result of examination of unigenes in each organ revealed
that brain expressed the most unigenes (74.83%), while liver expressed the least (46.18%)
(Fig 5 A). Additionally, a gene was identified as organ-specific when the FPKM was
at least 10 in one organ whereas less than 1 in other organs. Under this standard, 2,143, 304, 333, 556 unigenes were found specifically expressing in brain, liver,
ovary and testis, respectively. The most genes were specifically expressed in brain, followed by ovary and testis,
according with our hypothesis that HPG play a dominant role in Collichthys lucidus.
The expression profiles were compared between each two organs. Between brain and liver, 992 genes were upregulated in brain and 264 genes were up
regulated in liver. Between brain and ovary, 749 genes were upregulated in brain and
317 genes were upregulated in ovary. Between brain and testis, 685 genes were upregulated
in brain and 388 genes were upregulated in testis. Between liver and ovary, 211 genes
were upregulated in liver and 114 genes were upregulated in ovary. Between liver and
testis, 138 genes were upregulated in liver and 315 genes were upregulated in testis.
Between ovary and testis, 442 genes were upregulated in ovary and 846 genes were upregulated in testis (Table 2). The differential genes of group ovary vs testis are shown in Fig 5B and
the differential genes of other groups are shown in Additional file 2 (Fig S1).
Potential roles of differentially expressed unigenes
GO enrichment analysis were established for differentially expressed unigenes between groups and we focused on terms related to reproduction, sex determination
and sexual development. Differentially expressed unigenes between ovary and testis
were primarily involved in sexual reproduction (GO: 0019953), development of primary sexual characteristics (GO: 0045137), sex differentiation (GO:
0007548), male gonad development (GO:0008584), female gamete generation (GO: 0007292),
male gamete generation (GO: 0048232), development of primary male sexual characteristics
(GO: 0046546), male sex differentiation (GO: 0046661) (Additional file 3). Interestingly, these GO terms were also enriched in other organs,
implying their high correlation with reproduction and sex in Collichthys lucidus. For instance, sexual reproduction (GO: 0019953), multicellular organism reproduction
(GO: 0032504), reproduction (GO: 0000003) were enriched in differentially expressed
unigenes between brain and liver. Developmental process involved in reproduction (GO:
0003006), cellular process involved in reproduction in multicellular organism (BP,
GO: 0022412) and sexual reproduction (GO: 0019953) were enriched in differentially
expressed unigenes between brain and ovary. Developmental process involved in reproduction
(GO:0003006), sexual reproduction (GO: 0019953) and cellular process involved in reproduction
in multicellular organism (GO: 0022412) were enriched in differentially expressed
unigenes between brain and testis. Regulation of growth (GO: 0040008), Sexual reproduction
(GO: 0019953) and multicellular organism (GO: 0022412) were enriched in differentially
expressed unigenes between liver and ovary. Multicellular organism reproduction (GO:
0032504), sexual reproduction (GO: 0019953) and cellular process involved in reproduction
in multicellular organism (GO: 0022412) were enriched in differentially expressed
unigenes between liver and testis.
Furthermore, through KEGG pathway enrichment analysis, several pathways were enriched
in each group. To investigate pathways connected to reproduction and gonad development,
we particularly focused on differentially expressed unigenes between ovary and testis. 165 pathways, including PPAR signaling pathway, ovarian steroidogenesis, oocyte meiosis, steroid hormone biosynthesis,
apoptosis – multiple species and proximal tubule bicarbonate reclamation were significantly
enriched in differentially expressed unigenes between ovary and testis, which might
participate in the regulation of reproduction. The profile of group ovary vs testis
is exhibited in Fig 5C and others are shown in Additional file 4 (Fig S2).
Identification and characterization of genes involved in reproduction and sexual development
Based on the de novo assembly and annotation, 652 and 650 candidate unigenes were
identified for reproduction (GO: 0000003) and reproductive process (GO: 0022414) respectively.
Furthermore, 8 unigenes were enriched in both terms of reproduction, sex differentiation
and development, implying that they dominated the regulation of reproduction and sex
development. Among the 8 unigenes, HYAL (hyaluronidase), KLHL10 (Kelch-like protein 10), ROPN1L (rhophilin associated tail protein 1 like), ODF3L2 (outer dense fiber of sperm tails 3 like 2), SYCP3 (synaptonemal complex protein 3) and SRPK3-like (serine/threonine-protein kinase 3-like ) were upregulated in testis, whereas, BMP15 (bone morphogenetic protein 15) and RGS14 (regulator of G protein signaling 14) were upregulated in ovary (Table 3).
Identification of SSRs
Using MISA 1.0 (http://pgrc.ipkgatersleben.de/misa/misa.html), 35,476 SSRs were predicted and 2,455 compound formations were composed. Of the 35,476
detected SSRs, the most affluent repeat motif was mono- and di-nucleotide repeats,
constituting 41.18% (14,608) and 38.23% (13,563), respectively, followed by tri- (18.74%,
6,648), tetra- (1.73%, 614), penta- (0.09%, 31) and hexa-nucleotide (0.03%, 12). In
general, the most abundant repeat motif was poly-T (18.66%, 6617). Among di-nucleotide
repeat motifs, (GT/TG) was the dominant type with frequency of (12.24%, 4342) and
(CG/GC) was the least type which only occupied (0.09%, 31). These SSRs were valuable
for further genetic and mechanism researches (Table 4).
SSR marker polymorphism
In this study, 10 primer pairs were randomly designed to validate the amplification
and polymorphism in Collichthys lucidus collected from 7 different regions. Seven of the primer pairs successfully amplified
fragments. Among the successful primer pairs, one pair exhibited polymorphisms among
the seven colonial Collichthys lucidus (Figure 6 A and B). The polymorphic SSR marker was then used to perform genetic correlation
analysis among the Collichthys lucidus groups. The UPGMA clustering produced a dendrogram (Figure 6 C) that separated theCollichthys lucidus into two main groups, from which ND completely separate from the others. Moreover,
LYG and ZS formed a sub-cluster, while CM and DF formed a sub-cluster.
Validation of gene expression using qRT-PCR
To further verify the expression of genes identified by transcriptome sequencing experimentally,
8 genes enriched in reproduction terms were validated by qRT-PCR analysis. In accordant
with the RNA-seq, qRT-PCR showed ROPN1L, KLHL10, ODF3L2, SYCP3 and SRPK3-like were specific expressed in testis whereas BMP15 and RGS14 were specific expressed
in ovary (Fig 7).