Full-length Transcriptome Sequences and Identification of Putative Genes for Carotenoids Accumulation in Polymorphic Noble Scallop Chlamys nobilis

Background: Carotenoids are ubiquitous in marine bivalves, and their accumulation mechanism has attracted widespread interest but remains unclear. The carotenoid-rich noble scallop Chlamys nobilis provides a model animal for explaining the molecular mechanism of carotenoid accumulation in marine bivalves. To better understand the transcriptional data and putative genes involved in carotenoid accumulation in the noble scallop, PacBio sequencing and RNA-seq data were combined to analyze differentially expressed genes (DEGs) in different tissues. Results: We obtained 26,237 non-redundant transcripts. A total of 9263 DEGs were identified among the five tissues, with 3361 genes up-regulated and 4980 genes down-regulated in golden scallops (rich in carotenoids). Examination of the GO terms indicated that the DEGs were mainly related to metabolic process, binding, catalytic activity and cellular process. KEGG pathway analysis showed that the DEGs were mainly related to phagosome, protein processing in endoplasmic reticulum and lysosome. Carotenoid-related genes, including CD36, retinoid X receptor (RXR), glutathione S-transferase (GST) and low-density lipoprotein (LDL), differed significantly between golden and brown scallops, and the functions of several genes were also examined in the early developmental stages. Conclusions: This study revealed large transcriptome differences between golden and brown scallops. These results provide a fundamental resource of large-scale full-length transcripts in C. nobilis and contribute to the understanding of the carotenoid accumulation mechanism in mollusks.

CA, USA), and RNA degradation and contamination were monitored on 1% agarose gels. RNA concentration was measured with the Qubit RNA Assay Kit on a Qubit 2.0 Fluorometer (Life Technologies, CA, USA). RNA integrity was evaluated with the RNA Nano 6000 Assay Kit. The RIN (RNA integrity number) values of all samples ranged from 7.8 to 8.5.

Library preparation and sequencing
mRNA was purified from total RNA using poly(T) oligo-attached magnetic beads. Full-length cDNA libraries were synthesized using the SMART PCR cDNA Synthesis Kit (Clontech, CA, USA), and NGS cDNA libraries were produced using First Strand Master Mix and Second Strand Master Mix. PCR amplification was then carried out to enrich the cDNA fragments. The Agilent Bioanalyzer 2100 system was used to evaluate the quality of the cDNA libraries. The Pacific Biosciences real-time sequencer and the Illumina HiSeq 2000 system were used for SMRT sequencing and NGS, respectively.

Quality filtering and error correction of PacBio long reads
Low-quality SMRT sequencing reads were removed according to the following parameters: length < 50 bp and quality < 0.75. After cleaning, the high-quality reads were processed into reads of insert (ROIs) with full passes ≥ 0 and quality > 0.8. ROIs with 5′ and 3′ primer sequences and a poly(A) tail were identified as full-length transcripts, and among them, ROIs possessing a poly(A) tail and a 5′ tail were considered full-length non-chimeric (FLNC) reads. Consensus isoforms were clustered and corrected with a clustering approach in SMRT Analysis (v2.3.0), and full-length consensus sequences were refined using Quiver [19].
The full-length transcripts were then corrected with proovread 2.13.841, and redundant sequences were removed with CD-HIT [20], yielding high-quality full-length transcripts.

Alternative splicing, simple sequence repeat (SSR) detection and prediction of lncRNA
To identify alternative splicing events, unigenes were directly subjected to BLAST, and alignments that met the following conditions were considered alternative splicing events: 1) the two high-scoring segment pairs (HSPs) in the alignment must have the same forward/reverse direction; and 2) at least 100 bases at the 5′ and 3′ ends must lie within their complete open reading frames [21]. MISA (http://pgrc.ipk-gatersleben.de/misa/) was used to identify the SSRs of these transcripts. Four computational approaches (CPC, CNCI, CPAT and Pfam) were used to predict lncRNAs in this study.
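As an illustration of what perfect SSR detection involves, the following is a minimal Python sketch based on regular expressions. It is not MISA itself, and the minimum repeat counts in MIN_REPEATS are hypothetical values chosen for the example; they are not taken from this study.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp);
# these thresholds are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA" as a dinucleotide).
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)

example = "GG" + "A" * 12 + "CC" + "AG" * 7 + "TT"
print(find_ssrs(example))
```

Each motif length is scanned independently, so the sketch reports only perfect repeats and does not merge neighbouring SSRs into compound (mixed) SSRs.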

CDS detection and functional annotation of transcripts
The coding regions of transcripts were identified with TransDecoder (https://github.com/TransDecoder/TransDecoder/releases) [22]. The functions of these transcripts were annotated against eight public databases: NR, Pfam, KOG, COG, eggNOG, Swiss-Prot, KEGG and GO.

Differential expression analysis and functional enrichment analysis
To identify differentially expressed genes (DEGs), gene expression levels were estimated as FPKM, and the EBSeq R package was used to perform differential expression analysis between two samples. The threshold for significant differential expression was FDR < 0.05 and |log2 fold change| ≥ 1.
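The significance threshold can be expressed as a simple filter. The sketch below is a minimal Python illustration that assumes the log2 fold change (golden relative to brown) and the FDR for each gene have already been computed by the differential expression software. The function name, the gene labels and the numeric values are hypothetical.

```python
from collections import Counter

def classify_deg(log2_fold_change, fdr, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Label a gene 'up', 'down' or 'ns' using FDR < cutoff and |log2FC| >= cutoff."""
    if fdr < fdr_cutoff and abs(log2_fold_change) >= lfc_cutoff:
        return "up" if log2_fold_change > 0 else "down"
    return "ns"  # not significantly differentially expressed

# Hypothetical (gene, log2FC, FDR) records for illustration only.
records = [
    ("gene_a", 2.4, 0.001),
    ("gene_b", -1.0, 0.030),
    ("gene_c", 3.1, 0.200),
    ("gene_d", 0.6, 0.001),
]
summary = Counter(classify_deg(lfc, fdr) for _, lfc, fdr in records)
print(summary["up"], summary["down"], summary["ns"])
```

Both conditions must hold at once: a large fold change with a high FDR (gene_c) and a low FDR with a small fold change (gene_d) are both labelled not significant.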
The GOseq R package was used for GO enrichment analysis of the DEGs, and the KOBAS software was used for KEGG enrichment analysis [23].

Functional identification of the carotenoid-related DEGs in early scallop development
To examine the functions of the carotenoid-related genes, samples were collected from the early developmental stages of golden and brown scallops, including the fertilized egg, trochophore larva (T-larva), D-shape larva (D-larva), umbo larva (U-larva), secondary shell stage (S-stage) and juvenile stage (J-stage), and the expression levels of these carotenoid-related genes were determined by qRT-PCR. The expression levels of the different genes were analyzed by the 2^-ΔΔCt method.
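For reference, the 2^-ΔΔCt calculation can be sketched in a few lines of Python. The reference gene and the calibrator sample are not specified in this section, so the argument names and Ct values below are assumptions for illustration only.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene), computed per sample
    ddCt = dCt(sample) - dCt(calibrator)
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: the target gene crosses threshold two cycles earlier
# (relative to the reference gene) in the sample than in the calibrator.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold)
```

A value above 1 indicates higher expression of the target gene in the sample than in the calibrator, and a value below 1 indicates lower expression. The method assumes roughly equal amplification efficiency for the target and reference genes.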

PacBio sequencing and error correction of long reads
To obtain comprehensive transcriptome profiles of the same tissues in golden and brown scallops, 10 libraries from five tissues were constructed. In total, 53.97 Gb of clean NGS reads were produced, with an average of 5.4 Gb per sample (Additional file 1: Table S1). The sequencing data of this study were deposited in the NCBI (SRA: SRP171083).
RNA was extracted from five tissues of golden scallops, and the library was constructed from equal amounts of RNA from each tissue and then sequenced.
In total, 679,907 polymerase reads were obtained. With full passes ≥ 0 and consensus accuracy > 0.8, the average length was 2,677 bp, with a quality of 0.89 and 12 passes (Additional file 1: Table S2). These polymerase reads were filtered using the standard protocol of the SMRT Analysis software suite, and 182,010 ROIs were obtained, which included 90,727 FLNC and 80,556 non-full-length reads (Table 1). Owing to the high error rate of SMRT sequencing reads, error correction with high-quality NGS short reads is indispensable. After error correction, low-quality and redundant transcripts were removed. Finally, 26,237 non-redundant transcripts were obtained.

Functional annotation and enrichment analysis
Transcripts obtained by combining PacBio and NGS data increase the accuracy and efficiency of functional gene prediction and annotation, especially when reference genome information is lacking. The non-redundant transcripts were functionally annotated by BLAST searches against public databases such as NR, GO, Swiss-Prot and KEGG. In total, 4,227 transcripts were annotated in GO, 6,424 in COG, 9,977 in KEGG, 13,320 in KOG, 15,759 in Pfam, 11,729 in Swiss-Prot, 17,251 in eggNOG and 20,961 in NR. A total of 21,030 transcripts were annotated in at least one of the eight databases (Fig. 1a). Based on the NR database, homologous species of C. nobilis were predicted by sequence alignment. Approximately 91.11% of sequences aligned to Mizuhopecten yessoensis, followed by Crassostrea gigas (1.16%) (Fig. 1b).

Alternative splicing analysis and SSR detection
A total of 227 alternative splicing events were identified (Additional file 1: Table S3). Because no reference genome is available for this species, the types of alternative splicing events could not be identified. A total of 26,135 transcripts (78,152,940 bp) were subjected to SSR analysis, which detected 23,758 SSRs and 12,442 SSR-containing sequences; most were mono-, di- or tri-nucleotide repeats (Fig. 2; Additional file 1: Table S4). Given the high quality of the transcriptome sequences, the detected SSRs should be useful for marker-assisted breeding and genetic analysis in C. nobilis.

Long non-coding RNAs (lncRNAs)
As an emerging hot topic in biology, lncRNAs have been found to function as crucial regulators in a variety of biological processes. In the present study, lncRNA transcripts were predicted by four methods (CPC, CNCI, CPAT and Pfam protein domain analysis), and a total of 6,032 lncRNAs were identified in the noble scallop (Fig. 3; Additional file 1: Table S5).

Identification of differentially expressed genes (DEGs)
Based on the RNA-seq reads, FPKM values were used to investigate gene expression patterns in different tissues of C. nobilis, and gene expression was compared between golden and brown scallops. DEGs were analyzed using the edgeR software, with FDR < 0.05 and |log2 fold change| ≥ 1 as the criteria for significant differential expression. The DEGs of the five tissues from golden and brown scallops are listed in Table 2. A total of 9263 DEGs were identified among the five tissues, and the numbers of up-regulated and down-regulated genes are shown in Additional file 1: Table S6. Venn diagrams show the number of genes uniquely expressed in each tissue or shared among tissues (Fig. 4). GO and KEGG pathway enrichment analyses of DEGs within the same tissue are shown in Fig. 5 and Fig. 6, and the top 20 KEGG pathways in each tissue are listed in Additional file 1: Table S7. The GO enrichment analysis identified 47 significantly enriched GO terms (corrected P-value < 0.05). The top five GO terms were macromolecular complex, protein complex, oxoacid metabolic process, organic acid metabolic process and lyase activity (Fig. 7). The enriched terms also include several related to carotenoids, such as lipid transport and lysosome. A total of 229 KEGG pathways were identified (Additional file 1: Table S8), including several related to carotenoid accumulation, such as lysosome, fat digestion and absorption, and ABC transporters (Fig. 8).
To explore the genetic mechanisms underlying the golden phenotype, the DEGs of each of the five tissues were filtered for those believed to be involved in carotenoid accumulation. Several genes related to carotenoid accumulation, including CD36, ABC transporter G family member 5 (ABCG5), beta,beta-carotene 15,15-dioxygenase (BCMO1), glutathione S-transferases (GSTs), intestine-specific homeobox (ISX), very low-density lipoprotein receptor (VLDLR), low-density lipoprotein receptor (LDLR) and carotene oxygenase, are shown in Fig. 6. CD36, GST-theta and VLDLR were expressed at significantly higher levels in the hemolymph of golden scallops than in that of brown scallops; ABCG5 was highly expressed in the mantle of golden scallops; BCMO1 was more highly expressed in the intestine than in other tissues in both golden and brown scallops; GST-Mu was expressed at significantly higher levels in the gonad than in other tissues in both golden and brown scallops; carotene oxygenase was expressed at significantly higher levels in the intestine than in other tissues in both golden and brown scallops; and LDLR differed significantly in the mantle and adductor muscle. Overall, the high expression levels of these carotenoid-related genes are consistent with the higher carotenoid content of golden scallops, suggesting that these genes play important roles in carotenoid accumulation.

Functional identification of the carotenoid-related DEGs in early scallop development
To explore the functions of the carotenoid-related DEGs in early scallop development, we selected six genes and analyzed their spatial and temporal expression characteristics (Fig. 9). All of these genes except BCMO1 were strongly expressed in the fertilized egg, and all except BCMO1 and GST-Pi were expressed at higher levels in golden scallops than in brown scallops at the S-stage and J-stage. The expression level of BCMO1 showed a decreasing trend, and brown scallops had higher levels than golden scallops at most stages.

Discussion
Transcriptome data generated by PacBio sequencing improved the unigenes from C. nobilis, and after correction with NGS data, the full-length transcripts in the present study show a higher level of completeness and gene annotation, providing a better understanding of the transcriptome. Previous studies have shown that data produced by the Illumina HiSeq platform have good sequencing depth and coverage of the transcriptome; however, unigenes assembled from short reads lead to a higher mis-assembly rate and unreliable gene annotation [24,25], resulting in an inability to identify genes associated with traits. To avoid such limitations in C. nobilis, full-length transcriptome data from five tissues (intestine, adductor, mantle, ovary and hemolymph) were produced using the PacBio SMRT sequencing approach, which maximizes transcript diversity. As expected, a large amount of transcriptome data was generated: 26,237 non-redundant transcripts were obtained and 21,030 transcripts were annotated, an improvement over what was reported previously [18].
Carotenoids are fat-soluble red, orange or yellow pigments that are commonly displayed in animals; however, except for a few arthropods, most animals obtain carotenoids mainly from their diets [26,27]. In recent years, carotenoid accumulation in bivalves has been documented in Argopecten irradians [28], Hyriopsis cumingii [30] and Patinopecten yessoensis [30]. Previous studies have shown that the mechanisms of carotenoid accumulation in bivalves involve carotenoid absorption, transportation, deposition and metabolic processing [3]. Several genes involved in carotenoid accumulation have been identified; for example, in our previous study, a scavenger receptor (SRB-like-3) gene showed a significant correlation with the concentration of carotenoids in the tissues of noble scallops [18]. In P. yessoensis, PySCD (stearoyl-CoA desaturase), the crucial enzyme in monounsaturated fatty acid biosynthesis, has been shown to enhance carotenoid absorption [31], and PyBCO-like 1 is responsible for muscle coloration [32]. However, the mechanisms underlying carotenoid accumulation in marine bivalves remain unclear.
In the present study, NGS and SMRT sequencing approaches were combined to generate a more complete C. nobilis transcriptome. The average length of the ROIs is representative of full-length transcripts (Table 1), and Illumina short reads were used to correct the SMRT long reads to generate high-quality transcripts, which reduces mis-assemblies and increases sequence identity. A total of 21,030 transcripts were annotated in eight databases, and 227 alternative splicing events and 6,032 lncRNAs were identified. To explore the genes involved in carotenoid accumulation in the noble scallop, a total of 9263 DEGs between golden and brown scallops were identified in different tissues, with 3361 genes up-regulated and 4980 genes down-regulated.
According to the KEGG pathway analysis, the lysosome pathway was positively associated with carotenoid accumulation in golden scallops, and many carotenoid-related genes were highly expressed in golden scallop tissues; for example, CD36 was highly expressed in the hemolymph. The higher expression levels of CD36, GST-omega and VLDLR in the hemolymph of golden scallops may be associated with carotenoid transportation. In the silkworm, Cameo2, a member of the CD36 family, has been shown to transport lutein [33]; GST and VLDLR have also been reported to be significantly associated with carotenoid transport in humans [34,35]. The higher expression levels of ABCG5 and LDL in the mantle of golden scallops suggest that they play a role in carotenoid accumulation. Previous studies in humans have shown that ABCG5 polymorphisms play major roles in the plasma response to cholesterol and carotenoids, and that total LDL cholesterol levels correlate with serum carotene concentrations [36,37]. In addition, many structural proteins, such as actin, fibronectin, tektin and tubulin, were significantly higher in golden scallops than in brown ones. Previous studies proposed that carotenoids are transported to the target tissues, deposited in specific receptor cells and then bound to structural proteins [38][39][40]. We therefore hypothesize that these genes are associated with carotenoid accumulation in the golden scallop.
In the early developmental stages, brown scallops had significantly higher BCMO1 expression than golden scallops at most stages, suggesting that BCMO1 may play a crucial role in the carotenoid accumulation process. In animals, BCMO1 is an essential enzyme in carotenoid metabolism that catalyzes the symmetrical cleavage of carotenoids such as α-carotene, β-carotene and β-cryptoxanthin [41]. For example, a BCO-like 1 gene was identified as responsible for carotenoid deposition and muscle coloration in Patinopecten yessoensis [32]. CD36, ABCG5, GST-Pi and StAR-like-3 also appear to play important roles in the carotenoid accumulation process during the early developmental stages of golden scallops, showing significantly higher expression levels in golden scallops than in brown scallops. These genes are also responsible for carotenoid absorption and transportation in other animals: CD36 plays a crucial role in the cellular absorption of provitamin A carotenoids [42]; polymorphisms of ABCG5 in humans affect the response to lutein supplementation [43]; GST-P1, which is widely expressed in human epithelial tissue, exhibits high affinity and specificity for zeaxanthin and acts as a zeaxanthin-binding protein [44]; and StARD3, a membrane protein of the human retina, has been identified as a lutein-binding protein involved in lutein absorption [45]. Together, these results suggest that these genes play crucial roles in the carotenoid accumulation process during the early developmental phases of golden scallops.

Conclusion
The full-length transcriptome of golden and brown scallops was obtained by PacBio RS II sequencing, and a comparative analysis between golden and brown scallops was conducted using NGS data. These results provide a basis for understanding the molecular mechanism of carotenoid accumulation in golden scallops.
In addition, the expression patterns of these carotenoid accumulation genes in golden and brown scallops were analyzed; the genes showed large differences in expression, and the functions of several genes were examined in the early developmental stages of golden scallops. Alternative splicing (AS) and lncRNAs were also analyzed in the present study. Our results provide a valuable resource for further study of carotenoid accumulation in animals.

Declarations
Ethics approval and consent to participate
All animals used in this study were handled in accordance with institutional and national guidelines.

Consent for publication
All authors agree to publish in this journal.

Availability of data and materials
All available data and materials are presented in the manuscript and additional files.

Figure 2
The SSR type distribution. Note: p1, p2, p3, p4, p5, p6 and c represent perfect single-base, double-base, three-base, four-base, five-base and six-base repeats and mixed SSRs, respectively (c contains at least two perfect SSRs separated by less than 100 bp).

Figure 4
Venn diagrams of genes uniquely expressed in each tissue or genes shared between one or more tissues in golden and brown scallops.

Figure 9
The expression levels of carotenoids related genes in the early developmental stages in scallops.
