The globe transcriptome analysis reveals hybrid effects and regulatory divergence between (Acanthopagrus schlegelii ♂× Pagrus major ♀) hybrid F1 and parents CURRENT STATUS: POSTED

Background: Hybridization in two related homoploid species can mix the genotypes which have the potential for good environmental adaptation conditions or rapid growth condition. Since hybridization technique has been used in other animals to produce more rapidly growing individuals, it may be useful in commercial fish culture. The nutritional composition and rapid growth rate of P. major, A. schlegelii and hybrid F1 had been researched. However, gene expressions of these fish are less well studied. de novo assembly RNA-Seq is a better approach to understanding transcriptomes of animals and can offer much data with better coverage and sufficient sequence depth for transcriptomes. Results: In order to facilitate molecular research in parents and hybrids F1, we characterized the whole body transcriptome for identifcation of transcripts involved in growth, metabolism and immune since several attributes were coming from different samples (Acanthopagrus schlegelii, Pagrus major and hybrid F1 Acanthopagrus schlegelii ♂× Pagrus major ♀). We analyzed the transcriptome of As, Pm and hybrid F1 AP using Illumina high throughput sequencing platform. Our results suggested that physiological index of organism development we observed is consistent to the gene express in transcriptome. We have also compared the expression levels of transcripts in parents and hybrids F1. The results showed that AP had a quickly growth and strong resistant than parents Pm and As. Conclusion: This is the first research on transcriptome of As, Pm and hybrid F1 at one month age. And it will provide an important basic molecular data for studying molecular mechanisms involved in different sample.


Background
Hybridization in two related homoploid species can mix the genotypes which have the potential for good environmental adaptation conditions or rapid growth condition [1].
These situations have been confirmed in some plants and animals [2,3], including in fish [4]. Since hybridization technique has been used in other animals to produce more rapidly growing individuals, it may be useful in commercial fish culture [5]. And the ability of some hybrid fishes to grow more rapidly than the parent species has been reported by Ricker [6], Smirnov [7], Buss and Wright [8], and Garside and Christie [9].
Black seabream (Acanthopagrus schlegelii) (Bleeker, 1854) is one kind of marine carnivorous fish. It has high aquaculture potential because of its high economic value and excellent meat quality. Also it has strong ability to tolerate a wide range of environmental conditions, high stocking densities and strong resistance to diseases [10,11]. From the previous research, it is highly adaptable to a wide range of water temperatures and salinities [12,13]. But the long rearing period to market size and poor resistance to disease is the defect. Red seabream,Pagrus major (Temminck and Schlegel, 1843), another most important species in fisheries, has the advantages of large size, fast growth and high disease resistance [14], but it is susceptible to salinity and water temperature change.
The nutritional composition and rapid growth rate of P. major, A. schlegelii and hybrid F1 had been researched. However, gene expressions of these fish are less well studied. de novo assembly RNA-Seq is a better approach to understanding transcriptomes of animals and can offer much data with better coverage and sufficient sequence depth for transcriptomes. In this study, we report the de novo assembly transcriptome of A. schlegelii (As), P. major (Pm) and hybrid F1 (A. schlegelii♂ × P. major♀, AP) for the first time. The high-quality reads were assembled into unique transcripts. Then these transcripts were annotated using different database to identify the putative pathways and genes responsible for their body properties.

Results
Illumina paired-end sequencing and De novo assembly The fish were used as sequencing sample in biological duplicates. The pre-processing of raw reads was done for removal of adaptor sequences and low-quality reads and the average clean reads were 47.21M ± 7.05M (mean ± SD). After quality trimming, 97.37% ± 0.52% of reads were retained, indicating a high quality data set (>85% reads with ≥Q30).
The GC contents were 50.02% ±0.28%. The final assembly consisted of 151,982, 166,373, 216,483 transcripts and the assembled transcripts were further clustered into 120,855, 129,258, 155,311 unigenes with As, Pm and AP samples. The transcriptome assembly details are given in Table 1.

Functional annotation of unigenes
We annotated unigenes by NR, NT, KO, SwissPort, PFAM, GO, KOG databases. The detail information are showed in Table 2. NR and NT is the non-redundant NCBI collection and protein sequence database. And the Nt database can align more nucleic acid in this study.
Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, posttranslational modifications, variants, etc), a minimal level of redundancy and high level of integration with other databases. And this database successfully annotated the protein also contain over 20%. Of these, 34008, 30691 and 31770 were successfully GO mapped.
GO terms for biological processes, molecular function and cellular components were all highly represented in annotated genes (Fig. 1).
Also the KEGG database was used to analyze the the active biological pathways. These assembled unigenes were assigned to 32 biological pathways through this process. Among them, the highest pathway assigned to the unigenes is Signal transduction, followed by Endocrine system, Immune system, Nervous system, Signaling molecules and interaction, Cellular commiunity, Transport and catabolism and Digestive system (Fig. 2).

Differential gene expression analysis
The estimation of transcript abundance or expression levels of de novo assembled unigenes from each sample transcriptome was calculated based on FPKM values using RSEM. And the DEGs analysis were used to compare the hybrid F1 with As and Pm transcriptome: AP vs As and AP vs Pm. The results showed that the genes expressed in different groups had significantly different (Fig. 4). There are 13,441 and 10,075 genes significantly expressed in two groups (Fig. 5). GO and KEGG were used to analyse the function of genes.
In AP vs As group, most DEGs belonged to chemokine activity, chemokine receptor binding, carbohydrate binding, G-protein coupled receptor binding, cytokine receptor binding, cytokine activity, nucleic acid binding, immune system process, immune response, and so on (Fig. 6). It indicated that in hybrid F1 and parent, immune system had changed. In AP vs Pm group, these metabolic process, cellular metabolic process, catalytic activity, nitrogen compound metabolic process, intracellular part, organic cyclic compound binding, and so on (Fig. 6), indicated that hybrid F1 compared with its parent, had a strong development process.

Discussion
During the development phase of larval fish, many biological processes, such as differentiation, cellular proliferation, growth, can be regulated to obtain the phenotype of an adult fish [21]. These regulations are carried out through the expression of genes involved in larval ontogenesis processes. Gene expression analysis can be comprehensively analyze through transcriptome using high throughput RNA-Seq tool.
Previous physiologic studies showed that hybrid F1 seabream (Pagrus major♀ × Acanthopagrus schlegelii♂) grow quickly, with enhanced muscle nutritional composition [16][17][18][19]. However, the transcriptome information about the larval As, Pm and F1 are lacked. In the present study, we presented the main findings from transcriptomic investigations of As, Pm and AP during postembryonic (one month age) development 21 .
From the results of RNA-Seq, the whole data is nearly 6G of each sample ( Table 1). The depth of sequencing is enough and the quantity of our sequencing data are convincing for next analysis.
We used the unique transcripts as the database to analysis the development processes of each sample. From the gene function annotation, KEGG results showed that most genes belongs to signal transduction, endocrine system, immune system, nervous system, signaling molecules and interaction, and so on, in each sample (Fig. 2). The molecular results were consistent with physiological development. The multiple biological processes were combined with consistencies of results obtained from the annotated expressed genes in each sample, reveals the complexity of the transcriptome regulation during larval ontogenesis processes. The authors used the transcriptome analysis to show that more than two hundred genes involved in visual pigmentation, metabolism, digestive function and epithelial development exhibited differential expression during the early development of gilthead sea bream [22]. Form previous physiological research about As, Pm and AP samples, the fish began growing quickly, and the digest system, circulatory system, and excretory system are gradually formed during the fourth day after hatching [23]. The transcriptome results was consistent with physiological research. Our results showed that the many genes belonged to endocrine system, immune system, nervous system development has express in each sample. As we all know, immune system is very important to juvenile fish. When the immune system formed, they can prevent the disease invading about environment change or viral infect. During the agricultural production, 25-35 days fish will be sold to farm. These fish need to face the different environment to leave off the incubation environment. In this period, the fish have good condition for transport far away. Our result supposed that the immune system of the fish is more active at this phase, and the fish body has a strong ability to resist the invasion of external pathogens. Besides, metabolism process, such as carbohydrate metabolism, lipid metabolism, amino acid metabolism, and nucleotide metabolism, all had the active geneexpress (Fig. 2).  (Table 3). These genes in each pathway most have high expression during the individual growth. It also suggested that the cell in fish had cell multiplication to make the body grow. Then, the high expressed genes in fat digestion and absorption and oxidative phosphorylation pathway suggested that the fish need more energy for growing. Also between As and AP compared, AP also has a quickly growth. During the early development, the cell quickly divided into more cells to make body growth. The genes belong to DNA replication and cell cycle pathway can show this situation. During the replication, it need ECM-receptor interaction and mismatch repair to make the organism keep the biological function, such as cell aldhere [24]. The developmental integrity of fish cells plays an important role in the resistance of the whole body to external viruses. Besides the quickly growth, there are much other information we can get from AP and As. There are 111 genes of viral carcinogenesis which had high expressed in AP than in As ( Table 3). The high express genes showed a strong cell-proliferation ability in AP. As we know, tumor viruses can promote an abnormal cellproliferation via modulating cellular cell-signaling pathways via expression of many potent oncoproteins [25][26][27]. Our results showed that 111 genes belonging to oncoproteins had high expression in AP. It can suppose that AP had stronger cell-proliferation than As, suggesting that AP maybe grow quickly than As. Another special pathway in AP compared with As, is that 106 genes of herpes simplex infection pathway had higher expression.
Herpes simplex virus (HSV) is the main cause of herpes infections that lead to the formation of characteristic blistering lesion. HSV can interfere with host immune responses by expressing multiple viral accessory proteins [28]. Influenza A is a contagious respiratory disease caused by influenza virus infection, which had 91 genes highly expressed in AP sample than in As. Influenza virus has one kind of proteins which is a multifunctional virulence factor that interfere IFN-mediated antiviral response. It can inhibit activation of transcription factors such as NF-kappa B [29]. Except the virus infect, the parasite also can disrupt the intestinal mucus layer of fish, followed by apoptosis of host epithelial cells. 82 genes belonging to amoebiasis pathway had higher expression in AP than in As. The ingestion of contaminated water and food can lead the pathogenesis of amoebiasis. Intestinal tissue destruction causes severe dysentery and ulcerations in amoebic colitis [30]. And several amoebic proteins, such as lectins, cysteine proteineases, and amoebapores, are associated with the invasion process [31]. The fusion proteins have aberrant transcriptional function for fusion transcription factors alter expression of target genes, and thereby result in the tumorigenesis. They are all leaded by transcriptional misregulation. From the DEGs, There are 77 genes higher expressing in AP sample, belong to Transcriptional misregulation. During rearing, the juvenile fish often were attracted by virus in environment. These pathways maybe provided the challenge ability for fish to resist the environment. Also during these pathways, there were many immune related genes which had high express in fish. These results suggested that the AP, compared with As and Pm, has stronger immune system. The further studies about the As, Pm and AP immune response needs to be carried out in future.

Fish rearing and collection
The As (A. schlegelii),, Pm (P. major) and hybrid F1 AP ( A. schlegelii♂ × P. major♀) fertilized eggs were incubated in the cage of cement pool at 22.4 ℃, and the pH was about 8.2, and the hatched larvae were developed synchronously. They were temporarily maintained at the Jiangsu Province Seedling breeding and breeding Technology Center.
The larvae fish were preserved using RNA later and stored at −80°C for RNA extraction.

RNA extraction and qualification
Total RNA were extracted using Trizol extraction method [32]. RNA quantification were detected by 1.5% agarose gels. The more detail information about RNA quantification, RNA purity, concentration and integrity were carried out. RNA purity was assessed using the NanoPhotometer ® spectrophotometer (IMPLEN, CA, USA). Concentration was measured using Qubit ® RNA Assay Kit in Qubit ® 2.0 Flurometer (Life Technologies, CA, USA). And Integrity was checked by the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA).

Library preparation for transcriptome sequencing
A total amount of 1.5 µg RNA per sample was used for sample preparations. Sequencing libraries were generated using NEBNext ® Ultra™ RNA Library Prep Kit for Illumina ® (NEB, USA) following manufacturer's recommendations. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5×). First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H). Second strand cDNA synthesis was subsequently performed using DNA polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease / polymerase activities. After adenylation of 3' ends of DNA fragments, NEBNext Adaptor with hairpin loop structure were ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 µl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37°C for 15 min followed by 5 min at 95 °C before PCR. Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer.
At last, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumia) according to the manufacturer's instructions (Novogene Experimental Department). After cluster generation, the library preparations were sequenced on an Illumina Hiseq platform and paired-end reads were generated.

Data analysis
Raw data (raw reads) of fastq format were firstly processed through in-house perl scripts.
In this step, clean data (clean reads) were obtained by removing reads containing adapter, reads containing ploy-N and low quality reads from raw data. At the same time, Q20, Q30, GC-content and sequence duplication level of the clean data were calculated. All the downstream analyses were based on clean data with high quality. Transcriptome assembly was accomplished using Trinity [33] by default and all other parameters set default. Differential expression analysis of two groups was performed using the DESeq R package (1.10.1) [34,35]. Gene expression levels were estimated by RSEM [36] for each sample.
DESeq provide statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate. Genes with an adjusted P-value <0.05 found by DESeq were assigned as differentially expressed.
Also the Gene Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was implemented by the GOseq R packages based Wallenius non-central hypergeometric distribution [37] which can adjust for gene length bias in DEGs. KEGG [38] is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/). We        The cluster heatmap in AP, As and Pm sample.

Figure 5
Volcano plot of DEGs in AP vs As and AP vs Pm groups. GO classification of DEGs in AP vs As and AP vs Pm groups