Acquisition of the Zebrafish Specimens
The wild-type AB zebrafish were purchased from the Wuhan Institute of Hydrobiology, Chinese Academy of Sciences, China, and the adult zebrafish of the same size (3 male and 3 female) were selected as experimental specimens.
Total RNA extraction and quality testing
Total RNA was extracted using Invitrogen Ambion RNA Extraction Kit according to the manufacturer's protocol (ThermoFisher Scientific, MA, USA). RNA degradation and contamination were monitored on a 1% agarose gel. The RNA integrity number (RIN) was measured using an Agilent Bioanakyzer 2100 (Agilent Technologies, CA, USA) (Agilent, CA, USA) to assess the RNA quality. Sequencing was performed if the samples RIN values were greater than eight. The total RNA concentration was determined using a Qubit 2.0 fluorometer (Life Technologies, CA, USA).
Sequencing Library Preparation, RNA-seq Sequencing, and raw data preprocessing
Library Preparation was created using VAHTS Stranded mRNA-seq Library Prep Kit according to the manufacturer's protocol (Vazyme, Nanjing,China). After the RNA samples passed the quality test, 2 g total RNA was enriched by magnetic beads with Oligo (dT). Subsequently, the Frag/Prime Buffer was used to break the mRNA into short fragments at 85℃ for 6 min. RNA fragments were converted to cDNA using random primers, followed by second-strand cDNA synthesis and end repair. The double-stranded cDNA was subsequently purified using AMPure XP beads (Beckman Coulter, CA, USA). The purified double-stranded cDNA was added an A and ligated to the sequencing linker, and AMPure XP beads were used for size selection of adapter-ligated DNA. Finally, PCR amplification was performed and the PCR product was purified using AMPure XP beads to obtain the final library. After the library was constructed, preliminary quantification was performed using Qubit 2.0 (Thermos fisher Scientific, MA, USA) , and then the size of the library was detected using the DNA High Sensitivity DNA Kit (Bioanalyzer 2100, Agilent, CA, USA) to ensure the proper insert size of 350–450 bp. The concentration of the library was accurately quantified using KAPA Library Quantification Kits according to the manufacturer's protocol (KAPA Biosystems, MA, USA). Subsequently, the library was sequenced using an Illumina HiSeq X Ten sequencing platform (Illumina, CA, USA). By using Trimmomatic software [35], Low-quality reads were filtered according to the following criteria: quality scores are less than 20, and reads with average quality scores of each read less than 20. FastQC software (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [36] was employed to assess the quality of the raw reads.
Mapping, annotation, and differential expression analysis for mRNA-seq data
The zebrafish reference genome (GRCz10/danRer10, Sep.2014) and the reference Index (the GTF file) were downloaded from UCSC (http://genome.ucsc.edu/). Firstly, bowtie2-build was used to index the reference genome, and then TopHat (version 2.1.0) was used [37] to map the reads to the reference genome. TopHat initially removed a small portion of the reads based on the quality information of each reads and then mapped the qualified reads to the reference genome [37]. In addition, the parameter of "--library– type" was set to "fr-firststr", and the other parameters were set to default values. Then the result file of TopHat was input into Cufflinks software for further analyses, including transcript assembly, abundance estimation, and differential expression of RNA-Seq samples [38]. The confidence intervals for estimation of fragments per kilobase of transcript per million mapped reads FPKM were calculated using a Bayesian inference method [39]. Differential expressed genes were characterized according to the criterion of a fold change > 1.5 and q-value < 0.05.
Gene ontology analyses and KEGG analysis
DAVID online tool (https://david.ncifcrf.gov) was used for identifying enriched biological themes [40]. Enriched GO terms with Gene-Count > 5 and P-value < 0.05 were selected as the thresholds for the subsequent analyses. Cytoscape software (two tools: ClueGO and CluePedia) was used for the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis [41], showing only pathways with P-value < 0.05. ClueGO network diagram was created based on Kappa statistics. Every node in the diagram represented a term that reflected the relationships between nodes, and the color of the nodes reflected the enrichment of the node classification.
Validation by RT-qPCR
For the characterization of the zebrafish brain, RT-qPCR of mRNAs was performed using SYBR Green PCR Master Mix (Fermentas, Guangzhou, China) following the manufacturer’s instructions. The experiments were repeated at least in triplicate. The gene-specific primers were as follows:
zgc:114181(forward: 5’-TCACCGCCTTCCTCAGAAAT-3’;
reverse: 5’-ACTGAGCCGTGACCACTTTA-3’);
f13a1a.1(forward: 5’-GCGTGTCATTCCAAAACCCT-3’;
reverse: 5’-CAACTTGCACAGCCAGGATT-3’);
vtna(forward: 5’-GACATTCGCCGGCTTGTATT-3’
reverse: 5’-CAAGCGGACACTAAGGATGC-3’);
rbp2a(forward: 5’-TGACTAAACAAAAGGGCGCC-3’;
reverse: 5’-CGCCTCTGTGCATCTTCTTC-3’).
-actin(forward primer:5’-TCACCACCACAGCCGAAAG-3’;
reverse primer:5’-GGTCAGCAATGCCAGGGTA-3’)
-actin was used as an internal control. The efficiencies of all sets of primers were between 91.6%-97.3%. We used 96-hole RT-qPCR plates including three negative controls to discard false positive amplification signals. In each PCR plate, synthesized cDNAs without adding reverse transcriptase were used to confirm no genomic contamination. PCR reactions were performed at 95°C for 5 min, followed by 40 cycles of 95°C for 15s and 60°C for 1 min. The fold changes were calculated by ∆∆CT method [42].
lncRNA identification and characterization
Cufflinks script was used to determine whether the detected transcripts were annotated by Refseq genes of zebrafish genome (build GRCz10/danRer10, Sep., 2014). Transcripts with length < 200 nt or > 10000 nt were discarded, and only transcripts with exon number > 1 were retained. Coding Potential Calculator software (CPC, http://cpc.cbi.pku.edu.cn/) [43] and Coding Potential Assessment Tool (CPAT, http://lilab.research.bcm.edu/cpat/) [44] were used for the coding potential prediction analysis. Only transcripts that were considered "non-coding" by both of these tools were considered potential lncRNAs and software cuffdiff [38] was employed for subsequent analysis.
Protein-protein interaction network and lncRNA-gene co-expression network construction
The STRING online software [45] was employed to construct the interaction network of the proteins encoded by the differentially expressed genes. A combined score > 0.7 was adopted as the cut-off criterion. According to the FPKM values of the different expression of the lncRNAs and the mRNAs in the six samples, the Pearson correlation coefficient between the lncRNAs and the mRNAs were calculated according to their FPKM values. The threshold for positive correlation was set to PCC > 0.8 and P < 0.05. Finally, the protein-protein interaction network and the regulatory network of lncRNAs-mRNA were established via Cytoscape software (http://www.cytoscape.org/)[41]. We used the ZFLNC database to conduct a conservative analysis of lncRNA [46].
Statistical analysis
We used Pearson’s correlation coefficient to assess relationships between gene expression and stationary behavior and determined significance with two-tailed P-values, and a P-value of less than 0.05 was considered to be statistically significant.
In RT-qPCR validation experiment, a two-tailed t test was used to assess the statistical deference between male and female zebrafish, and a P-value of less than 0.05 was considered to be statistically significant.
All the statistical analyses were performed using R (https://www.r-project.org/, version 3.5.1).