Histology characteristics from the gonads of 3-year-old A. schrenckii
The gonads of 3-year-old A. schrenckii were visible at the early gametogenesis stage. In 3-year-old female individuals, the histological section of ovary showed deep, branching ovarian lamellae structure and was mainly composed of primary growth oocytes of different diameter sizes ranging from100 μm to 500 μm. Some oocytes at the perinucleolar stage were clearly observed with a high number of small nucleoli along the nuclear perimeter. In 3-year-old male individuals, section of the entire germ region of testis showed smooth surface of the lateral side. By histological observation, it was found that the testis tissue displayed alveolate seminiferous lobules organization structure. The anastomosing tubules were separated from each other by thin layers of compact connective tissues and filled with spermatogonia cells, differentiated primary spermatocytes and secondary spermatocytes enveloped by their own Sertoli cells. At the same time, aligned spermatogonia in mitosis connected by cytoplasmic bridges were also observed. Histomorphological characteristics of the testis and ovary from 3-year-old A. schrenckii individuals are showed in Figure 1.
Full-length transcripts from the gonads of A. schrenckii
The full-length transcriptome of A. schrenckii was generated using the PacBio Sequel platform on the pooled RNA from seven tissues, including three testes and four ovaries. The resulting total of 50.04 G subread bases was generated by two SMRT cells from the PacBio library; therefore, the 1,260,958 reads of insert  were produced with a mean read quality of 0.95 and mean passes of 14 circles (Supplementary Table 1). By applying the standard Iso-Seq classification and clustering protocol, all the ROIs were further classified into 358,153 nFL sequences and 860,617 FLNC reads with a mean length of 2,548 bp. Based on the ICE Quiver and Arrow polishing algorithms, we produced 461,596 polished full-length consensus transcripts with a mean length of 2,782 bp, including 335,067 high-quality (HQ) and 125,969 low-quality (LQ) sequences. After correction using short reads produced by Illumina short-read RNA-seq and subsequently removing redundancies using the CD-Hit program, the consensus transcripts were finally clustered into a total of 164,618 unigenes for subsequent analysis. We found that 91.09% of the unigenes main length distribution ranging from 1–6 kbp (Figure 2 and Supplementary Table 2). The Iso-Seq statistics from the gonads of A. schrenckii by the PacBio Sequel platform are listed in Table 1.
Meanwhile, the clean short reads were respectively generated from seven gonads of A. schrenckii (including three testes and four ovaries) using Illumina sequencing platform. The quality evaluation indexes of the clean short reads are summarized in Supplementary Table 3. In sum, the high-quality clean short reads from each sampled Illumina library was obtained with at least 6.96 ´107 read numbers and a Q30 of over 90%.
Efficient gene annotation of full-length unigenes from A. schrenckii gonads
To obtain a comprehensive functional annotation from the full-length gonad transcriptome of A. schrenckii, all the 164,618 nonredundant unigenes were assigned to align with seven different databases, including NR, KOG, Pfam, SwissProt, KEGG, GO, and eggNOG. A total of 93.55% of the unigenes (154,006 of 164,618) were successfully annotated with significant hits (E-value < 1E-5) from these well-curated databases. The statistics of the full-length unigene annotations are listed in Table 2. The remaining unannotated unigenes (10,612 unigenes) might represent novel A. schrenckii species-specific genes.
Among the annotated 60 classified GO terms, cellular process was identified as the most common annotation in the biological process; metabolic process and biological regulation were the next most abundant GO terms. In the molecular function and cellular component categories, binding and cell part annotations were identified as the most abundant terms, respectively. Two putative early gametogenesis related GO terms, including reproductive process (involving 1,188 unigenes) and reproduction (involving 1,134 unigenes), were successfully annotated. The GO classifications of the full-length unigenes from the gonads of A. schrenckii are shown in Supplementary Figure 1 and Supplementary Table 4.
In the KEGG classification, a total of 295 pathways annotated from 87,086 nonredundant unigenes were extracted from the gonad transcriptome of A. schrenckii (Supplementary Table 5). The results showed that the protein processing endoplasmic reticulum (2,329 unigenes), RNA transport (2,312 unigenes) and cell cycle (2,263 unigenes) were the top three pathways with the most abundant unigenes. Notably, we paid attention to 14 KEGG pathways, which may be closely associated with early gametogenesis of sturgeon. Among these, MAPK signaling pathway (1,834 unigenes), oocyte meiosis (1,731 unigenes) and progesterone-mediated oocyte maturation (1,398 unigenes) were the top three pathways with the most abundant unigenes distribution.
Search for genes involved in early gametogenesis of A. schrenckii
A total of 60 genes (involving 755 unigenes), reported as active in the gonad development of sturgeon [6, 20, 55], were detected to be present in the full-length gonad transcriptome of 3-year-old A. schrenckii. Therefore, the 60 early gametogenesis related genes and their NR annotations of full-length unigenes from the Iso-Seq of A. scipenserkii gonads are listed in Table 3 and Supplementary Table 6, respectively. Mainly, two major sex determination related genes in fish, Dmrt1and Gsdf, were also present in the full-length gonad transcriptome. Nine spermatogenesis-related genes were significantly matched with the full-length gonad transcriptome, including AR, Vasa, ER, Rspo1, Igf I, Dkk1, Hsd11b2, Cyp11b and ATRX. Meanwhile, four oogenesis-related genes were also predicted from the full-length gonad transcriptome, including Foxl2, Cyp19a, VR and Ozf. Four hormone receptor genes, including AR, ER, Gnrhr and Fshr. Two members of transforming growth factor (TGF)-β superfamily, Amh and Gsdf. Nine genes belonging to the sox subfamily, Sox3, Sox4, Sox5, Sox6, Sox7, Sox8, Sox9, Sox10 and Sox18, and five genes belonging to the double sex and mab-s (DM) domain, Dmrt1, Dmrt2, DmrtA1, DmrtA2 and DmrtB1, were found. In addition, Nine Sap family genes and four Hsp family genes were also concerned. We also found that Oct4 has the most abundant transcript diversity, with 100 unigenes predicted with ORFs among 104 unigenes, followed by Hsp70 (59/64) and Cyp19a (49/50).
Differential expression genes in early gametogenesis in the gonad transcriptome of A. schrenckii
Using nonredundant full-length transcripts as genome sequence references and combining the clean short reads datasets from the Illumina sequencing platform, the expression values (FPKM) of all 164,618 unigenes in three testes and four ovaries of seven A. schrenckii individuals were separately obtained. Usually, the 0.1 threshold of FPKM value is regarded as the expression criterion of the unigene in tested tissue (FPKM >0.1). Therefore, the expression characteristics of all the full-length unigenes between the testes and ovaries of A. schrenckii were classified into the following three categories. 1) 19,481 unigenes were not expressed in either ovaries or testes (FPKM=0 or FPKM <0.1); 2) 19,716 unigenes (13.6%) were ovary-specific expression patterns, including 14,284 unigenes in 0.1<FPKM<2, 4,242 unigenes in 2<FPKM<10 and 1,190 unigenes with FPKM>10; 3) In contrast, only 3,028 unigenes (2.1%) were exclusively transcribed in testis tissues, including 2,751 unigenes in 0.1<FPKM<2, 229 unigenes in 2<FPKM<10 and 48 unigenes with FPKM>10. Here, the distribution of testis-specific and ovary-specific unigenes are shown in a Venn diagram (Figure 3A).
DEseq software was used for the analysis of differentially expressed unigenes (DEUs) in the testes and ovaries. Among 24,101 DEUs with a |log2(Fold Change)| >1 and FDR < 0.05, 18,863 unigenes were upregulated in the ovaries, while 5,238 unigenes were upregulated in the testes (Supplementary Figure 2). In further analysis 30 genes of the 60 early gametogenesis-related genes were screened to have significant expression between the testes and the ovaries (Table 4). In total, twelve genes (Foxl2, Cyp19a, OCT4, Sox3, Sox7, Bmp15, Dkk1, Gsf1, Hsp, Hsp70, Hsp90 and Sap24) were shown to be upregulated in the ovaries, while eighteen genes (DmrtB1, Amh, Sox4, Sox5, Sox8, Sox9, Vasa, Rspo1, ERβ, Gsdf, Hsd11b2, Fshr, ATRX, Ozf6, Ozf7, Sap2, Sap5 and Sap6) exhibited significant higher expression in the testes. Among the highly expressed genes in the testes, Amh is a only tissue-specific gene, i.e. only expressed in testes.
To uncover the biological function of DEUs, we separately performed KEGG pathway enrichment analysis for the ovary-biased and testis-biased DEUs. As shown in Figure 3B and Figure 3C, we found that completely different KEGG pathways were enriched between the ovary-biased and testis-biased DEUs. A total of 17 and 11 terms significantly enriched from the ovary-biased and testis-biased DEUs were discovered, respectively (corrected P <0.05). Among the ovary-biased DEUs, most of the unigenes were involved in the three top KEGG pathways, including the cell cycle, oocyte meiosis and phagosome. However, most of the unigenes of the testis-biased DEUs were related to cell adhesion molecules, neuroactive ligand-receptor interactions, calcium signaling pathways, etc. We further analyzed the distribution of DEUs in early gametogenesis-related GO terms and KEGG pathways. We found that oocyte meiosis is the most significant pathway with 277 DEUs of 1,731 unigenes, followed by the progesterone-mediated oocyte maturation pathway with 201 DEUs (Table 5) and so we concluded that ovaries from 3-year-old A. schrenckii may be active in the oogenesis process.
Association analysis of alternative splicing (AS) and early gametogenesis from the gonad full-length transcriptomes of A. schrenckii
Because sturgeon species still had no reference genome, we detected AS transcript isoforms of the full-length gonad transcriptome from A. schrenckii by referring to the pipeline of Amborella trichopoda without a reference genome . As a result, a total of 236,672 AS events (involving 36,522 unigenes) were detected in the gonad transcriptome of A. schrenckii. Among these, we found that 16,909 unigenes (46.29%) had only one isoform; however, it is interesting to note that 5,314 unigenes (14.55%) were predicted to have more than 10 isoforms (Figure 4A).
Importantly, we found that sixteen early gametogensis-related genes were predicted to be involved in AS events (Supplementary Table 7). We selected two genes as examples, including Vasa (unigene ID: F01_cb6729_c68/f1p2/2928) predicted with four alternative isoforms and Fem1 (unigene ID: F01_cb8161_c15/f1p2/1763) predicted with five alternative isoforms. In Figure 4B, the cluster heatmap (Log2(FPKM+1) values) indicates the expression patterns of different alternative isoforms in the testes and ovaries of A. schrenckii. Vasa, a conserved spermatogenesis related gene, showed the main expression transcript with higher expression level in testes than that in ovary among 4 isoforms. Isoform 1 of Vasa had an obviously high expression in ovaries, and the other isoforms showed very low expression levels in both testes and ovaries. Meanwhile, we also found that expression levels of Fem and its five isoforms might seem like a paradoxical pattern. For example, isoform 2 and isoform 3 were the two main expression transcripts and had relatively high expression levels in both the ovaries and the testes, but isoform 4 and isoform 5 showed the opposite pattern between the ovaries and the testes. Therefore, these alternative isoforms suggest an important role in the regulation of gene expression through compensation or neutralization effects.
Subsequently, we investigated the distribution of AS events in the early gametogenesis-related GO terms and KEGG signaling pathways. The results indicated that the AS event is a pervasive feature involved in the early gametogenesis processes of sturgeon. The column plot showed that oocyte meiosis, progesterone-mediated oocyte maturation and the MAPK signaling pathway are the three most abundant AS events unigenes, while steroid biosynthesis is the least abundant event, with only eight unigenes (Figure 4C).
Association analysis of long noncoding RNAs (LncRNAs) and early gametogenesis from full-length gonad transcriptomes of A. schrenckii
A total of 10,566 unigenes were identified as putative LncRNAs from the full-length gonad transcriptome of A. schrenckii (Figure 5A). Further analysis indicated that 2,388 of the detected lncRNAs were not expressed in either the testes or ovaries (FPKM<0.1). Meanwhile, 524 LncRNAs present exclusively in ovary-specific expression and 551 LncRNAs present exclusively in testis-specific expression were also detected. Overall, 961 putative LncRNAs were differentially expressed between the ovaries and testes of A. schrenckii, including 235 ovary-biased LncRNAs and 726 testis-biased LncRNAs (Supplementary Figure 3).
Association analysis of transcript factors (TFs) and early gametogenesis from full-length gonad transcriptomes of A. schrenckii
A total of 4,339 nonredundant TF-related unigene sequences were matched against the AnimalTFDB database using BLAST, corresponding to 53 TF-related Pfam family domains. The top 20 abundant terms are listed in Figure 5B. The sixty unigenes were found to have two or more different TF-related domains. For example, the results indicated that F01_cb10567_c40/f1p0/2683 was predicted to have a Homebox and miscellaneous domain, which suggests that further identification needs to be performed.
Many SRY-related HMG (high-mobility group) box (Sox) transcription factors play an important role in male gonadal differentiation, spermatogenesis and gonadal function maintenance in vertebrate species. Because the SOX family often shares the conserved HMG box domain, members of the Sox family were herein identified from the full-length gonad transcriptome of A. schrenckii. After further validation using SMART protein motif analysis, 82 nonredundant unigenes were found to have both HMG domains and SOX NR annotations with significant matches. We also found that the sequence variations of Sox increased the numbers of unigenes, and the 82 nonredundant unigenes belonged to seven members of the SOX family, including Sox3 (52 unigenes), Sox4 (4 unigenes), Sox5 (5 unigenes), Sox7 (10 unigenes), Sox8 (6 unigenes), Sox9 (4 unigenes) and Sox10 (1 unigene) (Figure 5C and Supplementary Table 8).
Further analysis was performed using Sox9 as an example. The four nonredundant full-length unigenes were renamed Asc_Sox9-1-4 according to the length of PacBio Iso-Seq. The characteristics of the only completed sturgeon Sox9 protein from A. sinensis in the NCBI database (accession number: AHZ62758.1) and the Asc_Sox9-1-4 sequence are listed in detail in Table 6. The results showed two main differences: 1) compared to the length of Asi_Sox9 (2,145 bp), those of the four Asc_Sox9-1-4 were longer and varied from 2,873 bp to 3,491 bp. 2) The lengths of the amino acids in Asc_Sox9-1-4 also varied, from 429 aa to 488 aa. The information for the Asc_Sox9-1-4 sequences is listed in Supplementary File 1. Due to the significant differences in amino acid lengths, HMG domain position, UTR length and expression abundance, the four Asc_Sox9 genes suggested putatively novel transcripts in A. schrenckii. Figure 6 shows a schematic diagram of the gene structure and expression abundance (FPKM levels) among the Asc_Sox9-1-4 genes.
The phylogenetic tree was constructed from the alignment of the amino acid sequences in the four Asc_Sox9-1-4 genes with those from forty-six other animal species from five classes, including Mammalia, Aves, Amphibia, Reptilias and Osteichthyes (Supplementary Table 9). From Figure 7, we found that all four Asc_Sox9-1-4 genes and those of other sturgeon species were perfectly clustered into one group. Obviously, Asc_Sox9-1 gene was closer to that of A. baerii; however, it was genetically distant from Asc_Sox9-3, Asc_Sox9-2 and Asc_Sox9-4. As expected, compared with the other four classes, the sturgeon Sox9 genes had the closest evolutionary relationship with other teleost fish species among all the five classes. We also found differences between the Sox9 gene tree and the species tree, which suggests that Sox9 is under evolutionary selection pressure. Meanwhile, Sox9 genes from five different classes were clustered together into five groups, which indicates that their functions may be relatively conserved in vertebrates.