mRNA expression profiles in Mongolian horse testis development
To screen for genes related to testis development and spermiogenesis, a total of six libraries—three from immature colts (BSM1, BSM2, and BSM3) and three from mature stallions (ASM1, ASM2, and ASM3)—were constructed and sequenced on the Illumina Hiseq platform. An overview of the information for these libraries is listed in Table 1. The libraries were found to contain 58,277,598 (BSM1), 59,921,568 (BSM2), 53,371,478 (BSM3), 59,523,578 (ASM1), 61,727,216 (ASM2), and 53,956,064 (ASM3) raw reads. To ensure the quality of subsequent analysis, the adapter reads with ratio > 10% or with incomplete base information and low-quality reads were removed; the results were 57,482,858 (BSM1), 58,972,988 (BSM2), 52,475,320 (BSM3), 58,411,056 (ASM1), 60,775,868 (ASM2), and 53,135,736 (ASM3) clean reads. The error rate and GC content of each library were calculated to control for the quality of libraries (Table 1). These results verify that the six libraries were of high quality.
The clean reads were aligned to the reference genome of E. caballus, which is the most accurate and commonly used reference genome for horses[20, 21], using TopHat v2.0.12 (Table 2). In total, 84.85% (BSM1), 85.05% (BSM2), 84.25% (BSM3), 84.38% (ASM1), 84.14% (ASM2), and 84.23% ASM3) clean reads were uniquely mapped to the reference genome. In addition, more than 40% of clean reads were non-splice reads in each sample. Additionally, 27.98% (BSM1), 27.35% (BSM2), 26.92% (BSM3), 36.17% (ASM1), 34.89% (ASM2), and 34.74% (ASM3) of the clean reads were mapped to the borders of exons (also called “junction reads”). All RNA-Seq data have been submitted to The Gene Expression Omnibus, accession number GEO: GSE101697).
Analysis of alternative splicing
Alternative splicing (AS) is a common occurrence in most eukaryotic cells and can result in the translation of different forms of proteins at various time points and environments, which increases the adaptability and physical fitness of the species. In our study, the AS events were classified into five kinds by rMATS (http://rnaseq-mats.sourceforge.net/index.html): (1) SE: Skipped exon, (2) MXE: Mutually exclusive exon, (3) A5SS: Alternative 5' splice site, (4) A3SS: Alternative 3' splice site, and (5) RI: Retained intron. The kinds and quantities of AS events were counted, and the expression of each type of AS event was analyzed. The results indicate that numerous and varied AS events exist in the spermatogenesis process Table 3.
Analysis of DEGs
The DEGs between libraries were screened based on DEGSeq analysis, taking |log2(fold-change)| > 1 and P < 0.05 as the cut-off. By considering ad-joining libraries, 16,237 up- and 8,641 downregulated genes were identified in total that were up-regulated in ASM versus BSM (Table S1, Fig. 1a). Venn diagrams were constructed to summarize the data, which show that the ASM and BSM libraries shared 16,327 genes that were expressed in both BSM and ASM (Fig. 1b). Figure 1c shows the results of hierarchical clustering, with the DEGs of the six libraries divided into two clusters. These results show that the ASM and BSM libraries had overall distinct patterns of differential expression but that the replicates within each developmental stage were similar.
GO pathway enrichment analysis of DEGs
GO pathway enrichment analysis was used to explore the functions of the DEGs in testis development. A total of 709 GO terms related to “biological processes”, “molecular functions”, and “cellular components” were significantly enriched across the ASM vs BSM. DEG groups (Table S2, upregulated GO terms; Table S3, downregulated GO terms, Fig. 2a). Approximately 510 GO terms belong to “biological processes”, and these were mainly focused on “spermatogenesis” (36 genes), “male gamete generation” (36 genes), “spermatid development” (18 genes), “metabolic processes” (2,454 genes), “organic substance metabolic process” (2,044 genes), and “cellular metabolic process” (1,948 genes). Around 84 GO terms belong to the functional category of “cellular component”, and these were mainly concentrated in “intracellular” (1,951 genes), “organelle” (1,844 genes), and “intracellular part” (1,668 genes) classification groups. Approximately 115 GO terms belong to “molecular function”, which centered on “binding” (3,037), “protein binding” (1,719), and “catalytic activity” (1,659 genes) (Fig. 2b). The differential up- and down-regulation may due to the gene ontology associations with spermatogenesis or, conversely, spermatogonia development. Further research will aid in clarifying this discrepancy.
KEGG pathway enrichment analysis of DEGs
KEGG pathway enrichment analysis was performed on these differently expressed genes with a cut-off criterion of corrected P-values < 0.05. A total of 10 enriched pathways were detected (Table S4, upregulated KEGG pathways; Table S5). Six KEGG pathways were upregulated, including “ubiquitin-mediated proteolysis”, “protein processing in endoplasmic reticulum”, “RNA transport”, “metabolic pathways”, “oocyte meiosis”, and “cell cycle”. Four pathways were downregulated, including “ribosome”, “focal adhesion”, ‘proteoglycans in cancer”, and “ECM-receptor interactions” (Fig. 3).
Differential gene transcription factor analysis
Putative transcription factors were analyzed using the animal transcription factor database TFDB 2.0. For genes included in the database, the transcription factors were directly screened by Ensembl geneID; for genes not contained in database, the transcription factors were identified by hmmsearch according to transcription factor family Pfam files. Non-Ensembl geneID genes were screened through the known species transcription factor protein sequence database by BLASTX. In our study, 917 known gene transcription factors and 402 unknown transcription factors were detected, belonging to 57 gene families (Tables S6 and S7).
Quantitative real time PCR (qRT-PCR) validation of differential expression
To verify the differential expression of genes in ASM vs. BSM testis, we selected eight DEGs, including MEI1 (two different sites), EHMT2, BTRC, CLOCK, IQCG, ANAPC5 and SKIP, to validate the expression patterns by qRT-PCR. These genes belong to “oocyte meiosis” KEGG pathways and GO terms “spermatogenesis”, “male gamete generation”, and “spermatid development” (Table S8). There was a relatively high correlation between the mRNA expression levels of the eight genes,as detected by qRT-PCR and RNA-Seq (correlation coefficient = 0.853), and all 9 genes were upregulated in ASM. This suggests that our RNA-Seq results are reliable.