mRNA expression profiles in Mongolian horse testis development
To screen for genes related to testis development and spermiogenesis, a total of six libraries—three from immature colts (BSM1, BSM2, and BSM3) and three from mature stallions (ASM1, ASM2, and ASM3)—were constructed and sequenced on the Illumina Hiseq platform. An overview of the information for these libraries is listed in Table S1. The libraries were found to contain 58,277,598 (BSM1), 59,921,568 (BSM2), 53,371,478 (BSM3), 59,523,578 (ASM1), 61,727,216 (ASM2), and 53,956,064 (ASM3) raw reads. To ensure the quality of subsequent analysis, the adapter reads, the N base > 10% and low-quality reads were removed; the results were 57,482,858 (BSM1), 58,972,988 (BSM2), 52,475,320 (BSM3), 58,411,056 (ASM1), 60,775,868 (ASM2), and 53,135,736 (ASM3) clean reads. The error rate and GC content of each library were calculated to control for the quality of libraries (Table S1). These results verify that the six libraries were of high quality.
The clean reads were aligned to the reference genome of E. caballus, which is the most accurate and commonly used reference genome for horses[9, 10], using TopHat v2.0.12. In total, 84.85% (BSM1), 85.05% (BSM2), 84.25% (BSM3), 84.38% (ASM1), 84.14% (ASM2), and 84.23% (ASM3) clean reads were uniquely mapped to the reference genome. In addition, more than 40% of clean reads were non-splice reads in each sample. Additionally, 27.98% (BSM1), 27.35% (BSM2), 26.92% (BSM3), 36.17% (ASM1), 34.89% (ASM2), and 34.74% (ASM3) of the clean reads were mapped to the borders of exons (also called “junction reads”) (Table S2). All RNA-Seq data have been submitted to The Gene Expression Omnibus, accession number GEO: GSE101697).
Analysis of alternative splicing
Alternative splicing (AS) is a common occurrence in most eukaryotic cells and can result in the translation of different forms of proteins at various time points and environments, which increases the adaptability and physical fitness of the species. In our study, the AS events were classified into five kinds by rMATS (http://rnaseq-mats.sourceforge.net/index.html): (1) SE: Skipped exon, (2) MXE: Mutually exclusive exon, (3) A5SS: Alternative 5' splice site, (4) A3SS: Alternative 3' splice site, and (5) RI: Retained intron. The kinds and quantities of AS events were counted, and the expression of each type of AS event was analyzed. The results indicate that numerous and varied AS events exist in the spermatogenesis process Table S3.
Analysis of DEGs
The DEGs between libraries were screened based on DEGSeq analysis, taking padj < 0.05 as the cut-off. By considering ad-joining libraries, 16,237 up- and 8,641 down-regulated genes were identified in total that were up-regulated in ASM versus BSM (Fig. 1a). Venn diagrams were constructed to summarize the data, which show that the ASM and BSM libraries shared 16,327 genes that were expressed in both BSM and ASM (Fig. 1b). Fig. 1c shows the results of hierarchical clustering, with the DEGs of the six libraries divided into two clusters. These results show that the ASM and BSM libraries had overall distinct patterns of differential expression but that the replicates within each developmental stage were similar.
For further assessment of DEGs that are transcription factors. A total 922 transcription factors belong to 60 transcription factors families.
GO pathway enrichment analysis of DEGs
GO pathway enrichment analysis was used to explore the functions of the DEGs in testis development. A total of 70 GO terms related to “biological processes,” “molecular functions,” and “cellular components” were significantly enriched across the ASM vs BSM. DEG groups (Fig. 2). Approximately 32 GO terms belong to “biological processes,” and these were mainly focused on “metabolic processes” (2,454 genes), “organic substance metabolic process” (2,044 genes), and “cellular metabolic process” (1,948 genes). Approximately 20 GO terms belong to the functional category of “cellular component,” and these were mainly concentrated in “intracellular” (1,951 genes), “intracellular part” (1,844 genes), and “organelle” (1,668 genes) classification groups. Approximately 18 GO terms belong to “molecular function,” which centered on “binding” (3,037 genes), “protein binding” (1,719 genes), and “catalytic activity” (1,659 genes).
KEGG pathway enrichment analysis of DEGs
KEGG pathway enrichment analysis was performed on these differently expressed genes with a cut-off criterion of corrected P-values < 0.05. A total of 10 enriched pathways were detected. Six KEGG pathways were upregulated, including “ubiquitin-mediated proteolysis,” “protein processing in endoplasmic reticulum,” “RNA transport,” “metabolic pathways,” “oocyte meiosis,” and “cell cycle.” Four pathways were downregulated, including “ribosome,” “focal adhesion,” ‘proteoglycans in cancer,” and “ECM-receptor interactions” (Fig. 3).
Quantitative real time PCR (qRT-PCR) validation of differential expression
To verify the differential expression of genes in ASM vs. BSM testis, we selected eight DEGs, including MEI1 (two different sites), EHMT2, BTRC, CLOCK, IQCG, ANAPC5 and SKP1, to validate the expression patterns by qRT-PCR. These genes belong to “oocyte meiosis” KEGG pathways and GO terms “spermatogenesis,” “male gamete generation,” and “spermatid development” (Table S4). There was a relatively high correlation between the mRNA expression levels of the eight genes, as detected by qRT-PCR and RNA-Seq (correlation coefficient = 0.853), and all 8 genes were upregulated in ASM. This suggests that our RNA-Seq results are reliable.