Identification and differential expression analysis of lncRNAs in the yak skin
A total of 748.61M raw reads were produced from 15 skin samples in 5 phase groups of yaks using the Illumina Hiseq 2000 platform, in which 733.79 M reads were clean (Additional file 2), accounting for 98% of raw reads. After screening with rigorous criteria and analyzing the coding potential with the software CNCI, CPC, Pfam and PLEK, 2884lncRNAs from all samples were identified (Fig. 1a). Of these transcripts, 70.1% (2023) were novel lncRNAs. The average length of novel lncRNAs was 1152 bp. Furthermore, the data showed that approximately 40% of the novel lncRNAs were antisense lncRNAs and 60% were sense lncRNAs, besides approximately 90% of the novel lncRNAs with two to three exons (Fig. 1b). These data of lncRNAs were similar to the features of previously described lncRNAs [44].
The expression levels of lncRNAs were analyzed using Cuffdiff v2.1.1. The DELs of every two distinct groups were screened with the threshold set as fold change >2 and P-adjusted value <0.05. Numbers of all the DELs between every comparison groups and the common and specific DELs were visualized and were shown in Fig. 1c. Based on the result, Oct. was selected in the anagen phase with Jan. (catagen) and Mar. (telogen) for further functional analysis, because both Mar.-vs-Oct. and Jan.-vs-Oct. comparison groups had a higher number of DELs, while the number of DELs was relatively minor in the Mar.-vs-Jun. and Aug.-vs-Oct. comparison groups. Jun. and Aug. were preliminarily excluded to simplify subsequent analysis. In the differential expression analysis of the 3 groups (Jan., Mar. and Oct.), 280 DELs were obtained in the Jan.-vs-Mar., 340 and 198 DELs were obtained in the Mar.-vs-Oct. and Jan.-vs-Oct. comparison groups, respectively. Also, 22 of these DELs were shared in the 3 comparison groups (Fig. 1d). PCA was used to analyze the sample similarities. A more concentrated distribution of samples in the same group was clearly observed. In addition, Fig. 1e showed that the sample-to-sample distances are closer in the same group. The data indicated that the samples had high reliability.
GO and KEGG enrichment analysis
The nearest protein-coding genes paired with DELs were used to perform GO and KEGG enrichment analysis so as to investigate the potential function of DELs during the three distinct HFs phases. Fig. 2a shows the top 30 GO terms (top 10 terms in each GO category).. The result showed that the GO terms were enriched mainly in Keratinization, keratin filament, carbohydrate metabolic process, and thiamine transport, which are involved in hair formation or nutrient metabolism. In the Mar.-vs-Oct. comparison group, protein ubiquitination was highly enriched. The ubiquitin-mediated proteolysis pathway was reported to be important for distinguishing the secondary hair follicles (SFs) () from primary hair follicles (PFs) () in cashmere goats [45, 46], suggesting that the regulation of protein ubiquitination was related to the transition between PFs and SFs from Mar. to Oct. in the present study.
KEGG enrichment analysis was performed. The top 20 KEGG pathways with the number of DELs greater than 2 are shown in Fig. 2b. The VEGF signaling pathway, Wnt signaling pathway, and signaling pathways regulating the pluripotency of stem cells, which were reported to play indispensable roles during the HFs cycle, were enriched [47-49]. In addition, hormone and nutrition metabolism signaling pathway (e.g. estrogen signaling pathway, vitamin digestion and absorption, and glycine, serine, and threonine metabolism), immune response signaling pathway (e.g. antigen processing and presentation and natural killer cell–mediated cytotoxicity) were enriched, which might be essential for the HFs growth [6, 50]. The pathways among distinct phases coincidentally corresponding with the biological process of hair cycling seemed to orchestrate a dynamic genesis of the HFs cycle at the molecular level. For example, in Jan.-vs-Mar. comparison group, the only up-regulated pathway was vitamin digestion and absorption due to the relative resting state in the telogen (Mar.) and reserve energy for the next HFs cycle (Figure S1). Moreover, theHIF-1 signaling pathway involved in the perception of hypoxia was enriched, which might be related to the high altitude environment with frigid and hypoxia of yak living. The protein-protein interaction networks of coding genes that paired with all the DELs of the three distinct phases were analyzed using the STRING database (https://string-db.org). The result showed that ITCH, PSMD1, FBXL19, and HSPA9 primarily involved in protein ubiquitination (queried from the UniProt database) were key nodes of the DELs paired genes (Fig. 2c). The ubiquitin-mediated proteolysis pathway was important for distinguishing SFs from PFs in cashmere goats [45, 46], and it was recently reported that the ubiquitylation process associated with Hedgehog signaling a crucial signaling that affecting on HFs regeneration [51-53].
Comparative analysis of DELs in yak with cashmere goat lncRNA data
To further screen the DELs that are more likely to affect the HFs cycle in yak, 93 differently expressed protein coding genes (DEGs) paired with 110 DELs were obtained by combining the present data with previous mRNA sequencing data on the HFs cycle in yak [42]. The detailed information on these DELs and DEGs were presented in Additional file 3. The expression tendency and abundance of most DELs were consistent with the paired coding genes, although a few pairs such as TCONS_00055063 with its paired gene NOS3 had an opposite expression tendency. Six lncRNAs and four lncRNA-paired protein-coding genes, which were differentially expressed in at least any two distinct phases, were selected to verify the transcriptome sequencing data by RT-qPCR. LncRNAs were selected based on the expression levels and potential function of the paired protein-coding gene in hair development. Fig. 3a shows the RT-qPCR result and the comparison of the sequencing data and RT-qPCR data.
LncRNAs have species and tissue specificity with lower sequence conservation [11]. Yak and cashmere goat belong to the Bovidae family of ruminants, and they had a similar seasonal HFs cycle. To explore the sequence conservation level between yak and cashmere goat during the HFs cycle and obtain more reliable DELs regulating HFs growth, NCBI BLAST-2.9.0+ was used to detect the sequence-conserved lncRNAs of these 110 DELs with the lncRNA data of the cashmere goat in the HFs cycle (532 lncRNA sequences) [19]. All the lncRNA sequence of yak in this study was showed in Additional file 4.The BLAST result found that 24 DELs among these 110 DELs had sequence conservation with cashmere goat lncRNA data to different degrees. Then, the expression profiles of aligned DELs in the yak with their respective alignments in the cashmere goat dataset were comparatively analyzed. Excluding the repetitive lncRNA sequences, a total of 126 lncRNAs in the cashmere goat dataset were aligned by 24 DELs of the yak dataset. The aligned sequences in both datasets all accounted for approximately 20% (126/532, 24/110) of their total databases used in BLAST-2.9.0+ (Additional file 5). Among the 126 lncRNA sequences, 41 were found to be differentially expressed in the cashmere goat during the HFs cycle [19]. Comparative analysis found that the expression profiles of most of the differently expressed alignments (approximately 80%) in the cashmere goat in anagen and telogen were similar to those of the aligned DELs in the yak in anagen (Oct.) and the transition from catagen (Jan.) to telogen (Mar.).(Additional file 6). This finding suggested that the number of lncRNAs with a conserved sequence is limited, accounting for about 20%. The expression pattern of the sequence conserved DELs between the yak and cashmere goat data had a little difference, which probably related to the difference from species and sampling time, considering only two phases of the HFs cycle in the cashmere goat. Or not all kinds of conserved elements in the lncRNAs play the similar roles between species. Further experiments are needed to detect whether the DELs with conserved elements between the yak and cashmere goat playing similar functions during the HFs cycle.
Fig. 3b shows the 24 DELs and paired 23 DEGs of the yak with a heat map. In addition, comparative analysis of the 93 DEGs in the yak with the DEGs between the anagen and telogen of cashmere goat data [19] found that 10 DEGs were differentially expressed both in the yak and cashmere goat during the HFs cycle and the expression tendency in the cyclic process was consistent (Fig. 3c). The paired lncRNAs of genes GPRC5D, FER, EMX2 and ERAP2 simultaneously had a conserved sequence between the yak and cashmere goat. GPRC5Dand EMX2 were associated with HFs morphogenesis [54, 55]. These results verified the sequencing data and further revealed some potentially important DELs and their paired genes affecting theHFs cycle based on the comparative analysis of lncRNA-seq data between the yak and cashmere goat.
Sequence conservation analysis of lncRNAs between yak and cashmere goat
In the above section, the sequence conservation of the 110 DELs in yak was investigated by compared with cashmere goat lncRNA data during HFs cycle using NCBI BLAST-2.9.0+. Then, a detailed analysis of the sequence conservation properties of these aligned lncRNAs were performed. In the result of NCBI BLAST-2.9.0+, total of 302 hits were found in goat lncRNA data, aligned with 24 DELs of the yak database (Additional file 5) [19]. Length of the matching regions of the alignments was distributed between 28-3765 bp, and the matching percentage was more than 80% (Additional file 5). Fig. 4a shows the length distribution of the matching sequence of the alignments in cashmere goat lncRNA data, which indicates that the number of the matching sequence distributed between 101-200 bp is the most, accounting for nearly 60% of all the hits [19] (Fig. 4a, Additional file5). Furthermore, the matching regions of every aligned lncRNA in goat data were analyzed, revealing that most of the queried lncRNAs in the yak had numerous alignments with cashmere goat lncRNA data while the matching regions on one lncRNA in the yak were constrained (Additional files 5 and7). Table 1 presents partial queries that aligned more than 10 lncRNA sequences in cashmere goat data. For example, the matching region of lncRNA TCONS_00027937 with a high sequence conservation aligned to 36 lncRNA sequences of the cashmere goat data while all of these alignments roughly matched with the region of 43-160 of TCONS_00027937, the region might be an important element for the function of TCONS_00027937. Then, the lncRNAs TCONS_00008989 and TCONS_00020227 were randomly selected, along with lncRNA TCONS_00016111, which had the longest matching sequence to 3765 bp, were used to perform further detailed analysis for their matching regions.
For TCONS_00008989, all the alignments of cashmere goat data were roughly mapped to the region of 243-442 of TCONS_00008989. Then, a local multiple sequence alignment was manually performed in the matching region of TCONS_00008989 with other aligned 11 different sequences in cashmere goat. The result showed that the matching region between 315-356 of TCONS_00008989 about 40 bp was highly identical among 10 of the alignments (Figure S2). The sequence logo of this highly identical region among alignments was presented using WebLogo3 (http://weblogo.threeplusone.com/) (Fig. 4b).This short sequence might be an essential binding module for dictating the function of TCONS_00008989. Besides, it was speculated that a common feature of sequence conservation of lncRNAs was a short highly conserved sequence shared by multiple lncRNAs of one species or various species.
TCONS_00020227 is an intronic lncRNA, which was aligned to 31 lncRNA sequences of cashmere goat data, mainly matched with 4 regions: 2506-3121, 4251-4452, 4723-4898, and 4886-5053 within TCONS_00020227 (Additional file 5, Table 1). Compared with TCONS_00008989, the matching regions in TCONS_00020227 were relatively varied. The presence of repeats on TCONS_00020227 was analyzed using the Dot Plot of TCONS_00020227 on the dottup website. Two pairs of repeats appeared on TCONS_00020227 (Fig. 4c). The detailed regions of the repeats were presented by pair-wise alignment using Clone Manager software (Fig. 4c), finding that the latter repeats were coincident with the matching regions of 4723-4898 and 4886-5053, detected by NCBI BLAST-2.9.0+ (Fig. 4c and Table 1). The slight difference in the sequence value might be due to the use of distinct analysis methods, but the ranges were comparable. Another repeats appearing in the Dot Plot of TCONS_00020227 were also observed using Clone Manager:4252-4449 (equivalent to 4251-4452 detected by NCBI BLAST-2.9.0+), which repeated with the range of 3387-3563; despite a lower match percentage. This result showed that the matching regions repeatedly existed in TCONS_00020227 and further indicated the importance of the matching sequences. Besides, the data also indicated the effectivity of the BLAST+ result.
Finally, the sequence conservation of TCONS_00016111 was analyzed. As TCONS_00016111 had a longer matching region to 3765 bp, the sequence conservation of TCONS_00016111 was explored using Nucleotide BLAST in nucleotide collection databases with default parameters. The BLAST result showed that lncRNA TCONS_00016111 mapped to multiple species genomes, including Ovis canadensis canadensis, whales, Bos taurus, Capra hircus and so on. Most of these alignments were ncRNAs (Figure S3). Interestingly, the distance tree of the selected alignments of the top 15 organisms in the defaulted first page of BLAST result showed that TCONS_00016111 with Ovis canadensis canadensis and whales were closer than bovines at the distance tree (Fig. 4d and Figure S3), suggesting a relationship between Ovis canadensis canadensis, whales, and yaks in terms of some evolutionary traits. Moreover, two matching regions 2881-3029 and 3191-3365 of TCONS_0001611 searched using NCBI BLAST-2.9.0+ between yak and cashmere goat lncRNA data emerged in nearly all of the mapped sequences in the first page using Nucleotide BLAST (Fig. 4e). More importantly, these two regions were largely repeated in the genomes of Ovis canadensis Canadensis (bighorn sheep) and Bos mutus (yak) in different chromosomes, the number of repeats was from hundreds to more than one thousand. Fig. 4f shows an example of the searched results. Why these two regions were simultaneously emerged in genome in abundance. The two regions were found to be partly reverse-complementary so that the region 2881-3365 could form a stable secondary structure (Figure S4), which might be one of the conserved secondary structures of TCONS_0001611. This finding was consistent with the reported conservation properties of lncRNAs in terms of the conserved secondary structure [13]. In addition, miR-34a in cattle, Pan troglodytes, and Gorilla gorilla, and the pre-miR-34a and pri-miR-34a in human were found within the alignments of TCONS_00016111 (Fig. 4e), implicating that TCONS_00016111 also act as a miRNA precursor.