Selection of novel candidate HKGs based on RNA-seq data
From a complete transcriptome dataset, the fragments per kilobase of exon model per million mapped reads (FPKM) of all transcripts from each individual animal were obtained. We first removed some transcripts which did not have a credible function annotation, or exhibited low levels of expression (FPKM=0). This resulted in 15853 unigenes being found for further selection. Next, genes with a relatively high expression level (FPKM ≥ 10 or ≥ the 80th percentile) as determined by the average of the FPKM value, and genes with low variability as determined by the coefficient of variation (CV, %), maximum fold change (MFC), and dispersion measure (DPM), were evaluated (see Methods section). As shown in Figure 1, the probability density curve of all 15853 unigenes was evaluated based on the following 4 indicators:
(1) FPKM. Potential HKGs were relatively highly expressed genes . In this study, 5623 genes had FPKM values ≥ 10 (35.5% of 15853 genes, the green area in Figure 1A).
(2) CV (%). The most promising HKGs would have the lowest CV values. A total of 2266 genes with a CV ≤ 20% (14.3% of 15853 genes, the red area in Figure 1B) were retained in this step with CVs ranging from 7.7% to 20.0%.
(3) DPM. Most stable genes exhibited lower DPM values. The default parameter of DPM<0.3 returned an excessive 7025 unigenes, and so a more stringent DPM < 0.2 was used. Following this, 2026 genes (12.8% of 15853 genes, the yellow area in Figure 1C) were retained in this step with DPM values ranging from 0.09 to 0.2.
(4) MFC. This parameter reflects the range of extremum value, and the lowest MFC values are preferable. In this study, MFC < 2.5 was used which produced 2508 genes (15.8% of 15853 genes, the blue area in Figure 1D), all within the range of 1.35 to 2.5.
A Venn diagram was constructed for the 4-color blocks (green, red, yellow, and blue corresponding to those used in Figure 1A-D, respectively). This showed that 1325 genes (Figure 1E) met all 4 of the above requirements, and are significantly enriched in 11 signaling pathways (q <0.05) as shown in Additional file 1: Figure S1. These genes were considered as candidate HKGs and of these, 8 genes (RRAGA, PTPRA, SRP68, EIF4H, NCBP3, CTBP2, CNBP, and EEF2) that with lower CV value, higher FPKM value and easier primers design were selected for further qualification. Besides, 4 genes outside of the initial 1325 were considered, including SDHA and YWHAZ as they had previously been proposed by other researchers , and ACTB and GAPDH genes were included as the most commonly used endogenous HKGs for exploring target gene expression in goats. In total, 12 candidate HKGs were analyzed in subsequent steps, each gene was ranked based on its CV value with a lower CV receiving a higher-ranking order (Table 1).
Amplification specificity and efficiency of the candidate HKGs and target genes
A total of 15 primer pairs including 12 candidate HKGs and 3 target genes were designed for qRT-PCR experiments. The gene symbol, primer and amplicon specifications are shown in Additional file 1: Table S1. Amplification efficiency for all 15 genes ranged from 96.4% for DKK1 to 103.9% for PTPRA, and the coefficient of determination (R2) varied from 0.9986 to 0.9999. The specificity for each paired primer was validated by the melting curve analysis, which showed a single amplification peak (Additional file 1: Figure S2). Each pair of primers had good specificity and amplification efficiency around 100%.
Expression profiles of the candidate HKGs
The mean Ct (the average of 3 technical replicates in the same sample) values were used to calculate gene expression levels among samples with distinct experimental factors. As shown in Figure 2 and Additional file 1: Table S2, the expression level of the 12 candidate HKGs varied widely from 20.74 to 31.60. The most highly expressed gene was ACTB (mean Ct value: 23.25 cycles), and the lowest was SPR68 (mean Ct value: 29.07 cycles). The top 3 genes with low standard deviations were SRP68 (0.875), NCBP3 (0.970), and PTPRA (0.972). The 3 most deviant expressed genes were ACTB (1.483), CNBP (1.277), and GAPDH (1.258). The narrower standard deviation range of a gene means it has higher expression stability in different samples. Although some genes had a lower standard deviation than others, experimental errors are always possible. Therefore, to obtain a reliable evaluation of these candidate HKGs, further analysis with more scientific algorithms is needed.
Analysis of HKG expression stability
In this study, 4 publically available algorithms were used to evaluate HKGs for higher-accuracy stability rankings: geNorm, NormFinder, BestKeeper, and the ΔCt method.
Gene expression stability was determined by the M-value in geNorm analysis; the lower M value suggests a higher gene expression stability. For group 1, the two most stable genes were EIF4H and EEF2 with the lowest M value, and GAPDH was the most unstable gene (Figure 3A). For group 2, the two most stable genes were EIF4H and PTPRA, and ACTB was the most unstable gene (Figure 3B). For group 3, the two most stable genes were EIF4H and PTPRA, whereas ACTB was the most unstable gene (Figure 3C). For group 4, the two most stable genes were NCBP3 and PTPRA, and GAPDH was the most unstable gene (Figure 3D). For all samples, geNorm analysis was conducted on 39 samples and 12 HKGs. It was determined that the 3 most stable genes were PTPRA, EIF4H, and NCBP3. Conversely, ACTB, CNBP, and GAPDH were the most unstable genes (Figure 3E).
geNorm can be used to determine the minimum optimal number of HKGs needed for accurate normalization under different experimental treatments by analysing pairwise variation (Vn/Vn+1). This method recognizes Vn/Vn+1 <0.15 as a threshold value, and “n” as an appropriate number of HKG needed. The V2/3 values for all the experimental variables were below the cut-off value of 0.15 (0.067, 0.078, 0.099, 0.091 and 0.081 for group 1, 2, 3, 4, and all samples, respectively), which indicate that using double HKGs (first two genes in each group) is sufficiently accurate for use in normalizing qRT-PCR derived gene expression data (Additional file 1: Figure S3). The triplet or more gene combinations can also be used as Vn/Vn+1 <0.15.
Expression stability values, as determined by NormFinder, are shown in Table 2. For group 1, SDHA and EIF4H were the most stable HKGs, and ACTB was the least stable gene, which was the same as was determined by geNorm. In group 2, SDHA and NCBP3 were the most stable HKGs while ACTB was the least stable gene. In group 3, SDHA and YWHAZ got the top rank, while ACTB ranked at the lowest. In group 4, PTPRA and NCBP3 were the most stable, while GAPDH ranked at the lowest. In all samples, SDHA and NCBP3 were the most stable, while ACTB was the least.
The BestKeeper algorithm used std-values to assess HKG stability with the lower the std-value, the more stable HKG expression was. As shown in Table 3, in group 1, SDHA and PTPRA were the most stable HKGs, whereas RRAGA was the least stable. The same was observed with the geNorm analysis. In group 2, SDHA and EEF2 were the most stable HKGs, while ACTB was the least stable. In group 3, SDHA and YWHAZ got the top rank, while SPR68 ranked at the lowest. In group 4, EEF2 and NCBP3 were most stable, while GAPDH was the least. In all samples, SDHA and EEF2 were most stable, while ACTB was the least.
The 12 candidate HKGs were analyzed using the Delta Ct method, the data of which is presented in Table 4. The stability of the gene is inversely related to the std-value, thus a lower value indicates greater stability. In group 1, the two most stably expressed genes were PTPRA and SDHA, and the lowest were GAPDH and ACTB. In group 2, the two most stable genes were EEF2 and SDHA, and the least were ACTB and CNBP. In group 3, SDHA and PTPRA were the most stably expressed, whereas ACTB and CNBP were the least. In group 4, the top two stably expressed genes were NCBP3 and EEF2, whereas CNBP and GAPDH were the least. In all samples, the 3 most stable genes were PTPRA, EEF2, and SDHA, while GAPDH, CNBP, and ACTB were the least stable genes.
A comprehensive ranking of the four methods examined
Next, the ComprFinder algorithm was employed to obtain a comprehensive score that was used to rank the potential HKGs (Table 5). In group 1, the 3 most stable HKGs were EIF4H, PTPRA, and SDHA. In group 2, SDHA, NCBP3, and EEF2 were the most stable HKGs analyzed. In group 3, SDHA, PTPRA, and EIF4H were the three most stable HKGs analyzed. In group 4, NCBP3, PTPRA, and EEF2 were the most stable genes. The overall rankings, from the highest to the lowest stability, were NCBP3 >SDHA > PTPRA > EEF2 > EIF4H > SRP68 > CTBP2 > YWHAZ > RRAGA > GAPDH > CNBP > ACTB. It is interesting to note that the top 3 genes in different group rankings have at least 2 of NCBP3, SDHA and PTPRA. In contrast, the commonly used HKGs, ACTB and GAPDH, were relegated to the bottom 2 and 4 positions, respectively.
NCBP3, SDHA, PTPRA were the 3 most stable HKGs across all samples with scores within a tight range, calculated at 0.096, 0.099, and 0.108, respectively. They were also preferably ranked in groups 1-4 relative to other genes and were therefore considered to be the 3 most promising candidate HKGs, and were advanced for further validation.
Validation of HKGs by DKK1, SHH, and FGF5 genes
Based on the above analyses, 3 target genes (DKK1, SHH, and FGF5) were further characterized based on their changes in expression levels during the secondary hair follicle cycle (T1, T2, T3) with normalizations using different single HKG and multi-gene combinations. It was observed that NCBP3, SDHA, and EEF2 were the top 3 HKGs in group 2 (factor: hair follicle cycle) based on expression stability. Therefore, it can be concluded that the combination of NCBP3+SDHA+EEF2 was the best-normalized gene set for group 2. Since these 3 genes (NCBP3, SDHA, and PTPRA) are possibly the most important candidate HKGs, they were further characterized to determine optimal combinations for normalization of gene expression studies. Four multi-gene combinations, including NCBP3+SDHA+PTPRA, NCBP3+SDHA, NCBP3+PTPRA, and SDHA+PTPRA, in addition to 3 single-genes (NCBP3, SDHA, and PTPRA) were used for the analysis. Conversely, ACTB and GAPDH were used for comparison and were also examined as the multi-gene combination ACTB+GAPDH. In total, 11 multi-gene combinations or single genes were used as normalization factors. For multiple gene combinations, the geometric average of their Ct value was calculated. The relative gene expression level was calculated as 2-ΔCt, ΔCt= Δ (Cttarget gene-CtHKGs).
As is shown in Figure 4A, the expression profiles of DKK1 were similarly obtained using the 8 stable single-gene and multi-gene combinations. Furthermore, it was observed that DKK1 was more highly expressed in T2 compared to T1, and it was most highly expressed during the T3. Among the unstable single- and multi-gene combinations, only ACTB and ACTB+GAPDH performed similarly to the stable genes. However, the gene expression profile as normalized by GAPDH was different from the other conditions, and no significant difference has been identified among T1, T2, and T3. Expression of the SHH gene was even during the T1 and T2, but there was a significant decrease in T3 (Figure 4B). The 5 multi-gene combinations and NCBP3, SDHA identified this trend, but PTPRA did not. Though the GAPDH-normalized gene expression profile had similar trends to stable multi-gene combinations, ACTB was different. The combination of ACTB + GAPDH identified this expression change as a trend, but was not able to detect significant changes in expression. The expression profile of the FGF5 gene, when normalized by the most stable candidate HKGs used individually or in combination here, were very similar. High expression levels were observed in T2, but no statistical significance was identified relative to T1 and T3 (Figure 4C). The combination of ACTB + GAPDH showed a similar pattern to the stable HKGs, but when ACTB and GAPDH were used individually, the expression patterns were completely different. Furthermore, significant differences in ACTB were also identified in T2 relative to T1.
The above-mentioned results derived from Figure 4 reflect the differences of expression profiles of a single target gene normalized by 11 types of single or multiple-gene combinations. To further understand the relationship of those single or multi-HKG combinations, a correlation analysis on these relative expression data (2-ΔCt) of 3 target genes was performed. As shown in Figure 5, the normalized results using NCBP3+SDHA+EEF2 and NCBP3+SDHA+PTPRA had a high correlation coefficient (R=0.990, P <0.001), suggesting that they have extremely similar normalization capabilities. Other double-gene combinations including NCBP3+SDHA, NCBP3+PTPRA, and SDHA+PTPRA had high correlation coefficients, ranging from 0.969-0.997 with NCBP3+SDHA+EEF2. Also, these double-gene combinations had high correlation coefficients of 0.989-0.994 with NCBP3+SDHA+PTPRA. This indicated that these 3 types of double-gene combinations exhibited similar normalization capabilities to NCBP3+SDHA+EEF2 and NCBP3+SDHA+PTPRA. For single stable HKGs, NCBP3, SDHA, and PTPRA also exhibited high correlation coefficients with NCBP3+SDHA+EEF2 (0.942-0.973) and NCBP3+SDHA+PTPRA (0.952-0.977). The ACTB, GAPDH, and ACTB+GAPDH combinations had relatively low correlation coefficients with any of the stable single- (0.513-0.780) and multi-gene combinations (0.548-0.738).