Evaluation of Reference Genes for Expression Normalization in Psoralea Corylifolia L. under Abiotic Stress

Background: Psoralea corylifolia L., a traditional Chinese herbal medicine, has a wide range of pharmacological activities for the treatment of various diseases. To date, there have been no published reports of the systematic evaluation and selection of the best reference genes for biological analysis of P. corylifolia. Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is a method for quantifying gene expression. Selecting the appropriate reference genes is essential to ensure accurate normalization of RT-qPCR results. Results: In this study, 10 candidate reference genes, namely actin (ACT), glyceraldehyde 3-phosphate dehydrogenase (GAPDH), protein phosphatase 2A subunit A3 (PP2A), nuclear cap-binding protein subunit 2 (NCBP2), cyclophilin 2 (CYP2), elongation factor 1-alpha (EF-1α), protein phosphatase 2A-2 (PP2A2), thioredoxin-like protein YLS8 (YLS8), polypyrimidine tract-binding protein (PTBP), and TIP41-like family protein (TIP41), were screened from the transcriptome database of P. corylifolia. The plants were treated under different abiotic stress conditions (H 2 O 2 , NaCl, CuSO 4 , high-temperature, low-temperature, PEG, and UV). Next, their expression stability was evaluated using four algorithm tools: BestKeeper, NormFinder, geNorm, and RefFinder. The results revealed that TIP41, CYP2, and YLS8 were the top three genes with the most stable expression levels under various abiotic stress conditions. Conclusion: We successfully selected appropriate reference genes for P. corylifolia, providing a reliable basis for further studies on the quantitative gene expression prole of the main active components in P. corylifolia.

dehydrogenase (GAPDH), 18 S ribosomal RNA (18 S rRNA), α-tubulin (TUB α), β-actin (ACT β), and elongation factor 1α (EF-1α). However, recent studies have reported variabilities in the expression of these reference genes under different processing environments [29][30][31][32]. Selecting incorrect reference genes will inevitably misguide the analysis. To date, there has been no systematic evaluation or validation of reference genes for RT-qPCR analysis of P. corylifolia. Consequently, identi cation of a reliable reference gene is essential for elucidating the gene expression pro les of P. corylifolia. Therefore, in this study, we aimed to discover a reliable reference gene for P. corylifolia. For this purpose, 10 candidate reference genes, namely ACT, GAPDH, PP2A, NCBP2, CYP2, EF-1α, PP2A2, YLS8, PTBP, and TIP41, were selected from the P. corylifolia database. To determine the appropriate reference genes under different environmental conditions, the following abiotic stress conditions were applied: oxidative stress (H 2 O 2 ), salt stress (NaCl), mental stress (CuSO 4 ), high-temperature (42°C) and low-temperature (4°C) stress, osmotic stress (polyethylene glycol, PEG), and ultraviolet (UV) rays. Finally, the raw data were analyzed using three Excel-based software packages, BestKeeper [33], NormFinder [34], and geNorm [35]. RefFinder was used to comprehensively evaluate the results. This study is the rst to explore the selection of appropriate reference genes in P. corylifolia under different abiotic stress conditions. It also provides a credible normalization standard and basis for further analysis of the gene expression of bioactive components in P. corylifolia.

Plant materials and stress treatment
One-year-old plants of P. corylifolia were collected from from Ningguo City, Anhui Province, China (longitude: 118.95E, latitude: 30.62N), and was identi ed as Psoralea corylifolia Linn. by Zhang Ning, school of pharmacy, Jiangsu Health Vocational College, and deposited in the herbarium of the medicinal botanical garden (ID: JSJK-PC-012). The planting eld of P. corylifolia was a private land and the landowner has allowed us to use it for study. Then, these accessions were transplanted into plastic basin composed of vermiculite, perlite, and peat moss at a ratio of 1:1:1. Next, the plants were grown in a greenhouse at a temperature of 25 ℃, a long photoperiod of 16 hours light and 8 hours darkness, 40-70% relative humidity, and 3000 lux light intensity until treated. To induce oxidative stress, the plants were exposed to 200 ml of H 2 O 2 for 24 h. For the salt stress treatment, the plants were exposed to approximately 200 mL (600 mM) NaCl for 7 days. Heavy metal stress was induced by treatment with 500 mM CuSO 4 for 24 h. For high-and low-temperature stress, the plants were stored in a constant-temperature incubator at 42°C and 4°C, respectively, for 48 h.To induce osmotic stress, the plants were cultured in 200 mL of 25% PEG 6000 for one week. For UV stress, the plants were exposed to UV light for 24 h, during which irrigation was carried out with 100 mL distilled water. The control group received no treatment, but was sprinkled with distilled water over time. All groupshad three biological replicates. The collected fresh leaf samples were cleaned, sterilized, frozen in liquid nitrogen to prevent RNA degradation, and then stored in a refrigerator at -80°C [36].

Selection of candidate reference genes and primers design
Based on previous studies in various plants, 10 common reference genes (ACT, GAPDH, PP2A, NCBP2, CYP2, EF-1α, PP2A2, YLS8, PTBP, and TIP41) were selected as candidates reference genes in this study. Information on the reference gene sequence of P. corylifolia was obtained from the TAIR database and is listed in Table 1. The potential unigenes of the local blast were screened by a program in the Bioedit Sequence Alignment Editor (v7.0.9), and the value of FPKM (fragments per kilobase of exon model per million mapped fragments) was employed to characterize the expression level of unigene, which could eliminate the in uence of gene length and sequencing volume differences on the calculated gene expression level (Table S1). The results of the melt curve analysis of the PCR products are depicted in Fig. S1, and most of them presented a single peak curve. The speci city of the primer sequences was veri ed. However, compared with that of other reference genes, the melting curve of PP2A showed an apparent uctuation trend, indicating that the speci city of its primer may be poor.
Total RNA extraction and RT-qPCR analysis RNA was extracted from approximately 100 mg of frozen samples using a Spectrum Plant Total RNA kit (Sigma, USA). The purity of the extracted RNA was analyzed using a NanoDrop spectrophotometer 2000 (Thermo Scienti c, USA). Only samples with OD 260/280 and OD 260/230 ratios between 1.8-2.1 and 1.6-2.2, respectively, can be used for the next operation. Reverse transcription was performed using HiScript II Reverse Transcriptase (Vazyme, Nanjing, China) in a 20 µL volume system containing 1 µg RNA. The reverse cDNA was diluted 10-fold as a template and ampli ed in PCR instrument under the following conditions: one cycle of 95°C for 5 min, 40 cycles of 95°C for 10 s, 58-60°C for 20 s, and 72°C for 20s. The relative expression levels of the genes were calculated according to the formula 2 ΔΔ−Ct [37].

Gene expression stability analysis
The cycle threshold (Ct) value indicates the cycle number when the generated uorescence signals in each response tube reach a detectable level. Hence, the Ct value directly re ects the expression level of a candidate reference gene. Correspondingly, a lower Ct value indicated a higher gene expression level. In this study, the expression stability of the candidate reference genes was compared using Bestkeeper, NormFinder and geNorm. These kinds of software use different algorithms to rank the expression level of each candidate gene.
BestKeeper is an algorithm software based on Excel, in which the raw Ct value can be inputted directly for evaluating and sequencing the stability of the corresponding reference genes under different stresses by calculating two parameters: coe cient of variation (CV) and standard deviation (SD). A low CV ± SD value indicates a high gene expression stability.
In contrast, in NormFinder and geNorm, the raw Ct value needs to be converted by the formula 2 −ΔCt (ΔCt = each corresponding Ct value -the minimum Ct value). geNorm shows the gene expression level by the parameter (M), where the threshold value of M is 1.5 [38]. The lower the M value, the higher the expression stability of the candidate. Furthermore, the pairwise variation (V n /V n+1 ) analysis of geNorm can be utilized to optimize the number of reference genes for accurate normalization. In general, when the value of V n /V n+1 is less than 0.15, no additional genes are required for normalization [39]. NormFinder measures the stability of gene expression by considering the value of the inter-group and intra-group variance M. Similar to that in other two algorithms, the lowest value of parameter M re ects the most stable gene expression. Through comprehensive analysis of the results of these software, reference genes that show the most stable expression under different stress conditions were accurately screened.

Statistical analysis
Three technical and biological replicates were implemented to obtain qPCR data, and the raw data are listed in Table S2. All data are presented as the mean ± standard error of the mean (SEM). Graphs were generated using Origin 2019 (OriginLab Corporation, Northampton, MA, USA). All data were analyzed using the geNorm, NormFinder and BestKeeper software according to the speci cation methods.

Results
Expression pro le of the candidate reference genes The mean Ct values of the 10 candidate reference genes ranged from 9.95-33.16, and most were distributed between 18 and 24. PP2A presented the highest expression level, with the lowest Ct values. TIP41 had the least variability, with the lowest SD value of 0.72, followed by YLS8, CYP2, PP2A, and ACT, suggesting that these genes possessed stable expression levels under different stress conditions. In contrast, PP2A2 showed the highest mean Ct value (33.16 ± 2.44), which indicated that this reference was not worthy of consideration (Fig. 1). It is worth noting that the high CV ± SD value of PP2A2under each condition indicated that its gene expression was not stable, which corresponded with the result of the Ct value. However, PP2A showed the lowest stability with the lowest CV ± SD value under most stress treatments, which was contrary to the outcomeof the Ct value. Therefore, further evaluation is of great signi cance to obtain accurate results in reference gene selection. Table 2 Expression stability rank of 10 candidate reference genes using BestKeeper in P. corylifolia (coe cient of variation (CV) ± standard deviation (SD)). NormFinder analysis Similar to that in BestKeeperanalysis, a lower Ct value indicates a higher stability of gene expression. As shown in ACT showed the highest expression stability in the NaCl and UV stress subsets. In the CuSO 4 and Cold subsets, YLS8 was the most stable reference gene. TIP41 had the highest stability in the high temperature and wild type subsets. NCPB2 and PP2A2 showed the lowest expression stability under the different stress conditions. Among all sample subsets, YLS8 was the most stable reference gene, followed by TIP41 and CYP2. Interestingly, in the overall ranking, the top three genes from the BestKeeper and NormFinder analyses were consistent. In addition to comparing and ranking the expression stability of the reference genes, geNorm can also optimize the number of reference genes by using pairwise variation (V n /V n+1 ) as a parameter. A threshold value of 0.15 was used to determine whether one more reference gene should be added for normalization. A value of < 0.15 indicates that an additional reference gene has no notable signi cance for normalization. In this study, we calculated the V n/n+1 value by using geNorm (Fig. 3 and Table S3). In all sterss condition except low temperature, two reference genes were su cient for normalization, with V 2/3 < 0.15. In the low-temperature subset, the V 3/4 value was below 0.15, which suggested that three reference genes should be applied for normalization.

Comprehensive analysis of reference gene validation
RefFinder was used to systematically select the optimal reference genes under different treatment conditions. By comparing the results of the above three algorithms with the ΔCt method, RefFinder can assess and screen out the most stable reference genes [40]. The results are listed in Table S4. In addition, all gene pairs were compared with each other and ranked according to the value of ΔCt, from low to high, as shown in Table S5. According to these results, TIP41 showed high stability in most treatment groups, which was consistent with the results of geNorm analysis.
Next, the geometric means of the three algorithms were calculated according to the ranks of the reference genes. TIP41, CYP2, and YLS8, ranked top three in all algorithms. In contrast, PP2A, NCBP2, and PP2A2, which suggested that these genes were not suitable for use as reference genes under different stress conditions (Fig. 4). Table S6 shows the ranking of the reference genes in all four algorithm tools revealing the genes that had stable expression under different stress conditions.

Discussion
In this study, among 10 candidate genes in P. corylifolia, the genes that show expression stability under various abiotic stress conditions were selected as the most suitable reference genes for use in RT-qPCR analyses. First, we veri ed the primer speci city of the candidate reference genes and the PCR experimental conditions. A single peak in the dissolution curve con rmed the speci city of the primers (Fig. S1). Moreover, according to the slope of the standard curve, all gene primers, except for PP2A and PP2A2 primers, had high PCR e ciency, with their R values are higher than 0.99 (Table 1). For these genes, there was a good linear relationship between Ct value and diluted cDNA concentration, as well as acceptable PCR conditions [40]. As a quali ed reference gene, its expression level should be within a reasonable range under various stress treatments. As illustrated in Fig. 1, the mean Ct values for the 10 candidates ranged from 9.95 to 33.16; the majority of these values were found to be between 18 and 24 in all tested samples, which meant that most candidates were likely to provide accurate normalization [41]. Low Ct values correspond to high expression levels and a narrow Ct distribution range implies lower variability and higher stability. Thus, NCBPI and PP2A2 may not be suitable as a reference gene. In contrast to PP2A, PP2A presented a narrow range of very low Ct values.
To systematically evaluate and compare the expression stability of the candidate genes, we used three Excel-based mathematical tools to process the raw data. Although the results of all three methods were reasonable, they were not completely consistent. The main reason is that they depend on different algorithms [19], therefore, another software RefFinder was applied to conduct a comprehensive analyze and rankthe results of the three algorithms to increase the credibility of the results. Interestingly, the three algorithms consistenly showed the same three genes as the most stable genes in the high temperature, PEG, and control subsets. Although BestKeeper revealed TIP41 as the most stable gene in most subsets ( Table 2). NormFindershowed that TIP41 had good stability in only three subsets (Table 3), this difference was mainly due to the different algorithms used to process the data. YLS8 also showed good stability in most stress conditions suggesting its potential as an appropriate reference gene.
In this study, GAPDH exhibited the highest stability under PEG stress, indicating that it can be a reliable reference gene under drought conditions. Although GAPDH has been widely used as a reference gene in different plants such as Linum usitatissimum L. and Coffea arabica [42,43], several other studies have also shown that GAPDH exhibited instability under different conditions [44,45], which may be partly due to the involvement of GAPDH in various cellular processes besides glycolysis [46].
Our results also showed that EF-1α, and ACT presented high stability in individual groups. In Arabidopsis, the expression of EF-1α was relatively stable [47]. However, under nutritional de ciencies or abiotic stresses, EF-1α shows poor expression stability [48]. ACT has been successfully applied to study the effects of low temperature and light in Festuca pratensis [49].
Considering that the average relative expression level of genes obtained using multiple internal reference genes is less variable than that obtained using a single internal reference gene [50], RT-qPCR with multiple reference genes may yield more stable and reliable results [22]. The number of reference genes used for normalization can be analyzed based on the paired variation value (V n /V n+1 ) calculated using geNorm software. Generally, once the V n /V n+1 value is above the suggested threshold value of 0.15, an additional reference gene must be introduced to calibrate the experimental results. As shown inTable S3, only the low-temperature subset required three reference genes, whereas the other subsets needed only two reference genes for normalization. NCBP2, PP2A, and PP2A2, showed the lowest stability in all stress conditions, according to the ranking of the three algorithms, indicating that they were not suitable reference genes for P. corylifolia gene research. Although PP2A showed a high expression level with low Ct value and was proven to be stable in some studies [51], another report showed its instability under several environments as well as tissue expression speci city [52]. Indeed, the use of a single, untested reference gene to standardize RT-qPCR results is usually not acceptable [24]. Thus, RefFinder analysis was conducted to evaluate the ranking generated by the three algorithms, and the result was similar to that of geNorm analysis. The nal ranking is presented in Fig. 4, with TIP41, CYP2, and YLS8 shown as the top three reference genes. Therefore, we recommended using these three genes as reference genes for RT-qPCR normalization in the study of P. corylifolia gene expression pro le.

Conclusion
Comprehensive analysis of BestKeeper, NormFinder, and geNorm results revealed TIP41, CYP2, and YLS8 as the reference genes with the most stable expression under all stress conditions tested, as well as PP2A, NCBP2, and PP2A2 as the least stable genes. geNorm analysis also showed that two stable reference genes were su cient for normalization under most treatment conditions. This is the rst study on the screening of reference genes in P. corylifolia. Our ndings not only lays a foundation for future research but also has a profound signi cance for guiding the screening of appropriate reference genes in other plant species.
Abbreviations Figure 1 The cycle threshold (Ct) values of 10 candidate reference genes in P. corylifolia. Lines across the Ct values box chart represent the median values. The boxes indicate the 25th and 75th percentiles, and the whisker caps indicate the maximum and minimum values.

Figure 2
The mean expression stability (M) of 10 candidate reference genes sorted using geNorm. The stability of gene expression increased from left to right, the most stable gene owned the lowest m value, which was distributed on the right.

Figure 3
The pairwise variation values of 10 candidates calculated using geNorm. Different treatment conditions are identi ed by different colors. The cut-off value of the optimized reference gene numbers for RT-qPCR normalization is 0.15.