Selection and validation of suitable reference genes for gene expression studies in Callerya speciosa (Champ. ex Benth.) Schot under different experimental conditions


 The accuracy of reverse transcription quantitative real-time PCR (RT-qPCR) is strongly depended on the stability of reference gene. Callerya speciosa, as a traditionally Chinese medicine, has a long cultivation history in south China. It is essential to select the suitable reference gene to obtain reliable RT-qPCR results when gene expression changes were evaluated. However, suitable reference genes in C. speciosa have not yet been investigated for accurate gene expression quantification under different experimental conditions. In this study, eight candidate reference genes (GAPDH, 60S, ACTIN, TUA2, TUB1, TIF5, UBQ, EF2) were selected from the transcriptome databases, and their expression stability under six experimental conditions (developmental stages, tissues, MeJA treatment, GA3 treatment, CPPU and PP333 treatment) was evaluated using ΔCT, geNorm, NormFinder, BestKeeper, RefFinder programs. The results showed that GAPDH was the optimal reference gene for all different experimental conditions, whereas ACTIN showed the most stability under the hormone treatments in C. speciosa. GAPDH and EF2 were proved to be the most stable genes for developmental stages, while different genes (GAPDH and TUB1) were stable reference genes for tissues. For treatments, ACTIN was identified as the most stable gene under most of hormone treatments. TUBI and ACTIN were at the beginning of the ranking order in PP333 treatment, while GAPDH and ACTIN were adequate for normalization in MeJA and GA3 treatments. TUBI and GAPDH were the most stable genes for CPPU treatment, while ACTIN was proved to be the most stable gene for three different treatments (MeJA, GA3 and PP333). Validation of reference genes was carried out by the target gene CsMYB36, which further confirmed their reliability. These results provided a theoretical basis for subsequent research on the regulation of functional gene expression in C. speciosa.


Introduction
Gene expression analysis has become an important method to reveal gene function and molecular mechanism. Due to its cost-effectiveness and relative simplicity, reverse transcription quantitative realtime PCR (RT-qPCR) is a popular tool in molecular biology to study gene expression [1,2]. However, RT-qPCR analysis is highly sensitive, affected by several factors, including the intrinsic variability of RNA [3], the speci city of the reaction, the stability of the applied reference gene. The result of RT-qPCR is accurate, only when the experimental settings are properly performed and appropriate normalization methods are employed [4]. In addition, the lack of validation of reference genes questions the accuracy of results obtained by RT-qPCR using non-valid references, implying the need for a systematic validation of reference genes [5]. Housekeeping genes are not expected to change under the experimental conditions serve as internal standard, generally used as reference gene [6,7]. Traditionally, housekeeping genes include Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), eukaryotic translation initiation factor 5A (EF5), 60S ribosomal protein L34 (60S) ubiquitin-conjugating enzyme E2 (UBC), polyubiquitin (UBQ), tubulin beta-1 chain (TUBI), elongation factor (EF2) ACTIN, α-Tubulin, β-Tubulin, 18S rRNA, 25S rRNA [8,9]. Therefore, selecting suitable reference genes is crucial to analyze the gene expression level under different experimental conditions by RT-qPCR. Several modern statistical algorithm tools have been recommended to nd the suitable reference genes, such as ΔCT [10], BestKeeper [11], geNorm [12], Norm nder [13], and RefFinder [14].
Callerya speciosa (Champ. ex Benth.) Schot is a perennial medicinal plant belonging to the Fabaceae family, and has a long cultivation history in south China. C. speciosa, commonly called Niu Da Li, possesses high medicinal value, and has been used as a Chinese medicine since the Ming Dynasty [15]. In recent decades, the pharmacology and phytochemistry researches have proved that C. speciosa exhibited a number of pharmacological effects of enhancing human immunity, anti-fatigue, expectorant effects, clinically used for invigorating vital energy and relieving asthma and strengthening the bones and muscles [16]. (Iso) avonoids are the main components in C. speciosa, and the index compounds maackiain and formononetin are considered to be the important indicators to control the quality [17]. At present, more attentions are paid to C. speciosa because of its health bene ts. However, the current molecular mechanisms of the (iso) avonoids biosynthesis in C. speciosa is still limited.
In recent years, several progresses have been made with respect to reference gene normalization for RT-qPCR, demonstrating that expression of reference genes changed considerably in different developmental stages and different experimental conditions. For example, in Gossypium hirsutum L. salt stress, UBQ7, GAPDH and EF1A8 were the better reference genes in leaves, while the better reference genes were TUA10, UBQ7, CYP1, GAPDH and EF1A8 in roots [18]. For Bromus sterilis, three developmental stages (2-leaves stage, 3-leaves stage and 4-leaves stage), 18S rRNA and ACCase were the better reference gene [19]. Under ABA treatment, the ideal reference genes of Nitraria tangutorum were EF1-α and His [20]. Our previous studies revealed that several R2R3-MYB TFs and structural genes might be involved regulating iso avonoid biosynthesis in C. speciosa, and CsMYB36 has the highest expression in R2R3-MYB TFs [21]. To understand these related gene expression changes will be facilitated to explain their functions in C. speciosa under different experimental conditions. Plant growth regulators play a vital role in the adaptation of plants to the environment, which can regulate the growth, development and reproduction of plants. Methyl jasmonic (MeJA), gibberellic acid (GA 3 ), N-phenyl-N'-(2-chloro-4-pyridyl) urea (CPPU) and paclobutrazol (PP 333 ) are important abiotic elicitors involved in abiotic stresses [22], accelerating cell division and plant growth [23]. Our previous research found that MeJA might be linked to the initial development of tuberous roots, GA 3 was considered essential for rapid thickening of tuberous roots [24]. With the development of molecular biology, gene expression analysis may help reveal the molecular mechanism of stress responses and root thickening in C. speciosa. Therefore, screening for C. speciosa reference genes under different experimental conditions are of great signi cance.
In this study, the expression stabilities of eight candidate reference genes ((GAPDH (glyceraldehyde-3phosphate dehydrogenase), 60S (60S ribosomal protein L34), ACTIN, TUA2 (tubulin alpha chain), TUB1 (tubulin beta-1 chain), TIF5 (eukaryotic translation initiation factor 5A), UBQ (polyubiquitin), EF2 (elongation factor 2)) under different experimental conditions were evaluated. Their expression levels under a serious of experimental conditions, including different tissues (root, stem, leaf, ower and seed), roots at four different developmental stages (6, 12, 18, 30 months after germination (MAG)), and growth regulator treatments (MeJA, GA 3 , CPPU, PP 333 ), were detected by RT-qPCR. Five different software packages (ΔCT, geNorm, NormFinder, BestKeeper, and RefFinder) were used to evaluate the expression stability of eight genes. In order to further verify the reliability of the reference gene selection results, we analyzed the expression patterns of CsMYB36, a gene related to the regulation of the iso avonoid synthesis. These results provide a theoretical basis for subsequent research on the regulation of functional gene expression in C. speciosa.

Results
2.1 Veri cation of primer speci city. A set of 48 pooled samples were selected to evaluate the primer speci city, including ve different tissues (roots, stems, leaves, owers, and seeds), four different developmental stages (6,12,18,30 MAG), and nine different treatments by plant growth regulators. For eight reference genes, the melting curve was a single peak by RT-qPCR analysis, and the no-template control in RT-qPCR reactions are no signal ( Supplementary Fig. S1 online). These results suggested that there were no genomic DNA contaminants and non-speci c ampli cation. The ampli cation e ciencies (E) ranged from 123% to 166.9%, and correlation coe cients (R 2 ) was higher than 0.981 (Supplementary 2.2 Analysis of the Ct value of reference genes. The Ct value usually is inversely proportional to the transcription level of a gene, so that lower expression abundance had higher Ct values. In this study, the Ct values of eight candidate reference genes changed under a series of experimental conditions, but the change trends were different. The lowest and highest Ct values were found in UBQ and EF2 at 20.15 and 32.14, respectively, indicating a large uctuation range of expression. Five candidate reference genes (GAPDH, 60S, ACTIN, TIF5 and UBQ) had relatively high expression abundances, and the mean Ct of 60S is the lowest (21.88). Another three genes (GAPDH, TIF5 and UBQ) showed the highest stability, followed by 60S with the mean Ct values of 22.27, 22.77, and 22.79, respectively (Fig. 2). These results suggested that it's necessary to evaluate the stability of candidate reference genes under different experimental conditions.
2.3 Delta Ct analysis. The method of ΔCT were evaluated the most stability reference genes based on Ct values. As shown in Table 1, the most stable gene in the eight different sample groups (all samples, all treatments, developmental stages, tissues, MeJA, CPPU, GA 3 , and PP 333 ) recommended by ΔCT analysis was GAPDH, TUB1, GAPDH, GAPDH, ACTIN, UBQ, ACTIN, and TUB1, respectively.
2.4 GeNorm analysis. Using the geNorm, the M value of candidate reference genes are ranked from lowest to highest. The lower M value represents more stable reference genes. In all case, the M values were < 1.5, suggesting that the candidate reference genes all had relatively acceptable stability values ( Fig. 3). For all samples, the GAPDH have the lowest M value. For development stages and tissues, TIF5 and GAPDH were the most stable reference genes, respectively. For treatments, the optimal candidate reference genes were TIF5, GAPDH, 60S, and ACTIN in all treatments, MeJA treatment, CPPU treatment, GA 3 treatment and PP333 treatment groups, respectively (Table 2). Notably, geNorm analysis indicated that GAPDH might be the most stable reference gene for all samples, while the stability of the conference genes under different treatments was not constant.
The optimal number of reference genes could also be recommended by geNorm program based on the pairwise variation value (Vn/n + 1). The criterion is when V n /V n+1 < 0.15, then the most suitable reference gene number is n. For all samples and treatments, the V7/8 > 0.15, suggesting that more than seven reference genes were needed to evaluate the gene expression. For developmental stages and PP333 treatments, the V3/4 values were < 0.15, indicating that the three recommended reference genes would be adequate for gene normalization. However, gene normalization in tissues, MeJA treatments, CPPU treatments and GA 3 treatments, only needed two reference genes, since the V2/3 values were all <0.15( Fig. 4).
2.5 NormFinder analysis. The calculation principle of NormFinder is similar to geNorm, the expression stability values (M) were calculated according to the CT value of the reference genes. The gene with the smallest M value is the most stable reference gene. As shown in Table 2, the stability of reference genes was ranked by NormFinder algorithm. For all samples and tissues, GAPDH was the lest expressed gene. ACTIN was the most expressed stable reference gene under MeJA and GA 3 treatment. However, the most stable gene in all treatments, developmental stages, CPPU treatment, and PP 333 treatment groups, was TUB1, EF2, UBQ, and TUB1, respectively. Consistent with the results of geNorm analysis, GAPDH was recommended to be the three-top stable reference gene in several groups. However, differences were also observed between the geNorm and NormFinder algorithms. For example, TIF5 gene was the most stable one by geNorm while ranked it as the rst in all treatments group (Table 2).
2.6 BestKeeper analysis. Distinct from the geNorm and NormFinder algorithms, the BestKeeper method evaluates gene stability based on the standard deviation (SD) and percentage covariance (CV) of candidate reference genes. The larger the SD value, the worse the stability of the gene. When SD 1, the expression of the internal reference gene is unstable. The analysis results are shown on Table 3. In all samples, the SD values of GAPDH, ACTIN, TUA2 and EF2 were >1, indicating that they were not suitable candidate reference genes. In developmental stages, tissues, MeJA treatments, CPPU treatments, GA 3 treatments and PP 333 treatments, the gene with the smallest SD value was GAPDH, TUBI, TUBI, TIF5, UBQ, UBQ, respectively. Interestingly, SD value >1 was generated in many gene under different experimental conditions, such as ACTIN in all treatments and CPPU treatments, TUA2 in all treatments, developmental stages, and CPPU treatments (Table 3). These results indicated that the BestKeeper analysis differed obviously from the geNorm and NormFinder algorithms.
2.7 RefFinder analysis. Based on the results of geNorm, NormFinder, BestKeeper and Delta CT, the online tool RefFinder is used to calculate the geometric mean to obtain the nal comprehensive index ranking of gene stability. As shown in Table 4, the ranking of the eight candidate genes for the entire sample was as follow: 60S > TIF5 > GAPDH > TUBI > EF2 > UBQ > ACTIN > TUA2. GAPDH was almost always the topranked one for all samples, developmental stages, tissues, MeJA treatment, CPPU treatment, GA3 treatment, while TUBI and ACTIN were at the beginning of the ranking order for PP 333 treatment. TUBI and UBQ were the most stable genes for all treatment, while ACTIN was proved to be the most stable gene for three different treatments (MeJA, GA3 and PP 333 ).
2.8 Reference gene validation. CsMYB36 is a key gene in R2R3-MYB transcription factor family of C. speciosa, which might play an important role in regulating iso avone synthesis. Therefore, CsMYB36 was selected as target gene to investigate the expression pro le using the two most stable reference genes and the least stable reference gene in this study, in order to validate the reliability and stability of the candidate reference genes determined by the applied algorithms. Our results showed that the relative expression of CsMYB36 increased signi cantly in root and ower when using the two most stable reference genes (GAPDH and TUB1) and the combination of GAPDH + TUB1 (p < 0.001), while the expression levels did not have dramatically different when using the least stable reference gene (EF2). The expression trends of CsMYB36 were similar when the most stable reference gene or the combination of the two most stable reference genes. Consistent with different tissues, the expression trends of CsMYB36 were also similar when using the single and combination of the two most stable reference genes in different developmental stages (6,12,18,30 MAG) and MeJA treatments (0, 3, 24, 48 h) (Fig. 5). Therefore, the results indicated that the stable reference genes strongly affected the relative expression levels of the target gene. If the reference gene has high stability, one gene is adequate for normalization.

Discussion
Gene expression patterns help to in-depth study of the mechanism and function of some unknown genes in animals, plants and microorganisms. There are many methods in molecular biology for gene expression, including Northern and Southern hybridizations, scintillation proximity assay, PCR-ELISA and RNase protection assay [6]. However, most of these methods have some shortcomings as follow: time consuming, labor intensive, nonquantitative, insu ciently sensitive, and cross contamination. Real-time polymerase chain reaction (RT-qPCR) is easy to perform with high sensitivity and speci city, which provides the accuracy and produces reliable as well as rapid quanti cation results [25,26], therefore it has emerged as a robust and widely used methodology for gene expression. To date, some plants have been selected for research on RT-qPCR analysis, including Glycyrrhiza [27], Nitraria tangutorum [20], Fucus distichus [28], Bromus sterilis [19], Allium tuberosum [29], and Suaeda glauca [30]. In addition, RT-qPCR is widely used in diagnosing diseases, such as preterm birth [31], diabetic kidney disease [32], Bronchopulmonary dysplasia [33]. Due to its rapidity and accuracy in informing on active coronavirus (CoV) infection, RT-qPCR has become the assay of choice for COVID-19 diagnosis [34,35]. However, RT-qPCR requires a stable internal standard as a reference gene. Previous studies reported that the expression pro les of reference genes might be unstable under different species, tissues, developmental stages, and abiotic stress [9,18,36]. And an unstable reference gene will cause misleading and even contradictory results [37,38]. Therefore, it's always necessary to validate the optimal reference genes prior to their applications.
C. speciosa has attracted more attention own to its medicinal and health care values. Unfortunately, there has been no research on the normalization of appropriate reference gene for RT-qPCR analysis in C. speciosa. In our study, eight typical candidate reference genes (GAPDH, 60S, ACTIN, TUA2, TUB1, TIF5, UBQ, EF2) were selected from the transcriptome database of C. speciosa, and their expression stability under six different experimental conditions (developmental stages, tissues, MeJA treatment, GA 3 treatment, CPPU treatment, and PP 333 treatment) was evaluated using ΔCT, geNorm, NormFinder, BestKeeper, and RefFinder programs. However, the most stable genes recommended by the four different programs were not always the same due to their different calculations. Similar to geNorm algorithm, the most suitable reference genes selected by NormFinder are generally the same (Table 2). Notably, the ranking order of reference genes when using BestKeeper was quite different from other algorithms (Table  3), which was consistent with other studies [39,40]. GAPDH is widely used as a reference gene, which converts glyceraldehyde-3-phosphate into 1, 3-bisphosphoglycerate and is an essential component of the glycolytic pathway. It was reported that GAPDH has been selected as the most stable gene in different tissues of Gossypium hirsutum L. [18] and Dendrocalamus lati orus Munro [41]. GAPDH displayed the maximum stability for most of single abiotic stresses in carrot [42]. Similarly, GAPDH was almost always the top-ranked gene for all different experimental conditions, and TUA2 was generally the least stable one in C. speciose (Table 4). Additionally, ACTIN which is the essential components of the eukaryotic cytoskeleton, was the most stable reference gene under the selected abiotic stress and hormone treatments in celery [43], and carrot [42]. In C. speciose, GAPDH and EF2 proved to be the most stable genes for developmental stages, while different genes (GAPDH and TUB1) were stable reference genes for tissues. For treatments, TUBI and ACTIN were at the beginning of the ranking order in PP 333 treatment, while GAPDH and ACTIN were adequate for normalization in MeJA and GA 3 treatments. TUBI and GAPDH were the most stable genes for CPPU treatment, while ACTIN was proved to be the most stable gene for three different treatments (MeJA, GA 3 and PP 333 ) ( Fig. 3 and Table 4). Therefore, we suggested that GAPDH was the optimal reference gene for all different experimental conditions, whereas ACTIN showed the most stability under the hormone treatments in C. speciosa.
To validate the accuracy of the experimental results, an iso avonoid synthesis gene CsMYB36 was chosen to be the target gene, and two most suitable reference genes and the least stable reference gene were chosen as reference genes. Our work revealed that the expression pro les of the CsMYB36 gene were similar when normalized by single reference genes with very high stability (Fig. 5). In contrast, there was a signi cant difference in target gene expression with the least stable gene TUA2 as reference gene. These results suggested that no signi cant difference was found when using the stable reference gene in RT-qPCR normalization in C. speciosa. However, many previous studies reported that the application of one reference gene was not adequate for normalization [12,30]. For instance, the combinations of ACTIN/TUB or ACTIN/EFIA were recommended for their use in the pooled analysis responded to biotic and abiotic stress in Vigna mungo [2]. In agreement with this, the combination of other housekeeping genes also showed high expression stabilities, such as PP2A and TUA5 in Suaeda glauca seeds [30], as well as UBQ and EF-1α in Tectona grandis L.f. [44]. According to the pairwise variation parameters by geNorm program, two reference genes were recommended for four different experimental conditions (tissues, MeJA, CPPU, and GA 3 treatments), while the V3/4 value was below the threshold of 0.15 for samples in developmental stages and under PP 333 treatment in this study (Fig. 4). As shown in Fig. 5, the expression level of the target gene were signi cantly upregulated in ve different tissues and under MeJA treatment when a single or a combination of reference genes were used, while it was remarkably upregulated in the 18 MAG developmental stages. These results suggested that one reference gene might be su cient for accurate normalization in C. speciosa.

Conclusions
For the rst time, our work aimed to identify the most stable reference genes for RT-qPCR analysis in different experimental conditions in C. speciosa. Eight candidate reference genes (GAPDH, 60S, ACTIN, TUA2, TUB1, TIF5, UBQ, EF2) were selected from the transcriptome database of Callerya speciosa, and their expression stability under six experimental conditions (developmental stages, tissues, MeJA treatment, GA 3 treatment, CPPU and PP 333 treatment) was evaluated using ΔCT, geNorm, NormFinder, BestKeeper, RefFinder programs. Meanwhile, the target gene CsMYB36 was used to validate the results recommended by programs. The results showed that GAPDH was the optimal reference gene for all different experimental conditions, whereas ACTIN showed the most stability under the hormone treatments in C. speciosa. GAPDH and EF2 were proved to be the most stable genes for developmental stages, while different genes (GAPDH and TUB1) were stable reference genes for tissues. For treatments, ACTIN was identi ed as the most stable gene under most of hormone treatments. TUBI and ACTIN were at the beginning of the ranking order in PP 333 treatment, while GAPDH and ACTIN were adequate for normalization in MeJA and GA 3 treatments. TUBI and GAPDH were the most stable genes for CPPU treatment, while ACTIN was proved to be the most stable gene for three different treatments (MeJA, GA 3 and PP 333 ). Validation of reference genes was carried out by the target gene CsMYB36, which further con rmed their reliability. In a word, our work contributes a suitable reference for selecting stable internal reference gene candidates to investigate gene expression in C. speciosa, and provides a theoretical basis for subsequent research on the regulation of functional gene expression and the understanding of molecular mechanisms in C. speciosa. Materials And Methods 5.1 Plant Materials and Treatment. The samples in this experiment were collected from the planting base of Guangxi Guangze Health Industry Co., Ltd., Nanning, China. There different tissues (root, stem, leaf) were collected from the 30-month seedlings, owers are collected at the bud stage, seeds are collected at ripe. tuberous roots at four developmental ages were also collected, including 6, 12, 18, 30 months after germination (MAG) (Fig. 1). Mature seeds were selected to germinate on wet lter under dark conditions at 26 ± 2°C for 10 days. Two days after germination, seedlings were cultured in plastic pots lled with a 3:1 (v:v) mixture of loamy soil and pearlite at 26 ± 2°C with 16-h/8-h light/dark photoperiod, 300 µmol m −2 s −1 light intensity, and 60% relative humidity in an arti cial weather box (MGC-400H, Shanghai, China) in GuangXi University of Traditional Chinese Medicine. For CPPU, GA 3 , and PP 333 treatment, the 3month-old seedlings were sprayed with 10 mg·L -1 , 50 mg·L -1 , and 100 mg·L -1 for 0, 12, 48, 72 h, respectively. The seedlings were sprayed with clean water as control condition. Meanwhile, three different MeJA concentrations (200, 400, 600 μmol L -1 ) for 48 h and six different time intervals (0, 3, 6, 24, 48 and 72 h) for 200 μmol L -1 MeJA were also treated on the 3-month-old seedlings. After treatment, the seedlings were cultured in the above-mentioned environments. The roots of seedlings were sampled after washing with distilled water. Thereafter, all samples are frozen immediately in liquid nitrogen and stored at − 80°C until further use. 5.2 Total RNA extraction and reverse transcription. Total RNA was extracted from samples using the Plant Total RNA Extraction Kit BSC65S1 (Biospin Technology Co., Ltd, Hangzhou, China), according to the manufacturer's instructions. The RNA purity and concentration were measured with spectrophotometer (Tiangen Biotech Co., Ltd.) and integrity was checked on 1% agarose gel electrophoresis. RNA samples with a concentration higher than 300 ng/μL and A260/A280 ratio between 2.0 and 2.3 were required for cDNA preparation (Supplementary Table S1 online). cDNA was synthesized using the PrimeScript TM RT reagent Kit with gDNA Eraser (Perfect Real Time) (Takara, Dalian, China) according to the manufacturer's instructions. The products were stored at − 20°C until further analyses. 5.3 Selection of candidate reference genes and primer design. Eight candidate reference genes (GAPDH, 60S, ACTIN, TUA2, TUB1, TIF5, UBQ, EF2) were selected from the transcriptome databases (SRA accession No. SRP223620), which have stable expressions (|log 2 ratio|≤0.5) in the roots of C. speciosa and often selected as housekeeping genes for RT-qPCR analysis in different plant species. The primers were designed by Primer 5.0 software according to their CDS sequences. The design Principles were as follows: the length of primers and ampli cation product size were 15-25 bp and 80-120 bp, respectively; the melting temperature (Tm) ,58℃; GC content, 40%-60% [45]. All primers were synthesized by the BGI Tech Solutions Co., Ltd. (BGI Tech) (BGI, Shenzhen, P. R. China). The primer information used in this study was listed in Supplementary Table S1 online. 5.4 RT-qPCR analyses. The RT-qPCR reaction was performed using a CFX96 TM real-time system (Bio-Rad, America). The 10 µL reaction system consisted of SYBR® Premix Ex Taq™ (5 µL), both the forward and reverse primers (0.5 µL, 5000 ng), 0.5 µL of the cDNA template (diluted ve-fold with RNase-free ddH2O, and 3.5 µL of RNase-freed ddH2O. All the PCRs were performed under the following conditions: one cycle of 15 min at 95°C for activation, followed by 40 cycles of 10 sec at 95°C, 30 sec at 60°C, and 30 sec at 72°C. Three technical replicates were conducted for each of the three biological replicates. 5.5 Primer ampli cation e ciency and linearity inspection. The standard curve was constructed with a ve-step dilution series of cDNA pool (equal volume of the cDNA from all the samples to be the pool). The ampli cation e ciency (E) and correlation coe cient (R 2 ) of primers were calculated using the standard curve. The ampli cation e ciency (E) of each primer pair was calculated by the curve slope using E= [46]. The E and R 2 of primer were listed in Supplementary Table S2 online. Every candidate gene in each sample was performed in three biological replicates with three technological replicates. PCR reactions with no-template controls were also conducted for each primer pair, to check the absence of primer dimers and random contaminations.
5.6 Stability analysis of candidate reference genes. The stability of eight candidate reference genes was evaluated by four Microsoft Excel-based computational programs (geNorm, NormFinder, BestKeeper and ΔCT), and an online data analysis program RefFinder. For geNorm analysis, the Ct values needed to be converted into a relative quantity according to the formula: 2 −ΔCT , where ΔCT is equal to the corresponding Ct value minus the minimum Ct value [47]. The geNorm algorithm was applied to evaluate the average expression stability values (M) and the pairwise variation (V) values of reference genes. Gene with an M value below the threshold 1.5 was thought to be the stable one. In addition, this program could also recommend the optimal number of reference genes by calculating the pairwise variation (Vn/Vn+1). When the pairwise variation was < 0.15, the optimal number of reference genes is n. The NormFinder was used to analyze the stable values of the expression stability by calculating the variance, including intra group and inter group variance. The lowest values indicated the highest stability. The BestKeeper program, which used the Ct values as row data calculated the values of variation (CV) and standard deviation (SD) to analyze gene stability. Generally, the more stable genes have the lower SD and CV values. The stability of eight candidate reference genes was comprehensively evaluated using RefFinder tool. To execute, the ranking of each gene by ΔCT, geNorm, NormFinder and BestKeeper was multiplied by different weighting coe cients and calculated the geometric mean to obtain an overall nal ranking. 5.7 Reference gene validation. In order to further verify the reliability of the reference genes, an iso avone synthesis gene CsMYB36 was chosen to be the target gene. CsMYB36 has a high-level expression level in the C. speciosa R2R3-MYB transcription factor family [21]. The expression pattern of CsMYB36 in different samples was investigated using the two most stable reference genes and the least stable reference gene ranked by RefFinder program. Three repetitions were performed to determine means and standard deviations (SD). Statistical analyses were performed using SPSS 24.0 software (Ehningen, Germany), and presented as the means ± SD. The statistical signi cance was determined using Duncan's multiple range test. "*", "**", and "***" in gures means the signi cantly different at 0.05, 0.01, and 0.001 probability levels, respectively. Author statement A preprint has been published [48].

Data Availability
The database generated for this study can be found in the transcriptome databases (SRA accession No. SRP223620).

Competing of interest
The authors declare no con ict of interest. Tables   Tables 1-4 are in the supplementary les section. Figure 1 The different tissues and developmental stages of C. speciosa. (a-d) Stem, leaf, ower, and seed, respectively. (e-h) Root at 6, 12, 18, 30 MAG, respectively.

Figure 2
Violin chat of CT value of 8 candidate reference genes. A violin column represents the data distribution state of a sample. The curve of the violin shape is the data distribution probability curve. The more data is distributed under this value, the wider the curve. The upper and lower ends of the thin black line represent the maximum and minimum values of the non-outliers of the data, the upper and lower edges of the thick black line represent the 3/4 digits and 1/4 digits of the data, and the white dot in the center represents the median of the data.