Identication and validation of appropriate reference genes for gene expression analysis in Schima superba

Background: Real-time quantitative PCR (RT-qPCR) is a reliable and high-throughput technique for gene expression studies, but its accuracy depends on the expression stability of reference genes. Schima superba is a strong resistance and fast-growing timber specie. However, so far, reliable reference gene identications have not been reported in S. superba. In this study, we screened and veried the stably expressed reference genes in different tissues of S. superba. Results: Nineteen candidate reference genes were selected and evaluated for their expression stability in different tissues. Three software programs (geNorm, NormFinder, and BestKeeper) were used to evaluate the reference gene transcript stabilities, and comprehensive stability ranking was generated by the geometric mean method. Our results identied that SsuACT was the most stable reference gene, SsuACT + SsuRIB was the best reference genes combination for different tissues. Finally, the stable and less stable reference genes were veried using the SsuSND1 expression in different tissues. Conclusions: This is the rst report to verify the appropriate reference genes for normalizing gene expression in S. superba, which will facilitate future elucidation of gene regulations in this species, and useful references for relative species. best combination for different tissues of S. superba. This study could provide a basis at the gene expression level of S. superba and Theaceae.


Background
At present, plant gene expression analysis methods include northern blot, in situ hybridization, RT-PCR, and Real-time quantitative PCR (RT-qPCR), and so on. RT-qPCR has been widely used in molecular biology research, and expression analysis is realized by real-time detection of uorescence signal changes in the whole PCR reaction process due to its high sensitivity, accuracy, speci city, throughput capability, and cost-effectiveness [1][2][3][4][5]. However, the accuracy of relative quanti cation in RT-qPCR is always affected by many variable factors. For example, the RNA quality, integrity, reverse transcription e ciency, and ampli cation e ciency [2,6]. To ensure accurate results and eliminate errors, it is necessary to use one or more stable reference genes to normalize the expression data of target genes [7].
Reference genes, also known as housekeeping genes, refer to a class of genes that can be stably expressed in different tissues and organs. In plant researches, the commonly used reference genes are mainly the genes that constituted the cytoskeleton or participated in the cells basic biochemical metabolic activities, including actin (ACT), β-tubulin (TUB), ribosomal RNA (18S rRNA, 26S rRNA), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ubiquitin (UBQ) and elongation factors 1 alpha (EF-1α) [8][9][10]. However, studies have shown that the expression levels of these gene is speci c, and not stable under different species and treatments. The reference gene suitable for any condition does not exist [11][12][13]. Therefore, in the qualitative and quantitative research of genes, it is necessary to select the appropriate reference genes according to the speci c experimental conditions [14].
Schima superba is an evergreen broad-leaved tree in Theaceae and it valued commercially for its timber and re protecting [15][16][17]. Theaceae contains about 700 species, which possessed important economic values, such as the tea plant Camellia sinensis with their special drinking value, the traditional oil tree C. oleifera that produces high-quality edible seed oil and the ornamental plant C. azalea with their attractive owers. The genus Schima has approximately 20 species and is mainly distributed in southern China and the adjacent parts of East Asia, with 13 species (six endemic) present in China [18]. Some reference genes of Theaceae had been reported. For example, β-actin could be used as a reference gene for tissues and GAPDH for mature leaves and callus in C. sinensis [19]. TUA-3, ACT7α and CESA were relatively stable in the different tissues of six oil-tea camellia species [20]. TUA and GAPDH were optimal reference genes in different organs and TUB and UBQ in petals in C. azalea [21]. However, there were few reports on gene expression analysis and reference genes expression stability in S. superba. Only Yang [22] used the C. oleifera's GAPDH to verify the differential expression genes between self and cross pollination of S. superba, but the expression abundance and stability have not been reported. With the completion of genome sequencing and construction of high-density genetic map [23], molecular design breeding and molecular assisted breeding of S. superba have been carried out gradually. So it is necessary to select appropriate reference genes for gene expression analysis.
In this study, 19 candidate reference genes, namely, ColGAPDH SsuACT SsuGAPDH SsuHis (histone) SsuTUA1(alpha tubulin) SsuTUA2 SsuUBC1(ubiquitin-conjugating enzyme) SsuUBC2 SsuUBC17 SsuUBCJ2 SsuMDH(malate dehydrogenase) SsuCal7(calmodulin-7) SsuCas(caspase) SsueIF5(eukaryotic translation initiation factor 5A4) SsuMet2(metallothionin 2a) SsuGTP(GTP-binding protein) SsuRIB(60S ribosomal protein) SsuTUB(tubulin beta-3) SsuUDP(UDP-galactose transporte), were assessed by RT-qPCR in S. superba different tissues (root, xylem, phloem, leaf, bud and fruit). To obtain the most suitable reference genes, three different statistical tools (geNorm, NormFinder and BestKeeper) were selected to evaluate the expression stability. In addition, SECONDARY WALL-ASSOCIATED NAC DOMAIN 1 (SND1) gene, belonging to the NAC gene family, is involved in the initiation of secondary wall thickening in plant broblasts and is the primary switch in the transcriptional regulatory network of secondary wall thickening [24]. So to validate the selected best-ranked reference genes, the SsuSND1 expression levels in different tissues were investigated using the most and less stable reference genes or the combination.

Results
Candidate reference genes and PCR ampli cation Eighteen candidate reference genes were selected from the transcriptome of S. superba and ColGAPDH was cited from Yang [22]. The presence of a single PCR product expected size ( Figure S1) and single peak in the melting curve ( Figure   S2) con rmed the speci c ampli cation. The ampli cation e ciency (E) of all PCR reactions ranged from 93.47 % for SsuTUA1 to 109.03 % for SsuUBCJ2 (Table 1), suggested that these genes were suitable for further gene expression analysis. Meanwhile, the standard curves showed good linear relationships, with correlation coe cients (R 2 ) above 0.99 (Table 1).

Ct values of candidate reference genes
To assess the expression stability of 19 candidate reference genes in different tissues, the transcript abundances were presented as their Ct values. The Ct values varied in different tissues, which varied from 16.752 (SsuMet2) to 33.379 (SsuCas), while the mean Ct values varied from 18.032 (SsuMet2) to 25.556 (SsuMDH) ( Table 2). The Ct range > 4 of SsuCas, SsuUBC17, SsuUBC2 and SsuUDP in different tissues, suggested these genes varied greatly and were unstable in different tissues.
Analysis of reference gene stability using three bioinformatic programs In order to reduce the analysis error, the candidate gene stability ranking in different tissues was determined separately using geNorm, NormFinder and BestKeeper. So as to screen out the reference genes suitable for the experimental treatment, and provide bene cial reference for subsequent research.
The geNorm program is used to rank the genes expression stability by calculating the average expression stability values (M) based on the 2 −∆Ct value [25]. The smaller M value of reference gene, the more stably expressed. Meanwhile, if M value > 1.5, it is not suitable as a reference gene [25]. The M values of tested genes evaluated by geNorm were shown in Fig.1. SsuTUA1 and SsuRIB were ranked as the two most stable genes in different tissues, while SsuCas and SsuUDP were the two least stable genes.
The pairwise variation value (V n /V n+1 ) calculated by the geNorm determines the optimal number of reference genes.
When V n /V n+1 < 0.15, the optimal number of reference genes is n, otherwise, the number is n+1 [25]. In this study, except for V 18 /V 19 , the other value of V n /V n+1 < 0.15 (Fig.2), indicating that two reference genes would be su cient for gene normalization and an increase do not improve sensitivity.
NormFinder ranks the expression stability of reference genes by calculating the average pairwise variation in one relative to other candidate genes and the smaller stability value, the more suitable as reference gene [26]. For different tissues, the most stable gene was SsuACT, followed by SsuRIB, while the least stable gene was SsuCas, which were not altogether with the genes selected by the geNorm (Fig.3).
The expression stability is represent by the standard deviation (SD), coe cient of variance (CV) and correlation coe cient (r) of Ct values in the BestKeeper program, and the most stable reference genes are identi ed as those with the lowest SD and CV and the most r [27]. In this study, SsuACT and SsuUBCJ2 were identi ed as the most stable genes for different tissues, while SsuUBC17, SsuTUB, SsuUBC2, SsuUDP, SsuCas are unstable because of the SD > 1( Table 3).
The rankings of the nineteen tested genes were not perfectly consistent by geNorm, NormFinder and BestKeeper (Table   4). To provide a comprehensive evaluation of candidate reference genes, further analysis was thus carried out using geometric mean, which integrates geNorm, Norm nder and BestKeeper. The comprehensive ranking recommended by the geometric mean method was shown in Table 4, and SsuACT was the most stable genes for different tissues.
The best combination of reference genes was determined based on the optimal number calculated by geNorm and the ranking list obtained using geometric mean method. So SsuACT + SsuRIB was found to be the best combination of reference genes for different tissues.

Validation of the identi ed reference genes
In order to examine the reliability of the candidate reference genes for normalization, SsuSND1 expression pro les in different tissues were normalized using the two most stable candidate reference genes (SsuACT and SsuRIB), combination of stable genes (SsuACT + SsuRIB), the least stable reference gene (SsuCas) well as ColGAPDH (Fig.4). When SsuACT, SsuRIB, SsuACT + SsuRIB and ColGAPDH were used for normalization, the expression patterns of SsuSND1 were similar, and the relative expression of xylem, leaf and fruit were higher than others, and SsuSND1 was hardly expressed in bud and root, but the expression was most appropriate by SsuACT and SsuACT + SsuRIB. However it was analyzed by SsuCas, the expression pattern was not compatible and the expression level were too high in fruit and too low in bud. It was suggested that the selected reference genes were reliable.

Discussion
RT-qPCR has become a common technique for molecular biology research [28]. In the analysis process, reference genes are often used to reduce or correct the errors in the quantitative process of target genes. Therefore, the selection of appropriate reference gene is the key to realize the research of target gene expression under different experimental conditions or tissues [29]. S. superba has economic value for its timber, and the wood is used for furniture and construction. According to the phylogenetic analysis of Theaceae, Theeae and Gordonieae are closer related to each other, and the genera Schima belongs to the Gordonieae [30]. Moreover we have constructed a high-density genetic map and obtained 168 QTLs for 14 phenotypes [23], but still not focused on its molecular function and gene expression.
Therefore, to carry out the follow-up experiment smoothly, a stable and suitable reference gene would be selected and evaluated for the normalization of gene expression analysis by RT-qPCR in our research.
To avoid the limitations of using only single software analysis, three bioinformatic programs (geNorm, NormFinder and BestKeeper) were used to evaluate the expression stability of candidate reference genes in our analysis. The basis for evaluating gene stability in GeNorm is to use the 2 −∆Ct value of each gene to calculate the M value [25]. Meanwhile, GeNorm can determine the optimal number of reference genes required for quantitative analyses. In this study, the gene expression analysis needs two reference genes to achieve the best performance. The algorithm of NormFinder is similar to GeNorm using the 2 −∆Ct value as the relative expression to calculate the stability of gene expression [26]. BestKeeper focuses on the standard coe cient variation (SD) and variation correlation coe cient (CV) to screen the stability of internal reference genes [27]. The ranking from different programs showed some substantial discrepancies (Table 4).
For instance, SsuTUA1 and SsuRIB were the best reference genes identi ed by geNorm (Fig. 1), while it turned out that SsuACT was evaluated as the best by NormFinder (Fig. 3) and BestKeeper (Table 3). Differences in rankings among these programs have also been reported in other studies [12,31,32], which is likely the result of the different algorithms that they employ [33]. Therefore, to provide a comprehensive evaluation of candidate reference genes, the geometric mean was used to generate comprehensive stability ranking, and then the best combinations were determined based on the optimal number of reference genes calculated by geNorm. So SsuACT, SsuACT + SsuRIB were the most stable reference gene and combination for different tissues (xylem, phloem, leaf, bud, fruit and root). But in study processes, we found that Ct values of 19 candidate reference genes in seeds were signi cantly greater than other tissues, so the transcription levels uctuated greatly in the seven tissues (root, phloem, xylem, leaf, bud, fruit and seed). And the rankings of three programs were contradictory, so we excluded seeds and analyzed only the remaining six tissue. It is worth mentioning that, SsuRIB was the best reference gene predicted among the seven tissues by NormFinder, which indicated that SsuRIB could also be used to normalize the gene expression of seeds, because NormFinder is more suitable for the situation when the genes transcription level uctuates greatly [26,27].
Actin is a widely used as reference gene in plants. In this study, SsuACT is also the most stable gene. SsuRIB and SsuCal7 are the novel genes screened from genome and transcriptome of S. superba, and they have not been reported as reference genes in other species. And ColGAPDH is not suitable as a reference gene of S. superba because of low expression and poor stability. We compared the reference genes in S. superba, C. sinensis, C. oleifera and C. azalea in Theaceae. The optimal reference genes in different tissues are SsuACT, SsuRIB and SsuTUA1 of S. superba, β-actin of C. sinensis [19], TUA-3, ACT7α and CESA of C. oleifera [20] and TUA and GAPDH of C. azalea [21]. So ACT could be used as the reference gene of S. superba, C. sinensis and C. oleifera, and TUA could be used of S. superba, C. oleifera and C. azalea. Then we think ACT and TUA have wide applicability as the reference genes in Theaceae.
To validate the identi ed reference genes suitability, SsuSND1 expression patterns were investigated in different tissues using different reference genes. The expression pattern by normalized of SsuCas was not compatible with SsuACT + SsuRIB. The data once again demonstrated that reference genes play a key role in normalizing the data from RT-qPCR, and the use of inappropriate reference genes may lead to inaccurate results. Moreover, NAC plays a key role in the formation and development of apical meristem [34], lateral root [35] and secondary wall [36]. SND1 plays a similar role in S. superba and the relative expression of secondary xylem was higher than others.

Conclusions
This study is the rst systematic report about selection and veri cation of reliable stable reference genes for different tissues in S. superba, showing that SsuACT was the most stable reference gene, SsuACT + SsuRIB was the best combination for different tissues of S. superba. This study could provide a basis at the gene expression level of S. superba and Theaceae.

Plant materials
The plant materials were collected from the germplasm bank of S. superba clones (119°06'E, 28°03'N) in Zhejiang Longquan Academy of Forestry Sciences. The clones are the 25-year-old mother trees grafted with scions from Jianou, Fujian (118°31'E, 27°8'N) in 2008. The gene bank covers an area of 6.7 hm 2 and is located at the altitude of 200 300 m.
And the relative humidity of the area is 79%, and the average annual rainfall is 1 664.8 1706.2 mm.
In August 2020, different tissues were collected, including secondary xylem, secondary phloem, mature leaf, bud, annual fruit and root of tissue culture seedlings subcultured for 60 days. Each sample was set up in biological triplicate. The samples were frozen with liquid nitrogen and stored in -80℃.

Selection of candidate reference genes
Eighteen candidate reference genes GeneBank accessions: MW770873~770890 which had a stable expression in the different tissues and age of S. superba were selected according to transcriptome data from our laboratory (Unpublished, Novogene, Beijing, China) ( Table S1). Eleven of them are often used as housekeeping genes in model plant species. And ColGAPDH (C. oleifera) GeneBank accession: KC337052 gene was used as control [22], and the sequence consistency is only 38.68% with SsuGAPDH. According to their CDS sequences blasted from Genomic data of S. superba (Unpublished, Novogene, Beijing, China) (Table S2), the primers were designed on the Web using the Primer 3.0 (http://www.primer3plus.com/primer3web/primer3web_input.htm) and synthesized by Sangon Biotech Co., Ltd.

RT-qPCR analysis
The total RNA was extracted using the RNAprep Pure Plant Plus Kit (Polysaccharides & Polyphenolics-rich) (Code No. DP441, TIANGEN, China) and were stored at -80℃. RNA was assessed using 1% agarose gel electrophoresis and was quanti ed with a Nanodrop ND-2000 ultra micro nucleic acid protein analyzer (Thermo, USA). The RNA samples with A260/A280 ratios between 1.8 and 2.1 (Table S4)  The primer speci city was veri ed by the presence of a single peak in the melt curve analysis during the RT-qPCR process. Three independent biological replicates and three technical repetitions were performed for each of the quantitative PCR experiments. The threshold cycle (Ct) was measured automatically and correlation coe cients (R 2 ) together with slope were calculated from the standard curve based on a tenfold series dilution of the cDNA templates.
The corresponding RT-qPCR e ciencies (E) for each gene were determined from the given slope.

Validation of identi ed reference genes
The SND1 GeneBank accession: MW796194 was selected as target gene to validate reliability of identi ed reference genes from transcriptome data (Unpublished, Novogene, Beijing, China) ( Table S1). The gene expression pro les at different tissues were normalized using the most and least stable reference gene and ColGAPDH. Sample collections and experiments were performed as described above.

Statistical data analysis
The average Ct value was calculated from three biological replicates and three technical replicates, relative gene expression levels were calculated using the 2 −△△Ct method [37].
GeNorm, NormFinder and BestKeeper algorithms were used to evaluate the stability of 19      Pairwise Variation (V) of candidate reference genes calculated by geNorm Figure 3 Expression stability of candidate reference genes analyzed by NormFinder