Selection of Reference Genes for Transcription Studies on Adventitious Root Induction in Olive (Olea Europaea L.) Microshoots Considering Co-expression and Average Transcriptional Stability

Backgroung Selection of reference genes (RGs) for normalization of PCR-gene expression data includes two crucial steps: determination of the among-sample transcriptionally more stable genes and subsequent choosing of the most suitable genes as internal controls. Both steps can be carried-out through generally accepted strategies each having different strengths and weaknesses. The present study proposes to reinforce normalization of gene expression data by integrating and adding analytical revision at critical steps of those accepted procedures. Especially crucial is to counterbalance a higher representative number of RGs with a correspondent increase in their average transcriptional instability or a generalised co-expression trend among the samples. This methodological study used in vitro olive adventitious rooting as an experimental system, since the underlying morphogenetic process -wich is common to diverse species- is still not completely understood. following simple that systematic biological variations associated to experimental such gene Those types of systematic co-variation several popular hoc informatics programmes. To an algorithm of one of the hoc informatics programmes was adapted to allow partial automatization of RG selection for any strategy of transcriptional-gene stability ordering. In order to delve into the resulting possible RG sets suitability for inter-assay comparisons and technical-error compensation, separate statistics were formulated. The achieved results were compared with those obtained by standard stability ranking methods. Finally, a double evaluation was performed to accurately contrast two choice RG sets. The whole strategy was applied to a panel considering several independent factors, but the suitability of the obtained putative RG sets was tested for cases restricted to fewer variables. H2B, OUB and ACT are valid for normalization in transcriptional studies on olive microshoot rooting when comparing treatments, time points and assays.

the more robust steps from several strategies, to thus strengthen normalization of gene expression data. The method includes an unbiased transcriptional stability-ordering of genes and the adaptation of the GeNorm selection algorithm to any transcriptional stability ranking. The achieved results were compared with those obtained with the standard stability ranking strategies. In order to delve into the resulting possible RG sets suitability for inter-assay comparisons and technical-error compensation, separate statistics were formulated. Finally, a double-evaluation process was developed to accurately contrast two choice RG sets. The whole strategy was applied to an experimental panel considering several independent factors -such as treatment (BA with or without SHAM, control) time of culture and assay (repeat with same explant type proceeding from different biological material)-but the suitability of putative RG sets was tested for cases restricted to fewer variables.
The overall methodology was developed in a case-speci c study, but constitutes a guide for general application. A set of three RGs was identi ed as internal reference and is now available for wider expression studies on any target gene in similar systems.

Analysis of SHAM effects on adventitious rooting
Explants from a single clone of Olea europaea L. cv. 'Galega vulgar', pre-cultured in vitro according to [22], were used in all trials. Indole-3-butyric acid (IBA; Sigma-Aldrich, St. Louis, MO, USA) was used as the root promoting auxin at 14.7 mM. Since an involvement of AOX in adventitious rooting has been supposed [21], SHAM (Sigma-Aldrich, St Louis, MI, USA), a potential AOX inhibitor, was used to provide a restrictive treatment for adventitious root formation. Fresh SHAM was prepared in dimethyl sulfoxide (DMSO, Fluka, France), and the nal concentrations into the IBA solution were SHAM 100 mM, DMSO 26 % (v/v).
Used in this concentration, DMSO has no inhibitory effect on rooting [18]. Microshoots with 3-5 nodes, keeping the four full expanded apical leaves were used for the rooting trials. The basal parts of the explants (approx. 1 cm) were dipped for 10 s into the IBA solution (with and without SHAM). After that, the explants were in vitro cultured on a rooting medium devoid of growth regulators as proposed in [22]. A control without any dipping treatment was also used. Thus, three treatments were established in the nal subculture: rooting medium without immersion of basal part of explants (negative control), with initial immersion in IBA (rooting) or initial immersion in IBA+SHAM (rooting inhibition). Six assays each with these three treatments were performed. The derived in vitro grown plantlets were used in all subsequent analyses. Visible root formation was recorded at 22 and 28 days after transfer into rooting medium.

Biological material for transcript quanti cation analyses
Three assays were selected for molecular analyses, on the basis of differential rooting capacity for the treatment with IBA and without SHAM: assays I, II and III, ordered according to rooting capacity (Table 1). This differential behaviour confers more robustness when obtaining conclusions about factors in uencing adventitious root induction processes. Thus, in spite that assay is a xed factor, results would be also valid when considering it as a random factor, and it can function as biological replicate when considering only other factors.   Furthermore, PCR products were run in a gel to con rm a single band. Baseline and quanti cation cycle (Cq, de ned as the number of cycles needed for the uorescence signal to reach a speci c threshold of detection), so as speci city of the ampli cations were automatically determined using the 7500 Software v. 2.0.5 (Applied Biosystems 2010 Life Technology Corporation). The assay included two no-template controls without RNA or cDNA. No-reverse transcription controls (RNA from reverse transcription without reverse transcriptase) were run for each sample for the gene with lowest measured transcript accumulation.
Although 8 samples per group are advisable for RG selection using NormFinder (NFi) software [26], this programme was run with 6 (time x treatment). This number of samples was considered enough since: i) it is constant for a considerable number groups (all groups) and, ii) the number of CHG to test is higher than the minimum advisable ( ve) [26]. Less robust was the inter-group NFi analyses when taking into account also the assay inside of each treatment x time cell.
Additional test for gDNA contamination gDNA from two microshoot pools of the olive clone under evaluation was extracted employing DNeasy Plant Mini Kit (QIAGEN, Hilden, Germany) following manufacturer instructions. Additionally to no-reverse transcription control, the possible gDNA contamination in cDNA samples was made by checking the absence of gDNA bands of AOX1a (higher than the correspondent cDNA bands because of an intron) in the RT-PCR reactions (data not shown).

Calculation of PCR e ciency
To compare different qPCR runs, performed on different plates, all runs were adjusted to the threshold of Cq 0.1. Cq is, as it is deduced from its de nition, inversely correlated to the input amount of total RNA [27]. Raw data, including the melting and ampli cation curves obtained by the Applied Biosystems software, were extracted to Microsoft ® Excel les and then loaded for further data analysis.
The PCR e ciency of each primer pair was obtained by using LinRegPCR [28]. This program uses ampli cation data captured during the exponential phase of each PCR reaction after reconstructing baselines, a source of variation of the observed e ciency [28]. From a total of 78 ampli cation plots (i.e. two technical replicates of three biological replicates -assays-of a total of 13 different experimental conditions) per CHG, those that passed all quality tests of the program were used for calculations.
To overcome the problems derived from downstream mathematical proceedings, that by default do not take into account PCR e ciencies, corrected quanti cation cycle -Cq' = Cq log 2 (e ciency)-values were obtained using calculated e ciencies. Then, relative contents for each amplicon were calculated via the ΔCq method using one replicate of time 0 as reference.

Data analysis
A method for RG selection using estimators of coe cients of total variation (CV) and inter-group variation (F) of gene transcript accumulation values [5] was carried out (for simplicity, here named CV/F method). For a formulation of the mentioned relative variations, see [5]. To compare and eventually support CV/F results with different algorithms for the selection of the most transcriptionally stable reference genes, RefFinder (RFi) (http://www.leonxie.com/referencegene.php), a web-based comprehensive tool, was applied. To calculate values on gene transcript accumulation stability, RFi uses the currently available algorithms BestKeeper [29] -regarding Cq standard deviation (BKS)-, comparative ΔCq (CoD) [30], geNorm (gNo) [23] and NormFinder (NFi) [26] -only ungrouped samples-, assigning an appropriate weight to each one and ordering the CHG according to the output gene transcript accumulation stability measure. Each individual algorithm implemented in RFi also generates a ranking of genes. gNo, f.e., via a stepwise exclusion of the least stable gene, creates a stability ranking. RFi also calculates the BestKeeper Pearson coe cient of correlation (BKr) [29], although it is not used in later RFi weighting of stability parameters. Independent NFi software, additionally used in this work, uses an ANOVA-based model to estimate intra-and inter-group variation, and combines these estimates to provide a direct measure of the variation in transcript accumulation for each gene. To estimate differences among normalization factors (NFs, which equals the geometric means of the RG relative transcript accumulations for each replicate) the methodology of gNo for pairwise comparisons [23] was applied. For NF evaluation tests, two-and three-way ANOVA models with Bonferroni post-hoc adjustments (P<0.05) were performed, after data transformation to better keep the model assumptions, by loading data into the statistical programme IBM SPSS (v21, SPSS Inc., USA).
For further revision of the results, inter-replicate variations (estimated as relative variation within biological samples) and inter-assay variations (estimated as ratio of the relative variations within bifactorial groups -time x treatment-and within trifactorial groups -biological samples) were calculated, so as overall and individual CHG inter-replicate mean distances.

Results And Discussion
The justi cation and results of every step of the procedure for selection of RGs is discussed as follows.
Step 1. Selection of genes as candidates for reference genes and ampli cation tests Genes were selected according to procedures speci ed in Material and Methods. Single amplicons for each CHG were obtained whose speci city was con rmed by the observation of single-peak melting curves of the qPCR products or by the presence of a single band of expected size for each primer pair in agarose gel electrophoresis after PCRs employing either gDNA (data not shown) or cDNA as templates [see Additional le 2]). No primer dimers or other products resulting from non-speci c ampli cation were found.
Step 2. Calculation of Cq´s Calculated ampli cation e ciencies ranged from 1.855 to 1.935 (Table 2), values appropriated to be subsequently utilised in Cq´s calculation. The different abundance of each RG transcript affect the normalized results [31], but in the present case no considerable differences in transcript accumulation levels were observed. Furthermore, suitable RGs should be equivalent in transcript abundance to that of the target gene (TG), whose Cqs should be between 15 and 30 [26], limits within which the Cqs obtained in this analysis ts (data not shown; See Fig. 1 for a visual inspection of Cq´s). Step 3. Estimation of transcript level variations: The theoretically optimal way to identify the more transcriptionally stable candidate genes is trough estimation of both overall and inter-group variations in transcript levels under the employed experimental conditions [5]. With this purpose, two consecutive procedures were performed, as follows: Displaying of Cq´ means and standard deviation for experimental groups.
The sample maximization design here used for plate arrangement avoids inter-run variability among samples, frequently underestimated [25]. Cq´ values for the amplicons of the seven CHGs were depicted ( Fig. 1) to allow a visual evaluation of the transcript level stabilities and common trends among genes along the different factors (time, treatment). Hsp resulted as the most unstable CHG, with a general opposite trend to ACT, EF, GADPH, H2B and TUA, all of them genes showing a high degree of co-expression, v.g., co-regulation, in spite of they are functionally unrelated in a direct way. Co-regulation may be induced by stress or multi-stress [32,33], and could bias RG selection. In spite of their co-regulation, those genes show an opposite trend to the remaining and also stable gene, OUB, a behaviour that can be associated to its pre-proteolitic function, i.e., it is expressed when protein replacement is needed. The observed intra-group (time x treatment) variations were moderate except for Hsp, to what contributes a heterogeneous group-dependent variability among biological samples, since its intra-group inter-replicate variabilities were low (Fig. 1).

Calculation of transcriptional gene stability values
CV and F statistics estimates, respectively, the overall and inter-group variation of transcript levels of single CHGs without counterbalancing with the rest of CHGs. Thus, CV and F estimators provide stability rankings not so biased by concomitant, systematic biological variations that may be associated to an experimental condition, leading to a group of genes to follow a similar expression trend -such as the variation due to co-regulation. Thus, CV and F provide in most cases a good classi cation of the stability of CHGs. It is worth to point out that the purpose of CV and F statistics is the generation of a transcriptional gene stability ranking rather than gene transcriptional stability values, and not at all the obtaining of signi cant differences. Hence, inference statistics prerequisites such as normal distribution are not mandatory for the pursued goal. F was calculated for bi-(time x treatment) and trifactorial (time x treatment x assay) cases (thus F was named as F2 and F3, respectively).
Step 4. Ranking of candidate housekeeping genes according to their stability The more transcriptionally stable genes should have a more or less parallel classi cation in both CV and F descriptive parameters [5]. In the CV/F method, preference is given to CV (overall variation) ranking [5], thereby overcoming possible large discrepancies caused by opposite and extreme values of CV and F. The rankings of CHGs according to CV/F method -so as used software algorithms-are shown in Table 3 Table 3. Transcriptional stability rankings for transcript accumulation of candidate housekeeping genes for the whole panel of experimental conditions 1 .
1 According to BestKeeper Pearson coe cient of correlation (BKr), and the stability parameters for BestKeeper SD (BKS), comparative ΔCt method (CoD), GeNorm (GNo), NormFinder (NFi), RefFinfer (RFi) and CV/F method. A lower value for stability parameters indicates higher transcript accumulation stability. F and NF were calculated for different sample groups. Underlayed are the genes contributing to the 2-genes normalization factor with maximum stability according to NFi independent software. In bold, genes selected as reference genes (see Fig. 2). In cursive, the discarded CHG Hsp. 2 All tested genes with P<0.01 for Pearson correlation coe cient for the correspondent BestKeeper normalization factor, according to Pfa et al (2004).
Step 5. Discarding of transcriptionally unstable genes As it is also evident from visual distribution of CV and F parameters (Fig. 1), Hsp resulted in the more transcriptionally unstable CHG for all methods used. Moreover, in Table 3, Hsp ranked as the less stable CHG for CV and F3, and its rst position in F2 is an effect of an intolerably high intra-group Hsp instability.
Consequently, Hsp was discarded and removed for subsequent re-rankings.
Step 6. Re-rankings of remaining genes For CV/F method, the exclusion of Hsp does not change stability values and the order of the rest of CHGs. Thus, omitting Hsp, the order H2B, OUB, ACT and EF is clear for CV/F method, and, taking into account the preference given to CV ranking over that from F, those genes may be followed by GADPH and, nally, TUA. GADPH showed the lowest inter-group stability according to both calculated F rankings, and had a relatively low score according to CV, being ranked as 5 th for this parameter.
Step 7. Ranking with software methods as support and comparison with CV/F ranking Selection of functionally distant CHGs, such as those from the group here tested ( Table 2) and some previously considered as suitable RGs for plant in vitro growth and rooting studies ( [9] and references therein), diminish the risk that most of them be transcriptionally affected in a similar way by the experimental conditions. In spite of this, it have been advised to evaluate each individual CHG stability by other means previously to the software usage [5]. CV has been used as a main parameter for normalization [34], frequently complementing normalization software [5,35,36].
Software methods employed in the present study (except BKS algorithm implemented in RFi) use statistics that relate transcript levels among all used CHGs, and consequently rely under the hard to support assumption that a majority of the evaluated CHGs do not show real biological concomitant, systematic variations on their transcript accumulation among sample groups, i.e., the measured variation is assumed to be due to technical error for most of the tested genes. In other words, all those programmes are sensitive to concomitant variations such as co-regulation [37], especially CoD [38] and gNo [39]. Furthermore, NFi software, although less affected by co-regulation, does not account for inter-group systematic errors associated to sample preparation [38] -in contrast to F-. On the other hand, BestKeeper requirements are too strict with the inter-sample variance [40]. Moreover, rankings offered by all those methods are frequently con icting ( [5][6][7][8][9], reviewed in [10]). The RFi programme uses the geometric mean of the four statistical approaches (BKS in the case of BestKeeper, and overall variation for NFi) to rank the CHGs. RFi weighting may help to overcome some pitfalls, but it is obviously also in uenced by the systematic variations affecting the integrated algorithms.
In spite of these commented pitfalls, with the exception of gNo, which algorithm more strongly ranks according to similar gene transcript accumulation trends -likely a biasing effect-, the rest of used software agreed with CV/F method in including H2B, OUB and ACT into the three or four rstly ranked CHGs, although without concordance in the exact order positions (Table 3). : H2B ranked from 1 st to 2 nd position, OUB also (except for CoD method, where it ranked third), and ACT ranked between 2 nd and 4 th positions. The remaining CHGs, in a sort of parallelism to CV/F rank, usually oscillated between positions 3 th and 5 th (EF) or 4 th and 6 th (GADPH and TUA), except in the case of gNo software, which considered EF and GADPH as the optimal option.
The removal of the highly unstable Hsp affected the ranking of methods for which transcript accumulation or Cq' values of each CHG algorithmically interact with those from the others. Table 3 shows that H2B is still well ranked for all algorithms. Nonetheless, in those algorithms into RFi where the gene ranking is dependent on interactions among genes (CoD, gNo, and NFi), ACT and OUB pass to be into the last three positions. The removal of Hsp, which strongly affected average trends (see changes in BKr values), changed the rankings into CoD and gNo, two algorithms highly biased by co-regulation, a fact to consider especially in cases of plant multi-stress induction (such as the present one) even in cases of normally non-regulated genes [32,33]. The NFi integrated in RFi is also sensible to inter-sample systematic variations: in the present case, it is sensible to co-regulation. The use of the remaining six CHGs for subsequent estimations is into the limit of the advised number of CHGs to be used in ranking algorithm NFi [26].
After Hsp removal, all methods showed H2B as the most stable CHG (or 3 rd most stable for gNo) ( Table 3). In other words, H2B is the only selected CHG common for all methods. H2B was previously recommended to be used as RG due to its high stability in in vitro rooted Eucalyptus globulus microcuttings [9].
ACT usually ranks between 2 nd and 4 th positions. This ranking reinforces its 2 nd place for CV/F. OUB, 3 rd for CV/F ranking, ranks in the last two positions for those algorithms more susceptible to co-regulation effects. Since this is due to its transcript accumulation trend opposite to the rest of the ve remaining CHGs, in a way its behaviour compensates the opposite trend of both H2B and ACT, stabilizing the correspondent NF, as happens when the selection of two genes is con gured for independent NFi. It should be noticed that when the selection of only two genes is switched for this software, OUB -together with H2Bis suggested to compose the NF.
Selection of RGs composing NFs is performed after establishment of a stability ranking of CHGs. In spite of the commented preference for CV/F method in order to establish a stability ranking for CHGs, it is worth to compare if NFs obtained from each method can be similarly valid. There is a wide consensus in accepting different criteria for establishment of a proper NF after CHG ranking, but they have not been applied together in a systematic way. According to the 2nd round classi cations (after discarding Hsp) of the prompted CHGs, such criteria were adapted for an optimal selection of RGs into each ranking methodology, as follows in next Step.
Step 8. Selection of RG sets

Calculation of pairwise variations between possible NFs
A proper NF should be i) as stable among samples as possible as long as ii) its variations mainly re ect the real technical errors in order these errors be compensated. The rst criterion, NF stability, is usually procured through selection of a number of the most stable CHGs -i.e., RGs, which compose the NF. According to this, six NFs were calculated for each gene ranking method, being each NF composed from the most stable n CHGs, being n=1 to 6, and ordered according to n (Fig. 2).
The replicate pairwise variation (V) [23] indicates the variability between NF pairs. If V is low, both NFs are considered equivalent for normalization. If the compared NFs are composed each by the n and the n+1 most stable CHGs, and V is low, both NFs may be valid for normalization. If the pairwise comparisons are made from n=1 to the total number of CHGs, the rst low V found is considered to correspond to the optimal NF pair, and usually the NF composed with the minor number of genes is selected. A V threshold value of 0.15 is accepted as low enough, but this consensus should not be strict [11], especially when considering complex designs of various factors as in case presented in this article. Also, a trend of changing V values when adding new genes for the calculation of the NF is recognized to be equally informative [31,41,42], and in fact values higher than 0.15 have been accepted [32,43,44]. Concretely, the best option for a decreasing V trend are the NFs corresponding to the V lowest value. Moreover, if after a decreasing trend, even to less than 0.15, V starts to grow again by the addition to the NF of a worse ranked CHG, more instability may be being introduced, and consequently more error. Thus, the inclusion of such a CHG as RG is not advisable.
The methodology for V calculation has been used so far only for comparisons of pairs of NFs composed by n and n+1 CHGs as ranked according to the CHG stability obtained with gNo, since this algorithm is implemented in that software [23]. In the present work, such an algorithm was adapted to determine V between any pair of possible NFs, hereby obtaining left graphics of Fig. 2.

Determination of representative candidate NFs
Normalization against more than one gene is a priori recommended [45]. At least three RGs to calculate NF are recommendable [46,47]. Three nonphysiologically related genes are more representative, as long as NF is stable.

Assessment of congruent stability and selection of NFs for each gene ranking method
Selected NF among those constructed for each ranking method should be as stable as possible as long as the previous criteria are kept. Maximum or high NF stability underlies the method proposed in [48] for choosing NFs, but this may overlook CHG representation. Thus, the overall CHG representation in the candidate NFs and the congruency of these NFs -according to the previous criteria of stability and compensation of errors-were tested by assessing the stability of the candidate NFs regarding discarded NFs and individual CHGs, a matter that tends to be ignored. To inspect the suitability of the candidate NFs, the next rules were introduced: 1. i) Ideally, important variability parameters (CV and F) for NFs should be lower than those of the CHG -composing the NF-having the maximum values for such parameters (see Fig. 2 right). Otherwise, this worse ranked CHG may be unnecessarily contributing to additional NF instability.
2. ii) In this regard, as in general, when selecting NFs the ranking provided by overall variability (CV) should ideally have priority to that given by inter-group variability (F).
iii) F would be more representative of the whole experimental panel if it spans a higher number of factors: in the present case, F3 should have more priority than F2.
1. iv) Economical criteria may be taken into account additionally when two or more NFs may be similarly valid according to all above criteria, consequently selecting the RG set with the minimum number of CHGs.
In Table 3 (genes in bold) and Fig. 2, the proposed and selected RGs (genes which contribute to the NFs) for different algorithms are represented after discarding Hsp as highly unstable gene and applying the above described criteria. In all cases those rules brought NFs composed from three RGs. Results show three possible NFs, depending on the ranking method: H2B, OUB and ACT (for CV/F and BKS methods); H2B, ACT and EF (for independent NFi algorithms), and H2B, EF and GADPH (for the rest of the methods). There is a consensus for the selection of these three CHGs amongst RFi and their implemented methods, except for BKS. This apparent congruence can be an effect of the bias by co-regulation. In the case of independent NFi, which takes into account biological groups, the GADPH 2 nd position resulting from those algorithms is substituted by ACT, but this can be still an effect of the in uence of co-regulation. BKS and CV/F methods, not affected by the commented pitfalls, change the ACT of the selected NF of independent NFi by OUB. The different trend of this gene with respect to the majority of CHGs increases the stability of the resulting NF. Only H2B, the most stable CHG for all methods except for gNo (where ranked 3 rd ), constituted a consensus among the selected NFs.
In order to test the possible validity of any of the three selected NFs, V values between them were calculated. V values were always much higher than 0.15 (0.27 was the lowest one -data not shown), thus there being important signi cant differences between the correspondent NFs. The commented de ciencies of algorithms suggest the CV/F method as a reasonable solution, and its use at least as a rst approach. Only BKS, unaffected by co-regulation since uses a similar way to CV/F, had analogous results to CV classi cation. If being not too strict with high standard deviation for BestKeeper, BKS classi cation could be a good combination with F to classify stable genes. In fact, according to [49], the BestKeeper approach may be useful to narrow down a search if no speci c genes are known to be plausible candidates.
Step 9. Error compensation versus stability: inspection of the quality of the selected NFs The quality of NFs may be partially assessed by inspecting: i) in what measure they compensate the technical errors of transcript levels measurement, and ii) if they can also be valid for inter-assay (biological replicates in the present case) comparisons.
The total variation of the measurement of the transcriptional levels comprises the real biological variation plus technical errors. These errors are systematic when they are due to factors of experimental design (e.g. in the case of samples from a given experimental condition which are all more prone to RNA degradation), and may increase from RNA isolation with associated effects like carried over effects (such as analytical processing of experimental groups at different times or in different conditions). Carried over effects are minimized when using a sample maximization design [25] -as in the present case-, e.g., run-to-run variation. On the other hand, non-systematic, i.e., non-experiment-associated technical errors should weigh proportionally less among experimental groups (formed by, for example, different treatments or time points) than among biological, equivalent replicates. In the present case, since assay is a factor xed by rooting potential, the assay-associated systematic variation (and consequently the included systematic error) should weigh proportionally more in the overall inter-assay variation than in assays for which the rooting potential would have been similar. Thus, the inter-assay systematic errors should be compensated as possible by the selected NF avoiding the overcompensation, i.e., the compensation of the real biological inter-assay variation. This convenience is kept for the NFs selected in the analyses (Fig. 2) for all the indicated CHG ranking methods, since inter-assay variation of these NFs is moderate or low with respect to those non-selected and the individual CHGs (Fig. 3). What is more, except for the independent NFi, the inter-assay variation of the selected NFs is similar to that of the CHG, among those composing the NF, with maximal inter-assay variation. Thus, selection criterium i for Step 8 (Assessment of congruent stability) is also kept for inter-assay case. These considerations can lead to the conclusion that the selected NFs, for both bi-and trifactorial cases, are also valid for inter-assay comparisons. Obviously, NFs should also compensate non-systematic analytical errors. The inter-replicate variation (Fig. 3) virtually only accounts for the technical error accumulated during RT-replicates preparation and later associated effects, i.e., a non-systematic, non-factor associated error. Inter-replicate variation tends to decrease asymptotically with the number of RGs to a theoretically precise average value, as can be con rmed in Fig. 3. All genes are in reality absolutely stable for technical replicates, and consequently inter-replicate error would be optimally compensated by a NF composed from a large number of RGs.
Nevertheless, the contribution to the NFs of such a large number of RGs would compromise other requirements such as previously indicated criteria on this regard, so as a good compensation for systematic errors. Thus, a moderate NF inter-replicate variation value, of course lower than the NF-composing CHG with maximum value for this parameter, would be an acceptable deal. This is the case of selected NFs. This means a partial compensation of inter-replicate error, without compromising systematic error compensations and a reasonable stabilization of real transcript levels represented by the NF.
The inter-replicate mean distance of the NFs composed from the all seven CHGs (Ct'=0.0042) speci cally accounts of average error due to RT-replicate preparation. In exchange, the individual CHG inter-replicate mean-distances include all non-systematic technical errors, carried over or not. Selected NFs are again well ranked for this descriptor, having an intermediate-high value (0.0072 corrected cycles), and then overcompensating RT average error (0.0042 corrected cycles) and thereby accounting of later errors too (see replicate compensation ranking, given by standard deviations of the individual CHG interreplicate mean distances) (Fig. 3).
Step10. Determinationof the optimal normalization factor for the complete bi-and trifactorial panels For CV/F method, NF3, composed from H2B, OUB and ACT, is a good compromise to reduce V between two consecutive CHG groups (Fig. 2), having a major representation than only two CHGs and keeping the stability conditions stated above. OUB has a slight opposite trend to both H2B and ACT transcript accumulations, then compensating bifactorial (time x treatment) and trifactorial (time x treatment x assay) inter-group variations for NF3. NF3 offers low intergroup (bi-or trifactorial) variation (Fig. 2 right) and intermediate inter-replicate variation (Fig. 3 right), thus being reasonably stable for group-associated systematic variations without compromising the compensation of non-systematic technical errors. The inclusion of additional genes introduces more instability (Fig. 2 right). Furthermore, the addition of one or two worse ranked genes is not advisable, since they rarely are selected when testing experimental conditions separately (see next Step). As a general conclusion, this three-gene NF represents the optimal combination of CHGs, amongst those tested, for the general panel of assayed experimental conditions. Slight instability among groups may account for systematic variations.
Step 11. Normalization factors for more speci c experimental conditions Additionally, the validity of NF3 for speci c treatments inside of a time and along time inside of a treatment was checked. In these cases, the experimental condition groups consisted of the different levels of time or treatment (thus giving a one factor design) or in combination with the assays (thus giving a two factor design). Single CHGs were subjected to graphic inspection and ranked for each time or treatment according to the same algorithms used for the complete bi-or trifactorial panels (Table 4). After inspection of the graphics (Fig. 1), Hsp, due to its high instability, was removed from further consideration for IBA, IBA+SHAM, 4 h and 24 h. The genes ranked in last position for F were not removed as always ranked in the rst four positions according to CV (Table 4). After determination of a CHG ranking according to CV/F method -the method discussed above to be considered in general more suitable -the before described RG selection criteria were followed (Fig. 4).  Fig. 4). To see a detailed selection of RGs for groups see Additional le 3. Summarizing (Fig. 4), except for 96 h, where the combination H2B, ACT and EF is the optimal, H2B and OUB (NF2) belong to all selected NFs for the cases of speci c experimental conditions (one given treatment with time point as a factor or a given time point with treatment as a factor), whatever assay is included or not as a second factor. In some cases, the addition of EF (Control treatment) or TUA (4 h) to NF2 forms the optimal set for normalization. In most of the cases (IBA, IBA+SHAM, 24 h and 48 h), the addition of ACT to NF2 (and then constituting NF3) improves normalization according to the above pointed criteria. As previously indicated, for the bi-and trifactorial whole panels (treatment x time and treatment x time x assay, respectively), NF3 (H2B, OUB and ACT) is proposed as the optimal NF.
For more speci c cases (a given treatment at a given time point, or even biological or technical replicates in the same experimental conditions of the same assay), checking of the variability of CHGs among biological or technical replicates would be required, following a similar methodology to that exposed. With the present transcript accumulation dataset, the total number of replicates inside of a given experimental condition (6 = 3 assays x 2 technical replicates) may be too low to perform a consistent test. By this reason, if a NF should be estimated from the present datasets, the results obtained for the level-associated bifactorial cases (treatment level x assay or time level x assay) can be more valid. In spite of this, the decomposition of bi-and trifactorial whole panels for separate normalization and statistical comparison into the more simple level-associated mono-or bifactorial panels, respectively, is not advisable if a considerable loss of statistical power may be involved. Anyway, in such cases, CV and F values for NF3 are always low, which support the general use of NF3 as the most practical NF for most situations. Nevertheless, the obtained results coincide with those of other reports where differences in RG stability depend on characteristics of biological material or environmental conditions ( [50] and references therein). The complexity level of an experimental panel diminishes normalization accuracy, as can be expected a priori.
Step 12. Evaluation of NF3 by comparison with NF2 It was argued above (Steps 10 and 11) the proposition of NF3 as the best possible option for normalization of the whole panel, and also for several more concrete experimental conditions here assayed. In spite of this, NF3 may be sub-optimal for the rest of the cases and could be used as a less accurate solution. Nevertheless, according to the results on overall and intergroup variability for restricted panels (Fig. 4), NF2 (which is like NF3 without including the less stable gene, ACT), although less representative, could constitute also a good option, especially for more concrete experimental situations, such as monofactorial case IBA+SHAM (Step 11). NF3 is evaluated bellow by comparison with NF2 when normalizing a hypothetical highly stable gene and a real TG with low stability.

NF3/NF2 comparison by normalizing a stable gene
The theoretical results on an ideally completely stable TG (without any variation) regarding NF3 normalization were compared with those that would be obtained by normalising with NF2 on the same TG. The results are equivalent to normalize NF3 against NF2, and provide accurate differences between both NFs. Fig. 5 shows the deviations for such normalization after equalizing both NFs for time 0 (which constitutes the same situation for all treatments).
Differences in corrected cycles (Ct´) were only between 0.10 and 0.55, which are in the ranges of the intra-group standard deviations even of the more stable genes (Fig. 1), thus indicating that normalization of a real stable gene (which would show intra-group standard deviations) with NF2 would easily trough nonsigni cant differences regarding NF3. In other words, in the practice, NF2 or NF3 could be used for less stable genes normalization without obtaining too much differences. In fact, V for both NFs is about the threshold value of 0.15 accepted as low enough. In spite of it, since all differences gave higher transcript levels than time 0, NF2 normalization may lead to overestimation of relative transcript levels obtained after NF3 normalization (Table 5). According to the above justi cations for adding a third RG, it is supposed that the introduction of ACT in NF2 to compose NF3 corrects not only punctual non-systematic technical errors but also some systematic, group associated errors. Among these, can be detected a decrease of "NF2-transcript accumulation" 4 days after the experiment initiation or in samples from not-SHAM treated explants (with stress palliated by AOX) 4 h after the experiment initiation.

NF3/NF2 comparison by normalizing an unstable gene
According to the previous conclusions, to evaluate the applicability of NF2 or NF3, signi cant differences in transcript levels of a real unstable gene (Hsp) were determined according to both NFs. As shown in Fig. 1 and re ected in CV and F2 values ( and 2d that NF3 did not do (P =0.062), and the same happens in BA for assays I and III.
Combined evaluation.
After applying both comparisons, it can be concluded that the more TG-transcript level variations appear (higher intra-group standard deviations and more evident inter-group variation), the less discrepancies about signi cant differences in gene transcript accumulation are generated by the application of different NFs. Thus, NF2, here considered as sub-optimal, can generate the same signi cant differences than NF3 when the TG is unstable enough, including when its expression depends on experimental conditions.

Conclusions
Selection of appropriate RGs is crucial for the validity of transcription studies. The CV/F method (Steps 3 to 7) demonstrates to be a friendly and useful gene stability-ranking procedure lacking the inaccuracy generated by the in uence of biological systematic errors such as gene co-regulation. This inaccuracy may affect in different ways the most popular algorithms applied for such a purpose, leading to possible misinterpretations. Nevertheless, the additional use of popular software is advisable as complementary support of core methods. This is generally accepted state-of-the-art knowledge. However, combining the showed criteria for selection of a proper number of stable RGs provides more accuracy to normalization. The nucleus of the integrated procedure is the selection of the RG sets (Step 8): i) calculation of pairwise variations between any pair of possible proper NFs by implementing a method until now used only when gNo software have been run, ii) determination of representative candidate NFs, and iii) assessment of a congruent stability of candidate NF, a fact usually underconsidered.
The quality of the resulting NF may be additionally validated (Step 9) by inspecting its suitability for comparisons among non-considered factors or biological or technical replicates, and also checking in what extension such NF compensate errors of transcript level measurement. For those purposes, special statistical estimators were formulated for the case in study.
The NF composed from H2B, OUB and ACT provides a valid normalization for TGs in studies on olive microshoot adventitious rooting when comparing treatments, time points and assays (Steps 10 and 11).
Finally, double evaluation (Step 12) against both a theoretically highly stable gene and a real gene with relatively high instability provided information about the suitability of possible alternative NFs. The validity of the use of a sub-optimal NF depends on the variability of the studied TG. The more stable a TG is, the more transcript accumulation differences the application of different NFs will bring.   Pairwise variation (left) and descriptive parameters (right) for the normalization factors. Nomalization factors are composed from the indicated candidate housekeeping genes, determined according to the indicated methods. Descriptive parameters are CV and F for bifactorial -F2-and trifactorial -F3-cases. Discontinuous lines indicate the maximum value of the parameter into the candidate housekeeping genes contributing to the indicated normalization factors.

Abbreviations
Arrows indicates the selected NFs after the determination of representative candidate NFs (left) and the assessment of congruent stability (right) (see text, Step 8).  Pairwise variation (left) and descriptive parameters (right) for the normalization factors. Normalization factors are composed from the indicated candidate housekeeping genes, determined for CV ranking. Descriptive parameters are CV and F for bifactorial -F2-and trifactorial -F3-cases. Discontinuous lines indicate the maximum value of the parameter into the CHGs contributing to the indicated normalization factors. Arrows indicates the selected normalization factors after the determination of representative candidate NFs (left) and the assessment of congruent stability (right) (see text, Step 8).