Custom selected reference genes outperform pre-defined reference genes in transcriptomic analysis

doi:10.21203/rs.2.9587/v2

Methodology article

This work is licensed under a CC BY 4.0 License

Journal publication: published 10 Jan, 2020, in BMC Genomics.

This is an older preprint version.

Background: RNA sequencing measures gene expression at a resolution unmatched by expression arrays or RT-qPCR. However, sequencing data must be normalized by library size, transcript size and composition, among other factors, before expression levels can be compared. The literature advocates the use of internal control genes or spike-ins for scaling read counts, but the methods for choosing reference genes are mostly targeted at RT-qPCR studies and require a set of pre-selected candidate controls or pre-selected target genes.

Results: Here, we report an R-based script that selects internal control genes based solely on read counts and gene sizes. This novel method first normalizes the read counts to Transcripts per Million (TPM) and then excludes weakly expressed genes, using the DAFS script to calculate the cut-off. It then selects as references the genes with the lowest TPM covariance. We used this method to pick custom reference genes for the differential expression analysis of three transcriptome sets from transgenic Arabidopsis plants expressing heterologous fungal effector proteins tagged with GFP (using GFP alone as the control). The custom reference genes showed lower covariance and fold change, as well as a broader range of expression levels, than commonly used reference genes. When analyzed with NormFinder, both typical and custom reference genes were considered suitable internal controls, but the expression of the custom selected genes was more stable. geNorm produced a similar result, in which most custom selected genes ranked higher (i.e. their expression was more stable) than commonly used reference genes.

Conclusions: The proposed method is innovative, rapid and simple. Since it does not depend on genome annotation, it can be used with any organism, and it does not require pre-selected reference candidates or target genes, which are not always available.

Epigenetics & Genomics

Keywords: next-generation sequencing, housekeeping genes for qPCR, R script

RNAseq has been in use since the pioneering studies of Lister et al. [1] (Arabidopsis thaliana), Nagalakshmi et al. [2] (Saccharomyces cerevisiae), Wilhelm et al. [3] (Schizosaccharomyces pombe), and Mortazavi et al. [4] (Mus musculus). The technique combines transcript discovery and the quantification of expression levels in a single assay and has a wider dynamic range of detection than microarrays or RT-qPCR [5, 6].

For differential expression studies, gene expression values must be comparable between samples, which means that count data should be normalized for sequencing depth and for other biases such as transcript length, GC content and transcript coverage. Reads/Fragments per Kilobase per Million (RPKM or FPKM) and Transcripts per Million (TPM) both normalize count data by transcript length and sequencing depth [7], but they may give biased results in the presence of highly expressed genes or when many genes are expressed in only one sample [8]. This is because a single differentially expressed gene changes the share of the sequencing effort available to all the others, so that all genes appear to be differentially expressed [9-11]. Other methods, such as relative log expression (DESeq2) and the trimmed mean of M-values (edgeR), can correct for this carry-over effect of highly expressed genes [10].
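The composition effect described above can be illustrated with a toy calculation. The following is a minimal sketch in Python (not the R code used in this work); the gene names, lengths and counts are invented for illustration. It converts counts to TPM and shows that raising the count of a single gene lowers the TPM of every other gene, even though their counts are unchanged.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to Transcripts per Million.

    counts and lengths_kb are dicts keyed by gene name (hypothetical data).
    """
    # Reads per kilobase: corrects for transcript length.
    rpk = {g: counts[g] / lengths_kb[g] for g in counts}
    # Scale so that the values of one sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    return {g: rpk[g] / scale for g in rpk}


lengths_kb = {"geneA": 1.0, "geneB": 2.0, "geneC": 1.0}
sample_1 = {"geneA": 100, "geneB": 200, "geneC": 100}
# Same counts for geneA and geneB, but geneC is strongly induced.
sample_2 = {"geneA": 100, "geneB": 200, "geneC": 800}

tpm_1 = counts_to_tpm(sample_1, lengths_kb)
tpm_2 = counts_to_tpm(sample_2, lengths_kb)

# geneA and geneB appear "down-regulated" in sample 2 although
# their raw counts did not change.
for gene in lengths_kb:
    print(gene, round(tpm_1[gene]), round(tpm_2[gene]))
```

Because TPM values always sum to one million within a sample, any gain by one gene is necessarily offset by an apparent loss in the others.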

Comparisons of software for RNAseq analysis are a recurrent subject in the literature [12-14], and many authors debate the benefits of using housekeeping genes or spike-in controls to scale the count data, yet the reference genes used for RNAseq data analysis are rarely evaluated. When internal or external control genes are used, the normalization is first performed on the controls and the result is then used to normalize the other genes. External spike-ins are advocated because they introduce little error into the read counts and allow the identification of global shifts in gene expression [15-17]. However, reports have shown mixed performance of spike-ins with different normalization methods [18], resulting in high false discovery rates and false positive rates [19]. Spike-ins may also show differences in amplification depending on the type of tissue studied or the protocol for mRNA enrichment [20].
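The two-step idea of normalizing on the controls first and then applying the result to all genes can be sketched as follows. This is an illustrative Python sketch only: it assumes a median-of-ratios scaling factor (the relative log expression approach) computed on the control genes, which is one possible estimator and not necessarily the one used by any given tool. The gene names and counts are invented.

```python
import math
from statistics import median


def size_factors_from_controls(counts, control_genes):
    """Estimate one scaling factor per sample using control genes only.

    counts: dict gene -> list of raw counts, one entry per sample.
    control_genes: genes assumed to be stably expressed (hypothetical);
        their counts must be non-zero in every sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Geometric mean of each control gene across samples (pseudo-reference).
    geo_means = {
        g: math.exp(sum(math.log(c) for c in counts[g]) / n_samples)
        for g in control_genes
    }
    # Per sample: median ratio of control counts to the pseudo-reference.
    return [
        median(counts[g][s] / geo_means[g] for g in control_genes)
        for s in range(n_samples)
    ]


def normalize(counts, factors):
    """Divide the counts of every gene by the per-sample factor."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


counts = {
    "ctrl1": [100, 200],
    "ctrl2": [50, 100],
    "ctrl3": [400, 800],
    "target": [30, 120],
}
factors = size_factors_from_controls(counts, ["ctrl1", "ctrl2", "ctrl3"])
normalized = normalize(counts, factors)
```

In this toy data the second sample was sequenced twice as deeply, so the controls become equal after scaling while the target gene keeps a two-fold difference.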

One alternative to external spike-ins is the use of internal control genes, as is done in qPCR studies. Typical control genes are actin, tubulin, elongation factor 1, polyubiquitin and ribosomal RNAs, though the expression stability of several of these depends on the conditions studied [21]. To solve this issue, different algorithms have been proposed to find the genes with the most stable expression, mostly for qPCR applications, but they need either a set of predefined genes of interest (RefGenes, Hruz et al. [22]) or a set of pre-selected candidate reference genes (geNorm, Vandesompele et al. [23]; NormFinder, Andersen et al. [24]; BestKeeper, Pfaffl et al. [25]). The most frequent approach is to take previously identified stably expressed genes, as done by Zhuo et al. [11]. This, however, does not ensure that the selected genes will show stable expression in the studied organism and conditions.

Here we propose a simple and fast method to identify the genes with the most stable expression in each experimental condition. Our method is aimed at differential expression studies and offers a simple way to select custom reference genes for any species and any type of experiment, so that they can be used in the normalization step of differential expression analysis algorithms without the need for spike-ins. It alleviates the problems inherent to predefined reference genes, which may not be stably expressed across experimental set-ups and are applicable to a single species only.

Initially, three RNAseq transcriptomes were generated using Arabidopsis transgenic plants expressing GFP alone (control) or GFP fused to a fungal effector (Mlp37347 or Mlp124499). We tested the normalization of our RNAseq data using two sets of reference genes: commonly used reference genes (Table 1) and the 104 stable Arabidopsis genes proposed by Zhuo et al. [11]. The first set of reference genes was assessed for stability in three different permutations of the transcriptome sets, as shown in Figure 1A (panel 1: Mlp37347 vs Control, panel 2: Mlp124499 vs Control, panel 3: Mlp124499 vs Mlp37347). In each case, high levels of covariance were obtained, ranging from 4.9% (NDUFA8 in Mlp124499 vs Mlp37347) to 41.5% (tubulin 6 in Mlp124499 vs Mlp37347). Next, we performed the same analysis using the 104 genes proposed by Zhuo et al. [11]. For the three permutations of the transcriptome sets, large fluctuations in covariance were observed, ranging from 2.9% to 49% (Figure 1B). Finally, we did the same for the set of 30 genes selected by Czechowski et al. [26] for several plant tissues (Additional file 1). These results demonstrate that neither the commonly used reference genes nor the 104 reference genes proposed by Zhuo et al. [11] were the most stable in our conditions.

Table 1. Common reference genes used in this study for comparison against custom selected reference genes.

Name	Symbol	TAIR ID
Actin 2	ACT2	AT3G18780
Actin 7	ACT7	AT5G09810
Actin 8	ACT8	AT1G49240
Adenine phosphoribosyltransferase 1	APT1	AT1G27450
Elongation factor 1-α	EF1α	AT5G60390
Eukaryotic translation initiation factor 4A-1	eIF4A	AT3G13920
NADH-ubiquinone oxidoreductase 19-kDa subunit	NDUFA8	AT5G18800
Tubulin β-2/β-3 chain	TUB2	AT5G62690
β-tubulin 6	TUB6	AT5G12250
Tubulin β-9 chain	TUB9	AT4G20890
Polyubiquitin	UBQ4	AT5G20620
Ubiquitin extension protein	UBQ5	AT3G62250
Polyubiquitin	UBQ10	AT4G05320
Polyubiquitin	UBQ11	AT4G05050

To search for the genes with the most stable expression, we developed a custom method that selects reference genes using only one’s own RNAseq data. We first used an R function to transform the count data into Transcripts per Million [27] and calculated the average TPM and the covariance for each gene. We then used the DAFS function [28] to calculate a cut-off for the exclusion of weakly expressed genes. Finally, among the remaining genes, the 0.5% with the lowest covariance were selected as reference genes (R script in Additional file 2, https://gist.github.com/KarenGoncalves/e8541973395f7947895408f38fecdf14). This pipeline is hereafter referred to as the custom selection script.
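The selection logic of this pipeline can be outlined in a few lines. The sketch below is written in Python for illustration and is not the R script of Additional file 2. It makes two assumptions that are not taken from the script itself: the cut-off is passed in as a plain number (`cutoff_tpm`) rather than computed by DAFS, and "covariance" is interpreted as the standard deviation expressed as a percentage of the mean (coefficient of variation), which is consistent with the percentages reported here. All gene names and values are invented.

```python
import math
from statistics import mean, stdev


def select_reference_genes(tpm, cutoff_tpm, fraction=0.005):
    """Pick the most stably expressed genes from a TPM table.

    tpm: dict gene -> list of TPM values, one per sample.
    cutoff_tpm: minimum average TPM for a gene to count as expressed.
        In the article this value comes from DAFS; here it is simply
        a parameter (assumption made for this sketch).
    fraction: share of the expressed genes to keep (0.5% in the text).
    """
    variability = {}
    for gene, values in tpm.items():
        avg = mean(values)
        if avg == 0 or avg < cutoff_tpm:
            continue  # weakly expressed gene, excluded
        # Variability as a percentage of the mean; this reading of
        # "covariance" is an assumption.
        variability[gene] = 100 * stdev(values) / avg
    n_keep = max(1, math.ceil(len(variability) * fraction))
    ranked = sorted(variability, key=variability.get)
    return ranked[:n_keep]


tpm = {
    "stable": [100.0, 102.0, 98.0, 101.0],
    "variable": [100.0, 300.0, 50.0, 200.0],
    "weak": [0.5, 0.4, 0.6, 0.5],
}
# A large fraction is used here only because the toy table has three genes.
refs = select_reference_genes(tpm, cutoff_tpm=1.0, fraction=0.5)
```

The `fraction` argument corresponds to the user-defined number of references: raising it returns more genes at the price of admitting less stable ones.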

To test the method, we used the same transcriptome sets described in Figure 1 (the list of selected genes for each analysis is available in Table 1, Additional file 3). For each transcriptome set, Figure 2 shows the average expression in log₂ TPM and the covariance of the common reference genes (Common), the set of 30 genes from Czechowski et al. [26] (Czechowski et al. 2005), the set of 104 genes from Zhuo et al. [11] (Zhuo et al. 2016) and the genes selected using the custom selection script (Custom script). In all pairings, the custom selected reference genes show a broader range of expression levels and lower covariance than the other sets (Figure 2). Next, we performed a differential expression analysis with DESeq2 [29] without control genes. Figure 3 shows the log₂-transformed fold change plotted against the ‒log₁₀-transformed adjusted p-value for each gene set. The genes selected with the custom script show a lower fold change in all cases. We also compared the results of DESeq2 using no reference genes or each of the four sets indicated above, for each permutation. As shown in Table 2 (Additional file 3), in all permutations the analysis without references gives a higher number of up-regulated genes and a lower number of down-regulated genes than the analyses that use any of the reference sets, possibly indicating a shift towards downregulation that is not detected without reference genes.

To further test the stability of the custom reference genes in our experiment, we used NormFinder [24] and geNorm [23] to compare the four sets of reference genes using log₂-transformed TPM values. The complete results are presented in Tables 3-5 of Additional file 3. Figure 4 compares the set of common reference genes with the custom selected reference genes. The gene AT5G18800 (NDUFA8), which belongs to the set of common references, was selected by the custom script in all three permutations and is shown with a purple border. Both sets of genes (custom and common references) were under the stability threshold of NormFinder (0.5), meaning that the software considers them suitable reference genes; however, the custom selected genes (shown with a blue border) were more stable than the commonly used genes (shown in red, Figure 4). This was also the case for most genes tested with geNorm.
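For readers unfamiliar with geNorm, its stability measure M can be sketched as follows. For each candidate gene, M is the average, over all other candidates, of the standard deviation of the log₂ expression ratio between the two genes across samples; a lower M means more stable expression. The Python sketch below computes a single round of M values on invented data. It is not the NormqPCR implementation used here, and it omits the stepwise exclusion of the least stable gene that geNorm uses to produce a ranking. NormFinder relies on a different, model-based estimate that is not shown.

```python
import math
from statistics import mean, stdev


def genorm_m(expr):
    """Average pairwise variation M for each candidate gene.

    expr: dict gene -> list of expression values (e.g. TPM, all > 0),
    one per sample. Lower M means more stable expression.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values


# Hypothetical candidates: refA and refB vary together, refC does not.
expr = {
    "refA": [100.0, 110.0, 95.0, 105.0],
    "refB": [50.0, 54.0, 48.0, 53.0],
    "refC": [200.0, 80.0, 400.0, 150.0],
}
m = genorm_m(expr)
least_stable = max(m, key=m.get)
```

Because M is built from ratios between candidates, a gene is judged relative to the other genes in the set, which is why geNorm needs a panel of candidates to start from.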

The use of reference genes in RNAseq studies is suggested in the literature [15-17], yet the methods for the selection of these genes are designed for qPCR data and require a set of preselected reference or target genes or the selection of conditions similar to that of one’s own experiment [22-25], which are not always available. As there is no previous transcriptomic study of plants constitutively expressing fungal effectors and since the information available on these effectors is scarce [30], it is not possible to know a priori their function and which host genes are impacted by the presence of these fungal proteins. For these reasons, we propose a new R-based function which enables the selection of custom reference genes regardless of the organisms used or of the experimental conditions.

The method developed here requires only information available from the RNAseq analysis. It uses Transcripts per Million [27] as a proxy for the expression level and the DAFS algorithm [28] to exclude genes with low counts, which may be inactive [31]. We first assessed whether the most commonly used reference genes (Table 1) or two sets of published reference genes for Arabidopsis [11, 26] were indeed the most stable in our experimental conditions. As shown in Figure 1 and Additional file 1, all three sets of reference genes show a high level of covariance in our experimental conditions, indicating that they were not suitable reference genes for our differential expression analysis.

A high level of variability in the expression of the reference genes skews the quantitative analysis and may cause the loss of differentially expressed genes that show only modest variation in expression [21]. Thus, to alleviate the bias inherent to the use of inappropriate reference genes, we devised an R-based pipeline to select custom reference genes from one’s own experimental data. As presented in Figures 2 and 3, in all the pairings of the data used, the custom selected reference genes outperformed the other sets of reference genes in expression stability, presenting lower fold changes and lower covariances. Our method also allows the selection of more reference genes (the final number is user defined), giving more reference points, and hence more robustness, to the normalization of genes expressed at different levels.

Our results show the need for a new R-based pipeline for the selection of custom reference genes in transcriptomic studies. Our method can be applied to any organism and to any type of experimental conditions, and can easily be implemented or modified in R. This tool provides an alternative to spike-in controls and represents an improvement over pre-defined reference genes which may not be stable in one’s own experimental conditions.

The initial Arabidopsis thaliana Columbia-0 stock was obtained from the Arabidopsis Biological Resource Center (ABRC). Arabidopsis transgenic plants expressing GFP alone (Control) or GFP fused to a candidate secreted effector protein of the fungus Melampsora larici-populina (Mlp37347 or Mlp124499), obtained in our laboratory [30], were used for the transcriptome analysis.

RNA was extracted from pooled aerial tissue of 2-week-old soil-grown plants, with four replicates per genotype, using the Plant Total RNA Mini Kit (Geneaid) with RB buffer following the manufacturer’s protocol. The samples were treated with DNase, and RNA quality was then assessed by agarose gel electrophoresis. Libraries were generated with the NeoPrep Library Prep System (Illumina) using the TruSeq Stranded mRNA Library Prep kit (Illumina) and 100 ng of total RNA, following the manufacturer’s recommendations. The libraries were then sequenced on an Illumina HiSeq 4000 sequencer with paired-end reads of 100 nt.

Libraries were trimmed using Trimmomatic [32] (LEADING:4 TRAILING:4 SLIDINGWINDOW:4:20 MINLEN:20), and the surviving paired reads were then aligned to the TAIR10 assembly of the A. thaliana genome with TopHat v2.0.14 [33] in Galaxy [34] (default options, with the average mate inner distance varying for each replicate (Additional file 4) and a standard deviation of mate inner distance of 50 base pairs). General information on the sequencing results and mapping data is presented in Additional file 4. The dataset was deposited in NCBI-SRA under BioProject PRJNA528094 and in NCBI-GEO under the accession GSE136038. Further analyses were done using R v.3.2.5. Genomic ranges of Arabidopsis transcripts were obtained from Ensembl Plants [35] with GenomicFeatures, and overlaps of sequencing reads with the transcripts were counted using GenomicAlignments [36], with the options for paired-end reads and union mode.

We transformed the counts into TPM [27] and calculated the cut-off for active genes with DAFS [28]. We considered as references the 0.5% of the active genes with the lowest covariance (R script in Additional file 2). Next, we used DESeq2 [37] to confirm that the selected genes were not deregulated. Finally, we compared the custom selected reference genes against three sets of genes: a list of 14 commonly used housekeeping reference genes (Table 1), the reference genes selected by Czechowski et al. [26] and the 104 reference genes selected by Zhuo et al. [11]. The comparison was made with NormFinder [24] and geNorm [23], using TPM values for the expression levels.

RNA: Ribonucleic Acid

RT-qPCR: Reverse Transcription quantitative Polymerase Chain Reaction

TPM: Transcripts per Million

DAFS: Data-Adaptive Flag Method for RNA-Sequencing Data

GFP: Green Fluorescent Protein

RPKM: Reads per Kilobase per Million

FPKM: Fragments per Kilobase per Million

RNAseq: RNA sequencing

mRNA: messenger Ribonucleic Acid

qPCR: quantitative Polymerase Chain Reaction

ACT2: Actin 2

ACT7: Actin 7

ACT8: Actin 8

APT1: Adenine phosphoribosyltransferase 1

EF1α: Elongation factor 1-α

eIF4A: Eukaryotic translation initiation factor 4A-1

NDUFA8: Nicotinamide adenine dinucleotide-ubiquinone oxidoreductase 19-kiloDalton subunit

TUB2: Tubulin β-2/β-3 chain

TUB6: β-Tubulin 6

TUB9: Tubulin β-9 chain

UBQ4: Polyubiquitin

UBQ5: Ubiquitin extension protein

UBQ10: Polyubiquitin

UBQ11: Polyubiquitin

ABRC: Arabidopsis Biological Resource Center

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and materials

The dataset used herein was deposited in NCBI-SRA under BioProject PRJNA528094.

Competing interests

The authors declare that they have no competing interests.

Funding

Funding for the project was provided by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants to HG. The project in HG’s laboratory was also partially funded by an institutional Research Chair and a Canada Research Chair held by HG and a Canada Research Chair held by IDP. KCGS was funded by a master’s scholarship from the Fondation de l’Université du Québec à Trois-Rivières, an international PhD scholarship from the Fonds de Recherche du Québec sur la Nature et les Technologies (FRQNT) and a graduate fellowship from MITACS.

Authors’ contributions

KCGS, IDP and HG designed the work; KCGS performed the experiments; KCGS and HG wrote the paper; IDP and HG revised the paper and all authors approved the manuscript.

Acknowledgements

We thank Melodie B. Plourde for revising the manuscript.

1. Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR: Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 2008, 133(3):536.
2. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320:1349.
3. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bähler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008, 453:1245.
4. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods 2008, 5(7):628.
5. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews genetics 2009, 10(1):63.
6. Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X: Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PloS one 2014, 9(1):e78644.
7. Wagner GP, Kin K, Lynch VJ: Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in biosciences 2012, 131:285.
8. Models for transcript quantification from RNA-Seq
9. Robinson MD, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome biology 2010, 11:R25.
10. Wolf JBW: Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial. Molecular ecology resources 2013, 13(4):572.
11. Zhuo B, Emerson S, Chang JH, Di Y: Identifying stably expressed genes from multiple RNA-Seq data sets. PeerJ 2016, 4:e2791.
12. Evans C, Hardin J, Stoebel DM: Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions. Briefings in bioinformatics 2018, 19:792.
13. Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D: Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome biology 2013, 14:R95.
14. Soneson C, Delorenzi M: A comparison of methods for differential expression analysis of RNA-seq data. BMC bioinformatics 2013, 14(1).
15. Lovén J, Orlando DA, Sigova AA, Lin CY, Rahl PB, Burge CB, Levens DL, Lee TI, Young RA: Revisiting global gene expression analysis. Cell 2012, 151(October):482.
16. Lutzmayer S, Enugutti B, Nodine MD: Novel small RNA spike-in oligonucleotides enable absolute normalization of small RNA-Seq data. Nature scientific reports 2017, 7:5913.
17. Taruttis F, Feist M, Schwarzfischer P, Gronwald W, Kube D, Spang R, Engelmann JC: External calibration with Drosophila whole-cell spike-ins delivers absolute mRNA fold changes from human RNA-Seq and qPCR data. BioTechniques 2018, 62(2):61.
18. Risso D, Ngai J, Speed TP, Dudoit S: Normalization of RNA-seq data using factor analysis of control genes or samples. Nature biotechnology 2014, 32(9):902.
19. Paepe KD: Comparison of methods for differential gene expression using RNA-seq data. Dissertation. Gand: Universiteit Gent; 2015.
20. Qing T, Yu Y, Du T, Shi L: mRNA enrichment protocols determine the quantification characteristics of external RNA spike-in controls in RNA-Seq studies. Science China life sciences 2013, 56(2):142.
21. Gutierrez L, Mauriat M, Guénin S, Pelloux J, Lefebvre JF, Louvet R, Rusterucci C, Moritz T, Guerineau F, Bellini C et al: The lack of a systematic validation of reference genes: A serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants. Plant biotechnology journal 2008, 6(6):618.
22. Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P: Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Advances in bioinformatics 2008, 2008:420747.
23. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F: Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome biology 2002, 3(7):research0034.0011.
24. Andersen CL, Ledet-Jensen J, Ørntoft T: Normalization of real-time quantitative RT-PCR data: a model based variance estimation approach to identify genes suited for normalization - applied to bladder- and colon-cancer data-sets. Cancer research 2004, 64:5250.
25. Pfaffl MW, Tichopad A, Prgomet C, Neuvians TP: Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper – Excel-based tool using pair-wise correlations. Biotechnology letters 2004, 26(6):515.
26. Czechowski T, Stitt M, Altmann T, Udvardi MK, Scheible W-R: Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant physiology 2005, 139(1):17.
27. Counts_to_tpm.R [https://gist.github.com/slowkow/c6ab0348747f86e2748b/ea6b1a870ca99e68717a22b8cf78ab35e642f0ec]
28. George NI, Chang C-W: DAFS: a data-adaptive flag method for RNA-sequencing data to differentiate genes with low and high expression. BMC bioinformatics 2014, 15:92.
29. Love MI, Anders S, Hu W: Differential analysis of count data – the DESeq2 package. Genome biology 2014, 15(550):63.
30. Germain H, Joly DL, Mireault C, Letanneur C, Stewart D, Morency MJ, Petre B, Duplessis S, Séguin A: Infection assays in Arabidopsis reveal candidate effectors from the poplar rust fungus that promote susceptibility to bacteria and oomycete pathogens. Molecular plant pathology 2018, 19:200.
31. Hart T, Komori HK, LaMere S, Podshivalova K, Salomon DR: Finding the active genes in deep RNA-seq gene expression studies. BMC Genomics 2013, 14(1):778.
32. Bolger AM, Lohse M, Usadel B: Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30(15):2120.
33. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL: TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome biology 2013, 14:R36.
34. Afgan E, Baker D, Van den Beek M, Blankenberg D, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Eberhard C et al: The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic acids research 2016, 44(W1):W10.
35. Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 2005, 21(16):3440.
36. Lawrence GJ, Huber MLW, Pages H, Aboyoun P, Carlson M, Gentleman R, Morgan M, Carey VJ: Software for computing and annotating genomic ranges. PLoS computational biology 2013, 9:e1003118.
37. Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology 2014, 15(12):550.

Additional file 1

Figure (.tiff)

Title: Covariance of genes from Czechowski et al. [26].

Description: Covariance level for each of the 30 genes selected by Czechowski et al. [26] for each permutation (A: Mlp37347 vs Control; B: Mlp124499 vs Control; C: Mlp124499 vs Mlp37347).

Additional file 2

R script (.R). Viewer: https://www.r-project.org/

Title: Custom selection script.

Description: R script developed for selection of reference genes from transcriptomic data.

Additional file 3

Spreadsheets (.xlsx)

Title: Tables of additional data.

Description:

Table 1 – TAIR IDs of custom selected references for each transcriptome permutation

Table 2 – DESeq2 results summary of analysis without reference genes or with different reference sets (Custom selected, from Czechowski et al. 2005, from Zhuo et al. 2016 or Commonly used references). Table presents the genes found up- and down-regulated in two analyses.

Tables 3 to 5 – Summary of the results of several analyses for all the genes evaluated in this article: Column A: TAIR ID; Column B: ranking calculated with geNorm with the function “selectHKs” from the R package “NormqPCR”; Column C: average TPM value; Column D: covariance of the TPM values; Column E: the difference of expression of a gene between two samples calculated with NormFinder; Column F: the common standard deviation of the expression of a gene between two samples calculated with NormFinder; Column G: stability measure from NormFinder; Column H: log2-transformed fold change of each gene calculated with DESeq2 without using reference genes; Column I: adjusted p-value of the gene deregulation calculated with DESeq2 without using reference genes; Column J: sources that identified the gene as a reference; when more than one source selected the gene as a reference, they are separated by a “;”. Table 3 – Permutation Mlp37347 vs Control; Table 4 – Permutation Mlp124499 vs Control; Table 5 – Permutation Mlp124499 vs Mlp37347.

Table 6: Metadata of samples used; replicate identification, number of sequenced reads, average length of the separation between two paired reads, number of reads after trimming and filtering and number of aligned reads for each of the 4 replicates of the three samples used in this study.

Editorial decision: Minor revision
14 Dec, 2019
Review #2 received at journal
10 Dec, 2019
Reviewer #2 agreed at journal
26 Nov, 2019
Review #1 received at journal
26 Sep, 2019
Reviewers invited by journal
12 Sep, 2019
Reviewer #1 agreed at journal
12 Sep, 2019
Editor assigned by journal
22 Aug, 2019
Submission checks completed at journal
21 Aug, 2019
Editor invited by journal
21 Aug, 2019
