Normalization of Gene Expression Data Revisited: The Three Viewpoints of the Transcriptome in Human Skeletal Muscle Undergoing Load-induced Hypertrophy and Why They Matter

doi:10.21203/rs.3.rs-1008326/v1


Research Article



This work is licensed under a CC BY 4.0 License


Background

The biological relevance and accuracy of gene expression data depend on adequate data normalization. This is due both to the role of normalization in accounting for technical variation and error and to its defining role in setting the viewpoint from which results are biologically interpreted. Still, normalization is often handled haphazardly. This is especially true of the viewpoint perspective, which may be decisive for the conclusions of studies involving pronounced cellular plasticity. In this study, we highlight the consequences of using three fundamentally different modes of normalization for interpreting RNA-seq data from human skeletal muscle undergoing exercise-training-induced growth. Briefly, 25 participants performed 12 weeks of high-load resistance training. Muscle biopsy specimens were sampled from m. vastus lateralis before training, after two weeks of training (week 2), and after the intervention (week 12), and were subsequently analyzed using RNA-seq. Transcript counts were modeled as i) per-library-size, ii) per-total-RNA, and iii) per-sample-size (per-mg-tissue).
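To make the three viewpoints concrete, the sketch below converts one gene's read count into each unit. The abstract does not state how per-total-RNA and per-sample-size values were derived, so the anchoring here is an assumption: a hypothetical external spike-in, added at a fixed number of molecules per µg of total RNA, provides the per-total-RNA scale, and a measured total RNA yield per mg of tissue converts that into per-sample-size. The `Sample` fields and all example values are illustrative only, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-sample measurements (illustrative values, not study data)."""
    gene_count: int          # reads mapped to the gene of interest
    library_size: int        # total mapped reads in the library
    spike_count: int         # reads mapped to an external spike-in (assumed design)
    spike_per_ug_rna: float  # spike-in molecules added per µg total RNA (assumed)
    ug_rna_per_mg: float     # total RNA yield per mg of tissue


def per_library_size(s: Sample) -> float:
    """Relative abundance: reads per million mapped reads."""
    return s.gene_count / s.library_size * 1e6


def per_total_rna(s: Sample) -> float:
    """Estimated molecules per µg total RNA, scaled via the assumed spike-in."""
    return s.gene_count / s.spike_count * s.spike_per_ug_rna


def per_sample_size(s: Sample) -> float:
    """Estimated molecules per mg tissue: per-µg-RNA value times RNA yield."""
    return per_total_rna(s) * s.ug_rna_per_mg


if __name__ == "__main__":
    # Same gene share and spike-in recovery; only RNA yield per mg differs.
    pre = Sample(gene_count=500, library_size=10_000_000, spike_count=20_000,
                 spike_per_ug_rna=1e6, ug_rna_per_mg=0.5)
    post = Sample(gene_count=500, library_size=10_000_000, spike_count=20_000,
                  spike_per_ug_rna=1e6, ug_rna_per_mg=0.7)
    for name, fn in [("per-library-size", per_library_size),
                     ("per-total-RNA", per_total_rna),
                     ("per-sample-size", per_sample_size)]:
        print(f"{name}: fold change {fn(post) / fn(pre):.2f}")
```

With these made-up inputs the gene's share of the library is unchanged, so the per-library-size and per-total-RNA fold changes are 1.00, whereas the 40% rise in RNA yield per mg of tissue gives a per-sample-size fold change of 1.40. The same reads can thus support "no change" or "increase" depending on the chosen denominator.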

Results

Initially, the three modes of transcript modeling identified three distinct sets of stably expressed genes, whose expression profiles differed depending on the mode. Specifically, genes that were stably expressed across samples in the per-library-size dataset showed training-associated increases in the per-total-RNA and per-sample-size datasets. Each gene set was then used to normalize the entire dataset, yielding transcript abundance estimates corresponding to each of the three biological viewpoints (i.e., per-library-size, per-total-RNA, and per-sample-size). The normalization modes led to different conclusions about training-associated changes in transcript expression. For 28% and 24% of the transcripts, training was associated with changes in expression in the per-total-RNA and per-sample-size scenarios, respectively, but not in the per-library-size scenario. At week 2, this led to opposite conclusions for 5% of the transcripts between the per-library-size and per-sample-size datasets (↑ vs. ↓, respectively).
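The two-step logic of this result, finding genes that are stable within one viewpoint and then scaling every sample by them, can be sketched as follows. The abstract does not name the stability measure or the scaling rule, so both are assumptions here: stability is taken as the lowest standard deviation of log2 expression across samples, and each sample is divided by the geometric mean of its reference genes, centred so the overall scale is preserved. `stable_genes` and `normalize_to_reference` are hypothetical helpers, and the toy matrix is invented.

```python
import numpy as np


def stable_genes(expr: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k genes with the lowest SD of log2 expression.

    expr is a genes x samples matrix of positive values expressed in one
    viewpoint (e.g. per library size); the SD-of-log criterion is an assumed
    stand-in for a study-specific stability measure.
    """
    sd = np.log2(expr).std(axis=1, ddof=1)
    return np.argsort(sd)[:k]


def normalize_to_reference(expr: np.ndarray, ref_idx: np.ndarray) -> np.ndarray:
    """Divide each sample by the geometric mean of its reference genes."""
    # Per-sample geometric mean of the reference genes, on the log2 scale.
    log_gm = np.log2(expr[ref_idx, :]).mean(axis=0)
    # Centre the factors so the average scale of the data is unchanged.
    log_factor = log_gm - log_gm.mean()
    return expr / 2.0 ** log_factor


if __name__ == "__main__":
    # Toy genes x samples matrix (invented values).
    expr = np.array([
        [100.0, 210.0, 390.0],  # gene 0: changes strongly across samples
        [10.0, 20.0, 40.0],     # gene 1: changes strongly across samples
        [50.0, 55.0, 45.0],     # gene 2: nearly constant
        [30.0, 31.0, 29.0],     # gene 3: nearly constant
    ])
    ref = stable_genes(expr, k=2)
    print("reference genes:", sorted(ref.tolist()))
    print(normalize_to_reference(expr, ref).round(1))
```

Because the input matrix must already be in one viewpoint's units, running the same selection on per-total-RNA or per-mg-tissue values can return a different reference set, and hence different normalized abundances.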

Conclusion

Scientists should be explicit about their choice of normalization strategy and should interpret the results of gene expression analyses with caution. This is particularly important for datasets involving a limited number of genes or growing or differentiating cellular models, where the risk of biased conclusions is pronounced.

Keywords: Bioinformatics, RNA-seq, skeletal muscle, normalization, resistance training

No competing interests reported.


Editorial decision (Major revision): 16 Dec, 2021
Reviews received at journal: 14 Dec, 2021
Reviews received at journal: 06 Dec, 2021
Reviewers agreed at journal: 29 Nov, 2021
Reviewers agreed at journal: 29 Nov, 2021
Reviewers invited by journal: 29 Nov, 2021
Editor assigned by journal: 12 Nov, 2021
Editor invited by journal: 05 Nov, 2021
Submission checks completed at journal: 05 Nov, 2021
First submitted to journal: 22 Oct, 2021


Status: Version 1
