Despite the utility and ubiquity of RNA-Seq, many of the confounding elements associated with the technology are still being characterized. In this work, we demonstrated how one such confounder–sample freeze-thaw–impacts sample quality and downstream analyses. We highlighted biases in publically available datasets, and observed an increased 3' bias when both freeze-thaw and poly(A)-extraction are features of a sample.
To gain a comprehensive understanding of this effect, we first simulated technical replication to measure the noise between technical replicates with different treatments. This allowed us to examine the impact of freeze-thaw cycles and the ability of RIN to capture those impacts. Next, we examined the impact of freeze-thaw cycles on the robustness and reproducibility of differential expression analysis. We found freeze-thaw cycles were substantially detrimental to the stability of gene expression analysis. By our estimates and at these subset sizes, reproducibility of a differential expression signature approaches zero after three freeze-thaw cycles (Supplementary Table 4). Finally, we demonstrated that poly(A)-enriched samples demonstrate substantial 3’ bias in read coverage with increased freeze-thaw cycles. Our results have implications with regards to technical variation due to sample handling, the sensitivity of differential gene expression analysis for frozen tissues and samples, and the utility of RIN.
Technical variation in RNA-Seq is substantial and can be attributed to a variety of factors, including read coverage, mRNA sampling fraction, library preparation batch, GC content, and sample handling38,39. As such, accounting for technical variation has been a major research area of focus for the past decade1,2,5,39,40. Degradation in combination with poly(A)-enrichment is a known source of variation in RNA-SEq. Yet, before technical variation can be accounted for, it must be characterized. While studies have looked into the effect of degradation on RNA-Seq, each mode of degradation impacts sample quality differently, and direct connections between freeze-thaw and sample quality has mainly been assessed via RIN16,41,42.
Our simulation of technical replicates helps delineate technical variation due to sample handling–specifically, freeze-thaw. Furthermore, the resulting noise provides an estimate for the number of random read counts associated with a gene. For example, given an average 25 million reads sequenced per sample, our approximate 4 percentage points difference in noise between one and two freeze-thaw cycles gives an expected stochasticity in 1 million of those reads; approximating the number of protein-coding genes in the human genome to be 20–25 thousand43, we can expect a difference of ~ 40–50 random counts per gene to exist between technical replicates due to a freeze-thaw cycle (Supplementary Methods, Fig. S15). Thus, each freeze-thaw introduces a non-negligible level of noise to the quantification of gene expression and differential expression of such genes.
To check for the possibility that there is a signature which can help correct for freeze-thaw distortion of RNA-Seq, we attempt to find a group of common DE genes across various DE methods. We find no such signature (Supplementary Results). This is expected, given that a major source of reduced sample quality due to freeze-thaw is mRNA degradation, which occurs randomly for each transcript and sample. A possible path forward is to correct for sample degradation. Several methods have been proposed for this. While some of these methods rely on RIN or similar metrics (e.g. mRIN, TIN, etc.)16,44, others have implemented statistical frameworks which account for gene-specific biases. DegNorm, for example, accounts for the gene-specific relative randomness in degradation in its correction approach17. Quality surrogate variable analysis (qSVA) specifically improves differential expression by identifying transcript features associated with RNA degradation for its correction28. Furthermore, there are recent methods which only assay the 3’ end of a transcript and therefore claim robustness in degraded samples45. While these approaches could help account for noise introduced by RNA degradation during repeated freeze-thaw cycles, they cannot necessarily remedy the associated loss of signal.
The effect of freeze-thaw and resultant degradation on RNA-Seq is particularly concerning when considering differential gene expression analysis. It has been observed that RNA degradation can induce the apparent differential expression in as many as 56% of genes44. To this end, we quantified this loss of DE reproducibility by measuring similarity and discordance in the context of sample quality. We found a decrease in reproducibility with both decreasing RIN and increasing freeze-thaw. Interestingly, for most reproducibility assessments, we observed a monotonic or near monotonic increase in disparity between low and high quality subsets with respect to subset size. Similarity demonstrated a larger average magnitude of disparity for freeze-thaw, whereas discordance demonstrated a larger average magnitude of disparity for RIN.
Based on our analysis, the utility of RIN in assessing quality when samples undergo freeze-thaw is questionable. The non-uniformity in mRNA degradation46–49 due to freeze-thaw sheds light on these challenges, since RIN cannot quantify quality at the individual gene level23. This is reflected in the fact that samples with RIN > 8 demonstrate degradation32. Furthermore, results assessing the effect of freeze-thaw cycles on RIN are inconclusive. While some studies claim RIN can be used to account for degradation effects in RNA-Seq16, others suggest it does not sufficiently capture the effects of degradation on sample quality26,28. When directly observing the effect of freeze-thaw on RIN, studies have found a negligible effect12 or can only detect an effect after numerous cycles27,50.
As such, we re-examined the utility of RIN as a measure of sample quality in relation to our noise estimation of random reads per sample23. We found that while noise increases with both decreasing RIN and increasing freeze-thaw, RIN may be an insufficient indicator of quality for samples that have undergone two or more freeze-thaws. Given these results, RIN may not always be a good metric to quantify the difference between technical replicates that have undergone variable sample handling17,26−28. We validate noise by confirming that it does not change with input RNA concentration, excepting outliers (Fig. S12). Therefore, in the future, noise could be a useful supplement to RIN when technical replicates are present.
The fact that our predicted decrease in similarity due to freeze-thaw does not change when incorporating RIN into our model further indicates that RIN alone cannot capture the changes in sample quality due to freeze-thaw. Despite this, RIN is a good indicator of sample quality, if not specifically for freeze-thaw. This is reflected in the fact that RIN validates our expectations for DE reproducibility analysis and the comparable range of noise, similarity, and discordance values between freeze-thaw and RIN assessments.
Finally, to confirm our expectation that freeze-thaw decreases sample quality18–22 and to further characterize the underlying mechanism, we validated the presence of a 3’ bias in coverage. This builds on our and others’ observations that a lower percentage of poly(A)-enriched transcripts are covered42. We compared coverage to ribosome depleted RNA-Seq data, which does not use 3’ hybridization to retain transcripts. We find that poly(A)-enrichment does in fact introduce a strong 3’ bias in coverage as compared to ribosome depletion. This bias is further exacerbated with additional freeze-thaw cycles in poly(A)-enriched but not ribosome depleted samples. This implies that degradation due to freeze-thaw does not impact RNA-sequencing of ribosome depleted samples to the extent that it does in poly(A)-enriched samples. In light of our demonstrations that 3' bias is associated with a substantial increase in noise and a decrease in DE reproducibility, these findings suggest that RNA-seq from samples that have both been poly(A)-extracted and undergone freeze-thaw cycles likely has unknown, diminished stability. While not all studies have technical replicates to estimate noise, the presence of exaggerated 3’ bias when poly(A)-extraction is combined with freeze-thaw can be a simple indicator of RNA-seq distortion.