Background: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Several sets of gene annotations are available for the human genome, and they are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear.
Results: Using “mappability”, a metric of the complexity of gene annotation, we compared three distinct human gene annotations (GENCODE, RefSeq, and NONCODE) and evaluated how mappability affected DE analysis. We found that mappability differed significantly among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically to the downstream steps of DE analysis.
Conclusions: We used mappability to assess how the complexity of gene annotations affects DE analysis. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that excluding unnecessary gene models from gene annotations improves the performance of DE analysis.

Figure 1

Figure 2

Figure 3

Figure 4
No competing interests reported.
Posted 08 Mar, 2021