Background: A common task in analyzing metatranscriptomics data is to identify microbial metabolic pathways whose RNA abundances differ across multiple sample groups. Using paired metagenomics data, current differential methods control for either DNA or taxa abundance to account for their strong correlation with RNA abundance. However, it remains unknown whether both factors need to be controlled for simultaneously.
Results: We found that when either DNA or taxa abundance is controlled for, RNA abundance still has a strong partial correlation with the other factor. In both simulation studies and a real data analysis, we demonstrated that controlling for both DNA and taxa abundances performs better than controlling for only one factor.
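The partial-correlation finding can be illustrated with a small sketch. It assumes log-transformed abundances for a single pathway and computes a Pearson partial correlation from ordinary least-squares residuals, a standard approach; the abstract does not state how the authors computed theirs. The helpers `residualize` and `partial_corr` are hypothetical, and the data are simulated for illustration only, not taken from the study.

```python
import numpy as np

def residualize(y, covariates):
    """Residuals of y after OLS regression on an intercept plus the covariates."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_corr(x, y, controls):
    """Pearson correlation of x and y after removing the linear effect of controls."""
    rx = residualize(x, controls)
    ry = residualize(y, controls)
    return float(np.corrcoef(rx, ry)[0, 1])

# Hypothetical simulated log abundances for one pathway across samples.
rng = np.random.default_rng(0)
n = 500
log_taxa = rng.normal(size=n)
log_dna = 0.6 * log_taxa + rng.normal(scale=0.8, size=n)   # DNA tracks taxa
log_rna = 0.7 * log_dna + 0.7 * log_taxa + rng.normal(scale=0.5, size=n)

# RNA vs. taxa after controlling for DNA, and RNA vs. DNA after controlling for taxa.
print(partial_corr(log_rna, log_taxa, [log_dna]))
print(partial_corr(log_rna, log_dna, [log_taxa]))
```

Because the simulated RNA depends on both DNA and taxa, each partial correlation stays large (about 0.75 in expectation under these settings) even after the other factor is removed.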
Conclusions: To fully address confounding in metatranscriptomics data analysis, both DNA and taxa abundances need to be controlled for in the differential analysis.
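A minimal sketch of what controlling for both factors can look like in a regression-based differential test follows. It assumes a Gaussian linear model on log abundances fitted by ordinary least squares, which is not necessarily the model the authors use. In this hypothetical simulation, the two groups differ in taxon abundance, but RNA has no group effect beyond what DNA and taxa explain. The helper `ols_coef` and all simulation settings are illustrative assumptions.

```python
import numpy as np

def ols_coef(y, covariates):
    """OLS coefficients of y on an intercept plus the given covariate columns."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical simulation: taxa abundance shifts between groups, but the
# true group effect on RNA (after accounting for DNA and taxa) is zero.
rng = np.random.default_rng(1)
n = 1000
group = np.repeat([0.0, 1.0], n // 2)
log_taxa = group + rng.normal(size=n)
log_dna = 0.6 * log_taxa + rng.normal(scale=0.8, size=n)
log_rna = 0.7 * log_dna + 0.7 * log_taxa + rng.normal(scale=0.5, size=n)

# Group coefficient (index 1) adjusting for DNA only vs. DNA and taxa together.
effect_dna_only = ols_coef(log_rna, [group, log_dna])[1]
effect_both = ols_coef(log_rna, [group, log_dna, log_taxa])[1]
print(f"DNA only:   {effect_dna_only:.3f}")
print(f"DNA + taxa: {effect_both:.3f}")
```

Under these settings, the DNA-only group coefficient is about 0.45 in expectation (0.7 times 0.64, from the simulated coefficients), a spurious differential signal, while the DNA + taxa coefficient is centered on zero.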