Differential abundance analysis is at the core of the statistical analysis of microbiome data. However, microbiome data present three prominent challenges: (i) the data are sparse, with most species absent from most samples; (ii) microbial abundances vary across widely different scales; and (iii) taxa are related to one another, reflecting the taxonomic structure. A statistical test for microbiome data should address all three challenges.
Here, we introduce miMic (Mann-Whitney iMage Microbiome), a straightforward yet remarkably versatile and scalable approach that effectively addresses compositional effects and inherent taxonomic relationships. miMic consists of three main steps: (1) data preprocessing and translation to a cladogram of means, (2) an a priori nested ANOVA to detect overall relations between the microbiome and the condition (label), and (3) a post hoc miMic test along the cladogram trajectories.
We propose a novel metric for the unbiased comparison of methods when the ground truth is unavailable, and show that miMic drastically decreases the false positive rate while preserving the total positive rate. Using an analytical test case, simulations, and real-world examples, we demonstrate the accuracy of miMic compared to existing methods.
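
To make the first and third steps more concrete, the following is a minimal sketch, not the miMic implementation. It rests on several assumptions that the text does not state: a log10(count + 1) transform, internal nodes computed as the mean over all descendant leaves, lineage strings separated by ";", binary labels, and a normal-approximation Mann-Whitney test applied to every node. The function names `cladogram_of_means`, `mann_whitney_p`, and `node_p_values` are hypothetical. The nested ANOVA, the walk along cladogram trajectories, and any multiple-testing correction are deliberately omitted.

```python
import math
from collections import defaultdict


def cladogram_of_means(samples, sep=";"):
    """Map each sample's leaf counts to mean log-abundances for every node.

    samples: list of dicts, each mapping a full lineage string to a raw count.
    Assumption: log10(count + 1) transform; a node's value is the mean over
    all leaves below it (the real method may aggregate differently).
    """
    result = []
    for sample in samples:
        groups = defaultdict(list)
        for lineage, count in sample.items():
            value = math.log10(count + 1.0)
            parts = lineage.split(sep)
            # Register the leaf value under every ancestor prefix and itself.
            for depth in range(1, len(parts) + 1):
                groups[sep.join(parts[:depth])].append(value)
        result.append({node: sum(v) / len(v) for node, v in groups.items()})
    return result


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected,
    no continuity correction). Returns 1.0 when the test is uninformative."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted(list(x) + list(y))
    n = n1 + n2
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # midrank of ranks i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values identical
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def node_p_values(node_values, labels):
    """Test every cladogram node between label 1 and label 0 samples.

    Assumption: every sample contains the same set of nodes.
    """
    p_values = {}
    for node in sorted(node_values[0]):
        cases = [s[node] for s, l in zip(node_values, labels) if l == 1]
        controls = [s[node] for s, l in zip(node_values, labels) if l == 0]
        p_values[node] = mann_whitney_p(cases, controls)
    return p_values
```

Calling `node_p_values(cladogram_of_means(samples), labels)` returns one uncorrected p-value per taxonomic node. For small groups an exact test, such as the one offered by `scipy.stats.mannwhitneyu`, would likely be preferable to the normal approximation used here.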