Background: Evidence linking the microbiome to human health is growing rapidly. The microbiome profile can serve as a novel predictive biomarker for many diseases. However, bacterial count tables are typically sparse, and bacteria are classified at a hierarchy of taxonomic levels, ranging from species to phylum. Existing analysis tools focus on identifying microbiome associations either at the community level or at a specific pre-defined taxonomic level. They fail to incorporate the evolutionary relationships between bacteria and cannot learn from the data how to aggregate microbiome contributions, which leads to less accurate and less interpretable results in prediction, classification, or selection.
Results: We present DeepBiome, a phylogeny-informed neural network architecture for predicting phenotypes from microbiome counts and uncovering the microbiome-phenotype association network. It takes microbiome abundance as input and lets the phylogenetic taxonomy guide the neural network architecture. Commonly used neural network architectures are targeted towards image and text analysis and typically require a huge amount of training data, which is scarce in biomedical applications. By leveraging the phylogenetic information, DeepBiome relieves the heavy burden of tuning for the optimal deep learning architecture, avoids overfitting, and, more importantly, enables visualization of the path from microbiome counts to disease. It is applicable to both regression and classification problems. The simulation study and real-life data analysis demonstrate that DeepBiome is highly accurate and efficient and provides a deep understanding of complex microbiome-phenotype associations, even with small to moderate training sample sizes.
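As an illustration of how a taxonomy can guide a network architecture, the sketch below restricts the connections of one dense layer so that each lower-level taxon feeds only the unit of its parent taxon. This is a minimal pure-Python sketch, not the DeepBiome implementation: the toy taxonomy `GENUS_TO_PHYLUM`, the helper names `build_mask` and `masked_dense`, and the use of a hard 0/1 connection mask are all assumptions made for illustration.

```python
# Hypothetical toy taxonomy (illustrative only): each genus maps to its parent phylum.
GENUS_TO_PHYLUM = {
    "Lactobacillus": "Firmicutes",
    "Streptococcus": "Firmicutes",
    "Bacteroides": "Bacteroidetes",
    "Prevotella": "Bacteroidetes",
}


def build_mask(child_to_parent):
    """Return (children, parents, mask); mask[i][j] is 1.0 if child i belongs to parent j."""
    children = list(child_to_parent)                  # input units, in insertion order
    parents = sorted(set(child_to_parent.values()))   # output units, sorted by name
    mask = [
        [1.0 if child_to_parent[child] == parent else 0.0 for parent in parents]
        for child in children
    ]
    return children, parents, mask


def masked_dense(x, weights, mask, bias):
    """One taxonomy-guided layer: ReLU(x @ (weights * mask) + bias)."""
    outputs = []
    for j in range(len(bias)):
        total = bias[j]
        for i, x_i in enumerate(x):
            # Connections outside the taxonomy are zeroed out by the mask.
            total += x_i * weights[i][j] * mask[i][j]
        outputs.append(max(0.0, total))
    return outputs


if __name__ == "__main__":
    children, parents, mask = build_mask(GENUS_TO_PHYLUM)
    weights = [[1.0 for _ in parents] for _ in children]  # placeholder weights, not trained
    bias = [0.0 for _ in parents]
    abundance = [0.4, 0.1, 0.3, 0.2]                      # made-up relative abundances
    for parent, value in zip(parents, masked_dense(abundance, weights, mask, bias)):
        print(parent, value)
```

With a mask of this kind, each hidden unit corresponds to a named taxon, so a learned weight can be read as the contribution of a child taxon to its parent. Stacking one such layer per taxonomic level (for example genus, family, order, class, phylum) would give a network whose depth and connectivity follow the taxonomy rather than being tuned by hand.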
Conclusions: In practice, the taxonomic level at which microbiome clusters tag the association is unknown. Therefore, the central advantage of the presented method over other analytical methods is that it offers an ecological and evolutionary understanding of host-microbe interactions, which is important for microbiome-based medicine. DeepBiome is implemented using the Python packages Keras and TensorFlow. It is an open-source tool available at https://github.com/Young-won/DeepBiome.