DeepBiome: a phylogenetic tree informed deep neural network for microbiome data analysis
Background: Evidence linking the microbiome and human health is growing rapidly. The microbiome profile can serve as a novel predictive biomarker for many diseases. However, bacterial count tables are typically sparse, and bacteria are classified at a hierarchy of taxonomic levels, ranging from species to phylum. Existing analysis tools focus on identifying microbiome associations either at the community level or at a specific pre-defined taxonomic level. They do not incorporate the evolutionary relationships between bacteria and cannot learn from the data how to aggregate microbiome contributions, which leads to less accurate and less interpretable results in prediction, classification, or selection.
Results: We present DeepBiome, a phylogeny-informed neural network architecture for predicting phenotypes from microbiome counts and uncovering the microbiome-phenotype association network. It takes microbiome abundance as the input and lets the phylogenetic taxonomy guide the neural network architecture. Commonly used neural network architectures are targeted towards image and text analysis and typically require large amounts of training data, which are scarce in biomedical applications. By leveraging the phylogenetic information, DeepBiome relieves the heavy burden of tuning for the optimal deep learning architecture, avoids overfitting, and, more importantly, enables visualization of the path from microbiome counts to disease. It is applicable to both regression and classification problems. The simulation study and real-life data analysis demonstrate that DeepBiome is highly accurate and efficient and provides a deep understanding of complex microbiome-phenotype associations, even with small to moderate training sample sizes.
Conclusions: In practice, it is unknown at which taxonomic level microbiome clusters tag the association. Therefore, the central advantage of the presented method over other analytical methods is that it offers an ecological and evolutionary understanding of host-microbe interactions, which is important for microbiome-based medicine. DeepBiome is implemented using the Python packages Keras and TensorFlow. It is an open-source tool available at https://github.com/Young-won/DeepBiome.
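To make the idea of letting the phylogenetic taxonomy guide the network architecture more concrete, the following is a minimal sketch and not the DeepBiome implementation (which is built on Keras and TensorFlow). It assumes a small hypothetical taxonomy, builds a 0/1 connectivity mask between adjacent taxonomic levels, and applies one masked layer per level so that a node only receives input from its own descendants. The taxon tables, the function names `build_mask` and `masked_layer`, the all-ones weights, and the ReLU activation are all illustrative assumptions.

```python
# Illustrative sketch only: toy taxonomy and helper names are assumptions,
# not part of the DeepBiome package.

# Hypothetical toy taxonomy: genus -> family and family -> phylum.
genus_to_family = {
    "Lactobacillus": "Lactobacillaceae",
    "Streptococcus": "Streptococcaceae",
    "Lactococcus": "Streptococcaceae",
    "Bacteroides": "Bacteroidaceae",
    "Prevotella": "Prevotellaceae",
}
family_to_phylum = {
    "Lactobacillaceae": "Firmicutes",
    "Streptococcaceae": "Firmicutes",
    "Bacteroidaceae": "Bacteroidetes",
    "Prevotellaceae": "Bacteroidetes",
}


def build_mask(child_to_parent):
    """Return (children, parents, mask); mask[i][j] is 1.0 only if
    child i belongs to parent j in the taxonomy."""
    children = sorted(child_to_parent)
    parents = sorted(set(child_to_parent.values()))
    mask = [
        [1.0 if child_to_parent[c] == p else 0.0 for p in parents]
        for c in children
    ]
    return children, parents, mask


def masked_layer(x, weights, mask):
    """One tree-guided layer: connections outside the taxonomy are zeroed
    by the mask, then a ReLU is applied to each parent node."""
    n_in, n_out = len(mask), len(mask[0])
    out = []
    for j in range(n_out):
        total = sum(x[i] * weights[i][j] * mask[i][j] for i in range(n_in))
        out.append(max(0.0, total))
    return out


genera, families, genus_mask = build_mask(genus_to_family)
_, phyla, family_mask = build_mask(family_to_phylum)

# All-ones weights (an assumption) so each layer simply sums its descendants.
w1 = [[1.0] * len(families) for _ in genera]
w2 = [[1.0] * len(phyla) for _ in families]

# Toy abundances, listed in the sorted order of `genera`.
abundance = [1.0, 2.0, 3.0, 4.0, 5.0]

family_activation = masked_layer(abundance, w1, genus_mask)
phylum_activation = masked_layer(family_activation, w2, family_mask)

print(dict(zip(families, family_activation)))
print(dict(zip(phyla, phylum_activation)))
```

In a trainable version of this sketch, the weights would be learned while the mask stays fixed, so the number of free parameters is bounded by the number of edges in the taxonomy rather than by the product of adjacent layer widths. This is one plausible way to read the claim that the phylogeny reduces architecture tuning and overfitting.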
Posted 10 Jun, 2020