Background: Bacteria and microbial eukaryotes occupy a wide range of ecological niches and are essential for the functioning of ecosystems. The advent of next-generation sequencing methods has enabled the study of environmental microbial community composition. Yet, many questions regarding the stability and functioning of environmental microbiomes remain open.
Results: In the current study, we present a methodological framework to quantify the information shared between the microbial community of a habitat and the abiotic parameters of this habitat. It is built on theoretical considerations of systems ecology and makes use of state-of-the-art machine learning techniques. It can also be used to identify bioindicators. We apply the framework to a dataset containing operational taxonomic units (OTUs) as well as more than twenty physico-chemical and geographic parameters measured in a large-scale survey of European lakes. While a large part of the variation (up to 61\%) in many physico-chemical parameters can be explained by microbial community composition, some of the examined parameters share only little information with the microbiome. Moreover, we identified OTUs that act as `multi-task' bioindicators and are potential candidates for lake water monitoring schemes.
Conclusions: This study demonstrates the benefits of machine learning approaches in microbial ecology. Our results represent, for the first time, a quantification of the information shared between the lake microbiome and a wide array of ecosystem parameters. Building on the results and methodology presented here, it will be possible to identify microbial taxa and processes central to the functioning and stability of lake ecosystems.