Deep learning is increasingly being used for predicting features and solving complex problems. Unlike traditional algorithms, in which expertise and rules are explicitly coded, deep learning algorithms are built to detect patterns in the data automatically [1, 2], and they embed the computation of variables into the models themselves to yield end-to-end models [3]. In particular, the construction and training of deep learning algorithms have been enabled by the increasing availability of big data and by the rapid growth in the number and size of publicly available databases. So far, deep neural networks have been instrumental in advancing modern artificial intelligence, with applications such as facial recognition, speech recognition, and self-driving vehicles. More recently, new applications in the fields of molecular biology and metagenomics have been pioneered. Indeed, the same deep learning approaches are beginning to be applied to genetics, agriculture, and medicine [4–10]. However, deep learning remains largely unexplored in the field of microbial metagenomics, with only a few approaches suitable for microbiome data [11–13], leaving a huge potential untapped.
The human microbiome, i.e. the sum of the different microbial ecosystems that colonize the niches of the human body, plays an important role in human physiology, and its dysbiotic variations can impact our health [14]. Shifts in the composition of the microbial communities inhabiting the oral cavity and the gastrointestinal tract have been associated with the onset and/or progression of several conditions, such as periodontitis [15], and of a series of modern chronic disorders, including inflammatory bowel disease [16], obesity [17], cardiovascular disease [18] and some forms of cancer [19–21]. The importance of the human microbiome in health and disease makes it imperative to understand the drivers of its variation. In this context, a new frontier is represented by the meta-community theory, according to which the symbiont human microbial ecosystems are intimately connected, showing reciprocal influences and exchanges [22, 23]. Supporting a meta-community vision of human microbial ecology, a close link between the oral and intestinal microbiomes has recently been hypothesized, with the former reflecting changes in the latter, in both healthy and diseased individuals [24–27].
Another scale of human microbiome variation is represented by its change across the evolutionary timeline. In particular, a large body of literature indicates that the current human gut microbiome has evolved towards at least two different configurations, rural and urban, each associated with the corresponding subsistence strategy. Compared with the former, generally considered the pristine human gut microbiome, the urban configuration is characterized by an overall compression of microbial biodiversity, a wholesale loss of commensal microbial groups, and an increased presence of genes related to antibiotic resistance and xenobiotic metabolism [28–33]. However, mainly because of the paucity of ancient stool samples, the ancestral human gut microbiome is still unknown, and the evolutionary trajectories and drivers leading to its contemporary configurations are still to be described, leaving important gaps in our knowledge of the co-evolutionary trajectories of the gut microbiome and its human host. In contrast to ancient fecal samples, dental samples are more common and well preserved, allowing the ancient oral microbiome to be reconstructed from the ancient DNA conserved in dental tartar. Consistent with the meta-community vision, the ancient oral microbiome configuration may mirror structural features of the gut microbiome because of the inherent connections between the two ecosystems. In this scenario, we developed a new deep learning-based tool, G2S, which infers the gut microbiome configuration of a given individual from that individual's oral microbiome data. G2S is based on a model trained and tested on 171 paired samples of gingival and stool microbiome retrieved from the Human Microbiome Project (HMP) [34]. Our approach can be relevant for predicting the eubiotic/dysbiotic state of the gut microbiome when fecal data are not available, and it is particularly suitable for human archaeological records, where coprolites and fecal sediments are rare compared to dental calculi and other human remains.
G2S is available at https://github.com/simonerampelli/g2s.