Comparison of metagenomics and metatranscriptomics/metaproteomics data sets for one-condition sample (Multi-meta-omic approach)
Assessing bacterial composition from metagenomics and metatranscriptomics data
The analysis of microbial community samples often raises the question of which bacteria form a given population. To answer this question, we performed two different types of analysis using gNOMO. First, we processed and analyzed metagenomics data to investigate the taxonomic composition of a given sample. Second, we analyzed and compared samples of two different conditions: 10d and 20d.
For the first analysis, the output was visualized using a Krona plot that is produced for each metagenomics and metatranscriptomics sample automatically within the gNOMO pipeline. For the first-condition (10d) sample, we observed that the main phyla present in this population were Bacteroidetes, Firmicutes, and Proteobacteria (Figure 2).. After analyzing the taxonomic distribution differences between the 10d and 20d samples, we observed no significant abundance differences in a preliminary analysis (Additional file 1: Tables S3 and S4).. In this analysis, the relative abundance of the main phyla and families was calculated in relation to the mean abundance of the two conditions. We observed that the four most abundant phyla distributions match our previous published studies based on 16S gene sequencing, while others (e.g., Planctomycetes, Deferribacteres and Actinobacteria) do not match exactly previous studies on this topic  (Additional file 1: Table S3).. We made similar observations regarding taxonomic abundances at the family level (Additional file 1: Table S4).. In general, this can be explained by the difference concerning the method and annotation between 16S rRNA gene sequencing analysis and metagenomics. 16S rRNA gene sequencing focuses on bacterial data and can be useful in environmental studies due to the lack of fully sequenced bacterial genomes in these kinds of scenarios. In contrast, metagenomics offers higher resolution, enabling a more specific taxonomic classification of sequences as well as the detection of new bacterial genes and genomes .
Figure 2: KronaPlot of the taxonomic annotation of a metagenomics sample (condition 10d). Bacterial taxa distribution of metagenomics data, corresponding to condition 10d. The bacterial taxa are classified by taxonomic hierarchy levels, from higher levels in the center of the chart (Kingdom Bacteria) progressing outward until genus level.
As described previously, our first analysis provided no clearly visible abundance differences between the two conditions, as we were expecting when studying such a stable situation (both are adult individuals differing in 10 days of development). However, we decided to validate this finding by a more sensitive statistical approach. To investigate this issue further, we used LEfSe  as a well-established statistical method for comparing the taxonomic distribution at genus level between 10d and 20d conditions. LEfSe has the advantage of recognizing the hierarchy of the taxonomic classification and accurately calculate statistically significant differences (represented as LDA scores) between different conditions.
Using LEfSe, we found, for example, that Fusobacterium (Fusobacteriaceae family), was more abundant at 10 days (LDA score > 3) in both metagenomics and metatranscriptomics data (Figure 3).. Fusobacterium has been related to disease and stress situations in the human gut microbiota , but is has also been related to the infants gut microbiota . Conversely, an unidentified genus belonging to the family Ruminococcaceae, has been found more abundant in 20d than 10d condition (LDA score > 3) in metagenomics data (Figure 3a),, but no differences between conditions have been found in metatranscriptomics data (Figure 3b).. Various genera belonging to the family Ruminococcaceae have been related to a healthy gut microbiota, like Ruminococcus and Faecalibacterium. These have been linked to degradation of starch in the human colon making it available for other bacteria in the gut , and degradation of cellulose in herbivorous mammals . These differences between 10d and 20d conditions could suggest that, even if the population is very stable along adult stages, it is being rearranged to its final composition. This rearrangement would imply a reduction in Fusobacterium and an increase of Ruminococcaceae along time (10d against 20d, Figure 3a).. On the other hand, Pseudomonas genus and an unclassified genus belonging to the family Pelagibacteraceae are more abundant only in metatranscriptomics analysis at 20d against 10d (Figure 3b).. Pelagibacteraceae has been described as a bacterial family localized in marine and freshwater environments , but has also been detected in the mouse gut microbiome  Pseudomonas genus has been related to pathogenicity in animals and plants, and is a commonly detected taxa in the gut of cockroaches . These results suggest that these taxa increase their transcriptional activity but not their abundance in the population along time. By the same reason the unidentified genus of Ruminococcaceae reduce its transcriptional activity (is overrepresented at metagenomics level but not at metatranscriptomics level in 20d sample). More importantly for the present work is the integration of this level of comparison that allows detection of particular taxa that differ significantly in their abundance in different conditions.
Figure 3: LEfSe graph of taxonomic annotation of metagenomics (top) and metatranscriptomics (bottom) data comparing the two conditions: 10d and 20d. Taxa with significant different distribution among the two conditions are identified. Only taxa with LDA scores over 2 are shown. Positive LDA scores are assigned to the taxa overrepresented in the condition 20d (green), and negative LDA scores to the taxa overrepresented in the condition 10d (red). metagenomics data (Fig 3a) and metatranscriptomics data (Fig 3b) are represented.
Functional analysis from integrated metagenomics and metatranscriptomics data for one-condition sample
Once the bacterial population has been described at the taxonomic level, the next step concerns the functional analysis of each microbiome dataset and the qualitative and quantitative differences of assigned functional annotations. To assess the level of transcriptional activity of the population, we compare the metagenomics data (gene pool) and the metatranscriptomics data (transcripts) corresponding to the microbiota of the 10d condition. Integrating metagenomics and metatranscriptomics allows calculating transcript/gene ratios that indicate gene transcriptional activation or repression. For this purpose, we applied LEfSe based on the functional role (or subrole) assignment using TIGRFAM (Figure 4; Additional file 1: Table S5).. We observed that energy metabolism (both anaerobic and aerobic metabolisms) and protein production are the most active metabolic pathways (Figure 4),, which indicates that the bacterial population is active.
Figure 4: LEfSe graph comparing metagenomics and metatranscriptomics data of TIGRFAM annotation (role and subrole levels) of condition 10d. Taxa with significant different distribution among metagenomics and metatranscriptomics data are identified. Only taxa with LDA scores above 2 are shown. Positive LDA scores are assigned to the functional categories overrepresented in the metatranscriptomics data (RNA, green), and negative LDA scores to the functional categories overrepresented in metagenomics data (DNA, red).
Alternatively, a pathway analysis enables to discover differences between states by using the Pathview R package. An analysis with Pathview shows which specific metabolic pathways (KEGG pathways) have statistically significant correlations between sample types and/or conditions and thereby complements the information provided by LEfSe. In a Pathview graph, an increase of the gene activity involved in a certain pathway can be observed. Our exemplary analysis using Pathview here focuses on the tricarboxylic acid cycleTCA cycle) of the gut microbiota, comparing again gene pool (metagenomics data) against transcripts (metatranscriptomics data) (Figure 5).. The TCA cycle consists of a series of oxidative reactions to finally obtain energy (ATP) from oxidative degradation of the acetyl group, in the form of acetyl-CoA, to carbon dioxide. The full cycle can be performed by bacteria in aerobic conditions, but some autotrophic bacteria are also able to perform the reverse TCA cycle (rTCA), and even some anaerobic bacteria are able to carry out an incomplete TCA cycle, defining the pan-metabolic capabilities for this pathway of the gut microbiota.
We have found that the majority of the enzymes that take part in the TCA cycle are overrepresented at the transcript level. This confirms our previous observations related to energy metabolism (Figure 4).. With both analysis methods and their visualizations, we were able to study different levels of complexity of the pan-metabolism of all bacterial populations. We observed that the microbiome actively produces energy and proteins to grow and maintain a very complex population. Beyond the use case shown above, depending on the particular study, other pathways could be analyzed.
Figure 5: KEGG Pathview graph of the TCA cycle metabolism route comparing metagenomics vs. metatranscriptomics data of the microbiota of 10d and 20d conditions. Some nodes are split between two colors, indicating 10d (left) and 20d (right) conditions. Green (–1) depicts genes underrepresented in metagenomics (but overrepresented in metatranscriptomics), while those marked in red (1) depicts overrepresented genes in metagenomics (but underrepresented in metatranscriptomics). In grey, values close to 0 in the ratio metatranscriptomics/metagenomics, indicating no differences in frequency.
Meta-omics integration: comparing metagenomics, metatranscriptomics, and metaproteomics data at the functional pathway level
Each meta-omics level data provides unique information in various ways, but their integration is crucial to gain a complete overview of the metabolic capabilities of the studied bacterial populations. metaproteomics data incorporation to the integrated analysis of microbiomes is essential to have a realistic overview of the functional capabilities of the bacterial populations. For this purpose, we analysed these meta-omics data together, as an example, focussing on the N metabolism pathway, corresponding to the N cycle, the set of reactions by which different inorganic N compounds are transformed into ammonia, a biologically reduced form of N that can be mainly introduced into synthesis of amino acids (glutamine and glutamate). We were interested in this pathway due to previous findings related to N metabolism of the host (B. germanica) and the endosymbiont Blattabacterium. As explained previously, Blattabacterium participates in the N recycling from stored urates to ammonia that can be used to synthesize glutamine and glutamate, connecting with the amino acid biosynthesis pathway . Here, the aim was to study N metabolism in the host gut microbiome and then to assess if the bacterial population has the metabolic capability to produce a form of usable N.
In this analysis, we investigated how variable or stable the overall N metabolism is at the gene, transcript and protein level along time (10d against 20d) in the investigated pathway (Figure 6).. While metagenomics and metatranscriptomics show almost complete coverage of the N metabolism pathways and very variable along time, only a few enzymes were observed in the metaproteomics data and very stable along time. These results suggest that while the gene pool (the population) can be variable, the final transcripts and at least the four detected proteins remain stable, which could point in the direction of a functional redundancy at the protein level, as has been previously described for human gut microbiota . However, deeper coverage of the metaproteomics data would be necessary to confirm these findings.
Figure 6: KEGG Pathview graph of the N metabolism route comparing metagenomics/metatranscriptomics/metaproteomics data of the microbiome at 10d and 20d. Some nodes are split between different colors, indicating metagenomics (left), metatranscriptomics (middle) and metaproteomics (right) data. Green (–1) depicts genes/transcripts/proteins overrepresented in 10d (but underrepresented in 20d), while those marked in red (1) depicts genes/transcripts/proteins overrepresented in 20d (but underrepresented in 10d). In grey, values close to 0 in the ratio 10d/20d, indicating no differences in frequency.
Comparison of host and microbiome data
Microbiota metabolism and functions are better understood when studied together with its host. gNOMO includes the analysis of the host data in parallel with its microbiome, so we can integrate and compare the metabolic pathways of host and microbiome. In the case of B. germanica, we have studied the N metabolism pathway that we had analyzed before with the focus on the microbiota data (Figure 6) integrating the host data (Figure 7).. We have observed which enzymes can be found in the bacterial population data and which ones can be explained by the host data (Figure 6 and 7)..
We expected to find a maximum of four enzymes in the host data, as in most eukaryotes only four enzymes of this pathway are present, and we could detect those in the host pathway. While these four enzymes were the only ones detected in the host, its gut microbiome possesses most of the enzymes present in the N metabolism pathway.
If we study these four enzymes present in the host data in detail, we can observe that all of them are overrepresented at 10d against 20d condition in metaproteomics data, and in metagenomics and metatranscriptomics data, they are almost undetectable (Figure 7).. When looking at the microbiome metatranscriptomics data, these proteins have a stable abundance over the whole time (Figure 6).. These findings could indicate that the production of these proteins in the hindgut of the host is reduced along time, but its production by the microbiome remains stable.
After analyzing the bacterial and the host capabilities together regarding this metabolic pathway, we find that the N metabolism corresponding to the N cycle is mostly performed by the microbiome.
Figure 7: KEGG Pathview graph of the N metabolism pathways comparing metagenomics/metatranscriptomics/metaproteomics data of the host between 10d and 20d conditions. Some nodes are split between different colors, indicating metagenomics (left), metatranscriptomics (middle) and metaproteomics (right) data. Green (–1) depicts genes/transcripts/proteins overrepresented in 10d (but underrepresented in 20d), while those marked in red (1) depicts genes/transcripts/proteins overrepresented in 20d (but underrepresented in 10d). In grey, values close to 0 in the ratio 10d/20d, indicating no differences in frequency.