Compositional structure of bacterial microbiomes in higher termite gut reflects the diet and lineage of the host
According to the recent reports, the host diet appears to be the major determinant of the bacterial community structure in higher termite guts  and the dietary changes in the feeding routine affect the composition of gut microbiota . Still, the importance of the host signal and previous indications of vertical inheritance  should not be neglected. To characterise the diversity of microbial communities associated with the termite gut, we analysed 41 gut samples collected from workers of 15 different termite genera with distinctive feeding habits. Thus, we extended the currently existing knowledge to gut bacterial communities from several previously understudied higher termite species from Syntermitinae, Apicotermitinae, Termitidae and Nasutitermitinae subfamilies. The high-throughput sequencing of bacterial 16S rRNA gene amplicons resulted in 4,086,163 reads, further rarefied to 10,000 reads per library, and assigned to 8,069 bacterial OTUs (defined at 97 % sequence similarity, Additional file 2: Table S2). The calculated rarefaction curves inferred from species richness reached a plateau, except for a few more diverse samples from soil-feeding termites (Additional file 1: Figure S3). As inferred from Boneh estimate, increased sequencing depth would have allowed describing on average 165 ± 79 additional bacterial OTUs.
To simplify the comparative analyses, the studied gut microbiomes were classified into two broad groups, based on their diets: (a) the “plant fibre feeders”, relying on sound lignocellulose sources including wood, grass and microepiphytes, and (b) the “soil feeders”, relaying on more humified lignocellulose such as soil, humus and litter (Table 1). Following the analysis of dissimilarity in community structure (Bray-Curtis) and membership (Jaccard) at the OTU level, termite gut microbiomes clustered (according to known host dietary preferences (Fig. 1A, Additional file 1: Figures S4-S5). ANOSIM R was equal to 0.91 and 0.98 (p < 0.001) for Bray-Curtis and Jaccard, respectively. Similar clustering pattern was also obtained using the weighted and unweighted UniFrac analyses (Additional file 1: Figure S6), which in addition take into account phylogenetic distances between observed organisms (OTUs) . In agreement with a previous report , the gut microbiomes of the plant fibre-feeding termites were characterised with an average microbial richness and diversity indices three to five times lower (richness 347 ± 61 and diversity 19 ± 6) than those of the soil-feeding termites (richness 1212 ± 326 and diversity 102 ± 83, Fig. 1B). For plant fibre feeders, roughly 31.3 ± 11.8 of top abundant OTUs represented 80 % of reads abundance in a sample, while the same was represented by 247.7 ± 134.4 OTUs for soil feeders. Following the taxonomic annotation of the resulting bacterial OTUs, 26 bacterial phyla were identified (Additional file 2: Table S2). The patterns of bacterial community compositions were consistent with those reported previously for hosts with similar feeding strategies . Unlike for the plant fibre feeders, the taxonomic profiles of the soil-feeding higher termite gut microbiomes were more heterogeneous between the termite species, most probably following the humification (decomposition) gradient of the diet. On average, the plant fibre-feeding termite cluster was dominated by Spirochaetae (63.9 % ± 9.1 relative community abundance), Fibrobacteres (16.6 % ± 7.7) and candidate phylum TG3 (10.0 % ± 5.2, Fig. 1A). By contrast, Spirochaetae was much less abundant in soil feeders (38.4 % ± 18.4), followed by Firmicutes (34.8 % ± 22.2) and Bacteroidetes (7.6 % ± 3.9 reads).
Our results also demonstrated that occurrence and abundance of specific OTUs assigned to the same bacterial phylum differed strongly depending on the termite lineage, regardless of the feeding habit of the termite host (Fig. 1A). For example, at the termite genus level, we could find that highly abundant OTU_5 and OTU_27 assigned to the Fibrobacteres phylum were enriched in Nasutitermes-specific subcluster, whereas OTU_82 and OTU_771 (also Fibrobacteres) were characteristic to Microcerotermes sp. Within soil feeders, the examples included preferential association of OTU_428 (Spirochaetes) with Embiratermes sp., and OTU_20 and OTU_33 being highly abundant mostly in Silvestritermes sp. Interestingly, none of the OTUs was shared among all of the 41 studied samples. The PERMANOVA analysis (p < 0.001) on the Bray-Curtis distance matrices for bacterial community profiles confirmed that the host feeding regime and the host taxonomy were two important factors shaping the gut bacterial community. Our observations thus extended previous reports to a broader number of termite species [38, 41]. It is important to note that within the same or similar feeding habits, clear compositional divergence was observed for the studied microbiomes at higher taxonomic resolution, where the occurrence and abundance of certain OTUs assigned to the same phyla were strongly dependent on the termite taxonomy. It is in line with recent studies [7, 8], which already postulated the presence of bacterial lineages specific to particular termite groups and suggested the mixed-mode transmission mechanisms (colony-to-offspring and colony-to-colony) of gut bacteria between termites.
Transcripts annotation to broad functional categories reveals shared metabolic signatures between plant fibre- and soil-feeding termite gut symbionts
Following the community structure analysis (Fig. 1), selected samples representative of 11 different termite species were further studied by sequencing of the enriched prokaryotic mRNA. De novo metatranscriptomic approach was chosen over the information content of the community metagenome in order to characterise more specifically the community effort to break down the different lignocellulosic fractions. For two termite colonies (E.neo_1 and S.hey_1) biological replicates were analysed to ensure the repeatability and thus the validity of the generated sequencing results (Additional file 1: Figure S2). In our study, the sequencing effort resulted in over 730 million raw reads that were reduced to nearly 500 million reads after quality trimming and rRNA removal. Co-assembly of all generated metatranscriptomes resulted in 1,959,528 contigs, which were further taxonomically and functionally annotated using public databases. Additional details related to the metatranscriptomic analysis are summarised in Additional file 2: Table S1. Archaeal sequences were not very prevalent and they accounted for less than 2 % of the metatranscriptomic abundance, which somehow remains in agreement with another study on microbial metatranscriptomes in termite gut . However, in contrast to the same study, little taxonomic consistency (including at the phylum level) was found between the bacterial community structure (Fig. 1) and the taxonomic distribution of assigned prokaryotic gene transcripts (Fig. 2), even though the initial database dependent taxonomy was further improved by comparing transcripts to MAGs reconstructed from termite gut microbiomes . In particular, large under-representation of Fibrobacteres and over-representation of Firmicutes were observed in the termite gut prokaryotic metatranscriptomes, especially within the plant fibre-feeding termite cluster. The possible taxonomic misclassification of certain gene transcripts might stem from different factors, including under-representation of bacterial sequences (Fibrobacteres in particular) of termite origin in public databases , and extensive horizontal gene transfer occurring in bacteria . For this reason, the phylogenetic distribution of the metabolic functions and pathways will not be broadly discussed in this study, especially in the case of the plant fibre-feeding termites.
In the studied prokaryotic metatranscriptomes, on average 64.2 % ± 2.7 of gene transcripts were assigned to 4910 KEGG Ontology categories (KO), accounting for an average of 62.4 % ± 2.74 reads abundance per sample. Out of the annotated KOs, 2686 were further assigned a metabolic function (following annotation to KEGG BRITE database), and based on the calculated rarefaction curves we could assume that a significant part of the higher termite gut metabolic potential was uncovered in our study (Additional file 1: Figure S7). Clustering microbial communities based on their metatranscriptomic profiles revealed the presence of two main clusters (Fig. 2C), nearly exactly resembling the separation of samples based on the community structure analysis (Fig. 1), which pointed towards putative differences in termite-specific activities of gut microbes. Slightly higher number of KO categories assigned for soil (3137.3 ± 438.5) versus plant fibre-feeding termite clusters (2663.2 ± 243.4) might relate to the host-specific metabolic needs (e.g. broader range of food sources) and reflects the overall higher species diversity of soil feeders gut microbiomes (Fig. 1B). In total, 70 % of shared KOs accounted for 99.5 % ± 0.2 of all function-assigned gene transcripts. Consequently, the 1168 KOs exclusively assigned to soil feeders cluster represented roughly 0.4 % ± 0.1 of their metatranscriptomic abundance. For plant fibre feeders, the 328 specific KOs were even less abundant (0.2 % ± 0.1).
We further compared the expression patterns of functionally assigned genes that were shared by the two clusters, and we found a surprisingly high congruency between soil and plant fibre feeders gut metatranscriptomes. We observed a significant correlation between the average cumulative expression of transcripts assigned to the same KO category for soil- and plant fibre-feeding termites (Fig. 2A), including the KOs assigned a metabolic function and in particular for multiple glycosylases that were detected (Fig. 2B). The latter represented some of the most highly expressed KOs, confirming the specialization of the prokaryotic community towards carbohydrates metabolism. Further metabolic pathways re-construction mirrored the above observations of functional congruency (Additional file 1: Figure S8). It also showed that cell motility (16.5 % ± 0.7 and 12.8 % ± 0.7 of metatranscriptomic abundance in soil- and plant fibre feeders clusters, respectively) together with carbohydrate metabolism (13.9 % ± 2.0 and 18.7 % ± 1.4 of metatranscriptomic abundance) were the two most highly expressed categories of metabolic pathways in both clusters (Fig. 2E). Finally, the relative abundance of different metabolic modules assigned to the carbohydrate metabolism category (Fig. 2F) was also similar among all the samples, regardless of the different nature and composition of plant fibre diet versus more decayed and nutrient rich soil, humus and litter.
Interestingly, K02406 (fliC gene) encoding for flagellin was by far the most expressed KO in both termite clusters. In total, transcripts involved in bacterial chemotaxis (ko02030) and flagellar assembly (ko02040) pathways accounted for 16.8 % ± 4.5 and 10.4 % ± 2.5 of all prokaryotic transcripts in plant fibre- and soil-feeding termite gut metatranscriptomes, respectively. According to the current taxonomic assignment, both Spirochaetes (56.3 % ± 8.5 of transcripts assigned to this gene category) and Firmicutes (32.2 % ± 5.9) were characterised with increased motility in soil feeders, while in plant fibre feeders mainly Spirochaetes accounted for 81.2 % ± 12.8 of bacteria actively swimming in the termite gut. The high expression of genes relevant to motility in these phyla could favour their abundance in the highly viscous environment of the termite gut. The tendency in our study remains consistent with  in which the wood-feeding Nasutitermes corniger gut microbiome was characterised with higher abundance of cell motility and chemotaxis assigned gene transcripts in comparison to the dung-feeding Amitermes wheeleri. In general, overrepresentation of transcripts relevant to cell motility in the termite gut versus other biomass-degrading microbiomes could be regarded as advantageous, enabling these prokaryotes e.g. to actively reach their preferred substrates or to locate themselves in most favourable physicochemical gradients present within gut niches of their residence . For comparison, in a previously characterised rumen metatranscriptome, genes involved in flagella assembly and chemotaxis were only poorly represented . According to another study , genes related to cell motility and chemotaxis often co-cluster with CAZymes in bacterial genomes and show similar tendency of their expression profiles. That is why, next to the diverse CAZymes repertoire (discussed below), high bacterial cell mobility might to some extend contribute to the incredible success of the higher termite gut system in efficient biomass decomposition, often exceeding that of other lignocellulose utilising environments .
Dietary imprints highlight subtle differences between the plant fibre- and soil-feeding termite gut symbiotic communities
Even though there was a strong conservation between the plant fibre- and soil-feeding termite clusters at different functional gene levels, to further investigate any possible cluster-specific functionalities, we used the linear discriminant analysis (LDA) size effect (LEfSe)  to determine if any metabolic trait could be putatively enriched in soil- versus plant fibre feeders metatranscriptomes. General observations were similar to the previously published report related to the metagenomic and metatranscriptomic analysis of hindgut microbiota of wood- and dung-feeding higher termites , therefore they will be only briefly discussed here. Details are available in Additional file 2: Table S3. Next to the KOs exclusively assigned (though not necessarily abundant) to soil- and plant fibre feeders, 56 and 174 different gene categories were significantly overrepresented in the plant fibre- and soil-feeding termite clusters, respectively. Illustration of the overrepresented KOs showed low metabolic overlap between the two clusters in terms of cluster-specific functionalities (Additional file 1: Figure S9). Several metabolic functions enriched in a plant fibre feeders cluster were related to nitrogen acquisition, with e.g. atmospheric nitrogen fixation (e.g. nifD K02586 and nifK K02591 - nitrogenase molybdenum-iron protein chains) being limited to this termite group. This observation is in line with previously published reports on wood-feeding termite microbiomes [9, 10]. In contrast to nitrogen-limited wood-based diet, soil has higher levels of fixed nitrogen in different forms, including nitrogenous residues of humic components derived from bacterial biomass. Therefore, multiple KOs relevant to protein degradation and amino acids metabolism (e.g. psmB K03433 proteasome β-subunit, aprX K17734 serine protease, pepE K05995 dipeptidase E) and bacterial cell wall degradation (e.g. cwlS K19220 peptidoglycan DL-endopeptidase, lysF/cwlE K19223 peptidoglycan DL-endopeptidase, pdaA K01567 peptidoglycan-N-acetylmuramic acid deacetylase), as well as nitrate and nitrite transport and metabolism (e.g. narG/narZ/nxrA K00370 nitrate reductase/nitrite oxidoreductase α-subunit, nirS K15864 nitrite reductase, nrtD/cynD K15579 nitrate/nitrite transport system ATP-binding protein) were enriched in soil-feeding termite gut metatranscriptomes.
In relation to sugar transport and metabolism, diverse components of sugar ATP-binding cassette (ABC) transporters were differentially enriched in both feeding categories, however phosphotransferase system (PTS)-mediated sugar transport was largely enriched in a soil feeders cluster, including glucose, maltose, trehalose, N-acetylmuramic acid, fructose, mannitol and gluco- and galactosamine (Additional file 2: Table S3). Based on the enrichment of sugar isomerases, bacteria in the guts of plant fibre-feeding termites next to glucose would preferentially use xylose and arabinose for their metabolism (both derived from heteroxylans abundantly present in their woody diet). While their soil-feeding prokaryotic counterparts would rely mainly on glucose (e.g. cellulose, xyloglucans), ribose and galactosamine utilisation, the latter being present in bacterial cell wall.
Interestingly, enrichment of soil-feeding termite cluster metatranscriptomes with CRISPR-Cas system related components would indicate that bacteria in the termite gut actively use their adaptive immune system to protect themselves from invasive mobile genetic elements . By briefly analysing spacer sequences from re-constructed CRISPRs and blasting them against NCBI viral database, we could potentially identify infecting phages (Additional file 2: Table S4). Most of the sequences corresponded to Siphoviridae and Myoviridae families of the order Caudovirales, which is in line with the dominance of these two types of dsDNA phages in the metavirome of the Coptotermes formosanus termite gut . Homologous sequences to a few spacers were identified within the sequenced genomes of giant viruses, including the Pandoravirus with the largest known viral genome to date . According to the very recent study, higher prevalence of huge phage in the human and animal gut compared to other environments is related to their main hosts which are Firmicutes and Proteobacteria , both phyla being more abundant in the soil-feeding termite cluster.
Landscape of prokaryotic CAZymes in guts of plant fibre- and soil-feeding termites
Prokaryotic contribution in terms of CAZymes expression is crucial for the functioning of the whole termite gut system , enabling the termite to feed on lignocellulose-rich biomass, which is particularly abundant in its diet. Until now, more attention has been given to CAZymes in wood-feeding higher termites [9-11, 15], whereas the termite genera which evolved to forage on more humified lignocellulose sources, including soil, humus or litter, have remained largely understudied. Initial analysis of metabolic pathways in our re-constructed de novo metatranscriptomes already evidenced the specialization of prokaryotic communities towards carbohydrates metabolism (Fig. 2E, F). To continue, all prokaryotic transcripts were compared to the entries in the CAZy database  and in total 8,920 putative CAZymes-related gene transcripts of prokaryotic origin were detected in our dataset. Generally low sequence similarity of re-constructed CAZymes to carbohydrate active entries in the NCBI non redundant protein database (Fig. 3A) would indicate that the termite gut environment is a promising source of novel carbohydrate active enzymes. For this reason and to avoid removing too many true positive CAZymes, the commonly applied threshold of e-value <10−18 and coverage >0.35 when using the dbCAN tool for CAZymes discovery was not applied to our dataset, unless indicated otherwise.
On average, the discovered CAZymes accounted for 1.5 % ± 0.3 of expressed prokaryotic gene transcripts in the different samples. In total 8,972 CAZymes-domains were identified on the re-constructed CAZymes gene transcripts and assigned to 62 unique GHs families (3083 domains), 58 CBMs (2299 domains), 15 PL (159 domains), 12 CE (460 domains) and four families representing AA group (83 domains). GH was the most highly expressed CAZymes class across all the samples (Fig. 3B). Importantly, the recovered GH and CBM expression profiles for the replicates of colonies E.neo_1 and S.hey_1 showed high consistency, which suggest rather unbiased recovery of metabolic profiles when using the applied pipeline (Fig. 3). A large number of glycosyl transferases (GT; 27 families, 1333 domains) was detected, however they were not further discussed here due to their biosynthetic nature of activities , which is of less interest to our study. Additionally, 84 cohesin/dockerin and 1472 SLH domains were also identified. As expected, due to the higher abundance of Firmicutes in the soil termite cluster, the average transcript abundance of cohesin/dockerin and SLH annotated genes was much higher in soil feeders than in plant fibre feeders. It would indicate the active presence of cellulosomes in the former group . Gene transcripts taxonomic assignment indicated that enzymes involved in carbohydrate metabolism originated mainly from Firmicutes, Spirochaetes, Fibrobacteres and Bacteroidetes (Additional file 1: Figure S10). While these results remain in agreement with , where many GH genes were predicted to be of Firmicutes origin, such high abundance of Firmicutes-related CAZymes seems questionable. Especially in the light of a previous study , where following the metagenomic binning, the majority of encoded GHs, initially assigned of Firmicutes origin, were re-assigned to either Treponema (Spirochaetes) or to Fibrobacteres. Therefore, the current taxonomic classification of CAZymes identified in this study could be further revised once additional prokaryotic genomes reconstructed from the termite gut microbiota are available.
Partially re-constructed transcripts containing more than one CAZymes ORF were detected in the de novo metatranscriptomic assembly, thus confirming a presence of putative saccharolytic operons in the termite gut microbes. Indeed, of the 8,901 CAZymes-containing metatranscriptomics contigs, 276 were shown to contain at least two gene transcripts. Their abundance was similar in the two termite clusters (on average 3.4 % ± 1.3 of reads) showing no prevalence of carbohydrate utilization gene clusters in a specific termite group. There was no evidence of the expression of polysaccharide utilization loci (PULs) typically found in Bacteroidetes genomes , in part due to high fragmentation of the metatranscriptome re-construction. However, higher expression of susC (TonB-dependent transporter) and susD (cell surface glycan-binding protein) genes, which together form part of PULs, coincided with higher abundance of Bacteroidetes in C. cavifrons guts. The above results remain in line with the recent functional metagenomics study of wood-feeding Globitermes brachycerastes gut microbiome, which revealed the tendency of saccharolytic genes to aggregate or form putative operons .
Metabolic overlap and cluster specificities of carbohydrate-degrading strategies employed by microbes in plant fibre- and soil-feeding termite guts
Out of the 62 assigned GH families for the whole community metatranscriptome, 41 GHs were common, while 3 and 18 GHs were specific to plant fibre- or soil-feeding termite gut microbes, respectively (Fig. 3C). For the GH families that were expressed by the two community clusters, we observed a significant correlation of average gene expression levels (Fig. 3D). Still, even though the GH family diversity was comparable for soil- and plant fibre-feeding termite clusters, the number of different genes assigned to the same GH family was on average 1.5 times higher in soil-feeding termite gut microbiomes.
The CAZymes metatranscriptome of both feeding termite groups was dominated by gene transcripts assigned to GH11 family (Fig. 3D, F, Additional file 2: Table S5), members of which have been predicted to have an endo-β-1,4-xylanase activity (EC 220.127.116.11). Transcriptional abundance of GH11 family is consistent with the recent study of a fibre-associated wood-feeding higher termite gut microbiome . A high number of gene transcripts was also assigned to GH5 family (mainly represented by GH5_4), which, based on their peptide pattern homology to characterised CAZymes proteins, were predicted to show either endoglucanase (EC 18.104.22.168) or endoxylanase activities (see below). Yet, the cumulative abundance of gene transcripts assigned to GH5 was lower than in the case of GH11. Another GH family, which initially appeared to be abundant (especially in microepiphytes feeding C. cavifrons), was GH109 (Additional file 1: Figure S11). The only so far identified activity of enzymes assigned to GH109 family is an α-N-acetylgalactosaminidase. These enzymes might be potentially involved in bacterial biomass turnover, by targeting the common components of bacterial cell walls, more specifically N-acetyl-D-galactosamine found in lipopolysaccharides . However, following the application of the dbCAN threshold, its transcriptional abundance was significantly reduced (Fig. 3D), what better corresponded to the previously published reports. Still, active biomass turnover in animal guts was suggested to promote the release of biomass degrading enzymes from lysed bacterial cells to the gut lumen, that subsequently become “public goods” helping other bacteria in lignocellulose degradation . In this context, increased GH109 transcriptional expression may be indicatory of intense bacterial cell lysis in the termite gut. High transcriptional expression was also observed for other GH families, including GH4 with assigned maltose-6-phosphate glucosidase (EC 22.214.171.124) and α-galactosidase (EC 126.96.36.199) activities, and GH23 putatively targeting bacterial peptidoglycan (Fig. 3D, F), the latter again pointing to high bacterial biomass degradation rate. Although not excessively discussed in this study, chitin utilisation by the termite gut microbes seems high, based on the occurrence of putative β-N-acetylglucosaminidase/β-N-acetylhexosaminidase (EC 188.8.131.52), chitosanase (EC 184.108.40.206), α-1,3/1,4-L-fucosidase (EC 220.127.116.11), chitinase (EC 18.104.22.168) or chitin deacetylase (22.214.171.124), assigned to the GH3, GH8, GH29, CE4 families as well as on the abundance of a CBM50 (putatively targeting chitin or peptidoglycan; Fig. 3E, G). Microbes in all studied termite guts were also able to preferentially utilise α-glycans, as it was assumed from high expression levels of GH13 assigned gene transcripts.
Based on the LEfSe analysis, three GH families (GH53, GH76 and GH130) were enriched in plant fibre-feeding termites metatranscriptomes, while GH55, GH65 and GH94 were more represented in a soil feeders cluster (Fig. 3D). In the case of a plant fibre feeders cluster, they presumably encoded endo-β-1,4-galactanase, α-1,6-mannanase, α-glucosidase or β-1,2-oligomannan phosphorylase activities. In contrast, families enriched in soil-feeding termites were mainly assigned as putative exo/endo-β-1,3-glucanase; or cellobiose phosphorylase, trehalose, maltose phosphorylase or cellodextrin phosphorylase. In line with increased utilisation of the two major lignocellulose components (Fig. 3D, F), meaning cellulose and xylan, respective xylan-targeting CBM36 was slightly enriched in a plant fibre cluster, while cellulose-specific CBM6 was more abundant in soil-feeding termite gut metatranscriptome (Fig. 3E, G). Interestingly, glycogen binding domain CBM48 was enriched in soil feeders (Fig. 3E, G), and together with the enrichment of the glycogen phosphorylase coding genes (K00688, Additional file 2: Table S3), it would indicate intensive glycogen utilisation by the termite gut bacteria. In general, glycogen is a major intracellular reserve polymer of yeast and bacteria , therefore it might be also abundantly present in soil microbial biomass, which is the main diet component of soil-feeding higher termites.
The diversity and gene expression profiles of microbial CAZymes identified in our study remain in agreement with previous metatranscriptomic reports published for wood-feeding termites [10, 15]. Very little information is available on humus-, soil- and litter feeders, except for one previous metagenomic analysis of microbiota in the hindguts of six different wood- and soil-feeding higher termites . Yet, extrapolation of the metagenomics results to functional gene expression profiles revealed by metatranscriptomics is difficult to achieve and thus the two studies cannot be directly compared. Previously reported metatranscriptome of the dung-feeding termite, A. wheeleri, is the closest study that could be compared with our soil cluster . Accordingly, transcriptomic abundance of mainly GH11, GH5, GH3, GH10 families was consistent with our results.
Specific degradation of the different lignocellulose fractions by the termite gut microbial enzymes
It is well recognised that multiple enzymatic activities can be assigned to a single CAZyme family, e.g. including GH3, GH5, GH13. Therefore, to get more insights into the putative enzymatic activities, gene transcripts encoding for CAZymes in our study were further given an EC number using homology search to peptide pattern (Hotpep) . Previously, the study of carbohydrate hydrolytic potential in anaerobic digester showed the usefulness of the complementary Hotpep analysis to the dbCAN-mediated CAZymes-coding discovery . In that study, several in silico predicted enzymatic activities were further experimentally confirmed. Here, 921 prokaryotic gene transcripts were given EC numbers. Their assignment indeed showed that gene transcripts classified to e.g. GH5 family were further given different enzymatic activities in silico (mainly 126.96.36.199, followed by 3.2.1.x, 188.8.131.52 and 184.108.40.206, Fig. 4). According to the predicted enzymatic activities, the most highly abundant enzymes were the ones targeting the backbone of the different lignocellulose components (Fig. 5), including mainly endocellulases (cellulose and presumably xylo- and β-glucans; EC 220.127.116.11) and endoxylanases (heteroxylans; EC 18.104.22.168) and to a lesser extent endomannanases (heteromannan, EC 22.214.171.124). Most of the respective gene transcripts were taxonomically assigned to Firmicutes, Fibrobacteres and Spirochaetes, however, due to largely incomplete public databases and small number of available MAGs of termite origin, this taxonomic classification should be revised when more representatives of the termite gut microbiome have their genomes sequenced. Complete utilisation of cellulose and xylan by the two termite clusters gut microbiomes could be further confirmed by abundant expression of genes encoding for β-glucosidases (EC 126.96.36.199, EC 188.8.131.52) and xylosidases (EC 184.108.40.206), as well as the presence of respective sugar transporters and isomerases (discussed above). Concerning mannan utilisation, no putative mannosidase (EC 220.127.116.11) was detected in our dataset. However, as previously shown for other anaerobic biomass degrading environments, a combined action of N-Acyl-d-glucosamine 2-epimerase and 4-O-β-D-mannosyl-d-glucose phosphorylase would first transform mannobiose into β-D-mannosyl-(1→4)-D-glucose, with the subsequent hydrolysis of mannosylglucose to glucose and mannose-1-phosphate . Both enzymatic categories were characterised with relatively high gene expression levels in fibre- and soil-feeding termite datasets, pointing to a similar mechanism of mannobiose hydrolysis in the termite gut.
Next to cellulose and lignin, mannans and xylans (hemicellulose) are the main components of woody biomass , and both polymers can be largely substituted with e.g. arabinose, galactose, xylose, glucuronic acid and other simple sugar monomers. Accordingly, in the case of plant fibre-feeding termites, multiple debranching enzymes including α-arabinofuranosidases (EC 18.104.22.168), α-galactosidases (EC 22.214.171.124, EC 126.96.36.199) and α-xylosidases (EC 188.8.131.52), were slightly more abundant. Ferulic acid is the most abundant hydroxycinnamic acid in the plant cell wall and it is ester-linked to the cell wall polysaccharide arabinoxylan . By forming covalent linkages between polysaccharide chains (cross-linking) and polysaccharide and lignin components, it limits the accessibility and thus the digestibility of polysaccharides in plant biomass. Consequently, its higher expression was identified in plant fibre feeders, however the discovery rate of putative feruloyl esterases (EC 184.108.40.206) was low in our study, and mainly limited to the metatranscriptome of the wood-feeding Microcerotermes (M.sp_2). Nevertheless, expression of trans-feruloyl-CoA hydratase/vanillin synthase (K18383) genes in several wood- and soil-feeding termite gut microbiomes indicates that bacteria can further metabolise ferulic acid to vanillin, as a part of their secondary metabolism. By contrast, average cumulative expression of putative acetyl esterases (EC 220.127.116.11) was higher in soil feeders.
Community-wide lignocellulolytic phenotype of different termite gut microbiomes is contributed by distinct multiple and single bacterial players
Direct comparison of the re-constructed gene transcripts showed that roughly 0.18 % of all de novo re-constructed prokaryotic genes were shared between all the samples. These values were slightly higher inside the two clusters, and 0.49 % of transcripts was shared between the samples assigned to plant fibre feeders. For soil feeders, slightly less of the total repertoire of expressed genes was common to all of the studied gut microbiomes (most probably due to higher species diversity), with roughly 0.29 % of shared transcripts. Yet, genes transcripts assignment to broader functional categories and subsequent enrichment of common KOs in gut metatranscriptomes of both types of termite feeding diets provides an example of the functional congruency. This observation indicates that even though each termite species operates with its own gut bacteria, there is a functional equivalence in microbial populations across different termite hosts. Taxonomically distinct microbial communities, displaying conserved global functional profiles, have been previously reported in other environments, including anoxic waste water treatment tanks  and marine sponges . Moreover, in the previous metagenomic and metatranscriptomic study of two higher termite species, the convergence of functions essential to termite biology among the gut microbiomes of wood- and dung feeders has already been proposed .
Based on the number of different genes assigned to the same functional category, the diversity of microbes contributing to the observed phenotype was 1.65 ± 1.2 fold higher in gut microbiomes of soil-feeding termites versus their plant fibre-feeding counterparts. It makes a direct link with the significantly higher bacterial compositional diversity in soil- versus plant fibre-feeding termites, as discussed above (Fig. 1B). For most of the assigned functionalities, we could also observe a significant correlation between the number of gene transcripts assigned to a gene category and its cumulative expression per sample (Additional file 1: Figure S12). It would suggest that most of the observed microbial processes are the collective metabolisms of multiple taxa contributing to a particular ecosystem trait, rather than the metabolic dominance of single prokaryotic players.
Functional gene redundancy implies that similar metabolic functions are expressed by multiple bacteria . In the case of the termite gut, it can be directly extrapolated to upper levels of functional hierarchy including CAZymes diversity profiles and even their gene expression patterns. For example, the occurrence and abundance of different gene transcripts assigned to GH11 family strongly differ between all the samples, pointing towards the unique gene repertoire of each prokaryotic metatranscriptome (Fig. 6). It is also interesting to note that several GH11 gene transcripts were characterised with exceptionally high expression levels, compared to the average expression levels of the remaining GH11 assigned genes (Additional file 1: Figure S13). The attribution of over 80% of the sequencing reads to roughly 21.8 % ± 9.5 of putative GH11 gene transcripts would indicate that the degradation of the xylan backbone is confined to single microbial players rather than to multiple microbial populations. Similar results, with few CAZymes outliers compared to the average expression levels of genes assigned to a given GH family were shown for e.g. GH5 or GH10. Altogether, it indicates that next to the combined activity of multiple microbial populations, single bacterial players may also contribute to some of the observed lignocellulolytic phenotypes of the termite gut system. However, in none of the cases the abundance of the most highly abundant transcripts was comparable to the GH11 family outliers. From the application point of view, such enzymes are potentially interesting candidates for further bioprospecting, once their factual activity is biochemically characterised.