Structure of the termite gut microbiome in the different gut compartments
The composition of the microbial communities in the main L. labralis gut compartments was first assessed by high-throughput sequencing of 16S rRNA gene amplicons. For this purpose, DNA was extracted from individually sampled gut sections: the foregut, the midgut and four major sections of the hindgut (P1, P3, P4 and P5; Fig. 1A). Sequencing yielded 298,649 quality-trimmed reads, which were assigned to 3,478 bacterial OTUs (Additional file 1: Table S1). Bacterial richness and diversity (calculated on read counts normalised to 27,500 reads per sample) were highest in the P3 section of the hindgut and decreased towards the posterior gut segments (Fig. 1B, C). Considerably fewer reads were obtained for the foregut and midgut samples; therefore, diversity indices were not calculated for these compartments. Taxonomic annotation identified 26 bacterial phyla across the gut compartments, with the highest number of OTUs associated with Firmicutes, followed by Proteobacteria, Bacteroidetes and Spirochaetes. Firmicutes were the most abundant phylum in the foregut and hindgut sections, with Bacilli dominating the foregut and Clostridia the hindgut compartments (Fig. 1D, E). The midgut appeared to be colonised mainly by Spirochaetes, which were also abundant in the anterior hindgut; however, different OTUs dominated in different gut compartments (Additional file 1: Table S1), indicating niche preferences of individual species. At the OTU level, despite an overlap in species between the hindgut sections (Fig. 1C), single OTUs clearly dominated each gut compartment. For example, Treponema OTU_2 accounted for 13 % of the bacterial community in P1, while it was nearly absent from P3-P5 (less than 0.05 %).
Similarly, Ruminococcaceae OTU_13 (Firmicutes) dominated in P3 (4.2 %), Desulfobulbaceae OTU_4 (Proteobacteria) in P4 (10 %) and Dysgonomonadaceae OTU_8 (Bacteroidetes) in P5 (14.6 %; Additional file 1: Table S1). Interestingly, a single OTU affiliated with the genus Diplosphaera dominated the Verrucomicrobia community in the posterior hindgut compartments, accounting for 7.9 % of the bacterial community (Additional file 1: Table S1). We did not obtain a good-quality MAG of this species (see below) from which its metabolic potential could be reconstructed, nor was it recovered in a recent study of termite gut MAGs [29]. However, Diplosphaera colitermitum, previously isolated from the lower termite Reticulitermes flavipes, showed a potential to degrade cellulose, even though its in vitro substrate utilisation spectrum was limited to starch and a few mono- and disaccharides [42]. Microenvironmental heterogeneity, including differences in oxygen partial pressure and pH along the termite gut, is the main driving force behind the dominance of different bacterial species in separate gut sections [16]. Although these parameters have not yet been measured specifically for L. labralis, a consistent trend has been observed across soil-feeding higher termites, which may be extrapolated to our study [14, 43]. In general, the foregut and midgut are near-neutral and have the highest oxygen partial pressure, while the anterior hindgut compartments, particularly P1, are highly alkaline and anoxic.
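The richness and diversity comparison above depends on first bringing all samples to the same sequencing depth. The sketch below illustrates one common way to do this. It assumes that "normalised to 27,500 reads" means random subsampling without replacement (rarefaction), and it uses observed OTU richness and the Shannon index as the richness and diversity measures; the text does not name the indices or software used, and the function names and sample counts are hypothetical.

```python
import math
import random
from collections import Counter

DEPTH = 27_500  # per-sample depth used in the text


def rarefy(counts, depth=DEPTH, seed=0):
    """Randomly subsample OTU read counts, without replacement, to `depth` reads."""
    total = sum(counts.values())
    if total < depth:
        # Samples below the depth are excluded rather than padded.
        raise ValueError(f"sample has {total} reads, fewer than depth={depth}")
    reads = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Hypothetical P3-like sample: a few dominant OTUs plus a long tail of rare ones.
    sample = {"OTU_13": 3_000, "OTU_4": 500, **{f"OTU_r{i}": 100 for i in range(300)}}
    sub = rarefy(sample)
    print(observed_richness(sub), round(shannon(sub), 2))
```

Because rarefaction discards reads at random, the indices vary slightly with `seed`; averaging over several subsampling draws is a common refinement.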
Microbial communities of the individual gut compartments of Labiotermes species have not been analysed previously. However, analogous studies of other soil-feeding termites show similar trends: the dominance of Firmicutes is most pronounced in the anterior hindgut compartments, while Bacteroidetes and Proteobacteria become more abundant towards the end of the gut [16, 44]. To allow comparison with previously studied Labiotermes spp., we also analysed the average community composition of the hindgut by pooling the reads from the different hindgut sections. The resulting profile, with a clear dominance of Firmicutes and, to a lesser extent, Spirochaetes, closely resembled a previously published one [10]. Nevertheless, subtle differences in the niche preferences of certain species can only be resolved when the gut sections are analysed separately. Notably, the decrease of Firmicutes in the posterior hindgut in favour of Bacteroidetes, Proteobacteria and Verrucomicrobia could not be deduced from the pooled hindgut analysis alone.
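The point that a pooled hindgut profile can hide compartment-level shifts can be illustrated with a minimal sketch. The phylum-level read counts below are hypothetical (they are not data from this study), and pooling is done by summing raw reads across sections, which is only one possible convention.

```python
from collections import defaultdict

# Hypothetical phylum-level read counts per hindgut section (illustration only).
SECTIONS = {
    "P1": {"Firmicutes": 800, "Spirochaetes": 150, "Bacteroidetes": 50},
    "P3": {"Firmicutes": 700, "Spirochaetes": 200, "Bacteroidetes": 100},
    "P4": {"Firmicutes": 300, "Bacteroidetes": 250, "Proteobacteria": 450},
    "P5": {"Firmicutes": 250, "Bacteroidetes": 500, "Proteobacteria": 250},
}


def relative_abundance(counts):
    """Convert read counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def pooled_profile(sections):
    """Sum reads across all sections, then convert to relative abundance."""
    pooled = defaultdict(int)
    for counts in sections.values():
        for taxon, n in counts.items():
            pooled[taxon] += n
    return relative_abundance(pooled)


def dominant(profile):
    """Taxon with the highest relative abundance."""
    return max(profile, key=profile.get)


if __name__ == "__main__":
    print("pooled:", dominant(pooled_profile(SECTIONS)))
    for name, counts in SECTIONS.items():
        print(name, dominant(relative_abundance(counts)))
```

With these toy counts the pooled profile is dominated by Firmicutes, while P4 and P5 taken individually are dominated by Proteobacteria and Bacteroidetes. Summing raw reads weights each section by its sequencing depth; averaging per-section relative abundances instead would weight all sections equally.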
Hindgut metagenomics and reconstruction of metagenome assembled genomes
To further assess the ability of specific bacteria to digest carbohydrates in the termite gut, we separately extracted the luminal fluid of the whole hindgut (all hindgut compartments combined) and sequenced the total community DNA. Assembly yielded 304,288 contigs harbouring 763,653 genes. We initially reconstructed 143 microbial MAGs of varying completeness and contamination (Additional file 1: Table S2). After refinement, 39 good-quality microbial MAGs were retained, of which 38 were classified as bacterial (Additional file 1: Table S3). This final collection of good-quality MAGs, with an average completeness of 71.4 ± 13.3 % and contamination of 2.7 ± 2.4 %, was used for all comparative analyses. Most MAGs were assigned to Firmicutes, mainly Clostridia, consistent with the dominance of these bacteria in the different hindgut compartments of Labiotermes workers (Fig. 1E). No Bacilli MAG was reconstructed, suggesting that species present in the foregut are unlikely to migrate to the hindgut, possibly because of very different pH preferences. Only a few MAGs were assigned to Fibrobacteres, Proteobacteria and Spirochaetes, and no good-quality Bacteroidetes MAGs were recovered, despite Bacteroidetes being the second most abundant bacterial phylum in the P4 and P5 hindgut sections according to the community structure analysis (Fig. 1D). We further compared our draft genomes with a previously published collection of MAGs from the guts of different higher termites, including 169 MAGs from the gut of a Labiotermes sp. [29]. Roughly 11 MAGs showed an average nucleotide identity (ANI) above 70 % to a reference MAG, with a mean of 78.7 ± 2.4 %. None exceeded the species-level threshold of 95 % ANI, suggesting that all good-quality MAGs reconstructed in our study represent novel bacterial species (Additional file 1: Table S4).
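The species-novelty argument follows a simple rule: a MAG is assigned to a known species only if its best ANI hit against the reference collection reaches 95 %. A minimal sketch of that bookkeeping is given below, assuming best-hit ANI values (in %) have already been computed with an external tool (not specified here); the MAG names and ANI values are hypothetical.

```python
from statistics import mean, stdev

SPECIES_ANI = 95.0  # species-level threshold used in the text
RELATED_ANI = 70.0  # only hits above this value were summarised in the text


def summarise_ani(best_hits):
    """best_hits maps MAG name -> best ANI (%) to the reference collection, or None."""
    related = {m: a for m, a in best_hits.items() if a is not None and a > RELATED_ANI}
    values = list(related.values())
    known = sorted(m for m, a in related.items() if a >= SPECIES_ANI)
    return {
        "n_related": len(related),
        "mean_ani": mean(values) if values else None,
        "sd_ani": stdev(values) if len(values) > 1 else 0.0,
        "known_species": known,
        "novel_species": sorted(m for m in best_hits if m not in known),
    }


if __name__ == "__main__":
    hits = {"MAG1": 77.0, "MAG2": 80.5, "MAG3": None, "MAG4": 96.1}  # hypothetical
    print(summarise_ani(hits))
```

MAGs without any hit above 70 % are still counted as novel species here, since they cannot belong to a species in the reference collection.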
Single MAGs represented phyla that are rare in the termite gut: Cloacimonetes (MAG8; Fig. 2), Cyanobacteria (MAG33) and Patescibacteria (MAG32). The Cloacimonetes MAG was among the most abundant bacterial MAGs in the metagenome (Fig. 2B), whereas the other two were less abundant and less completely reconstructed. Cloacimonetes from other, mainly anaerobic, habitats are regarded as putative syntrophic propionate oxidisers (SPO) that interact closely with methanogens [45]. The termite gut Cloacimonetes appears to share this genomic potential for syntrophic propionate oxidation with other known Cloacimonetes, making it a putative syntrophic partner of methanogenic archaea in soil-feeding termites. The other previously sequenced termite gut Cloacimonetes MAG was also reconstructed from a soil-feeding Labiotermes species [29]. Further comparison of the gene content of these two Cloacimonetes MAGs with other Cloacimonetes genomes indicated an enrichment of genes associated with cell motility in the termite Cloacimonetes cluster (data not shown). Indeed, bacteria in the termite gut are known to be highly motile, possibly allowing them to actively reach their preferred substrates or to swim against physico-chemical gradients [10, 46]. The cyanobacterial MAG was affiliated with the Vampirovibrionia (previously known as Melainabacteria) and, to our knowledge, represents the first MAG of this sister group of Cyanobacteria reconstructed from the termite gut microbiome. However, its metabolic activity in the termite gut appears to be minor, as inferred from its gene expression profile (see below).
One MAG was affiliated with an archaeal MAG previously reconstructed from a Labiotermes species [29] and was assigned to the family Methanomethylophilaceae (Additional file 1: Table S3). Based on ANI, the two archaeal MAGs represent different species. Three additional low-quality Methanomethylophilaceae MAGs were reconstructed (Additional file 1: Table S2), suggesting that this family dominates the archaeal community of the Labiotermes gut.
Metabolic potential of the main gut bacteria for lignocellulose digestion
Next, by mapping metagenomic and metatranscriptomic reads to the reconstructed MAGs, we determined the abundance and activity of specific bacterial groups in the different gut compartments of Labiotermes workers. Abundance could only be averaged over the whole hindgut, as we sequenced a single metagenome from the pooled hindgut luminal fluid. Besides the Cloacimonetes MAG discussed above, the most abundant MAGs in the hindgut metagenome were mainly Clostridia and two Treponema (Spirochaetes). Not surprisingly, most MAGs were transcriptionally most active in the anterior hindgut (Fig. 2B), mainly in P3, which also had the highest bacterial richness and diversity (Fig. 1B). The P3 compartment is considered a preferred niche of microbial activity in the termite digestive tract [1]. Two Firmicutes MAGs (MAG23 and MAG28), both Clostridia, and one proteobacterial MAG (MAG2, affiliated with Xanthobacteraceae) had their highest metatranscriptomic abundance in the P1 section of the hindgut. The cyanobacterial MAG33, the Firmicutes MAGs 13, 24 and 35, the Patescibacteria MAG32 and the proteobacterial MAG20 showed the lowest transcriptional activity across the studied gut compartments. None of the OTUs could be linked to a reconstructed MAG.
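Because the compartment libraries differ in sequencing depth and the MAGs differ in size, transcript counts mapped to each MAG have to be normalised before the compartment of highest activity can be identified. The text does not state which normalisation was applied; the sketch below assumes an RPKM-like measure (reads per kilobase of genome per million library reads), with hypothetical counts and function names.

```python
def rpkm(mapped_reads, genome_length_bp, library_size):
    """Reads per kilobase of genome per million reads in the library."""
    return mapped_reads / (genome_length_bp / 1e3) / (library_size / 1e6)


def activity_table(mapped, genome_lengths, library_sizes):
    """mapped: {mag: {compartment: reads}} -> {mag: {compartment: RPKM}}."""
    return {
        mag: {comp: rpkm(n, genome_lengths[mag], library_sizes[comp])
              for comp, n in per_comp.items()}
        for mag, per_comp in mapped.items()
    }


def peak_compartment(values):
    """Compartment in which a MAG has its highest normalised transcript abundance."""
    return max(values, key=values.get)


if __name__ == "__main__":
    # Hypothetical inputs: raw counts favour P3, but the P3 library is four times deeper.
    mapped = {"MAG_x": {"P1": 900, "P3": 1_500}}
    lengths = {"MAG_x": 3_000_000}
    libraries = {"P1": 10_000_000, "P3": 40_000_000}
    table = activity_table(mapped, lengths, libraries)
    print(table, peak_compartment(table["MAG_x"]))
```

In this toy case the raw counts point to P3, whereas the normalised values point to P1, which is why depth normalisation matters when comparing compartments.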
We further analysed the CAZyme gene content of the MAGs and the expression levels of these genes in the different gut sections. In total, 1,713 CAZy domains located in 1,638 genes were detected in the good-quality MAGs. Most of these genes were assigned to GHs (990 genes), followed by glycosyltransferases (GTs; 404), carbohydrate esterases (CEs; 150), carbohydrate-binding modules (CBMs; 56), polysaccharide lyases (PLs; 23) and auxiliary activity enzymes (AAs; Additional file 1: Table S5). Representatives of Fibrobacteres and Firmicutes had the highest CAZyme gene coding frequencies, 3.4 % and 3.1 %, respectively. These two phyla also encoded the highest diversity of GH families, 16.6 ± 8.4 for Firmicutes and 10.5 ± 4 for Fibrobacteres MAGs. The values for Fibrobacteres are less representative, as only two MAGs of this phylum were reconstructed. GH5 contained the largest number of GH genes (133), mostly belonging to subfamily GH5_4, which was previously shown to include multifunctional enzymes from termite gut Spirochaetes with specificity towards mannan, cellulose and xylan [9]. The most widespread GH family was GH13, detected in 85 % of the MAGs. This family includes enzymes that degrade alpha glucans (i.e. polysaccharides with glucose units linked by α(1→4) glycosidic bonds), and it is also widely expressed by the host in the upper part of the digestive tract (see below). Other widely encoded GH families included GH109 (acting on α-N-acetylgalactosamine, i.e. bacterial cell wall), GH23 (peptidoglycan and chitin), GH77 (starch) and GH3 (β-glucosidases and xylosidases). Of these, only GH3 and GH77 were found in more than 85 % of the termite species analysed in a recent survey of 129 termite gut metagenomes [8]. Differential use of GH genes by distinct MAGs, as inferred from their expression levels (Fig. 2D), further highlights their specialisation towards different lignocellulosic and non-cellulosic polysaccharides. Among carbohydrate esterases, CE4 genes were present in multiple copies in nearly every reconstructed MAG, supporting bacterial deacetylation of hemicellulose (i.e. different types of xylan), as well as of chitin and peptidoglycan. In total, 14 CBM families were detected, with CBM48 (glycogen binding) being the most widespread. This again supports the preference of the termite gut microbiome for easily degradable alpha glucans [9].
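The per-MAG metrics used above (CAZyme gene coding frequency, number of GH families and family prevalence across MAGs) reduce to simple counts over the annotation table. A minimal sketch follows, assuming one list of CAZy family labels per MAG and collapsing subfamilies (e.g. GH5_4) to their parent family; the annotations and function names are hypothetical.

```python
from statistics import mean, stdev


def parent_family(label):
    """Collapse a subfamily label such as 'GH5_4' to its family, 'GH5'."""
    return label.split("_")[0]


def coding_frequency(n_cazyme_genes, n_total_genes):
    """Percentage of a MAG's predicted genes that carry at least one CAZyme domain."""
    return 100.0 * n_cazyme_genes / n_total_genes


def gh_family_richness(labels):
    """Number of distinct GH families encoded by one MAG."""
    return len({parent_family(lab) for lab in labels if lab.startswith("GH")})


def prevalence(family, mags):
    """Percentage of MAGs encoding at least one gene of `family`."""
    hits = sum(1 for labels in mags.values()
               if family in {parent_family(lab) for lab in labels})
    return 100.0 * hits / len(mags)


def mean_sd(values):
    """Mean and sample standard deviation, as reported per phylum."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


if __name__ == "__main__":
    mags = {  # hypothetical annotations, one label per CAZyme gene
        "MAG_a": ["GH5_4", "GH13", "GH13", "GH3", "CE4"],
        "MAG_b": ["GH13", "GT2", "CBM48"],
        "MAG_c": ["GH5_2", "CE4"],
    }
    print(prevalence("GH13", mags), [gh_family_richness(v) for v in mags.values()])
```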
The number and diversity of bacterial oxidative enzymes known to be involved in lignin degradation and modification (e.g. AA1, AA2 and AA4) were much lower than those of hydrolytic GHs, and no lytic polysaccharide monooxygenases (LPMOs; AA10) were detected in the MAGs (Additional file 1: Table S5). This result strongly contrasts with the observation that the gut microbiota of soil-feeding termites is mainly active in the degradation of reduced substrates, including tannin-protein complexes and polyaromatics resulting from lignin degradation [5]. The Proteobacteria-assigned MAG2 encoded the largest repertoire of AAs, with six of its eight AA genes belonging to the AA3 family (mainly AA3_2), a family of oxidoreductases that includes cellobiose dehydrogenases. Cellobiose dehydrogenases are extracellular enzymes that reduce LPMOs via extracellular electron transfer, thereby supporting oxidative polysaccharide depolymerisation [47]. Currently, AA3 genes in the CAZy database are almost exclusively of fungal origin, with just one putative AA3 sequence of prokaryotic origin, from a halophilic archaeon of the genus Halorubrum [39]. Nevertheless, prokaryotic genes assigned to AA3_2 were also recently reported in metagenomic reconstructions from higher and lower termite guts [8]. MAG2 was most transcriptionally active in the highly alkaline P1 compartment, where its AA3-coding genes were also most highly expressed. As no LPMO gene transcripts were detected for any of the MAGs, the function of prokaryotic AA3_2 CAZymes remains speculative.
We also examined whether CAZyme genes cluster in the genomes of soil-feeding termite gut microbes, as previously shown for the grass-feeding Cortaritermes [9]. We detected 51 putative CAZyme clusters, each containing at least one CAZyme, one transporter and one transcriptional regulator (Additional file 1: Table S6). As expected given the dominance of Firmicutes, these clusters were mainly found in clostridial MAGs; only two clusters originated from Cloacimonetes and Spirochaetes. GH2, GH4, GH9 and GH94 were the most frequently clustered CAZymes. Because the MAGs are highly fragmented, we suspect that the actual number of CAZyme clusters is considerably larger, and that grouping CAZymes into functionally complementary units is a general trend in bacterial genomes from biomass-rich habitats (own data, unpublished).
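The cluster definition used above (at least one CAZyme, one transporter and one transcriptional regulator) can be applied by scanning the genes of each contig in order. A minimal sketch follows, assuming that signature genes may be separated by at most `max_gap` other genes; the gap value, labels and gene identifiers are hypothetical, and dedicated cluster-finding tools may apply different rules.

```python
SIGNATURE = {"CAZyme", "transporter", "regulator"}  # each cluster needs all three


def find_clusters(genes, max_gap=2):
    """Return gene-id lists for putative CAZyme clusters on one contig.

    genes: (gene_id, label) tuples in contig order; label is one of SIGNATURE or None.
    Signature genes separated by at most `max_gap` other genes are merged into one run;
    a run is reported only if it contains a CAZyme, a transporter and a regulator.
    """
    runs, current, gap = [], [], 0
    for gene_id, label in genes:
        if label in SIGNATURE:
            current.append((gene_id, label))
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:  # too many non-signature genes: close the run
                runs.append(current)
                current, gap = [], 0
    if current:
        runs.append(current)
    return [[g for g, _ in run] for run in runs if SIGNATURE <= {lab for _, lab in run}]


if __name__ == "__main__":
    contig = [("g1", "CAZyme"), ("g2", None), ("g3", "transporter"),
              ("g4", "regulator"), ("g5", None), ("g6", None), ("g7", None),
              ("g8", "CAZyme")]
    print(find_clusters(contig))
```

Because the scan runs within a single contig, a cluster split across two contigs of a fragmented MAG is either missed or only partially reported, which is consistent with the suspicion that fragmentation leads to undercounting.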
De novo metatranscriptomics and termite host gut transcriptomics
To further understand the lignocellulose degradation mechanisms in the different termite gut compartments, we separately reconstructed the termite gut metatranscriptome and the host gut transcriptome de novo. De novo co-assembly of reads from the six termite gut samples yielded 2,090,052 contigs with 2,114,234 predicted ORFs, corresponding to partial and complete gene transcripts. Of these, roughly 30 % were taxonomically annotated. Bacterial gene transcripts numbered 369,503, representing around 17.5 % of all recovered ORFs. Consistent with the gut microbial community structure, most gene transcripts were of Firmicutes origin, mainly Clostridia and, to a lesser extent, Bacilli, followed by Spirochaetes, Bacteroidetes and Actinobacteria (Additional file 2: Fig. S1). Firmicutes gene transcripts were most abundant in the anterior hindgut (P1 and P3), while Actinobacteria, Bacteroidetes and Proteobacteria dominated the posterior hindgut compartments. Firmicutes and Actinobacteria were also the most transcriptionally active phyla in the foregut, while Spirochaetes mainly expressed their genes in the midgut. Overall, the metatranscriptomic profiles correlated well with the abundance of specific phyla in the different gut compartments (Fig. 1D; Additional file 2: Fig. S1). Archaeal gene transcripts were more abundant in the posterior hindgut, mainly in P4, and were dominated by Euryarchaeota.
At the community level, the metatranscriptomic profiles of the different gut compartments were dominated by distinct functional groups (KOs), indicating different microbial activities in each gut section (Additional file 1: Table S7). Flagellin was the most transcriptionally abundant KO only in the midgut and anterior hindgut, suggesting that motile bacteria reside mainly in these compartments. Flagellin expression was much lower in the posterior hindgut and nearly absent from the foregut. As expected, and in line with previous reports [15], methanogenesis was dominant in the posterior hindgut, where the pH is closer to neutral. Interestingly, acetoclastic methanogenesis prevailed in P4 and, to a lesser extent, in P3, and hydrogenotrophic methanogenesis mainly in P4, whereas methanogenesis from methanol also took place in P1, in addition to P3 and P4 (Fig. 3). Sulphate reduction appeared to take place mainly in the last hindgut compartment, which has the lowest pH. The relative abundance of gene transcripts associated with the urea cycle was highest in the foregut and midgut. Gene transcripts assigned to KOs of carbohydrate hydrolytic enzymes (i.e. EC:3.2._) accounted for 1.33 ± 0.34 % of all functionally annotated gene transcripts. This is somewhat lower than the 1.99 ± 0.27 % of functionally annotated genes assigned to carbohydrate hydrolytic KOs in the grass-feeding Cortaritermes [9], suggesting that the soil-feeding Labiotermes still feeds largely on cellulosic and non-cellulosic polysaccharides present in soil. Nevertheless, to complement their presumably carbon-poor diet, microbes in the Labiotermes gut actively fix CO₂, employing distinct autotrophic carbon fixation pathways in the different compartments (Fig. 3; Additional file 1: Table S8).
The Calvin cycle dominates in the foregut, most probably as residual activity of ingested algae, cyanobacteria and other photosynthetic bacteria present in soil. The reductive citric acid cycle (Arnon-Buchanan cycle) prevails in the midgut and anterior hindgut, whereas the Wood-Ljungdahl pathway seems to be used less by bacteria, with low-abundance gene transcripts present only in the P1-P4 compartments.
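The share of carbohydrate-hydrolase transcripts quoted above (1.33 ± 0.34 %) is a ratio of summed transcript abundances per sample, followed by a mean and standard deviation across samples. A minimal sketch follows, assuming a KO-to-EC lookup table is available and that "EC:3.2._" means any EC number starting with 3.2. The two KO-to-EC pairs are the standard KEGG assignments for endoglucanase and α-glucosidase, while the abundances and the placeholder "K_other" are hypothetical.

```python
from statistics import mean, stdev

# KO -> EC numbers: endoglucanase (K01179) and alpha-glucosidase (K01187).
KO_TO_EC = {"K01179": ["3.2.1.4"], "K01187": ["3.2.1.20"]}


def hydrolase_share(ko_abundance, ko_to_ec=KO_TO_EC):
    """Percentage of functionally annotated transcript abundance on EC 3.2 KOs."""
    total = sum(ko_abundance.values())
    hydrolytic = sum(
        n for ko, n in ko_abundance.items()
        if any(ec.startswith("3.2.") for ec in ko_to_ec.get(ko, ()))
    )
    return 100.0 * hydrolytic / total


if __name__ == "__main__":
    # Hypothetical abundances; "K_other" stands for all remaining annotated KOs.
    samples = {
        "P1": {"K01179": 20, "K_other": 980},
        "P3": {"K01179": 10, "K01187": 5, "K_other": 985},
    }
    shares = [hydrolase_share(s) for s in samples.values()]
    print(f"{mean(shares):.2f} ± {stdev(shares):.2f} %")
```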
As for the metatranscriptome, the six host gut libraries were co-assembled de novo, yielding 218,010 contigs; of the predicted ORFs, 14,281 were assigned to Insecta. The completeness of the termite gut transcriptome was estimated at 71.5 %. The functional annotation and abundance of the de novo reconstructed termite gene transcripts are given in Additional file 1: Table S9. Briefly, enzymes involved in peptide hydrolysis (i.e. EC:3.4._) were among the most highly expressed metabolic genes in the foregut and midgut, with trypsin (K01312) being the most highly expressed functional gene category in the midgut. The high abundance of peptidase-encoding gene transcripts suggests that host metabolism is centred on protein degradation, with bacterial biomass proposed as the major source of proteinaceous food [48, 49]. Gene transcripts coding for chitinase (K01183), endoglucanase (K01179) and α-glucosidase (K01187) were among the most abundant carbohydrate hydrolases in the midgut. Further analysis of the termite gut transcriptome is limited to host carbohydrate metabolism (see below).
Bacterial lignocellulolytic activities in Labiotermes gut compartments
Analysis of the reconstructed bacterial metatranscriptome identified 3,534 CAZyme domains, of which roughly 197 were also found in bacterial MAGs, mainly of Firmicutes, Fibrobacteres, Spirochaetes and Proteobacteria origin. As previously reported [9], the overlap between the termite gut metagenome and metatranscriptome was low, highlighting the importance of metatranscriptome reconstruction for a complete picture of the lignocellulolytic capacities of the termite gut microbiome. In total, 166 main CAZyme families were detected, or 239 when subfamilies are counted, which is substantially more than the average of 77.2 ± 42 CAZyme categories recently reported for other soil-feeding termite gut metagenomes [8]. The high diversity of the microbial CAZyome may be linked to the fact that Labiotermes, considered a true soil feeder, consumes the poorest-quality biomass; to achieve an acceptable level of nourishment, it must employ a broader portfolio of digestive strategies (i.e. a higher diversity of CAZymes) to efficiently assimilate the full variety of nutritional resources available in soil. This may contrast with the lower CAZyme diversity of termites occupying the wood/soil interface (often categorised as soil feeders), which are associated with tree root systems or with highly humified but still recognisable organic litter [5]. The de novo reconstructed CAZymes in the L. labralis metatranscriptome comprised GHs (1,792 gene transcripts), GTs (768), CBM-containing genes (400), CEs (318) and, to a much lesser extent, SLHs (128), PLs (107) and AAs (15). Most bacterial CAZyme gene transcripts were of Firmicutes (2,577), Bacteroidetes (274), Actinobacteria (161), Proteobacteria (143) and Spirochaetes (136) origin, presumably making these phyla the dominant lignocellulose hydrolysers in the studied termite gut system.
CAZyme gene transcripts affiliated with these bacterial phyla were detected in all of the studied gut compartments (Fig. 4A, B). However, their metatranscriptomic abundance and the number of expressed genes were highest in the anterior hindgut, typically the main site of bacterial presence and activity in the termite gut [1]. The anterior hindgut is also where Firmicutes showed their highest lignocellulolytic activity, while Bacteroidetes seemed to be more actively involved in polysaccharide degradation in the posterior hindgut. Spirochaetes were most transcriptionally active in the midgut and P1, while CAZyme gene transcripts of Actinobacteria and Proteobacteria were equally abundant across the whole gut, except for the foregut. Verrucomicrobia, which were relatively abundant in the posterior hindgut according to the 16S rRNA gene amplicon analysis (Fig. 1D), were slightly more active in P4 and P5, although their overall gene transcript abundance was lower than that of the other phyla. Fibrobacteres, which together with Spirochaetes are the main biomass hydrolysers in wood- and grass-feeding higher termites [9, 10], expressed part of their lignocellulolytic potential, mainly in the P3 hindgut compartment (Fig. 4B).
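Counts such as "166 main CAZyme families, or 239 when subfamilies are counted" depend on how annotation labels are collapsed. A minimal sketch of one possible convention follows, assuming CAZy-style labels such as GH5_4 (family GH5, subfamily 4) and counting a family label without a subfamily as its own category in the finer count. Labels without a family number (e.g. SLH) are skipped in this sketch, and the example labels are hypothetical.

```python
import re

# CAZy-style labels: class, family number, optional subfamily (e.g. GH5_4).
LABEL = re.compile(r"^(GH|GT|PL|CE|AA|CBM)(\d+)(?:_(\d+))?$")


def count_families(labels):
    """Return (number of main families, number of families plus subfamilies)."""
    main, fine = set(), set()
    for label in labels:
        m = LABEL.match(label)
        if m is None:
            continue  # e.g. 'SLH' has no family number and is skipped here
        cls, fam, sub = m.groups()
        main.add(f"{cls}{fam}")
        fine.add(f"{cls}{fam}_{sub}" if sub else f"{cls}{fam}")
    return len(main), len(fine)


if __name__ == "__main__":
    labels = ["GH5_4", "GH5_2", "GH5", "GH13", "GH13_20", "CBM48", "GT2", "SLH", "GH5_4"]
    print(count_families(labels))
```

Other conventions (for example, counting only subfamilies when a family has any) would give a different finer count, so the rule should be stated alongside such numbers.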
Glycoside hydrolases were the most abundant enzyme class, and both the diversity and the transcriptional abundance of expressed GH genes were highest in the anterior hindgut (Fig. 5). In total, 75 GH families were detected, and most of the annotated gene transcripts were of Firmicutes and Bacteroidetes origin. The largest number of gene transcripts was assigned to GH4 (232), followed by GH3 (198), GH5 (140) and GH10 (118; Additional file 1: Table S10). These patterns point to niche specialisation, with different bacterial players using different GHs to target specific lignocellulose components in the different gut compartments. In the foregut, Firmicutes employ alpha-glucan-targeting GH13 enzymes to degrade starch and trehalose. Further degradation of starches (GH4) and fucose (GH29) was the main activity in the midgut, where the most abundant gene transcripts were of Firmicutes (GH29 and GH4) and Spirochaetes (GH29) origin. The endo-acting xylanases of families GH10 and GH11 are mainly employed by Firmicutes in P1 and, to a lesser extent, in P3, the most alkaline compartments [43]. Collectively, these two GH families were also among the most transcriptionally abundant GHs, suggesting that hemicellulose is important in the diet of the soil-feeding Labiotermes. Of the main cellulose-targeting GH families (i.e. GH1, GH3, GH5 and GH9), GH3 and GH5 gene transcripts from Firmicutes were the most abundant in P1-P4. Actinobacteria seem to employ their GH3 and GH5 genes across most of the termite gut, although different genes are expressed in different compartments. Bacteroidetes are involved in cellulose degradation mainly in the posterior hindgut, with much less activity in P1 and P3. Interestingly, while GH3 enzymes are rather compartment-specific, genes encoding GH5, and in particular GH5_4, are expressed across hindgut compartments with distinct pH values.
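The contrast between compartment-specific GH3 transcripts and broadly expressed GH5_4 transcripts could be quantified with an expression-specificity index. The sketch below uses the tau (τ) index, which is borrowed from tissue-specificity studies of gene expression and was not used in this study; the per-compartment abundances are hypothetical.

```python
def tau(values):
    """Tau specificity index: sum(1 - x_i / max(x)) / (n - 1).

    0 means equal expression in every compartment; 1 means expression in one only.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two compartments")
    peak = max(values)
    if peak <= 0:
        raise ValueError("no expression in any compartment")
    return sum(1 - v / peak for v in values) / (n - 1)


if __name__ == "__main__":
    # Hypothetical transcript abundances in P1, P3, P4 and P5.
    profiles = {
        "GH3 transcript": [2, 40, 3, 1],
        "GH5_4 transcript": [20, 25, 18, 15],
    }
    for name, values in profiles.items():
        print(name, round(tau(values), 2))
```

With these toy values the GH3-like profile scores close to 1 (compartment-specific) and the GH5_4-like profile close to 0.3 (broadly expressed).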
The bacterial contribution to oxidative lignocellulose degradation was highest in P1, and most of the highly expressed AA gene transcripts were assigned to Firmicutes, mainly of the AA4, AA7 and AA10 families, the last of which contains known bacterial LPMOs. Only five AA6 gene transcripts were detected: three of Firmicutes origin and one each assigned to Proteobacteria and Deinococcus, all expressed mainly in the anterior hindgut. Most bacterial CE genes were also expressed in the anterior hindgut, with the highest gene transcript abundance measured in P3 (Fig. 4C, D). The dominant CE family, in terms of both the number of reconstructed gene transcripts and overall transcript abundance, was CE4 (191 gene transcripts), commonly involved in the deacetylation of xylan and chitin. CE1 was the second most abundant (38 gene transcripts) and highly expressed family. It contains known feruloyl esterases, i.e. enzymes that break the linkages between lignin and hemicellulose moieties, and is employed mainly by Firmicutes in the different hindgut compartments.
Diversity and expression patterns of host CAZymes
To complement the prokaryotic lignocellulolytic activity in the termite gut, we also analysed the host CAZyome in the different gut compartments. A search for CAZyme gene transcripts in the reconstructed host transcriptome revealed 184 CAZymes assigned to AAs (4 families), CBMs (4), CEs (3), GHs (22) and GTs (36), with the latter two classes comprising the majority of the expressed genes (Fig. 4E, F; Additional file 1: Table S11). In contrast to the gut metatranscriptome, most host CAZyme-coding genes were expressed simultaneously in the different gut compartments; however, their transcriptomic abundance was highest in the foregut and midgut and lowest in P3, where most of the microbial activity takes place. This agrees with previous molecular and enzymatic reports [1, 50] and identifies the foregut and midgut as the main sites of activity of host hydrolytic enzymes, secreted by the labial glands and the midgut epithelium, respectively. Interestingly, if the same enzymes are indeed active in gut compartments with very different pH, the termite host itself would be a source of CAZymes with unusually high pH tolerance.
The largest difference between the host and prokaryotic CAZyomes was the proportionally higher transcriptional abundance of CBM-carrying genes in the host transcriptome, specifically those assigned to CBM13 and, to a lesser extent, CBM14. In total, 22 GH families were identified in the termite transcriptome, suggesting a relatively broad hydrolytic potential of the host. The most highly expressed family was GH22, which encodes lysozymes, with the highest transcript abundance in the midgut, followed by the foregut (Fig. 5). As in other soil-feeding insects (e.g. Scarabaeidae; Wang, 2021) and soil-feeding termites [8], this suggests that bacterial biomass ingested with the soil matrix is hydrolysed, providing essential nutrients to the termite host. Gene transcripts of GH1 and GH13, which act on plant-derived glycosides and alpha glucans, and of the chitinase family GH18 were mainly expressed in the foregut and midgut. The three paralogous GH9 endoglucanase genes were most highly expressed in the midgut, in agreement with previous reports that β-1,4-endoglucanase activity is concentrated in the midgut of wood-feeding termites [52]. While many GH families appear to be expressed by both the termite host and the gut microbiome, some were restricted to the host, including GH37, GH47, GH56, GH63, GH79, GH89 and GH152 (Fig. 5). In turn, many more GH families were found exclusively in the gut metatranscriptome; the most transcriptionally abundant of these included GH3, GH4, GH5, GH8, GH10, GH11, GH23, GH29, GH43 and GH109.
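The host-only and microbe-only GH families listed above follow from a set comparison of the two family inventories. A minimal sketch is given below, using as illustrative, incomplete inventories only the families named in this paragraph plus GH13, which is reported in this section as expressed by both the host and gut Firmicutes; the variable and function names are hypothetical.

```python
# Illustrative, incomplete inventories built from the families named in the text.
HOST_GH = {"GH13", "GH37", "GH47", "GH56", "GH63", "GH79", "GH89", "GH152"}
MICROBIAL_GH = {"GH3", "GH4", "GH5", "GH8", "GH10", "GH11", "GH13",
                "GH23", "GH29", "GH43", "GH109"}


def by_number(families):
    """Sort labels such as 'GH109' numerically rather than alphabetically."""
    return sorted(families, key=lambda f: int(f[2:]))


def partition(host, microbial):
    """Split two GH family sets into shared, host-only and microbe-only families."""
    return {
        "shared": host & microbial,
        "host_only": host - microbial,
        "microbe_only": microbial - host,
    }


if __name__ == "__main__":
    for group, families in partition(HOST_GH, MICROBIAL_GH).items():
        print(group, by_number(families))
```

With complete inventories, the same three set operations would yield the full lists of shared and compartment-restricted families.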
Comparison of termite endogenous CAZymes across different termite species
Next, we compared the CAZyome of L. labralis with that of Cortaritermes sp. ([9]; both derived from de novo reconstructed transcriptomes) and with the CAZymes identified in the genomes of Macrotermes natalensis [53], Zootermopsis nevadensis [54] and Cryptotermes secundus (GenBank assembly accession GCA_002891405.2). In general, fewer CAZyme domains were recovered from the L. labralis transcriptome than from the sequenced genomes, especially those of lower termites. However, the overall diversity patterns of the detected CAZyme families were similar among all termite species, despite their different feeding regimes (Additional file 1: Table S12). On the one hand, the lower number of CAZymes identified in L. labralis might be due to the incomplete reconstruction of the termite transcriptome. On the other hand, a reduced lignocellulolytic potential of higher termite hosts compared with lower termites has been noted previously [53] and may be compensated by the higher diversity of hydrolytic microbial symbionts in the higher termite gut.
As cellulolytic activity is generally expected to be weaker in soil-feeding than in wood-feeding termites, we further investigated the evolution of termite GH9 cellulases to better understand the ability of L. labralis to feed on the different forms of cellulose present in soil. To infer the enzyme phylogeny, we combined the higher termite endoglucanase sequences from [55] with (partially reconstructed) GH9 proteins from L. labralis and other termite species, including lower termites (Fig. 6). For most termite species, GH9 endoglucanase paralogs from the same species clustered together, confirming previous observations [55]. The only exceptions were the soil-feeding L. labralis, the wood-feeding Nasutitermes takasagoensis and the grass-feeding Cortaritermes sp. The gut microbiome of Cortaritermes sp. is typical of wood-feeding termites [9], confirming its affiliation with a plant-fibre-feeding termite guild. However, in that study, the termite colony was collected in a deforested savannah region, and the termite appeared to build its nest out of soil particles, which may indicate that soil makes up a substantial part of its actual diet. By contrast, N. takasagoensis is a wood-feeding species with arboreal nests [56], so the similarity of its GH9 paralogs to those of a true soil feeder is surprising. The endoglucanase paralogs of the three species co-clustered regardless of their phylogenetic distance and different diets. Three sets of paralogous endoglucanases clustered with soil- and wood-feeding higher termites (two of the three with N. takasagoensis; clusters I-III, Fig. 6). The fourth cluster contained L. labralis and Cortaritermes sp. sequences that were highly similar to endoglucanases of phylogenetically basal lower termites (96 to 98 % identity at the protein level) and grouped separately from those of higher termites.
The endoglucanases of all the other soil-feeding termites investigated here, including Anoplotermes, Grigiotermes, Subulitermes and Pericapritermes genera, co-clustered together in species-specific groups, forming one soil and litter feeding termite cluster (Fig. 6; [55]). Highly similar paralogous endoglucanases of the same species most possibly arose through gene duplication, as previously suggested for other termites [55]. A higher sequence homology likely imposes a similar specific activity on the produced enzyme, restricting the cellulolytic phenotype of the host. By contrast, the structurally distinct paralogous endoglucanases of L. labralis, Cortaritermes sp. and N. takasagoensis possibly show a different specificity towards distinct forms of cellulose. Such a concept has not yet been tested for termite enzymes, however, it was recently documented for cellulases from the Cellulomonas fimi bacterium [57]. The authors showed for example that the difference in the cellulose processivity of two GH6 cellulases was presumably due to their structural difference. In this case, processivity was defined as an extent of a continuous hydrolysis without enzyme detachment from the substrate, allowing for an efficient degradation of crystalline cellulose, for example. Such a capacity could render some termite species more efficient than others in cellulose digestion. This could possibly contribute to the ecological success of Labiotermes sp. which is one of the most abundant species in the Amazonian rainforest [13]. Further studies at the genomic level of the different termite host CAZyomes could shed light on the underlying genetic principles of the termite dietary evolution.
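Protein-level identity values such as the 96 to 98 % quoted above depend on how identity is counted over a pairwise alignment, and the exact definition is not stated in this section. The Python sketch below assumes that the two sequences are already aligned (with `-` marking gaps) and that identity is computed over gap-free columns only; the function name and the toy fragments are hypothetical and do not come from the Fig. 6 dataset.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned protein sequences.

    Assumption: identity = identical residues / columns where neither
    sequence has a gap. Other conventions divide by the full alignment
    length or by the length of the shorter sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * identical / compared


# Hypothetical toy fragments (not real termite GH9 sequences).
seq_a = "MKVLAAGHW-DESNPRT"
seq_b = "MKVLSAGHWQDESNPRT"
print(f"{percent_identity(seq_a, seq_b):.2f}")  # 93.75
```

Dividing by the full alignment length (including the gapped column) would instead give 15/17, or about 88.2 %, for the same toy pair, so reported identities are only comparable when the counting convention is known.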
Complementary contribution of the host and its gut microbiome to soil polysaccharide degradation
Given the variety of prokaryotic CBM domains, which is considerably larger than that of the termite host, the gut microbes can target a much more diverse range of the polysaccharides present in soil. Host enzymes would primarily target alpha glucans (e.g. starch; CBM13, CBM21) and chitin (CBM14), with most CBM-carrying host genes being expressed in the foregut, midgut and P1. The increased host lignocellulolytic activity in P1 may in fact be attributable to the mixed segment, which was not specifically sampled in our study. No cellulose-targeting termite CBM genes were detected, even though at least three paralogous GH9 endoglucanase genes are encoded in the L. labralis genome. The expression of specific genes carrying prokaryotic CBM domains appears to be compartment-specific, consistent with a previously published metatranscriptomic analysis of the gut compartments of the wood-feeding lower termite Coptotermes formosanus [50]. The most widely expressed prokaryotic CBMs also had the highest transcript abundance; these included the xylan-targeting CBM9, CBM54, CBM4 and CBM22; CBM13 and CBM32, which target galactose, mannose or lactose; and the peptidoglycan- and chitin-binding CBM50.
The enzymatic degradation of soil organic matter starts in the foregut and continues across the whole gut system. Soil organic matter contains diverse polysaccharides, including lignocellulose components as well as recalcitrant chitin and peptidoglycan, both co-polymerised with polyphenols, and its digestion proceeds through the sequential detachment and removal of lignin, hemicellulose and cellulose, together with an intensive degradation of bacterial biomass. Based on the gene expression profiles, the host enzymatic machinery active in the upper digestive tract would first extract freely available simple sugars and oligo- and disaccharides, e.g. glucose, mannose and mannooligosaccharides, galactose and cellobiose. Alpha glucans would also be preferentially degraded in the foregut; for example, the transcript abundance of GH13 genes with α-amylase activity is highest in this compartment (Fig. 5). Some microbes also exploit the relatively accessible non-cellulosic polysaccharides and simple sugars already in the foregut and midgut, expressing, for example, α-amylases (GH13), β-glucosidases (e.g. GH1, GH2, GH3 and GH5) or maltose phosphorylases (e.g. GH65). The CAZyme expression profiles further indicate that the upper digestive tract is a site of bacterial biomass and chitin degradation, consistent with earlier reports [58]. Host lysozyme genes (GH22) are expressed throughout the whole digestive tube, but their transcripts are most abundant in the midgut and foregut. This expression of host lysozymes across the gut compartments suggests that termites may simultaneously depend on their gut microbiota as mutualists and exploit them as a food source. In addition to providing essential nutrients to the host [51], a high turnover of bacterial biomass could serve other purposes. First, enzymes released from lysed microbes originating from soil or the upper gut compartments constitute a pool of “public goods” that helps other microbes degrade biomass faster [59]. Second, the degradation of bacterial biomass carried over from the upper gut compartments together with the digested food would minimise competition between the microbial populations of the different gut sections, indirectly promoting the specialisation of each compartment in the efficient utilisation of separate biomass fractions. Third, the significantly higher expression of host lysozyme genes in the upper digestive tract could also be a way to control the microbial populations of the foregut and midgut, ensuring maximum utilisation of nutrients by the host. Hindgut bacteria also actively express lysozymes (i.e. GH23-GH25 and GH73), with the highest transcript abundance in the P1 and P3 compartments.
Chitin is the second most abundant polysaccharide in nature after cellulose and is widespread in tropical soils, where it originates mainly from the decomposition of arthropod exoskeletons and fungal cell walls. Constant supplementation of the diet with chitin, which is also a source of nitrogen, was recently suggested for the grass-feeding higher termite Cortaritermes sp., based on high expression levels of chitin-targeting enzymes [9]. Labiotermes seems to utilise chitin through the combined action of GH18 (chitinase activity prevails among the characterised eukaryotic GH18 enzymes) and CE4 (chitin deacetylase is the only activity described for characterised eukaryotic CE4 enzymes). Expression of these host genes is highest in the foregut and midgut and continues across the whole gut system. Bacterial chitin degradation also starts in the midgut, but bacterial chitinases and chitin deacetylases are mainly expressed in the anterior hindgut. Chitin degradation by the gut ciliates of lower termites was recently proposed to inhibit infection by entomopathogenic fungi [60]. A similar function could potentially be ascribed to gut prokaryotes in higher termites, but this would require further investigation.
On average, lignocellulose is composed of cellulose (40-50 %), hemicellulose (25-30 %) and lignin (20-25 %), with a smaller amount of pectin (5-10 %). Lignin degradation in termites remains an open question, and except in fungus-growing termites, lignin is considered the major constituent of termite faeces [61, 62]. Early studies of soil-feeding termites reported low levels of lignin-degrading enzymatic activity in the intestines of Cubitermes, Noditermes and Procubitermes [63]. The lignin-targeting capacity of L. labralis has not yet been assessed, but according to the transcriptomic profiles, the host would express its own lignin-targeting enzymes across the whole gut. First, laccases of family AA1, which are mainly expressed in the foregut, would start depolymerising lignin, giving cellulases and microbial hemicellulases easier access to the other lignocellulose fractions in the hindgut. Second, vanillyl alcohol oxidases of family AA4 would continue the degradation of specific lignin components in P4, suggesting that humified lignin derivatives (i.e. aromatic substrates) could be further degraded to the nutritional benefit of the host. Both enzyme types operate in compartments characterised by a largely neutral pH and a relatively high oxygen partial pressure, in line with the oxidative nature of known lignin degradation pathways [64]. The bacterial contribution to lignin degradation seems very limited in the L. labralis gut: only one AA1 gene and five AA4 genes were detected in the de novo reconstructed metatranscriptome, all with relatively low transcript abundance. In general, only a limited number of bacteria are currently known to degrade lignin, including representatives of Actinomycetes and Proteobacteria [64], which are present at low abundance in the Labiotermes gut.
Host cellulose degradation takes place across the whole termite gut, but gene expression peaks in the midgut, and the main activity is attributed to GH9, with the highest transcript abundance coming from Labio_GH9 cluster I (Fig. 6). Cellobiose dehydrogenase (AA3) combined with LPMO activity (AA15) contributes to the degradation of cellobiose to glucose moieties across the whole gut, but, surprisingly, its expression peaks in P5. Possibly, once lignin and hemicellulose fibres have been dissociated, the residual cellulose becomes accessible to the host. These expression profiles of cellulose-targeting enzymes correlate with the expression of β-glucosidases (mainly GH1 and GH30), which is highest in the midgut and in P5. Roughly three bacterial LPMOs of the AA10 family were identified in the metatranscriptome, mainly expressed in the anterior hindgut. The expression of bacterial GH families with mainly cellulolytic activity (i.e. GH9 and GH94) was generally low compared with other CAZymes. However, genes encoding GH5 and GH10 enzymes, with known activities including endoglucanase and endoxylanase [9], were widely expressed, mainly in the hindgut. Another widely expressed gene family in P1 was GH11, known to target xylan [65]. As with cellulose degradation by the host, bacterial cellulose and xylan depolymerisation was accompanied by the expression of the respective β-glucosidases (i.e. GH1, GH3 and GH39) and β-xylosidases (i.e. GH3, GH43 and GH39), suggesting efficient cellulose and xylan utilisation by the Labiotermes gut microbes.
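Statements such as "highest in the midgut and in P5" amount to comparing the normalised transcript abundance of a CAZyme family across the sampled compartments. The Python sketch below illustrates that comparison under stated assumptions: the abundance values are made-up placeholders (not data from this study), normalisation (e.g. TPM) is assumed to have been done upstream, and the helper names `peak_compartment` and `relative_profile` are hypothetical.

```python
from typing import Dict

# Compartments sampled for L. labralis (foregut, midgut, hindgut P1, P3, P4, P5).
COMPARTMENTS = ("foregut", "midgut", "P1", "P3", "P4", "P5")


def _check(abundance: Dict[str, float]) -> None:
    """Fail early if any sampled compartment is missing from the input."""
    missing = set(COMPARTMENTS) - set(abundance)
    if missing:
        raise ValueError(f"missing compartments: {sorted(missing)}")


def peak_compartment(abundance: Dict[str, float]) -> str:
    """Compartment with the highest normalised abundance (ties: first in COMPARTMENTS)."""
    _check(abundance)
    return max(COMPARTMENTS, key=lambda c: abundance[c])


def relative_profile(abundance: Dict[str, float]) -> Dict[str, float]:
    """Share of a family's total normalised abundance found in each compartment."""
    _check(abundance)
    total = sum(abundance[c] for c in COMPARTMENTS)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {c: abundance[c] / total for c in COMPARTMENTS}


# Made-up placeholder values for a hypothetical CAZyme family (not study data).
toy_family = {"foregut": 5.0, "midgut": 12.0, "P1": 3.0, "P3": 2.0, "P4": 1.0, "P5": 9.0}
print(peak_compartment(toy_family))  # midgut
print(round(relative_profile(toy_family)["midgut"], 3))  # 0.375
```

Reporting the relative profile alongside the peak keeps a bimodal pattern visible, such as a family expressed in both the midgut and P5, instead of collapsing it to a single compartment.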