Increased HDCs in CRC patients
We first focused on CRC. As expected, HDCs were significantly higher in the feces of CRC patients than in those of healthy controls in all seven datasets (Figure 1a, Additional file 1: Table S1 and Additional file 2: Table S2). We then identified a total of 26 species that were significantly correlated with HDCs in at least two datasets (Spearman rank correlation, p-value < 0.05, Figure 1b; see Methods and Additional file 3: Table S3); we refer to them as HDC-species below. We also identified species that showed significantly differential abundances between cases and controls in at least two CRC datasets (adjusted p-value < 0.05, see Methods); we refer to them as Dif-species (also known as CRC-signature species). Interestingly, half of the HDC-species (13 out of 26) overlapped with the CRC Dif-species, including twelve CRC-enriched ones (Figure 1b) such as Fusobacterium nucleatum, Bacteroides fragilis and Peptostreptococcus stomatis, which were also reported in two recent meta-analyses of CRC [15, 16]. Microbial colonization varies along the colon, partly because of differences in the thickness of the mucus layer. Previous studies showed that B. fragilis, which is capable of glycoprotein degradation and toxin production, could penetrate the protective mucus layer, suggesting that the bacterium accelerates injury of the gut barrier, triggers inflammation and induces tumorigenesis [28-30].
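The "at least two datasets" selection rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes that per-dataset Spearman correlations and p-values have already been computed (for example with `scipy.stats.spearmanr`), and the function name `select_hdc_species`, the input layout and the toy values are hypothetical.

```python
from collections import Counter


def select_hdc_species(results, alpha=0.05, min_datasets=2):
    """Return species significantly correlated with HDC in >= min_datasets datasets.

    results: {dataset_name: {species_name: (rho, p_value)}}  (hypothetical layout)
    """
    counts = Counter()
    for per_species in results.values():
        for species, (_rho, p_value) in per_species.items():
            if p_value < alpha:
                counts[species] += 1
    return sorted(s for s, n in counts.items() if n >= min_datasets)


# Toy values for illustration only; they are not taken from the study.
toy_results = {
    "dataset_A": {"species_1": (0.41, 0.001), "species_2": (0.10, 0.30), "species_3": (0.30, 0.02)},
    "dataset_B": {"species_1": (0.35, 0.010), "species_2": (0.22, 0.04), "species_3": (0.10, 0.50)},
    "dataset_C": {"species_1": (0.05, 0.600), "species_2": (0.25, 0.03), "species_3": (0.08, 0.70)},
}
print(select_hdc_species(toy_results))
```

The sketch only counts significant datasets per species; whether the sign of the correlation must also agree across datasets is not stated in this section and is therefore not enforced here.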
We also identified forty HDC-correlated metabolic pathways in at least two datasets (referred to as HDC-pathways, see Additional file 4: Table S4), among which sixteen were identified as metabolic pathways with differential abundances between patients and controls in at least two datasets (referred to as Dif-pathways, see Methods). Most of the HDC-pathways that decreased in at least three datasets were related to carbohydrate degradation for the production of energy and short-chain fatty acids, such as D-galactose degradation and sucrose degradation (Figure 1c) [31]. In addition, HDC was negatively correlated with the degradation pathways of several monosaccharides and monosaccharide derivatives, including fucose, mannose, galactose and UDP-N-acetyl-D-glucosamine (Additional file 4: Table S4), which are known building blocks of gut mucus glycans; these results indicated decreased concentrations of these monosaccharides and derivatives, further confirming that the intestinal barrier is compromised [30].
Together, our results suggested that CIB, as indicated by HDCs that can be directly quantified from gut metagenomics data, was associated with gut microbiota dysbiosis at both the taxonomic and functional levels.
Combination of HDC and microbiome contributed significantly to patient stratification
We next tested whether HDC-species and HDC-pathways could contribute to patient stratification in CRC. Similar to Wirbel et al. [15] and Thomas et al. [16], we performed a leave-one-dataset-out (LODO) analysis [32], in which random forest classifiers were trained on the combination of all datasets but one and tested on the left-out dataset; we repeated this for each dataset in turn. As shown in Figure 2a and 2b, for models trained on species and pathway abundances, including HDCs could improve prediction performance. More importantly, HDC ranked as a top feature, i.e. 4th in the taxonomic model (Figure 2c) and 1st in the functional model (Figure 2d). Interestingly, both HDC-related models performed better than the models based on Dif-species and Dif-pathways, even though their taxonomic and functional features overlapped (Figure 2a, 2b). These results indicated that the HDC-correlated features could contribute substantially to patient stratification and disease diagnosis (Figure 2).
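The LODO splitting scheme described above can be sketched in a few lines. This is a minimal illustration that covers only the partitioning of samples; the classifier itself is left out, and the function name `lodo_splits` and the toy dataset labels are hypothetical.

```python
def lodo_splits(dataset_labels):
    """Yield (held_out, train_idx, test_idx) with each dataset held out in turn."""
    for held_out in sorted(set(dataset_labels)):
        train_idx = [i for i, d in enumerate(dataset_labels) if d != held_out]
        test_idx = [i for i, d in enumerate(dataset_labels) if d == held_out]
        yield held_out, train_idx, test_idx


# Toy sample-to-dataset assignment (hypothetical labels, one entry per sample).
labels = ["A", "A", "B", "B", "B", "C"]
for held_out, train_idx, test_idx in lodo_splits(labels):
    # A classifier would be fitted on train_idx and evaluated on test_idx here.
    print(held_out, train_idx, test_idx)
```

Because every sample of the held-out dataset is excluded from training, the resulting score reflects transfer to an unseen cohort rather than within-cohort performance. In scikit-learn, `LeaveOneGroupOut` offers an equivalent splitter when the dataset label is passed as the group.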
Similar results were found in CD
We then checked whether similar results could be found in CD. A previous study reported elevated fecal HDCs in pediatric CD patients compared with healthy controls [13]; the authors used quantitative polymerase chain reaction (qPCR) to quantify HDCs by targeting human beta-tubulin coding sequences. The authors also calculated HDCs from the metagenomics data and reported that the qPCR results were positively correlated with the metagenomics-derived HDC values (Pearson’s r = 0.81, p = 9.3e-11; see ref [13]). We re-calculated the HDCs using our methods and found that they were highly correlated with theirs (Pearson’s r = 0.978, p < 2.2e-16; Additional file 5: Table S5). These results further validated the reliability and accuracy of metagenomics-derived HDCs.
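For readers who wish to reproduce this kind of agreement check on their own data, Pearson's r between two sets of HDC estimates can be computed as in the minimal sketch below. The function name `pearson_r` and the toy values are hypothetical and are not taken from the study or from ref [13].

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy per-sample HDC estimates from two hypothetical pipelines.
hdc_pipeline_1 = [0.1, 0.5, 1.2, 3.0]
hdc_pipeline_2 = [0.2, 1.0, 2.4, 6.0]
print(round(pearson_r(hdc_pipeline_1, hdc_pipeline_2), 3))
```

Note that r measures linear agreement only: in the toy data the second pipeline reports values twice as large as the first, yet the correlation is still perfect.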
We identified 46 HDC-species (Control+Baseline group, Spearman correlation, p-value < 0.001), most of which (31 out of 47) overlapped with the Dif-species of CD that showed significant abundance changes between healthy controls and untreated patients (Control+Baseline group, Wilcoxon rank sum test, adjusted p-value < 0.05, Figure 3a, Additional file 6: Table S6 and Additional file 7: Table S7). As expected, the mucus-degrading commensal species Akkermansia muciniphila and Bacteroides caccae decreased with increasing HDCs, because an impaired gut secretes insufficient mucus [33]. Another control-enriched bacterial marker, Eubacterium ventriosum, was previously reported to be negatively associated with fundamental components of eukaryotic cell membranes [34]. Similarly, the differential pathways partly overlapped with the HDC-related pathways, including those involved in carbohydrate, protein and glycogen metabolism, whose decreased abundances are known to be associated with nutrient deficiency and intestinal dysfunction (Additional file 8: Table S8 and Additional file 9: Table S9) [31, 35, 36].
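The selection of Dif-species from adjusted p-values and their overlap with HDC-species can be sketched as follows. This section does not name the correction method, so the sketch assumes the Benjamini-Hochberg procedure; the function name `benjamini_hochberg`, the species names and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy raw p-values (e.g. from a Wilcoxon rank sum test); not study data.
raw_p = {"species_1": 0.005, "species_2": 0.2, "species_3": 0.03, "species_4": 0.01}
names = list(raw_p)
adjusted_p = dict(zip(names, benjamini_hochberg([raw_p[n] for n in names])))

dif_species = {n for n in names if adjusted_p[n] < 0.05}
hdc_species = {"species_1", "species_5"}  # hypothetical HDC-species set
overlap = sorted(dif_species & hdc_species)
print(overlap)
```

Established implementations, such as `multipletests` in statsmodels with `method="fdr_bh"`, are preferable in practice; the hand-written version is shown only to make the adjustment explicit.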
We also built random forest classifiers using species and pathway abundances for CD and performed 10-fold cross-validation repeated 10 times. Similar to CRC, adding HDC to the input data could improve prediction performance (AUC increased from 0.94 to 0.95 based on species profiles and from 0.90 to 0.92 based on pathway profiles; Additional file 10: Figure S1); HDC again ranked as a top feature (1st in this case), and the majority of the top ten features were HDC-species (Figure 3b). Interestingly, although the HDC-species of CD overlapped significantly with those of CRC (Additional file 11: Table S10), they differed considerably in their changes and importance in patient stratification (Figure 3b), likely due to differences in disease localization and microenvironment: CD commonly occurs in the terminal ileum and presents an inflammatory habitat for microbes, whereas CRC occurs in the colorectum and presents a tumor microenvironment [37, 38]. Nonetheless, it appears that elevated HDC is a common feature of intestinal diseases, while different diseases can be distinguished by their different gut dysbiosis profiles.
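The repeated cross-validation scheme can be sketched as an index generator. This is a minimal illustration assuming plain (unstratified) shuffling, which may differ from the procedure described in Methods; the function name `repeated_kfold`, the seed and the sample count are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    if n_samples < n_splits:
        raise ValueError("n_samples must be at least n_splits")
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_splits):
            test_idx = sorted(indices[fold::n_splits])
            held_out = set(test_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# 10 repeats x 10 folds gives 100 train/test evaluations per model.
n_evaluations = sum(1 for _ in repeated_kfold(25))
print(n_evaluations)
```

Averaging the AUC over all evaluations reduces the dependence of the estimate on any single random partition. For imbalanced case and control groups, a stratified splitter such as scikit-learn's `RepeatedStratifiedKFold` is a common alternative.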
HDC and related dysbiosis signified clinical treatment outcomes
The CD patients we analyzed were treated with diet intervention or anti-TNF antibodies; the outcomes were evaluated with fecal metagenomics sequencing at weeks 1, 4 and 8 after the interventions [13]. We found that HDCs decreased significantly over time (Figure 4a). As expected, HDC correlated significantly with FCP (Pearson’s r = 0.498, p < 2.2e-16, Additional file 12: Figure S2), a clinical indicator of intestinal inflammation released by neutrophils. However, FCP concentrations were associated with only three CD Dif-species, indicating that HDC is a better dysbiosis-related biomarker than FCP. Strikingly, 23 of the HDC-species in CD showed coordinated changes with HDC, i.e. species that were positively (negatively) correlated with HDC in the Control+Baseline group decreased (increased) as HDCs decreased (Kruskal-Wallis rank sum test, adjusted p-value < 0.05, Additional file 13: Figure S3), suggesting that the interventions that reduced fecal HDCs could globally reverse the gut dysbiosis in a species-specific manner. This conclusion was further supported by the observation that the correlations between HDC and some of the species were consistent across the Control+Baseline, Week1, Week4 and Week8 groups (Figure 3a).
We then investigated the performance of classifiers based on HDC and the gut microbiome in predicting response to CD therapy (see Methods). As expected, adding HDC to the models could improve performance (Figure 4b, Additional file 14: Figure S4); again, models based on HDC-species performed better than models based on Dif-species. These results suggested that the practice of considering only significantly changed species as patient biomarkers should be revisited, because some species showed a tendency to change without reaching the significance threshold (e.g. FDR < 0.05). In addition, based on the accuracies of the classifiers built on pathways, we hypothesized that the microbial functional network did not change substantially during treatment, even though the conditions of the patients improved over time (Additional file 14: Figure S4). To confirm the hypothesis above, we collected another metagenomics dataset of CD patients for external validation. Interestingly, models built on HDC and HDC-species performed better (AUC = 0.71, Figure 4c) than the other models (AUCs ≤ 0.66) (Additional file 15: Table S11). Most of the top features of the HDC-related classifier were consistent with the foregoing result that several HDC-species tended to recover when patients were under treatment (Additional file 16: Figure S5). The performance of the classifiers confirmed our inference that HDC-related features (i.e. HDC-species) have the potential to serve as signatures for classifying therapeutic responses (Figure 4b, Additional file 15: Table S11).
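The AUC values used to compare the classifiers can be computed from predicted scores as in the minimal sketch below, which uses the pairwise (Mann-Whitney) definition of the AUC. The function name `roc_auc`, the label coding (1 for one response class, 0 for the other) and the toy scores are hypothetical and are not taken from the validation dataset.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and classifier scores for six hypothetical patients.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
print(round(roc_auc(toy_labels, toy_scores), 3))
```

Under this definition an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the difference between 0.71 and 0.66 reported above.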