This study is the first bioinformatics analysis in the microbiota field to investigate the interaction between gut microbiota and CRC. To reduce confounding bias, we designed a paired-sample study based on metagenomic data from the GMrepo database. This design 1) enables the comparison of gut microbiota among CRC patients from different countries and 2) helps identify new CRC-enriched microorganisms that are commonly present worldwide regardless of regional differences.
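As an illustration of how a paired-sample comparison can be evaluated, the sketch below applies an exact two-sided sign test to matched case-control abundances of a single species. It is only a minimal example under stated assumptions: the choice of a sign test, the function name `paired_sign_test`, and the toy abundance values are hypothetical and are not taken from this study's pipeline.

```python
from math import comb


def paired_sign_test(case_values, control_values):
    """Exact two-sided sign test on matched pairs (hypothetical helper).

    Returns (n_positive, n_informative, p_value). Pairs with identical
    values carry no directional information and are dropped.
    """
    if len(case_values) != len(control_values):
        raise ValueError("each case must be matched to exactly one control")

    # Keep only pairs in which the case and the control differ.
    diffs = [c - k for c, k in zip(case_values, control_values) if c != k]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0

    n_pos = sum(1 for d in diffs if d > 0)

    # Under the null hypothesis, n_pos follows Binomial(n, 0.5).
    smaller = min(n_pos, n - n_pos)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return n_pos, n, p_value


# Toy relative abundances of one species (illustrative values only).
cases = [0.30, 0.25, 0.40, 0.22, 0.31, 0.28, 0.35, 0.27]
controls = [0.10, 0.12, 0.20, 0.15, 0.11, 0.18, 0.16, 0.14]
n_pos, n_pairs, p = paired_sign_test(cases, controls)
print(f"{n_pos}/{n_pairs} pairs higher in cases, p = {p:.4f}")
```

Because each case is compared only with its own matched control, factors shared within a pair do not contribute to the test statistic, which is the motivation for a paired design.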
Using the metagenomic data in our study, we identified CRC-enriched and CRC-depleted microorganisms at the species level. First, we confirmed the recently reported associations of CRC with several enriched species, including Fusobacterium nucleatum and species belonging to the Parvimonas, Peptostreptococcus, Porphyromonas, and Prevotella genera (Ahn et al., 2013; Wirbel et al., 2019; Wong et al., 2019). Among all CRC-enriched species, Parvimonas micra had the highest mean decrease in accuracy in the random-forest model and also interacted frequently with other species in the co-occurrence network. This finding is consistent with previous studies that report a key role of Parvimonas micra in CRC formation (Yu et al., 2017; Dai et al., 2018). Another species of note is A. muciniphila, which was over-represented in CRC patients compared with healthy controls. According to our subgroup results, this enrichment tendency was present in all six countries. This result is consistent with several previous studies (Dingemanse et al., 2015; Osman et al., 2021; Wang et al., 2022), in which A. muciniphila was considered a CRC-enriched biomarker that could promote CRC formation by triggering inflammation and the proliferation of intestinal epithelial cells. However, when we selected biomarkers in the random-forest model, A. muciniphila was not a statistically significant candidate predictor. Some studies have reported an anti-tumorigenic effect of A. muciniphila (Fan et al., 2021), which points to a contradictory role of this species in CRC formation. We also detected a slightly decreasing trend of A. muciniphila in CRC patients with a BMI over 30. We therefore speculate that the abundance of A. muciniphila may fluctuate because of its potential interaction with other environmental factors, and that it cannot serve as a single biomarker to confirm a CRC diagnosis.
Although the distributions of gut microbes differ among countries, we identified some new species that exist exclusively in CRC patients and are commonly present in CRC patients from most countries. Collinsella tanakaei is a novel species that was found exclusively in CRC samples from five different countries, and its accumulated abundance ranks third among all CRC-specific species. Although Collinsella tanakaei has not been reported to be directly associated with CRC activity, it has previously been associated with a gene called 12-beta-HSDH, which is also related to CRC progression (Doden et al., 2021). In addition, this species has been shown to interact with related metabolites such as lactate, acetate, and formate. These metabolites are major end products of glucose fermentation, which might be involved in primary and metastatic colon cancer cells (Grabon et al., 2016).
We also investigated the interaction between gut microbiota and two important CRC-associated factors: age and BMI. We found that Bacteroides uniformis is not only a significant CRC-enriched species but also an age-discriminatory bacterial taxon. In our study, the abundance of Bacteroides uniformis increased with age, and this trend was more apparent in patients over 70 years old. There is still no convincing evidence for the role of Bacteroides uniformis in CRC activity. Wang et al. (2012) found an increased abundance of Bacteroides uniformis in healthy volunteers, whereas Justesen et al. (2022) reported an enriched abundance and biomarker potential of Bacteroides uniformis in CRC diagnosis. No research has yet disclosed the role of Bacteroides uniformis in senescence progression. Thus, future studies should investigate the association between this species and age-related diseases. In addition, we noticed that Dorea longicatena, Adlercreutzia equolifaciens, and Eubacterium hallii were positively associated with BMI, a result supported by both statistical methods. Dorea longicatena and Eubacterium hallii are obesity-related microorganisms reported by previous studies (Yan et al., 2021; Companys et al., 2021). Interestingly, Eubacterium hallii also had an indirect positive relationship with Ruminococcus sp. in our co-occurrence network analysis. Ruminococcus sp. is another species identified by the random-forest method as positively related to BMI, especially in obese CRC patients. Further studies could focus on the possible interactions among these obesity-associated bacteria in CRC patients.
To further develop a diagnostic panel for CRC screening, we established several random-forest models by integrating different numbers of significant microbial candidates. The mean AUC values in both the training and validation cohorts exceeded 0.80, indicating relatively stable model performance in CRC diagnosis. Although some species were significantly enriched or depleted in the CRC group compared with controls, not all of them were selected as significant predictors in the final prediction model. This indicates that complex microbial interactions, rather than a single species, possibly trigger the formation of CRC. Thus, a microbial panel consisting of a series of significantly contributing species, rather than a single microorganism, should be applied clinically as a non-invasive biomarker.
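For readers less familiar with the AUC metric used here, the following sketch computes it from predicted scores with the rank-based (Mann-Whitney) formulation and averages it over several evaluation splits. It is an illustrative example only: the function names `roc_auc` and `mean_auc` and the toy labels and scores are assumptions, and the code does not reproduce the random-forest models of this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample.
    Tied scores count as half a win (hypothetical helper)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(splits):
    """Average the AUC over a list of (labels, scores) evaluation splits."""
    return sum(roc_auc(y, s) for y, s in splits) / len(splits)


# Toy example: 1 = CRC, 0 = control; scores are predicted CRC probabilities.
split_a = ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
split_b = ([1, 0], [0.8, 0.2])
print(roc_auc(*split_a))
print(mean_auc([split_a, split_b]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, so a mean above 0.80 in both cohorts means that cases are ranked above controls in most case-control pairs.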
Our study has some limitations. First, it is a case-control study with a relatively small sample size. Although our quantitative comparisons were mainly based on a paired-sample design, it would be better to reduce inter-individual differences in other environmental factors by recruiting prospective cohorts, which would enable tracking of microbial changes in the same person before and after CRC. Second, we were unable to access detailed clinical characteristics such as tumor size, location, and stage, or the results of other diagnostic tests, because this information is unavailable in the GMrepo database. Future development of microbiota databases would therefore benefit from more detailed clinical variables associated with specific phenotypes, which could promote further analysis of the interaction between gut microbiota and additional environmental factors.
In conclusion, we demonstrated gut microbial changes in CRC patients and established a microbial panel as a non-invasive method for CRC diagnosis. The identification of key species and their associated genes should be further emphasized to clarify the causal relationship between microbial organisms and CRC development. This study may prompt further mechanistic studies to interrogate the causal molecules in microbiome-linked CRC carcinogenesis.