Study design and patient demographic characteristics
There were a total of 329 CRC matched tumor tissues and adjacent non-tumor tissues collected from Chinese patients. After extracting DNA and detecting the enriched microbiota of all tissue samples by 16S rRNA Illumina MiSeq sequencing, the microbial diversity analysis, LEfSe and metabolic pathway prediction were performed in the discovery cohort which included 30 neural invasion patients with CRC (group NI), 23 vascular tumor-thrombus patients with CRC (group VT) and 35 double negative patients with CRC (group DN). Microbial community analysis was performed among the tumor tissues as well as among the non-tumor tissues so that we could observe the differences in the composition and structure of microorganisms clearly in tumor or non-tumor tissues with different invasive states and identified differential microbiota in the three groups. Then, patients in prognosis cohort were divided into two groups according to the abundance of the most abundant genera respectively, and the relationship between the abundance of several microbiota and the prognosis of patients with invasive status was discussed. Finally, we analyzed the correlation between the abundance of specific microbiota and the clinical characteristics of 286 patients in the clinical characteristics cohort, and illustrated the association of the abundance of specific microbiota in tumor tissues with the overall survival time of patients in the overall cohort (Fig. 1).
The clinical characteristics of the patients in the discovery cohort are shown in Table S1. There were no significant differences in age, gender, tumor diameter, location or differentiation among the three groups (all p > 0.05).
Estimation of sequencing depth
A data set of 203,358 high-quality sequences was obtained from 88 pairs of CRC tissue samples. The median length of each sample was 430 bp. Rarefaction curves based on Faith’s PD showed that the sequencing depth of all samples was close to saturation, indicating that the sequencing results are enough to reflect the diversity contained in the current samples (Fig. 2A). Among them, the genetic diversity of tumor tissues is higher than that of non-tumor tissues. On the one hand, a Venn diagrams illustrating the relationship among the groups showed that compared with DN tumor tissues, 1554 of 9248 OTUs were unique in VT tumor tissues and 1945 OTUs were unique in NI tumor tissues (Fig. 2B). On the other hand, a Venn diagram also showed that only 1725 of 8678 OTUs were unique in VT non-tumor tissues and 1878 OTUs were unique in NI non-tumor tissues (Fig. 2C).
Gut microbial alpha diversity
The alpha diversity indices of Chao1, Observed species and Good’s coverage, which mainly indicated the species richness in the sample, were analyzed by using QIIME2 for estimating the richness and diversity of microbial community in these samples. There were significant differences between the VT tumor tissues and adjacent non-tumor tissues in the Chao1, Observed species and Good’s coverage indices (p = 0.00063, 0.0034, and 0.00021, respectively) (Fig. 2D). The Chao1 and Good’s coverage indices also significantly differed between the NI tumor tissues and adjacent non-tumor tissues (p = 0.032 and 0.022, respectively), but not for the Observed species index (p = 0.7) (Fig. 2E). However, there were no significant differences found in these indices between the DN tumor tissues and adjacent non-tumor tissues (p = 0.73, 0.4, and 0.43, respectively) (Fig. 2F). These results indicate that the non-tumor samples have a greater species richness than tumor samples with VT and NI.
The Chao1 and Observed species indices also significantly differed between the VT and DN tumor tissues (p = 0.025 and 0.019, respectively), except in the non-tumor tissues. (Fig. 3A-B). Meanwhile, there were no significant differences between the NI and DN tissues as well as between the NI and VT tissues (all p > 0.05) (Fig. 3A-B).
Composition of the microbial community
The variations in the microbial compositions between paired samples from three groups were determined using PERMANOVA to measure beta diversity. The neural invasion, vascular tumor-thrombus and double negative tumor tissues and their adjacent non-tumor tissues had a statistically significant difference in beta diversity, according to Jaccard distance (p = 0.001, Table 1). However, the Jaccard distance did not significantly differ between the VT and DN tissues as well as between the NI and DN tissues (all p > 0.05, Table 1), but not between the VT and NI tumor tissues (p = 0.016 and q = 0.048, Table 1).
Table 1
Beta diversity assessed by Jaccard distances.
Metric | Model | F | R2 | p-Value | q-Value |
Jaccard distances | | | | | |
| Tumor vs. Non-tumor (vascular tumor-thrombus CRC) | 2.891 | 0.062 | 0.000*** | 0.000*** |
| Tumor vs. Non-tumor (neural invasion CRC) | 2.892 | 0.047 | 0.000*** | 0.000*** |
| Tumor vs. Non-tumor (double negative CRC) | 2.617 | 0.037 | 0.000*** | 0.000*** |
| Vascular tumor-thrombus vs. Double negative (tumor) | 1.132 | 0.020 | 0.193 | 0.290 |
| Vascular tumor-thrombus vs. Double negative (non-tumor) | 1.121 | 0.020 | 0.212 | 0.636 |
| Neural invasion vs. Double negative (tumor) | 1.017 | 0.017 | 0.362 | 0.362 |
| Neural invasion vs. Double negative (non-tumor) | 0.886 | 0.014 | 0.636 | 0.636 |
| Vascular tumor-thrombus vs. Neural invasion (tumor) | 1.436 | 0.027 | 0.016* | 0.048* |
| Vascular tumor-thrombus vs. Neural invasion (non-tumor) | 0.958 | 0.018 | 0.477 | 0.636 |
The q-Value was determined with Benjamini-Hochberg method. CRC, colorectal cancer. *p < 0.05; ***p < 0.001; *q < 0.05; ***q < 0.001. |
The PCoA analysis based on the unweighted UniFrac distance matrix also proved that significant clustering was shaped in the tumor tissues of the three groups, which was distinguished from the non-tumor tissues (Fig. S1A-C). We intuitively discovered the most different microbial species between the tumor and non-tumor tissues using LEfSe (Fig. S1D-F). When the LDA threshold was 4.0, p_Thermi, c_Deinococci, o_Thermales, f_Thermaceae, f_Oxalobacteraceae, g_Thermus and g_Cupriavidus were enriched in tumor tissues. Simultaneously, p_Proteobacteria c_Alphaproteobacteria, o_Sphingomonadales and f_Sphingomonadaceae were enriched in the non-tumor tissues compared with those in all tumor tissues. More interestingly, f_Pseudomonadaceae and g_Pseudomonas were highly enriched in the non-tumor tissues of the VT and DN groups, but not in the NI group.
Beta diversity analysis between the three groups was also evaluated through PCoA based on the unweighted UniFrac distance matrix and NMDS (Fig. S2). Similar to the alpha diversity analysis results, no significant clustering was observed between the NI and DN groups. Although there were differences in the alpha diversity, no significant clustering was shaped between the VT and DN groups.
At the phylum level, the five most fundamental phyla in the 88 CRC tumor tissue samples were Proteobacteria, Thermi, Firmicutes, Bacteroidetes and Actinobacteria, accounting for 98% of the total community (Fig. 4A). According to the relative abundance of the phyla in each group, the general microbial community composition could be clearly seen. At the genus level, the four dominant genera in the tumor tissues were Cupriavidus, Acinetobacter, Sphingobium and Thermus (Fig. 4B). To further evaluate the microbial community differences between the tumor and non-tumor tissues of the three groups, we conducted LEfSe. As expected, we did not find meaningful differences in the non-tumor tissues; the different microbial species in the tumor tissues are shown in Fig. 4C. When the LDA threshold was 2.0, the gut microbiota of the patients with NI was enriched with f_Pseudomonadaceae, g_Pseudomonas, g_Pelomonas, g_Ralstonia, g_Acidovorax and g_Enhydrobacter. In the DN CRC tumor tissues, increased abundances of o_Desulfovibrionales, f_Desulfovibrionaceae, g_Desulfovibrio and g_Subdoligranulum were observed. Only g_Gemmiger was enriched in the patients with VT. The highest relative abundance, LDA score, p value and q value of each taxon among the three groups were also described in Table S2. Furthermore, the different microbiota detected in LEfSe differed between the tumor tissues of the VT and NI groups, as also shown in the previous beta diversity analysis. A total of 18 enriched species were identified (Fig. 4D). g_Cupriavidus, f_Pseudomonadaceae and g_Pseudomonas were enriched in group NI, while g_Meiothermus and g_Gemmiger were enriched in group VT.
Metabolic pathway difference analysis between the tumor and non-tumor tissues
We analyzed the differences in the metabolic pathways between the tumor and adjacent non-tumor tissues using PICRUSt2 and MetaCyc in the discovery cohort. As shown in Table S3, there were fewer differential metabolic pathways in the VT group than in the NI and DN groups, and there was no significantly down-regulated metabolic pathway in the VT tumor tissues. Several metabolic pathways, such as mandelate degradation I, mandelate degradation to acetyl-CoA, and cytidine monophosphate-legionaminate biosynthesis I, were enriched in the tumor tissues of the NI and VT groups. Numerous metabolic pathways, including glyoxylate assimilation, 3-hydroxypropanoate cycle, and superpathway of the 3-hydroxypropanoate cycle, were significantly reduced in the tumor tissues of the DN and NI groups compared with those in the non-tumor tissues. Moreover, seven metabolic pathways, including mevalonate pathway I, superpathway of sulfolactate degradation, peptidoglycan biosynthesis, teichoic acid (poly-glycerol) biosynthesis, geranylgeranyl diphosphate biosynthesis and lactose, superpathway of glycerol degradation to 1, 3-propanediol, and galactose degradation, highly increased in the tumor tissues of the three groups. However, we did not find significantly different metabolic pathways among the tumor tissues of the three groups, which indicates that there is no unique metabolic pathway in tumor tissues with different invasion statuses.
Association of the microbial communities with outcomes and clinical characteristics
Based on the results of the three groups in the discovery cohort, we evaluated the association of high or low abundance microbiota in the tumor tissues and the survival of the patients with CRC with VT or NI in the prognosis cohort (31 and 51 patients, respectively). As shown in Table 2, we used the Kaplan–Meier method and found that among the top 18 highly abundant microbial communities at the genus level, the abundance of Herbaspirillum was significantly associated with the survival time of the patients with CRC with VT or NI (p = 0.026 and 0.014, respectively). A high abundance of Cupriavidus in the NI CRC tissues was associated with a significantly shorter survival time (p = 0.019). Even though Pseudomonas, Pelomonas, Ralstonia and Acidovorax were enriched in tumor tissues of the NI group, their abundance has no significant relationship with the prognosis of patients with neural invasion (all p > 0.05).
Table 2
Association of microbial communities with overall survival of neural invasion and vascular tumor-thrombus CRCs.
Microbial communities | OS in neural invasion patients with CRC (days) | OS in vascular tumor-thrombus patients with CRC (days) |
High | Low or negative | p-Value | High | Low or negative | p-Value |
Cupriavidus | 457 | 500 | 0.019* | 443 | 582 | 0.546 |
Acinetobacter | 448 | 510 | 0.558 | 459 | 574 | 0.193 |
Sphingomonas | 423 | 536 | 0.145 | 567 | 459 | 0.497 |
Thermus | 465 | 492 | 0.224 | 481 | 550 | 0.180 |
Pseudomonas | 482 | 474 | 0.717 | 480 | 552 | 0.591 |
Sphingobium | 493 | 463 | 0.528 | 586 | 439 | 0.486 |
Brevundimonas | 505 | 450 | 0.547 | 491 | 536 | 0.413 |
Massilia | 460 | 497 | 0.180 | 454 | 580 | 0.707 |
Ochrobactrum | 433 | 526 | 0.150 | 395 | 642 | 0.491 |
Shigella | 535 | 420 | 0.553 | 486 | 545 | 0.157 |
Caulobacter | 523 | 432 | 0.125 | 464 | 568 | 0.683 |
Lactobacillus | 542 | 413 | 0.113 | 496 | 534 | 0.544 |
Arthrobacter | 457 | 501 | 0.710 | 552 | 474 | 0.871 |
Herbaspirillum | 433 | 525 | 0.014* | 485 | 546 | 0.026* |
Pelomonas | 436 | 522 | 0.134 | 492 | 539 | 0.733 |
Aquabacterium | 446 | 511 | 0.652 | 520 | 509 | 0.609 |
Acidovorax | 486 | 470 | 0.734 | 427 | 608 | 0.151 |
Ralstonia | 442 | 517 | 0.691 | 497 | 533 | 0.204 |
We define the level of bacterial taxa (high and low or negative) based on the median abundance of microbiota in tumor tissues. OS data are shown as the mean. CRC, colorectal cancer; OS, overall survival. *p < 0.05. |
Considering that the small sample size in our previous analysis may not be widely representative, we verified the association of the status of the different taxa with the clinical characteristics of 286 patients with CRC. We selected four microbes mentioned in the previous experiment, including Cupriavidus, Herbaspirillum, Thermus and Pseudomonas. Interestingly, the status of Cupriavidus was associated with NI (p = 0.042, Table 3), pN stage (p = 0.006, Table 3) and clinical stage (p = 0.001, Table 3). An abundance of Herbaspirillum was also associated with the tumor diameter (p = 0.001, Table 3), NI (p = 0.011, Table 3), pN stage (p = 0.013, Table 3), distant metastasis (p = 0.001, Table 3) and clinical stage (p = 0.001, Table 3). However, there was no significant relationship between the abundance of Herbaspirillum and the incidence of VT (p = 0.214, Table 3). Moreover, the abundance of Thermus was related to the pN stage (p = 0.013, Table S4) and clinical stage (p = 0.043, Table S4). Although Pseudomonas was highly enriched in the tumor tissues of patients with NI, its abundance was not significantly correlated with the clinical characteristics of the patients (all p > 0.05, Table S4).
Table 3
Characteristics according to the relative abundance of Cupriavidus and Herbaspirillum in CRC tissues.
Characteristics | Cupriavidus-high | Cupriavidus-low | p-Value | Herbaspirillum-high | Herbaspirillum–low or negative | p-Value |
| n = 143 (%) | n = 143 (%) | | n = 143 (%) | n = 143 (%) | |
Age (years) | 57.94 ± 12.93 | 55.07 ± 13.17 | 0.064 | 57.52 ± 12.99 | 55.49 ± 13.20 | 0.190 |
Gender | | | | | 0.233 | | | 0.811 |
Male | 76 (53) | 86 (60) | | 82 (57) | 80 (56) | |
Female | 67 (47) | 57 (40) | | 61 (43) | 63 (44) | |
Tumor diameter (cm) | 4.51 ± 2.24 | 4.86 ± 2.24 | 0.190 | 4.26 ± 2.08 | 5.11 ± 2.33 | 0.000*** |
Neural invasion | | | | | 0.042* | | | 0.011* |
Absent | 90 (63) | 106 (74) | | 88 (62) | 108 (76) | |
Present | 53 (37) | 37 (26) | | 55 (38) | 35 (24) | |
Vascular tumor-thrombus | | | | | 0.890 | | | 0.214 |
Absent | 109 (76) | 108 (76) | | 104 (73) | 113 (79) | |
Present | 34 (24) | 35 (24) | | 39 (27) | 30 (21) | |
pT stage | | | | | 0.657 | | | 1.000 |
T1 + T2 | 10 (7) | 12 (8) | | 11 (8) | 11 (8) | |
T3 + T4 | 133 (93) | 131 (92) | | 132 (92) | 132 (92) | |
pN stage | | | | | 0.006** | | | 0.013* |
N0 | 63 (44) | 86 (60) | | 64 (45) | 85 (59) | |
N1 + N2 | 80 (56) | 57 (40) | | 79 (55) | 58 (41) | |
Distant metastasis | | | | | 0.227 | | | 0.000*** |
M0 | 112 (78) | 120 (84) | | 105 (73) | 127 (89) | |
M1 | 31 (22) | 23 (16) | | 38 (27) | 16 (11) | |
Clinical stage | | | | | 0.000*** | | | 0.000*** |
1 + 2 | 50 (35) | 77 (54) | | 48 (34) | 79 (55) | |
3 + 4 | 93 (65) | 66 (46) | | 93 (65) | 66 (46) | |
The p-values for categorical and continuous variables were analyzed by Chi-square test and Fisher’s exact test, and continuous variables were analyzed by two independent-samples t-test. CRC, colorectal cancer. *p < 0.05; **p < 0.01; ***p < 0.001. |
Additionally, we tested the association of the different microbiota with the outcomes of the overall cohort based on detailed follow-up records through the Kaplan-Meier analysis. The median survival time of the patients with CRC with highly abundant Cupriavidus was significantly shorter than that of the patients with scarce Cupriavidus (p = 0.025, Fig. 5A). Meanwhile, there was a significant difference between Thermus-high patients and Thermus-low/negative patients (p = 0.019, Fig. 5B). However, in patients with different abundances of Pseudomonas in tumor tissues, there was no significant difference (p = 0.5, Fig. 5C). It can be also observed that the difference in the survival time between Herbaspirillum-high and Herbaspirillum-low/negative patients was statistically significant (p = 0.00031, Fig. 5D).