Of the 33 patients included in the final analyses, 12 had been diagnosed with tumors of the central nervous system (CNS), 13 with solid tumors (ST), and eight with leukemia. The cohort included children ranging in age from newborn to 17 years. The median age was notably lower in the ST group, at two years (interquartile range 0–11), compared with 4.5 (3.5–6.5) for leukemia and 9.5 (3–15) for CNS tumors. An initial non-metric multidimensional scaling (NMDS) analysis demonstrated the large effect age had on the microbiota in this cohort (Figure S1). Yatsunenko et al. (2012) [19] and others (Stewart et al. 2018) [20] have reported that the gut microbiota develops an adult composition by three years of age. Thus, we divided the patients into those less than three years old and those three years or older. A PERMANOVA model including both diagnosis and age group did not show a significant effect of diagnosis (p = 0.27), whereas age group was highly significant (p < 0.001). Diversity was significantly reduced (Figures S2a and S2b) in the patients younger than three years, relative both to those three years and older and to healthy controls, in terms of species richness and Shannon entropy (p < 0.002 for all comparisons, Wilcoxon rank sum test). This result was in line with expectations (e.g. Yatsunenko et al. 2012 [19]). Thus, to carry out a more realistic comparison with our kindergarten controls, we excluded children younger than three years (11 in total) from our patient cohort in the following analyses. Among the patients aged three years and over, there was no significant effect of age on the microbiota (p = 0.57, PERMANOVA). Therefore, all patients are henceforth treated as a single group in the comparisons with the control group (none of the children in the control group was younger than three years).
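For readers unfamiliar with PERMANOVA, the following is a minimal sketch of a one-way version of the test, assuming a precomputed square distance matrix and a single grouping factor. It is not the software or model used in this study (the model above included two factors); the function name `permanova_oneway`, the toy one-dimensional data, and the default of 999 permutations are illustrative assumptions.

```python
import random


def permanova_oneway(dist, labels, n_perm=999, seed=0):
    """One-way PERMANOVA on a square distance matrix (list of lists).

    Returns the pseudo-F statistic, R^2 (share of the total sum of squares
    explained by the grouping), and a permutation p-value.
    """
    n = len(labels)
    # The total sum of squares depends only on the distances, not the labels.
    ss_total = sum(
        dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)
    ) / n

    def pseudo_f(lab):
        groups = {}
        for idx, g in enumerate(lab):
            groups.setdefault(g, []).append(idx)
        # Within-group sum of squares: one term per group.
        ss_within = 0.0
        for members in groups.values():
            ss_within += sum(
                dist[i][j] ** 2
                for k, i in enumerate(members)
                for j in members[k + 1:]
            ) / len(members)
        ss_between = ss_total - ss_within
        n_groups = len(groups)
        f_stat = (ss_between / (n_groups - 1)) / (ss_within / (n - n_groups))
        return f_stat, ss_between / ss_total

    f_obs, r2 = pseudo_f(labels)

    # Null distribution: shuffle the group labels and recompute pseudo-F.
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled)[0] >= f_obs:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return f_obs, r2, p_value


# Toy example (hypothetical data): six samples on a line, two groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
toy_labels = ["A", "A", "A", "B", "B", "B"]
f_stat, r2, p_value = permanova_oneway(toy_dist, toy_labels)
print(f_stat, round(r2, 3), p_value)
```

In this toy case the grouping explains almost all of the variation (pseudo-F = 150, R² ≈ 0.974). With real community data the distance matrix would typically hold a beta-diversity measure such as Bray-Curtis dissimilarity; which measure was used here is not stated in this section.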
Children diagnosed with cancer had a significantly different microbiota composition (R² = 0.09, p < 0.001, PERMANOVA) compared with the children sampled from the kindergartens (Fig. 1). This result was confirmed by a machine learning approach: a random forest classification model achieved a classification accuracy of 97.87%, with only one misclassified sample out of 47 (p < 0.001, permutation test).
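The reported accuracy follows directly from the counts given (46 of 47 samples classified correctly). The short sketch below reproduces that arithmetic and shows one common way a permutation p-value for a classifier is computed. The helper names and the null accuracies are hypothetical; the actual permutation scheme and number of permutations used in the study are not specified in this section.

```python
def accuracy(n_correct, n_total):
    """Fraction of correctly classified samples."""
    return n_correct / n_total


def permutation_p_value(observed, null_values):
    """Share of null statistics at least as large as the observed one.

    The +1 in numerator and denominator counts the observed statistic
    itself, so the p-value can never be exactly zero.
    """
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)


# 47 samples, one misclassified.
acc = accuracy(46, 47)
print(f"{acc:.2%}")  # 97.87%

# Hypothetical null accuracies, as if obtained by refitting the model
# on data with shuffled class labels (values invented for illustration).
null_accuracies = [0.50] * 999
print(permutation_p_value(acc, null_accuracies))
```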
We observed a marginally higher Shannon entropy in the patient group relative to controls (p < 0.046, Wilcoxon rank sum test, Figure S2b), while ASV richness did not differ significantly between the groups (Figure S2a). At the phylum level, we observed an increased mean relative abundance of Actinobacteria in the patient group (p = 0.028, Wilcoxon rank sum test) (Figure S3), but no other significant differences.
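The two alpha-diversity measures compared above can be illustrated with a small sketch. It assumes per-sample ASV count vectors and the natural logarithm for Shannon entropy; the logarithm base and any rarefaction or normalization applied in the study are not stated here, and the function names are illustrative.

```python
import math


def richness(counts):
    """Number of ASVs observed (count > 0) in a sample."""
    return sum(1 for c in counts if c > 0)


def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over observed ASVs."""
    total = sum(counts)
    return -sum(
        (c / total) * math.log(c / total) for c in counts if c > 0
    )


# Hypothetical samples with equal richness but different evenness.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(richness(even_sample), shannon_entropy(even_sample))
print(richness(skewed_sample), shannon_entropy(skewed_sample))
```

The example shows why the two measures can disagree, as they do between patients and controls: both toy samples contain four ASVs, but the evenly distributed one has the higher entropy (ln 4 ≈ 1.386).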
ASV-level analysis using DESeq2 identified 11 ASVs with significantly (Benjamini-Hochberg corrected p < 0.05) differential occurrence between the patient and the control group (Table 1). Eight of these ASVs were depleted in children diagnosed with cancer. Strikingly, three of these eight were classified as Faecalibacterium prausnitzii (ASV2, ASV3, and ASV6), and all three were among the four ASVs with the highest significance for differential occurrence. The combined relative abundance of these three ASVs was significantly higher in the control cohort (p ≪ 0.001) (Fig. 2). All three ASVs were highly prevalent in the data (Figures S4a, b, and c), and additional testing confirmed the DESeq2 results (Wilcoxon rank sum test, p < 0.001 for all comparisons). No significant relationship was observed between the relative abundance of these three ASVs and age in the patient group (Figures S5a, b, and c). Furthermore, all three ASVs were found in both the patients and the controls. The other five ASVs depleted in patients were Haemophilus parainfluenzae, Alistipes finegoldii, and Sutterella wadsworthensis, as well as two members of the Clostridium IV and XIVa clusters. The three ASVs that were significantly overrepresented in the patient group were Prevotella copri, Dorea longicatena, and a member of the Clostridium XIVa cluster.
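The Benjamini-Hochberg correction applied to the per-ASV p-values can be sketched as follows. This is a generic standard-library implementation of the step-up adjustment, not DESeq2's own code, and the input p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four ASVs.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, significant)
```

An ASV is then called differentially abundant when its adjusted p-value falls below the chosen threshold (0.05 in the analysis above).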
ASV2, ASV3, and ASV6 were also variables of high importance for successful discrimination between the patients and the controls in the random forest model (Figures S6a and b). The random forest model also identified the following short-chain fatty acid-producing taxa as being among the 20 most important variables for successful classification: Roseburia intestinalis, Ruminococcus, Anaerostipes hadrus, Roseburia, Coprococcus comes, Intestinimonas, Lachnospiracea, Roseburia inulinivorans, and Blautia faecis.