Gut mucus layer degradation is associated with aggressive cervical cancer phenotype


 Background

Gut microbiome community composition differs between cervical cancer (CC) patients and healthy controls, and increased gut diversity is associated with improved outcomes after treatment. We proposed that functions of specific microbial species adjoining the mucus layer may directly impact the biology of CC.
Results

In this study, we examined metagenomes of rectal swabs in 41 CC patients using whole-genome shotgun sequencing and found a significant association between molecular functions encoded by the metagenomes with markers of aggressive cancer including initial tumor size and stage. Profiling of the molecular function abundances and their distributions identified 2 microbial communities co-existing in each metagenome but with distinct metabolism and taxonomic structures. Community A (Clostridia and Proteobacteria predominant) was characterized by high activity of pathways involved in stress response, mucus glycan degradation and utilization of degradation byproducts. This community was prevalent in larger, advanced stage tumors. Conversely, community B (Bacteroidia predominant) was characterized by fast growth, active oxidative phosphorylation, and production of vitamins. This community was prevalent in small, early-stage tumors.
Conclusions

Based on these results, we propose that increased mucus layer degradation is associated with a more aggressive cervical cancer phenotype.

In most studies, bacterial DNA extracted from fecal samples is used as proxy for evaluation of the gut microbiome composition. However, the microbial communities adjoining and populating the outer mucus layer of the intestine are different from stool as a whole [12] and may be more relevant in modulating immune function. Indeed, composition of bacteria within the stool primarily re ect the dietary habits of patients, while microbial communities adjoining the mucus layer [13,14] also protect the gut epithelial surface that holds many important immune and metabolic functions. The difference may be more pronounced when the mucus layer is somehow disturbed as a result of a disease [15].
The mucus layer consists of an outer mucus layer, which is a habitat for commensal bacteria, and a smaller inner layer, which is attached to the epithelial cells and is lacking commensal bacteria [16]. This protective coatings are the crucial interface between the host and microorganisms [17] and are mainly (~ 89%) comprised of glycans (polysaccharides) attached by O-glycosylation to Muc2 (mucin 2) protein [18]. Galactose and N-acetylgalactosamine are major components of mucus glycan in normal human descending colon, although the structure varies in other parts of the intestine [19]. The mucus glycans provide attachment sites for bacteria [20] and select species that can in uence the mucus layer structure and the host [21] by degrading mucin glycans, metabolizing products of the degradation, and producing metabolites that affect the host, positively or negatively. According to recent studies [22,23], mucins have also potent bene cial properties and can modulate microbial phenotypes suppressing quorum sensing, bio lm formation, and secretion of toxins. Thus, the microbial community adjoining and populating the outer mucus layer occupies a rather distinct environment within the intestine and can be affected not only by diet, but also by the mucus barrier and by the host.
Previous bacterial 16S rRNA gene sequencing studies found that stool and rectal swab microbiotas from the same subject were similar and that rectal swabs can be used as a proxy for fecal samples [13,14,24]; However, in light of the ndings referenced above, the mucus layer of the intestine, including rectum, can signi cantly modify phenotype of the adjoining microbial community. This means that even if the structure of the community is similar between stool and swabs, metabolic functions implemented by the same organisms in these different environments may be different. In addition, the use of Whole Genome Shotgun sequencing (WGS) instead of 16S RNA can provide ner resolution of the microbial community structure and functions and can reveal features not found by 16S.
In this study we propose that molecular functions encoded by metagenomes, further referred as MA (Mucus Associated) because they populate the mucus layer or adjoin to it, are different among CC patients and likely associate with their clinical characteristics. We therefore explore metabolic characteristics of microbial communities populating the mucus layer in rectums of 41 CC patients. Rectal swabs were used to sample the communities and then to sequence them by WGS.

Study cohort
Patients were enrolled on an IRB approved (MDACC 2014 − 0543) study for longitudinal sampling of the gut microbiome (Table S1). For all 41 patients enrolled on study, median age was 49 (range 29 to 72 years), and median BMI was 28.6 (range 17.5 to 46.7). Most patients were Non-Hispanic White (44%) or Hispanic (44%) and had squamous carcinoma (74%) and stage II disease (54%). Median tumor size was 5.4cm (range 1.8cm to 11.5cm). More information on clinicopathological characteristics of the patients is provided in Table S2.
Functional richness of MA metagenome associates with larger tumor size and with advanced CC stage We rst explored associations of total number of predicted genes and functions with patient and tumor characteristics (Table S3, Fig. 1a). A higher numbers of unique functions was positively correlated with increased tumor size (R = 0.41, P = 0.008) (Fig. 1b). Importantly, a large number of unique KEGG orthologous groups (KOs) was signi cantly associated not only with large tumor size (P = 0.02), but also with advanced stages (III/IV) of cervical cancer (P = 0.005; Fig. 1a).
Unsupervised hierarchical clustering reveals microbial communities with 2 distinct pro les of molecular functions Next, we explored whether speci c genes or functions were associated with patient and tumor characteristics using unsupervised hierarchical clustering of abundances. We selected 2,396 KOs that were found in greater than 30 samples. These KOs clustered into 2 large clusters and 4 sub-clusters ( Fig. 1c). In general, Cluster 2 was comprised of smaller tumors ("ST cluster"), when compared with Cluster 1 ("LT cluster"; t-test P = 0.06). Within the LT cluster, Sub-cluster 1.1 was dominated by younger patients (t-test P = 0.05) with large tumors (t-test P = 0.03).
The KEGG pathway analysis of the ST cluster and the LT cluster using KEGG mapper [38] also identi ed a set of pathways enriched in each of the clusters (Fig. 1c). The ST Cluster was enriched with pathways associated with rapid microbial cell proliferation, including ribosome biogenesis (Additional File 1: Figure  S3a), DNA repair (Additional File 1: Figure S3b), oxidative phosphorylation (Additional File 1: Figure S3c), and synthesis of peptidoglycan and lipopolysaccharides (Fig. 1d). Almost all enzymes involved in synthesis of biotin (vitamin B7) were also found in the ST cluster (Fig. 1d).
The LT cluster (Cluster 1 in Fig. 1c) was enriched with pathways associated with bacterial stress response, as indicated by activation of quorum sensing (Additional File 1: Figure S4a), sporulation, and degradation of glycan. The glycan degradation pathways included high activity of the KEGG pathway, "Other glycan degradation" (Fig. 1e). In addition, pathways involved in utilization of glycan degradation byproducts, such as fructose and mannose (Additional File 1: Figure S5a), and especially galactose (Additional File 1: Figure S5b) were enriched. Galactose is the major constituent (~ 85%) of glycans in normal human gastric mucus [44] and was also enriched in the LT cluster. Both branches of the pentose phosphate pathway, the oxidative branch maintaining redox balance in stress conditions and non-oxidative branch that supply glycolysis with intermediates derived from pentoses [45], were also more active in the LT cluster (Additional File 1: Figure S6). Most enzymes involved in utilization of ethanolamine (Fig. 1e), a breakdown product of human and bacterial cell membrane [46], and enzymes involved in production of ornithine (Fig. 1e), a precursor for synthesis of polyamines, such as putrescine [47,48], were found enriched in the LT cluster. The biological processes of agellar assembly and chemotaxis (Additional File 1: Figure S7) were also up-regulated in the LT cluster, revealing the importance of mobility for bacteria residing in the metagenomes [49].
Supervised comparison of metagenomes associated with largest and smallest tumors To con rm the association of large tumor size with glycan degrading microbial communities, we divided patient samples into "Large Tumor" (LT-group) and "Small Tumor" (ST-group) groups. Two groups of metagenomes were selected to quantify abundances of the communities. The LT-group included 14 metagenomes of patients with the largest tumors, and the ST-group had 14 metagenomes of patients with the smallest tumors (Fig. 2a). The LT-group were also signi cantly more enriched in patients with stage III/IV of CC then ST-group (chi-squared contingency table tests p-values 0.02), since tumor size is roughly correlated with stage in the 2009 FIGO staging system used in the study. The diversity and richness of molecular functions was signi cantly higher in LT versus ST-group ( Fig. 2b and Fig. 2c respectively). There were 1,496 differentially abundant KOs between the groups with most of the KOs (75%) were more abundant in the LT-group. The density plots of the differentially abundant KOs were also different between the groups. The KO abundances were signi cantly more scattered around the mean (standard deviation is 2.6) in LT-group when compared with ST-group (standard deviation is 1.3). The KO abundances mean value was almost twice as low in LT-group than in ST-group ( Fig. 2d and Fig. 2e respectively). The observed density plots indicated that 2 distinct populations (sets) of molecular functions, referred to as LT-abundant and ST-abundant, might coincide in each metagenome but be most representative in either LT-or ST-group of samples. This conclusion was con rmed by density plots of all KOs found in LT-group ( CAZyme Enzyme families associated with small tumors To associate the LT-abundant KO population with the glycan degradation we compared abundances of carbohydrate-active enzymes (CAZyme) families in this population and in the STabundant KO population using CAZy database [39]. The database classi es CAZymes according to their functions, such as synthesis of complex carbohydrates or their hydrolysis.
Glycoside hydrolases GH, which are enzymes involved in hydrolysis of glycosidic bonds between carbohydrates or between a carbohydrate and a non-carbohydrate moiety, were highly enriched in LT abundant KO population (Fig. 2h). There were 16 GH among KOs in the population, and none in the ST abundant population, which included only 3 Glycosyl Transferases (GT); the enzymes that are mainly involved in biosynthesis of disaccharides, oligosaccharides, and polysaccharides. We further compared ST-and LT-abundant KO populations in terms of abundances of glycosidases, enzymes annotated by EC 3.2.1-involved in hydrolyzes of O-and S-glycosyl compounds. There were 32 glycosidases in 28 samples, and many of them were signi cantly more abundant in metagenomes of patients with large tumors (Fig 2i). Only one enzyme, lysozyme, was found to be signi cantly more abundant in the small tumors group of patients. Lysozyme is a known component of two-component cell lysis cassette in bacteriophages [50].

Biological processes associated with small and large tumors
The KEGG pathway analysis of differentially abundant KOs was further used to identify speci c biological processes represented by LT-abundant and ST-abundant KOs (Fig. 2j).  Figure S10A and S10B). These positive associations are well documented in the literature as very signi cant in studies of large number of CC patients (>10,000) [51]. In this study, patients with positive nodes also had signi cantly increased activity of glycan degradation pathway (p=0.005) and ornithine biosynthesis (p=0.05) in MA metagenomes ( Fig.3a and Fig. 3b). Activity of the pathways was also signi cantly higher in patients with stage III/IV tumors ( Fig. 3c and Fig. 3d), while activity of ribosome biogenesis was higher in metagenomes of stage I/II patients (Additional File 1: Figure S11).
Taxonomic associations with tumor size Difference in taxonomic structure of MA metagenomes in LT-group versus ST-group can be seen at different taxonomic levels (Fig. 4a, Additional File 1: Figure S12-S14) with dominance of Phylum Firmicutes (Class Clostridia and Order Clostridiales) and Phylum Proteobacteria in LT-group and Phylum Bacteroidetes (Class Bacteroidia, Families Prevotellaceae and Ruminococcaceae in ST-group. Like the differentially abundant molecular functions, the density plots of differentially abundant species were also different between the groups (Fig. 4b) with a greater median abundance of species in ST-group. No difference, however, was seen between LT-and ST-groups of metagenomes in terms of species richness, diversity, and evenness (Fig. 4c).
Taxonomic annotation of most abundant contigs (OTUs) in each group and comparison of their abundances between groups revealed that many of the contigs belonged to Order Bacteroidales, and they were signi cantly more abundant in ST-group. There were 5 putative species of Genus Porphyromonas (P. endodontalis, P. sp COT_239OH1446, P. bennonis, P. sp HMSC065F10 and P. uenonis) and 2 species of Genus Prevotella (P. Timonensis and P. Buccalis) among Bacteroidales (Fig. 4d, Additional File 2: Table S3). None of the genera was signi cantly abundant in LT-group. Contigs annotated by Class Clostiridia and other levels of taxonomy in the Class were abundant in L-group including 12 putative species of Family Ruminococcaceae. Only 1 species of Ruminococcaceae (AF41_9) was found more abundant in ST group. The observations were consistent with results obtained by comparison of LT-and ST-groups using LEfSe ( Fig. 4e and Additional File 1: Figure S15). The latter analysis has also identi ed less abundant taxa enriched in one of the groups, such as Class Tissierellia, Genera Ezakiella, Murdochiella and Hungatella, in ST-group, as well as Families Ruminococcaceae, Lachnospieraceae, Enterobacteriaceae, and Order Enterobacterales in LT group (Additional File 1: Figure S16).

Taxonomic associations with pathways
To identify taxa utilizing the glycan degradation pathway in each metagenome we selected contigs that encode enzymes of the pathway and quanti ed abundances of taxa representing these contigs. Overlay of the enzyme abundances across metagenomes with the relative abundances of taxa involved in glycan degradation (Fig. 4f) showed that major taxa degrading glycan are different between MA metagenomes of patients with small and large tumors. Clostridiales is more likely to encode enzymes of the glycan degradation in the LT-group, while Bacteroidales is the major taxon that encode the enzymes in ST-group. The result is consistent with the difference in taxonomic structure of MA metagenomes described in the previous paragraph. Further correlation analysis revealed a signi cant positive association of the glycan degradation pathway score with Bacteroidales, Clostridiales, and with non-classi ed taxa. A signi cant negative correlation was found with Actinomycetales, Chlamydiales, and Tissierellales (Additional File 1: Figure S17). In general, there was signi cant variation across taxa involved in glycan degradation. The variation was especially dramatic in MA metagenomes of patients with large tumors.

Discussion
Herein, we have described signi cant differences in the gut metagenomes adjoining the mucus layer (MA metagenome) between cervical cancer patients with small, early stage tumors and large, advanced stage tumors at the time of treatment. Speci cally, MA metagenomes of patients with small tumors were enriched in molecular functions associated with biosynthesis and rapid proliferation. By contrast, MA metagenomes of patients with large tumors were enriched with functions associated with bacterial stress response and degradation of glycan. In addition, the utilization of ethanolamine and functions related to bacterial cells mobility were more abundant among young patients with large tumors and advanced stage (Fig. 1c), suggesting particularly aggressive tumor biology. Activity of glycan degradation and of ornithine biosynthesis were also signi cantly associated with the advance tumor stage and the presence of cancer cells in lymph nodes, while activity of the ribosome biogenesis signi cantly associated with low stage tumors and better recurrence-free survival. This dominance may lead to signi cant changes in the metabolic environment of the intestine that directly impact tumor growth and progression.
It is possible, this aggressive tumor biology is fueled by differing molecular functions in MA metagenome. We have described two distinct microbial consortia, which we refer to here as proliferating and mucus degrading (Fig. 4g). Both consortia co-exist; although patients with larger tumor size and advanced stage show an obvious dominance of the mucus degrading consortia over proliferating. The tumor promoting effects of glycan degradation and utilization of major degradation products, such as galactose and other pentoses [19], may help drive this aggressive tumor growth. Glycan degradation leads to production of metabolites with known tumor growth promoting effects and with resistance to radio-and chemotherapy, such as ornithine [52] and ceramides [53,54]. The metabolites are produced as by-products of glycan degradation (Fig. 1e) and can be further metabolized either by bacteria or by host and give oncogenic effect after their metabolization. Ceramide itself, for example, is a powerful tumor suppressor, but products of its metabolism are potent tumor survival factors associated with resistance to cancer therapies [53]. Ornithine can be used by the consortium or by the host to synthesize the polyamines putrescine and spermidine (Additional File 1: Figure S8), which cause tumorigenic transformation and tumor progression [55]. If the gut metabolites cross the gut mucosa into blood through the mucus layer, they may promote systemic proliferation of tumor cells. In addition, the intensive ethanolamine utilization (Fig. 1e) by the LT-group metagenomes, especially pronounced in younger patients (Fig. 1c), also suggests a more pathogenic environment in the intestine. The set of enzymes involved in the pathway are similar to the eut operon in Salmonella enterica serovar Typhimurium [56], a known gastrointestinal pathogen. Many other species that contain the eut genes are also pathogens [46], because ethanolamine is derived from the membrane phospholipid phosphatidylethanolamine, an important component of all bacterial and eukaryotic cells. The intensive utilization of ethanolamine may indicate a degradation of the colonic epithelium [46] that secretes peptides inhibiting bacterial penetration into the inner colonic mucus layer and blocking bacterial mobility [17,57]. Indeed, the activity of pathways associated with mobility, such as chemotaxis and agella assembly, are also enhanced in MA metagenomes of the LT-group, particularly in sub-cluster of younger patients (Fig. 1c).
Conversely, the increased synthesis of peptidoglycan and lipopolysaccharides by the proliferating consortia may improve the intestinal microenvironment and the immune response in patients with smaller tumors [58]. The increased production of vitamins B7 (biotin) and B12 (cobalamin) may also be important for immune cell function and reduction of cellular oxidative stress [59][60][61]. Overall, these metabolites may have a tumor suppressive effect.
The ndings suggest a potential therapeutic intervention in CC by shifting the balance in MA metagenome from mucus degrading consortium to proliferating one for suppression of the tumor growth and enhancement responses to treatment.
This study does not make clear what comes rst, the biological mechanism of mucus degradation or tumor progression. It is likely that both processes are tightly related and in uence each other. Cervical cancer typically develops over years. Differences in the microbial ora of a large tumor may re ect changes in the tumor microenvironment directly affected by local tumor progression, or gut metagenome functions may allow increased tumor growth via systemic factors released into the bloodstream. Alternatively, tumor size may be a surrogate for other aggressive tumor biology, such as hypoxia or necrosis, which increase as tumors grow. Although this study associates large tumor size with degradation of mucus layer and reveal by-products of the degradation with known tumor promoting effects, it doesn't provide direct evidence for an association of tumor-promoting metabolites with tumor size. It is possible that tumor progression and associated changes of organismal processes may be primary and drive changes in the mucus layer microenvironment and in molecular functions of the MA community through production of cytokines and metabolites [17] or by changing the mucin glycans that are also important host signals selecting microorganisms and making them less pathogenic [23]. Indeed, the study reveals stress-associated changes in the mucus layer microenvironment that are indicated by reduced proliferation, and by increased activity of quorum sensing and sporulation in mucus degrading consortium.
Importantly, taxonomic structure differed between groups only when functions and pathways were considered, suggesting taxonomic comparison alone is not adequate. Further studies using mouse or cell culture models may be necessary to decipher biological mechanisms underlying the discovered association between CC and mucus layer degradation. In addition, because potential mechanisms underlying the association of the tumor size and mucus layer degradation are not speci c to CC, it is very likely that similar associations between the tumor size and structure and function of MA metagenomes may be seen in other cancer types.
Shifting the balance between proliferating and mucus degrading consortia in MA metagenomes may directly affect cancer therapy and drug toxicity. The mucus layer plays a very important protective role in the intestine [62], therefore its unbalanced degradation can impair sensitivity to or ability to resist toxic effects of drugs. Recently, a growing body of evidence linked microorganisms to cancer therapy e cacy and toxicity [8][9][10]. Our previous study of metagenomes in fecal samples of melanoma patients with different response to anti-PD-1 therapy [10] reveals that degradation associated pathways are enriched in non-responders, while biosynthetic pathways are enriched in responders. We were not able to associate catabolic pathways with the mucus layer degradation in this study; however, we can speculate that the degradation can be responsible for the enrichment of catabolic processes in non-responding patients. The small number of samples and the use of fecal samples instead of rectal swabs might complicate identi cation of the mucus layer degradation in the study. It is likely that fecal samples are not optimal for the evaluation because they represent a different environment in the intestine. Further studies using WGS are necessary to compare fecal and swab samples from the same patient when mucus layer degradation is evaluated on association of patient's response to cancer therapies, cancer stage, or patient survival.

Conclusions
Overall, these ndings suggest potential interventions related to mucus enhancement that could improve outcomes and decrease cervical cancer progression. Further mechanistic studies are important to explore if the disturbed balance between proliferating and mucus degrading consortia can be corrected by diet, probiotics, antibiotics, or other interventions.

Patient samples collection and processing
Forty-one patients treated at the University of Texas, M.D. Anderson Cancer Center or the Lyndon B. Johnson Clinic (LBJ) at Harris Health with a diagnosis of CC participated in the study. The patients were enrolled on an Institutional Review Board (IRB) approved prospective protocol. Informed consent was obtained to collect rectal swabs. All patient samples were acquired prior to receiving any treatment.
Samples were collected from each out of 41 patients by a clinician performing rectal exams using a matrix-designed quick-release Isohelix swab. The swabs were initially stored in 20ul of proteinase K and 400 μl of lysis buffer (Isohelix) within 1 h of sample collection and then were frozen and kept at -80°C.

WGS sequencing and metagenome assembly
Whole Genome Shotgun sequencing was performed on genomic bacterial DNA (gDNA), which was extracted to maximize bacterial DNA yield from specimens while keeping background ampli cation to a minimum [25,26]. Libraries were constructed from each sample using the KAPA Hyper Prep Kit (Kapa Biosystems, Wilmington, MA, USA) and sequenced using the Illumina HiSeqX platform with the 2 x 150 bp paired-end read protocol. Sequencing reads were derived from raw BCL les which were retrieved from the sequencer and called into fastqs by Casava v1.8.3 (Illumina). The appropriate read preparation steps, such as quality control, trimming and ltering, and host DNA removal prior to further analysis, were performed using an in-house pipeline (Additional le 1: Figure S1b; Additional le 2: Table S1). Brie y, paired-end raw sequence reads were ltered and trimmed using BBMap [27]. The trimmed reads were mapped to a hg38 reference database (GCA_000001405.28) using bowtie2 [28] to remove host contamination. The cleaned reads were then assembled to longer sequences (contigs) using both MEGAHIT [29] and metaSPAdes [30]. The assembled contigs were ltered, to remove those that were smaller than 1000bp, and binned using MetaBAT2 [31]. The assembled and binned contigs were used for gene predictions by Prodigal [32]. Annotation of the genes by KEGG ortholog groups (KOs) was implemented by KofamKOALA [33], and taxonomic classi cation of the contigs was done by CAT and BAT [34]. The read coverage of each assembled contig was calculated by aligning the cleaned reads directly to the contig using BBMap and by counting the mapped read by featureCounts [35]. The read coverage, GC content, taxonomic and functional annotations for each gene/contig were summarized using a Perl script. All the software tools were running with default parameters if not speci ed. Versions and sources of the software tools or packages used in the pipeline are listed in Additional le 2: Table S1. Output of the assembly pipeline was a set of assembled contigs for each sample, their taxonomic annotation and read coverage, gene predictions for each contig, and functional annotation of each gene by KEGG Orthologous Group (KO) if found.

Computational analysis of assembled genomes
Annotations of assembled contigs for all samples were aggregated into 1 table, referred to as the Metagenome Function Abundance (MFA) table (Additional le 1: Figure S1b). Each column in the table represents a metagenome, and each row represents a predicted known molecular function annotated by KO. Thus, each cell in the MFA table has a quantity of the speci ed (KO id) molecular function in the speci ed sample (Sample id). Quanti cation of the molecular functions by MFA table is explained using a toy example provided in Additional le 1: Figure S2. The MFA table was normalized using total number of reads in each sample and then multiplied by 1,000,000. Further analysis of the normalized MFA table included unsupervised and supervised methods, annotation by biological processes and pathways, and integration with clinicopathological characteristics of the patients (Additional le 1: Figure S1c).

Unsupervised analysis of MFA table
The MFA table was ltered to select KOs found in at least 30 patients, log2-transformed and centered using the median, and then hierarchically clustered by the open source clustering software [36] with default parameters. The inferred clusters of samples were tested for association with clinical information including age, CC stage and tumor size, using sher.test() and wilcox.test() functions in R. The inferred clusters of KOs were searched for overlap with known KEGG referenced pathways and modules using the "Search&Color Pathway" tool in KEGG mapper [37,38] with further manual curation of the results.

Supervised analysis of MFA table
The analysis was used to nd differentially abundant KOs in MA metagenomes of patients with large versus small tumors. Samples for the analysis were selected by sorting all 41 samples by tumor size of the CC patients and assigning top 14 samples with largest tumors to LT-group and bottom 14 samples with smallest tumors to ST-group for the comparison. All KOs found in at least 1 sample were included in the analysis. Fisher's test and Mann Whitney tests were used to nd differentially abundant KOs between the group with p-value < 0.05 (either test) without adjustment. The KO was considered enriched in LTgroup if Fisher's test p-values < 0.05 and the KO is more common in the group. If the Fisher's test p-value >0.05 but the Wilcoxon test p-value <0.05 than the enrichment was inferred by difference in mean abundances between ST-and LT-groups. The differentially abundant KOs were searched for overlap with known KEGG referenced pathways and modules as described above. The overlapping KOs were used to infer the pathways enrichment score calculated as ratio of difference between percentage of KOs overlapped with the pathway in ST-and LT-groups to sum of the percentages. Only top scored pathways that include 9 and more KOs for LT-group and 4 and more KOs for ST-group are considered.
Annotation of the differentially abundant KOs by carbohydrate-active enzymes (CAZy) families was implemented using the mapping table between the KO ID and CAZy family ID [39]. The table was

Survival analysis
The analysis was used to associate the Recurrence Free Survival (RFS) probability with clinicopathological characteristics and KEGG pathway [40]. Activity of each pathway found to be associated with the tumor size was quanti ed for each patient using the mean value of log2-transformed normalized abundances of KOs involved in the pathway. The mean value is referred to as the pathway activity score. To generate Kaplan-Meier plots for the pathway activity score, we categorized the score based on a cut-off set at the rst quartile. Observations for each variable falling within the rst quarter were labeled as "Low" and those greater than the rst quartile cut-off were labelled as "High". The analysis was implemented for each pathway identi ed as differentially enriched in either small or large tumor group metagenomes. We used WHO standards to set cut-offs to categorize BMI into "Underweight/Normal weight" vs. "Overweight/Obese". We used a cut-off set at the median for other continuous clinicopathological characteristics, namely, age and tumor size. Cox proportional hazards regression analysis [41] was used to evaluate predictive value of KEGG pathways and clinical characteristics on RFS time. The R packages 'survival' and 'survminer' were used to compute the survival curves, and to visualize them as Kaplan-Meier plots.
Taxonomic characterization of the cohort All predicted contigs annotated by Length (L), Depths (D), and by taxonomy were used to quantify abundances of the unique set of species in 14 samples of LT group and in 14 samples of ST group. For each sample, depth of the contig was multiplied by its length and the values were summarized for all contigs that belong to the same taxa to quantify the taxa abundance.
The values were normalized using the sum of the values for all contigs identi ed in the sample.
The OTU (operating taxonomic units) table was created by merging the abundances of all identi ed taxa for all 28 samples. Two approaches were used to nd differentially abundant putative taxa between LT and ST group. In the rst approach, the OTUs were selected by Fisher's exact test (for rare OTUs) and Mann Whitney tests (for common OTUs) with p-value cutoff < 0.05 (either test) without adjustment. In the other approach, the Linear discriminant analysis (LDA) Effect Size (LEfSe) [42] available as a Galaxy [43] module was used to determine the taxa that are differentially abundant between LT and ST groups. The analysis was run with default parameters, except the threshold for the logarithmic LDA score for discriminative taxa. The threshold was set to 2.5 instead of 2.
To reveal taxa involved in glycan degradation pathway in each metagenome we have selected all contigs that encode enzymes involved in the pathway. The set of unique taxa from taxonomic annotation of the contigs was considered as the representative taxa of the pathway. To create an OTU table of the representative taxa, each OTU were quanti ed by multiplying the length and the depth of each selected contig and by summarizing the obtained values for all contigs encoding the OTU. The 100% stacked area plot was used to visualize taxonomic structure of the community involved in the pathway across all studied samples. The Pearson correlation coe cients were used to evaluate association of the log2transformed score of the KEGG pathway with the abundance of each identi ed taxon.

Declarations
Ethics approval and consent to participate Informed written consent was obtained from eligible patients willing to participate in our study.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. Raw sequencing reads generated by WGS are available in the BioProject database under accession PRJNA702617(https://www.ncbi.nlm.nih.gov/bioproject/PRJNA702617).   Association of KEGG pathway and clinicopathological characteristics. a Increased activity of Glycan degradation pathway and b Ornithine biosynthesis in metagenomes of patients with positive nodes on imaging. Categorization of patients into the negative (red color) and positive (green color) groups according to absence or presence of cancer cells in lymph nodes respectively revealed that 6 out of 12 patients with negative nodes had very low activity of Glycan degradation pathway. c Increased activity of Glycan degradation pathway and d Ornithine biosynthesis in metagenomes of patients with stage III/IV cervical cancer. e, f Negative association of Phosphotransferase system and of Glycan degradation pathway with recurrence free survival (RFS). Both pathways were enriched in metagenomes of patients with large tumors. The pathway activity was quanti ed in each metagenome by the mean value of log2transformed normalized abundances of KOs involved in the pathway; the value is referred to as the pathway activity score, which was categorized as high in the rst quartile and low in the rest. g, h Signi cant positive association of Ribosome biogenesis and of DNA repair pathways with RRS. Both pathways were enriched in metagenomes of patients with small tumors.