Stool pattern is associated with prevalence of tumorigenic bacteria of the microbiota and plasma and fecal fatty acids in healthy Japanese adults

Background: Colibactin-producing Escherichia coli containing polyketide synthase (pks + E. coli) has been shown to be involved in colorectal cancer (CRC) development through gut microbiota analysis in animal models. Stool status has been associated with potentially adverse gut microbiome profiles from fecal analysis in adults. We examined the association between stool patterns and the prevalence of pks + E. coli isolated from microbiota in fecal samples of 224 healthy Japanese individuals.
Results: Stool patterns were determined by factorial analysis using a previously validated questionnaire including stool frequency, volume, color, shape, and odor. Factor scores were classified by tertile. The prevalence of pks + E. coli was determined using specific primers for pks + E. coli in fecal samples. Plasma and fecal fatty acids were measured via gas chromatography-mass spectrometry. The prevalence of pks + E. coli was 26.8%. Three stool patterns accounted for 70.1% of all patterns seen (factor 1: lower frequency, darker color, and softer shape; factor 2: higher volume and harder shape; and factor 3: darker color and stronger odor). The multivariable adjusted odds ratio (95% confidence interval) of the prevalence of pks + E. coli for the highest versus the lowest third of the factor 1 score was 3.16 (1.38 to 7.24; P for trend = 0.006). This stool pattern correlated with some plasma and fecal fatty acids. No other stool patterns were significant.
Conclusions: These results suggest that stool pattern may be useful for evaluating gut microbiota, including the presence of tumorigenic bacteria and fecal fatty acids. It may provide useful insight for effective early discovery strategies for CRC.


Background
Colorectal cancer (CRC) is the second most common cause of cancer death and the third most common cancer in the world. Among all cancers, CRC accounts for approximately 9% of mortality and 10% of incidence [1]. The majority of the loss of disability-adjusted life years for CRC comes from years of life lost (95%), with years lived with disability contributing only 5% [2]. Reducing the number of CRC patients would not only have a significant impact on overall longevity, but would also significantly reduce mean medical care costs. Thus, the discovery of simple targets that lead to early discovery of CRC risk is needed.
The fecal microbiome may contribute to CRC development because it shows changes in the very early stages [3].
Colibactin is a complex secondary metabolite produced by certain gut Escherichia coli strains harboring the genomic island (clb gene cluster) encoding polyketide synthase (pks + E. coli) [4][5][6][7][8][9][10][11]. Colibactin has been shown to cause genomic instability in mammalian cells by inducing DNA interstrand cross-links via DNA alkylation [7,10,11]. This phenomenon leads to DNA double-strand breaks [6][7][8][9] and cell cycle arrest [8]. Therefore, the presence of pks + E. coli producing colibactin in the gut microbiome may be a risk factor for CRC and may be a useful target for identifying groups at high risk for both incidence and progression.
Stool status variables, such as stool shape [12][13][14], frequency [12,15], and color [16], have been associated with potentially adverse gut microbiome profiles from fecal analysis in healthy individuals and in patients with acute gastroenteritis. Using a stool pattern approach, which considers a more comprehensive overview of the stool status, could provide more interpretable findings than studying single stool variables, since some stool status variables are related to each other, such as stool frequency and shape [12].
We have previously reported that dietary intake was inversely related to the prevalence of pks + E. coli in Japanese adults [17]. Indeed, dietary intervention has been reported to improve stool status variables, including stool frequency and shape, with a simultaneous increase in both fecal [18] and plasma [19] fatty acids. Therefore, to further clarify the association between stool status and the prevalence of pks + E. coli, the association between stool status and fecal and plasma fatty acids needs to be better understood. To our knowledge, these relationships have not been studied.
In this study, we aimed: 1) to evaluate the association between stool patterns and prevalence of pks + E. coli isolated from fecal samples, and 2) to investigate the relationship between stool patterns and plasma and fecal fatty acids in healthy Japanese individuals. We hypothesized that stool pattern variables were associated with the prevalence of pks + E. coli and the levels of plasma and fecal fatty acids because stool patterns may reflect the gut microbiota.

Results
Participant characteristics
Table 1 shows the participant characteristics in the analysis cohort. Of the 224 included participants, 60 (26.8%) were pks + E. coli positive. Comparing participants who were pks + E. coli positive and negative, the prevalence of pks + E. coli tended to be higher in men and in alcohol drinkers. Other variables were not significantly associated.
b Categorical variables are shown as number of individuals (%) and were analyzed using a χ² test.

Reproducibility Of Self-reported Stool Status
We evaluated the reproducibility of the self-reported stool status variables, including stool volume, shape, color, and odor.

Multivariate analysis for stool patterns and pks + E. coli carriers
Table 3 shows the stool patterns extracted by factorial analysis in this population. Three stool patterns accounted for 70.1% of all patterns seen (factor 1: lower frequency, darker color, and softer shape; factor 2: higher volume and harder shape; and factor 3: darker color and stronger odor). We evaluated the relationship between the prevalence of pks + E. coli and stool patterns by multivariate logistic analysis (Table 4). The multivariable adjusted odds ratio (95% CI) of the prevalence of pks + E. coli for the highest versus the lowest third of the factor 1 score was significant at 3.16 (1.38 to 7.24; P for trend = 0.006), but no significance was seen with any other stool pattern. In addition, stool status variables such as stool color and shape were significantly associated with the prevalence of pks + E. coli (Supplementary Table 1).
Ref, reference. The prevalence rates of pks + E. coli are shown as numbers of people and percentages. The details of the three stool patterns are as follows. Factor 1: lower frequency, darker color, and softer shape; factor 2: higher volume and harder shape; and factor 3: darker color and stronger odor.
a Statistical analysis was carried out using the likelihood ratio test for multivariate logistic analysis, and the odds ratios (ORs) and 95% confidence intervals (CI) were estimated. Bold p values are statistically significant (p < 0.05).
b Model 1 was adjusted for age (continuous) and sex (male or female).
c Model 2 was as model 1 plus mutual adjustment for BMI (continuous), family history of cancer (yes or no), smoking status (never smoker, past smoker, or current smoker), step counts (continuous), alcohol drinker (yes or no), and green tea consumption (continuous).
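The odds ratios in Table 4 come from multivariable logistic models, which cannot be reproduced from the text alone. As a simpler illustration of how an odds ratio and its 95% confidence interval relate, the following minimal sketch computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The function name and the cell counts are hypothetical and are not taken from the study; only the reported interval (1.38 to 7.24) comes from the text.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and positive      b: exposed and negative
    c: unexposed and positive    d: unexposed and negative
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (NOT taken from the study).
example_or, example_low, example_high = odds_ratio_wald_ci(a=30, b=45, c=13, d=62)
print(f"OR = {example_or:.2f} (95% CI {example_low:.2f} to {example_high:.2f})")

# An interval that is symmetric on the log scale has the point estimate as
# the geometric mean of its bounds. Applied to the reported interval:
reported_low, reported_high = 1.38, 7.24
geometric_midpoint = math.sqrt(reported_low * reported_high)
print(f"geometric midpoint of reported CI = {geometric_midpoint:.2f}")
```

The geometric midpoint of the reported bounds is close to the reported point estimate of 3.16, which is what one would expect if the interval was constructed on the log-odds scale. An adjusted analysis like Model 1 or Model 2 would require the individual-level covariates and a logistic regression routine.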
Correlation between stool status and plasma and fecal fatty acids
Table 5 shows the association of stool patterns with fatty acids derived from plasma and fecal samples. The factor 1 score was significantly positively correlated with fecal isobutyrate, isovalerate, valerate, and hexanoate, and was significantly negatively correlated with plasma eicosenic acid and α-linoleic acid, as well as fecal propionate and succinate. Other stool patterns had no significant correlation. The correlations between stool status and plasma and fecal fatty acids are presented in Supplementary Tables 2 and 3, respectively.
Statistical analysis was performed by Spearman's correlation analysis. Statistical significance is indicated as follows: p < 0.05, single asterisk (*); p < 0.01, double asterisk (**). A positive correlation indicates that participants with higher adherence to a stool pattern tended to have higher levels of the plasma or fecal fatty acid; a negative correlation indicates that they tended to have lower levels.
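To make the sign convention above concrete, the following minimal sketch computes Spearman's rank correlation as the Pearson correlation of average ranks. The helper names and the data are hypothetical and are not study values; in practice a statistics package would also supply the p value used for the asterisks.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical factor scores and fatty acid levels (illustration only).
factor_scores = [-1.2, -0.4, 0.1, 0.8, 1.5]
fatty_acid_levels = [9.0, 7.5, 7.9, 4.2, 3.1]
print(f"rho = {spearman_rho(factor_scores, fatty_acid_levels):.2f}")
```

A negative rho here would be read as in Table 5: participants with a higher factor score tend to have lower levels of that fatty acid.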

Discussion
In this study, we investigated the relationship between the prevalence of pks + E. coli and stool patterns through a population-based cohort study. Even after adjusting for confounders, we found a stool pattern (factor 1) that was significantly associated with the prevalence of pks + E. coli. In addition, this stool pattern was correlated with certain plasma and fecal fatty acids. As far as we know, this is the first study to show an association of a stool pattern with not only the prevalence of pks + E. coli, but also plasma and fecal fatty acids. These associations suggest that stool pattern may reflect the gut microbiota, including the presence of tumorigenic bacteria.
Certain risk factors for CRC incidence have been identified in epidemiological studies, including smoking, obesity, diabetes, high consumption of alcohol, and consumption of red and processed meats [20]. These identified CRC risk factors have not only been associated with increased CRC incidence, they have also been associated with potentially adverse gut microbiome profiles [21]. Recently, the prevalence of pks + E. coli isolated from the colonic epithelium has been reported to be higher in patients with familial adenomatous polyposis [22], inflammatory bowel disease [23], and CRC [22] compared to healthy individuals. Thus, it is important to evaluate the association between environmental exposure factors and the prevalence of tumorigenic bacteria in the gut microbiota.
Our results demonstrated a significant association between the prevalence of pks + E. coli and stool pattern. Animal models mimicking the natural transmission of E. coli producing colibactin from mothers to neonates have shown lower rates of Firmicutes taxa, Proteobacteria taxa, and microbial species richness, as well as higher DNA repair function compared to the sham model [24]. This model has also illustrated an association with gut homeostasis activities, including renewal of the mature epithelium and occurrence of crypt fission [25]. Stool status variables, such as shape [12,13], frequency [12,15], and color [16], have been associated with higher microbial species richness profiles from fecal analysis in healthy individuals and patients with acute gastroenteritis. Our results support these findings. A previous study showed that the majority of CRC deaths were attributed to non-screening in the United States [26]. Although a causal relationship between the prevalence of pks + E. coli as a tumorigenic bacterium and an increased risk of CRC has not been well established, this study may underscore the potential benefits of evaluating the presence of pks + E. coli as a target for early prognostication in populations with a high risk of CRC. Our results also suggest that stool pattern might be a marker associated with the prevalence of tumorigenic bacteria in healthy individuals. Longitudinal objective monitoring of a person's stool status from serial samples of excreta taken at home, as previously suggested [27], may be the most reasonable and cost-effective method for early detection of risk factors of CRC.
Nutrients derived from ingested food are utilized by the gut microbiome, which generates metabolites such as short chain fatty acids (SCFAs), a preferred energy source for colonocytes [28,29]. These metabolites can suppress inflammation and carcinogenesis through effects on immunity, gene expression, and epigenetic modulation [28][29][30][31]. Some plasma fatty acids have been noted to be inversely or positively associated with the presence of colon adenomas [32] and increased risk of CRC in middle-aged adults [33]. In addition, studies in CRC patients have noted lower levels of propionate and butyrate [34] and higher levels of valeric acid, isobutyric acid, and isovaleric acid [35] in SCFAs derived from fecal samples compared to healthy controls. Production of SCFAs has been shown to be reduced in patients with diarrhea compared to those without diarrhea [36]. In addition, inhibition of SCFA synthesis by administration of polyethylene glycol and antibiotics has been reported to result in diarrhea [37]. It has been reported that distal colon transit, reflected in stool frequency, was associated with not only plasma acetate and fecal SCFAs [38], but also with microbiota diversity, especially the Firmicutes taxa (Faecalibacterium, Lactococcus, and Roseburia) [39]. Our results indicated that the stool pattern that showed a relationship with the prevalence of pks + E. coli was also significantly correlated with certain plasma fatty acids, including α-linoleic acid [40], and certain fecal SCFAs, such as propionate [34] and isovaleric acid [35]. These were also associated with higher incidences of CRC, supporting previous findings. Taking information from previous studies, we speculate that gut microbiota and dietary components interact to generate biologically active molecules including SCFAs, which influence gut secretion and motility, and that this could play a fundamental role in stool status [30,31], as well as affect the prevalence of pks + E. coli.
While detailed mechanisms and causal relationships should be clarified in further studies, we can conclude that fecal matter is not just a simple waste material, but could be a possible tool to assess the gut microbiota, screening for the presence of tumorigenic bacteria and specific fecal fatty acids via comprehensive examination of variables including color, shape, frequency, volume, and odor.
The strength of this study is in finding a verified association between stool patterns and plasma and fecal fatty acids. The multifaceted, self-reported questionnaires used to assess the stool status had previously been validated against objective fecal characteristics as well [41]. In addition, we showed that stool status self-reported twice was highly reproducible, and we believe that misclassification is unlikely when stool status is assessed in this manner.
Thus, this study might generate a new hypothesis for the association between the prevalence of pks + E. coli as a tumorigenic bacteria and stool pattern.
However, this study has a number of methodological limitations. First, although we minimized the effect of confounders by using multivariate analysis to adjust for known covariates, this was a cross-sectional study, so we are unable to infer the temporal order or direct causality of the observed association between stool pattern and the prevalence of pks + E. coli. Second, this study detected the clb gene cluster in DNA extracted from fecal samples, not in DNA from isolated pure cultures of E. coli. A previous study evaluated the prevalence of pks + E. coli by the selective cultivation method [23]. However, our previous study [17] demonstrated that the prevalence of pks + E. coli detected in fecal matter is relatively similar to previous reports investigating the prevalence of pks + E. coli by the selective cultivation method [23]. Therefore, it will be necessary to evaluate the concordance rate of the prevalence of pks + E. coli defined using these two different methods in the same subjects. Third, although our results showed that a softer stool shape was negatively associated with the prevalence of pks + E. coli, it is unclear whether participants with diarrhea, who have softer stools, have a lower prevalence of pks + E. coli or whether the reverse was true. In addition, we were unable to completely exclude systematic error due to self-reporting. We also could not account for unmeasured confounding factors associated with stool status in this observational study. For example, stool color is mainly characterized by stercobilin (urobilin), an orange pigment which is the oxidized metabolite of urobilinogen [42]. Stercobilin derived from bile pigment is responsible for the brown color of human feces. Since we did not measure stercobilin and bile acids directly in all participants, we could not account for their possible effects on the results, though our results were similar after adjusting for bile acids in a subset of participants with available bile acid data.
Our results need to be verified in further studies that include patients and community-dwelling residents with symptoms such as diarrhea and constipation. Finally, there is the possibility of sampling bias because the participants in this study were more health-aware than the general population. Of 750 participants in the NEXIS cohort study, 259 adults agreed to participate. As the participation rate was relatively low, selection bias may have occurred. In addition, all participants lived in the Tokyo metropolitan area in Japan, and their mean age was 58 years. These limitations may prevent the generalization of our results. Therefore, prospective cohort studies with larger randomized samples should be done to further investigate the association between the prevalence of pks + E. coli and stool pattern.

Conclusion
These results suggest that an adverse stool pattern is positively associated with the prevalence of pks + E. coli. Given the rapidly increasing incidence and mortality rates for CRC around the world, its early discovery is important both for enabling people to stay healthy and for limiting the burden of healthcare-related costs. Therefore, stool pattern may be useful for evaluating gut microbiota, including the presence of tumorigenic bacteria and fecal fatty acids, and it may provide useful insight for effective early discovery strategies for CRC.

Participants and study procedure
This cross-sectional study utilized data from the Nutrition and Exercise Intervention Study (NEXIS) cohort study [17]. This cohort study has been managed by the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN). This study was approved by the Research Ethical Review Committee of NIBIOHN. After study procedures and the risks associated with participation in this study were explained, written informed consent was obtained from all participants before data acquisition. This study was carried out in accordance with the principles of the Declaration of Helsinki.
A kit for fecal collection and storage and the questionnaire for the lifestyle survey were mailed to the participants. They were instructed to complete the questionnaire to record pertinent lifestyle variables (e.g. medical history, smoking habit, dietary intake, and stool status) and to collect fecal samples approximately 7 mm in diameter (soybean size) at home. Dietary intake was evaluated using a previously validated brief-type self-administered diet history questionnaire [43]. To measure daily step counts as an objective form of physical activity, we used a triaxial accelerometer (Actimarker; EW4800; Panasonic Co., Ltd, Japan). The participants were instructed to bring their fecal samples and questionnaires to the NIBIOHN within a week after answering the questionnaires and finishing the serial fecal collection. There, they received physical and health examinations such as anthropometry and blood tests. Investigators, registered dieticians, or nurses checked the questionnaires and interviewed those with unclear responses or unanswered questions to confirm answers. Blood samples were used for biochemical examination of conventional risk factors for lifestyle-related diseases, with close attention placed on variables such as low-density lipoprotein-cholesterol, hemoglobin A1c, and triglycerides. The collected feces, serum, and plasma were immediately placed in sealed containers and stored separately by sample type in a -20 °C freezer to avoid cross-contamination between samples.

Confirmation of pks + E. coli by PCR
Bacterial genomic DNA was extracted from frozen fecal samples. Details of this protocol have been reported elsewhere [17,44]. To confirm the presence of pks + E. coli, we performed PCR to amplify the genes from the clb cluster using bacterial genomic DNA as a template. We used the PrimeSTAR GXL DNA polymerase or the SapphireAmp Fast PCR Master Mix (Takara Bio Inc., Shiga, Japan) for PCR analysis. The primers used in the PCR experiments are as follows: clbB forward primer: 5'-tgttccgttttgtgtggtttcagcg-3', reverse primer: 5'-gtgcgctgaccattgaagatttccg-3'; clbJ forward primer: 5'-tggcctgtattgaaagagcaccgtt-3', reverse primer: 5'-aatgggaacggttgatgacgatgct-3'; clbQ forward primer: 5'-ctgtgtcttacgatggtggatgccg-3', reverse primer: 5'-gcattaccagattgtcagcatcgcc-3'. In this analysis, individuals whose samples yielded amplicons of the appropriate length for all three clb genes were defined as pks + E. coli positive.
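The positivity rule above (all three clb genes amplified at the expected length) can be sketched as a small lookup plus a check. This is a minimal sketch: the primer sequences are copied from the text, while the dictionary layout, the function names, and the boolean band-call inputs are assumptions made for illustration. The expected amplicon sizes are not stated in the text, so the sketch takes a yes/no call per gene rather than a size.

```python
# Primer pairs (forward, reverse), written 5' to 3', as listed in the text.
CLB_PRIMERS = {
    "clbB": ("tgttccgttttgtgtggtttcagcg", "gtgcgctgaccattgaagatttccg"),
    "clbJ": ("tggcctgtattgaaagagcaccgtt", "aatgggaacggttgatgacgatgct"),
    "clbQ": ("ctgtgtcttacgatggtggatgccg", "gcattaccagattgtcagcatcgcc"),
}


def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.lower()
    if not seq or set(seq) - set("acgt"):
        raise ValueError("sequence must be non-empty and contain only a, c, g, t")
    return (seq.count("g") + seq.count("c")) / len(seq)


def is_pks_positive(band_calls):
    """True only if every clb gene has an amplicon of the expected length.

    band_calls maps gene name to a boolean call made by the analyst
    (hypothetical input format); a missing gene counts as not amplified.
    """
    return all(band_calls.get(gene, False) for gene in CLB_PRIMERS)


# Basic sanity report on the primers.
for gene, (forward, reverse) in CLB_PRIMERS.items():
    print(gene, len(forward), len(reverse),
          f"{gc_fraction(forward):.2f}", f"{gc_fraction(reverse):.2f}")

# Hypothetical band calls for two samples (illustration only).
print(is_pks_positive({"clbB": True, "clbJ": True, "clbQ": True}))
print(is_pks_positive({"clbB": True, "clbJ": False, "clbQ": True}))
```

Requiring all three genes rather than any one is the conservative choice described in the text: a sample with only one or two bands is treated as negative.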

Evaluation Of Stool Status
Stool status was assessed using the multifaceted self-reported questionnaire called the "intestinal visible sheet", which covers the five stool status variables (frequency, volume, color, shape, and odor) and was previously developed and validated against objective measurements of fecal characteristics including fecal weight, moisture, hardness, and color in adults [41]. In the NEXIS cohort study, all participants were given similar stool questionnaires assessing both "habitual stool status" and "stool status when collecting fecal samples (excluding stool frequency)" at the same time. We evaluated the reproducibility of the results by comparing these variables because variables evaluated based on self-reporting questionnaires may be affected by recall bias. We used the habitual stool status data in all analyses because it simultaneously evaluated all five stool statuses.

Measurement Of Plasma Fatty Acids
To investigate plasma and fecal fatty acids, we used the frozen stored plasma and fecal samples. Total lipids were extracted from 0.4 mL plasma following the methodology reported by Folch et al. [45]. After hydrolysis with KOH, fatty acids were extracted with hexane, with tricosanoic acid (C23:0) as an internal standard. Methyl esterified fatty acids were prepared with trimethylsilylating reagent and subjected to gas chromatography (GC). GC-electrospray ionization mass spectrometry (ESI/MS) analysis was performed using a Hitachi 063 gas chromatograph (Hitachi High-Tech Corporation, Tokyo, Japan) equipped with a hydrogen flame ionization detector. A glass column, 40 m × 0.3 mm, was coated with diethylene glycol succinate (DEGS). Nitrogen gas was used as a carrier gas and delivered at a flow rate of 25 mL min⁻¹. The column, detector, and injection port temperatures were 180 °C, 220 °C, and 260 °C, respectively. This analysis measured 24 types of plasma fatty acids with chain lengths from 12 to 24 carbons.

Measurement Of Fecal Short-chain Fatty Acids
Five to 10 mg feces were mixed with 90 µL MilliQ and 10 µL 2 mM internal standard containing acetic acid, butyric acid, and crotonic acid for 5 min. The mixture was homogenized with 50 µL HCl and 200 µL diethyl ether, and centrifuged at 3,000 rpm for 10 min at room temperature. Eighty µL of the supernatant organic layer was transferred to a new glass vial and combined with 16 µL N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide (MTBSTFA) as a derivatization reagent. The vials were immediately capped tightly with an electronic crimper (Agilent), incubated for 20 min in an 80 °C water bath, and then left at room temperature in the dark for 48 hours for derivatization. The derivatized samples were analyzed using a GC-MS-TQ8040 gas chromatograph mass spectrometer (Shimadzu Corporation, Kyoto, Japan), and the injection was performed using an AOC-20i auto injector (Shimadzu Corporation, Kyoto, Japan). The capillary column was a BPX5 column (0.25 mm × 30 m × 0.25 µm; Shimadzu GLC). Pure helium gas was used as a carrier gas and delivered at a flow rate of 1.2 mL min⁻¹. The head pressure was 72.8 kPa with split (split ratio 30:1). The injection port and interface temperatures were 230 °C and 260 °C, respectively. This analysis measured 10 types of fecal SCFAs (C1:0-C6:0).

Statistical analysis
Participant characteristics were compared between participants with and without pks + E. coli. The variables compared were chosen in accordance with the baseline characteristics used in a previous study [17]. Continuous variables were shown as mean and standard deviation, with differences between the two groups evaluated using the unpaired t-test. Categorical variables were shown as numbers and percentages, with differences between the two groups evaluated using the chi-square test.
The agreement, adjacent agreement, and disagreement from the twice-evaluated stool status variables were expressed as number and percentage. Disagreement was defined as a difference of more than three categories between each variable. In order to evaluate the reproducibility of the variables in self-reported stool status, we used a weighted κ statistic with 95% CI [46].
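As an illustration of the weighted κ statistic, the following minimal sketch computes Cohen's weighted kappa for two ordinal ratings of the same participants. The text does not state whether linear or quadratic weights were used, so the weighting scheme here is an assumption (linear by default), as are the function name and the example data. The 95% CI is not computed.

```python
def weighted_kappa(first, second, n_categories, power=1):
    """Cohen's weighted kappa for ordinal ratings coded 0..n_categories-1.

    power=1 gives linear disagreement weights, power=2 quadratic weights.
    """
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    k = n_categories

    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the extreme cells.
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagreement_observed = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagreement_expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if disagreement_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - disagreement_observed / disagreement_expected


# Hypothetical habitual vs. collection-day ratings on a 5-category scale.
habitual = [0, 1, 2, 2, 3, 4, 1, 2]
at_collection = [0, 1, 2, 3, 3, 4, 2, 2]
print(f"weighted kappa = {weighted_kappa(habitual, at_collection, 5):.2f}")
```

With ordinal stool categories, weighting penalizes a two-category discrepancy more than an adjacent one, which is why a weighted rather than unweighted κ suits the agreement and adjacent-agreement framing above.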
To extract the primary stool patterns, we used factorial analysis with varimax rotation (orthogonal transformation) to derive non-correlated factors [47]. This approach maintained a greater interpretability because each factor could be noted independent of the others with distribution explained by the variance among the individual components. We considered the scree plot and eigenvalues to determine the number of factors to retain by minimizing the number of indicators that had high loading on one factor [47]. For these reasons, we identified three stool patterns of interest from the five stool statuses. We considered stable factor load to be scores greater than 0.4 [48]. For every participant,
E. coli carrier. KT, NK, and TK measured fatty acids in plasma and stool. All authors critically reviewed the manuscript and approved the final version, and agreed to be accountable for all aspects of the work and ensure that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.