pks+ Escherichia coli more prevalent in benign than malignant colorectal tumors

Some E. coli strains that synthesize the toxin colibactin, encoded within the 54-kb pks island, have been implicated in colorectal cancer (CRC) development. Here, the prevalence of pks+ E. coli in malignant and benign colorectal tumors obtained from selected Filipino patients was compared to determine the association of pks+ E. coli with CRC in this population. A realtime qPCR protocol was developed to quantify the uidA, clbB, clbN, and clbA genes in formalin-fixed paraffin-embedded colorectal tissues. The number of malignant tumors positive for the uidA gene (44/62; 71%) was not significantly different (p = 0.3428) from that of benign tumors (38/62; 61%). A significantly higher number of benign samples (p < 0.05) were positive for all three colibactin genes (clbB, clbN, and clbA) compared with malignant samples. There was also a higher prevalence of pks+ E. coli among older females and in tissue samples taken from the rectum. Hence, pks+ E. coli may not be associated with CRC development among Filipinos.


Introduction
In 2018, the World Health Organization reported that about 1 in every 6 deaths worldwide is due to cancer, with colorectal cancer (CRC) ranking second in mortality and presenting a great economic burden, particularly in low- and middle-income countries (LMIC) [1]. In the Philippines, CRC ranked third in incidence for both sexes and across all age groups in 2020 [2]. Notably, mortality and morbidity related to CRC can be prevented by early screening, as it takes up to 20 years for a non-cancerous growth to become malignant. Non-modifiable factors such as age, sex, and hereditary mutations, and modifiable factors such as smoking, alcohol intake, medications, obesity, and diet have been identified to increase the risk of developing CRC. Of particular importance is diet, since the gut microbiota depends heavily on nutraceuticals for survival [3]. The gut microbiota has been of great interest lately because of its apparent role in the pathogenesis of immune-mediated/autoimmune, metabolic, cardiovascular, neuropsychiatric, and uremic diseases, as well as cancer [4]. Bacteroides fragilis, Fusobacterium nucleatum, Streptococcus gallolyticus, Helicobacter pylori, Enterococcus faecalis, Clostridioides difficile, Clostridium septicum, and Escherichia coli are among the species associated with CRC development [5][6][7][8][9][10].
E. coli strains are the first to colonize the gastrointestinal tract within hours of birth, and they later coexist harmoniously with their human host. However, some strains synthesize toxins, including cyclomodulins, which are known genotoxins and modulators of cellular differentiation, apoptosis, and proliferation [11]. Among these cyclomodulins is colibactin, which is synthesized by a hybrid non-ribosomal peptide synthetase-polyketide synthase (NRPS-PKS) assembly line encoded within the 54-kb pks island containing 19 genes (clbA to clbS) [12]. Colibactin has been shown to induce DNA double-strand breaks and transient G2-M cell cycle arrest in host mammalian cells. The infected host cells can eventually survive, but incomplete DNA repair may lead to higher mutation rates that drive tumorigenesis [13]. In addition, infected host cells can secrete growth factors that stimulate non-infected neighboring cells, leading to their abnormal proliferation and tumor development [14]. Hence, the pks island that codes for colibactin has been associated with CRC development.
A study on Italian samples revealed that pks+ E. coli colonize precancerous lesions, polyp lesions, and the normal tissues adjacent to these lesions, but not healthy normal mucosa [15]. Increased amounts of pks+ E. coli among ulcerative colitis and CRC cases have also been noted [16]. Moreover, a meta-analysis showed that pks+ E. coli strains were more prevalent among CRC and IBD patients than among clinically healthy controls [17]. Yet, the prevalence of pks+ E. coli in colonic lavage samples from Japanese patients with CRC was not significantly different from that in clinically healthy controls [18].
In the present study, the prevalence of pks+ E. coli in malignant and benign colorectal tumors obtained from selected Filipino patients was compared. A realtime qPCR protocol was also developed, including the design of primers to quantify select colibactin genes (clbB, clbN, and clbA) using formalin-fixed paraffin-embedded (FFPE) biopsies.

Study samples and sample preparation
A total of 140 FFPE colorectal tissue samples (70 malignant paired with 70 benign) collected from January 2015 to August 2018 were retrieved from the repositories of the hospital study sites. Samples from patients with history of inflammatory bowel disease (IBD), polyposis syndromes, and Lynch syndrome were excluded from this study.
Tissue sectioning was carried out with a microtome (Leica RM2235, Germany) following standard protocols. The outer sections were stained with hematoxylin and eosin (H&E) and then sent to two (2) external evaluators (pathologists), blinded to the original diagnosis, to confirm the presence or absence of cancer cells. The inner slices were collected in nuclease-free tubes and stored at room temperature until DNA extraction and molecular analyses. Only the tissue samples for which the external evaluators and the original diagnosis of the study sites were in concordance were considered for further molecular analysis.
Anti-contamination protocols were strictly followed. The working area and the entire microtome were cleaned with 70% ethyl alcohol and acetone, respectively; and the gloves and microtome blades were changed after every specimen processed. Pure paraffin blocks were also sectioned every ten (10) tissue samples to assess any cross-contamination.

DNA extraction
The number and thickness of sections generated were based on tissue size: ten 5 μm-thick sections if ≤ 0.2 cm² in size; five 5 μm-thick sections if > 0.2 cm² but ≤ 3.0 cm²; and three 5 μm-thick sections if > 3.0 cm². DNA was extracted using an in-house protocol as described [19]. Briefly, the tissues were treated with proteinase K dissolved in an aqueous solution of tris-HCl, sodium EDTA, and Tween 20, and incubated for 24 h at 56 °C with constant mixing at 300 rpm. Proteinase K was then inactivated at 72 °C for 10 min, and the mixtures were centrifuged at maximum speed for 2 min at 4 °C to allow the paraffin layer to form. The clear aqueous phase below the paraffin film was aspirated, transferred to a new nuclease-free tube, and stored at −20 °C until use.
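The sectioning rule at the start of this subsection can be written as a small lookup. The following is a minimal sketch, not part of the published protocol; the function name `sections_for_tissue` is hypothetical, and the thresholds are taken directly from the text.

```python
def sections_for_tissue(area_cm2: float) -> int:
    """Return how many 5 µm-thick sections to cut for a tissue of the given area (cm²).

    Thresholds follow the text: <= 0.2 cm² -> 10, > 0.2 and <= 3.0 cm² -> 5, > 3.0 cm² -> 3.
    The function name is illustrative only.
    """
    if area_cm2 <= 0:
        raise ValueError("tissue area must be positive")
    if area_cm2 <= 0.2:
        return 10
    if area_cm2 <= 3.0:
        return 5
    return 3


# Example: a 1.5 cm² tissue block falls in the middle band.
print(sections_for_tissue(1.5))
```

Smaller tissues receive more sections, presumably so that the total amount of tissue entering the extraction stays roughly comparable across blocks.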

dsDNA control and primer set design for detection of pks+ Escherichia coli
Novel primer sequences designed to produce amplicons of < 100 bp were defined and experimentally tested. Sequence variation in each target gene (uidA, clbB, clbN, and clbA) was assessed using BLASTn and Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast). The tool generated 5-10 pairs of primers per target, reporting the number of base pairs, GC content, sequence position, length, and self-complementarity [20]. Because DNA is degraded by formalin fixation and paraffin embedding, the primers were designed to amplify target sequences < 100 bp in length, with similar melting temperatures (Tm), a length of 21-23 bp, and 40-60% G/C content. In silico primer tests were performed with the NetPrimer software by Premier Biosoft (http://www.premierbiosoft.com/netprimer/index.html). Gene sequences were sent to Integrated DNA Technologies, Inc. (IDT, Singapore) for synthesis.
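The design constraints above (primer length of 21-23 bp, 40-60% G/C content, amplicon < 100 bp) can be screened programmatically before in silico testing. The sketch below is illustrative only: the function names and the example sequence are hypothetical and are not the primers used in this study, and melting temperature matching is deliberately left out because the text does not specify how Tm was computed.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def meets_design_criteria(primer: str, amplicon_len: int) -> bool:
    """Check a primer against the length, G/C, and amplicon-size limits in the text."""
    return (
        21 <= len(primer) <= 23              # primer length of 21-23 bp
        and 40.0 <= gc_percent(primer) <= 60.0  # 40-60% G/C content
        and amplicon_len < 100               # short amplicon for degraded FFPE DNA
    )


# Hypothetical 22-mer with 50% G/C, paired with a hypothetical 95 bp amplicon.
example_primer = "ATGCGTACCTGAAGCTTGACCA"
print(gc_percent(example_primer), meets_design_criteria(example_primer, 95))
```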
To further check the efficiency of the designed primers, a gradient run with annealing temperatures from 59 to 61 °C was performed in a conventional PCR (T100 Thermal Cycler, BioRad, USA) using random colorectal DNA samples described above. Each 10 μl reaction mixture consisted of 1 μl DNA sample, 1 μl each of 10 μmol/l forward and reverse primers, 5 μl GoTaq Green Master Mix (Promega, USA), and 2 μl nuclease-free water. PCR products were subjected to 2% agarose gel electrophoresis for 45 min at 135 V and visualized with the Gel Doc EZ Gel Documentation System (BioRad, USA).

Optimization of PCR conditions
Optimization of realtime qPCR conditions was done through gradient runs using 2 µl of a ten-fold serially diluted synthetic dsDNA (1 ng) in 20 µl reaction mixtures and 2 µl of a random colorectal DNA sample in identical reaction mixtures. Optimum annealing temperatures were chosen by observing which temperature provided the earliest amplification and a stable plateau for both the synthetic dsDNA and the DNA sample. Accuracy and efficiency of each primer pair were established through reproducible standard curves with 90-110% efficiency readings and correlation coefficient (r²) values of 0.98-0.99, derived using CFX Maestro software version 1.0 (Bio-Rad, USA).
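For readers who wish to check the acceptance criteria independently of the instrument software, the sketch below fits a standard curve of Cq against log10 of the template amount and derives the amplification efficiency using the conventional relation E = 10^(−1/slope) − 1. This relation is an assumption here, since the text only states that CFX Maestro reported the values; the function name is hypothetical and the dilution series and Cq values are invented for illustration.

```python
import math


def standard_curve(quantities, cq_values):
    """Least-squares fit of Cq against log10(quantity).

    Returns (slope, intercept, r_squared, efficiency_percent), where efficiency
    uses the conventional relation E = 10**(-1/slope) - 1 (an assumption, not
    taken from the source text).
    """
    if len(quantities) != len(cq_values) or len(quantities) < 3:
        raise ValueError("need at least three matched dilution points")
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cq_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    efficiency = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, intercept, r_squared, efficiency


def is_acceptable(r_squared, efficiency):
    """Apply the thresholds in the text: 90-110% efficiency and r² of at least 0.98."""
    return 90.0 <= efficiency <= 110.0 and r_squared >= 0.98


# Invented ten-fold dilution series (arbitrary units) and Cq values.
quantities = [1e5, 1e4, 1e3, 1e2, 1e1]
cqs = [18.1, 21.5, 24.8, 28.2, 31.4]
slope, intercept, r2, eff = standard_curve(quantities, cqs)
print(round(slope, 3), round(r2, 4), round(eff, 1), is_acceptable(r2, eff))
```

A slope near −3.32 corresponds to a doubling of product per cycle, which is why efficiencies close to 100% are the usual target.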

Data analysis
To determine the copy number present per nanogram (ng) of each synthetic dsDNA (uidA, clbB, clbN, and clbA), the following formula was used:

$$N^{syn}_{C}(m, l) = \frac{m \cdot N_A}{l \cdot m_{DNA} \cdot k} \tag{1}$$

where $N^{syn}_{C}$ is the gene copy number as a function of the amount of synthetic dsDNA $m$ in nanograms (ng) and the number of base pairs $l$; while $N_A$, $m_{DNA}$, and $k$ are constants, in which $N_A$ is Avogadro's number, $m_{DNA}$ is the mass of one mole of a base pair, assumed to be 650 g, and $k$ is a dimensionality constant equal to $1 \times 10^{9}$, which converts the units of $m$ to grams [21].
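As a worked illustration of this copy-number calculation, the sketch below divides the mass of template (in ng) times Avogadro's number by the product of fragment length, 650 g per mole of base pair, and the 10⁹ ng-per-gram conversion, as the constants are described in the text. The function name and the 100 bp fragment length are hypothetical; the actual lengths of the synthetic dsDNA controls are not stated here.

```python
AVOGADRO = 6.02214076e23   # copies per mole
BP_MOLAR_MASS_G = 650.0    # assumed mass of one mole of a base pair, in grams
NG_PER_GRAM = 1e9          # dimensionality constant k converting ng to g


def synthetic_copy_number(mass_ng: float, length_bp: int) -> float:
    """Copies of a dsDNA fragment of length_bp base pairs contained in mass_ng nanograms."""
    if mass_ng < 0 or length_bp <= 0:
        raise ValueError("mass must be non-negative and length must be positive")
    return mass_ng * AVOGADRO / (length_bp * BP_MOLAR_MASS_G * NG_PER_GRAM)


# 1 ng of a hypothetical 100 bp fragment: about 9.26e9 copies.
print(f"{synthetic_copy_number(1.0, 100):.3e}")

# Each step of a ten-fold dilution series carries one tenth of the copies.
for step in range(4):
    print(step, f"{synthetic_copy_number(1.0 / 10 ** step, 100):.3e}")
```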
To determine the copy number of uidA, clbB, clbN, and clbA present in the colorectal tumor tissues, the following formula was used:

$$N^{tis}_{C}(x) = \alpha + \beta \ln(x) \tag{2}$$

where $N^{tis}_{C}$ is the copy number of the target gene based on the Cq value $x$, while $\alpha$ and $\beta$ are constants. The values of the constants were derived using formulas in which $n$ refers to the fold dilution series of the synthetic dsDNA. The Cq values and their corresponding cut-off values were internally determined by the CFX Maestro software.
To compare the prevalence of each gene in malignant and benign samples, their respective odds ratios were computed, and Fisher's exact test was used to determine whether these values were statistically significant at the 5% significance level. To further characterize the prevalence of each gene, the number and percentage of malignant and benign cases for each patient characteristic considered (age, sex, tumor site, and tumor grade) were tabulated. Only the samples that tested positive for all colibactin genes were evaluated for prevalence.
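The odds ratio and Fisher's exact test for a 2 × 2 table can be sketched with the standard library alone, as below. This is an illustrative implementation, not the software used in the study: the function names are hypothetical, and the two-sided p-value follows the common convention of summing the probabilities of all tables no more likely than the observed one. The example counts are the uidA results reported in the abstract (44 of 62 malignant and 38 of 62 benign samples positive).

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]].

    Raises ZeroDivisionError if b or c is zero.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# uidA counts from the abstract: rows = malignant, benign; columns = positive, negative.
malignant_pos, malignant_neg = 44, 62 - 44
benign_pos, benign_neg = 38, 62 - 38
print(odds_ratio(malignant_pos, malignant_neg, benign_pos, benign_neg))
print(fisher_exact_two_sided(malignant_pos, malignant_neg, benign_pos, benign_neg))
```

An odds ratio above 1 in this layout means the gene was detected more often in malignant than in benign samples; the p-value indicates whether that difference is distinguishable from chance at the chosen significance level.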

Characteristics of the study participants and their samples
Of the 70 pairs of malignant and benign FFPE colorectal tissue blocks retrieved from the biological repositories, 62 pairs had concordant histopathologic readings by all evaluators and were included for molecular analysis. Of the 62 malignant samples, 43 were matched with adjacent cancer-free tissues retrieved from the same CRC patients (n = 43). The remaining 19 malignant samples were matched with cancer-free tissues removed from other patients who were of the same age and sex as the CRC patients. The median age at diagnosis of the CRC patients (n = 62) was 72 years (range: 22-88 years). More samples came from female participants (n = 37/62; 59%). Most malignant tumor samples originated from the rectum (41/62; 66%) and were diagnosed as adenocarcinoma (55/62; 89%), whereas the benign tumors were mostly derived from the colon (56/62; 90%) as lines of resection (43/62; 69%). Only 39 samples had available information on tumor grade, and these were mostly poorly differentiated (25/39; 64%) (Table 1).

In-house realtime qPCR protocol for detection of uidA, clbB, clbN and clbA
Accession numbers of the target genes uidA (KT311783.1), clbB (JX280405.1), clbN (JX280402.1), and clbA (JX280403.1), obtained from the NCBI website, were noted to specify the location of each gene in the whole genome database. Several primers to detect the aforementioned genes have been reported [13,14,18,22-29]; however, new sets were designed to amplify PCR products that were < 100 bp in length, since the DNA samples were extracted from FFPE tissues. Optimum PCR conditions were achieved through several gradient runs using different annealing temperatures (50-63 °C) set at 40 cycles. Efficiency ratings of 90 to 100% and r² values ranging from 0.98 to 0.99 were obtained after several assays and replications. The primer sequences and PCR conditions designed and optimized in this study are listed in Table 2.

Correlation between pks + E. coli and clinical characteristics
The prevalence of pks+ E. coli in colorectal tissues, as deduced by quantitative PCR of the marker genes, was further examined against the patients' profiles. However, results were only reported as frequencies or percentages, and no statistical analysis was done due to the limited sample size. Results showed that the prevalence of uidA in malignant and benign samples did not differ by age, sex, or tumor site. However, the colibactin genes were more prevalent among female patients, especially those above 60 years of age, and in samples taken from the rectum (Table 4).

Discussion
This study investigated the possible association of pks+ E. coli with colorectal tumor development among selected Filipinos. The malignant and benign colorectal tumor samples were initially sent to external reviewers, and only the FFPE samples for which all pathologists agreed on the diagnosis were included for further molecular analysis. The extracted DNA was then analyzed for the uidA gene, which encodes the enzyme beta-D-glucuronidase. Molina et al. reported that this gene is highly specific for detecting E. coli strain K12 by PCR [30]. Furthermore, the same gene is also used for detecting urosepsis strains of E. coli encoding cyclomodulins [31]. In this study, only the uidA-positive samples were analyzed for clbB, clbN, and clbA, the colibactin synthesis genes of E. coli [28]. These three genes were chosen to represent the 19 colibactin genes found in the pks island. Specifically, clbA is located at the beginning of the assembly line and is responsible for the activation of the non-ribosomal peptide synthetase genes, clbB and clbN [12]. Colibactin B and N are carrier proteins that tether the growing colibactin chains [32]. The biosynthesis of colibactin in E. coli is reported to induce DNA interstrand crosslinks in cellulo and to contribute to bacterial virulence [33]. Due to its cytopathic effects in infected epithelial cells, colibactin has been implicated in CRC development by increasing epithelial cell proliferation and tumor invasion [34]. BLAST® sequence analysis of the pks marker genes showed 100% homology with E. coli strains EcPF5, EcPF14, SCU-488, SCU-306, HB37, SCU-101, UPEC129, P14, NS-NP030, and KS-P019. Of the 19 colibactin genes, BLAST provides primer sequences only for clbA, clbB, clbM, clbN, clbQ, clbP, clbR, and clbS, and these may also amplify the colibactin genes of Klebsiella pneumoniae and Citrobacter koseri. However, it must be noted that E. coli (53.6%) has a significantly higher relative percentage of pks genes compared with K. pneumoniae (17.9%) and C. koseri (7.1%). Furthermore, among these three bacteria, the proteins encoded by the pks genes have only been detected so far in E. coli strains [23].
A significantly higher number of benign colorectal tissues tested positive for all three colibactin genes compared with malignant samples. This is in contrast to studies done in the UK, France, and Malaysia, which reported that the prevalence of pks+ E. coli was significantly higher in CRC patients than in non-cancer patients [34][35][36]. However, a study conducted in a Japanese population reported no significant difference in the prevalence of pks+ genes between CRC cases and healthy controls [18]. It is postulated that microbiota composition varies according to geographic area; this might explain the discrepancy in the distribution of the pks island in the Philippines, Malaysia, Japan, and Europe (UK and France) [37]. Moreover, the difference in the type of tissue samples that were subjected to molecular analysis might have affected the results. Both the UK and France studies made use of fresh biopsy tissues; Malaysia used in vitro assays; whereas Japan and this study utilized colonic lavage and FFPE colorectal tissues, respectively [35]. Several studies have reported that there was little variation in the bacterial communities present along the colon of CRC patients, whether in a malignant or benign region. The same clonal E. coli has been documented along the colon of patients regardless of the presence of tumor [36]. Additionally, a study found that cyclomodulin-producing E. coli B2 strains isolated from patients with colon cancer and diverticulitis were able to form biofilm but showed poor invasive and adherent activities. This suggests that pks+ E. coli can colonize the intestinal mucosa of the colon, but their role in malignant transformation remains in question [38].
Both age and sex have been shown to affect the gut microbiota composition of an individual. This study observed that older female patients, whether with malignant or benign colorectal tumors, had a higher prevalence of pks+ E. coli than males. It is worth noting that the incidence of CRC in the Philippines is 23.7 per 100,000 population for males and 15.1 per 100,000 for females [39]. In another study, the pathogenic cyclomodulin-positive E. coli strains were more prevalent in the mucosa of male patients with late-stage CRC [40].
In addition, older age groups have been observed to carry an increased Proteobacteria population, which may explain why the older patients in this study registered higher amounts of E. coli [41]. It would be interesting to know whether a more sedentary lifestyle associated with age is a risk factor for colonization with pks+ E. coli.
This study also noted that pks+ E. coli was more abundant in samples taken from rectal tumors, whether benign or malignant. At the beginning of the study, when the tumor sections were retrieved, there were more malignant samples from the rectum, which could have influenced the frequency of tumors that tested positive for uidA. In contrast, a study among Malaysian CRC patients showed that pks+ E. coli was more abundant in the distal than in the proximal parts of the colon [36].
Results also showed that pks+ E. coli is more predominant in poorly differentiated malignant tissues. Conversely, a study conducted in a Malaysian population noted a higher prevalence of pks+ E. coli in early-stage CRC than in later stages [36]. In another study, no significant difference was seen between CRC patients with pks+ E. coli and those with pks− E. coli in terms of bacterial colonization, inflammatory score, neoplastic stage, and tumor node metastases grade [14]. A higher ratio of pks+ bacteria to tumor cells may result in dormancy of tumor cells, since pks+ E. coli can induce cellular senescence. Meanwhile, a low bacteria-to-tumor ratio induces senescence only in a smaller area of the tumor, while at the same time releasing growth factors that negate senescence, leading to further tumor proliferation [42].
To our knowledge, this is the first study on the prevalence of pks+ E. coli in malignant and benign colorectal tumors obtained from selected Filipino patients. Compared to previous studies that analyzed fresh biopsies and colonic lavage, the present study was limited to the use of archived FFPE colorectal tissue samples. FFPE remains widely used in molecular assays because of the technical ease of tissue processing and the economic advantage it offers for long-term tissue specimen storage [43]. The use of FFPE is also advantageous in settings where ethical clearance is a crucial consideration, because archived specimens from accredited repositories can be retrieved and analyzed without the need to recruit new patients [44]. However, one major challenge in using FFPE in molecular assays is the low quality and quantity of nucleic acids extracted from FFPE tissue blocks. Thus, it is necessary to optimize and develop new protocols for extracting and amplifying DNA from FFPE specimens [43]. In this study, a realtime qPCR protocol was developed, including the design of primers that can amplify < 100 bp long sequences of the uidA and colibactin genes. In addition, this study was only able to include a limited number of samples, even though it is easier to retrieve FFPE samples than to do prospective sampling. The lack of financial resources in developing countries such as the Philippines poses a great limitation on the collection and storage of FFPE samples; thus, these are not routine practices in hospitals [19]. In this study, samples were collected from only two hospitals. Thus, a follow-up study that includes a greater number of participants should be conducted to confirm the presence or lack of an association of pks+ E. coli with CRC development among Filipinos. Nonetheless, the results of this study can provide baseline data on the prevalence of pks+ E. coli in malignant and benign colorectal tissues from Filipino patients.