Study of the diversity of 16S–23S rDNA internal transcribed spacer (ITS) typing of Escherichia coli strains isolated from various biotopes in Tunisia

We investigated 16S–23S rRNA intergenic spacer region (ISR) PCR and phylogenetic-group PCR analyses of 150 Escherichia coli isolates as tools to explore their diversity according to sampling origin and their relative dominance in each sampling source. These genetic markers were used to explore the phylogenetic and genetic relationships of the 150 E. coli isolates, which were recovered from different environmental sources (water, food, animals, humans and vegetables). The isolates were tested for their biochemical patterns and then genotyped by PCR amplification of the 16S–23S rRNA intergenic spacer and analysis of the polymorphism of the PCR-amplified 16S–23S rDNA ITS. The band profiles contained one to four DNA fragments. Grouping the 150 E. coli isolates according to their ITS patterns obtained by RS-PCR revealed four genotypes and four subtypes. The DNA fragment size ranged from 450 to 550 bp. Analysis of the DNA band patterns revealed considerable genetic diversity among the isolates: bands of 450 and 550 bp were common to all E. coli isolates, while the overall patterns were highly diversified. Genotype I was the most frequent with 77.3% (116 isolates), followed by genotype II with 12% (18 isolates) and genotype III with 9.3% (14 isolates); genotype IV occurred rarely, with 1.3% (2 isolates). The distribution of the E. coli phylogroups was 84 isolates (56%) in group A, 35 isolates (23.3%) in group B1, 28 isolates (18.7%) in group B2 and only three isolates (2%) in group D.


Introduction
Escherichia coli is associated with a variety of intestinal diseases in humans and animals (Frohlicher et al. 2008; Martinez-Medina and Garcia-Gil 2014). Some pathogenic E. coli can produce attaching and effacing lesions, which are characterized by the bacteria tightly adhering to intestinal epithelial cells and destroying the underlying cytoskeleton (Abbassi et al. 2021; Jang et al. 2017). At the same time, E. coli is considered a normal resident of the intestines of humans and most animals. Some E. coli strains can cause a variety of intestinal and extra-intestinal diseases, such as diarrhea, urinary tract infections, sepsis, and neonatal meningitis (Gomes et al. 2016). Typing methods that distinguish different bacterial isolates of the same species are indispensable epidemiological tools in infection prevention and control (Dallal et al. 2019). Traditional typing systems based on phenotypes (such as serotypes, biotypes, phage types or antibacterial profiles) have been used for many years (Fratamico et al. 2016). However, more advanced molecular methods have recently been developed to examine the affiliation of microbial isolates (Amer et al. 2020), and these methods have improved our ability to accurately distinguish bacterial types and subtypes (Sabat et al. 2013). Studying non-coding RNA is important for determining its functions or roles in cells (Harris et al. 2018), and its function can be inferred from its derived structure. The tRNA family is a class of RNA molecules with a specialized function: delivering amino acids to the protein-synthesis machinery. The 16S and 23S rRNA genes, the ITS, and the genes encoding gyrase, RNA polymerase and DNA ligase are highly conserved in bacteria and can be used for molecular identification. The ITS, also called the ISR, lies between the 16S and 23S rDNA regions of the ribosomal genes.
Complete ribosomal gene units (arranged, for example, as 16S-ITS-23S-ITS-5S) are scattered throughout the bacterial genome, with a copy number between 1 and 15 (Tacao et al. 2005). Recently developed DNA fingerprinting methods use the enterobacterial repetitive intergenic consensus (ERIC) sequence, the repetitive extragenic palindromic (REP) sequence and BOX elements to distinguish bacterial strains (Xin Wang et al. 2015; Tacao et al. 2005). Length polymorphism in the ITS of the 16S-23S ribosomal DNA has been reported to be a stable genetic marker for studying bacterial phylogeny. Although rRNAs (and rRNA genes) are highly conserved, nucleotide variation between rDNA sequences is usually large enough to be used to estimate the phylogenetic relationships between bacteria (Gutellet et al. 1994). The usefulness of rDNA sequences as a classification tool has been shown in bacteria: 16S rRNA sequence analysis has redefined phylogenetic relationships that previously relied on cell metabolism (Fox et al. 1980). The size and number of DNA fragments generated by PCR amplification can be used to quickly identify a wide range of bacteria. In this work, we used 16S-23S ITS genetic markers to study the phylogenetic relationships of E. coli strains isolated from various biotopes (water, animals, humans, and vegetables). Several biochemical markers were studied beforehand to identify these isolates. The relationship between these E. coli isolates, their phylogroup membership and their origins was then examined.

Collection, bacterial isolates media and chemicals
All samples collected from the various biotopes (animal organs and meats, soil, water, feces of various animals, food, humans, and nosocomial and abattoir environments) were cultured on three specific media, namely eosin-methylene blue (EMB) agar (Oxoid Ltd., Basingstoke, UK), Chromagar™ or MacConkey agar (MSA, Becton Dickinson), for 24 h at 37 °C for the detection of E. coli. A single colony from each positive sample was subcultured on Nutrient Agar (Biolife, Milano, Italy) for 18-24 h at 37 °C. The cultures were kept refrigerated (+4 °C) before the different tests and subcultured on fresh nutrient agar when needed. Presumptive E. coli colonies were preliminarily identified by the characteristic green metallic sheen on EMB, blue color on Chromagar™ or brick-red color on MacConkey agar. Colonies with typical E. coli morphology were selected and identified by standard morphological and biochemical tests, such as Gram stain, catalase, oxidase, indole, methyl-red, Voges-Proskauer, citrate and urease, and confirmed with the API 20E system (BioMérieux, La Balme-Les-Grottes, France). The final identification of all isolates was made by PCR targeting E. coli-specific genes with the following primers (Altschul et al. 1997): F: 5′-ACT GGA ATA CTT CGG ATT CAG ATA CGT-3′; R: 5′-ATC ACA GAT TCA TTC CAC GAAA-3′. All E. coli isolates were stored at -80 °C in brain-heart infusion broth (Oxoid Ltd., Basingstoke, UK) containing 20% glycerol.

Extraction of genomic DNA
The method adopted is based on the ability of silica resin to bind DNA in the presence of a high concentration of the chaotropic agent guanidine thiocyanate, which ensured efficient disruption of the bacterial cells collected from the MacConkey plates. Purified DNA was recovered from the cell lysates using two sequential phenol-chloroform extractions followed by ethanol precipitation (Jensen et al. 1993). DNA concentration was determined by spectrophotometry at 260 nm, with one absorbance unit (A260) corresponding to 50 μg DNA/ml. Purity was estimated from the ratio of the absorbances at 260 and 280 nm (A260/A280). Because of variation between individual starting DNA materials, the expected A260/A280 ratio was around 1.6-1.8.
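
The quantification rule above can be illustrated with a short Python sketch. It only applies the stated conversion (one A260 unit corresponds to 50 μg DNA/ml) and the A260/A280 purity ratio; the function names, the dilution factor and the example readings are assumptions made for illustration and are not taken from the study.

```python
# Conversion factor stated in the text: 1 A260 unit corresponds to 50 ug DNA/ml.
UG_PER_ML_PER_A260 = 50.0


def dna_concentration(a260, dilution_factor=1.0):
    """Return the DNA concentration (ug/ml) of the undiluted sample."""
    return a260 * UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a purity estimate."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def in_expected_range(ratio, low=1.6, high=1.8):
    """Check the ratio against the 1.6-1.8 range quoted in the text."""
    return low <= ratio <= high


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:20 before measurement.
    a260, a280 = 0.25, 0.14
    print("concentration (ug/ml):", dna_concentration(a260, dilution_factor=20))
    print("A260/A280:", round(purity_ratio(a260, a280), 2))
    print("within 1.6-1.8:", in_expected_range(purity_ratio(a260, a280)))
```

The dilution factor is kept as an explicit argument because the absorbance is usually read on a diluted aliquot.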

Phylogenetic grouping
PCR was performed with a Perkin-Elmer GeneAmp 9600 thermocycler under the following conditions: denaturation for 5 min at 94 °C; 30 cycles of 30 s at 94 °C, 30 s at 55 °C and 30 s at 72 °C; and a final extension step of 7 min at 72 °C (Clermont et al. 2000). Phylogroups and subtypes were identified according to Clermont et al. (2000) and Escobar et al. (2004).
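
As an illustration of how triplex PCR results translate into phylogroups, the following Python sketch encodes the dichotomous decision tree generally associated with the Clermont et al. (2000) scheme, using the presence or absence of the chuA, yjaA and TspE4.C2 amplicons. The function and argument names are hypothetical, and subtype assignment is deliberately left out.

```python
def assign_phylogroup(chua, yjaa, tspe4c2):
    """Assign a main phylogroup from triplex PCR results.

    Each argument is True when the corresponding amplicon is present:
    chua (279 bp), yjaa (211 bp), tspe4c2 (152 bp).
    """
    if chua:
        # chuA-positive isolates are split by yjaA.
        return "B2" if yjaa else "D"
    # chuA-negative isolates are split by TspE4.C2.
    return "B1" if tspe4c2 else "A"


if __name__ == "__main__":
    # Hypothetical band profiles (chuA, yjaA, TspE4.C2).
    profiles = {
        "isolate_1": (False, True, False),
        "isolate_2": (False, False, True),
        "isolate_3": (True, True, True),
        "isolate_4": (True, False, False),
    }
    for name, bands in profiles.items():
        print(name, assign_phylogroup(*bands))
```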

PCR amplification of 16S-23S rDNA ITS and reaction conditions
RS-PCR genotyping, which is based on amplification of the 16S-23S rRNA ISR, was carried out by the method of Jensen et al. (1993) with the G1 and L1 primers defined by those authors. The first primer, G1, was selected from a highly conserved region immediately adjacent to the 16S-23S spacer. This oligonucleotide contains the sequence 16F: GAA GTC GTA ACA AGG and lies about 30 to 40 nucleotides upstream from the spacer boundary (Fig. 1). The second primer, L1, was chosen from the five bacterial and four plant chloroplast 23S sequences compiled by Getell et al. (1988). Its sequence, 23R: CAA GGC ATC CAC CGT, is the most conserved 23S sequence immediately following the spacer, and it is situated approximately 20 bases downstream from the spacer boundary (Jensen et al. 1993). Primers for both the 16S and 23S regions were restricted to a length of 15 bases because of variations in the sequences beyond these highly conserved regions. DNA amplification followed the protocol recommended by Fournier (2008). Each reaction, in a total volume of 25 µl, contained 1× HotStar Taq Master Mix (Qiagen Benelux, Antwerp, Belgium), 800 nM of each primer (G1 and L1) and 30 ng DNA. The PCR amplification conditions were 95 °C for 15 min, followed by 27 cycles of 94 °C for 1 min, a 2 min ramp, annealing at 55 °C for 7 min, a further 2 min ramp, and extension at 72 °C for 2 min.

Fig. 1 Representative gel electrophoresis of triplex genotyping PCR assay conducted on E. coli isolates, using E. coli ATCC 25922 (R) strain as a positive control. Expected product sizes are 279 bp for the chuA gene, 211 bp for the yjaA gene, and 152 bp for the TspE4.C2 fragment
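
The two primer sequences quoted above can be checked with a few lines of Python. This sketch confirms the 15-base length and computes the GC content and a rule-of-thumb melting temperature (Wallace rule, 2 °C per A/T and 4 °C per G/C); the Wallace rule is a rough estimate introduced here for illustration only and is not part of the protocol described in the text.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "G1": "GAA GTC GTA ACA AGG",
    "L1": "CAA GGC ATC CAC CGT",
}


def clean(primer):
    """Remove spacing and normalise to upper case."""
    return primer.replace(" ", "").upper()


def gc_percent(primer):
    """Return the percentage of G and C bases in the primer."""
    seq = clean(primer)
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, len(clean(primer)), "nt,",
              round(gc_percent(primer), 1), "% GC,",
              "Tm ~", wallace_tm(primer), "C")
```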

Construction and analysis of dendrogram
After accounting for the noise and the overall density of the fingerprints, the ITS-PCR patterns were compared with a band-matching algorithm (band-matching tolerance of 1.0%) to calculate the pairwise similarity matrix of similarity coefficients. Cluster analysis of the similarity matrices was performed by UPGMA (Unweighted Pair Group Method with Arithmetic Mean). The major DNA bands were used to construct the phylogenetic tree with TFPGA (Tools for Population Genetic Analyses). Each E. coli isolate was treated as one population. Hence, 4 populations in total and 2 loci were considered for constructing the dendrogram.
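
The band-matching and UPGMA steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the similarity coefficient is taken to be the Dice coefficient, the 1.0% tolerance is interpreted relative to band size, and the band patterns are invented for the example.

```python
from itertools import combinations


def match_bands(a, b, tolerance=0.01):
    """Count bands of pattern a that pair with a distinct band of b."""
    remaining = sorted(b)
    matches = 0
    for size in sorted(a):
        for other in remaining:
            # Assumption: tolerance is a fraction of the larger band size.
            if abs(size - other) <= tolerance * max(size, other):
                matches += 1
                remaining.remove(other)
                break
    return matches


def dice(a, b, tolerance=0.01):
    """Dice similarity between two band patterns (lists of sizes in bp)."""
    if not a and not b:
        return 1.0
    return 2 * match_bands(a, b, tolerance) / (len(a) + len(b))


def upgma(patterns, tolerance=0.01):
    """Return the UPGMA merge order as (cluster_a, cluster_b, distance)."""
    dist = {frozenset((p, q)): 1.0 - dice(patterns[p], patterns[q], tolerance)
            for p, q in combinations(patterns, 2)}

    def average(c1, c2):
        # Average linkage: mean of all pairwise leaf distances.
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in patterns]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((c1, c2, average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    # Hypothetical band patterns (bp) for four isolates.
    patterns = {
        "iso1": [450, 550],
        "iso2": [452, 548],
        "iso3": [100, 320, 450, 550],
        "iso4": [150, 290],
    }
    for left, right, d in upgma(patterns):
        print(left, "+", right, "at distance", round(d, 3))
```

Recomputing the average from the leaf distances at every step is slower than updating a reduced matrix, but it keeps the definition of UPGMA visible and is adequate for a few hundred patterns.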

Morphological and biochemical characteristics of all E. coli isolates
This study was conducted between January and July 2012. A total of 150 E. coli isolates were collected from various biotopes, distributed as follows: 22 from animal organs and meats, 21 from various kinds of soil, 45 from different waters, 16 from feces of various animals, 27 from different foods, 5 from humans, and 14 from different nosocomial and abattoir environments.
According to the results of the morphological and biochemical tests, all isolates were Gram-, urease-, catalase- and oxidase-negative; 97% were indole-positive, 75% were motile, 92% were Simmons citrate-negative, and 99% produced gas from glucose (Table 1).

Distribution of genes encoding RS-PCR groups according to the origins of E. coli isolates
Table 3 shows that water was the most important source of E. coli isolates, with around 37.3% (56/150); food animals, vegetables, feces and soil followed, with respective frequencies of 18% (27/150), 17.3% (26/150), 15.3% (23/150) and 11.3% (17/150). The lowest frequency was observed for the human source, at 1.3% (2/150). The distribution of phylogroups according to the origins of the E. coli isolates showed that water and food animals were the primary sources of the phylogroups recorded in this study, since all known E. coli phylogroups were found in water and food animals. Soil and vegetables ranked second, with a smaller number of phylogroup types, while E. coli isolates from feces presented the lowest number of known E. coli phylogroups.

Cloning and sequencing of 16S-23S rDNA ITS
In the present study, the 16S-23S rRNA ISRs of 150 E. coli strains were successfully amplified by PCR. Besides the required presence of the nuc gene, all 150 E. coli isolates carried the Ert2 genes, confirming their identity as E. coli. Analysis of the 16S-23S region revealed amplicons of various sizes (100, 150, 290, 320, 450 and 550 bp); the most frequent ones ranged between 450 and 550 bp. The ITS patterns reflected a well-resolved level of phylogenetic grouping: amplicons of nearly 750 bp were calculated to comprise approximately 550 bp of the 3′ portion of the 16S rRNA gene and 450 bp of the 5′ portion of the 23S rRNA gene; a selection of 12 strains is shown in Supplementary Material 1. The patterns generated by RS-PCR and miniaturized electrophoresis were highly reproducible and showed four distinct groups of E. coli isolates (Fig. 2).
The dendrogram showing the clustering of the amplification patterns of E. coli obtained by RS-PCR was generated using the squared Euclidean distance measure and the average linkage clustering method with SPSS 22 for Windows (Fig. 3).
Analysis of the 16S-23S rRNA intergenic spacer region by RS-PCR revealed four genotypes and four subtypes. Genotype I was the most frequent, found in 116 of the 150 isolates (77.3%). In contrast, 18 isolates (12%) belonged to genotype B1; together, these two groups accounted for 89.3% (134 isolates) of the isolates. Fourteen isolates (9.3%) belonged to genotype B2, and genotype D occurred rarely, with 1.3% (2 isolates). For further analysis, the rare genotypes were grouped and named other genotypes (OG). The dendrogram confirmed the dissimilarities between the different genotypes, in particular for strains of genotypes B2 and D (Fig. 3).
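
The genotype frequencies follow directly from the isolate counts, as the short Python sketch below shows. The counts (116, 18, 14 and 2 isolates) are those reported in the text; the labels I to IV follow the abstract, and the function name is an assumption made for illustration.

```python
# Isolate counts per RS-PCR genotype, as reported in the text.
GENOTYPE_COUNTS = {"I": 116, "II": 18, "III": 14, "IV": 2}


def frequencies(counts):
    """Return each count as a percentage of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print("total isolates:", sum(GENOTYPE_COUNTS.values()))
    for name, pct in frequencies(GENOTYPE_COUNTS).items():
        print("genotype", name, pct, "%")
```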
Comparisons of aligned 16S-23S rDNA spacer region sequences revealed that rRNA processing motifs are highly conserved within the 16S-23S rDNA spacer regions of all the isolates (Fig. 2). In the sequence region between the tRNA genes, the number of nucleotide positions varied from 7 (in genomovars 1 and 5) to 31.
TaqI restriction profiles of the 16S-23S rDNA spacer region amplicons, obtained with the PCR primers 16F and 23R, included 4 to 7 bands of sizes ranging from 450 to 550 bp (Fig. 2). Identical TaqI digestion profiles were always found for strains belonging to the same genomovar (as defined by genomic DNA similarities). All E. coli genomovars presented 2 characteristic bands of 450 and 550 bp in their TaqI restriction patterns (Supplementary Material 1 and Fig. 2). The 16S-23S rDNA spacer region restriction patterns of the E. coli strains, generated by TaqI digestion, were clustered by UPGMA. Branching dichotomies resulting from 16S-23S rDNA spacer region polymorphisms produced clusters of strains at the species level.
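
To make the idea of a TaqI restriction profile concrete, the following Python sketch performs an in silico digestion. It relies on the known TaqI recognition site (TCGA, cut between T and C); the input sequence is invented for the example and is not an amplicon from this study.

```python
TAQI_SITE = "TCGA"   # TaqI recognition sequence
CUT_OFFSET = 1       # TaqI cuts after the first base: T^CGA


def taqi_fragments(sequence):
    """Return the fragment lengths produced by cutting at every TaqI site."""
    seq = sequence.upper()
    cuts = []
    pos = seq.find(TAQI_SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(TAQI_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Hypothetical short sequence containing two TaqI sites.
    example = "AAATCGAGGGGTCGATT"
    print(taqi_fragments(example))
```

Sorting the returned lengths gives a band pattern that can be compared between strains in the same way as the RS-PCR patterns.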

Discussion
The phylogenetic analysis of the 150 isolates showed that most of them belonged to phylogenetic groups A and B1. Isolates belonging to these two phylogroups are considered commensals of animal or human origin, because they carry few, unrelated genes encoding virulence factors, and they are also found among E. coli isolates from natural infections in humans or animals. Heat map analysis supports our results and indicates two main groups of E. coli strains according to their origin. The two groups shown in Fig. 4 are group 1, taken from wastewater (D1, D2, B2-3), and group 2, taken from animals, plants and wastewater (B2-2, A1, B1 and A0). The frequency of the B2 and D phylogroups was lower (Carlos et al. 2010; Jakobsen et al. 2010). However, several studies based on the presence of virulence genes indicate that strains of the B2 and D phylogroups appear to be more virulent (Kilani et al. 2017). Interestingly, no significant relationship was found between antibiotic resistance and the phylogenetic group of the isolates. Fecal contamination, in which Escherichia coli is dominant, constitutes a serious environmental problem and may affect many coastal and inland waters around the world (Anderson et al. 1997). Point source discharges, such as raw sewage, storm water, combined sewer overflows, and effluents from wastewater treatment plants and agri-food industries, are the major contributors to fecal pollution and contamination of natural environmental systems (Griffin et al. 2001).
Thus, 16S-23S rDNA spacer regions may be good targets for genomovar- and species-specific probes for environmental monitoring. The less conserved 16S-23S rDNA spacer regions can be applied as a high-resolution indicator of the evolutionary divergence of E. coli strains.
Despite the observation that every species of E. coli analyzed presented a unique restriction pattern, more strains of each species will need to be analyzed before arriving at general conclusions about the utility of 16S-23S rDNA spacer region restriction patterns for the identification of strains at the species level. Whereas comparisons of 16S rRNA gene sequences are restricted in their power to resolve closely related species of a genus (Fox et al. 1992; Martinez-Murcia et al. 1992; Hauben et al. 1997, 1998), spacer regions between the 16S and 23S genes in prokaryotic rRNA genetic loci exhibit significant length and sequence polymorphisms. Our results confirm that RS-PCR can be a rapid test for molecular typing of E. coli strains isolated from various biotopes, and can identify genetic subtypes with specific virulence. Fournier et al. (2008) concluded from the 16S-23S rDNA spacer region that bovine Staphylococcus aureus isolates are genetically heterogeneous. Maeda et al. (2000) showed that the 16S-23S rRNA intergenic regions contain different tRNA compositions, and similarities in the nucleotide sequences of the non-coding regions flanking the tRNA genes have been noted. The phylogenetically informative variable sites are found only in the non-coding region. Sequence analysis of the 16S-23S rDNA spacer region preserves and correlates with the clear phylogenetic relationships within the phylogenetic group, providing an alternative tool for genotype and E. coli species differentiation (Hin-Choung et al. 2001). Therefore, amplification using primers designed from these flanking sequences will produce polymorphic fingerprints that could distinguish bacterial strains at the species and subspecies levels (Bidet et al. 2000). Moreover, since RS-PCR patterns are easier to read visually than REP-PCR or ERIC-PCR patterns, RS-PCR may be a practical technique for routine use (Hin-Choung et al. 2001).
Also, the sensitivity and the specificity of the RSS-PCR method were 100 and 96%, respectively (Kimura et al. 2000). Although differences were detected in the number and size of the PCR 16S-23S rDNA spacer region products obtained from different strains, these characteristics alone could not be used for an overall differentiation of all genomovars or E. coli species.

Conclusion
In this study, we evaluated the diversity of 150 E. coli strains isolated from different sites. Isolated and selected strains were submitted to identification, virulence gene and antibiotic resistance analyses. The results obtained can be used in epidemiological, diagnostic, virulence and molecular taxonomy studies. The diversity of the E. coli strains was studied by the RS-PCR technique, which is a suitable rapid typing method for E. coli isolates. This method was adopted for its high discriminative power, and it may be more practical than REP-PCR or ERIC-PCR because fewer amplification bands and patterns are generated, simplifying the review and interpretation of data. It is a rapid, easy-to-perform and reproducible method that is also appropriate for genotyping other species, for example A. hydrophila, at the strain level. 16S-23S rRNA and phylogroup analyses of the E. coli isolates revealed the potential for identifying sources of E. coli environmental contamination. A fairly small number of isolates is necessary to find candidate source-specific E. coli that are stable and unchanging under simulated environmental conditions. The results achieved by the RS-PCR technique will be valuable for developing additional typing strategies and for optimizing traditional typing methods, such as the triplex PCR phylogenetic grouping approach. Ribosomal spacer PCR (RS-PCR) appeared to be a highly resolving and robust genotyping method for E. coli, of moderate cost and suitable for routine use. Water and food animals were the richest sampling sources of E. coli isolates and showed high diversity. Soil, vegetables and feces ranked second, while the human origin showed the least E. coli diversity.