Genetic diversity of Entamoeba Species among Children Under 5 years in the Vhembe District, Limpopo Province, South Africa


Background: Our understanding of the disease caused by the various Entamoeba species and of its epidemiology is changing with time, and very little is known about their genetic diversity. Therefore, this study sought to investigate the prevalence and genetic diversity of Entamoeba species among children under 5 years in the Vhembe District, South Africa.
Methods: A total of 534 stool samples from 313 children (both males and females) aged under 5 were collected from 12 villages in Vhembe District, Limpopo. The prevalence of Entamoeba infections was examined by microscopy and PCR, followed by Sanger sequencing of specific regions of the 18S rRNA gene to identify and differentiate the Entamoeba species circulating in the study population.
Results: Of the 313 children recruited in the study, 163 were females and 150 were males; ages ranged from 1 to 3 years. Of the 534 samples, 130 (24.3%) were microscopically positive for Entamoeba cysts. However, Entamoeba genus-specific DNA amplification by PCR identified Entamoeba species in only 43/534 (8%) samples. Twenty positive amplicons were sequenced by Sanger sequencing, and twelve of these (60%) were confirmed to be Entamoeba species. The species identified, as evidenced by BLAST calls in the NCBI database (after narrowing the search using the option Entamoeba taxid (5758)) and by the phylogenetic tree, included 4 E. polecki (33.3%), 6 E. coli (50%), one E. muris (8.3%) and one E. hartmanni (8.3%). The phylogenetic tree showed a close relationship between the isolated species and those in GenBank.
Conclusion: The current study has shown for the first time the presence of E. polecki in humans and the existence of possibly two types of E. coli infecting humans. Our findings further emphasize the need for the re-evaluation of the pathogenicity of species such as E. polecki, which are quite common in the study population and might be responsible for some of the health complications.


Background
The genus Entamoeba comprises unicellular, anaerobic, parasitic microorganisms that infect the gastrointestinal tract of both humans and animals [1]. Although attention has focused largely on E. histolytica, a major cause of morbidity and mortality in humans and animals [2], the genus also includes other species such as E. dispar, E. moshkovskii, E. coli, E. polecki, E. muris and E. hartmanni [3].
Human infections with Entamoeba polecki, a species that produces uninucleated cysts, are infrequent and are mainly linked to contact with animals such as pigs [4][5][6][7]. Other reported Entamoeba infections in humans include Entamoeba chattoni, another uninucleated cyst-producing species contracted through contact with monkeys [8]. However, whether they occur in humans or are even genetically distinct remains to be established [6].
Studying the genetic diversity of Entamoeba species is one of the main routes to understanding these species in detail [9]. Identifying the pathogenic species of Entamoeba may provide insight into their treatment, control, diagnosis and epidemiology [10]. The diversity of these parasites has been investigated because of their medical relevance to humans and animals [9]. Even though many Entamoeba species have been isolated and identified using molecular methods, their genetic diversity remains poorly understood [9].
Studies of the diversity of Entamoeba species have been reported worldwide. Feng et al. (2018) reported clear diversity among Entamoeba species in China and highlighted the detection of Entamoeba polecki at a rate of 50% in stool samples from pigs and humans [11]. Data on the genetic diversity of Entamoeba species are scarce, and only a few studies have examined the genetic diversity of African Entamoeba strains. The present study sought to investigate the genetic diversity of Entamoeba species among children under 5 in the Dzimauli population, Vhembe District, Limpopo province, South Africa.

Study area, population and sample collection
The current study was carried out in 19 rural villages in Vhembe, Limpopo province, South Africa.

Genus-specific PCR assay, sequencing and phylogenetic assay
Genomic DNA was extracted from all stool samples using a QIAamp DNA Stool Mini Kit (Qiagen, Inc., Hilden, Germany) following the manufacturer's instructions. PCR was performed using genus-specific primers based on small-subunit rRNA gene sequences. The primers (Entam1: 5′-GTT GAT CCT GCC AGT ATT ATA TG-3′ and Entam2: 5′-CAC TAT TGG AGC TGG AAT TAC-3′), which produce a fragment of approximately 550 bp, were previously described by Verweij et al. (2001) [6]. Briefly, genus-specific PCR amplifications were performed in a final volume of 25 µl using a thermal cycler (P100TM Thermal Cycler, BIO-RAD). For all reactions, cycling conditions consisted of an initial denaturation step at 94 °C for 5 minutes, followed by 35 cycles of 94 °C for 1 minute, 55 °C for 1 minute and 72 °C for 1 minute, and a final extension at 72 °C for about 7 minutes. The PCR products were separated by electrophoresis in a 2% agarose gel and visualized on a UV transilluminator.
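As a quick sanity check on the protocol above, the following Python sketch computes the length and GC content of the two primers and the nominal hold time of the cycling program. The function and variable names are illustrative and are not part of the original protocol. Ramp times between steps are ignored and the final extension is taken as exactly 7 minutes, so an actual run would take longer.

```python
# Primer sequences as given in the text (5'->3'), with spaces removed.
PRIMERS = {
    "Entam1": "GTTGATCCTGCCAGTATTATATG",
    "Entam2": "CACTATTGGAGCTGGAATTAC",
}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def nominal_run_minutes(initial=5, cycles=35, steps=(1, 1, 1), final=7) -> int:
    """Sum of programmed hold times in minutes (ramping not included)."""
    return initial + cycles * sum(steps) + final


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.1%}")
print(f"Nominal hold time: {nominal_run_minutes()} min")
```

With the values stated in the text, the programmed holds add up to 5 + 35 × 3 + 7 = 117 minutes.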
For further analysis, all positive amplicons were sequenced at a sequencing company (Inqaba Biotech, Pretoria, South Africa). Both strands were sequenced with the primers used for PCR. Sequence data were aligned, analyzed and edited using the BioEdit editor, and the evolutionary relationship between the species was inferred using MEGA 10 software [12]. The SSU rRNA gene sequences obtained in this study were deposited in GenBank under accession numbers MW133761-MW133772.
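To illustrate the kind of pairwise comparison that underlies an analysis of aligned sequences, the sketch below computes the simple p-distance (the proportion of differing sites) for a toy alignment. The sequences and isolate names are hypothetical placeholders, not data from this study, and the function name is an assumption. The study itself inferred relationships in MEGA, so this is only a minimal illustration of the idea and not the authors' procedure.

```python
from itertools import combinations

# Hypothetical aligned sequences of equal length; '-' marks an alignment gap.
ALIGNMENT = {
    "isolate_A": "ACGTACGT-A",
    "isolate_B": "ACGTTCGTAA",
    "isolate_C": "ACGAACGTAA",
}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping sites with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


for (name1, seq1), (name2, seq2) in combinations(ALIGNMENT.items(), 2):
    print(f"{name1} vs {name2}: {p_distance(seq1, seq2):.3f}")
```

Raw p-distances underestimate divergence when multiple substitutions hit the same site, which is why model-based corrections are normally preferred for tree building.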

Results
The mean age of the participants in the current study was 1.39 years. A total of 534 stool specimens from symptomatic and asymptomatic children were tested for the genus Entamoeba. Direct wet mount microscopy identified vegetative and/or cyst forms of Entamoeba in 24.3% (130/534) of samples, either singly or in combination with other intestinal parasites such as Endolimax nana and Trichuris trichiura. Genus-specific PCR identified 8% (43/534) as positive for the genus Entamoeba (Fig. 1). Twenty were randomly selected for Sanger sequencing, of which twelve were confirmed to be Entamoeba species as evidenced by BLAST calls in the NCBI database and the phylogenetic tree (Fig. 2) after narrowing the search using the option Entamoeba taxid (5758). The BLAST results included 4 E. polecki (33.3%), 6 E. coli (50%), one E. muris (8.3%) and one E. hartmanni (8.3%).
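The percentages reported above follow directly from the stated counts. The short Python sketch below recomputes them; the counts are taken from the text, while the variable and function names are illustrative.

```python
# Counts as reported in the text.
TOTAL_SAMPLES = 534
MICROSCOPY_POSITIVE = 130
PCR_POSITIVE = 43
SEQUENCED = 20
SPECIES_COUNTS = {"E. polecki": 4, "E. coli": 6, "E. muris": 1, "E. hartmanni": 1}


def percent(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


confirmed = sum(SPECIES_COUNTS.values())  # sequences confirmed as Entamoeba

print(f"Microscopy positive: {percent(MICROSCOPY_POSITIVE, TOTAL_SAMPLES)}%")
print(f"PCR positive: {percent(PCR_POSITIVE, TOTAL_SAMPLES)}%")
print(f"Confirmed among sequenced: {percent(confirmed, SEQUENCED)}%")
for species, count in SPECIES_COUNTS.items():
    print(f"{species}: {percent(count, confirmed)}% of confirmed sequences")
```

Note that the species percentages use the twelve confirmed sequences as the denominator, not the twenty amplicons that were sequenced.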

Discussion
To our knowledge, no study has investigated the prevalence and genetic diversity of Entamoeba species among children under 5 years in the Vhembe District, Limpopo, South Africa. In the present study, we have shown for the first time, using molecular tools, that the infection rate with Entamoeba species is 8%. By contrast, direct wet mount microscopy identified vegetative and/or cyst forms as Entamoeba in 24.3% (130/534) of samples, highlighting the difficulty many laboratory technicians face in identifying and differentiating morphologically similar cysts and/or trophozoites of the genus Entamoeba such as E. histolytica, and other uninucleated cysts including immature cysts of E. histolytica [14][15][16].
To investigate the genetic diversity of Entamoeba species in the study population, twenty samples were Sanger sequenced, and the twelve confirmed as Entamoeba comprised four species: E. coli (50%), E. polecki (33.3%), and E. muris and E. hartmanni (8.3% each). Although the identified species might be less pathogenic in the case of a single infection, coinfections with other pathogens, including bacterial, fungal and viral infections, may augment the severity of the disease [17].
A large genetic distance exists between the uni-, tetra- and octanucleated cyst-forming Entamoeba species, as described by Silberman et al. (1999) [18]. As presented in the phylogenetic tree, all Entamoeba polecki isolates clustered with the E. polecki (AB845671) and E. coli (AB845674) reference sequences. On the other hand, they are widely separated from E. coli (AB444953), E. muris (FN396613) and the tetranucleated cyst-forming E. histolytica and E. moshkovskii. Interestingly, two variants of E. polecki appear to be distinguishable in the phylogenetic tree (Fig. 2): isolate 46 lies further away in the tree from the other three E. polecki samples. It has previously been proposed that variants of E. polecki exist, since there is no host specificity and no known difference except for small amounts of sequence divergence [6,19].
Sequencing revealed that the E. coli positive samples clustered in two distant parts of the phylogenetic tree, suggesting that they may be different species, types or strains of E. coli. Figure 2 shows that isolates 22, 24 and 27 clustered with the E. coli (AB845674) reference sequence, whereas isolates 28, 5 and 16 all clustered together and were widely separated from the other E. coli samples and the reference sequence. Stensvold et al. (2011a) reported that E. coli samples from humans group into two clusters, which have been named subtypes 1 and 2 (ST1 and ST2), with ST1 widespread among humans [19]. Whether this variation reflects the possible source of infection, human or animal, remains to be established. Sample 3, Entamoeba hartmanni, a tetranucleated cyst-producing Entamoeba, visibly branches out separately in the phylogenetic tree, away from the other tetranucleated Entamoeba species (Fig. 2).
Stensvold et al. (2011b) demonstrated human infections with E. polecki in a study in which a novel 18S rRNA gene sequence was identified in a species of Sulawesi macaque [20]. However, in many cases, the local prevalence of these species may vary significantly between geographical regions. A study done in South Africa reported that E. polecki (90%) was more prevalent than E. coli (10%) [21]. Another study, conducted in India, reported about 49.5% E. polecki and only 7.4% E. coli and E. moshkovskii [21]. Entamoeba polecki is mostly isolated from domestic animals, especially pigs [22]. Therefore, given the setting of the study population, we suggest that the infection might be transmitted from pigs to water and then to humans. Only one sample (#23) returned 100% identity with E. muris (FN396613) and E. coli (AB444953), suggesting that the infection could have originated from either an animal or a human source. Both E. muris and E. coli produce octanucleated cysts, and the two look morphologically identical under the microscope.

Conclusion And Recommendations
The current study has shown for the first time the presence of E. polecki in humans and the existence of possibly two types of E. coli strains infecting humans. What is clear from this study is that humans can undoubtedly be infected with uninucleated cyst-producing Entamoeba species, and that more genetic variability exists within this group, as well as within E. coli, than has previously been recognized in human infections. Our findings further emphasize the need for the re-evaluation of the pathogenicity of species such as E. polecki, which are quite common in the study population and might be responsible for some of the health complications. Several studies concerning the virulence factors and pathogenicity of the identified species still need to be done.

Figure 2
The evolutionary history was inferred by using the Maximum Likelihood method and Tamura-Nei model [13]. The tree with the highest log likelihood (-6058.62) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. This analysis involved 22 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. There were a total of 695 positions in the final dataset. Evolutionary analyses were conducted in MEGA X (Kumar et al., 2018).