A multilocus sequence analysis for the taxonomic update and identification of the genus Proteus

Background: Members of the genus Proteus are common opportunistic pathogens that cause a variety of infections in humans. The molecular genetic relationships among Proteus species remain unelucidated. In this study, we developed a multilocus sequence analysis (MLSA) approach based on five housekeeping genes (HKGs) to delineate the phylogenetic relationships of species within the genus Proteus. Results: The phylogenetic tree of the five concatenated HKGs (dnaJ, mdh, pyrC, recA and rpoD) divided the 223 Proteus strains collected in this study into eleven clusters, each representing its counterpart species. The phylogenetic trees of the five individual HKGs corresponded to the concatenated tree, except for recA, which placed four strains in an independent cluster. For the concatenated HKGs, the inter-species distances of every species clearly differed from the intra-species distances, indicating that the concatenated HKGs can be used as a genetic marker to distinguish Proteus species. Web-based DNA-DNA hybridization estimated from the genomes of type strains confirmed the validity of the MLSA, and each of the eleven clusters was congruent with a distinct Proteus species. In addition, we used the established MLSA method to identify randomly collected Proteus strains and found that P. mirabilis was the most common species, and that the second most common was P. terrae rather than P. vulgaris. On the basis of combined genetic, genomic and phenotypic characteristics, P. terrae, P. cibarius and Proteus genospecies 5 should be regarded as heterotypic synonyms. Conclusions: Our data suggest that MLSA is a powerful method for the discrimination and classification of Proteus at the species level. The MLSA scheme provides rapid, economical and precise identification of Proteus strains. Identification of Proteus species by the MLSA approach plays an important role in the clinical diagnosis and treatment of Proteus infection.

The genus was first described by Hauser and was subsequently separated into two species, i.e., Proteus mirabilis and Proteus vulgaris, on the basis of their ability to ferment maltose [2]. Strains of P. vulgaris comprised three biogroups, defined by three biochemical reactions, namely indole production, salicin fermentation and aesculin hydrolysis. Biogroup 1 was negative for all three reactions and was named P. penneri [3]. By contrast, biogroup 2 was positive for the three reactions and retained the name P. vulgaris. Biogroup 3 was positive for indole production but negative for salicin fermentation and aesculin hydrolysis [4], and was further separated by DNA-DNA hybridization into four groups designated Proteus genospecies 3, 4, 5 and 6 [4]. Genospecies 3 can be distinguished from Proteus genospecies 4, 5 and 6 because it is negative for Jordan's tartrate utilization, and it was named P. hauseri, whereas genospecies 4, 5 and 6 remained unnamed because they cannot be differentiated phenotypically [4]. In addition, six newly defined species, i.e., P. terrae, P. cibarius, P. alimentorum, P. columbae, P. faecis and P. cibi, were proposed recently on the basis of phylogenetic, phenotypic, chemotaxonomic and genotypic analyses [5][6][7][8][9].
Thus, the genus Proteus comprises at present ten validly published species and three unnamed genospecies (4, 5 and 6).
Except for the six newly defined species, the classification of the other Proteus species and genospecies was based on differences in biochemical reactions and DNA-DNA hybridization, methods that were devised nineteen or more years ago [4,10]. In particular, the molecular evolutionary characteristics and genetic relationships among these Proteus phenospecies and genospecies remain unelucidated because no molecular typing method exists for the genus Proteus. Multilocus sequence analysis (MLSA) based on several housekeeping genes (HKGs) has been successfully employed to delineate boundaries between closely related bacterial species, subspecies and component strains [11][12][13]. Partial sequences of protein-encoding genes have proven useful for species identification and as phylogenetic markers in the family Enterobacteriaceae [14,15].
In the present study, we developed a five-gene MLSA approach to delineate genetic similarities and differences among Proteus species. We used this MLSA method to determine the genotypic species of 223 Proteus strains that had been identified phenotypically. Our data indicate that this MLSA is a powerful method for the discrimination, classification and phylogenetic analysis of Proteus at the species level. We also clarified the taxonomic relationship between phenotypic and genotypic species and, in particular, used the MLSA method to revise two phenotype-based taxonomic assignments.

DNA Extraction, PCR Amplification And Sequencing
Genomic DNA was extracted from Proteus strains using a genomic DNA purification kit (Tiangen Biotech, Beijing, China) in accordance with the manufacturer's instructions. Extracted DNA was dissolved in TE buffer and stored at -20 °C until use as PCR template. Five candidate HKGs were used for MLSA, i.e., dnaJ, mdh, pyrC, recA and rpoD. The primer sets designed for these genes are listed in Table 1. Each PCR was performed in a final volume of 50 µl containing 25 µl of 2 × Taq PCR MasterMix (Tiangen Biotech, Beijing, China), 1.5 µl each of 10 µM forward and reverse primer, 2 µl of DNA template and 20 µl of ddH2O. The reaction mixture was subjected to initial denaturation at 95 °C for 5 min, followed by 30 cycles of denaturation at 95 °C for 30 s, annealing at 52 to 55 °C for 30 s and extension at 72 °C for 1 min/kb. A final extension step of 10 min at 72 °C was carried out after the last cycle to ensure full-length synthesis of the fragment. All PCR products of the five HKGs were sequenced directly in both directions by a commercial provider (TsingKe Biological technology, Beijing, China). Branch support for the phylogenetic trees was estimated from 1000 bootstrap replications.
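The reaction set-up above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol: it sums the listed component volumes, derives the final primer concentration from C1 × V1 = C2 × V2, and scales a master mix for several reactions. The helper names and the 10% pipetting overage are assumptions made for illustration.

```python
# Volumes (in µl) of one reaction, as listed in the text.
COMPONENTS_UL = {
    "2x Taq PCR MasterMix": 25.0,
    "forward primer (10 µM)": 1.5,
    "reverse primer (10 µM)": 1.5,
    "DNA template": 2.0,
    "ddH2O": 20.0,
}

TOTAL_UL = sum(COMPONENTS_UL.values())


def final_concentration(stock_um, volume_ul, total_ul):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2 (in µM)."""
    return stock_um * volume_ul / total_ul


def master_mix(n_reactions, overage=0.10):
    """Scale every component except the template for n reactions.

    The 10% overage is an assumed allowance for pipetting loss,
    not a value taken from the paper.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in COMPONENTS_UL.items()
        if name != "DNA template"
    }


PRIMER_FINAL_UM = final_concentration(10.0, 1.5, TOTAL_UL)

print(TOTAL_UL)          # 50.0
print(PRIMER_FINAL_UM)   # 0.3
print(master_mix(10))
```

Under these assumptions the listed volumes add up to the stated 50 µl, and each primer is present at a final concentration of 0.3 µM.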

Intra-And Inter-species Phylogenetic Distance Of HKGs
Intra-species phylogenetic distance was defined as the phylogenetic distance between strains of the same species, and inter-species phylogenetic distance as the distance between strains of one species and strains of other species. Phylogenetic distances between strains were calculated in MEGA 7.0 with the Kimura 2-parameter model. The minimum, median and maximum intra- and inter-species distances were calculated for each species. Differences between compact and dispersed distance ranges among species were assessed using Fisher's exact test.
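To make the distance definitions concrete, the sketch below computes the Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions, and then summarizes intra- and inter-species distances. It is an illustration only: the study used MEGA 7.0, the function names and toy sequences are hypothetical, and for brevity the summary pools all strain pairs instead of reporting one summary per species.

```python
import math
from itertools import combinations
from statistics import median

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def distance_summary(sequences, species):
    """Return (min, median, max) of pooled intra- and inter-species distances.

    sequences: dict mapping strain name -> aligned sequence
    species:   dict mapping strain name -> species label
    """
    intra, inter = [], []
    for s1, s2 in combinations(sorted(sequences), 2):
        d = k2p_distance(sequences[s1], sequences[s2])
        (intra if species[s1] == species[s2] else inter).append(d)

    def summarise(values):
        return (min(values), median(values), max(values)) if values else None

    return {"intra": summarise(intra), "inter": summarise(inter)}


# Toy data (hypothetical strains, not sequences from the study).
toy_sequences = {
    "strainA1": "ACGTACGTACGTACGTACGT",
    "strainA2": "ACGTACGTACGTACGTACGC",
    "strainB1": "ACGAACGAACGAACGTACGT",
    "strainB2": "ACGAACGAACGAACGTACGT",
}
toy_species = {"strainA1": "A", "strainA2": "A", "strainB1": "B", "strainB2": "B"}

print(distance_summary(toy_sequences, toy_species))
```

A marker separates species well when the largest intra-species distance stays below the smallest inter-species distance, which is the pattern the boxplots in the Results are used to assess.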

Genomic Relatedness Among Isolates Of Different Species
The genomic relatedness among isolates of different species was further evaluated by web-based DNA-DNA hybridization (DDH) methods, namely in silico DDH (isDDH) and average nucleotide identity (ANI) [16,17]. isDDH values were determined using the genome-to-genome distance calculator (GGDC) web server (http://ggdc.dsmz.de/) and ANI values were obtained from the EZ BioCloud platform (http://www.ezbiocloud.net/tools/ani). Similarity values of 70% and 95% were used as the standard thresholds for species boundaries, i.e., two isolates were considered to represent different species when their isDDH and ANI values were below 70% and 95%, respectively [16,17].
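The threshold rule above can be written as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, values exactly at a cutoff are treated as meeting it, and a "discordant" outcome is added for pairs where the two measures disagree, a case the text does not address.

```python
def species_call(is_ddh, ani, ddh_cutoff=70.0, ani_cutoff=95.0):
    """Classify a genome pair from in silico DDH and ANI percentages."""
    ddh_same = is_ddh >= ddh_cutoff   # boundary handling is an assumption
    ani_same = ani >= ani_cutoff
    if ddh_same and ani_same:
        return "same species"
    if not ddh_same and not ani_same:
        return "different species"
    return "discordant"               # not covered by the text; assumed label


# The highest between-species values reported in the Results (51.4%, 94.4%).
print(species_call(51.4, 94.4))   # different species
# Hypothetical pair above both cutoffs.
print(species_call(85.0, 98.0))   # same species
```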

MLSA of the five concatenated HKGs
The phylogenetic tree of the five concatenated genes divided the 223 Proteus strains collected in this study into eleven clusters (Fig. 1), representing thirteen species. Ten clusters each contained one type strain. Cluster 5, however, comprised three type strains, i.e., Proteus genospecies 5 ATCC 51470 T, P. cibarius JCM 30699 T and P. terrae LMG 28659 T.
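The concatenated tree is built from a single alignment in which the five gene alignments are joined end to end for each strain. A minimal sketch of that concatenation step follows; the gene order matches the order in which the genes are listed in the text, while the function name, the input layout and the toy sequences are assumptions made for illustration and do not describe the authors' software.

```python
GENE_ORDER = ("dnaJ", "mdh", "pyrC", "recA", "rpoD")


def concatenate_loci(alignments, gene_order=GENE_ORDER):
    """Join per-gene alignments into one sequence per strain.

    alignments: dict mapping gene name -> dict of strain name -> aligned sequence
    Every gene must cover the same strains, and all sequences of one gene
    must have the same aligned length.
    """
    strains = set(alignments[gene_order[0]])
    for gene in gene_order:
        if set(alignments[gene]) != strains:
            raise ValueError(f"{gene}: strain set differs from {gene_order[0]}")
        if len({len(seq) for seq in alignments[gene].values()}) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
    return {
        strain: "".join(alignments[gene][strain] for gene in gene_order)
        for strain in sorted(strains)
    }


# Toy alignments (hypothetical strains and sequences).
toy = {
    "dnaJ": {"s1": "ATG", "s2": "ATA"},
    "mdh": {"s1": "CC", "s2": "CT"},
    "pyrC": {"s1": "GGA", "s2": "GGA"},
    "recA": {"s1": "T", "s2": "T"},
    "rpoD": {"s1": "AC", "s2": "AC"},
}

print(concatenate_loci(toy))
```

Checking that every gene covers the same strains before joining guards against a strain silently receiving a shorter concatenated sequence, which would distort the distances used for tree building.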

Identification of Proteus species by phylogenetic analysis of five individual genes
Phylogenetic trees based on the five individual HKGs were also constructed (Fig. 2). Each of the five trees (dnaJ, mdh, pyrC, recA and rpoD) could be divided into eleven clusters, representing thirteen species and corresponding to those of the concatenated tree. The trees of four individual HKGs (dnaJ, mdh, pyrC and rpoD) matched the concatenated tree both in the number of species (clusters) and in the number of strains within each species (cluster). There was one inconsistency between the recA tree and the concatenated five-gene tree: recA left four strains unclustered, whereas the concatenated tree and the other four HKGs assigned these four strains to species 6 (Fig. 2).

Inter-And Intra-species Distance Of HKGs
The inter- and intra-species distances of the concatenated five genes are summarized as boxplots (Fig. 3). For every species, the inter-species distances clearly differed from the intra-species distances. In the inter-species boxplots, two species, P. mirabilis and P. hauseri, showed compact distance ranges (standard deviation, SD = 0.004 for both), whereas the remaining nine species showed dispersed distance ranges (SD from 0.024 to 0.065). In the intra-species boxplots, P. hauseri had a compact distance range (SD = 0.000) compared with five species (SD from 0.012 to 0.058). The boxplots of the five individual genes (Figure S1) showed the same trends in intra- and inter-species distance as the concatenated five genes, although species 5 and 6 overlapped slightly for pyrC. The detailed genetic distances and median values for the individual genes and the concatenated five genes are summarized in Table S1.

Web-based DNA-DNA Hybridizations Among Species
To confirm the assignment of strains to the eleven species, we used web-based DDH methods, namely isDDH and ANI, to determine their similarity values. Among the eleven species defined in this study, the isDDH and ANI values between type/representative strains were 23.5-51.4% and 80.8-94.4% (Table 2), below the proposed cutoffs for species delineation of 70% and 95%, respectively. Notably, within cluster 5 (Fig. 1), comparisons among the three published type strains (Proteus genospecies 5 ATCC 51470 T, P. cibarius JCM 30699 T and P. terrae LMG 28659 T) and the representative strain (CA142267) of the three subclusters all gave isDDH and ANI values above the proposed cutoffs for species delineation. These results indicate that the strains within cluster 5 belong to the same species.

[Table 2: pairwise isDDH and ANI values among strains a to n; table body not recoverable]
# Strain: a, P. mirabilis ATCC 29906 T (GenBank accession no. ACLE00000000.1); b, P. penneri ATCC 33519 T (PHFJ00000000); c, P. vulgaris KCTC 2579 T (PHNN000000000); d, P. hauseri JCM 1668 T (PGWU00000000); e, Proteus genospecies 4 ATCC 51469 T (PENV00000000); f, Proteus genospecies 5 ATCC 51470 T (PENU00000000); g, P. cibarius JCM 30699 T (PGWT00000000); h, P. terrae LMG 28659 T (PENS00000000); i, CA142267; j, Proteus genospecies 6 ATCC 51471 T (PENT00000000); k, P. columbae 08MAS2615 T (NGVR00000000); l, P. alimentorum 08MAS0041 T (NBVR00000000); m, P. faecis TJ1636 T (PENZ00000000); n, P. cibi FJ2001126-3 T (PENW00000000). isDDH values are percentages calculated with Formula 2 of the GGDC 2 distance and DDH estimates; ANI values were estimated using the web-based ANI calculator (http://www.ezbiocloud.net/tools/ani). Grey shading marks strain pairs with isDDH > 70% and ANI > 95%, indicating that they belong to the same species.

Reclassification of Proteus genospecies 5 and P. cibarius to P. terrae
Both the MLSA of the five concatenated HKGs and the phylogenetic analysis of the five individual genes placed three type strains, i.e., Proteus genospecies 5 ATCC 51470 T, P. cibarius JCM 30699 T and P. terrae LMG 28659 T, in one cluster (cluster 5 in Fig. 1). Web-based DDH methods (isDDH and ANI) further confirmed that, within cluster 5, comparisons among the three type strains and the representative strain (CA142267) of the three subclusters gave isDDH and ANI values above the proposed cutoffs for species delineation (70% for isDDH and 95% for ANI, Table 2). This genomic analysis provides evidence that the strains within cluster 5 belong to the same species.
The phenotypic characteristics of the type strains of Proteus genospecies 5, P. cibarius and P. terrae were then examined, and only slight differences were observed (Table 3). The minor differences between the three type strains included optimum growth temperature, the NaCl and pH ranges for growth, the DNase, lipase and citric acid reactions, and DNA G + C content. On the basis of combined genetic, genomic and phenotypic characteristics, the three species, P. terrae, P. cibarius and Proteus genospecies 5, should be regarded as heterotypic synonyms.

Table 3 Distinctive phenotypic characteristics among type strains of P. terrae, P. cibarius and Proteus genospecies 5#.

Discussion
MLSA has been used for species-level classification in numerous Enterobacteriaceae [14,15][18][19][20][21][22][23]. Normally, four to seven HKGs are selected for MLSA to determine phylogenetic relationships. Using sequence data from more than one gene is advised to reduce the possibility of ambiguities caused by genetic recombination or specific selection. MLSA is increasingly applied to obtain higher resolution between species within a genus [24]. In this study, the five HKGs (dnaJ, mdh, pyrC, recA and rpoD) amplified by PCR from the 223 Proteus strains gave consistent species assignments. We therefore established the five-gene MLSA method for taxonomic analysis of the genus Proteus. Our MLSA-based approach can effectively discriminate Proteus species and enables the delineation of species boundaries with high confidence.
Our MLSA method divided all 223 Proteus strains into eleven clusters, representing eleven species. Among the eleven species, P. mirabilis was the most common species collected in this study, which agrees with numerous reports on the genus Proteus classified by phenotypic methods [2]. However, although all P. mirabilis isolates shared the same distinguishing biochemical features, i.e., positive for ornithine decarboxylase but negative for sucrose and maltose, P. mirabilis could be further divided into three dominant subclusters. By contrast, P. vulgaris was the most conserved cluster, exhibiting one of the smallest intra-species HKG distances among the eleven species (Fig. 2). Interestingly, P. hauseri was phylogenetically closer to P. mirabilis than to any other species (Fig. 1), although P. hauseri once belonged to biogroup 3 of P. vulgaris.
Cluster 5 included three subclusters, and our web-based DDH analysis indicated that the strains within the cluster (including the three type strains, Proteus genospecies 5 ATCC 51470 T, P. cibarius JCM 30699 T and P. terrae LMG 28659 T) belong to the same species. P. cibarius and P. terrae were probably defined as new species of the genus Proteus because both studies omitted the type strain of Proteus genospecies 5 (ATCC 51470 T) [4]; moreover, the papers describing the two species were accepted for publication at nearly the same time (2015 and 2016) in different journals [5,6], so they did not cite each other.
Proteus is a common opportunistic pathogen, and P. mirabilis and P. vulgaris have long been considered its two most common species [2,25,26]. Clinically, different treatment schemes may be adopted for different species of Proteus [27,28]. In this study, we used the established MLSA method to identify randomly collected Proteus strains and found that P. mirabilis was the most common species. The second most common, however, was P. terrae rather than P. vulgaris, a result that differs markedly from clinical phenotypic identification [2]. There are three reasons for this. First, in clinical practice, strains of Proteus genospecies 4, 5 and 6 have long been identified as P. vulgaris by phenotypic biochemical reactions [4]. Second, as this study shows, Proteus genospecies 5 accounts for a large proportion of strains (Fig. 1). Third, P. penneri and P. hauseri were initially classified as different biogroups of P. vulgaris [4].

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Supplementary Files
This is a list of supplementary files associated with this preprint.
Supplementary materials.pdf