Occurrence and diversity of CRISPRs in Laribacter hongkongensis isolates from animals, environment and diarrhea patients in southern China

Clustered regularly interspaced short palindromic repeats (CRISPRs) and CRISPR-associated (Cas) proteins are functional elements of archaea and bacteria that form a genetic barrier limiting horizontal gene transfer by an unknown mechanism. We searched for CRISPRs in 118 Laribacter hongkongensis strains isolated from patients, animals, and water reservoirs. Two CRISPR loci, designated CRISPR4.1 and CRISPR4.2, were identified. A CRISPR4.1/cas system was detected in 91.5% (108/118) of the isolates and belonged to the I-F/Ypest subtype of CRISPR/cas systems, while the remaining ten strains possessed only cas genes without the CRISPR4.1 array. The CRISPR4.2 locus was an orphan locus present in 72.0% (85/118) of the L. hongkongensis strains. In total, 2562 spacers, of which 980 were unique, were found, arranged in 77 alleles: 1613 spacers (579 unique, 40 alleles) for CRISPR4.1 and 949 (401 unique, 37 alleles) for CRISPR4.2. Few spacers had matches among the plasmid (34), phage (19) and bacterial chromosomal (4) sequences in the GenBank databases. The CRISPRs of human and frog isolates were more similar to each other, and showed higher diversity and activity, than those of the fish isolates.

Biological functions, types, and epidemiological characteristics of CRISPR loci may be worth studying.

Background
Clustered regularly interspaced short palindromic repeats (CRISPRs) were recently found to be common in the genomes of bacteria and archaea, where they provide acquired immunity against foreign DNA [1]. CRISPR loci are usually composed of palindromic direct repeats [2] that are regularly interspaced by variable sequences called spacers. CRISPRs are classified into 12 major groups based on repeat sequence similarity and the ability to form stable RNA secondary structures [3]. CRISPR-associated (cas) genes, which encode functional proteins such as helicases, polymerases, nucleases, and polynucleotide-binding proteins, are often adjacent to CRISPRs and contribute to their propagation and functioning [4]. CRISPR/cas systems fall into eight subtypes according to gene order and gene content. Many CRISPR spacers are derived from proto-spacers that are part of foreign DNA [5][6][7][8], suggesting that spacers may provide a historical record of mobile element exposure. Spacers are inserted and selectively eliminated in the course of bacterial evolution, which induces structural polymorphism in CRISPRs, so that arrays differ between strains of the same species [9][10][11]. Thus, the structure of the CRISPR may be a potential target for tracing the origin and evolution of bacteria.
As a newly discovered species, Laribacter hongkongensis (L. hongkongensis) is a member of the Neisseriaceae family of the β-subclass of Proteobacteria and an emerging bacterium that is closely associated with community-acquired gastroenteritis and travelers' diarrhea [12,13]. In 2001, this facultatively anaerobic, motile, Gram-negative, nonsporulating, urease-positive, S-shaped bacillus was first isolated from an alcoholic cirrhosis patient in Hong

Results
Identification of two CRISPR loci by analysis of 6 available L. hongkongensis genomes
First, we analyzed the genome sequence of L. hongkongensis LHGZ1 (CP022115.1) and detected two confirmed CRISPR systems, which were distant from each other (>344 kbp) (Fig. 1). Both exhibited several parallel direct repeats (GTTCACTGCCGGACAGGCAGCTCAGAAA), which could be classified into cluster 4 on the basis of sequence identity [3]. We named them CRISPR4.1 and CRISPR4.2, respectively. Of these two loci, only CRISPR4.1 had putative cas genes in its flanking sequence; these consisted of successive co-oriented cas genes (cas1-cas3-csy1-csy2-csy3-csy4) and were located between the LHGZ1-1184 (rve_3, rve_3 domain-containing protein) gene and the LHGZ1- CRISPR4.1, strain HLHK9 being the exception. Strain HLHK9 had the majority of the cas genes found in LHGZ1 but no direct repeat or spacer sequence (Fig. 1). The order of its cas genes was csy1-csy2-csy3-csy4'-csy4-cas1'. Among these cas genes, cas3 was deleted, and cas1' was only 111 bp long and was not identical to any 111-bp stretch of cas1. Surprisingly, the csy4' and csy4 genes were separated by a 9-bp stretch, and csy4' had only 165 bp identical to csy4. Nevertheless, the leader sequence was still present and similar (94% DNA identity) to that of strain LHGZ1, although it was not followed by any direct repeat or spacer DNA. In the remaining four investigated L. hongkongensis isolates, the last direct repeat was located next to an sts gene homologous to that of strain LHGZ1 (Fig. 1). All four showed a parallel cas gene group with 90%-98% DNA similarity to LHGZ1, and the direct repeat was also similar to that of the LHGZ1 strain (Fig. 1). Leader sequences (93%-97% DNA similarity to LHGZ1) were also found in the four strains. Fig. 1 shows that the strains carried different numbers of spacer sequences.
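As an illustration of how spacers can be read off an array once the direct repeat is known, the following is a minimal sketch rather than the pipeline used in this study. It assumes exact copies of the LHGZ1 repeat quoted above; the function name and the toy spacers are hypothetical. Because it relies on exact matching, it would miss degenerate repeats such as a divergent terminal repeat.

```python
# Direct repeat reported for L. hongkongensis LHGZ1 (28 bp).
REPEAT = "GTTCACTGCCGGACAGGCAGCTCAGAAA"


def extract_spacers(array_seq, repeat=REPEAT):
    """Return the sequences lying between consecutive exact copies of the repeat."""
    seq = array_seq.upper()
    positions = []
    start = seq.find(repeat)
    while start != -1:
        positions.append(start)
        # Continue searching after the repeat just found (non-overlapping).
        start = seq.find(repeat, start + len(repeat))
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacers.append(seq[left + len(repeat):right])
    return spacers


# Hypothetical toy array: three repeats around two made-up 32-bp spacers.
toy_spacers = ["ACGT" * 8, "TTGCA" * 6 + "GG"]
toy_array = REPEAT + toy_spacers[0] + REPEAT + toy_spacers[1] + REPEAT
print(extract_spacers(toy_array))
```

A real array would normally be located with a dedicated CRISPR-finding tool that tolerates repeat mismatches; the sketch only shows the repeat-spacer-repeat logic.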
Comparing the spacers of the two CRISPR loci, (i) the mean number of spacers of CRISPR4.1 was higher than that of CRISPR4.2 (p < 0.001); (ii) the unique-spacer proportion of CRISPR4.1 was lower than that of CRISPR4.2 (p = 0.001); (iii) there was no statistically significant difference between the number of alleles in CRISPR4.1 and that in CRISPR4.2 (p = 0.361), as shown in Table 6.
Interestingly, a further comparison of the spacers of the two CRISPRs among strains isolated from different sources revealed that in CRISPR4.1 (Table 7), (i) the average numbers of spacers in human-origin and frog-origin strains were higher than that in fish-origin strains (all p < 0.05), but there was no statistically significant difference between frog-origin and human-origin strains (p = 0.176); (ii) the numbers of unique spacers in the fish-origin, frog-origin and human-origin strains were 87 (11.73%), 410 (56.63%) and 115 (97.46%), respectively. According to the chi-square test, the differences in the distribution of unique spacers among strains of different origin were statistically significant (all p < 0.0125), indicating that the human-origin strains had the largest proportion of unique spacers, followed by frog-origin strains; and (iii) the numbers of alleles in both human-origin and frog-origin strains were greater than that in fish-origin strains (all p < 0.0125). In CRISPR4.2 (Table 8), (i) the average number of spacers of human-origin strains was higher than that of fish-origin strains (p < 0.05), but there was no statistically significant difference between frog-origin strains and fish-origin or human-origin strains (p = 0.179); (ii) the numbers of unique spacers in the fish-origin, frog-origin and human-origin strains were 55 (12.70%), 304 (70.86%) and 72 (98.63%), respectively. The chi-square test showed that the differences in the distribution of unique spacers among the strains of the three origins were statistically significant (all p < 0.0125), indicating that unique spacers accounted for the largest proportion in human-origin strains, followed by frog-origin strains; and (iii) the numbers of alleles of both frog-origin and human-origin strains were higher than that of fish-origin strains (p < 0.0125). The spacers of the two loci from the reservoir water strain were consistent with those of the fish-origin strains.
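The pairwise chi-square comparisons of unique-spacer proportions can be sketched as follows. This is an illustrative recalculation, not the SPSS analysis itself. The unique-spacer counts for CRISPR4.1 are those given above, but the group totals (742, 724 and 118) are back-calculated from the reported percentages and are therefore an assumption; the function name is hypothetical, and the test is computed without continuity correction.

```python
import math
from itertools import combinations

# CRISPR4.1: (unique spacers, total spacers) per origin.
# Unique counts are from the text; totals are back-calculated from the
# reported percentages (an assumption, not taken from Table 7).
groups = {
    "fish": (87, 742),
    "frog": (410, 724),
    "human": (115, 118),
}

ALPHA = 0.0125  # corrected significance threshold quoted in the text


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


results = {}
for (name1, (u1, t1)), (name2, (u2, t2)) in combinations(groups.items(), 2):
    # Rows: origin; columns: unique vs. non-unique spacers.
    stat, p = chi2_2x2(u1, t1 - u1, u2, t2 - u2)
    results[(name1, name2)] = (stat, p)
    print(f"{name1} vs {name2}: chi2 = {stat:.1f}, p = {p:.3g}, significant = {p < ALPHA}")
```

Under these assumed totals, all three pairwise comparisons fall below the 0.0125 threshold, in line with the statement above.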
Taken together, 980 unique spacers existed in the two CRISPR loci. Over half of the unique spacers (349 out of 579 in CRISPR4.1, 286 out of 401 in CRISPR4.2) were strain-specific (white boxes), i.e., present in only one strain (Fig. 2). We found that these sequences were
The spacers of CRISPR are considered to originate from sequences in mobile genetic elements, which are termed proto-spacers [5,27]. To elucidate the ecological function of the CRISPR, we searched for proto-spacers of the CRISPR in the NCBI database (see Methods).
Despite the abundance of available phage (virus), plasmid and bacterial sequences, only 18 of the 980 unique spacers had matching proto-spacers; these matched 53 mobile elements and four nonmobile elements, comprising 34 plasmids, 19 phages, and four bacteria (Table 9). Interestingly, half of the plasmid proto-spacers (17) corresponded to Xylella fastidiosa plasmids, while the phage proto-spacers were diverse. Notably, although CRISPR4.1 showed an obvious bias toward plasmids (30 out of 37), the CRISPR4.2 proto-spacers tended toward phages (12 out of 16). Most of the matched spacers were associated with several known plasmid and phage sequences, and one (SZ-W33-4.1-6) with as many as ten. We also found that different spacers from different strains could match the same mobile element: SZ-W33-4.1-6 and SZ-W57-4.1-5/6 proto-spacers were found in the same plasmids, such as Xylella fastidiosa 9a5c plasmid pXF51 and Xylella fastidiosa strain Pr8x plasmid pXF39. This suggested that these strains have become resistant to the same phages and plasmids and that these plasmids and phages are widespread in L. hongkongensis strains.
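To make the idea of a proto-spacer match concrete, the sketch below scans a target sequence on both strands for windows that differ from a spacer by at most a few bases. It is a simplified stand-in for a database search, not the BLAST or CRISPRTarget procedure used here; the function names, the mismatch threshold of 2, and the toy sequences are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(spacer, target, max_mismatches=2):
    """Return (position, strand, mismatches) for every window of `target`
    that differs from the spacer by at most `max_mismatches` bases."""
    spacer = spacer.upper()
    target = target.upper()
    hits = []
    # Search the spacer itself (+) and its reverse complement (-).
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for pos in range(len(target) - len(query) + 1):
            window = target[pos:pos + len(query)]
            mismatches = sum(1 for x, y in zip(query, window) if x != y)
            if mismatches <= max_mismatches:
                hits.append((pos, strand, mismatches))
    return hits


# Hypothetical 32-bp spacer; the toy target carries one copy with a single
# substitution on the forward strand and one exact copy on the reverse strand.
spacer = "ACGTTGCAAGGCTTAACCGGATCGATCGTAGC"
mutated = spacer[:10] + "T" + spacer[11:]
target = "TTTT" + mutated + "CCCC" + reverse_complement(spacer) + "AAAA"
hits = find_protospacers(spacer, target)
print(hits)
```

The sliding-window scan is quadratic and only suitable for short toy targets; searches against whole databases need an indexed aligner.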

Analysis of encoded products of proto-spacers
Analysis of the encoded products of the proto-spacers (Table 9) showed that the genes matched by spacers encoded a variety of protein families: (i) some products were necessary for the survival and growth of microorganisms, such as "DNA polymerase", "capsid maturation protease", "secretion system protein TraC", "DNA repair protein" and "DNA-binding protein", and some were involved in adapting to the environment, such as "conjugal transfer protein TraG"; (ii) some were functional proteins, for example "ATPase" and "DNA cytosine methyl transferase"; and (iii) some were "hypothetical proteins" of unknown function, which could not currently be categorized and need to be further analyzed and studied. Finally, some matched fragments were noncoding regions or pseudogenes, which also require further study.

Discussion
The CRISPR/cas system has been described in many different species of bacteria [28,29]. It protects prokaryotes against invading plasmids and viruses and provides a historical perspective of foreign genetic element exposure [29,30]. No study of the CRISPR/cas system in L. hongkongensis has been reported. Here, we provide the first description of the structure and potential function of CRISPR/cas systems in 118 L. hongkongensis strains.
The I-F CRISPR type (CRISPR4.1 and CRISPR4.2) existed in most of the L. hongkongensis strains. Diversity between the two CRISPR loci was observed at many levels. A total of 108 isolates contained a CRISPR4.1/cas system, which included the repeats and a complete set of cas genes. The number of repeats for CRISPR4.1 varied from 2 to 39, which was higher than in other bacteria and is infrequent in enterobacteria [28,31], suggesting that the CRISPR4.1/cas system was functional. Yet 10 isolates were [29]. Previous data have suggested that the enzymatic machinery of a specific locus cannot be effective in conjunction with the CRISPR genetic content of another [29]. Here, we provide data indicating that each cas system may be directly linked to a particular CRISPR repeat sequence. Given that CRISPR4.2 has no cas genes nearby, we speculate that it is a degenerate locus.
The two CRISPR loci shared identical repeats, which were classified into cluster 4. Our data suggest that the cas gene group may be directly connected with a particular CRISPR repeat, which is consistent with the CRISPR structures and cas genes described by Kunin et al. [3]. Sequence alignment between CRISPR loci showed that, although the repeats were generally highly conserved across the locus, polymorphism still existed. In the CRISPR4.2 locus, the last repeat differed from the others in its last seven bases but was the same in all the strains, a feature that has also been found in Francisella tularensis. This was unexpected because it is accepted that each repeat shares a structure with the previous repeat [32]. We speculate that there may be some sort of joint point between the repeat sequence and the spacer sequence. The marked difference between the first and last repeats suggests that these repeats are related to the evolution of the bacteria [28]. This observation is consistent with our assumption that bacterial evolution is accompanied by base alterations in CRISPR repeats.
No definite relationship was found between the cas genes and the source, number or host identity of the repeats, since CRISPR4.1 was always accompanied by cas genes. Some Cas proteins might be required for the novel repeat-spacer unit to interact with molecules via the CRISPR repeats. Other Cas proteins are likely related to spacer-encoded resistance, which may be mediated by an RNA-guided mechanism [33]. Further studies are needed on the molecular mechanism of CRISPR function and on the functional connection between specific cas genes and a particular CRISPR repeat.
Spacers are flanked by consecutive repeats and constitute the most diverse part of CRISPR. Spacer DNAs have been studied previously in many species [34,35]. Our study showed that when corresponding CRISPR loci in other strains were compared, spacer sequences were abundant in L. hongkongensis strains (Table 5), showing the polymorphism and evolutionary variability of the system. In total, 2562 spacers were found in the two CRISPR loci. A total of 579 and 401 unique spacers, arranged in 40 and 37 alleles, were found for CRISPR4.1 and CRISPR4.2, respectively. These spacers were also arranged in a number of alleles in isolates of different origin (Table 5). These results demonstrated that less than half of the CRISPR array was specific to most strains. The degree of spacer polymorphism for a given CRISPR locus, in terms of both the total number of unique spacers and the total number of unique spacer arrangements, was directly correlated with its activity. Thus, CRISPR4.1 was more active than CRISPR4.2. This conclusion was based on several results: (i) the CRISPR4.2 repeat sequences were more degenerate than those of CRISPR4.1; (ii) the average and maximum numbers of CRISPR4.1 spacers were higher than those of CRISPR4.2 spacers; (iii) cas genes accompanied CRISPR4.1, while CRISPR4.2 was an orphan locus. Moreover, our data indicate that the CRISPR polymorphisms of human isolates and frog isolates were more closely related to each other and more extensive than that of fish isolates, for similar reasons: (i) the numbers of spacers for human isolates and frog isolates were higher than those for fish isolates in both CRISPR4.1 and CRISPR4.2, suggesting greater genetic diversity in L. hongkongensis isolates of frog origin and human origin than in those of fish origin, which is consistent with previous studies based on PFGE and MLST [22,24]; (ii) there was no significant difference between frog isolates and human isolates in the quantitative distribution of spacers in the two CRISPR loci; (iii) the ratio of unique spacer sequences was lowest in fish isolates and highest in human isolates in both CRISPR loci; (iv) the ratio of alleles was lowest in fish isolates, meaning that the spacers of fish strains were relatively homogeneous. Although the number of isolates from origins other than fish and frogs was relatively small, diversity was also observed among the isolates from these other hosts, each having its specific allele. Overall, the CRISPR loci were complex in isolates of the three origins, with the highest level of polymorphism found in the human isolates and frog isolates, followed by fish isolates. It is possible to speculate that frog isolates are more closely related to human isolates and that their clones could be better adapted to human hosts.
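The quantities compared above (total spacers, unique spacers, strain-specific spacers and alleles) can be derived from per-strain spacer arrays as in the following sketch. It assumes that an allele is a distinct ordered arrangement of spacers, which is our reading of "unique spacer arrangements"; the function name, strain names and spacer labels are hypothetical toy data, not values from this study.

```python
from collections import Counter


def summarize_locus(arrays):
    """Summarize one CRISPR locus.

    `arrays` maps strain name -> ordered list of spacer sequences (or labels).
    """
    total = sum(len(spacers) for spacers in arrays.values())
    # Number of strains carrying each distinct spacer; a spacer duplicated
    # within one strain is counted once for that strain.
    strains_per_spacer = Counter(
        spacer for spacers in arrays.values() for spacer in set(spacers)
    )
    unique = len(strains_per_spacer)
    strain_specific = sum(1 for n in strains_per_spacer.values() if n == 1)
    # Assumption: an allele is a distinct ordered arrangement of spacers.
    alleles = len({tuple(spacers) for spacers in arrays.values()})
    return {
        "total": total,
        "unique": unique,
        "strain_specific": strain_specific,
        "alleles": alleles,
    }


# Hypothetical toy data: four strains, spacers written as short labels.
toy = {
    "strain_A": ["s1", "s2", "s3"],
    "strain_B": ["s1", "s2", "s3"],
    "strain_C": ["s4", "s2", "s3"],
    "strain_D": ["s5", "s5", "s3"],
}
summary = summarize_locus(toy)
print(summary)
```

Running the same function on subsets of strains grouped by origin would give the per-origin counts that feed the comparisons between fish, frog and human isolates.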
Our study supports the view that spacers are added to the CRISPR locus next to the leader sequence, resulting in more diversity there than in the sequences at the end of the CRISPR. This means that spacer sequences are chronological records that reflect previous encounters with foreign genetic elements. Although new spacers are continually inserted in response to foreign genetic elements, the bacterial genome remains relatively stable, since the entry of a novel spacer is accompanied by the deletion of old and valueless spacers, notably the terminal spacers, which possibly came from earlier events (Fig. 2). Furthermore, we speculate that recently acquired spacers might be important and have more opportunities to be retained in order to target foreign genetic sequences. In addition, we have shown that in strains JM-679, JM-W2, JM-W8, and GZ-W2, CRISPR4.1 contained 4, 2, 2, and 4 duplicated spacers, respectively, within the same array, which likely reflects repeated responses to the same foreign genetic elements leading to acquisition of the same spacers.
In short, CRISPR loci seemed to evolve through the gain and loss of repeat-spacer units.
In general, the spacer DNA is very specific for each strain and is therefore used for epidemiological genotyping [1,36,37]. The analysis of the spacer DNAs showed that matches were more numerous in CRISPR4.1 (41) than in CRISPR4.2 (16), but in both, few (only 18) spacers had matches, which included plasmids (34), phages (19) and bacteria (4) (Table 9). The degeneration of the older spacer sequences could explain the low level of matches, which were found in the recently acquired spacers. According to our results, the number of spacers in most strains that had matches was higher than the average number of spacers, suggesting the acquisition of spacers from foreign elements. Beyond that, many CRISPR spacers matched the HLHK9 and LHGZ1 genomes (data not shown), implying acquisition of spacers from their own genomes. Many unique spacers did not match any genome, indicating either that many plasmids, phages, and bacteria remain unexplored, or that the rapid evolution of plasmids and phages leads to proto-spacer base mutations that allow them to escape recognition by spacers. Interestingly, our data also strongly suggest that CRISPR4. Additionally, the diversity and activity of human-origin and frog-origin CRISPRs are more similar to each other and higher than those of the fish isolates. Our results also indicate that the CRISPR4.1/cas system is a functional unit, whereas CRISPR4.2 is perhaps a degenerate and nonfunctional locus. Considering the higher spacer diversity and structure of the CRISPR4.1 array in L. hongkongensis, the potential use of a particular CRISPR locus for genotyping, epidemiological and evolutionary studies may warrant further investigation.

L. hongkongensis bacterial strains
A total of 118 L. hongkongensis isolates were characterized in this study. There were 66 strains from freshwater fish (65 grass carp and one bighead carp), 42 strains from frogs (one wild tiger frog, one giant spiny frog and 40 farmed tiger frogs), seven strains from humans, two strains from sewer rats, and one strain from a water reservoir. The 66 fish strains and 42 frog strains had been recovered previously from the guts of animals randomly sampled from retail food markets in Shenzhen (20 fish and 34 frogs), Guangzhou (31 fish and 6 frogs), and Jiangmen (15 fish and 2 frogs), Guangdong province. The seven human L. hongkongensis isolates were all obtained from the feces of patients, including two strains from Guangzhou, two strains from Jiangmen, two strains from Hong Kong, and one strain from Hangzhou, Zhejiang province. The two sewer rat isolates were collected from the intestinal contents of sewer rats in Guangzhou. Finally, one strain was recovered from a water sample from the Helong water reservoir in Guangzhou. All of the isolates were identified according to standard biochemical procedures and were analyzed by using a 16S rRNA gene-based PCR assay [15]. Biochemical tests of potential L. hongkongensis isolates showed that they were positive for arginine dihydrolase, catalase, cytochrome oxidase, and urease and were hardly ever positive for sugar oxidation-fermentation [17]. These isolation

PCR amplification
DNA extracts were obtained by a rapid boiling method from the 112 confirmed L. hongkongensis isolates, that is, all isolates except the six with available complete genomes described above. These extracts were used as the templates for PCR amplification. The CRISPR and cas PCR primers used in this study were designed from alignments of conserved sequences found in the GenBank nucleotide sequence database under the abovementioned accession numbers; the primers are listed in Table 1. Primers for the 16S rRNA gene were used in positive-control reactions for each colony lysate. The cycling conditions of the two CRISPR arrays were (i) initial denaturation step of 5 hongkongensis CRISPR loci sequences, which were stored in BLAST and CRISPRTarget, were ruled out.

Statistical analyses
Data analysis was performed with the Statistical Package for the Social Sciences (SPSS; Version 20.0), and p < 0.05 was regarded as statistically significant. The numbers of spacers of L. hongkongensis isolates were compared between the two CRISPR loci using the independent-samples t test. Comparisons among more than two groups were performed using analysis of variance (ANOVA) and a post hoc test. The distributions of unique spacers and CRISPR alleles of the L. hongkongensis isolates from fish, frogs, and humans were compared for the two CRISPR loci using Pearson's chi-square test.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

hongkongensis LHGZ1 is given. A 9 bp insertion sequence is indicated by a red inverted triangle. Homologous genes are marked with the same color.