Comparative genomics of swine pathogen Chlamydia suis with human- and mouse-adapted Chlamydia spp

The phylum Chlamydiae contains a unique set of obligately intracellular bacteria with specific host ranges, despite high genomic relatedness between species. Chlamydia suis, a ubiquitous swine pathogen, has the potential for zoonotic transmission to humans and often encodes resistance to the primary treatment antibiotic, tetracycline. Because of this emerging threat, and to gain a better understanding of its basic biology, we performed comparative genomics of a tetracycline-resistant swine isolate (R19) against inter- and intraspecies genomes. A 1.094 Mb genome was determined through de novo assembly of Illumina high-throughput sequencing reads. Annotation and subsystem analyses were conducted using the bioinformatic platform PATRIC, revealing 986 putative genes (Chls_###) that are predominantly orthologs of other known Chlamydia genes. Subsequent comparative genomics with C. trachomatis and C. muridarum, human and mouse pathogens, respectively, revealed a high level of genomic synteny and overall sequence identity. Approximately 857 genes were conserved among these three species, while 92 unique C. suis components were annotated. Direct comparison of Chlamydia-specific gene families (inclusion membrane proteins, polymorphic membrane proteins and the major outer membrane protein), as well as the plasticity zone, demonstrates high gene content identity with C. trachomatis and C. muridarum and highlights putative host-specificity factors. This study constitutes the first genome-wide comparative analysis for C. suis, generating a fully annotated reference genome using the NIAID tool PATRIC. Overall, these analyses build upon previous C. suis analyses to compare the composition of key genetic components with two closely related Chlamydia species. These studies will enable focused efforts on factors that confer species specificity and adaptation to the cognate hosts of chlamydial infections, including humans. Continued analysis of the C. suis genome will facilitate studies into the complexities of the chlamydial proteome and the discovery of basic biology.

Background
Chlamydia is a genus of biologically distinctive, obligate intracellular bacterial pathogens that cause disease in both humans and agriculturally important livestock. Chlamydia suis is a near-ubiquitous swine pathogen causing reproductive disorders, conjunctivitis, enteritis, rhinitis and pneumonia (1)(2)(3)(4). The prevalence of C. suis has been investigated in regions throughout the world, and each report found a high incidence in farmed pigs. Posing a major challenge for porcine treatment, many C. suis strains have acquired a tetracycline-resistance cassette, with clinical resistance confirmed in over seven countries (5)(6)(7)(8)(9). In vitro evidence has also demonstrated that tetracycline resistance can be transferred to C. trachomatis through horizontal gene transfer under conditions of co-infection (10,11). While C. suis has traditionally been restricted to its swine host, there is some evidence that it may be zoonotically transmitted; however, active human infection and clinical symptoms have not been observed (12)(13)(14).
The emerging potential threat of C. suis and ongoing concerns for the agricultural industry highlight the need for a comprehensive comparison of the Chlamydia suis genome with that of the prominent human pathogen Chlamydia trachomatis. C. trachomatis causes the most commonly reported bacterial infection worldwide and is the causative agent of several important diseases, including blinding trachoma and the sexually transmitted disease chlamydia. Importantly, genome sequencing and comparative studies among Chlamydia species have enabled the identification and analysis of diverse gene products that may participate in host-specific adaptations (15) (16) (17) (18). While there have been many intraspecies analyses of C. suis with intense focus on the tetracycline-resistance genes, including those by Donati et al (19), Joseph et al (20), Marti et al (21) and Seth-Smith et al (22), interspecies comparisons, particularly with C. trachomatis and C. muridarum, the species most closely related to C. suis, have been limited. One key observation within Chlamydia is that, despite the genetic similarity between species, there appear to be many host-restricting factors responsible for the variation in disease between species. We expect that by studying the C. suis genome in comparison with other Chlamydia, we can begin to tease out some of these key host-specificity factors.
The C. suis R19 strain is a tetracycline resistant strain isolated from a Nebraska farm (23) and has been widely studied for diagnostic and antibiotic resistance transfer capabilities.
Previously, this strain was sequenced by Joseph et al with 14x coverage, but because that genome remains incomplete and was based on lower-quality reads, we used alternative methods to independently assemble a C. suis R19 genome and used the NIAID annotation pipeline PATRIC for genome annotation, which identified several new open reading frames and possible encoded protein functions. These analyses revealed that C. suis has a genome highly similar in content and organization to both C. trachomatis and C. muridarum, and phylogenetic analysis confirms previous reports that C. suis is most closely related to these species. Accordingly, C. suis encodes many genes similar to those of other Chlamydia, but in unique combinations that are both distinct from and similar to C. trachomatis and C. muridarum, which may play a role in the host adaptation of Chlamydia.

Results And Discussion
The Chlamydia suis genome architecture and composition.
The C. suis R19 strain genome was sequenced using Illumina HiSeq with over 1000X coverage (Supplementary Figure 1). This enabled an unbiased, de novo assembly of the genome. The completed chromosome was 1,094,719 bp in length, accompanied by a 7,496 bp plasmid.
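As a rough consistency check on the stated coverage, expected depth is total sequenced bases divided by genome length (C = N × L / G). The sketch below plugs in the read count and read length given in the assembly methods (over 90 million reads of 151 bp) and the 1,094,719 bp chromosome. It ignores trimming, filtering, duplicates and the plasmid, so it overestimates the depth actually achieved after mapping.

```python
def expected_depth(n_reads: int, read_length_bp: int, genome_length_bp: int) -> float:
    """Expected mean sequencing depth, C = N * L / G (Lander-Waterman)."""
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_reads * read_length_bp / genome_length_bp


# Inputs from the text: >90 million reads of 151 bp, 1,094,719 bp chromosome.
depth = expected_depth(90_000_000, 151, 1_094_719)
print(f"expected depth ~ {depth:,.0f}x")
```

With these inputs the estimate is roughly 12,400x, consistent with the "over 1000X" reported; realized coverage after quality filtering will be lower.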
Read density analysis of the plasmid relative to the chromosome supported approximately 2 copies of the plasmid per chromosome. The genome was annotated using both direct ORF prediction and BLAST homology-based annotation in Geneious and the PATRIC annotation platform, revealing 986 coding regions (Chlamydia suis; Chls_###), 6 rRNAs and 37 tRNAs, with an overall G+C content of 41.7%. Key features of the R19 genome are displayed in Figure 1A and Supplementary Table 1, with comparative data in Table 1.
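The plasmid copy-number estimate amounts to a ratio of read depths. A minimal sketch, assuming per-base depths have already been extracted for each replicon (for example from the depth column of `samtools depth` output); using medians is an illustrative choice to damp the effect of repeats, and the depth values below are hypothetical.

```python
from statistics import median


def plasmid_copy_number(plasmid_depths, chromosome_depths):
    """Plasmid copies per chromosome, estimated as a ratio of median per-base depths."""
    if not plasmid_depths or not chromosome_depths:
        raise ValueError("depth lists must be non-empty")
    chromosome = median(chromosome_depths)
    if chromosome == 0:
        raise ValueError("chromosome depth is zero")
    return median(plasmid_depths) / chromosome


# Hypothetical depths for illustration only.
ratio = plasmid_copy_number([2010, 1990, 2050], [1000, 1020, 980, 1010])
print(f"~{ratio:.1f} plasmid copies per chromosome")  # prints "~2.0 plasmid copies per chromosome"
```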
The origin of replication for this genome was assigned according to the reference genome C. trachomatis L2 434/Bu (NC_010287) (24). Of the 986 coding sequences identified in C. suis, 686 were assigned functions through comparison with other Chlamydia and through inferred functional assignment of known motifs, using a 75% identity threshold. The assigned genes were divided into 9 broad categories or defined as "other" using the subsystems approach and the RAST toolkit (25,26). Overall, the largest division, at 165 genes, was predicted to be involved in protein processing, followed closely by metabolism-related genes (163 genes; Figure 1B). Additional subsystems include stress response, defense or virulence (50 genes), DNA processing (51 genes), energy (62 genes), RNA processing (31 genes), cellular processes (28 genes), cell envelope (28 genes) and membrane transport (3 genes). The remaining 282 genes had no homologues or identified motifs in GenBank and were annotated as hypothetical.
For comparison, Chlamydia trachomatis L2/434/Bu contains a predicted 962 genes, while the C. muridarum Nigg genome consists of 983 genes, as re-annotated under the PATRIC annotation platform. C. suis contains 83 genes that do not appear to have direct homologs in either C. trachomatis or C. muridarum; while most of these are hypothetical proteins, several are known members of the tetracycline resistance island. There is a high level of synteny, approximately 90%, in the genome organization of these three genomes as well as with other members of the phylum (Figure 2A). A complete genetic identity table comparing the C. suis R19 gene content with single representatives of the chlamydial genomes in Figure 2A is provided in Supplementary Table 2. Both this genetic comparison and genome-wide sequence phylogenies (Figure 2B) corroborate previous findings using 16S sequences that C. suis is in a phylogenetic clade with C. trachomatis and C. muridarum (27). Upon closer inspection of the genetic make-up of the genome, the "20 most phylogenetically informative proteins" reported by Pillonel et al are present, intact and highly similar to those of the other Chlamydiales (28). As in the other species of the genus, there is one plasmid with 8 genes encoding replicative machinery and putative virulence factors (29).
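The text does not spell out how direct homologs were called. One common approach, shown here only as an assumed illustration, is reciprocal best hits (RBH): two genes are paired when each is the other's top hit. Genes of the focal genome with no RBH partner in any comparison genome then become candidates for species-specific genes. The gene identifiers and best-hit tables below are hypothetical placeholders.

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit.

    a_to_b and b_to_a map each query gene to its single best-scoring subject
    (e.g., the top BLAST hit passing an E-value cutoff).
    """
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


def genes_without_rbh(genes, *rbh_pair_lists):
    """Genes from the focal genome that have no RBH partner in any comparison genome."""
    paired = {a for pairs in rbh_pair_lists for a, _ in pairs}
    return sorted(set(genes) - paired)


# Hypothetical best-hit tables (suis genes s1..s4, trachomatis t*, muridarum m*).
rbh_ct = reciprocal_best_hits({"s1": "t1", "s2": "t2", "s3": "t9"},
                              {"t1": "s1", "t2": "s5", "t9": "s3"})
rbh_cm = reciprocal_best_hits({"s2": "m2"}, {"m2": "s2"})
print(rbh_ct)                                                       # [('s1', 't1'), ('s3', 't9')]
print(genes_without_rbh(["s1", "s2", "s3", "s4"], rbh_ct, rbh_cm))  # ['s4']
```

RBH is conservative: a gene whose best hit prefers a paralog (s2 versus t2 above) is left unpaired for that genome, which is why it is usually combined with other evidence such as synteny.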
The C. suis R19 tetracycline resistance cassette, likely integrated by the IS605 transposase and horizontal gene transfer, is thought to be spreading among C. suis strains. Two separate analyses of the tet islands were conducted by Seth-Smith et al and Marti et al in 2017, both of which included R19 for comparison (16,17). Results demonstrated that the island was plastic but that the key tet(C) gene was always intact, as it is in R19 (22), and that the cassette is transferable between C. suis strains by homologous recombination without the need for the transposase (21).
The C. suis plasticity zone
Arguably the most interesting genomic disparity between Chlamydia species is the plasticity zone (PZ). This variable region, present in all species of the genus, is hypothesized to contain genes encoding host-specific virulence factors. The PZ of C. muridarum contains a series of phospholipases (PLDs), three genes often annotated as adherence factors that have cytotoxic domains similar to those in Clostridium spp, and a purine interconversion operon (30). In contrast, the significantly reduced PZ of C. trachomatis contains several of the phospholipases but only a truncated version of one of the cytotoxin-resembling genes, in specific serovars, and has replaced the purine interconversion operon with a tryptophan biosynthesis pathway (Figure 3A) (31). Other full or partial genomes from 29 deposited SRA samples were analyzed using the C. suis R19 genome as a reference (22). Interestingly, seven of these C. suis strains contain only one of the adherence factors, indicating that the organism likely does not require both copies for success in vivo, as also noted by Seth-Smith et al (22). It remains unclear whether the two cytotoxins have distinct biological functions in vivo or whether they play a role in host or biovar selectivity.
Also found in the C. suis PZ is an intact tryptophan biosynthesis pathway, homologous to the one in C. trachomatis genital strains. Complete tryptophan biosynthesis operons are also found in C. pecorum and C. felis. Tryptophan levels are known to play a key role in inhibiting the pathogenesis of intracellular organisms, including Chlamydia. Human (36) and pig (37) innate immune responses can limit available tryptophan through the indoleamine 2,3-dioxygenase (IDO) pathway (38), mediated by interferon gamma (IFN-γ). Briefly, IFN-γ produced in response to chlamydial infection induces IDO, which catalyzes the breakdown of L-tryptophan into N-formylkynurenine, effectively depleting the tryptophan available to the bacteria. Indole producers, such as many members of the human vaginal microbiome, supply indole that can be salvaged by the trp operon to synthesize tryptophan.
Importantly, ocular C. trachomatis strains do not encode the trp operon, potentially due to the absence of indole-producing microbes at that site. Sherchand et al demonstrated that the trp operon can be deleterious to Chlamydia in the absence of indole producers (39). It follows that strong negative selection against the trp operon, due to a lack of indole producers in C. muridarum-infected tissues, could explain its absence in C. muridarum. There is currently no evidence for an IFN-γ-mediated IDO response in mice, which further supports negative selection against the trp operon in mouse-infecting Chlamydia. Overall, this pathway suggests a hypothesis for why C. muridarum infection has not been reported in humans, whereas C. suis has been detected in human samples, albeit without evidence of active human infection or symptoms.
As in C. muridarum and C. trachomatis, the PZ of C. suis contains a putative operon encoding phospholipases (PLDs) and a conserved MAC/perforin-like protein. The C. suis operon, however, appears to contain more PLDs than those of both C. trachomatis and C. muridarum (Fig. 3A). Eukaryotic phospholipase D has roles in lipid metabolism and vesicle regulation, and this family of proteins has been exploited by pathogens to increase virulence (40). While the in vivo role of these genes in Chlamydia remains largely unknown, a study of C. trachomatis in HeLa cells provides evidence that the PZ phospholipases (pzPLDs) may be important for inclusion formation (41). Other bacterial phospholipase D genes have been characterized, however, and could provide insight into their role in Chlamydia. A study in Neisseria gonorrhoeae showed that a phospholipase D homolog increases adherence to and invasion of cervical cells by stimulating complement receptor type 3-mediated endocytosis (42). This effect was species-specific, as other bacterial phospholipase D proteins were not able to rescue a knockout mutant. In Yersinia pestis, a plasmid-encoded phospholipase D allows the pathogen to survive in its arthropod host, further suggesting that these proteins may play a role in host specificity (43).
The make-up of the Chlamydia PLD gene family, therefore, may affect the ability of each species to infect its respective host(s). As noted, C. suis contains more PLD genes within its PZ than almost any other chlamydial species. In fact, several species, including C. caviae, C. pneumoniae and many parachlamydiae, lack PLD family genes in the plasticity zone but retain an ancestral chromosomal PLD outside of the PZ (41). The variability in number and sequence of this PLD operon, together with the putative roles of PLDs in virulence and host specificity in other bacterial species, suggests that these enzymes could provide an essential function in manipulating the host cell, allowing each species to infect and survive within its specific host.
Phylogenetic analysis of the repertoire of PLD enzymes, identified through their HxK(x)4D(x)6G domains, revealed a distinct clade of chromosomal PLDs that mirrors classic phylogenetic relationships within Chlamydia (Figure 3B). This indicates that the chromosomal PLD is likely truly ancestral and has evolved with the speciation of Chlamydia. The pzPLDs, however, are not so simply categorized. These sequences are highly diverse, with a subset of proteins more closely related to the chromosomal PLD, perhaps representing duplication events. The clade more distantly related to the chromosomal PLDs includes the greatest number of pzPLDs, potentially pointing to a unique role for this subset. The division between clades could reflect differential selection on these PLDs, which would support distinct biological functions.
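The HxK(x)4D(x)6G signature used to identify PLD family members translates directly into a regular expression in which x matches any residue. The sketch below scans a protein sequence for that pattern as written in the text; the test sequence is synthetic, and a real survey would run the scan over every predicted protein before alignment and tree building.

```python
import re

# HxK(x)4D(x)6G as a regular expression: "." stands for any residue (x).
# Wrapping it in a lookahead lets overlapping occurrences be reported too.
HKD_LOOKAHEAD = re.compile(r"(?=(H.K.{4}D.{6}G))")


def find_hkd_motifs(protein_seq):
    """Return (start, end, motif) for each HxK(x)4D(x)6G match; 0-based, end exclusive."""
    hits = []
    for match in HKD_LOOKAHEAD.finditer(protein_seq.upper()):
        start = match.start()
        motif = match.group(1)
        hits.append((start, start + len(motif), motif))
    return hits


# Synthetic example: one motif starting at position 2.
print(find_hkd_motifs("MAHAKAAAADAAAAAAGLL"))  # [(2, 17, 'HAKAAAADAAAAAAG')]
```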

C. suis membrane proteins
One key protein differentiating members of Chlamydiaceae is the major outer membrane protein (Momp, encoded by ompA). This porin-like protein is integrated into the outer membrane of all Chlamydia species and has been shown to make up approximately 60% of the chlamydial outer membrane (44). Both the complete structure and the function of this protein remain unresolved, but its conservation and abundance indicate that it is crucial to bacterial survival in the host. A 2016 study of ompA variation among C. muridarum strains collected from wild mice found, through PCR analysis, 99% identity between ompA genes (45). Similarly, several strains of Chlamydia pecorum, Chlamydia abortus, Chlamydia pneumoniae and Chlamydia psittaci (46) have been deposited, and close examination of these strains shows little ompA diversity. In contrast, C. trachomatis ompA is more variable between strains than C. muridarum ompA, which has allowed classification into distinct serovars or clades based solely on ompA typing.
While it has been shown that genotyping by ompA does not indicate genomic relatedness (47), these divisions do appear to correlate with specific pathobiovars. In C. trachomatis, ompA appears to be evolving at a faster rate than the rest of the genome, likely due to unique selective pressures, (48) suis using PCR analysis of VD2 and VD3 (49). When complete ompA genes from the deposited contigs were compared with R19, similar results were found. C. suis Momp also appears to be divided into four variable domains that directly correspond to those seen in C. trachomatis, though they are divergent (Figure 4). Phylogenetic analysis of C. suis ompA divides the strains into distinct clades, corroborating the observation by Chahota et al that C. suis strains could be serotyped; a larger study could investigate potential distinctions between the clades.
There are several other important gene families that show some variability among the Chlamydiales, including the inclusion membrane proteins (Incs) and the polymorphic membrane proteins (Pmps). Both families have been shown to correlate with tissue tropism and, therefore, may play a role in speciation (50). The chlamydial inclusion is a modified vacuole in which chlamydiae grow and divide. Relatively little is known about the modifications to this vacuole that allow the bacteria to survive, but Inc proteins are secreted into the inclusion membrane and are known to interact with the host cytosol. Inc proteins make up approximately 7-10% of the chlamydial proteome and are highly diverse (51). Despite this diversity, most Incs are defined by a characteristic bilobed hydrophobic motif. This domain is conserved among predicted Inc proteins and appears to be Chlamydia-specific.
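The bilobed hydrophobic motif can be approximated with a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle scale and treats a protein as Inc-like when two hydrophobic stretches lie close together. The window size (9), hydropathy threshold (1.6), minimum segment length and maximum gap are illustrative assumptions, not the criteria used to predict the Incs discussed here, and the test proteins are synthetic.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=9, threshold=1.6):
    """Half-open (start, end) stretches covered by consecutive windows whose
    mean hydropathy is >= threshold. Unknown residues score 0."""
    seq = seq.upper()
    segments, current = [], None
    for s in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[s:s + window]) / window
        if mean >= threshold:
            if current is None:
                current = [s, s + window]      # open a new segment
            else:
                current[1] = s + window        # extend the open segment
        elif current is not None:
            segments.append(tuple(current))    # close it at the first failing window
            current = None
    if current is not None:
        segments.append(tuple(current))
    return segments


def has_bilobed_domain(seq, min_len=15, max_gap=20, **scan_kwargs):
    """True if two adjacent hydrophobic segments (each >= min_len residues)
    are separated by at most max_gap residues."""
    segs = [seg for seg in hydrophobic_segments(seq, **scan_kwargs)
            if seg[1] - seg[0] >= min_len]
    return any(b[0] - a[1] <= max_gap for a, b in zip(segs, segs[1:]))


# Synthetic proteins: polar flanks, 20-residue hydrophobic stretches, polar loop.
bilobed = "DE" * 10 + "L" * 20 + "DE" * 5 + "L" * 20 + "DE" * 10
single = "DE" * 10 + "L" * 20 + "DE" * 10
print(hydrophobic_segments(bilobed))                             # [(18, 42), (48, 72)]
print(has_bilobed_domain(bilobed), has_bilobed_domain(single))   # True False
```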
Predictions of the number of Inc proteins vary across the genus, with 23 conserved Incs encoded in 5 examined species (52). The C. suis R19 genome encodes each of these conserved Incs as well as 36 non-conserved predicted Incs, for a total of 58 predicted Incs. According to Lutter et al, C. trachomatis encodes 55 Incs, 6 of which differ from those of C. muridarum, which has 53 Incs. Alignment of the 58 putative C. suis inclusion membrane proteins with those of C. trachomatis and C. muridarum revealed an additional 2 putative Incs for each species. A Venn diagram and phylogeny were constructed to display the Incs shared between or specific to the three species and their evolutionary relationships (Figure 5). Briefly, 51 Incs are conserved among the three species of the clade. One Inc is unique to C. muridarum (TC0011, Figure 5B).
Several Incs are shared by only two species, with 5 Incs found only in C. trachomatis and C. suis: CT529 (Chls_876), CT222 (Chls_534), CT224 (Chls_538), CT225 (Chls_539) and CT164 (Chls_474) (Figure 5C). Three of these are found in an operon of inclusion membrane proteins and may function together as a complex. Additionally, three Incs are unique to C. muridarum and C. suis: TC0239 (Chls_262), TC0171 (Chls_829) and TC0573 (Chls_617). These are not phylogenetically similar and likely stem from an ancestral duplication event before the evolutionary division of the two species. Unexpectedly, given the placement of C. suis as an intermediate between C. trachomatis and C. muridarum, one Inc appears to be absent in C. suis but present in the other two species (CT227/TC0498) (Figure 5C, panel 6). This Inc is closely related to the neighboring Inc CT226/TC0497, which is also present in C. suis. In general, the conservation of Incs among C. suis, C. muridarum and C. trachomatis suggests that protein functions may be retained in all three species and that they may share cognate host interactions. Non-conserved Incs may indicate key differences in host interactions or in the host environment in general (Figure 5C). Functional analysis of host-binding or interacting partners for each of these Incs could reveal their specific roles in their host species.
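The Venn-style comparison reduces to labelling each Inc ortholog group by the exact set of species that carry it. A minimal sketch; the presence/absence entries are limited to examples named in the text, and the group labels are strings chosen for readability rather than a real data format.

```python
SPECIES = ("C. trachomatis", "C. muridarum", "C. suis")


def venn_category(present_in):
    """Label an ortholog group by the exact combination of species carrying it."""
    present = [sp for sp in SPECIES if sp in present_in]
    if not present:
        return "none"
    if len(present) == len(SPECIES):
        return "all three"
    return " + ".join(present) + " only"


# Presence/absence for a few Inc groups named in the text.
inc_groups = {
    "CT529/Chls_876": {"C. trachomatis", "C. suis"},
    "TC0239/Chls_262": {"C. muridarum", "C. suis"},
    "CT227/TC0498": {"C. trachomatis", "C. muridarum"},
    "CT226/TC0497": {"C. trachomatis", "C. muridarum", "C. suis"},
    "TC0011": {"C. muridarum"},
}
for group, species in inc_groups.items():
    print(f"{group}: {venn_category(species)}")
```

Counting the labels over all groups (for example with `collections.Counter`) gives the region sizes of the Venn diagram.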
Like Incs, polymorphic membrane proteins (Pmps) are present in varying numbers across species, though the specific roles of each Pmp are unknown. Pmps are outer membrane proteins and are immunogenic in humans. These genes make up approximately 1-2% of the chlamydial gene content, and each encodes a C-terminal phenylalanine as well as multiple GGAI motifs, which are associated with host cell adhesion (53). Nine putative pmps were annotated in C. suis R19 through a genome-wide search for multiple GGAI motifs. Each of these corresponds directly to a Pmp in C. trachomatis and C. muridarum.
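A simple screen in the spirit of the genome-wide search described above flags proteins that carry several GGAI motifs and end in phenylalanine. The threshold of two motifs is an assumed reading of "multiple", and the motif argument is treated as a regular expression so that a broader pattern could be substituted; the sequences in the usage lines are synthetic.

```python
import re


def is_pmp_candidate(protein_seq, motif=r"GGAI", min_motifs=2):
    """Flag proteins with >= min_motifs matches of `motif` (a regular expression)
    and a C-terminal phenylalanine. min_motifs=2 is an assumed reading of "multiple"."""
    seq = protein_seq.upper().rstrip("*")            # drop a trailing stop symbol if present
    n_motifs = len(re.findall(f"(?={motif})", seq))  # lookahead also counts overlapping matches
    return n_motifs >= min_motifs and seq.endswith("F")


print(is_pmp_candidate("MKGGAILLSGGAIAAF"))   # True: two GGAI motifs, ends in F
print(is_pmp_candidate("MKGGAILLSAAF"))       # False: only one motif
```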

Conclusions
The C. suis R19 genome was determined through whole genome sequencing and fully annotated using PATRIC. One key aspect of this study was the use of the NIAID annotation service PATRIC for independent annotation. As indicated in Table 1 Based on the evidence for C. suis evolution with C. trachomatis and C. muridarum, and the presence of several genes shared with C. trachomatis but absent from C. muridarum, C. suis may provide a better model for human chlamydial infection than C. muridarum. To evaluate this, several questions would need to be answered. First, does C. suis infect, ascend the genital tract and cause pathology in mice? One study has provided some evidence that C. suis may result in a more robust infection in mice than C. trachomatis (54), but the absence of direct infection analyses and comparison with C. muridarum leaves this hypothesis uncertain.
Comparative analysis of the C. suis genome opens the door for further studies into the complexities of the chlamydial genetic repertoire, proteome and host-specificity adaptations. Interestingly, individual Chlamydia species display diverse but limited mammalian host ranges. Livestock pathogens such as C. pecorum and C. abortus are among the few species able to infect and cause disease in multiple host species. C. pecorum causes chlamydiosis in a variety of animals, particularly ruminants and swine, and is notably the leading cause of infectious disease in koalas, while C. abortus also infects ruminants and has been associated with spontaneous abortions in swine and sheep. In contrast, most Chlamydia, including C. suis, C. trachomatis and C. muridarum, appear to be largely restricted to a single host. Comparative genomic studies such as this report will continue to identify molecular candidates potentially involved in restricting host range and will enable direct genetic analyses of the roles of these host-specific factors.

Whole Genome Sequencing and De Novo Assembly
Using the DNeasy Blood and Tissue Kit (Qiagen), DNA was isolated and purified from C. suis strain R19, provided by Dr. Daniel D Rockey (8). Library generation was performed at the University of Kansas Genome Sequencing Core using the NEBNext Ultra II DNA Library kit. Libraries were run on the Illumina MiSeq PE100. Over 90 million reads of 151 bp were obtained. Reads were trimmed and quality filtered using BBDuk. Subsequent de novo assembly was performed in the Geneious (version 9.1.8) software suite using Velvet v1.2.10, with the Velvet optimizer used to determine an optimal k-mer (55). Where needed, PCR and Sanger sequencing were used to resolve larger gaps and to verify the final assembly.
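The text states only that the Velvet optimizer was used to determine an optimal k-mer. A common criterion for such sweeps is to keep the k that maximizes contig N50; the sketch below assumes that criterion and takes contig lengths from assemblies already run at each candidate k (running Velvet itself is outside the sketch). The contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Smallest length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_kmer_by_n50(assemblies):
    """Given {k: [contig lengths]}, return the k whose assembly has the largest N50."""
    return max(assemblies, key=lambda k: n50(assemblies[k]))


# Hypothetical contig lengths from three trial assemblies.
assemblies = {31: [500, 100, 100], 41: [300, 300, 100], 51: [200, 200, 200, 200]}
print({k: n50(v) for k, v in assemblies.items()})   # {31: 500, 41: 300, 51: 200}
print(pick_kmer_by_n50(assemblies))                  # 31
```

N50 alone rewards long contigs even if they are misassembled, so it is typically paired with checks such as total assembly length against the expected genome size.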
Predicted coverage of the assembled R19 genome was over 1000x. The origin of replication and the first nucleotide were assigned using C. trachomatis L2 434/Bu (NC_010287) as a reference, and genomes were adjusted accordingly.
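Setting the first nucleotide of a circular chromosome amounts to rotating the assembly so that it begins at the position corresponding to the reference start. A minimal sketch, assuming a short anchor sequence that occurs exactly once in the assembly; in practice the corresponding position would be located by alignment to C. trachomatis L2 434/Bu, so exact matching is a simplification. Both strands and the wrap-around junction are searched, and the sequences in the usage lines are toy examples.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_anchor(circular_seq, anchor):
    """Rotate a circular sequence so that it begins at `anchor`.

    The forward strand is searched first, then the reverse complement; the
    first len(anchor) - 1 bases are appended so matches spanning the end of
    the linear representation are found. Returns the rotated sequence on the
    strand that contains the anchor.
    """
    for strand_seq in (circular_seq, reverse_complement(circular_seq)):
        wrapped = strand_seq + strand_seq[: len(anchor) - 1]
        pos = wrapped.find(anchor)
        if pos != -1:
            return strand_seq[pos:] + strand_seq[:pos]
    raise ValueError("anchor not found on either strand")


print(rotate_to_anchor("CAACGTTTG", "TGCA"))   # TGCAACGTT (anchor spans the junction)
print(rotate_to_anchor("ATCCCTTT", "GGGA"))    # GGGATAAA (anchor on the reverse strand)
```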

Annotation
Annotation was performed using the RAST tool kit (RASTtk) through PATRIC web resources (56) and using open reading frame prediction through Geneious. Functional assignments of Geneious-identified open reading frames were verified through BLAST analysis against protein sequences in the NCBI database. Subsystems were determined through PATRIC automatic annotation.
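Verifying ORF assignments by BLAST usually means keeping, for each query, the best hit that passes a significance cutoff. A minimal sketch, assuming BLAST+ tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The 1e-5 E-value default echoes the cutoff given under Phylogenetic Analyses, the optional identity filter echoes the 75% threshold mentioned in the Results, and the example rows are invented.

```python
def best_hits(blast_tab_lines, max_evalue=1e-5, min_pident=0.0):
    """Best (highest-bitscore) hit per query from BLAST+ `-outfmt 6` output.

    Assumes the 12 default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. Hits failing either cutoff are skipped.
    """
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, evalue, bitscore = float(fields[2]), float(fields[10]), float(fields[11])
        if evalue > max_evalue or pident < min_pident:
            continue
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {"subject": subject, "pident": pident,
                           "evalue": evalue, "bitscore": bitscore}
    return best


# Invented example rows (tab-separated).
rows = [
    "q1\tsA\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
    "q1\tsB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t180",
    "q2\tsC\t40.0\t50\t30\t1\t1\t50\t1\t50\t0.01\t30",
]
hits = best_hits(rows)
print({q: h["subject"] for q, h in hits.items()})   # {'q1': 'sB'}
```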

Phylogenetic Analyses
Comparison of genetic content was performed using the following reference genomes: C. BLAST E-value of 1e-5. Phylogenetic trees comparing single genes were constructed using the Geneious Tree Builder for a global alignment with free end gaps and a cost matrix set to Blosum65. Distances were obtained from pairwise alignments of all sequence pairs. To assemble the trees, a Jukes-Cantor genetic distance model was selected and UPGMA tree building was performed. Nucleotide-based alignments and analyses were also performed in all cases, and the resulting phylogenies showed relationships similar to those from protein sequences (data not shown). Whole-genome alignments and subsequent phylogenies to determine genomic evolutionary relationships were performed using the progressiveMauve algorithm with a match seed weight of 15 and a minimum LCB score of 30,000. Gaps were aligned using MUSCLE 3.6, and a phylogenetic tree was determined using the neighbor-joining method with bootstrapping.
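The distance-based pipeline described above (pairwise distances, a Jukes-Cantor correction, then UPGMA clustering) can be sketched in plain Python. This illustrates the technique and is not Geneious Tree Builder's implementation: `n_states=4` gives the nucleotide Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3), `n_states=20` gives the analogous amino-acid form, and which variant Geneious applies to protein alignments is not stated here. The three-taxon distances in the usage line are made up.

```python
import math


def jukes_cantor(p, n_states=4):
    """Jukes-Cantor distance for proportion p of differing aligned sites.

    d = -b * ln(1 - p / b) with b = (n_states - 1) / n_states
    (b = 3/4 for nucleotides; n_states=20 gives the amino-acid analogue).
    """
    b = (n_states - 1) / n_states
    if p >= b:
        return math.inf  # saturated: correction is undefined
    return -b * math.log(1 - p / b)


def upgma(names, dist):
    """UPGMA clustering of a symmetric distance table {(a, b): d} covering all pairs.

    Repeatedly merges the closest pair of clusters, places the new node at half
    their distance (ultrametric assumption) and averages distances to the other
    clusters weighted by cluster size. Returns a Newick string.
    """
    def key(a, b):
        return (a, b) if a < b else (b, a)

    d = {key(a, b): v for (a, b), v in dist.items()}
    clusters = {n: (n, 1, 0.0) for n in names}  # label -> (newick, size, height)
    merges = 0
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        height = dab / 2
        newick_a, size_a, h_a = clusters.pop(a)
        newick_b, size_b, h_b = clusters.pop(b)
        new = f"_node{merges}"
        merges += 1
        for c in clusters:
            d[key(new, c)] = (size_a * d[key(a, c)] + size_b * d[key(b, c)]) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[new] = (f"({newick_a}:{height - h_a:.4g},{newick_b}:{height - h_b:.4g})",
                         size_a + size_b, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


# Made-up three-taxon distances.
print(upgma(["A", "B", "C"], {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0}))
# (C:3,(A:1,B:1):2);
```

UPGMA assumes a constant rate of evolution across lineages; neighbor-joining, used above for the whole-genome tree, drops that assumption.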

Declarations
Ethics approval and consent to participate
Not Applicable

Not Applicable
Availability of data and materials
The datasets generated and/or analyzed during the current study are available in the NCBI repository, www.ncbi.nlm.nih.gov, under the accession numbers CP034310 and CP034311.

Competing Interests
The authors declare that they have no competing interests.

Funding
ZED was supported by NIH AI126785. PSH was supported by NIH AI126785 and P20GM113117. Genome sequencing was supported by NIH P20GM103638. NIH or associated administrative personnel had no direct influence on study design, analysis, interpretation of the data, or writing this manuscript.
Authors' Contributions
ZED assembled and annotated the genome, performed all subsequent comparative analyses and wrote the manuscript. PSH was a major contributor of analytical direction and in revising the manuscript. All authors read and approved the final manuscript.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript.