The Chlamydia suis genome architecture and composition.
The C. suis R19 strain genome was sequenced using Illumina HiSeq with over 1000X coverage (Supplementary Figure 1). This enabled an unbiased, de novo assembly of the genome. The completed genome consists of a chromosome of 1,094,719 bp and a plasmid of 7,496 bp. Read density analysis of the plasmid relative to the chromosome supported approximately 2 copies of the plasmid per chromosome. The genome was annotated using both direct ORF prediction and BLAST homology-based annotation through Geneious and the PATRIC annotation platform, revealing 986 coding regions (Chlamydia suis; Chls_###), with 6 rRNAs and 37 tRNAs, and an overall 41.7% G+C content. Relevant features of the R19 genome are displayed in Figure 1A and Supplementary Table 1, with comparative data in Table 1.
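The read density comparison used for the plasmid copy number can be illustrated with a minimal sketch. It assumes that copy number is approximated by the ratio of mean read depth on the plasmid to mean read depth on the chromosome. The function names and the mapped-base totals are hypothetical and chosen only for illustration; only the two replicon lengths come from the text.

```python
def mean_depth(mapped_bases: float, replicon_length: int) -> float:
    """Mean per-base read depth: total mapped bases divided by replicon length."""
    if replicon_length <= 0:
        raise ValueError("replicon_length must be positive")
    return mapped_bases / replicon_length


def plasmid_copy_number(plasmid_bases: float, plasmid_length: int,
                        chromosome_bases: float, chromosome_length: int) -> float:
    """Estimate plasmid copies per chromosome as a ratio of mean depths."""
    plasmid_depth = mean_depth(plasmid_bases, plasmid_length)
    chromosome_depth = mean_depth(chromosome_bases, chromosome_length)
    return plasmid_depth / chromosome_depth


# Replicon lengths reported for C. suis R19.
CHROMOSOME_LENGTH = 1_094_719
PLASMID_LENGTH = 7_496

# Hypothetical mapped-base totals (NOT measured values): 1000X on the
# chromosome and 2000X on the plasmid.
chromosome_bases = 1000 * CHROMOSOME_LENGTH
plasmid_bases = 2000 * PLASMID_LENGTH

estimate = plasmid_copy_number(plasmid_bases, PLASMID_LENGTH,
                               chromosome_bases, CHROMOSOME_LENGTH)
print(f"Estimated plasmid copies per chromosome: {estimate:.2f}")
```

In practice the per-replicon depths would come from a read alignment, and normalizing by replicon length is what makes the small plasmid comparable to the much larger chromosome.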
The origin of replication for this genome was assigned according to the reference genome C. trachomatis L2 434/Bu (NC_010287) (24). Of the 986 coding sequences identified in C. suis, 686 were assigned through comparisons to other Chlamydia and through inferred functional assignment of known motifs with a 75% identity threshold. The assigned genes were divided into 9 broad categories or defined as “other” using the subsystems approach and the RAST toolkit (25, 26). Overall, the largest division, at 165 genes, was predicted to be involved in protein processing, followed closely by metabolism-related genes with 163 (Figure 1B). Additional subsystems include those involved in stress response, defense or virulence (50 genes), DNA processing (51 genes), energy (62 genes), RNA processing (31 genes), cellular processes (28 genes), cell envelope (28 genes) or membrane transport (3 genes). The remaining 282 genes had no homologues or identified motifs in GenBank and were annotated as hypothetical.
For comparison, Chlamydia trachomatis L2/434/Bu contains a predicted 962 genes while the C. muridarum Nigg genome consists of 983 genes, as re-annotated under the PATRIC annotation platform. C. suis contains 83 genes that do not appear to correlate with direct homologs in either C. trachomatis or C. muridarum. While most of these are hypothetical proteins, several are known members of the tetracycline resistance island. There is a high level of synteny, approximately 90%, in the genome organization of these three genomes as well as with other members of the phylum (Figure 2A). A complete genetic identity table comparing the C. suis R19 gene content to single representatives from the chlamydial genomes in Figure 2A is provided in Supplementary Table 2. Both this genetic comparison and genome-wide sequence phylogenies (Figure 2B) corroborate previous findings using 16S sequences that C. suis is in a phylogenetic clade with C. trachomatis and C. muridarum (27). Upon closer inspection of the genetic make-up of the genome, the “20 most phylogenetically informative proteins” as reported by Pillonel et al are present, intact, and highly similar to those of the other Chlamydiales (28). As in the other species in the genus, there is one plasmid with 8 genes encoding replicative machinery and putative virulence factors (29).
The C. suis R19 tetracycline resistance cassette, likely integrated by the IS605 transposase and horizontal gene transfer, is thought to be spreading among C. suis strains. Two separate analyses of the tet islands were conducted by Seth-Smith et al and Marti et al in 2017, both of which included R19 for comparison (16,17). Results demonstrated that the island was plastic but that the key tet(C) gene was always intact, as it is for R19 (22), and that the cassette is transferable between C. suis strains by homologous recombination without the need for the transposase (21).
The C. suis plasticity zone
Arguably the most interesting genomic disparity among Chlamydia species is the plasticity zone (PZ). This is a variable region present in all species of the genus that is hypothesized to contain genes responsible for host-specific virulence factors. The PZ of C. muridarum contains a series of phospholipases (PLDs), three genes often annotated as adherence factors that have cytotoxic domains similar to those in Clostridium spp., as well as a purine interconversion operon (30). In contrast, the significantly reduced PZ of C. trachomatis contains several of the phospholipases, but contains only a truncated version of one of the cytotoxin-resembling genes in specific serovars and has replaced the purine interconversion operon with a tryptophan biosynthesis pathway (Figure 3A) (31). Other Chlamydia contain permutations of these plasticity zones: some contain no phospholipases, and cytotoxin numbers vary, with two present in C. pecorum and one ortholog each in C. caviae, C. felis, and C. psittaci. The presence or absence of the tryptophan operon and close investigation of the PZ have provided insight into the immunological interactions between host and bacteria (32–34).
The C. suis PZ contains two adherence factors, both of which are similar to two of the C. muridarum cytotoxin-like genes, as also determined by Manuela et al for C. suis MD56 (Figure 3) (35). To investigate the presence of these cytotoxins in other strains of C. suis, full or partial genomes were analyzed for 29 deposited SRA samples using the C. suis R19 genome as a reference (22). Interestingly, when these strains of C. suis were examined, seven were discovered to contain only one of the adherence factors, indicating that both copies are likely non-essential for the in vivo success of the organism, as also noted by Seth-Smith et al (22). It is as yet unclear whether the two cytotoxins have distinct biological functions in vivo or whether they play a role in host or biovar selectivity.
Also found in the C. suis PZ is an intact tryptophan biosynthesis pathway, homologous to the one in C. trachomatis genital strains. Complete tryptophan biosynthesis operons are also found in C. pecorum and C. felis. Tryptophan levels are known to play a key role in inhibiting the pathogenesis of intracellular organisms, including Chlamydia. Human (36) and pig (37) innate immune responses are able to limit the available tryptophan through the indoleamine 2,3-dioxygenase (IDO) (38) response pathway mediated by interferon gamma (IFN-γ). Briefly, IFN-γ, activated by a chlamydial infection, will induce IDO to catalyze the breakdown of L-tryptophan into N-formylkynurenine, effectively depleting the tryptophan available to the bacteria. Indole producers, like many members of the human vaginal microbiome, provide the necessary input for salvage by the trp operon. Importantly, ocular C. trachomatis strains do not encode the trp operon, potentially due to the absence of indole-producing microbes. Sherchand et al demonstrated that the trp operon can be deleterious to Chlamydia in the absence of indole producers (39). It would follow that strong negative selection on the trp operon, due to a lack of indole producers in C. muridarum-infected tissues, could explain the absence of the trp operon in C. muridarum. There is no current evidence to support an IFN-γ-mediated IDO response in mice, which further supports the negative selection of the trp operon in mouse-infecting Chlamydia. Overall, this pathway could provide a hypothesis as to why there are no incidences of C. muridarum infection in humans, whereas there have been reports of C. suis present in human samples, although active human infection or symptomology has not been reported.
As is seen in C. muridarum and C. trachomatis, the PZ of C. suis contains a putative operon encoding phospholipases (PLDs) and a conserved MAC/perforin-like protein. The operon in C. suis, however, seems to contain an increased number of PLDs relative to both C. trachomatis and C. muridarum (Figure 3A). Eukaryotic phospholipase D has roles in lipid metabolism and vesicle regulation, and this family of proteins has been exploited by pathogens to increase virulence (40). While the in vivo role of these genes in Chlamydia remains relatively unknown, a study done with C. trachomatis in HeLa cells provides evidence that the PZ phospholipases (pzPLDs) may be important for inclusion formation (41). Other bacterial phospholipase D genes have been characterized, however, and could provide insight into the role of these genes in Chlamydia. A study in Neisseria gonorrhoeae showed that a phospholipase D homolog acts to increase adherence to and invasion of cervical cells by stimulating complement receptor type 3-mediated endocytosis (42). This effect was species specific, as other bacterial phospholipase D proteins were not able to rescue a knockout mutant. In Yersinia pestis, a plasmid-encoded phospholipase D allows the pathogen to survive in its arthropod host, further suggesting that these proteins may play a role in host specificity (43).
The make-up of Chlamydia PLD family genes, therefore, may affect the ability of each species to infect its respective host(s). As noted, C. suis contains more PLD genes within its PZ than almost any other chlamydial species. In fact, several species, including C. caviae, C. pneumoniae and many parachlamydiae, lack PLD family genes in the plasticity zone but retain an ancestral chromosomal PLD outside of the PZ (41). The variability in number and sequence of this PLD operon, together with the putative roles of PLDs in virulence and host specificity in other bacterial species, suggests that these enzymes could provide an essential function in the manipulation of the host cell, allowing each species to infect and survive within its specific host.
Phylogenetic analysis of the repertoire of PLD enzymes, as identified through their HxKx4Dx6G domains, revealed a distinct clade of chromosomal PLDs that mirrors the classic phylogenetic relationships within Chlamydia (Figure 3B). This indicates that the chromosomal PLD is likely truly ancestral and has evolved with the speciation of Chlamydia. The pzPLDs, however, are not so simply categorized. There is great diversity in these sequences, with a subset of proteins more closely related to the chromosomal PLD, perhaps representing duplication events. The clade more distantly related to the chromosomal PLDs includes the greatest number of pzPLDs, potentially pointing to a unique role for this subset of PLDs. The division between clades could suggest differential selection for these PLDs, which would support a distinct biological function.
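As an illustration of how the HxKx4Dx6G signature can be located in protein sequences, the following minimal sketch translates the pattern into a regular expression, reading "x" as any single residue and "x4" or "x6" as four or six such residues. The function name and the toy sequence are hypothetical; this is not necessarily the search procedure used in this study.

```python
import re

# HxKx4Dx6G: H, any residue, K, any four residues, D, any six residues, G.
# The lookahead lets finditer report overlapping occurrences as well.
HKD_MOTIF = re.compile(r"(?=(H[A-Z]K[A-Z]{4}D[A-Z]{6}G))")


def find_hkd_motifs(protein_seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every HxKx4Dx6G hit."""
    seq = "".join(protein_seq.split()).upper()  # drop whitespace and line breaks
    return [(m.start(1), m.group(1)) for m in HKD_MOTIF.finditer(seq)]


# Toy sequence (hypothetical, not a real chlamydial PLD) carrying two motifs.
toy_protein = "MA" + "HAKAAAADAAAAAAG" + "LL" + "HWKIMGSDNATLLRG" + "K"

hits = find_hkd_motifs(toy_protein)
for start, motif in hits:
    print(f"motif at position {start + 1}: {motif}")
```

Applied to every predicted protein in a genome, a scan of this kind would yield the candidate PLD repertoire that is then aligned for phylogenetic analysis.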
C. suis membrane proteins
One key protein differentiating members of Chlamydiaceae is the major outer membrane protein (Momp, ompA). This porin-like structure is integrated into the outer membrane of all Chlamydia species and has been shown to make up approximately 60% of the chlamydial outer membrane (44). Both the complete structure and the function of this protein remain a mystery, but its conservation and high abundance indicate that this gene is crucial to bacterial survival in the host. In 2016, a study examined variation in ompA among strains of C. muridarum collected from wild mice and found, through PCR analysis, 99% identity between ompA genes (45). Similarly, several strains of Chlamydia pecorum, Chlamydia abortus, Chlamydia pneumoniae, and Chlamydia psittaci (46) have been deposited, and close examination of these strains shows little ompA diversity. In contrast, C. trachomatis ompA shows greater variability between strains than C. muridarum ompA, which has allowed classification into unique serovars or clades based solely on ompA-typing.
While it has been shown that genotyping by ompA does not indicate genomic relatedness (47), these divisions do appear to correlate with specific pathobiovars. In C. trachomatis, ompA appears to be evolving at a faster rate than the rest of the genome, likely due to unique selective pressures (48), and is characterized by four variable domains (VD) which are used to distinguish serovars. These variable domains are generally regions of lower hydrophobicity and potentially indicate outward-facing domains, leading to the hypothesis of biovar specificity attributed to this gene. The variability in ompA could confer an evolutionary advantage specific to the host organism and the specific tissues involved in infection.
A recent analysis by Chahota et al investigated the potential for serotyping by Momp in C. suis using PCR analysis of VD2 and VD3 (49). When the deposited contigs were searched for complete ompA genes and compared to R19, similar results were found. C. suis Momp also appears to be divided into four variable domains which directly correspond to those seen in C. trachomatis, though they are divergent in sequence (Figure 4). Phylogenetic analysis of C. suis ompA divides the strains into distinct clades, corroborating the observation by Chahota et al that C. suis strains could be serotyped, and a larger study could be performed to investigate the potential for distinctions between the clades.
There are several other important gene families that show some variability between the Chlamydiales, including a family of inclusion membrane proteins (Incs) and polymorphic membrane proteins (Pmps). Both families have been shown to be correlated with tissue tropism and, therefore, may play a role in speciation (50). The chlamydial inclusion is a modified vacuole in which chlamydiae are able to grow and divide. Relatively little is known about the modifications made to this vacuole that allow for the bacteria’s survival, but proteins of the Inc family are secreted into the inclusion membrane and are known to interact with the host cytosol. Inc proteins make up approximately 7–10% of the chlamydial proteome and have a high degree of diversity (51). Despite this diversity, most Incs are defined by a characteristic bilobed hydrophobicity motif. This domain is conserved among predicted Inc proteins and seems to be a Chlamydia-specific motif.
There are varying predictions for the numbers of Inc proteins across the genus, with 23 conserved Incs encoded in 5 examined species (52). The C. suis R19 genome encodes each of these conserved Incs as well as 36 non-conserved predicted Incs for a total of 58 predicted Incs. According to Lutter et al, C. trachomatis encodes 55 Incs, 6 of which differ from those of C. muridarum, which has 53 Incs. Analysis of the 58 putative inclusion membrane proteins in C. suis that were identified and aligned with C. trachomatis and C. muridarum revealed an additional 2 putative Incs for each species. A Venn diagram and phylogeny were constructed to display Incs shared between or specific to the three species and their evolutionary relationship (Figure 5). Briefly, 51 Incs are conserved within the three species of the clade. One Inc is unique to C. muridarum (TC0011, Figure 5B). Several Incs are shared between species, with 5 Incs found in C. trachomatis and C. suis only: CT529 (Chls_876), CT222 (Chls_534), CT224 (Chls_538), CT225 (Chls_539) and CT164 (Chls_474) (Figure 5C). Three of these are found in an operon of inclusion membrane proteins and possibly function together as a complex. Additionally, three Incs are found only in C. muridarum and C. suis: TC0239 (Chls_262), TC0171 (Chls_829) and TC0573 (Chls_617). These are not phylogenetically similar and likely stem from an ancestral duplication event before the evolutionary division of the two species. Interestingly, and unexpectedly given the placement of C. suis as an intermediate between C. trachomatis and C. muridarum, there appears to be one Inc that is absent in C. suis but found in the other two species (CT227/TC0498) (Figure 5C, panel 6). This Inc is closely related to the neighboring Inc CT226/TC0497, which is also present in C. suis. In general, the conservation of Incs between C. suis, C. muridarum and C. trachomatis suggests that the protein functions may be retained in all three species and that they may share cognate host interactions. Non-conserved Incs may indicate key differences in host interactions or the host environment in general (Figure 5C). Functional analysis of host-binding or -interacting partners for each of these Incs could reveal their specific roles in their host species.
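The species-level partitioning summarized in the Venn diagram can be sketched as a simple grouping of Incs by the set of species in which an ortholog is present. The sketch below uses only the Incs named explicitly in the text, so the counts it prints cover those named examples and not the full Inc repertoire. The data layout, species abbreviations, and function name are illustrative assumptions.

```python
from collections import defaultdict

# Presence of each named Inc, labelled by its C. trachomatis (CT) or
# C. muridarum (TC) locus tag. Ct = C. trachomatis, Cm = C. muridarum,
# Cs = C. suis. Only Incs named in the text are listed here.
PRESENCE = {
    "CT226/TC0497": {"Ct", "Cm", "Cs"},
    "CT227/TC0498": {"Ct", "Cm"},
    "TC0011": {"Cm"},
    "CT529": {"Ct", "Cs"},
    "CT222": {"Ct", "Cs"},
    "CT224": {"Ct", "Cs"},
    "CT225": {"Ct", "Cs"},
    "CT164": {"Ct", "Cs"},
    "TC0239": {"Cm", "Cs"},
    "TC0171": {"Cm", "Cs"},
    "TC0573": {"Cm", "Cs"},
}


def partition_by_species(presence: dict[str, set[str]]) -> dict[frozenset, list[str]]:
    """Group Incs by the exact set of species that carry them (Venn regions)."""
    regions = defaultdict(list)
    for inc, species in presence.items():
        regions[frozenset(species)].append(inc)
    return dict(regions)


regions = partition_by_species(PRESENCE)
for species, incs in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print("+".join(sorted(species)), len(incs), sorted(incs))
```

Each key corresponds to one region of the three-way Venn diagram, so extending the table to all predicted Incs would reproduce the full set of region counts.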
Like Incs, polymorphic membrane proteins (Pmps) are present in varying numbers across species, though the specific roles for each Pmp are unknown. Pmps are outer membrane proteins and are immunogenic for humans. These genes make up approximately 1–2% of the chlamydial gene content, and each contains a C-terminal phenylalanine as well as multiple GGAI motifs, which are associated with host cell adhesion (53). Nine putative pmps were annotated in C. suis R19 through a genome-wide search for multiple GGAI motifs. Each of these corresponds directly to a Pmp in C. trachomatis and C. muridarum. These Pmp primary sequences range from high identity between the three species to low identity: PmpA (79.8%), PmpI (78.8%), PmpH (77.0%), PmpB (74.9%), PmpG (74.0%), PmpE (73.9%), PmpD (72.9%), PmpF (65.2%) and PmpC (46.3%). Seven out of nine C. suis R19 Pmps are more similar to their C. muridarum orthologs, while two (PmpH and PmpF) are more similar to C. trachomatis. Interestingly, Pmps have been shown to be involved in cellular tropism, as six of the nine Pmps in C. trachomatis analyzed from different serovars could be phylogenetically clustered based on disease properties, suggesting a role for these membrane proteins in adhesion or differential biovar tropism (53).
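A genome-wide search for proteins with multiple GGAI motifs can be sketched as a simple count over predicted protein sequences. The minimum of two motifs, the function names, and the toy sequences below are assumptions for illustration only; the criteria actually used for annotation are not specified beyond "multiple GGAI motifs".

```python
def count_ggai(protein_seq: str) -> int:
    """Count occurrences of the GGAI tetrapeptide in a protein sequence."""
    # GGAI cannot overlap itself, so a non-overlapping count is exact.
    return "".join(protein_seq.split()).upper().count("GGAI")


def candidate_pmps(proteins: dict[str, str], min_motifs: int = 2) -> dict[str, int]:
    """Return proteins carrying at least min_motifs GGAI motifs (assumed cutoff)."""
    candidates = {}
    for name, seq in proteins.items():
        n_motifs = count_ggai(seq)
        if n_motifs >= min_motifs:
            candidates[name] = n_motifs
    return candidates


# Toy sequences (hypothetical, not real C. suis proteins).
toy_proteins = {
    "toy_pmp_like": "MKGGAIYSTGGAISLNGGAIF",
    "toy_single_motif": "MSTGGAIKLV",
    "toy_no_motif": "MKKLLAVSG",
}

pmp_candidates = candidate_pmps(toy_proteins)
print(pmp_candidates)
```

Candidates flagged this way would still need to be checked against known Pmp orthologs, as was done here by comparison with C. trachomatis and C. muridarum.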