Bioinformatic prospecting identified 99 novel, misannotated and unnoticed putative circular bacteriocins

Circular bacteriocins are antimicrobial peptides produced by bacteria in which the N and C termini are ligated. They have desirable properties such as activity at low concentrations along with thermal, pH and proteolytic resistance. There are nineteen experimentally confirmed circular bacteriocins, each encoded as part of a bacteriocin gene cluster with transport, membrane and immunity proteins. Traditionally, novel antimicrobials are found by testing large numbers of isolates against indicator strains, with no guarantee of a corresponding novel sequence. Through bioprospecting publicly available sequence databases, we identified ninety-nine circular bacteriocins across a variety of bacteria, bringing the total to 118. They were grouped into two families within class IIc (IIc i and ii) and further divided into subfamilies based on similarity to experimentally confirmed circular bacteriocins. Within subfamilies, sequences overwhelmingly shared similar characteristics such as sequence length, presence of a polybasic region, conserved locations of aromatic residues, C and N termini, gene cluster similarity, translational coupling and hydrophobicity profiles. At least ninety were predicted to be putatively functional based on gene clusters. Furthermore, bacteriocins identified from Enterococcus, Staphylococcus and Streptococcus species may have activity against clinically relevant strains, due to the presence of putative immunity genes required for expression in a toxin-antitoxin system. Some strains such as Paenibacillus larvae subsp. pulvifaciens SAG 10367 contained multiple circular bacteriocin gene clusters from different subfamilies, while some strains such as Bacillus cereus BCE-01 contained clusters with multiple circular bacteriocin structural genes. Sequence analysis provided rapid insight into identification of novel, putative circular bacteriocins, as well as conserved genes likely essential for circularisation. This represents an expanded library of putative antimicrobial proteins which are potentially active against human, plant and animal pathogens.

Background
Circular bacteriocins are a class of ribosomally produced antimicrobial peptides with a covalent peptide bond between the N and C termini (Samyn, Martinez-Bueno et al. 1994, Kawai, Saito et al. 1998). The circularisation of the molecule improves thermostability, pH tolerance and proteolytic resistance (Borrero, Kelly et al. 2018), under which conditions most other proteins would be denatured or inactivated. Linearising or nicking circular bacteriocins hampers these intrinsic properties as well as causing a significant reduction in antimicrobial potency (Kawai, Kemperman et al. 2004, Montalbán-López, Spolaore et al. 2008, Sánchez-Hidalgo, Montalbán-López et al. 2011). They have been shown to work by binding to the cell membrane and creating pores, which act as non-selective ion channels causing cell death (Gálvez, Maqueda et al. 1991, Kawai, Ishii et al. 2004, Himeno, Rosengren et al. 2015). Receptor molecules binding circular bacteriocins may also be involved, as demonstrated by garvicin ML targeting the maltose ABC transporter (Gabrielsen, Brede et al. 2012).
Bacteriocins have many advantages over traditional antimicrobials such as antibiotics. There have been no major reports of bacteriocin resistance, possibly due to their strong activity at low concentrations (Perez, Zendo et al. 2014). Because bacteriocins are gene-encoded, they can be genetically engineered and targeted towards specific organisms (Perez, Zendo et al. 2014, Jiménez, Diep et al. 2015). Due to these characteristics, there is also considerable scope for use in anti-spoilage and food-safety applications.
Circular bacteriocins are class IIc (or class V) bacteriocins which can be divided into two families, IIc i and IIc ii, based on sequence identity (Cotter, Hill et al. 2005, Gabrielsen, Brede et al. 2014). Table 1 shows the list of experimentally confirmed circular bacteriocins and their characteristics. Class IIc circular bacteriocins are short sequences (58-70 amino acids in length) with four (five in the case of AS-48 and BacA) helical segments that enclose a tightly packed hydrophobic core, a saposin fold and no cysteine pairs; all (except butyrivibriocin AR10) contain a polybasic region involved in binding to target cell membranes (González, Langdon et al. 2000, Acedo, van Belkum et al. 2015, Himeno, Rosengren et al. 2015). Circular bacteriocins are usually produced by a gene cluster or operon consisting of 4-10 genes. The mechanism of circularisation and roles of each gene within clusters have not yet been completely elucidated (Maqueda, Sánchez-Hidalgo et al. 2008, Gabrielsen, Brede et al. 2014), though annotation and mutagenesis studies have provided insight into this (Cebrián, Maqueda et al. 2010, Sánchez-Hidalgo, Montalbán-López et al. 2011). A prepeptide encoded by the bacteriocin structural gene is produced, followed by signal sequence/leader peptide cleavage. The mature peptide is then either circularised within the cell and then secreted, as has been shown for leucocyclicin Q (Mu, Masuda et al. 2014), or secreted and then circularised (Perez, Zendo et al. 2018). The genes involved and the process are not well understood, and it is possible that different pathways exist for different circular bacteriocins. Circularisation appears contingent on hydrophobic N and C termini residues along with the signal sequence, which is required for correct mature peptide processing (Perez, Sugino et al. 2017).
Circular bacteriocin gene clusters are often constituted of overlapping genes, demonstrating a tight organisational structure or genes which depend upon the ribosomal binding site of upstream genes. This indicates expression is regulated by translational coupling (Perez, Ishibashi et al. 2016). All of the currently identified circular bacteriocin gene clusters contain at least two genes that are translationally-coupled (Table 1).
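Overlapping or abutting genes of this kind can be flagged directly from annotation coordinates. The following is a minimal sketch, not the procedure used in this study: the `Gene` record, the `coupled_pairs` function, the `max_gap` threshold and the toy coordinates are all hypothetical, and a coordinate overlap is only a candidate for translational coupling, not proof of it.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def coupled_pairs(genes: List[Gene], max_gap: int = 0) -> List[Tuple[str, str]]:
    """Return neighbouring same-strand gene pairs that overlap or abut.

    A pair is reported when the gap between the two genes is <= max_gap
    bases (negative gap = overlap). Names are returned in coordinate
    order, regardless of strand.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end - 1  # negative when the genes overlap
        if left.strand == right.strand and gap <= max_gap:
            pairs.append((left.name, right.name))
    return pairs


# Hypothetical toy cluster: A and B overlap by 4 bases on the same strand.
cluster = [
    Gene("A", 1, 300, "+"),
    Gene("B", 297, 900, "+"),
    Gene("C", 950, 1200, "+"),
    Gene("D", 1190, 1500, "-"),
]
print(coupled_pairs(cluster))
```

Pairs on opposite strands are ignored even when their coordinates overlap, since a downstream gene cannot share the ribosomal binding site of a gene encoded on the other strand.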
There are nineteen experimentally confirmed circular bacteriocins. Evolutionary-based approaches such as sequence alignments, phylogenetics and gene cluster analysis can provide insight and allow novel identification. This study has identified many new and unmentioned putative circular bacteriocins based on sequence similarity from publicly available sequence data. These putative circular bacteriocins were analysed for characteristics commonly found in circular bacteriocins.

Identification and characteristics of putative circular bacteriocins
This study has identified ninety-nine putative circular bacteriocins within a range of microorganisms, bringing the total known circular bacteriocins to 118 (Fig S1, Figure 1).
Fig S1 contains detailed information about each identified circular bacteriocin, characteristics, strain information and accession numbers. As signal sequences can be highly species specific (Abrahmsèn 1989, Himeno, Rosengren et al. 2015), they were not used for identification of putative circular bacteriocins. Although they are essential for correct folding, circularisation and bioactivity of circular bacteriocins (Perez, Sugino et al. 2017), removing them from database mining allowed identification of distantly-related putative circular bacteriocins. While some putative circular bacteriocins were annotated correctly, many were unannotated or annotated as branched-chain amino acid aminotransferases which are involved in amino acid catabolism (Thage, Rattray et al. 2004), despite having high similarity and sequence motifs to the mature sequences of known circular bacteriocins.
None contained disulphide bonds. Cysteine residues existed only as single residues in 10/118 of the putative and experimentally confirmed sequences, indicating they are not present for disulphide bond formation ( Fig S1).
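The cysteine check reduces to counting residues per mature sequence, since a disulphide bond needs at least two cysteines. A minimal sketch follows; the function name and the two toy sequences are hypothetical and are not taken from Fig S1.

```python
def cysteine_counts(sequences: dict) -> dict:
    """Map each sequence name to its number of cysteine (C) residues."""
    return {name: seq.upper().count("C") for name, seq in sequences.items()}


# Hypothetical toy peptides, not real bacteriocin sequences.
toy = {
    "peptide_1": "MAKLVGAAIW",   # no cysteine
    "peptide_2": "MAKCLVGAAIW",  # a single cysteine
}
counts = cysteine_counts(toy)

# Sequences with fewer than two cysteines cannot form an
# intramolecular disulphide bond.
no_pair_possible = [name for name, n in counts.items() if n < 2]
print(counts, no_pair_possible)
```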
larvae ERIC_I (NZ_CP019651.1) harboured AS-48-like and uberolysin-like clusters. Table 2 shows the list of bacteriocins identified which may be active against the WHO's global priority list of antibiotic resistant bacteria due to the presence of putative immunity genes within the gene clusters (WHO 2017).

Phylogenetics of circular bacteriocins
Based on the sequence analysis of bacteriocins, there appear to be two different families of class IIc circular bacteriocins, family IIc i and IIc ii, each clustering with bootstrap values of 100 (Cotter, Hill et al. 2005, Gabrielsen, Brede et al. 2014) (Fig S2). Out of the 118 sequences, 89 (75.4%) are part of family IIc i and 29 (24.6%) are from IIc ii (Fig S2, Table 3).
However, there is considerable sequence divergence within these families, with family IIc i demonstrating a wide variety of sequence lengths and compositions. Therefore, the most appropriate way to classify these sequences was to separate them based on their most closely-related experimentally confirmed circular bacteriocin. In some cases such as streptocyclin, divergence was considered too high (based on bootstrap values) and new subfamilies were coined using the 'cyclin' suffix.
Due to phylogenetic ambiguity and divergence of the identified circular bacteriocin sequences, it was inappropriate to classify each putative circular bacteriocin into currently identified/characterised subfamilies. To remedy this, new circular bacteriocin subfamilies were proposed and named including streptocyclin, akalicyclin, krulwicyclin, bacillocyclin and venezuelacyclin ( Fig S1).
Family IIc i was composed of the circularin, lactocyclin/leucocyclin, bacillocyclin, AS-48, amylocyclin, enterocin NKR-5-3B, uberolysin, aureocyclin 4185/garvicin ML, venezuelacyclin, krulwicyclin and carnocyclin A subfamilies. Family IIc ii was composed of the paracyclicin, akalicyclin, streptocyclin, butyrivibriocin AR10, gassericin/acidocin and plantaricyclin/plantacyclin subfamilies. Due to sequence similarity and phylogenetic branch position, several experimentally confirmed circular bacteriocins were classified within the same subfamily. They included aureocyclin 4185 and garvicin ML (61.4% similarity), lactocyclin and leucocyclin (82% similarity), gassericin and acidocin (100% similarity), and plantaricyclin and plantacyclin (94.8% similarity). Some of these subfamilies will most likely fracture into clearer, distinct subfamilies as more sequences become available. Several putative circular bacteriocins were found on lone phylogenetic branches, did not fit into subfamilies and were not classified beyond the familial level (Figure 1).

Hydrophobicity of mature circular bacteriocins
Analysis of hydrophobicity profiles suggested two major profiles (Figure 2), with a few exceptions (Fig S3). This gave further evidence that the putative sequences identified were most likely circular bacteriocins. The two major hydrophobicity profiles of the circular bacteriocins matched the phylogenetic family classifications of IIc i and IIc ii (Figure 2). It appears that, despite sequence divergence within families, residues are mutating to residues which maintain the hydrophobic profile of the protein. In general, the N terminus of class IIc i tended to have a variable hydrophobic profile, reflecting the sequence divergence and residue length differences within the family.
Both families have similar regions within the hydrophobicity profiles, despite the sequence variability within and between them. In general they are considerably hydrophobic. The C and N termini of every sequence were also found to be hydrophobic (Fig S3). Both families also have a notable polybasic region (residues 52-65 in IIc i and 14-19 in IIc ii) which produces two hydrophilic troughs.
Despite not fitting into any direct phylogenetic subfamilies within family IIc ii, Bacillus pumilus GM3FR, Paeniclostridium sordellii R26833 and Bacillus thuringiensis serovar indiana HD521 all match the hydrophobic profile of family IIc ii. Sequence logos ( Fig S4) showed high levels of conservation within the IIc ii family, while IIc i had high levels of conservation at the N and C termini. The conserved termini may be implicated as a ligation motif, allowing circularisation of the C and N termini.

Gene cluster analysis
To determine the number of putatively functional circular bacteriocins, each putative cluster was compared to the cluster of its most closely related experimentally confirmed circular bacteriocin ( Figure 1). Table 3 shows a summary of this analysis.
Though there was high cluster divergence between families, similar genes were found in clusters in almost every case, including but not limited to ABC transporters, putative immunity gene/s, transmembrane proteins, SpoIIM proteins and permeases. Table S1 shows a general summary across the identified subfamilies. This provided more evidence that most of these putative sequences were circular bacteriocins, in line with the sequence similarity and hydrophobic profile results. Different gene clusters showed different degrees of similarity, with many having gene rearrangements, inversions and insertions, and sharing low sequence similarities between homologues.
Several bacteriocin clusters appeared incomplete (Fig S5) and it is probable that some of these clusters were vestigial or pseudogenes. Of the total 118 circular bacteriocin clusters, a conservative estimate of 90 (76.3%) were putatively functional (Fig S1), though the number is likely higher given the percentage of gene clusters which contain translational coupling (92.4%). As this analysis was restricted by limited sequence data and assemblies, other genes required for circular bacteriocin production may be present elsewhere in the genome, outside the clusters. Such clusters would be functional but would be scored as non-functional via this analysis. ABC transporters were seen in every single experimentally confirmed circular bacteriocin cluster, as well as 95/99 of putatively identified clusters (Fig S5). This indicates that the circular bacteriocins without ABC transporters were either inactive vestigial remnants or exported via another ABC transporter. Circular bacteriocin ABC transporters are highly similar to other ABC transporters within the genomes. It was unclear if non-cluster transporters would be involved in production of circular bacteriocins, and clusters lacking an ABC transporter were thus considered putatively non-functional. HylD/efflux RND transporters were only present in a few clusters within subfamilies and were not indicative
21.2% of the clusters were found on plasmids, 65.3% were chromosomally located, and the remaining 13.6% were considered unknown (Table 3, Figure 1). 20.3% were associated with mobile genetic elements such as insertion sequences (Fig S5).
In the AS-48 subfamily (Figure 3), six genes as-48ABCC1DD1 have been shown to be essential for AS-48 production (Martínez-Bueno, Valdivia et al. 1998). This consists of the bacteriocin structural gene, a short and long putative membrane protein/stage II sporulation protein M, another putative transmembrane protein, an ABC transporter and an immunity gene (Martínez-Bueno, Valdivia et al. 1998). All six genes were found in most clusters, though putative immunity genes were not identified in 3/10 clusters. This analysis revealed stage II sporulation protein M domains were commonly found in the putative membrane proteins of the identified circular bacteriocin clusters. Other times, they were found encoded by two separate genes ( Figure 3). Therefore, they were treated as similar genes.
Immunity genes from clusters of experimentally confirmed circular bacteriocins appear to have two to three transmembrane domains ( Figure S3). They also contain large hydrophilic region/s which occur between these domains. Acidic residues were also found outside these transmembrane domains in 10/15 experimentally-confirmed circular bacteriocin immunity genes. There were no cysteine pairs found in the immunity genes except for in the atypical lycD sequence from leucocyclicin Q.
To demonstrate the identification of putative circular bacteriocin subfamilies, which were most likely functional, cluster analysis of the putative bacillocyclin subfamily is shown in Fig S6. Five of the six gene clusters match the gene cluster profile of the AS-48 subfamily (closest phylogenetic relative) and appear to be intact.
Another previously undescribed observation was that some strains contained multiple structural bacteriocin genes within the same cluster (Fig S7). Bacillus cereus BCE-01 (NZ_MVPV01000042.1) contained two different circularin-like circular bacteriocin structural genes with 82.89% identity. 80% identity was found between the signal sequences of these two structural genes. Bacillus thuringiensis AFS079576 (NZ_NUXU01000032.1) also contained two circularin-like structural genes with 81.58% identity within the same cluster. 80% identity was found between the signal sequences of these two structural genes. Bacillus weihenstephanensis SDA_NFFE664 (NZ_FMBF01000026.1) contained three uberolysin-like circular bacteriocin structural genes with 100% identity and 92% identity, respectively. Each circular bacteriocin structural gene from B. weihenstephanensis SDA_NFFE664 had identical signal sequences to the others in the cluster.
Each structural gene within these multi-structural gene clusters had independent putative promoters. Another observation is that a single putative immunity gene was found within these clusters, indicating it is most likely the single immunity factor for each circular bacteriocin variant.

Discussion
Putatively functional circular bacteriocins
This study shows that circular bacteriocins are much more prevalent than originally expected (Perez, Zendo et al. 2018). A literature search revealed a circular bacteriocin was likely isolated from Lactobacillus acidophilus IBB 801, though this was not confirmed nor was sequence data available (Zamfir, Callewaert et al. 1999). Some circular bacteriocins identified here have 100% similarity to other circular bacteriocins despite being present in different species. It has been shown that identical circular bacteriocins can have different structures and activities based on the presence of D-amino acids, as is the case with gassericin A and reutericin 6 (Kawai, Ishii et al. 2004). This study has shown that bacteria from a wide range of sources including milk, soil, urine, plant cores, honeybee larvae, deep sea water and more (Figure 1) contain putatively functional circular bacteriocin clusters. This indicates a potentially large reservoir of circular bacteriocin-producing strains and circular bacteriocins which could be used as therapeutics, food preservatives (Perez, Zendo et al. 2018), or in other applications such as use as vector proteins to stabilise bioactive proteins (Iwai and Plückthun 1999). There are many bioactive peptides which report low stability (Espinosa-Hernández, Morales-Camacho et al. 2019), which could be stabilised with the C-N terminal ligation (Clark, Fischer et al. 2005) found in circular bacteriocins. While this manuscript was being written, the circular bacteriocin amylocyclicin CMW1 was discovered (Kurata, Yamaguchi et al. 2019). This sequence was successfully predicted as a circular bacteriocin from this dataset, appearing in Bacillus amyloliquefaciens LL3. This co-occurrence provides more evidence that the predicted circular bacteriocins are likely correctly identified.
Bacillus spp. also contained the largest range of putative circular bacteriocins in this dataset. They contained clusters from IIc i subfamilies: AS-48, amylocyclicin, enterocin NKR-5-3B, uberolysin, lactocyclin/leucocyclin, circularin, bacillocyclin and krulwicyclin. They also contained IIc ii circular bacteriocins which were not assigned subfamilies. However, this may have been due to their phylogenetic heterogeneity, some of which has been remediated through reclassifications based on next generation sequencing rather than phenotype (Nazina, Tourova et al. 2001, Logan, Berge et al. 2009).
The percentage of gene clusters which contain translational coupling (92.4%) is most likely a better representation of functional clusters than the conservative prediction based on gene presence (76.3%) found in Table 3. Translational coupling indicates a high level of cluster structure conservation (van de Guchte, Kok et al. 1991) and it would be highly unusual for these genes to maintain such a high degree of organisational structure if they were not positively selected for, that is, if they were not functional/expressed.
Mutations in these tightly-packed clusters will not only alter the ends of particular gene products, but also impact transcription of downstream genes in alternative reading frames.
Polybasic and aromatic residues were positionally conserved, found in 94.1% and 97.5% of identified circular bacteriocins, respectively (Table 3, Fig S1). Aromatic residues are often found flanking transmembrane-associated helices, allowing penetration into membranes (Braun and von Heijne 1999, Gleason, Greathouse et al. 2013). Trp24 has been shown to be essential in the biological activity of AS-48, as it is located in a hydrophobic region that interacts with the membrane (Sanchez-Hidalgo, Fernández-Escamilla et al. 2010).
It has been previously pointed out that circular bacteriocins have similar hydrophobicity profiles (Kawai, Kemperman et al. 2004). Analysis of hydrophobicity profiles allowed increased confidence in the identification of putative circular bacteriocins discerned through sequence similarity. Hydrophobic profiles were maintained within subfamilies, as well as more generally within the families IIc i and ii. By comparing profiles of putatively identified sequences to the average profile of each family, it can be determined which family they belong to. This could also be used to screen out non-circular bacteriocins.
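The comparison of a query profile against family average profiles can be sketched as a nearest-profile assignment. This is only an illustration under stated assumptions: the study does not name a distance measure, so root-mean-square deviation is used here as a stand-in, the profiles are assumed to have already been brought to a common length, and the family mean values below are hypothetical toy numbers rather than data from Figure 2.

```python
from math import sqrt
from typing import Dict, List


def rmsd(a: List[float], b: List[float]) -> float:
    """Root-mean-square deviation between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def assign_family(profile: List[float], family_means: Dict[str, List[float]]) -> str:
    """Return the family whose mean profile is closest to the query profile."""
    return min(family_means, key=lambda fam: rmsd(profile, family_means[fam]))


# Hypothetical toy mean profiles (not measured values).
family_means = {
    "IIc i": [1.0, 2.0, -1.0, 0.5],
    "IIc ii": [-1.0, 0.5, 2.0, 1.0],
}
print(assign_family([0.9, 1.8, -0.7, 0.4], family_means))
```

A distance threshold above which a query is assigned to neither family would be one way to screen out non-circular bacteriocins, though any such cut-off would need to be calibrated against the confirmed sequences.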
Though the hydrophobicity profiles are different between the families IIc i and ii, if the profile of IIc ii is flipped, the profile is surprisingly similar to IIc i (Fig S3). There are particular sequences which show divergence to the profiles, such as L. mesenteroides TK41401 (leucocyclicin Q) and Lactococcus sp. QU 12 (lactocyclicin Q) from IIc i, and Trichococcus alkaliphilus B5 (paracyclicin subfamily) and Alkalibacterium AK22 (akalicyclin subfamily) from IIc ii.
A hydrophilic region was found in every putative and experimentally confirmed circular bacteriocin (Figure 2, Fig S3). This usually overlapped with the uni/polybasic region and implied a conserved functional region. There is strong evidence for a similar mechanism of action for this region, given the high levels of evolutionary conservation. This region is most likely involved with cell membrane interaction and binding based on the positively-charged basic residues and the negatively-charged cell membrane (Kim, Mosior et al. 1991, Jiménez, C Barrachi-Saccilotto et al. 2005). In experiments, the positively-charged (and polybasic) region of AS-48 (residues 49-69) showed no killing activity, but showed competitive binding to the negatively-charged membrane against the wild type AS-48 bacteriocin (Jiménez, C Barrachi-Saccilotto et al. 2005), indicating the role this region plays in the bactericidal activity of circular bacteriocins. Butyrivibriocin AR10 uncharacteristically does not contain a polybasic region (only a single basic residue), yet is functional as a circular bacteriocin against other B. fibrisolvens isolates (Kalmokoff and Teather 1997). It has a hydrophobic profile with a hydrophilic region which is consistent with family IIc ii. This indicates polybasic regions are not necessarily required for antimicrobial activity, but the hydrophilic region is.
higher resolution such as subfamilies, experimentally-confirmed circular bacteriocins can be used as type-sequences and accurate sequence analysis and comparisons can be performed. This reduces the background noise of distantly-related circular bacteriocins within the immediate sequence family. It is highly probable that the putative circular bacteriocins within each subfamily share a similar mechanism of action but have their own distinct spectrum of activity. The phylogenetic classifications were further reinforced by cluster analysis.
For example, uberolysin and amylocyclicin circular bacteriocin subfamilies are distinct at the cluster level, have different hydrophobicity profiles at their C termini (Fig S3), yet are not divergent regarding structural gene homology despite a size difference of 6 residues.

Conserved genes within circular bacteriocin clusters
Cluster analysis proved to be informative for determining putative functional circular bacteriocins, as well as phylogenetic classification. Recently-diverged structural genes would most likely have similar associated genes within their bacteriocin clusters. The drawback of this type of analysis was that the associated genes essential for circular bacteriocin production may not be present within the same cluster but elsewhere within the genomic material. However, given a conservative 76.3% estimate of putative functionality, a number of potentially useful antimicrobial peptides have been highlighted (Figure 1). It is probable that some of these clusters contain non-functional pseudogenes, but given that most clusters were 'intact' upon comparison to experimentally confirmed clusters, the genes are considered conserved for circular bacteriocin production (Tomita, Fujimoto et al. 1997, Kawai, Saito et al. 1998, Martínez-Bueno, Valdivia et al. 1998, Kalmokoff, Cyr et al. 2003, Kemperman, Kuipers et al. 2003, Wirawan, Swanson et al. 2007, Martin-Visscher, van Belkum et al. 2008, Sawa, Zendo et al. 2009, Borrero, Brede et al. 2011, Masuda, Ono et al. 2011, Golneshin 2014, Potter, Ceotto et al. 2014, Scholz, Vater et al. 2014, Acedo, van Belkum et al. 2015, Perez, Ishibashi et al. 2016, Collins, O'Connor et al. 2017, Borrero, Kelly et al. 2018).
Strains with multi-structural gene clusters have not been described until now. Given their high sequence identities to each other, they are most plausibly the result of duplication events in which slight variants with independent promoters have been selected for. It is most likely that these strains swap or co-express variable circular bacteriocins via response regulators and quorum sensing (Kalmokoff, Cyr et al. 2003, Wirawan, Swanson et al. 2007, Bartholomae, Buivydas et al. 2017), allowing expression of different circular bacteriocins with a slightly different spectrum of activity/microbial targets. These multi-structural gene clusters can also give insights into the putative immunity genes. It appears one putative immunity gene is enough to provide protection against each circular bacteriocin variant within the cluster. This indicates immunity genes may provide broader immunity than once thought and may possibly provide immunity to similar circular bacteriocins with as low as ~80% similarity. Based on the presence of two (sometimes three) putative transmembrane domains, as well as the central hydrophilic region and presence of acidic residues at the termini, a mechanism of immunity can be proposed. Immunity proteins may function as transmembrane proteins and competitively bind positively-charged/polybasic regions of corresponding circular bacteriocins, thus reducing pore formation within the cell membrane. Acidic residues found in the immunity proteins may compete with the negatively-charged cell membrane. However, further experimental analysis is required, as immunity has been shown to be a cumulative effect with other genes within the cluster demonstrating a role in immunity (Mercedes, Antonio et al. 2004, Mu, Masuda et al. 2014, Perez, Ishibashi et al. 2016). More broadly, the observation that immunity genes are present in most gene clusters indicates these bacteria are susceptible to their own bacteriocins. Therefore, related species may also prove susceptible if lacking the corresponding immunity gene. This is promising, as circular bacteriocins identified here were found in Enterococcus, Staphylococcus and Streptococcus species, which are currently regarded by the WHO as priority organisms for discovery of new antimicrobials (WHO 2017).

Selfish genetic elements
Although providing fitness to the cell, circular bacteriocins and their associated clusters can be thought of as selfish genetic elements. Given the high stability of circular bacteriocins, if at any time the cluster is mutated or the plasmid is lost, the immunity factors associated with the cluster may also be lost. The ex-producer would then be susceptible to the bacteriocin, and therefore this phenotype will be selected against. Also, given the high temporal stability of circular bacteriocins, they would be more stable than the immunity proteins, which would be more susceptible to proteases, heat, pH etc., and would require continual renewal via gene expression. By nature, it is a toxin-antitoxin system which locks the producing strain into a long-term partnership. It has been demonstrated that by removing the circular bacteriocin gassericin A from a plasmid, segregational stability of that plasmid drops (Ito, Kawai et al. 2009). This explains why so many of the circular bacteriocin clusters identified were putatively intact (Figure 1, Table 3), regardless of whether they are chromosomally associated or plasmid-borne. As previously described, the spectrum of antimicrobial activity (usually to closely related species) of circular bacteriocins provides further evidence of the toxin-antitoxin relationship (Martin-Visscher, van Belkum et al. 2008, Ito, Kawai et al. 2009, Egan 2018). Coincidentally, the circular bacteriocin from L. nodensis DSM 19682 was previously highlighted by a similar genome-mining study and the strain was not found to demonstrate antimicrobial activity against a range of bacteria including Enterococci and Lactobacilli (Collins, O'Connor et al. 2017). Given that the gene cluster was identified as intact (Figure 1, Fig S5), it is possible the bacteriocin was not tested against closely-related strains (including L. nodensis) which may demonstrate susceptibility.
Based on the selfish gene hypothesis, any clusters currently identified as missing a putative immunity gene should realistically have one that is just yet to be identified.
Production of the circular bacteriocin without immunity factors will result in death of the producer, therefore sequence data from pure cultures would not exist, indicating the immunity gene is currently unidentified. Alternatively, if the antimicrobial activity mode of action relies on specific target receptors not found in the producer strain, immunity genes would not be needed. Another alternative explanation is recent inactivation of the entire immunity-gene-lacking cluster, which given enough time will eventually be reduced to pseudogenes and vestigial fragments. Though, given the general intactness of the clusters, the former hypothesis seems most likely. Being associated with conjugative plasmids or mobile genetic elements (Table 3)

Circular bacteriocin characteristics
Polybasic residues were identified in the mature bacteriocin via the 'Mark' function in Notepad++ version 7.5.9 searching for the string "R|K|H" using the following search modifiers: 'Regular expression' and 'Match case'.
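The same case-sensitive search can be reproduced outside a text editor. The sketch below uses Python's standard `re` module with the search string given above; the function name and the toy peptide are hypothetical, and positions are reported 1-based relative to the mature sequence.

```python
import re

# Same alternation as the Notepad++ search string; matching is
# case-sensitive by default, mirroring the 'Match case' modifier.
BASIC = re.compile(r"R|K|H")


def basic_positions(mature_seq: str) -> list:
    """Return 1-based positions of Arg (R), Lys (K) and His (H) residues."""
    return [m.start() + 1 for m in BASIC.finditer(mature_seq)]


# Hypothetical toy peptide, not a real bacteriocin sequence.
print(basic_positions("MAKRLHG"))  # [3, 4, 6]
```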

Hydrophobicity analysis
Hydrophobicity profiles were generated using the protscale website https://web.expasy.org/protscale/ with a sliding window of 9 (Kyte and Doolittle 1982).
95% confidence intervals were calculated using the Descriptive Statistics module from the Data Analysis ToolPak in Microsoft Excel. As C and N termini would be joined in the mature circular bacteriocin form, the first four residues were copied to the end of the sequence and the final four residues were copied to the beginning of the sequence to account for the sliding window of 9. This was performed by searching the amino acid fasta file for: ))$ and replacing with $4$1$2 with search modifiers: 'Regular expression' and 'Match case' in Notepad++.
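The circular padding and windowed averaging can be sketched as follows. This is an approximation under stated assumptions rather than a reimplementation of ProtScale: the window is a plain unweighted mean of Kyte-Doolittle values, and the function names are hypothetical. The regular expression in `circular_extend_regex` is only one pattern consistent with the replacement string `$4$1$2` quoted above; it is a guess, since the full search string is not recoverable from the text.

```python
import re

# Kyte and Doolittle (1982) hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def circular_extend(seq: str, flank: int = 4) -> str:
    """Prepend the last `flank` residues and append the first `flank`."""
    return seq[-flank:] + seq + seq[:flank]


def circular_extend_regex(seq: str) -> str:
    """Same padding via a regex; the pattern is an assumed reconstruction."""
    return re.sub(r"^((.{4})(.*)(.{4}))$", r"\4\1\2", seq)


def kd_profile(seq: str, window: int = 9) -> list:
    """Mean Kyte-Doolittle score per residue of a circular peptide.

    Padding by window // 2 residues on each side yields exactly one
    window, centred on each residue of the original sequence.
    """
    ext = circular_extend(seq, window // 2)
    scores = [KD[aa] for aa in ext]
    return [sum(scores[i:i + window]) / window for i in range(len(seq))]


# Hypothetical toy peptide, not a real bacteriocin sequence.
toy = "MAKLVGAAIWLKR"
print(circular_extend(toy))
print([round(x, 2) for x in kd_profile(toy)])
```

Because every residue of the ring receives a full window, the profile has the same length as the mature sequence and the values at the former termini reflect their ligated neighbours.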

Gene cluster analysis
To determine if the circular bacteriocin structural gene and associated gene clusters were present on plasmids or chromosomes, tBlastn and BLASTn (https://blast.ncbi.nlm.nih.gov/Blast.cgi) (NCBI 2017) were used to see if there were significant nucleotide hits to plasmids or chromosomes on NCBI. Size was also considered; if a gene cluster was on a contig >100 kb, it was considered most likely chromosomal.
Functional domains were determined using HMMER version 3.2.1 (http://hmmer.org/) (Luciani, Lopez et al. 2018), along with NCBI annotations to infer gene function. Presence of plasmid determinants such as repA/B and mobilisation genes was used to infer plasmid localisation of the cluster. Presence of chromosomal determinants such as the 16S rRNA and tRNA genes was used to infer chromosomal localisation. If the location was unclear, it was recorded as 'Unknown'.
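The localisation criteria can be summarised as a small decision rule. The sketch below is illustrative only: the order in which the criteria are applied, the marker label strings and the function name are assumptions, since the text lists the criteria but not their precedence.

```python
from typing import Iterable, Optional

# Label strings are hypothetical; real annotations would need normalising.
PLASMID_MARKERS = {"repA", "repB", "mob"}
CHROMOSOME_MARKERS = {"16S rRNA", "tRNA"}


def classify_location(
    contig_length_bp: int,
    annotations: Iterable[str],
    blast_call: Optional[str] = None,
) -> str:
    """Return 'Plasmid', 'Chromosome' or 'Unknown' for a cluster's contig.

    Assumed precedence: a significant BLAST hit first, then plasmid
    determinants, then chromosomal determinants or contig size >100 kb.
    """
    if blast_call in ("Plasmid", "Chromosome"):
        return blast_call
    labels = set(annotations)
    if labels & PLASMID_MARKERS:
        return "Plasmid"
    if labels & CHROMOSOME_MARKERS or contig_length_bp > 100_000:
        return "Chromosome"
    return "Unknown"


print(classify_location(20_000, ["repA", "ABC transporter"]))
print(classify_location(250_000, []))
print(classify_location(8_000, ["ABC transporter"]))
```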
For gene clusters broken up amongst multiple contigs, contigs containing cluster elements were first joined with 5 N's, and then used for cluster alignments and analysis.
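Joining contigs with a short run of N's is a one-line operation. A minimal sketch follows; the function name is hypothetical, and the contigs are assumed to be supplied already in the intended order and orientation.

```python
def join_contigs(contigs, spacer="N" * 5):
    """Concatenate contig sequences, separated by a spacer of five N's."""
    return spacer.join(contigs)


# Hypothetical toy contigs.
print(join_contigs(["ACGT", "GGCC"]))  # ACGTNNNNNGGCC
```

The spacer marks the assembly gap, so any alignment spanning it should be read as a join of unknown length rather than as contiguous sequence.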
Easyfig version 2.2.3 (Sullivan, Petty et al. 2011) was used to align and visualise gene clusters using the tblastx function with an e-value cut-off of 0.001. Lactococcus sp. QU 12 was excluded from cluster analysis as only the structural gene sequence data is publicly available.

Declarations
Availability of data and materials
All data generated or analysed during this study are included in this published article and its supplementary information files.

Competing interests
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.