First description of M. pinnipedii mixed-strain infection in a pinniped
The genome sequencing and assembly of the two M. pinnipedii isolates (named MP1 and MP2) resulted in 9,766,154 reads (228x coverage), and 9,433,048 reads (220x coverage), assembled into 102 and 106 contigs (~4.28 Mb/genome), respectively. Genome annotation with Prokaryotic Genome Annotation Pipeline (PGAP) indicates a GC content of 65.4% and 45 tRNA for both strains, and 4,269 and 4,276 genes, 3,993 and 3,996 coding DNA sequences (CDSs), and 152 and 158 pseudogenes for M. pinnipedii MP1 and MP2, respectively. Genomes of both strains showed similar characteristics to other genomes of the same species available in the database (Fig. 1).
Mapping of the M. pinnipedii MP1 and MP2 reads with the genome of the M. tuberculosis strain H37Rv resulted in a reference coverage of 99.08% and 99.03%, and 2,312 and 2,350 variants, respectively. Upon exclusion of false-positive prone areas, the isolates shared 1,912 SNPs (M. pinnipedii MP1) and 1,998 SNPs (M. pinnipedii MP2) with M. tuberculosis H37Rv. In addition, 181 (M. pinnipedii MP1) and 208 (M. pinnipedii MP2) indels (single-nucleotides or short indels ranging from 2 to 36 bp) were detected. When comparing M. pinnipedii MP1 and M. pinnipedii MP2, the distance between these strains, after excluding false-positive prone areas, was found to be 113 SNPs and 46 indels (single-nucleotides or short indels ranging from 2 to 36 bp). A total of 122 (76.73%) of these 159 SNPs and indels were located in CDSs with known functions, based on the COG (Cluster of Orthologous Groups) functional annotation (Additional file 1: Table S1). The main functional annotations were those related to energy production and conversion (11.95%, 19/159), lipid transport and metabolism (9.43%, 15/159) and cell wall/membrane/envelope biogenesis (7.55%, 12/159), which are known to play crucial roles in the pathogenicity of MTBC species. Interestingly, the majority of detected SNPs are non-synonymous (86.72%, 98/113), which means that related CDSs are under selective pressure for protein sequence change, having the potential to promote functional diversity.
CDSs of known function with non-synonymous SNPs identified between M. pinnipedii MP1 and M. pinnipedii MP2 were evaluated using GO enrichment analysis and STRING protein networks. The statistically significant biological processes identified were: trehalose metabolic process (P value = 3.07E-04), fatty acid biosynthetic process (P value = 2.70E-04) and cell wall biogenesis (P value = 2.20E-04). The identified molecular functions were ATPase activity (P value = 8.05E-05) and ATP binding (P value = 5.09E-05), while the identified cellular component was the cell wall class (P value = 9.95E-05). These findings suggest an evolutionary pressure for change in genes associated with cell wall remodeling. The unique lipid-rich mycobacteria cell wall is the first line of interaction between host and pathogen, with crucial implications in virulence, and is also important for environmental stress resistance (e.g. protection from osmotic stress, starvation, freezing, and desiccation).
When using STRING, we identified strong protein interactions between specific members of the VapBC protein family (Fig. 2). The fact that multiple proteins with STRING-detected relationships have non-synonymous SNPs also indicates that these Vap genes, in particular, are under evolutionary pressure for change. In M. tuberculosis, the VapBC family accounts for more than half of the proportion of toxin-antitoxin (TA) systems (47/88 putative TA systems) , and genomic regions containing VapBC are strongly associated with virulence and pathogenicity factors . The identified CDSs are part of a type-II TA system related to mRNA translation regulation. More specifically, VapC25 and VapC29 are ribonucleases shown to inhibit M. smegmatis cell growth when induced (i.e. have bacteriostatic effects) . Taken together, these results suggest that MP1 and MP2 may have different survival capacities.
Two distinct spoligotypes, SB0155 and SB2455, and two distinct Mycobacterial Interspersed Repetitive Unit-Variable Number of Tandem Repeat (MIRU-VNTR) profiles were identified in M. pinnipedii MP1 and M. pinnipedii MP2, respectively (Additional file 2: Figures S1 and S2). The in silico predicted MIRU-VNTR matched the pattern identified using PCR. The M. pinnipedii strains MP1 and MP2 differed at MIRU04 and MIRU10 loci (Additional file 2: Figure S1). Variations at multiple loci between isolates of the same animal normally indicate mixed-strain infections . In contrast to the spoligotype SB0155 , SB2455 profile has never been described in M. pinnipedii strains. Therefore, based on current parameters of distinction between microevolution and superinfection regarding SNPs/indels  and MIRU-VNTR , the sea lion analyzed herein was infected with two different strains of M. pinnipedii.
This is the first report of M. pinnipedii superinfection in a sea lion. As these animals normally live in dense colonies, they are highly vulnerable to epizootics of infectious diseases that can be transmitted by direct animal-to-animal contact. Superinfection caused by M. tuberculosis in humans is normally reported in highly endemic countries, in which people are exposed to multiple strains of M. tuberculosis throughout their lives and HIV plays an important role in shaping the disease incidence . By tracing a parallel, it is possible that M. pinnipedii is highly endemic in the population from which this sea lion came, represented by different circulating strains, which may have unprecedented effects on the conservation of this species. Adverse environmental conditions, insufficient nutrition, and chronic stress due to disturbance or competition can act to suppress the immune system and, therefore, pinnipeds may be predisposed to diseases caused by several pathogens. In Brazil, sea lion stranding occurs normally due to compromised animal health when they travel long distances during the migratory period . The weakened state in which these animals arrive at the coast, often infected by large bacterial loads, is the factor that most commonly leads to death [28, 29]. The contribution of M. pinnipedii infection in this context remains to be elucidated.
Although M. pinnipedii has been described in both captive and free-ranging animals, systematic population surveillance studies are still lacking. In 2011, a captive South American sea lion housed in a Czech Republic zoo that was imported from a German zoo (not specified) was found to be infected with M. pinnipedii. The animal’s parents also died of tuberculosis in Germany and they were captured as juveniles from the coastal waters of Uruguay in 1992 , suggesting that the infection came from wildlife. Outbreaks of the disease in wild and/or captive-born South American sea lions housed at the Heidelberg Zoo, Germany and the Le Pal Zoo, France have also been described . Added to this report of superinfection, it is thus clear that M. pinnipedii is endemic in wild populations of South American sea lions and is being introduced into zoos, warranting an urgent need to evaluate the extent of tuberculosis in free-ranging pinnipeds of the southern hemisphere.
Strains of M. pinnipedii have a highly conserved proteome
Figure 3 illustrates groups of orthologous proteins present in seven modern M. pinnipedii strains that were deposited in the National Center for Biotechnology Information (NCBI) (M. pinnipedii MP1, MP2, ATCC BAA-688, G01222, G01491, G01492, G01498). A total of 3,986 protein clusters (i.e. groups of ≥ 2 proteins) were identified among the analyzed strains, including 3,694 (92.67%) core protein clusters present in all seven genomes (Fig. 3A; Additional file 3: Table S2). According to COG classification, about a third (1,147/3,694; 31.05%) of the core protein clusters have unknown function, while the majority of these have functional annotation (2,547/3,694; 68.95%). The top five functional classes were: cell wall/membrane/envelope biogenesis (M; 426/3,694, 11.5%), lipid transport and metabolism (I; 284/3,694, 7.7%), energy production and conversion (C; 280/3,694, 7.6%), transcription (K; 205/3,694, 5.5%) and amino acid transport and metabolism (E, 181/3,694, 4.9%) (Fig. 3B-C).
There are no protein clusters (i.e. groups of ≥ 2 proteins) unique to each M. pinnipedii strain (Fig. 3A). However, a number of single proteins not categorized into any protein cluster were considered unique to each M. pinnipedii strain (MP1 = 32, MP2 = 37, ATCC BAA-688 = 58, G01498 = 103, G01492 = 68, G01491 = 52, G01222 = 80). As the annotation tool used herein (RAST; see methods) does not differentiate between pseudogenes/broken genes and true genes, we used BLASTp and/or tBLASTn to search the NCBI's PGAP annotation of MP1, MP2 and ATCC BAA-688 strains using these “unique proteins” as input. From those with annotated function (i.e. not hypothetical or PE/PPE genes), our BLAST search revealed that all functional CDSs are considered pseudogenes, are non-existent or are present in contigs smaller than 700 bp (with doubtful sequencing/assembly quality) based on PGAP annotation (Additional file 5: Table S4). As RAST reports broken CDSs at the end of contigs as true genes, these CDS fragments (i.e. artefacts of draft genomes) failed to cluster into orthologous groups. Nevertheless, there is still a number of hypothetical and PE/PPE genes that we were not able to distinguish between pseudogenes and true genes using the proposed tools, indicating the possibility of distinct phenotypes within M. pinnipedii strains. These variations may be a consequence of gene truncation or pseudogenization, which may lead to neofunctionalization or gene loss, respectively, as well as duplication events, phenomena commonly observed in the MTBC [31, 32].
When considering non-core protein clusters (n = 292) shared between two and up to six M. pinnipedii strains, similar results were obtained, with all functional CDSs identified as pseudogenes or non-existent based on PGAP annotation, and an unknown number of possible hypothetical and PE/PPE genes that may be variably present among M. pinnipedii strains. Taken together, our results indicate that the analyzed M. pinnipedii strains present a high level of proteome conservation, which is in contrast with a recent pangenome analysis of M. tuberculosis strains that detected at least 1,122 CDSs in the accessory genome and 964 strain-specific CDSs in 36 genomes . However, the M. tuberculosis pangenome in that study was also analyzed using RAST, and pseudogenes were not taken into consideration.
Modern M. pinnipedii strains have unique deletion markers
In the large sequence polymorphism (LSP) analysis, eight previously described deletions (RD2seal, MiD3, MiD4, RD3, RD7, RD8, RD9, RD10) were found in M. pinnipedii strains (Fig. 4). Although the genome coverage of the ancient M. pinnipedii reads in respect to M. tuberculosis H37Rv varied from 31.94% to 47.83% (after quality trimming), the deletions MiD3 and MiD4 were not found in the ancient strains of M. pinnipedii, while present in the eight analyzed modern strains of the same species. This finding contributes to the understanding of M. pinnipedii evolution over time, indicating an active process of genome reduction occurring for the past one thousand years.
We have also detected 19 regions with ambiguous read mapping (i.e. areas with low sequencing coverage composed of mapped reads with non-specific match) ranging from 548 bp to 5,329 bp (Additional file 6: Table S5). The vast majority (16/19; 84.21%) are associated with at least one PE/PPE gene, while the three remaining regions comprise a hypothetical protein with an oxireductase (Rv3530c-Rv3531c) and two transposases (Rv0797 and Rv3023c). Given the repetitive nature of these regions and possible sequencing coverage bias, these findings should be interpreted with caution. Future validation using PCR assays are strongly indicated. The PE/PPE gene families correspond to approximately 10% of the coding capacity of the M. tuberculosis genome  and show a high degree of variation across members of the MTBC and between strains of the same species.
The finding of MiD3 and MiD4 being present only in modern strains of M. pinnipedii suggests that these deletions may have occurred independently, twice, over its evolutionary course (Fig. 5). MiD3 and MiD4 were first identified in Mycobacterium microti [35, 36]. This pathogen was initially described in voles , but has since been isolated from pigs, llamas, cats, wild boars, and immunosuppressed humans, being restricted to Eurasia. This raises questions about the true host range of these closely related pathogens, which may or may not have changed over the past one thousand years.
Two distinct clusters of modern M. pinnipedii strains
Figure 5 illustrates the SNP-based phylogenetic tree of the MTBC including eight modern and three ancient M. pinnipedii strains. A total of 1,698 polymorphic positions were found to be unique to M. pinnipedii strains, i.e. not present in any other of the MTBC species included in this study (Additional file 7:Table S6). This finding is not comprehensive of all possible MTBC strains distributed worldwide, which means that different strains not analyzed herein may also have mutations in these same polymorphic sites. In addition, strains of M. pinnipedii appeared distributed among three main clusters, with the ancient strains emerging from the most basal node (Fig. 5). Interestingly, modern M. pinnipedii strains appeared divided into two groups according to geographic locations, and possibly host species: modern cluster 1, comprising isolates of South American origin, and modern cluster 2, comprising isolates from Australia. These results are in accordance with findings of M. tuberculosis and M. bovis lineages/clonal complexes that are also associated with distinct geographic locations [32, 38, 39]. Whether or not these are also associated with different virulence phenotypes is unknown.