General characteristics of Cupriavidus strain STM 6070
STM 6070 is a fast-growing, Gram-negative, motile, rod-shaped isolate that forms white-opaque, slightly domed and moderately mucoid colonies within 2-3 days when grown on solid media (Figure S1). Because STM 6070 was trapped from nickel-rich ultramafic soil, we compared its heavy metal tolerance with that of other symbiotic and non-symbiotic Cupriavidus strains. The growth of STM 6070 was compared to the growth of C. metallidurans CH34T (a model organism for heavy metal resistance) and its heavy metal-sensitive derivative AE104 (CH34T devoid of the plasmids pMOL28 and pMOL30 that confer heavy metal resistance) at various concentrations of Ni2+ (Figure S2). Of the tested strains, STM 6070 had the highest tolerance to Ni2+ and was the only strain capable of growth at 15 mM NiSO4.
C. metallidurans CH34T grew in the presence of 10 mM NiSO4, while AE104 was unable to grow at 3 mM NiSO4. Previous studies had established that other symbiotic C. taiwanensis strains LMG 19424T from Taiwan  and C. taiwanensis STM 6018 from French Guiana  were also unable to grow at 3 mM NiSO4 (data not shown).
In light of the observed Ni2+ tolerance of STM 6070, we examined the tolerance of the Cupriavidus symbionts to other metal ions. In the presence of Cu2+, STM 6070, STM 6018 and LMG 19424T were able to grow in media containing 1.0 mM Cu2+; however, growth of STM 6070 was inhibited from 0.6 mM Cu2+ (Figure S3). In addition, STM 6070 was able to grow in media containing 15 mM Zn2+, whereas STM 6018 and LMG 19424T were far more sensitive and could not grow at this concentration (data not shown). Since STM 6070 was highly tolerant to Ni2+ and Zn2+, the genome of this strain was examined, in particular for putative heavy metal resistance (HMR) determinants.
STM 6070 Minimum Information for the Genome Sequence (MIGS) and genome properties
The classification, general features and genome sequencing project information for Cupriavidus strain STM 6070 are provided in Table S1, in accordance with the minimum information about a genome sequence (MIGS) recommendations  published by the Genomic Standards Consortium . The genome sequence consisted of 6,771,773 nucleotides with 67.21% G+C content and 107 scaffolds (Table 1) and contained a total of 6,182 genes, of which 6,118 were protein encoding and 64 were RNA only encoding genes. The majority of protein encoding genes (81.69%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table S2.
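The gene counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text; the absolute number of genes with a putative function is derived here by rounding and is not a value reported in the paper, and the variable names are illustrative.

```python
# Figures quoted in the text for the STM 6070 draft genome.
TOTAL_GENES = 6182
PROTEIN_CODING = 6118
RNA_ONLY = 64
PCT_WITH_FUNCTION = 81.69  # percent of protein-coding genes with a putative function

# The two gene classes should add up to the reported total.
counts_consistent = (PROTEIN_CODING + RNA_ONLY == TOTAL_GENES)

# Derived (not reported) absolute counts, rounded to whole genes.
with_function = round(PROTEIN_CODING * PCT_WITH_FUNCTION / 100)
hypothetical = PROTEIN_CODING - with_function

print(f"counts consistent: {counts_consistent}")
print(f"genes with putative function (derived): {with_function}")
print(f"hypothetical genes (derived): {hypothetical}")
```

The percentage is taken relative to protein-coding genes, as the sentence in the text states; a different denominator would give different derived counts.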
Phylogenetic placement of STM 6070 within the Cupriavidus genus
Previous studies have shown that STM 6070 is most closely related to C. taiwanensis LMG 19424T  and C. alkaliphilus ASC-732T , according to recA phylogenies . This was confirmed by a phylogenetic analysis based on an intragenic fragment of the 16S rRNA gene (Figure S4). To determine the taxonomic placement of STM 6070 at the species level, the whole genome of STM 6070 was compared with sequenced genomes of five non-symbiotic and three symbiotic Cupriavidus species (Table S3) to establish the average nucleotide identity (ANI) (Table S4).
ANI [41-43] comparisons showed that the STM 6070 genome displayed the highest ANI values with the C. taiwanensis strains STM 6018 and LMG 19424T, but the values were lower than the species affiliation cut-off scores (Table S4). This indicates that STM 6070 (together with isolates of the same rep-PCR group isolated from New Caledonian soils) represents a new Cupriavidus species, for which we propose the name Cupriavidus neocaledonicus sp. nov. (i.e. from New Caledonia). The ANI values also suggest that the UYPR2.512 and AMP6 strains represent new Cupriavidus species.
Synteny between genomes
To assess how the observed differences in genome size (6.48 – 7.86 Mb) affected the distribution of specific genes within the five symbiotic strains of Cupriavidus, we used progressive Mauve  to align the draft genomes of STM 6070, STM6018, UYPR2.512 and AMP6 to the finished genome of C. taiwanensis LMG19424T (Figure S5). The alignments of the STM 6018 and STM 6070 genomes against that of C. taiwanensis LMG 19424T showed a high similarity of collinear blocks within the two largest replicons (Figure S5A), the sequence of the LMG 19424T chromosome 1 (CHR1) being more conserved than that of the chromosome 2 (CHR2 or chromid). We identified eight scaffolds specific to STM 6070 (A3AGDRAFT_scaffold_31.32_C, _43.44_C, _54.55_C, _39.40_C, _104.105_C, _101.102_C, _99.100_C, and _89.90_C) that could not be aligned to the LMG19424T genome sequence, as well as two STM 6070 scaffolds (A3AGDRAFT_scaffold_84.85_C and _75.76_C) that were absent from LMG 19424T but present in STM 6018. A putative genomic rearrangement was also detected within one scaffold of STM 6070 (A3ADRAFT_scaffold_0.1), in which one part of the scaffold mapped to chromosome CHR1 and another part mapped to the chromid CHR2 of LMG 19424T (see shaded area on Figure S5A).
In contrast, the genome alignment of UYPR2.512 and AMP6 with LMG19424T showed important differences in replicon conservation (Figure S5B). Earlier studies on comparative genomics of Cupriavidus species have suggested that the largest CHR1 replicon probably constitutes the ancestral one, while the smaller CHR2 replicon was acquired as a plasmid during the evolution of Cupriavidus and gradually evolved to a large-sized replicon following either gene transfer from CHR1 or horizontal gene transfer . Large secondary replicons, or “chromids” , such as CHR2, have been detected in many bacterial species and carry plasmid-like partitioning systems [25, 35] and some essential genes, such as rRNA operons and tRNA genes (present in CHR2 of LMG19424 and the corresponding syntenic region of STM6070). This chromid also carries many genes that are conserved within a genus, and genes conserved among strains within a species. This may well explain the greater degree of sequence divergence observed (Figure S5) in CHR2 as compared with CHR1 in the symbiotic Cupriavidus genomes.
Finally, we observed that whereas most of the LMG 19424T pSym sequence was well conserved in the STM 6018 and STM 6070 genomes (Figure S5A), only a few LMG19424T pSym genes (including the nod, nif, fix and fdx genes) were conserved across all five genomes. The M. pudica microsymbionts (LMG 19424T, STM 6018 and STM 6070) had almost identical pSyms (conserved pSym synteny with nod genes characterized by 100% protein identity). In contrast, the Parapiptadenia rigida (UYPR2.512) and Mimosa asperata (AMP6) nodulating strains harboured divergent pSyms (low synteny, with nod genes characterized by 80-94 and 95-98.4% protein identity to those of LMG 19424T, respectively). Based on phylogenetic analyses of symbiotic and housekeeping loci, our results support the hypothesis that symbiotic Cupriavidus populations have arisen via horizontal gene transfer .
Comparisons of Cupriavidus neocaledonicus STM 6070 with other sequenced genomes of symbiotic Cupriavidus
The comparison of gene orthologues of STM 6070 with those of the symbiotic Cupriavidus strains LMG 19424T, STM 6018, UYPR2.512 and AMP6, performed using the “Gene Phyloprofile” tool in the Microscope MaGe platform (Figure 1), showed that these strains have a large core set of 4673 genes, representing from 55.5 to 78.1 % of the total number of genes in these organisms (70.2 % for STM 6070). Each species harbours a set of unique genes, which ranges from 226 for LMG 19424T to 1993 for UYPR2.512; larger genomes had a greater number of unique genes (Figure 1A). STM 6070 harbours 483 unique genes, which represent 7.2 % of the total number of genes in the genome. The majority of these unique genes (376) encode hypothetical proteins. Only 22.2 % of the 483 STM 6070 unique genes could be ascribed to functional COG categories (Figure 1B). Within the functional COG category “Cellular processes and signaling”, the largest numbers of genes were found in Cell wall/membrane/envelope biogenesis, Signal transduction, Defense mechanisms and Intracellular trafficking, secretion, and vesicular transport. This may be related to processes required for plant host relationships and bacterial adaptation to the host environment. For example, within functional category M we detected several genes encoding glycosyl transferases, which are putatively involved in biosynthesis of exopolysaccharides and/or polysaccharides, products that have been shown to play a major role in rhizobial infection.
Unique STM 6070 genes within the signal transduction category included four genes encoding putative universal stress proteins (UspA family), additional response regulators and a sensor protein (RcsC), while the defense mechanism category includes genes encoding type I and III restriction modification systems, as well as genes encoding multidrug resistance efflux pumps, which could reflect adaptation to ultramafic soils. A high number of specific genes was assigned to “Information storage and processing”. For example, 38 genes encoded putative transcriptional regulators (COG category ‘transcription’) of various families (AraC, CopG, GntR, LacI, LysR, LuxR, MerR, NagC, TetR and XRE), suggesting a requirement for supplementary regulatory mechanisms of cellular and metabolic processes. Finally, a high number of specific genes was assigned to metabolic functions, represented mainly by amino acid, carbohydrate and inorganic ion transport and metabolism, energy production and conversion, lipid metabolism and secondary metabolites biosynthesis, transport and catabolism.
Metal resistance determinants in the STM 6070 genome
To understand the genetic basis of STM 6070 metal tolerance, we then searched for the presence of common and specific heavy metal resistance (HMR) markers within the genomes of STM 6070 and the other symbiotic Cupriavidus species, using the TransAAP tool on the TransportDB website (http://www.membranetransport.org/)  to find genes encoding predicted transporter proteins. Given that STM 6070 is nickel- and zinc-tolerant, we were particularly interested in identifying HMR proteins within known transporter superfamilies (Transporter Classification Database: http://www.tcdb.org/) [50-52]. TransAAP analysis revealed a total of 834 putative transporters within STM 6070, of which 156 were classified within the MFS, CDF, RND, CHR, ACR3 and P-ATPase protein families (Table S5). Of the 156 TransportDB predicted transporters, 23 HME transporter genes were identified in the STM 6070 genome. Based on gene arrangements and homology with characterised HMR loci, a total of 55 structural HMR genes (TransportDB predicted HME genes plus associated genes) were located in 12 clusters (clusters A – L, Figure 2). The transporter superfamily genes were compared with those described for C. metallidurans CH34T, C. necator H16, and the symbiotic species C. taiwanensis LMG19424T , Cupriavidus sp. UYPR2.512 and Cupriavidus sp. AMP6 (Table 2, Table S6).
Major Facilitator Superfamily (MFS) proteins
The MFS is one of the two largest families of membrane transporters found in living organisms. Within the MFS permeases, 29 distinct families have been described, each transporting a single class of compounds . Of the 106 STM 6070 TransAAP-identified genes encoding putative MFS proteins, two genes (nreB and arsP) were associated with HME functions. The nreB gene located in the nreAB operon (cluster I), and the arsP gene located in the arsRIC1C2BC3H1P operon (cluster K), encode putative nickel and arsenic efflux systems, respectively (Figure 2) .
Cation Diffusion Facilitator (CDF) proteins
The CDF proteins are single-subunit systems located in the cytoplasmic membrane that act as chemiosmotic ion-proton exchangers. They include HMR proteins such as CzcD, which provides resistance to cobalt, zinc and cadmium. Four genes encoding CDF proteins were detected in the STM 6070 genome (Table S6), but only one, czcD, is located in an HME cluster (czcDI2C3B3A3, cluster K) (Figure 2). This locus encodes a CDF efflux protein with 67.2 % identity to CH34T CzcD, which mediates the efflux of Co2+, Zn2+ and Cd2+ ions. The second CDF gene (dmeF) encodes an efflux protein with highest identity (76.1 %) to the CH34T DmeF protein, which has a role in cobalt homeostasis and resistance, while the other two CDF genes (fieF1 and fieF2) encode efflux proteins with homology to CH34T FieF (70.8 and 69.8 % identity, respectively). FieF has a role in ferrous iron detoxification but was also shown to mediate low-level resistance to other divalent metal cations such as Zn2+ and Cd2+ [55, 56].
Resistance-Nodulation-Division (RND) proteins
The RND-HME transporters are transmembrane proteins that form a tripartite protein complex consisting of the RND transmembrane transporter protein (component A), a membrane fusion protein (MFP) (component B), and an outer membrane factor (OMF) protein (component C). These components export toxic heavy metals from the cytoplasm, or the periplasm, to the outside of the cell and have been designated as CBA efflux systems, or CBA transporters, to differentiate them from ABC transporters. Within a CBA system, the RND transmembrane and MFP proteins [52, 57] mediate the active part of the transport process, determine the substrate specificity, and are involved in the assembly of the RND-HME protein complex.
The RND-HME transmembrane proteins contain a large periplasmic loop flanked by 12 transmembrane α-helices, TMH I to TMH XII. They are classified into different groups according to the signature consensus sequence located in TMH IV, which is essential for proton/cation antiport and is used to predict the heavy metal substrate specificity [52, 58]. The five classes of efflux systems and their predicted heavy metal substrates are HME1 (Co2+, Zn2+, Cd2+), HME2 (Co2+, Ni2+), HME3a (divalent cations), HME3b (monovalent cations), HME4 (Cu+ or Ag+) and HME5 (Ni2+) [52, 59, 60].
Our phylogenetic analysis of the eight TransAAP predicted STM 6070 RND proteins, together with the analysis of the conserved motifs within the proteins, suggests that three of these proteins belong to the HME1 class, two belong to the HME3a class and the remaining three proteins belong to the HME3b, HME4 and HME5 classes, respectively (Figure 3 and Table S6). The STM 6070 genome lacks genes encoding the HME2-type transmembrane proteins, such as the C. metallidurans CH34T CnrA and NccA, which are involved in heavy metal resistance and have predicted substrate specificity for cobalt and nickel .
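The class assignments described above can be summarised as a small lookup structure. The following sketch only restates the classes, predicted substrates and protein counts given in the text; the dictionary names are illustrative and the code performs no sequence-based classification.

```python
# Predicted substrates per RND-HME class, as listed in the text.
HME_SUBSTRATES = {
    "HME1": "Co2+, Zn2+, Cd2+",
    "HME2": "Co2+, Ni2+",
    "HME3a": "divalent cations",
    "HME3b": "monovalent cations",
    "HME4": "Cu+ or Ag+",
    "HME5": "Ni2+",
}

# Number of STM 6070 RND proteins assigned to each class in the text.
STM6070_RND_COUNTS = {"HME1": 3, "HME3a": 2, "HME3b": 1, "HME4": 1, "HME5": 1}

total_rnd = sum(STM6070_RND_COUNTS.values())
# Classes with a known substrate prediction but no STM 6070 representative.
missing_classes = sorted(set(HME_SUBSTRATES) - set(STM6070_RND_COUNTS))

for hme_class, count in STM6070_RND_COUNTS.items():
    print(f"{hme_class}: {count} protein(s), predicted: {HME_SUBSTRATES[hme_class]}")
print(f"total: {total_rnd}; classes not detected: {missing_classes}")
```

The tally reproduces the eight TransAAP-predicted RND proteins and flags HME2 as the one class without a representative, in line with the absence of CnrA- and NccA-type proteins noted above.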
STM 6070 contained three RND-HME1 proteins (CzcA1-A3), characterized on the basis of homology to the canonical CzcA proteins of CH34T. STM 6070 CzcA1 and CzcA3 grouped with CH34T CzcA and CzcA2, while STM 6070 CzcA2 formed an outgroup (Figure 3). The STM 6070 czcA1 gene was within an operon located in cluster F (Figure 2) and annotated as czcJ1I1C1B1A1. In addition to the czcCBA genes, this cluster contained a czcI1 homolog of a transcriptional regulator that has been shown to control the expression of czcC1B1A1 [52, 61] and a czcJ1 homolog, which was reported to be strongly induced by Cd2+, Cu2+, Ni2+, and Zn2+ [35, 62]. This operon was located in a genomic region showing high synteny with corresponding regions in the other symbiotic Cupriavidus strains and in C. necator N-1, and the STM 6070 CzcA1 protein showed high identity with other Cupriavidus CzcA orthologues (Table S6). In C. metallidurans CH34T, the corresponding czc cluster (czcMNICBADRSEJ, locus tags Rmet_5985-74) is located on the plasmid pMOL30 and contains additional genes that are not found in STM 6070. The second STM 6070 RND-HME1 efflux system (czcC2B2A2) formed part of a large group of HMR loci within cluster I (Figure 2). Immediately upstream of czcC2B2A2 is an nreB gene, encoding a putative nickel resistance MFS protein. A similar arrangement has been observed for the CH34T nccCBA nreB cluster found on plasmid pMOL30. Cluster I was delimited by transposases and no conserved syntenic arrangement with the six other Cupriavidus genomes was observed (Table 2). The third STM 6070 RND-HME1 efflux system (czcDI2C3B3A3) was located within cluster K (Figure 2). The czcI2 and czcD components encode putative regulator and CDF proteins, respectively. This cluster, which was also delimited by two Tn3 transposases, had conserved synteny to corresponding regions in the genomes of Cupriavidus spp. LMG 19424T and STM 6018, but not to AMP6 and UYPR2.512.
STM 6070 contained two putative RND-HME3a efflux systems located in clusters G and I. Cluster G contained an hmv operon, located in a region that was syntenic to corresponding regions in symbiotic Cupriavidus and C. necator N-1. Although the region was not syntenic in CH34T, the encoded HmvCBA proteins all had high identity with STM 6070 HmvCBA proteins [35, 63]; however, the role of the CH34T proteins in heavy metal efflux has yet to be determined.
Cluster I contained a putative zinc efflux RND-HME3a system, annotated as hmxB zneAC, with the associated upstream genes zneRhmxS encoding a two-component sensor regulatory system. The encoded proteins had low identity (38-44%) to corresponding proteins in other Cupriavidus genomes. Although the BAC gene arrangement differs from the typical CBA arrangement of characterised RND-HME transporter genes, it is the same as that described for the characterised CH34T HME3a zinc efflux system zneSRBAC [35, 57, 64] (Table S6). The STM 6070 ZneA protein contained highly conserved amino acids identified in the active and proximal heavy metal-binding sites of the characterised CH34T ZneA protein (Table S7). Based on conservation of the essential amino acid residues, these proteins would be divalent cation transporters, putatively involved in zinc efflux. Interestingly, the highest similarity to the STM 6070 HmxB ZneAC proteins (70, 86.5 and 69.5%, respectively) was to encoded proteins of the marine betaproteobacterium Minibacterium massiliensis, within an operon of similar architecture but of unknown function and substrate specificity.
An RND-HME3b hmyFCBA efflux system was identified in cluster A (Figure 2). This operon showed high identity to a corresponding CH34T hmyFCBA cluster (locus tags Rmet_4119-4123), located on the chromid, and was also highly conserved in the four symbiotic Cupriavidus strains and C. necator N-1. The role of the Hmy efflux system in Cupriavidus is currently unknown and this system is likely to be inactive in CH34T since hmyA in this strain is insertionally inactivated by IS1088 . However, in the characterized Escherichia coli metal cation-transporting efflux system CusCFBA, cusF encodes a small auxiliary protein that is required for full resistance to copper and silver . A similar role is predicted for the Cupriavidus hmyF even though there is low identity (< 30%) to E. coli cusF.
An RND-HME4 silDCBAF efflux system was identified in cluster J and has been suggested to be important for monovalent cation efflux in CH34T . No syntenic regions were identified in the other Cupriavidus genomes. However, this operon is similar to the CH34T pMOL30 silDCBA operon (Rmet_5030-5034), which encodes a putative silver efflux system, and to the CH34T chromid cusDCBAF operon (Rmet_6133-6136), which encodes a putative copper efflux system . Similar operons were also identified in the STM 6018, AMP6, N-1 and H16 genomes.
An RND-HME5 nieIC cep nieBA efflux system, identified in cluster B, was located 28 kb downstream of cluster A. This operon included a gene encoding a conserved exported protein (cep) situated between the nieC and nieB structural genes, disrupting the typical RND CBA operon arrangement. Among the Cupriavidus strains, a similar operon structure was found only in the AMP6 genome, with the structural proteins displaying high identity to the corresponding STM 6070 proteins. This operon structure was also found in the genome of M. massiliensis, with the encoded proteins having 41 to 79 % protein identity with those of STM 6070. As there are no RND-HME5 efflux systems present in CH34T, the protein encoded by STM 6070 nieA was compared with the characterized RND-HME5 proteins NrsA (involved in nickel resistance) and CopA (involved in copper resistance) of the cyanobacterium Synechocystis sp. PCC 6803 [69, 70]. The phylogenetic analysis (Figure 3) shows that although these proteins possess a common ancestor, they form two well separated clades, one comprising the HME5 proteins of STM 6070, AMP6 and M. massiliensis, and the second containing the NrsA and CopA of PCC 6803 together with RND-HME5 proteins from the cyanobacterium Anabaena sp. PCC 7120. The betaproteobacterial and cyanobacterial RND-HME5 proteins share less than 41 % identity, resulting in totally different amino acids involved in putative proximal and distal metal-binding sites, as well as differences in the consensus sequence of the TMH IV α-helix (Table S7). Of particular interest was the finding that the three histidines, which are present in the proximal site of NieA and in the proteins of this clade, form part of conserved HAEGVH and HRLDH motifs, and match the putative nickel-binding motifs H-X4-H and H-X3-H that are predominant in Ni-binding proteins, as described for the Ni-binding proteins of Streptococcus pneumoniae.
Based on these findings, we suggest that this nieIC cep nieBA operon encodes a new RND-HME system (class 6) putatively involved in nickel efflux and represents an interesting candidate for knockout mutation to determine if it is a major determinant of nickel tolerance in STM 6070.
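The correspondence between the conserved HAEGVH and HRLDH segments and the H-X4-H and H-X3-H motifs can be checked with a simple pattern search. This is a minimal sketch, assuming the motifs are read literally as two histidines separated by exactly four or three arbitrary residues; the function name `find_motifs` is hypothetical and this is not the motif analysis used in the study.

```python
import re

# Literal reading of the motifs named in the text: "X" is any residue.
MOTIF_PATTERNS = {
    "H-X4-H": r"H.{4}H",
    "H-X3-H": r"H.{3}H",
}

def find_motifs(sequence):
    """Return {motif_name: [(start, matched_text), ...]} for a protein sequence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]
    return hits

# The two conserved segments quoted in the text.
for segment in ("HAEGVH", "HRLDH"):
    print(segment, find_motifs(segment))
```

Applied to a full-length protein sequence, the same function would list every candidate position, which could then be compared against the proximal-site residues in an alignment.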
Chromate Ion Transporter (CHR) proteins
The CHR proteins efflux chromate from the cytoplasm through an indirect active transport process. Two STM 6070 genes (chrA1 and chrA2) were identified as encoding putative CHR proteins. The STM 6070 ChrA1 and ChrA2 proteins showed higher identity to each other than to the CH34T pMOL28 and chromid ChrA proteins. The chrB1A1 operon (cluster B) was located upstream of the putative RND-HME5 efflux system nieIC-cep-nieBA (Figure 2). This chr operon was conserved in the genomes of the symbiotic Cupriavidus strains LMG 19424T and STM 6018, forming part of a large synteny block. The second chr operon, annotated as chrB2A2CF-cep-chrL (chrY), was located in cluster I, along with the RND-HME efflux systems czcC2B2A2 and hmxB-zneAC and the nreAB operon (Figure 2). In addition to chrB2A2, this operon contained four other genes: chrC, encoding a putative superoxide dismutase that may reduce chromate and thereby decrease chromate toxicity; chrF, encoding a putative transcriptional repressor; cep, encoding a conserved exported protein containing a Concanavalin A-like lectins/glucanases domain; and finally, chrL, encoding a lipoprotein (protein family, LppY/LpqO) with 71.1 % identity to CH34T ChrL (also annotated as CH34T ChrY). Corresponding gene clusters were identified in the UYPR2.512 and CH34T genomes. In CH34T the corresponding chrL (chrY) gene (locus tag Rmet_6195) is induced by chromate. Deletion of chrL in the Gram-positive Arthrobacter sp. strain FB24 resulted in a noticeable decrease in chromate resistance. The STM 6070 chr operon lacks the chrE, chrO, chrN, chrP and chrZ orthologues found in the corresponding CH34T chr operon. The different chromate resistance genes might affect tolerance to chromate, or to another metal-oxyanion.
The STM 6070 chrB2 gene appears to be inactivated by an insertion that changes the reading frame after 214 amino acids, and shortens the protein to only 293 amino acids, instead of the full-length 324 amino acid protein encoded by CH34T chrB. Since ChrB seems to be important for chromate resistance in CH34T, the tolerance of STM 6070 to chromate might be compromised. Indeed, in our experimental conditions STM 6070 only showed slight tolerance to Cr6+ (0.1 mM).
Arsenical Resistance-3 (ACR3) proteins
The ACR3 family includes permeases involved in arsenate resistance. The two STM 6070 ACR3-type arsB1 and arsB2 genes are located in two ars operons encoding putative arsenate detoxification systems. The first operon is located downstream of the czc operon in cluster K. Genes in this ars operon had high identity with genes of the CH34T arsMRIC2BC1HP operon encoding an arsenite and arsenate detoxifying system [62, 77]. This ars cluster encoded a putative arsenite/arsenate transcriptional regulator/repressor (ArsR), a glyoxalase family protein (ArsI), three arsenate reductases (ArsC1, ArsC2, ArsC3), an arsenite efflux pump belonging to the ACR3 class of permeases (ArsB1), a NADPH-dependent FMN reductase (ArsH1), and a putative permease from the major facilitator superfamily (MFS) (ArsP). The operon was highly conserved in the Cupriavidus symbionts LMG 19424T and STM 6018 and formed a large syntenic region. The second ars operon, arsR2C4B2H2, in cluster L, was present in all other Cupriavidus genomes except UYPR2.512, but was missing several genes (arsI, arsC and arsP) found in the cluster K operon.
P-type ATPase proteins
P-type ATPases directly utilise ATP to export metal ions from the cell cytoplasm. Of the ten STM 6070 genes assigned to the P-type ATPase protein family (Table S5), five encoded P-type ATPases putatively involved in HME (Figure 2 and Table 2). The copF P-type ATPase gene in cluster J was located upstream of the silDCBAF operon and could encode an essential copper efflux component, as shown for CH34T . However, the STM 6070 CopF appears to be truncated in its C-terminus and thus may not be functional. Two P-type ATPase-encoding genes were identified in cluster D and annotated as silP and copP. The encoded proteins had very low identity with proteins of the Cupriavidus genomes (Table S6), except for one P-type ATPase protein from AMP6 with 86% identity with the CopP protein. The proteins had higher identity with P-type ATPases encoded by C. necator H16, annotated as SilP (86%) and CopP (94.7%), and putatively involved in silver and copper ion transport, respectively . Within cluster H, a P-type ATPase-encoding gene, annotated as cupA, was located next to a regulatory gene, cupR (Figure 2), in a conserved large syntenic block common to all compared Cupriavidus strains, with high identity between corresponding genes. The cupA and cupR genes are putatively involved in copper ion transport. Finally, zntA was located within cluster C in a group of genes annotated as czcJ2-hns-czcLRS-ubiGI-zntA. Genes in this cluster had high identity with loci in two gene clusters in CH34T that have been annotated as zntA czcICΔB (locus tags Rmet_4594-4597) and czcBA ubiG czcSRL IS hns mmmQ (locus tags Rmet_4469-4461), respectively. These CH34T clusters encode an RND system (czcICBA), the ZntA ATPase, a two-component regulatory system CzcRS and a 3-demethylubiquinone-9 3-methyltransferase (UbiG) [35, 63]. UbiG participates in the biosynthesis of ubiquinone and its activity could be related to the sensor kinase activity of the two-component system CzcRS [78, 79]. 
The czcL, hns and mmmQ genes encode an unknown protein, an H-NS like protein and a small stress responsive protein, respectively. Genes in the second CH34T cluster may be inactivated by an insertion sequence located between czcL and hns. The synteny of the STM 6070 cluster C is perfectly conserved in the genomes of the four symbiotic Cupriavidus strains, suggesting that it is functional, but it is devoid of the czcCBA RND system found in the corresponding CH34T cluster. The role of the regulatory loci czcLRS-ubiGI, with regard to zntA expression, would thus be interesting to determine.
Other mechanisms of cation detoxification (not included in TransAAP)
The search for further heavy metal resistance determinants in STM 6070 that were orthologous to those described in CH34T led to identification of a copper-resistance operon copRSABCD (cluster E). This had a similar structure to the CH34T cop cluster (copS2R2A2B2C2D2) located on the chromid, which encodes a copper-resistance mechanism that is thought to sequester copper outside the cytoplasm [80, 81]. CopSR is a two-component sensor-regulator system and CopA is a putative multi-copper oxidase thought to oxidize Cu1+ to Cu2+. CopA proteins contain several motif variants of MGGM/MAGM/MGAM/MSGM, possibly involved in binding numerous Cu1+ ions, as determined for Pseudomonas syringae CopA. CopA is exported to the periplasm by the twin-arginine translocation pathway, where it may interact with an outer-membrane protein CopB, providing the minimum system required for low-level copper resistance. CopD is a membrane protein involved in transfer of Cu1+ from the periplasm to the cytoplasm for CopA binding [80, 81], and CopC is thought to regulate copper uptake by CopD. The STM 6070 CopA protein shows 75.8 % identity to both CH34T CopA1 (pMOL30) and CopA2 (chromid) proteins. Interestingly, the alignment of corresponding proteins reveals the presence of a histidine-rich sequence (GHG GHS GDS GHS GDS (GHS)5 GDS GHG AHA GHG) located in the middle of the methionine-rich CopA motif in the STM 6070 protein, which is absent from other CopA sequences deposited in the NCBI database. The Escherichia coli HRA-1 and 2, Enterococcus hirae CopB and Rhizobium leguminosarum ActP Cu-exporting P-type ATPase proteins also contain histidine-rich leaders, which we postulate bind to copper ions. The STM 6070 CopRSABCD putative copper sequestration system may provide a second line of defence against copper toxicity and is particularly well conserved in all of the symbiotic Cupriavidus isolates.
Location of HMR determinants
The detected STM 6070 HMR determinants in the 12 clusters (A to L, Figure 2, Table 2) were assigned to putative replicons of the STM 6070 genome, following alignment of contigs to the finished LMG19424T genome. Two clusters (D and H) could be assigned to chromosome 1 (CHR1), one cluster (K) to the pSym, and nine clusters (A, B, C, E, F, G, I, J and L) to CHR2 (chromid, Figure S6). Therefore, STM 6070 appears to carry the great majority of its HMR clusters on CHR2. In contrast C. metallidurans CH34T harbours 8 out of 24 HMR clusters on CHR2 (chromid) [35, 52, 63]. The genome synteny comparison revealed that six of the STM 6070 HMR clusters (A, C, E, F, G and H) are common to symbiotic and non-symbiotic Cupriavidus genomes. STM 6070 HME gene products from clusters A, C, E, F, G and H displayed highest identity (93 to 100 %) with corresponding proteins of C. taiwanensis isolates (LMG 19424T and STM 6018, Table S6), reflecting the taxonomic relationship with C. taiwanensis.
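The replicon assignment above amounts to a simple mapping from cluster to replicon, from which the share of clusters on CHR2 follows directly. The sketch below restates the assignments and the CH34T figure (8 of 24 clusters on CHR2) given in the text; the variable names are illustrative.

```python
from collections import Counter

# Putative replicon of each STM 6070 HMR cluster, as assigned in the text.
CLUSTER_REPLICON = {
    "A": "CHR2", "B": "CHR2", "C": "CHR2", "D": "CHR1",
    "E": "CHR2", "F": "CHR2", "G": "CHR2", "H": "CHR1",
    "I": "CHR2", "J": "CHR2", "K": "pSym", "L": "CHR2",
}

replicon_counts = Counter(CLUSTER_REPLICON.values())

# Fraction of HMR clusters carried on the chromid (CHR2).
stm6070_chr2_fraction = replicon_counts["CHR2"] / len(CLUSTER_REPLICON)
ch34_chr2_fraction = 8 / 24  # CH34T figure quoted in the text

print(dict(replicon_counts))
print(f"STM 6070 clusters on CHR2: {stm6070_chr2_fraction:.0%}")
print(f"CH34T clusters on CHR2: {ch34_chr2_fraction:.0%}")
```

Because the STM 6070 assignments rest on alignment of draft contigs to the finished LMG 19424T genome, the mapping should be read as putative rather than as a confirmed replicon structure.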
Synteny analysis indicated that the specific STM 6070 HMR clusters B, D, I and J were absent from all other analysed Cupriavidus genomes, although some of the HMR genes within these clusters had orthologues (35 to 89 % of encoded protein identity) in the genomes of the other Cupriavidus strains. Cluster K was perfectly conserved within the LMG 19424T and STM 6018 genomes (100 %) in a large syntenic region, whereas it was absent from the AMP6 and UYPR2.512 genomes. Only the separate czc and ars operons from cluster K were detected in the non-symbiotic Cupriavidus genomes, with encoded protein identities of 76 - 77 % and 83 - 88 %, respectively, to the STM 6070 czc and ars operon encoded proteins. This observation can be explained by the location of cluster K on the pSym, which, as proposed recently , seems to be largely shared between M. pudica microsymbionts of different genomic backgrounds. Indeed, we demonstrated by the progressive Mauve alignment (Figure S5) that the pSym seems to be conserved in the genomes of the M. pudica-nodulating LMG 19424T, STM 6018 and STM 6070, in contrast to genomes of AMP6 and UYPR2.512, which nodulate different mimosoid legumes and harbour totally different symbiotic plasmids.
The analysis of genes adjacent to HMR clusters revealed that clusters D and J each contained a transposase-encoding gene at one end of the cluster, whereas clusters I and K were flanked by transposase-encoding genes (Figure 2). Analysis of the GC% using a two-tailed Mann-Whitney U test revealed that the GC% of clusters D and J did not differ significantly (P-value >0.01) from the average GC% of the genome. In comparison, the GC% of clusters I and K did differ significantly (P-value <0.01) from the average GC% of the genome. This suggests that clusters I and K were acquired by horizontal gene transfer (HGT). Cluster I, located on the chromid, is the largest of these clusters (approximately 25 kb), is flanked by transposases of the Tn3 and IS66 type, and carries four different HMR determinants, including czcC2B2A2 and hmxB zneAC. Cluster K is flanked by two Tn3 transposases; however, unlike Cluster I, there is a high conservation of architecture and gene identity with the closely related C. taiwanensis strains (LMG 19424T and STM 6018). This may indicate that Cluster I contains HME determinants that are important for survival in the New Caledonian ultramafic soils. In C. metallidurans, the acquisition of mobile genetic elements that contain metal resistance genes appears to be a strategy important for its adaptation to environments that contain elevated levels of heavy metals [62, 85].
In contrast, no transposases or insertion sequences could be found around cluster B, or more particularly, around the operon nieIC cep nieBA. This operon, which is absent from LMG 19424T and STM 6018 genomes, is located in a large highly conserved region, suggesting a gene loss from C. taiwanensis genomes. Interestingly, nieIC cep nieBA (cluster B) and hmxB zneAC (cluster I), two unique RND-HME systems in terms of operon structure and protein sequences, showed significant structure and protein sequence similarity with two operons from the genome of M. massiliensis.
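The GC% comparison reported above for clusters D, J, I and K can be illustrated with a short, dependency-free sketch. The text does not state which units were compared (for example genes or sequence windows) or which software was used, so the windowing scheme, the function names and the normal approximation to the Mann-Whitney U distribution are all assumptions made here for illustration; for real data an established statistics package would be preferable.

```python
from statistics import NormalDist

def gc_percent(seq):
    """GC content (%) of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100 * (seq.count("G") + seq.count("C")) / acgt

def window_gc(seq, size, step):
    """GC% of consecutive windows (assumed sampling scheme, not from the paper)."""
    return [gc_percent(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]

def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - n1 * n2 / 2) / variance ** 0.5
    return u_x, 2 * (1 - NormalDist().cdf(abs(z)))
```

In use, `window_gc` would be applied to a cluster sequence and to the rest of the genome, and the two lists passed to `mann_whitney_u`; a p-value below 0.01 would correspond to the threshold quoted in the text. The normal approximation is unreliable for very small samples, which is one reason to treat this only as a sketch.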