The aim of this study was to identify potential sites in GyrA protein linked to a specific character. In order to perform this analysis, it was necessary first to carefully analyze the dataset in order to identify any potential characteristics either genetic, temporal or geographical that could be linked to any GyrA amino acid changes.
Analysis of the dataset
The 252 gyrA sequences analyzed in this report concerned 15 Neisseria species, N. meningitidis, N.gonorrhoeae and N.lactamica being the most represented (Additional Figure 1a). Ten of these species have been reported to infect humans, N. meningitidis strains as well as N.gonorrhoeae strains being pathogenic for humans. This study focused on N. meningitidis and therefore 87% of sequences (218/252) concerned this species. The 252 strains featured in this analysis were isolated from 15 countries of 4 continents, and mostly in China (Additional Figure 1b). The analyzed sequences concerned strains collected from 1931 to 2017, with most of the strains collected after 2005 (Additional Figure 1c). Eighty-eight STs were represented among the 218 N. meningitidis strains analyzed in this study (Additional Figure 2a-c). Thirty-nine of these STs belong to CC4821 and ST4821 was represented by 43 strains (Additional Figure 2a). ST7 and ST5, which belong to CC5, were also well represented with 28 and 11 strains respectively (Additional Figure 2b). Eleven serogroups of N.meningitidis were analyzed; B and C were represented by 60 strains, A by 47 and W by 12 (Additional Figure 2d). Based on STs, 19 clonal complexes were represented in the dataset. The majority was CC4821 (95/252) followed by CC5 (41/252) and unassigned (UA) (Additional Table 1).
Evolutionary analysis of 252 gyrA nucleotide sequences
A neighbor joining phylogenetic tree was constructed with 252 gyrA nucleotide sequences (Figure 1). The nucleotide sequences were significantly divergent with an overall p-distance of 0.052 (Table 1). An overview of the tree showed that the sequences from N.meningitidis were found across the tree demonstrating that gyrA gene was relatively divergent within that species. Moreover, no trend was found based on collection location, time or even genetic characteristics like CC or serogroup. For example, the sequences of CC4821 strains (in red in Figure 1) were found across the tree. Similarly, sequences of CC5 strains (in blue in Figure 1) were found in two lineages suggesting that the gyrA gene evolution was independent of the CC5 typing. Overall, these observations suggested that the evolution of gyrA sequences was independent of location, time as well genetic characteristics of the bacterial strains.
Among the 252 analyzed sequences, 48% (121) were identical (Table 1). We chose to maintain these sequences despite the sequence identity in order to keep valuable information in terms of geography, time and genetic characteristics. Nearly 60 % of the sequences (151) were found on the top of the tree, with no significant bootstrap value (Figure 1). These sequences were highly homogeneous, with a p-distance of 0.003 (Table 1). The remaining 101 sequences were more divergent, with a p-distance of 0.064 relative to sequences grouped on the top of the tree. Most of the nodes concerning these 101 sequences featured a bootstrap value > 70%. Moreover, the p-distance within this group of 101 sequences was 0.083 demonstrating that these sequences were highly divergent between each other. As major nodes of the tree featured a bootstrap value > 80%, we decided to arbitrarily assign sequences to 9 different genetic groups (Figure 1; Table 1).
Fifteen sequences were considered outliers. Even though they were from 2 genetically distinct groups at the bottom of the nucleotide tree, these sequences shared a common node with 90 % bootstrap suggesting that these sequences were divergent compared to the rest of the sequences. The lowest p-distance between the outlier group and the other 8 groups was 0.16. These sequences were from 11 species that were not found elsewhere in the tree. We could think that this might be the consequence of a sample bias. However, as these sequences shared 8 amino acid changes, it strongly suggested that these sequences were related and significantly divergent compared to the rest of the analyzed sequences (Table 1, Additional Figure 3). Group 1 consisted of 2 sequences, N.meningitidis-MK930374-100514-ST4832-CC4821-C-China-2005 and N.subflava-CP031251.1-M18660----2009. This group was supported by a bootstrap of 99% suggesting that these 2 sequences were very different from the rest of the sequences. The sequences shared 7 amino changes (Table 1, Additional Figure 3). However, the long branch corresponding to N.subflava-CP031251.1-M18660----2009 sequence suggested that this sequence was also significantly divergent from N.meningitidis-MK930374-100514-ST4832-CC4821-C-China-2005 and this was confirmed by a p-distance of 0.06 between these 2 sequences. Group 2 consisted of 2 sequences from N.meningitidis strains collected in Gansu and Guangxi province in China in 2007 and 2011, respectively, namely N.meningitidis-KF733178-G1-ST5636-UA-B-China-2007-R and N.meningitidis-MK930446-GX34173-ST9477-CC4821-B-China-2011. Both group 3 and 4 consisted of a single sequence, namely, N.meningitidis-MK930402-231003-ST12300-CC4821-B-China-2009 and N.meningitidis-KF733135-62-ST5751-UA-B-China-2006-R, respectively. The nodes corresponding to these sequences were supported by bootstrap values of 82 and 97% respectively. Furthermore, the lowest pairwise p-distances between these sequences and the other analyzed sequences were 0.032 and 0.025 respectively demonstrating that these sequences were significantly divergent compared to the rest of the analyzed sequences. Group 5 consisted of 26 sequences, mainly from N.meningitidis. However, a subgroup of 7 sequences supported by a bootstrap of 99% featured 5 sequences from N.lactamica. Group 6 consisted of 42 sequences from N.meningitidis at the exception of one sequence from N. cinerea, namely N.cinerea-LS483369.1-NCTC10294, which shared a node with N.meningitidis-KF733132-59-ST7962-CC77-NG-China-2009-R. However, the N.cinerea sequence featured a long branch suggesting a significant divergence compared to the N.meningitidis sequence. Twenty-five sequences, mainly from strains of ST7 and CC5 shared a common node with a bootstrap of 99% and a long branch. Finally, based on the available data, the sequences from group 6 appeared to be from strains that were resistant to quinolone. However, no amino acid substitution was shared by these strains suggesting that there was no common marker for the resistance phenotype (Additional Table 2). Group 7 was particular as it exclusively concerned sequences from N. gonorrhoeae species. These sequences were highly homogeneous with a p-distance of 0.002 and they shared a node with a bootstrap of 99% and a long branch. Moreover, these sequences shared 6 amino acid changes (Table 1, Additional Figure 3). Interestingly, the 12 analyzed sequences were from strains that were either susceptible or resistant to quinolone suggesting that this divergence was not related to a resistance phenotype. Group 8 consisted of most of N.meningitidis sequences analyzed in this report. The strains were collected in the last 90 years in 13 different countries from 4 continents. Despite the significant time span and geographic spread, these sequences were highly conserved with a p-distance of 0.003. Moreover, these sequences were from strains of 72 STs, 24 CCs, 10 serogroups, including the reference strain 053442. Overall, this showed that gyrA gene was highly conserved among most N.meningitidis strains despite different genetic characteristics, geographic locations or collection time.
Analysis of the divergence within the GyrA protein
The amino acid divergence within the GyrA protein was analyzed among 131 unique sequences (Additional Table 2). Two hundred fifty seven divergent positions were identified among the 931 amino acid featured in the alignment (Figure 2). Even though these sites were found across the protein, the distribution of the divergence did not appear to be random. Two regions were highly conserved, from positions 530 to 620, and a smaller region between 300 and 330. According to the protein from E.coli, the first region corresponds to the end of the amino terminal domain and the beginning of the carboxy terminal domain. The second region corresponds to the tower domain of the protein based on the 3D model structure. Among the 257 divergent positions, 4 sites were highly divergent, namely 91, 417, 665 and 210. For example, more than 57% of the sequences featured a mutated residue at position 91 (Figure 2). Whereas 42% of the sequences had a T, 38% had an I, 14.7% had an S and the remaining sequences had either an F or a V (Additional Table 2). Overall, the divergence among the 257 variable sites was shared by 2-3% of the sequences (Additional Figure 4a). However, 52 divergent positions were only found in one sequence. These variable residues might be due to sequence artifact. Because these residues were not shared by other sequences, this meant that the variability at these positions had not been selected in the evolutionary process. Seventy-two positions were shared by at least 10 % of the sequences (14 out of 131), position 91 being the most variable as 75 sequences (57%) featured an amino acid substation (Additional Figure 4b). Eighty-nine positions featured multiple amino acid residues (Additional Figure 4c-d). For example, the position 441 featured 6 amino residues (R, H, T, Q, G and N) in addition to the main residue E (Additional Table 2). This suggested that these positions were not only highly variable but also that these positions could allow different residues without affecting the protein function. The alignment of 131 sequences featured a few gaps, mainly located between positions 720-760 and the C terminus of the protein sequence.
The 257 divergent sites were analyzed among the complete dataset of 252 sequences in order to identify any potential correlation between divergence and geographic, species or genetic characteristics (Additional Table 2). As far as we can tell, no correlation was found between divergence and geographic location. Moreover, no temporal pattern was identified. However, a correlation between diversity and species was found for N. gonorrhoeae and N. lactamica (Table 2).
Indeed, 6 changes were only found in N.gonorrhoeae, D210K, E456K, E483K, V486I, N836S, D917G. Similarly, D907E was only found in N.lactamica. In contrast, no residue was specific to N.meningitidis. As the dataset was biased towards N.meningitidis, one explanation could be poor sampling. Only 12 N.gonorrhoeae sequences were analyzed compared to 120 N.meningitidis. However, the fact that the 6 residues were found in N.gonorrhoeae strains collected in different country and at different time suggested that these residues were indeed species specific. The search for a link between diversity and genetic marker like ST, CC or serogroup was unsuccesuful suggesting that gyrA evolution was independent of the evolution of the rest of the genome. Finally, one position (91) appeared to be linked to quinolone resistance. All the resistant strains had an I or a F whereas all the sensitive strains had an S or a T. V was found in strains with undetermined resistance phenotype. Interestingly, N.gonorrhoeae appeared to have different markers, F for resistance and S for sensitivity.
Identification of potential resistance markers to quinolone
Among the 252 analyzed sequences, 61 were from strains tested for resistance to quinolone (Additional Table 1). These strains were further analyzed in order to identify additional potential markers for resistance. A change that would be found in resistant strains but not found in any sensitive strains would qualify. Thirty sites were identified (Additional Table 3). For example, H8N was found in 29 resistant strains but not featured in any sensitive strains. All the strains were analyzed for these 30 positions. Fifty-one different profiles had been identified, meaning that there were 51 combinations of these markers among all the analyzed strains (Table 3).
Among the 30 potential markers, position 91 and 103 were the most shared in the profiles. Some amino acid substitutions were only found in N.gonorrhoeae strains, like D95G or D95A. Finally, 24 of the 51 profiles concerned strains that were known to be resistant to quinolone. As resistance markers were initially described in E.coli, a comparison between gyrA sequences of E.coli and the reference strain N.meningitidis 53442 was necessary in order to check the position of these markers in E.coli sequence (Additional Table 4).
Recombination within gyrA gene between N. spp.
Among the 9 groups identified during the phylogenetic analysis, group 1 was of particular interest, concerning N.meningitidis-MK930374-100514-ST4832-CC4821-C-China-2005 and N.subflava-CP031251.1-M18660. These 2 sequences shared 7 residue changes, not seen in other strains. Furthermore, five of these changes were seen within 30 amino acids (Table 1). Finally, amino acid changes observed in one strain were not seen on other strains, like position 740 and 750. All these observations suggested a recombination between these 2 strains which was confirmed by a bootscan analysis (Figure 3). Other potential recombinations, with either cinerea, or lactamica strains and between meningitidis strains were also identified.