General genome features of M. aeruginosa strains
Genomes of 24 M. aeruginosa strains were retrieved from the National Center for Biotechnology Information (NCBI) database for series analysis. As shown in Table 1, except for strains NIES 2481[39] and NIES 2549[40], no plasmid sequences were discovered in other strains. Among them, genome sequences of the strains CHAOHU 1326 and NaRes975 were recently released by our laboratory[41-42], and their general information are shown in Table S1. The average size of the genomes was 4.97 0.40 Mb, and the average G+C content was 42.66 0.26%. Genomes ranged in size from 4.26 Mb (M. aeruginosa PCC9806) to 5.89 Mb (M. aeruginosa KW, Table 1). The selected genomes are of high completeness and low contamination as evaluated based on lineage-specific marker sets by checkM[43]. Multiple rRNA coding sequences were present in M. aeruginosa strains. Generally, each strain contains 1~2 sets of rRNA clusters as a rough estimation due to the incomplete sequences present in the genomes (Additional file 1, Table S2).
Table 1. Genome features of the 24 analyzed M. aeruginosa strains.
Strains
|
Isolation Location
|
NCBI Accession Number (Genome/Plasmid)
|
NCBI Assembly Number
|
Contigs
|
Genome Size (Mbs)
|
G+C %
|
CDs
|
Completeness
(%)
|
Contamination
(%)
|
CHAOHU 1326
|
Chaohu Lake, CN
|
MOLZ00000000/-
|
GCA_001895325.1
|
617
|
5.27158
|
42.50
|
4590
|
99.67
|
1.39
|
DIANCHI905
|
Dianchi Lake, CN
|
AOCI00000000/-
|
GCA_000332585.1
|
335
|
4.85887
|
42.50
|
4303
|
99.01
|
2.92
|
KW
|
Wangsong Reservoir, KR
|
MVGR00000000/-
|
GCA_002025445.1
|
6
|
5.88943
|
42.80
|
4854
|
97.92
|
0.51
|
NaRes975
|
Nanwan Reservoir, CN
|
MOLN00000000/-
|
GCA_001885655.1
|
413
|
5.11753
|
42.40
|
4617
|
99.89
|
0.51
|
NIES44
|
Lake kasumigaura, JP
|
BBPA00000000/-
|
GCA_000787675.1
|
79
|
4.56532
|
43.20
|
4053
|
99.89
|
0.07
|
NIES87
|
Lake kasumigaura, JP
|
BFAC00000000/-
|
GCA_002933835.1
|
246
|
4.92578
|
42.90
|
4214
|
99.89
|
0.84
|
NIES88
|
Lake Kawaguchi Yamanashi, JP
|
JXYX00000000/-
|
GCA_001578075.1
|
262
|
5.26322
|
43.00
|
4620
|
99.45
|
0.84
|
NIES98
|
Lake kasumigaura, JP
|
MDZH00000000/-
|
GCA_001725075.1
|
500
|
4.98253
|
42.40
|
4412
|
99.67
|
0.37
|
NIES843
|
Lake kasumigaura, JP
|
AP009552/-
|
GCA_000010625.1
|
1
|
5.84279
|
42.30
|
5190
|
99.89
|
0.51
|
NIES1211
|
Lake Tofutsu, JP
|
BEIV00000000/-
|
GCA_003206625.1
|
289
|
4.73839
|
42.80
|
4209
|
99.89
|
0.51
|
NIES2481
|
Lake kasumigaura, JP
|
CP012375/CP025929
|
GCA_001704955.2
|
2
|
4.44055
|
42.86
|
3966
|
99.82
|
0.15
|
NIES2549
|
Lake kasumigaura, JP
|
CP011304/CP026286
|
GCA_000981785.2
|
2
|
4.3012
|
42.90
|
3843
|
99.89
|
0.07
|
PCC7806SL
|
Braakman Reservoir, NL
|
CP020771/-
|
GCA_002095975.1
|
1
|
5.13934
|
42.10
|
4497
|
99.67
|
1.45
|
PCC7941
|
Lake Lillte Rideau, CA
|
CAIK00000000/-
|
GCA_000312205.1
|
433
|
4.8019
|
42.60
|
4337
|
98.57
|
0.73
|
PCC9432
|
Lake Lillte Rideau, CA
|
CAIH00000000/-
|
GCA_000307995.2
|
438
|
4.99494
|
42.50
|
4543
|
99.67
|
0.29
|
PCC9443
|
Fishpond, CF
|
CAIJ00000000/-
|
GCA_000312185.1
|
760
|
5.18504
|
42.70
|
4545
|
98.36
|
0.37
|
PCC9701
|
Guerlesquin dam, FR
|
CAIQ00000000/-
|
GCA_000312285.1
|
550
|
4.756
|
42.70
|
4312
|
99.12
|
0.07
|
PCC9717
|
Rochereau dam, FR
|
CAII00000000/-
|
GCA_000312165.1
|
892
|
5.30034
|
42.70
|
4609
|
98.57
|
0.29
|
PCC9806
|
Oskosh, US
|
CAIL00000000/-
|
GCA_000312725.1
|
310
|
4.26256
|
43.10
|
4258
|
99.01
|
0.18
|
PCC9807
|
Hartbeespoort dam, ZA
|
CAIM00000000/-
|
GCA_000312225.1
|
782
|
5.15571
|
42.60
|
4588
|
99.01
|
0.91
|
PCC9808
|
Malpas dam, AU
|
CAIN00000000/-
|
GCA_000312245.1
|
479
|
5.05105
|
42.40
|
4556
|
99.01
|
0.51
|
PCC9809
|
Lake Michigan, US
|
CAIO00000000/-
|
GCA_000312265.1
|
809
|
5.01102
|
42.80
|
4497
|
98.36
|
0.95
|
Sj
|
Lake Shinji, JP
|
BDSG00000000/-
|
GCA_003206555.1
|
366
|
4.61732
|
42.80
|
3956
|
99.67
|
0.18
|
TAIHU98
|
Taihu Lake, CN
|
ANKQ00000000/-
|
GCA_000330925.1
|
4
|
4.84961
|
42.50
|
4340
|
99.89
|
0.22
|
Modular signaling proteins involved in c-di-GMP metabolism and regulation in M. aeruginosa
A genome search for genes that encode enzymes involved in c-di-GMP metabolism was performed to identify the putative translated products that have DGC and PDE activities in the selected 24 M. aeruginosa genomes. The accession numbers of the predicted proteins are shown in Table 2. This survey led to identification of three enzymatic classes of predicted proteins DGCs, PDEs, and hybrid DGC–PDEs, which contain GGDEF and EAL domains, even though they have opposing activities. As listed in Table 2, 14 of the 24 M. aeruginosa genomes had genes that encode DGC enzymes, which contain a fused N-terminal REC domain and GGDEF domain in tandem. The REC domain, as a signal receiver domain present in association with c-di-GMP metabolism domains, is supposed to modulate the enzymatic activities in response to the internal or external stresses. There are two types of PDEs in M. aeruginosa genomes, one type contains partial EAL domain, and the other type contains HD-GYP domain along with a N-terminal DICT domain, a sensory domain in “diguanylate cyclases and two-component system”. Interestingly, compared with the HD-GYP domain-containing PDEs, which were identified in all selected M. aeruginosa genomes and seemed to be highly conserved proteins with partial EAL domains were found less frequently (in only three genomes). Except for the NIES44 genome, each of the other 23 genomes was found to have a GG[D/E]EF-EAL hybrid protein, consisting of GGDEF, EAL domain, and Forkhead associated (FHA) domain, a putative nuclear signaling domain.
Table 2. Predicted modular signaling proteins involved in c-di-GMP metabolism in all 24 analyzed M. aeruginosa genomes.
Strains
|
DGC
(REC-GGDEF)a
|
PDE (DICT-
HD-GYP)
|
PDE (EAL)
|
Hybrid protein (FHA-GGDEF-EAL)
|
DGC, PDE, Hybrid proteinb
|
CHAOHU 1326
|
WP_052276147.1
|
WP_052275339.1
|
-
|
WP_052277914.1
|
1, 1, 1
|
DIANCHI905
|
-
|
WP_002746813.1
|
-
|
WP_002743531.1
|
0, 1, 1
|
KW
|
WP_079210059.1
|
WP_002796380.1
|
-
|
WP_079210289.1
|
1, 1, 1
|
NaRes975
|
-
|
WP_002752229.1
|
-
|
WP_044034220.1
|
0, 1, 1
|
NIES44
|
-
|
WP_045358386.1
|
-
|
-
|
0, 1, 0
|
NIES87
|
-
|
WP_104396273.1
|
-
|
WP_104397223.1
|
0, 1, 1
|
NIES88
|
WP_061433230.1
|
WP_061432432.1
|
-
|
WP_061431785.1
|
1, 1, 1
|
NIES98
|
-
|
WP_002752229.1
|
-
|
WP_002739484.1
|
0, 1, 1
|
NIES843
|
WP_012264732.1
|
WP_002796380.1
|
-
|
WP_012266621.1
|
1, 1, 1
|
NIES1211
|
WP_039900524.1
|
WP_039900517.1
|
WP_071989022.1
|
WP_110544382.1
|
1, 1, 1
|
NIES2481
|
WP_046660716.1
|
WP_066029831.1
|
WP_080949698.1
|
WP_066029445.1
|
1, 1, 1
|
NIES2549
|
WP_046660716.1
|
WP_046662116.1
|
WP_080949698.1
|
WP_046660636.1
|
1, 1, 1
|
PCC7806SL
|
-
|
WP_002746813.1
|
-
|
WP_002743531.1
|
0, 1, 1
|
PCC7941
|
-
|
WP_002752229.1
|
-
|
WP_043997363.1
|
0, 1, 1
|
PCC9432
|
-
|
WP_002752229.1
|
-
|
WP_002750015.1
|
0, 1, 1
|
PCC9443
|
WP_043996837.1
|
WP_002765696.1
|
-
|
WP_002768060.1
|
1, 1, 1
|
PCC9701
|
WP_002801860.1
|
WP_002803155.1
|
-
|
WP_004163835.1
|
1, 1, 1
|
PCC9717
|
WP_043999403.1
|
WP_002762031.1
|
-
|
WP_002761714.1
|
1, 1, 1
|
PCC9806
|
WP_002783698.1
|
WP_002780038.1
|
-
|
WP_002783280.1
|
1, 1, 1
|
PCC9807
|
WP_002787322.1
|
WP_002785975.1
|
-
|
WP_004161732.1
|
1, 1, 1
|
PCC9808
|
-
|
WP_002752229.1
|
-
|
WP_044034220.1
|
0, 1, 1
|
PCC9809
|
WP_043999403.1
|
WP_002796380.1
|
-
|
WP_002797049.1
|
1, 1, 1
|
Sj
|
WP_110579156.1
|
WP_110579081.1
|
-
|
WP_110577728.1
|
1, 1, 1
|
TAIHU98
|
-
|
WP_002733640.1
|
-
|
WP_002739484.1
|
0, 1, 1
|
a Letters in parentheses are domains of the referred c-di-GMP metabolism enzymes.
b Number of DGCs, PDEs and hybrids proteins
Bacteria encode a variety of sensory and signal transduction proteins to sense and adapt to changes in the physicochemical makeup of their environment. Sensory and signal transduction proteins encoded in the selected 24 M. aeruginosa genomes were predicted, and 12 sensory domain-containing proteins were found. Most of these proteins are signal transduction histidine kinases. The accession numbers and domain architectures of the highly conserved GAF, PAS, and REC domain-containing proteins are listed in Additional file 1, Table S3. As an important sensor for photosensory behavior, the GAF domain was commonly associated with c-di-GMP domains in cyanobacteria[38]. As many as 11 of the 12 proteins had the GAF domain, and some even contained two. PAS-containing proteins are related to sensory input (GAF), transduction (HAMP), or output (histidine kinases). Half of the four predicted PAS-containing proteins contain a PAC motif, a conserved region of 40–45 amino acids located at the carboxy-terminal of the PAS domain, which contributes to PAS structure [28]. Interestingly, some sensory domain-containing proteins in different genomes were identical, and were therefore assigned the same accession number, such as NIES2549 and NIES2481, DIANCHI905 and PCC7806SL, and NaRes975 and PCC9808.
Pan-genome of M. aeruginosa
To assess the distribution of genes involved in c-di-GMP metabolism and regulation across the M. aeruginosa genome, a core–pan-genome analysis was performed using all 24 M. aeruginosa genome sequences as input in the Bacterial Pan Genome Analysis (BPGA) tool[44]. The pan-genome analysis revealed a core genome of 1,918 genes with an accessory genome of 36550 genes and 6489 unique genes (Fig. 2a). Accessory genes are those whose orthologs are present in two or more genomes, but not in all the genomes. M. aeruginosa possess a core genome shared by 24 strains, accounting for 37.0% to 49.9% of the genome repertoire. The core–pan plot (Fig. 2b) showed that the pan-genome trend curve did not reach a plateau and seemed to extend with addition of more genomes to the analysis. Therefore, the pan-genome was considered an “open” pan-genome. In contrast, as shown in Fig. 2b, the core genome curve leveled off, considered as a conserved core genome.
The distribution of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Cluster of Orthologous Groups (COG) categories in M. aeruginosa has been mapped in Fig. 2c and Additional file 1, Fig. S1. Genes involved in c-di-GMP metabolism and regulation were assigned to gene families belonging to environmental information and signal transduction category. Among these genes, the GG[D/E]EF domain-containing DGCs and GG[D/E]EF-EAL hybrid proteins coding genes were classed into accessory genes, while the HD-GYP domain containing PEDs and the sensory genes which possess essential sensor and regulator domain, as the core genes, were relatively conservative for M. aeruginosa strains.
The BLAST Ring Image Generator (BRIG) alignment made it clear that most regions within the 24 M. aeruginosa genomes were conserved when compared to the reference strain NIES843 (Fig. 3). Several regions appeared to have low or even no similarity, possibly because of acquisition/deletion/rearrangement or horizontal gene transfer (HGT). The outer ring in Fig. 3 showed the distribution of genes encoding the domains involved in c-di-GMP signaling in NIES843 genome, and the specific locations were list at Additional file 1, Table S4. The corresponding sequences of other M. aeruginosa strains seemed highly conserved, and some even shared an identity up to 100% with the reference genome.
[Figure 2]
[Figure 3]
Phylogenetic analysis of M. aeruginosa strains
Comparing with the phylogenetic tree based on the 16S rRNA gene (Additional file 1, Figure S2a), the phylogenetic tree based on the conserved marker genes, previously validated as phylogenetic markers for (cyano) bacteria[45], produced higher resolution (Fig. S2b). The 31 conserved marker genes tree revealed a topology with generally well-defined nodes, with bootstrap support values greater than 90% over 1,000 replicates. Further, propensity for gene loss (PGL) analysis of the gene families revealed a group of strains have lost the REC-GGDEF domain coding gene, including strains NIES98[46], TAIHU98, NaRes975, PCC9808, PCC9432, DIANCHI905, PCC7806SL, PCC7941, NIES87, and NIES44 (Fig. 4a). As to the node, consisting of strains NIES1211, PCC9701, NIES44, NIES2549 and NIES2481, the EAL domain coding gene seems to be acquired, but PCC9701 and NIES44 have lost this gene. Gene encoding GGDEF-EAL hybrid protein was lost in strain NIES44. There is no gain or loss of genes encoding HD-GYP and sensory domain-containing protein. To further analyze the evolution of genes encoding sensory domain-containing protein, phylogenetic tree was constructed using a multilocus sequence typing approach based on these concatenated conserved sensory domain-containing proteins sequences (Fig. 4b). It showed a similar topology with the conserved marker genes tree (Figure S2b). Phylogenomic analyses based on binary gene presence/absence (1/0) pan-genome matrix generated by BPGA pipeline resulted in a tree (Figure 4c) with a topology similar to the trees obtained using conserved marker genes and sensory genes (Figures S2b, 4b). All phylogenetic trees provided more robust topologies than that based on 16S rRNA gene analysis alone.
A visual comparison of phylogenies based on 31 marker genes, sensory domains, and pan-genome presence/absence matrix were generated by tanglegram [47]. As shown in Fig. S3, only a few strains (2 of 24) occupied divergent positions on the phylogenetic trees based on 31 marker genes and sensory domains, which indicated a congruence between the two trees (Fig. S3b). Interestingly, most of that strains in which REC-GGDEF domains containing protein DGC is not detected (marked in blue), including NIES98, TAIHU98, NaRes975, PCC9808, PCC9432, DIANCHI905, PCC7806SL, and PCC7941, appeared to be phylogenetically closely related, thus were always grouped in the same clade of the different phylogenetic trees based on 31 marker genes, sensory domains, and pan-genome presence/absence matrix. Among them, DIANCHI905 and PCC7806SL[48] are representatives of toxic (microcystin-producing) bloom-forming strains; in contrast, PCC 9432 and NIES98 are non-microcystin-producing strains[49]. Specifically, pairs of strains also appeared to be phylogenetically closely related, such as NIES2549 and NIES2481, DIANCHI905 and PCC7806SL, and NaRes975 and PCC9808, although they were isolated from diverse geographic origin (Table 1). The majority of the M. aeruginosa strains were isolated in different locations, but no correlation was found between their geographic distribution or bloom-forming ability and phylogenetic relationships, consistent with the previous report by Meyer et al[50].
To the majority sequence of genes encoding for c-di-GMP metabolism and regulation in M. aeruginosa, likelihood ratio test indicated that model M2 and M8 gave a significantly better fit than model M1 and M7, respectively, which allowed individual sites to evolve under positive selection (Additional file 1, Table S5). Lots of sites with ω>1 were identified in the sequence of genes responsible for c-di-GMP metabolism and regulation, revealing that they are likely to have been subjected to positive selection (Additional file 2, Table S6).
[Figure 4]
Structural features of GGDEF domain, EAL domain, and HD-GYP domain of M. aeruginosa strains
To elucidate the structural features, structure predictive modeling of GGDEF domain, EAL domain, and HD-GYP domain was performed on the corresponding M. aeruginosa strains. The NIES843 genome is a representative genome of M. aeruginosa because of its genome has been completely sequenced and is modeled in Figure 5. Similarly, the corresponding structural models of strain CHAOHU 1326 were shown in Additional file 1, Figure S4. The Z-values of the models are shown in Table S7. All the models of the GGDEF, EAL and HD-GYP domain containing proteins were qualified with a Z-score higher than -4.0.
According to SWISS-MODEL, the crystal structure of the conserved GGDEF domain of WspR (Protein Data Bank (PDB) id: 3BRE) was selected as the template to model the structure of the GGDEF domain of the DGC [51-52]. Amino acids S173 to N329 from the GGDEF domain were used to perform structural alignments (Fig. 5a, left). The amino acid sequences of the GGDEF domains in DGC showed a similarity of 34.2–37.2% to that of 3BRE (Additional file 1, Table S8). C-di-GMP binds to the catalytic site and to a second site distal to the catalytic loop. DGC proteins possess a conserved allosteric inhibition site (I site), composed of a RXXD motif (in which X represents any amino acid) five amino acids upstream of the GGDEF active site, that is important for controlling DGC activity. When levels of c-di-GMP are high, the second messenger can bind the RXXD motif, thereby repressing the DGC activity[53]. A systematic analysis and comparison of the 14 genomes that have corresponding GGDEF domains was performed to identify the amino acid motifs or signatures involved in catalysis and allosteric inhibition. As shown in Fig. 5a (left), the WebLogo alignment revealed that the RXXD and GGEEF motifs of the GGEEF domain were highly conserved in the same amino acid residues: Arg-Gln-Val-Asp (RQVD) and Gly-Gly-Glu-Glu-Phe (GGEEF), respectively. The GG[D/E]EF domain of the putative DGCs possessed the conserved amino acid residues essential for GTP binding, indicating that the DGCs may have catalytic activity[26].
Because only three genomes had partial EAL domains, the EAL domain in hybrid proteins from the M. aeruginosa NIES843 genome were chosen as paradigms to examine the crystal structure. Based on the crystal structure of the GGDEF-EAL domain of RmcA (PDB ID: 5M3C), which has a crystallographic resolution of 2.8 Å[54], the GGDEF and EAL domains in the hybrid protein of NIES843 were modeled. Compared with 5M3C, the GGDEF-EAL domains in the hybrid proteins showed sequence conservation of 35.9–37.8% (Additional file1, Table S9). The low sequence conservation appeared to have no impact on model prediction by SWISS-MODEL. Compared with DGCs that contained only the GGDEF domain, amino acid residues of RXXD and GGDEF motifs in the GGDEF domain of the hybrid proteins were less conserved (Fig. 5a, right). The WebLogo alignment in Fig. 5b showed that amino acid residues of the EAL domain involved in the binding of c-di-GMP and catalytic activity were highly conserved in all sequences. The Glu in the EAL signature motif is an essential residue that is required to bind the c-di-GMP, whereas a change of Ala into Val (EVL) still sustains the enzymatic activity[55]. Arg in the second position downstream of the EAL signature motif was conserved in nearly all EAL domain sequences; thus, the EAL signature motif can be extended as EXLXR motif, which forms a stable platform to bind c-di-GMP[56].
Crystal structure of HD-GYP domain of M. aeruginosa NIES843 was modeled based on PA4781(PDB ID: 4R8Z) from P. aeruginosa[57] (Fig. 5c). Aligned with PA4781, the HD-GYP domain containing PDEs in M. aeruginosa showed sequence conservation of 33.5–34.3% (Additional file 1, Table S10). Generally, in HD-GYP domain, the HD residues clearly serve as metal ligands, the signature of HD can be extended as a larger motifs HDxGK; while the GYP motif may be more usefully considered as part of the HHExxDGxGYP, and the role of GYP motif may be substrate specificity determining but is not certainly clear[57-58]. WebLogo alignment revealed that the GYP motifs of GYP loop in M. aeruginosa were highly conserved in the same amino acid residues EFExxDGxGVP, whereas Val replaced Tyr compared with the GYP motif template. Moreover, the HD motif possess YR residues in M. aeruginosa strains. That is, amino acid residues were replaced by YR-GVP in HD-GYP motif in all selected M. aeruginosa strains.
[Figure 5]
Structural features of the PilZ domain of M. aeruginosa strains
All selected M. aeruginosa genomes encoded proteins that possess a PilZ domain. Twenty-one genomes encoded cellulose synthase (CelA), which contained a C-terminal PilZ domain, and the other four genomes encoded a protein that contained only a PilZ domain. The accession numbers of the corresponding proteins are shown in Additional file 1, Table S11.
To identify the structural features, structure predictive modeling of proteins with a single PilZ domain and PilZ domain-containing CelA was performed for M. aeruginosa strains. Predictive modeling was based on the crystal structure of the BcsA (PDB id: 4P02) from Rhodobacter sphaeroides, which has a crystallographic resolution of 2.65 Å, according to SWISS-MODEL results[31]. Modeling of the PilZ domain-containing protein CelA of strain CHAOHU 1326 is shown in Figure 6a. The c-di-GMP-binding PilZ domain was located in the C-terminal region of CelA and had similar structure with protein containing a single PilZ domain in Figure 6b, which were derived from the representative M. aeruginosa strain NIES843. Figure 6c shows that the PilZ domain consists of a six-stranded β-barrel and a short α-helix that follows the last strand of the β-barrel.
[Figure 6]