Comparative genomics analysis of c-di-GMP metabolism and regulation in Microcystis aeruginosa

doi:10.21203/rs.2.15778/v1

Research article

This work is licensed under a CC BY 4.0 License

Journal Publication: published 09 Mar, 2020 in BMC Genomics. This is an older preprint version.

Background: Cyanobacteria are of special concern because they proliferate in eutrophic water bodies worldwide and affect water quality. As ancient photosynthetic microorganisms, cyanobacteria can survive in ecologically diverse habitats because they respond rapidly to environmental changes through a web of complex signaling networks, including second messengers that regulate physiology and metabolism. The ubiquitous second messenger bis-(3′,5′)-cyclic-dimeric-guanosine monophosphate (c-di-GMP) has been found to regulate essential behaviors in a few cyanobacteria, but not in Microcystis, the most dominant genus in cyanobacterial blooms. In this study, comparative genomics analysis was performed to explore the genomic basis of c-di-GMP signaling in Microcystis aeruginosa.

Results: General characterization together with a pan-genome analysis showed that M. aeruginosa has a medium-sized genome (4.99 Mb on average), a conserved core genome, and an expansive pan-genome. Phylogenetic trees based on 31 highly conserved protein-coding genes and on the pan-genome matrix showed good overall congruence. Furthermore, phylogenetic analysis revealed no correlation between geographic distribution and phylogenetic relationships of M. aeruginosa strains isolated from different regions. Proteins involved in c-di-GMP metabolism and regulation, such as diguanylate cyclases, phosphodiesterases, and PilZ-containing proteins, were encoded in M. aeruginosa genomes. The numbers of genes that encode diguanylate cyclases, phosphodiesterases, and hybrid proteins with GGDEF-EAL domains in M. aeruginosa might result from environment-specific adaptation. Bioinformatic and structural analyses of the c-di-GMP signal-related GGDEF, EAL, and GGDEF-EAL domains revealed that they all possess the essential conserved amino acid residues that bind the substrate. In addition, all selected M. aeruginosa genomes encode PilZ domain-containing proteins.

Conclusions: Comparative genomics analysis of c-di-GMP metabolism and regulation in M. aeruginosa strains helped elucidate the genetic basis of c-di-GMP signaling pathways in M. aeruginosa. Knowledge of c-di-GMP metabolism and the relevant signal regulatory processes in cyanobacteria can enhance our understanding of their adaptability to various environments and of bloom-forming mechanisms.

Keywords: Microcystis aeruginosa, Comparative genomics, c-di-GMP, Phylogenetic analysis, GGDEF, EAL, PilZ

Epigenetics & Genomics

Cyanobacteria, phototrophic bacteria that survive in ecologically diverse habitats, have received growing attention because they have formed toxic blooms in eutrophic water bodies worldwide for decades[1-2]. Dense blooms are considered seriously harmful to aquatic ecosystems because of their deleterious effects on water quality, such as increasing turbidity, smothering submerged aquatic vegetation, and producing taste and odor compounds[3-4]. Moreover, some cyanobacterial species synthesize toxic secondary metabolites, such as the hepatotoxic microcystins, which inhibit eukaryotic protein phosphatases; these toxins threaten the use of water bodies for drinking, bathing, and fishing, and ultimately pose potential risks to animal and human health[5-7]. Cyanobacteria are able to inhabit most of Earth’s environments because they have evolved mechanisms to monitor and rapidly adapt to environmental changes through a web of complex signaling networks, such as second messengers that regulate physiology and metabolism[8].

Cyanobacteria must cope with variations in the external environment and rely on signaling molecules to translate these changes into intracellular responses that mediate adaptation to ambient conditions. Once a bacterial cell senses an external stimulus, such as light or temperature, the intracellular level of a second messenger changes rapidly, amplifying the input signal to downstream output effectors and initiating physiological changes in processes such as sugar metabolism, motility, and biofilm production[8-9]. The ubiquitous second messenger bis-(3′,5′)-cyclic-dimeric-guanosine monophosphate (c-di-GMP), first identified in 1987 as an allosteric activator of cellulose synthase in Gluconacetobacter xylinus, plays an important role in regulating biofilm formation and dispersal in response to various environmental cues and cell–cell signals[10-14]. Studies have summarized that c-di-GMP regulates an astounding array of important processes in bacteria, including transcription, RNA turnover, protein synthesis, motility, and virulence, as well as the activities of proteins and protein complexes[15-17]. The intracellular level of c-di-GMP is set by the balance between its synthesis and degradation, which respond to a variety of environmental stimuli and are catalyzed by the opposing activities of diguanylate cyclases (DGCs) and c-di-GMP-specific phosphodiesterases (PDEs), respectively[12, 18]. DGCs contain a GGDEF domain that synthesizes one c-di-GMP molecule from two GTP molecules[19-20]. PDEs contain either an EAL domain, which hydrolyzes c-di-GMP into the linear molecule 5′-phosphoguanylyl-(3′-5′)-guanosine (pGpG), or, less frequently, an HD-GYP domain, which degrades it into two GMP molecules[21-22]. Moreover, GGDEF and EAL domains can both be present in the same protein, forming “hybrid” proteins, even though they have opposing activities[23-24]. In such proteins, either only one of the two domains is catalytically active and the other performs a regulatory function, or a third, regulatory domain may separate the activities of the GGDEF and EAL domains[23, 25]. Römling et al. compiled a census of GGDEF, EAL, and HD-GYP domains in bacterial genomes[12, 26]. Diverse sensor domains can modulate these enzymatic activities in response to external stimuli, including N-terminal response regulator receiver (REC), Per/Arnt/Sim (PAS), histidine kinases/adenylate cyclases/methyl-accepting proteins and phosphatases (HAMP), and cGMP phosphodiesterase/adenylyl cyclase/FhlA (GAF) domains[25, 27-28]. C-di-GMP is recognized by downstream receptors that have been linked to specific physiological processes, ranging from polysaccharide biosynthesis to direct regulation of gene expression and motility. Among these effectors, the PilZ domain is ubiquitous in bacteria and binds c-di-GMP to regulate the biosynthesis of biofilm matrix polysaccharides such as cellulose and alginate[29-31]. The PilZ domain can occur as a stand-alone protein, be fused with other functional proteins such as cellulose synthases and alginate biosynthesis proteins, or be attached to signaling domains such as the GGDEF, EAL, and HD-GYP domains[30, 32]. The molecular mechanisms of c-di-GMP signaling have been examined in depth in a few cyanobacteria, indispensable photosynthetic microorganisms in the environment, such as Thermosynechococcus and Synechocystis[33-36]. However, none of those studies addressed Microcystis, one of the most ubiquitous freshwater cyanobacterial genera, which limits a comprehensive understanding of c-di-GMP signaling in cyanobacteria.
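
The DGC and PDE reactions described in this section can be written out as the following scheme. It uses the standard stoichiometry and is added only for orientation; pGpG denotes 5′-phosphoguanylyl-(3′-5′)-guanosine.

```latex
\begin{aligned}
\text{DGC (GGDEF):}\quad & 2\,\mathrm{GTP} \longrightarrow \text{c-di-GMP} + 2\,\mathrm{PP_i} \\
\text{PDE (EAL):}\quad & \text{c-di-GMP} + \mathrm{H_2O} \longrightarrow \mathrm{pGpG} \\
\text{PDE (HD-GYP):}\quad & \text{c-di-GMP} + 2\,\mathrm{H_2O} \longrightarrow 2\,\mathrm{GMP}
\end{aligned}
```

The intracellular c-di-GMP pool therefore reflects the net balance of these opposing reactions rather than the activity of either enzyme class alone.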

Genomes of numerous Microcystis strains have been sequenced, which makes it possible to improve our knowledge of c-di-GMP function in this genus. The purpose of this study was to explore the genomic basis of c-di-GMP signaling in M. aeruginosa through in silico comparative analyses. The comparative genomic analyses were first based on phylogenetic, phylogenomic, and pan-genome analyses of the complete or draft genome sequences of 25 M. aeruginosa strains available in GenBank. We then identified genes that encode proteins containing the GG[D/E]EF, EAL, GG[D/E]EF-EAL, and PilZ domains in these strains and characterized the structural features of these domains and of other associated sensing and signaling domains. This comparative genomic analysis should help elucidate c-di-GMP metabolism and the relevant signal regulation processes in cyanobacteria.

General genome features of M. aeruginosa strains

Genomes of 25 M. aeruginosa strains were retrieved from the National Center for Biotechnology Information (NCBI) database for analysis. The general features of the genomes are presented in Table 1. Plasmid sequences were found only in strains NIES 2481[37] and NIES 2549[38]. The average genome size was 4.99 Mb, and the average G+C content was 42.70%. Strain KW had the largest genome (5.89 Mb), whereas strain PCC9806 had the smallest (4.26 Mb; Table 1).
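
The genome-level summary statistics above (total size and G+C content, averaged across strains) can be sketched as follows. This is a minimal illustration, assuming each genome is available as a FASTA string; the helper names `parse_fasta`, `genome_stats`, and `mean` are hypothetical, and excluding ambiguous bases such as N from the G+C denominator is an assumption that may differ from how the values in Table 1 were computed.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return {record_id: sequence} for a multi-record FASTA string."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # record id = first word of header
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def genome_stats(fasta_text: str) -> tuple[int, float]:
    """Return (total length in bp, G+C percent over unambiguous A/C/G/T bases)."""
    seq = "".join(parse_fasta(fasta_text).values())  # concatenate all contigs
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return len(seq), gc_percent


def mean(values):
    """Arithmetic mean of per-genome values (e.g. sizes in Mb)."""
    values = list(values)
    return sum(values) / len(values)
```

For example, `genome_stats(">c1\nATGCGC\n>c2\nATNN\n")` reports 10 bp and 50.0% G+C, because only the eight unambiguous bases enter the G+C calculation.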

Genome sequences of strains CHAOHU 1326 and NaRes975 were recently released by our laboratory. The draft genome of strain CHAOHU 1326 comprised 617 contigs with an N₅₀ of 19,902 bp, and the largest contig was 84,471 bp. The M. aeruginosa CHAOHU 1326 genome had a total size of 5,271,583 bp with a G+C content of 42.50%. In total, 5,517 genes were identified in CHAOHU 1326, including 4,590 protein-coding sequences (CDSs), 59 RNA-coding genes, and 868 pseudogenes. The RNA-coding genes consisted of 46 tRNA genes, four noncoding RNAs (ncRNAs), and three sets of rRNA genes. Of the 4,590 CDSs, 3,434 could be assigned putative functions, whereas 1,156 were predicted to encode hypothetical proteins. For strain NaRes975, the final assembly consisted of 413 contigs with an N₅₀ contig length of 29,122 bp and a largest contig length of 97,956 bp. The assembled genome was 5.1 Mb in length with a G+C content of 42.40%. In total, 5,388 genes were identified in the NaRes975 genome, including 4,617 CDSs, 47 RNA-coding genes, and 724 pseudogenes. The RNA genes consisted of 40 tRNAs, four ncRNAs, and one rRNA cluster. In addition, four strains with complete genome sequences (NIES843, NIES2481, NIES2549, and PCC7806SL) each contained two rRNA clusters (5S, 16S, 23S: 2, 2, 2), as did six other strains with draft genome sequences (Additional file 2, Table S2), whereas 14 strains contained one rRNA cluster (5S, 16S, 23S: 1, 1, 1).
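
The N₅₀ values quoted above follow the usual definition: the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. The sketch below computes it from a list of contig lengths; the function name `n50` is hypothetical, and the study's contig lengths are not reproduced here. The same block checks, using only figures quoted in this paragraph, that the reported gene categories add up, assuming each rRNA set or cluster contributes one 5S, one 16S, and one 23S gene.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Bookkeeping check on the gene counts reported in the text:
# total genes = CDSs + RNA-coding genes + pseudogenes.
REPORTED_GENE_COUNTS = {
    "CHAOHU 1326": {"total": 5517, "cds": 4590, "rna": 59, "pseudo": 868},
    "NaRes975": {"total": 5388, "cds": 4617, "rna": 47, "pseudo": 724},
}

for strain, counts in REPORTED_GENE_COUNTS.items():
    assert counts["cds"] + counts["rna"] + counts["pseudo"] == counts["total"], strain

# RNA genes = tRNAs + ncRNAs + 3 rRNA genes (5S, 16S, 23S) per set/cluster.
assert 46 + 4 + 3 * 3 == 59  # CHAOHU 1326: three rRNA sets
assert 40 + 4 + 3 * 1 == 47  # NaRes975: one rRNA cluster
```

Because N₅₀ depends on how the assembly is fragmented, the higher N₅₀ of NaRes975 (29,122 bp versus 19,902 bp) is consistent with its smaller number of contigs (413 versus 617).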

Pan-genome of M. aeruginosa

To assess genome similarity among M. aeruginosa strains, a core–pan-genome analysis was performed using all 25 M. aeruginosa genome sequences as input to the Bacterial Pan Genome Analysis (BPGA) tool[39]. The analysis revealed a core genome of 1,993 genes, an accessory genome of 36,030 genes, and 2,265 unique genes (Fig. 1a). Accessory genes are those whose orthologs are present in two or more genomes, but not in all of them, whereas unique genes are present in only one of the compared genomes. The core–pan plot (Fig. 1b) showed that the pan-genome curve did not reach a plateau and continued to rise as more genomes were added; the pan-genome was therefore considered “open.” In contrast, the core genome curve leveled off (Fig. 1b), indicating that the core genome size did not change significantly with the addition of new genomes; the core genome was therefore considered conserved.
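
The gene categories and the core–pan curves described above can be illustrated with a small sketch. It assumes that orthologous gene families have already been identified (BPGA performs that clustering itself, and its internals are not reproduced here) and that each family is given as the set of genomes carrying it. Averaging over random genome orderings is a common way to draw curves like those in Fig. 1b, used here as an assumption; the function names are hypothetical.

```python
import random


def classify_families(presence, genomes):
    """Split gene families into core / accessory / unique, following the
    definitions in the text. `presence` maps family id -> set of genome ids."""
    genome_set = set(genomes)
    n = len(genome_set)
    classes = {"core": set(), "accessory": set(), "unique": set()}
    for family, carriers in presence.items():
        k = len(carriers & genome_set)
        if k == n:
            classes["core"].add(family)       # present in every genome
        elif k >= 2:
            classes["accessory"].add(family)  # two or more, but not all
        elif k == 1:
            classes["unique"].add(family)     # exactly one genome
    return classes


def core_pan_curves(presence, genomes, permutations=20, seed=0):
    """Average pan- and core-genome sizes after adding 1..n genomes,
    averaged over random genome orderings."""
    rng = random.Random(seed)
    n = len(genomes)
    pan_totals = [0] * n
    core_totals = [0] * n
    for _ in range(permutations):
        order = list(genomes)
        rng.shuffle(order)
        for k in range(1, n + 1):
            subset = set(order[:k])
            # pan: families seen in at least one of the first k genomes
            pan_totals[k - 1] += sum(1 for c in presence.values() if c & subset)
            # core: families shared by all of the first k genomes
            core_totals[k - 1] += sum(1 for c in presence.values() if subset <= c)
    pan = [t / permutations for t in pan_totals]
    core = [t / permutations for t in core_totals]
    return pan, core
```

An "open" pan-genome corresponds to a `pan` curve that is still rising at the last genome, whereas a conserved core genome corresponds to a `core` curve that has flattened.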

To determine the distribution of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Clusters of Orthologous Groups (COG) categories in M. aeruginosa, the pan-genome of the 25 genomes was functionally annotated (Fig. 1c and Additional file 2, Fig. S1). KEGG pathway analysis revealed that, in all strains, high proportions of core, accessory, and unique genes were associated with carbohydrate metabolism. Core, accessory, and unique genes likewise all encoded proteins involved in amino acid and energy metabolism. The distribution of genes among COG categories showed that many unique genes were assigned to cell wall/membrane/envelope biogenesis and to secondary metabolite biosynthesis, transport, and catabolism, which suggests diverse strategies of environmental adaptation through secondary metabolite synthesis.
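
Comparing functional profiles of the core, accessory, and unique gene sets amounts to tallying annotation categories within each set. The sketch below assumes each gene family carries a single COG category letter (for example M for cell wall/membrane/envelope biogenesis, Q for secondary metabolite biosynthesis, transport and catabolism, G for carbohydrate transport and metabolism); real annotations can assign several letters or none, and the function name is hypothetical.

```python
from collections import Counter


def category_proportions(cog_of, families):
    """Fraction of annotated families in one gene class (core, accessory, or
    unique) that fall into each COG category letter.

    cog_of:   dict mapping family id -> single COG category letter
    families: iterable of family ids belonging to the gene class
    """
    # Unannotated families are skipped rather than counted as a category.
    counts = Counter(cog_of[f] for f in families if f in cog_of)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Running this separately on the core, accessory, and unique sets gives directly comparable proportions, which is the kind of comparison summarized in Fig. 1c and Fig. S1.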

The BLAST Ring Image Generator (BRIG) alignment showed that most regions of the 25 M. aeruginosa genomes were conserved relative to the reference strain NIES843 (Fig. 2). Several regions showed low or no similarity, possibly because of gene acquisition, deletion, or rearrangement, or horizontal gene transfer (HGT). These regions contained a variety of genes (listed with annotations in Additional file 1, Table S1) that could be identified by the conserved sequences flanking the variable regions. Approximately 44.6% of the genes located in these regions were annotated as hypothetical proteins.
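
BRIG draws its rings from BLAST hits against the reference. A minimal sketch of how contiguous low- or no-similarity stretches could be flagged is shown below, assuming per-window percent identity values against NIES843 have already been computed (None marks a window with no hit). The 50% threshold and minimum run length are hypothetical parameters, not values used in the study.

```python
def low_identity_regions(identity, threshold=50.0, min_windows=2):
    """Find runs of consecutive windows whose identity is below `threshold`
    (or missing) along the reference genome.

    identity: per-window percent identity (None = no BLAST hit)
    Returns a list of (start_window, end_window_exclusive) tuples.
    """
    regions = []
    start = None
    # A sentinel value equal to the threshold counts as "not low",
    # so it closes any run that extends to the end of the genome.
    for i, value in enumerate(list(identity) + [threshold]):
        low = value is None or value < threshold
        if low and start is None:
            start = i
        elif not low and start is not None:
            if i - start >= min_windows:
                regions.append((start, i))
            start = None
    return regions
```

Genes falling inside the returned windows would then be the candidates for the kind of variable-region listing given in Table S1.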

Phylogenetic analysis of M. aeruginosa strains

Because the 16S rRNA gene alone does not provide sufficient phylogenetic resolution[40-42], especially for closely related Microcystis species, phylogenetic relationships were further examined using a set of conserved marker genes. To assess relationships among the M. aeruginosa strains, a phylogenetic tree was constructed from the amino acid sequences of 31 highly conserved proteins encoded by single-copy genes in 24 M. aeruginosa genomes (strain SPC777 was excluded because it lacked a conserved marker gene). The same proteins from Synechocystis sp. PCC6803 were used as an outgroup. The resulting tree (Figure 3a) had a topology with generally well-defined nodes, with bootstrap support values greater than 50% over 1,000 replicates. Compared with the tree based on the 16S rRNA gene (Additional file 2, Figure S2), the tree based on the conserved marker genes (Figure 3a) had higher resolution and separated the strains into well-resolved clusters. A phylogenomic analysis based on the binary gene presence/absence (1/0) pan-genome matrix generated by the BPGA pipeline produced a tree (Figure 3b) with a topology similar to that of the marker-gene tree (Figure 3a). Both trees provided more robust topologies than the 16S rRNA gene tree. Specifically, strains NIES843, PCC9809, KW, and NIES88 fell in the same clade, and strain CHAOHU 1326 was closely related to strains PCC9807, PCC9443, and PCC9717. Several pairs of strains also appeared to be closely related, such as NIES2549[38] and NIES2481[37], DIANCHI905 and PCC7806SL[43], and NaRes975 and PCC9808. M. aeruginosa NIES1211 appeared to be the most divergent strain.
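
The text does not state how BPGA turns the binary presence/absence (1/0) matrix into a tree. As an illustration only, the sketch below uses a swapped-in, commonly used approach: Jaccard distances between the gene-family sets of each genome, clustered with UPGMA (size-weighted average linkage). The function names are hypothetical, and branch lengths and bootstrap support are omitted, so this is not a substitute for the study's pipeline.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of gene-family ids (0 if both empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma_newick(profiles):
    """Cluster genomes by UPGMA on Jaccard distances of their gene-family sets.

    profiles: dict mapping genome name -> set of gene families present (the 1s
    in a presence/absence row). Returns a Newick topology without branch lengths.
    """
    clusters = {name: (name, 1) for name in profiles}  # id -> (newick, size)
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = jaccard_distance(profiles[a], profiles[b])

    next_id = 0
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)       # closest pair of clusters
        a, b = sorted(pair)                  # deterministic child order
        (newick_a, size_a), (newick_b, size_b) = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        merged = f"#{next_id}"               # internal id; assumes no genome uses it
        next_id += 1
        for other in clusters:               # size-weighted average linkage
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (d_a * size_a + d_b * size_b) / (size_a + size_b)
        clusters[merged] = (f"({newick_a},{newick_b})", size_a + size_b)

    (newick, _), = clusters.values()
    return newick + ";"
```

Comparing such a gene-content topology with a marker-gene tree, clade by clade, is the kind of congruence check reported between Figures 3a and 3b.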

Most of the M. aeruginosa strains were isolated from different locations, but no correlation was found between their phylogenetic relationships and either their geographic origin or their bloom-forming ability (Table 1). Among them, strains CHAOHU 1326[44], DIANCHI905, NaRes975, and TAIHU98[45] are bloom-forming strains isolated in China. Strains NIES843[46] and PCC7806SL are representative toxic (microcystin-producing) bloom-forming strains; in contrast, NIES98[47], NIES44[48], NIES2549[38], and NIES87[49] are non-microcystin-producing strains.

Modular signaling proteins involved in c-di-GMP metabolism and regulation in M. aeruginosa

A genome search for genes encoding enzymes involved in c-di-GMP metabolism was performed to identify putative proteins with DGC and PDE activities in the 25 selected M. aeruginosa genomes. The predicted proteins are listed in Table 2, and the accession numbers of the corresponding proteins are given in Additional file 2, Table S3. This survey identified three classes of predicted enzymes: DGCs, PDEs, and hybrid DGC–PDE proteins, the last of which contain both GGDEF and EAL domains despite their opposing activities. As listed in Table 2, 14 of the 25 M. aeruginosa genomes contained genes encoding DGCs that consist of an N-terminal REC domain fused in tandem with a GGDEF domain. The REC domain, a signal receiver domain found in association with c-di-GMP metabolism domains, is thought to modulate enzymatic activity in response to internal or external stresses. Interestingly, whereas HD-GYP domain-containing PDEs were identified in all selected M. aeruginosa genomes and appeared to be highly conserved, proteins with partial EAL domains were found less frequently (in only three genomes). Every genome except NIES44 encoded a GG[D/E]EF-EAL hybrid protein. According to the core–pan-genome analysis, the genes encoding GG[D/E]EF domain-containing DGCs and GG[D/E]EF-EAL hybrid proteins belonged to the accessory genome.
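
The three enzyme classes above are defined by which catalytic domains a protein carries, so the assignment step can be sketched as a simple rule on a protein's predicted domain list. The domain name strings, the function names, and the choice to also label a GGDEF plus HD-GYP combination as "hybrid" are assumptions for illustration; the study's domain annotations came from its own search.

```python
from collections import Counter


def classify_c_di_gmp_enzyme(domains):
    """Assign a predicted protein to one of the enzyme classes used in the text.

    domains: domain names reported for the protein by a domain-annotation
    tool, e.g. ["REC", "GGDEF"]. Sensor domains (REC, PAS, GAF, HAMP) do not
    change the class; they are taken to modulate activity only.
    """
    names = set(domains)
    has_dgc = "GGDEF" in names
    has_pde = bool(names & {"EAL", "HD-GYP"})
    if has_dgc and has_pde:
        return "hybrid DGC-PDE"
    if has_dgc:
        return "DGC"
    if has_pde:
        return "PDE"
    return None  # no c-di-GMP turnover domain


def tally_classes(proteins):
    """Count enzyme classes in one genome; proteins: {protein id: domain list}."""
    return Counter(c for c in map(classify_c_di_gmp_enzyme, proteins.values()) if c)
```

Running `tally_classes` on each genome would yield per-strain counts of the kind summarized in Table 2.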

Bacteria express a variety of sensory and signal transduction proteins to sense and adapt to changes in the physicochemical makeup of their environment. Sensory and signal transduction proteins encoded in the 25 selected M. aeruginosa genomes were predicted, and 12 sensory domain-containing proteins were found. The accession numbers and domain architectures of these highly conserved GAF, PAS, and REC domain-containing proteins are listed in Table 3. The GAF domain, an important sensor for photosensory behavior, is commonly associated with c-di-GMP-related domains in cyanobacteria[50]. As many as 11 of the 12 proteins had a GAF domain, and some contained two. In the PAS-containing proteins, the PAS domain was associated with domains for sensory input (GAF), signal transduction (HAMP), or output (histidine kinase). Two of the four predicted PAS-containing proteins contained a PAC motif, a conserved region of 40–45 amino acids located on the carboxy-terminal side of the PAS domain that contributes to the PAS structure[27]. Interestingly, some sensory domain-containing proteins from different genomes were identical and were therefore assigned the same accession number, as for NIES2549 and NIES2481, DIANCHI905 and PCC7806SL, and NaRes975 and PCC9808.

Structural features of GGDEF and EAL domains of DGCs and hybrid proteins of M. aeruginosa strains

To elucidate their structural features, the GGDEF and EAL domains of the DGCs and hybrid proteins of the corresponding M. aeruginosa strains were modeled by structure prediction. Because its genome has been completely sequenced, NIES843 was used as the representative M. aeruginosa genome, and its models are shown in Figure 4. The corresponding structural models for strain CHAOHU 1326 are shown in Additional file 2, Figure S3.

Using SWISS-MODEL, the structure of the GGDEF domain of the DGC protein was modeled on the crystal structure of the conserved GGDEF domain of WspR (Protein Data Bank (PDB) ID: 3BRE), which has a crystallographic resolution of 2.4 Å[51-52]. Structural alignments used the GGDEF domain of 3BRE from residues S173 to N329 (Fig. 4a, left). C-di-GMP binds to the catalytic site and to a second site distal to the catalytic loop. DGCs possess a conserved allosteric inhibition site (I site), an RXXD motif (where X is any amino acid) located five amino acids upstream of the GGDEF active site, which is important for controlling DGC activity. When c-di-GMP levels are high, the second messenger can bind the RXXD motif and thereby repress DGC activity[53]. The GGDEF domains from the 14 genomes that encode them were systematically compared to identify the amino acid motifs or signatures involved in catalysis and allosteric inhibition. The WebLogo alignment (Fig. 4a, left) revealed that the RXXD and GG[D/E]EF motifs of these domains are highly conserved, as Arg-Gln-Val-Asp (RQVD) and Gly-Gly-Glu-Glu-Phe (GGEEF), respectively. Although the putative DGC proteins containing only the GG[D/E]EF domain showed relatively low sequence identity (34.2–37.2%, Additional file 2, Table S4), they contained all of the essential conserved residues that bind the substrate GTP, which indicates that they may be catalytically active.
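
The motif arrangement described above (an RXXD I site, five residues, then the GG[D/E]EF active site) can be expressed as a regular expression. This is only an illustrative scan, not the method of the study, which relied on alignments, WebLogo, and structural models; the function name is hypothetical, the fixed five-residue spacing follows the text, and the toy sequence in the test is invented rather than a real M. aeruginosa protein.

```python
import re

# I-site RxxD, exactly five residues, then the GG[D/E]EF active-site motif,
# following the spacing described in the text. Real domains may deviate.
ISITE_THEN_ACTIVE = re.compile(r"(R..D).{5}(GG[DE]EF)")
ACTIVE_ONLY = re.compile(r"GG[DE]EF")


def scan_ggdef(seq):
    """Return (active-site motifs, I-site/active-site pairs) found in a
    protein sequence; all positions are 0-based."""
    seq = seq.upper()
    active = [(m.group(), m.start()) for m in ACTIVE_ONLY.finditer(seq)]
    paired = [
        (m.group(1), m.start(1), m.group(2), m.start(2))
        for m in ISITE_THEN_ACTIVE.finditer(seq)
    ]
    return active, paired
```

A scan of this kind only checks motif presence; it says nothing about fold or catalytic competence, which is why structural modeling is needed to support the prediction of activity.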

Because only three genomes had partial EAL domains, the EAL domains of the hybrid proteins from the M. aeruginosa NIES843 genome were chosen as paradigms for structural modeling. The GGDEF and EAL domains of the NIES843 hybrid protein were modeled on the crystal structure of the GGDEF-EAL domains of RmcA (PDB ID: 5M3C), which has a crystallographic resolution of 2.8 Å[54]. The GGDEF-EAL domains of the hybrid proteins shared 35.9–37.8% sequence identity with 5M3C (Additional file 2, Table S5); this low sequence identity did not appear to prevent model building by SWISS-MODEL. Compared with DGCs that contained only the GGDEF domain, the residues of the RXXD and GGDEF motifs in the GGDEF domains of the hybrid proteins were less conserved (Fig. 4a, right). The WebLogo alignment in Fig. 4b showed that the EAL domain residues involved in c-di-GMP binding and catalysis were highly conserved in all sequences. The Glu of the EAL signature motif is an essential residue required for c-di-GMP binding, whereas substitution of Ala by Val (EVL) still preserves enzymatic activity[55]. An Arg two positions downstream of the EAL signature motif was conserved in nearly all EAL domain sequences; thus, the signature can be extended to an EXLXR motif, which forms a stable platform for binding c-di-GMP[56].
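
Conservation statements based on WebLogo plots rest on per-column information content, log2(20) minus the Shannon entropy of the residues in each alignment column. The sketch below computes that quantity and encodes the extended EXLXR signature as a pattern. It is a simplified assumption: sequence-logo software typically also applies a small-sample correction, which is omitted here, the three aligned windows in the test are hypothetical, and the names are illustrative.

```python
import math
import re
from collections import Counter

# EXLXR: Glu, any residue (Ala in EAL, Val in EVL), Leu, any residue, Arg.
EXLXR = re.compile(r"E.L.R")


def column_information(aligned, alphabet_size=20):
    """Per-column information content in bits, log2(alphabet_size) - H,
    for equal-length aligned sequences. Gap characters ('-') are ignored and
    no small-sample correction is applied."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    max_bits = math.log2(alphabet_size)
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned if s[i] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # all-gap column carries no information
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info
```

A fully conserved column, such as the Glu or Arg of EXLXR, reaches the maximum of about 4.32 bits, whereas a column mixing Ala and Val scores lower.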

Structural features of the PilZ domain of M. aeruginosa strains

All selected M. aeruginosa genomes encoded PilZ domain-containing proteins, with one PilZ domain per genome except in M. aeruginosa SPC777, which had two. Twenty-one genomes encoded a cellulose synthase (CelA) with a C-terminal PilZ domain, and the other four genomes encoded a protein consisting of only a PilZ domain. The accession numbers of the corresponding proteins are given in Additional file 2, Table S6.

To identify their structural features, the proteins with a single PilZ domain and the PilZ domain-containing CelA proteins of the M. aeruginosa strains were modeled by structure prediction. Modeling used as template the crystal structure of BcsA from Rhodobacter sphaeroides (PDB ID: 4P02; crystallographic resolution 2.65 Å), as identified by SWISS-MODEL[29]. The model of the PilZ domain-containing CelA of strain CHAOHU 1326 is shown in Figure 5a. The c-di-GMP-binding PilZ domain was located in the C-terminal region of CelA and had a structure similar to that of the single-PilZ-domain protein from the representative strain NIES843 (Figure 5b). Figure 5c shows that the PilZ domain consists of a six-stranded β-barrel and a short α-helix that follows the last strand of the barrel.

The occurrence of cyanobacterial blooms appears to be increasing because of environmental factors including continued eutrophication, rising atmospheric CO₂ concentrations, and global warming[57-59]. Cyanobacteria can survive in ecologically diverse habitats largely because intracellular second messengers act in pathways that mediate cellular responses to oxidative stress, nutrient imbalances, and temperature variations[8]. C-di-GMP, a universal bacterial second messenger, has been shown to regulate biofilm formation and aggregation, which favor cyanobacterial colony formation and thus promote bloom formation[33, 60]. With recent advances in genome sequencing and bioinformatics, it is possible to identify groups of strains with high genotypic similarity from variation in protein-coding genes across genomes and, together with bioinformatic predictions, to gain genetic insight into c-di-GMP signaling regulation in M. aeruginosa. Because one or two genomes cannot adequately represent this species, 25 M. aeruginosa genomes available in NCBI’s GenBank were selected to comprehensively clarify the genetic similarities and differences among M. aeruginosa strains in the present study.

The M. aeruginosa strains selected in this study have diverged to some extent at the genomic level and were isolated from aquatic ecosystems around the world. An in-depth comparative genomics analysis, comprising genome feature analysis, core–pan-genome analysis, and phylogenetic analysis, was used to distinguish differences and similarities among the 25 selected M. aeruginosa genomes. The average genome size was 4.99 Mb, and the average G+C content was 42.67%. Genome sizes ranged from 4.26 Mb (M. aeruginosa PCC9806) to 5.89 Mb (M. aeruginosa KW). Core–pan-genome analysis revealed that the selected M. aeruginosa strains shared a core genome of 1,993 genes, together with an accessory genome of 36,030 genes and 2,265 unique genes. These strains thus maintain a conserved core genome and an expansive, open pan-genome that allows them to acquire new genes.

Microcystis aeruginosa genome sizes result from a mix of gene gains and losses during natural selection, as the species was subjected to changing environments and competitive forces during its evolution. As a freshwater species, M. aeruginosa has a medium-sized genome compared with other cyanobacteria, especially marine genera such as Synechococcus and Prochlorococcus, which mostly occur in nutrient-poor, stable open-ocean waters and whose genomes are almost half the size of those of M. aeruginosa[61]. Some reports indicate that genome size is positively correlated with the number of duplicated genes, which can originate within the genome itself or be introduced by HGT[62]. Gene duplication and high genetic redundancy in M. aeruginosa genomes are considered an evolutionary strategy that might give this species an extensive adaptive capacity, allowing it to inhabit a wide range of habitats worldwide and to proliferate and dominate phytoplankton communities in eutrophic freshwater ecosystems[63].

To gain a comprehensive understanding of the phylogenetic relationships among M. aeruginosa strains, a multilocus sequence typing approach based on 31 conserved genes previously validated as phylogenetic markers for (cyano)bacteria was used instead of the traditional 16S rRNA gene alone, which does not sufficiently discriminate between strains[39, 64]. To strengthen the analyses, a phylogenomic tree was also generated from the binary gene presence/absence (1/0) pan-genome matrix by the BPGA pipeline. This tree, based on whole-genome information, appeared more reliable than the tree based only on the 16S rRNA gene. Furthermore, the trees showed good overall congruence, as each clade was identical or nearly identical.

The relatedness of the closely related strains studied did not appear to fully reflect their physiological characteristics or geographical origins. The 25 strains were isolated from water bodies in several countries, and their substantial similarity suggests that geographic distribution may not be a determinant of intraspecies divergence. Nor did the phylogenetic analysis separate the bloom-forming M. aeruginosa strains from the others. Previous studies have shown that Microcystis “species” distinctions are problematic and doubtful[65-66]. Taxonomic studies of Microcystis using 16S rRNA analysis found that, because the sequences were highly similar, the resulting phylogenetic trees did not clearly delineate Microcystis species[67-68]. In this study, phylogenetic analysis based on 31 protein-encoding marker genes was combined with phylogenomic analysis based on a binary gene presence/absence (1/0) pan-genome matrix; further tests are needed to determine whether this alternative approach can better classify Microcystis species.

In this study, bioinformatics tools furthered our understanding of c-di-GMP signaling in M. aeruginosa by identifying the domain architectures and predicting the three-dimensional structures of the DGCs, PDEs, and DGC–PDEs encoded in the genomes. Genes encoding such proteins are widespread in other cyanobacterial species; for example, Synechocystis sp. PCC6803 and Thermosynechococcus elongatus BP-1 reportedly encode a considerable number of proteins predicted to be involved in c-di-GMP metabolism[34, 50]. In general, the number of domains involved in c-di-GMP signaling in cyanobacteria may be mainly determined by genome size[69]. However, at most three c-di-GMP signal-related domains were identified in M. aeruginosa genomes, even though the mean genome size of this species is larger than those of Synechocystis sp. PCC6803 and T. elongatus BP-1 (nearly two-fold in the case of T. elongatus BP-1). One possible explanation is that, in cyanobacteria, the number of c-di-GMP signal-related domains is not simply correlated with genome size but may also be shaped by bacterial adaptation. Among the species in the CyanoBase database, those found to lack c-di-GMP signaling systems were Prochlorococcus and some Synechococcus strains. Synechococcus strains that contain c-di-GMP-modulating domains have been reported to inhabit both marine and freshwater habitats, including nutrient-rich (eutrophic) waters, whereas Synechococcus strains lacking c-di-GMP-regulatory domains inhabit low-nutrient (oligotrophic) marine habitats[34]. Species adapted to stable habitats may have lost genes that encode c-di-GMP-modulating proteins. Likewise, primitive M. aeruginosa strains that inhabit low-nutrient lakes may have a small number of c-di-GMP domains even though they have relatively large genomes[70].

The GG[D/E]EF and EAL domain-containing proteins analyzed in this study contained all the essential conserved amino acid residues that bind the corresponding substrates, suggesting that they are enzymatically active. Structural analysis provides important information for predicting the functions of these GGDEF, EAL, and hybrid domain-containing proteins and offers a paradigm for future studies of the evolution of enzymes involved in c-di-GMP metabolism. The domain architectures deduced from the M. aeruginosa genomes also revealed diverse sensor domains, such as REC, PAS/PAC, GAF, and HAMP, which are involved in regulating activity by driving protein dimerization and play important roles in c-di-GMP-controlled rapid responses to changing environmental conditions. Some sensory domain-containing proteins from different genomes have identical amino acid sequences, as in NIES2549 and NIES2481, DIANCHI905 and PCC7806SL, and NaRes975 and PCC9808. Notably, each of these pairs of strains is closely related according to the phylogenetic analysis. All selected M. aeruginosa genomes were also found to encode a PilZ domain, either within CelA or in a stand-alone protein, through which c-di-GMP could stimulate the biosynthesis of extracellular polysaccharides that are important for biofilm formation.

The M. aeruginosa genome analysis suggests that some M. aeruginosa strains might possess c-di-GMP signaling pathways similar to those of other bacteria. For example, environmental stimuli and cell–cell signals may be sensed by diverse sensor domains, such as PAS and GAF domains, which then stimulate DGCs or PDEs to synthesize or hydrolyze c-di-GMP and thereby control its intracellular level. At the same time, c-di-GMP is recognized by downstream receptors linked to specific physiological processes, ranging from polysaccharide biosynthesis to direct regulation of gene expression. Cells can thereby respond promptly and adapt to new conditions. In addition, the presence of c-di-GMP domains has been correlated with specific habitats in other organisms[34]. In some M. aeruginosa strains, the absence of genes encoding DGCs or PDEs suggests that c-di-GMP signaling might not be the only regulatory pathway in this ancient photosynthetic microorganism, especially in strains adapted to stable habitats. Moreover, M. aeruginosa strains might use other signaling molecules, such as NO, to regulate diverse biochemical and physiological processes[8].
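
The sensor, enzyme, and c-di-GMP level chain described above can be made concrete with a deliberately simplified toy model, which is an assumption and not a model from this study: a sensor switches DGC synthesis between a low and a high constant rate, and PDE activity is treated as first-order degradation, so that dc/dt = k_syn - k_deg · c with steady state c* = k_syn / k_deg. All parameter values are hypothetical and in arbitrary units, because kinetic parameters for M. aeruginosa enzymes are not reported here.

```python
def simulate_c_di_gmp(stimulus, k_syn_on=2.0, k_syn_off=0.1, k_deg=1.0, dt=0.01, c0=0.0):
    """Forward-Euler integration of dc/dt = k_syn(t) - k_deg * c.

    stimulus: sequence of booleans, one per time step; True means the sensor
    domain has switched DGC synthesis to its high rate. PDE activity is modeled
    as first-order degradation with rate constant k_deg. All parameters are
    hypothetical and in arbitrary units.
    """
    c = c0
    trace = []
    for active in stimulus:
        k_syn = k_syn_on if active else k_syn_off
        c += dt * (k_syn - k_deg * c)  # synthesis minus degradation
        trace.append(c)
    return trace
```

With the defaults, switching the stimulus on drives the level toward k_syn_on / k_deg = 2.0, and switching it off lets it relax toward k_syn_off / k_deg = 0.1; raising k_deg, the stand-in for PDE activity, lowers both steady states.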

In summary, comparative genomic analysis of 25 publicly available M. aeruginosa genomes focusing on c-di-GMP metabolism and regulation revealed the following main results:

(1) Based on the general genome features and pan-genome of the strains studied, M. aeruginosa has medium-sized genomes (4.99 Mb on average), a conserved core genome, and an expansive pan-genome; these features likely result from natural selection and enhance survival and proliferation in diverse habitats.

(2) Phylogenetic and phylogenomic analyses revealed that the relationships among M. aeruginosa strains did not reflect their geographical origins, although the strains were isolated from diverse freshwater environments. The divergences and similarities revealed by the phylogenetic and phylogenomic trees help further clarify the evolution of M. aeruginosa.

(3) In silico analysis of the signaling-related DGCs, PDEs, and hybrid proteins revealed that the GGDEF, EAL, and GGDEF-EAL domains all contain the essential conserved amino acid residues required for substrate binding and catalytic activity. In addition, all selected M. aeruginosa genomes encode a PilZ domain, whether or not it occurs in CelA. Moreover, the numbers of DGCs, PDEs, and hybrid proteins in M. aeruginosa strains might result from environment-specific adaptation.

This study is the first to analyze c-di-GMP signal-related proteins in M. aeruginosa, and our findings provide a genetic basis for further experimental characterization and evaluation of their biological functions. Several important questions that could enhance our understanding of M. aeruginosa blooms in aquatic environments remain open, such as how specific domain-containing proteins of the c-di-GMP signaling network participate in the physiological regulation of M. aeruginosa, and how, in ecological terms, M. aeruginosa adapts to its specific niche.

Microcystis aeruginosa genomes

All M. aeruginosa genome sequences that were available in the NCBI database in June 2018 and annotated with the Prokaryotic Genome Annotation Pipeline[71] were used in the analyses. Draft genomes consisting of more than 1,000 contigs were excluded to ensure consistent genome quality. The genomes of M. aeruginosa strains NaRes975 and CHAOHU 1326 were sequenced and assembled as previously described[44].

Comparative genome analyses

Core–pan-genome analysis was performed using the BPGA tool[39]. Orthologous clusters were assigned by grouping all protein sequences encoded by the 25 genomes with USEARCH, the default clustering tool, at the default 50% sequence identity cut-off. Core–pan-genome plots were calculated over 500 iterations. Comparative functional analysis, based on COG assignments and KEGG pathways, focused on the distribution of representative protein sequences from the core, accessory, and unique clusters of the M. aeruginosa strains. BRIG (version 0.95)[72] was used to create a circular genome comparison highlighting regions of difference and similarity between the 25 genomes and the reference sequence.
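BPGA performs these steps internally. To make the core/accessory/unique split and the iterated core–pan-genome plot concrete, here is a minimal sketch. It assumes the orthologous clusters have already been assigned (for example, by USEARCH at 50% identity), and it assumes that each iteration draws one random order in which genomes are added; BPGA's exact sampling scheme may differ. Function names and toy data are hypothetical.

```python
import random


def classify_clusters(genome_clusters):
    """Split ortholog clusters into core, accessory and unique sets.

    genome_clusters maps each genome name to the set of cluster IDs it carries.
    Assumes at least two genomes, so core and unique clusters cannot overlap.
    """
    n = len(genome_clusters)
    carriers = {}
    for genome, clusters in genome_clusters.items():
        for c in clusters:
            carriers.setdefault(c, set()).add(genome)
    core = {c for c, g in carriers.items() if len(g) == n}
    unique = {c for c, g in carriers.items() if len(g) == 1}
    accessory = set(carriers) - core - unique
    return core, accessory, unique


def core_pan_curve(genome_clusters, iterations=500, seed=0):
    """Mean core and pan-genome sizes as genomes are added in random order."""
    rng = random.Random(seed)
    names = list(genome_clusters)
    core_sum = [0] * len(names)
    pan_sum = [0] * len(names)
    for _ in range(iterations):
        rng.shuffle(names)  # one random genome order per iteration (assumption)
        core = set(genome_clusters[names[0]])
        pan = set()
        for k, name in enumerate(names):
            core &= genome_clusters[name]  # clusters shared by all genomes so far
            pan |= genome_clusters[name]   # clusters seen in any genome so far
            core_sum[k] += len(core)
            pan_sum[k] += len(pan)
    return [s / iterations for s in core_sum], [s / iterations for s in pan_sum]


# Toy data with hypothetical genome and cluster names.
toy = {
    "A": {"c1", "c2", "c3", "c4"},
    "B": {"c1", "c2", "c3", "c5"},
    "C": {"c1", "c2", "c6"},
}
core, accessory, unique = classify_clusters(toy)
core_curve, pan_curve = core_pan_curve(toy)
print(sorted(core), sorted(accessory), sorted(unique))
print(core_curve[-1], pan_curve[-1])  # 2.0 6.0
```

Averaging over many random orders removes the dependence of the curves on which genome happens to be added first.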

Phylogenetic analyses

To elucidate the phylogenetic relationships among the M. aeruginosa strains, 16S rRNA gene sequences of the cyanobacterial strains with whole-genome sequence data available in NCBI were downloaded and used to construct a phylogenetic tree. Sequences were aligned with MUSCLE version 3.8 using default settings[73], and phylogenetic and molecular evolutionary analyses were then conducted in MEGA X[74]. The tree was inferred using the neighbor-joining method with 1,000 bootstrap replicates. Evolutionary distances were computed using the maximum composite likelihood method and are expressed as the number of base substitutions per site. The analysis involved 26 nucleotide sequences: 25 from M. aeruginosa strains and one from Synechocystis sp. PCC6803 as the outgroup. All positions containing gaps and missing data were eliminated, leaving a total of 1,313 nucleotide positions in the final dataset.
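MEGA X applies this "complete deletion" treatment itself. The sketch below only illustrates what removing every position that contains a gap or missing data does to an alignment. Which characters count as missing data is an assumption here (MEGA's handling of ambiguity codes such as N may differ), and the toy sequences and function name are hypothetical.

```python
# Symbols treated as gaps or missing data (assumption; MEGA's handling of
# ambiguity codes such as N may differ).
GAP_OR_MISSING = {"-", "?"}


def complete_deletion(alignment):
    """Remove every alignment column that has a gap or missing symbol in any sequence."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [i for i in range(length)
            if not any(s[i] in GAP_OR_MISSING for s in seqs)]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


toy = {"a": "ACG-TA", "b": "ATGATC", "c": "A?GCTA"}
print(complete_deletion(toy))  # {'a': 'AGTA', 'b': 'AGTC', 'c': 'AGTA'}
```

A single gap in one sequence removes that column for all sequences, which is why the final dataset is shorter than the full 16S rRNA gene.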

A multilocus sequence typing approach based on the concatenation of 31 conserved marker genes, most of which encode ribosomal proteins, was used to reconstruct the phylogenomic tree following the protocols described by Wu and Eisen (2008)[64, 75]. These protein sequences were retrieved with the AutoMated Phylogenomic inference Application (AMPHORA2)[76-77], using default settings for the bacteria option and an E-value cut-off of 1e-10. Each of the 31 gene sets was aligned individually in MUSCLE version 3.8 with default settings[73], trimmed with respect to the reading frame, and then concatenated with the FaBox Fasta Alignment Joiner[78]. Only genomes containing all selected conserved genes were used in the phylogenetic analysis. A maximum likelihood tree was constructed in MEGA X using the Jones–Taylor–Thornton model with nearest-neighbor interchange[74, 79], and 1,000 bootstrap replicates were calculated to evaluate relative branch support. The analysis involved 25 genome sequences: 24 from M. aeruginosa strains and Synechocystis sp. PCC6803 as the outgroup. The final dataset comprised 7,481 positions.
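The concatenation step performed with the FaBox Fasta Alignment Joiner, together with the rule that only genomes carrying every marker gene are kept, can be sketched as follows. This is an illustration under assumptions, not FaBox's code: the per-gene alignments are assumed to be already trimmed, and the gene names, genome names, and function name are hypothetical.

```python
def concatenate_alignments(gene_alignments, gene_order):
    """Join per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {genome name -> aligned sequence}.
    Genomes missing from any gene alignment are dropped, mirroring the rule
    that only genomes carrying every marker gene were analysed.
    """
    for gene in gene_order:
        lengths = {len(s) for s in gene_alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
    complete = set.intersection(*(set(gene_alignments[g]) for g in gene_order))
    return {genome: "".join(gene_alignments[g][genome] for g in gene_order)
            for genome in sorted(complete)}


# Hypothetical marker genes and genomes; G3 lacks geneB and is excluded.
alignments = {
    "geneA": {"G1": "MKV-", "G2": "MKIA", "G3": "MRVA"},
    "geneB": {"G1": "TE", "G2": "SE"},
}
result = concatenate_alignments(alignments, ["geneA", "geneB"])
print(result)  # {'G1': 'MKV-TE', 'G2': 'MKIASE'}
```

Keeping a fixed gene order guarantees that each column of the supermatrix corresponds to the same homologous position in every genome.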

The pan-genome phylogenetic tree was reconstructed with the neighbor-joining algorithm from the binary gene presence/absence (1/0) pan-matrix that BPGA generated from the USEARCH orthologous clusters.
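As a sketch of the idea behind this step, the code below converts a binary presence/absence pan-matrix into pairwise distances and joins genomes with the Saitou–Nei neighbor-joining criterion, returning an unrooted Newick tree. The mismatch-proportion distance is an assumption (the text does not state which distance BPGA uses), and the toy matrix and function names are hypothetical.

```python
from itertools import combinations


def presence_absence_distance(matrix):
    """Proportion of gene clusters whose presence/absence differs between genomes.

    matrix maps genome name -> list of 0/1 values, one per gene cluster.
    This simple mismatch distance is an assumption; BPGA's metric may differ.
    """
    dist = {name: {} for name in matrix}
    for a, b in combinations(matrix, 2):
        diff = sum(x != y for x, y in zip(matrix[a], matrix[b])) / len(matrix[a])
        dist[a][b] = dist[b][a] = diff
    return dist


def neighbor_joining(dist):
    """Saitou-Nei neighbor joining; returns an unrooted Newick string (>= 3 taxa)."""
    d = {a: dict(row) for a, row in dist.items()}
    nodes = list(d)
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes  # resolve the last three nodes as a trifurcation
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Hypothetical 0/1 pan-matrix rows for four genomes and six clusters.
pan_matrix = {
    "G1": [1, 1, 1, 0, 1, 0],
    "G2": [1, 1, 1, 1, 1, 0],
    "G3": [1, 1, 0, 0, 0, 1],
    "G4": [1, 0, 0, 0, 0, 1],
}
tree = neighbor_joining(presence_absence_distance(pan_matrix))
print(tree)
```

Each joining step scans all remaining pairs, so the sketch runs in roughly cubic time in the number of genomes, which is negligible for 25 genomes.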

Genes and proteins analyses

Genes encoding the GG[D/E]EF, EAL, and GG[D/E]EF-EAL domains, the related sensor GAF, PAS, and HAMP domains, and the c-di-GMP-binding PilZ domain in the 25 selected M. aeruginosa genome sequences were identified by BLAST searches against the NCBI GenBank database, the Conserved Domain Database (CDD)[80], the Microbial Signal Transduction Database (MiST, version 3.0)[81], Pfam[82], and PROSITE[83], and were characterized using the Simple Modular Architecture Research Tool (SMART)[84]. CDD was used to identify the amino acids of the motifs present in the various domains. Automated protein structure models were built on the SWISS-MODEL server[51] by searching for evolutionarily related structures in the SWISS-MODEL template library (SMTL), which is based on the PDB database[85-86]. On this platform, templates are ranked by the expected quality of the resulting models, as estimated by the Global Model Quality Estimate (GMQE) and the Quaternary Structure Quality Estimate (QSQE)[86-87]. The crystal structures of the DGC WspR from P. aeruginosa[52], the GG[D/E]EF-EAL hybrid domain protein RmcA from P. aeruginosa[54], and the PilZ domain-containing protein BcsA from R. sphaeroides[29] were selected as templates for the structural analyses. QMEAN scoring functions[88] were used to evaluate alternative models and to select those whose scores most closely matched those of high-resolution experimental structures; the selected models were retained for further analysis. Multiple protein sequence alignments were generated with MUSCLE using default parameters. Sequence logos of conserved motifs were generated with WebLogo from the aligned amino acid sequences[89]. Structures were superimposed using UCSF Chimera[90].
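In the study, motif residues were identified with CDD and visualized with WebLogo. As a rough illustration of what those steps capture, the sketch below scans a sequence for the GG[D/E]EF and EAL motifs with regular expressions and computes per-column information content, the quantity represented by the stack height in a sequence logo. It is a simplified toy under stated assumptions: real domain detection relies on profile models rather than exact motif matches, no small-sample correction is applied, and the example sequences and function names are hypothetical.

```python
import math
import re
from collections import Counter

# GG[D/E]EF written as a regular expression; EAL is matched literally.
MOTIFS = {"GG[D/E]EF": re.compile(r"GG[DE]EF"), "EAL": re.compile(r"EAL")}


def find_motifs(seq):
    """Return (motif, 1-based start, matched residues) for every motif hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(seq):
            hits.append((name, m.start() + 1, m.group()))
    return hits


def column_information(aligned, alphabet_size=20):
    """Per-column information content in bits, ignoring gaps.

    Uses log2(alphabet_size) minus Shannon entropy, without the small-sample
    correction that WebLogo applies. All-gap columns score 0.
    """
    info = []
    for i in range(len(aligned[0])):
        residues = [s[i] for s in aligned if s[i] != "-"]
        if not residues:
            info.append(0.0)
            continue
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in Counter(residues).values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


print(find_motifs("MKGGEEFLLEALR"))  # [('GG[D/E]EF', 3, 'GGEEF'), ('EAL', 10, 'EAL')]
print([round(x, 2) for x in column_information(["GGDEF", "GGEEF", "GGDEF"])])
```

Fully conserved columns reach the maximum of log2(20) bits, while the variable D/E position scores lower, which is the pattern a logo of the GG[D/E]EF motif displays.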

Abbreviations

BPGA: Bacterial Pan Genome Analysis;

BRIG: BLAST Ring Image Generator;

c-di-GMP: bis-(3′,5′)-cyclic-dimeric-guanosine monophosphate;

CDD: Conserved Domain database;

CDSs: protein-coding sequences;

COG: Cluster of Orthologous Groups;

DGCs: diguanylate cyclases;

GAF: cGMP phosphodiesterase/adenylyl cyclase/FhlA;

HAMP: histidine kinases/adenylate cyclases/methyl accepting proteins and phosphatases;

HGT: Horizontal gene transfer;

KEGG: Kyoto Encyclopedia of Genes and Genomes;

MiST: Microbial Signal Transduction Database;

NCBI: National Center for Biotechnology Information;

ncRNA: noncoding RNA;

PAS: Per/Arnt/Sim;

PDB: Protein Data Bank;

PDEs: phosphodiesterases;

REC: response regulator receiver;

SMART: Simple Modular Architecture Research Tool;

Nucleotide sequence accession numbers

The whole genome sequences of M. aeruginosa CHAOHU 1326 and NaRes975 were deposited in the DDBJ/ENA/GenBank database under accession numbers MOLZ00000000 and MOLN00000000, respectively.

Additional files

Additional file 1. Table S1. Annotation of the gap regions in the circular map of the 25 M. aeruginosa genomes (.xlsx).

Additional file 2. Supplementary Figures S1-S3 and Tables S2-S6 (.docx).

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and materials

All genome sequences analyzed during the current study are available from the National Center for Biotechnology Information (NCBI) genome database. Additionally, the whole genome sequences of M. aeruginosa CHAOHU 1326 and NaRes975 have been deposited in the GenBank database under accession numbers MOLZ00000000 and MOLN00000000, respectively. All data generated during this study are presented within the manuscript and/or additional files.

Competing interests

All authors declare that they have no competing interests.

Funding

This work was supported by the Natural Science Foundation of China (No. 21577081).

Author Contributions

LL conceived the study and supervised the research. MC retrieved the sequences from the database, performed the comparative genomic analyses and was a major contributor in writing the manuscript. CYX participated in the genomic and structural analyses. XW, CYR, and JD contributed to the revision of the structure and content of the manuscript. All authors read and approved the final manuscript.

References

Harke MJ, Steffen MM, Gobler CJ, Otten TG, Wilhelm SW, Wood SA, et al. A review of the global ecology, genomics, and biogeography of the toxic cyanobacterium, Microcystis spp. Harmful Algae. 2016;54:4-20.
Garcia-Pichel F, Belnap J, Neuer S, Schanz F. Estimates of global cyanobacterial biomass and its distribution. Arch Hydrobiol Suppl Algol Stud. 2003;109:213-227; doi: 10.1127/1864-1318/2003/0109-0213.
Paerl HW, Huisman J. Blooms like it hot. Science. 2008;320(5872):57-58.
Huisman J, Codd GA, Paerl HW, Ibelings BW, Verspagen JMH, Visser PM. Cyanobacterial blooms. Nat Rev Microbiol. 2018;16(8):471-483; doi: 10.1038/s41579-018-0040-1.
MacKintosh C, Beattie KA, Klumpp S, Cohen P, Codd GA. Cyanobacterial microcystin-LR is a potent and specific inhibitor of protein phosphatases 1 and 2A from both mammals and higher plants. FEBS Lett. 1990;264(2):187-192; doi: 10.1016/0014-5793(90)80245-e.
Yoshizawa S, Matsushima R, Watanabe MF, Harada K, Ichihara A, Carmichael WW, et al. Inhibition of protein phosphatases by microcystins and nodularin associated with hepatotoxicity. J Cancer Res Clin Oncol. 1990;116(6):609-614; doi: 10.1007/bf01637082.
Codd GA, Lindsay J, Young FM, Morrison LF, Metcalf JS. Harmful Cyanobacteria. In: Huisman J, Matthijs HCP, Visser PM, editors. Harmful Cyanobacteria. Dordrecht: Springer Netherlands; 2005. p. 1-23.
Agostoni M, Montgomery BL. Survival strategies in the aquatic and terrestrial world: the impact of second messengers on cyanobacterial processes. Life (Basel). 2014;4(4):745-769; doi: 10.3390/life4040745.
Townsley L, Yildiz FH. Temperature affects c-di-GMP signalling and biofilm formation in Vibrio cholerae. Environ Microbiol. 2015;17(11):4290-4305; doi: 10.1111/1462-2920.12799.
Valentini M, Filloux A. Biofilms and cyclic di-GMP (c-di-GMP) signaling: lessons from Pseudomonas aeruginosa and other bacteria. J Biol Chem. 2016;291(24):12547-12555; doi: 10.1074/jbc.R115.711507.
An SW, Wu JE, Zhang LH. Modulation of Pseudomonas aeruginosa biofilm dispersal by a cyclic-di-GMP phosphodiesterase with a putative hypoxia-sensing Domain. Appl Environ Microbiol. 2010;76(24):8160-8173; doi: 10.1128/aem.01233-10.
Römling U, Galperin MY, Gomelsky M. Cyclic di-GMP: the first 25 years of a universal bacterial second messenger. Microbiol Mol Biol Rev. 2013;77(1):1-52; doi: 10.1128/MMBR.00043-12.
Sauer K, editor. c-di-GMP Signaling: Methods and Protocols. Methods in Molecular Biology (Walker JM, series editor). New York: Humana Press; 2017.
Boyd CD, O'Toole GA. Second messenger regulation of biofilm formation: breakthroughs in understanding c-di-GMP effector systems. Annu Rev Cell Dev Biol. 2012;28:439-462.
Duerig A, Abel S, Folcher M, Nicollier M, Schwede T, Amiot N, et al. Second messenger-mediated spatiotemporal control of protein degradation regulates bacterial cell cycle progression. Genes Dev. 2009;23(1):93-104; doi: 10.1101/gad.502409.
He Y-W, Zhang L-H. Quorum sensing and virulence regulation in Xanthomonas campestris. FEMS Microbiol Rev. 2008;32(5):842-857; doi: 10.1111/j.1574-6976.2008.00120.x.
Liang ZX. The expanding roles of c-di-GMP in the biosynthesis of exopolysaccharides and secondary metabolites. Nat Prod Rep. 2015;32(5):663-683; doi: 10.1039/c4np00086b.
Ryjenkov DA, Tarutina M, Moskvin OV, Gomelsky M. Cyclic diguanylate is a ubiquitous signaling molecule in bacteria: insights into biochemistry of the GGDEF protein domain. J Bacteriol. 2005;187(5):1792-1798; doi: 10.1128/JB.187.5.1792-1798.2005.
Chan C, Paul R, Samoray D, Amiot NC, Giese B, Jenal U, et al. Structural basis of activity and allosteric control of diguanylate cyclase. Proc Natl Acad Sci U S A. 2004;101(49):17084-17089; doi: 10.1073/pnas.0406134101.
Whiteley CG, Lee DJ. Bacterial diguanylate cyclases: Structure, function and mechanism in exopolysaccharide biofilm development. Biotechnol Adv. 2015;33(1):124-141; doi: 10.1016/j.biotechadv.2014.11.010.
Sultan SZ, Pitzer JE, Boquoi T, Hobbs G, Miller MR, Motaleb MA. Analysis of the HD-GYP domain cyclic dimeric GMP phosphodiesterase reveals a role in motility and the enzootic life cycle of Borrelia burgdorferi. Infect Immun. 2011;79(8):3273-3283; doi: 10.1128/iai.05153-11.
Christen M, Christen B, Folcher M, Schauerte A, Jenal U. Identification and characterization of a cyclic di-GMP-specific phosphodiesterase and its allosteric control by GTP. J Biol Chem. 2005;280(35):30829-30837; doi: 10.1074/jbc.M504429200.
Chou SH, Galperin MY. Diversity of cyclic di-GMP-binding proteins and mechanisms. J Bacteriol. 2016;198(1):32-46; doi: 10.1128/jb.00333-15.
Navarro MVAS, De N, Bae N, Wang Q, Sondermann H. Structural analysis of the GGDEF-EAL domain-containing c-di-GMP receptor FimX. Structure. 2009;17(8):1104-1116; doi: 10.1016/j.str.2009.06.010.
Galperin MY. Diversity of structure and function of response regulator output domains. Curr Opin Microbiol. 2010;13(2):150-159; doi: 10.1016/j.mib.2010.01.005.
Römling U, Galperin MY, Gomelsky M. Distribution of GGDEF, EAL, HD-GYP and PilZ domains in bacterial genomes 2013 [updated 2016 Aug 31]. Available from: https://www.ncbi.nlm.nih.gov/Complete_Genomes/c-di-GMP.html.
Henry JT, Crosson S. Ligand-binding PAS domains in a genomic, cellular, and structural context. Annu Rev Microbiol. 2011;65(1):261-286; doi: 10.1146/annurev-micro-121809-151631.
Schirmer T. C-di-GMP synthesis: structural aspects of evolution, catalysis and regulation. J Mol Biol. 2016;428(19):3683-3701; doi: 10.1016/j.jmb.2016.07.023.
Morgan JLW, McNamara JT, Zimmer J. Mechanism of activation of bacterial cellulose synthase by cyclic di-GMP. Nat Struct Mol Biol. 2014;21(5):489-496; doi: 10.1038/nsmb.2803.
Amikam D, Galperin MY. PilZ domain is part of the bacterial c-di-GMP binding protein. Bioinformatics. 2006;22(1):3-6; doi: 10.1093/bioinformatics/bti739.
Schäper S, Steinchen W, Krol E, Altegoer F, Skotnicka D, Søgaard-Andersen L, et al. AraC-like transcriptional activator CuxR binds c-di-GMP by a PilZ-like mechanism to regulate extracellular polysaccharide production. Proc Natl Acad Sci U S A. 2017;114(24):E4822-E4831; doi: 10.1073/pnas.1702435114.
Fujiwara T, Komoda K, Sakurai N, Tajima K, Tanaka I, Yao M. The c-di-GMP recognition mechanism of the PilZ domain of bacterial cellulose synthase subunit A. Biochem Biophys Res Commun. 2013;431(4):802-807; doi: 10.1016/j.bbrc.2012.12.103.
Agostoni M, Waters CM, Montgomery BL. Regulation of biofilm formation and cellular buoyancy through modulating intracellular cyclic di-GMP levels in engineered cyanobacteria. Biotechnol Bioeng. 2016;113(2):311-319; doi: 10.1002/bit.25712.
Agostoni M, Koestler BJ, Waters CM, Williams BL, Montgomery BL. Occurrence of cyclic di-GMP-modulating output domains in cyanobacteria: an illuminating perspective. mBio. 2013;4(4):e00451-13; doi: 10.1128/mBio.00451-13.
Enomoto G, Nomura R, Shimada T, Narikawa R, Ikeuchi M. Cyanobacteriochrome SesA is a diguanylate cyclase that induces cell aggregation in Thermosynechococcus. J Biol Chem. 2014;289(36):24801-24809.
Savakis P, De Causmaecker S, Angerer V, Ruppert U, Anders K, Essen LO, et al. Light-induced alteration of c-di-GMP level controls motility of Synechocystis sp. PCC 6803. Mol Microbiol. 2012;85(2):239-251; doi: 10.1111/j.1365-2958.2012.08106.x.
Yamaguchi H, Suzuki S, Osana Y, Kawachi M. Complete genome sequence of Microcystis aeruginosa NIES-2481 and common genomic features of group G M. aeruginosa. J Genomics. 2018;6:30-33; doi: 10.7150/jgen.24935.
Yamaguchi H, Suzuki S, Tanabe Y, Osana Y, Shimura Y, Ishida K-I, et al. Complete genome sequence of Microcystis aeruginosa NIES-2549, a bloom-forming cyanobacterium from Lake Kasumigaura, Japan. Genome Announc. 2015;3(3):e00551-15; doi: 10.1128/genomeA.00551-15.
Chaudhari NM, Gupta VK, Dutta C. BPGA- an ultra-fast pan-genome analysis pipeline. Sci Rep. 2016;6:24373; doi: 10.1038/srep24373.
Hasegawa M, Hashimoto T. Ribosomal RNA trees misleading? Nature. 1993;361(6407):23; doi: 10.1038/361023b0.
Badger JH, Eisen JA, Ward NL. Genomic analysis of Hyphomonas neptunium contradicts 16S rRNA gene-based phylogenetic analysis: implications for the taxonomy of the orders "Rhodobacterales" and Caulobacterales. Int J Syst Evol Microbiol. 2005;55(3):1021-1026.
Woese CR, Achenbach L, Rouviere P, Mandelco L. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts. Syst Appl Microbiol. 1991;14(4):364-371; doi: 10.1016/S0723-2020(11)80311-5.
Frangeul L, Quillardet P, Castets A-M, Humbert J-F, Matthijs HCP, Cortez D, et al. Highly plastic genome of Microcystis aeruginosa PCC 7806, a ubiquitous toxic freshwater cyanobacterium. BMC Genomics. 2008;9:274; doi: 10.1186/1471-2164-9-274.
Chen M, Tian L-L, Ren C-Y, Xu C-Y, Wang Y-Y, Li L. Extracellular polysaccharide synthesis in a bloom-forming strain of Microcystis aeruginosa: implications for colonization and buoyancy. Sci Rep. 2019;9(1):1251; doi: 10.1038/s41598-018-37398-6.
Yang C, Zhang W, Ren M, Song L, Li T, Zhao J. Whole-genome sequence of Microcystis aeruginosa TAIHU98, a nontoxic bloom-forming strain isolated from Taihu Lake, China. Genome Announc. 2013;1(3):e00333-13; doi: 10.1128/genomeA.00333-13.
Kaneko T, Nakajima N, Okamoto S, Suzuki I, Tanabe Y, Tamaoki M, et al. Complete genomic structure of the bloom-forming toxic cyanobacterium Microcystis aeruginosa NIES-843. DNA Res. 2007;14(6):247-256; doi: 10.1093/dnares/dsm026.
Yamaguchi H, Suzuki S, Sano T, Tanabe Y, Nakajima N, Kawachi M. Draft genome sequence of Microcystis aeruginosa NIES-98, a non-microcystin-producing cyanobacterium from Lake Kasumigaura, Japan. Genome Announc. 2016;4(6):e01187-16; doi: 10.1128/genomeA.01187-16.
Okano K, Miyata N, Ozaki Y. Whole genome sequence of the non-microcystin-producing Microcystis aeruginosa strain NIES-44. Genome Announc. 2015;3(2):e00135-15; doi: 10.1128/genomeA.00135-15.
Yamaguchi H, Suzuki S, Kawachi M. Draft genome sequence of Microcystis aeruginosa NIES-87, a bloom-forming cyanobacterium from Lake Kasumigaura, Japan. Genome Announc. 2018;6(8):e01596-17; doi: 10.1128/genomeA.01596-17.
Savakis P, De Causmaecker S, Angerer V, Ruppert U, Anders K, Essen LO, et al. Light-induced alteration of c-di-GMP level controls motility of Synechocystis sp. PCC 6803. Mol Microbiol. 2012;85(2):239-251; doi: 10.1111/j.1365-2958.2012.08106.x.
Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46(Web Server issue):W296-W303.
De N, Pirruccello M, Krasteva PV, Bae N, Raghavan RV, Sondermann H. Phosphorylation-independent regulation of the diguanylate cyclase WspR. PLoS Biol. 2008;6(3):e67.
Chen MW, Kotaka M, Vonrhein C, Bricogne G, Rao F, Chuah MLC, et al. Structural insights into the regulatory mechanism of the response regulator RocR from Pseudomonas aeruginosa in cyclic Di-GMP signaling. J Bacteriol. 2012;194(18):4837-4846; doi: 10.1128/JB.00560-12.
Mantoni F, Paiardini A, Brunotti P, D'Angelo C, Cervoni L, Paone A, et al. Insights into the GTP-dependent allosteric control of c-di-GMP hydrolysis from the crystal structure of PA0575 protein from Pseudomonas aeruginosa. FEBS J. 2018;285(20):3815-3834; doi: 10.1111/febs.14634.
Rao F, Yang Y, Qi Y, Liang Z-X. Catalytic mechanism of cyclic di-GMP-specific phosphodiesterase: a study of the EAL domain-containing RocR from Pseudomonas aeruginosa. J Bacteriol. 2008;190(10):3622-3631; doi: 10.1128/JB.00165-08.
Chou S-H, Galperin MY. Diversity of cyclic di-GMP-binding proteins and mechanisms. J Bacteriol. 2015;198(1):32-46; doi: 10.1128/JB.00333-15.
Ullah H, Nagelkerken I, Goldenberg SU, Fordham DA. Climate change could drive marine food web collapse through altered trophic flows and cyanobacterial proliferation. PLoS Biol. 2018;16(1):e2003446; doi: 10.1371/journal.pbio.2003446.
Visser PM, Verspagen JMH, Sandrini G, Stal LJ, Matthijs HCP, Davis TW, et al. How rising CO2 and global warming may stimulate harmful cyanobacterial blooms. Harmful Algae. 2016;54:145-159; doi: 10.1016/j.hal.2015.12.006.
O’Neil JM, Davis TW, Burford MA, Gobler CJ. The rise of harmful cyanobacteria blooms: The potential roles of eutrophication and climate change. Harmful Algae. 2012;14:313-334; doi: 10.1016/j.hal.2011.10.027.
Rossi F, De Philippis R. Role of cyanobacterial exopolysaccharides in phototrophic biofilms and in complex microbial mats. Life (Basel). 2015;5(2):1218-1238; doi: 10.3390/life5021218.
Bentkowski P, Van Oosterhout C, Ashby B, Mock T. The effect of extrinsic mortality on genome size evolution in prokaryotes. ISME J. 2017;11(4):1011-1018.
Humbert JF, Barbe V, Latifi A, Gugger M, Calteau A, Coursin T, et al. A tribute to disorder in the genome of the bloom-forming freshwater cyanobacterium Microcystis aeruginosa. PLoS One. 2013;8(8):e70747; doi: 10.1371/journal.pone.0070747.
Larsson J, Nylander JAA, Bergman B. Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits. BMC Evol Biol. 2011;11:187; doi: 10.1186/1471-2148-11-187.
Wu M, Eisen JA. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008;9(10):R151.
Šejnohová L, Maršálek B. Microcystis. In: Whitton BA, editor. Ecology of Cyanobacteria II: Their Diversity in Space and Time. Dordrecht: Springer Netherlands; 2012. p. 195-228.
Lyra C, Suomalainen S, Gugger M, Vezie C, Sundman P, Paulin L, et al. Molecular characterization of planktic cyanobacteria of Anabaena, Aphanizomenon, Microcystis and Planktothrix genera. Int J Syst Evol Microbiol. 2001;51(Pt 2):513-526.
Otsuka S, Suda S, Li R, Watanabe M, Oyaizu H, Matsumoto S, et al. 16S rDNA sequences and phylogenetic analyses of Microcystis strains with and without phycoerythrin. FEMS Microbiol Lett. 1998;164(1):119-124.
Otsuka S, Suda S, Shibata S, Oyaizu H, Matsumoto S, Watanabe M. A proposal for the unification of five species of the cyanobacterial genus Microcystis Kützing ex Lemmermann 1907 under the rules of the Bacteriological Code. Int J Syst Evol Microbiol. 2001;51(Pt 3):873.
Römling U, Liang Z-X, Dow JM. Progress in understanding the molecular basis underlying functional diversification of cyclic dinucleotide turnover proteins. J Bacteriol. 2017;199(5):e00790-16; doi: 10.1128/JB.00790-16.
Wilson AE, Sarnelle O, Neilan BA, Salmon TP, Gehringer MM, Hay ME. Genetic variation of the bloom-forming Cyanobacterium Microcystis aeruginosa within and among lakes: implications for harmful algal blooms. Appl Environ Microbiol. 2005;71(10):6126-6133; doi: 10.1128/AEM.71.10.6126-6133.2005.
Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016;44(14):6614-6624; doi: 10.1093/nar/gkw569.
Alikhan N-F, Petty NK, Ben Zakour NL, Beatson SA. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics. 2011;12:402; doi: 10.1186/1471-2164-12-402.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792-1797; doi: 10.1093/nar/gkh340.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol. 2018;35(6):1547-1549.
Shih PM, Wu D, Latifi A, Axen SD, Fewer DP, Talla E, et al. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc Natl Acad Sci U S A. 2013;110(3):1053-1058.
Wu M, Scott AJ. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics. 2012;28(7):1033-1034; doi: 10.1093/bioinformatics/bts079.
Kerepesi C, Bánky D, Grolmusz V. AmphoraNet: the webserver implementation of the AMPHORA2 metagenomic workflow suite. Gene. 2014;533(2):538-540; doi: 10.1016/j.gene.2013.10.015.
Villesen P. FaBox: an online toolbox for fasta sequences. Mol Ecol Notes. 2007;7(6):965-968; doi: 10.1111/j.1471-8286.2007.01821.x.
Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992;8(3):275-282.
Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 2017;45(D1):D200-D203.
Ulrich LE, Zhulin IB. The MiST2 database: a comprehensive genomics resource on microbial signal transduction. Nucleic Acids Res. 2010;38(Database issue):D401-D407; doi: 10.1093/nar/gkp940.
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47(D1):D427-D432; doi: 10.1093/nar/gky995.
Sigrist CJA, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, et al. New and continuing developments at PROSITE. Nucleic Acids Res. 2013;41(Database issue):D344-D347; doi: 10.1093/nar/gks1067.
Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 2018;46(D1):D493-D496; doi: 10.1093/nar/gkx922.
Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, et al. The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002;58:899-907; doi: 10.1107/s0907444902003451.
Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42(Web Server issue):W252-W258; doi: 10.1093/nar/gku340.
Bertoni M, Kiefer F, Biasini M, Bordoli L, Schwede T. Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology. Sci Rep. 2017;7(1):10480; doi: 10.1038/s41598-017-09654-8.
Benkert P, Biasini M, Schwede T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics. 2011;27(3):343-350; doi: 10.1093/bioinformatics/btq662.
Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator. Genome Res. 2004;14(6):1188-1190; doi: 10.1101/gr.849004.
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem. 2004;25(13):1605-1612.

Table 1. Genome features of the 25 analyzed M. aeruginosa strains.

Strain	Isolation location	NCBI accession number (genome/plasmid)	NCBI assembly number	Contigs	Genome size (Mb)	G+C (%)	CDSs
CHAOHU 1326	Chaohu Lake, CN	MOLZ00000000/-	GCA_001895325.1	617	5.27158	42.50	4590
DIANCHI905	Dianchi Lake, CN	AOCI00000000/-	GCA_000332585.1	335	4.85887	42.50	4303
KW	Wangsong Reservoir, KR	MVGR00000000/-	GCA_002025445.1	6	5.88943	42.80	4854
NaRes975	Nanwan Reservoir, CN	MOLN00000000/-	GCA_001885655.1	413	5.11753	42.40	4617
NIES44	Lake Kasumigaura, JP	BBPA00000000/-	GCA_000787675.1	79	4.56532	43.20	4053
NIES87	Lake Kasumigaura, JP	BFAC00000000/-	GCA_002933835.1	246	4.92578	42.90	4214
NIES88	Lake Kawaguchi Yamanashi, JP	JXYX00000000/-	GCA_001578075.1	262	5.26322	43.00	4620
NIES98	Lake Kasumigaura, JP	MDZH00000000/-	GCA_001725075.1	500	4.98253	42.40	4412
NIES843	Lake Kasumigaura, JP	AP009552/-	GCA_000010625.1	1	5.84279	42.30	5190
NIES1211	Lake Tofutsu, JP	BEIV00000000/-	GCA_003206625.1	289	4.73839	42.80	4209
NIES2481	Lake Kasumigaura, JP	CP012375/CP025929	GCA_001704955.2	2	4.44055	42.86	3966
NIES2549	Lake Kasumigaura, JP	CP011304/CP026286	GCA_000981785.2	2	4.3012	42.90	3843
PCC7806SL	Braakman Reservoir, NL	CP020771/-	GCA_002095975.1	1	5.13934	42.10	4497
PCC7941	Lake Little Rideau, CA	CAIK00000000/-	GCA_000312205.1	433	4.8019	42.60	4337
PCC9432	Lake Little Rideau, CA	CAIH00000000/-	GCA_000307995.2	438	4.99494	42.50	4543
PCC9443	Fishpond, CF	CAIJ00000000/-	GCA_000312185.1	760	5.18504	42.70	4545
PCC9701	Guerlesquin dam, FR	CAIQ00000000/-	GCA_000312285.1	550	4.756	42.70	4312
PCC9717	Rochereau dam, FR	CAII00000000/-	GCA_000312165.1	892	5.30034	42.70	4609
PCC9806	Oshkosh, US	CAIL00000000/-	GCA_000312725.1	310	4.26256	43.10	4258
PCC9807	Hartbeespoort dam, ZA	CAIM00000000/-	GCA_000312225.1	782	5.15571	42.60	4588
PCC9808	Malpas dam, AU	CAIN00000000/-	GCA_000312245.1	479	5.05105	42.40	4556
PCC9809	Lake Michigan, US	CAIO00000000/-	GCA_000312265.1	809	5.01102	42.80	4497
Sj	Lake Shinji, JP	BDSG00000000/-	GCA_003206555.1	366	4.61732	42.80	3956
SPC777	Billings Reservoir, BR	ASZQ00000000/-	GCA_000412595.1	278	5.455	42.60	4935
TAIHU98	Taihu Lake, CN	ANKQ00000000/-	GCA_000330925.1	4	4.84961	42.50	4340
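As a quick consistency check on Table 1, the sketch below re-applies the contig-count filter described in the Methods (drafts with more than 1,000 contigs excluded) and recomputes the mean genome size quoted in the text. It uses only values transcribed from the table; the variable names are illustrative.

```python
# (strain, contigs, genome size in Mb) transcribed from Table 1.
TABLE1 = [
    ("CHAOHU 1326", 617, 5.27158), ("DIANCHI905", 335, 4.85887),
    ("KW", 6, 5.88943), ("NaRes975", 413, 5.11753),
    ("NIES44", 79, 4.56532), ("NIES87", 246, 4.92578),
    ("NIES88", 262, 5.26322), ("NIES98", 500, 4.98253),
    ("NIES843", 1, 5.84279), ("NIES1211", 289, 4.73839),
    ("NIES2481", 2, 4.44055), ("NIES2549", 2, 4.3012),
    ("PCC7806SL", 1, 5.13934), ("PCC7941", 433, 4.8019),
    ("PCC9432", 438, 4.99494), ("PCC9443", 760, 5.18504),
    ("PCC9701", 550, 4.756), ("PCC9717", 892, 5.30034),
    ("PCC9806", 310, 4.26256), ("PCC9807", 782, 5.15571),
    ("PCC9808", 479, 5.05105), ("PCC9809", 809, 5.01102),
    ("Sj", 366, 4.61732), ("SPC777", 278, 5.455),
    ("TAIHU98", 4, 4.84961),
]
MAX_CONTIGS = 1000  # drafts with more contigs than this were excluded

included = [row for row in TABLE1 if row[1] <= MAX_CONTIGS]
mean_size = sum(size for _, _, size in included) / len(included)
print(f"{len(included)} genomes, mean size {mean_size:.2f} Mb")  # 25 genomes, mean size 4.99 Mb
```

All 25 listed assemblies pass the filter (the most fragmented, PCC9717, has 892 contigs), and the recomputed mean matches the 4.99 Mb reported in the text.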

Table 2. Predicted modular signaling proteins involved in c-di-GMP metabolism in all 25 analyzed M. aeruginosa genomes.

Strains	DGC (REC-GGDEF)^a	PDE (HD-GYP)	PDE (EAL)	Hybrid protein (GGDEF-EAL)	DGC, PDE, Hybrid protein^b
CHAOHU 1326	+	+	-	+	1, 1, 1
DIANCHI905	-	+	-	+	0, 1, 1
KW	+	+	-	+	1, 1, 1
NaRes975	-	+	-	+	0, 1, 1
NIES44	-	+	-	-	0, 1, 0
NIES87	-	+	-	+	0, 1, 1
NIES88	+	+	-	+	1, 1, 1
NIES98	-	+	-	+	0, 1, 1
NIES843	+	+	-	+	1, 1, 1
NIES1211	+	+	+	+	1, 1, 1
NIES2481	+	+	+	+	1, 1, 1
NIES2549	+	+	+	+	1, 1, 1
PCC7806SL	-	+	-	+	0, 1, 1
PCC7941	-	+	-	+	0, 1, 1
PCC9432	-	+	-	+	0, 1, 1
PCC9443	+	+	-	+	1, 1, 1
PCC9701	+	+	-	+	1, 1, 1
PCC9717	+	+	-	+	1, 1, 1
PCC9806	+	+	-	+	1, 1, 1
PCC9807	+	+	-	+	1, 1, 1
PCC9808	-	+	-	+	0, 1, 1
PCC9809	+	+	-	+	1, 1, 1
Sj	+	+	-	+	1, 1, 1
SPC777	-	+	-	+	0, 2, 1
TAIHU98	-	+	-	+	0, 1, 1

^aAbbreviations in parentheses denote the domains of the respective c-di-GMP-metabolizing enzymes.

^bNumbers of DGCs, PDEs, and hybrid proteins.

Table 3. Highly conserved GAF and PAS domain-containing protein accession numbers and domain architectures in M. aeruginosa.

Strains

CHAOHU 1326

WP_072924073

WP_072924730

WP_052277404

WP_072926573

WP_072926580

WP_052277305

WP_072924626

WP_072924963

WP_052276811

WP_072926064

WP_002762449

DIANCHI905

WP_002745958

WP_002747168

WP_002744306

WP_004157354

WP_002747345

WP_002748263

WP_002741087

WP_002747668

WP_002744275

WP_002741702

WP_002740896

WP_002742002

KW

WP_079209549

WP_079208831

WP_079209700

WP_079207421

WP_079205485

WP_079208235

WP_079208383

WP_079210006

WP_079205924

WP_079209813

WP_079208453

WP_079208968

NaRes975

WP_002794319

WP_002791029

WP_002793179

WP_002754030

WP_002793237

WP_002794122

WP_004162195

WP_002794503

WP_002791187

WP_002794215

WP_002791072

NIES44

WP_045357954

WP_045357888

WP_045360329

WP_045360749

WP_045359283

WP_045357878

WP_045359251

WP_045361304

WP_045362584

WP_045356473

WP_045357371

WP_045361075

NIES87

WP_104395268

WP_104396695

WP_104397419

GBE75978

WP_104396850

WP_104397238

WP_104395772

WP_104396333

WP_104396993

WP_104398395

WP_104397430

WP_104395682

NIES88

WP_061431276

WP_061433352

WP_061433296

WP_061431986

WP_061431151

WP_061430162

WP_061433130

WP_061432020

WP_061432273

WP_061430955

WP_061432206

WP_061432934

NIES98

WP_069475418

WP_042790772

WP_002776488

WP_069474734

WP_069474833

WP_016515378

WP_069473877

WP_069475019

WP_069474425

WP_069475037

WP_002735697

NIES843

WP_002798555

WP_012265232

WP_012266947

WP_012263925

WP_012265699

WP_012265377

WP_012267004

WP_012264674

WP_012267308

WP_012265457

WP_012266445

NIES1211

WP_106909322

WP_110545822

WP_110544865

WP_008197420

WP_008206790

WP_008206225

WP_110544472

WP_008196709

WP_110545383

WP_008201845

WP_008204843

NIES2481

WP_046661343

WP_046662881

WP_066029521

WP_046663349

WP_046663066

WP_066029442

WP_046661009

WP_046661034

WP_066030064

WP_066029449

WP_046662520

NIES2549

WP_046661343

WP_046662881

WP_046660921

WP_046663349

WP_046663066

WP_046660624

WP_046661009

WP_046661034

WP_046662941

WP_046660687

WP_046662520

PCC7806SL

WP_002745958

WP_002747168

WP_002744306

WP_004157354

WP_002747345

WP_002748263

WP_002741087

WP_002747668

WP_002744275

WP_084990071

WP_002740896

WP_002742002

PCC7941

WP_002773209

WP_002778812

WP_002773364

WP_002776488

WP_002774300

WP_043997359

WP_002753664

WP_002776134

WP_002777487

WP_002753355

WP_002778069

WP_002773459

PCC9432

WP_002752739

WP_002751457

WP_002750080

WP_002754030

WP_043998257

WP_002753664

WP_002750596

WP_004158869

WP_002753355

WP_002755152

WP_002735697

PCC9443

WP_002768350

WP_043996502

WP_002765532

WP_002772016

WP_002767728

WP_004159948

WP_002766513

WP_002772102

WP_002769252

WP_002770726

WP_002768675

WP_002765941

PCC9701

WP_004268082

WP_004267903

WP_002800894

WP_002800080

WP_002800292

WP_043997878

WP_002802777

WP_002802735

WP_002801722

WP_004163700

WP_002800792

WP_002800602

PCC9717

WP_004159531

WP_002757620

WP_002759229

WP_002764008

WP_002760382

WP_002756977

WP_004266808

WP_002757139

WP_002758487

WP_002759341

WP_002758380

WP_002762449

PCC9806

WP_002782883

WP_002781396

WP_002783547

WP_002780554

WP_002781783

WP_110578750

WP_002781098

WP_002783718

WP_002780998

WP_002781582

WP_002784436

PCC9807

WP_002789391

WP_002785382

WP_002787487

WP_002786795

WP_002786914

WP_002789289

WP_002787548

WP_004161361

WP_002785295

WP_002785224

WP_002789038

WP_002790089

PCC9808

WP_002794319

WP_002791029

WP_002793179

WP_002754030

WP_002793237

WP_002794122

WP_004162195

WP_002794503

WP_002791187

WP_002794215

WP_002791072

PCC9809

WP_002798555

WP_002798111

WP_002798380

WP_004162832

WP_002797601

WP_002799132

WP_002798872

WP_004162777

WP_004163339

WP_004162493

WP_002796812

WP_002798261

WP_110578336

WP_110578996

WP_110578784

WP_110579418

WP_110578966

WP_110577725

WP_110578750

WP_110577976

WP_110579984

WP_110578953

WP_110579722

WP_110579820

SPC777

WP_016515800

WP_016515594/WP_016515130

WP_016517186

WP_016516925

WP_016517335

WP_016515378

WP_016516028

WP_016515861*2

WP_016515025

WP_016517250

WP_002735697

TAIHU98

WP_002732405

WP_042790772

WP_002740043

WP_002737006

WP_002739588

WP_002733486

WP_002737089

WP_002732234

WP_002733944

WP_002733887

WP_002735697

Editorial decision: Major revision (03 Dec, 2019)
Review #3 received at journal (26 Nov, 2019)
Review #2 received at journal (05 Nov, 2019)
Review #1 received at journal (01 Nov, 2019)
Reviewer #2 agreed at journal (23 Oct, 2019)
Reviewer #3 agreed at journal (23 Oct, 2019)
Reviewer #1 agreed at journal (17 Oct, 2019)
Reviewers invited by journal (14 Oct, 2019)
Submission checks completed at journal (02 Oct, 2019)
Editor assigned by journal (22 Sep, 2019)
Editor invited by journal (21 Sep, 2019)
First submitted to journal (18 Sep, 2019)

Status: Journal Publication, Version 1
