Data description and blooming stage division
We collected 16 CAs from the northern bays of Lake Taihu from April 2015 to February 2016, a period in which cyanobacteria showed a full bloom cycle (Table 1). All samples were used for 16S rRNA gene sequencing to detect bacterial community composition, metagenome sequencing (MG) to detect functional potential and metatranscriptome sequencing (MT) to detect community gene expression. A total of 424 Mb of 16S rRNA gene amplicon sequence data were generated by Illumina MiSeq sequencing, and 122 Gb of high-quality MG sequence data and 160 Gb of high-quality MT sequence data were generated by Illumina HiSeq sequencing, making this one of the largest datasets obtained for CAs so far. Using real-time remote sensing image data of Lake Taihu, we calculated the total area of cyanobacterial blooming. The blooming area was very small (~4.5 km²) in April and May, expanded dramatically (~129.1 km²) in June, and had declined to ~1.70 km² by January. Therefore, we divided the blooming cycle into three stages (Early-stage: pre-blooming stage, Mid-stage: peak blooming stage and Late-stage: post-blooming stage). The 16 CA samples spanned the three blooming stages: three samples in Early-stage, 10 samples in Mid-stage and three samples in Late-stage (Table 1). Because sample sizes were unbalanced among the three groups, we corrected for sample size in all subsequent analyses.
Dynamics of microbial taxonomic structure in CAs
Bacteria dominated the CA community (MG 97.56±1.89%, MT 86.49±6.77%), with small fractions of Archaea (MG 0.19±0.07%, MT 0.38±0.13%) and Eukaryota (MG 2.25±1.84%, MT 13.13±6.68%). The most abundant bacteria found in the Early-stage samples on the basis of 16S were members of Cyanobacteria (38.8±5.1% s.d.), Bacteroidetes (30.0±10.1%) and Proteobacteria (26.5±4.8%) (Fig. 1A). The most abundant bacteria found in the Mid-stage samples were members of Proteobacteria (36.1±6.2%), Cyanobacteria (35.8±8.5%) and Bacteroidetes (20.0%). The most abundant bacteria in the Late-stage samples were members of Cyanobacteria (68.2±3.7%), Proteobacteria (17.8±4.6%) and Bacteroidetes (12.5±2.2%). Similar distributions of phyla were observed in 16S, MG and MT data sets (correlation coefficients of r>0.65 for 16S and MG, r>0.85 for MG and MT, and r>0.90 for 16S and MT), suggesting that the data sets were of excellent quality.
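The agreement between data sets reported above can be illustrated with a small correlation calculation. The following is a minimal sketch that assumes a Pearson coefficient (the text does not state which coefficient was used); the function name `pearson_r` and the abundance values are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical phylum-level relative abundances (%) for one sample,
# in the order Cyanobacteria, Bacteroidetes, Proteobacteria, other.
# These numbers are illustrative only.
profile_16s = [40.0, 30.0, 25.0, 5.0]
profile_mg = [35.0, 28.0, 30.0, 7.0]

r = pearson_r(profile_16s, profile_mg)
print(f"r(16S, MG) = {r:.2f}")
```

In practice the same calculation would be repeated for each pair of data sets (16S vs. MG, MG vs. MT, 16S vs. MT).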
Each stage had a unique set of microbes, especially Mid-stage (Fig. 1B). When the taxonomic composition was compared among the three stages, 50 OTUs (22 Bacteroidetes, 10 Cyanobacteria, 15 Proteobacteria and 3 unassigned) showed significantly different abundance (ANOVA; Fig. 2A), 16 of which belong to the family Cytophagaceae and four to the cyanobacterial genus Dolichospermum. UPGMA clustering of OTU abundance also divided the 16 samples into three significantly different groups (Adonis, R² = 0.71, p = 0.001). Remarkably, the three groups correspond to the three blooming stages. In pairwise comparisons, four OTUs differed in abundance between Early-stage and Mid-stage, seven between Early-stage and Late-stage, and one between Mid-stage and Late-stage (Fig. S1). Notably, the cyanobacterial genus Microcystis was enriched in Mid-stage relative to Early-stage (Fig. S1A), consistent with previous studies showing that Microcystis is the dominant genus during heavy cyanobacterial blooms in summer and autumn [36, 86, 87].
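The per-OTU comparison among three stages rests on a one-way ANOVA. As a rough sketch of the underlying statistic only, the snippet below computes the F value for one feature across groups; it does not compute a p-value or reproduce the sample-size correction mentioned earlier, and the function name and toy abundances are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Toy relative abundances of a single OTU in three stages (illustrative only).
early = [1.0, 2.0, 3.0]
mid = [2.0, 3.0, 4.0]
late = [3.0, 4.0, 5.0]

f_value = one_way_anova_f([early, mid, late])
print(f"F = {f_value:.2f}")
```

A statistics package would normally be used to turn the F value into a p-value and to apply multiple-testing correction across OTUs.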
Alpha diversity, calculated from the OTU abundance table, differed among the three stages, with Early-stage exhibiting the highest alpha diversity (Fig. 1C). Early-stage and Mid-stage were similar in 16S rRNA operon copy number and total nitrogen. Interestingly, Late-stage had the lowest alpha diversity but the highest 16S rRNA operon copy number and total nitrogen, consistent with the resource-copy number theory (“more resource, more copy number”) and the resource-ratio theory (“more resource, less diversity”).
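To make the notion of alpha diversity from an OTU table concrete, here is a minimal sketch using the Shannon index. The choice of index is an assumption, since the text does not say which alpha diversity metric was used, and the OTU counts are invented for illustration.

```python
from math import log

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

# Hypothetical OTU count vectors for two samples (illustrative only).
even_sample = [25, 25, 25, 25]   # evenly distributed community
skewed_sample = [97, 1, 1, 1]    # community dominated by one OTU

print(f"even:   H' = {shannon_index(even_sample):.3f}")
print(f"skewed: H' = {shannon_index(skewed_sample):.3f}")
```

A community dominated by one taxon, as in the Late-stage samples where Cyanobacteria reach about 68%, gives a lower index than an even community with the same number of OTUs.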
Dynamics of microbial functional structure in CAs
The CAs of the three blooming stages shared more genes and transcripts than OTUs (Fig. 1B). The CAs in Mid-stage exhibited more genes and transcripts than those in the other two stages, whereas the CAs in Early-stage and Late-stage shared the fewest genes and transcripts, consistent with the conclusion from the OTU analysis. When the genes and transcripts were annotated with the KEGG database, the CAs in Mid-stage and Late-stage shared more transcripts, mainly from nitrogen metabolism genes such as those for denitrification and ammonification (Fig. 1D). A total of 456 genes mainly related to enzymes (244 genes), ribosome (46 genes) and transporters (42 genes) showed significantly different abundance among the three stages (ANOVA; Fig. 2B). Meanwhile, 37 transcripts mainly related to enzymes (15 genes), ribosome (9 genes) and photosynthesis proteins (5 genes) had significantly different abundance among the three stages, of which 20 transcripts (Fig. 2C, highlighted in light yellow) were annotated to the same KOs as the significant genes. Those 20 transcripts mainly belonged to the photosynthesis system pathway and ribosomal subunit proteins. Notably, most of the genes with significantly different abundance (436 of 456) were not differentially expressed. Six SEED subsystems had significantly different abundance among the three stages (Fig. S2A). Photosynthesis-related subsystems, such as electron transport and photophosphorylation, NAD and NADP, light-harvesting complexes and alkylphosphonate utilization, were significantly enriched in Mid-stage compared to Early-stage (Fig. S2B). The protein secretion system was significantly enriched in Early-stage, and the ABC transporter subsystem was significantly enriched in Early-stage compared to Mid-stage (Fig. S2B).
Phosphate, nitrogen and sulfur are closely related to cyanobacterial blooms, which give rise to methane and algal toxin production. Therefore, we focused on the genes in the related pathways. The pentose phosphate pathway, sulfur metabolism and methane metabolism appeared in all three stages in both MG and MT data, with Mid-stage showing the most abundant genes in all these pathways (Fig. 3). Dissimilatory reduction of nitrate to ammonium, nitrogen fixation and denitrification were detected, with nitrogen fixation appearing only in Late-stage. Microcystins are a group of at least 80 chemical variants synthesized non-ribosomally by the protein products of the mcy gene cluster. The mcy genes were detected in all three stages in both MG and MT data and were most abundant in Mid-stage, followed by Late-stage and Early-stage.
When the MT/MG ratio was used to estimate taxonomic activity at the genus level, the average MT/MG ratios were highest for Basidiomycota (MT/MG=4.1), Ascomycota (MT/MG=3.5) and Cyanobacteria (MT/MG=1.9), suggesting that Cyanobacteria, after the two fungal phyla, were the most active (Fig. 1E). However, because the relative abundance of eukaryotes in the metagenomic data was 2.2%, we focused on the bacteria. We calculated the MT/MG ratio for each bacterial genus. As shown in Table 2, the most active genera differed among the three blooming stages. The three most active genera in Early-stage were Pseudomonas (MT/MG=3.8), Dolichospermum (MT/MG=3.2) and Pseudanabaena (MT/MG=3.0). Dolichospermum was the dominant cyanobacterial genus in Early-stage according to the 16S rRNA analysis and was also active (Fig. 1E). In Mid-stage, the three most active genera were Arcobacter (MT/MG=12.5), Dechloromonas (MT/MG=5.7) and Clostridium (MT/MG=4.5). All three genera are cyanobacteria-attached bacteria and were more active than cyanobacteria, suggesting that they play significant roles in heavy water blooms. In Late-stage, Dolichospermum (MT/MG=4.4), Aphanizomenon (MT/MG=4.0) and Pseudomonas (MT/MG=4.0) were the most active genera. In contrast to Mid-stage, the nitrogen-fixing cyanobacterial genus Dolichospermum returned to being the dominant cyanobacterial genus in Late-stage and showed the highest MT/MG ratio. Another nitrogen-fixing cyanobacterial genus, Aphanizomenon, was the second most active genus. Several genes involved in nitrogen fixation were also detected in the MG and MT data sets.
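The MT/MG ratio used above is simply the relative abundance of a taxon in the metatranscriptome divided by its relative abundance in the metagenome. The sketch below shows one way to compute and rank it; the function names, the `min_mg` filter and the genus abundances are hypothetical and serve only to illustrate the calculation.

```python
def mt_mg_ratios(mt_abundance, mg_abundance, min_mg=0.0):
    """Return {genus: MT/MG} for genera whose MG abundance exceeds min_mg."""
    ratios = {}
    for genus, mg in mg_abundance.items():
        if mg <= min_mg:
            # Skip genera absent from (or too rare in) the metagenome,
            # where the ratio would be undefined or unstable.
            continue
        ratios[genus] = mt_abundance.get(genus, 0.0) / mg
    return ratios

def top_active(ratios, n=3):
    """Return the n genera with the highest MT/MG ratio, highest first."""
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical relative abundances (%) in one stage (illustrative only).
mg = {"GenusA": 2.0, "GenusB": 10.0, "GenusC": 0.0, "GenusD": 4.0}
mt = {"GenusA": 8.0, "GenusB": 5.0, "GenusC": 1.0, "GenusD": 4.0}

ratios = mt_mg_ratios(mt, mg)
for genus, ratio in top_active(ratios, n=2):
    print(f"{genus}: MT/MG = {ratio:.1f}")
```

A ratio above 1 indicates that a genus contributes more to the transcript pool than to the gene pool, which is how low-abundance genera such as Arcobacter can rank as highly active.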
Metagenome-assembled genomes (MAGs)
We generated 233 metagenome-assembled genomes (MAGs) with ≥60% completeness and <10% contamination, 161 of which were high-quality draft genomes with ≥70% completeness and <5% contamination. De-replication of highly similar MAGs based on FastANI values of ≥95% yielded 78 non-redundant MAGs, from which a phylogenetic tree was built (Fig. 4). The tree is dominated by large numbers of genomes from the phyla Proteobacteria (44 MAGs, mostly Alphaproteobacteria) and Bacteroidetes (15 MAGs, mostly Cytophagia), and also contains several genomes from the phyla Gemmatimonadetes (2 MAGs), Spirochaetes (2 MAGs), Cyanobacteria (2 MAGs) and Acidobacteria (1 MAG), as well as 12 novel bacterial genomes. Their completeness, contamination, length, N50 and taxonomic annotation are shown in Table 3.
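The two quality tiers described above can be expressed as a simple threshold filter. This is a minimal sketch using the thresholds stated in the text; the function name, the tier labels and the example bins are hypothetical.

```python
def classify_mag(completeness, contamination):
    """Classify a bin by completeness and contamination (both in percent)."""
    if completeness >= 70.0 and contamination < 5.0:
        return "high-quality draft"
    if completeness >= 60.0 and contamination < 10.0:
        return "retained"
    return "discarded"

# Hypothetical bins: (name, completeness %, contamination %).
bins = [
    ("bin_001", 92.3, 1.2),
    ("bin_002", 64.0, 7.5),
    ("bin_003", 55.0, 2.0),
    ("bin_004", 85.0, 12.0),
]

for name, comp, cont in bins:
    print(f"{name}: {classify_mag(comp, cont)}")
```

Note that every high-quality draft also satisfies the looser retention criteria, so the 161 high-quality genomes are a subset of the 233 MAGs.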
The relative abundance of the 78 non-redundant MAGs in all samples is provided in Table S1. Using a cut-off of 1× coverage, most MAGs (72) were present in more than one sample and 26 MAGs were present in more than eight samples. Four MAGs, all members of the Alphaproteobacteria, were present in more than 14 samples. Nineteen MAGs shifted in relative abundance among the three stages (ANOVA, P < 0.10) (Fig. 4); their relative abundances are shown in Fig. S3A. Two MAGs annotated to the order Cytophagales and the family Caulobacteraceae showed higher abundance in Early-stage than in Mid-stage (Welch’s t-test, P < 0.05), while two MAGs belonging to the family Acetobacteraceae and the class Alphaproteobacteria showed higher abundance in Mid-stage than in Early-stage (Fig. S3B). Ten MAGs showed significantly different abundance between Early-stage and Late-stage, only one of which (150608.59.fa, annotated to the phylum Cyanobacteria) had higher abundance in Late-stage. In the comparison between Mid-stage and Late-stage, 15 MAGs were more abundant in Mid-stage, with the exception of one MAG (150608.59.fa, annotated to the phylum Cyanobacteria) that was more abundant in Late-stage. The cyanobacterial MAG 150608.59.fa has the closest sequence similarity to Dolichospermum flos-aquae CHAB-1629 (Genome-to-Genome Distance Calculator DNA–DNA hybridization (DDH) 18.4±2.4%). Dolichospermum was the dominant cyanobacterial genus in Late-stage, and 150608.59.fa showed its highest abundance in Late-stage. The other non-redundant cyanobacterial MAG (150402.41.fa, with the closest sequence similarity to Microcystis aeruginosa NIES-88, DDH 12.9±2.7%) was enriched in Mid-stage relative to Early-stage.
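The presence counts above depend on applying the 1× coverage cut-off per sample. A minimal sketch of that bookkeeping follows; it assumes that a MAG counts as present when its coverage is at least the cut-off (the text does not say whether the comparison is strict), and the MAG names and coverage values are hypothetical.

```python
def samples_present(coverage_by_sample, cutoff=1.0):
    """Number of samples in which a MAG's coverage reaches the cutoff."""
    return sum(1 for cov in coverage_by_sample if cov >= cutoff)

def mags_in_more_than(coverage_table, n_samples, cutoff=1.0):
    """Names of MAGs present in more than n_samples samples."""
    return [mag for mag, covs in coverage_table.items()
            if samples_present(covs, cutoff) > n_samples]

# Hypothetical per-sample coverage for three MAGs over five samples.
coverage_table = {
    "mag_a": [3.2, 0.4, 1.0, 5.1, 2.2],
    "mag_b": [0.1, 0.0, 0.9, 0.2, 0.3],
    "mag_c": [0.0, 1.5, 0.0, 0.0, 0.0],
}

print(mags_in_more_than(coverage_table, n_samples=1))
```

With the real Table S1 data the same function would be called with 1, 8 and 14 as thresholds to reproduce the three counts reported in the text.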
Based on average relative abundance, the most abundant MAG was 151222.71.fa, annotated to Betaproteobacteria. We reconstructed its metabolic pathways (Fig. 5) and identified a rich set of transporters for ammonium, phosphate, amino acids, peptides and sugars, and for several metallic elements such as iron, nickel, cobalt and magnesium. It has the flagellar biosynthesis pathway, allowing it to synthesize flagella that improve movement in viscous environments. It also has nitrogen metabolism converting nitrate to ammonia, sulfur metabolism converting thiosulfate to sulfate, fatty acid metabolism converting acetyl-CoA to palmitic acid, and the oxidative phosphorylation pathway, in which energy released through the respiratory chain is used to synthesize ATP from ADP and inorganic phosphate.
Twenty-two high-quality cyanobacterial MAGs were obtained from the binning analysis and grouped into four clusters based on their gene functions (Fig. 6). Cluster 1 and Cluster 2 each contain only one MAG, while Cluster 3 and Cluster 4 contain nine and 11 MAGs, respectively. A phylogenetic tree of these MAGs, together with 21 public NCBI genomes of the two dominant cyanobacterial genera in Lake Taihu (Microcystis and Dolichospermum), was constructed from the bacterial core gene set (Fig. S4). As expected, the 22 cyanobacterial MAGs also fell into four evolutionarily divergent groups in the phylogenetic tree. Both Cluster 1 and Cluster 2 are similar to Microcystis aeruginosa. Cluster 3 is close to Dolichospermum flos-aquae and Cluster 4 to Pseudanabaena yagii GIHE-NHR1. There were significant differences in metabolism among the four clusters, especially in the energy pathways of nitrogen metabolism, sulfur metabolism and photosynthesis (Fig. 6). Cluster 1 and Cluster 2 have more energy metabolism pathways than Cluster 3 and Cluster 4, including the modules M00145 (NAD(P)H:quinone oxidoreductase, chloroplasts and cyanobacteria), M00163 (Photosystem I), M00161 (Photosystem II) and M00616 (Sulfate-sulfur assimilation). In addition, Cluster 1 has extra sulfur metabolic modules (M00596: Dissimilatory sulfate reduction, M00176: Assimilatory sulfate reduction) and Cluster 2 has extra nitrogen metabolic modules (M00531: Assimilatory nitrate reduction, M00615: Nitrate assimilation). The energy pathways of Cluster 3 mainly include oxidative phosphorylation and photosynthesis (Photosystem I), while those of Cluster 4 mainly include photosynthesis (Photosystem II) and nitrogen and sulfur metabolism. All four clusters have prokaryotic defense systems, especially Cluster 4, which has three CRISPR-associated proteins: Csm1, Cmr3 and Cmr4 (Fig. S5). The cyanobacterial MAGs also have dramatically different transporters.
Cluster 1 can transport phosphate, ammonium, peptide/nickel, biopolymer, ferrous iron, lipopolysaccharide and urea. Cluster 4 can transport high-affinity iron, Na+, lipopolysaccharide, simple sugar, putative Mg2+, NitT/TauT, biopolymer and urea. Cluster 2 can transport branched-chain amino acid, polar amino acid, neutral amino acid, cobalt/peptide/nickel, glutamate, chromate, Ca, Fe-S cluster assembly and zinc. Cluster 3 can transport vitamin B12, MFS, arginine, lysine, histidine and glutamine.
Microcystis aeruginosa strains vary in their capacity for microcystin biosynthesis. We reconstructed the major strain in each metagenomic CA sample and checked its toxicity by detecting the mcy gene family (Fig. S6A). First, 10,568 gene families (similarity > 95%) were clustered from four reference genomes of Microcystis aeruginosa. Then, 12 Microcystis aeruginosa strains were obtained from 12 CA samples. They grouped into two major clusters, similar to M. aeruginosa NIES-843 and M. aeruginosa PCC-9806 respectively, which were dramatically different from the two very similar genomes of M. aeruginosa NIES-2481 and M. aeruginosa NIES-2549. The two strain clusters showed different microcystin biosynthesis characteristics: the cluster similar to M. aeruginosa NIES-843 contained the mcy gene family, whereas the cluster similar to M. aeruginosa PCC-9806 lacked it. Additionally, different strains were composed of different gene families, indicating the high functional diversity of Microcystis aeruginosa strains in Lake Taihu. For the other dominant cyanobacterium, Dolichospermum flos-aquae, 4,326 gene families were clustered from the one available reference genome (D. flos-aquae CHAB 1629). Compared to Microcystis aeruginosa, Dolichospermum flos-aquae showed much lower strain diversity in Lake Taihu, with almost the same gene family combination as the reference genome of D. flos-aquae CHAB 1629 (Fig. S6B). Seven major strains were reconstructed from seven CA samples, of which three samples were from Early-stage and three from Late-stage, consistent with the observation that Dolichospermum showed significantly higher abundance in Early-stage and Late-stage.