General features of cpDNA
This study analyzed 14 chloroplast genomes representing the major extant branches of Chaetophorales. Twelve of these genomes were newly sequenced, and seven of the twelve were assembled into complete genome maps (Additional file 1).
All complete chloroplast genomes of Chaetophorales (Table 1) consistently contained 67 protein-coding genes and 3 rRNA genes, and none contained inverted repeats (IR). The protein-coding genes primarily comprised 5 psa, 15 psb, 11 rps, 8 rpl, 6 atp, 5 rpo, 4 pet, 3 chl, and 4 ycf genes. Other gene families, such as rbc, cem, fts, clp, tuf, and ccs, were each represented by a single gene. Considerable differences were observed in genome size, GC content, total number of genes, number of tRNAs, number of introns, and the number of protein-coding genes on the plus and minus strands. Chloroplast genome size ranged from 150157 to 223902 bp; Aphanochaete elegans (HB201732) had the smallest genome and Stigeoclonium helveticum (UTEX 441) the largest. GC content ranged from 23.88% to 31.70%; Aphanochaete elegans (HB201732) had the lowest GC content and Chaetophoropsis polyrhium (HB201646) the highest. The number of tRNAs varied from 25 to 30 among species, and the number of introns from 2 to 33. Aphanochaete elegans (HB201732) contained only two introns and had the most compact genome, whereas Schizomeris leibleinii (UTEX LB 1228) contained 33 introns. Protein-coding genes were distributed on both strands, but the distribution was skewed and the number of genes on each strand varied among species. The distribution was most uneven in Aphanochaete elegans (HB201732) (+/-, 51/16). The coding region accounted for 45.15%–65.79% of the total genome length; the proportion was highest in Aphanochaete elegans (HB201732) and lowest in Stigeoclonium sp. (bmA10).
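The GC content and strand distribution reported above are simple tallies over a sequence and its gene annotations. The following is a minimal sketch of how such values could be computed; the function names `gc_content` and `strand_counts` are hypothetical, and the handling of ambiguous bases (ignored here) is an assumption rather than the procedure used in this study.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored.
    """
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / acgt


def strand_counts(strands):
    """Count genes on the plus and minus strands from an iterable of '+'/'-'."""
    counts = Counter(strands)
    return counts["+"], counts["-"]


if __name__ == "__main__":
    # Toy sequence, not real data.
    print(gc_content("ATGCNN"))
    # A 51/16 split over 67 protein-coding genes, as reported for A. elegans.
    print(strand_counts(["+"] * 51 + ["-"] * 16))
```

In practice the strand labels would be read from the genome annotation (for example, a GenBank feature table) rather than constructed by hand.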
In addition, five fragmentary chloroplast genomes were obtained. Although these genomes were incomplete to varying degrees, the partial sequences we generated included complete sequences of all 58 protein-coding genes shared among the completely sequenced cpDNAs; these protein-coding genes were therefore extracted for phylogenetic analyses (Table 1).
Phylogenetic analyses based on the four nuclear concatenated markers (18S + 5.8S + ITS2 + partial 28S rDNA)
The 53-taxon alignment comprised 3032 bp. In total, 664 sites were variable, of which 496 were parsimony-informative and 168 were singletons. The average content of A, T, C, and G was 24.17%, 25.67%, 21.46%, and 28.70%, respectively, so that the G + C content (50.16%) slightly exceeded the A + T content (49.84%). The transition/transversion ratio was 1.77. The 12 strains whose chloroplast genomes were sequenced here represent four families and are shaded in grey. The phylogenetic trees generated using the Bayesian and ML methods displayed topologies similar to those reported previously [21, 39, 40]. Phylogenetic analyses of both alignments resolved the six currently recognized monophyletic families in Chaetophorales [23]. Schizomeridaceae, sister to the other families of Chaetophorales, formed the basal clade of the order with robust support (100/1.00) and was clearly separated from Aphanochaetaceae (Fig. 1).
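The site categories reported above follow a common convention: a site is variable if it shows at least two nucleotide states, parsimony-informative if at least two states each occur in at least two sequences, and a singleton if it is variable but not parsimony-informative. The sketch below illustrates this classification on a toy alignment; the function name `classify_sites` is hypothetical, and ignoring gaps and ambiguous characters is an assumption that may differ from the settings of the software used in this study.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (variable, parsimony_informative, singleton) column counts.

    `alignment` is a list of equal-length aligned sequences.
    Assumption: only A, C, G, T are counted; gaps and ambiguity codes are ignored.
    """
    variable = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(c for c in (ch.upper() for ch in column) if c in "ACGT")
        if len(counts) < 2:
            continue  # constant (or empty) column
        variable += 1
        # Informative: at least two states that each occur in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative, variable - informative


if __name__ == "__main__":
    toy = ["AACA", "AACA", "ATGA", "AAGT"]  # toy data, not from the study
    print(classify_sites(toy))
```

Under this convention the singleton and parsimony-informative counts always sum to the number of variable sites, as they do in the figures reported here (496 + 168 = 664).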
Phylogenetic analyses based on the chloroplast protein-coding genes
Both the nucleotide and the amino acid data sets were assembled from the following 58 protein-coding genes: atpA, atpB, atpE, atpF, atpH, atpI, ccsA, cemA, chlB, chlN, clpP, petB, petD, petG, petL, psaA, psaB, psaC, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, rbcL, rpl2, rpl5, rpl14, rpl16, rpl20, rpl23, rpl36, rpoA, rpoC2, rps3, rps4, rps7, rps8, rps9, rps11, rps12, rps14, rps18, rps19, tufA, ycf12, ycf3, and ycf4.
These genes formed a concatenated nucleotide (nt) data set of 32019 bp, or 21346 bp after removal of 3rd codon positions. In total, 18578 and 10706 sites, respectively, were variable, of which 16752 and 9429 were parsimony-informative and 1826 and 1277 were singletons. The average content of A, T, C, and G was 31.44%, 33.89%, 15.25%, and 19.42% for the complete data set, and 29.71%, 32.57%, 16.03%, and 21.69% for the data set without 3rd codon positions; in both, the A + T content was markedly greater than the G + C content. The concatenated amino acid (aa) data set of the 58 protein-coding genes comprised 10673 characters.
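The three data set lengths are related by codon structure: removing every 3rd codon position leaves two thirds of the nucleotide alignment (32019 × 2/3 = 21346), and translation yields one amino acid per codon (32019 / 3 = 10673). The following minimal sketch shows how 3rd codon positions could be stripped from an in-frame coding alignment; the function name `drop_third_positions` is hypothetical and the sketch assumes the alignment is in frame with a length divisible by three.

```python
def drop_third_positions(cds: str) -> str:
    """Remove every 3rd codon position from an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1 and its length
    is a multiple of three (gaps, if any, are inserted codon-wise).
    """
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    print(drop_third_positions("ATGGCC"))  # toy sequence: codons ATG and GCC
    full_length = 32019                    # nt data set length reported in the text
    print(full_length * 2 // 3, full_length // 3)
```

Applied to each row of the concatenated alignment, the same function produces the reduced data set used for the partitioned analysis without 3rd codon positions.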
Maximum likelihood (ML) trees generated from the concatenated nucleotide (nt) data set under three partitioning schemes (by gene position, by codon position, and by gene position without 3rd codon positions) had low support values (56/65/70) at the node uniting the orders Chaetophorales and Chaetopeltidales (Fig. 2).
Nonetheless, the topologies of the trees generated from the concatenated data sets (nt and aa) were almost identical to each other, and the support values in the amino acid (aa) data set were high at almost all nodes (Fig. 3). These results differ from previous studies based on rDNA data sets [20, 21, 39] in two respects, topology and support values, especially within the OCC clade. Support values in the concatenated chloroplast data sets were markedly higher than those in the rDNA data sets. Chlorophyceae diverged into two well-supported clades, the VS and OCC clades. In the OCC clade, Oedogoniales was located at the base of the branch, and Chaetophorales and Chaetopeltidales were most closely related. The internal branching of Chaetophorales also differed markedly. The order diverged into four well-supported clades comprising five of the currently recognized families (all except Barrancaceae): Schizomeridaceae, Aphanochaetaceae, Uronemataceae, Fritschiellaceae, and Chaetophoraceae. Schizomeridaceae and Aphanochaetaceae were not separated as they were in the rDNA data sets; instead, they clustered into one branch at the base of Chaetophorales. Chaetophoraceae sensu lato was located at the top branch of Chaetophorales and split basally into two well-supported clades, representing Fritschiellaceae and Chaetophoraceae sensu stricto, respectively. Uronemataceae was sister to Chaetophoraceae sensu lato.
Synteny analysis
Synteny among the chloroplast genomes of Chaetophorales was analyzed with progressiveMauve, with Schizomeris leibleinii set as the reference genome [26]. The results are illustrated in Fig. 4. Nine genomes representing seven genera in five families were compared, and more than 27 locally collinear blocks (LCBs) were identified. The lines connecting LCBs crossed extensively among chloroplast genomes, and considerable rearrangements and inversions were noted, especially in Fritschiellaceae and Chaetophoraceae. The largest LCB was more than 40 kb (Fig. 4a). Synteny was highly conserved among Schizomeris leibleinii (Schizomeridaceae), Aphanochaete confervicola, and Aphanochaete elegans (Aphanochaetaceae) (Fig. 4b). Three conserved LCBs, comprising the genes (psbB, psbT, and psbH), (psaC and psbN), and (petL), respectively, were somewhat modified in most members of Chaetophorales. For example, compared with Schizomeris leibleinii, the LCB (psbB, psbT, psbH) of Stigeoclonium helveticum additionally included petD and orf101, and petL was inverted. Similar patterns were observed in other species. Moreover, psbN was proximal to psaC; in Stigeoclonium sp., however, it did not split psaC or cause it to be trans-spliced. Nonetheless, these three LCBs were highly conserved between Schizomeridaceae and Aphanochaetaceae (Fig. 5). Furthermore, the guide tree inferred from the chloroplast genomes by progressiveMauve clearly indicated that Schizomeridaceae and Aphanochaetaceae clustered into one clade at the base of Chaetophorales (Fig. 5).
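progressiveMauve writes its alignment in XMFA format, in which each block lists one `>` header per participating genome and is terminated by a line beginning with `=`. As a rough illustration of how the number of LCBs could be tallied from such a file, the sketch below counts blocks shared by at least a minimum number of genomes. The function name `count_lcbs` and the `min_genomes` threshold are hypothetical, and the counts reported in this study were not necessarily obtained this way.

```python
def count_lcbs(xmfa_lines, min_genomes=2):
    """Count XMFA alignment blocks shared by at least `min_genomes` genomes.

    Assumption: headers look like "> 1:100-200 + file", where the number
    before the colon identifies the genome; blocks end with a "=" line.
    """
    blocks = 0
    members = set()
    for raw in xmfa_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and file-level comments
        if line.startswith(">"):
            members.add(line[1:].split(":", 1)[0].strip())
        elif line.startswith("="):
            if len(members) >= min_genomes:
                blocks += 1
            members = set()
        # any other line is aligned sequence and is ignored here
    return blocks


if __name__ == "__main__":
    # Toy XMFA content, not real output.
    example = """#FormatVersion Mauve1
> 1:1-4 + a.fasta
ACGT
> 2:5-8 - b.fasta
ACGT
=
> 1:5-8 + a.fasta
GGCC
=
""".splitlines()
    print(count_lcbs(example))
```

Blocks containing a single genome correspond to lineage-specific regions, which is why the sketch excludes them by default.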
Evolution of the Chaetophorales based on the germination type of zoospores
Morphological and life history observations clearly revealed that, within the order Chaetophorales, zoospores of Schizomeridaceae show erect germination, whereas those of Aphanochaetaceae show prostrate germination. Uronemataceae has only zoospores with erect germination. In Chaetophoraceae sensu lato, zoospores of Chaetophoraceae sensu stricto and Fritschiellaceae show erect and prostrate germination, respectively [23].
Based on the germination type of zoospores, the following evolutionary hypothesis for Chaetophorales is proposed. The clade comprising Schizomeridaceae and Aphanochaetaceae, whose zoospores show erect and prostrate germination, respectively, is most closely related to the original ancestors of Chaetophorales; these two families clustered together at the base of the order. Uronemataceae displayed a loss of traits [20], retaining only zoospores with erect germination. In Chaetophoraceae sensu lato, the Stigeoclonium-like ancestors evolved independently in two directions. Some evolved into a group with only zoospores for prostrate germination and a highly differentiated prostrate part, as in the genera Fritschiella and Chaetophoropsis (Fritschiellaceae). The others evolved into a group with only zoospores for erect germination and a highly differentiated erect part, as in the genus Draparnaldia (Chaetophoraceae) [41, 42]; these taxa were located at the top branch of Chaetophorales and represent the most evolved taxa (Fig. 5).