To confirm the phylogenetic position of the eight strains under study, we used the rbcL phylogeny to identify the major groups of Cyanidiophyceae (Figure S2). The taxon-rich (269 taxa) rbcL tree showed five major cyanidiophycean groups: 1) the Cyanidium chilense assemblage (Cd. chilense; known as mesophilic Cyanidium sp.) [12], 2) Cyanidium caldarium (Cd. caldarium), 3) the Galdieria sulphuraria assemblage (G. sulphuraria), 4) Cyanidioschyzon merolae (Cz. merolae), and 5) the Cyanidiococcus yangmingshanensis assemblage (Cc. yangmingshanensis; known as Galdieria maxima) [27]. This grouping is consistent with previous studies [14, 18]. Our eight strains represent the diversity of these five major groups well (Figure S2).
The general characteristics of eight mitogenomes, comprising five new and three published datasets, were compared (Table 1) [25, 26]. These mitogenomes divide clearly into two types, Cyanidium-type and Galdieria-type, based on mitogenome features (e.g., genome size, genome structure, number of genes, skewness of nucleotides; see Supplementary Information 1). The Cyanidium-type (C-type) comprises five taxa (mesophilic Cd. chilense Sybil Cave, Cd. caldarium, Cz. merolae, Cc. yangmingshanensis, and Cyanidiophyceae sp. MX-AZ01), and the Galdieria-type (G-type) comprises three taxa (G. phlegrea, G. sulphuraria 108.79 E11 and 074W). Based on our observations and previous work, the C-type and G-type can be distinguished not only by mitogenome characteristics but also by morphological characteristics, cellular features, and ecological habitats. Cells of G-type species are generally larger than those of C-type species and have a simpler morphology, compared to the more diverse cell shapes (e.g., spherical, oval, club-shaped) in C-type species (Fig. 1A). According to a previous analysis, G-type Galdieria sulphuraria contains several mitochondria per cell that have a net-like structure, whereas C-type Cd. caldarium contains a single mitochondrion in a spheroid cell [28]. Likewise, in our transmission electron microscopic observations, multiple mitochondria were identified in G-type G. sulphuraria 108.79 E11, whereas a single spheroid mitochondrion was found in C-type Cc. yangmingshanensis 8.1.23 F7 (Fig. 1A).
Table 1
General characteristics of Cyanidiophyceae mitogenomes. *Abbreviations: CZME (Cyanidioschyzon merolae), CYSP (Cyanidiophyceae sp.), CCYA (Cyanidiococcus yangmingshanensis), CDCA (Cyanidium caldarium), CDCH (Cyanidium chilense), GAPH (Galdieria phlegrea), GASU (Galdieria sulphuraria).
Type | Cyanidium-type (C-type) | Galdieria-type (G-type) |
Species* | CZME 10D | CYSP MX-AZ01 | CCYA 8.1.23 F7 | CDCA ACUF 019 | CDCH Sybil Cave | GAPH DBV 009 | GASU 074W | GASU 108.79 E11 |
Genome Size (bp) | 32,211 | 32,620 | 32,387 | 34,207 | 33,039 | 21,792 | 21,428 | 21,611 |
GC-content (%) | 27.1 | 26.7 | 26.4 | 25.9 | 44.5 | 41.4 | 44.0 | 41.8 |
GC-skew | 0.06 | 0.03 | 0.03 | 0.02 | 0.01 | 0.71 | 0.74 | 0.66 |
AT-skew | 0.01 | 0.02 | 0.03 | 0.03 | 0.03 | 0.25 | 0.25 | 0.29 |
Number of Genes | 64 | 61 | 62 | 61 | 61 | 26 | 27 | 27 |
Non-coding Region (%) | 5.20 | 6.50 | 5.13 | 10.64 | 6.37 | 17.55 | 15.55 | 16.49 |
NCBI Accession Number | NC_000887 | KJ569774 | MT270119 (this study) | MT270118 (this study) | MT270117 (this study) | MT270116 (this study) | NC_024666 | MT270115 (this study) |
Cyanidium-type and Galdieria-type mitogenomes resolved using phylogenetic analysis
Phylogenetic analysis also supports the recognition of the C-type and G-type mitogenomes. The concatenated protein ML phylogeny resolves the C-type and G-type with full bootstrap support (Fig. 1B). Within the C-type, the mesophilic Cd. chilense Sybil Cave diverged first, followed by Cd. caldarium ACUF 019. Cz. merolae 10D is grouped with the monophyletic clade of Cc. yangmingshanensis 8.1.23 F7 + Cyanidiophyceae sp. MX-AZ01. Our current mitogenome data resolve the internal relationships within the Cyanidiophyceae, in particular the positions of mesophilic Cd. chilense Sybil Cave and Cd. caldarium ACUF 019, which were poorly resolved until now. In this analysis, however, we observed an extraordinarily long internal branch of the G-type (Fig. 1B), which implies either high divergence relative to C-type species or unidentified/extinct genetic diversity in the G-type lineages. In addition, we tested individual gene phylogenies to assess their consistency with the concatenated gene tree (Supplementary Information 2). Using these variable mitochondrial gene datasets, we were able to resolve phylogenetic relationships among the major cyanidiophycean clades.
Different CDS content between Cyanidium-type and Galdieria-type mitogenomes: gene loss and transfer
After recognizing two different groups in Cyanidiophyceae, we focused on mitogenome gene gains and losses. Comparison of CDS content between the two types revealed that one-half of the mitochondrion-encoded genes in C-type mitogenomes are missing in the G-type [i.e., losses of all ribosomal protein genes (rps, rpl) and a few core genes (ccmA,B and sdhB,D)]; in particular, the synteny of the green block is changed in the G-type (Fig. 2A). To examine endosymbiotic gene transfer (EGT) from the mitochondrial to the nuclear genome for the genes missing in the G-type, we searched for 18 homologous genes in the two available nuclear genomes of Cz. merolae 10D and G. sulphuraria 074W [23, 29]. Ten genes were identified in the nuclear genome of G. sulphuraria 074W, but we did not identify eight mitochondrial genes (‘N/D’ in Fig. 2A). This may indicate outright gene losses from the mitogenome, although it is not possible to rule out issues related to low-quality genome data. Other explanations for the missing genes are high diversification after gene transfer or degeneration of the mitochondrial genes. Out of twelve ribosomal proteins, only rps12 and rpl20 were found in the G. sulphuraria 074W nuclear genome. These nuclear-encoded rps12 and rpl20 genes were not grouped together but were located on two different scaffolds, 57 and 29, respectively (Fig. 2B), whereas most ribosomal protein-encoding genes are located in a single syntenic block in C-type mitogenomes (ccmA-nad6; green block in Fig. 2A). We could not detect the remaining ribosomal protein-encoding genes; instead, we found six nuclear-encoded homologs of ribosomal proteins (rpl6, rpl14, rpl16, rps4, rps11, rps19) in the genome (see question marks in Fig. 2A). The origins of these six ribosomal protein homologs, along with that of the ccmF gene (see Supplementary Information 3 for details), remained unclear in phylogenetic analyses because of low bootstrap support values.
It is unlikely that mitochondrial translation would function properly without a complete set of ribosomal subunit proteins; therefore, the nuclear-encoded homologs could “compensate” for gene losses (e.g., the loss of 16–19 tRNAs in G-type mitogenomes). Meanwhile, homologs of rps13 and rps19 were not detected in Cz. merolae 10D but were found in the early-branching, mesophilic Cd. chilense, suggesting independent gene losses in both C-type and G-type mitogenomes. Homology searches detected plastid-derived, nuclear (host-derived), or other (e.g., unknown-source) copies of ribosomal proteins, which implies that ribosomal subunits of various origins (e.g., nuclear, plastid, other bacteria) may be translocated into mitochondria, as suggested in previous studies [30–32].
Changes in amino acid composition in Galdieria-type mitochondrial genes
Enzymes from extremophilic organisms have high thermostability, more charged amino acid composition, and reduced hydrophobic surfaces to withstand extreme temperatures and pH [33, 34]. Furthermore, proteins of thermophilic species are shorter in length than those in their mesophilic counterparts [35]. To investigate protein characteristics, we compared protein charge, hydropathy, and stability in 16 genes (genes in Fig. 3C) that are retained in all 12 mitogenomes (i.e., G-type, C-type, non-cyanidiophycean red algae).
Amino acid composition of these 16 conserved mitochondrial genes shows that 13 out of 20 amino acids differ significantly between G-type and C-type (asterisks in Fig. 3A; Table S7), and these differences effectively modified the properties of the encoded proteins. Whereas the non-cyanidiophycean red algae (outgroup) and C-type have a similar amino acid composition, G-type mitogenomes have a distinct composition (Fig. 3A). G-type genomes show a higher proportion of positively charged amino acids than C-type genomes (G-type: 10.77–10.82%, C-type: 6.84–7.44%; Figure S6) and a lower proportion of negatively charged amino acids (G-type: 2.26–2.47%, C-type: 4.06–4.21%; Figure S6). These amino acid changes in turn altered protein charge, hydrophilicity, and hydrophobicity (Fig. 3B,C).
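The charged-residue proportions compared above can be illustrated with a minimal sketch. It assumes that Lys, Arg, and His count as positively charged and Asp and Glu as negatively charged; the study's exact residue classes are not stated here, and the function name and toy sequence are hypothetical.

```python
from collections import Counter

# Assumption: residue classes below are a common convention,
# not necessarily the classification used in the study.
POSITIVE = set("KRH")  # Lys, Arg, His
NEGATIVE = set("DE")   # Asp, Glu


def charge_fractions(protein):
    """Return (% positively charged, % negatively charged) residues."""
    seq = protein.upper().replace("*", "")  # drop stop-codon symbols
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    pos = sum(counts[aa] for aa in POSITIVE)
    neg = sum(counts[aa] for aa in NEGATIVE)
    return 100.0 * pos / len(seq), 100.0 * neg / len(seq)


# Toy sequence for illustration only (not from the study).
pos_pct, neg_pct = charge_fractions("MKRHDEAAAA")
print(f"positive: {pos_pct:.1f}%  negative: {neg_pct:.1f}%")
```

Applied to the concatenated translations of the 16 conserved genes of one mitogenome, such a function would give one pair of percentages per genome, comparable in form to the ranges reported above.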
Hydrophilicity was measured for the 16 conserved mitochondrial proteins (Fig. 3B). G-type species showed a relatively higher proportion of hydrophilic amino acids in mitochondrial proteins (50.56–50.97%) than other red algal species (C-type and outgroup species: 41.64–43.49%). Because some genes showed dramatic differences in amino acid composition or protein size, we examined hydropathy (a scale of hydrophobicity and hydrophilicity) for individual genes to avoid a biased assessment [36, 37]. G-type proteins are clearly less hydropathic than those of other groups (Fig. 3C), and 11 of the 16 mitochondrial genes in G-type tend to encode shorter proteins (Figure S7A) than in other species. These dramatic differences in amino acids (e.g., charge, length) of mitochondrial genes are critical to protein structure and can affect solubility, stability, and function [38]. We applied in silico analyses (e.g., instability index, aliphatic index) to calculate the stability of the conserved mitochondrial proteins and found a few significant differences between G-type and C-type (Figure S7B, C). Conserved mitochondrial genes in Cyanidiophyceae mostly encode membrane-bound proteins or proteins that form complexes in mitochondria (e.g., mitochondrial respiratory complexes). In these cases, protein folding, which is a key factor in understanding protein stability and activity, can be highly dependent on the lipid composition of the mitochondrial membrane or on protein-protein interactions with other supermatrix-forming proteins [39, 40]. Therefore, the interactions of these proteins with mitochondrial membrane lipids need further study.
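As a rough illustration of the per-protein indices mentioned above, the sketch below computes a mean hydropathy score and the aliphatic index. It assumes the Kyte-Doolittle hydropathy scale and Ikai's aliphatic index formula; whether the study used exactly these definitions is an assumption, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(protein):
    """Keep only the 20 standard residues (drops X, *, gaps, etc.)."""
    seq = [aa for aa in protein.upper() if aa in KD]
    if not seq:
        raise ValueError("no standard residues in sequence")
    return seq


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = _standard_residues(protein)
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(protein):
    """Ikai aliphatic index from mole percentages of Ala, Val, Ile, Leu."""
    seq = _standard_residues(protein)
    n = len(seq)

    def pct(aa):
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


# Toy peptides for illustration only (not from the study).
print(round(gravy("MKRHDEAAAA"), 3))
print(round(aliphatic_index("AVIL"), 1))
```

A lower mean hydropathy for a G-type protein than for its C-type ortholog would correspond to the "less hydropathic" pattern described above; the instability index is omitted here because it requires a dipeptide weight table.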
Extreme GC-skew in Galdieria-type mitogenomes and its associated characteristics
A comparison of the mitogenomes shows that G-type mitogenomes have distinctive characteristics, such as high gene divergence and asymmetric nucleotide substitution. The difference in GC-content between the C-type and G-type is pervasive across genomes (Figure S8): C-type species have lower GC-content (25.0–27.1%) than G-type species (41.4–44.0%), excluding mesophilic Cd. chilense (44.5%). GC-skew [(G − C)/(G + C)] and AT-skew [(A − T)/(A + T)] also clearly differ between C-type and G-type (see Table 1). C-type mitogenomes have balanced AT and GC compositions (AT-skew: 0.01–0.03; GC-skew: 0.01–0.06). In contrast, G-type mitogenomes show an unbalanced AT composition (AT-skew: 0.25–0.29) and an extremely asymmetric composition of GC nucleotides (GC-skew: 0.66–0.74). In one C-type member, mesophilic Cd. chilense, the GC-content (44.5%) is close to that of the G-type, but its GC-skew (0.01) and AT-skew (0.03) are more similar to those of other C-type mitogenomes. In other words, mesophilic Cd. chilense can be regarded as an intermediate state between C-type and G-type based on its genomic features and phylogenetic position.
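The skew statistics used in this comparison follow directly from base counts on one strand. The sketch below is a minimal illustration of the two ratios and of GC-content; the function name and the toy sequence are hypothetical, and ambiguous bases are simply ignored.

```python
def composition_stats(dna):
    """Return (GC-skew, AT-skew, GC-content %) for one DNA strand.

    GC-skew = (G - C) / (G + C); AT-skew = (A - T) / (A + T).
    Bases other than A, C, G, T are ignored.
    """
    seq = dna.upper()
    g, c, a, t = (seq.count(base) for base in "GCAT")
    total = g + c + a + t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_content = 100.0 * (g + c) / total
    return gc_skew, at_skew, gc_content


# Toy sequence for illustration only (not from the study).
gc_skew, at_skew, gc_content = composition_stats("GGGCAAAT")
print(gc_skew, at_skew, gc_content)
```

Because both skews are computed on a single strand, the reverse complement of the same molecule gives values of equal magnitude and opposite sign, so the reported strand must be kept consistent across genomes.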
All genes, including 17 CDSs, seven tRNAs, and two rRNAs, are located on a single strand of the G-type G. sulphuraria 074W mitogenome except for the anticlockwise cob gene, whereas genes in the C-type Cz. merolae 10D mitogenome are distributed on both strands, as is usual (Fig. 4A). Given the directional distribution of genes and the extreme GC-skew, genes in G-type mitogenomes appear to be substantially strand-biased. The cob gene, which lies in antisense orientation in G-type species, has a lower GC-skew than the G-type mitogenome average (cob gene region GC-skew: 0.43–0.48, mitogenome GC-skew: 0.66–0.74; see Table 1, Figure S9) and shows a higher TIGER value than other genes, meaning that the cob gene contains fewer variable sites (TIGER value of cob gene: 0.766, average TIGER value: 0.630; see Figure S4). Based on these observations, we examined the potential impact of extreme GC-skew on G-type mitogenomes.
One of the key indicators for distinguishing leading and lagging strands is the GC-skew [41]. In most cases, positive GC-skew reflects the leading strand, whereas negative GC-skew represents the lagging strand [42, 43]. GC-skew analysis of the G-type shows all positive values, and its cumulative GC-skew increases gradually without any decreasing points, unlike in other red algal species including the C-type (Fig. 4A; Figure S9). Mitogenomes with a positive GC-skew on a single strand have been well studied for their replication system, particularly in human mitochondria. Although their precise replication mechanisms are still under investigation, it is accepted that they have a unique asymmetric replication process that involves one unidirectional leading strand and one unidirectional lagging strand assisted by RNAs, without any Okazaki fragments [44–46]. G-type species likely have an asymmetric replication mechanism: a guanine-rich leading strand (H-strand) and a cytosine-rich lagging strand (L-strand). After the separation of the L-strand from the H-strand, without a bidirectional replication fork, a daughter lagging strand is synthesized by a nascent leading strand and a nascent lagging strand synthesizes a daughter leading strand (Fig. 4B). The synthesis of the lagging strand is considered to be more accurate than that of the leading strand because of the short Okazaki fragments [47], which has been exemplified in some mitochondria (e.g., yeast, fish, and bacterial species) with experimental verification [48–50]. In addition, different gene substitution rates have been reported between the two DNA strands of the mitochondrial genome (e.g., higher in the lagging strand of fish) [51]. Taken together, G-type species have higher mutation rates, which are likely to be accelerated by unidirectional replication (Fig. 4B).
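The cumulative GC-skew pattern described above can be sketched as a running sum of per-window skews. The window size, function names, and toy sequences below are assumptions for illustration; the study's actual window settings are not given in this section.

```python
from itertools import accumulate


def windowed_gc_skew(dna, window=1000):
    """GC-skew, (G - C) / (G + C), in consecutive non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = dna.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_gc_skew(dna, window=1000):
    """Running sum of the windowed GC-skew along the sequence."""
    return list(accumulate(windowed_gc_skew(dna, window)))


def never_decreases(values):
    """True if the curve has no decreasing point."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


# Toy sequences for illustration only (not from the study).
one_strand_bias = "GGGC" * 3                # G-rich throughout
switching_bias = "GGGC" * 3 + "CCCG" * 3    # skew changes sign midway

print(cumulative_gc_skew(one_strand_bias, window=4))
print(never_decreases(cumulative_gc_skew(switching_bias, window=4)))
```

A curve that never decreases corresponds to the all-positive skew reported for the G-type, whereas a curve with a turning point is the pattern usually read as a switch between leading and lagging strands.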
On the basis of these findings, we propose that unidirectional replication in G-type mitochondria led to higher divergence than in the C-type and other non-cyanidiophycean red algae, which have bidirectional mitogenome replication. Because of mitochondrial replication system divergence, G-type mitogenomes may have different uses of the DNA polymerases described in Jain et al. (2015), resulting in sequence variation. Such accelerated mutation rates of the two DNA strands may potentially contribute to G-type mitogenomes having a higher fitness in rapidly changing environments and play a role in adaptation [52]. We speculate that extreme GC-skew in the G-type resulted in the use of unidirectional replication that led to changes in protein properties, affecting the fitness of mitochondrial proteins in harsh environments.