Genomic Characteristics
The Rhodotorula sphaerocarpa ETNP2018 genome was assembled into 115 scaffolds with a total size of 17.7 Mbp and a scaffold N50 of 377,844 bp (Table 1). The 6,451 gene models give a gene density of 364 genes/Mbp, comparable to values previously reported for marine fungi [73]. BUSCO estimated the genome to be 97.3% complete: of the 1,764 BUSCOs in the OrthoDB v10 database for Basidiomycota, 1,713 are present as single copies, 3 as duplicated copies, and 13 as fragments in the R. sphaerocarpa ETNP2018 genome. The assembly contains 5,324 eukaryotic orthologous group (KOG) assignments, 7,872 protein family (Pfam) domains, and 137 tRNAs. A total of 3,210 genes (49.8%) were annotated by KEGG orthology, including 172 CAZymes. The average genome size of all 15 Rhodotorula strains is 20.3 ± 1.6 Mbp, with an average of 3,275 ± 154 annotated proteins per genome (Table 1). The 15 representative strains average 5,610 ± 360 KOG assignments, 8,240 ± 418 Pfam domains, and 194 ± 19 CAZymes. The marine yeasts isolated from seawater (R. sphaerocarpa ETNP2018, R. sphaerocarpa GDMCC 60679, R. diobovata 08-225, and R. mucilaginosa CYJ03) have an average genome size of 19.0 ± 1.5 Mbp with 3,203 ± 99 annotated proteins (Table 1).
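The gene density and BUSCO completeness quoted above follow directly from the reported counts. The short Python sketch below reproduces that arithmetic. It assumes that completeness is defined as (single-copy + duplicated) / total BUSCOs and uses the rounded assembly size of 17.7 Mbp; the variable names are illustrative and are not taken from the original analysis.

```python
# Values reported in the text for R. sphaerocarpa ETNP2018.
gene_models = 6451
genome_size_mbp = 17.7          # rounded assembly size
busco_total = 1764
busco_single = 1713
busco_duplicated = 3
busco_fragmented = 13

# Gene density in genes per Mbp.
gene_density = gene_models / genome_size_mbp

# Assumed definition: complete = single-copy + duplicated.
busco_complete = busco_single + busco_duplicated
completeness_pct = 100 * busco_complete / busco_total

# BUSCOs that are neither complete nor fragmented.
busco_missing = busco_total - busco_complete - busco_fragmented

print(f"gene density: {gene_density:.0f} genes/Mbp")
print(f"BUSCO completeness: {completeness_pct:.1f}%")
print(f"missing BUSCOs: {busco_missing}")
```

Under these assumptions the counts are consistent with the 364 genes/Mbp and 97.3% figures given in the text.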
Table 1
General genomic characteristics of Rhodotorula sphaerocarpa ETNP2018 and fourteen representative Rhodotorula strains.
Source | Species | Strain | Total Length (Mbp) | GC (%) | Scaffolds | Scaffold N50 | Contigs | Contig N50 | Annotated Proteins | Gene Models | tRNA |
Seawater | R. sphaerocarpa | ETNP2018 | 17.7 | 63.4 | 115 | 377,844 | 120 | 356,447 | 3210 | 6451 | 137 |
Seawater | R. sphaerocarpa | GDMCC 60679 | 18.0 | 63.0 | 32 | 1,074,774 | 32 | 1,074,774 | 3213 | 6310 | 316 |
Seawater | R. diobovata | 08-225 | 21.1 | 67.0 | 361 | 118,648 | 678 | 82,556 | 3315 | 7741 | 183 |
Seawater | R. mucilaginosa | CYJ03 | 19.1 | 60.5 | 88 | 420,192 | 88 | 420,192 | 3073 | 6620 | 137 |
Marine Sediment | R. paludigena | P4R5 | 21.0 | 64.3 | 277 | 180,700 | 290 | 171,007 | 3406 | 7430 | 127 |
Freshwater | R. glutinis | ZHK | 22.3 | 67.8 | 30 | 1,466,672 | 49 | 963,562 | 3376 | 7569 | 154 |
Freshwater | R. kratochvilovae | YM25235 | 23.7 | 67.3 | 46 | 1,067,950 | 46 | 1,067,950 | 3581 | 8224 | 294 |
Acid Mine Drainage | R. taiwanensis | MD1149 | 19.6 | 61.7 | 181 | 388,693 | 227 | 345,821 | 3202 | 6961 | 113 |
Endophytic | R. kratochvilovae | Y14 | 22.0 | 67.5 | 46 | 1,029,848 | 46 | 1,029,848 | 3465 | 7958 | 194 |
Endophytic | R. graminis | WP1 | 21.0 | 67.8 | 26 | 1,420,730 | 322 | 167,431 | 3213 | 6875 | 152 |
Terrestrial (I.S.S) | R. mucilaginosa | F6_4S_B_2B | 20.2 | 60.6 | 199 | 432,962 | 222 | 353,031 | 3261 | 7040 | 124 |
Terrestrial | R. toruloides | NBRC 0880 | 20.7 | 61.8 | 30 | 1,390,799 | 30 | 1,390,799 | 3429 | 8284 | 128 |
Terrestrial | R. mucilaginosa | B1 | 20.0 | 60.6 | 225 | 256,462 | 225 | 256,462 | 3256 | 7052 | 113 |
Terrestrial | R. sp. | JG-1b | 19.4 | 67.0 | 156 | 301,937 | 171 | 280,417 | 3036 | 6396 | 115 |
Terrestrial | R. sp. | CCFEE5036 | 19.1 | 60.6 | 155 | 337,802 | 201 | 256,314 | 3096 | 6624 | 120 |
Average of all strains | | | 20.3 | 64 | 131 | 684,401 | 183 | 547,774 | 3275 | 7169 | 160 |
Stdev. of all strains | | | 1.6 | 3 | 103 | 493,511 | 168 | 425,961 | 154 | 662 | 64 |
Average of seawater strains | | | 19.0 | 63.5 | 149 | 497,865 | 230 | 483,492 | 3203 | 6781 | 193 |
Stdev. of seawater strains | | | 1.5 | 2.7 | 145 | 407,049 | 301 | 420,520 | 99 | 653 | 85 |
A one-sample t-test found the difference in genome size between R. sphaerocarpa ETNP2018 and the average of all strains to be significant (p = 1.059 × 10⁻⁵). R. sphaerocarpa ETNP2018 has the smallest genome of the marine strains; however, a one-sample t-test found this difference not to be significant (p = 0.099). By contrast, the difference was significant when compared with the mean genome sizes of freshwater, endophytic, and terrestrial strains (p = 0.04, 0.039, and 0.0008, respectively). The number of genes with functional annotations in R. sphaerocarpa ETNP2018 did not differ significantly from the average of all strains or from the marine, freshwater, endophytic, and terrestrial source category averages (p = 0.06, 0.44, 0.12, 0.57, and 0.24, respectively).
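The structure of such a one-sample t-test can be sketched with the rounded genome sizes from Table 1. This is only an illustration under stated assumptions: it treats all 15 strains as the sample and the ETNP2018 genome size as the reference value, which the text does not specify, and it uses sizes rounded to 0.1 Mbp. The function name `one_sample_t` is hypothetical. Only the t statistic and degrees of freedom are computed; a p-value would additionally require the Student t distribution from a statistics package, so the reported p-values are not reproduced here.

```python
import math
import statistics

# Total assembly lengths (Mbp) from Table 1, in row order.
genome_sizes_mbp = [17.7, 18.0, 21.1, 19.1, 21.0, 22.3, 23.7, 19.6,
                    22.0, 21.0, 20.2, 20.7, 20.0, 19.4, 19.1]
etnp2018_mbp = 17.7  # assumed reference value for the test


def one_sample_t(sample, reference):
    """Return (t statistic, degrees of freedom) for H0: mean(sample) == reference."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
    t_stat = (mean - reference) / (sd / math.sqrt(n))
    return t_stat, n - 1


t_stat, dof = one_sample_t(genome_sizes_mbp, etnp2018_mbp)
print(f"mean = {statistics.mean(genome_sizes_mbp):.1f} Mbp, "
      f"sd = {statistics.stdev(genome_sizes_mbp):.1f} Mbp")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The mean and standard deviation computed this way match the "all strains" summary rows of Table 1 to the stated precision.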
The genome of R. sphaerocarpa ETNP2018 was 10.8% smaller than the average genome size of five representative terrestrial Rhodotorula strains (Table 2). This 10.8% reduction was the largest among the marine Rhodotorula strains, suggesting that the environmental pressures of OMZ conditions favor fungal strains with smaller genomes relative to other marine environments. The degree of genome reduction observed in marine Rhodotorula strains was similar to that in the marine bacterium Pelagibacter ubique HTCC1062, a well-documented example of a streamlined microbe (Table 2) [74]. However, the genome reduction in the heterotrophic yeasts and bacterium was not as large as the 38% reduction found in the marine chemoautotrophic archaeon Nitrosopumilus maritimus SCM1 when compared with five terrestrial strains of ammonia-oxidizing archaea (Table 2; Table S2).
Table 2
The percent reduction in the genome size of marine Rhodotorula strains compared with the average of the representative terrestrial Rhodotorula strains. Pelagibacter ubique HTCC1062 and Nitrosopumilus maritimus SCM1 were chosen as well-known examples of streamlined marine microbes for comparison. P. ubique HTCC1062 was compared to five terrestrial strains of the Rickettsieae subclass and N. maritimus SCM1 to five representative ammonia-oxidizing archaea, all retrieved from the NCBI assembly database.
Species | Strain | Genome Size (bp) | Terrestrial Average (bp) | % Reduction |
R. sphaerocarpa | ETNP2018 | 17,716,787 | 19,867,559 | 10.83 |
R. sphaerocarpa | GDMCC 60679 | 18,031,004 | 19,867,559 | 9.24 |
R. paludigena | P4R5 | 21,142,622 | 19,867,559 | -6.42 |
R. mucilaginosa | CYJ03 | 19,073,214 | 19,867,559 | 4.00 |
Pelagibacter ubique | HTCC1062 | 1,308,759 | 1,413,279 | 7.40 |
Nitrosopumilus maritimus | SCM1 | 1,645,259 | 2,670,916 | 38.40 |
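The percent reduction column in Table 2 can be recomputed from the two size columns. The sketch below assumes the reduction is defined as (reference average − genome size) / reference average × 100; the function name `percent_reduction` is illustrative and not from the original analysis.

```python
def percent_reduction(genome_size_bp, reference_mean_bp):
    """Percent by which a genome is smaller than a reference mean size."""
    return 100 * (reference_mean_bp - genome_size_bp) / reference_mean_bp


# Genome sizes and reference averages (bp) as listed in Table 2.
table2 = {
    "R. sphaerocarpa ETNP2018": (17_716_787, 19_867_559),
    "R. sphaerocarpa GDMCC 60679": (18_031_004, 19_867_559),
    "R. paludigena P4R5": (21_142_622, 19_867_559),
    "R. mucilaginosa CYJ03": (19_073_214, 19_867_559),
    "Pelagibacter ubique HTCC1062": (1_308_759, 1_413_279),
    "Nitrosopumilus maritimus SCM1": (1_645_259, 2_670_916),
}

for strain, (size_bp, reference_bp) in table2.items():
    print(f"{strain}: {percent_reduction(size_bp, reference_bp):.2f}%")
```

With this definition, a genome larger than its reference average yields a negative value, that is, an expansion rather than a reduction.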
The numbers of KOGs related to translation and ribosomal biogenesis and to the transport and metabolism of amino acids, carbohydrates, lipids, secondary metabolites, and coenzymes were lower in the genome of R. sphaerocarpa ETNP2018 than in the other 14 Rhodotorula strains (Tables S3 and S4) [12]. Nevertheless, R. sphaerocarpa ETNP2018 retained the core set of protein-coding genes despite nutrient scarcity and its small genome (Figure S1), which is consistent with streamlining in response to nutrient deprivation [8]. Previous studies of streamlined microbes have found that average genome sizes were smallest for microorganisms isolated from oligotrophic seawater and largest for those isolated from soil [8]. Microorganisms isolated from freshwater exhibit a broad spectrum of genome sizes [8]. Consistent with this pattern, we find Rhodotorula genomes at both ends of the size spectrum. Among the 15 Rhodotorula strains, R. glutinis ZHK and R. kratochvilovae YM25235 had the largest genomes, and both were isolated from eutrophied environments (the Pearl River and Chenghai Lake, respectively) [15, 18]. The genomes of two soil strains, R. sp. JG-1b and R. sp. CCFEE5036, were smaller than the average genome size of all 15 strains. However, both of these strains were isolated from permafrost and hyper-arid soil in Antarctica’s McMurdo Dry Valleys [75, 76]. Their reduced genome size could be related to the extreme conditions and low nutrient availability of their environment.
Phylogeny
Phylogenomic analysis revealed several monophyletic clades based on species (Fig. 1). R. sphaerocarpa ETNP2018 was closely related to the other R. sphaerocarpa strain, which was isolated from aquaculture seawater in Maoming, Guangdong, China. R. toruloides NBRC 0880 represents the basal taxon of the 15 Rhodotorula genomes examined. Strains of the same Rhodotorula species have been isolated from drastically different environments (Fig. 1). This could be attributable to the ability of Rhodotorula yeasts to adapt to a diverse range of environmental conditions [14, 75]. R. mucilaginosa has been isolated from soil, both animal and plant microbiomes, industrial mineral deposits, the International Space Station, and the marine water column [19, 77]. Our phylogenomic reconstruction suggests that R. sp. CCFEE5036 and R. sp. JG-1b are both strains of the species R. mucilaginosa. Previous phylogenetic and phenotypic analyses, however, described R. sp. JG-1b as a novel species that comprises a sister clade of R. mucilaginosa; the divergence of R. sp. JG-1b from its sister clade could be attributed to genomic streamlining in response to its oligotrophic environment [20].
CAZymes
CAZymes identified in the R. sphaerocarpa ETNP2018 genome include 56 glycoside hydrolases (GH), 84 glycosyltransferases (GT), 17 related to auxiliary activities (AA), 3 carbohydrate binding module (CBM), 8 carbohydrate esterase (CE), and 4 polysaccharide lyase (PL) families (Table S5). Chitinase (GH18), xyloglucanase (GH16), β-hexosaminidase (GH20), both α- and β-glucosidase (GH3/31), invertase (GH32), α,α-trehalase (GH37), α-mannosidase (GH38/47), and cellulase (GH5) glycoside hydrolase CAZymes are all conserved across the Rhodotorula genus (Fig. 2). Therefore, Rhodotorula yeasts have the potential to digest chitin, xylan, hexoses, trehalose, some mannose, and cellulose [66]. Trehalase and glycogen debranching CAZymes (GH13) are also present in all 15 representative members of the Rhodotorula genus, indicating use of both storage carbohydrates (trehalose and glycogen) to maintain energy production in response to potential carbon deprivation (Fig. 2). The most prevalent and abundant glycoside hydrolase families include GH5, GH16, and GH18 (Figure S2).
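As a quick consistency check, the per-class CAZyme counts listed above can be summed and compared with the 172 CAZymes reported in the genome overview. The sketch below is illustrative only; the dictionary name is hypothetical.

```python
# CAZyme class counts for R. sphaerocarpa ETNP2018 as given in the text.
cazyme_counts = {
    "GH": 56,   # glycoside hydrolases
    "GT": 84,   # glycosyltransferases
    "AA": 17,   # auxiliary activities
    "CBM": 3,   # carbohydrate binding modules
    "CE": 8,    # carbohydrate esterases
    "PL": 4,    # polysaccharide lyases
}

total_cazymes = sum(cazyme_counts.values())
gh_share_pct = 100 * cazyme_counts["GH"] / total_cazymes

print(f"total CAZymes: {total_cazymes}")
print(f"glycoside hydrolase share: {gh_share_pct:.1f}%")
```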
Chitin is the most abundant biopolymer in marine environments and thus an important source of carbon and nitrogen for marine microbes [78]. Chitin is produced throughout the water column by fungi, protists, and crustaceans, yet it is utilized so rapidly that it is present only in trace concentrations in marine sediments. The importance of chitin as a nutrient source for marine fungi is relatively understudied, however, as most marine chitin degradation is attributed to bacteria [78]. Chitinase is conserved throughout the Rhodotorula genus. GH18 was most prevalent in the genomes of freshwater strains (7 ± 1.4) and least prevalent in the genomes of marine strains (4 ± 1.4). Chitin degradation via chitinase results in a wide array of oligomers including diacetylchitobiose. The endo-β-N-acetylglucosaminidase (GH85, EC: 3.2.1.96), which degrades diacetylchitobiose into monomeric residues of β-1,4-N-acetyl-D-glucosamine (GlcNAc) and subsequently soluble sugars and dissolved organic nitrogen, was present in all Rhodotorula genomes except for R. graminis WP1 and R. kratochvilovae YM25235 (Fig. 2) [78, 79].
GH2, a β-mannosidase, and GH76, an α-1,6-mannanase, are absent exclusively in the two genomes of R. sphaerocarpa strains isolated from seawater (Fig. 2). Polygalacturonase (GH28), β-glucuronyl hydrolase (GH88), chitosanase (GH75), and β-mannanase (GH26) CAZymes are absent in R. sphaerocarpa ETNP2018 as well as several other strains (Fig. 2). Mannanase and mannosidase allow yeast to ferment mannose to ethanol as a source of carbohydrates [80]; the reduction in mannose-hydrolyzing CAZymes in the R. sphaerocarpa ETNP2018 genome suggests that mannose is not a commonly utilized substrate for the strain. Mannans are typically found in plant vacuoles and the endosperm of seeds, as well as the cell walls of certain yeasts [80]. These sources place open ocean yeasts such as R. sphaerocarpa ETNP2018 far from a consistent supply of mannans, suggesting that the reduction in mannanase and mannosidase CAZymes is a response to the low encounter frequency for the substrate [80, 81].
A recent study demonstrated a positive correlation between a fungal strain’s repertoire of CAZymes and its saprophytic tendencies [82]. This suggests that strains which encode few CAZymes, such as those isolated from the water column, have weaker saprophytic tendencies and encounter fewer carbohydrates than those with higher counts, such as freshwater or endophytic strains. Fungi are the dominant detritivores in eutrophic freshwater ecosystems such as streams and wetlands, and endophytic fungi are reported to opportunistically utilize saprophytic feeding mechanisms after the death of their host plant [83, 84]. The availability of organic matter in these environments makes the synthesis of many different CAZymes more energetically favorable than in the oligotrophic open ocean. Given the low availability of organic matter in the open ocean water column, R. sphaerocarpa ETNP2018 likely streamlined its genome to eliminate unnecessary and biosynthetically expensive CAZymes.
Central Carbon Metabolisms
The Embden-Meyerhof-Parnas pathway, tricarboxylic acid (TCA) cycle, glyoxylate cycle, and pentose phosphate pathway were present in their entirety in the R. sphaerocarpa ETNP2018 genome (Figure S1). Potential substrates for these pathways include glucose, the cell’s preferred substrate, as well as acetate, ethanol, D-lactate, L-glutamine, and oxaloacetate (Fig. 3). The glyoxylate cycle, a secondary shunt of the TCA cycle localized in the peroxisome, utilizes isocitrate lyase (EC: 4.1.3.1) to cleave isocitrate into glyoxylate and succinate, with glyoxylate then converted to malate, bypassing the energy-intensive decarboxylation steps required to form S-succinyl-dihydrolipoamide-E from isocitrate during the TCA cycle (Fig. 2) [85]. Glyoxylate cycle genes have been shown to be upregulated in macrophage-engulfed Candida yeasts, concurrent with a downregulation of transcriptional machinery and glycolytic enzymes, allowing the cell to acquire carbon from sources other than glucose and to conserve energy [85]. This suggests preferential use of the glyoxylate cycle as a response to glucose deprivation, which is typical in the oligotrophic ocean.
Should physical processes transport R. sphaerocarpa ETNP2018 to the anoxic part of the water column, its genome shows the potential to ferment pyruvate via the enzyme pyruvate decarboxylase (EC: 4.1.1.1), creating acetaldehyde that can be further converted to acetate, ethanol, or a carboxylic acid (Fig. 2). Acetate is synthesized from acetaldehyde via aldehyde dehydrogenase (EC: 1.2.3.1) to replenish acetyl-CoA using its acetyl group (Fig. 2). Ethanol is synthesized via alcohol dehydrogenase (EC: 1.1.1.2) alongside the oxidation of NADH to NAD+, which replenishes the intracellular pool of this cofactor (Fig. 2; Fig. 3). Ethanol can then be excreted passively or converted to acetate in the peroxisome (Fig. 3). The small number of Pfam domains related to short chain dehydrogenase enzymes (PF00106), which are responsible for fermentative reactions on aldehydes and alcohols (Table 3), suggests that the niche of R. sphaerocarpa ETNP2018 is not the anoxic portion of the water column. Reduced fermentative machinery may instead be an adaptation to its oligotrophic yet oxygenated environment, where biosynthetic resources are at a premium and anaerobic metabolism acts as a last-resort response to unfavorable changes in conditions.
Table 3
Pfam domains with major depletions in the genome of R. sphaerocarpa ETNP2018 as compared to the average of all Rhodotorula strains studied.
Domain ID | Description | R. sphaerocarpa ETNP2018 | Avg | SD |
PF07690 | Major Facilitator Superfamily | 101 | 129.73 | 19.02 |
PF00172 | Fungal Zn(2)-Cys(6) binuclear cluster domain | 49 | 55.73 | 11.63 |
PF00106 | Short Chain Dehydrogenase | 37 | 45.27 | 6.05 |
PF13561 | Enoyl-(Acyl carrier protein) reductase | 35 | 44.40 | 5.95 |
PF08659 | KR Domain | 30 | 36.93 | 5.42 |
PF04082 | Fungal specific transcription factor domain | 21 | 29.67 | 3.96 |
PF12937 | F-box-like | 20 | 34.60 | 13.13 |
PF00646 | F-box domain | 14 | 22.20 | 4.78 |
PF01753 | MYND finger | 7 | 24.73 | 15.13 |
PF03171 | 2OG-Fe(II) oxygenase superfamily | 5 | 9.60 | 3.42 |
PF02668 | Taurine catabolism dioxygenase TauD, TfdA family | 5 | 9.93 | 3.63 |
PF01179 | Copper amine oxidase, enzyme domain | 0 | 2.07 | 1.03 |
PF02727 | Copper amine oxidase, N2 domain | 0 | 1.07 | 0.70 |
PF03452 | ANP1 (alpha-1,2-mannosyltransferase) | 0 | 1.67 | 0.72 |
PF02194 | PXA domain | 0 | 1.67 | 0.72 |
PF07683 | Cobalamin synthesis protein cobW C-terminal domain | 0 | 1.67 | 0.59 |
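One way to compare the depletions in Table 3 on a common scale is to express each ETNP2018 count as a standardized distance from the across-strain mean. The sketch below does this for a few rows. The z-score is an illustrative choice and is not stated in the source as the criterion used to select these domains; the function name `depletion_z` is hypothetical.

```python
# (ETNP2018 count, mean across strains, standard deviation) from Table 3.
pfam_counts = {
    "PF07690": (101, 129.73, 19.02),   # Major Facilitator Superfamily
    "PF00106": (37, 45.27, 6.05),      # Short Chain Dehydrogenase
    "PF04082": (21, 29.67, 3.96),      # Fungal specific transcription factor domain
    "PF01753": (7, 24.73, 15.13),      # MYND finger
}


def depletion_z(count, mean, sd):
    """Number of standard deviations the count lies from the mean."""
    return (count - mean) / sd


for domain, (count, mean, sd) in pfam_counts.items():
    print(f"{domain}: z = {depletion_z(count, mean, sd):.2f}")
```

A negative value indicates that the ETNP2018 count lies below the average of the strains studied.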
Transporters
Compared to other Rhodotorula strains, the genomes of both R. sphaerocarpa ETNP2018 and GDMCC 60679 contain particularly few Pfam domains assigned to the major facilitator superfamily (MFS) of transporters (PF07690) (Table 3), which play a significant role in the cross-membrane transport of organic solutes. This suggests that R. sphaerocarpa strains isolated from seawater have streamlined their genomes given the low substrate availability in the ocean. MFS transporters also symport H+ with siderophores, organometallic molecules formed by prokaryotes to sequester ferric iron [86, 87]. Although yeasts are considered to utilize siderophore assimilation as an opportunistic iron uptake mechanism [87], the low number of MFS transporters in the genomes of R. sphaerocarpa strains suggests a low competitiveness in siderophore acquisition, a tradeoff resulting from genome streamlining.
Nevertheless, the number of other metal transporters annotated by Pfam was not lower in the genomes of R. sphaerocarpa ETNP2018 and GDMCC 60679 than in the other strains investigated. High affinity iron permease (PF03239) was conserved across all fifteen representative Rhodotorula strains, suggesting that ferrous iron intake provides them with much of the iron required for protein synthesis. One copy of PF10566, which contains natural resistance associated macrophage protein (Nramp) transporters, was conserved across all 15 Rhodotorula strains, aside from R. toruloides NBRC 0880, which contains two, and R. kratochvilovae YM25235, which contains none. Nramp transporters, belonging to the Smf family of genes, are responsible for the cross-membrane transport of a variety of transition metals [88, 89]. Smf proteins demonstrate the highest affinity for Cu2+ and Mn2+ and are thought to be responsible for the high-affinity Mn2+ uptake system, but they also transport ferrous iron, nickel, cadmium, cobalt, and zinc [86, 88].
Nitrogen Assimilation
Ten of the fifteen representative Rhodotorula strains encode the genes for nitrate assimilation pathways (Fig. 2), through which nitrate and nitrite are transported into the cell by the nitrate/nitrite transporter narK and reduced to ammonium via the enzymes nitrate reductase (EC: 1.7.1.1) and nitrite reductase (EC: 1.7.1.4) (Fig. 3). Six of these ten strains were isolated from aquatic sources: two from freshwater, R. glutinis ZHK and R. kratochvilovae YM25235, and four from the marine environment, R. sphaerocarpa ETNP2018, R. paludigena P4R5, R. sphaerocarpa GDMCC 60679, and R. diobovata 08-225 (Fig. 2). The only aquatic yeast lacking this genetic potential is R. mucilaginosa CYJ03, isolated from the Yellow Sea, China (Fig. 2) [77].
Nitrate assimilation, in particular the reduction of nitrate in the cytosol, is an energetically expensive process [90]. Strains lacking the genetic potential to assimilate nitrate were largely isolated from environments where competition for resources is less intense and alternative sources of nitrogen (e.g. ammonium, urea) are likely readily available. R. mucilaginosa CYJ03 was isolated from the northern Yellow Sea, which has been eutrophied for decades [91]. It can therefore be inferred that R. mucilaginosa CYJ03 encounters comparatively high ammonium concentrations in the water column and that synthesizing nitrate reductase would waste biosynthetic resources.
Yeasts in the genus Rhodotorula have previously displayed the ability to grow on acetonitrile as a sole nitrogen source [92]. Nitrile hydratase (NHase) proteins, together with amidases (EC: 3.5.1.4), mediate a two-step metabolism of nitrile compounds such as acetonitrile to amides and acids; a second nitrile-hydrolyzing enzyme found in yeast, nitrilase (EC: 3.5.5.1), can perform the same reaction in one step. Either nitrilase or cyanoalanine nitrilase (EC: 3.5.5.4) was found in all Rhodotorula genomes except for the two R. sphaerocarpa strains, R. sp. CCFEE5036, and R. taiwanensis MD1149 (Fig. 2). However, none of the representative strains contained genes for NHase synthesis. Acetonitrile is predominantly released via terrestrial biomass burning and constitutes only a trace gas in the global atmosphere, placing open water R. sphaerocarpa strains far from stable sources of incorporable nitriles [93]. The combined absence of NHase, CobW, and nitrilase genes, seen only in R. sphaerocarpa ETNP2018 and R. sphaerocarpa GDMCC 60679, suggests that as the lineage diverged, R. sphaerocarpa strains did not retain CobW or nitrilase genes, potentially due to a lack of available nitrile compounds in the ocean.
Secondary Metabolisms
All Rhodotorula genomes contained between four and six biosynthetic gene clusters (BGCs), primarily from the categories of non-ribosomal peptide synthetase (NRPS) and terpene synthesis. The genome of R. sphaerocarpa ETNP2018 included NRPS-like clusters 1.1 and 6.1 as well as terpene clusters 8.1 and 9.1. The core biosynthetic gene of terpene cluster 9.1 functions in the formation of an isoprenoid biosynthetic complex. An NCBI BLAST search identified both lycopene β-cyclase (EC: 5.5.1.19) and phytoene synthase (EC: 2.5.1.32) domains in the complex. Both lycopene and phytoene derive from cytosolic acetyl-CoA via the mevalonate (MVA) pathway, which converts acetyl-CoA into isopentenyl diphosphate (Fig. 3) [94]. Phytoene is converted by phytoene desaturase (EC: 1.3.99.30) to lycopene, which can be further metabolized to form the carotenoid β-carotene via lycopene β-cyclase [94]. Yeast carotenoids are responsible for protection from over-exposure to ultraviolet light, in addition to proposed antimicrobial activity [95]. Rhodotorula have been shown to increase carotenogenesis as light intensity increases, indicating that the molecules have a photoprotective role in the cell [95, 96]. R. sphaerocarpa ETNP2018 also encodes five laccase (EC: 1.10.3.2) auxiliary activity CAZymes (AA1), the most of all fifteen Rhodotorula strains analyzed (Table S6). Fungal laccases are ligninolytic enzymes, yet are also known to function in plant pathogenicity, detoxification, and pigment modification [97]. Laccase degrades β-carotene and other carotenoids, indicating potential functions in UV protection and nutrient recycling via the breakdown of intracellular carotenoid pigments, especially given the low concentration of lignin found in the water column. R. kratochvilovae YM25235 encodes only one terpene BGC and two laccase CAZymes, despite having an exceptionally large genome and living in a eutrophic environment (Table S6) [18]. Light is attenuated more rapidly in eutrophic lakes than in pelagic seawater due to the presence of particulate and dissolved organic matter, meaning that lake yeasts would have less exposure to potentially harmful UV light and therefore a reduced requirement for pigmented molecules [98].