Genome evolution analysis of K. marxianus explores the genetic basis of its rapid growth
We sequenced the genome of the K. marxianus FIM1 strain used in this study and obtained 8 nuclear chromosome sequences with a total size of 10.9 Mbp. The genome was then annotated with the “Yeast Genome Annotation Pipeline” for gene identification and against the non-redundant (nr) database for functional annotation, yielding a GFF3 file (NCBI GenBank accession numbers CP015054 to CP015061).
To identify genes gained or lost during K. marxianus speciation within Saccharomycetales, a phylogenetic tree was first reconstructed (Fig. 1), which contains 12 species in Saccharomycetaceae, 1 in Phaffomycetaceae, and 1 in Dipodascaceae as outgroup. Here we introduce the concept of a ‘gene family’ (see Methods for the detailed derivation): each gene family is a set of homologous genes among those 15 yeast species. There were 636 gene families containing a single copy of the gene in each species. We concatenated those single-copy genes in each species into a super gene and reconstructed the phylogenetic tree (Fig. 1) using the maximum likelihood method. The derived tree was consistent with the reported tree that contains K. marxianus. The number of gained or lost gene families at each node (Fig. 1) was then calculated. The intense gene gain/loss events at the emergence of Saccharomycetaceae (node 3) and of Kluyveromyces (node 6) indicate that the formation of this family and genus may be substantially attributable to large changes in gene content.
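The selection of single-copy gene families and their concatenation into a super gene can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the species labels, and the toy identifiers are all hypothetical, and a real analysis would typically align each family across species before concatenation.

```python
from collections import Counter

def single_copy_families(families, species):
    """Return IDs of families with exactly one gene in every listed species.

    families: dict mapping family ID -> list of (species, gene_id) pairs.
    species:  iterable of all species that must be represented.
    """
    wanted = set(species)
    selected = []
    for fam_id, members in families.items():
        counts = Counter(sp for sp, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected.append(fam_id)
    return sorted(selected)

def concatenate_supergene(families, fam_ids, sequences, species_name):
    """Join one species' single-copy gene sequences in a fixed family order."""
    parts = []
    for fam_id in fam_ids:
        gene_id = next(g for sp, g in families[fam_id] if sp == species_name)
        parts.append(sequences[gene_id])
    return "".join(parts)

# Toy data (hypothetical identifiers, two species only).
toy_families = {
    "F1": [("Kmar", "k1"), ("Scer", "s1")],
    "F2": [("Kmar", "k2"), ("Kmar", "k3"), ("Scer", "s2")],  # duplicated in Kmar
    "F3": [("Kmar", "k4")],                                   # absent from Scer
    "F4": [("Kmar", "k5"), ("Scer", "s3")],
}
toy_sequences = {"k1": "ATG", "k5": "CCC", "s1": "ATGA", "s3": "CCA"}
toy_species = ["Kmar", "Scer"]

toy_ids = single_copy_families(toy_families, toy_species)
toy_supergene = concatenate_supergene(toy_families, toy_ids, toy_sequences, "Kmar")
```

Using the same sorted family order for every species keeps the concatenated positions comparable across species, which is what a downstream tree-building step would rely on.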
For K. marxianus, the genes gained at each node during its speciation were analyzed by functional category (Fig. 2). At nodes 2 and 3, the gained genes were mainly involved in transcription, cell cycle regulation, mitochondrial morphology, carbon metabolism, etc., suggesting that the ancestor of K. marxianus was endowed with the common physiological characteristics of Saccharomycetaceae. At nodes 4 ~ 7, the gained genes were mainly involved in DNA synthesis, chromosome segregation, ATP production, glucose transport, and lactose metabolism, implying that, along with the emergence of the Kluyveromyces genus, the cell cycle apparatus and ATP synthesis underwent further refinement that may contribute to growth acceleration, while resource catabolism may have adapted to environments characteristic of Kluyveromyces, such as milk. At the last node, genes involved in vacuole function and pre-mRNA processing were gained. However, a number of K. marxianus-specific genes still have completely unknown functions; their contribution to growth may be partly reflected by the RNA-seq analysis.
We further compared gene copy numbers across the yeast species (Fig. 3A). Compared with the other species, genes involved in flocculation, iron transport, and biotin biosynthesis have particularly high copy numbers in K. marxianus (11, 13, and 3, respectively), which may contribute to its fast growth trait.
RNA-seq analysis revealed that 40% of K. marxianus-specific genes were up-regulated and may participate in glucose transport and mitochondrial related function
To compare the expression patterns of homologous genes between species under the same cultivation condition, we chose the type strain S. cerevisiae as the reference and cultivated K. marxianus and S. cerevisiae in YPD medium at 30°C. RNA-seq analysis was then performed for each species at different time points during cultivation (1h, 4h, 6h, 12h, 24h, 48h, and 72h), chosen based on their growth curves (Additional file 1: Figure S1). Additional file 2 provides the expression FPKM values of all genes. The gfold algorithm was used to measure the differential expression of gene families (see Methods).
Among the gene families in Saccharomycetales, 55 contain only K. marxianus genes; these are called KMS (K. marxianus-specific) genes herein. As shown by the RNA-seq analysis (Fig. 3B), using 1h as control, 28 KMS gene families showed significant expression changes during cultivation, i.e. changes of 2-fold or more at some time point after 1h. Furthermore, 23 gene families were up-regulated (Fig. 3B), accounting for approximately 40% of the total KMS genes. To explore their possible functions, the Pearson correlation coefficient (denoted as r) was calculated between each KMS gene and each gene with known function, based on the hypothesis that strongly co-expressed genes are likely to function in the same or closely related biological pathways [17,18]. For each KMS gene, the gene with the highest r value (at least r > 0.8) was used to infer its possible function. The predicted functions of KMS genes and the corresponding r values are listed in Fig. 2B. The up-regulated KMS genes may mainly participate in glycogen metabolism, glucose transport, and mitochondria-related functions, while the down-regulated KMS genes may be involved in ergosterol, leucine, and purine biosynthesis, which offers a clue to the fast growth from the perspective of KMS genes.
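The co-expression inference described above can be illustrated with a short sketch. It assumes each gene is represented by a list of expression values over the same time points; the function names and the toy profiles are hypothetical, and returning 0.0 for a constant profile is a convention chosen here, not something stated in the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def best_coexpressed(kms_profile, annotated_profiles, threshold=0.8):
    """Return (gene, r) for the annotated gene with the highest r above threshold.

    Returns (None, None) when no annotated gene exceeds the threshold.
    """
    best_gene, best_r = None, threshold
    for gene, profile in annotated_profiles.items():
        r = pearson_r(kms_profile, profile)
        if r > best_r:
            best_gene, best_r = gene, r
    return (best_gene, best_r) if best_gene is not None else (None, None)

# Toy profiles over seven time points (hypothetical values).
toy_kms = [1, 2, 3, 4, 5, 6, 7]
toy_annotated = {
    "geneA": [2, 4, 6, 8, 10, 12, 14],  # rises with the KMS gene
    "geneB": [7, 6, 5, 4, 3, 2, 1],     # falls as the KMS gene rises
}
toy_match = best_coexpressed(toy_kms, toy_annotated)
```

The strict comparison against the threshold mirrors the r > 0.8 criterion; a KMS gene with no partner above the threshold simply receives no predicted function.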
Mitochondrial function related genes and highly copied genes were particularly up-regulated in K. marxianus compared to in S. cerevisiae
To compare homologous gene expression between K. marxianus and S. cerevisiae, we worked at the level of gene families, each of which contains the full set of homologous genes and is thus suited to the many-to-many correspondence of homologous genes between the two species. The expression value of a gene family in a species was the sum of the expression values of its member genes. There were 3653 gene families containing both K. marxianus and S. cerevisiae genes, of which 2535 were identified as differentially expressed in K. marxianus or S. cerevisiae, using the respective 1h sample as control. Those gene families were then clustered into 12 groups according to their expression patterns (Fig. 4, denoted as C1, C2, ..., C12) using the k-means method with Euclidean distance. The significantly enriched GO terms (p-value < 0.05) for each cluster are listed in Table 1 (more details are provided in Additional file 3).
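The family-level expression value can be sketched as below. This is only an illustration under stated assumptions: the function names and toy FPKM values are hypothetical, and the log2 ratio with a pseudocount is a simplified stand-in for the gfold statistic actually used in the study. The resulting per-family vectors are the kind of profile that a k-means step would then cluster.

```python
import math

def family_expression(family_genes, gene_fpkm):
    """Sum per-gene FPKM vectors into one vector for the gene family.

    family_genes: list of gene IDs of one species belonging to the family.
    gene_fpkm:    dict mapping gene ID -> list of FPKM values per time point.
    """
    n_points = len(gene_fpkm[family_genes[0]])
    totals = [0.0] * n_points
    for gene in family_genes:
        for i, value in enumerate(gene_fpkm[gene]):
            totals[i] += value
    return totals

def log2_fold_vs_first(values, pseudocount=1.0):
    """Log2 ratio of each time point to the first one (the 1h control).

    The pseudocount is an assumption made here to avoid division by zero.
    """
    base = values[0] + pseudocount
    return [math.log2((v + pseudocount) / base) for v in values]

# Toy family with two paralogs measured at two time points (hypothetical).
toy_fpkm = {"a": [1.0, 3.0], "b": [2.0, 4.0]}
toy_family_values = family_expression(["a", "b"], toy_fpkm)
toy_profile = log2_fold_vs_first(toy_family_values)
```

Summing over members means a family with many paralogs in one species and a single gene in the other can still be compared one-to-one at the family level.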
As illustrated in Fig. 4, at the early stage (i.e. time points 4h and 6h), the number of differentially expressed genes in K. marxianus is much higher than that in S. cerevisiae. Furthermore, K. marxianus and S. cerevisiae showed dramatic expression changes from 12h and 24h onward, respectively, in line with the time points at which OD600 rises quickly in the growth curves (Additional file 1: Figure S1).
Regarding the commonalities in expression pattern between K. marxianus and S. cerevisiae, most genes involved in ribosome biogenesis and protein translation were clearly down-regulated during cultivation in both species (C1 ~ C3 in Fig. 4), and genes participating in trehalose biosynthesis, TCA cycle, respiratory chain, and autophagy were heavily up-regulated in both (C9 ~ C11 in Fig. 4). Regarding the differences between the two species, genes for intracellular signal transduction (e.g. MAPK cascade) were unchanged in K. marxianus and up-regulated in S. cerevisiae (C12 in Fig. 4). Genes involved in DNA replication and cell cycle were down-regulated in K. marxianus and nearly unchanged in S. cerevisiae (C4 in Fig. 4). Notably, genes participating in biotin biosynthesis, iron transport, flocculation, respiratory chain, and mitochondrion assembly were up-regulated in K. marxianus but unchanged in S. cerevisiae (C7 in Fig. 4), which is consistent with the particularly high copy numbers in K. marxianus. Overall, these findings suggest that ribosome biogenesis, signal transduction, and cell cycle may not be key to the fast growth phenotype of K. marxianus; instead, mitochondrial function related genes and highly copied genes may strongly contribute to its fast growth.
Fundamental expression levels of genes involved in TCA cycle, respiratory chain, and ATP synthesis are higher in K. marxianus than in S. cerevisiae
Taking expression at 1h as the fundamental level of gene transcription, we compared this level between K. marxianus and S. cerevisiae. Using S. cerevisiae 1h as control, there were 357 gene families with significantly higher expression and 437 gene families with lower expression in K. marxianus 1h. The enriched GO terms for those genes are listed in Table 2 (see details in Additional file 4). Compared to S. cerevisiae, K. marxianus has enhanced fundamental expression in TCA cycle, respiratory chain, and ATP synthesis, while genes involved in cellular response to drug and amino acid transport had relatively lower expression. We further validated the fundamental expression comparison of respiratory chain genes and ATP synthesis genes by RT-qPCR analysis; the results (Fig. 5) were fully consistent with the findings derived from RNA-seq. Therefore, at the fundamental expression level, K. marxianus has remarkably higher expression of genes for mitochondrial function, including respiratory chain and ATP synthesis, which may greatly contribute to its fast growth.
K. marxianus and S. cerevisiae differ significantly in metabolic pathways during their fastest growth phases
According to the cultivation experiments (Additional file 1: Figure S1), the fastest growth phase was 6h ~ 12h for K. marxianus and 12h ~ 24h for S. cerevisiae. This was also confirmed by the RNA-seq analysis (Fig. 4), i.e. there was a dramatic genome-wide expression change between 6h and 12h for K. marxianus and between 12h and 24h for S. cerevisiae. We further analyzed the expression changes of homologous genes during the fastest growth phase.
For K. marxianus, gfold values at 12h were calculated with 6h as control, while for S. cerevisiae, gfold values at 24h were calculated with 12h as control. Gene families were then ordered according to the gfold values of K. marxianus and of S. cerevisiae, as shown in Fig. 6A and Fig. 6B, respectively. The major enriched GO terms for the up-regulated and down-regulated genes are also provided in Fig. 6 (details in Additional file 5). Gene expression changes during the fastest growth phase of K. marxianus were mainly consistent with those in S. cerevisiae, but there were also some inconsistencies (Fig. 6). The enriched GO terms for the consistent and inconsistent parts are listed in Table 3 (details in Additional file 6). Among the consistent parts, TCA cycle and respiratory chain were up-regulated in both K. marxianus and S. cerevisiae, while cell cycle related genes showed little expression change in either species. Among the inconsistent parts, glucose import, iron homeostasis, and ATP biosynthesis were up-regulated in K. marxianus but unchanged in S. cerevisiae, whereas genes for protein folding and response to heat were unchanged in K. marxianus but up-regulated in S. cerevisiae. These differences in metabolic pathways during the fastest growth phase may hold the key to the fast growth of K. marxianus.
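The split into consistent and inconsistent gene families can be sketched as follows. The sketch assumes that a gfold value of 0 marks a family as unchanged, with positive and negative values marking up- and down-regulation; the cutoff parameter, the function names, and the toy values are hypothetical.

```python
def direction(gfold, cutoff=0.0):
    """Classify a gfold value as 'up', 'down', or 'unchanged'."""
    if gfold > cutoff:
        return "up"
    if gfold < -cutoff:
        return "down"
    return "unchanged"

def compare_families(km_gfold, sc_gfold, cutoff=0.0):
    """Compare expression directions for families present in both species.

    km_gfold, sc_gfold: dicts mapping family ID -> gfold value over each
    species' own fastest growth phase.
    Returns a dict: family ID -> (K. marxianus direction,
    S. cerevisiae direction, 'consistent' or 'inconsistent').
    """
    result = {}
    for fam in sorted(set(km_gfold) & set(sc_gfold)):
        km_dir = direction(km_gfold[fam], cutoff)
        sc_dir = direction(sc_gfold[fam], cutoff)
        label = "consistent" if km_dir == sc_dir else "inconsistent"
        result[fam] = (km_dir, sc_dir, label)
    return result

# Toy gfold values (hypothetical family IDs and numbers).
toy_km = {"F1": 1.2, "F2": 0.0, "F3": 0.9, "F4": -0.5}
toy_sc = {"F1": 0.8, "F2": 0.0, "F3": 0.0, "F5": 1.0}
toy_comparison = compare_families(toy_km, toy_sc)
```

Families found in only one of the two inputs are dropped, since the comparison is only meaningful for gene families shared by both species.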
K. marxianus preferentially co-up-regulated respiratory chain, ATP synthesis, and glucose import during the fastest growth phase
We further compared the changes in central metabolic pathways between K. marxianus and S. cerevisiae during their fastest growth phases. As shown in Fig. 7, central metabolism was divided into material metabolism (left part, including glucose transport, glycolysis, and TCA cycle) and energy metabolism (right part, including respiratory chain, ATP synthesis, iron transport, and heme biosynthesis). In material metabolism, glucose import was particularly enhanced in K. marxianus but unchanged in S. cerevisiae, while glycolysis and TCA cycle behaved consistently in the two species. In energy metabolism, iron transport and heme biosynthesis, which support the respiratory chain, were specifically up-regulated in K. marxianus but unchanged in S. cerevisiae. The respiratory chain itself (not including the downstream ATP synthesis) was enhanced in both K. marxianus and S. cerevisiae, with a stronger up-regulation in K. marxianus. ATP synthesis via F0F1 ATPase was strengthened in K. marxianus but slightly down-regulated in S. cerevisiae, suggesting that the respiratory chain and ATP biogenesis are particularly tightly coupled in K. marxianus. Notably, as mentioned above, glucose transport was also clearly up-regulated during this phase in K. marxianus but not in S. cerevisiae. The implications of the co-up-regulation of glucose import and respiratory chain in K. marxianus are explored in detail in the Discussion.
To gain insight into the mechanism underlying the co-expression of respiratory chain and ATP synthesis in K. marxianus but not in S. cerevisiae, we analyzed the upstream regions of the genes involved. The 1 kb upstream regions of 27 respiratory chain genes and 14 ATP synthesis genes were analyzed with the online MEME software for motif discovery. Six significant motifs (E-value < 0.05) were found in those upstream regions in K. marxianus (Fig. 8A), but only one significant motif in S. cerevisiae (Fig. 8B). The sequences and locations of these motifs are provided in Fig. 8. Similarly, the 1 kb upstream regions of 27 respiratory chain genes and 9 glucose import genes were analyzed with MEME. Three significant motifs were detected in K. marxianus (Fig. 8C) and one in S. cerevisiae (Fig. 8D). These findings provide molecular support for the particular co-expression of glucose transport, respiratory chain, and ATP synthesis in K. marxianus, indicating the importance of regulatory region evolution during K. marxianus speciation.
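Extracting the 1 kb upstream region of a gene before motif discovery can be sketched as below. This is an illustrative helper, not the procedure documented in the study: it assumes 1-based inclusive gene coordinates as in GFF3, measures the region from the start of the annotated feature, and truncates it at chromosome ends. The function names and the toy chromosome are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_region(chrom_seq, start, end, strand, length=1000):
    """Return up to `length` bases upstream of a gene, read 5' to 3'.

    start, end: 1-based inclusive gene coordinates on the forward strand.
    strand:     '+' or '-'.
    The region is shortened if the gene lies near a chromosome end.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return chrom_seq[lo:start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), end + length)
        return reverse_complement(chrom_seq[end:hi])
    raise ValueError("strand must be '+' or '-'")

# Toy chromosome of 12 bases (hypothetical).
toy_chrom = "AAACCCGATTTT"
toy_plus = upstream_region(toy_chrom, 7, 9, "+", length=3)
toy_minus = upstream_region(toy_chrom, 4, 6, "-", length=3)
toy_truncated = upstream_region(toy_chrom, 2, 3, "+")
```

For a minus-strand gene the upstream region lies at higher forward-strand coordinates, so it is reverse-complemented to keep all promoter sequences in the same orientation relative to their genes.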