Comparative genomic analysis reveals metabolic mechanisms for Kluyveromyces marxianus’ fast growth during evolution

Using yeast fermentation to produce bioethanol is an economical and renewable way to address the rapid increase in fuel consumption. A faster cell growth rate ensures a better fermentation outcome. The “non-conventional” yeast Kluyveromyces marxianus is the fastest-growing eukaryote known. Despite its wide application in industry, the molecular mechanisms underlying its fast growth have seldom been studied. We carried out a comparative genome content analysis of K. marxianus evolution in Saccharomycetaceae and identified the genes gained and lost during K. marxianus speciation. RNA-seq analyses of K. marxianus and S. cerevisiae at different time points along cultivation were then performed to infer the function of K. marxianus-specific genes and to find the differences in homologous gene expression patterns between the two species. The results were further validated with RT-qPCR analysis. Genome content analysis shows that highly intense gene gain/loss events happened [...] particularly high [...] participating in [...]

A faster cell growth rate guarantees a superior fermentation outcome [6]. K. marxianus is the fastest-growing eukaryote known so far [7]. The fastest growth rate is 0.80 h⁻¹ for K. marxianus [7], 0.50 h⁻¹ for K. lactis [1], 0.37 h⁻¹ for Saccharomyces cerevisiae [1], and 0.18 h⁻¹ for Pichia pastoris [8]. Although K. marxianus is widely recognized for its fast growth, relatively little is known about the mechanisms underlying this phenotype.
To our knowledge, the literature reports only that, over pH-auxostat cultivation passages, K. marxianus raises its growth rate by enlarging its cell surface area with activated membrane processes [7]. However, how K. marxianus achieves faster growth than other yeast species, especially in terms of molecular genetics and the regulation of intracellular metabolic flow, has not been reported so far.
Currently, five K. marxianus strains have been completely sequenced: KCTC 17555 [9], DMB1 [10], CCT 7735 [11], DMKU 3-1042 [12], and NBRC1777 [13]. According to those genome data, K. marxianus has only a few specific genes of entirely unknown function that are not found in other species. The majority of K. marxianus genes have homologs with reported functions in other species (e.g. S. cerevisiae). What are the possible functions of the K. marxianus-specific genes, and how do the specific genes and the homologous genes contribute to the fast growth trait of K. marxianus? These questions need further exploration.
Transcriptome studies of K. marxianus have focused on its physiological properties, such as xylose catabolism [12,14], high temperature resistance [12], and ethanol tolerance [3]. These studies mainly analyzed gene expression changes of K. marxianus in different environments. In contrast, parallel comparison of homologous gene expression patterns between K. marxianus and other yeast species has rarely been reported. Such a comparison is challenging because homologous gene matches between species are often many-to-many. If only one-to-one matches are used, much important information is missed. We therefore use the concept of 'gene family' to approach this problem, as described in detail in the following parts.
In this study, we carried out complete genome sequencing and annotation of the K. marxianus FIM1 strain used in our laboratory. A comparative genomic analysis of K. marxianus evolution in Saccharomycetaceae was then performed to find the genes gained or lost during K. marxianus speciation and the genes with particularly high copy numbers. Furthermore, S. cerevisiae was chosen as the control strain, and RNA-seq analyses of K. marxianus and S. cerevisiae were performed at different time points along cultivation. We then compared homologous gene expression patterns and metabolic pathways between the two species to identify regulation patterns unique to K. marxianus. By combining the analysis of genome content evolution with that of homologous gene expression regulation, we hope to shed light on the fast growth of K. marxianus.

Results
Genome evolution analysis of K. marxianus explores the genetic basis of its rapid growth
We sequenced the genome of the K. marxianus FIM1 strain used in this study and obtained 8 nuclear chromosome sequences with a total size of 10.9 Mbp. The genome was then annotated via the "Yeast Genome Annotation Pipeline" [15] for gene identification and via the non-redundant (nr) database for functional annotation, and finally the GFF3 file was obtained (NCBI GenBank accession numbers CP015054 to CP015061).
To observe the genes gained or lost during K. marxianus speciation in Saccharomycetales, a phylogenetic tree was first reconstructed (Fig. 1), which contains 12 species in Saccharomycetaceae, 1 in Phaffomycetaceae, and 1 in Dipodascaceae as outgroup. Here the concept of 'gene family' was introduced (see Methods for a more detailed derivation): each gene family is a set of homologous genes among those 15 yeast species. There were 636 gene families containing a single gene copy in each species. We concatenated those single-copy genes in each species into a super gene and reconstructed a phylogenetic tree (Fig. 1) using the maximum likelihood method. The derived phylogenetic tree (Fig. 1) was consistent with the reported tree that contains K. marxianus [16]. The number of gained or lost gene families at each node (Fig. 1) was then calculated. The intense gene gain/loss events at the emergence of Saccharomycetaceae (node 3) and of Kluyveromyces (node 6) indicate that the formation of an evolutionary family or genus may be substantially attributed to large changes in gene content.
For K. marxianus, the genes gained at each node during its speciation were analyzed by functional category (Fig. 2). As shown in Fig. 2, at nodes 2 and 3 the introduced genes were mainly involved in transcription, cell cycle regulation, mitochondrial morphology, carbon metabolism, etc., suggesting that the ancestor of K. marxianus was endowed with the common physiological characteristics of Saccharomycetaceae. At nodes 4 ~ 7, the genes introduced were mainly involved in DNA synthesis, chromosome segregation, ATP production, glucose transport, and lactose metabolism. This implies that, along with the emergence of the Kluyveromyces genus, the cell cycle apparatus and ATP synthesis underwent further refinement, which may contribute to growth acceleration, while resource catabolism may have been adjusted to environments characteristic of Kluyveromyces, such as milk. At the last node, genes involved in vacuole function and pre-mRNA processing were introduced. However, there are still a number of K. marxianus-specific genes of entirely unknown function; their contribution to growth may be partly reflected by the RNA-seq analysis.
We further compared gene copy numbers across yeast species (Fig. 3A). Compared with other species, genes involved in flocculation, iron transport, and biotin biosynthesis have particularly high copy numbers in K. marxianus, i.e. 11, 13, and 3, respectively, which may contribute to its fast growth trait.
RNA-seq analysis revealed that 40% of K. marxianus-specific genes were up-regulated and may participate in glucose transport and mitochondrial related function
To analyze the differences in homologous gene expression patterns between species under the same cultivation conditions, we chose the type strain S. cerevisiae as reference and cultivated K. marxianus and S. cerevisiae in YPD medium at 30°C. RNA-seq analysis was then performed at different time points (1h, 4h, 6h, 12h, 24h, 48h, and 72h) during cultivation, chosen on the basis of their growth curves (Additional file 1: Figure S1). Additional file 2 provides the expression FPKM values of all genes. The gfold algorithm was used to measure the differential expression of gene families (see Methods).
Among the gene families in Saccharomycetales, 55 contain only K. marxianus genes; these are called KMS (K. marxianus-specific) genes here. As shown by the RNA-seq analysis (Fig. 3B), using 1h as control, 28 KMS gene families had significant expression changes during cultivation, i.e. a change of 2 fold or more at some time point after 1h. Of these, 23 gene families were up-regulated (Fig. 3B), accounting for approximately 40% of all KMS genes. To explore their possible functions, the Pearson correlation coefficient (denoted as r) was calculated between each KMS gene and each gene with known function, based on the hypothesis that strongly co-expressed genes are likely to function in the same or closely related biological pathways [17,18]. The gene with the highest r value (requiring r > 0.8) was used to infer the possible function of each KMS gene. The predicted functions of the KMS genes and the corresponding r values are listed in Fig. 2B. The up-regulated KMS genes may mainly participate in glycogen metabolism, glucose transport, and mitochondrial related function, while the down-regulated KMS genes may be involved in ergosterol, leucine, and purine biosynthesis, which gives a clue to the fast growth from the perspective of KMS genes.
Mitochondrial function related genes and highly copied genes were particularly up-regulated in K. marxianus compared to in S. cerevisiae
To analyze the differences in homologous gene expression between K. marxianus and S. cerevisiae, we used gene families, each of which contains a complete set of homologous genes and is thus suitable for the many-to-many correspondence of homologous genes between two species. The expression value of a gene family in a species was the sum of the expression values of the genes it contains. There were 3653 gene families containing both K. marxianus and S. cerevisiae genes, of which 2535 were identified as differentially expressed in K. marxianus or S. cerevisiae, using the respective 1h sample as control. Those gene families were then clustered into 12 groups according to their expression patterns (Fig. 4, denoted as C1, C2, ..., C12), using the k-means method with Euclidean distance. The significantly enriched GO terms (p-value < 0.05) for each cluster are listed in Table 1 (more details are provided in Additional file 3).
As illustrated in Fig. 4, at the early stage (i.e. time points 4h and 6h), the number of differentially expressed genes in K. marxianus is much higher than that in S. cerevisiae. Furthermore, K. marxianus and S. cerevisiae showed dramatic expression changes from 12h and 24h, respectively, in line with the time points at which OD600 increases rapidly in the growth curves (Additional file 1: Figure S1).
As for the common features of gene expression in K. marxianus and S. cerevisiae, most genes involved in ribosome biogenesis and protein translation were evidently down-regulated in both species during cultivation (C1 ~ C3 in Fig. 4), and genes participating in trehalose biosynthesis, TCA cycle, respiratory chain, and autophagy were heavily up-regulated in both (C9 ~ C11 in Fig. 4). As for the differences between the two species, genes for intracellular signal transduction (e.g. MAPK cascade) were unchanged in K. marxianus and up-regulated in S. cerevisiae (C12 in Fig. 4). Genes involved in DNA replication and cell cycle were down-regulated in K. marxianus and nearly unchanged in S. cerevisiae (C4 in Fig. 4). Notably, genes participating in biotin biosynthesis, iron transport, flocculation, respiratory chain, and mitochondrion assembly were up-regulated in K. marxianus but unchanged in S. cerevisiae (C7 in Fig. 4), which is consistent with their particularly high copy numbers in K. marxianus. Overall, these findings suggest that ribosome biogenesis, signal transduction, and cell cycle may not be key to the fast growth phenotype of K. marxianus; instead, mitochondrial function related genes and highly copied genes may strongly contribute to its fast growth.
Fundamental expression of genes involved in TCA cycle, respiratory chain, and ATP synthesis exhibit higher levels in K. marxianus compared to in S. cerevisiae
Taking expression at 1h as the fundamental level of gene transcription, we compared this level between K. marxianus and S. cerevisiae. Using S. cerevisiae 1h as control, there were 357 gene families with significantly higher expression and 437 gene families with lower expression in K. marxianus 1h. The enriched GO terms for those genes are listed in Table 2 (see details in Additional file 4). Compared to S. cerevisiae, K. marxianus has enhanced fundamental expression in TCA cycle, respiratory chain, and ATP synthesis, while genes involved in cellular response to drug and amino acid transport had relatively lower expression. We further validated the fundamental expression comparison of respiratory chain genes and ATP synthesis genes by RT-qPCR analysis, and the results (Fig. 5) were consistent with the findings derived from RNA-seq. Therefore, at the fundamental expression level, K. marxianus has remarkably higher expression of genes for mitochondrial function, including respiratory chain and ATP synthesis, which may greatly contribute to its fast growth.
K. marxianus and S. cerevisiae have significant difference in metabolic pathways during their fastest growth phase
According to the cultivation experiments (Additional file 1: Figure S1), the fastest growth phase was 6h ~ 12h for K. marxianus and 12h ~ 24h for S. cerevisiae. This was also confirmed by the RNA-seq analysis (Fig. 4), i.e. there was a dramatic genome-wide expression change between 6h and 12h for K. marxianus and between 12h and 24h for S. cerevisiae. We further analyzed homologous gene expression changes during the fastest growth phase.
For K. marxianus, gfold values at 12h were calculated with 6h as control, while for S. cerevisiae, gfold values at 24h were calculated with 12h as control. Gene families were then ordered according to the gfold values of K. marxianus and of S. cerevisiae, respectively, as shown in Fig. 6A and Fig. 6B. The major enriched GO terms for the up-regulated and down-regulated genes are also provided in Fig. 6 (details in Additional file 5). Gene expression changes during the fastest growth phase of K. marxianus were mainly consistent with those in S. cerevisiae, but there are also some inconsistencies (Fig. 6). The enriched GO terms for the consistent and inconsistent parts are listed in Table 3 (details in Additional file 6). Among the consistent parts, TCA cycle and respiratory chain were up-regulated in both K. marxianus and S. cerevisiae, while cell cycle related genes showed little expression change in either species. Among the inconsistent parts, glucose import, iron homeostasis, and ATP biosynthesis were up-regulated in K. marxianus but unchanged in S. cerevisiae, while genes for protein folding and response to heat were unchanged in K. marxianus but up-regulated in S. cerevisiae. These differences in metabolic pathways during the fastest growth phase may hold the key to the fast growth of K. marxianus.
K. marxianus preferentially co-up-regulated respiratory chain, ATP synthesis, and glucose import during the fastest growth phase
We further compared the changes in central metabolic pathways between K. marxianus and S. cerevisiae during their fastest growth phase. As shown in Fig. 7, the central metabolism was divided into material metabolism (left part, including glucose transport, glycolysis, and TCA cycle) and energy metabolism (right part, including respiratory chain, ATP synthesis, iron transport, and heme biosynthesis). For material metabolism, glucose import was particularly enhanced in K. marxianus but unchanged in S. cerevisiae, while glycolysis and TCA cycle changed consistently in the two species. For energy metabolism, iron transport and heme biosynthesis, which take part in the respiratory chain, were specifically up-regulated in K. marxianus but unchanged in S. cerevisiae. The respiratory chain itself (not including the downstream ATP synthesis) was enhanced in both K. marxianus and S. cerevisiae, with a stronger up-regulation in K. marxianus. ATP synthesis via F0F1 ATPase was strengthened in K. marxianus but slightly down-regulated in S. cerevisiae, suggesting that in K. marxianus the respiratory chain and ATP biogenesis are particularly tightly coupled. Notably, as mentioned above, glucose transport was also evidently up-regulated during this phase in K. marxianus but not in S. cerevisiae. The implications of the co-up-regulation of glucose import and respiratory chain in K. marxianus are explored in detail in the Discussion.
To gain insight into the mechanism underlying the co-expression of respiratory chain and ATP synthesis in K. marxianus but not in S. cerevisiae, we analyzed the upstream regions of the genes involved. The 1 kb upstream regions of 27 respiratory chain genes and 14 ATP synthesis genes were analyzed for motif discovery with the online MEME software [19]. There were six significant motifs (E-value < 0.05) located in those upstream regions in K. marxianus (Fig. 8A), but only one significant motif in S. cerevisiae (Fig. 8B). The sequences and locations of these motifs are provided in Fig. 8. Similarly, the 1 kb upstream regions of 27 respiratory chain genes and 9 glucose import genes were also analyzed by MEME. Three significant motifs were detected in K. marxianus (Fig. 8C) and one motif was identified in S. cerevisiae (Fig. 8D). These findings provide molecular support for the particular co-expression of glucose transport, respiratory chain, and ATP synthesis in K. marxianus, indicating the importance of regulatory region evolution during K. marxianus speciation.
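The extraction of a 1 kb upstream region, as used for the motif search above, can be sketched as follows. This is only an illustrative sketch, not the procedure used in this study: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF3, and the region is assumed to be truncated at the chromosome ends.

```python
# Hypothetical helpers; coordinates follow the 1-based, inclusive GFF3 convention.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chromosome, start, end, strand, length=1000):
    """Return up to `length` bases upstream of a gene, read 5'->3' on the gene's strand.

    The region is truncated (not padded) when the gene lies near a chromosome end.
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies before its start coordinate.
        lo = max(0, start - 1 - length)
        return chromosome[lo:start - 1]
    if strand == "-":
        # Upstream of a reverse-strand gene lies after its end coordinate
        # on the forward strand, so it is reverse-complemented.
        hi = min(len(chromosome), end + length)
        return reverse_complement(chromosome[end:hi])
    raise ValueError("strand must be '+' or '-'")
```

The returned sequences could then be written to a FASTA file and submitted to a motif discovery tool such as MEME.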

Discussion
In this study, based on the genome content analysis of K. marxianus evolution in Saccharomycetaceae and the transcriptome comparison between K. marxianus and S. cerevisiae, we found that the mechanisms for the fast growth of K. marxianus may be closely related to ATP production. This finding agrees with the widely recognized concept that ATP is the energy currency of living cells and the key driver of energy-consuming processes such as growth [20,21]. In the following, we discuss the traditional trade-off in ATP production and the new metabolic strategy by which K. marxianus overcomes it, as discovered in this work.
Cell growth rate is tightly connected with ATP production [21]. Generally, a higher growth rate relies on a higher ATP production yield as well as a higher production rate [21]. However, heterotrophic organisms usually face a trade-off between ATP production rate and yield [22]. In detail, when cells degrade substrates of higher free energy into products of lower free energy, the free energy difference between substrate and product is partly conserved in ATP production and partly dissipated to drive the degradation reaction. If a substrate (such as glucose) is catabolized through the 'glycolysis-respiratory chain' route (Fig. 9A, blue arrow), 32 ATPs are generated. In this case, the free energy difference is almost entirely conserved in ATP and the reaction is close to thermodynamic equilibrium, resulting in a very slow rate of substrate degradation (including glucose import), which hinders competition with other species for environmental resources. If cells adopt the 'glycolysis-fermentation' route for substrate degradation (Fig. 9A, red arrow), only 2 ATPs are produced; the rest of the free energy difference is used to drive the degradation reaction, triggering fast glucose import, which is a remarkable advantage in resource competition. However, such a low ATP yield can hardly supply sufficient energy for cell growth.
We found that K. marxianus may have developed two new strategies for ATP production during evolution to guarantee its fastest growth. (1) K. marxianus may co-up-regulate respiratory chain and glucose transport, which does not occur in S. cerevisiae (Fig. 7, Fig. 9B). This was supported by both the specific genes and the homologous genes. For KMS genes, according to their functions predicted from the Pearson correlation coefficient, genes possibly involved in respiratory chain and glucose transport were up-regulated in the fastest growth phase (Fig. 3B). For homologous genes, genes participating in respiratory chain and ATP synthesis were more highly expressed in K. marxianus than in S. cerevisiae at the fundamental level (Table 2, Fig. 5), while genes involved in respiratory chain (including iron transport and heme biosynthesis) as well as in glucose import were co-up-regulated in K. marxianus but not in S. cerevisiae during their fastest growth phase (Fig. 7). These findings suggest that K. marxianus may have gained a new strategy of coupling glucose import with respiratory chain, so that while ATP is produced at high yield (by respiratory chain), the available resources in the medium are also strongly absorbed (by glucose import). This particular co-up-regulation in K. marxianus was further supported by the larger number of motifs upstream of respiratory chain and glucose transporter genes in K. marxianus (Fig. 8C) than in S. cerevisiae (Fig. 8D), implying that K. marxianus may have acquired the new metabolic strategy through the evolution of gene regulatory regions. This is consistent with previous reports that cis-regulatory systems evolve dynamically in Ascomycete fungi [23].
(2) K. marxianus may tightly co-up-regulate respiratory chain and the ATP synthesis related F0F1 ATPase during the fastest growth phase (Fig. 7). In contrast, in the fastest growth phase of S. cerevisiae, respiratory chain was up-regulated but F0F1 ATPase was unchanged (Fig. 7). Notably, the up-regulation of genes involved in response to heat and protein folding in S. cerevisiae, which were unchanged in K. marxianus (Table 3), is consistent with the imperfect coupling of respiratory chain and ATP synthesis in S. cerevisiae. It has been reported that when respiratory chain and ATP synthesis are uncoupled, a large part of the free energy difference derived from the electron transport chain is transformed to heat [24], and this leads to the up-regulation of the cellular response to heat and of molecular chaperones that maintain protein folding [25]. The remarkably higher number of motifs upstream of respiratory chain and F0F1 ATPase genes in K. marxianus (Fig. 8A) than in S. cerevisiae (Fig. 8B) suggests that K. marxianus may have particularly optimized its regulatory regions to ensure highly efficient production of ATP.
Finally, in this work we attempted to predict the functions of K. marxianus-specific (KMS) genes. Based on the recognition that strongly co-expressed genes may be in the same or closely related pathways [17,18], we used the gene of known function with the highest Pearson correlation coefficient to infer the function of each KMS gene. The predicted functions of KMS genes were generally in accordance with the findings for homologous genes (Fig. 9B). However, we found that in some cases several genes with very similar correlation coefficients to a KMS gene had distinct functions. Therefore, there is still a long way to go to unravel the functions of KMS genes by experimental validation and to re-evaluate their contribution to the fast-growth phenotype.

Conclusions
In this study, based on a comparative genomic study of gene content and expression patterns, we found that the fast growth of K. marxianus may be attributed to the preferentially enhanced fundamental expression of TCA cycle and respiratory chain genes, as well as to the co-up-regulation of glucose transporter, respiratory chain, and ATP synthesis genes during the fastest growth phase. These conclusions underscore the importance of genome-wide rewiring of the transcriptional network during evolution. In summary, our findings provide theoretical support for the wide industrial application of K. marxianus and propose a practicable means of exploring the formation of complex phenotypes in a species by combining genome evolution and homologous gene expression analysis.

Methods
Yeast strains and culture conditions
The K. marxianus FIM1 strain used in this work was deposited at the China General Microbiological Culture Collection Center (CGMCC) under reference number 10621. The S. cerevisiae S288c strain was used in this study. FIM1 and S288c were transferred from YPD plates to 10 mL tubes containing 3 mL YPD medium (2% glucose, 2% peptone, 1% yeast extract). Strains were grown overnight at 30°C and 220 rpm, then transferred to 150 mL flasks with 50 mL YPD medium at a starting OD600 of 0.1. The flasks were then shaken at 30°C and 220 rpm.

Genome sequencing, assembly, and annotation
Nuclear DNA was extracted from K. marxianus FIM1 using standard protocols. DNA libraries harboring 300-bp, 3-kb, and 8-kb inserts were subsequently constructed. Paired-end sequencing of these DNA libraries was then performed on an Illumina HiSeq 2000 and a PacBio RS sequencer using standard protocols. Low quality short reads were trimmed or discarded using Trimmomatic. SOAPdenovo [26] was then applied to the short reads that passed the filtering procedure to generate scaffolds. Protein-coding genes were predicted from the assembled genome using the "Yeast Genome Annotation Pipeline" [15] and then searched with BLAST against the NCBI non-redundant (nr) protein database with an E-value cutoff of 10⁻⁵ for functional annotation. tRNAs were predicted by tRNAscan-SE [27] and rRNAs were predicted by Barrnap (https://github.com/tseemann/barrnap). The complete genome sequence and annotation GFF3 file of K. marxianus FIM1 have been deposited in NCBI Genome under GenBank accession No. CP015054 to CP015061.

Identification of gene families and phylogenetic analysis
To study the gain and loss of genes during K. marxianus evolution, we downloaded 14 genome sequences in Saccharomycetales (Fig. 1). Gene families were identified from all protein sequences in these genomes as follows. First, an all-against-all BLAST was applied to these protein sequences with E-values < 10⁻⁵. Second, global protein similarities were calculated using InParanoid [28], and matches with both sufficient gene coverage (> 75%) and alignment identity (> 50%) were kept for further analyses. Third, orthologue clusters, i.e. the 'gene families', were identified by comparing protein alignments using OrthoMCL [29]. Multiple sequence alignments of the proteins in each gene family were obtained using MUSCLE [30] with default parameters and were further trimmed with trimAl [31]. The gene families containing a single-copy gene in each genome were concatenated into a super gene for reconstruction of the phylogenetic tree, using the maximum likelihood method with 100 bootstrap replicates. Dollo parsimony analysis was performed on all the gene families of the 15 yeasts using the PHYLIP package to analyze the gain and loss of gene families at each node along the phylogenetic tree.
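The similarity filter, the selection of single-copy gene families, and their concatenation into a super gene can be illustrated with the minimal sketch below. It is not the pipeline used in this study (which relied on InParanoid, OrthoMCL, MUSCLE, and trimAl); the function names and the in-memory data layout are assumptions made purely for illustration.

```python
from collections import Counter


def passes_similarity_filter(coverage, identity, min_coverage=0.75, min_identity=0.50):
    """Keep a match only if coverage > 75% and identity > 50% (both strict)."""
    return coverage > min_coverage and identity > min_identity


def single_copy_families(families, species):
    """Return IDs of families with exactly one gene in every listed species.

    `families` maps a family ID to a list of (species, gene_id) pairs.
    """
    selected = []
    for family_id, members in families.items():
        counts = Counter(sp for sp, _gene in members)
        if all(counts[sp] == 1 for sp in species):
            selected.append(family_id)
    return sorted(selected)


def concatenate_supergene(alignments, family_ids, species):
    """Join the aligned sequences of the chosen families, per species.

    `alignments` maps family ID -> {species: aligned sequence}; families are
    concatenated in the order given by `family_ids`.
    """
    return {sp: "".join(alignments[fam][sp] for fam in family_ids) for sp in species}
```

The resulting per-species super gene sequences would then serve as input to a maximum likelihood tree builder.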

Gene copy number analysis
To analyze gene copy numbers among different species in Saccharomycetales, we first identified the replicated genes in K. marxianus as those sharing the same functional annotation within a gene family, then counted the copy numbers of those genes in other species based on their replication in the same gene family.
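A minimal sketch of this counting step is given below, assuming each gene family is available as a list of (species, functional annotation) pairs. The function name and data layout are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def copy_number_table(family, focal="K. marxianus"):
    """Count copies per species for annotations replicated in the focal species.

    `family` is a list of (species, functional_annotation) pairs for one gene
    family. An annotation is treated as replicated when the focal species
    carries more than one gene with that annotation.
    """
    focal_counts = Counter(ann for sp, ann in family if sp == focal)
    replicated = sorted(ann for ann, n in focal_counts.items() if n > 1)
    table = {}
    for ann in replicated:
        # Count how many genes with this annotation each species has.
        table[ann] = dict(Counter(sp for sp, a in family if a == ann))
    return table
```

Species with no gene for a given annotation are simply absent from the inner dictionary, which corresponds to a copy number of zero.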

Sample preparation for RNA-seq
To compare the RNA-seq data of K. marxianus and S. cerevisiae along their growth course, strains were first grown overnight in 50 mL YPD medium with agitation at 220 rpm and 30°C, then inoculated into a new flask containing 50 mL fresh YPD medium at an initial OD600 of 0.1. After that, 150 µL samples were collected at 1h, 4h, 6h, 12h, 24h, 48h, and 72h. Samples for RNA-seq analyses were taken in two biological replicates for both species. Total RNA was extracted using the ZR Fungal/Bacterial RNA MiniPrep™ (Zymo Research, CA). The samples were then sent to Genergy Biotechnology (Shanghai, China) for quality and quantity evaluation and sequencing.

RNA-seq analysis and differential expression identification
We obtained 15.1 million paired-end reads on average for each RNA sample. After initial QC, the short 150 bp reads were mapped to the reference genomes of K. marxianus FIM1 and S. cerevisiae S288c (Saccharomyces Genome Database) using HISAT2.1 [32]. Finally, the FPKM values for all genes were obtained (Additional file 2). Because the homologous genes of two species usually show a many-to-many correspondence, and a gene family contains a complete set of homologous genes, we used gene families to compare the expression of homologous genes between K. marxianus and S. cerevisiae. For each gene family in a species, its expression is the sum of the FPKM values of the genes it contains. The differential expression of each gene family was then calculated using the GFOLD algorithm [33], whose output is roughly equivalent to the raw log2 fold change but takes into account the uncertainty of gene expression measurement by RNA-seq and is thus more reliable than the raw fold change [33]. Changes of two fold or more, i.e. |gfold| >= 1, were defined as significant differential expression here. A gene family with gfold >= 1 is defined as up-regulated and one with gfold <= -1 as down-regulated. When analyzing gene expression changes within a species along cultivation time, expression at 1h was used as control and the gfold value of each gene family at each subsequent time point was calculated. When comparing fundamental expression between K. marxianus and S. cerevisiae, S. cerevisiae 1h was chosen as control and the gfold value for K. marxianus 1h vs S. cerevisiae 1h was calculated.
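The aggregation of FPKM values into gene family expression and the thresholding on |gfold| >= 1 can be sketched as follows. Note that this sketch does not implement GFOLD: as a labelled stand-in it uses a plain log2 fold change with a pseudocount, which ignores the measurement uncertainty that GFOLD accounts for. All function names and the pseudocount value are assumptions made for illustration.

```python
import math


def family_expression(gene_fpkm, family_to_genes):
    """Sum the FPKM values of the member genes of each family.

    Genes missing from `gene_fpkm` are treated as having FPKM 0.
    """
    return {
        family: sum(gene_fpkm.get(gene, 0.0) for gene in genes)
        for family, genes in family_to_genes.items()
    }


def log2_fold_change(treatment, control, pseudocount=1.0):
    """Raw log2 ratio with a pseudocount; a simple stand-in, NOT the GFOLD statistic."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))


def classify_change(value, threshold=1.0):
    """Label a family as 'up' (>= 1), 'down' (<= -1), or 'unchanged'."""
    if value >= threshold:
        return "up"
    if value <= -threshold:
        return "down"
    return "unchanged"
```

For a within-species comparison, `control` would be the family expression at 1h and `treatment` the expression at a later time point; for the fundamental-level comparison, `control` would be S. cerevisiae at 1h and `treatment` K. marxianus at 1h.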

RT-qPCR analysis
Gene expression levels were determined by real-time reverse transcription PCR (RT-qPCR). To compare the fundamental expression of genes between K. marxianus and S. cerevisiae, strains were diluted into YPD medium at an initial OD600 of 0.1, and cells were collected at 1 h. Total RNA was isolated using the ZR Fungal/Bacterial RNA MiniPrep kit (R2014, Zymo Research) and cDNA was obtained by reverse transcription using PrimeScript RT (RR037A, Takara). The cDNA was analyzed on a LightCycler 480 (Roche Applied Science, Germany) with SYBR Premix Ex Taq II (RR820A, Takara). The expression of individual genes was normalized against the level of 18S rRNA. Primers used for RT-qPCR are listed in Additional file 1: Table S1.

Function prediction for K. marxianus-specific genes based on correlation coefficient
Inspired by the use of a gene co-expression network to predict the function of hypothetical genes in Aspergillus niger [17], in this study we employed the Pearson correlation coefficient (denoted as r) to infer the function of K. marxianus-specific (KMS) genes. The coefficient was calculated between a KMS gene and a gene of known function across the time points during cultivation. If r > 0.8, the gene of known function was considered a candidate for function prediction; finally, the candidate with the highest r value was chosen for function inference.
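This inference rule can be sketched as below. It is a minimal illustration rather than the code used in this study: the function names and the data layout (a dictionary from gene ID to a function label and an expression profile over the time points) are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile has no defined correlation; treat it as uncorrelated.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predict_function(kms_profile, annotated, min_r=0.8):
    """Return (gene_id, function, r) for the best-correlated annotated gene.

    `annotated` maps gene_id -> (function, expression profile). Only genes
    with r > min_r are candidates; None is returned if there are none.
    """
    best = None
    for gene_id, (function, profile) in annotated.items():
        r = pearson_r(kms_profile, profile)
        if r > min_r and (best is None or r > best[2]):
            best = (gene_id, function, r)
    return best
```

Because only the single best-correlated gene is reported, ties or near-ties between genes of different function are not visible in the output; returning all candidates above the threshold would be one way to expose such cases.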

Additional file 5: Detailed information of enriched GO terms. A file containing enriched GO terms for the up-regulated and down-regulated genes in K. marxianus and S. cerevisiae during their fastest growth phase, respectively.
Additional file 6: Detailed information of enriched GO terms. A file containing enriched GO terms for the differentially expressed genes with consistent and inconsistent patterns between K. marxianus and S. cerevisiae during their respective fastest growth phase.

Figures

Figure 1
Phylogenetic tree of K. marxianus evolution in Saccharomycetales. The tree was generated based on the concatenated sequence of 636 single-copy genes without long-branch score heterogeneity. The bootstrap value of 100 at each node supports the inferred topology. The blue part contains Y. lipolytica in Dipodascaceae and P. pastoris in Phaffomycetaceae as outgroup. The purple and the brown parts represent the two major clades in Saccharomycetaceae, i.e., the one close to Saccharomyces and the one close to Kluyveromyces, respectively. At each node, the number in the circle denotes the node's serial number in the phylogenetic tree, and the numbers of gained and lost gene families are given in pink and blue boxes, respectively.

Figure 2
Functional categories of genes gained at each node along K. marxianus speciation. The number above each functional category column denotes the number of genes gained by K. marxianus that are involved in that functional process.

[...] inferred from the known function of the most highly correlated gene, are provided on the left, while the corresponding correlation coefficients are listed on the right.

Figure 4
Clustering expression patterns of homologous gene family in K. marxianus and S. cerevisiae. Gene families with a change of 2 fold or more at one or more time points after 1h in K. marxianus or S. cerevisiae are presented in this figure. The horizontal axis shows the time points along cultivation for K. marxianus (on the left, abbreviated as 'KM') and S. cerevisiae (on the right, abbreviated as 'SC'). Each line represents the differential expression (i.e. gfold value) of a gene family containing homologous genes in K. marxianus and S. cerevisiae, using the respective 1h sample as control. Using k-means based on Euclidean distance, gene families were clustered into 12 clusters, denoted as C1, C2, ..., C12 in the first column. The clusters C4, C6, C7, and C12 are apparently different between K. marxianus and S. cerevisiae.

Figure 5
RT-qPCR analysis of genes' fundamental expression in K. marxianus compared to S. cerevisiae. 'KM_1h' and 'SC_1h' denote gene expression at 1h in K. marxianus and S. cerevisiae, respectively. Using SC_1h as control, the ratio of KM_1h vs SC_1h is shown. Note that the vertical axis starts at 1, so that bars directly display higher expression in K. marxianus than in S. cerevisiae. Some respiratory chain genes (e.g. YNL134C) were nearly undetectable in S. cerevisiae at 1h and thus have no value in the figure.

Figure 7
Comparison of central metabolic pathways during fastest growth phase between K. marxianus and S. cerevisiae. For each gene, the nearby colour box denotes its up-regulation or down-regulation. The red system and green system stand for up-regulation and down-regulation in K. marxianus, respectively. The purple system and blue system stand for up-regulation and down-regulation in S. cerevisiae, respectively. The correspondence between colour and gfold value is quantified by the colour bars in the upper part of the figure. For changes in pathways, the hollow arrow, slant filled arrow, and blue solid arrow denote the generally down-regulated, unchanged, and up-regulated state of the pathway, respectively.

Figure 8
Comparison of the motifs detected in the 1 kb upstream regions of genes in K. marxianus and S. cerevisiae. Motifs in the 1 kb upstream regions of respiratory chain and ATP biosynthesis genes are detected in K. marxianus (A) and in S. cerevisiae (B). Motifs in the 1 kb upstream regions of respiratory chain and glucose transport genes are identified in K. marxianus (C) and in S. cerevisiae (D). In each subfigure, the sequence, significant E-value, and corresponding color of each motif are listed on the left, and the detailed motif locations are shown on the right.