Background: Methylotrophic yeasts (e.g., Pichia pastoris, Hansenula polymorpha, and Candida boidinii) are currently the subject of intense genomic study in both basic research and industrial applications. In the genus Ogataea, most research focuses on three basic O. polymorpha strains: CBS4732, NCYC495, and DL-1. However, these three strains are of independent origin, and their relationship is unclear. The high-yield engineered O. polymorpha strain HU-11 can be regarded as identical to CBS4732, because the only difference between them is a 5-bp insertion.
Results: In the present study, we assembled the full-length genome of O. polymorpha HU-11 using high-depth PacBio and Illumina data. Long terminal repeat (LTR) retrotransposons, rDNA, 5' and 3' telomeric, subtelomeric, low-complexity, and other repeat regions were curated to improve the genome quality. We used the full-length HU-11 genome sequence for genome annotation and comparison. In particular, we determined the exact locations of the rDNA genes and LTR retrotransposons in the seven chromosomes and detected large duplicated segments in the subtelomeric regions. Three novel findings are: (1) O. polymorpha NCYC495 is so phylogenetically close to CBS4732/HU-11 that the syntenic regions cover nearly 100% of their genomes with a nucleotide identity of 99.5%, while NCYC495 is significantly distinct from DL-1; (2) large segment duplication in subtelomeric regions is the main reason for genome expansion in yeasts; and (3) the duplicated segments in subtelomeric regions may be integrated at telomeric tandem repeats (TRs) through a molecular mechanism, which could be used to develop a simple and highly efficient genome editing system that integrates large segments into, or cleaves them from, yeast genomes.
Conclusions: Our findings provide new opportunities for an in-depth understanding of genome evolution in methylotrophic yeasts and lay the foundation for industrial applications of O. polymorpha HU-11 and CBS4732. The full-length genome of O. polymorpha strain HU-11 should be included in the NCBI RefSeq database for future studies of O. polymorpha CBS4732, NCYC495, and their derivative strains.