Comparative genomics uncovers genetic diversity and synthetic biology of secondary metabolite production of the genus Trametes

It is well established that the CAZyme genes of the genus Trametes contribute to the degradation of lignocellulose, including lignin and crystalline cellulose. However, the composition of CAZymes and the biosynthetic gene clusters (BGCs) of the genus Trametes have not been comprehensively analyzed. We conducted a comparative genomic analysis, detected CAZyme genes, and predicted BGCs for 9 Trametes strains. Among the 82,053 homologous clusters obtained for the genus Trametes, we identified 8,518 core genes, 60,441 accessory genes, and 13,094 strain-specific genes. A large proportion of CAZyme genes were classified as glycoside hydrolases, glycosyltransferases, and carbohydrate esterases. The predicted BGCs of the genus Trametes fell into 6 types, and the 9 Trametes strains harbored 47.78 BGCs on average. Our study shows that the genus Trametes has an open pan-genome structure, provides insights into its genetic diversity, and explores the synthetic biology of secondary metabolite production in the genus.


Introduction
The genus Trametes, a group of white-rot basidiomycetes belonging to the family Polyporaceae and the class Agaricomycetes, commonly grows in tiled layers on decaying wood [ 1 ] and on deciduous trees [ 4 ]. Although the CAZyme gene composition of a few Trametes species has been reported, a comprehensive analysis of the CAZyme genes of the genus Trametes is still lacking.
Secondary metabolites (also referred to as natural products or specialized metabolites) are the foundation of many drugs [ 15 ] and are also important chemicals widely used in agriculture and nutrition [ 16 ]. Fungi are well known as rich sources of thousands of secondary metabolites (SMs), which can be grouped into four main chemical classes: polyketides (PKS), terpenoids, shikimic acid-derived compounds, and nonribosomal peptides (NRPS) [ 17 ]. Fungal secondary metabolites play crucial roles in fungal development and actively shape interactions with other microbes [ 18 ]. A previous study reported that ascomycetes carry more genes coding for secondary metabolism than basidiomycetes, archiascomycetes, and chytridiomycetes [ 19 ]. In recent years, many fungal genomes have been sequenced, and genome mining tools such as antiSMASH [ 16 ] and ClusterFinder [ 20 ] have enabled researchers to analyze the biosynthetic gene clusters (BGCs) of secondary metabolites. The genomes of filamentous fungi have been reported to contain up to 90 potential BGCs encoding diverse secondary metabolites [ 21 ], and 24 genomes of the genus Penicillium were mined for BGCs, with their PKS and NRPS BGCs linked to known pathways [ 22 ]. However, no systematic comparative analysis of secondary metabolite BGCs in the genus Trametes has been reported.
In this study, we collected 9 available genome assemblies from 8 species of the genus Trametes and conducted a comparative genomics analysis to investigate their genetic diversity and the synthetic biology of secondary metabolite production. We constructed the pan-genome of the genus Trametes, annotated it against the COG and eggNOG databases, and detected the CAZyme genes. We also predicted the biosynthetic gene clusters of the 9 Trametes strains. Our results showed that the genus Trametes has an open pan-genome structure and that Trametes strains are genetically diverse. Differences in the distribution of CAZymes among Trametes strains suggest differences in their ability to utilize carbohydrates. The predicted biosynthetic gene clusters with unknown functions suggest that the genus Trametes has great potential for producing secondary metabolites.

Pan-genome construction and analysis
To conduct the pan-genome analysis, we collected 9 available genome assemblies from 8 species within the genus Trametes. The assembled genomes ranged from 31.62 Mb to 57.98 Mb, the number of contigs/scaffolds ranged from 13 to 10,327, and the number of CDSs/ORFs predicted by Prodigal ranged from 56,735 to 101,817 (Table 1). To characterize differences in genomic composition among these 9 strains, we identified orthologs among the 610,421 high-quality proteins of the genus Trametes. In total, we obtained 82,053 homologous clusters, and the homologous cluster accumulation curve showed that the genus Trametes has an open pan-genome structure (Figure 1a).
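An accumulation curve of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the homologous clusters have already been converted into a hypothetical mapping from each strain to the set of cluster IDs in which it has at least one protein, and it averages pan- and core-genome sizes over random orders of adding strains (the number of permutations and the seed are arbitrary choices).

```python
import random
from typing import Dict, List, Set, Tuple


def accumulation_curves(
    clusters_by_strain: Dict[str, Set[str]],
    n_permutations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Mean pan- and core-genome sizes after adding 1..N strains.

    clusters_by_strain (hypothetical input) maps a strain name to the set of
    homologous-cluster IDs that contain at least one of its proteins.
    """
    rng = random.Random(seed)
    strains = list(clusters_by_strain)
    n = len(strains)
    pan_totals = [0.0] * n
    core_totals = [0.0] * n
    for _ in range(n_permutations):
        order = strains[:]
        rng.shuffle(order)  # random order in which strains are "sequenced"
        pan: Set[str] = set()
        core: Set[str] = set()
        for i, strain in enumerate(order):
            clusters = clusters_by_strain[strain]
            # Pan-genome: union of clusters seen so far (never shrinks).
            pan |= clusters
            # Core genome: clusters shared by every strain added so far (never grows).
            core = set(clusters) if i == 0 else core & clusters
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)
    pan_mean = [t / n_permutations for t in pan_totals]
    core_mean = [t / n_permutations for t in core_totals]
    return pan_mean, core_mean
```

Once all strains have been added the order no longer matters, so for the 9 Trametes strains the last points of the two curves should equal the reported pan-genome and core-genome sizes (82,053 and 8,518), provided the mapping covers every cluster.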
The size of the pan-genome of the genus Trametes increased gradually as strains were added, reaching 82,053 non-redundant genes across the 9 strains (Figure 1a). Conversely, the size of the core genome decreased progressively as strains were added, comprising 8,518 non-redundant genes across the 9 strains (Figure 1a).
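The text does not state how openness was judged from the curve. One common convention in pan-genome studies, assumed here rather than taken from the authors' methods, fits a power law n = κN^γ to the pan-genome size n after adding N genomes and reads a positive exponent γ, meaning the curve keeps rising, as a sign of an open pan-genome. The sketch below estimates κ and γ by ordinary least squares on log-transformed values; the function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(n_genomes: Sequence[int], sizes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(size) = log(kappa) + gamma * log(N).

    Returns (kappa, gamma). Under this model a gamma above 0 means the
    pan-genome keeps growing as genomes are added (open), while a gamma
    near 0 means it saturates (closed).
    """
    if len(n_genomes) != len(sizes) or len(n_genomes) < 2:
        raise ValueError("need at least two (N, size) points of equal length")
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(s) for s in sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct genome counts")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    gamma = sxy / sxx                          # slope on the log-log scale
    kappa = math.exp(y_mean - gamma * x_mean)  # back-transformed intercept
    return kappa, gamma
```

The inputs would be the mean pan-genome sizes for N = 1 to 9 strains taken from the accumulation curve. A log-log linear fit is the simplest option; a nonlinear fit on the original scale weights large N differently and can give a slightly different exponent.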
In addition, we obtained 60,441 accessory genes and 13,094 strain-specific genes in the genus Trametes. The number of accessory genes per strain ranged from 17,587 to 31,856. Although the genome size of T. cinnabarina FP 104138-Sp is the smallest (Table 1)

Phylogenetic analysis of Trametes pan-genome
The nuclear ribosomal internal transcribed spacer (ITS) region is well established as the primary fungal barcode marker [ 23 ] and is useful for identifying the broadest range of fungi [ 24 ]. To analyze the phylogenetic relationships among the 9 Trametes strains, we built phylogenetic trees based on the ITS1 and ITS2 sequences, respectively, using Coriolopsis caperata as the outgroup to root the trees. The ITS1 and ITS2 sequences showed strong power to differentiate the 9 strains of the genus Trametes (Figure 1c).
To explore the functional catalogue of the pan-genome, we also conducted a GO analysis. Genes were categorized by biological process, cellular component, and molecular function. A large number of pan-genome genes were involved in various enzymes, metabolic pathways, and biological processes. Specifically, in the biological process category, many genes were classified as biosynthetic process (2,265), response to stress (2,186), and small molecule metabolic process (1,882) (Figure 2b). In addition, a total of 606 orthologous genes contributed to the carbohydrate metabolic process (Figure 2b). In the molecular function category, the majority of genes were grouped into binding (such as ion binding, DNA binding, and RNA binding) and enzyme activities (such as ATPase activity, oxidoreductase activity, and kinase activity), while various genes were also enriched in other categories, including transporter activity and enzyme regulator activity (Figure 2b). In the cellular component category, higher proportions of genes were assigned to cell (8,492), intracellular (8,386), and cytoplasm (7,363) (Figure 2b). Across these categories, accessory clusters were more highly represented than core and specific genes (Figure 2b). We therefore speculate that the diversity of accessory clusters may contribute to functional diversity, allowing these strains to exploit the resources of their surroundings and adapt better to their environment.
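The text reports enrichment results without naming the statistical test. A one-sided hypergeometric test is a common choice for GO term enrichment; the sketch below assumes it, with the whole pan-genome as the background and a subset such as the accessory clusters as the test set. The function name and argument order are illustrative, and only the standard library is used (math.comb, Python 3.8+).

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background (e.g. the whole pan-genome)
    K: background genes annotated with the GO term of interest
    n: genes in the tested subset (e.g. accessory clusters)
    k: subset genes annotated with that GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k or more annotated genes in the subset;
    # comb returns 0 when the second argument exceeds the first.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

When many GO terms are tested at once, the resulting p-values would normally be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.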

Identification of CAZymes for Trametes genus
White-rot basidiomycetes are well established as the most efficient degraders of lignin [ 25 ]. Members of the genus Trametes, a lineage of white-rot basidiomycetes, such as T. villosa [ 25 ] and T. gibbosa BRFM 952 [ 26 ], have been reported to show unexpectedly high activity on lignin or crystalline cellulose. A previous study showed that carbohydrate-active enzymes (CAZymes) play a central role in the degradation of glycoconjugates, oligosaccharides, and polysaccharides [ 27 ]. Hence, to obtain a systematic understanding of the CAZymes of the genus Trametes, we annotated the pan-genome of the genus Trametes against the CAZy database and grouped CAZyme genes into CAZyme families and CBMs based on family-specific HMMs [ 28 ]. In total, we identified 280 orthologous genes (Figure 3a) and grouped them into 87 CAZyme families. A large proportion of these orthologous genes were classified as glycoside hydrolases (GHs, 35.36%), glycosyltransferases (GTs, 21.07%), and carbohydrate esterases (CEs, 13.57%) (Figure 3a). GHs hydrolyze the glycosidic bond between two or more carbohydrates [ 27 ]; they contribute the largest number of catalytic enzymes and are involved in the degradation of lignocellulose [ 29 ]. Moreover, accessory clusters contained more CAZymes than core clusters and specific genes (Figure 3a).
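Class-level percentages like those above follow directly from the family label assigned to each CAZyme gene. The sketch below assumes one CAZy family label per gene in the usual CAZy naming style (for example GH5_7 or CBM1) and maps each label to its class by prefix; it does not perform the HMM search itself, and the function names are illustrative. If the denominator is the 280 orthologous genes, the reported 35.36%, 21.07%, and 13.57% correspond to 99, 59, and 38 genes, respectively.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# CAZy class prefixes followed by a family number, e.g. "GH5_7" -> "GH", "CBM1" -> "CBM".
_CAZY_CLASS = re.compile(r"^(GH|GT|PL|CE|AA|CBM)\d")


def cazy_class(family: str) -> str:
    """Return the CAZy class of a family label, or "other" if unrecognized."""
    match = _CAZY_CLASS.match(family)
    return match.group(1) if match else "other"


def class_percentages(families: Iterable[str], ndigits: int = 2) -> Dict[str, float]:
    """Percentage of genes in each CAZy class, given one family label per gene."""
    counts = Counter(cazy_class(f) for f in families)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {cls: round(100 * c / total, ndigits) for cls, c in counts.items()}
```

Genes carrying several domains (for example a GH module plus a CBM) would need a rule for which label counts, which is one reason class totals can differ between studies.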

Identification of biosynthetic gene clusters (BGCs) of Trametes genus
To better understand secondary metabolism in the genus Trametes, we applied antiSMASH to predict biosynthetic gene clusters (BGCs). The BGCs of the genus Trametes fell mainly into 6 types: putative clusters of unknown type identified with the ClusterFinder algorithm (cf_putative), terpene clusters (terpene), nonribosomal peptide synthetase clusters (nrps), putative fatty acid clusters identified with the ClusterFinder algorithm (cf_fatty_acid), type I PKS clusters (t1pks), and lanthipeptide clusters (lantipeptide) (Figure 4). Specifically, the 9 strains of

Gene prediction and orthologs identification
We annotated the genomes of the 9 strains using Prodigal [ 30 ] (version 2.6). Specifically, we ran Prodigal with default settings to predict open reading frames (ORFs) and protein sequences for the 9 Trametes genomes [ 31 ]. We identified protein orthologs of the genus Trametes using OrthoMCL [ 32 ] (version 2.0.9) with an e-value < 1e-5 and an inflation parameter of 1.5. We divided the homologous clusters into three groups: core, accessory, and specific. Core genes or proteins are those shared by all 9 Trametes genomes used in our study, while accessory genes or proteins are shared by at least two but not all 9 strains. The remaining genes or proteins, which occur in only one strain, were assigned to the specific group (strain-specific genes or proteins). We then obtained the protein sequences of the homologous clusters of the genus Trametes.
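The core/accessory/specific split defined above depends only on how many strains each cluster spans. The sketch below assumes an OrthoMCL-style groups file in which each line reads like "group1: strainA|prot1 strainB|prot7", with the strain code before the "|"; proteins that OrthoMCL leaves ungrouped would have to be added as one-strain clusters before counting. The function names are illustrative.

```python
from typing import Dict, Iterable, Set


def parse_groups(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each cluster ID to the set of strains contributing a protein.

    Assumes lines of the form "group1: strainA|prot1 strainB|prot7".
    """
    clusters: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cluster_id, members = line.split(":", 1)
        # Keep only the strain code in front of "|" for each member protein.
        clusters[cluster_id] = {m.split("|", 1)[0] for m in members.split()}
    return clusters


def classify(clusters: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each cluster core, accessory, or specific by the number of strains it spans."""
    labels: Dict[str, str] = {}
    for cluster_id, strains in clusters.items():
        if len(strains) == n_strains:
            labels[cluster_id] = "core"       # present in every strain
        elif len(strains) >= 2:
            labels[cluster_id] = "accessory"  # at least two, but not all, strains
        else:
            labels[cluster_id] = "specific"   # a single strain
    return labels
```

Because every cluster receives exactly one label, the three group sizes must add up to the total number of clusters. The reported Trametes counts satisfy this check: 8,518 + 60,441 + 13,094 = 82,053.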

Phylogeny analysis
The nuclear ribosomal internal transcribed spacer (ITS) region has been reported as the primary fungal barcode marker [ 23 ], and ITS1 or ITS2 is widely used to identify the broadest range of fungi [ 24 ]. Hence, we applied ITSx [ 33 ], an open-source utility that extracts the highly variable ITS1 and ITS2 subregions from ITS sequences, to extract the ITS1 and ITS2 sequences from the genomes of the 9 Trametes strains. We used the ITS1 and ITS2 sequences of Coriolopsis caperata (accession number: AB158316) as the outgroup. Because the ITS1 and ITS2 sequences of Trametes coccinea could not be extracted, we used the ITS1 and ITS2 sequences of Pycnoporus coccineus strain MUCL

Table 1 note: * indicates CDSs/ORFs predicted by Prodigal.

The probability and homologous gene clusters for three putative BGCs of T. pubescens.

The probability and homologous gene clusters for the putative BGCs