Global transcriptomic analysis
To describe the gene expression patterns during the development, 18 libraries were constructed using samples from six growth stages of Sarcomyxa edulis(Fig. 1), with three biological replicates per sample. A total of 774.59 million raw reads were generated by Illumina paired-end sequencing. After cleaning and quality checks, 742.29 million clean reads were obtained, with an average of 41.23 million reads per copy (Table S1). More than 77.56% of the reads per replicate could be mapped to the S. edulis genome. The Q30 percentage of all sequences in the 18 libraries were over 91%.
Gene expression level analysis
By calculating the FPKM value of different genes, the number of genes and the expression level of single gene under different expression levels were analyzed. Generally, it is considered that the value of FPKM is greater than 1, indicating that the gene is expressed, and the higher the value of FPKM, the higher the level of gene expression. The results of gene expression level analysis showed that the median of 18 samples was relatively concentrated, and the value of log10 (FPKM) ranged from 3.82 to4.59. The highest expression in B2 was 16.13(Fig. 2, Table S2)
Identification of differentially expressed genes across various developmental stages
Differentially expressed genes (DEGs) across six stages were identified based on FPKM values with a Q-value of 0.05 and a|log 2 (Fold Change)| of 1 set as thresholds (the adjustment range of the difference multiple was defined as greater than or equal to 2, and the adjustment range of the significance threshold was defined as less than or equal to 0.05). As shown in Fig. 3A, the largest number of differentially up-regulated genes were identified between the B5 and B6 stages (3171), followed by B4 vs. B5(2478) and B1 vs. B2 (2243) (Fig. 3a). These results revealed that the expression profiles of the primordium (B4) and mature fruiting body (B6) are significantly different from mycelium (B5). Probably because they're different organizations. The greatest number of unique DEGs were found in the comparison between B5 and B6 (903), while only 153 DEGs were unique to the B3 vs. B4 comparison (Fig. 3B). There were 215 shared DEGs among all five comparisons of six stages and these were enriched primarily with the response to catalytic activity (GO:0003824) (Fig. S2).
Functional classification of differentially expressed genes
All DEGs were classified into three categories: biological process (BP), cellular component (CC)and molecular function (MF). The significantly enriched terms were almost the same for the B1 vs. B2, B2 vs. B3, B3 vs. B4, B4 vs. B5 and B5 vs. B6 comparisons (Fig. 4A-E). All of the significantly enriched terms were assigned into metabolic process(BP), cell (CC) and molecular function (MF).
The DEGs were mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and tested for enrichment to further research their functions. The significantly enriched pathways were almost the same for the B1 vs. B2, B2 vs. B3, B3 vs. B4, B4 vs. B5 and B5 vs. B6 comparisons. All of the significantly enriched pathways were assigned into Global and overview maps and Carbohydrate metabolism, they are all part of the Metabolism (Table S3).
Gene co-expression networks construction
Because the results of network module analysis are easily affected by outliers, outliers must be removed before building the network module to ensure the accuracy of the results. There were 18 samples in this study, which were clustered by calculating the correlation coefficient of each sample expression level (Figure 5A). The Pearson’s correlation coefficient of gene expression between replicates for each sample was shown in Figure S2. Samples with low correlation, or samples that cannot be clustered on the tree graph, are outlier samples, such as B6-1, B4-2 and B3-2. After the outliers are removed, the clustering tree of the remaining samples is shown in Figure 5B.
For the selection of power value, in general, we will take the minimum power value when the correlation coefficient reaches the plateau period (or greater than 0.8) as the parameter for subsequent analysis (as shown on the left of Figure 6), and we will count the changes of average gene connectivity under different power values (as shown on the right of Figure 6). According to the results of Figure 6, the power value parameter selected for this analysis is 8. The selected characteristic parameters for this analysis are shown in the figure below.
This analysis identified 19 different modules(marked with different colors, Fig. 7A), in which each branch of the gene clustering tree corresponds to one module. The 19 modules correlated with different development stages due to stage-specific expression profiles. The number of genes in each module is shown in Fig. 7B. The turquoise module, with 2312 identified genes, has the most genes. The grey module has the least genes, with 1 identified gene.
Using the correlation analysis between module eigenvalues and specific traits and phenotype data, find out the modules most related to traits and phenotypes for further analysis. The matrix of four physiological and biochemical traits (Laccase, acidic xylanase(ACX), cellulase (CL) and lignin peroxidase (Lip) activities) of the samples obtained in the previous step was used for correlation analysis with the above modules under different treatment time (Table S4), among which some modules were highly correlated with physiological and biochemical traits (Fig. 8). For example, the blue module had significant positive correlation with CL and ACX (r = 0.96, r = 0.88, respectively). There was a significant positive correlation between dark orange modulus and laccase (r = 0.93), laccase and ACX (r = 0.82, r = 0.86), salmon modulus and lip (r 128 = 0.91), salmon module and Lip (r = 0.91). Next, the genes in these four modules will be further studied.
GO annotation of target modules
In order to further explore the function of the target module, we map each module gene to each term of GO database(http://www.geneontology.org/), and calculate the number of genes in each term, so as to get the gene list and gene number statistics with a certain go function. The results show that these four modules can be significantly enriched into several go pathways under the three categories of biological processes, molecular functions and cellular components (Fig. 9). As shown in Fig. 8, the four modules can be enriched into catalytic activity genes and binding genes, indicating that WGCNA can be effectively constructed into biologically significant co-expression modules. These modules can be the focus of this study.
Screening and functional analysis of key genes in target modules
In order to obtain the key genes in the above four modules, Cytoscape software was used to visualize the gene regulation network and screen out the genes with high connectivity in the modules(Fig. 10). From the 4 modules with the highest correlation, 17 key genes that may be related to lignocellulose degradation were screened (Table 1).In these networks, each node represents a gene, the nodes are linked by lines, and genes at both ends of the line are generally assumed to have the same biological function. Nine key genes were screened out from the blue module, two key genes were screened out from the darkorange module, three key genes were screened out from the steelblue module, and three key genes were screened out from the salmon module.
Seventeen key genes were described in Table 1. At the same time, they can all be annotated to the CAZy database, indicating that they all belong to carbohydrate enzymes. It shows that our analysis is correct. Among them, six genes belonged to AAs family( Auxiliary Activities : Redox enzymes that act in conjunction with Cazymes), nine genes belonged to GHs family( Glycoside Hydrolases : hydrolysis and/or rearrangement of glycosidic bonds ), one gene belonged to CBMs family( Carbohydrate-Binding Modules: adhesion to carbohydrates ), and one gene belonged to PLs family( Polysaccharide Lyases: non-hydrolytic cleavage of glycosidic bonds ).
Differential expression of lignocellulose degradation related genes
The differential expression of these 17 genes was analyzed (Fig. 11). SE.1A4887, SE.1A4757, SE.1A2186, SE.1A5997, SE.1A7714 and SE.1A4339 were clustered together, and the expression level were high in all the six stages. SE.1A3039, SE.1A1616, SE.1A5542, SE.1A8861 and SE.1A1551 were clustered together, and the expression level were low. Among them, SE.1A3690 showed as the highest expression in the B1 stage, indicating its relation to lignocellulose degradation significantly. B4 and B6 are clustered together because they are all fruiting body. B2 and B3 cluster together because they are at a lower temperature.
Validation of DEGs by Quantitative Real-Time PCR (qRT-PCR)
In order to further verify the reliability of the transcriptome data, Four core genes and four other DEGs related to lignocelluloses degradation were selected for qRT-PCR(Fig. 12).The primers used in qPCR were listed in Table S5. The expression patterns of 8 genes detected by qRT-PCR wereconsistent with the transcriptome expression trend during the six development but a few genes were at some stage showed different expression trends. The above disparities may have been caused by differences between transcriptome sequencing and qRT-PCR as detection methods, a certain degree of inconsistency (about 30-40%)(Chen et al., 2020). The result was normal and reasonable. The results showed that the qRT-PCR results of most of the genes were consistent with the transcriptome data.