C. subvermispora tRNAs. Genome analysis of C. subvermispora by tRNAScan-SE identified a total of 192 tRNAs in 32 scaffolds (Fig. 1). About 72% of the tRNA genes presented introns (Table 1). The scaffold with the highest number of tRNA genes was scaffold 1, which contains 20 copies of various tRNA genes (Additional file 2). The tRNA with the highest number of gene copies corresponded to tRNAs charging glycine, with 17 gene copies distributed in eight scaffolds (scaffolds 1, 2, 5, 7, 9, 10, 13, and 20). The tRNAs for cysteine and tryptophan presented the lowest number of gene copies, each with three gene copies in three scaffolds (Table 1).
Phylogenetic analysis of C. subvermispora tRNA genes. To determine whether the high number of tRNA genes charging the same amino acid corresponds to related gene copies, a phylogenetic reconstruction and evolutionary distance calculation were performed using the tRNA sequences identified by the tRNAscan-SE program. Phylogenetic reconstruction indicated that most tRNA genes that code for the same amino acid group together, with the exception of the tRNA genes that charge arginine, valine, and alanine (Fig. 1), which form two groups in each case. For tRNAs loading arginine, Group I comprise genes presenting anticodons with the WCG sequence. In contrast, group II presents a YCK anticodon consensus sequence. In tRNAs that load valine, Group I correspond to tRNA genes without introns, while group II includes all valine tRNA genes containing introns. Genes coding for the group I of alanine tRNAs exhibit anticodons with the consensus sequence YGC, whereas, the anticodon sequence is AGC for group II (Fig. 1). The tRNA genes 66 and 155 also showed a different pattern. tRNAScan prediction indicates that the amino acid loaded by tRNA155 should be serine, however, the sequence of this gene grouped with threonine charging tRNA genes. Also, the tRNA66 gene is expected to load isoleucine though this gene does not group with isoleucine charging tRNAs (Fig. 1). To identify tRNA genes that are repeated, the evolutionary distance between different tRNA genes was calculated. The genes with values of evolutionary distance equal to 0.000 were selected. This analysis identified 15 tRNA genes that are repeated between two and ten times. The group of tRNA genes that showed the greatest expansion corresponds to those tRNAs carrying glutamic acid and glycine. One glycine tRNA genes is repeated twice and the other is repeated ten times. The tRNA gene for glutamic acid is also repeated ten times (Additional file 3). These genes are scattered along the genome, with the exception of tRNA genes 91, 92, 94, 95, 96, and 97, which code for glycine and are adjacent in the genome.
tRNA abundance and codon usage in C. subvermispora. The expansion of certain families of tRNAs in the genome of C. subvermispora could be the result of an evolutionary pressure to increase its expression. In some organisms such as E. coli and yeast, the number of copies of a tRNA gene is proportional to the abundance in the genome of the decoded codon [24]. This proportionality is explained because during translation process, tRNAs that decode frequently used codons have a higher rate of consumption. To sustain adequate translation, cells must balance synthesis and consumption rates of tRNAs. The increase in copy number of a gene that encodes a tRNA that recognizes a frequently used codon, allows increasing expression of this tRNA to balance its rate of consumption.
To test if the expansion of certain families of tRNAs in C. subvermispora is related to the increment of the specific codons, we assessed whether there was a correlation between the frequency of codon usage and the amount of tRNA genes that decode these most highly used codons. We identify a positive correlation between these parameters (ρ = 0.406, p = 0.0016, n = 61). Dos Reis et al show that the Relative Adaptativeness to the tRNA pool (w) which takes in account that codons can be recognized by anticodons with perfect or imperfect match (wobble codon-anticodon recognition rules) with different affinities is a better parameter to measure the adaptation of a codon to their decoding tRNAs than the absolute number of tRNA. When the frequency of codon usage was correlated with the Relative Adaptativeness to the tRNA pool (w), an increased correlation was observed (ρ = 0.459, p = 2.2 × 10− 4, n = 61) (Fig. 2).
Synonymous codons are not used equally in an organism. As one tRNA can decode several synonymous codons with different affinities, the expansion of some tRNA families in C. subvermispora may be related to the preferential use of certain synonymous codons in coding regions of C. subvermispora. To assess this hypothesis, we correlated RSCU values with Relative Adaptativeness to the tRNA pool. Non-statistical correlation was observed, in part because w values are normalized with respect to the tRNA with the highest number of genes able to decode it (Wi/Wmax) and not with respect to the pool of tRNAs that decode the complete set of synonymous codons. When W values where normalized with respect to the total amount of tRNAs that decode a set of synonymous codons, a strong correlation with the RSCU values (ρ = 0.628, p = 0, n = 61) was observed. This suggests that among synonymous codon, those highly represented in the C. subvermispora genome tend to be decoded by those tRNAs with a high number of gene copies.
Relationship between gene expression level, codon bias, and translational efficiency in C. subvermispora. The bias in codon usage and adaptation to the tRNA pool modulates translational efficiency. Thus, highly expressed genes tend to use codons that are over-represented in the genome, which in turn present greater availability of tRNAs [24]. To determine whether this relationship exists in C. subvermispora, expression levels of C. subvermispora genes were correlated with their adaptation values to the tRNA pool and to codon bias. Adaptation to the tRNA pool was evaluated using two indexes: i) tAI, which measures adaptation of the tRNA pool compared to the relative amount of each tRNA gene, and ii) AAtAI, which evaluates whether a gene preferentially uses the most abundant tRNA charging a particular amino acid. Codon bias was analyzed by calculation of CAI (Codon Adaptation Index), and CPB (Codon Pair-Bias), to assess if gene expression is correlated with bias in usage of codons or of codon pairs. Codon bias also was evaluated using Nc value to determine if the increase in transcription is associated with a decrease in the diversity of codons used. We analyzed expression levels determined by RNAseq obtained from the fungus grown on a medium with Ball-Milled Aspen [37]. All indicators showed high degrees of correlation among them, however, these indicators show very low correlation coefficients with expression levels measured by RNAseq (Additional file 4).
Transcriptional response to growth on BMA, codon bias and translational efficiency. Growth of C. subvermispora in natural environments is dependent on wood. In 2012, Fernandez-Fueyo et al reported microarray experiments that compared gene expression of C. subvermispora grown on glucose and on Ball-Milled Aspen (BMA) as carbon sources. Saline media with BMA has been used as a laboratory medium that mimics growth on wood to analyze expression of genes that are transcriptionally regulated by growth on wood. To analyze if genes regulated by conditions that mimic growth on wood, such as BMA have a different adaptation to the tRNA pool or codon bias, we used the microarray data published by Fernandez-Fueyo and defined four groups of genes: group A corresponds to genes where expression was reduced at least 2 times with a p-value lower than 0.05. Group B includes all genes which showed increased expression of at least 2 times with p-value lower than 0.05. Group C corresponds to all genes with non-significant differences (p > 0.05) and group D contains all genes with low changes in expression (< 2 fold) that are statistically significant. (p < 0.05). Our results show that group B has lower values of Nc and higher values of CAI, tAI, AAtAI, and CPB. Groups A, C, and D show non-significant differences among them. This implies that genes induced by wood preferentially use a reduced set of codons that are better adapted to the tRNA pool present in C. subvermispora (Fig. 3).
When we correlated the CAI, tAI, AAtAI, Nc and CPB indexes with the ratio between expression in BMA and glucose culture medium, a statistically significant correlation was observed. Positive correlations were found with almost all indicators used (CAI, tAI, AAtAI and CPB), the exception was Nc that showed a negative correlation. The higher correlation were identified in CAI and tAI indexes, and in genes that showed significant differences for expression in BMA saline medium compared to expression in glucose-supplemented saline medium (Table 3). This increase in correlation coefficients can be explained if growth on lignocellulose exerts pressure on codon usage of genes involved in the metabolization of this carbon source, thereby selecting those codons that increase the translational efficiency of these genes.
Interestingly, when genes from Group B were sorted according to their codon usage adaptation values or to the tRNA pool, we found that genes coding for ribosomal proteins presented the highest CAI, tAI, and AAtAI values (Additional file 5). An increase in the expression of these genes may lead to an increase of the protein biosynthetic capacity, thus exposure of C. subvermispora to wood or lignocellulose leads to an increase of the rate of translation, specifically enhancing the synthesis of ligninolytic enzymes.
Translational efficiency and codon bias in ligninolytic genes. Genome analysis of C. subvermispora indicates that this organism has an expansion of the number of genes directly related to the mineralization and hydrolysis of lignocellulose. The genome contains 17 annotated genes of manganese peroxidase, five genes coding for laccases, four genes for cellobiose dehydrogenases (CDH), six genes for Δ-12 dehydrogenases, and five genes for Δ-9 dehydrogenases. These genes that encode the enzymes that catalyze the same reaction exhibit differential expression, which might reflect that they serve different functions in the mineralization/hydrolysis process of lignocellulose [8]. To determine whether genes of the same family present a similar bias of codon usage and of codon adaptation or adaptation to the tRNA pool, we compared the values of Nc, tAI, AAtAI, CAI, and CPB of these genes. These values were also normalized respect to the mean and standard deviation of the values obtained for all genes encoded in C. subvermispora (Z-values) [38]. This normalization was applied to identify if and how values for ligninolytic genes differ with respect to these same values in genes not directly related to the ligninolytic process (Table 4).
Genes encoding manganese peroxidase generally show above-average adaptation values to the tRNA pool and codon usage (Z value > 0). The manganese peroxidase gene with the highest level of expression (ID: 50297) proved to be the most adapted to the tRNA pool, with tAI values that are more than two standard deviations from the mean tAI values of other C. subvermispora genes (Z-tAI > 2). We also observed that the most induced manganese peroxidase gene in BMA medium also shows a high value of adaptation to the tRNA pool with a (Z-tAI = 1.619). Genes encoding laccases showed a similar trend, as only the gene that significantly changed their expression levels after growth in BMA medium (ID130783) showed a value of tAI above the average (Z-tAI = 0.461).
Among cellulases, the gene ID148588 showed the highest adaptation values to the tRNA pool (Z-tAI = 1.756) and high expression level under growth with glucose as the sole carbon source. This gene also increased its expression in BMA medium. Several genes for cellulases show Nc values below 40, indicating a strong bias in the use of synonymous codons. However, in this group of genes an association between expression levels or adaptation to the tRNA pool or any other index used in this work was not found. Genes for CDH show little bias in codon usage (Nc ~ 50). Only the CDH gene which is induced by BMA (ID: 84792) shows an Nc value of 48 and values of adaptation to the tRNA pool slightly higher than mean (Z > 0) (Table 4). The Δ12-dehydrogenase genes showed a weak codon bias with the exception of gene ID124050, which showed a Nc value of 38 and an above average adaptation to the tRNA pool. This gene also showed a strong expression in medium containing glucose which was not modified by the BMA medium. An opposite behavior was observed in the 9-dehydrogenase genes, where the gene with the greatest induction in BMA (ID129048) showed a lower Nc value (41.46), suggesting a strong bias in codon usage (Table 4).
When genes related to mineralization and digestion of lignocellulose were ordered according to their adaptation values to codon usage or the tRNA pool, genes with greater adaptation values were found to belong to the manganese peroxidase families and to proteins with cellulose-binding domains. Interestingly, manganese peroxidases are more adapted to the tRNA pool, while proteins with cellulose-binding domains showed higher adaptation to codon usage.