3.1 Nucleotide Composition analysis:
To characterise the sequence features underlying CUB in C. sativus genes, their nucleotide composition was examined. At the third codon position, the average nucleotide percentage was highest for G (36.6%), followed by C (33.9%), T (29.9%) and A (25.9%). We also calculated the overall GC and AT content and the GC percentage at each of the three codon positions (GC1, GC2 and GC3). The mean GC content was highest at the third position (56.6%), followed by the first (52.5%) and second (40.6%) positions. Overall, the coding sequences of C. sativus were GC-biased, as the mean GC content exceeded the mean AT content. Like the host, the coding sequences of its associated fungal pathogens F. oxysporum and A. fumigatus were also GC-biased. In F. oxysporum, the average nucleotide percentage at the third codon position was highest for C (44.91%), followed by T (33.43%), G (26.94%) and A (18.18%). The mean GC content was highest at the second position (GC2 = 54.69%), followed by the first (GC1 = 52.76%) and third (GC3 = 44.58%) positions. In A. fumigatus, the nucleotide percentage at the third codon position was highest for C (41.7%), followed by G (34.2%), T (30.4%) and A (18%). The mean GC content was highest at the second position (GC2 = 55.7%), followed by the first (GC1 = 53.02%) and third (GC3 = 41.2%) positions (Table 1; Supplementary S1).
Table 1
Nucleotide composition analysis for the coding sequences of Crocus sativus, Fusarium oxysporum and Aspergillus fumigatus.
Nucleotide composition | Crocus sativus | Fusarium oxysporum | Aspergillus fumigatus |
A3% | 25.9 | 18.18 | 18 |
T3% | 29.9 | 33.43 | 30.4 |
C3% | 33.9 | 44.91 | 41.7 |
G3% | 36.6 | 26.94 | 34.2 |
GC% | 51.1 | 52.91 | 53.08 |
AT% | 49.9 | 47.09 | 46.92 |
GC1% | 52.5 | 52.76 | 53.02 |
GC2% | 40.6 | 54.69 | 55.7 |
GC3% | 56.6 | 44.58 | 41.2 |
AT3% | 43.4 | 55.42 | 58.8 |
GC12% | 72.86 | 80.11 | 80.87 |
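As an illustration of how position-specific GC values such as those in Table 1 can be obtained, the following minimal Python sketch computes GC1, GC2 and GC3 for a single in-frame coding sequence. The function name gc_by_position and the example sequence are hypothetical and are not taken from the present study. The sketch counts every codon, whereas dedicated codon usage tools may exclude stop codons or non-degenerate codons, so the values need not match those of any particular software.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) as percentages for an in-frame coding sequence.

    Assumption: every codon is counted; no filtering of stop, Met or Trp codons.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    percentages = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base, starting at this codon position
        gc_count = sum(1 for base in bases if base in "GC")
        percentages.append(100.0 * gc_count / len(bases))
    return tuple(percentages)


# Hypothetical four-codon sequence: ATG GCC AAG TTC
gc1, gc2, gc3 = gc_by_position("ATGGCCAAGTTC")
```

In practice the function would be applied to each CDS and the per-gene values averaged to give the species-level means.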
3.2 Codon usage bias analysis:
To determine whether codon usage in C. sativus and its associated pathogens is random or biased, CUB analysis was carried out using three indices, namely ENC, CAI and CBI. The mean ENC value of C. sativus was 51.8, indicating weak codon usage bias, i.e. nearly random usage of synonymous codons. This weak bias was further supported by the low values of CBI (-0.004) and CAI (0.199). Similarly, the ENC, CAI and CBI values of F. oxysporum and A. fumigatus indicated weak codon usage bias in these genomes (Table 2; Supplementary S1).
Table 2
Average score of different codon bias indices for the coding sequences of Crocus sativus, Fusarium oxysporum and Aspergillus fumigatus.
Codon bias analysis indices | C. sativus | F. oxysporum | A. fumigatus |
CBI | -0.004 | 0.11 | 0.11 |
ENC | 51.8 | 49.71 | 54.37 |
CAI | 0.199 | 0.26 | 0.25 |
tAI | 0.15 | | |
MELP | 1.79 | | |
3.3 Relationship between codon usage bias and mRNA translation efficiency.
The tAI of a coding sequence ranges from 0 to 1 (dos Reis et al., 2004), with higher values indicating higher translational efficiency. In the present study, the average tAI for C. sativus genes was 0.15, which is comparatively low. This suggests limited availability of the corresponding tRNAs, contributing to low translational efficiency. In addition, the very weak and non-significant correlation between tAI and ENC of C. sativus genes (r = -0.032, p = 0.761) suggested that there is little relationship between mRNA translation efficiency and CUB (Table 2).
3.4 Relationship between codon usage and gene expression
The selected coding sequences of C. sativus were divided into two groups based on their ENC values (ENC > 50 and ENC < 50). Because ENC is inversely related to CUB, lower ENC values correspond to stronger bias. We then examined the relationship between gene expression, represented by FPKM values, and codon usage bias, represented by ENC values. The high-bias group (ENC < 50) had an average FPKM value of 2.81, whereas the low-bias group (ENC > 50) had an average FPKM value of 4.91. It can therefore be inferred that, in C. sativus, highly expressed genes tend to show weak codon bias, whereas genes with low expression show stronger bias (Fig. 1; Supplementary S1). Further, the CDS were separated into four groups (A-, T-, C- and G-ending) according to the predominant base at the third codon position. Genes with G as the preferred third-position base had the highest FPKM values, followed by those preferring T, C and A. This contribution of G is in agreement with the fact that preferred codons in C. sativus also tend to be G-ending (Fig. 2; Supplementary S1).
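The grouping described above can be sketched in a few lines of Python. This is only an illustrative outline: the function name mean_fpkm_by_enc_group and the four (ENC, FPKM) pairs are hypothetical, not data from the present study.

```python
def mean_fpkm_by_enc_group(genes, threshold=50.0):
    """Average FPKM for genes below and above an ENC threshold.

    genes: iterable of (enc, fpkm) pairs.
    Lower ENC means stronger codon usage bias, so ENC < threshold is "high_bias".
    """
    groups = {"high_bias": [], "low_bias": []}
    for enc, fpkm in genes:
        if enc < threshold:
            groups["high_bias"].append(fpkm)
        elif enc > threshold:
            groups["low_bias"].append(fpkm)
        # A gene with ENC exactly equal to the threshold falls in neither group.
    return {
        name: (sum(values) / len(values) if values else None)
        for name, values in groups.items()
    }


# Hypothetical (ENC, FPKM) pairs, for illustration only
example_genes = [(45.2, 2.0), (48.9, 4.0), (53.1, 5.0), (57.4, 7.0)]
group_means = mean_fpkm_by_enc_group(example_genes)
```

Comparing the two means gives only a descriptive contrast; a formal test of the difference would need a separate statistical step.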
3.5 Effect of gene expression level (MELP) on codon usage bias
Gene expressivity has long been known to have a major impact on an organism's underlying codon usage (Coghlan and Wolfe, 2000; Kliman et al., 2003; Sharp and Li, 1987). We calculated MELP values for the genes in this investigation to uncover any potential bias related to gene expression. The MELP value of C. sativus genes ranged from 1.10 to 3.55 with an average of 1.79 (Table 2), which indicates relatively high gene expressivity. In addition, we performed a correlation analysis of MELP and ENC to understand the effect of expression level on codon usage bias. The correlation between MELP and ENC was weak and non-significant (r = 0.158, p = 0.198).
3.6 RSCU analysis:
RSCU (relative synonymous codon usage) is an index used to measure the disparity in synonymous codon usage. Based on the RSCU analysis, high-frequency codons were determined. Of the preferred codons for the 18 amino acids, most were C-ending (8 codons), followed by G-ending (6 codons), U-ending (3 codons) and A-ending (1 codon). Among the high-frequency codons, AGG (1.86), CUC (1.56), AUC (1.35) and AAG (1.34), encoding Arg, Leu, Ile and Lys respectively, had the highest RSCU values, whereas CGU (0.54), GUA (0.52) and AAA (0.66), encoding Arg, Val and Lys respectively, were among the codons with low RSCU values. It can therefore be deduced that the coding sequences of C. sativus favour G/C-ending codons over A/U-ending codons, most of which fall into the under-represented group. These results indicate that natural selection and mutational pressure in C. sativus affect base composition, which might play a key role in the existing bias towards G/C-ending codons. Similarly, in F. oxysporum, the 18 preferred codons comprised 12 C-ending, 3 U-ending and 3 G-ending codons. In A. fumigatus, C-ending codons (13 codons) were dominant among the preferred codons, followed by G-ending (3 codons) and U-ending (2 codons). Neither pathogen showed a preference for A-ending codons. Overall, the RSCU results revealed that the host C. sativus and its fungal pathogens F. oxysporum and A. fumigatus share a preference for GC-ending codons (Table 3).
Table 3
Relative synonymous codon usage bias analysis for the coding sequences of Crocus sativus, Fusarium oxysporum and Aspergillus fumigatus.
Amino acid | Synonymous codons | Relative frequency C. sativus | RSCU C. sativus | Relative frequency F. oxysporum | RSCU F. oxysporum | Relative frequency A. fumigatus | RSCU A. fumigatus |
Alanine | GCU | 0.28 | 1.12 | 0.35 | 1.4 | 0.34 | 1.36 |
| GCC | 0.28 | 1.12 | 0.37 | 1.48 | 0.32 | 1.28 |
| GCA | 0.26 | 1.04 | 0.15 | 0.6 | 0.1 | 0.4 |
| GCG | 0.18 | 0.72 | 0.11 | 0.44 | 0.23 | 0.92 |
Cysteine | UGU | 0.38 | 0.76 | 0.33 | 0.66 | 0.34 | 0.68 |
| UGC | 0.61 | 1.22 | 0.65 | 1.3 | 0.63 | 1.26 |
Aspartic acid | GAU | 0.59 | 1.18 | 0.45 | 0.9 | 0.39 | 0.78 |
| GAC | 0.41 | 0.82 | 0.54 | 1.08 | 0.6 | 1.2 |
Glutamic acid | GAA | 0.39 | 0.78 | 0.32 | 0.64 | 0.36 | 0.72 |
| GAG | 0.61 | 1.22 | 0.66 | 1.32 | 0.63 | 1.26 |
Phenylalanine | UUU | 0.39 | 0.78 | 0.33 | 0.66 | 0.44 | 0.88 |
| UUC | 0.61 | 1.22 | 0.65 | 1.3 | 0.55 | 1.1 |
Glycine | GGU | 0.22 | 0.88 | 0.39 | 1.56 | 0.24 | 0.96 |
| GGC | 0.21 | 0.84 | 0.3 | 1.2 | 0.46 | 1.84 |
| GGA | 0.32 | 1.28 | 0.19 | 0.76 | 0.14 | 0.56 |
| GGG | 0.25 | 1 | 0.1 | 0.4 | 0.14 | 0.56 |
Histidine | CAU | 0.48 | 0.96 | 0.43 | 0.86 | 0.53 | 1.06 |
| CAC | 0.5 | 1 | 0.55 | 1.1 | 0.46 | 0.92 |
Isoleucine | AUU | 0.3 | 0.9 | 0.35 | 1.05 | 0.39 | 1.17 |
| AUC | 0.45 | 1.35 | 0.54 | 1.62 | 0.51 | 1.53 |
| AUA | 0.25 | 0.75 | 0.09 | 0.27 | 0.09 | 0.27 |
Lysine | AAA | 0.33 | 0.66 | 0.26 | 0.52 | 0.34 | 0.68 |
| AAG | 0.67 | 1.34 | 0.7 | 1.4 | 0.65 | 1.3 |
Leucine | UUA | 0.08 | 0.48 | 0.06 | 0.36 | 0.04 | 0.24 |
| UUG | 0.19 | 1.14 | 0.11 | 0.66 | 0.15 | 0.9 |
| CUU | 0.19 | 1.14 | 0.25 | 1.5 | 0.2 | 1.2 |
| CUC | 0.26 | 1.56 | 0.32 | 1.92 | 0.24 | 1.44 |
| CUA | 0.08 | 0.48 | 0.06 | 0.36 | 0.05 | 0.3 |
| CUG | 0.2 | 1.2 | 0.17 | 1.02 | 0.29 | 1.74 |
Asparagine | AAU | 0.47 | 0.94 | 0.3 | 0.6 | 0.35 | 0.7 |
| AAC | 0.53 | 1.06 | 0.69 | 1.38 | 0.64 | 1.28 |
Proline | CCU | 0.26 | 1.04 | 0.34 | 1.36 | 0.2 | 0.8 |
| CCC | 0.21 | 0.84 | 0.34 | 1.36 | 0.37 | 1.48 |
| CCA | 0.3 | 1.2 | 0.2 | 0.8 | 0.2 | 0.8 |
| CCG | 0.23 | 0.92 | 0.11 | 0.44 | 0.2 | 0.8 |
Glutamine | CAA | 0.44 | 0.88 | 0.34 | 0.68 | 0.3 | 0.6 |
| CAG | 0.56 | 1.12 | 0.65 | 1.3 | 0.69 | 1.38 |
Arginine | CGU | 0.09 | 0.54 | 0.23 | 1.38 | 0.14 | 0.84 |
| CGC | 0.12 | 0.72 | 0.27 | 1.62 | 0.32 | 1.92 |
| CGA | 0.11 | 0.66 | 0.18 | 1.08 | 0.18 | 1.08 |
| CGG | 0.13 | 0.78 | 0.09 | 0.54 | 0.17 | 1.02 |
| AGA | 0.24 | 1.44 | 0.11 | 0.66 | 0.09 | 0.54 |
| AGG | 0.31 | 1.86 | 0.09 | 0.54 | 0.07 | 0.42 |
Serine | UCU | 0.16 | 0.96 | 0.23 | 1.38 | 0.18 | 1.08 |
| UCC | 0.2 | 1.2 | 0.23 | 1.38 | 0.22 | 1.32 |
| UCA | 0.13 | 0.78 | 0.1 | 0.6 | 0.14 | 0.84 |
| UCG | 0.16 | 0.96 | 0.12 | 0.72 | 0.16 | 0.96 |
| AGU | 0.14 | 0.84 | 0.1 | 0.6 | 0.12 | 0.72 |
| AGC | 0.2 | 1.2 | 0.19 | 1.14 | 0.14 | 0.84 |
Threonine | ACU | 0.28 | 1.12 | 0.3 | 1.2 | 0.21 | 0.84 |
| ACC | 0.32 | 1.28 | 0.42 | 1.68 | 0.38 | 1.52 |
| ACA | 0.22 | 0.88 | 0.14 | 0.56 | 0.19 | 0.76 |
| ACG | 0.18 | 0.72 | 0.12 | 0.48 | 0.21 | 0.84 |
Valine | GUU | 0.27 | 1.08 | 0.29 | 1.16 | 0.29 | 1.16 |
| GUC | 0.25 | 1 | 0.47 | 1.88 | 0.34 | 1.36 |
| GUA | 0.13 | 0.52 | 0.08 | 0.32 | 0.02 | 0.08 |
| GUG | 0.35 | 1.4 | 0.14 | 0.56 | 0.33 | 1.32 |
Tyrosine | UAU | 0.59 | 1.18 | 0.4 | 0.8 | 0.43 | 0.86 |
| UAC | 0.41 | 0.82 | 0.59 | 1.18 | 0.56 | 1.12 |
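The RSCU values in Table 3 are consistent with the usual definition, in which the observed count of a codon is divided by the count expected if all synonymous codons for that amino acid were used equally. The following minimal Python sketch shows that calculation for one codon family. The function name rscu is hypothetical, and the alanine counts are illustrative numbers chosen to be proportional to the C. sativus relative frequencies in Table 3 (0.28, 0.28, 0.26, 0.18); they are not the raw counts from the study.

```python
def rscu(codon_counts, family):
    """RSCU for each codon in one synonymous family.

    codon_counts: dict mapping codon -> observed count.
    family: list of the synonymous codons for one amino acid.
    RSCU = observed count / (family total / number of synonymous codons).
    """
    total = sum(codon_counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}  # amino acid absent from the data
    expected = total / len(family)
    return {codon: codon_counts.get(codon, 0) / expected for codon in family}


# Illustrative counts only, proportional to the alanine frequencies in Table 3
alanine_counts = {"GCU": 28, "GCC": 28, "GCA": 26, "GCG": 18}
alanine_rscu = rscu(alanine_counts, ["GCU", "GCC", "GCA", "GCG"])
```

With these inputs the result reproduces the alanine RSCU values listed for C. sativus (1.12, 1.12, 1.04 and 0.72), which is equivalent to multiplying each relative frequency by the degeneracy of the family.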
3.7 ENC Plot Analysis:
An ENC plot (ENC against GC3s) was drawn to illustrate the relationship between nucleotide composition and codon bias in C. sativus sequences. If compositional constraints were the only factor shaping codon usage, genes would lie on the standard curve of expected ENC values with respect to GC3s. In the current analysis, however, most genes lay either above or below the expected curve, implying that codon usage in C. sativus has been influenced by forces other than compositional constraints, such as natural selection. A similar scattering of genes was observed in the ENC plots of F. oxysporum and A. fumigatus (Fig. 3). Overall, the ENC plot analysis revealed that the codon usage pattern in C. sativus and its fungal pathogens was affected by more than one factor, including mutational pressure and natural selection.
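The standard curve in an ENC plot is commonly drawn from the expected-ENC relationship attributed to Wright (1990), ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The sketch below assumes that this is the curve used here, which the text does not state explicitly; the function name expected_enc is hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is shaped by GC3s composition alone.

    gc3s: GC content at synonymous third codon positions, as a fraction in [0, 1].
    Assumption: the standard curve follows ENC = 2 + s + 29 / (s^2 + (1 - s)^2).
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Points for drawing the standard curve from GC3s = 0.00 to 1.00 in steps of 0.01
curve = [(i / 100, expected_enc(i / 100)) for i in range(101)]
```

As a rough illustration, entering the mean C. sativus GC3 of 56.6% as a fraction gives an expected value of about 59.6, above the observed mean ENC of 51.8. GC3 and GC3s are not identical quantities, so this comparison is indicative only.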
3.8 Neutrality Plot analysis
The neutrality plot, a scatter plot of GC12 against GC3, is used to weigh the effect of mutation pressure against that of natural selection. The slope of the regression line was close to zero (0.15), indicating only a limited role of mutation pressure, and the broad GC range was also consistent with an influence of natural selection. The regression coefficient was then used to estimate the extent of the two evolutionary forces. The regression coefficient of GC12 on GC3 was 0.1547, suggesting that mutational pressure contributed about 15% and natural selection about 85%. Evidently, natural selection played the predominant role in the CUB of C. sativus. Correspondingly, in F. oxysporum the regression coefficient of GC12 on GC3 was 0.1765, signifying that mutational pressure contributed 17.65% and natural selection 82.35%. In A. fumigatus, mutational pressure contributed 19.42% and natural selection 80.58% (Fig. 4). Overall, the results indicate a dominant role of natural selection in shaping the codon usage pattern in the host plant and its associated fungal pathogens.
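The percentages quoted above follow from reading the regression slope of GC12 on GC3 as the relative contribution of mutation pressure, with the remainder assigned to natural selection. A minimal sketch of that calculation using ordinary least squares is given below. The function names and the four data points are hypothetical, and GC12 is taken here as a per-gene value supplied by the caller.

```python
def neutrality_regression(gc3, gc12):
    """Ordinary least-squares fit of GC12 (y) on GC3 (x); returns (slope, intercept)."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equal-length lists with at least two genes")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pressure_split(slope):
    """Interpret the slope as (% mutation pressure, % natural selection)."""
    mutation = 100.0 * slope
    return mutation, 100.0 - mutation


# Hypothetical per-gene values (percentages), for illustration only
slope, intercept = neutrality_regression([20, 40, 60, 80], [43, 46, 49, 52])
mutation_pct, selection_pct = pressure_split(0.1547)  # slope reported for C. sativus
```

This interpretation is a convention of neutrality plot analysis and not a formal variance decomposition, so the two percentages are best treated as indicative.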
3.9 Parity plot (PR2) analysis
Parity rule 2 (PR2) plot analysis was carried out to compare the use of A with T and of G with C at the third codon position, which reveals the effect of evolutionary forces on the formation of codon bias. Equal use of A and T (and of G and C) reflects the role of mutation alone, while disproportionate use indicates the significant involvement of natural selection. In the PR2 plot, most genes were clustered around the 0.5 coordinate, indicating a major role of mutation in shaping the codon usage pattern in C. sativus. However, a few genes lay far from the centre, suggesting that natural selection along with mutation pressure might have influenced the CUB of C. sativus. When the plot was divided into four quadrants, more genes fell in the upper left quadrant than in the other three, denoting an enrichment of A/C-ending codons. Overall, the PR2 analysis showed that both evolutionary forces (mutation and selection) shape CUB in C. sativus.
In F. oxysporum, the PR2 analysis showed that most points fell in the third quadrant (in which both A3/(A3 + U3) and G3/(G3 + C3) are below 0.5), whereas the first quadrant contained few points. These results indicate that CDS in F. oxysporum had a slight yet noticeable preference for U and C at the third codon position. The balance between A and U and between G and C has therefore been disrupted in F. oxysporum. This uneven usage reflects the role of both evolutionary forces (mutational pressure and natural selection) in shaping the codon usage pattern in F. oxysporum. Likewise, in A. fumigatus the unequal usage of these bases also suggests that both evolutionary forces (mutational pressure and selection) shape its codon usage pattern (Fig. 5).
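For reference, the position of a gene on a PR2 plot is conventionally given by G3/(G3 + C3) on the horizontal axis and A3/(A3 + T3) on the vertical axis, with (0.5, 0.5) marking equal usage. The sketch below computes that point. The function name pr2_point is hypothetical, and the input is the set of species-level mean third-position values for F. oxysporum from Table 1, used purely as an illustrative stand-in for per-gene values (PR2 analyses are often restricted to four-fold degenerate codons, which is not done here).

```python
def pr2_point(a3, t3, g3, c3):
    """Return the PR2 coordinates (G3/(G3+C3), A3/(A3+T3)) for one set of values."""
    if g3 + c3 == 0 or a3 + t3 == 0:
        raise ValueError("G3 + C3 and A3 + T3 must both be non-zero")
    return g3 / (g3 + c3), a3 / (a3 + t3)


# Mean third-position percentages for F. oxysporum from Table 1 (illustrative input)
x, y = pr2_point(a3=18.18, t3=33.43, g3=26.94, c3=44.91)
below_parity_on_both_axes = x < 0.5 and y < 0.5
```

With these inputs both coordinates fall below 0.5, in line with the preference for U over A and for C over G described for F. oxysporum.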
3.10 Correspondence analysis (COA)
COA is a multivariate statistical approach used to explore variation in synonymous codon usage based on RSCU values. Among the 59 axes, axis 1 and axis 2 were the major axes, accounting for the majority of the variation. Codons were scattered over the four quadrants of the COA plot, indicating the differential influence of evolutionary forces on each codon. A- and U/T-ending codons lay close to axis 1, indicating that nucleotide composition, mostly A/T content, influences codon usage. Moreover, it can be inferred that the mutational effect on CUB is greater for A- and T-ending codons than for C-ending codons, followed by G-ending codons, which were placed near axis 2.
The COA of F. oxysporum revealed that, among the 59 axes, axis 1 (32.47%) accounted for the largest share of the codon usage variation, followed by axis 2 (12.73%). The AU-ending codons (blue) were mostly found on the right side of axis 1 along with a few GC-ending codons (green), while most of the GC-ending codons were found on the left side of axis 1, reflecting the difference in the degree of codon usage bias between AU- and GC-ending codons. In A. fumigatus, the COA showed that, among the 59 axes, axis 1 (29.50%) contributed the most to codon usage variation, followed by axis 2 (25.10%). The majority of the points were clustered around the origin of the axes, indicating that these genes have more or less similar codon usage biases. However, a few points were widely scattered along the negative side of axis 1, which suggests that the codon usage bias of these genes is not homogeneous. Very few points were scattered widely along the positive side of the first major axis. Since the first axis explained only part of the variation in codon usage, we hypothesize that several major factors were involved in shaping codon usage variation in A. fumigatus. In C. sativus and F. oxysporum, the distribution of synonymous codons clearly shows the separation of C/G-ending and A/U-ending codons along the first axis. This further supports the view that variation in synonymous codon usage among genes depends on their nucleotide content (Fig. 6).
3.11 Correlation Analysis:
Correlation analysis was performed between ENC or CAI and the other CUB indices to gain a comprehensive understanding of the factors affecting patterns of codon usage in C. sativus. ENC showed a significant positive correlation with A3, T3 and AT3 and a significant negative correlation with G3, C3, GC1, GC2, GC12 and GC3 (Table 4). CAI was negatively correlated with A3, T3 and AT3 and positively correlated with G3, C3, GC1, GC2, GC12 and GC3, although the correlations with G3 and T3 were not significant (Table 5). In F. oxysporum, ENC was significantly positively correlated with GC3 and A3 and significantly negatively correlated with G3, C3, GC1, GC2, AT3, GC12 and CAI, whereas its negative correlation with T3 was not significant (Table 4). CAI was significantly positively correlated with C3, T3, GC1, GC2, AT3 and GC12 and significantly negatively correlated with GC3 and A3, whereas its negative correlation with G3 was not significant (Table 5). In A. fumigatus, ENC showed a significant positive correlation with A3, T3 and AT3, a non-significant positive correlation with G3, and a significant negative correlation with C3, GC1, GC2, GC3, GC12 and CAI (Table 4). CAI showed a significant positive correlation with C3, GC1 and GC3, a significant negative correlation with G3, A3, T3, GC2, AT3 and ENC, and a non-significant negative correlation with GC12 (Table 5).
Table 4
Correlation analysis between ENC and CUB indices of Crocus sativus, Fusarium oxysporum and Aspergillus fumigatus.
ENC | G3 | C3 | A3 | T3 | GC1 | GC2 | GC3 | AT3 | GC12 | CAI |
C. sativus | r= -0.551, p = 0.000 | r= -0.331, p = 0.009 | r = 0.643, p = 0.000 | r = 0.407, p = 0.001 | r= -0.506, p = 0.000 | r= -0.270, p = 0.035 | r= -0.569, p = 0.000 | r = 0.569, p = 0.000 | r= -0.517, p = 0.000 | r= -0.289, p = 0.024 |
F. oxysporum | r = -0.093, p = 0.005 | r = -0.859, p = 0.000 | r = 0.872, p = 0.000 | r = -0.023, p = 0.499 | r = -0.683, p = 0.000 | r = -0.421, p = 0.000 | r = 0.566, p = 0.000 | r = -0.566, p = 0.000 | r = -0.636, p = 0.000 | r = -0.820, p = 0.000 |
A. fumigatus | r = 0.031, p = 0.348 | r = -0.885, p = 0.000 | r = 0.755, p = 0.000 | r = 0.545, p = 0.000 | r = -0.726, p = 0.000 | r = -0.254, p = 0.000 | r = -0.355, p = 0.000 | r = 0.355, p = 0.000 | r = -0.349, p = 0.000 | r = -0.705, p = 0.000 |
Table 5
Correlation analysis between CAI and CUB indices of Crocus sativus, Fusarium oxysporum and Aspergillus fumigatus.
CAI | G3 | C3 | A3 | T3 | GC1 | GC2 | GC3 | AT3 | GC12 | ENC |
C. sativus | r = 0.233, p = 0.071 | r = 0.360, p = 0.004 | r= -0.495, p = 0.000 | r= -0.213, p = 0.099 | r = 0.369, p = 0.003 | r = 0.276, p = 0.031 | r = 0.339, p = 0.008 | r= -0.339, p = 0.008 | r = 0.412, p = 0.001 | r= -0.289, p = 0.024 |
F. oxysporum | r = -0.022, p = 0.510 | r = 0.863, p = 0.000 | r = -0.831, p = 0.000 | r = 0.123, p = 0.000 | r = 0.564, p = 0.000 | r = 0.280, p = 0.000 | r = -0.531, p = 0.000 | r = 0.531, p = 0.000 | r = 0.492, p = 0.000 | r = -0.820, p = 0.000 |
A. fumigatus | r = -0.186, p = 0.000 | r = 0.770, p = 0.000 | r = -0.766, p = 0.000 | r = -0.077, p = 0.021 | r = 0.301, p = 0.000 | r = -0.108, p = 0.001 | r = 0.093, p = 0.005 | r = -0.093, p = 0.005 | r = -0.029, p = 0.385 | r = -0.705, p = 0.000 |
Overall, these results showed that in all three species CUB is affected by nucleotide composition along with other factors, namely mutational pressure and natural selection.
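The r values in Tables 4 and 5 have the form of Pearson correlation coefficients, although the coefficient type is not named in this section. Assuming Pearson's r, the following minimal sketch shows how one such coefficient can be computed for a pair of per-gene indices. The function name pearson_r and the example values are hypothetical, and the p-values reported in the tables would require an additional significance test that is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene values, for illustration only
enc_values = [45.0, 50.0, 55.0, 60.0]
a3_values = [20.0, 24.0, 25.0, 31.0]
r_enc_a3 = pearson_r(enc_values, a3_values)
```

Applying the same function to each index pair in turn would reproduce the layout of the correlation tables, one coefficient per cell.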
3.12 Amino acid usage frequency
Amino acid composition is affected by genomic GC content and in turn affects the codon usage pattern of an organism. Analysing the frequency of amino acids helps in understanding the evolutionary pattern of an organism in addition to its physiological functions. In the genome of C. sativus, a preponderance of 6-fold and 4-fold degenerate amino acids was observed. Leucine was the most abundant amino acid, followed by serine, alanine, glutamic acid and glycine, whereas cysteine, histidine and tyrosine were among the least abundant. In F. oxysporum, glycine was the most abundant amino acid, followed by alanine, leucine, serine and valine. These results showed that the genome of F. oxysporum also favours 4-fold and 6-fold degenerate amino acids, whereas histidine, cysteine and tyrosine were the least abundant. Likewise, in A. fumigatus leucine was the most abundant amino acid, followed by serine, alanine, glycine and proline, while cysteine, histidine and tyrosine were among the least abundant. Overall, 4-fold and 6-fold degenerate amino acids were the most widespread in C. sativus and its pathogens (Fig. 7).