3.1. Conserved domain structure analysis of IIV6 DNA ligase
The IIV6 genome contains a coding sequence that produces a 615-amino acid protein, which has been identified as an NAD+-dependent DNA ligase. BLAST homology searches revealed that this protein shares low identity with DNA ligases from previously characterized organisms, including Escherichia coli (24%), Thermus filiformis (22%), Acanthamoeba polyphaga mimivirus (31%), Vaccinia virus (22%), and Melanoplus sanguinipes entomopoxvirus (30%) [11, 22, 25, 62, 63]. Despite the relatively low sequence similarity, the IIV6 DNA ligase retains several conserved residues that contribute to DNA ligase activity in related proteins. One of these residues is lysine 101, which lies within the conserved motif I (KxDG) and serves as the site for AMP binding. The catalytic core of IIV6 DNA ligase is composed of an adenylation domain (amino acids 57–285) and an OB-fold domain (amino acids 293–362). The core is preceded by an N-terminal Ia domain (amino acids 1–56), which contains the nicotinamide mononucleotide binding site present in all NAD+-dependent DNA ligases [64], and is followed by two C-terminal domains: a helix-hairpin-helix (HhH) domain (amino acids 471–520) and a BRCT domain (amino acids 549–612). The zinc finger domain is absent from IIV6 DNA ligase (Fig. 1).
3.2. Compositional analysis in IIVs DNA ligase gene
To assess the possible impact of nucleotide content on codon usage, we first analyzed the nucleotide composition of the DNA ligase genes (Table 2). The mean values of A%, G%, C% and T% in the IIVs DNA ligase genes were 36.69, 19.90, 15.11, and 28.30, respectively (Fig. 2A), so the A/T nucleotide frequencies were higher than the G/C nucleotide frequencies. Likewise, A and T were the most abundant nucleotides in the IIV6 DNA ligase gene (42.37% and 31.60%, respectively), followed by G (14.23%) and C (11.80%). To better understand how nucleotide content influences the pattern of codon usage in the DNA ligase genes of IIVs, we calculated the nucleotide content at the 3rd position of synonymous codons (A3, T3, C3, G3) (Table 2). The mean values were A3 (46.34%), T3 (45.63%), C3 (18.87%), and G3 (19.34%) (Fig. 2B). These findings suggest that A/T-ending codons are more prevalent than G/C-ending codons, and the same pattern was observed in the IIV6 DNA ligase gene. Overall, compositional factors appear to play a crucial role in determining both the total nucleotide content and the nucleotide content at the 3rd codon position in IIVs DNA ligase genes.
Table 2
Nucleotide composition analysis of IIVs DNA ligase genes
IIVs DNA ligase genes | A% | C% | T% | G% | A3s% | C3s% | T3s% | G3s% | AT% | GC% | GC1% | GC2% | AT3% | GC3s% | GC12% | ENC | CAI | E-CAI |
IIV6 205R | 42.37 | 11.80 | 31.60 | 14.23 | 63.32 | 10.26 | 51.71 | 6.89 | 73.97 | 26.03 | 37.66 | 26.95 | 86.53 | 12.35 | 32.35 | 35.64 | 0.84 | 0.86 |
IIV31 110R | 35.96 | 16.82 | 29.14 | 18.08 | 45.82 | 18.49 | 46.72 | 18.75 | 65.10 | 34.90 | 41.36 | 33.94 | 70.61 | 27.57 | 37.70 | 48.84 | 0.71 | 0.73 |
IIV3 MIV052L | 29.19 | 23.38 | 22.63 | 24.79 | 28.03 | 39.18 | 26.13 | 31.84 | 51.83 | 48.17 | 63.80 | 38.71 | 43.20 | 54.72 | 43.90 | 55.27 | 0.49 | 0.48 |
IIV25 170R | 39.20 | 12.36 | 31.67 | 16.77 | 52.65 | 11.00 | 54.58 | 12.06 | 70.87 | 29.13 | 39.88 | 29.60 | 82.09 | 16.77 | 34.80 | 40.70 | 0.48 | 0.56 |
IIV22 159R | 40.16 | 11.42 | 31.75 | 16.67 | 54.84 | 9.68 | 57.42 | 7.65 | 71.91 | 28.09 | 40.16 | 29.84 | 85.74 | 12.70 | 35.05 | 39.40 | 0.48 | 0.57 |
IIV9 109R | 38.89 | 12.65 | 31.90 | 16.56 | 52.39 | 11.75 | 55.05 | 10.30 | 70.79 | 29.21 | 40.63 | 29.52 | 82.54 | 16.29 | 35.30 | 39.91 | 0.75 | 0.85 |
IIV22A 166R | 39.95 | 11.26 | 31.91 | 16.89 | 54.74 | 9.01 | 58.15 | 7.39 | 71.86 | 28.14 | 40.49 | 30.16 | 86.23 | 12.06 | 35.35 | 37.26 | 0.49 | 0.56 |
IIV30 168R | 39.83 | 11.51 | 31.81 | 16.86 | 53.66 | 10.06 | 57.39 | 8.62 | 71.63 | 28.37 | 39.77 | 30.11 | 84.78 | 13.68 | 35.00 | 38.77 | 0.75 | 0.79 |
AMIV 120 | 30.88 | 19.59 | 24.92 | 24.61 | 31.17 | 28.08 | 34.34 | 35.18 | 55.80 | 44.20 | 47.49 | 36.05 | 50.94 | 47.90 | 41.85 | 55.82 | 0.50 | 0.54 |
CSAIV 185R | 29.38 | 22.29 | 21.43 | 26.89 | 27.53 | 38.64 | 24.42 | 37.91 | 50.81 | 49.19 | 49.19 | 37.82 | 39.45 | 58.67 | 43.45 | 57.50 | 0.51 | 0.61 |
SHIV 65R | 39.32 | 12.90 | 28.61 | 19.17 | 50.78 | 18.14 | 43.54 | 22.82 | 67.93 | 32.07 | 37.40 | 28.17 | 69.36 | 28.69 | 32.70 | 46.32 | 0.54 | 0.59 |
SHIV 55R | 32.52 | 17.53 | 21.92 | 28.03 | 36.72 | 22.89 | 40.16 | 29.18 | 54.43 | 45.57 | 61.47 | 34.86 | 59.63 | 39.18 | 48.10 | 55.70 | 0.66 | 0.63 |
CQIV 087L | 39.32 | 12.96 | 28.56 | 19.17 | 50.78 | 18.14 | 43.54 | 22.82 | 67.87 | 32.13 | 37.40 | 28.34 | 69.36 | 28.69 | 32.80 | 46.25 | 0.83 | 0.83 |
Mean | 36.69 | 15.11 | 28.30 | 19.90 | 46.34 | 18.87 | 45.63 | 19.34 | 64.99 | 35.01 | 44.36 | 31.85 | 70.03 | 28.41 | 37.57 | 45.95 | 0.62 | 0.68 |
SD | 0.045 | 0.043 | 0.041 | 0.045 | 0.115 | 0.106 | 0.116 | 0.113 | 0.085 | 0.085 | 0.088 | 0.039 | 0.169 | 0.167 | 0.050 | 0.079 | 0.14 | 0.13 |
A3, T3, C3, and G3 represent the frequency of the nucleotide A, T, C, and G at the third positions of codons; GC12 represents the G + C content at the first and second positions of codons; GC3s represents the frequency of the nucleotide G + C at the third positions of synonymous codons; ENC represents the effective number of codons; CAI represents codon adaptation index |
Examining the GC content at each codon position is a useful way to assess base composition bias and its potential implications for molecular evolution and genetic processes. The mean GC contents were 44.36% at the first codon position (GC1), 31.85% at the second codon position (GC2), and 37.57% at the first and second codon positions combined (GC12). The mean AT content was 64.99%, while the mean GC content was 35.01% (Fig. 2C). Additionally, the mean AT3 content was 70.03%, and the mean GC3 content was 28.41% (Fig. 2D). Similar results were found for the IIV6 DNA ligase coding sequence. These findings show that the IIVs DNA ligase coding sequences are AT-rich and that A/T nucleotides are preferred at the third codon position.
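The positional GC values reported above can be recomputed from any in-frame coding sequence. The following is a minimal Python sketch, not the pipeline used in this study; the function names `base_composition` and `gc_by_position` are illustrative. As a simplifying assumption it counts every third position, whereas the GC3s column of Table 2 is restricted to synonymous codons, so the two can differ slightly.

```python
from collections import Counter


def base_composition(cds):
    """Return the overall percentage of A, C, G and T in a DNA sequence."""
    counts = Counter(cds.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_by_position(cds):
    """Return GC1, GC2, GC3 and GC12 (percent) for an in-frame coding sequence.

    Assumption: all codons are counted, including Met, Trp and stop codons.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    gc = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        gc.append(100.0 * sum(b in "GC" for b in bases) / len(bases))
    return {
        "GC1": gc[0],
        "GC2": gc[1],
        "GC3": gc[2],
        "GC12": (gc[0] + gc[1]) / 2.0,  # mean of first and second positions
    }


if __name__ == "__main__":
    toy = "ATGGCCTAA"  # hypothetical three-codon sequence, for illustration only
    print(base_composition(toy))
    print(gc_by_position(toy))
```

AT% and AT3% follow as 100 minus the corresponding GC value when the sequence contains only A, C, G and T.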
3.3. RSCU analysis
To examine the patterns of synonymous codon usage and assess the potential bias towards A/T-terminated codons, we calculated the RSCU values for codons in the IIV6 DNA ligase gene and compared them with those of its host (Table 3). The 18 most prevalent codons were all A/T-terminated. Specifically, there were eleven T-ending codons: TGT (C), GAT (D), TTT (F), CAT (H), AAT (N), CCT (P), TAT (Y), GCT (A), ATT (I), GTT (V), and TCT (S). Additionally, there were seven A-ending codons: TTA (L), CAA (Q), AAA (K), GAA (E), AGA (R), TCA (S), and GGA (G). Notably, the majority of the over-represented codons (RSCU > 1.6) in the IIV6 DNA ligase gene were A-ending, whereas the most under-represented codons (RSCU < 0.6) were C-ending (Table 3). These findings indicate that mutation pressure played a substantial role in shaping the codon usage patterns observed in the IIV6 DNA ligase gene.
Table 3
The relative synonymous codon usage frequency of IIV6 DNA ligase gene (IIV6 205R) and natural host of IIV6
| | RSCU | | | | | RSCU | |
AA | Codon | IIV6 205R | C. suppressalis | | AA | Codon | IIV6 205R | C. suppressalis |
Phe (F) | TTT | 1.71 | 1.71 | | Ala (A) | GCT | 2.18 | 2.07 |
| TTC | 0.29 | 0.29 | | | GCC | 0.00 | 0.40 |
Leu (L) | TTA | 3.43 | 3.54 | | | GCA | 1.82 | 1.49 |
| TTG | 0.76 | 0.49 | | | GCG | 0.00 | 0.04 |
| CTT | 1.05 | 0.85 | | Tyr (Y) | TAT | 1.33 | 1.75 |
| CTC | 0.10 | 0.23 | | | TAC | 0.67 | 0.25 |
| CTA | 0.38 | 0.77 | | His (H) | CAT | 1.75 | 1.55 |
| CTG | 0.29 | 0.11 | | | CAC | 0.25 | 0.45 |
Ile (I) | ATT | 1.50 | 1.70 | | Gln (Q) | CAA | 1.83 | 1.66 |
| ATC | 0.32 | 0.30 | | | CAG | 0.17 | 0.34 |
| ATA | 1.18 | 1.80 | | Asn (N) | AAT | 1.68 | 1.68 |
Val (V) | GTT | 1.89 | 1.52 | | | AAC | 0.32 | 0.32 |
| GTC | 0.42 | 0.24 | | Lys (K) | AAA | 1.92 | 1.85 |
| GTA | 1.47 | 2.00 | | | AAG | 0.08 | 0.15 |
| GTG | 0.21 | 0.24 | | Asp (D) | GAT | 1.54 | 1.63 |
Ser (S) | TCT | 2.63 | 2.13 | | | GAC | 0.46 | 0.37 |
| TCC | 0.00 | 0.75 | | Glu (E) | GAA | 1.73 | 1.71 |
| TCA | 1.69 | 1.83 | | | GAG | 0.27 | 0.29 |
| TCG | 0.00 | 0.19 | | Cys (C) | TGT | 1.60 | 1.38 |
| AGT | 1.31 | 0.62 | | | TGC | 0.40 | 0.63 |
| AGC | 0.38 | 0.58 | | Arg (R) | CGT | 0.00 | 0.42 |
Pro (P) | CCT | 1.88 | 1.38 | | | CGC | 0.55 | 0.53 |
| CCC | 0.24 | 1.22 | | | CGA | 0.55 | 2.84 |
| CCA | 1.88 | 1.38 | | | CGG | 0.00 | 0.21 |
| CCG | 0.00 | 0.03 | | | AGA | 4.36 | 1.41 |
| | | | | | AGG | 0.55 | 0.49 |
Thr (T) | ACT | 1.16 | 1.48 | | Gly (G) | GGT | 1.00 | 1.12 |
| ACC | 0.21 | 0.98 | | | GGC | 0.13 | 0.12 |
| ACA | 2.63 | 1.39 | | | GGA | 2.63 | 2.44 |
| ACG | 0.00 | 0.14 | | | GGG | 0.25 | 0.32 |
AA, amino acid; RSCU, relative synonymous codon usage value. Green colors denote codons favored by IIV6 DNA ligase gene and host (RSCU > 1). Overrepresented (RSCU ≥ 1.6) and underrepresented (RSCU < 0.6) codons are marked as bold with red and blue colors, respectively. The optimal codons for IIV6 DNA ligase gene are underlined |
Similar patterns of codon usage were observed in the DNA ligase genes of other IIVs. These genes showed a preference for specific codons such as ATT (I), TCT (S), TTA (L), and AGA (R), which were consistently favored across the genes (Fig. 3). These results provide further evidence that A/T-terminated codons are favored in the IIVs DNA ligase genes.
To identify the potential impact of the host organism on the CUB of the IIV6 DNA ligase gene, the codon usage pattern of the gene was compared with that of its host, Chilo suppressalis. Of the 59 synonymous codons in the IIV6 DNA ligase gene, 44 showed usage similar to that in C. suppressalis (Table 3). In particular, codons such as TTT (F), TTA (L), ATT (I), TCT (S), CCT (P), ACT (T), GCT (A), TAT (Y), CAT (H), AAT (N), AAA (K), GAA (E), and GGA (G) showed a close similarity between the IIV6 DNA ligase gene and its host. However, codons such as AGT (S), CCC (P), ACA (T), CGA (R), and AGA (R) differed markedly between the IIV6 DNA ligase gene and its host. The consistency in codon usage between the IIV6 DNA ligase gene and its natural host may contribute to enhanced translational efficiency of viral genes.
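The RSCU values used throughout this section follow the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates that definition in Python under the standard genetic code; the helper name `rscu` and the toy sequence are illustrative assumptions, and the study's own software may treat edge cases (for example absent amino acids) differently.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the 59 synonymous sense codons.

    Met, Trp and stop codons are excluded; amino acids that do not occur
    in the sequence are skipped (an assumption of this sketch).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        if len(codons) < 2:  # Met and Trp have a single codon
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    toy = "TTTTTTTTTTTC"  # hypothetical: three TTT codons and one TTC codon
    print(rscu(toy))
```

With this definition, a value above 1 means a codon is used more often than expected under equal usage, which is the sense in which codons are called favored in Table 3.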
3.4. Analysis of the effective number of codons (ENC) and ENC plot
The ENC is a key parameter for assessing codon usage patterns, providing information about codon usage bias and the factors affecting it. The ENC values of the IIVs DNA ligase genes ranged from 35.64 to 57.50, and the average value of 45.95 indicated a low level of codon usage bias within these genes (Table 2). The ENC values differed among the genes; in particular, the ENC value of the IIV6 DNA ligase gene (35.64) was notably lower than those of the other IIVs DNA ligase genes. This indicates that the IIV6 DNA ligase gene has a somewhat stronger codon usage bias than the other genes.
To analyze the CUB within the genes in more depth, an ENC-GC3s plot was created (Fig. 4). The results suggest that both mutational pressure and natural selection contribute to the codon usage patterns observed in the DNA ligase genes of IIVs. This is supported by most of the data points lying below the expected curve, demonstrating the influence of selective forces, while a smaller number of points are positioned above the curve, suggesting the impact of mutational pressure.
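The expected curve in an ENC-GC3s plot is commonly taken from Wright's relation, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3s fraction. The Python sketch below assumes that this standard curve is the one drawn in Fig. 4, which the text does not state explicitly, and compares it with the GC3s and ENC values copied from Table 2. The names `expected_enc` and `position_vs_curve` are illustrative.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is shaped by GC3s alone (gc3s as a 0-1 fraction)."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# (GC3s %, ENC) pairs copied from Table 2.
TABLE2 = {
    "IIV6 205R": (12.35, 35.64),
    "IIV31 110R": (27.57, 48.84),
    "IIV3 MIV052L": (54.72, 55.27),
    "IIV25 170R": (16.77, 40.70),
    "IIV22 159R": (12.70, 39.40),
    "IIV9 109R": (16.29, 39.91),
    "IIV22A 166R": (12.06, 37.26),
    "IIV30 168R": (13.68, 38.77),
    "AMIV 120": (47.90, 55.82),
    "CSAIV 185R": (58.67, 57.50),
    "SHIV 65R": (28.69, 46.32),
    "SHIV 55R": (39.18, 55.70),
    "CQIV 087L": (28.69, 46.25),
}


def position_vs_curve(table):
    """Label each gene as below or on/above the expected curve."""
    result = {}
    for gene, (gc3s_pct, enc) in table.items():
        expected = expected_enc(gc3s_pct / 100.0)
        side = "below" if enc < expected else "on/above"
        result[gene] = (side, round(expected, 2))
    return result


if __name__ == "__main__":
    for gene, (side, expected) in position_vs_curve(TABLE2).items():
        print(f"{gene}: observed ENC is {side} the expected value {expected}")
```

Points well below the curve are usually read as evidence that factors other than composition, such as selection, also act on codon usage.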
3.5. Analysis of the neutrality plot
The analysis of a neutrality plot can provide insights into how natural selection and mutational pressure affect the CUB of genes. Therefore, a neutrality plot was generated between GC12 values and GC3 values (Fig. 5). The analysis revealed a significant positive correlation (r = 0.0476, p < 0.05) between GC12 and GC3 values. The slope of the regression line was 0.2359, suggesting that approximately 23.59% of the observed codon usage bias can be attributed to direct mutation pressure, while the remaining 76.41% was influenced by natural selection. These results suggest that mutation pressure had a limited influence on the CUB observed in the IIVs DNA ligase genes and that natural selection was likely the predominant factor shaping the bias.
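The slope-based reading used above (slope x 100 as the share attributed to mutation pressure, the remainder to selection) can be sketched as an ordinary least-squares fit of GC12 on GC3. The Python example below is illustrative only: the function name `neutrality_regression` and the four data points are hypothetical and are not the values behind Fig. 5.

```python
import math


def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x); returns (slope, intercept, r)."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical GC3 and GC12 percentages, for illustration only.
    gc3 = [10.0, 20.0, 30.0, 40.0]
    gc12 = [32.0, 34.0, 36.0, 38.0]
    slope, intercept, r = neutrality_regression(gc3, gc12)
    mutation_share = 100.0 * slope
    print(f"slope={slope:.4f}, mutation pressure ~{mutation_share:.1f}%, "
          f"selection ~{100.0 - mutation_share:.1f}%")
```

A slope near 1 would indicate that GC12 tracks GC3 closely (mutation dominated), while a slope near 0 points to constraints on the first two codon positions.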
3.6. PR2 bias plot analysis
The PR2 bias plot was used to evaluate the effect of mutational pressure and natural selection on the CUB in the DNA ligase genes of IIVs, giving a visual representation of the roles played by these factors in shaping the observed CUB patterns (Fig. 6). In the PR2 bias plot, the AT bias was plotted against the GC bias to assess the deviation from the expected A = T and C = G pattern. A slight deviation was observed, indicating a non-equivalence between adenine (A) and thymine (T), as well as between guanine (G) and cytosine (C), at the third codon positions of the DNA ligase genes in IIVs. The mean AT bias value was 0.50, indicating an equal preference for A and T nucleotides, while the mean GC bias value was 0.49, indicating a slightly higher tendency for G over C nucleotides within the DNA ligase gene sequences. The results demonstrated a preference for A and G nucleotides (purines) over T and C nucleotides (pyrimidines), implying that both mutational pressure and natural selection contribute to the CUB observed in IIVs DNA ligase genes.
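The two PR2 coordinates are conventionally computed as A3/(A3 + T3) and G3/(G3 + C3), with 0.5 on both axes corresponding to A = T and G = C. The sketch below assumes that convention and, as is common for PR2 analyses, restricts the count to the third position of four-fold degenerate codon families; the text does not specify which codons were used for Fig. 6, so treat both choices as assumptions. The function name `pr2_bias` is illustrative.

```python
from collections import Counter

# First two bases of the eight four-fold degenerate families in the standard code:
# Ala (GC), Gly (GG), Pro (CC), Thr (AC), Val (GT), Arg (CG), Leu (CT), Ser (TC).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CG", "CT", "TC"}


def pr2_bias(cds):
    """Return (AT bias, GC bias) = (A3/(A3+T3), G3/(G3+C3)) for four-fold codons.

    A component is None when the sequence has no informative codons for it.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third = Counter()
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            third[codon[2]] += 1
    at_total = third["A"] + third["T"]
    gc_total = third["G"] + third["C"]
    at_bias = third["A"] / at_total if at_total else None
    gc_bias = third["G"] / gc_total if gc_total else None
    return at_bias, gc_bias


if __name__ == "__main__":
    toy = "GCAGCAGCTGGGGGC"  # hypothetical: GCA GCA GCT GGG GGC
    print(pr2_bias(toy))
```

Under this convention, a GC bias above 0.5 means G is more frequent than C at the third position, and an AT bias above 0.5 means A is more frequent than T.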
3.7. Codon adaptation index (CAI) analysis
CAI is an index used to assess the degree of codon usage bias, or adaptation, of a gene relative to its host organism. The CAI values were computed for the IIVs DNA ligase genes using codon usage data from ten host organisms as reference sets (Table 1): Chilo suppressalis, Armadillidium vulgare, Ochlerotatus taeniorhynchus, Simulium vittatum, Oxycanus dirempta, Helicoverpa zea, Anopheles minimus, Chondrocladia grandis, Litopenaeus vannamei and Cherax quadricarinatus. The findings indicate a tendency towards higher CAI values (> 0.5) in the IIVs DNA ligase genes (except for IIV3 MIV052L, IIV25 170R, IIV22 159R, IIV9 109R, and IIV22A 166R), suggesting that these genes are well adapted to their respective host organisms (Table 2). The CAI values of the IIVs DNA ligase genes ranged from 0.48 to 0.84, with a mean value of 0.62. The CAI value of the IIV6 DNA ligase gene in relation to C. suppressalis was 0.84, remarkably higher than the others.
To further validate the statistical significance of the differences in CAI values, the E-CAI values were calculated for the IIVs DNA ligase genes; they ranged from 0.48 to 0.86, with a mean value of 0.68 (Table 2). The E-CAI value of the IIV6 DNA ligase gene was 0.86 (p < 0.05), indicating the adaptation of the IIV6 DNA ligase gene to its host and revealing that the sequences had a normal distribution. Similar results were also found for the other IIVs DNA ligase genes.
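CAI is conventionally defined as the geometric mean of the relative adaptiveness w of each codon in a gene, where w is the usage of a codon in the host reference set divided by the usage of the most frequent synonymous codon. The Python sketch below illustrates that definition; the function names, the toy reference counts and the 0.5 substitute for codons absent from the reference set are assumptions of this example, and the host codon usage tables from Table 1 are not reproduced here.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(reference_counts):
    """Return {codon: w} from host codon counts; Met, Trp and stops are excluded."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    weights = {}
    for codons in families.values():
        top = max(reference_counts.get(c, 0) for c in codons)
        if top == 0:  # amino acid not represented in the reference set
            continue
        for c in codons:
            # Assumption: codons absent from the reference set get a count of 0.5
            # so that a single missing codon does not force CAI to zero.
            weights[c] = max(reference_counts.get(c, 0), 0.5) / top
    return weights


def cai(cds, weights):
    """Geometric mean of w over the codons of cds that have a weight."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    logs = [math.log(weights[cds[i:i + 3]])
            for i in range(0, usable, 3) if cds[i:i + 3] in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    host_counts = {"TTT": 30, "TTC": 10}  # hypothetical host codon counts
    w = relative_adaptiveness(host_counts)
    print(cai("TTTTTC", w))
```

A gene that uses only the host's most frequent codon for every amino acid scores 1.0; lower values indicate weaker similarity to the host reference set.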
3.8. Correspondence analysis
To assess the main patterns of variation in codon usage among DNA ligase genes, COA was conducted using the RSCU values. Axis 1 explained 58.74% of the total variation, followed by axis 2 with 14.6%, axis 3 with 4.12%, and axis 4 with 1.02%. These results suggest that the first axis plays the major role in explaining the variation in RSCU values, while the second axis contributes to a lesser extent; axis 1 therefore captures the most significant patterns and trends in the codon usage data. To gain a deeper understanding of the dispersion of synonymous codon usage patterns, we examined the distribution of the genes in the plane defined by the first two principal axes of COA. The data points were broadly distributed across the entire graph (Fig. 7A). When the codons were ordered according to their RSCU values across the two primary axes of COA, A- and T-terminated codons were positioned closer to axis 1 than C- and G-terminated codons (Fig. 7B).
3.9. Dinucleotide abundance frequency analysis
In order to assess the possible impact of dinucleotide composition on codon usage bias, the relative frequencies of the sixteen dinucleotides were computed for the IIVs DNA ligase genes (Table 4). The results revealed a non-random distribution of relative dinucleotide frequencies in the IIVs DNA ligase genes. The relative frequencies of the dinucleotides AC, CA, CT, GA, and GT were found to be consistent with the expected theoretical frequencies, which are typically close to 1 (mean ± SD = 0.956 ± 0.105, 1.045 ± 0.093, 0.919 ± 0.162, 1.037 ± 0.162 and 0.984 ± 0.112, respectively). None of the dinucleotides were over-represented and one dinucleotide (TA = 0.675 ± 0.139) was under-represented in IIVs DNA ligase genes (Fig. 8). Furthermore, the analysis revealed that 90% of the frequencies among the 16 dinucleotides fell within the normal range of 0.78 to 1.23.
Table 4
Relative abundance of the 16 dinucleotides in IIVs DNA ligase genes
Dinucleotides | Range | Means ± SD |
AA | 0.916–1.365 | 1.187 ± 0.111 |
AC | 0.780–1.128 | 0.956 ± 0.105 |
AG | 0.583–1.089 | 0.860 ± 0.153 |
AT | 0.811–1.042 | 0.877 ± 0.090 |
CA | 0.933–1.207 | 1.045 ± 0.093 |
CC | 0.664–1.509 | 1.180 ± 0.279 |
CG | 0.451–1.600 | 0.880 ± 0.402 |
CT | 0.594–1.083 | 0.919 ± 0.162 |
GA | 0.841–1.261 | 1.037 ± 0.162 |
GC | 0.623–1.128 | 0.824 ± 0.145 |
GG | 0.932–1.422 | 1.142 ± 0.184 |
GT | 0.813–1.109 | 0.984 ± 0.112 |
TA | 0.489–0.819 | 0.675 ± 0.139 |
TC | 0.875–1.485 | 1.135 ± 0.247 |
TG | 0.988–1.345 | 1.163 ± 0.118 |
TT | 0.884–1.316 | 1.205 ± 0.115 |
For IIV6 DNA ligase gene, the over-represented dinucleotides were GG (1.283) and TT (1.274) while the under-represented dinucleotides were TA (0.780) and CG (0.451). To evaluate the possible impact of under-representation of CG and TA dinucleotides on the codon usage pattern, the RSCU values of the eight codons containing CG (CCG, GCG, TCG, ACG, CGC, CGG, CGT, and CGA) were examined. All codons containing CG were found to be under-represented (RSCU < 0.6), indicating that they were not preferentially utilized in the codon usage pattern. Two (CTA and TAC) of six codons containing TA were under-represented (Table 3). Nevertheless, the RSCU values of the four codons containing TA (TTA, ATA, GTA, and TAT) indicated that these codons were preferentially selected and utilized in the codon usage pattern. Similarly, the impact of over-representation of TT and GG on CUB was investigated by analyzing the RSCU values of the seven codons containing TT (TTT, TTC, TTA, TTG, CTT, ATT, and GTT) and the six codons containing GG (CGG, AGG, GGC, GGA, GGG, and GGT). Out of the seven codons containing TT, five (TTT, TTA, CTT, ATT, and GTT) were found to be over-represented (RSCU > 1.6). This suggests a preference for these specific codons in encoding their relevant amino acids within the DNA ligase genes. Among the six codons containing GG, only one codon (GGA) was found to be over-represented. The findings imply that the CUB of the IIV6 DNA ligase gene can be affected by the frequency of dinucleotides.
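The relative dinucleotide abundance discussed in this section is usually computed as the odds ratio rho_xy = f_xy / (f_x f_y), the observed dinucleotide frequency divided by the product of its component nucleotide frequencies. The Python sketch below assumes that definition and applies the 0.78 and 1.23 limits quoted above as the normal range; the function names are illustrative, and the sequence is assumed to contain only A, C, G and T.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Return {dinucleotide: rho_xy} with rho_xy = f_xy / (f_x * f_y).

    Overlapping dinucleotides are counted along a single strand. The value is
    NaN when one of the component nucleotides is absent from the sequence.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    result = {}
    for x in "ACGT":
        for y in "ACGT":
            fx = mono[x] / n
            fy = mono[y] / n
            fxy = di[x + y] / (n - 1)
            result[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return result


def classify(rho, low=0.78, high=1.23):
    """Label an odds ratio using the range quoted in the text."""
    if rho < low:
        return "under-represented"
    if rho > high:
        return "over-represented"
    return "normal"


if __name__ == "__main__":
    toy = "ATATATAT"  # hypothetical sequence, for illustration only
    odds = dinucleotide_odds(toy)
    print(odds["AT"], classify(odds["AT"]))
    print(odds["AA"], classify(odds["AA"]))
```

Applied per gene, values of this kind can be summarized as ranges and means in the manner of Table 4.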
3.10. Correlation analysis
To delve deeper into the impact of natural selection or mutational pressure on the CUB within the DNA ligase genes of IIVs, a correlation analysis was conducted. A high correlation value (r) close to 1 indicates that translational selection is the primary factor driving the CUB. On the other hand, a low correlation (approaching 0) indicates that mutation may have a stronger influence compared to translational selection on the CUB [65].
Therefore, Spearman's rank correlation analysis was conducted to examine the relationship between nucleotide compositions (A, T, C, G, and GC), ENC values, and codon composition (A3s, T3s, C3s, G3s, and GC3s) of IIVs DNA ligase genes (Table 5). The results displayed a substantial negative correlation between A or T and C3s, G3s, and GC3s in the IIVs DNA ligase genes. Conversely, a substantial positive correlation was identified between A or T and A3s, T3s. Furthermore, C, G, or GC exhibited a significant negative correlation with A3s and T3s, but a significant positive correlation with C3s, G3s, and GC3s. The ENC values of IIVs DNA ligase genes seemed to show a positive correlation with C3s, G3s, GC3s, while the T3s, A3s have a negative correlation with ENC. The observed relationships strongly indicate the importance of nucleotide composition, influenced by mutational pressure, in determining the codon usage patterns of IIVs DNA ligase genes.
Table 5
Correlation analysis between nucleotide compositions (A%, T%, C%, G%, GC), the ENC values and codon composition (A3s, T3s, C3s, G3s, and GC3s) of IIVs DNA ligase genes
| A3s | T3s | C3s | G3s | GC3s |
A% | 0.95041* | 0.79614* | -0.91736* | -0.88981* | -0.88981* |
T% | 0.8331* | 0.95173* | -0.88* | -0.88* | -0.90759* |
C% | -0.95592* | -0.95592* | 1 | 0.92837* | 0.95592* |
G% | -0.87724* | -0.80828* | 0.79724* | 0.88552* | 0.86621* |
GC | -0.9931* | -0.8831* | 0.9491* | 0.9546* | 0.9546* |
ENC | -0.96011* | -0.87758* | 0.9216* | 0.97112* | 0.95461* |
The values in the table refer to correlation coefficient “r” values in a correlation analysis. *P-value < 0.05
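Spearman's rank correlation, as used for Table 5, is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. The Python sketch below implements that definition with the standard library only and applies it to the A% and A3s% columns copied from Table 2; for this pair it gives a coefficient of about 0.950, in line with the A%/A3s entry of Table 5. The function names are illustrative, and no P-value is computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A% and A3s% for the 13 genes, in the row order of Table 2.
A_PCT = [42.37, 35.96, 29.19, 39.20, 40.16, 38.89, 39.95,
         39.83, 30.88, 29.38, 39.32, 32.52, 39.32]
A3S_PCT = [63.32, 45.82, 28.03, 52.65, 54.84, 52.39, 54.74,
           53.66, 31.17, 27.53, 50.78, 36.72, 50.78]

if __name__ == "__main__":
    print(round(spearman(A_PCT, A3S_PCT), 5))
```

The same function can be applied to any other pair of columns in Table 2.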
The correlation analysis conducted on the IIVs DNA ligase genes revealed a significant positive correlation between axis 1 and GC3s, GC, C, G, C3s, G3s, and ENC. Conversely, a significant negative correlation was found between axis 1 and A, T, A3s, and T3s (Table 6). In contrast, axis 2 showed no significant correlations with any nucleotide constraints in the IIVs DNA ligase genes.
Table 6
Correlation analysis between A, T, C, G, ENC, A3s, T3s, C3s, G3s, GC3s, GC and the first axis values
| A% | T% | C% | G% | ENC | A3s | T3s | C3s | G3s | GC3s | GC |
Axis 1 | | | | | | | | | | | |
r | -0.85557* | -0.93664* | 0.96561* | 0.81543* | 0.92857* | -0.9271* | -0.97662* | 0.96561* | 0.95461* | 0.97662* | 0.9231* |
P | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 |
The values in the table refer to correlation coefficient “r” values in a correlation analysis. *P-value < 0.05
Table 7
Correlation analysis among the first axis values, ENC, A3s, T3s, G3s, C3s, GC3s, GC values and the Gravy values
| Axis 1 | ENC | A3s | T3s | C3s | G3s | GC3s | GC |
Gravy | | | | | | | | |
r | -0.72527* | -0.73626* | 0.72902* | 0.69051* | -0.71252* | -0.76204* | -0.76754* | -0.7582* |
P | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 | < 0.05 |
The values in the table refer to correlation coefficient “r” values in a correlation analysis. *P-value < 0.05
Aromaticity and hydropathicity can have an impact on the CUB, and their effects may be attributed to natural selection. To determine the role of natural selection in the codon usage of IIVs DNA ligase genes, a correlation analysis was conducted between axis 1, ENC, A3s, T3s, G3s, C3s, GC3s, GC values, and Gravy values (Table 7). The results indicated a positive correlation between Gravy and A3s as well as T3s, while Gravy was negatively correlated with GC, GC3s, G3s, C3s, ENC, and axis 1. However, Aroma showed no correlation with axis 1 or the other indices. Therefore, the association between hydrophobicity and the codon usage of IIVs DNA ligase genes suggests that natural selection influences the pattern of codon usage.
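Gravy and Aroma are protein-level indices: Gravy is the mean Kyte-Doolittle hydropathy of the residues, and Aroma is the fraction of aromatic residues (Phe, Tyr, Trp). The Python sketch below shows both calculations under those standard definitions; the function names and toy peptides are illustrative, and residues outside the 20 standard amino acids are simply ignored, which is an assumption of this example.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not scores:
        raise ValueError("no standard residues in sequence")
    return sum(scores) / len(scores)


def aromaticity(protein):
    """Fraction of standard residues that are Phe, Trp or Tyr."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard residues in sequence")
    return sum(aa in AROMATIC for aa in residues) / len(residues)


if __name__ == "__main__":
    toy = "MKFWYAIL"  # hypothetical peptide, for illustration only
    print(round(gravy(toy), 3), round(aromaticity(toy), 3))
```

Positive Gravy values indicate a more hydrophobic protein; the per-gene values can then be correlated with axis 1, ENC and the third-position indices as in Table 7.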