Phylogenetic classification
Phylogenetic analysis of 41 SARS-Cov-2 whole genomes indicated that the strains clustered into separate clades (Fig. 1). In the tree, the strains from China (Wuhan) formed a distinct clade, and most of the strains from the USA clustered together in another clade. The strains from China (Yunnan and Shenzhen), Australia and South Korea were more closely related to those from the USA, whereas the strain from China (Hangzhou) was more closely related to those from China (Wuhan). The strains from Japan were distributed across different clades.
A/U nucleotides are more frequent than G/C in SARS-Cov-2 coding sequences
We analyzed 41 whole genomes of SARS-Cov-2 isolated from humans infected in different countries during 2019 and 2020. The four nucleotides were used at unequal frequencies: the genomes were significantly enriched for AU (62.00% ± 0.02) over GC (38.00% ± 0.01) (Table 1, Fig. S1 (A), wilcox.test, P < 0.01). Additionally, the mean frequencies of nucleotides A (29.90% ± 0.03) and U (32.11% ± 0.01) were significantly higher than those of G (19.62% ± 0.01) and C (18.38% ± 0.01) (Table 1, Fig. S1 (C), wilcox.test, P < 0.01). These results are consistent with our prior study of Crimean-Congo hemorrhagic fever virus, which also indicated that the frequencies of A and U were higher than those of C and G [17].
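The percentages reported here can be reproduced in principle with a simple base count. The following is a minimal sketch, not the authors' pipeline: the function name and the toy sequence are hypothetical, and it assumes that T in a DNA-style record should be read as U and that ambiguous bases are ignored.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the percentage of A, U, G, C, AU and GC in a sequence.

    Assumption: T is treated as U, and any character other than
    A/U/G/C (for example N) is excluded from the total.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "AUGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/U/G/C bases")
    pct = {base: 100.0 * counts[base] / total for base in "AUGC"}
    pct["AU"] = pct["A"] + pct["U"]
    pct["GC"] = pct["G"] + pct["C"]
    return pct


# Hypothetical toy sequence, used only to show the call.
toy = "AUGUUUGUUAAAGCU"
for name, value in nucleotide_composition(toy).items():
    print(f"{name}: {value:.2f}%")
```

Applied to each of the 41 genomes, a function of this kind would give one row of whole-genome columns (A, U, G, C, AU, GC) per accession.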
Table 1
Nucleotide composition analysis of SARS-Cov-2 complete sequences (%).
Accession ID | A | U | G | C | AU | GC | A3 | U3 | G3 | C3 | AU3 | GC3 | GC1 | GC2 | GC12
MN988669.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 38.18 | 45.84 | 21.33 | 19.87 | 68.10 | 31.90 | 34.90 | 24.10 | 29.50
MN988668.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 38.18 | 45.84 | 21.33 | 19.87 | 68.10 | 31.90 | 34.90 | 24.10 | 29.50
LC522975.1 | 29.89 | 32.11 | 19.62 | 18.37 | 62.00 | 37.99 | 39.61 | 43.08 | 17.76 | 26.80 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
LC522974.1 | 29.89 | 32.11 | 19.62 | 18.37 | 62.00 | 37.99 | 39.60 | 43.07 | 17.77 | 26.80 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
LC522973.1 | 29.89 | 32.11 | 19.62 | 18.37 | 62.00 | 37.99 | 39.60 | 43.09 | 17.77 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
LC522972.1 | 29.89 | 32.11 | 19.62 | 18.37 | 62.00 | 37.99 | 39.60 | 43.08 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT039888.1 | 29.89 | 32.12 | 19.62 | 18.37 | 62.01 | 37.99 | 39.58 | 43.09 | 17.79 | 26.77 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
LC521925.1 | 29.88 | 32.13 | 19.61 | 18.38 | 62.01 | 37.99 | 39.58 | 43.07 | 17.82 | 26.81 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MT020881.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 39.59 | 43.09 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT020880.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 39.59 | 43.09 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MN985325.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 39.59 | 43.09 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT044258.1 | 29.90 | 32.10 | 19.62 | 18.39 | 62.00 | 38.01 | 39.60 | 43.05 | 17.80 | 26.79 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MT044257.1 | 29.90 | 32.11 | 19.62 | 18.38 | 62.01 | 38.00 | 39.57 | 43.09 | 17.77 | 26.79 | 65.10 | 34.90 | 33.70 | 24.10 | 28.90
MT039887.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 39.58 | 43.09 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT027064.1 | 29.89 | 32.11 | 19.62 | 18.37 | 62.00 | 37.99 | 39.59 | 43.09 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT027063.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 39.59 | 43.09 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT027062.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 39.59 | 43.09 | 17.79 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MN997409.1 | 29.89 | 32.11 | 19.62 | 18.38 | 62.00 | 38.00 | 39.60 | 43.08 | 17.77 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MN994468.1 | 29.90 | 32.11 | 19.62 | 18.37 | 62.01 | 37.99 | 39.60 | 43.07 | 17.79 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MN994467.1 | 29.89 | 32.11 | 19.61 | 18.38 | 62.00 | 37.99 | 39.57 | 43.08 | 17.78 | 26.81 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MN988713.1 | 29.89 | 32.10 | 19.62 | 18.37 | 61.99 | 37.99 | 39.59 | 43.11 | 17.77 | 26.76 | 65.10 | 34.90 | 34.10 | 24.40 | 29.25
MN975262.1 | 29.92 | 32.10 | 19.61 | 18.37 | 62.02 | 37.98 | 39.63 | 43.09 | 17.76 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MN938384.1 | 29.86 | 32.13 | 19.64 | 18.38 | 61.99 | 38.02 | 38.56 | 33.95 | 29.98 | 23.92 | 58.60 | 41.40 | 34.90 | 24.10 | 29.50
MT007544.1 | 29.95 | 32.08 | 19.60 | 18.37 | 62.03 | 37.97 | 39.79 | 43.01 | 17.73 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT039873.1 | 29.87 | 32.11 | 19.63 | 18.39 | 61.98 | 38.02 | 39.57 | 43.07 | 17.80 | 26.81 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MT019533.1 | 29.90 | 32.11 | 19.62 | 18.38 | 62.01 | 38.00 | 39.60 | 43.09 | 17.77 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT019532.1 | 29.91 | 32.10 | 19.62 | 18.37 | 62.01 | 37.99 | 39.62 | 43.07 | 17.78 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT019531.1 | 29.93 | 32.08 | 19.61 | 18.37 | 62.01 | 37.98 | 39.65 | 43.06 | 17.77 | 26.80 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT019530.1 | 29.92 | 32.09 | 19.62 | 18.38 | 62.01 | 38.00 | 39.61 | 43.05 | 17.79 | 26.82 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MT019529.1 | 29.93 | 32.08 | 19.62 | 18.37 | 62.01 | 37.99 | 39.64 | 43.07 | 17.79 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT066176.1 | 29.86 | 32.12 | 19.63 | 18.38 | 61.98 | 38.01 | 39.55 | 43.08 | 17.80 | 26.79 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MT066175.1 | 29.87 | 32.12 | 19.63 | 18.39 | 61.99 | 38.02 | 39.55 | 43.08 | 17.80 | 26.79 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MT049951.1 | 29.95 | 32.08 | 19.60 | 18.37 | 62.03 | 37.97 | 39.67 | 43.09 | 17.76 | 26.78 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MT039890.1 | 29.95 | 32.10 | 19.60 | 18.36 | 62.05 | 37.96 | 39.67 | 43.06 | 17.79 | 26.79 | 65.10 | 34.90 | 34.90 | 22.90 | 28.90
MN996531.1 | 29.86 | 32.12 | 19.63 | 18.39 | 61.98 | 38.02 | 38.14 | 45.84 | 21.34 | 19.88 | 68.10 | 31.90 | 34.90 | 24.10 | 29.50
MN996530.1 | 29.86 | 32.12 | 19.63 | 18.39 | 61.98 | 38.02 | 38.56 | 33.94 | 29.98 | 23.92 | 58.60 | 41.40 | 34.90 | 24.10 | 29.50
MN996529.1 | 29.86 | 32.12 | 19.63 | 18.39 | 61.98 | 38.02 | 39.53 | 43.08 | 17.81 | 26.81 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MN996528.1 | 29.92 | 32.10 | 19.61 | 18.37 | 62.02 | 37.98 | 39.62 | 43.07 | 17.78 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
MN996527.1 | 29.85 | 32.13 | 19.63 | 18.38 | 61.98 | 38.01 | 39.56 | 43.09 | 17.80 | 26.79 | 65.00 | 35.00 | 34.90 | 24.10 | 29.50
MN908947.3 | 29.94 | 32.08 | 19.61 | 18.37 | 62.02 | 37.98 | 39.66 | 43.07 | 17.77 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
NC_045512.2 | 29.94 | 32.08 | 19.61 | 18.37 | 62.02 | 37.98 | 39.66 | 43.07 | 17.77 | 26.79 | 65.10 | 34.90 | 34.90 | 24.10 | 29.50
Mean ± SD | 29.90 ± 0.03 | 32.11 ± 0.01 | 19.62 ± 0.01 | 18.38 ± 0.01 | 62.00 ± 0.02 | 38.00 ± 0.01 | 39.45 ± 0.43 | 42.83 ± 2.16 | 18.64 ± 2.76 | 26.14 ± 1.89 | 64.98 ± 1.67 | 35.02 ± 1.67 | 34.85 ± 0.22 | 24.08 ± 0.19 | 29.46 ± 0.14
ENC represents the effective number of codons.
GC12 represents the G + C content at the first and second positions of codons.
GC3 represents the G + C content at the third positions of codons.
AU3 represents the A + U content at the third positions of codons.
To investigate the magnitude of codon usage bias in SARS-Cov-2, we calculated the mean nucleotide contents at each codon position. At the third codon position, the percentages of A3 (39.45% ± 0.43) and U3 (42.83% ± 2.16) were significantly higher than those of G3 (18.64% ± 2.76) and C3 (26.14% ± 1.89) (Table 1, Fig. S1 (D), wilcox.test, P < 0.01). These values were similar to the total nucleotide composition, showing that A3, U3, G3 and C3 may be influenced by the total nucleotide composition. The GC content at each codon position was also considered to reflect the base composition bias. The ranges of GC content were as follows: 33.70–34.90% (mean = 34.85, SD = 0.22) at the first codon position; 22.90–24.40% (mean = 24.08, SD = 0.19) at the second codon position; and 28.90–29.50% (mean = 29.46, SD = 0.14) averaged over the first and second codon positions. In addition, the mean values of AU3 and GC3 were 64.98% ± 1.67 and 35.02% ± 1.67, respectively, suggesting that A and U nucleotides may be enriched at the third position of codons (Table 1, Fig. S1 (B), wilcox.test, P < 0.01).
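The position-specific quantities in Table 1 (A3, U3, G3, C3, GC1, GC2, GC3, GC12, AU3) follow from counting bases separately at each codon position. The sketch below is illustrative only: the function name is hypothetical, it assumes an in-frame coding sequence containing only A/U/G/C (T is read as U), and it takes GC12 as the mean of GC1 and GC2, which is consistent with the table values (for example, (34.90 + 24.10) / 2 = 29.50).

```python
def codon_position_composition(cds):
    """Percent base composition at codon positions 1, 2 and 3.

    Assumptions: the sequence is in frame, contains only A/U/G/C
    (T is read as U), and trailing bases that do not complete a
    codon are ignored.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    result = {}
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        for base in "AUGC":
            result[f"{base}{pos + 1}"] = 100.0 * bases.count(base) / len(bases)
        result[f"GC{pos + 1}"] = result[f"G{pos + 1}"] + result[f"C{pos + 1}"]
    result["AU3"] = result["A3"] + result["U3"]
    # GC12 taken as the mean of GC1 and GC2 (assumption, see lead-in).
    result["GC12"] = (result["GC1"] + result["GC2"]) / 2
    return result


# Hypothetical toy coding sequence: AUG GCU UAA
for key, value in codon_position_composition("AUGGCUUAA").items():
    print(f"{key}: {value:.2f}%")
```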
Codon usage patterns of SARS-Cov-2 and its hosts
The RSCU analysis was used to estimate the codon usage patterns of SARS-Cov-2 complete genomes and their major gene sequences. Among the eighteen most abundantly used codons in SARS-Cov-2 complete genomes, twelve codons [UUU (Phe), UUA (Leu), AUU (Ile), GUU (Val), AGU (Ser), CCU (Pro), ACA (Thr), GCU (Ala), UAU (Tyr), UGU (Cys), AGA (Arg) and GGU (Gly)] had A/U at the end (three A-ended; nine U-ended) and the remaining six [CAC (His), CAG (Gln), AAC (Asn), AAG (Lys), GAC (Asp) and GAG (Glu)] had G/C at the end (three C-ended; three G-ended), when the SARS-Cov-2 coding sequences were similar according to their gene groups. This indicates that A- and U-ended codons are preferred in the SARS-Cov-2 coding sequences. An analysis of overall RSCU values showed that 3 of the 18 preferred codons [ACA (Thr), AGA (Arg) and GGU (Gly)] had RSCU values > 1.6, while the RSCU values of the remaining preferred codons were between 0.6 and 1.6. Because the overall RSCU values may hide gene-specific patterns, we also estimated the RSCU values of SARS-Cov-2 coding sequences by gene group. We found that the preferred codons differed among the gene groups. The ratios of consistent/inconsistent preferred codons between the SARS-Cov-2 complete genomes and the E, M, N and S genes were 4:14, 6:12, 4:14 and 8:10, respectively (Table 2, Fig. S2). Gene-specific over-represented codons were also observed in the SARS-Cov-2 isolates: 11 of the 18 preferred codons were over-represented in the E gene, 10 of 18 in the M gene, 6 of 18 in the N gene and 9 of 18 in the S gene. The gene-specific RSCU patterns indicated independent evolutionary dynamics in the SARS-Cov-2 isolates. In addition, to estimate the potential effects of the host and vector on the viral codon usage pattern, the RSCU patterns were compared with those of potential hosts such as human and bat (Table 2, Fig. S2).
Among these 18 preferred codons, the ratio of common/uncommon preferred codons was 6:12 between SARS-Cov-2 and human and 13:5 between SARS-Cov-2 and bat (Table 2, Fig. S2).
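For reference, RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so a value above 1 means the codon is used more often than expected. The sketch below follows that standard definition and is not taken from the authors' code; the function name and toy input are hypothetical, and it assumes the standard genetic code with Met, Trp and stop codons excluded, which leaves the 18 amino acids and 59 codons listed in Table 2.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Relative synonymous codon usage for an in-frame coding sequence.

    RSCU = n_i * x_ij / sum_j(x_ij), where x_ij is the count of codon j
    for amino acid i and n_i is the number of synonymous codons.
    Met, Trp and stop codons are excluded.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Families absent from the sequence are reported as 0.0.
            values[c] = len(codons) * counts[c] / total if total else 0.0
    return values


# Hypothetical toy input: three UUU codons and one UUC codon (both Phe).
toy = rscu("UUUUUUUUUUUC")
print(toy["UUU"], toy["UUC"])
```

With the thresholds used in this section, a codon would be flagged as over-represented when its value exceeds 1.6 and under-represented when it falls below 0.6.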
Table 2
The relative synonymous codon usage frequency (RSCU) of SARS-Cov-2 complete genomes, major viral genes, its natural hosts and transmission vectors.
AA | Codons | E gene | M gene | N gene | S gene | SARS-Cov-2 | Human | Bat
Phe | UUU | 0.8 | 0.91 | 0.48 | 1.53 | 1.33 | 0.97 | 1.31
Phe | UUC | 1.20 | 1.09 | 1.52 | 0.47 | 0.67 | 1.03 | 0.69
Leu | UUA | 0.43 | 0.69 | 0.46 | 1.56 | 1.40 | 0.5 | 1.76
Leu | UUG | 0.86 | 0.69 | 1.99 | 1.12 | 1.30 | 0.85 | 0.64
Leu | CUU | 2.99 | 2.06 | 1.77 | 2.00 | 1.19 | 0.81 | 0.76
Leu | CUC | 0.00 | 1.03 | 0.44 | 0.67 | 0.46 | 1.07 | 0.70
Leu | CUA | 0.86 | 0.86 | 0.66 | 0.50 | 0.88 | 0.46 | 1.51
Leu | CUG | 0.86 | 0.69 | 0.66 | 0.17 | 0.77 | 2.33 | 0.63
Ile | AUU | 1.00 | 1.65 | 1.93 | 1.74 | 1.46 | 1.13 | 1.29
Ile | AUC | 1.00 | 0.90 | 0.86 | 0.55 | 0.64 | 1.37 | 0.69
Ile | AUA | 1.00 | 0.45 | 0.21 | 0.71 | 0.90 | 0.5 | 1.02
Val | GUU | 2.15 | 1.00 | 1.00 | 1.98 | 1.45 | 0.79 | 0.91
Val | GUC | 0.31 | 0.00 | 1.50 | 0.87 | 0.55 | 0.90 | 0.85
Val | GUA | 0.92 | 2.00 | 0.50 | 0.62 | 0.93 | 0.52 | 1.78
Val | GUG | 0.62 | 1.00 | 1.00 | 0.54 | 1.06 | 1.79 | 0.46
Ser | UCU | 3.00 | 0.80 | 1.30 | 2.24 | 1.50 | 1.15 | 1.33
Ser | UCC | 0.00 | 1.20 | 0.49 | 0.73 | 0.55 | 1.17 | 1.33
Ser | UCA | 0.75 | 1.20 | 1.45 | 1.58 | 1.48 | 0.93 | 1.93
Ser | UCG | 0.75 | 0.40 | 0.32 | 0.12 | 0.30 | 0.36 | 0.33
Ser | AGU | 0.75 | 1.60 | 1.46 | 1.03 | 1.57 | 0.98 | 0.93
Ser | AGC | 0.75 | 0.80 | 0.97 | 0.30 | 0.51 | 1.42 | 1.23
Pro | CCU | 4.00 | 0.80 | 1.14 | 2.00 | 1.60 | 1.20 | 1.40
Pro | CCC | 0.00 | 0.00 | 1.00 | 0.28 | 0.32 | 1.22 | 0.44
Pro | CCA | 0.00 | 2.40 | 1.57 | 1.72 | 1.37 | 1.14 | 1.23
Pro | CCG | 0.00 | 0.80 | 0.29 | 0.00 | 0.73 | 0.45 | 1.12
Thr | ACU | 1.00 | 1.54 | 2.00 | 1.81 | 1.59 | 1.03 | 1.32
Thr | ACC | 0.00 | 0.92 | 0.75 | 0.41 | 0.31 | 1.32 | 0.32
Thr | ACA | 2.00 | 0.92 | 1.00 | 1.65 | 1.82 | 1.19 | 0.94
Thr | ACG | 1.00 | 0.62 | 0.25 | 0.12 | 0.63 | 0.46 | 1.28
Ala | GCU | 1.00 | 2.51 | 2.06 | 2.13 | 1.26 | 1.08 | 1.66
Ala | GCC | 1.00 | 0.43 | 0.76 | 0.41 | 0.29 | 1.51 | 0.12
Ala | GCA | 0.00 | 0.84 | 0.86 | 1.37 | 1.03 | 0.95 | 1.31
Ala | GCG | 2.00 | 0.21 | 0.32 | 0.10 | 0.97 | 0.46 | 0.69
Tyr | UAU | 0.00 | 0.89 | 0.36 | 1.48 | 1.22 | 0.93 | 1.08
Tyr | UAC | 2.00 | 1.11 | 1.64 | 0.52 | 0.72 | 1.07 | 0.82
His | CAU | 2.00 | 1.60 | 1.50 | 1.53 | 0.76 | 0.85 | 0.81
His | CAC | 0.00 | 0.40 | 0.50 | 0.47 | 1.13 | 1.15 | 1.05
Gln | CAA | 0.00 | 1.00 | 1.54 | 1.48 | 0.87 | 0.49 | 0.95
Gln | CAG | 0.00 | 1.00 | 0.46 | 0.52 | 1.23 | 1.51 | 1.52
Asn | AAU | 1.60 | 0.73 | 1.45 | 1.23 | 0.77 | 0.98 | 0.48
Asn | AAC | 0.40 | 1.27 | 0.55 | 0.77 | 1.14 | 1.02 | 0.95
Lys | AAA | 2.00 | 1.14 | 1.35 | 1.25 | 0.86 | 0.88 | 1.05
Lys | AAG | 0.00 | 0.86 | 0.65 | 0.75 | 1.31 | 1.12 | 1.14
Asp | GAU | 2.00 | 0.33 | 1.17 | 1.39 | 0.69 | 0.99 | 0.86
Asp | GAC | 0.00 | 1.67 | 0.83 | 0.61 | 1.23 | 1.01 | 1.00
Glu | GAA | 1.00 | 1.71 | 1.33 | 1.42 | 0.77 | 0.85 | 1.00
Glu | GAG | 1.00 | 0.29 | 0.67 | 0.58 | 1.06 | 1.15 | 1.10
Cys | UGU | 0.67 | 2.00 | 0.00 | 1.40 | 1.00 | 0.95 | 1.00
Cys | UGC | 1.33 | 0.00 | 0.00 | 0.60 | 0.77 | 1.05 | 0.17
Arg | CGU | 2.00 | 2.14 | 1.24 | 1.28 | 0.41 | 0.54 | 2.23
Arg | CGC | 0.00 | 0.86 | 1.03 | 0.14 | 0.33 | 1.11 | 0.70
Arg | CGA | 2.00 | 0.43 | 1.03 | 0.02 | 1.37 | 0.76 | 0.49
Arg | CGG | 0.00 | 0.00 | 0.41 | 0.28 | 0.79 | 1.31 | 0.58
Arg | AGA | 2.00 | 1.29 | 2.07 | 2.85 | 2.63 | 1.18 | 1.39
Arg | AGG | 0.00 | 1.29 | 0.21 | 1.43 | 1.42 | 1.10 | 0.82
Gly | GGU | 4.00 | 1.43 | 0.93 | 2.29 | 1.71 | 0.71 | 0.75
Gly | GGC | 0.00 | 0.86 | 1.49 | 0.73 | 0.81 | 1.35 | 0.96
Gly | GGA | 0.00 | 1.71 | 1.21 | 0.83 | 1.01 | 1.01 | 1.49
Gly | GGG | 0.00 | 0.00 | 0.37 | 0.15 | 0.46 | 0.93 | 0.80
AA represents amino acid; RSCU represents relative synonymous codon usage. Orange marks the codons favored by SARS-Cov-2 and its hosts (RSCU > 1); over-represented (RSCU > 1.6) and under-represented (RSCU < 0.6) codons are marked in bold red and bold green, respectively; the ideal codons for SARS-Cov-2 are underlined.
Measuring the similarity influences between the overall codon usage of SARS-Cov-2 and that of hosts
Spearman’s correlational distance analysis was used to further estimate the similarity of codon usage patterns between SARS-Cov-2 and its hosts, and to investigate how the overall codon usage patterns of the hosts and SARS-Cov-2 participated in the evolutionary process. Such RSCU-dependent analyses have been applied routinely to viral hosts, but remain limited to codon usage patterns and similarities [28–31]. Here, we applied this method through hierarchical clustering of the virus and hosts, and estimated their overall codon usage similarities, which gives a clear view of the codon usage patterns. Two main groups were noted in this analysis. One cluster included the virus and the vector (bat), and the other cluster included only the host (human) (Fig. 2). The statistical tests on the distances of RSCU values (each of which was compared with a synonymous shuffling null model) indicated a significant signature of shared codon usage between the vector and SARS-Cov-2 (P < 0.01), but not between human and SARS-Cov-2 (P > 0.05). This suggests that viral transmission in humans may depend on the vector (bat).
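A common way to turn Spearman's rank correlation into a distance is d = 1 − ρ, where ρ is computed on the two RSCU vectors. The sketch below uses that form as an assumption, since the exact distance definition and clustering linkage are not stated here; all function names are hypothetical. The example vectors are the Phe and Leu rows of Table 2 (UUU, UUC, UUA, UUG, CUU, CUC, CUA, CUG), chosen only to show the call, and they are not a substitute for the full 59-codon comparison.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_distance(x, y):
    """Distance 1 - rho, where rho is Spearman's rank correlation."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


# RSCU values for UUU, UUC, UUA, UUG, CUU, CUC, CUA, CUG from Table 2.
virus = [1.33, 0.67, 1.40, 1.30, 1.19, 0.46, 0.88, 0.77]
human = [0.97, 1.03, 0.5, 0.85, 0.81, 1.07, 0.46, 2.33]
bat = [1.31, 0.69, 1.76, 0.64, 0.76, 0.70, 1.51, 0.63]
print("virus vs human:", spearman_distance(virus, human))
print("virus vs bat:  ", spearman_distance(virus, bat))
```

A pairwise matrix of such distances over all 59 codons could then be passed to any hierarchical clustering routine to obtain a dendrogram of the kind described for Fig. 2.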
Codon usage adaptation in SARS-Cov-2
Codon adaptation index (CAI) analysis was performed to investigate the relationship between the codon usage patterns and the expression levels of SARS-Cov-2 coding sequences, which reflects the adaptation of the virus to the host cellular machinery. CAI values range from 0 to 1, and higher CAI values indicate higher levels of codon usage bias [32]. The CAI values were obtained for each gene of SARS-Cov-2 in relation to human and bat, respectively (Fig. 2 and Table S1). For the E gene, the mean CAI value was 0.617 ± 0.001 in relation to human and 0.602 ± 0.001 in relation to bat. For the M gene, the mean CAI value was 0.674 ± 0.001 in relation to human and 0.670 ± 0.001 in relation to bat. For the N gene, the mean CAI value was 0.731 ± 0.001 in relation to human and 0.710 ± 0.001 in relation to bat. For the S gene, the mean CAI value was 0.710 ± 0.001 in relation to human and 0.755 ± 0.001 in relation to bat. Student’s t-test indicated that these differences in CAI values were significant (Fig. 3 and Table S1).
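In its usual formulation, CAI is the geometric mean of the relative adaptiveness w of each codon in a gene, where w is the codon's usage in a reference set divided by the usage of the most-used synonymous codon. The sketch below follows that general definition and is not the authors' implementation: the function names are hypothetical, reference RSCU values stand in for reference codon counts, and the `floor` value applied to zero weights is an assumption made to keep the logarithm defined. The reference numbers are the human Phe and Lys entries from Table 2.

```python
import math


def relative_adaptiveness(ref_rscu, families):
    """w = reference RSCU of a codon / highest RSCU among its synonyms.

    Assumes every family has at least one codon with a non-zero
    reference value.
    """
    weights = {}
    for family in families:
        best = max(ref_rscu[codon] for codon in family)
        for codon in family:
            weights[codon] = ref_rscu[codon] / best
    return weights


def cai(cds, weights, floor=0.01):
    """Geometric mean of w over the codons of an in-frame sequence.

    Codons without a weight (Met, Trp, stops, or families not supplied)
    are skipped. `floor` replaces zero weights (an assumption).
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    logs = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in weights:
            logs.append(math.log(max(weights[codon], floor)))
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Human RSCU values for Phe and Lys codons, taken from Table 2.
human_rscu = {"UUU": 0.97, "UUC": 1.03, "AAA": 0.88, "AAG": 1.12}
families = [["UUU", "UUC"], ["AAA", "AAG"]]
w = relative_adaptiveness(human_rscu, families)

print(cai("UUCAAG", w))  # uses the most-used codon in both families
print(cai("UUUAAA", w))  # uses the less-used codon in both families
```

A gene built only from the host's most-used codons scores 1, and the score falls as rarer codons are used, which is why CAI values relative to human and bat can be compared gene by gene.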
Evolutionary rates of various genes in SARS-Cov-2
To investigate why the CAI values in relation to human and bat differ in pattern between the S gene and the other three genes, we estimated the evolutionary rates of SARS-Cov-2 strains with known collection dates using a Bayesian coalescent approach applied to the sequences of the protein-coding genes. Under the best-fit model, the estimated mean substitution rates for these genes were between 2.35 × 10⁻⁴ and 4.21 × 10⁻³ substitutions per site per year (Table 3). Among the structural protein-encoding genes, the E gene had the fastest evolutionary rate (4.21 × 10⁻³), with a 95% highest posterior density (HPD) interval of 5.40 × 10⁻⁸ to 1.3 × 10⁻². The S gene evolved at the slowest rate (2.35 × 10⁻⁴), with a 95% HPD interval of 1.29 × 10⁻⁸ to 7.21 × 10⁻⁴.
Table 3
Bayesian estimates of evolutionary rate of specific gene segments of SARS-Cov-2.
Gene | Evolutionary rate (nt substitutions per site per year) | 95% HPD
E | 4.21 × 10⁻³ | 5.40 × 10⁻⁸ to 1.3 × 10⁻²
M | 1.45 × 10⁻³ | 1.11 × 10⁻⁷ to 4.41 × 10⁻³
N | 7.96 × 10⁻⁴ | 1.57 × 10⁻⁷ to 2.36 × 10⁻³
S | 2.35 × 10⁻⁴ | 1.29 × 10⁻⁸ to 7.21 × 10⁻⁴
HPD, highest posterior density.