Phylogenetic analysis of coronavirus genomes (Figure 1A) revealed that the sequence of the newly identified coronavirus 2019-nCoV Wuhan-Hu-1 was closer to the SARS-CoVs and more distant from the two alphacoronaviruses (HCoV-229E, HCoV-NL63).
Nucleotide composition analysis revealed that U was the most abundant nucleotide in 2019-nCoV Wuhan-Hu-1 (32.2%), followed by A (29.9%), with G (19.6%) and C (18.3%) present at similar, lower levels. The mean GC and AU compositions were, respectively, 37.9% and 62.1% (2019-nCoV Wuhan-Hu-1), 41.0% and 59.0% (SARS-CoV Tor2), 40.8% and 59.2% (SARS-CoV Urbani), 41.5% and 58.5% (MERS-CoV HCoV-EMC), 36.8% and 63.2% (HCoV-OC43), 32.0% and 68.0% (HCoV-HKU1), 38.0% and 62.0% (HCoV-229E), and 34.4% and 65.6% (HCoV-NL63), indicating that 2019-nCoV Wuhan-Hu-1 and the other representative coronaviruses in this study are all AU rich.
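The composition values above follow from simple base counting. As a minimal sketch (the function name `nucleotide_composition` and the toy sequence are illustrative assumptions, not taken from the study), the percentages could be computed as follows:

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return A/C/G/U percentages plus combined GC and AU percentages.

    Assumption: DNA-style input is accepted and T is read as U, so that
    the output matches the RNA notation used in the text.
    """
    rna = seq.upper().replace("T", "U")
    # Ignore ambiguous symbols (e.g. N) so they do not distort percentages.
    counts = Counter(base for base in rna if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U/T bases")
    pct = {base: 100.0 * counts[base] / total for base in "ACGU"}
    pct["GC"] = pct["G"] + pct["C"]
    pct["AU"] = pct["A"] + pct["U"]
    return pct


# Toy sequence for illustration only (not a real genome fragment).
composition = nucleotide_composition("AUGCAUUA")
print(composition)
```

The same additive relation holds for the reported values: 32.2% U plus 29.9% A gives the 62.1% AU content, and 19.6% G plus 18.3% C gives the 37.9% GC content of 2019-nCoV Wuhan-Hu-1.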
RSCU analysis of the complete coding sequences of 2019-nCoV Wuhan-Hu-1 revealed that 12 codons (AGA, UAA, GGU, GCU, UCU, GUU, CCU, ACU, CUU, UCA, ACA, UUA) were over-represented (RSCU value >1.6), all of which end with A or U. The highest RSCU value was observed for AGA, encoding arginine (R; 2.67), and the lowest for UCG, encoding serine (S; 0.11), which is consistent with a recent report based on CodonW 1.4.2 analysis [9]. The heatmap analysis (Figure 1B) further revealed that all the coronaviruses analyzed in this study share the over-represented codons GGU, GCU, UAA, GUU, UCU, CCU and ACU, with average RSCU values >2.0, whereas two codons (UCA, ACA) were over-represented only in 2019-nCoV and the SARS-CoVs.
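For reference, RSCU (relative synonymous codon usage) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates this definition together with the >1.6 over-representation threshold used above. It is not the pipeline used in the study: the function names, the treatment of the three stop codons as one synonymous family (assumed here because UAA appears among the over-represented codons), and the toy sequence are all assumptions.

```python
from collections import Counter, defaultdict

# Standard genetic code in RNA notation, bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    Assumptions: T is read as U, and stop codons ("*") are treated as a
    synonymous family of three codons.
    """
    rna = cds.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so skip it
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


def over_represented(values, threshold=1.6):
    """Codons whose RSCU exceeds the threshold (1.6, as in the text)."""
    return sorted(c for c, v in values.items() if v > threshold)


# Toy example: four alanine codons, three GCU and one GCC.
toy = rscu("GCUGCUGCUGCC")
print(toy["GCU"], toy["GCC"], toy["GCA"], over_represented(toy))
```

In the toy example the four alanine codons would each be expected once under equal usage, so GCU, observed three times, has an RSCU of 3.0 and is flagged as over-represented.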
The codon usage patterns of individual coronavirus genes were further analyzed (Figure 1C). For the spike (S) gene, all the coronaviruses analyzed in this study share the over-represented codons UCU, GUU, GCU, CCU, ACU and AUU, all of which end with U, whereas two codons (CCA, ACA) were over-represented only in 2019-nCoV. For the envelope (E) gene, two codons (GCG, UAC) were over-represented only in 2019-nCoV and the SARS-CoVs. None of the coronaviruses analyzed in this study used the two synonymous codons CGC and CGG for arginine, or CCG for proline. Only 2019-nCoV and the SARS-CoVs did not use CAA for glutamine, whereas they did use AUC for isoleucine and UCG for serine. For the membrane (M) gene, two codons (GUA, GAA) were over-represented only in 2019-nCoV. For the nucleocapsid (N) gene, all the coronaviruses analyzed in this study share the over-represented codons CUU, ACU and GCU, all of which end with U. The average RSCU values of GCU across all the coronaviruses were 2.22, 2.30, 1.79, 2.13 and 2.16 for the complete coding sequences, S gene, E gene, M gene and N gene, respectively. GCU, encoding alanine, was thus identified as the highly preferred codon.
To further estimate the degree of codon usage bias, the intrinsic codon bias index (ICDI), codon bias index (CBI) and effective number of codons (ENC) were calculated (Table 1). The ICDI (0.144), CBI (0.306) and ENC (45.38) values all indicated a relatively low codon usage bias in 2019-nCoV, similar to that of SARS-CoV Tor2, SARS-CoV Urbani, MERS-CoV HCoV-EMC, HCoV-OC43 and HCoV-229E, but different from HCoV-HKU1 (ICDI 0.372; CBI 0.532; ENC 35.617) and HCoV-NL63 (ICDI 0.307; CBI 0.476; ENC 37.275), which exhibited moderate codon usage bias.
Table 1. Parameters of codon usage bias among the coronaviruses analyzed in this study.
Coronaviruses | ICDI | CBI | ENC
2019-nCoV Wuhan-Hu-1 | 0.144 | 0.306 | 45.38
SARS-CoV Tor2 | 0.075 | 0.223 | 49.746
SARS-CoV Urbani | 0.08 | 0.228 | 48.965
MERS-CoV HCoV-EMC | 0.082 | 0.248 | 50.033
HCoV-OC43 | 0.213 | 0.367 | 43.794
HCoV-HKU1 | 0.372 | 0.532 | 35.617
HCoV-229E | 0.172 | 0.358 | 43.45
HCoV-NL63 | 0.307 | 0.476 | 37.275
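To make the ENC column of Table 1 easier to interpret, the following sketch computes ENC using Wright's (1990) formulation, in which the value ranges from 20 (one codon used per amino acid, extreme bias) to 61 (all synonymous codons used equally, no bias). The text does not state how the reported ENC values were computed, so this formulation is an assumption, as are the function name, the cap at 61 and the fallback for a missing isoleucine class.

```python
from collections import Counter, defaultdict

# Standard genetic code in RNA notation, bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codon families, excluding stop codons.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def effective_number_of_codons(cds):
    """ENC of an in-frame coding sequence (assumed: Wright 1990 formula)."""
    rna = cds.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    f_by_degeneracy = defaultdict(list)
    for codons in FAMILIES.values():
        k = len(codons)                      # degeneracy class: 1, 2, 3, 4 or 6
        n = sum(counts[c] for c in codons)   # occurrences of this amino acid
        if k == 1 or n < 2:
            continue                         # Met/Trp, or too few observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)       # codon homozygosity
        if f > 0:
            f_by_degeneracy[k].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in f_by_degeneracy.items()}
    # Assumed fallback: if isoleucine (the only 3-fold class) is unusable,
    # average the 2-fold and 4-fold classes.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("sequence too short to estimate ENC")
    # 2 single-codon amino acids, then 9 two-fold, 1 three-fold,
    # 5 four-fold and 3 six-fold amino acids.
    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)                    # conventional upper bound


# Toy sequences for illustration only.
extreme = "".join(codons[0] * 5 for codons in FAMILIES.values())
uniform = "".join(c * 5 for codons in FAMILIES.values() for c in codons)
print(effective_number_of_codons(extreme), effective_number_of_codons(uniform))
```

On this scale, the ENC values in Table 1 lie well above the minimum of 20, with HCoV-HKU1 (35.617) and HCoV-NL63 (37.275) nearest to it.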
Overall, this study provides a snapshot of the codon usage pattern of 2019-nCoV. This novel coronavirus has a relatively low codon usage bias, similar to most of the representative coronaviruses, which might help it adapt to the host or to varied environments. The factors underlying the low codon usage bias of 2019-nCoV, e.g. natural selection and mutational pressure, warrant further investigation. The information from this research may not only provide new insights into the evolution of human coronaviruses, but also have potential value for developing coronavirus vaccines.