Phylogenetic analysis of coronavirus genomes (Fig. 1A) revealed that the newly identified coronavirus SARS-CoV-2 Wuhan-Hu-1 was closer to SARS-CoV Tor2 and SARS-CoV Urbani, and more distant from the two alphacoronaviruses (HCoV-229E and HCoV-NL63).
Nucleotide composition analysis revealed that, in SARS-CoV-2 Wuhan-Hu-1, U was the most abundant nucleotide (32.2%), followed by A (29.9%), whereas G (19.6%) and C (18.3%) were present at similar, lower frequencies. Moreover, the mean GC and AU contents were 37.9% and 62.1% (SARS-CoV-2 Wuhan-Hu-1), 41.0% and 59.0% (SARS-CoV Tor2), 40.8% and 59.2% (SARS-CoV Urbani), 41.5% and 58.5% (MERS-CoV HCoV-EMC), 36.8% and 63.2% (HCoV-OC43), 32.0% and 68.0% (HCoV-HKU1), 38.0% and 62.0% (HCoV-229E), and 34.4% and 65.6% (HCoV-NL63), respectively, indicating that SARS-CoV-2 Wuhan-Hu-1 and the other representative coronaviruses in this study are all AU-rich.
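The composition values above can be reproduced in principle with a short script. The following is a minimal sketch, not the pipeline used in this study: the function name `nucleotide_composition` and the toy sequence are hypothetical, and ambiguous bases are assumed to be ignored.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the percentage of A, U, G and C, plus GC% and AU%."""
    # Normalise to upper-case RNA notation (T is treated as U).
    seq = seq.upper().replace("T", "U")
    # Assumption: ambiguous bases (N, R, Y, ...) are simply skipped.
    counts = Counter(base for base in seq if base in "AUGC")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    pct = {base: 100.0 * counts[base] / total for base in "AUGC"}
    pct["GC"] = pct["G"] + pct["C"]
    pct["AU"] = pct["A"] + pct["U"]
    return pct


# Hypothetical toy sequence, used for illustration only.
composition = nucleotide_composition("AUGGCUUAA")
print({name: round(value, 1) for name, value in composition.items()})
```

By construction GC% and AU% sum to 100, which matches the reported pairs (e.g. 19.6 + 18.3 = 37.9 and 32.2 + 29.9 = 62.1 for SARS-CoV-2 Wuhan-Hu-1).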
RSCU analysis of the complete coding sequences of SARS-CoV-2 Wuhan-Hu-1 revealed that the following codons (AGA, UAA, GGU, GCU, UCU, GUU, CCU, ACU, CUU, UCA, ACA, UUA) were over-represented (RSCU value > 1.6), all of which end with A or U. The highest RSCU value was observed for AGA, encoding arginine (R; 2.67), and the lowest for UCG, encoding serine (S; 0.11), consistent with a recent report based on CodonW 1.4.2 analysis [9]. The heatmap analysis (Fig. 1B) further revealed that all the coronaviruses analyzed in this study share the over-represented codons GGU, GCU, UAA, GUU, UCU, CCU and ACU, with average RSCU values > 2.0, whereas two codons (UCA, ACA) were over-represented only in SARS-CoV-2 and the SARS-CoVs.
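For reference, the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates this under stated assumptions: it uses the standard genetic code, treats the three stop codons as one synonymous family (since UAA is reported above), and the function name `rscu` and the toy sequence are hypothetical. It is not the software used in this study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in RNA notation (bases ordered U, C, A, G).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence."""
    cds = cds.upper().replace("T", "U")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid ("*" groups the stop codons).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue  # Met, Trp and unobserved amino acids are skipped
        for c in family:
            # observed count / (total count / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


# Hypothetical toy sequence: three GCU and one GCA codon (all alanine).
values = rscu("GCUGCUGCUGCA")
over_represented = sorted(c for c, v in values.items() if v > 1.6)
print(over_represented)
```

The threshold of 1.6 in the usage lines is the over-representation cutoff stated in the text.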
The codon usage patterns of individual coronavirus genes were further analyzed (Fig. 1C). For the spike (S) gene, all the coronaviruses analyzed in this study share the over-represented codons UCU, GUU, GCU, CCU, ACU and AUU, all of which end with U, whereas two codons (CCA, ACA) were over-represented only in SARS-CoV-2. For the envelope (E) gene, two codons (GCG, UAC) were over-represented only in SARS-CoV-2 and the SARS-CoVs. None of the coronaviruses analyzed in this study used the arginine codons CGC and CGG or the proline codon CCG. Only SARS-CoV-2 and the SARS-CoVs did not use CAA for glutamine, whereas they did use AUC for isoleucine and UCG for serine. For the membrane (M) gene, two codons (GUA, GAA) were over-represented only in SARS-CoV-2. For the nucleocapsid (N) gene, all the coronaviruses analyzed in this study share the over-represented codons CUU, ACU and GCU, all of which end with U. The average RSCU values of GCU in the complete coding sequences and in the S, E, M and N genes of all the coronaviruses were 2.22, 2.30, 1.79, 2.13 and 2.16, respectively, identifying GCU (alanine) as the highly preferred codon.
To further estimate the degree of codon usage bias, the intrinsic codon bias index (ICDI), codon bias index (CBI) and effective number of codons (ENC) were calculated (Table 1). The ICDI (0.144), CBI (0.306) and ENC (45.38) values all indicated a relatively low codon usage bias in SARS-CoV-2, similar to that of SARS-CoV Tor2, SARS-CoV Urbani, MERS-CoV HCoV-EMC, HCoV-OC43 and HCoV-229E, but different from that of HCoV-HKU1 (ICDI 0.372; CBI 0.532; ENC 35.617) and HCoV-NL63 (ICDI 0.307; CBI 0.476; ENC 37.275), which exhibited moderate codon usage bias.
Table 1
The parameters of codon usage bias among the coronaviruses analyzed in this study.
Coronaviruses | ICDI | CBI | ENC |
SARS-CoV-2 Wuhan-Hu-1 | 0.144 | 0.306 | 45.38 |
SARS-CoV Tor2 | 0.075 | 0.223 | 49.746 |
SARS-CoV Urbani | 0.08 | 0.228 | 48.965 |
MERS-CoV HCoV-EMC | 0.082 | 0.248 | 50.033 |
HCoV-OC43 | 0.213 | 0.367 | 43.794 |
HCoV-HKU1 | 0.372 | 0.532 | 35.617 |
HCoV-229E | 0.172 | 0.358 | 43.45 |
HCoV-NL63 | 0.307 | 0.476 | 37.275 |
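To make the ENC column of Table 1 easier to interpret: ENC ranges from 20 (one codon used per amino acid, extreme bias) to 61 (all synonymous codons used equally, no bias). The source does not state which ENC formula was applied, so the sketch below assumes Wright's classical estimator and the standard genetic code. The function name `effective_number_of_codons` is hypothetical, and the handling of missing degeneracy classes is omitted for brevity.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in RNA notation (bases ordered U, C, A, G).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def effective_number_of_codons(cds):
    """Estimate ENC for an in-frame coding sequence (assumed Wright formula)."""
    cds = cds.upper().replace("T", "U")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))

    # Synonymous families for the 20 amino acids (stop codons excluded).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Homozygosity F per amino acid, grouped by degeneracy (2, 3, 4 or 6).
    f_by_class = defaultdict(list)
    for family in families.values():
        k = len(family)
        n = sum(counts[c] for c in family)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; n < 2 is undefined
        f = (n * sum((counts[c] / n) ** 2 for c in family) - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    missing = {2, 3, 4, 6} - f_by_class.keys()
    if missing:
        # Simplification of this sketch: no fallback for absent classes.
        raise ValueError(f"no usable amino acids with degeneracy {sorted(missing)}")
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}

    # 2 single-codon, 9 two-fold, 1 three-fold, 5 four-fold, 3 six-fold amino acids.
    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)
```

On this scale, every value in Table 1 (35.617 to 50.033) lies well above the theoretical minimum of 20.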
Overall, this study provides a snapshot of the codon usage pattern of SARS-CoV-2. This novel coronavirus has a relatively low codon usage bias, similar to most of the representative coronaviruses, which might help it adapt to the host or to varied environments. The factors underlying the low codon usage bias of SARS-CoV-2, e.g. natural selection and mutational pressure, warrant further investigation. The information from this research may not only offer new insights into the evolution of human coronaviruses, but also have potential value for developing coronavirus vaccines.