Epidemic HCoVs, including SARS-CoV, MERS-CoV and SARS-CoV-2, although targeting several cell types, are characterized by a greater involvement of the lower respiratory tract than other HCoVs and can therefore cause severe pneumonia 11,28. Besides the direct damage due to viral replication, dysregulation of the host immune response can induce immune cell infiltration and a cytokine storm, leading to severe disease and fatalities. In experimental mouse models, delayed IFN-I signalling was associated with the accumulation of pathogenic inflammatory monocyte-macrophages in the lung, elevated expression of several pro-inflammatory cytokines and chemokines, and consequent lung immunopathology 17. Conversely, IFN-I administration before the peak of viral replication protected mice from clinical disease17. Therefore, a prompt innate response can limit viral replication and control the downstream activation of immune-mediated damage.
The analysis of the genome composition of HCoVs revealed a remarkable bias in the usage of several dinucleotide pairs, as shown by several Rho values falling outside the 0·78 (under-representation) and 1·23 (over-representation) cut-offs proposed by Karlin et al. (1998) (Figure 1). However, these thresholds can be considered accurate for long sequences only18. Additionally, dinucleotide frequency could be affected by codon bias and by the amino acid composition imposed by protein functional constraints. To deal with this issue, a permutation approach was implemented in which the synonymous codons were reshuffled along the protein (i.e. without affecting the overall codon usage bias or the protein structure). The random sequences generated in this way were used to normalize the Rho value, allowing statistical testing.
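The permutation approach described above can be illustrated with a minimal Python sketch. It assumes that Rho is the standard dinucleotide odds ratio (observed pair frequency divided by the product of the two single-nucleotide frequencies) and that the Z-score is the distance of the observed Rho from the mean of the shuffled sequences, in units of their standard deviation. The function names, the number of permutations and the random seed are hypothetical choices made for illustration only, and the input is assumed to be an in-frame CDS written in the DNA alphabet (A, C, G, T).

```python
import random
from collections import Counter, defaultdict
from statistics import mean, stdev

# Standard genetic code, bases ordered T, C, A, G (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rho(seq, pair):
    """Odds ratio f_xy / (f_x * f_y) for the dinucleotide `pair` (e.g. "CG")."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return float("nan")
    base_count = Counter(seq)
    f_x = base_count[pair[0]] / n
    f_y = base_count[pair[1]] / n
    if f_x == 0 or f_y == 0:
        return float("nan")
    pair_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == pair)
    return (pair_count / (n - 1)) / (f_x * f_y)


def shuffle_synonymous(cds, rng):
    """Permute codons among the positions that encode the same amino acid.

    The protein sequence and the overall codon counts are left unchanged.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def rho_zscore(cds, pair="CG", n_perm=1000, seed=0):
    """Return (observed Rho, Z-score against synonymous-codon shuffles)."""
    rng = random.Random(seed)
    observed = rho(cds, pair)
    null = [rho(shuffle_synonymous(cds, rng), pair) for _ in range(n_perm)]
    sd = stdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    return observed, z
```

Under these assumptions, calling `rho_zscore(cds, "CG")` on a coding sequence returns the observed Rho together with its Z-score. A strongly negative Z-score would indicate a CpG under-representation that codon usage and amino acid composition alone do not explain.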
Both Rho and Z-score highlighted a significant CpG under-representation compared to what would be expected by chance and from nucleotide frequency alone, as described for other RNA viruses19,29. This pair is well known to be under-represented in eukaryotic genomes, since the cytosine in CpG dinucleotides is easily methylated and tends to deaminate spontaneously into thymine.
However, methylation does not seem to occur in RNA viruses, which use their own synthetic apparatus for genome replication and transcription30. The higher stacking energy associated with CpG could lead to stronger secondary structures in ssRNA viruses and affect transcription and translation efficiency, as proposed for ssDNA viruses31. However, the corresponding GpC and other pairs characterized by high thermal energy were normally represented in the same genes (Figure 1), contradicting this hypothesis. Unmethylated CpG DNA is a well-known target of the pattern recognition receptor (PRR) Toll-like receptor 9 (TLR-9) in mammals and is therefore involved in the activation of the innate immune response, which explains the tendency of DNA viruses to reduce their CpG content. Although different PRRs, such as TLR-3, TLR-7, TLR-8, RIG-I and MDA5, are known to target viral RNA, none of them specifically recognizes CpG motifs 14. Nevertheless, Atkinson et al. demonstrated that experimentally increasing the CpG content of some RNA viruses led to attenuation, a lower replication rate and lower competitive fitness relative to the wild type32. Takata et al. showed that the zinc-finger antiviral protein (ZAP) selectively binds to sequences containing CpG dinucleotides, and that HIV strains whose CpG content has been modified are defective in normal cells but able to replicate in ZAP-deficient ones 33. In particular, ZAP was reported to interact with viral RNA and lead to its degradation34,35. Additionally, a shorter ZAP isoform (ZAPS) has a regulatory activity on RIG-I signalling, strengthening the RIG-I-mediated induction of type I interferons and other inflammatory cytokines36. Indeed, its role in antiviral innate immune responses against influenza virus and Newcastle disease virus was experimentally demonstrated36.
Therefore, the CpG content of HCoVs is likely under strong selective constraints to minimize viral recognition, degradation and/or activation of host innate immunity. The observed CpG ratio would thus be part of a broader HCoV escape mechanism, likely acting in concert with viral proteins14. CpG depletion became more pronounced with increasing coding sequence (CDS) length (Figure 2), and a significant (p<0·001) negative relationship was identified between the CpG ratio (i.e. CpG count ÷ CDS length) and gene length (Supplementary figure 10 and Supplementary table 2), particularly for SARS-CoV and SARS-CoV-2. A stronger selective pressure acting on the mRNAs containing the highest absolute amount of CpG can be hypothesized. Notably, SARS-CoV and SARS-CoV-2 are the HCoVs with the most pronounced bias, particularly in the longest CDSs, coding for pp1ab and S. These genome features could severely impair viral recognition in the early phases of infection, when viral nucleic acids are the most abundant viral pathogen-associated molecular patterns (PAMPs) and the inhibitory effect of viral proteins on cellular defence mechanisms is still modest. This could be associated with limited or delayed IFN production, higher viral replication and immune response dysregulation, leading to a poor outcome. SARS-CoV and SARS-CoV-2 also displayed a lower content of TpA, a dinucleotide frequently reported to be under-represented in eukaryotic genomes 37. TpA recognition in viral RNA sequences is described as a vertebrate immune response mechanism, and other human viruses, such as West Nile virus (WNV) and hepatitis C virus (HCV), are known to be recognized by RNase L 38,39. Accordingly, an artificial increase in TpA content resulted in viral attenuation, although less marked than that observed for CpG32. This feature could further promote immune evasion, enhancing viral replication. In contrast, MERS-CoV showed a lower degree of under-representation of these dinucleotides.
Whether this could be associated with a more intense immune response, more severe disease and a higher case fatality rate, as proposed for the original 1918 H1N1 influenza virus and the recent H5N1 avian viruses (characterized by a higher CpG content)29, requires further investigation.
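The CpG ratio used above (CpG count ÷ CDS length) and its relationship with gene length can be sketched as follows. This is only an illustration: the text does not state which statistical model produced the reported p-value, so the use of Pearson's correlation coefficient is an assumption, and `cpg_ratio` and `pearson_r` are hypothetical helper names.

```python
from math import sqrt


def cpg_ratio(cds):
    """CpG count divided by CDS length, as defined in the text."""
    cds = cds.upper()
    count = sum(1 for i in range(len(cds) - 1) if cds[i:i + 2] == "CG")
    return count / len(cds)


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed measure, see lead-in)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def length_vs_cpg(cds_by_gene):
    """Correlate CDS length with CpG ratio across the genes of one genome.

    `cds_by_gene` maps a gene name to its coding sequence (DNA alphabet).
    """
    lengths = [len(seq) for seq in cds_by_gene.values()]
    ratios = [cpg_ratio(seq) for seq in cds_by_gene.values()]
    return pearson_r(lengths, ratios)
```

A negative value returned by `length_vs_cpg` for the CDSs of a given virus would correspond to the pattern described in the text, with longer genes carrying proportionally fewer CpG dinucleotides.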
The strong constraints acting on CpG were confirmed by the RSCU analysis, which appeared greatly affected by the underlying dinucleotide bias. In fact, the under-representation of codons containing the CpG pair was a common feature of HCoVs. Nevertheless, it was particularly evident in the SARS-CoV-2 pp1ab and S coding regions. The pattern was progressively less marked in other genes, in a CDS length-dependent fashion. In particular, the E gene showed the opposite trend. Whether other factors besides gene length (e.g. mRNA transcription level and timing) are involved remains to be established.
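As an illustration of how CpG-containing codons can be inspected through RSCU, the sketch below uses the standard definition of relative synonymous codon usage: the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. It assumes the standard genetic code and an in-frame CDS in the DNA alphabet. The function name `rscu` is a hypothetical choice, and the exact implementation used in this study may differ.

```python
from collections import Counter, defaultdict

# Standard genetic code, bases ordered T, C, A, G (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds):
    """Return {codon: RSCU} for amino acids with synonymous codons.

    RSCU = observed count * number of synonymous codons / total count
    for that amino acid. Amino acids absent from the CDS, single-codon
    amino acids (Met, Trp) and stop codons are skipped.
    """
    cds = cds.upper()
    counts = Counter(cds[i:i + 3]
                     for i in range(0, len(cds) - len(cds) % 3, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0 or len(codons) == 1:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def cpg_codon_rscu(cds):
    """RSCU restricted to codons that contain the CpG pair."""
    return {c: v for c, v in rscu(cds).items() if "CG" in c}
```

Values below 1 returned by `cpg_codon_rscu` indicate codons used less often than their synonymous alternatives. Note that this view only captures CpG pairs within a codon, not those spanning two adjacent codons.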
In addition to comorbidities, age is one of the most relevant risk factors for severe disease and death. A decreased efficiency of several components of the immune system has been demonstrated in the elderly. Among these, a deficiency in the induction of type I interferon (IFN) was described in response to IAV infection in older patients 40. Of note, both direct and indirect defects affecting the RIG-I pathway occur. The first is ascribable to increased basal proteasomal degradation of the adaptor protein tumor necrosis factor receptor–associated factor 3 (TRAF3), which impairs the primary induction of IFN expression downstream of RIG-I signalling. The second is due to the impaired expression of the transcription factor IRF8 in older people, which is further exacerbated by the initial defects in IFN secretion and leads to a marked decrease in the positive feedback amplification of the IFN response 41. It is therefore tempting to speculate that an interaction between a defective RIG-I signalling pathway and low RIG-I activation due to poor viral recognition could further delay IFN production and reduce its effectiveness in older patients, contributing to a poor outcome. Therapeutic strategies acting on this axis could boost antiviral responses to SARS-CoV-2, and to other infections as well, reducing morbidity in the ageing population. Clearly, dinucleotide composition alone cannot explain the different epidemiological and clinical features of HCoVs. Receptor and tissue tropism, as well as differential viral protein functions and interactions with host proteins, play a major role in the final outcome. Interestingly, HCoVs responsible for severe disease showed a higher effective number of codons, overlapping that of the host lung, which could suggest a greater ability to exploit the cell replicative machinery.
In fact, while genome composition and dinucleotide frequency predominantly affected RSCU, a residual deviation from expectations was observed even after accounting for these factors. Thus, other forces are likely acting directly on codon bias.
The present study demonstrates a severe under-representation of some dinucleotide pairs, CpG and to a lesser extent TpA, in SARS-CoV and even more so in SARS-CoV-2. Since these motifs have been shown to be targets of PRRs, the SARS-CoV-2 genome features are likely to contribute to preventing viral recognition in the early phases of infection, potentially leading to a poorly effective and dysregulated immune response13, as demonstrated for SARS-CoV. These effects could be magnified in elderly people, in whom the components of the involved signalling pathways are already defective. The underlying biological processes could therefore be considered a primary therapeutic target, with the aim of reactivating and boosting the patient response to viral infection. Additionally, this evidence could contribute to the development of genetically engineered vaccines, such as RNA vaccines, able to elicit a strong initial innate immune response without affecting the protein phenotype and, therefore, its structure and immunogenicity.