The features of SARS-CoV-2 and other human coronavirus genomes could affect interferon production and immune response

DOI: https://doi.org/10.21203/rs.3.rs-25640/v1

Abstract

A new pandemic caused by the betacoronavirus SARS-CoV-2 originated in China in late 2019. Although infection is often asymptomatic, a substantial percentage of infected people can develop severe pneumonia. Initial evidence suggests that dysregulation of the immune response could contribute to the pathogenesis, as previously demonstrated for SARS-CoV. Here, the presence of genome composition features involved in delaying viral recognition is investigated in human coronaviruses (HCoVs), with a special emphasis on SARS-CoV-2. A broad collection of HCoV polyprotein, envelope, matrix, nucleocapsid and spike coding sequences was downloaded, and several statistics representative of genome composition and codon bias were investigated. A model able to test for significant under- or over-representation of dinucleotide pairs while accounting for the underlying codon bias and protein sequence was also implemented.

The study revealed a significant under-representation of the CpG dinucleotide pair in all HCoVs, especially in SARS-CoV and even more so in SARS-CoV-2. The presence of forces acting to minimize CpG content was confirmed by the relative synonymous codon usage (RSCU) pattern. Codons containing the CpG pair were severely under-represented, primarily in the polyprotein and spike coding sequences of SARS-CoV-2. Additionally, a significant under-representation of the TpA pair was observed in the N and S regions of SARS-CoV and SARS-CoV-2.

Increasing experimental evidence has shown that CpG and TpA are targeted by innate antiviral host defences, contributing both to RNA degradation and to RIG-I-mediated interferon production. The low content of these dinucleotides could contribute to delayed interferon production, a dysregulated immune response, higher viral replication and a poor outcome. Notably, the RIG-I signalling pathway has been shown to be defective in the elderly, suggesting a likely interaction between limited viral recognition and lower responsiveness in interferon production that could explain the higher disease severity and mortality in older patients.

Introduction

The family Coronaviridae includes a large group of enveloped, positive-stranded RNA viruses, further classified into four genera: Alpha-, Beta-, Gamma- and Deltacoronavirus.

The coronavirus genome is extraordinarily large compared with those of other RNA viruses, ranging from about 26 to 32 kilobases1. The 5’ two-thirds of the genome encode two polyproteins, pp1a and pp1ab, which are then proteolytically cleaved into several nonstructural proteins. Production of pp1ab requires the translating ribosome to change reading frame at the frameshift signal that bridges ORF1a and ORF1b2. The final one-third encodes four structural protein genes, in the order spike (S), envelope (E), membrane (M) and nucleocapsid (N). In addition, several accessory ORFs are interspersed among the structural protein genes, and their number and location vary among CoV species 3.

Until 2019, six human coronaviruses (HCoVs) had been identified: the two alpha-CoVs HCoV-NL63 and HCoV-229E and the beta-CoVs HCoV-OC43, HCoV-HKU1, severe acute respiratory syndrome-CoV (SARS-CoV) and Middle East respiratory syndrome-CoV (MERS-CoV)4. All these viruses mainly cause respiratory signs, and HCoV-NL63, HCoV-229E, HCoV-OC43 and HCoV-HKU1 have been associated with the common cold and mild upper respiratory tract infections, although lower respiratory tract involvement and more serious respiratory disease can occur in children, the elderly and persons with underlying illness 5,6.

Although HCoVs have been known for decades, their clinical importance was limited until the emergence of the two epidemic viruses SARS-CoV and MERS-CoV. SARS-CoV originated from bat species, likely through passage in carnivores, in 2003 and affected more than 8000 people, with 916 deaths (case fatality approximately 10%) in 29 countries, before being eradicated by global control efforts 7. MERS-CoV, first identified in 2012, was then shown to regularly infect humans, who acquire the infection from dromedary camels, although bats were again identified as the original reservoir. Since the first reports, more than 2000 cases have been described, with a case fatality higher than 30%4.

In late 2019, a new pandemic HCoV (SARS-CoV-2) emerged in the city of Wuhan, Hubei province, and despite unprecedented control measures implemented in China and other countries, SARS-CoV-2 has reached a worldwide distribution8. At the time of writing, 212 countries have reported SARS-CoV-2 detection, for a total of 1,353,361 cases and 79,235 deaths (https://www.who.int/emergencies/diseases/novel-coronavirus-2019). However, the lack of diagnostic assays, especially in the initial epidemic phase, and of properly performed epidemiological studies hampers estimation of the actual infection prevalence. Mathematical modelling based on data from 11 European countries estimates that between 7 and 43 million individuals had been infected with SARS-CoV-2 across these countries by the end of March, representing between 1.88% and 11.43% of the population, with peaks of 15% in some countries 9.

While most infected people have an asymptomatic infection or mild disease, 20% of those who become ill develop severe disease, with an overall case fatality rate of more than 3% among confirmed cases. Several factors seem to affect the risk of severe disease and mortality, including gender, comorbidities and age 10.

Although distinct, MERS-CoV, SARS-CoV and SARS-CoV-2 share some common clinical and pathogenic features 11. Severe to fatal SARS-CoV and MERS-CoV cases were predominantly characterized by interstitial pneumonia and edematous lungs with acute diffuse bronchial and alveolar damage, coupled with increased monocyte, macrophage and neutrophil infiltration in the lungs11. Elevated levels of proinflammatory cytokines and chemokines were also identified, indicating that both virus-induced cytopathic effects and immunopathology due to a “cytokine storm” likely contribute to disease severity and poor prognosis12. High levels of proinflammatory cytokines were measured in SARS patients with severe disease compared with uncomplicated cases.

Initial studies on SARS-CoV-2 revealed overlapping signs and lesions, and patients requiring intensive care (IC) showed higher plasma levels of IL-2, IL-7, IL-10, GSCF, IP10, MCP1, MIP1A and TNF-α than non-IC patients13.

Coronaviruses have evolved different mechanisms to alter cell signalling, delay innate immunity and deregulate immune effectors. Several coronavirus proteins have been shown to interact at different levels with host pathways including, but not limited to, type I interferon (IFN-I) 14–16. Accordingly, delayed interferon production and the consequent dysregulated innate immune response have been linked to lethal infection in an experimental mouse model17.

Growing evidence suggests that, besides proteins, viral genome composition could also be involved in minimizing viral recognition and therefore delaying the activation of an effective immune response. Several studies have demonstrated the presence of a distinctive genomic signature in the dinucleotide frequencies of different organisms 18. This has also been reported for viral genomes and could result from pressures induced by the “host environment” and its immune defences19. Interestingly, even the viromes of different environments show distinct patterns, which confirms the direct or indirect effect of environmental conditions on genome composition20.

Codon bias is another phenomenon affecting genome fitness without a direct impact on proteins. Because of the degeneracy of the genetic code, most amino acids are encoded by two to six different codons21,22. However, synonymous codons are used with different frequencies among organisms and even among tissues of the same organism23. As obligate intracellular parasites, viruses must be able both to efficiently exploit the cell synthetic machinery and to avoid recognition mechanisms. Accordingly, virus-host mimicry in codon bias and genome composition, as well as progressive viral adaptation after a host jump, has occasionally been reported 24–27.

The present study investigates the genome composition of HCoVs, with a special emphasis on SARS-CoV-2, to evaluate the presence of patterns that could contribute to innate immunity deregulation, favouring viral replication and therefore infection spread and disease severity.

Results

Genome composition

Although nucleotide percentages differed among genes and species, some common features were present. In particular, G and C nucleotides were significantly less frequent than A and T, independently of the viral species and gene (Supplementary figure 1 and table 1). Overall, the G+C content decreased from the first to the third codon position. The envelope gene was a notable exception, showing a higher GC3 than GC2 content. The same could be observed in the matrix protein gene of HCoV-229E, MERS-CoV and SARS-CoV.

The analysis of dinucleotide odds ratios revealed remarkable heterogeneity. However, all HCoVs showed a relevant under-representation of the CpG pair. SARS-CoV-2 was the species with the lowest CpG Rho in the pp1ab and especially in the S CDS. On the other hand, SARS-CoV-2 and SARS-CoV were the only species in which this dinucleotide was over-represented in the E coding gene (Figure 1). Among the HCoVs inducing severe disease, MERS-CoV showed the least biased usage of this dinucleotide overall. The TpA pair was also significantly under-represented in the nucleocapsid of HCoV-229E, SARS-CoV and SARS-CoV-2 (and to a lesser extent MERS-CoV), and slightly under-represented in the S gene of SARS-CoV and SARS-CoV-2. Most of the other dinucleotide pairs were within the expected ranges, although TpG was over-represented in the pp1ab of all viruses and, in a species-specific fashion, in other genes.

The Zscore confirmed this scenario, showing that CpG was significantly under-represented even after accounting for amino acid composition and codon bias (Supplementary figure 2). However, in the E gene, the CpG odds ratio of SARS-CoV and SARS-CoV-2 fell within the expected range, unlike what was observed with the crude Rho estimation. On the other hand, an over-representation of the ApC, CpA and TpG pairs was observed in the pp1ab and S of essentially all viruses, the S of HCoV-NL63 being the main exception. The first two principal components (PCs) of the PCA based on Zscores explained 78% and 8.9% of the overall variance and were therefore retained to explore the data. A clear separation could be observed among proteins, while viral species showed a largely overlapping distribution (Supplementary figure 3). However, within each gene, the HCoV species could be differentiated (Figure 2). Inspection of the PC1 loadings showed that CpG had the highest positive correlation with this principal component. The HCoV proteins were distributed along the PC1 axis following a length-dependent pattern: pp1ab was located at the most negative extreme, followed by S, N, M and E. Therefore, a negative correlation could be observed between coding sequence length and CpG content.

Within each gene, the pp1ab and S coding sequences of SARS-CoV and SARS-CoV-2 were located at the most negative extreme of PC1. In the M gene, SARS-CoV and SARS-CoV-2, together with HCoV-229E, also had lower PC1 values than the other species, whereas in the N gene this was true of SARS-CoV-2 only. In contrast, the highest values were observed in the E coding sequence.

In the second PC (PC2), CpA and TpA showed the highest positive and negative correlation, respectively (Figure 2 and Supplementary figure 3), while CpG showed some negative correlation. SARS-CoV-2 was associated with higher PC2 values in S (together with SARS-CoV) and N (together with HCoV-229E and SARS-CoV). SARS-CoV-2 showed slightly positive to neutral values in the pp1ab, M and E genes.

Codon bias

RSCU analysis highlighted a similarly heterogeneous pattern. However, the under-representation of codons containing the CpG dinucleotide was a common feature, affecting all viral proteins and viral species (Supplementary figure 4). Nevertheless, only SARS-CoV-2 had all these codons under-represented (or, to a lesser extent, normally represented) in the pp1ab gene, while in the S gene SARS-CoV and HCoV-OC43 also shared this feature. In contrast, a higher number of CpG-rich codons was observed in the E coding gene.

PCA performed on RSCU highlighted a less clear overall pattern, with each PC explaining a lower proportion of the variance. However, when the analysis was performed at the individual gene level, a distinction among species could still be achieved. Regardless of the gene, most of the codons containing CpG were highly correlated with at least one of the PCs (Figure 3 and Supplementary figures 5-9). In pp1ab, with only one exception, all CpG-rich codons were positively correlated with PC1 and PC2 (Supplementary figure 5). SARS-CoV-2 was the only viral species with highly negative values in both PCs. In contrast, MERS-CoV was located in the quadrant of higher positive values. Similarly, SARS-CoV and SARS-CoV-2 had negative PC2 values, while most CpG codons were positively correlated with this component (Figure 3 and Supplementary figure 5). In the S region, most CpG-rich codons were negatively correlated with PC1 and PC2. SARS-CoV-2 was located in the positive quadrant of both PCs, while SARS-CoV had positive values in PC2 and negative ones in PC1 (Supplementary figure 6). In the N, M and E genes, SARS-CoV-2 was located in regions defined by a neutral to positive association with CpG content, although in the M and N genes the correlation of CpG-containing codons with the PCs was less polarized (Supplementary figures 7-9).

Differential codon usage among proteins was confirmed by the effective number of codons (Nc) analysis. The E gene showed an overall more biased codon usage (lower Nc) than the other viral genes and the host coding sequences. Among HCoVs, MERS-CoV, SARS-CoV and SARS-CoV-2 showed a higher Nc, fully overlapping with that of the host in all but the E coding regions. HCoV-HKU1, HCoV-NL63 and, to a lesser extent, HCoV-OC43 and HCoV-229E had a lower Nc, distinct from the host, although exceptions were present for specific species-gene pairs (Figure 4).

Nc' was significantly higher than Nc, reaching values even greater than those of human genes in the pp1ab and S coding regions, which indicates that viral genome nucleotide composition had a relevant effect on codon bias. However, even after accounting for this component, the effective number of codons deviated from that expected on the basis of GC3 composition alone (Figure 4).

Discussion

Epidemic HCoVs, including SARS-CoV, MERS-CoV and SARS-CoV-2, although targeting several cell types, show a greater involvement of the lower respiratory tract than other HCoVs and can therefore be responsible for severe pneumonia 11,28. Besides direct damage due to viral replication, dysregulation of the host immune response can induce immune cell infiltration and a cytokine storm, leading to severe disease and fatalities. In experimental mouse models, delayed IFN-I signalling was associated with the accumulation of pathogenic inflammatory monocyte-macrophages in the lung, elevated expression of several pro-inflammatory cytokines and chemokines, and consequent lung immunopathology 17. However, IFN-I administration before the viral replication peak protected mice from clinical disease17. Therefore, a prompt innate response can limit viral replication and control the downstream activation of immune-mediated damage.

The analysis of HCoV genome composition revealed a remarkable bias in the usage of several dinucleotide pairs, as shown by several Rho values falling outside the 0·78 and 1·23 cut-offs proposed by Karlin et al. (1998) (Figure 1). However, these thresholds can be considered accurate for long sequences only18. Additionally, dinucleotide frequency could be affected by codon bias and by amino acid composition, which is imposed by protein functional constraints. To deal with this issue, a permutation approach reshuffling the synonymous codons along the protein (i.e. without affecting the overall codon usage bias and protein sequence) was implemented to normalize the Rho value through random sequence generation, allowing statistical testing.

Both Rho and Zscore highlighted a significant CpG under-representation compared with what would be expected by chance and from nucleotide frequency alone, similarly to what has been described for other RNA viruses19,29. This pair is well known to be under-represented in eukaryotic genomes, since cytosine in CpG dinucleotides is easily methylated and tends to spontaneously deaminate into thymine.

However, methylation does not seem to occur in RNA viruses, which use their own synthetic apparatus for genome replication and transcription30. The higher stacking energy associated with CpG could lead to stronger secondary structures in ssRNA viruses and affect transcription and translation efficiency, as proposed for ssDNA viruses31. However, the corresponding GpC and other pairs with high thermal energy were normally represented in the same genes (Figure 1), contradicting this hypothesis. Unmethylated CpG DNA is a well-known target of the pattern recognition receptor (PRR) Toll-like receptor 9 (TLR-9) in mammals and is thus involved in innate immune response activation, which explains the tendency of DNA viruses to reduce their CpG content. Although different pattern recognition receptors, such as TLR-3, TLR-7, TLR-8, RIG-I and MDA5, are known to target viral RNA, none of them specifically recognizes CpG motifs 14. However, Atkinson et al. demonstrated that experimentally increasing the CpG content of some RNA viruses led to attenuation, a lower replication rate and low competitive fitness relative to the wild type32. Takata et al. showed that the zinc-finger antiviral protein (ZAP) selectively binds to sequences containing the CpG dinucleotide, and that HIV strains with modified CpG content are defective in normal cells but able to replicate in ZAP-deficient ones 33. In particular, ZAP was reported to interact with viral RNA and lead to its degradation34,35. Additionally, a shorter ZAP isoform (ZAPS) has a regulatory activity on RIG-I signalling, strengthening the RIG-I-mediated induction of type I interferons and other inflammatory cytokines36. Indeed, its role in antiviral innate immune responses against influenza virus and Newcastle disease virus has been experimentally demonstrated36.

Therefore, HCoV CpG content is likely under strong selective constraints to minimize viral recognition, degradation and/or activation of host innate immunity. The observed CpG depletion would thus be part of a broader HCoV escape mechanism, likely acting in concert with viral proteins14. CpG content was negatively correlated with coding sequence length (Figure 2), and a significant (p<0·001) negative relationship was identified between CpG ratio (i.e. CpG count ÷ CDS length) and gene length (Supplementary figure 10 and Supplementary table 2), particularly for SARS-CoV and SARS-CoV-2. A stronger selective pressure acting on longer mRNAs, which would otherwise contain a higher absolute number of CpG dinucleotides, can be hypothesized. Notably, SARS-CoV and SARS-CoV-2 are the HCoVs with the most pronounced bias, particularly in the longest CDSs, coding for pp1ab and S. These genome features could severely impair viral recognition in the early infection phases, when viral nucleic acids are the most abundant viral pathogen-associated molecular patterns (PAMPs) and the inhibitory effect of viral proteins on cellular defence mechanisms is still modest. This could be associated with limited or delayed IFN production, higher viral replication and immune response deregulation, leading to a poor outcome. SARS-CoV and SARS-CoV-2 also displayed a lower TpA content; this pair is frequently reported to be under-represented in eukaryotic genomes 37. Recognition of TpA in viral RNA sequences has been described as a vertebrate immune response mechanism, and other human viruses, such as West Nile virus (WNV) and hepatitis C virus (HCV), are known to be recognized by RNase L 38,39. Accordingly, an artificial increase in TpA content resulted in viral attenuation, although less marked than with CpG32. This feature could further promote immune evasion, enhancing viral replication. In contrast, MERS-CoV showed a lower degree of under-representation of these dinucleotides.
Whether this could be associated with a more intense immune response, more severe disease and a higher case fatality rate, as proposed for the original 1918 H1N1 influenza virus and the recent H5N1 avian viruses (characterized by a higher CpG content)29, requires further investigation.
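The CpG ratio used above (CpG count ÷ CDS length) and its relationship with gene length can be illustrated with a short sketch. It assumes ACGT coding sequences and summarises the relationship with an ordinary least-squares slope and Pearson's r; the text does not state which regression model or test produced the reported p-value, so the function names, the choice of statistic and the toy sequences are illustrative assumptions, and no p-value is computed here.

```python
import math


def cpg_ratio(cds):
    """CpG count (overlapping dinucleotide windows) divided by CDS length."""
    cds = cds.upper()
    cpg = sum(1 for i in range(len(cds) - 1) if cds[i:i + 2] == "CG")
    return cpg / len(cds)


def slope_and_pearson(x, y):
    """Ordinary least-squares slope of y on x, and Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical toy CDSs of increasing length, not data from the study.
cdss = ["ACGCGTACGT", "ACGTTTAAACGTTTAA", "ATGTTTAAACCGTTTAAATTT"]
lengths = [len(s) for s in cdss]
ratios = [cpg_ratio(s) for s in cdss]
slope, r = slope_and_pearson(lengths, ratios)
print(f"slope = {slope:.5f}, r = {r:.3f}")
```

With real data, one point per sequence (or per gene and species) would be used, and a significance test would be added on top of these summaries.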

The strong constraints acting on CpG were confirmed by RSCU analysis, which appeared greatly affected by the underlying dinucleotide bias. Indeed, the under-representation of codons containing the CpG pair was a common feature of HCoVs. Nevertheless, it was particularly evident in the SARS-CoV-2 pp1ab and S coding regions. The pattern was progressively less marked in other genes, in a CDS length-dependent fashion. In particular, the E gene showed the opposite trend. Whether other factors besides gene length are involved (e.g. mRNA transcription level and timing) remains to be established.

In addition to comorbidities, age is one of the most relevant risk factors for severe disease and death. A decreased efficiency of several components of the immune system has been demonstrated in the elderly. Among these, a deficiency in the induction of type I interferon (IFN) was described in older patients in response to influenza A virus (IAV) infection 40. Of note, both direct and indirect defects in the RIG-I pathway occur. The former is ascribable to increased basal proteasomal degradation of the adaptor protein tumor necrosis factor receptor-associated factor 3 (TRAF3), which impairs the primary induction of IFN expression downstream of RIG-I signalling. The latter is due to the impaired expression of the transcription factor IRF8 in older people, which is further exaggerated by the initial defects in IFN secretion and leads to a marked decrease in the positive feedback amplification of the IFN response 41. It is therefore tempting to speculate that an interaction between a defective RIG-I signalling pathway and low RIG-I activation due to poor viral recognition could further delay and reduce the effectiveness of IFN production in older patients, contributing to a poor outcome. Therapeutic strategies acting on this axis could boost antiviral responses to SARS-CoV-2, and to other infections as well, reducing morbidity in the ageing population. Clearly, dinucleotide composition alone cannot explain the different epidemiological and clinical features of HCoVs. Receptor and tissue tropism, as well as differential viral protein function and interaction with host proteins, play a major role in the final outcome. Interestingly, HCoVs responsible for severe disease showed a higher effective number of codons, overlapping with that of host lung genes, which could suggest a higher ability to exploit the cell replicative machinery.
Indeed, while genome composition and dinucleotide frequency predominantly affected RSCU, a residual deviation from expectations was observed even after accounting for these factors. Thus, other forces are likely acting directly on codon bias.

The present study demonstrates a severe under-representation of some dinucleotide pairs, CpG and to a lesser extent TpA, in SARS-CoV and even more in SARS-CoV-2. Since these motifs have been shown to be targeted by PRRs, the SARS-CoV-2 genome features are likely to contribute to preventing viral recognition in the early infection phases, potentially leading to a poorly effective and dysregulated immune response13, as demonstrated for SARS-CoV. These effects could be magnified in elderly people, in whom the components of the involved signalling pathways are already defective. The underlying biological processes could, therefore, be considered a primary therapeutic target aimed at reactivating and boosting the patient response to viral infection. Additionally, these findings could contribute to the development of genetically engineered vaccines, such as RNA vaccines, able to elicit a strong early innate immune response without altering the encoded protein and therefore its structure and immunogenicity.

Material and methods

Dataset

A broad collection of complete pp1ab, E, M, N and S coding sequences from HCoVs was downloaded using the ViPR online tool (accessed 27/03/2020)42. In-house Python scripts were used for gene and feature extraction, relying on Biopython library functions43. A separate dataset was created for each coding DNA sequence (CDS), and sequences with unknown nucleotides, frameshift mutations or premature stop codons, or derived from experimental models, were excluded from further analysis.
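The sequence-level exclusion criteria can be expressed as a simple check. This is a sketch, not the authors' in-house script: it assumes each CDS has already been extracted as annotated (for pp1ab this means following the annotated join across the ribosomal frameshift site rather than reading the genome in a single frame), treats any character other than A, C, G or T as an unknown nucleotide, uses a length that is not a multiple of three as a proxy for a frameshift, and flags any stop codon before the last codon as premature. The standard genetic code is assumed, the function name is hypothetical, and exclusion of sequences derived from experimental models depends on metadata and is not covered.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def passes_filters(cds):
    """Return True if a CDS has no unknown bases, no frameshift and no premature stop."""
    cds = cds.upper()
    if set(cds) - set("ACGT"):
        return False  # unknown or ambiguous nucleotides (N, R, Y, ...)
    if len(cds) % 3 != 0:
        return False  # length not a multiple of 3: likely frameshift
    protein = "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))
    return "*" not in protein[:-1]  # a stop before the final codon is premature


# Hypothetical sequences for illustration only.
print([passes_filters(s) for s in ["ATGGCTTAA", "ATGNCTTAA", "ATGTAAGCTTAA"]])
# [True, False, False]
```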

Viral genome composition analysis

For each sequence, the following statistics were calculated (as percentages): the content of each nucleotide, the total GC content (GC) and the GC content at codon positions 1 (GC1), 2 (GC2) and 3 (GC3).
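The composition statistics listed above can be computed in a few lines. This is a minimal sketch, assuming an ACGT CDS whose length is a multiple of three; it counts every codon, including the stop codon, so values may differ slightly from tools that exclude stop codons or compute GC3 on synonymous sites only. The function name is hypothetical.

```python
from collections import Counter


def composition(cds):
    """Nucleotide content, GC, GC1, GC2 and GC3 of a CDS, all in percent."""
    cds = cds.upper()
    n = len(cds)
    counts = Counter(cds)
    stats = {base: 100 * counts[base] / n for base in "ACGT"}
    stats["GC"] = 100 * (counts["G"] + counts["C"]) / n
    for pos in range(3):
        site = cds[pos::3]  # bases at codon position pos + 1
        stats[f"GC{pos + 1}"] = 100 * sum(b in "GC" for b in site) / len(site)
    return stats


print(composition("GCTGAA"))  # GC = 50.0, GC1 = 100.0, GC2 = 50.0, GC3 = 0.0
```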

The dinucleotide odds ratio (Rho) was computed for each dinucleotide pair using the R library seqinr44.

Briefly, Rho is the frequency of dinucleotide xy divided by the product of the frequencies of nucleotides x and y, and it is expected to equal 1.00 when dinucleotide xy is formed by chance. Since dinucleotide frequencies can be biased by the protein primary structure (i.e. amino acid sequence) and by the codon usage bias of these genes, the observed Rho was normalized by its expectation and variance estimated through random sequence generation (yielding a Zscore), thus allowing evaluation of the degree of over- or under-representation and of its statistical significance. Specifically, the selected model generated random sequences by shuffling synonymous codons, without affecting the codon usage bias and protein sequence. A total of 1000 simulated sequences were generated for each dinucleotide pair.
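The normalisation described above can be sketched as follows. This is an illustrative re-implementation, not the seqinr code used in the study: `rho` follows the definition in the text (dinucleotide frequency over the product of mononucleotide frequencies, computed here over n - 1 dinucleotide windows and n bases), and `synonymous_shuffle` permutes codons only among positions that encode the same amino acid, so the protein sequence and the codon counts are unchanged. The Zscore is taken as (observed Rho - mean simulated Rho) / standard deviation of simulated Rho. The standard genetic code, ACGT-only input with length a multiple of three, and all function names are assumptions.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import product
from statistics import mean, stdev

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rho(seq, dinuc):
    """Dinucleotide odds ratio f(xy) / (f(x) * f(y))."""
    n = len(seq)
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    bases = Counter(seq)
    f_xy = pairs[dinuc] / (n - 1)
    f_x, f_y = bases[dinuc[0]] / n, bases[dinuc[1]] / n
    return f_xy / (f_x * f_y) if f_x * f_y else math.nan


def synonymous_shuffle(seq, rng):
    """Permute codons among the positions coding for the same amino acid."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    sites = defaultdict(list)  # amino acid -> codon positions
    for i, codon in enumerate(codons):
        sites[CODON_TABLE[codon]].append(i)
    shuffled = list(codons)
    for positions in sites.values():
        pool = [codons[i] for i in positions]
        rng.shuffle(pool)
        for i, codon in zip(positions, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def rho_zscore(seq, dinuc, n_sim=1000, seed=1):
    """Zscore of the observed Rho against synonymous-codon-shuffled sequences."""
    rng = random.Random(seed)
    sims = [rho(synonymous_shuffle(seq, rng), dinuc) for _ in range(n_sim)]
    sd = stdev(sims)
    return (rho(seq, dinuc) - mean(sims)) / sd if sd > 0 else math.nan


example = "ATGGCTGCGGCAGCCCGTCGACGGAGATTA"  # hypothetical CDS
print(rho(example, "CG"), rho_zscore(example, "CG", n_sim=200))
```

Because shuffling preserves the codon multiset, the mononucleotide frequencies are identical in every simulated sequence, so only the arrangement of codons (and hence dinucleotides spanning codon boundaries) varies.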

Relative synonymous codon usage (RSCU) and effective number of codons (Nc)

The RSCU was calculated using the seqinr package in R. This statistic, indicative of codon bias, is calculated based on the number of times a particular codon is observed, relative to the number of times that the codon would be observed assuming a uniform synonymous codon usage.

Consequently, in the absence of any codon bias, a value close to 1 is expected, while synonymous codons with values lower than 0.6 or greater than 1.6 are regarded as under- or over-represented, respectively24,27.
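As a worked illustration of the definition above, the RSCU of a codon is its observed count divided by the mean count of all synonymous codons for the same amino acid, so uniform usage gives 1. The sketch below is an independent re-implementation rather than the seqinr routine used in the study. It assumes the standard genetic code, skips stop codons and the single-codon amino acids (Met, Trp), leaves amino acids absent from the sequence undefined (tools differ in how they report these), and applies the 0.6 and 1.6 thresholds quoted in the text. Function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS = defaultdict(list)  # amino acid -> its synonymous codons
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(cds):
    """RSCU = observed codon count / mean count of its synonymous codons."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons and Met/Trp carry no synonymous choice
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count under uniform synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


def classify(value, low=0.6, high=1.6):
    """Label an RSCU value using the thresholds quoted in the text."""
    if value < low:
        return "under-represented"
    if value > high:
        return "over-represented"
    return "within expected range"


values = rscu("GCTGCTGCGGCA")  # hypothetical alanine-only fragment
print({c: (v, classify(v)) for c, v in values.items()})
```

For this fragment, GCT (2.0) is classed as over-represented and the unused GCC (0.0) as under-represented, while GCA and GCG sit at 1.0.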

The Nc values were calculated using the http://agnigarh.tezu.ernet.in/~ssankar/cub.php website.

This summary statistic reflects the effective number of different codons used in a sequence and can thus range between 21 (only one codon used for each amino acid) and 60 (all synonymous codons used uniformly). A second parameter, the Nc’ statistic (also ranging between 21 and 60), was calculated to account for the effect of genome composition on codon bias. The obtained Nc and Nc’ values were plotted against their GC3 content and compared with the expected Nc distribution under the assumption that it is determined only by GC3 content, similarly to Franzo et al. (2017 and 2018)24,25.
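The expected Nc curve referred to above is commonly taken from Wright's formula: if codon usage were shaped only by compositional constraints, the expected Nc would be 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC content at synonymous third positions (GC3s). The sketch below assumes that formula; the text does not state which GC3 variant was used. The relative deviation (expected - observed) / expected, often used to summarise the distance of a gene from the curve, is included only as an illustrative addition, and both function names are hypothetical.

```python
def expected_nc(gc3s):
    """Expected Nc if codon bias were driven only by GC3s (Wright's curve)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def nc_deviation(observed_nc, gc3s):
    """Relative distance of an observed Nc below the expected curve."""
    expected = expected_nc(gc3s)
    return (expected - observed_nc) / expected


# Points of the reference curve (GC3s from 0 to 1 in steps of 0.1).
curve = [(s / 10, round(expected_nc(s / 10), 2)) for s in range(11)]
print(curve)
print(expected_nc(0.5))  # 60.5 at GC3s = 0.5
```

Genes lying well below the curve (positive deviation) have a codon bias that composition alone does not explain, which is the comparison made in Figure 4.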

The effective number of codons of human lung genes was also evaluated. For this purpose, human lung protein expression levels were obtained from Wang et al.45 through the EMBL-EBI Expression Atlas interface46. Proteins in the upper 25% expression quantile were selected, and the corresponding coding sequences were downloaded and analyzed as previously described.
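Selecting the most highly expressed lung genes amounts to keeping those at or above the 75th percentile of expression. This is a minimal sketch assuming a plain mapping from gene identifiers to expression values; the quantile method is an assumption (Python's "inclusive" method, which interpolates linearly like R's default quantile type), since the text does not specify how the cut-off was computed. The function name and gene names are hypothetical.

```python
from statistics import quantiles


def top_quartile(expression):
    """Return genes whose expression is at or above the 75th percentile."""
    q3 = quantiles(expression.values(), n=4, method="inclusive")[2]
    return sorted(gene for gene, level in expression.items() if level >= q3)


# Hypothetical expression values (arbitrary units), not from Wang et al.
levels = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0, "GENE_D": 4.0,
          "GENE_E": 5.0, "GENE_F": 6.0, "GENE_G": 7.0, "GENE_H": 8.0}
print(top_quartile(levels))  # ['GENE_G', 'GENE_H'] (cut-off 6.25)
```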

Principal component analysis (PCA)

The principal component analysis was performed independently on the dinucleotide Zscore and RSCU of all genes, after centring and scaling, using the prcomp function of the stats library in R.
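For readers working outside R, the PCA step (centring and scaling each variable, as prcomp does with center = TRUE and scale. = TRUE) can be reproduced with a singular value decomposition. The sketch below is an assumed NumPy equivalent, not the authors' code: rows are sequences and columns are dinucleotide Zscores or codon RSCU values. It also returns variable-PC correlations, one common way of reading statements such as "CpG had the highest positive correlation with PC1". Columns with zero variance must be removed first, and the sign of each PC is arbitrary, so it may be flipped relative to prcomp output. The function name and data are hypothetical.

```python
import numpy as np


def pca_scaled(X):
    """PCA of a samples x variables matrix after centring and unit-variance scaling."""
    X = np.asarray(X, dtype=float)
    # Zero-variance columns must be dropped beforehand (division by zero).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                                  # sample coordinates on the PCs
    loadings = Vt.T                                 # rotation: one column per PC
    explained = s ** 2 / np.sum(s ** 2)             # proportion of variance per PC
    corr = loadings * s / np.sqrt(Z.shape[0] - 1)   # variable-PC correlations
    return scores, loadings, explained, corr


# Hypothetical data: 12 sequences x 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
scores, loadings, explained, corr = pca_scaled(X)
print(np.round(explained, 3))
```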

Declarations

Competing interests

The authors declare no competing interest.

Funding

This research was partially funded by the grant BIRD187958/18 from the Department of Animal Medicine, Production and Health, University of Padua.

References

  1. Lim, Y., Ng, Y., Tam, J. & Liu, D. Human Coronaviruses: A Review of Virus–Host Interactions. Diseases 4, 26 (2016).
  2. , E. Ribosomal Frameshift Signals in Viral Genomes. Viral Genomes - Mol. Struct. Divers. Gene Expr. Mech. Host-Virus Interact. 91–122 (2012). doi:10.5772/26550
  3. Fehr, A. R. & Perlman, S. Coronaviruses: An overview of their replication and pathogenesis. in Coronaviruses: Methods and Protocols 1–23 (Springer New York, 2015). doi:10.1007/978-1-4939-2438-7_1
  4. Yin, Y. & Wunderink, R. G. MERS, SARS and other coronaviruses as causes of pneumonia. Respirology 23, 130–137 (2018).
  5. Reads, C. Human coronaviruses: what do they cause? Antiviral Therapy (International Medical Press) (2015).
  6. Corman, V. M., Muth, D., Niemeyer, D. & Drosten, C. Hosts and Sources of Endemic Human Coronaviruses. in Advances in Virus Research (2018). doi:10.1016/bs.aivir.2018.01.001
  7. WHO guidelines for the global surveillance of severe acute respiratory syndrome (SARS). Updated recommendations, October 2004. WHO (2015).
  8. Lake, M. A. What we know so far: COVID-19 current clinical knowledge and research. Clin. Med. 20, 124–127 (2020).
  9. Flaxman, S. et al. Report 13: Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. (2020).
  10. Rabi, F. A., Al Zoubi, M. S., Kasasbeh, G. A., Salameh, D. M. & Al-Nasser, A. D. SARS-CoV-2 and Coronavirus Disease 2019: What We Know So Far. Pathogens 9, 231 (2020).
  11. Liu, J. et al. Overlapping and discrete aspects of the pathology and pathogenesis of the emerging human pathogenic coronaviruses SARS-CoV, MERS-CoV, and 2019-nCoV. J. Med. Virol. 92, 491–494 (2020).
  12. Kong, S. L., Chui, P., Lim, B. & Salto-Tellez, M. Elucidating the molecular physiopathology of acute respiratory distress syndrome in severe acute respiratory syndrome patients. Virus Res. 145, 260–269 (2009).
  13. Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020).
  14. Kikkert, M. Innate Immune Evasion by Human Respiratory RNA Viruses. Journal of Innate Immunity 12, 4–20 (2020).
  15. Li, G. et al. Coronavirus infections and immune responses. J. Med. Virol. 92, 424–432 (2020).
  16. Zheng, J. & Perlman, S. Immune responses in influenza A virus and human coronavirus infections: an ongoing battle between the virus and host. Current Opinion in Virology (2018). doi:10.1016/j.coviro.2017.11.002
  17. Channappanavar, R. et al. Dysregulated Type I Interferon and Inflammatory Monocyte-Macrophage Responses Cause Lethal Pneumonia in SARS-CoV-Infected Mice. Cell Host Microbe (2016). doi:10.1016/j.chom.2016.01.007
  18. Karlin, S., Campbell, A. M. & Mrázek, J. Comparative DNA analysis across diverse genomes. Annu. Rev. Genet. 32, 185–225 (1998).
  19. Gu, H., Fan, R. L. Y., Wang, D. & Poon, L. L. M. Dinucleotide evolutionary dynamics in influenza A virus. Virus Evol. 5, 1–10 (2019).
  20. Willner, D., Thurber, R. V. & Rohwer, F. Metagenomic signatures of 86 microbial and viral metagenomes. Environ. Microbiol. 11, 1752–1766 (2009).
  21. Hershberg, R. & Petrov, D. A. Selection on Codon Bias. Annu. Rev. Genet. 42, 287–299 (2008).
  22. Roth, A., Anisimova, M. & Cannarozzi, G. M. Measuring codon usage bias. in Codon Evolution: Mechanisms and Models 189–217 (Oxford University Press, 2012). doi:10.1093/acprof:osobl/9780199601165.003.0013
  23. Plotkin, J. B., Robins, H. & Levine, A. J. Tissue-specific codon usage and the expression of human genes. (2004).
  24. Franzo, G., Tucciarone, C. M., Cecchinato, M. & Drigo, M. Canine parvovirus type 2 (CPV-2) and Feline panleukopenia virus (FPV) codon bias analysis reveals a progressive adaptation to the new niche after the host jump. Mol. Phylogenet. Evol. 114, 82–92 (2017).
  25. Franzo, G., Segales, J., Tucciarone, C. M., Cecchinato, M. & Drigo, M. The analysis of genome composition and codon bias reveals distinctive patterns between avian and mammalian circoviruses which suggest a potential recombinant origin for Porcine circovirus 3. PLoS One 13, (2018).
  26. Bahir, I., Fromer, M., Prat, Y. & Linial, M. Viral adaptation to host: A proteome-based analysis of codon usage and amino acid preferences. Mol. Syst. Biol. 5, 311 (2009).
  27. Wong, E. H. M., Smith, D. K., Rabadan, R., Peiris, M. & Poon, L. M. Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus. BMC Evol. Biol. 10, 253 (2010).
  28. Chan, M. C. W. Tropism of the novel coronavirus SARS-CoV-2 in human respiratory tract: an analysis in ex vivo and in vitro cultures. The Lancet Respiratory Medicine (2019).
  29. Greenbaum, B. D., Levine, A. J., Bhanot, G. & Rabadan, R. Patterns of evolution and host gene mimicry in influenza and other RNA viruses. PLoS Pathog. 4, e1000079 (2008).
  30. Cheng, X. et al. CpG Usage in RNA Viruses: Data and Hypotheses. PLoS One 8, (2013).
  31. Shackelton, A., Parrish, C. R. & Holmes, E. C. Evolutionary basis of codon usage and nucleotide composition bias in vertebrate DNA viruses. J. Mol. Evol. 62, 551–563 (2006).
  32. Atkinson, N. J., Witteveldt, J., Evans, D. J. & Simmonds, P. The influence of CpG and UpA dinucleotide frequencies on RNA virus replication and characterization of the innate cellular pathways underlying virus attenuation and enhanced replication. Nucleic Acids Res. 42, 4527–4545 (2014).
  33. Takata, M. A. et al. CG dinucleotide suppression enables antiviral defence targeting non-self RNA. Nature 550, 124–127 (2017).
  10. Zhu, Y. & Gao, G. ZAP-mediated mRNA degradation. RNA Biol. 5, 65–7 (2008). doi.org/10.4161/rna.5.2.6044
  1. Chen, G., Guo, X., Lv, F., Xu, Y. & Gao, G. p72 DEAD box RNA helicase is required for optimal function of the zinc-finger antiviral protein. Natl. Acad. Sci. U. S. A. 105, 4352–4357 (2008).
  2. Hayakawa, S. et al. ZAPS is a potent stimulator of signaling mediated by the RNA helicase RIG-I during antiviral responses. Immunol. 12, 37–44 (2011).
  3. Roth, A., Anisimova, M. & Cannarozzi, G. M. Measuring codon usage bias. in Codon Evolution Mechanisms and Models 189–217 (Oxford University Press, 2012). doi:10.1093/acprof
  4. Washenberger, C. L. et al. Hepatitis C virus RNA: Dinucleotide frequencies and cleavage by RNase L. Virus Res. 130, 85–95 (2007).
  5. Ireland, D. D. C. et al. RNase L Mediated Protection from Virus Induced PLoS Pathog. 5, e1000602 (2009).
  1. Pillai, P. S. et al. Mx1 reveals innate pathways to antiviral resistance and lethal influenza disease. Science (80-. ). 352, 463–466 (2016).
  2. Molony, R. D. et al. Aging impairs both primary and secondary RIG-I signaling for interferon induction in human monocytes. Signal. 10, (2017).
  3. Pickett, B. E. et al. ViPR: An open bioinformatics database and analysis resource for virology research. Nucleic Acids Res. 40, D593-8 (2012).
  4. Cock, P. J. A. et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
  5. Charif, D. & Lobry, J. R. SeqinR 1.0-2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis. in Structural approaches to sequence evolution 207–232 (Springer, 2007). doi:10.1007/978-3-540-35306- 5_10
  6. Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues Running title : Quantitative human proteome atlas. bioRxiv 1–16 (2018). doi:10.1101/357137
  7. Petryszak, R. et al. Expression Atlas update - An integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res. 44, D746–D752 (2016).