We searched for SARS-CoV-2 transcription in 636 human-SARS-CoV-2 RNA-seq datasets, selecting the 328 of them with at least 1,000 SARS-CoV-2 reads (S. Table 1). We applied the hyperediting tool developed by E. Y. Levanon et al.23 on these human datasets to trace genuine ADAR hyper-editing (he) and thus, we retrieved 14,273 SARS-CoV-2 he reads displaying 0.036 edited reads every thousand viral reads on average (S. Table 1). Along the SARS-CoV-2 genome sequence, he sites could be identified only by lowering the fraction of minimum edited sitesper read to 0.03 instead of 0.05 and the resulting he levels poorly correlated with the coverage of SARS-CoV-2 in the same analyzed samples (Figure 1A, r=0.42, p-values 1.3 e-8). Notably, only 49 of the analyzed datasets included more than 100 hyper-edited SARS-CoV-2 reads (total 9,568 he reads), with the he sites mostly clustering in two conserved hotspots along the SARS-CoV-2 genome: one localized on the RNA-dependent RNA polymerase (nsp12, position 14221:14331) and the second on nsp6 (position 11058:11162), both encoded within the polycistronic ORF1ab (Figure 1D). Normalizing he by gene expression levels, we demonstrated that he mostly impacted ORF6 (Figure 1C), whose encoded product is known to impair the induced transcription of ISGs by interacting with STAT1 and STAT224.
The analysis of additional 227 RNA-seq datasets referred to SARS-CoV-2 infection of non-human hosts, namely hamster, ferret, non-human primates, and mice, revealed relevant he levels exclusively in mouse datasets from one specific experiment (PRJNA646535, S. Table 1). SARS-CoV-2 infection in these mice was made possible by transfection of human Angiotensin I converting enzyme 2 (hACE2), while the role of Interferon signaling during infection was evaluated by knocking-out Interferon receptor (IFNR) or Interferon Regulatory Factor 3/7 (IRF3/7)25. Notably, the IRF3/7 knock-out mice displayed the highest he levels (1.16‰), followed by control (0.26‰) and IFNR knock-out mice (0.11‰). In the mouse datasets we detected a higher hyper-editing efficiency (measured as the number of edited bases per he cluster) on SARS-CoV-2 RNAs than in human datasets; in addition, the normalized he levels impacted SARS-CoV-2 genes homogenously (Figure 1C and D).
To verify if SARS-CoV-2 could in some way interfere with the ADAR he of host dsRNAs, we systematically traced he events on both the mouse and human genomes. In mouse, ADAR he impacted preferentially non-coding genes, although the detected he levels were not influenced by SARS-CoV-2 infection for either protein coding genes or non-coding elements (S. Figure 1A). Although ADAR expression levels were low (<3 TPMs) and mildly downregulated in the knock-out mice, we observed a reduced he, in agreement to the reduced expression of interferon-related genes, which also include ADAR (S. Figure 1A). The efficiency of he was similar between SARS-CoV-2 and mouse RNAs in the genetic knock-out background (Figure 1B). Concerning the human datasets, the host reads have been unfortunately removed from most of the datasets displaying SARS-CoV-2 he. In 7 accessible human samples, we could show a higher he efficiency on host RNAs than on SARS-CoV-2, with an average of 5 edited bases per cluster compared to 3.5 (Figure 1B).
To further investigate the possible implications of ADAR he during SARS-CoV-2 infection in humans, we analyzed post-mortem RNA-seq data from patients differing in the SARS-CoV-2 infection level (N=57)26, demonstrating that he of host dsRNAs was not influenced by SARS-CoV-2 infection (S. Figure 1B). In the same human samples, like in the mouse samples, the ADAR expression ranged from mid to low levels (<20 TPM) and was only mildly modulated by SARS-CoV-2 (fold change=1.8).
Given these results we hypothesized that the SARS-CoV-2 genome sequence could confer resistance to ADAR he and therefore we investigated the dinucleotide composition of the ORFs belonging to 70 coronaviruses, demonstrating a significant under-representation of the ‘WA’ (W=A/T), ‘TC’ and ‘CG’ motifs (S. Figure 2). While the ‘CG’ motif is known to be under-represented in coronaviruses, in agreement with the effect of the zinc finger antiviral protein (ZAP) and the low-CpG frequency characterizing vertebrate hosts27, the ‘WA’ and ‘TC’ motifs are preferential targets for ADAR and APOBEC3 enzymes, respectively28,29. No significant differences in the inter- and intra-genus under-representation were detectable for ‘WA’ and ‘TC’, except between alpha and delta-coronaviruses at ‘WA’ (p-value 0.0032, Figure 2, S. Table 2), with WA under-representation in ORFs 1ab and S being detected in 96% and 81% of the tested coronavirus genomes, respectively. The second metric that we considered was “the replacement transition fraction”, or repTrFrac, which determines a significantly high mutation susceptibility leading to non-synonymous polymorphism (nsSNPs) in an ORF. Applying this metric, we showed that repTrFrac in ‘WA’ was significantly higher than in TC for most of the coronavirus genomes (‘WA’: 78.8 ± 18%; ‘TC’: 31 ± 34.6%, p-value 2.79 e-11, Figure 2).
Among beta-coronaviruses, SARS-CoV-2 locates in the upper part of the distribution for ‘WA’ motifs, because of a significant WA under-representation in the ORFs covering 95% of the genome (ORF1ab, S, ORF3a, M, ORF6, ORF7a, N), whereas SARS-CoV-2 is in the lower part of the distribution for ‘TC’ motifs (ORF1ab and S, Figure 2). We also showed that most of the non-structural proteins encoded in the Orf1ab displayed ‘WA’ under-representation, except for nsp7, nsp9-12 and nsp16 (Figure 1D).