SARS-CoV-2 evasion from ADAR hyper-editing is both genome-encoded and sustained by the virus replication strategy

We evaluated the role of ADAR during in-vivo SARS-CoV-2 infection, identifying ADAR-mediated hyperediting only in 49 RNA-seq samples at a low level. Hyper-editing of host dsRNAs appeared not inuenced by SARS-CoV-2 infection and showed higher eciency compared to viral editing. Conversely, in mouse samples we found abundant hyper-editing with similar eciency between host and SARS-CoV-2 RNAs. Underrepresentation of dinucleotide motifs along coronavirus ORFs suggested that SARS-CoV-2 resistance to ADAR hyper-editing is both evolutionary-encoded and sustained by viral replication strategy.


Introduction
Since 2000, Coronaviruses like SARS-CoV, MERS-CoV and SADS-CoV have caused pandemics in multiple hosts 1 . Since the end of 2019, Severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) has rapidly spread worldwide, making COVID-19 the worst zoonotic disease of the modern history. Although considerably similar to coronaviruses of bats and pangolins 2 , SARS-CoV-2 has jumped from a still unknown host into humans, with the cross-species transmission mediated by an e cient interaction with human cell receptors 3 . Although the SARS-CoV-2 mutation rate is slower than other RNA viruses because of RNA proofreading activity 4 , genomic data produced in real time have shown a strong bias towards Cto-T substitutions, possibly resulting from APOBEC-mediated editing 5,6 .
SARS-CoV-2 mutations can crucially affect host-pathogen interactions by modifying the targets of RNA editing enzymes, the secondary structure of viral RNAs 7 and stimulating innate immune responses via TLR7 and TLR8 8 . RNA sequencing (RNA-seq) and biochemical analyses revealed an extraordinary activation of the immune system during SARS-CoV-2 infection, known as "cytokine storm", which contributed to the SARS-CoV-2-induced mortality 3 . Moreover, the footprint of APOBEC3 and ADAR deamination has been detected during SARS-CoV-2 infection in-vivo 6 . APOBEC and ADAR seem to deaminate in preferential contexts as APOBEC deaminates primarily TC or CC motifs in ssDNA, leading to TT or CT variations, whereas ADAR deaminates adenosine (A) to inosine (I) primarily in the context of WA (W=A/T) motifs in dsRNA, with the resulting inosine being read as guanine during translation. Depending on the host-virus combination, A-to-I hyper-editing (he) can either limit the virus replication by impairing its dsRNAs and marking them for a rapid degradation 9 or favors the virus by acting as negative-feedback in the immune system, reducing the dsRNA detection mediated by MDA5 10 . The analysis of few SARS-CoV-2 RNA-seq samples outlined a low frequency of ADAR editing, and raises relevant questions about the ADAR role during infection 11 .
In details: 1. To what extent is SARS-CoV-2 susceptible to ADAR-mediated editing? 2. Is ADAR activity, in the form of he, traceable during SARS-CoV-2 infection? Here, we veri ed the in-vivo levels of ADAR he in 863 RNA-seq datasets from humans and other SARS-CoV-2 infection models. Analyzing the dinucleotide composition of the open-reading frames (ORFs) of SARS-CoV-2 and other coronaviruses, we traced the evolutionary footprints of RNA editing enzymes and we estimated the current susceptibility of these coronaviruses to RNA editing.

Results And Discussion
We searched for SARS-CoV-2 transcription in 636 human-SARS-CoV-2 RNA-seq datasets, identifying 328 samples with at least 1,000 SARS-CoV-2 reads (S. Table 1). We applied the hyperediting tool 12 on these human host samples to trace genuine ADAR hyper-editing (he), retrieving 14,273 SARS-CoV-2 he reads, with an average of 0.036 edited reads every thousand viral reads (S. Table 1). Notably, he along SARS-CoV-2 was identi ed only by lowering the fraction of minimum edited sites per read to 0.03 instead of 0.05. The he levels poorly correlated with the coverage of SARS-CoV-2 in these samples ( Figure 1A, r=0.42, p-values 1.3 e -8 ) and only 49 samples included more than 100 he reads (for a total of 9,568 he reads). In these samples we showed that he is mostly localized in two conserved hotspots along the SARS-CoV-2 genome: one localized on the RNA-dependent RNA polymerase (nsp12, position 14221:14331) and the second on nsp6 (position 11058:11162), both encoded within the polycistronic ORF1ab ( Figure 1D). Normalizing he by gene expression levels, we demonstrated that he mostly impacted ORF6 ( Figure 1C). ORF6 is known to impair the transcriptional induction of ISGs by interacting with STAT1 and STAT2 13 .
The analysis of additional 227 RNA-seq datasets referred to SARS-CoV-2 infection in non-human hosts, including hamster, ferret, non-human primates, and mice revealed relevant he levels exclusively in mouse samples of one experiment (PRJNA646535, S. Table 1). SARS-CoV-2 infection in these mice was made possible by ACE-transfection, whereas the role of Interferon signaling during infection was evaluated by knocking-out IFNR or IRF3/7 14 . Notably, the IRF3/7 knock-out mice displayed the highest he levels (1.16‰), followed by control (0.26‰) and IFNR knock-out mice (0.11‰). In these mouse samples, we detected a higher e ciency of he, measured as the number of edited bases per he cluster, compared to SARS-CoV-2 in humans, and the normalized he levels homogenously impacted SARS-CoV-2 genes ( Figure  1C and D).
To verify if SARS-CoV-2 could interfere with ADAR he of host dsRNAs, we traced he events on mouse and human genomes. In mice, ADAR he impacted preferentially non-coding genes, although he levels were not in uenced by SARS-CoV-2 infection for either protein coding genes or non-coding elements (S. Figure 1A).
Although ADAR expression levels appeared low (<3 TPMs) and mildly downregulated in knock-out mouse samples, we observed a reduced he levels in these latter samples, possibly due to the general reduction of interferon-related genes in which ADAR is included (S. Figure 1A). The e ciency of he was similar between SARS-CoV-2 and mouse in all the samples ( Figure 1B). Unfortunately, human host reads have been removed from most of the samples with SARS-CoV-2 he. In 7 human samples we could show that the host he e ciency was higher compared to SARS-CoV-2, with an average of 5 edited bases per cluster compared to 3.5 ( Figure 1B).
To further verify the possible implication of ADAR he during SARS-CoV-2 infection in humans, we analyzed post-mortem RNA-seq samples of lung with different SARS-CoV-2 infection levels (N=57) 15 . We demonstrated that he of host dsRNAs was not in uenced by SARS-CoV-2 infection (S. Figure 1B). Similar to mouse samples, ADAR was only mildly modulated by SARS-CoV-2 (1.8x), with mid to low expression levels (<20 TPMs).
According to our results, we speculated that SARS-CoV-2 genome sequence could confer resistance to ADAR he. The analysis of the dinucleotide composition of ORFs belonging to 70 coronaviruses demonstrated a signi cant under-representation of the 'WA' (W=A/T), 'TC' and 'CG' motifs (S. Figure 2). While the 'CG' motif is known to be under-represented in coronaviruses, in agreement with the effect of the zinc nger antiviral protein (ZAP) and the low-CpG frequency characterizing vertebrate hosts 16 , the 'WA' and 'TC' motifs are preferential targets for ADAR and APOBEC3 enzymes, respectively 17,18 . No signi cant differences in the inter-and intra-genus under-representation were detectable for 'WA' and 'TC', except between alpha and delta-coronaviruses at 'WA' (p-value 0.0032, Figure 2, S. Table 2), with WA under-representation in Orf1ab and S detected in 96% and 81% of the tested coronavirus genomes, respectively. The second metric that we considered is "the replacement transition fraction", or repTrFrac, which determines a signi cantly high mutation susceptibility in an ORF, leading to non-synonymous polymorphism (nsSNPs). We showed that repTrFrac in 'WA' was signi cantly higher than in TC for most of the coronavirus genomes ('WA': 78.8 ± 18%; 'TC': 31 ± 34.6%, p-value 2.79 e -11 , Figure 2).
Among beta-coronaviruses, SARS-CoV-2 appears in the upper part of the distribution for 'WA' motifs, because of a signi cant WA under-representation in ORFs covering 95% of the genome (ORF1ab, S, ORF3a, M, ORF6, ORF7a, N), whereas 'TC' motifs SARS-CoV-2 are in the lower part of the distribution (ORF1ab and S, Figure 2). We also showed that the 'WA' under-representation characterizes most of the non-structural proteins encoded along Orf1ab, except for nsp7, nsp9-12 and nsp16 ( Figure 1D).

Conclusions
We showed how ADAR and APOBEC have evolutionary contributed to minimize their own RNA editing targets in the extant coronavirus genomes. ADAR has massively directed genome evolution towards less editable targets, with just few spaces left for additional synonymous variations, whereas APOBEC has leaved room for additional synonymous variations. This result is consistent with the strong bias towards C-to-T variations traced during the SARS-CoV-2 pandemic, likely produced by APOBEC 5,6 . Genomeencoded resistance could explain the low frequency of ADAR he on SARS-CoV-2 in-vivo, although the evidence of abundant he in mouse samples rather suggests that host-speci c SARS-CoV-2 replication mechanisms contribute to reduce ADAR he. SARS-CoV-2 RNAs are protected in double membrane vesicles in humans 19 , effectively masking the potential ADAR substrate for editing. In mice the formation of these vesicles has never been veri ed for SARS-CoV-2. Unmodi ed he levels towards human dsRNAs discharged the hypothesis of a direct interaction between SARS-CoV-2 and ADAR, according to the absence of ADAR in the viral interactome 20 . We concluded that SARS-CoV-2 escapes ADAR he because of a combination of genome-encoded resistance and protected virus replication mechanisms in humans.

Materials And Methods
Data retrieving. Genome sequences of 70 coronaviruses were downloaded from the NCBI genome database and parsed to extract 617 open reading frames (ORFs, S. Table 2). A total of 1,792 RNA-seq datasets of SARS-CoV-2 were retrieved form NCBI SRA archive (accessed 1 st of December 2020) and screened as follows: RNA-seq metadata were used to extract samples of in-vivo infection in different host, while the number of SARS-CoV-2 reads per sample was assessed by mapping the reads to the MN908947 reference genome using bwa (github.com/lh3/bwa). Samples with at least 1,000 SARS-CoV-2 reads were considered positive.
ADAR hyper-editing analysis. The hyperediting tool 12 was applied after minimal modi cations of the original version, which utilized bwa, SAMtools (github.com/samtools) and BEDTools (github.com/arq5x/bedtools2), implemented to overcome software incompatibilities. The tool parameters were adapted to our model, applying: 3/5 for Minimum of edited sites at Ultra-Edit read (%); 60 for Minimum fraction of edit sites/mismatched sites (%); 25 for Minimum sequence quality for counting editing event (PHRED); 60 for Maximum fraction of same letter in cluster (%); 20 Minimum of cluster length (%); and imposing that the he clusters should not be completely included in the rst or last 20% of the read. Outputs in BED format were parsed using custom scripts, and further analyzed using CLC Genomic Workbench v.21 (Qiagen, US).
Gene expression analysis and he normalization. Quality-trimmed reads were mapped to the SARS-CoV-2 reference genome (MN908947) applying 0.8 and 0.8 for length and similarity parameters, respectively.
Gene expression values were computed as Transcript Per Million (TPM) or as uniquely mapped reads, to normalize he levels.
Under-representation analysis. Under-representation and replacement transition fraction analysis were performed using the n3 module of the Cytidine Deaminase Representation Reporter (CDUR) 21 . Brie y, this reporter received as input a coding sequence, which was shu ed 1,000 times by switching nucleotides in the third positions of the codons while maintaining the integrity of the amino-acid sequence as well as the genome GC content. We measured the relevant statistics (e.g., belowTA and repTrFracTA) as follows. The "below" metrics counted the number of hotspots (e.g., TA) in the input and compared this number to the distribution of hotspots observed in the shu ed sequences to obtain an empirical P-value. The replacement transition fraction, or repTrFrac, compared the ratio of possible non-synonymous mutations that can occur at the hotspot (e.g., TA) to the observed number of hotspots, obtaining a P-value in a similar way. This fraction was compared to the distribution resulting from the shu ed sequences, to obtain a second empirical P-value. value associated with the under-representation of the speci ed dinucleotide whereas the repTrFrac columns report the p-value associated with the replacement transition fraction of the speci ed dinucleotide.