Comprehensive Characterization of RNA Editing in Primary Gastric Adenocarcinoma through RNA-seq Data Analysis

RNA editing is a post-transcriptional nucleotide modication in humans. Of the various types of RNA editing, the adenosine to inosine substitution is the most widespread in higher eukaryotes, which is mediated by ADAR family enzyme. Inosine is recognized by the biological machineries as guanosine, therefore, editing can potentially rendering substantial functional effects throughout the genome, depending on where it located. RNA editing could contribute to cancer by either exclusive editing of tumor suppressor/promoting genes or by introducing transcriptomic diversity to promote cancer progression. Here, we provided a comprehensive overview of the RNA editing sites in gastric adenocarcinoma and highlighted some of their possible contributions to gastric cancer. RNA-seq data corresponding to 8 gastric adenocarcinoma and their paired non-tumor counterparts were retrieved from GEO database. After pre-possessing and variant calling steps, a stringent ltering pipeline was employed to distinguish potential RNA editing sites from SNPs. The identied potential editing sites were annotated and compared with those in DARNED database. Totally, 12362 high-condence adenosine to inosine RNA editing sites were detected across all samples. Of these, 12105 and 257 were known and novel editing events, respectively. These editing sites were unevenly distributed across genomic regions, nearly half of them were located in 3´UTR. Indeed, 4868, 3985 and 3509 editing sites were found to be common in both tissue, normal specic and cancer specic, respectively. Further analysis revealed signicant number of differentially edited events among these sites, which were located in protein coding genes and microRNAs. Given the distinct pattern of RNA editing in gastric adenocarcinoma and adjacent normal tissue, edited sites have the potential to serve as biomarkers and therapeutic targets in gastric cancer diagnose, management and treatment. The identication of RNA editing sites deeply depends on sequencing technology and bioinformatics approaches. We developed a pipeline for identifying RNA editing events in primary gastric cancer and normal tissues by screening RNA differences from reference genome followed by successive and rigorous ltering Most of previous studies have used coupled RNA and DNA to identify editing by the contrary, we identied RNA editing using RNA sequencing alone. Our signicant number of editing sites, vast majority of them in regions, Also a few novel editing which for the rst time in the the number of identied editing sites huge, most of the sites exhibited low editing levels and approximately half of the identied sites were in less than 27% of their related Our analyses that the associated both of Alu the of


Introduction
RNA editing is a common and essential post-transcriptional alteration of RNA sequences, affecting millions of bases, expanding the transcriptome diversity and the functions of RNA transcripts [1]. Although several types of RNA editing have been characterized, conversion of adenosine residues to inosine (A to I) is the most frequent type of editing in humans, which is catalyzed by the double stranded Despite the identi cation of these regulation elements, the main controlling feature of ADAR target recognition and how the ADAR nominates an adenosine for edition, remains to be further studied. Since, these elements do not allow the prediction of editing sites, identi cation of editing events is therefore dependent on sequencing data [22].
The advent of next-generation sequencing (NGS) has greatly improved the genome-wide identi cation of RNA editing sites through RNA sequencing (RNA-Seq) technologies and so far several million high con dence editing sites have been recognized in the human genome [23]. Identi cation of editing sites from RNA-seq data seems to be straightforward. Simply, aligning RNA-seq reads to the reference genome and searching for A to G mismatches, leads to detection of editing sites [24]. However, there are several sources of disagreement between RNA sequence and the reference genome, making the identi cation of actual editing sites challenging. The major challenge in identifying RNA editing events using RNA-seq data is the discrimination of genuine editing sites from somatic mutations, SNPs and sequencing errors, therefore, robust bioinformatical approaches need to overcome this challenge [25]. However, dozens of outstanding studies have successfully employed RNA-seq data alone to identify editing events [25][26][27][28][29][30][31][32][33][34].
To the best of our knowledge, there has been no comprehensive study investigating the editome in gastric adenocarcinoma and many outstanding questions on the extent and consequences of RNA editing in gastric cancer remain concealed. In this study we leveraged publicly available sequencing datasets to characterize RNA editing in gastric cancer.

RNA-seq datasets
Raw paired-end RNA-seq samples related to eight primary gastric adenocarcinoma and their paired nontumor counterparts were retrieved from publicly available GEO database (Gene Expression Omnibus database, accession number GSE85465). Non-tumor counterparts refers to samples harvested from the stomach, from sites distant from the tumor and exhibiting no visible evidence of tumor or intestinal metaplasia/dysplasia upon surgical assessment. The original data and sample details are described by Ooi et al. [35]. RNA-seq libraries of these samples were constructed using Illumina Stranded Total RNA Sample Prep Kit v2 and the dataset was generated using the Illumina HiSeq 2000 platform and the paired-end 101 bp read option.
Quality control and read mapping First, FastQC v0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was employed to control raw reads quality [27]. Furthermore, sequencer adapter removal and quality trimming was performed with Trimmomatic v0.32 (parameters: trailing 20 Maxinfo 60:0.95 and minimum length 60) [36]. Then, clean reads were aligned to the human reference genome (GRCh38) using Hisat2 v2.0.5, as it is more e cient at providing editing prediction from RNA-Seq data than other programs [37]. To reduce the potential bias caused by short read alignment, only uniquely and concordantly aligned reads were kept. PCR-induced duplicated reads that mapped to the same location were marked and excluded from analysis using the MarkDuplicates tool from the Picard package (http://picard.sourceforge.net/), except those with the highest mapping quality score [26]. To promote the aligning in the anking of the indel regions and to improve the quality of reads, the remaining reads were locally realigned around putative indels and the base quality values were recalibrated by GATK tool v3.5 (https://www.broadinstitute.org/gatk).

Variant calling and identi cation of RNA editing sites
To perform variant calling, single nucleotide variants (SNVs) were rst called using the HaplotypeCaller from the GATK tool with a stand_call_conf and stand_emit_conf value of 30 and mbq of 25 [38]. Next, the SNVs were removed from further analysis if they were corresponded to known SNPs found in Ensembl human SNP database version 151. Then, the remaining variants were ltered using the GATK standard lters including; 1) total depth of coverage < 10, to remove variants with less than 10 reads that passed the caller's internal quality control metrics. 2) HomopolymerRun > 5, to eliminate the variants with a homopolymer run larger than 5 bp on either side. 3) RMSMappingQuality < 40, to exclude variants with root mean square mapping quality less than 40 over all the reads at the site. 4) MappingQualityRankSum <-12.5, this parameter compares the mapping qualities of the reads supporting the reference allele and the alternate allele and employed to avoid mapping quality bias. A negative value indicates the mapping qualities of the reference allele are higher than those supporting the alternate allele. 5) QualitybyDepth < 2, this annotation is intended to normalize the variant con dence in order to avoid in ation caused when there is deep coverage. 6) ReadPosRankSum <-8, this annotation compares whether the positions of the reference and alternate alleles are different within the reads and eliminates variant distance bias [39].
Additionally, several quality-aware ltering steps employed to increase the accuracy of identifying true RNA editing sites. First, the sites with more than one non-reference type and homozygous sites for the alternative allele were ltered. Second, we discarded the sites with fewer than three reads supporting the SNV and only those sites, which at least 10 reads cover that site were kept for further analysis. Further, the SNV sites with an extreme or a rare degree of variation (threshold for the editing ratio was between 10% and 90%) were removed under the assumption that 100% editing e ciency is unrealistic. Third, SNVs located in regions with bidirectional transcription (transcription that occurs on both the positive and negative strands) were ltered. Fourth, GMATo software used for detection of simple sequence repeats (SSR) patterns and SNVs located in SSR regions were considered as biased with an offset of ± 3 bases [40]. Fifth, SNVs occurred within 5 bp intronic anking region were removed. Finally, to reduce falsepositive SNVs because of misalignment of sequencing reads to other parts of the genome, we ltered out SNVs in paralogs or repetitive regions by retrieving and aligning 100 bp of anking sequence (50 upstream and 50 downstream of the SNV) using BLAT [41]. Only the SNVs that were located in uniquely mapped sequences considered as RNA editing site. A to G and editing sites were kept for further analysis and other non-canonical editing sites were excluded. Ultimately, we compare identi ed RNA editing sites with those in DARNED [42] database and categorized them as "known RNA editing site", if they were in the database, and as "novel editing site" if they were not. An overview of our computational analysis pipeline for identifying the RNA editing sites is shown in Fig. 1.

Neighborhood pro le of editing
In order to predict the conservation of the editing sites neighborhood nucleotides, 10 bp upstream and 10 bp downstream of the edited sites were extracted. Then, WebLogo software was employed to generate a consensus sequence logo and investigate the sequence context anking the identi ed potential editing sites [43].

Annotation of RNA editing sites
The functional annotation and genomic location of the RNA editing sites were performed using SnpEff v4.3 [44]. The gene set used for annotation was Ensembl version GRCh38.92. In order to identify the biological functions associated with edited genes in cancer and normal tissue, we used Enrichr webapplication to conduct a functional enrichment analysis based on Gene Ontology (GO) biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway terms (Adjusted P-value ≤ 0.05) [45].

Validation of detected editing sites
To validate detected RNA editing sites, we used the publicly available human expressed sequence tags (ESTs) (ftp://ftp.ncbi.nih.gov/repository/UniGene/) to investigate whether the editing events identi ed by our pipeline were also present in these sequences. First, 50 bp upstream and downstream anking regions of editing sites were extracted and queried against the human EST sequences using BLAST.
Then, alignments with e-values < 10 − 5 were considered as signi cant and counted. On the other hand, since most of A to I RNA editing occurs in Alu repeats [46], we evaluate intersection of Alu repeats with identi ed editing sites. To do this, genomic positions of Alu repeats were downloaded from UCSC database (http://genome.ucsc.edu/) and their distribution pattern across the genome were compared with pattern of identi ed A to I editing sites.

Impaired microRNAs targeting
In order to predict microRNAs whose binding is affected by RNA editing, we downloaded the predicted microRNA binding data of highly conserved miRNA families from miRcode database [47]. Then, we applied intersect feature of BedTools to nd RNA editing sites that overlap with target site of microRNAs [48].

Statistical Analyses
Statistical signi cance for differences between cancer and normal tissue editing ratio were assessed by paired Student's t-test. Spearman's correlation coe cient was used to determine the relationship between chromosome length, number of Alu elements, number of protein coding genes and number of editing sites. Differences were considered signi cant when the P-value or adjusted P-value was < 0.05

Results And Discussion
Identi cation of RNA editing sites High-throughput RNA-seq technology have facilitated the discovery of transcriptome-wide RNA editing events across individuals and tissues at unprecedented throughput and resolution. However, the main obstacle in identifying bona d RNA editing sites using RNA-seq data is the distinction of RNA editing sites from rare SNPs and technical artifacts caused by sequencing or read-mapping error. To accurately detect the RNA editing sites at the transcriptome-wide level in gastric cancer, we developed a computational approach by using a precise strategy (see Fig. 1). This strategy enabled us to identify the potential RNA editing sites using RNA sequencing data alone, without the need for available matched DNA sequence from the same sample. We obtained 1725 million reads from RNA-seq data of eight gastric adenocarcinoma and their paired normal tissues. After quality trimming, a total of 1492.1 million reads were generated from all samples (on average, 93.3 million reads per sample). The clean reads were aligned to the reference genome with an average mapping rate of 91.67%. Also, the average rate of uniquely and concordantly mapped reads was 74.34% (range 59-84%). Initial analysis led to the identi cation of 1370502 variants and after excluding SNPs and INDELs 141347 SNVs remained. Finally, after applying multiple stringent lters to exclude false-positives, a total of 12362 unique A to G RNA editing sites were identi ed across all samples, which 12105 sites were previously reported in DARNED database and 257 variants were novel editing sites. These editing sites were distributed in 2406 unique gene. Based on our ltering criteria, all of these editing sites were located in unique genomic positions and were not close to any splice junction, bidirectional transcription or low complexity regions (such as SSRs). A summary of the statistics of raw and clean reads and mapping information as well as the number of identi ed SNVs and editing sites in different samples is provided in S1 File.
Sequence preferences analysis ADAR enzyme targets dsRNA of any sequence, but it has a sequence preference in the vicinity of the editing sites. Consistent with the known attributes of ADAR substrates, our results showed that the nucleotide immediately upstream (relative − 1 location) of edited site had a strong preference for G depletion and T enrichment. While, nucleotide immediately downstream (relative + 1 location) of the editing site showed signi cantly depleted T and favored G (see Fig. 2).

Validation of identi ed editing
The location of identi ed editing sites were compared with the position of Alu elements across genome. Interestingly, distribution of A to G editing sites and Alu elements were very similar across the genome.
This is more obvious when we look closely at chromosomes 1, 9, 16 and 19, where, ends of these chromosomes are reach in Alu repeats but the middle of chromosomes are relatively vacant (see Fig. 3). Next, to validate whether the identi ed RNA editing sites were true positive, we searched for evidences of the identi ed RNA editing sites in expressed sequence tags (ESTs) based on NCBI database. Of 12105 known and 257 novel editing sites, 10944 (90.4%) and 218 (84.8%) sites were found in EST clones, respectively. Moreover, further investigation revealed that 7643 (68.5%) of the identi ed editing events were validated in more than ve EST clones, which reinforce the accuracy of our method.

Distribution of the editing sites across genomic regions
First, the location of editing sites was annotated according to Ensembl database. As shown in Table 1 the most biotype of edited transcripts were "Protein coding" and the least were "snoRNA". Also, 42 editing sites were located in miRNAs, which were belonged to 17 unique microRNAs. Of these, MIR34A included 10 cancer-speci c editing sites ( Table 2). Investigation of genomic distribution of editing sites showed that the number of RNA editing sites greatly are varied across genomic regions. Overall, the 3´UTR was the most edited region, with 5870 editing sites (45.5% of all detected editing sites), followed by the upstream (23.5%), and the 5´UTR had the least number of editing sites (less than 1%). Indeed, 192 (1.6%) of the editing sites were located in exons, including 81 sites (42% of exonic editing sites) with nonsynonymous effect and 43 sites (22%) with synonymous effect. Exonic RNA editing leads to at least one premature termination codon and two stop loss mutations. Also, one editing site with start-gain mutation effect was detected (see Fig. 4).  Gene editing rate and RNA editing level RNA editing sites often appear in clusters, due to simultaneous editing of multiple adenosines by ADAR proteins. Therefore, we investigated whether the identi ed editing sites were in clusters or not. We found that 34% of genes were edited in more than ve sites. Furthermore, gene editing rate (number of edited sites located in gene) were calculated to evaluate clustering of editing sites. Overall, each gene in our study showed editing rate equal 5.1, which means on average each gene had ve editing sites. Interestingly, editing rates were different when genomic regions were considered separately. Editing rate in 3´UTRs was predominant, 7.1 editing site per gene, and exons showed the least editing rate, 1.4 editing site per gene. Editing rate in upstream and downstream regions, which included a large number of editing sites, was 3.5 and 3.8 site per gene, respectively. The frequency distribution of gene editing rates across genomic regions is shown in Fig. 5A. RNA editing level was also calculated for all edited sites, using the following formula [80]: RNA editing level= (number of reads supporting edited allele × 100) / (total number of reads at a site) Average RNA editing level across all sites was 30.72, which means, approximately 31% of each gene transcripts were edited in a given site. Editing level for most of the identi ed editing sites in the present study ranged from 15 to 25. The frequency distribution of RNA editing levels is shown in Fig. 5B.
Association between chromosome length, Alu elements, protein coding genes and number of editing sites Pearson's correlation coe cient was used to investigate the association between the number of editing sites and length of chromosomes. As expected, the number of RNA editing sites tended to be associated with chromosome length, but the association was weak when all chromosomes were included (r = 0.47, P = 0.02). As show in Fig. 6A, chromosome 19 has the highest editing frequency according to its size. Excluding the chromosome 19 from the analysis showed a signi cant correlation between number of RNA editing sites and length of chromosomes (r = 0.6 P < 0.002). In addition, correlation of editing with both number of Alu elements and number of protein coding genes were calculated. Surprisingly, we found that correlation of editing with number of protein coding genes was stronger than number of Alu elements, where Spearman's correlation coe cient was 0.91 and 0.85, respectively ( Fig. 6B and 6C). Cancer and normal speci c editing sites Among the 12362 editing sites, 4868 sites were found within both normal and cancer samples. On the other hand, 3985 and 3509 editing sites were speci c to normal and cancer tissues, respectively. Statistical analysis revealed 285 differentially edited events among common editing sites. Notably, 129 cancer-speci c and 173 normal-speci c editing sites were found to be differentially edited (see Fig. 7).
Functional enrichment analysis of the cancer and normal-speci c edited genes showed a larger number of signi cant terms in cancer-speci c edited genes. Nine GO term were signi cantly enriched in cancerspeci c edited genes, on the other hand only one term was signi cantly enriched in normal-speci c edited genes. GO and KEGG pathways categories of the top ve cancer-and normal-speci c edited genes are shown in Table 3. These signi cantly enriched terms could help us a lot to further understand the role of edited genes in gastric cancer.

Functional impacts of RNA editing sites
The functional impact of RNA editing could induce by vast range of molecular mechanisms. For instance, it can lead to amino acid recoding, causing changes in seed sequences of microRNAs or affect microRNA targeting sites. In search of amino acid recoding mutations, 81 editing sites were found across 63 genes that could lead to non-synonymous change (S3 File), including 12 novel editing sites. Interestingly, MUC4, an epithelial glycoprotein coding gene, was edited in two positions (3:195780295 and 3:195780902), which caused p.L3762P and p.S2560P, respectively (Table 4). Also, microRNAs targeting could affect by editing. In this regard, 44 editing sites were detected that affect microRNA target recognition in normal and cancerous tissue of gastric (Table 5). In addition, 294 editing sites with nonsense-mediated decay impact were found that affect 92 protein coding genes. Of these, 80 and 111 sites were identi ed only in cancer and normal samples, respectively. Also, 103 nonsense-mediated decay editing sites were found in both cancer and normal tissues (S4 File).

Discussion
The identi cation of RNA editing sites deeply depends on sequencing technology and bioinformatics approaches. We developed a pipeline for identifying RNA editing events in primary gastric cancer and normal tissues by screening RNA differences from reference genome followed by successive and rigorous ltering criteria. Most of previous studies have used coupled RNA and DNA sequences to identify editing events [28,81], by the contrary, we identi ed RNA editing sites using RNA sequencing data alone.
Our analyses found signi cant number of editing sites, vast majority of them harbored in 3´UTR regions, which has been reported in previous studies [80,82]. Also a few novel editing sites were found, which were reported for the rst time in the current study. Although the number of identi ed RNA editing sites was huge, most of the sites exhibited low editing levels and approximately half of the identi ed sites were edited in less than 27% of their related transcripts.
Our analyses found that the RNA editing sites were highly associated with both number of protein coding genes and Alu elements distribution in the genome. Also, frequency of editing sites were correlated with size of chromosomes. These results are in a good agreement with Chigaev et al. study, who reported that correlation of editing frequency with protein coding genes is stronger than lincRNA density [80]. However, these correlation could result from the bias of the library preparation step of RNA sequencing projects. Since oligo-dT primers apply to capture the RNA through the poly-A tail, most of the reads will be related to protein coding genes.
To date, no speci c sequence has been found that characterize editing sites of any of the ADAR enzymes.
However, in the neighborhood of edited adenosine, there are preferred and opposed preferences. Consistent with previous studies, there was an over-representation of guanosine in the neighboring position downstream, while guanosine was depleted in the upstream neighboring position [26,82]. Since some of adenine bases in the right context do not edit, other features proposed to be involved in determination of editing. Daniel et al. described editing inducer elements distance from the edited adenine, which increase the editing e ciency and speci city of a highly edited site [20]. Wong et al. reported that editing e ciency is strongly in uenced by the base opposing the edited adenosine. They found that when there is an A:C mismatch at the editing site, editing by ADAR enzyme was enhanced compared to when A:A or A:G mismatches or A:U base pairs occurred at the same site [21]. Due to the contradictory results, it is di cult to make de nitive conclusions about potential editing sites.
We wonder whether RNA editing could function as an additional mechanism contributing to tumorigenesis by generating speci c RNA editing sites that are unique to cancer samples. In the search of the answer to this question we found that 28.4% and 32.2% of the identi ed editing sites were speci c to cancer and normal tissues, respectively. These tissue speci c editing sites could contribute to cancer initiation and progression, if they located in important gene. Some of cancer-speci c editing sites and their role in pathogenesis of cancer have been identi ed in previous studies. RNA editing of transcription factor PROX1, a candidate tumor suppressor, leads to several missense substitutions including E328G, R334G, and H536R and loses tumor suppressive functions. These editing events have been seen in a number of esophageal, pancreatic, and colon cancer samples, but no such editing is seen in a number of cDNA libraries of many normal tissues [17].
We also found a remarkable number of common editing events between cancer and normal tissues, which their editing levels were signi cantly different in cancer and normal tissue. Deregulated editing level in cancer and normal common editing sites could be an important contributor in tumorigenesis. Chen et al. reported that RNA editing level of AZIN1 increases by at least 10% in hepatocellular carcinoma compared to adjacent normal liver. The edited isoform compared with wild-type AZIN1 has increased a nity to antizyme, which leads to neutralization of antizyme-mediated degradation of ornithine decarboxylase and cyclin D1 and promotes cell proliferation [83]. In this regard, Han et al. reported a higher level of editing on RHOQ in tumor compared with normal tissue in colorectal cancer, which results in N136S amino acid substitution. This RNA mutation increases RHOQ protein activity, actin cytoskeletal reorganization and invasion potential [84]. On the contrary, hypo-editing of several genes are associated with cancer phenotypes. The pre-mRNA transcript encoding the GluR-B has two functionally important editing sites (Q/R and R/G sites) and the Q/R site almost entirely edited, which is necessity for normal function of receptor. It has been proved, in malignant tissue of human brain tumors, this editing site of GluR-B considerably under-edited compared with control tissues [85]. Our results corroborate that the RNA editing frequency can be regulated in a tissue speci c manner, which is consistent with observations reported previously.
Our results showed that the vast majority of editing sites in gastric cancer were located in 3´UTR and up/down stream regions as well as a large number of editing sites were observed in coding regions.
According to their genomic location, these RNA editing events could lead to various functional impacts and apply their effects through several dominant mechanisms. First and most important, RNA editing events in exonic region can cause amino acid change and imitate cancer-associated missense mutations. Our pipeline identi ed 81 editing events with non-synonymous effect, including 12 novel editing events. Notably, we found four missense RNA mutations in mucin family (MUC3A, MUC4 and MUC6). Normal gastric epithelial cells transcribe MUCs, which have several functions including; protection against mechanical and infectious lesions, lubrication and acid resistance [86]. Several studies have been reported that transcription pro le of mucins are changed in gastrointestinal cancers, which overall suggests an important role for MUCs in gastric cancer [87][88][89]. Our results reinforced the hypothesis that inappropriate RNA editing can be involved in gastric cancer development.
Second, RNA editing could affect microRNAs target recognition and subsequently affect the expression pro le of the genes. Previous computational analyses suggested that RNA editing tends to avoid microRNA target sites in general, even though RNA editing events have a potential to block the microRNA target recognition. Dysregulation of microRNA target recognition has been linked to cancers [90,91]. In this context, 44 editing events were found in the present study, where at least one microRNA binding was disrupted. In consistent with our research, Soundararajan et al. identi ed 652 editing events in lung cancer, which were located in the 3´UTR of 205 target genes and mapped to 932 potential microRNA target binding sites [92]. All together these ndings are inconsistent with Liang and Landweber previous computational analyses, where they suggested that RNA editing tends to avoid microRNA target sites in general, even though RNA editing events have a potential to block the microRNA target recognition [93]. It is worth to remind, RNA editing events in addition to disrupting existing microRNA binding sites, could generate novel microRNA regulatory networks. In a completely separate mechanism from what has been mentioned, RNA editing could affect microRNA biosynthesis. miR-142 is highly expressed in hematopoietic tissues, conversely it is not expressed in non-hematopoietic tissues. Also, its expression in patients with acute myeloid leukemia is signi cantly lower than that in controls. Yang et al. showed that editing of pri-miR-142, leads to suppression of its processing by Drosha and subsequently it degradation [94].
Third, editing of microRNA sequences could alter their binding a nity or target recognition properties.
Since microRNAs play a role in nearly all cellular pathways and pathological processes, including cancer initiation and progression, uctuations of their targeting are an important contributor to cancer [95]. Our analysis revealed 42 editing sites in 17 cancer-associated microRNAs, some of them exclusively edited in cancerous tissue. Consistent with our results, Nigita et al. identi ed 40 and 18 potential editing sites in Lung Adenocarcinoma and Lung Squamous Cell Carcinoma, respectively [96]. Indeed, our results showed miR-34a, a cancer-speci c edited microRNA, was edited in 10 position. Previous studies have been identi ed this microRNA as a tumor suppressor in gastric cancer cell lines [58]. On the other hand, it was shown miR-34a epigenetically down-regulated or silenced in gastric cancer tissues and cell lines [97]. We therefore speculate that editing in some positions could terminate the function of miR-34a, but further studies are required to con rm this possibility.
To our knowledge, this is the rst time to comprehensively characterize editome of normal and cancerous tissue of gastric. Findings of the current study uncovered relatively large number of RNA editing sites, which were unevenly distributed across genome. Editing level of these sites and editing rate of different genes had diverse distribution. We also found a signi cant number of exclusively edited genes in cancer and normal tissue, which are likely to contribute to cancer initiation and progression.

Conclusions
Gastric cancer initiation and progression is driven by the cumulative effects of genetic and epigenetic alterations, RNA editing a widespread post-transcriptional mechanism could be part of these alterations. Depending on genomic location and level of editing, this phenomenon could leads to missense mutations, affecting microRNA biosynthesis and targeting, changing splicing patterns and modifying microRNA target sites. Editome of gastric cancer vastly differ from adjacent tissue in terms of both type and number of editing sites. Given the distinct pattern of RNA editing between gastric cancer and normal tissue, edited sites have the potential to serve as biomarkers and therapeutic targets in gastric cancer diagnosis, management and treatment.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets analysed during the current study are available in the GEO repository [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85465]. All data generated during this study are included in this published article and its supplementary information les.

Competing interests
The authors declare no competing interests.

Funding
This work was supported by the Tarbiat Modares University and Iran National Science Foundation (grant number: 97014362).

Authors' contributions
The study concept and design involved SS, MRB, HM and JB. SS and JB were responsible for the recruitment and data collection. Data analysis was completed by MRB and JB. JB drafted the original manuscript. The article was revised by SS, MRB and HM. All authors read and approved the nal version. S1 File: Summary of the statistics of raw and clean reads and mapping information.  Neighborhood sequence preferences of nucleotides for RNA editing sites.  Distribution of RNA editing sites in different genomic regions.

Figure 5
Frequency histogram of gene editing rate (a) and frequency distribution plots of RNA editing levels