Transcriptional regulatory networks mediated by abnormal m6A modications in tumor development

N6-methyladenosine (m 6 A) is one of the common modications of transcripts that is regulated by related proteins including writers, readers and erasers. Abnormal m 6 A modication plays an important role in the process of tumor development. Current approaches for tumor m 6 A RNA methylation, based on sequencing data from public databases, focus on the comparison of m 6 A modication regulator expression by RNA-seq data. We obtained MeRIP-seq and corresponding RNA-seq data from GEO database to compare and analyze the distributional characteristics of m 6 A in different types of tumor samples, the differences in m 6 A modication enrichment, and the regulatory role in gene transcriptional expression level. We found that the enrichment of m 6 A modication was enhanced in IMPA and GBM tumor samples, but was decreased in METTL3 knockdown tumor cells. Combined with clinical data from the TCGA database, we found that the high expression of DUSP7 signicantly affected the overall survival of AML patients and was involved in tumor development through the PIK3R2 protein and MAPK signaling pathways.

the enrichment of m 6 A modi cation was enhanced in IMPA and GBM tumor samples, but was decreased in METTL3 knockdown tumor cells. Combined with clinical data from the TCGA database, we found that the high expression of DUSP7 signi cantly affected the overall survival of AML patients and was involved in tumor development through the PIK3R2 protein and MAPK signaling pathways.

Conclusion
Abnormal m 6 A modi cation enrichment in tumor samples leads to dysregulation in the expression of cancer-related genes.

Background
Post-transcriptional modi cation has become an important regulator in many biological processes and has attracted increasing attention. N6-methyladenosine (m 6 A) is one of the common internal modi cations in eukaryotic mRNA [1]. One of the methods widely used to study m 6 A modi cation at the transcriptional level is methylated RNA immunoprecipitation sequencing (MeRIP-seq), which involves the immunoprecipitation of m 6 A modi ed RNA fragment and peak detection by comparing to background gene coverage. In 2012, Meyer et al. [2] determined 7,676 human genes containing m 6 A in somatic cells using the MeRIP-seq method; they found that m 6 A was primarily localized to the conserved RRACH sequence (R = G or A;H = A, C or U), the sequence is enriched in the 3' untranslated regions(3′UTRs), and the stop codon region of the protein-coding mRNA. In mammals the amount of m 6 A modi ed adenine is between 0.1% and 0.4% [3,4], so each mRNA has only 3-5 m 6 A methylation sites and regulates RNA stability, localization, splicing, transport, and translation at the post-transcriptional level [5]. m 6 A modi cation is mainly associated with three types of regulators. The rst is m 6 A methyltransferases, which encode genes called "writers" and include METTL3, MetTL14, MetTL16, RBM15/15B, KIAA1429, ZC3H13 and WATP (Wilms Tumor 1 associated protein) [6]. Together, they form a complex that induces the writing of the m 6 A methylated group into RNA [7]. The second type of regulators is m 6 A demethylase, which encodes genes called "erasers" and includes FTO and ALKBH5 [8]. This protein can remove the m 6 A methylation group in RNA, thus affecting the biological process of tumor. The last group of regulators can bind to the m 6 A methylation site and can read information to play a role; the encoding genes of this group are called "readers", some of which identi ed so far are YTHDF1, YTHDF2, YTHDF3,YTHDC1, YTHDC2, HNRNPC, HNRNPG, HNRNPA2B1 and IGF2BPs [9]. Readers are proteins that bind to the m 6 A sites on RNA in the nucleus or cytoplasm to perform a speci c biological function. At present, the research on the molecular biological mechanism of m 6 A modi cation of tumor is still in the initial stage, and further exploration is needed to provide targets for tumor therapy.
Breakthroughs in the discovery and research of m 6 A writers, erasers and readers, as well as the development of high-throughput sequencing data analysis, have helped illuminate the biological functions and potential mechanisms of m 6 A. A growing number of studies have shown that m 6 A RNA methylation plays an important role in tumor genesis and development. The effect of m 6 A modi cation on tumors is mainly re ected in the regulation of these tumor-related genes. Abnormal m 6 A modi cations are closely related to tumor development [10,11]. Therefore, it is particularly important to further study the regulatory mechanism of m 6 A modi cation in the process of tumor development. Glioblastoma is the most deadly primary brain tumor. Studies have shown that m 6 A modi cation is associated with the growth and self-renewal of glioblastoma stem cells (GSCs) [12]. Knockdown of METTL3 or METTL14, key components of the RNA methyltransferase complex, promotes GSCs growth, self-renewal, and tumorigenesis, whereas upregulation of METTL3 or inhibition of FTO has the reverse effects [12]. Another study also reported the important role of METTL3-mediated m 6 A modi cation in GSC maintenance and in glial cell dedifferentiation [13]. This study found that METTL3 expression is upregulated in GSCs and is attenuated during differentiation. The role of METTL3 has also been reported in acute myeloid leukemia(AML). Studies have shown that METTL3 is expressed more abundantly in AML cells than in normal hematopoietic cells. And METTL3-mediated elevation of m 6 A in AML plays an important role in maintaining pluripotency and in inhibiting cell differentiation [14,15]. In other tumors, such as gastric cancer, upregulation of METTL3 promotes the stability of ZMYM1, thereby enhancing EMT process in vitro and metastasis in vivo [16].
In the present study we obtained high-throughput sequencing data of MeRIP-seq for invasive malignant pleomorphic adenoma (IMPA), glioblastoma (GBM), acute myeloid leukemia(AML), and gastric cancer (GC) from GEO database, and we analyzed the level of m 6 A modi cation in different tumor samples. Combined with the corresponding RNA sequencing (RNA-seq) data of these tumor samples, we analyzed the differences of gene expression at the transcription level under different m 6 A modi cation levels. In addition, combined with the tumor clinical data from the The Cancer Genome Atlas (TCGA) RNA-seq database, we expected to identify the key genes that affect the survival of different tumor patients under the regulation of m 6 A modi cation, so as to provide guiding strategies for the clinical treatment of tumor patients. Based on this data analysis, we found that the enrichment degree of m 6 A modi cation was enhanced in IMPA and GBM but was decreased in METLL3 or YTHDF1 knockdown tumor cell lines, suggesting that m 6 A modi cation plays an important role in tumorigenesis. Among the genes with different levels of m 6 A modi cation that were common to different tumor types, the differential expression of DUSP7 is signi cantly correlated with the survival of patients with AML. DUSP7 may be involved in tumor genesis and development through the MAPK signaling pathway, and this gene has rarely been reported in AML, thus providing a new idea for subsequent clinical studies on AML.

Peak Identi cation and Distribution Pattern of m 6 A Modi cation Enrichment in Different Types of Tumor Samples
We screened MeRIP-seq data for different tumor types from the GEO database, but there was less data related to tumor tissue samples. Functional enrichment analysis was performed based on the screened differential peak associated genes. In IMPA, compared with the hypo-methylated peaks, the hyper-methylated peak associated genes were enriched in more cancer-related signaling pathways (p-value < 0.05), such as p53 signaling pathway, Hedgehog signaling pathway, PI3K-Akt signaling pathway, etc. (Fig. 2F). In GBM, both the hypomethylated peaks and hyper-methylated peak associated genes were enriched in some cancer-related pathways, such as PI3K-Akt signaling pathway, p53 signaling pathway, WNT signaling pathway, HIF-1 signaling pathway, and TNF signaling pathway, etc. (Fig. 2G). After knocking down METLL3 in AML cells, compared to the control, hypo-methylated peak associated genes enrichment in more cancer related pathways (Fig. 2H). However, after YTHDF1 knockdown in GC cells, there was no signi cant difference in the enrichment results of cancer-related pathways between genes associated with hyper-methylated and hypo-methylated peaks (Fig. 2H).

Associations Between m 6 A Enrichment Peaks with Dysregulation and Differentially Expressed Genes in Tumors
To investigate the association between differential enrichment peaks and gene expression, we further analyzed RNA-seq data from these tumor samples. The threshold value of differentially expressed genes was de ned as the absolute value of foldchange greater than 2 and a p-value less than 0.05. The IMPA tumor group consisted of 1778 up-regulated genes and 1459 down-regulated genes compared with the control group. Compared with the control, GSC included 2554 upregulated and 2253 downregulated genes. After METTL3 knockdown, AML cells included 1555 upregulated and 1089 downregulated genes, whereas GC cells included 830 upregulated and 387 downregulated genes after YTHDF1 knockdown ( Fig. 3A).
Differential m 6 A methylation peaks associated genes were analyzed combined with differential expression genes (Fig. 3B), and we found that in IMPA tumor and GSC group, there were more genes that signi cantly upregulated with hyper-methylated m 6 A peaks (hyper-up) and signi cantly downregulated with hypo-methylated m 6 A peaks (hypo-down). To compare the changes in expression of m 6 A-modi ed and unmodi ed genes, cumulative distribution analysis was performed based on foldchange of gene expression [18]. A signi cant in uence of m 6 A modi cation on gene expression was found in IMPA and GBM group (p-value < E-06) but not in AML and GC knockdown group (Fig. 3C-D).
Further functional enrichment analysis was performed on the differentially expressed genes associated with differential enrichment peaks. It was found that in the IMPA tumor group, the differentially expressed genes associated with hyper-methylated peaks were enriched in more cancer-related pathways (p-value < 0.05) than genes associated with hypo-methylated peaks (Fig. 3E). In the GSC group, compared with the hyper-methylated peaks, the different regulated genes associated with the hypo-methylated peaks were enriched in more cancer-related pathways (p-value < 0.05) (Fig. 3F). In AML and GC knockdown cells, differentially m 6 A enrichment peaks associated with different regulated genes were enriched in fewer cancer-related pathways than IMPA and GBM group (p-value < 0.05) (Fig. 3G-H). However, GO analysis for AML knockdown cells group showed that the differentially expressed genes that hypo-methylated were related to cell differentiation (Supplementary Table 1).

Speci city of Differentially Expressed Genes Associated with Clinical Overall Survival Mediated by m 6 A Modi cations in Tumors
To explore whether common m 6 A modi cation-mediated differentially expressed genes (DEGs) in different tumors are associated with overall survival in patients, we analyzed the intersection of these gene sets. The GC group was not considered in the common analysis because the enrichment in cancerrelated pathways was not signi cant. There were 140 DEGs and 185 differentially m 6 A methylation associated genes (DMGs) in IMPA, GBM, and AML (Fig. 4A). And there were 10 genes within both DEGs and DMGs (Fig. 4B). The expression patterns of these 10 genes in the three tumor groups were not completely consistent (Fig. 4C).
Combined with clinical data from the TCGA RNA-seq database, we analyzed the overall survival of these 10 genes in different tumor patients. Because there are no clinical data of IMPA in the TCGA database, the data of Head and Neck Squamous Cell Carcinoma (HNSC) were selected for analysis. The differential expression of DUSP7 (p-value = 0.00051) was signi cantly correlated with overall survival in AML patients (n = 173) (p-value = 0.0025) (Fig. 4D). High expression of DUSP7 was associated with poor survival in AML patients. DUSP7 is a hypo-methylated associated gene in AML sh-METLL3 cells, containing two hypo-methylated peaks (chr3:52050843-52051063, p-value = 0.00173; chr3:52053994-52054343, p-value = 0.00515) ( Figure S2C), which is consistent with the visualization result of the peaks on DUSP7 (Fig. 4E). High expression of other genes, such as IL31RA (p-value = 0.18), was associated with poor overall survival in HNSC patients (n = 566) (p-value = 0.044) (Fig. 4F), the high expression of COL6A1 (p-value = 0.24) and COS2(p-value = 0.69) was associated with poor overall survival of GBM patients (n = 166) (p-value = 0.01) (Fig. 4D); however, differences in the expression of these genes were not signi cant. For the gene set common to DEGs and DMGs in GC, no genes were found to be signi cantly associated with the overall survival of patients with stomach adenocarcinoma (STAD) (n = 41) ( Figure S2D). negatively correlated (DUSP7 Neg ) [19]. Compared with the DUSP7 Neg group, the DUSP7 Pos group was enriched in apoptosis signaling pathway, MAPK signaling pathway, and acute myeloid leukemia pathway(p-value < 0.05) (Fig. 5B). DUSP7 belongs to a class of DUSPs that inactivates MAPK through dephosphorylation [20]. Because both GSEA results are enriched in the MAPK signaling pathway, to further analyze the interaction pathway associated with DUSP7 in this pathway, the protein interaction analysis based on STRING database was performed. DUSP7 was found to interact with MAPK1, MAPK3, and PIK3R2 (Fig. 5C). PIK3R2, a phosphoinositide-3-kinase regulatory subunit 2, is involved in some cancer-related pathways ,such as the JAK-STAT signaling pathway, which was found to be enriched in all enrichment results of both GSEAs (p-value < 0.05).

Discussion
The occurrence and development of tumors are regulated by complex biological processes. As one of the modi cation types of transcript level, m 6 A modi cation plays an important role in the metabolism of transcript [21][22][23]. More and more studies have shown that m 6 A modi cation plays an important role in the process of tumor genesis and development [16,24,25]. To explore the biological functions of m 6 A modi cation in different tumors, we obtained MeRIP-seq and RNA-seq data from GEO database for four tumor types to compare the differences in the level of m 6 A modi cation and the biological signaling pathways involved in different tumor types. Combined with TCGA database, the effect of differentially expressed genes with signi cantly different m 6 A modi cation on the overall survival of patients was analyzed.
MeRIP-seq analysis showed that m 6 A modi cation was enriched in 3 'UTR, CDS, and 5' UTR in different tumor samples. In tumor cells that knock down METLL3 or YTHDF1, the number of m 6 A-enriched peaks on the transcript is reduced. METTL3 is an important m 6 A RNA methylation transferase, and YTHDF1 is one of the m 6 A readers [9]. Their expression in knockdown cells is reduced, which will affect the m 6 A modi cation of the transcript. The cumulative distribution of peak enrichment showed that the overall m 6 A enrichment level decreased signi cantly after AML cells knockdown METLL3. In IMPA and GBM tumor samples without gene knockdown, the overall enrichment level of m 6 A was signi cantly increased.
KEGG pathway enrichment results of the genes associated with different m 6 A enrichment peaks indicated that the hyper-methylated peaks were signi cantly enriched in more cancer-related pathways in IMPA, while both the hyper-methylated and hypo-methylated peaks in GBM were enriched in more cancer pathways. These results suggest that abnormal m 6 A modi cation of the transcript is involved in tumor progression.
In order to analyze the regulation of m 6 A modi cation on transcription expression, combined with RNAseq data, the overall foldchange level of expressed genes were compared. The results showed that genes with m 6 A modi cation (Gene m6A ) were more differentially expressed in IMPA and GBM tumor samples than those without m 6 A modi cation (Gene no−m6A ), which suggests that the differential m 6 A modi cation in tumors has a signi cant effect on the expression of transcripts. There was no signi cant change between Gene m6A and Gene no−m6A in the tumor cells that knocked down METTL3 or YTHDF1 due to the decrease of the transcription level of m 6 A modi cation. KEGG enrichment analysis of hyper-methylated DEGs (hyper-DEGs) and hypo-methylated DEGs (hypo-DEGs) indicated that these differentially expressed genes regulated by abnormal m 6 A modi cations were involved in cancer-related signaling pathways. However, in IMPA tumor samples, hyper-DEGs were involved in more cancer signaling pathways, while in GBM tumor samples, hypo-DEGs were involved in more cancer-related pathways, suggesting that the regulation of gene expression mediated by abnormal modi cations of m 6 A is a very complex regulatory network and that other regulatory factors may be involved.
Based on MeRIP-seq data, it was found that METTL3 knockdown resulted in the enrichment of hypomethylated peaks in the CDS region of DUSP7 transcripts, and the expression of this gene was signi cantly decreased. Combined with the TCGA RNA-seq database, we found that DUSP7 gene expression was signi cantly different in AML patients and that patients with high expression of DUSP7 gene had a shorter overall survival. Because there are few studies on DUSP7 in AML patients, further GSEA analysis of DUSP7 showed that it participates in MAPK signaling pathway through the MAPK1 and MAPK3, and it participates in the JAK-STAT signaling pathway through the PIK3R2 protein.
Phosphoinositol 3-kinase (PI3K) and mitogen-activated protein kinase (MAPK) have been widely reported for their roles in oncogenic transformation, cell cycle and apoptosis regulation, which are well-known cancer markers[26].
In summary, our article reveals the differences of m 6 A modi cation enrichment in different types of tumors and shows that abnormal m 6 A modi cation enrichment in tumor samples leads to differences in the expression of cancer-related genes. DUSP7 is signi cantly differentially expressed in AML patients, and its low expression is regulated by abnormal modi cation of m 6 A, while its high expression is associated with poor overall survival. DUSP7 has rarely been reported in AML, so it provides a new target for clinical study of AML.
Conclusions m 6 A modi cation was enriched in IMPA and GBM tumor samples but was decreased in METTL3 knockdown tumor cells. The high expression of DUSP7 signi cantly affected the overall survival of AML patients and was involved in tumor development through the PIK3R2 protein and MAPK signaling pathways.

Sequencing Datasets and Clinical Data
MeRIP-seq and RNA-seq data were obtained from the GEO database for different tumor types, including invasive malignant pleomorphic adenoma, glioblastoma, acute myeloid leukemia and gastric cancer. The sequencing data of invasive malignant pleomorphic adenoma were obtained from GSE161879, including 5 pairs of tumor tissues and corresponding normal gland tissues. The sequencing data of glioblastoma were obtained from GSE158741, including 4 pairs of neural stem cells and glioblastoma stem cells isolated from the patient's tumor tissue. The sequencing data of acute myeloid leukemia were obtained from GSE94613, including four METTL3 knockdown AML cell samples and two control samples. The sequencing data of gastric cancer were obtained from GSE166972, including 3 pairs of YTHDF1 knockdown AGS cell samples and control samples. Clinical data for tumors were obtained from The Cancer Genome Atlas (TCGA) RNA-seq database.

Data Analysis for MeRIP-seq
Raw reads of each sample were trimmed using the Trimmomatic software to remove adaptor sequences and bases with low quality [27]. The processed reads were then aligned to the human reference genome (version hg38, UCSC) using bowtie2 [28] with default parameters. Only unique mapped reads with mapping quality no less than 30 were kept for the subsequent analysis.
For MeRIP-seq peak calling, the MACS2 software was used to identity m6A-enriched (version 2.2.7) [29], with the corresponding input sample serving as control. MACS2 was run with default options except for'-p 0.05 -nomodel, -keepdup all'to turn off fragment size estimation and to keep all uniquely mapping reads, respectively. For differential peak analysis, DiffBind [30] packages was used to identify differentially enrichment peaks. The differentially peaks were selected with log2 (fold change) > 1 or log2 (fold change) < -1 and p value < 0.05. Motifs enriched in m6A peaks within all mRNAs were identi ed using HOMER software (v4.10) [31]. The motif length was restricted to 5-6 nucleotides.

RNA-Seq Analysis
For gastric cancer pair-end sequencing data, Hisat2 [32] was used to map reads to the genome of Homo sapiens (hg38) with default parameters. For other tumor RNA-seq data, bowtie2 [28] was used to align reads to the human reference genome. featureCounts 1.5.0-p3 was used to count the reads numbers mapped to each gene. And FPKM of each gene was calculated based on the length of the gene and reads count mapped to this gene. The differential expression genes between samples were analysed with DESeq2. P-value of 0.05 and absolute foldchange of 2 were set as the threshold for signi cantly differential expression.

Functional Enrichment Analysis
Gene Ontology (GO) enrichment analysis of the differentially expressed genes and differentially enrichment peak related genes was implemented by clusterPro ler R package. GO terms with corrected Pvalue less than 0.05 were considered signi cantly enriched by differential expressed genes. And then test the statistical enrichment of differential expression genes in KEGG pathways. The gene interaction network was constructed by the STRING protein interactome database [33].

Gene Set Enrichment Analysis (GSEA)
To explore DUSP7 gene involving pathways that detected from RNA-seq data of LAML patients which obtained from TCGA, gene set enrichment analysis (GSEA) was performed with the JAVA program using the KEGG pathway gene set access from MSigDB. Genes were ranked on the basis of signal2noise between the high-expression and low-expression groups. GSEA analysis was also carried out according to the ranking of person correlation in the DUSP7 positively correlated vs. DUSP7 negatively correlated. Gene sets with P-value < 0.05 after performing a GSEA preranked analysis were considered signi cantly enriched.

Statistical Analysis
Computational and statistical analyses were performed using R (version 4.0.4). Kaplan-Meier analysis and a log-rank test were used to conduct survival analysis. The Wilcoxon signed-rank test was applied to compare cumulative distribution. Statistical signi cance was de ned as a p-value < 0.05.

Declarations
Ethics approval and consent to participate Not necessary.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
Publicly available datasets were analyzed in this study. The MeRIP-seq data and RNA-seq data can be found from GEO database(https://www.ncbi.nlm.nih.gov/geo).
Authors' contributions X.L. conceived and designed the project. W.Q. collected the clinical information, MeRIP-seq data, RNA-seq data and analyzed data. W.Q. and X.L. wrote the manuscript. All authors contributed to the article and approved the submitted version.   Comparison of m6A enrichment levels in different tumor samples and KEGG enrichment analysis for differential enrichment peak associated genes The m6A peaks in these tumor samples were characterized by the canonical motif (A). The number of differential peaks identi ed in different tumor samples (B). Cumulative distribution curves of m6A enrichment in tumor samples compared to controls Effects of DEGs associated with differential m6A peaks on patients' overall survival Common differentially expressed genes (DEGs) and differential m6A methylation associated genes (DMGs) (A).
There are 10 genes that are common between DEGs and DMGs (B). Expression patterns of 10 genes in tumor groups (C). High DUSP7 expression was signi cantly associated with poor overall survival in AML patients (D) and m6A modi cation enrichment of DUSP7 decreased after METLL3 knockdown (E).
Effects of high expression of other genes on overall survival in patients with HNSC and GBM (F, G). Kaplan-Meier analysis was applied for survival analysis and differences between groups were analyzed using the log-rank test.
Page 19/20 Protein interaction results showed that DUSP7 was involved in AML progression through MAPKs and PIK3R2 proteins (C).