Genetic characteristics and prognosis of m6A RNA methylation regulators in acute myeloid leukemia

Background: This study aimed to identify the genetic characteristics of m6A RNA methylation regulators in acute myeloid leukemia (AML) and to explore their potential value as prognostic markers.
Methods: RNA-seq transcriptome data and clinical survival data for AML were downloaded from ICGC and TCGA, and gene annotation files were downloaded from GENCODE (1). Thirteen widely reported m6A RNA methylation regulators were obtained from the literature, and their expression was extracted and analyzed using the gene annotation files. Consensus clustering of the samples yielded two subgroups, RM1 and RM2, whose pathological characteristics and survival were compared. Comparative and functional analyses of the m6A RNA methylation regulators between the subgroups were then performed. The STRING database was used to analyze interactions between the regulators, and Spearman correlation was used to analyze their expression correlation. Cox regression analysis was performed, and risk scores calculated from the regulator features were used to predict the prognosis and clinicopathological characteristics of patients.
Results: According to the morphological characteristics of AML, the samples were divided into 8 categories (M0 undifferentiated, M1, M2, M3, M4, M5, M6, M7), and the expression profile and expression heat map of the 13 m6A RNA methylation regulators were constructed. Using the m6A RNA methylation regulators as a feature vector, consensus clustering of 151 samples yielded two subgroups, RM1 and RM2. Expression of the regulators was higher in RM1 than in RM2, and RM2 patients survived longer than RM1 patients. Gene set enrichment analysis found that functional processes such as endothelial-hematopoietic transition through the Notch pathway were significantly enriched in RM1. Kaplan-Meier (KM) curve analysis indicated that patients with low expression of FTO, ALKBH5, or ZC3H13 survived significantly longer than those with high expression.
Conclusion: m6A RNA methylation regulators are not only independent prognostic markers but also predict the clinicopathological features of AML.

According to current knowledge, the exact etiology of leukemia is unknown, but it has been linked to regional environmental factors, ionizing radiation, chemical exposure, alcoholism, smoking, and the body's particular response to certain viral infections [4][5]. In addition, recent studies of gene mutation frequencies and biomarkers suggest that leukemia may result from a combination of genetic and environmental factors [6][7].
m6A is a widespread form of mRNA modification, but little is known about its role in AML. This work aims to identify the genetic characteristics and prognostic value of m6A regulators in AML. AML samples were obtained from TCGA; the log-rank test and Cox regression model were used for survival analysis, and the chi-square test was used to assess the relationship between m6A regulator alterations and clinicopathological features. Genetic alterations of m6A regulators in AML were identified and found to be significantly associated with poor clinical characteristics. These findings help to understand the epigenetic modification of RNA in AML.
We selected 151 AML samples with available pathological characteristics (morphological characteristics: M0 undifferentiated, M1, M2, M3, M4, M5, M6, M7) and RNA-seq expression data. The gene annotation file was used to construct the expression profile (RPKM data) of the m6A RNA methylation regulators (Supplementary Table 1). The AML samples were classified by morphological characteristics, and an expression heat map of the m6A RNA methylation regulators was then constructed.

Consensus clustering of samples and subgroup analysis
The m6A RNA methylation regulators were used as a feature vector, and consensus clustering (k = 2) was performed on the AML samples with ConsensusClusterPlus, yielding two subgroups, RM1 and RM2. A t-test was used to compare age between RM1 and RM2, a chi-square test was used to compare the WHO subclasses of the two subgroups, and Cox regression was used to compare survival between the two subgroups.
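The chi-square comparison of subgroup membership against subclass can be illustrated with a minimal sketch. It computes only the Pearson chi-square statistic and its degrees of freedom for a contingency table; the counts are hypothetical and are not the study's data, and the function name is an illustrative choice.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts (e.g. rows = subgroups,
    columns = subclasses). Every expected count must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (NOT the study's data): rows = RM1, RM2; columns = two subclasses
toy_table = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(toy_table)
print(f"chi-square = {stat:.4f}, degrees of freedom = {dof}")
```

The p-value follows from the chi-square distribution with `dof` degrees of freedom; in practice a statistics library routine such as `scipy.stats.chi2_contingency` would typically be used instead of hand-written code.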

2.3 Analysis of the interaction between m6A RNA methylation regulators and functional analysis between subgroups
The STRING database was used to analyze the interactions between the m6A RNA methylation regulators, and Spearman correlation was used to analyze their expression correlation. Expression profiles of the m6A RNA methylation regulators were constructed for the RM1 and RM2 subgroups, PCA was used to analyze the differences in regulator expression between the two subgroups, and the R package clusterProfiler was used for annotation and enrichment analysis. The enrichment analysis covered GO biological processes (BP) and KEGG pathways.
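As an illustration of the Spearman correlation step, the following sketch ranks two expression vectors (averaging tied ranks) and computes the Pearson correlation of the ranks. The function names and the expression values are hypothetical; the study itself does not state which implementation was used.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; both inputs must have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values of two regulators across five samples
regulator_a = [1.2, 2.5, 3.1, 4.8, 5.0]
regulator_b = [0.9, 2.0, 2.2, 6.5, 7.1]
print(spearman(regulator_a, regulator_b))
```

Because only ranks are used, the coefficient is insensitive to monotone transformations of the expression values, which is one reason it is often preferred for skewed RPKM data.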

2.4. Cox regression analysis and the use of risk scores to predict prognosis and pathological characteristics
A risk score was calculated for each AML sample with the formula $\text{Risk score} = \sum_{i=1}^{n} \text{Coef}_i \times x_i$, where $\text{Coef}_i$ is the Cox regression coefficient of the $i$-th prognostic methylation regulator, $x_i$ is the expression value of that regulator in the sample, and $n$ is the number of prognostic regulators. According to this risk score, the samples were divided into a high-risk group and a low-risk group, and the difference in overall survival (OS) between the two groups was assessed.
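A minimal sketch of the risk score calculation follows: the score is the sum of each regulator's Cox coefficient multiplied by its expression value. The coefficients and expression values below are hypothetical, and the median cutoff used to split the samples is an assumption, since the text does not state which threshold was used.

```python
from statistics import median


def risk_score(coefficients, expression):
    """Sum of Cox coefficient times expression value over the prognostic regulators."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label samples 'high' if their score exceeds the median, otherwise 'low'.

    ASSUMPTION: a median cutoff; the source does not specify the threshold.
    """
    cutoff = median(scores.values())
    return {sample: ("high" if score > cutoff else "low")
            for sample, score in scores.items()}


# Hypothetical Cox coefficients and expression values (NOT the study's results)
toy_coefficients = {"FTO": 0.5, "ALKBH5": 0.3, "ZC3H13": 0.2}
toy_expression = {
    "s1": {"FTO": 2.0, "ALKBH5": 1.0, "ZC3H13": 1.0},
    "s2": {"FTO": 4.0, "ALKBH5": 2.0, "ZC3H13": 0.0},
    "s3": {"FTO": 0.0, "ALKBH5": 1.0, "ZC3H13": 1.0},
}
toy_scores = {s: risk_score(toy_coefficients, expr) for s, expr in toy_expression.items()}
toy_groups = split_by_median(toy_scores)
print(toy_scores, toy_groups)
```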

Prediction of prognosis and clinicopathological characteristics of tumor patients using risk scores calculated by features
Using the 13 m6A RNA methylation regulators as risk features, the TCGA AML samples were divided into 5 parts and 5-fold cross-validation was applied: the model was trained on four fifths of the samples and tested on the remaining one fifth, so that each part was used as the test set once. Receiver operating characteristic (ROC) curves were then used to estimate classification performance; the higher the area under the curve (AUC), the better the classification performance. We assessed how well the risk score model predicts the three-year survival rate, RM1/2 subgroup, prognostic outcome, morphological characteristics, and other characteristics of patients.
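The cross-validation and AUC evaluation can be sketched as follows. The sketch covers only the fold splitting and the AUC computation (as the probability that a positive sample receives a higher score than a negative one); model fitting is left out, and the function names, the random seed, and the toy labels and scores are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 151 samples split into 5 folds; each fold serves as the test set once
for train_idx, test_idx in k_fold_indices(151, k=5):
    print(len(train_idx), len(test_idx))

# Hypothetical labels (1 = event) and risk scores for four test samples
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a full analysis, the model would be fitted on `train_idx` in each round and `roc_auc` would be applied to the risk scores of the `test_idx` samples.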

Consensus clustering of samples
Using the selected 13 m6A RNA methylation regulators as feature vectors, ConsensusClusterPlus clustering was performed on the 151 samples, yielding two subgroups, RM1 and RM2 (Figure 1B). Ideally, samples from different clusters should be well separated and samples within the same cluster should group together; in Figure 1B, the purple dots represent RM1 and the green dots represent RM2. RM1 and RM2 contain 91 and 60 samples, respectively (Supplementary Table 2). The heat map of the 13 m6A RNA methylation regulators in the two subgroups is shown in Figure 1C. Each row represents one gene and each column represents one sample. Samples are ordered from left to right, with the red group on the left being RM1 and the blue group on the right being RM2. Genes are sorted from top to bottom by function: writers (methyltransferases: METTL3, METTL14, WTAP, KIAA1429, RBM15, ZC3H13), readers (binding proteins: YTHDC1, YTHDC2, YTHDF1, YTHDF2, HNRNPC), and erasers (demethylases: FTO, ALKBH5). Red represents high gene expression and purple represents low gene expression. Overall, the expression levels of the 13 m6A RNA methylation regulators were higher in RM1 than in RM2.

Comparative analysis of pathological characteristics (age + morphological characteristics) and survival between the two subgroups
Samples with age data were extracted from the two subgroups (RM1, RM2), and a t-test was performed on age. The box plot shows an age difference between the two subgroups, but the difference was not significant (p = 0.08; Figure 2A). Samples with morphological data were extracted, and the chi-square test showed that the morphological characteristics of the two subgroups differed significantly (p = 0.00337); a pie chart of the morphological characteristics of the two subgroups is shown in Figure 2B. Samples with survival data were extracted, Cox regression was used to analyze survival in the two subgroups, and KM survival curves were drawn (Figure 2C). The two subgroups had different survival times: the 13 m6A RNA methylation regulators had higher expression values in RM1 than in RM2, and RM2 patients (blue) survived longer than RM1 patients (red). These results suggest that the higher expression of the 13 m6A RNA methylation regulators in RM1 may inhibit the expression of key functional genes in AML patients, resulting in a significant reduction in survival time.

3.4 Analysis of the interaction between m6A RNA methylation regulators and functional analysis among subgroups
STRING was used to draw a protein-protein interaction (PPI) network of the 13 m6A RNA methylation regulators and to obtain the interactions among them (Figure 2D). Principal component analysis (PCA) was used to evaluate the difference in expression between the samples of the two subgroups (RM1, RM2); the results are shown in Figure 3a, where the blue dots represent RM1 and the yellow triangles represent RM2. The R package clusterProfiler was then used for enrichment analysis of the 13 m6A RNA methylation regulators (GO-BP, KEGG pathways; Figure 3b). The 13 regulators were mainly involved in pathways and biological processes such as the Notch signaling pathway, the cytokine-mediated signaling pathway, and the endothelial to hematopoietic transition, which have been shown to be related to AML.
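To illustrate the PCA step, the sketch below finds the first principal component of a small expression matrix by power iteration on the covariance matrix and projects the samples onto it. This is a teaching sketch under stated assumptions, not the routine used in the study: the function name, the iteration count, and the toy matrix are hypothetical, and a real analysis would normally use a library implementation.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) of the first principal component.

    `rows` holds one sample per row and one feature (gene) per column.
    Power iteration converges to the leading eigenvector of the covariance
    matrix unless the starting vector happens to be orthogonal to it.
    """
    n_samples, n_features = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n_samples for j in range(n_features)]
    centered = [[row[j] - means[j] for j in range(n_features)] for row in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n_samples - 1)
            for b in range(n_features)] for a in range(n_features)]
    direction = [1.0] * n_features
    for _ in range(iterations):
        product = [sum(cov[a][b] * direction[b] for b in range(n_features))
                   for a in range(n_features)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("data have no variance along the current direction")
        direction = [x / norm for x in product]
    # Project each centered sample onto the principal direction
    scores = [sum(c[j] * direction[j] for j in range(n_features)) for c in centered]
    return direction, scores


# Hypothetical expression of two genes in three samples (NOT the study's data)
toy_matrix = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
pc1, pc1_scores = first_principal_component(toy_matrix)
print(pc1, pc1_scores)
```

Plotting the scores on PC1 against those on PC2, colored by subgroup, gives the kind of scatter plot used to judge whether two subgroups separate.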

Cox regression analysis and use of risk scores to predict prognosis
Each sample was scored using the m6A RNA methylation regulators, and the samples were divided into high-risk and low-risk groups based on their risk scores. Survival analysis was performed on the two groups; the KM survival curves are shown in Figure 4a. The risk score separated the high-risk and low-risk groups well (p = 0.042). To further study the relationships among the 13 m6A RNA methylation regulators, we constructed their co-expression relationships; different points in the co-expression diagram represent different methylation regulators and describe their functional correlation. The R package corrplot was used to visualize these relationships, and the expression correlation diagram of the 13 m6A RNA methylation regulators is shown in Figure 4b.
ROC curves for risk-score prediction of three-year survival are shown in Figure 5A (training set) and Figure 5B (test set), for prediction of the RM1/2 subgroup in Figure 5C (training set) and Figure 5D (test set), for prediction of prognostic outcome from the combined features in Figure 5E (training set) and Figure 5F (test set), and for prediction of morphological characteristics from the combined features in Figure 5G (training set) and Figure 5H (test set). The ROC curves show that the risk score can predict the three-year survival rate, RM1/2 subgroup, prognostic outcome status, and morphological feature status of AML patients, and that the prediction efficiency for the other characteristics was better than for morphological feature status. These results show that the risk score calculated from the features can predict the prognosis and clinicopathological characteristics of AML patients.

Analysis of prognostic correlation of 13 m6A RNA methylation regulators
The 13 m6A RNA methylation regulators were individually analyzed for prognostic value, and survival curves were drawn (Figure 6). The erasers (demethylases) FTO and ALKBH5 and the writer ZC3H13 were significantly correlated with overall survival (P < 0.05, Figure 6K-M).

Discussion
Acute myeloid leukemia is a malignant disease of hematopoietic stem cells. It is characterized by abnormal clonal proliferation of myeloid blast cells, which leads to the accumulation of immature progenitor cells and impairs hematopoietic function [11][12]. AML progresses very quickly and, if not treated in time, may be fatal within weeks or months. AML is the most common acute leukemia in adults [12][13]. Although it can occur at any age, it mainly affects the elderly, and the average age at diagnosis is approximately 70 years [14][15]. For most patients diagnosed with AML, the etiology and susceptibility cannot be determined, but exposure to DNA-damaging agents (such as benzene, cigarette smoke, ionizing radiation (usually from radiation therapy), and cytotoxic chemotherapy) increases the risk of AML [16][17]. AML resembles other cancers in that it exhibits abnormal proliferation, survival, and differentiation of the affected cells. These cellular characteristics are caused by genetic changes, but AML cells carry far fewer coding sequence mutations than most solid epithelial tumor cells. A previous study of tumor samples from 200 adult AML patients identified nearly 2,000 different mutated genes, of which only 23 were frequently mutated; an average of 13 mutations were identified per genome, 5 of which were in recurrently mutated genes [18][19]. The large overlap of these mutations provides a potential direction for the prognosis and treatment of AML [20][21].
m6A RNA methylation is the most common modification of mRNA and exists in many species, including animals, plants, yeasts, bacteria, and mycoplasma. Studies have found that more than 7,000 mammalian genes are modified, containing about 12,000 m6A sites [22]. These sites are concentrated in the RRACH motif (R is G or A; H is A, C, or U) and are usually found near stop codons and in the 3' UTR.
m6A is deposited by m6A methyltransferases ("writers"), subsequently recognized by m6A-binding proteins ("readers"), and removed by demethylases ("erasers"). m6A modification is very common, and its dynamic regulation has been shown to be strongly related to gene expression [23]. In recent years, the clinical value of m6A in tumors has become increasingly apparent: it mainly affects the occurrence and development of cancer by regulating cellular processes. m6A is a promising biomarker and is increasingly being used to detect and prevent cancer [24]. In addition, a growing number of studies show that m6A has potential clinical value as a therapeutic target for cancer patients [25].
Researchers have reported that METTL3 acts as a proto-oncogene that, through m6A modification, inhibits cell differentiation in vitro while promoting cell growth; conversely, in vivo, its loss can induce cell differentiation and apoptosis, thereby inhibiting leukemia [26]. METTL14, also a proto-oncogene, can through m6A modification inhibit the differentiation of hematopoietic cells in leukemia and promote the self-renewal of stem cells [27]. FTO is also a proto-oncogene; by regulating m6A modification, it can promote cell transformation in leukemia and inhibit differentiation [28].
In our study, we obtained and screened RNA-seq transcriptome data and clinical survival data for AML from ICGC and TCGA, and constructed the expression profile of 13 m6A RNA methylation regulators using gene annotation files. Consensus clustering of the samples yielded two subgroups, RM1 and RM2, and a functional analysis of the m6A RNA methylation regulators between the subgroups was performed. For the m6A RNA methylation regulators, we used the STRING database for protein-protein interaction (PPI) network analysis. STRING is a database of known and predicted protein-protein interactions. It covers 2031 species and includes interactions between 9.6 million proteins and 13.8 million proteins. Using STRING, the interactions between the 13 m6A RNA methylation regulators were obtained and the corresponding network diagram was drawn. Circles in the network represent proteins (i.e., the 13 m6A RNA methylation regulators) and lines represent interactions between proteins. The stronger the interaction between two proteins, the thicker the line; lines of different colors and shapes represent different types of evidence for an interaction, including results mined from PubMed abstracts, results from databases, and results predicted with bioinformatics methods.
Principal component analysis (PCA) is a multivariate analysis technique. Its core idea is to reduce the dimensionality of the data while preserving as much of the variation in the data as possible, that is, to extract a smaller number of uncorrelated variables that describe the data. The data are treated as a group of points in multi-dimensional space; while the relative positions of the points are maintained, they are rotated into a new coordinate system whose axes are the principal components (PCs). The axis along which the projected coordinates have the largest variance is PC1, followed by PC2. PCA was used to evaluate the difference in expression between the samples of the two subgroups (RM1, RM2); it applies linear algebra to reduce the dimensionality of, and extract principal components from, tens of thousands of gene variables. Ideally, in the PCA plot, samples from different groups should be separated and samples within a group should cluster together. In this study, PCA was used to compare the expression profiles of the RM1 and RM2 subgroups, and the results showed a significant difference between them.
We performed survival analysis with cases grouped by the mean expression of FTO and of ALKBH5. KM curve analysis indicated that patients with low FTO expression (blue) survived significantly longer than those with high expression (red), and patients with low ALKBH5 expression (blue) survived significantly longer than those with high expression (red). Similarly, survival analysis with cases grouped by the mean expression of ZC3H13 showed that patients with low ZC3H13 expression (blue) survived significantly longer than those with high expression (red). These results indicate that FTO, ALKBH5, and ZC3H13 may act as key genes in regulating the occurrence and development of AML, leading to a significant reduction in survival time.
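The grouping by mean expression and the KM curves discussed above can be sketched as follows. The sketch splits samples at the mean expression of one gene and computes a Kaplan-Meier survival estimate for a group; the function names and all numbers are hypothetical, and the significance test between curves (for example a log-rank test) is not implemented here.

```python
def split_by_mean(expression):
    """Label each sample 'high' if its expression exceeds the mean, otherwise 'low'."""
    mean = sum(expression) / len(expression)
    return ["high" if value > mean else "low" for value in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival probability)] at each event time.

    `events` holds 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        removed = 0
        # Collect all observations sharing the same time point
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= removed
    return curve


# Hypothetical expression values and follow-up data (NOT the study's data)
toy_groups = split_by_mean([1.0, 2.0, 3.0, 10.0])
toy_curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(toy_groups)
print(toy_curve)
```

Computing one curve per expression group and plotting them together reproduces the layout of a two-group KM figure.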
In the future, m6A RNA methylation may become a potential therapeutic target, and new drugs may control cancer progression by regulating m6A RNA methylation. It may also serve as a biomarker of cancer occurrence and development, providing rapid and sensitive warning [29][30][31]. More experiments are needed to uncover the mechanism by which m6A RNA methylation regulates AML, opening the door to new drug development and biomarker research.

Conclusion
In conclusion, consensus clustering based on 13 major m6A RNA methylation regulators divided AML samples with different clinicopathological characteristics into two subgroups (RM1/2). Compared with RM2, RM1 has a worse prognosis and a distinct distribution of morphological features. Moreover, gene enrichment analysis found that functional processes such as endothelial-hematopoietic transition signaling through the Notch pathway were significantly enriched in RM1. These findings indicate that the 13 m6A RNA methylation regulators can be used as a risk profile for AML that is not only an independent prognostic marker but also predicts the clinicopathological features of AML.
Figure 4. 4a. Sample survival curve for high-risk and low-risk groups; 4b. Correlation diagram of methylation regulator expression in acute myeloid leukemia (the beginning of each line represents one of the 13 m6A RNA methylation regulators, and each small circle represents the Spearman correlation coefficient of two methylation regulators. Red indicates that the Spearman correlation coefficient is close to -1, and blue indicates that it is close to 1. The closer the absolute value of the Spearman correlation coefficient is to 1, the more likely it is that the two methylation regulators play a similar role in RM1 or RM2 of AML).