Bioinformatics Analysis the Expression Characteristics of RNA m6A Methylation Regulator in Breast Cancer Progression

Background: RNA m6A methylation regulator has been shown to play an important regulatory role in tumorigenesis and progression. However, the role and characteristics of RNA m6A methylation in breast cancer are not yet clear. Objective: To analyze the effect of the expression pattern of M6A methylation factor on the progression of breast cancer by bioinformatics. Results: This study uses bioinformatics to analyze three data sets of TCGA-BRCA, GSE96058 and GSE25066. According to the expression level of m6A-related genes, breast cancer samples are divided into 4 subtypes: quiescent, m6A-methylation, protein-binding, and mixed. The results of R-survival analysis showed that the survival of breast cancer samples of the four subtypes was signicantly different. The analysis of breast cancer gene expression features showed that there are signicant differences in the number of exon skip among the four subtypes. For tumor driver genes, the degree of TP53 mutation and copy number loss are the most in the protein-binding subtype. Among DNA damage repair genes, the copy number of RAD54B in the protein binding subtype is signicantly increased, while other DNA damage repair-related genes have fewer mutations and copy number deletions are common. The effect of m6A methylation on the proportion of inltrated immune cells indicates that Macrophages M0 and Mast cells resting are signicantly different in the four m6A subgroups, and are related to patient prognosis. The number of sensitive samples of chemotherapeutics (taxane anthracyclines) is the lowest in the quiescent subtype. In addition, the highest tumor stemness index in breast cancer samples is the protein-bound type, and the lowest is the m6A methylated type. Conclusion: The results of our analysis suggest that the reduced expression level of m6A methylation writer gene and the high expression of m6A reader protein in the four methylation subtypes of breast cancer are key to the progression of breast cancer. mutations in the TCGA-BRCA dataset and GO:0006281, as well as the mutation type and copy number mutation type information of these 20 genes. Heat maps are used to show the distribution of mutations and copy number variations of the top 10 tumor driver genes and DN damage repair genes in the four m6A subtypes.


Introduction
Worldwide, breast cancer is the most common malignancy cancer in women. Globally, there are more than 1.7 million new cases occur each year, with an increasing rate of 3 to 4 percent per year [1,2]. Gene mutation and abnormal expression are important reasons for the occurrence and progression of breast cancer [3]. Studies have shown that each stage of tumor occurrence and progression is accompanied by changes in genetic information [4]. Compared with the lower frequency of gene mutation, the frequency of epigenetic modi cation changes is greatly increased [5]. Epigenetic modi cations lead to changes in gene expression that are common in tumor cells. Among them, the abnormal modi cation of RNA plays a key role in tumor progression. RNA modi cations include 5-methylcytosine (m5C), pseudouridine, 2'-O-methylation, N1methyladenosine (m1A), N7-methyladenosine and N6-methyladenosine (m6A), etc [3,6]. Among these modi cations, RNA m6A methylation is considered to be the most signi cant and abundant modi cation form in eukaryotic cells, with its abundance accounting for 0.1-0.4% of total adenosine residues [7]. Since fat-mass and fat-associated protein was reported in 2011, the study on m6A methylation in tumors has attracted extensive attention [8].
M6A modi cation is a reversible process dynamically regulated by methyltransferases("Writers"), binding proteins ("Readers") and demethylases ("Erasers") [9,10]. M6A methylation-related proteins are usually abnormally expressed in a variety of human cancer tissues, which in turn leads to abnormal expression levels of cancer-driving or tumor suppressor genes [11]. For example, YTHDF2 promotes the stem cell phenotype and tumor metastasis of hepatocellular carcinoma by recognizing the m6A methylated 5'-UTR of OCT4 mRNA [12]. M6A reader YTHDF1 promotes ovarian cancer progression by increasing EIF3C translation [13]. METTL3 protein and mRNA levels are highly expressed in pancreatic cancer tissues and cancer cells, high METTL3 expression is associated with high pathological stage and high N stage [14]. Abnormal expression of m6A-related proteins has a similar effect in breast cancer. M6A demethylase FTO promoted breast tumor progression by inhibiting BNIP3 expression [15]. In the hypoxic microenvironment, ALKBH5 was highly expressed in breast cancer, leading to the demethylation of m6A on NANOG mRNA and increasing the stability of NANOG mRNA, thus promoting the enrichment of breast tumor stem cells in [16]. In addition, it has been reported that the abnormal expression of M6A-related genes is related to the prognosis and survival of breast cancer patients, for example, the up-regulated expression of YTHDF1, YTHDF3 and KIAA1429 is related to the poor prognosis of breast cancer [17]. The increase or decrease of m6A modi cation showed different results in different tumors. At present, the relationship between the abnormal expression of M6A-related genes and clinical outcomes in patients is not clear, and the effect of M6A-related factors on the molecular characteristics of breast cancer has not been systematically reported. It has been previously reported that the mechanism of m6A modi cation in promoting cancer depends on the expression of m6A methylationrelated proteins. Therefore, breast cancer is divided into several subtypes according to the expression pattern of m6A gene. In this study, the relationship between different m6A subtypes and survival, prognosis, mutation, copy number variation, immune cell in ltration, drug sensitivity and tumor stemness index in breast cancer samples was analyzed.  [18]. Two different standardized algorithms, RUVr and RUVg, are used to remove batch effects ( Figure S1). (writer + reader) and Set2 (eraser). According to the Z-score median of Set1 and Set2 gene expression of the sample, the samples is divided into 4 types. They are static Quiescent (setl ≤ 0, set2 ≤ 0), m6A methylation (set1 > 0, set2 ≤ 0), Protein binding (set1 ≤ 0, set2 > 0) and Mixed (set1 > 0, set2 > 0).

Analysis of the prognosis and clinicopathological indicators of the four m6A subtypes of breast cancer
The survival package of R is used to analyze the prognosis of the four m6A subtypes of breast cancer, and circos is used to analyze the correlation between the four m6A subtypes of breast cancer and different clinicopathological indicators.  [19]. The MsigDB database (V7.1) is used to download the gene annotation data of Genes annotated by the GO term GO: 0006281. The R package Maftols was used to extract the top 10 genes with the most mutations in the TCGA-BRCA dataset and GO:0006281, as well as the mutation type and copy number mutation type information of these 20 genes. Heat maps are used to show the distribution of mutations and copy number variations of the top 10 tumor driver genes and DN damage repair genes in the four m6A subtypes.
1.5 Analysis of in ltrated immune cell and immune e cacy of four breast cancer m6A subtypes Cibersort (V1.01) is used to analyze the proportion of immune cells in the four subtypes of tumor samples [20]. The coxph function in the survival (V3.2-3) package of R is used in the univariate cox analysis of the proportion of 22 immune cells to obtain the immune cell in ltration types that are signi cantly related to the prognosis of the four m6A subtypes of breast cancer. Meanwhile, the proportion of 22 immune cells in the four m6A subtypes of breast cancer are analyzed by ANOVA. The in ltration types of immune cells in the four m6A subtypes of breast cancer were obtained. Multivariate cox analysis is performed using the covariance of the immune in ltration ratio of different immune cells calculated by Cibersort, and the sum of the value multiplied by the corresponding immune in ltration ratio is used as the sample risk score. TIDE is used to analyze the differences in immune e cacy of the four m6A subtypes in the TCGA-BRCA and GES96058 data sets. The analysis is performed using the TIDE (HTTPS: /github.com/liulab-dfci/TIDEpy) default parameter [21]. Then, the Z-score of TCGA and GSE data sets is extracted. The expression levels of M6A-related genes were rst standardized by Z-score. Then, breast cancer samples were divided into four subtypes, according to the Z-score median of m6A-related genes expression level in the samples in Set1 and Set2 (Fig. 1A). The sample sizes of the four m6A methylation subtypes show similar distributions (proportions) in each data set, which indicates that the typing results of breast cancer samples based on the expression levels of m6A-related genes showed good consistency in different data sets (Table 1). In addition, the expression levels of 21 m6A methylation-related genes in four m6A subtype breast cancer samples are shown in Fig. 1B. The results of survival analysis showed that there were signi cant prognostic differences among the 4 subtypes ( Fig. 2A). Among them, the survival rate of mixed subtype samples is higher than the other three subtypes. In the triple-negative breast cancer TNBC samples, the survival of the four subtypes of breast cancer samples has more signi cant differences ( Fig. 2B). In addition, the analysis results between m6A subtypes and clinical pathological features show that there is a signi cant quantitative difference between the m6A methylated subtypes and protein binding subtypes of ER+. However, there were no signi cant differences among the four m6A subtypes in HER+/-, TNM staging, and age (Fig. 2C).

Analysis of variable splicing, mutation and copy number variation of four m6A breast cancer subtypes
According to the statistical results of TCGA SpliceSeq, we found that there are differences in the number of ES between the four m6A subtypes, and the other six alternative splicing subtypes have no signi cant changes among the four m6A subtypes (Fig. 3A). Furthermore, the effect of the number of ES type variable shear on the prognosis of the four m6A subtypes was analyzed, and the results showed that the m6A methylated subgroup was signi cantly different from the protein-bound subgroup in the prognosis, no matter in all samples or in triple-negative breast cancer samples ( Fig. 3B and C). Next, the mutations and copy number variations of tumor driver factors and DNA damage repair genes of the four m6A subtype samples were analyzed. The 10 oncogenes with the most mutations in the TCGA-BRCA dataset were selected. The 10 genes are TP53, PIK3CA, TTN, CDH1, GATA3, MUC16, KMT2C, MAP3K1, RYR2, HMCN1. The results showed that TP53 was the gene with the most mutations and copy number loss in protein-binding type breast cancer samples (Fig. 3D).
Morever, the analysis of GO: 0006281 gene set showed that the 10 DNA damage repair genes with the most mutations were TP53, ATRX, BRCA2, ATM, BRCA1, POLQ, SETX, BLM, RAD54B, and ATR. Further analysis of the mutation types and copy number variation types of these 10 genes revealed that except for TP53, DNA damage repair genes were rarely mutated in breast cancer samples. However in terms of copy number variation, the copy number of RAD54B in the protein-bound subtype was increased, while the other 9 genes all showed different degrees of copy number deletion in the 4 m6A subtypes (Fig. 3E).

Analysis of immune cell in ltration in four m6A subtypes of breast cancer
Cibersort was used to analyze the types and proportions of in ltrated immune cells for TCGA and GSE96058 samples. The results showed that a total of 22 types of immune cell in ltratio, of which 11 types showed signi cantly different in ltration rates among the 4 subgroups (Fig. 4A). Considering the in uence of immune cell in ltration on the prognosis of the sample, univariate cox is performed to analyze the relationship between the proportion of immune cell in ltration and patient survival. The results indicated that, in all the samples, the different proportion of NK cells resting and macrophages M0 led to signi cant differences in patient survival time, with HR < 1 (Fig. 4B). In the triple-negative breast cancer samples, patient survival was signi cantly affected by the ratio of dendritic cell resting and mast cell resting, and HR < 1 (Fig. 4C). Next, the types of immune cell in ltration with signi cant prognostic value in univariate cox analysis were demonstrated by multivariate prognostic analysis (risk value = proportion of immune cell in ltration *HR). The results showed that NK cells resting and macrophages M0 were related to the survival of all samples in univariate prognostic analysis. The risk prediction results of high and low risk samples obtained by using the median risk value as the grouping index does not match the actual risk (risk value = β-NK cells resting×NK cells resting + β-Macrophages M0×Macrophages M0) (Fig. 4D). However, the multi-factor risk prediction of dendritic cells resting and mast cells resting showed good risk prediction effects for triple-negative breast cancer samples (Fig. 4E). Based on the above analysis, mast cells resting and macrophages M0 was taken as the follow-up research object, because of the signi cant differences between the two types of immune cell in ltration in the m6A subgroup, which also showed signi cant in uence on the prognosis in univariate cox analysis.
Further survival analysis revealed that the survival of all breast cancer samples was related to changes in the proportion of Macrophages M0 (Fig. 4F). While, mast cells resting with signi cant prognostic value in triple-negative breast cancer (Fig. 4G).
2.5 Differences in immune e cacy and sensitivity to anthracycline among the 4 m6A subtypes TIDE is used as an indicator of differences in immune e cacy. The results showed that there was no signi cant difference in TIDE between the resting group and the m6A methylation group, as well as the mixed type and the m6A methylation group, and there were signi cant differences in immune e cacy between the other groups (Fig. 5A). Based on the grouping results of GSE25066 samples (Table 2), the number of anthracyclines resistance and sensitivity samples of the 4 m6A subtype groups were compared. The results showed that the number of drug-sensitive samples in the resting group was the least among the 4 subtypes. This result indicates that the expression of m6A gene contributes to the response of breast cancer to anthracyclines (Fig. 5B).  (Fig. 6). This result suggests that the expression of m6A writer and eraser alone will not lead to the increase of tumor stemness, while the expression of m6A reader is related to the increase of tumor cell stemness. In addition, the reason why the dryness index of resting breast cancer samples is higher than that of m6A methylated samples may be due to non-m6A factors.

Discussion
Based on the expression level of m6A in breast cancer, this study took multi-platform and diverse data as the object, and the breast cancer samples were divided into 4 m6A subtypes according to the expression level of m6A. The results of this analysis indicate that m6A has important application value in the prognosis, detection of tumor progression, evaluation of chemotherapy effect and risk prediction of breast cancer patients, and may be a new reference standard for the evaluation of breast cancer progression and prognosis. The formation process of m6A methylation is catalyzed by methyltransferases composed of RBM15, ZC3H13, METTL3, METTL14, WTAP and KIAA1429, while the removal process is mediated by demethyases such as FTO and ALKBH5 [23]. M6A methylation level is related to the expression of methyltransferase and demethylase in cells. In addition, a group of speci c RNA binding proteins consisting of YTHDF1/2/3, YTHDC1/2, HNRNPA2B1, LRPRC, FMR1, etc. are necessary for m6A modi cation to exert effects [24]. The binding protein of m6A is a "double-edged sword", which can not only promote the translation of mRNA, but also reduce the stability and accelerate degradation of mRNA. More and more evidences indicate that the dysregulation of m6A regulatory factor is a key factor in the development of tumor.
In this study, differences in clinicopathological indicators, gene mutations, copy number variations, types and proportions of immune cell in ltration, immune e cacy, chemotherapeutic drug sensitivity, and tumor stemness are analyzed among the four m6A subtypes. Speci cally, among the four breast cancer subtypes, the survival rate of the protein-binding subtype samples is signi cantly lower than the other samples. It has been shown that the decreased expression of m6A methylase (METTL3, METTL14 and WTAP) and the increased expression of m6A methylase (FTO and ALKBH5) in BC are closely related to the progression and poor survival rate of breast cancer [25]. The up-regulated expressions of m6A readers YTHDF1 and YTHDF3 are associated with metastasis and poor prognosis in breast cancer [26]. High expression of m6A binding protein has also been reported in other tumors to be associated with poor prognosis and reduced survival of patients except breast cancer. For instance, highly expressed LRPPRC is an effective marker for shorter BCP-free survival and OS in patients with metastatic PCa after androgen deprivation therapy [27].These results indicated that the high expression of M6A-related proteins, especially mA binding proteins, is an important factor for poor prognosis of tumors. In terms of molecular characteristics, TP53 mutations and copy number loss were most common in protein-bound breast cancer samples. It is known that abnormal expression of M6A-related proteins regulates the expression of p53 by changing m6A modi cation levels. It is reported that he up-regulated expression of YTHDF1 and HNRNPA2B1 in melanoma resulted in the inhibition of p53 expression [28]. However, the regulatory effect of TP53 on m6A-related protein expression has not been reported. We hypothesized that the mutation and copy number loss of TP53, a key tumor suppressor gene, reduced the inhibition of tumor suppressor genes, which may be the key to promoting the up-regulation of m6A binding protein in breast cancer. It is con rmed that the expression of m6A binding protein IGF2BP1 mRNA is increased by 100 times in the study of brolamellar hepatocellular carcinoma, while the p53 cancer-suppressing pathway is signi cantly inactivated in the FL-HCC cells [29]. In addition, CNV deletions of DNA damage repair genes increase genomic instability, which is also an important factor for abnormal expression of M6A-related genes.
Considering the in uence of the proportion and type of in ltrating immune cells on the prognosis of the sample, m6A gene expression and the proportion and subtypes of in ltrating immune cells were analyzed. The results revealed that the proportion of NK cells and macrophages M0 was signi cantly different in the 4 m6A subtypes, and related to the patient survival time. However, in triple-negative breast cancer samples, the ratio of dendritic cells resting and mast cells resting are the main factor for the signi cant difference in survival time of the four m6A subtype samples. In fact, abnormal expression of m6A-related factors determines the type and proportion of in ltrating immune cells. For example, the in ltration of CD8 + T cells in pancreatic cancer was reduced by the increase and loss of ALKBH5 at arm-level [30]. High expression of WTAP reduces tumor-associated T lymphocyte in ltration and immune response, leading to poor prognosis of gastric cancer [31]. In addition, this study revealed that the immune e cacy of m6A protein-binding breast cancer was signi cantly lower than that of other subtypes. This approach to predicting tumor therapy based on m6A-related factor expression has also been demonstrated in other tumor species. For example, glioma patients were divided into high-risk and low-risk groups according to the expression of m6A regulatory factor, and a good predictive effect was obtained in terms of prognosis and treatment effect [32]. In the comparison of chemotherapy resistance, the most chemotherapysensitive samples were protein binding subtype breast cancer, while quiesent subtype had the least chemotherapy-sensitive samples. Similar studies have yielded different results in different tumors. Similarly, high expression of METTL3, YTHDF3, and YTHDF1 induced NSCLC drug resistance and metastasis by promoting m6A methylation of YAP mRNA and translating [33]. Finally, the analysis of tumor stemness showed that the m6A protein-binding breast cancer samples had the strongest tumor stemness among the four subtypes, which was consistent with the previous analysis.
In conclusion, the effects of different m6A-related factor expression levels on the prognosis, mutation, immunity and chemotherapy of breast cancer samples were systematically and comprehensively analyzed. This result has positive signi cance for understanding the role of m6A molecular events in breast cancer.

Conclusion
At present, the effect of the overall expression pattern of m6A methylation factor on breast cancer has not been reported. In this study, the classi cation of breast cancer based on the expression characteristics of m6A methylation regulator can be used to evaluate the disease progression, prognosis, chemotherapy effect, immune cell invasion and stemness index of patients. Our results indicate that Quiescent, M6A methylation, Protein binding and Mixed expression patterns predict different progression and outcome of breast cancer. These ndings con rm that the different expression patterns of M6A methylation related factors in breast cancer contribute to provide potential molecular targets for clinical healthcare workers, and are expected to become a new molecular diagnostic and therapy method.

Ethics approval and consent to participate
Clinical and animal experiments are not involved in this study. Therefore, ethical and participatory consent elements are not included.

Consent for publication
This study did not involve the collection of patient samples and data.

Availability of data and materials
This research involves data from 3 data sets: TCGA-BRCA, GSE96058 and GSE25066. TCGA -BRCA gene expression data, copy number variation and gene mutation data downloaded from the ucsc xena: