Identification of a five m6A-relevant mRNAs signature and risk score for the prognostication of gastric cancer

Background N6-methyladenosine (m6A) is the most abundant form of methylation modification in eukaryotic cell messenger RNA (mRNA). However, the role of m6A in gastric cancer (GC), which is one of the most common gastrointestinal malignancies, is unclear. In this study, m6A-relevant mRNA signatures and risk scores were determined to predict the prognosis of GC. Methods The expression profiles and clinical information of 367 patients were downloaded from The Cancer Genome Atlas (TCGA). Cluster analysis and univariate Cox analysis were performed to identify the regulatory factors of RNA methylation associated with GC prognosis. A co-expression network was constructed using the WGCNA package in R. The correlations between module eigengenes and clinical traits were then calculated to identify the relevant modules. We used univariate Cox analysis to screen for genes that are significantly associated with prognosis in the module. We identified hub genes by least absolute shrinkage and selection operator (LASSO) and multivariate analysis and developed a Cox prognostic model. Finally, the hub gene expression values weighted by the coefficients from the LASSO regression were applied to generate a risk score for each patient, and receiver operating characteristic (ROC) and Kaplan-Meier curves were used to assess the prognostic capacity of the risk scores. The asporin (ASPN) gene in GC cell lines was verified via quantitative polymerase chain reaction (qPCR) and Western blot. Moreover, 5-ethynyl-2'-deoxyuridine (EdU) and transwell assays were applied to evaluate the effects of the proliferation, migration, and invasion abilities in GC cells after ASPN knockdown. Western blot verified the effects of ASPN on the phosphoinositide 3-kinase (PI3K)/serine/threonine kinase (AKT)/mechanistic target of rapamycin kinase (mTOR) pathway and epithelial-mesenchymal transition (EMT) pathway-related gene expression. Results Our results indicated that AARD, ASPN, SLAMF9, MIR3117 and DUSP1 were hub genes affecting the prognosis of GC patients. Besides, we found that ASPN expression was upregulated in GC cells. The knockdown of ASPN expression suppressed GC cell proliferation, migration, and invasion by deactivating the PI3K/AKT/mTOR and EMT pathways, respectively. Conclusions Our findings indicated that ASPN participates in the biological process of GC as an oncogene and may be a promising biomarker in GC.


Introduction
Global cancer statistics have shown that gastric cancer (GC) is considered one of the most invasive cancers and the third leading cause of tumor-related deaths [1] . Over the past few decades, various strategies have been developed for GC treatment, which is a signi cant improvement in the early diagnosis and treatment of GC [2,3] . However, due to the atypical and insidious nature of early clinical symptoms of gastric cancer, only a small number of patients can be clearly diagnosed, while more than 60% of patients already have local or distant metastasis at the time of diagnosis [4] . Therefore, there is an urgent need to develop an effective and effective strategy for early diagnosis and treatment of GC.
N6-methyladenosine (m6A) modi cation is the methylation of the adenosine base at the nitrogen-6 position of the mRNA. It is a rich nucleotide modi cation rst discovered in eukaryotic messenger RNA in (methyltransferases, including WTAP, KIAA1429, RBM15 and METTL3/14), readers (YTH domain containing RNA binding proteins and heterogeneous nuclear ribonucleoprotein, including YTHDF1/2/3, YTHDC1 and HNRNPC) and erasers (demethylases, including ALKBH5 and FTO) [7,8] . In recent years, it has been found that the expression of HNRNPC is related to the development of malignant tumors and gliomas, is involved in the occurrence of glioblastoma multiforme which can predict the prognosis [9] . HNRNPC was proved to promote oral squamous cell carcinoma carcinogenesis and can be an independent biomarker prognostic biomarker [10] . Studies have also shown the potential value of HNRNPC as a prognostic and therapeutic marker for GC, and it plays an important role in promoting the translation of human gastric cancer genes [11] . In this study, we downloaded the expression pro le and clinical data of the Tumor Genome Atlas (TCGA). By using the univariate Cox analysis method, we determined the prognostic m6A RNA methylation regulator. In addition, we established weighted gene co-expression network analysis (WGCNA), lasso regression analysis (least absolute shrinkage and selection operator) and multivariate COX analysis to identify the pivotal genes that might be regulated by m6A RNA methylation regulators and related to the prognosis of GC. Finally, according to the selected combination of pivot genes, a risk scoring model was constructed to evaluate its application in the prognosis of GC.
These hub genes are closely related to the methylation m6A RNA methylation regulators, which provides new ideas for the research of GC.

2.1Patient datasets and m6A regulators
The mRNA expression data and corresponding clinical information of patients with gastric cancer were downloaded from The Cancer GenomeAtlas (TCGA). This study included the expression pro les of 309 patients with complete follow-up data in the TCGA database. Tcgabiollinks package was used to download TCGA data.
In this study, we included many m6A methylation regulators, such as writers : RNA-binding motif 15 (RBM15), KIAA1429, METTL3, METTL14, zinc nger CCCH domain-containing protein 13 (ZC3H13) and WT1-associated protein ; readers : YTH m6A RNA-binding protein 1 (YTHDF1), YTH m6A RNAbinding protein 2 (YTHDF2), YTH domain-containing 1 (YTHDC1), YTH domain-containing 1 (YTHDC2), and heterogeneous nuclear ribonucleoprotein C (HNRNPC); and erasers : α-ketoglutaratedependent dioxygenase alkB homolog 5 (ALKBH5) and fat massand obesity-associated protein (FTO). In order to study the differential expression of m6A RNA methylation regulators in tumor and normal tissues, we analyzed the mRNA expression pro le of TCGA-gastric cancer (including 58 normal samples and 309 tumor samples). Cluster analysis was applied to m6A RNA methylation regulators, and heatmaps and violin maps were presented to show differences. The pheatmap R package and the vioplot R package were used for drawing the plots. In addition, we used univariate Cox analysis to identify m6A-related genes related to the prognosis of gastric cancer (m6A regulatory genes with value < 0.05 considered statistically signi cant).

2.2Co-expression network construction and identi cation of clinically signi cant modules
The co-expression network was constructed by the WGCNA package in R [12] . The genes with variances greater than all variance quartiles were selected and those genes with larger variances and larger mean variations in different samples were considered. The expression data pro le of the selected genes was quali ed and the samples were clustered to detect outliers. Gene clustering modules were identi ed on the basis of the clinical features (including the expression of the m6A regulatory genes that we selected before) and topological overlap matrixbased dissimilarity [13] . Next, the relevance between clinical features and module eigengenes was used to identify the correlated modules. Highly correlated modules were considered to be very signi cant for our research.

2.3Identi cation of hub-genes and riskscore model construction
We selected the modules of interest where the genes in the modules were de ned as highly relevant with certain clinical traits. Next, Univariate Cox analysis and LASSO were used to screen for the genes that were signi cantly correlated to prognosis in the module (P < 0.01 was considered signi cant). Cluster Pro ler R package was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of screened genes, and P < 0.05 was considered a statistically signi cant difference. The TCGA samples were randomly divided into two groups: 153 samples were tested, and 156 samples were veri ed. There was no statistically signi cant difference in the expression of HNRNPC and other clinicopathological variables between the two groups. In the training set (n = 156), LASSO regression was perfomed to screen for gastic cancer prognosis-related genes based on lambda.min (the lambda corresponding to the smallest mean error), and the hub genes were selected. LASSO was analyzed with the "glmnet" package in R. The hub gene expression values weighted by the coe cients from the LASSO regression generated a risk score for each patient. Finally, The Survminer R package was used to nd the optimal cutoff for the risk score, while receiver operating characteristic (ROC) and Kaplan-Meier curves were used to assess the prognostic capacity of the risk scores.

Identi cation of prognostic m6A RNA methylation regulators
Univariate Cox analysis was performed to identify the gene map of m6A related to the prognosis of liver cancer patients (forest) to show m6A regulators with P<0.05. Based on the univariate COX analysis, we found that the high expression of HNRNPC is better related to the prognosis of the following patients: HNRNPC of gastric cancer in the TCGA data set shows P<0.05, and HR>1, which can be considered as pathogenic factors negatively affecting the prognosis of gastric cancer ( Figure 1A).

Differential expression of m6A RNA methylation regulators
In the TCGA set, 58 cases were normal samples and 309 cases were tumor samples. Heatmaps and violin maps were drawn according to the different gene expression levels. According to the results, we can conclude that HNRNPC had higher expression in the tumor samples than in the normal samples. RBM15, WTAP, METTL3, YTHDF2, YTHDF1, YTHDC1, YTHDC2, KIAA1429, ZC3H13 and HNRNPC were shown signi cantly higher expression in tumors than in normal tissues ( Figure 1B). As shown in the violin plot ( Figure 2), the expression of YTHDC2, RBM15, ZC3H13, METTL3, YTHDC1, KIAA1429, WTAP, YTHDF1, YTHDF2 HNRNPC in normal tissues is obviously different from that in tumor tissues, and the differences had signi cance (P < 0.05). They showed higher expression in tumor tissues.

Co-expression network construction
As mentioned above, this study calculated the variance of the expression of each gene in all samples, and taking the variance value greater than the quartile as the standard, a total of 6685 genes were screened out. A hierarchical clustering tree was constructed from 6685 genes in 309 tumor samples. Then, 309 samples and sample clinical information were hierarchically clustered ( Figure 3A). To construct a scale-free network, there was the need to choose the appropriate weighting factorβwhile moderately retaining the average connectivity of each gene node. We nally chose β = 5 to construct the co-expression network ( Figure 3B). After determining the β value, a total of 15 modules were identi ed ( Figure 3C).

Correlation between modules and phenotypes
According to the correlation between each module and the clinical phenotype, we selected the modules that were signi cantly associated with prognosis and HNRNPC expression. The turquoise and magenta modules were signi cantly highly associated with HNRNPC expression (positive values indicate a positive correlation, negative values indicate a negative correlation) and had a stronger correlation with the pathologic stage. This indicates that the genes in the two modules may be regulated by HNRNPC and play a role in the prognosis of gastric cancer patients (Figure 4).

Hub genes identi cation
In order to further determine the prognostic genes regulated by HNRNPC, we chose the turquoise and magenta modules to conduct further research on 1538 genes. A preliminary selection of prognostic genes was made by univariate Cox, where a P < 0.05 was used as a cutoff for screening prognostic genes, and 98 genes were selected. The 98 selected genes were analyzed by clusterPro ler R package for GO and KEGG pathway analysis. In the biological process terms of GO analysis, the genes were mainly enriched in "positive regulation of ERK1 andERK2 cascade," "cellular response to chemokine," "monocyte chemotaxis," "regulation of cartilage development," "regulation of phospholipase activity," "programmed cell death involved in cell development," "chemokine−mediated signaling pathway." In cell component terms, Differentially Expressed Genes (DEGs) were mainly enriched in "collagen−containing extracellular matrix," "basement membrane," " brillar center." In molecular function terms, DEGs were mainly enriched in "extracellular matrix structural constituent," "G protein−coupled receptor binding," "endodeoxyribonuclease," "activity, producing 5'−phosphomonoesters," "chemokine receptor binding," "extracellular matrix structural constituent conferring compression resistance." KEGG pathway analysis demonstrated that 98 selected genes were signi cantly enriched in "endonuclease activity, active with either ribo−or deoxyribonucleic acids and producing 5'−phosphomonoesters," "extracellular matrix structural constituent," "titin binding," "endodeoxyribonuclease," "activity, producing 5'−phosphomonoesters," "extracellular matrix structural constituent conferring compression resistance," "endonuclease activity," "nuclease activity," "transmembrane receptor protein kinase activity" and "dioxygenase activity," etc ( Figure 5).
Then, 309 TCGA samples were randomly divided into a training set and an testing set. The tableone R package was used to describe the clinical information difference between the internal training set and the internal testing set. The results showed that expression of HNRNPC and the other clinicopathological variables was not signi cantly different between the two groups ( Table 1). In the experimental group, a total of 98 prognostic genes were screened for the two modules using LASSO analysis and multivariate COX analysis. The results showed that AARD, ASPN, SLAMF9, MIR3117 and DUSP1 are real hub genes that are associated with patient prognosis (Figure 6A-C). In the TCGA datasets, a signi cant correlation was found between the expression of HNRNPC and that of the hub genes ( Figure 6D).

Risk scores
Five genes were identi ed and subsequently used to construct a prognostic gene signature. The risk score = −(0.166195281×AARD + 0.016850602×ASPN + 0.591607997×SLAMF9 + 0.591607997×MIR3117 + 0.00276337×DUSP1), and we used the Survminer R package to nd the optimal cutoff for the risk score, while ROC and Kaplan-Meier curves were used to assess the prognostic ability of the risk scores. We plotted the risk score distribution, the time-dependent ROC curve and the survival analysis of the training set and testing set ( Figure 8). Our results indicated a bad performance of the ve-gene signature for survival prediction (P < 0.05).

Discussion
N6 methyladenosine (m6A) modi cation is the most common modi cation in human mRNA [14] , which is considered to be a new epigenetic regulator of mRNA processing and translation. Increasing studies have revealed that the maladjustment of m6A is closely related to abundant physiological and pathological phenomena, including carcinogenesis [15] , obesity, immune maladjustment and so on [16,17] . In recent years, Mounting evidence has proved that m6a-related genes play a vital role in the genesis and the development of gastic cancer [2,18] . For example, Lin S, et al reported that METTL3 inhibits the mobility and proliferation of human gastric cancer cells and leads to inactivation of the AKT signaling pathway, indicating that it may be a meaningful and potential target for the treatment of human gastric cancer [19] . It has been found that the transfer of HNRNPC location may be related to chemoresistance of gastric cancer, suggesting the potential usefulness of HNRNPC as a prognostic and therapeutic marker of GC [11] .
Pi J, et al reported that YTHDF1 directly promotes the translation of the key Wnt receptor frizzled7 (FZD7) in an m6a-dependent manner, so that the mutant YTHDF1 enhances the expression of FZD7, which ultimately leads to the over-activation of the Wnt/β-catenin pathway and promotes the occurrence of GC [20] .
In this study, we rst evaluated the expression of HNRNPC in GC and found that the expression of HNRNPC in tumor samples increased signi cantly. As an effective IRES activator, HNRNPC is related to the establishment and maintenance of malignant phenotype. It is regulated by increasing the level of IGG1R and ultimately promotes the occurrence of gastric cancer [11,21] . As an important m6A methyltransferase, HNRNPC has been found to play a very important and potential role in a variety of physiological and biochemical functions, and it is also related to the occurrence and development of many cancers.
We nally identi ed ve hub genes (AARD, ASPN, SLAMF9, MIR3117 and DUSP1) that may be regulated by HNRNPC. In previous studies, these ve genes have been found to be involved in the development of many diseases. ASPN promotes the migration and invasion of colorectal cancer cells via TGFβ/Smad2/3 pathway and could serve as a potential prognostic biomarker in colorectal cancer patients [22] . It has been found that ASPN promotes cell proliferation by interacting with PSMD2, and down-regulation of its effectors and serves as a potential therapeutic target in GC [23] . The expression of SLAMF9 in melanocyte lesions may indicate genetic susceptibility to the development of malignant melanoma, which suggests that SLAMF9 plays an important role in melanoma biology [24] . It has been reported that miR-3117 participates in the proliferation of HepG2 by targeting PHLPPL, thereby participating in the occurrence and development of liver cancer [25] . The elevated DUSP1 expression is related to tumor progression, drug resistance and poor prognosis, and can be used as a predictive biomarker for apatinib treatment [26] . However, there are few studies on AARD. Our research indicated that there is a certain connection between these ve hub genes and HNRNPC, however, this remains to be veri ed by further experiments.

Conclusion
Our research revealed a risk model consisting of ve m6A-relevant genes, which may be useful for the prediction and diagnosis of gastric cancer. In addition, this discovery also provides motivation for basic medical research on m6A methylation in gastric cancer.

Declarations
Ethics approval and consent to participate The study was approved by the ethical committee of Northern Jiangsu People's Hospital, Yangzhou, China. The study was performed in accordance with the Declaration of Helsinki.

Consent for publication
Not applicable.
Availability of data and materials The data analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare no con icts of interest.  Differential expression of m6A RNA methylation regulators. Vioplot visualizing the differentially m6A RNA methylation regulators in gastric cancer (assume blue is normal and red is gastric cancer). HNRNPC showed higher expression in tumor tissues.  Correlation between modules and phenotypes. Each row in the gure corresponds to a gene module, and each column corresponds to a clinical phenotype. The numbers in brackets indicate the P value, and the numbers without the brackets indicate the correlation.
Page 16/18 Functional enrichment analysis of 98 selected genes. KEGG, Kyoto Encyclopedia of Genes and Genomes.