Identication and validation of a novel prognostic index based on m6A RNA methylation regulators in esophageal carcinoma

The incidence and mortality rate of esophageal carcinoma (ESCA) remains high. This study proposed to explore the promising prognostic markers based on m 6 A RNA methylation regulators, and nally to improve the prognostic assessment for ESCA patients. The RNA sequencing and relevant clinical data of ESCA and normal tissues were obtained from The Cancer Genome Atlas (TCGA) database. Then, we evaluated the expression pattern of 13 m 6 A methylation regulators in ESCA and normal samples. Two groups of ESCA were divided by the consensus clustering analysis. STRING database and R package were used to construct the protein-protein interaction network and conduct correlation analysis, respectively. Cox regression and least absolute shrinkage and selection operator (LASSO) regression analyses were performed to develop the multiple-gene risk signature. Kaplan-Meier method and receiver operating characteristic (ROC) curves were used to assess the accuracy of the model. The clinical nomogram combining clinicpathological factors and gene signature was built to predict survival rate of ESCA patients. Gene set enrichment analysis (GSEA) and networks prediction analysis were conducted to explore the signaling pathways that related to the risk genes.


Background
Esophageal carcinoma (ESCA) is the seventh most commonly diagnosed cancer worldwide, with 572,000 new cases and 509,000 deaths estimated in 2018 [1]. The development of ESCA is a complex process which is in uenced by many factors [2]. The major risk factors of ESCA are obesity, heavy drinking, gastric re ux, and smoking [3]. Although much progress has been made in the early screen and treatment strategy of ESCA, this disease remains a profound public health burden [3]. Radical esophageal resection is the mainstay of treatment for patients with resectable ESCA. However, the overall survival (OS) of ESCA patients has remained poor over the past decades [4]. Thus, it is of great importance to identify the prognostic biomarkers for ESCA patients, especially for these with high risk and may receive bene t from the timely treatment.
RNAs play a crucial role in various cellular processer, and its emerging roles in the regulation of tumor progress have been gradually revealed during the past decade [5]. To date, RNA modi cations have been identi ed in various RNAs, including mRNA, non-coding RNA, and others [5,6]. So far, there are more than 150 known RNA modi cations forms have been reported, including 5-methylcytosine, N6methyladenosine (m 6 A), N7-methyladenosine, etc [7]. Among these, N6-methyladenosine (m 6 A) is widely found in various RNAs and is regarded as the most abundant types of mRNA modi cations [8]. Generally, RNA m 6 A modi cation can modulate RNA transcription, translocation, metabolism, and RNA splicing [8][9][10]. The process of RNA m 6 A modi cation is dynamic and reversible [10], which is regulated by a methyltransferase complex called "writers" (KIAA1429, RBM15, METTL3, METTL14, ZC3H13, and WTAP), "readers" (YTHDF1, YTHDF2, YTHDC1, YTHDC2, and HNRNPC), and "erasers" (ALKBH5 and FTO). The dysregulation of m 6 A RNA methylation regulators is found to be related to several diseases including obesity, autoimmune diseases, and cancers [11,12]. For example, METTL14 exerts an oncogenic role through regulating its mRNA targets (MYB and MYC) by m 6 A modi cation in leukemogenesis [13].
Controversy, METTL3-mediated m 6 A modi cation plays a tumor suppressor role in colorectal cancer [14].
Thus, the m 6 A methylation regulators may play different roles in different cancers. Considering the limited reports of the role of m 6 A methylation in ESCA, studying on the clinical values of m 6 A methylation regulators in ESCA is highly needed.
In this study, we comprehensively analyzed the expression pattern of 13 widely studied m 6 A RNA methylation regulators in 160 ESCA and 11 normal tissues from the Cancer Genome Atlas (TCGA) datasets. The correlation of these m 6 A regulators and its relationship with patients' clinical features were explored. Based on the result of Cox regression and LASSO regression analysis, a two-gene signature based on m 6 A RNA regulators was constructed, which presented good performance in predicting the prognosis of ESCA patients. A nomogram combining the clinicalpathological factors and risk gene signature was established to predict the individual's survival rate in ESCA. Finally, several important pathways (cell cycle, mTOR pathway, and p53 signaling pathway) were found to be associated with the dysregulation of risk genes.

Data collection and procession
The RNA-sequencing transcriptome data and clinicopathological information of 160 ESCA patients and 11 normal samples were obtained from The Cancer Genome Atlas (https://cancergenome.nih.gov/) database. Data procession was performed as the previous studies [15,16]. GSE13898 dataset was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) for validation [17]. This study was conducted according to the ow chart ( Figure. Table 1). The Wilcoxon signed-rank test was applied to screen the differentially expressed m 6 A RNA methylation regulators between ESCA and normal tissues with P < 0.05.
Subsequently, the expression data of the m 6 A RNA methylation regulators were combined with the corresponding clinical data. The "pheatmap" and "vioplot" packages in R software were used to visualize the expression patterns of m 6 A regulators.

Consensus clustering analysis
The "Consensus ClusterPlus" package was used to divide the ESCA samples into different clusters [18]. Subsequently, the principal component analysis (PCA) was performed to evaluate the gene expression patterns of m 6 A RNA methylation regulators in two ESCA clusters. Then, "survival" package was used to compare the OS of patients in two ESCA clusters. The clinicopathologic features and expression patterns of m 6 A RNA methylation regulators in two groups were presented by "pheatmap" package.
2.4 Protein-protein interaction network construction, correlation analysis, and KEGG enrichment STRING database (https://string-db.org/; Version: 11.0) was used to construct the protein-protein interaction (PPI) network for these 13 m 6 A methylation regulators. Subsequently, the hub genes within the PPI network were screened via CytoHubba-plugin in Cytoscape software according to the degree of connectivity. The correlation analysis was conducted by "corrplot" package to explore the relationships among the m 6 A regulators. KEGG analysis of the 13 m 6 A methylation regulators was conducted via the FunRich tool (FunRich 3.0) [19].

Construction of prognostic gene signature
The cBio Cancer Genomics database (https://www.cbioportal.org/) was used to investigate the genetic alteration information of 13 m 6 A methylation regulators [20]. Univariate Cox regression analysis was performed to exp lore the relationships between the expression of m 6 A RNA methylation regulators and patients' survival [21]. Then, these 13 m 6 A methylation regulators were entered into a LASSO regression analysis [15]. Two m 6 A regulators were screened as the prognostic factors by LASSO regression analysis. Moreover, the prognostic roles of these two genes were explored using the Kaplan-Meier plotter website (http://kmplot.com/analysis/index.php), a free available tool based on GEO database, European Genomephenome Archive (EGA) database, and TCGA database [22]. These two m 6 A regulators were then used to develop a potential risk signature as previous study [23]. Each patient with ESCA was assigned a risk score according to the formula. The predictive formula was calculated as following: risk score = (Coe cient gene1 × Expression gene1 ) + (Coe cient gene2 × Expression gene2 ) [24]. The Kaplan-Meier curve and receiver operating characteristic (ROC) curve were applied to assess the predictive e ciency of the gene signature.

Identi cation of the independent prognostic indicators in ESCA
Univariate and multivariate Cox regression analyses were performed to identify the independent prognostic factors for ESCA patients (including age, gender, stage, T (primary tumor), M (metastasis), N (lymph nodes), and risk score). In addition, the Kaplan-Meier curves were conducted to compare the OS difference between the low-risk and high-risk groups strati ed by gender age, and stage.

Construction and validation of the clinical nomogram
A nomogram based on the independent prognostic factors (stage and risk score) was develop for predicting the 1-year and 3-year survival rate of ESCA patients [25]. The calibration curves were used to evaluate the performance of the nomogram. GSE13898 ESCA dataset was downloaded and used to validate the accuracy of the nomogram.

Gene set enrichment analysis (GSEA)
160 ESCA patients were divided into high expression and low expression group according to the median expression value of ALKBH5 or HNRNPC. GSEA (http://software.broadinstitute.org/gsea/index.jsp) was performed to analyze HNRNPC or ALKBH5 gene related biological pathways and biological processes [26]. The normalized enrichment score (NES) and nominal p-value were applied to sort the pathways and processes enriched in each phenotype [27].

Establishment of transcription factors (TFs)-genes networks and miRNA-genes networks
The NetworkAnalysis database (http://www.networkanalyst.ca) was applied to build the TFs-genes networks and miRNA-genes networks for HNRNPC and ALKBH5 [28]. The ENCODE database with ChIP-seq data in the NetworkAnalysis platform was used to predict the potential TFs [28]. The miRNA-gene networks were predicted by the TarBase and miRTarBase in the NetworkAnalysis platform.

Statistical analysis
The Perl language and R statistical package (R version 3.6.3) were used to conduct all the statistical tests and graphics unless otherwise stated. P < 0.05 was considered as statistical signi cance.

Expression patterns of m 6 A RNA methylated regulators in ESCA
The m 6 A RNA methylation regulators play a crucial role in the cancer development and progression. Thus, we rstly explored the expression patterns of these 13 genes in 160 ESCA tissues and 11 normal tissues in TCGA dataset. As shown in Figure.   ). To better understand the biological functions of m 6 A RNA methylation regulators in ESCA, KEGG analysis were performed for these 13 genes, as shown in Figure. S2, KEGG analysis showed that these genes were mainly enriched in processing of capped intron-containing pre-mRNA, mRNA processing, and formation and maturation of mRNA transcript, gene expression, and regulation of telomerase, etc.

Clustering of ESCA patients and prognosis of two clusters
According to the expression similarity of m 6 A RNA methylation regulators, "Consensus ClusterPlus" package in R software was applied to clustered the ESCA cases into different groups. As a result, k = 2 was selected as the most appropriate index, which could cluster the ESCA patients into two cohorts (cluster 1 and cluster 2) with ideal stability (Figure. Figure. S4a showed that the ESCA patients in cluster 1 presented a trend with shorter OS compared to cluster 2 although it was not statistically signi cant (P = 0.333). However, the relationship analysis of the clinicopathological features showed obvious difference for the stage factor between the two clusters (P < 0.01) ( Figure. S4b). Taking together, the above results indicated that the clustering result was associated with the clinicopathological factors and tumorigenesis of ESCA.

Construction of the risk gene signature for ESCA
To investigate the contributions of the 13 m 6 A RNA methylation regulators to ESCA, the genetic alteration information of these regulators were explored in the cBio Cancer Genomics database. The Nature 2017 ESCA dataset (265 cases) and Firehose Legacy ESCA dataset (184 cases) of ESCA were both included. In order to get a better understanding of the relationships between the m 6 A RNA methylation regulators expression and prognosis of ESCA patients, LASSO regression analysis was conducted for these 13 genes. Figure. 4b presented the regression coe cient of these 13 genes in ESCA. As shown in Figure. 4c, when the two genes (ALKBH5 and HNRNPC) were included, the model achieved the best performance. Finally, a stepwise multivariate Cox regression was performed to construct the optimal risk signature based on these two genes. Each ESCA patient received a risk score calculated as follows: risk score = (-0.027844 × expression value of ALKBH5) + (0.008465 × expression value of HNRNPC). Then, the ESCA patients were divided into low-risk and high-risk group according to the median value of risk score. The Kaplan-Meier curves showed that ESCA patients with high risk score had a poorer survival than those with low risk score (P = 1.411e-02) (Figure. 4d). The ROC curve showed that the risk score curve had a good feasibility in predicting the individuals' survival rate with AUC of 0.644 ( Figure. 4e). Figure. 4f illustrated that the survival time of ESCA patients decreases along with the rising of risk score. Therefore, the above results suggesting that this risk gene signature could effectively identify the high risk ESCA patients with poor OS.
Finally, we also performed subgroup analysis to assess the prognostic role of the two-gene risk signature in ESCA patients according to several clinicopathological factors (grade, stage, and age). ESCA patients with high risk score in subgroup of female, stage III-IV, and patients with age > 65 had dramatically lower OS than patients with low risk score (Figure. 6) (P < 0.05), suggesting that the two-gene based signature have good ability to discriminate the patients with poor OS. Other subgroups also presented similar trends but not statistically signi cant.

Construction and validation of a clinical nomogram
The nomogram was used to quantitatively evaluate the patients' survival rate through integrating the independent prognostic factors (stage and risk score). The total points of risk factors were used to assess the patients' 1-year and 3-year survival rates ( Figure. 7a). The concordance index (C-index) was 0.70 (95% CI: 0.62-0.79). Moreover, calibration curves showed good concordance between the nomogram-predicted survival and actual survival ( Figure. 7b-7c), especially for the 3-year survival. Importantly, we validated the nomogram in the GSE13898 ESCA dataset, and the 1-year and 3-year calibration curves was also presented good concordance between the nomogram-predicted survival and actual survival ( Figure. 7d-7e). These result suggested that the clinical nomogram performs well in predicting the prognosis of ESCA patients.

Identi cation of ALKBH5 and HNRNPC related signaling pathway and biological processes via GSEA
To explore the signaling pathways and biological processes that are differentially activated in the ESCA development and progression, GSEA was conducted between high and low ALKBH5 or HNRNPC expression datasets respectively. According to the normalized enrichment score (NES) and NOM p-value < 0.05, the most signi cantly enriched signaling pathways and biological processes were selected and presented in Figure. 8 and Figure. S6, respectively. Figure. 8a-8c showed that cell cycle, DNA replication, and mTOR signaling pathway were differentially enriched in ALKBH5 high expression phenotype, while cell cycle, pentose phosphate pathway, and p53 signaling pathway were activated in HNRNPC high expression phenotype. As for the signi cant terms of biological processes, DNA biosynthetic process, DNA recombination, and DNA repair were associated with ALKBH5 overexpression (Figure. 6Sa), whereas phospholipase C activity and regulation of cellular extravasation were mainly related to low expression phenotype of HNRNPC ( Figure. 6Sb).

Discussion
ESCA is one of the major causes of cancer-related mortality worldwide. So far, there are no effective treatment strategies for advanced ESCA patients, leading to a poor 5-year survival rate. To optimize and personalize the treatments for ESCA patients, an accurate prognostic judgment is important during the cancer management. So far, the tumor node metastasis (TNM) classi cation is the most commonly utilized tool for predicting the prognosis in cancer patients [29]. However, increasing evidences demonstrated the unsatisfactory discriminative capacity of TNM system in predicting clinical outcomes [30,31]. With the development of life science and technology, more and more prognostic markers have been identi ed for ESCA [3]. However, the existing markers may lack su cient sensitivity and speci city in cancer prognosis. The gene signature based prognostic index model holds promising in prognostic prediction, thus may help to improve the personalized medicine.
To the best of our knowledge, this study rstly investigated the prognostic roles of reported m 6 A RNA methylation regulators in ESCA. Firstly, 8 out of 13 m 6 A RNA regulators were found to be dysregulated in ESCA, implying its crucial roles in the development of ESCA. Moreover, two clusters of ESCA presented signi cant difference in tumor stage, which also indicated that the expression pattern of these regulators were associated with the malignancy of ESCA. Previously, yang et al reported that genetic variants of m 6 A modi cation genes could affect the ESCA susceptibility [32]. In this study, we also found that genetic alteration of these gene was common in ESCA according to the cBio Cancer Genomics database. These results, taking together, suggested that m 6 A RNA methylation play a signi cant role in modulating the malignant process of ESCA. Subsequently, we conducted the Cox regression and LASSO regression analyses to build a gene signature with two m 6 A RNA regulators, ALKBH5 and HNRNPC, which divided the ESCA patients into two groups (high-risk and low-risk groups). The results showed that the two-gene based signature could effectively discriminate the patients with different OS (P = 1.411e-02), and presented a great performance in prognosis prediction. What's more, the univariate and multivariate Cox regression analyses further proved that the risk score calculated by the gene signature was an independent prognostic indicator with the highest HR value than other factors. Subgroup analysis also showed that this two-gene signature have good ability to distinguish the patients with poor OS in group of female, stage III-IV, and patients with age > 65. Other subgroups presented similar trends although with no signi cant difference. One reasonable explanation for it may due to the limited number of ESCA patients in TCGA database. The clinical nomogram based on the independent prognostic factors was established and validated to have a good ability in predicting the 1-year and 3-year survival rate of ESCA patients.
In the two-gene signature, ALKBH5 was proved to be a protective gene in ESCA, while high level of HNRNPC was associated with poor OS. The progression of esophageal carcinomas is in uenced by the complex gene networks, our predicted TFs-genes networks and miRNA-genes networks of ALKBH5 and HNRNPC may provide powerful bases for in-depth studies regarding the molecular mechanisms of ESCA. Currently, the biological functions of ALKBH5 and HNRNPC have been investigated in various researches.
Previously, He et al. demonstrated that ALKBH5 can suppress pancreatic cancer by demethylating lncRNA KCNK15-AS1 [33]. Moreover, ALKBH5 can inhibit pancreatic cancer through regulating WIF-1 RNA methylation and Wnt signaling pathway [34]. On the contrary, ALKBH5 was also reported to be an candidate oncogene in other cancers, including ovarian cancer [35], breast cancer [36], and gastric cancer [37]. Given that the opposite role of ALKBH5 in different cancers, more experimental studies are highly demand. In our study, several ALKBH5-related signaling pathways were screened by GESA, including cell cycle, DNA replication, and mTOR signaling pathway. Recently, a study reported that ALKBH5 can activate EGFR-PIK3CA-AKT-mTOR signaling pathway in ovarian cancer [35]. These ndings suggested that ALKBH5-mTOR pathway may be an available therapeutic pathway for cancer treatment. HNRNPC mainly function as an oncogenic role in cancers [38,39]. Previously, studies found that the abnormally expressed HNRNPC was related to the LBX2-AS1 pathway [40] and JAK-STAT1 signaling pathway [39] in several cancers. But the association between the HNRNPC expression and cell cycle related pathway, pentose phosphate pathway, and p53 signaling pathway was rstly reported here, thus the regulatory mechanisms required to be elucidated in future.
There are also some limitations in this study. Firstly, the number of normal tissues (11) is much less than the number of ESCA tissues (160). Secondly, we did not divide the ESCA samples into two subtypes (adenocarcinoma and squamous cell carcinoma) due to the limited number of patients in TCGA dataset. These may in uence the reliability of our ndings. Thirdly, the prognostic index should be further validated in large clinical cohort. Finally, further studies on the two m 6 A RNA regulators may improve the targeted therapy in ESCA patients.

Conclusions
In conclusion, the dysregulated m 6 A RNA methylation regulators were closely related to clinicopathological factors of ESCA. The m 6 A RNA methylation regulators, especially ALKBH5 and HNRNPC, play an essential role in the cancigenesis and progression of ESCA. Importantly, the two-gene based signature was demonstrated to be a promising tool to distinguish ESCA patients with different clinical outcome. The GESA and networks prediction analysis also provided new clues for mechanism research in ESCA.

Availability of data and materials
All data used in the study were downloaded from The Cancer Genome Atlas (TCGA) database and GEO database.

Competing interests
The authors declare that they have no con icts of interest.  The ow chart of the study analysis.     Subgroup analyses for survival in low-risk and high-risk group strati ed by clinicopathological factors. ab. Subgroup analyses for survival in low-risk and high-risk group strati ed by gender; c-d. Subgroup analyses for survival in low-risk and high-risk group strati ed by stage; e-f. Subgroup analyses for survival in low-risk and high-risk group strati ed by age. Calibration curves present the concordance of 1-year (d) and 3-year survival (e) between the observation and the prediction in the GSE13898 ESCA cohort (testing group).  TFs-genes networks and miRNA-genes networks of AKLBH5 and HNRNPC a. The predicted networks of TFs and two m6A RNA methylation regulators (AKLBH5 and HNRNPC). The blue circles represent AKLBH5 and HNRNPC, the red and orange squares represent the predicted TFs; b. The predicted networks of miRNAs and two m6A RNA methylation regulators (AKLBH5 and HNRNPC). The blue circles represent AKLBH5 and HNRNPC; The red and orange squares represent the predicted miRNAs.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. gure.S1S7.docx