# A m6A-related lncRNA Signature as a Novel Prognostic Factor for Gastric Cancer

## Abstract

Background: N6-methyladenosine (m6A)isa common form of mRNA modification regulated by m6A RNA methylation regulators. However, studies have not explored the role of m6A-related lncRNA in gastric cancer (GC). This study aimed atexploring biological and prognostic roles of m6A-related lncRNA in GC.

Methods: We identified 800 m6A-related lncRNAs through correlation analysis of 13 main m6A RNA methylation regulators and all lncRNAs expressed in GC. We further categorized patients into train group and testing group equally.

Results: A total of 11 m6A-related lncRNA signature associated with prognosis of GC were identified through univariate cox regression analysis and LASSO analysis which was validated using the testing dataset and complete dataset, respectively. More deaths and shorter survival time were reported for patients in the high-risk group compared to low-risk group. lncRNA signature is an independent prognosis predictor as shown by cox regression analysis of the complete dataset. Moreover, genes involved in base excision repair were highly expressed in patients in the high-risk group as shown by gene set enrichment analysis (GSEA) result whereas ECM receptor interaction and focal adhesion pathway were enriched in low-risk group. A nomogram on independent factors showed clinical net benefit asan overall survival predictor of GC. In addition, we identified four subgroups of GC patients with significant differences in overall survival (OS).Subgroup C1 and C2 responded well to immunotherapy, compared to subgroup C3 and C4.

Conclusions: m6A-related lncRNA signature and four molecular subgroups provide information on the underlying molecular mechanism of GC and provide for a basis for development ofpersonalized therapy.

## Introduction

Gastric carcinoma (GC) is one of the most common malignanttumours and the third leading cause of cancer-relatedmortality (1). It is associated with poor prognosis with about about 80% of patients being diagnosed at an advanced stage(2).Surgery is the most effective therapy for gastric cancer. However, full recovery is not guaranteed for patients with recurrent or un-resectable GC. Notably, 5-year mortality rate for advanced gastriccancer is between 30–50%(3). Therefore, there is need to explore alternative prognostic markers.

Studies report that RNA modification plays a crucial role in the post-transcriptional regulation of gene expression (4). A total of 163 different RNA modifications were reported in all living organisms through MODOMICS by 2017(5). N6-methyladenosine (m6A) modifications are the most common RNA modifications and have been widely studied. m6A methylation is a dynamic reversible process carried out by methyltransferase complex (writers), demethylase (erasers), and function manager (readers)(6). The methyltransferase complex is composed of METTL3, METTL14, KIAA1429, WTAP, RBM15, and ZC3H13 which mediate RNA methylation modification process(7). Demethylase is composed of FTO and ALKBH5 and mediatesRNA demethylation process(8, 9). On the other hand, the function manager includes YTHDC1, YTHDC2, YTHDF1, YTHDF2 and HNRNPC and play a rolein “reading” RNA methylated information and translation and degradation of downstream RNA(10). m6A through the “writers”link methyl groups to RNA which is further recognized by the “readers” to aid processes such as RNA processing, nuclear export, translation, and decay. Further, m6A plays a role in gene expression regulation through demethylation of RNA by demethylase(11).

Although the human genome can transcribe ~ 60000 genes only ~ 20000 of these genes are protein coding genes. The remaining ~ 40000 genes are mainly classified into three types: snRNA, miRNA and lncRNA. Among these non-coding genes, there are ~ 16000 lncRNA genes which account for a quarter of the total genes. lncRNA are at least 200 bp long and have a lower protein-coding potential compared to miRNA and snRNA(12). Previous studies report that lncRNAs play an important role in transcriptional and post transcriptional regulation and chromatin modification through regulation of gene expression(13). Recent studies have reported aberrant lncRNAs expression as diagnostic and prognostic markers in tumors(14). However, the role of lncRNAs in m6A modification in GC have not been reported.

In the present study, we analyzed the role of m6A-related lncRNAs in overall survival of GC patients. We constructed 11 m6A-related lncRNA signature and validated using the testing dataset. Notably, the findings of this study show m6A-related lncRNA signature can be used as an independent prognosis marker for GC. Patients were classified into four subgroups using non-negative matrix factorization (NMF) based on the expression of lncRNA.Subgroup C4 showed better response to immunotherapy compared with the other subgroups. The findings of this study provideinformation on the role of m6A in GC.

## Data collection and correlation analysis

RNA transcriptome dataset and corresponding GC clinical information were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) database. Geneswere grouped into protein coding genes and lncRNA genesbased on the human genome annotation data. Expression levels of 13 m6A regulator genes were also determined. We used Pearson correlation coefficient to evaluate the correlation between m6A regulator genes and lncRNA. Geneswith absolutecorrelation coefficient > 0.3 and P value < 0.05 were considered as m6A-related lncRNA. Patients were then grouped into two equal groups: training dataset and testing dataset. Retrieved data were used for subsequent bioinformatics analysis.

## Risk model construction

Univariate cox regression analysis, least absolute shrinkage and selection operator (LASSO)-penalized Cox regression analysis were employed to identify m6A-related lncRNA and for construction of prognostic lncRNA signature in the training dataset. The risk score was calculated based on the following formula:

$$Risk score=\sum _{i=1}^{N}({Exp}_{i }\times {\beta }_{i})$$ , where Expi represents each lncRNA expression and βi represents the coefficient of each lncRNA. Patients in the testing dataset were further grouped into high-risk and low-risk group based on the risk score.Receive operating characteristic (ROC) curve analysis was used to evaluate the accuracy of lncRNA signature in the training dataset, testing dataset and the complete dataset. The relationship between lncRNA signature and other clinical parameters inGC

The complete dataset with corresponding clinical information were used for subsequent analysis. The relationship between lncRNA signature risk score and clinical traits was determined using univariate cox regression analysis and multivariate cox regression analysis and p < 0.05 was considered statistically significant.

## Construction and validation of Nomogram model

Nomogram prediction model was constructed using the lncRNA signature risk score and independent clinical factors using “rms” R package. The calibration plot was applied to validate the calibration and accuracy of the nomogram (by a bootstrap method with 1,000 replicates). Moreover, accuracy of lncRNA signature risk score in grouping patients and independent clinical factors was validated using the decision curve analysis.

## Gene set enrichment analyses

Gene Set Enrichment Analyses (GSEA) method was used to explore the potential KEGG pathway implicated in the lncRNA prognostic signature. The reference gene set was retrievedfrom c2.cp.kegg.v7.1.symbols files and P < 0.05 and a false discovery rate q < 0.25 were considered statistically significant.

## Non‑negative matrix factorization consensus clustering

We used non-negative matrix factorization (NMF) clustering methods to explore potential molecular subgroups based on the prognostic lncRNA expression profile. The optimal K cluster was selected based on the cophenetic correlation coefficient. We further used the TIDE algorithm to predict the response of the four subgroups to immunotherapy.

## Statistical analysis

Computational and statistical analyses were performed using R software (https://www.r-project.org/).Kruskal-Wallis test was used to compare the four subgroupsandexplore their response to immunotherapy. Differences between high-risk group and low-risk group were determined using Kaplan-Meier curve and log-rank test. Clinical datawere analyzed using chi-square test or Fisher’s exact test. For all analyses p < .05 was considered statistically significant.

## Expression profiles of m6A RNA methylation regulators in GC

m6A RNA methylation regulators play a crucial role in progression of malignant tumors, therefore, we explored the expression profiles of 13 m6A RNA methylation regulators in GC. Higher overall expression levels of the 13 genes were observed in the tumor tissue compared with normal tissue (Fig. 1A). Notably, tumor tissue showed significantly higher expression levels of m6A regulators except for FTO and ALKBH5 compared with normal tissue with p value < 0.05. Further, correlation analysis showed that HNRNPC and YDHDF2 methylation regulators were highly correlated with GC progression (Cor = 0.57) (Fig. 1B).

## Identification of m6A-realted lncRNAs in GC

In order to identify potential m6A-related lncRNAs, Pearsoncorrelation coefficient was used to assess the relationship between m6A methylation regulators and lncRNAs. We identified 1351 interactions and 800 m6A-related lncRNAs with acorrelation coefficient > 0.3 and P value < 0.05. Among them, 117 interactions showed negative relationships between lncRNAs and m6A methylation regulators (Supplementary Table S1).

## lncRNA signature construction

A total of 339 patients samples (with survival time ≥ 30 days) expressing 800 lncRNAs were grouped into training dataset (N = 136) and testing dataset (N = 203). We performed univariate cox regression analysis and LASSO-penalized Cox regression analysis to construct a 11-lncRNA signature model of the training dataset (Fig. 2). The risk score for each patient in the training dataset, testing dataset and the complete dataset was calculated based on the risk formula: risk score = AL049840.3 * 0.599866058 + AC008770.3 * (-1.237087957) + AL355312.3 * (-0.19130367) + AC108693.2 * (-0.956067535) + BACE1-AS * (-0.362760192) + AP001528.1 * 0.528553101 + AP001033.2 * 0.594102051 + AC092574.1 * (-0.618599189) + AC010719.1 * (-0.337936563) + AC009090.3 * 0.770122519 + SAMD12-AS1 * (-0.766369919). Patients in the training dataset, testing dataset and complete dataset were further grouped into high-risk group and low-risk group. More deaths were reported for patients in the high-risk group compared to the low-risk group patients (Supplementary Fig. 1). K-M curve analysis results showed significant differences between risk groups (P < 0.05) (Fig. 3). The area under the ROC curve (AUC) for 5-year overall survival (OS) were 0.94, 0.85 and 0.89 in the training dataset, testing dataset and complete dataset, respectively, implying m6A RNA methylation regulators are effective prognosis markers of GC (Fig. 4).

## Prognostic value of lncRNA signature

The association between lncRNA signature and clinical factors were evaluated by performing KM curves analysis. As shown in Fig. 5A, we can found that the mortality rate was significantly increasing from younger group to older group (P < 0.05). Moreover, the stage also presented a significant divergence from stage I to stage IV (P < 0.05) (Fig. 5B). These result indicated that the higher risk score is, the greater the progression of GC is. Moreover, we also investigate the prognostic value of the m6A-reated lncRNA signature in GC patients stratified by clinicopathological variables, including age, gender, grade and stage. For all different stratification, patients in the high-risk group tended have a lower overall survival rate compared to low-risk group (Fig. 6A-H). These result suggesting that the m6A-related lncRNA signature can predict the prognosis of GC without the need to considering the clinical factors.

## Independent prognostic role of the lncRNA signature and individualized prediction model construction

To determine the independence of the lncRNA signature in the clinical factors, we conducting a univariate cox regression analysis and multivariate cox regression analysis among lncRNA signature and clinical factors. As shown in Fig. 7A-B, we found that the age, stage and lncRNA signature can can act as an independent prognostic factors for the prognosis of GC. Moreover, we further constructed a nomogram for the GC patients based on these independent factors (Fig. 8A). The calibration plots showed that the performance of the nomogram have a good concordance with the prediction of 1-, 3- and 5-year (Fig. 8B-D).

## Gene Set Enrichment Analyses (GSEA)

The potential pathways or functions of the m6A-related lncRNA signature were explored by performing Gene Set Enrichment analysis. As showed in Fig. 9A-C, we discovered that patients in high risk group mainly involved in cytokine receptor interaction, focal adhesion and ECM receptor interaction pathway, while basal transcription factors pathway were enriched in low-risk group (Fig. 9D).

## Consensus clustering of m6A-related lncRNAs

The NMF algorithm using m6A-related lncRNAs was used to explore the molecular subtypes of GC. In order to ensure a robust and reliable subtype, low expression level lncRNAs were filtered out and retained the lncRNAs that associated with survival of GC for univariate cox regression analysisto ensure a robust and reliable subtype.The data was then used in NMF clustering analysis. The cophenetic correlation coefficients were calculated to determine the optimal k value, and k = 4 was eventually selected as the optimal cutoff after comprehensively consideration (Fig. 10A, four subtypes were named C1, C2, C3 and C4). When k = 4, the consensus matrix heatmap still presented sharp and crisp boundaries, suggesting stable and robust clustering for the samples (Fig. 10B). Notably, the KM curve analysis result showed an significant survival difference among subgroups, of which C1 and C2 showed a better survival outcome compared to C3 and C4 (P = 0.031) (Fig. 10C). Moreover, the different expression pattern of the lncRNAs in the four subgroup suggesting that these lncRNA might play an important role in the subgroups (Fig. 10D). Further, we used the TIDE algorithm to predict response of the four subtypes to immunotherapy.Subtype C1 and C2 responded better to immunotherapy compared with subtype C3 and C4 (P = 0.0072).

## Disscussion

Several studies have reportedthat m6A-related lncRNA are implicatedin the development of various tumors, including GC (15). Therefore, exploring the role of lncRNA in the prognosis or diagnosis of GC will contribute to understanding the molecular mechanism of GC (16). Further, the role of m6A-related lncRNAs in the prognosis and diagnosis of GC are still not clear and deserve further study (17).

In the present study, we systematically investigate the role of m6A-related lncRNAs in the prognosis of GC. We retrieved 339 GC patients with survival time more than 30 days to the downstream analyses. By performing correlation analysis, univariate cox regression analysis and LASSO-penalized regression analysis, we successfully constructed a 11 m6A-related lncRNA signature. KM curve analysis result indicated that the signature could efficient stratify patients’ OS and have a robust high prognostic value. Further ROC analysis result suggesting that the lncRNA signature have a high accuracy in predicting the 5-year OS of GC. In this 11 -lncRNA signature, each lncRNA can be regarded as protective or risky prognostic factors for patients in GC. However, compared with one lncRNA, a multiple lncRNAs signature can evaluate the prognosis more accurately and efficiently (Supplementary Fig. 2). Thus, the m6A-related lncRNA signature could served as an reliable prognostic indicator for GC survival.

We also evaluate the relationship between clinical factors and lncRNA signature. We demonstrated that the risk score were presented an significantly increasing trend from young group to older group, indicating that the age factor could promote the development of GC. Similarly, the stage also presented increasing trend from early stage to advance stage. We also delighted to found that the age, stage and lncRNA signature can served as an independent prognostic risk factor for the prognosis of GC survival. Moreover, we constructed a nomogram based on the age, stage and lncRNA signature. The calibration plots showed the best performance in 1-year, 3-year and 5-year, which may help plan the short-term follow‐up for individual treatment. In addition, we identified four robust molecular subtypes based on the m6A related lncRNAs. We demonstrated that subtype C1 and C2 were corresponding to a better survival outcome compared to subtype C3 and C4, and subtype C1 and C2 are more likely to be responded to the immunotherapy than subtype C3 and C4. This may partly explain why the overall survival in the C1 and C2 were better than C3 and C4.

Overall, our prognostic model is based on the 11 m6A-related lncRNAs, which significantly lowers the cost of sequencing and have a high clinical application value. Moreover, the prognostic model showed a good performance for survival prediction in patients with GC. Nonetheless, several limitations needed to be discussed. Firstly, our prognostic model was constructed based on the TCGA database, lack of external cohort or patient cohort. Secondly, the expression level of lncRNAs needed to be further validated using the in vivo or in vitro experiments.

## Conclusions

In summary, we explored the expression levels and prognostic value of m6A-related lncRNAs through a myriad of analyses. We constructed a 11-lncRNA signature with a high prognostic value and can served as an independent prognostic factor for GC. To the best of our knowledge, this is the first report to developing a m6A-related lncRNA risk model for GC. whereby subtype C1 and C2 showed better response to immunotherapy compared with C3 and C4.The findings of this study provide new insight into understanding the role of m6A-related lncRNAs in GC and provide a basisfor the development ofpersonalizedtherapy.

## Declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

The datasets generated for this study can be found in the https://portal.gdc.cancer.gov/ and https://dcc.icgc.org/

Competing interests

The authors declare that they have no competing interests.

Funding

This study was supported by the National Science Foundation of China,(Grant/Award Number: 81902383), The Doctoral Scientific Research Startup Foundation of Liaoning Province (Grant/Award Number: 2019-BS-146), Revitalizing Liaoning Talents Program (Grant/Award Number: XLYC1907004)and Young and Middle-aged Scientific & Technological Innovation Talent Support Plan of Shenyang City(Grant/Award Number: RC200223).

Authors' contributions

Bin Ma designed the study, Haixu Wang and Qingkai Meng collected the clinical information and gene expression data, and analysis data. Haixu Wang wrote the manuscript and Bin Ma revised the manuscripts.

Acknowledgements

We appreciated TCGA database for providing the original study data.

## References

1. Shin H-R, Carlos MC, Varghese C. Cancer Control in the Asia Pacific Region: Current Status and Concerns. Jpn J Clin Oncol. 2012;42(10):867–81. doi:10.1093/jjco/hys077 %J Japanese Journal of Clinical Oncology.
2. Lordick F, Allum W, Carneiro F, Mitry E, Tabernero J, Tan P, et al. Unmet needs and challenges in gastric cancer: The way forward. Cancer Treat Rev. 2014;40(6):692–700. doi:https://doi.org/10.1016/j.ctrv.2014.03.002.
3. Katai H, Ishikawa T, Akazawa K, Isobe Y, Miyashiro I, Oda I, et al. Five-year survival analysis of surgically resected gastric cancer cases in Japan: a retrospective analysis of more than 100,000 patients from the nationwide registry of the Japanese Gastric Cancer Association (2001–2007). Gastric Cancer. 2018;21(1):144–54. doi:10.1007/s10120-017-0716-7.
4. He C. Grand Challenge Commentary: RNA epigenetics? Nat Chem Biol. 2010;6(12):863–5. doi:10.1038/nchembio.482.
5. Boccaletto P, Machnicka MA, Purta E, Piątkowski P, Bagiński B, Wirecki TK, et al. MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res. 2017;46(D1):D303-D7. doi:10.1093/nar/gkx1030.
6. Yang Y, Hsu PJ, Chen Y-S, Yang Y-G. Dynamic transcriptomic m6A decoration: writers, erasers, readers and functions in RNA metabolism. Cell Res. 2018;28(6):616–24. doi:10.1038/s41422-018-0040-8. PubMed PMID: 29789545.
7. Wang Q, Geng W, Guo H, Wang Z, Xu K, Chen C, et al. Emerging role of RNA methyltransferase METTL3 in gastrointestinal cancer. J Hematol Oncol. 2020;13(1):57. doi:10.1186/s13045-020-00895-1.
8. Tang C, Klukovich R, Peng H, Wang Z, Yu T, Zhang Y, et al ALKBH5-dependent m6A demethylation controls splicing and stability of long 3′-UTR mRNAs in male germ cells. (2018) 115(2):E325-E33. doi: 10.1073/pnas.1717794115 %J Proceedings of the National Academy of Sciences.
9. Ding C, Zou Q, Ding J, Ling M, Wang W, Li H, et al. Increased N6-methyladenosine causes infertility is associated with FTO expression. (2018) 233(9):7055–66. doi: 10.1002/jcp.26507.
10. Serrano-Gomez SJ, Maziveyi M, Alahari SK. Regulation of epithelial-mesenchymal transition through epigenetic and post-translational modifications. Molecular Cancer. 2016;15(1):18. doi:10.1186/s12943-016-0502-x.
11. Chen X-Y, Zhang J, Zhu J-S. The role of m6A RNA methylation in human cancer. Molecular Cancer. 2019;18(1):103. doi:10.1186/s12943-019-1033-z.
12. Rinn JL, Chang HY. Genome Regulation by Long Noncoding RNAs. (2012) 81(1):145 – 66. doi: 10.1146/annurev-biochem-051410-092902. PubMed PMID: 22663078.
13. Wang X, Li G, Luo Q, Xie J, Gan C. Integrated TCGA analysis implicates lncRNA CTB-193M12.5 as a prognostic factor in lung adenocarcinoma. Cancer Cell Int. 2018;18(1):27. doi:10.1186/s12935-018-0513-3.
14. Poursheikhani A, Abbaszadegan MR, Nokhandani N, Kerachian MA. Integration analysis of long non-coding RNA (lncRNA) role in tumorigenesis of colon adenocarcinoma. BMC Med Genomics. 2020;13(1):108. doi:10.1186/s12920-020-00757-2.
15. Wang Z-q, He C-y, Hu L, Shi H-p, Li J-f, Gu Q-l, et al. Long noncoding RNA UCA1 promotes tumour metastasis by inducing GRK2 degradation in gastric cancer. Cancer Lett. 2017;408:10–21. doi:https://doi.org/10.1016/j.canlet.2017.08.013.
16. Chandra Gupta S, Nandan Tripathi Y. Potential of long non-coding RNAs in cancer patients: From biomarkers to therapeutic targets. Int J Cancer. 2017;140(9):1955–67. doi:10.1002/ijc.30546.
17. Fang X-y, Pan H-f, Leng R-x, Ye D-q. Long noncoding RNAs: Novel insights into gastric cancer. Cancer Letters (2015) 356(2, Part B):357 – 66. doi: https://doi.org/10.1016/j.canlet.2014.11.005.