A m6A-related lncRNA Signature as a Novel Prognostic Factor for Gastric Cancer


 Background: N6-methyladenosine (m6A)isa common form of mRNA modification regulated by m6A RNA methylation regulators. However, studies have not explored the role of m6A-related lncRNA in gastric cancer (GC). This study aimed atexploring biological and prognostic roles of m6A-related lncRNA in GC. Methods: We identified 800 m6A-related lncRNAs through correlation analysis of 13 main m6A RNA methylation regulators and all lncRNAs expressed in GC. We further categorized patients into train group and testing group equally. Results: A total of 11 m6A-related lncRNA signature associated with prognosis of GC were identified through univariate cox regression analysis and LASSO analysis which was validated using the testing dataset and complete dataset, respectively. More deaths and shorter survival time were reported for patients in the high-risk group compared to low-risk group. lncRNA signature is an independent prognosis predictor as shown by cox regression analysis of the complete dataset. Moreover, genes involved in base excision repair were highly expressed in patients in the high-risk group as shown by gene set enrichment analysis (GSEA) result whereas ECM receptor interaction and focal adhesion pathway were enriched in low-risk group. A nomogram on independent factors showed clinical net benefit asan overall survival predictor of GC. In addition, we identified four subgroups of GC patients with significant differences in overall survival (OS).Subgroup C1 and C2 responded well to immunotherapy, compared to subgroup C3 and C4. Conclusions: m6A-related lncRNA signature and four molecular subgroups provide information on the underlying molecular mechanism of GC and provide for a basis for development ofpersonalized therapy.


Introduction
Gastric carcinoma (GC) is one of the most common malignanttumours and the third leading cause of cancer-relatedmortality (1). It is associated with poor prognosis with about about 80% of patients being diagnosed at an advanced stage (2).Surgery is the most effective therapy for gastric cancer. However, full recovery is not guaranteed for patients with recurrent or un-resectable GC. Notably, 5-year mortality rate for advanced gastriccancer is between 30-50% (3). Therefore, there is need to explore alternative prognostic markers.
Studies report that RNA modi cation plays a crucial role in the post-transcriptional regulation of gene expression (4). A total of 163 different RNA modi cations were reported in all living organisms through MODOMICS by 2017 (5). N6-methyladenosine (m6A) modi cations are the most common RNA modi cations and have been widely studied. m6A methylation is a dynamic reversible process carried out by methyltransferase complex (writers), demethylase (erasers), and function manager (readers) (6). The methyltransferase complex is composed of METTL3, METTL14, KIAA1429, WTAP, RBM15, and ZC3H13 which mediate RNA methylation modi cation process (7). Demethylase is composed of FTO and ALKBH5 and mediatesRNA demethylation process (8,9). On the other hand, the function manager includes YTHDC1, YTHDC2, YTHDF1, YTHDF2 and HNRNPC and play a rolein "reading" RNA methylated information and translation and degradation of downstream RNA (10). m6A through the "writers"link methyl groups to RNA which is further recognized by the "readers" to aid processes such as RNA processing, nuclear export, translation, and decay. Further, m6A plays a role in gene expression regulation through demethylation of RNA by demethylase (11).
Although the human genome can transcribe ~ 60000 genes only ~ 20000 of these genes are protein coding genes. The remaining ~ 40000 genes are mainly classi ed into three types: snRNA, miRNA and lncRNA. Among these non-coding genes, there are ~ 16000 lncRNA genes which account for a quarter of the total genes. lncRNA are at least 200 bp long and have a lower protein-coding potential compared to miRNA and snRNA (12). Previous studies report that lncRNAs play an important role in transcriptional and post transcriptional regulation and chromatin modi cation through regulation of gene expression (13).
Recent studies have reported aberrant lncRNAs expression as diagnostic and prognostic markers in tumors (14). However, the role of lncRNAs in m6A modi cation in GC have not been reported.
In the present study, we analyzed the role of m6A-related lncRNAs in overall survival of GC patients. We constructed 11 m6A-related lncRNA signature and validated using the testing dataset. Notably, the ndings of this study show m6A-related lncRNA signature can be used as an independent prognosis marker for GC. Patients were classi ed into four subgroups using non-negative matrix factorization (NMF) based on the expression of lncRNA.Subgroup C4 showed better response to immunotherapy compared with the other subgroups. The ndings of this study provideinformation on the role of m6A in GC.

Materials And Methods
Data collection and correlation analysis RNA transcriptome dataset and corresponding GC clinical information were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) database. Geneswere grouped into protein coding genes and lncRNA genesbased on the human genome annotation data. Expression levels of 13 m6A regulator genes were also determined. We used Pearson correlation coe cient to evaluate the correlation between m6A regulator genes and lncRNA. Geneswith absolutecorrelation coe cient > 0.3 and P value < 0.05 were considered as m6A-related lncRNA. Patients were then grouped into two equal groups: training dataset and testing dataset. Retrieved data were used for subsequent bioinformatics analysis.

Risk model construction
Univariate cox regression analysis, least absolute shrinkage and selection operator (LASSO)-penalized Cox regression analysis were employed to identify m6A-related lncRNA and for construction of prognostic lncRNA signature in the training dataset. The risk score was calculated based on the following formula: , where Exp i represents each lncRNA expression and β i represents the coe cient of each lncRNA. Patients in the testing dataset were further grouped into high-risk and low-risk group based on the risk score.Receive operating characteristic (ROC) curve analysis was used to evaluate the accuracy of lncRNA signature in the training dataset, testing dataset and the complete dataset. The relationship between lncRNA signature and other clinical parameters inGC The complete dataset with corresponding clinical information were used for subsequent analysis. The relationship between lncRNA signature risk score and clinical traits was determined using univariate cox regression analysis and multivariate cox regression analysis and p < 0.05 was considered statistically signi cant.

Construction and validation of Nomogram model
Nomogram prediction model was constructed using the lncRNA signature risk score and independent clinical factors using "rms" R package. The calibration plot was applied to validate the calibration and accuracy of the nomogram (by a bootstrap method with 1,000 replicates). Moreover, accuracy of lncRNA signature risk score in grouping patients and independent clinical factors was validated using the decision curve analysis.

Gene set enrichment analyses
Gene Set Enrichment Analyses (GSEA) method was used to explore the potential KEGG pathway implicated in the lncRNA prognostic signature. The reference gene set was retrievedfrom c2.cp.kegg.v7.1.symbols les and P < 0.05 and a false discovery rate q < 0.25 were considered statistically signi cant.

Non-negative matrix factorization consensus clustering
We used non-negative matrix factorization (NMF) clustering methods to explore potential molecular subgroups based on the prognostic lncRNA expression pro le. The optimal K cluster was selected based on the cophenetic correlation coe cient. We further used the TIDE algorithm to predict the response of the four subgroups to immunotherapy.

Statistical analysis
Computational and statistical analyses were performed using R software (https://www.rproject.org/).Kruskal-Wallis test was used to compare the four subgroupsandexplore their response to immunotherapy. Differences between high-risk group and low-risk group were determined using Kaplan-Meier curve and log-rank test. Clinical datawere analyzed using chi-square test or Fisher's exact test. For all analyses p < .05 was considered statistically signi cant.

Results
Expression pro les of m6A RNA methylation regulators in GC m6A RNA methylation regulators play a crucial role in progression of malignant tumors, therefore, we explored the expression pro les of 13 m6A RNA methylation regulators in GC. Higher overall expression levels of the 13 genes were observed in the tumor tissue compared with normal tissue (Fig. 1A). Notably, tumor tissue showed signi cantly higher expression levels of m6A regulators except for FTO and ALKBH5 compared with normal tissue with p value < 0.05. Further, correlation analysis showed that HNRNPC and YDHDF2 methylation regulators were highly correlated with GC progression (Cor = 0.57) (Fig. 1B).
Identi cation of m6A-realted lncRNAs in GC In order to identify potential m6A-related lncRNAs, Pearsoncorrelation coe cient was used to assess the relationship between m6A methylation regulators and lncRNAs. We identi ed 1351 interactions and 800 m6A-related lncRNAs with acorrelation coe cient > 0.3 and P value < 0.05. Among them, 117 interactions showed negative relationships between lncRNAs and m6A methylation regulators (Supplementary Table S1).
lncRNA signature construction Prognostic value of lncRNA signature The association between lncRNA signature and clinical factors were evaluated by performing KM curves analysis. As shown in Fig. 5A, we can found that the mortality rate was signi cantly increasing from younger group to older group (P < 0.05). Moreover, the stage also presented a signi cant divergence from stage I to stage IV (P < 0.05) (Fig. 5B). These result indicated that the higher risk score is, the greater the progression of GC is. Moreover, we also investigate the prognostic value of the m6A-reated lncRNA signature in GC patients strati ed by clinicopathological variables, including age, gender, grade and stage. For all different strati cation, patients in the high-risk group tended have a lower overall survival rate compared to low-risk group (Fig. 6A-H). These result suggesting that the m6A-related lncRNA signature can predict the prognosis of GC without the need to considering the clinical factors.
Independent prognostic role of the lncRNA signature and individualized prediction model construction To determine the independence of the lncRNA signature in the clinical factors, we conducting a univariate cox regression analysis and multivariate cox regression analysis among lncRNA signature and clinical factors. As shown in Fig. 7A-B, we found that the age, stage and lncRNA signature can can act as an independent prognostic factors for the prognosis of GC. Moreover, we further constructed a nomogram for the GC patients based on these independent factors (Fig. 8A). The calibration plots showed that the performance of the nomogram have a good concordance with the prediction of 1-, 3-and 5-year ( Fig. 8B-D).

Gene Set Enrichment Analyses (GSEA)
The potential pathways or functions of the m6A-related lncRNA signature were explored by performing Gene Set Enrichment analysis. As showed in Fig. 9A-C, we discovered that patients in high risk group mainly involved in cytokine receptor interaction, focal adhesion and ECM receptor interaction pathway, while basal transcription factors pathway were enriched in low-risk group (Fig. 9D).

Consensus clustering of m6A-related lncRNAs
The NMF algorithm using m6A-related lncRNAs was used to explore the molecular subtypes of GC. In order to ensure a robust and reliable subtype, low expression level lncRNAs were ltered out and retained the lncRNAs that associated with survival of GC for univariate cox regression analysisto ensure a robust and reliable subtype.The data was then used in NMF clustering analysis. The cophenetic correlation coe cients were calculated to determine the optimal k value, and k = 4 was eventually selected as the optimal cutoff after comprehensively consideration (Fig. 10A, four subtypes were named C1, C2, C3 and C4). When k = 4, the consensus matrix heatmap still presented sharp and crisp boundaries, suggesting stable and robust clustering for the samples (Fig. 10B). Notably, the KM curve analysis result showed an signi cant survival difference among subgroups, of which C1 and C2 showed a better survival outcome compared to C3 and C4 (P = 0.031) (Fig. 10C). Moreover, the different expression pattern of the lncRNAs in the four subgroup suggesting that these lncRNA might play an important role in the subgroups (Fig. 10D). Further, we used the TIDE algorithm to predict response of the four subtypes to immunotherapy.Subtype C1 and C2 responded better to immunotherapy compared with subtype C3 and C4 (P = 0.0072).

Disscussion
Several studies have reportedthat m 6 A-related lncRNA are implicatedin the development of various tumors, including GC (15). Therefore, exploring the role of lncRNA in the prognosis or diagnosis of GC will contribute to understanding the molecular mechanism of GC (16). Further, the role of m6A-related lncRNAs in the prognosis and diagnosis of GC are still not clear and deserve further study (17).
In the present study, we systematically investigate the role of m6A-related lncRNAs in the prognosis of GC. We retrieved 339 GC patients with survival time more than 30 days to the downstream analyses. By performing correlation analysis, univariate cox regression analysis and LASSO-penalized regression analysis, we successfully constructed a 11 m6A-related lncRNA signature. KM curve analysis result indicated that the signature could e cient stratify patients' OS and have a robust high prognostic value. Further ROC analysis result suggesting that the lncRNA signature have a high accuracy in predicting the 5-year OS of GC. In this 11 -lncRNA signature, each lncRNA can be regarded as protective or risky prognostic factors for patients in GC. However, compared with one lncRNA, a multiple lncRNAs signature can evaluate the prognosis more accurately and e ciently ( Supplementary Fig. 2). Thus, the m6A-related lncRNA signature could served as an reliable prognostic indicator for GC survival.
We also evaluate the relationship between clinical factors and lncRNA signature. We demonstrated that the risk score were presented an signi cantly increasing trend from young group to older group, indicating that the age factor could promote the development of GC. Similarly, the stage also presented increasing trend from early stage to advance stage. We also delighted to found that the age, stage and lncRNA signature can served as an independent prognostic risk factor for the prognosis of GC survival. Moreover, we constructed a nomogram based on the age, stage and lncRNA signature. The calibration plots showed the best performance in 1-year, 3-year and 5-year, which may help plan the short-term follow-up for individual treatment. In addition, we identi ed four robust molecular subtypes based on the m6A related lncRNAs. We demonstrated that subtype C1 and C2 were corresponding to a better survival outcome compared to subtype C3 and C4, and subtype C1 and C2 are more likely to be responded to the immunotherapy than subtype C3 and C4. This may partly explain why the overall survival in the C1 and C2 were better than C3 and C4.
Overall, our prognostic model is based on the 11 m6A-related lncRNAs, which signi cantly lowers the cost of sequencing and have a high clinical application value. Moreover, the prognostic model showed a good performance for survival prediction in patients with GC. Nonetheless, several limitations needed to be discussed. Firstly, our prognostic model was constructed based on the TCGA database, lack of external cohort or patient cohort. Secondly, the expression level of lncRNAs needed to be further validated using the in vivo or in vitro experiments.

Conclusions
In summary, we explored the expression levels and prognostic value of m6A-related lncRNAs through a myriad of analyses. We constructed a 11-lncRNA signature with a high prognostic value and can served as an independent prognostic factor for GC. To the best of our knowledge, this is the rst report to developing a m6A-related lncRNA risk model for GC. whereby subtype C1 and C2 showed better response to immunotherapy compared with C3 and C4.The ndings of this study provide new insight into understanding the role of m6A-related lncRNAs in GC and provide a basisfor the development ofpersonalizedtherapy.

Declarations
Ethics approval and consent to participate Not applicable

Consent for publication
Not applicable Availability of data and materials The datasets generated for this study can be found in the https://portal.gdc.cancer.gov/ and https://dcc.icgc.org/

Competing interests
The authors declare that they have no competing interests.       The receive operator curve (ROC) analysis for the m6A-related lncRNAs signature in the training dataset (A), testing dataset (B) and complete dataset (C), respectively.         Gene set enrichment analysis of the m6A-related lncRNA signature in the high-risk group (A-C) and lowrisk group (D), respectively. Gene set enrichment analysis of the m6A-related lncRNA signature in the high-risk group (A-C) and lowrisk group (D), respectively.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.