Development and validation of a 13-m6a related lncRNA prognostic signature for lung adenocarcinoma

Considerable evidence suggests that N6-methyladenosine (m6A) is involved in the regulation of long noncoding RNA (lncRNA), whichparticipates in the occurrence, development and prognosis of tumorscancerBut the relationship between m6A regulators-related lncRNA (mRlncRNA) and lung adenocarcinoma (LUAD) remains unclear. This study aims to determine a feature based on mRlncRNA for prognostic evaluation of LUAD patients. By integrating the gene expression data of LUAD and normal samples from the Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) database, the m6A gene and mRlncRNA with imbalanced expression were screened out. Then we used the least absolute shrinkage and selection operator (LASSO) to obtain the 13-lncRNA prognostic signature in the TCGA training cohort. Patients were divided into two risk groups based on the risk score of lncRNAs characteristics, and their overall survival (OS) was signicantly different. The predictive power of this signature was veried in TCGA testing cohort and entire TCGA cohort. These landmark lncRNAs were involved in several biologiocal processes and pathways related to cell cycle, DNA replication, P53 signaling pathway and mismatch repair. Besides, the high-risk group was low-response to cisplatin, while high-response to mitomycin, docetaxel and immunotherapy. In conclusion, we identied a 13-mRlncRNA model associated with prognosis and treatment sensitivity in LUAD, which may provide clues about the inuence of m6A on lncRNA in LUAD and promote the further improvement of LUAD individualized treatment strategies. least absolute shrinkage and selection operator; OS: overall survival; DElncRNA: differentially expressed lncRNA; ROC: receiver operating characteristic; AUC: area under the ROC curve; GSEA: gene set enrichment analysis; checkpoint


Background
The latest cancer statistics revealed that lung cancer is currently the second leading cause of new cancer cases and the leading cause of death worldwide [1]. The most common type of lung cancer is LUAD, which accounts for about 40% of all lung cancer cases [2]. Although new drugs targeting key gene mutations improve the survival rate of patients with advanced LUAD, these mutations are not common in LUAD, leading to only some patients bene tting from this treatment program [3]. Therefore, to further explore the therapeutic targets of LUAD for developing effective drugs is in urgent need.
LncRNA is a non-coding RNA de ned as a length of more than 200 nucleotides, its abnormal expression is also considered to play an important role in the occurrence and development of lung cancer [4]. For instance, lncRNA JPX participates in the progression of EMT by activating Wnt /β-catenin signaling, and is closely related to tumor TNM stage [5]; LINC00312 is reported to be a tumor neovascularization-related lncRNA as well as a promising biomarker of glioma patients [6]. LncRNAs have also been shown to play a key role in cancer immunity of LUAD. LINC00301 can target TGF-β to increase regulatory T cells in nonsmall-cell lung cancer (NSCLC) while reducing CD8 + T cell population [7]. Athie et al. con rmed that the expression of lncRNA ALAL-1 was negatively correlated with the immune in ltration of lung squamous cell carcinoma, thereby mediating immune in ltration [8].
N6-methyladenosine (m6A), methylated at the N6 position of adenosine, is considered to be the most common, abundant and conserved within eukaryotic messenger RNA and lncRNA transcription modi cation [9]. The m6A modi cation is regulated by three types of m6A regulators, writers (methyltransferases), erasers (demethylases) and readers (signal transducers) [10]. The m6A modi cation is a dynamic and reversible epigenetic process which involved in almost all steps of RNA metabolism, including translation, degradation, and folding [11]. More and more studies have shown that m6A regulators can regulate the occurrence and development of tumors in LUAD and other cancer. YTHDF1 enhances the translation of EIF3C, thereby promoting ovarian carcinogenesis and metastasis [12]. IGF2BP1 overexpression/knockdown could promote/inhibit endometrial cancer cell proliferation, regulate cell cycle and cancer progression [13]. Low expression of FTO in LUAD cells inhibits LUAD progression by suppressing cell proliferation and invasion [14]. However, the role of m6A regulators in lncRNA dysregulation is not fully understood in LUAD, and whether m6A modi cations are involved in lncRNA-dependent LUAD progression remains unclear.
In our research, we identi ed the differentially expressed m6A regulators based on the LUAD datasets of TCGA and GEO. In addition, we developed 13-mRlncRNA prognostic signature based on the TCGA training cohort, and evaluated its prognostic predictive effect in the test cohort and the entire TCGA cohort. At the same time, we investigated the potential role of the mRlncRNA signature in immune checkpoint inhibitor (ICI) immunotherapy and chemotherapy. Moreover, a nomogram was constructed to more accurately predict OS of LUAD patients.

Data sources and process
As shown in Fig. 1, the RNA-Seq expression data with corresponding clinical information were obtained from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov). Samples without OS and complete clinicopathological characteristics were excluded. Ultimately, we obtained 464 tumor samples and 54 normal samples from 448 patients. At the same time, we excluded RNAs with an average expression of less than 1 in TCGA. Besides, to make the differential m6A regulators identi ed by the subsequent analysis more accurate, we retrieved and nally downloaded three LUAD datasets (GSE18842 [15] (HNRNPA2B1, HNRNPC, IGF2BP1, IGF2BP2, IGF2BP3, RBMX, YTHDC2, YTHDC1, YTHDF1, YTHDF2, and YTHDF3). Subsequently, two m6A regulators (METTL16, VIRMA) were excluded because they could not be detected in the three datasets GSE18842, GSE19804, GSE33356.

Difference analysis
The R package of Deseq2 and the Wilcoxon test were used to identify differentially expressed lncRNA (DElncRNA) and m6A methylation regulators, respectively. For DElncRNA, | log 2 (fold change) | > 0.5 and P < 0.05 were considered signi cant, while for m6A methylation regulators, P < 0.05 was considered signi cant.

M6A-related lncRNAs and survival analysis
The Spearman correlation test was used to identify mRlncRNA in DElncRNA. | Spearman correlection |> 0.5 and P <0.05 were considered signi cant. Next, according to the best expression cut-off (the expression that yields maximal difference with regard to survival between the two groups at the lowest log-rank P-value), the patients were devided into high-risk group and low-risk group. Univariate Cox analysis was performed on mRlncRNA to identify prognostic mRlncRNA.
Development of a prognostic signature for LUAD LASSO is a practical tool that can lter variables in a large amount of data to reduce the complexity of the prediction model [19]. To determine the prognosis-related lncRNA signature, the R package of glmnet [20] was used to t a LASSO to the mRlncRNA. As shown in Table 3, a total of 446 tumor samples of LUAD patients in TCGA were randomly divided into training cohort (n = 356) and testing cohort (n = 90) at a ratio of 8:2 to construct and validate the lncRNA signature. The model was constructed by LASSO on the training cohort. Then, it was validated in the testing cohort and entire TCGA cohort. According to the expression level (x i ) of the prognostic lncRNA and the regression coe cient (β i ) from the multivariate Cox analysis model, an equation for calculating the risk score is generated, as shown below: Risk score= . To estimate the prognostic power of risk score for OS, Kaplan-Meier method and log-rank test were used for survival analysis with an overall signi cance level of P <0.05. Besides, we constructed time-dependent receiver operating characteristic (ROC) curves with 1-, 3-and 5-year follow-ups as the de ning point. And we used the R package of time ROC to calculate the area under the ROC curve (AUC) to evaluate the predictive ability of risk score.
Comparison of immune cell in ltration in high-risk and low-risk group Cibersort, a versatile calculation method for quantifying cellular tissue gene expression pro les cell fraction, enables accurate estimation of immune combinations for tumor biopsy [21]. The immune enrichment score was used to compare the distribution of immune cell types between high-risk and lowrisk groups (classi ed by prognostic signatures).

Pathway enrichment analysis
The gene set enrichment analysis (GSEA) is a computational method used to assess whether a prede ned gene set shows statistically signi cant and consistent differences between two groups (e.g. phenotypes) [22,23]. To explore the potential functions of the mRlncRNA signature, we performed GSEA enrichment analysis by GSEA software (downloaded from http://software.broadinstitute.org/gsea/index.jsp, v4.1) for the high-risk and low-risk groups. The gene database of "c2.cp.kegg.v7.4.symbols.gmt" from the molecular signature database was analyzed. The signi cance threshold was set at | normalized enrichment score |>1 P <0.05 and false discovery rate < 0.25. The results were plotted by using the R package of ggplot2.
Analysis of the correlation of risk score with molecular subtype and clinical stage The Wilcoxon test was conducted to explore the correlation of the risk score with the subtype and clinicopathological characteristics, and statistical signi cance was indicated by a P value less than 0.05. The LUAD subtype were obtained from a previous study [24].

Nomogram construction
In order to predict 1-, 3-and 5-year survival rates, a nomogram was constructed based on the results of LASSO analysis. The R package of rms was used to test its prediction accuracy through a calibration chart.

Results
The owchart of the present study was summarized in Fig. 1.

Identi cation of differentially expressed m6A regulators and lncRNA
To assess the m6A regulators that may be involved in the development of LUAD, we systematically investigated the expression patterns of m6A regulators between LUAD and normal lung tissues based on the available TCGA and GEO cohorts. As shown in Table 2, m6A related genes RBM15B, IGF2BP1, IGF2BP3, YTHDF1, HNRNPC, and HNRNPA2B1 were signi cantly differentially expressed and upregulated in all four LUAD datasets (P < 0. 05). Especially, the expression of YTHDC1 was up-regulated in LUAD only in TCGA, while it was down-regulated in other datasets. So YTHDC1 was ruled out. Besides, 5907 DElncRNAs were identi ed. These results indicated that m6A RNA methylation regulators possessed essential biological roles in LUAD development.

Identi cation of mRlncRNA and prognostic lncRNA
We obtained 1035 mRlncRNAs by evaluating the correlation between 6 m6A regulators and 5907 DElncRNAs. Then 99 mRlncRNAs associated with prognosis were con rmed according to univariate Cox analysis.
To assess the ability of the lncRNA signature in predicting the survival of LUAD, we established a risk score of the 13-lncRNA signature with the following formula: Evaluating the predictive power of the 13-lncRNA prognostic signature in training cohort The grouping was determined based on the best cut-off value of the 13-lncRNA signature, and the 356 LUAD patients of the training cohort were thus divided into high-risk group (n = 197) and low-risk group (n = 159). Fig. 3a and Fig. 3b illustrated the risk score, survival time and survival status of each patient in the training cohort. A heat map of 13-lncRNA expression was shown in Fig. 3c. Fig. 3d showed the Kaplan-Meier survival curves for the low-and high-risk groups in the training cohort. The survival time was signi cantly lower in the high-risk group than in the low-risk group (P < 0.001). In addition, timedependent receptor operating characteristic (ROC) curves of the 13-lncRNA prognostic signature were shown in Fig. 3e. The area under the curve (AUC) of 1-,3-,5-year OS for the 13-lncRNA prognostic signature risk score was 0.66, 0.69, and 0.71, respectively.
Validation of the 13-lncRNA prognostic signature in testing cohort and entire TCGA LUAD cohort To further con rm the predictive power and stability of the 13-lncRNA signature in predicting OS of LUAD patients, we validated the 13-lncRNA signature in the testing cohort (n = 90) and the entire TCGA LUAD cohort (n = 389) ( Fig. 4 and Fig. 5). The risk score of 13-lncRNA signature was calculated according to the above formula. The KM survival curves of the high-risk group and low-risk group in the test cohort were shown in Figure 4D. Survival time was signi cantly higher in the low-risk group than that in the high-risk group (P = 0.009). Multivariate Cox regression analysis revealed that the risk score of 13-lncRNA signature (HR = 5.8,95% CI = 1.6-22) may be an independent prognostic indicator of LUAD. In addition, time-dependent ROC curves for 13-lncRNA prognostic signature were plotted. Consistent with the ndings in the training cohort, 13-lncRNA prognostic signature had a good discrimination performance on the prognosis of patients with LUAD. The 1-year, 3-year, and 5-year AUC values of the prognostic signature risk scores were 0.61, 0.68, and 0.76, respectively (Fig. 4E). Similar validation results were obtained in the entire TCGA cohort of LUAD. Therefore, risk score of the 13-lncRNA signature might be considered as an independent prognostic indicator of LUAD OS.

Comparison of immune cell in ltration in high-risk and low-risk group
The modi cation of mRNA and lncRNA by m6A has been shown to contribute to the post-transcriptional regulation of the immune response [25]. Therefore, we explored the differences in the 22 tumor-in ltrating immune cells between the two groups.
In the high-risk group, the proportions of dendritic cells activated, macrophages M1, macrophages M2, neutrophils, T cells CD4 memory activated were increased, while the proportions of B cells memory, T cells CD4 memory resting were decreased (Fig. 6a, all P < 0. 05). These ndings strongly suggested that this 13-lncRNA signature was associated with prognosis by interfering with immune cell in ltration in LUAD.
The sensitivity to immunotherapy and chemotherapy among the high-risk and low-risk group Chemotherapy and immunotherapy are currently essential part of the treatment for advanced NSCLC [26]. The chemotherapy response for cisplatin, mitomycin and docetaxel of each LUAD patient in the TCGA cohort was calculated by using the R package of pRRophetic [28]. The estimated IC50 values showed that high risk group demonstrated had a better response to mitomycin and docetaxel, while low risk group had a better response to cisplatin (Fig. 6b, all P < 0. 05). THE expression of PD-L1 on tumor cells has been shown to be positively correlated with the e cacy of PD-1/PD-L1 in NSCLC [27]. Fig. 6c showed that the expression levels of PD-L1, LAG-3, TIM-3 in the high-risk group were higher than that in the low-risk group (all P < 0.05). Above ndings demonstrated that the mRlncRNA risk model had potential predictive value for chemotherapy and immunotherapy.
Pathway enrichment analysis GSEA was further used to verify the potential molecular mechanism between high-risk and low-risk groups. The results showed that compared with the low-risk group, the high-risk group was more enriched in cell cycle, DNA replication, P53 signaling pathway, mismatch repair and nucleoside exception repair, all of which were highly related to tumorigenesis, tumor development and immunotherapy (Fig. 6d).
Identi cation of risk score as a key regulator to promote LUAD progression The proximal-in ammatory (PI) and the proximal-proliferative (PP) subtypes were associated with higher risk scores, while the terminal-respiratory unit (TRU) subtype was associated with lower risk scores (Fig.7  a). The correlation between risk score and clinicalopathological characteristics of LUAD patients were shown in Fig.7 b-g, indicating that risk score was positively correlated with several categories, namely pathologic stage, tumor stage, lymph node status (all P <0.05).

Construction and validation of nomogram
To create a clinically applicable scoring tool to predict the OS of LUAD patients at 1-, 3-, and 5 years, we used risk score (based on 13-lncRNA signature), age, and pathological stage to create a nomogram (Fig.  8a). The 1-year, 3-year, and 5-year OS calibration curves showed that the nomogram's prediction of OS did not deviate signi cantly from the actual OS (Fig. 8b).

Discussion
Lung cancer is the leading cause of cancer deaths around the world, causing nearly 1.8 million deaths in 2020 [1]. The 5-year survival rate for stage IV lung cancer patients is less than 10% [29]. Therefore, it is necessary to nd more effective biomarkers to guide the treatment of LUAD patients. Abnormal expression of m6A regulators has been con rmed to play an important role in tumor. In our study, we rst analyzed multiple data sets and observed that the expression of 6 m6A regulatory genes (RBM15B, The same as m6A regulators, lncRNA is also a potential new type of cancer biomarker and therapeutic target. LncRNA is abundantly expressed in a variety of cancer, and its abnormal expression and mutation are widely related to tumorigenesis, metastasis and tumor staging [34]. Studies show that certain lncRNAs are speci cally expressed in the plasma of lung cancer patients, which can improve the accuracy of early lung cancer screening [35]. For example, METTL3, one of the m6A regulatory genes, increases the expression of lncRNA ABHD11-AS1 by modifying the m6A of ABHD11-AS1, which promotes the proliferation of cancer cells and the Warburg effect, resulting in a poor prognosis for NSCLC patients [36]. Besides, lncRNAs also regulate m6A methylation in cancer. LINC01234, which is overexpressed in NSCLC, interacts with HNRNPA2B1 to promote cell proliferation and inhibit apoptosis [37]. Similarly, LIN28B-AS1 could interact with IGF2BP1 to promote the proliferation and metastasis of LUAD [38].
However, the research on lncRNAs related to m6A in LUAD is still lacking. This study aims to provide new ideas for the regulation of m6A on lncRNA in LUAD patients.
The crosstalk between tumor cells and immune cells in the tumor microenvironment (TME) plays an important regulatory role in tumorigenesis, including patient prognosis and treatment response [46]. The m6A modi cation has been con rmed to play an important role in the regulation of anti-tumor immunity in dendritic cells [47]. Some studies have shown that lncRNA is not only a prognostic biomarker, but also can be used to distinguish the proportion of immune cells in ltrated in characteristics of TME [48,49]. Therefore, we further studied the TME in LUAD by comparing the in ltration of 22 immune cell types. We found that the in ltration levels of dendritic cells activated, macrophages M1, macrophages M2, neutrophils, T cells CD4 memory activated were higher in the high-risk group than those in the low-risk group. Conversely, the proportions of B cells memory, T cells CD4 memory resting were attenuation in the high-risk group. In a mouse model of NSCLC, in ltrating neutrophils contribute to tumor development by promoting immune escape and altering angiogenesis leading to a hypoxic environment [50]. Similarly, lung cancer cells induce memory CD4 + T cells to secrete interleukin-22 to promote tumor growth [51]. Macrophages, which constitute 50% of the mass of solid cancers, are related to the poor prognosis of a variety of tumors. The pro-in ammatory M1 macrophage phenotype expressed mainly in the initial stage of tumor promotes tumor transformation, and in the later stage of carcinogenesis, most of the macrophage population transforms into the anti-in ammatory M2 phenotype [52]. These indicate that the 13-mRlncRNA signature may interfere with immune cell in ltration in LUAD, while the exact regulatory mechanism of these mRlncRNAs on immune cells is still unknown and more researches are needed.
Although chemotherapy remains the primary treatment for lung cancer, recent developments in immunotherapy have also greatly improved the survival rates of lung cancer patients [53]. In our study, the ICI biomarkers LAG-3, PD-L1 and TIM-3 were signi cantly high expressed in the high-risk group. There is no doubt that PD-L1 is an important target of LUAD immunotherapy, which has shown promising e cacy in patients with NSCLC [55,56]. TIM-3 positivity is closely related to the worse survival of LUAD [57]. Besides, the expression of LAG-3 is related to the poor prognosis of LUAD, which is also related to the high expression of PD-L1 [58]. METTL3 increases the level of M6A modi cation of lncRNA MALAT1, which ultimately promotes the induction of NSCLC metastasis and cisplatin resistance [53]. The response to cisplatin, mitomycin and docetaxel between the two groups was analyzed by using the R package of pRRophetic. The results demonstrated that high and low-risk groups showed different responses to cisplatin, mitoycin and docetaxel, respectively. In summary,13-mRlncRNA may modulate immunotherapy and chemotherapy responses in LUAD patients.
GESA results showed that high-risk group was signi cantly enriched in classic tumor malignant characteristic pathways, including cell cycle, DNA replication, P53 signaling pathway, mismatch repair, and nucleoside exception repair. Previous experimental studies provided evidences that mismatch repair is associated with LUAD progression [59]. Huang et al. found that the activation and up-regulation of P53 signal contributes to lung cancer cell apoptosis and inhibits cell proliferation by arresting the cell cycle in the G1 phase [60]. These studies have suggested that P53 and other signaling pathways may serve as the targets for m6A methylation modi cation. Therefore, these indicate that m6A methylation modi cation, P53 and other signaling pathways are jointly involved in the differential regulation of TME between the high and low risk subgroups.
Based on the study of gene expression, three molecular types of LUAD have been proposed, namely terminal-respiratory unit (TRU), the proximal-in ammatory (PI) and the proximal-proliferative (PP) subtypes. These subtypes are also con rmed to be closely related to the clinicopathological characteristics of the patients s and show a strong ability to predict prognosis [24]. In our study, a higher risk score is associated with PI and PP, while a lower risk score is associated with TRU. Consistent with our results, patients with TRU subtypes have a lower tumor proliferation rate and a better prognosis. On the contrary, patients with PI and PP subtypes are more likely to identify non-targeted driven mutant genes, which are associated with higher tumor proliferation rates and poor prognosis [61, 62].
Simultaneously, risk score is positively correlated with clinicalopathological characteristics such as pathological stage. These results indicated that 13-lncRNA signature may be a key regulator to stimulate LUAD progression.
Overall, these 13 lncRNAs can predict the prognosis of LUAD patients. They may affect the prognosis of patients by affecting the sensitivity of immunotherapy and chemotherapy, in uencing different immune cell in ltration, and activating different signaling pathways. These factors can cross talk with each other, or affect the prognosis of patients by activating other ways. However, these results need to be further studied.
This study is the rst to comprehensively identify and systematically analyze the characteristics of m6A regulators and mRlncRNA in LUAD. We have developed a 13-mRlncRNA prognostic model, and the results indicate that 13-mRlncRNA signature is a promising predictor, which may provide new insights for LUAD treatment strategies and guide effective therapy. However, our research also has certain limitations. Our study requires more cohorts of LUAD patients to verify the predictive power of the con rmation signature and nomogram. Another limitation is that the exact molecular mechanism of these 13 lncRNAs has not been studied. Therefore, it is necessary to conduct further in vitro and in vivo studies based on functional studies to verify these hypotheses and make these results more convincing in future clinical applications.   Figure 1 Flow chart of analysis.      Risk score is capable of propelling GC progress. (a-g) The relationship between risk score and molecular subtype, age, gender, pathologic stage, tumor stage, lymph node status, and metastasis.