Construction of a m6A RNA Methylation Regulator-related Signature for Prognosis and Immune Response in Lung Adenocarcinoma

Gene expressions with clinicopathological data were downloaded from The Cancer Genome Atlas (TCGA), and Gene Expression Omnibus (GEO) database. We compared the expression of twenty m6A regulator genes between tumor and normal samples. Univariate and least absolute shrinkage and selection operator (LASSO) Cox regression model were used to derive a multi-gene signature. A signature-based nomogram was developed, and the prediction performance was validated by an exterior validation set (GSE72094). Weighted gene co-expression network analysis (WGCNA) was performed to construct a co-expression network and identify the hub genes. The association of m6A signature with immunity was examined.


Background
Lung cancer is the most frequent type of malignant tumors, and results in the leading cancer-related cause of death in the United States (US) [1]. According to the US cancer statistic report, about 228,820 new cases were expected to be diagnosed with lung cancer in 2020, and approximately 135,720 Americans were projected to die of this cancer [2]. Lung adenocarcinoma (ADC), the most common histological type, accounts for approximately 49.7% of all lung cancers in the US [3], and exhibits an absolute growth in incidence currently [4,5]. Despite the great improvement has been achieved in lung cancer screening and personalized treatment modalities (precision medicine) in current years, the 5-year survival rates still remain dismal. As for stage I, the 5-year survival probability is 68-92%, while for stage IV, it decreased to 0-10% [6,7]. Therefore, exploring a prognostic molecular signature and developing a prediction nomogram are essential for selecting the optimum therapeutic strategies and improving the adverse prognosis for lung ADC patients.
Meanwhile, it is discovered that the changes of m6A regulator genes can exert important impacts on various physiological processes, such as self-renewal capability, circadian periods, embryonic development, stem cell differentiation, and cell death [22]. Moreover, increasing evidence has con rmed that aberrant m6A methylation modi cation is intimately related to cancer pathogenesis and development [11,23]. For instance, it is reported that METTL3 is generally up-regulated in clear cell renal cell carcinoma, and the deletion of METTL3 is signi cantly correlated with a worse survival [24]. In addition, another study also demonstrates the overexpression of METTL3 can dramatically suppress proliferation, migration and invasion for colorectal carcinoma (CRC) cells [25]. Thus, m6A regulator genes are not only promising to become the potential molecular markers for cancer diagnosis and prognosis, but also quali ed to serve as the targets for the design of targeted anticancer drugs, which has been demonstrated in several literatures [26][27][28]. Several studies have investigated the role of m6A regulator genes in the survival of cancers, such as glioma [29], gastric cancer [30], and head and neck squamous cell carcinoma [31]. However, they did not include the newly discovered m6A regulator genes, seldom constructed a m6A-based nomogram, and never investigated the association with immunity Therefore, in this study, we systematically evaluated the expression of twenty m6A regulator genes in 535 lung ADC and 59 normal samples, from which we screened out seventeen differentially expressed genes for further study. Meanwhile, we derived a multi-gene prognostic signature, and further assessed its performance in predicting the survival probability of lung ADC patients. Furthermore, we developed a prognostic nomogram to serve as a simple and reliable clinical tool to forecast each lung ADC patient's survival probability. To verify the model reliability, another independent cohort available from the GEO was used as an exterior validation set. Weighted gene co-expression network analysis (WGCNA) was performed to identify the risk-related modules, physiological biology, pathways, and hub genes. In addition, we also evaluated the association of m6A prognostic signature with immune microenvironment, immune in ltration, immune checkpoint molecules, tumor mutational burden (TMB), and the response of immune checkpoint blockade (ICB).

Datasets
The whole study was designed and conducted based on the following ow diagram (Fig. 1). The RNA-seq transcriptome data, clinicopathological data, and simple nucleotide variation data were downloaded from the TCGA database (https://portal.gdc.cancer.gov). For RNA-seq data, the expression value of each gene was normalized by the Fragment Per Kilobase Million (FPKM) method. To conduct an external validation, GSE72094 [32] was used as a validation set. The gene expression pro le with clinicopathological data was accessed from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/). As only a small number of patients were classi ed as "Asian" or "American Indian/ Alaska Native", we mainly concentrated on white and black patients.

Statistical analysis
Wilcoxon-Mann-Whitney test was utilized to compare the expression level of each m6A regulator gene between 535 lung ADC samples and 59 normal samples. In the meantime, we also utilized Wilcoxon-Mann-Whitney test to compare the expression level of each m6A regulator gene for age, gender, race, and smoking history, and Kruskal-wallis test for American Joint Committee on Cancer (AJCC) stage. In addition, spearman correlation analysis was performed to investigate the correlations among different m6A regulator genes. To verify our model performance of the training set, we employed an independent GEO cohort (GSE72094) as an exterior validation set. Furthermore, univariate Cox model was employed to assess the association of m6A regulator genes with overall survival (OS) in the training cohort, from which we screened and identi ed six genes signi cantly correlated with survival (P < 0.05). Then, we further included these genes into the least absolute shrinkage and selection operator (LASSO) Cox regression model to construct an optimum prognostic signature. Eventually, ve m6A regulator genes with their coe cients were ascertained based on lambda.min (the lambda value corresponding to the minimum mean error) via 10-fold cross validation, and thereby a ve-gene prognostic signature (IGF2BP1, IGF2BP2, HNRNPA2B1, METTL3, and HNRNPC) was derived. In order to gure out each patient's risk score in the training cohort and validation cohort, we performed a sum over each gene' score, which was acquired through multiplying the expression level of each gene by its coe cient. Patients were dichotomized into low-risk and high-risk groups according to the median value of risk scores. The prediction accuracy of the prognostic signature was assessed using Receiver operating characteristic (ROC) curves.
Chi-square tests were employed to compare the frequency distributions of grouped variables (including age, gender, race, smoking history, and AJCC stage) between the two risk groups. Additionally, survival curves between different risk groups were plotted by Kaplan-Meier method and compared by log-rank test. Subsequently, univariate Cox model was utilized to assess the association of variables (including risk score and clinicopathologic characteristics) with OS in the training set, from which we identi ed the variables signi cantly associated with survival (P < 0.05) and introduced them into the multivariate Cox model. For continuous variables (such as risk score), the restricted cubic splines (RCS) with three knots located at the empirical quantiles (10%, 50%, and 90%) were tted to relax the linearity assumption of the model [35] [35][35] [35]. Then, forest plots were drawn to better visualize each prognostic variable's effect on OS, and a nomogram on the basis of the coe cients was constructed to forecast the 1-year, 3-year, and 5-year survival probability of each lung ADC patient. Accordingly, concordance index (C-index), area under the ROC curve (AUC), and calibration curves were used to evaluate the model performance in the training and validation set. A decision curve analysis (DCA) was conducted to assess the clinical usefulness of the nomogram [36]. All statistical analyses were performed utilizing the statistical software R v3.5.2. A two-tailed P < 0.05 was considered statistically signi cant.

WGCNA
Differentially expressed genes (DEGs) between high-risk and low-risk groups were screened based on the following criteria: (1) false discovery rate (FDR) < 0.05; (2) |logFoldChange| > 1. We incorporated the identi ed DEGs into the WGCNA to construct a gene co-expression network. Based on the power value β, a weighted adjacency matrix was constructed, and then the matrix was changed into a topological overlap matrix (TOM). Subsequently, gene modules were identi ed using the dynamic shear approach.
According to the module signi cance (MS), and the correlation coe cients between Module eigengenes (MEs) and risk scores, the most risk-related module was determined. All the genes in the modules were included into the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses using the DAVID database (https://david.ncifcrf.gov/). The determination of hub genes were based on the following criteria: (1) conforming to the cutoff criteria (Module membership (MM) > 0.85 and Gene signi cance (GS) > 0.55); (2) the top ten percent of genes based on the connectivity of the weighted network; (3) the top ten percent of genes based on the degree of the protein-protein interaction (PPI) analysis; (4) genes signi cantly associated with survival (P < 0.05).

Immunity
We utilized ESTIMATE algorithm to calculate each patient's immune score, stromal score, and tumor purity [37]. CIBERSORT was used to determine the proportions of the 22 immune cell subtypes for each sample [38], and P-value < 0.05 was set as the threshold value. As immune checkpoint molecules were important for immunotherapy, we explored their associations with m6A prognostic signature. A total of ten immune checkpoint molecules (PD-L1, PD-1, PD-L2, CTLA-4, IDO1, LAG3, TIM-3, TIGIT, CD27, and ICOS) were examined, and the expression differences between the high-risk and low-risk groups were assessed using Wilcoxon-Mann-Whitney test. TMB is de ned as the total number of the somatic coding mutations per million bases. In this study, the mutational frequency of each sample was computed based on the number of variants divided by 38 (due to the length of exons being 38 million). For purpose of evaluating the potential responses of ICB, including CTLA-4 and PD-1, for the high-risk and low-risk groups, Tumor Immune Dysfunction and Exclusion (TIDE) algorithm was adopted, and the proportion of the response was calculated [39].

Results
The differential expression of twenty m6A regulator genes between normal and lung ADC samples Allowing for the crucial role of each m6A regulator gene in cancer pathogenesis and development, we systematically investigated the expressions of twenty m6A regulator genes between normal and lung ADC samples, and the expression level of each gene was visualized by heatmap and boxplot. As shown in the heatmap, the expression levels of m6A regulator genes were signi cantly different between tumor and normal samples (Fig. 2a). To be exact, HNRNPA2B1, HNRNPC, HNRNPG, IGF2BP1, IGF2BP2, IGF2BP3, KIAA1429, METTL3, RBM15, YTHDF1, YTHDF2, and YTHDF3 were signi cantly up-regulated in lung ADC tissues, while FTO, METTL14, METTL16, WTAP, and ZC3H13 were signi cantly up-regulated in normal control samples ( Fig. 2a-b). In addition, YTHDC1, YTHDC2, and ALKBH5 were observed with no signi cant difference between tumor and normal samples. Consequently, these three genes were removed, and a total of seventeen genes were included into further study. We also explored the genetic changes (simple nucleotide variation) of these seventeen m6A regulator genes, and discovered that the mutation frequency was generally low (no more than 3%) (Fig. 2c), which denoted that the expression changes of these genes were less likely to be caused by single nucleotide mutation.
The interaction and correlation among the seventeen m6A regulator genes For better comprehending the relationships among these m6A regulator genes, we further investigated their interactions and correlations. As shown in the interaction analysis ( Fig. 2d-e), METTL3 and METTL14 were regarded as the hub genes of the network, while IGF2BP1, and IGF2BP3 had less associations with others. Moreover, YTHDF3, IGF2BP2, and METTL16 had no relationship with the others. Notably, 'writer' genes had the highest while 'readers' genes had the least associations with others ( Fig. 2d-e). Interestingly, all the 'writer' genes (except for METTL16) were interacted with each other (Fig. 2d), while METTL16 had no association with the others. As displayed in the correlation analysis, seventeen m6A regulator genes tended to be weakly to moderately correlated with each other. In addition, we also observed that KIAA1429 was positively correlated with YTHDF3 ( Fig. 2f), and their correlation coe cient was the highest (r = 0.66) than the others (Fig. 2f). Meanwhile, the highest negative correlation coe cient was found between FTO and HNRNPC, whereas they merely had a low correlation (r = -0.29) (Fig. 2f). Besides, KIAA1429 was the only gene signi cantly correlated with the other 16 m6A regulator genes (Fig. 2f), suggesting a critical in uence in the network.
Development of a prognostic signature and identi cation of the low-risk and high-risk groups To assess the association of m6A regulator genes with OS, univariate Cox regression model was employed in the training set. Our ndings revealed that six out of seventeen genes were identi ed as the prognostic genes (P < 0.05), that is, IGF2BP1, IGF2BP2, IGF2BP3, HNRNPA2B1, METTL3, and HNRNPC (Fig. 3a). Then, we incorporated these genes into the LASSO Cox regression model to develop an optimum prognostic signature. Ultimately, ve genes were identi ed, and their coe cients were calculated based on the lambda.min via 10-fold cross validation (Fig. 3b-d). Thereby, a ve-gene prognostic signature (IGF2BP1, IGF2BP2, HNRNPA2B1, METTL3, and HNRNPC) was determined. Each patient's risk score was computed via the following formula: Risk score = 0.037*IGF2BP1 + 0.004*IGF2BP2 + 0.007*HNRNPA2B1 + 0.022*HNRNPC + (-0.125)*METTL3, and patients were dichotomized into high-risk and low-risk groups on the basis of the median risk score (value = 0.060).
Additionally, we also explored the distributions of risk scores and survival status, and our results revealed that low-risk subgroup had more alive cases and a higher OS than high-risk subgroup (Fig. 3e-f).
According to the results of the survival curve, we found that the high-risk group had a signi cantly poorer OS than low-risk group (P < 0.0001) (Fig. 4a), which indicated that the signature could successfully distinguish high-risk from low-risk lung ADC patients. Meanwhile, ROC curve demonstrated that the AUC of the ve-gene prognostic signature were 0.684 and 0.646 for 3-year and 5-year survival rate of lung ADC patients, respectively, which was slightly less than the "acceptable discrimination" (Fig. 4b) [40].
Identi cation of the risk score as a prognostic indicator and development of a predictive nomogram For purpose of ascertaining whether the ve-gene signature-derived risk score could be served as a prognostic marker, both univariate and multivariate Cox model were employed in the training cohort (n = 398). The clinicopathologic characteristics of the training cohort were displayed in Table 1. In the univariate Cox model, AJCC stage (HR, 1.629; 95% con dence interval (CI), 1.400-1.894; P < 0.0001), and risk score (HR, 3.332; 95% CI, 2.242-4.952; P < 0.0001) were determined to be signi cantly related to the OS for lung ADC patients (Fig. 5a). Further included into the multivariate Cox model, AJCC stage (HR, 1.574; 95% CI, 1.353-1.831; P < 0.0001) and risk score (HR, 3.119; 95% CI, 2.059-4.724; P < 0.0001) were still signi cantly related to survival, and could be perceived as the independent unfavorable prognostic factors (Fig. 5b). Moreover, a nomogram based on the coe cients from multivariate Cox model was derived to visually forecast the 1-year, 3-year and 5-year survival probability for each lung ADC patient (Fig. 5c). According to the variable values (AJCC stage, or risk score) of each patient, drew the vertical lines from the value points of corresponding variable row to the "Points" row. Identifying the intersection points of vertical lines with the "Points" row, each variable' score was acquired. By performing a sum over each variable' score, a total score was determined. Based on the derived total score, continued to draw a vertical line from the "Total Points" row to the "1-Year Survival", "3-Year Survival" or "5-Year Survival" row, then we would obtain a corresponding predicted survival probability.
In order to determine the discrimination e cacy and prediction accuracy of the nomogram in the training set, related indicators, such as C-index, AUC, and calibration curves, were evaluated. Our results revealed that the derived nomogram was well calibrated, as the curves were close to the diagonal line (Fig. 5d).
Besides, the C-index of the model was 0.71 (CI, 0.66-0.76) (Fig. 5d), and the AUC uctuated around 0.75 in the follow-up period, which indicated a satisfactory discrimination power (Fig. 5e). We also discovered that the nomogram had a higher discrimination power than the conventional prognostic index-AJCC stage (Fig. 5e). Subsequently, the DCA showed that the clinical net bene t gained from the nomogram was higher than that in the either hypothetical treat-all-patients or treat-none scenarios, when the threshold probabilities were within the range of 0.06-0.48, 0.18-0.81, and 0.42-0.80 for 1-year, 3-year, and 5-year OS, respectively (Fig. 5f). Therefore, the developed nomogram had a favorable clinical net bene t for predicting 1-year, 3-year OS, and 5-year OS.

Model validation in GEO dataset
To further verify the model reliability, a total of 320 lung ADC samples available from the GSE72094 were used as a validation cohort. The clinicopathologic characteristics of the validation cohort were displayed in Table 1. Based on the formula mentioned above, each patient's risk score was computed, and patients were dichotomized into high-risk and low-risk groups based on the determined cutoff value (value = 0.060). As shown in Fig. 6a-b, high-risk subgroup had more dead cases, and a lower OS than low-risk subgroup. Similar ndings were also observed in the survival curve (P = 0.001) (Fig. 6c). When further performing univariate and multivariate Cox analyses, we found risk score was still perceived as an independent unfavorable prognostic factor for lung ADC patients, both P < 0.0001 (Fig. 6d-e). Meanwhile, ROC curve indicated that the AUC of the prognostic signature were 0.695 and 0.656 for 3-year and 5-year OS, respectively, which was -less than the "acceptable discrimination" (Fig. 6f). Furthermore, the 1-year, 3year, and 5-year nomogram calibration curves were still close to the diagonal line, and the C-index was 0.72 (CI, 0.67-0.77), both denoting a satisfactory model performance (Fig. 6g). In addition, we also found the AUC of the model uctuated around 0.75 in the follow-up period (from 1-year to 5-year follow-up), and was superior to the AJCC stage (Fig. 5e).

WGCNA
After differential expression analysis, a total of 741 DEGs (335 up-regulated and 406 down-regulated) were identi ed (Fig. 7a), and we further included them into the WGCNA to develop a co-expression network. For purpose of constructing a scale-free network, the soft threshold power was set as 8, which achieved a higher scale-free topology t index (> 0.8) accompanied with a higher mean connectivity (Fig. 7b-c). By means of the average linkage hierarchical clustering, a total of three modules (blue, brown, and turquoise) were identi ed (Fig. 7d). To investigate the associations of clinical characteristics with modules, correlation coe cients between MEs and clinical characteristics were calculated. Turquoise module was identi ed with the highest correlation with risk score (Fig. 7e-f).
To functionally annotate the genes in the turquoise module, both GO and KEGG enrichment analyses were conducted, and we found the genes were primarily enriched in biological processes, including cell division, cell proliferation, and cell cycle, as well as in signaling pathways, including cell cycle, and p53 signaling pathway (Fig. 8a-b).
A total of 22 genes highly correlated with risk score (GS > 0.55) and turquoise module (MM > 0.85) were determined as the potential candidates of hub genes (Fig. 8c). Based on the connectivity of the weighted network, top fteen genes (top 10%) were identi ed as the hub gene candidates. Furthermore, we performed a PPI analysis for all genes in the turquoise module using STRING database and Cytoscape software, top 15 genes (top 10%) with the highest degree were regarded as the hub gene candidates (Fig. 8d). By intersecting the above three lists of candidate genes, a total of six common genes were identi ed and included into the further survival analysis (Fig. 8e). Moreover, to investigate the association of six genes with the survival of lung ADC patients, univariate Cox model was adopted, and our ndings revealed that the genes were signi cantly related to survival (Fig. 8f). Eventually, these six genes (CCNA2, CCNB1, BUB1B, BUB1, KIF2C, and KIF11) were all determined as the real hub genes.
The association of m6A prognostic signature with immunity ESTIMATE algorithm [37] was utilized to evaluate tumor immune microenvironment, including immune score, stromal score, and tumor purity for each lung ADC sample. In general, there were no signi cant differences between the low-risk and high-risk groups for immune score, stromal score, and tumor purity (all P > 0.05, Fig. 9a), and the correlation coe cients between the signature-based risk scores and immune microenvironment scores were all below 0.2 (Fig. 9b) which was regarded as negligible correlation [41,42]. In addition, we also investigated the association of m6A prognostic signature with immune in ltration. As for the 21 immune cell types, high-risk patients had signi cantly lower proportions of Macrophages M1, T cells CD4 memory activated, T cells CD8, and Neutrophils (Fig. 9c). As immune checkpoint molecules were important for immunotherapy, we explored their associations with m6A prognostic signature. As shown in Fig. 9d, high-risk patients had signi cantly higher expression levels of PD-L1 and PD-L2. As TMB is regarded as an important biomarker of response for PD-1/PD-L1 blockade, we examined the association of the signature with TMB. In the present study, there was a low correlation between risk score and TMB (r = 0.28) (Fig. 9e), and high-risk patients had signi cantly higher TMB (Fig. 9f), suggesting a superior response to ICBs among high-risk patients. Finally, in order to assess the potential responses of ICB, including CTLA-4 and PD-1, for different risk groups, TIDE algorithm was adopted. We found high-risk patients had a signi cantly higher proportion of ICB response than low-risk patients (Fig. 9g). In summary, all the above ndings indicated that high-risk patients might be more sensitive to the ICB therapy.

Discussion
It is generally believed that patients diagnosed with late stage lung ADC tend to have a poorer clinical prognosis [2,6,43,44]. Therefore, seeking for a robust prognostic signature to distinguish high-risk from low-risk patients and a nomogram to predict each patient's survival probability, is essential for determining the optimum therapeutic strategies and reversing the adverse prognosis for lung ADC patients. Accumulating evidence proved that abnormal m6A RNA methylation modi cations were intimately related to the pathogenesis and development of various cancers [11,23,29,31,[45][46][47], whereas the studies investigating their role in lung ADC are few. In this study, we discovered a signi cantly abnormal expression of m6A regulator genes in lung ADC. Besides, a robust ve-gene prognostic signature was developed and regarded as an unfavorable prognostic factor for lung ADC.
Based on the signature, we constructed a prognostic nomogram to predict each patient's survival probability. To further assess the model performance, an exterior validation set (GSE72094) was used, and the nomogram was demonstrated with a satisfactory discrimination and calibration performance.
Our results revealed that seventeen out of twenty m6A regulator genes were observed with either upregulation or down-regulation in lung ADC samples, indicating these genes might be involved in the oncogenic activities and prognosis for lung ADC patients. Unanimous with our ndings, the abnormal expression of m6A regulator genes was also found in other cancers. For instance, it is reported that the writer WTAP was found to be up-regulated in various cancers, including hepatocellular carcinoma (HCC) [48], acute myelogenous leukemia (AML) [49], and glioblastoma [50], and was identi ed as an oncogene.
The eraser ALKBH5 was reported to be highly expressed in ovarian cancer [51], and the silencing of ALKBH5 signi cantly improved autophagy and suppressed proliferation and invasion. The reader YTHDF1 was observed with an up-regulation in CRC [52], and its knockdown signi cantly inhibited the tumorigenicity and colonosphere formation ability. Therefore, expression dysregulation of m6A regulator genes was prevalently found in various cancers, and was strongly related to tumorigenesis, progression, and prognosis.
When GSEA was adopted, we found m6A regulator genes were signi cantly associated with some important malignancy-related pathways, including p53 signaling pathway, cell cycle, mismatch repair, nucleotide excision repair, and pathways in cancer. Comparable results were also reported in other studies. It is reported that m6A regulator genes were extremely important in several cancer-related pathways, such as p53 signaling pathway [29,53], cell cycle [29,30,54], Ras [30], in ammatory response [29,53], and PPAR signaling pathway [53].
In the present study, a ve-gene prognostic signature (IGF2BP1, IGF2BP2, HNRNPA2B1, METTL3, and HNRNPC) was identi ed and displayed with a great prediction value for patients with lung ADC. Furthermore, a distinct survival difference was found between different risk groups, which denoted that clinicians could successfully stratify the lung ADC patients into low-risk and high-risk groups based on the risk scores derived from the prognostic signature, thereby conducing them to make wiser clinical decisions. For the ve identi ed m6A regulator genes, IGF2BP1, IGF2BP2, HNRNPA2B1, and HNRNPC were determined as adverse prognostic genes, while METTL3 was determined as a favorable prognostic gene. Interestingly, all these genes were highly expressed in lung ADC patients. Yan M et al. demonstrated that HNRNPC was drastically up-regulated in non-small cell lung cancer (NSCLC) tissues, which was in agreement with our research. Then, the researchers also discovered the overexpression of HNRNPC dramatically accelerated the proliferation, migration, and invasion of lung cancer cells. Moreover, the elevated expression of HNRNPC was signi cantly related to advanced tumor stages, metastasis, and shorter survival time [55]. IGF2BP1, a target of miR-491-5p, was reported to be signi cantly increased in the expression of NSCLC cells, and promoted tumor cell proliferation, migration, and invasion [56]. The same trend was also discovered in liver cancer [57]. Zhu J et al. revealed that METTL3 was signi cantly up-regulated in lung adenocarcinoma, and patients with high expression of METTL3 were observed with signi cantly better OS [24]. Similarly, another study reported that METTL3 was regarded as a tumor suppressor gene for CRC, and the up-regulation of METTL3 could dramatically suppress tumor cell proliferation, migration and invasion. Furthermore, patients with elevated expression of METTL3 were observed with a signi cantly better survival [25,45]. Overall, their ndings were in agreement with ours.
However, other studies demonstrated an entirely opposite function of METTL3 for bladder cancer [58], NSCLC [59], and liver cancer [24]. Du M et al. proved that METTL3 was targeted by miR-33a and the down-regulated METTL3 could inhibit the proliferation of lung cancer cells [59]. In addition, another study found METTL3 was signi cantly up-regulated in bladder cancer and its knockdown could signi cantly suppress bladder cancer cell proliferation, invasion, and survival in vitro and tumorigenicity in vivo [58].
These ndings demonstrated that METTL3 plays different (or even opposite) roles in different cancers [60]. IGF2BP2, a direct target of miR-485-5p, was discovered to be signi cantly up-regulated in the expression of NSCLC, and its depletion signi cantly suppressed tumor cell proliferation and invasion [61]. Similarly, IGF2BP2 was overexpressed in pancreatic cancer and patients with the overexpression of IGF2BP2 were observed with signi cantly worse OS. Furthermore, the up-regulated IGF2BP2 expression promoted tumor cell growth by activating the PI3K/Akt signaling pathway [62]. HNRNPA2B1, involved in RNA-binding and pre-RNA processing, was high expression in NSCLC patients and associated with a worse prognosis. Furthermore, the overexpression of HNRNPA2B1 accelerated NSCLC cell growth by activating COX-2 signaling pathway [63].
Based on the multivariate Cox model, we developed a prognostic nomogram to predict each lung ADC patient's 1-year, 3-year and 5-year OS probability. As far as we know, this was the rst study incorporating the m6A regulator genes into the model to construct a prognostic nomogram for lung ADC patients.
Notably, the nomogram performed well both in the training and validation set, indicating a robust prediction performance. To evaluate the clinical usefulness, we utilized DCA to ascertain whether the nomogram-based decisions could improve patients' outcomes. The DCA showed that the threshold probabilities were within the range of 0.06-0.48, 0.18-0.81, and 0.42-0.80 for 1-year, 3-year, and 5-year OS, respectively. If the threshold probabilities of lung ADC patients were within the above ranges, adopting the nomogram to predict OS added more bene ts than either hypothetical treat-all-patients or treat-none scenarios. In addition, we also found the nomogram had a higher predictive accuracy than the traditional prognostic index-AJCC stage. In summary, our ve-gene signature-based nomogram was effective in predicting each lung ADC patient's survival probability and could offer a better reference for treatment guidance than single traditional clinical index.
To explore the potential reasons for risk signature being an independent prognostic factor for lung ADC patients, we conducted a WGCNA to identify the related modules, physiological biology, pathways, and hub genes. Our results revealed that cell cycle, an important cancer-related pathway, was signi cantly associated with the risk signature. Furthermore, a total of six genes (CCNA2, CCNB1, BUB1B, BUB1, KIF2C, and KIF11) were determined as the real hub genes. Notably, these hub genes all played an important role in the proceeding of cell cycle [64][65][66][67][68][69][70]. As our prognostic signature was consisted of ve m6A regulator genes (IGF2BP1, IGF2BP2, HNRNPA2B1, METTL3, and HNRNPC), we inferred that these ve m6A-related genes might act with the six hub genes, thereby in uencing the process of cell cycle. Further experimental studies were warranted to ascertain the real molecular mechanisms.
ICB therapy has revolutionized traditional treatment strategies for NSCLC and other cancers, and advanced patients treated with anti-PD-1 and anti-CTLA-4 agents have been demonstrated with better prognosis [71][72][73]. In addition, previous literatures have reported that m6A is extremely important in immunoregulation and autoimmune diseases [74,75]. Therefore, it is essential to investigate the association of the developed m6A signature with immunity. To our knowledge, this is the rst study assessing the relationship between m6A signature with immunity. It is reported that higher PD-L1 expression in tumor cells was closely associated with improved e cacy of immunotherapy [76,77], and the stimulation of PD-1/ PD-L1 pathway inhibits CD8 + T cell proliferation and promotes its apoptosis [78,79]. In this study, we found high-risk lung ADC patients had signi cantly higher expressions of PD-L1 and PD-L2, and lower proportion of CD8 + T, which suggested high-risk lung ADC patients had a superior response to ICBs. Additionally, we also found high-risk patients had signi cantly higher TMB. Several reports have demonstrated that TMB serves as a useful biomarker of response for PD-1/PD-L1 blockade across various cancers, and higher TMB is associated with superior progression-free survival and objective response rates [77,[80][81][82]. Finally, after using TIDE algorithm, we also found high-risk patients had a signi cantly higher proportion of ICB response than low-risk patients. All the above ndings indicated that high-risk patients might be more sensitive to the ICB therapy and more likely to bene t from immunotherapy. With the aid of the signature, clinicians can be easier to identify the potential bene ciaries from immunotherapy.
Several limitations in the present study should be noted. First, all our data were derived from the existing public database. Although an excellent performance in the prediction of the survival for lung ADC patients was observed in our study, a prospective, multicenter study was still warranted to further validate our results. Second, as the study populations were mainly from the US, and we only focused on white and black patients, our ndings might not be generalized to other countries and races. Third, the C-index of our nomogram was merely around 0.7 and the AUC of the signature was within 0.6-0.7. Therefore, incorporating some acknowledged prognostic factors, such as tumor grade, radiation, chemotherapy, operation modes, and immunotherapy, might be conducive to enhancing the prediction accuracy of the present model. Finally, all our results were based on the data mining, and consequently an experimental study is essential for better ascertaining the associated molecular mechanisms.

Conclusions
In conclusion, our studies systematically evaluated the expression and prognosis of m6A regulator genes in lung ADC. A ve-gene prognostic signature was identi ed as an unfavorable prognosis of lung ADC, and the signature could successfully distinguish high-risk from low-risk lung ADC patients. Furthermore, a signature-based nomogram was developed and was effective in predicting each lung ADC patient's survival probability. In addition, we performed a WGCNA and inferred that the genes in the signature might act with the six hub genes, thereby in uencing the process of cell cycle. We also found high-risk lung ADC patients had a signi cantly higher proportion of ICB response and were more sensitive to the ICB therapy. Future clinical and experimental research is warranted to further verify our ndings.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The research data will be shared upon request.

Competing interests
The authors declare no con ict of interest.   Expression of twenty m6A regulator genes and interaction among them (a) The heatmap of twenty m6A regulator genes in 59 normal and 535 lung ADC tissues, "*" referring to 0.01 ≤ P < 0.05, "**" referring to 0.001 ≤ P < 0.01, "***" referring to P < 0.001. (b) Boxplot visualizing the differential expression of twenty m6A regulator genes between normal and tumor tissues, "*" referring to 0.01 ≤ P < 0.05, "**" referring to 0.001 ≤ P < 0.01, "***" referring to P < 0.001.        The association of the m6A signature with immunity (a) Boxplot visualizing the differences of the immune microenvironment (immune score, stromal score and tumor purity) between the high-risk and low-risk groups.(b) Spearman correlation analysis between the immune microenvironment (immune score, stromal score and tumor purity) and risk scores. (c)Relative proportion of immune in ltration between the high-risk and low-risk groups.(d) Boxplot visualizing the differences of the immune checkpoint molecules between the high-risk and low-risk groups. (e)Spearman correlation analysis between TMB and risk scores.(f)Violin plot visualizing the difference of TMB between the high-risk and low-risk groups. (g) Boxplot visualizing the proportions of response between the high-risk and low-risk groups.