Establishment of a 5 DNA Methylation Site Prognostic Signature Associated with N6-Methyladenosine in Lung Adenocarcinoma

Background: There is a high incidence of lung adenocarcinoma (LUAD). Even with surgery, targeted therapy and immunotherapy, the survival rate of LUAD patients is still low. N6-methyladenosine (m6A) and DNA methylation markers can help with the diagnosis and treatment of LUAD patients. Therefore, it is necessary to identify a novel m6A-related DNA methylation sites signature to predict the survival of patients with LUAD. Methods: In this study, we screened 15 m6A-related genes and their 217 methylation sites. RNA sequencing data of 15 genes and the clinicopathological parameters of TCGA-LUAD were obtained from the TCGA database (http://cancergenome.nih.gov/). The LUAD-DNA CpG site information was obtained from the Illumina Human Methylation 450 BeadChip (Illumina, San Diego, CA, United States). The methylation sites related to prognosis were screened using univariate COX analysis, and the independent predictors of LUAD patients were identied using multivariate COX analysis of least absolute shrinkage and selection operator (LASSO) Cox regression analysis. Finally, a model with 5 methylation sites as the main body to predict the prognosis of OS in patients with LUAD was obtained. According to the risk grouping of the prediction model, Kaplan-Meier curve and the receiver operating characteristic (ROC) curve were performed in the test and training sets to assess the predicted capacity of the model. In addition, a nomogram constructed by combining the risk score of methylation group and other related clinicopathological factors to verify the reliability of our model. Results: We constructed a m6A-related 5-DNA methylation site model to predict OS in LUAD patients. According to the results of the Kaplan-Meier curve, both the test set and the training set, the high-risk group showed a worse prognosis. The AUCs of the 5 DNA methylation signature at 1, 5 and 10 years in test datasets were 0.730, 0.649 and 0.726, respectively, and 0.679, 0.656 and 0.732 in training datasets. Finally, we constructed a nomogram to further verify the reliability of the model. Conclusion: In this study, we analyzed the methylation sites of m6A-related genes and established a m6A-related 5-DNA methylation site model to predict OS in LUAD patients.


Introduction
Based on the latest information provided by the International Agency for Research on Cancer for the year 2020, there were 19.3 million new con rmed cancer cases and 10 million cancer deaths worldwide 1 . Lung cancer is the leading cause of cancer deaths. As shown in 2017, the total number of patients who died from breast, prostate and colorectal cancers combined was lower than the number of patients who died from lung cancer 2 . Epigenetics helps put forward novel views into understand tumorigenesis 3 .
Epigenetics has been an attractive eld in recent years 4 . Most research pays attention to DNA methylation and histone modi cations 5 . However, with the development of high-throughput sequencing, RNA strands were found to contain modi cations such as N6-methyladenosine (m6A), Cytosine hydroxylation (m5C) and N1-methyladenosine (m1A) during translation 6,7 . These modi cations have proven to be closely related to RNA metabolism, especially mRNA degradation, splicing, stability and translation 8 .
Over 100 RNA modi cations have been identi ed 9 . The m6A modi cation inserts a methyl substituent onto the N atom at position 6 of adenosine and is one of the most common modi cations found in eukaryotic mRNA 10 . Among other members of the RNA family, m6A methylation can also be detected in tRNA, ribosomal RNA and microRNA 11 . M6A methyltransferases (Writers), m6A demethylases (erasers) and m6A methylated reading proteins (Readers) regulate m6A methylation dynamically and reversibly 12 . Recently, m6A modi cations have been linked to obesity, autoimmune diseases, infertility and many other diseases 13,14 . In addition, m6A modi cations are closely related to tumor progression. For example, in cervical squamous cell carcinoma (CSCC), the expression of FTO is signi cantly increased and the level of m6A methylation is signi cantly reduced, suggesting a poor prognosis 15 . FTO promotes the development of lung squamous cell carcinoma (LUSC) 16 . There is also evidence suggesting that targeting METTL3 mRNA 3'UTR binding sites inhibits the proliferation of non-small cell lung cancer (NSCLC) cells 17 . Therefore, a full understanding of the key m6A RNA methylation regulatory factors is critical for the treatment of lung cancer. Among all types of lung cancer, the incidence rate of lung adenocarcinoma (LUAD) is the highest 18 . However, little is known about the physiological process of m6A modi ed regulators in lung adenocarcinoma.
In our study, genomic methylation pro les of 15 widely reported m6A regulators in LUAD patients were obtained from the TCGA database, and an overall survival (OS) risk prediction model based on DNA methylation was established. Its ability as a molecular prognostic model was evaluated by ROC analysis, and a nomogram was used to determine the reliability of DNA methylation factors as LUAD biomarkers. Our conclusions will provide new methods for predicting OS in LUAD patients.

DNA methylation data collection
The LUAD-DNA CpG site information was obtained from the Illumina Human Methylation 450 BeadChip (Illumina, San Diego, CA, United States) and the TCGA-biolinks package in R language was used to analyze clinical data 25 . β values were used to visualize DNA methylation levels using the formula: β = M/(U+M+100), where M represents methylation of the target CpG site and U represents unmethylation. A total of 485,577 methylation sites were obtained. The methylation data was matched with screened samples to evaluate the correlation between DNA methylation levels and OS.

Establishment and veri cation of the prediction model
The 455 samples were divided into two parts: two-thirds of the clinical samples were divided into the training group to identify and build a prognostic risk model. The remaining one-third of samples were placed into the validation group to con rm the predictive prognostic value of the biomarker. Signi cant methylation sites related to the prognosis of LUAD were identi ed using LASSO regression analysis. Meanwhile, to prevent over tting, LASSO 1000 iterations were conducted using the "glmnet algorithm" package in R. We established a risk model based on the following formula: Risk Score = ∑ (Methylation level of each independent prognostic site× regression coe cient).
Patients in the training set were divided into high and low risk groups according to the risk score. Kaplan-Meier was used to draw the survival curve of high and low risk groups. A bilateral logarithmic rank test was used to test the survival difference between the high risk and low risk groups. A ROC curve was used to explore the survival of patients at 1, 5 and 10 years. In addition, the same method was used to verify the prediction model in the test set.

Establishing a predictive nomogram
According to the 'rms' package from R software, a nomogram was constructed to evaluate the reliability of the prediction tool. Other pathological parameters (gender, staging, N staging, T staging) and methylation risk score were evaluated using COX proportional risk assessment (including multivariate and univariate). Concordance index (C index), calibration chart and ROC curves were used together to evaluate the prediction ability of the nomogram.
2.5 Immune cell in ltration and biological pathways.
Immune and stromal cells are two types of cells that in ltrate the tumor immune microenvironment. We used "ESTIMATE" in the R package to evaluate the immune score, matrix score and tumor purity 26 . Quantitative analysis of the immune cell in ltration level was observed using GSVA and Single-sample GSEA (ssGSEA) in the R language GSVA software package 27,28 . Finally, we evaluated the 24 types of immune cells, including natural immune cells and acquired immune cells. To explore the relationship between the predictive model and immune in ltration, we analyzed the different 24 immune cells and their in ltration levels in the high risk and low risk groups (p<0.05). Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the DEGs were performed using Metascape (http://metascape.org).

Statistical analyses
Univariate analysis was performed in the training set to screen methylation sites signi cantly associated (P 0.01) with OS in LUAD patients. Then, Lasso Cox regression analysis was used to further establish prognostic methylation sites and construct a Cox multiple regression model. A survival curve was drawn using Kaplan Meier method and was used to evaluate OS between the two groups. In this study, p < 0.05 was considered as statistically signi cant.

Results
3.1 Identi cation of the 5-methylation site signature in LUAD patients β values of 217 methylation sites of the m6A-related genes were extracted, and 85 differential methylation sites were obtained through differential analysis. Finally, 11 methylation sites (P 0.01) related to prognosis were obtained using univariate analysis. Then, 11 DNA methylation sites were selected as candidate prognostic factors for LUAD patients. We randomly divided the LUAD samples from the TCGA into a training dataset (n = 320) and a test set (n = 135). Next, based on Lasso Cox regression analysis and multivariate Cox proportional hazard regression, a risk score model was constructed using 11 candidate methylation sites, and nally the risk score formula of 5 methylation sites was obtained: Risk score = 10.31586047 × cg15425530 + 8.315651802 × cg01506766 − 1.566587167× cg26651810+ 6.831866332 ×cg15249062 −2.830317036 × cg07620844.
With the median risk score employed as the threshold, a prognosis classi er was established to classify individuals into high-and low-risk classes in the training and test datasets (Fig. 1A-F). Results showed that the high-risk group showed worse OS than the low-risk group in the training and test datasets (Fig.  1G & H).
We measured the predictive ability of the 5 DNA methylation signatures for predicting OS by Area Under the Curve (AUC) of Time-dependent ROC curves. The AUCs of the 5 DNA methylation signature at 1, 5 and 10 years in test datasets were 0.730, 0.649 and 0.726, respectively (Fig. 1I), and 0.679, 0.656 and 0.732 in training datasets (Fig. 1J), indicating that the 5-DNA methylation signature showed strong potential to function as a prognostic marker in clinical practice.

Development and assessment of the nomogram
To further verify the ability of the 5-DNA methylation prediction model, a multivariate Cox model univariate Cox model were used for some other clinicopathological factors (gender, stage, T stage and N stage) and methylation relevant risk score ( Fig. 2A & B). The results of multivariate Cox regression analysis showed that the biomarker of 5-DNA methylation signi cantly correlated with OS (P < 0.05, HR 2.58, 95% CI (1.22 ~ 5.46)) in test datasets, and the result showed in training datasets (P < 0.05, HR 1.59, 95% CI (1.04 ~ 2.43)), which indicated that prognostic signature is an independent prognostic predictor in LUAD patients. Then, a nomogram with risk levels and clinical risk characteristics to predict 1 -, 2 -, and 3year OS incidence was constructed (Fig. 2C& D). Results showed that C index, AUC and calibration maps had higher values (Fig. 3), strongly proving the reliability of the nomogram as an important model for predicting OS of LUAD.

5-DNA methylation signature-associated immune in ltration and biological pathways
To identify the 5-DNA methylation signature-associated biological pathways, we performed differential gene expression analysis for the high-and low-risk groups, and functional enrichment analysis of DEGs was performed by metascape. A few of the top 20 pathways, including the Pro Talpha C8 complex, regulation of cell projection organization, Vitamin D receptor pathway and Chromatin modifying enzymes were observed (Fig. 4A & B). Using the GSVA package, a TCGA LUAD mRNA dataset was used to perform a single-sample Gene Set Enrichment Analysis (ssGSEA). The correlation between the different types of immune cell in ltration and 5-DNA methylation signature risk scores was analyzed using ssGSEA. Results shown in Figure 4C indicate a signi cant enrichment of immune cells in the low-risk group compared with the high-risk group, and the in ltration of HLA, iDCs and Type-II-IFN-Response were more obvious (P 0.01).

Discussion
LUAD is still contributes to high morbidity and mortality 29 . Surgery is the best treatment of choice. However, postoperative recurrence, metastasis and even death still result in patients treated with surgery 30 . Therefore, it is necessary to develop a new and more reliable prognostic model. A large literature has shown that multiple molecular markers can be used to determine the prognosis of a variety of cancers 31,32 . Furthermore, DNA methylation can also be used to predict the prognosis of cancer 33,34 . The Kaplan-Meier method was used to analyze DNA methylation sites related to the survival and prognosis of LUAD, and the receiver operating characteristic (ROC) analysis was used to evaluate its ability as a molecular prognostic model. At the same time, we also generated a nomogram to judge the reliability of DNA methylation factors as a LUAD biomarkers. We nally developed the m6A regulatory factor-DNA methylation signature, which was able to predict the OS of LUAD patients.
In this study, nal 5-methylation sites were obtained from four genes: ZC3H13, RBM15B, FTO and METTL16. ZC3H13, RBM15B and METTL16 come from the "writers" group and FTO belongs to the "erasers" group. It has been reported that these genes play an important role in the occurrence and development of cancer. For instance, Zhu et al reported that ZC3H13 may be an upstream regulator of the Ras-ERK signal pathway, and inactivated Ras-ERK signal transduction inhibits the proliferation and invasion of colorectal cancer 37 . Zhang et al believe that the high expression of ZC3H13 and METTL16 indicates that the survival rate of breast cancer patients is poor, while the high expression of RBM15B indicates a higher survival rate 38 . Tao et al reported that FTO is highly expressed in bladder cancer and promotes its occurrence by regulating the MALAT / miR-384 / MAL2 axis through m6A RNA modi cations 39 . Results show that four genes related to the 5-DNA methylation signature play an important role in tumor progression.
According to the C index, ROC curves and correction maps, we successfully established a methylation nomogram of m6A-related regulatory proteins, which can effectively predict the OS of LUAD patients. Univariate COX analysis con rmed that the 5-DNA methylation signature was signi cantly related to the prognosis of LUAD. Combined with age, sex, tumor stage and other multivariate COX analysis, the model was more reliable to predict OS in LUAD patients. The successful establishment of the nomogram further con rmed that the prognostic model of the methylation signature of m6A-related regulatory proteins effectively predicts OS in LUAD patients. Meanwhile, the results may contribute to the development of accurate biomarkers.
Our study still showed some limitations. First, the nomogram model can also refer to other factors related to tumor prognosis in addition to the clinicopathological factors contained in the o cial website of the TCGA. In addition, further clinical trials are needed to verify the application value of the 5-DNA methylation signature. Moreover, the large set and complex statistical analysis of DNA methylation determination also increase the di culty for further experimental veri cation. Therefore, the 5-DNA methylation signature still has a long way to go before being applied in the clinic. However, our results also have great advantages. First, we used univariate and multivariate COX analyses to screen methylation sites of the m6A-regulatory proteins related to the prognosis of LUAD, excluding other linear interferences in the study, making the results more reliable. Second, we successfully established a nomogram related to DNA methylation of the m6A-regulatory protein, and used an effective method to predict OS in LUAD patients. The exact probability of survival can be effectively judged by the nomogram, while related studies of many other molecular markers only determine the possibility of patient survival, which provides strong evidence for the application of our model in the clinic.

Conclusion
In summary, we obtained genome-wide methylation data for LUAD patients from the TCGA, and established an OS risk prediction model based on DNA methylation related to m6A-regulatory proteins. We hope that these results can be applied to the clinic after being con rmed by larger, more centers and more precise studies.  Time-dependent receiver operating characteristic curves at 1, 3, 5 and 10 years based on the prognostic signature.  The calibration plots for predicting patient 1-, 3-,5-or 7-year OS in the training set. and test set.