Development and Validation of Epigenetic Signature Predict Survival for Patients with Laryngeal Squamous Cell Carcinoma.

Establishing epigenetic signature to improve the accuracy of survival prediction and optimize therapeutic strategies for laryngeal squamous cell carcinoma (LSCC) by a genome-wide integrated analysis of methylation and the transcriptome. LSCC DNA methylation datasets and RNA sequencing datasets were acquired from the Cancer Genome Atlas (TCGA). MethylMix was applied to detect DNA methylation-driven genes (MDGs), which developed an epigenetic signature. The predictive accuracy and clinical value of the epigenetic signature were evaluated by receiver operating characteristic and decision curve analysis, and compared with tumor-node-metastasis (TNM) stage system. In addition, prognostic value of the epigenetic signature was validated by external Gene Expression Omnibus (GEO) database. According to five MDGs of epigenetic signature, the candidate small molecules for LSCC were screen out by the CMap database. A total of 88 DNA MDGs were identified, five of which (MAGEB2, SUSD1, ZNF382, ZNF418, and ZNF732) were chosen to construct an epigenetic signature. The epigenetic signature can effectively divide patients into high-risk and low-risk group, with the area under curve (AUC) of 0.8 (5-year overall survival [OS]) and AUC of 0.745 (3-year OS). Stratification analysis affirmed that the epigenetic signature was still a significant statistical prognostic model in subsets of patients with different clinical variables. Multivariate Cox regression analysis indicated that the efficacy of epigenetic signature appears independent of other clinicopathological characteristics. In terms of predictive capacity and clinical usefulness, the epigenetic signature was superior to traditional TNM stage. In addition, the epigenetic signature was confirmed in external LSCC cohorts from GEO. Finally, CMap matched the 10 most significant small molecules as promising therapeutic drugs to reverse the LSCC gene expression. An epigenetic signature, with five DNA MDGs, was identified and validated in LSCC patients by integrating multidimensional genomic data, which may offer novel research directions and prospects for individualized treatment of patients with LSCC.

molecules as promising therapeutic drugs to reverse the LSCC gene expression.

Conclusion:
An epigenetic signature, with five DNA MDGs, was identified and validated in LSCC patients by integrating multidimensional genomic data. Compared TNM stage alone, it generates more accurate estimations of the survival probability and maybe offer novel research directions and prospects for individualized treatment of patients with LSCC.

Background
As an aggressively malignant neoplasm, laryngeal squamous cell carcinoma (LSCC) is one of the most prevalent cancers in head and neck region, and represents 85-95% of all laryngeal cancer [1].
According to the American Cancer Society, the estimated respective new cases and new death are 13 150 and 3 710 annually, with incidence and mortality rates of 4.0 and 1.1 per 100 000, respectively [2]. More than 50% of LSCC patients have progressed to the locally advanced stage or metastasis when they are diagnosed due to the absence of the early asymptomatic nature of this cancer, which affects the prognosis of patients [3]. Though treatments have improved during the past decades, long-term survival outcome remains unsatisfactory, with 5-year survival rate only around 60% in the USA [4]. Thus, it is of great importance to ascertain novel biomarkers for valuable molecular targeted therapy and establish predictive models to optimize therapeutic strategies in LSCC patients.
LSCC is a heterogeneous disease in terms of therapeutic response and clinical prognosis. To a certain extent, clinical heterogeneity can be related to distinct molecular subtypes by gene expression patterns [5].RNA expression profiles usually exhibit relative stochastic and rapid variations, which can be directly related to important pathways in malignant cells. DNA methylation, serves as a major epigenetic modification that is involved in the transcriptional regulation of genes and maintains the stability of the genome, is less variable and semi-stable, but shows large variations linked to the activity of cellular processes. Therefore, the combination of transcriptome and epigenetic status would be helpful to identify new markers and improve the accuracy of prognosis. What's more, changes in DNA methylation with a high level of plasticity allows tumor cells to quickly adapt to changes in metabolic restrictions or physiology during the process of tumorigenesis [6,7]. Hence, it is reasonable to analyses the DNA methylation pattern in the tumor cells to find novel therapeutic targets and predictors for survival in LSCC patients.
The availability of high throughput genomic assays such as RNA-seq and DNA methylation-seq have opened the possibility for a comprehensive characterization of all molecular alterations of cancers, leading to the discovery of new biomarkers of clinical and therapeutic value [8]. In the present study, we performed a genome-wide integrated analysis of methylation and the transcriptome to characterize the crosstalk between DNA methylation and RNA regulation for patients with LSCC in The Cancer Genome Atlas (TCGA) database. We identify methylation-driven genes (MDGs), and then developed an epigenetic signature capable of predicting the overall survival, and further screen correlated small molecule target drugs. The proposed epigenetic signature was validated in external datasets from the GEO database. In addition, we assessed the predictive ability and clinical application of the epigenetic signature and compared it to the TNM stage.
. Besides, patients who were alive at last follow-up are considered as censored observations. All data were normalized in the R computing environment using the edgeR package or Limma package.

Materials And Methods
Methylation data were in the form of β value, representing the ratio of the methylation probe data vs total probe intensities. Then, the average DNA methylation value for all CpG sites correlated with a gene was calculated and merged into a matrix with the function of TCGA-Assembler. Data were utilized according to the data access policy of TCGA and GEO. All analyses were conducted per relevant regulations and guidelines.
Identification of DNA methylation-driven genes The MethylMix R package was employed for analysis that integrated DNA methylation data for 117 LSCC samples and 16 normal samples and paired gene expression data for 111 LSCC samples to appraise DNA methylation events that have a significant impact on the expression of the corresponding gene, indicating that the gene is a DNA methylation-driven gene (MDG). A total of three procedures of MethylMix analysis were described as previous studies [9].In addition, the differential methylation value where gene with a positive differential methylation value signifies hypermethylation and gene with a negative differential methylation value signifies hypomethylation can be applied in subsequent analysis.
Functional enrichment and pathway analysis of methylation-driven genes Gene ontology (GO) analysis, including the molecular function (MF), biological process (BP) and cellular component (CC), was performed on identified MDGs using the DAVID database (http://david.abcc.ncifcrf.gov/), which provides integrative and systematic annotation tools for uncovering biological meaning of genes. And we used GOplot R package to visualize the result.
Additionally, pathway analysis was carried out for the MDGs with ConsensusPathDB (http://cpdb.molgen.mpg.de/), which is a functional molecular interaction database, integrating information on gene regulation, signal transduction, biochemical metabolism, protein interacting, genetic interacting signaling in humans. Humancyc, Kegg, Reactome, Wikipathways, Smpdb, and Biocarta and Signalink were selected for subsequent analysis. P < 0.05 was set as the threshold value.
Construction and verification of an epigenetic signature First, univariable Cox regression analysis is utilized to select prognostic related MDGs with P < 0.05 as the threshold. After primary filtering, the prognostic related MDGs were all assembled into multivariate Cox regression model, then an epigenetic signature based on these prognostic related MDGs is developed.
The epigenetic signature risk score was generated through a linear combination of the expression levels of independent DNA MDGs using coefficients from multivariate Cox regression as the weights.
On the basis of the median risk score, LSCC patients were classified into two cohorts (high-risk cohorts and low-risk cohorts). Survival differences between high-risk group and low-risk group were assessed by Kaplan-Meier survival analysis and then compared by the log-rank test. Time-dependent receiver operating characteristic (ROC) curves by means of the timeROC package were applied to evaluate predictive performance. Also, stratified analysis base on various clinical characteristics is conducted to evaluate the discrimination ability of epigenetic signature. Importantly, the GSE65858 dataset from the GEO database was applied to validate the predictive value of the epigenetic signature.

Gene set enrichment analysis (GSEA)
We downloaded GSEA software from the GSEA website (http://software.broadinstitute.org/gsea/index.jsp). 109 LSCC patients were categorized into high-risk cohorts and low-risk cohorts, which was served as the phenotypes. Gene sets related to biosignaling on MSigDB (http://software.broad institute.org/gsea/downloads.jsp) could be found on the GSEA home as reference gene sets. Each analysis was repeated 1000 times according to the default weighted enrichment statistical method. Statistically significant pathways were screened based on the false discovery rate (FDR) < 0. 25 Comparison of predictive performance and clinical usefulness between epigenetic signature and TNM stage ROC analysis by means of the survivalROC package was carried out to investigate and compare the discrimination ability of the epigenetic signature with traditional TNM staging in TCGA and GEO database. Decision curve analysis (DCA) by means of the stdca.R was employed to evaluate the clinical usefulness and net benefit of the epigenetic signature, and compared to TNM stage in TCGA and GEO database [10].

Joint survival analysis and methylated loci associated with OS
To further investigate the impact of prognostic related MDGs on LSCC patients, we carried out joint survival analysis combined methylation and gene expression to identify hub genes associated with prognosis in patients with LSCC by the survival R package. In addition, we retrieved relevant loci for prognostic related MDGs from downloaded LSCC methylation data. We merged the value of corresponding methylated sites into one matrix and conducted univariate Cox analysis to select potential prognosis methylated loci.P < 0.05 was regarded as statistically significant.

Identification of candidate small molecules agents
The CMap database (http://www.broadinstitute.org/cmap/), which collects more than 7,000 gene expression profile changes induced by various small molecular drugs, was adopted to investigate candidate small molecules agents for treatment in LSCC patients. 5 prognostic related MDGs of epigenetic signature were divided into down-regulated and up-regulated groups, which were uploaded into CMap database to screen related active small molecules agents. Then, the enrichment scores were calculated, which signify similarity rang from − 1 to 1. A negative connectivity score (closer to − 1) indicated higher similarity between the genes, which represents potential therapeutic value, whereas a positive connectivity score (closer to + 1) demonstrates the matched small molecule can induce the state of LSCC cells.

Statistical analysis
R software (R version 3.5.2) and SPSS statistics 22.0 were utilized to conduct the statistical analysis. A two-sided P < 0.05 would be recognized as statistically significant except for where a certain P value has been given.

Identification of MDGs, functional enrichment and pathway analyses
Based on three matrices and steps of MethylMix, in total, 88 genes, 77 hypermethylated genes and 11 hypomethylated genes, were defined as the epigenetic drivers with Cor <−0.3, |logFC| > 0and P < 0.05 (Supplementary material 3). The GO term enrichment analysis for MDGs shows the top 6 clusters of enriched sets with significant differences (P < 0.05) (FigureS1). As to molecular function Kaplan-Meier analysis indicated that in low risk cohorts LSCC patients were more inclined to higher OS time than patients in high risk cohorts (P<0.001) (Figure3C). Time-dependent receiver operating characteristic (ROC) curves showed that epigenetic signature had a superior prediction capacity, with AUC of 0.80 (5 years OS) and AUC of 0.745(3 years OS) ( Figure 3D).In addition, stratification analysis was carried out in subsets of patients with different clinical variables (T1-T2 vs T3-T4, N0 vs N1-N3, I-II stage vs III-IV stage, G1-G2 vs G3-G4, positive margin status vs negative margin status, lymphovascular invasion vs no lymphovascular invasion ) for epigenetic signature. In T3-T4, N0 or N1-N3, III-IV stage, G1-G2 or G3-G4, negative margin status, lymphovascular invasion or no lymphovascular invasion subgroup, the epigenetic signature was still a statistically and clinically prognostic model (Figure S3 and S4). However, in T1-T2, I-II stage and positive margin status subgroup did not reach significant statistics. Noteworthily, external GEO cohorts (GSE65858 database) were utilized to verify the predictive performance of the epigenetic signature. As was displayed in Figure 4, patients with low risk score were more prone to survival and had higher OS time than patients with high-risk score, which consistent with the results of the TCGA dataset.
Furthermore, the AUC of epigenetic signature (AUC of 5 year OS: 0.756 and AUC 3 year OS: 0.812) confirmed that the predictive accuracy of the prognostic model was satisfactory.

Gene set enrichment analysis
To investigate potential biological pathways for 5 prognostic related MDGs, we carried out the GSEA, and reveal a total of 56 items were significantly enriched with FDR < 0.25. The level of risk score for epigenetic signature was considered as the phenotypes, and the findings uncovered that high-risk level of epigenetic signature may closely correlated with several important crosstalk, comprising of JAK-STAT signaling pathway, mismatch repair, P53 signaling pathway, T cell receptor signaling pathway, B cell receptor signaling pathway, basal transcription factors, cell cycle, ErbB signaling pathway ( Figure 5).

Independence of the epigenetic signature from clinicopathological features
To investigate whether the epigenetic signature is independent of the clinicopathological characteristics in TCGA database, Univariate Cox regression analysis found that female, the presence of lymphovascular invasion, positive margin status, and high epigenetic signature risk score were associated with shorter OS ( Table 1). Multivariate Cox regression analysis continued to verify that epigenetic signature was an independent predictor of unfavorable OS (HR: 1.52, 95%CI: 1.13-2.04, P= 0.006), after adjustment for other risk variables. In addition, external GEO cohorts (GSE65858 database) were utilized to validate whether the epigenetic signature is also independent of other clinical features. Univariate and multivariate Cox regression analyses were performed, which indicated that the epigenetic signature a significant independent indicator for OS (HR: 1.88, 95%CI: 1.25-2.83, P= 0.002) by adjusting covariates ( Table 2).

Comparison of predictive performance and clinical usefulness between epigenetic signature and TNM stage
To evaluate the predictive ability of epigenetic signature, we compared epigenetic signature to AJCC TNM stage model, ROC curve analysis was performed in TCGA database. As was displayed in Figure   6A and 6B, the AUC of epigenetic signature for predicting 5-year and 3-year OS were 0.795 and 0.759, respectively, while that of TNM stage model were 0.543 and 0.541, respectively. Similar results were also found in the GEO cohorts. The AUC of epigenetic signature for predicting 5-year and 3-year OS were 0.798 and 0.812, respectively, and the AUC of the TNM stage model were 0.621 and 0.649, respectively ( Figure 6C and 6D). Finally, the clinical usefulness of the epigenetic signature was measured by the DCA, an abstract statistical concept, which provided visualized information on the clinical value of a model. DCA graphically revealed that the epigenetic signature at diverse cutoff times (5-year and 3-year OS) was superior to the traditional TNM staging based on the continuity of potential death threshold (x-axis) and the net benefit of risk stratification using the model (y-axis) in TCGA cohorts and GEO cohorts (Figure 7).

Joint survival analysis and methylated loci associated with OS
Joint survival analysis, that is, the methylation and gene expression matched evaluation, was additionally conducted to exploit the prognostic value of MAGEB2, SUSD1, ZNF382, ZNF418 and ZNF732. The hypermethylation and low-expression of ZNF382 and ZNF418 exhibited a marked correlation with the poor prognosis of patients with LSCC; The hypomethylation and high-expression of MAGEB2 and SUSD1 exhibited a conspicuous association with the unfavorable prognosis of patients with LSCC. Nevertheless, the combination of methylation and expression of ZNF732 did not have a significant effect on the prognosis of LSCC patients. A total of 61 methylation sites (10 methylated sites in MAGEB2, 7 methylated sites in SUSD1, 13 methylated sites in ZNF382, 16 methylated sites in ZNF418, and 15 methylated sites in ZNF732) were screened out and the univariate Cox regression analysis uncovered that 4 key methylation loci (2 specific methylation sites in MAGEB2 and 2 specific methylation sites in ZNF382) were significantly associated with LSCC prognosis (TableS1).  With respect to predictive capacity and clinical usefulness, the epigenetic signature was superior to traditional TNM stage. Additionally, the epigenetic signature was validated in external LSCC cohorts from GEO. Finally, CMap matched the 10 most significant small molecules as promising therapeutic drugs to reverse the LSCC gene expression.

Identification of related active small molecules
An increasing number of researches recognize that epigenetic changes such as the hypermethylation of tumor suppressor genes and hypomethylation of oncogenes in the diagnosis, progression and prognosis of LSCC played a critical role [11][12][13]. By quantitative methylation-specific polymerase (qMSP) chain reaction assays in 96 LSCC patients, Shen et al [11] revealed that LZTS2 promoter hypermethylation is linked to risk, progression, and prognosis of LSCC,which can serve as a diagnostic and prognostic biomarker for LSCC. Analyzing the LSCC tissues and Hep-2 cells to investigate the methylation status of the CpG islands of MYCT1 and mRNA levels, Yang et al [12] identified that hypermethylation contributed to the transcriptional downregulation of MYCT1 and could inhibit cancer cell differentiation in LSCC. By interfering with its binding to c-Myc, DNA methylation of the CGCG site (− 695 to − 692) of MYCT1 altered the promoter activity in LSCC. Recently, Weigel et al [13] concluded that a regulatory role of TREX2 DNA methylation for gene expression which might be an indicative of incidence and survival in LSCC.TREX2 DNA methylation could be a potential predictor of treatment response and as a biomarker to understand carcinogenesis in stratified epithelia. To our knowledge, this is the first study carried out a genome-wide integrated analysis of methylation and the transcriptome from TCGA database to seek novel biomarker as potential molecular targeted therapy and to create an epigenetic signature for LSCC patients to optimize therapeutic strategies.
When applying high-throughput methodology with 450,000 probes, it is necessary to distinguish the epigenetic changes ("driver") that act as effectors of the malignant phenotype from alterations of "passenger" without any biologic function [14]. Hence, a model-based tool (MethylMix) was applied to identify those genes with aberrant methylation and related these data to RNA-seq data affecting gene expression. Consequently, we identified a cohort of 88 MDGs in LSCC. The functional analysis indicated MDGs have mainly attached oneself to gene expression (transcription), RNA polymerase II transcription, transcription factor activity, sequence-specific DNA binding and so on. It was a hint that DNA methylation is involved in the dysregulation of genes with distinct functions and is functionally linked to outcomes in LSCC patients. Guidelines, decision making about larynx-preserving are highly relied on tumor TNM staging. In our research, via ROC curve analysis, epigenetic signature shown more precise predictive ability compared with TNM stage model in TCGA and GEO database, which could effectively identify low-risk patients prone to larynx-preserving treatment and avoid needless total laryngectomy. Interestingly, DCA results indicated that LSCC survival-related treatment decisions based on the epigenetic signature led to more net benefit than treatment decisions based on TNM stage, or treating either all patients or none in TCGA and GEO database. Taken together, the current epigenetic signature would be clinically useful for the clinicians in tailoring survival-associated treatment decisions.
One prominent finding in our study was that combining methylation and RNA expression data with survival analysis, we identified 4 MDGs (MAGEB2, SUSD1, ZNF382, ZNF418), which may be serve as potential biomarkers or drug targets for early diagnosis and prognostic assessment. MAGEB2, as a member of the MAGEB family, which belongs to the cancer testicular antigens, is located in the last exon on chromosome X. The MAGEB2 gene has been reported to be overexpression in several types of tumors, such as esophageal cancer [15] and lung cancer [16], which has been implicated in carcinogenesis and considered as a potential cancer biomarker [17]. Pattini et al. [18] indicated that MAGEB2 is activated by promoter demethylation in HNSCC, which has growth-promoting effects on a minimally transformed oral keratinocyte cell line. SUSD1, which encodes the sushi domain-containing protein 1 precursor, acts as a common motif in protein-protein interaction [19]. SUSD1 presented subthreshold associations with venous thromboembolism was detailed in a report by Tang et al [20].
Nevertheless, information regarding the role of SUSD1 in cancer is lacking. ZNF418 (zinc finger protein 418), a novel Krüppel-associated box/Cys2His2 zinc finger protein, which was involved in critical regulators for the initiation, maintenance, and development of tumor via cell differentiation, tumor suppression, proliferation and apoptosis [26,27]. Wang et al [28] revealed that ZNF418 was identified as a direct target of miR-1204 and mediated the function of miR-1204 in HCC cells, which inhibited HCC progression through MAPK and c-Jun signaling pathways and potentially serves as a new prognostic biomarker and therapeutic target for HCC. Thus, further characterization of molecules such as MAGEB2, SUSD1, ZNF382 and ZNF418 will provide a new perspective for the development and progress of LSCC, and aided to find potential therapeutic targets for LSCC patients.
Another prominent finding in our study was that we identify a set of potential small molecule drugs that mimic the expression of normal cellular genes, analyzing the MDGs in CMap database. Small molecule drugs with a highly significant negative enrichment value possessed the potential to alter the gene expression of LSCC, and thus inhibiting the progression of tumors. Simvastatin, as a type of statin, specific inhibitors of 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase, is frequently utilized to treat hypercholesterolemia and prevent cardiovascular diseases. A growing number of researches have reported simvastatin-related potentially beneficial effects in cancers, including prostate cancer, breast cancer, ovarian cancer, and adenocarcinoma [29][30][31][32]. Recently, Ma et al [33] revealed that simvastatin induced cell cycle arrest in the G1 phase in C666-1 cells via decreasing the expression of cyclin-dependent kinase 4 (CKD4) and cyclin D1, and increasing p27 expression in C666-1 cells. Simvastatin treatment inhibited protein kinase B and extracellular signalregulated kinase 1/2 activation, which is a potential chemotherapy agent for nasopharyngeal carcinoma. Thiocolchicoside, a semi-synthetic colchicine derived from plant honeysuckle, is a muscle relaxant and utilized to treat orthopedic disorders and rheumatologic on account of its antiinflammatory and analgesic mechanisms. Reuter et al [34] demonstrate that thiocolchicoside exerts an effect on anticancer via the NF-κB pathway resulting in inhibition of cyclooxygenase-2 promoter activity and NF-κB reporter activity. However, efficacy and safety of those small molecule drugs (including simvastatin and thiocolchicoside) on LSCC are still not investigated. Hence, it is urgently demanded to verify the effect of these candidate small molecule drugs on treating LSCC in further studies.
Despite the remarkable sense, it is inevitable that limitations also existed in our study. First, we merely extract retrospectively target data (TCGA and GEO datasets) through biological algorithm approaches the absence of fresh clinical samples to screen and verify our results. So, the application          Gene set enrichment analysis for identification of the underlying pathways using risk score as the phenotype (A and B).

Figure 5
Gene set enrichment analysis for identification of the underlying pathways using risk score as the phenotype (A and B).