Identification of prognostic related methylation sites
After patient data were preprocessed as described above, 206,635 methylation sites were identified. To identify the specific CpG sites that were significantly correlated with overall survival in laryngeal cancer, the univariate COX regression analysis was conducted. Among these methylation sites, 17,751 methylation sites were identified as potential DNA methylation biomarkers for overall survival in patients with laryngeal cancer. Subsequently, a multivariate COX regression analysis was performed on these CpG sites (variables also include TMN stage, age, stage), and 455 independent CpG sites related to the prognosis of patients with laryngeal cancer were identified (Supplementary Table 1).
Molecular subtype classification based on Consensus Clustering
Using the ConsensusClusterplus R software package, the methylation profiles of 455 CpG sites from 112 samples were used for consensus clustering of samples to obtain molecular subtypes of laryngeal cancer. Consensus unsupervised clustering of 122 samples from laryngeal cancer patients revealed 2–7 clusters. Compared with 3, 4, and 6 clusters, cluster 5 had a lower value for the proportion of ambiguously clustered pairs (PAC), which reflected a near-perfect stable partitioning of the samples at the correct K value (Fig. 1A-1C). The consensus matrix shown in Fig. 2A represents the consensus for k = 5 and displays a well-defined 5-block structure. A heatmap corresponding to the dendrogram in Fig. 2A with T category, N category, M category, stage, age, and DNA methylation subgroup as the annotations is shown in Fig. 2B.
Kaplan-Meier survival analysis showed a significant difference in the prognosis between the five subgroups (P < 0.05). As shown in Fig. 3A, cluster 3 has the best prognosis, while cluster 4 and 2 have the worst prognosis. Then, we analyzed the intra-cluster ratios of 5 clusters according to the T, N, M, stage, and age, as shown in Fig. 3B-3F. The correlation trends between features and specific clusters are as follows: Cluster 3,4 with young patients; Cluster 3 with the male population; Clusters 2 with higher grade and advance stage; Cluster 3,5 with early-stage; Cluster 4 with higher M stage; Cluster 1,3 with higher T stage; Cluster 3 with lower N stage (Fig. 4F). These results indicate that each clinical parameter is related to a different intra-cluster ratio.
Annotate prognostic-related CpG sites and functional enrichment analysis
Genomic annotation of the above-mentioned prognostic CpG sites was used to identify the promoter regions of 567 genes. Then we conducted a functional enrichment analysis of 567 genes and identified significantly 405 GO terms and 17 enriched pathways (P < 0.05), as shown in Fig. 3A, B. The top 5 enriched GO terms were enriched in signal transduction by p53 class mediator, autophagy, DNA repair complex and p53 binding, which included basic cancer-related biological processes. The three most significantly enriched pathways were the TGF − beta signaling pathway, autophagy, and cell cycle. As shown in Fig. 3C, close relationships were identified among the 17 pathways and between those pathways and the cell cycle and cellular senescence.
Then, the expression of methylated genes identified in the subgroups was explored. The expression values of 567 genes out of 122 samples are available. The gene expression heat map is shown in Fig. 3D. The gene expression patterns differ between these subgroups, which indicates that the DNA methylation level usually reflects the expression of these genes.
Identifying subtype-specific CpG sites
The expression of methylated genes identified in the subgroup was explored. 455 CpG sites were analyzed for differences among 5 molecular subtypes, and 70 molecular subtype-specific CpG sites were identified. Among them, subtype 3 has the most differentially expressed CpG sites, which are hypermethylated compared with other subtypes (Table 1). The gene expression heatmap is shown in Fig. 4C. The differences in gene expression patterns between subgroups suggest that DNA methylation levels usually reflect the expression of these genes. Then, we further screen for cluster-specific methylation sites by including the methylation sites as features of clusters. Finally, the 32 cluster-specific methylation sites were identified and the heat map was shown in Fig. 5A. Analysis using clusterProfiler showed that these genes were enriched in 3 pathways, as shown in Fig. 5B. Cluster 3 had the largest number of specific sites, all of which were hypermethylated, and the methylation level was the highest among all the clusters (Fig. 6).
Table 1
Multivariate COX stepwise regression analysis of hub DNA methylation sites.
DNA methylation site
|
coef
|
HR
|
HR.95L
|
HR.95H
|
pvalue
|
cg00413617
|
-3.90264
|
0.020189
|
0.000579
|
0.704155
|
0.031278
|
cg01330280
|
8.118517
|
3356.042
|
19.0643
|
590790.9
|
0.002089
|
cg05893785
|
5.122021
|
167.6739
|
9.392133
|
2993.413
|
0.000496
|
cg07098752
|
-4.02448
|
0.017873
|
0.000666
|
0.479747
|
0.016506
|
cg10699171
|
3.121739
|
22.68581
|
2.139974
|
240.4916
|
0.009555
|
cg16508480
|
2.441683
|
11.49237
|
0.722159
|
182.8884
|
0.083736
|
cg18470295
|
4.456162
|
86.15617
|
2.747028
|
2702.151
|
0.011252
|
cg24464397
|
-3.47284
|
0.031029
|
0.000574
|
1.678696
|
0.08809
|
cg24911113
|
-3.74838
|
0.023556
|
0.001724
|
0.321816
|
0.004956
|
Establishment and evaluation of a prognostic prediction model for laryngeal cancer patients
Cluster 3 was chosen as the candidate cluster because it contains a large number of samples, has a good prognosis, and has the most specific methylation sites. In multivariate Cox regression method, nine methylation sites as optimal features were ultimately recognized, including cg00413617, cg01330280, cg05893785, cg07098752, cg10699171, cg16508480, cg18470295, cg24464397 and cg24911113 (Table 2). Then, we obtained the risk score based on the formula described above and calculated the risk score of each sample (Fig. 7A). As the risk score increased, the methylation level of nine specific sites increased significantly. Hypermethylation was associated with low-risk patients, while hypomethylation was associated with high-risk patients. The area under the curve of the ROC curve for 1,3,5 year reached 0.80, 0.86, and 0.89, respectively, indicating that the model functioned well (Fig. 7B). Based on the median of risk score, the samples were divided into two groups: high-risk group and low-risk group. Patients in the high-risk group suffered a lower survival probability (Fig. 7C). Univariate and multivariate Cox regression analysis suggested that the established model could become an independent predictor, including age, gender, histological grade, stage, TNM stage, and risk score (Fig. 8A, B, Table 3).
Table 2
Identification of differentially expressed DNA methylation sites between Cluster 3 and other Clusters
DNA methylation site
|
conMean
|
treatMean
|
logFC
|
pValue
|
fdr
|
cg00413617
|
0.325219
|
0.468693
|
0.527231
|
3.79E-07
|
8.22E-06
|
cg00765828
|
0.263837
|
0.487255
|
0.88503
|
6.36E-07
|
1.32E-05
|
cg01330280
|
0.133973
|
0.092864
|
-0.52874
|
0.000205
|
0.001084
|
cg01501009
|
0.295805
|
0.433979
|
0.552982
|
6.88E-07
|
1.36E-05
|
cg01760756
|
0.233642
|
0.364166
|
0.640295
|
6.13E-08
|
2.15E-06
|
cg02002583
|
0.322265
|
0.515678
|
0.678223
|
1.52E-08
|
9.89E-07
|
cg02825211
|
0.298547
|
0.507008
|
0.764052
|
1.16E-08
|
9.65E-07
|
cg04524477
|
0.183084
|
0.109166
|
-0.74598
|
9.77E-05
|
0.000563
|
cg04671611
|
0.304968
|
0.509565
|
0.74061
|
3.36E-08
|
1.70E-06
|
cg04682802
|
0.428694
|
0.622653
|
0.538483
|
3.64E-07
|
8.22E-06
|
cg05893785
|
0.278157
|
0.195427
|
-0.50927
|
5.49E-05
|
0.000357
|
cg07098752
|
0.27403
|
0.42237
|
0.624174
|
3.82E-06
|
4.82E-05
|
cg09685934
|
0.153233
|
0.242603
|
0.662869
|
1.19E-06
|
1.93E-05
|
cg10378804
|
0.149836
|
0.088401
|
-0.76125
|
2.40E-05
|
0.000195
|
cg10699171
|
0.342727
|
0.216581
|
-0.66215
|
0.000302
|
0.00143
|
cg11308277
|
0.204648
|
0.357149
|
0.80338
|
4.74E-08
|
1.96E-06
|
cg11718317
|
0.148513
|
0.257033
|
0.791361
|
3.24E-05
|
0.000246
|
cg13614286
|
0.154288
|
0.240086
|
0.63793
|
1.48E-07
|
3.97E-06
|
cg13765961
|
0.676689
|
0.470397
|
-0.52461
|
9.78E-07
|
1.71E-05
|
cg14901205
|
0.324492
|
0.590179
|
0.862969
|
1.27E-08
|
9.65E-07
|
cg15746719
|
0.330852
|
0.530793
|
0.681965
|
1.10E-06
|
1.85E-05
|
cg16508480
|
0.387702
|
0.556004
|
0.520144
|
4.98E-05
|
0.000333
|
cg17102963
|
0.267475
|
0.471817
|
0.818821
|
1.94E-09
|
8.56E-07
|
cg18470295
|
0.266957
|
0.417205
|
0.644148
|
7.33E-06
|
7.76E-05
|
cg18885392
|
0.492172
|
0.34046
|
-0.53168
|
5.11E-06
|
5.67E-05
|
cg21549195
|
0.356057
|
0.605258
|
0.765441
|
2.06E-07
|
5.21E-06
|
cg21609106
|
0.253179
|
0.48718
|
0.944299
|
9.72E-09
|
9.65E-07
|
cg23288973
|
0.227864
|
0.4023
|
0.8201
|
9.05E-07
|
1.65E-05
|
cg24120357
|
0.094457
|
0.062855
|
-0.58764
|
0.014751
|
0.028683
|
cg24464397
|
0.217307
|
0.339064
|
0.641821
|
8.16E-06
|
8.44E-05
|
cg24911113
|
0.482802
|
0.69979
|
0.53549
|
4.35E-08
|
1.96E-06
|
cg25563456
|
0.279144
|
0.437056
|
0.646809
|
1.55E-06
|
2.44E-05
|
Table 3
The prognostic effect of different clinical parameters.
Variable
|
Univariate analysis
|
Multivariate analysis
|
|
HR
|
95%Upper CI
|
95%Lower CI
|
P-value
|
HR
|
95%Upper CI
|
95%Lower CI
|
P-value
|
age
|
0.999068
|
0.958695
|
1.041141
|
0.964665
|
0.989581
|
0.950541
|
1.030224
|
0.610039
|
gender
|
0.27971
|
0.12816
|
0.61047
|
0.001377
|
0.253659
|
0.097922
|
0.657081
|
0.004732
|
grade
|
0.859083
|
0.54693
|
1.349394
|
0.509711
|
0.56247
|
0.334321
|
0.946314
|
0.03017
|
stage
|
1.016037
|
0.637785
|
1.618617
|
0.946612
|
0.612462
|
0.283796
|
1.321759
|
0.2116
|
T
|
0.971235
|
0.669208
|
1.409572
|
0.877939
|
1.720153
|
0.902914
|
3.277086
|
0.099065
|
M
|
1.032813
|
0.72288
|
1.475629
|
0.85923
|
0.644025
|
0.422328
|
0.982099
|
0.040967
|
N
|
1.944678
|
1.456393
|
2.596671
|
6.53E-06
|
2.352073
|
1.652825
|
3.347146
|
2.02E-06
|
riskScore
|
2.718282
|
1.950762
|
3.78778
|
3.48E-09
|
3.094405
|
2.114331
|
4.528781
|
6.13E-09
|