Immune-related DNA methylation data-based molecular 1 classification associated with the prognosis of patients with 2 hepatocellular carcinoma

11 Background ： The combination of epigenetic drugs and immunotherapy should be able 12 to develop an optimal treatment plan for hepatocellular carcinoma (HCC), yet its 13 mechanism is still in the preliminary exploration stage. The purpose of this study is to 14 analyze the DNA methylation and gene expression profiles of immune-related CpG 15 sites to identify the molecular subtypes and CpG sites related to the prognosis of 16 HCC. 17 Methods: In this study, the DNA methylation and gene expression datasets were 18 downloaded from The Cancer Genome Atlas database, together with immune-related 19 genes downloaded from the immunology database and analysis portal database to explore the prognostic molecular subtypes of HCC. Univariate and multivariate 21 survival analysis was used for selecting the significant methylation sites, and the 22 consensus clustering was performed to find the best molecular subtype associated 23 with the survival of HCC. Next, we used the least absolute shrinkage and selection 24 operator (LASSO) algorithm to construct a prognostic-related model and performed 25 internal verification. Finally, we explored the levels of 16 immune-related genes 26 expression correlate with the infiltration levels of immune cells in HCC. 27 Results: By performing consistent clustering analysis on 830 immune-related CpG 28 sites in 231 samples of a training set, we identified seven subgroups with significant 29 differences in overall survival. Finally, 16 classifiers of immune-related CpG sites 30 were constructed and used in the testing set to verify the prognosis of DNA 31 methylation subgroups, and the results were consistent with the training set. Using the 32 TIMER database, we analyzed 16 immune-related CpG sites expression with the 33 abundance of six types of immune infiltrating cells and found that most are positively 34 correlated with the level of infiltration of multiple immune cells in HCC. 35 Conclusions: This study screened potential immune-related prognostic methylation 36 sites and established a new prognosis model of HCC based on DNA methylation 37 molecular subtype, which may help in the early diagnosis of HCC and developing 38 more effective personalized treatments. 39

In the global epigenomic changes, DNA methylation is one of the important 86 epigenetic regulators, which can adjust the expression of related genes and the 87 phenotype of cells [14] and is closely related to the inactivation of the X chromosome 88 [15], tissue differentiation [16], cellular development [17], and genetic imprinting 89 [18]. Unlike genetic mutations, changes in DNA methylation can be reversed and 90 occur early in the body. Abnormal changes can be detected through body fluids and 91 have the potential for early diagnosis and prognosis of cancer [14]. To date, there is 92 already conclusive evidence that epigenetic markers can be used for prognosis and 93 predictive biomarkers in oncology, which will lead to breakthroughs in the prevention 94 and treatment of human cancer and will expand into other diseases in the future [19]. 95 A comprehensive analysis of DNA methylation and gene expression also contributes 96 to identify epigenetic drivers in cancer [20]. Xu [21] constructed and validated a 97 diagnostic prediction model for HCC based on ten specific methylation markers.  HumanMethylation450 Bead-Chip array in TCGA database were retrieved from the 127 UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/), which is an 128 application that gathers various -omic data with clinical and phenotype data to study 129 the correlation between them. The data downloaded from this website contained 380 130 tumor tissues and 50 adjacent normal tissues [30]. In addition, a total of 1811 IRGs 131 were derived from the immunology database and analysis portal (ImmPort, 132 https://www.immport.org/) [31]. 133 Preliminary screening of DNA methylation loci in hepatocellular carcinoma 134 The data of TCGA-LIHC DNA methylation array contain 485,578 methylation 135 sites, and the methylation level of each probe is represented by the β value, which 136 ranges from 0 to 1, with 0 indicating unmethylation and 1 indicating complete 137 methylation. On the one hand, we deleted samples with more than 70% missing 138 methylation sites or containing "NA" sites and removed unstable genomic sites, such 139 as methylation sites on sex chromosomes. Furthermore, the regulation of gene 140 silencing/expression by abnormal DNA methylation in the promoter region, which is 141 located 2,500 bp upstream to 500 bp downstream of the gene, is one of the key   -0.14*cg14076258±0.56*cg15140465±0.58*cg15929078±0.31*cg19476647+ 205 -0.05*cg21282997+-1.69*cg23165623+-0.70*cg24065044+-1.91*cg26822501. 202 The expression profile data of specific CpG methylation sites were subsequently 206 extracted from both the training and testing group, followed by substitution into the 207 model for calculation. According to the above-mentioned formula, we can get the risk 208 score of each methylation site. The training set is divided into a high-risk group and a     To discover the molecular subgroups related to the prognosis of HCC, we 260 performed a consensus cluster analysis on the methylation levels of the 830 261 methylation sites obtained in the univariate and multivariate analysis mentioned above.

262
Taking advantage of the ConsensusClusterPlus package in R software, we re-sampled 263 the qualified data 1,000 times and judged the optimal molecular subtype according to  We further analyzed the differences in 830 methylation sites in HCC subtypes, 301 obtaining 238 subtype-specific CpG sites and 189 genes in the promoter region. The 302 heatmap in Figure 5B shows that cluster 4 has the most differential methylation sites, 303 and all of them are hypomethylation sites. Clusters 5 and 3, with a good prognosis, 304 were enriched to 34 and 48 differential methylation sites, most of which were 305 hypermethylation sites. Compared with other clusters, cluster 5 has the highest 306 methylation level (Figure 6).

307
To study the biological mechanism of these specific methylation sites in the  Figure   334 8A, the ordinate is the correlation coefficient. As the value of λ increases, the 335 correlation coefficient gradually becomes 0. The ordinate in Figure 8B is the error of 336 cross-validation. When the cross-validation error is the smallest, we can see that the 337 optimal coefficient to be included in the model is 16.

338
Subsequently, we used the formula in the Materials and Methods section to 339 calculate the risk score of each sample, ranked the samples according to the risk score, 340 and selected the median value of the risk score to divide the training set into two 341 groups of high and low risk. As shown in Figure 9, the high-risk group has a worse 342 prognosis ( Figure 9A), and the risk score is between -11 and -1 ( Figure 9B). As the 343 risk score increases, the patient's OS rate decreases ( Figure 9C). As shown in Figure   344 9D, patients in the high-risk group have a worse prognosis, which is associated with  infiltrating levels of CD8 + T cell, neutrophil and dendritic cell ( Figure 13N).  to discover molecular subgroups related to the prognosis of HCC. We found that the 461 survival difference among each subtype is statistically significant (Figure 4A), and 462 every subtype has different clinical characteristics. In addition, we annotated the 830 463 specific methylation sites and obtained 513 genes located in the promoter region.

464
Combining this with transcriptome data, we extracted the expression of these genes in   Yes.

555
Competing interests 556 The authors declare that they have no conflicts of interest. We thank Accdon (www.accdon.com) for its linguistic assistance during the 569 preparation of this manuscript.