Integrative modeling of multi-omics data for predicting tumor mutation burden in lung cancer patients

Immunotherapy has been widely used in the treatment of lung cancer, and one of the most effective biomarker for the prognosis of immunotherapy currently is tumor mutation burden (TMB). Although whole-exome sequencing (WES) could be utilized to assess TMB, several problems prevent its routine clinical application. TPM:TMB prediction model; SNPs:single-nucleotide polymorphisms; UCSC:University Cruz; RPM:reads per million; BMI:Body Mass Index; GO:Gene Ontology; KEGG:Kyoto Encyclopedia of Genes and Genomes; LASSO:least absolute shrinkage and selection operator; PPV:positive predictive value; NPV:negative predictive value; PCA:principal component analysis; ROC:receiver operating characteristic; TSS1500:sequence region from − 200 to -1500 nt upstream of the transcription start site; TSS200:sequence region − 200 nt upstream of the transcription start site; ECM:extracellular matrix; EPIMMUNE:epigenomic prole based on a microarray DNA methylation signature; FPKM:Fragments per kilobase per million mapped reads; Ct:cycle threshold.


Abstract Background
Immunotherapy has been widely used in the treatment of lung cancer, and one of the most effective biomarker for the prognosis of immunotherapy currently is tumor mutation burden (TMB). Although whole-exome sequencing (WES) could be utilized to assess TMB, several problems prevent its routine clinical application.

Methods
To develop a simpli ed TMB prediction model, patients with lung adenocarcinoma (LUAD) in The Cancer Genome Atlas (TCGA) were randomly split into training and validation cohorts, and categorized into TMBhigh (TMB-H) and TMB-low (TMB-L) groups respectively.

Results
Based on the 610 differentially expressed genes, 50 differentially expressed miRNAs and 58 differentially methylated CpG sites between TMB-H and TMB-L patients, we constructed 4 predictive signatures and established TMB prediction model through machine learning methods that integrating the expression or methylation pro les of 7 genes, 7 miRNAs and 6 CpG sites. The multi-omics model exhibited excellent performance in predicting TMB with the area under curve (AUC) of 0.911 in the training cohort and 0.859 in the validation cohort. Besides, the signi cant correlation between the multi-omics model score and TMB was observed.

Conclusions
In summary, we developed a prognostic TMB prediction model by integrating multi-omics data in patients with LUAD, which might facilitate the further development of quantitative real time-polymerase chain reaction (qRT-PCR) based TMB detection assay.

Background
Lung cancer is one of the most common malignancies worldwide, and it is the rst leading cause of tumor-related mortality with an increasing incidence in recent years [1]. It was reported that 2.1 million new cases of lung cancer were diagnosed around the world in 2018, which accounted for 11.6% of all new cancer patients [2,3]. Despite the improvements in chemotherapy and targeted therapy, the 5-year overall survival (OS) for patients with lung cancer remains poor [1,4]. Nevertheless, immunotherapy, especially the application of immune checkpoint inhibitors (ICIs), had made a great breakthrough in the treatment of cancer and dramatically increased survival rate and quality of life for patients with lung cancer [5][6][7][8][9].
As the most successful representative of immunotherapy, programmed cell death-1/programmed cell death ligand-1 (PD-1/PD-L1) inhibitors had shown better performance over the conventional chemotherapy in terms of OS, response rate, and progression-free survival (PFS) for the treatment of lung cancer [10,11]. Four ICIs that targeting PD-1 or PD-L1 have been approved by the U.S. Food and Drug Administration (FDA) for treatment in patients with non-small cell lung cancer (NSCLC) [12]. Furthermore, a large amount of clinical research had demonstrated that immunotherapy alone or in combination with chemotherapy could be used for rst-line treatment of patients with metastatic lung cancer [13][14][15][16]. It was reported that patients with higher PD-L1 expression had better outcomes compared to patients with lower or no PD-L1 expression using the anti-PD-L1 antibody clone 22C3 [17]. Unfortunately, only 10%-20% of NSCLC patients have considerable curative effects and the majority of patients cannot bene t from immunotherapy [18][19][20], therefore biomarker is urgently needed to rationalize the utilization of immunotherapy for patients.
Tumor mutation burden (TMB) emerged recently as a reliable biomarker that signi cantly correlated with immunotherapy e cacy across a wide spectrum of tumor types. TMB is de ned as the number of somatic mutations per megabase (Mb) of the genome examined. Previous studies found that higher TMB was associated with improved objective response, durable clinical bene t, and PFS in NSCLC patients treated with PD-1/PD-L1 inhibitors [21]. It had been reported that PFS among stage IV or recurrent NSCLC patients with high TMB was signi cantly longer with PD-1/PD-L1 plus cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) treatment than with chemotherapy [22]. Moreover, through analyzing 7,033 patients with different types of cancer, TMB was found to be a useful biomarker for predicting the response of ICIs across different types of cancer, and higher TMB (highest 20% in each histology) was associated with better OS [23].
Whole-exome sequencing (WES) is considered as the gold standard for the measurement of TMB, but it is time-consuming and carries high cost [24]. Thus, targeted next generation sequencing (NGS) that pro les the mutation landscape of selected genes has been adopted as an alternative approach for predicting the TMB. Memorial Sloan Kettering-Integrated Mutation Pro ling of Actionable Cancer Targets (MSK-IMPACT) (468 genes) and FoundationOne companion diagnostic (CDx) (324 genes) are two extensively utilized targeted NGS methods, and both of them have been approved by the FDA for clinical application (https://www.cancer.gov/news-events/cancer-currents-blog/2017/genomic-pro ling-tests-cancer). Despite the fact that targeted NGS is effective in predicting TMB, various problems arise for its routine clinical application, such as the limit of detection, germline mutation exclusion, and standard cutoff threshold determination [25]. Moreover, targeted NGS is still time-consuming and costly compared with other clinical molecular tests [25]. In an effort to establish a simpli ed, more cost-effective and faster approach to predict TMB in patients with lung adenocarcinoma (LUAD), this study was conducted to develop a multi-omics model that can precisely categorize patients into high TMB or low TMB for better immunotherapy prognosis.
In this study (Fig. 1), we rstly divided patients in The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) into TMB-high (TMB-H) and TMB-low (TMB-L) groups. Then, the differentially expressed genes, miRNAs and differentially methylated CpG sites were calculated between the two TMB groups in training cohort. Next, a multi-omics TMB prediction model (TPM) involving the expression pro les of selected genes, miRNAs and methylation pro les of CpG sites was established through the machine learning method with high sensitivity and speci city. Finally, the independent cohort (validation cohort) was used to verify the performance of TPM, therefore enabling the prediction of the TMB in patients with LUAD.  Table S1) [26]. The somatic mutation pro les (mutation annotation format, MAF) were processed by Mutect software. Missense mutations, nonsense mutations, splice-site mutations, frameshift insertions, frameshift deletions, in-frame insertions or in-frame deletions identi ed in the samples were regarded as positive mutations. The gene expression pro les of 594 samples were annotated through g:Pro ler website [27], and normalized using the scale method in limma package [28]. The low abundance pro les were eliminated. The DNA methylation pro les were annotated using IlluminaHumanMethylation450kanno.ilmnl12.hg19 R package . Quality control for the DNA methylation pro les was conducted through min R package to eliminate certain CpG sites [29], in which the singlenucleotide polymorphisms (SNPs) existed [30], multiple mapping to human reference genome was found [31], and the methylation information of any samples was not available. In addition, the CpG sites located in the sex chromosomes were excluded for analysis [32]. The miRNA expression pro les of 495 samples including 450 samples from LUAD tissue and 45 samples from matched normal lung tissue in TCGA database were downloaded from the University of California Santa Cruz (UCSC) Xena database (https://xena.ucsc.edu/public). Then, the miRNA expression pro les were transformed into reads per million (RPM) and the miRNAs expressed in more than 10% of patients with LUAD were extracted. The clinical information of 522 patients with LUAD from the TCGA database was obtained using TCGAbiolinks R package [26]. The clinical information covered submitter id, age, gender, tumor stage, state, weight, Body Mass Index (BMI), alcohol history, height, days to last follow up, years smoked and race, the information was used for subsequent analysis (Table 1).

TMB calculation and classi cation of patients
Somatic mutation pro les of the acquired samples were processed by Mutect software, and the identi ed somatic mutations, including base substitution, deletions, and insertions, were ltered according to the following criteria: rst, the minimum sequencing coverage for the mutations should be greater than or equal to 10; second, the variant allelic fraction should be greater than or equal to 5%. Then, the TMB was calculated as the total count of somatic mutations identi ed divided by 38 Mb, which is the length of exons in this study. According to the previously reported cutoff threshold of 10 in patients with LUAD [22,33], they were divided into TMB-H group (TMB ≥ 10) and TMB-L group (TMB < 10). Density plot of TMBdistribution for all patients with LUAD and boxplot of the correlation between tumor stage and TMB was drawn by ggplot2 R package .

Tumor-in ltrating immune analysis
Tumor-in ltrating immune analysis was performed through Tumor Immune Estimation Resource (TIMER) tool [34], which is a comprehensive resource for systematical analysis of immune in ltrates across diverse cancer types. The estimated abundances of six immune in ltrates (B cells, CD4(+) T cells, CD8(+) T cells, Neutrophils, Macrophages and Dendritic cells) were compared between TMB-H and TMB-L patients.

Multi-omics analysis between TMB-H and TMB-L patients
Differentially expressed genes between TMB-H and TMB-L patients in the training cohort were identi ed through limma R package with adj.p-value < 0.01 and log2 FoldChange > 3 [28], and then illustrated in volcano plot and heatmap by ggplot2 and pheatmap R package respectively . Differentially expressed miRNAs between TMB-H and TMB-L patients in the training cohort were identi ed through limma R package with adj.p-value < 0.05 and log2 FoldChange > 0.35 [28], and then illustrated in volcano plot and heatmap by ggplot2 and pheatmap R package respectively . In addition, the target genes of the differentially expressed miRNAs were searched and analyzed through miRWalk website tool (http://mirwalk.umm.uni-heidelberg.de/) [35]. Differentially methylated CpG sites between TMB-H and TMB-L patients in the training cohort were identi ed through limma R package with adj.p-value < 0.05 and log2 FoldChange > 0.15 [28], and then illustrated in volcano plot and heatmap by ggplot2 and pheatmap R package respectively and annotated by IlluminaHumanMethylation450kanno.ilmn12.hg19 R package .

Functional enrichment analysis
Gene Ontology (GO) database and Kyoto Encyclopedia of Genes and Genomes (KEGG) database are two extensively used approaches for functional enrichment studies of large-scale genes [36,37]. We rst converted the gene symbols into ENTREZ ID via org.Hs.eg.db R package, and then GO and KEGG analysis of differentially expressed genes were implemented using ggplot2, enrichplot and clusterPro ler R packages [38]. Meanwhile, the GO and KEGG enrichment analysis were conducted for the target genes of differentially expressed miRNAs using the same method as described above.
1.6 Construction of the TPM Firstly, we constructed 4 possible prediction biomarker signatures: gene signature (45 genes), miRNA signature (45 miRNAs), CpG site signature (45 CpG sites), and multi-omics signature (15 genes + 15 miRNAs + 15 CpG sites) using differentially expressed genes, miRNAs and differentially methylated CpG sites with a minimum p-value between TMB-H and TMB-L patients in the training cohort. Then, the expression pro les or methylation pro les of each gene, miRNA and CpG site from the 4 biomarker signatures in the training cohort were extracted. Next, the least absolute shrinkage and selection operator (LASSO) logistic regression model analysis was performed to select the optimal biomarker signature for predicting TMB through glmnet R package [39]. The predictive performance for each biomarker signature was evaluated by the lambda.min and matched area under curve (AUC). Finally, the differentially expressed or methylated genes, miRNAs and CpG sites identi ed with non-zero regression coe cients were used to construct the TPM. The TPM score was calculated using the regression coe cients from the LASSO analysis to weight the expression or methylation level of the chosen biomarkers. The validation cohort was used to evaluate the performance of the TPM through assessing the predicting sensitivity, speci city, positive predictive value (PPV), negative predictive value (NPV) and the AUC.

Principal component analysis (PCA)
Differentially expressed genes, miRNAs, and differentially methylated CpG sites identi ed through LASSO analysis were used to perform PCA. The expression or methylation pro les of the genes, miRNAs and CpG sites were extracted from each patient, and ggfortify R package was utilized to conduct the PCA .

ROC analysis
Receiver operating characteristic (ROC) curve is a useful tool to evaluate classi ers for biomedical and bioinformatics applications. In this study, ROC curve analysis was conducted using pROC R package to investigate the performance of TPM in predicting TMB [40].

Correlation analysis and regression analysis
The correlation between TPM score and TMB was analyzed by cor.test R function with the two-side pearson method. Samples were plotted in two-dimensional plots with the TPM score and TMB value.
Regression analysis between TPM score and TMB was performed using lm R function.

TMB-based division of patients with LUAD
The WES data of tumor tissue from a total of 567 patients were acquired from the TCGA-LUAD database, and the clinical characteristics of the patients were summarized (Table 1), the mean age of the patients was 65.800, among which 242 were males and 280 were females (Table 1). TMB was calculated as the number of somatic mutations identi ed per megabase (Mb) in the tumor tissue sample of each patient. It was found that most of patients with LUAD had a TMB ranging from 0 to 40 (Figure 2A). According to the cutoff threshold of TMB=10, 184 patients were classi ed as TMB-H and 383 patients were classi ed as TMB-L ( Figure 2B). The TMB-H and TMB-L patients were found evenly distributed in different tumor stages as expected ( Figure 2C). From the TCGA-LUAD project, tumor samples from 440 patients were found having coupled WES, RNA-seq, miRNA-seq, and DNA methylation data ( Figure 3A, Additional le 6: Table S1), among which 148 patients belong to TMB-H group and 292 patients belong to TMB-L group ( Figure 3B). Patients were then randomly split into the training cohort (70%, 103 TMB-H patients Vs 204 TMB-L patients) and the validation cohort (30%, 45 TMB-H patients Vs 88 TMB-L patients) without overlap for developing a multi-omics model to predict TMB.

The landscape of tumor-in ltrating immune cells in patients with LUAD
The proportions of different tumor-in ltrating immune cells between TMB-H patients and TMB-L patients were calculated through the website tool "TIMER", the landscape of the tumor-in ltrating immune cells was summarized via the heatmap, in which the abundance of CD4(+) T cells (p = 0.030) showed more abundant density in TMB-L patients compared with TMB-H patients ( Figure 4A, Additional le 1: Figure  S1). Whereas the B cell, CD8(+) T cell, Dendritic cell, Macrophage cell and Neuprothil cell had similar density between TMB-H patients and TMB-L patients (Additional le 1: Figure S1, Figure 4A). Meanwhile, the correlations among different tumor-in ltrating immune cell types were moderate or weak ( Figure 4B).
These results suggested that the TMB status might be associated with the abundance of CD4(+) T cell.

Multi-omics analysis of transcriptome, miRNAome and methylome between TMB-H and TMB-L patients
Differentially expressed genes, miRNAs and differentially methylated CpG sites between TMB-H patients and TMB-L patients in the training cohort were identi ed. In summary, 480 genes and 36 miRNAs were upregulated in TMB-H patients, whereas 130 genes and 14 miRNAs were downregulated in TMB-H patients ( Figure 5A, B, C, D). Moreover, 10 CpG sites were hypermethylated and 48 CpG sites were hypomethylated in TMB-H patients ( Figure 5E, F). GO-enrichment analysis with clusterPro ler R package suggested that the differentially expressed genes were mainly involved in the biological processes including nuclear division, chromosome segregation, organelle ssion and so on (Additional le 2: Figure   S2A). Moreover, the differentially expressed genes were also enriched in the cellular component including nuclear chromosome part as well as in the molecular function including damaged DNA binding, cell adhesion molecule binding, single-stranded DNA binding and ribonucleoprotein complex binding. KEGG pathways enrichment analysis suggested that the differentially expressed genes were mainly related to pyrimidine metabolism (Additional le 2: Figure S2B). These results demonstrated that differentially expressed genes might correlated with carcinogenesis-related processes [41]. In addition, GO and KEGG enrichment analysis were also performed for the target genes of differentially expressed miRNAs. The target genes were found to be mainly enriched in netrin-activated signaling pathway, DNA-binding transcription activator, single-stranded RNA binding, MAP kinase activity and so on (Additional le 3: Figure S3A). KEGG analysis suggested that the target genes were enriched in Glycerolipid metabolism, Platinum drug resistance, EGFR tyrosine kinase inhibitor resistance and Endocrine resistance (Additional le 3: Figure S3B). In addition, most of the differentially methylated CpG sites were found to locate in the gene body regions (Additional le 4: Figure S4), and 5 CpG sites were found to locate in the TSS1500 (sequence region from -200 to -1500 nt upstream of the transcription start site) and TSS200 (sequence region -200 nt upstream of the transcription start site) region of genes (Additional le 4: Figure S4, Additional le 7: Table S2).

Machine learning based construction of TPM
To develop TPM, we rstly generated 4 possible prediction biomarker signatures including the gene signature, miRNA signature, CpG site signature and multi-omics signature, which were composed of expression pro les of top 45 differentially expressed genes, top 45 differentially expressed miRNAs and top 45 differentially methylated CpG sites between TMB-H and TMB-L patients respectively (Additional le 8: Table S3). Using 10-fold cross-validation method and two-class logistic regression (type.measure = 'auc'), we then implemented the LASSO logistic analysis to select the optimal signature based on the expression or methylation pro les of the genes, miRNAs and CpG sites from the training cohort. The optimal biomarkers for the 4 prediction biomarker signature were obtained with non-zero regression coe cients ( Figure 6, Table 2), and as a result, the multi-omics signature with maximum measure (0.868) was selected as the optimal biomarker signature for predicting the TMB ( Figure 6D, Table 2). PCA using the shrunk multi-omics signature suggested that TMB-H patients and TMB-L patients could be separated obviously (Additional le 5: Figure S5). Based on the multi-omics signature, we nally constructed TPM by weighing the expression or methylation of the genes, miRNAs and CpG sites through the regression coe cients from the LASSO analysis (Table 3). The TPM was showed as the following math formula: TPM score = -1.555696454 * cg02031308 -0.939485314 * cg03286742 -0.532855695 * cg04046889 -1.603385472 * cg12095807 -1.171295176 * cg16794961 -1.341848062 * cg24553235 + 0.203290638 * YBX2 + 0.000323171 * HLTF + 0.355814358 * KLC3 + 0.017454209 * WRNIP1 + 0.010739241 * CKS1B + 0.013056543 * RNF26 + 0.039397451 * ZYG11A + 0.582628142 * hsa-miR-571 + 3.954182602 * hsa-miR-586 + 0.068239671 * hsa-miR-151b + 0.000724033 * hsa-miR-378i + 0.25824073 * hsa-miR-6727-5p -0.731679875 * hsa-miR-502-3p -0.007119299 * hsa-miR-6798-3p. AUC of the constructed TPM in the training cohort was 0.911 showing its superior predictive accuracy ( Figure 7A). Besides, the p-value of two-side t-test was 3.40e-48 between TPM score and TMB ( Figure 7B), which suggested that TPM score was highly correlated with TMB in patients with LUAD.

Evaluation of the predicting accuracy of TPM in the validation cohort
To evaluate the predicting e cacy of the TPM constructed in the training cohort, the validation cohort was used in the subsequent analysis. We used the expression or methylation pro les of genes, miRNAs and CpG sites in the patients from the validation cohort as input parameters for calculating the TPM score. According to the threshold of -3.366, 41 patients from TMB-H group were predicted as TMB-H, and 66 patients from the TMB-L group were predicted as TMB-L. In summary, the TPM has a sensitivity of 0.911, speci city of 0.750 and accuracy of 0.805 in predicting the TMB in the validation cohort, and the PPV was 0.651, NPV was 0.943 (Additional le 9: Table S4). ROC analysis revealed the AUC of the TPM in the validation cohort was 0.859 ( Figure 8A), and the p-value of the two-side t-test was 1.19e-14 between the TPM score and TMB ( Figure 8B). These results suggested that the TPM performed relatively high TMB-predicting accuracy.

Discussion
Immunotherapy has been demonstrated particularly successful in NSCLC treatment and is being adopted as a rst-line treatment option worldwide [13,22,42]. Nevertheless, only a small portion of the unselected patients can bene t from immunotherapy [25,43]. Therefore, biomarkers for patient selection become important to achieve effective therapy. TMB has been recognized as the effective prognostic biomarker in NSCLC patients according to the results from numerous clinical trials [22,23,44,45]. Although the targeted NGS has been proved to be an alternative approach of WES for the prediction of TMB, the accuracy and simplicity of targeted NGS remain challenging as various parameters should be taken into consideration [46]. In this study, we developed a mathematic multi-omics model that could precisely predict the TMB in patients with LUAD through LASSO logistic regression analysis, and the prediction accuracy of the model was validated in an independent cohort with high sensitivity and speci city (Fig. 8). Furthermore, as the input parameter in this model includes expression pro les of 7 genes, 7 miRNAs, and the methylation pro les of 6 CpG sites, which could be obtained through quantitative real time-polymerase chain reaction (qRT-PCR). This model paved the way for the further development of the simpli ed qRT-PCR based clinical assay for TMB prediction.
The tumor microenvironment refers to the network of cells and structures that surround a tumor cell, and it consists of immune cells, mesenchymal cells, endothelial cells, extracellular matrix (ECM) molecules, and in ammatory mediators [47]. High TMB indicates the presence of more neoantigens in the tumor microenvironment, which promote the in ammatory response and result in the alteration of transcriptomic and epigenetic signature [47]. It has been proved that gene expression signatures in the tumor microenvironment were associated with the prognosis in NSCLC [48][49][50][51]. In agreement with previous studies, the differentially expressed genes between TMB-H and TMB-L patients identi ed in this study were found to enrich in the immune-related damaged DNA binding, nuclear division, nuclear chromosome segregation, organelle ssion, single-stranded DNA binding, ribonucleoprotein complex binding and pyrimidine metabolism (Additional le 2: Figure S2) [52][53][54][55][56]. The 7 genes used in constructing the TPM might be involved in the carcinogenesis, for instance, Y box binding protein 2 (YBX2) was differentially expressed between different subtypes of breast cancer and was one of RNA processing factors which contribute to subtype-speci c splicing [57]. Meanwhile, it was found that LINC00958 promoted cell proliferation and migration in oral squamous cell carcinoma through miR-627-5p/YBX2 axis [58]. Helicase-like transcription factor (HLTF) was a tumor suppressor, the expression of HLTF was negatively correlated with colorectal cancer and its overexpression regulated the TGF-β/SMAD pathway to suppress the migration and invasion of CRC [59]. It was reported that the wild type alleles of kinesin light chain 3 (KLC3) Lys751Gln was signi cantly correlated with greater smoking intensity, and genetic variations may in uence the progression of lung cancer [60]. Then, WRN helicase interacting protein 1 (WRNIP1) was inhibited by miR-22 to increase the radiosensitivity of small-cell lung cancer cells [61]. Besides, the expression of CDC28 protein kinase regulatory subunit 1 (CKS1B) in lung cancer cells developed the chemoresistance through Hsp90 and MEK1/2 pathway [62]. miRNAs expression in the tumor microenvironment plays a crucial role in mediating and controlling several immune and cell interactions, and convolute in the regulation of immune checkpoints, PD1 and PD-L1 [63]. It was reported that a 25 miRNA-based signature classi er could predict the TMB level with high accuracy [64]. A cluster of highly expressed miRNA including hsa-miR-492, hsa-miR-498, hsa-miR-320 were found to be correlated with tumorigenesis of retinoblastoma [65]. Moreover, the invasion, proliferation and migration of cervical cancer cells were found to be promoted by hsa-miR-6727-5p, which might play an important role in cervical cancer progress [66]. In this study, we mapped the differentially expressed miRNAs between TMB-H and TMB-L patients to their target genes, and enrichment analysis of the target genes suggested that DNA-binding transcription activator, single-stranded RNA binding, MAP kinase activity, Glycerolipid metabolism, Platinum drug resistance and EGFR tyrosine kinase inhibitor resistance related to lung cancer metabolism were affected in the tumor microenvironment. The 7 miRNAs used in constructing the TPM in this study include hsa-miR-571, hsa-miR-586, hsa-miR-151b, hsa-miR-378i, hsa-miR-6727-5p, hsa-miR-502-3p, hsa-miR-6798-3p. It was reported that hsa-miR-378i, hsa-miR-92a-3p, hsa-miR-93-5p and so on were important for identifying both colon and rectal cancer [67]. In previous study, it was demonstrated that gallbladder carcinoma metastasis was developed by lncRNA-HGBC through the activation of miR-502-3p-SET-AKT cascade [68].
Changes in DNA methylation is one of the most important epigenetic alteration in tumor microenvironment. A multicenter study in 15 hospitals suggested epigenomic pro le based on a microarray DNA methylation signature (EPIMMUNE) could serve as an effective biomarker in predicting the outcomes of NSCLC patients treated with PD-1 inhibitors [69], and the FOXP1 could be associated with validated predictive biomarkers for better-selecting patients to bene t with immunotherapy [69]. The CpG sites signature also had a relatively high predictive performance (measure = 0.861) of TMB, suggesting its great value in NSCLC prognosis. Cg02849937 located in the TSS1500 region of C7orf13, and its expression level was inverse associated with promoter methylation using whole-genome integrative analysis [70]. In addition, cg27281030 located in the TSS1500 region of NLRP12. Previous studies demonstrated that NLRP12 could regulate in ammation and tumor, and it is believed that hepatocellular carcinoma was negatively regulated by NLRP12 through the suppression of in ammation and proliferation of hepatocytes [71]. Besides, cg23179456 located in the TSS1500 region of ADCY4, which has been proved to be a biomarker for breast cancer [72].
Through multi-omics analysis, we integrated gene or miRNA expression and DNA methylation data to re ect the subtle alterations of the tumor microenvironment to precisely predict the TMB for better prognosis of patients with LUAD in immunotherapy. Fragments per kilobase per million mapped reads (FPKM) of YBX2, HLTF, KLC3, WRNIP1, CKS1B, RNF26, ZYG11A, RPM of hsa-miR-571, hsa-miR-586, hsa-miR-151b, hsa-miR-378i, hsa-miR-6727-5p, hsa-miR-502-3p, hsa-miR-6798-3p as well as beta-value of cg02031308, cg03286742, cg04046889, cg12095807, cg16794961, cg24553235 were extracted from the RNA-seq, miRNA-seq and Illumina HumanMethylation450 BeadChip respectively for calculating the TPM score, and the threshold of TPM for predicting the TMB level had been predetermined as -2.947 with the speci city of 0.887 and the sensitivity of 0.825 in the training cohort. Although the FPKM, RPM and beta value involved in the TPM were based on high-throughput sequencing or chip analysis, it is feasible to convert them to cycle threshold (Ct) value in qRT-PCR, and thus simplify the prediction of TMB by using benchtop qRT-PCR instrument. The conversion of FPKM in different samples to Ct values could be probably through the comparison of the targeted gene expression to reference gene expressions, such as actin and eukaryotic elongation factor (eEF), which have relative consistent expression under different tumor microenvironment, and the beta value of CpG sites could also be converted into Ct value though the quantitative MethyLight technology [73]. To our best knowledge, this is the rst time to construct the TPM for patients with LUAD from multi-omics view.

Conclusion
In summary, the present study developed a multi-omics risk model with high speci city and sensitivity in predicting the TMB for patients with LUAD, and laid the base for a more simpli ed and cost-effective TMB test assay establishment. Nevertheless, this study was solely bioinformatics research, and clinical sample validation for the TPM had not been implemented. The training cohort and the validation cohort used in this study were relatively small in size and required further expansion to increase the accuracy.          Characterization of the top50 differential expressed genes, miRNAs and differential methylated CpG sites between TMB-H and TMB-L patients. Volcano plot showed the differentially expressed genes (a) and miRNAs (c) or differentially methylated CpG sites (e) between TMB-H and TMB-L patients. The red dots represented upregulated genes, miRNAs or hypermethylated CpG sites; the blue dots represented downregulated genes, miRNAs or hypomethylated CpG sites; the black dots represented genes, miRNAs or CpG sites with no signi cantly differential expression or methylation. Hierarchical clustering heatmap of differentially expressed genes (b) and miRNAs (d) or differentially methylated CpG sites (f) between TMB-H and TMB-L patients. Orange indicated the upregulated genes, miRNAs or hypermethylated CpG sites; Blue indicated the downregulated genes, miRNAs or hypomethylated CpG sites. TMB-H, TMB-high; TMB-L, TMB-low.