A Prognostic Model Based on TMB-Related lncRNAs Predicts Outcome of Liver Cancer Patients


TMB-related lncRNAs and their clinical significance in liver cancer have not been explored. By combining lncRNA expression levels, somatic mutation files, and clinical information from patients with liver cancer, we identified 514 lncRNAs that are closely related to tumor mutational burden (TMB). Based on these lncRNAs, a TMB-derived lncRNA signature (TMBLncSig) was established. TMBLncSig categorized patients into a high-risk group and a low-risk group. Prognosis differed significantly between the two groups, which was further verified in an independent test set. TMBLncSig is associated with TMB level in liver cancer patients and has the potential to be used as a tool for measuring TMB level. In summary, this study provides new insight for further study of the role of lncRNAs in TMB differences among liver cancer patients.


Introduction
Primary liver cancer is the second leading cause of cancer-related death worldwide and is therefore a major public health challenge [1]. Liver cancer is associated with poor long-term survival and poor prognosis, and it kills more than 700,000 people worldwide each year [2]. In current clinical practice, patients' clinical information, such as age, tumor stage, tumor grade, and lymph node involvement, is used to predict disease progression and prognosis. Some studies have pointed out that hepatocellular carcinoma (HCC) is heterogeneous, which is reflected at the molecular level in its clinical traits as well as in its occurrence, development, and prognosis.
Tumors with a high tumor mutational burden (TMB) are generally considered to carry an increased neoantigen burden, so such tumors may benefit more from immunotherapy [3]. Moreover, HCC patients with low TMB levels tend to have a better prognosis than those with high TMB levels [4]. Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nt that do not encode proteins [5]. Although lncRNAs lack protein-coding ability, recent studies have shown that they are involved in a variety of biological processes [6]. LncRNAs also play an important role in the development and metastasis of tumors [7]. With the development of gene sequencing technology, significant differences in the expression levels of a large number of lncRNAs have been found across different tumor types, but the functions of most of them are still unknown [8]. Research has also shown that lncRNAs play an important role in TMB. For example, lncRNA DDSR1 participates in the DNA damage response (DDR), and variation in the DDR leads to variation in TMB. LncRNA NORAD helps to maintain genomic stability, which also affects TMB [9][10]. However, the relationship between these TMB-related lncRNAs and the clinical progression of patients has not been further explored. In this study, lncRNAs related to TMB were identified based on the TCGA database, and their ability to indicate TMB level and predict prognosis in patients with liver cancer was explored.

Data collection
We collected clinical information, transcriptome high-throughput sequencing data, and somatic mutation data of liver cancer patients from the TCGA database (https://portal.gdc.cancer.gov/) [11][12]. We annotated the transcriptome high-throughput sequencing data using GRCh38 [13] (last updated 2019-06) and then obtained the expression levels of long non-coding RNAs. Among these data, 369 patients with corresponding clinical information, somatic mutation data, and lncRNA expression levels were selected for further analysis. These patients were divided into a training group and a verification group, consisting of 185 and 184 patients, respectively. The clinical and pathological characteristics of the patients in the TCGA data sets are summarized in Table 1.
Identification of TMB-associated lncRNAs
TMB is calculated as the total number of somatic mutations (including non-synonymous point mutations, insertions, and deletions in exon coding regions) divided by the size of the target region, in units of mutations/Mb [14]. The TMB of each patient's tumor sample was calculated, and the patients were then sorted by TMB. The top 25% of patients were defined as the TMB-high group, while the bottom 25% were defined as the TMB-low group. The lncRNA expression profiles of the TMB-high and TMB-low groups were compared using the limma method (fold change > 2 or < 0.5 and false discovery rate (FDR) adjusted P < 0.05), and the differentially expressed lncRNAs were defined as TMB-related lncRNAs.
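The TMB calculation and quartile grouping described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the patient identifiers, the mutation counts, and the 38 Mb target size are all hypothetical values chosen for illustration, and the handling of ties at the quartile boundary is an assumption.

```python
def tumor_mutational_burden(n_mutations, target_size_mb):
    """TMB in mutations/Mb: somatic mutation count divided by target region size."""
    if target_size_mb <= 0:
        raise ValueError("target_size_mb must be positive")
    return n_mutations / target_size_mb


def split_by_tmb_quartiles(tmb_by_patient):
    """Return (tmb_high, tmb_low): top and bottom 25% of patients ranked by TMB."""
    ranked = sorted(tmb_by_patient, key=tmb_by_patient.get, reverse=True)
    k = len(ranked) // 4  # group size (floor); ties are broken by sort order
    return ranked[:k], ranked[len(ranked) - k:]


# Hypothetical mutation counts for eight patients; 38 Mb is an assumed target size.
counts = {"P1": 40, "P2": 310, "P3": 95, "P4": 150,
          "P5": 60, "P6": 220, "P7": 75, "P8": 120}
tmb = {pid: tumor_mutational_burden(n, 38.0) for pid, n in counts.items()}
tmb_high, tmb_low = split_by_tmb_quartiles(tmb)
```

With eight patients, each group contains two patients; the middle 50% are left out of the differential expression comparison, as in the text.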

Statistical analysis
Hierarchical cluster analysis was performed in this study using Euclidean distances and Ward's linkage method [15]. The relationship between TMB-associated lncRNAs and overall survival was assessed by univariate and multivariate Cox proportional hazards regression analysis. Through this process, TMB-associated lncRNAs related to prognosis were obtained, and a TMB-derived lncRNA signature (TMBLncSig) was then constructed. The TMBLncSig score of each patient is given by the following formula:
$$\mathrm{TMBLncSig} = \sum_{i=1}^{n} \beta_i \times \mathrm{Expr}(\mathrm{lncRNA}_i)$$
TMBLncSig is a prognostic risk score for the liver cancer patient. In the formula, $n$ is the number of lncRNAs in the signature, $\mathrm{Expr}(\mathrm{lncRNA}_i)$ is the expression level of the $i$-th lncRNA in the patient sample, and the coefficient $\beta_i$ is the regression coefficient of that lncRNA from the multivariate Cox analysis, representing its contribution to the prognostic risk score.
In the training group, the median score was used to divide patients into two groups (high-risk group with high TMBLncSig or low-risk group with low TMBLncSig).
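To make the scoring and grouping steps concrete, the following Python sketch computes a risk score as a coefficient-weighted sum of expression levels and splits patients at the training-set median. The lncRNA names, coefficients, and expression values are hypothetical and are not the values fitted in this study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of lncRNA expression levels (coefficient x expression)."""
    return sum(coef * expression[name] for name, coef in coefficients.items())


def assign_risk_groups(scores, cutoff):
    """Label each patient 'high' if the score is above the cutoff, else 'low'."""
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"lnc1": 0.5, "lnc2": -0.25}
training = {
    "P1": {"lnc1": 2.0, "lnc2": 4.0},
    "P2": {"lnc1": 4.0, "lnc2": 0.0},
    "P3": {"lnc1": 1.0, "lnc2": 2.0},
    "P4": {"lnc1": 6.0, "lnc2": 4.0},
}
scores = {pid: risk_score(expr, coefficients) for pid, expr in training.items()}
cutoff = median(scores.values())  # the cutoff is fixed on the training set
groups = assign_risk_groups(scores, cutoff)
```

The same cutoff would then be reused unchanged when scoring patients outside the training set, which is how the validation described later in the paper is set up.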
The product-limit (Kaplan-Meier) method was used to calculate survival rates and median survival. We used the log-rank test to assess whether survival differed between the high-risk group and the low-risk group [16]. A P value below 0.05 was considered statistically significant [17]. The ability of TMBLncSig to predict patient survival was evaluated using the time-dependent receiver operating characteristic (ROC) curve [18]. All statistical analyses were performed in R (version 4.0.3).
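For readers unfamiliar with the product-limit estimator, a minimal Python sketch is given here. The analyses in this study were run in R; this standalone version only illustrates the calculation, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First event time at which estimated survival is 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up times in years and event indicators.
times = [0.5, 1.0, 1.0, 1.5, 2.0, 2.5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the curve only steps down at observed deaths.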

Results
Identification of TMB-related lncRNAs in liver cancer patients
We calculated the tumor mutational burden of all tumor samples from TCGA patients with liver cancer. The patients were sorted by TMB; the 25% of patients with the highest TMB (n=90) formed the high-TMB group, and the 25% with the lowest TMB (n=88) formed the low-TMB group. The "limma" package in R was used to screen for differentially expressed genes between the low-TMB and high-TMB groups [19]. According to our thresholds (|FC| > 2, FDR < 0.05), a total of 514 lncRNAs were differentially expressed between the two groups. Among them, 313 lncRNAs were up-regulated in the low-TMB group, and 201 lncRNAs were up-regulated in the high-TMB group (Supplementary Table 1).
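The thresholding step can be sketched in Python as follows. This does not reproduce limma's moderated statistics; it only shows how a fold-change and FDR filter might be applied to per-lncRNA results, assuming Benjamini-Hochberg adjustment (the text does not state which FDR procedure was used). All names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR), in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_tmb_related(records, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Keep lncRNAs with fold change > 2 or < 0.5 and FDR < 0.05.

    records: list of (name, fold_change, raw_p_value) tuples.
    """
    fdr = benjamini_hochberg([p for _, _, p in records])
    return [
        name
        for (name, fc, _), q in zip(records, fdr)
        if (fc > fc_cutoff or fc < 1.0 / fc_cutoff) and q < fdr_cutoff
    ]


# Hypothetical records: (lncRNA name, fold change, raw P value).
records = [
    ("lncA", 3.1, 0.0004),
    ("lncB", 0.4, 0.0100),
    ("lncC", 1.3, 0.0002),
    ("lncD", 2.5, 0.2000),
]
selected = select_tmb_related(records)
```

In this toy input, lncC is dropped because its fold change is too small and lncD because its adjusted P value is too large, so only lncA and lncB pass both criteria.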
Unsupervised hierarchical clustering analysis was carried out on a total of 374 tumor samples using these 514 lncRNAs (Figure 1A) [15]. The samples were clustered into two groups. Because TMB differed significantly between the two groups, we defined them as the low-TMB-like group and the high-TMB-like group. The median TMB of the high-TMB-like group was significantly higher than that of the low-TMB-like group (3.316 vs 1.882, P < 0.001, Mann-Whitney U test; Figure 1B).
Development of a TMB-derived lncRNA signature for outcome prediction in the training set
To explore the effect of these TMB-related lncRNAs on the prognosis of liver cancer patients, we matched their expression levels with the clinical information of patients in the training group and the verification group. In the training group, univariate Cox proportional hazards regression analysis showed that 33 TMB-related lncRNAs were significantly associated with prognosis (P < 0.05, Supplementary Table 2). Multivariate Cox proportional hazards regression analysis was then performed on these lncRNAs, and 10 lncRNAs were retained. The TMB-derived lncRNA signature (TMBLncSig) was constructed to predict the prognosis of liver cancer patients from the multivariate Cox coefficients and the expression levels of these 10 TMB-related lncRNAs (Supplementary Table 3). TMBLncSig was used to obtain the risk score of each liver cancer patient in the training set, and the patients were divided into two groups using the median score (0.918) as the cutoff. Patients with a TMBLncSig score above 0.918 formed the high-risk group, while those with a score below 0.918 formed the low-risk group.
Kaplan-Meier analysis was used to describe the difference in prognosis between the two groups.
In the training group, all patients were ranked from low to high by TMBLncSig score, and the difference in TMB between the low-risk group and the high-risk group was examined (Figure 2C, Figure 2D). TMB differed significantly between the high-risk and low-risk groups: the TMB of the high-risk group was higher than that of the low-risk group (median TMB 3.592 versus 2.842, P < 0.001, Mann-Whitney U test). LncRNAs RP1-47M23.3 and DIO3OS were highly expressed in the TMBLncSig low-risk group, and lncRNAs AC128709.3, C10orf126, HOXD-AS2, LINC00668, LINC00707, MIR210HG, ZEB2-AS1, and RP11-437L7.1 were highly expressed in the TMBLncSig high-risk group.
Independent validation of TMBLncSig in the liver cancer data set with RNA-seq platform
To verify whether TMBLncSig can be applied to samples outside the training set, we tested its predictive power in the validation set (184 patients from the TCGA database). Using the same risk score cutoff as in the training set, the validation set was divided into a high-risk group and a low-risk group with 105 and 79 patients, respectively. The high-risk group had a worse prognosis than the low-risk group (Figure 3A, median OS 0.83 versus 1.35 years, P = 0.008, log-rank test). The high-risk group also had a lower one-year survival rate than the low-risk group (71% versus 87%). In time-dependent ROC curve analysis, the AUC for 1-year survival was 0.662 in the validation set (Figure 3C). In the validation set, TMB differed significantly between the high-risk and low-risk groups (median 3.41 versus 2.47, P < 0.001, Mann-Whitney U test; Figure 3G).
Across all TCGA liver cancer patients, TMBLncSig retained good predictive power for prognosis. We again took 0.918 as the cutoff value and divided all patients in the TCGA set into a high-risk group (n=171) and a low-risk group (n=198). As in the training and validation sets, the high-risk group had a worse prognosis than the low-risk group (median survival 1.08 versus 1.52 years, P = 0.001, log-rank test; Figure 4B). The high-risk group had a lower one-year survival rate than the low-risk group (74% versus 91%). In time-dependent ROC curve analysis, the AUC for 1-year survival was 0.679 in the TCGA set (Figure 4D). In the TCGA set, TMB differed significantly between the high-risk and low-risk groups (median 3.5 versus 2.6, P < 0.001, Mann-Whitney U test; Figure 4H).
The TMBLncSig predicts outcome better than TP53 mutation status
The proportion of TP53 mutations differed significantly between the TMBLncSig high-risk and low-risk groups in the training set, validation set, and TCGA set (Figure 5A). In the training set, the proportion of samples with TP53 mutations was 42% in the high-risk group and 26% in the low-risk group (chi-square test, P = 0.039). In the validation set, the proportion was 38% in the high-risk group and 19% in the low-risk group (chi-square test, P = 0.007). A similar pattern was observed in the TCGA set, where the proportion was 40% in the high-risk group and 22% in the low-risk group (chi-square test, P < 0.001). This shows that TMBLncSig is also closely related to TP53 mutation. Many studies have shown that TP53 mutation predicts poor prognosis in liver cancer patients. Therefore, we further examined how TMBLncSig and TP53 mutation status together predict the prognosis of liver cancer patients. Using TMBLncSig and TP53 mutation status, we divided HCC patients in the TCGA set into four groups. Patients with TP53 mutations and a high TMBLncSig risk score were assigned to the TP53 Mutation/high risk group (n=66). In the same way, we obtained the TP53 Mutation/low risk group (n=43), the TP53 Wild/high risk group (n=98), and the TP53 Wild/low risk group (n=150). The survival curves of the four groups are shown in Figure 5B. Patients in the TP53 Wild/low risk group had the best prognosis, while patients in the TP53 Mutation/high risk group had the worst prognosis (median OS 1.71 years versus 0.833 years, P = 0.01, log-rank test). The ability of TP53 mutation status alone to predict the prognosis of liver cancer patients is limited, and combining it with TMBLncSig can improve prognosis prediction.
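As a worked check of the TCGA-set comparison, the four group sizes reported above (66, 98, 43, and 150) can be arranged as a 2x2 table and tested with a Pearson chi-square test. The Python sketch below assumes no continuity correction; the text does not say which variant of the test was used, so the exact P value may differ from the authors' calculation.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Table layout: [[a, b], [c, d]]. Returns (statistic, p_value) with 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts reported in the text for the TCGA set:
# rows = high risk / low risk, columns = TP53 mutated / TP53 wild-type.
statistic, p_value = chi_square_2x2(66, 98, 43, 150)
mutated_high = 66 / (66 + 98)   # fraction TP53-mutated in the high-risk group
mutated_low = 43 / (43 + 150)   # fraction TP53-mutated in the low-risk group
```

The two fractions round to the 40% and 22% quoted in the text, and the resulting P value falls below 0.001, in line with the reported significance level.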

Independence of the TMBLncSig from other clinical factors
Stratification analysis was used to validate the independence of TMBLncSig from other clinical factors. Patients in the TCGA set were divided into a young group (n=177) and an old group (n=192) at the median age (age = 60). Patients in each group were divided into high-risk and low-risk groups by TMBLncSig, and OS differed significantly between the high-risk and low-risk groups in both the young group (log-rank test P < 0.001; Figure 4A) and the old group (log-rank test P < 0.001; Figure 4B). Patients in the TCGA set were then divided into an early-stage group (stage I or II, n = 255) and a late-stage group (stage III or IV, n = 90). TMBLncSig classified the patients in each of these groups into a high-risk group and a low-risk group. The OS differences between the low-risk and high-risk groups among early-stage and late-stage patients are shown in Figure 4C and Figure 4D, respectively. OS differed significantly between the two risk groups. We repeated the same procedure for grade and tumor status, and the results are shown in Figure 4E, Figure 4F, Figure 4G, and Figure 4H. TMBLncSig predicted prognosis well in the early tumor status group (T1 or T2, n=273), the late tumor status group (T3 or T4, n=93), the low pathological grading group (G1 or G2, n=231), and the high pathological grading group (G3 or G4, n=133).

Discussion
The occurrence, development, and treatment of liver cancer have become a research hotspot [20][21][22][23]. Previously, treatment decisions and prognostic predictions were based on tumor size, pathological grading, and lymph node metastasis [24][25]. In reality, however, due to the heterogeneity of liver cancer, these traditional indicators often fail to serve as reliable predictors [26]. Tumor mutational burden (TMB) is a new biomarker for predicting response to PD-(L)1 treatment [27][28]. It has also been reported that high TMB predicts poor prognosis in liver cancer patients [29].
LncRNAs have recently been shown to be closely related to the occurrence and development of tumors, and abnormal expression of some lncRNAs can serve as a prognostic marker in tumor patients [30][31][32][33][34][35]. Research has also shown that lncRNAs play an important role in tumor mutational burden (TMB) [36]. For example, lncRNA DDSR1 participates in the DNA damage response (DDR), and variation in the DDR leads to variation in TMB [37]. LncRNA NORAD helps to maintain genomic stability [38], which also affects TMB. How to identify TMB-related lncRNAs, and their clinical significance for cancer, remains to be explored. Therefore, we matched lncRNA expression levels with the tumor mutation phenotype to identify TMB-related lncRNAs. Using this computational framework, we identified 514 TMB-related lncRNAs in liver cancer. In addition, on the basis of these TMB-related lncRNAs, we built an lncRNA signature (TMBLncSig) to predict the clinical outcome of patients with liver cancer. In the training set, TMBLncSig divided patients into a low-risk group with good prognosis and a high-risk group with poor prognosis, and TMB level differed significantly between the two groups; the same was observed in the independent verification set. This suggests that TMBLncSig is not only predictive of prognosis but also an indicator of TMB level in patients with liver cancer.
"TP53 is the most frequently mutated gene in human cancer" [39]. TP53 mutations occur in more than half of human tumors [40]. In this study, the TP53 mutation rate differed significantly between patients in the high-risk group and those in the low-risk group. A higher TMBLncSig score indicated a higher TP53 mutation rate. This suggests that TMBLncSig can predict TP53 mutation status to a certain extent. Moreover, TMBLncSig could separate both TP53 wild-type and TP53 mutant patients into groups with different prognostic outcomes. The prognosis of both TP53 wild-type and TP53 mutant patients was better in the low-risk group than in the high-risk group. TMBLncSig may therefore be more capable of predicting prognosis than TP53 mutation status alone, and it can assign TP53 wild-type and TP53 mutant patients to distinct intermediate subtypes. Although our study, including the construction of the TMBLncSig model, can assess the TMB level of patients to a certain extent and provides a feasible way to predict prognosis, it still has limitations and requires further study. TMBLncSig has been verified internally using TCGA data sets, but its robustness still needs to be verified in more independent data sets and on different technology platforms. Because this study is based on a computational framework built on differences in TMB level, the mechanism by which TMBLncSig relates to prognosis and tumor mutational burden requires further experimental investigation.

Conclusion
LncRNAs related to TMB were identified based on the TCGA database, which provides resources and methods for further research on the role of lncRNAs in TMB. A TMB-derived lncRNA signature (TMBLncSig) was established by combining the expression levels of these lncRNAs, somatic mutation files, and clinical information from patients with liver cancer. TMBLncSig can be used as a prognostic marker for liver cancer patients and has been validated in an independent test set. In this study, TMBLncSig was associated with the TMB level of liver cancer patients and can help guide prediction of patient prognosis.

Declarations
Declaration of interests
The authors declare no potential conflict of interest.

In the boxplot, red represents the high-risk group, blue represents the low-risk group, and the TMB level of the high-risk group is significantly higher than that of the low-risk group. Statistical analysis was performed using the Mann-Whitney U test.
TCGA set (F). In the boxplot, the TMB level of the high-risk group (red) is higher than that of the low-risk group (blue) in the testing set (G) and TCGA set (H). Horizontal lines: median values. Statistical analysis was performed using the Mann-Whitney U test.
Mutation/low risk group (n=43), TP53 Wild/high risk group (n=98) and TP53 Wild/low risk group (n=150)). Statistical analysis was performed using the log-rank test.
Kaplan-Meier curve analysis of overall survival in the high-risk group and low-risk group for young patients (A) and old patients (B). Kaplan-Meier curve analysis of overall survival in the high-risk group and low-risk group for early-stage patients (C) and late-stage patients (D). Kaplan-Meier curve analysis of overall survival in the high-risk group and low-risk group for the low pathological grading group (E) and high pathological grading group (F). Kaplan-Meier curve analysis of overall survival in high-risk group and low-risk group