Integrative analysis of the characteristic of lipid metabolism-related genes for the prognostic prediction of hepatocellular carcinoma

Background: Dysregulation of lipid metabolism is implicated in the progression of hepatocellular carcinoma (HCC). We therefore investigated the molecular characteristics of lipid-metabolism-related genes in HCC. Methods: Multi-dimensional bioinformatics analysis was conducted to comprehensively identify the lipid metabolism-related genes (LMRGs) from public databases, as well as the clinical information, immune features, and biological characteristics of HCC. The IMGR were then used to classify HCC into molecular phenotypes. Six lipid metabolism-related genes sets with the potential to predict the prognosis of HCC patients were identified. Results: A total of 770 HCC patients with complete clinical information and corresponding 776 LMRGs were downloaded from 3 databases. Univariate cox and non-negative matrix factorization analyses were used to classify HCC patients into 2 clusters. This molecular classification was associated with overall survival, clinical characteristics, and immune cells. The biological function of the differentially expressed LMRGs in the 2 clusters showed the genes associated with tumor-related metabolism pathways. A combination of multivariate/univariate cox regression and least absolute shrinkage and selection operator analyses were conducted to build a 6 LMRGs signature (6-IS) to predict the prognosis of HCC. The 6-IS signature was found to be an independent prognostic factor. Performance of the 6-IS prognostic signature was verified in a validation set and compared with an external data set. Results revealed the 6-IS signature could effectively predict the prognosis of patients with HCC. Conclusions: This study provides new insights into the role of LMRG in the pathogenesis of HCC and presents a novel prognostic signature 6-IS monitoring the clinical course of HCC.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors characterized by high malignancy, high metastatic potential, and poor prognosis. [1] Most patients with HCC are diagnosed at the advanced stage, at this stage they cannot benefit from surgery or chemoradiotherapy. [2] In recent years, therapies based on biological targets have been proposed for patients with HCC. [3] However, the clinical benefit of available biomarker for early diagnosis and prognostic assessment was still limited. Thus, it is important to study the pathogenesis of HCC and find specific targets that can be used to improve early diagnosis and prognostic prediction.
In human, many metabolic functions take place in liver cells. [4] Being an important site for lipid metabolism, HCC results in many lipid metabolic abnormalities. [5] Previous studies have shown that HCC is accompanied with abnormal changes such as increased de novo synthesis of fatty acids, suppressed oxidation levels, high secretion of insulin and insulin-like growth factors, and abnormal metabolism of phosphatidylcholine. [6] These metabolic processes provide intermediate energy substrates that enable HCC cells to grow, proliferate and metastasize. [7] In addition, several enzymes and signaling molecules, such as 3-hydroxy-3-methylglutaryl-coenzyme A reductase and AKT/mTORC1 pathway regulate lipid metabolism of HCC cells. Thus, the metabolic enzymes and pathways associated with these processes can be used as biomarker for diagnosis and treatment of HCC. [8] Numerous studies have been carried out to uncover the biological phenotypes and molecular classification of HCC on the basis of lipid metabolic patterns. [9] Fox example, the de novo fatty acid synthesis phenotype in tumor cells has been associated with up-regulated lipid-related genes at multiple levels, such as transcription, translation and post-translation modification, and enzyme activity, as well as the influence of these genes on oncogenes. [10] Moreover, molecular classification of HCC based on lipid metabolism-related genes reveals distinct tumor subtypes. Using bioinformatic methods, Bidkhori et al [11] subdivided HCC patients in 3 clusters with distinct metabolic and signaling pathways at the genome, transcriptome, and proteome levels. These clusters were associated with clinical features and survival rate. However, lipid metabolic program and molecules have not been fully exploited in the prognostic prediction of HCC patients.
In this study, data of 770 HCC patients were divided into 2 molecular clusters based on 776 lipid metabolism-related genes (LMRGs). The 2 molecular clusters were associated with the clinical features, immune infiltration, and tumor metabolism-related biological processes. We also established a prognostic signature for the HCC patients. A flow chart showing the protocol of this study is shown in Figure S1, Supplemental Digital Content 1, http://links.lww.com/MD/H366. This study extends our understanding of the molecular basis of lipid metabolism involved in the pathogenesis of HCC and suggested the lipid metabolism-related genes for the hallmark of prognostic prediction in HCC patient.

Patients' information and genome expression dataset
Multiple datasets were downloaded from several databases including the cancer genome atlas (TCGA), Gene Expression Omnibus, and Database of Hepatocellular Carcinoma Expression Atlas (HCCDB). 371 samples obtained from TCGA were subjected to quality control and filtration procedure and we collected 342 samples that met the conditions, which were randomly divided into training and validation sets. To avoid random allocation bias which may affect the stability of subsequent modeling, all samples were sampled 100 times in advance to ensure that training and validation sets were consistent in clinical features. The GSE15654 data set obtained from Gene Expression Omnibus database were preprocessed for quality control and filtration, and we finally got 216 samples. The HCCDB18 dataset containing 212 samples and corresponding clinical information derived from HCCDB database were downloaded directly. Detailed information of the 3 datasets are presented in Table S1, Supplemental Digital Content 2, http:// links.lww.com/MD/H367. The study was approved by the ethics committee, People's Hospital of Longhua, Shenzhen.

Molecular classification of HCC based on lipid metabolism-related genes
The LMRGs were obtained from 6 lipid metabolism-related pathways (Table S2, Supplemental Digital Content 3, http:// links.lww.com/MD/H368) in Molecular Signature Database v7.0 (MSigDB) (www.gsea-msigdb.org/gsea/msigdb). A total of 776 LMRGs were retained after exclusion of overlapping genes. We extracted 776 LMRGs from the TCGA expression profile data, and genes with expression value above 0 in more than half of samples were retained. Finally, 739 LMRGs were enrolled for subsequent analysis. Univariate cox analysis was performed on 739 LMRGs with coxph R package to mine out HCC-related LMRGs. Next, the HCC-related LMRGs were processed through non-negative matrix factorization (NMF) clustering algorithm using the NMF R package. The NMF analysis and 50 iterations were carried out with the standard "brunet" pattern. [12] k values which indicates the optimal number of clusters ranged from 2 to 10. The average contour width of the common member matrix was determined with the NMF R package, and the minimum member of each subclass was set to 10. The optimal k value was determined from indicators of cophenetic, residual sum of squares (RSS), and silhouette. Differences in clinical features between the clusters based on HCC-related LMRGs were compared using Chi-square test. The Tumor Immune Estimation Resource (TIMER) (https://cistrome.shinyapps.io/timer/) algorithm was employed to investigate the association between clusters and immune score.

Construction of a prognostic signature based on LMRGs
Differential expression of HCC-related LMRGs between clusters was analyzed by the DESeq2 algorithm using limma R package. Significant LMRGs were those with false discovery rate < 0.05 and absolute value of log2 fold change > 1. Next, significant differentially expressed HCC-related LMRGs were subjected to univariate cox analysis to determine their association with survival of HCC using survival coxph function R package. The log rank P < .01 was set as the threshold. To narrow the gene range and build a prognostic model with high accuracy, we used the least absolute shrinkage and selection operator method to reduce the dimensionality and select the most significant differentially expressed HCC-related LMRGs. The 10-fold cross validations methods were employed to select optimal values of the penalty parameter lambda. [13] Next, multivariate cox analysis was performed on the genes obtained in the above steps, and the least value of Akaike information criterion within cox proportional regression model was calculated to retain the most significant genes to construct an LMRGs signature. A risk score based on LMRGs signature set was calculated as follows: risk score = expression gene 1 × β gene 1 + expressiongene 2 × β gene 2 + ⋯ + expression gene x × β gene x , where x is the number of LMRGs and β is the coefficient value for each LMRGs. We normalized the risk score to z-score using binormalization process algorithm, and the z-score value >0 and <0 for the samples were classified into high-and low-risk groups, respectively.

Statistical analysis
The pheatmap R package was used to display unsupervised hierarchical clustering heatmap of HCC-related LMRGs and a volcano Plot of the differentially expressed LMRGs between clusters was plotted using a ggplot2 R package. The OS was estimated using the Kaplan-Meier (KM) method, and the sensitivity and specificity of the survival curve were assessed through receiver operating characteristic (ROC) curve by calculating the area under curve (AUC) of ROC using pROC R package. The GO and KEGG analyses were performed for the differentially expressed LMRGs using clusterprofiler R package. P < .05 was set as the threshold of statistical significance. Independent t test and Mann-Whitney U test were conducted to compare variables between groups, for variables following normal and abnormal distribution, respectively. The association between LMRGs signature and clinical features was analyzed by univariate and multivariate survival analyses. The association between the LMRGs signature and immune/stromal score was determined by calculating the immune and stromal scores of each sample using estimate R package comparing high and low risk groups. The potential mechanisms of LMRGs signature were analyzed by Gene Set Enrichment Analysis (GSEA) analysis using GSVA R package. Pearson correlation coefficients were used to analyze the association between LMRGs signature and biological functions. The prognostic value of the LMRGs signature and other signatures was assessed by Harrell's concordance index (c-index) using rms R package. Restricted mean survival time is an index of the area under the KM curve at a specific timepoint. It was used to evaluate the predictive value of the LMRGs signature at different timepoints. All statistical analyses were performed using the SPSS Version 25.0 software and R software version 3.4.0, and a P < .05 was considered statistically significant.

Identification of molecular subtypes based on LMRGs
Univariate cox analysis was performed on preprocessed 739 LMRGs obtained from TCGA dataset. In total, 324 HCCrelated LMRGs were identified and used for HCC classification. The cophenetic coefficients, which indicate the stability of classified cluster was used to calculate the optimal k value. We performed comprehensive analysis on the cophenetic, RSS, and silhouette index, from which we selected the k = 2 as the optimal value. Consequently, 2 molecular subtypes (cluster 1 and cluster 2) were identified based on LMRGs ( Figure S2, Supplemental Digital Content 4, http://links.lww.com/MD/ H369). Notably, the matrix heat map exhibited clear boundaries based on k value of 2, suggesting that the molecular subtypes classification was stable (Fig. 1A). The gene cluster heatmap of 324 HCC-related LMRGs revealed marked differences between cluster 1 (C1) and cluster 2 (C2). Specifically, the expression level of HCC-related LMRGs in C2 was significantly higher than C1. In addition, distribution of clinical features between C1 and C2 exhibited significant differences (Fig. 1B). KM analysis revealed that C2 had significant shorter overall survival (OS) than C1 (P = .0099) (Fig. 1C). The accuracy of molecular subtypes classification based on LMRGs was determined by comparing the association between the 2 clusters and clinical features using Chi-square test. Results showed that the pathological classification of tumor (T) (P = .0002), stage (P = .0478), and grade (P = .0391) were significantly different between the 2 clusters (Table S3, Supplemental Digital Content 5, http://links.lww.com/MD/H370). Further comparison of the immune scores between the 2 clusters was performed using TIMER algorithm. Except CD8 cells, the immune scores of B cell (P = .045), CD4 T cell (P = .015), neutrophil (P = .015), macrophage (P = .001), and dendritic (P = .001) in clusters 2 were higher than that of cluster 1 (Fig. 1D). Collectively, these results revealed that the LMRGs signature could classify HCC into distinct molecular subtypes and was associated with clinical characteristic.

Construction and validation of an LMRGs signature
First, we screened for differentially expressed LMRGs between cluster 1 and clusters 2. A total of 400 LMRGs were found to be significantly differentially expressed. A volcano and clustering map revealed a distinct distribution of up-regulated and down-regulated LMRGs between the 2 clusters ( Figure S3A and B, Supplemental Digital Content 6, http://links.lww.com/MD/ H371). Results of GO analysis showed that the differentially expressed LMRGs were primarily enriched in metabolic process, such as glutamate and lactate metabolism, as well as in tumorigenesis-related process, including cell-cell adhesion and cell migration ( Figure S3C, Supplemental Digital Content 6, http:// links.lww.com/MD/H371). In the KEGG analysis, LMRGs were predominantly enriched in metabolic pathways, such as glucagon signaling, metabolism of xenobiotics by cytochrome P450, and retinol metabolism ( Figure S3D, Supplemental Digital Content 6, http://links.lww.com/MD/H371). Functionally, the differentially expressed LMRGs were involved in tumorigenesis and metabolism related pathways.
Univariate Cox regression and least absolute shrinkage and selection operator ( Fig. 2A and B) analyses were conducted to select suitable genes from the 400 differentially expressed LMRGs. 20 significant genes were revealed through above these 2 analyses were further subjected to a multivariate Cox  (Table S4, Supplemental Digital Content 7, http://links.lww.com/MD/H372) were used to construct an LMRGs signature (termed as 6-IS) using the risk score formula. Next, we explored the association between 6 LMRGs and HCC survival. Unlike FMO3 which correlated with good prognosis in high-risk group, SLC11A1, RNF10, KCNH2, ME1, and ZIC2 correlated with shorter survival time in the high-risk group than in the low-risk group (Fig. 2C). Moreover, the 6 LMRGs signature predicted significant differences in survival outcomes between C1 and C2. We then analyzed the expression profile of the 6 LMRGs in the 2 clusters. Except FMO3, the expression of SLC11A1, RNF10, KCNH2, ME1, and ZIC2 in C2 were significantly higher than in C1 (Fig. 2D). Thus, SLC11A1, RNF10, KCNH2, ME1, and ZIC2 were considered as the hazard indexes and FMO3 as the protective index for construction of an independent prognostic LMRGs signature.
According to the calculation of the risk scores of 6-IS in each sample, we depicted the risk score plot, survival status, and expression profiles of the 6 LMRGs in patients from training set. We found that HCC patients with high-risk scores had higher mortality rates than those with low risk scores, and the changes in expression of 6 LMRGs with increased risk score revealed that SLC11A1, RNF10, KCNH2, ME1, and ZIC2 as hazard index and FMO3 as protective index (Fig. 3A). In the ROC analysis, area under ROC curve (AUC) for the 6-IS signature was 0.80 for 1 year, 0.82 for 3 years, and 0.84 for 5 years, indicating a high prognostic prediction accuracy of the 6-IS signature (Fig. 3B). In the training cohort, patients were divided into high-and low-risk groups. KM analysis based on 6-IS showed that the OS of low-risk groups was significantly better than that of high risk group (Fig. 3C). These results showed that the 6-IS could serve as an independent signature predicting the survival outcomes of HCC patients in the validation set ( Figure S4A

Association of 6-IS signature with clinical features and molecular characteristics of HCC
KM analysis showed that the clinical features, including alpha-fetoprotein (AFP) (P = .02871), stage (P = 3e − 05), T (P = 2e − 05), N (lymph node) (P = .02519), and M (metastasis) (P = .00223) could divide the HCC patients of training set based on OS analysis ( Figure S5, Supplemental Digital Content 9, http://links.lww.com/MD/H374). Next, we predicted the OS of HCC patients using the 6-IS signature according to the above clinical features (AFP > 20, AFP <=20, T, N, M, Stage I + II, and Stage III + IV). Consistently, we found that 6-IS signature could distinguish the low-risk group from high-risk group. This analysis also revealed that HCC patients in the high-risk groups had significantly shorter survival time than those in low-risk group (Fig. 4). Univariate Cox and multivariable Cox analyses were performed to verify the prognostic value of 6-IS in HCC patients from TCGA and HCCDB18 databases. Thus, we concluded that 6-IS is an independent prognostic marker associated with survival when treated as continuous variable both in TCGA (P = 2.97E−09 and P = .0049) and HCCDB18 (P = .032 and P = .0453) sets (Table 1). In subsequent analyses, we calculated immune, stromal and estimate scores for each sample from TCGA. Except stromal score, immune and estimate scores of high-risk group were significantly higher than those of the lowrisk group ( Figure S6A, Supplemental Digital Content 10, http:// links.lww.com/MD/H375). Similar results were obtained in the HCCDB18 dataset ( Figure 6B, Supplemental Digital Content 10, http://links.lww.com/MD/H375). We then compared expression of LMRGs in samples from TCGA, HCCDB18, and GSE15654 sets using GSEA analysis. An ssGSEA value was obtained which was used to infer the association between 6-IS risk score and biological function. Most of the biological functions were negatively associated with 6-IS risk score, such as glyoxylate and dicarboxylate metabolism, drug metabolism cytochrome p450, and beta alanine metabolism. In contrast, biological functions related to tumorigenesis, including glycerophospholipid metabolism, fatty acid metabolism, cell cycle, and RNA degradation were positively linked to the 6-IS risk score ( Figure S7A

Comparison of 6-IS signature with external models
Six genes signature, [14] 8 genes signature, [15] 6 genes-based prognostic signature, [16] and 4 genes signature [17] were used as the external data set for validation tests. These signatures were established to calculate risk score and assess the OS of patients in TCGA using similar methods as in our study. In line with 6-IS, KM analysis showed that all the 4 models could divide HCC patients into high-and low-risk groups, and the highrisk groups had significantly shorter survival time than low-risk groups (Fig. 5A-D). However, except that 8-genes signature (0.813) which showed similar results with our study, ROC analysis revealed that the average AUC value of 1, 3, and 5 years from 6 genes signature (0.613), 6 genes-based prognostic signature (0.770), and 4 genes signature (0.686) were lower than  that of 6-IS (0.82) (Fig. 5A-D). In the c-index analysis, the 6-IS showed better prognostic ability than other 4 models (Fig. 5E). Besides, restricted mean survival time analysis revealed that 6-IS performed better than other 4 models in the prognostic prediction of HCC patients (Fig. 5F). These results confirmed that the 6-IS is a robust prognostic prediction signature.

Discussion
Accumulating evidence show that tumorigenesis is accompanied with metabolic reprogramming of various nutrients that not only sustain cancer cell survival, but also regulate gene expression, emergence of mutations, and immune tumor microenvironment. [18] The most classic example of tumor metabolic reprogramming is the "Warburg effect," in which cancer cells tend to use glycolysis to replace normal cells that thrive on aerobic metabolism for survival. [19] Abnormalities in glucose and lipid metabolism in tumors have been the focus of recent studies. [20 Currently, abnormal lipid metabolism in HCC, especially genes related to lipid metabolism have not fully explored to determine their role in the pathogenesis, diagnosis, and treatment of HCC. In this study, we used multi-dimensional bioinformatics methods to screen out abnormally regulated genes related to lipid metabolism in HCC, and used these genes to construct a prognostic prediction signature.
Deregulation of genes related to lipid metabolism has been implicated in the tumorigenesis of HCC. For example, ADH1A triggered oncogenic transformation of hepatocytes leading to poor survival, [21] whereas extracellular PEDF inhibited angiogenesis in HCC by inducing lipid metabolic disorders. [22] These insights into the molecular mechanisms and genes markers involved in the pathogenesis of HCC have extended our understanding of the metabolic profile of HCC. However, the clinical utility of single genes targets in HCC has been challenging. Here, we used data-mining bioinformatic approaches to explore the relationships between lipid metabolism related genes and clinical features of HCC. A prognostic signature 6-IS was constructed which showed good prediction results. The prognostic signature 6-IS was also associated with the OS, clinical features and metabolic signaling pathways of patients of HCC. The 6-IS comprised 6 genes obtained using multidimensional algorithms, including FMO3, SLC11A1, RNF10, KCNH2, ME1, and ZIC2. This signature is superior as it overcomes the shortcomings of single genes such as interference from other factors. Consistent with a previous study, we found that FMO3 suppresses tumor progression by decreasing cell viability, hence it is a protective index. [23] In addition, our analysis revealed that ME1 or ZIC2, which exhibit stem-cell features, correlated with poor prognosis, hence are risk indexes. [24,25] However, these previous studies reported the presence of potential errors and bias in their analyses. For this reason, we constructed a statistical signature with multiple genes comprising clinical information to improve the efficiency of prognostic prediction. Notably, our 6-IS signature performed better than other models from external datasets in terms of prognostic prediction. Nevertheless, we acknowledge that further studies are needed to validate the performance of 6-IS in a prospective cohort.
Immune estimation tools, such as TIMER and ESTIMATE were used to calculate immune scores to assess the immune infiltration cells across groups. Similar to other studies, we found that HCC is characterized by heavy infiltration by immune cells, and that the inflammatory response in the liver is the main mechanism contributing to hepatitis, cirrhosis, and HCC. [26] Unlike other studies that used global transcriptome of HCC to analyze the immune cells composition and perform molecular classification, we only used LMRGs to investigate the immune score and carry out metabolism stratification. A strong link between metabolism and immunity has been demonstrated during tumorigenesis. [27] Results in this study suggested that LMRGs regulate immune cells during the development of HCC. Accumulating evidence show that metabolic and epigenetic reprogramming are important mechanisms that regulate tumor immunity, and most epigenetic reprogramming genes are those related to fatty acid, cholesterol esters and phosphatidylcholine metabolism. [28] In KEGG and GO analyses, we found that the differentially expressed LMRGs between clusters were enriched in diverse metabolic pathways. Moreover, GSEA analysis revealed that upregulated or downregulated LMRGs used to construct the 6-IS were associated with metabolic pathways. We hypothesized that the LMRGs participated in several metabolic pathways that modulate functions and phenotypes of immune cells during the pathogenesis of HCC.
Clinical features, such as AFP, TNM, tumor stage and grade, and other pathological classifications are widely used in the clinical management of HCC. [29] However, these data are biased and lack specificity. This calls for identification of more accurate indicators and phenotypes to improve existing diagnostic and therapeutic guidelines. [30] In this study, we predicted the OS based on AFP, stage, and TNM indicators and then compared the performance of 6-IS with the above clinical features. Results showed that 6-IS was superior to other clinical features in predicting the prognosis of HCC. In addition, 6-IS was found to be an independent factor when compared with other clinical features. Given that 6-IS is yet to be verified in prospective patients, we suggest that it is combined with the traditional clinical features to improve the clinical management of HCC.

Conclusion
In conclusion, we present an LMRG-based signature to classify HCC patients into molecular clusters based on metabolic profiles. The 6-IS was found to be a robust prognostic prediction marker for HCC patients. This signature was associated with clinical features, immune cells and various functions. This study provides novel insights to the prognostic value of lipid metabolism in HCC.