Integrated regulatory-metabolic network model reveals critical mechanism and potential targets for Hepatocellular Carcinoma


 Background HCC (Hepatocellular carcinoma), the predominant form of liver cancer, has long been the top three leading cause of death in cancer worldwide. Although researchers have spent lot of effort to identify molecular targets available for treatment, the high tumor heterogeneity makes it difficult to develop effective therapy options and the drug response remains low. Under this circumstance, the precise stratification strategies are more than required. However, previous researches generally focused on single biological level, such as genome, transcriptome, or proteome, and are not able to discover effective therapeutic targets, so the systematic study of both regulation and metabolism of HCC is needed. Methods In this paper, we use two different algorithms to reconstruct regulatory networks for both HCC and normal liver cells, then integrate them with corresponding metabolic models in order to discover TFs (transcriptional factors) affecting tumorigenesis. Furthermore, a machine learning algorithm is utilized to classify HCC samples, differentially expressed genes, altered metabolic reactions and biological pathways are identified in lowest overall survival (OS) rate sub-type compared to others. Results We classify TCGA-LIHC samples into three sub-types with significantly different OS rate, and this stratification strategy is validated in another independent dataset LIRI-JP. Then, we identify 5 key TFs affecting cancer cell growth and CREB3L3 is believed to be associated with poor prognosis. The comprehensive metabolic analysis on personalized metabolic models highlight 18 metabolic genes essential for tumorigenesis in all three sub-types of patients, besides, ACADSB and CMPK1 are highly possible to be strongly correlated with lower OS. Conclusions Among 20 metabolic genes identified through metabolic analysis, 15 of them have already been targeted by approved drugs according to DrugBank. In addition, miRNAs targeting key TFs and genes are also involved in well-known cancer related pathways. The multi-scale regulatory-metabolic model reveals the critical mechanism of HCC cell proliferation and suggests potential targets.


Background
HCC is the most common type of primary liver cancers, it has become the third leading cause of deaths from cancers not only in developing countries but also in developed countries (1). Obesity, diabetes, fatty livers, virus infection and many other diseases can all turn into HCC, and because of the current di culty in early-diagnosis, the treatment of HCC is unsatisfactory. Although, drugs like Sorafenib and Lenvatinib had been approved by FDA, the drug-response rates were relatively low due to high tumor heterogeneity.
So the precise strati cation and effective targets discovery are required for treatment. Recently, several studies reported the subtype classi cation of HCC based on genetic omics data. For instance, the earlystage Chinese HCC samples have been classi ed using proteomic data and successfully uncovered the mechanism of early tumor cell development (2). The proteomic of HBV-related HCC samples identi ed three subgroups with distinct features in metabolic reprogramming, microenvironment dysregulation, and cell proliferation (3).
It has been well recognized that metabolic reprogramming is an important characteristic and driver of cancer. Genome-scale metabolic models (GEMs) have been successfully used to characterize cancer metabolism as well as identify drug targets for cancer treatment, which is a powerful framework to mechanistically represent the relationship between genotype and phenotype by computationally modeling the biochemical constraints that are imposed on the phenotype. They are capable of predicting various biological tasks under given circumstances (4,5) thus providing the basis for identifying essential genes or reactions for a particular objective function. By comparing the alteration of metabolism between normal and tumor tissue models, many disease-related genes/proteins/metabolites have been identi ed and then experimentally validated. For example, Folger et al.(6) used microarray data to leverage key genes for non-small-cell lung cancer; Agren et al. (7) utilized data from HPA Database with INIT algorithm to successfully construct 69 cell-speci c models and 16 cancer-speci c models. Uhlen et al.(8) employed RNA-Seq data from TCGA Database together with tINIT algorithm to reconstruct 6753 patient-speci c metabolic models for various cancers. Several anti-cancer drugs designed based on molecular targeting strategy have already been approved by FDA and put into clinical use: Avastin for EGFR-mutation induced non-small cell lung cancer; Pertuzumab for HER2-positive type breast cancer, etc.
In comparison, there has been few breakthroughs in identifying effective therapeutic targets for HCC. The recent study by Bidkhori et al. took a new beginning for HCC metabolism study, which utilized metabolic network topology analysis to divide 179 TCGA-LIHC samples into three subtypes and identify potential subtype-speci c therapeutic targets (9). However, tumor development is far more complex which requires understanding of the complicated multi-level mechanisms including both regulation and metabolism. It is well known that metabolism can be affected by transcriptional regulation and vice vasa, therefore the integration of regulation and metabolism is very important for cross-talking study, and may be useful in precise strati cation. So far, there has been few regulatory networks constructed for liver tissue, let alone ones for HCC tissue, more than this, the combination of regulation and metabolism in studying HCC is totally empty. In this study, we leveraged the application of integrated regulatory-metabolic model in disease study to investigate the possible mechanism of HCC using the whole TCGA-LIHC samples. Firstly, we constructed the regulatory networks of HCC and normal liver tissues, then combined with human liver metabolic model using the state-of-art algorithm. We classi ed HCC patients into different subgroups by expression data of TFs and genes in the integrated model, and evaluated classi cation results by overall survival outcomes. Using the integrated regulatory-metabolic model, we identi ed mechanism of HCC tumor cell progression, as well as genes associated with poor prognosis and the potential therapeutic targets, the results showed high consistency with previous in-silico and experimental studies. In addition, we analyzed the miRNAs information to further support that the genes identi ed by the integrated model are highly important for HCC tumorigenesis and may be the targets for clinical treatment. The main schema of this study was shown in Fig. 1.

Expression Data
RNA-Seq expression data were obtained from three sources: 315 HCC samples with clinical outcomes from TCGA-LIHC Project; 232 HCC samples with clinical outcomes from ICGC-LIRI Project; 50 HCC paired tumor-normal samples from Liu et al .(10) (GSE77314). GRCh37/hg19 was used as the reference genome and all data have already been pre-processed to quantitate from reads to FPKM through cu inks.
The GSE77314 dataset was also used to infer tumor and normal liver regulatory networks, and all of the three datasets were employed in model integration.

Metabolic Models
The genome-scale metabolic liver model used for integration was from HMA Database (https://metabolicatlas.org/gems/repository). It was built based on the combination of HMR2(7) model with RNA-Seq data of liver tissue to provide approach to explore metabolic and proteomic functions in cancer (11). According to the authors, high-energy compounds cannot be generated from low-energy ones to avoid futile cycle in models and this property is accessed by adding certain constraints.
The patient-speci c genome-scale metabolic models used for metabolic analyses are retrieved from Liver Hepatocellular Carcinoma models section in BioModels Database (https://www.ebi.ac.uk/biomodels/). Uhlen et al. employed tINIT to do the reconstruction, and the characteristic of metabolic pathways in each model is determined by the protein abundance detected from individual patient RNA-Seq data in TCGA-LIHC Project. Biomass representing cell growth (whose formula is also obtained from Uhlen et al.) is set to be the objective function. We selected 315 out of 338 HCC individual models with clear clinical stage information (exclude'not reported') for metabolic reprogramming analysis.

Construction of regulatory network
We used two independent algorithms MERLIN (Modular regulatory network learning with per gene information) and CMIP (Conditional mutual information measurement using a parallel computing framework) to construct the tumor/normal regulatory networks from the expression data (GSE77314).
MERLIN, which is designed by Roy S. combines both per-gene method and per-module concept based on probabilistic graphical model to infer regulatory network, so it can not only include memberships deduced from individual gene but it can also take the similarity within a group of genes into consideration. The algorithm has been proved to be effective in predicting transcriptional changes in human differentiation neural progenitor cells. In addition, MERLIN outperformed several other state-of-the-art algorithms. We used default settings except modifying folds of cross-validation to ve in our study. CMIP is proposed by Chen et al. The interactions between genes are measured by the conditional mutual information measurement to avoid neglecting subtle relations under certain conditions, for example, both A and B are strongly connected to C, then the actual relation between A and B may be confusing because of the interference of C. The performance is evaluated by average AUC and AUPOR of ten benchmark datasets provided by DREAM3, and the results showed that CMIP performs better than other algorithms, at the meantime, parallel computation strategies enable it to handle genome-scale datasets and to complete tasks within a relative short time period compared to other popular methods presented in DREAM Projects (12). We run CMIP in default parameters to let the algorithm automatically decide the threshold dynamically.
The regulatory relations deduced by both of the two algorithms were regarded as 'high con dence' regulations and tagged in the regulatory networks for further integration with metabolic model.
Integration of regulatory network and metabolic model IDREAM (Integrated Deduced Regulation And Metabolism) algorithm proposed by our group was used to do the integration, which uses bootstrapping regulatory relations delivered by EGRIN (Environment and Gene Regulatory In uence Network) (13) and introduce PROM (Probabilistic Regulation of Metabolism)like framework to add metabolic constraints. IDREAM has been successfully applied in Saccharomyces cerevisiae to predict growth phenotype of gene mutants and shows high consistence with experiment data. In addition, it has uncovered novel synthetic lethal pairs of TFs and metabolic genes with important interaction mechanism.
We collected the liver regulatory network from RegulatoryCircuits published by Marbach et al. (14) (http://regulatorycircuits.org/) and human regulatory network from RegNetwork published by Liu et al. convincing results from literatures were also included. There are 1366 TFs in the union of these two regulatory networks. There are 2456 metabolic genes contained in liver tissue model in HMA Database mentioned above; Then we used IDREAM to build the integrated regulatory-metabolic model to investigate TFs affecting cell growth in tumor and normal respectively.
In total, 3492 genes are determined (1366 + 2456 but exclude the overlapped genes and ones that are unavailable in expression data) Metabolic Analysis COBRA Toolbox incorporated in MATLAB was used for metabolism analysis, we replaced default solver with Gurobi to get better results. It focuses on presenting feasible phenotypic states through setting appropriate constraints gained from prior knowledge or assigned conditions. And by identifying the metabolic task to be studied, a reduced set of solutions can be achieved for further analysis.
Our research mainly used SingleGeneDeletion function to nd metabolic genes whose knockout will lead to the decrease of cell growth; and OptimizeCbModel function to nd the optimal growth rate, and corresponding ux distribution. DAVID (https://david.ncifcrf.gov/) was employed to do the enrichment analysis, we followed the default settings provided, chose proper thresholds and selected the results comprising of KEGG pathways.

Strati cation, survival and DEG analysis
The 3492 genes ltered above were used to stratify 315 TCGA-LIHC samples using the NMF(non-negative matrix factorization) consensus cluster method built in R packages called 'NMF'. It is a machine learning algorithm aiming to distinguish different molecular patterns in high-throughput genomic data. We used 200 iterations to determine the best clustering number between 2 and 10, and the result was set to 3 according to the principal of cophenetic score and average silhouette width. Then 300 iterations were performed to identify the group members.
We used clinical outcomes to evaluate the clustering results, Kaplan-Meier survival curve implemented in 'survival' packages in R was employed to assess the overall survival rate. Three classes showed signi cant differences in survival outcomes For the analysis of DEG (differently expressed genes), we used a linear model and moderated t-statistics based algorithm implemented in the 'Limma' package, with log2(FC) > = 1 and P-value < = 0.05. We operated this procedure between three classes, respectivelyand the intersection were taken to be the nal gene set.

Network topology Analysis
Cytoscape software was used to do the topology exploration. The 'Tools'-'Merge'-'networks' function with the optional parameter 'difference' was used to nd the difference between tumor and normal liver network. The principle was to remove all identical nodes so to identify TFs/metabolic genes only exist in HCC/normal regulatory network. And then we highlighted the hub genes being responsible for the abnormity on topological structure. For the multi-scale network visualization, we used 'Tools'-'Merge'-'networks'function with optional parameter'union' to integrate miRNA-genes network and TFs-target genes network to present a comprehensive view of the core regulatory-metabolic network.

Results And Discussion
Regulatory network difference between tumor and normal liver cells Due to the indispensability between metabolic and regulatory mechanisms, it is reasonable to take regulatory network into consideration when studying cancer metabolism. There are many algorithms designed to infer these networks from sequencing pro les and the results have been validated in model organisms like S.cerevisiae and E.coli. We employed MERLIN and CMIP algorithms (Methods) together with paired RNA-Seq data obtained from GEO database (GSE77314)(10) to construct the regulatory networks of HCC and paired normal tissues. There are in total 15143 pairs and 29127 pairs of regulation between transcriptional factors and target genes deduced from tumor and normal samples (Supplementary Table 1). Among them, 1654 pairs are same. Cytoscape was used to visualize the topology difference (Methods) of these two networks, after deducting the nodes with little in uence, the most cored structure is shown in Fig. 2.
NME2 and NFKBIA are clearly the hub TFs playing important roles in normal liver models. In contrast, these two TFs are absolutely absent in HCC tumor model. It is well-known that nuclear factor κB (NF-κB) affects multiple biological process by regulating immune response and in ammation, so it has been recognized as hallmark in cancer progression(16). NFKBIA is a member of a cellular protein family which can mask the nuclear localization signals of NF-κB and block its way to bind DNA. Because of this inhibition ability, it has long been considered as a tumor suppressor (17). Also, NME is a gene family associated with suppressing cancer metastasis and invasion(18). Especially, NME2, which is located on 17q21, its products have been reported to inhibit the metastasis of breast cancer and lung cancer (19,20). Besides, there are experiments supporting this idea by showing that the RNA expression of NFKBIA and NME2 are signi cantly decreased in HCC tissues compared to normal tissues. Therefore, the reconstructed regulatory networks effectively revealed the critical differences between liver cancer and normal tissues. It is persuasive to consider NME2 and NFKBIA as tumor suppressor factors for future study.
Integrative regulatory-metabolic network identi ed abnormality of Hippo signaling as key misregulation in HCC In our previous work, we developed a framework called Integrated Deduced Regulation And Metabolism (IDREAM) (21), which uses bootstrapping linear regression model on large-scale gene expression dataset to predict transcription factor (TF) regulation on enzyme-encoding genes, and then utilizes a probabilistic regulation of metabolism approach to apply regulatory constraints to the metabolic network. Then the integrated model can predict the in uence of each TF knockout on certain objective function, such as cell growth. It has been successfully applied in Saccharomyces cerevisiae to effectively predict the in uence of transcriptional regulation on metabolic phenotype. Here we used IDREAM algorithm (Methods) to integrate regulatory network with metabolic models to identify potential transcription factors vital to the growth of HCC cells.
The composition of integrated model for HCC and normal liver tissue are listed in Supplementary Table 1. The basic metabolism part is consistent, while the transcriptional factors and target genes are different. There are 1313 TFs in HCC model and 1312 TFs in normal model, among them, 33 and 32 TFs are unique, respectively. To analyze the 32 speci c TFs (including NME2 and NFKBIA mentioned above) not involved in HCC regulatory network, the Reactome database show that they were enriched for YAP1-and WWTR1 (TAZ)-stimulated gene expression pathway. They are transcriptional co-activators interacting with TEAD family genes to promote expression of TFs critical to cell proliferation and apoptosis through Hippo signaling pathway, which means the depletion of these 32 TFs will lead to the abnormality of Hippo signaling and may induce a wide range of cancers. In addition, the 33 speci c TFs in HCC integrated model were mainly enriched in pathways in cancer metabolism and transcriptional misregulation. Both conclusions supported that the integrated models we reconstructed are credible.
For each TF knockout, we changed the constraints on corresponding reactions according to activation/inhibition interactions and then simulated the cell growth rate to get the ratio over wild type. We found TFs affecting both tumor and normal cell growth, and TFs only reduce growth of tumor cell particularly (Supplementary Table 2), for example, ETV7, whose knockout will lead to nearly 2/3 reduction in tumor cell growth; CREB3L3, which seems to be closely correlated with poor prognosis in HCC.
Precise strati cation of TCGA-LIHC samples based on metabolic and transcriptional gene expression Although researchers have long been working on identifying genes or pathways available for treatment, the high heterogeneity of HCC has always been an obstacle in the precise clinical diagnosis, it is partly due to the fact that the current TNM stage classi cation is not that e cient. Therefore, researchers are devoted to try to stratifying HCC patients on molecular level and hope to nd corresponding therapeutic targets. Bidkhori et al. utilized metabolic network-based method to divide 179 TCGA-LIHC samples into three subtypes and identi ed their speci c characteristics; Jiang et al. used proteomic data to classify HCC patients and explored the mechanism of early-stage HCC tumor cell. Inspired by them, here we used 315 HCC samples attached with actual clinical stage information from TCGA-LIHC dataset and 3492 genes (Methods) obtained from multiple sources to identify altered metabolism among different subgroups and speci c characteristics of poor prognosis subgroup.
Using a non-negative matrix factorization consensus-clustering analysis, three major classes are concluded in TCGA-LIHC cohort, 130, 127 and 58 cases are classi ed into Class1, Class2 and Class3 respectively (Methods). The survival curves are shown in Fig. 3a,from which we can see that Class2 has an signi cantly lower overall rate of survival. By collecting 159 overlapped samples used in Bidkhori et al.'s study and ours, we reached a relatively good agreement in identifying the lowest OS subgroup: 90.57% (48 of 53, their results that are also in ours ) and 70% (49 of 70, our results that are also in theirs). Consequently, we focused on digging out the characteristics of poor prognosis subgroup Class2 at both transcriptome and metabolism level. In order to validate the effectiveness of our strati cation strategy, we applied the same strategy on LIRI-JP data (ICGC database) to split it into three subgroups, as we can see in Fig. 4a, the P-value of survival curves are still signi cant. By analyzing the pathways enriched for differentially expressed genes, the result shows high consistency with the ones of TCGA-LIHC data: the up-regulated genes are mainly enriched in well-known cancer related pathways in improving cell proliferation, such as increased glucose uptake as a principal nutrient source in central carbon metabolism of cancer, cell cycle and fructose metabolism; HIF signaling, which consists of master regulators of oxygen homestasis that not only allow tumor cells to adapt to hypoxic environment by enhancing oxygen delivery but also affect important growth factors like VEGF; let alone viral carcinogenesis and hepatitis B pathways which can be directly link with HCC development. In the meantime, the down-regulated genes are generally found in pathways contributing to drug metabolism, for example, the PPAR signaling pathway, which has also been identi ed in less aggressive HCC subtypes through proteomics analysis; drug cytochrome P450 metabolism, which is indeed reduced in advanced cancer patients (23).
Integrated models revealed that PI3K-Akt and mTOR signaling pathways are critical to HCC tumor cell growth By using IDREAM algorithm, we found 8, 13, 5 TFs being vital for HCC cell growth respectively in Class1, Class2 and Class3 of TCGA-LIHC (after excluding TFs that also affect normal tissue). There are 3 TFs common in all three classes, as listed in Table 1. The knockout of ETV7 shows the greatest decrease in growth rate in all three classes. ETV7 is a transcription factor belonging to ETS family, which has been proved to be responsible for the development of different tissues as well as the progression of several cancers, such as HCC (24,25). Due to its translocation function, the overexpression of ETV7 has been associated with tumorigenic transformation and anti-apoptosis by blocking Mys-induced apoptosis pathway(26-28). Furthermore, there are growing experimental evidences showing that ETV7 also plays a signi cant role in mTOR signaling pathway by assembling mTOR3 complex, which can stimulate cell proliferation and is not sensitive to rapamycin, a common anti-tumor agent, unlike mTOR1/2 (29). So the depletion of ETV7 may cause the inactivation of mTOR3, thus leading to tumor cell death after treatment. In addition, CTBP1 is a well-known cancer hallmark taking responsibility in pro-tumorigenic process as well as affecting regulatory network (30). It can bind to C-terminus of the adenovirus protein E1A to promote cell-proliferation and invasion (31). Besides, the characteristic of cancer cells (high NADH level) makes it possible for CTBP1 to bind to NADP with a high a nity thus triggering a conformational change, leading to hyper-activity in both tumorigenesis and tumor progression.
To explore characteristics of TFs leading to low survival rate and poor prognosis, we selected TFs whose knockout only in uence samples in Class2 but not in Class1 or Class3 of TCGA-LIHC dataset. There are 5 TFs speci c for Class2 (threshold: ratio < 0.98), in addition, 4 of 5 these TFs are also in the result of the lowest survival subgroup (Class3) of LIRI-JP dataset (the ratio of HNF4A exceeded the threshold a little), as shown in Table 1.
The knockout of CREB3L3, which also appears in the decreases the growth rate of tumor cells by over 15%, but have no effect on normal tissues. CREB3L3 is reported to be activated in the lipid metabolism in liver-speci c tissue in a mutual manner with PPARα(32). They both play important roles in the utilization of fatty acid for energy under fasted state, just like that in cell proliferation, so it is not surprising to see its absence will result in the decreased growth rate in tumor cells. The expression of CREB3L3 is link with anti-apoptosis, cell survival and HBV-associated HCC development by regulating hepatic genes in PI3K-Akt and AMPK signaling pathways. With the combination of consistent in-silico analyses and biological knowledge, we suggested CREB3L3 as a therapeutic target, especially for those HCC patients in advanced stage.
Metabolic genes in cholesterol biosynthesis are druggable targets in HCC treatment We incorporated patient-speci c models established by Uhlen et al. to do metabolic analyses, including identi cation of metabolic genes essential for tumor cell growth and annotation of the speci c reactions altered during tumor development. All 315 metabolic models were built at the aim of representing tumor growth. On the basis of genetic human metabolic model HMR2 and RNA-Seq expression data from TCGA-LIHC, a task-driven model reconstruction algorithm called tINIT was employed to construct all models.
We conducted the single gene deletion simulation by a function provided in COBRA Toolbox (33). The total gene number of each model ranges from 1106 to 2169. We rst differentiate the essential genes respectively in the three subtypes of TCGA-LIHC samples by the NMF strati cation strategy mentioned above. And then collected genes that existed in at least half of the samples in each class. After wiping out those having no in uence on tumor cell growth, there are 19, 20, and 18 genes left, respectively. The 18 genes found in Class3 (relatively high overall survival rate) are same with other two classes, and one other gene, ACADSB was shared by Class1 and Class2, while the other gene CMPK1 was identi ed in Class2 alone. We checkout the total 20 in uential genes to see whether there is prior knowledge about their therapeutic potentiality in Drugbank, as shown in Table 2. There are only ve of the predicted essential genes having not been recorded in Drugbank, but the high con dence of existing drug targets hitting suggested that those ve metabolic genes are possibly be potential targets and worthy to be explored in depth for future researches.
Enhancement of glutathione and fatty-acid biosynthesis is important metabolic reprogramming associated with poor prognosis It is widely accepted that tumor cells reprogram certain metabolic pathways consisting of numerous reactions to meet the demand of unlimited cell proliferation, aggressive invasiveness and anti-apoptosis.
We investigated 1329 reactions existing in all 315 patient-speci c models to see the ux patterns and enzymes differed between the poor-prognosis subgroup (Class2) and the other two classes. We There are two reactions having ux change of type1 and type2, respectively. According to these four reactions, GSH (glutathione) production is suspected to increase in Class2 samples due to the enhancement of AKG biosynthesis and cysteine accumulation in cytosol. GSH is a key member of cell immune response system, lacking it can easily lead to cell death, several labs have con rmed its common occurrence in all cancers (41) and considered it as potential therapeutic target. Besides, the loss of SLC25A11, the enzyme catalyzing these reactions, has been validated to inhibit tumor cell growth in non-small cell lung cancer (42). Baulies et al. (43) suggested that the over-expression of SLC25A11 works as an adaptive mechanism of HCC to provide enough GSH for vast cell growth, meanwhile, SLC25A11 induced exportation of AKG to cytosol also activates mTOR pathway to promote cell growth and anabolism through EGLNs (egl-9 family hypoxia-inducible factors) (44).
There are eleven reactions carried no ux in Class 2 but having positive ux in Class 1/3 (type 3), four of them belongs to porphyrin metabolism. The enzyme UROD involved in this pathway is a recently identi ed potential anti-cancer target due to its ability to convert uroporphyrinogen to coproporphyrinogen (45). Another enzyme ALAD has been found over-expressed in breast cancer patients with a favorable clinical outcome for its up-regulation can suppress cell proliferation and invasion(46). In addition, a set of enzymes responsible for carnitine shuttle including SLC22A1, SLC25A20, SLC25A29 and CPT2 are also down-regulated in HCC tumor cells, they mainly play rate-limiting roles in controlling fatty acid oxidation (47). Their low-expression are proved to be signi cantly associated with worse patient survival(48) and differentiation state by impairing NO production and mTOR signaling pathway mediated by arginine, in some situation, even leading to severe autophagy (49,50).
There are three reactions having non-zero ux in Class 2 but zero ux in Class 1/3 (type 4), involving fatty acid activation responsible for providing adequate ATP and CoA for tumor cell growth; glycine, serine, threonine metabolism helping reduce ROS pressure through SGOC metabolic network during tumor metastasis (51), and arginine/proline metabolism, which can regulate response to nutrient and oxygen deprivation in oncogenesis thus avoiding cell apoptosis (52). Furthermore, the exploration of enzymes revealed that ACADSB (which is also indicated by previous analyses), ACSL3 and ACSL4 regulate distinct proteins like p-AKT, LSD1 and β-catenin to stimulate tumor cell proliferation (53).
To conclude, the altered reactions speci c to Class2 samples generally aim at promoting tumor cell growth and decreasing their sensitivity towards normal apoptosis signals. Several key enzymes have already been regarded as biomarkers in cancers, with deeper exploration, it is convincing to gure out their detailed mechanism and nd more available therapeutic targets. Meanwhile, the metabolic reprogramming accounting for poor prognosis also supports that our strati cation of the HCC patients are meaningful. miRNAs regulating the in uential genes for HCC cell proliferation To disclose a more comprehensive interplay between regulation and metabolism of HCC, we further retrieved microRNAs regulating the in uential genes highlighted in our previous analyses: including 3 common TFs showing in uence in all three Classes; 5 overlapped TFs speci cally affecting the lowest survival subgroup of TCGA-LIHC(Class2) and LIRI-JP(Class3); and 20 metabolic genes revealed by single gene deletion result. From MIRNET, we found six highly-connected microRNAs, three of which have been validated to be directly linked with HCC, including miR-124-3p, miR-1-3p, and miR-24-3p (Supplementary Table 5).
MiR-124-3p and miR-1-3p have been reported to be down-regulated in HCC patients compared to normal (54,55). While miR-24-3p has been considered to be involved in HCC diagnosis panel for its abnormal over-expression.
The speci c mechanism of how the loss-function or gain-function of these miRNAs contribute to tumorigenesis remains unclear, however there are some experiment-based theories available. Especially, miRNA-124-3p, which also correlates with a wide range of other cancers including breast cancer, lung cancer, colorectal cancer, etc. appears to be the key microRNA during oncogenesis(56-58). Zheng et al. In addition, miR-26b-5p, which has the maximal connection (9) in regulating 28 highlighted genes, has been experimentally validated to under-express in HCC patients with worse prognosis. Because it can suppress tumor invasion as well as induce apoptosis by targeting SMAD1(66), which is consistent with our conclusion about SMAD gene. Nevertheless, three genes obtained by our integrated regulatorymetabolic analysis in previous section, CMPK1, ACADSB, RORC are directly regulated by miR-26b-5p, and the fact that they are all speci c genes for Class2 (the worst overall survival rate) proved our conclusion is credible. By combining the in-silico results in our work together with prior knowledge obtained from literatures, Fig. 5a shows the core network comprising of microRNAs, TFs and genes involved in HCC, which can provide basis for further studies in HCC development.
To take 28 candidate genes and 6 top-connected miRNAs into whole consideration, we conducted IPA (Ingenuity Pathway Analysis) to explore the biological connection among them. As shown in Fig. 5b, EGFR is inferred and densely linked with our core gene set. EGFR is one of the most crucial genes responsible for cancer cell growth, its over-expression can lead to unlimited cell proliferation, just like that in tumor cells, so it has long been focused and treated as therapeutic targets in cancer therapy. Multiple FDA-approved drugs such as Ge tinib, Lapatinib have been proved to be effective in EGFR-related nonsmall-cell lung cancer and several other cancers. So the IPA result indicated that our core-gene set concluded from integrated regulation-metabolism analyses are convincible and worthy further researches.

Conclusions
Integrated regulatory-metabolic network uncovered differences between HCC and normal liver cells Due to the comprehensive information linking genes-proteins-reactions contained in genome-scale metabolic models, researchers have successfully identi ed numerous potential disease-related biomarkers by metabolic analyses. Meanwhile, because of the inseparability between metabolic and regulatory mechanism, the integration of transcriptional regulation with metabolism would allow us to better describe the impact of mutations and environmental perturbations on functional metabolism. This kind of integration has been proved to be effective in model organisms such as S. cerevisiae and E. coli, but has not yet been applied to study malignant disease in human, such as cancers.
Although there are a few regulatory networks of normal liver tissue, the regulatory networks of HCC is still in depletion, not to mention the integration of regulatory network with metabolic model. In this study, we used two different approaches CMIP and MERLIN to construct regulatory networks for HCC and normal liver tissue samples. Through topology analysis, NME2 and NFBIKA are highlighted as tumor suppressor TFs because of their absence in tumor regulatory network and high connectivity in non-tumor one. Then we integrated the regulatory networks with human liver metabolic model, and compared the effects of TFs on cell growth in tumor and normal models, respectively. We found TFs only reduce growth of tumor cell particularly as potential targets. Further, by comparing the in uential TFs in early stage and advanced stage samples, we found ve TFs, including SMAD2, HEY2, ELK1, CREB3L3, CCDC6 are vital to advanced stage HCC patients.
Three subtypes of HCC samples demonstrated signi cantly different prognosis By splitting TCGA-LIHC samples using pre-ltered 3492 genes, we de ned three patient subgroups which can be distinguished by overall survival rate and Class2 showed the worst survival. We identi ed 3 essential TFs for HCC tumor cell growth common in all three groups, among which ETV7 showed the greatest impact, especially in Class2 by decreasing cell growth rate by about 88%. ETV7 is a transcription factor belonging to ETS family, which has been long proved to be responsible for the progression of several cancers, including HCC. Because of its translocation function, the overexpression of ETV7 has been associated with tumorigenic transformation and anti-apoptosis by blocking Mys-induced apoptosis pathway. There are growing experimental evidences showing that ETV7 also plays a signi cant role in mTOR signaling pathway by assembling mTOR3 complex to stimulate cell proliferation and prevent cell from rapamycin, a common anti-tumor agent. In addition, the knockout of 5 TFs speci cally affecting samples in Class2, we identi ed potential TFs related to poor prognosis. Among them, CREB3L3, which has also been identi ed as in uential for advanced stage HCC samples by integrated model in previous analysis. The same strati cation strategy has been applied on LIRI-JP dataset to validate the effectiveness: the survival outcomes shows signi cantly differences between subgroups and the result of TFs affecting the lowest survival subgroup shows high consistency with that of TCGA-LIHC dataset.
In addition, the poor prognosis group Class2 also exhibited speci c pattern of altered metabolism. Flux alterations in Class2 samples showed the accumulation of both AKG and cysteine, which indicated the over production of GSH, a key member of cell immune response system in improving cell proliferation and avoiding apoptosis. Besides, the biosynthesis of fatty acids, mTOR signaling were also hyperactivated, and pathways like glycine, serine, threonine metabolism were employed to reduce ROS pressure during tumor homeostasis.
Key metabolic genes in cholesterol biosynthesis identi ed by patient-speci c models are potential targets The metabolic analyses based on patient-speci c models found 20 metabolic genes playing important roles in HCC tumor cell growth by participating in cholesterol biosynthesis pathway, 15 of them have already been targeted in various cancers or cancer-related diseases according DRUGBANK, such as ACADSB SQLE,which are targeted to suppress cancer cell-proliferation; EBP, which is one of the targets of Tamoxifen in breast cancer therapy; CMPK1 ACACB FDFT1 CRAT SLC22A5 HMGCR which play important roles in treatment of hepatic related disease to prevent HCC, providing strong proof for metabolic network-based target discovery in cancer. Therefor the other 5 candidate genes might also be potential targets. What's more, we further found ACADSB and CMPK1 appeared to be speci cally essential in Class2, indicating these two genes are associated with poor prognosis and may be the targets for treatment of more serious HCC patients.
The multi-scale regulatory-metabolic network revealed the critical mechanism of HCC cell proliferation In addition to the integration of transcriptional regulation with metabolism, we further expanded our analysis to the epigenetic regulation carried by miRNAs. Based on the highlighted genes (total 28 key genes), we dig out miRNAs regulating these candidates from MIRNET. There are three miRNAs, miR-124-3p, miR-1-3p, and miR-24-3p reported as very important factors associated with HCC tumorigenesis and functioned in well-known cancer-related pathways like NOTCH, PI3K-Akt and mTOR. We illustrated the core network of HCC cell proliferation involving regulations between miRNAs-TFs, miRNAs-Targets and TFs-Targets across multi-scale (Fig. 5a), and emphasized the targets that were highlighted in the combined analyses. In general, the inhibition of miRNAs on over-expressed genes in HCC are consistent with their validated function such as suppressing tumorigenesis, which do provide supports to the key genes predicted as affecting cancer cell growth through cross-talking.
Especially, the direct regulation of miR-26b-3p on ACADSB and CMPK1 provide more experimentally evidence to support the idea that these two metabolic genes are link with lower OS in HCC. Moreover, the biological connection inferred by IPA indicated these highlighted genes are closely connected to EGFR, which plays a signi cant role in cancer cell proliferation, providing evidence for our analyses.

Competing interests
We declare that we have no nancial and personal relationships with other people or organizations that can inappropriately in uence our work, there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be constructed as in uencing the position presented in, or the review of the manuscript entitled. Shanghai municipal health commission (ZK2015B01 and 201540114). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Authors' contributions YX was responsible for the data gathering, data preprocessing, statistical analysis, regulatory network inferring, model integration, metabolic analysis and manuscript writing; HZ was responsible for the TCGA-LIHC RNA-Seq data gathering and data preprocessing; QY was responsible for the IPA analysis; RS was responsible for the previous literature sharing; KW was responsible for the data preprocessing; YS managed the project.; ZW was responsible for the design of integrated model, explanation of results, and manuscript writing.
All authors have read and approved the manuscript.

Figure 1
The schema of the integrated model for strati cation and key targets discovery.

Figure 2
Core structure of different nodes between Normal/Tumor Regulatory topology networks. a. Difference between normal and tumor networks. b. Difference between tumor and normal networks; Nodes lled with green are TFs while nodes lled with red are metabolic genes.   Multi-scale network exploration of HCC mechanism. a. Core miRNA relation network of potential therapeutic targets. The octagon represents metabolic genes while the square represents TFs and the circle represents miRNAs; The orange color represents genes/TFs that have been experimentally validated as crucial genes in HCC tumor cells; b.