In the present study, 116 DEGs were identified in a merged dataset formed from three HBV-related HCC datasets. Nine key candidate DEGs were selected using a random forest (RF) classifier and used to construct an artificial neural network (ANN) model. The model's ability to classify samples as HBV-related HCC or HBV-infected non-cancerous liver tissue was validated in four independent datasets, with excellent area under the ROC curve (AUC) values. Immune cell infiltration analysis showed that the proportions of 12 immune cell types differed significantly between HBV-related HCC tissues and HBV-infected non-cancerous liver tissues.
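The DEG thresholds are not stated in this section. As a hedged illustration of one common approach, the sketch below flags a gene as differentially expressed when its absolute log2 fold change exceeds a cutoff and its Benjamini-Hochberg adjusted p-value falls below 0.05. The helper names `bh_adjust` and `select_degs`, both cutoffs and the toy gene labels are assumptions for illustration, not the study's settings or data.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(stats: Dict[str, Tuple[float, float]],
                lfc_cutoff: float = 1.0,
                fdr_cutoff: float = 0.05) -> List[str]:
    """Return genes with |log2FC| > lfc_cutoff and BH-adjusted p < fdr_cutoff.

    `stats` maps gene -> (log2 fold change, raw p-value). The default
    cutoffs are illustrative assumptions, not the study's parameters.
    """
    genes = list(stats)
    adj = bh_adjust([stats[g][1] for g in genes])
    return sorted(g for g, q in zip(genes, adj)
                  if abs(stats[g][0]) > lfc_cutoff and q < fdr_cutoff)


# Toy usage with made-up statistics for hypothetical genes.
toy = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (-1.5, 0.004),
    "GENE_C": (0.3, 0.0001),  # significant but small fold change
    "GENE_D": (1.8, 0.2),     # large fold change but not significant
}
print(select_degs(toy))
```

Requiring both conditions keeps genes that are statistically significant and biologically meaningful in magnitude; in practice the per-gene p-values would come from a dedicated differential expression tool rather than being supplied by hand.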
Random forest (RF) and artificial neural network (ANN) models are different types of algorithms. RF is an ensemble decision-tree approach in which each tree processes a sample and predicts an output label, and the trees in the ensemble are independent of one another. An ANN is composed of multiple layers of nodes that transmit and process signals to reach a final decision[22]. In this study, an ANN model for the diagnosis and screening of HBV-related HCC was constructed based on the nine important genes identified by RF. Of these nine genes, TOP2A and BUB1B have been extensively studied in HCC[23-27]. KMO[28, 29], CDHR2[30], CLEC1B[31], CXCL14[32] and FCN2[33] are significantly decreased in HCC tissues and/or cell lines, and overexpression of these genes exerts tumour-inhibitory effects in HCC[28, 29], including inhibition of tumour formation and subcutaneous tumour growth, suppression of HCC cell proliferation, migration, invasion and epithelial-mesenchymal transition (EMT), and induction of apoptosis. FCN3 expression was significantly lower in HCC tissues than in normal tissues[34]; however, further in vitro and in vivo experiments are needed to confirm its effect on HCC. KMO[29], CXCL14[35], CAP2[36] and FCN3[37] have been reported as prognostic markers in HCC, and combined PD-L1-high and CLEC1B-low expression has been shown to predict worse outcomes[38].
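To make the contrast between the two algorithm families concrete, the sketch below shows how each produces a prediction on toy inputs: a forest of independent trees that each cast a vote (hand-written decision stumps stand in for trained trees), and a feedforward network with one hidden layer and sigmoid activations that passes a signal forward through weighted nodes. All function names, thresholds and weights are hypothetical; this section does not describe the study's network architecture or training procedure, and no training is shown here.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Sample = Sequence[float]
Tree = Callable[[Sample], str]


def forest_predict(trees: List[Tree], x: Sample) -> str:
    """Each tree votes independently; the forest returns the majority label."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ann_forward(x: Sample,
                w_hidden: List[List[float]], b_hidden: List[float],
                w_out: List[float], b_out: float) -> float:
    """One hidden layer: each hidden node takes a weighted sum of the inputs
    plus a bias and applies a sigmoid; the output node does the same with the
    hidden activations, giving a score between 0 and 1."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Toy usage: three stumps over three hypothetical expression features.
stumps: List[Tree] = [
    lambda x: "HCC" if x[0] > 0.5 else "non-HCC",
    lambda x: "HCC" if x[1] > 0.5 else "non-HCC",
    lambda x: "HCC" if x[2] < 0.5 else "non-HCC",
]
print(forest_predict(stumps, [0.9, 0.2, 0.1]))

# A 2-input, 2-hidden-node network with made-up weights.
print(ann_forward([1.0, 0.0], [[2.0, -1.0], [0.0, 1.0]], [0.0, 0.0],
                  [1.0, -1.0], 0.0))
```

An odd number of trees avoids tied votes in a two-class problem. In a real pipeline both models would be fitted to training data, and the ANN score would be thresholded to assign a class.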
CAP2 is a valuable molecular marker in the histological diagnosis of early HCC[39], and its overexpression may be related to multistage hepatocarcinogenesis[40]. In addition, CAP2 transcription was significantly suppressed in silibinin-treated HCC cells, and silibinin could be a potential therapeutic agent against HCC, particularly HBV-related HCC[41]. These findings suggest that CAP2 may play a critical role in the carcinogenesis or progression of HBV-related HCC. CXCL14 was markedly suppressed in HBV-related HCC tissues, and its polymorphisms were associated with advanced-stage chronic HBV infection[42]. FCN2 is active in hepatitis B infection, and ficolin-2 serum levels and FCN2 haplotypes contribute to the outcome of HBV infection in a Vietnamese cohort[43]. These results imply that ficolin-2 may play a crucial role in innate immunity against HBV infection.
This study aimed to establish an effective diagnostic model for HBV-related HCC based on gene expression data from the GEO database. The three training datasets originate from different countries but were generated on the same platform, which reduced the effect of confounding factors to some extent. Four independent datasets from different countries and regions were used to evaluate the model's performance, which supports its robustness, usefulness and credibility. The diagnostic model achieved high sensitivity and specificity in all four test datasets, with excellent AUC values.
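Sensitivity, specificity and AUC summarise different aspects of a classifier's performance on a test dataset. The sketch below computes them in pure Python: AUC as the probability that a randomly chosen positive sample (HBV-related HCC, label 1) scores higher than a randomly chosen negative sample (label 0), with ties counted as half, and sensitivity and specificity at a single decision threshold. The 0.5 threshold, the function names and the toy scores are assumptions for illustration, not values from the study.

```python
from typing import List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive sample has the higher score (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores: List[float], labels: List[int],
              threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A sample is called positive when its score is >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy usage: made-up model scores for three tumour and three non-tumour samples.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))
print(sens_spec(toy_scores, toy_labels))
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold, which is why it is commonly reported alongside them. The pairwise loop is quadratic in sample size but adequate for cohorts of a few hundred samples; a rank-based formula gives the same value more efficiently.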
Other types of diagnostic and predictive models for HBV-related HCC have also been reported. Integrated analysis of the microbiome and host transcriptome revealed that six important microbial markers associated with the tumour immune microenvironment or bile acid metabolism showed potential for predicting clinical outcomes[44]. LncRNAs are also potential diagnostic biomarkers for HBV-related HCC; AL356056.2, AL445524.1, TRIM52-AS1, AC093642.1, EHMT2-AS1, AC003991.1, AC008040.1, LINC00844 and LINC01018 were identified using machine learning[45]. Using data from the Hospital Authority Data Collaboration Lab, 124,006 patients with chronic viral hepatitis (CVH) and complete data were included to build models, and HCC-RS, derived from a ridge regression machine learning model, accurately predicted HCC in these patients[46]. In addition, another study identified noninvasive biomarkers using a urinary proteomic strategy[47].
Infiltrating immune cells, a component of the tumour microenvironment, are involved in many processes, including tumour growth, invasion and metastasis. Accumulating evidence has shown that HCC tumours harbour substantial immune cell infiltration, and the status and characteristics of this infiltration are often associated with different prognostic outcomes[48, 49]. In this study, the densities of memory B cells, CD8 T cells, regulatory T cells (Tregs), resting NK cells, M0 macrophages and activated dendritic cells were significantly higher in tumour tissues than in HBV-infected non-cancerous liver tissues. In contrast, the densities of naïve B cells, plasma cells, resting memory CD4 T cells, gamma delta T cells, activated NK cells and activated mast cells were significantly lower in HBV-related HCC tissues. T cells, B cells, NK cells, macrophages and mast cells have previously been reported in the immune cell infiltrates of HCC and play essential roles in its development, prognosis and response to immunotherapy. High densities of naïve B cells and plasma cells were associated with superior survival[50]. Whether tumour-infiltrating lymphocytes (TILs) exert antitumour or tumour-promoting effects depends on the proportions of lymphocyte subsets in the tumour microenvironment, and T lymphocytes are the primary TILs in HCC[51]. The mechanism of mast cell activation in HCC is unclear, but activation facilitates immune escape and consequent tumour growth[49]. More importantly, HBV-specific CD8+ T cells, HBV-non-specific CD8+ T cells, CD4+ T cells, B cells and NK/NKT cells are all involved in the development of HBV-related HCC[52].
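This section does not name the statistical test behind the immune cell comparisons. A common choice for comparing estimated immune cell fractions between two groups is the Wilcoxon rank-sum (Mann-Whitney U) test, so the sketch below assumes that test and implements it in pure Python with the normal approximation, a tie correction and no continuity correction. The function name and the fractions in the usage lines are hypothetical. For very small groups an exact test is preferable, and with 22 cell types tested, a multiple-testing correction would normally follow.

```python
import math
from typing import List, Tuple


def rank_sum_test(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic for group `a`, two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Toy usage: made-up fractions of one cell type in tumour vs non-tumour tissue.
tumour = [0.12, 0.15, 0.20, 0.18]
non_tumour = [0.05, 0.07, 0.04, 0.09]
print(rank_sum_test(tumour, non_tumour))
```

A rank-based test is a reasonable default here because cell-type fractions are bounded, often skewed and frequently contain many zeros, so a normality assumption is hard to justify.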
This study also has some limitations. First, HCC is highly heterogeneous, encompassing etiologic, geographic and molecular heterogeneity. Molecular heterogeneity can be further classified into interpatient, intertumour and intratumour heterogeneity[53]. Because the ANN-based diagnostic model for HBV-related HCC relies solely on gene expression data, a single model may struggle to accurately diagnose HCC at an early stage, even though it performed satisfactorily on the training and validation datasets. Second, the number of samples used to construct and validate the model was relatively small. Third, confirmatory experiments and clinical studies are needed to further evaluate the accuracy and stability of the diagnostic model.