Background
Hepatocellular carcinoma (HCC) is the main pathological subtype of primary liver cancer (PLC). The early onset of HCC is insidious, and the disease is characterized by a propensity for metastasis and recurrence and by a high mortality rate. Most patients are diagnosed at an intermediate or advanced stage and have a poor prognosis.
Objective
This study aimed to develop and validate a machine learning (ML) prediction model for HCC disease progression based on clinical blood biomarkers, circulating tumor cells (CTCs), and circulating endothelial cells (CECs) measured in patients before treatment. It also aimed to identify risk factors for 5-year survival in HCC patients to guide clinical diagnosis and treatment decisions.
Methods
A total of 76 newly diagnosed patients with HCC were enrolled between September 2018 and July 2019 and followed up for 1–67 months. According to whether they survived for 5 years after the first surgery, patients were divided into a surviving group (n = 34) and a nonsurviving group (n = 42). Pathological data and survival-related factors were collected before treatment. The final feature subset was selected using the support vector machine recursive feature elimination (SVM-RFE) algorithm, the chi-square test, and Student’s t-test. Prediction models for 5-year survival in patients with HCC were built with logistic regression, support vector machine (SVM), decision tree classification (DTC), random forest (RF), and extreme gradient boosting (XGBoost), and the best-performing model was identified through validation. The models were evaluated by specificity, F1 score, recall, accuracy, and area under the receiver operating characteristic curve (AUC-ROC).
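The evaluation metrics named above can be sketched in plain Python as follows. This is a minimal illustration, not the study's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for the example, and label 1 is arbitrarily treated as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, recall, specificity and F1 score from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
print(auc_roc(y_true, scores))                 # 0.9375
```

Accuracy, recall, specificity, and F1 depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why the two kinds of metric can disagree when comparing models.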
Results
Follow-up time among the included patients ranged from 1 to 67 months. Screening yielded a set of 22 significant variables. Ranked by importance, these were: maximum diameter, presence or absence of distant metastasis, CNLC stage, ALB, age, RBC, large CTC, total bilirubin, PD-L1 (-) CTC, ≥ pentaploid CTC, AFP, vascular cancer thrombus and satellite nodules, WBC, CTC, BCLC stage, multiple nodules, AST, PD-L1 (-) CTC-WBC cluster, triploid CTC, LYM, PD-L1 (-) CEC-WBC cluster, and degree of cirrhosis. The AUC-ROC values for predicting 5-year survival of HCC patients with the logistic regression, SVM, DTC, RF, and XGBoost models were 0.7367, 0.9706, 0.6569, 0.7412, and 0.7031, respectively. Among them, the SVM model performed best in predicting 5-year survival in HCC (accuracy = 0.9868, F1 score = 0.9882, recall = 1.0000).
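As a rough arithmetic check on the reported SVM metrics, the sketch below searches for confusion matrices that reproduce them to four decimal places. It rests on an assumption the abstract does not confirm: that accuracy, F1 score, and recall were computed from a single confusion matrix covering all 76 patients (34 surviving, 42 nonsurviving). The function name and tolerance are illustrative.

```python
# Reported SVM metrics, rounded to four decimals in the abstract.
REPORTED = {"accuracy": 0.9868, "f1": 0.9882, "recall": 1.0000}


def matching_confusion_matrices(n_pos, n_neg, reported, tol=5e-5):
    """List every (tp, fp, tn, fn) with n_pos positives and n_neg negatives
    whose accuracy, F1 and recall agree with `reported` within `tol`."""
    matches = []
    for tp in range(1, n_pos + 1):  # tp = 0 would give F1 = 0
        fn = n_pos - tp
        for tn in range(n_neg + 1):
            fp = n_neg - tn
            metrics = {
                "accuracy": (tp + tn) / (n_pos + n_neg),
                "f1": 2 * tp / (2 * tp + fp + fn),
                "recall": tp / n_pos,
            }
            if all(abs(metrics[k] - reported[k]) <= tol for k in reported):
                matches.append((tp, fp, tn, fn))
    return matches


# Assumption A: the nonsurviving group (n = 42) is the positive class.
print(matching_confusion_matrices(42, 34, REPORTED))  # [(42, 1, 33, 0)]

# Assumption B: the surviving group (n = 34) is the positive class.
print(matching_confusion_matrices(34, 42, REPORTED))  # []
```

Under these assumptions, the reported values are consistent with one misclassified patient out of 76 (accuracy 75/76, F1 84/85), with the nonsurviving group as the positive class. If the metrics were instead computed on a held-out subset or averaged over folds, this reading would not apply.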
Conclusion
The ML-based SVM model could predict the 5-year survival of HCC patients with good discriminative ability and greater accuracy than traditional models. The risk factors identified by this model could serve as targets for intervention during diagnosis and treatment to improve patient prognosis.