A Deep Learning Nomogram for the Prediction of Early Recurrence in Hepatocellular Carcinoma After Curative Surgery

Meng Yan, Jinan University First Affiliated Hospital
Xiao Zhang, Zhuhai City People's Hospital
Bin Zhang, Jinan University First Affiliated Hospital
Zhijun Geng, Sun Yat-sen University Cancer Center
Chuanmiao Xie, Sun Yat-sen University Cancer Center
Wei Yang, Southern Medical University
Shuixing Zhang, Jinan University First Affiliated Hospital
Zhendong Qi, Zhujiang Hospital
Ting Lin, Zhujiang Hospital
Qiying Ke, GZHUCM: Guangzhou University of Chinese Medicine
Xinming Li, Zhujiang Hospital
Shutong Wang, Sun Yat-sen University First Affiliated Hospital
Xianyue Quan (quanxianyue2014@163.com), Zhujiang Hospital, https://orcid.org/0000-0003-2293-9345


Introduction
Hepatic resection is the first-line treatment for patients with early-stage hepatocellular carcinoma (HCC) and well-preserved liver function [1]. However, the recurrence rate of HCC is as high as 70% five years after surgery [2]. More than 80% of recurrences are intrahepatic, including intrahepatic metastases from the primary tumor, considered true recurrence, and de novo multicentric occurrence [3]. The poor prognosis of patients is closely related to intrahepatic metastases, which mainly present as early recurrence (within 2 years), whereas late recurrence (>2 years) is more likely to be associated with underlying liver diseases, such as cirrhosis [4]. Thus, assessing the risk of early recurrence in HCC patients is clinically relevant.
Numerous cancer-related factors have been identified as predictors of early recurrence, such as microvascular invasion (MVI), surgical margin, and tumor size [5,6]. However, most of these risk factors can only be obtained from postoperative pathology and cannot be used to assess prognosis or develop treatment plans prior to hepatectomy. Medical imaging is a routine preoperative examination for patients with HCC. Gadolinium-ethoxybenzyl-diethylenetriamine pentaacetic acid (Gd-EOB-DTPA) is a hepatobiliary-specific contrast agent, and Gd-EOB-DTPA-enhanced magnetic resonance imaging (MRI) performs better than MRI using other agents [7].
Although several imaging features observed on Gd-EOB-DTPA-enhanced MRI, including arterial peritumoral enhancement, non-smooth tumor margin, and peritumoral hypointensity on hepatobiliary phase (HBP) [8], have been demonstrated to be associated with early recurrence in HCC patients, the prediction of early recurrence by MR features may be subjective and dependent on the radiologist's experience. Visual features are limited by image grayscale and lose much information about tumor heterogeneity.
Deep learning (DL), a subset of machine learning, is a new diagnostic technology for mining the internal information of medical images. DL can be applied to prognosis prediction [9] and treatment response evaluation [10] by automatically extracting deep-learned or high-order image features. Among DL architectures, the convolutional neural network (CNN) is widely used for image classification tasks [11]. Thus, if DL-based image features could be used directly to predict prognosis, they would offer a promising non-invasive means of guiding individualized patient treatment.
Therefore, we aimed to investigate the feasibility of deep features extracted from Gd-EOB-DTPA-enhanced MR images for predicting the early recurrence of HCC after curative resection. Furthermore, we evaluated the predictive performance of the DL-based nomogram incorporating deep features and significant clinical indicators.

Patients and dataset
The institutional ethics review boards of Zhujiang Hospital of Southern Medical University (2020-KY-094-01) and Sun Yat-Sen University Cancer Center (SL-B2021-214-02) approved this retrospective study, and the requirement for informed consent was waived. Patients with suspected HCC who underwent Gd-EOB-DTPA-enhanced MRI scans between January 2012 and September 2018 at the two institutions prior to curative resection were consecutively included. The inclusion criteria were as follows: (a) pathological confirmation of HCC; (b) Barcelona Clinic Liver Cancer (BCLC) stage 0, A, or B HCC; (c) no previous anti-cancer treatment; and (d) Gd-EOB-DTPA-enhanced MRI of the liver within one month before surgery. The exclusion criteria were as follows: (a) recurrent HCC, combined hepatocellular-cholangiocarcinoma, or metastatic tumor in the liver; (b) without radiographic MVI or extrahepatic metastasis; (c) incomplete clinical, radiological, pathological, or follow-up data; and (d) death due to postoperative complications or liver cancer rupture within two weeks (Supplementary Fig. 1). All patients were randomly divided into the training and validation sets at a ratio of 7:3.
Baseline clinicopathological data were collected from electronic medical records. Clinical data included demographics and time to early recurrence. Laboratory features included neutrophil count, hepatitis B virus DNA, α-fetoprotein (AFP) level, alanine aminotransferase (ALT), aspartate aminotransferase (AST), and γ-glutamyltranspeptidase (GGT). Pathologic data were the presence of MVI, defined as tumor emboli in a vascular space lined by endothelial cells on microscopy [12], BCLC stage, and tumor number.

Follow-up surveillance and clinical endpoint
All patients were followed up for at least two years after curative resection. Patients were screened for tumor recurrence through serum AFP level, ultrasound, and contrast-enhanced CT or MRI of the chest and abdomen in the first month after surgery, every three months thereafter during the first year, and every six months thereafter. Follow-up was censored on October 1, 2020.
The study endpoint was early recurrence, which was defined as one or more of the following events occurring within two years after curative resection: (a) presence of new hepatic lesions with typical imaging findings of HCC; (b) atypical imaging findings with HCC confirmed by biopsy or by pathology after reoperation, or tumor staining revealed by postoperative transarterial chemoembolization; and (c) extrahepatic metastases.

MR imaging acquisition
The MRI machines and parameters are provided in Supplementary Methods and Supplementary Table 1.

Qualitative analysis of MR images
Two radiologists with 5 and 10 years of experience in diagnostic abdominal imaging independently assessed the imaging features without prior knowledge of the pathological findings; disagreements were resolved by consensus. The MR features included: (a) tumor size, defined as the maximum diameter on transverse HBP images; (b) arterial phase (AP) enhancement type: type 1 represents a homogeneous enhancement pattern with no increased arterial blood flow; type 2 represents a homogeneous enhancement pattern with increased arterial blood flow; type 3 represents a heterogeneous enhancement pattern containing non-enhanced areas; type 4 represents a heterogeneous enhancement pattern with irregular ring-like structures [13-15]; type 5 represents a heterogeneous and hypointense pattern; (c) capsule appearance, defined as a peripheral rim of uniform and smooth hyperenhancement in the portal venous phase (PVP) or delayed phase, and categorized into three groups (absent, incomplete, and complete) [16]; (d) hypodense halo, defined as a rim of hypointensity partially or wholly surrounding the tumor; (e) intratumor necrosis, defined as a low signal on T1-weighted imaging (T1WI), a high signal on T2-weighted imaging (T2WI), and a low signal on all enhanced phases; (f) satellite nodules, defined as small (< 2 cm) tumor nodules close (< 2 cm) to the main tumor [17]; and (g) peritumoral hypointensity, defined as flame-like or wedge-shaped hypointense areas of the hepatic parenchyma around the tumor on HBP images [18]. Supplementary Fig. 2 shows images of the MR features.

Image segmentation and DL feature extraction
The regions of interest were delineated around the tumor boundary at its largest dimension. A state-of-the-art architecture, VGGNet-19, was then applied to extract 1472 DL features from each of the AP, PVP, and HBP images. The DL network contains five convolutional layers, four max-pooling layers, three fully connected layers, and a softmax layer. The DL workflow is shown in Fig. 1. More details are provided in Supplementary Materials.

Feature selection and deep learning signature development
The DL features were subjected to the following steps: feature value preprocessing, de-redundancy, and dimensionality reduction to select the features strongly related to early recurrence; machine learning methods were then used to predict the outcome status and establish a DL signature that can predict early recurrence. All features were first normalized to the range [0, 1] by the minimum-maximum normalization method. Spearman correlation analysis was then used to retain features associated with the early recurrence of HCC (P < 0.05). Next, the Pearson correlation coefficient (r) was used to identify highly correlated feature pairs (r > 0.9), and the redundant feature with the lower r in each pair was removed. The highly predictive features obtained were further screened by variance analysis, recursive feature elimination (RFE), and the Relief algorithm. Five types of classifiers, including random forest (RF), support vector machine (SVM), least absolute shrinkage and selection operator logistic regression (LASSO), AdaBoost, and Gaussian process (GP), were compared for identifying the outcome status of early recurrence for every phase of the DL features.
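The normalization and de-redundancy steps can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the toy data, and the greedy strategy are assumptions made for illustration. In particular, the sketch assumes that, within a highly correlated pair, the feature less correlated with the outcome is the one dropped, and it applies the 0.9 cutoff to the absolute value of r.

```python
import math

def min_max_normalize(column):
    """Rescale one feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def drop_redundant(features, outcome, threshold=0.9):
    """Greedy de-redundancy (assumed strategy, hypothetical helper).

    features: dict mapping feature name -> list of values.
    Features are visited from most to least correlated with the outcome;
    a feature is kept only if |r| with every already-kept feature is
    at most `threshold`.
    """
    relevance = {k: abs(pearson_r(v, outcome)) for k, v in features.items()}
    kept = []
    for name in sorted(features, key=lambda k: relevance[k], reverse=True):
        if all(abs(pearson_r(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept

if __name__ == "__main__":
    # Toy data: f2 is almost a rescaled copy of f1, f3 is unrelated.
    raw = {
        "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "f2": [2.1, 3.9, 6.2, 8.0, 9.9],
        "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    outcome = [0, 0, 1, 1, 1]
    scaled = {k: min_max_normalize(v) for k, v in raw.items()}
    print(drop_redundant(scaled, outcome))
```

Processing features in order of relevance guarantees that the retained member of each redundant pair is the one more strongly related to the outcome, which is one plausible reading of the rule described above.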

Clinical and deep learning analysis
Univariate logistic regression analysis was performed in the training set, and variables with a two-sided P < 0.10 were entered into the multivariate logistic regression using the forward likelihood ratio method to identify the independent risk factors for early recurrence. In the multivariate analysis, a two-sided P < 0.05 was considered statistically significant. The nomogram was plotted based on the results of the multivariate logistic regression models.
Collinearity analysis of conventional clinical factors and DL signatures was also performed. The evaluation indicators were tolerance and variance inflation factor (VIF); a tolerance value < 0.1 or a VIF value > 5 was considered to indicate collinearity between two variables.
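A nomogram is a graphical rendering of the fitted logistic regression model, so reading a risk off the nomogram is equivalent to evaluating the logistic function of the linear predictor. The sketch below shows that calculation only; the intercept, coefficients, and predictor names are hypothetical placeholders and are not values reported in this study.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

if __name__ == "__main__":
    # Hypothetical coefficients for two generic predictors (illustration only).
    coefs = {"x1": 1.5, "x2": 0.8}
    patient = {"x1": 1.0, "x2": 0.0}
    print(round(predicted_probability(-2.0, coefs, patient), 3))
```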
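For reference, tolerance and VIF are both derived from the coefficient of determination R² obtained when one predictor is regressed on the remaining predictors: tolerance = 1 − R² and VIF = 1 / tolerance. The short sketch below applies the cutoffs stated above; the function names are illustrative, and R² is assumed to have been computed beforehand.

```python
def tolerance_and_vif(r_squared):
    """Return (tolerance, VIF) for a predictor whose regression on the
    other predictors has the given R-squared."""
    tolerance = 1.0 - r_squared
    vif = float("inf") if tolerance == 0 else 1.0 / tolerance
    return tolerance, vif

def is_collinear(r_squared, tol_cutoff=0.1, vif_cutoff=5.0):
    """Flag collinearity when tolerance < 0.1 or VIF > 5."""
    tolerance, vif = tolerance_and_vif(r_squared)
    return tolerance < tol_cutoff or vif > vif_cutoff

if __name__ == "__main__":
    for r2 in (0.50, 0.85, 0.95):
        print(r2, tolerance_and_vif(r2), is_collinear(r2))
```

Because a VIF of 5 corresponds to a tolerance of 0.2, the VIF criterion is the stricter of the two and triggers first as R² increases.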

Statistical analysis
Comparisons between two groups were conducted using the Chi-square test or Fisher's exact test for categorical variables and the Mann-Whitney U test for continuous variables.
Receiver operating characteristic (ROC) curve analysis was employed to calculate the area under the curve (AUC), accuracy, sensitivity, and specificity. Comparisons between different DL signatures and between different models were performed using the DeLong test. Model fit was assessed using calibration plots with 1000 bootstrap resamples. The clinical utility of the models was evaluated using decision curve analysis (DCA). The software and packages used for statistical analyses are provided in Supplementary Materials. All statistical tests were two-sided with a significance level of 0.05.
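The discrimination and clinical-utility metrics named above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the software used in the study: the function names and toy data are assumptions. The AUC is computed as the proportion of (recurrence, non-recurrence) pairs in which the recurrence case receives the higher score, with ties counted as one half, and the DCA net benefit at a threshold probability pt is TP/n − (FP/n) × pt / (1 − pt).

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold):
    """Classify as positive when score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def net_benefit(scores, labels, pt):
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1.0 - pt))

if __name__ == "__main__":
    # Toy predicted probabilities and outcomes (1 = early recurrence).
    scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(auc(scores, labels))
    print(sensitivity_specificity(scores, labels, 0.5))
    print(net_benefit(scores, labels, 0.5))
```

A decision curve is obtained by evaluating the net benefit over a range of pt values and comparing it with the treat-all and treat-none strategies.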

Clinical characteristics
A total of 285 patients (male, n = 254; mean age, 52.89 ± 11.69 years; range, 13-79 years) were included and divided into the training set (n = 195) and the validation set (n = 90). Early recurrence occurred in 77 (27.0%) patients, and the early recurrence rate did not differ between the training set (27.7%, 54/195) and the validation set (25.6%, 23/90). The clinical characteristics of the two sets are shown in Table 1. No statistically significant difference was observed between the two sets (P = 0.373-1.000).
Table 2); however, only neutrophil count, AST, and MVI were identified as independent risk factors for early recurrence of HCC. The clinical nomogram was built based on these three indicators (Fig. 2a), achieving an AUC of 0.751 (95% CI: 0.674-0.827) in the training set and 0.712 (95% CI: 0.582-0.841) in the validation set (Table 4, Fig. 2c-d).
Subsequently, the above-mentioned seven clinical factors together with the mp-MR-based DL signature were included in the multivariate analysis, and tumor number, MVI, and the mp-MR-based DL signature were identified as independent risk factors for early recurrence (P < 0.05). We developed a DL nomogram incorporating the tumor number, MVI, and mp-MR-based DL signature (Fig. 2b), which significantly outperformed the clinical nomogram, yielding an AUC of 0.949 (95% CI: 0.919-0.980) in the training set and 0.908 (95% CI: 0.841-0.976) in the validation set (Table 4, Fig. 2c-d). The DeLong test showed a significant difference between the clinical nomogram and the DL nomogram in the training set (P < 0.0001) and the validation set (P = 0.002). Fig. 2e-f demonstrates good calibration of the DL nomogram. The DCA curve showed that the DL nomogram had a higher net benefit than the clinical nomogram (Fig. 2g-h).

Discussion
In the present study, we used a DL approach to explore the informative features from Gd-EOB-DTPA-enhanced MR images that were associated with early recurrence of HCC and established three single-layered DL signatures and an mp-MR DL signature fusing the three-phase MR sequences. The results showed that the mp-MR DL signature was better than the three single-layered DL signatures. Subsequently, the DL nomogram was constructed by integrating tumor number, MVI, and the mp-MR DL signature, and it achieved higher predictive accuracy and better net benefit than the clinical nomogram. This study demonstrated the incremental value of the DL signature as compared to the conventional clinical nomogram.
[21]. In the field of oncology, radiomics is a recently emerged technology that extracts a large number of quantitative image features from standard-of-care medical imaging using data-characterization algorithms. Zhao et al. [22] constructed a nomogram by integrating radiomic score and clinical-radiological factors, with an AUC of 0.873. Kim et al. [23] developed a combined clinicopathologic-radiomic model via the RF algorithm, which achieved a C-index of 0.716. Although these radiomic studies yielded good performance, they were limited by small sample sizes and a single feature selection method. In the field of feature engineering, different machine learning-based dimensionality reduction techniques have distinct mathematical rationales and inherent limitations; thus, multiple algorithms should be combined to select robust features [24]. Prior studies have also shown that different dimensionality reduction methods combined with several machine learning methods could maximize the diagnostic performance of the model. Dai et al. [25] found that feature selection and modeling methods could affect prediction models. The optimal radiomic model for MVI evaluation was constructed with a gradient boosting decision tree (GBDT) classifier, which outperformed logistic regression, SVM, and RF. Ni et al. [26] identified LASSO plus GBDT as the optimal combination for predicting MVI in HCC patients among 21 combinations (three feature selection methods and seven classification methods). In our study, we compared three feature selection methods together with five classification methods to determine the best combination and found that RFE or Relief combined with the GP classifier obtained the optimal performance in building DL signatures.
Recently, artificial intelligence has emerged as an effective tool for analyzing multi-modal patient data [27]. Given that radiomics involves a time-consuming process of feature engineering, we used VGGNet-19, which can generally perform image tasks more robustly through automatic analysis without expert intervention [28]. Gd-EOB-DTPA-enhanced MRI can better capture perfusion and functional alterations and is more sensitive and accurate in the detection of HCC. Hence, we chose Gd-EOB-DTPA-enhanced MRI instead of CT images to predict early recurrence in patients with HCC. Liu et al. [29] also revealed that Gd-EOB-DTPA-enhanced MRI had a significantly higher sensitivity and overall accuracy for HCC, especially small lesions, than multiphasic CT without substantial loss of specificity. Wei et al. [30] explored the validity of DL in predicting MVI of HCC using CT and Gd-EOB-DTPA-enhanced MRI to train two DL models and suggested that the EOB-MRI-based model achieved a better prediction result (AUC = 0.812) than the CT-based model (AUC = 0.736). In addition, the results of our study demonstrated that the HBP signature yielded a higher AUC and sensitivity than the AP and PVP signatures, but lower specificity and accuracy than the PVP signature. Well-defined tumor margins on HBP images allow for more accurate tumor delineation than on AP and PVP images, in which the tumor margins could be affected by peritumoral enhancement, capsule appearance, and hypodense halo. Furthermore, the current study also showed the complementary role of MR sequences in DL analysis.
Our study also has several limitations. First, the retrospective nature of this study may have introduced selection bias. Second, our study was conducted at only two centers; external validation at more institutions is required in the future. Finally, the value of the DL model for improving long-term survival in HCC patients remains unclear, and the differences between DL model-assisted and non-assisted practice warrant further study to establish the clinical applicability of the DL model.