This retrospective study was approved by the institutional review board of Zhongshan Hospital, Fudan University (Shanghai, China). Written informed consent was obtained from each patient. To allow sufficient follow-up time, 1829 consecutive patients who underwent surgical resection for HCC from January to December 2016 at Zhongshan Hospital, Fudan University, were initially included. The pathology of HCC was confirmed by two independent pathologists. Patients were enrolled if they met the following inclusion criteria: (1) pathologically confirmed HCC after surgical resection; (2) pathological assessment of MVI available; (3) adequate DCE-MRI and clinical data available within 1 month before surgery. Patients were excluded if they met any of the following exclusion criteria: (1) prior antitumoral therapy or hepatectomy; (2) multiple liver tumors; (3) macrovascular invasion or extrahepatic metastasis. The flowchart of this study is shown in Supplementary Figure 1.
The final study population consisted of 601 patients, who were randomly divided into a training cohort (n=461, 76.7%) and a testing cohort (n=140, 23.3%).
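The random division into cohorts can be sketched as follows. This is a minimal illustration only: the function name `split_cohort`, the placeholder patient IDs, and the fixed seed are assumptions, since the randomization procedure and seed are not reported here.

```python
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Randomly split patient IDs into a training list and a testing list."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # assumed seed; used only to make the sketch reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# Placeholder IDs 0..600 stand in for the 601 enrolled patients.
train_ids, test_ids = split_cohort(range(601), n_train=461)
print(len(train_ids), len(test_ids))  # 461 140
```

Splitting at the patient level keeps all eight sequences of one patient in the same cohort.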
Pathological evaluation and clinical parameters
MVI is defined as the presence of tumor in the portal vein, the hepatic venous system, or their branches in the surrounding hepatic tissue lined by endothelium, which is visible only by microscopy (Cong et al. 2016; Roayaie et al. 2009). MVI status was defined as MVI-absent or MVI-present. MVI grade was classified as MVI-grade 1 (≤5 MVI foci, located ≤1 cm from the tumor tissue) or MVI-grade 2 (>5 MVI foci, or MVI located >1 cm from the tumor tissue), according to the practice guidelines for the Pathological Diagnosis of Primary Liver Cancer of China (Cong et al. 2016). Both MVI status and grade were assessed by two independent pathologists at Zhongshan Hospital. The clinical parameters collected included sex, age, routine blood tests, blood biochemical tests, blood coagulation function tests, markers of hepatic fibrosis, hepatitis B virus carrier status, AFP, and tumor size. Serum component indices, including the platelet-lymphocyte ratio (PLR), neutrophil-lymphocyte ratio (NLR), lymphocyte-to-monocyte ratio (LMR), prognostic nutritional index (PNI), aspartate aminotransferase-to-platelet ratio index (APRI), aspartate aminotransferase-to-neutrophil ratio index (ANRI), and aspartate aminotransferase-lymphocyte ratio (ALR), were calculated as previously reported (Zheng et al. 2017). Details of the clinical parameters can be found in Supplementary Table 1.
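The grading rule and the simplest of the serum ratios can be written out as a short sketch. The function names and example values are hypothetical. A count of exactly five MVI foci is assigned to grade 1 here, which is an assumption about how the boundary case is handled. Only the plain cell-count ratios are shown; PNI, APRI, ANRI, and ALR follow the formulas of the cited reference and are not reproduced.

```python
def mvi_grade(n_mvi, max_distance_cm=None):
    """Return 0 (MVI-absent), 1 (MVI-grade 1), or 2 (MVI-grade 2)."""
    if n_mvi == 0:
        return 0
    # Grade 2: more than 5 MVI foci, or any focus more than 1 cm from the tumor.
    if n_mvi > 5 or max_distance_cm > 1.0:
        return 2
    # Assumption: exactly 5 foci within 1 cm is treated as grade 1.
    return 1

def serum_ratios(platelets, neutrophils, lymphocytes, monocytes):
    """Plain count ratios; all inputs are assumed to share the same unit (e.g. 10^9/L)."""
    return {
        "PLR": platelets / lymphocytes,
        "NLR": neutrophils / lymphocytes,
        "LMR": lymphocytes / monocytes,
    }

# Hypothetical example values, not patient data.
print(mvi_grade(3, 0.5), mvi_grade(3, 1.5), mvi_grade(7, 0.5))
print(serum_ratios(platelets=200.0, neutrophils=4.0, lymphocytes=2.0, monocytes=0.5))
```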
All HCC patients underwent preoperative MRI with gadopentetate dimeglumine (Magnevist; Bayer Schering Pharma AG, Berlin, Germany) on 1.5-T scanners (Avanto and Aera; Siemens, Erlangen, Germany) or 3.0-T scanners (Magnetom Verio, Siemens Medical Solution, Erlangen, Germany; Signa HDx, GE Healthcare, Milwaukee, WI, USA). The eight routine abdominal DCE-MRI sequences were turbo spin-echo T2-weighted imaging (T2WI) with fat suppression, diffusion-weighted imaging (DWI) at b=0 and b=500 s/mm², automatically generated apparent diffusion coefficient (ADC) maps under free breathing, and three-dimensional (3D) T1-weighted volumetric interpolated breath-hold examination in the pre-contrast phase, arterial phase (20–30 s), portal venous phase (about 80 s), and delayed phase (3 min). Details of the MRI parameters are listed in Supplementary Table 2. Tumor regions were drawn manually, slice by slice, by an abdominal radiologist and a hepatic surgeon. The annotation required only manual selection of the tumor region in each slice, without a precise tumor boundary. Before being fed to the models, these regions were confirmed and corrected by a senior radiologist and a senior hepatic surgeon with more than 10 years of experience in reading abdominal MRI.
Developing a DL model
A DL model was constructed to predict MVI status and MVI grade. The architecture of the DL model is shown in Figure 1. The DL model has eight inputs, which are the 3D volumes of interest (VOIs) of the eight MRI sequences. The model has eight separate convolutional neural network (CNN) branches, each of which extracts features from one of the eight VOIs. The features extracted by the eight branches were then fused, and the fused features were fed into fully connected (FC) layers and a softmax layer to obtain the predicted results. Notably, instead of using ResNet or other deep neural networks as the branches, we designed a task-specific architecture for the CNN branches, as described in the Supplementary Materials.
The input VOI of the DL model was cropped from the raw MRI sequences according to the tumor annotation mask, resized to 64×64×16 voxels by third-order spline interpolation, and normalized to [0, 1]. The label of each input VOI was the MVI grade, one-hot encoded. The DL model has three outputs, corresponding to the predicted probabilities of MVI-absent, MVI-grade 1, and MVI-grade 2, respectively. In the MVI grade prediction task, the category corresponding to the maximum of the three outputs was recorded as the predicted MVI grade. Based on this result, the MVI-grade 1 and MVI-grade 2 categories were both recorded as MVI-present in the MVI status prediction task.
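The label encoding and the decision rule described above can be illustrated with a small sketch. The helper names and the example probabilities are hypothetical, and the class order (absent, grade 1, grade 2) is an assumption made for illustration.

```python
CLASSES = ("MVI-absent", "MVI-grade 1", "MVI-grade 2")  # assumed output order

def one_hot(grade):
    """Encode an MVI grade index (0, 1, or 2) as a one-hot vector."""
    return [1.0 if i == grade else 0.0 for i in range(len(CLASSES))]

def predict_grade(probs):
    """Grade task: index of the largest of the three predicted probabilities."""
    return max(range(len(probs)), key=lambda i: probs[i])

def grade_to_status(grade):
    """Status task: grades 1 and 2 both map to MVI-present."""
    return "MVI-absent" if grade == 0 else "MVI-present"

probs = [0.2, 0.5, 0.3]  # hypothetical softmax output
grade = predict_grade(probs)
print(CLASSES[grade], grade_to_status(grade))  # MVI-grade 1 MVI-present
```

Deriving the status from the predicted grade means a single three-class model serves both tasks.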
In the training stage, the DL model was trained with a softmax cross-entropy loss to learn the MVI grade of the inputs. To account for class imbalance when calculating the cross-entropy loss, each class was weighted according to its frequency, so that cases from rare classes contributed more to the loss function (Liu et al. 2020). Data augmentation, comprising horizontal flipping, vertical flipping, cropping, and zoom transformation, was applied to the training cohort to improve generalization. The learning rate was set to 10⁻⁴, and the Adam optimizer was used to update the model parameters with a batch size of 2. Early stopping was used with the patience parameter set to 50, and the DL model was trained for a maximum of 1000 epochs.
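The class weighting and early stopping can be sketched in plain Python. The exact weighting formula is not given here, so inverse-frequency weights are used as one common choice and should be read as an assumption. The quantity monitored for early stopping (a validation loss in this sketch) is also an assumption, and all names and toy labels are hypothetical.

```python
import math

def inverse_frequency_weights(labels, n_classes):
    """Assumed scheme: weight_c = N / (K * n_c); every class must appear at least once."""
    total = len(labels)
    return [total / (n_classes * labels.count(c)) for c in range(n_classes)]

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_cross_entropy(logits, label, weights):
    """Per-sample softmax cross-entropy scaled by the weight of the true class."""
    return -weights[label] * math.log(softmax(logits)[label])

class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""
    def __init__(self, patience=50):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, monitored_loss):
        if monitored_loss < self.best:
            self.best = monitored_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

toy_labels = [0] * 6 + [1] * 3 + [2] * 1  # toy class counts, not study data
weights = inverse_frequency_weights(toy_labels, n_classes=3)
print(weights)  # the rarest class receives the largest weight
```

With these weights, a misclassified case from the rarest class is penalized more heavily than one from the most frequent class.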
Developing a DL combined with clinical parameter (DLC) model
To improve predictive performance, we integrated clinical information with the DL model. The backward stepwise method was applied to select the significant clinical parameters, with the minimum Akaike information criterion (AIC) used as the stopping criterion to determine the optimal set of parameters. The selected parameters were then incorporated into the DL model to form the DLC model (Figure 1). Notably, among the selected parameters, categorical variables were encoded by one digit (i.e., -1 or 1 for each state), and continuous variables were normalized to [-0.5, 0.5].
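The encoding of the selected clinical parameters can be sketched as below. The target ranges come from the text, but the min-max rule used to reach [-0.5, 0.5] is an assumption, as are the function names and the example values.

```python
def encode_binary(value, positive_state):
    """Encode a two-state categorical variable as 1.0 (positive state) or -1.0."""
    return 1.0 if value == positive_state else -1.0

def scale_to_half_range(x, x_min, x_max):
    """Assumed min-max scaling of a continuous variable to [-0.5, 0.5].

    x_min and x_max would be taken from the training cohort.
    """
    return (x - x_min) / (x_max - x_min) - 0.5

# Hypothetical values for illustration only.
print(encode_binary("male", positive_state="male"))   # 1.0
print(scale_to_half_range(5.0, x_min=1.0, x_max=9.0))  # 0.0
```

Centering both variable types around zero keeps the clinical inputs on a scale comparable to one another before they enter the FC layer.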
During DLC model training, the eight VOIs were entered into the model as image inputs, and the selected clinical parameters were fed directly into the FC layer, rather than extracting deep features and combining them with the clinical parameters offline. The training procedure and hyperparameter settings of the DLC model were similar to those of the DL model.
Developing a radiomics model
As a comparator for the predictive performance of the DL and DLC models, we also constructed a radiomics model (Aerts 2016), which is described in the Supplementary Materials.
Correlation analysis between resection margins and oncological outcomes in the DLC-predicted MVI-absent/-present population
The relationship between resection margins and oncological outcomes was further investigated in the DLC-predicted MVI-absent/-present population. The resection margin was defined as the minimum distance between the tumor and the cutting edge in the formalin-fixed tissue. All resection margins were evaluated by two independent pathologists at our center. The cutoff point of the resection margin was determined by the maximal Youden index in receiver operating characteristic (ROC) curve analysis. In this study, margins at or below the cutoff point were defined as narrow resection margins, and the others were defined as wide resection margins.
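A cutoff chosen by the maximal Youden index (J = sensitivity + specificity - 1) can be sketched as follows. The binary endpoint, the direction of the test (a margin at or below the cutoff counts as a positive prediction), the function name, and the toy data are all assumptions made for illustration.

```python
def youden_cutoff(margins, events):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    margins: resection margins (e.g. in cm); events: 1 if the outcome occurred, else 0.
    Assumption: a margin <= cutoff is treated as a positive (high-risk) prediction.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(margins)):
        tp = sum(1 for m, e in zip(margins, events) if e == 1 and m <= cut)
        tn = sum(1 for m, e in zip(margins, events) if e == 0 and m > cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Toy data, not study data.
toy_margins = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0]
toy_events = [1, 1, 1, 0, 1, 0]
cutoff, j = youden_cutoff(toy_margins, toy_events)
print(cutoff, j)  # 0.8 0.75
```

In this toy example, margins of 0.8 or less would be labeled narrow and the rest wide.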
Continuous data were expressed as the mean ± standard deviation (SD) or the median (interquartile range, IQR), as appropriate. Continuous variables were analyzed using Student's t-test or the Mann–Whitney U test, as appropriate. Categorical variables were analyzed using the χ² test. Recurrence-free survival (RFS) was measured from the date of surgery to the date of radiographic detection of recurrence. Overall survival (OS) was defined as the interval between surgery and death or censoring at the date of the last follow-up. ROC curves were used to assess diagnostic performance. Statistical analysis was performed using SPSS v.25 (IBM Inc., Armonk, NY, USA) and R software (version 3.5.2, R Project for Statistical Computing, http://www.r-project.org). The proposed DL and DLC models were implemented in Python (version 3.5, https://www.python.org) using the PyTorch package (https://pytorch.org/). A two-sided P-value <0.05 was considered statistically significant.