Study design and patient population
This retrospective study was approved by Zhongshan hospital Ethics committee and the requirement for written informed consent was waived. All procedures involving human participants were performed in accordance with the 1975 Helsinki declaration and its later amendments.
We queried our institution’s medical records to derive data from patients who underwent hepatic resection for HCC in year 2015 and year 2018 respectively. The key inclusion criteria for our study were as follows: (1) resectable HCC lesion without macroscopic vascular invasion; (2) underwent preoperative gadoxetic acid–enhanced and DW liver MR imaging within 1 month before surgery; (3) without a history of preoperative anti-cancer treatment; (4) pathological confirmation of HCC (5) MR imaging quality adequate for analysis. Exclusion criteria included: (1) received other anti-tumor therapies before surgery; (2) incomplete clinical or pathological information.
A total of 321 confirmed cases of HCC were identified, with 149 HCC patients forming the 2015 cohort and 172 patients forming 2018 cohort, according to the inclusion and exclusion criteria. Data of some preoperative laboratory examinations were collected, including liver function tests, hepatitis B and C immunology, serum a-fetoprotein (AFP) level, serum alanine aminotransferase (ALT), aspartate aminotransferase (AST), γ-glutamyl transpeptidase (GGT), serum total bilirubin (TB), conjugated bilirubin (CB), serum albumin (ALB), platelet count (PLT), prothrombin time (PT), international normalized ratio (INR). The diagnosis of HCC was histologically or clinically confirmed based on the criteria of the American Association for the Study of Liver Diseases (AASLD)36.
MR imaging acquisition
All HCC patients underwent preoperative Gadoxetic acid-enhanced MR imaging examination by a 1.5T scanner (Siemens Healthcare, Erlangen, Germany). Image acquisition procedures were performed as previously reported37. Namely, MR Imaging sequences included axial T2-weighted imaging(T2), diffusion-weighted imaging (DWI), in-phase and opposed-phase T1-weighted imaging(T1), and pre-contrast and post-contrast dynamic three-dimensional T1-weighted volumetric-interpolated breath-hold examination (VIBE) at arterial phase (T1A) , portal venous phase (T1V) , delayed phase (T1D) after injection of 0.025 mmol/kg of gadoxetic acid (Primovist, Bayer Schering Pharma, Berlin, Germany) into the cubital vein, followed by a 20-mL saline flush.
Deep learning network architecture and workflow
Considering that different modalities of MRI contain different features to characterize MVI, analyzing the effect of different modalities and combinations of modalities for MVI prediction is necessary and important. To analyze MR Images from different modality, we first performed the feature extraction individually and then fused the extracted features to predict the status of MVI. Specifically, the whole procedure was divided into three steps: single modal image feature extraction, feature fusion and feature normalization (Fig.1).
Step1: feature extraction from a single modality
Given a 2D slice image from a single modality, a region of interest (ROI) in a rectangle shape from the original image was cropped by two experienced doctors. In order to focus on the boundary region of the tumor, we enlarged the ROI by 5-10 pixels at every boundary of the rectangular ROI. In other words, the inputs of our deep learning MR analysis model were the enlarged ROIs.
Considering that the large amount of training data could improve the performance of the model, we used data augmentation method to increase number of ROIs including image flipping, image scaling, adding gaussian noise. In order to make all the ROIs have the same size, we resized them into 320 320, then the processed ROI were input into the conventional neural network (CNN) network. The detailed network architecture of the CNN is as following. The first part of the CNN includes one 64 7@7 convolution layer, a normalization module and a max pooling layer. After going through all these layers, we obtained a feature map of the input ROI roughly. The following structure of the CNN network contains six 64 3@3 convolution bottleneck modules and six 128 3@3 convolution bottleneck modules, in which all the bottleneck modules are employed from Resnet. The raw feature map of the ROI obtained at the first part was processed by these bottleneck modules sequentially, thus the feature of the ROI at different scales could be learned accordingly. The motivation behind this design of the network is that different scales of convolution bottleneck can help the CNN network learn feature wit diverse scales. Features learned at shallow layers pay much attention to the detailed structure information while features learned at deep layers care more about the global information. Considering the target to analyze MR image is to predict the MVI level which is a global characteristic of an image, we directly use the output feature of the last 128 3@3 convolution bottleneck module which is a 512-dimensional feature vector and feed into the next fully connected layer (FC).
Step2: Feature fusion from multiple modalities
For each patient in 2015 HCC cohort, 3 continuous slices showing the maximal diameter of the tumor were first exported from six modalities (i.e.T1, T1A, T1V, T1D, T2, DWI), and then analyzed in our model. As mentioned above, the feature of an image from each modality is extracted beforehand. Theoretically, we could use a CNN network with six branches sharing same weights to extract six modality images simultaneously, and then concatenated the learned six features as a fused one to feed into the FC layer. After passing the FC layer and the SoftMax, a confidence score identifying the level of MVI of the patient was obtained. However, the prediction performance of the CNN network by combining all six modalities was not as well as expected, and three of six modalities (i.e.T1A, T1V, T1D) after empirically evaluation provides the best performance. More details could be referred to the experiment Section.
Step3: Feature normalization
After concatenating the features from different modalities, the fused feature was feed into a FC layer combined with a SoftMax classifier, which helped to normalize the feature into n 1 bin vector. Here n is the number of MVI levels.
Step4: Network Training and Testing
Considering that we had 3 continuous slices for each modality, anyone of the three slices could be used to characterize the MVI information from this modality. Thus, for each patient, 27 different slice combinations from three modalities (i.e.T1A, T1V, T1D) could be used as the input of the CNN network. In other words, for each patient, 27 training samples sharing the same label are given. Different from any data augmentation techniques, data shuffling like above is a unique way employed in our model thus the performance of our network could be further improved.
Given a testing sample, in the inference step, any one of the 27 different slice combinations could be used as the input of the trained CNN network. Without loss of the generality, we simply proceed to the first slice of 3 continuous slices for each modality.
Histopathology
All surgical specimens were examined by 2 experienced pathologists, particularly to detect the presence of MVI. MVI was defined as the presence of tumor invasion in smaller intrahepatic vessels including a portal vein, hepatic vein, or a large capsular vessel of the surrounding hepatic tissue lined by endothelium that was visible only on microscopy7. MVI grade is classified as M1: the number of MVI < 5 and the distance of MVI ≤ 1 cm away from the tumor tissues, and M2: the number of MVI > 5 or the distance of MVI > 1 cm away from the tumor tissues, according to the practice guidelines for the Pathological Diagnosis of Primary Liver Cancer of China38. The histologic parameters ordinarily included Edmondson-Steiner grade, size, surgical margin and MVI status of the tumor.
Statistical analysis
Statistical analysis was performed using SPSS v.25 (IBM Inc., Armonk, NY, USA) and R software (R software version 3.5.2, R Project for Statistical Computing, http:// www.r-project.org). The discrimination performance of the DL predictive model was measured by the area under the ROC curve (AUC) value in the primary training/validation set. Calibration curves were plotted to analyze the diagnostic performance of the predictive model in the overall cohort. Decision curve analysis was conducted to determine the clinical usefulness and net benefits of the developed predictive model.
Patients were consistently followed up since the date of surgical resection at intervals of 2 to 3 months. Recurrence-free survival (RFS) and overall survival (OS) were defined as the interval between surgery and detection of first recurrence or death. Patients were censored in case of emigration, or on 31 Dec 2020, whichever came first. Survival curves were plotted using the Kaplan-Meier method and compared by log-rank test. A two-tailed p value <0.05 was considered statistically significant.