The protocol of this retrospective study was approved by the institutional review boards of Saga University Hospital (approval number: 2017-11-JINSOKU-07) and Kyushu University Hospital (approval number: 2021-201). Figure 1 shows the overall workflow of this study for segmenting HCC regions using two-step transfer learning (TSTL). DCE CT images at three phases (arterial, portal, and equilibrium) and the annotations for poorly and well-differentiated HCCs were input to the TSTL models after preprocessing. After a pre-trained segmentation model was generated using CT images of lung cancer patients, poorly and well-differentiated HCC regions were segmented sequentially and separately using TSTL. The average 3D Dice similarity coefficient (3D-DSC) and the 95th percentile of the Hausdorff distance (95% HD) were used to evaluate segmentation accuracy in a nested 4-fold cross-validation test [19–21]. Additionally, Spearman’s rank correlation coefficients and errors between the predicted HCC regions and the annotations were assessed with respect to the longest HCC diameters.
Clinical cases
Eighty patients with poorly differentiated HCC and 48 patients with well-differentiated HCC were selected from patients treated in our institutions from 2013 to 2019. The patients underwent resection, RFA, TACE, or molecularly targeted drug therapy [5]. Table 1 summarizes the clinical cases of poorly differentiated HCC and well-differentiated HCC enrolled in this study.
Table 1
Summary of clinical cases for poorly differentiated HCC and well-differentiated HCC in this study.
Characteristic | Poorly differentiated HCC | Well-differentiated HCC |
Total number of cases | 80 | 48 |
Age (years), min–max (median) | 49–91 (68.5) | 57–86 (70) |
Sex | | |
Male | 45 | 30 |
Female | 35 | 18 |
Treatment methods | | |
RFA | 23 | 12 |
Resection | 30 | 19 |
TACE | 25 | 10 |
Molecularly targeted drug | 2 | 5 |
Follow-up | 0 | 2 |
Average longest HCC diameter of annotations (mm) | 20.9 | 19.2 |
HCC patients were scanned at an arterial phase (40 s after injection), a portal phase (65 s after injection), and an equilibrium phase (240 s after injection) on multi-slice CT scanners (Aquilion ONE ViSION, 320 rows, Canon Medical Systems, Otawara, Japan; LightSpeed VCT VISION, 64 rows, GE Healthcare, Chicago, IL; SOMATOM Definition Force, 256 rows, Siemens Medical Solutions, Erlangen, Germany). The DCE CT images were acquired with a tube voltage of 120 kV; a gantry rotation time of 0.5 s; matrix sizes from 512 × 512 × 150 to 512 × 512 × 210 pixels (hereinafter expressed as 512 × 512 × 150–210 pixels); pixel sizes of 0.628, 0.680, and 0.750 mm; and slice thicknesses of 0.72, 0.80, and 1.00 mm, respectively. In the matrix sizes of CT images, the first two values represent the number of in-plane (axial plane) pixels, and the third value is the number of slices (inferior‒superior direction). The image noise level (standard deviation [SD]) was kept within 6.0 HU on the CT scanner by using an automatic exposure control technique. CT images were reconstructed with two consecutive algorithms: a reconstruction function for a standard abdominal region with a low-frequency enhancement function, followed by an iterative reconstruction algorithm.
The HCC regions were delineated as annotations (reference contours) defining the VOIs on the arterial phase CT images for poorly differentiated HCC and on the equilibrium phase CT images for well-differentiated HCC, based on consensus between an abdominal radiologist (JN) and a medical physicist (NN), using the open-source software 3D Slicer (version 4.10.2) [22]. Because respiratory motion caused misregistration between the annotations and the HCC regions on the other two phases, the annotations were manually registered to the HCC regions on those phases by translation. The CT values of the images of well-differentiated HCC were inverted so that the HCC regions had higher CT values than the background.
Preprocessing of HCC region datasets for CT images
Anisotropic CT images and binary annotation images were transformed into isotropic images with an isovoxel size of 0.75 mm using cubic and nearest-neighbor interpolation, respectively [23]. The volumes of interest (VOIs) were cropped based on the center of volume of the square circumscribing the maximum dimension of the annotations in the annotation datasets. The cropped VOI images were resized to 64 × 64 × 8–58 pixels to match the input image size of the NiftyNet platform [18]. A Laplacian of Gaussian (LoG) filter was applied to the interpolated CT images to reduce noise and enhance edges. The LoG was defined as
\(\mathrm{LoG}(x, y, z) = -\frac{1}{\pi \sigma^{4}}\left[1-\frac{x^{2}+y^{2}+z^{2}}{2\sigma^{2}}\right]e^{-\frac{x^{2}+y^{2}+z^{2}}{2\sigma^{2}}},\) (1)
where \(\sigma\) denotes the SD, which was set as \(\sigma =2.0\) in this study.
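The resampling and filtering steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, SciPy's cubic spline interpolation is assumed as a stand-in for the "cubic" interpolation, and \(\sigma\) is assumed to be expressed in voxels (the text does not state its unit). The kernel samples Eq. (1) directly rather than relying on a library LoG filter, whose normalization may differ.

```python
import numpy as np
from scipy import ndimage


def resample_isotropic(volume, spacing_mm, new_spacing_mm=0.75, is_label=False):
    """Resample a 3D volume to isotropic voxels.

    spacing_mm is the (z, y, x) voxel size of the input in mm.
    CT images use cubic spline interpolation (order=3); binary annotations
    use nearest-neighbor interpolation (order=0) so that labels stay binary.
    """
    zoom_factors = [s / new_spacing_mm for s in spacing_mm]
    order = 0 if is_label else 3
    return ndimage.zoom(volume, zoom_factors, order=order)


def log_kernel(sigma=2.0, radius=None):
    """Sample Eq. (1) on a cubic grid; coordinates in voxels (an assumption)."""
    if radius is None:
        radius = int(np.ceil(4 * sigma))  # hypothetical truncation radius
    ax = np.arange(-radius, radius + 1, dtype=float)
    z, y, x = np.meshgrid(ax, ax, ax, indexing="ij")
    r2 = x**2 + y**2 + z**2
    return (-1.0 / (np.pi * sigma**4)
            * (1.0 - r2 / (2.0 * sigma**2))
            * np.exp(-r2 / (2.0 * sigma**2)))


def apply_log(volume, sigma=2.0):
    """Convolve a 3D volume with the sampled LoG kernel."""
    return ndimage.convolve(volume.astype(float), log_kernel(sigma), mode="nearest")
```

Direct 3D convolution with a dense kernel is slow for large volumes; it is used here only because it mirrors Eq. (1) term by term.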
Two-step transfer learning based on Dense V-net
Figure 2 shows the architecture of the TSTL based on Dense V-net. In the first transfer learning step, the weights of the lung cancer segmentation model were updated by inputting the poorly differentiated HCC data to construct the poorly differentiated HCC segmentation model. In the second transfer learning step, the weights of the poorly differentiated HCC segmentation model were updated by feeding the well-differentiated HCC data to construct the well-differentiated HCC segmentation model.
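The weight hand-over between the two steps can be shown with a deliberately small toy example. The sketch uses a logistic regression in NumPy instead of Dense V-net, and all names (`train_logistic`, `two_step_transfer`) are hypothetical; it only illustrates that each stage starts from the weights of the previous stage rather than from scratch.

```python
import numpy as np


def train_logistic(x, y, w_init=None, lr=0.1, epochs=200):
    """Gradient-descent logistic regression.

    w_init carries the weights learned on a previous task; when it is None,
    training starts from zeros (no transfer).
    """
    w = np.zeros(x.shape[1]) if w_init is None else w_init.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * x.T @ (p - y) / len(y)
    return w


def two_step_transfer(source, intermediate, target):
    """Each argument is an (x, y) pair; weights flow source -> intermediate -> target.

    In the study's terms (by analogy only): source = lung cancer,
    intermediate = poorly differentiated HCC, target = well-differentiated HCC.
    """
    w_source = train_logistic(*source)
    w_intermediate = train_logistic(*intermediate, w_init=w_source)
    w_target = train_logistic(*target, w_init=w_intermediate)
    return w_source, w_intermediate, w_target
```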
We employed Dense V-net because Cui et al. [18] reported that it was feasible for segmentation of lung cancer regions, achieving a 3D-DSC of 0.832 ± 0.072 and an HD of 4.57 ± 2.44 mm. Dense V-net combines two advantageous features: dense connections and the V-network. The main advantage of dense connections is that each layer receives all the features extracted in the preceding layers, which helps to avoid overfitting when deeper networks and smaller datasets are used. The V-network is an extension of the U-network in which the input is a 3D image and residual learning is added to improve performance.
Transfer learning is a technique for adapting a model learned in one domain to another domain. Deep convolutional activation features learned from ImageNet, a large natural image database, have been successfully transferred to classify the degree of differentiation of HCCs from CT images with little training data [24]. We hypothesized that a DL model for lung cancer would be applicable to the segmentation of poorly differentiated HCC and could then be retrained for well-differentiated HCC, because lung cancer and poorly differentiated HCC, as well as poorly and well-differentiated HCCs, are more similar to each other than to natural images. To verify this similarity, 12 randomly selected lung tumor cases were compared with 12 poorly differentiated HCC cases over all combinations between the two datasets, which yielded an average cross-correlation coefficient of 0.719 ± 0.12.
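One plausible way to compute such a similarity figure is sketched here. The text does not specify how the cross-correlation coefficient was calculated, so this sketch assumes a zero-lag, zero-mean normalized cross-correlation between volumes that have already been brought to the same shape; the function names are hypothetical.

```python
import numpy as np
from itertools import product


def correlation_coefficient(a, b):
    """Zero-mean normalized cross-correlation of two same-shaped arrays."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def pairwise_similarity(set_a, set_b):
    """Mean and SD of the coefficient over all combinations of two datasets.

    With 12 volumes per dataset this evaluates 12 x 12 = 144 pairs.
    """
    values = [correlation_coefficient(a, b) for a, b in product(set_a, set_b)]
    return float(np.mean(values)), float(np.std(values))
```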
Three types of DCE CT images at arterial, equilibrium, and portal phases were fed into three channels of Dense V-net, respectively, and HCC contours (reference contours) were given as annotation data to the network. The optimal hyperparameters for building the model with weights for poorly differentiated HCC were as follows: an activation function of tanh; a loss function of Dice loss [25]; a dropout rate of 60; an optimization algorithm of RMSProp [26]; a batch size of 8; a learning rate of 0.001; and 50 epochs. For the model with weights for well-differentiated HCC, the corresponding hyperparameters were: tanh; Dice loss; a dropout rate of 65; RMSProp; a batch size of 8; a learning rate of 0.001; and 60 epochs. Data augmentation techniques such as image rotation, flipping, and scaling are advantageous for reducing the overfitting of DL models trained on small amounts of medical image data [27]. The images were randomly rotated from −40° to +40°, randomly flipped, and randomly scaled from −20% to +20%.
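The augmentation ranges given above can be sketched with SciPy as follows. This is an illustration under stated assumptions, not the NiftyNet augmentation used in the study: rotation and scaling are assumed to act in the axial plane only, the flip is assumed to be left-right with probability 0.5, and `augment` is a hypothetical helper. The same random parameters are applied to all three phase channels and to the annotation so that they stay aligned.

```python
import numpy as np
from scipy import ndimage


def augment(image, label, rng, max_angle=40.0, max_scale=0.20):
    """Randomly rotate, scale, and flip a multi-channel volume and its label.

    image: (z, y, x, channels) float array (one channel per contrast phase).
    label: (z, y, x) binary array.
    """
    angle = rng.uniform(-max_angle, max_angle)         # degrees
    scale = 1.0 + rng.uniform(-max_scale, max_scale)   # 0.8 to 1.2
    flip = rng.random() < 0.5

    def transform(vol, order):
        # In-plane rotation about the volume center, keeping the array shape.
        out = ndimage.rotate(vol, angle, axes=(1, 2), reshape=False,
                             order=order, mode="nearest")
        # In-plane scaling; the slice axis is left unchanged.
        out = ndimage.zoom(out, (1.0, scale, scale), order=order, mode="nearest")
        if flip:
            out = out[:, :, ::-1]
        return out

    # Linear interpolation for images, nearest neighbor for the binary label.
    channels = [transform(image[..., c], order=1) for c in range(image.shape[-1])]
    return np.stack(channels, axis=-1), transform(label, order=0)
```

Scaling changes the in-plane array size, so a real pipeline would crop or pad the result back to the network input size afterwards.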
Evaluation of proposed models
The proposed models were evaluated with the nested cross-validation method because Dora et al. [21] showed that nested cross-validation could avoid overfitting and evaluate the generalization performance of models. Nested cross-validation consists of two parts: an inner and an outer cross-validation loop. Figure 3 shows the procedure of the nested cross-validation method. The whole dataset was split into 4 folds; one outer fold was retained as the 1st test dataset, and the remaining 3 outer folds were merged and split into 4 folds for the inner cross-validation. Hyperparameters were optimized in the 1st inner training, and the constructed model was evaluated with the 1st outer test dataset in the outer cross-validation. This procedure was repeated for the remaining outer training and test folds. The model with the highest 3D-DSC among the four outer folds for poorly differentiated HCC was employed in the second transfer learning. To investigate the impact of TSTL on the performance of the proposed segmentation models, all evaluation indices were obtained with and without the TSTL.
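The fold structure of the nested 4-fold cross-validation can be sketched as an index generator. The helper names and the random shuffling are assumptions; the text does not state how cases were assigned to folds.

```python
import numpy as np


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(indices), k)


def nested_cv_splits(n_cases, k_outer=4, k_inner=4, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (train, validation) index pairs built from the
    merged outer training folds; hyperparameters would be tuned on these,
    and the resulting model evaluated once on outer_test.
    """
    rng = np.random.default_rng(seed)
    outer_folds = k_fold_indices(np.arange(n_cases), k_outer, rng)
    for i, test_idx in enumerate(outer_folds):
        outer_train = np.concatenate(
            [fold for j, fold in enumerate(outer_folds) if j != i])
        inner_folds = k_fold_indices(outer_train, k_inner, rng)
        inner_splits = []
        for m, val_idx in enumerate(inner_folds):
            train_idx = np.concatenate(
                [fold for n, fold in enumerate(inner_folds) if n != m])
            inner_splits.append((train_idx, val_idx))
        yield outer_train, test_idx, inner_splits
```

For the 80 poorly differentiated HCC cases, this gives 20 test cases per outer fold and inner splits of 45 training and 15 validation cases.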
The 3D-DSC and the 95% HD were used to evaluate the accuracy of the proposed segmentation models [19, 20]. The 3D-DSC denotes the degree of region similarity between the reference HCC regions and the regions predicted using the proposed model [19]. The HD is a mathematical construct used to measure the 3D closeness of two sets of points that are subsets of a metric space. If two images are identical, the HD is zero. The larger the HD, the more the shapes of the two images differ. In this study, we used the 95% HD, that is, the 95th percentile of the HD between two sets, because the maximum HD is highly sensitive to outliers [20].
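Both metrics can be computed from binary masks as in the following sketch. Several definitions of the 95% HD are in use; this version takes the larger of the two directed 95th-percentile surface distances, which is an assumption and may differ from the authors' implementation. The function names and the default 0.75 mm voxel size (taken from the preprocessing step) are illustrative, and both masks are assumed to be non-empty.

```python
import numpy as np
from scipy import ndimage


def dice_3d(ref, pred):
    """3D Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    ref, pred = ref.astype(bool), pred.astype(bool)
    total = ref.sum() + pred.sum()
    return 2.0 * np.logical_and(ref, pred).sum() / total if total > 0 else 1.0


def surface(mask):
    """Boundary voxels: mask voxels removed by a one-voxel erosion."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)


def hd95(ref, pred, spacing=(0.75, 0.75, 0.75)):
    """95th-percentile Hausdorff distance between two mask surfaces (in mm)."""
    s_ref, s_pred = surface(ref), surface(pred)
    # Distance from every voxel to the nearest surface voxel of each mask.
    dist_to_ref = ndimage.distance_transform_edt(~s_ref, sampling=spacing)
    dist_to_pred = ndimage.distance_transform_edt(~s_pred, sampling=spacing)
    d_pred_to_ref = dist_to_ref[s_pred]
    d_ref_to_pred = dist_to_pred[s_ref]
    return float(max(np.percentile(d_pred_to_ref, 95),
                     np.percentile(d_ref_to_pred, 95)))
```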
Since the longest HCC diameters are assessed in clinical practice, Spearman's rank correlation coefficients (SCs) and the errors in the longest HCC diameters were calculated between the annotated and predicted regions in poorly and well-differentiated HCC. The longest HCC diameters of the regions were measured using an in-house program based on principal component analysis (PCA) (MATLAB version 2019b, MathWorks Inc., Natick, MA) [28]. For the measurement of HCC diameters, the coordinates of voxels within an estimated tumor region were extracted to calculate a covariance matrix. The eigenvalues were calculated by applying a singular value decomposition to the covariance matrix. The longest HCC diameters were estimated using the square roots of the first eigenvalues. The average, minimum, maximum, and percentage errors for the longest HCC diameters were evaluated. Furthermore, to evaluate the impact of the HCC diameter on the accuracy of region segmentation by 3D-DSC, the dataset was divided into two groups according to the median of the longest HCC diameter of the annotations.
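The PCA steps listed above can be sketched in NumPy (the study used an in-house MATLAB program, which is not reproduced here). The text says the diameter was estimated "using" the square root of the first eigenvalue but does not give the scaling factor, so the sketch returns that square root as-is and, as a clearly separate assumption, also returns the extent of the region along the first principal axis. The function name is hypothetical, and the mask must contain at least two voxels.

```python
import numpy as np


def principal_axis_stats(mask, spacing=(0.75, 0.75, 0.75)):
    """Return (sqrt of first eigenvalue, extent along first principal axis), in mm.

    mask: 3D binary array of an estimated or annotated tumor region.
    """
    coords = np.argwhere(mask) * np.asarray(spacing, dtype=float)  # voxel centers in mm
    centered = coords - coords.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # 3 x 3 covariance matrix
    # For a symmetric positive semi-definite matrix, the singular values
    # equal the eigenvalues and the rows of vt are the eigenvectors.
    _, singular_values, vt = np.linalg.svd(cov)
    sqrt_first_eigenvalue = float(np.sqrt(singular_values[0]))
    projection = centered @ vt[0]
    extent = float(projection.max() - projection.min())  # assumption, not from the text
    return sqrt_first_eigenvalue, extent
```

The extent is measured between voxel centers, so it is shorter than the physical length by roughly one voxel.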
Two Dense V-nets were built on the NiftyNet platform [29]. NiftyNet is an open-source convolutional neural network platform designed for medical image analysis research that uses the TensorFlow framework (version 1.14.0). NiftyNet was implemented in Python for TensorFlow to improve the readability of the code. All calculations were performed using an NVIDIA RTX 2080 GPU (NVIDIA Corporation, Santa Clara, CA).
Statistical analysis
R (version 4.0.3) was used for all statistical analyses [30]. For the models of poorly and well-differentiated HCC, the Mann‒Whitney U-test was applied to assess statistically significant differences in the average 3D-DSC and 95% HD between the proposed model and the model without TSTL.
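The study's analyses were run in R; for readers working in Python, an analogous sketch with SciPy is given here. The helper names are hypothetical, the two-sided alternative is an assumption (the text does not state it), and the inputs would be the per-case metric values or diameters, none of which are reproduced here.

```python
import numpy as np
from scipy import stats


def compare_groups(values_a, values_b):
    """Two-sided Mann-Whitney U-test between two independent samples.

    Example use: per-case 3D-DSC values with TSTL versus without TSTL.
    Returns (U statistic, p-value).
    """
    result = stats.mannwhitneyu(values_a, values_b, alternative="two-sided")
    return float(result.statistic), float(result.pvalue)


def diameter_correlation(reference_mm, predicted_mm):
    """Spearman's rank correlation between annotated and predicted diameters.

    Returns (rho, p-value).
    """
    rho, p_value = stats.spearmanr(reference_mm, predicted_mm)
    return float(rho), float(p_value)
```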