Dual segmentation models for poorly and well-differentiated hepatocellular carcinoma using two-step transfer deep learning on dynamic contrast-enhanced CT images

The aim of this study was to develop dual segmentation models for poorly and well-differentiated hepatocellular carcinoma (HCC) using two-step transfer learning (TSTL) on dynamic contrast-enhanced (DCE) computed tomography (CT) images. DCE-CT images of 128 patients with 80 poorly differentiated and 48 well-differentiated HCCs, acquired from 2013 to 2019, were selected at our institutions. In the first transfer learning (TL) step, a segmentation model pre-trained on 192 CT images of lung cancer patients was retrained as a poorly differentiated HCC model. In the second TL step, a well-differentiated HCC model was built from the poorly differentiated HCC model. Segmentation accuracy was evaluated mainly with the average three-dimensional Dice's similarity coefficient (3D-DSC) and the 95th percentile of the Hausdorff distance (95% HD) in a nested fourfold cross-validation test. The DSC denotes the degree of regional similarity between the reference HCC regions and the regions estimated by the proposed models. The 95% HD is the 95th percentile of the distances between the boundaries of the reference and estimated regions, and thus measures how far the two regions are from each other. The average 3D-DSC and 95% HD were 0.849 ± 0.078 and 1.98 ± 0.71 mm, respectively, for poorly differentiated HCC regions, and 0.811 ± 0.089 and 2.01 ± 0.84 mm, respectively, for well-differentiated HCC regions. The average 3D-DSC for both regions was 1.2 times higher than that obtained without TSTL. The proposed models using TSTL from the lung cancer dataset showed the potential to segment poorly and well-differentiated HCC regions on DCE-CT images.


Introduction
Hepatocellular carcinoma (HCC) was the sixth most commonly diagnosed cancer worldwide in 2020, with an estimated 815,000 new cases and 747,000 deaths annually [1]. HCC accounts for approximately 90% of primary liver cancers and is the fourth leading cause of cancer-related death worldwide [2][3][4]. This study addresses primary HCC only, because metastatic liver cancer is managed with different treatment approaches [5]. According to the Clinical Practice Guidelines for Hepatocellular Carcinoma 2017 [6], HCC treatment strategies should be decided based on the Child-Pugh score, metastasis and portal invasion status, the number of nodules, and the single largest dimension of the enhancing viable tumor on arterial phase dynamic contrast-enhanced (DCE) computed tomography (CT) images. In particular, radiofrequency ablation (RFA) and transcatheter arterial chemoembolization (TACE) have been chosen depending on the longest HCC diameter (threshold value: 3 cm). Therefore, accurate segmentation of HCC regions on arterial phase DCE-CT images is essential for determining treatment strategies; this study can contribute by providing more accurate measurements of the longest HCC diameters based on the HCC regions predicted by the proposed models. However, HCCs can be well or poorly differentiated, and the two types have different imaging features in their peripheral regions on DCE-CT images. Moreover, the enhancement patterns in the arterial, portal vein, and equilibrium phases differ depending on the degree of differentiation [7], because arterial blood flow from angiogenesis can increase in the peripheral regions of poorly differentiated HCC [8]. Thus, segmenting poorly and well-differentiated HCC regions separately on three-phase DCE-CT images could improve accuracy. The degree of differentiation could also affect the treatment strategy and the management of treatment progress [6].
Recurrence and metastasis are more likely to occur in poorly differentiated HCC, which has a high malignancy grade, than in well-differentiated HCC [9]. In particular, patients with poorly differentiated HCCs larger than 2 cm in diameter may have early recurrence within 2 years after surgery [10]. These findings also motivated us to investigate separate segmentation approaches for poorly and well-differentiated HCCs on three-phase DCE-CT images.
However, Kim et al. [11] reported interobserver variability in the measurement of the maximum HCC diameter (Cohen's kappa coefficients for 9 cases ranging from 0.28 to 0.86), which could lead to unreliable decisions on treatment strategies. Therefore, automated segmentation of HCC regions is essential for accurately and reliably measuring the longest tumor diameters. Past studies have developed deep learning (DL) models for diagnosing the differentiation of HCCs and for automated segmentation of HCC regions [12][13][14][15][16]. Yasaka et al. [12] evaluated the accuracy of the differential diagnosis of hepatic masses on DCE-CT images using DL. Furthermore, the following studies reported average DSCs for the DL-based segmentation of liver tumors on CT images, including primary HCCs and metastatic tumors from colorectal, breast, and lung cancers. Yuan et al. [13] studied a hierarchical framework based on deep fully convolutional-deconvolutional neural networks to segment liver tumor regions. Li et al. [14] proposed a hybrid densely connected UNet (H-DenseUNet), which combines a 2D DenseUNet and its three-dimensional (3D) counterpart to hierarchically aggregate volumetric context. Qayyum et al. [15] proposed a hybrid 3D residual network (RN) with a squeeze-and-excitation (SE) block for volumetric segmentation of the kidney, the liver, and their associated tumors, such as HCC. Alirr [16] proposed a method for liver and tumor segmentation using a fully convolutional neural network (FCN) with region-based level set functions. These previous studies [13][14][15][16] used the liver tumor segmentation (LiTS) challenge dataset [17], which has three issues in the context of this study. First, the dataset contains metastatic liver tumors from colorectal, breast, and lung cancers as well as primary HCC. Second, the dataset does not have labels for poorly and well-differentiated HCC.
Third, the dataset does not include three-phase images (arterial, portal, and equilibrium phases) from DCE-CT examinations; it contains a single image of unknown phase for each case.
In contrast, because the aim of this study was to investigate DL models for segmenting poorly and well-differentiated primary HCC regions separately to support the optimal selection of a treatment strategy, we employed our own clinical dataset of three-phase DCE-CT images with poorly and well-differentiated primary HCC regions.
We hypothesized that separately segmenting poorly and well-differentiated HCC regions on three-phase DCE-CT images could improve segmentation accuracy, because, from a diagnostic point of view, DCE-CT images depict different features of poorly and well-differentiated HCC in the arterial, portal, and equilibrium phases. Time-variant information on DCE-CT images could be useful for predicting microvascular invasion of HCC, especially in peripheral regions [18], and this information could increase the segmentation accuracy for each HCC type. To mitigate overfitting and to make the best use of clinical resources for lung cancer and HCC cases, we employed two-step transfer learning (TSTL). We also hypothesized that a DL model for lung cancer would be applicable to the segmentation of poorly differentiated HCC and could then be retrained for segmenting well-differentiated HCC, because of their similarity. We applied the DL model for lung cancer segmentation developed by Cui et al. [19]. From the clinical point of view of primary HCC treatment decisions, the longest diameters obtained from the predicted HCC regions were compared with those from the reference HCC regions.

Materials and methods
The protocol of this retrospective study was approved by the institutional review boards of Saga University Hospital (approval number: 2017-11-JINSOKU-07) and Kyushu University Hospital (2021-201). Figure 1 shows the overall workflow of this study for segmenting HCC regions using TSTL. DCE-CT images and annotations of poorly and well-differentiated HCCs at three phases (arterial, portal, and equilibrium) were input to the TSTL models after preprocessing. After a pre-trained segmentation model was generated using CT images of lung cancer patients, poorly and well-differentiated HCC regions were sequentially and separately segmented using TSTL. The average 3D Dice's similarity coefficient (3D-DSC), intersection over union (IOU), sensitivity, specificity, and 95th percentile of the Hausdorff distance (95% HD) were employed as objective metrics for evaluating segmentation accuracy based on a nested fourfold cross-validation test [20][21][22][23][24]. Additionally, Spearman's rank correlation coefficients and the errors in the longest HCC diameters between the predicted HCC regions and the annotations were assessed. As a subjective metric, Cohen's kappa coefficients [23,25] were used to evaluate the segmentation accuracy of the proposed models.

Clinical cases
Eighty patients with poorly differentiated HCC and 48 patients with well-differentiated HCC were selected from patients treated at our institutions from 2013 to 2019. The patients underwent resection, RFA, TACE, and molecularly targeted drug therapy [6]. Table 1 summarizes the clinical cases of poorly and well-differentiated HCC enrolled in this study.
HCC patients were scanned in an arterial phase (40 s after injection), a portal phase (65 s after injection), and an equilibrium phase (240 s after injection) on a multi-slice CT scanner (Aquilion ONE ViSION, 320 rows, Canon Medical Systems, Otawara, Japan; LightSpeed VCT VISION, 64 rows, GE Healthcare, Chicago, IL; SOMATOM Definition Force, 256 rows, Siemens Medical Solutions, Erlangen, Germany). The DCE-CT images were acquired with a tube voltage of 120 kV, a gantry rotation time of 0.5 s, matrix sizes from 512 × 512 × 150 to 512 × 512 × 210 pixels (hereinafter 512 × 512 × 150-210 pixels), pixel sizes of 0.628, 0.680, and 0.750 mm, and slice thicknesses of 0.72, 0.80, and 1.00 mm. In the matrix sizes, the first two values are the numbers of in-plane (axial) pixels, and the third value is the number of slices (inferior-superior direction). The image noise level (standard deviation (SD)) was kept within 6.0 HU by an automatic exposure control technique. CT images were reconstructed in two consecutive steps: a reconstruction function for the standard abdominal region with low-frequency enhancement, followed by an iterative reconstruction algorithm. The HCC region contours were delineated as annotations (reference contours), which served as volumes of interest (VOI), on the arterial phase CT images for poorly differentiated HCC and on the equilibrium phase images for well-differentiated HCC, based on consensus between an abdominal radiologist (JN) and a medical physicist (NN), using the open-source software 3D Slicer (version 4.10.2) [26]. Because respiratory motion caused misregistration between the annotations and the HCC regions in the other two phases, the annotations were manually registered to the HCC regions in those phases by translation.
CT images for well-differentiated HCC regions were inverted with respect to the CT values to obtain higher CT values in the HCC regions than those in the background.

Preprocessing of HCC region datasets for CT images
Anisotropic CT images and binary annotation images were transformed into isotropic images with a voxel size of 0.75 mm using cubic and nearest-neighbor interpolation, respectively [27]. VOIs were cropped around the center of the square circumscribing each annotation at its maximum dimension. The cropped VOI images were resized to 64 × 64 × 8-58 pixels to match the input image size of the NiftyNet platform [19]. A Laplacian of Gaussian (LoG) filter was applied to the interpolated CT images to reduce noise and enhance edges. The LoG is the Laplacian of a Gaussian kernel $G_\sigma$:

$$\mathrm{LoG}(\mathbf{x}) = \nabla^2 G_\sigma(\mathbf{x}), \qquad (1)$$

where $\sigma$ denotes the SD of the Gaussian, which was set to $\sigma = 2.0$ in this study. Figure 2 shows the architecture of TSTL based on Dense V-net. In the first transfer learning step, the weights of the lung cancer segmentation model were updated with the poorly differentiated HCC data to construct the poorly differentiated HCC segmentation model. In the second transfer learning step, the weights of the poorly differentiated HCC segmentation model were updated with the well-differentiated HCC data to construct the well-differentiated HCC segmentation model.
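The preprocessing chain described above (isotropic resampling, CT-value inversion for well-differentiated HCC, LoG filtering, cropping around the annotation, and in-plane resizing) can be sketched with NumPy and SciPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the inversion formula (mirroring values about their own range) is an assumption because the text does not specify it, σ is assumed to be in voxels, and the crop keeps only the slices the annotation spans.

```python
import numpy as np
from scipy import ndimage

ISO_MM = 0.75    # isotropic voxel size stated in the text
LOG_SIGMA = 2.0  # LoG standard deviation from the text (assumed to be in voxels)


def resample_isotropic(vol, spacing_mm, order):
    """Resample a (z, y, x) array to ISO_MM voxels.
    order=3 (cubic) for CT, order=0 (nearest neighbour) for binary annotations."""
    factors = [s / ISO_MM for s in spacing_mm]
    return ndimage.zoom(vol, factors, order=order)


def invert_ct(vol):
    """Hypothetical inversion: mirror CT values about their range so that
    hypo-attenuating lesions become brighter than the background."""
    return vol.max() + vol.min() - vol


def crop_and_resize(vol, mask, out_xy=64, order=3):
    """Crop the in-plane square circumscribing the annotation over the slices
    it spans, then resize the square to out_xy x out_xy pixels."""
    zs, ys, xs = np.nonzero(mask)
    side = max(ys.max() - ys.min(), xs.max() - xs.min()) + 1
    cy, cx = (ys.min() + ys.max()) // 2, (xs.min() + xs.max()) // 2
    pad = side // 2 + 1  # guard band so the square never leaves the array
    padded = np.pad(vol, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    y0, x0 = cy - side // 2 + pad, cx - side // 2 + pad
    roi = padded[zs.min():zs.max() + 1, y0:y0 + side, x0:x0 + side]
    scale = out_xy / side
    return ndimage.zoom(roi, (1.0, scale, scale), order=order)


def preprocess(ct, mask, spacing_mm, well_differentiated=False):
    """Return (LoG-filtered CT VOI, annotation VOI) with matching shapes."""
    ct_iso = resample_isotropic(ct.astype(np.float32), spacing_mm, order=3)
    mask_iso = resample_isotropic(mask.astype(np.uint8), spacing_mm, order=0)
    if well_differentiated:
        ct_iso = invert_ct(ct_iso)
    ct_log = ndimage.gaussian_laplace(ct_iso, sigma=LOG_SIGMA)
    return (crop_and_resize(ct_log, mask_iso, order=3),
            crop_and_resize(mask_iso, mask_iso, order=0))
```

Cropping the CT and the annotation with the same mask-derived window keeps the two volumes voxel-aligned, which the network training requires.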

Two-step transfer learning based on Dense V-net
We employed Dense V-net because Cui et al. [19] reported that Dense V-net was feasible for the segmentation of lung cancer regions, achieving a 3D-DSC of 0.832 ± 0.072 and an HD of 4.57 ± 2.44 mm. Dense V-net combines two advantageous features: dense connections and the V-network. With dense connections, each layer receives the feature maps extracted by all preceding layers, which helps avoid overfitting when deeper networks are trained on smaller datasets. The V-network is an extension of the U-network in which the input is a 3D image and residual learning is incorporated to improve performance.

Fig. 2 Architecture of two-step transfer learning (TSTL) with Dense V-networks in this study. y_l, y_p, and y_w are the predicted regions for lung cancer and poorly and well-differentiated HCC, respectively. x_l, x_p, and x_w are the training datasets for lung cancer and poorly and well-differentiated HCC, respectively. w_lung, w_pHCC, and w_wHCC are the trained weight coefficients for lung cancer and poorly and well-differentiated HCC, respectively.
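The dense connectivity pattern can be illustrated with a toy NumPy block in which each layer receives the concatenation of the block input and all earlier layer outputs. This is a simplified sketch, not Dense V-net itself: features are stored as (channels, voxels), each "layer" is an assumed 1 × 1 × 1 convolution (plain channel mixing) followed by tanh, and `dense_block`/`make_weights` are hypothetical helpers.

```python
import numpy as np


def make_weights(c_in, growth, n_layers, rng):
    """Layer i mixes c_in + i * growth input channels into `growth` new channels."""
    return [rng.normal(scale=0.1, size=(growth, c_in + i * growth))
            for i in range(n_layers)]


def dense_block(x, weights):
    """Toy dense block on (channels, voxels) features.
    Every layer sees the concatenation of the block input and all previous
    layer outputs; the block returns all of them concatenated."""
    features = [x]
    for w in weights:
        inp = np.concatenate(features, axis=0)  # all preceding feature maps
        out = np.tanh(w @ inp)                  # 1x1x1 "convolution" + tanh
        features.append(out)
    return np.concatenate(features, axis=0)
```

With 4 input channels, a growth rate of 3, and 5 layers, the block outputs 4 + 5 × 3 = 19 channels, and the first 4 channels are the untouched input, which is how earlier features stay directly reachable by later layers.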
Transfer learning is a technique for adapting a model learned in one domain to another domain. Deep convolutional activation features learned from ImageNet, a large natural image database, have been successfully transferred to classify the degree of differentiation of HCCs on CT images with little training data [28]. We hypothesized that a DL model for lung cancer would be applicable to the segmentation of poorly differentiated HCC and could then be retrained for well-differentiated HCC, because lung cancer is more similar to poorly differentiated HCC, and poorly differentiated HCC is more similar to well-differentiated HCC, than natural images are. To verify this similarity, we computed cross-correlation coefficients for all combinations of 12 randomly selected lung tumor cases and 12 different poorly differentiated HCC cases, which yielded an average cross-correlation coefficient of 0.719 ± 0.12.
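The similarity check over all lung-tumor/HCC pairs can be sketched as a zero-lag normalized cross-correlation averaged over every combination of the two groups. This is an illustration under assumptions: the text does not state how the coefficient was computed, so here each pair of equally sized patches is compared with the Pearson-type normalized cross-correlation, and `ncc` and `mean_pairwise_ncc` are hypothetical names.

```python
import itertools

import numpy as np


def ncc(a, b):
    """Zero-lag normalized cross-correlation of two equally sized patches."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))


def mean_pairwise_ncc(group_a, group_b):
    """Mean and SD of NCC over all combinations between two groups of patches
    (e.g. 12 lung tumour patches x 12 poorly differentiated HCC patches)."""
    vals = [ncc(a, b) for a, b in itertools.product(group_a, group_b)]
    return float(np.mean(vals)), float(np.std(vals))
```

With 12 cases per group this evaluates 144 pairs; the coefficient is invariant to linear intensity changes, so CT windowing differences between datasets do not affect it.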
Arterial, equilibrium, and portal phase DCE-CT images were fed into the three channels of Dense V-net, respectively, and the HCC contours (reference contours) were given to the network as annotation data. The optimal hyperparameters for the poorly differentiated HCC model were as follows: activation function, tanh; loss function, Dice loss [29]; dropout rate, 60; optimization algorithm, RMSProp [30]; batch size, 8; learning rate, 0.001; and number of epochs, 50. The well-differentiated HCC model used the same settings except for a dropout rate of 65 and 60 epochs. Data augmentation techniques such as image rotation, flipping, and scaling help reduce overfitting when DL models are trained on small amounts of medical image data [31]. The images were randomly rotated from −40° to +40° (interval: 1°), randomly flipped, and randomly scaled from −20% to +20% (interval: 1%).
The two Dense V-nets were built on the NiftyNet platform [32]. NiftyNet is an open-source convolutional neural network platform designed for medical image analysis research that uses the TensorFlow framework (version 1.14.0). NiftyNet is implemented in Python on top of TensorFlow to improve code readability. All calculations were performed on an NVIDIA RTX2080 GPU (NVIDIA Corporation, Los Alamitos, CA). GPU memory consumption in this study was 680 MB.
The total number of trained network weights of the poorly and well-differentiated HCC Dense V-nets in this study was 2,628,950. These trained weights were extracted from the checkpoint files using a TensorFlow function for listing parameters.
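Counting weights from a checkpoint reduces to summing the element counts of the listed variable shapes. The sketch below assumes a list of `(name, shape)` pairs such as the one returned by `tf.train.list_variables(checkpoint_path)`; it does not import TensorFlow so that it runs standalone, and `count_weights` is a hypothetical helper. Because TensorFlow checkpoints may also store optimizer slot variables (for example RMSProp accumulators) and a global step counter, an optional name filter is included; which variables were counted in the study is not stated.

```python
import math


def count_weights(var_list, include=None):
    """Total number of scalar parameters over (name, shape) pairs.
    `include` is an optional predicate on the variable name, e.g. to skip
    optimizer slots or counters; a scalar variable (shape []) counts as 1."""
    return sum(math.prod(shape) for name, shape in var_list
               if include is None or include(name))
```

For example, `count_weights(tf.train.list_variables(ckpt), lambda n: "RMSProp" not in n)` would count only the model weights of a checkpoint `ckpt` trained with RMSProp, assuming the slot variables carry that suffix.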

Evaluation of proposed models
The proposed models were evaluated with nested cross-validation, because Dora et al. [24] showed that nested cross-validation could avoid overfitting and evaluate the generalization performance of models. In nested cross-validation, the procedure is divided into two parts: inner and outer cross-validation loops. Figure 3 shows the procedure of the nested cross-validation method. The whole dataset is split into four folds; one outer fold is retained as the 1st test dataset, and the remaining three outer folds are merged and split into four folds for the inner cross-validation. Hyperparameters were optimized in the 1st inner cross-validation, and the resulting model was evaluated on the 1st outer test dataset. This procedure was repeated for the remaining outer folds. The poorly differentiated HCC model with the highest 3D-DSC among the four outer folds was used in the second transfer learning step. To investigate the impact of TSTL on the performance of the proposed segmentation models, all evaluation indices were obtained with and without TSTL.
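The fold structure of the nested fourfold cross-validation can be sketched as index splits. This is a minimal NumPy illustration, assuming random (non-stratified) fold assignment, which the text does not specify; `nested_cv_splits` is a hypothetical helper. For each outer test fold, the inner splits would be used to choose hyperparameters, and the chosen configuration would then be scored once on the held-out outer fold.

```python
import numpy as np


def kfold(indices, k, rng):
    """Shuffle indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(indices), k)


def nested_cv_splits(n, k_outer=4, k_inner=4, seed=0):
    """Yield (outer_test, inner_splits) where inner_splits is a list of
    (inner_train, inner_val) index arrays drawn from the outer training data."""
    rng = np.random.default_rng(seed)
    outer = kfold(np.arange(n), k_outer, rng)
    for i, test in enumerate(outer):
        rest = np.concatenate([f for j, f in enumerate(outer) if j != i])
        inner = kfold(rest, k_inner, rng)
        inner_splits = [
            (np.concatenate([f for m, f in enumerate(inner) if m != j]), val)
            for j, val in enumerate(inner)
        ]
        yield test, inner_splits
```

With the 80 poorly differentiated cases, each outer test fold holds 20 cases and each inner validation fold 15, and no test case is ever seen during hyperparameter selection.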
3D-DSC, IOU, sensitivity, specificity, and 95% HD were used as objective metrics to evaluate the accuracy of the proposed segmentation models [20][21][22][23]. The 3D-DSC denotes the degree of regional similarity between the reference HCC regions and the regions predicted by the proposed model [23]:

$$\mathrm{DSC} = \frac{2TP}{2TP + FP + FN}, \qquad (2)$$

where TP, FN, and FP are the numbers of true positive, false negative, and false positive voxels, respectively. IOU is defined as the number of voxels in the intersection of the reference and predicted regions divided by the number of voxels in their union [23]:

$$\mathrm{IOU} = \frac{TP}{TP + FP + FN}. \qquad (3)$$

IOU is a stricter evaluation of a region segmentation model than DSC; for the same segmentation, IOU never exceeds DSC. The segmentation performance of the proposed model was further evaluated in terms of sensitivity and specificity for HCC regions and normal liver tissue, respectively [23]. Sensitivity measures the proportion of positive voxels in the annotation that are also identified as positive by the model, and specificity measures the proportion of negative voxels that are also identified as negative:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad (4)$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad (5)$$

where TN is the number of true negative voxels. The HD measures the 3D closeness of two sets of points in a metric space. If two regions are identical, the HD is zero; the larger the HD, the more the shapes of the two regions differ. In this study, we used the 95% HD, that is, the 95th percentile of the distances between the two region boundaries, because the maximum HD is highly sensitive to outliers [22].
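The overlap metrics in Eqs. (2) to (5) and a 95% HD can be computed from binary masks as sketched below. This is an illustration, not the evaluation code of the study: boundaries are taken as mask voxels removed by one binary erosion, distances are Euclidean in millimetres via `scipy.ndimage.distance_transform_edt` with the 0.75 mm isotropic spacing as the default, and the 95th percentile is taken over the pooled directed distances in both directions. Other 95% HD conventions exist (for example, the maximum of the two directed 95th percentiles), so values may differ slightly between implementations.

```python
import numpy as np
from scipy import ndimage


def overlap_metrics(pred, ref):
    """DSC, IOU, sensitivity and specificity of two binary masks (Eqs. 2-5)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    tp = np.sum(pred & ref)
    fp = np.sum(pred & ~ref)
    fn = np.sum(~pred & ref)
    tn = np.sum(~pred & ~ref)
    return {
        "dsc": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def surface(mask):
    """Boundary voxels: voxels of the mask removed by one binary erosion."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)


def hd95(pred, ref, spacing_mm=(0.75, 0.75, 0.75)):
    """95th percentile of boundary-to-boundary distances (mm), pooled over
    both directions (pred -> ref and ref -> pred)."""
    sp, sr = surface(pred), surface(ref)
    # distance from every voxel to the nearest boundary voxel of the other mask
    dist_to_ref = ndimage.distance_transform_edt(~sr, sampling=spacing_mm)
    dist_to_pred = ndimage.distance_transform_edt(~sp, sampling=spacing_mm)
    d = np.concatenate([dist_to_ref[sp], dist_to_pred[sr]])
    return float(np.percentile(d, 95))
```

For two 10-voxel cubes offset by 2 voxels along one axis, the overlap is 800 of 1000 voxels each, giving DSC = 0.8 and IOU = 800/1200 ≈ 0.667, which also illustrates that IOU is the stricter of the two.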
Because the longest HCC diameters are assessed in clinical practice, Spearman's rank correlation coefficients and the errors in the longest HCC diameters were calculated between the annotated and predicted regions for poorly and well-differentiated HCC. The longest HCC diameters were measured using an in-house program [33]. To measure an HCC diameter, the coordinates of the voxels within an estimated tumor region were extracted to calculate a covariance matrix, and its eigenvalues were obtained by applying singular value decomposition to the covariance matrix. The longest HCC diameter was estimated from the square root of the first (largest) eigenvalue. The average, minimum, maximum, and percentage errors in the longest HCC diameters were evaluated. Furthermore, to evaluate the impact of HCC diameter on segmentation accuracy in terms of 3D-DSC, the dataset was divided into two groups according to the median longest HCC diameter of the annotations.
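The principal-axis diameter estimate can be sketched as follows. The text states that the diameter is derived from the square root of the largest eigenvalue of the voxel-coordinate covariance matrix but does not give the scale factor, so this sketch assumes a solid ellipsoid: along a semi-axis a, the coordinate variance is a²/5, so the full axis length is 2√5·√λ₁. The default factor, the 0.75 mm voxel size, and the name `longest_diameter_mm` are assumptions for illustration, not the in-house program [33].

```python
import numpy as np


def longest_diameter_mm(mask, voxel_mm=0.75, factor=2.0 * np.sqrt(5.0)):
    """Longest-axis length (mm) of a binary 3D region from the largest
    eigenvalue of its voxel-coordinate covariance matrix.
    `factor` assumes a solid ellipsoid (variance a^2/5 along semi-axis a)."""
    coords = np.argwhere(mask) * voxel_mm            # (n_voxels, 3) in mm
    cov = np.cov(coords, rowvar=False)               # 3x3 covariance matrix
    eigvals = np.linalg.svd(cov, compute_uv=False)   # PSD: singular values = eigenvalues, descending
    return float(factor * np.sqrt(eigvals[0]))
```

For a digitized ball of radius 10 voxels at 1 mm spacing, this returns approximately 20 mm; irregular tumor shapes deviate from the ellipsoid assumption, which is one reason a scale factor must be chosen and validated.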
Cohen's kappa coefficients [34] were calculated as a subjective metric to assess the agreement between raters on whether the predicted HCC regions or the annotations were superior. The same center slices of the annotations and the predicted HCC regions were selected to compose a test dataset of 128 cases (poorly differentiated HCC: 80; well-differentiated HCC: 48). Two radiation oncologists (M.O., K.O.) blindly judged which region, the annotation or the predicted HCC region, was better from a clinical point of view [34]. When the predicted HCC region was better, the judgment was recorded as 1; otherwise, it was recorded as 0. Landis and Koch [25] recommended the following criteria for Cohen's kappa coefficient: 0-0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect agreement.
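For two raters giving binary judgments (1 = predicted region better, 0 = annotation better), Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal frequencies, κ = (p_o − p_e)/(1 − p_e). The sketch below implements this and maps the result onto the Landis and Koch bands; the function names are hypothetical, and κ is undefined when both raters always give the same single label (p_e = 1).

```python
import numpy as np


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' labels on the same cases."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    po = np.mean(r1 == r2)                                  # observed agreement
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c)            # chance agreement
             for c in np.union1d(r1, r2))
    return float((po - pe) / (1.0 - pe))


def landis_koch(kappa):
    """Verbal label for a kappa value following Landis and Koch."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

Under these bands, the coefficients of 0.625 and 0.624 reported in this study fall in the "substantial" range.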

Statistical analysis
R (version 4.0.3) was used for all statistical analyses [35].
For the poorly and well-differentiated HCC models, the Mann-Whitney U-test was used to assess the statistical significance of differences in the average 3D-DSC, IOU, and 95% HD between the proposed models and the models without TSTL. The correlations in the longest diameter between the annotations and the predicted HCC regions for poorly and well-differentiated HCC were evaluated using Spearman's rank correlation coefficient and its test.

Results
Figure 4 shows the average 3D-DSCs and IOUs obtained by the proposed models with TSTL from the lung cancer model and by the models without TSTL. The models without TSTL were constructed independently from scratch for poorly and well-differentiated HCCs. For poorly differentiated HCC, the average 3D-DSCs were 0.849 ± 0.078 for the proposed model and 0.684 ± 0.154 for the model without the first TL (p < 0.05). For well-differentiated HCC, the average 3D-DSCs were 0.811 ± 0.089 for the proposed model and 0.700 ± 0.162 for the model without the second TL (p < 0.05). The mean IOUs for poorly differentiated HCC were 0.683 ± 0.083 for the proposed model and 0.571 ± 0.14 for the model without the first TL (p < 0.05). The mean IOUs for well-differentiated HCC were 0.674 ± 0.094 for the proposed model and 0.534 ± 0.160 for the model without the second TL (p < 0.05). For poorly differentiated HCC, the sensitivity and specificity were 0.888 and 0.964, respectively, with the proposed model, and 0.801 and 0.876 with the model without the first TL. For well-differentiated HCC, they were 0.864 and 0.970, respectively, with the proposed model, and 0.787 and 0.863 with the model without the second TL. Figure 5(a) and (b) illustrate regions (green lines) that were well predicted by the proposed models for two patients with poorly differentiated HCC (Patients 1 and 2) and two patients with well-differentiated HCC (Patients 3 and 4). The images shown were acquired in the arterial phase.
Images in this figure are arranged at ±2 slices from the center slice of the tumor. Figure 6 shows the 95% HDs obtained by the proposed models using TSTL and by the models without TSTL. The average 95% HDs for poorly and well-differentiated HCC using the proposed models with TSTL were 1.98 ± 0.71 mm and 2.01 ± 0.83 mm, respectively, whereas those using the models without TSTL were 2.67 ± 1.04 mm and 2.78 ± 1.50 mm, respectively (p < 0.05). Figures 7 and 8 show the correlations in the longest HCC diameter between the predicted regions and the annotations for poorly and well-differentiated HCC, respectively. Spearman's rank correlation coefficients for the proposed model and the model without TSTL were 0.952 (p < 0.001) and 0.782 (p < 0.001), respectively, for poorly differentiated HCC, and 0.943 (p < 0.001) and 0.714 (p < 0.001), respectively, for well-differentiated HCC. Figures 4, 6, 7, and 8 indicate the feasibility of TSTL. These data support the following two hypotheses: (1) separate segmentation of poorly and well-differentiated HCC regions could improve their accuracy, and (2) a DL model for lung cancer is applicable to the segmentation of poorly differentiated HCC and could be retrained for segmenting well-differentiated HCC, because of their similarity. Figure 9 shows the average 3D-DSCs for patient groups with small and large HCC diameters. The patients were divided into small and large diameter groups according to the median longest annotated diameter for poorly and well-differentiated HCC, which was 18.4 mm and 17.3 mm, respectively. For poorly differentiated HCC, the average 3D-DSCs were 0.822 ± 0.061 in the small group and 0.875 ± 0.050 in the large group (p < 0.05). For well-differentiated HCC, the average 3D-DSCs were 0.800 ± 0.076 in the small group and 0.822 ± 0.081 in the large group (p > 0.05).
Table 2 shows the errors in the longest HCC diameters between the annotated regions and the regions predicted by the segmentation models with and without TSTL. The average errors in the longest HCC diameter for poorly differentiated HCC using the proposed model with TSTL and the model without TSTL were 2.42 mm and 4.05 mm, respectively (p < 0.05). Those for well-differentiated HCC were 2.21 mm and 3.94 mm, respectively (p < 0.05). Table 3 shows the raters' agreement on the superiority of the predicted HCC regions over the annotations. Cohen's kappa coefficients for poorly and well-differentiated HCCs were 0.625 (95% confidence interval (CI) 0.453-0.796) and 0.624 (95% CI 0.402-0.845), respectively, indicating substantial agreement between the two radiation oncologists' evaluations according to the criteria of Landis and Koch [25]. Table 4 compares the proposed model with three automated models developed in previous studies using the LiTS Challenge dataset [17]. This comparison is discussed in the next section.

Discussion
The proposed models achieved a higher average 3D-DSC of 0.834 (0.849 and 0.811 for poorly and well-differentiated HCCs, respectively) than the average 3D-DSCs of 0.761 to 0.825 obtained for liver tumors including HCC in previous studies (Table 4) [13][14][15][16]. Regarding IOU, Codella et al. [36] considered a segmentation correct when the IOU was 0.7 or higher. The IOU in our results was 0.679, which was below this criterion. This is because IOU is lowered by false negative regions (Eq. (3)), whose rate was 0.122 (1 − sensitivity) in this study (Table 4). The sensitivity of the proposed model with TSTL was 0.878 (0.887 and 0.864 for poorly and well-differentiated HCCs, respectively), which was lower than that (0.943) obtained in a past study [15]. The specificity in this study was similar to that in the same study [15] (Table 4). In that study [15], the image resolution was approximately 1.0 to 2.0 mm in the axial plane and 0.45 to 6.0 mm in the z-axis direction with anisotropic voxels, whereas in our study the resolution was 0.75 mm with isotropic voxels. The resolution in the past study was therefore coarser than ours, which might have affected the sensitivity. The average HD of the proposed model was 2.46 mm (2.57 and 2.39 mm for poorly and well-differentiated HCCs, respectively), which was lower than the values of 6.10 to 6.26 mm obtained in past studies (Table 4). The average Cohen's kappa coefficient in this study was 0.624, which indicates substantial agreement according to the criteria of Landis and Koch [25]; the coefficients for poorly and well-differentiated HCCs were 0.625 and 0.624, respectively. Thus, TSTL could be useful in selecting treatment options for primary HCC in future clinical practice: it could be used to segment the HCC region and then measure the longest HCC diameter, which is one of the factors for selecting the HCC treatment option.
In particular, TACE or resection could be selected based on a threshold value of 3 cm. As shown in Fig. 4 and Table 2, some cases had low 3D-DSCs and large errors, such as the two cases with the lowest 3D-DSCs for poorly and well-differentiated HCCs (Fig. 5(c), (d)). One possible cause is small tumor size, which lowered segmentation accuracy and, in particular, led to overestimation of poorly differentiated HCC regions (Figs. 7(a), 9(a)). Other possible causes are low-contrast and blurred regions, in which the HCC regions may have been segmented together with angiogenic vessels. The issues of small size and low contrast or blurring should be addressed in future studies.
This study had three limitations. First, we did not address moderately differentiated or undifferentiated HCCs, but focused on poorly and well-differentiated HCCs; therefore, the proposed models should be applied to moderately differentiated and undifferentiated HCCs in future studies. Second, the proposed models have not been applied to large databases. In this study, there were few poorly and well-differentiated cases with diameters greater than 2.6 cm and 2.1 cm, respectively. In addition, as shown in Fig. 9, the DSCs for the small diameter group were lower than those for the large diameter group, and the 48 well-differentiated cases were considerably fewer than the 80 poorly differentiated cases. Therefore, we should add more training cases of various sizes for both types of HCC, especially well-differentiated HCC, by collecting data from other institutions and open datasets to improve segmentation accuracy. Third, we did not visualize reasoning heat maps (e.g., Grad-CAM), which are essential for showing how important the pixels of feature maps are for the segmentation results [37]. We should work on visualizing such heat maps in future work.

Conclusion
The proposed models constructed by TSTL from a lung cancer dataset showed the potential for automatic segmentation of poorly and well-differentiated HCC regions on DCE-CT images. The average 3D-DSC and 95% HD obtained by the proposed models were 0.834 and 1.99 mm, respectively. Therefore, the proposed models could be useful for delineating HCCs to support treatment selection and for assisting hepatologists and radiologists.