Patient characteristics and model development
The nucleus map set (n = 120) was used to train a U-net to capture localized nuclear architectural information. Three further sets were used for training and validation: 1) the train set (552 patients), which was used to develop the model; 2) the LT set (144 patients); and 3) the TCGA set, which was used to externally validate the model. The 552 patients in the train set had clear-cut outcomes (good: 274, bad: 278) and were enrolled so that the outcome labels were definite. The patients’ demographics are presented in Table 1.
Table 1
Baseline characteristics in the nucleus map set, train set, LT set, and the TCGA set.
| Variables | Nucleus map set (n=120) | Train set (n=552) | LT set (n=144) | TCGA set (n=302) |
| --- | --- | --- | --- | --- |
| Age (year) | 59 (49-65) | 55 (47-63) | 52 (45-58) | 60 (51-68) |
| Gender (male) | 104 (86.7%) | 478 (86.6%) | 130 (90.3%) | 208 (68.9%) |
| AFP (ng/ml) | 34.7 (6.6-708.0) | 76.5 (7.2-888.0) | 49.3 (7.7-1418.3)ᵃ | 11.0 (4.0-231.5)ᵇ |
| Grade | | | | |
| G1 | 15 (12.5%) | 32 (5.8%) | 3 (2.1%) | 43 (14.2%) |
| G2 | 54 (45.0%) | 243 (44.0%) | 42 (29.2%) | 142 (47.0%) |
| G3 | 42 (35.0%) | 217 (39.3%) | 35 (24.3%) | 103 (34.1%) |
| G4 | 9 (7.5%) | 54 (9.8%) | 0 (0.0%) | 10 (3.3%) |
| Missing | 0 (0.0%) | 6 (1.1%) | 64 (44.4%) | 4 (1.3%) |
| Total tumor size | | | | |
| <5cm | 79 (65.8%) | 334 (60.5%) | 35 (24.3%) | |
| ≥5cm | 41 (34.2%) | 210 (38.0%) | 109 (75.7%) | |
| Missing | 0 (0.0%) | 8 (1.4%) | 0 (0.0%) | 302 (100.0%) |
| Tumor number | | | | |
| Single | 108 (90.0%) | 479 (86.8%) | 52 (36.1%) | |
| Multiple | 12 (10.0%) | 67 (12.1%) | 92 (63.9%) | |
| Missing | 0 (0.0%) | 6 (1.1%) | 0 (0.0%) | 302 (100.0%) |
| Stage AJCC | | | | |
| Stage II | 25 (20.8%) | 145 (26.3%) | 34 (23.6%) | 66 (21.9%) |
| Stage III | 5 (4.2%) | 53 (9.6%) | 63 (43.8%) | 69 (22.8%) |
| Stage IV | 0 (0.0%) | 12 (2.3%) | 21 (14.6%) | 3 (1.0%) |
| Missing | 0 (0.0%) | 7 (1.3%) | 1 (0.7%) | 20 (6.6%) |

Data are median (IQR) or n (%). ᵃ NA = 1. ᵇ NA = 71.
First, we used an image segmentation model to obtain a nuclei segmentation heat map for each tile. This segmentation model was a U-net neural network trained on the nucleus map set with a Dice loss; its final Dice score on the nucleus map set reached 82%. The segmentation did not need to be highly precise, because information other than nuclei, such as the cytoplasm and the shape of the whole cell, also contributed to the heat map. A total of 57415 tiles (small image patches of 224 × 224 pixels) were extracted from the train set (good: 28534, poor: 28881). Before the final model was trained, the pre-trained U-net was used to generate a nuclei segmentation heat map for each tile. We concatenated this heat map with the color-normalized RGB tile along the channel dimension to produce a 4-channel tile. Bags of 4-channel tiles were then fed into the feature extractor of the MobileNet V2 model. We used a generalized mean with a sign as the aggregation function because it preserves the extremes while still taking the average into account. The output of the aggregation function, which represents the score of the pathological image, was passed through a sigmoid function and then compared with a threshold of 0.4457, which is a hyperparameter. Finally, each image was assigned to a class according to its score. The pipeline for MobileNet V2 HCC classification (MobileNetV2_HCC_Class) is shown in Figure 1.
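The scoring steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the per-tile scores stand in for the outputs of the MobileNet V2 feature extractor, the formula in `signed_generalized_mean` is one plausible reading of "a generalized mean with a sign", the exponent `p = 3` is a hypothetical value, the channel-first tile layout is assumed, and the rule that a score at or above the threshold means high risk is also an assumption. Only the threshold 0.4457 and the sequence of steps are taken from the text.

```python
import math

THRESHOLD = 0.4457  # hyperparameter reported in the text


def dice_score(pred, target, eps=1e-7):
    """Dice overlap between two flat sequences of values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def dice_loss(pred, target):
    """Loss minimized when training the segmentation network."""
    return 1.0 - dice_score(pred, target)


def to_four_channels(rgb_planes, heat_map):
    """Append the nuclei heat map as a 4th plane (channel-first layout, assumed)."""
    if len(rgb_planes) != 3:
        raise ValueError("expected exactly three RGB planes")
    return list(rgb_planes) + [heat_map]


def signed_generalized_mean(values, p=3.0):
    """Assumed form: sgn(S) * |S|**(1/p), with S = mean(sgn(x) * |x|**p)."""
    powered = [math.copysign(abs(v) ** p, v) for v in values]
    mean = sum(powered) / len(powered)
    return math.copysign(abs(mean) ** (1.0 / p), mean)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify_bag(tile_scores, p=3.0, threshold=THRESHOLD):
    """Aggregate per-tile scores, apply the sigmoid, and compare with the threshold."""
    score = sigmoid(signed_generalized_mean(tile_scores, p))
    # Direction of the comparison is an assumption, not stated in the text.
    label = "high-risk" if score >= threshold else "low-risk"
    return score, label
```

With `p = 1` the aggregation reduces to the arithmetic mean, whereas larger `p` pulls the result toward the tiles with the largest absolute scores, which matches the stated motivation of keeping the extremes while still accounting for the average.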
The model generalized to the LT dataset for HCC
The output of our neural network was used to categorize patients into low-risk and high-risk subgroups. The LT set included 144 patients with complete follow-up data, of whom 65 relapsed during follow-up. The variables available for analysis were age at diagnosis, gender, serum alpha-fetoprotein (AFP), Child-Pugh score, the Model for End-Stage Liver Disease (MELD) score, tumor size, tumor number, grade, and tumor stage according to the American Joint Committee on Cancer (Stage AJCC). Univariable analyses indicated that AFP, tumor size, grade, tumor number, and Stage AJCC were associated with a shorter RFS (Table S1). Tiles from the tissue arrays of these patients were retrieved and processed with the proposed model. The MobileNetV2_HCC_Class was a strong predictor of RFS in the whole LT set and further stratified patients within subgroups defined by other common prognostic features (Stage AJCC, AFP, tumor number, and tumor size) (Figure 2).
Multivariable analysis showed that the MobileNetV2_HCC_Class had independent prognostic value (HR = 3.44 (2.01–5.87), p<0.001) after adjustment for known prognostic markers that were significant in the univariable analyses, namely Stage AJCC, AFP, tumor number, and tumor size (Figure 3A). The time-dependent AUC curves are depicted in Figure 3B. Over the 3-year follow-up, the MobileNetV2_HCC_Class maintained higher AUC values than the other factors during the first two years after LT.
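As a consistency check on the reported hazard ratio, the interval can be converted back to a standard error and a Wald statistic. The sketch below assumes that the interval in parentheses is a 95% confidence interval that is symmetric on the log scale, which the text does not state; the function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover SE(log HR), the Wald z, and a two-sided p from an HR and its CI.

    Assumes a 95% CI that is symmetric around log(HR).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    center = math.sqrt(lower * upper)  # geometric midpoint of the interval
    return se, z, p_value, center


se, z, p_value, center = wald_from_ci(3.44, 2.01, 5.87)
```

Under these assumptions the geometric midpoint of the interval falls close to the reported HR of 3.44, the z statistic is roughly 4.5, and the resulting p value is below 0.001, in agreement with the reported result.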
The model generalized to the TCGA dataset
The robustness of our model was evaluated on an independent series from the TCGA. A total of 302 patients satisfied the inclusion criteria, of whom 165 had a recorded recurrence. The slides were gathered from various centers. The variables available for analysis were age at diagnosis, gender, AFP, vascular invasion, stroma tumor ratio (STR), tumor-infiltrating lymphocyte (TIL), grade, and Stage AJCC. In univariable analyses, Stage AJCC was the clinical, biological, or pathological feature most strongly associated with shorter survival (Table S2). Tiles from the WSIs of the 302 patients were retrieved and processed with the proposed model. In the TCGA set, the MobileNetV2_HCC_Class predicted RFS and also stratified patients within subgroups defined by other significant prognostic features such as Stage AJCC, AFP, grade, and vascular invasion (Figure 4).
The classifier remained a strong predictor in multivariable analysis (HR = 2.55 (1.64–3.99), p<0.001) after adjustment for known prognostic markers that were significant in univariable analyses, namely Stage AJCC, AFP, grade, and vascular invasion (Figure 5A). These results suggest that the model captures complex patterns that are not redundant with the baseline variables known to influence the survival of HCC patients. The time-dependent AUC curves are depicted in Figure 5B. Throughout the 6-year follow-up after HCC resection, the MobileNetV2_HCC_Class maintained higher AUC values than the other factors.
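To make the notion of a time-dependent AUC concrete, the following minimal sketch computes a naive cumulative/dynamic AUC at a single time point. It is an illustration only and not the estimator used for Figures 3B and 5B, which is not specified in the text: patients censored before the time point are simply dropped, whereas a proper analysis would typically weight for censoring. The function name and arguments are hypothetical.

```python
def naive_time_dependent_auc(times, events, scores, t):
    """Naive cumulative/dynamic AUC at time t.

    Cases: event observed at or before t. Controls: still under follow-up
    after t. Patients censored before t are ignored (a simplification).
    Returns None when either group is empty.
    """
    cases = [s for T, e, s in zip(times, events, scores) if T <= t and e == 1]
    controls = [s for T, s in zip(times, scores) if T > t]
    if not cases or not controls:
        return None
    # Count case/control pairs in which the case has the higher risk score;
    # ties contribute one half.
    concordant = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return concordant / (len(cases) * len(controls))
```

Evaluating this function on a grid of time points and plotting the result against time gives a curve of the kind described above, with one curve per prognostic factor.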
Histological analysis of tiles
The MobileNetV2_HCC_Class can identify the most predictive tiles among thousands of tiles. To survey the main histological features related to recurrence, we used the MobileNetV2_HCC_Class to retrieve the 400 most predictive tiles (high recurrence risk: 200, low recurrence risk: 200) from the 302 patients of the TCGA set. Four such histological features were found in tumoral areas. The presence of stroma, a high degree of cytological atypia, and nuclear hyperchromasia were related to high risk (p = 0.0003, p = 0.0010, and p = 0.0012, respectively), while immune cell infiltration was associated with low risk (p = 0.0019) (Figure 6, Table S3). These findings suggest that the proposed deep learning model detects established histological patterns related to recurrence among HCC patients.