Patients
This retrospective study received ethical approval, and the requirement for informed consent from patients was waived. Between January 2013 and November 2019, patients who underwent Gd-EOB-DTPA-enhanced MRI before surgery or biopsy were consecutively included according to the following inclusion and exclusion criteria. The inclusion criteria were: (1) pathologically confirmed HCC; (2) Gd-EOB-DTPA-enhanced MRI of the liver performed within 1 month before surgery or biopsy; (3) images without obvious artifacts; (4) if multiple lesions were present, the largest one was selected. The exclusion criteria were: (1) previous treatment, such as anti-tumor therapy, radiofrequency ablation, or transcatheter arterial chemoembolization (TACE); (2) incomplete clinical or pathological information. All enrolled patients were randomly divided into training and validation cohorts at a ratio of 7:3.
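As an illustration only, a random 7:3 division of enrolled patients could be sketched as follows. The function name `split_cohort`, the fixed seed, and the rounding of the training cohort size are assumptions made for this sketch; the text does not specify how the randomization was carried out.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)  # rounding rule is an assumption
    return ids[:n_train], ids[n_train:]

# Hypothetical cohort of 100 patients identified by the numbers 1..100
train_ids, validation_ids = split_cohort(range(1, 101))
print(len(train_ids), len(validation_ids))
```

Each patient appears in exactly one cohort, so no lesion contributes to both model development and validation.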
Histopathological examination
The tumor tissue sections were stained using a monoclonal mouse anti-human Ki-67 antibody (Beijing Zhongshan Golden Bridge Biotechnology Company, Beijing, China). Ki-67 expression was evaluated by calculating the frequency of Ki-67-positive cells. A cell was considered Ki-67 positive when its nucleus was stained brownish yellow. Tumors were classified as having low Ki-67 expression (≤ 14% immunoreactivity) or high Ki-67 expression (> 14% immunoreactivity) according to previous studies [7, 11].
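The 14% cutoff can be expressed as a short sketch. The function name `ki67_group` and the use of raw cell counts as input are assumptions for illustration; the text does not describe how many cells or fields were counted.

```python
def ki67_group(positive_cells, total_cells, cutoff_percent=14.0):
    """Classify Ki-67 expression from cell counts.

    Returns the group label and the percentage of Ki-67-positive cells.
    Values at or below the cutoff are 'low'; values above it are 'high'.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    percent_positive = 100.0 * positive_cells / total_cells
    group = "high" if percent_positive > cutoff_percent else "low"
    return group, percent_positive

# Hypothetical counts: 14 of 100 cells positive falls in the low group,
# because the cutoff itself belongs to the low-expression category.
print(ki67_group(14, 100))
print(ki67_group(15, 100))
```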
MRI protocol
The details of the MRI protocol and the sequences used in this study are presented in Supplementary 1.
Tumor segmentation
Tumor segmentation was performed manually on AP, PVP, HBP and T2W images with 3D Slicer (http://www.slicer.org), and a three-dimensional (3D) region of interest (ROI) covering the whole tumor was delineated along the tumor border. HBP or T2W images were segmented first, followed by AP and PVP images, because the tumor margins on HBP or T2W images were clearer than those on AP and PVP images. This delineation order was intended to mitigate software-related segmentation errors. The segmentation was performed independently by two radiologists (Y.Y., 10 years of liver imaging experience; Y.F., 8 years of liver imaging experience) in 30 randomly chosen patients to assess inter-observer reproducibility. One radiologist (Y.F.) repeated the segmentation on another day to assess intra-observer reproducibility. The images of the remaining patients were segmented by the same radiologist (Y.F.). Both radiologists were blinded to the clinical outcomes.
Preprocessing and radiomic feature extraction
Before radiomic feature extraction, the images were preprocessed, including Laplacian of Gaussian (LoG) filtering, wavelet transformations, bin discretization and radiomic matrix symmetry. Feature extraction was performed using the Slicer Radiomics extension, which incorporates the PyRadiomics library into 3D Slicer [18]. The extracted features included first-order statistics, shape features and texture features, the latter derived from the gray level co-occurrence matrix (GLCM), gray level size zone matrix (GLSZM), gray level run length matrix (GLRLM), gray level dependence matrix (GLDM) and neighboring gray tone difference matrix (NGTDM). Among these features, flatness and least axis from the shape features were excluded based on the definition of the features, as discussed in the PyRadiomics documentation, and sum average was excluded because it is directly correlated with joint average [19]. Thus, a total of 1,300 radiomic features were extracted for each unique lesion.
Radiomic feature selection and model development
Least absolute shrinkage and selection operator (LASSO) logistic regression with 5-fold cross-validation was used to select the most useful features in the training cohort. A Rad-score was calculated for each patient as the linear combination of the selected features weighted by their respective coefficients.
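The Rad-score calculation can be sketched as a weighted sum over the features retained by LASSO. The feature names and coefficient values below are hypothetical, and the optional intercept term is an assumption; the text states only that the score is a linear combination of the selected features and their coefficients.

```python
def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of selected features and their coefficients.

    features: mapping of feature name to value for one patient
    coefficients: mapping of selected feature name to its non-zero coefficient
    intercept: optional constant term (an assumption, default 0)
    """
    return intercept + sum(
        coef * features[name] for name, coef in coefficients.items()
    )

# Hypothetical patient with three extracted features; only two were selected,
# so the third does not contribute to the score.
patient = {"feature_a": 2.0, "feature_b": -1.0, "feature_c": 5.0}
selected = {"feature_a": 0.5, "feature_b": 2.0}
print(rad_score(patient, selected))
```

Features whose coefficients were shrunk to zero are simply absent from `coefficients`, which mirrors how LASSO performs selection and weighting in a single step.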
Comparison of radiomics models in the training and validation cohorts
The models developed in the training cohort were applied to the validation cohort. Receiver operating characteristic (ROC) curves, the DeLong test, calibration curves and decision curve analysis (DCA) were used to assess the diagnostic performance of the constructed models, and cutoff values were selected according to the Youden index to determine the corresponding sensitivity and specificity.
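Cutoff selection by the Youden index, J = sensitivity + specificity - 1, can be sketched as below. The function name `youden_cutoff`, the convention that a score at or above the cutoff counts as a positive prediction, and the example data are assumptions for illustration, not the procedure implemented in the software used in this study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising the Youden index.

    scores: model outputs, one per patient
    labels: 1 for high Ki-67 expression, 0 for low Ki-67 expression
    A patient is predicted positive when score >= cutoff (an assumption).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        # Strict comparison keeps the lowest cutoff when several tie
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]

# Hypothetical scores that separate the two groups at 0.4
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
labels = [0, 0, 0, 1, 1, 1]
print(youden_cutoff(scores, labels))
```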
Combined model development and validation
To develop the combined model, we performed multivariate logistic regression analysis of clinical factors in the training cohort, including age, sex, hepatitis B, hepatitis C, cirrhosis, serum alanine aminotransferase (ALT) level, serum aspartate aminotransferase (AST) level, serum gamma-glutamyl transferase (GGT) level, and serum alpha-fetoprotein (AFP) level. Clinical factors that reached statistical significance (P < 0.05) were entered into the combined model, which also included the optimal Rad-score.
Calibration curves were used to assess the calibration of the combined model in both the training and validation cohorts. Decision curve analysis was conducted to determine the clinical usefulness of the combined model by quantifying the net benefit at different threshold probabilities in the validation cohort.
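The net benefit at a given threshold probability is conventionally computed as TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability and n the number of patients. The sketch below applies that standard formula; the function name, the convention that a predicted probability at or above the threshold counts as positive, and the example data are assumptions and do not reproduce the internals of the package used in this study.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of a model at one threshold probability.

    probabilities: predicted probabilities of high Ki-67 expression
    labels: 1 for high Ki-67 expression, 0 for low Ki-67 expression
    threshold: threshold probability pt, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    pairs = list(zip(probabilities, labels))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical predictions for four patients, evaluated at several thresholds
probabilities = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
for pt in (0.2, 0.5, 0.85):
    print(pt, net_benefit(probabilities, labels, pt))
```

Plotting this quantity over a range of thresholds, alongside the treat-all and treat-none strategies, gives the decision curve.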
Statistical analysis
Continuous variables were described as median and interquartile range, and categorical variables were described as frequency and percentage. The D'Agostino-Pearson test was used to test the normality of the data. For continuous variables, the independent-samples t-test or the Mann-Whitney U test was used to compare clinical characteristics between the training and validation cohorts, and between the high and low Ki-67 expression groups within each cohort, while the Chi-squared test or Fisher exact test was used for categorical variables. Two-sided P values < 0.05 were considered statistically significant. The inter-observer and intra-observer reproducibility of the extracted features was assessed by the intra-class correlation coefficient (ICC). ICC ≥ 0.8, 0.5-0.79 and < 0.5 indicated high, moderate, and low consistency, respectively [20]. LASSO logistic regression and multivariable logistic regression analysis were performed to select radiomics features and clinical risk factors using the “glmnet” and “rms” packages in R software, version 3.0.1 (http://www.R-project.org). The calibration and decision curves were plotted using the “rms” and “rmda” packages. Other statistical analyses were performed using MedCalc software (Version 16.2.0, https://www.medcalc.org).
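The ICC consistency bands can be written as a small helper. The function name is hypothetical, and assigning values between 0.79 and 0.8 to the intermediate band is an assumption, since the stated ranges leave that interval unspecified.

```python
def icc_consistency(icc):
    """Map an intra-class correlation coefficient to a consistency band."""
    if icc >= 0.8:
        return "high"
    if icc >= 0.5:
        # Covers the 0.5-0.79 band; values in (0.79, 0.8) land here by assumption
        return "moderate"
    return "low"

# Hypothetical ICC values for three features
for value in (0.92, 0.65, 0.31):
    print(value, icc_consistency(value))
```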