Data enrollment
This retrospective study was approved by the institutional review board (IRB number: 2021060), and all methods were performed in accordance with the relevant guidelines. The Biomedical Research Ethics Committee of Peking University First Hospital granted a waiver of written informed consent. Data were collected retrospectively from four hospitals. The study included patients who underwent prostate mpMRI followed by RP between November 2017 and December 2022. The collected data comprised mpMRI images and pertinent clinical information, including age, PSA levels, mpMRI results, biopsy pathology, and RP pathology reports. The exclusion criteria were as follows: 1) prior endocrine therapy, 2) benign prostate hyperplasia according to RP pathology, 3) missing PSA data, 4) incomplete biopsy pathology records, 5) incomplete MR images, and 6) evident image artifacts. The process of data enrollment is depicted in Fig. 1.
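The six exclusion criteria can be read as a sequence of checks applied to each candidate patient. The following is a minimal sketch of that screening step, not the authors' code: the `PatientRecord` class, its field names, and the order in which criteria are tested are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical per-patient record; every field name is illustrative."""
    patient_id: str
    prior_endocrine_therapy: bool
    rp_benign_hyperplasia: bool   # RP pathology shows benign prostate hyperplasia
    psa: Optional[float]          # None when the PSA value is missing
    biopsy_record_complete: bool
    mr_images_complete: bool
    evident_artifacts: bool


def exclusion_reason(p: PatientRecord) -> Optional[str]:
    """Return the first exclusion criterion met, or None if the patient is eligible."""
    if p.prior_endocrine_therapy:
        return "prior endocrine therapy"
    if p.rp_benign_hyperplasia:
        return "benign prostate hyperplasia on RP pathology"
    if p.psa is None:
        return "missing PSA data"
    if not p.biopsy_record_complete:
        return "incomplete biopsy pathology records"
    if not p.mr_images_complete:
        return "incomplete MR images"
    if p.evident_artifacts:
        return "evident image artifacts"
    return None


def enroll(patients):
    """Split patients into included records and (record, reason) exclusions."""
    included, excluded = [], []
    for p in patients:
        reason = exclusion_reason(p)
        if reason is None:
            included.append(p)
        else:
            excluded.append((p, reason))
    return included, excluded
```

Recording the reason alongside each excluded record makes it straightforward to tally the counts that a flowchart such as Fig. 1 reports.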
Figure 1 Data enrollment process
Flowchart depicting the process of data inclusion. The training dataset comprises data from Hospital_1, while the external validation dataset includes data from Hospital_2, Hospital_3, and Hospital_4. All the data adhered to the same inclusion and exclusion criteria. In total, the training dataset consisted of 345 patients, and the external validation dataset consisted of 533 patients: 18 from Hospital_2, 231 from Hospital_3, and 284 from Hospital_4.
Reference standard
All patients underwent both biopsy and RP, and pathology samples were available for all patients. Pathology was assessed by experienced pathologists following the guidelines of the International Society of Urological Pathology (ISUP) group. The reference standard was established using the RP pathology results, categorizing the patients into three groups: ISUP_1, ISUP_2, and ISUP_3 ~ 5.
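The three-class reference standard amounts to a simple mapping from the ISUP grade group on RP pathology to a class label. A minimal sketch follows; the function name and the label strings are illustrative assumptions, and grades outside 1 to 5 are rejected because patients with benign RP pathology were excluded.

```python
def isup_class(isup_grade: int) -> str:
    """Map an ISUP grade group (1-5) from RP pathology to the three study classes."""
    if isup_grade not in (1, 2, 3, 4, 5):
        raise ValueError(f"ISUP grade group must be 1-5, got {isup_grade!r}")
    if isup_grade == 1:
        return "ISUP_1"
    if isup_grade == 2:
        return "ISUP_2"
    return "ISUP_3~5"  # grade groups 3, 4, and 5 are pooled into one class
```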
MR scanning protocols
The mpMRI images in this study were acquired with 13 MR scanners from five different vendors. Detailed information regarding the MR scanners and image acquisition protocols can be found in Table 1.
Table 1
MR scanning protocols used in different hospitals
| Overall (N = 878) | Training dataset (N = 345) | | External validation dataset (N = 533) | | | P value |
| | Hospital_1 | | Hospital_2 (N = 18) | Hospital_3 (N = 231) | Hospital_4 (N = 284) | |
MR scanner | | | | | | | |
Magnetic field | | | | | | | < 0.001 |
1.436 T | 5 (0.6%) | 0 (0%) | | 0 (0%) | 0 (0%) | 5 (1.8%) | |
1.5 T | 229 (26.1%) | 19 (5.5%) | | 0 (0%) | 45 (19.5%) | 165 (58.1%) | |
3.0 T | 644 (73.3%) | 326 (94.5%) | | 18 (100%) | 186 (80.5%) | 114 (40.1%) | |
Station name | | | | | | | < 0.001 |
GE-Scanner_1 | 262 (29.8%) | 262 (75.9%) | | 0 (0%) | 0 (0%) | 0 (0%) | |
GE-Scanner_2 | 37 (4.2%) | 0 (0%) | | 0 (0%) | 37 (16.0%) | 0 (0%) | |
GE-Scanner_3 | 159 (18.1%) | 0 (0%) | | 1 (5.6%) | 44 (19.0%) | 114 (40.1%) | |
PHILIPS_Scanner_1 | 38 (4.3%) | 38 (11.0%) | | 0 (0%) | 0 (0%) | 0 (0%) | |
PHILIPS_Scanner_2 | 1 (0.1%) | 1 (0.3%) | | 0 (0%) | 0 (0%) | 0 (0%) | |
PHILIPS_Scanner_3 | 1 (0.1%) | 1 (0.3%) | | 0 (0%) | 0 (0%) | 0 (0%) | |
SIEMENS_Scanner_1 | 14 (1.6%) | 0 (0%) | | 14 (77.8%) | 0 (0%) | 0 (0%) | |
SIEMENS_Scanner_2 | 43 (4.9%) | 0 (0%) | | 0 (0%) | 43 (18.6%) | 0 (0%) | |
SIEMENS_Scanner_3 | 64 (7.3%) | 0 (0%) | | 0 (0%) | 64 (27.7%) | 0 (0%) | |
SIEMENS_Scanner_4 | 19 (2.2%) | 19 (5.5%) | | 0 (0%) | 0 (0%) | 0 (0%) | |
SIEMENS_Scanner_5 | 3 (0.3%) | 0 (0%) | | 3 (16.7%) | 0 (0%) | 0 (0%) | |
SIEMENS_Scanner_6 | 208 (23.7%) | 0 (0%) | | 0 (0%) | 43 (18.6%) | 165 (58.1%) | |
UIH_Scanner_1 | 29 (3.3%) | 24 (7.0%) | | 0 (0%) | 0 (0%) | 5 (1.8%) | |
DWI/ADC protocol | | | | | | | |
B value (s/mm²) | | | | | | | < 0.001 |
800 | 8 (0.9%) | 6 (1.7%) | | 1 (5.6%) | 0 (0%) | 1 (0.4%) | |
1000 | 288 (32.8%) | 0 (0%) | | 2 (11.1%) | 8 (3.5%) | 278 (97.9%) | |
1200 | 97 (11.0%) | 0 (0%) | | 0 (0%) | 96 (41.6%) | 1 (0.4%) | |
1400 | 377 (42.9%) | 285 (82.6%) | | 1 (5.6%) | 87 (37.7%) | 4 (1.4%) | |
1500 | 52 (5.9%) | 0 (0%) | | 12 (66.7%) | 40 (17.3%) | 0 (0%) | |
2000 | 56 (6.4%) | 54 (15.7%) | | 2 (11.1%) | 0 (0%) | 0 (0%) | |
Repetition time (ms) | | | | | | | < 0.001 |
Median [Q1,Q3] | 3000 [2630,3800] | 2680 [2650,3000] | | 2500 [2500,3660] | 4310 [3000,4850] | 3000 [2090,3000] | |
Echo time (ms) | | | | | | | < 0.001 |
Median [Q1,Q3] | 61.7 [59.9,70.0] | 61.0 [59.8,61.5] | | 91.0 [82.8,91.0] | 70.0 [63.0,74.0] | 70.0 [59.7,70.0] | |
Pixel bandwidth (Hz/pixel) | | | | | | | < 0.001 |
Median [Q1,Q3] | 1950 [1540,1950] | 1950 [1950,1950] | | 1300 [1300,1500] | 1630 [1540,1950] | 1540 [1540,1950] | |
Slice thickness (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 3.50 [3.00,4.00] | 4.00 [4.00,4.00] | | 4.00 [4.00,4.00] | 3.00 [3.00,3.50] | 3.50 [3.00,3.50] | |
Slice spacing (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 4.00 [3.50,4.00] | 4.00 [4.00,4.00] | | 4.80 [4.80,4.80] | 3.60 [3.30,4.00] | 3.50 [3.50,4.00] | |
Reconstruction diameter (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 230 [200,240] | 240 [220,240] | | 250 [250,250] | 221 [200,260] | 200 [200,340] | |
Pixel spacing (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 1.25 [0.938,1.63] | 0.938 [0.938,0.938] | | 1.30 [1.30,1.30] | 1.33 [0.877,1.63] | 1.56 [1.33,1.79] | |
T2WI protocol | | | | | | | |
Repetition time (ms) | | | | | | | < 0.001 |
Median [Q1,Q3] | 3560 [3130,4470] | 3130 [3020,3500] | | 4000 [4000,4000] | 3710 [3340,4120] | 5400 [3410,5540] | |
Echo time (ms) | | | | | | | < 0.001 |
Median [Q1,Q3] | 105 [88.9,112] | 88.8 [87.3,93.0] | | 100 [100,100] | 108 [96.0,112] | 112 [106,112] | |
Pixel bandwidth (Hz/pixel) | | | | | | | < 0.001 |
Median [Q1,Q3] | 163 [163,200] | 163 [163,163] | | 203 [203,203] | 190 [160,200] | 200 [163,200] | |
Slice thickness (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 3.50 [3.00,4.00] | 4.00 [4.00,4.00] | | 3.00 [3.00,3.00] | 3.00 [3.00,3.50] | 3.50 [3.00,3.50] | |
Slice spacing (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 4.00 [3.50,4.00] | 4.00 [4.00,4.00] | | 3.60 [3.60,3.60] | 3.60 [3.50,4.00] | 3.50 [3.50,4.00] | |
Reconstruction diameter (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 200 [200,240] | 240 [240,240] | | 178 [178,178] | 200 [200,220] | 200 [200,200] | |
Pixel spacing (mm) | | | | | | | < 0.001 |
Median [Q1,Q3] | 0.469 [0.391,0.625] | 0.469 [0.469,0.469] | | 0.625 [0.625,0.625] | 0.391 [0.344,0.625] | 0.625 [0.391,0.625] | |
Image features extracted by AI algorithms
Following anonymization, the DICOM files were converted to NIFTI format using dicom2nii.py (Python 3.5) and then input into pretrained AI software. For detailed technical information about the AI software, please refer to the supplementary material.
For each enrolled patient, the largest lesion predicted by the AI algorithms on the mpMRI images was designated the index lesion for analysis. The lesion's location was classified as solely in the peripheral zone (PZ), solely in the transition zone (TZ), or in both zones (PZ + TZ). The following image features were computed and used in the prediction model: 1) lesion volume, 2) lesion diameter, defined as the largest diameter of the lesion, 3) mean ADC value of the lesion (ADClesion), 4) mean signal intensity of the lesion on DWI (DWIlesion), and 5) mean signal intensity of the lesion on T2WI (T2WIlesion) (as illustrated in Fig. 2).
Figure 2
Step 1 involves selecting and inputting anonymized mpMRI images, comprising T2WI, DWI, and ADC maps. In Step 2, the pretrained AI models identify areas suspicious for PCa. In Step 3, the index lesion is identified by selecting the largest lesion from the predicted labels; the index lesion is shown in red on the images. In Step 4, image features are extracted from the index lesion, including 1) clinical image features such as lesion volume, lesion diameter, ADC value, DWI signal intensity, and T2WI signal intensity, 2) conventional radiomic features, and 3) deep radiomic features. In Step 5, the prediction methods are developed: a) biopsy prediction, which predicts ISUP grouping from biopsy pathology; b) PIRADS prediction, which predicts ISUP grouping from the PIRADS category; c) a clinical model, which uses age, PSA, PIRADS category, and clinical image features to predict ISUP grouping; d) a radiomics model, which predicts ISUP grouping using conventional radiomic features obtained from the index lesion; and e) a deep-radiomics model, which predicts ISUP grouping using deep radiomic features extracted from the index lesion.
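Index-lesion selection and the clinical image features can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: lesions are represented as lists of (z, y, x) voxel indices, "largest" is taken to mean the highest voxel count, and diameter is taken as the largest centre-to-centre distance between two lesion voxels. All function names are hypothetical.

```python
import math
from itertools import combinations


def pick_index_lesion(lesions):
    """Return the label of the lesion with the most voxels.

    `lesions` maps a lesion label to a list of (z, y, x) voxel indices.
    Using voxel count as the size criterion is an assumption.
    """
    return max(lesions, key=lambda label: len(lesions[label]))


def volume_mm3(voxels, spacing_mm):
    """Lesion volume: voxel count times the volume of a single voxel."""
    sz, sy, sx = spacing_mm
    return len(voxels) * sz * sy * sx


def max_diameter_mm(voxels, spacing_mm):
    """Largest centre-to-centre distance between two lesion voxels (brute force)."""
    points = [tuple(i * s for i, s in zip(v, spacing_mm)) for v in voxels]
    return max((math.dist(a, b) for a, b in combinations(points, 2)), default=0.0)


def mean_intensity(image, voxels):
    """Mean of `image` (nested lists indexed [z][y][x]) over the lesion voxels."""
    return sum(image[z][y][x] for z, y, x in voxels) / len(voxels)
```

The same `mean_intensity` helper would apply to the ADC map, the DWI series, and the T2WI series, provided the lesion mask has been resampled to each image grid. The pairwise diameter search is quadratic in the number of voxels, so a real pipeline would likely restrict it to surface voxels.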
Prediction methods
To predict the ISUP_1, ISUP_2, and ISUP_3 ~ 5 classes, five prediction methods were developed, namely, the PIRADS category, biopsy pathology, a clinical model, a radiomics model, and a deep-radiomics model. The steps involved in developing the prediction methods are illustrated in Fig. 2. The training dataset came from Hospital_1, while the external validation dataset included data from Hospital_2, Hospital_3, and Hospital_4.
For the construction of the clinical model, clinical variables were employed as covariates. These included age, PSA levels, PIRADS score, and image features of the suspected clinically significant prostate cancer (csPCa) lesion, which included position, volume, diameter, ADC value, signal intensity on DWI, and signal intensity on T2WI.
To construct the radiomics model, image features were derived from regions of interest (ROIs) on the ADC maps using the PyRadiomics package for Python (https://pyradiomics.readthedocs.io/en/latest/changes.html). Z-score normalization was applied to standardize the extracted features, and Pearson correlation coefficients (PCCs) were computed to identify highly correlated features. Features with a PCC value exceeding 0.99 were eliminated to mitigate multicollinearity. The Kruskal‒Wallis test was then employed to select the features for the final radiomics model. For the classifier, the least absolute shrinkage and selection operator (LASSO) algorithm was utilized.
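The standardization and correlation-filtering steps can be illustrated with a short standard-library sketch. Two details are assumptions because the text does not specify them: the filter compares the absolute PCC with the 0.99 threshold, and when two features are highly correlated, the one encountered first is kept. The function names are hypothetical.

```python
import math


def zscore(values):
    """Standardize one feature column to zero mean and unit (population) SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n  # a constant feature carries no information
    return [(v - mean) / sd for v in values]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.99):
    """Greedy filter: keep a feature only if |PCC| with every kept feature <= threshold.

    `features` maps a feature name to its list of per-patient values.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the PCC is invariant to linear rescaling, the filter gives the same result whether it runs before or after the z-score step.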
To construct the deep-radiomics model, image features were extracted using a deep learning algorithm. The construction process involved several key steps: 1) Preprocessing ADC maps: Initial preprocessing included normalizing the intensities of the ADC maps. 2) ROI resampling: ROIs were resampled to ensure a consistent voxel size. 3) Deep Feature Extraction: A pretrained deep learning model was used to extract features from the segmented ROIs. 4) Feature Dimension Reduction: The channel feature maps underwent dimension reduction by filtering with the maximum value, yielding a set of 512 one-dimensional features. 5) Model Construction: After the deep features were extracted, the deep-radiomics model was constructed following a procedure similar to that described above for the radiomics model.
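One plausible reading of the dimension-reduction step is global max pooling, in which each channel feature map is collapsed to its single largest activation, so 512 channels yield 512 scalar features. The sketch below illustrates that reading only; whether the authors used exactly this operation is an assumption, and the function name is hypothetical.

```python
def global_max_pool(feature_maps):
    """Collapse each channel feature map to its maximum value.

    `feature_maps` is a list of channels, each a nested list indexed [z][y][x].
    Returns one scalar per channel.
    """
    return [
        max(value for plane in channel for row in plane for value in row)
        for channel in feature_maps
    ]
```

With a deep learning framework the same operation would typically be a single reduction over the spatial axes of the feature tensor.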
For model tuning, 5-fold cross-validation was employed to select the optimal value of the hyperparameter α, which controls the strength of the L1 penalty. A grid search was performed over a range of α values, and the one yielding the highest cross-validated accuracy was chosen. This process identified a subset of significant predictors associated with ISUP classes.
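The tuning loop can be sketched generically as shown here. The model-fitting step is deliberately left as a caller-supplied function, `fit_and_score`, which is a hypothetical placeholder: in practice it would fit an L1-penalized classifier on the training folds and return accuracy on the held-out fold. The fold assignment and random seed are also assumptions.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grid_search_alpha(alphas, n_samples, fit_and_score, k=5, seed=0):
    """Return the alpha with the highest mean cross-validated accuracy.

    `fit_and_score(alpha, train_idx, val_idx)` is supplied by the caller and
    must return the accuracy obtained on the validation fold.
    """
    folds = kfold_indices(n_samples, k, seed)
    mean_accuracy = {}
    for alpha in alphas:
        scores = []
        for i, val_idx in enumerate(folds):
            # Train on every fold except the i-th, validate on the i-th.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(alpha, train_idx, val_idx))
        mean_accuracy[alpha] = sum(scores) / len(scores)
    best_alpha = max(mean_accuracy, key=mean_accuracy.get)
    return best_alpha, mean_accuracy
```

Reusing the same folds for every candidate α keeps the comparison between candidates paired, so differences in mean accuracy reflect the penalty strength rather than the split.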
Prediction efficacy evaluation
The external validation dataset was used to assess the predictive performance of the methods via receiver operating characteristic (ROC) analysis, with the area under the ROC curve (AUC) as the performance metric.
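The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way and applies it to each class in a one-vs-rest fashion. The text does not state how the three-class problem was reduced to ROC curves, so the one-vs-rest scheme and all function names here are assumptions.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for positive and 0 for negative cases; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, class_scores):
    """Compute one AUC per class, treating that class as positive and the rest as negative.

    `class_scores` maps each class label to its per-patient predicted scores.
    """
    return {
        cls: binary_auc([1 if y == cls else 0 for y in y_true], scores)
        for cls, scores in class_scores.items()
    }
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based computation gives the same value for larger datasets.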
Statistical analysis
Statistical analysis was conducted using R software (version 4.1.3). Quantitative variables with a normal distribution are presented as the mean ± standard deviation, and those with a nonnormal distribution are presented as the median [Q1, Q3]. Categorical variables are presented as frequencies (percentages). Normality was assessed using the Kolmogorov‒Smirnov test. Associations between categorical variables were examined using the chi-square test, and differences between multiple groups were compared using the Kruskal‒Wallis test. The DeLong test was used to compare AUCs among the five prediction methods. A P value less than 0.05 was considered statistically significant.