Accurate risk stratification, which depends mainly on PSA level, biopsy grade group and stage classification, plays a pivotal role in guiding treatment for PCa patients. However, prostate biopsy has been shown to often underestimate the cancer grade and therefore to assign an incorrect NCCN risk category [3, 4]. Several factors account for the discrepancies between biopsy and RP grades: the cores sampled and the indications for biopsy, differences in biopsy technique, erroneous diagnostic interpretation, tumor heterogeneity, sampling error on biopsy, clinician interpretation of the biopsy GG (global versus highest/composite/overall score) and practice variations in biopsy grade assignment [7, 23]. Incorrect risk stratification may affect treatment planning, patient selection and decision-making. It is therefore important to identify the risk factors associated with upgrading in order to avoid under-treatment, especially among PCa patients who are considered appropriate candidates for AS. Unfortunately, there is currently no widely accepted model that accurately predicts the final individual GG at RP, and the discriminative ability of existing models remains modest [24–27]. Machine learning has previously been used to predict outcomes in other fields of medicine, including the identification of lung cancer from routine blood indices and the in-hospital rupture of type A aortic dissection [28, 29]. Given the excellent performance of machine learning algorithms in classification, we employed four such algorithms to determine relevant risk factors, and then developed and validated four novel prediction models to identify, before treatment decisions are made, those PCa patients at high risk of upgrading at RP.
Overall, 49.4% of the included patients were upgraded at RP; upgrading was most frequent among biopsy GG 1 patients, of whom 72.3% were upgraded. Similarly, Altok and colleagues reported that 70.9% of biopsy GG 1 patients in their study cohort were upgraded at RP, and most were upgraded to GG 2. These observations explain why some patients with GG 1 disease at biopsy suffer metastases or die of prostate cancer, and suggest that a substantial proportion of biopsy GG 1 patients who embark on active surveillance are not, in fact, suitable candidates. Given the known risk of underestimation in biopsy specimens, the prediction of GG upgrade plays a major role when considering individualized therapy for PCa patients, especially AS. In our series, %fPSA (>0.16 versus ≤0.16), apical involvement at MRI (No versus Yes) and biopsy grade group (GG 4, GG 3, GG 2 versus GG 1) were independent factors in multivariable logistic regression analysis. %fPSA, apical involvement at MRI, biopsy grade group and clinical T stage at MRI were significantly associated with upgrading in the Lasso-LR and SVM models. However, in a comparable study, Alshak et al. demonstrated that only the PI-RADS score was a significant predictor of upgrading. In addition, Gandaglia et al. reported that preoperative PSA level, GG at MRI-targeted biopsy and clinically significant PCa at systematic biopsy were independent risk factors for upgrading at RP. The differences between our results and those of the latter two studies might be explained by the fact that those studies did not include detailed core biopsy information, which has been shown to hold substantial potential predictive value.
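As an illustration of how overall and per-group upgrading proportions such as those quoted above can be tabulated, the following minimal Python sketch assumes that upgrading means an RP grade group strictly higher than the biopsy grade group. The function name `upgrade_rates` and the toy cohort are hypothetical and are not the study data.

```python
from collections import defaultdict


def upgrade_rates(pairs):
    """Tabulate upgrading from (biopsy_gg, rp_gg) pairs.

    Assumption: a patient counts as upgraded when rp_gg > biopsy_gg.
    Returns (overall_rate, {biopsy_gg: rate within that biopsy group}).
    """
    totals = defaultdict(int)
    upgraded = defaultdict(int)
    for biopsy_gg, rp_gg in pairs:
        totals[biopsy_gg] += 1
        if rp_gg > biopsy_gg:
            upgraded[biopsy_gg] += 1
    n = sum(totals.values())
    overall = sum(upgraded.values()) / n
    by_gg = {gg: upgraded[gg] / totals[gg] for gg in sorted(totals)}
    return overall, by_gg


# Hypothetical toy cohort of (biopsy GG, RP GG) pairs, for illustration only.
cohort = [(1, 2), (1, 2), (1, 1), (1, 3), (2, 2), (2, 3), (3, 3), (4, 4)]
overall, by_gg = upgrade_rates(cohort)
print(f"overall upgrading: {overall:.1%}")
for gg, rate in by_gg.items():
    print(f"biopsy GG {gg}: {rate:.1%} upgraded")
```

In this toy cohort, biopsy GG 1 patients show the highest upgrading proportion, which mirrors the pattern described above without reproducing the reported figures.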
In our study, imaging factors such as apical involvement at MRI and clinical T stage at MRI were more important predictors than clinical parameters according to the ML-based feature ranking analyses, with the exception of the RF analysis. This implies that mp-MRI has great potential for predicting upgrading, beyond its important role in detecting csPCa and assigning accurate risk stratification for PCa patients. Routine mp-MRI examination before biopsy in patients with suspected PCa was therefore beneficial. Among the biopsy-related variables, biopsy GG was always the strongest predictor. In the LR and Lasso-LR models, the number of positive cores, presence of csPCa at core, presence of a core with a tumor length >0.6 cm, maximum tumor length in a single core, total tumor length and percentage of tumor in total biopsy cores demonstrated almost no value in the prediction of upgrading at RP, whereas the number of positive cores, total tumor length and percentage of tumor in total biopsy cores ranked highly in the RF model. As these features, including D-max, were reliable proxies of tumor volume, tumor size should not be considered relevant to the presence of upgrading. Nonetheless, Corcoran et al. reported that PCa tumor volume was a significant predictor of upgrading in multivariable analysis, and that measuring surrogates of tumor volume might identify those at greatest risk of Gleason score upgrade. It should be noted, however, that the cohort studied by Corcoran et al. did not include patients with biopsy GG 3 and 4. %fPSA outperformed TPSA and PSAD in the prediction of upgrading in the LR, Lasso-LR and SVM models. In contrast, in the mean decrease accuracy and mean decrease Gini evaluations of the RF model, TPSA and PSAD ranked higher than %fPSA.
Regarding the performance of the ML-based models, the Lasso-LR model showed the best discriminative power with an AUC of 0.776 (95%CI: 0.729–0.822), followed by SVM (AUC 0.740; 95%CI: 0.690–0.790), LR (AUC 0.725; 95%CI: 0.674–0.776) and RF (AUC 0.666; 95%CI: 0.618–0.714). The nomogram developed by He et al. achieved an AUC of 0.753 in the prediction of upgrading, which was higher than that of LR but lower than that of Lasso-LR in the present study. Moussa et al. also constructed a nomogram for predicting the probability of upgrading, with a concordance index of 0.68. Additionally, all of the ML-based models except RF outperformed the predictive models constructed by Kulkarni et al. and Athanazio et al., which had AUC values of 0.71 and 0.699, respectively. Of note, in a study of 2982 PCa patients treated with RP, a logistic regression model for predicting upgrading showed a predictive accuracy of 0.804; in contrast, the best predictive accuracy in our study, achieved by the Lasso-LR model, was 0.712. Despite the better predictive accuracy of the model in the study of Chun et al., it remained difficult to determine which model performed best, because their study reported no other discrimination metrics such as AUC, sensitivity, specificity, PPV and NPV. It should be noted that the good performance of our ML-based models might be related to the inclusion of mp-MRI information and detailed biopsy information.
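To make the AUC and 95%CI figures above more concrete, the following minimal Python sketch computes an AUC with the rank-based (Mann-Whitney) estimator and an approximate confidence interval from the Hanley-McNeil standard error. This is an assumption chosen for illustration: the method actually used to obtain the confidence intervals is not stated in this section, and the function names, scores and labels below are hypothetical.

```python
import math


def auc_mann_whitney(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    labels: 1 = upgraded, 0 = not upgraded; ties count as half a win.
    Returns (auc, n_pos, n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC (Hanley-McNeil), clipped to [0, 1]."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical predicted probabilities and outcomes, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc, n_pos, n_neg = auc_mann_whitney(scores, labels)
lo, hi = hanley_mcneil_ci(auc, n_pos, n_neg)
print(f"AUC {auc:.3f} (95%CI: {lo:.3f}-{hi:.3f})")
```

With only eight hypothetical cases the interval is very wide. Confidence intervals for the AUCs of two models can overlap, as several of those reported above do, and this is one reason why accuracy alone is a weak basis for comparing models across studies.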
Despite several strengths, our study has certain limitations. First, the data on the PCa patients who underwent RP were collected retrospectively at a single institution, which may have resulted in selection bias. Second, the case-level highest Gleason grade group is more commonly assigned to patients undergoing systematic TRUS-guided biopsy in our country; hence, predictive models should also be constructed to identify risk factors associated with upgrading using a comparison between the highest biopsy GG and the final RP samples.