Predicting Gleason score upgrade from biopsy pathology to radical prostatectomy specimens: a new nomogram and internal validation

Background: The objectives of this study were to investigate the discrepancy in Gleason score (GS) between biopsy pathology and radical prostatectomy (RP) specimens, to determine its predictors, and to construct a nomogram for Gleason sum upgrade (GSU). Methods: We retrospectively reviewed our prospectively maintained prostate cancer (PCa) database. After selection, 166 patients who underwent RP following biopsy from October 2012 to September 2019 were enrolled. Univariate and multivariate logistic regression analyses were performed sequentially to determine the independent predictors. A nomogram was constructed based on the independent predictors, and a receiver operating characteristic (ROC) curve was used to estimate discrimination. A calibration curve was used to assess the concordance between predicted probabilities and observed risks. Results: The GS concordance rate was 40.4%, whilst the GSU rate was 43.4%. There was no statistically significant difference in distribution between global GS and highest GS. In the multivariate model, the independent predictors were PSA (prostate-specific antigen), GPC (greatest percentage of cancer), clinical T-stage and PI-RADS (Prostate Imaging Reporting and Data System) score. Our model showed good discrimination (area under the curve, 0.714). The nomogram was validated internally and showed good calibration. The model underestimated the risk at predicted probabilities of 52-68% and below 28%, and overestimated it at 29-51% and above 69%. Conclusions: Combining basic clinical variables (PSA and T-stage) with an imaging variable (PI-RADS score) and a pathological variable (GPC) increases the predictive accuracy of the GSU nomogram and improves its performance in predicting actual probabilities. From a clinical standpoint, our new nomogram may provide urologists with a tool for assessing risk and making treatment decisions for PCa patients.


Background
Gleason score (GS) is a critical prognostic factor for risk stratification and disease management in prostate cancer (PCa). Although the Gleason grading system has been modified over time [1], the accuracy of biopsy GS in predicting prostatectomy GS has been reported to be barely satisfactory [2,3]. Many studies have investigated variables that may influence GS upgrade from biopsy to prostatectomy; however, findings on independent predictors have been controversial [4][5][6][7][8][9][10][11][12][13][14]. Models and nomograms have been built in recent years, but few of them included pathological and imaging variables [9-12, 15, 16].
Many studies have reported undertreatment in cases where GS was upgraded by at least 1 point from biopsy to RP. GS upgrades often place urologists in a dilemma when assessing the true risk and determining optimal treatment options for PCa. For instance, active surveillance is recommended for patients with GS 6 or 3+4 but is not appropriate for those with GS 4+3 or above [17]. Patients with GS 8 or above should undergo radical prostatectomy (RP) with lymph node dissection and other ancillary therapy to reduce the risk of PSA failure and poor prognosis [18]. Similarly, external beam radiation therapy is recommended in combination with androgen deprivation therapy in patients with GS 4+3 or above [19].
To address this issue, our study examined the rates of concordance and upgrade between initial biopsy and final pathology, determined the independent predictors, and constructed a new nomogram for GS upgrade. We expect it to help urologists reassess risk after biopsy and select optimal treatment modalities for PCa patients after a comprehensive evaluation.

Data acquisition and patient selection
We retrospectively reviewed our prospectively maintained PCa database. A total of 166 patients who underwent RP following biopsy from 1st October 2012 to 30th September 2019 were finally enrolled in our study.
Biopsy-naïve patients met the inclusion criteria. The exclusion criteria were as follows: (1) patients who received any treatment between biopsy and RP; (2) patients with comorbid cancers; (3) patients with missing data on key variables. Patient information was reviewed consecutively by two researchers to avoid data errors. This study does not contain any human participants or animals, and it received ethics approval from the Ethics Committee of Beijing Friendship Hospital, affiliated with Capital Medical University.

Technique of biopsy and MRI protocol
Patients received general anesthesia in the lithotomy position, followed by transperineal biopsy with an 18-gauge needle. The biopsy was performed under transrectal ultrasonography guidance using an extended systematic biopsy scheme. Each tissue core was submitted in a separate container.
The sampling approach was standard practice across our center and was performed by candidate urologists. Each patient signed a written informed consent.
PI-RADS (Prostate Imaging Reporting and Data System, version 2) score assignment and prostate volume measurement were based on multiparametric magnetic resonance imaging (mpMRI). mpMRI was performed before biopsy with a protocol consisting of T2-weighted imaging, diffusion-weighted imaging with an apparent diffusion coefficient map and a calculated b value of 1000 or above, and dynamic contrast-enhanced sequences. All images were evaluated by expert urological radiologists.

Pathological assessment
Each specimen core was reported separately, and a global GS was assigned by a specialized uropathology group; hence, non-uniform interpretation of reports between pathologists and clinicians could be avoided [20]. The highest GS was also given in the reports, and we compared it with the global GS to investigate how the two influence concordance. All RP specimens were examined and reported by the same team. Two biopsy parameters, GPC and FPC, were used to measure tissue tumor extent (TTE) [21]. GPC, the greatest percentage of cancer, reflects the extent of cancer involvement in a single core. FPC was defined as the fraction of positive cores. Gleason sum upgrade (GSU) was defined as any score upgrade from biopsy to RP, with GS 7 separated into 3+4 and 4+3. Concordance was defined as GS remaining unchanged after surgery.
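The definitions of GPC, FPC and GSU can be made concrete with a minimal Python sketch. The category labels, the "none" category for an RP specimen without residual cancer, and the function names are hypothetical illustrations, not the authors' software; the only rules taken from the text are that GS 7 is split into 3+4 and 4+3 and that any move up the ordered scale counts as an upgrade.

```python
from typing import Sequence

# Gleason categories in increasing order, with GS 7 split into 3+4 and 4+3.
# "none" (assumed label) stands for an RP specimen with no residual malignant foci.
GLEASON_ORDER = ["none", "<=6", "3+4", "4+3", "8", "9", "10"]
RANK = {label: i for i, label in enumerate(GLEASON_ORDER)}


def classify_change(biopsy_gs: str, rp_gs: str) -> str:
    """Return 'concordant', 'upgrade' or 'downgrade' for one patient."""
    b, r = RANK[biopsy_gs], RANK[rp_gs]
    if r == b:
        return "concordant"
    return "upgrade" if r > b else "downgrade"


def tissue_tumor_extent(core_percentages: Sequence[float]) -> tuple[float, float]:
    """Compute (GPC, FPC) from the percentage of cancer (0-100) in each core.

    GPC: greatest percentage of cancer found in any single core.
    FPC: fraction of cores that contain any cancer.
    """
    if not core_percentages:
        raise ValueError("at least one core is required")
    gpc = max(core_percentages)
    fpc = sum(1 for p in core_percentages if p > 0) / len(core_percentages)
    return gpc, fpc


print(classify_change("<=6", "3+4"))                 # upgrade
print(classify_change("4+3", "3+4"))                 # downgrade
print(tissue_tumor_extent([0.0, 15.0, 80.0, 0.0]))   # (80.0, 0.5)
```

Ranking categories rather than comparing Gleason sums is what makes a 3+4 to 4+3 change count as an upgrade even though both sums equal 7.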

Statistical analysis
Demographic data are shown for the total cohort and for the GSU and non-GSU subgroups. Normality of variable distributions was checked using Shapiro-Wilk tests and P-P plots. Normally distributed numerical variables were analyzed with Student's t-test, and the Mann-Whitney U test was used for non-normally distributed numerical variables. The chi-square test was used for categorical variables. Univariate logistic regression analysis was followed by multivariate analysis: variables found statistically significant in univariate analysis entered the multivariate analysis by forward stepwise selection. A nomogram was constructed from the validated independent predictors. Model performance was evaluated by discrimination and calibration. Discrimination was measured using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). Calibration was assessed by visually inspecting plots of predicted versus actual probability. Statistical tests and figure plotting were performed using SPSS version 24.0 and R version 3.6.2. Tests were two-sided, and P < 0.05 was the threshold for statistical significance.
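The univariate logistic step and the AUC can be illustrated with a dependency-free Python sketch. The authors used SPSS 24.0 and R 3.6.2, so this is not their pipeline; the toy data are made up, and forward stepwise multivariate selection is not shown. The AUC is computed through its rank form, which equals the Mann-Whitney U statistic divided by the number of positive-negative pairs.

```python
import math
from typing import Sequence


def fit_univariate_logistic(x: Sequence[float], y: Sequence[int],
                            max_iter: int = 50, tol: float = 1e-10) -> tuple[float, float]:
    """Maximum-likelihood fit of logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Assumes the data are not perfectly separable (otherwise the MLE does not exist).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and Fisher information entries (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = information^-1 @ score, via the closed-form 2x2 inverse.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC in rank form: probability that a random positive case scores
    higher than a random negative case (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: x is one candidate predictor, y = 1 marks GSU.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_univariate_logistic(x, y)
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
print(f"odds ratio per unit of x: {math.exp(b1):.3f}")
print(f"AUC: {roc_auc(probs, y):.4f}")  # 13 of 16 pairs ordered correctly: 0.8125
```

Because a univariate logistic model with a positive slope is monotone in x, its AUC is identical to that of the raw predictor, which is why per-predictor AUC values can be reported alongside the multivariate model.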

Results
Statistically significant differences between the GSU and non-GSU subgroups were found for cT-stage, PI-RADS score, PSA and GPC (all p < 0.05). Patients in the GSU subgroup had a higher proportion of T2b-2c cancers (61.1%) and of PI-RADS scores of 4-5 (84.7%), a higher PSA level (16.7 ng/ml) and a higher GPC (80%). Other demographic details are shown in Table 1.
The concordance rate between initial biopsy pathology and final RP specimens was 40.4% (67/166), whilst the GSU rate was 43.4% (72/166). Favorable concordance rates were found in patients with biopsy GS 4+3 and 8 (30.4% (7/23) and 44.4% (8/18), respectively). The most pronounced upgrades were from GS 6 to 3+4 and from GS 8 to 9 (38.2%, 21/55), and the most pronounced downgrade was from GS 8 to 4+3 (38.9%, 7/18). One patient was diagnosed with GS 4 at biopsy, whereas no malignant foci were observed in the RP specimen; we included this patient in the non-GSU cohort as a GS downgrade. The most pronounced discrepancies between global GS and highest GS were within the biopsy GS 6, 3+4 and 9 groups, in which the numbers of patients changed by 6, 5 and 5, respectively. When global GS and highest GS were used separately, the only changes in patient numbers in RP GS were within the GS 8 and GS 9 groups (2 patients each). More details are given in Table 2.
With highest GS, there was a trend toward a higher concordance rate and a lower upgrade rate (42.8% and 43.4%, respectively). However, there was no statistically significant difference between global GS and highest GS in the respective subgroups. The Gleason sum change (ΔGS) is shown in Table 3.
In univariate analysis, PSA, GPC, cT-stage and PI-RADS score were predictors of GSU (all P < 0.05). In multivariate analysis, these variables were validated as independent predictors of GSU (all P < 0.05) (Table 4). The AUC of our multivariate prediction model was 0.714 (Fig. 1). A nomogram was constructed based on the independent predictors from the multivariate analysis (Fig. 2). AUC values of the GSU model predictors and point assignments of the GSU nomogram are detailed in Table 5. The internally validated calibration plot demonstrated that the predicted probabilities closely paralleled the observed rates (Fig. 3). Our nomogram might underestimate the risk of GSU at predicted probabilities of 52-68% and below 28%, and overestimate it at 29-51% and above 69%.
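The headline rates in this paragraph are simple proportions. The short Python check below reproduces them from the reported counts. The downgrade count is not stated in the text; it is derived here on the assumption that every patient falls into exactly one of concordance, upgrade or downgrade (with the patient who had no residual cancer at RP counted as a downgrade, as described).

```python
def rate(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in the Results."""
    return round(100.0 * count / total, 1)


TOTAL = 166
CONCORDANT = 67                              # biopsy GS equals RP GS
UPGRADED = 72                                # GSU
DOWNGRADED = TOTAL - CONCORDANT - UPGRADED   # derived, not reported: 27

print(rate(CONCORDANT, TOTAL))   # 40.4
print(rate(UPGRADED, TOTAL))     # 43.4
print(rate(DOWNGRADED, TOTAL))   # 16.3 (derived)
print(rate(7, 23), rate(8, 18))  # 30.4 44.4 (biopsy GS 4+3 and 8 concordance)
print(rate(21, 55), rate(7, 18)) # 38.2 38.9
```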
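A nomogram is a graphical rendering of the fitted multivariate logistic model. A common construction (to our understanding, the default of the `nomogram` function in the R `rms` package; the text does not say which tool produced Fig. 2) scales each predictor's contribution β·x so that the predictor with the widest range of contributions spans 0-100 points; the summed points map back to the linear predictor and then, through the logistic function, to a probability. The Python sketch below uses hypothetical coefficients, axis ranges and 0/1 codings, not the fitted values in Table 5.

```python
import math

# HYPOTHETICAL model for illustration only; not the published coefficients.
INTERCEPT = -3.0
PREDICTORS = {
    # name: (coefficient per unit, lowest axis value, highest axis value)
    "psa_ng_ml": (0.03, 0.0, 50.0),
    "gpc_percent": (0.015, 0.0, 100.0),
    "ct_t2b_2c": (0.8, 0.0, 1.0),     # assumed coding: 1 = cT2b-2c
    "pirads_4_5": (1.0, 0.0, 1.0),    # assumed coding: 1 = PI-RADS 4-5
}


def _low_end(coef: float, lo: float, hi: float) -> float:
    """Axis value giving the smallest contribution coef * value (0 points)."""
    return lo if coef >= 0 else hi


# The predictor with the widest range of coef * value spans 0-100 points.
MAX_RANGE = max(abs(b) * (hi - lo) for b, lo, hi in PREDICTORS.values())
POINTS_PER_LOGIT = 100.0 / MAX_RANGE


def points(name: str, value: float) -> float:
    b, lo, hi = PREDICTORS[name]
    return POINTS_PER_LOGIT * b * (value - _low_end(b, lo, hi))


def total_points(patient: dict[str, float]) -> float:
    # Expects a value for every predictor in PREDICTORS.
    return sum(points(name, value) for name, value in patient.items())


def probability_from_points(total: float) -> float:
    # Linear predictor when every predictor sits at its 0-point value.
    baseline = INTERCEPT + sum(b * _low_end(b, lo, hi) for b, lo, hi in PREDICTORS.values())
    lp = baseline + total / POINTS_PER_LOGIT
    return 1.0 / (1.0 + math.exp(-lp))


def probability_direct(patient: dict[str, float]) -> float:
    lp = INTERCEPT + sum(PREDICTORS[n][0] * v for n, v in patient.items())
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"psa_ng_ml": 16.7, "gpc_percent": 80.0, "ct_t2b_2c": 1.0, "pirads_4_5": 1.0}
print(round(total_points(patient), 1), round(probability_from_points(total_points(patient)), 3))
```

Reading a probability off the nomogram's total-points axis and evaluating the logistic formula directly give the same answer; the points scale is only a change of units.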

Discussion
Discrepancy in GS between initial biopsy pathology and RP specimens is mainly attributed to the following reasons: sampling error, operator-related bias in biopsy, variability in pathological assignment and non-uniform interpretation of pathology reports [20,22]. A systematic review including 14,839 patients reported a concordance rate of 63% and an overall upgrade rate of 30% [3]. Our study found a concordance rate of 42.6%, which was lower than this average, and a GSU rate of 43.4%, which was higher. These rates are comparable with several series but differ from others, which may be attributable to the reasons above. Most surgeons and oncologists prefer to use the highest GS in a pathology report for patient management [20]. However, the highest GS is sometimes present only in the cores with the smallest amount of tumor, so it may not be appropriate for every patient. In our study, we did not find any discrepancy in concordance rate or Gleason sum change between global GS and highest GS. The final changes in RP GS were only within GS 8 and 9, which suggests that risk was significantly upgraded in some specific scenarios. Further investigation is needed to verify the applicability of global GS and highest GS in different cases.
PSA is a widely used indicator for risk stratification and for predicting different outcomes in PCa. PSA might also play a role in predicting GS change [4][5][6][7][8][9][10][11][12]. In the current study, PSA was also predictive, and increased PSA was strongly correlated with GSU. Even though the odds ratio for PSA is only slightly larger than one, it applies to each unit increase in PSA, so its cumulative effect in patients with high PSA levels should draw urologists' attention to GSU.
GPC, the percentage of cancer in the core with the greatest cancer involvement, has been reported to correlate with upgrade [5][6][7][13]. In the current study, this pathological variable, commonly used by pathologists to determine TTE, was an independent predictor of GSU. A higher maximum percentage of cancer indicates a larger tumor volume, which might harbor undetected high-grade cancer. Even though a small maximum percentage of cancer provides pathologists with limited information, its impact on the assignment of the primary and secondary Gleason patterns appears to be subtle.
Clinical T-stage is an important variable for predicting prognosis and guiding treatment decisions, and several studies have found it predictive of Gleason upgrade [4][8][9][10][12]. It is acknowledged that advanced T stage is correlated with large tumor volume, which can make it difficult to sample small Gleason patterns. However, in the study of Chun et al., T2 disease was more likely than T3 disease to be associated with underestimation of the true GS, which is consistent with our study [10]. Most PCa arises in the peripheral zone (PZ). Extracapsular extension does not necessarily mean that the tumor is large, especially in the PZ, which is compressed by hyperplastic transition-zone tissue in older patients. The diagnostic accuracy of biopsy GS in predicting RP GS varies across prostate zones, which might offer another explanation and warrants further investigation [23].
PI-RADS score has been validated as having good performance for detecting and localizing PCa [24], and mpMRI has demonstrated value in differentiating indolent tumors from clinically significant prostate cancer [25]. We found that PI-RADS score, in a two-tier categorization, was an independent predictor of GSU, which is consistent with the findings of Song et al. [14]. Gleason scores were more likely to be upgraded among patients with PI-RADS 4-5 (62.8% compared with 37.2% for PI-RADS 2-3). High-grade cancers or patterns detected by mpMRI with PI-RADS 4-5 were missed by off-target biopsy cores, resulting in GSU.
Our study is not the first to construct a nomogram for predicting GSU from biopsy to RP. In the study of Chun et al. predicting sum upgrade (accuracy, 0.804), biopsy schemes were diverse (mostly 6-8 cores, 81.3%), yet the number of cores, which would affect accuracy, was not included as a variable [10]. Although the nomogram of Moussa et al. predicting sum upgrade incorporated many variables, including pathological parameters, some statistically insignificant variables rendered the model unstable (C-index, 0.68) [9]. Kulkarni et al. and Capitanio et al. built nomograms predicting upgrade for patients with GS 6, which yielded accuracies of 71% and 66.1%, respectively [11,15]. These nomograms are suited to low-risk patients and cannot represent all circumstances of upgrade. We found two studies based on Chinese populations. Wang et al. built a nomogram predicting sum upgrade that showed favorable statistical performance (C-index, 0.795) [12]. They included the primary and secondary GS separately, which were strongly associated with upgrade and improved model performance. Wang et al. also found that Chun's nomogram showed poor concordance between predicted and observed probabilities when validated in a Chinese population. He et al. built not only a nomogram predicting sum upgrade but also another predicting upgrade from GS ≤6 to GS ≥7 [16]. The two nomograms used different variables and both showed good performance (AUC, 0.753 and 0.727, respectively). Our model predicting GSU included both pathological and imaging variables.
Including the independent predictors gave our model favorable statistical performance (AUC, 0.714), and the predicted probabilities closely approximated the actual risk in the calibration plot. Underestimation of risk occurred at predicted probabilities of 52-68% and below 28%, whereas overestimation occurred at 29-51% and above 69%. This pattern suggests that, at predicted probabilities above 50%, the model is more likely to overestimate than to underestimate the probability of GSU, whereas below 50% the discrepancy between the predicted and true risk might be negligible.
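The cumulative effect follows from the log-linear form of the logistic model: if the per-unit odds ratio is OR = exp(β), a difference of Δ units multiplies the odds by OR^Δ = exp(β·Δ). In the Python sketch below, the per-unit value 1.03 is purely hypothetical (the fitted PSA odds ratio is reported in Table 4 and is not reproduced here), and PSA is assumed to enter the model untransformed in ng/ml.

```python
import math


def cumulative_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a `delta`-unit increase in a continuous predictor,
    given the per-unit odds ratio from a logistic model: OR ** delta."""
    return math.exp(math.log(or_per_unit) * delta)


# Hypothetical per-unit OR of 1.03 for PSA (ng/ml), compared across a 20 ng/ml gap.
print(round(cumulative_odds_ratio(1.03, 20), 2))  # 1.81
```

A per-unit odds ratio that looks negligible can therefore correspond to roughly an 80% increase in the odds of upgrade between two patients whose PSA differs by 20 ng/ml, under these assumed values.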
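The over- and underestimation statements above compare, within ranges of predicted probability, the mean predicted risk with the observed rate of GSU. A simplified binned version of that comparison is sketched below in Python. The bin edges and toy data are made up, and the authors' internal-validation procedure (for example, any resampling-based correction) is not reproduced; their calibration was assessed visually.

```python
from typing import Sequence


def calibration_table(pred: Sequence[float], obs: Sequence[int],
                      edges: Sequence[float]) -> list[tuple]:
    """Bin patients by predicted probability and compare mean predicted risk
    with the observed event rate per bin. Observed > predicted means the model
    underestimates risk in that bin; observed < predicted means it overestimates."""
    rows = []
    last = len(edges) - 2
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Half-open bins [lo, hi), except that the last bin also includes hi.
        idx = [i for i, p in enumerate(pred) if lo <= p < hi or (k == last and p == hi)]
        if not idx:
            continue
        mean_pred = sum(pred[i] for i in idx) / len(idx)
        obs_rate = sum(obs[i] for i in idx) / len(idx)
        if obs_rate > mean_pred:
            verdict = "underestimates"
        elif obs_rate < mean_pred:
            verdict = "overestimates"
        else:
            verdict = "matches"
        rows.append((lo, hi, len(idx), mean_pred, obs_rate, verdict))
    return rows


# Made-up predictions and outcomes (1 = GSU) for six patients.
pred = [0.1, 0.2, 0.2, 0.7, 0.8, 0.9]
obs = [1, 0, 0, 0, 1, 1]
for lo, hi, n, mp, orate, verdict in calibration_table(pred, obs, [0.0, 0.5, 1.0]):
    print(f"{lo:.1f}-{hi:.1f}: n={n}, predicted={mp:.2f}, observed={orate:.2f}, {verdict}")
```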
Based on the present study, several clinical recommendations might be considered.
Patients who are reevaluated as having a high probability of GSU during active surveillance could start curative therapy to avoid delayed treatment. Conversely, patients with a low probability of GSU who are unwilling or unable to receive interventions may be more inclined to undergo watchful waiting or active surveillance. Moreover, patients at high risk of upgrade might benefit from resection of the neurovascular bundle or lymph node dissection. Similarly, adjuvant or neoadjuvant hormonal therapy as ancillary treatment might be considered in high-risk patients receiving radiation therapy. These recommendations might give urologists more confidence in clinical decision-making and provide a more precise and comprehensive risk assessment and more personalized, optimal treatment options for PCa patients.
However, the present study has limitations. First, information from the database included potential inaccuracies and inevitable bias due to its retrospective nature. Second, our sample size was limited because of our strict indications for RP; thus, further prospective studies with large, well-selected cohorts are needed for validation. Third, only patients who underwent RP were included in the cohort, which might not represent real-world practice. Fourth, external validation of our nomogram is still necessary, even though it performed well statistically. Finally, the accuracy of our model could potentially be improved by integrating additional variables such as serum biomarkers and susceptibility genes.

Conclusions
Pathological concordance of GS between biopsy cores and RP specimens in our center was comparable with other series. Using either global GS or highest GS for biopsy did not appear to affect the concordance rate. Combining basic clinical variables (PSA and T-stage) with an imaging variable (PI-RADS) and a pathological variable (GPC) increases the predictive accuracy of the GSU nomogram and improves its performance in predicting actual probabilities. From a clinical standpoint, our new nomogram may provide urologists with a tool for assessing risk and making treatment decisions for PCa patients.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
Not applicable.
Authors' contributions
XW was a major contributor to the conception and writing of the manuscript. XW and YZ analyzed and interpreted the patient data. ZJ and FZ drafted and supervised the work, and PY and YT substantively revised it. All authors read and approved the final manuscript and have agreed both to be personally accountable for their own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even parts in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.

Tables
Table 1 Demographics