3.1 Characteristics of patients and NAT response from SJTU-BCDB
A total of 2,409 breast cancer patients from SJTU-BCDB who met the eligibility criteria were selected, of whom 1,686 were assigned to the training set and 723 to the internal validation set. There were no considerable differences in clinical characteristics, treatment regimens, or NAT response between the two cohorts. The clinicopathological characteristics of the whole population were summarized in Table 1. The median age was 50 years (range, 21 to 83). Invasive ductal carcinoma was diagnosed in 2,311 (95.9%) patients. A total of 556 (23.1%) patients were classified as cN0, and cN1, cN2, and cN3 disease were found in 1,314 (54.5%), 238 (9.9%), and 301 (12.5%) patients, respectively. The proportions of the TN, HR-HER2+, and HR+HER2+ molecular subtypes were 25.8%, 20.5%, and 53.7%, respectively. Most patients (77.4%) had Ki67 ≥30%. Approximately one-half of the included patients (54.7%) received a NAT regimen containing taxanes, and 41.2% received NAT combining anthracyclines and taxanes. Targeted therapy was given to 95.9% of the patients with HER2+ breast cancer. After NAT, 85.2% (2053/2409) of patients underwent mastectomy, and the remaining 14.8% (356/2409) were treated with BCS.
The total breast pCR rate after NAT was 39.5% (951/2409). A nodal pCR was observed in 1533 patients (63.6%) (P < 0.001). Patients with HR-HER2+ disease had a higher breast pCR rate (61.54%) and a higher nodal pCR rate (78.14%) than those with HR+HER2+ and TN disease (P = 0.001) (Fig.2a). In addition, patients who had a better clinical primary tumor response (complete response [CR] or partial response [PR]) were more likely to achieve a nodal pCR (Fig.2b). Nodal pCR rates ranged from 82.19% among patients whose primary tumor showed a CR to 24.66% among those whose primary tumor showed progressive disease (PD) (P < 0.001) (Fig.2b).
3.2 Correlation between ypN+ and clinicopathological characteristics in the training cohort
In the training cohort, univariate logistic regression analyses were performed to identify preoperative factors significantly associated with ypN+ (Table 2). These analyses identified clinical tumor (cT) stage (P = 0.006), clinical nodal (cN) stage (P < 0.001), molecular subtype (P < 0.001), Ki67 expression (P < 0.001), tumor grade (P < 0.001), and clinical primary tumor response (P < 0.001) as factors significantly associated with pathological nodal response. These factors were then entered into a multivariable logistic regression model to identify independent predictors of ypN+.
Multivariate analysis (Table 3) indicated that patients who had cN1 (OR: 5.031, 95% CI: 3.579-7.073, P < 0.001), cN2 (OR: 6.486, 95% CI: 4.102-10.256, P < 0.001), and cN3 (OR: 8.679, 95% CI: 5.636-13.364, P < 0.001) disease were less likely to achieve a nodal pCR than those who had cN0 disease. Compared with patients with HR-HER2+ disease, those with TN (OR: 2.618, 95% CI: 1.826-3.752, P < 0.001) and HR+HER2+ (OR: 3.018, 95% CI: 2.175-4.187, P < 0.001) disease were more likely to have ypN+. Patients with lower Ki67 expression (OR: 1.506, 95% CI: 1.148-1.976, P = 0.003) were less likely to achieve a nodal pCR. Compared with patients whose tumor grade was unknown, those with grade I/II (OR: 2.731, 95% CI: 2.104-3.544, P < 0.001) and grade III tumors (OR: 3.051, 95% CI: 2.233-4.168, P < 0.001) were less likely to achieve a nodal pCR. Patients who had a partial response (PR) (OR: 1.849, 95% CI: 1.045-3.271, P = 0.035), stable disease (SD) (OR: 3.212, 95% CI: 1.758-5.870, P < 0.001), or progressive disease (PD) (OR: 4.132, 95% CI: 2.046-8.342, P < 0.001) after NAT had a higher rate of ypN+ than those who achieved ycT0.
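The odds ratios above are reported on a multiplicative scale, whereas the regression itself works on the log-odds scale. As a minimal sketch, assuming the 95% CIs are Wald intervals (symmetric on the log scale; the text does not state how they were computed), the coefficient and its standard error can be recovered from a reported interval. The function names are illustrative, and the only input taken from the text is the cN1 estimate.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def coef_and_se_from_ci(ci_low, ci_high, z=Z_95):
    """Recover the log-odds coefficient and its standard error from a Wald CI."""
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    beta = (log_low + log_high) / 2.0      # midpoint of the interval on the log scale
    se = (log_high - log_low) / (2.0 * z)  # half-width on the log scale divided by z
    return beta, se


def wald_or_ci(beta, se, z=Z_95):
    """Return the odds ratio and its Wald CI from a coefficient and standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# cN1 versus cN0 as quoted in the text: OR 5.031, 95% CI 3.579-7.073
beta, se = coef_and_se_from_ci(3.579, 7.073)
odds_ratio, low, high = wald_or_ci(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={odds_ratio:.3f} CI={low:.3f}-{high:.3f}")
```

Under this assumption the reconstructed odds ratio agrees with the reported 5.031 to within rounding, which is consistent with (but does not prove) a Wald interval.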
3.3 Construction and internal validation of the nomogram model
A multivariable logistic regression nomogram was developed using cT stage, cN stage, molecular subtype, Ki67 status, tumor grade, and clinical primary tumor response after NAT (Fig.3). The value of each variable was assigned a score on the “points” line. The scores for all variables were then summed and located on the “total points” line, from which a line could be drawn downward to read off the probability of ypN+. In the ROC analysis, the nomogram showed good discrimination, with an AUC of 0.782 (95% CI: 0.759-0.805) (Fig.4a). The calibration curve of the training cohort showed close agreement between the predicted and actual values, which indicated that the nomogram predicted ypN+ well (Fig.5a).
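The way a nomogram turns regression coefficients into points can be sketched as follows. This is an illustration of the usual convention (the largest coefficient is scaled to 100 points), not the authors' fitted model: it uses only the cN stage and molecular subtype odds ratios quoted in Section 3.2, omits the other four variables, and relies on a hypothetical intercept, because the fitted intercept is not reported in the text.

```python
import math

# Odds ratios quoted in the text (Table 3); each reference level has OR = 1.
ODDS_RATIOS = {
    "cN": {"cN0": 1.0, "cN1": 5.031, "cN2": 6.486, "cN3": 8.679},
    "subtype": {"HR-HER2+": 1.0, "TN": 2.618, "HR+HER2+": 3.018},
}

# HYPOTHETICAL placeholder: the fitted intercept is not given in the text.
HYPOTHETICAL_INTERCEPT = -2.0


def points_table(odds_ratios):
    """Scale log-odds coefficients so that the largest one is worth 100 points.

    Assumes every reference level is the lowest-risk level (all coefficients >= 0).
    """
    betas = {
        var: {level: math.log(o) for level, o in levels.items()}
        for var, levels in odds_ratios.items()
    }
    max_beta = max(b for levels in betas.values() for b in levels.values())
    points = {
        var: {level: 100.0 * b / max_beta for level, b in levels.items()}
        for var, levels in betas.items()
    }
    return points, max_beta


def predicted_probability(total_points, intercept, max_beta):
    """Map total points back to the linear predictor, then to a probability."""
    linear_predictor = intercept + total_points * max_beta / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


points, max_beta = points_table(ODDS_RATIOS)
total = points["cN"]["cN1"] + points["subtype"]["TN"]
prob = predicted_probability(total, HYPOTHETICAL_INTERCEPT, max_beta)
print(f"total points={total:.1f}, illustrative probability={prob:.3f}")
```

Because the intercept is a placeholder and most predictors are omitted, the printed probability has no clinical meaning; the sketch only shows how the “points” and “total points” lines relate to the underlying logistic model.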
When the nomogram was applied to the internal validation cohort, the AUC was 0.753 (95% CI: 0.717-0.789), indicating that the nomogram retained good discriminatory power for ypN+ (Fig.4b). The calibration curve for the validation cohort also indicated that the nomogram was well calibrated (Fig.5b). Decision curve analysis (DCA) showed that the nomogram provided a net benefit in both the training and internal validation cohorts, which indicated that it had good clinical predictive power and could serve as an effective tool for predicting ypN+ (Fig.6a and Fig.6b).
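The net benefit plotted in a DCA curve can be computed directly from predicted probabilities. The sketch below uses the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability, and compares the model with a "treat all" strategy. The four-patient data set is synthetic and exists only to make the arithmetic checkable; it is not drawn from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return true_pos / n - weight * false_pos / n


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient as positive, regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Synthetic example (1 = ypN+, 0 = nodal pCR); not study data.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f} model={model_nb:.4f} treat-all={all_nb:.4f}")
```

A model is considered useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).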
3.4 External validation of the nomogram model
A total of 108 patients from IPMCH were enrolled for the external validation of the nomogram. The selection process was shown in Fig.7. A total of 72 (66.7%) patients achieved nodal pCR, and 43 (39.8%) patients achieved breast pCR. The demographic and pathological features of the patients were summarized in Table 4. When the nomogram was applied to the external validation set, the AUC was 0.783 (95% CI: 0.692-0.873) (Fig.8), which showed that the nomogram had good discriminatory power in the external validation data set. The calibration and DCA curves also indicated that the nomogram had good clinical predictive power (Fig.9 and Fig.10).
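For readers less familiar with the AUC, it equals the probability that a randomly chosen ypN+ patient receives a higher predicted risk than a randomly chosen patient with nodal pCR, with ties counted as one half. The sketch below computes it by comparing all such pairs. The data are synthetic, and the function name is illustrative; confidence intervals such as those reported above require an additional method that the text does not specify and that is not reproduced here.

```python
def auc_pairwise(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    score = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                score += 1.0
            elif p_pos == p_neg:
                score += 0.5
    return score / (len(positives) * len(negatives))


# Synthetic example (1 = ypN+, 0 = nodal pCR); not study data.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
auc = auc_pairwise(y_true, y_prob)
print(f"AUC = {auc:.2f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of this size; rank-based implementations give the same value more efficiently.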
3.5 Prediction accuracy of different cutoff points
Table 5 showed that the accuracy of the model in predicting ypN+ varied with the risk cutoff point. As the cutoff value increased, the sensitivity and negative predictive value decreased, whereas the specificity and positive predictive value increased. For predicted probabilities of ypN+ ≥50% and ≥55%, the false positive rates were 13.2% and 8.8%, respectively. For predicted probabilities of ypN+ ≤20% and ≤25%, the false negative rates were 9.2% and 15.5%, respectively. These results indicated that our nomogram could predict the probability of ypN+ by combining information from routinely available clinicopathological characteristics.
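The trade-off described above can be reproduced with a small helper that computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) at a chosen cutoff. This is a minimal sketch on synthetic data, not the Table 5 values; the function name and the convention that a predicted probability at or above the cutoff counts as a positive prediction are assumptions.

```python
def cutoff_metrics(y_true, y_prob, cutoff):
    """Confusion-matrix metrics when predicted risk >= cutoff is called ypN+."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= cutoff and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= cutoff and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < cutoff and y == 1)
    tn = sum(1 for y, p in zip(y_true, y_prob) if p < cutoff and y == 0)

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a group is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Synthetic example (1 = ypN+, 0 = nodal pCR); not study data.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]

low_cutoff = cutoff_metrics(y_true, y_prob, 0.25)
high_cutoff = cutoff_metrics(y_true, y_prob, 0.50)
print("cutoff 0.25:", low_cutoff)
print("cutoff 0.50:", high_cutoff)
```

In this toy example, raising the cutoff from 0.25 to 0.50 lowers the sensitivity and NPV, matching the direction of the trend reported for the nomogram; the specificity and PPV do not necessarily move at every step in so small a sample.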