Data synthesis: The initial search strategy identified 405 potentially relevant publications: 116 from PubMed/Medline and 289 from Embase. After duplicates were removed, 301 publications remained; of these, 268 were excluded because the title or abstract did not meet the eligibility criteria. Eight systematic reviews, four studies on metastatic breast cancer, and eleven studies not involving neoadjuvant chemotherapy were further excluded. Eventually, nine articles[11–19] (eight studies) were identified as eligible for our analysis, including eight independent studies for OS, six studies for RFS, and two for DFS[14]. The flow chart of the literature search and study selection is shown in Fig. 1.
Study quality: The eligible studies were conducted in Australia, Saudi Arabia, Italy, Peru, the USA, and Canada. The number of participants ranged from 58 to 1101 per study, for a total of 4521 patients across the ten studies. Detailed baseline characteristics of each eligible study are listed in Table 1. Table 2 shows the HR results from each eligible study.
Data synthesis: The combined HR for OS was 1.52 (95% CI: 1.29–1.78; P < 0.001) under a fixed-effects model (Fig. 2). No statistically significant heterogeneity was found (P = 0.114; I² = 39.8%), and the pooled effect was statistically significant (Z = 4.31; P < 0.001).
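For readers unfamiliar with fixed-effects pooling, the calculation can be sketched as inverse-variance weighting on the log-HR scale, with Cochran's Q and I² computed from the same weights. This is a minimal illustration only: the function names are hypothetical, the software actually used for the analysis is not stated here, and the input values below are made-up placeholders, not the data of the included studies.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Recover the standard error of log(HR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def fixed_effect_pool(hrs, ses):
    """Inverse-variance fixed-effect pooling on the log-HR scale."""
    log_hrs = [math.log(hr) for hr in hrs]
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q and I^2 quantify between-study heterogeneity.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_hrs))
    df = len(hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "hr": math.exp(pooled_log),
        "ci": (math.exp(pooled_log - 1.96 * pooled_se),
               math.exp(pooled_log + 1.96 * pooled_se)),
        "z": pooled_log / pooled_se,
        "q": q,
        "i2": i2,
    }

# Hypothetical placeholder inputs (NOT the included studies' data).
example_hrs = [1.40, 1.80, 1.25, 1.60]
example_cis = [(1.00, 1.96), (1.20, 2.70), (0.90, 1.74), (1.10, 2.33)]
example_ses = [se_from_ci(lo, hi) for lo, hi in example_cis]
result = fixed_effect_pool(example_hrs, example_ses)
print(result)
```

The pooled Z statistic is the pooled log-HR divided by its standard error. The study-level HRs and confidence intervals that would serve as inputs are those reported in Table 2.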
The pooled HR for RFS/DFS was 1.47 (95% CI: 1.27–1.71; I² = 61.9%; Fig. 3a) under a random-effects model, with significant heterogeneity. When the study by T.L. Sutton et al. [12], which contributed substantial heterogeneity, was excluded, heterogeneity was low (P = 0.142; I² = 37.6%) and the pooled HR was 1.41 (95% CI: 1.22–1.64), which remained statistically significant (Z = 3.77; P < 0.001; Fig. 3b).
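A random-effects model widens each study's variance by a between-study component, tau². The sketch below uses the DerSimonian-Laird estimator, which is a common default but is an assumption here, since the estimator used in this analysis is not specified. The function name is hypothetical and the inputs are placeholders rather than study data.

```python
import math

def random_effects_pool(log_hrs, ses):
    """DerSimonian-Laird random-effects pooling on the log-HR scale."""
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_hrs)) / sum_w
    # Cochran's Q around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Method-of-moments estimate of between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(w * y for w, y in zip(re_weights, log_hrs)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return {"log_hr": pooled, "se": pooled_se, "tau2": tau2,
            "hr": math.exp(pooled)}

# Hypothetical placeholder inputs (NOT the included studies' data).
demo = random_effects_pool([0.0, 1.0], [0.1, 0.1])
print(demo)
```

When tau² is estimated as zero, the random-effects weights reduce to the fixed-effects weights and the two models give the same pooled estimate.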
Sensitivity analysis was used to assess the source of heterogeneity. As shown in Fig. 4, no individual study had a significant influence on the pooled results for OS or RFS/DFS, demonstrating the reliability and stability of the results of our meta-analysis.
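A sensitivity analysis of this kind is commonly run as a leave-one-out procedure: each study is omitted in turn and the pooled HR is recomputed from the rest. The sketch below assumes that approach and, for brevity, fixed-effect pooling. The helper names and study labels are hypothetical, and the inputs are placeholders, not study data.

```python
import math

def pool_fixed(log_hrs, ses):
    """Inverse-variance pooled log-HR (fixed-effect)."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)

def leave_one_out(labels, hrs, ses):
    """Return (omitted label, pooled HR without that study) for each study."""
    log_hrs = [math.log(hr) for hr in hrs]
    results = []
    for i, label in enumerate(labels):
        rest_y = log_hrs[:i] + log_hrs[i + 1:]
        rest_se = ses[:i] + ses[i + 1:]
        results.append((label, math.exp(pool_fixed(rest_y, rest_se))))
    return results

# Hypothetical placeholder inputs (NOT the included studies' data).
loo = leave_one_out(["A", "B", "C"], [1.0, 2.0, 4.0], [0.2, 0.2, 0.2])
for label, hr in loo:
    print(f"omitting {label}: pooled HR = {hr:.3f}")
```

If the pooled HR moves only slightly whichever study is omitted, no single study dominates the result.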
An asymmetrical funnel plot (Fig. 5a) and Egger's test (P = 0.002) indicated publication bias for OS. Statistical tests also indicated publication bias for RFS/DFS (P < 0.05 for Egger's test and P = 0.003 for Begg's test; Fig. 5b). The trim-and-fill method was therefore used to adjust the funnel plots[20, 21]. Three missing studies were added to the analysis of RFS/DFS and four to the analysis of OS (Fig. 6). The recalculated results remained significant for RFS/DFS (HR = 1.33; 95% CI: 1.04–1.72; random-effects model; P < 0.01) and OS (HR = 1.38; 95% CI: 1.07–1.78; random-effects model; P < 0.01), indicating that the conclusions of our meta-analysis were stable and reliable.
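Egger's test can be sketched as an ordinary least-squares regression of the standardized effect (log-HR divided by its standard error) on precision (1 divided by the standard error). An intercept far from zero suggests funnel-plot asymmetry. The code below is a minimal illustration with a hypothetical function name and placeholder inputs, not study data. It returns the t statistic for the intercept, which would then be compared against a t distribution with n − 2 degrees of freedom. The trim-and-fill adjustment is not reproduced here.

```python
import math

def egger_test(log_hrs, ses):
    """Egger's regression: standardized effect on precision, via OLS."""
    x = [1.0 / se for se in ses]                    # precision
    y = [eff / se for eff, se in zip(log_hrs, ses)]  # standardized effect
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return {"intercept": intercept, "slope": slope,
            "t": intercept / se_intercept, "df": n - 2}

# Hypothetical placeholder inputs (NOT the included studies' data).
demo_ses = [1.0, 0.5, 1.0 / 3.0, 0.25]
demo_effects = [5.1 / 1.0, 7.9 / 2.0, 11.1 / 3.0, 13.9 / 4.0]
egger = egger_test(demo_effects, demo_ses)
print(egger)
```

At least three studies with differing standard errors are needed for the regression to be defined.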