A Nomogram Individually Predicts the Overall Survival of Patients with Breast Cancer after Surgery: a Retrospective Study in the SEER Database and China

Background: Patients with breast cancer have a poor prognosis. We aimed to construct and validate a more elaborate nomogram for predicting overall survival in patients with breast cancer.
Methods: A total of 68363 breast cancer patients who underwent surgery between 2011 and 2015 were recruited from the Surveillance, Epidemiology, and End Results (SEER) database. After excluding patients with missing clinical information, 60445 eligible breast cancer patients were randomly divided into a training cohort (n=42327) and an internal validation cohort (n=18118) in a ratio of 7:3. The endpoint of this study was overall survival (OS). Multivariate Cox proportional hazards regression was performed to identify independent risk factors for OS in the training cohort, and the nomogram was then constructed. The predictive performance of the nomogram was evaluated by Harrell's concordance index (C-index), the area under the time-dependent receiver operating characteristic (ROC) curve (AUC), calibration curves, decision curve analysis (DCA) and clinical impact curves. Moreover, the nomogram was verified in the internal validation cohort and an external validation cohort.
Results: Age, gender, grade, 7th AJCC stage, ER status, PR status, Her-2 status and breast subtype were found to be independent risk factors for OS (P<0.05). The nomogram integrating these eight factors showed good discrimination in the training cohort (C-index, 0.724 (95% CI, 0.716-0.732)), which was confirmed in the internal validation cohort (C-index, 0.717 (95% CI, 0.705-0.729)) and the external validation cohort (C-index, 0.793 (95% CI, 0.724-0.862)). Calibration curves for the probability of 1-, 3- and 5-year OS demonstrated good concordance between nomogram predictions and observed outcomes in both the training and validation cohorts. In addition, the DCA and clinical impact curves indicated the clinical usefulness of the nomogram.
Conclusions: We developed a nomogram that integrates clinicopathological variables and predicts the 1-, 3- and 5-year OS of breast cancer patients after surgery. Validation showed good discrimination, indicating that the nomogram is suitable for clinical application. The nomogram can therefore help clinicians formulate suitable treatment strategies for individual patients.

Background
Breast cancer is one of the most common malignancies worldwide and the second leading cause of cancer death in female patients [1,2]. The incidence of breast cancer has increased dramatically in recent decades, and its exact pathogenesis remains unclear. Despite continuous progress in treatment modalities in recent years, the prognosis of breast cancer remains poor.
The American Joint Committee on Cancer (AJCC) TNM staging system is globally recognized and widely used to predict disease progression and to design effective therapeutic strategies for breast cancer patients [3,4]. However, prognosis is also influenced by a variety of other clinicopathological characteristics, such as age, gender, histological differentiation and breast subtype, which may also affect patient survival [5,6]. If the prognosis of breast cancer patients after resection could be accurately predicted, comprehensive and scientific treatment could be given to high-risk patients in a timely manner to improve survival and reduce mortality.
The nomogram is considered a useful and reliable clinical tool that helps clinicians and patients estimate overall survival probability and make personalized decisions by incorporating key clinical indicators to predict the survival outcome of individual cancer patients [7,8]. Its good accuracy and simplicity have made the nomogram a new standard for guiding the treatment of cancer patients [9].
In this study, we aimed to build a more elaborate nomogram to predict the 1-, 3- and 5-year overall survival of breast cancer patients in a relatively large cohort, based on the Surveillance, Epidemiology, and End Results (SEER) database.

Study population selection and design
We acquired the study population from the SEER (Version 8. were(1) patients diagnosed according to exfoliative cytology; (2) patients with unknown 7th AJCC stage, ER status, PR status, Her2 status, breast subtype, survival time or follow-up vital status. Ultimately, the data of 60445 patients from the SEER database were enrolled and analysed in this study. The eligible breast cancer patients were then randomly split in a ratio of 7:3 into two groups, the training cohort (n=42327) and the internal validation cohort (n=18118). In addition, a Chinese cohort (n=332) from Taizhou Hospital of Zhejiang Province, affiliated to Wenzhou Medical University, was used for external validation. The study procedure is shown in Figure 1.
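The 7:3 random split described above can be sketched as follows. This is a minimal illustration in Python, not the authors' procedure: the function name `split_cohort`, the seed and the use of integer patient IDs are assumptions, and the source does not state which tool performed the split. An exact 70% cut of 60445 patients gives 42311, slightly fewer than the 42327 reported, so the sketch illustrates the idea rather than reproducing the cohorts.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Shuffle patient IDs reproducibly and split them into two disjoint lists.

    The seed and the 0.7 default are illustrative assumptions.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # truncate to a whole number of patients
    return ids[:n_train], ids[n_train:]


# Hypothetical IDs 0..60444 stand in for the eligible SEER patients.
train_ids, valid_ids = split_cohort(range(60445))
print(len(train_ids), len(valid_ids))
```

Fixing the seed makes the partition reproducible, which matters when the same training cohort must be reused for model fitting and for later validation.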
We extracted demographic and clinicopathological characteristics from the SEER database, including age at diagnosis, gender, year of diagnosis, tumor site, behavior recode, grade, histology diagnostic confirmation, 7th AJCC stage, ER (estrogen receptor) status, PR (progesterone receptor) status, Her2 (human epidermal growth factor receptor-2) status, breast subtype, survival months and follow-up vital status. Follow-up of the Chinese cohort ended on April 29, 2020 or on the date of patient death, whichever came first. Overall survival (OS) was defined as the time from diagnosis to death from any cause or to the last follow-up.
The analysis of data from SEER is not subject to medical ethics review and does not require informed consent. The external validation cohort was approved by the hospital ethics committee, but because it was a retrospective study, informed consent was not required (project number: K20210809). All procedures conducted in studies involving human participants met the 1964 Declaration of Helsinki and its subsequent amendments or similar ethical standards.

Statistical Analyses
Categorical variables were described as frequencies and proportions. Continuous variables were described as mean (standard deviation, SD) or median (interquartile range, IQR). To compare baseline characteristics between the training and validation cohorts, the chi-square test or Fisher's exact test was used for categorical variables, Student's t-test for normally distributed variables, and the Mann-Whitney U test for non-normally distributed variables. Univariate and multivariate Cox proportional hazards regression models were used to identify independent risk factors for OS in the training cohort (P<0.05). Survival differences between groups were compared using Kaplan-Meier curves. The independent risk factors were used to construct the nomogram for breast cancer patients. The nomogram was validated internally (training cohort) and externally (internal validation cohort and external validation cohort). Calibration curves were used to compare predicted and actual survival outcomes [9]. Harrell's concordance index (C-index) and the area under the time-dependent receiver operating characteristic (ROC) curve (AUC) were used to assess the discrimination of the nomogram [10]. Decision curve analysis (DCA) and clinical impact curves were used to evaluate the clinical usefulness and benefits of the predictive model [11]. SPSS 22.0 and R software (version 4.0.3) were used for all statistical analyses. A two-tailed P-value of less than 0.05 was considered statistically significant.
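To make the discrimination measure concrete, Harrell's C-index can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the routine used in the study (the authors worked in SPSS and R): the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied survival times are simply treated as not comparable.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i died (events[i] == 1) strictly
    before patient j's observed time. It is concordant when patient i also has
    the higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): survival months, death indicator, predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-indices of 0.724, 0.717 and 0.793 should be read.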

Study population characteristics
The demographic and clinicopathological characteristics of the breast cancer patients are listed in

Independent prognostic factors of OS
In the training cohort, the results of the univariate and multivariate analyses are shown in Table 2. The multivariate analysis revealed that age, gender, grade, 7th AJCC stage, ER status, PR status, Her-2 status and breast subtype were independent risk factors for OS (P<0.05). These results are presented more intuitively as a forest plot (Figure 2). Figure 3 displays the Kaplan-Meier survival curves in the training cohort. Patients who were older than 60 years, ER negative, PR negative, Her-2 negative or triple negative had shorter survival (P<0.05). The grade IV and 7th AJCC stage IV groups had the worst prognosis (P<0.0001).

Nomogram Construction
Based on the results of the multivariate Cox regression analysis, a nomogram for predicting OS was constructed in the training cohort (Figure 4). The nomogram indicated that the 7th AJCC stage contributed most to OS.
To use the nomogram, each variable value is assigned a score: a vertical line is drawn upward from the variable value to the points axis to determine its number of points. These points are added, the sum is located on the total points axis, and a straight line is drawn downward to the survival axes to determine the probability of OS. For example, for a patient who was less than 60 years old, with grade , 7th AJCC stage , ER status negative, PR status negative, Her-2 status positive and luminal B breast subtype, the total points were 13.2, and the probabilities of 1-year, 3-year and 5-year OS were more than 95%, 87% and 75%, respectively, according to the nomogram.
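The point-and-read procedure is a graphical form of the underlying Cox model: each variable contributes to a linear predictor, and the predicted survival at time t is the baseline survival raised to the power exp(linear predictor). The Python sketch below illustrates that relationship only. All coefficients and baseline survival values are HYPOTHETICAL placeholders chosen for illustration; they are not the fitted values behind Figure 4, and the variable names are assumptions.

```python
import math

# HYPOTHETICAL log hazard ratios, NOT the values fitted in the paper.
HYPOTHETICAL_COEFS = {
    "age_over_60": 0.8,
    "er_negative": 0.4,
    "stage_iii": 1.2,
}

# HYPOTHETICAL baseline survival S0(t) for a patient with every variable at
# its reference level, at 1, 3 and 5 years.
HYPOTHETICAL_BASELINE = {1: 0.99, 3: 0.96, 5: 0.93}


def linear_predictor(patient):
    """Sum the coefficients of the risk factors present in this patient."""
    return sum(coef for name, coef in HYPOTHETICAL_COEFS.items() if patient.get(name))


def predicted_survival(patient, year):
    """Cox model prediction: S(t | x) = S0(t) ** exp(linear predictor)."""
    return HYPOTHETICAL_BASELINE[year] ** math.exp(linear_predictor(patient))


example_patient = {"age_over_60": True, "er_negative": True, "stage_iii": False}
for horizon in (1, 3, 5):
    print(horizon, round(predicted_survival(example_patient, horizon), 3))
```

On a printed nomogram the same calculation is done by hand: the points axis rescales each coefficient contribution, and the survival axes tabulate the mapping from total points to probability.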
The calibration curves for predicting 1-, 3- and 5-year OS showed good concordance between the nomogram-predicted probabilities and actual observations in the training, internal validation and external validation cohorts (Figure 6). Finally, DCA was carried out to compare the clinical net benefit of the nomogram with that of the 7th AJCC stage and grade (Figure 7A, 7B, 7C). The nomogram showed the largest net benefit for predicting OS probabilities in the training, internal validation and external validation cohorts (Figure 7D, 7E, 7F).
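The net benefit plotted in a decision curve can be illustrated with a short Python sketch. It uses the standard binary-outcome definition, net benefit = TP/n - (FP/n) x pt/(1 - pt) at threshold probability pt, and therefore assumes that death by a fixed time point is known for every patient. Decision curves for censored survival data use Kaplan-Meier-based estimates instead, so this is a simplification rather than the authors' computation; the function name and toy data are hypothetical.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(died)
    true_pos = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and d == 1)
    false_pos = sum(1 for p, d in zip(predicted_risk, died) if p >= threshold and d == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Toy data (hypothetical): predicted risk of death and observed outcome.
toy_pred = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_died = [1, 0, 1, 0, 0]

nb_model = net_benefit(toy_pred, toy_died, threshold=0.25)
# "Treat all" strategy: every patient is classified as high risk.
nb_treat_all = net_benefit([1.0] * len(toy_died), toy_died, threshold=0.25)
print(nb_model, nb_treat_all)
```

A decision curve repeats this calculation over a range of thresholds; a model is considered clinically useful where its curve lies above both the treat-all and the treat-none (net benefit 0) strategies.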

Discussion
In this study, we constructed a nomogram to predict the OS of breast cancer patients after surgery, which integrated age at diagnosis, gender, grade, 7th AJCC stage, ER status, PR status, Her-2 status and breast subtype. Furthermore, the nomogram was validated using the internal and external validation cohorts. The nomogram was more predictive (AUC: 0.735) than the 7th AJCC stage (AUC: 0.634). In addition, the calibration curves showed that the predicted 1-, 3- and 5-year OS agreed closely with the actual observations in the training, internal validation and external validation cohorts. Similarly, the DCA curves showed that the nomogram had the largest net benefit and promising clinical applicability.
According to the results of the multivariate Cox regression analysis, eight clinicopathological characteristics were independent risk factors: age at diagnosis, gender, grade, 7th AJCC stage, ER status, PR status, Her-2 status and breast subtype. The nomogram shows that the 7th AJCC stage is the most important variable affecting OS, mainly because it incorporates tumor size, lymph node metastasis and distant metastasis, which are very important factors affecting the prognosis of breast cancer [12,13]. Our results indicated that patients older than 60 years had a worse prognosis, and that prognosis worsened with increasing age. Consistent with our results, age was significantly associated with OS in several other studies [14-16]. One possible reason is that age influences OS not only through the clinical course of the disease but also through age-related complications [17]. Moreover, aging might facilitate the growth of tumor cells by suppressing the immune system [18]. As can be seen from the nomogram, breast cancer patients who were negative for ER and PR had a poor prognosis, which is in accordance with other findings [19,20]. Undifferentiated histological grade and triple-negative breast cancer have been recognized as indicators of poor prognosis in breast cancer patients [16,21].
The nomogram is a graphical representation of a statistical prediction model that provides the probability of a particular outcome [22,23]. Therefore, its parameters should be readily available and measurable. Moreover, the nomogram has sufficient discriminative ability, and its predictions are in good agreement with the actual observations. Nomograms for other tumors have also proved economical and practical for predicting prognosis. Fang et al. [24] established a nomogram combining age, tumour size, differentiation, N stage, M stage and tumor location to predict the OS of patients with gastroenteropancreatic neuroendocrine neoplasms. Kong et al. [6] integrated age at diagnosis, T stage, N stage and M stage to construct a credible and powerful nomogram to predict the prognosis of adrenocortical carcinoma patients after surgery. Growing evidence shows that nomograms have better predictive capacity than the traditional AJCC TNM stage in numerous tumors [24-26]. Compared with the widely used AJCC TNM stage, our nomogram is not only simple and convenient but also provides an accurate individual prognosis for different patients. Therefore, the nomogram will help clinicians estimate personalized survival probabilities and optimize treatment plans.
In addition to its reliable data sources, our study has several advantages. First, the clinicopathological characteristics of breast cancer patients collected from the SEER database were abundant and comprehensive, which supported the construction of an accurate and reliable prognostic nomogram. Second, the nomogram showed superior discrimination in predicting OS compared with the 7th AJCC stage. The effectiveness and practicability of the nomogram were verified in the internal and external validation cohorts. Finally, this research employed eight clinicopathological variables that are easy to obtain and widely used in clinical practice, which makes the nomogram simple and convenient to use.
This study has the following limitations. First, this is a retrospective study based on the SEER database, so selection bias was inevitable. Second, further clinical information, such as vascular invasion, radiotherapy, chemotherapy and laboratory data, could not be obtained from the SEER database; including such information might improve the sensitivity and specificity of the nomogram. Third, the lack of clinical data and follow-up may affect the discrimination and predictive ability of the nomogram. A prospective study should be designed to further validate the nomogram in the future.

Conclusions
In summary, we constructed a nomogram to predict postoperative OS in breast cancer patients. This user-friendly nomogram is an easy-to-use tool for risk evaluation and survival prediction in individual breast cancer patients, and it can precisely and efficiently support individualized counseling, timely surveillance and clinical evaluation.

Availability of data and materials
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics approval and consent to participate
The analysis of data from SEER is not subject to medical ethics review and does not require informed consent. The external validation cohort was approved by the Taizhou Hospital of Zhejiang Province ethics committee, but because it was a retrospective study, informed consent was not required. All procedures conducted in studies involving human participants met the 1964 Declaration of Helsinki and its subsequent amendments or similar ethical standards.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Figure 1
Flowchart of sample selection for this study. AJCC, American Joint Committee on Cancer; SEER, Surveillance, Epidemiology, and End Results; ER, estrogen receptor; PR, progesterone receptor; Her-2, Human epidermal growth factor 2-neu.

Figure 2
The effects of the different prognostic factors are shown in a forest plot. AJCC, American Joint Committee on Cancer; ER, estrogen receptor; PR, progesterone receptor; Her-2, Human epidermal growth factor 2-neu.

The ROC curves in the training cohort (A), internal validation cohort (B) and external validation cohort (C). AUC, area under the time-dependent receiver operating characteristics curve.
