A Risk Stratification Model for Predicting Survival in Breast Cancer Patients With Lung Metastasis: A Study of the SEER Database and a Chinese Cohort

Background: Distant metastases are the leading cause of death among breast cancer (BC) patients, and the lung is the second most common site of colonization. However, the prognosis of BC patients with lung metastases has not been systematically evaluated so far. This research aimed to predict the overall survival (OS) and breast cancer-specific survival (BCSS) of BC patients with lung metastases and to stratify them into different risk groups. Results: A total of 3888 patients from the SEER database were eligible for the subsequent analyses, with 1944 patients in the training cohort and 1944 patients in validation cohort I. In addition, 374 patients from the Chinese XXX database were assigned to validation cohort II. Race, age, grade, subtype, number of metastatic sites, surgery, and chemotherapy were identified as determinant prognostic variables and integrated to construct the nomogram. Ultimately, a risk stratification model was established, and all patients were classified into three risk groups. Conclusion: We constructed prognostic prediction and risk stratification models for BC patients with lung metastases and validated them in both the SEER cohort and the Chinese cohort; these models could assist clinicians in risk prediction, prognosis evaluation, and decision making on an individualized basis.


Introduction
Breast cancer (BC), accounting for 30% of female cancers, is the most frequently diagnosed malignancy and the second leading cause of cancer death among women worldwide 1 . Despite treatment breakthroughs, distant metastases remain the primary cause of death among BC patients, reducing the 5-year survival rate from 98% to 27% 2 . Only approximately 6% of patients are diagnosed with stage IV disease initially, whereas approximately 30% of patients initially diagnosed with early-stage disease eventually develop stage IV disease 3 . Furthermore, owing to the increased detection of metastases, the incidence of stage IV BC increased by 2.5% annually between 2001 and 2011 4 .
It is well recognized that tumors, including breast cancer, tend to metastasize to certain organs 5 . In BC, the lung and bone are the most frequently colonized sites, followed to a lesser extent by the liver and brain. BCs can be categorized into different molecular subtypes based on hormone receptor (HR) and human epidermal growth factor receptor 2 (HER2) status, and growing evidence reveals that different molecular subtypes prefer different metastatic sites [6][7] . Specifically, the triple-negative (HER2-/HR-), triple-positive (HER2+/HR+), and HER2-enriched (HER2+/HR-) subtypes preferentially metastasize to visceral organs, including the lung, liver, and brain, whereas the hormone receptor-positive (HR+) subtype tends to metastasize to bone [8][9] .
Notably, among all metastatic BC patients, the lung is one of the most commonly colonized sites, with an autopsy incidence of 80% 10 . Patients with breast cancer lung metastasis (BCLM) usually show few or no clinical symptoms until metastatic tumors have replaced most of the lung. Despite the variety of treatments for BCLM patients, including endocrine therapy, chemotherapy, and targeted therapy, their prognosis remains dismal, with a median OS of 21 months 11 . In addition, BC is a heterogeneous malignancy with diverse characteristics and prognoses. The prognosis of BC patients can be affected by demographic and clinicopathological factors, including race, age, subtype, grade, metastatic sites, and tumor-node-metastasis (TNM) stage [12][13][14][15] . For example, compared with bone metastases, lung, liver, and brain metastases carry a worse prognosis 16 . According to a retrospective study of BCLM patients, the survival outcomes of the HR+/HER2- and HR+/HER2+ subtypes are better than those of the HR-/HER2+ and HR-/HER2- subtypes 11 . Thus, it is of great significance to evaluate the prognosis of BCLM patients.
In recent years, nomograms have been widely used to predict clinical outcomes conveniently and accurately in particular populations, especially among cancer patients [17][18][19][20] . Nomograms can therefore help clinicians with prognosis evaluation and decision making in clinical practice, meeting the demand for personalized medicine. Compared with the conventional TNM staging system, nomograms can integrate all parameters related to survival to provide a more accurate estimation 21
(6) with certain follow-up data; (7) with lung metastasis.

Statistical analyses
To construct, validate, and calibrate the nomogram, the eligible population was randomly divided into a training group and a validation group. Descriptive analysis was applied to depict the demographic, clinicopathological, and socioeconomic features of the population. In the training cohort, OS was estimated in different subgroups via Kaplan-Meier plots, and univariate prognostic parameters were compared by log-rank tests. Prognostic factors achieving significance at P < 0.05 were entered into subsequent multivariable analyses via the Cox proportional hazards regression model. The determinant prognostic variables identified by the multivariable analyses were chosen to generate the nomograms.
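The Kaplan-Meier estimate used for the subgroup OS curves can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses were run in SPSS and R); the function name `kaplan_meier` and the toy inputs are assumptions made purely for illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if death was observed at times[i] and 0 if the
    patient was censored at that time (hypothetical toy encoding).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve
```

Calling `kaplan_meier` separately on each subgroup gives the curves that a log-rank test would then compare; the test itself is not implemented in this sketch.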
Considering potential competing risks, BCSS was also evaluated via univariate and multivariate analyses in the training cohort. The probability of death was assessed by the cumulative incidence function (CIF), and the difference in CIF between subgroups was tested using the Fine and Gray model.
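As a rough illustration of how a cumulative incidence function accounts for competing causes of death, the following Python sketch computes the nonparametric (Aalen-Johansen type) CIF for one cause. The function name, the integer coding of causes, and the toy data are assumptions; the sketch only estimates the CIF and does not implement the Fine and Gray model used for subgroup comparison.

```python
from typing import List, Tuple


def cumulative_incidence(
    times: List[float], causes: List[int], cause_of_interest: int
) -> List[Tuple[float, float]]:
    """Nonparametric CIF for one cause in the presence of competing causes.

    causes[i] is 0 for a censored patient, otherwise an integer cause code
    (hypothetical coding, e.g. 1 = BCSD, 2 = NBCSD).
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    overall_surv = 1.0  # all-cause survival just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_interest = 0
        d_any = 0
        removed = 0
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            if cause != 0:
                d_any += 1
                if cause == cause_of_interest:
                    d_interest += 1
            removed += 1
            i += 1
        if d_any > 0:
            # Probability of surviving to t, then dying of the cause at t.
            cif += overall_surv * d_interest / n_at_risk
            overall_surv *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= removed
    return curve
```

Unlike one minus a cause-specific Kaplan-Meier curve, the CIFs of all causes computed this way never sum to more than the all-cause probability of death.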
Based on the outcomes of the univariate and multivariate analyses, nomograms were generated via the rms and survival packages to predict the 2-, 3-, and 5-year probabilities of OS and BCSS in patients with lung metastasis.
To validate and calibrate the nomograms, 1000 bootstrap resamples were used for internal validation within the training cohort and for external validation with the validation cohorts. Calibration curves were plotted to assess the ability of the nomogram to make unbiased estimates, and the Harrell concordance index (C-index) was used to evaluate discrimination performance via the Hmisc package.
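The discrimination measure can be sketched in plain Python as follows. This is an illustrative reimplementation, not the Hmisc routine; the function names are hypothetical, and the percentile bootstrap interval is one simple way to use resampling, assumed here because the text does not state how the resamples were summarized.

```python
import random
from typing import List, Tuple


def harrell_c_index(times: List[float], events: List[int], risk: List[float]) -> float:
    """Share of comparable pairs in which the earlier death has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before patient j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_index(
    times: List[float], events: List[int], risk: List[float],
    n_boot: int = 1000, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile 95% interval of the C-index over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(harrell_c_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [risk[k] for k in idx],
            ))
        except ValueError:
            continue  # skip degenerate resamples without comparable pairs
    if not stats:
        raise ValueError("no usable bootstrap resamples")
    stats.sort()
    lower = stats[int(0.025 * (len(stats) - 1))]
    upper = stats[int(0.975 * (len(stats) - 1))]
    return lower, upper
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values should be read.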
Based on each patient's total risk score from the nomogram for OS prediction in the training cohort, a risk stratification model was established. All patients were divided into three prognostic groups, and the respective Kaplan-Meier plots were developed via the survminer, ggplot2, and survival packages.
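The stratification step can be sketched as below. The text does not report how the cutoffs between the three groups were chosen, so tertiles of the training-cohort scores are used here purely as an assumption; the function names are likewise hypothetical.

```python
from typing import List, Tuple


def tertile_cutoffs(training_scores: List[float]) -> Tuple[float, float]:
    """Derive two cutoffs from training scores (tertiles are an assumption)."""
    ordered = sorted(training_scores)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def risk_group(total_score: float, low_cut: float, high_cut: float) -> str:
    """Assign a patient to one of three risk groups by total nomogram score."""
    if total_score < low_cut:
        return "low"
    if total_score < high_cut:
        return "intermediate"
    return "high"
```

Deriving the cutoffs once on the training cohort and reusing them unchanged in the validation cohorts keeps the validation honest.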
Two-sided P < 0.05 was considered statistically significant. All statistical analyses were performed with SPSS Statistics 25.0 and R 4.0.2.

Patient characteristics
After data selection, 3888 BC patients with lung metastasis were eligible for the subsequent analyses, with 1944 patients in the training group and 1944 patients in validation group I. In addition, 374 patients from the Chinese XXX database were assigned to validation group II. The demographic and clinicopathological features of the population are shown in Table 1. The distribution of each subgroup was similar between the training cohort and validation cohort I, consistent with a random division of the two cohorts. In the training cohort, 5 . Therefore, we selected these statistically significant prognostic parameters and established a nomogram to predict the 2-, 3-, and 5-year OS.
Breast cancer-specific survival
Assessments of the probabilities of breast cancer-specific death (BCSD) and non-breast cancer-specific death (NBCSD) in the training group are shown in Table 3. Black race, grade III/IV, the HER2-/HR- subtype, three metastatic sites among the liver, bone, and brain, and not undergoing surgery were significantly correlated only with a higher two- and five-year cumulative incidence of BCSD (P < 0.001). Notably, the 7 determinant prognostic factors for OS were also significantly correlated with the probabilities of BCSD (P < 0.05 for all outcomes), indicating that the previous outcome was not influenced by competing events. All prognostic factors that were both significantly associated with the cumulative incidence of BCSD and statistically significant for OS in the multivariate analysis were selected for nomogram construction.
Construction, validation, and calibration of the nomograms
Nomograms integrated eight independent prognostic factors to predict the 2-, 3-, and 5-year OS and BCSS in the training cohort (Figure 2). Scores assigned to the factors in each subgroup are listed in Table 4. Among all involved factors, three metastatic sites among the liver, brain, and bone had the highest score of 100, followed by subtype (HER2-/HR-: score 96), age (>80: score 69), two metastatic sites among the liver, brain, and bone (score 68), surgery (no: score 43), grade (III/UD: score 37), race (black: score 34), and chemotherapy (no: score 29). To predict the probability of 2-, 3-, and 5-year OS, we simply added up the scores of each variable to obtain the total score of an individual patient and located the total score on the bottom scales.
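The additive scoring described above can be sketched in Python using only the scores quoted in this paragraph. The dictionary keys, the function name `total_score`, and the choice to score every level not quoted here as 0 are assumptions for illustration; the full score table is in Table 4 and is not reproduced in this sketch.

```python
from typing import Dict

# Scores quoted in the text; all other levels are NOT listed here.
NOMOGRAM_POINTS = {
    ("metastatic_sites", "3"): 100,  # liver, brain, and bone all involved
    ("subtype", "HER2-/HR-"): 96,
    ("age", ">80"): 69,
    ("metastatic_sites", "2"): 68,
    ("surgery", "no"): 43,
    ("grade", "III/UD"): 37,
    ("race", "black"): 34,
    ("chemotherapy", "no"): 29,
}


def total_score(patient: Dict[str, str]) -> int:
    """Sum the nomogram points for one patient.

    Levels absent from NOMOGRAM_POINTS are scored 0, which is a
    placeholder assumption rather than the published value.
    """
    return sum(NOMOGRAM_POINTS.get((factor, level), 0)
               for factor, level in patient.items())
```

For example, a patient with three metastatic sites, the HER2-/HR- subtype, and no surgery would receive 100 + 96 + 43 = 239 points from these three factors, and that total is then read off against the survival scales of the nomogram.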
The C-indexes for the nomograms predicting OS and BCSS in the training group were 0.707 (95% CI 0.690-0.724) and 0.698 (95% CI 0.702-0.732), suggesting acceptable discrimination performance. In the training group and the validation groups, the calibration curves of the nomograms predicting OS and BCSS

Discussion
Distant metastases are the primary cause of death among BC patients, and the lung is the second most common site of colonization for breast cancer 2 . Therefore, it is necessary to identify the factors related to prognosis and integrate them to predict individual survival. Moreover, BC is a highly heterogeneous malignancy, and metastatic BC patients differ from other BC patients in multiple features. Although numerous nomograms have been established for predicting prognosis in the BC population, no specific nomogram has yet been reported for the population with lung metastasis [26][27][28] . In our work, we constructed and validated nomograms for predicting 2-, 3-, and 5-year OS and BCSS for BC patients with lung metastasis on the basis of a large-scale population from the US SEER database and a Chinese cohort. Furthermore, a risk stratification model was developed according to the total scores of individual patients from the OS nomogram. The nomogram and risk stratification model could assist both clinicians and patients in personalized decision making and clinical study design. For instance, high-risk patients need additional systemic therapies, and their clinical follow-up intervals should be shortened so that treatment protocols can be adjusted in a timely manner.
Although distant metastases are the primary cause of death among BC patients, some patients die from other causes. These competing events might lead to overestimation of the mortality risks. To assess the impact of competing events, we introduced a competing-risk model 29 . The 2- and 5-year cumulative incidences of BCSD were 36.5% and 58.3%, respectively, and the 2- and 5-year probabilities of NBCSD were 12.3% and 20.4%, respectively, suggesting an approximately three-fold higher risk of BCSD than NBCSD. Notably, except for age and chemotherapy, all factors were significantly correlated only with BCSD (P < 0.001 for all outcomes).
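The three-fold figure follows directly from the cumulative incidences reported above; the ratios at 2 and 5 years are worked out here for reference (the notation $\mathrm{CIF}$ with cause subscripts is introduced only for this illustration):

```latex
\[
\frac{\mathrm{CIF}_{\mathrm{BCSD}}(2\,\text{y})}{\mathrm{CIF}_{\mathrm{NBCSD}}(2\,\text{y})}
  = \frac{36.5\%}{12.3\%} \approx 2.97,
\qquad
\frac{\mathrm{CIF}_{\mathrm{BCSD}}(5\,\text{y})}{\mathrm{CIF}_{\mathrm{NBCSD}}(5\,\text{y})}
  = \frac{58.3\%}{20.4\%} \approx 2.86 .
\]
```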
According to the analyses, we identified race, age, grade, subtype, surgery, chemotherapy, and number of metastatic sites as determinant prognostic parameters, in accordance with previous publications [30][31][32] . Previous studies have highlighted that older age, black race, grade III/IV, and the HR-/HER2- subtype are probably associated with a worse prognosis among stage IV BC patients, and our study confirmed this 11 . In addition, we found that patients undergoing surgery would probably have a better prognosis, with prolonged OS and BCSS. Earlier studies reported that metastatic BC patients could benefit from initial breast surgery, as surgery could lower the tumor burden, provide accurate pathological information, and alleviate clinical symptoms 33 . Moreover, a recent meta-analysis integrated the results of 30 observational studies and highlighted that locoregional surgery significantly prolonged OS in metastatic BC patients, especially in patients with clear margins or limited disease burden 34 . Surprisingly, patients receiving breast-conserving surgery (BCS) would probably have a better prognosis than those receiving mastectomy, which might be explained by a lower tumor burden in patients receiving BCS.
Notably, the median OS times in the low-, intermediate-, and high-risk groups were 46.1, 28.7, and 13.4 months in the US SEER cohort and 83.5, 50.9, and 39.0 months in the Chinese cohort, respectively. The difference in median OS might have resulted from the small size of the Chinese cohort, and validation in large-scale, multicenter cohorts would be preferable. In the Chinese validation cohort, the calibration curves of the nomograms fell on the 45° diagonal line, and the Kaplan-Meier survival curves differed significantly between the risk subgroups. These findings indicated that the nomograms and risk stratification models generated from the US SEER cohorts could also be used in the Chinese cohort to identify the high-risk population.
Nevertheless, some limitations should be taken into consideration. Firstly, a proportion of patients were excluded because they lacked certain follow-up data or complete clinicopathological information for some important variables, such as grade and subtype, which may introduce some bias. Secondly, some prognostic parameters, including Ki-67 positivity, multigene signature assessment, body mass index (BMI), Eastern Cooperative Oncology Group (ECOG) performance status, and smoking status, are not recorded in the SEER database; including them might increase the robustness and effectiveness of the predictive model [36][37] . Thirdly, detailed treatment information is not available in the SEER database. Since systemic treatments could provide better control of distant disease in BC, detailed chemotherapy, endocrine therapy, targeted therapy, and immunotherapy protocols are of great importance. Fourthly, the SEER database only records distant metastasis present at diagnosis. Therefore, patients with recurrence or later metastasis during follow-up were not enrolled in this study. Last but not least, the retrospective design limits the application of our predictive model. Thus, further validation in a prospective cohort is needed before the nomograms are applied in clinical practice.

Conclusion
Using a large-scale population, we identified the independent prognostic parameters and constructed nomograms to predict the prognostic outcomes of BC patients with lung metastases in both the US and Chinese populations. Our nomogram provides a visual assessment of the probability of 2-, 3-, and 5-year OS and BCSS. Furthermore, we established a risk stratification model to identify high-risk patients who need more personalized treatments.

Declarations
Ethics approval and consent to participate
This study did not involve animals.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
This retrospective study was approved by the Ethics Committee Review Board of Fudan University Shanghai Cancer Center (050432), and the need to obtain informed consent was waived.

Consent for publication
Not applicable.
Availability of data and materials
The datasets for this study are available from the Surveillance, Epidemiology, and End Results (SEER) database: Incidence - SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2018 Sub (1975-2016 varying).
The authors have declared that no conflicts of interest exist in this work.

Calibration diagrams for the 2-year overall survival (OS) and breast cancer-specific survival (BCSS) in the training cohort (a, d), the validation cohort I (b, e), and the validation cohort II (c, f). The calibration curves fell on a 45° diagonal line. The vertical lines indicate 95% confidence intervals.