Clonal evolution characteristics and reduced dimension prognostic model for non-metastatic metachronous bilateral breast cancer

Background How to evaluate the prognosis and develop overall treatment strategies of metachronous bilateral breast cancer (MBBC) remains confused in clinical. Here, we investigated the correlation between clonal evolution and clinical characteristics of MBBC; we aim to establish a novel prognostic model in these patients. Methods The data from Surveillance, Epidemiology, and End Results (SEER) database and the First Hospital of Jilin University were analyzed for breast cancer–specific cumulative mortality (BCCM) by competing risk model. Meanwhile, whole-exome sequencing was applied for 10 lesions acquired at spatial–temporal distinct regions of five patients from our own hospital to reconstruct clonal evolutionary characteristics of MBBC. Then, dimensional-reduction (DR) cumulative incidence function (CIF) curves of MBBC features were established on different point in diagnostic interval time, to build a novel DR nomogram. Results Significant heterogeneity in genome and clinical features of MBBC was widespread. The mutational diversity of contralateral BC (CBC) was significantly higher than that in primary BC (PBC), and the most effective prognostic MATH ratio was significantly correlated with interval time (R 2 = 0.85, p< 0.05). In SEER cohort study (n = 13,304), the interval time was not only significantly affected the BCCM by multivariate analysis (p< 0.000) but determined the weight of clinical features (T/N stage, grade and ER status) on PBC and CBC in prognostic evaluation. Thus, clinical parameters after DR based on interval time were incorporated into the nomogram for prognostic predicting BCCM. Concordance index was 0.773 (95% CI, 0.769–0.776) in training cohort (n = 8,869), and 0.819 (95% CI, 0.813–0.826) in validation cohort (n = 4,435). Conclusions Bilateral heterogeneous characteristics and interval time were determinant prognostic factors of MBBC. The DR prognostic nomogram may help clinicians in prognostic evaluation and decision making.


Introduction
Metachronous bilateral breast cancer (MBBC) with high heterogeneity accounts for 3% of total breast cancer (BC) (1,2). Owing to the increasing morbidity of BC, prolongation of survival time, and improvement of detection rate, a growing number of patients with BC are diagnosed as contralateral disease and treated with mastectomy (2)(3)(4). Recently, majority of studies focused on the risk factors for the formation of contralateral breast cancer (CBC) in patients with primary breast cancer (PBC) (5) and expected to prevent the occurrence of CBC via bilateral mastectomy (6,7), whereas, for clinical practice, it is not yet clear whether patients would benefit from bilateral mastectomy in terms of mortality (6). More importantly, once diagnosed as CBC, how to develop overall treatment strategies and evaluate the prognosis of these MBBC remains confused in clinical (8)(9)(10)(11). Actually, most clinical understanding of MBBC is obviously distinct from unilateral breast cancer (UBC) (12,13). For a patient with MBBC, clinical and pathological characteristics between the PBC and CBC can be consistent or inconsistent, which makes the subtypes of a characteristic multiplying (Table 1). Applying the prognosis evaluation system of UBC to MBBC will be complicated and inapplicable; thus, it is urgently needed to build an evaluation model for predicting the prognosis of MBBC.
Considering the lower morbidity of MBBC than UBC, we collected and analyzed the clinicopathologic and prognostic data from Surveillance, Epidemiology, and End Results (SEER) database. Each patient with MBBC in SEER database had two recordings for PBC and CBC, respectively, but the follow-up outcome was the same. Clinicopathologic characteristics were reclassified into new variables to unify the recordings to represent one patient for the further research (Table 1). In view of the relatively long overall survival of BC, to exclude the death form non-cancer specific causes, competing risk modeling was used to select the independent risk factors that affected the followup outcomes (with p< 0.05 after multivariate analysis in Table 1). Although this classification method combined the PBC and CBC to evaluate the prognosis, the multifarious subtypes of each variable limited the clinical application.
The concordance of molecular subtype was closely associated with survival outcome in synchronous BBC and MBBC. Among them, patients with MBBC had lower molecular subtype concordance rate than patients with synchronous BBC (14). The spatial-temporal heterogeneity (15) between PBC and CBC, in terms of the clinical, molecular, and genomic characteristics (16), makes it more complicated and confused to fully understand this disease. In this study, we investigated the regularity of heterogeneity distribution and clonal evolution characteristics between PBC and CBC and firstly found that the interval time dimension was a determinant prognostic factor of MBBC. Then, with the help of mathematical model, we reclassified the meaningful variables form multivariate analysis of competing risk modeling (Table 1) depended on interval time, which reduced the number of subtypes efficiently and was named as dimensional reduction (DR) ( Table 2). Based on the novel DR competing risk model (Table S5), which reanalyzed the DR variables, a concise and precise DR nomogram was established to help clinicians in clinical prognostic evaluation and decision making.

Study population
We obtained the study participants from the populationbased SEER database  and the First Hospital of Jilin University (2001-2019). With a focus on evaluation of MBBC, we defined survivors of CBC as patients with BBC who survived more than 6 months after the diagnosis of PBC (6). Whereas, fulfilling any one of the following criteria would be excluded: (1) had distant metastases at diagnosis of the primary lesion, (2) less than 18 years of age or older than 97 years of age at diagnosis with PBC, and (3) the duration of follow-up was less than 3 months or withdraw. Finally, we identified 13,304 patients who  Non-P, without partner at diagnosis (single, divorced, widowed, or separated); With-P, with partner at diagnosis (married, unmarried or domestic partner, or same sex or opposite sex partner); T, tumor; N, node; IDC, invasive ductal carcinoma; ILC, invasive lobular carcinoma; BCS, breast-conserving surgery; SM, simple mastectomy; RM, radical mastectomy; ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor 2; +, positive; -, negative. The meaning of symbol "<=" was "less than or equal to". The meaning of symbol "<" was "less than". The meaning of symbol ">" was "more than". The meaning of the bold values means these p values were less than 0.05 and considered as having statistical significance.

Whole-exome sequencing and data analysis
We surgically removed sample acquired at spatial-temporal distinct regions from five patients who received chemotherapy. DNA libraries for WGS were generated by Illumina TruSeq DNA Library Preparation Kit (Illumina, San Diego, CA) from shear DNA fragments with a peak of 250 bps, which extracted from tumor tissues (the QIAamp DNA FFPE Tissue Kit, Qiagen, Hilden, Germany). NimbleGen EZ 64M human exome array probes (SeqCap EZ Human Exome Library v3.0) were used in hybridization. DNA sequencing was performed using an HiSeq 3000 instrument (Illumina, San Diego, CA) with 2 × 75 bp paired-end sequencing strategy. Process of reads alignment and calling for somatic single-nucleotide variations (SNVs) are described in the Supplementary Data.

Dimensional-reduction mathematical model
After plotting the cumulative incidence function (CIF) curves of T stage, N stage, grade, and ER status at different point in interval time by univariate competing risk analysis, the area under the curve of each subgroup was obtained. According to the difference of the areas among subgroups of the variable, the weight used to evaluate the different proportion of PBC and CBC in predicting the cancer-specific death was calculated at a point in time (17,18). A reference index of the weight of primary or contralateral cancer was normalized and standardized as the reference weight value of each point in interval time (19). To fit the weight values, the nonlinear fitting function with parameters was set and the regression coefficients were worked out.

Statistical analysis
Competing risk modeling: Follow-up was begun from the diagnosis of CBC to the date of death or the last recording from SEER or the hospital. In SEER cohort, there were 2,357 patients died from cancer and 2,624 patients died for other causes within 25 years of follow-up, which was suitable for breast cancerspecific cumulative mortality (BCCM) calculated by Fine and Gray's competing risk model (9) to remove interference from other causes of death. We did not censored follow-up at age more than 70 years since other cause deaths could be excluded by the risk-competitive model.
Construction of the nomogram: According to the competing risk model, four independent prognostic variables were included and revised by DR mathematical model. We further screened for prognosis impact factors by Fine and Gray's competing risk regression analysis and constructed a corresponding competing risk nomogram. The eligible patients from SEER were divided into two groups randomly by 2:1: training cohort (n = 8,869) and validation cohort (n = 4,435). There were no differences in clinical features between the training cohort and validation cohort except in surgery method (Table S3). Concordance index (C-index) values were used to measure the discrimination performance and calibration curves were assessed graphically by plotting the observed rates against the nomogram-predicted probabilities via a bootstrap method with 1,000 resamples.
Statistics of clinical characteristics between PBC and CBC were analyzed by c 2 test used to compare categorical characteristics. In the competing risk model, the clinicopathologic factors affecting the follow-up outcomes independently were selected, subdistribution hazard ratios (SHRs) with 95% confidence interval (CI) were calculated, and CIF curves were plotting using STATA Version 15.0. Other data analyses were performed using R software version 3.6.1 (R Foundation for Statistical Computing). Two-sided p< 0.05 was considered statistically significant.

Clinical characteristics distribution of MBBC
Among the 473,909 patients with BC in SEER, 13,304 individuals of MBBC were included, which is consistent with a 3%-4% overall morbidity of MBBC (baseline characteristics in Tables S1-S2). Diagnosis of PBC had a peak at approximately age ranged from 48 to 68 years, and incidence of most CBC was at the ages 56 to 76 years ( Figures 1A, B). The mean age of diagnosis with PBC and CBC was 58 versus 66 years, and the interval time between PBC and CBC ranged from 6 months to 25 years (mean interval, 7 years), and the occurrence risk of CBC gradually decreased as the time interval lengthens ( Figure 1D). In addition, the younger the onset age of CBC means the shorter spacing interval of MBBC that the median interval in patients younger than 40 years was only 3 years (p< 0.0001, Figure 1E). All of the clinical characteristics between PBC and CBC were significantly different, including marriage status, differentiation grade, pathology, tumor size, lymph nodes metastasis (LNM), estrogen receptor (ER) status, progestrone receptor (PR) status, and HER2 status (p< 0.0001; Table S2). Heterogeneity of MBBC was obvious in clinical features, inconsistent proportions between PBC and CBC among the above characteristics were 17.14, 55.77, 42.64, 40.11, 38.18, 27.50, 38.08, and 22.02%, respectively ( Figure 1C).

Heterogeneity of somatic mutations and clonal evolution in BBC
To evaluate the heterogeneity of nonsilent mutations between bilateral tumor lesion, we sequenced 10 spatially distinct regions from five operable patients with BBC. In terms of a single patient, each mutation defined as ubiquitous (present in bilateral tumor regions) or heterogeneous (present in one side of the lesion). Spatial heterogeneity was identified in all five BBCs, with almost all heterogeneous mutations between bilateral tumor lesion (range: 95.4-100%), except for only one ubiquitous mutation GATA3 in patient P03 (Figure 2A).
To further explore the dynamics of the mutational processes shaping BBC genomes over time, the spectra of point mutations in each lesion were dissected. Compared with synchronous BBC (patients P01 and P02), heterogeneous distribution of somatic mutations in MBBC was significantly associated with the sequence of onset and interval time. The mutational diversity of CBC was significantly higher than that of PBC, and the shorter the interval exhibited an increase in somatic mutation of CBC, indicating the poorer prognosis (patients P03-P05, Figure 2B).To characterize the genomic instability process between the occurrence of PBC and CBC, we investigated common mutational signatures via catalogue of somatic mutation in cancer [COSMIC (20), https://cancer.sanger.ac.uk/ cosmic/signatures_v2], which contained signatures 1, 6, 11, and 19. Thereinto, signatures 1 and 11 were closely associated with age of cancer diagnosis and chemotherapy drugs, such as alkylating agent ( Figure 2C).
To further predict the clinical outcomes of MBBC, we integrated clinical, molecular, and ITH (21) features (measured as the percentage of late mutations (22), highlighted the complex interaction between driver status and tumor heterogeneity (23) from multiple layers, and discovered that several highly correlated features were found to stratify patient outcomes, such as MATH (24) ratio and clone numbers ( Figure S1B). Plotting the correlation structure across these features, we discovered that clinical feature interval time was significantly associated with MATH ratio (p< 0.05, R2 = 0.85, Figures 2D-F), suggesting that time interval between BBC is an important reflection of tumor evolution and clinical prognosis. Furthermore, previous large-scale sequencing of pan-cancer studies had reported that patient outcome could be better predicted by clinical features than by genomic features (25). Thus, a prognostic model based on essential clinical features might stratify the prognosis of MBBC.

Comprehensive analysis of PBC and CBC in competing risk model
In our study, to identify the essential features, several clinical parameters were included into competing risk models (1): race, the age at diagnose time of CBC, the interval time ranged from PBC to CBC (2); differences of marriage status, tumor size, LNM, grade, pathology, molecular status, and surgery types in PBC and CBC. In univariate analysis of BCCM, almost all of the clinical parameters had significant differences (p< 0.000) but for race, marital status, and HER2 status. According to multivariate analysis, interval time (p< 0.000), tumor size (p< 0.001), LNM (p< 0.006), grade (p< 0.032), and ER status (p = 0.006) between PBC and CBC significantly affected the prognosis of these patients (Table 1 and Table S4). The surgery method (p > 0.050) was not a critical factor in prediction of bilateral BCCM.

Interval time determines the weight of MBBC characteristics in BCCM
In view of the importance of interval time in tumor clonal evolution, we conducted further stratified analysis at different intervals based on the above prognostic factors. Estimates for BCCM differed across the interval time, significantly survival discrepancy for MBBC patients with spacing interval< 3 years, 3-7 years and > 7 years. When patients diagnosed with CBC within 3 years, critical clinical features (T stage, N stage, grade, and ER status) of PBC and CBC almost simultaneously inflected the BCCM of patients. Once patients with interval time > 7 years, clinical characteristics in CBC had a prominent impact on the prognosis of these patients, suggesting that interval time might determine the weight of clinical features on PBC and CBC in prognosis evaluation. While for patients with interval time within 3-7 years, the distribution of clinical features was between the above two-time dimensions (Figure 3).

Bilateral characteristic DR model of BC based on interval time
Considering the important role of interval time in MBBC and the interference of interphase with other clinical factors on prognosis assessment, we illustrate the DR model dependent on interval time. Taking the T stage as an example ( Figure 4A), we introduce a comprehensive indicator/index to describe the correlation between the bilateral staging: Where w c (t),wp (t) represent the incidence interval t dependent weighting function, which can be estimated by data fitting. We draw the CIF curve f ij of T stage with respect to interval, by univariate competing risk analysis (PBC defined as i, CBC defined as j), then calculate the L 1 norm of f ij for each subgroup For each interval t, we define the L 1 norm difference among all the subgroups as D p , Normalizing all the D p , and define it as the (PBC) weight w p for each interval t. By nonlinear fitting, we obtain the relationship between w p and t, where a, b, and c are the coefficients. Then the weight for (CBC) is defined as On this basis, we employed the same procedure to deal with N stage, tumor cell grade, and ER status (Figures 4B-D).

DR nomogram
According to the DR CIF curve, we reduced the key clinical feature data of PBC and CBC in two-time dimensions to the unified dimension and assigned different weights. The weight of T stage on PBC and CBC, for example, a patient with interval of 5 years (T3 stage on PBC and T1 stage on CBC), was 0.37731 and 0.62269, respectively. Ulteriorly, referring to assignment score in Table 2, the DR of T stage (T DR stage) = 3 × 0.37731 + 1 × 0.62269 = 1.75462, so T DR stage was II stage (score: 1.51-2.0). The DR stage parameters were included in the multivariate analysis of competing risk, T DR (p< 0.001), N DR (p< 0.001), grade DR (p = 0.028), ER DR (p< 0.001), to generate a DR nomogram ( Figures 4E; Figure S1).
The C-index of the nomogram was 0.773 (95% CI, 0.769-0.776) in training cohort (n = 8,869), and 0.819 (95% CI, 0.813-0.826) in validation cohort (n = 4435), respectively. In contrast to modeling for a single-spatial and temporal dimension in previous studies, DR nomogram was proved to have higher predictive power. Calibration plots revealed superb agreement between the nomogram-predicted probabilities and actual observations ( Figures 4F, G).

Discussion
With regard to MBBC, no matter CBC is primary or metastatic, spatial-temporal heterogeneity between the PBC and CBC poses a significant challenge for assessing the prognosis and designing effective treatment regimens (26, 27). However, up to now, no studies have described the heterogeneous distribution and clonal evolution characteristics of these patients with MBBC (28,29).
In this study, we collected samples and clinical data of BBC at disparate intervals from combined with large sample data from the SEER database, to analyze the clinical heterogeneous features and clonal evolution characteristics of MBBC from time and space dimensions. We verified that significant heterogeneity in genome ( Figure 2A) and clinical features (Table S2) of BBC was widespread, especially for the diversity of driver gene mutation that was almost completely distinct between PBC and CBC. This significant heterogeneity poses a great challenge to the establishment of clinical prognostic models, which just based on unilateral lesion.
More importantly, we found that all of CBCs exhibited more different driver mutations and/or recurrent copy number aberrations than that in PBC, and the mutational diversity of CBC was significantly higher in patients with shorter interval time. In addition, a shorter interval time was significantly associated with a higher MATH ratio and poorer survival, mostly owe to the age of CBC diagnosis ( Figure 1E characteristics, just from one lesion (2,3), cannot reflect the real prognosis. Some studies focused on the influence of worse characteristics on disease outcome (30). However, our study found that interval time plays an important role in prognosis of MBBC apart from clinical and molecular features. Thus, we included the interval time into account in our prognosis models (14).
To resolve the issue, we built a bilateral evaluation model that synchronously take heterogeneity of clinical features on both sides of the lesion into account for the first time, including T stage, N stage, grade, ER status, and interval time. Even so, the time dimension (interval time) was proven to have complex correlations with the other prognostic factors (p< 0.0001, Figure 1E, Figure S2), which could interfere with the predictive efficacy of the prognostic model. Thus, we reduced the time dimension dependent on the weight of clinical features of bilateral lesions at distant time node using CIF curves by crossing over with mathematics, to establish DR nomogram for actual observation for 3-, 5-, and 10-year BCCM, which was significantly better optimization of prognosis stratification than a traditional nomogram, and C-index improved by 0.05 and 0.06 in training cohort and internal validation cohort, respectively. In addition, this nomogram was only based on four basic clinical features, which greatly improves the clinical applicability of this model and facilitates clinical popularization. The application of the dimension reduction method could also extrapolate the prediction model to the clinical prediction of synchronous BC, and only the weight balance of occurrence of BBC is 0.5. A study-based SEER showed that the CBC were more and more likely to be detected at an early stage within short interval time (<= 1 years) and treated with mastectomy (4). The explanations of choosing mastectomy over breast-conserving surgery were complex and unclear, and we think with the help of the nomogram, more sensible therapeutic schedule will be made.
Validation of the nomogram is essential to avoid over-fitting and determine generalizability of prognostic model (31). In the current study, calibration plots showed optimal agreement between prediction and actual observation, guaranteeing the reliability and feasibility of the established nomogram ( Figures 4F, G). The much higher C-index of the DR nomogram was revealed in internal validation cohort than that in the training cohort, indicating the effective repeatability. In the tentative external validation cohort from China (n = 89), the C-index was similar with the training cohort, suggested that the model was adaptable to the Asian population in spite of the small sample size ( Figure S3). Even so, the validation of large sample and multicenter clinical data is still needed in the future.
On the other hand, whole-exome sequencing showed that gene mutations seemed to be completely different in PBC and CBC, hard to pin down the correlations with specific genetic mutations. However, the mutation signatures were all concentrated in the characteristics related to chemotherapeutic drugs alkylating agents, suggesting that the significance of drug stress selection in clone evolution (31,32). This study provided an excellent in vivo model for improving the understanding of tumor evolution, which would guide clinical decision making to a certain extent. For example, the significance of chemotherapy elimination regimen for the malignant evolution of contralateral tumors and long-term outcome in patients with early BC should be considered.
However, our study still has the following limitations. First, given the low incidence of BBC with 0.22−3.08% in China (33), only 89 patients with BBC were included in this study among 25,119 cases from the First Hospital of Jilin University. Despite this external validation of the DR nomogram showed similar C-index with the training cohort, the small sample size still has the possibility of analysis bias. Even so, the validation of large sample and multicenter clinical data is still needed in the future to enhance the credibility of the results and applicability in clinical practice. Second, limited by the types of clinical and molecular factors included from SEER database, molecular indicators such as BRCA mutations were not included. Thus, it warrants an extend sample size with complete molecular, pathological, and clinical features to verify its clinical benefit.
In conclusion, we established and validated a novel DR nomogram for predicting BCCM of patients with MBBC. The clinicians could more precisely estimate the survival of individual patients and identify subgroups of patients who are in need of a specific treatment strategy by this nomogram.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of the First Hospital of Jilin University. The patients/participants provided their written informed consent to participate in this study.