A Nomogram Predicting Survival in Patients with Breast Ductal Carcinoma in Situ with Microinvasion

Background: In ductal carcinoma in situ with microinvasion (DCISM), balancing the risks of overtreatment and undertreatment can be challenging. We aimed to identify prognostic factors in patients with DCISM and to construct a nomogram to predict breast cancer-specific survival (BCSS). Methods: Women diagnosed with DCISM were selected from the Surveillance, Epidemiology, and End Results (SEER) database (1988-2015). Clinical variables and tumor characteristics were evaluated using a Cox proportional-hazards regression model. A nomogram was constructed from the multivariate Cox regression model, combining all the prognostic factors to predict the prognosis of DCISM patients at 5, 10, and 15 years. Results: We identified 5,438 eligible breast cancer patients, with median and maximum follow-up times of 78 and 227 months, respectively. Poorer survival outcomes were associated with diagnosis between 1988 and 2001, African-American race, age under 40 years, higher N stage, progesterone receptor-negative tumors, and receipt of no surgery (all P < 0.05). The nomogram was constructed from seven variables and passed the calibration and validation steps. The areas under the receiver operating characteristic (ROC) curve (AUCs) of the training and validating sets (5-year AUC: 0.77 and 0.88; 10-year AUC: 0.75 and 0.73; 15-year AUC: 0.72 and 0.65) demonstrated excellent reliability and robust performance. Conclusion: Our study is the first to construct a nomogram for patients with DCISM, which could help physicians identify breast cancer patients who are more likely to benefit from more intensive treatment and follow-up.


Background
Ductal carcinoma in situ (DCIS) with microinvasion (DCISM) is a mostly preinvasive breast carcinoma with a small component of invasive disease (one or more foci of stromal invasion, none exceeding 1 mm in size) and presumably carries a low but plausible risk of metastasis [1,2]. Tumors with any invasive focus larger than 1 mm are defined as invasive carcinoma [1,2]. Microinvasive carcinoma is uncommon, accounting for only about 1% of all breast cancer diagnoses [3][4][5]; furthermore, microinvasion is found in only approximately 5-10% of DCIS cases [6][7][8]. Microinvasive cancer is rarely seen in the absence of an adjacent in situ lesion [6]. This may be because an isolated invasive component of 1 mm or less is difficult to visualize, whereas an adjacent in situ lesion greatly enhances its detectability. Consequently, microinvasive carcinoma is usually described as "DCIS with microinvasion," even though the presence of DCIS is not required for the diagnosis. Although DCISM accounts for only a small proportion of breast cancer cases, its incidence continues to increase along with a marked rise in DCIS, driven by increased detection of breast cancer following the widespread adoption of screening mammography [9,10].
Current guidelines from the National Comprehensive Cancer Network (NCCN) recommend treatment and systemic therapy for the majority of DCISM cases that more closely reflect the therapeutic guidelines for DCIS than those for invasive carcinoma [11]. However, several years ago it was recommended that patients with microinvasive carcinoma be treated the same as patients with small invasive cancers [12]. While surgery is the standard treatment for DCIS and for the majority of invasive carcinomas, additional treatment options vary widely between the two entities. Most notably, adjuvant chemotherapy is part of the national treatment guidelines for many invasive breast cancers but is not recommended for DCIS [13]. Because DCISM is relatively rare compared with pure DCIS and most invasive ductal carcinomas, the data on its tumor biology and prognosis available to guide disease management and patient counseling are limited and controversial. Several single-institution retrospective studies have reported on the clinical features, management, and prognostic implications of DCISM, but they have yielded conflicting results [14][15][16]. Although DCIS, DCISM, and T1a invasive ductal carcinoma all generally have an excellent prognosis, some population-based studies have revealed that DCISM more closely resembles small invasive carcinoma than pure DCIS, and many practitioners are treating it accordingly [17,18]. Breast cancer, even with microinvasion, is a heterogeneous disease characterized by diverse histopathologic and molecular features that are associated with distinct clinical outcomes. As a result, it can be challenging to balance the potential risks of overtreatment versus undertreatment in DCISM.
The American Joint Committee on Cancer (AJCC) staging system is a widely used tool for predicting disease outcomes and guiding therapeutic decision-making [19,20]. However, given the many variables that influence the course of cancer, a prognosis based on the AJCC staging system alone is insufficient. A precise estimate of DCISM mortality is required to evaluate the clinical implications of this early-stage cancer and to guide individualized therapy. Nomograms, which generate an individual probability of a clinical event by integrating biological and clinical variables, help fulfill this requirement and aid the development of personalized medicine [21][22][23]. There are currently no studies constructing a nomogram for female breast DCISM. To address this gap, this study aims to establish a comprehensive and reliable prognostic model of DCISM by building a nomogram to better understand its risk factors and prognosis. To obtain a sufficient number of DCISM cases, the Surveillance, Epidemiology, and End Results (SEER) cancer database of the National Cancer Institute was used.

Source of Data
Study data were obtained from the SEER database of the National Cancer Institute, an open-access resource for epidemiologic and survival analyses of various cancers, consisting of 18 high-quality population-based cancer registries with very high estimated completeness of reporting. All data are publicly available and de-identified and are therefore exempt from Institutional Review Board review. SEER data do not require informed consent.
The SEER*Stat software from the National Cancer Institute (Surveillance Research Program, National Cancer Institute SEER*Stat software, http://www.seer.cancer.gov/seerstat) (Version 8.1.5) was used to identify eligible patients with the following inclusion criteria: female, diagnosed between 1988-2015, pathological diagnosis of breast ductal carcinoma, unilateral breast cancer, stage T1mic, one primary site only, and known age at diagnosis.
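The inclusion criteria above amount to a conjunction of record-level filters. The sketch below expresses them in Python over a hypothetical, simplified record type; the field names and string codes (`histology`, `t_stage`, and so on) are illustrative assumptions and do not correspond to actual SEER*Stat export variables or codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical, simplified case record (not the real SEER*Stat layout).
    sex: str
    year_of_diagnosis: int
    histology: str       # e.g. "ductal"
    bilateral: bool      # True if both breasts are involved
    t_stage: str         # e.g. "T1mic"
    n_primaries: int     # number of primary tumors recorded for the patient
    age: Optional[int]   # None when age at diagnosis is unknown


def is_eligible(r: Record) -> bool:
    """Apply each inclusion criterion from the text; all must hold."""
    return (
        r.sex == "female"
        and 1988 <= r.year_of_diagnosis <= 2015
        and r.histology == "ductal"
        and not r.bilateral
        and r.t_stage == "T1mic"
        and r.n_primaries == 1
        and r.age is not None
    )
```

Keeping each criterion as a separate clause makes it easy to count how many records each filter removes, which is the information a case-selection flowchart reports.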
Information on human epidermal growth factor receptor 2 (HER2/neu) status is only available in the SEER database from 2010 onward; therefore, HER2 status was not included in the analysis. Patients diagnosed after 2015 were excluded to ensure adequate follow-up time. The pathological diagnosis was based on the primary site and coded according to the International Classification of Diseases for Oncology, Third Edition (ICD-O-3). Breast cancer-specific survival (BCSS) was the primary study outcome and was calculated as the time from the date of diagnosis to the date of breast cancer-specific death. Causes of death were categorized as either breast cancer-related or non-breast cancer-related. Patients who died of non-breast cancer-related causes were censored at the date of death.
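The BCSS outcome coding described above (event = death from breast cancer; deaths from other causes and patients alive at last contact are censored) can be sketched as follows. The cause-of-death labels (`"breast"`, `"other"`, `None` for alive) and the whole-month arithmetic are hypothetical simplifications; SEER releases generally supply precomputed survival months and a cause-specific death classification instead.

```python
from datetime import date
from typing import Optional, Tuple


def months_between(start: date, end: date) -> int:
    """Completed whole months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:  # the last month is not yet complete
        months -= 1
    return months


def bcss_outcome(diagnosis: date, last_contact: date,
                 cause_of_death: Optional[str]) -> Tuple[int, int]:
    """Return (time in months, event flag) for breast cancer-specific survival.

    event = 1 only for breast cancer death; other-cause deaths and patients
    still alive are censored (event = 0) at their last known date.
    """
    time = months_between(diagnosis, last_contact)
    event = 1 if cause_of_death == "breast" else 0
    return time, event
```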

Nomogram Development
The following clinical variables were extracted for the study: year of diagnosis, age, marital status, race, N stage (derived from the AJCC 6th edition stage group), primary site, laterality, grade, estrogen receptor (ER) status, progesterone receptor (PR) status, surgery, chemotherapy, and radiation. Continuous predictors were tested for linearity and converted to categorical variables if the relationship was nonlinear. Categories of categorical variables that did not differ significantly were collapsed. For nomogram construction and validation, all cases were randomly divided into training (n = 3,806) and validating (n = 1,632) cohorts at a ratio of 7:3 [24]. Univariate and multivariate Cox regression analyses were then used to screen for variables significantly associated with BCSS in the training cohort. After backward step-down selection, the predictors that remained in the model (P < 0.05) were year of diagnosis, age, race, N stage, PR status, surgery, and chemotherapy. The resulting multivariate Cox regression model was used to calculate risk scores and build the final nomogram prognostic model.
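A random 7:3 split like the one described can be sketched with Python's standard `random` module. The seed is an arbitrary assumption (the seed actually used is not reported); flooring the training-set size reproduces the reported 3,806/1,632 partition of 5,438 cases.

```python
import random


def split_train_validate(ids, train_fraction=0.7, seed=2020):
    """Shuffle case IDs reproducibly and cut them into training/validating sets."""
    rng = random.Random(seed)              # seeded so the split can be reproduced
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # floor: 5438 * 0.7 -> 3806
    return shuffled[:n_train], shuffled[n_train:]


train, validate = split_train_validate(range(5438))
print(len(train), len(validate))
```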

Model Validation
The validity of the nomogram was tested by discrimination and calibration [21]. The discrimination was estimated by the area under the receiver operating characteristic (ROC) curve (AUC) [25]. The theoretical value of the AUC is between 0 and 1; an AUC larger than 0.5 indicates prediction performance better than random chance. Calibration curves were plotted to evaluate the consistency between predicted and actual survival rates at 5, 10, and 15 years [22]. A perfect prediction would result in a 45-degree calibration curve (i.e., the identity line).
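The AUC at a fixed horizon can be read as the probability that a randomly chosen patient who died of breast cancer by that time was assigned a higher predicted risk than a randomly chosen patient still event-free after it. The sketch below computes that pairwise quantity directly. It is a simplified illustration, not the estimator used in the study: patients censored (or dying of other causes) before the horizon are simply dropped, whereas dedicated time-dependent ROC methods reweight for censoring.

```python
def auc_at_horizon(risk, time, event, horizon):
    """Pairwise AUC at `horizon`; ties in predicted risk count as half-concordant."""
    cases, controls = [], []
    for r, t, e in zip(risk, time, event):
        if t <= horizon and e == 1:
            cases.append(r)        # breast cancer death by the horizon
        elif t > horizon:
            controls.append(r)     # known to be event-free beyond the horizon
        # otherwise: censored before the horizon, status unknown -> excluded
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))
```

A value of 0.5 corresponds to random ranking, matching the interpretation given in the text.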

Other Statistical Methodologies
To account for differences in baseline characteristics across groups, we matched each patient who received chemotherapy to a patient who did not, using the following predetermined factors (other than the treatment being compared): year of diagnosis, age, marital status, race, N stage, primary site, laterality, grade, ER status, PR status, surgery, chemotherapy, and radiation. Propensity score matching was used, and matching quality was assessed. Kaplan-Meier curves, with the corresponding log-rank test results, were constructed for breast cancer-specific survival. The same methodology was applied to patients receiving radiation therapy. All statistical analyses were performed in SPSS (version 24.0; IBM Corp, Armonk, NY, USA) or R (version 3.4.0; Vienna, Austria; http://www.R-project.org). All tests were two-sided, and P < 0.05 was considered statistically significant.
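The Kaplan-Meier curves mentioned above rest on the product-limit estimator: at each distinct event time, survival is multiplied by (1 - deaths / number at risk), and censored patients leave the risk set without causing a drop. A minimal standard-library sketch of that estimator follows; it omits confidence intervals and the log-rank test, which in practice would come from an established survival package.

```python
def kaplan_meier(time, event):
    """Product-limit estimate: list of (event time, S(t)) at each distinct event time."""
    data = sorted(zip(time, event))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed   # censored and dead patients leave the risk set
    return curve


print(kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]))
```

Patients censored at the same time as an event are still counted in the risk set for that event, which is the usual convention.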

Clinicopathological Characteristics of Patients
Applying the aforementioned inclusion and exclusion criteria resulted in a final study population of 5,438 DCISM cases (Figure 1). These cases were randomly divided into two groups: 3,806 cases formed the training cohort and 1,632 cases formed the validating cohort. Follow-up time ranged from 0 to 227 months (median 78 months) in the training cohort and from 0 to 226 months (median 78 months) in the validating cohort. Patient, disease, and treatment characteristics of the study population are summarized in Table 1. Demographic and clinical variables were similar in the training and validating cohorts. The majority of patients were diagnosed between 2002 and 2015, were over 40 years of age, were Caucasian, had grade II-III, ER-positive, N0-N1 tumors, and had undergone surgery and chemotherapy.

Building Nomogram Prognostic Model in Training Cohort
In the univariate analysis, each of the following variables was significantly associated with improved BCSS: "diagnosed in 2002-2015", "age between 40 and 70", "married", "Caucasian", "N0 stage", "grade I and II", "PR positive", "received surgery", "no chemotherapy", and "received radiotherapy" (Table 2). After stepwise selection in the multivariate analysis to remove potential redundancies, year of diagnosis, age, race, N stage, PR status, surgery, and chemotherapy were retained in the final nomogram model (coefficients summarized in Table 2). The final risk scores for 5-year, 10-year, and 15-year BCSS were calculated by adding up the score for each item using the nomogram depicted in Figure 2. Surgery contributed the most to prognosis, followed by N stage, age, race, chemotherapy, year of diagnosis, and lastly PR status.
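To illustrate how item scores of this kind are typically derived from a Cox model, the sketch below uses the common nomogram convention: each category's log-hazard effect is rescaled so that the variable with the widest effect range spans 0 to 100 points, the item scores are summed, and the total maps back to a linear predictor with S(t | x) = S0(t)^exp(lp). The coefficients and baseline survival are hypothetical placeholders, not the values in Table 2, and this scaling convention is an assumption about how Figure 2 was drawn.

```python
import math

# Hypothetical log-hazard effects per category (illustrative only, NOT the
# study's Table 2 coefficients). Each reference level has effect 0.0.
EFFECTS = {
    "surgery": {"yes": 0.0, "no": 1.6},
    "n_stage": {"N0": 0.0, "N1": 0.7, "N2": 1.3},
    "pr_status": {"positive": 0.0, "negative": 0.4},
}


def point_tables(effects):
    """Rescale effects so the widest-ranging variable spans 0-100 points."""
    ranges = {v: max(lv.values()) - min(lv.values()) for v, lv in effects.items()}
    points_per_unit = 100.0 / max(ranges.values())
    tables = {
        v: {level: (beta - min(lv.values())) * points_per_unit
            for level, beta in lv.items()}
        for v, lv in effects.items()
    }
    return tables, points_per_unit


def total_points(tables, patient):
    """Sum the item scores for one patient (dict: variable -> level)."""
    return sum(tables[v][level] for v, level in patient.items())


def survival_probability(total, points_per_unit, baseline_survival):
    """Cox model survival S0(t) ** exp(lp) for the patient's total points.

    lp = total / points_per_unit holds here only because every variable's
    minimum effect is 0 (the reference level); baseline_survival is S0(t)
    for a patient at all reference levels.
    """
    linear_predictor = total / points_per_unit
    return baseline_survival ** math.exp(linear_predictor)


tables, ppu = point_tables(EFFECTS)
patient = {"surgery": "yes", "n_stage": "N1", "pr_status": "negative"}
pts = total_points(tables, patient)
print(pts, survival_probability(pts, ppu, baseline_survival=0.95))
```

Under this convention, the variable whose categories span the largest point range is the one that "contributes the most", which is how a ranking such as surgery first and PR status last is read off a nomogram.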

Statistical Matching for Chemotherapy and Radiotherapy
Chemotherapy and radiotherapy are both commonly used adjuvant therapies for breast cancer; therefore, additional survival analyses were performed for these two variables. To ensure that differences in outcome were not attributable to baseline differences in demographic and clinical characteristics between treatment groups, we performed a 1:1 (chemotherapy: no chemotherapy) matched case-control analysis using propensity score matching. This yielded 726 patients, 363 in each chemotherapy group (Figure 4A). Chemotherapy was associated with a better prognosis of DCISM (Figure 4A, P = 0.018). The same analysis was performed for radiotherapy, yielding 1,588 patients, 794 in each radiotherapy group (Figure 4B). Radiotherapy was not associated with BCSS of DCISM (Figure 4B, P = 0.872).
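The 1:1 matching step can be sketched as greedy nearest-neighbor matching without replacement on precomputed propensity scores. The study does not report its matching algorithm, ordering, or caliper, so the greedy strategy, the ascending processing order, and the caliper of 0.05 below are all assumptions; estimating the propensity scores themselves (typically by logistic regression of treatment on the covariates) is not shown.

```python
def greedy_match(treated, control, caliper=0.05):
    """1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping patient ID -> propensity score.
    Returns a list of (treated_id, control_id) pairs; treated patients with
    no control within `caliper` are left unmatched.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda k: abs(available[k] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]   # each control is used at most once
    return pairs


print(greedy_match({"a": 0.30, "b": 0.70, "c": 0.90},
                   {"x": 0.31, "y": 0.68, "z": 0.10}))
```

Because unmatched patients are discarded, the matched sample is usually much smaller than the full cohort, consistent with matched groups of a few hundred patients each.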

Discussion
Because DCISM constitutes a small minority of breast cancer cases, it has been difficult to definitively characterize its biological behavior, prognostic factors, and outcomes of multimodality therapy.
Previous studies have reported on the prognostic implications and clinical management of DCISM, but the therapeutic recommendations proposed for microinvasive breast carcinoma are highly varied and remain controversial [14][15][16][26]. Recent literature shows that current treatment patterns and prognosis of DCISM are comparable to those of small-volume invasive ductal carcinoma [17,18]. DCISM is a heterogeneous disease that may be associated with distinct clinical outcomes, and it remains challenging to find an appropriately balanced treatment. In this study, a nomogram prognostic model was developed and validated using a large cohort of breast DCISM cases across the United States. Based on routinely available demographic, staging, and treatment information, this nomogram predicts the survival probability for individual DCISM patients and contributes to the development of personalized medicine.
In the present study, we constructed a comprehensive model that combines multiple risk factors to predict the prognosis of breast DCISM. Seven variables (age, race, year of diagnosis, AJCC N stage, PR status, chemotherapy, and surgery) were retained in this nomogram after multivariate Cox regression screening and backward stepwise selection; all of these are readily available in clinical databases. Measured by the AUC, the nomogram passed the discrimination step, with AUCs of 0.77, 0.75, and 0.72 for 5-, 10-, and 15-year BCSS, respectively, in the training cohort. For the example shown in Figure 2, the nomogram calculations are as follows: age = 2, corresponding to a score of 0; race = 2, corresponding to a score of 35; year of diagnosis = 2, corresponding to a score of 0; AJCC N stage = 1, corresponding to a score of 42.5; surgery = 1, corresponding to a score of 0; chemotherapy = 1, corresponding to a score of 32; PR status = 0, corresponding to a score of 22. The sum is 131.5 total points, corresponding to a 10-year BCSS of 87% (breast cancer-specific death of 13%) and a 15-year BCSS of 82% (breast cancer-specific death of 18%). At the 10-year time point, this patient therefore has a 13% risk of breast cancer-specific death, estimated by a nomogram whose 10-year AUC in the validating cohort was 0.73; that is, it ranks a randomly selected patient who died of breast cancer above a randomly selected survivor 73% of the time. As shown by the calibration plots in Figure 3, both the apparent and the bias-corrected curves fit well with the ideal diagonal line. Taking 15-year BCSS as an example, the nomogram is more accurate at predicting a BCSS of 90% than one of 97% (Figure 3E): on the calibration plot for the 15-year outcome, at a BCSS of 90% the red circle overlaps the blue dotted line, indicating near-perfect calibration, whereas at a BCSS of 97% the red circle and the blue dotted line do not overlap. As reflected by the confidence intervals in the calibration plots, there is an additional degree of uncertainty in any nomogram estimate.
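The worked example above is simple addition of item scores. The sketch below reproduces the total using the scores quoted in the text; converting that total into a survival probability requires reading the BCSS axes of Figure 2, so the 10- and 15-year BCSS values are copied from the text rather than computed.

```python
# Item scores for the worked example, as quoted in the text (Figure 2).
item_scores = {
    "age = 2": 0.0,
    "race = 2": 35.0,
    "year of diagnosis = 2": 0.0,
    "AJCC N stage = 1": 42.5,
    "surgery = 1": 0.0,
    "chemotherapy = 1": 32.0,
    "PR status = 0": 22.0,
}

total = sum(item_scores.values())
print(f"Total points: {total}")  # Total points: 131.5

# BCSS read from the nomogram for this total (reported values, not computed here).
bcss = {10: 0.87, 15: 0.82}
for years, s in bcss.items():
    print(f"{years}-year BCSS {s:.0%}, breast cancer-specific death {1 - s:.0%}")
```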
Nevertheless, we conclude that this nomogram model is reliable and robust for making accurate assessments and predictions. The prognostic factors identified in this study were largely consistent with the findings of previous studies. Younger age, lymph node metastasis, multifocality, and hormone receptor status have all previously been shown to be significantly associated with the prognosis of DCISM patients [18][26][27][28]. Excluding the therapeutic factors, AJCC 6th edition N stage contributes the most to the final risk score (Figure 2), with clear distinctions between adjacent N stages. The substantial contribution of N stage to this nomogram strongly suggests that certain subsets of breast cancer may have an enhanced propensity to metastasize, exhibiting a worse prognosis even when the primary lesion is very small [29][30][31]. Figure 2 also shows that patients with DCISM treated by lumpectomy had the same or even slightly lower risk scores than those treated by mastectomy. Data specific to DCISM are quite limited. The National Surgical Adjuvant Breast and Bowel Project (NSABP) B-06 trial showed that patients with stage I and II breast cancer who underwent lumpectomy followed by radiation had the same survival rate as those who underwent mastectomy. A recent study based on well-matched, contemporary data revealed that breast-conserving therapy was associated with superior overall survival compared with mastectomy for early-stage breast cancer [32], which is consistent with the results of our current study.
According to the nomogram, receipt of adjuvant chemotherapy in DCISM patients corresponds to a higher risk score. This may be because patients at higher risk of relapse are more likely to be selected for chemotherapy. In addition, there are plausible explanations for why PR status passed the selection process and was retained in the nomogram while ER status was not. First, our study supports the notion that ER-positive, PR-negative breast cancer is associated with reduced benefit from endocrine therapy [33] and worse clinical outcomes. Second, ER-positive breast cancers have a higher distant recurrence risk than triple-negative breast cancer [34], so these patients are mostly treated with endocrine therapy, which significantly reduces distant recurrence. Because information about endocrine therapy was unavailable, this therapeutic variable was not included in the construction of the nomogram, which may have led to ER status being left out. The prognostic implications of different adjuvant treatments are further shown in Figure 4. After propensity score matching, patients treated with chemotherapy had better BCSS, as expected, supporting the explanation that the higher risk scores of patients who received chemotherapy reflect their clinicopathological factors. When statistically matched, radiotherapy showed no association with prognosis, indicating that this adjuvant locoregional treatment may contribute more to local control than to BCSS.
This study has several limitations. First, information on HER2/neu status is only available in the SEER database from 2010 onward. Had cases diagnosed before 2010 been excluded, the sample size would have been dramatically reduced and the follow-up time insufficient. Therefore, all cases diagnosed between 1988 and 2015 were enrolled, and HER2 status was not included in the construction of the nomogram. Second, the SEER database lacks information about endocrine therapy, so this potential confounding factor could not be analyzed. Third, the retrospective nature of our study may have introduced some bias into the results. Finally, the sequence of treatment was not considered. Because neither recurrence nor progression is recorded in SEER, we had to treat therapies as baseline variables rather than time-varying covariates. As a result, we assumed that the exact treatment combination was determined and given at the time of diagnosis. Because the exact timing of treatment is not available, this assumption is necessary to incorporate the therapeutic information into the nomogram.

Conclusion
Our study is the first to develop a nomogram prognostic model specifically for DCISM patients. The model was constructed using a large, population-based training dataset drawn from multiple treatment facilities, which reduces sampling bias. Researchers, clinicians, and patients themselves can easily estimate the survival probability for each individual case using readily available clinical information. This would help providers counsel patients more accurately about their prognosis and determine the best treatment strategy. Additionally, this comprehensive, individualized risk score calculation may be used as a stratification criterion in randomized studies and clinical trials.

Declarations
Ethics approval and consent to participate: Study data were obtained from the SEER database of the National Cancer Institute, an open-access resource for epidemiologic and survival analyses of various cancers. All data are publicly available and de-identified and are therefore exempt from Institutional Review Board review. SEER data do not require informed consent. Consent for publication: We have obtained consent to publish this paper from all the participants of this study.
Availability of data and materials: The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
Competing Interests: The authors declare that they have no competing interests.

Figure 1: Flowchart of the case selection process in the study.