Establishment and Validation of The Axillary Lymph Node Burden Using Cone-Beam Computerized Tomography and Ultrasound-Based Prediction Models in T1-2 Breast Cancer Patients

Background: This study aimed to develop and validate models to preoperatively predict the risk of lymph node (LN) burden, based on the Z0011 clinical trial criteria, to assist surgical decision-making in breast cancer. Methods: Data on 1394 consecutive patients who presented at Sun Yat-sen University Cancer Center for cone-beam breast computerized tomography (CBBCT) examinations between April 3, 2019, and July 17, 2020, were retrospectively collected. A total of 387 patients who met the inclusion criteria were included and randomly divided into training and validation cohorts. Clinical-pathological information of all patients was recorded, and images were reinterpreted in this study. A bidirectional stepwise method followed by multivariable analysis was used to incorporate preoperative features and build optimal models with the training cohort for the prediction of N0 versus N+ and N<3 versus N≥3. Results: The ROC curves of the two models were generated with the training cohort, and their calibration was estimated using 1000 bootstrap resamples. The bias-corrected C-index was 0.779 (95% CI, 0.752–0.793) for model one and 0.809 (95% CI, 0.794–0.833) for model two in the training cohort. Decision curves and clinical impact curves were plotted to evaluate prediction performance for further clinical application. DeLong's test showed comparable performance between the two cohorts. Conclusions: Our models were developed as reliable and noninvasive tools for the preoperative prediction of nodal status, and we hope that they can serve as useful tools for the early planning of treatment strategies for breast cancer patients.


Introduction
Breast cancer is one of the most commonly diagnosed malignancies in women, ranks first in cancer mortality among women, and seriously affects female health worldwide (1). Because of its rapid progression and insidious symptoms in the early stage of invasion, breast cancer should be closely monitored with current advanced technology (2). The outcome of axillary lymph node dissection (ALND) plays an important role in the formulation and implementation of breast cancer regimens, including postoperative radiotherapy and chemotherapy (3). ALND has traditionally been performed in patients with metastatic sentinel lymph nodes (SLNs) found during surgery. However, a large number of reports based on the Z0011 clinical trial have verified the advantages of forgoing standard ALND, which may cause complications such as lymphedema, infection, limited shoulder motion, and major vessel and nerve injury, and which amounts to overtreatment in non-metastatic patients (4,5). Early-stage invasive breast cancer patients with only one or two SLN metastases can be regarded as having a low nodal burden and can receive radiotherapy instead of ALND after sentinel lymph node biopsy (SLNB) (6). Compared with ALND, SLNB is mainly applied in early-stage breast cancer and causes fewer traumatic complications. However, SLNB carries a potential risk of false negatives, either because metastases in other nodal regions go undetected or because the lymphatic drainage network is disrupted in patients with a high clinical stage (T3-4) (7,8). Moreover, patients with early breast cancer who have node-negative disease may be able to forgo SLNB altogether. Therefore, an accurate preoperative assessment of the lymph node (LN) status could avoid unnecessary traumatic procedures such as ALND and SLNB in the post-Z0011 era (9). With these circumstances in mind, studies have sought a relationship between primary tumor behavior and lymphatic metastasis to support reliable preoperative diagnostic evaluation (10).
Texture analysis of radiographic data can quantitatively reveal heterogeneity within tumor tissue; however, its application is restricted by the accuracy of lesion segmentation. Deep learning improves performance by automatically learning discriminative features from images (11). As a novel approach in radiology, however, its clinical utility still needs to be verified in further studies, and manual interpretation of medical images still accounts for a large proportion of routine work. Therefore, prediction models built on descriptors read from medical images can add value by assisting with LN evaluation.
Cone-beam breast computed tomography (CBBCT) is an emerging and promising diagnostic imaging tool that is now broadly applied. Its clinical value has been verified (12): because it acquires images with high spatial and contrast resolution, it offers advantages in visualizing breast masses and microcalcifications with enhancement. Its 3D reconstruction of the breast also reduces overlapping glandular tissue and displays lesions more clearly (12). Previous studies showed good inter-reader agreement and repeatability in the assessment of breast density and lesions (13,14). Owing to its convenient 3D reconstruction, it can simultaneously display the relationship between vessels and tumors (15). It is time- and cost-saving and provides a more comfortable examination experience (16). CBBCT has achieved diagnostic accuracy comparable to that of ultrasound, magnetic resonance imaging, and mammography; however, its field of view cannot cover the whole breast area, and LNs are generally not visible on its images (14).
Accordingly, axillary ultrasound (AUS), another routine and cost-efficient diagnostic tool, is frequently used in clinical practice. AUS can visualize detailed axillary structures and provide accurate measurements. However, the diagnosis of nodal metastasis with AUS is operator-dependent and imprecise when the axillary metastatic burden is low (17). Jackson et al. (18) reported a false-negative rate of 4% for predicting a heavy nodal burden (N≥3).
Although MRI has high sensitivity for detecting abnormalities, its assessment of LNs may be limited by the small size of the region of interest (ROI) and by the subjective, manual definition of the ROI by the reader (19). Therefore, studies have investigated the combined use of different modalities for LN prediction, and most have reported comparable performance for their constructed models (20,21).
In this study, we analyzed a mixture of quantified and semantic CT-derived features of the primary tumor together with AUS-derived LN features to examine their relationship with nodal status. We hypothesized that combining tumor and LN analysis would improve prediction.

Clinical Data and Participants
This study used anonymized data and was approved by the Institutional Ethics Committee of Sun Yat-sen University Cancer Center (No. B2019-016). Written informed consent was obtained from all participating patients. A database of 1328 consecutive patients who presented at Sun Yat-sen University Cancer Center for CBBCT examination from January 2019 to July 2020 was retrospectively reviewed. Only patients initially diagnosed with T1-T2 stage invasive breast cancer were included in the study cohort. No patient had received prior treatment, including chemotherapy, radiation, or ipsilateral breast surgery. Finally, a total of 387 patients with histologically proven malignant breast tumors who underwent surgery with SLNB or ALND after examination were enrolled (Figure 2). The eligible population was randomly divided into two independent cohorts at a ratio of 70.0% (271/387, training cohort) and 30.0% (116/387, validation cohort).
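A 70/30 random split of this kind can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the text does not state how the randomization was implemented, so the function name `split_cohort`, the use of Python's `random.Random`, and the seed value are all assumptions. Rounding 0.7 × 387 gives the 271/116 split reported above.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly split patient IDs into training and validation cohorts.

    The training cohort gets round(train_fraction * n) patients; the rest
    form the validation cohort. The seed is arbitrary (hypothetical) and
    only makes the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


train, valid = split_cohort(range(1, 388))
print(len(train), len(valid))  # 271 116
```

Fixing the seed makes the allocation reproducible; any change in the seed gives a different but equally valid random split of the same sizes.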

Protocols for Dedicated Cone-Beam Breast CT
All participants underwent dedicated breast CT examination (KBCT 1000®, Koning Corporation) using a single coil for each breast in the prone position. CBBCT images of the affected side were acquired before contrast injection and at 60 s and 120 s after contrast injection. For each examination, contrast material was injected according to generally accepted standard protocols. Iodinated contrast agent at a concentration of 300-370 mg/ml was injected intravenously with a high-pressure injector at a dose of 1-2 ml per kilogram of body weight, and the total volume did not exceed 100 ml for a single patient. The injection rate was set at 2.0-3.0 ml/s, and the patient's physical condition was closely monitored during injection. Scanning parameters were set automatically, with a constant tube voltage of 49 kVp and a tube current of 50-160 mA calculated according to the size and density of the breast. The standard reconstruction mode (1024×1024, 0.273 mm³) was selected to reconstruct multiplanar images. Dedicated three-dimensional visualization software (Visage CS Thin Client/Server, Visage Imaging) was used, and the automatic skin removal option was applied to better distinguish lesions from the surrounding glandular tissue on 3D images.
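The weight-based dosing arithmetic in this protocol (1-2 ml/kg, capped at 100 ml, injected at 2.0-3.0 ml/s) can be made explicit with a small sketch. The function `contrast_plan` and its default dose and rate are hypothetical choices within the stated ranges; the sketch only illustrates the calculation and is not a dosing tool.

```python
def contrast_plan(weight_kg, dose_ml_per_kg=1.5, rate_ml_per_s=2.5, max_total_ml=100.0):
    """Return (volume in ml, injection time in s) for weight-based contrast dosing."""
    if not 1.0 <= dose_ml_per_kg <= 2.0:
        raise ValueError("dose outside the 1-2 ml/kg range stated in the protocol")
    if not 2.0 <= rate_ml_per_s <= 3.0:
        raise ValueError("rate outside the 2.0-3.0 ml/s range stated in the protocol")
    volume_ml = min(weight_kg * dose_ml_per_kg, max_total_ml)  # 100 ml cap per patient
    return volume_ml, volume_ml / rate_ml_per_s


print(contrast_plan(70))                                         # (100.0, 40.0)
print(contrast_plan(55, dose_ml_per_kg=1.0, rate_ml_per_s=2.0))  # (55.0, 27.5)
```

For a 70 kg patient at 1.5 ml/kg the uncapped volume would be 105 ml, so the 100 ml ceiling applies.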

Reassessment of Image Parameters
Two radiologists (with four to seven years of experience) who were blinded to all clinical and pathological information independently reviewed all images. Both noncontrast-enhanced and contrast-enhanced CBBCT images were then reassessed in consensus, and discrepancies were resolved by consulting an expert in breast cancer diagnosis (Y.P.W.) to reach a final conclusion. Lesion enhancement (ΔCT, expressed in Hounsfield units) was computed using the formula proposed by Prionas et al. (24):

$$\Delta \mathrm{HU} = \left(\mathrm{HU}_{\text{lesion,post}} - \mathrm{HU}_{\text{fat,post}}\right) - \left(\mathrm{HU}_{\text{lesion,pre}} - \mathrm{HU}_{\text{fat,pre}}\right)$$

Distance to the nipple was measured manually on sagittal sections. The relationship between vessels and masses was visualized with specialized three-dimensional reconstruction software (Figure 3). AUS images were acquired at our hospital using IU22 (Philips, The Netherlands) and ACUSON S2000 (Siemens, Germany) systems with a high-frequency transducer (12 to 15 MHz). Electronic reports were reviewed, and parameters including primary LN morphology, blood flow type and detection site were evaluated and recorded.
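The fat-referenced enhancement formula of Prionas et al. can be written as a small Python function. The function name and the example HU values are hypothetical; the sketch assumes the lesion and fat values are ROI means measured on the precontrast series and on one postcontrast series (for example, the 60 s or 120 s acquisition).

```python
def delta_hu(hu_lesion_pre, hu_fat_pre, hu_lesion_post, hu_fat_post):
    """Fat-referenced lesion enhancement: (lesion - fat) post minus (lesion - fat) pre."""
    return (hu_lesion_post - hu_fat_post) - (hu_lesion_pre - hu_fat_pre)


# Hypothetical ROI means (HU) for one lesion on the precontrast and 60 s series.
print(delta_hu(hu_lesion_pre=40, hu_fat_pre=-100, hu_lesion_post=150, hu_fat_post=-90))  # 100
```

Referencing each phase to fat means that a uniform shift of both lesion and fat attenuation between acquisitions does not register as enhancement.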

Biopsy and Pathology Assessments
For each patient who underwent a preoperative CT scan, the imaging findings were compared with the pathology outcomes after surgical excision, including histological grade and type, number of metastatic LNs, bundle/vascular invasion, and expression of molecular markers. Histopathological subtypes of breast cancer were classified as invasive ductal carcinoma (IDC), mixed type (IDC with other components), and special-type breast cancer. Tumor (nuclear) grade was categorized with reference to the Nottingham grading system. The expression of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and Ki-67 was determined by immunohistochemistry (IHC). Tumors with at least 1% of tumor cells showing positive nuclear staining were considered ER/PR positive (25). HER2 staining intensity was scored from 0 to 3+, and tumors scoring 3+ or with positive fluorescence in situ hybridization (FISH) results were considered to exhibit HER2 overexpression. For Ki-67, nuclear staining in at least 14% of tumor cells was considered to indicate a high level of proliferation. All LNs were surgically removed via SLNB/ALND and assessed histopathologically. The presence of micrometastases or macrometastases on SLNB was considered indicative of a positive LN status.
Isolated tumor cells were classified as node-negative. According to the ACOSOG Z0011 criteria, patients with no more than two positive LNs can avoid traumatic axillary dissection, which makes two positive LNs the optimal cutoff. Therefore, LN outcomes were divided into three groups (N0, N1-2, and N≥3) to develop two separate clinical prediction models.
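The marker cutoffs and nodal outcome grouping described in this section can be summarized as small rule-based functions. This is a sketch under stated assumptions: the ER/PR cutoff is taken as 1% positively stained tumor cells (the commonly used ASCO/CAP threshold, assumed here), `fish_positive=None` is assumed to mean that FISH was not performed, and the node count is assumed to already exclude isolated tumor cells. All function names are hypothetical.

```python
def receptor_status(er_pct, pr_pct, her2_ihc, ki67_pct, fish_positive=None):
    """Classify IHC results with the cutoffs described in the text.

    er_pct, pr_pct, ki67_pct: % of tumor cells with positive nuclear staining.
    her2_ihc: HER2 IHC score as an integer 0-3 (3 means 3+).
    fish_positive: True/False if FISH was done, None otherwise (assumption).
    """
    return {
        "ER": "positive" if er_pct >= 1 else "negative",  # assumed 1% cutoff
        "PR": "positive" if pr_pct >= 1 else "negative",  # assumed 1% cutoff
        "HER2": "overexpressed" if her2_ihc == 3 or fish_positive is True else "not overexpressed",
        "Ki67": "high" if ki67_pct >= 14 else "low",  # >=14% = high proliferation
    }


def nodal_group(n_positive_nodes):
    """Map the count of nodes with micro- or macrometastases to the three outcome groups."""
    if n_positive_nodes < 0:
        raise ValueError("node count cannot be negative")
    if n_positive_nodes == 0:
        return "N0"
    if n_positive_nodes <= 2:
        return "N1-2"  # low burden under the Z0011 criteria
    return "N>=3"      # high burden


def model_outcomes(n_positive_nodes):
    """Binary outcomes for model one (N0 vs N+) and model two (N<3 vs N>=3)."""
    if n_positive_nodes < 0:
        raise ValueError("node count cannot be negative")
    return {
        "model_one_positive": n_positive_nodes >= 1,
        "model_two_high_burden": n_positive_nodes >= 3,
    }


print(nodal_group(0), nodal_group(2), nodal_group(3))  # N0 N1-2 N>=3
print(model_outcomes(2))  # {'model_one_positive': True, 'model_two_high_burden': False}
print(receptor_status(er_pct=80, pr_pct=0, her2_ihc=2, ki67_pct=30, fish_positive=True))
```

The two binary outcomes share one node count, so a patient with two positive nodes is positive for model one but low burden for model two.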

Feature Selection and Statistical Analyses
For patients who met the inclusion criteria, the retrospectively reviewed features were treated as independent variables in the statistical analysis. Two models were constructed with the LN status as the outcome: a) negativity versus metastasis (N0 versus N+) and b) low versus high burden (N<3 versus N≥3). All clinical and pathological information on the primary lesions was derived from a medical database. Continuous variables were analyzed using Levene's test and the Mann-Whitney test, whereas differences in categorical variables were assessed using the chi-squared test, the adjusted chi-squared test or Fisher's exact test. Univariate analysis was performed on the primary feature set to choose candidate model parameters.
Then, a bidirectional stepwise method was applied to refine the model by repeatedly adding and removing features until an optimal equation was obtained (22,23). Finally, multivariable logistic regression analysis was used to select statistically meaningful parameters and fit the best prediction model. Statistical analysis was performed using IBM SPSS version 26 (IBM, Armonk, NY, USA) and RStudio (version 1.3, https://rstudio.com/products/rstudio/download/). All statistical hypothesis tests were two-sided, and p-values < 0.05 were considered to indicate significance. The primary cohort was randomly divided into two separate groups, with 70% of patients (0.7 × n) assigned to the training cohort. Continuous and categorical variables were analyzed with SPSS. Feature selection was conducted using bidirectional stepwise regression with the "MASS" package. The nomogram, calibration curves and clinical impact curves were plotted with the "rms" package. Decision curve analysis (DCA) was performed with the "rmda" package. Predictive ability was evaluated with ROC curves using the "pROC" package. The Hosmer-Lemeshow test and C-index were computed using the "ResourceSelection" and "Hmisc" packages.
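The bidirectional stepwise procedure can be sketched as a greedy search that, at each step, tries every single addition and every single removal, keeps the move that lowers the AIC (AIC = 2k − 2 ln L) the most, and stops when no move helps. This is a simplified illustration of the idea behind `MASS::stepAIC(direction = "both")`, not a reimplementation of it: model fitting is abstracted into a caller-supplied `aic_of` function (a hypothetical name), and the AIC table in the example is invented purely to show a feature being added and later removed.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def stepwise_both(
    candidates: Iterable[str],
    aic_of: Callable[[FrozenSet[str]], float],
    start: Iterable[str] = (),
) -> Tuple[FrozenSet[str], float]:
    """Greedy bidirectional selection: add or drop one feature per step while AIC falls."""
    candidates = list(candidates)
    current = frozenset(start)
    best_aic = aic_of(current)
    while True:
        # All subsets reachable by adding one feature or removing one feature.
        neighbours = [current | {f} for f in candidates if f not in current]
        neighbours += [current - {f} for f in sorted(current)]
        best_move = None
        for subset in neighbours:
            score = aic_of(subset)
            if score < best_aic:  # strict improvement only, so the loop terminates
                best_aic, best_move = score, subset
        if best_move is None:
            return current, best_aic
        current = best_move


# Invented AIC values: x3 looks useful alone but becomes redundant once x1 and x2 are in.
TOY_AIC = {
    frozenset(): 100.0,
    frozenset({"x1"}): 95.0,
    frozenset({"x2"}): 95.0,
    frozenset({"x3"}): 90.0,
    frozenset({"x1", "x3"}): 88.0,
    frozenset({"x2", "x3"}): 88.0,
    frozenset({"x1", "x2"}): 80.0,
    frozenset({"x1", "x2", "x3"}): 82.0,
}

selected, score = stepwise_both(["x1", "x2", "x3"], TOY_AIC.__getitem__)
print(sorted(selected), score)  # ['x1', 'x2'] 80.0
```

In the toy run, x3 enters first and is dropped at the last step once x1 and x2 are both in the model, which is the situation a purely forward search cannot correct.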

Assessment and Validation of Prediction Performance
Receiver operating characteristic (ROC) curves were plotted to assess the discriminative performance of the two prediction models. The predictive ability of the two models was also compared between the combined and the separate use of the two preoperative imaging modalities. The area under the curve (AUC) for each examination was computed in both the primary and internal validation cohorts. DeLong's test, a quantitative method for comparing the AUCs of two ROC curves, was used to compare the training and validation cohorts as a check for overfitting. Calibration curves plotted with 1000 bootstrap samples were used to evaluate the agreement between predictions and actual observations. The bias-corrected C-index was calculated with bootstrap correction to quantify the accuracy of the prediction models. Hosmer-Lemeshow tests were used to assess goodness of fit.
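For a binary outcome, the C-index equals the AUC. The AUC and an unpaired DeLong comparison between two independent cohorts (such as training versus validation) can be sketched in pure Python as below. This is an illustrative implementation of the standard DeLong variance estimate, assumed to be in the spirit of what `pROC::roc.test` does for unpaired curves; the function names and toy scores are hypothetical, and the sketch does not handle the degenerate case in which both variances are zero.

```python
import math
from statistics import variance


def _psi(pos_score, neg_score):
    """Kernel: 1 if the positive case scores higher, 0.5 for a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def auc_with_delong_variance(pos_scores, neg_scores):
    """AUC (equal to the C-index for a binary outcome) and its DeLong variance."""
    m, n = len(pos_scores), len(neg_scores)
    # Structural components: each case's average kernel against the other class.
    v10 = [sum(_psi(x, y) for y in neg_scores) / n for x in pos_scores]
    v01 = [sum(_psi(x, y) for x in pos_scores) / m for y in neg_scores]
    auc = sum(v10) / m
    return auc, variance(v10) / m + variance(v01) / n


def delong_unpaired(cohort_a, cohort_b):
    """Two-sided z-test for the AUCs of two independent cohorts.

    Each cohort is a (pos_scores, neg_scores) pair of predicted risks.
    """
    auc_a, var_a = auc_with_delong_variance(*cohort_a)
    auc_b, var_b = auc_with_delong_variance(*cohort_b)
    z = (auc_a - auc_b) / math.sqrt(var_a + var_b)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p


training = ([0.9, 0.8, 0.7, 0.6], [0.65, 0.4, 0.3, 0.2])
validation = ([0.85, 0.55, 0.5], [0.6, 0.35, 0.3, 0.1])
print(delong_unpaired(training, validation))
```

Because the cohorts contain different patients, their AUC variances simply add; a paired comparison on the same patients would also need the covariance term.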

Clinical Utility of the Final Model
Axillary LN metastasis prediction was illustrated with nomograms integrating clinicopathological features to provide explicit models for image-assisted treatment planning. To determine the clinical significance of the final model, decision curve analysis and clinical impact curve analysis were performed.
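Decision curve analysis rests on the net benefit at a threshold probability p_t, NB = TP/n − (FP/n) × p_t/(1 − p_t), compared with the treat-all and treat-none strategies. A minimal sketch follows; the function names and toy data are hypothetical, and classifying a patient as high risk when the predicted risk is at least p_t is an assumed convention.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    flagged = [y for y, r in zip(y_true, risk) if r >= threshold]
    tp = sum(flagged)       # true positives among flagged patients (y is 0/1)
    fp = len(flagged) - tp  # false positives among flagged patients
    odds = threshold / (1 - threshold)
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone (treat-none has net benefit 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y = [1, 1, 0, 0, 0]
risk = [0.9, 0.4, 0.35, 0.1, 0.2]
for pt in (0.1, 0.3, 0.5):
    print(pt, round(net_benefit(y, risk, pt), 4), round(net_benefit_treat_all(y, pt), 4))
```

A model is clinically useful at a given threshold when its curve lies above both the treat-all curve and the zero line of treat-none.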

Summary of Clinicopathological Characteristics
The distribution of the training and validation cohorts is described in Figure 1. A total of 239 (61.8%) patients had no metastatic LNs at postoperative biopsy, 78 patients had one or two LN metastases, and the remaining 70 patients had three or more LN metastases confirmed by biopsy. According to the Z0011 criteria, patients in the training cohort were divided into the nonmetastatic (n = 172), low-risk (n = 48), and high-risk (n = 51) groups. The baseline clinical characteristics of the training and validation cohorts are summarized in Table 1. Detailed clinical and pathological characteristics and semantic features are listed in the Supplementary Tables.

Data Analyses
On univariate analyses, several factors were significantly related to outcomes in both models, including the CT-reported lesion number, maximum diameter, distance to the nipple, thickened or sunken skin, subcutaneous fat space, invasion of the pectoralis major, and relationship between the vessel and mass; the US-reported number of LNs, maximum LN length and axis, LN shape, blood flow type, and boundary between the cortex and medulla; and the pathology-reported bundle/vascular invasion. In model two, lesion number and the relationship between the vessel and mass were significantly related to outcomes (Table 2). Clinical and pathological features were selected with a bidirectional stepwise method to reduce complexity and the number of predictive parameters. The Akaike information criterion (AIC) was applied in model construction (22), and the models with the minimum AIC values were selected. The selected features were then entered into multivariable logistic regression analyses to eliminate factors and identify the most significant, clinically relevant predictors (p<0.05) of nodal burden severity (Table 3). The final models consisted of the features retained after multivariable adjustment.

Illustration of the Prediction Models
Nomograms are straightforward, and our prediction models can be better visualized with their aid (Figure 4). The selected CBBCT and AUS morphological features were assigned points based on the regression coefficients to establish scoring criteria. A total score can then be calculated for each patient and used to determine the predicted probability of nodal risk.
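The points-to-probability logic of a logistic regression nomogram can be sketched as follows. The sketch assumes a convention similar to that of `rms::nomogram` (assumed here, not taken from our code): each predictor's points are proportional to its coefficient times its distance from the lowest-risk value, scaled so that the predictor with the widest linear-predictor range spans 0-100 points. The class name, predictors, coefficients and ranges are hypothetical and are not the fitted values of our models.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Nomogram:
    intercept: float
    coefs: dict   # predictor name -> logistic regression coefficient
    ranges: dict  # predictor name -> (minimum, maximum) value on the axis
    scale: float = field(init=False)
    ref: dict = field(init=False)
    lp_ref: float = field(init=False)

    def __post_init__(self):
        # Span of each predictor's contribution to the linear predictor.
        spans = {k: abs(b) * (self.ranges[k][1] - self.ranges[k][0]) for k, b in self.coefs.items()}
        self.scale = 100.0 / max(spans.values())  # widest span maps to 100 points
        # Reference value = end of the range with the lowest contribution (0 points).
        self.ref = {k: self.ranges[k][0] if b >= 0 else self.ranges[k][1] for k, b in self.coefs.items()}
        self.lp_ref = self.intercept + sum(b * self.ref[k] for k, b in self.coefs.items())

    def points(self, name, value):
        return self.scale * self.coefs[name] * (value - self.ref[name])

    def total_points(self, values):
        return sum(self.points(k, values[k]) for k in self.coefs)

    def probability_from_points(self, total):
        # Invert the points scale to the linear predictor, then apply the logistic function.
        linear_predictor = self.lp_ref + total / self.scale
        return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients and ranges, for illustration only.
nomo = Nomogram(
    intercept=-3.0,
    coefs={"multiple_lesions": 1.2, "ln_short_axis_mm": 0.15, "nipple_distance_cm": -0.2},
    ranges={"multiple_lesions": (0, 1), "ln_short_axis_mm": (0, 20), "nipple_distance_cm": (0, 10)},
)
patient = {"multiple_lesions": 1, "ln_short_axis_mm": 12, "nipple_distance_cm": 3}
total = nomo.total_points(patient)
print(round(total, 1), round(nomo.probability_from_points(total), 3))  # 146.7 0.354
```

Reading the total-points axis of a nomogram is therefore just a rescaled version of the logistic model's linear predictor, so the probability obtained from points matches the direct model calculation.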

Diagnostic Performance of Prediction Models
The models incorporating features selected from both CBBCT and AUS yielded AUCs of 0.884 (95% CI: 0.841-0.927) and 0.891 (95% CI: 0.841-0.940) in the training cohort for models one and two, respectively. Figure 5A and B show the performance of the combined prediction models compared with the individual prediction ability of CBBCT and AUS. For model one, the AUC decreased to 0.733 for CBBCT alone and 0.727 for AUS alone. For both the prediction of negativity versus metastasis (model one) and the level of LN burden (model two), each examination alone was inferior to the combined model. The ROC curves of the training and validation cohorts with AUC values are shown in Figure 5C and D. In the validation cohort, the AUC decreased to 0.878 (95% CI: 0.823-0.933) for model one and 0.880 (95% CI: 0.836-0.925) for model two. DeLong's test showed that performance in the validation cohort was comparable to that of the combined models in the training cohort for model one (p = 0.341) and model two (p = 0.335).

Internal Validation of Prediction Models
The calibration curves demonstrated good agreement between predictions and observations in both the training (Figure 6A) and validation (Figure 6B) sets for model one, and Figure 6C and D illustrate comparable performance in both groups for model two. In all four graphs, the diagonal red lines represent perfect performance, and the close fit of the blue lines to them indicates good prediction performance. For model one, the bias-corrected C-index was 0.79 (0.85,0.91), and the Hosmer-Lemeshow (HL) test gave a P value of 0.579, indicating that the model fit well. For model two, the bias-corrected C-index was 0.75 (0.81,0.88), and the P value of the HL goodness-of-fit test was 0.440.
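The Hosmer-Lemeshow statistic compares observed and expected event counts within groups of predicted risk; with g groups it is referred to a chi-square distribution with g − 2 degrees of freedom. The sketch below forms g equal-sized groups after sorting by predicted risk, which is an assumption (`ResourceSelection::hoslem.test` forms groups from quantiles of the fitted values, so results can differ slightly), and uses the closed-form chi-square tail probability that holds for even degrees of freedom, so `groups` must be even. Function names and data are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for chi-square with even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, risk, groups=10):
    """Return (HL statistic, p-value) using equal-sized groups of sorted predicted risk."""
    pairs = sorted(zip(risk, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)  # observed events in the group
        expected = sum(p for p, _ in chunk)  # expected events = sum of predicted risks
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


risk = [(i + 0.5) / 40 for i in range(40)]
y = [1 if i % 2 else 0 for i in range(40)]
print(hosmer_lemeshow(y, risk))
```

A large P value, as reported for both models, means the grouped observed and expected counts do not differ more than chance would explain.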
In terms of clinical application, the decision curve for model one showed good performance in both the training and validation groups. The purple curves (Figure 7) lay above the reference (treat-all and treat-none) curves, demonstrating that model one adds net benefit across a wide range of risk thresholds. However, model two yielded no potential net clinical benefit below a threshold of 0.03.
Clinical impact curves were also generated to assess the clinical usefulness of the prediction models (Figure 8). They show the estimated numbers of individuals deemed high risk and of true positives in the range of 0.01 to 0.07 and 0.075 for risk models one and two, respectively. Per 1000 patients, approximately 400 and 300 patients were classified as high risk by models one and two, of whom about 300 and 220 were true positives (LN metastasis rates of approximately 0.75 and 0.73, i.e., 300/400 and 220/300). The remaining 600 and 700 patients might therefore avoid ALND through surveillance of nodal burden with our risk nomograms.
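A clinical impact curve rescales, at each risk threshold, the number of patients flagged as high risk and the number of true positives among them to a standard population, conventionally 1000 patients. A minimal sketch under that assumption follows; the function name and the ten toy patients are hypothetical, chosen so that the output mirrors the 400 high-risk / 300 true-positive pattern described for model one.

```python
def clinical_impact(y_true, risk, threshold, population=1000):
    """Return (high-risk count, true positives) scaled to `population` patients."""
    n = len(y_true)
    flagged = [y for y, r in zip(y_true, risk) if r >= threshold]  # y is 0/1
    scale = population / n
    return len(flagged) * scale, sum(flagged) * scale


y = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
risk = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25]
high, tp = clinical_impact(y, risk, threshold=0.35)
print(high, tp, 1000 - high)  # 400.0 300.0 600.0
```

Sweeping the threshold and plotting both counts gives the two curves of a clinical impact plot; the gap between them is the number of false positives at each threshold.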

Discussion
The noninvasive preoperative prediction of levels of LN metastasis has become increasingly important since it can provide relevant information to assist with surgical and adjuvant radiation therapy planning.
In this study, we evaluated whether preoperative parameters could quantify the risk of LN outcomes in patients with early-stage breast cancer. The prediction targets were defined according to the Z0011 clinical trial and the current surgical regimen. To the best of our knowledge, this is the first study to apply CBBCT combined with AUS for nodal status prediction. It represents an innovative method for the early assessment of LN status based on preoperative evaluation with two modalities widely used in current clinical practice. Many previous reports adopted default quantitative radiomics signatures from tumor segmentation to reveal information invisible to the naked eye (16,22). Nevertheless, manual assessment of lesions remains indispensable in diagnosis. Moreover, LNs are usually not visible on CBBCT scans unless they are enlarged or situated low. Studies have suggested possible connections between the biological behavior of the primary tumor and its lymphatic spread. CBBCT evaluation of tumor size and the number of lesions is important for the prediction of LN burden (26). Multifocality may lead to a greater total tumor burden than unifocal breast cancer, and studies have reported that it may also increase lymphatic positivity and the likelihood of locoregional metastasis (27). Additionally, with reference to the TNM classification, Cabioglu et al. (28) incorporated the largest tumor diameter to provide a more accurate, clinically valuable assessment of LN burden (29).
The distance to the nipple is often used as a critical factor in assessment before nipple-sparing mastectomy. In this study, the distance between the nipple and the tumor was included in the final model because it can affect regional recurrence. Jordan et al. (30) reported that a tumor-to-nipple distance of 1 cm or less was the only significant risk factor for recurrence in univariate analysis (OR, 13.5833; p = 0.0385). The observation that the distance between the tumor and the nipple affects axillary nodal metastasis has also been verified by several studies (31,32). An obscured subcutaneous fat space was selected as a component of our model, and similar findings were reported by Zhang et al. (33). Lymphatic networks within the tumor and the fat space are interwoven, allowing subcutaneous metastasis to reach the ipsilateral axilla from the breast parenchyma.
Chang et al. (3) suggested that axillary imaging should cover at least axillary level I or II to examine lymphadenopathy. A degraded muscular status may indicate a shorter time to progression (34). Because the pectoralis muscles are anatomically well visualized on CBBCT scans, we also incorporated tumor involvement of these muscles into our analyses to study its correlation with LN metastasis. Analyses of pectoralis muscle structure may allow better diagnostic evaluation of breast cancer patients and facilitate the optimal selection of therapy (35). For high-burden prediction, the relationship between vessels and the mass was also assessed with CBBCT owing to its unique 3D visualization of the tumor and peritumoral structures. This feature is similar to the Doppler ultrasound blood flow signal of the primary tumor, whose association with metastatic LNs has been reported in a few articles. We hypothesized that abundant vasculature within or adjacent to the tumor may indicate a heavy lymphatic burden, although this feature showed no statistical value in a previous report (36).
Abnormal LN shapes on US have also been reported in previous publications (37). Wang et al. (38) found that HR and HER2 expression were associated with a high-burden LN status, and the number of suspicious LNs detected was related to the outcome when more than two LNs was used as the cutoff for predicting high burden (38). In this study, one LN was used as the cutoff because a single abnormal LN on AUS can be predictive of a low LN burden (N1-2), as demonstrated by Puri et al. (39). Other US-dependent characteristics, such as the maximum LN axis and effacement of the lymphatic hilum, are closely related to a negative LN status (40). The lymphatic hilum is the entrance for lymphatic vessels and nerves and lies in close anatomical proximity to the medulla and cortex. Nevertheless, the predictive value of the boundary between the cortex and medulla has rarely been described, and only a few studies have shown that deformity of the medulla predicts LN metastasis (41).
Overall, our models have advantages for clinical application, and several aspects should be noted. First, the models include parameters that are of practical value and reproducible. We reassessed variables that are routinely evaluated during examination to find their association with postsurgical lymphatic outcomes. This type of medical imaging evaluation is time-consuming but cannot yet be replaced by computerized methods.
Noninvasive preoperative parameters that estimate the risk of LN burden in breast cancer will be valuable in clinical practice. Our study demonstrated superior discrimination for classifying LN status using the combination of noninvasive CT and AUS, achieving an AUC equal to or higher than those reported for MRI, US or PET/CT in other studies evaluating LN burden (14). Second, although pathological results are regarded as the gold standard and could markedly improve prediction performance, they are usually available only after surgery; therefore, "bundle/vascular invasion" was not included in our models, because such factors would introduce selection bias into the final assessment. Third, neither CBBCT nor AUS alone predicted as well as their combination, suggesting that both primary tumor biology and nodal appearance affect the LN outcome. Fourth, our bidirectional feature selection approach achieved performance comparable to that of other studies in model building.
Despite the advantages offered by the approaches presented herein, there were some limitations to note.
First, features of mass-like and non-mass enhancement were not included in the study due to the limited number of patients; further investigation is needed to study the association between enhancement type and LN outcome. Second, the morphology and distribution of calcifications play a more important role in CBBCT screening than in breast MRI and US. However, calcification was not used as a predictor, to avoid compromising the model's accuracy and introducing bias. Subsequent studies should incorporate preoperative imaging features into more specific categories, which may be of greater use for clinical decision-making. Third, the robustness and reproducibility of the prediction models need to be validated in an external cohort, and a larger sample size is needed to optimize them. Fourth, because micrometastases in LNs or early-stage breast cancer can appear inconspicuous on images, they may be underestimated by preoperative examinations; further study combining computerized assessments is therefore needed. Fifth, a study showed that a subset of patients with a low SLN burden had a poor prognosis after biopsy alone, so more precise evaluation of LNs should be pursued in clinical practice (42).

Conclusion
This study provides a new perspective on combining CBBCT with US-reported nodal appearance to develop effective models that improve the preoperative prediction of LN burden in T1-2 breast cancer patients conveniently and accurately.

Availability of data and materials
The database analyzed during the current study is available from the corresponding author on reasonable request. Confidential patient data will not be shared.

Competing interests
The authors declare no competing interests.

Figure 1. Percentages of the population in both the primary and validation cohorts for the stratification of LN burden.