Dual-energy three compartment breast imaging (3CB) for novel compositional biomarkers to improve detection of malignant lesions

We explore a compositional breast imaging technique known as three compartment breast (3CB) to improve malignancy detection. The addition of 3CB compositional information to computer-aided detection (CAD) software improved malignancy predictions, resulting in an area under the receiver operating characteristic curve (AUC) of 0.81 (confidence interval (CI) of 0.74-0.88) on a held-out test set, while CAD software alone achieved an AUC of 0.69 (CI 0.60-0.78). We also identified that invasive breast cancers have a unique compositional signature characterized by reduced lipid content and increased water and protein content when compared to surrounding tissues. Clinically, 3CB may provide increased accuracy in predicting malignancy and a feasible avenue to explore compositional breast imaging biomarkers.

[...] X-ray attenuation also [...] a distinctly different [...] study is to demonstrate that compositional profiles of the [...] combined with [...] can improve specificity of breast cancer detection. A dual-energy mammography technique known as 3-Compartment Breast (3CB) imaging was used to obtain the lipid-water-protein (LWP) fractions of the breast on a pixel-by-pixel basis. The 3CB scientific principles and imaging protocols have been previously presented [19,29], as have the characteristics of malignant versus benign lesions [20,30]. To quantify the added clinical value of 3CB imaging, we compared the performance of CAD-based models in identifying malignancies without and with 3CB lesion characterization. Malignant and non-malignant masses and hormone receptor status were further studied to better understand the biological mechanism that led to increased specificity of models that include 3CB composition.


INTRODUCTION
Breast cancer is the leading cause of cancer death among women globally [1]. Early detection with screening mammography has a beneficial impact on survival and has been shown to reduce cancer mortality [2][3][4][5][6]. However, the accuracy of breast imaging technologies still has room for improvement. For instance, in the U.S., 71% of biopsies do not result in a breast cancer diagnosis, suggesting a modest specificity [7,8]. Furthermore, breast density affects the accuracy of full field digital mammography (FFDM) since dense tissue can mask tumors, diminishing the sensitivity of mammography in women with dense breasts by 10-20% compared to women with fatty breasts [9]. Compared to FFDM, digital breast tomosynthesis (DBT) increases cancer detection rates and decreases recall rates. However, the added benefit of DBT is difficult to quantify, and studies have demonstrated that positive biopsy rates following screening DBT are similar to those following screening FFDM [9,10]. Also, in a registry study including over 1.5 M screening mammograms from 46 registry sites, women with extremely dense breast tissue had neither reduced recall nor increased cancer detection rates for DBT compared to FFDM [11]. Improvements to sensitivity and specificity are needed and could result in an increase in detected malignancies and a reduction of unnecessary, benign biopsies.
The fundamental information that a radiologist uses, the attenuation of X-rays from a single exposure, has remained the same since the inception of breast imaging in 1913 [12]. Without additional information, mammography provides only relative radiopacity (i.e., tissue density relative to a background of fat) and lesion type, such as mass, asymmetry, or calcifications. Lesion classification is limited to detection of calcifications, which are often benign, as well as the shape and symmetry of high-density breast masses. Thus, lesion classification has limited reliability in predicting an invasive breast cancer. Computer-aided detection (CAD) software attempts to improve the diagnostic accuracy of mammography by using computer vision and artificial intelligence algorithms to automatically identify anomalies [13]. Yet, the fundamental information used by CAD is identical to the information radiologists use. While some have shown CAD to be clinically beneficial [14,15], others have shown that the addition of CAD yielded no significant improvement in screening sensitivity and specificity [16]. It is likely that the limit [...]

[...] breast thickness) a system of three equations was solved which resulted in the LWP thicknesses at each pixel. Absolute accuracy of this technique has been previously verified using reference standards [19,32].

Pathology results were reported on all biopsies, and radiologists delineated the mammographic abnormalities on presentation mammogram images. This work reports on results from 349 participants after exclusions. Participants were excluded if biopsy site annotation coordinates could not be correctly registered on presentation or 3CB images, if lesion pathology was incomplete, or if the 3CB data set was incomplete.
The 3CB protocol requires that images be acquired on calibration objects prior to patient imaging; the absence of calibration images, or poor image quality due to excessive movement between high-energy (HE) and low-energy (LE) image acquisition, resulted in an incomplete 3CB data set and exclusion.

The corresponding 3CB LWP thickness maps were generated for all 660 FFDM images and were used to quantify the composition within the radiologist-delineated ROIs. Standard presentation images and the corresponding 3CB composition maps can be observed in Figure 2a. Note that the 3CB images are thickness maps in which each pixel corresponds to a thickness, in centimeters, of a given composition. To abstract compositional information away from morphological features, we computationally extracted nine measurements to quantify the composition within a given region. These nine measurements were the mean, median, standard deviation, minimum, maximum, kurtosis, skew, total, and percentage value of all pixels contained within an ROI.

Three additional outer ROIs were derived from the lesion ROI to capture the background, or tissue immediately surrounding a lesion (see Figure 2b). Each outer region captured all pixels extending 2 mm from the border of the previous region. Therefore, the first, second, and third outer regions contain all pixels extending from the edge of the lesion ROI out to 2 mm, from the edge of the first outer region out to 2 mm, and from the edge of the second outer region out to 2 mm, respectively. In other words, in relation to the lesion border, the first, second, and third outer regions span 0 to 2 mm, 2 to 4 mm, and 4 to 6 mm, respectively. For each lesion, we obtained nine compositional measurements from four ROIs (lesion and three outer regions) on each of the three compositional images (3CB LWP maps), which resulted in 9 x 4 x 3 = 108 compositional features per lesion ROI.
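The feature count described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function and key names are hypothetical, population (biased) moments are assumed for skew and kurtosis, and "percentage" is assumed here to mean the share of one component's total thickness relative to the summed lipid, water, and protein thickness in the same region, which the text does not define.

```python
import math
import statistics

STATS = ("mean", "median", "std", "min", "max",
         "kurtosis", "skew", "total", "percentage")
MAPS = ("lipid", "water", "protein")
REGIONS = ("lesion", "outer1", "outer2", "outer3")

def roi_summary(values, all_component_total):
    """Nine summary values for one region on one thickness map.

    values: per-pixel thicknesses (cm) of a single component.
    all_component_total: summed L+W+P thickness over the same pixels
    (assumed denominator for the 'percentage' measurement).
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std > 0:
        # Standardized third and fourth central moments (excess kurtosis).
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4) - 3.0
    else:
        skew = kurt = 0.0
    total = math.fsum(values)
    pct = 100.0 * total / all_component_total if all_component_total else 0.0
    row = (mean, statistics.median(values), std, min(values), max(values),
           kurt, skew, total, pct)
    return dict(zip(STATS, row))

def lesion_features(pixels):
    """pixels[region][map] is a list of per-pixel thicknesses (cm)."""
    features = {}
    for region in REGIONS:
        region_total = math.fsum(math.fsum(pixels[region][m]) for m in MAPS)
        for m in MAPS:
            summary = roi_summary(pixels[region][m], region_total)
            for name, value in summary.items():
                features[f"{m}_{region}_{name}"] = value
    return features

# Toy input: the same made-up pixel values in every region.
toy = {r: {"lipid": [1.0, 2.0, 3.0, 4.0],
           "water": [0.5, 0.5, 1.0, 1.0],
           "protein": [0.1, 0.2, 0.3, 0.4]} for r in REGIONS}
features = lesion_features(toy)
print(len(features))
```

Keeping the region, map, and statistic in each feature name makes it straightforward to trace any model input back to a specific measurement.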

Clinical CAD lesion detection
Low-energy, standard FFDMs were processed using commercial CAD software (SecondLook, version 7.2, iCAD, Nashua, NH) to identify suspicious masses and calcifications. The CAD software uses a proprietary algorithm to delineate suspicious ROIs for masses and individual calcifications and assigns a probability of malignancy to each delineation. Note that for input to our analysis, we used the calcification cluster ROI rather than each individual calcification ROI. Calcification cluster ROIs were calculated using the convex hull, or minimum envelope, which encompasses all calcifications associated with a cluster. Therefore, CAD-delineated ROIs used in our final analysis may consist of either a suspicious mass or a calcification cluster.

From 660 images, CAD delineated 1187 ROIs. Only 418 CAD-delineated ROIs had a 25% or greater overlap with the radiologist-delineated biopsy sites. Overlapping ROIs were included in the modeling dataset, and the 108 3CB features were extracted for each ROI (see Figure 1a).

The 769 non-overlapping CAD ROIs were excluded from our analysis because they did not overlap biopsy sites and thus a pathology diagnosis could not be confirmed. Of the patients with DCIS pathologies, CAD failed to delineate any ROI on one patient, resulting in a complete miss. CAD missed one delineation on the CC view for one patient and missed another delineation on the MLO view for another patient. In total, four DCIS ROIs were not identified by CAD. Of patients with IDC pathologies, CAD completely missed delineations on seven patients, missed three delineations on the MLO view, and missed one delineation on the CC view. In total, 18 IDC lesions were not identified by CAD but were delineated by the radiologists and are present in the final data set.

Predictive modeling with morphology and 3CB
The final dataset, consisting of 1107 ROIs (689 radiologist-delineated and 418 CAD-delineated, see Figure 1a) from 349 patients, was split by patient into train, validation, and test sets using a 60%, 20%, 20% split. The data were split by patient ID such that all ROIs for a given patient remained exclusively in one of the three datasets. This condition prevented data leakage: ROIs from a single patient, which are highly correlated, could not end up in both the training and test sets, for example. To reiterate, the train, validation, and test data sets each contained their own unique subset of patients and patient ROIs, and the test set contained 20% of the patients.

A neural network model was trained to predict malignancy probability from the 108 extracted 3CB features and the prediction from CAD. CAD predicts probabilities of malignancy rather than specific lesion type. To compare against CAD performance, target labels were created for our dataset which combined BN and FA pathologies into a non-malignant label. ROIs with DCIS and IDC pathologies were likewise combined into a malignant label. The final model was trained to output the probability of malignancy against these new targets.

On the unseen, independent hold-out test set, the commercial CAD output of probability of malignancy resulted in a mean area under the receiver operating characteristic (ROC) curve of 0.69 and a 95% confidence interval (CI) of 0.60-0.78. On this same test set, the neural network model, which used both morphological features captured by CAD and compositional features derived from 3CB, resulted in a mean area under the curve (AUC) of 0.81 and a CI of 0.74-0.87. Bootstrapping (1000 bootstrap samples) was used to calculate the mean AUC and 95% confidence intervals for the ROC curves presented in Figure 3a.
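The patient-level split and the bootstrapped AUC described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the bootstrap resamples individual ROIs (the text does not say whether ROIs or patients were resampled), and the interval uses the simple percentile method.

```python
import random

def split_by_patient(patient_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return ROI indices for train/val/test so that no patient spans two sets."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    groups = {
        "train": set(unique[:n_train]),
        "val": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),
    }
    return {name: [i for i, pid in enumerate(patient_ids) if pid in members]
            for name, members in groups.items()}

def auc(labels, scores):
    """AUC as the probability that a malignant ROI outscores a benign one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Mean AUC and percentile 95% CI over bootstrap resamples of ROIs.

    Requires both classes to be present in the input; resamples that
    contain a single class are redrawn.
    """
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            aucs.append(auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    mean = sum(aucs) / n_boot
    return mean, aucs[int(0.025 * n_boot)], aucs[int(0.975 * n_boot) - 1]

# Toy usage: 10 hypothetical patients with 3 ROIs each.
ids = [f"P{i // 3}" for i in range(30)]
parts = split_by_patient(ids)
print({name: len(idx) for name, idx in parts.items()})
```

Splitting on the sorted set of patient IDs before shuffling keeps the partition reproducible for a given seed regardless of the order in which ROIs are listed.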

Quantifying the added diagnostic benefit of 3CB for malignancy prediction
To quantify the added value of 3CB's compositional information over information derived from standard clinical imaging, we calculated the integrated discrimination improvement (IDI) and net reclassification improvement (NRI) of malignancy prediction [33,34]. IDI and NRI offer additional insight into the benefits of new biomarkers by evaluating [...] for non-events is 13.2%, and the overall IDI, which is the summation of the IDI for events and non-events, is 12.1%.

Figure 3b also allows for the investigation of lesion reclassification with respect to BI-RADS assessment categories. Vertical dashed lines indicate the borders between BI-RADS categories. From left to right, the lines indicate the BI-RADS 3/4a, 4a/4b, 4b/4c, and 4c/5 borders at risk thresholds of 2%, 10%, 50%, and 95% [35], respectively. NRIs for events and non-events were calculated at each BI-RADS risk threshold. For events, or malignant lesions, a positive NRI indicates that the new model, which includes 3CB, more correctly predicted a lesion's malignancy probability, resulting in a higher score. For non-events, or benign lesions, a positive NRI indicates that the new model correctly changed a lesion's malignancy probability from a higher to a lower score. The total NRI for a given threshold is the summation of the event and non-event NRIs. The total NRI at each BI-RADS risk threshold (3/4a, 4a/4b, 4b/4c, 4c/5) is 4%, 15%, 29%, and -28%, respectively. The overall NRI, which is the summation of all event and non-event NRIs across all thresholds, is 25%. A breakdown of the classification of each test set ROI by both the reference and new models, as well as their NRIs, is presented in Figure 3c.

Previous work showed that compositional features were predictive of lesion pathology and that models could be built to reasonably identify malignancies from composition alone [38]. Like trained radiologists, CAD software only has morphology, texture, and image opacity available to make malignancy probability decisions. When these morphologic features were combined with compositional features from 3CB in our neural network model, the AUC on the test set increased. The IDI and NRI analysis showed that the boost in AUC is attributable to increased specificity via the reduction of false positives, that is, lowering the malignancy probability of non-malignant lesions to which CAD had previously assigned a high probability. The potential reduction in false positives is highlighted by the large red area between the new and reference models in Figure 3b. The addition of 3CB features also resulted in more accurate BI-RADS classification of malignant lesions and reclassification to lower BI-RADS categories for non-malignant lesions, thus demonstrating an increase in confidence with respect to the decision to biopsy. Using 3CB imaging to increase specificity has the potential to be clinically beneficial with only minimal additional risk (10% additional dose from the acquisition of a second, high-energy mammogram). It should also be noted that this method of [...]

[...] lipid signature, which is the difference between the lesion and its surrounding region, increases as the region of comparison is moved further away from the lesion border. This further supports the aggressive growth nature of invasive cancers, in that they begin to metabolize lipid from their peripheries. While non-malignant lesions were significantly different from malignant lesions for all composition types (LWP), it is likely that predictions were primarily driven by the lipid compositions since that signature is the greatest. There is a positive correlation between the magnitude of each compositional signature and the distance away from the lesion border. In other words, there is a gradient in tissue composition such that the composition differs more from normal breast tissue nearer the lesion. Although our models and analysis are focused on detection, gradient compositional changes of the breast could be useful in a screening situation as well.

Recall that 108 3CB features were purposely extracted from the image in order to abstract the compositional information away from morphology. However, the entire 3CB thickness maps of LWP afford more information, by many orders of magnitude, than what was captured by the 108 features we used. In addition, there are more [...]

A reader study demonstrated CAD's ability to improve radiologists' ability to detect breast cancers [39].
Since we demonstrated improvements to CAD prediction with 3CB, it is reasonable to presume that the addition of 3CB would also further improve radiologists' ability to accurately detect and classify lesions. The translational clinical benefit of 3CB is the increased confidence in the decision to biopsy, which has the potential to reduce unnecessary biopsies.

METHODS

3CB Imaging
All images were acquired from women scheduled for percutaneous breast biopsy, prior to their biopsy. We used a single Hologic Selenia full-field digital mammography system (Hologic, Inc., Bedford, MA) to image women with 3CB. This particular system configuration has a molybdenum X-ray anode and two internal X-ray filters of either molybdenum or rhodium. Two mammograms were acquired on each woman's affected breast using a single compression. The first exposure was made to mimic clinical screening mammogram conditions, such that the Selenia's internal software chose the voltage and current settings based on breast thickness, usually below 30 kVp. The second mammogram was acquired at a fixed voltage (39 kVp) and current for all participants. An additional 3-mm thick X-ray filter was placed in the beam path to remove more of the low-energy X-rays. A high-energy exposure (39 kVp/Rh filter) was made using an additional 3-mm plate of copper in the beam to increase the average energy of the high-energy image. The 39 kVp high-energy voltage is the highest obtainable voltage on the Selenia unit. We limited the total dose of this procedure to approximately 110% of the mean glandular dose of an average screening mammogram. The images were collected under an institutional review board approval to measure breast composition. The calibration standards and 3CB algorithms for generating compositional thickness maps have been previously described in full [32,40]. [...]

[...] delineations. Presentation images used for radiologist readings and delineations are co-registered with the resulting 3CB thickness maps. Therefore, the delineation coordinates are projected onto the exact same location on all 3CB thickness maps.

CAD Delineations
Patients' diagnostic mammograms, in raw DICOM format, were pushed to a local iCAD PowerLook server running CAD software. Using iCAD's proprietary CAD algorithm at the most sensitive setting, the software delineated suspicious masses and each individual calcification within a calcification cluster. CAD also assigned a probability of malignancy for each suspicious mass and calcification cluster. The x and y coordinates of each CAD delineation were captured and stored in an output Extensible Markup Language (XML) file.

Some patients present with numerous calcifications, and the total number of individual calcifications significantly outnumbers the total number of masses identified by CAD. To address this possible imbalance, new delineations were generated which delineated one ROI per calcification cluster rather than using the delineations of all individual calcifications. CAD automatically indexed and grouped each calcification. The new ROI was calculated by taking all calcifications within a cluster, extracting the set of x and y coordinates, and calculating the convex hull for that set of coordinates [41,42].
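As an illustration of the cluster ROI construction described above, the sketch below computes a convex hull with Andrew's monotone chain algorithm. This particular algorithm is a choice made for the example; the text does not state which convex hull routine was used, and the function name and toy coordinates are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2D points, counter-clockwise, via Andrew's monotone chain.

    points: iterable of (x, y) tuples, e.g. the pooled delineation
    coordinates of all calcifications in one cluster.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Toy cluster: four corner points plus interior and collinear points.
cluster = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (2, 0)]
hull = convex_hull(cluster)
print(hull)
```

Using `<= 0` in the turn test drops collinear points, so the hull keeps only the vertices needed to enclose the cluster.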

3CB Feature Engineering
Three outer region ROIs were generated around the lesion ROI to evaluate the area and tissue immediately surrounding a lesion. Each outer region ROI was 2 mm thick, so its outer edge lay 2 mm from the border of the previous ROI. Using the lesion ROI coordinates, a Euclidean distance transform [42] was used to compute coordinates 2 mm away from the lesion border for the first outer region. The same method was used to compute the second and third outer regions, using the first and second outer regions, respectively, as the reference for the Euclidean distance transform. As a result, each lesion has a total of four ROIs: one for the lesion and three outer regions, each 2 mm thick, whose outer edges lie 2, 4, and 6 mm from the lesion ROI (see Figure 2b).

Single numeric values, which include the mean, median, standard deviation, minimum, maximum, kurtosis, skew, total, and percentage, were captured to characterize each ROI. These nine values were calculated for all four lesion ROIs on each of the three 3CB thickness maps. A total of 108 compositional features were extracted from every lesion.
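The outer-region construction can be sketched with a brute-force Euclidean distance computation. This is a simplified illustration, not the authors' pipeline: it bins every background pixel by its distance to the nearest lesion pixel, rather than applying the distance transform iteratively to each previous region, and the function name, the `pixel_mm` spacing parameter, and the toy mask are assumptions.

```python
import math

def outer_regions(mask, pixel_mm, width_mm=2.0, n_rings=3):
    """Assign background pixels to concentric rings around a lesion mask.

    mask: 2D list of 0/1 values, where 1 marks a lesion pixel.
    pixel_mm: assumed isotropic pixel spacing in millimeters.
    Ring k holds pixels whose distance d to the nearest lesion pixel
    satisfies k * width_mm < d <= (k + 1) * width_mm.
    """
    lesion = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    rings = [set() for _ in range(n_rings)]
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v:
                continue  # lesion pixels belong to the lesion ROI itself
            d = min(math.hypot(r - lr, c - lc) for lr, lc in lesion) * pixel_mm
            k = math.ceil(d / width_mm) - 1
            if 0 <= k < n_rings:
                rings[k].add((r, c))
    return rings

# Toy mask: a single lesion pixel at the center of a 13 x 13 grid, 1 mm pixels.
size = 13
mask = [[1 if (r, c) == (6, 6) else 0 for c in range(size)] for r in range(size)]
rings = outer_regions(mask, pixel_mm=1.0)
print([len(ring) for ring in rings])
```

The brute-force search is quadratic in the number of pixels and is only practical for small crops; a dedicated distance transform routine would be the usual choice on full-resolution mammograms.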

Data Augmentation
To combat overfitting of our model, data augmentation was implemented at the level of ROI delineations. The 3CB thickness maps contain the point thicknesses of a given composition (LWP) on a pixel-by-pixel basis, and the 108 extracted features are derived from all pixels within an ROI. It is unlikely that an ROI delineation perfectly captures all pixels corresponding to a lesion while perfectly excluding pixels corresponding to normal breast tissue. In addition, because of human variability, repeated delineations of the same lesion will not yield exactly the same coordinates, even when they originate from the same radiologist. Therefore, our augmentation strategy was meant to account for the possible variability involved in delineating ROIs. ROIs were rotated within a range 10 [...]

From left to right is the standard presentation craniocaudal mammogram used for reading by radiologists, the lipid thickness map, the water thickness map, and the protein thickness map. Grayscale colorbars, adjacent to the 3CB thickness maps, indicate thickness in cm. b, The composition of the background, or tissue surrounding a lesion, was measured progressively by capturing three outer regions extending from the border of the lesion (yellow solid line). The outer regions extend from the lesion border to distances of 2 mm (orange dot-dashed line), 4 mm (cyan dotted line), and 6 mm (magenta dashed line). c, CAD delineations that had some agreement with radiologist ROIs (yellow line) were included in the final dataset. CAD delineates suspicious masses (cyan dot-dashed line) and calcification clusters (magenta dotted line). Outer regions for all ROIs (radiologist- and CAD-delineated) were calculated but are not displayed in this sub-figure for easy viewing.

[...] The IDI is the sum of the IS and IP (-1.06 + 13.17), which is 12.11, and a positive IDI indicates that predictive models benefit from the addition of 3CB.
The borders of the BI-RADS assessment categories are indicated by the vertical dashed lines. NRIs for events, or cancers (black), and non-events, or benigns (red), are calculated at each BI-RADS border to demonstrate 3CB's effect on specificity with respect to each BI-RADS category. c, This table shows that adding 3CB allows for more accurate BI-RADS classification, as determined by probability of malignancy, for lesions with both malignant and non-malignant pathologies, or events and non-events. The NRI for events and non-events is -0.02 and 0.25. The overall NRI, which is the sum of NRI events and non-events, is 0.25.
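The IDI and NRI quantities reported for Figure 3 can be illustrated with a short sketch. It follows the standard definitions of these statistics and is not taken from the authors' code: the function names, the convention that a probability at or above the threshold counts as the higher category, and the toy probabilities are all assumptions.

```python
def _mean(xs):
    return sum(xs) / len(xs)

def idi(labels, p_ref, p_new):
    """Integrated discrimination improvement of a new model over a reference.

    Returns (events, non_events, total). The event term is the rise in mean
    predicted probability among malignant lesions; the non-event term is the
    drop in mean predicted probability among benign lesions.
    """
    ev_ref = [a for y, a in zip(labels, p_ref) if y == 1]
    ev_new = [b for y, b in zip(labels, p_new) if y == 1]
    ne_ref = [a for y, a in zip(labels, p_ref) if y == 0]
    ne_new = [b for y, b in zip(labels, p_new) if y == 0]
    events = _mean(ev_new) - _mean(ev_ref)
    non_events = _mean(ne_ref) - _mean(ne_new)
    return events, non_events, events + non_events

def nri_at_threshold(labels, p_ref, p_new, threshold):
    """Two-category net reclassification improvement at one risk threshold.

    Returns (events, non_events, total). Events gain credit for moving up
    across the threshold, non-events for moving down across it.
    """
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for y, a, b in zip(labels, p_ref, p_new):
        count[y] += 1
        if a < threshold <= b:
            up[y] += 1
        elif b < threshold <= a:
            down[y] += 1
    events = (up[1] - down[1]) / count[1]
    non_events = (down[0] - up[0]) / count[0]
    return events, non_events, events + non_events

# Toy data: two malignant (1) and two benign (0) lesions with made-up scores.
labels = [1, 1, 0, 0]
p_ref = [0.6, 0.4, 0.5, 0.3]   # hypothetical reference model (CAD alone)
p_new = [0.8, 0.5, 0.2, 0.1]   # hypothetical new model (CAD plus 3CB)
print(idi(labels, p_ref, p_new))
print(nri_at_threshold(labels, p_ref, p_new, threshold=0.5))
```

Evaluating `nri_at_threshold` at each of the BI-RADS risk thresholds quoted in the text (2%, 10%, 50%, and 95%) would give one event and one non-event NRI per category border.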