Machine learning-based image analysis for accelerating the diagnosis of complicated preneoplastic and neoplastic ductal lesions in breast biopsy tissues

Diagnosis of breast preneoplastic and neoplastic lesions is difficult because of their similar morphology in breast biopsy specimens. To diagnose these lesions, pathologists perform immunohistochemical analysis and consult with expert breast pathologists. These additional examinations are time-consuming and expensive. Artificial intelligence (AI)-based image analysis has recently improved and may help in routine pathological diagnosis. Here, we show the significance of machine learning-based image analysis of breast preneoplastic and neoplastic lesions for facilitating high-throughput diagnosis. Images were obtained from normal mammary glands, hyperplastic lesions, preneoplastic lesions, and neoplastic lesions, such as usual ductal hyperplasia (UDH), columnar cell lesion (CCL), ductal carcinoma in situ (DCIS), and DCIS with comedo necrosis (comedo DCIS), in breast biopsy specimens. The original enhanced convolutional neural network (CNN) system was used for analyzing the pathological images. The AI-based image analysis provided the following area under the curve (AUC) values: normal mammary gland versus DCIS, 0.9902; DCIS versus comedo DCIS, 0.9942; normal mammary gland versus CCL, 0.9786; and UDH versus DCIS, 1.000. Multiple comparison analysis showed precision and recall scores similar to those of single comparison analysis. Based on gradient-weighted class activation mapping (Grad-CAM), which was used to visualize the important regions reflecting the result of CNN analysis, the ratio of stromal tissue in the whole weighted area was significantly higher in UDH and CCL than in DCIS. These analyses may provide a more accurate and rapid pathological diagnosis for patients. Moreover, Grad-CAM may identify previously unrecognized histological characteristics that are important for new pathological findings and research targets for understanding diseases.


Introduction
Breast cancer is a major cause of death in women [1,2]. Breast cancer treatments improve the prognosis of patients with early-stage disease, but patients with advanced-stage disease still have a poor prognosis [3][4][5].
Therefore, early diagnosis of non-invasive carcinoma or precancerous lesions is most important for preventing the development of advanced breast cancer. Pathological examinations are essential for detecting these lesions. The most recent classification of breast ductal lesions into normal mammary ducts, hyperplastic lesions, precancerous lesions, and carcinomas is based on differences in biological behavior established by basic and clinical research [6,7]. These lesions differ in their proliferative behavior and genetic background [6,8], and a precise diagnosis of these lesions contributes to good clinical outcomes for patients. However, for the diagnosis of complicated ductal lesions, pathologists need to perform several types of immunohistochemical staining that require additional time and cost [8,9]. Moreover, despite meticulous analysis, the final diagnosis of these lesions differs between pathologists because of the difficulty of morphological and immunohistochemical assessments [10,11].
The field of artificial intelligence (AI) has become closely linked with pathological diagnosis over the past decade. Along with improvements in image processing technologies, including whole slide imaging (WSI), the AI approach has been integrated into pathological diagnosis [12][13][14][15]. AI-based pathological image analysis is now being used to predict cancer recurrence [16,17], therapeutic outcomes [16], and genetic mutations [18]. The U.S. Food and Drug Administration has recently approved a WSI system for digital pathology, and the use of AI-based pathological diagnosis as a robust supporting tool for pathologists is accelerating. AI-based pathological image analysis can process highly detailed and voluminous information that pathologists cannot readily assess. For example, AI can detect slight differences in nuclear features and gland angularity that are difficult for a pathologist to identify by microscopic inspection [19,20].
In the field of breast cancer pathology, AI-based image analysis is important for more accurate diagnosis. Recent reports have shown that machine learning-based image analysis can distinguish invasive from non-invasive cancer [17,21]. However, these systems did not distinguish finer categories of benign, preneoplastic, and neoplastic lesions, such as usual ductal hyperplasia (UDH), columnar cell lesion (CCL), and ductal carcinoma in situ (DCIS) with comedo pattern necrosis (comedo DCIS) [17,21]. Thus, to overcome this problem in the pathological diagnosis of breast cancer, detailed assessment of early tumorigenic lesions and non-malignant lesions is essential for providing appropriate treatment for patients.
In this study, we performed AI-based analysis on four ductal lesions with similar morphological characteristics but clinically different therapeutic outcomes (UDH, CCL, DCIS, and comedo DCIS). The results suggest that this approach is promising as a novel supporting tool for routine pathological diagnosis and may identify newer morphological characteristics for an improved diagnosis of breast lesions.

Patients and histological material
This study included 125 biopsy cases (including 15 open biopsy cases) comprising 222 slides: 77 cases of DCIS (including 23 cases of comedo DCIS), 13 cases of CCL, 23 cases of UDH, and 113 cases with normal mammary glands. All cases were obtained between 2010 and 2020. Each biopsy slide included 2-6 tissues. The average age of patients with DCIS, comedo DCIS, CCL, and UDH lesions was 58.0, 55.9, 47.6, and 49.5 years, respectively. A total of 595 images, comprising 110 images of DCIS, 75 images of comedo DCIS, 150 images of CCL, 110 images of UDH, and 150 images of normal mammary glands, were used for AI training or testing. The workflow is shown in Fig. 1. UDH cases were subjected to immunohistochemical examinations for ER, CK5/6, or p63 to distinguish them from DCIS. All cases were obtained from the diagnostic pathology division database at the Kanagawa Cancer Center Hospital. All images were obtained from the biopsy samples. All diagnostic criteria for DCIS, comedo DCIS, CCL, and UDH were based on the 5th edition of the World Health Organization's blue book [7]. Six pathologists (T.Y., K.K., E.Y., Y.O., S.S., M.S., and K.W.) separately diagnosed these cases, and S.S. reviewed all cases. Institutional review boards at the Kanagawa Cancer Center and Chiba University approved all the test set study activities.
Obtaining images of each lesion from biopsy samples
All hematoxylin and eosin (H&E)-stained slides were digitized using the Leica Aperio CS2 (Leica Biosystems Imaging, Inc., Wetzlar, Germany). All images were then analyzed using QuPath-0.2.0-m7 (open-source software [19]). Images were captured at 10x and 20x magnification. Each image included only one lesion and excluded any artifacts.
Model construction and training of the convolutional neural network (CNN) for pathological image analysis
Python version 3.6.7 and Keras version 2.2.4, with TensorFlow version 1.14.0 as the backend, were used to build the CNN architecture. In the present study, we used the InceptionV3 architecture, which had previously been trained on ImageNet. Inception-v3 is a member of the Inception family of convolutional neural network architectures and includes several improvements, such as label smoothing, factorized 7x7 convolutions, and the use of auxiliary classifiers to propagate label information lower down the network [22]. The input images were scaled down to 299 × 299 pixels. We then fine-tuned the model with the dataset of images of DCIS, comedo DCIS, flat epithelial atypia (FEA), columnar cell hyperplasia, and normal glands. Weights in the first 249 layers were frozen, and weights in the other layers were retrained with our data. The network was trained for 100 epochs with a learning rate of 0.1 that was reduced if no improvement was observed. Training convergence was monitored using the cross-entropy loss. All images were randomly augmented using ImageDataGenerator (https://keras.io/preprocessing/image/) with a rotation range of 180°, a width shift range of 0.2, a height shift range of 0.2, a brightness range of 0.3-1.0, and horizontal and vertical flips, each with a probability of 50%. The CNN was trained and validated using a computer with a GeForce RTX 2060 graphics processing unit (NVIDIA, Santa Clara, CA), a Core i7-9750 central processing unit (Intel, Santa Clara, CA), and 16 GB of random access memory. To evaluate the performance of the CNN, we plotted the receiver operating characteristic (ROC) curve and calculated the area under the curve (AUC). We also calculated the precision and recall to assess the diagnostic accuracy of the CNN model.
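The learning-rate rule described above (an initial rate of 0.1 that is reduced when no improvement is observed) can be illustrated without any deep learning framework. The following is a minimal sketch only: the class name `ReduceOnPlateau`, the reduction factor, the patience, and the minimum rate are hypothetical values chosen for illustration, since the text does not state them. In Keras, comparable behavior is commonly obtained with the `ReduceLROnPlateau` callback, although the text does not say which mechanism was used.

```python
class ReduceOnPlateau:
    """Lower the learning rate when the monitored loss stops improving.

    The factor, patience, and min_lr defaults are illustrative assumptions;
    only the initial rate of 0.1 comes from the text.
    """

    def __init__(self, initial_lr=0.1, factor=0.5, patience=5, min_lr=1e-6):
        self.lr = initial_lr
        self.factor = factor        # assumed multiplier applied on a plateau
        self.patience = patience    # assumed number of stagnant epochs tolerated
        self.min_lr = min_lr        # assumed lower bound for the rate
        self.best_loss = float("inf")
        self.stagnant_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return the rate for the next epoch."""
        if loss < self.best_loss:
            # Improvement: remember the new best loss and reset the counter.
            self.best_loss = loss
            self.stagnant_epochs = 0
        else:
            self.stagnant_epochs += 1
            if self.stagnant_epochs >= self.patience:
                # Plateau reached: shrink the rate, but not below min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.stagnant_epochs = 0
        return self.lr


# Hypothetical per-epoch cross-entropy losses, for illustration only.
scheduler = ReduceOnPlateau(initial_lr=0.1, factor=0.5, patience=2)
example_losses = [1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]
example_rates = [scheduler.step(loss) for loss in example_losses]
print(example_rates)
```

With these illustrative losses, the rate is halved after each run of two epochs without improvement, so training starts with large steps and takes smaller ones once the loss stalls.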

Statistical analysis
For experimental data analysis, GraphPad Prism 9 software (GraphPad Software, San Diego, California, USA) was used. The normality assumption was tested using one-way ANOVA for the analysis of tumor-stromal area. The Tukey-Kramer method was used for multiple comparisons in the analysis of tumor-stromal areas (Fig. 7). A p value < 0.05 was considered statistically significant.
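As an illustration of the one-way ANOVA named above, the F statistic compares the variance between group means with the variance within groups. The sketch below is a plain-Python illustration, not the GraphPad Prism procedure used in the study; the function name `one_way_anova_f` and the example values are hypothetical, and the p value (which requires the F distribution) and the Tukey-Kramer post hoc test are deliberately left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three groups, for illustration only.
example_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(example_groups))  # (3.0, 2, 6)
```

A large F indicates that the group means differ by more than the within-group scatter would suggest, which is the situation in which pairwise post hoc comparisons become informative.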

Experiment 1: DCIS vs. normal
Distinguishing non-invasive cancer from normal mammary glands
First, we attempted to distinguish non-invasive cancer from normal mammary glands to test the AI-based diagnosis system. The differential diagnosis between normal mammary glands and non-invasive cancer (DCIS) is essential but not difficult for pathologists. Therefore, the ability to distinguish them is an indispensable baseline requirement for our system. The CNN analysis successfully distinguished DCIS from normal mammary glands (Fig. 2; AUC: 0.9902, precision: 0.935, and recall: 0.953). These results met this requirement, and thus, we applied the analysis to all other benign, preneoplastic, and neoplastic lesions.

Experiment 2: DCIS vs. comedo DCIS
Among DCIS lesions, the comedo DCIS pattern, which has necrosis at the center of the DCIS, has a more aggressive phenotype. The comedo DCIS pattern is a risk factor for recurrence after breast-conserving surgery [23,24] and is also correlated with the expression of poor prognostic markers [25,26]. The comedo DCIS pattern is rare and usually mixed with other DCIS lesions, so it can be overlooked during screening. Therefore, accurately identifying the comedo DCIS pattern is vital for pathological analysis. In the CNN analysis of DCIS versus comedo DCIS, the AUC was very high (0.9942). Moreover, the precision and recall were 0.972 and 0.954, respectively (Fig. 3).
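The AUC, precision, and recall values reported for the binary comparisons can be computed from per-image classifier scores. The sketch below is a minimal plain-Python illustration, not the authors' evaluation code: the function names, the decision threshold of 0.5, and the example labels and scores are all assumptions. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative image.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall of the positive class at an assumed threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = DCIS, 0 = normal) and scores, for illustration only.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(roc_auc(example_labels, example_scores))
print(precision_recall(example_labels, example_scores))
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, and two of the three images predicted as positive are truly positive. Unlike precision and recall, the AUC does not depend on the chosen threshold, which is one reason the two kinds of metric are usually reported together.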

Experiment 3: CCLs vs. normal
Distinguishing CCLs from normal mammary glands
CCLs include columnar cell hyperplasia and FEA. FEA has more neoplastic genetic alterations than normal mammary glands [27][28][29]. Additionally, CCLs, including columnar cell hyperplasia, have similar neoplastic genetic alterations [30,31]. At present, CCL cases need to be followed up closely [32][33][34]. Therefore, distinguishing CCLs from normal mammary glands is important in breast biopsy tissue. We therefore applied our system to this comparison. For normal vs. CCLs, the AUC was 0.9786, and the precision and recall were 0.935 and 0.953, respectively (Fig. 4).

Experiment 4: DCIS vs. UDH
Distinguishing hyperplastic lesions from DCIS
Next, we focused on hyperplastic lesions in the mammary glands. UDH is a common lesion in mammary tissue. However, its dense proliferation of mammary gland epithelium resembles DCIS, and additional immunohistochemical examination is needed for the differential diagnosis between DCIS and UDH [35,36]. To simplify this differential diagnosis process, we attempted to distinguish between DCIS and UDH using the machine learning system. Although the morphological structures of DCIS and UDH were very similar, the AUC was 1.000, and the precision and recall were both 1.000 (Fig. 5).

Experiment 5: Distinguishing all 5 lesions
Finally, we attempted to simultaneously distinguish all five lesions (normal mammary glands, CCL, UDH, DCIS, and comedo DCIS) to test our system's applicability for supporting daily pathological diagnoses. The average precision and recall across the five lesions were 0.923 (0.863-0.973) and 0.927 (0.880-0.991), respectively (Fig. 6). These results are similar to those of the individual comparisons performed in experiments 1-4.
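An average precision and recall across five classes, reported together with a range, is typically obtained by computing per-class values from a confusion matrix and then averaging them (macro-averaging). The sketch below assumes that scheme, which the text does not state explicitly; the function names and the confusion matrix counts are hypothetical and do not reproduce the study's data.

```python
CLASSES = ["normal", "CCL", "UDH", "DCIS", "comedo DCIS"]


def per_class_precision_recall(confusion):
    """Per-class (precision, recall) pairs from a square confusion matrix.

    confusion[i][j] is the number of images of true class i predicted as j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])                           # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results


def macro_average(values):
    """Unweighted mean, so that every class counts equally."""
    return sum(values) / len(values)


# Hypothetical counts (rows: true class, columns: predicted class).
example_confusion = [
    [9, 1, 0, 0, 0],
    [1, 8, 1, 0, 0],
    [0, 1, 9, 0, 0],
    [0, 0, 0, 9, 1],
    [0, 0, 0, 1, 9],
]
example_metrics = per_class_precision_recall(example_confusion)
for name, (precision, recall) in zip(CLASSES, example_metrics):
    print(name, precision, recall)
print(macro_average([p for p, _ in example_metrics]),
      macro_average([r for _, r in example_metrics]))
```

Macro-averaging gives rare classes, such as comedo DCIS, the same weight as common ones, so a weak class is not hidden by a strong majority class. Reporting the minimum and maximum per-class values alongside the mean, as done above, serves the same purpose.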

Experiment 6: Feedback from machine learning-based analysis to practical microscopic findings
From these results, especially those of the UDH vs. DCIS analysis, we speculated that the evaluation criteria or process that the machine learning system applies to each lesion may differ from those of human pathologists. Therefore, we used gradient-weighted class activation mapping (Grad-CAM) to visualize the important regions reflecting the results of the CNN analysis and to identify newer morphological characteristics to support pathological diagnosis. According to the Grad-CAM results, although current pathological diagnosis focuses only on the morphology of the epithelial structure, the AI analysis concentrated more on stromal tissue (Fig. 7). The stromal tissue ratio in the whole weighted area (red- and yellow-colored area) was much higher in normal mammary glands than in the other lesions (Fig. 7). The average percentages of epithelial and stromal tissue in the weighted area were 70.5% and 29.5% in DCIS, 27.4% and 72.6% in normal mammary glands, 59.3% and 40.7% in UDH, 57.8% and 42.2% in CCL, and 68.9% and 31.1% in comedo DCIS, respectively (Fig. 7). Statistical analysis showed that the ratio of stromal tissue in the whole weighted area was significantly higher in normal mammary glands than in all other lesions (Fig. 7). Moreover, the stromal tissue ratio in UDH and CCL was significantly higher than that in DCIS (Fig. 7).
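The stromal ratio within the Grad-CAM weighted area can be expressed as a simple pixel count. The sketch below is one possible formulation under stated assumptions, since the text does not describe how the areas were measured: it assumes a heatmap normalized to the range 0-1, a hypothetical threshold of 0.5 standing in for the "red and yellow" region, and a stroma mask that would have to come from a separate annotation or segmentation step.

```python
def stromal_fraction(heatmap, stroma_mask, threshold=0.5):
    """Fraction of highly weighted Grad-CAM pixels that lie on stroma.

    heatmap:     2D list of activation values, assumed normalized to 0-1
    stroma_mask: 2D list of booleans, True where the pixel is stromal tissue
    threshold:   assumed cut-off defining the "weighted area"
    """
    weighted_pixels = 0
    stromal_pixels = 0
    for heat_row, mask_row in zip(heatmap, stroma_mask):
        for heat, is_stroma in zip(heat_row, mask_row):
            if heat >= threshold:
                weighted_pixels += 1
                if is_stroma:
                    stromal_pixels += 1
    if weighted_pixels == 0:
        raise ValueError("no pixel reaches the threshold")
    return stromal_pixels / weighted_pixels


# Hypothetical 2 x 3 heatmap and mask, for illustration only.
example_heatmap = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.6],
]
example_mask = [
    [True, False, True],
    [True, True, False],
]
print(stromal_fraction(example_heatmap, example_mask))  # 0.5
```

The epithelial fraction is the complement of this value when every weighted pixel is labeled as either stroma or epithelium, which is consistent with the percentage pairs above summing to 100%.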

Discussion
Pathological diagnosis of neoplastic or benign lesions with morphological similarity is always difficult, even though the treatment for each is completely different. Sometimes these problematic lesions cannot be precisely diagnosed even with immunohistochemistry (IHC) or consultation with experts. In the field of breast pathology, some hyperplastic, preneoplastic, and neoplastic lesions, such as UDH, CCL, and DCIS, are considered difficult to diagnose. Our AI system successfully distinguished normal mammary gland, UDH, CCL, DCIS, and comedo DCIS tissues. This is the first study to address the diagnosis of these indistinguishable breast lesions using machine learning-based image analysis. In previous studies, AI-based analysis for breast pathology has been used for classifying only normal, benign, in situ, and invasive cancer [17,21,37]. These studies combined benign lesions and preneoplastic lesions, such as benign hyperplastic lesions (UDH) and preneoplastic lesions (CCL), into a single category. Here, we obtained high AUC scores as well as high precision and recall in both the one-to-one comparisons and the simultaneous comparison of all four lesions and normal mammary glands. Therefore, our machine learning system may be helpful for the daily pathological diagnosis of breast biopsy tissues.
In this study, we showed the importance of stroma for distinguishing breast lesions (Fig. 7). For several years, the most critical morphological characteristics for diagnosing benign and malignant tumors were those of the tumor cells themselves, and findings in the surrounding stromal tissue were merely supportive information. With an improved understanding of the role of stromal cells, such as cancer-associated fibroblasts, in cancer progression [38,39], information on stromal structure is now attracting attention for the diagnosis of cancer [40,41]. Therefore, the findings of this study support the utility of stromal structure for the diagnosis of benign and malignant breast lesions.
This study has a limitation. We used images that include only one lesion for the AI analysis. In WSI-based analysis, many lesions coexist on a slide, and each lesion needs to be distinguished. WSI-based analysis is ideal, but it is currently difficult to develop a practically usable diagnostic system. However, our system can be applied to daily diagnosis. The pathology divisions of several hospitals have a light microscope with a camera system for capturing images of IHC results and for research use. Our system needs only an image of the lesion, which can be captured using a camera attached to any light microscope or even a smartphone camera. It therefore offers much higher throughput than WSI of a single slide. In the future, we plan to apply the existing system to WSI analysis to simultaneously detect multiple lesions from WSI.
Among early neoplastic lesions in the mammary gland, lobular neoplasms are as crucial as DCIS or atypical ductal hyperplasia (ADH). Lobular carcinoma in situ (LCIS) lesions have morphological and immunohistochemical characteristics totally different from those of ductal lesions and can be distinguished from benign and malignant ductal lesions by staining for E-cadherin as a novel protein marker [42,43]. Thus, we excluded LCIS images from our system. However, benign and malignant ductal lesions are sometimes very similar and are difficult to distinguish even using IHC methods. Therefore, we focused on early neoplastic and preneoplastic ductal lesions in this study.
Other AI analyses have been performed for scoring or predicting the expression of biomarkers, such as HER2, Ki-67, estrogen receptor (ER), or progesterone receptor (PgR) [44][45][46][47], and for assisting in the identification of lymph node metastasis [48] or tumor-associated stroma [49]. However, these analyses are not directly related to the pathological diagnosis of complicated breast lesions.
Pathologists' workload for breast cancer has been increasing year by year. Along with diagnosis, pathologists also need to collect data on tumor cell density and tumor area for the genetic analysis of cancer and to evaluate several kinds of biomarkers, such as ER, PgR, HER2, and Ki-67 positivity [7]. Moreover, some recent and very effective molecular targeted therapies, including anti-PD-1 or anti-PD-L1 treatment and CDK4/6 inhibitors, require information about the expression of particular proteins in cancer cells or immune cells to predict therapeutic efficacy [50][51][52]. Therefore, this machine learning-based diagnostic support tool may shorten the duration of pathological screening of patients' biopsies and surgical specimens and allow pathologists to concentrate on other essential work related to therapeutic decisions.
In conclusion, this study established an AI system that can differentiate complex benign and malignant breast lesions with very high accuracy. This system can be used for the pathological diagnosis of breast lesions that are difficult to identify.

Figure 1. Workflow for the image analysis of benign and malignant lesions in breast biopsy specimens. Five different benign, preneoplastic, and malignant lesions are selected from breast biopsy specimens, and images of these lesions are pooled and used for machine learning analysis. We performed two types of analysis: binary comparison of lesions and simultaneous comparison of all five lesions.