Data acquisition, data pre-processing, feature mining, and decision mining are the main integral modules of an automatic classification system (Sharma S, 2020). Among these, feature mining holds prominence because system performance is directly correlated with the quality of the features extracted from the input images. Conventional methods such as Local Binary Patterns (LBP) (T. Ojala, 2002), Parameter-Free Threshold Adjacency Statistics (PFTAS) (L. P. Coelho, 2010), Oriented FAST and Rotated BRIEF (ORB) (Rublee, 2011), and Local Phase Quantization (LPQ) (V. Ojansivu, 2008) use texture as a significant attribute for feature extraction. (Spanhol FA, 2015) evaluated the feature extraction ability of these conventional methods in conjunction with various classifiers, namely 1-nearest neighbor (1-NN) (K. Q. Weinberger, 2006), Quadratic Discriminant Analysis (QDA) (Tharwat, 2016), Support Vector Machines (SVM) (B. E. Boser, 1992), and Random Forests (RF) (V. Lepetit, 2006). The study concluded that SVM, using fractal dimension as a feature descriptor, performed well among the various classifiers on low-resolution images (A. Chan, 2016). (Roy, Das, Kar, Schwenker, & Sarkar, 2021) used ensemble learning by stacking and extracted textural features from the histopathological images to classify them into IDC+ and IDC− categories. The study also conducted an in-depth comparative analysis of different machine learning classifiers for classifying breast cancer histopathological image patches, inferring that the CatBoost (CB) (Prokhorenkova, Gusev, Vorobev, Dorogush, & Gulin, 2018) classifier proved most efficient with an AUC score of 92.2%, followed by the multi-layer perceptron (MLP) at 90.4% and the Extra-Trees model at 89.7%, while RF (Breiman, 2001) yielded the lowest AUC score of 87.1%.
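For illustration, the texture-based feature mining step can be sketched as follows: a basic 8-neighbour LBP code is computed for every interior pixel of a grey-scale patch, and the 256-bin histogram of those codes forms the feature vector that would then be passed to a classifier such as an SVM. This is a minimal NumPy sketch of the idea only; it is not the multi-scale, rotation-invariant uniform LBP of (T. Ojala, 2002).

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP: threshold each neighbour against the centre
    pixel and pack the 8 comparison bits into one code per interior pixel."""
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    # neighbour window offsets, clockwise from the top-left corner
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros_like(centre, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[dy:dy + h - 2, dx:dx + w - 2]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes: the texture descriptor."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# a random 50x50 grey patch standing in for a histopathological image patch
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(50, 50)).astype(np.uint8)
feat = lbp_histogram(patch)
print(feat.shape)
```

In a full pipeline, `feat` vectors from many patches would form the training matrix for one of the classifiers compared by (Spanhol FA, 2015).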
Conventional machine learning classification methods are comparatively complex, and the results they yield are highly inconsistent, because they depend on the quality of the features retrieved by the various feature descriptors. To address this issue, deep learning methods, discussed below, have been adopted.
Recently, deep neural network architectures, with their inherent ability to retrieve the most significant features from histopathological images, have been used to perform automatic classification tasks. The first attempt to detect IDC from whole-slide histopathological images with a deep neural network used a small CNN consisting of just three layers (Cruz-Roa, 2014). In addition, the study used classical handcrafted methods such as the fuzzy color histogram, RGB histogram, LBP, Haralick features, graph-based features, and gray histogram for feature extraction and classification. With an F1-score of 0.7180 and a balanced accuracy of 0.8423, the CNN-based approach outperformed the handcrafted methods, among which the fuzzy color histogram (F1: 0.6753, balanced accuracy: 0.7874) and the RGB histogram (F1: 0.666, balanced accuracy: 0.7724) yielded the best results. (B. N. Narayanan, 2019) used color consistency and histogram equalization to pre-process the histopathological patch images. The deep CNN used in that study obtained an AUC of 93.5% on color-consistency pre-processed patches, compared to 87.6% on patches pre-processed with histogram equalization. This improvement may be attributed to the color consistency technique maintaining contrast levels across all images of the dataset, which gives the deep CNN better discriminating ability for the classification task.
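Histogram equalization, the weaker of the two pre-processing steps compared by (B. N. Narayanan, 2019), can be sketched in a few lines of NumPy: grey levels are remapped so that their cumulative distribution becomes approximately uniform, stretching the contrast of a dull patch. The exact pipeline of that study is not reproduced here; this is only an illustrative single-channel implementation, assuming an 8-bit image with more than one grey level.

```python
import numpy as np

def equalize(img):
    """Histogram equalisation for an 8-bit grey image: build a lookup
    table from the normalised cumulative histogram and remap each pixel."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalise to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)         # grey-level lookup table
    return lut[img]

# a deliberately low-contrast patch (grey levels only in 40..89)
rng = np.random.default_rng(0)
patch = rng.integers(40, 90, size=(50, 50)).astype(np.uint8)
eq = equalize(patch)
print(int(patch.max()) - int(patch.min()), int(eq.max()) - int(eq.min()))
```

After equalization, the occupied grey levels span nearly the full 0–255 range, which is the contrast-stretching effect the pre-processing step relies on.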
(J. L. Wang, 2018) applied several CNNs for automatic feature extraction and classification to optimize the F1 and AUC scores, four of which had already been implemented by (Cruz-Roa, 2014). Each CNN architecture was trained on the IDC dataset until it achieved maximum accuracy with a minimal gradient. Among the four deep CNNs, the deeper networks yielded higher AUC and F1 scores than the shallower ones. Specifically, the deepest of the four CNNs, with a single dropout layer, yielded F1: 0.923, BAC: 0.866, and Accuracy: 89%, compared to a network comprising two dropout layers, which yielded F1: 0.911, BAC: 0.814, and Accuracy: 87%. An interesting inference drawn from this study is that the performance gain from data augmentation is directly proportional to the depth of the CNN used; notably, however, the study does not specify the procedures used to accomplish the data augmentation. (M. J. Rahman, 2018) applied data augmentation through random rotation and horizontal and vertical flipping/shifting on a training set of breast cancer histopathological images. A six-layer CNN with ReLU non-linearity and the Adadelta optimizer, trained with data augmentation, classified the test samples into IDC+ and IDC−, obtaining F1: 0.8934 and Accuracy: 89%. Moreover, the architecture displayed less overfitting, most likely by virtue of the data augmentation.
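The augmentations reported by (M. J. Rahman, 2018), random rotation plus horizontal and vertical flipping/shifting, can be sketched as label-preserving NumPy transforms applied on the fly to each training patch. The specific rotation angles (multiples of 90°) and the ±3-pixel shift range below are assumptions for illustration, not parameters taken from that study.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(patch):
    """Apply a random rotation, random flips, and a small random shift.
    All transforms preserve the class label of the patch."""
    out = np.rot90(patch, k=rng.integers(0, 4))    # rotate by 0/90/180/270°
    if rng.random() < 0.5:
        out = np.flipud(out)                       # vertical flip
    if rng.random() < 0.5:
        out = np.fliplr(out)                       # horizontal flip
    dy, dx = rng.integers(-3, 4, size=2)           # shift up to 3 px each way
    out = np.roll(out, (dy, dx), axis=(0, 1))      # wrap-around shift
    return out

patch = np.arange(2500).reshape(50, 50)            # toy 50x50 patch
aug = augment(patch)
print(aug.shape)
```

Because every transform is a permutation of pixel positions, the augmented patch contains exactly the same pixel values as the original, only rearranged.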
(H. Alghodhaifi, 2019) investigated the performance of a depth-wise separable convolution model against a standard convolution CNN on the IDC dataset. Depth-wise separable convolution is characterized by performing a single convolution on a single channel at a time, whereas standard convolution operates on all channels simultaneously. To introduce non-linearity after the convolution operations, several activation functions (ReLU, Tanh, Sigmoid) were used independently in both models to test their respective responses to the classification task. The standard convolutional neural network provided better results (Precision: 0.81, Specificity: 0.73, Sensitivity: 0.71, F1: 0.3, and Accuracy: 87%) than the depth-wise separable model. Among the activation functions used, ReLU (Accuracy: 87.1%) outperformed the others, followed by Sigmoid with a marginally lower Accuracy of 86.4%. (Hernandez, 2019) used an under-sampling approach to reduce classifier bias towards the majority class in the training dataset. Under-sampling removes some of the sample patches from the majority class of the training set, yielding a balanced training set. The balanced dataset was then used to train a CNN architecture as implemented by (M. J. Rahman, 2018), with a slight variation, namely the use of variable dropout rates. The optimized CNN-based approach yielded Accuracy: 0.854 on the IDC dataset. (F. P. Romero, 2019) implemented multi-level batch normalization using an Inception CNN as the base architecture. These modules help mitigate internal covariate shift, enabling better training of the CNN model. This method ultimately obtained BAC: 89% on an imbalanced IDC test set.
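The efficiency motivation behind depth-wise separable convolution can be made concrete by counting weights. A standard convolution learns one k×k×c_in kernel per output channel, whereas the separable version learns one k×k kernel per input channel (depth-wise step) plus a 1×1 convolution that mixes channels (point-wise step). The channel sizes below are illustrative and not taken from (H. Alghodhaifi, 2019); bias terms are omitted.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard convolution layer:
    c_out kernels, each of shape k x k x c_in."""
    return c_out * c_in * k * k

def sep_conv_params(c_in, c_out, k):
    """Weight count of a depth-wise separable convolution:
    depth-wise step (one k x k kernel per input channel)
    plus point-wise step (a 1x1 convolution, c_out x c_in weights)."""
    return c_in * k * k + c_out * c_in

# example: 64 input channels, 128 output channels, 3x3 kernels
standard = conv_params(64, 128, 3)       # 128 * 64 * 9  = 73728
separable = sep_conv_params(64, 128, 3)  # 64 * 9 + 128 * 64 = 8768
print(standard, separable)
```

For this configuration the separable layer uses roughly an eighth of the weights, which is why such models trade some representational power for a much smaller parameter budget.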
(Sujatha, 2020) used the residual networks ResNet34 and ResNet50 on different training sets containing varying numbers of IDC patch images. ResNet50 showed slightly better performance than ResNet34 in identifying IDC+ and IDC− samples. The study concluded that the discriminating ability of the CNN models decreases in direct proportion to the decrease in the number of training samples. (Saad Awadh Alanazi, 2021) investigated the performance of machine learning methods (Logistic Regression, KNN, and SVM) alongside deep learning methods involving three different CNN architectures. In the classification of IDC+ and IDC− samples using conventional machine learning methods, SVM outperformed the other two, yielding Accuracy: 78.5%. Of the three CNNs, the two shallower networks, comprising two and three convolutional layers, yielded accuracy scores of 59% and 76% respectively on the same test samples, while the deeper network with five convolutional layers achieved 87%.
An implementation of ensemble learning that seeks better performance by combining the outputs of two deep CNNs (DenseNet121 and DenseNet169) obtained BAC: 92.07% and F1: 0.9570 (Nusrat Ameen Barsha, 2021). This ensemble model, in conjunction with test-time augmentation, classified IDC patch samples better than the individual pre-trained models. The study further highlights the performance improvements achieved by upscaling the image patches from 50x50 pixels to 250x250 pixels. (Alzubaidi & Al-Amidie M, 2021) used a domain-specific learning approach, similar to the one used in this study, by initially training a deep CNN on a large dataset of skin cancer images. The deep CNN was then trained and fine-tuned on histopathological images of the BreakHis dataset. Using domain-specific learning, the model achieved Accuracy: 97.5%, against 85.29% when trained from scratch. (Attallah O, 2021) proposed a novel automatic detection method, Histo-CADx, for identifying breast cancer from histopathological images. In its initial phase, Histo-CADx investigates the impact of combining features extracted by deep learning methods with those obtained from traditional methods. In a later phase, Histo-CADx implements a multi-classifier system by fusing the outputs of three individual classifiers. This arrangement achieved better performance in classifying breast cancer from the histopathological images of the BreakHis and ICIAR datasets.
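The fusion scheme used by (Nusrat Ameen Barsha, 2021), averaging the class probabilities of two networks, optionally over several augmented copies of each patch (test-time augmentation), can be sketched as follows. The toy `model_a`/`model_b` stand-ins that return fixed probabilities and the flip-based augmentations are assumptions for illustration; they are not the DenseNet outputs or the augmentation set of that study.

```python
import numpy as np

def ensemble_predict(prob_a, prob_b):
    """Fuse two models by averaging their per-class probabilities."""
    return (prob_a + prob_b) / 2.0

def tta_predict(model, patch, augments):
    """Test-time augmentation: average one model's predictions
    over several augmented copies of the same patch."""
    preds = [model(aug(patch)) for aug in augments]
    return np.mean(preds, axis=0)

# toy "models": fixed [P(IDC-), P(IDC+)] outputs standing in for the two CNNs
model_a = lambda x: np.array([0.2, 0.8])
model_b = lambda x: np.array([0.4, 0.6])
patch = np.zeros((50, 50))
augs = [lambda p: p, np.flipud, np.fliplr]   # identity + two flips

fused = ensemble_predict(model_a(patch), model_b(patch))
tta = tta_predict(model_a, patch, augs)
print(fused, tta)
```

In practice the two averaging steps are combined: each network predicts on every augmented copy, and all resulting probability vectors are averaged before the final class decision.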