Data acquisition, data pre-processing, feature mining, and decision mining are the main integral modules of an automatic classification system (Sharma S, 2020). Among these, feature mining holds prominence because system performance is directly correlated with the quality of the features extracted from the input images. Conventional methods such as Local Binary Patterns (LBP) (T. Ojala, 2002), Parameter-Free Threshold Adjacency Statistics (PFTAS) (L. P. Coelho, 2010), Oriented FAST and Rotated BRIEF (ORB) (Rublee, 2011), and Local Phase Quantization (LPQ) (V. Ojansivu, 2008) use texture as a significant attribute for feature extraction. (Spanhol FA, 2015) evaluated the feature extraction ability of these conventional methods in conjunction with various classifiers, namely 1-nearest neighbor (1-NN) (K. Q. Weinberger, 2006), Quadratic Discriminant Analysis (QDA) (Tharwat, 2016), Support Vector Machines (SVM) (B. E. Boser, 1992), and Random Forests (RF) (V. Lepetit, 2006). The study concluded that SVM, using fractal dimension as a feature descriptor, performed well among the various classifiers on low-resolution images (A. Chan, 2016). (Roy, Das, Kar, Schwenker, & Sarkar, 2021) used ensemble learning by stacking and extracted textural features from the histopathological images to classify them into IDC+ and IDC− categories. The study also conducted an in-depth comparative analysis of different machine learning classifiers for classifying breast cancer histopathological image patches, inferring that the CatBoost (CB) (Prokhorenkova, Gusev, Vorobev, Dorogush, & Gulin, 2018) classifier proved most efficient with an AUC score of 92.2%, followed by the multi-layer perceptron (MLP) at 90.4% and the Extra-Trees model at 89.7%, while RF (Breiman, 2001) yielded the lowest AUC score of 87.1%.
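For illustration, the texture-based feature mining step can be sketched as follows: a basic 8-neighbour LBP code is computed for every interior pixel of a grey-scale patch, and the 256-bin histogram of those codes forms the feature vector that would then be passed to a classifier such as an SVM. This is a minimal NumPy sketch of the idea only; it is not the multi-scale, rotation-invariant uniform LBP of (T. Ojala, 2002).

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP: threshold each neighbour against the centre
    pixel and pack the 8 comparison bits into one code per interior pixel."""
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    # neighbour window offsets, clockwise from the top-left corner
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros_like(centre, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[dy:dy + h - 2, dx:dx + w - 2]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes: the texture descriptor."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# a random 50x50 grey patch standing in for a histopathological image patch
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(50, 50)).astype(np.uint8)
feat = lbp_histogram(patch)
print(feat.shape)
```

In a full pipeline, `feat` vectors from many patches would form the training matrix for one of the classifiers compared by (Spanhol FA, 2015).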
Conventional machine learning classification methods are comparatively complex, and the results they yield are highly inconsistent, because they depend on the quality of the features retrieved by the various feature descriptors. To address this issue, deep learning methods, discussed below, have been adopted.
Recently, deep neural network architectures, with their inherent ability to retrieve the most significant features from histopathological images, have been used to perform automatic classification tasks. The first attempt to detect IDC from whole-slide histopathological images with a deep neural network used a small CNN consisting of just three layers (Cruz-Roa, 2014). In addition, the study used classical handcrafted methods such as the fuzzy color histogram, RGB histogram, LBP, Haralick features, graph-based features, and gray histogram for feature extraction and classification. With an F1-score of 0.7180 and a balanced accuracy of 0.8423, the CNN-based approach outperformed the handcrafted methods, among which the fuzzy color histogram (F1: 0.6753, balanced accuracy: 0.7874) and the RGB histogram (F1: 0.666, balanced accuracy: 0.7724) yielded the best results. (B. N. Narayanan, 2019) used color consistency and histogram equalization to pre-process the histopathological patch images. The deep CNN used in that study obtained an AUC of 93.5% on color-consistency pre-processed patches, compared to 87.6% on patches pre-processed with histogram equalization. This improvement may be attributed to the color consistency technique maintaining contrast levels across all images of the dataset, which gives the deep CNN better discriminating ability for the classification task.
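Histogram equalization, the weaker of the two pre-processing steps compared by (B. N. Narayanan, 2019), can be sketched in a few lines of NumPy: grey levels are remapped so that their cumulative distribution becomes approximately uniform, stretching the contrast of a dull patch. The exact pipeline of that study is not reproduced here; this is only an illustrative single-channel implementation, assuming an 8-bit image with more than one grey level.

```python
import numpy as np

def equalize(img):
    """Histogram equalisation for an 8-bit grey image: build a lookup
    table from the normalised cumulative histogram and remap each pixel."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalise to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)         # grey-level lookup table
    return lut[img]

# a deliberately low-contrast patch (grey levels only in 40..89)
rng = np.random.default_rng(0)
patch = rng.integers(40, 90, size=(50, 50)).astype(np.uint8)
eq = equalize(patch)
print(int(patch.max()) - int(patch.min()), int(eq.max()) - int(eq.min()))
```

After equalization, the occupied grey levels span nearly the full 0–255 range, which is the contrast-stretching effect the pre-processing step relies on.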
(J. L. Wang, 2018) applied several CNNs for automatic feature extraction and classification to optimize the F1 and AUC scores, four of which had already been implemented by (Cruz-Roa, 2014). Each CNN architecture was trained on the IDC dataset until it achieved maximum accuracy with a minimal gradient. Among the four deep CNNs, the deeper networks yielded higher AUC and F1 scores than the shallower ones. Specifically, the deepest of the four CNNs, with a single dropout layer, yielded F1: 0.923, BAC: 0.866, and Accuracy: 89%, compared to a network comprising two dropout layers, which yielded F1: 0.911, BAC: 0.814, and Accuracy: 87%. An interesting inference drawn from this study is that the performance gain from data augmentation is directly proportional to the depth of the CNN used; notably, however, the study does not specify the procedures used to accomplish the data augmentation. (M. J. Rahman, 2018) applied data augmentation through random rotation and horizontal and vertical flipping/shifting on a training set of breast cancer histopathological images. A six-layer CNN with ReLU non-linearity and the Adadelta optimizer, trained with data augmentation, classified the test samples into IDC+ and IDC−, obtaining F1: 0.8934 and Accuracy: 89%. Moreover, the architecture displayed less overfitting, most likely by virtue of the data augmentation.
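The augmentations reported by (M. J. Rahman, 2018), random rotation plus horizontal and vertical flipping/shifting, can be sketched as label-preserving NumPy transforms applied on the fly to each training patch. The specific rotation angles (multiples of 90°) and the ±3-pixel shift range below are assumptions for illustration, not parameters taken from that study.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(patch):
    """Apply a random rotation, random flips, and a small random shift.
    All transforms preserve the class label of the patch."""
    out = np.rot90(patch, k=rng.integers(0, 4))    # rotate by 0/90/180/270°
    if rng.random() < 0.5:
        out = np.flipud(out)                       # vertical flip
    if rng.random() < 0.5:
        out = np.fliplr(out)                       # horizontal flip
    dy, dx = rng.integers(-3, 4, size=2)           # shift up to 3 px each way
    out = np.roll(out, (dy, dx), axis=(0, 1))      # wrap-around shift
    return out

patch = np.arange(2500).reshape(50, 50)            # toy 50x50 patch
aug = augment(patch)
print(aug.shape)
```

Because every transform is a permutation of pixel positions, the augmented patch contains exactly the same pixel values as the original, only rearranged.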
(H. Alghodhaifi, 2019) investigated the performance of a depth-wise separable convolution model against a standard convolution CNN on the IDC dataset. Depth-wise separable convolution is characterized by performing a single convolution on a single channel at a time, whereas standard convolution operates on all channels simultaneously. To introduce non-linearity after the convolution operations, several activation functions (ReLU, Tanh, Sigmoid) were used independently in both models to test their respective responses to the classification task. The standard convolutional neural network provided better results (Precision: 0.81, Specificity: 0.73, Sensitivity: 0.71, F1: 0.3, and Accuracy: 87%) than the depth-wise separable model. Among the activation functions used, ReLU (Accuracy: 87.1%) outperformed the others, followed by Sigmoid with a marginally lower Accuracy of 86.4%. (Hernandez, 2019) used an under-sampling approach to reduce classifier bias towards the majority class in the training dataset. Under-sampling removes some of the sample patches from the majority class of the training set, yielding a balanced training set. The balanced dataset was then used to train a CNN architecture as implemented by (M. J. Rahman, 2018), with a slight variation, namely the use of variable dropout rates. The optimized CNN-based approach yielded Accuracy: 0.854 on the IDC dataset. (F. P. Romero, 2019) implemented multi-level batch normalization using an Inception CNN as the base architecture. These modules help mitigate internal covariate shift, enabling better training of the CNN model. This method ultimately obtained BAC: 89% on an imbalanced IDC test set.
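The efficiency motivation behind depth-wise separable convolution can be made concrete by counting weights. A standard convolution learns one k×k×c_in kernel per output channel, whereas the separable version learns one k×k kernel per input channel (depth-wise step) plus a 1×1 convolution that mixes channels (point-wise step). The channel sizes below are illustrative and not taken from (H. Alghodhaifi, 2019); bias terms are omitted.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard convolution layer:
    c_out kernels, each of shape k x k x c_in."""
    return c_out * c_in * k * k

def sep_conv_params(c_in, c_out, k):
    """Weight count of a depth-wise separable convolution:
    depth-wise step (one k x k kernel per input channel)
    plus point-wise step (a 1x1 convolution, c_out x c_in weights)."""
    return c_in * k * k + c_out * c_in

# example: 64 input channels, 128 output channels, 3x3 kernels
standard = conv_params(64, 128, 3)       # 128 * 64 * 9  = 73728
separable = sep_conv_params(64, 128, 3)  # 64 * 9 + 128 * 64 = 8768
print(standard, separable)
```

For this configuration the separable layer uses roughly an eighth of the weights, which is why such models trade some representational power for a much smaller parameter budget.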
(Sujatha, 2020) used the residual networks ResNet34 and ResNet50 on different training sets containing varying numbers of IDC patch images. ResNet50 showed slightly better performance than ResNet34 in identifying IDC+ and IDC− samples. The study concluded that the discriminating ability of the CNN models decreases in direct proportion to the decrease in the number of training samples. (Saad Awadh Alanazi, 2021) investigated the performance of machine learning methods (Logistic Regression, KNN, and SVM) alongside deep learning methods involving three different CNN architectures. In the classification of IDC+ and IDC− samples using conventional machine learning methods, SVM outperformed the other two, yielding Accuracy: 78.5%. Of the three CNNs, the two shallower networks, comprising two and three convolutional layers, yielded accuracy scores of 59% and 76% respectively on the same test samples, while the deeper network with five convolutional layers achieved 87%.
An implementation of ensemble learning that seeks better performance by combining the outputs of two deep CNNs (DenseNet121 and DenseNet169) obtained BAC: 92.07% and F1: 0.9570 (Nusrat Ameen Barsha, 2021). This ensemble model, in conjunction with test-time augmentation, classified IDC patch samples better than the individual pre-trained models. The study further highlights the performance improvements achieved by upscaling the image patches from 50x50 pixels to 250x250 pixels. (Alzubaidi & Al-Amidie M, 2021) used a domain-specific learning approach, similar to the one used in this study, by initially training a deep CNN on a large dataset of skin cancer images. The deep CNN was then trained and fine-tuned on histopathological images of the BreakHis dataset. Using domain-specific learning, the model achieved Accuracy: 97.5%, against 85.29% when trained from scratch. (Attallah O, 2021) proposed a novel automatic detection method, Histo-CADx, for identifying breast cancer from histopathological images. In its initial phase, Histo-CADx investigates the impact of combining features extracted by deep learning methods with those obtained from traditional methods. In a later phase, Histo-CADx implements a multi-classifier system by fusing the outputs of three individual classifiers. This arrangement achieved better performance in classifying breast cancer from the histopathological images of the BreakHis and ICIAR datasets.
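The fusion scheme used by (Nusrat Ameen Barsha, 2021), averaging the class probabilities of two networks, optionally over several augmented copies of each patch (test-time augmentation), can be sketched as follows. The toy `model_a`/`model_b` stand-ins that return fixed probabilities and the flip-based augmentations are assumptions for illustration; they are not the DenseNet outputs or the augmentation set of that study.

```python
import numpy as np

def ensemble_predict(prob_a, prob_b):
    """Fuse two models by averaging their per-class probabilities."""
    return (prob_a + prob_b) / 2.0

def tta_predict(model, patch, augments):
    """Test-time augmentation: average one model's predictions
    over several augmented copies of the same patch."""
    preds = [model(aug(patch)) for aug in augments]
    return np.mean(preds, axis=0)

# toy "models": fixed [P(IDC-), P(IDC+)] outputs standing in for the two CNNs
model_a = lambda x: np.array([0.2, 0.8])
model_b = lambda x: np.array([0.4, 0.6])
patch = np.zeros((50, 50))
augs = [lambda p: p, np.flipud, np.fliplr]   # identity + two flips

fused = ensemble_predict(model_a(patch), model_b(patch))
tta = tta_predict(model_a, patch, augs)
print(fused, tta)
```

In practice the two averaging steps are combined: each network predicts on every augmented copy, and all resulting probability vectors are averaged before the final class decision.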