White Blood Cells (WBC) Images Classification Using CNN-Based Networks

Background: Computer-aided methods for analyzing white blood cells (WBC) are popular due to the complexity of the manual alternatives. Recent works have shown highly accurate segmentation and detection of white blood cells from microscopic blood images. However, the classification of the observed cells is still a challenge, in part due to the imbalanced distribution of the five types, which reflects the condition of the immune system. Methods: (i) This work proposes W-Net, a CNN-based method for WBC classification. We evaluate W-Net on a real-world large-scale dataset that includes 6,562 real images of the five WBC types. (ii) For further benefit, we generate synthetic WBC images using a Generative Adversarial Network (GAN) to be shared for education and research purposes. Results: (i) W-Net achieves an average accuracy of 97%. In comparison to state-of-the-art methods in the field of WBC classification, we show that W-Net outperforms other CNN- and RNN-based model architectures. Moreover, we show the benefits of using a pre-trained W-Net in a transfer learning context when fine-tuned to a specific task or to accommodate another dataset. (ii) The synthetic WBC images are confirmed by experiments and a domain expert to have a high degree of similarity to the original images. The pre-trained W-Net and the generated WBC dataset are available to the community to facilitate reproducibility and follow-up research. Conclusion: This work proposed W-Net, a CNN-based architecture with a small number of layers, to accurately classify the five WBC types. We evaluated W-Net on a real-world large-scale dataset and addressed several challenges, such as transfer learning and class imbalance. W-Net achieved an average classification accuracy of 97%. We synthesized a dataset of new WBC image samples using DCGAN, which we released to the public for education and research purposes.


Background
White blood cells (WBCs) are one of the three types of blood cells, besides red blood cells and platelets, and are responsible for the immune system, defending against foreign substances and bacteria. WBCs are typically categorized into five major types: neutrophils, eosinophils, basophils, lymphocytes, and monocytes. Neutrophils consist of two functionally unequal subgroups, neutrophil-killers and neutrophil-cagers, and they defend against bacterial or fungal infections [2]. The number of eosinophils increases in response to allergies. While existing studies have focused largely on the segmentation and detection tasks, less attention has been given to the WBC classification task and the factors impacting its accuracy and performance.
Accurate WBC classification is also beneficial for diagnosing leukemia, a type of blood cancer in which abnormal WBCs in the blood rapidly proliferate, decreasing the number of normal blood cells and making the immune system vulnerable to infections. In the US, around 60,000 people are diagnosed with leukemia every year, and around 20,000 people die of leukemia annually. From 2011 to 2015, leukemia was the sixth most common cause of cancer-related death in the US [8]. There are various types of leukemia, including ALL (acute lymphocytic leukemia), AML (acute myelogenous leukemia), CLL (chronic lymphocytic leukemia), and CML (chronic myelogenous leukemia). Chronic leukemia progresses more slowly than acute leukemia, which requires immediate medical care. Acute leukemia is characterized by the proliferation of blasts, CLL is characterized by increased lymphocytes, while CML shows markedly increased neutrophils and some basophils in the blood [9]. Therefore, accurate classification of WBCs contributes to the diagnosis of leukemia.
Recent advancements in the fields of computer vision and computer-aided diagnosis show a promising direction for the applicability of deep learning-based technologies to assist accurate classification and counting of WBCs. The convolutional neural network (CNN) is one of the most common and successful deep learning architectures utilized for analyzing and classifying medical imagery [10,11,12,13]. In this paper, we propose W-Net, a CNN-based network for WBC image classification. W-Net consists of three convolutional layers and two fully-connected layers, which are responsible for extracting and learning features from WBC images and classifying them into five classes using a softmax classifier. In comparison to state-of-the-art methods, W-Net shows outstanding results in terms of accuracy. Further, we investigate the performance of several deep learning architectures on the WBC classification task, applying and comparing architectures including W-Net, AlexNet [14], VGGNet [15], ResNet [16], and a Recurrent Neural Network (RNN). We also compare different classifiers, such as the softmax classifier and the Support Vector Machine (SVM), on top of the adopted models. Moreover, we explore the effects of pre-training W-Net on public datasets, such as the LISC public dataset [17], on its performance. Understanding the importance of large-scale datasets for model performance, we generate new WBC images using a GAN [18] to augment current educational and research datasets.
Contribution. The contributions of this paper are as follows. ❶ We propose W-Net, a CNN-based network designed to accurately classify WBCs while maintaining high efficiency through the minimal depth of its CNN architecture.
❷ We evaluate the performance of W-Net using a real-world large-scale dataset that consists of 6,562 real images. ❸ We address and handle the problem of imbalanced WBC classes and achieve an average classification accuracy of 97% across all classes. ❹ We show how W-Net, which consists of only three convolutional layers, stands among the most popular CNN-based architectures in the field of image classification and computer vision on the WBC classification task. ❺ Serving the purpose of advancing the task, we study the applicability of transfer learning and of generating larger datasets of WBC images using a GAN for public release. ❻ We generate and publicize synthetic WBC images using a Generative Adversarial Network for education and research purposes. The synthetic WBC images are verified by experiments and a domain expert to have a high degree of similarity to the original images. The pre-trained W-Net and the generated WBC dataset are available to the public.
Organization. The rest of the paper is organized as follows: in section 1, we review the literature. We introduce our model, W-Net, in section 2. We evaluate W-Net through various experiments on WBC images in section 3. Our design choices and the experimental results are discussed in section 4. We release a new WBC dataset generated using a GAN in section 5. Finally, we conclude our study in section 6.

Previous Works
Analysis of white blood cells (WBC) is of vital importance in diagnosing diseases. The distribution of the five WBC types (basophils, eosinophils, lymphocytes, monocytes, and neutrophils) reflects strongly on the condition of the immune system. Analyzing the components of WBCs requires performing segmentation and classification processes. The traditional analysis of WBCs includes observing a blood smear under a microscope and using the visible properties, such as shapes and colors, to classify the blood cells. However, the accuracy of WBC analysis depends significantly on the knowledge and experience of the medical operator [19]. This makes the process of analyzing WBCs using conventional methods time-consuming and labor-intensive [19,20,23]. Therefore, many studies have proposed computer-aided technologies to facilitate WBC analysis through accurate cell detection and segmentation to reduce the manual efforts needed by human experts. For instance, Shitong and Min [39] proposed an algorithm based on fuzzy cellular neural networks to detect WBCs in microscopic blood images as the first key step for automatic WBC recognition. Using mathematical morphology and fuzzy cellular neural networks, the authors achieved a detection accuracy of 99%. The detection of WBCs is followed by a segmentation process, which segments the image into nucleus and cytoplasm regions. This task has been pursued by several studies providing accurate segmentation using a variety of methods. The most common approach for nuclei segmentation is clustering based on features extracted from pixel values [40,41]. The literature shows successful nuclei segmentation using different clustering techniques, such as K-means [42], fuzzy K-means [41], C-means [41], and GKmeans [43].
Among other unsupervised techniques for nuclei segmentation besides clustering, many studies have utilized thresholding [21,23,44,24,45,46], arithmetical operations [25], edge-based detection [41,46], region-based detection [46], genetic algorithms [47], the watershed algorithm [46], and Gram-Schmidt orthogonalization [17]. The literature on the WBC segmentation process is very rich and provides valuable insights for WBC identification. Andrade et al. [40] provide a survey and a comparative study on the performance of 15 segmentation methods using five public WBC databases. Some of these works are dedicated to the separation of adjacent cells, while many others particularly address the separation of overlapping cells. After the segmentation process, the WBC image classification or identification process is conducted. The distinction between the tasks of WBC identification and WBC image classification is that the identification process aims to detect and identify leucocytes in an image, while the classification process aims to distinguish the different types of WBC. Even though many studies are dedicated to the segmentation and identification tasks, fewer have addressed the classification of WBCs. The literature shows that classification methods used for this purpose include the K-Nearest Neighbor (KNN) classifier [20,24], the Bayesian classifier [35,23,24], the SVM classifier [19,24,43,37,17], Linear Discriminant Analysis (LDA) [33], decision trees and random forest classifiers [36,24], and deep learning [21,25,17,27,31,37]. Table 1 shows an overview of the performance and methods of the related works.

CNN with Medical Images
Due to its vast success in a variety of applications, CNN has been adopted in several medical applications where imagery inputs are analyzed for diagnosis or classification. In the field of medical imaging, CNN has been successfully utilized for histological microscopic images [48], pediatric pneumonia [49], diabetic macular edema [49], ventricular arrhythmias [50], thyroid anomalies [51], neuroanatomy [52], and others [10,11,12,13,53,54,55,56,57,58,59]. Kermany et al. [49] showed that CNN can detect diabetic macular edema and age-related macular degeneration with high accuracy and with performance comparable to that of human experts. The authors also demonstrated the applicability of CNN in diagnosing pediatric pneumonia from chest X-ray images. Alexander et al. [48] provided the state-of-the-art performance (as of the publication date) using CNN for histopathological image classification on the dataset of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. Acharya et al. [50] showed that CNN can be used to accurately detect shockable and non-shockable life-threatening ventricular arrhythmias. Wachinger et al. [52] proposed DeepNAT, a CNN-based approach for automatic segmentation of NeuroAnaTomy in magnetic resonance images. The authors showed that their approach provided results comparable to those of state-of-the-art methods.

Methods
This section provides a description of the dataset used in this study, the pre-processing steps for the WBC images, and the proposed CNN-based architecture for WBC classification.

Dataset
We use a real-world dataset of 6,562 images that belong to five WBC types, namely, neutrophil, eosinophil, basophil, lymphocyte, and monocyte. The dataset was provided by The Catholic University of Korea (The CUK), and approved by the Institutional Review Board (IRB) of The CUK [60]. The images were captured by a Sysmex DI-60 machine [61] and provided with a size of 360 × 361 × 3 (3 channels, RGB). Table 2 shows the number of images per class: 2,006 neutrophil (NE) images, 1,310 eosinophil (EO) images, 377 basophil (BA) images, 1,676 lymphocyte (LY) images, and 1,193 monocyte (MO) images. The class distribution in our dataset is 30%, 20%, 6%, 26%, and 18% for the five classes. The images were re-sized to 128 × 128 × 3 for properly fitting them into GPU memory and for efficient processing. Samples of the processed images are shown in Figure 1. An image normalization process was applied to reduce the heterogeneity of the RGB distribution in the images and to prevent over/underflow. This step is shown in Figure 2.
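As a hedged sketch of what the normalization step can look like (the exact scheme is not specified in the text; per-channel standardization is one common choice and is our assumption here):

```python
def normalize_channels(image):
    """Standardize each RGB channel of an H x W x 3 image (nested lists)
    to zero mean and unit variance, guarding against flat channels."""
    normalized = [[list(px) for px in row] for row in image]
    for c in range(3):
        values = [px[c] for row in image for px in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = var ** 0.5 or 1.0  # avoid dividing by zero on flat channels
        for row in normalized:
            for px in row:
                px[c] = (px[c] - mean) / std
    return normalized
```

Standardizing per channel reduces the heterogeneity of the RGB distributions across images and keeps the pixel values in a numerically safe range for training.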

W-Net: Architecture and Design
We introduce our CNN-based model architecture for WBC image classification. As illustrated in Figure 2, W-Net consists of three convolutional layers and two fully-connected layers, which are responsible for extracting and learning features from WBC images to accurately classify them into five classes using a softmax classifier. Each convolutional layer has a kernel size of 3 × 3 with a stride of 1 and uses the ReLU activation function f(x) = max(0, x). The first convolutional layer has 16 filters, the second has 32 filters, and the third has 64 filters. After each convolutional layer, there is a max-pooling layer of size 2 × 2 with a stride of 2 and zero padding. We also use dropout regularization with p = 0.6 [62] to prevent overfitting in each convolutional layer. The output of the third convolutional layer is flattened and fed into the first fully-connected layer, which has 1,024 units, followed by ReLU activation and dropout with p = 0.6. The second fully-connected layer has five units (the five classes of WBC) and is followed by a softmax classifier to map the output features to one of the five classes. The network has a total of 16,806,949 trainable parameters. The model parameters were initialized using the Xavier initializer, which bounds the initial weights by sqrt(6 / (fan_in + fan_out)). The training of models is guided by minimizing the softmax cross-entropy loss function using the Adam optimizer [63] with a learning rate of 0.0001.
The training process is conducted with different batch sizes and terminated at the conclusion of 500 training epochs. The evaluation of the models is conducted using a 10-fold cross-validation approach [64]. The structure is illustrated in Table 3.
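As a sanity check on the stated figure of 16,806,949 trainable parameters, the following sketch recomputes the count, assuming 128 × 128 × 3 inputs and 'same' padding in the convolutions (an assumption on our part, but the only configuration consistent with that total); it also computes the Xavier bound for the largest layer:

```python
def conv_params(k, in_ch, out_ch):
    # k x k kernel weights plus one bias per output filter
    return k * k * in_ch * out_ch + out_ch

def dense_params(fan_in, fan_out):
    return fan_in * fan_out + fan_out

h = w = 128  # assumed input resolution; 'same'-padded convs keep the size
total = 0
for in_ch, out_ch in [(3, 16), (16, 32), (32, 64)]:
    total += conv_params(3, in_ch, out_ch)
    h //= 2  # each 2 x 2 / stride-2 max-pool halves the spatial size
    w //= 2
flat = h * w * 64                   # 16 * 16 * 64 = 16,384 features
total += dense_params(flat, 1024)   # first fully-connected layer
total += dense_params(1024, 5)      # output layer (five WBC classes)
xavier_bound = (6 / (flat + 1024)) ** 0.5  # Xavier range for the FC1 weights

print(total)  # 16806949, matching the figure stated in the text
```

The dominant cost is the first fully-connected layer (16,384 × 1,024 weights), which explains why the network is large in parameters despite its shallow depth.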
Design choices for W-Net are discussed in section 4.

Experiments
We show the performance of W-Net for WBC classification and compare the softmax classifier of W-Net with SVM. We show that W-Net provides remarkable results in WBC classification by comparing it to the prior work. We also show how the number of layers affects performance. The comparison includes AlexNet, VGGNet, ResNet, and RNN models. For transfer learning, we provide insights on adopting a pre-trained W-Net to gain higher WBC classification performance on public datasets. The ROC curve and AUC are useful methods for evaluating a system in the medical area and are usually used for binary classification tasks, such as a diagnosis. However, we remark that our results are based on accuracy alone, because the output of our model is multi-class, not binary. Table 9 in Appendix A.1 shows the accuracy achieved by W-Net using the 10-fold cross-validation approach. Conducting the experiments required 33.87 hours of model training time. For the neutrophil, 1,800 images were used for training and 206 for testing in each fold, and the average accuracy was 98%. For the eosinophil, 1,179 images were used for training and 131 for testing in each fold, and the average accuracy was 97%. For the basophil, 340 images were used for training and 37 for testing in each fold, and the average accuracy was 95%. For the lymphocyte, 1,509 images were used for training and 167 for testing in each fold, and the average accuracy was 97%. For the monocyte, 1,074 images were used for training and 119 for testing in each fold, and the average accuracy was 97%. The average overall accuracy over the five WBC classes was 97%.
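The per-class train/test counts above follow from stratified folding. The sketch below is a simplified round-robin illustration of how stratified folds preserve per-class ratios (the exact fold sizes differ slightly from the quoted numbers, which come from the actual splitting tool used):

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Deal sample indices into k folds so each fold preserves the
    per-class ratios of `labels` (one class name per sample)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)  # round-robin within each class
    return folds

# Class counts from Table 2: NE, EO, BA, LY, MO.
labels = (["NE"] * 2006 + ["EO"] * 1310 + ["BA"] * 377
          + ["LY"] * 1676 + ["MO"] * 1193)
folds = stratified_folds(labels)
ba_per_fold = [sum(labels[i] == "BA" for i in f) for f in folds]
# Every fold keeps roughly 37-38 basophils, so the rare class is never
# absent from a training or test split.
```

This is why even the 6% basophil class appears in every training and testing phase.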

W-Net vs W-Net-SVM Performance
We compared the softmax classifier of W-Net with SVM to demonstrate each classifier's ability in performing the WBC classification task. We trained a W-Net model with an SVM classifier (W-Net-SVM) using the hinge loss function [65] l(y) = max(0, 1 - t · y) instead of softmax (W-Net). We followed the same experimental settings adopted in the previous experiment, including the training parameters, dataset, pre-processing steps, workstation environment, and the 10-fold cross-validation approach for the evaluation. The network has a total of 16,806,949 trainable parameters. Table 10 in Appendix A.1 shows the performance of W-Net-SVM using 10-fold cross-validation on the WBC classification task. The training time of W-Net-SVM was 33.79 hours. The achieved results for the neutrophil, eosinophil, basophil, lymphocyte, and monocyte classes are 98%, 97%, 87%, 98%, and 97%, respectively. The overall average accuracy over the five classes was 95%.
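The hinge loss used for the SVM head can be sketched as follows; the multi-class extension shown is one common (Weston-Watkins style) choice and is our assumption, since the text only states the binary form:

```python
def hinge_loss(score, target):
    """Binary hinge loss l(y) = max(0, 1 - t * y), with t in {-1, +1}."""
    return max(0.0, 1.0 - target * score)

def multiclass_hinge(scores, true_idx, margin=1.0):
    """Weston-Watkins style multi-class hinge: penalize every wrong-class
    score that comes within `margin` of the true-class score."""
    true_score = scores[true_idx]
    return sum(max(0.0, margin - true_score + s)
               for i, s in enumerate(scores) if i != true_idx)

print(hinge_loss(2.0, 1))                    # 0.0: correct, outside the margin
print(multiclass_hinge([3.0, 1.5, 2.5], 0))  # 0.5: class 2 is inside the margin
```

Unlike cross-entropy, the hinge loss is zero once a sample is correctly classified with sufficient margin, which may weaken the gradient signal from rare classes.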

WBC Classification with AlexNet
This experiment adopts the AlexNet architecture for the WBC classification task. The AlexNet network consists of five convolutional layers and three fully-connected layers, which apply the ReLU activation function in all layers except the last (softmax) layer. The training of the AlexNet model is conducted by minimizing the softmax cross-entropy loss function using the momentum optimizer [66]. Using a cross-validation approach, the training hyperparameters that achieved the best WBC classification accuracy are as follows: learning rate = 0.001, decay = 0.0005, momentum = 0.9, dropout p = 0.5, batch size = 128, and training epochs = 90. We follow the same experimental settings adopted in previous experiments by using the same dataset, pre-processing steps (except for the image size: we re-sized the images to 224 × 224 × 3 for AlexNet), workstation environment, and the 10-fold cross-validation evaluation approach. The AlexNet-based network has a total of 46,767,493 trainable parameters. Table 11 in Appendix A.1 shows the performance of AlexNet using a 10-fold cross-validation approach on the WBC classification task. The achieved classification accuracies are 97%, 99%, 33%, 93%, and 99% for the neutrophil, eosinophil, basophil, lymphocyte, and monocyte classes, respectively. The overall average accuracy is 84%.

WBC Classification with VGGNet
We compared W-Net with VGGNet to demonstrate the effectiveness of W-Net in WBC image classification. We trained a VGGNet-based model that consists of 16 convolutional layers and three fully-connected layers, followed by the ReLU activation function. The model training is conducted by minimizing the softmax cross-entropy loss through the Adam optimizer. Using a cross-validation method, the training hyperparameters that achieved the best classification accuracy are as follows: learning rate = 0.000001, dropout p = 0.5, batch size = 1, and training epochs = 300. This experiment followed the same experimental settings adopted in previous experiments. The VGGNet-based model includes a total of 121,796,165 trainable parameters. The training of the VGGNet-based model required 510.59 hours of training time. Table 12 in Appendix A.1 shows the results of the 10-fold cross-validation of the VGGNet-based model on the WBC classification task. The classification accuracy is 100% for the neutrophil, 38% for the eosinophil, 31% for the basophil, 19% for the lymphocyte, and 53% for the monocyte. The overall average accuracy over the five classes is 44%.

WBC Classification with ResNet
We adopt a ResNet50 network for WBC classification, which consists of 50 convolutional layers. The ResNet50 model is trained by minimizing the softmax cross-entropy loss using the momentum optimizer. Using a cross-validation approach, the training hyperparameters that achieve the highest accuracy on the WBC classification task are as follows: learning rate = 0.001, decay = 0.0001, momentum = 0.9, batch size = 32, and training epochs = 50. The training and evaluation of the models are in compliance with the experimental settings adopted in previous experiments. The ResNet50 model includes a total of 23,544,837 trainable parameters. The model required a training time of 8.38 hours. Table 13 in Appendix A.1 shows the classification accuracy obtained by the ResNet50 model using the 10-fold cross-validation approach. The achieved accuracy for the individual classes is as follows: neutrophil 50%, eosinophil 51%, basophil 56%, lymphocyte 48%, and monocyte 50%. The overall average accuracy over the five classes is 51%.

WBC Classification with RNN
We explore the capabilities of RNNs in the WBC classification task. To use an RNN for WBC image classification, we adopted the common approach of considering the image rows as sequences and the columns as timesteps. Since the WBC images are of size 128 × 128 × 3, we feed the model with batches of 128 sequences of size 128 × 3. The RNN model adopted in this experiment consists of only one hidden layer. The experimental settings for the training process are set with the following search space: learning rate = {0.0001, 0.001, 0.003, 0.01, 0.1, 0.3}, batch size = {16, 32, 64, 128}, and hidden units = {16, 32, 128, 256, 512, 1,024}. For hyper-parameter selection, we split the 6,562 images into train/test/validation sets of 5,504/512/546 images. The best test accuracy was achieved when using a learning rate of 0.01, a batch size of 64, and 32 hidden LSTM units. Once the hyperparameters were selected, we conducted a new training process using a 10-fold evaluation approach, where 10 different models are trained and evaluated using ten fold splits (each time, a model is trained on nine folds and tested on one fold). The achieved accuracies for the individual classes are as follows: neutrophil 89%, eosinophil 88%, basophil 57%, lymphocyte 93%, and monocyte 90%. The results are shown in Table 14, Appendix A.1. The average accuracy over the five classes is 83%. Table 4 shows a summary of the results achieved by the different models, namely, W-Net, W-Net with SVM, AlexNet, VGGNet, ResNet, and RNN, using our dataset. The reported results are the average scores of different metrics: accuracy, precision, recall, and F1-score. For W-Net, the accuracy, precision, recall, and F1-score are all
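The row-as-sequence encoding used for the RNN experiments can be sketched as follows: each of the 128 image rows becomes one step whose feature vector is the flattened 128 × 3 pixel values.

```python
def image_to_sequence(image):
    """Turn an H x W x C image (nested lists) into H steps,
    each a flat vector of W * C features, for row-wise RNN input."""
    return [[v for px in row for v in px] for row in image]

# Toy 2 x 2 x 3 'image': 2 steps with 2 * 3 = 6 features each.
img = [[[1, 2, 3], [4, 5, 6]],
       [[7, 8, 9], [0, 1, 2]]]
seq = image_to_sequence(img)
print(len(seq), len(seq[0]))  # 2 6
```

For the 128 × 128 × 3 WBC images this yields 128 steps of 384 features each, which the LSTM consumes one row at a time.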

Further Training with Public Data
The LISC public dataset [17] includes WBC images of size 720 × 576 × 3 that were collected from the peripheral blood of eight normal people. The images were classified by a hematologist into the five types of WBC: neutrophils, eosinophils, basophils, lymphocytes, and monocytes. For pre-processing the public dataset, we cropped the WBC regions (nucleus and cytoplasm) in the original images, and then re-sized the images to 128 × 128 × 3 for training. We used a total of 254 WBC images as our dataset: 56, 39, 55, 56, and 48 images for neutrophil, eosinophil, basophil, lymphocyte, and monocyte, respectively. Using the LISC public data, these experiments show the performance of W-Net when adopted for a different dataset. Moreover, we show the performance of W-Net using transfer learning, where a pre-trained W-Net is fine-tuned to classify WBCs from a different dataset or used for different WBC-related tasks. To this end, we conducted two experiments as follows: ❶ the W-Net architecture is used to build a WBC classifier trained using only the LISC public data, and ❷ a pre-trained W-Net-based model is fine-tuned to classify WBCs from the LISC public data. Except for the training epochs, the training hyperparameters are set to be identical in both experiments. In the first experiment, a W-Net-based model was trained from scratch using 4,000 training epochs (254 × 4,000 / 5 iterations) on the LISC public data. The training process concluded after 10.33 hours. In the second experiment, we establish a pre-trained W-Net-based model (trained on our dataset for 500 training epochs) to be used on the LISC public data. The pre-trained W-Net-based model was fine-tuned for 4,000 epochs (254 × 4,000 / 5 iterations) on the public data. The training process concluded after 10.83 hours. Table 15 in Appendix A.1 shows the result of the first experiment, where W-Net is used to classify WBCs from the LISC public data. The achieved result is an average accuracy of 91%.
Table 16 in Appendix A.1 shows the result of the experiment. The average accuracy achieved using a pre-trained W-Net model is 96%.
The second experiment shows a better performance. This result shows that training a model on a large-scale dataset (such as the one used in this study) can benefit transfer learning tasks, where the model is fine-tuned to another dataset or used for other WBC-related tasks. We share our pre-trained model on GitHub [67] and believe that the transfer learning property of deep learning models can help other researchers in the field.

Design Considerations for W-Net
Design choices for our deep learning architecture are described in this section. There are two challenging issues to consider in choosing a specific architecture in the large design space for the WBC classification problem: one is how to handle the data imbalance problem, and the other is how to classify similar-looking images into a relatively small number of classes. In many real-world datasets, data imbalance is quite common, and WBC images resemble each other far more than the objects in traditional image classification problems do. Also, the number of classes is quite limited compared with traditional object identification problems such as the ImageNet challenge. Therefore, it is necessary to take a different approach to the classification problem.

Handling Data Imbalance: Large Batch and Sampling
The results show that W-Net performs well despite the dataset's imbalance, which is observed in the number of samples for each class. Even though the least-represented class in the dataset (basophil, with 6% of the dataset) shows the lowest accuracy of 95% in comparison to the other classes, this accuracy is still higher than the results achieved by other methods, e.g., CNN-based and RNN-based models, for the same class. This performance can be attributed to several factors. For instance, the evaluation of all experiments follows a stratified k-fold cross-validation approach, which preserves the percentage of samples across all folds. Using this approach allows sampling from all classes in their respective ratios in each fold, which dictates the inclusion of all classes in both the training and testing phases. When using a small batch size, e.g., five samples as adopted during the training of W-Net, the error resulting from misclassifying one sample, especially from underrepresented classes, highly impacts the average cost of the learning epoch and contributes to an effective learning process for these classes. In contrast, using a large batch size with a random sampling scheme for batching could minimize the effect of misclassifying underrepresented classes, since performing well on other classes could outweigh the misclassification of the few, if any, samples from classes with small ratios in the dataset.
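A toy calculation makes this dilution effect concrete. Suppose one misclassified minority sample incurs a cross-entropy loss of about 2.3 (i.e., -log 0.1), while every other sample in the batch is well classified at 0.05 (these loss values are illustrative assumptions, not measurements from our training runs):

```python
def mean_loss(losses):
    return sum(losses) / len(losses)

small_batch = [2.3] + [0.05] * 4    # batch size 5, as used for W-Net
large_batch = [2.3] + [0.05] * 127  # batch size 128, for comparison

print(round(mean_loss(small_batch), 3))  # 0.5: the minority error dominates
print(round(mean_loss(large_batch), 3))  # ~0.068: the same error is diluted
```

With a batch of five, the single minority-class error raises the mean batch loss by an order of magnitude; with a batch of 128, the same error barely moves the gradient signal.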
The uneven distribution of image samples per class is a hard aspect of classifying WBC images. W-Net achieves an accuracy of 95% for identifying the basophil class, which is represented by the smallest number of samples (377 samples, a ratio of 6% of the dataset). This result is remarkable knowing that all other CNN-based and RNN-based models achieved accuracies below 56% and 57%, respectively, for the same class. The overall average accuracy of W-Net is 97%, which is the highest among the compared methods for WBC classification. Considering the results on this large-scale dataset, W-Net presents state-of-the-art performance.
Furthermore, the overall result of W-Net with the softmax classifier (97%) and that of W-Net with the SVM classifier (95%) seem similar. However, for the basophil class, which makes up 6% of our dataset, the accuracy of W-Net with SVM is only 87%, lower than the 95% achieved with softmax. AlexNet has more layers than our W-Net; however, its average accuracy is 84%, and in particular its accuracy on the basophil class, with its 6% share of our dataset, is 33%. This means the SVM classifier and the AlexNet network are not well suited to address the imbalanced dataset. As a result, we can claim that W-Net with the softmax classifier is more effective than AlexNet and W-Net with the SVM classifier in the WBC image classification area.

WBC Dedicated Architecture with Shallow Depth
In the 10-fold cross-validation evaluation of W-Net, the minimum average accuracy is 91% (basophil, Fold-9) and the maximum average accuracy is 100%. However, in the case of the VGGNet and ResNet architectures, which have more depth (considering the number of layers), the per-fold accuracy varies from 0% to 100%, resulting in 10-fold average accuracies of 44% and 51%, respectively. This means that very deep networks may not be the optimal choice for WBC image classification. The results of this research show that architectures such as W-Net's, which has five layers (three convolutional and two fully-connected), can be sufficient and more effective for the WBC classification task in comparison to deeper networks such as VGGNet and ResNet. In general, deep networks are known to perform well for image classification, and deep networks such as VGGNet and ResNet show good performance in the ILSVRC. However, they did not show good performance on the WBC image dataset. We claim that our dataset differs from the datasets those deep networks target in two aspects: 1) the ILSVRC dataset has 1,000 classes, while our WBC dataset has only five classes, and 2) the images of the ILSVRC dataset are very different from each other (for example, dogs, birds, flowers, and food), while our dataset has very high visual similarity.
To support this claim, we conducted two simple experiments: ❶ the first was to run W-Net on 200 classes (bird, ball, car, etc.) of images from the Tiny ImageNet dataset [68], and ❷ the second was to run W-Net on five classes without visual similarity (fish, clothes, chair, car, and teddy bear) from the Tiny ImageNet dataset with the same (imbalanced) distribution as our WBC dataset. In these two experiments, only the dataset differed from our WBC experiments; we used the same network, parameters (learning rate, training epochs, etc.), and 10-fold cross-validation approach as for W-Net. In the first experiment, we used a dataset with 200 classes, each with 500 images, for a total of 100,000 images. The result of the first experiment showed 100% accuracy for the 200th class but 0% accuracy for the other 199 classes. The average accuracy was 0.5%, showing that the model was not trained at all. In the second experiment, we used a dataset with five classes of 500, 333, 100, 433, and 300 images respectively (matching the distribution of our dataset), for a total of 1,666 images. The result of the second experiment showed 34% accuracy for the third class (100 chair images), and 84%, 78%, 90%, and 65% for the other classes, respectively. The average accuracy was 79%, which was not as good as the results of W-Net on our dataset. Therefore, we claim that a simple network may be better suited to classify our WBC dataset, with its imbalanced data distribution, small number of classes, and visual similarity.
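The subset sizes used in the second experiment can be checked against the WBC class ratios; a quick sketch (class counts taken from the text):

```python
def ratios(counts):
    """Percentage share of each class, rounded to whole percent."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

wbc_counts = [2006, 1310, 377, 1676, 1193]  # NE, EO, BA, LY, MO (Table 2)
tiny_counts = [500, 333, 100, 433, 300]     # Tiny ImageNet subset sizes

print(ratios(tiny_counts))  # [30, 20, 6, 26, 18] percent
print(ratios(wbc_counts))   # approximately the same shares
```

The Tiny ImageNet subset therefore reproduces the WBC class imbalance to within one percentage point per class, isolating the effect of visual similarity from the effect of the distribution.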

Why Not RNN?
RNN-based models perform well on sequential data and show remarkable results in capturing temporal dependencies within the input data. There are different variations of RNNs; for our experiments, we used LSTM models for their ability to handle long-term dependencies (e.g., 128-step sequences in our application) and the vanishing gradient problem. The average result achieved when using a one-layer LSTM model with 32 hidden units is 83%. This result is far from the results achieved by W-Net (97%). However, it outperforms other CNN-based models such as VGGNet (44%) and ResNet50 (51%).

Dataset Sharing
Recent advances in big data have also driven advances in deep learning; accordingly, having a good dataset has become important. In this section, we generate new WBC image samples using Generative Adversarial Networks (GAN) [18], which train two models, a generator G and a discriminator D, as a minimax game over the value function

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))],

where x~p_data(x) and z~p_z(z) indicate the distributions of real data and noise input, respectively. D aims to maximize log D(x) and G aims to minimize log(1 - D(G(z))); that is, D maximizes its chance of recognizing real images as real and generated images as fake. For the network of D, six convolutional layers, one fully connected layer, LeakyReLU [70] activation, sigmoid activation, and dropout are used. For the network of G, six convolutional layers, one fully connected layer, ReLU activation, sigmoid activation, dropout, and batch normalization [71] are used. Figure 3 shows samples of the original images (left side) used for training the DCGAN model and the images generated by the trained DCGAN model (right side). The first row of Figure 3 is the neutrophil class, followed by the eosinophil, basophil, lymphocyte, and monocyte classes.

Generated Image Quality. To see how similar the generated images are to the original images, we verified the generated WBC images using ❶ the baseline-W-Net, ❷ a generative-W-Net (i.e., W-Net trained on the generated synthetic dataset), ❸ cosine similarity, and ❹ a domain-expert experiment with a medical laboratory specialist. First, we classified the generated images using W-Net. Table 5 shows the confusion matrix for the classification of the generated WBC images: the second column indicates true classes, the second row indicates predicted classes, and the images are classified with 100% accuracy by W-Net. Second, we trained a W-Net model on the 5,000 generated synthetic images, following the same experimental settings used to create the baseline-W-Net.
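As a toy illustration of the GAN value function V(D, G), the sketch below evaluates it with placeholder stand-ins for D and G; these are not the six-layer DCGAN networks described above.

```python
import numpy as np

# Toy evaluation of the GAN minimax objective:
# V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
# D and G are placeholder functions for illustration only.
def gan_value(D, G, real, z):
    d_real = D(real)                      # D's scores on real samples
    d_fake = D(G(z))                      # D's scores on generated samples
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

sigmoid = lambda t: 1 / (1 + np.exp(-t))
D = lambda x: sigmoid(x.sum(axis=1))      # placeholder discriminator in (0, 1)
G = lambda z: 0.5 * z                     # placeholder generator
rng = np.random.default_rng(0)
real = rng.standard_normal((64, 8))       # batch of "real" samples
z = rng.standard_normal((64, 8))          # batch of noise inputs
v = gan_value(D, G, real, z)              # D maximizes V, G minimizes it
```

Since D outputs probabilities in (0, 1), both log terms are negative; training alternates gradient steps that push V up for D and down for G.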
Then, we evaluated the generative-W-Net on classifying the 6,562 real WBC images. Table 6 shows the confusion matrix for the classification of real WBC images using the generative-W-Net. The images are classified with an accuracy of 95%, precision of 93%, recall of 95%, and F1-score of 94%. Third, we measured the similarity between the original and generated images using cosine similarity. We first measured the cosine similarity among the original images for each class (e.g., 377 vs. 377 for the basophil class), then measured the cosine similarity between the original images and the 1,000 generated WBC images for each class (e.g., 377 vs. 1,000 for the basophil class), and compared the two. Table 7 shows the difference in cosine similarity between the original and generated images: 4% for the neutrophil, 3% for the eosinophil, 7% for the basophil, 6% for the lymphocyte, and 6% for the monocyte, with an average of 5% across the five classes. Fourth, we conducted a domain-expert experiment on how well a medical laboratory specialist could classify the generated WBC images. The dataset used in this experiment consists of 10 random original images and 10 random generated images per class, i.e., 100 images in total. Without informing the medical laboratory specialist of the source of the WBC images, we asked for a classification of the provided images. Table 8 shows the confusion matrix for this experiment. The results show that the specialist classified the given WBC samples with an accuracy of 95%. Among the five misclassified images, three were original images and only two were generated images. The results of all verification methods show that the generated images are similar to the original ones. We released the generated (labeled) WBC images on GitHub [67] for education and research purposes.
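The cosine-similarity comparison can be sketched as follows. Random placeholder arrays stand in for the flattened images, and the sizes are reduced for brevity (the real comparison used, e.g., 377 basophil originals vs. 1,000 generated images).

```python
import numpy as np

# Sketch of the similarity check: mean pairwise cosine similarity
# within the original set vs. between originals and generated images.
def mean_cosine_similarity(A, B):
    """A: (n, d), B: (m, d) flattened image vectors; mean over n*m pairs."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float(np.mean(A @ B.T))

rng = np.random.default_rng(0)
orig = rng.random((50, 4096))     # stand-in for a class's original images
gen = rng.random((100, 4096))     # stand-in for its generated images
within = mean_cosine_similarity(orig, orig)  # original vs. original
cross = mean_cosine_similarity(orig, gen)    # original vs. generated
diff = abs(within - cross)        # the per-class gap reported in Table 7
```

A small `diff` means the generated set occupies roughly the same region of pixel space as the originals, which is the sense in which the 3-7% per-class gaps indicate similarity.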

Conclusion
Analysis of WBC images is essential for diagnosing leukemia. Although there are several methods for detecting and counting WBCs from microscopic images of a blood smear, the classification of the five types of WBCs is still a challenge in real-life applications, which we addressed in this work. The rapid growth in the areas of computer vision and machine/deep learning has provided feasible solutions to classification tasks in many domains. This work proposed W-Net, a CNN-based architecture with a small number of layers, to accurately classify the five WBC types. We evaluated W-Net on a real-world large-scale dataset and addressed several challenges such as the transfer learning property and the class imbalance. W-Net achieved an average classification accuracy of 97%. Moreover, we compared W-Net against W-Net combined with an SVM classifier, AlexNet, VGGNet, ResNet, and RNN architectures to show the superiority of W-Net, which consists of three layers, over the other architectures. We synthesized a dataset of new WBC image samples using DCGAN, which we released to the public for education and research purposes.

Figure 1
Neutrophil, eosinophil, basophil, lymphocyte and monocyte, from the left. These were cropped and rescaled to 128 x 128 x 3 for efficient training.

Figure 2
An overview of the pre-processing and the proposed CNN-based architecture for WBC image classification. The pre-processing consists of cropping, resizing and normalizing. Three convolutional layers (including three pooling layers) are in charge of extracting and learning features, and two fully connected layers are in charge of classification.
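The layer arithmetic implied by this figure can be checked with a short calculation. The kernel size, padding, and channel counts below are assumptions for illustration, not the paper's exact hyperparameters.

```python
# Shape arithmetic for a W-Net-style stack: three conv+pool stages on a
# 128 x 128 x 3 input, then two fully connected layers ending in 5 classes.
def conv_pool_out(size, kernel=3, pad=1, pool=2):
    conv = (size + 2 * pad - kernel) + 1   # stride-1 convolution output
    return conv // pool                    # 2x2 max pooling halves the size

size, channels = 128, 3
for ch in (32, 64, 128):                   # assumed channel progression
    size = conv_pool_out(size)             # 128 -> 64 -> 32 -> 16
    channels = ch
flat = size * size * channels              # flattened features fed to FC 1
# FC 1 maps `flat` features to a hidden vector; FC 2 maps it to 5 classes.
```

With these assumptions the spatial resolution halves at each stage (128 to 64 to 32 to 16), so the flattened feature vector entering the first fully connected layer has 16 x 16 x 128 entries.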