A Hybrid Classification Method of Medical Image Based on Deep Learning

With the rapid development of modern medical science and technology, medical image classification has become an increasingly challenging problem. However, in most traditional classification methods, image feature extraction is difficult, and the accuracy of the classifier needs to be improved. Therefore, this paper proposes a high-accuracy medical image classification method based on deep learning, called hybrid CQ-SVM. Specifically, we combine the advantages of the convolutional neural network (CNN) and the support vector machine (SVM) into a novel hybrid model. In our scheme, the quantum-behaved particle swarm optimization algorithm (QPSO) sets the SVM parameters automatically, which solves the SVM parameter setting problem; CNN works as a trainable feature extractor, and SVM optimized by QPSO performs as a trainable classifier. This method can automatically extract features from original medical images and generate predictions. The experimental results show that this method extracts better medical image features and achieves higher classification accuracy.


INTRODUCTION
With the rapid development of medical technology and the continuous emergence of various medical devices, more and more medical images are used in clinical diagnosis. Medical image analysis has therefore been widely adopted, and how to classify these images effectively and accurately has become an urgent problem.
In recent decades, many effective classification methods have been proposed in statistical learning [1], pattern recognition [2], computer vision [3] and signal processing [4]. Medical image classification and recognition are important research directions in pattern recognition and have attracted wide attention from researchers [5][6][7][8][9]. Among the many classification methods, the support vector machine (SVM) [10] stands out as one of the most popular, because the SVM model has strong generalization ability. Unlike earlier classification methods based on empirical risk minimization, SVM is a machine learning method based on statistical learning theory, and it has strong nonlinear processing ability. In recent years, SVM has become a standard tool in machine learning and is increasingly applied to pattern recognition, data mining, intelligent control, prediction, etc. [11][12][13][14]. However, SVM still has drawbacks because its parameters, such as the penalty parameter C and the kernel parameter σ, are difficult to determine, and poor choices during parameter optimization and model selection can lead to over-fitting [15]. Therefore, when SVM is applied to a classification problem, the penalty parameter C and the kernel parameter σ directly affect the final classification accuracy.
Researchers have long studied the parameter optimization of SVM. For example, using a grid search (GS) method based on cross validation (CV) to find the optimal parameters, Gao et al. predicted the status of the Tennessee Eastman (TE) process with an optimized SVM model [16]. Sopyla et al. proposed applying Barzilai-Borwein update steps in the classical SGD algorithm to optimize the SVM model [17]. Rubio et al. [18] used partial derivatives to improve the parameter optimization procedure of LS-SVM. However, these methods easily fall into local optima when optimizing the SVM parameters, which makes it difficult to improve SVM performance. In recent years, with the development of machine learning, evolutionary algorithms (EA) such as the genetic algorithm [19], particle swarm optimization (PSO) [20], and quantum-behaved particle swarm optimization (QPSO) [21] have emerged. These algorithms do not rely on derivatives, which makes the optimization much more flexible. Among them, QPSO is a swarm-based stochastic optimization technique. Compared with standard PSO, QPSO has stronger global search ability and fewer control parameters. Therefore, we employ QPSO to solve the SVM parameter optimization problem.
Deep learning is a new technology in pattern recognition that imitates the automatic learning process of the human brain. It grew out of early neural network research and has been successfully applied to various image processing tasks [22][23][24][25][26]. In recent years, owing to breakthroughs of deep learning in image processing [27], deep learning has also begun to be applied to medical image classification [28][29][30]. Hua et al. [31] preliminarily discussed applying a deep belief network and a deep convolutional neural network to the classification of solitary pulmonary nodules. David et al. [32] used image processing techniques and an artificial-neural-network-based image classifier to analyze retinal images automatically and classify them according to the disease condition. Ramirez et al. [33] applied an artificial neural network to brain images to analyze the condition of Alzheimer's patients. Derya Avci et al. [34] proposed an approach based on adaptive wavelet entropy energy and a neural network classifier for urine cell recognition. Among these models, the convolutional neural network (CNN), a classical deep learning model, has been widely used. CNN is capable of representation learning and, owing to its hierarchical structure, can perform shift-invariant classification of its input. In addition, the rapid development of modern computers provides strong computational support for CNN training: over thousands of training iterations or more, the network parameters are continually updated until convergence is reached. Traditional medical image classification methods mostly need to extract features such as brightness, shape, gradient, gray level and texture, and then select the relevant ones.
This process is complicated, whereas deep learning can be trained directly on the images to extract multi-level features automatically, without complex image processing steps, which makes the process simple. However, as far as we know, few studies in the literature use CNN as a feature extraction method. In our work, CNN acts as a feature extractor on the original images, which avoids complex image processing steps during feature extraction while retaining the important image features. We also compare it with principal component analysis (PCA) [35], a currently popular feature extraction method.
Building on previous studies, this paper proposes a new medical image classification method that combines CNN, QPSO and SVM, named the hybrid CQ-SVM model. Specifically, CNN serves as a trainable feature extractor that extracts the important features of the original image, and SVM optimized by QPSO is used as a trainable output classifier for image classification and prediction. To evaluate the performance of the hybrid CQ-SVM model, we carry out experiments on breast cancer cell images, which confirm the validity of the model.
In summary, the main contributions of this paper are as follows.
(1) We use CNN to design a trainable feature extractor, which automatically extracts high-level features from the original images and makes them distinguishable.
(2) We use QPSO to optimize SVM, design a trainable high-precision output classifier, and conduct extensive comparative experiments to verify its effectiveness.
(3) Combining the advantages of CNN and SVM, we propose a new hybrid CQ-SVM model to improve the accuracy of medical image classification.
The rest of this paper is organized as follows. Section 2 introduces some preliminary knowledge. Section 3 constructs the hybrid CQ-SVM model. Section 4 presents the experiments and analyzes the results. Section 5 gives the conclusions and future research directions.

PRELIMINARIES
In this section, we introduce the CNN model, the QPSO algorithm and the SVM model in turn.

Fig. 1: Structure of the adopted CNN.

CNN Model
CNN is a multi-layer neural network with a deep supervised learning structure [36]. It can be regarded as two parts: an automatic feature extractor and a trainable classifier. The feature extractor consists of several feature mapping layers, which extract highly discriminative features through convolutional filtering and down-sampling. The trainable classifier is usually a fully connected multi-layer perceptron. The back-propagation algorithm is used to train the weights and biases in the feature extractor. Fig. 1 presents an example CNN structure consisting of five layers. The input layer $L_1$ is a matrix of size $P_1 \times P_2$; the original image is passed to the input layer after preprocessing. $L_2$ and $L_3$ are feature mapping layers, which compute the data features. The convolutional filter kernel is 5 × 5, and the down-sampling ratio after filtering is 2. Each neuron in a feature mapping layer is connected to a 5 × 5 neighborhood of the previous layer, so each neuron has 25 inputs in total. All neurons in a feature mapping layer share the same kernel and connection weights, which is called "weight sharing". As shown in Fig. 1, each feature mapping layer reduces the feature size from $P$ in the previous layer to $\lfloor (P - 4)/2 \rfloor$, where $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$. $L_4$ and $L_5$ are hidden layers, which together form a fully connected multi-layer perceptron. In our experiments, we use a more complex CNN named LeNet-5 [37], and we replace the multi-layer perceptron with the optimized SVM classifier as the trainable classifier. This process is described in detail in Section 3.
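The size rule can be checked numerically. The short Python sketch below is an illustration, not part of the original method: it assumes unpadded ("valid") 5 × 5 convolutions followed by non-overlapping pooling by 2 with rounding down, and the function name `feature_map_sizes` is hypothetical. Applied to the 50 × 50 patches used in the experiments, two feature mapping layers give 50 → 23 → 9.

```python
def feature_map_sizes(p, num_layers, kernel=5, pool=2):
    """Side length after each feature mapping layer, starting with the input size.

    Assumes an unpadded ('valid') kernel x kernel convolution, which shrinks the
    side from p to p - (kernel - 1), followed by non-overlapping pooling by
    `pool` with rounding down: floor((p - 4) / 2) for the defaults.
    """
    sizes = [p]
    for _ in range(num_layers):
        p = (p - (kernel - 1)) // pool
        if p < 1:
            raise ValueError("input is too small for this many feature mapping layers")
        sizes.append(p)
    return sizes


print(feature_map_sizes(50, 2))  # [50, 23, 9]
```

Rounding down matters for odd intermediate sizes: the second layer sees 23 − 4 = 19, and pooling by 2 leaves 9 rather than 9.5.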

QPSO Algorithm
The QPSO algorithm is an improvement of the PSO algorithm. Its main goal is to search for positions close to the optimal solution in the search space. In QPSO, each particle is a point in the d-dimensional search space and represents a candidate solution of the problem. The number of particles $N$ is determined by the user, and the positions of all particles are initialized randomly. In the $k$th iteration, each particle $i$ has a current position and a personal best position $P^{(k)}(i)$, the best position it has found so far. $P^{(k)}(g)$ is called the global optimal position; it is the position of the particle with the best fitness value. The particle positions are updated by Eq. (1), in which $M_{best}$, the mean best position, is the average of the personal best positions $P(i)$ of all particles:

$$M_{best} = \frac{1}{N} \sum_{i=1}^{N} P(i) \qquad (2)$$

$p(i)$ is a design parameter called the local attractor; it is important for the convergence of the QPSO algorithm and is calculated by Eq. (3). In Eq. (1), $\alpha$ is called the update parameter and controls the convergence rate of the algorithm. It is calculated by Eq. (4), where $\alpha_0$ is the initial value of $\alpha$, $\alpha_1$ is its final value, and $k_{max}$ is the maximum number of iterations. In our experiments, we set $\alpha_0$ and $\alpha_1$ to 0.5 and 0.9, respectively [38].
In the above formulas, $\theta$, $t$ and $\lambda$ are random numbers drawn uniformly from [0,1].
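As an illustration, the Python sketch below implements one QPSO iteration in the standard formulation, using the symbols defined above: the local attractor mixes $P(i)$ and $P(g)$ with weight $\theta$, the new position is sampled around it with spread $\alpha |M_{best} - x| \ln(1/t)$, and $\lambda$ chooses the sign of the jump. The assignment of $\theta$, $t$ and $\lambda$ to these roles, the linear schedule for $\alpha$, and the function names are assumptions and may differ from the paper's Eqs. (1), (3) and (4).

```python
import numpy as np


def alpha_schedule(k, k_max, alpha0=0.5, alpha1=0.9):
    """Assumed linear schedule: alpha0 at k = 0, alpha1 at k = k_max."""
    return alpha0 + (alpha1 - alpha0) * k / k_max


def qpso_update(x, pbest, gbest, alpha, rng):
    """One QPSO position update (standard formulation, assumed).

    x, pbest: (N, d) arrays of current and personal best positions P(i).
    gbest: (d,) global best position P(g).
    """
    n, d = x.shape
    m_best = pbest.mean(axis=0)                          # mean best position M_best
    theta = rng.random((n, d))
    attractor = theta * pbest + (1.0 - theta) * gbest    # local attractor p(i) (assumed form)
    t = 1.0 - rng.random((n, d))                         # t in (0, 1], so log(1/t) is finite
    lam = rng.random((n, d))
    spread = alpha * np.abs(m_best - x) * np.log(1.0 / t)
    # lambda picks the sign of the jump around the attractor (assumed rule).
    return np.where(lam >= 0.5, attractor - spread, attractor + spread)


# Usage: one update of a 5-particle swarm in 2 dimensions.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(5, 2))
pbest = x.copy()
gbest = pbest[0]
x_next = qpso_update(x, pbest, gbest, alpha_schedule(0, 50), rng)
```

When every particle sits at its own best position and all bests coincide, the spread term vanishes and the update returns that point, which is why the swarm contracts as the personal bests cluster.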

SVM Model
As shown in Fig. 2, the main idea of SVM is to find the optimal separating hyperplane for linearly separable samples. The goal is not only to reduce the classification error on the samples but also to maximize the classification margin. $H$ is the optimal hyperplane, and $H_1$ and $H_2$ are the boundary hyperplanes of the two classes, which are parallel to $H$ and pass through the samples of each class closest to it.
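(1) Mathematical Modeling of SVM: Suppose there is a training sample set $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i$ is the $i$th sample and $y_i \in \{-1, +1\}$ is its class label. The goal of the SVM classifier is to find an optimal hyperplane $f(x) = w^T \phi(x) + b$ in the sample space, where $\phi(\cdot)$ maps samples into a feature space, so that the hyperplane divides the samples into two categories correctly. The optimization problem solved by SVM is as follows:

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i\left(w^T \phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l$$

where $w$ is the weight vector (the normal vector of the hyperplane) and $b$ is a scalar called the bias. $\xi_i$ is a slack (relaxation) variable, which handles samples that are only approximately separable. $C$ is a regularization parameter (also called the penalty factor), which controls how heavily misclassified samples are penalized and thus trades off the proportion of misclassified samples against algorithm complexity.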
(2) Kernel Function: The kernel function implicitly maps the samples into a higher-dimensional feature space, transforming a problem that is not linearly separable into one that is separable or approximately separable. Choosing a proper kernel function is very important for constructing an SVM model with good classification performance. Common kernel functions are the linear kernel, the polynomial kernel, the Gaussian radial basis function and the sigmoid kernel. In this paper, we choose the radial basis function (RBF) as the kernel function of SVM:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

where $\sigma$ is the kernel parameter, which affects the number of support vectors and the training time.
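Both ingredients of the SVM model can be evaluated numerically. The Python sketch below is illustrative only and does not train an SVM. The hypothetical `soft_margin_objective` evaluates $\frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ (subject to $y_i f(x_i) \ge 1 - \xi_i$, $\xi_i \ge 0$) for a given $(w, b)$ using the smallest feasible slacks $\xi_i = \max(0, 1 - y_i f(x_i))$, under the simplifying assumption $\phi(x) = x$. The hypothetical `rbf_kernel` assumes the convention $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$. Because LIBSVM writes the RBF kernel as $\exp(-\gamma\|u - v\|^2)$, a value of $\sigma$ corresponds to $\gamma = 1/(2\sigma^2)$ under this convention.

```python
import numpy as np


def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin objective 0.5*||w||^2 + C*sum(xi), assuming phi(x) = x.

    For fixed (w, b) the smallest feasible slack is
    xi_i = max(0, 1 - y_i * (w^T x_i + b)). Returns (objective, slacks).
    """
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)
    return 0.5 * float(w @ w) + C * float(xi.sum()), xi


def rbf_kernel(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) (assumed convention)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """LIBSVM's RBF kernel is exp(-gamma * ||u - v||^2); convert sigma to gamma."""
    return 1.0 / (2.0 * sigma ** 2)


# Usage: the second sample lies inside the margin, so it needs slack 0.5.
w = np.array([1.0, 0.0])
X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
obj, xi = soft_margin_objective(w, 0.0, X, y, C=1.0)  # obj = 0.5 + 0.5 = 1.0

K = rbf_kernel(X, X, sigma=1.0)  # K[0, 1] = exp(-1.5**2 / 2)
```

If the kernel is instead written with $\sigma^2$ rather than $2\sigma^2$ in the denominator, the conversion becomes $\gamma = 1/\sigma^2$.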

HYBRID CQ-SVM MODEL
In this section, we introduce the proposed hybrid CQ-SVM model, which is designed with reference to the CNN and SVM classification models. We first describe the CNN-based hybrid model and then use QPSO to optimize the SVM within it, which yields the complete CQ-SVM model.

Overview of Hybrid CQ-SVM Model
The design of the proposed hybrid CQ-SVM classifier builds on the complementary advantages of CNN and SVM [39]. CNN has strong learning ability and can automatically learn the important features of images without complex image processing steps, while SVM has powerful generalization ability, so combining the two is expected to yield a better model. The CQ-SVM architecture is obtained by replacing the last output layer of the CNN with an SVM classifier; moreover, to further improve the classification performance of the SVM, QPSO is used to optimize its parameters. The output units in the last layer of the CNN give the estimated class probabilities of the input sample. Each output probability is computed by an activation function whose input is a linear combination of the previous hidden layer's outputs and the weights, plus a bias term. The hidden-layer outputs have no direct interpretation on their own and are meaningful only to the network itself; however, they can serve as feature data of the original image and therefore as the input of the SVM classifier. The structure of the hybrid CQ-SVM model is shown in Fig. 3. First, the preprocessed image data are fed into the original CNN for training, and the network weights and biases are updated until the training process converges. Note that the CNN used in our experiments has the LeNet-5 structure. Then, the trained CNN is used as the feature extractor of the CQ-SVM model. After CNN feature extraction, the SVM optimized by QPSO is trained on the outputs of the hidden layer $L_4$ as new feature vectors. Finally, the optimized and trained SVM classifier performs the recognition task and makes predictions for new images using this automatic feature extraction. During the QPSO optimization of the SVM parameters, the classification accuracy of the SVM model is used for fitness evaluation.
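The data flow of the hybrid design, which reuses the hidden layer $L_4$ instead of the output probabilities, can be sketched in a few lines of Python with NumPy. This is a hypothetical stand-in, not the LeNet-5 network used in the experiments: the class `TinyHybridNet`, its random (untrained) weights, the tanh and softmax activations, and the layer sizes are assumptions chosen only to show where the SVM features come from.

```python
import numpy as np


class TinyHybridNet:
    """Hypothetical stand-in for a trained CNN: only its last two layers are modelled.

    `hidden` plays the role of L4; `predict_proba` is the CNN output layer that
    CQ-SVM replaces with an SVM. Weights are random here, not trained.
    """

    def __init__(self, in_dim, hidden_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.w_h = rng.normal(scale=0.1, size=(in_dim, hidden_dim))
        self.b_h = np.zeros(hidden_dim)
        self.w_o = rng.normal(scale=0.1, size=(hidden_dim, n_classes))
        self.b_o = np.zeros(n_classes)

    def hidden(self, x):
        # L4: weighted sum of the previous layer's outputs plus bias, then tanh (assumed).
        return np.tanh(x @ self.w_h + self.b_h)

    def predict_proba(self, x):
        # Original head: softmax over a linear combination of L4 outputs plus bias (assumed).
        z = self.hidden(x) @ self.w_o + self.b_o
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)


def extract_features(net, x):
    """CQ-SVM keeps the L4 outputs as feature vectors for the SVM."""
    return net.hidden(x)


# Usage: 4 flattened inputs of length 16 -> 8-dimensional SVM features.
x = np.random.default_rng(1).normal(size=(4, 16))
net = TinyHybridNet(in_dim=16, hidden_dim=8, n_classes=2)
features = extract_features(net, x)  # shape (4, 8), input to the SVM
probs = net.predict_proba(x)         # shape (4, 2), replaced by the SVM in CQ-SVM
```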
During parameter optimization, the termination condition is checked at the end of each QPSO iteration. Optimization continues until this condition is met, and the resulting C and σ are output. In this paper, QPSO is used to search for the best combination of these two parameters. The specific steps are as follows:
Step 2: Establish the training sample set and the test sample set.
Step 3: Determine the structure of the SVM and the kernel function; the RBF kernel is chosen in our experiments.
Step 4: Calculate the personal best position of each particle and the global best position of the swarm.
Step 5: Update the particle positions using the update formula, Eq. (1).
Step 6: At the end of the iteration, if the number of iterations exceeds the preset value or the error reaches the preset threshold, the accuracy requirement is met and the resulting C and σ are output. Otherwise, return to Step 4.
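Steps 4 to 6 can be sketched as a QPSO loop that maximizes a fitness function over $(C, \sigma)$. In the Python sketch below, which is illustrative rather than the authors' implementation, the fitness is passed in as a callable; in CQ-SVM it would be the classification accuracy of an RBF-SVM trained on CNN features, but a hypothetical smooth toy function stands in for it so the example runs on its own. Searching over $\log_{10} C$ and $\log_{10} \sigma$ in $[-1, 2]$ (that is, [0.1, 100]) is an assumption, as are the standard QPSO update form, the linear $\alpha$ schedule, and clipping to the search range; the defaults $N = 100$, $k_{max} = 50$, $\alpha_0 = 0.5$ and $\alpha_1 = 0.9$ follow the settings stated in the text.

```python
import numpy as np


def qpso_maximize(fitness, lower, upper, n_particles=100, k_max=50,
                  alpha0=0.5, alpha1=0.9, target=None, seed=0):
    """Maximize `fitness` over the box [lower, upper] with QPSO (Steps 4-6)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n, d = n_particles, lower.size
    x = lower + rng.random((n, d)) * (upper - lower)  # random initial positions
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in pbest])
    for k in range(k_max):
        # Step 4: personal bests are in pbest; take the global best of the swarm.
        gbest = pbest[np.argmax(pbest_fit)].copy()
        # Step 6 (early exit): stop once the required fitness is reached.
        if target is not None and pbest_fit.max() >= target:
            break
        # Step 5: standard QPSO update (assumed form, linear alpha schedule).
        alpha = alpha0 + (alpha1 - alpha0) * k / k_max
        m_best = pbest.mean(axis=0)
        theta = rng.random((n, d))
        attractor = theta * pbest + (1.0 - theta) * gbest
        t = 1.0 - rng.random((n, d))
        spread = alpha * np.abs(m_best - x) * np.log(1.0 / t)
        sign = np.where(rng.random((n, d)) >= 0.5, -1.0, 1.0)
        x = np.clip(attractor + sign * spread, lower, upper)  # stay in the search range
        fit = np.array([fitness(p) for p in x])
        improved = fit > pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
    best = int(np.argmax(pbest_fit))
    return pbest[best], float(pbest_fit[best])


def toy_fitness(z):
    # Hypothetical stand-in for SVM accuracy, peaked at C = 10, sigma = 1.
    log_c, log_sigma = z
    return -((log_c - 1.0) ** 2 + log_sigma ** 2)


best, fit = qpso_maximize(toy_fitness, lower=[-1, -1], upper=[2, 2])
C, sigma = 10.0 ** best
```

With a real fitness function, each evaluation trains an SVM, so the cost grows roughly as $N$ evaluations per iteration; only the end points of the ranges are needed, with no step size.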

EXPERIMENTAL RESULTS
In this section, we first introduce the medical image data set used in the experiments, then describe the experimental process and related parameter settings, and finally analyze the experimental results.

Data Set
The medical image data set used in the experiments is the breast histopathology image data set from the Kaggle website [40]. The original data set consists of 162 whole slide images of breast cancer specimens scanned at 40× magnification, from which 277524 patches of size 50 × 50 were extracted (198738 IDC-negative and 78786 IDC-positive, where IDC denotes invasive ductal carcinoma). Examples of the mixed breast slice images are shown in Fig. 5. We select 12000 images as the experimental data set, including 6000 negative images and 6000 positive images.

Experiment Process and Parameter Setting
(1) The flowchart of the experiment is shown in Fig. 6: the left side shows the training process and the right side the testing process. In all experiments, the symbols 'CNN' and 'PCA' refer to the feature extraction methods based on CNN and PCA, respectively. First, images of the two classes are extracted from the breast histopathology images and divided into a training set and a test set. After preprocessing (binarization, grayscale conversion, etc.), features are extracted from the data by the CNN. Finally, the SVM classifier optimized by QPSO is used to classify and recognize the data.
(2) For CNN, we use a typical LeNet-5 framework, whose network structure is described in [38]. Here, we set the convolution kernel size to 5 × 5 and the pooling size to 2 × 2 to extract the features of the medical images. We also use another typical feature extraction method, PCA, for which we deliberately retain 90% of the components to ensure a fair comparison. For QPSO-SVM, the search ranges of C and σ are both set to [0.1, 100], the swarm size N is set to 100, and the maximum number of iterations $k_{max}$ is set to 50. For comparison, we select two other closely related optimization methods, PSO and CV. For PSO-SVM, the maximum number of iterations $k_{max}$, the swarm size, and the search ranges of C and σ are set the same as for QPSO-SVM. For CV-SVM, three-fold CV is used, and the candidate values of C and σ are both set to $\{2^{-10}, 2^{-9}, \ldots, 2^{10}\}$. We emphasize that the QPSO- and PSO-based optimization methods only need the end points of the ranges of C and σ, without a step value for these two parameters. The QPSO-SVM, PSO-SVM and CV-SVM classification methods are all implemented with the LIBSVM package. We should also emphasize that we select the values of most of the involved parameters empirically. Although this strategy might be suboptimal, it has been observed to produce good results in practice; moreover, all methods share the same parameter settings, which makes the comparison fair.

Samples (Train/Test): 1000/200, 2000/400, 3000/600, 4000/800, 5000/1000, 6000/1200, 6000/1000, 7000/1400, 8000/1600, 9000/1800, 10000/…

(3) Finally, all experiments were conducted in MATLAB R2014a on an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz with a 64-bit operating system.
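Unlike the range-only QPSO and PSO settings in (2), the CV-SVM grid has to be enumerated explicitly. The Python sketch below, an illustration with hypothetical function names and an assumed shuffled split, builds the grid $\{2^{-10}, \ldots, 2^{10}\}$ for both C and σ (21 × 21 = 441 pairs) and a three-fold split of a training set. Each pair then needs three SVM trainings, one per held-out fold, which is 1323 trainings in total.

```python
import numpy as np


def cv_grid(low_exp=-10, high_exp=10):
    """All (C, sigma) pairs with both values in {2^low_exp, ..., 2^high_exp}."""
    values = 2.0 ** np.arange(low_exp, high_exp + 1)
    return [(float(c), float(s)) for c in values for s in values]


def three_fold_indices(n_samples, seed=0):
    """Shuffle the sample indices and split them into three nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), 3)


grid = cv_grid()
folds = three_fold_indices(6000)  # e.g. a 6000-image training set
print(len(grid), [len(f) for f in folds])  # 441 [2000, 2000, 2000]
```

This grid spans roughly 0.001 to 1024, which is wider than the [0.1, 100] range given to QPSO-SVM and PSO-SVM.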

Experiments with Medical Images
(1) First, we randomly select 10000 images as the training set and 2000 images as the test set to train the CNN. Fig. 7 shows the training errors of the CNN model with 10∼80 training epochs. The training error curves decrease, become increasingly smooth, and tend to stabilize. The test errors of Fig. 7a ∼ Fig. 7h are 4.14%, 4.40%, 3.35%, 3.00%, 3.10%, 5.45%, 2.95% and 3.10%, respectively. The curves in Fig. 7a and Fig. 7b are not smooth and have many spikes. The curves in Fig. 7e and Fig. 7f tend to be stable but still have many spikes. The curve in Fig. 7g is the smoothest of the eight curves, with fewer spikes, and it also has the lowest test error (2.95%). These results show that the test error does not necessarily decrease as the number of training epochs increases. Therefore, in the following experiments, we choose the model corresponding to Fig. 7g as the CNN feature extractor.
(2) In this part, we use multiple experiments to demonstrate the effectiveness of CNN feature extraction. In these experiments, we use the built-in SVM classifier of the MATLAB toolkit, hereinafter referred to as NO-SVM. First, we use the 12000 images to carry out multiple experiments, gradually increasing the numbers of images in the training and test sets, and compare NO-SVM with CNN-SVM. The experimental results are shown in Table 1 and Table 2. Each group consists of eight experiments. Fig. 8 shows the average training accuracy of each group. As can be seen from the figure, as the size of the data
Table 2: Test accuracy of CNN-SVM (%).

Samples (Train/Test): 1000/200, 2000/400, 3000/600, 4000/800, 5000/1000, 6000/1200, 6000/1000, 7000/1400, 8000/1600, 9000/1800, 10000/…

(3) In this part, we compare feature extraction methods. 6000 and 1000 images are randomly selected as the training set and the test set, respectively, and CNN-SVM, PCA-SVM and NO-SVM are evaluated in multiple groups of experiments. The results are recorded in Table 3. Note that the same training and test sets are used for all methods within each group of experiments. To make the results more intuitive, we plot the test accuracy as a line chart in Fig. 9: the blue line and dots represent CNN-SVM, the red line and dots PCA-SVM, and the black line and dots NO-SVM, while three further lines in the corresponding colors mark the average values. The figure shows that adding a feature extraction method significantly improves the test accuracy, and that CNN-based feature extraction achieves higher test accuracy than PCA.

(4) In this part, we again use 6000 images as the training set and 1000 images as the test set to demonstrate the effectiveness of the proposed CQ-SVM classification method for medical images. First, we obtain the fitness curves of QPSO-SVM and PSO-SVM and the fitness surface of CV-SVM, all based on CNN features. As shown in Fig. 10, the fitness values on the training samples reach 94.65%, 94.42% and 94.31% for QPSO-SVM, PSO-SVM and CV-SVM, respectively. The average fitness curves show, for each particle, the average fitness between its current position and its optimal position during the evolution, whereas the best fitness curves (the red lines) show the overall fitness of the swarm; the latter are stable during the evolution and reflect the final accuracy.
For CV-SVM, the 3-D plots in Fig. 10 directly show the accuracy, since CV performs a simple grid search over different C and σ values within a predefined range. Fig. 10 therefore indicates that our parameter settings for QPSO-SVM, PSO-SVM and CV-SVM are appropriate for obtaining the desired, stable classification performance. As before, we average the results of multiple groups of experiments. Table 4 gives the experimental results of different feature extraction and optimization methods on the same training and test sets. The best result (the highest classification accuracy) in each case is highlighted in bold. The proposed CQ-SVM model achieves the highest test accuracy. In Table 4, "√" means that the corresponding method is used in the experiment.

CONCLUSION
This paper has proposed a new hybrid CQ-SVM model based on deep learning to solve the problem of medical image classification. The model uses CNN-based feature extraction and an SVM optimized by QPSO for classification: CNN is a trainable automatic feature extractor, and the SVM optimized by QPSO is a trainable output classifier. To demonstrate the effectiveness of the model, we have carried out multiple groups of classification accuracy comparison experiments and evaluated the performance of the CQ-SVM classification model under different feature extraction methods, parameter optimization methods, and their combinations. Further experiments with additional scenarios and comparison methods should be conducted in the future. We also envisage the following directions for developing the presented work.
(1) In this paper, we only apply QPSO to a low-dimensional optimization problem (the two parameters C and σ). In future work, we will address high-dimensional practical problems to validate the advantages of QPSO.
(2) We will study and improve the SVM optimization method so that it optimizes the SVM parameters better.
(3) We will use more complex CNNs, such as AlexNet and VGGNet, to perform feature extraction on higher-dimensional images.
(4) We only compare against traditional feature extraction methods such as PCA. In future work, we will include some state-of-the-art methods.

Compliance with Ethical Standards
Conflict of interest: Author Yulong Wang declares that he has no conflict of interest. Author Xiaofeng Liao declares that he has no conflict of interest. Author Dewen Qiao declares that he has no conflict of interest. Author Jiahui Wu declares that he has no conflict of interest.
Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.