Two-way threshold-based intelligent water drops feature selection algorithm for accurate detection of breast cancer

Breast cancer is one of the most common causes of death among women worldwide. It has been found that a Computer-Aided Diagnosis (CAD) system designed using X-ray mammograms can detect breast cancer at an early stage, which can reduce the death rate to a large extent. This paper proposes a novel 2-way threshold-based intelligent water drops (IWD) algorithm for feature selection to design an effective and efficient CAD system that can detect breast cancer at an early stage. The approach first extracts local binary patterns (LBP) in the wavelet domain from mammograms and then applies the proposed 2-way threshold-based IWD algorithm to select the most important subset of features from the extracted feature set. Two-way thresholding is a technique that finds a lower bound and an upper bound on the number of features to be selected in the optimal subset. Using these threshold values, IWD can produce multiple optimal subsets of features rather than a single one. The best of these subsets is then used to train and deploy a support vector machine (SVM) to classify new mammograms. The results show that the proposed model outperforms many existing CAD systems. We have further compared the proposed feature selection technique with other metaheuristic feature selection techniques, namely ant colony optimization, particle swarm optimization, simulated annealing, genetic algorithm, gravitational search algorithm, inclined planes optimization and gray wolf optimization, and found that it outperforms them. The accuracy, precision, recall, specificity and F1-score of the proposed framework are 99%, 98.7%, 98.123%, 96.2% and 98.4%, respectively, on the MIAS dataset, and 97.89%, 96.9%, 96.4%, 94.8% and 96.2%, respectively, on the DDSM dataset.


Introduction
Breast cancer is one of the most serious types of cancer and causes a large number of deaths among women around the globe. According to the Breast Cancer Research Foundation (BCRF), around 2.3 million new cases of breast cancer were recorded in 2020 (Statistical data for breast cancer 2020). Between 2008 and 2012, the number of detected breast cancer cases increased by 20% and the mortality rate by 14% (Statistical data for breast cancer 2020). One of the main reasons for the increase in these rates is the unhealthy lifestyle prevalent in many urban and economically developed countries. Although modern technologies cannot reduce the growing number of breast cancer cases, they can help decrease the mortality rate: detecting breast cancer at an early stage allows the necessary actions to be taken to prevent its further growth. One such detection technique is CAD.
CAD systems (Computer-aided Diagnosis: Tipping Point of Digital Pathology 2017; Baker et al. 2003; Suzuki et al. 2005) are automated systems that assist doctors in interpreting medical images. They can be treated as an interdisciplinary technology that combines components of artificial intelligence and computer vision with radiological and pathological image processing. For the diagnosis of breast cancer, CAD systems need to be trained with mammograms. Mammograms are the images generated by mammography (Pisano and Yaffe 2005; Gøtzsche and Jørgensen 2013), which uses a low-dose X-ray system to look inside the internal tissues of the breast. In this kind of imaging, part of the body is exposed to a small amount of ionizing radiation to obtain images of its interior. Physicians can later consult and check these images for further diagnosis.
Designing a CAD system consists of several steps: pre-processing, segmentation, feature extraction, feature selection and the design of an efficient classifier (Bandyopadhyay 2010; Mirzaalian et al. 2007; Ibrahim et al. 2016; Comer et al. 1996; Kupinski and Giger 1998; Lu et al. 2013; Hong and Sohn 2009; Mustra et al. 2016; Singh et al. 2016; Beura et al. 2015; Heinlein et al. 2003; Du et al. 2010; Zhang et al. 2017; Liu and Tang 2013). In the pre-processing phase, artifacts, noise, etc., are removed from the mammograms. Segmentation is the process of finding regions of interest (ROIs), i.e., the detected regions that are then analyzed for special characteristics. In feature extraction, features are extracted from the mammograms in the form of vectors. After feature extraction and selection, a classifier is built and validated so that new mammograms, or test samples, can be classified properly.
Various CAD systems have been proposed in recent years, each with its own advantages and disadvantages. Table 1 lists various CAD systems along with their limitations. An effective and efficient CAD system can therefore be built by altering the techniques used in one or more of its phases (pre-processing, segmentation, feature extraction, feature selection and classifier design). The literature shows that most works focus mainly on the pre-processing, segmentation, feature extraction and classification phases. Only a few researchers have emphasized feature selection in the post-analysis phase, or feature selection has been applied only as a refinement step for mammogram classification. Moreover, despite using the most demanding and appealing techniques in all phases, only a few researchers have achieved satisfactory performance from a CAD system that identifies abnormalities in mammograms. Using all extracted features also demands more resources to train the model. Feature selection takes place after feature extraction while designing a CAD system: if feature extraction yields n features, feature selection tries to find the m dominating features among them, where m < n. Many feature selection techniques are available in the literature across different problem domains, but only a few of them have been applied to design a CAD system. In the next paragraph, we provide some insight into various feature selection techniques. To close the above-mentioned gap, we have built a CAD system that keeps very basic techniques for all phases except feature selection. For feature selection, we have introduced a 2-way threshold-based IWD algorithm. This algorithm finds subsets of features from the LBP features (Du et al. 2010) extracted in the wavelet domain.
The results show that the proposed CAD system outperforms many existing state-of-the-art works in the literature.
Table 1 Various CAD systems along with their limitations
• Proposed work: a CAD system that uses automated morphological-operation-based segmentation to find suspicious masses in the breast. Limitations: the validity of the work has been demonstrated only against GLCM-based classification, which limits its generalization; moreover, the detection accuracy reached only 95%, on just 44 mass mammograms.
• Soulami, K. B., Saidi, M. N., and Tamtaoui, A. (2016) (Soulami et al. 2016). Proposed work: a CAD system that uses (i) SVM as the base classifier, (ii) entropy thresholding for pectoral muscle removal, (iii) PSO for ROI extraction and (iv) GLCM to extract shape and texture features. Limitations: the heavy processing in the segmentation and feature extraction phases makes the CAD system slow, and the classification accuracy is also low.
• Proposed work: an associative-classifier-based fuzzy neural network integrated CAD system for mammogram classification. Limitations: the classifier takes association rules as input and requires creating and training a fuzzy neural network, which makes training slow; the accuracy achieved is also only 95%.
• Singh, V. P., Srivastava, S., and Srivastava, R. (2017). Limitation: the feature extraction phase of the proposed CAD system is slow.
• Proposed work: an intelligent healthcare system that combines harmony search (HS) and simulated annealing (SA) for precise and accurate detection of malignancy; the classifier is a kernel SVM, and the efficiency of the framework was demonstrated on a local mammographic dataset. Limitations: no convergence analysis is provided, and the efficiency of the framework has been measured only in terms of sensitivity and specificity.
• Proposed work: a CAD system that combines a GA-based feature selection technique with an adaptive neuro-fuzzy inference system (ANFIS). Limitation: the framework achieves only a 71% accuracy rate.
There are many other bio-inspired models that have been applied to medical problems other than mammogram analysis. Woźniak et al. (Woźniak and Połap 2018) have proposed bio-inspired methods to detect respiratory diseases such as pneumonia, lung sarcoidosis and cancer from medical images; they used these methods to search for special features of pixels that represent the above-mentioned diseases. Capizzi et al. (Capizzi et al. 2019) have proposed a rule-based fuzzy system combined with a neural network that evaluates X-ray images to detect small lung nodules. Their method uses type-1 fuzzy membership functions, together with a bio-inspired reinforcement learning system that preserves the generalization capabilities of the probabilistic neural network.
It has been found that almost all metaheuristic feature selection techniques produce a single subset of features for which a global optimum of the objective function is obtained. Each time such an algorithm is run, the number of features selected in the subset remains almost the same. However, our experiments show that controlling the number of features to be selected in the subset can further enhance the performance of such an algorithm. Therefore, instead of running these algorithms to produce a single subset with a fixed number of features, we can run them to generate multiple subsets containing different numbers of features. To do so, they must be run over a range of values of the parameter that represents the total number of features in a subset. In this paper, we introduce a mechanism called 2-way thresholding to produce multiple subsets of features from the extracted feature set. The aim is to make IWD flexible enough to find more accurate results.
IWD is a metaheuristic optimization algorithm proposed by Shah-Hosseini (Shah-Hosseini 2009) in 2009. It is a population-based algorithm inspired by the natural flow of water drops as they find their way to a river. IWD is best suited for finding minimal-cost paths, and while doing so it uses previous experience. The algorithm operates on a graph (N, E), where N is the set of nodes and E is the set of edges generated from the problem at hand. A set of IWDs is initialized on the graph to simulate the movement of water drops. In each iteration, every IWD builds a complete solution (i.e., a path) $T^{IWD}$ by traversing the nodes of the graph. At the end of each iteration, the iteration-best solution $T^{IB}$ is obtained by assessing the solutions with a quality function.
The IWD that we have used for feature selection is associated with a controlling parameter, denoted $N_{Features}$, that represents the final number of features. Instead of restricting the number of important features IWD can select from the extracted feature set, it is much better to make this number flexible; the controlling parameter serves this purpose. To set the range for this parameter, we have introduced a 2-way thresholding mechanism that finds a lower bound (LB) and an upper bound (UB) on the number of features. IWD combined with this thresholding for optimal feature subset selection improves the convergence rate as well as the detection performance of the CAD system. The primary contributions of the proposed framework are as follows:
• We designed a CAD system in which LBP features are extracted from mammograms decomposed with the discrete wavelet transform (DWT). The images are decomposed at multiple resolutions to make the texture clearer, and computationally lightweight local binary patterns are then extracted from each level. This feature takes advantage of gradient-based features, is tolerant to illumination changes and is robust against monotonic gray-level changes.
• The extracted feature set contains some irrelevant features that may misguide the classifiers during training. We have introduced a new variant of the intelligent water drops (IWD) algorithm, "2-way threshold-based IWD", to select the most dominating set of features from the extracted feature set.
The 2-way thresholding mechanism first finds an LB and a UB on the number of dominating features to be selected, which makes the IWD algorithm capable of finding several subsets of dominating features. Without 2-way thresholding, the IWD algorithm can find only a single rigid subset of dominating features; as a result, the variation in the performance of the CAD system as the total number of features in the optimal subset increases or decreases cannot be captured properly.
• The selected relevant features are then used to train an SVM along with five other classifiers. Among all these classifiers, we found that SVM gives better detection performance than the others.
The rest of the paper is divided into three sections. Section 2 describes in detail the proposed framework and the methods used to develop it; it explains LBP feature extraction in the wavelet domain, the IWD algorithm and 2-way thresholding. Section 3 presents the results and analysis of the experiments that we carried out to support our claims. Finally, Section 4 concludes the work and gives some hints on its future scope.
2 Proposed methods based on 2-way threshold-based IWD
The CAD system proposed in this work consists of two basic phases: the pre-phase and the post-phase. The pre-phase consists of the standard image processing steps, such as pre-processing and feature extraction. The main contribution of this paper lies in the post-phase of the model, where we introduce a 2-way threshold-based IWD. This phase takes as input the dataset (feature set) generated by the pre-phase, which contains a total of 1024 features, and applies the algorithm to it. A basic model of the proposed CAD system is shown in Fig. 1. The post-analysis further comprises designing a classification model that classifies mammograms more accurately; the base classifier used here is a support vector machine (SVM). In the first step, we trained the SVM classifier on the full feature set generated after pre-processing, without fine-tuning the SVM with any optimization technique and without selecting features from the dataset. We then estimated the accuracy of this model using k-fold cross-validation for various values of k. In the second step, we trained the SVM classifier on the optimal feature set generated by the proposed feature selection algorithm. The first step was carried out only for comparative analysis.
Fig. 1 General model of the proposed CAD system
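The baseline step scores a classifier with k-fold cross-validation for several values of k. The sketch below shows that evaluation loop in plain Python under loud assumptions: a nearest-centroid classifier stands in for the SVM (which would normally come from a machine learning library), the helper names `k_fold_indices`, `fit_centroids`, `predict` and `cv_error` are hypothetical, and the toy data are made up purely to exercise the code.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices once and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_centroids(X: List[Vector], y: List[int]) -> Dict[int, List[float]]:
    """Stand-in classifier (NOT the SVM used in the paper): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, List[float]], x: Vector) -> int:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def cv_error(X: List[Vector], y: List[int], k: int, seed: int = 0) -> float:
    """Fraction of samples misclassified when each fold is held out exactly once."""
    wrong = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        wrong += sum(predict(model, X[i]) != y[i] for i in fold)
    return wrong / len(X)


# Toy, made-up data: two well-separated clusters of 10 samples each.
X = [[0.1 * (i % 3), 0.1 * (i % 2)] for i in range(10)]
X += [[5.0 + 0.1 * (i % 3), 5.0 + 0.1 * (i % 2)] for i in range(10)]
y = [0] * 10 + [1] * 10
for k in (4, 5, 10):
    print(k, 1.0 - cv_error(X, y, k))  # cross-validated accuracy for each k
```

Shuffling once and dealing indices round-robin keeps fold sizes within one sample of each other; a stratified split would be a reasonable alternative when classes are imbalanced, as they are in mammogram datasets.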

Feature extraction for the CAD system
In this work, we have extracted wavelet-based LBP features (W-LBP) from each mammogram by decomposing them using discrete wavelet transform (DWT). This decomposition was done up to two levels. Figure 2a depicts the process of wavelet decomposition.
2D-DWT decomposes the ROIs of the mammograms into four sub-bands, LL(1) (low-low), LH(1) (low-high), HL(1) (high-low) and HH(1) (high-high), at different resolution levels, preserving both the low- and high-frequency details of the images. Among these four bands, LL(1) can be considered the best approximation of the original input image. The LL(1) sub-band is further decomposed with 2D-DWT into four second-level sub-bands, and we extracted LBP features from each of these sub-bands, as shown in Fig. 2b. Since each LBP feature vector has 256 dimensions, the total number of extracted features is 1024 (256 × 4).
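The two-level decomposition and the 4 × 256 feature count can be checked with a small pure-Python sketch. It rests on assumptions not stated in the text: an orthonormal Haar wavelet (the paper does not name the wavelet), the standard 8-neighbour, radius-1 LBP operator defined next ($P = 8$, $R = 1$, hence 256 codes), raw unnormalised histograms, and border pixels skipped. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Neighbour offsets for P = 8, R = 1, visited in a fixed circular order (bit i <-> neighbour i).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of an orthonormal 2D Haar DWT; returns (LL, LH, HL, HH).

    Sub-band naming conventions differ between libraries; this is one common choice.
    """
    if len(img) % 2 or len(img[0]) % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # low-pass in both directions
            rows[1].append((a + b - d - e) / 2)  # difference between the two rows
            rows[2].append((a - b + d - e) / 2)  # difference between the two columns
            rows[3].append((a - b - d + e) / 2)  # diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def lbp_histogram(band: Matrix) -> List[int]:
    """256-bin histogram of LBP_{8,1} codes over the interior pixels of a sub-band."""
    hist = [0] * 256
    for r in range(1, len(band) - 1):
        for c in range(1, len(band[0]) - 1):
            g_c = band[r][c]
            code = 0
            for i, (dr, dc) in enumerate(OFFSETS):
                if band[r + dr][c + dc] - g_c >= 0:  # s(g_i - g_c) = 1
                    code += 1 << i                   # weight 2^i
            hist[code] += 1
    return hist


def wlbp_features(img: Matrix) -> List[int]:
    """Decompose twice, then concatenate the LBP histograms of the 4 level-2 sub-bands."""
    ll1, _, _, _ = haar_dwt2(img)
    features: List[int] = []
    for band in haar_dwt2(ll1):
        features.extend(lbp_histogram(band))
    return features


# Synthetic 16 x 16 "image" (made-up values) just to exercise the pipeline.
image = [[float((7 * r + 3 * c) % 16) for c in range(16)] for r in range(16)]
feats = wlbp_features(image)
print(len(feats))  # 1024 = 4 sub-bands x 256 bins
```

Because LBP only compares each neighbour with the centre value, it can be applied directly to real-valued wavelet coefficients; whether the histograms are normalised before classification is a design choice the text leaves open.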
LBP features are gray-scale local texture features. They are computationally lightweight features derived from the local neighborhood of each pixel in the image. The LBP operator can be defined mathematically as

$$LBP_{P,R} = \sum_{i=0}^{P-1} s(g_i - g_c)\, 2^{i}, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where $g_i$ is the gray-scale value of the i-th neighborhood pixel, $g_c$ is the gray-scale value of the center pixel, P is the number of neighborhood pixels (connectivity) and R is the radius of the neighborhood on which the P pixels are equally spaced.

2.2 Intelligent water drops algorithm (IWD)
IWD is an efficient population-based, nature-inspired metaheuristic optimization algorithm proposed by Shah-Hosseini (2009). It is based on observing how water drops find their way to rivers, lakes or seas, and it builds solutions from previous experience, i.e., the solutions obtained in previous iterations. The algorithm is best suited for finding minimal-cost paths. Observation shows that water drops tend to move along paths with less soil; the algorithm therefore removes soil from the components of the best solution so that other water drops are attracted to that path. The problem to be solved with IWD is represented as a graph (N, E), where N is the set of nodes and E is the set of edges. To run the algorithm, a set of IWDs is initialized. In each iteration, every IWD builds a complete solution (i.e., a path) $T^{IWD}$ by traversing the nodes of the graph. At the end of each iteration, the iteration-best solution $T^{IB}$ is obtained by assessing the solutions with a quality function. The steps of the algorithm can be summarized as follows.
Phase 1 (Initialization): This phase initializes the static and dynamic parameters of the process.
In this phase, the given problem is also converted into a graph representation. The static parameters are as follows:
$N_{IWD}$: number of water drops.
$(a_v, b_v, c_v)$: parameters used to update the velocity of the water drops.
$(a_s, b_s, c_s)$: parameters used to update the soil of the local path.
MaxIter: maximum number of iterations.
initSoil: initial value of the local soil.
The dynamic parameters are initialized at the start of each iteration and updated during the search. They are as follows:
$V_c^{IWD_r}$: list of features visited by water drop r.
$vel^{IWD_r}$: velocity of water drop r, initialized to initVel.
$soil^{IWD_r}$: soil carried by water drop r.
In this phase, a complete graph representation G = (N, E) of the given problem is produced, where N denotes the set of nodes (the features of the given problem) and E denotes the set of edges. The algorithm distributes all water drops randomly on the nodes of the graph.
Phase 2 (Building solutions): This phase builds a solution for every water drop in a single iteration, in two steps.
i) Edge selection: A water drop r located on feature i uses a probability function to choose the next unvisited feature j:

$$p_i^{IWD_r}(j) = \frac{f\big(soil(i,j)\big)}{\sum_{k \notin V_c^{IWD_r}} f\big(soil(i,k)\big)} \tag{1}$$

$$f\big(soil(i,j)\big) = \frac{1}{\varepsilon_s + g\big(soil(i,j)\big)}, \qquad g\big(soil(i,j)\big) = \begin{cases} soil(i,j), & \text{if } \min_{l \notin V_c^{IWD_r}} soil(i,l) \ge 0 \\ soil(i,j) - \min_{l \notin V_c^{IWD_r}} soil(i,l), & \text{otherwise} \end{cases}$$

Here $f(soil(i,j))$ gives the inverse of the soil between nodes i and j, and soil(i,j) is the amount of soil on the local path between nodes i and j.
ii) Update rules: As water drop r moves from node i to node j, its velocity and the soil it carries are updated. The velocity at time t + 1 is

$$vel^{IWD_r}(t+1) = vel^{IWD_r}(t) + \frac{a_v}{b_v + c_v \cdot soil^2(i,j)} \tag{2}$$

and the soil removed from the local path and carried by water drop r is

$$\Delta soil(i,j) = \frac{a_s}{b_s + c_s \cdot time^2\big(i,j; vel^{IWD_r}(t+1)\big)} \tag{3}$$

In Eq. (3), $time(i,j; vel^{IWD_r}(t+1))$ is the time required by water drop r to move from node i to node j:

$$time\big(i,j; vel^{IWD_r}(t+1)\big) = \frac{HUD(i,j)}{vel^{IWD_r}(t+1)} \tag{4}$$
In Eq. (4), HUD is a problem-dependent function called the heuristic undesirability.
Based on Eqs. (5) and (6), the soil on the path between node i and node j, as well as the soil carried by each water drop, is updated:

$$soil(i,j) = (1 - \rho_n)\, soil(i,j) - \rho_n\, \Delta soil(i,j) \tag{5}$$

$$soil^{IWD_r} = soil^{IWD_r} + \Delta soil(i,j) \tag{6}$$

Phase 3 (Rules for reconstruction): The iteration-best solution $T^{IB}$ is chosen from the set of solutions obtained by all IWDs as

$$T^{IB} = \arg\min_{\forall T^{IWD}} q\big(T^{IWD}\big) \tag{7}$$

In Eq. (7), $T^{IB}$ is the best solution among all IWDs, i.e., the one containing the least number of features. After obtaining $T^{IB}$, the soil on the edges of the path that forms the current iteration-best solution is updated using the global soil updating parameter $\rho_{IWD}$.
Phase 4 (Termination condition): Phases 2 and 3 are repeated until the maximum number of iterations, MaxIter, is reached. The dynamic parameters are reset to their default values at the start of each new iteration.
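The four phases can be sketched as a compact feature-selection routine. This is a minimal illustration under stated assumptions, not the authors' implementation: it follows the standard edge-selection, velocity and soil rules of Shah-Hosseini (2009); for the global update it assumes that paper's form $soil(i,j) = (1+\rho_{IWD})\,soil(i,j) - \rho_{IWD}\, soil^{IWD}_{IB}/(N_c - 1)$, with $N_c$ the number of nodes in $T^{IB}$; every drop builds a path of exactly `n_select` features (the role of $N_{Features}$); and `hud` and `quality` are hypothetical caller-supplied callables (in the paper, HUD is a cross-validation error and the quality function involves subset size and SVM detection rate). Defaults follow the parameter values reported in this paper where available; `eps_s`, `rho_iwd`, and the small `n_iwd` and `max_iter` are placeholders for the toy run.

```python
import random
from typing import Callable, List, Sequence, Tuple


def iwd_select(n_features: int, n_select: int,
               hud: Callable[[int, int], float],
               quality: Callable[[Sequence[int]], float],
               n_iwd: int = 20, max_iter: int = 30,
               a_v: float = 1.0, b_v: float = 0.01, c_v: float = 1.0,
               a_s: float = 1.0, b_s: float = 0.01, c_s: float = 1.0,
               rho_n: float = 0.9, rho_iwd: float = 0.9, eps_s: float = 0.01,
               init_vel: float = 4.0, init_soil: float = 1000.0,
               seed: int = 0) -> Tuple[List[int], float]:
    """Return the best subset of n_select features found and its quality (lower is better)."""
    if not 1 <= n_select <= n_features:
        raise ValueError("need 1 <= n_select <= n_features")
    rng = random.Random(seed)
    # Phase 1: complete graph over the features; every edge starts with initSoil.
    soil = {(i, j): init_soil for i in range(n_features)
            for j in range(n_features) if i != j}
    best, best_q = [], float("inf")
    for _ in range(max_iter):
        iter_best, iter_q, iter_soil = None, float("inf"), 0.0
        for _ in range(n_iwd):
            path = [rng.randrange(n_features)]  # random start node
            vel, carried = init_vel, 0.0         # dynamic parameters reset per drop
            # Phase 2: build a path of exactly n_select distinct features.
            while len(path) < n_select:
                i = path[-1]
                cand = [j for j in range(n_features) if j not in path]
                shift = min(0.0, min(soil[i, j] for j in cand))  # g(.) lifts negative soil
                weights = [1.0 / (eps_s + soil[i, j] - shift) for j in cand]  # f(soil(i, j))
                j = rng.choices(cand, weights=weights)[0]  # edge-selection probability
                vel += a_v / (b_v + c_v * soil[i, j] ** 2)                 # velocity update
                travel_time = hud(i, j) / vel                              # Eq. (4)
                d_soil = a_s / (b_s + c_s * travel_time ** 2)              # Eq. (3)
                soil[i, j] = (1 - rho_n) * soil[i, j] - rho_n * d_soil     # Eq. (5)
                carried += d_soil                                          # Eq. (6)
                path.append(j)
            # Phase 3: keep the iteration-best solution (minimum quality), Eq. (7).
            q = quality(path)
            if iter_best is None or q < iter_q:
                iter_best, iter_q, iter_soil = path, q, carried
        # Global soil update on the edges of the iteration-best path (assumed 2009 form).
        k_n = 1.0 / max(len(iter_best) - 1, 1)
        for i, j in zip(iter_best, iter_best[1:]):
            soil[i, j] = (1 + rho_iwd) * soil[i, j] - rho_iwd * k_n * iter_soil
        if iter_q < best_q:
            best, best_q = sorted(iter_best), iter_q
    # Phase 4: stop after max_iter iterations.
    return best, best_q


# Toy problem (made up): features 0-3 are "good", every other feature adds error.
good = {0, 1, 2, 3}
subset, q = iwd_select(n_features=12, n_select=4,
                       hud=lambda i, j: 0.05 if j in good else 0.5,
                       quality=lambda s: sum(f not in good for f in s) / len(s),
                       seed=1)
print(subset, q)
```

Edges leading to low-HUD features lose more soil, so later drops favour them; the global update then further reinforces the iteration-best path.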

Heuristic
In our problem, the heuristic is a function that is evaluated for the various choices in order to decide which alternative to follow. The proposed feature selection algorithm uses this function during the feature selection phase: its evaluation determines the next feature to be added to a subset, and the quality function determines the optimal subset.
In IWD, two metrics, the soil content on the path and the heuristic undesirability (HUD), decide the path to be followed, i.e., the next feature to be selected during the search. The soil content soil(i,j) is the amount of soil on the path connecting node i and node j. Its relationship with the probability of selecting that path is

$$soil(i,j) \propto \frac{1}{\text{Probability of selection of the path}}$$

The heuristic undesirability HUD(i,j) is the undesirability of selecting node j after node i has been added to the solution.
After reviewing the literature on CAD system design, we found that most feature selection techniques used in CAD systems follow a single-objective optimization approach. We have therefore kept our approach single-objective in this work, but in the future we intend to extend it to a multi-objective optimization approach (Shahraki and Zahiri 2020). In this work, we have chosen the cross-validation error rate as the HUD function.

$$F\big(D(t_n)\big) = \text{Error}$$
The relationship between HUD(i,j) and the probability of selecting node j is

$$HUD(i,j) \propto \frac{1}{\text{Probability of selection of node } j}$$

That is, as HUD(i,j) decreases, the probability of selecting node j increases, and as HUD(i,j) increases, that probability decreases. Because the heuristic chosen here is the cross-validation error rate, features that lead to a lower cross-validation error are more likely to be selected.
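A short calculation makes this inverse relationship concrete: a larger HUD (cross-validation error) means a longer travel time in Eq. (4), hence less soil removed in Eq. (3), so the edge keeps more soil and stays less attractive. The sketch assumes the standard IWD forms $time = HUD(i,j)/vel$ and $\Delta soil = a_s/(b_s + c_s \cdot time^2)$, the soil parameters $a_s = 1$, $b_s = 0.01$, $c_s = 1$ and the initial velocity of 4 reported in this paper, and two made-up error rates; `delta_soil` is a hypothetical helper name.

```python
def delta_soil(hud: float, vel: float,
               a_s: float = 1.0, b_s: float = 0.01, c_s: float = 1.0) -> float:
    """Soil removed from edge (i, j) when HUD(i, j) = hud and the drop moves at speed vel."""
    travel_time = hud / vel                      # Eq. (4)
    return a_s / (b_s + c_s * travel_time ** 2)  # Eq. (3)


low = delta_soil(hud=0.05, vel=4.0)   # candidate feature giving 5% CV error
high = delta_soil(hud=0.20, vel=4.0)  # candidate feature giving 20% CV error
print(round(low, 2), round(high, 2))  # 98.46 80.0
```

The edge toward the low-error feature loses more soil, which raises its selection probability for the drops that follow.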
The quality function used in this paper to decide the iteration-best solution is defined in terms of two quantities: $SL(T^{IWD})$, the number of features in the subset selected by the IWD, and $DR(T^{IWD})$, the detection rate of the SVM for the IWD's solution.
The iteration-best solution is the one that gives the minimum value of the quality function, i.e., $T^{IB} = \arg\min_{T^{IWD}} q(T^{IWD})$.

Concept of thresholding
In this paper, we introduce a mechanism to produce multiple subsets of features from the extracted feature set, in order to make IWD flexible enough to find more accurate results. Almost all metaheuristic feature selection techniques produce a single subset of features for which a global optimum of the objective function is obtained, and each time such an algorithm is run, the number of features selected in the subset remains almost the same. However, our experiments show that controlling the number of features to be selected in the subset can further enhance the performance of such an algorithm. Therefore, instead of running these algorithms to produce a single subset with a fixed number of features, we can run them to generate multiple subsets with different numbers of features. To do so, they must be run over a range of values of the parameter that represents the total number of features in a subset. Fig. 3 illustrates this idea. In the figure, each circle represents a subset of features selected by a metaheuristic feature selection technique, and the radii $r_1, r_2, r_3, r_4$ and $r_5$ of the circles represent the total number of features in each subset, where $r_1 < r_2 < r_3 < r_4 < r_5$. If one of these subsets is the optimal subset produced by the algorithm, multiple subsets can easily be obtained by increasing or decreasing the radius. This motivates finding a lower and an upper threshold value for the radius. The 2-way thresholding is the process of finding a lower bound (LB) and an upper bound (UB) on the total number of features, $T_{Features}$, to be selected in the subset by IWD. To find the LB and UB in this work, we first performed an exhaustive search in which the number of features $T_{Features}$ in a subset was varied from 1 to 1024.
During this search, we followed complete randomization without using IWD, i.e., the algorithm picked features without evaluating any quality or objective function. We evaluated the cross-validation error rate for each of these subsets and recorded the results as shown in Algorithm 2.
A case study illustrating the LB and UB selection mechanism is depicted in Fig. 4. As the total number of features in our dataset after LBP feature extraction is 1024, we initialized $N_f$ with the value 1024 ($N_f = 1024$). We performed the iterations explained in Algorithm 2 and found that for the index values 60 and 61, the difference $E[60] - E[61]$ is not zero, so the LB is 60. To find the UB, we added a threshold distance $T_{dist}$, which is an input to the algorithm; in our case $T_{dist} = 40$, which gives UB = 100. The value chosen for $T_{dist}$ is up to the implementation. In this case study, we found that the accuracy is 95% for all index values less than 60.
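Algorithm 2 itself is not reproduced in this text, so the following is only a hedged reading of the procedure described above: E[i] is the cross-validation error of a randomly drawn subset of i features (no IWD, no objective function), LB is the first index i at which E[i] − E[i+1] ≠ 0, and UB = LB + T_dist. The function names, the fallback when E never changes, the cap of UB at $N_f$, and the toy error curve are all assumptions introduced for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def random_subset_errors(n_f: int, error_of: Callable[[Sequence[int]], float],
                         seed: int = 0) -> List[Optional[float]]:
    """E[i] for i = 1..n_f: error of a uniformly random subset of i features."""
    rng = random.Random(seed)
    errors: List[Optional[float]] = [None]  # pad so that errors[i] <-> subset size i
    for i in range(1, n_f + 1):
        errors.append(error_of(rng.sample(range(n_f), i)))
    return errors


def find_bounds(errors: Sequence[Optional[float]], t_dist: int) -> Tuple[int, int]:
    """LB = first i with E[i] - E[i+1] != 0; UB = LB + t_dist, capped at n_f (assumed)."""
    n_f = len(errors) - 1
    for i in range(1, n_f):
        if errors[i] - errors[i + 1] != 0:
            return i, min(i + t_dist, n_f)
    return n_f, n_f  # assumed fallback: the error never changed


def toy_error(subset: Sequence[int]) -> float:
    """Made-up error curve: flat at 5% up to 60 features, then slowly rising."""
    return 0.05 if len(subset) <= 60 else 0.05 + 1e-4 * (len(subset) - 60)


errors = random_subset_errors(1024, toy_error)
print(find_bounds(errors, t_dist=40))  # (60, 100)
```

In the real setting, `error_of` would train and cross-validate the classifier on the sampled feature columns, which is far more expensive than this toy curve.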
In this LB and UB selection process, low accuracy rates were encountered because no algorithm was applied to select the i features. We found that applying IWD in place of this randomization increases the accuracy rate significantly. A case study of this randomization is shown in Fig. 5. Figure 6 depicts the whole proposed feature selection framework as a flowchart.

Result analysis and discussion
The simulations and experiments for this work were performed on a standard computer system with the configuration given in Table 3. Python 3.8 is the environment used to implement the proposed CAD system; it allows various packages or modules to be integrated to accomplish a particular task. The packages and modules used for our task are listed in Table 4.
The experiments were performed on the mini-MIAS dataset, which contains 322 digital mammograms, each of size 1024 × 1024 pixels. After applying the basic feature extraction techniques to the mammographic images of the MIAS dataset, the features were recorded in a .csv file. For each image, we recorded 1024 features together with the class label of that sample. In this way, we recorded a total of 327 samples.
To present the results and analysis clearly, we followed a step-by-step implementation model. In the first step, we performed experiments to demonstrate the superiority of the IWD algorithm over other metaheuristic feature selection approaches, namely ACO, PSO, SA, GA, GSA, IPO and GWO, for our problem. To do this, we compared the performance of these algorithms by evaluating the statistics of the minima obtained in each of 1000 runs of every algorithm for our objective function: the mean, the standard deviation, the minimum (best) and the maximum (worst). Figure 7a-h depicts the results of this evaluation.
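The four summary statistics can be computed per algorithm from the list of per-run minima with Python's standard `statistics` module. In this sketch the helper name `run_statistics`, the choice of the sample (rather than population) standard deviation, and the placeholder numbers are assumptions; the values are not results from the paper.

```python
import statistics
from typing import Dict, List


def run_statistics(minima: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Mean, standard deviation, best (min) and worst (max) of each algorithm's per-run minima."""
    return {
        name: {
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample std; the text does not say which is used
            "best": min(values),
            "worst": max(values),
        }
        for name, values in minima.items()
    }


# Placeholder numbers (NOT results from the paper), three runs per algorithm.
summary = run_statistics({"alg_A": [0.02, 0.03, 0.025], "alg_B": [0.05, 0.04, 0.06]})
print(summary["alg_A"]["best"], summary["alg_B"]["worst"])  # 0.02 0.06
```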
These results clearly show that IWD works much better than the other algorithms. However, the IWD considered in this evaluation is IWD without 2-way thresholding, so in the next step we evaluated the same statistics for IWD with 2-way thresholding. The results show a further improvement in the performance of the IWD algorithm. The reason for applying 2-way thresholding to IWD is that a better subset, with more or fewer features than the optimal subset found by the algorithm, may exist near that optimal subset. This holds for all the algorithms mentioned in this paper. The variation in the error rate with respect to the total number of features in the subsets for all these algorithms is depicted in Fig. 9a-h.
These results also show that the IWD algorithm works much better than the other algorithms even when the value of $N_{Features}$ varies. For each run, whatever the total number of features in the selected subset, IWD yields a lower error rate (objective function value) than the other algorithms.
To account for this variation of the error rate with respect to $N_{Features}$, 2-way threshold-based IWD first finds a lower bound and an upper bound for $N_{Features}$ by running Algorithm 2. The result of running this algorithm is shown in Table 5. From this evaluation, we found LB = 60 and UB = 100, because below 60 features the global accuracy rate remains at 95%, and beyond 100 it decreases gradually. We then ran the IWD algorithm for each value in this range and obtained multiple subsets of features, with $N_{Features}$ varying from 60 to 100. Finally, we kept the best feature subset for validating our classifier, the SVM. Some of the best subsets obtained during this run are shown in Table 6. Among these, subset $S_3$, $S_4$ or $S_6$ can be used to train the SVM for post-analysis, i.e., for classification; when new samples arrive, the SVM can classify the mammograms based on these features. The precision-recall curves of the SVM trained with each of these subsets are shown in Fig. 10. The performance of the mentioned algorithms can now be compared in terms of the number of features in the final selected subset and the objective function value of that subset. IWD with 2-way thresholding outperforms all the other algorithms. Table 7 lists the total number of features in the final subset and the global accuracy rate for that subset.
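The sweep over the threshold range amounts to running the selector once for each value of $N_{Features}$ between LB and UB and keeping the subset with the lowest error. In this sketch, `select` is a hypothetical stand-in for one IWD run with a fixed $N_{Features}$ that returns a subset and its cross-validation error, `two_way_threshold_search` is a hypothetical name, and the toy selector is made up.

```python
from typing import Callable, List, Optional, Tuple

Selector = Callable[[int], Tuple[List[int], float]]


def two_way_threshold_search(lb: int, ub: int, select: Selector):
    """Run select(n) for every n in [lb, ub]; return (best subset, its error, its size, history)."""
    history: List[Tuple[int, float]] = []  # (N_Features, error) pairs, e.g. for a Table 6-style listing
    best_subset: List[int] = []
    best_err, best_n = float("inf"), None  # type: float, Optional[int]
    for n_features in range(lb, ub + 1):
        subset, err = select(n_features)
        history.append((n_features, err))
        if err < best_err:  # strict '<' keeps the smallest n among ties
            best_subset, best_err, best_n = subset, err, n_features
    return best_subset, best_err, best_n, history


def toy_select(n: int) -> Tuple[List[int], float]:
    """Made-up selector whose error is smallest at n = 75."""
    return list(range(n)), abs(n - 75) / 1000


subset, err, n_best, hist = two_way_threshold_search(60, 100, toy_select)
print(n_best, err, len(hist))  # 75 0.0 41
```

Keeping the full history rather than only the winner makes it easy to report several near-best subsets, as Table 6 does.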
For the convergence analysis of the proposed 2-way threshold-based IWD algorithm, we generated convergence curves for 9 runs of the algorithm on the extracted feature set. The algorithm converged properly in every run. Figure 11 depicts the convergence curves for IWD with 2-way thresholding.
Further, we have also compared the convergence time of our proposed algorithm with that of ACO, PSO, SA, GA, GSA, IPO and GWO, and found that our algorithm converges faster than all of these metaheuristic feature selection techniques. Figure 12 depicts the convergence rate of each algorithm.
Some of the parameters used in our algorithm have been chosen through experiments, and others have been set to standard values reported in the literature. The interpretation of each parameter, and how its value was chosen, is explained below.
• N_Features: This is the controlling parameter of our two-way thresholding-based algorithm. It can take any value between the lower bound (LB) and the upper bound (UB), which have been decided using Algorithm 3. In our case, LB and UB were found to be 60 and 100, respectively.
• n_IWD: This parameter defines the number of intelligent water drops used by the algorithm. Its appropriate value can vary from problem to problem. We found that values greater than 200 do not improve the values of our fitness function, so we chose a value between 100 and 200.
• n_ITER: The number of iterations has been set to 100 for all runs with N_Features values between LB and UB. Across various runs of IWD on our problem, we found that the algorithm converges beyond n_ITER = 70, after which the fitness values remain almost constant.
• a_v, b_v, c_v: These are the velocity parameters used to update the velocity of the water drops. Based on applications of the IWD algorithm reported in the literature (Shah-Hosseini 2009; Hosseini 2007; Alijla et al. 2013, 2014), we set a_v = 1, b_v = 0.01 and c_v = 1. Running our algorithm with other random values of these parameters, we found that it performs best with these values.
• a_s, b_s, c_s: These are the soil parameters used to update the soil associated with the water drops. Based on the same literature (Shah-Hosseini 2009; Hosseini 2007; Alijla et al. 2013, 2014), we set a_s = 1, b_s = 0.01 and c_s = 1. Running our algorithm with other random values of these parameters, we found that it performs best with these values.
• ε_s, ρ_IWD and ρ_n: ε_s is a small positive constant that prevents division by zero in the function f(·) used in the algorithm. Its value has been chosen as suggested in (Alijla et al. 2013; Alijla et al. 2014). ρ_n is the local soil updating parameter, whose value should be a positive number less than 1 (Shah-Hosseini 2009; Hosseini 2007; Alijla et al. 2013, 2014); we chose 0.9. The global soil updating parameter ρ_IWD, on the other hand, has been chosen as given in Alijla et al. (2013).
• initVel, initSoil: The constant initVel is the initial velocity of each water drop. We set this parameter to 4, as suggested in [81, 82, 83, 84, 85]. The constant initSoil is the initial soil on the path between every two nodes i and j, such that soil(i, j) = initSoil. This parameter can be chosen as any value, as suggested in (Shah-Hosseini 2009; Hosseini 2007; Shah-Hosseini 2010; Alijla et al. 2013; Alijla et al. 2014). We found that our algorithm performs best with initSoil = 1000 (Table 8).
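To show where the parameters listed above enter the algorithm, the following is a minimal sketch of the standard IWD update rules in the form given by Shah-Hosseini (2009): velocity update with a_v, b_v, c_v; soil removal with a_s, b_s, c_s; local soil update with ρ_n; and soil-based selection probability using f(·) and ε_s. Assumptions: ε_s = 0.01 (its value is not recoverable from this section), the heuristic undesirability `hud` of a feature is a hypothetical placeholder fixed at 1, and the global soil update with ρ_IWD is omitted. The function names are illustrative, not the authors' implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Values reported in the parameter list above.
A_V, B_V, C_V = 1.0, 0.01, 1.0   # velocity parameters
A_S, B_S, C_S = 1.0, 0.01, 1.0   # soil parameters
RHO_N = 0.9                      # local soil updating parameter
INIT_VEL = 4.0
INIT_SOIL = 1000.0
EPS_S = 0.01                     # assumption: small positive constant eps_s


def selection_probs(soils: Sequence[float]) -> List[float]:
    """p(j) = f(soil(i,j)) / sum_k f(soil(i,k)), with f(s) = 1 / (eps_s + g(s))."""
    m = min(soils)
    g = [s if m >= 0 else s - m for s in soils]  # shift so that g(s) >= 0
    f = [1.0 / (EPS_S + gs) for gs in g]
    total = sum(f)
    return [x / total for x in f]


def choose_next(soil: Dict[Tuple[int, int], float], i: int,
                unvisited: List[int], rng: random.Random) -> int:
    """Sample the next feature j from node i; less soil means higher probability."""
    probs = selection_probs([soil[(i, j)] for j in unvisited])
    return rng.choices(unvisited, weights=probs, k=1)[0]


def traverse_edge(vel: float, soil_ij: float, carried: float,
                  hud: float = 1.0) -> Tuple[float, float, float]:
    """Move a drop along edge (i, j); return (new velocity, new edge soil, new carried soil).

    `hud` (heuristic undesirability of j) is a hypothetical placeholder.
    """
    vel = vel + A_V / (B_V + C_V * soil_ij ** 2)       # velocity update
    travel_time = hud / vel                            # time(i, j; vel)
    d_soil = A_S / (B_S + C_S * travel_time ** 2)      # soil picked up on this edge
    soil_ij = (1 - RHO_N) * soil_ij - RHO_N * d_soil   # local soil update
    return vel, soil_ij, carried + d_soil


if __name__ == "__main__":
    rng = random.Random(0)
    features = [0, 1, 2, 3]
    soil = {(i, j): INIT_SOIL for i in features for j in features if i != j}
    j = choose_next(soil, 0, [1, 2, 3], rng)
    vel, soil[(0, j)], carried = traverse_edge(INIT_VEL, soil[(0, j)], 0.0)
    print(j, round(vel, 6), round(soil[(0, j)], 2), round(carried, 2))
```

Because the edge just traversed loses soil, later drops become more likely to choose it, which is the positive-feedback mechanism that the soil parameters and ρ_n tune.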
The control parameter settings of all the other feature selection techniques (ACO, PSO, SA, GA, GSA, IPO and GWO) are shown in Table 9.
In the second step, we used the optimal subsets obtained from each algorithm to validate the classifier. To measure the performance of the classification model, we used the four fundamental counts of the confusion matrix: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). From these counts, we evaluated the following performance measures.
• Accuracy.
• Precision.
• Recall or sensitivity.
• Specificity.
• F1-score.
The above performance measures have been calculated as shown in Eqs. (10), (11), (12), (13) and (14); for example, recall is given by
$$\mathrm{Recall} = \mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (12)$$
Fig. 7 (a-h) Statistical analysis for algorithms IWD, ACO, PSO, SA, GA, GSA, IPO and GWO, respectively.
We have evaluated the performance measures for six commonly used classifiers: SVM, Naïve Bayes (NB), k-nearest neighbors (k-NN), decision tree (DT), random forest (RF) and artificial neural network (ANN). The results show that combining 2-way threshold-based IWD with SVM yields the best results in terms of the above performance measures. Tables 10, 11, 12, 13, 14 and 15 depict the results of these evaluations.
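Only the recall formula, Eq. (12), survives in this section. The sketch below computes all five measures from the confusion-matrix counts using their standard textbook definitions, which we assume match Eqs. (10)-(14). The names `Confusion` and `performance_measures`, and the convention of reporting an undefined ratio as 0, are illustrative assumptions; the counts in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Assumption: an undefined ratio (zero denominator) is reported as 0.
    return num / den if den else 0.0


def performance_measures(c: Confusion) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # Eq. (12): TP / (TP + FN)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from the paper's tables.
    print(performance_measures(Confusion(tp=40, fp=10, tn=45, fn=5)))
```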
To put the performance of our CAD system in context, Table 16 provides a comparative analysis with various other CAD systems, along with their accuracies.
From the above experiments, we found that our proposed CAD system, integrated with 2-way threshold-based IWD for feature selection, achieves an accuracy of 99% on the MIAS dataset with a shorter convergence time than the compared metaheuristic techniques.

Conclusion and future work
This work proposed a CAD system that integrates an effective feature selection technique into the post-analysis phase. This feature selection technique is based on the metaheuristic optimization algorithm IWD. To make IWD more flexible, we also introduced the concept of thresholding. Using this thresholding, IWD can find a set of feature subsets from the dataset instead of a single rigid subset. Although we have applied this model to mammogram classification, it can be used for other real-world classification problems as well. In the medical domain, this model is suitable for datasets with a large number of features or attributes, and it can also be applied to tasks such as tumor detection, detection of polyps in the colon, and lung cancer detection. We compared our proposed CAD system with many existing systems, and the results show that our system outperforms them. We also compared the feature selection technique used in this paper with other metaheuristic approaches, namely ACO, PSO, simulated annealing, GA, GSA, IPO and GWO, and the results show that our feature selection technique outperforms the others.
In the medical domain, there are many other classification problems, such as lung cancer detection, DNA sequencing and skin cancer detection, to which our proposed model can be extended. Moreover, the proposed concept of "2-way thresholding" is general and can be applied to other feature selection problems.

List of variables
• Variables related to feature extraction: g_i: the gray-scale value of a neighborhood pixel; g_c: the gray-scale value of the center pixel; P: connectivity from the neighborhood pixels; R: neighborhood radius for N equally spaced pixels.