An efficient chaotic salp swarm optimization approach based on ensemble algorithm for class imbalance problems

Class imbalance problems have attracted the research community, but few works have focused on feature selection with imbalanced datasets. To handle class imbalance problems, we developed a novel fitness function for feature selection using the chaotic salp swarm optimization algorithm, an efficient meta-heuristic that has been successfully applied to a wide range of optimization problems. This paper proposes an AdaBoost algorithm combined with chaotic salp swarm optimization. The most discriminating features are selected using salp swarm optimization, and AdaBoost classifiers are then trained on the selected features. Experiments show the ability of the proposed technique to find the optimal features while maximizing the performance of AdaBoost.


Introduction
Classification plays an important role in machine learning and data mining. Most traditional classification algorithms work well under a balanced distribution of samples among the classes. However, when the class distribution is imbalanced, classifiers often suffer performance degradation (Galar et al. 2013). A dataset is said to be imbalanced when at least one class has substantially more instances than the others. The class with the greatest number of instances is called the majority class, while classes with a lower number of instances are defined as minority classes (Searle and Searle 1987). In imbalanced classification problems, the minority class is often the class of interest, and the cost of misclassification is higher for the minority class than for the others (Maldonado et al. 2014; Di Martino et al. 2013; López et al. 2013; Rekha and Reddy 2018; Zhai et al. 2018). For example, in cancer diagnosis many more people will not have cancer in comparison with those who have it. If a patient who has cancer is mistakenly diagnosed as not having it, disaster may arise as treatment is not given (Menardi and Torelli 2014). Classification on imbalanced datasets is a challenging problem, basically because the lack of a sufficient number of samples of the minority class complicates the task of identifying combinations of features with discriminative power.
Many approaches to deal with imbalanced data have been proposed, and they can be summarized into three categories: data-level strategies, algorithm-level strategies and cost-sensitive algorithms. Data-level strategies modify a dataset to overcome imbalance, trying to balance all classes. In the past fifteen years, numerous oversampling and undersampling algorithms have been proposed. One of the most popular among them is the Synthetic Minority Oversampling TEchnique (SMOTE) proposed by Chawla et al. (2002). SMOTE uses interpolation to generate synthetic samples for the minority class. SMOTE-based sampling algorithms have been very popular among data-level techniques (Ramentol et al. 2012; Gao et al. 2011; Verikas et al. 2010). Variants of SMOTE also try to leverage the distribution of majority class samples. Recently, Fiore (2020) leveraged the attraction-repulsion Weber problem to generate samples that are as close as possible to minority samples and as far as possible from majority samples. Other approaches include the use of generative adversarial networks to produce synthetic samples that are indistinguishable from real ones. Undersampling and oversampling can improve the performance of classifiers by balancing the skewed distribution. However, they are criticized because important information may be lost with undersampling, while redundant or inappropriate samples may be generated with oversampling.
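As a concrete illustration, the SMOTE interpolation step described above can be sketched as follows (a minimal sketch; the function and variable names are illustrative, not from the cited works):

```python
import numpy as np

def smote_sample(x_i, x_neighbor, rng):
    """Generate one synthetic minority sample by interpolating between
    a minority instance and one of its minority-class nearest
    neighbours, as SMOTE does."""
    gap = rng.random()                       # uniform in [0, 1)
    return x_i + gap * (x_neighbor - x_i)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
neighbor = np.array([3.0, 4.0])
synthetic = smote_sample(x, neighbor, rng)   # lies on the segment x-neighbor
```

Because the interpolation coefficient is drawn from [0, 1), every synthetic point lies on the line segment between the two minority samples, which is exactly why SMOTE can produce "inappropriate" samples when minority regions are disjoint.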
Strategies at the algorithmic level focus on changing existing classification algorithms to improve the ability of classifiers to learn from minority classes (Galar et al. 2011). Cost-sensitive and one-class learning are the common techniques proposed in the literature (López et al. 2012; Cao et al. 2013). In order to increase the classifier learning potential, cost-sensitive algorithms adopt a cost matrix, assigning higher misclassification costs to minority samples while training the algorithm. Researchers have suggested many approaches in this respect, including the combination of data-level approaches with cost-sensitive learning, the cost of collecting features before classification, and the use of class distribution costs prior to classification (Thanathamathee and Lursinsap 2013; Weiss et al. 2013; Thai-Nghe et al. 2010).
Ensemble-level techniques are also widely used for class imbalance problems. Bagging (Sun et al. 2018; Chung and Kim 2015) and boosting (Viola and Jones 2002; Li et al. 2015; Chawla et al. 2003; Joshi et al. 2001) are two popular ensemble approaches. Many studies have aimed at improving the performance of the AdaBoost classifier (Li et al. 2019; Wang et al. 2019). Haixiang et al. (2016) proposed an algorithm to handle multi-class imbalance problems called the BPSO-AdaBoost-KNN algorithm, which worked significantly well in identifying the important features. Yijing et al. (2016) proposed an adaptive classifier that balances skewed data distributions very efficiently; it employs feature selection and preprocessing to train multiple classifiers. Xinwu et al. (2016) used graphical and mathematical models to analyze the multi-class AdaBoost algorithm and proposed a new multi-class classification method that effectively lowers the accuracy required of the weak classifiers. The extreme learning machine (ELM) is an effective multi-classification learning algorithm but does not work for imbalanced datasets. By enclosing weighted ELM algorithms in the AdaBoost framework, Li et al. (2018) suggested AdaBoost-weighted composite kernel ELM. For each sample, this architecture integrates spatial and spectral information into the composite kernel. The combination of composite kernel methods and the AdaBoost architecture with weighted ELM provided excellent performance in improving classification accuracy. An improved AdaBoost algorithm with a weight vector was proposed by Dou and Chen (2017). In order to represent the recognition power of the base classifiers, the weight vector assigns a weight to each individual class. This algorithm greatly increases classification precision by preventing overfitting. For imbalanced data classification, an ensemble evolutionary algorithm was suggested by Li et al. (2017).
For better classification, they applied genetic algorithms to the AdaBoost classifier. In Qiaojin et al. (2008), the authors proposed four algorithms based on a threshold on the number of majority class samples. The algorithms, defined for increasing threshold values, are A-AdaBoost, B-AdaBoost, C-AdaBoost and D-AdaBoost, and provide effective results when handling imbalanced data. In that study, particle swarm optimization (PSO) was used to resolve class imbalance problems, but the main disadvantage of the PSO algorithm is that in high-dimensional spaces it may fall into local optima, and it has a low convergence rate in the iterative process.
Many machine learning algorithms do not handle large feature sets very well (Yin et al. 2013; Nikhath and Subrahmanyam 2019); thus, several techniques are targeted at applying feature selection in an efficient manner. Feature selection can be categorized into two classes of methods: (1) filter methods and (2) wrapper methods. In filter-based methods, features are selected based on the properties of the data, without using learning algorithms, whereas wrapper methods use a learning algorithm to select the best feature subset according to some optimality criterion. In comparison with filter methods, wrapper methods are more accurate but more expensive computationally (Li et al. 2018). Imbalanced datasets exacerbate the problem of efficient feature selection, as there are limited samples to base choices on. Performance is also becoming increasingly important in conjunction with the surge in data volume and variety, owing to the availability of sensors. This trend is also going to acquire momentum with the adoption of paradigms such as transformative computing, especially when techniques such as cognitive analysis will be applied (Ogiela and Ogiela 2020).
In this paper, we propose a novel approach based on feature selection using chaotic salp swarm optimization (CSSA) and classification with AdaBoost. Performance evaluation is done using the AUC [area under (the ROC) curve]. CSSA is adopted to prevent being trapped in local optima and to avoid slow convergence while selecting features. Strengths of SSA include good convergence properties and solution quality, adaptability, robustness, scalability, ease of implementation, reasonable execution times, and the ability to work well on a wide range of optimization problems. The AdaBoost algorithm, in turn, works well with different classification algorithms, which can be applied as weak classifiers, and shows a high degree of precision on a variety of problems.
The remainder of this paper is organized as follows: Section 2 provides an introduction to AdaBoost and SSA. The illustration of feature selection using CSSA is presented in Sect. 3. Performance and evaluation are presented in Sect. 4. Finally, the conclusions are reported in Sect. 5.

Preliminaries
This section presents the concepts of AdaBoost and of the bio-inspired salp swarm optimization (SSA) algorithm.

Adaptive boosting algorithm (AdaBoost)
AdaBoost is a boosting technique widely used to address classification problems and regression problems as well (Zhang and Chen 2017). Boosting is based on the repeated application of a weak classifier, i.e., a classifier that is required to be only marginally more accurate than random guessing. Here, a decision tree is used as the base classifier. Weights are assigned to samples at each iteration and, based upon the results of previous iterations, weights are modified and the classifier is applied again. At the beginning, every sample has the same weight. If a sample is misclassified on an iteration, its weight will be increased on the next iteration; conversely, if it is classified correctly, its weight will be reduced. Direct application of the AdaBoost algorithm to imbalanced classification problems may cause performance degradation, as the algorithm concentrates on misclassified samples rather than on minority samples.
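The reweighting scheme described above can be sketched as a single AdaBoost round (a hedged sketch, assuming labels in {-1, +1} and a weighted error strictly between 0 and 1; the function name is our own):

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round: weighted error of the weak classifier, its
    vote alpha, and the renormalized sample weights.
    Labels are assumed to be in {-1, +1}; err must lie in (0, 1)."""
    miss = (y_true != y_pred)
    err = np.sum(weights[miss])                    # weighted error rate
    alpha = 0.5 * np.log((1.0 - err) / err)        # weak classifier's vote
    # Misclassified samples gain weight, correct ones lose weight.
    new_w = weights * np.exp(alpha * np.where(miss, 1.0, -1.0))
    return alpha, new_w / new_w.sum()              # renormalize to sum to 1

y = np.array([1, 1, -1, -1])
pred = np.array([1, -1, -1, -1])       # sample 1 is misclassified
w0 = np.full(4, 0.25)                  # uniform initial weights
alpha, w1 = adaboost_round(w0, y, pred)
```

After one round the single misclassified sample carries half of the total weight, which illustrates the last point above: on imbalanced data the reweighting follows misclassification, not class membership.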

Salp swarm optimization (SSA)
Evolutionary and swarm-based algorithms have recently been commonly used for the feature selection problem. Swarm-based algorithms are collective optimization algorithms, because improvement is applied to a set of solutions rather than to a single one: these algorithms adaptively explore the feature space through agents that move toward an optimal solution. Bio-inspired optimization algorithms that have been widely used in feature selection include ant colony optimization (ACO) (Al-Ani 2005), particle swarm optimization (PSO) (Xue et al. 2012; Bewoor et al. 2017) and other swarm algorithms (Ahmed et al. 2018). Our study, however, focuses on feature selection in imbalanced datasets.
SSA is a nature-inspired meta-heuristic algorithm proposed by Mirjalili et al. (2017). The intuition behind SSA emerged from the swarming behavior of salps in deep oceans. These creatures form swarms, known as salp chains, believed to facilitate their movement in the water through coordination. In SSA, the population of salps is divided into two roles: the leader and the followers. The leader determines the direction of movement, and the remaining salps follow suit.
The location of the i-th salp is represented by an n-dimensional vector, where n is the dimension of the search space (in our case, the number of features).
The mathematical model for updating the position of the leader is given in Eq. (1):

x1 = F + r1((u − l)r2 + l)  if r3 ≥ 0.5,
x1 = F − r1((u − l)r2 + l)  if r3 < 0.5,    (1)

where x1 represents the position of the leader, u and l denote, respectively, the upper and lower bounds, the food location is F, and r1, r2, r3 are parameters that govern the behavior of the swarm. Note that in Eq. (1), x1 is meant to be at the current iteration, i.e., x1(k), whereas all the variables in the right-hand side are meant to be at the previous iteration. For example, F in Eq. (1) stands for F(k − 1). To avoid cluttering the notation, explicit indexing has been suppressed. The value of r1 depends on the iteration and is determined as follows:

r1 = 2 exp(−(4k/K)^2),    (2)

where K is the predetermined maximum number of iterations and the current iteration is denoted by k. The other two parameters r2 and r3 are random numbers drawn from a uniform distribution in the interval [0, 1]. The followers' position is modified using Eq. (3):

xj = (xj + xj−1)/2,    (3)

where xj indicates the location of the j-th salp (which is a follower, since j ≥ 2). Since the location of the food source (the global optimum sought) is unknown, it is substituted by the position of the best solution achieved thus far. At each iteration, the fitness function is evaluated at the position of each salp; the best position is found and compared with F. If the newly found position is advantageous with respect to the previous one, F is updated accordingly.
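The leader and follower updates of Eqs. (1)-(3) can be sketched as one synchronized iteration (a hedged sketch; the function name, the per-dimension draws of r2 and r3, and the clipping to the bounds are our own assumptions):

```python
import numpy as np

def ssa_step(positions, food, k, K, lb, ub, rng):
    """One synchronized SSA iteration over a salp chain.
    positions: (n_salps, n_dims) array; food: best solution found so far."""
    r1 = 2.0 * np.exp(-(4.0 * k / K) ** 2)            # Eq. (2)
    new_pos = positions.copy()
    n_dims = positions.shape[1]
    r2 = rng.random(n_dims)                           # uniform in [0, 1)
    r3 = rng.random(n_dims)
    step = r1 * ((ub - lb) * r2 + lb)
    # Leader moves around the food source, Eq. (1)
    new_pos[0] = np.where(r3 >= 0.5, food + step, food - step)
    # Each follower moves to the midpoint with its predecessor, Eq. (3)
    for j in range(1, positions.shape[0]):
        new_pos[j] = 0.5 * (positions[j] + positions[j - 1])
    return np.clip(new_pos, lb, ub)                   # keep salps inside bounds

rng = np.random.default_rng(1)
pos = np.array([[0.2, 0.8], [0.4, 0.4], [0.6, 0.0]])
food = np.array([0.5, 0.5])
nxt = ssa_step(pos, food, k=1, K=20, lb=0.0, ub=1.0, rng=rng)
```

The decaying factor r1 makes early iterations explore far from the food source and later iterations exploit its neighborhood, while the midpoint rule keeps the chain gradually trailing behind the leader.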

The proposed approach for feature selection in imbalanced datasets
SSA is a recently proposed optimization algorithm with attractive features for solving optimization and feature selection problems: it is simple, efficient and easy to implement, with only one parameter to balance exploration and exploitation. However, it suffers from trapping at local optima and a low convergence rate. In feature selection problems, all solutions are limited to the binary values 1 and 0, where 1 represents the presence of a feature and 0 its absence. Since conventional SSA was modeled to solve continuous optimization problems, the sigmoid transfer function (Kennedy and Eberhart 1997) is applied to convert positions into a discrete (binary) representation, as in Eq. (4):

x = 1 if S(x) ≥ r4, x = 0 otherwise, with S(x) = 1/(1 + exp(−x)),    (4)

where r4 is a random number uniformly sampled from the interval [0, 1].
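The sigmoid-based binarization of Eq. (4) can be sketched as follows (a minimal sketch; names are illustrative):

```python
import numpy as np

def binarize(position, rng):
    """Convert a continuous salp position into a binary feature mask:
    feature d is selected when the sigmoid of position[d] is at least
    a fresh uniform random number r4 in [0, 1] (Eq. 4)."""
    s = 1.0 / (1.0 + np.exp(-position))   # sigmoid transfer function
    r4 = rng.random(position.shape)
    return (s >= r4).astype(int)

rng = np.random.default_rng(0)
mask = binarize(np.array([50.0, -50.0]), rng)   # strongly on / strongly off
```

A strongly positive coordinate is selected with probability close to one, a strongly negative one almost never, so the continuous search still controls the binary outcome stochastically.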

Chaotic mapping
Methods based on chaotic maps have recently been used to improve the efficiency of meta-heuristic algorithms (Sayed et al. 2018). The optimal feature selection using SSA for class imbalance problems adopts chaotic mapping to avoid stagnation at local optima, as mentioned in Sayed et al. (2018), where the logistic map was found to be the chaotic mapping that best improves the efficiency of the original SSA. In SSA, the three main parameters r1, r2, r3 affect the performance. As r2 has a significant impact on balancing exploitation and exploration, in our proposed approach a logistic map is employed to adjust the r2 parameter and avoid falling into local optima while selecting features. The logistic map is a function of the interval [0, 1] into itself, defined in Eq. (5):

y_{n+1} = µ y_n (1 − y_n),    (5)

In Eq. (5), y_n is the value at the n-th iteration of the map, and µ is a constant whose value determines the behavior of the map. Chaos ensues for µ = 4; chaos is characterized by nonperiodicity and sensitive dependence on initial conditions. The value of the initial position y_0 is set to 0.7, as in Sayed et al. (2019).
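A minimal sketch of the logistic map iteration of Eq. (5), starting from y_0 = 0.7 as above (the function name is illustrative):

```python
def logistic_map(y0=0.7, mu=4.0, n=5):
    """Iterate the logistic map y_{n+1} = mu * y_n * (1 - y_n) (Eq. 5).
    With mu = 4 the sequence is chaotic; successive values replace the
    random parameter r2 in CSSA."""
    y = y0
    seq = []
    for _ in range(n):
        y = mu * y * (1.0 - y)
        seq.append(y)
    return seq

seq = logistic_map()   # deterministic, yet non-periodic for mu = 4
```

Every iterate stays in [0, 1], so the sequence can be substituted for the uniform draws of r2 without rescaling.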

Fitness function
To determine the quality of each solution (salp position), a fitness function is used. The best solution for the class imbalance problem is one that maximizes the classifier's performance with a minimal number of selected features. An effective performance evaluation measure for classification is the AUC (area under the curve), a scalar value in the range [0, 1] (Fawcett 2006). At each iteration, the imbalanced dataset is trained using the AdaBoost classifier with a decision tree as the weak classifier. To improve the algorithm's classification performance, the error is measured using the AUC as a fitness function. The AUC value is determined as the area under the ROC curve (Galar et al. 2011).
The fitness function is defined by Eq. (6):

AUC = (1/2) Σ_{i=1..N−1} (FP_{i+1} − FP_i)(TP_{i+1} + TP_i),    (6)

In Eq. (6), N is the number of class thresholds used to trace the ROC curve; the higher N, the better the AUC approximation. TP_i and FP_i denote, respectively, the true-positive and false-positive rates at the i-th class threshold.
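The trapezoidal AUC approximation of Eq. (6) can be sketched as follows (a minimal sketch, assuming the threshold points are sorted by increasing false-positive rate):

```python
def auc_trapezoid(fpr, tpr):
    """Approximate the area under the ROC curve with the trapezoidal
    rule over N threshold points (Eq. 6). fpr and tpr are lists of
    false- and true-positive rates, sorted by increasing fpr."""
    area = 0.0
    for i in range(len(fpr) - 1):
        area += 0.5 * (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i])
    return area

# A perfect classifier traces (0,0) -> (0,1) -> (1,1): AUC = 1.
assert auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]) == 1.0
# A chance-level classifier traces the diagonal: AUC = 0.5.
assert auc_trapezoid([0.0, 1.0], [0.0, 1.0]) == 0.5
```

More threshold points (larger N) yield a finer polyline and hence a better approximation of the true area, as noted above.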

Parameters initialization
The salp swarm algorithm starts with random initialization of salp positions. The lower boundary and upper boundary values are initialized to 0 and 1, respectively. The population size is set to 50 for global optimization, and number of iterations is set to 20. The terminating condition is when the maximum number of iterations is met.

The CSSA-AdaBoost framework
The AdaBoost algorithm combines multiple weak classifiers into one strong classifier by adjusting sample weights at each iteration. Ensemble algorithms focus more on difficult examples, without class differentiation. But in class imbalance problems, the majority class samples contribute more to accuracy; as a consequence, it is simpler to strengthen the true negatives than to enhance the true positives, which is not desirable. To overcome these limitations, ensemble algorithms need to be changed or combined with other techniques to deal with the imbalance problem.
In our approach, the SSA algorithm is used to optimize the weights of the weak classifiers while training with the AdaBoost algorithm. SSA is one of the most popular and efficient meta-heuristic optimization algorithms, with a small number of parameters. However, it suffers from slow convergence velocity and trapping in local optima (Sayed et al. 2018). Therefore, this paper proposes an improved CSSA with AdaBoost to boost performance and also to avoid falling into local optima. The logistic chaotic map provides high stability compared with traditional SSA (Sayed et al. 2018), and its combination with AdaBoost yields excellent results in handling imbalanced datasets. The framework of our proposed approach is presented in Fig. 1.
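The core of the framework, the fitness evaluation of a candidate feature mask, might be sketched as follows. This is a hedged illustration using scikit-learn rather than the authors' MATLAB implementation; the function and variable names are our own assumptions (scikit-learn's AdaBoostClassifier uses a depth-1 decision tree as its default weak learner):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score

def fitness(mask, X_train, y_train, X_val, y_val):
    """Fitness of a candidate binary feature mask: train AdaBoost on
    the selected columns and return the validation AUC. An empty mask
    scores zero, so the search never selects no features at all."""
    if mask.sum() == 0:
        return 0.0
    cols = mask.astype(bool)
    clf = AdaBoostClassifier(n_estimators=10, random_state=0)
    clf.fit(X_train[:, cols], y_train)
    scores = clf.predict_proba(X_val[:, cols])[:, 1]
    return roc_auc_score(y_val, scores)
```

In the full loop, each salp's binarized position would be passed through this function at every iteration, and the mask with the best AUC would become the food source F.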

Performance evaluation
Our experiments were performed on a personal computer with an Intel(R) Core i5 processor and 8 GB of memory. We implemented our proposed algorithm, together with the various classification algorithms used for comparison, in MATLAB. To test the performance of our proposed model, we conducted three key experiments. The first experiment aims to examine whether the performance of our model can be enhanced by combining feature selection with boosting; it compares our model with single and ensemble models, namely C4.5, CSSA-C4.5 and AdaBoost-C4.5 (Ahmed et al. 2018; Zhang and Chen 2017). The second experiment compares the performance of the SSA and logistic chaotic SSA algorithms on 15 datasets. Finally, the proposed model is compared with three state-of-the-art ensemble methods used to deal with imbalanced class distribution problems.

Data set
The proposed algorithm is evaluated using 15 datasets with different imbalance ratios (IR), obtained from the Keel repository (https://sci2s.ugr.es/keel/imbalanced.php). The imbalance ratio is defined as the proportion between the number of instances in the majority class and the number of instances in the minority one. Table 1 shows the details of the imbalanced datasets, with the number of features and the imbalance ratio. Table 2 illustrates the confusion matrix for binary class problems. Its entries are TP (true positive), TN (true negative), FP (false positive), FN (false negative).
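The imbalance ratio defined above can be computed directly (a trivial sketch; the function name is illustrative):

```python
from collections import Counter

def imbalance_ratio(labels):
    """IR = (# majority-class instances) / (# minority-class instances)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# A dataset with 90 majority and 10 minority instances has IR = 9.
assert imbalance_ratio([0] * 90 + [1] * 10) == 9.0
```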

Measures
In our experiments, positive instances refer to the minority class and negative instances to the majority class. The confusion matrix provides information about the actual and predicted values after classification, and classifier performance is evaluated on its basis. The confusion matrix comprises four entries:
1. True positive (TP): the number of positive samples that a classifier correctly predicts as positive.
2. True negative (TN): the number of negative samples that a classifier correctly predicts as negative.
3. False positive (FP), often referred to as false alarm: the number of negative samples wrongly identified by a classifier as positive.
4. False negative (FN), sometimes referred to as miss: the number of positive samples that a classifier wrongly assigns as negative.
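From these four entries, the evaluation measures used in the following section (precision, F-measure, G-mean) can be computed as in this sketch (the counts and names are illustrative; the minority class is taken as positive):

```python
import math

def imbalance_metrics(tp, tn, fp, fn):
    """Precision, recall (TPR), F-measure and G-mean computed from the
    four confusion-matrix entries, with the minority class as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # true-positive rate
    specificity = tn / (tn + fp)       # true-negative rate
    f_measure = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return precision, recall, f_measure, g_mean

# Hypothetical counts: 10 minority (8 found), 90 majority (85 kept).
p, r, f, g = imbalance_metrics(tp=8, tn=85, fp=5, fn=2)
```

Unlike plain accuracy, the G-mean collapses to zero whenever either class is entirely misclassified, which is why it is favored for imbalanced data.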

Results and analysis
Three key experiments were performed to test the proposed model, as described in the previous section. The first experiment examines whether the performance of our model can be enhanced by combining feature selection with boosting: our model is compared with single and ensemble models, namely C4.5, CSSA-C4.5 and AdaBoost-C4.5. The second experiment, performed on 15 datasets, compares the performance of SSA with the logistic chaotic SSA algorithm. Finally, the proposed model is compared with three other state-of-the-art ensemble algorithms used to deal with problems of class imbalance.
For the classification of the training dataset, the standard AdaBoost algorithm is used. As the number of weak classifiers increases, the AUC continues to improve, but the improvement slows down dramatically when the number of weak classifiers exceeds 10, after which the AUC hardly changes. Figure 2 shows the growth trend of the AUC (y-axis) with respect to the increase in the number of weak classifiers (x-axis).
In the first experiment, we investigated whether our model's performance can be enhanced by combining feature selection with boosting, comparing CSSA-AdaBoost with a simple C4.5 classifier, CSSA with a C4.5 decision tree classifier, and AdaBoost, to verify the effectiveness of the feature selection and AdaBoost used in our model. Different metrics, such as precision, G-mean, F-measure and AUC, have been evaluated. Tables 3 and 4 show the precision, G-mean, F-measure and AUC of our proposed CSSA-AdaBoost approach and of the three comparative methods separately. Based on Tables 3 and 4, we observe that CSSA-AdaBoost obtained superior performance when compared with the three algorithms, namely the C4.5 classifier, CSSA with a C4.5 classifier, and AdaBoost. Moreover, in terms of precision, the algorithm with feature selection performed better, which means that CSSA-AdaBoost is better than AdaBoost and C4.5. These findings confirm that, by eliminating redundant and unrelated features, CSSA optimization improves accuracy. We also note that, in assessing the output on imbalanced data, both CSSA and AdaBoost are successful in terms of accuracy and AUC. Figures 3 and 4 show the comparison of accuracy, G-mean, F-measure and AUC of the four algorithms (Note: the x-axis represents the measure and the y-axis represents the various datasets).
Secondly, we evaluated the performance of the SSA and logistic chaotic SSA algorithms on 15 datasets. The main objective of this experiment is to examine the AUC, F-measure and G-mean performance of the two algorithms. Table 5 provides a comparison of the various metrics obtained with SSA and logistic chaotic SSA. As seen in Table 5, CSSA outperformed SSA on several datasets. Figure 5 shows the comparison of the SSA and logistic CSSA algorithms in terms of AUC, G-mean and F-measure.
Fig. 3 Comparison of accuracy and F-measure metrics obtained with respect to C4.5, CSSA using C4.5, AdaBoost and CSSA-AdaBoost
Finally, our proposed method was compared with three state-of-the-art ensemble algorithms used to deal with problems of class imbalance: SMOTEBoost (Chawla et al. 2003), EasyEnsemble (Liu 2009) and RUSBoost (Dwiyanti et al. 2016). These ensemble algorithms are commonly used for comparison in the literature (Galar et al. 2013; López et al. 2013; Dwiyanti et al. 2016). All of them are based on sampling and boosting techniques: SMOTEBoost uses SMOTE for minority class oversampling, while EasyEnsemble and RUSBoost use random majority class undersampling. All of these algorithms can use various base classifiers, such as SVM, Ripper and CART (we chose C4.5 as the base classifier). Table 6 presents the performance of CSSA-AdaBoost and of the three algorithms in terms of AUC. Figure 6 shows the comparison of CSSA-AdaBoost and the three state-of-the-art algorithms in terms of AUC (Note: the x-axis represents the measure and the y-axis represents the various datasets).
Fig. 4 Comparison of G-mean and AUC metrics obtained with respect to C4.5, CSSA using C4.5, AdaBoost and CSSA-AdaBoost
Results indicate that the four algorithms are broadly comparable, as each of them obtained the best results on some datasets. Overall, EasyEnsemble and CSSA-AdaBoost are marginally better than the other algorithms, and imbalanced datasets with different IRs seem better handled by CSSA-AdaBoost.

Conclusions
Swarm algorithms are commonly used to solve complex optimization problems. In this paper, a novel chaotic salp swarm algorithm with the AdaBoost technique (CSSA-AdaBoost) is suggested to deal with class imbalance issues. Logistic chaotic mapping, along with the AdaBoost algorithm, was used to improve classifier performance; the suggested CSSA-AdaBoost algorithm selects the most discriminating features using SSA. As a fitness function we used the AUC, making the CSSA-AdaBoost classifier focus more on accuracy with the minority class. Our proposed approach has been compared with state-of-the-art algorithms, and the results indicate superior performance.
In the future, multiple strategies, such as preprocessing and cost-sensitive methods, will be explored with CSSA-AdaBoost for binary and multi-class imbalanced classification. In addition, the implications of our framework from the perspective of explainable machine learning will be studied, and a modification favoring the interpretation of classification decisions will be explored.

Declarations
Conflict of interest Rekha G declares that she has no conflict of interest. Krishna Reddy V declares that he has no conflict of interest. Chandrashekar Jatoth declares that he has no conflict of interest. Ugo Fiore declares that he has no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.