Chromosomes identification based differential evolution (CIDE): a new bio-inspired variant for network intrusion detection

Differential Evolution (DE) is perhaps one of the most efficient and versatile evolutionary techniques, and many of its variants have been proposed in the recent past. However, these variants often suffer from slow and/or premature convergence. Further, the nearest neighbour (NN) classifier is frequently used in the domain of data mining, but its primitive version is noise-sensitive and shows low accuracy. Different data reduction methods such as prototype selection, prototype generation, and DE are widely used to mitigate these drawbacks of NN; however, none of them substantially avoids all of these shortcomings. This paper proposes a new variant of DE based on the biological phenomenon of the centromeric location of chromosomes. The proposed 'Chromosomes Identification based Differential Evolution (CIDE)' extracts knowledge from databases and constructs effective and reliable decision rules for classification. In this paper, the performance of the proposed CIDE technique is computed and compared with the 1-NN classifier optimized by fourteen state-of-the-art variants of DE and with the 1-R classifier, using six real datasets as well as the KDDCup'99 dataset, to illustrate its superiority over the 1-NN and 1-R classifiers. Finally, the performance of CIDE is also compared with Naïve Bayes, support vector machine, neural network, and random forest on the KDDCup'99 dataset using different performance metrics.


Introduction
Nature is endowed with diversified phenomena, and many of them are related to computing. Natural computing is a highly interdisciplinary domain that mimics nature-inspired methodologies and techniques. Precisely, it is the amalgamation of natural phenomena and computation and is widely used in fundamental research and the information sciences. In the recent past, different nature-inspired techniques have been used for performing specific tasks, including information fusion in offspring generation [1]. In [2], a novel population-based optimization method known as the Aquila Optimizer (AO) is proposed; this method is motivated by the Aquila's behaviours observed in nature during the process of prey catching. Further, evolutionary computation (EC) is a peculiar ingredient of natural computing, motivated by the Darwinian theory of evolution [3]. Broadly, EC can be classified into three streams [4]:

• Evolutionary strategies
• Evolutionary programming
• Genetic algorithm (GA)

EC is widely used for solving problems in the domain of data mining, and data reduction processes are very useful tasks within the purview of data mining. In [5], a new and efficient clustering technique is proposed for the clustering of text documents. This technique entails a novel attribute selection technique based on particle swarm optimization (PSO); it also renders efficient dimensionality reduction and thus requires reduced computational cost. Two important data reduction techniques are given in [6,7]:

• Prototype selection (PS)
• Prototype generation (PG)

In the PS process, a subset of instances is selected from the original training dataset. The literature survey reveals that the PS process can be grouped into three categories: condensation [8], edition [9], and hybrid methods [10]. The condensation method attempts to remove those instances which do not possess sufficient classification capabilities.
The task of the edition method, in contrast, is to remove the noisy instances from the datasets, and hybrid methods combine the features of both approaches. The PG method for data reduction is also known as the prototype abstraction method [10]. The PG method can perform a variety of tasks: it can select data, modify data, and even perform interpolation and movement of instances.
EC can also be used for solving the PS and PG problems [11,12]. In [13,14], a novel aspect of EC known as Differential Evolution (DE) is proposed. The functioning of DE differs slightly from GA; indeed, the mutation process of DE is different from that of GA. The mutation operation of DE is an outcome of arithmetic combinations of individuals, whereas the mutation operation of GA creates minor changes in the positioning of genes. Perhaps its good convergence and comprehensiveness render DE its increased popularity [15]. Recent advances of DE are given in [16].
Furthermore, it is pertinent to mention that, like other evolutionary algorithms, DE requires parameter specification. The different control parameters (CP) used in DE are given below:

• Population size NP
• Scale factor F
• Crossover rate CR
These parameters substantially affect the efficacy of DE. However, the selection of CP is time-consuming and requires a high computational cost. The effect of good parameter selection on the convergence of DE is discussed in [17-19]. In [20], the adjustment of CP is performed using problem-dependent heuristic rules, but in this approach the population size is not adaptive, and thus the performance varies with the population size; this is its major drawback. A hybrid strategy of DE and modified particle swarm optimization for the numerical solution of a parallel manipulator is given in [21]. The convergence rate of this approach is high and it also provides efficient global optimization; however, its performance is problem-dependent and not versatile.
In this paper, the authors propose a new variant of DE based on the phenomenon of chromosome identification. The chromosome identification method is mainly based on geometric and morphological features of chromosomes; indeed, the locus of the 'centromere' plays a vital role in chromosome identification. Further, identification of the chromosome facilitates the task of species classification. Figure 1 shows the biological pattern of centromeric locations in different chromosomes; it is drawn based on the concepts given in [22]. The contributions of this paper are enumerated below:

DE methods
This section will render introductory information on DE. Section 2.1 presents the basic information of DE and the advancements that took place in the recent past are given in Sect. 2.2.

DE
DE evolves a given initial population. Let NP be a stochastically chosen population size with uniform distribution, and let D and G be the dimension of the parameter vector and the generation index, respectively. At G = 0, the commencing population is randomly sampled from the feasible solution space. At each generation G, the encoded candidate solutions can be symbolically written as:

X_{i,G} = {x^1_{i,G}, x^2_{i,G}, ..., x^D_{i,G}},  i = 1, 2, ..., NP

Analytically, the minimum and maximum parameter bounds of the search space can be represented as:

X_min = {x^1_min, x^2_min, ..., x^D_min},  X_max = {x^1_max, x^2_max, ..., x^D_max}

The initial value of the jth parameter in the ith individual at generation G = 0 is generated by

x^j_{i,0} = x^j_min + rand(0, 1) · (x^j_max − x^j_min)

where rand(0, 1) represents a random variable of uniform distribution within the range [0, 1]. The first two operations (mutation and crossover) are implemented to create new vectors, known as trial vectors. Subsequently, the selection operation decides the survival of the vectors for the next generation. Details of these operations are elaborated in the Appendix.
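To make this workflow concrete, the following is a minimal, illustrative Python sketch of classic DE (DE/rand/1/bin). The function names, the sphere test function, and the parameter values (NP = 20, F = 0.5, CR = 0.9) are our own assumptions for demonstration and are not settings used in this paper.

```python
import random

def de_rand_1_bin(fitness, bounds, NP=20, F=0.5, CR=0.9, G_max=100):
    """Minimal classic DE (DE/rand/1/bin) for minimization.
    `bounds` is a list of (x_min, x_max) pairs, one per dimension."""
    D = len(bounds)
    # Random initialization: x_ij = x_min_j + rand(0,1) * (x_max_j - x_min_j)
    pop = [[lo + random.random() * (hi - lo) for lo, hi in bounds]
           for _ in range(NP)]
    for _ in range(G_max):
        for i in range(NP):
            # Mutation: arithmetic combination of three distinct individuals
            a, b, c = random.sample([x for j, x in enumerate(pop) if j != i], 3)
            mutant = [a[j] + F * (b[j] - c[j]) for j in range(D)]
            # Binomial crossover builds the trial vector
            j_rand = random.randrange(D)
            trial = [mutant[j] if (random.random() < CR or j == j_rand)
                     else pop[i][j] for j in range(D)]
            # Selection: the trial survives only if it is at least as good
            if fitness(trial) <= fitness(pop[i]):
                pop[i] = trial
    return min(pop, key=fitness)

random.seed(1)  # for reproducibility of this demonstration
best = de_rand_1_bin(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)
print(best)  # a point near the origin, the minimum of the sphere function
```

The greedy selection step is what distinguishes DE from GA: a candidate is replaced only by a trial vector that is at least as good, so the best fitness in the population never worsens.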

Recent advances
Different variants of DE have been proposed in the recent past. In [24], a hybridized approach of a fuzzy logic controller and a genetic operator is implemented to set the CP. Further, in [25], a combination approach of DE is introduced, and it has been experimentally observed that this approach augments efficacy and versatility, although some of its values are empirically obtained. Further, a stochastic genetic algorithm employing a stochastic coding strategy is also proposed [26]. However, it is pertinent to mention that the relationship between the control parameter settings and the performance of GA is complex to establish and is not comprehensible, which makes the stochastic coding strategy less viable. In [27], a new variant of DE, called FADE, is introduced, which uses fuzzy logic for the dynamic adjustment of F and CR; however, the use of a fuzzy logic controller for parameter searching renders its functionality complex. Moreover, in [28], a hybridization of DE and the estimation of distribution algorithm (EDA) is proposed, and experimental analysis shows that this hybridization outperforms both DE and EDA, although its performance is not good in all scenarios. Brest et al. [20] introduced a new variant of DE, called self-adaptive DE (jDE), which adapts F and CR at the individual level. The main drawback of this approach is that the population size is not adaptive, and the approach is also not tested for different population sizes. In addition, SADE [17] is a different variant of DE that incorporates a pool of distinct trial vector generation strategies and a pool of values for F and CR. In SADE, the trial vector generation strategies and the associated control parameter values are gradually self-adapted through learning from previous experience, which incurs additional computational cost.
In [29], normal and Cauchy distributions are implemented to create the F and CR. However, its performance for multi-objective evolutionary optimization seems debatable.
In [30], a neighbourhood-based DE mutation is proposed that is equipped with a self-adaptive weight factor, but this model does not provide sufficient insight regarding an empirical or theoretical approach for selecting the neighbourhood size. Further, in [31] a scale factor local search DE (SFLSDE) is proposed. This method uses a memetic algorithm with self-adaptation; however, for large problems the scale factor local search needs modification.
Like SADE [17], EPSDE [32] is a different variant of DE that entails a pool of values for F and CR. However, the self-adaptation in EPSDE is not as robust as in classical DE. In addition, an improved DE with a modified orthogonal learning strategy is proposed in [33]; however, its time complexity is higher than that of other algorithms when solving simple problems.
Different strategies for setting the associated parameters have also been suggested in the recent past. In [34,35], how to avoid premature convergence or stagnation is discussed. In [36], an innovative mutation operator is introduced to speed up convergence. In [37], a trial vector generation strategy and its various parameters are suggested. Further, in [38], it is suggested that for the optimum performance of DE and to prevent premature convergence, the population size should satisfy 3D ≤ NP ≤ 8D. Moreover, F should not be less than a threshold value, where the threshold is problem-dependent. Further, in [39], it is suggested that the range of F should be from 0.4 to 0.95, with 0.9 considered a good initial choice. However, all these suggestions are mainly problem-dependent.

The proposed CIDE algorithm
The concept of the chromosome identification technique given in [22] motivated the authors to construct the CIDE technique. In this technique, each instance of the dataset is considered as a chromosome, and the different attributes of the instance are assumed to be different segments of the chromosome. Further, classes are considered as different species, and an attempt is made to classify the different classes based on the centromeric location of chromosomes. The method of calculating the centromeric locations is briefly described as follows: let the complete length of the chromosome be 'c', and let the lengths of the longer and shorter arms be 'l' and 's' respectively. With the help of these parameters, we can calculate the centromeric locations in terms of the difference 'd', the ratio 'r', and the centromeric index 'i'.
Mathematically, with the chromosome length c = l + s (Eq. 2), 'd', 'r', and 'i' can be represented as:

d = l − s   (3)
r = l / s   (4)
i = (s / c) × 100   (5)

Further, like conventional DE, our proposed CIDE algorithm also entails three steps: mutation, crossover, and selection. In the proposed technique, Eq. (3) is used for the mutation operator, and Eq. (4) and Eq. (5) are used for the crossover operation. Here 'd' is the mutated vector, and 'r' and 'i' are the trial vectors. Further, in the selection operation, the max-min values of 'd', 'r', and 'i' are selected for generating the classification rules.
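As an illustration, a small Python sketch of these quantities is given below. It assumes the conventional cytogenetic forms d = l − s, r = l/s, and i = (s/c) × 100 with c = l + s (our reading of Eqs. (2)-(5), since the equations are not fully reproduced in this excerpt), and uses the attribute values 4.6 (maximum) and 0.3 (minimum) quoted in the text for the first sample instance; the full attribute list of that instance is hypothetical.

```python
def centromeric_features(attrs):
    """Compute the CIDE mutated vector d and trial vectors r and i for one
    instance, treating its largest attribute as the longer arm l and its
    smallest attribute as the shorter arm s (first generation)."""
    l, s = max(attrs), min(attrs)
    c = l + s                 # assumed total chromosome length, Eq. (2)
    d = l - s                 # difference (mutated vector), Eq. (3)
    r = l / s                 # ratio (trial vector), Eq. (4)
    i = (s / c) * 100.0       # centromeric index (trial vector), Eq. (5)
    return d, r, i

# Hypothetical first instance with max 4.6 and min 0.3, as quoted in the text
d, r, i = centromeric_features([4.6, 3.4, 1.4, 0.3])
print(d, r, i)  # d = 4.6 - 0.3 = 4.3
```

Each instance thus collapses into a triple (d, r, i); the per-class min and max of these triples later become the bounds of the classification rules.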

The structure and implementation details of CIDE
The algorithmic details of CIDE are shown in Fig. 2.

First generation of CIDE
Step 1: Mutation. This is the first step of CIDE after initialization, and it generates a mutant vector. In this operation, the maximum and minimum attribute values are considered for each instance. As per our supposition, the max and min attribute values represent the longer and shorter arms of the chromosome. As our mutation operator is based on Eq. (3), we can represent it as DE/best-to-worst/1, or DE/(l − s)/1, where 'l' and 's' represent the longer and shorter arms of the chromosome as already stated. The values of 'l' and 's' are selected from the range [1, D], where D is the dimension of the instances of the dataset. The mutated vector for the first instance given in Table 1 is calculated as:

d_1 = 4.6 − 0.3 = 4.3

(the subscript denotes the generation number; 4.6 and 0.3 are the maximum and minimum attribute values). Likewise, the mutated vectors can be calculated for the other instances. All mutated vectors are shown in Table 2.
Step 2: Crossover. After mutation, CIDE performs a crossover operation like conventional DE. In this step, we find the values of 'r' and 'i' for the different instances of the dataset by implementing Eq. (4) and Eq. (5); this step produces the trial vectors 'r' and 'i'. For example, the trial vector 'r' for the first instance of Table 1 can be calculated as r_1 = 4.6 / 0.3 ≈ 15.33.

Fig. 2 Algorithmic description of CIDE (summarized; the intermediate sub-steps are reconstructed from the surrounding text):

Step 1: Set the generation number and initialize a population of individuals; each individual includes the longer arm l_i and the shorter arm s_i of the chromosome.
Step 2: WHILE the stopping criterion is not satisfied:
FOR each instance, compute the mutated vector 'd' and the trial vectors 'r' and 'i';
END FOR
Select the per-class min and max (lower and upper bounds) of 'd', 'r', and 'i';
Construct the appropriate classification rules with the help of these lower and upper bounds.
Step 2.4: Increment the generation count and repeat Step 2.

The trial vectors 'r', and the graph between 'd' and 'r' for the different instances of sample Table 1, are shown in Table 2 and Fig. 3 respectively. We can calculate the trial vector 'i' with the help of Eqs. (2) and (5). The chromosome length of the first instance given in Table 1 can be calculated as c = l + s = 4.6 + 0.3 = 4.9.
Similarly, the trial vectors of other instances can be worked out. These trial vectors are shown in Table 2.
Step 3: Selection. This operation is required to choose the better individuals. In the proposed CIDE algorithm, these better individuals are the min and max values of 'd', 'r', and 'i' for the different classes of the dataset. The min and max values of 'd', 'r', and 'i' for the different classes obtained after the first generation are shown in bold in Table 2, and their diagrammatic representation is given in Fig. 4. We can easily generate different classification rules by observing the non-overlapping portions of the lines representing different classes in Fig. 4. These rules are sufficient to classify many instances of sample Table 1. However, instances 10, 11, 12, 14, and 15 are still unclassified; therefore, a second generation of the CIDE algorithm is required.
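The three steps of one CIDE generation can be sketched as follows. The sample instances and class labels below are hypothetical, and the formulas d = l − s, r = l/s, and i = (s/c) × 100 with c = l + s are our assumed readings of Eqs. (2)-(5).

```python
from collections import defaultdict

def generation_bounds(instances, labels):
    """One CIDE generation: compute (d, r, i) for every instance (mutation
    and crossover), then select the per-class min/max of each quantity.
    Non-overlapping [min, max] intervals between classes yield rules."""
    feats = defaultdict(list)
    for attrs, cls in zip(instances, labels):
        l, s = max(attrs), min(attrs)   # longer and shorter arm
        c = l + s
        feats[cls].append((l - s, l / s, (s / c) * 100.0))
    bounds = {}
    for cls, triples in feats.items():
        cols = list(zip(*triples))      # columns: d, r, i
        bounds[cls] = [(min(col), max(col)) for col in cols]
    return bounds

# Hypothetical sample data with three classes X, Y, Z
data = [([4.6, 3.4, 1.4, 0.3], "X"),
        ([5.0, 3.6, 1.4, 0.2], "X"),
        ([7.0, 3.2, 4.7, 1.4], "Y"),
        ([6.3, 3.3, 6.0, 2.5], "Z")]
b = generation_bounds([x for x, _ in data], [y for _, y in data])
print(b["X"][0])  # the (min_d, max_d) interval for class X
```

A rule then reads, for example: "if d lies in class X's d-interval and in no other class's d-interval, the instance belongs to X"; later generations repeat the same computation with the second-largest and second-smallest attributes.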

Second generation of CIDE
Step 1: Mutation. In the second generation of the CIDE algorithm, the second-largest attribute value is taken as 'l' and the second-smallest attribute value is taken as 's'; e.g., 'l' and 's' will be 3.4 and 1.4 respectively for the first instance of sample Table 1 in the second generation. Therefore, d_2 = 3.4 − 1.4 = 2.0.

Step 2: Crossover. Similarly, r_2 and i_2 can be calculated using Eq. (4) and Eq. (5). In the same way, d_2, r_2, and i_2 can be computed for the other instances in the second generation of the CIDE algorithm.
Step 3: Selection. It is explicit that the classification rules obtained after the first generation are sufficient to classify class X completely, and many instances of classes Y and Z as well. This is evident from Fig. 4, because the red line representing class X does not overlap with the lines of the other classes in Fig. 4b and c. Thus, it is genuine to construct, from the second generation, only those classification rules which are associated with classes Y and Z. The min and max of d_2, r_2, and i_2 obtained after the second generation are shown in bold italic in Table 2 for the different classes, and are diagrammatically represented in Fig. 5. Further, we can easily construct the classification rules associated with classes Y and Z (based on the mutated vector d_2 and the trial vectors r_2 and i_2) with the help of the non-overlapped portions of the lines shown in Fig. 5. Now, all the rules obtained after the first and second generations can classify classes X and Y completely. Many instances of class Z have also been classified, but some instances of class Z are still unclassified. Therefore, a general classification rule can also be generated for class Z: "If any instance does not follow the above-mentioned rules, it belongs to class Z".

Classification rules of Iris dataset
The Iris dataset is taken from [23] for the explanation of the CIDE technique. Details of this dataset are given in Table 3. The following classification rules can be generated for this dataset after the first generation with the help of Fig. 6.
The rules are based on the mutated vector d_1 and the trial vectors r_1 and i_1; similarly, the rules obtainable after the second generation follow from Fig. 7. The classification rules obtained after the first generation will classify some instances of each class. Similarly, the classification rules obtained in the second generation will again classify some new instances of the different classes. In this way, we can construct the classification rules generation after generation, and some new instances will be classified in each generation. Therefore, for a given dataset, the cumulative classification accuracy over all generations will render substantially good results. Figures 8 and 9 show the scatter plots for the first and second generations of the CIDE algorithm for the Iris dataset.

Convergence analysis

The convergence analysis of evolutionary algorithms is discussed in [40]. Convergence in probability can be used for the convergence analysis of different DE variants. Our proposed DE variant, CIDE, can be considered as an optimization problem in which we have to optimize the min, i.e., the lower bound (LB), and the max, i.e., the upper bound (UB), of the mutated vector 'd' and the trial vectors 'r' and 'i'. Mathematically, this can be described as optimizing

f(A_d^G, A_r^G, A_i^G),  G = 1, 2, ..., NP

where A_d is the set of mutated vectors, A_r and A_i are the sets of trial vectors, and G = 1, 2, ..., NP is the number of generations.
Here f(·, ·, ·) is the objective function. Let S be a measurable space. We can consider S similar to a dataset whose different attributes or features are A_d^G, A_r^G, and A_i^G. Further, different instances can be obtained by calculating A_d^G, A_r^G, and A_i^G for G = 1, 2, ..., NP. Therefore, we can say that S is bounded. The optimal solution set, which consists of the LB and UB of A_d^G, A_r^G, and A_i^G, is denoted as S*.
Here, x* is the collection of the LB and UB of A_d^G, A_r^G, and A_i^G. Let μ(·) be a function that gives the measure of the space S. Perhaps μ(S*) = 0, which reflects the fact that the set S* is of measure zero; precisely, this will be the situation when LB = UB for the mutated vector A_d^G and the trial vectors A_r^G and A_i^G for every class of the dataset. Therefore, keeping the accuracy of the practical problem in view, we can assume an expanded set S*_δ such that (UB − LB) > δ, where δ is a positive value. We can work out an appropriate value of δ so that the accuracy of the classification rules can be increased, rendering the condition μ(S*_δ) > 0. The asymptotic convergence of the algorithm can be defined in several ways for analysis. An attempt has been made below to define the convergence in terms of probability.
We will use this definition for the analysis of the CIDE algorithm.
Definition [41] Let {X(NP), NP = 0, 1, 2, ...} be a series of populations created by CIDE for solving the optimization problem max{f(x), x ∈ S}. Then the condition for its convergence to the global optimum is given by:

lim_{NP→∞} P{X(NP) ∩ S*_δ ≠ ∅} = 1   (6)

i.e., the probability that the created population intersects the expanded set S*_δ with (UB − LB) > δ increases as the number of generations increases.
Let us give a sufficient condition for the global convergence of CIDE.
Theorem [41] Consider using CIDE to solve the optimization problem max{f(x), x ∈ S}. In the NP_k-th target population X(NP_k), there exists at least one individual x such that, with the help of the crossover operator, x maps to a trial individual y whose probability of occurrence is greater than zero. Mathematically,

P{y ∈ S*_δ} ≥ ε_k > 0   (7)

where ε_k is a small positive value that can change with NP_k.
Indeed, after every generation the proposed algorithm generates artificial data, which constitutes the measurable space S as already discussed; we can say that S = X(NP) as given in Eq. (6). Further, it has already been stated that S*_δ is a set that contains the LB and UB of A_d^G, A_r^G, and A_i^G. Let S_1 be the measurable space and (S*_δ)_1 be the LB and UB of A_d^G, A_r^G, and A_i^G obtained after the first generation of the CIDE algorithm. Then it is obvious that (S*_δ)_1 ⊆ S_1; similarly, (S*_δ)_2 ⊆ S_2, and so on. Therefore, the condition given in Eq. (6) will always be satisfied. Further, the probability P{y ∈ S*_δ} will always be equal to one, because S*_δ ⊆ S is true for every generation. Therefore, the sufficient condition of convergence given in Eq. (7) will also be satisfied. Thus, we can infer that the proposed CIDE algorithm converges to the global optimum.

Asymptotic analysis
The runtime complexity analysis of the DE algorithm is hard because of its non-predictable nature [42]. However, following the work of Zielinski et al. [43], we note that the average runtime of a standard DE algorithm is O(NP · D · G_max), where G_max is the maximum number of generations. Further, in [30] it is shown that different DE variants, such as DE/rand-to-best/1 and DEGL, also possess an average runtime complexity of O(NP · D · G_max). Furthermore, the runtime complexity of 1-R is O(n) when there is one rule over n features, and O(n²) when rule pairs are considered. Perhaps this can inhibit 1-R as a viable tool.
In the proposed CIDE algorithm, finding the max and min value of each attribute over all instances has a runtime complexity of O(D), where D is the number of attributes of the dataset. Further, the total runtime complexity for the NP population will be O(NP · D); therefore, over G_max generations, the net runtime complexity of CIDE will also be O(NP · D · G_max).

Limitations of CIDE
It is explicit from the previous discussion that the proposed CIDE technique performs its computation based on maximum and minimum attribute values. However, this technique has its limitations as given below.
• In some datasets, the maximum and minimum attribute values of the first generation are very close to those of the second generation. This renders the difference 'd' obtained during mutation (step 1) for all instances of the different classes in the first generation in close vicinity of the value of 'd' obtained in the second generation. The same holds for the trial vectors 'r' and 'i' calculated during the crossover operation (step 2). Finally, in the selection operation (step 3), the min and max values of 'd', 'r', and 'i' are used to construct the classification rules. Thus, it is intuitive that if the maximum and minimum attribute values of the first generation are very close to those of the second generation, the classification rules constructed during the selection operation will not display substantial classification accuracy. This also holds if the maximum and minimum attribute values of the second generation are very close to those of the third generation, and so on. This can be considered the main limitation of the proposed algorithm.
• If the number of attributes in the dataset is small, the number of generations will also be small. Eventually, this results in fewer classification rules, which can also reduce the classification accuracy.

Datasets used
The performance of the CIDE algorithm has been checked experimentally on six real datasets given in [23]: Iris, Wine, New-Thyroid, Breast, Wisconsin, and Splice Junction. The numbers of instances and attributes of these datasets are given in Table 3; the number of instances increases from Iris to Splice Junction. Indeed, these datasets are taken in an attempt to visualize the performance of the proposed algorithm; however, substantially larger datasets are not considered in this experimentation.

Algorithms
The performance of the proposed CIDE technique is compared with two algorithms: 1-NN and 1-R. The following variants of DE are used to optimize the 1-NN algorithm.
In addition to these variants of DE, the authors also included four recent variants of DE to optimize the 1-NN. The mutation operators for all these variants of DE are given in the Appendix, and their parameter specifications are given in Table 4. These specifications are suggested by the developers of the different variants, and the authors have used these specified parameters in every problem. It is pertinent to mention that this work is motivated by the research given in [44]; therefore, the algorithms and parameters considered are similar to [44].

Analysis of the results
Cross-validation is frequently used to estimate and validate a model. In this experimentation, the evaluation test mode is fivefold cross-validation, in which the dataset is randomly partitioned into five disjoint sets. The different classifiers are then run using one set as the test dataset and the remaining four sets as the training dataset; this is repeated five times. Finally, the mean of the five results obtained from fivefold cross-validation is calculated, producing a single result.
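The fivefold procedure described above can be sketched as follows. The partitioning helper and the toy evaluator are our own illustrative assumptions; in the experiments, `evaluate` would train and test one of the compared classifiers.

```python
import random

def five_fold_indices(n, seed=0):
    """Randomly partition n instance indices into five disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::5] for k in range(5)]

def cross_validate(evaluate, n):
    """Run five rounds, each holding out one fold for testing and training
    on the other four; return the mean of the five accuracies."""
    folds = five_fold_indices(n)
    scores = []
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / 5.0

# Toy evaluator standing in for a real classifier: fixed accuracy per fold
mean_acc = cross_validate(lambda train, test: 0.8, n=150)
print(mean_acc)  # ≈ 0.8
```

Because every instance appears in the test fold exactly once, the mean over the five rounds uses the whole dataset for both training and testing without ever mixing the two in a single round.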
Further, the classification accuracy of the proposed algorithm is compared with the different state-of-the-art variants of DE given in the Appendix. These variants of DE are used to optimize the nearest neighbour (NN) classifier [30], and 1-R is considered as the baseline of performance.
The average classification accuracies obtained after fivefold cross-validation of selected algorithms for six datasets are given in Table 5. The bold result in each column shows the best value obtained.
Our validation results show that for the Iris dataset the proposed CIDE algorithm displays the maximum testing accuracy, i.e., 86.66%; the second-best testing accuracy for this dataset is 85.94%, shown by 1-R. Further, for the Wine dataset, we observe a different pattern of testing accuracies across the algorithms: 1-NN optimized with SADE (Learning Period = 50) renders the maximum testing accuracy of 80.02%, while the second-best accuracy for the same dataset is obtained by 1-R, which displays 79.96%, very near the accuracy obtained by 1-NN optimized by SFLSDE/RandToBest/1/Bin. The proposed CIDE algorithm also performs well for the New-Thyroid dataset, displaying a validation accuracy of 83.72%, which is very close to the best result, 83.98%, given by 1-NN optimized by JADE. For the Breast dataset, the proposed CIDE algorithm again produces the best validation accuracy of 84.38%; however, 1-NN optimized by SADE (Learning Period = 100) also produces a considerably good result for this dataset, 83.14%. Moreover, for the Wisconsin dataset, the performance of 1-NN optimized by SADE (Learning Period = 100) is comparatively the best, with an accuracy of 81.07%; the second-best accuracy is produced by 1-NN optimized by SFLSDE/RandToBest/1/Bin, i.e., 80.32%. The proposed CIDE algorithm displays an accuracy of 79.89% for the Wisconsin dataset, which is very close to that of 1-NN optimized by SFLSDE/RandToBest/1/Bin. Further, 1-NN optimized by SADE (Learning Period = 50) and SFLSDE/Rand/1/Bin produce almost similar results. However, the testing accuracy obtained by 1-R for the Wisconsin dataset is 78.88%, which is considerably less than that of the proposed CIDE algorithm (79.89%).
In addition, for the Splice dataset, the proposed CIDE technique renders the best validation accuracy of 80.32%. Further, 1-R displays the second-best validation accuracy for the Splice dataset (79.22%), and 1-NN optimized by SFLSDE/RandToBest/1/Bin produces the third-best accuracy (79.02%) for this dataset.
The standard deviations (SD) of these algorithms in fivefold cross-validation are displayed in Table 6. The SD of the proposed CIDE technique is the lowest for the Iris, New-Thyroid, and Wisconsin datasets; it is explicit from Table 6 that the SD of the proposed CIDE algorithm is 1.41, 1.94, and 1.87 for these datasets respectively. For the Wine dataset, 1-NN optimized by DEGL (Linear) shows the lowest SD, which is 2.46. Similarly, for the Breast dataset, the SD of 1-NN optimized by SFLSDE/RandToBest/1/Bin is 1.90, which is the lowest. Moreover, for the Splice dataset, 1-NN optimized by DEGL (Exponential) produces the lowest SD of 3.21.
Over and above, the mean SD of the CIDE technique for all six datasets is 2.39, which is also comparatively the lowest. The second-lowest mean SD over the six datasets is 2.84, displayed by 1-NN optimized by SFLSDE/RandToBest/1/Bin, and the third-lowest is shown by 1-NN optimized by DEGL (Linear) (i.e., 2.90); the mean SDs of these two are almost equal. In Tables 5 and 6, bold values indicate the best testing accuracies and the best (lowest) standard deviations obtained for the different datasets, respectively. The authors also conducted the Friedman test to validate the proposed CIDE algorithm; the outcome of this test is briefly presented in Sect. 4.4.1.

Friedman test
In [45], it has been mentioned that statistics is of paramount importance to validate the innovative method. Therefore, the Friedman test is conducted for this purpose. The ranking of the algorithms for different datasets is shown in Table 7.
This test compares the average ranks of the algorithms, R_j = (1/N) Σ_i r_i^j, where r_i^j is the rank of the jth of k algorithms on the ith of N datasets. It is important to mention that, under the null hypothesis, the algorithms are entirely equivalent, which implies that the ranks R_j are equal; the mean rank is R̄ = 8.99. The Friedman statistic is calculated as

χ²_F = (12N / (k(k + 1))) [Σ_j R_j² − k(k + 1)² / 4]

and χ²_F is distributed with (k − 1) degrees of freedom (DF). It has been observed that χ²_F = 77.04. In addition, in [46] a better statistic is used, based on the F-distribution with (k − 1) and (k − 1)(N − 1) DF:

F_F = (N − 1) χ²_F / (N(k − 1) − χ²_F)

Here F_F = 20.31, and F_F is distributed as per the F-distribution. The numbers of DF calculated are 17 − 1 = 16 and (17 − 1)(6 − 1) = 80. Over and above, the critical value of F(16, 80) at the confidence level α = 0.05 is 2.08, which is far below F_F and thus suggests the rejection of the null hypothesis. Detailed information regarding the implementation of the Friedman test is given in [47].
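The two statistics can be computed as sketched below; the function names are our own, and the final call reproduces the F_F value reported above from the reported χ²_F, k = 17 algorithms, and N = 6 datasets.

```python
def friedman_chi2(avg_ranks, N):
    """Friedman chi-square statistic from the average ranks R_j of the
    k compared algorithms over N datasets."""
    k = len(avg_ranks)
    return (12.0 * N / (k * (k + 1))) * (
        sum(R * R for R in avg_ranks) - k * (k + 1) ** 2 / 4.0)

def iman_davenport_f(chi2, N, k):
    """Iman-Davenport correction, distributed as F((k-1), (k-1)(N-1))."""
    return (N - 1) * chi2 / (N * (k - 1) - chi2)

# Small check on a toy ranking of 3 algorithms over 10 datasets
print(friedman_chi2([1.5, 1.5, 3.0], N=10))   # 15.0

# Reproduce the reported F_F from the reported chi-square
print(iman_davenport_f(77.04, N=6, k=17))     # ≈ 20.3, the reported F_F
```

If the computed F_F exceeds the critical value of F((k−1), (k−1)(N−1)) at the chosen α, the null hypothesis of equivalent algorithms is rejected.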

Experimental results and analysis for KDDCup'99 dataset
Network security is of paramount importance to keep confidential data and information safe from unauthorized third-party access. The Network Intrusion Detection System (NIDS) is an active domain of research, perhaps because it tackles miscellaneous possibilities and issues of real-time applications for network security. A NIDS detects an intrusion and sends the information to the authority. The role of swarm and evolutionary algorithms in the domain of NIDS is thoroughly discussed in [48].
The performance of the proposed CIDE algorithm has been checked experimentally on the intrusion detection dataset KDDCup'99 [49]. There are 4,898,430 labelled and 311,029 unlabelled connection records in this dataset. The number of attributes in the labelled connection records is 41. Different types of attacks for the labelled records of the KDDCup'99 dataset are depicted in Table 8. Further, Table 9 entails the complete list of features of this dataset; in Table 9, 'c' and 's' stand for continuous and symbolic respectively. This dataset incorporates one type of normal data and 22 different attack types categorized into the four classes given below.
• Denial of Service (DoS): In this attack, the attacker attempts to stop legitimate users from using a service.
• Probe: In this attack, the attacker strives to elicit information about the target host.
• User-to-Root (U2R): In this attack, the attacker has local access to the victim machine and attempts to obtain superuser privileges.
• Remote-to-Local (R2L): In this attack, the attacker attempts to gain access to the victim machine even though he does not have an account on it.
In this experimentation, 10% of the entire labelled KDDCup'99 dataset is used. It includes 494,020 records and 41 features. The distribution of connection types for this dataset is shown in Table 10. The most substantial drawback of the KDDCup'99 dataset is the number of redundant instances. The presence of redundant instances renders the training of a learning algorithm difficult and biases the learning algorithm towards the more frequent records. The performance metrics accuracy, false alarm rate, and precision are recorded for 1-NN optimized by the different DE variants, 1-R, and the proposed CIDE algorithm. The formulae used to calculate these metrics are given in Eqs. (8), (9), and (10).
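Equations (8)-(10) are referenced but not reproduced in this excerpt. Assuming the conventional confusion-matrix definitions (TP = true positives, TN = true negatives, FP = false positives, FN = false negatives), they read:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (8)

\text{False alarm rate} = \frac{FP}{FP + TN} \qquad (9)

\text{Precision} = \frac{TP}{TP + FP} \qquad (10)
```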
Table 11 shows the classification accuracies (A), false alarm rates (B), and precision (C) for the DoS + 10% Normal and Probe + 10% Normal datasets. For both datasets, the classification accuracy and precision are highest for the proposed CIDE algorithm: for the DoS + 10% Normal dataset they are 83.68% and 84.56% respectively, and for the Probe + 10% Normal dataset they are 85.57% and 86.87% respectively. The false alarm rate of the proposed CIDE algorithm is also the lowest for these datasets, at 0.0387 and 0.0343 respectively. The second-best classification accuracy for the DoS + 10% Normal dataset, 82.24%, is displayed by 1-NN optimized with SADE (Learning Period = 100). Further, for the Probe + 10% Normal dataset, 1-R shows the second-best classification accuracy of 84.84%. The precision and false alarm rate of 1-NN optimized with SADE (Learning Period = 100) are also second-best for the DoS + 10% Normal dataset, at 83.57% and 0.0389 respectively. Similarly, the precision of 1-R is also second-best (85.56%) for the Probe + 10% Normal dataset, whereas its false alarm rate for this dataset is not good: it is 0.0422. Table 12 shows the classification accuracies (A), false alarm rates (B), and precision (C) for the R2L + 10% Normal and U2R + 10% Normal datasets. In contrast, for the U2R + 10% Normal dataset, 1-NN optimized by the JADE algorithm renders the highest classification accuracy and precision: 82.88% and 84.03% respectively. However, the classification accuracy and precision of the proposed CIDE algorithm are close, at 82.62% and 84.03% respectively. The classification accuracies, false alarm rates, and precision for these datasets are also illustrated in Fig. 10.
Further, we also compared the performance of the proposed CIDE technique with different existing state-of-the-art techniques, namely Naive Bayes, SVM, Neural Network, and Random Forest, in terms of performance metrics such as accuracy, precision, recall, and F1-score. This comparison is shown in Table 13. The values of these performance metrics for the above-mentioned techniques are taken from [50,51]. The formulae used to calculate the recall and F1-score are given in Eqs. (11) and (12).
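Since Eqs. (11) and (12) are not reproduced in this excerpt, the standard definitions of recall and F1-score can be computed from confusion-matrix counts as in this minimal sketch:

```python
def recall_f1(tp, fp, fn):
    """Recall (Eq. 11) and F1-score (Eq. 12) from confusion-matrix counts,
    assuming the conventional definitions."""
    recall = tp / (tp + fn)          # fraction of actual attacks detected
    precision = tp / (tp + fp)       # fraction of alarms that are real attacks
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, f1

# Illustrative counts, not taken from the paper
r, f = recall_f1(80, 20, 20)
```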
The average classification accuracy of CIDE can be obtained from Tables 11 and 12. From Table 13, it is evident that the accuracy, precision, recall, and F1-score of the proposed CIDE technique are better than those of Naive Bayes, SVM, and Neural Network. However, these performance metrics for the proposed CIDE technique are not as good as those of Random Forest.

Conclusions
The proposed CIDE technique renders the processes of prototype selection and prototype generation and is easy to implement. Experimental analysis reveals that the proposed CIDE technique performs better than the 1-NN classifier optimized by different variants of DE, and than 1-R, for the Iris, Breast, and Splice datasets. Its performance is also very near that of 1-NN optimized by JADE for the New Thyroid dataset, for which the classification accuracy of 1-NN optimized by JADE is 83.98%. The proposed CIDE technique performs best for the DoS + 10% Normal and Probe + 10% Normal datasets, and its performance is also satisfactory for the R2L + 10% Normal and U2R + 10% Normal datasets. Over and above, the performance of the proposed CIDE technique is also compared with Naive Bayes, SVM, Neural Network, and Random Forest for the KDDCup'99 dataset, and it is observed that its performance is better than that of Naive Bayes, SVM, and Neural Network in terms of accuracy, precision, recall, and F1-score. However, the performance of Random Forest is superior for this dataset.
The future research scope of this work is broad. The proposed technique can be used within the medical domain for disease classification and prediction. Further, observing its performance on larger datasets, as well as applying it to anomaly detection, can also be considered as subsequent future research work.

Mutation operation
This is the first step after initialization. This operation generates a mutant vector. At each generation $G$, DE creates a mutant vector $V_{i,G} = \{v_{i,G}^1, v_{i,G}^2, \ldots, v_{i,G}^D\}$ for each individual $X_{i,G}$, called the target vector. The six frequently used mutation operators are given in Eqs. (13) to (18); details of these operators are given in [44].

a. DE/rand/1: $V_{i,G} = X_{r_1^i,G} + F(X_{r_2^i,G} - X_{r_3^i,G})$ (13)
b. DE/best/1: $V_{i,G} = X_{best,G} + F(X_{r_1^i,G} - X_{r_2^i,G})$ (14)
c. DE/rand-to-best/1: $V_{i,G} = X_{i,G} + F(X_{best,G} - X_{i,G}) + F(X_{r_1^i,G} - X_{r_2^i,G})$ (15)
d. DE/best/2: $V_{i,G} = X_{best,G} + F(X_{r_1^i,G} - X_{r_2^i,G}) + F(X_{r_3^i,G} - X_{r_4^i,G})$ (16)
e. DE/rand/2: $V_{i,G} = X_{r_1^i,G} + F(X_{r_2^i,G} - X_{r_3^i,G}) + F(X_{r_4^i,G} - X_{r_5^i,G})$ (17)
f. DE/rand-to-best/2: $V_{i,G} = X_{i,G} + F(X_{best,G} - X_{i,G}) + F(X_{r_1^i,G} - X_{r_2^i,G}) + F(X_{r_3^i,G} - X_{r_4^i,G})$ (18)

In the above equations, $r_1^i, r_2^i, r_3^i, r_4^i, r_5^i \in [1, NP]$ are mutually distinct random indices, $F$ is the scale factor, $X_{best,G}$ is the vector of highest fitness at generation $G$, and $NP$ is the population size.
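As a concrete illustration, the classic DE/rand/1 operator of Eq. (13) can be sketched as follows (the function name and vector representation are illustrative, not from the paper):

```python
import random

def mutate_rand_1(pop, i, F):
    """DE/rand/1 mutation (Eq. 13): V_i = X_r1 + F * (X_r2 - X_r3).

    pop: population as a list of vectors (lists of floats);
    i:   index of the current target vector, excluded from sampling;
    F:   scale factor.
    """
    # Pick three mutually distinct indices, all different from i
    r1, r2, r3 = random.sample([j for j in range(len(pop)) if j != i], 3)
    return [pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
            for d in range(len(pop[0]))]
```

The other five operators of Eqs. (14)-(18) differ only in which base vector is perturbed and how many difference vectors are added.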

Cross-over operation
After mutation, DE performs a binomial crossover operation on the target vector $X_{i,G}$ and its mutant vector $V_{i,G}$ to produce a trial vector $U_{i,G}$. The binomial (uniform) crossover is generally used for this purpose. Mathematically, this crossover is given below:

$$u_{i,G}^j = \begin{cases} v_{i,G}^j & \text{if } rand_j[0,1) \le CR \text{ or } j = j_{rand}, \\ x_{i,G}^j & \text{otherwise}, \end{cases}$$

where $CR$ is the crossover rate and $j_{rand}$ is a randomly chosen dimension, which guarantees that the trial vector inherits at least one component from the mutant vector.
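The binomial crossover above can be sketched as follows (names are illustrative):

```python
import random

def binomial_crossover(target, mutant, CR):
    """Binomial (uniform) crossover: build the trial vector U from the
    target vector X and the mutant vector V. Each component comes from
    the mutant with probability CR; the component at j_rand always comes
    from the mutant, so the trial vector differs from the target."""
    D = len(target)
    j_rand = random.randrange(D)
    return [mutant[j] if (random.random() <= CR or j == j_rand) else target[j]
            for j in range(D)]
```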

Selection operation
This operation is required to choose the better individual from $X_{i,G}$ and $U_{i,G}$; the better individuals are transferred to the next generation. For a minimization problem, the selection operation is given in Eq. (21):

$$X_{i,G+1} = \begin{cases} U_{i,G} & \text{if } f(U_{i,G}) \le f(X_{i,G}), \\ X_{i,G} & \text{otherwise}. \end{cases} \qquad (21)$$
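The greedy selection of Eq. (21) reduces to a one-line comparison; a minimal sketch for a minimization problem:

```python
def select(target, trial, fitness):
    """DE selection (Eq. 21) for minimization: keep whichever of the
    target and trial vectors has the better (lower) fitness value."""
    return trial if fitness(trial) <= fitness(target) else target
```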
The DEGL variant is introduced in [30]. In this variant, a neighbourhood scheme is used: ''local'' and ''global'' neighbourhoods are incorporated for mutation, and two kinds of mutation operators are employed. This variant introduces a new parameter known as the ''scalar weight''. Like other adaptive methods, this method employs its own scheme for parameter adaptation.

Appendix 5 DE with scale factor local search (SFLSDE)
In [31], SFLSDE is introduced. In this variant, two local searches are implemented: the scale factor golden section search (SFGSS) and the scale factor hill-climb (SFHC). In SFLSDE, the trial vector $U_{i,G}$ is worked out using five randomly chosen indices. The corresponding mathematical expressions are given in [31].
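The SFGSS component can be illustrated by a generic golden section search over the scale factor. This is a sketch under stated assumptions, not the exact SFLSDE procedure: the search interval, iteration count, and the `score` callback (e.g., the fitness of the trial vector generated with a candidate F) are illustrative choices.

```python
def golden_section_F(score, lo=0.1, hi=1.0, iters=8):
    """Golden section search for a scale factor F in [lo, hi] that
    minimizes score(F). Illustrative of the SFGSS idea in SFLSDE;
    bounds and iteration count are assumptions, not from the paper."""
    phi = (5 ** 0.5 - 1) / 2  # golden ratio conjugate, ~0.618
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if score(c) <= score(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2
```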
Author contributions A new bio-inspired variant of differential evolution is developed and applied for intrusion detection and classification of many real datasets.
Funding Not Applicable.

Data availability Available.
Code availability Available.
Material availability Available.

Declarations
Conflict of interest No conflict of interest.

Ethical approval No human or animal is involved in experimentation.
Consent to participate Not Applicable.

Consent for publication
The authors are giving full consent for the publication of the manuscript.