The specific details and basic ideas of the proposed model are illustrated in Fig. 1. This section describes the six machine learning classifiers and three feature selection methods used in the proposed model.
3.1. Machine Learning Classifiers
Decision Tree
The decision table, a tabular counterpart of the decision tree, is a classifier in which explicit feature selection is an essential part of the learning process. The central problem in learning decision tables is selecting the right attributes to include. Usually this is done by measuring the cross-validation performance of the table for different subsets of attributes and choosing the best-performing subset. Fortunately, leave-one-out cross-validation is very cheap for this classifier: obtaining the cross-validation error of a decision table derived from the training data is simply a matter of manipulating the class counts associated with each table entry, because the structure of the table does not change as instances are added or deleted [25]. The feature space is generally searched by best-first search, because this strategy is less likely to become stuck in a local maximum than others, such as forward selection.
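As a minimal sketch of this wrapper-style subset search, the following scikit-learn code selects attributes for a tree-based classifier by cross-validation. Greedy forward selection stands in here for the best-first search described above, and the bundled breast-cancer data are only a placeholder for the study's own data set:

```python
# Sketch: cross-validated attribute-subset search around a tree classifier.
# Forward selection approximates the best-first search discussed above.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set

tree = DecisionTreeClassifier(random_state=0)
selector = SequentialFeatureSelector(tree, n_features_to_select=5,
                                     direction="forward", cv=5)
selector.fit(X, y)

X_subset = selector.transform(X)  # keep only the selected attributes
score = cross_val_score(tree, X_subset, y, cv=5).mean()
print("selected feature indices:", selector.get_support(indices=True))
print("cross-validated accuracy on subset: %.3f" % score)
```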
K-Nearest Neighbor
KNN is a supervised algorithm that is trained on a set of labeled points and uses them to label new ones. To classify a new point, it looks at the labeled points closest to it (its nearest neighbors) and assigns the label shared by the majority of those neighbors [26]. Closeness is evaluated with a distance function, typically the Euclidean distance:
$$d\left(x,y\right)=\sqrt{\sum _{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}}$$
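A minimal sketch of KNN with the Euclidean distance, again using scikit-learn and a placeholder data set:

```python
# Sketch: k-nearest-neighbour classification with an explicit Euclidean
# distance. X, y and the train/test split stand in for the study's data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k = 5 neighbours vote on the label of each new point
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("test accuracy: %.3f" % knn.score(X_test, y_test))
```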
Random Forest
The random forest classifier is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time; its output is the classification or regression predicted by the individual trees. By combining many such trees, random forests correct the decision tree's tendency to overfit its training set [27].
At the random forest level, the feature importance is averaged over all trees: the importance of each feature in every tree is summed and divided by the total number of trees:
$${\text{RFfi}}_{i}=\frac{\sum _{j\in \text{all trees}}{\text{normfi}}_{ij}}{T}$$
Where,
\({\text{RFfi}}_{i}\) = the importance of feature \(i\) calculated from all trees in the random forest model
\({\text{normfi}}_{ij}\) = the normalized importance of feature \(i\) in tree \(j\)
\(T\) = the total number of trees
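The averaging in this formula can be reproduced directly from a fitted scikit-learn random forest; the following sketch (placeholder data set) sums the normalized per-tree importances and divides by the number of trees:

```python
# Sketch of the RFfi_i formula above: sum the normalized per-tree
# importances of each feature and divide by the number of trees T.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

T = len(forest.estimators_)  # total number of trees
rf_fi = sum(tree.feature_importances_ for tree in forest.estimators_) / T

# scikit-learn's aggregate attribute computes the same per-tree average
print("matches forest.feature_importances_:",
      np.allclose(rf_fi, forest.feature_importances_))
print("most important feature index:", int(np.argmax(rf_fi)))
```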
Bagging
The idea of bagging (voting for classification problems, averaging for regression-type problems with continuous dependent variables) belongs to the area of predictive data mining: it combines the predicted classifications (predictions) from multiple models, or from the same type of model fitted to different learning data. Bagging also addresses the inherent instability of results when complex models are applied to relatively small data sets. Suppose the data mining task is to build a model for predictive classification and the data set available for training is relatively small. One can repeatedly draw sub-samples (with replacement) from the data set and apply a tree classifier (such as CART or CHAID) to the successive samples. In practice, quite different trees are often grown for the different samples, illustrating the instability of models that is typical with small data sets [28]. One strategy for deriving a single prediction (for new observations) is to use all the trees grown on the different samples and apply simple voting: the final classification is the one most often predicted by the different trees.
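A minimal sketch of this procedure with scikit-learn's BaggingClassifier, whose default base learner is a decision tree (placeholder data set):

```python
# Sketch: bagging decision trees over bootstrap sub-samples and combining
# their predictions by majority vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set

# 100 trees, each fitted on a bootstrap sample drawn with replacement;
# the default base learner is a decision tree
bagger = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=0)
print("cross-validated accuracy: %.3f"
      % cross_val_score(bagger, X, y, cv=5).mean())
```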
AdaBoost
AdaBoost, or Adaptive Boosting, is a machine learning meta-algorithm. It is typically used in conjunction with other learning algorithms to improve their performance: the outputs of the other learners are combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of the instances misclassified by previous classifiers [29]. It is sensitive to noisy data and outliers, although on some problems it is less susceptible to overfitting than other learning algorithms. The individual learners may be weak, but as long as each performs slightly better than random guessing, the final model converges to a strong learner.
$${E}_{t}=\sum _{i}E[{F}_{t-1}\left({x}_{i}\right)+{\alpha }_{t}h\left({x}_{i}\right)]$$
where \({F}_{t-1}\left(x\right)\) = the boosted classifier built up to the previous stage, \(E\left(F\right)\) = the error function, \({f}_{t}\left(x\right)={\alpha }_{t}h\left(x\right)\) = the weak learner considered for addition, \(h\left({x}_{i}\right)\) = the weak learner's output for sample \({x}_{i}\) in the training set, \(t\) = the iteration number, \({\alpha }_{t}\) = the coefficient assigned to the weak learner, and \({E}_{t}\) = the total error of the boosted classifier at iteration \(t\).
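A minimal sketch of AdaBoost with scikit-learn (the default weak learner is a depth-1 decision tree, a "stump"; the data set is a placeholder):

```python
# Sketch: AdaBoost combines weak learners into a weighted sum, as in the
# E_t expression above; each round re-weights the misclassified samples.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set

booster = AdaBoostClassifier(n_estimators=50, random_state=0)
print("cross-validated accuracy: %.3f"
      % cross_val_score(booster, X, y, cv=5).mean())
```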
Gradient Boosting
Gradient boosting is a machine learning technique for regression and classification problems. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Like other boosting methods, it builds the model in a stage-wise fashion and generalizes it by allowing optimization of an arbitrary differentiable loss function [30].
Gradient boosting minimizes the objective function by steepest descent: in each iteration, the base learner is fitted to the negative gradient of the loss, scaled by the step length \({\gamma }_{m}\), and this correction is added to the previous iterate:
$${F}_{m}\left(x\right)={F}_{m-1}\left(x\right)-{\gamma }_{m}\sum _{i=1}^{n}{\nabla }_{{F}_{m-1}}L\left({y}_{i},{F}_{m-1}\left({x}_{i}\right)\right),$$
$${\gamma }_{m}=\underset{\gamma }{\text{arg min}}\sum _{i=1}^{n}L\left({y}_{i},{F}_{m-1}\left({x}_{i}\right)-\gamma {\nabla }_{{F}_{m-1}}L\left({y}_{i},{F}_{m-1}\left({x}_{i}\right)\right)\right)$$
where \(L\left(y,F\left(x\right)\right)\) is a differentiable loss function.
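A minimal sketch of this stage-wise fitting with scikit-learn's GradientBoostingClassifier, where learning_rate plays the role of the step length (placeholder data set):

```python
# Sketch: each boosting stage fits a small tree to the negative gradient
# of the loss and adds it, scaled by the learning rate.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=0)
print("cross-validated accuracy: %.3f"
      % cross_val_score(gb, X, y, cv=5).mean())
```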
3.2. Feature Selection Methods
Suppose the feature set to be processed is \(X\) with \(n\) features. Feature selection is the discrete optimization problem of picking a subset of \(m\) features out of the \(n\) available, that is, \(m\le n\) [24]. Irrelevant features can substantially degrade the training and performance of a classifier, so it is fundamentally important to remove unimportant features from the feature set [31].
Chi2 Test
The \({\chi }^{2}\) (chi2) test computes the \({\chi }^{2}\) statistic between each feature and the target and selects the desired number of features with the highest \({\chi }^{2}\) scores, using the following equation [32]:
$${\chi }^{2}=\sum _{i=1}^{n}\frac{{\left({O}_{i}-{E}_{i}\right)}^{2}}{{E}_{i}}$$
where \({O}_{i}\) = the observations in class \(i\), and \({E}_{i}\) = the expected observations in class \(i\) if there were no relationship between the feature and the target.
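A minimal sketch of chi2-based selection with scikit-learn's SelectKBest (the chi2 score requires non-negative features; the data set is a placeholder):

```python
# Sketch: score each feature against the target with the chi2 statistic
# above and keep the k best-scoring features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)  # placeholder, non-negative data

selector = SelectKBest(score_func=chi2, k=5).fit(X, y)
print("chi2 scores:", selector.scores_.round(1))
print("selected feature indices:", selector.get_support(indices=True))
```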
Extra Trees Classifier
To extract the most salient features of the data set, the feature importance of the model is used: the model assigns a score to each feature, and the higher the score, the more relevant that feature is to the output variable [33]. We apply the Extra Trees (ET) classifier to identify the five most important features of the data set.
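A minimal sketch of ranking features by Extra Trees importance scores and keeping the five highest (placeholder data set):

```python
# Sketch: fit an Extra Trees classifier and report its five most
# important features by importance score.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier

data = load_breast_cancer()  # placeholder data set
X, y = data.data, data.target

et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
top5 = np.argsort(et.feature_importances_)[::-1][:5]
for i in top5:
    print("%-25s %.3f" % (data.feature_names[i], et.feature_importances_[i]))
```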
Correlation Matrix
Correlation indicates how strongly the features of the data set are associated with the target variable. The correlation may be positive (increasing the value of a feature tends to increase the value of the target) or negative (increasing the value of a feature tends to decrease the value of the target) [34]. A heat map of the correlation matrix makes it easy to see which features are most related to the target variable.
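A minimal sketch of building the correlation matrix and heat map with pandas and seaborn, assuming a DataFrame with a "target" column (placeholder data set):

```python
# Sketch: pairwise Pearson correlations between all features and the
# target, rendered as a heat map.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)  # placeholder data set
df = data.frame  # features plus a "target" column

corr = df.corr()  # pairwise Pearson correlations
# features most positively correlated with the target ("target" itself
# appears first with correlation 1.0)
print(corr["target"].sort_values(ascending=False).head())

sns.heatmap(corr, cmap="coolwarm", center=0)
plt.tight_layout()
plt.show()
```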