ML algorithms play a vital role in improving data security by helping enterprises identify, prevent, and respond to security threats and vulnerabilities. To enhance security measures, they make use of data patterns and insights. Large and diverse datasets are best for training ML algorithms, which frequently need ongoing updates to respond to new threats. Although these algorithms have the potential to greatly improve data security, they should be part of a larger security strategy that also includes preventative measures such as access restriction, encryption, and routine security audits. Three different ML models are used in this study; the following subsections explain each algorithm in detail.
A. RF
For prediction tasks, the RF technique is an effective machine-learning tool. Developed by Leo Breiman, the RF algorithm builds individual classification or regression trees for prediction by utilizing bootstrap aggregation (bagging) and random feature selection [13]. Across studies in a variety of fields, including economic forecasting, satellite imaging, genetic and biological analyses, and general classification and regression problems, RFs have shown outstanding predictive power. RF classifiers are gaining popularity in the field of computer vision, including well-known variations such as Random Ferns and Extremely Randomized Trees. Ongoing research in the field of RF allows researchers to enhance accuracy, reduce learning and classification time, or achieve both objectives simultaneously. This study focuses on improving RF's precision because it is one of the most effective classification techniques [14].
Nevertheless, because of the multitude of data distributions in high-dimensional feature spaces, an RF may include less-than-ideal tree classifiers that produce inaccurate classification results. When a significant proportion of poor-quality trees is present in the RF, the collective decision-making of all the trees may result in erroneous decisions. To mitigate this, the research seeks to optimize the RF by identifying and excluding underperforming trees to minimize their detrimental impact on overall performance. Additionally, randomization in RF can lead to correlated trees, potentially impacting performance; the RF's classification accuracy can be increased by reducing the correlation between trees. This research therefore selects, from the large pool of decision trees in an RF, only uncorrelated trees with good classification accuracy.
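As an illustration of this tree-filtering idea, the sketch below trains a standard RF with scikit-learn, scores each individual tree on a held-out validation set, and keeps only the trees at or above the median accuracy before majority voting. The dataset, the median threshold, and all parameters are illustrative assumptions, not the exact procedure used in this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative binary dataset and train/validation split.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Accuracy of each individual tree on the validation set.
tree_scores = np.array([t.score(X_val, y_val) for t in rf.estimators_])

# Keep trees scoring at or above the median, then majority-vote with the survivors.
kept = [t for t, s in zip(rf.estimators_, tree_scores) if s >= np.median(tree_scores)]
votes = np.stack([t.predict(X_val) for t in kept])        # shape: (n_kept, n_val)
pruned_pred = np.round(votes.mean(axis=0)).astype(int)    # majority vote (binary labels)
pruned_acc = (pruned_pred == y_val).mean()
```

A correlation-based filter could be layered on top of this by comparing the vote vectors of pairs of kept trees, but the accuracy filter alone conveys the core idea.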
An ensemble classification technique called RF integrates the findings of various decision trees. Numerous approaches to creating RFs have been proposed over the past ten years, with Breiman's method rising to prominence due to its superior performance over alternatives. The process of building an RF consists of three steps:
- The first phase, Training Data Sampling: using the bagging approach, randomly sample the training data \(D\) with replacement to produce \(K\) subsets \(\{{D}_{1} , {D}_{2} , ..., {D}_{K}\}\).
- The second phase, Feature Subspace Sampling and Tree Classifier Building: for each training subset \({D}_{i}\) \((1\le i\le K)\), grow a tree using a decision tree technique. At each node, evaluate all potential splits within a randomly selected feature subspace \({X}_{i}\) of \(F\) features (where \(F \ll M\), with \(M\) the total number of features) and choose the optimal split as the dividing feature to generate a child node. This procedure continues until the halting criteria are satisfied, yielding a tree \({h}_{i}({D}_{i}, {X}_{i})\) built from training data \({D}_{i}\) under subspace \({X}_{i}\).
- The third phase, Decision Aggregation: form the ensemble classification decision by majority vote among the \(K\) trees \(\left\{{h}_{1}\left({D}_{1}, {X}_{1}\right),{h}_{2}\left({D}_{2}, {X}_{2}\right),\dots ,{h}_{K}\left({D}_{K}, {X}_{K}\right)\right\}\) that together make up the RF.
The procedure is driven mainly by two parameters: the number of trees \(K\) required to form an RF and the number of randomly selected features \(F\) required to build a decision tree. Breiman suggests setting \(F= [ {log}_{2}M+1]\), and \(K\) is commonly set to 100. Larger values of \(K\) and \(F\) are advisable for datasets that are huge and highly dimensional.
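The three construction phases can be sketched with scikit-learn's building blocks, assuming a synthetic binary dataset: bootstrap sampling for phase one, a per-node random subspace of size \(F=[{log}_{2}M+1]\) for phase two, and majority voting for phase three. All dataset choices are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=16, random_state=0)
K = 100                              # number of trees (Breiman's common default)
M = X.shape[1]                       # total number of features
F = int(np.log2(M) + 1)              # F = [log2 M + 1] features per split

trees = []
for _ in range(K):
    idx = rng.integers(0, len(X), len(X))        # phase 1: bootstrap sample D_i
    tree = DecisionTreeClassifier(max_features=F)  # phase 2: size-F subspace per node
    trees.append(tree.fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])  # phase 3: majority vote over K trees
pred = np.round(votes.mean(axis=0)).astype(int)
```

With \(M=16\) features this gives \(F=5\), matching Breiman's formula; `RandomForestClassifier` bundles all three phases, but the explicit loop mirrors the steps above.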
B. KNN
K-Nearest Neighbors (KNN) is a straightforward and popular supervised ML technique that can be used for both classification and regression problems. KNN is a non-parametric, instance-based learning algorithm that generates predictions based on similarities between data points rather than on assumptions about the distribution of the underlying data [15]. KNN is a flexible algorithm that is easy to understand and simple to implement, and it can be effective when the decision boundary is not very complex. Ties in classification can occur when different classes receive the same number of KNN votes; in these circumstances, a number of tie-breaking techniques can be applied. KNN may not perform at its best in high-dimensional spaces or when dealing with unbalanced datasets, so hyperparameter tuning and data preprocessing are frequently required for a successful implementation.
To train a KNN model, the following steps are taken. First, choose the value of \(K\), which denotes the number of nearest neighbors considered when making predictions; odd values such as 1, 3, or 5 are common because they avoid ties in binary classification. Second, select a distance metric that accurately captures how similar data points are; the appropriate metric depends on the particular problem and the properties of the data. Third, determine the class labels of the \(K\) nearest neighbors and assign the new data point the class label that appears most frequently among them. Finally, experiment with different values of \(K\) and different distance metrics to find the combination that yields the best performance on a validation set.
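These steps can be sketched as follows, assuming scikit-learn and a synthetic dataset; the candidate \(K\) values and the Euclidean metric are illustrative choices, not the tuned values used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset and held-out validation split.
X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=1)

best_k, best_acc = None, -1.0
for k in (1, 3, 5, 7, 9):                      # odd K avoids ties in binary voting
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    acc = knn.fit(X_tr, y_tr).score(X_val, y_val)  # majority vote of k neighbors
    if acc > best_acc:
        best_k, best_acc = k, acc
```

The same loop could also sweep `metric` (e.g. `"manhattan"`) to compare distance functions, as the text suggests.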
C. ANN
ANNs represent a fascinating field of computational models that draw inspiration from the intricate structure of the human brain [16]. They are designed to emulate various aspects of human-like behavior, encompassing critical processes such as learning, adaptation, association, generalization, and abstraction. These functionalities are particularly prominent during the training phase, where ANNs evolve and refine their internal representations based on data. At the core of ANNs are artificial neurons, which serve as the fundamental processing units and are interconnected in intricate ways to form a network. One of the remarkable features of ANNs is their ability to learn from data that is incomplete and laden with noise. This is in stark contrast to traditional computing systems, where a malfunctioning component can lead to a catastrophic system failure. In ANNs, fault tolerance is an inherent property thanks to their distributed processing nature: if an individual neuron malfunctions, its erroneous output can be overwritten or compensated for by the correct outputs generated by its neighboring neurons.
The versatility of ANNs makes them a powerful tool for solving complex real-world problems where the relationships between input attributes and desired outputs may not be well understood. They excel in scenarios involving continuous value inputs and outputs, a characteristic that sets them apart from many other ML algorithms. ANNs have been successfully applied in various domains, including handwritten character recognition and medical diagnosis, showcasing their adaptability and efficacy. Moreover, techniques for parallelization can be employed to expedite the computational processes, and recent developments in rule extraction from trained ANNs enhance their utility in data mining tasks, especially in numerical classification and prediction.
Within the realm of ANNs, learning is the central process. It involves iteratively adjusting the synaptic weights that connect artificial neurons to minimize errors. Learning in ANNs is akin to the continuous adaptation of the network's parameters based on environmental stimuli, and the type of learning employed depends on how these parameter adjustments are carried out. Two prominent categories are supervised learning, where known input-output pairs \(\left({x}_{i},{y}_{i}\right)\) are provided, and unsupervised learning, where desired output values (\({y}_{i}\)) are absent and the network must uncover patterns and structures in the data independently. Figure 4 depicts the fundamental components of an artificial neuron.
One of the most widely used training algorithms for ANNs, especially in the context of multi-layer perceptrons (ANN-MLP), is the error backpropagation method. This method involves presenting a pattern to the input layer of the network and then processing it layer by layer until the network produces the final response (\({f}_{mlp}\)). This response is calculated by considering a combination of synaptic weights (\({v}_{i}\) and \({w}_{ij}\)), biases (\({b}_{i0}\) and \({b}_{0}\)), and an activation function \(\left(\phi \right)\), as outlined in Eq. (1).
\({f}_{mlp}=\phi \left[{\sum }_{i=1}^{N}{v}_{i}\,\phi \left({\sum }_{j}{w}_{ij}{x}_{j}+{b}_{i0}\right)+{b}_{0}\right]\)  (1)
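A direct NumPy transcription of Eq. (1), assuming a sigmoid activation for \(\phi\) and illustrative layer sizes and weights, might look like:

```python
import numpy as np

def phi(z):
    """Sigmoid activation, one plausible choice for phi."""
    return 1.0 / (1.0 + np.exp(-z))

def f_mlp(x, W, v, b_hidden, b0):
    # Inner term of Eq. (1): phi(sum_j w_ij x_j + b_i0) for each hidden unit i.
    hidden = phi(W @ x + b_hidden)
    # Outer term: phi(sum_i v_i * hidden_i + b_0), the final MLP response.
    return phi(v @ hidden + b0)

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # input pattern with 4 attributes
W = rng.normal(size=(3, 4))     # w_ij: 3 hidden units, 4 inputs
v = rng.normal(size=3)          # v_i: hidden-to-output weights
out = f_mlp(x, W, v, rng.normal(size=3), 0.1)
```

Because the sigmoid bounds its output, the response always lies in (0, 1), which is convenient for binary classification.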
The MLP training process is significantly influenced by the choice of the learning rate parameter. When the learning rate is set too low, the training of the ANN becomes sluggish, whereas an excessively high learning rate can result in training oscillations, hindering the convergence of the learning process. Typically, this parameter's values fall within the range of 0.1 to 1.0. Training an MLP using the backpropagation algorithm often demands numerous iterations through the training dataset, leading to lengthy training times. If the training process encounters a local minimum, it may struggle to reduce the error for the training set, plateauing at an unacceptable level. One effective strategy for accelerating the learning rate without inducing oscillations is to introduce a momentum term. This constant factor influences how past weight changes affect the current direction of weight adjustments in the weight space. It is advisable to set the momentum rate within the range of 0 to 1 [17].
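The momentum-augmented weight update described above can be sketched as follows; the learning rate, momentum rate, and the toy objective \(f(w)=w^{2}\) are illustrative choices within the ranges the text recommends.

```python
def momentum_step(w, grad, prev_delta, eta=0.1, mu=0.5):
    """One weight update: the new step blends the current gradient (scaled
    by learning rate eta) with the previous weight change (scaled by the
    momentum rate mu), damping oscillations in weight space."""
    delta = -eta * grad + mu * prev_delta
    return w + delta, delta

# Minimize the toy objective f(w) = w^2 (gradient 2w) starting from w = 5.0.
w, delta = 5.0, 0.0
for _ in range(50):
    w, delta = momentum_step(w, 2 * w, delta)
```

With \(\eta = 0.1\) and \(\mu = 0.5\) the iterates spiral into the minimum at \(w = 0\); raising \(\eta\) toward 1.0 with a large \(\mu\) makes the same loop oscillate, illustrating the trade-off described above.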