K-Nearest Neighbors (KNN)
KNN is an instance-based learning algorithm used for both regression and classification tasks. Its operating principle is that similar data points exist in close proximity. In classification tasks, it assigns a class to a data point based on the majority class among its 'k' nearest neighbors. In regression tasks, it predicts the output value by averaging the values of the 'k' nearest neighbors [7]. This algorithm has several advantages that make it appealing for machine learning tasks. It is straightforward to understand and implement, making it an excellent choice for beginners. Additionally, it does not require a separate training phase, allowing for quick setup. KNN is also flexible because it does not make any assumptions about the underlying data distribution, enabling it to adapt effectively to a wide range of data patterns [8]. However, KNN has some notable disadvantages. It performs slowly on large datasets because it computes the distance to all training data points for each prediction, and it requires significant memory to store all these points. The algorithm is also sensitive to irrelevant features, which can affect distance calculations and reduce prediction accuracy [8, 9].
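A minimal Python (NumPy) sketch of this procedure is given below. The toy dataset and the choice of k are purely illustrative, and the brute-force distance computation to every training point is exactly the step that makes KNN slow on large datasets:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Brute-force KNN: measure the distance from the query point to
    every training point, keep the k closest, then vote or average."""
    # Euclidean distance to all training points.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]      # indices of the k nearest
    neighbor_labels = y_train[nearest]
    if task == "classification":
        # Majority class among the k neighbors.
        return Counter(neighbor_labels).most_common(1)[0][0]
    # Regression: average of the k neighbors' values.
    return neighbor_labels.mean()

# Toy example: two features, two classes.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.2, 1.9]), k=3))  # -> 0
```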
Multilayer Perceptron (MLP)
A Multilayer Perceptron (MLP) is a type of feedforward neural network composed of an input layer, one or more hidden layers, and an output layer. Each layer consists of nodes (or neurons), with every node in one layer connected to every node in the next layer, forming a fully connected network [10] as shown in Fig. 1a.
MLPs can be used for both classification and regression tasks across various fields. For example, in image recognition they classify images into different categories, and in speech recognition they transcribe spoken words into text. They are also employed in financial forecasting to predict market trends and stock prices, and in the medical field MLPs assist in diagnosing diseases based on patient data. Input data enters through the input layer and moves forward through the hidden layers, eventually reaching the output layer where it generates a result. Connections between neurons have weights, and each neuron includes a bias term; these parameters are tuned during training. The backpropagation algorithm calculates the gradient of the loss function with respect to each weight and bias, and the weights are then updated using an optimization algorithm such as gradient descent to reduce prediction error. Neurons apply activation functions such as sigmoid, hyperbolic tangent (tanh), or rectified linear unit (ReLU) to introduce non-linearity, enabling the network to learn complex patterns.
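The sketch below illustrates this training loop on a toy XOR problem, assuming a single hidden layer, sigmoid activations, and a squared-error loss; the layer sizes, learning rate, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy problem (XOR), purely illustrative.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer: the weights and biases are the trainable parameters.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5

for _ in range(5000):
    # Forward pass: input -> hidden -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backpropagation: gradients of the squared loss w.r.t. each parameter.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(3))  # predictions should approach [0, 1, 1, 0]
```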
Training MLPs can be time-consuming and resource-intensive, especially with large datasets and deep networks [m3]. Large amounts of data are usually required to train them effectively and prevent overfitting [m4]. Several factors need to be considered to ensure effective performance when implementing MLPs. Data preprocessing, such as normalizing or standardizing input data, can improve training convergence. Hyperparameter tuning, which involves selecting an appropriate architecture (number of layers and neurons) and hyperparameters (number of epochs, learning rate, batch size), often requires experimentation and cross-validation [11]. On the strengths side, MLPs can model complex relationships in data thanks to their multi-layer structure and non-linear activation functions, and their architecture can be customized by varying the number of layers and neurons, making them adaptable to a wide range of problems.
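The scikit-learn sketch below illustrates these practical points: inputs are standardized inside a pipeline and one candidate configuration is evaluated with cross-validation. The synthetic dataset and all hyperparameter values are placeholders that would normally be chosen by experimentation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Standardizing inputs inside the pipeline helps training converge;
# the architecture and hyperparameters below are illustrative choices.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                  learning_rate_init=1e-3, max_iter=500, random_state=0),
)
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```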
Random Forest
Random Forest is an ensemble learning algorithm that creates multiple decision trees during training and combines their results to improve accuracy and reduce overfitting. This algorithm applies to both regression and classification tasks [14]. By combining the predictions of several base estimators, it reduces variance and enhances generalization. Random Forest also provides insights into feature importance, allowing analysts to identify the most significant features. These characteristics make it a robust and adaptable tool in predictive modeling. During training, Random Forest uses bootstrap sampling to create random subsets of the training data. At each split, a random subset of features is selected to ensure diversity and reduce correlation among the trees. Each tree is built independently on a different subset of data and features (Fig. 1b). This ensemble approach makes the algorithm markedly more robust and less prone to overfitting than an individual decision tree, especially with an adequate number of trees.
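The scikit-learn sketch below shows these mechanisms in miniature: each tree is grown on a bootstrap sample, each split considers a random subset of features, and the fitted ensemble exposes averaged feature importances. The synthetic data and parameter values are illustrative only:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data as a stand-in for a real dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Each of the 200 trees is grown on a bootstrap sample, and each split
# considers only a random subset of the features (max_features).
forest = RandomForestRegressor(n_estimators=200, max_features="sqrt",
                               bootstrap=True, random_state=0)
forest.fit(X, y)

# Impurity-based feature importances, averaged across the ensemble.
print(forest.feature_importances_)
```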
Despite its strengths, Random Forest has several drawbacks. First, training multiple decision trees can be computationally intensive, requiring substantial processing power and memory, which poses challenges for large datasets. Second, the ensemble nature reduces interpretability compared to individual decision trees, making it harder to understand how specific features influence predictions. Finally, the complexity of Random Forest necessitates careful hyperparameter tuning, including the number of trees, tree depth, and feature selection criteria [15]. Effective implementation therefore involves several critical considerations. Hyperparameters such as the number of trees [r6], maximum depth, and minimum samples per leaf need careful tuning, as sketched below. Handling imbalanced datasets is another important factor; techniques like resampling or adjusting class weights can help maintain balance and improve predictive outcomes [16]. In addition, because each decision tree is built independently, parallel processing can significantly accelerate training.
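A sketch of such tuning with scikit-learn is shown below; the grid values, the balanced class weighting, and the scoring metric are illustrative assumptions rather than recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Imbalanced synthetic data (roughly 90%/10% classes) as an illustration.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# class_weight="balanced" reweights classes inversely to their frequency;
# n_jobs=-1 builds trees in parallel.
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", n_jobs=-1, random_state=0),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [None, 10],
                "min_samples_leaf": [1, 5]},
    cv=5, scoring="f1",
)
search.fit(X, y)
print(search.best_params_)
```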
M5 Tree
M5 Tree is a machine learning algorithm used for regression tasks. It builds decision trees by recursively splitting the dataset based on feature conditions to predict continuous numerical values. Unlike traditional regression trees, which predict a constant value at each leaf, the M5 tree fits a linear regression model at each leaf node, producing a model tree that improves accuracy. This hybrid approach allows M5 Tree to capture complex relationships in data while maintaining interpretability [17].
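Standard libraries such as scikit-learn do not ship Quinlan's M5 algorithm, so the sketch below is only a simplified stand-in for the idea: a decision tree partitions the input space, and a separate linear regression is fitted to the training points in each leaf (M5 proper adds smoothing and pruning steps omitted here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

class SimpleModelTree:
    """Toy model tree: a decision tree splits the data, then a linear
    model is fitted to the training points falling in each leaf."""

    def __init__(self, max_leaf_nodes=8):
        self.tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaf_ids = self.tree.apply(X)          # leaf index for each sample
        for leaf in np.unique(leaf_ids):
            mask = leaf_ids == leaf
            self.leaf_models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaf_ids = self.tree.apply(X)
        preds = np.empty(len(X))
        for leaf, model in self.leaf_models.items():
            mask = leaf_ids == leaf
            if mask.any():
                preds[mask] = model.predict(X[mask])
        return preds

# Usage on a toy piecewise-linear signal.
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.where(X.ravel() < 5, 2 * X.ravel(), 20 - X.ravel())
print(SimpleModelTree(max_leaf_nodes=4).fit(X, y).predict(X[:3]))
```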
SMOReg
SMOReg, or Sequential Minimal Optimization for Regression, is a machine learning algorithm that applies support vector machines (SVMs) to regression tasks. Rather than solving the underlying quadratic optimization problem all at once, it follows the SMO strategy of iteratively optimizing small subsets of the Lagrange multipliers associated with the support vectors until convergence, yielding a regression function that best fits the data points while minimizing prediction error. Combined with kernel functions, this makes the algorithm particularly effective when the relationship between inputs and outputs is non-linear and flexible decision boundaries are required [18]. However, SMOReg may be computationally intensive and is sensitive to parameter tuning, which can affect its efficiency and performance on large datasets.
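SMOReg itself is a Weka implementation; as an approximate stand-in, the scikit-learn sketch below fits the same epsilon-insensitive SVM regression problem (whose underlying solver is likewise SMO-based) with an RBF kernel. The synthetic data and the C and epsilon values are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=400, n_features=5, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The RBF kernel provides the non-linear flexibility described above;
# C and epsilon are the sensitive parameters that usually need tuning.
model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))  # R^2 on held-out data
```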
Feedforward Neural Network (FFNN)
A Feedforward Neural Network (FFNN) is a type of artificial neural network in which information flows in one direction, from the input nodes through the hidden layers to the output nodes. Each layer comprises nodes (neurons) connected to nodes in the next layer by weighted connections. During training, FFNNs adjust these weights based on error gradients to minimize prediction errors. They are widely used for tasks like classification and regression in fields such as finance, healthcare, and natural language processing [19]. FFNNs excel at capturing complex patterns in data and can generalize well, but they may struggle with sequential data and require extensive hyperparameter tuning to prevent overfitting [20].
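A minimal PyTorch sketch of such a network and a single weight-update step is shown below; the layer sizes, optimizer, and random stand-in data are illustrative assumptions:

```python
import torch
from torch import nn

# A small feedforward network: information flows strictly forward
# through fully connected (Linear) layers.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

# One illustrative training step on random stand-in data.
X = torch.randn(64, 10)
y = torch.randn(64, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
loss = loss_fn(model(X), y)   # forward pass
loss.backward()               # gradients of the loss w.r.t. every weight
optimizer.step()              # weight update
print(loss.item())
```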
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is designed for processing grid-like data, such as images, and is used in tasks like image recognition and computer vision. CNNs employ convolutional layers that apply filters over input data to detect spatial patterns, followed by pooling layers to reduce dimensionality and increase computational efficiency [21]. They can be computationally intensive and require large amounts of training data to generalize effectively.
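The PyTorch sketch below shows one convolution-plus-pooling stage and how pooling halves the spatial resolution; the channel counts and input size are arbitrary illustrative choices:

```python
import torch
from torch import nn

# Convolution detects local spatial patterns; pooling halves the
# spatial resolution, reducing computation for later layers.
layer = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

images = torch.randn(8, 3, 32, 32)   # batch of 8 RGB 32x32 images
features = layer(images)
print(features.shape)                # torch.Size([8, 16, 16, 16])
```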
Long Short-Term Memory (LSTM)
An LSTM network is a type of recurrent neural network (RNN) designed to model sequential data by overcoming issues like the vanishing gradient problem that traditional RNNs often face [22]. LSTMs use a gating mechanism to selectively retain or forget information across long sequences, which makes them highly suitable for tasks such as time series prediction, natural language processing, and speech recognition [23]. LSTMs excel at capturing long-term dependencies and can accommodate sequences of varying lengths during training. However, they may be prone to overfitting on small datasets and require careful tuning of hyperparameters such as the number of layers and the learning rate to achieve optimal performance.
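A minimal PyTorch sketch of an LSTM used for one-step-ahead prediction is given below; the hidden size, sequence length, and random stand-in data are illustrative assumptions:

```python
import torch
from torch import nn

# A single-layer LSTM followed by a linear head for one-step-ahead
# prediction; the sizes here are illustrative.
lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)

# Batch of 16 univariate sequences, 50 time steps each (random stand-in).
x = torch.randn(16, 50, 1)
output, (h_n, c_n) = lstm(x)      # h_n: final hidden state per sequence
prediction = head(h_n[-1])        # predict the next value from it
print(prediction.shape)           # torch.Size([16, 1])
```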