The dataset consisted of a 1134 × 6 matrix, of which five columns (1134 × 5) were input parameters and one column of 1134 values was the output. The input parameters were the distances in the X and Y directions, temperature, age and relative humidity (RH), while the output was the HCP value. Five-fold cross-validation was carried out to prevent over-fitting. Principal Component Analysis (PCA) was also carried out, with the criterion for component reduction set at 95 % explained variance. The explained variance per component was 43.0 %, 38.2 %, 11.7 %, 6.6 % and 0.5 % for distance in X, distance in Y, age, temperature and RH, respectively. Hence only the first four components, which together explain 99.5 % of the variance, were retained for training. The original normalized dataset is presented in Fig. 6.
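For illustration, the preprocessing described above can be reproduced with a short script. The following is a minimal sketch using scikit-learn rather than the original modelling environment; the data loading is a placeholder, not the actual dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler

# Placeholder data: X stands in for the 1134 x 5 input matrix (distance X,
# distance Y, temperature, age, RH); y for the 1134 HCP values.
rng = np.random.default_rng(0)
X = rng.random((1134, 5))
y = rng.random(1134)

# Normalize the inputs, as in the normalized dataset of Fig. 6.
X_scaled = MinMaxScaler().fit_transform(X)

# Retain the smallest number of components explaining >= 95 % of the
# variance; with the percentages reported above this keeps four components.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)

# The five-fold cross-validation split used to guard against over-fitting.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
```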
Results of Boosted Trees
Boosted Trees is a sophisticated ensemble learning method and a powerful machine learning tool that performs well on both regression and classification tasks. It combines many decision trees into a single predictive model: the trees are trained sequentially, with each new tree fitted to correct the errors of its predecessors. In gradient boosting, each new tree minimizes a predetermined loss function by reducing the ensemble's residual error, and the training instances are recalibrated according to the gradient of the loss function with respect to the ensemble predictions. To avoid over-fitting, Boosted Trees restrict tree depth, apply a shrinkage (learning-rate) parameter that modifies each tree's contribution, and sub-sample the data during training. Boosted Trees are known for their outstanding predictive ability and robustness to noisy data, and they can identify complex relationships between features and target variables across varied datasets. Fig. 7 shows the results of Boosted Trees. Fig. 7(a) shows the response plot, in which the blue dots represent the true values and the yellow dots the predicted values. The response plot is non-linear, indicating that the predicted response is not consistently influenced by the input variables throughout their entire range. It thus gives a visual evaluation of the predictive performance of the model, enabling recognition of patterns, trends, outliers, and areas that could use improvement. Similar results were obtained for Bagged Trees and the optimizable ensembles.
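Continuing from the preprocessing sketch above, a least-squares boosted tree ensemble analogous to the Boosted Trees model can be sketched with scikit-learn; the hyperparameter values here are illustrative defaults, not the tuned values reported later.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

boosted = GradientBoostingRegressor(
    loss="squared_error",  # least-squares loss minimized by each new tree
    n_estimators=100,      # number of sequentially trained trees
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    max_depth=3,           # restricted depth guards against over-fitting
    subsample=0.8,         # row sub-sampling during training
    random_state=0,
)
scores = cross_val_score(boosted, X_reduced, y, cv=cv,
                         scoring="neg_root_mean_squared_error")
print(-scores.mean())  # mean cross-validated RMSE
```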
Fig. 7(b) shows the plot of predicted vs. actual response for Boosted Trees. The x-axis shows the observed values of the response variable, while the y-axis shows the values predicted by the model. Each point on the scatter plot represents an observation in the dataset, and its position reflects the relationship between its actual and predicted values. For a perfect model, all points would align along the diagonal line where predictions match observations. Systematic deviations from this line can reveal trends or biases in model performance and suggest improvements, while the spread and variability of points around the line indicate the model's accuracy and precision over different ranges of actual values. The predicted vs. actual response plot can therefore be used to assess the model's predictive strength and its ability to capture relationships in the data.
Fig. 7(c) presents the residual plot obtained for the boosted ensemble, showing the difference between observed and predicted values for each observation; residuals scattered randomly about zero indicate an unbiased fit.
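Plots of the kind shown in Fig. 7(b) and 7(c) can be generated as follows; this is a minimal matplotlib sketch assuming y_true and y_pred are NumPy arrays of observed and predicted responses from any fitted model.

```python
import matplotlib.pyplot as plt

def diagnostic_plots(y_true, y_pred):
    """Predicted-vs-actual and residual plots, as in Fig. 7(b) and 7(c)."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Predicted vs. actual: a perfect model falls on the diagonal line.
    ax1.scatter(y_true, y_pred, s=10)
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    ax1.plot(lims, lims, "k--")
    ax1.set_xlabel("Actual response")
    ax1.set_ylabel("Predicted response")

    # Residuals: random scatter about zero indicates an unbiased fit.
    ax2.scatter(y_pred, y_true - y_pred, s=10)
    ax2.axhline(0.0, color="k", linestyle="--")
    ax2.set_xlabel("Predicted response")
    ax2.set_ylabel("Residual")

    plt.tight_layout()
    plt.show()
```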
Bagged Trees
Bagged Trees, also known as Bootstrap Aggregating Trees, are a powerful ensemble learning technique used in machine learning for regression and classification. Multiple decision trees are built independently on varied bootstrap samples of the training data, and their outputs are then averaged (for regression) or voted on (for classification). This approach exploits tree diversity to improve the prediction accuracy and robustness of the model. Bagged Trees construct subsets of the data by sampling with replacement, allowing the individual trees to capture different aspects of the data distribution. By aggregating the predictions of these trees, Bagged Trees reduce variance, mitigating over-fitting and improving model stability. Because the trees are trained independently, bagging is easily parallelized and can efficiently handle large datasets on distributed computing systems. Bagged Trees can handle noise and outliers, although their simplicity may limit their ability to capture complicated correlations compared with boosting approaches. Nevertheless, Bagged Trees remain popular among machine learning practitioners because of their simplicity, versatility, and solid performance across domains and applications. Fig. 8 shows the results of Bagged Trees.
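A Bagged Trees model corresponding to this description can be sketched as follows, using scikit-learn's BaggingRegressor; the hyperparameter values are illustrative.

```python
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Independent trees on bootstrap resamples, averaged for regression.
bagged = BaggingRegressor(
    estimator=DecisionTreeRegressor(),
    n_estimators=100,  # number of independently trained trees
    bootstrap=True,    # sample training rows with replacement
    n_jobs=-1,         # trees are independent, so training parallelizes
    random_state=0,
)
bagged.fit(X_reduced, y)  # X_reduced, y from the preprocessing sketch
```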
Optimizable Ensemble
Bayesian optimization and random search optimizers were next used with a maximum of 30 iterations, with expected improvement per second plus as the acquisition function. The hyperparameter search space for the ensemble method covered the Bag and LSBoost methods, with the number of learners ranging from 10 to 500, a learning rate of 0.001 to 1, a minimum leaf size between 1 and 567, and 1 to 4 predictors to sample.
For the Bayesian optimization, the optimized hyperparameters were the LSBoost ensemble method with a minimum leaf size of 94, 270 learners, a learning rate of 0.34858 and 4 predictors to sample. For this model, the RMSE, R2, MSE and MAE were 0.018097, 0.97, 0.00032752 and 0.013769, respectively. The prediction speed was approximately 4600 obs/sec and the training time was 116.66 sec.
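A search of this kind can be approximated in Python with scikit-optimize, as sketched below. This assumes min_samples_leaf plays the role of the minimum leaf size and max_features that of the number of predictors to sample; note that scikit-optimize uses its own acquisition functions rather than the expected improvement per second plus function named above.

```python
from skopt import BayesSearchCV
from skopt.space import Integer, Real

# Search space mirroring the ranges stated above: 10-500 learners,
# learning rate 0.001-1, minimum leaf size 1-567, 1-4 predictors.
search_space = {
    "n_estimators": Integer(10, 500),
    "learning_rate": Real(0.001, 1.0, prior="log-uniform"),
    "min_samples_leaf": Integer(1, 567),
    "max_features": Integer(1, 4),
}

opt = BayesSearchCV(
    GradientBoostingRegressor(random_state=0),
    search_space,
    n_iter=30,  # maximum of 30 optimizer iterations, as above
    cv=cv,      # the five-fold split defined earlier
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
opt.fit(X_reduced, y)
print(opt.best_params_)
```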
For the random search optimizer (Fig. 10), the optimized hyperparameters included 58 learners and 2 predictors to sample. For this model, the RMSE, R2, MSE and MAE were 0.023896, 0.95, 0.00057534 and 0.018053, respectively.
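Random search over the same space can be sketched with scikit-learn directly; again, this mirrors rather than reproduces the original workflow.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import RandomizedSearchCV

# Random search draws hyperparameters at random instead of modelling the
# objective, so each iteration is cheaper but less informed.
rand_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(10, 501),
        "learning_rate": loguniform(0.001, 1.0),
        "min_samples_leaf": randint(1, 568),
        "max_features": randint(1, 5),
    },
    n_iter=30,
    cv=cv,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
rand_search.fit(X_reduced, y)
```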
Performance Evaluation
RMSE: This metric is the square root of the average squared difference between predicted and actual values. Lower values indicate better performance. LSBoost with Bayesian optimization has the lowest RMSE (0.018097), indicating the most accurate predictions.
R-squared: R-squared represents the proportion of variance in the dependent variable explained by the independent variables. Higher values are desirable, indicating better model fit. LSBoost with Bayesian optimization achieves the highest R-squared (0.97), suggesting it captures a significant portion of the variance in the data.
MSE: Similar to RMSE, MSE measures the average squared difference between predicted and actual values. Lower values indicate better performance. LSBoost with Bayesian optimization has the lowest MSE (0.00032752), indicating superior predictive accuracy.
MAE: MAE measures the average absolute difference between predicted and actual values. Again, lower values are better. LSBoost with Bayesian optimization achieves the lowest MAE (0.013769), indicating the smallest average absolute prediction error.
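These four metrics can be computed directly from observed and predicted values; the sketch below continues from the fitted optimizer above (in-sample purely for illustration, whereas the values reported here are cross-validated).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = opt.predict(X_reduced)   # opt from the Bayesian search sketch
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)               # RMSE is the square root of the MSE
mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f"RMSE={rmse:.6f}  MSE={mse:.8f}  MAE={mae:.6f}  R2={r2:.2f}")
```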
Prediction Speed: This metric measures how quickly the model can make predictions. Higher values are desirable, indicating faster prediction times. Here, the Bagged Trees ensemble method has the highest prediction speed (12000 obs/sec), followed by Boosted Trees (8500 obs/sec).
Training Time: Training time measures how long it takes to train the model. Shorter times are preferable, particularly for large datasets or real-time applications. LSBoost with Bayesian optimization has the longest training time (116.66 sec), whereas Bagged Trees and Boosted Trees have relatively shorter training times (9.0191 sec and 9.7955 sec, respectively).
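Prediction speed in observations per second can be estimated by timing a batch of predictions; the measurement approach below is an assumption for illustration, as the reported figures come from the modelling environment itself.

```python
import time

model = bagged  # any fitted model from the sketches above
start = time.perf_counter()
model.predict(X_reduced)
elapsed = time.perf_counter() - start
print(f"{len(X_reduced) / elapsed:.0f} obs/sec")
```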
Minimum Leaf Size and Number of Learners: These are hyperparameters that can significantly impact model performance. LSBoost with Bayesian optimization has a larger minimum leaf size (94) and a higher number of learners (270) compared to the other methods.
PCA: This indicates the percentage of variance explained by each principal component. As described earlier, PCA was used here for dimensionality reduction, so it influences model performance indirectly, through the reduced four-component input space supplied to the learners, rather than being a property of any single ensemble method.
Number of Predictors to Sample: This hyperparameter determines the number of features randomly selected at each split. LSBoost with Bayesian optimization uses 4 predictors to sample, while the random search model uses 2.
Critical Evaluation of the Performance
LSBoost with Bayesian optimization consistently outperforms other methods across various metrics, indicating its effectiveness in this scenario. However, its training time is substantially longer compared to other methods. Depending on the application, this trade-off between accuracy and computational cost needs careful consideration. Bagged Trees and Boosted Trees also perform well, offering a good balance between prediction speed and accuracy. These methods might be preferable for applications where training time is a crucial factor and slight decreases in predictive accuracy are acceptable. Overall, the choice of ensemble method depends on the specific requirements of the application, considering factors such as prediction accuracy, training time, and computational resources available. LSBoost with Bayesian optimization stands out for its high accuracy but may be less suitable for time-sensitive applications due to its longer training time.
The longer training time of LSBoost with Bayesian optimization compared to other methods can be attributed to several factors:
Bayesian Optimization Overhead: Bayesian optimization involves building a probabilistic model of the objective function and iteratively selecting new hyperparameters based on the model's predictions. This iterative process incurs computational overhead, including the evaluation of the objective function and updating the probabilistic model.
Complexity of LSBoost: LSBoost is a sophisticated ensemble learning method that sequentially combines weak learners (typically decision trees) to minimize a least squares objective function. The optimization process in LSBoost is inherently more complex compared to simpler methods like Bagged Trees or Boosted Trees, requiring more computational resources and time for training.
Large Search Space: Bayesian optimization explores a broader search space for hyperparameters compared to other methods. While this thorough exploration can lead to better-performing models, it also requires more computational effort to evaluate a larger set of hyperparameters and select the optimal combination.
Number of Learners: LSBoost with Bayesian optimization in this scenario employs a high number of learners (270). Increasing the number of learners typically lengthens training due to the sequential nature of the ensemble method: each additional learner must be fitted to the residuals of the previous models, adding to the computational burden (see the sketch after this list).
Minimum Leaf Size: The choice of hyperparameters, such as the minimum leaf size (94 in this case), can also influence training time. A smaller minimum leaf size permits deeper, more complex trees that require more computational effort to build; the comparatively large value selected here constrains tree growth, so in this scenario the training time is dominated by the optimization overhead and the number of learners rather than by individual tree complexity.
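The sequential residual fitting that drives this cost can be made concrete with a minimal least-squares boosting loop; this is an illustrative from-scratch sketch of the principle, not the LSBoost implementation used in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_boost(X, y, n_learners=270, learning_rate=0.34858, min_leaf=94):
    """Minimal least-squares boosting: each tree fits the current residuals."""
    pred = np.full(len(y), y.mean())       # initial constant prediction
    trees = []
    for _ in range(n_learners):            # strictly sequential, no parallelism
        residuals = y - pred               # negative gradient of squared loss
        tree = DecisionTreeRegressor(min_samples_leaf=min_leaf)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # shrunken additive update
        trees.append(tree)
    return trees, pred
```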
While LSBoost with Bayesian optimization may have longer training times, it offers superior predictive performance across various metrics, as evidenced by lower RMSE, MSE, and MAE values and higher R-squared values. Therefore, the trade-off between training time and accuracy must be carefully considered based on the specific requirements of the application. In scenarios where prediction accuracy is paramount and computational resources are available, the longer training time of LSBoost with Bayesian optimization may be justified. However, for time-sensitive applications or when computational resources are limited, alternative methods with shorter training times, such as Bagged Trees or Boosted Trees, may be more suitable despite potentially sacrificing some predictive performance.