Predicting Medical Waste Generation and Associated Factors Using Machine Learning in the Kingdom of Bahrain

Effective planning and management of medical waste require a focus on both the public and private healthcare sectors. This study uses machine learning techniques to estimate medical waste generation and identify associated factors in a representative private hospital and a governmental hospital in Bahrain. Monthly data spanning 2018 to 2022 for the private hospital and 2019 to February 2023 for the governmental hospital were used. The ensemble voting regressor was found to be the best model for both datasets. The model for the governmental hospital is robust and explains 90.4% of the total variance. Similarly, for the private hospital, the model variables explain 91.7% of the total variance. For the governmental hospital, the significant features in predicting medical waste generation were the number of inpatients, population, surgeries, and outpatients, in descending order of importance. For the private hospital, the order of feature importance was the number of inpatients, deliveries, personal income, surgeries, and outpatients. These findings provide insights into the factors influencing medical waste generation in the studied hospitals and highlight the effectiveness of the ensemble voting regressor model in predicting medical waste quantities.


Introduction
Medical waste (MW) poses a critical public health concern in all societies because it contains pathogens, bacteria, chemical anticancer agents, and hazardous and radioactive wastes, all of which can be classified as hazardous wastes and can give rise to an array of diseases (WHO, 2014). Moreover, sharps can cause infections and serious harm to people who come into contact with them (Lindahl & Grace, 2015). The absence of a proper management system for such medical materials could result in health hazards and environmental pollution (Neves, Maia, de Castro e Silva, Vimieiro, & Gomes Mol, 2022; Sabour, Mohamedifard, & Kamalan, 2007). Hence, it is imperative to have a well-established management system in place. The complexity and heterogeneity of hospital waste production make it very difficult to establish such a system. Setting one up requires accurate estimates of short- or long-term waste generation. Storage, collection, and transfer systems need to be designed and managed based on short-term estimates of MW generation (Matsuto & Tanaka, 1993; Sengupta & Agrahari, 2017), while long-term estimates are crucial for selecting landfill sites and waste treatment technologies and for understanding the impact of new policies and initiatives (Sengupta & Agrahari, 2017). Direct sampling can be used to measure MW generation rates; however, not all hospitals have enough resources to build a complete database (M Abbasi, Abduli, Omidvar, & Baghvand, 2013; Sengupta & Agrahari, 2017). Several methods have been used to predict MW generation rates, including sample surveys, data mining, and models based on effective factors. These models include statistical or conventional methods that rely mostly on deterministic methods or trend analysis and disregard the dynamic properties of MW generation (Sengupta & Agrahari, 2017). Consequently, advanced methods that perform acceptably in predicting the behavior of dynamic systems are required to model such complex systems and to build a non-linear relationship between inputs and outputs (Golbaz, Nabizadeh, & Sajadi, 2019).
Conventional forecasting techniques have been reported to be poorly suited to accurate prediction of MW generation when data are scarce or datasets are short (Karpušenkaitė, Ruzgas, & Denafas, 2016). Moreover, Jahandideh et al. (2009) found that traditional methods of forecasting MW performed poorly, whereas the machine learning (ML) method was more accurate.
With an increased number of predictors, traditional regression methods are not sufficient to accurately predict MW generation. In contrast, advanced techniques, which use more complex and sophisticated algorithms, demonstrate superior and robust results in predicting the amount of MW (Altin et al., 2023). Therefore, more advanced methods such as ML algorithms were employed in this study.
Data scientists have highlighted that a universally optimal algorithm for solving all problems does not exist.
The algorithm selected is contingent on the nature of the problem, the number of features involved, and the most appropriate model type (Mahesh, 2020).
ML can be applied to small datasets, and the research community addressing small-dataset problems through ML has grown significantly (Kokol, Kokol, & Zagoranski, 2022). Gheyas & Smith (2010) applied an ML algorithm and demonstrated its practical value by achieving successful predictions even with incomplete or small datasets. For instance, Karpušenkaitė et al. (2016) used ML to analyze short and extra-short datasets containing 20, 10, and 6 observations. The type of hospital has been identified as a significant factor in the generation of MW (Golbaz et al., 2019; Thakur & Ramesh, 2018). It is essential to focus on both the public and private healthcare sectors to plan and manage such waste effectively (Devi, Ravindra, Kaur, & Kumar, 2019). Therefore, this study uses advanced ML methods to determine the significant factors of MW generation in two datasets, one for a government hospital and another for a private hospital, and to find the best model for predicting the MW generation rate.

Literature review
In Bahrain, AI has been applied to several types of waste, including predicting domestic, commercial, and construction and demolition (C&D) waste (Coskuner, Jassim, Zontul, & Karateke, 2021) and municipal solid waste (Jassim, Coskuner, & Zontul, 2022), and forecasting daily domestic waste (Jassim, Coskuner, Sultana, & Hossain, 2023). However, to the authors' knowledge, AI has not been applied to estimating or forecasting MW and its associated factors in Bahrain or the GCC. Based on a comprehensive literature review, few studies have applied AI or other non-conventional, advanced methods to MW; these studies are summarized in the Supplementary material (Table S1).
Jahandideh et al. (2009) predicted the generation of different types of MW (sharp, infectious, general, and total) in selected hospitals in Fars province, Iran. The study applied multiple linear regression (MLR) and an advanced method, artificial neural networks (ANNs), using a 5-fold cross-validation procedure to verify the performance of the models. Based on root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²), the model developed using MLR performed poorly. However, the MLR identified bed occupancy and hospital capacity as influential factors. On the other hand, the performance of the ANNs was high (R² = 0.99), confirming a good fit to the data. The study attributed the success of the ANN approach to the non-linear nature of ANNs, which makes it possible to relate independent variables to dependent variables in a non-linear manner.
Arabgol & Ko (2013) predicted the healthcare waste quantities of 50 hospitals located in Iran. The study applied three models based on MLR, ANN, and a combination of ANN and genetic algorithm (ANN-GA). GA was applied to find the optimal initial weights in the ANN to improve its prediction performance. The performances of the three models were evaluated by mean square error (MSE), and the obtained results were MLR (28.8659), ANN (1.930), and ANN-GA (2.9563). The results showed that GA significantly impacts optimizing initial weights and improving the performance of ANN.
Karpušenkaitė et al. (2016) aimed to develop models to forecast MW generation in Lithuania using short and extra-short datasets: the first was based on 2012-2013 regional annual data, the second on 2004-2013 annual national data (long annual data), and the third on 2008-2013 annual national data (short annual data), with 20, 10, and 6 observations, respectively. The study used MLR, ANN, partial least squares (PLS), support vector machines (SVM), and four non-parametric regression methods. The best and most promising results were exhibited by the generalized additive model (R² = 0.90455), the smoothing splines model (R² = 0.98584), and the multilayer feedforward ANN (R² = 0.61103) in the regional, long annual, and short annual data cases, respectively.
Adamović and others (2018) predicted annual hazardous healthcare waste quantities using a general regression neural network (GRNN), a form of ANN model. Biannual data were collected from 28 European Union (EU) countries; the model was trained using data from the period 2004-2012 and tested using the data from 2014. Sixteen economic, social, industrial, and sustainability indicators were used as input variables. The optimal model was selected using sensitivity analysis and correlation analysis. The results revealed that the predictive capability of the model is high, since prediction errors were below 25%.
Thakur & Ramesh (2018) studied the production rates of the MW compositions (yellow, red, and blue waste) and the associated factors for 75 hospitals in Uttarakhand, India, using cross-sectional data for May 2015 and applying MLR and ANN methods. Seasonal variation in the generated MW was also investigated on longitudinal data covering 2 years (2013 and 2014) using polynomial regression analysis. Based on the performance criteria used (RMSE, MAE, and R²), the ANN model was much better than MLR. The MLR method found that bed occupancy and the type of hospital are significant factors in MW generation. The polynomial regression analysis found that the amount of MW is affected by the seasons throughout the year.
Golbaz, Nabizadeh, & Sajadi (2019) carried out a study to estimate MW and associated factors. The study used MLR in addition to several neuron- and kernel-based ML methods. Both families of ML methods performed satisfactorily, but kernel-based models (average MSE = 0.003-0.008 and average R² = 0.82-0.86) outperformed neuron-based models (average MSE = 0.009-0.023 and average R² = 0.68-0.74). The MLR model found that the number of staff and the ownership of the hospital were the most significant variables for predicting the rate of MW generation.
Ceylan, Bulkan, and Elevli (2020) predicted MW generation in Istanbul, Turkey, using ML techniques. The authors collected data from three hospitals and used different models: Autoregressive Integrated Moving Average (ARIMA), Grey Model (GM), Support Vector Regression (SVR), and simple linear regression. The results showed that the ARIMA model had the highest accuracy (R² = 0.9888, MAD = 588.4712, RMSE = 763.6852, and MAPE = 11.7595), followed by the SVR and GM (1,1) models.
Erdebilli & Devrim-İçtenbaş (2022) developed an ensemble voting regression (VR) model from ML algorithms, namely random forests (RFs), adaptive boosting (AdaBoost), and gradient boosting machines (GBMs), to predict MW for Istanbul, Turkey. Among the three base models, the best performance was achieved by RF. These three models were then used as a base to build the suggested VR model, with weighted averages chosen according to the base models' performances.
Compared with the three baseline models, the recommended model performed the best and had the lowest RMSE (843.70).
Rashid and co-authors (2022) forecasted the MW production in Hospital Taiping in Perak for two years, from 2021 to 2022. The study implemented the non-linear autoregressive (NAR) neural network method using monthly data of MW generated for 2018-2020, comprising 36 observations. The total MW was predicted for 2020 and 2021 as 397,510.564 kg and 326,608.8845 kg, respectively. The R² value for testing 15 experiments ranged from 0.686739 to 1.
Altin and others (2023) predicted the amount of MW for a private hospital in Antalya, Turkey, using monthly data (57 months) from January 2018 to September 2022. A Deep Learning method and a kernel-based SVM were employed. Although both methods were successful in terms of the evaluation criteria, the Deep Learning method (R² = 0.466, RMSE = 0.094, and MAE = 0.079) achieved greater success than the kernel-based SVM (R² = 0.221, RMSE = 0.264, and MAE = 0.202).

Datasets and selected variables.
This work investigates the factors at the level of Bahrain, focusing on both the public and private healthcare sectors, which is essential for planning and managing MW effectively. The most relevant factors of MW generation were investigated in two datasets: one for Salmaniya Medical Complex (SMC), representing the public healthcare sector, and one for a private hospital (PH), representing the private healthcare sector in Bahrain. The datasets were built as follows.
SMC was selected because it is the largest public healthcare facility in the Kingdom of Bahrain and provides all kinds of healthcare for Bahrain's citizens and residents. It also serves as a teaching and research center for healthcare professionals (MOH, 2023). SMC has been providing medical care to the people of Bahrain since 1959 and is the most visited medical facility in the country. With a full range of services and specialized departments, SMC was relaunched in 1978 and continues to meet the healthcare needs of citizens and residents (Portal, 2022). It was therefore chosen as a representative government healthcare facility.
The data used in this study excluded COVID-19 waste so that the analysis reflects the factors operating under normal conditions, without the impact of abnormal circumstances. Data on the generated MW quantities (in kg), the number of inpatients, outpatients, and surgeries, and hospital capacity (beds) were obtained from the hospital administration. The data were collected on a monthly basis from January 2019 to February 2023.
The hazardous MW collection system and health data recording at SMC were created only recently. Therefore, the study's access to large and relevant datasets is somewhat limited.
The private hospital (PH) was selected because it is one of the largest private hospitals in Bahrain. It provides a reasonably wide spectrum of medical and surgical services, including neurology, cardiology, orthopedics, and general surgery, among others. In addition, its patient population includes both local and expatriate patients.
Data on the amount of MW (in kg), the number of inpatients, outpatients, surgeries, and deliveries, hospital capacity (beds), patients' admission days, and bed occupancy were obtained from the concerned office of the hospital on a monthly basis from January 2018 to November 2022. Here, the amount of MW represents all classifications of MW in the hospital except the cytotoxic and chemical types.
In addition, web-based data on the population and the population aged 65 and above (Age65andAbove) were collected from the World Bank (Bank, 2023), and the amount of monthly personal loans (in Bahraini dinars, BD, where currently 1 BD = 2.65 $) was obtained from the Central Bank of Bahrain (CBB, 2023).

Machine Learning Algorithms
ML can be seen as a subset of AI in which computers learn from information by blending statistical analysis methods and computer science to create algorithms that are "statistically proficient" (Gutierrez, 2020). These algorithms can be classified into two main types: supervised and unsupervised. Within supervised learning, two distinct categories exist: classification and regression algorithms. These categories are defined by the approach they employ to establish relationships between dependent and independent variables.
Ensemble techniques involve ML algorithms that leverage several base models to enhance performance on classification and regression problems as well as feature selection. These techniques combine predictions from individual models to generate more accurate and robust results (Y. Li & Chen, 2020). Based on the learner generation process, ensemble techniques are classified into two types: sequential, represented by boosting (Y. Li & Chen, 2020), and parallel, represented by bagging (Breiman, 1996). In the bagging approach, several training sets of sample size "n" are created from the data using bootstrap sampling, a technique that ensures the independence of these distinct training sets. The final model is then constructed by aggregating the predictions from all individual models (Y. Li & Chen, 2020). The base learners in parallel ensemble models are established independently of each other, as they are developed simultaneously. In the sequential ensemble type, multiple learners are constructed one after another, which allows subsequent learners to potentially overcome the limitations or errors of their predecessors and leads to improved overall model performance.
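The difference between the parallel (bagging) and sequential (boosting) schemes can be illustrated with a minimal sketch. It is not the implementation used in this study: the one-split regression stump, the function names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data (hypothetical base learner)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((y - l_mean) ** 2 for y in left)
        sse += sum((y - r_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all x values identical: predict the mean everywhere
        constant = mean(ys)
        return lambda x: constant
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Parallel ensemble: each stump sees its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)  # aggregate by averaging


def boosting_fit(xs, ys, n_rounds=25, learning_rate=0.5):
    """Sequential ensemble: each stump fits the residuals left by earlier ones."""
    base = mean(ys)
    residuals = [y - base for y in ys]
    models = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(m(x) for m in models)


# Toy data (illustrative only): a step from 2 to 8.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 2, 2, 2, 8, 8, 8, 8]
bagged = bagging_fit(xs, ys)
boosted = boosting_fit(xs, ys)
```

In the bagging function the stumps could be trained in any order, or at the same time, because none depends on another; in the boosting function each round needs the residuals produced by the previous one.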

Baseline algorithms
Random Forest (RF): The RF is an ML technique that combines multiple decision trees to address regression and classification problems.
The decision trees are built on bootstrap samples, and their collective result is obtained by majority vote for classification tasks and by averaging for regression tasks (Ahmad et al., 2021; Byeon, 2021). The RF is well-suited for handling missing, imbalanced, and multicollinear data, making it a robust approach in such scenarios (Byeon, 2021; Y. Li & Chen, 2020). The analysis using RF consists of two stages. Stage 1: A forest is built by creating multiple decision trees. Samples are randomly drawn with replacement from the initial training dataset to create sub-datasets, and regression trees are then constructed on these smaller sets of data.
The number of variables and trees can be adjusted during the training stage. Stage 2: Once the RF model is trained, predictions can be made. The input variables are passed to each regression tree, and the average of the predictions from all the trees is calculated to give the final result (Ahmad et al., 2021).

Light Gradient Boosting Machine (LightGBM):
LightGBM is a powerful gradient-boosting framework based on gradient-boosting decision trees (GBDT) (Microsoft, 2023b). It applies gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). LightGBM uses a leaf-wise tree-splitting method, which sets it apart from the conventional GBM method and allows it to construct intricate models that deliver superior accuracy. By employing the GBDT algorithm and the leaf-wise method, LightGBM offers several benefits, including reduced memory consumption and faster training speed (Park, Moon, Jung, & Hwang, 2020). The algorithm offers a range of hyperparameters that can be fine-tuned to optimize its performance. Key hyperparameters such as the number of iterations, learning rate, and number of leaves significantly influence the prediction accuracy. To avoid overfitting, the colsample_bytree and subsample hyperparameters can be adjusted (Ke et al., 2017; Vinayak & Gilad-Bachrach, 2015). LightGBM is specifically designed to deliver fast and efficient performance while maintaining high predictive accuracy, and it is widely used for both classification and regression tasks in various domains (Ke et al., 2017).

Elastic Net (EN):
The EN is a regularization-based feature selection technique commonly used in ML models (Zou & Hastie, 2005). It combines the strengths of two other regularization methods: L1, the least absolute shrinkage and selection operator (LASSO), and L2, Ridge regression. EN can handle multicollinearity, which occurs when some predictors are strongly correlated (Wang, Liang, Liu, Song, & Zhang, 2022). LASSO regularization can be inconsistent and unstable in the presence of multicollinearity, since it may arbitrarily select one predictor over another. Ridge regularization handles multicollinearity better, but it may retain many irrelevant predictors. EN addresses these issues by selecting a subset of predictors that are correlated but not redundant. Moreover, EN can reduce overfitting, which occurs when the model fits the training data too closely but performs poorly on new data. The LASSO and Ridge methods also address overfitting through regularization, but EN can do so more effectively by leveraging the advantages of both methods. Therefore, EN optimizes the bias-variance trade-off by balancing underfitting and overfitting (Medium, 2019). Another advantage of EN is its ability to perform feature selection, which involves identifying the most important predictors for the predicted variable.
While LASSO can also perform feature selection by setting some coefficients to zero, it may miss some relevant predictors when faced with many predictors. In contrast, Ridge regression cannot perform feature selection, as it retains all predictors and only shrinks their coefficients. EN addresses this issue by setting certain coefficients to zero while retaining the other significant predictors (Zou & Hastie, 2005).
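How the L1 and L2 terms are blended can be made concrete with a short sketch of the EN penalty. It assumes the parameterization used by scikit-learn's `ElasticNet` (a regularization strength `alpha` and a mixing parameter `l1_ratio`); whether the platform used in this study applies exactly the same form is an assumption, and the coefficient values are illustrative.

```python
def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Penalty added to the least-squares loss (assumed scikit-learn-style form).

    penalty = alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)
    """
    l1_norm = sum(abs(w) for w in coefs)       # LASSO part, drives sparsity
    l2_norm_sq = sum(w * w for w in coefs)     # Ridge part, shrinks smoothly
    return alpha * (l1_ratio * l1_norm + 0.5 * (1.0 - l1_ratio) * l2_norm_sq)


# Illustrative coefficients only.
weights = [1.0, -2.0, 0.0]
pure_lasso = elastic_net_penalty(weights, alpha=0.001, l1_ratio=1.0)
pure_ridge = elastic_net_penalty(weights, alpha=0.001, l1_ratio=0.0)
blended = elastic_net_penalty(weights, alpha=0.001, l1_ratio=0.5)
```

With `l1_ratio` set to 1 the Ridge term vanishes and the penalty is that of LASSO; with 0 it is pure Ridge; values in between trade sparsity against smooth shrinkage.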

Extreme Gradient Boosting (XGBoost):
XGBoost is a popular gradient-boosting algorithm. Although it is best known for its effectiveness in classification problems, it can also be applied to regression problems. It employs scalable tree boosting and is specifically designed to handle sparsity in data (Chen & Guestrin, 2016). It is built upon the principles of gradient boosting, integrating concepts from the optimization and ML domains (Lai, Yang, Kristiani, Liu, & Chan, 2020; S. Li & Zhang, 2020). The predictive model is trained in an additive manner: for the prediction of the i-th data instance at the t-th iteration, $f_t$ is added to minimize the objective function (Eq. 1):

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \qquad \text{(Eq. 1)}$$

where $l$ is a differentiable convex loss function that measures the difference between the target $y_i$ and the prediction $\hat{y}_i$, $\Omega$ penalizes the complexity of the XGBoost model, and $x_i$ is the input (Chakraborty, Mondal, Barua, & Bhattacharjee, 2023; Chen & Guestrin, 2016).
Extreme Random Trees (ERT): The ERT regression was proposed by Geurts, Ernst, & Wehenkel (2006); it is an ensemble learning method that combines the concepts of decision trees and random forests. It is a variation of the random forest algorithm and is particularly effective for handling high-dimensional data with noisy or incomplete features. The algorithm constructs an ensemble of unpruned decision trees in a top-down fashion. It selects cut-points for node splitting at random and, rather than using bootstrap replicas, uses the entire learning sample to grow the trees (Heddam, 2021). Moreover, the algorithm substantially reduces the prediction model's variance, which helps prevent overfitting and improves generalization performance, at the cost of a slight increase in the model's bias. It achieves these benefits with low computational costs compared to other ensemble methods (Basith, Manavalan, Shin, & Lee, 2018). Mathematically, the algorithm builds a set of T decision trees, and each tree t ∈ {1, …, T} uses the entire training dataset independently during the training process; these trees can be built as either decision or regression trees (Nattee, Khamsemanan, Lawtrakul, Toochinda, & Hannongbua, 2017).

Voting Regressor (VR)
A voting ensemble is an ML ensemble approach that uses multiple models instead of a single model to enhance the overall performance of a system. This methodology can be used in classification and regression problems by aggregating the predictions of multiple models. For regression problems, ensembles known as VRs are used, where the final estimate is obtained by averaging the estimates of all the individual models (Kilimci, 2022). Weighted voting (WV) and average voting (AV) are the two methods for awarding votes in VRs. For AV, the weights of all models in the ensemble are equal to 1.
AV therefore assumes that all models are equally effective, which is often not the case, especially when different ML algorithms are employed; this is considered its limitation. WV instead assigns a weight coefficient to each model in the ensemble. These weights can be either floating-point numbers between 0 and 1 whose sum equals 1, or integers starting at 1 that indicate the number of votes assigned to each ensemble model (Shahhosseini, Hu, & Pham, 2022).
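The two voting schemes reduce to a plain or weighted average of the base models' predictions. The sketch below is illustrative: the function name and the example predictions and weights are hypothetical, and dividing by the sum of the weights is one way to let both fractional and integer weights be used.

```python
def voting_predict(predictions, weights=None):
    """Combine base-model predictions for one sample.

    predictions: one predicted value per base model.
    weights: None for average voting (all weights equal to 1), otherwise
             one non-negative weight per model (fractions or integer votes).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight per base model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by the total makes integer votes and fractions equivalent.
    return sum(w * p for w, p in zip(weights, predictions)) / total


# Hypothetical predictions from three base models for one month.
average_vote = voting_predict([10.0, 20.0, 30.0])
fraction_vote = voting_predict([10.0, 20.0], weights=[0.75, 0.25])
integer_vote = voting_predict([10.0, 20.0], weights=[3, 1])
```

The last two calls give the same result, since integer votes of 3 and 1 correspond to fractional weights of 0.75 and 0.25.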

Process of Proposed Model
Two ensemble voting regression models were developed in this work that used 3 RF and one LightGBM for

Phase 1: Preparing the Data
Official data were obtained from the two selected hospitals and the public domain; the details of the selected features and data sources are explained in Section 3.1, and two distinct datasets were created. The input variables (features) were carefully selected to encompass the potential factors that could affect the quantity of MW for each type of hospital. They were chosen based on an overview of effective parameters in MW production from the literature review (Table S1) and on interviews with hospital administrators and academics. Data preprocessing, an important stage in ML modelling, was then carried out. Real-world data often exhibit inaccuracy, inconsistency, and incompleteness, which could cause erroneous results.
Data preprocessing plays a vital role in addressing these issues and ensuring the reliability of the findings (Ramírez-Gallego, Krawczyk, García, Woźniak, & Herrera, 2017). It involves important steps prior to modelling, such as cleansing, instance selection, normalization, transformation, feature extraction, and feature selection (Nguyen et al., 2021).
The two datasets had few missing data points. Two methods were used to handle these missing values, omitting rows and estimating values using a forecasting approach, to ensure accurate decision-making (Kwak & Kim, 2017). The population data were converted from yearly to monthly by spreading the annual data over the 12 months and then smoothing the result with a 12-month moving average. A moving average is a technique that calculates the average of a set of data points over a sliding time window. Smoothing the data in this way reduces short-term fluctuations or noise, making it easier to identify long-term trends or patterns.
In this study, the variables were initially normalized to a range between 0.0 and 1.0 so that they could be processed in a standardized manner. The normalized indicators were then assigned as the independent and dependent variables in the analysis (Altin et al., 2023). Normalizing in this way helps in comparing the performances of the models (Carbonell, Michalski, & Mitchell, 1983). The datasets were split into training and testing sets, with 75% of the samples allocated for training and 25% for testing.
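One possible reading of the yearly-to-monthly conversion is sketched below. The source does not say whether the moving-average window was trailing or centered, nor how the first months were treated, so the trailing window and the shortened window at the start are assumptions; the population figures are made up for illustration.

```python
def annual_to_monthly(annual_values, window=12):
    """Spread annual values over 12 months, then smooth with a moving average.

    Assumptions: each annual value is repeated for its 12 months, the window
    is trailing, and early months average over the points available so far.
    """
    monthly = [value for value in annual_values for _ in range(12)]
    smoothed = []
    for i in range(len(monthly)):
        start = max(0, i - window + 1)
        chunk = monthly[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical annual population figures (in thousands) for two years.
monthly_population = annual_to_monthly([1200, 1320])
```

After smoothing, the jump between the two annual values is replaced by a gradual rise over the second year, which avoids an artificial step in a feature that in reality changes slowly.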
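The normalization and the 75/25 split can be sketched as follows. Min-max scaling is assumed because the text states a 0.0 to 1.0 range; the random (rather than chronological) split, the seed, and the helper names are assumptions, since the source does not state how the samples were allocated.

```python
import random


def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle the rows and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Illustrative values only.
scaled = min_max_scale([10, 20, 30])
train_rows, test_rows = split_rows(list(range(60)))
```

If the temporal order of the monthly records matters for the evaluation, a chronological split (first 75% of months for training) would be the alternative design.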

Phase 2: Building the Model
In the training stage, the hyperparameters of the algorithms used for the two models were left at their default values (Microsoft, 2023a). The corresponding algorithms were then combined to build the two ensemble VRs using the default weighted averages. Thereafter, the two ensemble VR models were fitted to their respective datasets. Lastly, the performances of the two proposed models were assessed using four metrics, R², MAE, mean squared error (MSE), and RMSE, as detailed below.

Performance criteria
The models' performances for each method and the relevance of the features resulting from the test data analysis were evaluated using different measures, and the outcomes of the predictions made by the algorithms were computed. To measure the ML models' performances, four metrics were used: coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE), as shown in Eq. 2 to Eq. 5 (Erdebilli & Devrim-İçtenbaş, 2022; Golbaz et al., 2019; Karpušenkaitė et al., 2016; Rashid et al., 2022), where $y_i$ represents the actual value for the i-th observation, $\hat{y}_i$ represents the predicted value, $n$ represents the number of observations, and $\bar{y}$ represents the average of predicted values. Models with higher R² values and lower RMSE, MAE, and MSE were more successful than others (Nguyen et al., 2021).
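The four metrics can be computed as in the sketch below, which follows their common textbook definitions. It is an assumption that Eq. 2 to Eq. 5 take exactly these forms; in particular, the sketch uses the mean of the actual values in the R² denominator, which is the usual convention. The sample values are illustrative.

```python
import math


def regression_metrics(actual, predicted):
    """Return R2, RMSE, MAE, and MSE for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)                 # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - sse / sst
    return {"R2": r2, "RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse}


# Illustrative values only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Because RMSE is the square root of MSE, the two always rank models identically; MAE is less sensitive to a few large errors, which is why reporting it alongside RMSE is informative.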

Experimental data and testing environment
Microsoft Azure Machine Learning Studio was the platform chosen to develop the tool and perform the experiments (https://studio.azureml.net). It is a cloud-based machine learning service that provides the ability to publish experimental results on the web.

Results and Discussion
Investigation of MW generation should encompass both private and public sector healthcare facilities. Private healthcare facilities represent a large proportion of the total number of healthcare facilities in Bahrain (Mohamed, Ebrahim, & Al-Thukair, 2009; NHRA, 2020). At the same time, public sector facilities hold a significant share of the healthcare landscape. The key aspects of healthcare facilities in Bahrain from 2010 to 2019 are shown in Figure 2. It reveals that government facilities have 82% of the total beds available and 54% of the doctors providing healthcare services. In addition, most patients in the country are treated in governmental facilities, which account for 68% of admissions and 80% of outpatient visits. Investigating both types of healthcare facilities in this study provides a comprehensive approach, supports a holistic understanding of MW generation, and paves the way for sustainable and efficient waste management in Bahrain's healthcare sector.

Hyperparameter Optimization
After the performance metrics were computed on the training and test data using default parameter values, the optimal combinations of hyperparameters were obtained. The tuned hyperparameters for the single ML algorithms are shown in Tables 1 and 2.
Table 1 shows the parameters corresponding to three RF models and one LightGBM model within an ensemble framework for SMC.The ensemble weights determine the contribution of each algorithm to the final prediction model, with the first RF weighing 0.533, the second 0.067, and the third 0.333.The "Max Features" parameter represents the maximum number of features considered when splitting nodes in the model; it controls the number of features considered at each split in the RF models, with the first and second RF models both using Finally, the "No. of Estimators" parameter specifies the number of estimators (or trees) used in each model, which varies between 10 and 600.
The parameters for the PH model are summarized in Table 2, which indicates that the first and second RF, the first ERT, and XGB share the lowest ensemble weight of 0.067, while the second ERT and the third RF share the highest ensemble weight of 0.267. The maximum number of features and the minimum number of samples required to create or split a leaf node vary across the RF and ERT models. The EN model employs "Alpha", which controls the regularization strength and equals 0.001, and "L1 Ratio", which determines the balance between L1 (LASSO) and L2 (Ridge) regularization and equals 1. The tolerance value in this context likely refers to the convergence tolerance, which sets the threshold at which the training process stops iterating based on the convergence of the model's optimization algorithm. The EN model would therefore continue iterating until the optimization process reaches a tolerance level of 0.0001, indicating sufficient convergence. The maximum number of iterations was 1000. The XGB model uses parameters such as "Base Score", "Learning Rate", and "Max Depth" to fine-tune the boosting process; they equal 0.500, 0.300, and 6, respectively. The number of estimators (or trees) used in all models ranges from 25 to 100.
These hyperparameters and ensemble weights collectively influence the performance of the ensemble model. By combining the predictions from each model, the ensemble aims to achieve enhanced accuracy and predictive power. Understanding and tuning these parameters is critical to optimizing the model's performance.

Various relevant features were tested as model inputs to identify the optimal models for the datasets used in this study, exploring which combination of the selected features would yield the best prediction accuracy. For the SMC dataset, the initial experiment involved five features: inpatients, outpatients, surgeries, population, and hospital capacity. This resulted in an R² value of 0.79, indicating a moderate level of predictive accuracy. Incorporating an additional feature, 'Age65andAbove', raised the R² value substantially, to 0.904. For the PH dataset, seven features were initially considered: inpatients, outpatients, surgeries, hospital capacity, deliveries, patient admission days, and bed occupancy. The resulting R² value was 0.742, indicating a relatively lower level of predictive accuracy. An additional feature, 'personal loan', was added to account for its potential impact on private hospitals, which improved the R² value to 0.917. Subsequently, the population feature was introduced, further improving the R² value to 0.943. These findings demonstrate the iterative process of feature selection and incorporation used to optimize the models for each dataset.
The performance measures of the final proposed models for the governmental and private hospitals are shown in Table 3. For SMC, the RMSE, MAE, MSE, and R² are 0.099257, 0.032, 0.003, and 0.904; for PH, they are 0.13528, 0.048, 0.004, and 0.917, respectively. Notably, several metrics were used to evaluate the models' performance, unlike other studies that rely on a single metric such as MSE (Arabgol & Ko, 2013).
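For reference, the four measures can be computed from observed and predicted values as in the sketch below. It uses the standard textbook definitions; the function name `regression_metrics` and the sample values are hypothetical and are not data from this study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Made-up observed and predicted values on a normalized scale.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Reporting several measures together is useful because MAE and RMSE are in the units of the target, whereas R² expresses the share of variance explained.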
Based on these performance measures, the voting ensemble was the best model for both SMC and PH, as it had the highest R² score and the lowest RMSE, MAE, and MSE scores. This means that a voting ensemble comprising three RF models and one LightGBM model can be used to predict future amounts of MW for the government hospital, and a voting ensemble comprising three RF, one EN, one XGB, and two ERT models for the private hospital, even with a small dataset.
The model proposed in this study outperforms those of several studies that used the MLR method (Bdour, Altrabsheh, Hadadin, & Al-Shareif, 2007; Ceylan et al., 2020; Golbaz et al., 2019; Jahandideh et al., 2009; Karpušenkaitė et al., 2016; Thakur & Ramesh, 2018). Its performance is close to that of the ML algorithms reported to give superior results for predicting MW amounts, for instance, the ANN models developed by Jahandideh and others (2009) and Thakur & Ramesh (2018), the generalized additive and smoothing splines models developed by Karpušenkaitė and others (2016), the GRNN, a form of ANN, proposed by Adamović and others (2018), and the ARIMA model developed by Ceylan and others (2020). In terms of R², it is close to the voting ensemble of Random Forest, Adaptive Boosting, and Gradient Boosting Machine algorithms developed in Turkey (Erdebilli & Devrim-İçtenbaş, 2022).
The proposed models in this study perform better than many other ML algorithms for predicting MW amounts, for example, the ANN (MSE = 1.930) and the ANN augmented with a genetic algorithm, ANN-GA (MSE = 2.9563), developed by Arabgol & Ko (2013) in Iran. They also outperformed the multilayer feedforward MLF-ANN model (R² = 0.61103) that predicted Lithuania's short annual data case (Karpušenkaitė et al., 2016).
The proposed model also surpasses the Kernel-based models (average R² = 0.82-0.86) and Neuron-based models (average R² = 0.68-0.74) proposed by Golbaz and others (2019). It likewise outperforms the deep learning method (R² = 0.466, RMSE = 0.094, and MAE = 0.079) and the Kernel-based SVM (R² = 0.221, RMSE = 0.264, and MAE = 0.202) developed by Altin and others (2023).

The ML algorithms used in this study yielded insights into the relative importance of the parameters influencing MW quantities. The degree of feature significance for the governmental hospital is shown in Figure 5(a).
It was found that only four features are significant for MW amounts in SMC, in the following order of relevance: number of inpatients, population, surgeries, and outpatients.
The degree of feature significance for the private hospital is shown in Figure 4(b). It was found that five features are significant for MW amounts in PH, in the following order of relevance: number of inpatients, deliveries, personal income, surgeries, and outpatients.
An overview of the results obtained in this study shows that the number of inpatients is the most critical feature for both SMC and PH, while the least significant for both hospitals was found to be the number of outpatients.
The most intriguing finding is that the population is a significant feature in both hospitals, but its influence on MW generation is more pronounced in the governmental hospital than in the private one.
These results partially align with the study by Tesfahun, Kumie, & Beyene (2016), who found that both the number of inpatients and the number of outpatients are significant factors, with a strong correlation between inpatients and MW generation but a weaker correlation with outpatients. Another study found that inpatients and population are significant factors (Wei, Cui, Ye, & Guo, 2021). In contrast, a similar study by Golbaz et al. (2019) included inpatients using the MLR method, but this feature did not emerge as a significant factor. Altin et al. (2023) included inpatients in their Kernel-based SVM and deep learning models but did not report the importance of the features.
Moreover, the results revealed that the population was the second most important feature for the government hospital, while it ranked fifth for the private hospital. The population plays an essential role in government hospitals' MW generation for several reasons. Firstly, government hospitals are often more accessible to a wide range of people, including underserved communities, as they provide services free of charge to citizens.
Also, government hospitals often serve a larger and more diverse patient base, leading to increased healthcare activities and procedures. Additionally, the scope of services offered by government hospitals is generally broader, including specialized treatments and referral cases, which results in more MW generation. Therefore, the finding that the population's influence on MW generation is higher in government hospitals is consistent with their broader role in catering to diverse healthcare needs within the community.
One remarkable finding is that including personal loan data in the private hospital dataset significantly improved the model's performance, with R² rising from 0.742 to 0.917 (before the population variable was added). Although other, more important features exist, personal loan data clearly contributes to predicting MW in private hospitals. This highlights the relevance of personal loan information as a noteworthy factor influencing MW generation in these healthcare facilities.
This study identified the best model for estimating MW generation with good performance and determined the most important features. Ceylan and others (2020) obtained a superior model for estimating the amounts of MW using a time-series method (R² = 0.9888); however, their study did not propose any significant factors. In addition, other studies that used ML methods did not identify the most important factors associated with MW prediction (Arabgol & Ko, 2013; Karpušenkaitė et al., 2016). Korkut (2018) achieved a significant score using only a single input factor, the population. However, knowing the other important features associated with MW amounts is essential for strategic decision-making. The findings can provide valuable insights and guidance for developing effective strategic plans.
MW estimation has been described as a complex and challenging task due to the lack of data and the many different factors that can influence the amount of MW (Erdebilli & Devrim-İçtenbaş, 2022). The proposed model for MW estimation in this study performed well because it was based on real data and used the most relevant input variables.

Conclusion
This study estimated the generation of MW and identified important influencing features in a governmental and a private hospital in Bahrain. The ensemble voting regressor models were the best at capturing the complex relationships between the predictor variables and MW generation. For the governmental hospital, the number of inpatients, population, surgeries, and outpatients emerged as significant features influencing MW generation. For the private hospital, the number of inpatients, deliveries, personal income, surgeries, and outpatients were found to be important features. The model performance evaluation indicated that the best ensemble comprised three RF models and one LightGBM model for the SMC dataset, and three RF, one EN, one XGBoost, and two ERT models for the PH dataset. The proposed model's superiority stemmed from integrating multiple ML algorithms, which together enabled it to accurately predict the quantities of MW in both datasets. The workflow of developing the best ML model is shown below in Figure 1.

Figure 1: Workflow of proposed ML model.

Figure 3: Total medical waste in SMC from January 2019 to February 2023.

Of the RF models in the SMC ensemble, the third uses a max features value of 0.400. The "Min Samples Leaf" and "Min Samples Split" parameters determine the minimum number of samples required to create a leaf node and to split a node, respectively, with each RF model having different values for these parameters. In the LightGBM model, the ensemble weight is 0.067, indicating its contribution to the overall ensemble prediction. The learning rate parameter controls the step size at each boosting iteration, with a value of 0.1158. The min split gain parameter specifies the minimum gain required to split a leaf node in the LightGBM model, with a value of 0.5789. The subsample parameter determines the fraction of samples used for training each tree, with a value of 0.9000.
The total MW generated in the government hospital, SMC, over 50 months, spanning from January 2019 to February 2023, is graphically represented in Figure 3. Notably, the amounts of MW specifically associated with COVID-19 cases were intentionally excluded from the SMC database. However, despite this exclusion, there was a noticeable overall increase in waste generation after January 2020. In June 2021, when COVID-19 cases escalated in Bahrain (Al-Omran, Khan, Ali, & Bilal, 2021; Al-Omran, Khan, Perna, & Ali, 2023), there was a significant surge in waste generation. This rise in waste generation at the hospital during the stated period could potentially be attributed to the consequences following recovery from COVID-19.

Table 1: Hyperparameters and weights of the ensemble VR model for SMC.

Table 2: Hyperparameters and weights of the ensemble VR model for PH.

Table 3: Performance measures of the proposed model for SMC and PH.