Enhancing Machine Learning Algorithms to Assess Rock Burst Phenomena

One of the main challenges in deep mining is the occurrence of rockburst phenomena. Rockburst prediction with the use of machine learning (ML) is currently gaining attention, as its prognostic capability in many cases outperforms widely used empirical approaches. However, the data available for conducting any analysis are limited and exhibit imbalances in the recorded instances across rockburst intensities. These shortcomings, combined with the multiparametric nature of the phenomenon, can deteriorate the performance of ML algorithms. This study focuses on enhancing the prediction performance of ML algorithms by utilizing the Synthetic Minority Oversampling TEchnique (SMOTE). Five ML algorithms, namely Decision Trees, Naïve Bayes, K-Nearest Neighbor, Random Forest and Logistic Regression, were used in a series of parametric analyses considering different combinations of input parameters, such as the maximum tangential stress, the uniaxial compressive and tensile strength, the stress coefficient, two brittleness coefficients and the elastic energy index. All models kept their hyperparameters fixed and were trained with the initial dataset, to which synthetic instances were added gradually, first to attain a balanced dataset and then to expand it further, until the number of synthetic instances reached the number of real data. The SMOTE technique is assessed and its performance is evaluated through the different strategies adopted. The results indicate that SMOTE has a considerable positive effect on the overall classification accuracy and especially on the within-class classification accuracy, even after the balancing of the dataset.


Introduction
Rockbursts are explosive failures of the rock mass around an underground opening, which occur when very high stress concentrations are induced around the excavation (Hoek 2007). Rockburst has been a serious problem in deep underground excavations and many incidents have been recorded and documented worldwide, some of them with fatal results (Andrieux et al. 2013; Shepherd et al. 1981; Zhang et al. 2012; Hedley 1992; Chen et al. 1997). Brady and Brown (2004) defined rockburst as "a sudden displacement of rock that occurs in the boundary of an excavation, and cause substantial damage to the excavation".
Two conditions are required to cause this phenomenon. Firstly, the stress developed in the rock or the discontinuity exceeds their strength and, secondly, the energy released far exceeds the energy consumed during the failure process. The stress conditions, the geological structure, the mechanical properties of the rock mass, the human factor and their interaction are the elements responsible for triggering both seismic events and rockburst phenomena. The geological structure involves the presence of faults, shear zones, bedding planes, anticlines and synclines, stratification and material heterogeneity, which affect the stress distribution and can lead to high stresses. The mechanical properties of the rock mass involve the uniaxial compressive and tensile strength, the material brittleness, the heterogeneity of the rock mass, the presence of discontinuities, the friction angle and the modulus of elasticity. The overall stiffness of the surrounding system and the deformation characteristics of the bursting material affect the intensity of rockburst. The depth of the tunnel, its support, its shape and orientation, the method of excavation and exploitation, and the production rate comprise the human factor. Cook (1963) and Salamon (1983) related rockburst and mine seismicity and characterized rockburst as part of the general term seismic event that damages mine workings. Ortlepp and Stacey (1994) distinguished rockburst from seismic events and defined rockburst as damage in a tunnel resulting from seismic events. Muller (1991) categorized rockburst types into strain burst, pillar burst and fault slip burst.
Diederichs (2018) mentions that the evolution of a rockburst phenomenon is affected by the concentration of stresses due to cross-sectional geometry, geological parameters and creep phenomena, the reduction of confining pressures on the shaft, the ability of the rock mass to store elastic energy and the presence of a soft or stiff loading system. According to Castro et al. (2012) strainbursts mainly take place under small confining stresses. In such conditions the failure scenarios include the creation or expansion of parallel cracks and the contribution of the spalling effect or the kinematic instability of the parts. In addition, these cracks reduce the stiffness of the loading system, resulting in strainburst phenomena. In contrast, fault slip bursts occur mostly under conditions of high confining stresses.
Material heterogeneity is a significant parameter that may contribute to the evolution of a rockburst event, because it affects the local strain and stress distribution and crack behavior (Hofmann et al. 2020; Wu et al. 2020; Li et al. 2016). Furthermore, due to the extreme and complex stress conditions that lead a deep construction project to marginal stability states, the importance of less common factors that may act in a cumulative manner, such as chemical degradation of the rock mass (Chen et al. 2020a, b) and temperature increase (Chen et al. 2017), cannot be ignored.
The complexity of rockburst and the insufficient understanding of its mechanism hinder its prediction and the subsequent implementation of mitigation actions. Rockburst prediction with the use of machine learning (ML) is an alternative approach adopted by many researchers, which focuses on learning from experience while bypassing the need to know the cause. The major problem of this approach, though, is the lack of a sufficient amount of data, which is the key requirement for accurate predictions. More particularly, there is data scarcity related to certain types or intensity scales of rockburst, making the training and pattern recognition of the ML algorithms a very challenging task. Thus, the strategy of employing a method capable of adding synthetic instances at selected classes can prove a valid alternative to overcome such limitations.
In this paper this strategy is used and evaluated against a number of options and analyses made through several ML algorithms. More precisely, the Synthetic Minority Oversampling Technique (SMOTE) is applied to rockburst prediction and classification using five (5) major ML algorithms: Decision Trees, Naïve Bayes, K-Nearest Neighbor, Random Forest and Logistic Regression. Instead of the commonly used strategy of employing SMOTE at high oversampling rates to generate synthetic instances only in the initial minority class, in this paper we add synthetic instances to all the constantly changing minority classes, while keeping the oversampling at low rates so as to control and progressively evaluate the process. Additionally, a parametric analysis regarding the number of input attributes is performed so as to understand the importance of such features and to come up with possible approaches that seem to maximize the positive effect of SMOTE on the classification accuracy.

Rockburst Prediction Methods
According to Qin et al. (2019) it is currently not possible to predict rockburst phenomena; nevertheless, areas with a high rockburst tendency can be located with the use of techniques like microseismic monitoring (MS) and/or numerical modeling. Wang (2018) states that the accurate prediction of a seismic event is a difficult task due to the complex and multiparametric nature of the phenomenon, and that a fundamental step in the rockburst prediction process is the evaluation of the rockburst tendency. According to Zhang and Fu (2008) rockburst prediction can be distinguished between short term and long term. Short term prediction methods (Liu et al. 2014; Cai et al. 2014a, b, c; Cao et al. 2015, 2016; He et al. 2011; Hosseini et al. 2011; Gong 2010; Cheng et al. 2009; Yu 2009) include borehole stress, back analysis, electromagnetic emission, acoustic emission, the charge method, microseismic monitoring, and active or passive seismic velocity tomography, and are used during the construction stage. On the other hand, long term prediction methods are utilized mainly in the early design stage of a project and involve empirical criteria, numerical modeling, laboratory tests and, currently, the use of machine learning. The use of microseismic monitoring in rockburst prediction has been a common topic for many researchers (Dou 2018; Liu 2011; Cai et al. 2014a, b, c). The use of numerical modeling (Vatcher et al. 2014; Tianwei et al. 2015; Board et al. 2007; Vardar et al. 2019; Khademian 2016; Poeck et al. 2016; Khademian and Ozbay 2019; Manouchehriana and Cai 2018; Mitri et al. 1999; Jiang et al. 2010; Sharan 2007) in rockburst prediction, and its combination with other techniques, is also a topic investigated by many researchers, but its main use focuses on establishing burst-prone areas, and there is still no universally accepted methodology for accurately simulating dynamic phenomena.
Other research studies focus on the simulation of seismic waves generated from fault slips or from the failing rock, and the associated damage caused in an underground excavation and its support (Qinghua et al. 2016; Dehghan Banadaki and Mohanty 2012; Qiu et al. 2019; Gao et al. 2019; Raffaldi et al. 2017; Cho and Kaneko 2004; Hu et al. 2019; He et al. 2016; Wu et al. 2019a, b). According to Kaiser et al. (1996) numerical modeling for rockburst prediction is based mostly on static approaches, due to the complexity of the phenomenon and the difficulty of realistically simulating the dynamic procedures involved during a rockburst.
Regarding long term rockburst prediction and classification, empirical approaches are commonly used for the preliminary design of a deep underground construction project. Currently a geomechanical engineer can choose, according to their judgment and the uniqueness of the situation, among a plethora of rockburst evaluation criteria, some of which also include the prediction of the intensity of the event. Many researchers (Russenes 1974; Hoek and Brown 1980; Turchaninov et al. 1972; Martin et al. 1999; Tajdus et al. 1997) proposed empirical criteria based on the correlation of the stress conditions and the rock strength. Others (Cook et al. 1966; Salamon 1984; Kaiser et al. 1996; Mitri et al. 1993; Brady and Brown 2004; Hedley 1992; Wang and Park 2001; Weng et al. 2017; Kidybinski 1981; Neyman et al. 1972; Ryder 1988) proposed rockburst energy related criteria, of which the energy release rate and excess release rate criteria are the most commonly used, especially in deep underground mines in South Africa. Other criteria, primarily used for pillar bursts, are based on the assessment of the relative stiffness of the host rock and the failing rock mass (Wiles 2002; Gill et al. 1993; Blake and Hedley 2003). Other empirical approaches are based on the rock brittleness (Singh 1987; Peng et al. 1996; Feng et al. 2000), which can be evaluated by laboratory experiments relating the pre- and post-peak characteristics of the tested rock. Zhang et al. (2020) proposed a rockburst criterion based on the GSI classification system. Finally, other researchers (Durrheim et al. 1998; Heal et al. 2006; Qiu et al. 2011) proposed rockburst evaluation criteria based on the combination of the above indexes and other construction factors.
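As a simple illustration of the stress-to-strength family of criteria mentioned above, the sketch below classifies rockburst tendency from the stress coefficient σθ/σc. The threshold values are illustrative placeholders only, not those of any specific published criterion:

```python
def classify_by_stress_coefficient(sigma_theta, sigma_c):
    """Classify rockburst tendency from the stress coefficient
    SCF = sigma_theta / sigma_c.  The cut-offs below are ILLUSTRATIVE
    placeholders; published criteria use author-specific thresholds."""
    scf = sigma_theta / sigma_c
    if scf < 0.3:
        return "None"
    elif scf < 0.5:
        return "Low"
    elif scf < 0.7:
        return "Moderate"
    return "Heavy"

print(classify_by_stress_coefficient(55.0, 120.0))  # SCF ≈ 0.46 → Low
```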

Machine Learning in Rockburst Prediction
Despite the fact that machine learning has been successfully used in a broad range of areas over the last decades, its utilization in the field of rock engineering is relatively new. Morgenroth et al. (2019) state that machine learning can be a valuable tool to be integrated into rock engineering practice, due to the complex nature of geotechnical problems, the difficulty of utilizing all geotechnical data in empirical and numerical models, and the rapid increase of the collected data. McGaughey (2019) stated that the application of artificial intelligence in the field of rock engineering is not a simple task, because the data required to make a prediction are sparsely scattered in space and time. However, correlations can be found within large volumes of data, statistical models can be created through which predictions can be made, the influence of individual factors on the overall behavior of a system can be assessed, and scenarios and assumptions can be formulated. Another utility of machine learning in the field of geoengineering is addressing issues such as the identification of terrain deformations or instability areas with limited resources (Tsangaratos and Ilia 2014).
Shirani Faradonbeh et al. (2020) conducted 139 laboratory tests to collect data on the prediction of rockburst-induced trends, which they introduced into two models based on gene expression programming (GEP) and classification and regression tree (CART) algorithms. They first singled out the most important and independent parameters through clustering techniques (AHC, SSE, multiple regression analysis) and then successfully trained the prediction models. Pu et al. (2019) used the Support Vector Machine algorithm to predict rockbursts and their intensity based on 246 rockburst incidents. The input attributes included the tangential stress, the uniaxial strength, the tensile strength, the stress factor, the brittleness index and the energy index. They initially aimed at the separation of the independent variables, as well as the reduction of the data dimension, by utilizing the t-distributed Stochastic Neighbor Embedding method, and then grouped the remaining data through a clustering method. They then successfully trained a model based on the Support Vector Machine algorithm. Wu et al. (2019a, b) used the Least Squares Support Vector Machine algorithm to create a rockburst forecast model and, by conducting sensitivity analyses, reported that the ratio of tangential stress to uniaxial compressive strength has the greatest influence on the forecast. Li and Jimenez (2018) used the Logistic Regression algorithm on a database consisting of rockburst and non-rockburst incidents. The input attributes included the depth, the maximum tangential stress, the elastic energy index, and the uniaxial compressive and tensile strength of the rock. They reported that the depth, the uniaxial strength and the energy index have the greatest weight. In conclusion, they compared the results of the model with 6 empirical indicators and found that the algorithm performed better. Ghasemi et al. (2020) utilized a Decision Tree algorithm to predict the occurrence and intensity of rockburst based on a dataset composed of 174 cases. Furthermore, they evaluated the importance of the input parameters and found that the energy index, the stress factor and the brittleness coefficient are the most important.
Faradonbeh and Taheri (2018) collected a database of 134 rockburst cases and trained neural network, GEP and Decision Tree algorithms. Afraei et al. (2018) used regression models to predict rockburst and evaluated the importance of the input attributes that contributed to the predictions. They found that the most important parameters are the maximum tangential stress, the stress factor, the elastic energy index and the uniaxial compressive strength of the rock.
On the rockburst prediction topic, Ribeiro Sousa et al. (2017) performed relevant research and attained a classification scheme from a dataset composed of 60 rockburst cases, with the input parameters being the uniaxial compressive strength, the modulus of elasticity, the stress conditions, the excavation geometry and the equivalent cross-section of the opening. The algorithms that were utilized and compared with each other were the K-Nearest Neighbor algorithm, the Decision Tree, the Neural Network, the Support Vector Machine and the Naïve Bayes. Additionally, they performed a sensitivity analysis to find the weight of each parameter in the final predictions. Li et al. (2017c) presented the application of Bayesian networks to rockburst prediction based on 135 rockburst cases and used as input parameters the depth, the maximum tangential stress, the uniaxial compressive and tensile strength of the rock and the elastic energy index. They reported that the Tree Augmented Naive Bayes algorithm had the best accuracy. Zhou et al. (2016a, b) compared the algorithms Linear Discriminant Analysis, Quadratic Discriminant Analysis, Partial Least-Squares Discriminant Analysis, Naïve Bayes, K-Nearest Neighbor, Multilayer Perceptron Neural Network, Classification Tree, Support Vector Machine, Random Forest and Gradient-Boosting Machine on the prediction of rockburst intensity based on 246 incidents. The input parameters, which were examined based on their influence, included the stress factor, the depth, the uniaxial strength, the brittleness index and the elastic energy index. Random Forest showed the best performance, while the variable with the highest weight was found to be the energy index. Dong et al. (2013) compared the algorithms Random Forest, Artificial Neural Networks and Support Vector Machine regarding rockburst prediction and its intensity based on 46 incidents, and showed that the Random Forest algorithm had the best performance. Adoko et al. (2013) used the ANFIS method, which combines neural networks with fuzzy logic, in order to predict the intensity of rockburst, based on a dataset consisting of 174 rockburst cases. Zhou et al. (2012) used the Support Vector Machine algorithm for rockburst prediction based on 132 rockburst incidents. He et al. (2012) compared the algorithms Decision Trees, K-Nearest Neighbor, Support Vector Machine and Neural Networks regarding the classification of rockburst intensity based on reported rockburst cases. The input parameters included the distance of the event from the excavation, the excavation geometry, the type of support, the uniaxial strength, the modulus of elasticity, the cross-sectional area, the excavation depth, the stress factor, the existence of discontinuities and the excavation method. They reported that neural networks showed the best performance, while the decision trees showed the worst performance. introduced the Fisher Discriminant Analysis method for rockburst prediction based on 15 cases. Chen et al. (2003) applied Neural Networks to the prediction of rockburst and its intensity. Zhao (2005) used the Support Vector Machine algorithm for the long-term prognosis of rockburst based on 16 rockburst cases. Ge and Feng (2008) combined Neural Networks with the AdaBoost algorithm in order to categorize and predict rockburst. Gathering data from 36 rockburst cases and using the tangential stress, the stress factor, the brittleness coefficient and the elastic energy index as input parameters, they presented a promising rockburst forecasting system. Su et al. (2008) proposed the K-Nearest Neighbor algorithm, one of the simplest and most effective algorithms in the field of machine learning, for rockburst prediction. Table 1 presents a summary of the ML algorithms, attributes, number of data and the classification accuracy obtained by different researchers regarding rockburst prediction.
The following results have been produced from different datasets, using various evaluation techniques and thus they cannot be directly compared with each other. Nevertheless, one can get a clear idea of the principal attributes used for the assessment and, moreover, the estimated general accuracy level and performance attained in the ML approaches.

Proposed Methodology
The rockburst databases used in various research approaches present two main challenges. The first is the unequal distribution of cases and data gathered per rockburst class, while the second is the lack of a sufficient number of incidents relative to the complexity of the phenomenon. In the following approach, the SMOTE technique is utilized and the generated synthetic instances are used to improve the qualitative as well as the quantitative characteristics of the rockburst database. The qualitative part refers to the balancing of the database, meaning that synthetic instances are gradually added until the number of cases becomes equal for all classes. The quantitative part of the improvement refers to the further expansion of the database with synthetic instances, which are distributed uniformly across all classes after the balancing of the dataset.
A wide range of algorithms and attribute combinations are used, so as to showcase the results obtained and evaluate the performance attained over a great variety of modelling schemes. Five of the most common ML algorithms (Decision Trees, Naïve Bayes, K-Nearest Neighbor, Random Forest and Logistic Regression) are selected, while five attribute combinations are evaluated, based on an attribute selection filter. At first the training is made with the use of the initial database, while subsequently SMOTE is put into effect to generate synthetic instances that are added gradually to the dataset over a total of 48 steps. For each step the algorithms are trained and evaluated by applying the tenfold cross-validation technique, while keeping the hyperparameters of the algorithms fixed. Finally, a hold-out test set is introduced to the trained classifiers, which is ultimately used to test their performance through a number of evaluation indexes (metrics) in terms of the overall classification accuracy attained and their within-class classification fidelity. The methodology is illustrated in Fig. 1.

Data Sources Description and Preparation
The database used in the paper is composed of 249 published rockburst cases over the period 1991-2013, as collected and compiled by various researchers (Bai et al. 2002; Cai et al. 2005; Ding et al. 2003; Du et al. 2006; Feng and Wang 1994; Jia et al. 2013; Jia and Fan 1991; Jiang 2008; Kang 2006; Li and Wang 2009; Li 2009; Li and Xie 2005; Liang 2004; Peng et al. 2010; Qin et al. 2009; Su et al. 2010; Wang et al. 1998, 2004, 2005, 2009, 2010; Wu and Yang 2005; Xia 2006; Xiao 2005; Xu et al. 2008; Yang et al. 2010; Yi et al. 2010; Yu et al. 2009; Zhang et al. 2007, 2010, 2011; Zhang and Li 2009; Zhang 2002, 2005; Zhao 2007). This database is given as supplementary data to this paper and is available to be used by other researchers. The database consists of a number of parameters, including the maximum tangential stress (σθ), the uniaxial compressive strength (σc), the tensile strength (σt), the stress coefficient (SCF = σθ/σc) as given by Martin et al. (1999), the brittleness coefficient (B1 = σc/σt) as proposed by Peng et al. (1996), the brittleness coefficient (B2 = (σc − σt)/(σc + σt)) as proposed by Singh (1987) and, finally, the elastic energy index (Wet). These attributes, which illustrate and monitor the basic conditions needed for the initiation and propagation of the rockburst phenomenon, constitute the input data of the analysis. They are used by the majority of researchers for long term rockburst prediction and are part of most empirical indexes for rockburst assessment. The output of the database is the rockburst intensity, which can be discerned into four categories: None, Low, Moderate, and Heavy. This intensity-based classification and its meaning is presented in Table 2, as proposed by Zhou et al. (2012). The data are used without any prior normalization.
Fig. 1 Stages of the proposed ML methodology to assess rockburst intensity
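The derived attributes defined in this section follow directly from the three measured quantities. A small helper function (written here for illustration, not part of the study's workflow) makes the definitions concrete:

```python
def derived_attributes(sigma_theta, sigma_c, sigma_t):
    """Compute the derived input attributes from the three measured
    quantities (stresses/strengths, e.g. in MPa)."""
    scf = sigma_theta / sigma_c                     # stress coefficient, Martin et al. (1999)
    b1 = sigma_c / sigma_t                          # brittleness coefficient, Peng et al. (1996)
    b2 = (sigma_c - sigma_t) / (sigma_c + sigma_t)  # brittleness coefficient, Singh (1987)
    return scf, b1, b2

print(derived_attributes(60.0, 150.0, 10.0))  # (0.4, 15.0, 0.875)
```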
If a normalization process had been implemented, some effects in terms of accuracy improvement or speed could have been observed, as it generally helps the learning process. However, the decision was made to attempt the modelling with the raw data, so as to simplify any pre-processing and actually witness the performance of the algorithms. Furthermore, it can be observed that the maximum values of σθ, Wet and SCF are outliers and correspond to Heavy rockburst incidents. Rockburst intensity is highly dependent on the stress conditions and the ability of the rock to store energy, and the increase of these values leads to higher rockburst intensities. Thus these outliers were not deleted and remained in the database.
In Fig. 2 an overview of the distribution of all attributes in the dataset is given, both in terms of values, and in terms of the rockburst intensity class occurrence (None, Low, Moderate and Heavy). It can be easily seen that the rockburst intensity is distributed throughout the value range of all parameters, without having a clearly defined trend or pattern. In addition, in Table 3 some basic statistical information regarding the input attributes are presented, covering the minimum and maximum values as well as the mean values and their respective standard deviation.
Nevertheless, the most important observation in all the above is the imbalanced nature of the database, meaning that the classes are composed of unequal quantities of instances. This is a common issue, especially when dealing with phenomena like rockbursts, where the occurrence (and recorded data) of certain intensity classes is scarcer than others. Thus, the "None" class participates at a rate of 19% (47 cases), the "Low" class holds 29% of the total (73 cases), the "Moderate" class 33% (83 cases) and, finally, the "Heavy" class constitutes 18% of the dataset (46 cases). This complicates matters, since the level of training attained at certain classes could differ significantly, and this could have a negative effect on the accuracy and quality of the analysis. Ultimately, the dataset is divided in two parts, the training and the testing subsets. The division has been made using the 70-30 rule, with 71% of the data constituting the training set (178 cases) of the ML model and the remaining 29% (71 cases) forming the testing hold-out set, which is introduced to the finally trained model to assess and measure its performance. The division of the dataset was made randomly, while the resulting distributions per class in both training and testing subsets are approximately the same as in the total database.
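The stratified 70-30 split described above can be sketched as follows. scikit-learn is used here as an assumed stand-in for the paper's WEKA workflow; the feature values are random placeholders, with only the class proportions taken from the text:

```python
from collections import Counter
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the 249-case database: the 7 attribute columns are
# random placeholders; only the class proportions match the paper.
rng = np.random.default_rng(0)
y = np.array(["None"] * 47 + ["Low"] * 73 + ["Moderate"] * 83 + ["Heavy"] * 46)
X = rng.random((len(y), 7))

# Stratified split keeps per-class shares in both subsets close to the
# shares in the full database.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.29, stratify=y, random_state=0)
print(Counter(y_tr))
```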

Synthetic Minority Oversampling Technique (SMOTE)
An imbalanced database can create poor performance results or overfitting problems, as the database's minority class or classes can often be overlooked by the machine learning algorithms. Sun et al. (2009a, b) stated that database imbalance is a key issue and an obvious problem in employing machine learning algorithms for classification applications, accompanied by other factors such as small databases, class separability issues, etc. Chawla (2004) outlined the importance of the class imbalance problem, along with the data distribution within each class, for the classifier's performance.
Fig. 2 Data visualization in terms of rockburst intensity and attribute distribution (none: blue; low: red; moderate: cyan; heavy: grey)
One method for dealing with imbalanced datasets is the adoption of the Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al. 2000), which increases the quantity of the minority class with new instances synthesized from existing instances of the minority class. According to Fernandez et al. (2018) the utilization of the SMOTE preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data.
This technique, which is illustrated in Fig. 3, injects new synthetic data into the database so as to increase the available number of instances in the database's minority class and hence strengthen its presence. It is an oversampling method that generates new instances by interpolation between minority-class instances that lie close together.
The procedure involves the following steps. Firstly, the minority class set A = {x1, x2, …, xt} is defined. For each x ∈ A, the k-nearest neighbors are obtained based on the calculation of the Euclidean distance between x and every other minority point in set A. Next, for each x ∈ A, n minority points from its k-nearest neighbors are chosen and form the set A1. Lastly, for every sample xk ∈ A1, a new synthetic instance is interpolated according to the formula

x_new = x + rand(0, 1) × (xk − x),

where rand(0, 1) represents a random number between 0 and 1. SMOTE is defined by the k and n indices, where k = number of nearest neighbors and n = number of samples to be generated.
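The interpolation step can be sketched in a few lines. This is a minimal illustration of the interpolation idea only, not the WEKA implementation used in the study:

```python
import numpy as np

def smote_sample(x, neighbors, seed=None):
    """Interpolate one synthetic instance between minority point x and a
    randomly chosen minority neighbour: x_new = x + rand(0, 1) * (x_k - x)."""
    rng = np.random.default_rng(seed)
    x_k = neighbors[rng.integers(len(neighbors))]  # pick one neighbour at random
    return x + rng.random() * (x_k - x)

x = np.array([1.0, 2.0])
nbrs = np.array([[2.0, 2.0], [1.0, 3.0]])  # pretend k-nearest minority neighbours
x_new = smote_sample(x, nbrs, seed=0)
# x_new lies on the line segment between x and the chosen neighbour
```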
The SMOTE process in this research used five (5) nearest neighbors for the creation of the instances, while the oversampling was kept at low rates (5-10%), meaning that 3 or 4 synthetic instances were created per step. The new synthetic data were inserted into the rockburst classes "None", "Low" and "Heavy" until the balancing of the dataset was achieved. After that point, new synthetic instances were placed successively in all classes. In total, 182 synthetic instances were added to the starting training set in 48 steps, of which 32%, 19%, 16% and 33% correspond to the classes "None", "Low", "Moderate" and "Heavy", respectively.
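The incremental strategy described above (a few synthetic points per step, always added to the currently smallest class) might be sketched as follows. This is an assumed reconstruction for illustration, not the authors' exact procedure, and it presumes every class has several real members to draw neighbours from:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def balance_incrementally(X, y, per_step=3, k=5, seed=0):
    """Add a few SMOTE-style synthetic points per step to whichever class
    is currently smallest, until all classes reach the majority count.
    Assumes every class has at least 2 (ideally > k) real members."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    target = np.unique(y, return_counts=True)[1].max()
    while True:
        labels, counts = np.unique(y, return_counts=True)
        if counts.min() >= target:
            break
        minority = labels[np.argmin(counts)]
        Xm = X[y == minority]  # real + synthetic members so far
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(Xm))).fit(Xm)
        for _ in range(min(per_step, target - len(Xm))):
            i = int(rng.integers(len(Xm)))
            idx = nn.kneighbors(Xm[i:i + 1], return_distance=False)[0]
            j = int(idx[rng.integers(1, len(idx))])  # a neighbour, not the point itself
            x_new = Xm[i] + rng.random() * (Xm[j] - Xm[i])
            X = np.vstack([X, x_new])
            y = np.append(y, minority)
        # the neighbour model is refreshed on the next pass
    return X, y

demo_X = np.random.default_rng(1).random((20, 3))
demo_y = np.array(["A"] * 10 + ["B"] * 6 + ["C"] * 4)
Xb, yb = balance_incrementally(demo_X, demo_y)
```

After balancing, the same loop could continue with a higher `target` to expand all classes uniformly, mirroring the quantitative stage of the paper.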

ML Model Building
The development of the SMOTE methodology and the ML model is made through the WEKA (Waikato Environment for Knowledge Analysis) open source software. WEKA (Witten et al. 2011) is a robust platform for data mining experiments containing four application environments (Explorer, Experimenter, KnowledgeFlow and Simple CLI). For this study's experiments the Explorer application was used, due to its user-friendly environment, the simplicity of visualizing the data and the easy access to plenty of tools and data analytic processes. The use of this software provides a great degree of automation and flexibility in the model design, as well as consistency and confidence in the overall results obtained. The following paragraphs present the steps to develop and build the ML model.

Attribute Selection
Aiming at the optimization of the classifiers' performance, the Correlation Attribute Evaluation filter combined with the Ranker search method was adopted. This filter was applied to the initial database before the implementation of the SMOTE balancing. It weights and ranks features based on Pearson's product-moment correlation (Hall 1999) for the purpose of a targeted reduction of the number of attribute combinations. The results of the filter on the rockburst database are presented in Fig. 4. From this figure it is observed that the maximum tangential stress has the biggest weighting factor, followed by the energy index, the stress factor, the brittleness coefficient B1, the brittleness coefficient B2, the tensile strength and the uniaxial compressive strength. Hence, in order to gradually decrease the number of inputs and witness the effect on the prediction capability of the ML models based on the most important parameters, we designed our analysis on the following five attribute combinations, leading to twenty-five basic classifiers:
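A rough analogue of WEKA's CorrelationAttributeEval + Ranker combination is sketched below, assuming the class is encoded as ordinal intensities (WEKA's handling of a nominal class differs in detail, so this is an approximation, not the exact filter):

```python
import numpy as np

def rank_attributes(X, y_encoded, names):
    """Rank attributes by the absolute Pearson correlation between each
    column of X and the (ordinally encoded) intensity labels."""
    scores = {name: abs(np.corrcoef(X[:, j], y_encoded)[0, 1])
              for j, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)

# Demo on synthetic data: one attribute tracks the class, the other is noise.
rng = np.random.default_rng(0)
y_enc = np.repeat([0, 1, 2, 3], 25)
X_demo = np.column_stack([y_enc + 0.1 * rng.standard_normal(100),
                          rng.standard_normal(100)])
print(rank_attributes(X_demo, y_enc, ["tracks_class", "noise"]))
# → ['tracks_class', 'noise']
```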

Stratified Cross Validation
Stratified cross-validation is a resampling technique for performance evaluation purposes, in which repeated percentage splits are run systematically, in an effort to minimize bias from the training and testing subset selection procedure. Cross-validation offers two main advantages. Firstly, a model is trained with every instance of a dataset and, secondly, overfitting problems can be reduced. According to Witten and Frank (2005) cross-validation is gaining ascendance and is probably the evaluation method of choice in most practical limited-data situations. Our models were trained and evaluated with the tenfold cross-validation method on the whole training subset. The process involves the division of a dataset into 10 equally proportioned, class-stratified folds, of which 9 folds are used for training and the remaining fold is used for testing. Thus 10 evaluation results are obtained and averaged. Having done this tenfold cross-validation and computed the evaluation results, WEKA invokes the learning algorithm a final (11th) time on the entire dataset, so as to produce a final working model for the case selected. To this 11th ML model the testing subset, which consists of completely new data, is introduced so as to attain the final classification accuracy for the rockburst classes.
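The evaluation scheme above can be sketched with scikit-learn as an assumed equivalent of the WEKA workflow (the data here are random placeholders, and the decision tree stands in for any of the five fixed-hyperparameter models):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((178, 7))          # placeholder for the 178-case training subset
y = rng.integers(0, 4, size=178)  # placeholder intensity labels 0..3

# Tenfold stratified cross-validation on the training subset.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(round(scores.mean(), 3))    # mean accuracy over the 10 folds

# Final (11th) fit on the whole training subset, mirroring WEKA's behaviour,
# before the hold-out test set is introduced.
final_model = DecisionTreeClassifier(random_state=0).fit(X, y)
```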

Building Classifiers
Fig. 4 Attribute weight to the overall rockburst intensity

A total of five ML algorithms were selected to perform the rockburst classification, namely J48, Naïve Bayes, Logistic Regression, Random Forest and K-Nearest Neighbor. Each of them has unique features and can be used in problems related to the identification and classification of patterns such as the ones presented in this research. Some brief notes on these algorithms are given hereinafter.
J48 is an implementation of the C4.5 decision tree algorithm and is used for classification tasks. The advantages of this algorithm include the intelligibility of the decision rules and the forecasts, its simplicity in preparing and editing the database, the ability to work with both continuous and categorical variables and the generation of efficient forecasts.
Naïve Bayes is a statistical classifier and is based on Bayes' theorem. The advantages of this algorithm include the ease in building and interpreting the algorithm and the ability to be trained with a small dataset.
The Logistic Regression algorithm investigates, based on probability theory, the influence of many independent variables on a dependent variable. Logistic regression is used in classification problems and is suitable in cases where the classes are linearly separable. The advantages of this algorithm include the ability to work with unevenly dispersed variables, the presentation of the forecasts as probabilities and its ease of training and interpretation.
Random Forests are an ensemble technique built on decision trees. The technique involves developing many decision trees, each of which casts a vote; the final decision is made by the ''forest'' choosing the class with the most votes. Some of the advantages of this algorithm include the generation of accurate forecasts even with limited datasets, the ability to deal with many attributes and the minimization of overfitting.
The K-Nearest Neighbors algorithm is used in categorization problems and is based on memorizing the instances of the training set in order to determine the class of a new instance through distance-based measures. The advantages of this algorithm include its simple implementation, the ability to work with linear and nonlinear classes and the ability to incorporate new instances at minimal cost.
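The K-Nearest Neighbors vote just described can be sketched as follows. This is an illustrative toy implementation with Euclidean distance and made-up class labels, not the Weka IBk classifier used in the study:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature vector.
    Returns the majority class among the k training instances closest to query."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because classification only requires storing the training instances, adding new data is essentially free, which is the "minimal cost" update property noted above.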
Through the results obtained, the overall ML performance evaluation was made for all 25 classifier configurations, with and without the use of the synthetic data (SMOTE on/off). In order to assess the classification accuracy predicted by the models/algorithms used, a set of four major performance evaluation indices has been employed, namely the Classification Accuracy, K-statistic, F-Measure and Area Under the Curve.
The overall Classification Accuracy index (ACC) is the simplest metric to use and is the ratio of the correctly classified instances to the total number of instances, taking values from 0 to 100%. The K-Statistic (k) was introduced by Cohen (1960) and takes into account chance agreement. Its formula is as follows:

k = (f_O − f_E) / (n − f_E)

where f_O is the observed agreement between raters, f_E is the expected agreement, and n is the total number of observations. Thus, in the case where two measurements agree only at the chance level, the value of kappa is 0; when the two measurements agree perfectly, the value of kappa is 1. F-Measure (F-M) is the harmonic mean of precision and recall, where precision is the number of true positive results divided by the number of all positive results, including those not identified correctly, and recall is the number of true positive results divided by the number of all samples that should have been identified as positive. This metric of a test's accuracy has been widely used in the evaluation of oversampling algorithms by previous researchers (Ali et al. 2015; Bajer et al. 2019). Finally, the Area Under the Curve (AUC) is a summary of the ROC curve and measures the ability of a classifier to distinguish between classes. In general, an AUC of 0.5 suggests no discrimination, while 1 denotes perfect classification.
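As a concrete illustration of the kappa formula above, the statistic can be computed from a confusion matrix; in that frequency form, f_O is the diagonal sum and f_E is the chance agreement derived from the row and column totals. The 2×2 matrices in the usage test are made-up examples, not results of the rockburst models:

```python
def cohens_kappa(confusion):
    """confusion[i][j] = number of instances of true class i predicted as class j.
    Implements k = (f_O - f_E) / (n - f_E)."""
    n = sum(sum(row) for row in confusion)
    f_o = sum(confusion[i][i] for i in range(len(confusion)))  # observed agreement
    # expected chance agreement: sum over classes of row_total * col_total / n
    f_e = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) / n
        for i in range(len(confusion))
    )
    return (f_o - f_e) / (n - f_e)
```

A perfectly diagonal matrix yields k = 1, while a matrix whose agreement equals the chance level yields k = 0, matching the interpretation given above.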

Results of the ML Methodology
All ML models performed relatively well in accurately classifying the rockburst classes of the unknown testing subset that was introduced to them. The results below are focusing on the attained performance prediction capability in terms of accuracy with and without the use of synthetic data (SMOTE methodology) with respect to the selected number of the input attributes/parameters used (from 3 to 7 attributes).
The results for the ML models used are given in Figs. 5, 6, 7, 8 and 9. In each diagram, each line represents one of the five attribute combinations used. The y-axis represents the accuracy level, while the x-axis denotes the total number of instances used for the training of the ML models. The starting position is 178 instances (the initial training dataset), from which new synthetic data are added until the final value of 360 instances is reached, doubling the initial data points; that is, the synthetic data inserted equal the real data of the initial database. The vertical line at 248 instances marks the threshold where the balancing of the dataset is reached, meaning that at this point all rockburst classes of the training dataset contain the exact same number of instances. Thus, with this threshold in mind, the diagrams can be divided into two major parts: the first up to the point of balancing and the second from then on until the doubling of the dataset.
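The synthetic instances added at each step follow the core SMOTE rule: a new point is interpolated between a minority-class sample and one of its k nearest same-class neighbors. The following is a minimal sketch of that rule under illustrative data, not the exact oversampling routine used in the study:

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote_one(samples, k=5, rng=random):
    """Create one synthetic instance from a list of minority-class samples."""
    base = rng.choice(samples)
    # k nearest neighbors of `base` within the same class (excluding itself)
    neighbors = sorted(
        (s for s in samples if s is not base),
        key=lambda s: euclidean(base, s),
    )[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]
```

Because each synthetic point lies on a segment between two real minority instances, repeated calls gradually densify the minority classes, which is how the dataset is grown from 178 toward 360 instances.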
The Random Forest algorithm has the best accuracy when the initial training dataset is used. This holds for all attribute combinations, which yield consistently high accuracy levels ranging from 71.8% (RF 3attributes) to 74.6% (RF 4,6attributes). At the early stage of the SMOTE process, before the balancing of the dataset is achieved, this algorithm showed an improvement in its performance. The maximum attained accuracy scores during this stage ranged from 74.6% (RF 3,4attributes) to 76% (RF 5,6,7attributes). After the balancing of the dataset the accuracy of the classifiers dropped in general, except for the 5-attribute classifier, whose performance steadily increased after the point of 288 instances. The 5-attribute classifier (RF 5attributes) achieved the highest accuracy (77.5%) at points 340 and 356, which is the best score in this study.
As for the KNN algorithm, the starting accuracy varies between 60.6% (KNN 7attributes) and 69% (KNN 4attributes). During the balancing of the dataset, the addition of synthetic instances improved the performance of the classifiers with 5, 7 and 3 attributes, but the highest accuracy score is maintained by the 4-attribute starting classifier. SMOTE further enhanced the predictive ability of the classifiers after the balancing of the dataset. The classifiers with 3 and 7 attributes achieved their highest scores (67.6%) at the point of 256 instances, while the models KNN 5,6attributes exceeded the highest starting score, reaching 73.2% and 71.8% at points 356 and 344, respectively.

Regarding the J48 diagrams, the starting accuracy scores range between 60.6% (7 attributes) and 69% (4 attributes). The classifier J48 6attributes starts with the second lowest accuracy (63.38%), but before the balancing of the dataset (at 233 instances) achieves the highest accuracy of all the J48 classifiers (71.8%). Similarly, during the balancing stage, the classifiers J48 3,7attributes reached their peak scores (70.4% and 69%), while the J48 5attributes classifier increased its accuracy at point 214. After the balancing stage, the addition of synthetic instances enhanced the performance of the J48 5attributes classifier, which attained its highest accuracy (70.4%) at point 320.

The starting accuracy scores obtained by the Naïve Bayes algorithm range between 57.7% (NB 7attributes) and 66.2% (NB 3,4attributes). The highest scores attained by the classifiers are achieved in the first stage of the SMOTE procedure, before the balancing of the dataset, between the points of 187 and 200 instances. The NB 5attributes classifier obtained the highest accuracy (70.4%) in comparison with the rest of the Naïve Bayes classifiers, followed by the NB 3,4attributes classifiers (69%).
Finally, the Logistic Regression algorithm presented the worst starting scores, which vary between 54.9% (LR 5attributes) and 57.7% (LR 4,6,7attributes). Similarly to the Naïve Bayes algorithm, the performance enhancement of the Logistic Regression classifiers occurs at the early stage of the procedure. The LR 3attributes and LR 5attributes classifiers attain improved scores (57.4% and 56.3%), while the LR 7attributes classifier obtains the highest overall accuracy (59.15%).
In Table 4, the maximum increase (Max) and decrease (Min) of the evaluation metrics achieved during the SMOTE process are presented, compared with the starting scores (Start) of the classifiers. The table is split based on the number of attributes (No) and the machine learning algorithm. The blue boxes represent the maximum increase (%) per algorithm per evaluation metric, while the yellow boxes represent the maximum scores of the evaluation metrics before and after the use of SMOTE. The negative percentage values indicate a decrease in a classifier's performance due to SMOTE utilization, as compared with the initial/starting performance of the classifier (without SMOTE). From the same table it can be observed that the performance of the ML models with respect to the metrics ACC, k, F-M and AUC is, in general, consistent. This is evident when looking, for instance, at the cases where the best and worst classification performance is achieved (e.g., J48 7attributes), in which all metrics exhibit the same behavior.
It can be seen that all ML algorithms (before SMOTE) consisting of 4 attributes achieved the best starting/initial scores. In those classifiers SMOTE had a negative effect on the evaluation metrics, denoting a drop in performance. On the contrary, the highest scores resulting from SMOTE were obtained by the ML algorithms having 5, 6 and 7 attributes, which attained lower initial scores. The same trend is observed in the cases exhibiting the maximum percentage increase in the metrics, in which the low-attribute algorithms (3 and 4 attributes) have the smallest increase rates, indicating that SMOTE performs better when dealing with an increased number of input attributes.
The highest starting scores (SMOTE off) were obtained by the 4-attribute Random Forest classifier, RF 4attributes (ACC, k and F-M values of 74.6%, 0.66 and 0.74, respectively), while the best overall results (SMOTE on) were achieved by the 5-attribute Random Forest classifier, RF 5attributes (77.5%, 0.7 and 0.77, respectively). In a head-to-head comparison for the RF 5attributes classifier, SMOTE increased the accuracy by 6%, 9% and 5% with respect to the ACC, k and F-M metrics, respectively. The maximum increase due to SMOTE was registered in the J48 algorithm that used the 7 input attributes (J48 7attributes): the increase in its accuracy measured by the ACC, k, F-M and AUC values was 14%, 26%, 15% and 10%, respectively. In general, 20 out of the 25 starting ML classifiers, namely J48 7,6,5,3attributes, kNN 7,6,5,3attributes, LR 7,5,3attributes, NB 7,6,5,4,3attributes and RF 7,6,5,3attributes, performed better with the use of SMOTE, indicating the positive effect of the technique on rockburst classification and prediction.
In Table 5, a comparison is given between the starting classifiers (before SMOTE) and the best classifiers (after SMOTE), focusing not only on the overall classification performance but also taking into account the classification accuracy attained within each of the individual rockburst intensity classes (within-class classification). This can indicate the ability of the ML algorithms to further distinguish and correctly classify rockburst patterns. The evaluation is made taking into account the True Positive Rate (TP Rate), i.e. the proportion of positives that are correctly identified, and the F-M index.
Overall, the values of the metrics after SMOTE were greatly improved, by between 3% and 33.5%. SMOTE positively affected the capability of the ML algorithms in distinguishing the classes ''None'', ''Low'' and ''Moderate''; more specifically, the J48 and Random Forest algorithms benefited the most. These algorithms achieved 100% accuracy in distinguishing the existence of rockburst. An issue remains in the classification accuracy of the ''Heavy'' rockburst cases, which is due to the fact that the rockburst database consists of both strain-bursts and fault-slip bursts, leading to within-class sub-concepts.
It is clear, though, that after utilizing SMOTE the differences between the metrics per class are smoothed and the overall results are more homogeneous in all ML algorithms. For example, regarding the J48 algorithm and the TP rate, the classifier scores 0.476 for the ''Low'' rockburst class, a value significantly lower than those of the other classes. After SMOTE, the TP rate improved substantially, by almost 30%, reaching 0.619. This improvement was also witnessed in all other classes and also occurred in the F-M values, indicating that SMOTE enhanced both the overall performance of the algorithms and their classification performance within the classes.

Discussion
As already mentioned, a direct comparison between the results of the present research and the results published in the literature would not be realistic, because of the existing differences in the training and testing datasets of the algorithms. In addition, instead of employing the commonly used strategy of constantly adjusting the hyperparameters of the ML algorithms on a stable, non-changing dataset, the ML algorithms in this research were kept constant in their initial structure and characteristics, while the training dataset was gradually expanded. In any case, the results obtained were among the best recorded, indicating that the incorporation of the SMOTE technique in the whole process can be a useful addition for obtaining a more balanced database, an element of key importance in making accurate prognoses, especially in cases of geotechnical character where data are hard to find. The lack of a sufficient amount of rockburst data is noticeable in the J48 and Random Forest algorithms (Figs. 5 and 7). The addition of synthetic instances was carried out in very small steps, at rates of 5-10%; however, the results of the evaluation metrics showed great sensitivity at each step of the process. For instance, for the Random Forest algorithm consisting of 5 attributes, the accuracy obtained was 73.2% at point 268, 64.8% at point 272 and 70.4% at point 276. This nonlinearity, which is reflected in the above diagrams, reveals the lack of a sufficient number of training data and indicates the need for enriching the rockburst database. Further optimization is possible, though, as the SMOTE process itself could be refined and enhanced by changing the number of neighbors used or by creating targeted synthetic instances that would affect the performance of the classifiers. At another level, the weight evaluation and final selection of the parameters to be used as inputs, as well as different data preparation strategies (e.g. normalization) applied to their values, could allow for a potential increase in the attained prediction accuracy. A limitation of this approach lies in the very ease with which SMOTE can generate synthetic instances; if this is utilized hastily just to come up with a prognosis, it could at some point lead to overfitting problems. To overcome this barrier, additional research should be conducted on combining synthetic data with virtual data originating from rockburst numerical modeling. Of course, until more realistic and validated methodologies to overcome such issues are available, the utilization of SMOTE should be applied with caution, up to the balancing point of the database, especially in cases that incorporate a considerable number of input parameters and when the initial accuracy of the ML algorithms is rather low.

Conclusions
This study examined the effect of SMOTE on five (5) ML algorithms under a set of five (5) different input attribute combinations, regarding the long-term prediction of rockburst with respect to its expected intensity class. The methodology followed generated synthetic data at stepped intervals until the balancing of the dataset was achieved, and continued further on, so as to assess the best strategies for employing this technique. A total of 25 classifiers were evaluated against a hold-out testing dataset, with the use of several performance metrics measuring the quality of the classification/prediction performance. Based on the findings, the following can be stated:

• The implementation of SMOTE managed to increase the performance of twenty (20) out of the twenty-five (25) total classifiers, thus proving its value as a tool for enhancing the capability of ML algorithms when dealing with imbalanced datasets.

• The increased accuracy of the classifiers can be obtained either before or after the balancing of the database, depending on the algorithms used and the input attributes.

• The maximum classification accuracy scores (ACC) obtained by the algorithms (J48 6attributes: 71.83%, RF 5attributes: 77.46%, KNN 5attributes: 73.24% and NB 5attributes: 70.42%) are among the highest in the current literature.

• The most reliable model was the Random Forest algorithm consisting of five attributes (RF 5attributes), trained on a dataset composed of 340 instances, of which 162 were synthetic. The classifier obtained an accuracy (ACC) of 77.46%.

• The maximum percentage increase due to SMOTE was achieved by the J48 algorithm consisting of 7 attributes (J48 7attributes). The increase in the ACC, k, AUC and F-M performance evaluation metrics was 14%, 25.5%, 10.2% and 13.9%, respectively.

• In general, the generation of synthetic instances through SMOTE increased the overall performance metrics by an average of 5-10%. More importantly, it significantly improved and smoothed the within-class classification accuracy of the algorithms, by up to 30%.
The methodology presented and the use of SMOTE are a step in the right direction that could enable the enhanced training of algorithms used for rockburst prediction. Of course, many issues need to be resolved before it can be globally used. To facilitate further research, it should be stressed that the availability of the actual data to all researchers is a significant factor and a decisive leap forward for the validation of using synthetic data in scientific areas like geotechnics. To this end, the initial dataset used in this research, with reference to its initial source of origin, is provided as supplemental data that can either be used as-is or be further enriched and passed along to the engineering community.
Funding None.
Availability of Data and Materials Available upon request.

Declaration
Conflict of interest The authors declare that there is no conflict of interest.