Development of a predictive model of rock fragmentation for Nui Phao open-pit mine in Vietnam using multiple-output Neural Networks and Monte Carlo Dropout technique

Precise and reliable prediction of blast fragmentation is essential for the success of mining operations. Over the years, various machine learning models using artificial neural networks have been developed and proven efficient in predicting blast fragmentation. In this research, we design multiple-output neural networks to forecast the cumulative distribution function (CDF) of blast fragmentation to improve this prediction. The model architecture contains multiple response variables in the output layer that correspond to the CDF curve's percentiles. We apply the Monte Carlo dropout procedure to estimate the uncertainty of the predictions. Data collected from the Nui Phao open-pit mine in Vietnam are used to train and validate the performance of the model. Results suggest that multiple-output neural network models provide better accuracy than single-output neural network models that predict each percentile on a CDF independently. Meanwhile, the Monte Carlo dropout technique provides valuable and relatively reliable information for decision making.


Introduction
Blasting is an essential process in open-pit mining operations. Its results significantly impact the efficiency of downstream processes such as crushing and grinding, loading, haulage, and processing in terms of cost and production. Proper fragmentation is the primary goal of blasting. Therefore, a precise and reliable model for predicting the blast fragmentation distribution is incredibly beneficial for mining operations. Blast parameters are defined through the optimisation of the blast design to obtain good fragmentation.
Open-pit mine blasts result in a widespread muck pile containing thousands of individual rock fragments with different geometric properties. The cumulative distribution function (CDF) curve of fragmentation is usually deployed to evaluate the distribution of rock fragments within a blast muck pile and to understand the physics of the blasting parameters. Given the percentile values of fragmentation size obtained by sieving analysis, the CDF curve computes the rock fragments' distribution as the mass weights of the fragments remaining on each screen with a known mesh size. Different empirical models have been introduced for constructing the CDF curve. For instance, blast fragmentation distribution can be described by the Rosin-Rammler or Weibull distributions (Bennett 1936; Rosin 1933; Rosin, Rammler, and Sperling 1933). The works by Koshelev et al. (Koshelev et al. 1971) and Kuznetsov (Kuznetsov 1973) explored the relations between fragment sizes and the explosive charging parameters or the hardness of the blasted rock. Cunningham (Cunningham 1983) proposed the Kuz-Ram model using Kuznetsov's equation and the Rosin-Rammler distribution to estimate the fragment sizes and the uniformity index of blast muck piles. Nevertheless, there are restrictions on using these empirical models due to the intensive labour needed to determine the factors in the equations and the experiments' bias in some specific situations.
In recent years, blasting researchers have turned their attention to the usefulness of machine learning models for predicting blast fragmentation. Actual data collected from fragmentation measurement using computer vision programs have motivated researchers to design more effective prediction models. Several multi-variate regression algorithms have been applied to capture most blast parameters (Chakraborty et al. 2004; Hudaverdi, Kulatilake, and Kuzu 2011). However, the nonlinear relationship between blast parameters remains a challenge for these models. Shi et al. (X. Z. Shi et al. 2012) introduced Support Vector Machines (SVM) to forecast the mean fragmentation size for bench blasting, accounting for blast design parameters and rock mass properties. The result indicates that the SVM model is scientific and feasible with high accuracy.
Remarkably, Artificial Neural Network (ANN) models have attracted many researchers in the area of blast fragmentation prediction due to their capacity for self-learning and organisation, nonlinear pattern recognition, and generalisation. Various authors applied ANN models using blast design parameters and rock mass properties to predict fragmentation. Oraee and Asi (Oraee and Asi 2006) presented an ANN model trained with 13 input parameters and three output parameters at the Gol-e-Gohar iron ore mine to predict three percentile values of the fragmentation CDF curve. Other authors developed different ANN models with multiple inputs and a single output to predict a single percentile value of the fragmentation CDF curve or the average fragmentation size. For instance, Kulatilake et al. (Kulatilake et al. 2010) evaluated four learning algorithms to train the ANN model and showed that the ANN model predicts fragmentation sizes more accurately than the multi-variate regression model. Sayadi et al. (Sayadi et al. 2013) performed a comparative study applying two different ANN model structures to predict blast fragmentation and back break. Shi et al. (X. Shi et al. 2013) and Enayatollahi et al. (Enayatollahi, Bazzazi, and Asadi 2014) demonstrated better predictability of the ANN model than the empirical formulas obtained by regression analysis. Rosales-Huamani et al. (Rosales-Huamani et al. 2020) designed an ANN model that outputs three results, defining the 80th, 50th, and 20th percentiles of fragmentation sizes.
Even though the ANN models developed above tend to produce better fragmentation estimates than empirical formulas and linear regression models, direct estimation of individual percentiles often yields nonmonotonic CDF curves. Hence, post-processing is required to replace violating percentiles or re-sort the predicted percentiles. These models also ignore the relations between percentiles and the connection between each percentile and the associated blast parameters. Rosales-Huamani et al. (Rosales-Huamani et al. 2020) also reported that their model produced only moderate accuracy since it tends to focus on the target of the 80th percentile while diminishing the effect of the 50th and 20th percentiles. There is a pressing need for multi-output learning paradigms as a solution.
Another issue is that standard ANN models for regression only return point predictions given some input variables. By definition, a point prediction is an estimate and an approximation, and it contains some uncertainty. The uncertainty may result from model errors, noise in the training data, or unknown factors such as the blasting engineers' skills or the blasted rock's properties. Therefore, point predictions are insufficient for describing the real uncertainty of the forecast. Gal and Ghahramani (Gal and Ghahramani 2016) introduced a simple Monte Carlo Dropout (MCD) method to capture model uncertainty. They discovered that the dropout used to prevent overfitting when training any ANN can be interpreted as an approximation of the posterior over the network weights. A trained ANN with dropout can make multiple predictions to estimate uncertainty. The MCD mechanism has mainly been applied to image processing tasks with convolutional neural networks (CNNs), where the uncertainty can be evaluated visually (Kim, Kim, and Choe 2020; Myojin et al. 2019; Stoean et al. 2020). Uber also used MCD combined with Bayesian NNs for large-scale time series anomaly detection (Zhu and Laptev 2017). A probabilistic wind power forecasting model based on Bayesian theory and MCD is presented in (Wen et al. 2019). The employment of MCD within a deep learning structure in healthcare was shown in (Wickstrøm, Kampffmeyer, and Jenssen 2020).
This study aims to improve the prediction of the CDF curve of blast fragmentation size in bench blasting with a multi-output neural network (MNN) and MCD. With this aim, we undertake a hold-out sample forecasting competition and compare the performance of the MNN and a single-output neural network (SNN). We demonstrate the ability of the MNN to simultaneously predict multiple, monotonically increasing percentiles of the fragmentation CDF, resulting in better predictions than the SNN. MCD is applied to obtain an empirical distribution of the CDF curve, rather than a single CDF curve, allowing us to assess the uncertainty in predictions.
The paper proceeds as follows. The first section introduces and reviews the literature on forecasting the blast fragmentation distribution with ANNs. The second section then describes the data used in the study and the design of the MNN model. In the following section, the experimental results are presented and compared. Finally, the discussion and conclusion complete the paper.

Data
The data consists of 903 samples of blasting records collected at the Nui Phao open-pit mine, Vietnam, from 2013 until 2019. The mine applies the bench blasting technique for breaking waste rock and ore. The blasting parameters can be classified into controllable and uncontrollable groups. The controllable parameters, including explosive type, burden, blast hole spacing, bench height, drilling depth, explosive charge length, sub-drilling length, and stemming length, are shown in Fig. 1 and Table 1. The hardness of rock and ore, which is classified into soft, medium, and hard, is uncontrollable. Fortis7030, Extra100, Extra7030, Anfo, Package, Extra6535, ANE, and the mixtures Package&Anfo, Fortis7030&Anfo, and Fortis7030&Flexigel are the primary explosives used in the mine. In Table 1, percentile values were obtained using image analysis. High-quality images representing the blasted muck pile were chosen to be analysed using Split-Desktop software (LLC 2016).
The scale and distribution of the input variables differ from each other. Hence, the numeric data are scaled to lie between 0 and 1. Simultaneously, the categorical data such as rock hardness and explosive type are converted using the dummy encoding technique. Figure 2 shows strong, positive correlations between all output parameters. These correlations are inherent due to the monotonic relationship between the output parameters as percentiles of the CDF curve. Besides, we observed no correlations among the input parameters.

The basic ANN configuration comprises artificial neurons, or nodes, that simulate the biological neurons of the human brain. Figure 3 illustrates the simple operations of an artificial neuron. A neuron receives and sums all signals, each weighted by the associated connection weight, from its connected neurons. Each neuron then executes an activation function to give the final output. Logistic, Tanh, and Rectified Linear Unit (ReLU) are commonly used activation functions. On each layer, a neuron also has a bias to adjust the output following the weighted sum of inputs.
Neurons are grouped into one input layer, one or more hidden layers, and one output layer (Fig. 4). Each neuron on a hidden layer applies a transfer function and is fully connected to all neurons on the next layer. The back-propagation algorithm is widely used for training ANN models. It propagates the loss (error) between the output and the target back into the network and minimises it. The associated weights on each connection are modified according to the initial loss. The input data are then fed forward again, generating a new output and loss. The process is repeated until the loss reaches an acceptable value.
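The behaviour of a single artificial neuron and one back-propagation step can be sketched in a few lines of NumPy. The input signals, weights, bias, and target below are illustrative values, not data from the study:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# A single artificial neuron: the weighted sum of incoming signals
# plus a bias, passed through a ReLU activation function.
x = np.array([0.2, 0.5, 0.1])        # signals from connected neurons
w = np.array([0.4, -0.3, 0.8])       # connection weights
b = 0.1                              # bias term
output = relu(w @ x + b)             # forward pass: relu(0.01 + 0.1) = 0.11

# One back-propagation step: move the weights against the gradient of
# the squared error between the output and a target value.
target, lr = 0.3, 0.5
grad_w = 2 * (output - target) * x * float(w @ x + b > 0)  # chain rule through ReLU
w = w - lr * grad_w
new_output = relu(w @ x + b)         # the error shrinks after the update
```

Repeating the update over many observations is exactly the iterative loss-minimisation loop described above.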

MNNs against SNNs
A typical regression task involves predicting a single outcome of interest. Consequently, the output layer has a single neuron (SNN model). However, some tasks require predicting more than a single result. These tasks are referred to as multiple-output regression (Borchani et al. 2015). The output layer has two or more neurons (MNN model), simultaneously predicting a set of outcomes. While SNN models ignore the correlations between outputs, MNN models leverage them to improve predictive accuracy (Breiman and Friedman 1997). Our goal is to leverage the monotonic relationship between fragmentation percentiles, as shown in Fig. 2. In the training process, the loss is computed for each target output before being averaged to adjust the connection weights.
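A minimal sketch of how the MNN's training loss is formed, using hypothetical percentile values for a single blast:

```python
import numpy as np

# Illustrative targets and predictions for one observation: an MNN emits
# all nine percentiles at once, whereas nine separate SNNs would each
# emit one of them independently.
y_true = np.array([5., 10., 15., 22., 30., 39., 50., 65., 90.])  # percentiles (cm)
y_pred = np.array([6., 9., 16., 21., 31., 40., 48., 66., 88.])

# The per-output squared errors are averaged into a single scalar loss,
# which is back-propagated through the shared connection weights, so
# every percentile influences every weight update.
per_output_loss = (y_pred - y_true) ** 2
mnn_loss = per_output_loss.mean()
```

Because the weights are shared, information about one percentile constrains the predictions of the others; nine independent SNNs have no such coupling.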

ANN model uncertainty using Monte Carlo dropout
Dropout is commonly used in ANN models to avoid overfitting (Srivastava et al. 2014) or to prevent hidden neurons from learning unnecessary features of the training data (Hinton et al. 2012). At training time, standard dropout randomly eliminates neurons from their layer with a given probability p (Fig. 4). Gal and Ghahramani (Gal and Ghahramani 2016) demonstrated that MCD can be used as a Bayesian approximation of a Gaussian process to capture model uncertainty. For the same input, one can run the ANN model multiple times to obtain a set of predictions, resulting in a distribution from which statistical values such as the mean or standard deviation can be determined.
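A minimal NumPy sketch of the MCD procedure, with randomly initialised weights standing in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_forward(x, w1, w2, p=0.1):
    """One stochastic forward pass with dropout kept ON at prediction time."""
    h = np.maximum(0.0, x @ w1)             # hidden layer with ReLU
    mask = rng.random(h.shape) > p          # drop each hidden neuron with probability p
    h = h * mask / (1.0 - p)                # inverted-dropout rescaling
    return h @ w2

# Randomly initialised weights stand in for a trained network.
x = np.array([0.3, 0.7])
w1 = rng.normal(size=(2, 16))
w2 = rng.normal(size=(16, 1))

# Repeating the prediction many times yields an empirical distribution
# whose mean and standard deviation summarise the model uncertainty.
samples = np.array([mc_dropout_forward(x, w1, w2) for _ in range(1000)])
mean, std = samples.mean(), samples.std()
```

Each pass drops a different random subset of neurons, so the spread of the 1000 predictions reflects how sensitive the network is to its own weights.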

Model training and testing
The objective of the MNN and SNN configuration is to optimise prediction accuracy. The networks' input layer has 27 neurons, one for each blasting parameter after normalisation and encoding. The MNN contains nine neurons in the output layer, representing nine percentiles of the fragmentation distribution. The SNN has one neuron in the output layer, generating one percentile to be estimated. Consequently, nine individual SNN models need to be built to calculate each required percentile. Further details on other features of the models are provided in Sect. 2.2.5. Both models were fit using Keras 2.3.1 (Chollet 2015) with a TensorFlow backend.
A ten-fold cross-validation approach was used to tune the model hyperparameters and evaluate the model performance. The procedure involves the following steps: 1. The dataset is randomly partitioned into ten parts: 90% for training and the remaining 10% as a hold-out set for testing.
2. The 90% training set is further randomly separated into two groups: 90% for training and 10% for validation.
a. First, the network is initialised with random weights. The weights and inputs traverse all neurons and activation functions from the first to the last layer. This forward pass outputs nine percentile values for each observation in both the training and validation sets.
b. In the next step, the network measures how good the outputs are using a cost function, the mean squared error (MSE): MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)², where N is the number of observations, y_i is the observed value, and ŷ_i is the predicted value. The MSE for the training set is backpropagated to update the weights and biases and optimise the output values. Stochastic gradient descent is employed as the optimiser for this step. The network continues to adjust by running through new observations from the training set.
c. The early stopping strategy was applied to prevent the network from overfitting the training dataset.
The training process is terminated after 500 epochs or when the MSE no longer improves significantly. The final weights are then evaluated on the hold-out set, which was wholly partitioned in step 1.

3. Steps 1 and 2 are repeated ten times, giving each observation nine chances to train the models and one chance to appear in the hold-out set. The result of the ten-fold cross-validation is the mean of the MSE, which measures the model performance on an unseen dataset.
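The cross-validation loop above can be sketched with scikit-learn's MLPRegressor, whose early_stopping option reproduces the inner 10% validation split. The synthetic data, layer sizes, and iteration budget below are illustrative stand-ins, not the study's configuration:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((150, 27))                   # stand-in for the 27 encoded blast parameters
y = np.sort(rng.random((150, 9)), axis=1)   # nine monotonically increasing percentiles

fold_mse = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = MLPRegressor(
        hidden_layer_sizes=(32,),
        solver="sgd",               # stochastic gradient descent, as in the study
        early_stopping=True,        # holds out 10% of the training set for validation
        validation_fraction=0.1,
        max_iter=500,
        random_state=0,
    )
    model.fit(X[train_idx], y[train_idx])   # steps 2a-2c: forward pass, MSE, backprop
    fold_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

cv_mse = float(np.mean(fold_mse))           # step 3: mean hold-out MSE over ten folds
```

Each fold's model never sees its own hold-out 10%, so cv_mse estimates performance on unseen data.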

Hyperparameter optimisation
Hyperparameter optimisation aims to find the set of ANN model configuration arguments that yields the best performance measured on the dataset. The criteria for selecting the final hyperparameters include choosing the model with the lowest cross-validated MSE. Besides, the chosen hyperparameters should generate zero monotonic violations in forecasting the nine percentiles of the CDF curve. The number of monotonic violations was monitored after each training epoch by V = Σ_{i=1}^{n} Σ_{j=1}^{m} 1(ŷ_{i,j} ≠ max(ŷ_{i,1}, …, ŷ_{i,j})), where n is the number of samples in the training dataset, m is the number of percentiles (m = 9), and max refers to the maximum value of the percentiles up to that position in the sequence. Each predicted percentile value is checked against the max; if the percentile value is not equal to the max, it breaks the monotonicity. For instance, if the nine percentiles were 10, 20, 30, 15, 25, 28, 40, 50, and 60 cm, there would be a total of three monotonic violations because 15, 25, and 28 are not the max (30 cm) in their sequence.
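The violation count can be implemented directly from this definition; the example reproduces the worked sequence from the text:

```python
import numpy as np

def monotonic_violations(pred):
    """Count predicted percentiles that fall below the running maximum of
    their sequence, i.e. that break the monotonicity of the CDF curve."""
    pred = np.asarray(pred, dtype=float)
    running_max = np.maximum.accumulate(pred, axis=-1)
    return int(np.sum(pred != running_max))

# The worked example from the text: 15, 25, and 28 sit below the
# running maximum of 30 cm, giving three violations.
violations = monotonic_violations([10, 20, 30, 15, 25, 28, 40, 50, 60])
```

A strictly increasing sequence of percentiles yields zero violations, which is the selection criterion used during hyperparameter optimisation.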
There are many ways to implement hyperparameter optimisation, such as manual, grid, or random search. In this study, we chose Bayesian Optimisation to optimise the hyperparameters of the two models. Bayesian Optimisation searches for the optimum hyperparameters using a surrogate probability model over hyperparameter values. The surrogate objective is much cheaper to evaluate than the actual objective of training the machine learning model. Bayesian approaches work by finding the next set of hyperparameters that performs best on the surrogate model's objective function before evaluating it on the actual objective function. It is noted that the time spent selecting the next promising hyperparameters from past results is not significant in comparison with the time spent evaluating the objective function. Therefore, Bayesian Optimisation is fast and effective in comparison with other methods.
The ranges of hyperparameters for the MNN and SNN models are shown in Table 2. The Bayesian Optimisation was run for 50 iterations using ten-fold cross-validation on the full dataset (903 samples). The Adam optimisation algorithm (Kingma and Ba 2014) was fixed as the optimiser for the models. The hyperparameters with the best model performance (lowest MSE) on the hold-out sets were chosen for training in the next section. We deployed the Python library Hyperopt (Bergstra, Yamins, and Cox 2013) to run the Bayesian Optimisation. The final model is structured as one input layer, three hidden layers, and one output layer. The input layer contains 27 neurons with a dropout rate of 0.045. The first hidden layer comprises 260 neurons with a dropout rate of 0.143, while both the second and third hidden layers consist of 440 neurons with a dropout rate of 0.101. The final layer has nine neurons corresponding to the nine percentiles of the CDF curve. All neurons execute the ReLU activation function on the input value. In the training process, a learning rate of 0.001 and a batch size of 210 were applied.

Results
Training and testing
Not only do the SNN models produce nonmonotonic CDF curves on validation data, they are also generally less accurate than the MNN models on testing data. Some examples of predictions by the SNN and MNN models on testing data are illustrated in Fig. 6a. The absolute error in prediction accuracy tends to increase for larger percentiles for both the SNN and MNN models, as shown in Fig. 6b. Nevertheless, the MNN models generate a lower absolute error than the SNN models for all percentiles, except for the 90th percentile, which is the same for both. The discrepancy in prediction accuracy is most distinctive in the lower percentiles. Given the better performance of the MNN models, they are used for further analysis, and all results hereafter refer to the MNN models.

Dropout uncertainty
Figure 7 shows three examples of 1000 predicted CDF curves using the Monte Carlo dropout procedure. The realisations of the CDF curves mostly capture the true CDF curves. The coverage probabilities were computed to determine how well the uncertainty from MCD is calibrated to the observations. For well-calibrated predictions, the green shade in Fig. 8 should be as close as possible to the diagonal black dashed line. This would correspond to the situation in which a given prediction interval contains the corresponding fraction of observations. For example, a 50% prediction interval should include 50% of the observations. In Fig. 8, the coverage probabilities for prediction intervals ranging from 5% to 100% were constructed for each percentile. The prediction intervals for all percentiles are relatively well-calibrated. Nevertheless, the MNN model tends to be overconfident, mostly for large prediction intervals. The overconfident prediction can be observed clearly at the 20th percentile, where the coverage probabilities fall below the "ideal" line.
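The coverage probabilities can be computed from the Monte Carlo samples as follows. Synthetic normal draws stand in for the real predictions and observations:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1000 MC dropout samples for one percentile over 100 blasts (synthetic
# stand-ins for the real predictions), plus matching observations.
samples = rng.normal(loc=30.0, scale=2.0, size=(1000, 100))
observed = rng.normal(loc=30.0, scale=2.0, size=100)

def coverage(samples, observed, level):
    """Fraction of observations inside the central `level`% prediction interval."""
    lo = np.percentile(samples, (100 - level) / 2, axis=0)
    hi = np.percentile(samples, 100 - (100 - level) / 2, axis=0)
    return float(np.mean((observed >= lo) & (observed <= hi)))

# Well-calibrated intervals put roughly `level`% of observations inside.
c50 = coverage(samples, observed, 50)
c95 = coverage(samples, observed, 95)
```

Plotting the nominal levels against these empirical coverages, as in Fig. 8, reveals overconfidence wherever the empirical value falls below the diagonal.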

Discussion
The loss for each output is averaged before back-propagation when training an MNN. A single training epoch for an MNN is computationally equivalent to that for an SNN. Therefore, training an MNN does not increase the computational burden of learning the model weights compared with training an SNN. Furthermore, the SNN approach requires optimising hyperparameters and training independently for each output of interest, whereas the MNN is more straightforward since it requires only a single round of hyperparameter optimisation and training.
Unlike the SNN, where a single value is predicted for each percentile of the CDF curve of blast fragmentation, the MNN leverages the relations between percentiles to simultaneously forecast a CDF curve that does not violate monotonicity. For instance, if the MNN predicts a CDF curve of small-sized fragmentation for a particular blast, the value for the 90th percentile will be smaller than it might have been if predicted independently, because the MNN shares information between percentiles. Compared with the SNN, the MNN can provide better prediction accuracy. The MNN, by its structured inference, is suitable for the multi-variate nature and complex interactions of the multiple outputs used to represent the CDF curve of blast fragmentation.
One of the most prominent and essential bene ts of MCD is the uncertainty information it provides. Since blasting operation is affected by many complicated factors, one cannot guarantee a perfect prediction model. MCD can be considered an ensemble of networks to generate a set of predictions (Fig. 7). This prediction set's mean and variance can be used to estimate how con dent the model is with its predictions and provide valuable information during decision making. For instance, in the case of high uncertainty, more tests and data collection should be implemented or eventually passing the case to a human to avoid potentially wrong results.
The reliability plots in Fig. 8 suggest that prediction uncertainty under the MNN model is relatively reliable for all percentiles of the CDF curve. However, more work is needed to improve the calibration of the MNN model. The model seems to be overconfident when predicting the unseen test data. This miscalibration is generally observed for model uncertainty provided by MCD (Bishop 2006; Kendall and Gal 2017), since MCD only captures uncertainty in the model weights arising from training with limited data and knowledge (Gal, Hron, and Kendall 2017; Guo et al. 2017).

Conclusion
Precise and reliable prediction of blast fragmentation is essential for the success of mining operations. In this paper, an attempt is made to design an MNN for predicting the fragmentation CDF curve in the blasting operation of an open-pit mine. Based on k-fold validation, we applied Bayesian Optimisation to search for the optimum hyperparameters and then trained the MNN with MCD. In the competition with the SNN, the MNN leverages the relations between percentiles of the CDF curve to provide better accuracy with no monotonic violations. Besides, the MNN is a simpler approach in terms of structure and of the effort required for tuning and training. To improve the reliability of the MNN, we applied the MCD mechanism in both training and prediction. The MNN model generates multiple CDF predictions, providing uncertainty estimates that make the predictions more trustworthy.
Similarly to other types of ANNs with MCD, the MNN tends to be miscalibrated. Future work will focus on improving the calibration of the MNN model and on collecting more real data from different mines to verify the prediction accuracy and evaluate other associated costs.