Intelligent Prediction of the Equivalent Circulating Density from Surface Data Sensors During Drilling by Employing Machine Learning Techniques

Abstract: Precise control of the equivalent circulating density (ECD) helps avoid well control issues such as loss of circulation, formation fracturing, underground blowout, and surface blowout. Predicting the ECD from the drilling parameters is a new horizon in drilling engineering practice, motivated by the high cost of downhole ECD tools and the low accuracy of the mathematical models. Machine learning methods can offer higher prediction accuracy than traditional and statistical models owing to their advanced computing capacity. Hence, the objective of this paper is to use the artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) techniques to develop ECD prediction models. The novel contribution of this study is predicting the downhole ECD from the available surface drilling parameters only, without any need for downhole measurements. The data in this study covered a horizontal section, with 3,570 readings for each input after data preprocessing. The inputs were the mud pumping rate, rate of penetration, drill string speed, standpipe pressure, weight on bit, and drilling torque. The data were used to build the models with a 77:23 training-to-testing ratio. Another data set (1,150 data points) from the same field was used for the models' validation. Several sensitivity analyses were done to optimize the ANN and ANFIS model parameters. The developed machine learning models achieved high accuracy, with a correlation coefficient (R) of 0.99 for the training and testing data sets and an average absolute percentage error (AAPE) no higher than 0.24%. The validation results showed R of 0.98 and 0.96 and AAPE of 0.30% and 0.69% for the ANN and ANFIS models, respectively. In addition, a mathematical correlation was developed for estimating ECD from the inputs as a white-box model.


Introduction
Equivalent circulating density (ECD) is an important parameter for monitoring drilling operations, especially when the window between the formation pressure and the fracture pressure is narrow. ECD expresses, as an equivalent density, the total pressure of the mud hydrostatic column plus the annular pressure losses; hence, it represents the mud pressure acting against the formation while the mud is circulating [1]. Therefore, it is critical to estimate the ECD with a high degree of precision to avoid well control issues such as loss of circulation, formation fracturing, and underground blowout.
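For orientation, the textbook field-unit form of this definition can be sketched as below. It is a minimal illustration, not the method of this study: the function name, the unit-conversion constant, and all example numbers are assumptions made for the sketch.

```python
# Textbook field-unit relation: ECD (ppg) = MW + dP_annular / (0.052 * TVD).
# The function name and every number below are hypothetical, not data from the study.
PPG_TO_PCF = 7.48052  # gallons per cubic foot, converts lbm/gal to lbm/ft^3


def ecd_ppg(mud_weight_ppg, annular_loss_psi, tvd_ft):
    """Return ECD in ppg from static mud weight, annular friction loss, and TVD."""
    if tvd_ft <= 0:
        raise ValueError("true vertical depth must be positive")
    return mud_weight_ppg + annular_loss_psi / (0.052 * tvd_ft)


ecd = ecd_ppg(mud_weight_ppg=10.0, annular_loss_psi=260.0, tvd_ft=10000.0)
ecd_pcf = ecd * PPG_TO_PCF  # the paper reports ECD in pcf
print(f"ECD = {ecd:.2f} ppg = {ecd_pcf:.1f} pcf")
```

The annular loss term is the quantity that is hard to obtain without downhole tools, which is why the study looks for a data-driven substitute.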
During drilling operations, several factors were found to have an impact on the ECD, among them the annular pressure losses, wellbore geometry, mud properties (density and viscosity), mud pumping rate, downhole pressure and temperature, and concentration of cuttings [2-5]. ECD can be acquired by downhole measurements, estimated using mathematical models, and/or predicted with the help of artificial intelligence (AI) techniques. New drilling tool technology has enabled a continuous circulating tool that monitors the ECD and provides good control of the formation pressure [6]. Downhole measurements of the ECD are available from downhole sensors, through measurement while drilling and pressure while drilling tools [7,8]. The downhole measurement is considered accurate and robust; however, these downhole tools are not commonly deployed because of their expensive daily charge and operational limitations, such as downhole pressure and temperature conditions that cause tool failures.
Several mathematical correlations exist in the literature for estimating the ECD; they differ in the fluid type and in the parameters used as inputs. ECD estimation based on material balance calculations for the mud compositional analysis has been studied in the literature [9,10]. However, these models involve many assumptions and limitations regarding the downhole pressure, temperature, and mud type. Bybee [11] introduced a mathematical equation to calculate the ECD. The model considers the effect of the concentration of solids in the annulus, in addition to the mud static density and other mud-related parameters.
The developed mathematical correlations are limited to certain applications, and they ignore many other input parameters that affect the ECD values. The ignored parameters include well geometry, fluid rheological properties, the rotation of the drill string, downhole pressure and temperature conditions that affect the mud density, cuttings dispersion, hole cleaning, and the swab and surge caused by drillpipe movements in the hole [12,13]. Ignoring these parameters leads to inaccurate ECD evaluation and can cause well control problems during drilling operations [14,15].

Predicting ECD by Employing Machine Learning Techniques
Predicting the ECD from the drilling parameters is considered a new outlook for drilling engineering practice in the petroleum industry because of the limitations of the downhole ECD tools and the low accuracy of the mathematical models.
Artificial intelligence is a technique that utilizes high computing capabilities to process advanced algorithms that solve technical problems by simulating the way the human brain thinks [16]. AI has many tools, such as artificial neural networks (ANNs), adaptive neuro-fuzzy inference system (ANFIS), support vector machine (SVM), and functional networks (FN), that have shown high performance and accuracy in prediction and classification problems [17]. AI has wide applications in many disciplines, including engineering, economics, medicine, the military, and the marine sector [18,19].
In the oil and gas industry, many studies have used machine learning techniques to solve practical challenges [20-23]. Intelligent models have been built with artificial intelligence tools for many purposes, such as identifying the formation lithology [24], predicting the formation and fracture pressures [25,26], estimating the properties of reservoir fluids [27], estimating the oil recovery factor [28,29], predicting the tops of the drilled formations [30], predicting and optimizing the rate of penetration (ROP) for different drilled formations and well profiles [31-33], determining the total organic carbon content [34-36], estimating the rock static Young's modulus [37-40], predicting the compressional and shear sonic times [41], determining the rock failure parameters [42], detecting downhole abnormalities during horizontal drilling [43], determining drill bit wear from the drilling parameters [44], and predicting the rheological properties of drilling fluids in real time [45-49].
For ECD prediction, Table 1 lists recent studies that predicted ECD from the drilling and mud parameters. Ahmadi [50] utilized the least squares support vector machine (LSSVM), ANFIS, and particle swarm optimization ANFIS (PSO-ANFIS) tools to estimate the ECD from only the initial mud density, pressure, and temperature. The results showed that ANN outperformed the other tools. Ahmadi et al. [51] studied ECD prediction by employing PSO-ANN, a fuzzy inference system (FIS), and a hybrid of genetic algorithm (GA) and FIS (GA-FIS), using the initial mud density, pressure, and temperature data. The PSO-ANN model presented a high degree of prediction performance in terms of the coefficient of determination (R²) and the average absolute percentage error between the actual and predicted values of ECD.
Alkinani et al. [52] predicted the ECD using an ANN model with only one hidden layer and 12 neurons. The study utilized drilling parameters in addition to hydraulics and mud properties: the mud pumping rate, properties of the mud (density, plastic viscosity, and yield point), total flow area of the bit nozzles (TFA), revolutions per minute of the drill pipe (RPM), and the weight on bit (WOB). Abdelgawad et al. [5] provided a model for ECD prediction using two AI techniques, ANN and ANFIS. The study provided an ECD-ANN model of one hidden layer with 20 neurons, while the ANFIS model was developed with five membership functions, using the gaussian membership function (gaussmf) as the input membership function and a linear output membership function. Rahmati and Tatar [53] employed the radial basis function (RBF) to build an ECD prediction model that showed a good prediction capability, with R² of 0.98 and AAPE of 0.22%. It is clear from the literature that AI models enhanced the ECD prediction; however, the models differ in the input parameters, the data used to feed the models, and the methodology followed for the ECD prediction. One shortcoming of many studies in the literature is that the downhole pressure and temperature are required as model inputs. From an operational view, downhole sensors are needed to acquire these parameters with high accuracy, which adds operational cost and time for data collection. Consequently, the new contribution of this study is to employ available real-time drilling parameters from surface rig sensors to build ECD prediction models using ANN and ANFIS techniques.
The novel approach in this study is that the AI models depend only on the mechanical drilling parameters, namely the mud pumping rate (GPM), rate of penetration (ROP), drillstring speed in revolutions per minute (RPM), standpipe pressure (SPP), weight on bit (WOB), and drilling torque (T). In addition, the study presents an empirical correlation that can easily be used to estimate ECD from the drilling parameters alone. The AI models presented in this study were validated with another data set to ensure high and robust ECD prediction performance.

Materials and Methods
The study utilized real drilling data collected by real-time sensors during drilling operations. Figure 1 summarizes the processing flow followed to provide robust ECD models: data gathering; data cleaning and filtering to provide good-quality model inputs; training the AI model and optimizing its parameters; and testing the accuracy of the model results. If the accuracy is low, the model is re-trained until the optimum model parameters for high-accuracy ECD prediction are obtained.

Data Description
The data for the current study were collected during a drilling phase in the Middle East. The data covered the horizontal section of a 5-7/8-inch hole. A total of 3,570 points were obtained after data preprocessing. The drilling parameters used as model inputs (GPM, ROP, RPM, SPP, WOB, and T) were collected from the surface rig sensors. ECD data were collected from the downhole pressure tool and used as the model output. In addition, another cleaned data set (1,150 points) from the same drilling phase was employed as an unseen data set for further validation of the model prediction performance.

Data Cleaning and Statistical Analysis
The obtained data were preprocessed in MATLAB by removing missing points and outliers. Figure 2 shows the correlation coefficients (R) between the output (ECD) and the drilling parameters after preprocessing. SPP and T have the highest R with the ECD (0.87 and 0.85, respectively), while WOB has the lowest (-0.01), which indicates that the relationship between ECD and WOB might be nonlinear. T, SPP, and RPM showed a direct relationship with ECD, while GPM, ROP, and WOB showed an inverse relationship.
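The cleaning and correlation screening described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the study used MATLAB and does not specify its outlier rule, so the 3-sigma filter, the function names, and the synthetic data are all hypothetical.

```python
import numpy as np


def clean_rows(data, z_max=3.0):
    """Drop rows with missing values, then rows with any column beyond z_max sigmas.

    The z-score rule is an assumed outlier criterion, not the one used in the study.
    """
    data = np.asarray(data, dtype=float)
    data = data[~np.isnan(data).any(axis=1)]
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    z = np.abs((data - data.mean(axis=0)) / std)
    return data[(z <= z_max).all(axis=1)]


def correlation_with_target(features, target):
    """Pearson R between each feature column and the target."""
    return np.array([np.corrcoef(features[:, k], target)[0, 1]
                     for k in range(features.shape[1])])


# Synthetic stand-in for the six surface parameters and the measured ECD.
rng = np.random.default_rng(0)
names = ["GPM", "ROP", "RPM", "SPP", "WOB", "T"]
X = rng.normal(size=(500, 6))
ecd = 2.0 * X[:, 3] + 0.1 * rng.normal(size=500)  # target driven by the "SPP" column
table = np.column_stack([X, ecd])
table[10, 2] = np.nan   # a missing reading
table[20, 5] = 1.0e3    # an obvious outlier

cleaned = clean_rows(table)
r = correlation_with_target(cleaned[:, :6], cleaned[:, 6])
for name, value in zip(names, r):
    print(f"{name}: R = {value:+.2f}")
```

A near-zero R, as reported for WOB, only rules out a linear trend; it does not by itself show that the input is uninformative for a nonlinear model.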

Building AI Models
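This study employed two AI techniques to develop ECD prediction models using only the drilling parameters. The ANN and ANFIS models were trained on the input data with a training-to-testing ratio of 77 to 23. The training and testing data sets were randomly selected. A sensitivity analysis was performed for each model parameter to find the best model architecture. The model prediction was evaluated with two statistical parameters, in addition to the ECD profiles of the actual and predicted data. The correlation coefficient (R) and the average absolute percentage error (AAPE) were calculated by Equations 1 and 2:

$$R = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2 \sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}} \qquad (1)$$

$$AAPE = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \qquad (2)$$

where $N$ is the number of samples in the dataset, $y_i$ is the actual output, $\hat{y}_i$ is the predicted output, and $\bar{y}$ and $\bar{\hat{y}}$ are the mean values of the actual and predicted outputs.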
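A minimal sketch of the random split and the two evaluation metrics is given below, using the standard definitions of R and AAPE. The function names and the seed are assumptions for illustration; the 2,743/827 counts are the ones the paper reports for its 77:23 split.

```python
import numpy as np


def pearson_r(actual, predicted):
    """Correlation coefficient R between actual and predicted values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    da, dp = a - a.mean(), p - p.mean()
    return float((da * dp).sum() / np.sqrt((da ** 2).sum() * (dp ** 2).sum()))


def aape(actual, predicted):
    """Average absolute percentage error, in percent."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))


def random_split(n_samples, n_train, seed=0):
    """Shuffle row indices and return (train_idx, test_idx)."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return order[:n_train], order[n_train:]


# 3,570 cleaned points split into 2,743 training and 827 testing rows.
train_idx, test_idx = random_split(3570, 2743)
print(len(train_idx), len(test_idx))
```

Because AAPE divides by the actual value, it is well behaved for ECD, which stays far from zero, but would not be for a target that crosses zero.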

Artificial Neural Network (ANN) Model
The ANN tool has been used to solve engineering problems through processing algorithms based on interconnected artificial neurons that mimic biological neural networks [54,55]. The common ANN architecture has three layers: the input layer, hidden layer, and output layer [56]. These layers are connected by a set of weights and biases that are tuned during the optimization of the network to control its prediction performance [57]. The network is usually trained with different learning algorithms to optimize the network and to control the processing of the neurons [58]. These neurons are the elementary units from which any neural network is constructed [59].
Many parameters were tested to check their impact on the ANN model accuracy, such as the number of hidden layers, the number of neurons, and the network, training, and transfer functions. Figure 3 shows the design of the ANN model developed in this study.

Adaptive Neuro-Fuzzy Inference System (ANFIS) Model
The adaptive neuro-fuzzy inference system (ANFIS) was established in the early 1990s [47]. ANFIS is a type of ANN that is based on the Takagi-Sugeno fuzzy inference system [60]. It uses a set of fuzzy "if-then" rules that can learn and approximate nonlinear functions [61]. The ANFIS architecture consists of four layers. The first layer, called the fuzzification layer, collects the inputs and determines the membership functions (e.g. sigmoid, gaussian, trapezoidal, or straight line). The second layer, denoted the "rule layer", applies many fuzzy "if-then" rules. In the third layer, databases are employed for the membership function rules and the decision-making unit is developed for the inference operations, while the last layer performs the defuzzification [61].
The ANFIS model was developed using the subtractive clustering method. The cluster radius and the number of iterations are the ANFIS parameters that were tuned during the optimization process.
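To make the fuzzification, rule, and defuzzification steps concrete, the following is a minimal sketch of a single first-order Takagi-Sugeno prediction with Gaussian membership functions. It is not the trained ANFIS model of this study: the function name, array layout, and toy rule parameters are assumptions, and the training step (subtractive clustering plus parameter tuning) that would supply the centers, widths, and consequents is not shown.

```python
import numpy as np


def sugeno_predict(x, centers, sigmas, coeffs, offsets):
    """One first-order Takagi-Sugeno inference step.

    x:       (n_inputs,) input vector
    centers: (n_rules, n_inputs) Gaussian membership centers
    sigmas:  (n_rules, n_inputs) Gaussian membership widths
    coeffs:  (n_rules, n_inputs) linear consequent coefficients
    offsets: (n_rules,) linear consequent constants
    """
    x = np.asarray(x, dtype=float)
    membership = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)  # fuzzification
    firing = membership.prod(axis=1)                           # rule strength (product AND)
    weights = firing / firing.sum()                            # normalized rule strengths
    rule_outputs = coeffs @ x + offsets                        # linear "then" parts
    return float(weights @ rule_outputs)                       # weighted-average defuzzification


# Toy system with one input and two rules (all values hypothetical).
centers = np.array([[0.0], [2.0]])
sigmas = np.array([[1.0], [1.0]])
coeffs = np.array([[0.0], [0.0]])
offsets = np.array([1.0, 3.0])
y_mid = sugeno_predict([1.0], centers, sigmas, coeffs, offsets)
print(y_mid)
```

In a cluster-based setup, each cluster would typically contribute one rule, so a larger cluster radius tends to give fewer, broader rules.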

Results and Discussion
This section discusses the results obtained from the two developed AI models for predicting the ECD from the real-time drilling parameters.

ANN Results
The ANN code was optimized by testing many scenarios to find the best model parameters, which are listed in Table 3. In each code run, only one parameter option was changed, and the results were compared in terms of R and AAPE. By the end of the optimization process, the best combination of model parameters was identified: a training-to-testing ratio of 77:23 (2,743 data points for training and 827 for testing); a single hidden layer with 15 neurons; the fitting network (newfit), Levenberg-Marquardt backpropagation (trainlm), and softmax as the network, training, and transfer functions, respectively; and a learning rate of 0.12.
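The one-parameter-per-run procedure described above can be sketched as a simple one-at-a-time search. Everything in this block is illustrative: the function names, the candidate values, and in particular `toy_error`, which stands in for "train the network and return the testing AAPE" and is not a real model.

```python
def one_at_a_time_search(baseline, options, score):
    """Vary one setting at a time, keep the value with the lowest score, then move on."""
    best = dict(baseline)
    for name, candidates in options.items():
        results = {value: score({**best, name: value}) for value in candidates}
        best[name] = min(results, key=results.get)
    return best


def toy_error(cfg):
    # Hypothetical stand-in for training the ANN and measuring its testing AAPE.
    # Its minimum is placed at 15 neurons and a 0.12 learning rate purely so the
    # demo lands on the values reported in the text.
    return abs(cfg["neurons"] - 15) * 0.01 + abs(cfg["learning_rate"] - 0.12)


baseline = {"neurons": 5, "learning_rate": 0.05}
options = {
    "neurons": [5, 10, 15, 20, 25],        # hypothetical candidate list
    "learning_rate": [0.05, 0.12, 0.3],    # hypothetical candidate list
}
best = one_at_a_time_search(baseline, options, toy_error)
print(best)
```

A one-at-a-time search is cheap but can miss interactions between settings; a full grid over all combinations is the more exhaustive alternative when the run time allows it.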

ANFIS Results
The same procedure was followed to optimize the ANFIS model; however, the cluster radius and the number of iterations were the target parameters. After several runs of the ANFIS code, the optimum parameters were found to be 0.8 for the cluster radius and 300 for the number of iterations. Figure 5 displays the ANFIS results for the model training and testing processes.

ECD Empirical Correlation from the ANN Model
An empirical correlation for ECD estimation was extracted from the ANN model. The correlation estimates the ECD from the input drilling parameters using the weights and biases of the optimized ANN model. Before the correlation is used, the inputs must be normalized to the range between -1 and 1 (Equation 3):

$$X_n = 2\,\frac{X_i - X_{\min}}{X_{\max} - X_{\min}} - 1 \qquad (3)$$

where $X_n$ is the normalized value for variable $X$, $X_i$ is the value of variable $X$ at point $i$, $X_{\min}$ is the minimum value of variable $X$, and $X_{\max}$ is the maximum value of variable $X$.
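The minimum and maximum values of each parameter that are used for data normalization are presented in Table 4. The proposed empirical correlation for ECD estimation in the normalized form is presented in Equation 4. The correlation uses the weights and biases that are shown in Table 5.

$$ECD_n = \sum_{i=1}^{N} w_{2,i}\,\frac{\exp\left(\sum_{j=1}^{J} w_{1,i,j}\,X_{n,j} + b_{1,i}\right)}{\sum_{k=1}^{N}\exp\left(\sum_{j=1}^{J} w_{1,k,j}\,X_{n,j} + b_{1,k}\right)} + b_2 \qquad (4)$$

where $ECD_n$ is the normalized ECD, $N$ is the number of neurons in the hidden layer, i.e. 15, $J$ is the number of inputs, i.e. 6, $X_{n,j}$ is the normalized value of input $j$, $w_{1,i,j}$ is the weight associated with each feature between the input and the hidden layers, $w_{2,i}$ is the weight associated with each feature between the hidden and the output layers, $b_{1,i}$ is the bias associated with each neuron in the hidden layer, and $b_2$ is the bias of the output layer. The exponential ratio is the softmax transfer function of the hidden layer.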
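The obtained $ECD_n$ has to be converted to an actual ECD value; Equation 5 can be used:

$$ECD = \frac{\left(ECD_n + 1\right)\left(ECD_{\max} - ECD_{\min}\right)}{2} + ECD_{\min} \qquad (5)$$

where $ECD_n$ is the normalized ECD obtained from the developed correlation, $ECD$ is the actual value (pcf), and $ECD_{\min}$ and $ECD_{\max}$ are the minimum and maximum ECD values used for normalization.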
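The three steps of the correlation (normalize the inputs, evaluate the single hidden layer, convert the result back to pcf) can be sketched as follows. This is an illustration only: the weights, biases, and parameter ranges below are random or hypothetical placeholders rather than the values of Tables 4 and 5, the function names are invented for the sketch, and the softmax is assumed to act across the 15 hidden neurons.

```python
import numpy as np


def normalize(x, x_min, x_max):
    """Map values from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def denormalize(y_n, y_min, y_max):
    """Inverse of normalize: map [-1, 1] back to [y_min, y_max]."""
    return (y_n + 1.0) * (y_max - y_min) / 2.0 + y_min


def softmax(z):
    """Softmax across the hidden neurons, shifted for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()


def ecd_from_correlation(x, x_min, x_max, w1, b1, w2, b2, ecd_min, ecd_max):
    """Evaluate a one-hidden-layer network as a closed-form correlation."""
    x_n = normalize(np.asarray(x, dtype=float), x_min, x_max)
    hidden = softmax(w1 @ x_n + b1)          # hidden layer, shape (n_hidden,)
    ecd_n = float(w2 @ hidden + b2)          # normalized ECD
    return denormalize(ecd_n, ecd_min, ecd_max)


# Placeholder parameters: NOT the published weights or ranges.
rng = np.random.default_rng(0)
n_inputs, n_hidden = 6, 15
w1 = rng.normal(size=(n_hidden, n_inputs))
b1 = rng.normal(size=n_hidden)
w2 = rng.normal(size=n_hidden)
b2 = 0.1
x_min, x_max = np.zeros(n_inputs), np.full(n_inputs, 10.0)
x = np.full(n_inputs, 5.0)                   # GPM, ROP, RPM, SPP, WOB, T (hypothetical)
ecd = ecd_from_correlation(x, x_min, x_max, w1, b1, w2, b2, ecd_min=80.0, ecd_max=100.0)
print(f"ECD = {ecd:.2f} pcf")
```

Since the softmax outputs sum to one, the normalized ECD is a weighted average of the output-layer weights shifted by the output bias, which gives a quick sanity check for any transcribed weight table.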

Models Validation
Validating the developed models is essential, especially for practical operations in the oil and gas industry. The developed ANN and ANFIS models were validated to check their performance in predicting the ECD for unseen data. An unseen data set (1,150 points) from the same field was collected, cleaned, and fed to the models as inputs to estimate the ECD and compare the actual with the predicted values. Figure 6 presents the ECD prediction performance of the two developed models. The ANN model provided a higher accuracy level than ANFIS; however, both models showed high ECD prediction accuracy, with correlation coefficients of 0.98 and 0.96 and AAPE of 0.30% and 0.69% for ANN and ANFIS, respectively.

Comparison of the Models' Performance
The two developed machine learning models showed strong performance for the ECD prediction. However, ANN outperformed the ANFIS model, especially in the validation process, because the ANFIS model slightly underestimated the ECD. Figure 6 shows the error histogram for the two models for the three stages (training, testing, and validation). Both models have an approximately normal error distribution for training and testing, with errors ranging from -0.4 to 0.6 pcf. The validation process showed a different error distribution: ANN had a normal distribution with a range from -0.4 to 0.8 pcf, while the ANFIS errors ranged from 0 to 1 pcf, which is attributed to the underestimation of the ECD.
In addition, Figure 7 summarizes the performance of the two developed models in terms of the correlation coefficient and the average absolute percentage error between the actual and predicted ECD values for the training, testing, and validation data sets. It is clear that the ANN model performs better than ANFIS in estimating ECD for the validation data: ANN provided a correlation coefficient of 0.98 and an AAPE of 0.30%, while ANFIS had 0.96 and 0.69%.
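An error histogram of this kind, together with the mean error as a simple bias indicator, can be computed as sketched below. The sign convention (actual minus predicted, so underestimation gives positive errors), the function name, the bin edges, and the synthetic data are assumptions for illustration and do not reproduce the study's results.

```python
import numpy as np


def error_summary(actual, predicted, bin_edges):
    """Histogram of (actual - predicted) and the mean error as a bias indicator."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    counts, edges = np.histogram(errors, bins=bin_edges)
    return counts, edges, float(errors.mean())


# Synthetic example: one unbiased predictor and one that reads about 0.5 pcf low.
rng = np.random.default_rng(1)
actual = 90.0 + rng.normal(scale=2.0, size=1000)
unbiased = actual + rng.normal(scale=0.2, size=1000)
low_biased = actual - 0.5 + rng.normal(scale=0.2, size=1000)

edges = np.linspace(-1.0, 1.5, 11)
counts_a, _, bias_a = error_summary(actual, unbiased, edges)
counts_b, _, bias_b = error_summary(actual, low_biased, edges)
print(f"mean error: unbiased {bias_a:+.3f} pcf, low-biased {bias_b:+.3f} pcf")
```

A histogram that sits entirely on one side of zero, like the 0 to 1 pcf range reported for ANFIS on the validation set, points to a systematic offset rather than random scatter.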

Conclusions
The equivalent circulating density (ECD) was predicted from the real-time recordings of the surface drilling sensors by employing two different machine learning techniques (ANN and ANFIS). The input drilling data are GPM, ROP, RPM, SPP, WOB, and T. The data (3,570 points) were split to build the models with a 77:23 training-to-testing ratio (2,743 data points for training and 827 points for testing). Another data set (1,150 points) from the same field was used for the validation of the models. Several sensitivity analyses were performed to optimize the ANN and ANFIS model parameters. The following conclusions summarize the outputs of the work:
• The statistical analysis showed a wide range for all parameters, which provides a solid database for the two AI models.
• The ANN model was optimized with one hidden layer of 15 neurons, the fitting network (newfit) as the network function, Levenberg-Marquardt backpropagation (trainlm) as the training function, and softmax as the transfer function.
• The ANN results showed a high R of 0.99 for training and testing, and a low AAPE of 0.24% for training and 0.19% for testing.
• The ANFIS model has optimum parameters of 0.8 for the cluster radius and 300 for the number of iterations; the results showed R of 0.99 for training and testing, and AAPE of 0.23% for training and 0.18% for testing.
• The models' validation showed strong prediction performance for ANN and ANFIS, with R of 0.98 and 0.96 and AAPE of 0.30% and 0.69% for ANN and ANFIS, respectively.
• The empirical correlation for ECD developed from the optimized ANN model showed high accuracy for predicting the ECD in real time without the need for the ANN code.
The new contributions of this study will save time and cost in estimating ECD during real drilling operations, as the machine learning models were built on drilling data collected by the surface drilling sensors.