Application of soft computing approaches for modeling annular pressure loss of slim-hole wells in one of Iranian central oil fields

In order to have better control over the drilling process and reduce the overall cost of drilling operations, engineers have tried to use soft computing (SC) techniques to pre-estimate drilling events. It is critically important to estimate the annular pressure losses (APL) of non-Newtonian drilling muds within the annulus in order to specify pump rates and to choose the most appropriate mud pump systems while conducting drilling operations. To develop robust and accurate models for the prediction of APL, two popular models were employed, i.e., multilayer perceptron (MLP) [optimized by Levenberg–Marquardt (LM), Bayesian regularization (BR), scaled conjugate gradient (SCG), resilient back propagation (RB), and Broyden–Fletcher–Goldfarb–Shanno (BFGS)] and radial basis function (RBF). Subsequently, applying a committee machine intelligent system (CMIS), the four top models were combined into a single paradigm. Several tools such as error distribution diagrams, cross plots, trend analysis, and cumulative frequency diagrams were used in conjunction with statistical calculations to assess the efficiency of the models. Consequently, the CMIS model was introduced as the most accurate technique, having the greatest coefficient of determination (R2 close to one) as well as the lowest root-mean-square error (RMSE close to zero) for the tested dataset.

One of the crucial parts of any drilling package is drilling hydraulics, which provides the possibility to calculate the pressure profiles along the wellbore and especially in the annulus, enhancing the validity and safety of the operation (Rooki 2016; Kazemi-Beydokhti and Hajiabadi 2018; Jafarifar and Najjarpour 2021). In order to specify pump rates and choose the mud pump systems while performing the drilling process, it is highly important to accurately determine the pressure losses of non-Newtonian drilling muds within the annulus (Rooki 2015; Ozbayoglu and Sorgun 2010; Kumar et al. 2020). The pressure loss gradient in an annulus is dominated by the rheological behavior of the fluid as well as the geometry of the system (Ozbayoglu and Sorgun 2010; Kumar et al. 2020).
If the pressure losses are predicted poorly, critical drilling problems may follow, including stuck pipe, kicks, lost circulation, and inappropriate selection of rig power during drilling operations (Rooki 2015, 2016). Nonetheless, engineers have faced significant problems when estimating the frictional pressure losses in a well annulus for drilling or well control processes.
Several factors can affect the annular pressure losses (APL) during drilling, such as the rheological properties of the drilling fluid, flow rate, drilling fluid weight, flow regime, hole configuration, rotation rate, pipe eccentricity, and other variables (Jafarifar and Simi 2023; Ozbayoglu et al. 2018; Sorgun et al. 2011; Razi et al. 2013). A common method of calculating the APL relies on empirical correlations, which extend pipe flow correlations to annular geometries and non-Newtonian fluids. Furthermore, one of the critical forces that control hole cleaning in both deviated and horizontal drilling is the frictional pressure loss, which has a direct impact on shear stress (Razi et al. 2013; Saasen et al. 1998; Ozbayoglu et al. 2010).
Numerous attempts have been made in the industry to describe slim-holes in different ways. In a conventional well, the production interval may have varying diameters, between 6 1/2 and 9 5/8 inch; in a slim-hole, this value ranges from 3 1/2 to 6 inch (Jafarifar et al. 2020; Azari and Soliman 1997; Enilari et al. 2006). The pressure losses are significantly affected by the reduced annular clearance in slim-holes: while about 90% of the pressure losses take place in the bit nozzles and the drill string in a conventional well, nearly 60% of the pressure losses occur in the annulus in slim-hole wells (Jafarifar et al. 2023; Hansen et al. 1999; Sagot and Dupuis 1994; Ribeiro et al. 1994).
Reduction of the overall drilling cost is among the most significant issues in the petroleum industry (Brunsman et al. 1994; De Sousa et al. 1999). One of the most reliable ways to attain a considerable decrease in costs is slim-hole drilling (Jafarifar et al. 2020; Pks and Yerubandi 2010; Scott and Earl 1961; Cohen et al. 1995).
Different studies have been conducted so far to propose a comprehensive technique to predict the APL. Chien (1970) attempted to investigate the flow pattern transition of Bingham plastic fluids and the laminar flow pressure loss in pipes and annuli both theoretically and experimentally. In another study, Uner et al. (1989) experimentally determined the APL for both eccentric and concentric annuli. Their study showed that a considerable reduction in the volumetric flow rate of a power-law fluid took place as the inner pipe was displaced from a concentric position to an eccentric position. Haciislamoglu (1994) developed a special approach to calculate pressure losses; in this procedure, correlations were developed for single- and multi-phase fluid flow conditions as well as for additional fluid rheologies. Wang et al. (2000) examined the effects of several factors on APL in slim-hole drilling and demonstrated that some conditions may increase the APL, for instance raising the drill pipe rotation speed, increasing the annular mud flow velocities, decreasing the pipe eccentricity, and narrowing the annular clearance. Silva and Shah (2000), in an experimental study, investigated how eccentricity can affect the pressure loss in an annulus. Their research indicated that in completely eccentric annuli the APL was 18-40% lower than in concentric annuli. Zhou et al. (2005) presented a mechanistic model for underbalanced drilling with aerated muds; their hydraulic model determines the flow pattern and predicts frictional pressure losses in a horizontal concentric annulus.
Ozbayoglu and Sorgun (2010) proposed a method for APL prediction of non-Newtonian fluids in eccentric horizontal annuli. In another study, Han et al. (2010) investigated the effect of drill pipe rotation and annulus inclination on particle rising velocity, the carrying capacity of the drilling mud, and the pressure loss in a slim-hole well. According to their results, the pressure loss increased with both the mud flow rate and the rotation of the inner cylinder. In an extensive theoretical and experimental study, Kelessidis et al. (2011) investigated the pressure losses of fluids modeled as Herschel-Bulkley in eccentric and concentric annuli under laminar, transitional, and turbulent flow.
An algorithm was established by Song and Guan (2012) to predict the pressure profile during underbalanced drilling while circulating aerated drilling muds in deep water wells. Based on their research, the deep water wellbore pressure distribution was found to be nearly independent of the circulating period.
Rooki (2015) proposed a new method to estimate the pressure loss of Herschel-Bulkley drilling muds through a horizontal annulus using an artificial neural network (ANN). Furthermore, the frictional pressure losses and flow patterns of two-phase fluids in horizontal wellbores were estimated by Ozbayoglu and Ozbayoglu (2009) by means of an ANN. A general regression neural network (GRNN) introduced by Rooki (2016) was used for the prediction of APL of Herschel-Bulkley drilling muds in concentric and eccentric positions. The values predicted by the GRNN were very close to the experimental values, with an average absolute percent relative error (AAPRE) lower than 6.24% and a coefficient of determination (R2) of 0.99 for the estimation of pressure loss.
A neural network was introduced by Ozbayoglu et al. (2018) for the estimation of APL in complex geometries. The model in this study exhibited a greater precision than the other models existing in the literature. It was found that neural networks enable very accurate prediction of the pressure losses, particularly for complicated fluids and geometries. A data-driven approach was proposed by Gul et al. (2019) for the prediction of frictional pressure losses in polymer-based fluids. They compared the predictions of their data-driven approach with the experimental data as well as with the models widely applied in the industry.
Kumar et al. (2020) proposed Bayesian neural network (BNN), random forest (RF), and support vector machine (SVM) models for the prediction of APL of Herschel-Bulkley fluids in both concentric and eccentric positions. Their research underscored the performance assessment of the considered algorithms and their pitfalls in accurately predicting pressure loss. Jafarifar et al. (2020) conducted research in one of the Iranian central oil fields to study the development of water-salt-based drilling fluids for slim-hole wells. In their research, the Bingham plastic and power-law models were used to evaluate each composition, and the results obtained for each model were then compared. Lopes Pereira et al. (2022) presented an enhanced methodology to accurately model annular pressure drops in shallow ultra-extended-reach (ERD) wells.
This study aims at establishing a more convenient and reliable method to predict the APL of drilling mud flow by applying the ANN technique and a committee machine intelligent system (CMIS) model. Moreover, none of the literature models has considered such a large amount of experimental data. Two well-known ANN models [i.e., multilayer perceptron (MLP) and radial basis function (RBF)] have been developed for this purpose; the MLP model is trained with the Levenberg-Marquardt (LM), Bayesian regularization (BR), scaled conjugate gradient (SCG), resilient back propagation (RB), and Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms. Next, from among all the developed models, the four exhibiting the lowest statistical errors and greatest accuracies were selected and integrated into a unique model known as CMIS.
The main novelty of this study is the proposal of the CMIS method for this purpose, which has not been utilized or tested in previous performance comparison studies. In addition, through a new comparative investigation, the best model is identified and recommended for future applications. Statistical error analysis and graphical evaluation were used to measure the validity and accuracy of the introduced models. The effects of the five input variables on the estimation of APL were investigated through a sensitivity analysis.

Experimental database description
In the present study, a large data set encompassing 882 experimental data points for the APL was accumulated from rheology tests and the associated calculations. The input parameters included in the data bank are the inside diameter (ID), outside diameter (OD), velocity of fluid (V), the properties of the drilling mud, i.e., plastic viscosity (PV), yield point (YP), apparent viscosity (AV), and mud weight (MW), and the operating variable, i.e., mud flow rate (Q); the APL is the output parameter. Table 1 shows an example of the available experimental data, which covers the ranges of the drilling parameters; for better access to the research background, the whole databank is provided in the supplementary file. Table 2 shows the statistical descriptions of the chosen input and output parameters, i.e., 1-42 cp for PV, 0-90 lb/ft2 for YP, 8.957-11.029 ppg for MW, 125-250 gpm for Q, 4.125-6.125 inch for OD, 2.875-4.750 inch for ID, 4.042-8.543 ft/s for V, 3.591-328.242 cp for AV, and 0.009-0.679 psi/ft for APL. This dataset has been extracted from a previously conducted study by Jafarifar et al. (2020) and is applied for the analysis in the current paper. See the previous publication (Jafarifar et al. 2020) for the relevant detailed information.
The general methodology of the whole study is illustrated in Fig. 1. As can be seen in this figure, this research includes three main steps of preprocessing, processing, and post-processing, each consisting of several sub-steps. Preprocessing includes the outlier detection process, which removes all the data points that are out of the normal range. After this step, the main process of this study is performed, which incorporates several prediction methods, including RBF-ANN, MLP-ANN, and CMIS. The CMIS method incorporates several individual predictors (i.e., MLP-LM, MLP-BR, MLP-SCG, and RBF) in a single algorithm to boost the accuracy of the prediction process. In the final stage, the post-processing is performed by using the superior method (i.e., CMIS) for all the predictions to achieve the final results and calculate the ultimate prediction errors.

Multilayer perceptron
ANN is an innovative soft computing (SC) method with specific performance characteristics that resemble biological neural structures (Osman and Aggour 2003; Mohaghegh 2000; Tomiwa et al. 2019; Specht et al. 2007). An ANN model structurally consists of three different components, i.e., a transfer function, a learning algorithm, and the network architecture (Asadisaghandi and Tahmasebi 2011; Van and Chon 2018; Al-khdheeawi and Saleh Mahdi 2019). This structure can identify difficult configurations very fast and precisely. An ANN consists of processing components, which are applied to process data, and links allowing the interconnection among the components. The processing elements inside the layers are neurons, which all act in an identical manner within their particular layer (Mohammadi and Richon 2007; Cordes et al. 1995; Hinton et al. 2010; Nakamoto 2017; Ahmadi and Chen 2019). Each neuron carries two parameters, i.e., a bias and a weight. These factors are also called synaptic parameters and are usually adjusted by an iterative training method utilizing a learning principle. Moreover, a transfer function, such as Logsig or Tansig, is typically applied to activate the neuron and assess its response to an external stimulus (Maren et al. 2014; Haji-Savameri et al. 2020). Two of the most famous types of ANNs are the RBF and MLP neural networks (Hemmati-Sarapardeh et al. 2018; Varamesh et al. 2017). The main difference between the RBF and MLP lies in how the data processing is performed through the neurons. MLP techniques can recognize different patterns and are also capable of detecting complicated relationships among the inputs and outputs in the data. An MLP model includes input and output layers. The input data correspond to the input layer, and the number of neurons in the input layer is determined by the number of input variables. On the other hand, the output layer corresponds to the output of the pattern. The advantages of MLPs are:
• It can be used to solve complex nonlinear problems.
• It handles large amounts of input data well.
• It makes quick predictions after training.
• The same accuracy ratio can be achieved even with smaller samples.
In the MLP model, there are one or more hidden layers enabling communication between the input and output layers. A single hidden layer is appropriate for many problems (Hemmati-Sarapardeh et al. 2016; Ameli et al. 2018; Dashti et al. 2018; Rostami et al. 2018), and a trial-and-error technique should be used to determine the number of neurons in this layer. Generally, two hidden layers are utilized for highly intricate networks. The output of a neuron in a hidden or output layer is obtained through this procedure: initially, the inputs to the neuron are multiplied by their corresponding weights, the products are summed up, and a bias term is added to them. Eventually, the activation function is applied to transfer the obtained value.
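As a minimal sketch of the forward computation described above (weighted sum plus bias, then a transfer function), the following NumPy fragment uses purely hypothetical layer sizes and random weights:

```python
import numpy as np

def tansig(x):
    # Hyperbolic-tangent sigmoid transfer function named in the text
    return np.tanh(x)

def mlp_layer(inputs, weights, bias, activation=tansig):
    # Each neuron: weighted sum of its inputs plus a bias, then the activation
    return activation(weights @ inputs + bias)

# Hypothetical forward pass: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

hidden = mlp_layer(x, W1, b1)                               # Tansig hidden layer
output = mlp_layer(hidden, W2, b2, activation=lambda z: z)  # purelin output
```

Training then consists of adjusting W1, b1, W2, and b2 with one of the learning algorithms discussed next.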
In this study, the following learning algorithms are used to train the MLP network: the LM, BR, SCG, RB, and BFGS algorithms. Hemmati-Sarapardeh et al. (2018) have provided more details about the above-mentioned algorithms. A scheme of the MLP network employed in this study is shown in Fig. 2.

Radial basis function
The RBF is a firmly established neural network applied to both regression and classification. Indeed, the notion of RBF neural networks is based on function approximation theory. Broomhead and Lowe (1988) introduced RBF neural networks, a type of feed-forward neural network, for the first time. These networks are able to treat data that are distributed in a scattered way. They can also generalize to high-dimensional spaces very easily and give highly precise outcomes for the intended task (Broomhead and Lowe 1988), whether it is a regression or a classification problem. These networks are extensively utilized in mathematical studies investigating physical property assessment (Varamesh et al. 2017; Najafi-Marghmaleki et al. 2017). Typically, an RBF technique has a three-layer structure. In such an architecture, an input layer is linked to the output layer via a hidden layer (Panda et al. 2008).
Figure 3 shows a schematic illustration of this algorithm. The input vector is transmitted to the hidden layer by the input layer. Indeed, the input layer contains p input nodes, where p equals the number of input parameters of the technique. The main element of the RBF neural network is the hidden layer. The task of this layer is to transfer the data from the input space to a hidden space, which has a higher number of dimensions than the input space (Zhao et al. 2015; Hemmati-Sarapardeh et al. 2018). In the hidden layer, each neuron is generally centered at a certain point with a specific radius, and hence the distance between the input vector and its center is computed in each neuron (Yu et al. 2008). The central vector consists of cluster centers represented by c_ij, with j indexing the center vectors (j = 1, …, N). It is noteworthy that the value of N should be less than the quantity of input data utilized to train the technique (Panda et al. 2008). For instance, for a hypothetical model with five input variables and 200 data sets for training, i ranges from 1 to 5 and N should be less than 200. The Euclidean distance is used to measure the distance between the inputs and centers, computed as

r_j = ||x − c_j|| = [Σ_{i=1}^{p} (x_i − c_ij)²]^{1/2}

For the above-said example, p is equal to 5.
Several RBFs are widely applied. In the present research, the Gaussian function

U(r) = exp(−r² / (2σ²))

is applied as the activation function, since it presents a smooth behavior: when x = c_j, U(r) takes its maximum value, and it decreases as r increases. Consequently, the only inputs that strongly activate a neuron in the hidden layer are input vectors x located in proximity to the center of its Gaussian RBF. The spread coefficient σ of the Gaussian function indicates the width or radius of the bell-shaped curve, and it should be optimized. The network output is a weighted sum of the hidden-neuron activations,

y(x) = Σ_{j=1}^{N} w_j U(||x − c_j||)

where N is the number of neurons in the hidden layer, w_j is the linking weight, c_j indicates the center, and ||x − c_j|| is the Euclidean distance. The Gaussian RBF neural network thus includes two main parameters, i.e., the spread coefficient and the number of neurons in the hidden layer. As mentioned before, the number of neurons in the hidden layer could be as large as the number of input data sets. However, as the number of neurons increases, the complexity of the network grows, while the training error decreases. It is noteworthy that as the number of neurons increases and the spread values get smaller, an acceptable level of accuracy on the training data will result; however, the network cannot predict the testing data accurately. As a result, these variables must be optimized. In some previous studies, a trial-and-error procedure has been applied for the optimization of these parameters, while more accurate results can be obtained by optimizing them by means of a meta-heuristic algorithm.
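The Gaussian hidden-layer activation and the weighted output sum described above can be sketched as follows; the centers, weights, and spread value are illustrative only:

```python
import numpy as np

def gaussian_rbf(x, centers, spread):
    # U(r) = exp(-r^2 / (2*spread^2)), with r = ||x - c_j|| (Euclidean distance)
    r = np.linalg.norm(x - centers, axis=1)
    return np.exp(-r**2 / (2.0 * spread**2))

def rbf_predict(x, centers, weights, spread):
    # Network output: weighted sum of the N hidden-neuron activations
    return weights @ gaussian_rbf(x, centers, spread)

# Illustrative model with p = 2 inputs and N = 3 centers
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
weights = np.array([0.5, -0.2, 0.8])
y = rbf_predict(np.array([1.0, 1.0]), centers, weights, spread=0.6)
```

Note how an input at a center activates that neuron fully (U = 1), while more distant centers contribute progressively less.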

Committee machine intelligent system
In the standard approach to intelligent techniques, a large number of alternative models are built, the most suitable model is finally chosen, and the remaining models are discarded. In such a condition, the effort spent training the discarded models is wasted. We can overcome this shortcoming by combining the intelligent models into a committee machine (committee of machines). The CMIS is a type of ANN proposed by Nilsson (1965). The philosophy of this technique is to divide and conquer in order to solve a problem. Committee machines are generally divided into two categories, static and dynamic structures (Haykin 2007; Dashti et al. 2018; Shokrollahi et al. 2015). The approach adopted in the current research belongs to the static structure category. In fact, the solutions provided by different models are merged in such a way as to obtain a more suitable and precise solution. The question posed here is how to combine the various solutions (models) in order to achieve the best model. One way to do this is through a linear combination of the solutions by a simple averaging procedure or through weighted averaging (Ghiasi-Freez et al. 2012; Perrone and Cooper 1992; Hashem and Schmeiser 1993). In the simple averaging procedure, all the solutions have an equal contribution. This is exactly the main deficiency of this averaging procedure, since the most precise solution should provide the highest contribution to the final solution. In the weighted averaging procedure, the solutions are unified according to their level of exactness, and the sum of the linear combination coefficients amounts to one (Shokrollahi et al. 2015). In the present research, a modified weighted averaging was used, in which a bias term is added to the linear combination.
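The weighted-averaging idea can be sketched as below; the model outputs and weights are hypothetical, and in practice the weights and bias would be fitted, e.g., by multiple linear regression against the measured APL values:

```python
import numpy as np

def cmis_combine(predictions, weights, bias=0.0):
    # Weighted linear combination of the individual model outputs plus a bias
    return np.asarray(predictions).T @ np.asarray(weights) + bias

# Hypothetical outputs of four trained models for three samples
preds = [np.array([0.10, 0.20, 0.30]),   # e.g., MLP-LM
         np.array([0.11, 0.19, 0.31]),   # e.g., MLP-BR
         np.array([0.09, 0.21, 0.29]),   # e.g., MLP-SCG
         np.array([0.10, 0.20, 0.30])]   # e.g., RBF
combined = cmis_combine(preds, [0.4, 0.3, 0.2, 0.1])
```

With equal weights of 0.25 and zero bias, the combination reduces to the simple averaging procedure criticized above.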

Levenberg-Marquardt algorithm
One of the most well-known techniques in MLP modeling for finding the optimal weight and bias coefficients is the LM algorithm, which is used for nonlinear least-squares problems. The algorithm searches for local minima, which are not necessarily the same as the global minima; nevertheless, even when starting far from the final minimum, the LM technique may still find the solution of the problem. It is not necessary to compute the exact Hessian matrix. Put differently, the Hessian matrix can be approximated and the gradient calculated using the following formulas, as indicated in Eqs. (7) and (8):

H ≈ JᵀJ,  g = Jᵀe

where T, J, and e represent the matrix transpose operation, the Jacobian matrix, and the error vector of the network, respectively. By executing a Newton-like formula, the following update equation is obtained for the interconnection weights:

x_{k+1} = x_k − [JᵀJ + μI]⁻¹ Jᵀe

where x and μ indicate the interconnection weights and the coefficient of the Newton-like update, respectively. After every successful stage, the μ value, and hence the performance function, is reduced. The readers are referred to the existing literature for the related information (Hagan and Menhaj 1994).
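A minimal sketch of one LM iteration, assuming the standard Hagan-Menhaj form with a fixed damping coefficient mu (in practice mu is adapted between iterations):

```python
import numpy as np

def lm_step(x, jacobian, residuals, mu):
    # Approximate Hessian H ~= J^T J and gradient g = J^T e, then apply the
    # Levenberg-Marquardt update x_new = x - (J^T J + mu*I)^-1 J^T e
    J = jacobian(x)
    e = residuals(x)
    H = J.T @ J
    g = J.T @ e
    return x - np.linalg.solve(H + mu * np.eye(len(x)), g)

# Toy least-squares problem: fit y = a*t + b to data (exact solution a=2, b=1)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * t + 1.0
residuals = lambda x: x[0] * t + x[1] - y
jacobian = lambda x: np.column_stack([t, np.ones_like(t)])

x = np.zeros(2)
for _ in range(20):
    x = lm_step(x, jacobian, residuals, mu=0.01)
```

For this linear toy problem the iteration converges to the exact coefficients within a few steps.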

Bayesian regularization algorithm
Based on the LM algorithm, the popular training algorithm known as BR is used for finding the weight and bias parameters (MacKay 1992; Foresee and Hagan 1997).
In this technique, the weights should be obtained in such a way that the least-squares error is minimized. Therefore, the network can be well generalized by a suitable joint adjustment of the squared weights and errors (Pan et al. 2013). The objective function incorporating the neural weights is obtained as illustrated in Eq. (10) (Yue et al. 2011; Rostami et al. 2018):

F(x) = βE_D + αE_x

where α and β, E_D, E_x, and F(x) represent the coefficients of the objective function, the sum of squared network errors, the sum of squared network weights, and the objective function, respectively. The weight vector and the training set distributions are assumed to follow Gaussian distributions.
Using the Bayes hypothesis, the aforesaid variables β and α are estimated assuming a normal weight space. In order to minimize F(x), the computations are conveyed to the LM phase; as a result, it is possible to update the weight space. If the stopping condition is not satisfied, this process continues with the estimation of new β and α values (Yue et al. 2011).
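The regularized objective can be sketched as follows; the error and weight vectors, as well as alpha and beta, are illustrative only (in the full BR procedure, alpha and beta are re-estimated iteratively as described above):

```python
import numpy as np

def bayesian_regularized_objective(errors, weights, alpha, beta):
    # F(x) = beta*E_D + alpha*E_x: E_D is the sum of squared network errors
    # and E_x is the sum of squared network weights
    E_D = np.sum(np.square(errors))
    E_x = np.sum(np.square(weights))
    return beta * E_D + alpha * E_x

# Illustrative values; with alpha = 0 the objective reduces to plain
# (scaled) least squares, i.e., no penalty on large weights
errors = np.array([0.1, -0.2])
weights = np.array([0.5, 1.0, -0.5])
F = bayesian_regularized_objective(errors, weights, alpha=0.01, beta=1.0)
```

The weight penalty is what discourages large synaptic weights and improves generalization.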

Scaled conjugate gradient algorithm
In order to tune the weight space, the conventional backpropagation method moves along the most negative descent direction. The performance function decreases most rapidly in this way, but this does not necessarily yield the highest convergence speed (Yue et al. 2011). Accordingly, a greater convergence speed can be achieved using the conjugate gradient method, while the descent achieved in all the previous stages is preserved (Kişi and Uncuoğlu 2005). The first search direction is the negative of the gradient:

p_0 = −g_0

where p and g_0 represent the conjugate direction and the initial gradient, respectively. In order to move along the present search direction, the most proper distance is specified through optimization of the step size, calculated by the line search method (Kişi and Uncuoğlu 2005; Rostami et al. 2018):

x_{k+1} = x_k + α_k p_k

Subsequently, the next search direction is specified by combining the new steepest descent direction with the previous search direction, as indicated in the following principle (Kişi and Uncuoğlu 2005; Hagan et al. 1996):

p_k = −g_k + β_k p_{k−1}
The conjugate algorithms are categorized according to the way the coefficient β_k is determined (Kişi and Uncuoğlu 2005). Unlike most conjugate gradient methods, the SCG method avoids the time-consuming line search at every iteration, which makes it computationally cheaper. This algorithm is obtained by combining the conjugate gradient approach with the trust-region idea. You can refer to the literature (Møller 1990; Rostami et al. 2018) for more details.
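The direction-update principle shared by conjugate gradient methods can be illustrated on a small quadratic problem; the Fletcher-Reeves beta rule and the exact step size used here are one common choice for illustration, not necessarily those of SCG itself:

```python
import numpy as np

def fletcher_reeves_beta(g_new, g_old):
    # One common rule for beta; conjugate variants differ in this choice
    return (g_new @ g_new) / (g_old @ g_old)

def conjugate_direction(g_new, g_old, p_old):
    # p_k = -g_k + beta_k * p_{k-1}; the first direction is p_0 = -g_0
    return -g_new + fletcher_reeves_beta(g_new, g_old) * p_old

# Toy quadratic f(x) = 0.5*x^T A x - b^T x with gradient g = A x - b
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.zeros(2)
g = A @ x - b
p = -g
for _ in range(2):                  # CG solves an n-D quadratic in n steps
    alpha = (g @ g) / (p @ A @ p)   # exact step size for a quadratic
    x = x + alpha * p
    g_new = A @ x - b
    p = conjugate_direction(g_new, g, p)
    g = g_new
```

On this two-dimensional quadratic, two conjugate steps reach the exact minimizer, illustrating why conjugate methods outperform plain steepest descent.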

Resilient backpropagation algorithm
Various transfer functions, such as Sigmoid or Tansig, are used in the MLP neural network. These functions reduce an infinite input domain to a limited output domain. In an activation function such as Tansig, for a large input the slope of the curve approaches zero. This may lead to problems when using steepest descent for training the network: because of the minor magnitude of the gradient, only small variations occur in the weights and biases, even when they are far from their optimal values. The RB algorithm has been suggested to solve these problems by eliminating the adverse effects of the magnitudes of the partial derivatives, using only their signs to determine the direction of the weight updates (Riedmiller and Braun 1993).
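A sketch of the Rprop per-weight update rule, using commonly quoted default factors (1.2 and 0.5) and illustrative step bounds:

```python
import numpy as np

def rprop_update(weight, step, grad, grad_prev,
                 eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    # Rprop adapts a per-weight step size from the SIGN of the gradient only,
    # so a vanishing gradient magnitude (flat Tansig tails) cannot stall it
    if grad * grad_prev > 0:       # same sign as before: accelerate
        step = min(step * eta_plus, step_max)
    elif grad * grad_prev < 0:     # sign change: we overshot, shrink the step
        step = max(step * eta_minus, step_min)
        grad = 0.0                 # skip the weight update after a sign change
    weight -= np.sign(grad) * step
    return weight, step, grad

# Minimize f(w) = w^2 (gradient 2w) from a hypothetical starting point
w, step, g_prev = 5.0, 0.1, 0.0
for _ in range(60):
    w, step, g_prev = rprop_update(w, step, 2.0 * w, g_prev)
```

The step size grows while the gradient keeps its sign and shrinks after each overshoot, so the weight converges regardless of how flat the error surface is.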

Broyden-Fletcher-Goldfarb-Shanno algorithm
The BFGS method is among the most efficacious quasi-Newton methods for unconstrained nonlinear problems in the extensive area of nonlinear programming. In general, BFGS differs from the Newton method in that an approximation to the Hessian matrix is used instead of the true Hessian matrix H (Bazaraa et al. 2013; Nezhad et al. 2013). The minimization in conventional Newton methods is achieved by calculating the gradient as well as the Hessian matrix of second derivatives when optimizing the function. Moreover, the Hessian matrix must be calculated and inverted, which normally takes considerable time. However, the updating and inversion of the Hessian in quasi-Newton (QN) methods are performed through the analysis of the gradient vectors, resulting in a reduction of the objective function with great success.
QN methods are capable of building a model of the objective function by assessing the variations in the gradient; this class of algorithms has been shown to have super-linear convergence, which improves on steepest descent, particularly when solving tough problems. Furthermore, in QN methods it is not necessary to compute the second derivatives, in contrast to the conventional Newton methods; thus, QN methods are considered more efficient. Some famous types of QN methods have been widely applied by different researchers, including the Broyden class (BC), Davidon-Fletcher-Powell (DFP), the symmetric rank-one formula (SR1), and the BFGS model (Zhang et al. 2020). In this study, the BFGS algorithm has been employed.
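The BFGS inverse-Hessian update can be sketched on a small quadratic test problem; with exact line searches, the iteration reaches the minimizer of an n-dimensional quadratic in n steps:

```python
import numpy as np

def bfgs_inverse_update(H_inv, s, y):
    # Rank-two BFGS update of the INVERSE Hessian approximation, built only
    # from the step s = x_new - x_old and the gradient change y = g_new - g_old
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H_inv @ V.T + rho * np.outer(s, s)

# Toy problem: minimize f(x) = 0.5*x^T A x - b^T x, whose gradient is A x - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
H_inv = np.eye(2)                      # initial guess for the inverse Hessian
g = grad(x)
for _ in range(2):                     # n steps suffice for an n-D quadratic
    p = -H_inv @ g                     # quasi-Newton search direction
    alpha = -(g @ p) / (p @ A @ p)     # exact line search for the quadratic
    x_new = x + alpha * p
    g_new = grad(x_new)
    H_inv = bfgs_inverse_update(H_inv, x_new - x, g_new - g)
    x, g = x_new, g_new
```

Note that only gradient differences enter the update; the true Hessian is never formed or inverted, which is the key economy of the QN family.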

Prediction of APL by the developed models and CMIS method
At the initial step, the statistical error of each model was examined. Next, the graphical error was assessed for each model to validate their accuracy and adequacy. In this study, 882 experimental data points for the APL were collected from three slim-hole wells. In all these models, the obtained data have been divided into training and test sets: 80% of the data (706 data points) were used as the training set, and the remaining 20% (176 data points) were applied as the test set to evaluate the exactness of the proposed method. The division of the gathered data into the training and testing sets was completely randomized in order to prevent the agglomeration of the selected data in one region of the problem domain. PV, YP, Q, OD, ID, AV, V, and MW were chosen as the input parameters of the models, and APL was selected as the output. The models developed here are called MLP and RBF, and the learning algorithms proposed for the MLP models are LM, BR, SCG, RB, and BFGS. From among all the designed networks, the most optimal architecture of the MLP-based models is 8-10-6-1: the first number represents the number of inputs, the second and third numbers indicate the numbers of neurons in the first and second hidden layers, and the last number denotes the number of outputs. The use of two hidden layers in the MLP model is the most effective selection, with the Logsig and Tansig transfer functions applied for the first and second hidden layers, respectively, and the purelin transfer function applied for the output layer.
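The randomized 80/20 split described above can be sketched as follows; the seed is arbitrary, and the 706/176 counts match the partition of the 882 points:

```python
import numpy as np

def random_split(n_samples, train_fraction=0.8, seed=42):
    # Shuffle the indices so neither subset clusters in one region of the data
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = random_split(882)   # 706 training / 176 testing points
```

The returned index arrays are disjoint and together cover the whole databank.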
RBF models involve two essential parameters, i.e., the spread coefficient and the number of neurons, which were optimized by a trial-and-error practice. The spread coefficient for the RBF amounted to 0.6, while the number of neurons is 360. The statistical parameters of the testing, training, and total data sets computed for the ANN models developed in the current study are presented in Table 3. As can be seen, the highest accuracy (R2 = 0.994), the least error (AAPRE = 4.992, root-mean-square error RMSE = 0.009), as well as the lowest standard deviation (SD = 0.139) were yielded by the CMIS model, suggesting that the CMIS method has the highest ability to predict the APL compared to the other models. Assessment of the data presented in Table 3 indicates that, from among all the developed models, the CMIS method is the most accurate, as the AAPRE values obtained by this method on the testing, training, and total data points were the lowest. However, from among all the proposed models, MLP-BFGS gives the highest AAPRE.
The models ranking after CMIS are RBF, MLP-BR, MLP-LM, MLP-SCG, MLP-RB, and MLP-BFGS, respectively. In the current study, six intelligent models were established, and finally the four best models, namely MLP-LM, MLP-BR, MLP-SCG, and RBF, were integrated into a single model known as the CMIS method. A schematic illustration of the CMIS method developed in the current study is shown in Fig. 4. A multiple linear regression is used to find the optimum coefficients of the CMIS method, resulting in the following equation for the CMIS model:

APL_CMIS = a1·APL_MLP-LM + a2·APL_MLP-BR + a3·APL_MLP-SCG + a4·APL_RBF + a5

where a1 to a5 are as follows: a1 = 0.56594349; a2 = 0.24483030; a3 = 0.05099936; a4 = 0.13861489; a5 = −0.00008577.
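Assuming the reported coefficients a1-a4 map to the four individual models in the order listed (MLP-LM, MLP-BR, MLP-SCG, RBF) and that a5 acts as the bias term of the linear combination, the committee output can be sketched as:

```python
import numpy as np

# Reported CMIS coefficients; the mapping of a1-a4 to the four models is an
# assumption here (order: MLP-LM, MLP-BR, MLP-SCG, RBF), with a5 as the bias
a = np.array([0.56594349, 0.24483030, 0.05099936, 0.13861489])
bias = -0.00008577

def cmis_apl(apl_mlp_lm, apl_mlp_br, apl_mlp_scg, apl_rbf):
    preds = np.array([apl_mlp_lm, apl_mlp_br, apl_mlp_scg, apl_rbf])
    return a @ preds + bias

# The four weights sum to ~1.0004, so if all models agreed exactly the
# committee output would stay close to that common value
apl = cmis_apl(0.2, 0.2, 0.2, 0.2)
```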

Statistical evaluation of the models
Once a new model is built, its predictive performance must be checked. Accordingly, the degree of inaccuracy, or of fitness to the measured data, is needed to evaluate every model. The major standard for measuring the fitness of the data is the AAPRE, although several other measures of deviation are available in every modeling investigation, i.e., RMSE, R2, SD, and the average percent relative error (APRE). As the degree of deviation from reality decreases (approaching zero) and the degree of fitness between the model and the empirical data increases (approaching one), the model performs satisfactorily. The criteria applied in the present study to assess the models are defined in Eqs. (15)-(19):

1. Average absolute percent relative error (AAPRE):

$$\mathrm{AAPRE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{x_i - y_i}{x_i}\right| \quad (15)$$

2. Root-mean-square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2} \quad (16)$$

3. Coefficient of determination (R2):

$$R^2 = 1 - \frac{\sum_{i=1}^{N}(x_i - y_i)^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2} \quad (17)$$

4. Average percent relative error (APRE):

$$\mathrm{APRE} = \frac{100}{N}\sum_{i=1}^{N}\frac{x_i - y_i}{x_i} \quad (18)$$

5. Standard deviation (SD):

$$\mathrm{SD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{x_i - y_i}{x_i} - \bar{E}\right)^2} \quad (19)$$

Here, x_i denotes the actual value, y_i the estimated value, x̄ the average actual value, Ē the mean relative error, and N the number of data points. The reliability of the suggested models is indicated by low APRE, AAPRE, RMSE and SD values and a high R2 value for the present dataset. Moreover, since the AAPRE is a relative measure that is not affected by the number of data points, it is selected as the most essential criterion in our study (Wu et al. 2020; Shi et al. 2020). As seen in Table 3, the CMIS model gives the lowest APRE, AAPRE, RMSE and SD values compared with the other proposed methods. The suggested CMIS model can also predict all data points with very high accuracy, its RMSE being equal to 0.009. Moreover, all the techniques developed in the current study are capable of predicting the APL with a high degree of accuracy, and the experimental data agree with the suggested models. Most importantly, the greatest accuracy in predicting the APL is provided by the CMIS.
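The five evaluation criteria of Eqs. (15)-(19) are straightforward to compute. A minimal NumPy sketch, using x for the measured APL values and y for the model estimates (illustrative numbers only), is:

```python
import numpy as np

def aapre(x, y):
    # Eq. (15): average absolute percent relative error
    return 100.0 / len(x) * np.sum(np.abs((x - y) / x))

def rmse(x, y):
    # Eq. (16): root-mean-square error
    return np.sqrt(np.mean((x - y) ** 2))

def r2(x, y):
    # Eq. (17): coefficient of determination
    return 1.0 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2)

def apre(x, y):
    # Eq. (18): average percent relative error (signed, shows bias)
    return 100.0 / len(x) * np.sum((x - y) / x)

def sd(x, y):
    # Eq. (19): standard deviation of the relative errors
    e = (x - y) / x
    return np.sqrt(np.sum((e - e.mean()) ** 2) / (len(x) - 1))

x = np.array([1.0, 2.0, 4.0])  # measured APL (illustrative)
y = np.array([1.1, 1.9, 4.2])  # model estimates (illustrative)
```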

Graphical error analysis
Several illustrations are used to assess model capability in order to provide a better evaluation; this visual survey complements the statistical assessment of the models proposed for predicting the APL. The most important plot is the cross plot, which evaluates the accuracy of the predicted values according to R2: the closeness of the data to the unit-slope line confirms the quality and accuracy of the prediction. Another approach is the error distribution plot, in which the predicted parameter is drawn against the percent relative error. Here, the distribution of the errors around the zero-error line is illustrated, and the error trend is examined to check whether the errors scatter symmetrically about the zero-error line. Another valuable tool for graphically investigating the accuracy of the predictions is the cumulative frequency diagram, in which the cumulative frequency is plotted versus the ARE. The more steeply the cumulative frequency rises, the larger the portion of the estimates that falls within a given error range.
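The cumulative frequency diagram described above reduces, for each error threshold, to the fraction of predictions whose ARE does not exceed that threshold. A short sketch with illustrative data:

```python
import numpy as np

def cumulative_frequency(x, y, thresholds):
    """Fraction of predictions whose absolute relative error (ARE, in
    percent) is at or below each threshold."""
    are = 100.0 * np.abs((x - y) / x)
    return np.array([(are <= t).mean() for t in thresholds])

x = np.array([10.0, 20.0, 30.0, 40.0])  # measured APL (illustrative)
y = np.array([10.2, 21.5, 29.0, 47.0])  # model estimates (illustrative)
cf = cumulative_frequency(x, y, [2, 5, 10, 20])
# The AREs here are 2.0%, 7.5%, 3.33% and 17.5%, so cf = [0.25, 0.5, 0.75, 1.0]
```

Plotting cf against the thresholds for each model reproduces the comparison of Fig. 7: the curve that rises fastest belongs to the most accurate model.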
Figure 5 compares the experimental data with the model estimates of the APL, illustrated as the graphical error analysis conducted for the seven models developed in this study. The data are shown for both the training and test sets versus the experimental values. Clearly, the data of all the models are concentrated around the unit-slope line, indicating good agreement between the modeling and experimental results. No significantly dispersed data are seen for any of the proposed techniques, while the CMIS model shows the tightest clustering of data near the unit-slope line. The RBF model also performs well, with most of its data distributed close to the 45° line. In addition, because this distribution is relatively symmetrical, no significant overestimation or underestimation is observed for the predicted APL values. Accordingly, the most accurate model is the CMIS method, which has the greatest R2 value for both the test and training datasets and yields the highest density of data close to the 45° line. To study the reliability of the introduced CMIS method and the suggested models, the error distribution plots were drawn as shown in Fig. 6, which presents the percent relative error (PRE) versus the APL for all the models. As can be seen, the tightest clustering of data around the zero-error line occurs for the CMIS method, for which most of the data points are concentrated on the zero-error line. In contrast, of all the models, MLP-RB and MLP-BFGS show the highest error values for both the testing and training sets.
Furthermore, investigation of the MLP-BFGS and MLP-RB models shows that their data points are distributed below the zero-error line, indicating that these models overestimate the APL. This finding emphasizes the higher efficiency, validity and usefulness of the CMIS method in predicting the APL. Another way to verify the capability of the developed models is the cumulative frequency plot displayed in Fig. 7, which shows the cumulative frequency of the data points versus the absolute relative error (ARE) for the developed models. According to this figure, the CMIS method gives the highest cumulative frequency at a given ARE compared with the other methods. For instance, at ARE = 5%, around 75% of the CMIS, 71% of the RBF, 69% of the MLP-BR, 68% of the MLP-LM, 63% of the MLP-SCG, 59% of the MLP-RB and 57% of the MLP-BFGS estimates have an estimation error of 5% or less. As observed in the same figure, over 85% of the values predicted by the CMIS model show an ARE of lower than 10%. These results indicate that the CMIS model can predict the APL with great success. As a further comparison among the models, the group error plot was used. As shown in Fig. 8, the CMIS model has the least error and the most uniformity. After the CMIS model, MLP-BR is the most accurate owing to its lower AAPRE, in the range of 0.23-0.40.

Accuracy and validity of the developed models and CMIS method
The method introduced in the current study is very helpful owing to the high reliability and accuracy of the CMIS method, given that various models have been combined and a large database has been used for its development. It should be noted that, like all other data-driven methods, the models established here depend on the input data. Although a large number of data of various types were used to develop the models, the error for out-of-range data points may be higher than the values reported in this research; hence, more care must be exercised when applying these methods.
The statistical errors of all the intelligent models, including AAPRE, SD, RMSE, and R2, are shown in Fig. 9. As seen in Fig. 9a, the lowest AAPRE value, equal to 4.992%, belongs to the CMIS. It can be concluded that, among the MLP algorithms, BR and LM perform better than SCG, RB and BFGS. Figure 9b illustrates the SD values of all the models: the CMIS and MLP-SCG techniques show low SD values of 0.022 and 0.139, respectively, while MLP-BFGS and MLP-BR show the greatest SD values. Figure 9c shows the RMSE values of all the models: among the MLP variants, the model optimized with the LM technique achieves the lowest RMSE, while the CMIS model gives the minimum error overall. Finally, the last statistical parameter, shown in Fig. 9d, is R2. The greatest value, 0.994, was obtained for the CMIS. For the other techniques, the results associated with the various optimization algorithms fall within the acceptable range of 0.968 for RBF to 0.991 for MLP-LM; in general, models with an R2 greater than 0.90 demonstrate good accuracy.
The trend analysis shown in Fig. 10 was performed to investigate the reliability of the suggested techniques. It was conducted for all the developed models and indicates the changes in APL against MW, PV, Q, OD, and ID in subplots (a) to (e), respectively. As shown in Fig. 10a, the APL estimates of the CMIS model agree with the experimental data and follow a similar trend (as MW increases, APL increases), whereas some of the other models, namely MLP-RB and MLP-BFGS, are not able to predict the changes in APL properly. According to Fig. 10b, all the models developed here follow the same trend as the experimental data with increasing PV, with the CMIS method exhibiting the best trend. Figure 10c shows the effect of flow-rate changes on APL. The APL predicted by the CMIS method clearly fits the experimental data best, so it can be concluded that this model gives the best results, whereas MLP-RB and MLP-SCG show the largest differences. As seen in Fig. 10d, the CMIS method predicts the APL very precisely with changes in OD, while the deviation of the MLP-SCG algorithm from the experimental data is quite significant. According to Fig. 10e, APL decreases with increasing ID, and again the CMIS is clearly more accurate than the other models.
Of the different factors influencing the APL, MW, PV, and Q were found to affect the APL positively (as shown in Fig. 10a-c), whereas the APL trends negatively against OD and ID, as indicated in Fig. 10d and e. Once more, the higher accuracy of the CMIS model developed here for predicting APL values with changes in the inputs is demonstrated. Finally, the tools suggested in the current study can be of great value for effectively estimating the APL when simulating upstream processes.

Conclusions
The purpose of this study was to apply different optimized intelligent techniques to the prediction of the APL. Measuring the pressure losses in a well annulus experimentally is time-consuming. The model proposed in this study is therefore a reliable and inexpensive tool that can be used to estimate the APL in the petroleum industry very quickly. All the results obtained here were compared with the experimental data by means of statistical and graphical error evaluations. The following conclusions are drawn from this study: 1. Both the MLP and RBF models provided good estimates of the APL. Among the MLP algorithms, the MLP-BR and MLP-LM models have the highest accuracy.

Fig. 7 Cumulative frequency of absolute relative error in various models for estimation of the APL

Fig. 9 Statistical errors of the suggested intelligent techniques

2. The new CMIS method developed to forecast the APL can predict the APL values very accurately (AAPRE of 4.992% and RMSE of 0.009). 3. The validity of the models suggested in the current study can be ranked in the following order: CMIS > RBF > MLP-BR > MLP-LM > MLP-SCG > MLP-RB > MLP-BFGS. 4. Based on the sensitivity and reliability analyses, the best model evaluated in the present research is the CMIS model, since it demonstrated a satisfying performance in predicting the APL.

Fig. 10 Trend analysis: a effect of mud weight on APL, b effect of plastic viscosity on APL, c effect of flow rate on APL, d effect of outside diameter on APL, and e effect of inside diameter on APL

Table 2 Statistical description of the chosen input and output parameters

Table 3 Statistical parameters of the suggested models for estimation of annular pressure loss