
Arabian Journal for Science and Engineering (2022) 47:11953–11985

https://doi.org/10.1007/s13369-022-06765-x

RESEARCH ARTICLE-PETROLEUM ENGINEERING

Developing a New Model for Drilling Rate of Penetration Prediction Using Convolutional Neural Network

Morteza Matinkia1 · Amirhossein Sheykhinasab2 · Soroush Shojaei3 · Ali Vojdani Tazeh Kand4 · Arad Elmi5 · Mahdi Bajolvand6 · Mohammad Mehrad6

Received: 23 September 2021 / Accepted: 28 February 2022 / Published online: 6 April 2022
© King Fahd University of Petroleum & Minerals 2022

Abstract
Before adjustable parameters of drilling can be optimized, it is necessary to have a high-accuracy model for predicting the
rate of penetration (ROP), which can represent the effects of drilling parameters and formation-related factors on the ROP.
Accordingly, the present research attempts to use different algorithms, including convolutional neural network (CNN), simple
form of least square support vector machine (LSSVM) and its hybrid forms with either particle swarm optimization (PSO),
cuckoo optimization algorithm (COA), or genetic algorithm (GA), and also hybrids of multilayer extreme learning machine
with either of COA, PSO, or GA, to model ROP based on mud-logging and petrophysical data along two wells (Wells A
and B). For this purpose, firstly, median filtering was applied to the data for the sake of denoising. Next, petrophysical logs
were upscaled to make their scales matched to that of mud-logging data. Then, the nondominated sorting genetic algorithm
(NSGA-II) was combined with a multilayer perceptron (MLP) neural network to select the set of the most significant features
for estimating the ROP on the data from Well A. Feature selection results showed that the accuracy of the estimator models
increases with the number of parameters to a maximum of seven, beyond which only subtle enhancements were seen in the
modeling accuracy. Accordingly, the ROP was modeled using depth, bit rotary speed, mud weight, weight on bit, compressive
wave slowness, total flow rate, and neutron porosity. Training the hybrid, CNN, and LSSVM algorithms using the training
data from Well A showed that the model built upon the CNN algorithm tends to produce the smallest root-mean-square error
(RMSE = 1.7746 ft/h), as compared to the other models. In addition, the smaller difference in error between the training and
testing phases (RMSE = 2.5356 ft/h) for this model indicates its high generalizability. This fact was proved by the lower
estimation error of this model for predicting the ROP at Well B, as compared to other models.

Keywords ROP prediction · Deep learning · Petrophysical logs · Mud logging · Feature selection

Corresponding author: Mahdi Bajolvand, [email protected]

Morteza Matinkia, [email protected]
Amirhossein Sheykhinasab, [email protected]
Soroush Shojaei, [email protected]
Ali Vojdani Tazeh Kand, [email protected]
Arad Elmi, [email protected]
Mohammad Mehrad, [email protected]

1 Department of Petroleum Engineering, Omidiyeh Branch, Islamic Azad University, Omidiyeh, Iran
2 ACECR Institute of Higher Education (Isfahan Branch), Isfahan, Iran
3 Department of Petroleum Engineering, Semnan University, Semnan, Iran
4 Islamic Azad University Central Tehran Branch, Hashemi Rafsanjani Complex, Tehran, Iran
5 Faculty of Engineering, Islamic Azad University of Quchan, Quchan, Iran
6 Faculty of Mining, Petroleum and Geophysics Engineering, Shahrood University of Technology, Shahrood, Iran


1 Introduction

Rate of penetration (ROP) is among the most important performance indicators used in the drilling of oil and gas wells. A large number of factors affect the ROP in a particular drilling operation [1]. These can be categorized under six general categories, namely rig productivity, hydraulic parameters of the bit, drilling fluid properties, human factors, and formation properties. A known challenge in the evaluation of ROP is the relatively large number of effective factors, some of which are even impossible to measure. The ROP is important due to the close associations between the drilling depth, ROP, and cost of the drilling operation, not to mention that the drilling cost increases with depth exponentially [2]. Accordingly, researchers have attempted to formulate the relationship between controllable operating parameters (e.g., weight on bit (WOB), bit rotary speed per minute (RPM), and total flow rate (TFLO)) and non-controllable operating parameters (e.g., formation drillability and intact rock resistivity over the drilling trajectory) to predict the ROP and hence find the optimum conditions for the ROP [3]. A review of the existing literature shows that, despite the widespread efforts made to develop relationships and models for estimating the ROP, there are still active studies in the field, and various new methods are implemented and analyzed by academics and industrial practitioners. Without any doubt, achieving a comprehensive model of ROP prediction that can provide for adequate levels of accuracy and generalizability can serve as the most important objective for such studies. According to a broad classification scheme, ROP prediction models can be categorized under two main classes, namely physics-based models (PBMs) and machine learning-based intelligent methods (MLBIMs).

The PBMs for ROP prediction are, indeed, mathematical equations that have been once developed based on particular datasets. In this regard, these models clearly reflect the relationships among different parameters and the way they affect the ROP. Popular examples of PBMs for ROP prediction include those proposed by Burgoyne and Young (1994) and Warren et al. (1986) for roller cone bits, those published by Harreland and Rampersad (1999), Motahhari et al. (2008), and Al-AbdulJabbar et al. (2018) for polycrystalline diamond cutter (PDC) bits, and the one developed by Bingham (1996) for both bit types [4–9]. These models are described in detail in "Appendix A". However, implementation of these models has always been accompanied with different challenges. PBMs are developed on the basis of sets of experimental results and well data. In many cases, this has led to reduced levels of accuracy because of differences in the formation type [10]. Furthermore, the need for parameters that could be particularly measured in the laboratory (e.g., formation drillability or geometrical parameters of the bit) or were commonly not measured or lacked a standard measurement method in the field (e.g., bit abrasion) has limited the applicability of such models extensively. Another challenge of using such models (which is especially the case for the Burgoyne and Young model) is the relatively high number of constants that impose significant impacts on the overall modeling accuracy, as reflected by the large number of studies reporting on optimization of such coefficients [11–14].

The MLBIMs constitute another group of methods used for estimating the ROP. Thanks to their high flexibility in terms of input parameters and their powerful capability for modeling complex nonlinear associations, these methods have been increasingly regarded by researchers in the recent past. On top of this, these approaches allow for taking into account a larger number of parameters, as compared to the analytic models. Exploring the patterns within the data, these approaches may lead to models of higher accuracy and generalizability compared to analytical models [15]. Bilgesu et al. were the first to use an artificial neural network (ANN) for ROP estimation [16]. Multilayer perceptron (MLP) and radial basis function (RBF) ANNs have been widely used by researchers in this field [17, 18]. As a variant of neural networks (NNs), extreme learning machine (ELM) has been used for ROP estimation in various research works because of its high computation pace coupled with excellent accuracy in modeling nonlinear behavior of variables, usually producing higher accuracy than the competitors [19–22]. In contrast, Ahmed et al. showed that the least square support vector machine (LSSVM) could result in high-accuracy estimates of ROP [23]. Bani Mustafa et al. used second-order multivariate regression to establish a mathematical equation among different controllable drilling parameters (WOB, RPM, and TFLO) based on the drilling data along several wells drilled in southern Iraq. Then, they implemented the response surface methodology (RSM) to identify the optimal range for each of the controllable parameters for the purpose of maximum ROP. Outcomes of this study further highlighted the significant role of the estimator model in determining optimal values of the drilling parameters and hence the cost and time of the drilling operation [24]. Tewari et al. used the data from three adjacent wells in a hydrocarbon field in Norway to develop a model for estimating the ROP, based on which the best operating parameters for bit selection could be determined. In this study, the authors used ANN and RSM to develop the predictor model. They further applied the artificial bee colony algorithm and genetic algorithm (GA) to select the best possible values of the controllable parameters. Results of this research suggested that the development of a high-accuracy model can serve as a key in the bit selection process—a factor that plays a determinant role in final drilling operation cost [25].

In MLBIMs, the presence of controllable parameters that can be set by the user considering the conditions of the specific problem at hand is of crucial importance. Some of such


parameters (e.g., the number of hidden layers in an MLP ANN) are linked to the structure of such methods. In contrast, some others (e.g., weights and biases of the neurons in NNs and hyperparameters in LSSVM) have more to do with the initialization and evolution of them toward converging to the solution. Many researchers have employed different algorithms to optimize these parameters. Elkatatny optimized the structure of an ANN in terms of the numbers of hidden layers and neurons in each layer as well as the transition function using a methodology known as self-adaptive differential evolution. He then developed, based on the weights, biases, and layers of this ANN, an analytic model to estimate the ROP. The results showed superior accuracy of this model compared to previous analytic approaches [26]. Anemangely et al. employed the particle swarm optimization (PSO) and cuckoo optimization algorithm (COA) for setting the weights and biases of neurons to achieve minimum error on an MLP ANN. Comparing the results between the simple MLP ANN and the optimized versions of the network showed that the MLP-COA hybrid approach can serve as an efficient tool for high-accuracy prediction of ROP [1]. Ashrafi et al. compared performances of PSO, GA, independent component analysis (ICA), and biogeography-based optimization (BBO) in optimizing MLP and RBF NNs. The results indicated the superior performance of the MLP-PSO compared to other approaches [27].

Seeking to optimize the hyperparameters of an LSSVM model, Mehrad et al. used PSO, COA, and GA. In this research, for the sake of comparison, hybrid models of MLP coupled with the mentioned algorithms were further built and compared to SVR and random forest (RF) models. This study was based on petrophysical parameters (density (RHOB), neutron porosity (NPHI), and gamma-ray (GR)), uniaxial compressive strength (UCS) of rock, and drilling data simultaneously. The results showed that the ROP model developed based on the LSSVM-COA tended to exhibit the highest level of accuracy [28]. In another piece of work, Ansari et al. developed an ROP model using an SVR model that was optimized using ICA, with the results clearly indicating the higher accuracy of the optimized model compared to the simple SVR model [29].

A review of the literature published so far shows that the research for developing ROP estimation models is still in progress because of the importance of such models in optimizing adjustable drilling parameters, not to mention the significance of achieving a comprehensive model of high accuracy coupled with widespread generalizability. On the other hand, the application of multilayer extreme learning machine (MELM) and convolutional neural network (CNN) to other engineering problems, such as pore pressure prediction [30], shear wave velocity estimation [31], and gas flow rate prediction [32], has shown their enormous potential for developing high-accuracy, widely generalizable models. Hybridizing optimization algorithms with estimator models has highlighted the greater capability of hybrid methodologies compared to simple algorithms. Therefore, in this study, we used CNN and hybrid forms of MELM to model the ROP, with the results compared to those of one of the most powerful algorithms that have been extensively used in the literature, i.e., LSSVM in hybrid and straightforward forms.

2 Studied Wells

In this study, the data acquired along two wells in an oilfield in SW Iran were used for ROP modeling based on petrophysical logs and drilling parameters. Respecting the confidentiality of the collected data, we herein refer to the two wells as Wells A and B, respectively. The petrophysical parameters acquired over the two wells included GR, NPHI, RHOB, specific resistivity (RT), compressive wave slowness (DTCO), and photoelectric log (PEF). In terms of drilling data, we were provided with the actual ROP, WOB, torque (TRQ), TFLO, returned mud weight (MW-OUT), equivalent mud weight (ECD), and d-exponent parameter (d-exp). An overview of the petrophysical logs and drilling data over the studied wells is presented in Figs. 1, 2, 3, 4. Table 1 gives descriptive statistics for different parameters at the studied wells. Formation tops over the two wells are described in Table 2.

3 Methodology

Preprocessing plays a crucial role in obtaining high-accuracy models. Accordingly, in this work, we began by denoising the collected data. As a next step, in order to address the difference in scale between the petrophysical and mud-logging data, a resampling task was performed to come up with a uniform sampling interval. Subsequently, the data corresponding to each well were organized into a database before proceeding to feature selection on the data from Well A. The selected features were then used for modeling by means of intelligent algorithms (IAs). Finally, the developed models were validated on the data from Well B. Figure 5 presents the flowchart through which the present study was performed. Each step is explained in the following.

3.1 Preprocessing

Identification and extraction of patterns on the data cannot be properly accomplished unless one is provided with sufficient data, in terms of quantity and quality, and proper feature selection is performed using a machine learning (ML) approach. These conditions not only increase the chances of achieving a high-accuracy, highly generalizable model, but also accelerate the main processing of the data. Therefore,


Fig. 1 Variations in mud-logging data within the studied interval of Well A

Fig. 2 Variations in mud-logging data within the studied interval of Well B

prior to modeling, it is necessary to subject the collected data to proper preprocessing.

3.1.1 Denoising

One of the most important factors affecting the performance of an ROP estimation model is the quality of input data. Noise represents an inevitable part of any real data, which exists at a minimum level of 5% even when the conditions are fully controlled [33, 34]. Noisy data are known to result in the extraction of unrealistic rules out of the data, degrading the generalizability of the associated models for predicting unforeseen data [35, 36].
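The effect described here, a handful of corrupted samples degrading an otherwise well-behaved fit, can be illustrated with a toy least-squares sketch; the linear trend, spike size, and outlier count below are illustrative choices, not values from the paper:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + c (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def rmse(xs, ys, a, c):
    """Root-mean-square error of the line (a, c) against targets ys."""
    return (sum((a * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

random.seed(0)
xs = [i / 10 for i in range(100)]
true = [2.0 * x + 1.0 for x in xs]          # the clean underlying relation

# Corrupt a few training targets with large spikes (outlier-type noise);
# the fitted trend drifts and the error against the clean relation grows.
noisy = list(true)
for k in random.sample(range(100), 5):
    noisy[k] += random.choice([-1, 1]) * 40.0

a_clean, c_clean = fit_line(xs, true)
a_noisy, c_noisy = fit_line(xs, noisy)
print(rmse(xs, true, a_clean, c_clean) < rmse(xs, true, a_noisy, c_noisy))  # True
```

Even five spikes out of one hundred samples are enough to pull the fitted trend away from the clean relation, which is the motivation for the denoising step that follows.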


Fig. 3 Variations in petrophysical data within the studied interval of Well A

Fig. 4 Variations in petrophysical data within the studied interval of Well B

In the fields of signal and image processing, noise identification plays a crucial role in selecting an appropriate method of noise attenuation. Considering the problem conditions and the structure of petrophysical logs and drilling data, median filtering (MF) can serve as an efficient tool for denoising such data. One-dimensional (1D) MF is a well-known technique in image processing for addressing random, Gaussian, and salt-and-pepper noise. In this technique, each data point is replaced by the median of a predefined neighborhood around the original data point. For this purpose, we begin by sorting all data points within the neighborhood in the order of value/intensity/magnitude before the median of them can be identified. Now, the original data point is replaced by the median only if it differs from that significantly, with


the original data point remaining unchanged otherwise [37]. With such a procedure, this technique provides for different levels of smoothing depending on the filtering window. Nevertheless, this is a very effective method when the database contains a few points with abruptly different values from the others.

Table 1 Descriptive statistics of the petrophysical and drilling data over the studied wells

Parameter     Unit      Well A (Min / Avg / Max)           Well B (Min / Avg / Max)

Petrophysical logs
Depth         feet      7519.68 / 10,087.33 / 12,660.76    7716.53 / 9683.39 / 11,646.98
BS            inch      12.25 / – / 12.25                  8.5 / – / 8.5
CGR           GAPI      3.66 / 32.36 / 110.54              16.13 / 41.57 / 109.59
PEF           b/e       2.07 / 4.42 / 6.1                  2.17 / 4.73 / 5.94
RHOB          gr/cc     1.45 / 2.51 / 2.82                 1.97 / 2.54 / 2.82
RT            ohm·m     0.05 / 2.99 / 73.35                0.06 / 1.22 / 13.03
NPHI          v/v       0 / 0.16 / 0.5                     0 / 0.11 / 0.46
DTCO          μs/ft     47 / 68.75 / 127.98                48.42 / 65.98 / 117.71
Count         –         10,283                             7865
Record rate   ft        0.5                                0.5

Drilling parameters
ROP           ft/h      2.88 / 19 / 93.27                  0.21 / 23.28 / 121.86
WOB           klbf      1.1 / 13.97 / 33.1                 4.55 / 15.32 / 29.82
RPM           rpm       92 / 153.69 / 176                  60.19 / 122.67 / 170.98
TRQ           klbf·ft   1.62 / 2.94 / 4.72                 2.71 / 4.89 / 7.08
TFLO          gpm       476 / 588.97 / 764                 478 / 528.11 / 570
MW-OUT        pcf       78.03 / 82.27 / 89.5               78.28 / 81.3 / 85.03
ECD           pcf       64.67 / 83.04 / 90.41              78.38 / 81.68 / 85.82
d-exp         –         0.026 / 0.07 / 1.54                0 / 1.21 / 1.61
Count         –         1566                               1199
Record rate   ft        3.28                               3.28

Table 2 Formation tops over the trajectories of the Wells A and B

Formation    Top Depth (m)    Number of Data Points
                              Well A    Well B
Pabedeh      2292             60        –
Gourpi       2352             264       264
Ilam         2609             83        83
Laffan       2693             8         8
Sarvak       2702             616       616
Kazhdumi     3319             201       201
Dariyan      3522             184       27
Gadvan       3707             150       –
Total        –                1566      1199

3.1.2 Upscaling

Knowing that conventional petrophysical logs and drilling data have been originally sampled at different rates, they cannot be incorporated into a single model of ROP prediction unless their scales are somewhat uniform. Two different approaches have been proposed to this problem, namely downscaling the drilling data (which are usually sampled at a lower rate, e.g., 1 sample every 3.28 ft) and upscaling the petrophysical logs (which are usually sampled at a higher rate). Since the downscaling of drilling data is usually a cause of increased levels of error, the common practice is to rather opt for upscaling the petrophysical parameters. One of the most popular techniques for this purpose is averaging, which can be expressed as follows:

a_upscaled^j = (Σ_{i=1}^{n} a_i) / n    (1)

where a_i refers to the values of the data points corresponding to equally spaced depths, and n and a_upscaled^j are the number and upscaled value of such points, respectively.
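The averaging of Eq. (1) amounts to a block average of the finely sampled log over each coarse (mud-logging) depth interval. A minimal sketch; the depth values and the 2 ft coarse step are illustrative, not the record rates of the studied wells:

```python
def upscale_by_averaging(depths, values, coarse_step):
    """Block-average finely sampled log values (Eq. 1) onto a coarser
    depth grid: each upscaled point a_j is the mean of the n samples
    a_i falling inside the j-th coarse interval."""
    if len(depths) != len(values):
        raise ValueError("depths and values must have equal length")
    start = depths[0]
    bins = {}
    for d, v in zip(depths, values):
        j = int((d - start) // coarse_step)   # index of the coarse interval
        bins.setdefault(j, []).append(v)
    grid = [start + (j + 0.5) * coarse_step for j in sorted(bins)]
    means = [sum(bins[j]) / len(bins[j]) for j in sorted(bins)]
    return grid, means

# Example: a log recorded every 0.5 ft upscaled to one value per 2 ft.
depths = [7500.0 + 0.5 * i for i in range(8)]
values = [10, 12, 11, 13, 20, 22, 21, 23]
grid, means = upscale_by_averaging(depths, values, coarse_step=2.0)
print(means)  # [11.5, 21.5]
```

In practice the coarse step would be matched to the mud-logging record rate so that each drilling sample is paired with one upscaled log value.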


Fig. 5 Workflow of the present research
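The conditional 1-D median filter of Sect. 3.1.1, which replaces a sample with the neighborhood median only when it deviates strongly, can be sketched as follows; the window length and replacement threshold are illustrative choices (the paper does not report its filter settings):

```python
def median_filter_1d(signal, window=5, threshold=3.0):
    """Conditional 1-D median filter (Sect. 3.1.1): each point is compared
    with the median of its surrounding window and replaced only if the
    deviation exceeds `threshold`; otherwise it is left unchanged."""
    half = window // 2
    out = list(signal)
    for k in range(len(signal)):
        lo, hi = max(0, k - half), min(len(signal), k + half + 1)
        neighborhood = sorted(signal[lo:hi])          # sort by magnitude
        med = neighborhood[len(neighborhood) // 2]    # pick the median
        if abs(signal[k] - med) > threshold:          # replace only outliers
            out[k] = med
    return out

# A spiky ROP-like series: the isolated spike at index 3 is removed,
# while the smooth trend is preserved.
raw = [10.0, 10.5, 11.0, 55.0, 11.5, 12.0, 12.5]
print(median_filter_1d(raw))  # [10.0, 10.5, 11.0, 11.5, 11.5, 12.0, 12.5]
```

The window length controls the degree of smoothing, matching the remark above that the filtering window sets the smoothing level.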

3.1.3 Feature Selection

Inclusion of all, or even the largest possible number of, features in developing an estimator model may not necessarily lead to improved accuracy of the model. Through feature selection, one can evaluate different features to eliminate similar ones to not only increase the processing speed of ML-based methods but also enhance the accuracy and generalizability of the model.

Given the large number of parameters affecting the ROP, it is always very important to identify the minimum number of parameters that can provide for acceptably high accuracy and reliability of the resultant model. Feature selection refers to the identification of the smallest subset of features that imposes the largest impact on reducing the model complexity while increasing the prediction accuracy [38]. Conventional ML techniques require feature selection as a prerequisite—a task that needs to be performed by a relevant expert [39]. Studies comparing multi-objective optimization algorithms with single-objective optimizers have shown the superiority of the former for this task [40]. Accordingly, considering the feature selection problem, one can adopt a multi-objective optimization algorithm with two objectives: minimization of the number of input parameters and minimization of the fitting error. Therefore, the non-dominated sorting genetic algorithm-II (NSGA-II) is used in this study. Figure 6 shows the flowchart through which the NSGA-II approaches the optimal solutions. As shown in the figure, the NSGA-II begins by generating an initial population randomly. Next, each solution is evaluated using an objective function (sometimes called cost function). Among the solutions comprising each generation, a given number of them are selected through a binary tournament selection approach. This approach begins by selecting, on a random basis, two solutions out of the population and then identifies the best solution by comparing the two. This algorithm is based on two selection criteria, namely the rank and crowding distance of different solutions [41]. Accordingly, the best solution is the one that exhibits the smallest rank coupled with the largest crowding distance. Once finished with this selection approach, a predefined subset of the original generation is obtained. Now, one may proceed to apply the mutation operator on a subgroup of the selected individuals and the crossover operator on another subgroup of them to come up with a population of offspring and mutated individuals. Now, this new population is combined with the original population, followed by sorting the members in the increasing order of rank and then the decreasing order of crowding distance. At this stage, we have a population whose members are sorted by rank, in the first place, and crowding distance, in the second place. We shall now select as many of the top-ranked individuals as the population size of the original population. These finally


Fig. 6 Flowchart of NSGA-II

selected individuals then represent the lowest ranks and the largest crowding distances, as compared to other members of the combined population. The selected individuals constitute the next generation, with the other members of the combined population simply omitted from the process. This cycle is repeated until a predefined stopping criterion is met. The non-dominated solutions obtained for a multi-objective optimization problem usually comprise a Pareto front. None of the solutions on a Pareto front is preferable over the others, and each one of them can represent an optimal selection under the considered set of conditions [41].

3.2 ROP Modeling with AIs

Respecting the superior performance of hybrid ML algorithms compared to simple models and PBMs for ROP estimation, we opted for hybrid intelligent models in this research. These include two broad categories of models, namely predictor and optimizer models, as explained in the following. Subsequently, we present the procedure for building a hybrid algorithm by combining the predictor and optimizer models.

3.2.1 Predictors

So far, numerous researchers have tried to estimate ROP based on petrophysical logs and drilling data by means of a versatile set of different predictor algorithms. The MELM and LSSVM have long been among the most accurate methods for such a purpose. In the meantime, to the best of our knowledge, no one has ever used deep learning (DL) algorithms for this purpose, despite the successful application of DL algorithms (e.g., CNN) for solving complex problems. Therefore, we herein utilize DL algorithms to estimate ROP from a compilation of petrophysical logs and drilling data, as explained in the following.

• MELM algorithm

The ELM network is a fast yet relatively accurate algorithm for solving highly complex problems. First introduced by Liang et al. [42], it has been proven efficient for ROP estimation as well. In this algorithm, a hidden layer is utilized to avoid time-intensive iterations during the learning process, so as to come up with significantly faster processing times. In this structure, weights of the intermediate layer are randomly selected from a uniform distribution. The output weights of the ELM network are determined from the


Moore–Penrose generalized inverse of the hidden layer outputs [43]. Figure 7 indicates the structure of an ELM. Inspired by single-hidden layer feedforward networks (SLFNs), this processing machine is highly fast yet very simple. Huang et al. showed, theoretically, that the hidden nodes of an SLFN can be determined randomly, so that the network need not be iteratively tuned [44]. The linearity of the changes in the output layer provides for optimizing the output weights with minimum regression error. This aspect of the network eliminates the need for iterative simulations in the back-propagation stage—a step that cannot be avoided in an MLP network [43]. In order to solve complex nonlinear equations associated with clustering or regression methods, one can use this fast learning machine with multiple hidden layers in the form of a MELM. The flowchart shown in Fig. 8 depicts the main steps of problem-solving via MELM. As far as prediction is concerned, the MELM is much more accurate than a single-layer machine.

Fig. 7 Structure of a simple ELM

Fig. 8 Flowchart of LSSVM
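The ELM training rule just described (random input-side weights, output weights obtained from the Moore–Penrose pseudoinverse of the hidden-layer output matrix) can be sketched for a single hidden layer as follows; the layer width, sigmoid activation, and toy target are illustrative choices, not the configuration used in the paper:

```python
import numpy as np

def train_elm(X, y, n_hidden=20, seed=0):
    """Single-hidden-layer ELM: hidden weights/biases are drawn randomly
    from a uniform distribution (never trained); the output weights beta
    come from the Moore-Penrose pseudoinverse of the hidden-layer output
    matrix H, i.e. beta = pinv(H) @ y."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input -> hidden
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid outputs
    beta = np.linalg.pinv(H) @ y                             # closed-form solve
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy check: fit y = x1 + 2*x2 on random points; no back-propagation involved.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] + 2 * X[:, 1]
W, b, beta = train_elm(X, y, n_hidden=30)
rmse = float(np.sqrt(np.mean((predict_elm(X, W, b, beta) - y) ** 2)))
print(round(rmse, 4))
```

Because the only fitted quantities are the output weights, training reduces to one linear solve, which is what gives the ELM its speed advantage over back-propagated networks.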


• LSSVM algorithm Table 3 Most popular kernel functions used with LSSVM [28, 49]

Type of kernel Equation Parameters


As a supervised method, SVM is a powerful tool of ML
with applications in the solving of classification and estima- Linear (lin-kernel) K (xi ,x)  xiT x N/A
tion of nonlinear functions [45]. The regression formulation Polynomial K (x ,x)  t: intercept
used in an SVM is customized for solving quadratic pro- (poly-kernel)  i d d:degree of
xT x
gramming problems, which makes them computationally t + ic polynomial
expensive models [46]. In order to reduce this relatively high RBF (RBF-kernel) σ 2 : variance of the
K (xi ,x) 
computational cost, Suykens presented the LSSVM as an Gaussian kernel
exp − x−x i
2

alternative to SVMs. σ2

Compared to SVM, LSSVM provides high-speed training. MLP K (xi ,x)  θ: bias
This can be attributed to the use of equality constraints instead (MLP-kernel) tanh kxiT x + θ k: scale parameter
of inequality constraints encountered in quadratic program-
ming problems. In the meantime, sparseness and robustness
are known challenges to this approach. Of the main disadvan- In this equation, the Lagrangian coefficients are deter-
tages of such methods, one may refer to increased training mined using the Karush–Kuhn–Tucker (KKT) equality con-
time and output model error on industrial data, which are especially the case when the input data suffer from heteroscedasticity and/or an imbalanced distribution [47].

Assuming a training dataset {x_i, y_i}, where i = 1, 2, ..., N, the input data are of n dimensions (x_i ∈ R^n), and the output is of one dimension (y_i ∈ R), the regression version of the LSSVM takes the following general form:

y = w^T φ(x) + b   (2)

where w is the weight vector, b is the bias term, and φ(x) is a kernel function through which the input data can be mapped into a feature space of higher dimensions. Various kernel functions have been defined for the LSSVM so far, including linear, polynomial, RBF, and MLP (Table 3). Equation (3) expresses the objective function and constraints of an LSSVM:

min C(w, e) = (1/2) w^T w + (γ/2) Σ_{i=1}^{N} e_i^2   (3)
subject to: y_i = w^T φ(x_i) + b + e_i, i = 1, 2, ..., N

in which γ is a regularization parameter, which must be optimized before a high-accuracy model can be achieved. This parameter establishes a balance between a lower learning error and smoothness. In order to solve Eq. (3) with the help of the Lagrangian method, one needs to define a Lagrangian coefficient α_i (which can be positive or negative) for each x_i, as proposed in Eq. (4):

L(w, b, e, α) = (1/2) ‖w‖^2 + (γ/2) Σ_{i=1}^{N} e_i^2 − Σ_{i=1}^{N} α_i [w^T φ(x_i) + b + e_i − y_i]   (4)

Setting the partial derivatives of the Lagrangian to zero yields the optimality constraints, as expressed in Eqs. (5) through (8) [48]:

∂L/∂w = 0 → w = Σ_{i=1}^{N} α_i φ(x_i)   (5)

∂L/∂b = 0 → Σ_{i=1}^{N} α_i = 0   (6)

∂L/∂e_i = 0 → α_i = γ e_i, i = 1, 2, ..., N   (7)

∂L/∂α_i = 0 → w^T φ(x_i) + b + e_i − y_i = 0, i = 1, 2, ..., N   (8)

Considering the mentioned equations, the weight vector can be obtained from Eq. (9):

w = Σ_{i=1}^{N} α_i φ(x_i) = Σ_{i=1}^{N} γ e_i φ(x_i)   (9)

The weights are herein defined as linear combinations of the Lagrangian coefficients of the input data in the learning process. By substituting Eq. (9) into Eq. (2), one comes up with Eq. (10), where K(x_i, x) is the kernel function. The performance of an LSSVM is largely determined by the kernel function used and the hyperparameter values [50].

y = Σ_{i=1}^{N} α_i φ(x_i)^T φ(x) + b = Σ_{i=1}^{N} α_i K(x_i, x) + b   (10)

• CNN algorithm

Representing a special case of ANN, CNNs have been introduced in different forms for DL networks. The widespread application of CNN in complicated image processing tasks such as image classification, object identification, and speed detection is primarily due to its remarkable performance in such areas [51, 52].
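The filter-pool-dense arithmetic of such a network can be made concrete with a tiny numpy forward pass (an illustrative sketch, not the authors' implementation; the single filter, pooling size, and dense weights in the usage below are hypothetical):

```python
import numpy as np

def conv1d_valid(x, kernels):
    # One feature map per filter: cross-correlate the input with each kernel ("valid" mode).
    return np.array([np.convolve(x, k[::-1], mode="valid") for k in kernels])

def avg_pool(maps, size):
    # Non-overlapping average pooling shrinks each feature map by `size`.
    n = maps.shape[1] // size
    return maps[:, :n * size].reshape(maps.shape[0], n, size).mean(axis=2)

def cnn_forward(x, kernels, w_dense, b_dense, pool=2):
    maps = np.maximum(conv1d_valid(x, kernels), 0.0)  # convolution + ReLU
    flat = avg_pool(maps, pool).ravel()               # pooled maps compiled into one vector
    return float(flat @ w_dense + b_dense)            # compact (dense) layer -> one output
```

With, say, a length-6 input and one length-3 filter, the convolution yields a length-4 map, pooling halves it, and the dense layer collapses it to a single regression output.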


DL is comprised of different assessment levels, with each level being capable of learning several different features [53]. A major advantage of CNN is the use of automatic feature selection on raw input data for solving the problem. One may consider that the selection of superior features is a determinant stage of any modeling process by an ML method, significantly affecting the final processing time and modeling accuracy. Another advantage of CNN over conventional network structures is its extended generalizability: as it weights different parameters and hence lowers the number of input parameters, it facilitates the implementation of large-scale networks, which are usually difficult to deal with using conventional NNs [51]. Nevertheless, a major problem with this methodology is the relatively large number of adjustable parameters, including not only the weights and biases but also the convolutional parameters.

Figure 9 shows the structure of a CNN. As the figure suggests, a CNN has multiple parallel filters that can be adjusted for feature selection. The input vector is filtered by each of these convolutional layers, and each convolutional layer produces its own specific output vector. Therefore, the dimensions increase with the number of convolutional layers. Thus, a pooling layer is used to decrease the dimensions and normalize them. Finally, the outputs of all pooling layers are compiled and fed into a compact layer for producing the ultimate outputs. This compact layer (much like an MLP NN) is composed of several neurons, with the number of neurons set by the user.

3.2.2 Optimizers

Metaheuristic optimization algorithms have been widely combined with predictor algorithms for solving a versatile spectrum of problems. COA and PSO are among the most widely used optimization algorithms in the previous literature. In this work, we use these two algorithms in combination with predictor algorithms to develop models of wide generalizability and high accuracy. Each of these algorithms is explained in the following subsections.

• COA

COA is a population-based algorithm and was originally developed to solve the continuous multidimensional nonlinear problems usually encountered for optimization purposes [54]. In an optimization problem, one shall begin by presenting the decision variables in the form of an array; in COA, this array is referred to as a habitat. That is, each habitat represents a candidate solution for the optimization problem. The candidate solutions converge to a global optimum solution through an evolutionary process. In an optimization problem with N_var decision variables, a habitat is defined as a 1 × N_var array (i.e., habitat = [x_1, x_2, x_3, ..., x_{N_var}]) indicating the spatial coordinates of the corresponding cuckoo in the decision space. The decision variables take decimal values, and the profitability of each habitat is determined by evaluating a profit function (f_p) at that habitat (i.e., profit = f_p(habitat)). Given that this algorithm was originally formulated for maximization problems, maximization of the profit function expressed in the form of Eq. (11) ensures the minimization of the cost function (f_c):

Profit = −cost(habitat) = −f_c(x_1, x_2, x_3, ..., x_{N_var})   (11)

The evolutionary process of COA is as follows. Given an initial population of cuckoos (N_pop), the journey begins by forming an N_pop × N_var matrix of candidate habitats. Next, a randomly selected number of eggs is assigned to each habitat; in nature, this value ranges from 5 to 20, on average, and these values set a lower and an upper bound on the number of eggs at each habitat. In the real world, cuckoos try to lay their eggs at the maximum possible distance from their actual habitats. In COA, this maximum distance is referred to as the egg-laying radius (ELR). For each cuckoo, the ELR is set based on the overall number of eggs and the lower and upper bounds (Var_low and Var_high, respectively) on the decision variables (Eq. (12)). Notably, the parameter α, which is an integer value, is used to control the ELR:

ELR = α × (No. of current cuckoo's eggs / Total number of eggs) × (Var_high − Var_low)   (12)

At the next step, each cuckoo lays its eggs, on a random basis, in the nests of host birds within its ELR. Once finished with the egg-laying process, the eggs least similar to those of the host bird are identified and thrown away by the host bird. In this way, each round of egg-laying is followed by the elimination of a certain percentage of the cuckoos' eggs (usually 10%); these are the ones laid in habitats with insufficient nutrition, which keeps them from converging to an optimal solution. In the meantime, the other eggs mature in the nests of the host birds and then hatch, to be fed by the host bird. Grown into maturity, the young cuckoos continue to live in their habitats for some time. However, as the egg-laying time arrives, they migrate to places where they can find more food and host birds whose eggs look much like the cuckoos' eggs. Indeed, the cuckoos migrate to destinations corresponding to the highest profit (i.e., the largest supply of food and the best environmental conditions for living). Since the matured cuckoos are now well scattered among different habitats, it is now pretty difficult to identify which cuckoo belongs to which community.
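The habitat, profit, and egg-laying computations of Eqs. (11) and (12) can be sketched as follows (an illustrative numpy rendering; the uniform egg scatter and the clipping to the variable bounds are our own assumptions, not details taken from [54]):

```python
import numpy as np

def profit(habitat, cost_fn):
    # Eq. (11): maximizing profit is equivalent to minimizing cost.
    return -cost_fn(habitat)

def egg_laying_radius(n_eggs_cuckoo, n_eggs_total, var_low, var_high, alpha=1):
    # Eq. (12): each cuckoo's radius is proportional to its share of all eggs.
    return alpha * (n_eggs_cuckoo / n_eggs_total) * (var_high - var_low)

def lay_eggs(habitat, elr, n_eggs, var_low, var_high, rng):
    # Scatter eggs uniformly within the ELR around the habitat, kept inside the bounds.
    eggs = habitat + rng.uniform(-elr, elr, size=(n_eggs, habitat.size))
    return np.clip(eggs, var_low, var_high)
```

For example, a cuckoo holding 5 of 20 eggs on a decision variable bounded by [−10, 10] gets an ELR of one quarter of the full range.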


Fig. 9 Structure of a CNN

Fig. 10 Flowchart of COA


In order to address this difficulty, the K-means clustering algorithm is devised to classify the cuckoos. According to Rajabioun [54], the best results can be obtained with 3–5 clusters. Once the cluster to which each cuckoo belongs is identified, the average profit per cluster is calculated. Accordingly, the best habitat can be found in the cuckoo cluster for which the average profit is maximal; therefore, this cluster is identified as the destination for cuckoos migrating from the other clusters. The cuckoos do not follow a direct path toward their target habitat but rather move at a certain inclination angle to the destination for a particular part of the path. Noteworthily, there must be a parameter to control the cuckoos' population, considering possible food deficiencies and the elimination of some cuckoos by hunters or due to a lack of adequate nests for growing the offspring. In this respect, in COA, only the N_Max cuckoos producing the highest profits survive. After some iterations, all cuckoos converge to the best possible habitat, where their eggs are at the maximum similarity to those of the host bird and the availability of food supply is maximal. Figure 10 indicates the flowchart of COA schematically.

• PSO

Fig. 11 Flowchart of PSO algorithm


Particle swarm optimization (PSO) is a metaheuristic algorithm that was first developed by Kennedy and Eberhart [55]. Based on an initial population, this algorithm can solve complex mathematical problems, especially the nonlinear optimization problems encountered in engineering fields [56]. The sharing of the best solution among the population members and the small number of adjustable parameters are the key advantages of this algorithm.

PSO searches for the best solution using a population of particles. Each particle possesses five properties, namely position (x), velocity (V), the personal-best position (Pb) ever experienced by the corresponding particle, the cost associated with the personal-best position, and the particle swarm-best position (G_best). At the first iteration, the velocity and position of each particle are determined on a random basis, with the associated cost then calculated using a cost function. At this iteration, the personal-best position of each particle is set to be the current position of the particle, with the associated cost serving as the personal-best cost. Comparing the various particles in terms of cost, the position corresponding to the lowest cost is found and nominated as the particle swarm-best position. At the next iteration (t + 1), the particle velocity is updated using Eq. (13) for every single particle, with the new position of each particle obtained by adding the new velocity to the position of that particle at the previous iteration (t); see Eq. (14). The updated positions are then reevaluated through the cost function to update the personal-best position and cost of each particle, if the cost at the new position is lower than the best cost among all previous iterations. In order to update the particle swarm-best position, the cost of the swarm-best position at the previous iterations is compared to that at the current iteration, and the position is updated only if the current swarm-best position is found to be associated with a lower cost than at all of the previous iterations. This process continues until a predefined stopping criterion, usually defined as reaching a predefined number of iterations, is met. Eventually, the final particle swarm-best position and the corresponding cost are reported as the best solution.

V_i(t + 1) = w V_i(t) + c_1 r_1 (Pb_i(t) − x_i(t)) + c_2 r_2 (G_best(t) − x_i(t))   (13)

x_i(t + 1) = x_i(t) + V_i(t + 1)   (14)

In the above equations, the subscript i refers to the particle number in the swarm, w is the inertia weight (controlling the recurrence of the particle velocity; see [57]), c_1 and c_2 are positive coefficients referring to the personal and swarm learning factors, respectively, and r_1 and r_2 are random numbers in the range of [0, 1] [58]. Figure 11 depicts the flowchart of the PSO algorithm.
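Equations (13) and (14) drive the whole search; a self-contained sketch on a toy sphere cost function follows (illustrative only: the inertia weight and learning factors below are generic convergent choices, not the tuned values used in this study):

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=30, iters=100, lo=-5.0, hi=5.0,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n_particles, dim))        # random initial positions
    v = np.zeros_like(x)                                    # zero initial velocities
    pb = x.copy()                                           # personal-best positions
    pb_cost = np.array([cost(p) for p in x])                # personal-best costs
    g = pb[pb_cost.argmin()].copy()                         # swarm-best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pb - x) + c2 * r2 * (g - x)  # Eq. (13)
        x = x + v                                           # Eq. (14)
        c = np.array([cost(p) for p in x])
        better = c < pb_cost                                # update personal bests
        pb[better], pb_cost[better] = x[better], c[better]
        g = pb[pb_cost.argmin()].copy()                     # update swarm best
    return g, float(pb_cost.min())
```

Calling `pso_minimize(lambda p: float((p ** 2).sum()), dim=2)` drives the swarm toward the origin, the minimum of the sphere function.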


• GA

In 1975, Holland proposed the GA based on Darwin's theory of biological evolution [59–61]. Based on an initial population of chromosomes (i.e., solutions), this algorithm mirrors the natural selection process, where the individuals of the highest fitness are selected for reproduction to generate the next-generation offspring. The offspring inherit characteristics of the parents and even transmit them to the succeeding generation. Should the parents exhibit better fitness, the offspring would exhibit a higher fitness as well, which is equal to a higher chance of survival compared to their parents. Figure 12 shows the flowchart of GA.

On this basis, in GA, the journey toward the optimal solution starts from an initial population of chromosomes that are initialized randomly. The size of this initial population is linked to the nature and complexity of the problem at hand and remains unchanged throughout the different iterations of the algorithm. The values assigned to each chromosome are analyzed by a fitness function; thus, parent chromosomes are selected from the group of chromosomes with the highest fitness values compared to the other chromosomes. This is realized by means of the crossover and mutation operators. The crossover operator randomly swaps parts of one chromosome with those of another. The result is an offspring that inherits certain properties from each of the parent chromosomes rather than exactly resembling either of them. This operator sets the scene for achieving solutions of higher quality. When the mutation operator is applied to a chromosome, the values of one or more genes of a part of the offspring chromosomes are changed randomly. As a next step, the newly generated solutions are assessed using the fitness function, and the entire process is iterated until the stopping criterion is met, at which point the best solution is reported.

Fig. 12 Flowchart of GA

3.2.3 Hybrid Predictor–Optimizer Algorithms

A key challenge ahead of implementing an IA in practice is to formulate the optimal structure of the model for the specific data under study. In this work, we hybridized the MELM and LSSVM models with different optimization algorithms to come up with their optimum architectures. This is explained in detail in the following.

• Hybridizing MELM with different optimizers

In the MELM, the weights and biases are all set on a random basis. This implies that the model output may exhibit different degrees of accuracy. At the other end of the spectrum, it is usually costly and time-intensive to develop a high-accuracy model via a bare trial-and-error approach, and even that cannot guarantee the realization of a model with the highest possible accuracy. Under such circumstances, one can opt for optimization algorithms to determine the optimal numbers of layers and neurons per layer, weights, and biases for a MELM, so as to achieve a model with the highest possible level of accuracy.

For such a purpose, it is necessary to identify the optimum structure of the algorithm itself. Accordingly, the weights and biases of the algorithm are considered as decision variables of an optimization algorithm. At each iteration of the optimization algorithm, a MELM is built with the best values of the decision variables at that iteration and then tested on the training data. Subsequently, the optimization algorithm improves the values of the weights and biases based on the feedback received from the model built at the previous iteration, and this process continues until the stopping criterion of the optimizer is satisfied, upon which the optimal values of the weights and biases are reported. Finally, using these optimal values, the built MELM is assessed on the testing data. Figure 13 depicts this process in the form of a flowchart.

• Hybridizing LSSVM with optimization algorithms

The performance of the LSSVM algorithm has much to do with the choice of the kernel function and the regularization parameter. Accordingly, before this method can be implemented properly, one must identify the most appropriate


kernel function based on the specific conditions of the problem using a trial-and-error technique. Next, the values of the hyperparameters of the LSSVM, determined by calculating the conjugate gradient, are extracted and taken as a member of the population in an optimization problem. Through a number of iterations, the optimizer optimizes these hyperparameters in such a way that the LSSVM algorithm produces outputs as close to the target parameter as possible. Figure 14 shows the flowchart of the hybrid LSSVM algorithm.

Fig. 13 Flowchart of hybrid MELM algorithm

Fig. 14 Flowchart of the hybrid LSSVM algorithm

4 Results and Discussion

4.1 Pre-processing

Denoising is somewhat inevitable given the unavoidable effect of noise on any data measured in the field. Indeed, without such a denoising step, it seems infeasible to achieve a model of adequate accuracy and generalizability. Setting the level of denoising requires a knowledge of the data measurement conditions, but even the best-controlled conditions are susceptible to leave some 5% of noise on the useful data. In this research, as no sufficient information on the data measurement conditions (i.e., weather conditions, operator's experience, tool capabilities and age, etc.) was available, 5% denoising was performed by applying the median filter to the petrophysical and mud-logging data. Figures 15 and 16 demonstrate the raw measured and denoised mud-logging and petrophysical-logging data, respectively, along Well A in the studied field. As is evident in these figures, the denoising attenuated, to some extent, the abrupt peaks on the logs.

The mud-logging data had been acquired at a sampling interval of 1 m, while one petrophysical reading was available every 0.1524 m, meaning that the petrophysical logs were acquired at some 6 points per meter. Before the mud-logging and petrophysical data could be combined properly for the purpose of ROP modeling, one had to homogenize their sampling frequencies. This could be achieved through either an upscaling or a downscaling method. Upon a possible downscaling, the sampling interval of the mud-logging data would decrease from 1 m down to 0.1524 m, which means that every two successive points of the original data would be used to interpolate 4 or 5 extra points. It is pretty trivial that estimating such a relatively large number of unknown points from only two known points is susceptible to very high levels of error, although this approach would end up increasing the volume of input data. At the other end of the spectrum, although the upscaling method would lead to the loss of some data, it rather provides for a higher degree of reliability.
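The two preprocessing steps just described can be sketched as follows (an illustrative numpy version; the window length and the one-meter bin-averaging scheme are our own assumptions rather than the paper's exact settings):

```python
import numpy as np

def median_denoise(signal, window=5):
    # Sliding-window median filter (edges padded by reflection) to suppress spiky noise.
    pad = window // 2
    padded = np.pad(signal, pad, mode="reflect")
    return np.array([np.median(padded[i:i + window]) for i in range(len(signal))])

def upscale_to_1m(depth_m, values, step=1.0):
    # Upscaling: average all readings whose depth falls inside each 1 m bin.
    bins = np.floor(depth_m / step).astype(int)
    return np.array([values[bins == b].mean() for b in np.unique(bins)])
```

The median filter removes isolated spikes without smearing step changes, and the bin average reduces the roughly six 0.1524 m petrophysical readings per meter to one value on the mud-logging depth grid.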


Fig. 15 Comparison between noisy and denoised mud-logging data in the studied interval of Well A

Fig. 16 Comparison between noisy and denoised petrophysical log data in the studied interval of Well A

Therefore, continuing with the research, the average value of the petrophysical logs over each meter of the studied depth was reported as the corresponding value on the upscaled log. Figure 17 compares the original and upscaled petrophysical logs along Well A. Upon scaling the petrophysical logs to the scale of the mud-logging data, a database was built for each well to perform further processing with the final aim of ROP modeling.

Investigation of how the different independent variables affect one another as well as the dependent parameter can provide a good basis for understanding the inter-parameter associations before proceeding to intelligent modeling. Figures 18 and 19 indicate heat maps corresponding to the correlation matrices of the petrophysical logs and mud-logging data at Wells A and B, respectively.
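The screening behind these correlation heat maps boils down to the Pearson r of each log against the ROP; a minimal sketch (illustrative; the toy feature names below are hypothetical):

```python
import numpy as np

def rank_by_correlation(features, names, target):
    # Pearson r of each feature against the target, sorted by |r| (strongest first).
    r = [float(np.corrcoef(f, target)[0, 1]) for f in features]
    order = sorted(range(len(r)), key=lambda i: -abs(r[i]))
    return [(names[i], round(r[i], 3)) for i in order]
```

A perfectly linear log ranks first (|r| = 1), while a purely quadratic dependence over a symmetric range yields r near 0, which is exactly why linear correlation alone cannot replace the model-based feature selection used later.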


Fig. 17 Comparison between original petrophysical logs acquired every 0.1524 m along Well A and the upscaled values to a sampling interval of 1 m over the studied interval at Well A

Fig. 18 Heat map of correlation matrices of the petrophysical logs and mud-logging data at Well A


Fig. 19 Heat map of correlation matrices of the petrophysical logs and mud-logging data at Well B

As the figures suggest, at Well A, in contrast with Well B, most of the investigated parameters were found to be inversely correlated with the ROP. At both wells, compared to the other studied parameters, the depth and mud weight exhibited much more significant relationships with the ROP. It was also evident that the ROP increases with RPM and DT. Interestingly, WOB was found to exhibit no association with the ROP at Well B, while an increase in WOB at Well A tended to decrease the ROP. Given the high resistivity of the rocks drilled at both wells, the more significant effect of RPM, rather than WOB, on the ROP in the studied intervals was pretty much expected.

As indicated earlier, numerous parameters affect the ROP. Considering all of these parameters in ROP modeling can drastically increase the complexity of the developed model and, at the same time, limit the applicability of this model due to the need for preparing such a vast amount of input data. On the other hand, since all field-measured data are contaminated with some level of noise, the introduction of such a huge volume of noisy data into a model can attenuate the accuracy and generalizability of the model considerably. This highlights the need for identifying the most significant parameters affecting the ROP in order to develop a simple yet accurate and widely generalizable prediction model. In this respect, the hybrid MLP-NSGA-II algorithm was applied to the data from Well A to select the features most strongly correlated with the ROP. In this algorithm, the MLP NN tries to model the ROP based on the set of features introduced by NSGA-II. Existing differences in the measurement scales of the different parameters could lead to weights and/or biases that are irrelevant to the effect of the corresponding features on the ROP. Accordingly, one had to normalize the parameters to be able to select more appropriate features. For this purpose, setting a lower (X_min) and an upper bound (X_max) on the values (X) of each feature, Eq. (15) was used to normalize the values into the range of [−1, 1]:

X_norm = 2 × [(X − X_min) / (X_max − X_min)] − 1   (15)

Table 4 Optimal values of the adjustable parameters of the NSGA-II algorithm

Parameter              Value
Number of iterations   100
Population ratio       75
Cross-over ratio       0.62
Mutation ratio         0.05

In order to achieve the best results with the hybrid MLP-NSGA-II algorithm, it was necessary to optimize its adjustable parameters. For this purpose, firstly, MLP NNs with different structures were used to model the ROP on the data from Well A.
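Eq. (15) is ordinary min-max scaling to [−1, 1]; a minimal sketch:

```python
import numpy as np

def normalize_pm1(x, x_min=None, x_max=None):
    # Eq. (15): scale each value linearly so that x_min -> -1 and x_max -> +1.
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0
```

Passing explicit bounds (e.g., the training-set minimum and maximum) keeps the scaling consistent when the same transform is later applied to unseen data.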


Fig. 20 Variations in error and COV with the number of input parameters into ROP modeling by means of MLP-NSGA-II

Table 5 Best features selected for different numbers of input parameters into the MLP-NSGA-II algorithm for ROP modeling (the set of best features selected for further analysis is highlighted in bold)

No. of features   Input parameters                                                           RMSE (ft/h)   R-square
1                 DEPTH                                                                      7.0059        0.4784
2                 DEPTH, RPM                                                                 5.5182        0.5808
3                 DEPTH, RPM, MW                                                             4.6279        0.6852
4                 DEPTH, RPM, MW, WOB                                                        4.0705        0.7496
5                 DEPTH, RPM, MW, WOB, DT                                                    3.7496        0.7991
6                 DEPTH, RPM, MW, WOB, DT, TFLO                                              3.5008        0.8313
7                 DEPTH, RPM, MW, WOB, DT, TFLO, NPHI                                        3.3984        0.8609
8                 DEPTH, RPM, MW, WOB, DT, TFLO, NPHI, d-exp                                 3.3895        0.8753
9                 DEPTH, RPM, MW, WOB, DT, TFLO, NPHI, d-exp, ECD                            3.3893        0.8817
10                DEPTH, RPM, MW, WOB, DT, TFLO, NPHI, d-exp, ECD, TRQ                       3.3893        0.8844
11                DEPTH, RPM, MW, WOB, DT, TFLO, NPHI, d-exp, ECD, TRQ, RT                   3.3892        0.8865
12                DEPTH, RPM, MW, WOB, DT, TFLO, NPHI, d-exp, ECD, TRQ, RT, RHOB             3.3892        0.8886
13                DEPTH, RPM, MW, WOB, DT, TFLO, NPHI, d-exp, ECD, TRQ, RT, RHOB, CGR        3.3893        0.8893
14                DEPTH, RPM, MW, WOB, DT, TFLO, NPHI, d-exp, ECD, TRQ, RT, RHOB, CGR, PEF   3.3893        0.8894

The tenfold cross-validation algorithm was used to reach a stable accuracy in the modeling. Results of this sensitivity analysis showed that an MLP NN with three hidden layers containing 6, 5, and 6 neurons, respectively, could lead to highly accurate models. As a next step, the adjustable parameters of the NSGA-II algorithm were optimized to realize two objectives, namely minimizing the number of input variables into the MLP NN and minimizing the root-mean-square error (RMSE) of the modeling with the NSGA-II. Sensitivity analysis was performed on the number of iterations, population size, and the mutation and crossover ratios of the NSGA-II. Table 4 reports the optimal values of these parameters upon conducting the sensitivity analysis.

Once finished with setting the adjustable parameters of the MLP-NSGA-II algorithm, the algorithm can be applied to the data from Well A. Figure 20 shows variations of the RMSE and the coefficient of variation (COV) for the best set of variables with different numbers of input features into the MLP-NSGA-II algorithm. As this figure suggests, the modeling error decreases while the COV increases with increasing numbers of input parameters, but the rate of this improvement decreases as the number of input parameters grows larger, so that no significant reduction in the ROP modeling error was observed for any number of input parameters beyond 7.
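The tenfold cross-validation used throughout this study partitions the samples into k folds and averages the held-out error; a generic sketch (illustrative; in the usage below an ordinary least-squares model stands in for the MLP):

```python
import numpy as np

def kfold_rmse(X, y, fit, predict, k=10, seed=0):
    # Average test RMSE over k folds, each fold held out exactly once.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        resid = predict(model, X[test]) - y[test]
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))
```

For instance, with `fit = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]` and `predict = lambda coef, A: A @ coef`, the function scores a linear model exactly as the MLP is scored here.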


Fig. 21 Optimizing the MELM structure by combining it with the PSO algorithm

In the meantime, the learning process with 8 input parameters took some 30% more time than that with 7 input parameters. Table 5 reports the set of best features obtained by the MLP-NSGA-II for different numbers of input parameters for the purpose of ROP modeling. The table shows that, setting the number of input parameters to 7, the best set of features included DEPTH, RPM, MW, WOB, DT, TFLO, and NPHI. Accordingly, these seven parameters were used for ROP modeling in this work.

4.2 Developing Predictor Models

In order to undertake modeling with a MELM hybridized with an optimization algorithm, it is necessary to start by determining the optimal numbers of layers and nodes per layer. Herein we used PSO to optimize the structure of the MELM. That is, the numbers of layers and nodes of the MELM were determined by the PSO before proceeding to ROP modeling using the MELM. In this process, the PSO algorithm improves its results based on the modeling error. On the other hand, the modeling error of a MELM is highly dependent on the values of its hyperparameters. Therefore, in order to come up with an appropriate judgment about the different structures of the MELM, PSO was deployed to optimize the weights and biases of the MELM. Figure 21 shows how the MELM structure was optimized by combining two different PSO algorithms. In order to implement the PSO-MELM-PSO algorithm, one should properly adjust the controllable parameters of the optimization algorithms as well as those of the MELM. The PSO algorithms had their parameters adjusted through a trial-and-error approach, with the results reported in Table 6.

Table 6 Best values of the adjustable parameters of the PSO algorithms for optimizing the MELM structure

Controllable parameter   PSO for optimizing the weights and biases   PSO for optimizing the numbers of hidden layers and nodes
Population               80                                          60
Iteration                200                                         60
C1                       2.05                                        2.05
C2                       2.05                                        2.05
Inertia weight           0.97                                        0.96

The incorporation of two optimization algorithms into the structure of the MELM would increase the processing time considerably. In order to tackle this problem, we limited the allowable ranges of the numbers of layers and nodes for the PSO that was aimed at optimizing the model structure. This was done by testing MELM-PSO models with different numbers of layers containing the same number of neurons in each layer, proceeding to ROP modeling, and finally storing the resultant RMSE, with the results shown in Table 7 for the data from Well A. The table proves that the estimator model had its accuracy increase with the numbers of layers and nodes per layer, but models with 8 layers exhibited somewhat higher RMSEs than those with 6 layers.
123
Arabian Journal for Science and Engineering (2022) 47:11953–11985 11973

Table 7 RMSEs obtained from MELMs of different structures based on the data from Well A

Number of nodes   Number of layers
in each layer     2        4        6        8
5                 3.5058   3.2683   3.0847   3.0851
10                3.4041   3.0702   2.8916   2.9389
15                3.2599   2.9382   2.6076   2.8508
20                3.3274   3.0184   2.6981   2.9011

Respecting this finding, we limited the range of changes in the number of layers in the PSO to 4–6. When it came to the number of nodes per layer, Table 7 shows that the models with 20 nodes per layer were associated with higher levels of error than the models with 15 neurons per layer, convincing us to limit the range of the number of nodes per layer to between 10 and 20. Applying these constraints to the PSO-MELM-PSO algorithm, the optimal structure of the MELM was found based on the data from Well A. According to the results, the lowest level of error (RMSE = 2.4681 ft/h) was found with a MELM with 6 layers containing 15, 18, 20, 20, 16, and 15 nodes in the 1st to the 6th layers, respectively.

Knowing that the LSSVM algorithm is highly dependent on the choice of kernel function and the initialization of its hyperparameters, it is necessary to select the appropriate kernel function considering the conditions of the problem at hand before proceeding to the actual modeling. For this purpose, several LSSVMs were built with various kernel functions for ROP prediction on the data from Well A. Data separation was carried out using tenfold cross-validation. Figure 22 shows a bar chart of error for the LSSVMs developed using different kernel functions for ROP modeling on the data from Well A. As indicated by the figure, the LSSVM built using the RBF kernel exhibited the smallest RMSE compared to the other models. Given this finding, this kernel function was adopted for further analyses and ROP prediction on the data from Well A.

Fig. 22 Comparison between RMSEs obtained with different kernels for LSSVM in ROP modeling on the data from Well A

Considering the type of selected kernel function, the optimization algorithm had to find the optimal values of γ and σ². Known as the regularization parameter, γ contributes to reduced complexity coupled with improved performance of the output model; indeed, an increase in γ translates into increased complexity or, say, nonlinearity of the model. Also known as the squared bandwidth, σ² is the parameter of the RBF kernel, and its value controls the extent of the neighborhood considered in the model: the higher the value of this parameter, the more neighboring data are involved in the modeling, which ends up with a more nonlinear model. Therefore, the accuracy and generalizability of an LSSVM are highly determined by the values of these two parameters.

Implementation of a hybrid algorithm for predicting the ROP requires not only adjusting the structure and selecting an appropriate kernel function for the estimator algorithm but also optimizing the controllable parameters of the optimizer algorithm. Therefore, at this stage, we followed a trial-and-error approach to the optimization of the controllable parameters of the optimizer algorithms. Table 8 provides the optimal values of the mentioned parameters in the COA, PSO, and GA for the different estimator algorithms. Performing these algorithms for different numbers of iterations, it was figured out that they converge to a solution well before the 200th iteration. Accordingly, we set the maximum number of iterations for ROP modeling to 200.

Once finished with adjusting the structures and controllable parameters of the hybrid algorithms of MELM-GA, MELM-PSO, MELM-COA, LSSVM-GA, LSSVM-PSO, and LSSVM-COA, one could apply them to the data from Well A for the sake of ROP modeling. In parallel to these hybrid algorithms, we further used simple LSSVM (with the RBF kernel) and CNN algorithms on the same data for the sake of comparison. The optimum kernel size and number of convolutional filters were found to be 5 and 300, respectively, upon a sensitivity analysis on the CNN. Training and testing were performed using 80% and 20% of the data from Well A, respectively. The tenfold cross-validation algorithm was further used in the training process of the models.

Figure 23 shows variations in error at different iterations of the optimization algorithms when training the MELM and LSSVM models.
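For reference, training an LSSVM with the RBF kernel reduces to a single linear solve for the bias and the Lagrangian coefficients, after which Eq. (10) gives the prediction; a minimal numpy sketch with γ and σ² exposed as the two hyperparameters discussed above (the toy data in the usage are hypothetical):

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    # RBF kernel K(a, b) = exp(-||a - b||^2 / sigma2); sigma2 is the squared bandwidth.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def lssvm_fit(X, y, gamma, sigma2):
    # Solve the LSSVM dual linear system for the bias b and coefficients alpha.
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # constraint: sum of alphas = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                           # b, alpha

def lssvm_predict(X_train, b, alpha, sigma2, X_new):
    # Eq. (10): y(x) = sum_i alpha_i K(x_i, x) + b.
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```

A large γ forces the fit close to the training targets (little slack), while σ² sets how far the influence of each training point reaches, mirroring the roles of the two hyperparameters tuned by the optimizers.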


Table 8 Optimized values of controllable parameters of the GA, PSO, and COA optimizers, as per a trial-and-error approach on the data from Well A

| Optimization algorithm | Parameter | LSSVM | MELM |
|---|---|---|---|
| COA | Size of the initial population | 40 | 50 |
| COA | Maximum number of cuckoos | 100 | 140 |
| COA | Minimum number of eggs for each cuckoo | 5 | 5 |
| COA | Maximum number of eggs for each cuckoo | 10 | 10 |
| COA | Number of clusters | 1 | 1 |
| PSO | Swarm size | 120 | 150 |
| PSO | Cognitive constant | 2.05 | 2.05 |
| PSO | Social constant | 2.05 | 2.05 |
| PSO | Inertia weight (damping ratio) | 0.97 | 0.97 |
| GA | Population size | 140 | 170 |
| GA | Selection method | Roulette wheel | Roulette wheel |
| GA | Crossover | Uniform | Uniform (p = 1) |
| GA | Mutation | Uniform (p = 0.05) | Uniform (p = 1) |
| GA | Mutation ratio | 0.05 | 0.05 |
| GA | Selection pressure (roulette wheel) | 2 | 2 |
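The PSO settings in Table 8 (cognitive and social constants of 2.05, inertia damping ratio of 0.97) can be dropped into a bare-bones global-best PSO loop. The sketch below is illustrative only; the default swarm size, iteration count, and bound clamping are our assumptions, not the authors' implementation.

```python
import random

def pso_minimize(f, bounds, swarm=30, iters=100,
                 c1=2.05, c2=2.05, w=0.9, damp=0.97, seed=0):
    """Global-best PSO using the cognitive/social constants (2.05) and
    inertia damping (0.97) reported in Table 8; everything else here is
    an illustrative assumption."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]          # per-particle best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(swarm), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(swarm):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # clamp the updated position back into the search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        w *= damp  # damp the inertia weight each iteration (Table 8)
    return gbest, gbest_val
```

On a one-dimensional quadratic, for example, `pso_minimize(lambda x: x[0] ** 2, [(-5.0, 5.0)])` converges toward zero.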

Fig. 23 Variations in error at different iterations of the hybrid algorithms of a MELM and b LSSVM in the training process on the data from Well A
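The tenfold cross-validation used for data separation above can be reproduced with a small stdlib-only harness. The shuffling seed and the RMSE scoring below are our choices for the sketch, not details given in the text.

```python
import math
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rmse(y_true, y_pred):
    """Root-mean-square error, the headline index used in the paper."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def cross_val_rmse(fit_predict, X, y, k=10):
    """Average test-fold RMSE; fit_predict(train_X, train_y, test_X)
    stands in for any of the estimators compared in Fig. 22."""
    folds = kfold_indices(len(X), k)
    scores = []
    for test_idx in folds:
        test = set(test_idx)
        tr = [j for j in range(len(X)) if j not in test]
        preds = fit_predict([X[j] for j in tr], [y[j] for j in tr],
                            [X[j] for j in test_idx])
        scores.append(rmse([y[j] for j in test_idx], preds))
    return sum(scores) / len(scores)
```

A constant-mean predictor, for instance, plugs in as `fit_predict = lambda Xtr, ytr, Xte: [sum(ytr) / len(ytr)] * len(Xte)`.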

LSSVM models. As the figure suggests, COA provided a higher convergence rate compared to the GA and PSO and could further reach a global solution rather than being trapped in local optima (which was the case with the GA and PSO). A comparison between the two estimator algorithms, namely MELM and LSSVM, in this figure shows that the hybridized MELM algorithms outperformed the hybridized LSSVM algorithms in ROP prediction.

Shown in Fig. 24 are cross-plots of measured ROP against estimation results of the hybrid algorithms as well as LSSVM and CNN in the training stage. The figure indicates that all of the models underestimated the ROP at higher ROP levels (as shown by deviation of the best-fit line to the left of the Y = T line). This can be linked to the lower availability of data at higher ROP values. The intensity of this problem is, however, lower with the CNN model than with the other models, while the simple form of LSSVM is associated with the largest deviations at higher ROPs. This leads us to the conclusion that the hybrid models are generally more accurate than the simple form of LSSVM. Table 9 compares the developed models based on various error indices and the coefficient of determination (COD) based on the data from

Fig. 24 Cross-plots of measured ROP against estimation results of the hybrid algorithms and DL method on the data from Well A (panels: a MELM-GA, b MELM-PSO, c MELM-COA, d LSSVM-GA, e LSSVM-PSO, f LSSVM-COA)

Well A. As is evident from the table, the CNN exhibited much lower RMSEs than the competing models. Focusing on the hybrid models, MELM-COA and LSSVM-COA showed higher accuracies than the others.

Figure 25 presents cross-plots of measured ROP against estimation results of the trained hybrid algorithms on the data from Well A. Once again, the figure shows that the models underestimated the ROP at higher ROP levels. As this effect was already observed in the training phase, this outcome in the test phase was anticipated. The smallest and largest deviations of the best-fit line from the Y = T line at higher

Fig. 24 continued (panels: g LSSVM, h CNN)

Table 9 Comparing prediction results of the trained models on the data from Well A (training phase) based on different error indices and COD

| Model | R-square | APD | AAPD | SD | RMSE (ft/h) |
|---|---|---|---|---|---|
| MELM-GA | 0.9771 | −4.4800 | 9.5347 | 2.1559 | 2.1550 |
| MELM-PSO | 0.9773 | −3.6077 | 8.7250 | 2.1076 | 2.1072 |
| MELM-COA | 0.9788 | −3.6388 | 8.0373 | 1.9261 | 1.9256 |
| LSSVM-GA | 0.9744 | −4.4139 | 9.2356 | 2.2290 | 2.2282 |
| LSSVM-PSO | 0.9734 | −3.9945 | 9.8113 | 2.1589 | 2.1598 |
| LSSVM-COA | 0.9748 | −4.0962 | 9.1399 | 2.0839 | 2.0831 |
| LSSVM | 0.9814 | −2.8827 | 10.4080 | 2.6168 | 2.6564 |
| CNN | 0.9828 | −3.2572 | 7.9676 | 1.7747 | 1.7746 |
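The indices tabulated above (APD, AAPD, SD, RMSE, R-square) can be computed in a few lines of stdlib Python. The exact formulas are not spelled out in this excerpt, so the definitions below (percent deviations relative to measured ROP, population standard deviation of residuals) are assumptions.

```python
import math

def error_indices(measured, predicted):
    """APD/AAPD (percent deviations relative to measured ROP), SD of
    the residuals, RMSE, and R-squared; these formula definitions are
    assumed, as the excerpt does not state them explicitly."""
    n = len(measured)
    pd = [100.0 * (p - m) / m for m, p in zip(measured, predicted)]
    resid = [p - m for m, p in zip(measured, predicted)]
    mean_r = sum(resid) / n
    mean_m = sum(measured) / n
    ss_res = sum(r * r for r in resid)
    return {
        "APD": sum(pd) / n,
        "AAPD": sum(abs(d) for d in pd) / n,
        "SD": math.sqrt(sum((r - mean_r) ** 2 for r in resid) / n),
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1.0 - ss_res / sum((m - mean_m) ** 2 for m in measured),
    }
```

With these definitions, a negative APD corresponds to the systematic underestimation of ROP discussed in the text.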

ROPs were seen with the CNN and LSSVM models, respectively. Table 10 compares the results of the trained models on the data from Well A based on a handful of error indices and COD. As the table suggests, CNN provided significantly lower levels of error than the other models. Moreover, the hybrid models produced lower levels of error than their simple forms, with the highest accuracies, among the hybrid models, exhibited by the MELM-COA and LSSVM-COA models. The smaller difference between RMSEs of the training and testing phases for the CNN model, compared to other studied models, could indicate better generalizability of this model to similar formations at the wells within the studied field.

4.3 Models Validation

In order to put the generalizability of the trained models to the test, they were used to estimate ROP on similar formations at Well B. Figure 26 shows the histogram of error onto which a normal distribution is fitted for different ROP prediction models at Well B. This figure shows some rightward skewness on the error histogram, which refers to the underestimation of ROP at higher ROP values. This is, however, less significant on the results of CNN modeling. The error histogram of the CNN model showed lower values of the mean (μ) and standard deviation (σ) compared to other models, following a more-or-less normal distribution. Focusing on the hybrid algorithms, the MELM-COA and LSSVM-COA showed close-to-normal distributions. Table 11 presents the results of a comparison between different trained models in ROP prediction at Well B based on various error indices and COD. The table suggests that the CNN tends to produce the smallest RMSE compared to all other models, indicating the broad generalizability of this model. Notably, the MELM-COA and LSSVM-COA hybrid models produced RMSEs well comparable to those of the CNN, proving their acceptable generalizability. Figure 27 compares the estimation results of the best models (i.e., MELM-COA, LSSVM-COA, and CNN) against measured ROPs over the studied interval at Well B in the form of depth profiles. The figure shows that all three models succeeded in predicting the changes in ROP over the drilled depth.
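The (μ, σ) values quoted for the histograms in Fig. 26 amount to fitting a normal distribution to the prediction residuals. A stdlib sketch follows; the residual convention (predicted minus measured) is our assumption.

```python
import statistics

def residual_normal_fit(measured, predicted):
    """Return the mean and population standard deviation of the
    residuals, i.e. the (mu, sigma) of a fitted normal curve."""
    resid = [p - m for m, p in zip(measured, predicted)]
    return statistics.fmean(resid), statistics.pstdev(resid)
```

A mean close to zero with small sigma corresponds to the tight, centered CNN histogram described in the text.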

Fig. 25 Cross-plots of measured ROP against estimation results of the hybrid algorithms and DL method based on the data from Well A (panels: a MELM-GA, b MELM-PSO, c MELM-COA, d LSSVM-GA, e LSSVM-PSO, f LSSVM-COA, g LSSVM, h CNN)

Table 10 Comparing prediction results of the trained models on the data from Well A (testing phase) based on different error indices and COD

| Model | R-square | APD | AAPD | SD | RMSE (ft/h) |
|---|---|---|---|---|---|
| MELM-GA | 0.8873 | −5.8894 | 13.4009 | 3.2565 | 3.2519 |
| MELM-PSO | 0.8922 | −5.8637 | 14.1165 | 3.1752 | 3.1710 |
| MELM-COA | 0.9273 | −4.3837 | 11.9959 | 2.7108 | 2.7073 |
| LSSVM-GA | 0.8826 | −4.2034 | 14.1440 | 3.3888 | 3.4016 |
| LSSVM-PSO | 0.8968 | −5.4878 | 13.8043 | 3.2063 | 3.2015 |
| LSSVM-COA | 0.8926 | −5.3479 | 12.9394 | 3.0885 | 3.0840 |
| LSSVM | 0.9373 | −1.3689 | 13.0998 | 3.3135 | 3.4238 |
| CNN | 0.9289 | −4.4522 | 10.8538 | 2.5367 | 2.5356 |

5 Conclusion

In this study, hybridized forms of MELM and LSSVM with COA, PSO, and GA along with simple forms of the LSSVM and CNN algorithms were used to model the ROP at two vertically drilled wells in two different hydrocarbon fields. For this purpose, the input data (petrophysical logs and mud-logging data) were denoised using the median filter. Then, the features imposing the most significant impacts on the ROP were selected using the MLP-NSGA-II algorithm. The data on the selected features at one of the wells (Well A) were used to train and test the models, with the data acquired at the other well used for validating the models. The following conclusions were drawn from this study:

• Application of the feature selection algorithm showed that an increase in the number of input parameters to the estimator algorithms increases their accuracy and the resultant coefficient of determination (COD). This performance improvement, however, becomes negligible as the number of input parameters exceeds 7. This implies that one can achieve an optimal combination of training time (minimum), accuracy (maximum), and complexity (simplest) for the model with exactly 7 input parameters.
• According to the results of the MLP-NSGA-II algorithm, the optimum set of input parameters to the final algorithms included DEPTH, RPM, MW, WOB, DT, TFLO, and NPHI.
• Training the models using 80% of the data at Well A showed that the CNN-based model could produce the smallest errors (RMSE = 1.7746 ft/h), as compared to other models.
• Application of the trained models to the test data at Well A showed that the CNN-based model was still associated with the lowest levels of error (RMSE = 2.5356 ft/h), as compared to the other models.
• The smaller difference in error between the training and testing phases for the CNN-based model indicated its higher generalizability to similar areas (other wells in the studied field).

Fig. 26 Error histogram and fitted normal distribution (red line) for eight models on the data from Well B (panels: a MELM-GA, b MELM-PSO, c MELM-COA, d LSSVM-GA, e LSSVM-PSO, f LSSVM-COA, g LSSVM, h CNN)

Table 11 A comparison between trained models on ROP prediction at Well B based on COD and various error indices

| Model | R-square | APD | AAPD | SD | RMSE (ft/h) |
|---|---|---|---|---|---|
| MELM-GA | 0.9991 | −9.9742 | 16.1760 | 3.4634 | 3.5750 |
| MELM-PSO | 0.9957 | −13.8909 | 18.4860 | 3.0728 | 3.0797 |
| MELM-COA | 0.9903 | −18.2783 | 22.0145 | 2.6024 | 2.6345 |
| LSSVM-GA | 0.9973 | −28.2629 | 34.5391 | 4.5075 | 4.5129 |
| LSSVM-PSO | 0.9932 | −7.7959 | 16.9898 | 3.2137 | 3.2911 |
| LSSVM-COA | 0.9911 | −16.2236 | 18.5532 | 2.6474 | 2.7686 |
| LSSVM | 0.9684 | −20.8521 | 31.5046 | 4.9122 | 4.9131 |
| CNN | 0.9890 | −10.6179 | 16.7185 | 2.5547 | 2.5791 |
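Reading off a table like the one above reduces to sorting models by RMSE; the dictionary below copies the RMSE column of Table 11, so only the helper function itself is new.

```python
def rank_by_rmse(results):
    """Return model names sorted by ascending RMSE (best first)."""
    return sorted(results, key=results.get)

# RMSE (ft/h) column of Table 11 (Well B validation)
table11_rmse = {
    "MELM-GA": 3.5750, "MELM-PSO": 3.0797, "MELM-COA": 2.6345,
    "LSSVM-GA": 4.5129, "LSSVM-PSO": 3.2911, "LSSVM-COA": 2.7686,
    "LSSVM": 4.9131, "CNN": 2.5791,
}
```

`rank_by_rmse(table11_rmse)` puts CNN first, followed by MELM-COA and LSSVM-COA, with simple LSSVM last, matching the discussion in the text.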

• The hybridized form of the LSSVM model was found to be more accurate than its simple form, proving the superior performance of metaheuristic algorithms in terms of realizing more accurate results.
• The hybridized forms of the MELM algorithm produced lower RMSEs than the corresponding LSSVM-based algorithms.
• Application of the models for predicting the ROP at Well B showed that the CNN-based model could produce much more accurate results than the other models, further proving the high generalizability of this model.
• Given the better results of the proposed methodology in this study, application of this methodology for predicting the ROP at vertical wells penetrating other fields is strongly recommended if an adequate volume of data is available.

Fig. 27 A comparison of outputs among the best estimator models over the depth profile of Well B

6 Appendix

6.1 Appendix A: More Applicable Physics-Based Models

6.1.1 Bourgoyne and Young Model

In 1974, Bourgoyne and Young developed a model for relating the ROP to different drilling fluid properties, formation characteristics, and mechanical parameters using exponential equations for roller-cone bits (Eq. (16)) [4]. This model is based upon eight components (f1 through f8) and eight coefficients (a1 through a8). Equations (16) through (24) express the mentioned components.

ROP = f1 × f2 × f3 × f4 × f5 × f6 × f7 × f8   (16)

f1 = e^(2.303 a1)   (17)

f2 = e^(2.303 a2 (10000 − D))   (18)

f3 = e^(2.303 a3 D^0.69 (gp − ρc))   (19)

f4 = e^(2.303 a4 D (gp − ρc))   (20)

f5 = [((WOB/db) − (WOB/db)t) / (4 − (WOB/db)t)]^a5   (21)

f6 = (RPM/60)^a6   (22)

f7 = e^(−a7 h)   (23)

f8 = (Fj/1000)^a8   (24)
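Equations (16)-(24) translate directly into code. Several symbol meanings (D: depth, gp: pore-pressure gradient, ρc: equivalent circulating density, h: fractional bit tooth wear, Fj: jet impact force) follow the standard Bourgoyne and Young formulation and are not all defined in this excerpt, so treat them as assumptions of this sketch; field units are assumed throughout.

```python
import math

def bourgoyne_young_rop(a, D, gp, rho_c, wob_per_db, wob_per_db_t,
                        rpm, h, Fj):
    """Multiplicative Bourgoyne and Young ROP model, Eqs. (16)-(24).
    `a` holds the eight coefficients a1..a8; wob_per_db is WOB/db and
    wob_per_db_t its threshold value (WOB/db)t."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    f1 = math.exp(2.303 * a1)                          # Eq. (17)
    f2 = math.exp(2.303 * a2 * (10000 - D))            # Eq. (18)
    f3 = math.exp(2.303 * a3 * D**0.69 * (gp - rho_c)) # Eq. (19)
    f4 = math.exp(2.303 * a4 * D * (gp - rho_c))       # Eq. (20)
    f5 = ((wob_per_db - wob_per_db_t)
          / (4 - wob_per_db_t)) ** a5                  # Eq. (21)
    f6 = (rpm / 60.0) ** a6                            # Eq. (22)
    f7 = math.exp(-a7 * h)                             # Eq. (23)
    f8 = (Fj / 1000.0) ** a8                           # Eq. (24)
    return f1 * f2 * f3 * f4 * f5 * f6 * f7 * f8       # Eq. (16)
```

The coefficients would be fitted per field within the bounds of Table 12 below, as the text describes.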

Table 12 Upper and lower bounds to constant coefficients of the Bourgoyne and Young model according to different studies

| Constant coefficient | Bourgoyne and Young [4] lower | upper | Eren [11] lower | upper | Nascimento et al. [21] lower | upper |
|---|---|---|---|---|---|---|
| a1 | 0.5 | 1.9 | 1.0005588 | 3.29142 | 0.5 | 3.91 |
| a2 | 0.000001 | 0.0005 | 0.000191 | 0.00479 | 0.000001 | 0.00479 |
| a3 | 0.000001 | 0.0009 | 0.00035 | 0.6588 | 0.000001 | 0.65885 |
| a4 | 0.000001 | 0.0001 | 0.000057 | 0.00034 | 0.000001 | 0.00086 |
| a5 | 0.5 | 2.0 | 0.102882 | 0.8528 | 0.102882 | 2.0 |
| a6 | 0.4 | 1.0 | 0.48 | 1.684 | 0.4 | 2.23 |
| a7 | 0.3 | 1.5 | 0.284286 | 2.587 | 0.025 | 2.5873 |
| a8 | 0.3 | 0.6 | 0.63243 | 1.0805 | 0.3 | 1.0805 |

The coefficients a1 through a8 are constants applied to the corresponding parameters, including the formation drillability, normal compaction trend, under-compaction exponent, differential pressure exponent, WOB, rotational speed component, bit wear exponent, and hydraulic exponent, respectively. Ranges of the coefficients a1 through a8 are set according to the drilling conditions and shall be determined considering previous drilling reports. Table 12 reports the presented ranges of the coefficients for the Bourgoyne and Young model (1974) as per different studies.

6.1.2 Warren Model

Warren proposed a model for roller-cone bits, as expressed in Eq. (25) [5].

ROP = [(a × UCS² × db²)/(RPM^b × WOB²) + c/(RPM × d)]^(−1)   (25)

where WOB, RPM, db, and UCS refer to weight-on-bit, rotational speed, bit diameter, and unconfined compressive strength of the rock, respectively. Moreover, a, b, c, and d are constant coefficients.

6.1.3 Hareland and Rampersad Model

The Hareland and Rampersad model was originally proposed for PDC bits, as written in Eq. (26) [6].

ROP = τ × (80 × nt × m × RPM^α)/(db² × tan²ω) × [WOB/(100 × nt × UCS)]²   (26)

where WOB, RPM, db, and UCS refer to weight-on-bit, rotational speed, bit diameter, and unconfined compressive strength of the rock, respectively. Moreover, m refers to the number of inserts on the bit, nt is the number of inserts in contact with rock, ω is the cutting angle of the formation rock, and τ is called the comprehensive factor. The parameter α in this model is a constant.

6.1.4 Motahhari et al. Model

Motahhari et al. proposed an ROP prediction model for PDC bits, as given in Eq. (27) [7].

ROP = Wf × (G × RPM^γ × WOB^β)/(db × CCS)   (27)

in which WOB, RPM, db, and CCS refer to weight-on-bit, rotational speed, bit diameter, and confined compressive strength of the rock, respectively. Moreover, Wf is the bit wear function and β, γ, and G are constant coefficients. A challenge encountered when trying to implement this model is the method of evaluating the bit wear function, which requires careful assessment of the bit wear at the well site.

6.1.5 Al-Abduljabbar et al. Model

In 2018, Al-Abduljabbar et al. proposed a special model for predicting ROP for PDC bits, as per Eq. (28) [8].

ROP = 16.96 × (TRQ × SPP × TFLO × RPM × WOB^f)/(db² × ρ × PV × UCS^e)   (28)

In this model, WOB, RPM, db, and UCS refer to weight-on-bit, rotational speed, bit diameter, and unconfined compressive strength of the rock, respectively. Moreover, SPP denotes standpipe pressure, TFLO is the fluid flow rate, TRQ is the torque, PV is the mud viscosity, and ρ is the bit wear factor, with e and f being constant coefficients.
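The two PDC-bit models above (Eqs. (27) and (28)) are single-expression formulas and can be sketched directly. Unit consistency is assumed and left to the caller; the parameter names are ours.

```python
def motahhari_rop(wf, G, gamma, beta, rpm, wob, db, ccs):
    """Motahhari et al. model, Eq. (27): wf is the bit wear function
    value; G, gamma, and beta are the constant coefficients."""
    return wf * (G * rpm ** gamma * wob ** beta) / (db * ccs)

def al_abduljabbar_rop(trq, spp, tflo, rpm, wob, db, rho, pv, ucs, e, f):
    """Al-Abduljabbar et al. model, Eq. (28), with constant exponents
    e (on UCS) and f (on WOB)."""
    return (16.96 * (trq * spp * tflo * rpm * wob ** f)
            / (db ** 2 * rho * pv * ucs ** e))
```

Both are algebraic one-liners; the practical difficulty noted in the text lies in estimating wf and the rock-strength inputs, not in the arithmetic.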

Table 13 A comparison between the results of the models trained on the noisy and denoised data for predicting the ROP in the training and testing phases

| Step | Model | R-square | APD | AAPD | SD | RMSE (ft/h) |
|---|---|---|---|---|---|---|
| Training | MELM-GA | 0.9756 | −4.6748 | 9.5518 | 2.2108 | 2.2113 |
| Training | MELM-PSO | 0.9762 | −3.7482 | 8.8592 | 2.1675 | 2.1695 |
| Training | MELM-COA | 0.9769 | −3.6041 | 8.2079 | 1.9589 | 1.9598 |
| Training | LSSVM-GA | 0.9738 | −4.4219 | 9.2675 | 2.2811 | 2.2816 |
| Training | LSSVM-PSO | 0.9725 | −4.1927 | 9.9359 | 2.2112 | 2.2147 |
| Training | LSSVM-COA | 0.9767 | −4.1087 | 9.1904 | 2.1288 | 2.1302 |
| Training | LSSVM | 0.9789 | −2.9671 | 10.7493 | 2.7436 | 2.7448 |
| Training | CNN | 0.9802 | −3.1728 | 8.0057 | 1.9018 | 2.9031 |
| Test | MELM-GA | 0.8792 | −5.9075 | 13.4862 | 3.3186 | 3.3185 |
| Test | MELM-PSO | 0.8803 | −5.8762 | 14.1379 | 3.2407 | 3.2406 |
| Test | MELM-COA | 0.8995 | −4.4981 | 12.0958 | 2.9516 | 2.9509 |
| Test | LSSVM-GA | 0.8790 | −4.5008 | 14.2366 | 3.4172 | 3.4174 |
| Test | LSSVM-PSO | 0.8796 | −5.5993 | 13.8594 | 3.2401 | 3.2400 |
| Test | LSSVM-COA | 0.8884 | −5.4188 | 13.0527 | 3.1975 | 3.1972 |
| Test | LSSVM | 0.8945 | −2.9596 | 13.1287 | 3.4169 | 3.4159 |
| Test | CNN | 0.8936 | −4.4607 | 11.0813 | 2.6903 | 2.6908 |

Table 14 A comparison between the results of the models trained on the noisy and denoised data for predicting the ROP in the validation phase

| Model | R-square | APD | AAPD | SD | RMSE (ft/h) |
|---|---|---|---|---|---|
| MELM-GA | 0.9911 | −9.9815 | 16.5079 | 3.4983 | 3.6082 |
| MELM-PSO | 0.9862 | −13.5892 | 18.6490 | 3.1207 | 3.1229 |
| MELM-COA | 0.9855 | −17.9908 | 22.4611 | 2.6994 | 2.7286 |
| LSSVM-GA | 0.9905 | −28.0072 | 35.1125 | 4.5398 | 4.5443 |
| LSSVM-PSO | 0.9856 | −8.0004 | 17.4923 | 3.2594 | 3.3371 |
| LSSVM-COA | 0.9853 | −16.4928 | 18.8007 | 2.7008 | 2.8296 |
| LSSVM | 0.9562 | −21.1067 | 32.1851 | 5.0172 | 5.0195 |
| CNN | 0.9839 | −10.7536 | 16.9916 | 2.5953 | 2.6307 |
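Tables 13 and 14 contrast models trained on noisy data with those trained on median-filtered data. A minimal 1-D median filter of the kind used in the denoising step can be sketched as follows; the window length and the shrinking-window edge handling are our assumptions, not the paper's settings.

```python
def median_filter(signal, window=5):
    """1-D median filter: replace each sample with the median of its
    neighborhood, which suppresses isolated spikes in a log."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    out = []
    for i in range(len(signal)):
        # window shrinks near the edges instead of padding
        neighborhood = sorted(signal[max(0, i - half):i + half + 1])
        out.append(neighborhood[len(neighborhood) // 2])
    return out
```

For production use, `scipy.signal.medfilt` offers the same operation on NumPy arrays.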

6.1.6 Bingham Model

The Bingham model was proposed for both common types of bits, as per Eq. (29) [9].

ROP = Fd × RPM × (WOB/db)^A5   (29)

in which WOB, RPM, and db refer to weight-on-bit, rotational speed, and bit diameter, respectively. In addition, Fd denotes the formation drillability, and A5 is the WOB exponent. A common problem with this method is the determination of A5, a task that can be accomplished only based on in-lab tests on the bit.

6.2 Performance Evaluation of the Denoising Step

In order to evaluate the effect of the noise attenuation process on the performance of the models, the algorithms adopted in this study were applied to noisy data. For this purpose, the petrophysical logs that were selected through feature selection by the MLP-NSGA-II were declared as input to the algorithms in the training phase. Similar to the training phase, the controllable parameters of the algorithms were determined based on the denoised data. Given that the range of input data could affect the values of the considered criteria, the models that were trained on noisy data were applied to the denoised data of the training and testing phases to predict the ROP. Table 13 reports the results of applying the models that were trained on the noisy data to the denoised data of the training and testing phases based on the estimation error and coefficients of determination. A comparison between this table and Tables 9 and 10, which refer to the models that were trained on the denoised data in the training and testing phases, highlights that the models trained on the denoised data provided higher levels of accuracy. In addition, the smaller difference in estimation error between

the training and testing phases for the denoised data, as compared to the noisy data, indicated the higher reliability of the models trained on the denoised data. A similar performance evaluation step was conducted in the validation phase. The ROP estimation error generated upon training the model on the noisy data and then applying the trained model on the denoised data from Well B (Table 14) was compared to that for the model trained on the denoised data for the validation phase (Table 11), indicating the higher generalizability of the models trained on the denoised data.

Authors' Contributions Morteza Matinkia took part in data curation and visualization; Amirhossein Sheykhinasab involved in visualization and writing—original draft; Soroush Shojaei involved in methodology, formal analysis, validation, and visualization; Ali Vojdani Tazeh Kand took part in investigation and writing—review & editing; Arad Elmi involved in writing—original draft and formal analysis; Mahdi Bajolvand took part in investigation and visualization; Mohammad Mehrad involved in project administration and code development.

Funding There is no funding for this study.

Declarations

Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Availability of Data and Material Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data are not available.

Code Availability Developed codes are not available.

Consent to Participate All authors participated in this research.

Consent for Publication All authors agreed to submit and publish this manuscript in the Arabian Journal for Science and Engineering.

References

1. Anemangely, M.; Ramezanzadeh, A.; Tokhmechi, B.: Drilling rate prediction from petrophysical logs and mud logging data using an optimized multilayer perceptron neural network. J. Geophys. Eng. 15, 1146–1159 (2018)
2. Augustine, C.; Tester, J.W.; Anderson, B.: A comparison of geothermal with oil and gas well drilling costs. In: Proceedings. Stanford University, Stanford, California, p. 16 (2006)
3. Amar, K.; Ibrahim, A.: Rate of penetration prediction and optimization using advances in artificial neural networks: a comparative study. In: Proceedings of the 4th International Joint Conference on Computational Intelligence, pp. 647–652. SciTePress—Science and Technology Publications, Barcelona, Spain (2012). https://doi.org/10.5220/0004172506470652
4. Bourgoyne, A.T.J.; Young, F.S.J.: A multiple regression approach to optimal drilling and abnormal pressure detection. Soc. Pet. Eng. J. 14, 371–384 (1974). https://doi.org/10.2118/4238-PA
5. Warren, T.M.: Penetration rate performance of roller cone bits. SPE Drill. Eng. 2, 9–18 (1987). https://doi.org/10.2118/13259-PA
6. Hareland, G.; Rampersad, P.R.: Drag-bit model including wear. Society of Petroleum Engineers (1994)
7. Motahhari, H.R.; Hareland, G.; James, J.A.; Bartlomowicz, M.: Improved drilling efficiency technique using integrated PDM and PDC bit parameters. Petroleum Society of Canada (2008)
8. Al-AbdulJabbar, A.; Elkatatny, S.; Mahmoud, M.: A robust rate of penetration model for carbonate formation. J. Energy Resour. Technol. 141, 042903 (2019)
9. Bingham, G.: A new approach to interpreting rock drillability. Tech. Man. Reprint, Oil Gas J., 93 p. (1965)
10. Etesami, D.; Zhang, W.J.; Hadian, M.: A formation-based approach for modeling of rate of penetration for an offshore gas field using artificial neural networks. J. Nat. Gas Sci. Eng. 104104 (2021). https://doi.org/10.1016/j.jngse.2021.10410
11. Eren, T.: Real time optimization of drilling parameters during drilling operations. Ph.D. thesis, Middle East Technical University (2015)
12. Kutas, D.T.; Nascimento, A.; Elmgerbi, A.M.; Roohi, A.; Prohaska, M.; Thonhauser, G.; Mathias, M.H.: A study of the applicability of Bourgoyne and Young ROP model and fitting reliability through regression. Paper presented at the International Petroleum Technology Conference, Doha, Qatar, December (2015). https://doi.org/10.2523/IPTC-18521-MS
13. Anemangely, M.; Ramezanzadeh, A.; Tokhmechi, B.: Determination of constant coefficients of Bourgoyne and Young drilling rate model using a novel evolutionary algorithm. J. Min. Environ. 8, 693–702 (2017). https://doi.org/10.22044/jme.2017.842
14. Bahari, M.H.; Bahari, A.; Moharrami, F.N.; Naghabi Sistani, M.: Determination of Bourgoyne and Young model coefficients using genetic algorithm to predict drilling rate. J. Appl. Sci. (2008). https://doi.org/10.3923/jas.2008.3050.3054
15. Hegde, C.; Gray, K.E.: Use of machine learning and data analytics to increase drilling efficiency for nearby wells. J. Nat. Gas Sci. Eng. 40, 327–335 (2017). https://doi.org/10.1016/j.jngse.2017.02.019
16. Bilgesu, H.I.; Tetrick, L.T.; Altmis, U.: A new approach for the prediction of rate of penetration (ROP) values. Society of Petroleum Engineers (1997)
17. Pollock, J.; Stoecker-Sylvia, Z.; Veedu, V.: Machine learning for improved directional drilling. In: Offshore Technology Conference (2018)
18. Sabah, M.; Mohsen, T.; Wood, D.A.; Khosravanian, R.; Anemangely, M.; Younesi, A.: A machine learning approach to predict drilling rate using petrophysical and mud logging data. Earth Sci. Inform. (2019). https://doi.org/10.1007/s12145-019-00381-4
19. Gan, C.; Cao, W.; Wu, M.: Prediction of drilling rate of penetration (ROP) using hybrid support vector regression: a case study on the Shennongjia area, Central China. J. Pet. Sci. Eng. 106200 (2019)
20. Ahmed, O.; Adeniran, A.; Samsuri, A.: Rate of penetration prediction utilizing hydromechanical specific energy. IntechOpen (2018). https://doi.org/10.5772/intechopen.76903
21. Nascimento, A.; Elmgerbi, A.; Roohi, A.: Reverse engineering: a new well monitoring and analysis methodology approaching playing-back drill-rate tests in real-time for drilling optimization. J. Energy Resour. Technol. (2017). https://doi.org/10.1115/1.4033067
22. Gandelman, R.A.: Predição da ROP e otimização em tempo real de parâmetros operacionais na perfuração de poços de petróleo offshore. Ph.D. thesis, Federal University of Rio de Janeiro (2012)
23. Ahmed, O.S.; Adeniran, A.A.; Samsuri, A.: Computational intelligence based prediction of drilling rate of penetration: a comparative study. J. Pet. Sci. Eng. 172, 1–12 (2019)
24. Bani Mustafa, A.; Abbas, A.K.; Alsaba, M.: Improving drilling performance through optimizing controllable drilling parameters. J. Petrol. Explor. Prod. Technol. 11, 1223–1232 (2021). https://doi.org/10.1007/s13202-021-01116-2
25. Tewari, S.; Dwivedi, U.D.; Biswas, S.: Intelligent drilling of oil and gas wells using response surface methodology and artificial bee colony. Sustainability 13, 1664 (2021). https://doi.org/10.3390/su13041664
26. Elkatatny, S.: Development of a new rate of penetration model using self-adaptive differential evolution-artificial neural network. Arab. J. Geosci. (2019). https://doi.org/10.1007/s12517-018-4185-z
27. Ashrafi, S.B.; Anemangely, M.; Sabah, M.; Ameri, M.J.: Application of hybrid artificial neural networks for predicting rate of penetration (ROP): a case study from Marun oil field. J. Pet. Sci. Eng. 175, 604–623 (2019)
28. Mehrad, M.; Bajolvand, M.; Ramezanzadeh, A.; Neycharan, J.G.: Developing a new rigorous drilling rate prediction model using a machine learning technique. J. Pet. Sci. Eng. 192, 107338 (2020). https://doi.org/10.1016/j.petrol.2020.107338
29. Ansari, H.R.; Sarbaz Hosseini, M.J.; Amirpour, M.: Drilling rate of penetration prediction through committee support vector regression based on imperialist competitive algorithm. Carbonates Evaporites 32, 205–213 (2017). https://doi.org/10.1007/s13146-016-0291-8
30. Matinkia, M.; Amraeiniya, A.; Behboud, M.M.; Mehrad, M.; Bajolvand, M.; Gandomgoun, M.H.; Gandomgoun, M.: A novel approach to pore pressure modeling based on conventional well logs using convolutional neural network. J. Pet. Sci. Eng. 211, 110156 (2022). https://doi.org/10.1016/j.petrol.2022.110156
31. Mehrad, M.; Ramezanzadeh, A.; Bajolvand, M.; Reza Hajsaeedi, M.: Estimating shear wave velocity in carbonate reservoirs from petrophysical logs using intelligent algorithms. J. Pet. Sci. Eng. 212, 110254 (2022). https://doi.org/10.1016/j.petrol.2022.110254
32. Abad, A.R.B.; Ghorbani, H.; Mohamadian, N.; Davoodi, S.; Mehrad, M.; Aghdam, S.K.; Nasriani, H.R.: Robust hybrid machine learning algorithms for gas flow rates prediction through wellhead chokes in gas condensate fields. Fuel 308, 121872 (2022). https://doi.org/10.1016/j.fuel.2021.121872
33. Maletic, J.I.; Marcus, A.: Data cleansing: beyond integrity analysis. In: Conference on Information Quality, pp. 200–209 (2000)
34. Wu, X.: Knowledge Acquisition from Databases. Intellect Books (1995)
35. García, L.P.; de Carvalho, A.C.; Lorena, A.C.: Noisy data set identification. In: International Conference on Hybrid Artificial Intelligence Systems, pp. 629–638. Springer (2013)
36. Lorena, A.C.; de Carvalho, A.C.: Evaluation of noise reduction techniques in the splice junction recognition problem. Genet. Mol. Biol. 27(4), 665–672 (2004)
37. Gonzalez, R.; Woods, R.: Digital Image Processing. Pearson/Prentice Hall (2008). Available: http://books.google.com/books?id=8uGOnjRGEzoC
38. Osman, H.; Ghafari, M.; Nierstrasz, O.: The impact of feature selection on predicting the number of bugs. arXiv:1807.04486 [cs] (2018)
39. Lee, K.B.; Cheon, S.; Kim, C.O.: A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes. IEEE Trans. Semicond. Manuf. 30, 135–142 (2017)
40. Liu, Y.; Chen, G.: Optimal parameters design of oilfield surface pipeline systems using fuzzy models. Inf. Sci. 120(1–4), 13–21 (1999)
41. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197 (2002). https://doi.org/10.1109/4235.996017
42. Liang, N.Y.; Huang, G.B.; Saratchandran, P.; Sundararajan, N.: A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. Learn. Syst. 17, 1411–1423 (2006)
43. Yeom, C.U.; Kwak, K.C.: Short-term electricity-load forecasting using a TSK-based extreme learning machine with knowledge representation. Energies 10(10), 1613 (2017). https://doi.org/10.3390/en10101613
44. Huang, G.B.; Wang, D.H.; Lan, Y.: Extreme learning machines: a survey. Int. J. Mach. Learn. Cybern. 2, 107–122 (2011). https://doi.org/10.1007/s13042-011-0019-y
45. Vapnik, V.: The Nature of Statistical Learning Theory. Springer Science & Business Media, Berlin (2013)
46. Wang, H.; Hu, D.: Comparison of SVM and LS-SVM for regression. In: International Conference on Neural Networks and Brain, pp. 279–283. IEEE (2005)
47. Si, G.; Shi, J.; Guo, Z.; Jia, L.; Zhang, Y.: Reconstruct the support vectors to improve LSSVM sparseness for mill load prediction. Math. Probl. Eng. (2017). https://doi.org/10.1155/2017/4191789
48. Sabah, M.; Mehrad, M.; Ashrafi, S.B.; Wood, D.A.; Fathi, S.: Hybrid machine learning algorithms to enhance lost-circulation prediction and management in the Marun oil field. J. Pet. Sci. Eng. 198, 108125 (2021). https://doi.org/10.1016/j.petrol.2020.108125
49. Anemangely, M.; Ramezanzadeh, A.; Amiri, H.; Hoseinpour, S.A.: Machine learning technique for the prediction of shear wave velocity using petrophysical logs. J. Pet. Sci. Eng. (2019). https://doi.org/10.1016/j.petrol.2018.11.032
50. Duan, K.; Keerthi, S.S.; Poo, A.N.: Evaluation of simple performance measures for tuning SVM hyperparameters. Neurocomputing 51, 41–59 (2003). https://doi.org/10.1016/S0925-2312(02)00601-X
51. Indolia, S.; Goswami, A.K.; Mishra, S.P.; Asopa, P.: Conceptual understanding of convolutional neural network—a deep learning approach. Procedia Comput. Sci. 132, 679–688 (2018). https://doi.org/10.1016/j.procs.2018.05.069
52. Nebauer, C.: Evaluation of convolutional neural networks for visual recognition. IEEE Trans. Neural Netw. 9, 685–696 (1998)
53. Mrazova, I.; Kukacka, M.: Can deep neural networks discover meaningful pattern features? Procedia Comput. Sci. 12, 194–199 (2012)
54. Rajabioun, R.: Cuckoo optimization algorithm. Appl. Soft Comput. 11, 5508–5518 (2011). https://doi.org/10.1016/j.asoc.2011.05.008
55. Kennedy, J.; Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN'95—International Conference on Neural Networks, pp. 1942–1948. IEEE (1995)
56. de Moura Meneses, A.A.; Machado, M.D.; Schirru, R.: Particle swarm optimization applied to the nuclear reload problem of a pressurized water reactor. Prog. Nucl. Energy 51, 319–326 (2009). https://doi.org/10.1016/j.pnucene.2008.07.002
57. Pedersen, M.E.H.; Chipperfield, A.J.: Simplifying particle swarm optimization. Appl. Soft Comput. 10, 618–628 (2010)
58. Coello, C.C.; Lamont, G.B.; van Veldhuizen, D.A.: Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. Springer, US (2007)
59. Katoch, S.; Chauhan, S.S.; Kumar, V.: A review on genetic algorithm: past, present, and future. Multimed. Tools Appl. 1–36 (2020)
60. Kunjur, A.; Krishnamurty, S.: Genetic algorithms in mechanism synthesis. J. Appl. Mech. Robot. 4, 18–24 (1997)
61. Michalewicz, Z.; Schoenauer, M.: Evolutionary algorithms for constrained parameter optimization problems. Evol. Comput. 4, 1–32 (1996)
