
Earth Science Informatics (2019) 12:319–339

https://doi.org/10.1007/s12145-019-00381-4

RESEARCH ARTICLE

A machine learning approach to predict drilling rate using petrophysical and mud logging data
Mohammad Sabah 1 · Mohsen Talebkeikhah 1 · David A. Wood 2 · Rasool Khosravanian 1 · Mohammad Anemangely 3 · Alireza Younesi 1

Received: 4 October 2018 / Accepted: 4 March 2019 / Published online: 25 March 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
Predicting the drilling rate of penetration (ROP) is one approach to optimizing drilling performance. However, as ROP behavior is unique to specific geological conditions, such prediction is not straightforward. Moreover, ROP is
typically affected by various operational factors (e.g. bit type, weight-on-bit, rotation rate, etc.) as well as the
geological characteristics of the rocks being penetrated. This makes ROP prediction an intricate and multi-faceted
problem. Here we compare data mining methods with several machine learning algorithms to evaluate their accuracy
and effectiveness in predicting ROP. The algorithms considered are: artificial neural networks (ANN) applying a
multi-layer perceptron (MLP); ANN applying a radial basis function (RBF); support vector regression (SVR); and a hybrid MLP trained using a particle swarm optimization algorithm (MLP-PSO). Data preparation prior to executing
the algorithms involves applying a Savitzky–Golay (SG) smoothing filter to remove noise from petrophysical well-
logs and drilling data from the mud-logs. A genetic algorithm is applied to tune the machine learning algorithms by
identifying and ranking the most influential input variables on ROP. This tuning routine identified and selected eight
input variables which have the greatest impact on ROP. These are: weight on bit, bit rotational speed, pump flow
rate, pump pressure, pore pressure, gamma ray, density log and sonic wave velocity. Results showed that the
machine learning algorithms evaluated all predicted ROP accurately. Their performance was improved when applied
to filtered data rather than raw well-log data. The MLP-PSO model as a hybrid ANN demonstrated superior
accuracy and effectiveness compared to the other ROP-prediction algorithms evaluated, but its performance is
rivalled by the SVR model.

Keywords Rate of penetration · Data mining · Machine-learning predictions · ROP variables · Feature selection ranking · Data filtering

Communicated by: H. Babaie

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s12145-019-00381-4) contains supplementary material, which is available to authorized users.

* Mohammad Sabah
[email protected]

1 Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Tehran 15875-4413, Iran
2 DWA Energy Limited, Lincoln, UK
3 Faculty of Mining, Petroleum and Geophysics Engineering, Shahrood University of Technology, Shahrood, Iran

Introduction

A high-priority goal for the drilling industry is to find a robust solution for predicting and optimizing the drilling rate of penetration (ROP). However, optimizing ROP on its own is unlikely to optimize the overall drilling operation because there are complex interdependencies between ROP and a number of other drilling variables. Geological, geotechnical and operational factors, as well as the drilling-rig equipment characteristics and specifications, render ROP behavior highly non-linear and, consequently, difficult to predict (Akgun 2007; Darbor et al. 2017). The impact of the various influencing parameters on drilling rate can be modeled by several mathematical approaches, such as non-linear correlations and stochastic methods (Moradi et al. 2010).

Over the past few decades, many researchers have tried to establish relationships between the parameters influencing ROP and to develop ROP-prediction models (Bingham 1965; Warren 1987; Winters et al. 1987; Hareland and Rampersad 1994; Motahhari et al. 2010). Unfortunately, none of the published analytically-derived correlations results in a reliably accurate prediction model. This is due to the complexity and uncertainty which exist among the parameters affecting drilling rate at different geological provinces, specific well locations and drilling equipment (Mendes et al. 2007). Furthermore, applying these analytical models sometimes leads to wildly inaccurate and meaningless ROP predictions.

In recent years, significant advances in computational processing power and machine learning algorithms have introduced more powerful and flexible data processing solutions, such as: artificial neural networks (ANN), support vector regression (SVR), fuzzy inference systems (FIS), efficient data mining techniques, evolutionary optimization algorithms, and the opportunities to combine them in hybrid forms. This capability has facilitated more robust prediction performance for many complex non-linear systems, such as drilling rate of penetration. Such approaches tend to outperform the traditional non-linear correlation methods because they are able to minimize uncertainty for a significant number of input variables with variable non-linear dependencies and generate more consistent and reliable predictions of the dependent variable(s) (Yilmaz and Kaynar 2011).

Many previous studies have applied various machine-learning techniques and data mining methods to predict ROP and other key drilling variables. Basarir et al. (2014) compared the capability of multiple regression methods (linear and non-linear) and an adaptive neuro-fuzzy inference system (ANFIS) for predicting ROP. Their results showed that the ANFIS technique achieved superior prediction accuracy in comparison with the regression methods. Hegde et al. (2015) predicted ROP by utilizing statistical learning techniques such as trees, bagged trees and random forests, which were applied to a data set with nine input variables. Bodaghi et al. (2015) used genetic and cuckoo search algorithms to optimize SVR models that establish meaningful interaction relationships among drilling variables and ROP. On the other hand, Kahraman (2016) claimed that the ANN models evaluated were more dependable than regression models for predicting ROP. These ANN models involved a unique combination of input variables including uniaxial compressive strength, tensile strength and relative abrasiveness of the formations being drilled, in addition to the commonly used drilling operational variables. Jiang and Samuel (2016) utilized a hybrid artificial neural network coupled with an ant colony optimization (ACO) algorithm in an attempt to optimize ROP prediction. Shi et al. (2016) assessed the performance of various machine-learning techniques for predicting ROP. The techniques evaluated included an extreme learning machine, an artificial neural network and an upper-layer-solution-aware model. Their results indicated that all of these methods were able to generate highly accurate ROP predictions. Khandelwal and Armaghani (2016) compared a multiple regression strategy with ANN and hybrid genetic algorithm (GA)-ANN models to predict drilling rate index (DRI) based on the geological attributes of the formations being drilled. Their results suggested that the GA-ANN technique was more responsive in predicting DRI compared to the other models they evaluated. Bezminabadi et al. (2017) estimated ROP applying multiple nonlinear regression (MNR) and an ANN. Typically, geomechanical properties of the rock formations being penetrated by the drill bit have been considered together with operational parameters to improve the ROP prediction performance of the machine-learning models proposed. Eskandarian et al. (2017) proposed the application of a random forest (RF) optimization algorithm and a monotone multi-layer perceptron (MON-MLP) for estimating ROP. They applied the R package fscaret (Szlek and Mendyk 2015) to analyze the relative importance of a range of input parameters on ROP as the dependent variable. Hegde et al. (2017) compared analytics-based and machine learning models for generating ROP predictions. Their results demonstrated that data-driven models achieved better ROP predictions compared to physics-based models. Darbor et al. (2017) developed and implemented non-linear multiple regression (NLMR) and multilayer perceptron neural network (MLP-ANN) models for ROP prediction. Their results indicated that brittleness, rock quality designation (RQD) index, water content, and anisotropy index were the input variables with the most influence on ROP. Yavari et al. (2018) evaluated the performance of ANFIS and mathematically derived ROP models in terms of ROP estimation. Abbas et al. (2018) implemented an artificial neural network for predicting drilling rate, reducing drilling time as well as drilling cost. They took into account the wellbore geometry, including inclination and azimuth, along with other drilling parameters.

The literature review provided demonstrates that data mining and artificial intelligence methods have been widely used by many researchers in recent years to predict ROP from a range of drilling variables, in most cases achieving superior accuracy to traditional correlations. Despite these previous works, uncertainty remains concerning which machine-learning techniques and drilling input variables provide the most accurate and reliable predictions of ROP. Another limitation of previous studies is ignoring the formation parameters and rock mechanical characteristics in developing the predictive models for ROP. Although in some studies formation characteristics were involved in ROP estimation, they were either treated as constants or were not applicable during drilling operations. Failure to consider formation characteristics has contributed to the low accuracy of proposed models while making them slow to converge (Sultan and Al-Kaabi

2002; Abtahi 2011; Hamrick 2011; Huang et al. 2011). Wireline logs can provide valuable information concerning the mechanical characteristics of underground layers, but they are run once the drilling of a formation is actually completed. This is one reason why some ROP prediction models ignore rock characteristics in real-time drilling penetration rate estimation. However, with modern well data handling and interpretation techniques, such information can provide useful insight for the planning and execution of infill wells (i.e., future wells planned to be drilled between two existing wells) (Raji et al. 2017). Furthermore, building 3D-earth models, such as geophysical-earth models, based on existing and offset wells' data is a simple, and now routinely applied, method that is effective in predicting and comparing petrophysical and mechanical rock properties at proposed locations for future infill wells. These methods and techniques mean that the available petrophysical log data (from offset wells) can be exploited as an effective tool for estimating drilling ROP during ongoing drilling operations, particularly in field developments where multiple development wells are drilled and logged.

Considering the limitations of previous studies, the objective of this study is to perform a detailed comparison of several data mining and machine-learning methods, evaluating their accuracy and effectiveness in predicting ROP. In addition, the models evaluated here take into account petrophysical data together with mud logging data (i.e., geological data that is routinely generated in almost all drilling operations) to include rock characteristics in developing the relevant models and providing accurate estimates of ROP. The models developed also incorporate noise filtering (smoothing high-frequency data fluctuations from the well-log data) and feature-selection (establishing the input variables with the most influence on ROP) routines to accelerate the data processing and keep the models from overfitting.

Methodology

As previously mentioned, there is an array of parameters which influence ROP directly and indirectly. However, considering a large number of parameters for developing a predictive model will increase computational cost and, in some cases, decrease model accuracy. Therefore, an optimum number and combination of these parameters should be considered to increase computational speed while preserving as much accuracy as possible. This aim can be achieved by applying a suitable feature selection technique to the extracted data set. However, before implementing such a technique, the measured data (which are contaminated with noise) should be denoised in order to develop accurate and reliable predictive models. The workflow adopted for this study is illustrated in Fig. 1. It involves five main phases: data collection; up-scaling and noise reduction; feature selection; training the machine-learning models; and evaluation of results when the models are applied to testing data sets that have not been used in model training.

Noise reduction by data filtering

Measured data derived from real drilling operations are always affected by noise, to a greater or lesser extent. Even in the best controlled conditions, errors in the measured data are approximately 5% or more (Orr 1998; Redman 1998). There are various statistical definitions of the term "noise" that are frequently applied to machine learning datasets. It is commonly recognized that data sets permeated with noise have negative impacts on the learning capabilities of machine-learning algorithms, typically increasing the computational time taken for them to learn and reducing their prediction accuracy (Anemangely et al. 2018). Also, applying the rules and general relationships established for trained models to new datasets is problematical when there is noise present either within the training dataset or the new dataset (Lorena and de Carvalho 2004; Garcia et al. 2015).

Noisy data for any of the variables involved in the dataset is a particular issue for accuracy when that data involves a low signal-to-noise ratio (SNR). Detecting the source of the noise and distinguishing noisy data records is essential if that noise is to be diminished or removed. Many factors can lead to noise and irregularities in drilling datasets (Anemangely et al. 2018), including: drill-string vibration; bit, or other downhole tool, changes; replacement of drillers and/or drilling engineers during the drilling operation; geological heterogeneities in the formation being drilled; changes in mud additives, etc.

A Savitzky–Golay (SG) smoothing filter (Savitzky and Golay 1964) was applied in order to eliminate at least some of the noise from the compiled dataset. This method applies a polynomial function to reduce noise in the data variables by replacing acquired values in the data records that are identified as noise with values generated by the SG function. On the basis of the least-squares error identified, an n-order polynomial function is derived for selected points across a drilled formation. The number of selected points should be odd and greater than the order of the derived polynomial function. The derived polynomial function will preserve the data trends for the variable if a higher polynomial order is applied, or if the number of data records fitted within that interval is reduced. However, a reduction in the polynomial order or a larger number of data records used to define the specific interval may destroy the data trends for the variable and result in excessive data smoothing.
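As a rough illustration of the windowed least-squares idea described above, the following sketch implements an SG-style smoother with NumPy. The window length and polynomial order below are illustrative choices only, not the values tuned in this study:

```python
import numpy as np

def sg_smooth(y, window=11, order=3):
    """Savitzky-Golay-style smoothing: fit an `order`-degree polynomial by
    least squares over each odd-length window and replace the window's
    centre sample with the fitted value."""
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and greater than order")
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, order + 1, increasing=True)  # design matrix of the fit
    centre_row = (A @ np.linalg.pinv(A))[half]    # least-squares weights for the centre point
    y = np.asarray(y, dtype=float)
    out = y.copy()                                # edge samples are left unsmoothed here
    for i in range(half, len(y) - half):
        out[i] = centre_row @ y[i - half:i + half + 1]
    return out

# Smoothing a noisy synthetic "log": the filtered curve tracks the underlying
# trend while suppressing high-frequency noise.
depth = np.linspace(0.0, 2.0 * np.pi, 200)
rng = np.random.default_rng(0)
noisy = np.sin(depth) + rng.normal(0.0, 0.2, depth.size)
smoothed = sg_smooth(noisy, window=11, order=3)
```

In practice, `scipy.signal.savgol_filter` provides an equivalent, production-quality implementation of this filter; the explicit version above only makes the fit-and-replace mechanics visible.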

Fig. 1 Schematic diagram of workflow applied to the ROP-prediction models developed (RF = Random Forest, DT = Decision Tree, SVR = Support Vector Regression; MLP = Multi-Layer Perceptron; RBF = Radial-Basis Function)

Feature selection – Establishing the input variables with significant influence on ROP

Feature ranking establishes a decision-making index which defines the relative importance of the features (variables) in their influence on the dependent variable (ROP). This index can then be used to reduce the dimensions of the dataset (i.e., reduce the number of input variables considered) by eliminating those variables with very low influence on the dependent variable. Applying a feature selection routine can lead to: shorter algorithm training times; clarification of model relationships among the variables selected; a reduced risk of overfitting the data; and lower costs associated with data collection (James et al. 2013). Methods historically applied for feature selection include filter methods, embedded methods, unsupervised learning techniques, and sequential and heuristic search algorithms (Law et al. 2004; Chandrashekar and Sahin 2014; Davoudi and Vaferi 2018; Gholami et al. 2018). Differences between predicted and observed ROP data need to be minimized. Consequently, residual error (typically measured as mean squared or root mean squared residual error) tends to be the most effective criterion in selecting meaningful input parameters. Utilizing correlation coefficients (a subgroup of filter methods that have been widely used in many previous studies) is sub-optimal in selecting the most influential features (Anemangely et al. 2017). Conversely, wrapper methods, which employ sequential or heuristic search algorithms, achieve more robust solutions, particularly when interacting with a large number of features (Vafaie and Imam 1994; Chandrashekar and Sahin 2014). Accordingly, in this

study, a genetic algorithm (GA), which is a widely-used heuristic search algorithm, was implemented to determine the indispensable inputs for ROP estimation, with RMSE as the objective function to be minimized.

Regarding feature selection using a GA, as with other meta-heuristic algorithms, the initial GA population is generated randomly, followed by the execution of three distinct "reproduction" operations to modify the population for subsequent iterations (generations) of the model, viz., crossover, mutation and random selection (Ab Wahab et al. 2015). Once a population of solutions is established for each new generation, the relative suitability of each individual solution in the population is determined. This is achieved using a "rank-based" method, by calculating a fitness measure related to the error between the actual measured objective function value and the prediction. The individual solutions (generated using a pre-defined MLP network) in a generation are then ranked according to their values for that fitness function. A selection-pressure term is defined and applied to the GA-MLP feature selection algorithm using roulette-wheel selection. This eventually identifies the features (variables) that have the greatest impact on the accuracy of the ROP predictions. Initially, just one feature, which achieves the best fitness-function value, is identified and selected. Progressively, the number of selected features is increased to identify the contribution of each in reducing the ROP prediction error. The selected combination of features can vary in each iteration of the GA-MLP due to the roulette-wheel selection process. The GA-MLP methodology developed for feature selection makes it possible to assess the trend of ROP-prediction-error reduction as the number of features involved increases. The appropriate structure of the MLP network (number of hidden layers and number of neurons in each hidden layer) used in the GA-MLP feature selection routine can be found through a trial-and-error approach. Following this, the dataset is randomly divided into two subsets, viz., a training subset and a testing subset, containing 70% and 30% of the entire dataset, respectively. In order to obtain a single value of model error for each round of feature selection, Eq. (1) can be used to combine the two error terms (from the training and testing subsets) (Anemangely et al. 2018):

$$RMSE_{model} = W_{train} \cdot RMSE_{train} + W_{test} \cdot RMSE_{test} \quad (1)$$

Where RMSE is the root mean square error between measured and predicted ROP values; and W are weights applied to the RMSE established for the training and testing subsets to adjust their contribution to the total cost function (RMSE_model) value. To avoid the model being over-trained (over-fitted), the weights applied in Eq. (1) for the feature-selection algorithm (W_train and W_test) are set to 0.45 and 0.55, respectively, and not allowed to vary (Anemangely et al. 2019). The sequence of steps involved in the feature ranking and selection routine developed is illustrated as a flowchart in Fig. 2.

Machine learning algorithms

Decision tree (DT) algorithm

The classification and regression tree (CART) technique belongs to a class of non-parametric supervised learning tools implemented in various industrial applications (Maucec et al. 2015). DT methods used for data mining are typically divided into two categories: (1) classification trees; and (2) regression trees. The appropriate type of tree to apply to a specific dataset is decided based on the output variable type: if discrete and categorical, a classification tree is appropriate; if continuous, then a regression tree is appropriate (Breiman 2017). As ROP is a continuous data variable, a regression tree analysis is applied to the compiled dataset.

The underlying procedure for applying either type of DT adheres to the same sequence of steps. A data mining DT divides observations (data records) into subgroups by creating splits on predictors (Maucec et al. 2015). This involves binary recursive partitioning, implementing a binary splitting process in which "parent" nodes are progressively divided into two "child" nodes. The number of rows constituting the tree gradually increases as more "child" splits are added. Eventually a row of "parent" nodes cannot be further divided and these form the terminal nodes (Singh 2017). The first step in constructing a DT is to establish the appropriate splitting criteria. This is typically achieved by consideration of the input variables, aiming to minimize the RMSE between the measured and calculated values of the output variable (ROP in the case considered). The next step generates one root node and two child nodes (Singh 2015). Subsequent steps involve repeating the procedure considering additional input variables to establish further splits creating new child nodes. Ultimately, the DT generates a logical sequence of splitting criteria based upon the input variables considered and a DT diagram illustrating the basis on which each split in the tree is created.

Random forest (RF) optimization algorithm

The random forest (RF) optimization algorithm involves an ensemble of several unpruned decision trees with sequential growth, rather than just a single, restricted tree (Breiman 2001). The main hallmarks of RF include bootstrap resampling, random feature selection, out-of-bag error estimation and full-depth decision tree growth (Jiang et al. 2009). For

Fig. 2 Flowchart describing the steps involved in the feature ranking and selection routine applied to the compiled dataset
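The single error score that the feature-selection routine assigns to each candidate feature subset (Eq. (1), with the fixed 0.45/0.55 weighting) can be sketched as follows. The predictions used here are hypothetical toy values, not results from the study's models:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def combined_rmse(rmse_train, rmse_test, w_train=0.45, w_test=0.55):
    """Eq. (1): weighted combination of training and testing RMSE.
    The fixed 0.45/0.55 weights penalise candidate feature subsets that fit
    the training subset well but generalise poorly to the testing subset."""
    return w_train * rmse_train + w_test * rmse_test

# Toy illustration with hypothetical ROP predictions (m/h):
score = combined_rmse(rmse([10.0, 20.0], [11.0, 21.0]),   # train RMSE = 1.0
                      rmse([10.0, 20.0], [12.0, 22.0]))   # test RMSE = 2.0
```

Within the GA loop, this `score` would serve as the fitness value to be minimized for each candidate combination of input variables.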

each tree in the random forest, a new training data set is created by applying the bootstrap sampling method, and the remaining data set is called out-of-bag (OOB) (Jiang et al. 2009). A decision tree is grown using the newly created training data set, without pruning, through the CART methodology (Duda et al. 2012). At each node split of a decision tree, a small subset (m) of features is selected randomly from all the available features (input variables, M). The size of m stays constant throughout the growing of the forest. However, the actual features included in the m subset at each node are allowed to vary. Consequently, the optimum group of features involved at each node is established based on m, not M, features (Eskandarian et al. 2017). Resampling data with replacement and randomly changing the predictive variable sets ensures diversification between the ensembles of trees and enhances the chances of finding more accurate estimates of the objective function (Youssef et al. 2016; Douglas et al. 2018). Ultimately, an average of the prediction errors (OOB error) is established by an internal cross-validation algorithm across all the trees grown in the artificial forest. Tree growth in the random forest algorithm is controlled by: 1) the number of trees grown in the forest (ntree); 2) the subset size (m); and 3) the minimum leaf size. These control parameters are optimized to achieve the most accurate prediction of the objective function derived collectively from all the trees grown in the forest.

Multi-layer perceptron neural network (MLP-NN)

MLP networks are comprised of an input layer, one or more hidden layers, and an output layer. There are several processing neurons in each layer and every neuron is thoroughly

connected to the subsequent layer by weighted interconnections. The number of neurons in the input layer matches the number of input variables in the system being modelled. The number of neurons in the output layer corresponds to the number of output variables to be estimated. The relationships between the input and output layers of an MLP are determined by its hidden layer(s) (Lashkarbolooki et al. 2012). MLP efficiency and accuracy are significantly influenced by its architecture, i.e., the number of hidden layers and the number of neurons in each hidden layer (Hemmati-Sarapardeh et al. 2016). Although an MLP with just one hidden layer is typically able to resolve most prediction problems, more than one hidden layer may be required to achieve meaningful results for more complex systems. Multiplying the weight of each node in the previous layer by its calculated value forms one of the inputs to the nodes to which it is connected in the next layer, either another hidden layer or the output layer. A bias value is also added to the weighted value, and that calculated value is passed to the activation level, in which a pre-defined activation (transfer) function is applied to produce the value of the node in the next layer. There are several activation (transfer) functions commonly applied to derive values for the next hidden layer or output layer. The relationships involved in MLP neural networks are defined by Eq. (2):

$$Y_{jk} = F_k\left(\sum_{i=1}^{N_{k-1}} W_{ijk}\, Y_{i(k-1)} + B_{ik}\right) \quad (2)$$

Where Y_jk and B_ik are neuron j's output from the MLP's layer k, and the bias term for neuron j in layer k, respectively. W_ijk is the randomly selected weight applied to the initial

Fig. 3 Schematic map showing the location of the Marun oil field and the stratigraphy of the study area. The Marun oil field’s reservoir is the Asmary
Formation
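The layer-by-layer computation of Eq. (2) amounts to a few lines of linear algebra per layer. The sketch below uses a single tanh transfer function for all layers and random toy weights; it is illustrative only, not the trained network of this study:

```python
import numpy as np

def mlp_forward(x, weights, biases, transfer=np.tanh):
    """Eq. (2): each layer's output is the transfer function applied to the
    weighted sum of the previous layer's outputs plus a bias term. For
    simplicity one transfer function F is reused for every layer, although
    Eq. (2) allows a distinct F_k per layer."""
    y = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        y = transfer(W @ y + b)   # W @ y implements sum_i W_ijk * Y_i(k-1)
    return y

# Toy architecture: 3 inputs -> 4 hidden neurons -> 1 output, random weights
# standing in for the initial (pre-training) randomly selected W_ijk.
rng = np.random.default_rng(42)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [rng.normal(size=4), rng.normal(size=1)]
rop_estimate = mlp_forward([0.5, -0.2, 1.0], weights, biases)
```

Training (whether by back-propagation or, as in the hybrid model below, by PSO) then amounts to searching for the weight and bias values that minimize the prediction error of this forward pass.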

Table 1 General information about the studied well

Parameter                      Value
Wellbore inclination (degree)  0
Wellbore azimuth (degree)      N45 W
Hole size (in)                 8.5
Mud type                       Water base mud (WBM)
Bit type                       Roller cone (insert bit)
IADC bit code                  625
Formation type                 Asmary

training procedure. F_k is the transfer function, which can take several distinct mathematical forms, such as an identity function, a binary-step function, binary sigmoid, bipolar sigmoid, Gaussian, and linear functions (Fausett 1994).

Radial basis function neural network (RBF-NN)

The RBF algorithm emerged in the late 1980s (Broomhead and Lowe 1988). Its simple structure and its ability to effectively and accurately process arbitrarily scattered data across multi-dimensional space have made it a useful alternative to MLP networks (Elsharkawy 1998; Lashkenari et al. 2013). Moreover, RBF network models can be trained in a single direct procedure, rather than the iterative process involved in training MLP networks (Venkatesan and Anitha 2006). RBF networks involve only one hidden layer with a variable number of nodes (RBF units).

The architecture of an RBF ANN is a two-layer feed-forward neural network, in which the inputs are transferred to the output layer via the neurons in the hidden layer (Park and Sandberg 1991; Wu et al. 2012). The RBF algorithm has two key control parameters for each RBF unit: 1) the center location of the function; and 2) the deviation or spread of that function. The network reaches its optimum performance when the measured distance becomes zero, and degrades with increasing distance. While the RBF-NN has only a single hidden layer, two sets of weights are applied: one for connections between the hidden and input layers; and the other for connections between the hidden and output layers. A non-linear function is used to transform the input space into the hidden space. However, the transformation from the hidden layer to the output layer is typically linear (Park and Sandberg 1991; Wu et al. 2012). The RBF network outputs are typically expressed as Eq. (3):

$$f(x_i) = w^T \varphi(x_i) \quad (3)$$

Where w^T is the transposed output-layer weight vector, and φ(x_i) is the kernel function, for which a Gaussian relationship is commonly used (Haykin and Network 2004).

MLP-PSO hybrid neural network

Back-propagation algorithms, especially the LM algorithm, are the most widely used techniques to adjust the weights and biases of MLP-NNs. However, several studies (Lee et al. 1991; Wang et al. 2004; Armaghani et al. 2017) reveal that applying back-propagation (BP) algorithms to tune the parameters of neural networks can involve some disadvantages, such as a slow learning rate and becoming trapped in local minima. Here we train an MLP using a particle swarm optimization (PSO) algorithm in an attempt to improve ROP prediction accuracy compared with the MLP network trained by the Levenberg-Marquardt (LM) back-propagation algorithm. The PSO algorithm is widely used as an optimizer because it provides excellent search capabilities of a multi-dimensional feasible search space with good conver-

Table 2 Statistical information pertaining to the input variables (X1 to X11) and the output (dependent) variable, ROP (X12), for the compiled 1000-data-point dataset

Coded factor | Parameter | Unit | Minimum | Maximum | Average
X1 | Neutron Porosity (NP) | – | 0.28 | 0.44 | 0.35
X2 | Density | kg/m3 | 2.2 | 2.44 | 2.37
X3 | Shear Wave Velocity (Ts) | µs/ft | 207.95 | 322.79 | 261.27
X4 | Compressional Wave Velocity (Tp) | µs/ft | 98.37 | 122.32 | 108.3
X5 | Gamma Ray (GR) | GAPI | 70.1 | 121.10 | 102.47
X6 | Weight on Bit (WOB) | 1000 kgf | 0.18 | 9.96 | 4.87
X7 | Bit Rotational Speed (BRS) | RPM | 93 | 143.69 | 135.58
X8 | Pump Pressure (PP) | MPa | 18.34 | 22.71 | 22.08
X9 | Bit Flow Rate (BFR) | m3/s | 0.050 | 0.0581 | 0.0572
X10 | Mud Weight (MW) | kg/m3 | 1680.0 | 1720.0 | 1700
X11 | Pore Pressure gradient (Pp) | kg/m3 | 1040 | 1590 | 1476
X12 | Rate of Penetration (ROP) | m/h | 3.55 | 34.88 | 24.86
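The merged dataset summarized above was produced by averaging the finely sampled petrophysical logs (recorded every 15.24 cm) onto the coarser 50 cm mud-log grid, as described in the text. A minimal numpy sketch of that up-scaling, using synthetic depth and porosity values rather than the actual well data:

```python
import numpy as np

# Hypothetical wireline-log samples recorded every 15.24 cm (half-foot),
# up-scaled to 50 cm mud-log intervals by simple averaging, as the text
# describes. Depths and values here are synthetic illustrations only.
log_depth = np.arange(3000.0, 3010.0, 0.1524)   # m, 15.24 cm sampling
log_value = 0.30 + 0.01 * np.sin(log_depth)     # e.g. a neutron-porosity curve

interval = 0.5                                   # 50 cm mud-log spacing
edges = np.arange(3000.0, 3010.0 + interval, interval)

# Assign each log sample to a 50 cm bin and average within each bin.
bins = np.digitize(log_depth, edges) - 1
upscaled = np.array([log_value[bins == b].mean()
                     for b in range(len(edges) - 1)])

print(len(log_depth), "raw samples ->", len(upscaled), "50 cm averages")
```

The same binning would be applied to each petrophysical curve before joining it with the mud-log records at matching depths.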
Earth Sci Inform (2019) 12:319–339 327

gence performance (Atashnezhad et al. 2014). For these reasons, the PSO algorithm is selected as a training alternative for a hybrid MLP network to predict ROP from the compiled dataset. More detailed information about the functioning of the PSO algorithm is provided by Ab Wahab et al. (2015).

Support vector regression (SVR) network

SVR networks are supervised machine-learning algorithms processing data with a multi-dimensional regression technique (Cortes and Vapnik 1995). The prediction of unknown variables is achieved in SVR by establishing an optimal linear regression solution in a new feature space. That new feature space is based on input data mapped from the original solution space and extended into a higher m-dimensional space (Vapnik 2013). If the SVR training data subset exists as a p-dimensional input vector (p input variables) associated with a one-dimensional target (output) vector, the regression function f(x) used to predict the output variable (Bodaghi et al. 2015) is typically expressed as Eq. (4):

$$f(x) = w^T \varphi(x) + b \quad (4)$$

Where: φ(x) is a nonlinear mapping function; w is a weighting vector; and b is the bias term of the regression equation. A risk function expressed as Eq. (5) needs to be minimized using slack variables (δ_i, δ_i^*) to determine the optimum values of w and b:

$$R(f) = \frac{1}{2}\|w\|^2 + c \sum_{i=1}^{l}\left(\delta_i + \delta_i^*\right)$$

$$\text{subject to} \quad \begin{cases} d_i - w^T \varphi(x_i) - b \le \varepsilon + \delta_i \\ w^T \varphi(x_i) + b - d_i \le \varepsilon + \delta_i^* \\ \delta_i,\ \delta_i^* \ge 0 \end{cases} \quad (5)$$

Where: c is a constant which defines a compromise between flatness and estimation error; and ε is an error-monitoring parameter. Eq. (5) is solved based on a dual-problem formulation involving the Lagrange multipliers a_i, a_i^* ∈ [0, c]. The solution derived by this method is expressed as Eq. (6):

$$f(x) = \sum_{i=1}^{l}\left(a_i - a_i^*\right) k\left(x_i, x_i^*\right) + b \quad (6)$$

Where k(x_i, x_i^*) is a kernel function, which can be an n-order polynomial function:

$$k\left(x_i, x_i^*\right) = \left(1 + x_i^* \cdot x_i\right)^n \quad (7)$$

c, ε and n are the three main control parameters for the risk function of the SVR algorithm, and these can be determined using various optimization routines. A hybrid approach involving a grid search and a pattern search has been used as an optimization technique in several published SVR applications (Saffarzadeh and Shadizadeh 2012; Asoodeh and Bagheripour 2013;
Fig. 4 Pair plot to illustrate the relationships between the input data variables
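As an illustrative sketch of the SVR formulation and its three control parameters (using scikit-learn on synthetic data, not the authors' MATLAB implementation), a coarse grid search over c, ε and the kernel order n can be written as:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for the eight selected drilling inputs and the ROP target.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(300, 8))
y = X @ np.linspace(0.5, 1.5, 8) + 0.05 * rng.standard_normal(300)

# Coarse grid over the three SVR control parameters named in the text:
# C (flatness/error trade-off), epsilon (error tube) and degree (kernel order n).
grid = GridSearchCV(
    SVR(kernel="poly", coef0=1.0),          # (1 + x_i . x_j)^n kernel form
    param_grid={"C": [1.0, 10.0], "epsilon": [0.01, 0.1], "degree": [1, 2]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X, y)
best_rmse = -grid.best_score_
print("best parameters:", grid.best_params_, "CV RMSE: %.3f" % best_rmse)
```

A subsequent pattern (local) search around `grid.best_params_`, as the text describes, would refine these coarse values further.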

Fattahi et al. 2015). A grid search initially attempts to locate a region close to the global optimum point. Subsequently, a pattern search is executed over a constrained search zone surrounding the best solution identified by the grid search.

Geology and formation setting

The borehole dataset used to develop and evaluate the ROP prediction models is derived from a single vertical well penetrating the Marun oil field in SW Iran. Figure 3 shows the

Fig. 5 Recorded and denoised data of the well under study
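The denoising shown in Fig. 5 can be reproduced with a standard Savitzky–Golay implementation. This sketch uses SciPy (an assumption; the study used MATLAB), applying the polynomial order 3 and 13-point window reported in the Results section for the mud-logging data, to a synthetic noisy series:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy ROP-like series standing in for a recorded mud-log curve.
rng = np.random.default_rng(7)
depth = np.linspace(3000.0, 3500.0, 1000)
rop_true = 25.0 + 5.0 * np.sin(depth / 50.0)
rop_noisy = rop_true + rng.normal(0.0, 2.0, depth.size)

# SG parameters reported for the mud-log data: polynomial order 3 and a
# 13-point fitting window (the window length must be odd).
rop_denoised = savgol_filter(rop_noisy, window_length=13, polyorder=3)

resid_noisy = float(np.std(rop_noisy - rop_true))
resid_denoised = float(np.std(rop_denoised - rop_true))
print(f"noise std before: {resid_noisy:.2f}, after: {resid_denoised:.2f}")
```

For the wireline-log curves the document reports order 5 with a 17-point window instead.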



location and stratigraphy of the studied area. The database consists of 1000 data points which were collected from the Asmary formation (one of the primary carbonate reservoirs of the Marun oil field) extending across a 500-m drilled section (from 3000 m to 3500 m measured depth). Major input variables influencing ROP are divided into two main categories: 1) those derived from the mud log; and 2) those derived from the standard wireline-log suite. The variables derived from the mud log include weight on bit (WOB); bit rotational speed (BRS); bit flow rate (BFR); mud-pump pressure (PP); and mud weight (MW). The variables derived from the wireline logs include compressional (Tp) and shear (Ts) sonic velocities; density; gamma ray (GR); neutron porosity (NP); and pore pressure gradient (PPG). General information regarding the studied well is listed in Table 1.

Table 3 Results of the feature selection and ranking routine applying the GA-based feature selection method

Number of input variables | Selected inputs | RMSE
1 | X8 | 3.2909
2 | X11, X6 | 2.1849
3 | X11, X6, X7 | 1.602
4 | X11, X9, X6, X8 | 1.3787
5 | X11, X9, X8, X6, X7 | 1.3294
6 | X7, X6, X9, X11, X2, X8 | 1.2397
7 | X2, X8, X7, X11, X6, X9, X3 | 1.1964
8 | X6, X5, X9, X11, X8, X3, X2, X7 | 1.1462
9 | X5, X3, X2, X1, X9, X7, X6, X8, X11 | 1.1345
10 | X8, X2, X1, X7, X3, X5, X4, X11, X9, X6 | 1.1332
11 | X1, X5, X7, X2, X6, X11, X4, X3, X9, X10, X8 | 1.1315
* The bolded row (eight inputs) lists the optimum number of features and their combination

For the dataset compiled from the single vertical well studied, the drilling bit type and size and the formation type are constant. Consequently, those parameters are eliminated as influences on ROP for the dataset studied. The appropriate sample-collection interval applied in compiling the dataset from the raw wireline-log and mud-log depth curves needs to be established. The potential up-scaled values of the petrophysical logs that could be used for the studied well are constrained by the large difference between the scales at which the petrophysical logs were recorded (measurements taken at 15.24 cm intervals) and the mud logging data (measurements taken at 50 cm intervals). To overcome this complication, a simple averaging method was used: for each petrophysical log, the available data points along each 50 cm depth interval were averaged, and that simple average was taken as the representative value for the 50 cm drilling interval, to be compared with the mud-log data for the same interval. This made it possible to integrate the log-curve versus depth data into a uniform dataset. The statistical information for the eleven input variables and the dependent variable (ROP) is listed in Table 2. Furthermore, Fig. 4 shows the relation (based on regression coefficient) between the input variables and the dependent variable (i.e. drilling rate). In general, correlation coefficients are positive for all variables except mud weight and bit rotational speed (BRS). BRS is inversely related (i.e. negatively correlated) to ROP. Such a relationship between ROP and BRS is counter-intuitive, as it is reasonable to expect that increasing the BRS should lead to an increase in ROP. This may be a specific feature of the dataset studied and the drilling conditions prevailing within the studied depth interval. Other

Fig. 6 Prediction accuracy improvement trend achieved by applying the GA-MLP feature selection and ranking routine (RMSE associated with different numbers of input variables involved in ROP prediction). The RMSE values shown combine the RMSE for the training and validation datasets
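The wrapper-style subset search behind Table 3 and Fig. 6 can be imitated with a much simpler greedy forward selection. The sketch below substitutes linear least squares for the paper's GA-driven MLP cost model (both substitutions, and the data, are illustrative only):

```python
import numpy as np

# Greedy forward selection as a lightweight stand-in for the paper's GA-MLP
# wrapper: repeatedly add the feature that most reduces the RMSE cost.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 11))                    # stand-ins for X1..X11
y = (3.0 * X[:, 7] + 2.0 * X[:, 10] + 1.0 * X[:, 5]
     + 0.1 * rng.standard_normal(500))

def rmse_of(subset):
    """RMSE of a least-squares fit using only the given feature indices."""
    A = np.column_stack([X[:, list(subset)], np.ones(len(X))])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(np.sqrt(np.mean(resid ** 2)))

selected, remaining, history = [], set(range(11)), []
while remaining:
    best = min(remaining, key=lambda j: rmse_of(selected + [j]))
    selected.append(best)
    remaining.discard(best)
    history.append((len(selected), rmse_of(selected)))

# RMSE drops steeply at first, then flattens (cf. Table 3 / Fig. 6).
print(history[:4])
```

As in Fig. 6, the curve of RMSE versus subset size flattens once the informative features are included, which is the basis for cutting the input set at eight variables.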

parameters have a positive correlation with ROP, i.e. an increase in the values of these other variables is associated with an increase in drilling rate.

Results and discussion

To reduce noise, the appropriate polynomial order and the number of data records selected to define each interval need to be chosen carefully when defining the SG function to apply. Data extracted from the daily drilling reports available for the studied well, together with the geological characteristics of the formation drilled, were used to evaluate sensitivity cases to clarify the selection of these two key values. For the mud-logging data, the optimal polynomial order and number of data records to be fitted were established as 3 and 13, respectively. For the wireline-log data, the optimal polynomial order and number of data records to be fitted were established as 5 and 17, respectively. Figure 5 compares the data recorded for the well (in green) with the denoised data (in black) obtained using the SG filter. According to the figure, the trend of changes in the denoised data is in good agreement with that of the recorded data, i.e. the noise effect has been minimized.

Regarding the feature ranking, a trial-and-error approach was used to find an appropriate structure for the MLP-NN employed in the feature-ranking routine. The best configuration of the MLP-NN comprised two hidden layers, with 6 and 5 neurons in the first and second layers, respectively. The cost-function (RMSE) results of applying the proposed feature-selection routine to the compiled dataset are listed in Table 3 and illustrated in Fig. 6. Clearly, increasing the number of input variables involved in ROP prediction leads to a decrease in the RMSE value. However, the RMSE improvements (reductions) follow a trend with a reducing slope as the number of features increases. Indeed, for the compiled dataset, the reduction in RMSE becomes negligible once the number of input variables exceeds eight (Fig. 6). In other words, eight parameters, including weight on bit, pump pressure, bit rotational speed, bit flow rate, pore

Fig. 7 Optimal regression tree with statistical information regarding the ROP prediction for each node shown in the text box (tree diagram not reproduced; the root node splits on pump pressure, with subsequent splits on BRS, WOB, EPP, BFR, GR and density)

pressure, density, shear wave velocity and gamma ray were recognized as the most influential parameters on drilling rate. Hence, the machine-learning and data-mining ROP-prediction models developed here for comparison incorporate just eight input variables to optimize their ROP predictions.

To develop the relevant models, the overall data set of 1000 records was first divided randomly into two subsets: 70% of the entire dataset (700 data records) is allocated to the training subset and dedicated to training the prediction algorithms, while 30% (300 data records) is excluded from the training routine and utilized to test the algorithms once they are trained. The data records of the compiled dataset are normalized to enhance accuracy and avoid variable-scaling influences. The following relationship is used to normalize all data variables to a scale varying from −1 to +1 (Deosarkar and Sathe 2012):

$$x_{ni} = 2\left(\frac{x_i - x_{min}}{x_{max} - x_{min}}\right) - 1 \quad (8)$$

Where: i is the parameter index, and x_max and x_min refer to the maximum and minimum values of variable x_i (Table 2), respectively. It is important to note that the performance of the decision tree does not depend on the scale of the data used, and no pre-processing such as data normalization is necessarily required for developing an optimum decision tree (Rokach and Maimon 2008). Therefore, in this study, the non-normalized data is used to develop the DT in order to provide better visualization of tree growth and splitting criteria.

For the compiled dataset, a 5-fold cross-validation sequence is applied to fit an optimized regression tree to the training data set, thereby creating an accurate predictive model. The optimal regression tree established for drilling rate, with statistical information for each node in the DT, is shown in Fig. 7. The text box for each node provides statistical information (mean, standard deviation and number of observations) for that node. The first split in the ROP-prediction DT developed (row 1 to row 2) is performed based on the input variable pump pressure (PP). The parent node in this tree (single node in row 1) shows that there are a total of 700 observations in the entire unconstrained training subset covered by the criteria applied to that parent node. That data subset has a mean ROP value of 24.91 m/h and a standard deviation of 4.18 m/h.

By training several RF models and using a range of values for the control parameters, the RF model can be optimized to derive the most accurate predictions using the least number of grown trees, thereby improving accuracy and reducing computational time. Figure 8 illustrates the mean square error (MSE) in the ROP prediction of the evaluated RF algorithm as it evolves with more trees being grown, for different minimum leaf sizes applied to each RF. For each minimum leaf size value used, there is no significant reduction in the MSE value of the developed models once the number of grown trees exceeds 40 (i.e., the models have all converged to their optimum solution by the time 40 trees have been grown). Decreasing the value of the minimum leaf size increases the required calculations (computational time) but, clearly (Fig. 8), generates more accurate predictions. The optimum value of this parameter for producing the most accurate ROP prediction is 5 (Fig. 8).
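The Eq. (8) scaling and the 70/30 split can be sketched as follows (hypothetical data; column-wise scaling assumed):

```python
import numpy as np

def normalize(x):
    """Scale each column to [-1, +1] following Eq. (8)."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return 2.0 * (x - xmin) / (xmax - xmin) - 1.0

# Hypothetical dataset: 1000 records, 8 selected inputs plus the ROP target.
rng = np.random.default_rng(3)
data = rng.uniform(0.0, 100.0, size=(1000, 9))
scaled = normalize(data)

# Random 70/30 train/test split, as used in the study.
idx = rng.permutation(len(scaled))
train, test = scaled[idx[:700]], scaled[idx[700:]]

print(train.shape, test.shape, scaled.min(), scaled.max())
```

As the text notes, this scaling is applied for the neural-network and SVR models but skipped for the decision tree, whose splits are scale-invariant.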

Fig. 8 Mean square error (MSE) value of the random forest algorithm in predicting ROP versus the minimum leaf size control value
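A sweep like the one in Fig. 8 can be sketched with scikit-learn's RandomForestRegressor (an assumption; the study used MATLAB), where min_samples_leaf plays the role of the minimum leaf size and out-of-bag error stands in for the reported MSE:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data for the ROP-prediction task.
rng = np.random.default_rng(5)
X = rng.uniform(-1.0, 1.0, size=(600, 8))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(600)

# Sweep the minimum leaf size (cf. Fig. 8) using 40 trees, the point beyond
# which the text reports no significant further MSE reduction.
mse_by_leaf = {}
for leaf in (5, 10, 20, 50):
    rf = RandomForestRegressor(
        n_estimators=40, min_samples_leaf=leaf, oob_score=True, random_state=0
    )
    rf.fit(X, y)
    # Out-of-bag MSE approximates test error without a separate hold-out set.
    mse_by_leaf[leaf] = float(np.mean((y - rf.oob_prediction_) ** 2))

print(mse_by_leaf)
```

Consistent with the text, smaller leaves cost more computation but yield lower error, with a leaf size of 5 reported as optimal for the study's dataset.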

Regarding MLP-NN development, there are several different training algorithms commonly applied, such as scaled conjugate gradient (SCG), Levenberg–Marquardt (LM), gradient descent with variable learning rate back propagation (GDX), and resilient back propagation (RP) (Moghadassi et al. 2011; Yetilmezsoy et al. 2011; Ghoreishi and Heidari 2013). Numerous studies have highlighted the efficiency of the LM algorithm in estimating non-linear systems (Ornek et al. 2012; Ceryan et al. 2013; Armaghani et al. 2017). Therefore, the LM back-propagation algorithm is used for training an MLP with two hidden layers, with Tansig (hidden layers) and Purelin (output layer) transfer functions, to predict ROP for the compiled dataset. Sensitivity analysis was conducted considering various numbers of neurons in each of the hidden layers to establish their impact on the mean square error (MSE) of the predicted ROP value. Figure 9 displays the results of that sensitivity analysis and suggests that the optimum number of neurons is 4 for the first hidden layer and 6 for the second hidden layer to minimize the MSE of the ROP prediction. A summary of the properties of the MLP developed to predict ROP for the compiled data set is listed in Table 4.

Table 4 Summary of architecture of the MLP developed to predict ROP

Network type: Multi-layer perceptron
Training function: Levenberg-Marquardt backpropagation
Number of layers: 3
Nodes in 1st hidden layer: 4
Transfer function of 1st hidden layer: TANSIG
Nodes in 2nd hidden layer: 6
Transfer function of 2nd hidden layer: TANSIG
Neurons in output layer: 1
Transfer function of output layer: PURELIN
Performance objective function: MSE

For developing a reliable RBF-NN model, it is necessary to determine the optimum number of neurons in the hidden layer. Here, we use a trial-and-error approach to obtain the optimum number of neurons, although various optimization algorithms could also be used for this purpose (Najafi-Marghmaleki et al. 2017). Sensitivity analysis conducted by constructing multiple RBF networks with different numbers of neurons in their hidden layer (Fig. 10) indicates that the algorithm converges to optimum ROP predictions when the number of neurons is about 40.

In the hybrid MLP-PSO network, the weights and biases of the neural network are optimized with the PSO algorithm, with the objective function being the MSE of the measured and predicted ROP values. The maximum number of iterations was set at 300 for MLP-NN training. The particle swarm size (i.e., particle population) was set at 80. The network architecture of the hybrid MLP-NN (number of hidden layers, number of neurons, transfer functions and basis function) was the same as that used for MLP training, including the utilization of the LM back-propagation algorithm. Figure 11 illustrates the convergence of the MSE for the ROP prediction to minimum values when executing the MLP-PSO network. The optimum MLP-PSO network architecture and the optimized values for the PSO control parameters are listed in Tables 5 and 6, respectively.

All predictive models developed and executed for this study were constructed using Matlab version 7.14.0.739 (Demuth et al. 2009). The ROP-prediction performances of the six developed machine-learning algorithms are evaluated and compared using graphical illustrations and statistical measures of accuracy. This comparison is then used to select the most accurate prediction method. There are various statistical

Fig. 9 Number of neurons in the two hidden layers of the developed MLP versus MSE for the ROP prediction for the training data subset
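A Table 4-style network can be sketched with scikit-learn's MLPRegressor on synthetic data. Note that scikit-learn offers no Levenberg–Marquardt solver, so L-BFGS is used here as a stand-in; tanh approximates the TANSIG hidden layers, and the output is linear, like PURELIN:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the eight selected inputs and the ROP target.
rng = np.random.default_rng(11)
X = rng.uniform(-1.0, 1.0, size=(700, 8))
y = np.tanh(X @ np.linspace(-1.0, 1.0, 8)) + 0.05 * rng.standard_normal(700)

# Two hidden layers with 4 and 6 neurons and tanh activations mirror the
# Table 4 architecture; L-BFGS replaces the unavailable LM solver.
mlp = MLPRegressor(
    hidden_layer_sizes=(4, 6), activation="tanh", solver="lbfgs",
    max_iter=2000, random_state=0,
)
mlp.fit(X, y)
mse = float(np.mean((y - mlp.predict(X)) ** 2))
print(f"training MSE: {mse:.4f}")
```
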

Fig. 10 RMSE of the ROP prediction achieved by the RBF network versus the number of neurons in the hidden layer

accuracy performance indicators that could be calculated to assess the ROP-prediction performance of the six developed models. Here, the Variance Account For (VAF), Root Mean Square Error (RMSE), Performance Index (PI), and coefficient of determination (R2) indicators are calculated, as defined by Eq. (9) to Eq. (12), respectively:

$$RMSE = \left[\frac{1}{p}\sum_{r=1}^{p}\left(y_r - z_r\right)^2\right]^{1/2} \quad (9)$$

$$PI = r + \frac{VAF}{100} - RMSE \quad (10)$$

$$VAF = \left[1 - \frac{\mathrm{var}\left(y_r - z_r\right)}{\mathrm{var}\left(y_r\right)}\right] \times 100 \quad (11)$$

$$R^2 = 1 - \frac{\sum_{r=1}^{p}\left(y_r - z_r\right)^2}{\sum_{r=1}^{p}\left(y_r - y_{r,mean}\right)^2} \quad (12)$$

These indicators meaningfully compare the ROP-prediction accuracy of the developed models, and have been widely used for such purposes in many other published machine-learning prediction studies (Yılmaz and Yuksek 2008; Boyacioglu and Avci 2010; Basarir et al. 2014). The results for each of these statistical measures of accuracy are listed for the six developed models in Table 7.

The ROP-prediction accuracy achieved by the developed models is impressive for five of the models, based on the low values for RMSE and the high values for R2, PI and VAF (Table 7) that they have achieved following optimization. MLP-PSO, having RMSE and R2 values of 1.12 and 0.93 for the testing data subset, respectively, outperforms the other developed models. It is closely followed in second place by the SVR model. However, the DT model is ranked last in terms of its prediction accuracy and is considered not fit for purpose in terms of predicting ROP from drilling parameters.

Cross plots of predicted versus measured values for the six developed models applied to the testing data subset are shown in Fig. 12, each displaying the coefficient of determination

Fig. 11 Convergence of the hybrid MLP-PSO network to the optimum solution (minimum RMSE)
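The hybrid training idea, a PSO swarm searching the flattened weight/bias vector of a small feed-forward network with MSE as the fitness, can be sketched as follows (the architecture, swarm size and PSO coefficients here are illustrative, not the paper's configuration in Tables 5 and 6):

```python
import numpy as np

# Minimal PSO-trains-network sketch: particles are candidate weight vectors.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] - 0.2 * X[:, 2]

H = 5                                   # hidden neurons
DIM = 3 * H + H + H + 1                 # W1, b1, W2, b2 flattened

def mse(theta):
    """Fitness function: MSE of a 3-H-1 tanh network with weights theta."""
    W1 = theta[: 3 * H].reshape(3, H)
    b1 = theta[3 * H : 4 * H]
    W2 = theta[4 * H : 5 * H]
    b2 = theta[5 * H]
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((y - pred) ** 2))

n, w, c1, c2 = 30, 0.7, 1.5, 1.5        # swarm size, inertia, accelerations
pos = rng.uniform(-1.0, 1.0, (n, DIM))
vel = np.zeros((n, DIM))
pbest, pbest_val = pos.copy(), np.array([mse(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

start_mse = float(pbest_val.min())
for _ in range(200):
    r1, r2 = rng.random((n, DIM)), rng.random((n, DIM))
    # Standard PSO velocity update: inertia + cognitive + social terms.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([mse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(f"MSE: {start_mse:.4f} -> {pbest_val.min():.4f}")
```

Unlike gradient-based BP, nothing here requires differentiability, which is the global-search advantage the text attributes to the hybrid MLP-PSO model.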

(R2) value achieved and the linear regression line to which that value applies. The R2 values show good correlations between measured and predicted values for most of the developed models. The MLP-PSO and SVR models demonstrate the best prediction performance based on the R2 metric. On the other hand, the DT model achieves the poorest accuracy based on R2 values. The relative deviations of the developed models are displayed in Fig. 13. The smallest relative deviations are displayed by the SVR and MLP-PSO models, and the highest relative deviation is associated with the DT model.

The prediction accuracy outcomes displayed in Table 7 and Figs. 12 and 13 also show that the random forest and decision tree algorithms do not perform as well as the MLP-PSO, SVR, MLP and RBF algorithms in predicting ROP for the compiled dataset. However, the random forest model provides slightly more accurate ROP predictions than the decision tree model. Interestingly, the MLP-NN trained by the BP algorithm (Levenberg-Marquardt) does not yield satisfactory results compared with its hybrid counterpart in ROP prediction: RMSE and R2 values of 1.3 and 0.9172, respectively, were obtained for the testing data with the simple MLP model. The poorer performance of the simple MLP model compared to the hybrid MLP network is interpreted to be due to the local search behavior of BP algorithms, which can result in them failing to find optimum parameters for ANNs. Some BP algorithms have a tendency to converge into local minima, while some evolutionary algorithms, such as PSO, have superior global search capabilities. These results are based on drilling inputs from a single formation penetrated by a single vertical well. Further studies on a range of rock formations and different types of well (vertical, horizontal, inclined) are required to verify the relative performance of these algorithms. However, this study achieves its objective of demonstrating that these algorithms are capable of providing highly accurate predictions of ROP from a range of drilling input variables combining operational and petrophysical measurements, particularly when the data is filtered for noise.

In addition, Fig. 14a presents a comparison between the results of the SVR model trained using the preprocessed data set and the denoised drilling rate measurements. The model trained using the denoised data exhibits very good ROP-prediction performance and successfully models the trend of changes observed in the measured ROP data for the depth interval studied. On the other hand, the model trained with raw data (Fig. 14b) shows large fluctuations and its results are not as well matched with the measured drilling rate when compared to Fig. 14a.

Table 5 Summary of architecture of the MLP hybridized with the PSO algorithm

Network type: Hybridized neural network
Training function: Particle swarm optimization algorithm
Number of layers: 3
Number of neurons in 1st hidden layer: 4
Number of neurons in 2nd hidden layer: 6
Number of neurons in output layer: 1
Transfer function for 1st hidden layer: TANSIG
Transfer function for 2nd hidden layer: TANSIG
Transfer function for output layer: PURELIN
Performance function: MSE

Table 6 Control parameter values applied in the PSO algorithm

Swarm size: 80
Maximum number of iterations: 300
Cognitive constant: 0.5
Social constant: 2.5
Inertia weight: 5
Inertia weight damping ratio: 0.6

Table 7 Statistical indicators of ROP-prediction performance accuracy for the six developed machine learning algorithms

Model | Data set | RMSE | R2 | VAF | PI
Decision tree | Test | 1.8 | 0.8511 | 84.75 | −0.0395
Decision tree | Train | 1.15 | 0.924 | 92.41 | 0.73
Decision tree | All | 1.38 | 0.8975 | 89.76 | 0.4633
Random forest | Test | 1.48 | 0.8829 | 87.05 | 0.3198
Random forest | Train | 1.106 | 0.9365 | 93.66 | 0.7978
Random forest | All | 1.23 | 0.9185 | 91.85 | 0.6451
SVR | Test | 1.17 | 0.9163 | 91.68 | 0.6973
SVR | Train | 0.4896 | 0.9877 | 98.77 | 1.49
SVR | All | 0.7636 | 0.9687 | 96.87 | 1.18
MLP | Test | 1.3 | 0.9172 | 91.7 | 0.5753
MLP | Train | 1.1 | 0.9365 | 93.66 | 0.7978
MLP | All | 1.15 | 0.9284 | 92.85 | 0.7372
RBF | Test | 1.49 | 0.8996 | 89.83 | 0.3521
RBF | Train | 1.48 | 0.8723 | 87.33 | 0.326
RBF | All | 1.48 | 0.8815 | 88.16 | 0.3355
MLP-PSO | Test | 1.12 | 0.9492 | 93.39 | 0.7752
MLP-PSO | Train | 0.9668 | 0.9596 | 94.96 | 0.9573
MLP-PSO | All | 1.08 | 0.941 | 94.15 | 0.87
* The MLP-PSO rows list the performance indices of the best predictive model

Conclusions

In order to predict the drilling rate, mud logging data was extracted from drilling reports associated with a single vertical

Fig. 12 Cross plots of measured versus predicted values of ROP for the six developed machine-learning models applied to the testing data subset: a DT (R² = 0.8511); b RF (R² = 0.8829); c SVR (R² = 0.9163); d MLP (R² = 0.9172); e RBF (R² = 0.8996); and f MLP-PSO (R² = 0.9492)
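The four indicators of Eqs. (9)–(12) can be computed directly; in this sketch y holds measured values, z predictions, and r in PI is the linear correlation coefficient between them (the data is synthetic):

```python
import numpy as np

# Accuracy indicators of Eqs. (9)-(12): y = measured, z = predicted.
def rmse(y, z):
    return float(np.sqrt(np.mean((y - z) ** 2)))

def vaf(y, z):
    return float((1.0 - np.var(y - z) / np.var(y)) * 100.0)

def pi(y, z):
    r = float(np.corrcoef(y, z)[0, 1])   # correlation coefficient in Eq. (10)
    return r + vaf(y, z) / 100.0 - rmse(y, z)

def r2(y, z):
    return float(1.0 - np.sum((y - z) ** 2) / np.sum((y - np.mean(y)) ** 2))

rng = np.random.default_rng(9)
y = rng.uniform(5.0, 35.0, 300)          # measured ROP-like values, m/h
z = y + rng.normal(0.0, 1.2, 300)        # predictions with ~1.2 m/h error

print({"RMSE": rmse(y, z), "VAF": vaf(y, z), "R2": r2(y, z), "PI": pi(y, z)})
```

A perfect prediction gives RMSE = 0, VAF = 100, R² = 1 and PI = 2, which makes the indicators easy to sanity-check.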

well, penetrating one formation with a uniform bit type and dimensions, drilled in the Marun field (SW Iran). In order to take account of the formation characteristics and rock mechanical properties, petrophysical logs were added to the collected data set. The overall data set includes 11 variables with 1000 data records divided randomly into two parts: 70% used for

Fig. 13 Relative deviations of measured versus predicted values of ROP for the six developed machine-learning models; the horizontal axis is the data record index number for the testing data set: a DT; b Random forest; c SVR; d MLP; e RBF; and f MLP-PSO

algorithm training; 30% used independently for algorithm testing. A Savitzky–Golay (SG) filter was implemented as a noise-reduction method, reducing the effect of noise on the data, shortening the training time for each algorithm and providing models with significantly lower errors. Following this, a feature-ranking method based on a heuristic search algorithm was applied to the SG-filtered data as a tuning routine to enable the most significant input variables to be identified and selected for optimum ROP prediction. Eight input variables were selected based on the results of the GA tuning. These were: weight on bit, bit rotational speed, pump flow rate, standpipe pressure, pore pressure, gamma ray, density log and sonic wave velocity. To provide a better understanding of the accuracy of the most commonly proposed machine learning algorithms and

Fig. 14 Measured versus predicted ROP versus data record index numbers for the SVR model applied to the testing data subset: a trained with pre-processed drilling data, and b trained with raw drilling data

introduce the most reliable one, several statistical models were applied to the selected input variables to generate benchmark predictions of rate of penetration against which the prediction performance of the different types of artificial neural networks was compared.

The main achievements of this study are outlined as follows:

• Feature ranking results showed that using petrophysical logs along with drilling parameters significantly enhances the prediction performance of machine learning algorithms. In light of recent advances in drilling technology and rock mechanics, petrophysical logs can be used as practical tools to provide information concerning formation characteristics for real-time drilling estimation.
• Statistical and graphical analysis revealed that the prediction performance of the developed hybrid MLP (MLP-PSO) was superior to that of the other algorithms, but almost matched by the SVR model. Using a PSO algorithm to adjust the weights and biases of the MLP-NN outperformed the LM back-propagation algorithm, providing a robust model for predicting ROP with high reliability.
• Applying the SVR, MLP, RBF and MLP-PSO algorithms to pre-processed (filtered) data demonstrated the superior prediction performance of these models over the decision tree and random forest algorithms (statistical models) when applied to the compiled dataset.
• The SVR model trained using pre-processed data to re-

duce noise significantly outperformed the SVR model applied to the raw data in terms of the accuracy of its ROP prediction capabilities.

Nomenclature WOB, Weight on bit; BRS, Bit rotational speed; BFR, Bit flow rate; PP, Pump pressure; MW, Mud weight; GR, Gamma ray; Ts, Sonic shear velocity; Tp, Sonic compressional velocity; Pp, Pore pressure; NP, Neutron porosity; DT, Decision tree; RF, Random forest; MLP, Multi-layer perceptron; RBF, Radial-basis function; SVR, Support vector regression; PSO, Particle swarm optimization; x, Input variable value; W, Weight matrix; b, Bias vector; N, Number of clusters; M, Number of input and output variables; δ, Slack variable; ε, Error-monitoring parameter; a, Lagrange multiplier; K, Kernel function; ci, Center of RBF unit i; RMSE, Root mean square error; PI, Performance index; VAF, Variance account for

References

Ab Wahab MN, Nefti-Meziani S, Atyabi A (2015) A comprehensive review of swarm optimization algorithms. PLoS One 10(5):e0122827
Abbas AK, Rushdi S, Alsaba M (2018) Modeling rate of penetration for deviated wells using artificial neural network. Abu Dhabi International Petroleum Exhibition & Conference, Society of Petroleum Engineers
Abtahi A (2011) Bit wear analysis and optimization for vibration assisted rotary drilling (VARD) using impregnated diamond bits. Memorial University of Newfoundland
Akgun F (2007) Drilling rate at the technical limit. Int J Pet Sci Technol 1:99–118
Anemangely M, Ramezanzadeh A, Tokhmechi B (2017) Shear wave travel time estimation from petrophysical logs using ANFIS-PSO algorithm: a case study from Ab-Teymour Oilfield. J Nat Gas Sci Eng 38:373–387
Anemangely M, Ramezanzadeh A, Tokhmechi B, Molaghab A, Mohammadian A (2018) Drilling rate prediction from petrophysical logs and mud logging data using an optimized multilayer perceptron neural network. J Geophys Eng 15(4):1146–1159
Boyacioglu MA, Avci D (2010) An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul stock exchange. Expert Syst Appl 37(12):7908–7912
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L (2017) Classification and regression trees. Routledge
Broomhead DS, Lowe D (1988) Radial basis functions, multi-variable functional interpolation and adaptive networks. Royal Signals and Radar Establishment, Malvern (United Kingdom)
Ceryan N, Okkan U, Kesimal A (2013) Prediction of unconfined compressive strength of carbonate rocks using artificial neural networks. Environ Earth Sci 68(3):807–819
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Darbor M, Faramarzi L, Sharifzadeh M (2017) Performance assessment of rotary drilling using non-linear multiple regression analysis and multilayer perceptron neural network. Bull Eng Geol Environ:1–13
Davoudi E, Vaferi B (2018) Applying artificial neural networks for systematic estimation of degree of fouling in heat exchangers. Chem Eng Res Des 130:138–153
Demuth H, Beale M, Hagan M (2009) MATLAB version 7.14.0.739; Neural Network Toolbox for use with Matlab. The MathWorks
Deosarkar MP, Sathe VS (2012) Predicting effective viscosity of magnetite ore slurries by using artificial neural network. Powder Technol 219:264–270
Douglas RK, Nawar S, Alamar MC, Mouazen A, Coulon F (2018) Rapid prediction of total petroleum hydrocarbons concentration in contaminated soil using vis-NIR spectroscopy and regression techniques. Sci Total Environ 616:147–155
Duda RO, Hart PE, Stork DG (2012) Pattern classification. John Wiley & Sons
Elsharkawy AM (1998) Modeling the properties of crude oil and gas systems using RBF network. SPE Asia Pacific Oil and Gas Conference and Exhibition, Society of Petroleum Engineers
Eskandarian S, Bahrami P, Kazemi P (2017) A comprehensive data mining approach to estimate the rate of penetration: application of neural network, rule based models and feature ranking. J Pet Sci Eng 156:605–615
Fattahi H, Gholami A, Amiribakhtiar MS, Moradi S (2015) Estimation of
Anemangely M, Ramezanzadeh A, Amiri H, Hoseinpour S-A (2019) asphaltene precipitation from titration data: a hybrid support vector
Machine learning technique for the prediction of shear wave veloc- regression with harmony search. Neural Comput & Applic 26(4):
ity using petrophysical logs. J Pet Sci Eng 174:306–327 789–798
Armaghani DJ, Mohamad ET, Narayanasamy MS, Narita N, Yagiz S Fausett LV (1994) Fundamentals of neural networks: architectures, algo-
(2017) Development of hybrid intelligent models for predicting rithms, and applications. Prentice-Hall Englewood Cliffs
TBM penetration rate in hard rock condition. Tunn Undergr Space Garcia LP, de Carvalho AC, Lorena AC (2015) Effect of label noise
Technol 63:29–43 in the complexity of classification problems. Neurocomputing
Asoodeh M, Bagheripour P (2013) Fuzzy classifier based support vector 160:108–119
regression framework for Poisson ratio determination. J Appl Gholami E, Vaferi B, Ariana MA (2018) Prediction of viscosity of several
Geophys 96:7–10 alumina-based nanofluids using various artificial intelligence
Atashnezhad A, Wood DA, Fereidounpour A, Khosravanian R paradigms-comparison with experimental data and empirical corre-
(2014) Designing and optimizing deviated wellbore trajecto- lations. Powder Technol 323:495–506
ries using novel particle swarm algorithms. J Nat Gas Sci Ghoreishi S, Heidari E (2013) Extraction of epigallocatechin-3-gallate from
Eng 21:1184–1204 green tea via supercritical fluid technology: neural network modeling and
Basarir H, Tutluoglu L, Karpuz C (2014) Penetration rate prediction for response surface optimization. J Supercrit Fluids 74:128–136
diamond bit drilling by adaptive neuro-fuzzy inference system and Hamrick TR (2011) Optimization of operating parameters for minimum
multiple regressions. Eng Geol 173:1–9 mechanical specific energy in drilling. West Virginia University
Bezminabadi SN, Ramezanzadeh A, Jalali S-ME, Tokhmechi B, Roustaei Hareland G, Rampersad P (1994) Drag-bit model including wear. SPE
A (2017) Effect of rock properties on ROP modeling using statistical Latin America/Caribbean Petroleum Engineering Conference,
and intelligent methods: a case study of an oil well in southwest of Society of Petroleum Engineers
Iran. Arch Min Sci 62(1):131–144 Haykin S, Network N (2004) A comprehensive foundation. Neural Netw
Bingham G (1965) A new approach to interpreting rock drillability. 2(2004):41
Technical Manual Reprint, Oil and Gas Journal (OGC) 1965:93 P Hegde C, Wallace S, Gray K (2015) Using trees, bagging, and random
Bodaghi A, Ansari HR, Gholami M (2015) Optimized support vector forests to predict rate of penetration during drilling. SPE Middle East
regression for drilling rate of penetration estimation. Central Intelligent Oil and Gas Conference and Exhibition, Society of
European Journal of Geoscience (CEJG) 7(1) Petroleum Engineers
Hegde C, Daigle H, Millwater H, Gray K (2017) Analysis of rate of penetration (ROP) prediction in drilling using physics-based and data-driven models. J Pet Sci Eng 159:295–306

Hemmati-Sarapardeh A, Ghazanfari MH, Ayatollahi S, Masihi M (2016) Accurate determination of the CO2-crude oil minimum miscibility pressure of pure and impure CO2 streams: a robust modelling approach. Can J Chem Eng 95:253–261

Huang G-B, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2(2):107–122

James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning. Springer

Jiang R, Tang W, Wu X, Fu W (2009) A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinformatics 10(1):S65

Jiang W, Samuel R (2016) Optimization of rate of penetration in a convoluted drilling framework using ant colony optimization. IADC/SPE Drilling Conference and Exhibition, Society of Petroleum Engineers

Kahraman S (2016) Estimating the penetration rate in diamond drilling in laboratory works using the regression and artificial neural network analysis. Neural Process Lett 43(2):523–535

Khandelwal M, Armaghani DJ (2016) Prediction of drillability of rocks with strength properties using a hybrid GA-ANN technique. Geotech Geol Eng 34(2):605–620

Lashkarbolooki M, Hezave AZ, Ayatollahi S (2012) Artificial neural network as an applicable tool to predict the binary heat capacity of mixtures containing ionic liquids. Fluid Phase Equilib 324:102–107

Lashkenari MS, Taghizadeh M, Mehdizadeh B (2013) Viscosity prediction in selected Iranian light oil reservoirs: artificial neural network versus empirical correlations. Pet Sci 10(1):126–133

Law MH, Figueiredo MA, Jain AK (2004) Simultaneous feature selection and clustering using mixture models. IEEE Trans Pattern Anal Mach Intell 26(9):1154–1166

Lee Y, Oh S-H, Kim MW (1991) The effect of initial weights on premature saturation in back-propagation learning. IJCNN-91-Seattle International Joint Conference on Neural Networks, IEEE

Lorena AC, de Carvalho AC (2004) Evaluation of noise reduction techniques in the splice junction recognition problem. Genet Mol Biol 27(4):665–672

Maucec M, Singh AP, Bhattacharya S, Yarus JM, Fulton DD, Orth JM (2015) Multivariate analysis and data mining of well-stimulation data by use of classification-and-regression tree with enhanced interpretation and prediction capabilities. SPE Economics & Management 7(02):60–71

Mendes JRP, Fonseca TC, Serapião A (2007) Applying a genetic neuro-model reference adaptive controller in drilling optimization. World Oil:29–36

Moghadassi A, Hosseini SM, Parvizian F, Al-Hajri I, Talebbeigi M (2011) Predicting the supercritical carbon dioxide extraction of oregano bract essential oil. Songklanakarin J Sci Technol 33(5)

Moradi H, Bahari MH, Naghibi Sistani MB, Bahari A (2010) Drilling rate prediction using an innovative soft computing approach. Sci Res Essays 5

Motahhari HR, Hareland G, James J (2010) Improved drilling efficiency technique using integrated PDM and PDC bit parameters. J Can Pet Technol 49(10):45–52

Najafi-Marghmaleki A, Barati-Harooni A, Tatar A, Mohebbi A, Mohammadi AH (2017) On the prediction of Watson characterization factor of hydrocarbons. J Mol Liq 231:419–429

Ornek M, Laman M, Demir A, Yildiz A (2012) Prediction of bearing capacity of circular footings on soft clay stabilized with granular soil. Soils Found 52(1):69–80

Orr K (1998) Data quality and systems theory. Commun ACM 41(2):66–71

Park J, Sandberg IW (1991) Universal approximation using radial-basis-function networks. Neural Comput 3(2):246–257

Raji WO, Gao Y, Harris JM (2017) Wavefield analysis of crosswell seismic data. Arab J Geosci 10(9):217

Redman TC (1998) The impact of poor data quality on the typical enterprise. Commun ACM 41(2):79–82

Rokach L, Maimon OZ (2008) Data mining with decision trees: theory and applications. World Scientific

Saffarzadeh S, Shadizadeh SR (2012) Reservoir rock permeability prediction using support vector regression in an Iranian oil field. J Geophys Eng 9(3):336–344

Savitzky A, Golay MJ (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36(8):1627–1639

Shi X, Liu G, Gong X, Zhang J, Wang J, Zhang H (2016) An efficient approach for real-time prediction of rate of penetration in offshore drilling. Math Probl Eng 2016:1–13

Singh A (2015) Root-cause identification and production diagnostic for gas wells with plunger lift. SPE Reservoir Characterisation and Simulation Conference and Exhibition, Society of Petroleum Engineers

Singh A (2017) Application of data mining for quick root-cause identification and automated production diagnostic of gas wells with plunger lift. SPE Prod Oper 32:279–293

Sultan MA, Al-Kaabi AU (2002) Application of neural network to the determination of well-test interpretation model for horizontal wells. SPE Asia Pacific Oil and Gas Conference and Exhibition, Society of Petroleum Engineers

Szlek J, Mendyk A (2015) Package 'fscaret'

Vafaie H, Imam IF (1994) Feature selection methods: genetic algorithms vs. greedy-like search. Proceedings of IEEE International Conference on Fuzzy and Intelligent Control Systems

Vapnik V (2013) The nature of statistical learning theory. Springer Science & Business Media

Venkatesan P, Anitha S (2006) Application of a radial basis function neural network for diagnosis of diabetes mellitus. Curr Sci 91(9):1195–1199

Wang X, Tang Z, Tamura H, Ishii M, Sun W (2004) An improved backpropagation algorithm to avoid the local minima problem. Neurocomputing 56:455–460

Warren T (1987) Penetration rate performance of roller cone bits. SPE Drill Eng 2(01):9–18

Winters W, Warren T, Onyia E (1987) Roller bit model with rock ductility and cone offset. SPE Annual Technical Conference and Exhibition, Society of Petroleum Engineers

Wu Y, Wang H, Zhang B, Du K-L (2012) Using radial basis function networks for function approximation and classification. ISRN Applied Mathematics 2012

Yavari H, Sabah M, Khosravanian R, Wood D (2018) Application of an adaptive neuro-fuzzy inference system and mathematical rate of penetration models to predicting drilling rate. Iran J Oil Gas Sci Technol 7(3):73–100

Yetilmezsoy K, Ozkaya B, Cakmakci M (2011) Artificial intelligence-based prediction models for environmental engineering. Neural Network World 21(3):193–218

Yilmaz I, Kaynar O (2011) Multiple regression, ANN (RBF, MLP) and ANFIS models for prediction of swell potential of clayey soils. Expert Syst Appl 38(5):5958–5966

Yılmaz I, Yuksek A (2008) An example of artificial neural network (ANN) application for indirect estimation of rock parameters. Rock Mech Rock Eng 41(5):781–795

Youssef AM, Pourghasemi HR, Pourtaghi ZS, Al-Katheeri MM (2016) Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir region, Saudi Arabia. Landslides 13(5):839–856

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.