
Investigating and Ranking The Rate of Penetration (ROP) Features For Petroleum Drilling Monitoring and Optimization

The drilling phase has been reported to be the most expensive phase of oil exploration and production, hence several research efforts have been targeted at improving its efficiency. The rate of penetration (ROP) has also been identified as the most important metric for improving drilling performance, hence, several research efforts have reported different methods of predicting ROP optimal values.

Volume 8, Issue 10, October 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Investigating and Ranking the Rate of Penetration (ROP) Features for Petroleum Drilling Monitoring and Optimization
Ijegwa David Acheme, Osemengbe Oyaimare Uddin*, Ayodeji Samuel Makindes
Department of Computer Science, Edo State University Uzairue,
Nigeria

Abstract:- The drilling phase has been reported to be the most expensive phase of oil exploration and production; hence, several research efforts have been targeted at improving its efficiency. The rate of penetration (ROP) has also been identified as the most important metric for improving drilling performance, and several studies have reported different methods of predicting optimal ROP values. Recently, artificial intelligence (AI) and machine learning (ML) models have been reported for the prediction of ROP. However, the ROP is influenced by several factors, and the interactions among these factors introduce a complexity that affects its accurate prediction. This research work sets out to achieve two objectives: first, to investigate and rank the most important factors for the prediction of the ROP, and second, to carry out a comparative study and ranking of selected machine learning algorithms for the prediction of ROP. To achieve this, the open-source Volve dataset, a complete set of data from a North Sea oil field, was utilized. Eighteen (18) machine learning models were built using this dataset and their performances compared. The results showed the random forest regressor, with an RMSE value of 0.0010 and R2 score of 0.891, as the most efficient algorithm among the eighteen chosen for this work. Further experimentation also revealed the most influential factors for predicting the rate of penetration. These features, in order of importance, are: measured depth, bit rotation per minute, formation porosity, shale volume, water saturation, and log permeability. The output of this study offers a blueprint for choosing algorithms and features when implementing ML solutions for optimizing oil drilling, which is helpful in the development of real-time ROP prediction models and hybridization.

Keywords:- Rate of Penetration Prediction, oil drilling, machine learning, feature selection

I. INTRODUCTION

Several researchers have noted that the drilling phase remains the most expensive phase of oil exploration and production (Cao et al., 2021; Sircar et al., 2021; Darwesh et al., 2019; Ameloko et al., 2019; Lashari et al., 2019). Consequently, ongoing research projects aimed at optimizing the drilling process to reduce its overall cost have been reported. Although equipment, products, and processes are always being improved, machine learning methods have only just started to play a substantial role in oil drilling optimization. This has largely been made possible by the current accessibility of enormous datasets (Braga, 2019). Machine learning (ML) models hold promise in this sector because they allow efficient processing of the massive amounts of data produced by the many Internet of Things (IoT) sensors at oil rigs to aid decision making. Major oil firms have already invested hundreds of dollars in the IT infrastructure to establish Real-Time Operation Centers (RTOC), which read drilling data from rigs in real time. With the help of these readings, specialists can instantly assess data in the centers, enabling quicker decision-making, a decrease in stuck pipe incidents, hole cleaning problems, and fluid loss occurrences, as well as an increase in the number of wells that can be monitored with the same number of staff (Al-khudiri et al., 2015). Additionally, the accessibility of this data has provided the essential groundwork for applying artificial intelligence and machine learning techniques to create smart models for more precise and reliable real-time drilling performance monitoring and optimization.

Modern drilling rigs carry an enormous amount of instrumentation that collects parameters from almost every piece of installed equipment, using sensors to measure their states and enabling remote and safe operations. As a result, there has been an exponential increase in the amount of data generated at oil rigs. This has prepared the way for the creation of predictive analytics machine learning models and decision support systems.

As researchers continue to study the datasets created at oil rigs, choosing the appropriate machine learning algorithms and features for the precise prediction of ROP poses a challenge because the ROP is influenced by a number of variables that have complex relationships, and the extent of their influence also varies, as some are more relevant than others. In addition to implementing and comparing various ML techniques, the goal of this research is to investigate and rank the factors that have been published in the literature for ROP prediction. A machine learning model built with many weakly influential factors or with a less efficient algorithm is not likely to give satisfactory results. By focusing on the most crucial factors, these models will perform better in terms of prediction, computation, and training time, and will be easier to understand (Acheme et al., 2022).

IJISRT23OCT215 www.ijisrt.com 1841


II. RELATED WORKS

The rate of penetration (ROP) is defined as the speed at which a wellbore is being drilled. It can be calculated manually by monitoring the depth at regular time intervals, and it is expressed in feet or meters per hour. High ROP values indicate quick drilling, which translates to higher drilling productivity. Because ROP is such a direct measurement of the overall time necessary to drill an oil well, attaining a greater ROP to reduce this time is a crucial optimization approach for oil firms. This section presents the approaches that have been used to optimize ROP. These methodologies, which can be broadly categorized into traditional and data-driven models, have focused on modeling and predicting the ROP using specific drilling parameters that can be manipulated on the surface, such as weight on bit (WOB) and rotary speed (RPM). Data-driven models refer to machine learning methods for the prediction of ROP, while traditional models refer to mathematical equations that have been developed from tests and field experience.

A. Traditional ROP Models
According to Alsaihati et al. (2022), one of the early mathematical models for ROP prediction was developed by Maurer in 1962, who used a rock cratering technique to develop a formula using the parameters bit diameter, rock strength, weight on bit (WOB), and rotations per minute (RPM). Another early equation-based ROP prediction model is the Bingham model, which is described in Hegde et al. (2018). It uses similar input parameters along with an extra empirical constant, "k", which is a formation-dependent parameter. Eckel (1967) presented a further early conventional model that examined the impact of mud on ROP. The Bourgoyne and Young (BY) model (Bourgoyne & Young, 1974) is the early model that has received the most attention. It additionally takes into consideration geological and physical aspects such as formation strength, undercompaction, normal compaction trend, differential pressure, bit diameter and weight, rotational speed, tooth wear, and bit hydraulics.

B. Data Science Models
Data science and machine learning techniques are applied to the prediction and optimization of ROP by leveraging data gathered during drilling to create predictive models. In order to forecast the rate of penetration, such models use surface-measured characteristics as input variables, such as weight on bit, rotations per minute, and flow rate. Oil drilling typically entails extensive data collection from both surface and subsurface areas using IoT sensors. These sensors can gather a lot of information on the condition of the bit downhole. The obtained data is used to plot, analyze, and control bit performance, in this case the ROP. Because ML models the relationship between input factors in order to predict an output (target) variable, the availability of these datasets has laid the groundwork for the construction of models for the prediction of ROP. Studies that have proposed ML models for ROP prediction are reviewed in this section.

Most of the reported research uses neural networks as the machine learning method. For ROP prediction, these neural network models have used a variety of input parameters (Jahanbakhshi, 2012). Ashrafi et al. (2011) proposed a hybrid neural network model that used the Savitzky-Golay (SG) smoothing filter to remove noise from the retrieved data before estimating the rate of penetration.

Lashari et al. (2019) published a feedforward neural network model for predicting penetration rate. The input variables of the model were differential pressure, mud flow, bit weight, and bit rotations per minute. The datasets used to create the model came from both an oil field and lab simulations. The predicted ROP values are compared with the actual measured values in order to detect bit failure or malfunction. Any detected deviation suggests that the bit is performing below par, and this can be a red flag.

Wang and Salehi (2015) used an artificial neural network (ANN) model to forecast hydraulics pump pressure and to provide early warnings. The model was implemented using MATLAB's fitting tool, and the sensitivity of the chosen input parameters was examined using the forward regression method. Data sets were gathered from chosen well samples and used to verify the model. In similar formations, the model predicted pump pressure versus well depth. Neural networks are powerful tools and have proven particularly effective at high-dimensional modeling. However, Hinton et al. (2012), Schmidhuber (2015), and Hegde et al. (2015) contend that when applied to low-dimensional problems, they typically underperform simpler machine learning models such as random forest, which have reported greater prediction accuracies.

ROP is typically monitored in real time by equipment that uses measurement-while-drilling (MWD) techniques. Optimizing the rate of penetration is necessary because greater ROP values indicate that drilling distance is being covered more quickly, and oil drilling businesses want to cover greater distances more quickly in order to save time and money. WOB and RPM are two factors that can be directly regulated and that have an impact on ROP. The other factors (PHIF, VSH, SW, and KLOGH) are determined by the formation. Studies have observed that ROP first rises until a point called the founder point or sweet spot (the optimum point), after which it starts to fall. As a result, to retain the best performance moving forward, the values of the external variables must be raised. Regrettably, ROP does not always rise proportionately to changes in these variables' values.
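The manual calculation of ROP described at the start of this section (depth monitored at regular time intervals) can be sketched in a few lines of Python. The function name `rop_from_depth_log` and the sample readings are illustrative assumptions and are not taken from the Volve dataset.

```python
def rop_from_depth_log(times_hr, depths_m):
    """Return the interval ROP (m/hr) between consecutive depth readings."""
    if len(times_hr) != len(depths_m):
        raise ValueError("times and depths must have the same length")
    rops = []
    for i in range(1, len(times_hr)):
        dt = times_hr[i] - times_hr[i - 1]
        if dt <= 0:
            raise ValueError("time stamps must be strictly increasing")
        # ROP over the interval = depth drilled / elapsed time
        rops.append((depths_m[i] - depths_m[i - 1]) / dt)
    return rops


# Hypothetical readings taken every half hour (not real well data).
times = [0.0, 0.5, 1.0, 1.5]
depths = [3305.0, 3310.0, 3312.5, 3320.0]
print(rop_from_depth_log(times, depths))  # [10.0, 5.0, 15.0]
```

Each value is the average ROP over one sampling interval, so a finer sampling interval gives a more responsive but noisier ROP signal.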

III. PROPOSED METHODOLOGY

The process used in this work to create machine learning and data science models is conventional. Five (5) phases make up the methodology depicted in Figure 1, and they are as follows:
• Understanding the business issue, that is, the problem
• Data preparation and cleanup
• Data modeling
• Model assessment
• Model implementation

[Figure 1 shows the following stages arranged around a central DATA node: Business Understanding (business requirements, pattern identification, staging); Data Understanding (data sources, environment, quality assessment); Data Preparation (wrangling, preparation, labelling); Data Modelling (algorithm selection, training, tuning); Model Evaluation (validation/testing, performance, retraining); Model Operationalization (deployment, scoring/feedback).]

Fig. 1: Data Science Methodology (Nwankwo, 2020)

A. Understanding the Problem and Data Collection
The open-source Volve dataset was used in this study. This comprehensive set of North Sea oil field data is made up of real-time drilling data and Computed Petrophysical Output (CPO) log data from well number 15/9-F-15 in the Volve Oil Field in the North Sea (Equinor, 2018). It is available for research, study, and development purposes. Seven (7) input variables and one (1) target variable make up this dataset:
• Depth (measured depth)
• WOB (weight on bit)
• SURF_RPM (surface rotations per minute)
• PHIF (formation porosity)
• VSH (shale volume)
• SW (water saturation)
• KLOGH (log permeability)
• Target variable: ROP_AVG (average rate of penetration)

Table 1: Snapshot of the Dataset
Index  Depth  WOB        SURF_RPM  ROP_AVG   PHIF      VSH       SW        KLOGH
0      3305   26217.864  1.314720  0.004088  0.086711  0.071719  1.000000   0.001000
1      3310   83492.293  1.328674  0.005159  0.095208  0.116548  1.000000   0.001000
2      3315   97087.882  1.420116  0.005971  0.061636  0.104283  1.000000   0.001000
3      3320   54793.206  1.593931  0.005419  0.043498  0.110040  1.000000   0.001000
4      3325   50301.579  1.653262  0.005435  0.035252  0.120808  1.000000   0.001000
...    ...    ...        ...       ...       ...       ...       ...        ...
146    4065   71081.752  2.104258  0.008808  0.087738  0.291586  1.000000   0.162925
147    4070   72756.626  2.333038  0.008824  0.019424  0.503175  1.000000  -0.001124
148    4075   83526.789  2.333326  0.008799  0.054683  0.689640  1.000098   0.002261
149    4080   84496.549  2.334673  0.008375  0.022857  0.640100  1.000000   0.001000
150    4085   86658.559  2.331339  0.008454  0.022857  0.640100  1.000000   0.001000

There were a total of 150 entries in the dataset, each with eight (8) features.

B. METHODOLOGY
The dataset shown in Table 1 was used to build the chosen machine learning models in order to meet the goals of this study. Figure 2 displays the steps of the complete procedure.

[Figure 2 shows the following pipeline: Data Collection (from IoT sensors, received at RTOC, stored in EDR); Data Transformation (data cleaning, data wrangling, data feature engineering); Build Machine Learning Models (eighteen regression algorithms implemented); Comparative Analysis of Model Performance, Ranking of Models, and Adoption of Most Efficient Model; Evaluation of Features and Ranking of Features.]

Fig. 2: Proposed Architecture

C. Data Modelling and Evaluation
Exploratory data analysis was carried out next in order to learn more from the data and find hidden trends. Using the chosen features, the data was then split into training and testing portions in a 70:30 ratio and passed to the machine learning algorithms. The machine learning algorithms employed and their performance comparison are shown in Table 2.

Cook's distance outlier detection was performed to estimate outliers in the dataset (Figure 3). Cook's distance is an estimate of a data point's influence. It takes each observation's leverage and residual into account, and it measures the change in a regression model when the ith observation is removed.
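The 70:30 split described in this subsection can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split_rows`, the fixed seed, and the stand-in records are assumptions, and the paper does not state which tool or random seed was actually used.

```python
import random


def train_test_split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(150))  # stand-in for 150 dataset records
train, test = train_test_split_rows(rows)
print(len(train), len(test))  # 105 45
```

Shuffling before splitting matters for depth-ordered drilling logs, since an unshuffled split would test only on the deepest interval; whether that is desirable depends on the intended use of the model.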


Fig. 3: Cook’s Distance Outlier Detection
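As an illustration of how Cook's distance combines leverage and residual, the sketch below computes it for a one-variable least-squares line using only the standard library. The sample data, the function name `cooks_distance`, and the 4/n cut-off (a common rule of thumb) are assumptions for illustration; the paper does not state the threshold or the regression model behind Figure 3.

```python
def cooks_distance(x, y):
    """Cook's distance of each point for the least-squares line y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        # Leverage of a point in simple linear regression.
        leverage = 1.0 / n + (xi - x_mean) ** 2 / sxx
        distances.append((e ** 2 / (p * mse)) * leverage / (1.0 - leverage) ** 2)
    return distances


# Hypothetical data: the last point departs from the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
d = cooks_distance(x, y)
flagged = [i for i, di in enumerate(d) if di > 4.0 / len(x)]
print(flagged)  # [5]
```

Only the last observation is flagged because it has both a large residual and high leverage at the edge of the x range.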

Table 2: Selected Regression Models and their Performances


Model MAE MSE RMSE R2 RMSLE MAPE TT (Sec)
rf Random Forest Regressor 0.0006 0.0000 0.0010 -0.0891 0.0010 0.1082 0.407
gbr Gradient Boosting Regressor 0.0006 0.0000 0.0010 -0.1076 0.0010 0.1133 0.045
et Extra Trees Regressor 0.0006 0.0000 0.0009 -0.1805 0.0009 0.1064 0.363
huber Huber Regressor 0.0007 0.0000 0.0010 -0.1856 0.0010 0.1213 0.029
dt Decision Tree Regressor 0.0007 0.0000 0.0012 -0.3617 0.0012 0.1298 0.014
knn K Neighbors Regressor 0.0009 0.0000 0.0013 -0.5364 0.0012 0.1572 0.059
ridge Ridge Regression 0.0007 0.0000 0.0010 -0.5516 0.0010 0.1219 0.012
br Bayesian Ridge 0.0007 0.0000 0.0010 -0.5530 0.0010 0.1224 0.014
en Elastic Net 0.0009 0.0000 0.0012 -0.5729 0.0012 0.1531 0.013
lightgbm Light Gradient Boosting Machine 0.0007 0.0000 0.0011 -0.5771 0.0011 0.1281 0.046
lr Linear Regression 0.0007 0.0000 0.0010 -0.5861 0.0010 0.1227 0.304
lar Least Angle Regression 0.0007 0.0000 0.0010 -0.5861 0.0010 0.1227 0.013
lasso Lasso Regression 0.0009 0.0000 0.0012 -0.6040 0.0012 0.1532 0.014
llar Lasso Least Angle Regression 0.0009 0.0000 0.0012 -0.6929 0.0012 0.1531 0.014
dummy Dummy Regressor 0.0009 0.0000 0.0012 -0.6929 0.0012 0.1531 0.013
omp Orthogonal Matching Pursuit 0.0007 0.0000 0.0011 -0.7197 0.0011 0.1262 0.012
ada AdaBoost Regressor 0.0007 0.0000 0.0011 -0.7845 0.0011 0.1267 0.067
par Passive Aggressive Regressor 0.0079 0.0001 0.0080 -138.4456 0.0080 1.0000 0.013
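The metric columns of Table 2 can be computed from a vector of true and predicted values as sketched below. The sketch uses the standard textbook definitions, which is an assumption, since the paper does not give its formulas; the function name `regression_metrics` and the sample values are illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, RMSLE and MAPE as a dictionary."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e ** 2 for e in errors)
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 compares the model with a baseline that always predicts the mean.
    r2 = 1.0 - ss_res / ss_tot
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
    # MAPE is returned as a fraction and is undefined if a true value is zero.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse),
            "R2": r2, "RMSLE": rmsle, "MAPE": mape}


y_true = [1.0, 2.0, 3.0]
print(regression_metrics(y_true, [2.0, 2.0, 2.0])["R2"])  # 0.0
print(regression_metrics(y_true, [3.0, 2.0, 1.0])["R2"])  # -3.0
```

Under this definition, a negative R2 means the model's squared error on the evaluated data is larger than that of always predicting the mean of the true values, which is worth keeping in mind when reading the R2 column of Table 2.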

To examine how accurately the ROP_AVG target variable can be predicted from the chosen features, the models listed in Table 2 were implemented. Standard regression evaluation metrics such as MAE, MSE, RMSE, and R2 were used as the comparison measures. According to our findings, the random forest regressor performed better than all the others and is ranked number 1, whereas the passive aggressive regressor performed worst and is ranked number 18.

D. Evaluation of the Random Forest Regressor
After the random forest (rf) algorithm was found to be the most effective among the eighteen (18) machine learning algorithms tested with the dataset, it was evaluated further using the residual plot, the prediction error plot, and the learning and validation curves. Additionally, feature importance ranking was done to determine which features were most crucial for predicting ROP. Figures 4 to 7 show these results.


Fig. 4: Residuals for the random forest algorithm

The model's fit was verified using the residual plot under the assumptions of constant variance, normality, and error independence. The plot shows the discrepancy between the observed and fitted values. The plot in Figure 4 displays irregularly scattered points that retain an approximately constant width around the line of identity; this is a sign of a sound model because it is close to a null residual plot.

Fig. 5: Prediction error for the random forest algorithm

Fig. 6: Learning and validation curve for the random forest regressor

The performance of the model was further examined using the learning and validation curves (Figure 6). These curves display a model's performance over time or as the training data set grows, and they are helpful for models built on incrementally growing datasets. The validation curve shows how well the model generalizes to values that have not previously been observed, while the training curve shows how well the model learns.

E. Feature Importance and Ranking
The importance of a feature is calculated by weighting the decrease in node impurity by the probability of reaching that node. The node probability is computed by dividing the number of samples that reach the node by the total number of samples. More important features have higher values.
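The impurity-based importance calculation described under Feature Importance and Ranking can be sketched for a single decision tree as follows. The tree below is hand-built with made-up sample counts and impurities, and the function name `feature_importances` is hypothetical; it is a minimal sketch of the weighting scheme, not the implementation used in the paper.

```python
def feature_importances(split_nodes, n_total):
    """Normalized impurity-decrease importance per feature for one tree.

    Each entry of split_nodes describes one internal node: the feature it
    splits on, the sample count and impurity at the node, and the same
    for its left and right children.
    """
    raw = {}
    for node in split_nodes:
        # Node probability = samples reaching the node / total samples.
        w = node["n"] / n_total
        w_left = node["n_left"] / n_total
        w_right = node["n_right"] / n_total
        decrease = (w * node["impurity"]
                    - w_left * node["impurity_left"]
                    - w_right * node["impurity_right"])
        raw[node["feature"]] = raw.get(node["feature"], 0.0) + decrease
    total = sum(raw.values())
    return {feature: value / total for feature, value in raw.items()}


# Hypothetical two-split tree trained on 100 samples.
tree = [
    {"feature": "Depth", "n": 100, "impurity": 10.0,
     "n_left": 60, "impurity_left": 4.0, "n_right": 40, "impurity_right": 5.0},
    {"feature": "SURF_RPM", "n": 60, "impurity": 4.0,
     "n_left": 30, "impurity_left": 1.0, "n_right": 30, "impurity_right": 2.0},
]
importances = feature_importances(tree, n_total=100)
print({k: round(v, 3) for k, v in importances.items()})
# {'Depth': 0.789, 'SURF_RPM': 0.211}
```

For a random forest, these per-tree values are typically averaged over all trees, so a feature that produces large impurity decreases near the root of many trees ends up ranked highest.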


Fig. 7: Feature importance

As seen in Figure 7, the features in descending order of importance are measured depth, surface rotations per minute, shale volume, weight on bit, formation porosity, water saturation, and log permeability.

IV. SUMMARY AND CONCLUSION

Compared with the traditional methodologies described earlier, efficient machine learning applications for ROP prediction offer superior outcomes. This is made possible by the growing volume of data produced at oil rigs, but choosing the best machine learning features and algorithms presents a real difficulty. A model built with many insignificant factors or a less effective algorithm is unlikely to produce adequate results. For this reason, we evaluated 18 machine learning methods in this work by building models from the Volve drilling dataset of the North Sea in order to compare and rank their performance. The result of this work offers a blueprint for choosing algorithms and features when developing ML solutions for optimizing oil drilling. Hybridization and the creation of real-time ROP prediction models can both benefit from this.

REFERENCES

[1.] Acheme, I. D., Vincent, O. R., & Olayiwola, O. M. (2022). Data Science Models for Short-Term Forecast of COVID-19 Spread in Nigeria. In Decision Sciences for COVID-19 (pp. 343-363). Springer, Cham. https://doi.org/10.1007/978-3-030-87019-5_20
[2.] Al-khudiri, M. M., Al-sanie, F. S., Paracha, S. A., Miyajan, R. A., Awan, M. W., Aramco, S., Kashif, M., & Ashraf, H. M. (2015). Application Suite for 24/7 Real-Time Operation Centers 2.Operation Centers ' Systems.
[3.] Alsaihati, A., Elkatatny, S., & Gamal, H. (2022). Rate of penetration prediction while drilling vertical complex lithology using an ensemble learning model. Journal of Petroleum Science and Engineering, 208, 109335.
[4.] Ameloko, A. A., Uhegbu, G. C., & Bolujo, E. (2019). Evaluation of seismic and petrophysical parameters for hydrocarbon prospecting of G-field, Niger Delta, Nigeria. Journal of Petroleum Exploration and Production Technology, 9, 2531–2542.
