Investigating and Ranking The Rate of Penetration (ROP) Features For Petroleum Drilling Monitoring and Optimization
ISSN No:-2456-2165
Abstract:- The drilling phase has been reported to be the most expensive phase of oil exploration and production; hence, several research efforts have been targeted at improving its efficiency. The rate of penetration (ROP) has also been identified as the most important metric for improving drilling performance, and several research efforts have reported different methods of predicting optimal ROP values. Recently, artificial intelligence (AI) and machine learning (ML) models have been reported for the prediction of ROP. However, the ROP is influenced by several factors, and the interactions among these factors introduce a complexity that affects its accurate prediction. This research work sets out to achieve two objectives: first, to investigate and rank the most important factors for the prediction of the ROP, and second, to carry out a comparative study and ranking of selected machine learning algorithms for the prediction of ROP. To achieve this, the open-source Volve dataset, a complete set of data from a North Sea oil field, was utilized. Eighteen (18) machine learning models were built using this dataset and their performances compared. The results showed the random forest regressor, with an RMSE value of 0.0010 and an R² score of 0.891, to be the most efficient algorithm among the eighteen chosen for this work. Further experimentation also revealed the most influential factors for predicting the rate of penetration. These features, in order of importance, are: measured depth, bit rotation per minute, formation porosity, shale volume, water saturation, and log permeability. The output of this study offers a blueprint for choosing algorithms and features when implementing ML solutions for optimizing oil drilling, which is helpful in the development of real-time ROP prediction models and hybridization.

Keywords:- Rate of Penetration Prediction, oil drilling, machine learning, feature selection

I. INTRODUCTION

Several researchers have noted that the drilling phase remains the most expensive phase of oil exploration and production (Cao et al., 2021; Sircar et al., 2021; Darwesh et al., 2019; Ameloko et al., 2019; Lashari et al., 2019). Therefore, ongoing research projects aimed at optimizing the drilling process to reduce its overall cost have been reported. Although equipment, products, and processes are always being improved, machine learning methods have only just started to play a substantial role in oil drilling optimization. This has largely been made possible by the current accessibility of enormous datasets (Braga, 2019). Machine learning (ML) models hold promise in this sector, as they enable efficient processing of the massive amounts of data produced by Internet of Things (IoT) sensors at oil rigs to aid decision making. Major oil firms have already invested hundreds of dollars in the IT infrastructure to establish Real-Time Operation Centers (RTOC), which read drilling data from rigs in real time. With the help of these readings, specialists in the centers can assess data instantly, enabling quicker decision-making, a decrease in stuck pipe incidents, hole cleaning problems, and fluid loss occurrences, as well as an increase in the number of wells that can be monitored with the same number of staff (Al-khudiri et al., 2015). Additionally, the accessibility of this data has provided the essential groundwork for the application of artificial intelligence and machine learning techniques to create smart models for more precise and reliable real-time drilling performance monitoring and optimization.

Modern drilling rigs carry an enormous amount of instrumentation that collects parameters from almost every piece of installed equipment, using sensors to measure their states and enabling remote and safe operations. As a result, there has been an exponential increase in the amount of data generated at oil rigs. This has prepared the way for the creation of predictive analytics machine learning models and decision support systems.

As researchers continue to study the datasets created at oil rigs, choosing the appropriate machine learning algorithms and features for the precise prediction of ROP poses a challenge, because the ROP is influenced by a number of variables that have complex relationships, and the extent of their influence varies, as some are more relevant than others. In addition to implementing and contrasting various ML techniques, the goal of this research is to investigate and rank the factors that have been published in the literature for ROP prediction. A machine learning model built with many weakly influential factors, or with a less efficient algorithm, is not likely to give satisfactory results. By focusing on the most crucial factors, these models will perform better in terms of prediction, computation, and training time, and will be easier to understand (Acheme et al., 2022).
[Figure: methodology workflow. Recoverable labels: Business Understanding (business requirements, pattern identification); Data (staging, wrangling, preparation, labelling); Data Modelling (algorithm selection, training, tuning); Validation/Testing (performance, retraining); Received at RTOC; Stored in EDR; Build Machine Learning Models; Eighteen Regression Algorithms Implemented.]
C. Data Modelling and Evaluation
Exploratory data analysis was then carried out to learn more from the data and find hidden trends. Using the chosen features, the data was split into training and testing portions in a 70:30 ratio and passed to the machine learning algorithms. The machine learning algorithms employed and their performance comparison are shown in Table 1.

Cook's distance outlier detection was performed to identify outliers in the dataset (Figure 3). Cook's distance is an estimate of a data point's influence that takes each observation's leverage and residual into account. It measures the change in the fitted regression model when the ith observation is removed.
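As an illustration of the Cook's distance calculation, the following minimal sketch computes it for a simple one-predictor linear regression using the standard closed form, in which each squared residual is scaled by the observation's leverage. The function name, the toy data, and the 4/n cut-off are assumptions made for illustration; the paper does not state which implementation or threshold was used.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression y = a + b*x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate with n - p degrees of freedom.
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each observation (diagonal of the hat matrix).
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Hypothetical toy data: a depth-like predictor and an ROP-like response,
# with the last response deliberately perturbed.
depth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rop = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 30.0]

distances = cooks_distance(depth, rop)
threshold = 4 / len(depth)  # common rule of thumb, not the paper's stated cut-off
flagged = [i for i, d in enumerate(distances) if d > threshold]
print(flagged)
```

The closed form gives the same value as refitting the model with the ith observation removed, so no explicit leave-one-out loop is needed.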
To examine the prediction accuracy for the ROP_AVG target variable using the chosen features, the models listed in Table 2 were implemented. Standard regression evaluation metrics such as MAE, MSE, RMSE, R², and others were used as the comparison measures. According to our findings, the random forest regressor performed better than all the others and is ranked number 1, whereas the passive aggressive regressor performed worst and is ranked number 18.

D. Evaluation of the Random Forest Regressor
After the random forest (rf) algorithm was determined to be the most effective of the eighteen (18) machine learning algorithms tested with the dataset, it was evaluated further using the residual plot, error plot, and learning and validation curves. Additionally, feature importance ranking was done to determine which features were most crucial for predicting ROP. Figure 4 shows these results.
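For reference, the 70:30 split and the comparison metrics named in the model evaluation (MAE, MSE, RMSE, R²) can be sketched in plain Python as below. The helper names, the seed, and the sample values are hypothetical and are not taken from the study.

```python
import math
import random


def train_test_split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and split them into 70% train, 30% test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical example: ten row indices and four ROP-like values.
rows = list(range(10))
train, test = train_test_split_70_30(rows)
scores = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(len(train), len(test), scores)
```

Ranking several models then amounts to computing these scores on the same held-out 30% for each model and sorting by the chosen metric.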
The model's fit was verified using the residual plot under the assumptions of constant variance, normality, and error independence. The plot shows the discrepancy between the observed and fitted values. Figure 4's plot displays randomly scattered points that retain an approximately constant width around the line of identity; this is a sign of a sound model because it is close to a null residual plot.
Fig. 6: Learning and validation curve for the random forest regressor
With the use of the learning and validation curves (Figure 6), the performance of the model was further examined. These curves display a model's performance over time or as the training data set grows, and are helpful for models created using incremental datasets. The validation curve shows how well the model generalizes to previously unseen values, while the training curve shows how well the model learns.

E. Feature Importance and Ranking
The importance of a feature is calculated as the decrease in node impurity weighted by the probability of reaching that node. The node probability is computed by dividing the number of samples that reach the node by the total number of samples. More important features have higher values.
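The impurity-based importance calculation can be sketched for a single tree as follows. The node records, feature names, and impurity values are hypothetical, chosen only to make the arithmetic easy to follow. The normalization to a sum of one, and averaging the per-tree importances across a forest, are common conventions that are assumed here rather than taken from the paper.

```python
from collections import defaultdict


def feature_importances(nodes, n_total):
    """Impurity-based feature importances for the split nodes of one tree."""
    raw = defaultdict(float)
    for node in nodes:
        # Probability of reaching the node: samples at the node / all samples.
        p_node = node["n"] / n_total
        # Impurity decrease produced by the split at this node.
        decrease = (
            node["impurity"]
            - (node["n_left"] / node["n"]) * node["impurity_left"]
            - (node["n_right"] / node["n"]) * node["impurity_right"]
        )
        raw[node["feature"]] += p_node * decrease
    total = sum(raw.values())
    # Normalize so that the importances sum to one.
    return {feature: value / total for feature, value in raw.items()}


# Hypothetical split nodes of a small regression tree (impurity = variance).
nodes = [
    {"feature": "measured_depth", "n": 100, "impurity": 10.0,
     "n_left": 60, "impurity_left": 4.0, "n_right": 40, "impurity_right": 5.0},
    {"feature": "rpm", "n": 60, "impurity": 4.0,
     "n_left": 30, "impurity_left": 2.0, "n_right": 30, "impurity_right": 3.0},
]

importances = feature_importances(nodes, n_total=100)
ranking = sorted(importances, key=importances.get, reverse=True)
print(ranking)
```

In this toy tree the root split on `measured_depth` is reached by every sample and removes most of the variance, so it ranks first.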
As seen in Figure 7, the features are measured depth, rotations per minute on the surface, shale volume, weight on bit, formation porosity, water saturation, and log permeability, in that order of importance.

IV. SUMMARY AND CONCLUSION
REFERENCES
[4] Ameloko A.A., Uhegbu G.C. and Bolujo E. (2019). Evaluation of seismic and petrophysical parameters for hydrocarbon prospecting of G-field, Niger Delta, Nigeria. Journal of Petroleum Exploration and Production Technology, 9:2531–2542.