CANDIDATE’S DECLARATION
We, AAKARSHIT TAWRA (2K21/PE/01), ADITYA ROY (2K21/PE/02), ISHANT CHAUDHRY (2K21/PE/23) and KAUSTUBH KRISHNA CHAUKIYAL (2K21/PE/31), students of B.Tech. Production and Industrial Engineering, hereby declare that the Project Dissertation titled "Application of machine learning in predicting the yield strength of API steels", which is submitted by us to Delhi Technological University, Delhi in partial fulfillment of the requirement for the award of the degree of Bachelor of Technology, is original and not copied from any source without proper citation. This work has not previously formed the basis for the award of any Degree, Diploma, Associateship, Fellowship or other similar title or recognition.
CERTIFICATE
I hereby certify that the Project Dissertation titled “Application of machine learning in predicting the yield strength of API steels”, which is submitted by AAKARSHIT TAWRA (2K21/PE/01), ADITYA ROY (2K21/PE/02), ISHANT CHAUDHRY (2K21/PE/23) and KAUSTUBH KRISHNA CHAUKIYAL (2K21/PE/31), B.Tech. Production and Industrial Engineering, Delhi Technological University, Delhi, in partial fulfillment of the requirement for the award of the degree of Bachelor of Technology, is a record of the project work carried out by the students under my supervision. To the best of my knowledge, this work has not been submitted in part or full for any Degree or Diploma to this University or elsewhere.
Place: Delhi
Date:
SUPERVISOR
ABSTRACT
The significance of machine learning has grown exponentially across diverse fields, including the mechanical industry. API steels, a subgroup of high-strength low-alloy (HSLA) steels, are designed for use in the petroleum industry. In this research, the application of machine learning models to estimate the mechanical properties of API steels was explored. Both linear and non-linear machine learning models were employed to predict the tensile strength and yield strength of API steels. The models were evaluated on test samples using different performance metrics and produced promising results. The results demonstrate the effectiveness of machine learning techniques in predicting mechanical properties, making them a valuable tool for researchers and engineers in the materials industry.
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to all those who have contributed to the successful completion
of this research project. First and foremost, we extend our heartfelt appreciation to our supervisor Prof. RS
MISHRA for his guidance, support, and valuable insights throughout the entire research process. His expertise
and encouragement have been instrumental in shaping our work.
We are also immensely grateful to the participants who provided the necessary data for this study. Their
cooperation and willingness to share their knowledge and experiences have been invaluable. Furthermore, we
would like to acknowledge the contributions of the various research papers, websites, and handbooks from
which we sourced relevant information. Their work has laid the foundation for our research and enriched our
understanding of the subject matter.
Last but not least, we would like to express our gratitude to our families for their unwavering support and
understanding throughout this endeavor. Their love, patience, and encouragement have been a constant source
of motivation.
In conclusion, we acknowledge the collective efforts of all those who have contributed to this research project.
Their support and contributions have been essential in our journey towards achieving our research objectives.
CANDIDATE’S DECLARATION......................................................................................ii
CERTIFICATE....................................................................................................................iii
ABSTRACT......................................................................................................................... iv
ACKNOWLEDGEMENT................................................................................................... v
CONTENTS......................................................................................................................... vi
LIST OF TABLES.............................................................................................................viii
LIST OF SYMBOLS AND ABBREVIATIONS............................................................... ix
CHAPTER 1 INTRODUCTION.........................................................................................1
1.1 HSLA STEELS............................................................................................................................... 2
1.2 API STEELS.................................................................................................................................. 4
1.3 MACHINE LEARNING.................................................................................................................. 7
CHAPTER 3 METHODOLOGY......................................................................................16
3.1 DATA COLLECTION.................................................................................................................... 16
3.2 DATA ANALYSIS......................................................................................................................... 20
3.3 FEATURE SELECTION.................................................................................................................22
3.4. MODEL TRAINING....................................................................................................................23
3.4.1 Multiple regression........................................................................................................... 24
3.4.2. Decision Tree.................................................................................................................... 28
3.4.3 Random Forest.................................................................................................................. 30
3.4.4 Extreme gradient boosting................................................................................................31
3.5 MODEL EVALUATION................................................................................................................ 32
Symbols
∑ Summation
β Coefficients
RSS Residual sum of squares
λ Penalty term in regularization of multiple regression
Abbreviations
ANN Artificial Neural Network
API American Petroleum Institute
ARB Accumulative Roll Bonding
ASM American Society for Metals (ASM International)
ASTM American Society for Testing and Materials
CART Classification and Regression Tree
CEN European Committee for Standardization
ECB European Chemical Bulletin
ER Ensemble Regression
GPR Gaussian Process Regression
HEA High Entropy Alloys
HSLA High-Strength Low-Alloy
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
ML Machine Learning
MMC Metal Matrix Composites
MSE Mean Squared Error
NLP Natural Language Processing
R2 Score Coefficient of Determination
RF Random Forest
SAE Society of Automotive Engineers
SVR Support Vector Regression
TS Tensile Strength
UTS Ultimate Tensile Strength
w.r.t with respect to
XGBoost Extreme Gradient Boosting
YS Yield Strength
CHAPTER 1 INTRODUCTION
Purpose of the Study
The primary purpose of this study is to leverage machine learning techniques to predict the
mechanical properties of API steels, specifically yield strength and tensile strength, with high
accuracy and efficiency. By developing data-driven models, the study aims to provide a
cost-effective, time-saving alternative to traditional experimental methods for evaluating material
properties.
This research also focuses on exploring the relationship between the chemical composition of API
steels and their mechanical behavior. By identifying the key factors influencing strength, the
study seeks to optimize material design for the petroleum industry, ensuring safety, durability, and
performance in critical applications such as pipelines, offshore platforms, and drilling equipment.
The motivation for this project stems from the challenges faced in traditional material testing and
the evolving needs of the energy sector:
1. Challenges in Traditional Testing: Conventional methods to determine mechanical
properties, such as tensile and yield strength, rely on extensive laboratory experiments. These
methods are not only time-consuming and expensive but also often impractical when dealing
with large datasets or repeated testing.
2. Need for Precision in the Petroleum Industry: API steels are extensively used in the
petroleum industry, where safety and performance are paramount. Pipelines and other
infrastructure are subjected to extreme pressures, temperatures, and corrosive environments,
requiring materials with reliably predicted mechanical properties to avoid catastrophic failures.
3. Advancements in Machine Learning: Machine learning offers a modern approach to
address these challenges by enabling faster, more accurate predictions of material properties using
historical data. The ability to model complex relationships between chemical composition and
mechanical behavior motivates the integration of data-driven techniques in material science.
4. Demand for Innovation in Material Design: As the energy sector evolves, there is a
growing demand for optimized materials that balance strength, weight, and corrosion
resistance. This study aligns with that demand by providing insights into material behavior,
helping manufacturers design steels that meet specific industry needs.
1.1 HSLA STEELS
High-strength low-alloy (HSLA) steels represent a specialized class of steels that have been
developed to meet the growing demand for materials that combine high strength, durability, and cost
efficiency. Unlike traditional carbon steels, HSLA steels do not rely solely on carbon content for their
mechanical properties. Instead, their strength and performance are achieved through the precise
addition of microalloying elements and advanced processing techniques.
HSLA steels are characterized by their ability to provide superior mechanical properties while
maintaining a relatively low weight. This makes them an integral material in industries that require
components to withstand high loads, resist wear, and perform reliably under challenging conditions.
The term “high-strength low-alloy” itself encapsulates the essential features of these steels: they offer
high mechanical strength compared to conventional carbon steels, and they achieve this through low
levels of alloying elements (typically less than 5% by weight).
The genesis of HSLA steels can be traced back to the need for materials that could overcome the
limitations of traditional steels. Conventional carbon steels, while versatile, have a trade-off between
strength and ductility, with higher strength often resulting in reduced formability and weldability.
HSLA steels were developed to break this compromise, offering a unique combination of attributes.
One of the most significant breakthroughs in HSLA steel development was the introduction of
microalloying, where minute quantities of elements such as vanadium, niobium, and titanium are
added. These elements refine the grain structure of the steel, leading to significant improvements in
strength and toughness. This grain refinement is typically achieved during controlled rolling
processes, where precise temperature and deformation control allow for the production of steels with
exceptional performance characteristics.
In addition to their enhanced strength, HSLA steels are notable for their ability to maintain toughness
across a wide range of temperatures. This toughness ensures that the material can absorb energy and
resist fractures even in cold environments, making it an ideal choice for applications in regions with
extreme weather conditions.
Another defining feature of HSLA steels is their ability to resist atmospheric and environmental
corrosion. While traditional steels may require protective coatings or treatments to achieve similar
levels of resistance, HSLA steels inherently offer better durability due to the specific alloying
elements used in their composition. This property significantly enhances their longevity and reduces
maintenance requirements in structures and components exposed to harsh conditions.
HSLA steels are classified into several categories based on their mechanical properties,
chemical composition, and processing methods. Here are some common classification
systems for HSLA steels:
1. ASTM: The American Society for Testing and Materials (ASTM) has several
standards for HSLA steels based on their yield strength, tensile strength, and elongation.
For example, ASTM A572 and A588 are HSLA steels with minimum yield strengths of 50
ksi and 70 ksi, respectively.
2. SAE: The Society of Automotive Engineers (SAE) classifies HSLA steels based on
their minimum yield strength and composition. For example, SAE J1392 defines several
grades of HSLA steels, such as 050XLK, 060XLK, and 070XLK, which have minimum
yield strengths of 50 ksi, 60 ksi, and 70 ksi, respectively.
3. API: The American Petroleum Institute (API) has standards for HSLA steels used in the oil and gas industry, such as API 5L X70 and X80, which have minimum yield strengths of 70 ksi and 80 ksi, respectively.
4. CEN: The European Committee for Standardization (CEN) classifies HSLA steels
based on their mechanical properties and chemical composition. For example, EN 10149
specifies several grades of HSLA steels with minimum yield strengths ranging from 315
MPa to 700 MPa.
Overall, the classification of HSLA steels is based on their intended application and the
required mechanical properties, and the standards organizations have developed various
systems to ensure that HSLA steels meet the specific requirements for their end use.
1.2 API STEELS
API steels, also known as American Petroleum Institute steels, are a group of high-strength
low-alloy (HSLA) steels [2] that are specifically designed for use in the petroleum industry [3].
These steels are developed and tested by the American Petroleum Institute (API) to meet certain
standards for strength, toughness, and durability in harsh environments.
API steels are often made with a mixture of alloying elements including nickel, molybdenum,
and chromium, which enhance the strength and corrosion resistance of the steel [4]. They are also
often designed to withstand extreme temperatures, pressures, and corrosive environments commonly
encountered in the oil and gas industry [5], [6].
API steels are commonly used in a range of applications including pipelines, offshore platforms,
and drilling equipment, where their high strength and toughness make them ideal for withstanding
the demanding conditions of the industry.
API steels are named according to a standardized system that includes both a letter and a number.
The letter indicates the type of steel, while the number indicates the minimum yield strength of the
steel in ksi [7].
For example, in the case of X56, the "X" indicates that this is a type of high-strength, low-alloy steel, while the "56" indicates that the minimum yield strength of the steel is 56 ksi.
The letter designation system for API steels includes several different types of steel, including:
● L: Low-alloy steel
● C: Carbon-manganese steel
● P: Chromium-molybdenum steel
The number that follows the letter designation specifies the minimum yield strength of the steel in
ksi. For example, X70 steel has a minimum yield strength of 70 ksi, while L80 steel has a minimum
yield strength of 80 ksi.
It's important to note that API steel grades also have other requirements beyond yield strength, such
as maximum hardness and minimum toughness, that must be met in order to be used in specific
applications in the petroleum industry.
API steels are a versatile material that can be used in a wide range of harsh environments, making
them ideal for use in the petroleum industry. The following are some of the most common API steel
grades and their general characteristics:
● J55: A low to medium-strength carbon steel with a minimum yield strength of 55 ksi,
primarily used for casing and tubing in mildly corrosive environments.
● K55: A medium-strength carbon steel with a minimum yield strength of 55 ksi, similar to
J55 but with slightly higher mechanical properties and a lower sulfur content.
● N80: A medium-strength, low-alloy steel with a minimum yield strength of 80 ksi,
primarily used for casing and tubing in moderate to highly corrosive environments.
● P110: A high-strength, low-alloy steel with a minimum yield strength of 110 ksi, primarily
used for casing and tubing in highly corrosive environments or wells with high pressure and high
stress.
● X52, X60, X65, and X70: High-strength, low-alloy steels with minimum yield strengths of
52 ksi, 60 ksi, 65 ksi, and 70 ksi, respectively, primarily used for pipelines and other transmission
applications.
● Q125: A high-strength, quenched and tempered steel with a minimum yield strength of
125 ksi, primarily used for drilling equipment in highly demanding applications such as deepwater
drilling.
It's important to note that each API steel grade has specific mechanical and chemical properties that
must be met in order to be approved for use in the petroleum industry, and that these properties can
vary depending on the specific application and environment.
API steels are used in a variety of applications throughout the petroleum industry, from upstream
exploration and production to downstream refining and petrochemical processing. Specifically, these
steels are used in a variety of applications such as:
1. Oil and Gas Pipelines: API steels are commonly used in the construction of pipelines for
the transportation of oil and gas. The high strength and toughness of these steels make them ideal for
withstanding the high pressures and stresses that can occur during pipeline operation [8].
2. Offshore Platforms: API steels are used in the construction of offshore platforms, which
must withstand harsh environmental conditions including high winds, waves, and corrosive
saltwater. The strength, toughness, and corrosion resistance of API steels make them well-suited for
this application.
3. Drilling Equipment: API steels are also used in the manufacture of drilling equipment
such as drill pipes [9], casing, and tubing [10]. These components must be able to endure the
extreme conditions of drilling operations, including high pressures, temperatures, and corrosive
fluids.
4. Refineries and Petrochemical Plants: API steels are used in the construction of equipment
such as storage tanks, pressure vessels [11], and heat exchangers [12] in refineries and
petrochemical plants [13]. The high strength and corrosion resistance of these steels make them
ideal for use in these harsh environments.
Overall, API steels play a critical role in the petroleum industry, helping to ensure the safety,
reliability, and efficiency of the equipment and infrastructure used in the exploration, production,
transportation, and processing of oil and gas.
1.3 MACHINE LEARNING
Machine learning has transformed numerous fields by enabling systems to process and analyze large amounts of data faster and more accurately than humans. It has become an essential tool in applications ranging from daily conveniences like personalized recommendations on streaming platforms to critical tasks such as diagnosing diseases or optimizing supply chains. Key advantages of machine learning include:
1. Automation: Machine learning reduces the need for manual intervention by automating
complex processes.
2. Scalability: Models can process vast datasets and adapt to new data, making them
scalable for diverse applications.
3. Improved Decision-Making: By identifying patterns in data, machine learning provides
actionable insights for better decision-making in industries like finance, healthcare, and
manufacturing.
Beyond these general benefits, machine learning has found application across a wide range of domains:
1. Image and Speech Recognition: Machine learning techniques have been successfully applied in areas such as image recognition, object detection, and speech recognition [15]. These applications are used in various fields like autonomous vehicles, medical imaging, and voice assistants.
2. Natural Language Processing (NLP): NLP techniques facilitate the comprehension and
processing of human language by machines. Applications include language translation, sentiment
analysis, text summarization, and chatbots [16].
3. Fraud Detection: Machine learning algorithms can identify anomalies and patterns in large
datasets, making them valuable for fraud detection in finance [17], insurance [18], and e-commerce
sectors [19].
4. Recommendation Systems: Recommender systems use machine learning to suggest relevant products and content to users on e-commerce and streaming platforms [20].
5. Healthcare and Medicine: Machine learning is revolutionizing healthcare with applications such as disease diagnosis, drug discovery, medical imaging analysis, and personalized medicine [21].
6. Predictive Analytics: Machine learning models have the capability to analyze past data and
generate forecasts in various domains, including finance, sales, marketing, and demand forecasting
[22].
7. Autonomous Systems: Machine learning plays a crucial role in autonomous systems like
self-driving cars, drones, and robots [23]. These systems use ML algorithms to perceive and
understand the environment, make decisions, and navigate.
These are just a few examples of how machine learning is being applied across different
industries. As technology advances and data availability increases, the potential for machine
learning applications continues to expand, driving innovation and solving complex problems.
Machine learning also has practical limitations; in particular, training machine learning models, especially on large datasets, can require significant computational power and time.
As technology advances and data availability grows, machine learning is expected to play an even
greater role in shaping the future. Developments in areas such as deep learning, natural language
processing, and reinforcement learning are opening up new possibilities in AI, from more
human-like interactions in virtual assistants to the automation of scientific discoveries.
CHAPTER 2 LITERATURE REVIEW
Bhandari et al. [24] present a study on the application of machine learning (ML)
methods to predict the yield strength of high entropy alloys (HEAs) at high temperatures. HEAs
have gained significant attention due to their promising properties and potential applications in
structural materials. The authors utilize the random forest (RF) regressor model to predict the
yield strengths of MoNbTaTiW and HfMoNbTaTiZr at temperatures of 800°C, 1200°C, and
1500°C. The predicted results are compared with experimental data, and the accuracy and
effectiveness of the ML model are evaluated. The research paper explores an interesting and
relevant topic in the field of materials science and engineering. The use of machine learning
methods to predict the mechanical properties of HEAs, such as yield strength, is a valuable
approach that can save time and costs associated with experimental trials. The paper is
well-structured, providing clear sections for the introduction, computational methods, results, and
discussion. Choudhury et al. [25] discussed a machine intelligence-based model for
predicting the mechanical properties of low carbon steels. The authors address the need for an
automated prediction model that can evaluate the mechanical properties without the need for
experimental processes. They propose a model based on machine learning techniques to predict
the elongation and yield strength of low carbon steels produced through various
thermomechanical processes. The authors emphasize that the mechanical properties of low carbon steels can be estimated from composition and processing parameters without resorting to extensive experiments. Veeresham et al. [26] applied machine learning to predict the yield strength of nitrogen-doped CoCrFeMnNi high entropy alloys processed under different thermomechanical conditions. Using experimental data and thermophysical calculations, the researchers identified key features that influence yield strength. The correlations revealed that tensile test temperature had the strongest impact,
followed by the nitrogen content, cold rolling, entropy, Co content, grain size, Fe content, and
melting temperature. Understanding these relationships provides insights into modifying the yield
strength of nitrogen-doped CoCrFeMnNi HEAs for specific applications. The ML model
demonstrated high accuracy, with a coefficient of determination (R2) of 95.54% and a low mean
absolute error (MAE) of 33.10. The predicted yield strength values closely matched the
experimental values, confirming the reliability of the model. Notably, the model accurately
predicted the yield strength of a nitrogen-doped HEA that underwent cold rolling and annealing,
with an error of only 1.36%. This study highlights the potential of ML techniques in optimizing
thermomechanical processing parameters and predicting material properties for HEAs. The ability
to design and develop HEAs with superior properties using ML models and material parameters
has significant implications for various industries. Overall, this research showcases the power of
ML in accelerating the discovery and design of advanced materials with desired mechanical
properties, ultimately leading to more efficient and cost-effective material development processes.
Xu et al. [27] presented a study on predicting the tensile properties of AZ31 magnesium alloys using
machine learning techniques. The authors explain the rules employed for data collection, such as handling
missing attribute values and excluding certain processing methods. They provide details on the ANN and
SVM models used, including the architecture, activation functions, optimization strategies, and
hyperparameters. The dataset exploration section presents the characteristics of the collected data, including
the distribution of yield strength (YS), ultimate tensile strength (UTS), and tensile elongation (EL). The
authors calculate the Pearson correlation coefficients to assess the relationship between individual attributes
and the output properties. The research paper provides a comprehensive study on predicting the tensile
properties of AZ31 magnesium alloys using machine learning techniques. The authors successfully
demonstrate the applicability of ANN and SVM models for this task and discuss the implications of their
findings. The paper is well-organized and provides sufficient details on the methodology and results.
However, it would benefit from further discussion on the limitations and potential future directions of the
research. Karina et al. [28] attempted to provide a simple, accurate, and cost-effective method to predict the
residual tensile strength of corroded steel structures. The researchers used FEM to obtain tensile strength data
for artificially corroded plates, generated using a spatial autocorrelation model. The corroded surface data and
material properties were used as input, and the tensile strength was used as the output to train the ANN model.
The accuracy of the model was validated using leave-one-out cross-validation. The research paper confirms
that the proposed ANN approach outperforms previous methods in terms of accuracy. The comparison
between FEM results and experimental data shows a good agreement, validating the FEM model used to
develop the ANN. The study demonstrates that the ANN model can predict tensile strength without the need
for additional tensile tests or FEM analysis. The research paper presents an innovative application of artificial
neural networks for predicting the tensile strength of corroded steel plates. The proposed ANN model offers a
simple, accurate, and cost-effective method to assess the residual tensile strength of corroded steel structures.
The results demonstrate the superiority of the ANN approach compared to previous methods. Sami et al. [29]
investigated the prediction of the tensile and compressive strength of concrete using various machine
learning algorithms. The authors highlight the importance of concrete strength in determining the durability
and performance of structures and the challenge of optimizing the constituent proportions to achieve
high-strength concrete. The goal of the research is to develop accurate and efficient prediction models to
replace time-consuming and resource-intensive laboratory tests. To address this need, the authors employ
different machine learning algorithms, including tree regression models, regression models, ensemble
regression (ER), support vector regression (SVR), and Gaussian process regression (GPR), to predict the
tensile and compressive strength of concrete. The models are trained and tested using a dataset compiled from
journal publications. The results of the study demonstrate the effectiveness of the machine learning models in
predicting concrete strength. The exponential Gaussian process regression (GPR) model exhibits the highest
performance and accuracy among the models considered. It achieves an impressive R2 of 0.98, RMSE of
2.412 MPa, and MAE of 1.6249 MPa for predicting the compressive strength of concrete using eight input
variables during the training phase. In the testing phase, the model maintains its accuracy with an R2 of 0.99,
RMSE of 0.0025134 MPa, and MAE of 0.0016367 MPa. Similarly, the GPR model performs well in
predicting the tensile strength of concrete with an R2, RMSE, and MAE of 0.99, 0.00049247 MPa, and
0.00036929 MPa, respectively. Najjar et al. [30] presented a detailed investigation of the
mechanical properties and prediction modeling of aluminum nanocomposites. The research is significant as it
addresses the growing interest in metal matrix composites (MMCs), specifically aluminum metal matrix
composites (Al- MMCs), due to their exceptional properties and wide range of applications. In mechanical
properties, we observe a significant enhancement in the yield strength (YS), ultimate tensile strength (UTS),
and hardness of the composites compared to the unreinforced aluminum. The UTS and YS values show a
notable increase after the initial ARB cycles, and further improvements are achieved with subsequent cycles.
The highest UTS enhancement is achieved in the composite with 4% SiC, reaching 445% after 9 ARB cycles.
These findings highlight the effectiveness of the ARB technique and the addition of SiC particles in enhancing
the mechanical properties of Al-MMCs. The research paper provides a comprehensive study on the
fabrication and characterization of aluminum nanocomposites reinforced with µ-SiC particles using the
accumulative roll bonding (ARB) technique. The experimental results highlight the improvement in the
mechanical properties of the composites with increasing ARB cycles and SiC content. Additionally, the
proposed machine learning model based on the modified random vector functional link using the Growth
Optimizer Algorithm shows promising potential for accurately predicting the tensile properties of the
composites. The findings presented in this paper contribute to the field of metal matrix composites and offer
valuable insights for researchers and practitioners in materials science and engineering
Stoll and Benner [31] addressed the growing need for data-driven methods in materials science due to the
expanding amount of material data from experiments and simulations. The paper explores the
potential of machine learning (ML) techniques in predicting material properties and facilitating
material characterization. The paper also discusses the evolution of materials science paradigms, from
experimental investigations to analytical equations, computational simulations, and data-driven
science. It emphasizes the value of large data volumes in discovering hidden correlations and patterns
that may not be apparent in smaller datasets. However, the authors acknowledge the challenges
posed by handling large amounts of data and the limitations of small datasets commonly encountered
in materials science due to expensive and time-consuming data acquisition processes. The authors
examine different ML approaches based on SPT data and discuss a case study involving the prediction
of tensile properties using ML models trained on SPT data. The goal is to determine whether a ML
model can accurately predict tensile properties based on SPT measurements. Shaheen et al. [32]
developed a novel approach to predict the strength and stiffness reduction factors for high strength steel (HSS) at elevated
temperatures using machine learning techniques, considering the effect of material chemical
composition. The authors highlight that no prior studies are available in the open literature regarding
the prediction of HSS mechanical properties at elevated temperatures by machine learning, making
their research contribution unique. The development of the artificial neural network (ANN) is
explained, highlighting the use of a multilayer perception model with feed-forward back-propagation
for supervised learning. The three-layer structure of the ANN, including the input layer, hidden layers,
and output layer, is described. The authors provide equations defining the elevated temperature
reduction factors for ultimate tensile strength, effective yield strength, 0.2% proof strength, and
Young's modulus, which were adopted throughout the analysis in the paper. The authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work reported in the paper.
CHAPTER 3 METHODOLOGY
3.1 DATA COLLECTION
The data for this research project was collected from multiple sources, including research papers, websites, and handbooks. These diverse sources were utilized to ensure a comprehensive and well-rounded dataset for analysis.
Websites were another important source of data, offering a wide range of resources such as
industry reports, technical specifications, and case studies. Care was taken to select
authoritative websites from reputable organizations, academic institutions, and government
agencies, to ensure that the gathered data was accurate and relevant.
Handbooks, reference materials, and technical manuals also played a crucial role in data
collection. These resources provided essential background information, industry standards,
and practical guidelines related to the subject matter. The handbooks were carefully chosen
based on their relevance and reputation within the field.
Overall, the data collection process involved meticulous gathering, organization, and
verification of information from diverse sources. By utilizing a combination of research
papers, websites, and handbooks, we aimed to ensure the completeness and reliability of
the dataset, allowing for a robust analysis and meaningful conclusions to be drawn from
the research findings.
3.2 DATA ANALYSIS
During the analysis of the collected data, various exploratory data analysis techniques were employed to gain insights into the dataset. Descriptive analysis was conducted to summarize and understand the distribution, central tendency, and variability of the different variables.
During the analysis, it was observed that some features exhibited a substantial number of
outliers, indicating potential anomalies or errors in the data. These outliers were carefully
examined to determine their validity and potential impact on the analysis results. Similarly,
certain features were found to have minimal variation, suggesting a lack of diversity or
limited usefulness in the analysis. These features were noted and their implications were
taken into account during subsequent modeling and interpretation stages.
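As an illustrative sketch of the exploratory checks described above (the file name and column handling are assumptions, not the project's actual code), descriptive statistics and an IQR-based outlier count can be obtained with pandas:

import pandas as pd

data = pd.read_csv("api_steels.csv")      # assumed file name for the collected dataset
num = data.select_dtypes("number")        # keep only numeric columns

print(num.describe())                     # central tendency, spread and quartiles per feature

# Count outliers per feature using the 1.5*IQR rule
q1, q3 = num.quantile(0.25), num.quantile(0.75)
iqr = q3 - q1
outliers = ((num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)).sum()
print(outliers)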
3.3 FEATURE SELECTION
Feature selection is a process in machine learning where we identify and select the most important features (variables) from a dataset that significantly contribute to the performance of a predictive model. This step is crucial in simplifying the model, improving accuracy, and reducing overfitting by removing irrelevant or redundant data.
In this project, feature selection involved analyzing the dataset of API steels to identify which chemical compositions or material properties most strongly influenced yield and tensile strength. The process included:
1. Eliminating Redundant Features: Variables, such as physical properties, that showed little to no variation were removed because they did not provide new information to the model.
2. Handling Outliers: Features such as chromium (Cr) and nickel (Ni) contained significant outliers. These were removed for the linear regression models to improve robustness but retained for the non-linear models, which handle outliers better.
3. Retaining Informative Features: Key features (e.g., hardness, carbon content) that strongly impacted the mechanical properties were retained for model training.
By performing feature selection, the project ensured that only the most relevant data
contributed to the predictive models, enhancing their accuracy and efficiency.
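A minimal sketch of how such a selection step could be carried out is shown below; it assumes a pandas DataFrame with composition columns and a yield-strength column named "YS" (placeholder names, not the actual dataset schema):

import pandas as pd

data = pd.read_csv("api_steels.csv")                 # assumed file name
num = data.select_dtypes("number")

# Rank candidate features by absolute Pearson correlation with the target
corr = num.corr()["YS"].drop("YS").abs().sort_values(ascending=False)
print(corr)

# Flag near-constant features that add little information to the model
near_constant = [col for col in num.columns if num[col].std() < 1e-6]
print(near_constant)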
3.4 MODEL TRAINING
Model training is the process of teaching a machine learning algorithm to recognize patterns and relationships in data by using a labeled or unlabeled dataset. During training, the algorithm learns from the input data to optimize its parameters and improve its predictive capabilities for new, unseen data.
Various machine learning algorithms were employed in this study to predict the yield strength of
API steels. These algorithms included multiple regression, Lasso regression, Ridge regression,
decision tree, and random forest. Each algorithm has its unique approach and characteristics that
contribute to the predictive modeling process.
The trained models, particularly random forest and XGBoost, provided high accuracy in predicting the mechanical properties of API steels.
3.4.1 MULTIPLE REGRESSION
Multiple regression is a linear regression technique that aims to establish a relationship between the dependent variable (yield strength and tensile strength) and multiple independent variables (features). It presupposes a linear relationship and estimates the coefficients of the predictor variables to predict the target variable [35]. Fig. 3.4 shows the implementation of ridge regression in Python.
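Since the referenced figure is not reproduced here, the sketch below illustrates how multiple linear regression and ridge regression might be fitted with scikit-learn; the file name, feature columns, and hyperparameter values are assumptions rather than the project's actual settings:

import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

data = pd.read_csv("api_steels.csv")          # assumed file name
X = data[["C", "Mn", "Si", "Cr", "Ni"]]       # placeholder composition features
y = data["YS"]                                # yield strength target (placeholder column)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ordinary multiple linear regression
linreg = LinearRegression().fit(X_train, y_train)

# Ridge regression: the same linear model with an L2 penalty (alpha) on the coefficients
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

print(linreg.score(X_test, y_test), ridge.score(X_test, y_test))   # R2 on the test split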
Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a type of linear
regression that incorporates L1 regularization. This technique not only minimizes the
residual sum of squares (RSS) to fit the model but also adds a penalty proportional to the
absolute values of the regression coefficients.
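In terms of the symbols listed earlier (coefficients β and penalty weight λ), the Lasso objective for n samples and p features can be written as:

\[
\min_{\beta}\;\sum_{i=1}^{n}\Bigl(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2}+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
\]

The first term is the residual sum of squares (RSS) and the second is the L1 penalty; larger values of λ shrink more coefficients exactly to zero, which gives Lasso a built-in form of feature selection.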
3.4.2 DECISION TREE
Decision trees are non-linear models that utilize a hierarchical structure of decision rules to make predictions. They partition the feature space into smaller regions and assign a prediction value to each region. Decision trees can capture complex relationships and interactions among the features [38].
A common approach for creating decision trees is CART (Classification and Regression Trees).
The CART algorithm splits the data at each node into two subsets along a single feature and threshold. The feature and splitting threshold are selected to maximize the gain in homogeneity, that is, the decrease in impurity of the resulting subsets. A measure of the output values' variability is used to
determine the impurity of a subset. Entropy and Gini impurity are frequently used impurity
measures for classification tasks, whereas mean squared error (MSE) is frequently used for
regression assignments [39]. The cost function that the CART algorithm minimizes in order to
determine the optimal split of a given node is shown below (in a form commonly used for regression trees):
\[
J(k, t_k) = \frac{n_{\text{left}}}{n}\,\mathrm{MSE}_{\text{left}} + \frac{n_{\text{right}}}{n}\,\mathrm{MSE}_{\text{right}}
\]
where
n = records in the dataset (at the node being split)
n_left, n_right = records in the left and right subsets produced by splitting on feature k at threshold t_k
MSE_left, MSE_right = mean squared error of the target values in each subset
The initial root node of the CART algorithm represents the entire dataset. The method selects the feature and
the splitting threshold at each level of the tree in order to reduce impurity or improve the homogeneity of the
generated subgroups. Up until it meets a stopping requirement, such as a minimum number of samples in each
leaf node or a maximum tree depth, the algorithm divides the data again into smaller subsets [40].
Once the tree is built, it is possible to make predictions based on new data by moving through the tree from
root to leaf nodes. Based on the value of the feature in the input data, the algorithm applies the decision rule
corresponding to the splitting feature at each node and then proceeds down the relevant branch. The prediction
is then made using the leaf node's associated output value.
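As a hedged sketch (not the project's actual code), a CART regression tree with the stopping criteria described above can be fitted and used for prediction with scikit-learn as follows; the file name, target column, and hyperparameters are illustrative assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

data = pd.read_csv("api_steels.csv")                        # assumed file name
X, y = data.drop(columns=["YS"]), data["YS"]                # "YS" = yield strength (placeholder)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# max_depth and min_samples_leaf act as the stopping criteria described above
tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=5, random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)   # each sample is routed from the root to a leaf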
3.4.3 RANDOM FOREST
Random forest is a technique in ensemble learning that utilizes the power of multiple decision
trees to formulate predictions. It leverages the principle of "wisdom of the crowd" by aggregating
the estimations of individual trees. Random forest can improve the predictive accuracy and handle
non-linear relationships effectively [41].
In Random Forest regression, individual trees are constructed by utilizing a randomized subset of
the training data along with a randomized subset of the input features [42]. Enhancing the model's
performance is achieved by reducing the correlation among the trees. Averaging the forecasts of
each individual tree yields the prediction of the Random Forest model [43]. In comparison to other
regression models, Random Forest has a number of advantages, such as the ability to handle large
datasets with numerous input variables, the ability to recognize and manage non-linear
relationships between input features and the target feature, and the ability to deal with outliers and
missing data [44]. On the other hand, due to the number of decision trees generated, it is not readily interpretable [45] and requires additional training time and memory [46].
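A minimal scikit-learn sketch of such a random forest regressor is given below; as before, the dataset name, target column, and hyperparameters are assumptions for illustration only:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

data = pd.read_csv("api_steels.csv")                        # assumed file name
X, y = data.drop(columns=["YS"]), data["YS"]                # placeholder target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each tree sees a bootstrap sample of the rows and a random subset of features,
# which decorrelates the trees; the forest prediction is the average of the trees.
rf = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)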
3.4.4 EXTREME GRADIENT BOOSTING (XGBOOST)
To create predictions about the target variable in XGBoost regression, the method first constructs
a single decision tree. The residual difference between the estimated values and the real values is
then calculated, and this error serves as the target variable for the subsequent tree. Until a
predetermined number of trees have been produced or the error has been minimized, the
algorithm repeats this procedure, creating one tree at a time and adding it to the ensemble.
Fig. 3.12 shows the implementation of XGBoost in Python.
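Because the referenced figure is not reproduced here, the sketch below shows how such a model might be set up with the xgboost library's scikit-learn interface; the hyperparameter values are illustrative assumptions, not the tuned values used in the project:

import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

data = pd.read_csv("api_steels.csv")                        # assumed file name
X, y = data.drop(columns=["YS"]), data["YS"]                # placeholder target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Trees are added one at a time, each fitted to the residual error of the current ensemble;
# n_estimators caps the number of trees and reg_lambda controls the built-in regularization.
xgb = XGBRegressor(n_estimators=300, learning_rate=0.1, max_depth=4, reg_lambda=1.0, random_state=42)
xgb.fit(X_train, y_train)
y_pred = xgb.predict(X_test)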
The key benefits of XGBoost are its performance, speed, and scalability. It is built to effectively
handle massive datasets with millions of rows and thousands of columns. XGBoost can additionally tolerate missing values, and its built-in regularization helps to lessen overfitting and improve generalization.
But XGBoost also has significant shortcomings. When working with huge datasets, it can be
computationally expensive and demands precise hyperparameter adjustment. Additionally,
XGBoost is a "black-box" model, which makes it challenging to identify and understand the
fundamental connections between the independent variables and the predicted variable.
By employing these diverse algorithms, the study aimed to explore and compare their
performance in predicting the yield strength of API steels. Each algorithm brings its own
strengths and characteristics to the analysis, providing a comprehensive understanding of their
effectiveness and suitability for the given dataset.
3.5 MODEL EVALUATION
The trained models were evaluated on the test samples using the following performance metrics.
Mean Absolute Error (MAE): MAE indicates, on average, how far the predicted values are from the actual values, without taking their direction into consideration. It is calculated by taking the absolute difference between the real and estimated values and then averaging them. The MAE shares the same units as the feature being estimated. The lower the value of MAE, the better the model's performance, since it means that the model's estimations are closer to the real values [48].
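For n test samples with actual values y_i and predicted values ŷ_i, MAE is computed as:

\[
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|
\]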
Mean Absolute Percentage Error (MAPE): MAPE is a metric that quantifies the average
percentage difference between the estimated and true values. To calculate the MAPE, the absolute
percentage difference between the estimated and true values is obtained, and the resulting values
are averaged. MAPE is expressed as a percentage. The model's performance improves as the
value of MAPE decreases, since it shows that the model's predictions exhibit a higher level of
accuracy when compared to the actual values in terms of percentages [49].
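Using the same notation, MAPE is computed as:

\[
\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|
\]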
R-squared (R2) Score: The R2 score evaluates how much of the variability in the target variable
can be accounted for by the independent variables incorporated in the model, ranging between 0
and 1. A higher R2 value indicates a better model’s performance. The calculation of R2 involves
subtracting the ratio of the residual sum of squares to the total sum of squares from 1. An R2
score of 1 denotes that the model fits the data perfectly, whereas a score of 0 indicates that the
model fails to explain any variation in the target variable [50].
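With ȳ denoting the mean of the actual values, the R2 score is computed as:

\[
R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}
\]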
It is important to choose the appropriate performance metrics on the basis of the specific problem and
the type of data being analyzed. These metrics help researchers and practitioners understand the
strengths and weaknesses of the models, compare different algorithms, and make informed
decisions about model selection and improvement. By quantifying the model's performance,
performance metrics offer valuable insights into the effectiveness and reliability of ML models.
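All three metrics are available in scikit-learn; the snippet below shows how they could be computed for any of the fitted models sketched in Section 3.4 (y_test and y_pred are the held-out targets and model predictions from those sketches):

from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, r2_score

mae = mean_absolute_error(y_test, y_pred)
mape = mean_absolute_percentage_error(y_test, y_pred)   # returned as a fraction, not a percentage
r2 = r2_score(y_test, y_pred)
print(f"MAE = {mae:.2f}, MAPE = {mape * 100:.2f}%, R2 = {r2:.3f}")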
CHAPTER 4 RESULTS AND DISCUSSION
The Results and Discussion section presents the findings and analysis of the predictive models
developed for the mechanical properties of API steels. The objective of this study was to explore
the effectiveness of various machine learning algorithms in predicting the yield strength and
tensile strength of API steels. In this section, we showcase the outcomes of the trained models and
discuss their performance in terms of accuracy, reliability, and interpretability. Furthermore, we
analyze the importance of different features in predicting the mechanical properties and offer a
deeper understanding of the fundamental connections among the composition and properties of
API steels. The results obtained from this study contribute to the understanding and prediction of
the mechanical behavior of API steels, enabling better decision-making and optimization in the
selection and utilization of these materials in various applications.
The feature importance analysis provides valuable insights into the contribution of each input
variable in predicting the mechanical properties of API steels. Feature importance represents the
relative influence or significance of each feature in the predictive model. It helps us understand
which variables have the most impact on the outcome variable, such as the yield strength of API
steels. The feature importance is calculated based on the model's internal mechanism, such as the
weights assigned to features in linear regression or the split points in decision trees. By examining
the feature importance values, we can identify the key factors that affect the yield strength of API
steels and prioritize them for further investigation or optimization. Additionally, feature
importance analysis aids in the interpretability of the model by highlighting the most influential
features, enabling researchers and engineers to acquire a more profound comprehension of the
underlying relationships between the chemical composition and mechanical properties of API
steels.
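As an illustration of how these importance values can be extracted (reusing the hypothetical dataset and target column from Chapter 3, which are assumptions rather than the actual schema), a tree-based model in scikit-learn exposes them directly:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

data = pd.read_csv("api_steels.csv")                        # assumed file name
X, y = data.drop(columns=["YS"]), data["YS"]                # placeholder target column

rf = RandomForestRegressor(n_estimators=200, random_state=42).fit(X, y)

# Relative influence of each feature on the predicted yield strength
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)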
CHAPTER 5 CONCLUSIONS
In this chapter, we summarize the different algorithms used to predict the yield strength and tensile strength of API steels and identify which of these models proved the most effective.
REFERENCES
[2] G. Ananta Nagu and T. K. G Namboodhiri, ‘Effect of heat treatments on the hydrogen embrittlement
susceptibility of API X-65 grade line-pipe steel’, Bull. Mater. Sci, vol. 26, no. 4, pp. 435–439, 2003.
[3] A. K. Das, ‘The present and the future of line pipe steels for petroleum industry’, Materials and Manufacturing Processes, vol. 25, no. 1–3, pp. 14–19, Jan. 2010, doi: 10.1080/10426910903202427.
[5] W.E. White and G. I. Ogundele, ‘Influences of Dissolved Hydrocarbon Gases and Variable Water Chemistries on
Corrosion of an API-L80 Steel’, CORROSION, vol. 43, no. 11, pp. 665–673, 1987.
[6] R. Elgaddafi, R. Ahmed, and S. Shah, ‘Modeling and experimental studies on CO2-H2S corrosion of API carbon
steels under high-pressure’, J Pet Sci Eng, vol. 156, pp. 682–696, 2017, doi: 10.1016/j.petrol.2017.06.030.
[8] F. O. Kolawole, S. K. Kolawole, J. O. Agunsoye, J. A. Adebisi, S. A. Bello, and S. B. Hassan, ‘Mitigation of Corrosion Problems in API 5L Steel Pipeline-A Review’, J. Mater. Environ. Sci, vol. 9, no. 8, pp. 2397–2410, 2018, [Online]. Available: http://www.jmaterenvironsci.com
[9] M. Ziomek-Moroz, ‘Environmentally assisted cracking of drill pipes in deep drilling oil and natural gas wells’, J Mater Eng Perform, vol. 21, no. 6, pp. 1061–1069, Jun. 2012, doi: 10.1007/s11665-011-9956-6.
[10] P. D. Thomas, ‘Steels for Oilwell Casing and Tubing - Past, Present and Future’, Journal of Petroleum Technology, pp. 495–500, May 1963, [Online]. Available: http://onepetro.org/JPT/article-pdf/15/05/495/2213849/spe-527-pa.pdf
[11] R. L. Amaro, E. S. Drexler, and A. J. Slifka, ‘Fatigue crack growth modeling of pipeline steels in high pressure gaseous hydrogen’, Int J Fatigue, vol. 62, pp. 249–257, 2014, doi: 10.1016/j.ijfatigue.2013.10.013.
[12] ‘Overview of API 660 - Shell-and-Tube Heat Exchangers’. https://inspectioneering.com/tag/api+660 (accessed May 21, 2023).
[13] C. Subramanian, ‘Localized pitting corrosion of API 5L grade A pipe used in industrial fire water piping applications’, Eng Fail Anal, vol. 92, pp. 405–417, Oct. 2018, doi: 10.1016/j.engfailanal.2018.06.008.
[15] K. Noda, Y. Yamaguchi, K. Nakadai, H. G. Okuno, and T. Ogata, ‘Audio-visual speech recognition using deep
learning’, Applied Intelligence, vol. 42, no. 4, pp. 722–737, Jun. 2015, doi: 10.1007/s10489-014-0629-7.
[16] P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, ‘Natural language processing: An introduction’, Journal of
the American Medical Informatics Association, vol. 18, no. 5. pp. 544–551, Sep. 2011. doi: 10.1136/amiajnl-2011-000464.
[17] J. Perols, ‘Financial statement fraud detection: An analysis of statistical and machine learning algorithms’, Auditing:
A Journal of Practice & Theory, vol. 30, no. 2, pp. 19–50, May 2011, doi: 10.2308/ajpt-50009.
[18] C. Gomes, Z. Jin, and H. Yang, ‘Insurance fraud detection with unsupervised deep learning’,
Journal of Risk and Insurance, vol. 88, no. 3, pp. 591–624, Sep. 2021, doi: 10.1111/jori.12359.
[19] J. Nanduri, Y. Jia, A. Oka, J. Beaver, and Y. W. Liu, ‘Microsoft uses machine learning and optimization to reduce
e-commerce fraud’, INFORMS Journal on Applied Analytics, vol. 50, no. 1, pp. 64–79, Jan. 2020, doi:
10.1287/inte.2019.1017.
[20] F. O. Isinkaye, Y. O. Folajimi, and B. A. Ojokoh, ‘Recommendation systems: Principles, methods and evaluation’,
Egyptian Informatics Journal, vol. 16, no. 3. Elsevier B.V., pp. 261–273, Nov. 01, 2015. doi: 10.1016/j.eij.2015.06.005.
[21] A. Rajkomar, J. Dean, and I. Kohane, ‘Machine Learning in Medicine’, New England Journal of Medicine, vol. 380,
no. 14, pp. 1347–1358, Apr. 2019, doi: 10.1056/nejmra1814259.
[22] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, ‘Statistical and Machine Learning forecasting methods: Concerns
and ways forward’, PLoS One, vol. 13, no. 3, Mar. 2018, doi: 10.1371/journal.pone.0194889.
[23] S. Y. Choi and D. Cha, ‘Unmanned aerial vehicles using machine learning for autonomous flight; state-of-the-art’,
Advanced Robotics, vol. 33, no. 6, pp. 265–277, Mar. 2019, doi: 10.1080/01691864.2019.1586760.
[24] U. Bhandari, M. R. Rafi, C. Zhang, and S. Yang, ‘Yield strength prediction of high-entropy alloys using machine learning’, Mater Today Commun, vol. 26, Mar. 2021, doi: 10.1016/j.mtcomm.2020.101871.
[25] A. Choudhury, ‘Prediction and Analysis of Mechanical Properties of Low Carbon Steels Using Machine Learning’,
Journal of The Institution of Engineers (India): Series D, vol. 103, no. 1, pp. 303–310, Jun. 2022, doi:
10.1007/s40033-022-00328-y.
[26] M. Veeresham, R. Jain, U. Lee, and N. Park, ‘Machine learning approach for predicting yield strength of
nitrogen-doped CoCrFeMnNi high entropy alloys at selective thermomechanical processing conditions’, Journal of
Materials Research and Technology, vol. 24, pp. 2621–2628, May 2023, doi: 10.1016/j.jmrt.2023.03.146.
[27] X. Xu, L. Wang, G. Zhu, and X. Zeng, ‘Predicting Tensile Properties of AZ31 Magnesium Alloys by Machine Learning’, JOM, vol. 72, no. 11, pp. 3935–3942, Nov. 2020, doi: 10.1007/s11837-020-04343-w.
[28] C. N. N. Karina, P.-J. Chun, and K. Okubo, ‘Tensile strength prediction of corroded steel plates by using machine learning approach’, Steel and Composite Structures, vol. 24, no. 5, pp. 635–641, Aug. 2017, doi: 10.12989/scs.2017.24.5.635.
[29] B. H. Ziyad Sami et al., ‘Feasibility analysis for predicting the compressive and tensile strength of concrete using
machine learning algorithms’, Case Studies in Construction Materials, vol. 18, Jul. 2023, doi: 10.1016/j.cscm.2023.e01893.
[30] I. M. R. Najjar, A. M. Sadoun, M. A. Elaziz, H. Ahmadian, A. Fathy, and A. M. Kabeel, ‘Prediction of the tensile
properties of ultrafine grained Al–SiC nanocomposites using machine learning’, Journal of Materials Research and
Technology, vol. 24, pp. 7666–7682, May 2023, doi: 10.1016/j.jmrt.2023.05.035.
[31] A. Stoll and P. Benner, ‘Machine learning for material characterization with an application for predicting mechanical properties’, GAMM Mitteilungen, vol. 44, no. 1, Mar. 2021, doi: 10.1002/gamm.202100003.
[32] M. A. Shaheen, R. Presswood, and S. Afshan, ‘Application of Machine Learning to predict the mechanical
properties of high strength steel at elevated temperatures based on the chemical composition’, Structures, vol. 52, pp.
17–29, Jun. 2023, doi: 10.1016/j.istruc.2023.03.085.
[34] S.-W. Choi, ‘The Effect of Outliers on Regression Analysis: Regime Type and Foreign Direct Investment’, Quart J Polit
Sci, vol. 4, pp. 153–165, 2009, doi: 10.1561/100.00008021_supp.
[35] M. M. Wagner, A. W. Moore, and R. M. Aryel, ‘Combining Multiple Signals for Biosurveillance’, in Handbook of
Biosurveillance, Academic Press, 2006, pp. 235–242.
[36] Induraj, ‘How to derive B0 and B1 in Linear Regression- Part2’, 2020. https://induraj2020.medium.com/how-to-derive-b0-and-b1-in-linear-regression-4d4806b231fb (accessed Apr. 15, 2023).