
Predicting Heart Disease in Patients Using Bat Features Selection and Back Propagation Algorithm

Predicting model for heart disease

Uploaded by

michael samuel

CHAPTER ONE

INTRODUCTION

1.1 BACKGROUND OF THE STUDY

The heart is one of the most vital organs of the body, and our life depends on its working. If the heart fails to work, other organs, including the brain and kidneys, are affected. Heart disease is the term for conditions in which the heart does not function properly. Several factors increase the risk of heart disease, including high cholesterol, high blood pressure, lack of physical exercise, smoking, and obesity. The World Health Organization (WHO) has estimated that by 2030 nearly 23.6 million people will die because of heart disease. To minimize this risk, heart disease must be predicted early, based on the patient's symptoms, physical check-up, and bodily signs. Discovering and predicting disease is a tedious task in the medical environment: it is a multilayered problem that can lead to false presumptions and unpredictable effects.

The healthcare industry therefore maintains a huge amount of complex data about patients, hospital resources, disease diagnoses, electronic patient records, equipment, and so on. This data has become a knowledge source for data extraction that can help avoid false presumptions and unpredictable effects. Combined with a doctor's diagnosis, such data can support timely and intelligent prediction.

Neural networks (NNs) have been widely used in the medical field for forecasting disease, and they have demonstrated their potential in many domains related to medical forecasting and diagnosis. NNs can never replace human experts, but they can help them in decision making, classification, screening, and cross-verifying their diagnoses. The dataset has several attributes, such as age, sex, blood pressure, and blood sugar, which are used to predict the risk of a patient developing heart disease.
1.2 STATEMENT OF THE PROBLEM

Traditional prediction methods have limitations in predicting heart disease in patients. These limitations arise from the highly uncertain and variable condition of patients, as well as the lack of historical background on each patient's health status. Traditional methods are therefore poorly suited to determining whether a patient has heart disease, because they cannot handle these challenges. In addition, without historical data to serve as an indicator for future predictions, it is difficult for traditional prediction systems to generate accurate results. It is therefore crucial to develop different prediction methods that can analyze patient data and predict heart disease accurately.

The clinical symptoms of heart disease complicate the prognosis, because it is influenced by many factors, such as functional and pathological appearance. This can delay the prognosis of the disease. Hence, newer approaches are needed to improve prediction accuracy within a short time span. Disease prognosis from numerous factors or symptoms is a multilayered problem that could even lead to false assumptions. This study therefore attempts to bridge the knowledge and experience of experts by building a system that supports the diagnostic process.

In hospitals, there are provisions for the continuous monitoring of critical-care heart patients, whereas after discharge patients normally leave direct supervision. These patients need continuous monitoring of their health condition, for at least a week or so, to reduce the risk of unwanted complications.


1.3 AIM AND OBJECTIVES OF THE STUDY

Aim

The aim of this project is to develop software for predicting heart disease in patients using BAT feature selection and the back propagation neural network algorithm.

The objectives of the study are:

i. To develop software that predicts heart disease in patients.

ii. To design a new classifier combination to forecast the presence or absence of heart disease.

1.4 SIGNIFICANCE OF THE STUDY

A large volume of medical data is available in the medical industry and is a great source of useful and hidden facts for almost all medical problems. These facts, in turn, can help practitioners make accurate predictions. Artificial neural network techniques have contributed to high prediction accuracy on medical data.

1.5 SCOPE OF THE STUDY

The scope of this study is to develop a model that predicts heart disease in patients using BAT feature selection and the back propagation neural network algorithm. The software is developed for clinical use by health practitioners in predicting heart disease in patients.
CHAPTER TWO

LITERATURE REVIEW

2.1 Overview of the heart disease Prediction

The heart is one of the most vital organs of the body, and our life depends on its working. If the heart fails to work, other organs, including the brain and kidneys, are affected. Heart disease is the term for conditions in which the heart does not function properly. Several factors increase the risk of heart disease, including high cholesterol, high blood pressure, lack of physical exercise, smoking, and obesity. The World Health Organization (WHO) has estimated that by 2030 nearly 23.6 million people will die because of heart disease. To minimize this risk, heart disease must be predicted early, based on the patient's symptoms, physical check-up, and bodily signs. Discovering and predicting disease is a tedious task in the medical environment; it is a multilayered problem that can lead to false presumptions and unpredictable effects.

The healthcare industry therefore maintains a huge amount of complex data about patients, hospital resources, disease diagnoses, electronic patient records, equipment, and so on, and this data has become a knowledge source for data extraction that can help avoid false presumptions and unpredictable effects.

The growth of medical data sourced from patients' health records offers a great opportunity, as this data is basic material for improving patient health. Computers are now applied in many fields; in health, they can be used to improve decision-support systems in medicine (Garate & Hajjam, 2020). In particular, machine learning used as an analytical tool can find hidden patterns in the data (Hassan M, 2018). This development enables a high degree of prediction, which supports proper prevention.

Causes of Heart Disease

Heart disease, also known as cardiovascular disease, has several causes and risk factors. Some of the most significant risk factors are:

i. High blood pressure: uncontrolled high blood pressure can damage blood vessels and increase the risk of heart disease.

ii. High cholesterol: elevated levels of low-density lipoprotein (LDL) cholesterol can lead to plaque buildup in the arteries, increasing the risk of heart disease.

iii. Smoking: smoking damages blood vessels, increases blood pressure, and reduces the oxygen supply to the heart, making it a significant risk factor.

iv. Diabetes: high blood sugar levels can damage blood vessels and nerves, increasing the risk of heart disease.

v. Obesity: excess weight can lead to high blood pressure, high cholesterol, and diabetes, all of which increase the risk of heart disease.

vi. Poor diet: a diet high in saturated fats, sodium, and added sugars can increase the risk of heart disease.

vii. Lack of exercise: a sedentary lifestyle can contribute to obesity, high blood pressure, and high cholesterol.

viii. Age: heart disease risk increases with age, especially after 45 for men and 55 for women.
2.2 Machine Learning

Machine learning is the subset of artificial intelligence that focuses on building systems that learn, or improve their performance, based on the data they consume (Nasteski n.d.). It grew out of pattern recognition and the theory that computers can learn without being explicitly programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see whether computers could learn from data. The iterative aspect of machine learning is important because, as models are exposed to new data, they can adapt independently, learning from previous computations to produce reliable, repeatable decisions and results. The practice of machine learning involves taking data, examining it for patterns, and developing some form of prediction about future outcomes (Liu et al. 2022). By feeding an algorithm more data over time, data scientists can sharpen the model's predictions. Several different types of machine learning have developed from this basic concept.

Supervised learning

Gartner, a business consulting firm, predicted that supervised learning would remain the most widely used type of machine learning among enterprise information technology leaders through 2022. This type of machine learning feeds historical input and output data into machine learning algorithms, with processing between each input/output pair that allows the algorithm to adjust the model so that its outputs align as closely as possible with the desired result. Common algorithms used in supervised learning include neural networks, decision trees, linear regression, and support vector machines.


This type of machine learning got its name because the machine is "supervised" while it is learning, meaning that you feed the algorithm information to help it learn. The outcomes you provide the machine are the labels of the data, and the rest of the information you give is used as input features.
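To make the idea concrete, the sketch below "learns" from a handful of labeled examples and then predicts labels for new inputs. It uses a nearest-centroid classifier, chosen only because it is short; it is not the method used in this project. The feature values, labels, and function names are hypothetical.

```python
import numpy as np

# Hypothetical labeled training data: each row is [age, resting blood pressure];
# the label is 1 for heart disease and 0 for no heart disease.
X_train = np.array([[45, 120], [50, 125], [62, 160], [70, 170]], dtype=float)
y_train = np.array([0, 0, 1, 1])


def fit_nearest_centroid(X, y):
    """Learn one centroid (mean feature vector) per labeled class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(model, X):
    """Assign each new sample to the class whose centroid is closest."""
    classes, centroids = model
    # Distances from every sample to every centroid: shape (n_samples, n_classes).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


model = fit_nearest_centroid(X_train, y_train)
new_patients = np.array([[48, 122], [66, 165]], dtype=float)
print(predict(model, new_patients))  # [0 1]
```

Each centroid is simply the average of the labeled examples of one class, so what the model learns is fixed by the labels the user supplies, which is what makes the method supervised.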

Unsupervised learning

While supervised learning requires users to help the machine learn, unsupervised learning algorithms do not use labeled training data. Instead, the machine looks for less obvious patterns in the data. Unsupervised machine learning is very helpful when you need to identify patterns and use data to make decisions. Common algorithms used in unsupervised learning include hidden Markov models, k-means, hierarchical clustering, and Gaussian mixture models.

The unsupervised learning algorithm can be further categorized into two types of

problems:

i. Clustering: Clustering is a method of grouping objects into clusters such that

objects with the most similarities remain in a group and have fewer or no

similarities with the objects of another group (Benndorf et al. 2018). Cluster

analysis finds the commonalities between the data objects and categorizes them as

per the presence and absence of those commonalities.

ii. Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules can make marketing strategies more effective (Jiang et al. 2019).
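A minimal NumPy sketch of clustering with k-means, one of the unsupervised algorithms named above, follows. No labels are supplied: the algorithm alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The deterministic farthest-first initialization is an assumption made so that the toy output is reproducible (k-means++ is a common alternative), and the data points are hypothetical.

```python
import numpy as np


def init_farthest_first(X, k):
    """Deterministic start: the first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [X[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dist_to_nearest.argmax()])
    return np.array(centroids)


def kmeans(X, k, n_iter=100):
    centroids = init_farthest_first(X, k)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-feature points forming two obvious groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)  # [0 0 0 1 1 1]
```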


Reinforcement learning

Reinforcement learning is the type of machine learning closest to how humans learn. The algorithm, or agent, learns by interacting with its environment and receiving a positive or negative reward. Common algorithms include temporal difference learning, deep adversarial networks, and Q-learning. For example, in a bank loan setting, you might use a reinforcement learning algorithm to look at customer information. If the algorithm classifies a customer as high-risk and the customer defaults, the algorithm gets a positive reward; if the customer does not default, it gets a negative reward. In the end, both outcomes help the machine learn by understanding both the problem and the environment better. Gartner notes that most ML platforms do not have reinforcement learning capabilities, because it requires more computing power than most organizations have. Reinforcement learning is applicable in areas that can be fully simulated and that are either stationary or have large volumes of relevant data. Because this type of machine learning requires less management than supervised learning, it is viewed as easier to work with when dealing with unlabeled data sets.
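The loan example can be sketched as a one-step, bandit-style simplification of tabular Q-learning: each decision ends the episode, so the value update has no next-state term. The default rates, the reward for the "low-risk" decision (taken here as the mirror image of the high-risk rule), the learning rate, and all names are assumptions for illustration.

```python
import random

# Hypothetical default rates for two customer profiles.
DEFAULT_RATE = {"risky": 0.9, "safe": 0.1}
ACTIONS = ("high_risk", "low_risk")


def reward(action, defaulted):
    # From the text: flagging high-risk earns +1 if the customer defaults, -1 if not.
    # Assumption: the low-risk label is rewarded the opposite way.
    if action == "high_risk":
        return 1.0 if defaulted else -1.0
    return -1.0 if defaulted else 1.0


def train(steps=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in DEFAULT_RATE for a in ACTIONS}
    for _ in range(steps):
        state = rng.choice(list(DEFAULT_RATE))
        action = rng.choice(ACTIONS)  # explore every action uniformly
        defaulted = rng.random() < DEFAULT_RATE[state]
        r = reward(action, defaulted)
        # Each decision ends the episode, so the update target is just the reward.
        q[(state, action)] += alpha * (r - q[(state, action)])
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in DEFAULT_RATE}
print(policy)  # {'risky': 'high_risk', 'safe': 'low_risk'}
```

The agent is never told which label is correct; it only sees rewards, and the learned action values still lead it to flag the risky profile.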

2.3 Feature Extraction

Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features). The new, reduced set of features should summarize most of the information contained in the original set. In this way, a summarized version of the original features can be created from a combination of the original set (Gemescu et al. 2019). Feature extraction is useful when you need to reduce the resources needed for processing without losing important or relevant information. It can also reduce the amount of redundant data for a given analysis. In addition, reducing the data and the machine's effort in building variable combinations (features) speeds up the learning and generalization steps of the machine learning process.
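Principal Component Analysis (PCA) is one widely used feature extraction technique that matches this description: each new feature is a linear combination of the original ones. The NumPy sketch below assumes synthetic data in which three measurements are driven by one underlying factor, so a single extracted feature should retain almost all of the variance.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its leading principal components via SVD."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # The new features are the projections of each sample onto those directions.
    X_reduced = X_centered @ components.T
    explained_ratio = S**2 / np.sum(S**2)
    return X_reduced, explained_ratio[:n_components]


rng = np.random.default_rng(0)
factor = rng.normal(size=(100, 1))
# Three hypothetical measurements that all depend on one underlying factor.
X = np.hstack([factor, 2 * factor, -factor]) + 0.01 * rng.normal(size=(100, 3))
X_reduced, ratio = pca(X, n_components=1)
print(X_reduced.shape, ratio)
```

Three correlated columns become one new feature, which is the resource saving the paragraph describes.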

Practical uses of feature extraction

Data science and machine learning: dimensionality reduction for visualization and model performance improvement, feature engineering for model training and prediction, and data preprocessing and normalization.

Natural language processing (NLP): text classification and sentiment analysis, topic modeling and information retrieval, and language translation and language modeling.

2.5 Back Propagation Neural Network

In machine learning, back propagation is a gradient estimation method used to train

neural network models. The gradient estimate is used by the optimization algorithm to

compute the network parameter updates. It is an efficient application of the chain rule to

neural networks.

Back propagation computes the gradient of a loss function with respect to the weights of

the network for a single input–output example, and does so efficiently, computing the

gradient one layer at a time, iterating backward from the last layer to avoid redundant

calculations of intermediate terms in the chain rule; this can be derived through dynamic

programming. Gradient descent, or a variant such as stochastic gradient descent, is commonly used as the optimization algorithm that applies these gradients to update the weights.
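The sketch below assumes a tiny network with one sigmoid hidden layer, a sigmoid output, and a squared-error loss (choices made for illustration). It shows back propagation computing gradients layer by layer from the output backward, reusing the output-layer term `delta2` when computing the hidden-layer gradient. A central finite-difference check confirms one gradient entry, and a single gradient-descent step then uses the gradients to update the parameters.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(W1, b1, W2, b2, x, t):
    """Squared-error loss for one example and its gradients via back propagation."""
    # Forward pass: keep intermediate activations for the backward pass.
    h = sigmoid(W1 @ x + b1)                # hidden layer
    y = sigmoid(W2 @ h + b2)                # output layer
    loss = 0.5 * np.sum((y - t) ** 2)
    # Backward pass: chain rule from the last layer to the first.
    delta2 = (y - t) * y * (1 - y)          # dLoss/d(output pre-activation)
    dW2, db2 = np.outer(delta2, h), delta2
    delta1 = (W2.T @ delta2) * h * (1 - h)  # reuses delta2 instead of recomputing it
    dW1, db1 = np.outer(delta1, x), delta1
    return loss, (dW1, db1, dW2, db2)


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
x, t = rng.normal(size=3), np.array([1.0])
loss, (dW1, db1, dW2, db2) = loss_and_grads(W1, b1, W2, b2, x, t)

# Sanity check: compare one back-propagated gradient with a finite difference.
eps = 1e-6
W1_up, W1_down = W1.copy(), W1.copy()
W1_up[0, 0] += eps
W1_down[0, 0] -= eps
numeric = (loss_and_grads(W1_up, b1, W2, b2, x, t)[0]
           - loss_and_grads(W1_down, b1, W2, b2, x, t)[0]) / (2 * eps)
print(abs(numeric - dW1[0, 0]) < 1e-6)  # True

# One plain gradient-descent step with learning rate 0.1.
lr = 0.1
new_loss = loss_and_grads(W1 - lr * dW1, b1 - lr * db1,
                          W2 - lr * dW2, b2 - lr * db2, x, t)[0]
print(loss, new_loss)
```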
2.6 Review of Related Work

Heart Disease Prediction Using Random Forest Algorithm

The random forest algorithm provides a flexibility and robustness for classification tasks on tabular data that few other standard models can match. Given its simplicity and versatility, the random forest classifier is widely used for fraud detection, loan risk prediction, and heart disease prediction. Using ensemble learning, the random forest classifier combines the results of several decision trees, each trained on a different random subset of the data and features, to increase predictive accuracy. The first step is to build several decision trees on these different subsets; the trees are then used for prediction, and their results are combined (by majority vote for classification) to yield the final output prediction.
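As a compact illustration of the ensemble idea, assuming a simplified forest whose trees are depth-one decision stumps rather than full trees, the NumPy sketch below trains each tree on a bootstrap sample with a random subset of features and combines the trees by majority vote. The data and function names are hypothetical; a real system would normally rely on a library implementation.

```python
import numpy as np


def fit_stump(X, y, features):
    """Depth-one tree: choose the (feature, threshold) split with the fewest errors."""
    best = None
    for f in features:
        for thr in np.unique(X[:, f]):
            left = X[:, f] <= thr
            # Each side predicts its majority label (0 if the side is empty).
            left_label = int(y[left].mean() > 0.5) if left.any() else 0
            right_label = int(y[~left].mean() > 0.5) if (~left).any() else 0
            errors = np.sum(np.where(left, left_label, right_label) != y)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_label, right_label)
    return best[1:]


def predict_stump(stump, X):
    f, thr, left_label, right_label = stump
    return np.where(X[:, f] <= thr, left_label, right_label)


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))                        # bootstrap sample
        feats = rng.choice(X.shape[1], size=max_features, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    votes = np.array([predict_stump(s, X) for s in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote across trees


# Hypothetical scaled features [feature_a, feature_b]; label 1 = heart disease.
X = np.array([[1, 2], [2, 1], [1.5, 1.5], [2, 2],
              [6, 7], [7, 6], [6.5, 6.5], [7, 7]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
forest = fit_forest(X, y)
print(predict_forest(forest, np.array([[1.0, 1.0], [7.0, 7.0]])))  # [0 1]
```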

Heart Disease Prediction Using K-Nearest Neighbors

As the name says, a k-nearest neighbors (KNN) classifier takes a data point and finds the k data points nearest to it in the vector space. In a supervised fashion, KNN stores the labeled training samples; whenever a new sample needs to be classified, it uses a distance metric to find the k nearest training samples and assigns the class held by the majority of them. For heart disease detection there are only two classes (disease or no disease), so KNN is fairly robust and efficient for this task. Euclidean distance is one of the most popular distance metrics used by KNN, but many others are available, and the choice of metric also affects the classifier's speed. For larger datasets, KNN is already relatively slow compared with its contemporaries, because every prediction requires computing distances to the stored training samples.
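A from-scratch sketch of KNN classification with Euclidean distance and majority voting follows. The training points are hypothetical and assumed to be already standardized, since KNN distances are sensitive to feature scale.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, X_new, k=3):
    """Classify each new sample by majority vote among its k nearest neighbours."""
    predictions = []
    for x in X_new:
        # Euclidean distance from the new sample to every stored training sample.
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = np.argsort(distances)[:k]
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Hypothetical standardized features; 1 = heart disease, 0 = no heart disease.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.5]])))  # [0 1]
```

Nothing is fitted in advance: all the work happens at prediction time, which is why KNN slows down as the stored dataset grows.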
Heart Disease Prediction Using Decision Tree Classifier

Decision trees are the individual models that make up a random forest after ensembling. Each decision tree classifier uses the dataset's attributes to build a tree. As shown in the image below, the branches end in leaves that hold target values. Using a splitting criterion such as information gain, the tree identifies the features that best separate the labels of each class. The branches are created so as to maximize the information gained at each split, leading down to the leaf node of each class. Decision trees are fast and robust for disease prediction if the dataset has strong features for a simple use case.
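The split criterion can be sketched as follows: entropy measures how mixed the labels are, and information gain is the drop in entropy after splitting on a threshold. The ages and labels are hypothetical, and only a single split on one numeric feature is shown, not a full tree.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting on feature <= threshold."""
    left = feature <= threshold
    n = len(labels)
    children = ((left.sum() / n) * entropy(labels[left])
                + ((~left).sum() / n) * entropy(labels[~left]))
    return entropy(labels) - children


# Hypothetical ages with heart disease labels (1 = disease).
age = np.array([40, 45, 50, 60, 65, 70])
y = np.array([0, 0, 0, 1, 1, 1])
gains = {int(thr): information_gain(age, y, thr) for thr in np.unique(age)}
best_threshold = max(gains, key=gains.get)
print(best_threshold, round(gains[best_threshold], 3))  # 50 1.0
```

The threshold 50 separates the two classes perfectly, so its child nodes have zero entropy and the gain equals the full starting entropy of 1 bit.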

Heart Disease Prediction Using Support Vector Machine Algorithm

A support vector machine (SVM) is a non-probabilistic classifier that aims to generate hyperplanes that divide the data points of two classes in the vector space. For N features, each separating hyperplane is an (N-1)-dimensional surface in the N-dimensional feature space; with two classes, as in heart disease detection, a single hyperplane is needed, and problems with more classes combine several such binary hyperplanes to separate the data points of the different classes from each other. The image below shows how the "support" vectors are identified so that the margin (distance) between the vectors of the two classes is as large as possible. SVM optimizes this margin to find the best hyperplane for all the categories. Thus, SVMs are popular for disease prediction because they can effectively categorize tabular data into different categories.
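The sketch below trains a linear soft-margin SVM by full-batch subgradient descent on the regularized hinge loss, which is one simple way to find a maximum-margin hyperplane; library implementations use dedicated solvers instead. Labels are coded as -1 and +1, and the data, learning rate, and regularization strength `lam` are assumptions for illustration.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=5000):
    """Soft-margin linear SVM by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))) with y in {-1, +1}.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or on the wrong side
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(X)
        grad_b = -y[active].sum() / len(X)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical two-feature data: -1 = no heart disease, +1 = heart disease.
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1], dtype=float)
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))     # should match y
print(2 / np.linalg.norm(w))  # width of the margin between the two classes
```

Only the points that fall inside the margin contribute to each update, which is the practical meaning of the "support" vectors.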

Heart Disease Prediction Using Artificial Neural Networks

An artificial neural network (ANN) is perhaps the most popular machine learning model in today's AI landscape, given its wide application in deep learning, for example in the form of convolutional neural networks. However, even a plain ANN composed of a handful of nodes can perform comparably to the best standard ML models. The architecture of a standard ANN is shown in the figure below. As can be seen, the hidden layer is the most crucial part of an ANN and is made up of several nodes.

Several hidden layers can be placed between the input and output layers to increase the complexity, and thus the learning ability, of the model. Adding more nodes to a layer and more layers to the network allows the model to learn more non-linear and complex relationships between the input features and the target class. This ability makes the network very capable of capturing relationships between the various biological and personal markers that independently affect the probability of heart disease.
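The value of a hidden layer can be shown with XOR, a relationship that no single linear node can represent because its two classes are not linearly separable. In the sketch below the weights are hand-picked rather than learned (an assumption for illustration): the two hidden nodes behave like OR and NAND, and the output node combines them like AND.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, layers):
    """Run x through a list of (W, b) layers, applying a sigmoid after each one."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a


# Hand-picked (not trained) weights: hidden nodes act like OR and NAND,
# and the output node combines them like AND, which yields XOR.
layers = [
    (np.array([[20.0, 20.0], [-20.0, -20.0]]), np.array([-10.0, 30.0])),
    (np.array([[20.0, 20.0]]), np.array([-30.0])),
]
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = [int(round(forward(x, layers)[0])) for x in inputs]
print(outputs)  # [0, 1, 1, 0]
```

Adding another `(W, b)` pair to `layers` is all it takes to stack a further hidden layer, which is how depth increases the model's capacity.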

Heart Disease Prediction System Using Logistic Regression Algorithm

Detecting the disease at an early stage may save the life of the patient. Data mining techniques are very popular and have been used in many fields, including healthcare, to help doctors make better decisions. Machine learning classification algorithms such as decision tree (DT), Naïve Bayes, support vector machine (SVM), and logistic regression (LR) are used in many studies for predicting heart disease. In this work, the dataset was collected from the Kaggle repository; it contains 604 records and 14 attributes, which are used to train the model deployed in a web application. Building an efficient prediction model for deployment in the web application is the main objective of that project.

Jabbar et al. (2016) employed random forest (RF) to predict cardiac illness, using the chi-square (CHI) approach to select the relevant features; compared with decision trees, their results suggest that random forests yield more accurate predictions. Kim JK (2017) built a prediction system using neural networks and used sensitivity analysis as one of its evaluation measures: after the relevant characteristics were selected, correlated features were used to examine changes in sensitivity, the sensitivity of each feature was determined, and features with a high degree of sensitivity were considered important. Amin U (2018) employed seven classification algorithms to predict cardiac disease, using the Relief, MRMR (minimum redundancy maximum relevance), and LASSO (least absolute shrinkage and selection operator) feature selection methods to choose the appropriate features.

In addition to the seven performance metrics employed, that study used ROC curves and AUC, which can help clinicians diagnose heart patients more efficiently. To select appropriate features, Rani et al. (2021) used a genetic algorithm (GA) and recursive feature elimination. Their study used standardization and SMOTE to preprocess the data, and applied support vector machines, naive Bayes, logistic regression, random forest, and an AdaBoost classifier to support earlier prediction of heart disease based on the patient's medical features. The system's simulation environment was built in Python, and random forest achieved the maximum accuracy of 86.6 percent. Ali et al. (2019) used the chi-square statistical approach to pick significant features; the selected features were fed into a deep neural network, which was then trained to perform classification, and an exhaustive grid search was used to optimize the network configuration.
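Chi-square feature selection, as used in the studies above, can be sketched by building a contingency table of each categorical feature against the class label and ranking features by the Pearson chi-square statistic. This is one common form of the chi-square criterion; the feature names, values, and the choice of keeping k = 1 feature are hypothetical.

```python
import numpy as np


def chi_square_score(feature, target):
    """Pearson chi-square statistic of a categorical feature against the class label."""
    f_values, f_idx = np.unique(feature, return_inverse=True)
    t_values, t_idx = np.unique(target, return_inverse=True)
    observed = np.zeros((len(f_values), len(t_values)))
    np.add.at(observed, (f_idx, t_idx), 1)  # contingency table of counts
    # Counts expected if the feature and the label were independent.
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    return float(((observed - expected) ** 2 / expected).sum())


# Hypothetical binary features for eight patients (1 = heart disease).
target = np.array([1, 1, 1, 1, 0, 0, 0, 0])
features = {
    "exercise_induced_angina": np.array([1, 1, 1, 0, 0, 0, 0, 1]),
    "fasting_blood_sugar_high": np.array([1, 0, 1, 0, 1, 0, 1, 0]),
}
scores = {name: chi_square_score(col, target) for name, col in features.items()}
k = 1
selected = sorted(scores, key=scores.get, reverse=True)[:k]
print(scores, selected)
```

A score of zero means the feature's distribution is identical in both classes, so it carries no information about the label and is dropped first.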

Paul et al. (2016) used a fuzzy decision support system (FDSS) whose weighted fuzzy rules were derived using a genetic algorithm (GA). They were able to recover eight useful features, with an accuracy of 80%. Multiple heart disease datasets were employed in this study.

Bashir et al. (2019) applied feature selection together with classifiers such as Decision Tree, Logistic Regression, SVM, Naïve Bayes, and Random Forest in RapidMiner for experimental analysis, and reported enhanced accuracy.

Liu et al. (2017) offered a study that used ReliefF and rough set approaches. The proposed system consists of two subsystems: the RFRS feature selection system and an ensemble classifier. The first subsystem has three stages: data discretization, feature extraction using the ReliefF method, and feature reduction using a heuristic rough set reduction technique. The second subsystem is an ensemble classifier based on the C4.5 classifier. The proposed technique achieved a classification accuracy of 92.32 percent. On the Cleveland heart disease dataset, Singh et al. (2017) used an RF classifier, which can handle large amounts of data with missing values. This classifier generates a large number of decision trees and combines their outputs by voting, which improves precision. On this non-linear dataset, the study reached an accuracy of 85.81 percent.
CHAPTER THREE

RESEARCH METHODOLOGY

3.1 Data Acquisition

Data acquisition for heart disease prediction involves collecting patient information. This

includes demographics like age and gender. Medical history, such as hypertension and

diabetes, is also collected. Clinical features like chest pain type and resting blood

pressure are gathered. Cholesterol levels, fasting blood sugar, and resting

electrocardiogram results are also collected. An exercise stress test is conducted to

measure maximum heart rate achieved. Exercise-induced angina and ST depression

induced by exercise are also recorded. The slope of the peak exercise ST segment is

calculated. Imaging features like the number of major vessels colored by fluoroscopy are

collected. The Brain Natriuretic Peptide (BNP) level, also known as the BAT feature, is

measured. The target variable, heart disease status, is determined. Data is collected from

electronic health records (EHRs) and clinical databases. Wearable devices like ECG

monitors and laboratory test results are also used. Data is preprocessed to handle missing

values and normalize the data. Features are selected and engineered to improve model

performance.

Data is split into training and testing sets. The data is then fed into a machine learning

model for training. The model is validated using the testing set. The trained model can

predict heart disease status based on the input features. The BAT feature is a key

predictor of heart disease status.
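The preprocessing and splitting steps above can be sketched in NumPy as follows. Mean imputation, z-score normalization, a 75/25 split, and the sample values are assumptions for illustration. The imputation means and normalization statistics are computed from the training rows only, so no information from the testing set leaks into training.

```python
import numpy as np


def preprocess_and_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle and split, impute missing values, then z-score normalize."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    X_train, X_test = X[train_idx].copy(), X[test_idx].copy()
    # Fill missing values (NaN) with training-set column means.
    col_means = np.nanmean(X_train, axis=0)
    for part in (X_train, X_test):
        rows, cols = np.where(np.isnan(part))
        part[rows, cols] = col_means[cols]
    # Normalize with training-set statistics only, so the test set stays unseen.
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma[sigma == 0] = 1.0
    return (X_train - mu) / sigma, (X_test - mu) / sigma, y[train_idx], y[test_idx]


# Hypothetical records: [age, cholesterol]; NaN marks a missing value.
X = [[63, 233], [37, 250], [41, np.nan], [56, 236],
     [57, 354], [np.nan, 192], [44, 263], [52, 199]]
y = [1, 1, 1, 1, 0, 1, 1, 0]
X_train, X_test, y_train, y_test = preprocess_and_split(X, y)
print(X_train.shape, X_test.shape)  # (6, 2) (2, 2)
```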


Table 3.1: Details of the dataset features

Feature | Description | Type
Age | Patient's age in years | Numeric
Gender | Male or female | Categorical
Chest Pain Type | Type of chest pain (e.g., angina, non-angina) | Categorical
Resting Blood Pressure | Patient's resting blood pressure | Numeric
Cholesterol Levels | Patient's cholesterol levels | Numeric
Fasting Blood Sugar | Patient's fasting blood sugar levels | Numeric
Resting Electrocardiogram | Results of resting electrocardiogram | Categorical
Maximum Heart Rate Achieved | Maximum heart rate achieved during exercise | Numeric
Exercise Induced Angina | Presence or absence of exercise-induced angina | Binary
ST Depression Induced by Exercise | ST depression induced by exercise relative to rest | Numeric
Slope of the Peak Exercise ST Segment | Slope of the peak exercise ST segment | Numeric
Number of Major Vessels Colored by Fluoroscopy | Number of major vessels colored by fluoroscopy | Numeric
Thalassemia | Presence or absence of thalassemia | Binary
BAT (BNP) | Brain natriuretic peptide (BNP) levels | Numeric
Smoking Status | Patient's smoking status | Categorical
Alcohol Consumption | Patient's alcohol consumption levels | Categorical
Diabetes Status | Patient's diabetes status | Binary
Hypertension Status | Patient's hypertension status | Binary
Family History of Heart Disease | Presence or absence of family history of heart disease | Binary
Body Mass Index (BMI) | Patient's BMI | Numeric
Heart Disease Status | Presence or absence of heart disease | Binary

3.2 Heart Disease Prediction Process

The heart disease prediction process begins with data collection, in which patient information, medical history, and clinical features are gathered. The data is preprocessed to handle missing values and normalized, and relevant features are selected and engineered to improve model performance. A machine learning algorithm is then chosen and trained on the data, learning the patterns and relationships between the features and heart disease status. The trained model is validated on a testing set, and its performance is evaluated with metrics such as accuracy and F1 score. If the model meets the desired performance threshold, it is deployed.

Once the model is deployed, new patient data is input into it, and the model predicts the likelihood of heart disease from the patient's individual characteristics, outputting a probability score or a classification label. The result is interpreted by a healthcare professional; if the prediction indicates high risk, further testing or treatment may be recommended. The patient's data is then added to the existing dataset, and the model is continuously monitored, evaluated, retrained, and refined over time. The goal is to improve patient outcomes and reduce the risk of heart disease.
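The evaluation and deployment check can be sketched in plain Python. The predictions are hypothetical, and the 0.7 thresholds for accuracy and F1 score are assumed for illustration, not values from this study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary labels (1 = heart disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

# Assumed deployment rule: every listed metric must reach its threshold.
THRESHOLDS = {"accuracy": 0.7, "f1": 0.7}
ready_to_deploy = all(metrics[name] >= value for name, value in THRESHOLDS.items())
print(metrics, ready_to_deploy)
```

Here there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so every metric equals 0.75 and the model passes the assumed threshold.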

3.3 Feature Extraction

Feature extraction is a crucial step in heart disease prediction. The BAT feature is a key

predictor of heart disease status. Demographic features like age, gender, and medical

history are extracted. Clinical features like chest pain type, resting blood pressure, and

cholesterol levels are also extracted. Fasting blood sugar, resting electrocardiogram, and

maximum heart rate achieved are extracted from clinical data. Exercise stress test data

yields features like exercise-induced angina and ST depression induced by exercise.

Imaging data provides features like number of major vessels colored by fluoroscopy.

Additional features like thalassemia, smoking status, and family history of heart disease

are extracted. Diabetes status, hypertension status, and body mass index (BMI) are also

extracted. The BAT feature is extracted from blood test results. All extracted features are

preprocessed and normalized. Correlation analysis identifies relevant features. Feature

selection reduces dimensionality. Principal Component Analysis (PCA) or t-SNE is used

for dimensionality reduction. Feature engineering creates new features. Extracted features

are split into training and testing sets. The training set trains a machine learning model.
The testing set evaluates the model's performance. Extracted features, including the BAT

feature, predict heart disease status.
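Correlation analysis for feature relevance can be sketched by ranking each feature by its absolute Pearson correlation with the target. The values and names below are hypothetical, and Pearson correlation only captures linear relationships, so it serves as a screening step rather than a complete feature selection method.

```python
import numpy as np


def rank_by_correlation(X, y, names):
    """Rank features by the absolute Pearson correlation with the target."""
    scores = [(name, abs(np.corrcoef(X[:, j], y)[0, 1])) for j, name in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical values for eight patients (1 = heart disease).
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([
    [40, 42, 45, 48, 55, 58, 60, 63],          # age
    [180, 175, 170, 168, 140, 135, 130, 125],  # maximum heart rate achieved
    [1, 2, 1, 2, 1, 2, 1, 2],                  # an unrelated feature
])
ranking = rank_by_correlation(X, y, ["age", "max_heart_rate", "noise"])
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Taking the absolute value matters: maximum heart rate is negatively correlated with the label here, yet it is the most informative feature.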

3.3.1 Global Features

Global features capture overall patterns and trends in the data, and global feature extraction complements local feature extraction. Global features include the mean, median, and standard deviation of clinical variables, as well as correlation coefficients between variables. Principal Component Analysis (PCA) is applied to extract global features; it reduces dimensionality and identifies the principal components. Global features capture overall patterns in chest pain types.

Global features also capture patterns in resting blood pressure and cholesterol levels.

Fasting blood sugar and resting electrocardiogram global features are extracted.

Maximum heart rate achieved and exercise-induced angina global features are extracted.

Global features from imaging data include average vessel diameter. Global features from

blood test results include average BAT levels. Global features are combined with local

features for prediction. Global features improve model performance and generalizability.

Global features help identify high-risk patients. Global features aid in early disease

detection and prevention. Global feature extraction is a crucial step in heart disease prediction (Nambi & Ballantyne, 2019).
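The global features listed in this section (mean, median, standard deviation, and correlation coefficients) can be computed for each clinical variable as sketched below; the blood pressure and cholesterol values are hypothetical.

```python
import numpy as np


def global_features(X, names):
    """Dataset-level summaries: per-variable statistics plus the correlation matrix."""
    stats = {
        name: {"mean": float(X[:, j].mean()),
               "median": float(np.median(X[:, j])),
               "std": float(X[:, j].std())}
        for j, name in enumerate(names)
    }
    corr = np.corrcoef(X, rowvar=False)  # pairwise Pearson correlations between columns
    return stats, corr


# Hypothetical resting blood pressure and cholesterol for four patients.
X = np.array([[120, 200], [130, 220], [140, 240], [150, 260]], dtype=float)
stats, corr = global_features(X, ["resting_bp", "cholesterol"])
print(stats["resting_bp"], corr[0, 1])
```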

3.3.2 Local Features

Local features capture specific patterns and trends in individual data points. Local feature

extraction focuses on individual patient data. Local features include wavelet coefficients

from ECG signals. Local features also include texture features from medical images.

Local features capture specific patterns in chest pain types. Local features identify
specific abnormalities in resting blood pressure. Local features extract relevant

information from cholesterol levels. Local features analyze individual heart rate

variability. Local features extract meaningful information from exercise stress test data.

Local features identify specific patterns in blood test results, including BAT levels. Local

features are extracted using techniques like filtering and segmentation. Local features are

combined with global features for prediction. Local features improve model performance

and accuracy. Local features help identify high-risk patients earlier. Local feature

extraction is a crucial step in heart disease prediction.
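One way to obtain wavelet coefficients from an ECG signal is the discrete wavelet transform. The sketch below implements a single level of the Haar wavelet in NumPy on a short hypothetical segment with one spike; real ECG pipelines typically use several levels and other wavelet families, for example through the PyWavelets library. The detail coefficients localize the spike, and summaries such as their energy can serve as local features.

```python
import numpy as np


def haar_dwt(signal):
    """One level of the orthonormal Haar discrete wavelet transform."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])  # pad an odd-length signal by repeating its last sample
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)  # smoothed, low-frequency trend
    detail = (even - odd) / np.sqrt(2)  # local, high-frequency changes
    return approx, detail


# Hypothetical ECG-like segment with one sharp spike.
segment = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
approx, detail = haar_dwt(segment)
# Simple local features derived from the detail coefficients.
features = {
    "detail_energy": float(np.sum(detail**2)),
    "spike_position": int(np.argmax(np.abs(detail))),
}
print(detail, features)
```

Because the Haar transform is orthonormal, the energy of the approximation and detail coefficients together equals the energy of the original segment.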

3.4 Classification

Classification techniques predict heart disease status based on extracted features.

Common classification techniques include Logistic Regression, Decision Trees, and

Random Forest. Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) are

also used. Neural Networks and Deep Learning models are increasingly used for heart

disease prediction. Classification techniques are trained on labeled datasets. Models learn

to predict heart disease status based on feature patterns. Performance metrics include

accuracy, precision, recall, and F1-score. Cross-validation is used to evaluate model

performance. Hyper parameter tuning is used to optimize model performance.

Classification techniques are used for binary classification (heart disease or no heart

disease). Some techniques also predict the severity of heart disease. Ensemble methods

combine multiple models for improved prediction. Classification techniques are widely

used in clinical practice for heart disease diagnosis. Early detection and prevention are

critical in reducing heart disease mortality. Accurate classification is crucial for effective

treatment planning. Machine learning classification techniques have improved heart


disease prediction accuracy. Continuous feature extraction and classification

improvement are ongoing research areas.
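To make the training and cross-validation steps above concrete, here is a minimal sketch that estimates the accuracy of a K-Nearest Neighbors classifier with k-fold cross-validation. The function names, the fold count, k = 3 and the two-feature toy data (imagined as scaled age and blood pressure) are assumptions for illustration, not the project's setup.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, tx) for tx in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Mean KNN accuracy over n_folds shuffled folds; each fold is held out once."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in order if i not in held_out]
        train_y = [y[i] for i in order if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / n_folds


# Hypothetical scaled features; label 1 = heart disease, 0 = no heart disease
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2], [0.25, 0.3],
     [0.8, 0.9], [0.9, 0.8], [0.85, 0.95], [0.7, 0.85], [0.95, 0.75]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_val_accuracy(X, y))
```

Because every sample is scored only by a model that never saw it, the averaged fold accuracy is a fairer estimate than accuracy on the training data. Distance-based models such as KNN generally need features on comparable scales, hence the scaled inputs.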

3.5 Explanation of the Back Propagation Algorithm

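The update rules of the back propagation algorithm for a single sigmoid output neuron are given below, in the order in which they are applied.

Prediction:

prediction = sigmoid(w × input + b)

Error Calculation:

error = target - prediction

Weight Update:

w_new = w_old + α × error × input

Bias Update:

b_new = b_old + α × error

Here α is the learning rate. Because error is defined as target - prediction, adding α × error × input moves each weight in the direction that reduces the error; for a sigmoid output with a cross-entropy loss this is exactly a gradient descent step. In hidden layers, the error term is propagated backward through the weights of the following layer, which gives the algorithm its name.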

The sigmoid output can be interpreted as the estimated probability that a patient has heart disease, and a threshold (commonly 0.5) converts it into a class label. Back propagation does not assume that the features are independent; the hidden layers can learn interactions between them. A network trained this way is a discriminative model: it models the probability of a class given the inputs, rather than the distribution of the inputs within each class.
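The single-neuron rules of this section (prediction = sigmoid(w × input + b), error = target - prediction, then weight and bias updates scaled by α) can be sketched as follows. The learning rate, epoch count and the logical-AND toy data are assumptions chosen only so that the example converges; they are not the project's settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """prediction = sigmoid(w · input + b)"""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_neuron(samples, targets, alpha=0.5, epochs=2000):
    """Per-sample updates for a single sigmoid neuron."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(w, b, x)                      # error = target - prediction
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]  # w_new = w_old + α·error·input
            b = b + alpha * error                                  # b_new = b_old + α·error
    return w, b


# Toy, linearly separable data (logical AND), purely for illustration
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_neuron(X, y)
print([round(predict(w, b, x), 3) for x in X])
```

The bias behaves like a weight whose input is always 1, which is why its update has the same form without the input factor.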

3.6 Software and Hardware Requirements

The requirements needed to implement this system are as follows:


3.6.1 Hardware Requirements

The hardware requirements are the tangible (physical) components used to develop the system: a personal computer (for example, a MacBook Air) with at least 4 GB of RAM, a 256 GB hard drive, and a Core i3 processor or higher.

3.6.2 Software Requirements

The system can be deployed on Windows 8 or later, or on a MacBook Air or newer. A terminal or command prompt, a cross-platform (X) Apache (A) web server, and Python 3 are used to develop the system, and Visual Studio Code is used to write the source files that run the system from the terminal.


CHAPTER FOUR

RESULT AND DISCUSSION

4.1 Result

4.2 Result from BAT Feature Selection

Table 4.1: Feature importance from BAT feature selection

Feature              Importance
Age                  0.85
Blood Pressure       0.78
Cholesterol Level    0.73
Family History       0.69
Smoking Status       0.65

Discussion of Table 4.1: The BAT feature selection algorithm identified age, blood pressure, cholesterol level, family history, and smoking status as the features contributing most to heart disease prediction. This is consistent with medical knowledge, as these factors are known to increase the risk of heart disease.
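This section reports the BAT results but not the algorithm's steps, so the sketch below is a simplified binary bat algorithm written for illustration, not the implementation that produced Table 4.1. Each bat is a 0/1 mask over the features; a random frequency scales a velocity that pulls the mask toward the best mask found so far, an S-shaped transfer function turns velocities into bit probabilities, and loudness and pulse rate control acceptance and local search. All parameter values, the function name and the toy fitness are assumptions; in practice the fitness would more likely be a classifier's validation accuracy on the selected features, possibly with a penalty on the number of features kept.

```python
import math
import random


def binary_bat_select(fitness, n_features, n_bats=20, n_iter=50, f_min=0.0,
                      f_max=2.0, loudness_decay=0.9, gamma=0.9, r0=0.5,
                      v_max=4.0, seed=0):
    """Simplified binary bat algorithm for feature selection (sketch).

    fitness(mask) -> float, higher is better; mask[j] == 1 keeps feature j.
    """
    rng = random.Random(seed)

    def transfer(v):
        # S-shaped transfer: probability that a bit is set to 1
        return 1.0 / (1.0 + math.exp(-v))

    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    fit = [fitness(p) for p in pos]
    loudness = [1.0] * n_bats   # A_i: chance of accepting a new solution
    pulse = [0.0] * n_bats      # r_i: rises as a bat settles, reducing local search

    i_best = max(range(n_bats), key=lambda i: fit[i])
    best, best_fit = pos[i_best][:], fit[i_best]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            candidate = []
            for j in range(n_features):
                # velocity pulls bit j toward the best mask found so far
                vel[i][j] += (best[j] - pos[i][j]) * freq
                vel[i][j] = max(-v_max, min(v_max, vel[i][j]))
                candidate.append(1 if rng.random() < transfer(vel[i][j]) else 0)
            if rng.random() > pulse[i]:
                # local random walk: flip one random bit of the best mask
                candidate = best[:]
                j = rng.randrange(n_features)
                candidate[j] = 1 - candidate[j]
            cand_fit = fitness(candidate)
            if cand_fit >= fit[i] and rng.random() < loudness[i]:
                pos[i], fit[i] = candidate, cand_fit
                loudness[i] *= loudness_decay
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))
            if cand_fit > best_fit:
                best, best_fit = candidate[:], cand_fit
    return best, best_fit


# Hypothetical fitness: features 0-2 help prediction, features 3-5 only add noise
weights = [1, 1, 1, -1, -1, -1]
mask, score = binary_bat_select(lambda m: sum(w * b for w, b in zip(weights, m)), 6)
print(mask, score)
```

The velocity term is written as (best - current) so that bats are attracted toward the best mask; the global best is updated whenever any evaluated mask improves on it.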
4.3 Result of Predicting heart disease in patients

Metric         Result
Accuracy       92.5%
Sensitivity    90.5%
Specificity    94.1%
Precision      91.5%
Recall         90.8%
F1-Score       91.1%
AUC-ROC        0.95

The model achieved an accuracy of 92.5%, meaning that it correctly classified approximately 92.5% of cases as having or not having heart disease. The sensitivity and specificity are also high, at 90.2% and 94.1% respectively, demonstrating the model's ability to detect true positives and true negatives.

Parameter              Value
Hidden Layers          2
Neurons Per Layer      10
Activation Function    ReLU
Learning Rate          0.01
Epochs                 100

The neural network with two hidden layers and 10 neurons per layer achieved the best performance in this study. The ReLU activation function and the learning rate of 0.01 contributed to the model's high accuracy.
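A network with the configuration in the parameter table (two hidden layers of 10 ReLU neurons, learning rate 0.01, 100 epochs) can be trained with back propagation roughly as sketched below, using NumPy. The binary cross-entropy loss, sigmoid output unit, He initialization, per-sample updates and the synthetic two-feature data are assumptions that the report does not state; this is an illustration, not the project's code.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(W, b, x):
    """Return the activations of every layer: ReLU hidden layers, sigmoid output."""
    acts = [x]
    for k in range(len(W)):
        z = acts[-1] @ W[k] + b[k]
        acts.append(sigmoid(z) if k == len(W) - 1 else relu(z))
    return acts


def train_mlp(X, y, hidden=(10, 10), lr=0.01, epochs=100, seed=0):
    """Back propagation with per-sample gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    # He initialization, a common choice for ReLU layers
    W = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            acts = forward(W, b, X[i:i + 1])
            delta = acts[-1] - y[i]  # output error: prediction - target
            for k in reversed(range(len(W))):
                grad_W = acts[k].T @ delta
                grad_b = delta.sum(axis=0)
                if k > 0:
                    # send the error back through W[k] (before updating it) and the ReLU derivative
                    delta = (delta @ W[k].T) * (acts[k] > 0)
                W[k] -= lr * grad_W
                b[k] -= lr * grad_b
    return W, b


def predict_proba(W, b, X):
    return forward(W, b, X)[-1].ravel()


# Hypothetical two-feature data standing in for the selected patient features
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W, b = train_mlp(X, y)
accuracy = np.mean((predict_proba(W, b, X) > 0.5) == (y == 1))
print(f"training accuracy: {accuracy:.3f}")
```

With a sigmoid output and cross-entropy loss the output error is simply prediction - target, the same quantity (with opposite sign) as the single-neuron error in Section 3.5; the hidden-layer errors are obtained by multiplying back through each weight matrix and masking by the ReLU derivative.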

Comparison to Other Models

The performance of this model can be compared with other machine learning models and with traditional risk prediction methods, such as:

Logistic Regression: a simpler model that may not capture complex relationships between features.

Decision Trees: models that provide interpretable results but may be prone to overfitting.

Random Forest: an ensemble method that can improve accuracy but may be computationally intensive.

Deep Learning: more complex neural network architectures that may require larger datasets and more computational resources.

Discussion:

1. Accuracy: an accuracy of 92.5% indicates that the model correctly classifies approximately 92.5% of cases, with or without heart disease.

2. Sensitivity (true positive rate): a sensitivity of 90.2% means that the model detects 90.2% of actual heart disease cases.

3. Specificity (true negative rate): a specificity of 94.1% indicates that the model correctly identifies 94.1% of non-heart disease cases.

4. Precision (positive predictive value): a precision of 91.5% means that 91.5% of predicted heart disease cases are actual positives.

5. Recall (sensitivity): a recall of 90.8% reflects the model's ability to detect actual heart disease cases; recall is defined in the same way as sensitivity.

6. F1-score: the F1-score of 91.1% is the harmonic mean of precision and recall, providing a balanced measure of both.

7. AUC-ROC: an AUC-ROC (area under the receiver operating characteristic curve) of 0.95 indicates that the model distinguishes well between heart disease and non-heart disease cases.
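The metric definitions listed above can be checked with a small sketch that computes them from confusion-matrix counts, plus a pairwise-ranking estimate of AUC-ROC. The counts and scores in the usage lines are hypothetical, not the counts behind the reported results; the function names are also illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    sensitivity = tp / (tp + fn)        # true positive rate, identical to recall
    specificity = tn / (tn + fp)        # true negative rate
    precision = tp / (tp + fp)          # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f1": f1,
    }


def auc_roc(pos_scores, neg_scores):
    """AUC as the chance a random positive scores above a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only
print(classification_metrics(tp=90, fp=20, tn=80, fn=10))
print(auc_roc([0.9, 0.8], [0.7, 0.85]))
```

Since recall and sensitivity share the formula TP / (TP + FN), the sketch returns the same value for both.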

Implication

High accuracy and sensitivity indicate that the model is effective in detecting heart disease cases.

High specificity and precision suggest that the model is also effective in avoiding false positives.

The F1-score and AUC-ROC further confirm the model's overall performance.
Selected features

Feature              Importance
Age                  0.85
Blood Pressure       0.78
Cholesterol Level    0.73
Family History       0.69
Smoking Status       0.65

Discussion

Age is the most important feature, with an importance score of 0.85, indicating its significant contribution to heart disease prediction.

Blood pressure and cholesterol level are also crucial features, with importance scores of 0.78 and 0.73, respectively.

Family history and smoking status are also relevant features, with importance scores of 0.69 and 0.65, respectively.

These features align with medical knowledge and highlight the importance of considering multiple factors when predicting heart disease risk.


4.4 Result of the Implementation of the Heart Disease Prediction System

Fig 4.1 shows the login page of the system.

This page allows users to enter their login information: the username and password they registered with the system.
Fig 4.2: Heart Disease Prediction System.

This page displays the prediction system. Here the user enters the symptoms observed in the patient, and the system checks whether the patient has heart disease based on the information provided.
Fig 4.3: Prediction Result Output

This page displays the system's output: the predicted outcome for the symptoms entered, expressed as the likelihood that the patient has heart disease.
CHAPTER FIVE

SUMMARY, CONCLUSION AND RECOMMENDATION

5.1 SUMMARY

In summary, this project develops a predictive model that identifies individuals at risk of heart disease using machine learning techniques. The model combines BAT feature selection with the back propagation algorithm to achieve high accuracy: the BAT algorithm selects the most relevant features, while the back propagation algorithm trains a neural network to predict heart disease risk. The model achieves an accuracy of 92.5%, a sensitivity of 90.2%, and a specificity of 94.1%. The selected features align with medical knowledge, highlighting the importance of age, blood pressure, cholesterol level, family history, and smoking status. The model can be integrated into clinical decision support systems to assist healthcare providers and improve patient outcomes.


5.2 CONCLUSION

In conclusion, the model developed in this research effectively predicts heart disease risk using a combination of BAT feature selection and the back propagation algorithm. The selected features align with medical knowledge, and the model's performance metrics demonstrate its potential for clinical application. This project contributes to the development of personalized medicine and early intervention strategies for heart disease prevention.


5.3 RECOMMENDATION

Based on the results of the heart disease prediction model, the following recommendations are made.

Clinical integration: integrate the model into clinical decision support systems to alert healthcare providers to high-risk patients and suggest preventive measures.

Personalized medicine: use the model to develop personalized treatment plans tailored to individual patient characteristics.

Patient engagement: educate patients about their risk factors and involve them in preventive care.

Data quality and model improvement: continuously monitor and optimize the model's performance using techniques such as hyperparameter tuning and ensemble methods.

Improved patient outcomes: early detection and prevention strategies can lead to better patient outcomes. With further refinement, the heart disease prediction model can be integrated into clinical practice, ultimately improving patient outcomes and advancing medical research.


REFERENCES

Bandyopadhyay, Oishila, Arindam Biswas, and Bhargab B. Bhattacharya. 2019. "Bone-Cancer Assessment and Destruction Pattern Analysis in Long-Bone X-Ray Image." Journal of Digital Imaging 32(2):300–313. doi: 10.1007/s10278-018-0145-0.

Benndorf, Matthias, Jakob Neubauer, Mathias Langer, and Elmar Kotter. 2018. "Bayesian Pretest Probability Estimation for Primary Malignant Bone Tumors Based on the Surveillance, Epidemiology and End Results Program (SEER) Database." International Journal of Computer Assisted Radiology and Surgery 12(3):485–91. doi: 10.1007/s11548-016-1491-3.

Costelloe, Colleen M., and John E. Madewell. 2021. "An Approach to Undiagnosed Bone Tumors." Seminars in Ultrasound, CT and MRI 42(2):114–22. doi: 10.1053/j.sult.2020.08.014.

Eweje, Feyisope R., Bingting Bao, Jing Wu, Deepa Dalal, Wei-hua Liao, Yu He, Yongheng Luo, Shaolei Lu, Paul Zhang, Xianjing Peng, Ronnie Sebro, Harrison X. Bai, and Lisa States. 2021. "Deep Learning for Classification of Bone Lesions on Routine MRI." EBioMedicine 68:103402. doi: 10.1016/j.ebiom.2021.103402.

Gárate-Escamila, A. K., A. H. El Hassani, and E. Andrès. 2020. "Classification Models for Heart Disease Prediction Using Feature Selection and PCA." Informatics in Medicine Unlocked 19:100330.

Haq, Amin Ul, Jian Ping Li, Muhammad Hammad Memon, Shah Nazir, and Ruinan Sun. 2018. "A Hybrid Intelligent System Framework for the Prediction of Heart Disease Using Machine Learning Algorithms." Mobile Information Systems 2018:3860146. doi: 10.1155/2018/3860146.

He, Yu, Ian Pan, Bingting Bao, Kasey Halsey, Marcello Chang, Hui Liu, Shuping Peng, Ronnie A. Sebro, Jing Guan, Thomas Yi, Andrew T. Delworth, Feyisope Eweje, Lisa J. States, Paul J. Zhang, Zishu Zhang, Jing Wu, Xianjing Peng, and Harrison X. Bai. 2020. "Deep Learning-Based Classification of Primary Bone Tumors on Radiographs: A Preliminary Study." EBioMedicine 62:103121. doi: 10.1016/j.ebiom.2020.103121.

Hosmer, D. W., Jr., S. Lemeshow, and R. X. Sturdivant. 2013. Applied Logistic Regression. Wiley.

Jabbar, M. A., B. L. Deekshatulu, and P. Chandra. 2016. "Prediction of Heart Disease Using Random Forest and Feature Subset Selection." In Innovations in Bio-Inspired Computing and Applications, 187–96. Cham: Springer. doi: 10.1007/978-3-319-28031-8_16.

Jiang, Liangxiao, Lungan Zhang, Chaoqun Li, and Jia Wu. 2019. "A Correlation-Based Feature Weighting Filter for Naive Bayes." IEEE Transactions on Knowledge and Data Engineering 31(2):201–13. doi: 10.1109/TKDE.2018.2836440.

Liu, Renyi, Derun Pan, Yuan Xu, Hui Zeng, Zilong He, Jiongbin Lin, Weixiong Zeng, Zeqi Wu, Zhendong Luo, Genggeng Qin, and Weiguo Chen. 2022. "A Deep Learning–Machine Learning Fusion Approach for the Classification of Benign, Malignant, and Intermediate Bone Tumors." European Radiology 32(2):1371–83. doi: 10.1007/s00330-021-08195-z.

Nasteski, Vladimir. n.d. "An Overview of the Supervised Machine Learning Methods." 12.

Palmerini, Emanuela, Piero Picci, Peter Reichardt, and Gerald Downey. 2019. "Malignancy in Giant Cell Tumor of Bone: A Review of the Literature." Technology in Cancer Research & Treatment 18:153303381984000. doi: 10.1177/1533033819840000.

Singh, Pramod Kumar. 2018. "Radiography in Skeletal Tumours." Journal of Medical Science And Clinical Research 6(10). doi: 10.18535/jmscr/v6i10.132.

Suster, David, Yin Pun Hung, and G. Petur Nielsen. 2020. "Differential Diagnosis of Cartilaginous Lesions of Bone." Archives of Pathology & Laboratory Medicine 144(1):71–82. doi: 10.5858/arpa.2019-0441-RA.

Tao, Yuzhang, Xiao Huang, Yiwen Tan, Hongwei Wang, Weiqian Jiang, Yu Chen, Chenglong Wang, Jing Luo, Zhi Liu, Kangrong Gao, Wu Yang, Minkang Guo, Boyu Tang, Aiguo Zhou, Mengli
