Detection of Heart Failure Using Different Machine Learning Algorithms

ISSN No:-2456-2165

Abstract:- Heart is the key organ of our body, as blood circulation towards the other organs depends upon the efficient working of the heart. Nowadays, coronary artery diseases diminish the working ability of hearts to a large extent, resulting in heart failure in many cases. A survey conducted by the WHO reveals that around 29.20% of the world's population, i.e. 17 million people, die due to various heart diseases each year. For identifying various heart diseases, several pathological procedures and medical investigations are performed by doctors. With the use of data mining and machine learning techniques, better insights can be drawn from the existing test results, and the number of pathological procedures can be reduced. A system built on data mining and machine learning algorithms can overcome the dearth of examining tools for classifying the data and predicting the risk state of cardiac patients. In this paper, a comparative survey of such approaches for the investigation of cardiac diseases using data mining techniques is presented. These comparative study results should help researchers in this domain channel their research in the appropriate direction.

Keywords:- Comparative Study, Machine Learning, Investigation, Naive Bayes, K-Nearest Neighbor, Random Forest, Decision Table, K-Means.

I. INTRODUCTION

Heart diseases are increasing day by day. As per a WHO survey, approximately 17.9 million people across the globe die due to various health conditions each year, and approximately 80% of these deaths are due to coronary heart diseases (e.g. heart attack) and cardiovascular diseases (e.g. stroke). The development of most countries is affected to some extent as a result. Predicting heart diseases beforehand, and changing lifestyle accordingly, would help to reduce the number of deaths. Machine learning and data mining are used nowadays to solve problems related to many health conditions, and prediction is one of the important problems where machine learning techniques are widely utilized. In this work, we have carried out a comparative study of existing data mining and machine learning algorithms that predict heart diseases by processing existing heart patients' data. Heart diseases are not just coronary diseases; they also involve other inner parts which are connected to the heart.

Data gathering about such diseases has been part of the study for a long time. We consider cholesterol, lung-function tests, the treadmill test, etc. as parameters to predict the risk rate of the heart.

II. LITERATURE REVIEW

The paper "Predictive Analysis of Heart Disease using K-Means and Apriori Algorithms" by Hadia Admin [2019] and his team proposed a technique for detecting heart failure in patients using K-Means and Apriori. Their approach showed that when the Apriori and K-Means algorithms are applied together, the disease can be anticipated much better than before, which helps the doctor make the necessary diagnostic decisions. These algorithms effectively predicted the cardiac risk stage (low, medium, or high) and assisted clinicians in accurately understanding the patient's condition and providing an appropriate diagnosis. They also aided the hospital's storage and maintenance of patient records.

The paper "Congestive heart failure detection using random forest classifier" by Zerina Masetic and colleagues demonstrates the outstanding performance of the Random Forest algorithm, showing that it is useful in detecting congestive heart failure and may be valuable in conveying information that is convenient in therapy.

The paper "Heart Disease detection using Naive Bayes Algorithm" by K. Vembandasamy [2015] and his team stated that data mining techniques have acted as one of the most important and well-known solutions for many health-related issues. Among these techniques, the Naive Bayes algorithm has given appropriate results for the prediction of heart disease: the results obtained show that it provides an accuracy of 86% with minimum execution time.

The Decision Tree technique, according to Dilip Kumar Choubey [2020], relies on the top randomness of input samples. It is a divide-and-conquer (DAC) method in which the trees are constructed from the top down. In this algorithm, the data is first preprocessed by splitting it into training and test data.
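The Naive Bayes results reported above rest on Bayes' rule with an independence assumption between features. As an illustration only (not the implementation used in the reviewed paper), a minimal Gaussian Naive Bayes classifier over hypothetical numeric risk factors might look like this:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a per-class prior and per-feature mean/variance."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids div by zero
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    best_label, best_score = None, float("-inf")
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # log of the Gaussian likelihood of this feature value
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The feature values here are hypothetical stand-ins for clinical measurements such as cholesterol; real studies would fit the model on an actual patient dataset.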
Comparative Summary of Reviewed Papers

● K. Vembandasamy, R. Sasipriya PP and E. Deepa (2015), "Heart Diseases Detection Using Naive Bayes Algorithm" — Naive Bayes.
Observations: A Naive Bayes model is easy to build, with no complex iterative parameter estimation, which makes it particularly helpful for determining the risk rate of heart patients. Decision trees, by contrast, give less accurate results when trained on small datasets in some cases.
Results: The outcomes acquired show that the Naive Bayes technique provides 86.4198% correctness with the least time.

● Zerina Masetic and Abdulhamit Subasi (2016), "Congestive heart failure detection using random forest classifier" — Random Forest.
Observations: Both normal subjects and congestive heart failure (CHF) patients are handled with machine learning approaches; the random forest algorithm detects CHF with 100% accuracy.
Results: The results of different trials were analysed using a variety of statistical metrics (sensitivity, specificity, accuracy, F-measure, and ROC curve), and the random forest method was found to provide 100% accuracy.

● Keshav Srivastava and Dilip Kumar Choubey (2020), "Heart Disease Prediction using Machine Learning and Data Mining" — Decision Trees, KNN, Naive Bayes, Random Forest, SVM.
Observations: A web app is built using Flask, and these packages are used to make predictions based on the data supplied by the user. Future researchers can enhance accuracy by employing data mining techniques to retrieve hidden information from the samples.
Results: In their experiment, they used the Cleveland Heart Disease dataset from the UCI repository, pre-processed data with missing values, and used algorithms such as Decision Tree, K-Nearest Neighbour, Support Vector Machines, and Random Forest to achieve accuracies of 79%, 87%, and 83%, respectively. According to the ROC curve, the AUC for Decision Tree, K-Nearest Neighbour, Support Vector Machines, and Random Forest is 71.6%, 88.5%, 90.4%, and 90.8%, respectively.
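K-Nearest Neighbour, one of the algorithms compared above, classifies a record by a majority vote among the k most similar training records. A minimal sketch, using hypothetical two-feature records rather than the Cleveland dataset:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points
    (Euclidean distance)."""
    dists = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]
```

In practice, features on very different scales (e.g. age vs. cholesterol) should be normalised first, since Euclidean distance otherwise lets the largest-scale feature dominate.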
Algorithmic Survey

Here are some of the algorithms that we have identified for obtaining more accurate results in the investigation of heart disease.
Definition

K-Means:
● An iterative technique that attempts to partition a dataset into K separate clusters, with each data point belonging to exactly one cluster.
● Data mining projects are mostly carried out using K-Means, and the quality of the clusters remains the same throughout the execution process, producing accurate output.

Decision Trees:
● A supervised machine learning method in which data is continually divided according to a set of rules.
● The tree can be described in terms of two elements: decision nodes and leaves. The leaves represent the final results, while the decision nodes are the points where the data is split.

Random Forest:
● An ensemble method that builds several decision trees on distinct subsets of a dataset and averages their outputs to increase prediction accuracy.
● It is superior to a single decision tree because it reduces over-fitting by averaging the results.
Algorithmic steps

K-Means:
1. Specify K, the number of clusters.
2. Initialize the centroids by shuffling the dataset and then picking K data points at random, without replacement.
3. Assign each data point to the cluster whose centroid is closest to it.
4. Recompute each cluster's centroid by averaging all of the data points assigned to that cluster.
5. Repeat steps 3 and 4, minimising the total of all data points' squared distances from their centroids, until the centroids (i.e. the clustering of the data points) no longer change.

Decision Trees:
1. Begin with the root node S, which contains the whole dataset.
2. Using the Attribute Selection Measure (ASM), find the best attribute in the dataset.
3. Subdivide S into subsets corresponding to the possible values of the best attribute.
4. Create a decision tree node that holds the best attribute.
5. Create new decision trees recursively using the subsets of the dataset produced in step 3.
6. Continue this procedure until the nodes can no longer be categorised any further, and refer to each last node as a leaf node.

Random Forest:
1. Choose K data points at random from the training set and build a decision tree for the chosen data points (subset).
2. Choose N, the number of decision trees you wish to create.
3. Repeat steps 1 and 2.
4. For new data points, find the forecast of each decision tree and assign the new data points to the category with the most votes.
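The K-Means steps above can be sketched in a few lines. This is an illustrative pure-Python version (seeded for reproducibility), not a production implementation:

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """K-Means: pick K random centroids, assign each point to its nearest
    centroid, recompute centroids as cluster means, and repeat until the
    assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: K random points as centroids
    assignment = None
    for _ in range(max_iter):
        # step 3: assign every point to the closest centroid (squared distance)
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        if new_assignment == assignment:  # step 5: converged
            break
        assignment = new_assignment
        # step 4: recompute each centroid as the mean of its members
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

For well-separated data the loop typically converges in a handful of iterations; the result can still depend on the random initialisation, which is why the seed is fixed here.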
Accuracy

K-Means: Gives less accurate results.
Decision Trees: Gives less accurate results.
Random Forest: Gives more accurate results.

Dataset

K-Means: Can handle massive amounts of data, but is sensitive to noisy data.
Decision Trees: Cannot handle large or noisy data.
Random Forest: Can handle enormous amounts of data as well as noisy data.
Pros

K-Means:
1. Execution is rather simple.
2. It can handle huge data sets.
3. Convergence is guaranteed.
4. It is possible to warm-start the locations of the centroids.
5. It adapts quickly to new situations.
6. It generalizes to clusters of other shapes and sizes, such as elliptical clusters.

Decision Trees:
● Decision trees need less work for data preparation during pre-processing than other methods.
● A decision tree does not need data normalisation or scaling.
● Missing values in the data have no significant impact on the tree-building process.
● A decision tree model is simple to understand and convey to technical teams and stakeholders.

Random Forest:
● Random Forest can handle both classification and regression problems.
● It can handle big datasets with high dimensionality.
● It improves the model's accuracy and mitigates the problem of overfitting.
Cons

K-Means:
● It necessitates determining the number of clusters (K) ahead of time.
● It cannot deal with noisy data or outliers.
● It is not appropriate for detecting clusters with non-convex shapes.

Decision Trees:
● A small change in the data can result in a large change in the decision tree's structure, resulting in instability.
● Compared with other algorithms, a decision tree's calculation can become rather complicated at times.
● The training period for decision trees is frequently longer.

Random Forest:
● Although random forest can be used for both classification and regression tasks, it is not as well suited to regression tasks.
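To make the bootstrap-and-vote idea behind Random Forest concrete, here is a simplified sketch that uses one-level trees (decision stumps) as base learners. A real random forest grows deeper trees and also samples a random subset of features at each split, which this illustration omits:

```python
import random
from collections import Counter

def train_stump(X, y):
    """One-level decision tree: choose the (feature, threshold) split that
    misclassifies the fewest training points."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in {row[f] for row in X}:
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(v != left_label for v in left)
                      + sum(v != right_label for v in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label

def random_forest_fit(X, y, n_trees=15, seed=0):
    """Steps 1-3: draw a bootstrap sample per tree and fit a stump on it.
    Step 4: classify new points by majority vote across the stumps."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        stumps.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    def predict(x):
        return Counter(s(x) for s in stumps).most_common(1)[0][0]
    return predict
```

Averaging many such weak learners trained on different bootstrap samples is what reduces the variance (overfitting) of any single tree, as noted in the Pros above.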
III. CONCLUSION
REFERENCES