
International Journal of Research and Innovation in Applied Science (IJRIAS) | Volume IV, Issue VI, June 2019|ISSN 2454-6194

Diabetes Prediction Using Data Mining Techniques

Desmond Bala Bisandu(1*), Dorcas Dachollom Datiri(2), Eva Onokpasa(3), Godwin Thomas(4), Musa Maaji Haruna(5), Aminu Aliyu(6), Jerry Zachariah Yakubu(7)

(1,2,3,4,5) Department of Computer Science, University of Jos, Nigeria
(6) Federal University Kebbi, Nigeria
(7) Department of Science and Technology, University of Jos, Nigeria

Abstract: This research work covers the design and implementation of a diabetes prediction system, using Fudawa Health Centre as a case study. The system is intended to automate the prediction of diabetes, even before a clinician attends to the patient. The current process is carried out manually, which makes analyzing the data inflexible for doctors, and the transmission of information is not transparent. The system was designed using the Java programming language, the Weka tool, and MySQL as the back end. A strategic approach was taken to analyse the existing system in order to meet the demands of the new system and to solve the problems of the existing one by implementing the naïve Bayes classifier. The new system is expected to reduce the stressful process doctors face when predicting diabetes, and the results of the experiment show that the proposed system achieves better prediction accuracy.

Keywords: Diabetes, data mining, Weka tool, diagnosis, prediction, naïve Bayes classifier, technique

I. INTRODUCTION

The term "diabetes" refers to a disease that occurs when the glucose in the blood, also called blood sugar, is too high. Blood glucose is the main source of energy and comes from the food we eat. According to doctors, diabetes occurs when a gland known as the pancreas does not release a hormone called insulin in sufficient quantity. Insulin is a hormone that carries sugar from the bloodstream to the body's cells to be used as energy. A lack of insulin disrupts the body's ability to use glucose properly, and as a result high levels of glucose are excreted in the urine. In the long term, diabetes that is not properly managed can lead to organ failure and cardiovascular disease, and can disrupt other functions of the body. The WHO (World Health Organization) has listed diabetes as one of the four major non-communicable diseases (NCDs) in the world today (World Health Day, 2016). Statistics released by the WHO are alarming. As mentioned earlier, diabetes can lead to other major cardiovascular complications. According to the WHO, 3.7 million deaths occur before the age of 70, and this high mortality is attributed to diabetes and cardiovascular diseases. Uncontrolled blood glucose is the major factor behind diabetes. Diabetes is a serious health problem in Nigeria, where it is one of the most common chronic diseases across all populations and age groups. According to a 2016 WHO report, 13% of deaths in Nigeria are related to diabetes. The report also shows that the harmful effects of this disease are increasing rapidly.

Diabetes is divided into two distinct types. Type 1 diabetes requires insulin to be supplied artificially, through medicines or injections; in type 2 diabetes, the pancreas produces insulin, but the body does not use it effectively. The majority of people with diabetes have type 2 diabetes. Diabetes used to be a common problem among adults, specifically middle-aged people, but due to changing lifestyles it now affects children too. Type 1 diabetes is not preventable, because various external environmental stimulants result in the destruction of the body's insulin-producing cells. However, lifestyle changes that achieve the required body weight and adequate physical activity can help prevent type 2 diabetes from developing. Diabetes is a chronic health problem with devastating, yet preventable, consequences. It is characterized by high blood glucose levels resulting from defects in insulin production, insulin action, or both [1, 2]. Globally, the number of people with diabetes was 15.1 million in 2003, and it is projected to increase to 36.6 million by 2030; 90-95% of these cases are adults with type 2 diabetes. Diabetes affects men and women proportionately: there are over 12 million men and 11.5 million women with diabetes. Predicting diabetes manually is therefore sometimes not objective, and it consumes a lot of time and money. Diabetes treatment focuses on controlling blood sugar levels through diet and exercise to prevent various symptoms and complications.

Data mining is a relatively new concept used to retrieve information from large sets of data. Mining means processing the available data in such a way that it becomes useful for decision-making. Data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science and statistics whose overall goal is to extract information (with intelligent methods) from a data set and transform it into a comprehensible structure for further use. Data mining has thus evolved from human needs: it helps people identify relationship patterns and make forecasts based on pre-set rules and stipulations built into the program (Eapen, 2004). Data mining supports pattern identification and the categorization of data records through cluster analysis, the identification of unusual records (anomaly detection), and association rule mining (dependencies).

www.rsisinternational.org Page 103


Frawley and Piatetsky (1996) describe data mining as the process of extracting implicit, previously undisclosed, and important information from data sets that can be used for effective decision-making. The process is termed Knowledge Discovery in Databases (KDD). Such discovered knowledge can be very useful in many areas of science, and health care is no different: Knowledge Discovery in Databases would help in predicting trends in many kinds of diseases and illnesses. Rather than depending only on their own knowledge and experience, doctors can use data mining, and specifically KDD, to forecast and predict trends that lead to better diagnoses, reduce costs, and save person-hours for the organization. Data mining lies at the interface of statistics, database technology, pattern recognition, machine-readable data, and intelligent expert systems (Obenshain, 2004). The prime objective of data mining is to extract information from data sources and transform it into a comprehensible structure for further use (Data Mining Curriculum, 2014). Data mining is used to locate correlations between data and to form patterns of relationships among clustered fields in large interactive databases (Extract - Nature Biotechnology 18, 2000).

With data mining techniques, doctors around the world will be able to predict illnesses effectively and be better equipped to manage potential high-risk candidates. Such analysis and prediction become critical if the objective is to provide relief to millions of people around the world.

This research addresses the main challenge confronting the health care industry: the lack of quality service at minimal cost, from diagnosing patients to predicting their condition correctly (Chaurasia and Pal, 2013), or sometimes even understanding the complications that may result from the disease (Srinivas et al., 2010). This issue can lead to poor clinical decisions with devastating and unacceptable consequences (Apte and Dangare, 2012). The availability of patients' medical data has created a need, for clinicians and patients alike, for alternative computer-based assessment tools that can assist in decision-making (Soni et al., 2011). For example, physicians can compare analytical information from numerous patients with matching conditions, and can confirm their results against those from other parts of the country (Srinivas et al., 2010).

This research applies the naïve Bayes classification technique to a dataset obtained from Fudawa Health Centre, Jos, Plateau State, Nigeria. The dataset was preprocessed with the Weka tool to remove noise and null fields, and it was then divided into a training dataset and a test dataset. The following parameters were used to detect diabetes and classify it into positive and negative classes: age, insulin, smoke cigarette, age first smoked, and where the survey was taken.

Support vector machine (SVM) is a method that uses concepts from computer science and statistics to analyze data and support pattern recognition; the resulting model is then used for classification, making predictions on a set of inputs where, for each given input, there are two possible classes (Madzov et al., 2009). SVM is designed on the principle of structural risk minimization, with the basic idea of finding the hypothesis with the lowest error. However, the drawback of this learner is that its computation is highly expensive, so it runs slowly on large datasets, and it does not provide probability estimates directly. It also does not perform very well on large datasets because a long training time is required.

Naïve Bayes classification is simple and particularly suited to problems where the dimensionality of the input is high. Despite its simplicity, it can outperform more sophisticated classification methods. This classifier works on the assumptions that the data are categorical in nature and that the occurrences of attributes are independent, and it predicts accurately on high-volume datasets.

II. RELATED CONCEPTS

The clinical presentation of diabetes is the set of symptomatic features presented by the patient. These features indicate the cause of the disease and directly guide clinicians in the decisions they take. To classify patients as diabetes-positive or diabetes-negative, the following parameters were considered: age, insulin, smoking, age first smoked, and where the survey was taken.

• Positive class label (P): a patient is confirmed to be diabetes-positive when the patient has one or more symptoms of diabetes and this has also been confirmed by laboratory tests, since these features are the most frequently occurring symptoms in patients with diabetes.
• Negative class label (N): a patient may have some of the parameters (symptoms) of positive diabetes, but after several confirmatory diagnostic tests, diabetes is not detected. This means that the signs may be the result of another concomitant disease.

2.1 Naïve Bayes Classifier

The naive Bayesian classifier is based on Bayes' theorem with independence assumptions between predictors. A naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets. Despite its simplicity, the naive Bayesian classifier often does surprisingly well and is widely used because it often outperforms more sophisticated classification methods. It assumes categorical data and independent occurrences of attribute values, and it predicts accurately on large datasets.

ALGORITHM

Let A be a training dataset. Suppose each tuple is represented by an n-dimensional attribute vector $X = (x_1, x_2, \ldots, x_n)$, which represents $n$ measurements made on the tuple from $n$ attributes $B_1, B_2, \ldots, B_n$.

Suppose that there are $m$ classes $C_1, C_2, \ldots, C_m$. Given a tuple $X$, the classifier predicts that $X$ belongs to the class with the highest posterior probability conditioned on $X$; that is, $X$ belongs to class $C_k$ if and only if

$$P(C_k \mid X) > P(C_i \mid X) \quad \text{for } 1 \le i \le m,\ i \ne k.$$

For two classes, the algorithm compares

$$P(C_1 \mid X) = \frac{P(X \mid C_1)\,P(C_1)}{P(X)} \quad \text{and} \quad P(C_2 \mid X) = \frac{P(X \mid C_2)\,P(C_2)}{P(X)}.$$

Hence,

$$P(C_1 \mid X) > P(C_2 \mid X) \iff P(X \mid C_1)\,P(C_1) > P(X \mid C_2)\,P(C_2),$$

since $P(X)$ is the same in both cases.

Given a dataset with many attributes, it would be expensive to compute $P(X \mid C_1)$ directly. Therefore, we assume that the values of the attributes are conditionally independent given the class. Thus,

$$P(X \mid C_1) = \prod_{j=1}^{n} P(x_j \mid C_1) = P(x_1 \mid C_1)\,P(x_2 \mid C_1) \cdots P(x_n \mid C_1).$$

The advantages of the naïve Bayes classifier are as follows:
• It can approximate the probability of each class for any given instance, and it is relatively simple.
• It requires little model training time.
• It performs well in the presence of irrelevant features.

2.2 The Knowledge Discovery in Databases (KDD)

Knowledge Discovery in Databases (KDD) is the procedure used to obtain important and useful knowledge from a large collection of previously collected data. The process involves selecting, preparing, and cleansing the data to remove unnecessary information. Any previously available information is incorporated into the data sets. Data interpretation is then conducted to obtain precise outcomes from the available results, as shown in Figure 1.

Figure 1: KDD Process (Courtesy Maimon and Rokach, 2010)
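The naïve Bayes decision rule from Section 2.1 can be sketched in a few lines of code. The following is a minimal Python illustration, not the authors' Java/Weka implementation: it counts class and attribute-value frequencies, scores each class by log P(C) plus the sum of log P(x_j | C), and normalizes the scores to obtain posteriors (the normalization plays the role of P(X)). The add-one (Laplace) smoothing, the class and method names, the attribute names, and the toy records are all assumptions made for illustration; none of them come from the Fudawa dataset.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class CategoricalNaiveBayes:
    """Categorical naive Bayes: predict argmax_C P(C) * prod_j P(x_j | C).

    Add-alpha (Laplace) smoothing is an assumption of this sketch; it keeps an
    attribute value never seen with a class from zeroing out the whole product.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        # (class, attribute) -> Counter of observed attribute values
        self.value_counts: Dict[tuple, Counter] = defaultdict(Counter)
        # attribute -> set of values observed in training (smoothing denominator)
        self.domains: Dict[str, set] = defaultdict(set)

    def fit(self, records: List[Dict[str, str]], labels: List[str]) -> "CategoricalNaiveBayes":
        for record, label in zip(records, labels):
            self.class_counts[label] += 1
            for attr, value in record.items():
                self.value_counts[(label, attr)][value] += 1
                self.domains[attr].add(value)
        return self

    def _log_joint(self, record: Dict[str, str], label: str) -> float:
        """log P(C) + sum_j log P(x_j | C); P(X) is omitted because it is shared."""
        n_c = self.class_counts[label]
        log_p = math.log(n_c / sum(self.class_counts.values()))
        for attr, value in record.items():
            count = self.value_counts[(label, attr)][value]
            k = len(self.domains[attr])
            log_p += math.log((count + self.alpha) / (n_c + self.alpha * k))
        return log_p

    def posterior(self, record: Dict[str, str]) -> Dict[str, float]:
        """Normalised P(C | X); dividing by the sum plays the role of P(X)."""
        logs = {c: self._log_joint(record, c) for c in self.class_counts}
        top = max(logs.values())  # subtract the max for numerical stability
        weights = {c: math.exp(v - top) for c, v in logs.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def predict(self, record: Dict[str, str]) -> str:
        logs = {c: self._log_joint(record, c) for c in self.class_counts}
        return max(logs, key=logs.get)


# Hypothetical toy records (not the Fudawa data); labels: P = positive, N = negative.
train_X = [
    {"takes_insulin": "yes", "smokes": "yes", "survey_location": "home"},
    {"takes_insulin": "yes", "smokes": "no", "survey_location": "office"},
    {"takes_insulin": "yes", "smokes": "yes", "survey_location": "office"},
    {"takes_insulin": "no", "smokes": "no", "survey_location": "home"},
    {"takes_insulin": "no", "smokes": "yes", "survey_location": "office"},
]
train_y = ["P", "P", "P", "N", "N"]

model = CategoricalNaiveBayes(alpha=1.0).fit(train_X, train_y)
patient = {"takes_insulin": "yes", "smokes": "yes", "survey_location": "home"}
print(model.predict(patient))  # P
print(model.posterior(patient))
```

Working in log space is a design choice: with many attributes, multiplying many small probabilities can underflow, while summing logarithms preserves the same argmax as the product in Section 2.1.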

2.3 Data Mining Techniques

There are two types of data mining task: the predictive model and the descriptive model. Both are explained as follows.

Predictive Model
The predictive data mining model predicts future outcomes based on past records in the database with known answers. For example, data mining can help estimate the future credit risk of an applicant and predict the applicant's future credit history from past data. Classification is the procedure used to find a model that best fits identified data sets or concepts; the model helps predict the class of objects whose class labels are not available. The resulting model is focused on analyzing a set of identified classes. Regression is a mathematical and statistical tool widely used with numeric values for forecasting and time-series analysis. Prediction, as the term implies, means correctly envisioning the future through logical computation on the available data.

Descriptive Model
This model discovers patterns in the data and helps in understanding the relationships between data attributes. A descriptive model represents and summarizes the main features of the data. The knowledge gathered can be used, for example, to develop marketing programs for a target audience. Clustering examines data objects without referring to an identified class label. Summarization characterizes the distinctive properties of the data and points out whether data values should be categorized as noise or outliers.

This research classifies data mining tasks as shown in Figure 2.

Figure 2: Data Mining Tasks

III. METHODOLOGIES AND PREDICTION FRAMEWORK

3.1 Prediction Framework

The framework uses clinical parameters of diabetes to classify diabetes in a patient. The general steps involved are data collection, data pre-processing, and classification and prediction.
• Data collection: data on patients with diabetes is collected.
• Data pre-processing: noise and null fields are removed.
• Classification and prediction: the naïve Bayes classifier is used to classify the dataset into categories.

The framework works as follows:
1. Data collection and preprocessing are carried out.
2. The preprocessed data is stored as the training dataset.
3. The test dataset is stored in the database as a separate test dataset. The test dataset is then classified into positive and negative class labels: if a patient has diabetes, the patient is classified as positive (P), while a patient who does not have diabetes is classified as negative (N), as shown in Figure 3.

(Figure 3 flowchart boxes: Data collection; Data pre-processing; Training Dataset; Naïve Bayes classifier; Test Dataset; Positive; Negative)
Figure 3: Framework for Predicting Diabetes
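The pre-processing and splitting steps of the framework in Section 3.1 (remove records with null fields, then divide the rest into training and test datasets) can be sketched as follows. This is a hypothetical Python illustration; the paper performed these steps with the Weka tool. The field names only mirror the parameters listed in Section II, and the 50/50 split, the fixed random seed, and the sample records are assumptions. The field age_first_smoked is deliberately not required, on the assumption that it is legitimately empty for non-smokers.

```python
import random
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical field names mirroring the parameters in Section II. age_first_smoked
# is not required, assuming it is legitimately empty for non-smokers.
REQUIRED_FIELDS = ("age", "takes_insulin", "smokes", "survey_location", "label")


def preprocess(records: List[Record]) -> List[Record]:
    """Keep only records whose required fields are all present and non-empty."""
    return [r for r in records if all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS)]


def train_test_split(
    records: List[Record], test_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy of the records and split them into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample: six complete records plus two with null fields.
raw: List[Record] = [
    {"id": str(i), "age": str(30 + 5 * i), "takes_insulin": "yes" if i % 2 else "no",
     "smokes": "no", "age_first_smoked": None, "survey_location": "home",
     "label": "P" if i % 2 else "N"}
    for i in range(6)
]
raw.append({"id": "6", "age": None, "takes_insulin": "no", "smokes": "yes",
            "age_first_smoked": "19", "survey_location": "office", "label": "N"})
raw.append({"id": "7", "age": "52", "takes_insulin": "", "smokes": "no",
            "age_first_smoked": None, "survey_location": "home", "label": "P"})

clean = preprocess(raw)
train, test = train_test_split(clean, test_fraction=0.5)
print(len(clean), len(train), len(test))  # 6 3 3
```

Shuffling a copy with a seeded random.Random keeps the split reproducible and leaves the cleaned list untouched; records 6 and 7 are dropped because a required field is null or empty.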

3.2 Using the Naïve Bayes Classifier in the Study

The naïve Bayes method discussed in Section 2 is applied to our problem as follows. Using the naïve Bayes formula,

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

where H is the class and E is the predictor (the evidence);
P(H | E) is the posterior probability of the class given the predictor;
P(H) is the prior probability of the class;
P(E | H) is the probability of the predictor given the class; and
P(E) is the prior probability of the predictor.

The diabetes class is determined as follows:
• Positive class (P): a patient may have diabetes if, given the selected features, the probability of the positive class is greater than that of the negative class, where
$$P(\text{positive} \mid \text{patient}) = \frac{P(\text{patient} \mid \text{positive})\,P(\text{positive})}{P(\text{patient})}$$
• Negative class (N): a patient may not have diabetes if, given the selected features, the probability of the negative class is greater than that of the positive class, where
$$P(\text{negative} \mid \text{patient}) = \frac{P(\text{patient} \mid \text{negative})\,P(\text{negative})}{P(\text{patient})}$$

Table 1: Parameters used for prediction

S/N | Parameter | Description | Allowed values
1 | Age | Age of the subject | Discrete integer value
2 | Take insulin | Takes any drug or injection that can prevent diabetes | Yes or No
3 | Smoke cigarette | Whether the subject smokes cigarettes | Yes or No
4 | Age first smoked | Age at which the subject started smoking | Discrete integer value
5 | Where did you take the survey? | Where the subject took the survey | Home or Office

3.3 The Proposed Application Software

This research applied a data mining technique to predict heart disease and diabetes risks for individual patients of Fudawa. It used the mining comparison prediction algorithm together with the patient dataset attributes that affect the prediction. The program was developed using the Java programming language, with MySQL as the database.

3.4 Requirement Analysis of the Proposed System

Requirement analysis is the process of determining user expectations for a new or modified system. The requirements for a system are descriptions of what the system should do, the services it provides, and the constraints on its operation. System requirements are classified into two types.

3.4.1 Functional Requirements

A functional requirement describes what a software system should do. Functional requirements also specify the operations and activities that a system must be able to perform.

Functional requirements should include:
• Descriptions of the data to be entered into the system
• Descriptions of the workflows performed by the system
• Descriptions of system reports or other outputs

Some of the functional requirements of the proposed system are:
i. The proposed system will provide a platform to analyze datasets for new patients.
ii. The proposed system will measure prediction accuracy on the dataset.

3.4.2 Non-Functional Requirements

Non-functional requirements, as the name suggests, are requirements that are not directly concerned with the specific services delivered by the system to its users. A non-functional requirement specifies criteria that can be used to judge the operation of a system, rather than specific behaviors.

3.5 System Design and Modeling

Software design is a creative activity in which software components and their relationships are identified based on the requirements. It is the process of defining the component modules, interfaces, and architecture of the system to satisfy user requirements. The system was modeled using Unified Modeling Language (UML) objects and components. The diagrams used in the design of the proposed system include the data flow diagram (DFD), the class diagram, and the activity diagram, as shown in Figure 4 and Figure 5.
classified in two types.

Figure 4: Data Flow Diagram for the Proposed System


Figure 5: Unified Modeling Language (UML) Diagram for the Proposed System

IV. EXPERIMENTAL RESULTS

A portion of the real data was used to train the model. We have only one training set for classifying patients as diabetes-positive or diabetes-negative, using the naïve Bayes classification discussed in Section 3.

A total of 155 cleaned, preprocessed records were collected and stored in a database, here called diabetes. All 155 records were used to train the model in the classification phase. During performance testing, a sample of 50 records was drawn from the initial 155 as a validation set.

In this study, we check the accuracy of the naïve Bayes classifier using a confusion matrix.

4.1 Confusion Matrix

A confusion matrix summarizes the performance of a classification algorithm. It shows how accurately a solution handles a given classification problem by recording the predicted and actual classifications made by a classifier. The performance of the model is normally evaluated using the data in the confusion matrix. In this study, we achieved 90-95% accuracy of correctly classified instances in the classification phase.

Table 2: Confusion Matrix

N = 155     | Predicted: No | Predicted: Yes | Total
Actual: No  | TN = 45       | FP = 15        | 60
Actual: Yes | FN = 5        | TP = 90        | 95
Total       | 50            | 105            | 155

In the table above there are two predicted classes: "yes" and "no". The classifier made a total of 155 predictions on the 155 cases, predicting "yes" 105 times and "no" 50 times.

• Sensitivity (true positive rate, TPR) = TP / (TP + FN) = 90 / (90 + 5) = 90/95 ≈ 0.95, i.e. 95%
• Specificity (true negative rate, TNR) = TN / (TN + FP) = 45 / (45 + 15) = 45/60 = 0.75, i.e. 75%
• False positive rate (FPR) = FP / (TN + FP) = 15 / (45 + 15) = 15/60 = 0.25, i.e. 25%
• False negative rate (FNR) = FN / (FN + TP) = 5 / (5 + 90) = 5/95 ≈ 0.05, i.e. 5%
• Precision = TP / (TP + FP) = 90 / (90 + 15) = 90/105 ≈ 0.86, i.e. 86%
• Accuracy = (TP + TN) / N = (90 + 45) / 155 = 135/155 ≈ 0.87, i.e. 87%
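Every metric above follows directly from the four counts in Table 2, so it can be recomputed mechanically. The Python sketch below is illustrative (the class and method names are our own, not from the paper) and simply applies the standard formulas. Recomputing from the table's counts gives an accuracy of 135/155 ≈ 0.871 and a sensitivity of 90/95 ≈ 0.947.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix counts (positive class = 'yes')."""

    tp: int
    fn: int
    fp: int
    tn: int

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self) -> float:  # TPR = TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # TNR = TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:  # FP / (TN + FP) = 1 - specificity
        return self.fp / (self.tn + self.fp)

    def false_negative_rate(self) -> float:  # FN / (FN + TP) = 1 - sensitivity
        return self.fn / (self.fn + self.tp)

    def precision(self) -> float:  # TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:  # (TP + TN) / N
        return (self.tp + self.tn) / self.n


# Counts taken from Table 2.
cm = ConfusionMatrix(tp=90, fn=5, fp=15, tn=45)
for name in ("sensitivity", "specificity", "false_positive_rate",
             "false_negative_rate", "precision", "accuracy"):
    print(f"{name:>20}: {getattr(cm, name)():.3f}")
# Prints, one per line: sensitivity 0.947, specificity 0.750,
# false_positive_rate 0.250, false_negative_rate 0.053,
# precision 0.857, accuracy 0.871
```

Because FPR and FNR are the complements of specificity and sensitivity, only four independent numbers (the cells of Table 2) determine all six figures.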


4.2 Hardware Requirements

The most common set of requirements defined by any operating system or software application is the physical computer resources, also known as hardware. A hardware requirements list is often accompanied by a hardware compatibility list (HCL), especially in the case of operating systems.

The minimum hardware requirements of this system are:
1. A keyboard
2. A UPS (uninterruptible power supply)
3. A pointing device
4. A VGA (Video Graphics Array) display adapter
5. At least 64 MB of RAM (random access memory)
6. A Pentium processor (or equivalent) at common speeds of 1.27 MHz or above

4.3 Software Requirements

Software requirements define the software resources and prerequisites that must be installed on a computer for an application to function optimally. These prerequisites are generally not included in the software installation package and need to be installed separately before the software is installed. The system requires the following software:
1. Windows operating system (OS), such as Windows 7, Windows 8, Windows 8.1, or Windows 10
2. Weka tool
3. NetBeans integrated development environment (IDE)
4. Java Development Kit (JDK)

4.4 Choice of Tools

The following tools are used for the application.

Front End: A front-end system is the part of an information system that the user directly accesses and interacts with to receive or use the back-end capabilities of the host system. It enables users to access and request the features and services of the underlying information system. The front end can be a software application or a combination of hardware, software, and network resources. The following tools were used for the front end of the application:
1. Java programming language
2. Weka tool
3. Naïve Bayesian classifier
4. Java Development Kit

Back End: The back end of an application or program serves indirectly in support of the front-end services, usually by being closer to the required resource or having the capability to communicate with it. The back end may interact directly with the front end or, perhaps more typically, be called from an intermediate program that mediates front-end and back-end activities. MySQL is used as the database application, and WampServer was used as the testing server.

4.5 System Implementation

In the course of the research, we tested the system using the NetBeans development environment. The diabetes prediction system was implemented successfully, although many challenges were encountered, such as errors during execution, financial problems, and delays in obtaining the information we needed. Despite these challenges, the system was implemented successfully by ensuring that all issues were corrected.

4.5.1 Software Performance Testing

Performance testing is generally executed to determine how a system or sub-system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate, or verify other quality attributes of the system, such as scalability, reliability, and resource usage, as shown in Figure 6.

Figure 6: Display of the software interface for predicting diabetes


Figure 7: Display from Machine Learning Algorithms for Test of Accuracy

V. CONCLUSION AND FUTURE WORK

5.1 Conclusion

An application using a class-comparison data mining algorithm has been developed to predict the occurrence or recurrence of diabetes risks. The results of the application show that the prediction system is capable of predicting diabetes effectively, efficiently, and, most importantly, in a timely manner. This means the application can help a physician make decisions about patient health risks. It generates results that are close to real-life situations, which makes data mining more helpful in the health sector and shows that knowledge discovery is needed in healthcare.

The naïve Bayes classifier-based system is very useful for the diagnosis of diabetes. The system can make good predictions with few errors, and this technique could be an important tool for supporting medical doctors in performing expert diagnosis. With this method, the forecasting efficiency was found to be around 95%.

Beyond large savings in medical expenses, lost working time, and the use of critical medical facilities, this application would be a tremendous asset for doctors, who can have structured, specific, and invaluable information about their patients so that they can ensure that their diagnoses or inferences are correct and professional.

Finally, the strong appreciation received from doctors for such software shows that, in a place like this, where diseases are on the rise, such applications should be developed to cover the entire state. The common person stands to benefit from doctors having such a tool, so that he or she can be better informed about personal health and wellbeing.

5.2 Future Work

Future work should focus on improving the accuracy of the prediction by increasing the amount of training data. Performance can be further improved by identifying and incorporating other parameters and by increasing the size of the training set.

REFERENCES

[1] Acharya, U. Rajendra and Yu, Wenwei (2010). Data Mining Techniques in Medical Informatics. The Open Medical Informatics Journal. PMCID: PMC2916206.
[2] Aflori, C. and Craus, M. (May 2007). Grid Implementation of the Apriori Algorithm. Advances in Engineering Software, 38(5), pp. 295-300.
[3] Anbarasi, M. (2010). 'Enhanced Prediction of Heart Disease with Feature Subset Selection using Genetic Algorithm', International Journal of Engineering Science and Technology, 2(10), pp. 5370-5376.
[4] Bronzino, Joseph D. (2006). Medical Devices and Systems.
[5] Chaurasia, V. and Pal, S. (2013). 'Data Mining Approach to Detect Heart Diseases', International Journal of Advanced Computer Science and Information Technology (IJACSIT), 2(4), pp. 56-66.
[6] Clifton, Christopher (2010). Encyclopedia Britannica: Definition of Data Mining. Retrieved 2016.
[7] Data Mining Curriculum. ACM SIGKDD, 2006-04-30. Retrieved 2016.
[8] Fayyad, Usama (15 June 1999). First Editorial by Editor-in-Chief. SIGKDD Explorations, 1(1). doi: 10.1145/2207243.2207269.
[9] Fayyad, Usama, Piatetsky-Shapiro, Gregory and Smyth, Padhraic (1996). From Data Mining to Knowledge Discovery in Databases.
[10] Han, J. and Kamber, M. (2010). Data Mining: Concepts and Techniques, 2nd ed. The Morgan Kaufmann Series.
[11] Han, Jiawei and Kamber, Micheline (2001). Data Mining: Concepts and Techniques, p. 5.
[12] Hastie, Trevor, Tibshirani, Robert and Friedman, Jerome (2009). The Elements of Statistical Learning.
[13] Hnin Wint Khaing (2011). "Data Mining based Fragmentation and Prediction of Medical Data", IEEE.
[14] Lee, A. J.T., Liu, Y.H., Tsai, H.Mu, Lin, H. Hui and Wu, H-W. (2009). 'Mining frequent patterns in image databases with 9DSPA Representation', Journal of Systems and Software, 82(4), pp. 603-618.
[15] Lazakidou, Athina A. and Siassiakos, Konstantinos M. (2009). 'Handbook of Research on Distributed Medical Informatics and E-Health'.
[16] Madzov, G., Gjorgevikj, D. and Chorbev, I. (2009). 'A multi-class SVM classifier utilizing binary decision tree', Informatica, 33(2), pp. 233-241.
[17] Maimon, Oded and Rokach, Lior (2010). Introduction to Knowledge Discovery in Databases. Tel Aviv University; Springer Science and Business Media.
[18] Mena, Jesus (2011). Machine Learning Forensics for Law Enforcement, Security and Intelligence. Boca Raton, FL: CRC Press. ISBN 978-1-4398-6069-4.
[19] Obenshain, Mary K. (2004). Application of Data Mining Techniques to Healthcare Data. Statistics for Hospital Epidemiology.
[20] Piatetsky-Shapiro, Gregory and Parker, Gary (2011). Lesson: Data Mining, Knowledge Discovery: An Introduction.
[21] Quinlan, J. (1986). Induction of Decision Trees. Machine Learning, 1, pp. 81-106.
[22] Reutemann, Peter and Witten, Ian H. (2010). WEKA Experiences with a Java Open-Source Project. Journal of Machine Learning Research, 11, pp. 2533-2541.
[23] Hoyt, Robert E. and Yoshihashi, Ann (2014). 'Health Informatics: Practical Guide for Health Care', 6th ed. See e.g. OKAIRP (2005) Fall Conference, Arizona State University: Data mining.
[24] Shanta Kumar, P. Patil and Kumaraswamy, Y.S. (2011). "Predictive data mining for medical diagnosis of heart disease prediction", IJCSE, 17.
[25] Srinivas, K. (2010). "Analysis of Coronary Heart Disease and Prediction of Heart Attack in Coal Mining Regions Using Data Mining Techniques", IEEE Transaction on Computer Science and Education (ICCSE), pp. 1344-1349.
[26] Witten, Ian H., Frank, Eibe and Hall, Mark A. (2011). Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Elsevier. ISBN 978-0-12-374856-0.
[27] World Health Day (2016). WHO calls for global action to halt rise in and improve care for people with diabetes. http://www.who.int/diabetes/global-report/WHD16-press-release-EN_3.pdf?ua=1
[28] Yoo, Illhoi, Alafaireet, Patricia, Marinov, Miroslav, Pena-Hernandez, Keila, Gopidi, Rejitha, Chang, Jia-Fu and Hua, Lei (2011). Data Mining in Healthcare and Biomedicine: A Survey of the Literature. Med Syst, Springer.
