The International Journal Of Engineering And Science (IJES)
|| Volume || 5 || Issue || 10 || Pages || PP 35-39 || 2016 ||
ISSN (e): 2319 – 1813 ISSN (p): 2319 – 1805
www.theijes.com The IJES Page 35
Applying K-Means Clustering Algorithm to Discover Knowledge
from Insurance Dataset Using WEKA Tool
Dr. Abdelrahman Elsharif Karrar¹, Marwa Abdelhameed Abdalrahman², Moez Mutasim Ali³
¹ College of Computer Science and Engineering, Taibah University, Saudi Arabia
² College of Computer Science and Information Technology, University of Science and Technology, Sudan
³ College of Computer Science and Information Technology, University of Science and Technology, Sudan
--------------------------------------------------------ABSTRACT-------------------------------------------------------------
Data mining extracts previously unknown information from enormous quantities of data, information that can
lead to knowledge and help in making good decisions. Its effectiveness lies in discovering the hidden facts
contained in databases through the use of multiple techniques. Clustering organizes data into clusters or groups
such that they have high intra-cluster similarity and low inter-cluster similarity. This paper deals with the
K-means clustering algorithm, which groups data objects based on their characteristics and attributes and forms
the clusters by minimizing the distances between the data and the cluster centers. The algorithm is applied using
an open-source tool called WEKA, with the insurance dataset as its input.
Keywords: Clustering, Centroid, Data Mining, knowledge discovery, K-means, WEKA.
---------------------------------------------------------------------------------------------------------------------------------------
Date of Submission: 17 May 2016 Date of Accepted: 05 November 2016
--------------------------------------------------------------------------------------------------------------------------------------
I. INTRODUCTION
The present age is characterized by the use of advanced technology to store and retrieve enormous quantities of
data, a practice described as data warehousing. These data have opened the door to a range of specialized subjects
concerned with their management, the most prominent of which is data mining, one of the important methods for
getting useful information from data. Data mining has been defined as part of the process of knowledge discovery
in databases, carried out using multiple methods whose goal is to build models of the data. [1]
Data mining techniques are designed to extract hidden information from databases. This modern technology has
established itself firmly in the information age: in light of great technological development and the widespread
use of databases, it offers institutions in all areas the ability to explore and focus on the most important
information in their databases. Data mining techniques focus on predicting the future and exploring behavior and
trends, allowing correct decisions to be taken at the right time. They can also answer many questions in record
time, especially questions that are difficult to answer using traditional statistical methods. [2]
II. DATA MINING
Data mining is, simply, the extraction of important information from a huge amount of data by following certain
analysis mechanisms; it is also a technique used for extracting data from data warehouses. Data mining passes
through a number of stages: data cleansing, data standardization, selection of the relevant data, transformation,
classification, evaluation, and finally extraction of the results. [3]
There are many definitions of the concept of data mining:
• A computerized search for knowledge in data without prior assumptions about what that knowledge might be.
• The process used by companies to convert raw data into useful information.
• The analysis of large sets of data and their summarization in new forms that are understandable and useful to
their users; it is applied in banks, telecommunications companies, commercial transactions and scientific data
(biology, astronomy).
• The analysis of data by combining artificial intelligence techniques with statistical processes; simply, a
process of exploring and searching for specific and useful information in a huge amount of data.
• The search for related information, grouped by common characteristics and linked by a single subject or
specialization, within a very large amount of unrelated information, so that it can be presented to the
decision maker. [3]
III. TECHNIQUES OR METHODS OF DATA MINING
The data mining process uses several techniques through which hidden trends and models can be discovered in
large amounts of data. One or more of the following techniques can be used:
• Prediction: Using the available data and applying certain techniques to estimate their future values.
• Description: Describing the available data in order to characterize them through the relationships that exist
between them.
• Classification: Analyzing a set of data to build a model that can be used to classify any future data according
to common characteristics. Classification has many tools, such as decision trees, nearest neighbor and
regression.
• Association: Finding relationships among a group of objects in the database, that is, an association between
the occurrence of one event and the occurrence of another. It is often called market basket analysis.
• Sequential Analysis: Similar to association, but the linked events are analyzed over time; it searches for
patterns in data that occur in succession across separate cases.
• Clustering: Grouping data is a simple idea and very close to the human way of thinking: whenever we deal with
a large amount of data, we tend to summarize it into a small number of groups or categories in order to
facilitate analysis. Clustering algorithms are used widely not only for organizing and classifying data but
also for data compression and model building. Clustering is the process of collecting objects or items that
have similar qualities and attributes into groups called clusters. It is one of the main approaches in data
mining and can be used as a standalone tool to gain insight into the distribution of the data, to observe the
characteristics of each group, and to focus on a specific set of groups for further analysis. It can also serve
as a preliminary step for other techniques such as characterization and classification. [4]
IV. K-MEANS ALGORITHM
K-means is one of the clustering algorithms. It groups data objects based on their characteristics and attributes,
and forms the clusters by minimizing the distances between the data and the cluster centers. The steps of the
algorithm are:
1. Determine the number of clusters K; this is the initialization step.
2. Determine the coordinates of the cluster centers (centroids): randomly the first time, and as the average of
the points that belong to each center on subsequent passes.
3. Calculate the distance between each example and every center, using the Euclidean distance. The Euclidean
distance $d_{ij}$ between two examples $i$ and $j$ is given by the following relationship, where $x_{ik}$ is the
value of attribute $k$ for example $i$ and $n$ is the number of attributes:
$$d_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$
4. Assign each example to its nearest center.
5. Repeat steps 2 through 4 until the clusters are stable (no objects move between clusters), or until a set
number of iterations has been reached. [5]
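The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation WEKA uses: the function names, the seeded choice of initial centroids from the examples themselves, and the toy data are all assumptions made for the sketch.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Centroids, first pass: pick k distinct examples at random (an assumption).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Distance and assignment: each example goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Stability check: stop when no example changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Centroids, later passes: move each one to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
print(labels)
print(sorted(centroids))
```

On this toy data the first three points end up in one cluster and the last three in the other, whichever two examples are drawn as the starting centroids.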
Before presenting the results, the data used in this study must be described. This paper applies the K-means
algorithm to the data of an insurance company in order to identify the best payment methods that can be made
available to customers, as the company faced challenges and difficulties that limited its ability to deal with
payment methods. This was done using the WEKA program.
V. WEKA
WEKA is a data mining system developed by the University of Waikato in New Zealand that implements data
mining algorithms in the Java language. WEKA is a state-of-the-art facility for developing machine learning
(ML) techniques and applying them to real-world data mining problems. It is a collection of machine learning
algorithms for data mining tasks, and the algorithms are applied directly to a dataset. [6]
WEKA implements algorithms for data preprocessing, classification, regression, clustering and association
rules. It also includes visualization tools. [6]
To perform cluster analysis in WEKA, the dataset must be loaded into WEKA in CSV or ARFF file format. If the
dataset is not in ARFF format, it needs to be converted.
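To make the conversion concrete, the following Python sketch turns a small CSV text into ARFF text. It is only an illustration of the file layout (an `@relation` line, one `@attribute` line per column, then `@data`); in practice WEKA's own converters would normally be used. The function name, the relation name and the sample columns are hypothetical and are not taken from the paper's dataset, and the sketch ignores missing values and names that need quoting.

```python
import csv
import io


def csv_to_arff(csv_text, relation="insurance"):
    """Convert CSV text with a header row into ARFF text.

    Columns whose values all parse as numbers become `numeric` attributes;
    all other columns become nominal attributes listing their distinct values.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        column = [r[i] for r in records]
        if all(is_number(v) for v in column):
            lines.append(f"@attribute {name} numeric")
        else:
            values = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"]
    lines += [",".join(r) for r in records]
    return "\n".join(lines) + "\n"


# Hypothetical sample; the column names are not taken from the paper's dataset.
sample = "age,payment_method\n34,cash\n51,installment\n29,cash\n"
arff_text = csv_to_arff(sample)
print(arff_text)
```

The numeric column is declared as `numeric`, while the text column becomes a nominal attribute whose allowed values are listed in braces.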
Figure 1: WEKA GUI
Figure 2: Customer Data in the Insurance Company
Figure 3: Choose the Data File
Figure 4: Choosing the Attributes
Figure 5: Cluster mode
Figure 6: Clustered output
VI. RESULT DISCUSSION
Studying the output report, we can see four clusters (cluster 0, cluster 1, cluster 2, cluster 3). In the report
this appears as:
Number of clusters selected by cross validation: 4
The first column gives the overall population centroid. The second, third, fourth and fifth columns give the
centroids for clusters 0, 1, 2 and 3 respectively. Each row gives the centroid coordinate for a specific
dimension.
Note that finding the centroids is an essential part of the algorithm. The centroids are the result of a specific
run of the algorithm and are not unique; a different run may generate a different set of centroids.
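The dependence of the centroids on the run can be seen in a small Python sketch. Instead of random starts it uses two hand-picked sets of initial centroids, so the outcome is repeatable; the data (four corners of a rectangle) and the function names are hypothetical and chosen only to show that both results are stable yet different.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def run_k_means(points, centroids, max_iter=50):
    """Run k-means from the given starting centroids; returns final centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no point changed cluster: stable
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return sorted(centroids)


# Hypothetical data: the four corners of a 4-by-1 rectangle.
corners = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

left_right = run_k_means(corners, [(0.0, 0.0), (4.0, 0.0)])
top_bottom = run_k_means(corners, [(0.0, 0.0), (0.0, 1.0)])
print(left_right)  # [(0.0, 0.5), (4.0, 0.5)]
print(top_bottom)  # [(2.0, 0.0), (2.0, 1.0)]
```

Both runs stop because no point changes cluster, but the first splits the corners into left and right pairs while the second splits them into bottom and top pairs. This is why clustering tools are often run several times with different starting points.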
For each cluster there is a “cluster prior” probability. The estimators consist of a number for each possible
attribute value, and the attribute values are treated in order.
• Cluster 0 has a total of 4 objects (6% of the dataset).
• Cluster 1 has a total of 22 objects (32% of the dataset).
• Cluster 2 has a total of 20 objects (29% of the dataset).
• Cluster 3 has a total of 22 objects (32% of the dataset).
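As a quick arithmetic check, the cluster sizes listed above sum to 68 objects, and the figures given in parentheses match each cluster's share of that total rounded to the nearest whole percent. The short Python sketch below reproduces the calculation; the variable names are illustrative.

```python
# Cluster sizes as listed in the output report above.
sizes = {"Cluster 0": 4, "Cluster 1": 22, "Cluster 2": 20, "Cluster 3": 22}
total = sum(sizes.values())  # 68 objects in all

# Share of each cluster, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in sizes.items()}
for name, n in sizes.items():
    print(f"{name}: {n} objects ({shares[name]}%)")
```

Because each share is rounded separately, the four percentages add up to 99 rather than 100.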
VII. CONCLUSION
Clustering organizes data into clusters or groups such that they have high intra-cluster similarity and low
inter-cluster similarity. K-means is one of the clustering algorithms; it groups data objects based on their
characteristics and attributes, and forms the clusters by minimizing the distances between the data and the
cluster centers. WEKA, an open-source tool, was used to apply the K-means algorithm to the insurance dataset.
REFERENCES
[1]. Daniel T. Larose, “Data Mining Methods and Models”, 2011.
[2]. G. K. Gupta, “Introduction to Data Mining with Case Studies”, 2006.
[3]. Gregory Piatetsky, “From Data Mining to Knowledge Discovery: An Introduction”, 2012.
[4]. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques (Second Edition)”, 2014.
[5]. Subhash Sharma, Ajith Kumar, “Cluster Analysis and Factor Analysis”, 2014.
[6]. Yizhou Sun, “An Introduction to WEKA”, 2008.
MahaveerVPandit
 
The Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLabThe Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLab
Journal of Soft Computing in Civil Engineering
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...
IJCSES Journal
 
IntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdfIntroSlides-April-BuildWithAI-VertexAI.pdf
IntroSlides-April-BuildWithAI-VertexAI.pdf
Luiz Carneiro
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)QA/QC Manager (Quality management Expert)
QA/QC Manager (Quality management Expert)
rccbatchplant
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
DSP and MV the Color image processing.ppt
DSP and MV the  Color image processing.pptDSP and MV the  Color image processing.ppt
DSP and MV the Color image processing.ppt
HafizAhamed8
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
some basics electrical and electronics knowledge
some basics electrical and electronics knowledgesome basics electrical and electronics knowledge
some basics electrical and electronics knowledge
nguyentrungdo88
 
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptxExplainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
Explainable-Artificial-Intelligence-XAI-A-Deep-Dive (1).pptx
MahaveerVPandit
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)AI-assisted Software Testing (3-hours tutorial)
AI-assisted Software Testing (3-hours tutorial)
Vəhid Gəruslu
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfRICS Membership-(The Royal Institution of Chartered Surveyors).pdf
RICS Membership-(The Royal Institution of Chartered Surveyors).pdf
MohamedAbdelkader115
 
15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...15th International Conference on Computer Science, Engineering and Applicatio...
15th International Conference on Computer Science, Engineering and Applicatio...
IJCSES Journal
 

Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Dataset Using WEKA Tool

I. INTRODUCTION

The present age is characterized by the use of advanced technology to store and retrieve enormous quantities of data, a practice known as data warehousing. These data have opened the door to a range of specialized subjects concerned with managing them. The most prominent of these is data mining, one of the important methods for obtaining useful information from data. Scientists have defined data mining as part of the process of knowledge discovery in databases, carried out using multiple methods with the goal of building models of the data. [1]

Data mining techniques are designed to extract hidden information from databases. This modern technology has established itself firmly in the information age: in light of great technological development and the widespread use of databases, it offers institutions in all fields the ability to explore and focus on the most important information in their databases. Data mining techniques focus on predicting the future and exploring behavior and trends, which allows correct decisions to be taken at the right time. They can also answer many questions in record time, especially questions that are difficult to answer using traditional statistical methods. [2]

II. DATA MINING

Data mining is, simply put, the extraction of important information from a huge amount of data by applying certain analysis mechanisms; equivalently, it is a technique used to extract data from data warehouses.
Data mining passes through a number of stages: data cleansing, data standardization, selection of the relevant data, data transformation, classification, evaluation, and extraction. [3]

There are many definitions of the concept of data mining:

• A computerized search for knowledge in data, without prior assumptions about what that knowledge might be.
• The process used by companies to convert raw data into useful information.
• The analysis of large data sets and the summarization of the data in new forms that are understandable and useful to their users. It is applied in banks, telecommunications companies, commercial transactions, and scientific data (biology, astronomy).
• A data analysis process that combines artificial intelligence techniques with statistical methods; put simply, a process of exploring and searching for specific, useful information in a huge amount of data.
• The search for related information, grouped by common characteristics and linked by a shared subject or specialization, among very large amounts of unrelated information, so that it can be presented to the decision maker. [3]

III. TECHNIQUES OR METHODS OF DATA MINING

The data mining process uses several techniques to discover hidden trends and models in large amounts of data. One or more of the following techniques can be used:

• Prediction: using the available data and applying certain techniques to estimate future values.
• Description: describing the available data in order to characterize it through the relationships that exist within it.
• Classification: analyzing a set of data to build a model that can be used to classify any future data according to common characteristics. Classification has many tools, such as decision trees, nearest neighbor, and regression.
• Association: finding relationships among a group of objects in the database, that is, an association between the occurrence of one event and the occurrence of another. This is often called market basket analysis.
• Sequential analysis: similar to association, but the link is analyzed over time. It searches for patterns in successions of data that occur in separate cases.
• Clustering: grouping data is a simple idea, very close to the human way of thinking. Whenever we deal with a large amount of data, we tend to summarize it into a small number of groups or categories in order to make analysis easier.
Clustering algorithms are widely used not only to organize and classify data but also for data compression and model building. Clustering is the process of collecting objects or items with similar qualities and attributes into groups called clusters. It is one of the main approaches in data mining. It can be used as a standalone tool to gain insight into the distribution of the data, to observe the characteristics of each group, and to focus on a specific set of groups for further analysis. It can also serve as a preliminary step for other techniques such as characterization and classification. [4]

IV. K-MEANS ALGORITHM

K-means is one of the clustering algorithms. It groups data based on their characteristics and attributes, and it performs the clustering by minimizing the distances between the data points and the cluster centers. The steps of the algorithm are:

1. Determine the number of clusters K (the initialization step).
2. Determine the coordinates of the cluster centers (centroids): randomly the first time, and as the mean of the points that belong to each center on subsequent passes.
3. Calculate the distance between each example and every center, using the Euclidean distance. The Euclidean distance $d_{ij}$ between two examples $i$ and $j$ is given by:

$$d_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$$

   where $x_{ik}$ is the value of attribute $k$ for example $i$ and $n$ is the number of attributes.
4. Assign each example to its nearest center.
5. Repeat steps 2 through 4 until the clusters are stable (no objects move between clusters) or until a set number of iterations is reached. [5]

Before presenting the results, the data used in this study must be described. This paper applies the K-means algorithm to an insurance company's data in order to identify the best payment method to offer customers, since the company faced challenges and difficulties that limited its ability to deal with payment methods. The analysis was carried out using WEKA.

V. WEKA

WEKA is a data mining system developed by the University of Waikato in New Zealand that implements data mining algorithms in the Java language. WEKA is a state-of-the-art facility for developing machine learning (ML) techniques and applying them to real-world data mining problems. It is a collection of machine learning algorithms for data mining tasks. The algorithms are applied directly to a dataset. [6]
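As a rough illustration of what a clustering tool does internally, the K-means steps from Section IV can be sketched in plain Java, the language WEKA is written in. This is a minimal sketch and not WEKA's own implementation: the class `KMeansSketch` and its methods `distance`, `nearest`, and `cluster` are hypothetical names, the sample points are invented, and the initial centroids are passed in by the caller rather than drawn at random so that the run is repeatable.

```java
import java.util.Arrays;

/** Minimal K-means sketch following the five steps in Section IV. */
final class KMeansSketch {

    /** Euclidean distance d_ij between two examples with the same number of attributes. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Index of the centroid nearest to the given point (steps 3 and 4). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(point, centroids[c]) < distance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs K-means and returns the cluster index of every point.
     * The centroids array holds the initial guess and is updated in place.
     */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Steps 3-4: assign each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            // Step 5: stop once no point changes cluster.
            if (!changed) {
                break;
            }
            // Step 2 (later passes): move each centroid to the mean of its members.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] != c) {
                        continue;
                    }
                    for (int k = 0; k < sum.length; k++) {
                        sum[k] += points[i][k];
                    }
                    count++;
                }
                if (count == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int k = 0; k < sum.length; k++) {
                    centroids[c][k] = sum[k] / count;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Invented two-attribute examples forming two obvious groups.
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        double[][] centroids = {{1, 1}, {8, 8}}; // fixed start instead of a random one
        int[] labels = cluster(points, centroids, 100);
        System.out.println(Arrays.toString(labels));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Because the result depends on the starting centroids, a random start can yield different clusters from run to run; fixing the start here is purely a convenience for the example.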
WEKA implements algorithms for data preprocessing, classification, regression, clustering, and association rules. It also includes visualization tools. [6]

To perform cluster analysis in WEKA, the dataset must be loaded into WEKA in CSV or ARFF format. If the dataset is not in ARFF format, it needs to be converted.

Figure 1: WEKA GUI

Figure 2: Customer Data in the Insurance Company
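To make the ARFF layout concrete, the sketch below converts simple CSV text into ARFF text in plain Java. It is an illustration under stated assumptions, not WEKA's own converter: the class `CsvToArffSketch` and its methods are hypothetical, the column names `age` and `payment_method` are invented because the paper does not list the insurance attributes, and the CSV is assumed to have a header row, no quoted commas, no missing values, and no spaces inside values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: turn simple CSV text (header row, no quoted commas) into ARFF text. */
final class CsvToArffSketch {

    /** True when the value parses as a number. */
    static boolean isNumeric(String value) {
        try {
            Double.parseDouble(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Builds the ARFF header (@relation, @attribute) and the @data section. */
    static String convert(String relation, List<String> csvLines) {
        String[] header = csvLines.get(0).split(",");
        List<String[]> rows = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            rows.add(line.split(","));
        }
        StringBuilder arff = new StringBuilder("@relation " + relation + "\n\n");
        for (int col = 0; col < header.length; col++) {
            boolean numeric = true;
            Set<String> labels = new LinkedHashSet<>(); // keeps first-seen order
            for (String[] row : rows) {
                String value = row[col].trim();
                numeric &= isNumeric(value);
                labels.add(value);
            }
            // A column is numeric only if every value parses; otherwise it is nominal.
            String type = numeric ? "numeric" : "{" + String.join(",", labels) + "}";
            arff.append("@attribute ").append(header[col].trim())
                .append(' ').append(type).append('\n');
        }
        arff.append("\n@data\n");
        for (String[] row : rows) {
            arff.append(String.join(",", row)).append('\n'); // values copied as-is
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        // Hypothetical columns and values, for illustration only.
        List<String> csv = Arrays.asList(
            "age,payment_method", "34,cash", "51,installment", "29,cash");
        System.out.print(convert("insurance", csv));
    }
}
```

The output starts with an `@relation` line, lists one `@attribute` line per column (here `age numeric` and `payment_method {cash,installment}`), and ends with the rows under `@data`. Real data with spaces, quotes, or missing values would need extra handling that this sketch omits.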
Figure 3: Choose the Data File

Figure 4: Choosing the Attributes

Figure 5: Cluster mode
Figure 6: Clustered output

VI. RESULT DISCUSSION

The output report shows four clusters (cluster 0, cluster 1, cluster 2, and cluster 3). In the report this appears as:

Number of clusters selected by cross validation: 4

The first column gives the overall population centroid. The second through fifth columns give the centroids for clusters 0, 1, 2, and 3 respectively. Each row gives the centroid coordinate for a specific dimension. Finding the centroids is an essential part of the algorithm. The centroids are the result of a specific run of the algorithm and are not unique; a different run may generate a different set of centroids. For each cluster there is a “clusters prior” probability. The estimators consist of a number for each possible attribute value, and the attribute values are treated in order.

Of the 68 objects in the data set:

• Cluster 0 has a total of 4 objects (6% of the data set).
• Cluster 1 has a total of 22 objects (32% of the data set).
• Cluster 2 has a total of 20 objects (29% of the data set).
• Cluster 3 has a total of 22 objects (32% of the data set).

VII. CONCLUSION

Clustering organizes data into clusters or groups such that they have high intra-cluster similarity and low inter-cluster similarity. K-means is one of the clustering algorithms: it groups data based on their characteristics and attributes, and it performs the clustering by minimizing the distances between the data points and the cluster centers. WEKA, an open source tool, was used to apply the K-means algorithm to an insurance dataset.

REFERENCES

[1]. Daniel T. Larose, “Data Mining Methods and Models”, 2011.
[2]. G. K. Gupta, “Introduction to Data Mining with Case Studies”, 2006.
[3]. Gregory Piatetsky, “From Data Mining to Knowledge Discovery: An Introduction”, 2012.
[4]. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques (Second Edition)”, 2014.
[5]. Subhash Sharma, Ajith Kumar, “Cluster Analysis and Factor Analysis”, 2014.
[6]. Yizhou Sun, “An Introduction to WEKA”, 2008.