International Journal of Engineering Trends and Technology (IJETT) – Volume 38 Number 7- August 2016
ISSN: 2231-5381 http://www.ijettjournal.org Page 380
Data Mining Classification Comparison (Naïve
Bayes and C4.5 Algorithms)
Leni Marlina1, Muslim2, Andysah Putera Utama Siahaan3
Faculty of Computer Science
Universitas Pembangunan Panca Budi
Jl. Jend. Gatot Subroto Km. 4,5 Sei Sikambing, 20122, Medan, Sumatera Utara, Indonesia
Abstract - The development of data mining is inseparable from recent developments in information technology that enable the accumulation of large amounts of data. For example, a shopping mall records every sales transaction at its POS (point of sale) terminals. The database of these sales can reach a large storage size and grows every day, especially when the shopping center develops into a nationwide network. The growth of the internet also contributes substantially to the accumulation of data. However, this rapid accumulation has created a condition often described as "data rich but information poor", because the collected data cannot be used optimally for useful applications. Not infrequently, a data set is simply left unused, as if it were a "data grave". Several techniques are used in data mining, including association, classification, and clustering. In this paper, the authors compare the performance of two classification techniques, the naïve Bayes and C4.5 algorithms.
Keywords - Data mining, Classification, Naïve Bayes, C4.5
I. INTRODUCTION
The word "mining" refers to extracting something valuable from a large amount of base material. Data mining has a long history and draws on the literature of other disciplines such as artificial intelligence, statistics, and databases [1][4]. Techniques often mentioned in the data mining literature include clustering, classification, association rule mining, neural networks, genetic algorithms, and others. What distinguishes data mining from these existing techniques is that it applies them to large-scale databases, and working with data at that scale raises many new challenges that may ultimately lead to new methodologies [2][3]. Before data mining became as popular as it is today, such techniques could be applied only to small-scale data. The adoption of data mining in business has also led to its application in other fields that require large-scale data analysis, such as bioinformatics and defense. Several techniques are used in data mining, including association, classification, and clustering. Association rule mining is a data mining technique for discovering associative rules among combinations of items. Classification is the process of finding a model or function that describes and distinguishes classes of data, with the aim of estimating the class of an object whose label is unknown. Clustering groups data without relying on predefined classes. In this paper, the authors compare the performance of two classification techniques, Naïve Bayes and C4.5.
II. THEORIES
A. Data Mining
Put simply, data mining is the discovery of new information by looking for patterns or specific rules in very large amounts of data. Data mining is also described as a series of processes for extracting added value from a data set, in the form of knowledge that could not previously be found manually. Data mining is often referred to as knowledge discovery in databases (KDD) [6]. KDD is an activity that includes the collection and use of historical data to find regularities, patterns, or relationships in large data sets.
Several factors define data mining:
1. Data mining is the process of extracting added value from data collected in the past.
2. The object of data mining is data that is large in volume or complex.
3. The purpose of data mining is to find relationships or patterns that may provide a useful indication.
Data mining is not an entirely new field. One of the difficulties in defining data mining is that it inherits many aspects and techniques from fields of science that were established earlier.
B. Data Mining Techniques
Association
Association is also called market basket analysis. A typical business problem is to analyze a table of sales transactions and identify products that consumers often purchase together [5][6]. The association function is used to find relations or correlations among sets of items. Association rule mining often plays a role in marketing strategy, catalog design, and business decision-making. Put another way, it is a mining technique for finding the associative rules that exist among combinations of items. Traditionally, association rules are used to find business trends by analyzing consumer transactions [9]. The importance of an associative rule can be determined by two parameters: support, the percentage of transactions in the database that contain the combination of items, and confidence, the strength of the relationship between the items in the rule.
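The two parameters can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming the common definitions in which support is the fraction of transactions containing an item set and confidence is the support of the whole rule divided by the support of its antecedent. The function names and the transaction data are hypothetical and are not taken from the paper.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market-basket data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule under test: bread -> milk
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

In this toy data, bread and milk appear together in two of the four transactions, and two of the three transactions containing bread also contain milk.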
Classification
Classification assigns each case to a group. Each case contains a set of attributes, one of which is the class attribute. The method needs to find a model that explains the class attribute as a function of the input attributes. The decision tree is one of the most popular classification methods because it is easy for humans to interpret. In a decision tree, each branch states a condition that must be met, and each leaf states the class of the data. C4.5 is the best-known decision tree algorithm, but it is not able to handle very large data sets. The classification process is divided into two stages: learning and testing [7][10]. In the learning stage, part of the data whose classes are already known is used to build a model. In the testing stage, the model is tested on the remaining data to determine its accuracy. If the accuracy is sufficient, the model can be used to predict the class of unknown data.
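The two stages can be sketched as follows. This is only an illustration under stated assumptions: the "model" is a deliberately simple stand-in (a lookup from one attribute value to its most frequent class), not Naive Bayes or C4.5, and the function names and the labeled data are hypothetical.

```python
from collections import Counter, defaultdict


def learn(train_rows):
    """Learning stage: map each attribute value to its most frequent class."""
    counts = defaultdict(Counter)
    for value, label in train_rows:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def accuracy(model, test_rows, default=None):
    """Testing stage: fraction of held-out rows whose class is predicted correctly."""
    correct = sum(1 for value, label in test_rows if model.get(value, default) == label)
    return correct / len(test_rows)


# Hypothetical labeled data: (outlook, play) pairs.
labeled = [
    ("sunny", "no"), ("sunny", "no"), ("rainy", "yes"), ("rainy", "yes"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "no"), ("rainy", "no"),
]

# Simple hold-out split: the first five rows for learning, the rest for testing.
train_rows, test_rows = labeled[:5], labeled[5:]
model = learn(train_rows)
test_accuracy = accuracy(model, test_rows)
print(model, test_accuracy)
```

Keeping the test rows out of the learning stage is what makes the measured accuracy an estimate of how the model behaves on data it has not seen.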
Clustering
Clustering is also referred to as segmentation. This method identifies natural groups of cases based on a set of attributes, grouping together data with similar attributes. Clustering groups data without relying on predefined classes. Because the class labels are not known in advance, clustering is often classified as an unsupervised learning method [8]. The principle of clustering is to maximize the similarity among members of the same cluster and to minimize the similarity between different clusters. Clustering can be performed on data with several attributes, which are mapped to a multidimensional space. Most clustering algorithms build a model through a series of iterations and stop when the model has converged.
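The iterate-until-convergence pattern can be sketched with k-means. The paper does not name a specific clustering algorithm, so k-means is used here only as one familiar example; the function names, the initial centroids, and the points are assumptions made for illustration.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centroids[k])),
    )


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means; stops when the cluster assignments no longer change."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:  # the model has converged
            break
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


# Hypothetical two-attribute data forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, centers)
```

Each point is a position in a two-dimensional attribute space, and the loop ends on the first pass in which no point changes cluster.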
III. EVALUATION
A. Bayes' Theorem
In probability theory and statistics, Bayes' theorem has two different interpretations. In the Bayesian interpretation, the theorem states how much a subjective degree of belief should rationally change when new evidence appears. In the frequentist interpretation, the theorem describes the relationship between the inverse probabilities of two events. The theorem is the basis of Bayesian statistics and has applications in science, engineering, economics, game theory, medicine, and law. Applying Bayes' theorem to update beliefs is called Bayesian inference. Bayes' theorem, named after Rev. Thomas Bayes, describes the relationship between the conditional probabilities of two events A and B as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
The Naive Bayes algorithm is one of the algorithms used for classification. It is a probabilistic and statistical classification method based on the work of the British scientist Thomas Bayes, which predicts future probabilities from past experience and is known as Bayes' theorem. The theorem is combined with the "naive" assumption that the attributes are independent of one another. In other words, Naive Bayes classification assumes that the presence or absence of a particular characteristic of a class is unrelated to its other characteristics. The equation of Bayes' theorem is:
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
Remarks:
X = Data whose class is unknown
H = Hypothesis that X belongs to a specific class
P(H|X) = Probability of hypothesis H given X (posterior probability)
P(H) = Prior probability of hypothesis H
P(X|H) = Probability of X given hypothesis H (conditional probability)
P(X) = Probability of X
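As a numerical illustration of these quantities, the sketch below computes P(H|X) from P(H), P(X|H), and P(X). All numbers are hypothetical, and P(X) is obtained with the law of total probability, a step the text does not spell out.

```python
def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood_x_given_h * prior_h / evidence_x


# Hypothetical values, invented for illustration.
p_h = 0.3              # P(H): prior probability of the hypothesis
p_x_given_h = 0.8      # P(X|H): probability of the data if H holds
p_x_given_not_h = 0.2  # P(X|not H): probability of the data if H does not hold

# P(X) by the law of total probability over H and not H.
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

p_h_given_x = posterior(p_h, p_x_given_h, p_x)
print(p_x, p_h_given_x)
```

With these values, observing X raises the probability of H from 0.3 to roughly 0.63.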
To explain the Naive Bayes method, note that the classification process requires a number of clues to determine which class suits the sample being analyzed. Bayes' theorem above is therefore adapted as follows:
$$P(C \mid F_1, \ldots, F_n) = \frac{P(C)\,P(F_1, \ldots, F_n \mid C)}{P(F_1, \ldots, F_n)}$$
$C$ represents the class, while the variables $F_1, \ldots, F_n$ represent the characteristics (clues) needed to perform the classification. The formula states that the probability that a sample with certain characteristics belongs to class $C$ (the posterior) is the probability of class $C$ occurring (before the sample is observed, often called the prior), multiplied by the probability of the sample's characteristics appearing in class $C$ (the likelihood), divided by the overall probability of the sample's characteristics (the evidence). The formula can therefore also be written simply as follows:
$$\text{Posterior} = \frac{\text{Prior} \times \text{Likelihood}}{\text{Evidence}}$$
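The evidence value is always the same for every class for a given sample. The posterior value of one class is compared with the posterior values of the other classes to determine the class into which the sample is classified. The Bayes formula is elaborated further by expanding its numerator, the joint probability $P(C, F_1, \ldots, F_n)$, with the product rule as follows:
$$\begin{aligned}
P(C, F_1, \ldots, F_n) &= P(C)\,P(F_1, \ldots, F_n \mid C) \\
&= P(C)\,P(F_1 \mid C)\,P(F_2, \ldots, F_n \mid C, F_1) \\
&= P(C)\,P(F_1 \mid C)\,P(F_2 \mid C, F_1)\,P(F_3, \ldots, F_n \mid C, F_1, F_2) \\
&= P(C)\,P(F_1 \mid C)\,P(F_2 \mid C, F_1) \cdots P(F_n \mid C, F_1, \ldots, F_{n-1})
\end{aligned}$$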
It can be seen that this expansion produces more and more complex conditioning factors, which are almost impossible to analyze one by one. As a result, the calculation becomes difficult. This is where a very strong (naive) independence assumption is used: each characteristic $F_1, \ldots, F_n$ is independent of the others. Under this assumption, the following equality holds for $i \neq j$:
$$P(F_i \mid C, F_j) = P(F_i \mid C)$$
From the equation above it can be concluded that the naive independence assumption simplifies the conditional probabilities, so that the calculation becomes feasible. Furthermore, the expansion of $P(C \mid F_1, \ldots, F_n)$ can be simplified into:
$$P(C \mid F_1, \ldots, F_n) = \frac{1}{P(F_1, \ldots, F_n)}\,P(C)\prod_{i=1}^{n} P(F_i \mid C)$$
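The above equation is the Naive Bayes model that is then used in the classification process. For classification with continuous data, the Gaussian density formula is used:
$$P(X_i = x_i \mid Y = y_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \exp\!\left(-\frac{(x_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$
Remarks:
$P$ = Probability
$X_i$ = Attribute i
$x_i$ = Value of attribute i
$Y$ = Class
$y_j$ = Subclass j of Y
$\mu_{ij}$ = Mean of attribute $X_i$ in subclass $y_j$
$\sigma_{ij}$ = Standard deviation of attribute $X_i$ in subclass $y_j$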
The flow of the Naive Bayes method is as follows:
1. Read the training data.
2. Calculate the counts and probabilities:
a. For numeric data, find the mean and standard deviation of each numeric parameter.
b. For other data, find the probability value by counting the number of matching records in a category and dividing by the total number of records in that category.
3. Obtain the table of means, standard deviations, and probabilities.
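For numeric attributes, the flow above can be sketched as follows. This is a minimal sketch and not the authors' implementation: the function names `train` and `predict`, the layout of the table, and the sample data are assumptions made for illustration. Log-probabilities are summed instead of multiplying raw probabilities, a common numerical precaution that leaves the winning class unchanged.

```python
import math
from collections import defaultdict


def train(rows, labels):
    """Steps 1-3: build a table of prior, mean, and standard deviation per class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    table = {}
    for label, members in by_class.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # Population standard deviation; a small floor avoids division by zero.
        stds = [
            max(math.sqrt(sum((v - m) ** 2 for v in col) / len(col)), 1e-9)
            for col, m in zip(columns, means)
        ]
        table[label] = {"prior": len(members) / len(rows), "mean": means, "std": stds}
    return table


def log_gaussian(x, mean, std):
    """Logarithm of the Gaussian density of x."""
    return -((x - mean) ** 2) / (2 * std ** 2) - math.log(math.sqrt(2 * math.pi) * std)


def predict(table, row):
    """Return the class with the largest log prior plus summed log likelihoods."""
    def score(label):
        entry = table[label]
        return math.log(entry["prior"]) + sum(
            log_gaussian(x, m, s) for x, m, s in zip(row, entry["mean"], entry["std"])
        )
    return max(table, key=score)


# Hypothetical training data with two numeric attributes and two classes.
rows = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.1), (3.2, 3.9), (2.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]

table = train(rows, labels)
predicted = predict(table, (1.1, 2.0))
print(table["a"], predicted)
```

The evidence term is omitted from the score because it is the same for every class and therefore does not affect which class has the largest posterior.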
B. C4.5 Algorithm
C4.5 belongs to the family of decision tree algorithms. Its input consists of training samples and samples. The training samples are sample data used to build a tree whose correctness has been verified, while the samples are the data fields that serve as parameters in classifying the data. C4.5 was developed from the ID3 algorithm. Its improvements over ID3 are (Santosa, 2003):
1. It can handle missing values.
2. It can handle continuous data.
3. It supports pruning.
4. It can produce rules.
The steps carried out by the C4.5 algorithm to build a decision tree are as follows:
1. Tree construction begins by creating a node that represents the training samples.
2. If all the samples belong to the same class, the node becomes a leaf labeled with that class.
3. If the samples do not all belong to the same class, the algorithm looks for the attribute with the highest gain ratio among the available attributes, as a way to select the attribute that most influences the given training samples. This attribute becomes the test or decision attribute at that node. Note that if the attribute is continuous-valued, it must first be discretized.
4. A branch is created for each known value of the test attribute.
5. The algorithm applies the same process recursively to form a decision tree for the samples in each partition.
6. The recursion stops when one of the following conditions is met:
a. All samples at the node belong to the same class.
b. No attributes remain that can be used to partition the samples further.
c. No samples remain for a value of the test attribute. In this case, a leaf is created and labeled with the class that has the most samples (majority voting).
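The attribute selection in step 3 can be sketched as follows. The paper does not give the gain ratio formula, so the sketch assumes the usual definition for categorical attributes: information gain divided by split information. The function names and the sample data are hypothetical, and missing values and continuous attributes are not handled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by the split information."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    split_info = -sum(
        len(part) / total * math.log2(len(part) / total) for part in partitions.values()
    )
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attributes):
    """Attribute with the highest gain ratio, used as the test attribute of a node."""
    return max(attributes, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical training samples with two categorical attributes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

chosen = best_attribute(rows, labels, ["outlook", "windy"])
print(chosen, gain_ratio(rows, labels, "outlook"), gain_ratio(rows, labels, "windy"))
```

Here the attribute `outlook` separates the two classes perfectly while `windy` carries no information about the class, so `outlook` is chosen as the test attribute.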
In the learning stage, the C4.5 algorithm has two working principles:
1. Building the decision tree. The purpose of the decision tree induction algorithm is to construct a tree data structure that can be used to predict the class of a new case or sample whose class is unknown. C4.5 constructs the decision tree with the divide-and-conquer method. At first, only the root node is created by applying the divide-and-conquer algorithm. The algorithm chooses the best split of the cases by calculating and comparing gain ratios; then, for the nodes formed at the next level, the divide-and-conquer algorithm is applied again until the leaves are formed.
2. Building the rules (rule set). The rules derived from the decision tree form conditions in if-then form. These rules are obtained by tracing the decision tree from the root to the leaves. Each node and its branch condition form the condition (the "if" part), while the value in the leaf forms the outcome (the "then" part).
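The tracing of rules from root to leaf can be sketched as follows. The tree representation (nested dictionaries whose leaves are class labels), the function name `extract_rules`, and the example tree are assumptions made for illustration; they are not the data structures used by C4.5 itself.

```python
def extract_rules(tree, conditions=()):
    """Trace every root-to-leaf path and return one if-then rule per leaf."""
    if not isinstance(tree, dict):  # a leaf holds a class label
        premise = " and ".join(conditions) if conditions else "true"
        return [f"if {premise} then {tree}"]
    rules = []
    # An internal node is a dict with one key: the attribute tested at that node.
    attribute, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + (f"{attribute} = {value}",)))
    return rules


# Hypothetical decision tree, invented for illustration.
tree = {"outlook": {"sunny": {"windy": {"yes": "no", "no": "yes"}}, "rainy": "yes"}}

rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each path contributes the conditions collected on the way down as the "if" part and the class stored in the leaf as the "then" part.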
IV. CONCLUSION
Each of these techniques and methods works in its own way, and each algorithm has advantages and disadvantages. The C4.5 algorithm works by partitioning the training samples to produce a decision tree based on the facts in the training data. Naive Bayes, in contrast, reaches a decision based on experience from previous events: it counts the events that occur in the sample data to determine the decision for the problem at hand.
REFERENCES
[1] M. J. Berry, G. Linoff, Data Mining Techniques: For
Marketing, Sales, and Customer Support, New York: John
Wiley & Sons, Inc, 1997.
[2] D. T. Larose, Data Mining Methods and Models, Canada:
John Wiley & Sons, Inc, 2006.
[3] A. Kumar, O. Singh, V. Rishiwal, R. K. Dwivedi, R. Kumar,
“Association Rule Mining On Web Logs For Extracting
Interesting Patterns Through Weka Tool,” International
Journal of Advanced Technology In Engineering And
Science, vol. 3, no. 1, pp. 134-140, 2015.
[4] C. D., Discovering Knowledge in Data: An Introduction to
Data Mining, Canada: John Wiley & Sons, 2014.
[5] T. Krishna, D. Vasumathi, “A Study of Mining Software
Engineering Data and Software Testing,” Journal of
Emerging Trends in Computing and Information Sciences,
vol. 2, no. 11, 2011.
[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann,
I. H. Witten, “The WEKA Data Mining Software: An
Update,” SIGKDD Explorations, vol. 11, no. 1, pp. 10-18,
2015.
[7] D. Tomar, S. Agarwal, “A survey on Data Mining approaches
for Healthcare,” International Journal of Bio-Science and
Bio-Technology, vol. 5, no. 5, pp. 241-266, 2013.
[8] T. Silwattananusarn, A. D. Kulthida Tuamsuk, “Data Mining
and Its Applications for Knowledge Management: A
Literature Review from 2007 to 2012,” International Journal
of Data Mining & Knowledge Management Process, vol. 2,
no. 5, 2012.
[9] S. Rajagopal, “Customer Data Clustering Using Data Mining
Technique,” International Journal of Database Management
Systems, vol. 3, no. 4, pp. 1-11, 2011.
[10] W. Fitriani and A. P. U. Siahaan, "Comparison Between
WEKA and Salford System in Data Mining Software,"
International Journal of Mobile Computing and Application,
vol. 3, no. 4, pp. 1-4, 2016.
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
editorijettcs
 
Privacy preservation techniques in data mining
Privacy preservation techniques in data miningPrivacy preservation techniques in data mining
Privacy preservation techniques in data mining
eSAT Journals
 
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASECONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
IJwest
 
Configuring Associations to Increase Trust in Product Purchase
Configuring Associations to Increase Trust in Product Purchase Configuring Associations to Increase Trust in Product Purchase
Configuring Associations to Increase Trust in Product Purchase
dannyijwest
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
IJCSES Journal
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
Editor IJCATR
 
Classification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining techniqueClassification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining technique
eSAT Publishing House
 
Data mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisData mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant Analysis
IOSR Journals
 
Data mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisData mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant Analysis
IOSR Journals
 
A literature review of modern association rule mining techniques
A literature review of modern association rule mining techniquesA literature review of modern association rule mining techniques
A literature review of modern association rule mining techniques
ijctet
 
Associative Classification: Synopsis
Associative Classification: SynopsisAssociative Classification: Synopsis
Associative Classification: Synopsis
Jagdeep Singh Malhi
 
Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...Incremental learning from unbalanced data with concept class, concept drift a...
Incremental learning from unbalanced data with concept class, concept drift a...
IJDKP
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
Amr Abd El Latief
 
Ontology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceOntology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold Preference
IJCERT
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
ijdpsjournal
 
McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docx
andreecapon
 
Characterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining TechniquesCharacterizing and Processing of Big Data Using Data Mining Techniques
Characterizing and Processing of Big Data Using Data Mining Techniques
IJTET Journal
 

Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)

  • 1. International Journal of Engineering Trends and Technology (IJETT) – Volume 38 Number 7- August 2016 ISSN: 2231-5381 http://www.ijettjournal.org Page 380

Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Leni Marlina1, Muslim2, Andysah Putera Utama Siahaan3
Faculty of Computer Science, Universitas Pembangunan Panca Budi
Jl. Jend. Gatot Subroto Km. 4,5 Sei Sikambing, 20122, Medan, Sumatera Utara, Indonesia

Abstract - The development of data mining is inseparable from recent developments in information technology that enable the accumulation of large amounts of data. For example, a shopping mall records every sales transaction through its POS (point of sale) terminals. The database of these sales can reach a large storage size, with more added each day, especially when the shopping center grows into a nationwide network. The development of the internet also contributes substantially to the accumulation of data. But the rapid growth of data accumulation has created a condition often referred to as "data rich but information poor", because the data collected cannot be used optimally for useful applications. Not infrequently, a data set is simply left as a "data grave". Several techniques are used in data mining, including association, classification, and clustering. In this paper, the authors compare the performance of two classification methods, the Naïve Bayes and C4.5 algorithms.

Keywords - Data mining, Classification, Naïve Bayes, C4.5

I. INTRODUCTION
The word "mining" refers to extracting something valuable from a large quantity of base material through a long process. Data mining draws on the literature of other disciplines such as artificial intelligence, statistics, and databases [1][4].
Techniques often mentioned in the data mining literature include clustering, classification, association rule mining, neural networks, genetic algorithms, and others. What distinguishes data mining from these existing techniques is that they are applied to large-scale database applications, which raises many new challenges that could ultimately lead to new methodologies [2][3]. Before data mining became as popular as it is today, these techniques could only be applied to small-scale data. The adoption of data mining in the business world has also led to its application in other fields that require large-scale data analysis, such as bioinformatics and defense.

Several techniques are used in data mining, including association, classification, and clustering. Association rule mining is a technique for discovering associative rules between combinations of items. Classification is the process of finding a model or function that explains or distinguishes concepts or classes of data, with the aim of estimating the class of an object whose label is unknown. Clustering groups data without predefined class labels. In this paper, the authors compare the performance of two classification techniques, Naïve Bayes and C4.5.

II. THEORIES
A. Data Mining
In simple terms, data mining is the discovery of new information by looking for patterns or specific rules in very large amounts of data. Data mining is also described as a series of processes for extracting added value, in the form of knowledge that could not previously be found manually, from a data set. Data mining is often referred to as knowledge discovery in databases (KDD) [6].
KDD is an activity that includes the collection and use of historical data to find regularities, patterns, or relationships in large data sets. Several factors define data mining:
1. Data mining is the process of extracting added value from data collected in the past.
2. The object of data mining is large or complex data.
3. The purpose of data mining is to find connections or patterns that may provide a useful indication.
Data mining is not an entirely new field. One of the difficulties in defining data mining is that it inherited many aspects and techniques from previously established fields of science.

B. Data Mining Technique
Association
Association is also called market basket analysis. A typical business problem is to analyze a sales transaction table and identify products that are often purchased together by consumers [5][6]. The association function is often used to find a relation or correlation between sets of items. Association rule mining often plays a role in the context of the
  • 2. International Journal of Engineering Trends and Technology (IJETT) – Volume 38 Number 7- August 2016 ISSN: 2231-5381 http://www.ijettjournal.org Page 381 purposes of marketing strategy, catalog design, and business decision-making process. Association rule mining also has a description that is not much different that mining techniques to find an associative rule that exists between a combination of items. Traditionally, the association rules are used to find business trends by analyzing consumer transactions [9]. Important or not associative rules can be determined by two parameters, namely the support which is a percentage of a combination of items in the databaseand the confidence that the strong relationship between items in the rules associative Classification Classification is an act to give the group in every circumstance. Each state contains a bunch of attributes, one of which is a class attribute. This method needs to find a model that can explain the class attribute as a function of the input attribute. A decision tree is one of the most popular methods of classification because it is easy to be interpreted by humans. Here, each branching stating the conditions that must be met and the tips of trees declared class data. Decision tree algorithm C4.5 is the most famous, but the algorithm is not able to handle the data that has a large scale. The classification process is divided into two stages: learning and test [7][10]. At this stage of learning, some of the data that has been known will be fed to build the model estimates, which later in the test phase, the model that has been formed will be tested by using most other data to determine the accuracy of the model. If the accuracy is limited, then the model can be used to predict the unknown data class. Clustering Clustering also referred to as segmentation. 
This method identifies natural groups of cases based on a set of attributes, grouping together data that have similar attributes. Clustering groups data without relying on specific data classes. Because the classes of the data are not known in advance, clustering is often classified as an unsupervised learning method [8]. The principle of clustering is to maximize the similarity between members of the same cluster and to minimize the similarity between different clusters. Clustering can be performed on data with several attributes, which are mapped to a multidimensional space. Most clustering algorithms build a model through a series of iterations and stop when the model has converged.

III. EVALUATION

A. Bayes' Theorem

In probability theory and statistics, Bayes' theorem is a theorem with two different interpretations. In the Bayesian interpretation, the theorem states how a degree of subjective belief should rationally change when new evidence appears. In the frequentist interpretation, the theorem describes the relationship between the inverse conditional probabilities of two events. The theorem is the basis of Bayesian statistics and has applications in science, engineering, economics, game theory, medicine, and law. Applying Bayes' theorem to update beliefs is called Bayesian inference. Bayes' theorem, named after Rev. Thomas Bayes, describes the relationship between the conditional probabilities of two events A and B as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

The Naive Bayes algorithm is one of the algorithms used in classification. It is a probabilistic and statistical classification method based on the work of the British scientist Thomas Bayes, which predicts future probabilities from past experience and is therefore known as Bayes' theorem. The theorem is combined with a "naive" assumption that the attributes are independent of one another.
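As a small numeric illustration of the relationship P(A|B) = P(B|A) P(A) / P(B), the following Python sketch computes a posterior probability from a prior, a likelihood, and the evidence. The function name `posterior` and all probability values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def posterior(prior_a: float, likelihood_b_given_a: float, evidence_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / evidence_b


# Hypothetical values (assumptions for illustration only).
p_a = 0.3              # P(A): prior probability of event A
p_b_given_a = 0.8      # P(B|A): probability of B when A holds
p_b_given_not_a = 0.2  # P(B|not A): probability of B when A does not hold

# Evidence P(B) from the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Updated belief in A after observing B.
p_a_given_b = posterior(p_a, p_b_given_a, p_b)
print(f"P(B) = {p_b:.4f}, P(A|B) = {p_a_given_b:.4f}")
```

With these assumed numbers, observing B raises the belief in A from the prior of 0.3 to roughly 0.63, which is the kind of update the Bayesian interpretation describes.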
Naive Bayes classification assumes that the presence or absence of a particular characteristic of a class is unrelated to the presence or absence of its other characteristics. The equation of Bayes' theorem is:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

Remarks:
X = Data whose class is unknown
H = Hypothesis about the data
P(H|X) = Probability of the hypothesis given X (posterior)
P(H) = Prior probability
P(X|H) = Conditional probability of X given H
P(X) = Probability of X

To explain the Naive Bayes theorem, note that the classification process requires a number of clues to determine which class is suitable for the sample being analyzed. Therefore, the Bayes theorem above is adjusted as follows:

$$P(C \mid F_1, \ldots, F_n) = \frac{P(C)\,P(F_1, \ldots, F_n \mid C)}{P(F_1, \ldots, F_n)}$$

C represents the class, while the variables F1 ... Fn represent the characteristics (clues) needed to perform the classification. The formula states that the probability that a sample with certain characteristics belongs to class C (the posterior) is the probability of class C occurring (before the sample is seen, often called the prior), multiplied by the probability of the sample's characteristics occurring in class C (the likelihood), divided by the overall probability of the sample's characteristics (the evidence). Therefore, the above formula can also be written simply as follows:

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$

The evidence value is always fixed across classes for a single sample. The posterior for one class is compared with the posteriors of the other classes to determine the class into which the sample is classified. The Bayes formula is elaborated further by expanding the numerator, the joint probability of C and F1 ... Fn, using the product rule as follows:

$$\begin{aligned}
P(C, F_1, \ldots, F_n) &= P(C)\,P(F_1, \ldots, F_n \mid C) \\
&= P(C)\,P(F_1 \mid C)\,P(F_2, \ldots, F_n \mid C, F_1) \\
&= P(C)\,P(F_1 \mid C)\,P(F_2 \mid C, F_1)\,P(F_3, \ldots, F_n \mid C, F_1, F_2) \\
&= P(C)\,P(F_1 \mid C)\,P(F_2 \mid C, F_1) \cdots P(F_n \mid C, F_1, \ldots, F_{n-1})
\end{aligned}$$

This expansion produces more and more complex conditioning factors in each probability term, which are almost impossible to analyze one by one. As a result, the calculation becomes difficult to carry out. This is where the very strong (naive) independence assumption is used: each characteristic F1, F2, ..., Fn is independent of the others. Under this assumption, the following equality holds for i ≠ j:

$$P(F_i \mid F_j) = \frac{P(F_i \cap F_j)}{P(F_j)} = \frac{P(F_i)\,P(F_j)}{P(F_j)} = P(F_i), \qquad \text{so that} \qquad P(F_i \mid C, F_j) = P(F_i \mid C)$$

From the equation above it can be concluded that the naive independence assumption simplifies the probability terms, so the calculation becomes feasible. Furthermore, the expansion of $P(C \mid F_1, \ldots, F_n)$ can be simplified into:

$$P(C \mid F_1, \ldots, F_n) = \frac{P(C)\,\prod_{i=1}^{n} P(F_i \mid C)}{P(F_1, \ldots, F_n)}$$

The above equation is the Naive Bayes model that is then used in the classification process. For classification with continuous data, the Gaussian density formula is used:

$$P(X_i = x_i \mid Y = y_j) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Remarks:
P = Probability
X_i = Attribute i
x_i = Value of attribute i
Y = Class
y_j = Subclass j of Y
μ = Mean (of attribute X_i within class y_j)
σ = Standard deviation (of attribute X_i within class y_j)

The flow of the Naive Bayes method is as follows:

1. Read the training data.
2. Calculate the counts and probabilities. For numerical data:
   a. Find the mean and standard deviation of each numeric parameter.
   b. Find the probability value by counting the number of matching records in a category and dividing by the total number of records in that category.
3. Obtain the values in the table of means, standard deviations, and probabilities.

B. C4.5 Algorithm

C4.5 belongs to the family of decision tree algorithms. Its inputs are training samples and samples. The training samples are example data, already verified as correct, that are used to build the tree. The samples are the data fields used as parameters when classifying the data. C4.5 was developed from the ID3 algorithm. The improvements of C4.5 over ID3 are (Santosa, 2003):

1. It can handle missing values.
2. It can handle continuous data.
3. Pruning.
4. Rule derivation.

The steps taken by the C4.5 algorithm to build a decision tree are as follows:

1. Tree construction begins by creating a node that represents the training samples.
2. If all samples have the same class, the node becomes a leaf node labeled with that class.
3. If the samples do not all have the same class, the algorithm looks for the attribute with the highest gain ratio among the available attributes, as a way to select the attribute that has the most influence on the given training samples. This attribute becomes the test or decision attribute at that node. Note that if the attribute is continuous-valued, it must be discretized first.
4. A branch is created from the node for each known value of the test attribute.
5. The algorithm repeats the same process recursively to form a decision tree for the samples in each partition.
6. The recursion stops when one of the following conditions is met:
   a. All samples at the node belong to the same class.
   b. There are no remaining attributes that can be used to partition the samples further.
   c. There are no samples for a value of the test attribute. In this case, a leaf is created and labeled with the class that has the most samples (majority voting).

In the learning stage, the C4.5 algorithm has two working principles:

1. Making the decision tree. The purpose of a decision tree induction algorithm is to construct a tree data structure that can be used to predict the class of a new case or sample whose class is not yet known. C4.5 constructs the decision tree with the divide and conquer method. Initially, only the root node is created by applying the divide and conquer algorithm. The algorithm chooses the best way to split the cases by calculating and comparing the gain ratio. For the nodes formed at the next level, divide and conquer is applied again until the leaves are formed.
2. Making the rules (rule set). The rules derived from the decision tree form conditions in if-then form.
These rules are obtained by tracing the decision tree from the root to the leaves. Each node and its branching condition form the condition (the "if" part), while the values contained in the leaves form the outcome (the "then" part).

IV. CONCLUSION

Each of these techniques and methods works in its own way, and each algorithm has advantages and disadvantages. The C4.5 algorithm works by grouping the training samples to produce a decision tree based on the facts in the training data. With Bayes, the decision is obtained from experience of previous events: it counts the events that occur in the sample data to determine the decision for the problem at hand.

REFERENCES

[1] M. J. Berry, G. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support, New York: John Wiley & Sons, Inc, 1997.
[2] D. T. Larose, Data Mining Methods and Models, Canada: John Wiley & Sons, Inc, 2006.
[3] A. Kumar, O. Singh, V. Rishiwal, R. K. Dwivedi, R. Kumar, "Association Rule Mining On Web Logs For Extracting Interesting Patterns Through Weka Tool," International Journal of Advanced Technology In Engineering And Science, vol. 3, no. 1, pp. 134-140, 2015.
[4] C. D., Discovering Knowledge in Data: An Introduction to Data Mining, Canada: John Wiley & Sons, 2014.
[5] T. Krishna, D. Vasumathi, "A Study of Mining Software Engineering Data and Software Testing," Journal of Emerging Trends in Computing and Information Sciences, vol. 2, no. 11, 2011.
[6] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, no. 1, pp. 10-18, 2009.
[7] D. Tomar, S. Agarwal, "A survey on Data Mining approaches for Healthcare," International Journal of Bio-Science and Bio-Technology, vol. 5, no. 5, pp. 241-266, 2013.
[8] T. Silwattananusarn, K. Tuamsuk, "Data Mining and Its Applications for Knowledge Management: A Literature Review from 2007 to 2012," International Journal of Data Mining & Knowledge Management Process, vol. 2, no. 5, 2012.
[9] S. Rajagopal, "Customer Data Clustering Using Data Mining Technique," International Journal of Database Management Systems, vol. 3, no. 4, pp. 1-11, 2011.
[10] W. Fitriani, A. P. U. Siahaan, "Comparison Between WEKA and Salford System in Data Mining Software," International Journal of Mobile Computing and Application, vol. 3, no. 4, pp. 1-4, 2016.