DOI: 10.5120/12129-8502


International Journal of Computer Applications (0975 – 8887)
Volume 69 – No. 25, May 2013

Comparing the Performance of Frequent Pattern Mining Algorithms

Dr. Kanwal Garg
Assistant Professor, Dept. of Computer Science and Applications, Kurukshetra University, Kurukshetra, Haryana, India

Deepak Kumar
M.Tech Research Scholar, Dept. of Computer Science & Applications, Kurukshetra University, Kurukshetra, Haryana, India

ABSTRACT
Frequent pattern mining is a widely researched field in data mining because of its importance in many real-life applications. Many algorithms are used to mine frequent patterns, and they perform differently on different datasets. Apriori, Eclat and FP Growth are the initial basic algorithms used for frequent pattern mining. The premise of this paper is to identify the major issues and challenges related to the algorithms used for frequent pattern mining on transactional databases.

General Terms
Algorithms.

Keywords
Data Mining, Frequent Pattern Mining.

1. INTRODUCTION
Data mining has long been an active area of research in databases. The steadily decreasing cost and increasing compactness of storage devices have made it possible to store every transaction of a transactional database [2]. Storing this data serves two purposes: first, the data can be accessed any number of times; second, it helps to find relationships among data items. The problem of finding relationships among different data items was first introduced by Agarwal et al. [1]. Solving this problem can help to increase earnings and optimize storage. In this section the researcher introduces the concepts of transactional database, database layout, frequent pattern, frequent itemset and candidate itemset.

A database is a systematically arranged collection of data that can be easily retrieved and manipulated at a later time. There are different kinds of databases, such as active, cloud, embedded and transactional databases, but this paper deals with transactional databases only. A transactional database is a database in which there is no auto-commit; most modern relational databases are transactional databases [3].

A database layout describes how the data is represented. Two layouts are in common use: the horizontal layout and the vertical layout. The horizontal layout has two columns: the first holds the transaction ID and the second the items bought in that transaction. In the vertical layout, the first column holds the item ID and the second the IDs of the transactions in which that item is bought. A third layout, known as the projected layout, is not a physical layout; in it, the system records only the transaction identifier and the associated item. It is a divide-and-conquer mechanism that recursively reduces the size of the database by considering only the longest pattern.

A frequent pattern is a pattern that occurs in comparatively many transactions. The support of an itemset is the number of transactions that contain it, and a frequent itemset is an itemset whose support is at least a user-specified minimum support. A candidate itemset is an itemset generated as a possible frequent itemset whose support still has to be counted against the database.

The presented paper is organized in five sections: Section 1 contains the introduction; Section 2 briefly describes three frequent pattern mining algorithms, namely Apriori, Eclat and FP Growth; Section 3 gives the methodology used; Section 4 presents a comparative analysis of the algorithms under varying conditions; and Section 5 gives the conclusion, followed by the references.

2. FREQUENT PATTERN MINING ALGORITHMS
This section describes the three frequent itemset mining algorithms.

2.1 Apriori Algorithm
Apriori is the very first algorithm for mining frequent patterns. It was proposed by R. Agrawal and R. Srikant in 1994 [5]. It works on a database in the horizontal layout and mines Boolean association rules using a generate-and-test approach with breadth-first search (BFS): frequent k-itemsets are used to find itemsets of k+1 items. The algorithm first scans the database to compute the support count of each item, that is, the number of transactions in which the item occurs [6], and drops all infrequent items.

Apriori property: every non-empty subset of a frequent itemset is also frequent.

Apriori follows a two-step approach in each pass. The first pass starts from single items.
In the first (join) step of each later pass k, two frequent (k-1)-itemsets that share k-2 items are joined to form a candidate k-itemset; the resulting set is called the candidate set Ck. By the Apriori property, any candidate with an infrequent (k-1)-subset can be discarded immediately.
In the second (prune) step, the algorithm scans the database to count the occurrences of each candidate in Ck and prunes all infrequent itemsets.
The algorithm ends when no further extension is found.

2.2 Eclat Algorithm
Eclat is a frequent itemset mining algorithm that works on the vertical database layout and is based on depth-first search. In the first step the data is represented as a bit matrix: if an item is bought in a particular transaction, the corresponding bit is set to one, otherwise to zero. A prefix tree is then constructed. To find the first item for the prefix tree, the algorithm intersects the first row with all other rows, and to create the second child it intersects the second row with the rows following it [4]. All other items are found in the same way until the prefix tree is constructed. Infrequent rows are discarded from further calculations. To mine frequent itemsets, depth-first search with backtracking is applied to the prefix tree. Frequent patterns are stored in a bit matrix structure. Eclat is memory efficient because it uses a prefix tree, and it scales well due to its compact representation.

Figure 1. Comparison of Apriori, Eclat and FP Growth algorithms on the artificial dataset.
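The Apriori and Eclat procedures described in Sections 2.1 and 2.2 can be illustrated with a minimal Python sketch; the paper's own Java implementation is not reproduced here. The sketch assumes that transactions are Python sets and that the minimum support is an absolute count. The function names `apriori` and `eclat` and the five-transaction `data` example are hypothetical. `apriori` works level by level on the horizontal layout, with a join step, a prune step based on the Apriori property, and one database scan per pass. `eclat` first converts the data to the vertical layout and then searches depth-first, intersecting each item's row with the rows that follow it. In place of the bit matrix it uses sets of transaction IDs (a bit row is simply the characteristic vector of such a set), and in place of an explicit prefix tree it uses recursion.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Breadth-first Apriori on a horizontal layout (list of item sets).

    Returns {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (an absolute count).
    """
    # Pass 1: scan the database once to count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: two frequent (k-1)-itemsets sharing k-2 items form a k-itemset.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            # Prune step (Apriori property): every (k-1)-subset must be frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One database scan per pass to count the candidates in Ck.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def eclat(transactions, min_support):
    """Depth-first Eclat on a vertical layout (item -> set of transaction ids)."""
    # Convert the horizontal layout into the vertical layout.
    vertical = {}
    for tid, t in enumerate(transactions):
        for item in t:
            vertical.setdefault(item, set()).add(tid)
    result = {}

    def extend(prefix, rows):
        # rows: (item, tidset) pairs, each a frequent extension of prefix.
        for i, (item, tids) in enumerate(rows):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Intersect this row with the rows that follow it; drop infrequent results.
            children = []
            for other, other_tids in rows[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    children.append((other, common))
            if children:
                extend(itemset, children)  # go deeper before moving to the next row

    first_level = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda row: row[0],
    )
    extend(frozenset(), first_level)
    return result


# Hypothetical example data: five market-basket transactions.
data = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
# Both calls return the same 8 frequent itemsets, e.g. {diaper, beer} with support 3.
freq_apriori = apriori(data, min_support=3)
freq_eclat = eclat(data, min_support=3)
```

Both functions return identical itemsets and support counts; they differ only in the data layout they read and in the order in which they search the itemset space.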

Figure 2. Comparison of Apriori, Eclat and FP Growth algorithms on the artificial dataset when the number of transactions is made three times the original.

2.3 FP Growth Algorithm

Frequent pattern growth, also called FP Growth, is a tree-based algorithm for mining frequent patterns in a database; it was proposed by Han et al. in 2000 [10]. It works on the projected database layout and uses a divide-and-conquer method [7]. It needs no candidate itemsets; instead, frequent patterns are mined directly from an FP tree. In the first step, the list of frequent items is generated and sorted in decreasing order of support. Each node in the FP tree, other than the root node, contains the item name, the support count, and a pointer (node-link) to the next node in the tree that has the same item name [6]. These nodes are used to build the FP tree: the frequent items of each transaction are inserted in support order, so common prefixes are shared during FP tree construction, and along every path from the root to a leaf the items appear in non-increasing order of support. Once the FP tree is constructed, frequent patterns are extracted from it starting from the leaf nodes, and the prefix paths of each item are processed recursively to mine frequent itemsets. FP Growth takes the least memory because of its projected layout and is storage efficient. A related structure is the conditional FP tree, which is built by considering only the transactions that contain a particular itemset and then removing that itemset from those transactions. Another variant, parallel FP growth (PFP), parallelizes the FP tree on distributed machines [8]. Grahne and Zhu improved FP Growth using a prefix-tree structure [9].
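FP tree construction and the recursive mining of prefix paths can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from [10] or from this paper. Transactions are sets, the minimum support is an absolute count, ties in support are broken by item name, and the names `FPNode`, `build_fp_tree` and `fp_growth` are hypothetical. Each recursive call builds a conditional FP tree from a conditional pattern base, which is a list of (prefix path, count) pairs collected by following an item's node-links.

```python
from collections import defaultdict


class FPNode:
    """FP-tree node: item name, support count, parent, children and node-link."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.link = None  # next node in the tree with the same item name


def build_fp_tree(weighted, min_support):
    """Build an FP tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the first
    node of its node-link chain; frequent maps each item to its support.
    """
    support = defaultdict(int)
    for items, count in weighted:
        for item in items:
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = {}
    for items, count in weighted:
        # Keep only frequent items, in decreasing support order (ties by name),
        # so transactions with a common prefix share a path.
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.link = header.get(item)  # prepend to the node-link chain
                header[item] = child
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support):
    """Mine all frequent itemsets without candidate generation."""
    result = {}

    def mine(weighted, suffix):
        header, frequent = build_fp_tree(weighted, min_support)
        # Start from the least frequent items, which appear last on every path.
        for item in sorted(frequent, key=lambda i: (frequent[i], i)):
            itemset = suffix | {item}
            result[itemset] = frequent[item]
            # Conditional pattern base: the prefix path above every node of this item.
            base = []
            node = header[item]
            while node is not None:
                path = []
                parent = node.parent
                while parent.item is not None:  # stop at the root
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    base.append((path, node.count))
                node = node.link
            if base:
                mine(base, itemset)  # recurse on the conditional FP tree

    mine([(t, 1) for t in transactions], frozenset())
    return result


# Hypothetical example data: five market-basket transactions.
data = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent_itemsets = fp_growth(data, min_support=3)
```

On this hypothetical dataset the sketch finds eight frequent itemsets at a minimum support of 3, and at no point does it generate a candidate itemset whose support must be counted by rescanning the transactions.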

Figure 3. Comparison of Apriori, Eclat and FP Growth algorithms on the artificial dataset when the number of attributes is made three times the original.

3. METHODOLOGY
The three algorithms described above were implemented in Java, and their performance was compared on a synthetic dataset by varying the number of attributes and instances. The performance measure used for the comparison is execution time.
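A timing harness in the spirit of this methodology might look as follows. It is a hypothetical Python sketch, not the authors' Java setup. The generator `make_synthetic` assumes that each attribute is a Boolean item present in a transaction with a fixed probability `density`; the base sizes are arbitrary; and `count_item_supports`, the single-item counting pass that all three algorithms begin with, stands in for a full mining algorithm. The four settings mirror the conditions of Figures 1 to 4.

```python
import random
import time


def make_synthetic(num_transactions, num_attributes, density=0.25, seed=0):
    """Hypothetical generator: attribute a is in a transaction with probability density."""
    rng = random.Random(seed)  # fixed seed so every run sees the same data
    return [
        {a for a in range(num_attributes) if rng.random() < density}
        for _ in range(num_transactions)
    ]


def count_item_supports(transactions):
    """Stand-in workload: the single-item counting pass shared by all three algorithms."""
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return counts


def time_it(func, transactions):
    """Wall-clock execution time of func(transactions), in seconds."""
    start = time.perf_counter()
    func(transactions)
    return time.perf_counter() - start


# Hypothetical base sizes; each setting triples transactions, attributes, or both.
BASE_T, BASE_A = 1000, 20
settings = {
    "original": (BASE_T, BASE_A),
    "3x transactions": (3 * BASE_T, BASE_A),
    "3x attributes": (BASE_T, 3 * BASE_A),
    "3x both": (3 * BASE_T, 3 * BASE_A),
}
for name, (n_t, n_a) in settings.items():
    dataset = make_synthetic(n_t, n_a)
    print(f"{name:<16} {time_it(count_item_supports, dataset):.6f} s")
```

To compare actual miners, a function that runs Apriori, Eclat or FP Growth with a fixed minimum support can be passed in place of `count_item_supports`. Because a single wall-clock measurement is noisy, repeating each run and keeping the minimum time gives more stable numbers.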

4. COMPARATIVE ANALYSIS

The comparative analysis of the algorithms under varying parameters is presented in Figures 1 to 7.


Figure 4. Comparison of Apriori, Eclat and FP Growth algorithms on a dataset with three times the original number of transactions and three times the original number of attributes.

Figure 5. Scaling of FP Growth with a varying number of attributes and transactions.

Figure 6. Scaling of Eclat with a varying number of attributes and transactions.

Figure 7. Scaling of Apriori with a varying number of attributes and transactions.

4.1 Result and Discussion
On the original dataset (Figure 1), FP Growth performs the best and Apriori takes the most time. When the number of transactions is made three times the original (Figure 2), the execution time increases more for both FP Growth and Eclat than for Apriori. When the number of attributes is made three times the original (Figure 3), FP Growth performs the best and Apriori shows a sharp increase, while Eclat lies between the two. Figures 2 and 3 also show that increasing the number of attributes affects each individual algorithm more than increasing the number of tuples by the same factor. When both the number of transactions and the number of attributes are tripled, there is a sharp increase in the time taken by all algorithms; at this stage FP Growth performs the best and Apriori the worst, as shown in Figure 4. Figure 5 shows the scaling of FP Growth under different conditions, and Figures 6 and 7 show the scaling of Eclat and Apriori. Comparing Figures 5, 6 and 7 shows that FP Growth performs well under all kinds of variations and is therefore the best of the three algorithms. Apriori performs the worst and thus shows the least scalability, whereas Eclat lies between FP Growth and Apriori.

5. CONCLUSION
Frequent pattern mining is the most important step in association rule mining, which in turn supports many applications such as market basket analysis, clustering, series analysis, games, decision making, object mining and website navigation. In this paper the researcher surveyed three pattern mining algorithms, namely Apriori, Eclat and FP Growth. Apriori uses a join-and-prune method, Eclat works on vertical datasets, and FP Growth constructs conditional frequent pattern trees whose itemsets satisfy the minimum support.
The major weakness of the Apriori algorithm is that it generates a large number of candidate itemsets and performs a large number of database scans, equal to the maximum length of a frequent itemset [5]. Scanning a large database is very expensive [11]. The underlying reason for Apriori's poor performance is that it lacks an efficient method for processing the database [7]. FP Growth is the best of the three algorithms and the most scalable. Eclat performs worse than FP Growth, and Apriori performs the worst. Future work should aim to improve the performance of Apriori and Eclat by using a better layout to store the data.

6. REFERENCES
[1] Agarwal, R.C., Aggarwal, C.C. and Prasad, V.V.V. 2001. A Tree Projection Algorithm for Generation of Frequent Item Sets. Journal of Parallel and Distributed Computing, 61(3), pp. 350–371.
[2] Bhadoria et al. Analysis of Frequent Itemset Mining on Variant Datasets. Int. J. Comp. Tech. Appl., Vol. 2(5), ISSN: 2229-6093, pp. 1328–1333.
[3] Database transaction. Wikipedia. http://en.wikipedia.org/wiki/Database_transaction [accessed 11 November 2012].
[4] Borgelt, C. 2003. Efficient Implementations of Apriori and Eclat. In Proc. 1st IEEE ICDM Workshop on Frequent Item Set Mining Implementations, CEUR Workshop Proceedings 90, Aachen, Germany.
[5] Goswami, D.N. et al. 2010. An Algorithm for Frequent Pattern Mining Based on Apriori. International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 04, pp. 942–947.
[6] Mishra, R. et al. 2012. Comparative Analysis of Apriori Algorithm and Frequent Pattern Algorithm for Frequent Pattern Mining in Web Log Data. International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 3(4), pp. 4662–4665.


[7] SathishKumar et al. 2010. Efficient Tree Based Distributed Data Mining Algorithms for Mining Frequent Patterns. International Journal of Computer Applications (0975 – 8887), Vol. 10, No. 1, November 2010.
[8] Li, H., Wang, Y., Zhang, D., Zhang, M. and Chang, E. 2008. PFP: Parallel FP-Growth for Query Recommendation. In Proceedings of the 2008 ACM Conference on Recommender Systems, pp. 107–114.
[9] Grahne, G. and Zhu, J. 2003. High Performance Mining of Maximal Frequent Itemsets. In SIAM'03 Workshop on High Performance Data Mining: Pervasive and Data Stream Mining, May 2003.
[10] Han, J., Pei, J. and Yin, Y. 2000. Mining Frequent Patterns without Candidate Generation. In Proc. 2000 ACM SIGMOD Int. Conf. on Management of Data.
[11] Garg, D. et al. Comparative Analysis of Various Approaches Used in Frequent Pattern Mining. International Journal of Advanced Computer Science and Applications (IJACSA), Special Issue on Artificial Intelligence.


IJCATM : www.ijcaonline.org