IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 3, No 2, May 2012
ISSN (Online): 1694-0814
www.IJCSI.org 466

Construction of FP Tree using Huffman Coding.


Dr. S.N. Patro 1, Prof. Sujogya Mishra 2, Mr. Pratyusabhanu Khuntia 3 and Mr. Chidananda Bhagabati 4

1 DRIEMS, Cuttack, Orissa, India
2 Department of CSE, Krupajal Engineering College, Bhubaneswar, Orissa, India
3 Department of CSE, Krupajal Engineering College, Bhubaneswar, Orissa, India
4 Department of CSE, Krupajal Engineering College, Bhubaneswar, Orissa, India

1. Introduction

Generally, data mining is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.

1.1. In data mining, a pattern is a particular data behavior, arrangement or form that might be of business interest, even though we are not sure about that yet. But it is a straight point.

Frequent patterns are item sets, subsequences, or substructures that appear in a data set with a frequency no less than a user-specified threshold. A subsequence, such as buying first a PC, then a digital camera, and then a memory card, is a frequent pattern if it occurs frequently in a shopping history database.

In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It describes analyzing and presenting strong rules discovered in databases using different measures of interestingness. Association rules are employed today in many application areas, including web usage mining, intrusion detection and bioinformatics. Association rule learning typically does not consider the order of items either within a transaction or across transactions.

1.2. Table 1. Database with 4 items and 5 transactions

transaction ID   milk   bread   butter   beer
1                1      1       0        0
2                0      0       1        0
3                0      0       0        1
4                1      1       1        0
5                0      1       0        0

The problem of association rule mining is defined as follows. Let I = {i1, i2, …, in} be a set of n binary attributes called items. Let D = {t1, t2, …, tm} be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The sets of items (itemsets for short) X and Y are called the antecedent (left-hand side, or LHS) and the consequent (right-hand side, or RHS) of the rule, respectively.

To illustrate the concepts, we use a small example from the supermarket domain. The set of items is I = {milk, bread, butter, beer}, and a small database containing the items (1 codes presence and 0 absence of
an item in a transaction) is shown in Table 1. An example rule for the supermarket could be {bread, butter} ⇒ {milk}, meaning that if bread and butter are bought, customers also buy milk.

Frequent pattern mining plays an essential role in mining associations, correlations, causality, sequential patterns, episodes, multi-dimensional patterns, max-patterns, partial periodicity, emerging patterns, and many other data mining tasks.

Most previous studies adopt an Apriori-like approach, which is based on an anti-monotone Apriori heuristic: if any length-k pattern is not frequent in the database, its length-(k+1) super-pattern can never be frequent. The essential idea is to iteratively generate the set of candidate patterns of length (k+1) from the set of frequent patterns of length k, and to check their corresponding occurrence frequencies in the database.

These studies achieve a good performance gain by (possibly significantly) reducing the size of candidate sets. However, it is costly to handle a huge number of candidate sets, and it is tedious to repeatedly scan the database and check a large set of candidates by pattern matching.

Here we propose a compact data structure, called the frequent pattern tree (FP-tree), constructed using an extended Huffman code-tree structure that stores crucial quantitative information about frequent patterns.

This paper proposes a novel frequent pattern tree structure based on an efficient FP-tree-based mining method, FP-growth. This approach is more efficient due to compression of a large database into a smaller data structure, pattern-fragment growth mining, and a partitioning-based method.

2. Design and Construction

Let I = {a1, a2, …, an} be a set of items, and let DB = <T1, T2, …, Tn> be a transaction database, where each Ti is a transaction containing a set of items in I. The support (or occurrence frequency) of a pattern A, which is a set of items, is the number of transactions in DB that contain A. A is a frequent pattern if A's support is no less than a predefined minimum support threshold ξ.

Given a transaction database DB and a minimum support threshold ξ, the problem of finding the complete set of frequent patterns is called the frequent pattern mining problem.

Example 1: Let the transaction database DB be (the first two columns of) Table 2 and ξ = 3. A compact data structure can be designed based on the following observations.
• Perform one scan of DB to identify the set of frequent items.
• Store the set of frequent items of each transaction in some compact structure, to avoid repeatedly scanning DB.
• If multiple transactions share an identical frequent item set, they can be merged into one, with the number of occurrences registered as a count.

The frequent items are sorted in descending order of frequency.

Table 2. A transaction database as running example.

TID   ITEMS BOUGHT               FREQUENT ITEMS
100   f, a, c, d, g, i, m, p     f, c, a, m, p
200   a, b, c, f, l, m, o        f, c, a, b, m
300   b, f, h, j, o              f, b
400   b, c, k, s, p              c, b, p
500   a, f, c, e, l, p, m, n     f, c, a, m, p

2.1. Construction of FP-tree using Huffman Coding:

Calculate the number of occurrences of each item. For example, in Table 2 the occurrences of the frequent items are as follows:
F = 4, C = 4, A = 3, M = 3, P = 3, B = 3.
The FP-tree is constructed from these items as follows:
• Read the items with maximum occurrences.
• Form the first node with F = 4, C = 4.
• Form the next node with A = 3, M = 3, keeping the nodes with maximum value on the left side.
• Finally, form the node with P = 3, B = 3.
• Label the vertices with the binary values 0 and 1: give the value '0' to each left vertex and '1' to each right vertex.

The steps in the construction of the Huffman tree are represented diagrammatically as follows:

1st Step: the six frequent items form the leaves C:4, F:4, M:3, A:3, B:3, P:3.

2nd Step: C:4 and F:4 are joined under a new node of weight 8; M:3, A:3, B:3 and P:3 remain unjoined.
3rd Step: M:3 and A:3 are joined under a new node of weight 6; B:3 and P:3 remain unjoined.

4th Step: B:3 and P:3 are joined under a second node of weight 6, giving three subtrees of weights 6, 6 and 8.

5th Step: the two nodes of weight 6 are joined under a new node of weight 12.

6th Step: the nodes of weight 8 and 12 are joined under the root of weight 20:

                20
             /      \
            8        12
          /   \    /    \
        C:4   F:4 6      6
                 / \    / \
              M:3  A:3 B:3  P:3

From the above diagram, we can determine the code of each item as follows:

Table 3. Code Generated

Item   Code
C      00
F      01
M      100
A      101
B      110
P      111

From the above codes, it can be observed that no two items have the same code, so the data can be stored efficiently without any overlap.

This approach is more efficient due to compression of a large database into a smaller data structure, the pattern-fragment growth mining method, and the partitioning-based divide-and-conquer search method.

3. Conclusion.

We have proposed a novel data structure, the frequent pattern tree (FP-tree) using Huffman coding, for storing compressed, crucial information about frequent patterns, and developed a pattern growth method, FP-growth, for efficient mining of frequent patterns in large databases.

There are several advantages of FP-growth over other approaches: (1) It constructs a highly compact FP-tree, which is usually substantially smaller than the original database, and thus saves the costly database scans in the subsequent mining processes. (2) It applies a pattern growth method, which avoids costly candidate generation. (3) It generates unique binary codes for each data item, which avoids data redundancy and repetition of data.
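
As a rough illustration of the counting and coding steps of Section 2.1, the following Python sketch counts the items of Table 2, keeps those with at least 3 occurrences, and derives binary codes from a Huffman tree. It is a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard Huffman procedure of repeatedly joining the two lightest subtrees, whereas the steps in Section 2.1 join C and F first. Ties are broken by the left-to-right leaf order of the 1st step, and the names TRANSACTIONS, MIN_SUPPORT, LEAF_ORDER, frequent_item_counts and huffman_codes are hypothetical. On this input the sketch arrives at the same tree and the same codes as Table 3.

```python
import heapq
from collections import Counter
from itertools import count

# Transactions of Table 2 (TID -> items bought).
TRANSACTIONS = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
MIN_SUPPORT = 3  # threshold used in Example 1
# Assumption: left-to-right order of the leaves in the 1st step of the diagram.
LEAF_ORDER = ["c", "f", "m", "a", "b", "p"]


def frequent_item_counts(transactions, min_support):
    """Count each item once per transaction; keep items meeting the threshold."""
    counts = Counter(item for items in transactions.values() for item in set(items))
    return {item: n for item, n in counts.items() if n >= min_support}


def huffman_codes(weights, leaf_order):
    """Build a Huffman tree and return (root weight, {item: binary code})."""
    sequence = count()  # unique tie-breaker, so trees are never compared directly
    # Heap entries are (weight, sequence number, tree); a tree is either an
    # item (leaf) or a (left, right) pair (internal node).
    heap = [(weights[item], next(sequence), item) for item in leaf_order]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_left, _, left = heapq.heappop(heap)     # lightest subtree goes left ('0')
        w_right, _, right = heapq.heappop(heap)   # next lightest goes right ('1')
        heapq.heappush(heap, (w_left + w_right, next(sequence), (left, right)))
    root_weight, _, root = heap[0]

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(root, "")
    return root_weight, codes


counts = frequent_item_counts(TRANSACTIONS, MIN_SUPPORT)
root_weight, codes = huffman_codes(counts, LEAF_ORDER)
print(sorted(counts.items()))
# [('a', 3), ('b', 3), ('c', 4), ('f', 4), ('m', 3), ('p', 3)]
print(root_weight, codes)
# 20 {'c': '00', 'f': '01', 'm': '100', 'a': '101', 'b': '110', 'p': '111'}
```

The sequence number stored in each heap entry keeps the tie-breaking deterministic. Without it, the four items of weight 3 could be paired differently and the codes would no longer match Table 3, although they would still be unique.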

Dr. S.N. Patro is a distinguished figure in the field of Computer Science and Engineering. At present he is working at DRIEMS, Cuttack, Odisha. He has completed a Ph.D. in Computer Science and Engineering. His areas of interest include Wireless Sensor Networks, Cryptography, and Algorithm Design and Analysis.

Prof. Sujogya Mishra is an eminent researcher in the fields of Computer Graphics, Database Management Systems, and Algorithm Design and Analysis. At present he is working in the Department of Computer Science and Engineering, Krupajal Engineering College, Bhubaneswar. He has completed an M.Tech in Computer Science and Engineering and is now pursuing a Ph.D. at Utkal University, Vani Vihar, Bhubaneswar, Odisha.

Mr. Pratyusabhanu Khuntia is a researcher in the fields of Algorithm Analysis and Design and Database Management Systems. At present he is working in the Department of Computer Science and Engineering, Krupajal Engineering College, Bhubaneswar. He has completed an M.Tech in Computer Science and Engineering.

Mr. Chidananda Bhagabati is a research fellow in the fields of Algorithm Analysis and Design and Database Management Systems. At present he is working in the Department of Computer Science and Engineering, Krupajal Engineering College, Bhubaneswar. He has completed an M.C.A. from BPUT, Odisha.

Copyright (c) 2012 International Journal of Computer Science Issues. All Rights Reserved.
