
Presentation On Counting Frequent Itemsets

Uploaded by VISHWAJEET TYAGI
Copyright © All Rights Reserved

Counting Frequent Itemsets in a Stream

Points to be covered
❑ PCY algorithm
❑ Example
❑ References
Frequent Itemsets:
Sets of items that occur together frequently in the given dataset, i.e. sets whose support count (the number of transactions that contain the whole set) is at least a chosen minimum support count.

For example:
Bread and butter generally occur together frequently in the transaction dataset of a grocery store.
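As a small illustration of the definition, the Python sketch below computes the support count of an itemset as the number of transactions that contain all of its items, and calls the itemset frequent when that count reaches the minimum support. The helper names `support_count` and `is_frequent` and the grocery baskets are hypothetical, invented only for this example.

```python
def support_count(itemset, transactions):
    """Return the number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent if its support count reaches the minimum support."""
    return support_count(itemset, transactions) >= min_support


# Hypothetical grocery baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "eggs"},
]

print(support_count({"bread", "butter"}, baskets))               # 3
print(is_frequent({"bread", "butter"}, baskets, min_support=3))  # True
print(is_frequent({"bread", "jam"}, baskets, min_support=3))     # False
```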
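PCY algorithm:
The PCY algorithm is named after its developers, Park, Chen, and Yu. It is used in big data analytics for frequent itemset mining when the dataset is very large. It extends the Apriori approach by using the main memory that is otherwise idle during the first pass to hash item pairs into buckets, so that pairs falling into infrequent buckets can be skipped when pairs are counted.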

Steps:
1. Count the occurrences (support count) of each item in the dataset.
2. Reduce the candidate set to the items whose count meets the threshold.
3. Form pairs of the remaining items and count the occurrences of each pair.
4. Apply a hash function to each pair to find its bucket number.
5. Draw the candidate set table.
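The steps listed above are a simplified walk-through. In the classic two-pass formulation of PCY, pass 1 counts single items and also hashes every pair of every transaction into an array of bucket counters; between the passes the counters are summarized as a bit vector (1 for a bucket whose count reaches the threshold); pass 2 counts a pair only if both of its items are frequent and the pair hashes to a bucket whose bit is 1. The following is a minimal Python sketch of that formulation, assuming the whole dataset fits in a Python list and the hash function is passed in as a parameter. `pcy_frequent_pairs` is a hypothetical name, and the bit vector is a plain list rather than a packed bitmap.

```python
from collections import Counter
from itertools import combinations


def pcy_frequent_pairs(transactions, min_support, num_buckets, hash_pair):
    """Two-pass PCY sketch: returns (frequent pair counts, bucket bit vector)."""
    # Pass 1: count single items and hash every pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for t in transactions:
        item_counts.update(t)
        for pair in combinations(sorted(t), 2):
            bucket_counts[hash_pair(pair) % num_buckets] += 1

    # Between passes: keep frequent items and summarize buckets as a bit vector.
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    bit_vector = [1 if c >= min_support else 0 for c in bucket_counts]

    # Pass 2: count a pair only if both items are frequent and its bucket bit is 1.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(i for i in t if i in frequent_items)
        for pair in combinations(kept, 2):
            if bit_vector[hash_pair(pair) % num_buckets]:
                pair_counts[pair] += 1

    frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
    return frequent, bit_vector


# The twelve transactions T1..T12 from the example in this presentation.
transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 5}, {2, 4, 6},
    {1, 3, 4}, {2, 4, 5}, {3, 4, 6}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6},
]
# hash_pair returns i * j; the function applies "mod num_buckets" itself.
frequent, bit_vector = pcy_frequent_pairs(
    transactions, min_support=3, num_buckets=10, hash_pair=lambda p: p[0] * p[1]
)
print(sorted(frequent.items()))
# [((1, 3), 3), ((2, 3), 3), ((2, 4), 4), ((3, 4), 5), ((3, 5), 3), ((4, 5), 3), ((4, 6), 4)]
print(bit_vector)
# [1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
```

On the example data the sketch reports the same seven frequent pairs as the worked steps; the bit vector is 0 only for buckets 1, 7 and 9, which no pair hashes to.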
Example:
Threshold (minimum support count) = 3
Hash function = (i*j) mod 10, where i and j are the two items of a pair
T1 = {1, 2, 3}
T2 = {2, 3, 4}
T3 = {3, 4, 5}
T4 = {4, 5, 6}
T5 = {1, 3, 5}
T6 = {2, 4, 6}
T7 = {1, 3, 4}
T8 = {2, 4, 5}
T9 = {3, 4, 6}
T10 = {1, 2, 4}
T11 = {2, 3, 5}
T12 = {3, 4, 6}
Step 1: Count the occurrences (support count) of each item across all transactions.
Items → {1, 2, 3, 4, 5, 6}

Key (item)  Value (count)
1           4
2           6
3           8
4           9
5           5
6           4
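Step 1 amounts to a single counting pass over the data. A minimal Python sketch using `collections.Counter`, with the twelve transactions encoded as Python sets (the encoding is an assumption of this sketch); running it is a quick way to double-check the Key/Value table.

```python
from collections import Counter

# Transactions T1..T12 from the example, one set of item IDs each.
transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 5}, {2, 4, 6},
    {1, 3, 4}, {2, 4, 5}, {3, 4, 6}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6},
]

# Key = item, value = number of transactions that contain it.
item_counts = Counter(item for t in transactions for item in t)
for item in sorted(item_counts):
    print(item, item_counts[item])
# Prints one line per item: 1 4, 2 6, 3 8, 4 9, 5 5, 6 4
```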
Step 2: Remove all items whose count is less than the threshold (3).
In this example every item occurs at least 4 times, so no item is removed.
Hence, candidate set = {1, 2, 3, 4, 5, 6}
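The filtering in Step 2 can be sketched in Python as below, assuming the cut-off is the example's threshold of 3 used as a minimum support count. With this data no item is removed.

```python
from collections import Counter

MIN_SUPPORT = 3  # threshold from the example

# Transactions T1..T12 from the example.
transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 5}, {2, 4, 6},
    {1, 3, 4}, {2, 4, 5}, {3, 4, 6}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6},
]
item_counts = Counter(item for t in transactions for item in t)

# Keep only the items whose count reaches the threshold.
candidate_items = sorted(i for i, c in item_counts.items() if c >= MIN_SUPPORT)
print(candidate_items)  # [1, 2, 3, 4, 5, 6]
```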

Step 3: Form all pairs of candidate items in each transaction and count the occurrences of each pair.
T1: {(1, 2) (1, 3) (2, 3)} = (2, 3, 3)
T2: {(2, 4) (3, 4)} = (4, 5)
T3: {(3, 5) (4, 5)} = (3, 3)
T4: {(4, 6) (5, 6)} = (4, 1)
T5: {(1, 5)} = 1
T6: {(2, 6)} = 1
T7: {(1, 4)} = 2
T8: {(2, 5)} = 2
T9: {(3, 6)} = 2
T10: no new pairs
T11: no new pairs
T12: no new pairs
• Note: Pairs should not be repeated; skip any pair that has already been listed for an earlier transaction.
• Pairs whose count is at least the threshold value (3): {(1,3) (2,3) (2,4) (3,4) (3,5) (4,5) (4,6)}
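Step 3 can be sketched in Python with `itertools.combinations`: sorting each transaction before forming pairs guarantees every pair is written as (i, j) with i < j, so the same pair is never counted under two different spellings. The transaction encoding and variable names are assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations

MIN_SUPPORT = 3

# Transactions T1..T12 from the example.
transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 5}, {2, 4, 6},
    {1, 3, 4}, {2, 4, 5}, {3, 4, 6}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6},
]
candidate_items = {1, 2, 3, 4, 5, 6}  # result of Step 2

# Count every pair of candidate items that occurs together in a transaction.
pair_counts = Counter(
    pair
    for t in transactions
    for pair in combinations(sorted(i for i in t if i in candidate_items), 2)
)

frequent_pairs = sorted(p for p, c in pair_counts.items() if c >= MIN_SUPPORT)
print(frequent_pairs)
# [(1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5), (4, 6)]
```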
Step 4: Apply the hash function to each frequent pair. (The result is the pair's bucket number.)
Hash function = (i * j) mod 10
(1, 3) = (1*3) mod 10 = 3
(2,3) = (2*3) mod 10 = 6
(2,4) = (2*4) mod 10 = 8
(3,4) = (3*4) mod 10 = 2
(3,5) = (3*5) mod 10 = 5
(4,5) = (4*5) mod 10 = 0
(4,6) = (4*6) mod 10 = 4
Now, arrange the pairs in ascending order of their bucket numbers.
Bucket no. Pair
0 (4,5)
2 (3,4)
3 (1,3)
4 (4,6)
5 (3,5)
6 (2,3)
8 (2,4)
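The hashing and ordering in Step 4 can be sketched in Python as follows; `bucket_number` is a hypothetical helper name implementing the example's hash function (i * j) mod 10.

```python
NUM_BUCKETS = 10

# Frequent pairs from Step 3.
frequent_pairs = [(1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5), (4, 6)]


def bucket_number(pair):
    """Hash function from the example: (i * j) mod 10."""
    i, j = pair
    return (i * j) % NUM_BUCKETS


# Arrange the pairs in ascending order of bucket number.
ordered = sorted(frequent_pairs, key=bucket_number)
for pair in ordered:
    print(bucket_number(pair), pair)
# First line printed: 0 (4, 5); last line printed: 8 (2, 4)
```

Because `sorted` is stable, pairs that share a bucket would keep their original relative order; in this example every pair lands in a different bucket.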
Step 5: In this final step we prepare the candidate set.

Bit vector  Bucket no.  Support count  Pair   Candidate set
1           0           3              (4,5)  (4,5)
1           2           5              (3,4)  (3,4)
1           3           3              (1,3)  (1,3)
1           4           4              (4,6)  (4,6)
1           5           3              (3,5)  (3,5)
1           6           3              (2,3)  (2,3)
1           8           4              (2,4)  (2,4)
• Note: The support count is the number of transactions that contain the pair in that bucket (its count from Step 3). The bit vector entry is 1 when the bucket's count meets the threshold, and 0 otherwise.
• Write a pair in the candidate set if its support count is at least 3; if it is less than 3, reject it.
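Putting Steps 3 to 5 together, the Python sketch below builds the candidate-set table. It assumes a bucket's support count is the sum of the counts of the frequent pairs hashed into it, and that a bit-vector entry is 1 when that sum reaches the threshold; with this data every bucket holds a single pair, so the bucket count equals the pair count. Variable names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

MIN_SUPPORT = 3
NUM_BUCKETS = 10

# Transactions T1..T12 from the example.
transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 5}, {2, 4, 6},
    {1, 3, 4}, {2, 4, 5}, {3, 4, 6}, {1, 2, 4}, {2, 3, 5}, {3, 4, 6},
]

# Steps 3 and 4: count pairs, keep the frequent ones, hash them into buckets.
pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
buckets = defaultdict(list)
for (i, j), count in pair_counts.items():
    if count >= MIN_SUPPORT:
        buckets[(i * j) % NUM_BUCKETS].append((i, j))

# Step 5: one row per bucket: (bit vector, bucket no., support count, pairs, candidates).
rows = []
for b in sorted(buckets):
    support = sum(pair_counts[p] for p in buckets[b])
    bit = 1 if support >= MIN_SUPPORT else 0
    candidates = [p for p in buckets[b] if bit and pair_counts[p] >= MIN_SUPPORT]
    rows.append((bit, b, support, buckets[b], candidates))
    print(*rows[-1])
# First line printed: 1 0 3 [(4, 5)] [(4, 5)]
```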
References:
https://www.includehelp.com/big-data/pcy-algorithm-in-big-data-analytics.aspx
