Assignment 2
Contents

1 Question 1
  1.1 Description
  1.2 Answer
2 Question 2
  2.1 Description
  2.2 Answer
3 Question 3
  3.1 Description
  3.2 Answer
KSE525 Assignment 1
1 Question 1
1.1 Description
The Apriori algorithm uses a generate-and-count strategy for deriving frequent itemsets. Candidate itemsets of size k + 1 are created by joining a pair of frequent itemsets of size k (this is known as the candidate generation step). A candidate is discarded if any one of its subsets is found to be infrequent during the candidate pruning step. Suppose the Apriori algorithm is applied to the dataset shown below with minsup = 30%, i.e., any itemset occurring in fewer than 3 transactions is considered to be infrequent.
1. Draw an itemset lattice representing the dataset given in the above table. Label each node in the lattice with the following letter(s): N (the itemset is not considered to be a candidate), F (a frequent candidate itemset), and I (a candidate itemset found to be infrequent).
2. What is the percentage of frequent itemsets (with respect to all itemsets in the
lattice)?
3. What is the pruning ratio of the Apriori algorithm on this data set? (Pruning
ratio is defined as the percentage of itemsets not considered to be a candidate
because (1) they are not generated during candidate generation or (2) they are
pruned during the candidate pruning step.)
4. What is the false alarm rate (i.e., the percentage of candidate itemsets that are found to be infrequent after performing support counting)?
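The candidate generation and pruning steps described above can be sketched in Python. This is a minimal illustration of the generate-and-prune idea; the function names are my own, not part of the assignment.

```python
from itertools import combinations

def generate_candidates(frequent_k):
    """F_k x F_k join: merge two frequent k-itemsets that share their
    first k-1 items into a candidate of size k+1."""
    frequent_k = sorted(frequent_k)
    candidates = set()
    for i in range(len(frequent_k)):
        for j in range(i + 1, len(frequent_k)):
            a, b = frequent_k[i], frequent_k[j]
            if a[:-1] == b[:-1]:  # identical (k-1)-prefix
                candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates

def prune_candidates(candidates, frequent_k):
    """Candidate pruning: discard any candidate that has an
    infrequent subset of size k."""
    frequent_set = set(frequent_k)
    return {c for c in candidates
            if all(s in frequent_set for s in combinations(c, len(c) - 1))}

# The eight frequent 2-itemsets from the count table in the answer below:
F2 = [("a", "b"), ("a", "d"), ("a", "e"), ("b", "c"),
      ("b", "d"), ("b", "e"), ("c", "d"), ("d", "e")]
C3 = generate_candidates(F2)          # 6 candidate triples
C3_pruned = prune_candidates(C3, F2)  # 5 survive
```

Applied to the eight frequent 2-itemsets, the join step produces six candidate triples, and the pruning step drops (b, c, e) because its subset (c, e) is infrequent, leaving the five size-3 candidates in the count table.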
1.2 Answer
1. Before drawing the lattice, we should calculate the support of each itemset.
s1 = σ(a,b,d,e)/|T| = 2/10 = 0.2
s2 = σ(b,c,d)/|T| = 2/10 = 0.2
s3 = σ(a,b,d,e)/|T| = 2/10 = 0.2
s4 = σ(a,c,d,e)/|T| = 1/10 = 0.1
s5 = σ(b,c,d,e)/|T| = 1/10 = 0.1
s6 = σ(b,d,e)/|T| = 4/10 = 0.4
s7 = σ(c,d)/|T| = 4/10 = 0.4
s8 = σ(a,b,c)/|T| = 1/10 = 0.1
s9 = σ(a,d,e)/|T| = 4/10 = 0.4
s10 = σ(b,d)/|T| = 6/10 = 0.6
Item  Count     Item  Count     Item   Count
a     5         a,b   3         a,b,d  2
b     7         a,c   2         a,b,e  2
c     5         a,d   4         b,c,d  2
d     9         a,e   4         a,d,e  4
e     6         b,c   3         b,d,e  4
                b,d   6
                b,e   4
                c,d   4
                c,e   2
                d,e   6
2. The percentage of frequent itemsets with respect to all itemsets in the lattice is

Freq = Σ(F)/31 = 15/31 ≈ 48.4%
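As a sanity check, the 15/31 figure can be recomputed with a short Python snippet over the support counts from the table above:

```python
# Support counts copied from the table above (10 transactions, minsup count 3)
counts = {
    "a": 5, "b": 7, "c": 5, "d": 9, "e": 6,
    "a,b": 3, "a,c": 2, "a,d": 4, "a,e": 4, "b,c": 3,
    "b,d": 6, "b,e": 4, "c,d": 4, "c,e": 2, "d,e": 6,
    "a,b,d": 2, "a,b,e": 2, "b,c,d": 2, "a,d,e": 4, "b,d,e": 4,
}

frequent = [itemset for itemset, count in counts.items() if count >= 3]
total_itemsets = 2 ** 5 - 1  # all non-empty subsets of {a, b, c, d, e}

print(len(frequent))                                    # 15
print(total_itemsets)                                   # 31
print(round(100 * len(frequent) / total_itemsets, 1))   # 48.4
```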
3. The pruning ratio of the algorithm is calculated by summing the number of itemsets that are infrequent or never become Apriori candidates, and then dividing by the total number of itemsets. Thus,
Prun = Σ(I+N)/31 = 16/31 ≈ 51.6%
4. The false alarm rate is calculated by dividing the number of candidate itemsets found to be infrequent after performing support counting by the total number of itemsets. Thus,
Alarm = Σ(I)/31 = 5/31 ≈ 16%
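This, too, can be checked against the count table: of the 20 candidate itemsets that reach support counting, 5 turn out to be infrequent (the division by 31 follows the convention used above):

```python
# Support counts of the 20 candidate itemsets from the table above
candidate_counts = {
    "a": 5, "b": 7, "c": 5, "d": 9, "e": 6,
    "a,b": 3, "a,c": 2, "a,d": 4, "a,e": 4, "b,c": 3,
    "b,d": 6, "b,e": 4, "c,d": 4, "c,e": 2, "d,e": 6,
    "a,b,d": 2, "a,b,e": 2, "b,c,d": 2, "a,d,e": 4, "b,d,e": 4,
}

# Candidates whose support count falls below the minsup count of 3
false_alarms = [s for s, c in candidate_counts.items() if c < 3]
print(sorted(false_alarms))  # ['a,b,d', 'a,b,e', 'a,c', 'b,c,d', 'c,e']
print(round(100 * len(false_alarms) / 31, 1))  # 16.1
```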
2 Question 2
2.1 Description
The following contingency table summarizes supermarket transaction data, where hot dogs refers to the transactions containing hot dogs, ¬hot dogs refers to the transactions that do not contain hot dogs, hamburgers refers to the transactions containing hamburgers, and ¬hamburgers refers to the transactions that do not contain hamburgers.
1. Suppose that the association rule “hot dogs ⇒ hamburgers” is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?
2. Based on the given data, is the purchase of hot dogs independent of the purchase
of hamburgers? If not, what kind of correlation relationship exists between the
two?
2.2 Answer
1. In order for the association rule to be strong, the support should be greater than the minimum support threshold and the confidence should be greater than the minimum confidence threshold.
In our case,

sup = σ(hot dogs, hamburgers)/|T| = 2000/5000 = 40%

conf = σ(hot dogs, hamburgers)/σ(hot dogs) = 2000/3000 ≈ 66.7%
Both are greater than their thresholds, so we can say that the association rule is strong.
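The arithmetic can be verified with a few lines of Python, using the contingency counts that appear in the lift calculation below (2000 transactions with both items, 3000 with hot dogs, 5000 in total):

```python
both = 2000        # transactions containing hot dogs and hamburgers
hot_dogs = 3000    # transactions containing hot dogs
total = 5000       # all transactions

support = both / total        # 0.4
confidence = both / hot_dogs  # 0.666...

print(round(100 * support, 1))     # 40.0
print(round(100 * confidence, 1))  # 66.7
print(support >= 0.25 and confidence >= 0.50)  # True -> the rule is strong
```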
2. The purchase of hot dogs is independent of the purchase of hamburgers only if

P(A ∪ B) = P(A)P(B)

and this means that the lift equals 1, calculated by the following equation:

lift = P(A ∪ B) / (P(A)P(B))

In case the lift is greater than 1, the itemsets are positively correlated, and if it is less than 1, the itemsets are negatively correlated. So,

lift = P(hot dogs ∪ hamburgers) / (P(hot dogs)P(hamburgers)) = (2000/5000) / ((3000/5000)(2500/5000)) = 1.33

Since the lift is greater than 1, the purchases are not independent and the itemsets are positively correlated.
AllConf = sup(AB) / max(sup(A), sup(B))

MaxConf = max(sup(AB)/sup(A), sup(AB)/sup(B))

Kulc = (sup(AB)/2)(1/sup(A) + 1/sup(B))

Cosine = sup(AB) / √(sup(A)·sup(B))
Lift will be calculated by using the equation from the previous sub-question.
          hd ∧ hb   ¬hd ∧ hb   hd ∧ ¬hb   ¬hd ∧ ¬hb   AllConf   MaxConf   Kulc    Cosine   Lift
Dataset   2000      500        1000       1500        0.67      0.8       0.733   0.730    1.33
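As a sanity check, all five measures can be recomputed in Python from the contingency counts:

```python
from math import sqrt

# P(hd ∧ hb), P(hd), P(hb) from the contingency table
p_ab = 2000 / 5000
p_a = 3000 / 5000   # hot dogs
p_b = 2500 / 5000   # hamburgers

all_conf = p_ab / max(p_a, p_b)
max_conf = max(p_ab / p_a, p_ab / p_b)
kulc = (p_ab / 2) * (1 / p_a + 1 / p_b)
cosine = p_ab / sqrt(p_a * p_b)
lift = p_ab / (p_a * p_b)

print(round(all_conf, 2))  # 0.67
print(round(max_conf, 2))  # 0.8
print(round(kulc, 3))      # 0.733
print(round(cosine, 3))    # 0.73
print(round(lift, 2))      # 1.33
```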
3 Question 3
3.1 Description
Install R and then two packages arules and arulesViz. Answer the following ques-
tions using R. For each question, hand in your R code as well as your answer (result).
1. Load the “Groceries” data set. Please obtain the following information: (i) the
most frequent item, (ii) the length of the longest transaction, and (iii) the first
five transactions.
2. Mine all association rules with the minimum support 0.001 and the minimum confidence 0.8.
3. Draw a scatter plot for all association rules. Here, the x-axis represents the support, the y-axis represents the confidence, and the shading of a point represents the lift. [Hint: use the “plot” function in the arulesViz package.]
4. Select the top-3 association rules according to the lift and print these rules.
5. Draw the top-3 rules as a graph such that a node becomes an item. [Hint: use
the “plot” function in the arulesViz package.]
3.2 Answer
1. Before answering the questions, we should first install the required packages in RStudio and load the "Groceries" dataset.
# Install dependencies
install.packages("arules")

# Load libraries
library("Matrix")
library("arules")

# Load the Groceries dataset
data("Groceries")
Now we are able to run some statistics on the dataset. First, we can check the most frequent item and the length of the longest transaction just by looking at the summary of the dataset with summary(Groceries). We can see that the most frequent item is "whole milk", and the longest transaction consists of 32 items. The first five transactions can then be printed on the console with inspect(Groceries[1:5]).
2. After running some simple statistics on the dataset, and knowing what it consists of, we are able to mine association rules from the itemsets.

# Mine rules with minimum support 0.001 and minimum confidence 0.8
rules <- apriori(Groceries, parameter = list(support = 0.001, confidence = 0.8))

# Inspect rules
inspect(rules[1:10])
3. In order to draw the scatter plot we should first install the arulesViz package
and then load the library.
# Install dependencies
install.packages("arulesViz")

# Load libraries
library("arulesViz")

# Plot rules
plot(rules)
4. The top-3 association rules according to the lift are the following.
5. In order to draw the graph of those rules, we first need to save them in a separate variable to feed into the plot function.

# Create a subrules variable to draw the graph
subrules <- subset(rules, lift >= 8.34)

# Draw graph
plot(subrules, method = "graph")