Mining Massive Data University of Primorska Fall 2020

This document contains 8 questions about frequent itemset mining and finding similar items. Specifically, it asks about computing frequent items and itemsets given transaction data with different support thresholds, calculating confidence of association rules, using a triangular matrix to count item pairs, applying the PCY algorithm to transaction data hashed into buckets, estimating expected Jaccard similarity of random sets, computing the maximum number of shingles in a document, calculating minhash signatures to estimate Jaccard similarity between columns of a matrix, and evaluating an S-curve function.

Uploaded by

Đorđe Klisura

Mining Massive Data

University of Primorska
Fall 2020

Homework 1
(Frequent itemset mining and Finding similar items)

1. Suppose there are 100 items, numbered 1 to 100, and also 100 baskets, also numbered 1 to 100.
Item i is in basket b if and only if i divides b with no remainder. Thus, item 1 is in all the baskets,
item 2 is in all fifty of the even-numbered baskets, and so on. Basket 12 consists of items {1, 2, 3,
4, 6, 12}, since these are all the integers that divide 12. Answer the following questions:
(a) If the support threshold is 5, which items are frequent?
(b) If the support threshold is 5, which pairs of items are frequent?
(c) What is the sum of the sizes of all the baskets?
(d) Suppose the support threshold is 5. Find the maximal frequent itemsets.
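One way to check answers to Problem 1 is to build the baskets directly and count. The sketch below assumes that "frequent" means support greater than or equal to the threshold; all names (`baskets`, `support`, `maximal`) are illustrative. Part (d) relies on the observation that an itemset is contained in basket b exactly when the least common multiple of its items divides b, so every frequent itemset is a subset of the divisors of some number L with 100 // L >= 5.

```python
from itertools import combinations

N = 100          # items and baskets are both numbered 1..N
THRESHOLD = 5    # support threshold from the problem statement

# Basket b holds every item i that divides b.
baskets = {b: {i for i in range(1, b + 1) if b % i == 0} for b in range(1, N + 1)}


def support(itemset):
    """Number of baskets that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for items in baskets.values() if wanted <= items)


# (a) Frequent items.
frequent_items = [i for i in range(1, N + 1) if support({i}) >= THRESHOLD]

# (b) Frequent pairs; both members of a frequent pair must be frequent items.
frequent_pairs = [p for p in combinations(frequent_items, 2)
                  if support(p) >= THRESHOLD]

# (c) Sum of the sizes of all baskets.
total_size = sum(len(items) for items in baskets.values())

# (d) Candidate maximal itemsets are the divisor sets of L = 1..N // THRESHOLD;
# keep those that are not a proper subset of another candidate.
divisor_sets = [frozenset(baskets[L]) for L in range(1, N // THRESHOLD + 1)]
maximal = [d for d in divisor_sets if not any(d < other for other in divisor_sets)]

print("frequent items:", frequent_items)
print("number of frequent pairs:", len(frequent_pairs))
print("sum of basket sizes:", total_size)
for itemset in maximal:
    print("maximal:", sorted(itemset))
```

The brute-force `support` function is slow for large inputs but is adequate for 100 baskets and keeps the logic close to the problem statement.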

2. Consider the data of the previous problem. What is the confidence of the following association
rules?
(a) {5, 7} → 2.
(b) {2, 3, 4} → 5.
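The confidence of a rule I → j is support(I ∪ {j}) / support(I). A minimal sketch that evaluates both rules on the divisor baskets of Problem 1 follows; the helper names `support` and `confidence` are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Rebuild the data of Problem 1: basket b holds every item that divides b.
baskets = [{i for i in range(1, b + 1) if b % i == 0} for b in range(1, 101)]


def support(itemset):
    """Number of baskets containing every item in `itemset`."""
    return sum(1 for items in baskets if itemset <= items)


def confidence(lhs, rhs_item):
    """Confidence of the rule lhs -> rhs_item as an exact fraction."""
    return Fraction(support(lhs | {rhs_item}), support(lhs))


conf_a = confidence({5, 7}, 2)      # rule {5, 7} -> 2
conf_b = confidence({2, 3, 4}, 5)   # rule {2, 3, 4} -> 5
print(conf_a, conf_b)
```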

3. Suppose that we use a triangular matrix to count pairs, and n, the number of items, is 20.
(a) What pair’s count is in a[100]?
(b) Suppose the support threshold is 5. Find the maximal frequent itemsets.
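The problem does not state which triangular layout is used. The sketch below assumes the common row-major layout in which the pair {i, j} with 1 ≤ i < j ≤ n is stored at the 1-based position k = (i − 1)(n − i/2) + j − i; if the course uses a different layout, the mapping changes accordingly. The function names are illustrative.

```python
def pair_to_index(i, j, n):
    """1-based position of pair {i, j}, 1 <= i < j <= n, in the assumed layout."""
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= n")
    # (i - 1) * (n - i / 2) written with integers only; the product is always even.
    return (i - 1) * (2 * n - i) // 2 + (j - i)


def index_to_pair(k, n):
    """Inverse mapping: walk the rows until position k falls inside one."""
    if k < 1:
        raise ValueError("k must be at least 1")
    for i in range(1, n):
        row_length = n - i          # row i stores pairs (i, i+1) .. (i, n)
        if k <= row_length:
            return (i, i + k)
        k -= row_length
    raise ValueError("k is larger than the number of pairs")


print(index_to_pair(100, 20))
```

The two functions are inverses of each other, which gives a quick consistency check for any n.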

4. Here is a collection of twelve baskets. Each contains three of the six items 1 through 6.
{1, 2, 3} {2, 3, 4} {3, 4, 5} {4, 5, 6}
{1, 3, 5} {2, 4, 6} {1, 3, 4} {2, 4, 5}
{3, 5, 6} {1, 2, 4} {2, 3, 5} {3, 4, 6}
Suppose the support threshold is 4. On the first pass of the PCY Algorithm we use a hash table
with 11 buckets, and the set {i, j} is hashed to bucket i × j mod 11.
(a) By any method, compute the support for each item and each pair of items.
(b) Which pairs hash to which buckets?
(c) Which buckets are frequent?
(d) Which pairs are counted on the second pass of the PCY Algorithm?
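The four parts of Problem 4 can be traced with a short script. This is a sketch that counts everything in memory rather than a faithful two-pass implementation; it assumes a bucket or item is frequent when its count is at least the threshold, and the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

baskets = [
    (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6),
    (1, 3, 5), (2, 4, 6), (1, 3, 4), (2, 4, 5),
    (3, 5, 6), (1, 2, 4), (2, 3, 5), (3, 4, 6),
]
SUPPORT = 4
NUM_BUCKETS = 11


def bucket(i, j):
    """Hash function from the problem statement."""
    return (i * j) % NUM_BUCKETS


# (a) Support of every item and every pair.
item_support = Counter(item for basket in baskets for item in basket)
all_pairs = list(combinations(range(1, 7), 2))
pair_support = {pair: 0 for pair in all_pairs}
bucket_count = [0] * NUM_BUCKETS
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_support[pair] += 1
        bucket_count[bucket(*pair)] += 1   # what the first PCY pass would record

# (b) Bucket of each pair.
pair_bucket = {pair: bucket(*pair) for pair in all_pairs}

# (c) Frequent buckets.
frequent_buckets = {k for k, count in enumerate(bucket_count) if count >= SUPPORT}

# (d) Pairs counted on the second pass: both items frequent, bucket frequent.
frequent_items = {item for item, count in item_support.items() if count >= SUPPORT}
candidates = [pair for pair in all_pairs
              if pair[0] in frequent_items and pair[1] in frequent_items
              and pair_bucket[pair] in frequent_buckets]

print("item support:", dict(sorted(item_support.items())))
print("pair support:", pair_support)
print("bucket counts:", bucket_count)
print("frequent buckets:", sorted(frequent_buckets))
print("second-pass candidates:", candidates)
```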

5. Suppose we have a universal set U of n elements, and we choose two subsets S and T at random,
each with m of the n elements. What is the expected value of the Jaccard similarity of S and T?
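A candidate closed form for Problem 5 can be checked numerically. The sketch below assumes S and T are drawn independently and uniformly from all m-element subsets, so the overlap |S ∩ T| = k follows a hypergeometric distribution and the Jaccard similarity for that overlap is k / (2m − k). The function name `expected_jaccard` is illustrative.

```python
from fractions import Fraction
from math import comb


def expected_jaccard(n, m):
    """Exact expected Jaccard similarity of two random m-subsets of an n-set."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    total = comb(n, m)                      # number of ways to choose T
    expectation = Fraction(0)
    # The overlap k cannot be smaller than 2m - n (pigeonhole) or larger than m.
    for k in range(max(0, 2 * m - n), m + 1):
        probability = Fraction(comb(m, k) * comb(n - m, m - k), total)
        expectation += probability * Fraction(k, 2 * m - k)
    return expectation


print(expected_jaccard(4, 2))
print(float(expected_jaccard(100, 10)))
```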

6. What is the largest number of k-shingles a document of n bytes can have? You may assume that
the size of the alphabet is large enough that the number of possible strings of length k is at least n.
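To experiment with Problem 6, it helps to count distinct shingles directly. A document of n bytes has n − k + 1 starting positions for a window of length k (when n ≥ k), so the count can never exceed that, and repeated windows lower it. The helper `shingles` below is an illustrative sketch.

```python
def shingles(doc: bytes, k: int) -> set:
    """Set of distinct k-shingles (length-k byte substrings) of `doc`."""
    return {doc[i:i + k] for i in range(len(doc) - k + 1)}


distinct_doc = bytes(range(10))    # 10 bytes, all different
repetitive_doc = b"aaaaaaaaaa"     # 10 bytes, all the same

print(len(shingles(distinct_doc, 3)))
print(len(shingles(repetitive_doc, 3)))
```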

7. Consider the matrix with six rows below.

(a) Compute the minhash signature for each column if we use the following three hash functions:
h1(x) = 2x + 1 mod 6; h2(x) = 3x + 2 mod 6; h3(x) = 5x + 2 mod 6.
(b) Which of these hash functions are true permutations?
(c) How close are the estimated Jaccard similarities for the six pairs of columns to the true
Jaccard similarities?
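The matrix for Problem 7 is not reproduced in this copy, so the sketch below uses a made-up 6 × 4 matrix purely for illustration; substitute the assignment's columns before drawing conclusions. It assumes rows are numbered 0 to 5 and reads each hash function as (a·x + c) mod 6. All variable names are illustrative.

```python
from itertools import combinations

NUM_ROWS = 6

# HYPOTHETICAL data, not the assignment's matrix:
# each column is given as the set of row numbers (0..5) that contain a 1.
columns = [{0, 2, 5}, {1, 2, 4}, {3, 4, 5}, {0, 1}]

# Hash functions from part (a), read as (a*x + c) mod 6.
hash_functions = [
    lambda x: (2 * x + 1) % 6,
    lambda x: (3 * x + 2) % 6,
    lambda x: (5 * x + 2) % 6,
]

# (b) A hash function is a true permutation if it maps the rows to distinct values.
is_permutation = [len({h(x) for x in range(NUM_ROWS)}) == NUM_ROWS
                  for h in hash_functions]

# (a) Minhash signature of a column: for each hash function,
# the smallest hash value over the rows where the column has a 1.
signatures = [[min(h(row) for row in column) for h in hash_functions]
              for column in columns]


def true_jaccard(a, b):
    return len(a & b) / len(a | b)


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two columns agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# (c) Compare estimate and truth for all six column pairs.
print("permutation?", is_permutation)
for i, j in combinations(range(len(columns)), 2):
    print(i, j,
          round(estimated_jaccard(signatures[i], signatures[j]), 3),
          round(true_jaccard(columns[i], columns[j]), 3))
```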

8. Evaluate the following S-curve:

1 − (1 − s^r)^b

for s = 0.1, 0.2, ..., 0.9, for the following values of r and b:

(a) r = 3 and b = 10.
(b) r = 6 and b = 20.
(c) r = 5 and b = 50.
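The values for Problem 8 can be tabulated with a few lines of code. The sketch assumes the S-curve meant here is the standard banding formula 1 − (1 − s^r)^b, where r is the number of rows per band and b the number of bands; `s_curve` and `table` are illustrative names.

```python
def s_curve(s, r, b):
    """Probability 1 - (1 - s^r)^b under the assumed banding formula."""
    return 1 - (1 - s ** r) ** b


settings = [(3, 10), (6, 20), (5, 50)]                    # parts (a), (b), (c)
similarities = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

table = {(r, b): [s_curve(s, r, b) for s in similarities] for r, b in settings}

for (r, b), values in table.items():
    print(f"r = {r}, b = {b}")
    for s, value in zip(similarities, values):
        print(f"  s = {s:.1f}  ->  {value:.4f}")
```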
