Data Mining mod 2
Example:
🔹 2.1 Item
An item is a single object, product, or attribute being analyzed.
🔹 2.2 Itemset
A collection of one or more items.
● 1-itemset: {milk}
● 2-itemset: {milk, bread}
🔹 2.3 Transaction
A set of items bought together in a single purchase. Transactions are stored in a transactional database.
TID Items
1 {milk, bread}
3 {bread, butter}
Example:
🔹 2.6 Support
The fraction or percentage of transactions that contain the itemset.
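As a rough illustration of the definition, the sketch below counts how many transactions contain a given itemset and divides by the total. The three-transaction database is partly hypothetical: TIDs 1 and 3 follow the transaction table, while TID 2 is an assumption made only so the example runs. The function name `support` is also just an illustrative choice.

```python
from fractions import Fraction

# Small example database. TID 2 is an assumed row, not taken from the notes.
transactions = [
    {"milk", "bread"},    # TID 1
    {"milk", "bread"},    # TID 2 (assumed)
    {"bread", "butter"},  # TID 3
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return Fraction(hits, len(transactions))

print(support({"milk"}, transactions))           # 2/3
print(support({"milk", "bread"}, transactions))  # 2/3
print(support({"bread"}, transactions))          # 1
```

Using `Fraction` keeps the result exact; multiplying by 100 gives the percentage form.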
🔹 2.7 Confidence
The conditional probability that a transaction containing itemset A also contains itemset B.
$$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} = P(B \mid A)$$
● If:
○ Support({milk, bread}) = 2 / 3
○ Support({milk}) = 2 / 3
Then:
○ Confidence({milk} ⇒ {bread}) = (2 / 3) / (2 / 3) = 1 (100%)
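The confidence formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the database is partly hypothetical (TID 2 is assumed so that the supports match the 2 / 3 values above), and the helper names `support` and `confidence` are illustrative.

```python
from fractions import Fraction

# Example database chosen so that Support({milk}) = Support({milk, bread}) = 2/3.
# TID 2 is an assumed row.
transactions = [
    {"milk", "bread"},    # TID 1
    {"milk", "bread"},    # TID 2 (assumed)
    {"bread", "butter"},  # TID 3
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A), an estimate of P(B | A)."""
    a = set(antecedent)
    both = a | set(consequent)
    return support(both, transactions) / support(a, transactions)

print(confidence({"milk"}, {"bread"}, transactions))  # 1
print(confidence({"bread"}, {"milk"}, transactions))  # 2/3
```

The two calls show that confidence is not symmetric: every milk transaction contains bread, but only two of the three bread transactions contain milk. The sketch does not guard against an antecedent with zero support, which would raise `ZeroDivisionError`.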
Example:
Where:
Measure Use
🔷 3. Problem Definition
Let’s formally define the task:
○ Minimum Support
○ Minimum Confidence
This principle allows the algorithm to prune the search space and avoid counting the support of
every possible item combination.
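Assuming the principle referred to here is the Apriori (downward-closure) property, where every subset of a frequent itemset must itself be frequent, the pruning test can be sketched as follows. The set of frequent 2-itemsets and the function name `has_infrequent_subset` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """True if any (k-1)-subset of `candidate` is not in `frequent_smaller`."""
    k = len(candidate)
    return any(
        frozenset(sub) not in frequent_smaller
        for sub in combinations(candidate, k - 1)
    )

# Hypothetical frequent 2-itemsets; {milk, butter} is deliberately absent.
frequent_2 = {
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
}

candidate = frozenset({"milk", "bread", "butter"})

# {milk, butter} is a subset of the candidate and is not frequent,
# so the candidate can be discarded without scanning the database.
print(has_infrequent_subset(candidate, frequent_2))  # True
```

A candidate that passes this check is not guaranteed to be frequent; its support still has to be counted against the transactions.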
🔸 Example:
If {milk, butter} is not frequent, then:
Steps:
TID Items
3 {bread, butter}
4 {milk, butter}
5 {bread}