simple 4,6 DWDM
1st question
What It Does: The Apriori Algorithm finds items that are often bought together in transactions.
Steps
1. Candidate Generation: Count how often each item (then each pair, triple, ...) appears across the transactions.
2. Threshold: Only keep itemsets that appear frequently enough (e.g., in at least 40% of transactions).
3. Repeat: Combine the surviving itemsets into larger candidates until no new frequent itemsets are found.
Example
Dataset:
Transaction   Items Bought
T1            Bread, Milk, Beer
T2            Bread, Diaper, Milk
T3            Milk, Diaper, Beer
T4            Bread, Milk
T5            Bread, Diaper
Result: With the 40% threshold (at least 2 of 5 transactions), the frequent items are Bread (4), Milk (4), Diaper (3), and Beer (2); the frequent pairs are {Bread, Milk} (3), {Bread, Diaper} (2), {Milk, Diaper} (2), and {Milk, Beer} (2). No triple appears in 2 or more transactions.
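The counting-and-pruning loop is short enough to sketch in Python (a minimal illustration over the toy dataset above, using the 40% threshold from the example; all names here are just for this sketch, not an optimized implementation):

transactions = [
    {"Bread", "Milk", "Beer"},     # T1
    {"Bread", "Diaper", "Milk"},   # T2
    {"Milk", "Diaper", "Beer"},    # T3
    {"Bread", "Milk"},             # T4
    {"Bread", "Diaper"},           # T5
]
min_support = 0.4  # keep itemsets appearing in at least 40% of transactions

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level 1: frequent single items.
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

# Level k: join surviving (k-1)-itemsets into k-item candidates, then prune by support.
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

# Print every frequent itemset, level by level, with its support.
for level in frequent:
    for itemset in sorted(level, key=sorted):
        print(set(itemset), f"support={support(itemset):.0%}")

Running it reproduces the result above: the four frequent items, the four frequent pairs, and no frequent triples.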
2nd question
What It Does: FP-Growth finds frequent itemsets without generating candidates like Apriori. It uses a special tree (FP-Tree) to store transactions compactly.
Steps
1. Scan the dataset once to count items; keep only the frequent ones, sorted by frequency.
2. Build the FP-Tree by inserting each transaction (items in frequency order) so that shared prefixes share branches.
3. Mine the tree:
◦ Start from the bottom of the tree and find combinations of frequent items.
Example:
Transaction   Items Bought
T1            Bread, Milk, Beer
T2            Bread, Milk, Diaper
T3            Milk, Diaper, Beer
T4            Bread, Milk
T5            Bread, Diaper
Frequent Items: Bread (4), Milk (4), Diaper (3), Beer (2).
FP-Tree (items inserted in frequency order Bread, Milk, Diaper, Beer; T3 contains no Bread, so it forms its own branch under the root):
NULL
├── Bread (4)
│   ├── Milk (3)
│   │   ├── Beer (1)
│   │   └── Diaper (1)
│   └── Diaper (1)
└── Milk (1)
    └── Diaper (1)
        └── Beer (1)
Frequent Itemsets (minimum count 2): {Bread} (4), {Milk} (4), {Diaper} (3), {Beer} (2), {Bread, Milk} (3), {Bread, Diaper} (2), {Milk, Diaper} (2), {Milk, Beer} (2).
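Here is an illustrative Python sketch of the two passes that build the tree (the mining step is omitted for brevity; the Node class and helper names are invented for this sketch):

from collections import Counter

transactions = [
    ["Bread", "Milk", "Beer"],    # T1
    ["Bread", "Milk", "Diaper"],  # T2
    ["Milk", "Diaper", "Beer"],   # T3
    ["Bread", "Milk"],            # T4
    ["Bread", "Diaper"],          # T5
]
min_count = 2

# Pass 1: count items and keep the frequent ones, most frequent first.
counts = Counter(i for t in transactions for i in t)
order = [i for i, c in counts.most_common() if c >= min_count]

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

root = Node(None)

# Pass 2: insert each transaction with items sorted by global frequency,
# so transactions with common prefixes share the same branch.
for t in transactions:
    node = root
    for item in sorted((i for i in t if i in order), key=order.index):
        node = node.children.setdefault(item, Node(item))
        node.count += 1

def show(node, depth=0):
    # Print the tree with indentation matching the drawing above.
    for child in node.children.values():
        print("  " * depth + f"{child.item} ({child.count})")
        show(child, depth + 1)

show(root)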
3rd question
What It Does: Classification assigns categories (labels) to data based on past examples.
Challenges:
1. Imbalanced Data: If one class (e.g., "spam") is rare, the model may focus on the common class and misclassify the rare one.
Conclusion: Classification helps predict labels, but good data and careful modeling are essential.
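A minimal sketch of one common remedy for the imbalanced-data problem, assuming scikit-learn is available (the toy features and the choice of LogisticRegression are illustrative, not part of the notes above):

from sklearn.linear_model import LogisticRegression

X = [[0.1], [0.2], [0.3], [0.4], [0.5], [3.0]]  # toy 1-D features
y = [0, 0, 0, 0, 0, 1]                          # the positive class ("spam") is rare

# Without weighting, the model can do well by mostly predicting the common class.
# class_weight="balanced" reweights examples inversely to class frequency,
# forcing the model to pay attention to the rare class.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(clf.predict([[2.5]]))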
4th question
What It Does: A Decision Tree splits data into branches based on rules to classify it into categories.
Steps
1. Feature Selection: Choose the best feature to split the data (e.g., "Weather"), typically the one with the highest information gain.
2. Splitting: Create a branch for each value of that feature and repeat on each subset.
3. Stopping: Stop when a branch is pure (all one class) and assign that class as the leaf label.
Example
Dataset:
Weather     Play?
Sunny       No
Sunny       No
Overcast    Yes
Rainy       Yes
Rainy       Yes
Decision Tree:
Weather?
├── Sunny: No
├── Overcast: Yes
└── Rainy: Yes
Prediction: For a new day with Weather = Sunny, the tree follows the Sunny branch and predicts No.
Advantages
◦ Easy to understand and visualize.
◦ Handles both categorical and numerical features.
Conclusion: Decision Trees classify data by splitting it based on features. They're simple but need pruning
to work well.
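The feature-selection step can be sketched in Python with information gain (the dataset mirrors the Weather example above; the second feature "Windy" is made up here purely so there is something to compare against):

from collections import Counter
from math import log2

rows = [
    {"Weather": "Sunny",    "Windy": "No",  "Play": "No"},
    {"Weather": "Sunny",    "Windy": "Yes", "Play": "No"},
    {"Weather": "Overcast", "Windy": "No",  "Play": "Yes"},
    {"Weather": "Rainy",    "Windy": "No",  "Play": "Yes"},
    {"Weather": "Rainy",    "Windy": "Yes", "Play": "Yes"},
]

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(feature):
    # Entropy of the labels minus the weighted entropy after splitting on feature.
    labels = [r["Play"] for r in rows]
    total = entropy(labels)
    for value in {r[feature] for r in rows}:
        subset = [r["Play"] for r in rows if r[feature] == value]
        total -= len(subset) / len(rows) * entropy(subset)
    return total

best = max(["Weather", "Windy"], key=gain)
print(best, gain(best))  # "Weather" wins: each of its branches is pure

Splitting on Weather makes every branch pure in one step, which is exactly why the drawn tree needs only that single split.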
Support Vector Machine (SVM)
• What It Does: SVM separates data points into classes using a line (or hyperplane in higher
dimensions).
• Key Idea: It finds the line that keeps the maximum distance from the closest points of each class (called support vectors).
• When To Use: For data with clear separation between classes.
• Advantages:
◦ Works well with high-dimensional data.
◦ Effective when there’s a clear margin between classes.
• Disadvantages:
◦ Can be slow with large datasets.
◦ Needs careful selection of parameters (like kernels).
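A minimal usage sketch, assuming scikit-learn's SVC is available (the 2-D points are illustrative):

from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]  # two separable clusters
y = [0, 0, 0, 1, 1, 1]

# kernel="linear" looks for the separating line with the widest margin;
# the training points closest to that line are the support vectors.
clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)   # the points that define the margin
print(clf.predict([[2, 2]]))  # classify a new point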
K-Nearest Neighbors (KNN)
• What It Does: KNN assigns a class to a point based on the majority class of its k-nearest neighbors.
• Key Idea: It uses the distance to nearby points to classify new data.
• When To Use: Simple problems with smaller datasets.
• Advantages:
◦ Easy to understand and implement.
◦ No need for a training phase (lazy learning).
• Disadvantages:
◦ Can be slow for large datasets.
◦ Sensitive to irrelevant features and the value of k.
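A minimal hand-rolled sketch of the majority-vote idea (the training points and labels are illustrative):

from collections import Counter
from math import dist

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((4.0, 4.0), "B"),
         ((4.2, 4.1), "B"), ((3.9, 4.3), "B")]

def knn_predict(point, k=3):
    # Sort training points by distance and take the majority label of the k nearest.
    nearest = sorted(train, key=lambda p: dist(point, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((3.8, 3.9)))  # the 3 nearest points are all "B"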
Both are useful, but SVM is better for complex and high-dimensional problems, while KNN is great for
simpler, intuitive tasks.