Data_Mining_Key_Concepts
4. Data Access: Enable users to query and analyze the data for decision-making.
2. Bottom-up Approach: Build data marts first, then integrate them into a data warehouse.
3. Hybrid Approach: Combines top-down and bottom-up approaches for flexibility and scalability.
The Apriori Algorithm identifies frequent itemsets (itemsets whose support meets a minimum threshold)
in a dataset using a bottom-up approach. It starts with single items and iteratively extends frequent
itemsets one item at a time, keeping a candidate only if all of its subsets are frequent. It relies on
the Apriori Property: "All non-empty subsets of a frequent itemset must also be frequent."
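The level-wise search described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes transactions are given as sets of item names and that `min_support` is an absolute count (many texts use a fraction of the transactions instead). The function name `apriori` is hypothetical. Each level has a join step (union pairs of frequent (k-1)-itemsets into k-itemsets) and a prune step that applies the Apriori Property before support is counted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate; keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([item]) for t in transactions for item in t})
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = set()
        # Join step: merge two frequent (k-1)-itemsets when their union has k items.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (Apriori Property): every (k-1)-subset must be frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset, support in apriori(transactions, min_support=3).items():
    print(sorted(itemset), support)
```

With these toy transactions and a threshold of 3, the candidate {bread, diapers, beer} is pruned without being counted because its subset {bread, beer} appears in only two transactions.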
4. May converge to a local minimum and fail to find the globally optimal solution.
Classification is a supervised learning technique used to assign data to predefined categories (class
labels). It builds a model from labeled training data and then applies that model to predict the class
labels of new data. Common algorithms include Decision Trees, Naive Bayes, and SVM.
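To make the train-then-predict workflow concrete, here is a small sketch of one of the listed algorithms, Naive Bayes, in its categorical form with Laplace (add-one) smoothing. The function names `train_naive_bayes` and `predict` and the toy weather-style dataset are hypothetical; in practice a library implementation would usually be used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Fit a categorical Naive Bayes model by counting (hypothetical helper)."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of training rows
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each feature index
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[label][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict(model, row):
    """Return the label with the highest log-posterior for one new row."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # log P(label) + sum_i log P(x_i | label), using add-one smoothing
        score = math.log(n / total)
        for i, v in enumerate(row):
            count = feature_counts[label][i][v]
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Training data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")))      # new, unseen combination
print(predict(model, ("overcast", "cool")))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the add-one smoothing keeps an unseen feature value from forcing a class probability to zero.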
Clustering is an unsupervised learning method used to group similar data points into clusters based
on shared characteristics. Examples include K-means, DBSCAN, and Hierarchical Clustering. Unlike
classification, clustering does not require predefined class labels.
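As an illustration of one of the listed methods, here is a minimal K-means sketch. It assumes numeric points given as tuples, squared Euclidean distance, and initial centroids sampled from the data (a common choice, but not the only one). Because the result depends on these initial centroids, different seeds can converge to different local minima. The names `sq_dist` and `kmeans` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points sampled without replacement
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Note that no labels are supplied as input: the groups emerge only from distances between points, which is what distinguishes clustering from classification.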
Binning is a data smoothing technique used to reduce noise in numerical data by grouping values into bins.
2. Equal-frequency binning: Divides data such that each bin contains the same number of elements.
3. Smoothing by bin means: Replaces each value in a bin with the mean of that bin.
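The two techniques above can be combined into a short sketch: sort the values, split them into equal-frequency bins, then smooth each bin by its mean. When the number of values is not divisible by the number of bins, this sketch gives the earlier bins one extra element each; that tie-breaking rule is an assumption, not part of the definition. The function names are hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Sort values and split them into n_bins bins of (nearly) equal size."""
    if not 0 < n_bins <= len(values):
        raise ValueError("n_bins must be between 1 and the number of values")
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins each take one leftover value.
        end = start + size + (1 if i < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins


def smooth_by_bin_means(bins):
    """Replace every value in each bin with that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)                       # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_bin_means(bins))  # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
```

Each smoothed value now reflects its neighbours in sorted order, so an isolated fluctuation within a bin is averaged out.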