
Data_Mining_Key_Concepts

Uploaded by manishpal2003
Copyright © All Rights Reserved

a. Draw a diagram of the key steps of data mining

Key steps of data mining:

1. Data Cleaning: Remove noise and inconsistent data.

2. Data Integration: Combine data from multiple sources.

3. Data Selection: Select relevant data for analysis.

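4. Data Transformation: Convert or consolidate data into forms suitable for mining (for example, by normalization or aggregation).

5. Data Mining: Apply algorithms to extract patterns from the data.

6. Pattern Evaluation: Identify the truly interesting patterns using interestingness measures.

7. Knowledge Presentation: Present the mined knowledge to users through visualization and knowledge representation techniques.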
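The seven steps can be traced end to end on a tiny example. This is a minimal sketch only: the customers and purchases lists, the age cut-off of 30, and the MIN_COUNT threshold are hypothetical values chosen for illustration, and "mining" here is just pattern counting rather than any particular published algorithm.

```python
from collections import Counter

# Hypothetical toy sources (assumed data, not from any real system).
customers = [{"id": 1, "age": 25}, {"id": 2, "age": None}, {"id": 3, "age": 40}]
purchases = [
    {"id": 1, "item": "tea"}, {"id": 1, "item": "tea"},
    {"id": 2, "item": "juice"}, {"id": 3, "item": "coffee"}, {"id": 3, "item": "tea"},
]
MIN_COUNT = 2  # assumed threshold for an "interesting" pattern

# 1. Data cleaning: drop customer records with a missing age.
customers = [c for c in customers if c["age"] is not None]
# 2. Data integration: join both sources on the shared key "id".
age_by_id = {c["id"]: c["age"] for c in customers}
joined = [{**p, "age": age_by_id[p["id"]]} for p in purchases if p["id"] in age_by_id]
# 3. Data selection: keep only the attributes relevant to the question.
selected = [(r["age"], r["item"]) for r in joined]
# 4. Data transformation: discretize age into a categorical band.
transformed = [("young" if age < 30 else "older", item) for age, item in selected]
# 5. Data mining: count how often each (age band, item) pair occurs.
patterns = Counter(transformed)
# 6. Pattern evaluation: keep only patterns that meet the threshold.
interesting = {p: n for p, n in patterns.items() if n >= MIN_COUNT}
# 7. Knowledge presentation: report the surviving patterns.
for (band, item), n in sorted(interesting.items()):
    print(f"{band} customers bought {item} {n} times")
```

Running it prints a single line, "young customers bought tea 2 times", because customer 2 is removed during cleaning and the remaining pairs occur only once.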

b. Define the terms Support and Confidence

Support: The proportion of transactions in the dataset that contain the itemset.

Support(X) = (Number of transactions containing X) / (Total number of transactions)

Confidence: It measures the reliability of a rule X -> Y, calculated as the proportion of transactions containing X that also contain Y.

Confidence(X -> Y) = Support(X ∪ Y) / Support(X)
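Both formulas can be checked directly on a small set of transactions. A minimal sketch, assuming each transaction is a Python set of item names; the four baskets below are made-up sample data.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence(X -> Y) = Support(X ∪ Y) / Support(X)."""
    sx = support(x, transactions)
    if sx == 0:
        raise ValueError("X never occurs, so the confidence is undefined")
    return support(set(x) | set(y), transactions) / sx

# Made-up sample baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))      # 0.5 (2 of 4 transactions)
print(confidence({"bread"}, {"milk"}, transactions))  # (2/4) / (3/4), about 0.667
```

Here bread appears in 3 of the 4 baskets and bread with milk in 2, so the rule bread -> milk holds in two thirds of the baskets that contain bread.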

c. Explain the Data Warehouse Process

The Data Warehouse process involves the following steps:

1. Data Extraction: Gather data from multiple sources.

2. Data Transformation: Clean and standardize data for consistency.

3. Data Loading: Store transformed data in the data warehouse.

4. Data Access: Enable users to query and analyze the data for decision-making.
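A minimal sketch of these four steps in Python, using the standard-library sqlite3 module with an in-memory database as a stand-in for the warehouse. The two source lists (store_a, store_b), the sales table, and the cleaning rules are hypothetical, chosen only to illustrate each step.

```python
import sqlite3

# Hypothetical operational sources with inconsistent formats.
store_a = [{"id": 1, "city": " delhi ", "amount": "120.50"},
           {"id": 2, "city": "MUMBAI", "amount": "80"}]
store_b = [("3", "Delhi", 45.0), ("4", None, 10.0)]

def extract():
    """Data extraction: gather raw rows from every source into one list."""
    rows = [(r["id"], r["city"], r["amount"]) for r in store_a]
    rows += list(store_b)
    return rows

def transform(rows):
    """Data transformation: drop incomplete rows and standardize types and spelling."""
    clean = []
    for rid, city, amount in rows:
        if city is None:  # assumed rule: a record without a city is unusable
            continue
        clean.append((int(rid), city.strip().title(), float(amount)))
    return clean

def load(conn, rows):
    """Data loading: store the transformed rows in the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# Data access: analysts query the integrated, consistent data.
for city, total in conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"):
    print(city, total)
```

After transformation " delhi " and "Delhi" become the same value, so the query reports one total per city (Delhi 165.5, Mumbai 80.0) instead of splitting Delhi across two spellings.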

d. Illustrate the Warehousing Strategy

A data warehousing strategy involves:

1. Top-down Approach: Design the enterprise-wide warehouse first, followed by smaller data marts.

2. Bottom-up Approach: Build data marts first, integrating them later into a warehouse.

3. Hybrid Approach: Combines top-down and bottom-up approaches for flexibility and scalability.

e. Write the statement of the Apriori Algorithm

The Apriori Algorithm identifies frequent itemsets in a transaction dataset using a bottom-up, level-wise approach. It starts with the frequent single items and, in each pass, extends the frequent k-itemsets into candidate (k+1)-itemsets, keeping only candidates whose subsets are all frequent and whose support meets the minimum support threshold. It relies on the Apriori Property: "All non-empty subsets of a frequent itemset must also be frequent."
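The level-wise search can be sketched compactly in Python. This is a minimal, unoptimized illustration, assuming min_support is given as a fraction of all transactions; the join step here simply unions pairs of frequent (k-1)-itemsets, which is simpler (and slower) than the sorted-prefix join often used in textbook presentations. The basket data is made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Keep the candidates that meet the minimum support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
for itemset, s in sorted(apriori(baskets, 0.6).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a 60% threshold (3 of 5 baskets) the sketch finds four frequent items and four frequent pairs. The candidates {bread, diaper, beer} and {milk, diaper, beer} are pruned without a database scan because {bread, beer} and {milk, beer} are infrequent, and {bread, milk, diaper} is counted but falls short with support 0.4.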

f. List the drawbacks of the k-means algorithm

1. Requires the number of clusters (k) to be specified in advance.

2. Sensitive to the choice of initial centroids and to outliers.

3. Works well only with roughly spherical (convex) clusters of similar size.

4. May converge to a local minimum rather than the globally optimal solution.

5. Can become costly on very large datasets, since every iteration computes the distance from each point to each centroid.
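Drawbacks 1, 2 and 4 can be seen on four points with a basic k-means (Lloyd's algorithm) written from scratch. A minimal sketch: the points and the two sets of starting centroids are made-up values picked to show how the starting centroids change the result.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

def sse(centroids, clusters):
    """Sum of squared distances from each point to its centroid (lower is better)."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, c))
               for c, cl in zip(centroids, clusters) for p in cl)

points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good = kmeans(points, [(0, 0), (4, 0)])  # k = 2 must be chosen up front
bad = kmeans(points, [(2, 0), (2, 1)])   # same k, different starting centroids
print(good[0], sse(*good))  # [(0.0, 0.5), (4.0, 0.5)] 1.0
print(bad[0], sse(*bad))    # [(2, 0), (2, 1)] 16
```

Both runs converge, but the second one stops at a local minimum that splits the data into a bottom row and a top row, with a squared error 16 times larger than the natural left/right split.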

g. Explain Classification

Classification is a supervised learning technique that assigns data objects to predefined categories (class labels). A model is built from labeled training data and then applied to predict the class labels of new, unseen data. Common algorithms include Decision Trees, Naive Bayes, and Support Vector Machines (SVM).
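The train-then-predict workflow can be shown with a small categorical Naive Bayes classifier. A minimal sketch, assuming discrete attribute values and Laplace (add-one) smoothing; the function names and the weather-style training rows are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    domains = defaultdict(set)           # attribute -> distinct values seen
    for features, label in rows:
        for attr, value in features.items():
            value_counts[(attr, label)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains

def predict(model, features):
    """Return the class maximizing P(class) * product of P(value | class)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for attr, value in features.items():
            # Laplace-smoothed likelihood P(attr = value | class)
            score *= (value_counts[(attr, label)][value] + 1) / (n_c + len(domains[attr]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training data: should we play outside?
train = [
    ({"outlook": "sunny", "windy": "no"}, "yes"),
    ({"outlook": "sunny", "windy": "yes"}, "no"),
    ({"outlook": "rainy", "windy": "yes"}, "no"),
    ({"outlook": "overcast", "windy": "no"}, "yes"),
    ({"outlook": "rainy", "windy": "no"}, "yes"),
]
model = train_naive_bayes(train)
print(predict(model, {"outlook": "rainy", "windy": "yes"}))     # no
print(predict(model, {"outlook": "overcast", "windy": "no"}))   # yes
```

For the first query the scores are 3/5 * 1/3 * 1/5 = 0.04 for "yes" and 2/5 * 2/5 * 3/4 = 0.12 for "no", so the model predicts "no". The smoothing keeps an unseen value (such as "windy = yes" for class "yes") from forcing a score to zero.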

h. Discuss Clustering

Clustering is an unsupervised learning method that groups similar data points into clusters based on shared characteristics, so that points in the same cluster are more similar to each other than to points in other clusters. Examples include K-means, DBSCAN, and Hierarchical Clustering. Unlike classification, clustering does not require labeled data.
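Hierarchical clustering can be illustrated with a naive agglomerative (bottom-up) version using single linkage on one-dimensional values. A minimal sketch: the function name, the choice of single linkage, and the stopping rule "merge until k clusters remain" are assumptions for illustration, and no labels are used anywhere.

```python
def single_linkage(points, k):
    """Agglomerative clustering: start with singletons, repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members of the two clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

print(single_linkage([1, 2, 3, 10, 11, 25], k=3))  # [[1, 2, 3], [10, 11], [25]]
```

The three groups emerge purely from the distances between values. This naive version recomputes every pairwise cluster distance on each merge, so it only suits small inputs.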


i. Explain the Need for Data Mining

Data mining is essential to:

1. Extract useful patterns and insights from large datasets.

2. Aid decision-making processes in business, healthcare, and education.

3. Detect fraud, predict trends, and improve efficiency in various domains.

4. Handle and analyze the growing volume of data effectively.

j. Write a short note on Binning

Binning is a data smoothing technique that reduces noise in numerical data by sorting the values, grouping them into bins (intervals), and then smoothing each value using the other values in its bin. Methods include:

1. Equal-width binning: Divides the range of values into bins of equal width.

2. Equal-frequency (equal-depth) binning: Divides the data so that each bin contains (approximately) the same number of values.

3. Smoothing by bin means: Replaces each value in a bin with the mean of that bin.
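The three methods can be compared on one small list of values. A minimal sketch with hypothetical function names; the price values are made-up sample data, and values that fall exactly on an equal-width boundary are assumed to go to the upper bin.

```python
def equal_frequency_bins(values, n_bins):
    """Split sorted values into n_bins bins holding (nearly) the same number of items."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins

def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # The maximum value would map to index n_bins, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1) if width else 0
        bins[idx].append(v)
    return bins

def smooth_by_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[sum(b) / len(b)] * len(b) if b else [] for b in bins]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # sample values
print(equal_frequency_bins(prices, 3))   # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(equal_frequency_bins(prices, 3)))
print(equal_width_bins(prices, 3))       # [[4, 8], [15, 21, 21], [24, 25, 28, 34]]
```

Smoothing the equal-frequency bins by their means turns the data into 9.0, 22.0 and 29.0 (three values each). Equal-width binning uses intervals 10 units wide here, so the bins hold different numbers of values.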
