simple 4,6 DWDM
1st question
What It Does: The Apriori Algorithm finds items that are often bought together in transactions.
Steps
1. Candidate Generation: Count how often each item (then each pair, triple, ...) appears across the transactions.
2. Threshold: Only keep itemsets that appear frequently enough (e.g., in at least 40% of transactions).
3. Repeat: Combine the surviving itemsets into larger candidates until no new frequent itemsets are found.
Example
Dataset:
Transaction   Items Bought
T1            Bread, Milk, Beer
T2            Bread, Diaper, Milk
T3            Milk, Diaper, Beer
T4            Bread, Milk
T5            Bread, Diaper
Result: With the 40% threshold (at least 2 of 5 transactions), the frequent items are Bread (4), Milk (4), Diaper (3), and Beer (2); the frequent pairs are {Bread, Milk} (3), {Bread, Diaper} (2), {Milk, Diaper} (2), and {Milk, Beer} (2). No triple appears in 2 or more transactions.
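The counting-and-pruning loop is short enough to sketch in Python (a minimal illustration over the toy dataset above, using the 40% threshold from the example; all names here are just for this sketch, not an optimized implementation):

transactions = [
    {"Bread", "Milk", "Beer"},     # T1
    {"Bread", "Diaper", "Milk"},   # T2
    {"Milk", "Diaper", "Beer"},    # T3
    {"Bread", "Milk"},             # T4
    {"Bread", "Diaper"},           # T5
]
min_support = 0.4  # keep itemsets appearing in at least 40% of transactions

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level 1: frequent single items.
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

# Level k: join surviving (k-1)-itemsets into k-item candidates, then prune by support.
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

# Print every frequent itemset, level by level, with its support.
for level in frequent:
    for itemset in sorted(level, key=sorted):
        print(set(itemset), f"support={support(itemset):.0%}")

Running it reproduces the result above: the four frequent items, the four frequent pairs, and no frequent triples.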
2nd question
What It Does: FP-Growth finds frequent itemsets without generating candidates like Apriori. It uses a special tree (FP-Tree) to store transactions compactly.
Steps
1. Scan the dataset once to count items; keep only the frequent ones, sorted by frequency.
2. Build the FP-Tree by inserting each transaction (items in frequency order) so that shared prefixes share branches.
3. Mine the tree:
◦ Start from the bottom of the tree and find combinations of frequent items.
Example:
Transaction   Items Bought
T1            Bread, Milk, Beer
T2            Bread, Milk, Diaper
T3            Milk, Diaper, Beer
T4            Bread, Milk
T5            Bread, Diaper
Frequent Items: Bread (4), Milk (4), Diaper (3), Beer (2).
FP-Tree (items inserted in frequency order Bread, Milk, Diaper, Beer; T3 contains no Bread, so it forms its own branch under the root):
NULL
├── Bread (4)
│   ├── Milk (3)
│   │   ├── Beer (1)
│   │   └── Diaper (1)
│   └── Diaper (1)
└── Milk (1)
    └── Diaper (1)
        └── Beer (1)
Frequent Itemsets (minimum count 2): {Bread} (4), {Milk} (4), {Diaper} (3), {Beer} (2), {Bread, Milk} (3), {Bread, Diaper} (2), {Milk, Diaper} (2), {Milk, Beer} (2).
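Here is an illustrative Python sketch of the two passes that build the tree (the mining step is omitted for brevity; the Node class and helper names are invented for this sketch):

from collections import Counter

transactions = [
    ["Bread", "Milk", "Beer"],    # T1
    ["Bread", "Milk", "Diaper"],  # T2
    ["Milk", "Diaper", "Beer"],   # T3
    ["Bread", "Milk"],            # T4
    ["Bread", "Diaper"],          # T5
]
min_count = 2

# Pass 1: count items and keep the frequent ones, most frequent first.
counts = Counter(i for t in transactions for i in t)
order = [i for i, c in counts.most_common() if c >= min_count]

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

root = Node(None)

# Pass 2: insert each transaction with items sorted by global frequency,
# so transactions with common prefixes share the same branch.
for t in transactions:
    node = root
    for item in sorted((i for i in t if i in order), key=order.index):
        node = node.children.setdefault(item, Node(item))
        node.count += 1

def show(node, depth=0):
    # Print the tree with indentation matching the drawing above.
    for child in node.children.values():
        print("  " * depth + f"{child.item} ({child.count})")
        show(child, depth + 1)

show(root)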
3rd question
What It Does: Classification assigns categories (labels) to data based on past examples.
Challenges:
1. Imbalanced Data: If one class (e.g., "spam") is rare, the model may focus on the common class and misclassify the rare one.
Conclusion: Classification helps predict labels, but good data and careful modeling are essential.
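A minimal sketch of one common remedy for the imbalanced-data problem, assuming scikit-learn is available (the toy features and the choice of LogisticRegression are illustrative, not part of the notes above):

from sklearn.linear_model import LogisticRegression

X = [[0.1], [0.2], [0.3], [0.4], [0.5], [3.0]]  # toy 1-D features
y = [0, 0, 0, 0, 0, 1]                          # the positive class ("spam") is rare

# Without weighting, the model can do well by mostly predicting the common class.
# class_weight="balanced" reweights examples inversely to class frequency,
# forcing the model to pay attention to the rare class.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(clf.predict([[2.5]]))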
4th question
What It Does: A Decision Tree splits data into branches based on rules to classify it into categories.
Steps
1. Feature Selection: Choose the best feature to split the data (e.g., "Weather"), typically the one with the highest information gain.
2. Splitting: Create a branch for each value of that feature and repeat on each subset.
3. Stopping: Stop when a branch is pure (all one class) and assign that class as the leaf label.
Example
Dataset:
Weather     Play?
Sunny       No
Sunny       No
Overcast    Yes
Rainy       Yes
Rainy       Yes
Decision Tree:
Weather?
├── Sunny: No
├── Overcast: Yes
└── Rainy: Yes
Prediction: For a new day with Weather = Sunny, the tree follows the Sunny branch and predicts No.
Advantages
◦ Easy to understand and visualize.
◦ Handles both categorical and numerical features.
Conclusion: Decision Trees classify data by splitting it based on features. They're simple but need pruning
to work well.
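The feature-selection step can be sketched in Python with information gain (the dataset mirrors the Weather example above; the second feature "Windy" is made up here purely so there is something to compare against):

from collections import Counter
from math import log2

rows = [
    {"Weather": "Sunny",    "Windy": "No",  "Play": "No"},
    {"Weather": "Sunny",    "Windy": "Yes", "Play": "No"},
    {"Weather": "Overcast", "Windy": "No",  "Play": "Yes"},
    {"Weather": "Rainy",    "Windy": "No",  "Play": "Yes"},
    {"Weather": "Rainy",    "Windy": "Yes", "Play": "Yes"},
]

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(feature):
    # Entropy of the labels minus the weighted entropy after splitting on feature.
    labels = [r["Play"] for r in rows]
    total = entropy(labels)
    for value in {r[feature] for r in rows}:
        subset = [r["Play"] for r in rows if r[feature] == value]
        total -= len(subset) / len(rows) * entropy(subset)
    return total

best = max(["Weather", "Windy"], key=gain)
print(best, gain(best))  # "Weather" wins: each of its branches is pure

Splitting on Weather makes every branch pure in one step, which is exactly why the drawn tree needs only that single split.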
Support Vector Machine (SVM)
• What It Does: SVM separates data points into classes using a line (or hyperplane in higher
dimensions).
• Key Idea: It finds the line that keeps the maximum distance from the closest points of each class (called support vectors).
• When To Use: For data with clear separation between classes.
• Advantages:
◦ Works well with high-dimensional data.
◦ Effective when there’s a clear margin between classes.
• Disadvantages:
◦ Can be slow with large datasets.
◦ Needs careful selection of parameters (like kernels).
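A minimal usage sketch, assuming scikit-learn's SVC is available (the 2-D points are illustrative):

from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]  # two separable clusters
y = [0, 0, 0, 1, 1, 1]

# kernel="linear" looks for the separating line with the widest margin;
# the training points closest to that line are the support vectors.
clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)   # the points that define the margin
print(clf.predict([[2, 2]]))  # classify a new point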
K-Nearest Neighbors (KNN)
• What It Does: KNN assigns a class to a point based on the majority class of its k-nearest neighbors.
• Key Idea: It uses the distance to nearby points to classify new data.
• When To Use: Simple problems with smaller datasets.
• Advantages:
◦ Easy to understand and implement.
◦ No need for a training phase (lazy learning).
• Disadvantages:
◦ Can be slow for large datasets.
◦ Sensitive to irrelevant features and the value of k.
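A minimal hand-rolled sketch of the majority-vote idea (the training points and labels are illustrative):

from collections import Counter
from math import dist

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((4.0, 4.0), "B"),
         ((4.2, 4.1), "B"), ((3.9, 4.3), "B")]

def knn_predict(point, k=3):
    # Sort training points by distance and take the majority label of the k nearest.
    nearest = sorted(train, key=lambda p: dist(point, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((3.8, 3.9)))  # the 3 nearest points are all "B"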
Both are useful, but SVM is better for complex and high-dimensional problems, while KNN is great for
simpler, intuitive tasks.