Assignment 3 DS3002-Data Mining Total marks: 100

DS3002-Data Mining
Assignment #3
Spring 2024

Total: 100
Deadline: 30 March 2024


Important Instructions
- You must complete the assignment before the deadline. Since the assignment is a
  little lengthy, get started right away!
- Plagiarism is strictly prohibited! Try to solve the assignment on your own. Marks
  will be given based on your thought process, so make sure you have a solid reason
  for all your attempts.
- No assignment will be accepted after the due date.
- If plagiarism is found, the whole assignment will receive a straight zero.

Deliverables:

- Submission Format: i21-XXXX_A3.zip
- Submit 2 Jupyter Notebooks (Q1 and Q2 separately) and a Report

Marks Distribution

- Report (10 Marks)
- Q1 (50 Marks)
- Q2 (5+20+5+10 = 40 Marks)


Q1. Interactive Foreground Segmentation Using K-Means Clustering

You will implement a basic version of the interactive image cut-out / segmentation
approach called Lazy Snapping [1]. You are given several test images, along with
corresponding auxiliary images depicting the foreground and background “seed” pixels
marked with red and blue brush-strokes respectively. Your program should exploit these
partial human annotations in order to compute a precise figure-ground segmentation.

You should first write a function that performs K-Means Clustering on color pixels. The
input arguments are the desired number of clusters k and the data points to cluster. It
outputs a cluster index for each input data point, as well as the k cluster centroids. You
should then extract the seed pixels for each class (i.e., foreground and background) and
use your k-means function to obtain N clusters for each class. A good choice for N is 64,
but you can experiment with smaller or bigger values.
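
The two steps in the paragraph above (a k-means function and seed extraction) could be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the names `kmeans` and `extract_seeds`, the parameters `n_iters` and `seed`, the random-point initialisation, and the color thresholds used to detect "red" and "blue" strokes are all illustrative choices that the assignment does not specify.

```python
import numpy as np

def kmeans(points, k, n_iters=50, seed=0):
    """Cluster `points` (n, d) into k groups; returns (labels, centroids)."""
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Assumption: initialise with k distinct data points (requires n >= k).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid: (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def extract_seeds(image, strokes):
    """Return (fg_pixels, bg_pixels): image colors under red / blue strokes.

    Assumption: strokes are (nearly) pure red or pure blue; the thresholds
    below are illustrative and may need tuning for the provided stroke images.
    """
    r, g, b = (strokes[..., c].astype(np.int64) for c in range(3))
    fg_mask = (r > 200) & (g < 50) & (b < 50)
    bg_mask = (b > 200) & (r < 50) & (g < 50)
    return image[fg_mask], image[bg_mask]
```

With these helpers, `kmeans(fg_pixels, 64)` and `kmeans(bg_pixels, 64)` would give the N = 64 centroids per class, provided each class has at least 64 seed pixels.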

Next, compute the likelihood of a given pixel p to belong to each of the N clusters of a
given class (either foreground or background) using an exponential function of the
negative of the (squared) Euclidean distance between the respective cluster center C_k
and the given pixel I_p in the RGB color space. The overall likelihood p(p) of the pixel
to belong to this class is a weighted sum of these per-cluster likelihoods:

p(p) = \sum_{k=1}^{N} w_k \exp(-\lVert I_p - C_k \rVert^2)

For this assignment, every weight w_k is 0.1.
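
The weighted sum described above could be computed along these lines. This is a sketch, not the required implementation: the function name `class_likelihood` and the `squared` switch are hypothetical, and it assumes the pixel values have been scaled to a small range such as [0, 1], since with raw 0-255 values the exponential of a large negative distance underflows to zero.

```python
import numpy as np

def class_likelihood(pixels, centroids, weight=0.1, squared=True):
    """Weighted sum over clusters of exp(-distance), one value per pixel.

    pixels: (n, 3) RGB values; centroids: (N, 3) cluster centers.
    weight: the constant w_k (0.1 in this assignment).
    squared: use the squared Euclidean distance if True, the plain one if False.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)
    diff = pixels[:, None, :] - centroids[None, :, :]
    dist = (diff ** 2).sum(axis=2)          # squared distances, shape (n, N)
    if not squared:
        dist = np.sqrt(dist)
    return (weight * np.exp(-dist)).sum(axis=1)
```

The intermediate array has shape (n, N, 3), so for a full image it may be worth calling the function on chunks of pixels rather than all at once.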

You might see improved results if you remove the square operation in the equation above.
The reason may have to do with the fact that squaring reduces small values even further,
(adversely) mitigating the effect of the negative exponential. Finally, a given pixel is
simply assigned to the class to which it is more likely to belong. That is, if
p_fg(p) > p_bg(p), pixel p is assigned to the foreground class, i.e., x_p = 1, and vice versa.
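
The decision rule above could be turned into a mask and a cut-out image roughly as follows. The function name `cut_out` is hypothetical, the likelihoods are assumed to be flat arrays in row-major pixel order, and sending ties to the background is an assumption (the text only covers the strict inequality).

```python
import numpy as np

def cut_out(image, p_fg, p_bg):
    """Label each pixel and blank out the background.

    image: (H, W, 3) array; p_fg, p_bg: per-pixel likelihoods of length H*W
    in row-major order. Returns (mask, foreground_only_image).
    """
    h, w = image.shape[:2]
    # x_p = 1 where the foreground likelihood is strictly larger, else 0.
    mask = (np.asarray(p_fg) > np.asarray(p_bg)).reshape(h, w).astype(np.uint8)
    foreground = image * mask[..., None]    # background pixels become black
    return mask, foreground
```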

Include your results for all test images in your report, and explain what you get. For test
images with two stroke images, you should report results for both cases. Also compare
results for different values of N, i.e., the number of clusters used for the foreground
and background classes.

[Figure: results for N = 64 (both foreground and background)]


Q2. Face Recognition Using K-NN

For this question, we use a simplified version of the CMU Pose, Illumination, and
Expression (PIE) dataset, which contains only 10 subjects spanning five near-frontal
poses, with 170 images for each individual. In addition, all the images have been resized
to 32x32 pixels. The dataset is provided in the form of a csv file with 1700 rows and 1024
columns. Each row is an instance and each column a feature. The first 170 instances
belong to the first subject, the next 170 to the second subject, and so on. The following
figure illustrates various instances of the first subject.
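
Because the csv file has no label column in this description, the subject labels have to be derived from the row order. A small sketch of that bookkeeping is shown below; the helper names `make_labels` and `row_to_image` are hypothetical, the subjects are numbered 0 to 9 by assumption, and whether the 1024 values are stored row-major or column-major is not stated, so a displayed face may need transposing.

```python
import numpy as np

N_SUBJECTS = 10
IMAGES_PER_SUBJECT = 170

def make_labels(n_subjects=N_SUBJECTS, per_subject=IMAGES_PER_SUBJECT):
    """Label vector implied by the row order: 170 zeros, then 170 ones, ..."""
    return np.repeat(np.arange(n_subjects), per_subject)

def row_to_image(row, side=32):
    """Reshape one 1024-long feature row into a 32x32 array for display.

    Assumption: row-major pixel order; use `.T` on the result if faces
    appear rotated.
    """
    return np.asarray(row).reshape(side, side)
```

The data itself could be read with something like `np.loadtxt(path, delimiter=",")`, where `path` and the absence of a header row are assumptions about the provided file.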

Tasks
1. Pre-process the dataset by normalizing each face image vector to unit
   length (i.e., dividing each vector by its magnitude). Next, for each of the
   10 subjects, randomly select 150 images for training and use the
   remaining 20 for testing.

2. Implement a k Nearest Neighbors (k-NN) classifier from scratch using
   the training set and evaluate its performance on the test set. You may not
   use built-in / library functions to implement the classifier.
   a. You should also implement Euclidean distance and cosine similarity
      measures and use them for different values of K.
   b. You should also present results when fewer training images are
      used (for instance, 100 training images and 70 test images per
      subject).
   c. Write results in the report for k-values 2, 5, 7, and 11 with each distance
      metric.

3. Use Sklearn to apply SVM and GaussianNB on this dataset and compare
   the accuracy with K-NN in the report.

4. Perform dimensionality reduction on the training and testing datasets and
   visualize them in 3-D space using Principal Component Analysis (PCA)
   and matplotlib or Seaborn.
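
Tasks 1 and 2 could be approached along the following lines. This is only a sketch under assumptions: all function names are hypothetical, labels are assumed to be integers 0..C-1, NumPy array arithmetic is assumed to be permitted under the "from scratch" rule (check with the instructor), and vote ties are broken in favour of the smallest label, which the assignment does not specify.

```python
import numpy as np

def normalize_rows(X):
    """Scale every row to unit Euclidean length."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)   # all-zero rows stay unchanged

def split_per_subject(y, n_train, seed=0):
    """Randomly pick n_train row indices per subject; the rest are test rows."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

def pairwise_distance(A, B, metric="euclidean"):
    """Distance matrix of shape (len(A), len(B)); rows must be non-zero for cosine."""
    if metric == "euclidean":
        d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.sqrt(np.maximum(d2, 0.0))       # clip tiny negative round-off
    if metric == "cosine":
        An = A / np.linalg.norm(A, axis=1, keepdims=True)
        Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
        return 1.0 - An @ Bn.T                    # cosine distance = 1 - similarity
    raise ValueError(f"unknown metric: {metric}")

def knn_predict(X_train, y_train, X_test, k, metric="euclidean"):
    """Majority vote among the k nearest training rows (ties: smallest label)."""
    D = pairwise_distance(X_test, X_train, metric)
    nearest = np.argsort(D, axis=1)[:, :k]        # indices of the k closest rows
    votes = y_train[nearest]                      # (n_test, k) neighbour labels
    n_classes = int(y_train.max()) + 1
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return counts.argmax(axis=1)
```

Accuracy is then `np.mean(knn_predict(...) == y_test)`, and calling `split_per_subject(y, 100)` gives the 100/70 variant. One thing worth checking in the report: for unit-length vectors the squared Euclidean distance equals 2 minus twice the cosine similarity, so the two measures should rank neighbours in the same order.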


Rubric

Criteria                                                      Marks

Marks Distribution
- Report                                                      10
- Q1                                                          50
- Q2                                                          40

Q1. Interactive Foreground Segmentation Using K-Means
- K-Means Clustering function implementation                  15
- Seed pixel extraction and likelihood computation            20
- Experimentation and comparison with different N values      15

Q2. Face Recognition Using K-NN
- Pre-processing and data splitting                           5
- k-NN classifier implementation                              20
- Evaluation with different K values                          10
- Evaluation with fewer training images                       5
- Comparison with SVM and GaussianNB                          5
- Dimensionality reduction and visualization                  10

Overall Presentation
- Clarity and organization of the report                      5
- Documentation and comments in code                          5
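
For the Q2 rubric items "Comparison with SVM and GaussianNB" and "Dimensionality reduction and visualization", one possible starting point is sketched below. The helper names are hypothetical, both classifiers are used with scikit-learn's default settings (the assignment does not name a kernel or any hyperparameters), and fitting PCA on the training rows only is a design assumption rather than a stated requirement.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA

def baseline_accuracies(X_train, y_train, X_test, y_test):
    """Fit SVC and GaussianNB with default settings; return test accuracies."""
    models = {"SVM": SVC(), "GaussianNB": GaussianNB()}
    return {name: model.fit(X_train, y_train).score(X_test, y_test)
            for name, model in models.items()}

def project_3d(X_train, X_test):
    """Fit PCA on the training rows only, then project both sets to 3-D."""
    pca = PCA(n_components=3).fit(X_train)
    return pca.transform(X_train), pca.transform(X_test)

def plot_3d(Z, y, title):
    """Scatter a 3-D projection, one color per subject."""
    import matplotlib.pyplot as plt   # imported here so the rest works without it
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(Z[:, 0], Z[:, 1], Z[:, 2], c=y, cmap="tab10", s=10)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_zlabel("PC 3")
    ax.set_title(title)
    return fig
```

A typical use would be `Z_train, Z_test = project_3d(X_train, X_test)` followed by `plot_3d(Z_train, y_train, "Training set")` and the same call for the test set.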
