DM-Ass03
DS3002-Data Mining
Assignment #3
Spring 2024
Total: 100
Deadline: 30 March 2024
Assignment 3 DS3002-Data Mining Total marks: 100
Important Instructions
You must complete the assignment before the deadline.
Since the assignment is a little lengthy, get started right
away!
Deliverables:
Report 10 Marks
Q1 (50 Marks)
Q2 (5+20+5+10 = 40 Marks)
You should first write a function that performs K-Means Clustering on color pixels. The
input arguments are the desired number of clusters k and the data points to cluster. It
outputs a cluster index for each input data point, as well as the k cluster centroids. You
should then extract the seed pixels for each class (i.e., foreground and background) and
use your k-means function to obtain N clusters for each class. A good choice for N is 64,
but you can experiment with smaller or bigger values.
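The function described above could be sketched as follows. This is a minimal illustration using NumPy and plain Lloyd iterations, not a prescribed solution; the name `kmeans`, the `n_iters` and `seed` parameters, and the random-point initialisation are assumptions made for the example.

```python
import numpy as np


def kmeans(points, k, n_iters=20, seed=0):
    """Cluster an (n, d) array of points; return (labels, centroids).

    labels    : (n,) cluster index for each input point
    centroids : (k, d) cluster centres
    """
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Assumption: initialise with k distinct data points (requires n >= k).
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=np.int64)
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid: (n, k).
        diff = points[:, None, :] - centroids[None, :, :]
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids
```

For the assignment, `points` would be the RGB values of the seed pixels of one class (an array of shape `(num_seed_pixels, 3)`), and the function would be called once for the foreground seeds and once for the background seeds with `k = N`.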
You might see improved results if you remove the square operation in the equation above.
A possible reason is that squaring makes small values even smaller, which
(adversely) weakens the effect of the negative exponential. Finally, a
given pixel is simply assigned to the class to which it is more likely to belong. That is, if
p_fg(p) > p_bg(p), pixel p is assigned to the foreground class, i.e., x_p = 1, and vice versa.
Include your results for all test images in your report, and explain what you get. For test
images with two stroke images, you should report results for both cases. Also compare
results for different values of N, i.e., the number of clusters used for each of the
foreground and background classes.
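The decision rule p_fg(p) > p_bg(p) can be illustrated with a short sketch. The likelihood equation itself is not reproduced in this text, so the form used here is only an assumption: each class likelihood is taken as an unweighted sum, over that class's cluster centroids, of a negative exponential of the distance between the pixel colour and the centroid. The function names and the `squared` flag are hypothetical; the flag switches between the squared and non-squared variants mentioned above.

```python
import numpy as np


def class_likelihood(pixels, centroids, squared=False):
    """Assumed likelihood: sum over centroids of exp(-distance).

    pixels    : (n, 3) colours, assumed scaled to [0, 1] to avoid underflow
    centroids : (k, 3) cluster centres of one class
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)
    diff = pixels[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # (n, k) Euclidean distances
    if squared:
        dist = dist ** 2
    return np.exp(-dist).sum(axis=1)


def assign_pixels(pixels, fg_centroids, bg_centroids, squared=False):
    """Return x_p = 1 where p_fg(p) > p_bg(p), else 0."""
    p_fg = class_likelihood(pixels, fg_centroids, squared)
    p_bg = class_likelihood(pixels, bg_centroids, squared)
    return (p_fg > p_bg).astype(np.uint8)
```

If the equation in the handout includes cluster weights or a different distance, `class_likelihood` would need to be changed accordingly; the comparison in `assign_pixels` stays the same.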
Tasks
1. Pre-process the dataset by normalizing each face image vector to unit
length (i.e., dividing each vector by its magnitude). Next, for each of the
10 subjects, randomly select 150 images for training and use the
remaining 20 for testing.
3. Use scikit-learn (sklearn) to apply SVM and GaussianNB to this dataset, and
compare their accuracy with K-NN in the report.
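The pre-processing in Task 1 and the comparison in Task 3 might be sketched as below. This is an illustration under assumptions, not a reference solution: the helper names are hypothetical, `X` is assumed to be an array with one flattened face image per row, `y` holds the subject label of each row, and scikit-learn's `KNeighborsClassifier` with one neighbour stands in for whichever K-NN the assignment intends.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def normalize_rows(X):
    """Divide each row vector by its Euclidean norm (unit length)."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)  # leave all-zero rows as-is


def split_per_subject(y, n_train, seed=0):
    """Return (train_idx, test_idx) with n_train random images per subject."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for subject in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == subject))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])  # the remaining images go to testing
    return np.array(train), np.array(test)


def compare_classifiers(X_train, y_train, X_test, y_test):
    """Fit each model with default settings and return its test accuracy."""
    models = {
        "SVM": SVC(),
        "GaussianNB": GaussianNB(),
        "K-NN": KNeighborsClassifier(n_neighbors=1),  # assumed k = 1
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }
```

With 170 images per subject, `split_per_subject(y, 150)` leaves 20 test images for each of the 10 subjects; the returned indices can then be used as `X[train_idx]`, `y[train_idx]`, and so on when calling `compare_classifiers`.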
Rubric
Criteria                                     Marks
Marks Distribution
- Report                                     10
- Q1                                         50
- Q2                                         40
Overall Presentation
- Clarity and organization of the report     5
- Documentation and comments in code         5