
EE708: Fundamentals of Data Science and Machine Intelligence

Assignment 4
Based on Module 4B: Decision Trees and Module 5: Gaussian Mixture Modeling

1. A dataset contains 200 samples classified into two classes: 120 positive and 80 negative.
a. Compute the Gini index before splitting.
b. If a split results in subsets:
Left: (50 positive, 10 negative)
Right: (70 positive, 70 negative)
Compute the weighted Gini index and determine whether the split improves purity.
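
For reference, a minimal Python sketch (not part of the assignment statement) showing how the Gini computation in Question 1 can be checked numerically; the counts are the ones given in the question:

    # Sanity check for Question 1: Gini index before and after a split.
    def gini(pos, neg):
        total = pos + neg
        p, n = pos / total, neg / total
        return 1.0 - p**2 - n**2

    parent = gini(120, 80)                    # Gini before splitting
    left, right = gini(50, 10), gini(70, 70)  # Gini of each child subset
    weighted = (60 / 200) * left + (140 / 200) * right

    print(f"parent={parent:.4f}, weighted={weighted:.4f}")
    print("split improves purity" if weighted < parent else "no improvement")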
2. Consider the given dataset with two independent variables (x1, x2) and one dependent variable (y):

    x1   x2    y
     1    5   10
     2    6   12
     3    8   15
     4   10   18
     5   12   21
     6   15   25
     7   18   28
     8   20   30

a. Use the sum of squared errors (SSE) to determine the best splitting point for x1.
b. Construct the first split of a regression tree using SSE as the impurity measure.
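
A small illustrative sketch (assuming NumPy is available) of how candidate split points on x1 can be scored by SSE; the thresholds are simply midpoints between consecutive x1 values:

    # Illustration for Question 2: SSE of candidate split points on x1.
    import numpy as np

    x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    y  = np.array([10, 12, 15, 18, 21, 25, 28, 30])

    def sse(v):
        return float(np.sum((v - v.mean())**2)) if len(v) else 0.0

    # Candidate thresholds halfway between consecutive x1 values.
    for t in (x1[:-1] + x1[1:]) / 2:
        left, right = y[x1 <= t], y[x1 > t]
        print(f"split at x1 <= {t}: total SSE = {sse(left) + sse(right):.2f}")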
3. Consider a 2-dimensional feature space with a dataset of N = 10 points. A vector quantization (VQ) system maps these points into K = 3 clusters using a codebook. The distortion function is the squared Euclidean distance between the original points and their assigned cluster centroids. Given the following initial cluster centroids:
𝐶1 = (2,3), 𝐶2 = (5,8), 𝐶3 = (9,4)
Assign the following data points to their closest centroid using squared Euclidean distance:
(1,2), (3,4), (6,7), (8,3), (5,5)
a. Compute the new centroids after one iteration of vector quantization.
b. Show whether the distortion decreases after this iteration.
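
For reference, a NumPy sketch of one such iteration, using the centroids and points given above; how empty clusters are handled is an assumption made here (they keep their old centroid):

    # Illustration for Question 3: one iteration of vector quantization.
    import numpy as np

    points    = np.array([[1, 2], [3, 4], [6, 7], [8, 3], [5, 5]], dtype=float)
    centroids = np.array([[2, 3], [5, 8], [9, 4]], dtype=float)

    def assign_and_distort(pts, cents):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((pts[:, None, :] - cents[None, :, :])**2).sum(axis=2)
        labels = d2.argmin(axis=1)
        return labels, d2[np.arange(len(pts)), labels].sum()

    labels, distortion_before = assign_and_distort(points, centroids)
    # New centroids: mean of the points assigned to each cluster
    # (a cluster that receives no points keeps its old centroid here).
    new_centroids = np.array([points[labels == k].mean(axis=0)
                              if np.any(labels == k) else centroids[k]
                              for k in range(len(centroids))])
    _, distortion_after = assign_and_distort(points, new_centroids)
    print(labels, distortion_before, distortion_after)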
4. Show that if we maximize the expected complete-data log-likelihood (the first equation below) with respect to Σ_k and π_k, while keeping the responsibilities γ(z_nk) fixed and enforcing the constraint ∑_{k=1}^{K} π_k = 1, we obtain the closed-form solutions given by the following equations:

    E_Z[ln p(X, Z | μ, Σ, π)] = ∑_{n=1}^{N} ∑_{k=1}^{K} γ(z_nk) ( ln π_k + ln 𝒩(x_n | μ_k, Σ_k) )

    Σ_k = (1/N_k) ∑_{n=1}^{N} γ(z_nk) (x_n − μ_k)(x_n − μ_k)^T

    π_k = N_k / N

where N_k = ∑_{n=1}^{N} γ(z_nk) is the effective number of points assigned to component k.
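
As a numeric companion (not the requested derivation), a short sketch that evaluates these closed-form updates with the responsibilities held fixed; the data and responsibilities below are made up purely for illustration:

    # Numeric check of the closed-form M-step updates in Question 4,
    # with responsibilities gamma (N x K) held fixed. Data are made up.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 2))                 # N = 10 points in 2-D
    gamma = rng.dirichlet(np.ones(3), size=10)   # rows sum to 1, K = 3

    N_k  = gamma.sum(axis=0)                     # effective counts N_k
    pi_k = N_k / len(X)                          # pi_k = N_k / N
    mu_k = (gamma.T @ X) / N_k[:, None]          # weighted component means

    # Sigma_k = (1/N_k) * sum_n gamma_nk (x_n - mu_k)(x_n - mu_k)^T
    Sigma = []
    for k in range(3):
        d = X - mu_k[k]
        Sigma.append((gamma[:, k, None] * d).T @ d / N_k[k])
    print(pi_k, mu_k, Sigma[0], sep="\n")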
5. Consider a density model given by a mixture distribution

    p(x) = ∑_{k=1}^{K} π_k p(x | k)

and suppose that we partition the vector x into two parts so that x = (x_a, x_b). Show that the conditional density p(x_b | x_a) is itself a mixture distribution, and find expressions for the mixing coefficients and component densities.
6. Consider a mixture of Gaussian distributions given by

    p(x | Θ) = ∑_{k=1}^{K} π_k 𝒩(x | μ_k, Σ_k)

where:
    K: number of Gaussian components
    π_k: mixing coefficients, such that ∑_{k=1}^{K} π_k = 1 and π_k > 0
    𝒩(x | μ_k, Σ_k): Gaussian density with mean μ_k and covariance Σ_k
    Θ = {π_k, μ_k, Σ_k}_{k=1}^{K}: the parameters of the model

a. Write down the complete log-likelihood function for a dataset {x_1, x_2, …, x_N}, assuming that the data points are drawn independently from the mixture model.
b. Derive the Maximum Likelihood Estimation (MLE) update rules for π_k, μ_k, and Σ_k, assuming that the component that generated each data point is known.
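
For orientation, a short sketch (assuming SciPy is available) that numerically evaluates the marginal mixture log-likelihood ∑_n ln ∑_k π_k 𝒩(x_n | μ_k, Σ_k) on made-up data and parameters; the derivations asked for above are left to the reader:

    # Evaluating the mixture log-likelihood of Question 6 numerically.
    import numpy as np
    from scipy.stats import multivariate_normal

    X = np.random.default_rng(1).normal(size=(10, 2))   # made-up data
    pi = np.array([0.5, 0.5])
    mus = [np.zeros(2), np.ones(2)]
    Sigmas = [np.eye(2), 2 * np.eye(2)]

    # Component densities evaluated at every point, stacked column-wise.
    dens = np.column_stack([multivariate_normal.pdf(X, m, S)
                            for m, S in zip(mus, Sigmas)])
    log_lik = np.log(dens @ pi).sum()
    print(f"log-likelihood: {log_lik:.4f}")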

Programming Questions:
7. Write code to obtain a fully grown regression tree for the data given in Q2 and visualize the regression tree.
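
One possible starting point, a sketch assuming scikit-learn and Matplotlib are available; with no depth limit, DecisionTreeRegressor grows the tree fully:

    # Sketch for Question 7: fully grown regression tree on the Q2 data.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.tree import DecisionTreeRegressor, plot_tree

    X = np.array([[1, 5], [2, 6], [3, 8], [4, 10],
                  [5, 12], [6, 15], [7, 18], [8, 20]])   # (x1, x2) from Q2
    y = np.array([10, 12, 15, 18, 21, 25, 28, 30])

    tree = DecisionTreeRegressor(criterion="squared_error")  # grow until leaves are pure
    tree.fit(X, y)

    plot_tree(tree, feature_names=["x1", "x2"], filled=True)
    plt.show()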
8. Binary classification tree:
a. Train a fully grown binary classification tree based on Gini impurity using the dataset
A4_train.csv and visualize it.
b. Compute the Sum of Squared Errors (SSE) on the test dataset (A4_test.csv) at each depth and
plot the variation of SSE with depth.
c. Determine the optimal pruning depth by selecting the depth where SSE change is minimal.
d. Visualize the pruned tree.
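
A hedged outline of one way to approach Question 8, assuming scikit-learn, pandas, and Matplotlib; the column layout of A4_train.csv and A4_test.csv is not specified here, so the sketch assumes the last column holds a numeric class label:

    # Sketch for Question 8 (assumes the last CSV column is a numeric label).
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.tree import DecisionTreeClassifier, plot_tree

    train, test = pd.read_csv("A4_train.csv"), pd.read_csv("A4_test.csv")
    Xtr, ytr = train.iloc[:, :-1], train.iloc[:, -1]
    Xte, yte = test.iloc[:, :-1], test.iloc[:, -1]

    # (a) fully grown tree based on Gini impurity
    full = DecisionTreeClassifier(criterion="gini").fit(Xtr, ytr)
    plot_tree(full, filled=True); plt.show()

    # (b) test-set SSE at each depth
    depths = list(range(1, full.get_depth() + 1))
    sse = [np.sum((yte - DecisionTreeClassifier(criterion="gini", max_depth=d)
                   .fit(Xtr, ytr).predict(Xte))**2)
           for d in depths]
    plt.plot(depths, sse, marker="o")
    plt.xlabel("depth"); plt.ylabel("test SSE"); plt.show()

    # (c)/(d) one simple heuristic: pick the depth where the SSE change
    # between consecutive depths is smallest, then re-fit and visualize.
    best_d = depths[int(np.argmin(np.abs(np.diff(sse)))) + 1]
    pruned = DecisionTreeClassifier(criterion="gini", max_depth=best_d).fit(Xtr, ytr)
    plot_tree(pruned, filled=True); plt.show()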
