Machine Learning Assignment 1 Basic Concepts: Due: 27 March 2015, 15:00

This document outlines an assignment on basic machine learning concepts. It includes questions on random variables and their distributions, data simulation, decision theory using a medical dataset, and sufficient statistics. Students are asked to submit code for marked questions. The deadline is March 27, 2015. Students should submit solutions and code as a single zip file to the Moodle system. Various software options like MATLAB and Octave are recommended.

Machine Learning

Assignment 1
Basic Concepts

Due: 27 March 2015, 15:00

Please hand in your solutions in class, and upload a PDF version together with the code as a
single .zip file to the Moodle system before the deadline. Submission delays are rounded
up to a full day, i.e. a one-minute delay counts as one day.
You must submit your code for the questions that are marked with [+CODE].
After each section, the interface your code should provide is specified. Your code
will be tested by calling the function as specified.
For computations and plotting, you are free to use any software or language of
your choice. The recommended tools are MATLAB and Octave, since they will be by far the easiest
to use in the following assignments. It is best to get used to them sooner rather than later.
MATLAB: http://www.mathworks.com/
Octave: http://octave.sourceforge.net/
Fast Octave installation on Ubuntu: apt-get install octave octave-signal
Fast Octave installation on MacOS: port install octave octave-signal
Inside Octave, load a package (e.g. signal) with: pkg load signal

Question 1 (Function of Random Variables): Imagine two independent random
variables X and Y with probability distributions $f_X(x)$ and $f_Y(y)$, and the merged random
variable Z = X + Y with probability distribution $f_Z(z)$.
(a): Prove
$$f_Z(z) = [f_X(x) * f_Y(y)]|_z, \tag{1}$$
where $f(\cdot)|_z$ is the value of the function f at location z, i.e. f(z), and $*$ is the convolution
operator:
$$[f(x) * g(y)]|_z = \int f(x)\, g(z - x)\, dx. \tag{2}$$
(b): Suppose the two random variables X and Y are Gaussian:
$$X \sim \mathcal{N}(\mu_X, \sigma_X^2), \qquad Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2). \tag{3}$$
Prove that the new random variable Z is also Gaussian, with distribution:
$$Z \sim \mathcal{N}(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2). \tag{4}$$

(HINT: To calculate the convolution, you can split Z = X + Y into $X = aZ + t$ and
$Y = (1 - a)Z - t$, instead of $X = x$ and $Y = Z - x$; i.e.:
$$f_Z(z) = \int f_X(x)\, f_Y(z - x)\, dx = \int f_X(az + t)\, f_Y((1 - a)z - t)\, dt. \tag{5}$$
The equations become straightforward for one specific choice of a.)
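
Before attempting the proof, it can help to check the claim of part (b) numerically. The script below is only a sanity check under assumed parameter values (the means, standard deviations, and grid spacing are illustrative, not part of the assignment): it approximates the convolution integral by a Riemann sum and compares it with the Gaussian density whose mean and variance are the sums of the individual means and variances.

```octave
% Numerical sanity check of the Gaussian-sum claim (not a proof).
% All parameter values below are illustrative assumptions.
muX = 1;   sX = 0.8;      % mean and standard deviation of X
muY = -2;  sY = 1.5;      % mean and standard deviation of Y
dx  = 0.01;               % grid spacing
x   = -20:dx:20;          % common grid for f_X and f_Y

gauss = @(t, mu, s) exp(-(t - mu).^2 ./ (2*s^2)) ./ (s*sqrt(2*pi));
fX = gauss(x, muX, sX);
fY = gauss(x, muY, sY);

% Riemann-sum approximation of the convolution integral
fZ = conv(fX, fY) * dx;

% Grid on which the full convolution lives: entry k corresponds to
% z = 2*x(1) + (k-1)*dx
z = 2*x(1) + dx * (0:numel(fZ)-1);

% Gaussian with summed means and summed variances
fZexpected = gauss(z, muX + muY, sqrt(sX^2 + sY^2));

maxErr = max(abs(fZ - fZexpected));
fprintf('largest absolute deviation: %g\n', maxErr);
```

A small deviation supports the statement for these particular parameters only; the analytic proof is still required.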

Question 2 (Data Simulation [+CODE]): The random generator function
randn in MATLAB generates one sample from a random variable $X \sim \mathcal{N}(0, 1)$. Using
samples of X, how can one generate samples of another Gaussian
random variable $Y \sim \mathcal{N}(10, 5)$? Why?
CODE: You should provide a function that generates n samples from the requested
distribution: a = generateSamples(n)
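
A minimal sketch of the requested interface is given below. It assumes that the second argument of N(10, 5) is the variance, following the N(mean, variance) notation of Question 1; if 5 were meant as the standard deviation, the scale factor would be 5 instead of sqrt(5). Returning a column vector is also an assumption, since the handout does not fix the output shape.

```octave
function a = generateSamples(n)
  % Draw n samples of Y = mu + sigma * X with X ~ N(0, 1).
  % Assumption: N(10, 5) denotes mean 10 and VARIANCE 5.
  mu    = 10;
  sigma = sqrt(5);
  a = mu + sigma * randn(n, 1);   % n-by-1 column vector
end
```

An affine map of a Gaussian variable is again Gaussian: the shift sets the mean and the scale factor multiplies the standard deviation, which is the argument the "Why?" part asks for.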

Question 3 (Decision Theory [+CODE]): In this question, you need to load
our imaginary medical dataset:
http://vda.univie.ac.at/Teaching/ML/15s/assignments/asgn01-data.zip.
The data is available in MAT and TXT format. You can load either one in MATLAB/Octave,
using load data.mat or load data.txt. The columns of the data are (BT, WBC, DS, I).
We would like to decide whether a person has an infection based on three attributes: their White
Blood Cell count (WBC), their Body Temperature (BT), and their Daily Sleep (DS). The
algorithm should classify the patient as Healthy ($C_H$) or Infected ($C_I$). For simplicity,
you can imagine an attribute I, which is +1 if the person is infected, and -1 if the person
is healthy.
We have 2000 imaginary people in the dataset file with their WBC, BT, and DS. The
person p is healthy if $I_p = -1$, and infected if $I_p = +1$. Using the training data, we would
like to design an algorithm to find out the infection status of new patients.
(a): Draw the data points in the 2D projections BT-WBC, WBC-DS, and DS-BT, and
color-code the infection state. Color the infected patients red and the healthy patients
blue. (HINT: use the scatter(X,Y,color) function in MATLAB/Octave)
CODE: function runA(data)
(b): Draw the following distribution pairs, each pair in one graph:
- p(WBC | C_I) and p(WBC | C_H);
- p(BT | C_I) and p(BT | C_H);
- p(DS | C_I) and p(DS | C_H).
(HINT: You can use a histogram with a proper bin size. You can use the [a,b]=hist(X) function
in MATLAB and plot the histogram with plot(b,a,color). For the distribution, be careful
about the normalization factor.)
CODE: function runB(data)
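
The normalization mentioned in the hint can be sketched as follows. The helper name classDensity and the bin count are hypothetical choices made for illustration, not part of the required interface; the idea is simply to divide the bin counts by the number of samples times the bin width, so that the area under the plotted curve is 1.

```octave
function [centers, density] = classDensity(x, nbins)
  % Histogram-based estimate of a 1-D density (hypothetical helper).
  % Possible use inside runB, with columns (BT, WBC, DS, I):
  %   infected = data(:, 4) == +1;
  %   [bI, pI] = classDensity(data(infected, 2), 30);
  %   [bH, pH] = classDensity(data(~infected, 2), 30);
  %   plot(bI, pI, 'r', bH, pH, 'b');
  [counts, centers] = hist(x, nbins);           % nbins equally spaced bins, nbins >= 2
  binWidth = centers(2) - centers(1);
  density  = counts / (sum(counts) * binWidth); % area under the curve becomes 1
end
```

Without this factor the two curves of a pair are only comparable when both classes contain the same number of patients and share the same bin width.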

(c): Based on the visual appearance of the graphs, which attribute is the best for determining the
patient's infection status? Why? (NOTE: There might be more than one correct answer.
Your answer to the "why" question is what matters.)
(d): Find out how much each data attribute can tell us about the infection status by
calculating their correlation with the infection status. Which correlation has the highest
value? Is it consistent with your reasoning in the previous section?
CODE: function runD(data)
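
One possible shape for this function is sketched below, assuming the column order (BT, WBC, DS, I) stated above. The handout's interface has no return value; the output r is an addition made here so the result can be inspected, and it can simply be ignored by the caller. The Pearson coefficient is computed from its definition to avoid depending on version-specific behaviour of built-in helpers.

```octave
function r = runD(data)
  % Pearson correlation of each attribute with the infection label.
  % Assumed column order: (BT, WBC, DS, I).
  names = {'BT', 'WBC', 'DS'};
  I = data(:, 4);
  r = zeros(1, 3);
  for k = 1:3
    x = data(:, k);
    % covariance divided by the product of the standard deviations
    % (both normalized by N, so the factors cancel consistently)
    r(k) = mean((x - mean(x)) .* (I - mean(I))) / (std(x, 1) * std(I, 1));
    fprintf('corr(%s, I) = %+.3f\n', names{k}, r(k));
  end
end
```

When comparing attributes, the magnitude of the coefficient matters: a strongly negative value is as informative as a strongly positive one.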
(e): Plot Infection-vs.-WBC, Infection-vs.-BT, and Infection-vs.-DS in three different
graphs (for the infection, assume I=-1 if the person is healthy, and I=+1 if infected). How
are the correlations you calculated before reflected in these graphs? What can these graphs tell us
about our single-attribute decision algorithm?
CODE: function runE(data)
Sufficient Statistic: Given a set X of i.i.d. data with probability $p(X \mid \theta)$ for an unknown
parameter $\theta$, a sufficient statistic is a function T(X) which contains all the information that
X provides to estimate $\theta$. In other words:
$$P(\theta \mid T(X), X) = P(\theta \mid T(X)). \tag{6}$$
From this point, assume p(WBC, BT, DS | C_I) and p(WBC, BT, DS | C_H) have Gaussian
distributions. As a result, all their slices and projections are also Gaussian.
(f): What are the sufficient statistics ($T_I$ and $T_H$) for the ML estimator of the parameters
of the p(WBC | C_H) and p(WBC | C_I) distributions? If $T_M = [T_I, T_H, K]$, what parameter K can
make $T_M$ a sufficient statistic for estimating the parameters of p(WBC)?
(g): What are p(C_H) and p(C_I) in the training dataset? Calculate p(WBC) from the
training data.
CODE: function runG(data)
(h): Imagine the data is captured from people who came to the hospital for a checkup,
while in the real world only 5% of the people are infected. What is the estimated p(WBC)
for the real world? Why?
CODE: function runH(data)
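
As a reminder for parts (g) and (h), the marginal density of WBC can be written with the law of total probability as a mixture of the two class-conditional densities:

$$p(WBC) = p(WBC \mid C_I)\, p(C_I) + p(WBC \mid C_H)\, p(C_H).$$

One common reading of part (h), stated here as an assumption rather than as the intended solution, is that the class-conditional densities stay the same outside the hospital and only the priors change, e.g. to $p(C_I) = 0.05$ and $p(C_H) = 0.95$.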
(i): Design the decision algorithm, based only on WBC, with:
- the Maximum Likelihood approach;
- the Minimum Cost approach, considering the cost of misclassifying an infected person to be
10 times higher than that of misclassifying a healthy person;
- the Maximum A Posteriori approach, considering that only 5% of people in the real world
are infected.
CODE: functions runI_ML(data), runI_COST(data), runI_MAP(data)
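
All three approaches can be viewed as the same comparison of weighted Gaussian likelihoods. The sketch below is one possible formulation, not the required solution: the helper name decideWBC and its argument list are hypothetical, and the class means and standard deviations are assumed to have been estimated from the training data beforehand.

```octave
function label = decideWBC(x, muI, sI, muH, sH, wI, wH)
  % Weighted likelihood comparison for a single attribute (hypothetical helper).
  % Returns +1 (infected) where wI * p(x|C_I) > wH * p(x|C_H), else -1 (healthy).
  %   Maximum Likelihood:   wI = 1,           wH = 1
  %   Maximum A Posteriori: wI = p(C_I),      wH = p(C_H)   (e.g. 0.05 and 0.95)
  %   Minimum Cost:         wI = 10 * p(C_I), wH = p(C_H)   (priors are an assumption)
  gauss = @(t, mu, s) exp(-(t - mu).^2 ./ (2*s^2)) ./ (s*sqrt(2*pi));
  label = -ones(size(x));
  label(wI * gauss(x, muI, sI) > wH * gauss(x, muH, sH)) = +1;
end
```

The minimum-cost weights follow from comparing expected costs: declaring a patient healthy risks 10 cost units times the posterior of infection, while declaring the patient infected risks 1 unit times the posterior of health. Which priors to plug in for the cost rule is left open by the handout.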
(j): Consider the Maximum Likelihood approach. We would like to select only one
attribute for our decision making. Using the data in the dataset, find out which attribute
is the best for our decision making, using 10-fold cross-validation. Is your result consistent
with your analysis in the previous sections? Why?
CODE: function runJ(data)
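
The cross-validation loop could be organized along the following lines. This is a sketch under assumptions: the helper name cvAccuracyML is hypothetical, accuracy is used as the score, and the folds are formed by a random shuffle. runJ could then call it once per attribute column with K = 10 and compare the three results.

```octave
function acc = cvAccuracyML(x, I, K)
  % K-fold cross-validated accuracy of a one-attribute Gaussian ML classifier.
  % x: attribute values, I: labels in {-1, +1}, K: number of folds.
  x = x(:);  I = I(:);
  n = numel(x);
  perm = randperm(n);              % shuffle before splitting
  fold = mod(0:n-1, K) + 1;        % fold index of each shuffled position
  % Gaussian log-likelihood up to a constant shared by both classes
  logLik = @(t, mu, s) -(t - mu).^2 ./ (2*s^2) - log(s);
  correct = 0;
  for k = 1:K
    testIdx  = perm(fold == k);
    trainIdx = perm(fold ~= k);
    xt = x(trainIdx);  It = I(trainIdx);
    % ML estimates (standard deviation normalized by N) from the training folds only
    muI = mean(xt(It == +1));  sI = std(xt(It == +1), 1);
    muH = mean(xt(It == -1));  sH = std(xt(It == -1), 1);
    xs = x(testIdx);
    pred = -ones(size(xs));
    pred(logLik(xs, muI, sI) > logLik(xs, muH, sH)) = +1;
    correct = correct + sum(pred == I(testIdx));
  end
  acc = correct / n;               % fraction of held-out samples classified correctly
end
```

Because the split is random, the score varies slightly between runs; fixing the random seed or averaging over several runs makes the comparison between attributes more stable.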
