
Machine Learning for Engineers:

Chapter 2. Basic Background - Problems

Osvaldo Simeone

May 6, 2021



Problem 2.1

Consider a Bernoulli rv $x \sim \mathrm{Bern}(0.2)$.

- Draw the pmf.
- Generate N = 10 independent realizations of rv x.
- Use these samples to estimate p(x = 1).
- Repeat the previous two points using N = 1000 samples.
- Comment on your results.



Problem 2.1: Solution

p=0.2;
stem([0,1],[1-p,p],'LineWidth',2) %pmf of Bern(p)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
help binornd
binornd(1,p) %one realization of x
N=10;
x=binornd(1,p,N,1);
pest=mean(x) %estimate of p(x=1) from N=10 samples
N=1000;
x=binornd(1,p,N,1);
pest=mean(x) %estimate of p(x=1) from N=1000 samples



Problem 2.2

Consider a categorical (or multinoulli) rv $x \sim \mathrm{Cat}([0.2, 0.1, 0.3, 0.4]^T)$.

- Draw the pmf.
- Generate N = 10 independent realizations of rv x.
- Use these samples to estimate the probabilities $q_k = p(x = k)$ for k = 0, 1, 2, 3.
- Repeat the previous two points using N = 1000 samples.
- Comment on your results.



Problem 2.2: Solution

q=[0.2,0.1,0.3,0.4];
stem([0,1,2,3],q,'LineWidth',2)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
%%%
help mnrnd
mnrnd(1,q) %one realization in one-hot representation
N=10;
xoh=mnrnd(1,q,N);
pest=mean(xoh) %empirical frequencies of the four values
%%%
N=1000;
xoh=mnrnd(1,q,N);
pest=mean(xoh)



Problem 2.3

Consider a Gaussian rv $x \sim \mathcal{N}(-3, 4)$.

- Draw the pdf.
- Generate N = 10 independent realizations of rv x.
- Use these samples to estimate the probability Pr[x ∈ (−3, 3)].
- Repeat the previous two points using N = 1000 samples.
- Comment on your results.



Problem 2.3: Solution

dx=0.01;
xaxis=[-9:dx:3];
help normpdf
plot(xaxis,normpdf(xaxis,-3,2),'LineWidth',2); %note that we need to specify the standard deviation (2) and not the variance (4)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
%%%
normrnd(-3,2)
N=10;
x=normrnd(-3,2,N,1);
pest=mean((-3<=x).*(x<=3)) %fraction of samples falling in (-3,3)
N=1000;
x=normrnd(-3,2,N,1);
pest=mean((-3<=x).*(x<=3))



Problem 2.4

Given a rv $x \sim \mathrm{Cat}(q = [0.2, 0.1, 0.3, 0.4]^T)$, compute the expectation

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2 + 3\exp(x)].$

Verify your calculation using an empirical estimate obtained by drawing random samples.



Problem 2.4: Solution

By linearity of the expectation, we have

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2 + 3\exp(x)] = \mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] + 3\,\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\exp(x)],$

where

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] = 0.2 \cdot 0 + 0.1 \cdot 1 + 0.3 \cdot 4 + 0.4 \cdot 9 = 4.9$

and

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\exp(x)] = 0.2 \cdot 1 + 0.1 \cdot \exp(1) + 0.3 \cdot \exp(2) + 0.4 \cdot \exp(3) = 10.72.$

So we finally have

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2 + 3\exp(x)] = 4.9 + 3 \cdot 10.72 = 37.06.$



Problem 2.4: Solution

q=[0.2,0.1,0.3,0.4];
N=10000;
xoh=mnrnd(1,q,N);
x=xoh*[0,1,2,3]'; %convert from one-hot vector to scalar representation
expest=mean(x.^2+3*exp(x))



Problem 2.5

Given a rv $x \sim \mathcal{N}(-3, 4)$, compute the expectation

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x + 3x^2].$

Verify your calculation using an empirical estimate obtained by drawing random samples.



Problem 2.5: Solution

By linearity of the expectation, we have

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x + 3x^2] = \mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x] + 3\,\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x^2],$

where

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x] = -3$

and

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x^2] = (-3)^2 + 4 = 13.$

Therefore, we finally have

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x + 3x^2] = -3 + 3 \cdot 13 = 36.$



Problem 2.5: Solution

N=10000;
x=normrnd(-3,2,N,1);
expest=mean(x+3*x.^2)



Problem 2.6

Plot the variance of a Bernoulli rv Bern(p) as a function of p. What is the value of p that maximizes uncertainty?



Problem 2.6: Solution

paxis=[0:0.01:1];
plot(paxis,paxis.*(1-paxis),'LineWidth',2)
xlabel('$p$','Interpreter','latex')
ylabel('Var$(p)$','Interpreter','latex')

The plot shows that the variance $p(1-p)$ is maximized at $p = 1/2$, which is accordingly the value of p that maximizes uncertainty.



Problem 2.7

Compute the variance of random variable $x \sim \mathrm{Cat}(q = [0.2, 0.1, 0.3, 0.4]^T)$.



Problem 2.7: Solution

We compute the mean

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x] = 0.2 \cdot 0 + 0.1 \cdot 1 + 0.3 \cdot 2 + 0.4 \cdot 3 = 1.9$

and the second moment

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] = 0.2 \cdot 0 + 0.1 \cdot 1 + 0.3 \cdot 4 + 0.4 \cdot 9 = 4.9$

to obtain

$\mathrm{Var}(x) = \mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] - (\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x])^2 = 4.9 - 1.9^2 = 1.29.$
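
As in the previous problems, the calculation can be verified with an empirical estimate. A minimal MATLAB sketch, along the lines of the code for Problem 2.4:

q=[0.2,0.1,0.3,0.4];
N=10000;
xoh=mnrnd(1,q,N); %one-hot samples
x=xoh*[0,1,2,3]'; %convert from one-hot vector to scalar representation
varest=var(x) %should be close to 1.29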



Problem 2.8

Given a categorical rv $x \sim \mathrm{Cat}(q = [0.2, 0.1, 0.3, 0.4]^T)$, what is the expectation $\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\mathbb{1}(x = 0)]$?

For a Gaussian rv $x \sim \mathcal{N}(x|0, 1)$, what is the expectation $\mathbb{E}_{x \sim \mathcal{N}(x|0,1)}[\mathbb{1}(x = 0)]$?



Problem 2.8: Solution

We have

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\mathbb{1}(x = 0)] = \Pr[x = 0] = 0.2$

and

$\mathbb{E}_{x \sim \mathcal{N}(x|0,1)}[\mathbb{1}(x = 0)] = \Pr[x = 0] = 0,$

since a continuous rv takes any given individual value with probability zero.
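
Both expectations can be checked empirically, since the sample mean of the indicator 1(x = 0) estimates Pr[x = 0]. A minimal sketch:

q=[0.2,0.1,0.3,0.4];
N=10000;
xoh=mnrnd(1,q,N);
x=xoh*[0,1,2,3]'; %scalar categorical samples
pest=mean(x==0) %should be close to 0.2
xg=normrnd(0,1,N,1);
mean(xg==0) %essentially always 0: a continuous rv never hits 0 exactly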



Problem 2.9

Given vectors $x = [2, 1]^T$ and $y = [-1, 3]^T$,

- represent the vectors in the two-dimensional plane $\mathbb{R}^2$;
- compute their inner product;
- compute the squared $\ell_2$ norms of the two vectors;
- compute the cosine of the angle between the two vectors;
- are the two vectors linearly independent?
- determine a vector that is orthogonal to x;
- normalize vector y so that it has unitary norm;
- give a vector such that the cosine of the angle with x equals 1, and a vector with cosine -1;
- compute the element-wise product $x \odot y$;
- plot all the vectors determined at the previous points.



Problem 2.9: Solution

MATLAB code:
x=[2;1]; %or x=[2,1]';
y=[-1;3];
%represent as points on the plane
plot(x(1),x(2),'x','LineWidth',2,'MarkerSize',10); hold on
plot(y(1),y(2),'o','LineWidth',2,'MarkerSize',10)
ylim([0,3])
%represent as arrows
drawArrow = @(a,b) quiver(a(1),a(2),b(1)-a(1),b(2)-a(2),0);
a = [0 0];
drawArrow(a,x);
hold on
drawArrow(a,y);
axis equal %aspect ratio so that data units are the same



Problem 2.9: Solution

Inner product:

$x^T y = y^T x = 2 \cdot (-1) + 1 \cdot 3 = 1.$

Squared $\ell_2$ norms of the two vectors:

$\|x\|^2 = 2^2 + 1^2 = 5$ and $\|y\|^2 = (-1)^2 + 3^2 = 10.$

Cosine of the angle between the two vectors:

$\cos(\theta) = \frac{x^T y}{\|x\|\,\|y\|} = \frac{1}{\sqrt{5 \cdot 10}} \approx 0.14.$

Are the two vectors linearly independent? Yes: neither vector is a scalar multiple of the other, so $c_1 x + c_2 y = 0$ implies $c_1 = c_2 = 0$.



Problem 2.9: Solution
A vector that is orthogonal to x:

$z = [-1, 2]^T$, since $z^T x = -2 + 2 = 0.$

Normalized vector y:

$\frac{y}{\|y\|} = \left[\frac{-1}{\sqrt{10}}, \frac{3}{\sqrt{10}}\right]^T.$

A vector such that the cosine of the angle with x equals 1, and a vector with cosine of the angle equal to -1:

$z = c\,[2, 1]^T,$

where $c > 0$ gives cosine 1 and $c < 0$ gives cosine -1.

Element-wise product:

$x \odot y = \begin{bmatrix} 2 \cdot (-1) \\ 1 \cdot 3 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}.$

Problem 2.9: Solution

x=[2;1]; y=[-1;3];
x'*y %inner product
norm(x) %l2 norm (not squared)
norm(y) %l2 norm (not squared)
x'*y/(norm(x)*norm(y)) %cosine of the angle
%%%
drawArrow = @(a,b) quiver(a(1),a(2),b(1)-a(1),b(2)-a(2),0);
a = [0 0];
drawArrow(a,x);
hold on
drawArrow(a,y);
z=[-1;2]; %orthogonal to x
drawArrow(a,z);
ynorm=y/norm(y); %normalized y
drawArrow(a,ynorm);
zp=0.6*x; %cosine 1 with x
zm=-0.6*x; %cosine -1 with x
drawArrow(a,zp);
drawArrow(a,zm);



Problem 2.10

   
Given matrices

$A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 3 & -1 \\ -1 & 2 & 3 \end{bmatrix},$

- compute the product $AB$;
- compute the product $B^T A^T$;
- compute the product $\mathrm{Diag}([1, 2]^T)B$;
- is $A$ symmetric?
- if it is symmetric, evaluate eigenvectors and eigenvalues of $A$;
- is $A$ positive definite, i.e., is $A \succ 0$?
- plot the quadratic form $x^T A x$ as a function of vector $x = [x_1, x_2]^T$ for $x_1 \in [-2, 2]$ and $x_2 \in [-2, 2]$;
- is $BB^T$ positive definite? Is it invertible?



Problem 2.10: Solution

   
Given matrices $A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 3 & -1 \\ -1 & 2 & 3 \end{bmatrix}$, we have

$AB = \begin{bmatrix} [1,-1][2,-1]^T & [1,-1][3,2]^T & [1,-1][-1,3]^T \\ [-1,2][2,-1]^T & [-1,2][3,2]^T & [-1,2][-1,3]^T \end{bmatrix} = \begin{bmatrix} 3 & 1 & -4 \\ -4 & 1 & 7 \end{bmatrix}$

and

$B^T A^T = (AB)^T = \begin{bmatrix} 3 & -4 \\ 1 & 1 \\ -4 & 7 \end{bmatrix}.$



Problem 2.10: Solution

We also have

$\mathrm{Diag}([1, 2]^T)B = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 3 & -1 \\ -1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 3 & -1 \\ -2 & 4 & 6 \end{bmatrix}.$

$A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$ is symmetric, since $A = A^T$.



Problem 2.10: Solution

A=[1,-1;-1,2]; B=[2,3,-1;-1,2,3];
A*B
B'*A' %can be equivalently computed as (A*B)'
diag([1,2])*B %diag([1,2]) builds the diagonal matrix Diag([1,2]')
[U,L]=eig(A);
L %all eigenvalues are positive, so A is positive definite
U*L*U' %this equals A
%%%
help mesh
x1axis=[-2:0.01:2];
x2axis=[-2:0.01:2];
[X1,X2] = meshgrid(x1axis,x2axis);
mesh(X1,X2,A(1,1)*X1.^2+A(2,2)*X2.^2+2*A(1,2)*X1.*X2) %quadratic form x'*A*x



Problem 2.10: Solution

[U,L]=eig(B*B');
L %all eigenvalues are positive, and hence B*B' is positive definite
rank(B*B') %the matrix is 2x2 and the rank is 2, so it is invertible
inv(B*B')
[U,L]=eig(B'*B);
L %one of the eigenvalues is zero and the others are positive, and hence B'*B is positive semi-definite
rank(B'*B) %the matrix is 3x3 and the rank is 2, so it is not invertible: try using inv(B'*B)!



Problem 2.11
 
For the jointly Bernoulli random vector $x = [x_1, x_2]^T$ with the joint pmf $p(x_1, x_2)$ below, compute the marginal $p(x_2)$ and the conditional distribution $p(x_2|x_1 = 1)$.

Generate N = 100 realizations of the rv x defined above and estimate the probabilities of each of the four configurations of the outputs.

Repeat with N = 10000 and discuss your results.

x_1 \ x_2 |   0  |  1
    0     | 0.45 | 0.05
    1     | 0.1  | 0.4



Problem 2.11: Solution

Marginal $p(x_2)$:

$p(x_2 = 1) = p(x_1 = 0, x_2 = 1) + p(x_1 = 1, x_2 = 1) = 0.05 + 0.4 = 0.45.$

So, we have $x_2 \sim \mathrm{Bern}(0.45)$.

Conditional distribution $p(x_2|x_1)$:

x_1 \ x_2 |             0            |  1
    0     | 0.45/(0.45 + 0.05) = 0.9 | 0.1
    1     | 0.1/(0.1 + 0.4) = 0.2    | 0.8



Problem 2.11: Solution

N=100;
x1=binornd(1,0.5,N,1); %this generates samples x1 from the marginal p(x1)=Bern(0.5)
x2=zeros(N,1);
for n=1:N
    if (x1(n)==0)
        x2(n)=binornd(1,0.1); %this generates sample x2 from the conditional p(x2|x1=0)=Bern(0.1)
    else
        x2(n)=binornd(1,0.8); %this generates sample x2 from the conditional p(x2|x1=1)=Bern(0.8)
    end
end



Problem 2.11: Solution

pest=zeros(4,1);
for n=1:N
    if ((x1(n)==0)&&(x2(n)==0))
        pest(1)=pest(1)+1/N;
    elseif ((x1(n)==1)&&(x2(n)==0))
        pest(2)=pest(2)+1/N;
    elseif ((x1(n)==0)&&(x2(n)==1))
        pest(3)=pest(3)+1/N;
    elseif ((x1(n)==1)&&(x2(n)==1))
        pest(4)=pest(4)+1/N;
    end
end
pest



Problem 2.12

   
For a jointly Gaussian rv $x = [x_1, x_2]^T$ with mean vector $\mu = [\mu_1, \mu_2]^T$ and covariance

$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$

prove the second equality in

$\sigma_{12} = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - \mu_1)(x_2 - \mu_2)] = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\mu_2.$



Problem 2.12: Solution

   
Expanding the product and using the linearity of the expectation, we have

$\sigma_{12} = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - \mu_1)(x_2 - \mu_2)]$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\,\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2] - \mu_2\,\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1] + \mu_1\mu_2$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\,\mathbb{E}_{x_2 \sim \mathcal{N}(\mu_2,\sigma_2^2)}[x_2] - \mu_2\,\mathbb{E}_{x_1 \sim \mathcal{N}(\mu_1,\sigma_1^2)}[x_1] + \mu_1\mu_2$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\mu_2 - \mu_2\mu_1 + \mu_1\mu_2$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\mu_2.$
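
The identity can also be checked numerically. A minimal sketch, in which the mean vector and covariance matrix are arbitrary illustrative choices (mvnrnd is from the same toolbox as the mvnpdf calls used later):

mu=[1,-2]; %arbitrary illustrative choice
Sigma=[2,-1;-1,2]; %arbitrary illustrative choice (sigma_12 = -1)
N=100000;
x=mvnrnd(mu,Sigma,N);
mean((x(:,1)-mu(1)).*(x(:,2)-mu(2))) %left-hand side: close to -1
mean(x(:,1).*x(:,2))-mu(1)*mu(2) %right-hand side: same value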



Problem 2.13

 
For a jointly Gaussian rv with mean vector $\mu = [0, 0]^T$ and covariance

$\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix},$

- compute the covariance coefficient $\rho$;
- verify that the covariance matrix is positive definite;
- evaluate the expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2]$;
- evaluate the expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2]$ and compare with your result at the previous point;
- modify the covariance $\sigma_{12}$ so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] = 0$;
- modify the covariance $\sigma_{12}$ so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] = 0$.



Problem 2.13: Solution

Covariance coefficient $\rho$:

$\rho = \frac{\sigma_{12}}{\sigma_1\sigma_2} = -\frac{1}{2},$

so the two variables are negatively correlated, although not maximally so, since $|\rho| < 1$: given that the mean is zero for both variables, when $x_1$ is positive/negative, $x_2$ will tend to be negative/positive, and vice versa.

Since $|\rho| < 1$, the covariance matrix is positive definite.

Expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2]$:

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1^2] + \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2^2] + 2\,\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2]$
$= (\sigma_1^2 + \mu_1^2) + (\sigma_2^2 + \mu_2^2) + 2(\sigma_{12} + \mu_1\mu_2)$
$= 2 + 2 + 2 \cdot (-1) = 2.$

Since the variables are negatively correlated, they tend to cancel each other. Therefore, we have $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] < \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1^2] + \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2^2] = 4$.



Problem 2.13: Solution

Expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2]$:

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] = (\sigma_1^2 + \mu_1^2) + (\sigma_2^2 + \mu_2^2) - 2(\sigma_{12} + \mu_1\mu_2) = 2 + 2 - 2 \cdot (-1) = 6.$

Therefore, subtracting the two variables yields on average a larger "power", since the two variables are negatively correlated: we have $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] > \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1^2] + \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2^2] = 4$.
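
Both expectations can be verified by drawing samples. A minimal sketch:

mu=[0,0];
Sigma=[2,-1;-1,2];
N=100000;
x=mvnrnd(mu,Sigma,N);
mean((x(:,1)+x(:,2)).^2) %should be close to 2
mean((x(:,1)-x(:,2)).^2) %should be close to 6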



Problem 2.13: Solution
Modify the covariance matrix Σ so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] = 0$: the variables need to be maximally negatively correlated, which is obtained for $\rho = -1$, and hence $\sigma_{12} = \rho\sigma_1\sigma_2 = -2$, yielding

$\Sigma = \begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}.$

In this case, $x_1 = -x_2$.

Modify the covariance matrix so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] = 0$: the variables need to be maximally positively correlated, which is obtained for $\rho = 1$, and hence $\sigma_{12} = \rho\sigma_1\sigma_2 = 2$, yielding

$\Sigma = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}.$

In this case, $x_1 = x_2$.



Problem 2.14

Produce 3D plots for a jointly Gaussian pdf with mean vector $\mu = [0, 0]^T$ and covariance $\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.

Repeat for $\Sigma = \begin{bmatrix} 2 & -1.9 \\ -1.9 & 2 \end{bmatrix}$.

Produce 3D plots for a jointly Gaussian pdf with mean vector $\mu = [5, 7]^T$ and covariance $\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.



Problem 2.14: Solution

x1=-3*sqrt(2):0.1:3*sqrt(2);
x2=x1;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)]; %X1(:) makes a vector out of matrix X1
mu=[0,0];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=\mu_2=0$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)



Problem 2.14: Solution

x1=-3*sqrt(2):0.1:3*sqrt(2);
x2=x1;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)]; %X1(:) makes a vector out of matrix X1
mu=[0,0];
Sigma=[2 -1.9;-1.9 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=\mu_2=0$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1.9$','Interpreter','latex','FontSize',14)



Problem 2.14: Solution

x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)



Problem 2.15

Produce contour plots for a jointly Gaussian pdf with mean vector $\mu = [5, 7]^T$ and covariance $\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.

Repeat for $\Sigma = \begin{bmatrix} 2 & -1.9 \\ -1.9 & 2 \end{bmatrix}$.



Problem 2.15: Solution

x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
contour(x1,x2,y);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)



Problem 2.15: Solution

x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1.9;-1.9 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
contour(x1,x2,y);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1.9$','Interpreter','latex','FontSize',14)



Problem 2.16

 
For a jointly Gaussian rv with mean vector $\mu = [\mu_1, \mu_2]^T$ and covariance

$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$

interpret the formula for the conditional distribution

$(x_1|x_2 = x_2) \sim \mathcal{N}\left(\mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2),\; \sigma_1^2(1 - \rho^2)\right)$

in terms of prediction of $x_1$ given an observation $x_2 = x_2$.

How is the formula simplified when $\sigma_1 = \sigma_2$ and $\mu_1 = \mu_2 = 0$?



Problem 2.16: Solution
Given an observation $x_2$, assuming that the joint distribution is known (or estimated from data), a prediction of $x_1$ can be obtained by considering the mean of the conditional distribution, i.e.,

$\mathbb{E}_{x_1 \sim p(x_1|x_2)}[x_1] = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2).$

This says that the prediction is given by the mean $\mu_1$, which would be the corresponding prediction had we not measured $x_2$, corrected by the term $\rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$. This term is positive if $\rho$ and $(x_2 - \mu_2)$ have the same sign, in accordance with the interpretation of the covariance coefficient. Furthermore, the correction is weighted by the ratio $\sigma_1/\sigma_2$, which accounts for the potentially different variances of the two variables.

When $\sigma_1 = \sigma_2$ and $\mu_1 = \mu_2 = 0$, we have the simplified formula

$\mathbb{E}_{x_1 \sim p(x_1|x_2)}[x_1] = \rho x_2,$

which can be readily interpreted in light of the discussion above.
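
The simplified predictor can be visualized by drawing samples and overlaying the line $\hat{x}_1 = \rho x_2$. In this minimal sketch, the value $\rho = 0.8$ and the plotting range are arbitrary illustrative choices:

rho=0.8; %arbitrary illustrative choice
Sigma=[1,rho;rho,1]; %sigma_1 = sigma_2 = 1, with mu_1 = mu_2 = 0
N=1000;
x=mvnrnd([0,0],Sigma,N);
plot(x(:,2),x(:,1),'.'); hold on
x2axis=[-4:0.1:4];
plot(x2axis,rho*x2axis,'LineWidth',2) %predictor x1hat = rho*x2
xlabel('$x_2$','Interpreter','latex')
ylabel('$x_1$','Interpreter','latex')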


Problem 2.17

Consider a jointly Gaussian vector with all-zero mean vector and covariance matrix defined by $\sigma_1 = \sigma_2 = 1$ and $\sigma_{12} = -0.1$. Consider the linear predictor $\hat{x}_1 = a x_2$ for some real number $a$, and compute the mean squared error

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(\hat{x}_1 - x_1)^2]$

as a function of $a$.

Then, optimize over $a$ by equating the derivative with respect to $a$ to zero.

How does this solution compare with the conditional mean $\rho x_2$ obtained in the previous problem?



Problem 2.17: Solution

We have

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(a x_2 - x_1)^2] = a^2\sigma_2^2 + \sigma_1^2 - 2a\sigma_{12} = a^2 + 1 - 2a\rho.$

Computing the derivative and setting it equal to zero, we get

$\frac{d}{da}(a^2 + 1 - 2a\rho) = 2a - 2\rho = 0,$

which yields the optimal value $a^* = \rho$.

Therefore, the conditional expectation $\hat{x}_1 = \rho x_2$ is the linear prediction (i.e., the prediction of the form $\hat{x}_1 = a x_2$) that minimizes the mean squared error.
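
The optimal value $a^* = \rho$ can be confirmed empirically by estimating the mean squared error on a grid of values of a. A minimal sketch:

rho=-0.1;
Sigma=[1,rho;rho,1];
N=100000;
x=mvnrnd([0,0],Sigma,N);
aaxis=[-1:0.01:1];
mse=zeros(size(aaxis));
for i=1:length(aaxis)
    mse(i)=mean((aaxis(i)*x(:,2)-x(:,1)).^2); %empirical MSE of x1hat = a*x2
end
plot(aaxis,mse,'LineWidth',2)
xlabel('$a$','Interpreter','latex')
ylabel('MSE','Interpreter','latex')
[~,imin]=min(mse);
aaxis(imin) %should be close to rho = -0.1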



Problem 2.18

Using the formula for the joint pdf of a two-dimensional jointly Gaussian vector, show that, if the covariance is zero, i.e., if $\sigma_{12} = 0$, then the two variables are independent.



Problem 2.18: Solution

When $\sigma_{12} = 0$, the joint pdf is given as

$\mathcal{N}(x|\mu,\Sigma) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\frac{1}{2}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right)$
$= \frac{1}{\sqrt{2\pi\sigma_1^2}}\exp\left(-\frac{1}{2}\frac{(x_1 - \mu_1)^2}{\sigma_1^2}\right) \cdot \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{1}{2}\frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right)$
$= \mathcal{N}(x_1|\mu_1, \sigma_1^2) \cdot \mathcal{N}(x_2|\mu_2, \sigma_2^2),$

which concludes the proof: the joint pdf factorizes into the product of the two marginal pdfs, which is the definition of independence.
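
The factorization can be checked numerically at any test point. In the following sketch, the means, standard deviations, and test point are arbitrary illustrative choices:

mu1=1; mu2=-2; s1=2; s2=3; %arbitrary means and standard deviations
x=[0.5,1.5]; %arbitrary test point
mvnpdf(x,[mu1,mu2],diag([s1^2,s2^2])) %joint pdf with sigma_12 = 0
normpdf(x(1),mu1,s1)*normpdf(x(2),mu2,s2) %product of marginals: same value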



Problem 2.19

We have a test for screening cancer that is 90% sensitive (i.e., Pr(positive|cancer) = 0.9) and 90% specific (i.e., Pr(negative|no cancer) = 0.9). Assuming that 1% of the population has cancer, what is the fraction of positive tests that correctly detects cancer?

What happens if the test is 100% specific?

In this case, would your answer change if the test was less sensitive?



Problem 2.19: Solution
Using Bayes' theorem, we need to compute

$\Pr(\text{cancer}|\text{positive}) = \underbrace{\Pr(\text{cancer})}_{\text{prior}} \times \underbrace{\frac{\Pr(\text{positive}|\text{cancer})}{\Pr(\text{positive})}}_{\text{likelihood ratio}}.$

We have

$\Pr(\text{cancer}|\text{positive}) = 0.01 \times \frac{0.9}{0.01 \cdot 0.9 + 0.99 \cdot 0.1} = 0.083.$

If the test is 100% specific, there are no false positives and we have

$\Pr(\text{cancer}|\text{positive}) = 0.01 \times \frac{0.9}{0.01 \cdot 0.9} = 1.$

In this case, the answer does not depend on the sensitivity of the test: any positive test must come from a person with cancer.
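
The first result can also be checked with a Monte Carlo simulation. A minimal sketch using binornd:

N=1000000;
cancer=binornd(1,0.01,N,1); %1% prevalence
positive=zeros(N,1);
positive(cancer==1)=binornd(1,0.9,sum(cancer==1),1); %90% sensitivity
positive(cancer==0)=binornd(1,0.1,sum(cancer==0),1); %90% specificity: 10% false positive rate
pest=mean(cancer(positive==1)) %should be close to 0.083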
Problem 2.20

For the joint distribution $p(x_1, x_2)$ in the table below, compute the average $\mathbb{E}_{(x_1,x_2) \sim p(x_1,x_2)}[x_1 x_2 + x_2^2]$ using the law of iterated expectations.

x_1 \ x_2 |  0  |  1
    0     | 0.6 | 0.1
    1     | 0.1 | 0.2



Problem 2.20: Solution

We first compute the expectation over $x_1$ using the conditional $p(x_1|x_2)$, and then average the result over the marginal $p(x_2)$. Using the law of iterated expectations, we obtain

$F(0) = \mathbb{E}_{x_1 \sim p(x_1|x_2=0)}[x_1 x_2 + x_2^2] = 0$

and

$F(1) = \mathbb{E}_{x_1 \sim p(x_1|x_2=1)}[x_1 x_2 + x_2^2] = \mathbb{E}_{x_1 \sim p(x_1|x_2=1)}[x_1 + 1] = p(x_1 = 1|x_2 = 1) + 1 = \frac{0.2}{0.3} + 1 = \frac{5}{3},$

where the marginal is $p(x_2 = 1) = 0.1 + 0.2 = 0.3$. Averaging over $p(x_2)$ then gives

$\mathbb{E}_{x_2 \sim p(x_2)}[F(x_2)] = 0.7 \times 0 + 0.3 \times \frac{5}{3} = 0.5.$
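
The result can be verified with an empirical estimate by sampling the four configurations directly with mnrnd. A minimal sketch, in which the ordering of the probability vector, p(0,0), p(0,1), p(1,0), p(1,1), is a convention chosen here:

p=[0.6,0.1,0.1,0.2]; %p(0,0), p(0,1), p(1,0), p(1,1)
N=100000;
coh=mnrnd(1,p,N); %one-hot samples over the four configurations
x1=coh*[0;0;1;1];
x2=coh*[0;1;0;1];
expest=mean(x1.*x2+x2.^2) %should be close to 0.5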

