
Machine Learning for Engineers:

Chapter 2. Basic Background - Problems

Osvaldo Simeone

May 6, 2021



Problem 2.1

Consider a Bernoulli rv $x \sim \mathrm{Bern}(0.2)$.

- Draw the pmf.
- Generate N = 10 independent realizations of rv x.
- Use these samples to estimate p(x = 1).
- Repeat the previous two points using N = 1000 samples.
- Comment on your results.



Problem 2.1: Solution

p=0.2;
stem([0,1],[1-p,p],'LineWidth',2) %pmf of Bern(p)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
help binornd
binornd(1,p) %one realization of x
N=10;
x=binornd(1,p,N,1);
pest=mean(x) %estimate of p(x=1) from N=10 samples
N=1000;
x=binornd(1,p,N,1);
pest=mean(x) %estimate of p(x=1) from N=1000 samples



Problem 2.2

Consider a categorical (or multinoulli) rv $x \sim \mathrm{Cat}([0.2, 0.1, 0.3, 0.4]^T)$.

- Draw the pmf.
- Generate N = 10 independent realizations of rv x.
- Use these samples to estimate the probabilities $q_k = p(x = k)$ for k = 0, 1, 2, 3.
- Repeat the previous two points using N = 1000 samples.
- Comment on your results.



Problem 2.2: Solution

q=[0.2,0.1,0.3,0.4];
stem([0,1,2,3],q,'LineWidth',2)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
%%%
help mnrnd
mnrnd(1,q) %one realization in one-hot representation
N=10;
xoh=mnrnd(1,q,N);
pest=mean(xoh) %empirical frequencies of the four values
%%%
N=1000;
xoh=mnrnd(1,q,N);
pest=mean(xoh)



Problem 2.3

Consider a Gaussian rv $x \sim \mathcal{N}(-3, 4)$.

- Draw the pdf.
- Generate N = 10 independent realizations of rv x.
- Use these samples to estimate the probability Pr[x ∈ (−3, 3)].
- Repeat the previous two points using N = 1000 samples.
- Comment on your results.



Problem 2.3: Solution

dx=0.01;
xaxis=[-9:dx:3];
help normpdf
plot(xaxis,normpdf(xaxis,-3,2),'LineWidth',2); %note that we need to specify the standard deviation (2) and not the variance (4)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
%%%
normrnd(-3,2)
N=10;
x=normrnd(-3,2,N,1);
pest=mean((-3<=x).*(x<=3)) %fraction of samples falling in (-3,3)
N=1000;
x=normrnd(-3,2,N,1);
pest=mean((-3<=x).*(x<=3))



Problem 2.4

Given a rv $x \sim \mathrm{Cat}(q = [0.2, 0.1, 0.3, 0.4]^T)$, compute the expectation

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2 + 3\exp(x)].$

Verify your calculation using an empirical estimate obtained by drawing random samples.



Problem 2.4: Solution

By linearity of the expectation, we have

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2 + 3\exp(x)] = \mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] + 3\,\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\exp(x)],$

where

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] = 0.2 \cdot 0 + 0.1 \cdot 1 + 0.3 \cdot 4 + 0.4 \cdot 9 = 4.9$

and

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\exp(x)] = 0.2 \cdot 1 + 0.1 \cdot \exp(1) + 0.3 \cdot \exp(2) + 0.4 \cdot \exp(3) = 10.72.$

So we finally have

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2 + 3\exp(x)] = 4.9 + 3 \cdot 10.72 = 37.06.$



Problem 2.4: Solution

q=[0.2,0.1,0.3,0.4];
N=10000;
xoh=mnrnd(1,q,N);
x=xoh*[0,1,2,3]'; %convert from one-hot vector to scalar representation
expest=mean(x.^2+3*exp(x))



Problem 2.5

Given a rv $x \sim \mathcal{N}(-3, 4)$, compute the expectation

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x + 3x^2].$

Verify your calculation using an empirical estimate obtained by drawing random samples.



Problem 2.5: Solution

By linearity of the expectation, we have

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x + 3x^2] = \mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x] + 3\,\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x^2],$

where

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x] = -3$

and

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x^2] = (-3)^2 + 4 = 13.$

Therefore, we finally have

$\mathbb{E}_{x \sim \mathcal{N}(-3,4)}[x + 3x^2] = -3 + 3 \cdot 13 = 36.$



Problem 2.5: Solution

N=10000;
x=normrnd(-3,2,N,1);
expest=mean(x+3*x.^2)



Problem 2.6

Plot the variance of a Bernoulli rv Bern(p) as a function of p. What is the value of p that maximizes uncertainty?



Problem 2.6: Solution

paxis=[0:0.01:1];
plot(paxis,paxis.*(1-paxis),'LineWidth',2)
xlabel('$p$','Interpreter','latex')
ylabel('Var$(p)$','Interpreter','latex')

The plot shows that the variance $p(1-p)$ is maximized at $p = 1/2$, which is accordingly the value of p that maximizes uncertainty.



Problem 2.7

Compute the variance of random variable $x \sim \mathrm{Cat}(q = [0.2, 0.1, 0.3, 0.4]^T)$.



Problem 2.7: Solution

We compute the mean

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x] = 0.2 \cdot 0 + 0.1 \cdot 1 + 0.3 \cdot 2 + 0.4 \cdot 3 = 1.9$

and the second moment

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] = 0.2 \cdot 0 + 0.1 \cdot 1 + 0.3 \cdot 4 + 0.4 \cdot 9 = 4.9$

to obtain

$\mathrm{Var}(x) = \mathbb{E}_{x \sim \mathrm{Cat}(q)}[x^2] - (\mathbb{E}_{x \sim \mathrm{Cat}(q)}[x])^2 = 4.9 - 1.9^2 = 1.29.$
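
As in the previous problems, the calculation can be verified with an empirical estimate. A minimal MATLAB sketch, along the lines of the code for Problem 2.4:

q=[0.2,0.1,0.3,0.4];
N=10000;
xoh=mnrnd(1,q,N); %one-hot samples
x=xoh*[0,1,2,3]'; %convert from one-hot vector to scalar representation
varest=var(x) %should be close to 1.29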



Problem 2.8

Given a categorical rv $x \sim \mathrm{Cat}(q = [0.2, 0.1, 0.3, 0.4]^T)$, what is the expectation $\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\mathbb{1}(x = 0)]$?

For a Gaussian rv $x \sim \mathcal{N}(x|0, 1)$, what is the expectation $\mathbb{E}_{x \sim \mathcal{N}(x|0,1)}[\mathbb{1}(x = 0)]$?



Problem 2.8: Solution

We have

$\mathbb{E}_{x \sim \mathrm{Cat}(q)}[\mathbb{1}(x = 0)] = \Pr[x = 0] = 0.2$

and

$\mathbb{E}_{x \sim \mathcal{N}(x|0,1)}[\mathbb{1}(x = 0)] = \Pr[x = 0] = 0,$

since a continuous rv takes any given individual value with probability zero.
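
Both expectations can be checked empirically, since the sample mean of the indicator 1(x = 0) estimates Pr[x = 0]. A minimal sketch:

q=[0.2,0.1,0.3,0.4];
N=10000;
xoh=mnrnd(1,q,N);
x=xoh*[0,1,2,3]'; %scalar categorical samples
pest=mean(x==0) %should be close to 0.2
xg=normrnd(0,1,N,1);
mean(xg==0) %essentially always 0: a continuous rv never hits 0 exactly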



Problem 2.9

Given vectors $x = [2, 1]^T$ and $y = [-1, 3]^T$,

- represent the vectors in the two-dimensional plane $\mathbb{R}^2$;
- compute their inner product;
- compute the squared $\ell_2$ norms of the two vectors;
- compute the cosine of the angle between the two vectors;
- are the two vectors linearly independent?
- determine a vector that is orthogonal to x;
- normalize vector y so that it has unitary norm;
- give a vector such that the cosine of the angle with x equals 1, and a vector with cosine -1;
- compute the element-wise product $x \odot y$;
- plot all the vectors determined at the previous points.



Problem 2.9: Solution

MATLAB code:
x=[2;1]; %or x=[2,1]';
y=[-1;3];
%represent as points on the plane
plot(x(1),x(2),'x','LineWidth',2,'MarkerSize',10); hold on
plot(y(1),y(2),'o','LineWidth',2,'MarkerSize',10)
ylim([0,3])
%represent as arrows
drawArrow = @(a,b) quiver(a(1),a(2),b(1)-a(1),b(2)-a(2),0);
a = [0 0];
drawArrow(a,x);
hold on
drawArrow(a,y);
axis equal %aspect ratio so that data units are the same



Problem 2.9: Solution

Inner product:

$x^T y = y^T x = 2 \cdot (-1) + 1 \cdot 3 = 1.$

Squared $\ell_2$ norms of the two vectors:

$\|x\|^2 = 2^2 + 1^2 = 5$ and $\|y\|^2 = (-1)^2 + 3^2 = 10.$

Cosine of the angle between the two vectors:

$\cos(\theta) = \frac{x^T y}{\|x\|\,\|y\|} = \frac{1}{\sqrt{5 \cdot 10}} \approx 0.14.$

Are the two vectors linearly independent? Yes: neither vector is a scalar multiple of the other, so $c_1 x + c_2 y = 0$ implies $c_1 = c_2 = 0$.



Problem 2.9: Solution
A vector that is orthogonal to x:

$z = [-1, 2]^T$, since $z^T x = -2 + 2 = 0.$

Normalized vector y:

$\frac{y}{\|y\|} = \left[\frac{-1}{\sqrt{10}}, \frac{3}{\sqrt{10}}\right]^T.$

A vector such that the cosine of the angle with x equals 1, and a vector with cosine of the angle equal to -1:

$z = c\,[2, 1]^T,$

where $c > 0$ gives cosine 1 and $c < 0$ gives cosine -1.

Element-wise product:

$x \odot y = \begin{bmatrix} 2 \cdot (-1) \\ 1 \cdot 3 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}.$

Problem 2.9: Solution

x=[2;1]; y=[-1;3];
x'*y %inner product
norm(x) %l2 norm (not squared)
norm(y) %l2 norm (not squared)
x'*y/(norm(x)*norm(y)) %cosine of the angle
%%%
drawArrow = @(a,b) quiver(a(1),a(2),b(1)-a(1),b(2)-a(2),0);
a = [0 0];
drawArrow(a,x);
hold on
drawArrow(a,y);
z=[-1;2]; %orthogonal to x
drawArrow(a,z);
ynorm=y/norm(y); %normalized y
drawArrow(a,ynorm);
zp=0.6*x; %cosine 1 with x
zm=-0.6*x; %cosine -1 with x
drawArrow(a,zp);
drawArrow(a,zm);



Problem 2.10

   
Given matrices

$A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 3 & -1 \\ -1 & 2 & 3 \end{bmatrix},$

- compute the product $AB$;
- compute the product $B^T A^T$;
- compute the product $\mathrm{Diag}([1, 2]^T)B$;
- is $A$ symmetric?
- if it is symmetric, evaluate eigenvectors and eigenvalues of $A$;
- is $A$ positive definite, i.e., is $A \succ 0$?
- plot the quadratic form $x^T A x$ as a function of vector $x = [x_1, x_2]^T$ for $x_1 \in [-2, 2]$ and $x_2 \in [-2, 2]$;
- is $BB^T$ positive definite? Is it invertible?



Problem 2.10: Solution

   
Given matrices $A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 3 & -1 \\ -1 & 2 & 3 \end{bmatrix}$, we have

$AB = \begin{bmatrix} [1,-1][2,-1]^T & [1,-1][3,2]^T & [1,-1][-1,3]^T \\ [-1,2][2,-1]^T & [-1,2][3,2]^T & [-1,2][-1,3]^T \end{bmatrix} = \begin{bmatrix} 3 & 1 & -4 \\ -4 & 1 & 7 \end{bmatrix}$

and

$B^T A^T = (AB)^T = \begin{bmatrix} 3 & -4 \\ 1 & 1 \\ -4 & 7 \end{bmatrix}.$



Problem 2.10: Solution

We also have

$\mathrm{Diag}([1, 2]^T)B = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 3 & -1 \\ -1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 3 & -1 \\ -2 & 4 & 6 \end{bmatrix}.$

$A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$ is symmetric, since $A = A^T$.



Problem 2.10: Solution

A=[1,-1;-1,2]; B=[2,3,-1;-1,2,3];
A*B
B'*A' %can be equivalently computed as (A*B)'
diag([1,2])*B %diag([1,2]) builds the diagonal matrix Diag([1,2]')
[U,L]=eig(A);
L %all eigenvalues are positive, so A is positive definite
U*L*U' %this equals A
%%%
help mesh
x1axis=[-2:0.01:2];
x2axis=[-2:0.01:2];
[X1,X2] = meshgrid(x1axis,x2axis);
mesh(X1,X2,A(1,1)*X1.^2+A(2,2)*X2.^2+2*A(1,2)*X1.*X2) %quadratic form x'*A*x



Problem 2.10: Solution

[U,L]=eig(B*B');
L %all eigenvalues are positive, and hence B*B' is positive definite
rank(B*B') %the matrix is 2x2 and the rank is 2, so it is invertible
inv(B*B')
[U,L]=eig(B'*B);
L %one of the eigenvalues is zero and the others are positive, and hence B'*B is positive semi-definite
rank(B'*B) %the matrix is 3x3 and the rank is 2, so it is not invertible: try using inv(B'*B)!



Problem 2.11
 
For the jointly Bernoulli random vector $x = [x_1, x_2]^T$ with the joint pmf $p(x_1, x_2)$ below, compute the marginal $p(x_2)$ and the conditional distribution $p(x_2|x_1 = 1)$.

Generate N = 100 realizations of the rv x defined above and estimate the probabilities of each of the four configurations of the outputs.

Repeat with N = 10000 and discuss your results.

x_1 \ x_2 |   0  |  1
    0     | 0.45 | 0.05
    1     | 0.1  | 0.4



Problem 2.11: Solution

Marginal $p(x_2)$:

$p(x_2 = 1) = p(x_1 = 0, x_2 = 1) + p(x_1 = 1, x_2 = 1) = 0.05 + 0.4 = 0.45.$

So, we have $x_2 \sim \mathrm{Bern}(0.45)$.

Conditional distribution $p(x_2|x_1)$:

x_1 \ x_2 |             0            |  1
    0     | 0.45/(0.45 + 0.05) = 0.9 | 0.1
    1     | 0.1/(0.1 + 0.4) = 0.2    | 0.8



Problem 2.11: Solution

N=100;
x1=binornd(1,0.5,N,1); %this generates samples x1 from the marginal p(x1)=Bern(0.5)
x2=zeros(N,1);
for n=1:N
    if (x1(n)==0)
        x2(n)=binornd(1,0.1); %this generates sample x2 from the conditional p(x2|x1=0)=Bern(0.1)
    else
        x2(n)=binornd(1,0.8); %this generates sample x2 from the conditional p(x2|x1=1)=Bern(0.8)
    end
end



Problem 2.11: Solution

pest=zeros(4,1);
for n=1:N
    if ((x1(n)==0)&&(x2(n)==0))
        pest(1)=pest(1)+1/N;
    elseif ((x1(n)==1)&&(x2(n)==0))
        pest(2)=pest(2)+1/N;
    elseif ((x1(n)==0)&&(x2(n)==1))
        pest(3)=pest(3)+1/N;
    elseif ((x1(n)==1)&&(x2(n)==1))
        pest(4)=pest(4)+1/N;
    end
end
pest



Problem 2.12

   
For a jointly Gaussian rv $x = [x_1, x_2]^T$ with mean vector $\mu = [\mu_1, \mu_2]^T$ and covariance

$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$

prove the second equality in

$\sigma_{12} = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - \mu_1)(x_2 - \mu_2)] = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\mu_2.$



Problem 2.12: Solution

   
Expanding the product and using the linearity of the expectation, we have

$\sigma_{12} = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - \mu_1)(x_2 - \mu_2)]$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\,\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2] - \mu_2\,\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1] + \mu_1\mu_2$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\,\mathbb{E}_{x_2 \sim \mathcal{N}(\mu_2,\sigma_2^2)}[x_2] - \mu_2\,\mathbb{E}_{x_1 \sim \mathcal{N}(\mu_1,\sigma_1^2)}[x_1] + \mu_1\mu_2$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\mu_2 - \mu_2\mu_1 + \mu_1\mu_2$
$= \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2] - \mu_1\mu_2.$
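
The identity can also be checked numerically. A minimal sketch, in which the mean vector and covariance matrix are arbitrary illustrative choices (mvnrnd is from the same toolbox as the mvnpdf calls used later):

mu=[1,-2]; %arbitrary illustrative choice
Sigma=[2,-1;-1,2]; %arbitrary illustrative choice (sigma_12 = -1)
N=100000;
x=mvnrnd(mu,Sigma,N);
mean((x(:,1)-mu(1)).*(x(:,2)-mu(2))) %left-hand side: close to -1
mean(x(:,1).*x(:,2))-mu(1)*mu(2) %right-hand side: same value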



Problem 2.13

 
For a jointly Gaussian rv with mean vector $\mu = [0, 0]^T$ and covariance

$\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix},$

- compute the covariance coefficient $\rho$;
- verify that the covariance matrix is positive definite;
- evaluate the expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2]$;
- evaluate the expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2]$ and compare with your result at the previous point;
- modify the covariance $\sigma_{12}$ so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] = 0$;
- modify the covariance $\sigma_{12}$ so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] = 0$.



Problem 2.13: Solution

Covariance coefficient $\rho$:

$\rho = \frac{\sigma_{12}}{\sigma_1\sigma_2} = -\frac{1}{2},$

so the two variables are negatively correlated, although not maximally so, since $|\rho| < 1$: given that the mean is zero for both variables, when $x_1$ is positive/negative, $x_2$ will tend to be negative/positive, and vice versa.

Since $|\rho| < 1$, the covariance matrix is positive definite.

Expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2]$:

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] = \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1^2] + \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2^2] + 2\,\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1 x_2]$
$= (\sigma_1^2 + \mu_1^2) + (\sigma_2^2 + \mu_2^2) + 2(\sigma_{12} + \mu_1\mu_2)$
$= 2 + 2 + 2 \cdot (-1) = 2.$

Since the variables are negatively correlated, they tend to cancel each other. Therefore, we have $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] < \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1^2] + \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2^2] = 4$.



Problem 2.13: Solution

Expectation $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2]$:

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] = (\sigma_1^2 + \mu_1^2) + (\sigma_2^2 + \mu_2^2) - 2(\sigma_{12} + \mu_1\mu_2) = 2 + 2 - 2 \cdot (-1) = 6.$

Therefore, subtracting the two variables yields on average a larger "power", since the two variables are negatively correlated: we have $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] > \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_1^2] + \mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[x_2^2] = 4$.
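
Both expectations can be verified by drawing samples. A minimal sketch:

mu=[0,0];
Sigma=[2,-1;-1,2];
N=100000;
x=mvnrnd(mu,Sigma,N);
mean((x(:,1)+x(:,2)).^2) %should be close to 2
mean((x(:,1)-x(:,2)).^2) %should be close to 6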



Problem 2.13: Solution
Modify the covariance matrix Σ so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 + x_2)^2] = 0$: the variables need to be maximally negatively correlated, which is obtained for $\rho = -1$, and hence $\sigma_{12} = \rho\sigma_1\sigma_2 = -2$, yielding

$\Sigma = \begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}.$

In this case, $x_1 = -x_2$.

Modify the covariance matrix so that $\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(x_1 - x_2)^2] = 0$: the variables need to be maximally positively correlated, which is obtained for $\rho = 1$, and hence $\sigma_{12} = \rho\sigma_1\sigma_2 = 2$, yielding

$\Sigma = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}.$

In this case, $x_1 = x_2$.



Problem 2.14

Produce 3D plots for a jointly Gaussian pdf with mean vector $\mu = [0, 0]^T$ and covariance $\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.

Repeat for $\Sigma = \begin{bmatrix} 2 & -1.9 \\ -1.9 & 2 \end{bmatrix}$.

Produce 3D plots for a jointly Gaussian pdf with mean vector $\mu = [5, 7]^T$ and covariance $\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.



Problem 2.14: Solution

x1=-3*sqrt(2):0.1:3*sqrt(2);
x2=x1;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)]; %X1(:) makes a vector out of matrix X1
mu=[0,0];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=\mu_2=0$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)



Problem 2.14: Solution

x1=-3*sqrt(2):0.1:3*sqrt(2);
x2=x1;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)]; %X1(:) makes a vector out of matrix X1
mu=[0,0];
Sigma=[2 -1.9;-1.9 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=\mu_2=0$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1.9$','Interpreter','latex','FontSize',14)



Problem 2.14: Solution

x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)



Problem 2.15

Produce contour plots for a jointly Gaussian pdf with mean vector $\mu = [5, 7]^T$ and covariance $\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.

Repeat for $\Sigma = \begin{bmatrix} 2 & -1.9 \\ -1.9 & 2 \end{bmatrix}$.



Problem 2.15: Solution

x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
contour(x1,x2,y);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)



Problem 2.15: Solution

x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1.9;-1.9 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
contour(x1,x2,y);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1.9$','Interpreter','latex','FontSize',14)



Problem 2.16

 
For a jointly Gaussian rv with mean vector $\mu = [\mu_1, \mu_2]^T$ and covariance

$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$

interpret the formula for the conditional distribution

$(x_1|x_2 = x_2) \sim \mathcal{N}\left(\mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2),\; \sigma_1^2(1 - \rho^2)\right)$

in terms of prediction of $x_1$ given an observation $x_2 = x_2$.

How is the formula simplified when $\sigma_1 = \sigma_2$ and $\mu_1 = \mu_2 = 0$?



Problem 2.16: Solution
Given an observation $x_2$, assuming that the joint distribution is known (or estimated from data), a prediction of $x_1$ can be obtained by considering the mean of the conditional distribution, i.e.,

$\mathbb{E}_{x_1 \sim p(x_1|x_2)}[x_1] = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2).$

This says that the prediction is given by the mean $\mu_1$, which would be the corresponding prediction had we not measured $x_2$, corrected by the term $\rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$. This term is positive if $\rho$ and $(x_2 - \mu_2)$ have the same sign, in accordance with the interpretation of the covariance coefficient. Furthermore, the correction is weighted by the ratio $\sigma_1/\sigma_2$, which accounts for the potentially different variances of the two variables.

When $\sigma_1 = \sigma_2$ and $\mu_1 = \mu_2 = 0$, we have the simplified formula

$\mathbb{E}_{x_1 \sim p(x_1|x_2)}[x_1] = \rho x_2,$

which can be readily interpreted in light of the discussion above.
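
The simplified predictor can be visualized by drawing samples and overlaying the line $\hat{x}_1 = \rho x_2$. In this minimal sketch, the value $\rho = 0.8$ and the plotting range are arbitrary illustrative choices:

rho=0.8; %arbitrary illustrative choice
Sigma=[1,rho;rho,1]; %sigma_1 = sigma_2 = 1, with mu_1 = mu_2 = 0
N=1000;
x=mvnrnd([0,0],Sigma,N);
plot(x(:,2),x(:,1),'.'); hold on
x2axis=[-4:0.1:4];
plot(x2axis,rho*x2axis,'LineWidth',2) %predictor x1hat = rho*x2
xlabel('$x_2$','Interpreter','latex')
ylabel('$x_1$','Interpreter','latex')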


Problem 2.17

Consider a jointly Gaussian vector with all-zero mean vector and covariance matrix defined by $\sigma_1 = \sigma_2 = 1$ and $\sigma_{12} = -0.1$. Consider the linear predictor $\hat{x}_1 = a x_2$ for some real number $a$, and compute the mean squared error

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(\hat{x}_1 - x_1)^2]$

as a function of $a$.

Then, optimize over $a$ by equating the derivative with respect to $a$ to zero.

How does this solution compare with the conditional mean $\rho x_2$ obtained in the previous problem?



Problem 2.17: Solution

We have

$\mathbb{E}_{x \sim \mathcal{N}(\mu,\Sigma)}[(a x_2 - x_1)^2] = a^2\sigma_2^2 + \sigma_1^2 - 2a\sigma_{12} = a^2 + 1 - 2a\rho.$

Computing the derivative and setting it equal to zero, we get

$\frac{d}{da}(a^2 + 1 - 2a\rho) = 2a - 2\rho = 0,$

which yields the optimal value $a^* = \rho$.

Therefore, the conditional expectation $\hat{x}_1 = \rho x_2$ is the linear prediction (i.e., the prediction of the form $\hat{x}_1 = a x_2$) that minimizes the mean squared error.
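
The optimal value $a^* = \rho$ can be confirmed empirically by estimating the mean squared error on a grid of values of a. A minimal sketch:

rho=-0.1;
Sigma=[1,rho;rho,1];
N=100000;
x=mvnrnd([0,0],Sigma,N);
aaxis=[-1:0.01:1];
mse=zeros(size(aaxis));
for i=1:length(aaxis)
    mse(i)=mean((aaxis(i)*x(:,2)-x(:,1)).^2); %empirical MSE of x1hat = a*x2
end
plot(aaxis,mse,'LineWidth',2)
xlabel('$a$','Interpreter','latex')
ylabel('MSE','Interpreter','latex')
[~,imin]=min(mse);
aaxis(imin) %should be close to rho = -0.1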



Problem 2.18

Using the formula for the joint pdf of a two-dimensional jointly Gaussian vector, show that, if the covariance is zero, i.e., if $\sigma_{12} = 0$, then the two variables are independent.



Problem 2.18: Solution

When $\sigma_{12} = 0$, the joint pdf is given as

$\mathcal{N}(x|\mu,\Sigma) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\frac{1}{2}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right)$
$= \frac{1}{\sqrt{2\pi\sigma_1^2}}\exp\left(-\frac{1}{2}\frac{(x_1 - \mu_1)^2}{\sigma_1^2}\right) \cdot \frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\left(-\frac{1}{2}\frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right)$
$= \mathcal{N}(x_1|\mu_1, \sigma_1^2) \cdot \mathcal{N}(x_2|\mu_2, \sigma_2^2),$

which concludes the proof: the joint pdf factorizes into the product of the two marginal pdfs, which is the definition of independence.
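
The factorization can be checked numerically at any test point. In the following sketch, the means, standard deviations, and test point are arbitrary illustrative choices:

mu1=1; mu2=-2; s1=2; s2=3; %arbitrary means and standard deviations
x=[0.5,1.5]; %arbitrary test point
mvnpdf(x,[mu1,mu2],diag([s1^2,s2^2])) %joint pdf with sigma_12 = 0
normpdf(x(1),mu1,s1)*normpdf(x(2),mu2,s2) %product of marginals: same value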



Problem 2.19

We have a test for screening cancer that is 90% sensitive (i.e., Pr(positive|cancer) = 0.9) and 90% specific (i.e., Pr(negative|no cancer) = 0.9). Assuming that 1% of the population has cancer, what is the fraction of positive tests that correctly detects cancer?

What happens if the test is 100% specific?

In this case, would your answer change if the test was less sensitive?



Problem 2.19: Solution
Using Bayes' theorem, we need to compute

$\Pr(\text{cancer}|\text{positive}) = \underbrace{\Pr(\text{cancer})}_{\text{prior}} \times \underbrace{\frac{\Pr(\text{positive}|\text{cancer})}{\Pr(\text{positive})}}_{\text{likelihood ratio}}.$

We have

$\Pr(\text{cancer}|\text{positive}) = 0.01 \times \frac{0.9}{0.01 \cdot 0.9 + 0.99 \cdot 0.1} = 0.083.$

If the test is 100% specific, there are no false positives and we have

$\Pr(\text{cancer}|\text{positive}) = 0.01 \times \frac{0.9}{0.01 \cdot 0.9} = 1.$

In this case, the answer does not depend on the sensitivity of the test: any positive test must come from a person with cancer.
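
The first result can also be checked with a Monte Carlo simulation. A minimal sketch using binornd:

N=1000000;
cancer=binornd(1,0.01,N,1); %1% prevalence
positive=zeros(N,1);
positive(cancer==1)=binornd(1,0.9,sum(cancer==1),1); %90% sensitivity
positive(cancer==0)=binornd(1,0.1,sum(cancer==0),1); %90% specificity: 10% false positive rate
pest=mean(cancer(positive==1)) %should be close to 0.083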
Problem 2.20

For the joint distribution $p(x_1, x_2)$ in the table below, compute the average $\mathbb{E}_{(x_1,x_2) \sim p(x_1,x_2)}[x_1 x_2 + x_2^2]$ using the law of iterated expectations.

x_1 \ x_2 |  0  |  1
    0     | 0.6 | 0.1
    1     | 0.1 | 0.2



Problem 2.20: Solution

We first compute the expectation over $x_1$ using the conditional $p(x_1|x_2)$, and then average the result over the marginal $p(x_2)$. Using the law of iterated expectations, we obtain

$F(0) = \mathbb{E}_{x_1 \sim p(x_1|x_2=0)}[x_1 x_2 + x_2^2] = 0$

and

$F(1) = \mathbb{E}_{x_1 \sim p(x_1|x_2=1)}[x_1 x_2 + x_2^2] = \mathbb{E}_{x_1 \sim p(x_1|x_2=1)}[x_1 + 1] = p(x_1 = 1|x_2 = 1) + 1 = \frac{0.2}{0.3} + 1 = \frac{5}{3},$

where the marginal is $p(x_2 = 1) = 0.1 + 0.2 = 0.3$. Averaging over $p(x_2)$ then gives

$\mathbb{E}_{x_2 \sim p(x_2)}[F(x_2)] = 0.7 \times 0 + 0.3 \times \frac{5}{3} = 0.5.$
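
The result can be verified with an empirical estimate by sampling the four configurations directly with mnrnd. A minimal sketch, in which the ordering of the probability vector, p(0,0), p(0,1), p(1,0), p(1,1), is a convention chosen here:

p=[0.6,0.1,0.1,0.2]; %p(0,0), p(0,1), p(1,0), p(1,1)
N=100000;
coh=mnrnd(1,p,N); %one-hot samples over the four configurations
x1=coh*[0;0;1;1];
x2=coh*[0;1;0;1];
expest=mean(x1.*x2+x2.^2) %should be close to 0.5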

