3 PDF
Osvaldo Simeone
May 6, 2021
p=0.2;
stem([0,1],[1-p,p],'LineWidth',2)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
help binornd
binornd(1,p)
N=10;
x=binornd(1,p,N,1);
pest=mean(x)
N=1000;
x=binornd(1,p,N,1);
pest=mean(x)
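For readers without MATLAB, the same Monte Carlo estimate can be sketched in Python with NumPy (an assumption; the variable names mirror the MATLAB ones):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
p = 0.2
N = 1000
x = rng.binomial(1, p, size=N)   # counterpart of binornd(1,p,N,1)
pest = x.mean()                  # fraction of ones estimates p
```

As in the MATLAB run, the estimate tightens around p = 0.2 as N grows.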
q=[0.2,0.1,0.3,0.4];
stem([0,1,2,3],q,'LineWidth',2)
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
%%%
help mnrnd
mnrnd(1,q) %one-hot representation
N=10;
xoh=mnrnd(1,q,N);
pest=mean(xoh)
%%%
N=1000;
xoh=mnrnd(1,q,N);
pest=mean(xoh)
dx=0.01;
xaxis=[-9:dx:3];
help normpdf
plot(xaxis,normpdf(xaxis,-3,2),'LineWidth',2); %note that we need to specify the standard deviation and not the variance
xlabel('$x$','Interpreter','latex')
ylabel('$p(x)$','Interpreter','latex')
%%%
normrnd(-3,2)
N=10;
x=normrnd(-3,2,N,1);
pest=mean(((-3<=x).*(x<=3)))
N=1000;
x=normrnd(-3,2,N,1);
pest=mean(((-3<=x).*(x<=3)))
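The estimated probability can be checked against the exact value Φ(3) − Φ(0) ≈ 0.4987 obtained by standardizing the interval. A Python/NumPy sketch of both computations (NumPy assumed available):

```python
import math
import numpy as np

# Exact probability Pr(-3 <= x <= 3) for x ~ N(-3, 4), via the standard normal CDF
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
p_exact = Phi((3 - (-3)) / 2) - Phi((-3 - (-3)) / 2)   # Phi(3) - Phi(0)

# Monte Carlo estimate mirroring the MATLAB snippet above
rng = np.random.default_rng(0)
x = rng.normal(-3, 2, size=1000)   # scale argument is the standard deviation
pest = np.mean((x >= -3) & (x <= 3))
```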
where
E_{x∼Cat(q)}[x^2] = 0.2 · 0 + 0.1 · 1 + 0.3 · 4 + 0.4 · 9 = 4.9

and

E_{x∼Cat(q)}[exp(x)] = 0.2 · 1 + 0.1 · exp(1) + 0.3 · exp(2) + 0.4 · exp(3) = 10.72.

So we finally have
q=[0.2,0.1,0.3,0.4];
N=10000;
xoh=mnrnd(1,q,N);
x=xoh*[0,1,2,3]'; %convert from one-hot vector to scalar representation
expest=mean(x.^2+3*exp(x))
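The exact value E[x^2 + 3 exp(x)] = 4.9 + 3 · 10.72 ≈ 37.07 can be computed directly from the pmf and compared with the Monte Carlo estimate; a Python/NumPy cross-check (NumPy assumed available):

```python
import numpy as np

q = np.array([0.2, 0.1, 0.3, 0.4])   # categorical pmf over x in {0,1,2,3}
xvals = np.arange(4)

# Exact expectation E[x^2 + 3 exp(x)] directly from the pmf
exact = np.sum(q * (xvals**2 + 3 * np.exp(xvals)))

# Monte Carlo estimate mirroring the MATLAB snippet (one-hot samples -> scalars)
rng = np.random.default_rng(0)
xoh = rng.multinomial(1, q, size=10000)
x = xoh @ xvals
expest = np.mean(x**2 + 3 * np.exp(x))
```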
where
E_{x∼N(−3,4)}[x] = −3

and

E_{x∼N(−3,4)}[x^2] = (−3)^2 + 4 = 13.
Therefore, we finally have
N=10000;
x=normrnd(-3,2,N,1);
expest=mean(x+3*x.^2)
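Analytically, E[x + 3x^2] = −3 + 3 · 13 = 36 from the two moments above; a Python/NumPy check of both the closed form and the Monte Carlo estimate (NumPy assumed available):

```python
import numpy as np

# For x ~ N(-3, 4): E[x] = -3, E[x^2] = mu^2 + var = 13, so E[x + 3x^2] = 36
mu, var = -3.0, 4.0
exact = mu + 3 * (mu**2 + var)

# Monte Carlo cross-check of the MATLAB estimate
rng = np.random.default_rng(0)
x = rng.normal(mu, np.sqrt(var), size=10000)
expest = np.mean(x + 3 * x**2)
```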
paxis=[0:0.01:1];
plot(paxis,paxis.*(1-paxis),'LineWidth',2)
xlabel('$p$','Interpreter','latex')
ylabel('Var$(p)$','Interpreter','latex')
Compute the variance of random variable x ∼ Cat(q = [0.2, 0.1, 0.3, 0.4]^T).

We have

E_{x∼Cat(q)}[x] = 0.2 · 0 + 0.1 · 1 + 0.3 · 2 + 0.4 · 3 = 1.9

and

E_{x∼Cat(q)}[x^2] = 0.2 · 0 + 0.1 · 1 + 0.3 · 4 + 0.4 · 9 = 4.9,

to obtain

Var(x) = E_{x∼Cat(q)}[x^2] − (E_{x∼Cat(q)}[x])^2 = 4.9 − 1.9^2 = 1.29.
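A short Python/NumPy check of the variance computation (NumPy assumed available):

```python
import numpy as np

q = np.array([0.2, 0.1, 0.3, 0.4])   # pmf of x ~ Cat(q) over {0,1,2,3}
xvals = np.arange(4)

mean = np.sum(q * xvals)             # E[x] = 1.9
second = np.sum(q * xvals**2)        # E[x^2] = 4.9
var = second - mean**2               # Var(x) = 4.9 - 1.9^2 = 1.29
```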
MATLAB code:
x=[2;1]; %or x=[2,1]';
y=[-1;3];
%represent as points on the plane
plot(x(1),x(2),'x','LineWidth',2,'MarkerSize',10); hold on
plot(y(1),y(2),'o','LineWidth',2,'MarkerSize',10)
ylim([0,3])
%represent as arrows
drawArrow = @(a,b) quiver( a(1),a(2),b(1)-a(1),b(2)-a(2),0);
a = [0 0];
drawArrow(a,x);
hold on
drawArrow(a,y);
axis equal %aspect ratio so that data units are the same
Inner product:

x^T y = y^T x = 2 · (−1) + 1 · 3 = 1.

Cosine of the angle:

cos(θ) = x^T y / (||x|| ||y||) = 1 / (√5 · √10) ≈ 0.14.
Are the two vectors linearly independent? Yes: neither vector is a scalar multiple of the other, so the only solution of a x + b y = 0 is a = b = 0.
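These quantities can also be checked in Python/NumPy (an assumption; linear independence is tested via the rank of the matrix with columns x and y):

```python
import numpy as np

x = np.array([2.0, 1.0])
y = np.array([-1.0, 3.0])

inner = x @ y                                                  # 2*(-1) + 1*3 = 1
cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))    # 1/sqrt(50)

# The vectors are linearly independent iff [x y] has full rank 2
rank = np.linalg.matrix_rank(np.column_stack([x, y]))
```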
x=[2;1]; y=[-1;3];
x'*y %inner product
norm(x) %l2 norm (not squared)
norm(y) %l2 norm (not squared)
x'*y/(norm(x)*norm(y)) %cosine of the angle
%%%
drawArrow = @(a,b) quiver( a(1),a(2),b(1)-a(1),b(2)-a(2),0);
a = [0 0];
drawArrow(a,x);
hold on
drawArrow(a,y);
z=[-1;2];
drawArrow(a,z);
ynorm=y/norm(y);
drawArrow(a,ynorm);
zp=0.6*x;
zm=-0.6*x;
drawArrow(a,zp);
drawArrow(a,zm);
Given matrices A = [1 −1; −1 2] and B = [2 3 −1; −1 2 3],
- compute the product AB;
- compute the product B^T A^T;
- compute the product Diag([1, 2]^T)B;
- is A symmetric?
- if it is symmetric, evaluate eigenvectors and eigenvalues of A.
Given matrices A = [1 −1; −1 2] and B = [2 3 −1; −1 2 3], we have

AB = [3 1 −4; −4 1 7]

and

B^T A^T = (AB)^T = [3 −4; 1 1; −4 7].

We also have

Diag([1, 2]^T)B = [1 0; 0 2] [2 3 −1; −1 2 3] = [2 3 −1; −2 4 6].

A = [1 −1; −1 2] is symmetric.
A=[1,-1;-1,2]; B=[2,3,-1;-1,2,3];
A*B
B'*A' %can be equivalently computed as (A*B)'
diag([1,2])*B %note that we have diag(A)=[1,2]
[U,L]=eig(A);
L %all eigenvalues are positive, so A is positive definite
U*L*U' %this equals A
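A Python/NumPy cross-check of the products and of the eigendecomposition of the symmetric matrix A (NumPy assumed available; `eigh` is NumPy's routine for symmetric matrices):

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 2.0]])
B = np.array([[2.0, 3.0, -1.0], [-1.0, 2.0, 3.0]])

AB = A @ B                         # expected: [[3, 1, -4], [-4, 1, 7]]
BtAt = B.T @ A.T                   # equals (A @ B).T

evals, U = np.linalg.eigh(A)       # eigendecomposition of symmetric A
pos_def = bool(np.all(evals > 0))  # positive definite iff all eigenvalues > 0
recon = U @ np.diag(evals) @ U.T   # reconstructs A from U and the eigenvalues
```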
%%%
help mesh
x1axis=[-2:0.01:2];
x2axis=[-2:0.01:2];
[X1,X2] = meshgrid(x1axis,x2axis);
mesh(X1,X2,A(1,1)*X1.^2+A(2,2)*X2.^2+2*A(1,2)*X1.*X2)
[U,L]=eig(B*B');
L %they are all positive and hence B*B' is positive definite
rank(B*B') %the matrix is 2x2 and the rank is 2 so it is invertible
inv(B*B')
[U,L]=eig(B'*B);
L %one of the eigenvalues is zero and the others are positive, and hence B'*B is positive semi-definite
rank(B'*B) %the matrix is 3x3 and the rank is 2 so it is not invertible - try using inv(B'*B)!
Joint distribution p(x1, x2):

x1 \ x2     0      1
  0         0.45   0.05
  1         0.1    0.4
Conditional p(x2 | x1):

x1 \ x2     0                             1
  0         0.45/(0.45 + 0.05) = 0.9      0.1
  1         0.1/(0.1 + 0.4) = 0.2         0.8
N=100;
x1=binornd(1,0.5,N,1); %this generates samples x1 from the marginal p(x1)=Bern(0.5)
x2=zeros(N,1);
for n=1:N
    if (x1(n)==0)
        x2(n)=binornd(1,0.1); %this generates sample x2 from the conditional p(x2|x1=0)=Bern(0.1)
    else
        x2(n)=binornd(1,0.8); %this generates sample x2 from the conditional p(x2|x1=1)=Bern(0.8)
    end
end
pest=zeros(4,1);
for n=1:N
if ((x1(n)==0)&&(x2(n)==0))
pest(1)=pest(1)+1/N;
elseif ((x1(n)==1)&&(x2(n)==0))
pest(2)=pest(2)+1/N;
elseif ((x1(n)==0)&&(x2(n)==1))
pest(3)=pest(3)+1/N;
elseif ((x1(n)==1)&&(x2(n)==1))
pest(4)=pest(4)+1/N;
end
end
pest
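The same ancestral sampling scheme can be sketched in vectorized Python/NumPy (an assumption); with a large N the empirical frequencies approach the joint probabilities 0.45, 0.1, 0.05, 0.4 from the table:

```python
import numpy as np

# Ancestral sampling: draw x1 ~ Bern(0.5), then x2 | x1 from the conditional
rng = np.random.default_rng(0)
N = 100000
x1 = rng.binomial(1, 0.5, size=N)
x2 = rng.binomial(1, np.where(x1 == 0, 0.1, 0.8))  # p(x2=1|x1=0)=0.1, p(x2=1|x1=1)=0.8

# Empirical joint frequencies
p00 = np.mean((x1 == 0) & (x2 == 0))
p10 = np.mean((x1 == 1) & (x2 == 0))
p01 = np.mean((x1 == 0) & (x2 == 1))
p11 = np.mean((x1 == 1) & (x2 == 1))
```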
For a jointly Gaussian rv x = [x1, x2]^T with mean vector µ = [µ1, µ2]^T and covariance Σ = [σ1^2 σ12; σ12 σ2^2], prove the second equality in
We have
For a jointly Gaussian rv with mean vector µ = [0, 0]^T and covariance Σ = [2 −1; −1 2],
- compute the covariance coefficient ρ;
- verify that the covariance matrix is positive definite;
- evaluate the expectation E_{x∼N(µ,Σ)}[(x1 + x2)^2];
- evaluate the expectation E_{x∼N(µ,Σ)}[(x1 − x2)^2] and compare with your result at the previous point;
- modify the covariance σ12 so that E_{x∼N(µ,Σ)}[(x1 + x2)^2] = 0;
- modify the covariance σ12 so that E_{x∼N(µ,Σ)}[(x1 − x2)^2] = 0.
Covariance coefficient ρ:

ρ = σ12 / (σ1 σ2) = −1/2,

so the two variables are negatively correlated, although not maximally so, since |ρ| < 1. Given that the mean is zero for both variables, when x1 is positive/negative, x2 will tend to be negative/positive, and vice versa.
Since |ρ| < 1, the covariance is positive definite.
Expectation E_{x∼N(µ,Σ)}[(x1 + x2)^2]:

E_{x∼N(µ,Σ)}[(x1 + x2)^2] = E_{x∼N(µ,Σ)}[x1^2] + E_{x∼N(µ,Σ)}[x2^2] + 2 E_{x∼N(µ,Σ)}[x1 x2]
                          = (σ1^2 + µ1^2) + (σ2^2 + µ2^2) + 2(σ12 + µ1 µ2)
                          = 2 + 2 + 2 · (−1) = 2.

Since the variables are negatively correlated, they tend to cancel each other. Therefore, we have E_{x∼N(µ,Σ)}[(x1 + x2)^2] < E_{x∼N(µ,Σ)}[x1^2] + E_{x∼N(µ,Σ)}[x2^2] = 4.
Similarly, E_{x∼N(µ,Σ)}[(x1 − x2)^2] = 2 + 2 − 2 · (−1) = 6. Setting σ12 = −2 yields E_{x∼N(µ,Σ)}[(x1 + x2)^2] = 0, in which case x1 = −x2; setting σ12 = 2 yields E_{x∼N(µ,Σ)}[(x1 − x2)^2] = 0. In this case, x1 = x2.
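The quadratic expectations above depend on σ12 only through E[(x1 ± x2)^2] = σ1^2 + σ2^2 ± 2σ12; a minimal Python sketch of this closed form (variable names are ours):

```python
# For zero-mean (x1, x2) with Var(x1) = Var(x2) = 2 and covariance s12:
# E[(x1 + x2)^2] = 2 + 2 + 2*s12 and E[(x1 - x2)^2] = 2 + 2 - 2*s12.
def e_sum_sq(s12):
    return 4.0 + 2.0 * s12

def e_diff_sq(s12):
    return 4.0 - 2.0 * s12

val_sum = e_sum_sq(-1.0)    # with s12 = -1: equals 2
val_diff = e_diff_sq(-1.0)  # with s12 = -1: equals 6
```

Setting s12 = −2 drives the first expectation to zero, and s12 = 2 drives the second to zero, matching the two modifications requested in the problem.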
Produce 3D plots for a jointly Gaussian pdf with mean vector µ = [0, 0]^T and covariance Σ = [2 −1; −1 2]. Repeat for Σ = [2 −1.9; −1.9 2].
Produce 3D plots for a jointly Gaussian pdf with mean vector µ = [5, 7]^T and covariance Σ = [2 −1; −1 2].
x1=-3*sqrt(2):0.1:3*sqrt(2);
x2=x1;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)]; %X1(:) makes a vector out of matrix X1
mu=[0,0];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=\mu_2=0$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)
x1=-3*sqrt(2):0.1:3*sqrt(2);
x2=x1;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)]; %X1(:) makes a vector out of matrix X1
mu=[0,0];
Sigma=[2 -1.9;-1.9 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=\mu_2=0$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1.9$','Interpreter','latex','FontSize',14)
x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
mesh(x1,x2,y,'FaceAlpha',0.5);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14)
zlabel('$p(x_1,x_2)$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)
Produce contour plots for a jointly Gaussian pdf with mean vector µ = [5, 7]^T and covariance Σ = [2 −1; −1 2]. Repeat for Σ = [2 −1.9; −1.9 2].
x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1;-1 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
contour(x1,x2,y);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1$','Interpreter','latex','FontSize',14)
x1=[-3*sqrt(2):0.1:3*sqrt(2)]+5;
x2=[-3*sqrt(2):0.1:3*sqrt(2)]+7;
[X1,X2] = meshgrid(x1,x2);
X = [X1(:) X2(:)];
mu=[5,7];
Sigma=[2 -1.9;-1.9 2];
y = mvnpdf(X,mu,Sigma);
y = reshape(y,length(x2),length(x1));
contour(x1,x2,y);
xlabel('$x_1$','Interpreter','latex','FontSize',14);
ylabel('$x_2$','Interpreter','latex','FontSize',14);
title('$\mu_1=5, \mu_2=7$, $\sigma_1^2=\sigma_2^2=2$ and $\sigma_{12}=-1.9$','Interpreter','latex','FontSize',14)
For a jointly Gaussian rv with mean vector µ = [µ1, µ2]^T and covariance Σ = [σ1^2 σ12; σ12 σ2^2], interpret the formula for the conditional distribution

(x1 | x2 = x2) ∼ N( µ1 + ρ (σ1/σ2)(x2 − µ2), σ1^2 (1 − ρ^2) ).
Evaluate the mean squared error E[(x1 − a x2)^2] of the linear predictor x̂1 = a x2 as a function of a.
Then, optimize over a by equating the derivative with respect to a to zero.
How does this solution compare with the conditional mean ρ x2 obtained in the previous problem?
We have
Using the formula for the joint pdf of a two-dimensional jointly Gaussian vector,
show that, if the covariance is zero, i.e., if σ12 = 0, then the two variables are
independent.
N(x|µ, Σ) = 1/(2π σ1 σ2) · exp( −(1/2) [ (x1 − µ1)^2/σ1^2 + (x2 − µ2)^2/σ2^2 ] )
          = 1/√(2π σ1^2) · exp( −(x1 − µ1)^2/(2σ1^2) ) · 1/√(2π σ2^2) · exp( −(x2 − µ2)^2/(2σ2^2) )
          = N(x1|µ1, σ1^2) · N(x2|µ2, σ2^2),
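The factorization can be verified numerically at an arbitrary test point; a minimal Python sketch (the test point and parameters are our choices):

```python
import math

def gauss1d(x, mu, var):
    # Univariate Gaussian pdf N(x | mu, var)
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gauss2d_uncorr(x1, x2, mu1, mu2, v1, v2):
    # Bivariate Gaussian pdf with sigma12 = 0
    quad = (x1 - mu1)**2 / v1 + (x2 - mu2)**2 / v2
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(v1 * v2))

joint = gauss2d_uncorr(0.7, -1.2, 0.0, 1.0, 2.0, 3.0)
product = gauss1d(0.7, 0.0, 2.0) * gauss1d(-1.2, 1.0, 3.0)
```

With σ12 = 0 the joint pdf and the product of marginals coincide at every point, which is exactly the independence statement above.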
Pr(cancer|positive) = Pr(cancer) × Pr(positive|cancer)/Pr(positive),

where Pr(cancer) is the prior and Pr(positive|cancer)/Pr(positive) is the likelihood ratio.
We have

Pr(cancer|positive) = 0.01 × 0.9/(0.01 · 0.9 + 0.99 · 0.1) = 0.083,

where 0.01 is the prior and 0.9/(0.01 · 0.9 + 0.99 · 0.1) is the likelihood ratio.
In this case, the answer does not depend on the sensitivity of the test.
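A short Python sketch of the Bayes computation above (variable names are ours; the numbers are those of the example):

```python
# Bayes' rule for the test example: prior 0.01, sensitivity 0.9,
# false-positive rate 0.1
prior = 0.01
p_pos_given_cancer = 0.9
p_pos_given_healthy = 0.1

# Total probability of a positive test
p_pos = prior * p_pos_given_cancer + (1 - prior) * p_pos_given_healthy  # 0.108

# Posterior probability of cancer given a positive test
posterior = prior * p_pos_given_cancer / p_pos                          # ~0.083
```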
Problem 2.20
For the joint distribution p(x1, x2) in the table below, compute the average E_{(x1,x2)∼p(x1,x2)}[x1 x2 + x2^2] using the law of iterated expectations.

x1 \ x2     0     1
  0         0.6   0.1
  1         0.1   0.2
We first compute the expectation using the conditional p(x1 |x2 ) and
then using the marginal p(x2 ). Using the law of iterated expectations,
we obtain:
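The iterated-expectations computation can be cross-checked numerically; a Python/NumPy sketch (NumPy assumed available) comparing the direct expectation with the iterated one E_{x2}[ E_{x1|x2}[x1] · x2 + x2^2 ]:

```python
import numpy as np

# Joint pmf p(x1, x2): rows index x1 in {0,1}, columns index x2 in {0,1}
P = np.array([[0.6, 0.1],
              [0.1, 0.2]])

# Direct expectation E[x1*x2 + x2^2] over the four outcomes
total = sum(P[a, b] * (a * b + b**2) for a in (0, 1) for b in (0, 1))

# Iterated version: inner expectation over x1 | x2, outer over x2
p_x2 = P.sum(axis=0)                                            # marginal p(x2)
e_x1_given_x2 = (P * np.array([[0], [1]])).sum(axis=0) / p_x2   # E[x1 | x2]
iterated = sum(p_x2[b] * (e_x1_given_x2[b] * b + b**2) for b in (0, 1))
```

Both routes give the same value, as the law of iterated expectations guarantees.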