
Introduction to Deep Generative Modeling, Lecture #2

HY-673 – Computer Science Dep., University of Crete
Professors: Yannis Pantazis & Yannis Stylianou
TAs: Michail Raptakis & Michail Spanakis
What is probability?

• Frequentist Answer: the probability of an event is the limiting relative
frequency of that event over many independent repetitions of the experiment.

• Axiomatic Answer (Kolmogorov, 1933):

Let Ω be a sample space, F be an event space (e.g., a σ-algebra of Ω)
and P be a measure. If

1. Non-negativity: for all A ∈ F: P(A) ≥ 0.

2. Normalization: P(Ω) = 1.

3. σ-additivity: for pairwise disjoint events A1, A2, … ∈ F:
P(∪i Ai) = Σi P(Ai); in particular, for disjoint A, B: P(A ∪ B) = P(A) + P(B),

then (Ω, F, P) is a probability space.
Examples of event spaces

Fair die: Ω1 = {ω1, …, ω6} with P(ωi) = 1/6, i = 1, …, 6.

Fair coin: Ω2 = {ψ1, ψ2} with P(ψj) = 1/2, j = 1, 2.

Product spaces:
Ω = Ω1 × Ω2 (a die and a coin), or
Ω = Ω1 × Ω1 (two dice).
Basic properties

0. Unity: P(Ω) = 1 (see Venn diagram).

1. Independence*:
A, B independent events ⇒ P(A ∩ B) = P(A) · P(B).

2. Conditional Probability:
P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.

*Independence: “The occurrence of one event does not
affect the probability of occurrence of the other.”
Basic properties

3. Chain Rule:
P(A1 ∩ A2 ∩ ⋯ ∩ An) = P(A1) · P(A2|A1) ⋯ P(An|A1 ∩ ⋯ ∩ An−1).

4. Bayes’ Theorem:
If P(B) > 0, then P(A|B) = P(B|A) P(A) / P(B).
What is a Random Variable?

• A random variable is a (measurable) function X: Ω → ℝ that assigns a real
number X(ω) to each outcome ω ∈ Ω.

• We’ll use the notation: x(ω), X(ω), Y(ω), … or just x, X, Y, ….
Basic properties of r.v.s

• Cumulative Distribution Function (CDF), or Distribution Function:

FX(x) := P(X ≤ x) ≡ P({ω ∈ Ω : X(ω) ≤ x}).

• lim_{x→−∞} FX(x) = P(∅) = 0.

• lim_{x→+∞} FX(x) = P(Ω) = 1.
Basic properties of r.v.s

• Probability Density Function (PDF):

fX(x) := dFX(x)/dx, so that FX(x) = ∫_{−∞}^{x} fX(u) du.

Example of a CDF/PDF

• Mixed-type PDF:

[Figure: CDF (left) and PDF (right) of a mixed-type distribution on [0, 2];
the jumps in the CDF appear as impulses in the PDF.]

Dirac Delta Function (or impulse): δ(x) = 0 for x ≠ 0, with ∫ δ(x) dx = 1.
Moments of a r.v.

• Expected Value:

µX := E[X] = ∫ x fX(x) dx (“center of mass”)

empirical estimate: µX ≈ (1/n) Σ_{i=1}^{n} xi, with x1, …, xn i.i.d.* samples.

*i.i.d.: “Independent and identically distributed random variables.”


Moments of a r.v.

• Variance:

var(X) := EX[(X − µX)²] = EX[X²] − µX².

• Standard Deviation: σX := √var(X).
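The identity var(X) = E[X²] − µX² can be checked empirically. A minimal NumPy sketch (the distribution, its parameters, and the sample size are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# i.i.d. samples from N(3, 2^2); distribution and sample size are illustrative
x = rng.normal(loc=3.0, scale=2.0, size=100_000)

mu_hat = x.mean()                      # empirical mean, estimates E[X] = 3
var_hat = np.mean(x**2) - mu_hat**2    # E[X^2] - mu^2, the identity above; ~4
```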
Jensen’s Inequality

• Convexity: g(·) convex ⇐⇒ g(t x1 + (1 − t) x2) ≤ t g(x1) + (1 − t) g(x2)
for all t ∈ [0, 1].

The chord connecting (x1, g(x1)) and (x2, g(x2)) lies above g(x) for x ∈ [x1, x2].

• Jensen’s Inequality:
g(·) convex ⇒ g(E[X]) ≤ E[g(X)].
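A quick numeric sanity check of Jensen's inequality with the convex choice g(x) = x² (the distribution and sample size below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=50_000)  # any distribution works here

g = np.square              # g(x) = x^2 is convex
lhs = g(x.mean())          # g(E[X]), empirical
rhs = g(x).mean()          # E[g(X)], empirical
# Jensen: lhs <= rhs; for g(x) = x^2 the gap is exactly the variance of X
```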
Markov’s Inequality

For a non-negative r.v. X and any a > 0: P(X ≥ a) ≤ E[X] / a.

• Proof: E[X] = ∫_0^∞ x fX(x) dx ≥ ∫_a^∞ x fX(x) dx ≥ a ∫_a^∞ fX(x) dx = a P(X ≥ a).
Chebyshev’s Inequality

For any k > 0: P(|X − µ| ≥ kσ) ≤ 1/k².

Proof: Apply Markov’s inequality to Y = (X − µ)² with a = k²σ².

• E.g., for k = 2: P(|X − µ| ≥ 2σ) ≤ 1/4.

• When X is Gaussian, the bound is loose: P(|X − µ| ≥ 2σ) ≈ 5/100.
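The looseness of the bound for the Gaussian case can be seen numerically; a sketch with illustrative parameters (µ = 0, σ = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=200_000)  # mu = 0, sigma = 1 (illustrative)

k = 2.0
tail = np.mean(np.abs(x) >= k)  # empirical P(|X - mu| >= k*sigma)
# Chebyshev bound: 1/k^2 = 0.25; the Gaussian truth is ~0.0455, far below it
```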
Sampling an r.v. given FX(x)

• Probability Integral Transform:
If X has a continuous CDF FX, then U = FX(X) is distributed as Uniform(0, 1).

• Inverse version (Inverse Transform Sampling):
If U ∼ Uniform(0, 1), then X = FX⁻¹(U) has CDF FX.

Very important and popular example: sampling from a
categorical distribution or softmax.

• Algorithm:
1. Draw u ∼ Uniform(0, 1).
2. Return x = FX⁻¹(u); for a categorical distribution, return the smallest
index whose cumulative probability is at least u.
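The two steps above can be sketched for the categorical case; the probability vector below is an illustrative choice, and `np.searchsorted` on the cumulative sums implements "smallest index with cumulative probability ≥ u":

```python
import numpy as np

rng = np.random.default_rng(3)
probs = np.array([0.2, 0.5, 0.3])  # an illustrative categorical distribution
cdf = np.cumsum(probs)             # [0.2, 0.7, 1.0]

u = rng.uniform(size=100_000)      # step 1: u ~ Uniform(0, 1)
samples = np.searchsorted(cdf, u)  # step 2: smallest index with cdf[i] >= u

freq = np.bincount(samples, minlength=3) / len(samples)  # should approach probs
```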
Function of an r.v.

Let Y = g(X) with g invertible and differentiable. Then:

fY(y) = fX(g⁻¹(y)) · |d g⁻¹(y)/dy| = fX(g⁻¹(y)) · 1/|g′(g⁻¹(y))|.

Proof: differentiate FY(y) = P(g(X) ≤ y) with respect to y.

Example: Y = aX + b with a ≠ 0:

fY(y) = (1/|a|) fX((y − b)/a).
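The linear example can be verified by comparing the formula against a histogram estimate; the map parameters, evaluation point, and bin width below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 2.0, 1.0                        # illustrative linear map g(x) = a*x + b
x = rng.normal(0.0, 1.0, size=200_000)
y = a * x + b                          # Y = g(X) ~ N(b, a^2)

# Prediction from the formula: f_Y(y0) = (1/|a|) f_X((y0 - b)/a)
f_x = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)  # standard normal PDF
y0 = 1.0
pred = f_x((y0 - b) / a) / abs(a)

# Empirical density near y0 from a narrow histogram bin of half-width h
h = 0.05
emp = np.mean(np.abs(y - y0) < h) / (2 * h)
```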
Joint r.v.s

• Joint CDF of 2 random variables:
FXY(x, y) := P(X ≤ x, Y ≤ y).

Limits: FXY(−∞, y) = FXY(x, −∞) = 0, FXY(+∞, +∞) = 1.

PDF: fXY(x, y) = ∂²FXY(x, y) / ∂x∂y.

Independence

• Independence of X, Y:
fXY(x, y) = fX(x) fY(y) (equivalently, FXY(x, y) = FX(x) FY(y)).
Sum of two independent r.v.s

Let Z = X + Y with X, Y independent. Then:

FZ(z) = P(X + Y ≤ z)
      = P(X ≤ z − Y)
      = ∫_{−∞}^{+∞} P(X ≤ z − y) fY(y) dy
      = ∫_{−∞}^{+∞} FX(z − y) fY(y) dy,

so fZ(z) = ∫_{−∞}^{+∞} fX(z − y) fY(y) dy, the convolution fX * fY.

Examples: sum of 2 uniform distributions, sum of 2 dice, etc.
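For the first example, the convolution of two Uniform(0, 1) densities is the triangular density on [0, 2], which a simulation reproduces (sample size, evaluation point, and bin width are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300_000
z = rng.uniform(size=n) + rng.uniform(size=n)  # Z = X + Y, X, Y ~ Uniform(0, 1)

# The convolution of two Uniform(0, 1) densities is triangular on [0, 2]:
# f_Z(z) = z on [0, 1] and 2 - z on [1, 2]
z0, h = 0.5, 0.02
emp = np.mean(np.abs(z - z0) < h) / (2 * h)  # empirical density near z0
pred = z0                                    # triangular density at z0 = 0.5
```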
Conditional PDFs

• Conditional Probability: (Example: a die with even/odd events as conditions.)

fY|X(y|x) := fXY(x, y) / fX(x) = fXY(x, y) / ∫_{−∞}^{+∞} fXY(x, y) dy.

• Marginal Probability:

fY(y) = ∫_{−∞}^{+∞} fY|X(y|x) fX(x) dx.

Both are a consequence of: fXY(x, y) = fY|X(y|x) fX(x).

• Bayes’ Theorem:

fX|Y(x|y) = fY|X(y|x) fX(x) / fY(y), i.e., fX|Y(x|y) fY(y) = fY|X(y|x) fX(x).
Multivariate Gaussian Distribution

• Notation:
x = (x1, …, xd)ᵀ, x ∼ N(µ, Σ).

• PDF General Form:

N(x|µ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ).
Multivariate Gaussian Distribution

• Mean Vector: µ = E[x].

• Covariance Matrix: Σ = E[(x − µ)(x − µ)ᵀ].
Multivariate Gaussian Distribution

• Marginals (2D example):

Σ = [ σ1²  σ12 ]
    [ σ21  σ2² ]

p(x1) = fX1(x1) = ∫ fX(x|µ, Σ) dx2 = N(x1|µ1, σ1²).

p(x2) = fX2(x2) = … = N(x2|µ2, σ2²).
Multivariate Gaussian Distribution

• Geometric interpretation: the level sets of N(x|µ, Σ) are ellipsoids centered
at µ; their axes point along the eigenvectors of Σ and their lengths scale with
the square roots of the corresponding eigenvalues, while the correlation
coefficient controls the orientation.

ui: eigenvectors of Σ
λi: eigenvalues of Σ
Multivariate Gaussian Distribution

• How to sample (reparametrization trick):

x = µ + σz ∼ N(µ, σ²), z ∼ N(0, 1)

x = µ + Lz ∼ N(µ, Σ), z ∼ N(0, Id), with L the Cholesky factor, LLᵀ = Σ
(Cholesky decomposition).

Sampling from 1d normal:

import numpy as np
mu, sigma = 0, 0.1  # mean and standard deviation
x = np.random.normal(mu, sigma, 1000)
x.shape   # → (1000,)
type(x)   # → <class 'numpy.ndarray'>

Sampling from multivariate normal:

mu = [1, 2]
Sigma = [[1, 2], [2, 4]]
x = np.random.multivariate_normal(mu, Sigma, 1000)
x.shape   # → (1000, 2)
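The x = µ + Lz construction itself can be sketched explicitly with NumPy's Cholesky routine; the covariance matrix below is an illustrative positive-definite choice (note the row-wise z, so each row is µ + Lz):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])         # illustrative positive-definite covariance

L = np.linalg.cholesky(Sigma)          # lower-triangular L with L @ L.T == Sigma
z = rng.standard_normal((100_000, 2))  # rows of z ~ N(0, I_2)
x = mu + z @ L.T                       # reparametrization: each row is mu + L z

emp_mu = x.mean(axis=0)                # should approach mu
emp_Sigma = np.cov(x, rowvar=False)    # should approach Sigma
```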
Multivariate Gaussian Distribution

• Conditional probability:

p(xA|xB) = N(xA | µA + ΣAB ΣBB⁻¹ (xB − µB), ΣAA − ΣAB ΣBB⁻¹ ΣBA)

p(xB|xA) = N(xB | µB + ΣBA ΣAA⁻¹ (xA − µA), ΣBB − ΣBA ΣAA⁻¹ ΣAB)

The conditional covariance is the Schur complement of ΣBB (resp. ΣAA) in Σ.
Multivariate Gaussian Distribution

• Conditional probability example:

Σ = [ σ1²  σ12 ]
    [ σ21  σ2² ]

p(x2|x1) = N(x2 | µ2 + σ21 σ1⁻² (x1 − µ1), σ2² − σ21² σ1⁻²)

− Variance (a.k.a. uncertainty) is reduced whenever there is correlation!
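A tiny deterministic check of the 2D conditional-variance formula, with illustrative numbers (not from the slides), showing it equals the Schur complement and is strictly smaller than the unconditional variance whenever σ21 ≠ 0:

```python
import numpy as np

# Illustrative numbers for the 2D conditional-variance formula
sigma1_sq, sigma2_sq, sigma21 = 2.0, 1.0, 0.8

cond_var = sigma2_sq - sigma21**2 / sigma1_sq  # sigma_2^2 - sigma_21^2/sigma_1^2

# The same quantity as the Schur complement of sigma_1^2 in the full matrix
Sigma = np.array([[sigma1_sq, sigma21],
                  [sigma21,   sigma2_sq]])
schur = Sigma[1, 1] - Sigma[1, 0] * Sigma[0, 1] / Sigma[0, 0]
# cond_var == schur < sigma2_sq: correlation reduces the conditional variance
```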
Asymptotics

• Law of Large Numbers:

Let {Xk} be a sequence of i.i.d. r.v.s and Sn = Σ_{k=1}^{n} Xk be the sum r.v.
If µ = E[Xk] exists, then ∀ϵ > 0:

lim_{n→∞} P(|Sn/n − µ| > ϵ) = 0.
Asymptotics

• Central Limit Theorem:

Let {Xk} be a sequence of i.i.d. r.v.s and Sn = Σ_{k=1}^{n} Xk be the sum r.v.
Suppose that µ = E[X] and σ² = var(X) exist. Then, ∀β fixed:

lim_{n→∞} P( (Sn − nµ) / (σ√n) ≤ β ) = Φ(β),

where Φ is the standard normal CDF.

• In other words: for large n, Sn is approximately N(nµ, nσ²), i.e.,
Sn/n is approximately N(µ, σ²/n).
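The CLT can be seen numerically by standardizing sums of uniform draws; the sample sizes and the Uniform(0, 1) base distribution below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 20_000
# Sums of n i.i.d. Uniform(0, 1) draws: mu = 0.5, sigma^2 = 1/12
S = rng.uniform(size=(reps, n)).sum(axis=1)

# Standardize: (S_n - n*mu) / (sigma * sqrt(n)) should be close to N(0, 1)
z = (S - n * 0.5) / (np.sqrt(1 / 12) * np.sqrt(n))

emp = np.mean(z <= 1.0)  # the standard normal gives Phi(1) ~ 0.8413
```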
References

1. All of Statistics: A Concise Course in Statistical Inference (Chapters 1–4),
Larry Wasserman, Springer (2004).

2. Probabilistic Machine Learning: An Introduction (Chapters 2–3),
Kevin P. Murphy, The MIT Press (2022).