
Introduction to Deep Generative Modeling Lecture #3

HY-673 – Computer Science Department, University of Crete
Professors: Yannis Pantazis & Yannis Stylianou
TAs: Michail Raptakis & Michail Spanakis
Taxonomy of Deep Generative Models Lecture #3
According to the Likelihood Function

Generative models (GMs) fall into three groups, according to how they handle the likelihood function:

• Exact likelihood
  – Autoregressive models (ARMs): (R)NADE, WaveNet, WaveRNN, GPT, …
  – Normalizing flows (NFs): planar, coupling, MAFs/IAFs, …
• Approximate likelihood
  – Variational autoencoders (VAEs): vanilla, β-VAE, VQ-VAE, …
  – Energy-based models (EBMs): belief nets, Boltzmann machines, …
  – Diffusion probabilistic models (DPMs): diffusion, denoising, score-based, …
• Implicit likelihood
  – Generative adversarial networks (GANs): vanilla, WGAN, f-GAN, (f, Γ)-GAN
  – GGFs: KALE, Lipschitz-regularized, …

Introduction to Estimator Theory Lecture #3

Let D = {x1, ..., xn} be a set of data drawn from pd(x), and let pθ(x) be a
family of models with θ ∈ Θ. A point estimator θ̂ = θ̂(D) is a random variable
(it is a function of the random sample D) for which we want:

pθ̂(x) ≈ pd(x)
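As a minimal numerical sketch (the Gaussian family, sample size, and seed are assumptions, not from the slides), the sample mean acts as a point estimator θ̂(D) of a Gaussian mean, and redrawing D shows that θ̂ is itself a random variable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: p_d(x) = N(mu_star, 1); the model family is N(theta, 1).
mu_star = 2.0
D = rng.normal(loc=mu_star, scale=1.0, size=100)  # dataset D = {x_1, ..., x_n}

# A point estimator is any function of the data; here, the sample mean.
theta_hat = D.mean()
print(f"theta_hat(D)  = {theta_hat:.3f}  (true parameter = {mu_star})")

# theta_hat is a random variable: redrawing D gives a different estimate.
D2 = rng.normal(loc=mu_star, scale=1.0, size=100)
print(f"theta_hat(D') = {D2.mean():.3f}")
```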


Introduction to Estimator Theory Lecture #3

• How to construct an estimator?


– Maximum Likelihood Estimation (MLE)
– Maximum A Posteriori (MAP) Estimation
– Based on a probability distance or a divergence (implicit)
– Bayesian inference (learns a distribution over the estimator's parameters)
Maximum Likelihood Estimator Lecture #3

The maximum likelihood estimator picks the parameter value under which the
observed data are most probable:

θ̂MLE = arg maxθ Ln(θ; D), where Ln(θ; D) = (1/n) Σi log pθ(xi)

is the (average) log-likelihood of the data under the model.

− Ln(θ̂1) > Ln(θ̂2) implies that θ̂1 is more likely than θ̂2 to have generated
the observed samples x1, ..., xn.

− Thus, the likelihood provides a ranking of the models' fit to the data.
MLE Example #1 Lecture #3

At the maximum, the first-order optimality condition holds:

(d/dθ) L(θ; D) |θ=θ̂ = 0
MLE Example #2 Lecture #3
MLE Example #3 Lecture #3

Partial derivative or gradient vector:

∇θ L(θ) = ( ∂L/∂θ1, ..., ∂L/∂θd )⊤
MLE Example #3 Lecture #3

Maximizing L(θ) is equivalent to minimizing the Sum of Squares (Least
Squares), so the MLE gives exactly the same solution as LS!
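A quick numerical check of this equivalence (the synthetic data and dimensions are assumptions): maximizing the Gaussian log-likelihood with a generic optimizer and solving the least-squares problem return the same θ̂.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Linear-Gaussian model: y = X @ theta_star + eps, eps ~ N(0, 1).
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star + rng.normal(size=n)

# Least-squares solution via the normal equations.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE: maximizing the Gaussian log-likelihood = minimizing the sum of squares.
neg_log_lik = lambda th: 0.5 * np.sum((y - X @ th) ** 2)
theta_mle = minimize(neg_log_lik, x0=np.zeros(d)).x

print(np.allclose(theta_ls, theta_mle, atol=1e-4))  # True: identical solutions
```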


MLE Example #4 Lecture #3

• Logistic regression with a sigmoid, a.k.a. binary classification.

Dataset: D = {(x1, y1), ..., (xn, yn)} with xi ∈ Rd and yi ∈ {0, 1}.
Model family: pθ(yi = 1|xi) = σ(θ⊤xi), pθ(yi = 0|xi) = 1 − pθ(yi = 1|xi),
with θ ∈ Rd and σ(z) = 1/(1 + e−z) the sigmoid function.
MLE Example #4 Lecture #3

There is no closed-form maximizer, so the MLE is computed by gradient ascent:
θ ← θ + η ∇θ L(θ), where η > 0 is the learning rate.
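A minimal sketch of this gradient-ascent recipe for logistic regression (the synthetic data, step size η, and iteration count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary labels drawn from the model itself.
n, d = 500, 2
X = rng.normal(size=(n, d))
theta_star = np.array([2.0, -1.0])
y = rng.binomial(1, sigmoid(X @ theta_star))

# Gradient ascent on the average log-likelihood:
# grad L(theta) = (1/n) * X^T (y - sigmoid(X theta)).
theta, eta = np.zeros(d), 1.0  # eta: learning rate (assumed value)
for _ in range(2000):
    theta += eta * X.T @ (y - sigmoid(X @ theta)) / n

print(theta)  # approaches theta_star as n grows
```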
Maximum Likelihood Estimator Lecture #3
Kullback-Leibler Divergence (KLD) Lecture #3

• Geometric interpretation:
MLE is equivalent to minimizing the KLD of pd (x) w.r.t. pθ (x).
Maximum Likelihood Estimator Lecture #3

where the cross entropy of a probability P with PDF p(x) with respect to a
probability Q with PDF q(x) is defined as

H×(P||Q) := − ∫ log q(x) p(x) dx
Kullback-Leibler Divergence Lecture #3

• MLE is also equivalent to minimizing the KLD of pd(x) w.r.t. pθ(x):

arg maxθ L(θ; pd) = arg minθ DKL(pd || pθ)

• The Kullback-Leibler divergence (KLD) of P w.r.t. Q is defined as:

DKL(P||Q) := ∫ log( p(x)/q(x) ) p(x) dx
           = ∫ log p(x) p(x) dx − ∫ log q(x) p(x) dx

so that DKL(P||Q) = −H(P) + H×(P||Q), i.e., negative Entropy plus Cross Entropy.


Kullback-Leibler Divergence Lecture #3

• DKL(P||Q) ≥ 0, with equality if and only if P = Q.

This follows from Jensen's inequality (−log is convex):
DKL(P||Q) = EP[−log(q(x)/p(x))] ≥ −log EP[q(x)/p(x)] = −log ∫ q(x) dx = −log 1 = 0.
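As an illustration (the Gaussian pair and sample size are assumptions), the KLD between two univariate Gaussians can be computed in closed form and checked with a Monte Carlo average of log p(x) − log q(x) under P:

```python
import numpy as np

rng = np.random.default_rng(4)

# KLD between univariate Gaussians P = N(mu1, s1^2) and Q = N(mu2, s2^2).
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0

# Closed form: log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 s2^2) - 1/2.
kl_exact = np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

# Monte Carlo: DKL(P||Q) = E_P[log p(x) - log q(x)], x ~ P.
# The common -0.5*log(2*pi) term cancels in the difference and is omitted.
x = rng.normal(mu1, s1, size=200_000)
log_p = -0.5 * ((x - mu1) / s1) ** 2 - np.log(s1)
log_q = -0.5 * ((x - mu2) / s2) ** 2 - np.log(s2)
print(kl_exact, np.mean(log_p - log_q))  # the two estimates agree closely
```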
Maximum A Posteriori Estimator Lecture #3

The MAP estimator maximizes the posterior of the parameters given the data:

θ̂MAP = arg maxθ p(θ|D) = arg maxθ p(D|θ) p(θ)

(by Bayes' rule; the evidence p(D) does not depend on θ).
Maximum A Posteriori Estimator Lecture #3

• Linear model: D = {(x1, y1), ..., (xn, yn)}, xi ∈ Rd, yi ∈ R, model:

yi = θ⊤xi + ϵi, ϵi ∼ N(0, 1)

− p(θ) = N(0, λ−1 Id) ⇒ ridge regression, a.k.a. (Tikhonov-)regularized
Least Squares.
− p(θ) = Laplace(0, λ−1) ⇒ lasso regression (least absolute shrinkage and
selection operator).
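A sketch of the Gaussian-prior case (the synthetic data and the value of λ are assumptions): the MAP estimate solves the ridge-regression normal equations and shrinks θ̂ relative to the MLE.

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear-Gaussian data; lam plays the role of the prior precision lambda.
n, d, lam = 100, 5, 10.0
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + rng.normal(size=n)

# MAP with prior N(0, lam^-1 I_d): maximize log p(D|theta) + log p(theta),
# i.e. minimize 0.5*||y - X theta||^2 + 0.5*lam*||theta||^2  (ridge).
theta_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The unregularized MLE / least-squares solution, for comparison.
theta_mle, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.linalg.norm(theta_map), np.linalg.norm(theta_mle))  # MAP is shrunk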
Estimator Assessment Lecture #3

• Basic toolkit to assess an estimator: bias, variance (standard error),
mean squared error (MSE), consistency, and efficiency.


Estimator Assessment Lecture #3

Chebyshev's inequality, P(|θ̂ − E[θ̂]| ≥ ε) ≤ Var(θ̂)/ε², shows that an
unbiased estimator whose variance vanishes as n → ∞ is consistent.
Estimator Assessment Lecture #3

• Let θ̂1 and θ̂2 be two unbiased estimators of θ∗ . θ̂1 is more efficient than
θ̂2 if and only if Var(θ̂1 ) < Var(θ̂2 ).
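A simulation sketch of this comparison (the Gaussian setup is an assumption): for estimating a Gaussian mean, the sample mean and the sample median are both unbiased, but the mean has the smaller variance and is therefore the more efficient estimator.

```python
import numpy as np

rng = np.random.default_rng(6)

# Two unbiased estimators of a Gaussian mean: sample mean vs. sample median
# (both unbiased by symmetry). Estimate their variances by simulation.
theta_star, n, trials = 0.0, 100, 10_000
samples = rng.normal(theta_star, 1.0, size=(trials, n))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print("bias:", means.mean(), medians.mean())  # both approximately 0
print("var :", means.var(), medians.var())
# Var(mean) ~ 1/n < Var(median) ~ pi/(2n): the sample mean is more efficient.
```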
References Lecture #3

1. Larry Wasserman, "All of Statistics: A Concise Course in Statistical
Inference" (Chapters 6 & 9), Springer (2004).

3. Matrix calculus:
https://en.wikipedia.org/wiki/Matrix_calculus
https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
