Lecture 3
Generative Models for Real-World Systems.
Bayesian Concept
Dmytro Progonov,
PhD, Associate Professor
Content
• Bayesian concept learning;
• Prior and posterior processing;
• Naïve Bayes classifier;
• Bayesian approach applications.
Observer’s false guessing
The posterior over hypotheses is obtained by restricting the prior to the hypotheses consistent with the data:

p(h|\mathcal{D}) = \frac{\mathbb{I}(\mathcal{D} \in h)\, p(h)}{\sum_{h' \in \mathcal{H}} \mathbb{I}(\mathcal{D} \in h')\, p(h')},

where 𝕀(𝒟 ∈ h) is 1 if and only if all the data are in the extension of the hypothesis h.
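A minimal sketch of this computation, assuming a toy hypothesis space of number concepts and a uniform prior (both are illustrative choices, not taken from the lecture):

# Bayesian concept learning with an indicator likelihood: p(h|D) ∝ I(D in h) * p(h).
# The hypothesis space, extensions, and prior below are illustrative assumptions.
hypotheses = {
    "even numbers":    {n for n in range(1, 101) if n % 2 == 0},
    "odd numbers":     {n for n in range(1, 101) if n % 2 == 1},
    "multiples of 10": set(range(10, 101, 10)),
    "powers of two":   {2, 4, 8, 16, 32, 64},
}
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}   # uniform p(h)

def posterior(data, hypotheses, prior):
    """p(h|D): zero out hypotheses inconsistent with the data, then renormalize."""
    unnorm = {h: (prior[h] if set(data) <= ext else 0.0) for h, ext in hypotheses.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

print(posterior([10, 20, 60], hypotheses, prior))
# Only "even numbers" and "multiples of 10" remain consistent and share the posterior mass.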
Prior and posterior processing (2/4)
If we have enough data, the posterior p(h|𝒟) becomes peaked around the Maximum A Posteriori (MAP) estimate:

p(h|\mathcal{D}) \to \delta_{\hat{h}_{MAP}}(h).

As we get more and more data, the MAP estimate converges towards the Maximum Likelihood Estimate (MLE):

\hat{h}_{MLE} = \arg\max_h p(\mathcal{D}|h) = \arg\max_h \log p(\mathcal{D}|h).
When we have a small and/or ambiguous dataset, the posterior p(h|𝒟) is vague, which
induces a broad predictive distribution. However, once we have “figured things out”, the
posterior becomes a delta function centered at the MAP estimate. In this case we can use the
plug-in approximation:

p(\tilde{x}|\mathcal{D}) = \sum_h p(\tilde{x}|h)\, \delta_{\hat{h}_{MAP}}(h) = p(\tilde{x}|\hat{h}_{MAP}).
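The sketch below illustrates both effects on a discrete grid of coin-bias hypotheses; the grid, the true bias, and the sample sizes are illustrative assumptions:

import numpy as np

# Posterior over a grid of hypotheses h = theta for a coin, under a uniform prior.
rng = np.random.default_rng(0)
thetas = np.linspace(0.05, 0.95, 19)
prior = np.full(len(thetas), 1.0 / len(thetas))
true_theta = 0.7                                   # illustrative "true" coin bias

for n in (5, 50, 500):
    x = rng.random(n) < true_theta                 # n coin flips
    n1 = int(x.sum())
    log_post = n1 * np.log(thetas) + (n - n1) * np.log(1 - thetas) + np.log(prior)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    theta_map = thetas[np.argmax(post)]            # MAP hypothesis
    bayes_pred = float(np.sum(post * thetas))      # full posterior predictive p(x=1|D)
    plugin_pred = theta_map                        # plug-in approximation p(x=1|h_MAP)
    print(f"n={n:4d}  MAP={theta_map:.2f}  Bayes={bayes_pred:.3f}  plug-in={plugin_pred:.3f}")

With little data the posterior is spread out and the two predictions differ; as n grows the posterior collapses onto the MAP hypothesis and the plug-in approximation becomes accurate.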
A common way to approximate the log marginal likelihood log p(𝒟) is the Bayesian Information Criterion (BIC):

\mathrm{BIC} \triangleq \log p(\mathcal{D}|\hat{\boldsymbol{\theta}}) - \frac{\mathrm{dof}(\hat{\boldsymbol{\theta}})}{2} \log N \approx \log p(\mathcal{D}),

where θ̂ is the maximum likelihood estimate of the model parameters and dof(θ̂) is the
number of degrees of freedom in the model.
The BIC method is very closely related to the Minimum Description Length or MDL
principle, which characterizes the score of a model in terms of how well it fits the data,
minus how complex the model is to define.
A very similar expression to BIC/MDL is the Akaike Information Criterion (AIC):

\mathrm{AIC}(m, \mathcal{D}) \triangleq \log p(\mathcal{D}|\hat{\boldsymbol{\theta}}_{MLE}) - \mathrm{dof}(m).
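As an illustration, both scores can be computed for a simple Bernoulli (coin) model fit by maximum likelihood; the data below are made up for the example, and the sign convention follows the definitions above (higher is better):

import numpy as np

def bernoulli_scores(x):
    """Log-likelihood, BIC and AIC for a single-parameter Bernoulli model."""
    x = np.asarray(x)
    n, n1 = len(x), int(x.sum())
    theta_mle = n1 / n                         # MLE of the heads probability
    eps = 1e-12                                # guard against log(0) at the boundary
    loglik = n1 * np.log(theta_mle + eps) + (n - n1) * np.log(1 - theta_mle + eps)
    dof = 1                                    # one free parameter, theta
    bic = loglik - 0.5 * dof * np.log(n)
    aic = loglik - dof
    return loglik, bic, aic

x = np.array([1] * 17 + [0] * 3)               # illustrative data: 17 heads, 3 tails
print(bernoulli_scores(x))

Comparing such scores across candidate models (for example, a fixed fair-coin model with dof = 0 against the free-θ model) selects the model that best trades off fit against complexity.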
Suppose we toss a coin with heads probability θ a total of N times. The likelihood of the observed sequence is

p(\mathcal{D}|\theta) = \theta^{N_1} (1 - \theta)^{N_0},

where N_1 = Σ_{i=1}^N 𝕀(x_i = 1) is the number of heads, N_0 = Σ_{i=1}^N 𝕀(x_i = 0) is the number of
tails, and N = N_0 + N_1 is the total number of observed trials. In this case we have N_1 ~ Bin(N, θ), which has the following pmf:
\mathrm{Bin}(k|n, \theta) = \binom{n}{k} \theta^{k} (1 - \theta)^{n - k}.

Since the binomial coefficient \binom{n}{k} is a constant independent of θ, the likelihood of the
binomial sampling model is the same as the likelihood for the Bernoulli model: any
inference we make about θ will be the same whether we observe the counts 𝒟 = (N_1, N) or the
sequence of trials 𝒟 = {x_1, ⋯, x_N}.
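A quick numerical check of this equivalence (the counts are illustrative): the two log-likelihoods differ only by the constant log C(N, N_1), so they peak at the same θ.

import numpy as np
from math import comb, log

N1, N0 = 7, 3                                   # illustrative counts
N = N1 + N0
thetas = np.linspace(0.1, 0.9, 81)

loglik_bernoulli = N1 * np.log(thetas) + N0 * np.log(1 - thetas)
loglik_binomial = log(comb(N, N1)) + loglik_bernoulli   # adds a theta-independent constant

print(np.allclose(loglik_binomial - loglik_bernoulli, log(comb(N, N1))))         # True
print(thetas[np.argmax(loglik_bernoulli)], thetas[np.argmax(loglik_binomial)])   # same argmax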
To make the posterior easy to compute, we need a prior with the same functional form as the likelihood,

p(\theta) \propto \theta^{\gamma_1} (1 - \theta)^{\gamma_0},

for some prior parameters γ_1 and γ_0. Then we can easily evaluate the posterior by simply
adding the exponents:

p(\theta|\mathcal{D}) \propto p(\mathcal{D}|\theta)\, p(\theta) = \theta^{N_1} (1 - \theta)^{N_0}\, \theta^{\gamma_1} (1 - \theta)^{\gamma_0} = \theta^{N_1 + \gamma_1} (1 - \theta)^{N_0 + \gamma_0}.
When the prior and the posterior have the same form, we say that the prior is a conjugate
prior for the corresponding likelihood. In the case of the Bernoulli, the conjugate prior is the
beta distribution:

\mathrm{Beta}(\theta|a, b) \propto \theta^{a - 1} (1 - \theta)^{b - 1}.

The posterior is then

p(\theta|\mathcal{D}) \propto \mathrm{Bin}(N_1|\theta, N_0 + N_1)\, \mathrm{Beta}(\theta|a, b) \propto \mathrm{Beta}(\theta|N_1 + a, N_0 + b).
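A minimal sketch of this conjugate update using scipy; the hyperparameters and counts below are illustrative:

from scipy.stats import beta

a, b = 2, 2                                # illustrative Beta(a, b) prior
N1, N0 = 7, 3                              # illustrative data: 7 heads, 3 tails

posterior = beta(a + N1, b + N0)           # Beta(theta | N1 + a, N0 + b)
print("posterior mean:", posterior.mean())                           # (N1 + a) / (N + a + b)
print("posterior mode (MAP):", (N1 + a - 1) / (N1 + N0 + a + b - 2))
print("95% credible interval:", posterior.interval(0.95))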
• Zero count (sparse data) problem – occurs when estimating counts from a small amount of data (illustrated in the sketch below);
• Black swan paradox – the problem of how to draw general conclusions about the future from specific observations of the past.
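A short illustration of the zero-count problem and how the beta prior avoids it (the numbers are made up): with no heads observed, the MLE says heads are impossible, while the posterior mean under a uniform Beta(1, 1) prior (add-one smoothing) does not.

N1, N0 = 0, 3                              # illustrative data: 3 tails, no heads yet
a, b = 1, 1                                # uniform Beta(1, 1) prior

theta_mle = N1 / (N1 + N0)                         # 0.0 -> "black swan": heads deemed impossible
theta_post_mean = (N1 + a) / (N1 + N0 + a + b)     # 0.2 -> heads remain possible
print(theta_mle, theta_post_mean)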
Suppose now we were interested in predicting the number of heads, 𝑥, in 𝑀 future trials:
p(x|\mathcal{D}, M) = \int_0^1 \mathrm{Bin}(x|\theta, M)\, \mathrm{Beta}(\theta|a, b)\, d\theta
= \binom{M}{x} \frac{1}{B(a, b)} \int_0^1 \theta^{x} (1 - \theta)^{M - x}\, \theta^{a - 1} (1 - \theta)^{b - 1}\, d\theta
= \binom{M}{x} \frac{B(x + a, M - x + b)}{B(a, b)}.
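This integral is the (compound) beta-binomial distribution. A small sketch evaluating it directly with the log-Beta function from the standard library; the posterior parameters a, b and the horizon M below are illustrative:

from math import comb, lgamma, exp

def log_beta(a, b):
    """log B(a, b) computed via log-gamma functions."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_pmf(x, M, a, b):
    """p(x | D, M) = C(M, x) * B(x + a, M - x + b) / B(a, b)."""
    return comb(M, x) * exp(log_beta(x + a, M - x + b) - log_beta(a, b))

a, b, M = 9.0, 5.0, 10                     # illustrative posterior Beta(a, b), 10 future flips
pmf = [beta_binom_pmf(x, M, a, b) for x in range(M + 1)]
print(sum(pmf))                                   # ≈ 1.0: a valid distribution over x = 0..M
print(max(range(M + 1), key=lambda x: pmf[x]))    # most probable number of future heads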
The naïve Bayes classifier assumes that the features are conditionally independent given the class label:

p(\mathbf{x}|y = c, \boldsymbol{\theta}) = \prod_{j=1}^{D} p(x_j|y = c, \boldsymbol{\theta}_{jc}).

The model is called “naïve” since we do not expect the features to really be independent, even
conditional on the class label. One reason for the successful application of the naïve Bayes classifier
is that the model is quite simple, and hence it is relatively immune to overfitting.
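A minimal Bernoulli naïve Bayes sketch in numpy, with add-one smoothing of the per-feature counts; the tiny binary dataset is invented for the example and the code is a sketch rather than a production implementation:

import numpy as np

X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])                  # illustrative N x D binary features
y = np.array([0, 0, 1, 1])                 # illustrative class labels

classes = np.unique(y)
log_prior = np.log(np.array([(y == c).mean() for c in classes]))        # log p(y = c)
# theta[c, j] = p(x_j = 1 | y = c), with add-one smoothing to avoid zero counts
theta = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes])

def predict(x):
    # log p(y = c | x) ∝ log p(y = c) + sum_j log p(x_j | y = c)
    log_lik = (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(axis=1)
    return classes[np.argmax(log_prior + log_lik)]

print(predict(np.array([1, 0, 1])))        # favours class 0 on this toy data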
Empirical Bayes violates the principle that the prior should be chosen independently of the
data:

Method                      Definition
Maximum likelihood          θ̂ = argmax_θ p(𝒟|θ)
MAP estimation              θ̂ = argmax_θ p(𝒟|θ) p(θ|η)
ML-II (Empirical Bayes)     η̂ = argmax_η ∫ p(𝒟|θ) p(θ|η) dθ
MAP-II                      η̂ = argmax_η ∫ p(𝒟|θ) p(θ|η) p(η) dθ
Full Bayes                  p(θ, η|𝒟) ∝ p(𝒟|θ) p(θ|η) p(η)
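A hedged sketch of the ML-II (Empirical Bayes) idea for a beta-binomial model: the hyperparameters (a, b) are chosen by maximizing the marginal likelihood of several groups of counts, here by a simple grid search; the groups and the grid are illustrative assumptions.

import numpy as np
from math import comb, lgamma, log

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(groups, a, b):
    """log p(D | a, b) = sum_i [ log C(N_i, k_i) + log B(k_i + a, N_i - k_i + b) - log B(a, b) ]."""
    return sum(log(comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b)
               for k, n in groups)

groups = [(3, 10), (5, 10), (7, 10), (4, 10)]        # illustrative (heads, trials) per group

grid = np.linspace(0.5, 20.0, 40)
a_hat, b_hat = max(((a, b) for a in grid for b in grid),
                   key=lambda ab: log_marginal(groups, *ab))
print(a_hat, b_hat)                                  # hyperparameters estimated from the data

The prior Beta(â, b̂) is thus itself estimated from the data, which is exactly why Empirical Bayes violates the principle stated above.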