
CHAPTER V. APPENDIX

D185 tcon t-contrast for contrast-based inference in multiple linear regression JoramSoch 2022-12-16 475
D186 fcon F-contrast for contrast-based inference in multiple linear regression JoramSoch 2022-12-16 475
D187 exg ex-Gaussian distribution tomfaulkenberry 2023-04-18 274
D188 skew Skewness tomfaulkenberry 2023-04-20 61
D189 bvn Bivariate normal distribution JoramSoch 2023-09-22 287
D190 skew-samp Sample skewness tomfaulkenberry 2023-10-30 61
D191 map Maximum-a-posteriori estimation JoramSoch 2023-12-01 126
D192 sr Scoring rule KarahanS 2024-02-28 135
D193 psr Proper scoring rule KarahanS 2024-02-28 135
D194 spsr Strictly proper scoring rule KarahanS 2024-02-28 135
D195 lpsr Log probability scoring rule KarahanS 2024-02-28 136
D196 fstat F-statistic JoramSoch 2024-03-15 610
D197 bsr Brier scoring rule KarahanS 2024-03-23 139
D198 lr Likelihood ratio JoramSoch 2024-06-14 117
D199 llr Log-likelihood ratio JoramSoch 2024-06-14 117
CHAPTER III. STATISTICAL MODELS

\mathrm{LBF}_{12} = \frac{1}{2} \log \frac{|P_1|}{|P_2|} + \frac{1}{2} \log \frac{|\Lambda_0^{(1)}|}{|\Lambda_0^{(2)}|} - \frac{1}{2} \log \frac{|\Lambda_n^{(1)}|}{|\Lambda_n^{(2)}|}
                  + \log \frac{\Gamma(a_n^{(1)})}{\Gamma(a_0^{(1)})} + a_0^{(1)} \log b_0^{(1)} - a_n^{(1)} \log b_n^{(1)}
                  - \log \frac{\Gamma(a_n^{(2)})}{\Gamma(a_0^{(2)})} - a_0^{(2)} \log b_0^{(2)} + a_n^{(2)} \log b_n^{(2)} .    (9)

1.7 Bayesian linear regression with known covariance


1.7.1 Conjugate prior distribution
Theorem: Let

y = Xβ + ε, ε ∼ N (0, Σ) (1)
be a linear regression model (→ III/1.5.1) with measured n × 1 data vector y, known n × p design
matrix X and known n × n covariance matrix Σ as well as unknown p × 1 regression coefficients β.
Then, the conjugate prior (→ I/5.2.5) for this model is a multivariate normal distribution (→ II/4.1.1)

p(β) = N (β; µ0 , Σ0 ) . (2)

Proof: By definition, a conjugate prior (→ I/5.2.5) is a prior distribution (→ I/5.1.3) that, when
combined with the likelihood function (→ I/5.1.2), leads to a posterior distribution (→ I/5.1.7) that
belongs to the same family of probability distributions (→ I/1.5.1). This is fulfilled when the prior
density and the likelihood function are proportional in the model parameters in the same way, i.e.
when the model parameters appear in the same functional form in both.
Equation (1) implies the following likelihood function (→ I/5.1.2):
p(y|\beta) = \mathcal{N}(y; X\beta, \Sigma) = \sqrt{\frac{1}{(2\pi)^n |\Sigma|}} \exp\!\left[ -\frac{1}{2} (y - X\beta)^T \Sigma^{-1} (y - X\beta) \right] .    (3)
Expanding the product in the exponent, we have:
p(y|\beta) = \sqrt{\frac{1}{(2\pi)^n |\Sigma|}} \cdot \exp\!\left[ -\frac{1}{2} \left( y^T \Sigma^{-1} y - y^T \Sigma^{-1} X\beta - \beta^T X^T \Sigma^{-1} y + \beta^T X^T \Sigma^{-1} X\beta \right) \right] .    (4)
Completing the square over β, one obtains
p(y|\beta) = \sqrt{\frac{1}{(2\pi)^n |\Sigma|}} \cdot \exp\!\left[ -\frac{1}{2} \left( (\beta - \tilde{X}y)^T X^T \Sigma^{-1} X (\beta - \tilde{X}y) - y^T Q y + y^T \Sigma^{-1} y \right) \right]    (5)

where \tilde{X} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} and Q = \tilde{X}^T (X^T \Sigma^{-1} X) \tilde{X}.
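A minimal numerical sketch of this conjugacy in Python: under the Gaussian prior (2), the posterior over β is again Gaussian, with parameters Σ_n = (X^T Σ^-1 X + Σ_0^-1)^-1 and µ_n = Σ_n (X^T Σ^-1 y + Σ_0^-1 µ_0), the result referenced above as (→ III/1.7.2) and not restated in this excerpt. The function name posterior_params and the toy numbers are illustrative only.

import numpy as np

def posterior_params(y, X, Sigma, mu0, Sigma0):
    """Gaussian posterior N(beta; mu_n, Sigma_n) for y = X beta + eps, eps ~ N(0, Sigma),
    under the conjugate prior N(beta; mu0, Sigma0)."""
    Sigma_inv = np.linalg.inv(Sigma)
    Sigma0_inv = np.linalg.inv(Sigma0)
    # posterior precision = data precision + prior precision
    Lambda_n = X.T @ Sigma_inv @ X + Sigma0_inv
    Sigma_n = np.linalg.inv(Lambda_n)
    # posterior mean weighs the data term against the prior term
    mu_n = Sigma_n @ (X.T @ Sigma_inv @ y + Sigma0_inv @ mu0)
    return mu_n, Sigma_n

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
Sigma = 0.5 * np.eye(n)                      # known n x n covariance
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), Sigma)
mu0, Sigma0 = np.zeros(p), 10.0 * np.eye(p)  # weakly informative Gaussian prior
mu_n, Sigma_n = posterior_params(y, X, Sigma, mu0, Sigma0)
print(mu_n)                                  # should lie close to beta_true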
1. UNIVARIATE NORMAL DATA

With the posterior distribution for Bayesian linear regression with known covariance (→ III/1.7.2),
this becomes:

 
\mathrm{Acc}(m) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log|\Sigma| - \frac{1}{2} \left\langle y^T \Sigma^{-1} y - 2 y^T \Sigma^{-1} X\beta + \beta^T X^T \Sigma^{-1} X\beta \right\rangle_{\mathcal{N}(\beta; \mu_n, \Sigma_n)} .    (9)

If x ∼ N(µ, Σ), then its expected value is (→ II/4.1.9)

⟨x⟩ = µ (10)
and the expectation of a quadratic form is given by (→ I/1.10.9)

\left\langle x^T A x \right\rangle = \mu^T A \mu + \mathrm{tr}(A\Sigma) .    (11)
Thus, the model accuracy of m evaluates to

\mathrm{Acc}(m) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log|\Sigma| - \frac{1}{2} \left[ y^T \Sigma^{-1} y - 2 y^T \Sigma^{-1} X\mu_n + \mu_n^T X^T \Sigma^{-1} X\mu_n + \mathrm{tr}(X^T \Sigma^{-1} X \Sigma_n) \right]
                = -\frac{1}{2} (y - X\mu_n)^T \Sigma^{-1} (y - X\mu_n) - \frac{1}{2} \log|\Sigma| - \frac{n}{2} \log(2\pi) - \frac{1}{2} \mathrm{tr}(X^T \Sigma^{-1} X \Sigma_n)
                \overset{(4)}{=} -\frac{1}{2} e_y^T \Sigma^{-1} e_y - \frac{1}{2} \log|\Sigma| - \frac{n}{2} \log(2\pi) - \frac{1}{2} \mathrm{tr}(X^T \Sigma^{-1} X \Sigma_n)    (12)
which proves the first part of (3).
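The step from (9) to (12) can be checked numerically: for any Gaussian N(β; µ_n, Σ_n), averaging the log-likelihood log p(y|β) over samples of β must reproduce the closed form (12). The sketch below uses simulated data and an illustrative choice of µ_n and Σ_n (the exact posterior parameters are those of → III/1.7.2, not restated here).

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, p = 30, 2
X = rng.normal(size=(n, p))
Sigma = 0.8 * np.eye(n)                      # known covariance
y = X @ np.array([1.0, -1.0]) + rng.multivariate_normal(np.zeros(n), Sigma)

# some Gaussian over beta playing the role of N(beta; mu_n, Sigma_n)
Sigma_inv = np.linalg.inv(Sigma)
Sigma_n = np.linalg.inv(X.T @ Sigma_inv @ X + np.eye(p))
mu_n = Sigma_n @ (X.T @ Sigma_inv @ y)

# closed form (12)
e_y = y - X @ mu_n
acc_closed = (-0.5 * e_y @ Sigma_inv @ e_y
              - 0.5 * np.linalg.slogdet(Sigma)[1]
              - 0.5 * n * np.log(2 * np.pi)
              - 0.5 * np.trace(X.T @ Sigma_inv @ X @ Sigma_n))

# Monte Carlo average of log p(y|beta) over beta ~ N(mu_n, Sigma_n), cf. (9)
betas = rng.multivariate_normal(mu_n, Sigma_n, size=20000)
noise = multivariate_normal(mean=np.zeros(n), cov=Sigma)
acc_mc = noise.logpdf(y - betas @ X.T).mean()

print(acc_closed, acc_mc)                    # the two values should agree closely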

2) The complexity penalty is the Kullback-Leibler divergence (→ I/2.5.1) of the posterior distribution
(→ I/5.1.7) p(β|y) from the prior distribution (→ I/5.1.3) p(β):

Com(m) = KL [p(β|y) || p(β)] . (13)


With the prior distribution (→ III/1.7.1) given by (2) and the posterior distribution for Bayesian
linear regression with known covariance (→ III/1.7.2), this becomes:

Com(m) = KL [N (β; µn , Σn ) || N (β; µ0 , Σ0 )] . (14)


With the Kullback-Leibler divergence for the multivariate normal distribution (→ II/4.1.12)
 
\mathrm{KL}[\mathcal{N}(\mu_1, \Sigma_1) \,||\, \mathcal{N}(\mu_2, \Sigma_2)] = \frac{1}{2} \left[ (\mu_2 - \mu_1)^T \Sigma_2^{-1} (\mu_2 - \mu_1) + \mathrm{tr}(\Sigma_2^{-1} \Sigma_1) - \ln \frac{|\Sigma_1|}{|\Sigma_2|} - n \right]    (15)
the model complexity of m evaluates to

 
\mathrm{Com}(m) = \frac{1}{2} \left[ (\mu_0 - \mu_n)^T \Sigma_0^{-1} (\mu_0 - \mu_n) + \mathrm{tr}(\Sigma_0^{-1} \Sigma_n) - \log \frac{|\Sigma_n|}{|\Sigma_0|} - p \right]
                = \frac{1}{2} (\mu_0 - \mu_n)^T \Sigma_0^{-1} (\mu_0 - \mu_n) + \frac{1}{2} \log|\Sigma_0| - \frac{1}{2} \log|\Sigma_n| + \frac{1}{2} \mathrm{tr}(\Sigma_0^{-1} \Sigma_n) - \frac{p}{2}
                \overset{(4)}{=} \frac{1}{2} e_\beta^T \Sigma_0^{-1} e_\beta + \frac{1}{2} \log|\Sigma_0| - \frac{1}{2} \log|\Sigma_n| + \frac{1}{2} \mathrm{tr}(\Sigma_0^{-1} \Sigma_n) - \frac{p}{2}    (16)
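Similarly, (16) can be checked against a brute-force Monte Carlo estimate of the Kullback-Leibler divergence in (14); the prior and posterior moments in the following sketch are arbitrary illustrative values.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
p = 3
mu0, Sigma0 = np.zeros(p), 5.0 * np.eye(p)             # prior moments (illustrative)
mu_n = np.array([0.8, -0.3, 1.1])                      # posterior mean (illustrative)
A = rng.normal(size=(p, p))
Sigma_n = 0.1 * (A @ A.T + p * np.eye(p))              # posterior covariance (illustrative)

# closed form (16)
Sigma0_inv = np.linalg.inv(Sigma0)
e_b = mu0 - mu_n
com_closed = 0.5 * (e_b @ Sigma0_inv @ e_b
                    + np.linalg.slogdet(Sigma0)[1] - np.linalg.slogdet(Sigma_n)[1]
                    + np.trace(Sigma0_inv @ Sigma_n) - p)

# Monte Carlo estimate of KL[N(mu_n, Sigma_n) || N(mu0, Sigma0)], cf. (14)
post = multivariate_normal(mu_n, Sigma_n)
prior = multivariate_normal(mu0, Sigma0)
b = post.rvs(size=100000)
com_mc = np.mean(post.logpdf(b) - prior.logpdf(b))

print(com_closed, com_mc)                              # should agree up to Monte Carlo error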
1. PROBABILITY THEORY

Proof: The cumulative distribution function (→ I/1.8.1) of a random variable (→ I/1.2.2) X is
defined as the probability that X is smaller than or equal to x:

FX (x) = Pr(X ≤ x) . (2)


The probability mass function (→ I/1.6.1) of a discrete (→ I/1.2.6) random variable (→ I/1.2.2) X
returns the probability that X takes a particular value x:

fX (x) = Pr(X = x) . (3)


Taking these two definitions together, we have:

F_X(x) \overset{(2)}{=} \sum_{t \in \mathcal{X},\, t \le x} \Pr(X = t) \overset{(3)}{=} \sum_{t \in \mathcal{X},\, t \le x} f_X(t) .    (4)
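As a numerical illustration of (4) (not part of the original text), the CDF of a discrete random variable is simply the cumulative sum of its PMF; a binomial distribution is used here purely as an example.

import numpy as np
from scipy.stats import binom

# PMF of a Binomial(10, 0.3) variable and its CDF obtained by summing the PMF, cf. (4)
x_vals = np.arange(0, 11)
pmf = binom.pmf(x_vals, n=10, p=0.3)
cdf_from_pmf = np.cumsum(pmf)                          # F_X(x) = sum of f_X(t) over t <= x

print(np.allclose(cdf_from_pmf, binom.cdf(x_vals, n=10, p=0.3)))   # True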

1.8.6 Cumulative distribution function of continuous random variable


Theorem: Let X be a continuous (→ I/1.2.6) random variable (→ I/1.2.2) with possible values X
and probability density function (→ I/1.7.1) fX (x). Then, the cumulative distribution function (→
I/1.8.1) of X is
F_X(x) = \int_{-\infty}^{x} f_X(t) \, \mathrm{d}t .    (1)

Proof: The cumulative distribution function (→ I/1.8.1) of a random variable (→ I/1.2.2) X is
defined as the probability that X is smaller than or equal to x:

FX (x) = Pr(X ≤ x) . (2)


The probability density function (→ I/1.7.1) of a continuous (→ I/1.2.6) random variable (→ I/1.2.2)
X can be used to calculate the probability that X falls into a particular interval A:
\Pr(X \in A) = \int_{A} f_X(x) \, \mathrm{d}x .    (3)
Taking these two definitions together, we have:

F_X(x) \overset{(2)}{=} \Pr(X \in (-\infty, x]) \overset{(3)}{=} \int_{-\infty}^{x} f_X(t) \, \mathrm{d}t .    (4)
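Analogously, (4) can be illustrated by numerically integrating a PDF up to x; the standard normal used below is only an example distribution.

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# F_X(x) as the integral of the PDF from -infinity to x, cf. (4), for a standard normal
x = 1.5
cdf_by_integration, _ = quad(norm.pdf, -np.inf, x)
print(cdf_by_integration, norm.cdf(x))                 # both approx. 0.9332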


5. BAYESIAN STATISTICS

Then, the value of θ at which the posterior density (→ I/5.1.7) attains its maximum is called the
“maximum-a-posteriori estimate”, “MAP estimate” or “posterior mode” of θ:

\hat{\theta}_{\mathrm{MAP}} = \operatorname*{arg\,max}_{\theta} D(\theta; \phi) .    (2)

Sources:
• Wikipedia (2023): “Maximum a posteriori estimation”; in: Wikipedia, the free encyclopedia, retrieved on 2023-12-01; URL: https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation#Description.
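As a small illustration (not part of the original text), the MAP estimate can be found by numerically maximizing a posterior density. The Beta posterior assumed below, Beta(9, 5) arising from a Beta(2, 2) prior and 7 successes in 10 Bernoulli trials, is purely an example; its numerical maximizer is compared with the analytic mode.

from scipy.optimize import minimize_scalar
from scipy.stats import beta

# Posterior of a Bernoulli rate theta under a Beta(2, 2) prior after observing
# 7 successes in 10 trials: Beta(2 + 7, 2 + 3); the MAP estimate is its mode.
a_n, b_n = 2 + 7, 2 + 3

# numerical maximisation of the posterior density over theta in (0, 1)
res = minimize_scalar(lambda t: -beta.pdf(t, a_n, b_n), bounds=(0, 1), method="bounded")
theta_map_numeric = res.x

# analytic mode of Beta(a, b) for a, b > 1: (a - 1) / (a + b - 2)
theta_map_analytic = (a_n - 1) / (a_n + b_n - 2)

print(theta_map_numeric, theta_map_analytic)           # both approx. 0.667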

5.1.9 Posterior density is proportional to joint likelihood


Theorem: In a full probability model (→ I/5.1.4) m describing measured data y using model pa-
rameters θ, the posterior density (→ I/5.1.7) over the model parameters is proportional to the joint
likelihood (→ I/5.1.5):

p(θ|y, m) ∝ p(y, θ|m) . (1)

Proof: In a full probability model (→ I/5.1.4), the posterior distribution (→ I/5.1.7) can be expressed
using Bayes’ theorem (→ I/5.3.1):

p(\theta|y, m) = \frac{p(y|\theta, m) \, p(\theta|m)}{p(y|m)} .    (2)
Applying the law of conditional probability (→ I/1.3.4) to the numerator, we have:

p(\theta|y, m) = \frac{p(y, \theta|m)}{p(y|m)} .    (3)
Because the denominator does not depend on θ, it is constant in θ and thus acts as a proportionality
factor between the posterior distribution and the joint likelihood:

p(θ|y, m) ∝ p(y, θ|m) . (4)
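A practical consequence of (4) is that the posterior can be approximated by evaluating the joint likelihood on a grid over θ and renormalizing, without ever computing p(y|m) explicitly. A minimal sketch for a Beta-Bernoulli example (prior, data and grid are illustrative, not from the text):

import numpy as np
from scipy.stats import binom, beta

# Grid approximation: evaluate the joint likelihood p(y, theta|m) = p(y|theta, m) p(theta|m)
# on a grid and renormalise over theta; the denominator p(y|m) never has to be computed.
theta = np.linspace(0.001, 0.999, 999)
prior = beta.pdf(theta, 2, 2)                          # Beta(2, 2) prior (illustrative)
likelihood = binom.pmf(7, n=10, p=theta)               # 7 successes in 10 trials (illustrative)
joint = likelihood * prior
posterior = joint / (joint.sum() * (theta[1] - theta[0]))

# compare with the exact conjugate posterior Beta(2 + 7, 2 + 3)
print(np.max(np.abs(posterior - beta.pdf(theta, 9, 5))))   # close to 0 (up to grid error)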

5.1.10 Combined posterior distribution from independent data


Theorem: Let p(θ|y1 ) and p(θ|y2 ) be posterior distributions (→ I/5.1.7), obtained using the same
prior distribution (→ I/5.1.3) from conditionally independent (→ I/1.3.7) data sets y1 and y2 :

p(y1 , y2 |θ) = p(y1 |θ) · p(y2 |θ) . (1)


Then, the combined posterior distribution (→ I/1.5.1) is proportional to the product of the individual
posterior densities (→ I/1.7.1), divided by the prior density:

p(\theta|y_1, y_2) \propto \frac{p(\theta|y_1) \cdot p(\theta|y_2)}{p(\theta)} .    (2)
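A quick numerical check of (2) for an illustrative Beta-Bernoulli example (the prior and the two data sets are not from the text): combining the two single-data-set posteriors via (2) reproduces the posterior obtained from the pooled data.

import numpy as np
from scipy.stats import beta

# Two conditionally independent Bernoulli data sets under a common Beta(2, 2) prior:
# y1 with 4 successes in 6 trials, y2 with 3 successes in 4 trials (numbers are illustrative).
theta = np.linspace(0.001, 0.999, 999)
prior = beta.pdf(theta, 2, 2)
post1 = beta.pdf(theta, 2 + 4, 2 + 2)                  # p(theta|y1)
post2 = beta.pdf(theta, 2 + 3, 2 + 1)                  # p(theta|y2)

# combined posterior via (2), renormalised on the grid
combined = post1 * post2 / prior
combined /= combined.sum() * (theta[1] - theta[0])

# exact posterior from the pooled data (7 successes in 10 trials): Beta(2 + 7, 2 + 3)
print(np.max(np.abs(combined - beta.pdf(theta, 9, 5))))    # close to 0 (up to grid error)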
