15 Model Averaging
15.1 Framework
Let $g$ be a (non-parametric) object of interest, such as a conditional mean, variance, density, or distribution function. Let $\hat{g}_m$, $m = 1, \ldots, M$, be a discrete set of estimators. Most commonly, this set is the same as we might consider for the problem of model selection. In linear regression, the $\hat{g}_m$ typically correspond to different sets of regressors. We will sometimes call the $m$'th estimator the $m$'th "model".
Let $w_m$ be the weight attached to the $m$'th estimator, and let $w = (w_1, \ldots, w_M)$ be the vector of weights. Typically we will require
$$0 \le w_m \le 1, \qquad \sum_{m=1}^{M} w_m = 1,$$
so that $w$ lies in the unit simplex $H_M \subset \mathbb{R}^M$.
A classical choice of weights, corresponding to Bayesian model averaging (BMA), is
$$w_m = \frac{\exp\left(-\frac{1}{2}\mathrm{BIC}_m\right)}{\sum_{j=1}^{M} \exp\left(-\frac{1}{2}\mathrm{BIC}_j\right)}$$
where
$$\mathrm{BIC}_m = 2L_m + k_m \log(n),$$
$L_m$ is the negative log-likelihood, and $k_m$ is the number of parameters in model $m$. $\mathrm{BIC}_m$ is the Bayesian information criterion for model $m$. It is similar to AIC, but with the "2" replaced by $\log(n)$.
The BMA estimator has a nice interpretation as a Bayesian estimator. The downside is that it does not allow for misspecification. It is designed to search for the "true" model, not to select an estimator with low loss.
To remedy this situation, Burnham and Anderson have suggested replacing BIC with AIC, resulting in what has been called smoothed AIC (SAIC) or weighted AIC (WAIC). The weights are
$$w_m = \frac{\exp\left(-\frac{1}{2}\mathrm{AIC}_m\right)}{\sum_{j=1}^{M} \exp\left(-\frac{1}{2}\mathrm{AIC}_j\right)}$$
where
$$\mathrm{AIC}_m = 2L_m + 2k_m.$$
The suggestion goes back to Akaike, who proposed that these $w_m$ may be interpreted as model probabilities. It is convenient and simple to implement. The idea can be applied quite broadly, in any context where AIC is defined.
In simulation studies, the SAIC estimator performs very well (in particular, better than conventional AIC selection). However, to date I have seen no formal justification for the procedure. It is unclear in what sense SAIC produces a good approximation.
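For computation, these exponential weighting schemes require only each model's maximized log-likelihood and parameter count. The sketch below (Python; the function name ic_weights and the three-model numbers are illustrative, not from the notes) forms the AIC or BIC weights above. Subtracting the minimum criterion value before exponentiating leaves the weights unchanged but avoids numerical underflow.

```python
import numpy as np

def ic_weights(loglik, k, n, criterion="AIC"):
    """Exponential model weights from an information criterion.

    loglik    : maximized log-likelihood of each model
    k         : number of parameters in each model
    n         : sample size (used only for BIC)
    criterion : "AIC" or "BIC"
    """
    loglik = np.asarray(loglik, dtype=float)
    k = np.asarray(k, dtype=float)
    if criterion == "AIC":
        ic = -2.0 * loglik + 2.0 * k          # AIC_m = 2 L_m + 2 k_m, with L_m = -loglik
    else:
        ic = -2.0 * loglik + k * np.log(n)    # BIC_m = 2 L_m + k_m log(n)
    # Subtracting the minimum does not change the weights but keeps exp() well scaled.
    delta = ic - ic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# hypothetical example with three candidate models
print(ic_weights(loglik=[-520.3, -518.9, -518.1], k=[2, 4, 6], n=200, criterion="AIC"))
```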
In linear regression, the $m$'th estimator is least squares on the regressor matrix $X_m$:
$$\hat{\beta}_m = \left(X_m'X_m\right)^{-1}X_m'y$$
with fitted values
$$\hat{g}_m = X_m\hat{\beta}_m = P_m y$$
where
$$P_m = X_m\left(X_m'X_m\right)^{-1}X_m'.$$
The averaging estimator is then
$$\hat{g}(w) = \sum_{m=1}^{M} w_m\hat{g}_m = \sum_{m=1}^{M} w_m P_m y = P(w)\,y$$
where
$$P(w) = \sum_{m=1}^{M} w_m P_m.$$
Writing each $\hat{\beta}_m$ as a full-length coefficient vector (with zeros filled in for the regressors excluded from model $m$), we can also write
$$\hat{g}(w) = \sum_{m=1}^{M} w_m X_m\left(X_m'X_m\right)^{-1}X_m'y = \sum_{m=1}^{M} w_m X_m\hat{\beta}_m = X\left(\sum_{m=1}^{M} w_m\hat{\beta}_m\right) = X\hat{\beta}(w)$$
where
$$\hat{\beta}(w) = \sum_{m=1}^{M} w_m\hat{\beta}_m$$
is the average of the coefficient estimates. $\hat{\beta}(w)$ is the model average estimator for $\beta$. In linear regression, there is a direct correspondence between the average estimator for the conditional mean and the average estimator of the parameters, but this correspondence breaks down when the estimator is not linear in the parameters.
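As a concrete check of this correspondence, the sketch below (Python; the data-generating process, the nested design, and the weight vector are all hypothetical) builds $\hat{g}(w)$ both as $\sum_m w_m X_m\hat{\beta}_m$ and as $X\hat{\beta}(w)$ with zero-padded coefficients, and verifies that the two coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 4
X = rng.normal(size=(n, K))                      # full regressor matrix
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

model_cols = [1, 2, 4]                           # nested models: first k_m columns of X
w = np.array([0.2, 0.3, 0.5])                    # an example weight vector

beta_avg = np.zeros(K)                           # averaged coefficients, zero-padded
g_hat = np.zeros(n)                              # averaged fitted values
for wm, km in zip(w, model_cols):
    Xm = X[:, :km]
    bm = np.linalg.lstsq(Xm, y, rcond=None)[0]   # OLS estimate for model m
    beta_avg[:km] += wm * bm
    g_hat += wm * Xm @ bm

# the two representations of the averaging estimator coincide
assert np.allclose(g_hat, X @ beta_avg)
```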
The Mallows criterion for the averaging estimator is
$$C(w) = \hat{e}(w)'\hat{e}(w) + 2\sigma^2\,\mathrm{tr}\,P(w)$$
where
$$\hat{e}(w) = y - \hat{g}(w)$$
is the residual.
In averaging linear regressions,
$$\mathrm{tr}\,P(w) = \mathrm{tr}\left(\sum_{m=1}^{M} w_m P_m\right) = \sum_{m=1}^{M} w_m\,\mathrm{tr}\,P_m = \sum_{m=1}^{M} w_m k_m = w'K$$
where $k_m$ is the number of coefficients in the $m$'th model, and $K = (k_1, \ldots, k_M)'$. The penalty is therefore $2\sigma^2 w'K$, where $w'K$ is the (weighted) average number of coefficients.
Also
$$\hat{e}(w) = y - \hat{g}(w) = \sum_{m=1}^{M} w_m\left(y - \hat{g}_m\right) = \sum_{m=1}^{M} w_m\hat{e}_m = \hat{e}\,w$$
where $\hat{e}_m$ is the $n\times 1$ residual vector from the $m$'th model, and $\hat{e} = \left[\hat{e}_1, \ldots, \hat{e}_M\right]$ is the $n\times M$ matrix of residuals from all $M$ models.
We can then write the criterion as
$$C(w) = w'\hat{e}'\hat{e}\,w + 2\sigma^2 w'K.$$
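In code, the criterion is a single quadratic form. A minimal sketch, assuming the $n\times M$ residual matrix $\hat{e}$, the dimension vector $K$, and an estimate of $\sigma^2$ (for example, from the largest model) have already been computed; the function name is illustrative.

```python
import numpy as np

def mallows_criterion(w, ehat, K, sigma2):
    """C(w) = w' ehat' ehat w + 2 sigma^2 w'K.

    w      : (M,) weight vector on the unit simplex
    ehat   : (n, M) matrix whose m'th column is the residual vector of model m
    K      : (M,) vector of model dimensions k_m
    sigma2 : estimate of the error variance
    """
    w = np.asarray(w, dtype=float)
    K = np.asarray(K, dtype=float)
    return float(w @ (ehat.T @ ehat) @ w + 2.0 * sigma2 * (w @ K))
```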
Let $\hat{w}$ denote the weight vector which minimizes $C(w)$ over the unit simplex; since $C(w)$ is quadratic in $w$, this is a quadratic programming problem. The Mallows-selected averaging estimator is then
$$\hat{g} = \hat{g}(\hat{w}) = \sum_{m=1}^{M} \hat{w}_m\hat{g}_m.$$
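Because $C(w)$ is a quadratic function of $w$ and the unit simplex is defined by linear constraints, the weights can be computed by quadratic programming. A sketch using scipy's general-purpose SLSQP routine (a dedicated QP solver would serve equally well; the function name mma_weights and its arguments are illustrative, not from the original notes):

```python
import numpy as np
from scipy.optimize import minimize

def mma_weights(ehat, K, sigma2):
    """Minimize C(w) = w' ehat'ehat w + 2 sigma^2 w'K over the unit simplex."""
    M = ehat.shape[1]
    A = ehat.T @ ehat
    K = np.asarray(K, dtype=float)
    objective = lambda w: w @ A @ w + 2.0 * sigma2 * (w @ K)
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]  # weights sum to one
    bounds = [(0.0, 1.0)] * M                                         # each weight in [0, 1]
    w0 = np.full(M, 1.0 / M)                                          # start from equal weights
    res = minimize(objective, w0, method="SLSQP", bounds=bounds, constraints=constraints)
    return res.x
```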
This is an asymptotically optimal procedure, in the sense described below. In Hansen (Econometrica, 2007), I show that the required condition (Li's condition (1)) is satisfied if we restrict the set of weights to a discrete set.
Recall that $H_M$ is the unit simplex in $\mathbb{R}^M$. Now restrict $w \in H_M(N) \subset H_M$, where the weights in $H_M(N)$ are elements of $\left\{0, \frac{1}{N}, \frac{2}{N}, \ldots, 1\right\}$ for some integer $N$. In that paper, I show that Li's condition (1) over $w \in H_M(N)$ holds under conditions similar to those for model selection, namely if the models are nested,
$$\xi_n = \inf_{w \in H_M(N)} nR(w) \to \infty,$$
and
$$E\left(e_i^{4(N+1)} \mid X_i\right) < \infty.$$
In that case,
$$\frac{L(\hat{w})}{\inf_{w \in H_M(N)} L(w)} \to_p 1,$$
where, again,
$$L(w) = \frac{1}{n}\left(\hat{g}(w) - g\right)'\left(\hat{g}(w) - g\right).$$
The proof is similar to that for model selection in linear regression. The restriction of w to a
discrete set is necessary to directly apply Li’s theorem, as the summation requires discreteness.
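To make the restricted weight set concrete, the sketch below enumerates $H_M(N)$ for small $M$ and $N$; the discrete weight-selection problem can then be solved by evaluating the criterion at each element. The brute-force enumeration is only meant to illustrate the set used in the theory; it is impractical beyond a handful of models.

```python
from itertools import product
import numpy as np

def discrete_simplex(M, N):
    """Enumerate H_M(N): weight vectors with entries in {0, 1/N, ..., 1} summing to one."""
    return [np.array(c) / N for c in product(range(N + 1), repeat=M) if sum(c) == N]

# e.g. M = 3 models and N = 4 yields 15 candidate weight vectors;
# the discrete weights are the element of this list minimizing the criterion C(w)
print(len(discrete_simplex(3, 4)))
```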
The discreteness was relaxed in a paper by Wan, Zhang, and Zou (2008, "Least Squares Model Combining by Mallows Criterion," working paper). Rather than proving (1), they provided a more basic derivation, although using stronger conditions. Recall that the proof requires showing uniform convergence results of the form
$$\sup_{w \in H_M}\frac{\left|e'b(w)\right|}{nR(w)} \to_p 0$$
where
$$b(w) = \sum_{m=1}^{M} w_m b_m, \qquad b_m = \left(I - P_m\right)g.$$
Then
$$\sup_{w \in H_M}\frac{\left|e'b(w)\right|}{nR(w)} \le \sup_{w \in H_M}\sum_{m=1}^{M} w_m\frac{\left|e'b_m\right|}{\xi_n} \le \max_{1\le m\le M}\frac{\left|e'b_m\right|}{\xi_n}$$
and, for any $\epsilon > 0$,
$$P\left(\max_{1\le m\le M}\frac{\left|e'b_m\right|}{\xi_n} > \epsilon\right) \le \sum_{m=1}^{M} P\left(\frac{\left|e'b_m\right|}{\xi_n} > \epsilon\right) \le \sum_{m=1}^{M}\frac{E\left|e'b_m\right|^{2G}}{\epsilon^{2G}\xi_n^{2G}} \le K\sum_{m=1}^{M}\frac{\left|b_m'b_m\right|^{G}}{\epsilon^{2G}\xi_n^{2G}} \le K\sum_{m=1}^{M}\frac{\left(nR(w_m^0)\right)^{G}}{\epsilon^{2G}\xi_n^{2G}}$$
where $w_m^0$ is the weight vector with a 1 in the $m$'th place and zeros elsewhere. Equivalently, $nR(w_m^0)$ is the expected squared error from the $m$'th model. The final inequality uses the fact, from the analysis for model selection, that
$$nR(w_m^0) = b_m'b_m + \sigma^2 k_m \ge b_m'b_m.$$
This is stronger than the condition from my paper, $\xi_n \to \infty$, as it requires that $\sum_{m=1}^{M}\left(nR(w_m^0)\right)^{G}$ diverge more slowly than $\xi_n^{2G}$. They also do not directly assume that the models are nested.
For the $m$'th model, let $\tilde{e}_i^m$ denote the leave-one-out (LOO) residual for the $i$'th observation,
$$\tilde{e}_i^m = y_i - X_{im}'\left(X_{m(-i)}'X_{m(-i)}\right)^{-1}X_{m(-i)}'y_{(-i)},$$
where $X_{m(-i)}$ and $y_{(-i)}$ denote $X_m$ and $y$ with the $i$'th observation deleted. Set
$$\tilde{e}_i(w) = \sum_{m=1}^{M} w_m\tilde{e}_i^m$$
or, in vector notation,
$$\tilde{e}(w) = \sum_{m=1}^{M} w_m\tilde{e}_m = \tilde{e}\,w,$$
where $\tilde{e}$ is the $n\times M$ matrix whose $m$'th column is $\tilde{e}_m$. The sum of squared LOO residuals is then
$$CV(w) = \tilde{e}(w)'\tilde{e}(w) = w'\tilde{e}'\tilde{e}\,w,$$
which is quadratic in $w$.
The CV (or jackknife) selected weight vector $\hat{w}$ minimizes the criterion $CV(w)$ over the unit simplex. As for Mallows selection, this is solved by quadratic programming. The jackknife model averaging (JMA) estimator is then $\hat{g}(\hat{w})$.
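A sketch of the jackknife computation for least-squares candidate models. Rather than refitting each model $n$ times, it uses the standard least-squares identity that the LOO residual equals the ordinary residual divided by one minus the leverage, $\tilde{e}_i^m = \hat{e}_i^m/(1 - h_{mi})$; the function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def loo_residual_matrix(y, X_list):
    """n x M matrix of leave-one-out residuals, one column per least-squares model."""
    cols = []
    for Xm in X_list:
        Pm = Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T)   # projection matrix P_m
        ehat = y - Pm @ y                            # ordinary residuals
        h = np.diag(Pm)                              # leverage values h_mi
        cols.append(ehat / (1.0 - h))                # LOO residuals: e_mi / (1 - h_mi)
    return np.column_stack(cols)

def jma_weights(y, X_list):
    """Minimize CV(w) = w' etilde' etilde w over the unit simplex."""
    E = loo_residual_matrix(y, X_list)
    A = E.T @ E
    M = A.shape[0]
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    res = minimize(lambda w: w @ A @ w, np.full(M, 1.0 / M),
                   method="SLSQP", bounds=[(0.0, 1.0)] * M, constraints=constraints)
    return res.x
```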
In Hansen-Racine, we show that the CV-selected estimator is asymptotically equivalent to the infeasible best weight vector under the conditions
$$\xi_n = \inf_{w \in H_M} nR(w) \to \infty$$
and
$$\max_{1\le m\le M}\;\max_{1\le i\le n}\; X_{im}'\left(X_m'X_m\right)^{-1}X_{im} \to 0.$$
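The second condition bounds the leverage values and can be checked directly in a given sample; a small sketch (illustrative function name):

```python
import numpy as np

def max_leverage(X_list):
    """max over models m and observations i of X_im' (X_m'X_m)^{-1} X_im."""
    worst = 0.0
    for Xm in X_list:
        # diagonal of the projection matrix P_m, computed without forming P_m in full
        h = np.sum(Xm * np.linalg.solve(Xm.T @ Xm, Xm.T).T, axis=1)
        worst = max(worst, float(h.max()))
    return worst
```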
Standard errors?
Inference