
slides-5-iu

The document discusses endogeneity in regression analysis, highlighting its causes such as measurement errors, simultaneity, and omitted variables, which lead to biased and inconsistent estimates when using Ordinary Least Squares (OLS). It introduces Instrumental Variable (IV) regression as a solution, detailing the process of choosing instruments, the identification problem, and the two-stage estimation procedure. Additionally, it covers diagnostic tests for weak instruments, endogeneity, and overidentifying restrictions to ensure the validity of the instruments used in the analysis.

Uploaded by

Ngô Trâm

ENDOGENEITY AND

INSTRUMENTAL VARIABLE REGRESSION

Trương Đăng Thụy


[email protected]
ENDOGENEITY
▪ OLS assumption

$E[\varepsilon \mid X] = 0$ and $E[\varepsilon X] = 0$,
meaning the error term is not correlated with $X$

▪ Endogeneity happens when the error term is correlated with $X$:

$E[\varepsilon X] \neq 0$

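Note that the OLS formula $\beta = (X'X)^{-1}X'y$ implies $X'X\beta = X'y$. Substituting $y = X\beta + e$ gives $X'X\beta = X'(X\beta + e)$, or

$X'X\beta = X'X\beta + X'e$, which holds only if $X'e = 0$. If $X'e \neq 0$, then $\beta \neq (X'X)^{-1}X'y$.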
CONSEQUENCES OF ENDOGENEITY
𝑦 = 𝑋𝛽 + 𝜀

[Diagram: $x$, $y$ and $\varepsilon$]
$\frac{\partial y}{\partial x} = \beta + \frac{\partial \varepsilon}{\partial x}$
▪ Suppose 𝜀 is correlated with 𝑥
▪ When 𝑥 changes, 𝜀 changes as well
▪ So 𝛽 does not indicate the impact of 𝑥 on 𝑦
▪ If we use OLS in a regression with endogeneity:

BIASED AND INCONSISTENT ESTIMATES
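The bias can be seen in a small simulation. The sketch below is in Python with NumPy (the slides themselves work in R) and uses made-up data: the true slope of 2, the 0.8 loading of the error on the regressor, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 2.0  # true slope (hypothetical value)


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


eps = rng.normal(size=n)

# Exogenous regressor: drawn independently of eps, so E[eps * x] = 0
x_exog = rng.normal(size=n)
b_exog = ols_slope(x_exog, beta * x_exog + eps)

# Endogenous regressor: shares a component with eps, so E[eps * x] != 0
x_endo = rng.normal(size=n) + 0.8 * eps
b_endo = ols_slope(x_endo, beta * x_endo + eps)

print(f"OLS slope, exogenous x : {b_exog:.3f}")
print(f"OLS slope, endogenous x: {b_endo:.3f}")
```

In this design the OLS slope with the endogenous regressor converges to $\beta + \operatorname{Cov}(x, \varepsilon)/\operatorname{Var}(x) = 2 + 0.8/1.64 \approx 2.49$ rather than 2, and a larger sample does not remove the gap.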


REASONS FOR ENDOGENEITY
▪ errors in variables
▪ simultaneity (jointly endogenous variables), or reverse
causality
▪ omitted variables
ENDOGENEITY: MEASUREMENT ERRORS
▪ Consider a regression 𝑦 = 𝛽𝑥 + 𝜀
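▪ Suppose we can't observe the true $y$ and $x$
▪ Instead, we observe $y^*$ and $x^*$, which contain measurement errors $v$ and $u$:
$y = y^* + v$ and $x = x^* + u$
▪ Substituting into $y = \beta x + \varepsilon$, the equation we actually regress becomes
$y^* = \beta x^* + \varepsilon - v + \beta u$

or $y^* = \beta x^* + \omega$
where $\omega = \varepsilon - v + \beta u = \varepsilon - v + \beta (x - x^*)$
▪ Because $\omega$ contains the measurement error $u = x - x^*$, it is correlated with the observed regressor $x^*$.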
▪ This results in endogeneity.
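A minimal simulation of this case, in Python with NumPy. It assumes classical measurement error (errors independent of the true values and of each other), unit variances, and a true slope of 2; these values and all names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta = 2.0  # true slope (hypothetical value)


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


# True (unobserved) variables
x_true = rng.normal(size=n)
y_true = beta * x_true + rng.normal(size=n)

# Observed variables: x = x* + u and y = y* + v, so x* = x - u and y* = y - v
u = rng.normal(size=n)
v = rng.normal(size=n)
x_obs = x_true - u
y_obs = y_true - v

b_true = ols_slope(x_true, y_true)  # infeasible regression on the true values
b_obs = ols_slope(x_obs, y_obs)     # regression on the mismeasured values

print(f"slope with true data    : {b_true:.3f}")
print(f"slope with observed data: {b_obs:.3f}")
```

Under these assumptions the slope on the observed data converges to $\beta \operatorname{Var}(x) / (\operatorname{Var}(x) + \operatorname{Var}(u)) = 1$, half the true value. The error in $y$ only adds noise; it is the error in $x$ that creates the correlation between regressor and error term.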
ENDOGENEITY: SIMULTANEITY
▪ Consider an individual or HH demand equation
$q_d = \alpha_1 p + \alpha_2 y + u_d$
▪ If the commodity is not homogeneous, and
▪ there is a variety of goods with different qualities and prices
▪ goods at higher prices are of higher quality

▪ Then consumers choose price and quality simultaneously to maximize their utility
▪ Therefore, the demand model should have been a system of two equations
▪ When we reduce the system to a single equation, $u_d$ turns out to be correlated with $p$, which is endogeneity.
ENDOGENEITY: OMITTED VARIABLES
▪ Suppose the true model is

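$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
▪ and suppose we omit $x_2$, estimating instead

$y = \beta_0 + \beta_1 x_1 + \mu$
where $\mu = \beta_2 x_2 + \varepsilon$
▪ Then $E[\mu x_1] \neq 0$ if $\operatorname{Cov}(x_1, x_2) \neq 0$ (and $\beta_2 \neq 0$)
▪ This results in endogeneity in $y = \beta_0 + \beta_1 x_1 + \mu$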
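The size and direction of the resulting bias follow from a standard textbook calculation, sketched here under the added assumption that $\varepsilon$ is uncorrelated with $x_1$. Substituting the true model into the probability limit of the OLS slope from the short regression gives:

```latex
\operatorname{plim}\,\hat{\beta}_1
  = \frac{\operatorname{Cov}(x_1, y)}{\operatorname{Var}(x_1)}
  = \frac{\operatorname{Cov}(x_1,\ \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon)}{\operatorname{Var}(x_1)}
  = \beta_1 + \beta_2\,\frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}
```

The second term is the omitted variable bias. It vanishes only if $\beta_2 = 0$ or $\operatorname{Cov}(x_1, x_2) = 0$, and its sign is the sign of $\beta_2 \operatorname{Cov}(x_1, x_2)$.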
SOLUTION TO ENDOGENEITY:
INSTRUMENTAL VARIABLE REGRESSION
CHOOSING INSTRUMENTAL VARIABLES
▪ Consider a model
$y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
▪ Suppose $X_2$ is endogenous: $E[\varepsilon X_2] \neq 0$
▪ Instrumental variables (instruments) $Z$ must satisfy
▪ exogeneity (uncorrelated with $\varepsilon$)
▪ relevance (correlated with $X_2$)

How many IVs are enough?


IDENTIFICATION PROBLEM
▪ If $k$ is the number of endogenous variables, and $h$ is the number of instruments, then
▪ If $h < k$ the model is unidentified (not allowed)
▪ If $h = k$ the model is just-identified (most preferred)
▪ If $h > k$ the model is over-identified (need to test for overidentifying restrictions)
IV ESTIMATION METHODS
FOR CROSS-SECTIONAL DATA
▪ IV estimator
▪ 2-stage estimation procedure
▪ 2SLS estimator (most popular)
▪ LIML estimator (Limited Information Maximum Likelihood)
▪ reduces the bias associated with weak instruments compared to 2SLS

▪ GMM
▪ more efficient under heteroskedasticity, if the model is correctly specified
▪ produces the same estimates as 2SLS if the weighting matrix is proportional to $(Z'Z)^{-1}$, the choice implied by a homoskedastic VCV matrix
▪ [for panel data]: be cautious when using lagged variables as instruments, as they may not satisfy the exogeneity requirement.
TWO-STAGE ESTIMATION PROCEDURE
▪ Consider a regression
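$y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
▪ Suppose $X_2$ is endogenous: $E[\varepsilon X_2] \neq 0$
▪ $Z$ is the set of instruments
▪ The 2SLS procedure is
▪ Stage 1: Regress the endogenous variable $X_2$ on $X_1$ and $Z$
$X_2 = \gamma_0 + \gamma_1 X_1 + \gamma_2 Z + v$
Then compute the fitted values $\hat{X}_2 = \hat{\gamma}_0 + \hat{\gamma}_1 X_1 + \hat{\gamma}_2 Z$
▪ Stage 2: Regress $y$ on $X_1$ and the fitted values $\hat{X}_2$
$y = \alpha + \beta_1 X_1 + \beta_2 \hat{X}_2 + \varepsilon$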
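The two stages can be sketched directly with least squares. The example below is Python with NumPy on simulated data (not the course data set); the coefficients, the single instrument `z`, and all names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000


def ols(X, y):
    """OLS coefficients for a regression of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Simulated data: x2 is endogenous because it depends on eps
const = np.ones(n)
x1 = rng.normal(size=n)   # exogenous regressor
z = rng.normal(size=n)    # instrument: affects x2, unrelated to eps
eps = rng.normal(size=n)
x2 = 0.5 * x1 + 1.0 * z + 0.8 * eps + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + eps   # true beta2 = 2

# Stage 1: regress x2 on a constant, x1 and z; keep the fitted values
S1 = np.column_stack([const, x1, z])
x2_hat = S1 @ ols(S1, x2)

# Stage 2: regress y on a constant, x1 and the fitted values
S2 = np.column_stack([const, x1, x2_hat])
alpha_hat, b1_hat, b2_hat = ols(S2, y)

# Naive OLS on the original regressors, for comparison
b2_ols = ols(np.column_stack([const, x1, x2]), y)[2]

print(f"OLS  beta2: {b2_ols:.3f}")
print(f"2SLS beta2: {b2_hat:.3f}")
```

The manual procedure reproduces the 2SLS point estimates, but the standard errors reported by an ordinary second-stage regression are not valid, because they are computed from residuals based on $\hat{X}_2$ rather than $X_2$. This is one reason to use a dedicated IV routine in practice.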
2-STAGE PROCEDURE FOR MULTIPLE
ENDOGENOUS REGRESSORS
▪ If there are multiple endogenous regressors
$X_2 = (X_{21}, X_{22}, \dots, X_{2k})$
▪ The procedure is
▪ Stage 1: Regress each endogenous variable $X_{2i}$ on $X_1$ and $Z$
$X_{2i} = \gamma_0 + \gamma_1 X_1 + \gamma_2 Z + v$
Then compute the fitted values $\hat{X}_{2i} = \hat{\gamma}_0 + \hat{\gamma}_1 X_1 + \hat{\gamma}_2 Z$ for each of the endogenous variables.
▪ Stage 2: Regress
$y = \alpha + \beta_1 X_1 + \beta_2 \hat{X}_2 + \varepsilon$
where $\hat{X}_2 = (\hat{X}_{21}, \hat{X}_{22}, \dots, \hat{X}_{2k})$.
THE WAGE EQUATION
$\ln(wage) = f(ed, X) + \varepsilon$
▪ $ed$: education
▪ $X$: other control variables that affect wage
▪ Endogeneity: an important variable, ability/productivity, is missing
▪ So ability is included in $\varepsilon$
▪ Ability is believed to be correlated with $ed$.
▪ As a result, $ed$ is correlated with $\varepsilon$, which implies endogeneity.
▪ Note that in this case, $ed$ is the endogenous variable.
THE DATA
A survey of 20,306 individuals in the U.S.

▪ wage: wage ($/hour) – THE DEPENDENT VARIABLE
▪ edu: years of schooling (years) – THE ENDOGENOUS VARIABLE
▪ male: 1 = male; 2 = female
▪ age: age (years)
▪ tenure: number of years working for current employer
▪ union: 1 = union member, 0 otherwise
▪ race: 1 = white; 2 = black; 3 = others
▪ married: 1 = married or living together with a partner, 0 otherwise
▪ nch: number of children
▪ fa_ed and mo_ed (INSTRUMENTS): 1 = not completed high school, 2 = completed high school, 3 = higher than high school
Data file: EMP3.xlsx
SUMMARY STATISTICS
OLS REGRESSION
2SLS MANUALLY: FIRST STAGE
2SLS MANUALLY: SECOND STAGE
2SLS ESTIMATOR AND LIMITED INFORMATION MAXIMUM LIKELIHOOD


2-STAGE LEAST SQUARES
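▪ The $\kappa$-class estimator of $\beta$
$b = [Z'(I - \kappa M_z) Z]^{-1} Z'(I - \kappa M_z) y$
▪ In this formula $Z$ is the matrix of right-hand-side regressors. With $X_z = [IV \; X]$ (the instruments and the exogenous regressors), $M_z$ is defined by
$M_z = I - X_z (X_z' X_z)^{-1} X_z'$

▪ If we set $\kappa = 1$, $b$ is the 2SLS estimator.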


2SLS WITH R PACKAGE IVREG
2SLS WITH R PACKAGE IVREG: ROBUST STANDARD ERRORS
LIMITED INFORMATION MAXIMUM LIKELIHOOD
▪ Recall the $\kappa$-class estimator of $\beta$
$b = [Z'(I - \kappa M_z) Z]^{-1} Z'(I - \kappa M_z) y$
▪ with $X_z = [IV \; X]$, $M_z$ is defined by
$M_z = I - X_z (X_z' X_z)^{-1} X_z'$

▪ The LIML estimator is obtained if we set $\kappa$ to be the minimum eigenvalue of the matrix
$(Y_1' M_z Y_1)^{-1/2} \, Y_1' M_{X_1} Y_1 \, (Y_1' M_z Y_1)^{-1/2}$
▪ where
$M_{X_1} = I - X_1 (X_1' X_1)^{-1} X_1'$
▪ and $Y_1 = [y \; Y]$; all other formulas are identical to 2SLS.
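The $\kappa$-class formula can be checked numerically. The following is a sketch in Python with NumPy on simulated data, not the implementation used by any R package. The names `R` (the regressor matrix, written $Z$ in the formula above), `Xz`, `X1`, `annihilate` and `k_class` are hypothetical, and the design with two instruments is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Simulated over-identified design: one endogenous regressor, two instruments
const = np.ones(n)
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))
eps = rng.normal(size=n)
x2 = 0.5 * x1 + z @ np.array([1.0, 1.0]) + 0.8 * eps + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + eps

R = np.column_stack([const, x1, x2])   # all regressors ("Z" in the formula)
Xz = np.column_stack([z, const, x1])   # instruments plus exogenous regressors
X1 = np.column_stack([const, x1])      # exogenous regressors only


def annihilate(A, B):
    """Return M_A B with M_A = I - A (A'A)^{-1} A', without forming the n x n matrix."""
    return B - A @ np.linalg.solve(A.T @ A, A.T @ B)


def k_class(y, R, Xz, kappa):
    """kappa-class estimator b = [R'(I - kappa M_z) R]^{-1} R'(I - kappa M_z) y."""
    lhs = R.T @ (R - kappa * annihilate(Xz, R))
    rhs = R.T @ (y - kappa * annihilate(Xz, y))
    return np.linalg.solve(lhs, rhs)


b_ols = k_class(y, R, Xz, 0.0)    # kappa = 0 reduces to OLS
b_2sls = k_class(y, R, Xz, 1.0)   # kappa = 1 gives 2SLS

# LIML kappa: smallest eigenvalue of A^{-1/2} B A^{-1/2}, which has the same
# eigenvalues as A^{-1} B (the two matrices are similar)
Y1 = np.column_stack([y, x2])
A = Y1.T @ annihilate(Xz, Y1)
B = Y1.T @ annihilate(X1, Y1)
kappa_liml = np.linalg.eigvals(np.linalg.solve(A, B)).real.min()
b_liml = k_class(y, R, Xz, kappa_liml)

print("kappa (LIML):", kappa_liml)
print("OLS :", b_ols)
print("2SLS:", b_2sls)
print("LIML:", b_liml)
```

The LIML $\kappa$ is never below 1, and it equals 1 in the just-identified case, where LIML and 2SLS coincide.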
DIAGNOSTIC TESTS
AFTER IV REGRESSION

Test for weak instruments


Test for endogeneity
Test for overidentifying restrictions
TEST FOR WEAK INSTRUMENTS
▪ This tests whether the instruments satisfy the “relevance” property.
▪ H0: not relevant,
▪ or, instruments not correlated with the endogenous regressor
▪ or, weak instruments

▪ In our example, H0 indicates that parents' education does not affect workers' education
▪ We reject H0 if the p-value is small
▪ If we do not reject H0, the instruments are weak and we need to find other instruments (and there is no need for further diagnostic tests)
▪ If we reject H0, the instruments are not weak.
▪ We then need to test for overidentifying restrictions (for exogeneity, if the number of instruments is greater than the number of endogenous variables)
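A common way to carry out this test is a first-stage F-test that the coefficients on the instruments are jointly zero. The sketch below (Python with NumPy, simulated data, hypothetical names and coefficients) computes that F statistic by comparing the first stage with and without the instruments. It is a hand-rolled illustration under those assumptions, not the output of any particular package.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

const = np.ones(n)
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))   # two candidate instruments

# Relevant instruments: x2 depends on z. Irrelevant: x2_weak ignores z.
x2_strong = 0.5 * x1 + z @ np.array([0.3, 0.3]) + rng.normal(size=n)
x2_weak = 0.5 * x1 + rng.normal(size=n)


def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid


def first_stage_F(x2, exog, instruments):
    """F statistic for H0: all instrument coefficients in the first stage are zero."""
    full = np.column_stack([exog, instruments])
    q = instruments.shape[1]             # number of restrictions
    df_resid = len(x2) - full.shape[1]   # residual degrees of freedom
    rss_r, rss_u = rss(exog, x2), rss(full, x2)
    return ((rss_r - rss_u) / q) / (rss_u / df_resid)


exog = np.column_stack([const, x1])
F_strong = first_stage_F(x2_strong, exog, z)
F_weak = first_stage_F(x2_weak, exog, z)

print(f"first-stage F, relevant instruments  : {F_strong:.1f}")
print(f"first-stage F, irrelevant instruments: {F_weak:.1f}")
```

The statistic is compared with an F distribution with $q$ and $n - $ (number of first-stage coefficients) degrees of freedom. A widely quoted rule of thumb treats a first-stage F below about 10 as a sign of weak instruments.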
TEST FOR WEAK INSTRUMENTS IN IVREG
TEST FOR ENDOGENEITY
▪ This is to test whether endogeneity exists in the model
▪ This test is based on the assumption that the IVs are good (not weak)
▪ If our IVs are not weak and there is endogeneity, the 2SLS coefficients should be different from the OLS coefficients.
▪ If the 2SLS coefficients are not different from the OLS coefficients, then there is no endogeneity.
▪ This is a Wu-Hausman test
▪ Null hypothesis H0: 2SLS coefficients = OLS coefficients
▪ If the p-value is large: do not reject H0, meaning no evidence of endogeneity, and OLS estimates are preferred
▪ Otherwise, if the p-value is small, reject H0, meaning there is endogeneity and 2SLS estimates are preferred.
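One regression-based form of the Wu-Hausman test adds the first-stage residuals to the structural equation and tests whether their coefficient is zero. The sketch below (Python with NumPy, simulated data, hypothetical names) illustrates that version for a single endogenous regressor. It is an assumption that this matches the exact variant reported by a given package.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

const = np.ones(n)
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))
eps = rng.normal(size=n)
x2 = 0.5 * x1 + z @ np.array([1.0, 1.0]) + 0.8 * eps + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + eps   # x2 is endogenous by construction


def ols(X, y):
    """Return OLS coefficients, conventional standard errors and residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return b, se, resid


# First stage: residuals of x2 on the exogenous regressors and instruments
Xz = np.column_stack([const, x1, z])
_, _, v_hat = ols(Xz, x2)

# Augmented regression: the structural equation plus the first-stage residuals
X_aug = np.column_stack([const, x1, x2, v_hat])
b_aug, se_aug, _ = ols(X_aug, y)

t_stat = b_aug[3] / se_aug[3]   # test of H0: coefficient on v_hat is zero

print(f"coefficient on first-stage residual: {b_aug[3]:.3f}")
print(f"t statistic: {t_stat:.2f}")
```

A large absolute t statistic (small p-value) rejects H0 and points to endogeneity. With one endogenous regressor the Wu-Hausman F statistic is the square of this t statistic, and the coefficient on $X_2$ in the augmented regression equals the 2SLS estimate.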
TEST FOR ENDOGENEITY IN IVREG
SARGAN TEST
FOR OVERIDENTIFYING RESTRICTIONS
▪ We need instruments that are not weak before performing the Sargan test.
▪ The Sargan test checks the exogeneity of the instruments (H0), which implies:
▪ the IVs are uncorrelated with the residuals of the second stage, and
▪ the equations are correctly specified, and no IV should in fact be included in the second stage.
▪ If the p-value is large, we do not reject H0, and the instruments are treated as valid.
▪ Otherwise, if the p-value is small, we reject H0 and at least one instrument is not valid.
▪ Note: if the number of IVs is equal to the number of endogenous variables, then we can't test whether the IVs are uncorrelated with the residuals of the second stage.
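The Sargan statistic can be computed as $n R^2$ from a regression of the 2SLS residuals on all instruments, and compared with a chi-squared distribution whose degrees of freedom equal the number of instruments minus the number of endogenous variables. The sketch below (Python with NumPy, simulated data, hypothetical names) uses two instruments and one endogenous regressor, so there is one degree of freedom. The second data set, in which an instrument enters the wage-style equation directly, is an assumption built to show a rejection.

```python
import math

import numpy as np

rng = np.random.default_rng(5)
n = 5_000

const = np.ones(n)
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))
eps = rng.normal(size=n)
x2 = 0.5 * x1 + z @ np.array([1.0, 1.0]) + 0.8 * eps + rng.normal(size=n)

y_valid = 1.0 + 0.5 * x1 + 2.0 * x2 + eps   # both instruments excluded from y
y_bad = y_valid + 0.5 * z[:, 1]             # second instrument affects y directly

R = np.column_stack([const, x1, x2])   # regressors of the structural equation
Xz = np.column_stack([const, x1, z])   # exogenous regressors plus instruments


def project(A, B):
    """Return P_A B, the projection of B on the columns of A."""
    return A @ np.linalg.solve(A.T @ A, A.T @ B)


def sargan(y, R, Xz):
    """Sargan statistic n * R^2 and its p-value for one overidentifying restriction."""
    R_hat = project(Xz, R)
    b = np.linalg.solve(R_hat.T @ R, R_hat.T @ y)   # 2SLS coefficients
    u = y - R @ b                                   # residuals use the actual regressors
    stat = len(y) * (u @ project(Xz, u)) / (u @ u)
    p_value = math.erfc(math.sqrt(stat / 2.0))      # chi-squared tail, 1 degree of freedom only
    return stat, p_value


stat_valid, p_valid = sargan(y_valid, R, Xz)
stat_bad, p_bad = sargan(y_bad, R, Xz)

print(f"valid instruments  : stat = {stat_valid:.2f}, p = {p_valid:.3f}")
print(f"invalid instrument : stat = {stat_bad:.2f}, p = {p_bad:.3g}")
```

The closed-form p-value used here holds only for one degree of freedom; with more overidentifying restrictions a general chi-squared distribution function is needed.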
TEST FOR OVERIDENTIFYING RESTRICTIONS
DIAGNOSTIC TESTS
AFTER IVREG
USING ROBUST VCV

Test statistics and p-values are different when using robust VCV.
GENERALIZED
METHOD OF MOMENTS
GENERALIZED METHOD OF MOMENTS
A BRIEF
▪ Without endogeneity and instruments, the regression equation is
$y_i = X_i \beta + \varepsilon_i$
▪ Without endogeneity, the error is uncorrelated with $X_i$
$E[X_i \varepsilon_i] = 0$
▪ This gives the moment conditions
$E[X_i (y_i - X_i \beta)] = 0$
▪ The sample moment conditions are
$\frac{1}{N} X'(y - X\beta) = 0$
▪ The GMM estimation method estimates $\beta$ by minimizing the objective function

$J(\beta) = \left[\frac{1}{N} X'(y - X\beta)\right]' W \left[\frac{1}{N} X'(y - X\beta)\right]$
▪ where $W$ is the weighting matrix.
GENERALIZED METHOD OF MOMENTS
CONNECTION WITH OLS AND GLS
▪ The GMM objective function

$J(\beta) = \left[\frac{1}{N} X'(y - X\beta)\right]' W \left[\frac{1}{N} X'(y - X\beta)\right]$
▪ If the weighting matrix is chosen as the identity matrix:
▪ minimizing the GMM objective function sets the sample moments $X'(y - X\beta)$ to zero, which are the normal equations obtained from minimizing the sum of squares of the residuals.
▪ The estimates are thus identical to OLS.

▪ If the residuals are weighted by the inverse of their VCV matrix, $\Omega^{-1}$, so that the sample moments are $\frac{1}{N} X'\Omega^{-1}(y - X\beta)$:
▪ the estimator minimizes the sum of squares of the residuals, weighted by the inverse VCV.
▪ The estimates are thus identical to GLS.
GENERALIZED METHOD OF MOMENTS
WITH ENDOGENEITY AND INSTRUMENTS
▪ Given the equation
$y_i = X_i \beta + \varepsilon_i$
▪ where some of the $X$s are endogenous, and
▪ $Z$ includes the exogenous $X$s and the instrumental variables.
▪ The moment conditions become $E[Z_i \varepsilon_i] = 0$, or
$E[Z_i (y_i - X_i \beta)] = 0$
▪ The sample moment conditions are
$\frac{1}{N} Z'(y - X\beta) = 0$
▪ The GMM objective function is then

$J(\beta) = \left[\frac{1}{N} Z'(y - X\beta)\right]' W \left[\frac{1}{N} Z'(y - X\beta)\right]$
▪ where $W$ is the weighting matrix.
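Because the model is linear, the minimizer of $J(\beta)$ has a closed form, $\hat{\beta} = (X'Z W Z'X)^{-1} X'Z W Z'y$. The sketch below (Python with NumPy, simulated data) evaluates it for two choices. The helper names `gmm_linear` and `J`, the data-generating values, and the choice of weighting matrices are assumptions for illustration, not the interface of the R gmm package.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000

const = np.ones(n)
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))
eps = rng.normal(size=n)
x2 = 0.5 * x1 + z @ np.array([1.0, 1.0]) + 0.8 * eps + rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + eps

X = np.column_stack([const, x1, x2])   # regressors (x2 endogenous)
Z = np.column_stack([const, x1, z])    # exogenous Xs plus instruments


def J(b, y, X, Z, W):
    """GMM objective: quadratic form in the sample moments Z'(y - Xb)/N."""
    g = Z.T @ (y - X @ b) / len(y)
    return g @ W @ g


def gmm_linear(y, X, Z, W):
    """Closed-form minimizer of J for a linear model with weighting matrix W."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))


# No instruments (Z = X) and W = I: the solution is OLS
b_gmm_ols = gmm_linear(y, X, X, np.eye(X.shape[1]))

# Instruments with W = (Z'Z/N)^{-1}: the solution is 2SLS
W_2sls = np.linalg.inv(Z.T @ Z / n)
b_gmm_iv = gmm_linear(y, X, Z, W_2sls)

print("GMM with Z = X, W = I       :", b_gmm_ols)
print("GMM with W = (Z'Z/N)^{-1}   :", b_gmm_iv)
print("J at the IV solution        :", J(b_gmm_iv, y, X, Z, W_2sls))
```

The weighting matrix $(Z'Z/N)^{-1}$ is the efficient choice only under homoskedasticity. Under heteroskedasticity a two-step estimator would replace it with the inverse of an estimated VCV matrix of the moments.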
GMM WITH R PACKAGE GMM
