Linear Models and Econometrics
Chapter 4: Econometrics
Basic Econometrics
Introduction: What is Econometrics?
Introduction
What is Econometrics?
Definition 1: Economic measurement.
Definition 2: The application of mathematical statistics to economic data to lend empirical support to the models of mathematical economics and to obtain numerical results (Gerhard Tintner, 1968).
Introduction
What is Econometrics?
Definition 3: The quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference (P.A. Samuelson, T.C. Koopmans and J.R.N. Stone, 1954).
Introduction
What is Econometrics?
Definition 4: The social science which applies economics, mathematics and statistical inference to the analysis of economic phenomena (Arthur S. Goldberger, 1964).
Definition 5: The empirical determination of economic laws (H. Theil, 1971).
Introduction
What is Econometrics?
Definition 6: A conjunction of economic theory and actual measurements, using the theory and technique of statistical inference as a bridge pier (T. Haavelmo, 1944).
And others.
[Diagram: Econometrics at the intersection of Economic Theory, Mathematical Economics, Economic Statistics, Mathematics, and Statistics]
Introduction
Why a separate discipline?
Economic theory makes statements that are mostly qualitative in nature, while econometrics gives empirical content to most economic theory.
Mathematical economics expresses economic theory in mathematical form without empirical verification of the theory, while econometrics is mainly interested in the latter.
Introduction
Why a separate discipline?
Economic statistics is mainly concerned with collecting, processing and presenting economic data. It is not concerned with using the collected data to test economic theories.
Mathematical statistics provides many of the tools for economic studies, but econometrics supplies the latter with many special methods of quantitative analysis based on economic data.
Introduction
Methodology of Econometrics
(1) Statement of theory or hypothesis
Example: Keynes postulated that the marginal propensity to consume (MPC), the rate of change of consumption for a unit change in income, is greater than 0 but less than 1.
(2) Specification of the mathematical model of the theory: Y = ß1 + ß2X, 0 < ß2 < 1
Introduction
Methodology of Econometrics
(3) Specification of the econometric model of the theory:
Y = ß1 + ß2X + u, 0 < ß2 < 1
Y = consumption expenditure; X = income
ß1 and ß2 are parameters: ß1 is the intercept and ß2 is the slope coefficient
u is the disturbance or error term; it is a random or stochastic variable
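Steps (2)-(3) can be made concrete with a minimal sketch: simulate data from a consumption function of this form and recover the parameters by OLS. All numbers below (the true ß1 and ß2, sample size, noise level) are hypothetical, chosen only for illustration.

    # A minimal sketch: simulate Y = ß1 + ß2*X + u and estimate by OLS.
    # The true values (ß1=200, ß2=0.7), sample size and noise level are
    # hypothetical, not taken from the text.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.uniform(1000, 6000, size=200)      # hypothetical income levels
    u = rng.normal(0, 100, size=200)           # stochastic disturbance term
    Y = 200 + 0.7 * X + u                      # consumption expenditure

    res = sm.OLS(Y, sm.add_constant(X)).fit()  # regress Y on a constant and X
    print(res.params)                          # estimates of ß1 (intercept) and ß2 (MPC)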
Introduction
Methodology of Econometrics
Y = personal consumption expenditure
X = gross domestic product
(all in billions of US dollars)
Introduction
Methodology of Econometrics
(4) Obtaining data
[Table of annual observations: Year, X, Y]
Introduction
Methodology of Econometrics
(8) Using the model for control or policy purposes
Suppose the estimated model is Y^ = -231.8 + 0.7194X. To reach a target expenditure of Y = 4000, solve 4000 = -231.8 + 0.7194X, which gives X ≈ 5882.
With MPC = 0.72, an income of $5882 billion will produce an expenditure of about $4000 billion. Through fiscal and monetary policy, the government can manipulate the control variable X to achieve the desired level of the target variable Y.
Introduction
Methodology of Econometrics
Figure 1.4: Anatomy of economic modelling
• 1) Economic theory
• 2) Mathematical model of theory
• 3) Econometric model of theory
• 4) Data
• 5) Estimation of econometric model
• 6) Hypothesis testing
• 7) Forecasting or prediction
• 8) Using the model for control or policy purposes
[Diagram: economic theory → estimation → hypothesis testing → forecasting → application in control or policy studies]
Basic Econometrics
Chapter 1: THE NATURE OF REGRESSION ANALYSIS
1-1. Historical origin of the term "Regression"
The term REGRESSION was introduced by Francis Galton.
There is a tendency for tall parents to have tall children and for short parents to have short children, but the average height of children born to parents of a given height tends to move (or "regress") toward the average height of the population as a whole (F. Galton, "Family Likeness in Stature").
1-1. Historical origin of the term "Regression"
Galton's law was confirmed by Karl Pearson: the average height of sons of a group of tall fathers is less than their fathers' height, and the average height of sons of a group of short fathers is greater than their fathers' height, thus "regressing" tall and short sons alike toward the average height of all men (K. Pearson and A. Lee, "On the Law of Inheritance").
In Galton's words, this was "regression to mediocrity".
1-2. Modern Interpretation of Regression Analysis
The modern interpretation: regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variables), with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
Examples: (pages 16-19)
Dependent Variable Y; Explanatory Variable Xs
1. Y = Son’s Height; X = Father’s Height
2. Y = Height of boys; X = Age of boys
3. Y = Personal Consumption Expenditure
X = Personal Disposable Income
4. Y = Demand; X = Price
5. Y = Rate of Change of Wages
X = Unemployment Rate
6. Y = Money/Income; X = Inflation Rate
7. Y = % Change in Demand; X = % Change in the
advertising budget
8. Y = Crop yield; Xs = temperature, rainfall, sunshine,
fertilizer
1-3. Statistical vs. Deterministic Relationships
In regression analysis we are concerned with STATISTICAL DEPENDENCE among variables (not functional or deterministic dependence); we essentially deal with RANDOM or STOCHASTIC variables, that is, variables with probability distributions.
1-4. Regression vs. Causation
Regression does not necessarily imply causation. A statistical relationship cannot logically imply causation. "A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other" (M.G. Kendall and A. Stuart, "The Advanced Theory of Statistics").
1-5. Regression vs. Correlation
Correlation analysis: the primary objective is to measure the strength or degree of linear association between two variables (both assumed to be random).
Regression analysis: we try to estimate or predict the average value of one variable (dependent, assumed to be stochastic) on the basis of the fixed values of other variables (independent, non-stochastic).
1-6. Terminology and Notation

Terms for Y (dependent variable)  | Terms for the X's (explanatory variables)
----------------------------------|-------------------------------------------
Dependent variable                | Explanatory variable(s)
Explained variable                | Independent variable(s)
Predictand                        | Predictor(s)
Regressand                        | Regressor(s)
Response                          | Stimulus or control variable(s)
Endogenous                        | Exogenous variable(s)
1-7. The Nature and Sources of Data for Econometric Analysis
1) Types of data:
Time series data;
Cross-sectional data;
Pooled data
1-8. Summary and Conclusions
Basic Econometrics
Chapter 2: TWO-VARIABLE REGRESSION ANALYSIS: Some Basic Ideas
2-1. A Hypothetical Example
Total population: 60 families
Y = weekly family consumption expenditure
X = weekly disposable family income
The 60 families are divided into 10 groups of approximately the same income level
(80, 100, 120, 140, 160, 180, 200, 220, 240, 260)
2-1. A Hypothetical Example
Table 2-2: Weekly family income X ($) and weekly family consumption Y ($)
[Table body omitted; the column totals for the ten income groups are 325, 462, 445, 707, 678, 750, 685, 1043, 966 and 1211]
2-2. The concept of the population regression function (PRF)
E(Y | Xi) = ß1 + ß2Xi: the conditional mean of Y is a (here, linear) function of Xi, whose parameters ß1 and ß2 are the regression coefficients
2-4. Stochastic Specification of the PRF
Ui = Yi - E(Y | X=Xi), or Yi = E(Y | X=Xi) + Ui
Ui is the stochastic disturbance or stochastic error term; it is the nonsystematic component
The component E(Y | X=Xi) is systematic or deterministic: it is the mean consumption expenditure of all the families with the same level of income
The assumption that the regression line passes through the conditional means of Y implies that E(Ui | Xi) = 0
2-5. The Significance of the Stochastic Disturbance Term
Ui, the stochastic disturbance term, is a surrogate for all the variables that are omitted from the model but collectively affect Y.
There are many reasons not to include such variables in the model, as follows:
2-5. The Significance of the Stochastic Disturbance Term
Why not include as many variables as possible in the model (the reasons for using ui):
+ Vagueness of theory
+ Unavailability of data
+ Core variables vs. peripheral variables
+ Intrinsic randomness in human behavior
+ Poor proxy variables
+ Principle of parsimony
+ Wrong functional form
2-6. The Sample Regression Function (SRF)

Table 2-4: A random sample     Table 2-5: Another random sample
from the population            from the population
  Y     X                        Y     X
------------------             -------------------
 70     80                      55     80
 65    100                      88    100
 90    120                      90    120
 95    140                      80    140
110    160                     118    160
115    180                     120    180
120    200                     145    200
140    220                     135    220
155    240                     145    240
150    260                     175    260
------------------             -------------------
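The point of the two tables is that different samples give different SRFs. A minimal sketch, fitting a straight line to each sample (the data are copied from Tables 2-4 and 2-5; everything else is illustration):

    import numpy as np

    X  = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
    y1 = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]   # Table 2-4
    y2 = [55, 88, 90, 80, 118, 120, 145, 135, 145, 175]   # Table 2-5

    # np.polyfit returns [slope, intercept] for a degree-1 fit
    b2_1, b1_1 = np.polyfit(X, y1, 1)
    b2_2, b1_2 = np.polyfit(X, y2, 1)
    print(f"SRF1: Y^ = {b1_1:.2f} + {b2_1:.4f} X")
    print(f"SRF2: Y^ = {b1_2:.2f} + {b2_2:.4f} X")
    # The two SRFs differ because of sampling fluctuation, even though
    # both samples come from the same population (the same PRF).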
[Figure: two sample regression lines, SRF1 and SRF2, fitted to weekly consumption expenditure (Y) against weekly income (X)]
2-7. Summary and Conclusions
For empirical purposes, it is the stochastic PRF that matters. The stochastic disturbance term ui plays a critical role in estimating the PRF.
The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Generally, one has a sample of observations from the population and uses the sample regression function (SRF) to estimate the PRF.
Basic Econometrics
Chapter 3: TWO-VARIABLE REGRESSION MODEL: The Problem of Estimation
3-1. The Method of Ordinary Least Squares (OLS)
Least-squares criterion: choose ß^1 and ß^2 to minimize the residual sum of squares
Σ u^i² = Σ (Yi - Y^i)², where Y^i = ß^1 + ß^2Xi
The resulting OLS estimators are
ß^2 = Σ xi yi / Σ xi² (xi, yi are deviations from the sample means) and ß^1 = Ȳ - ß^2 X̄
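A minimal sketch of these formulas in code, applied to the Table 2-4 sample (the data are the only inputs; everything else follows from the formulas above):

    import numpy as np

    X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], float)
    Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], float)

    x = X - X.mean()                      # deviations from the mean
    y = Y - Y.mean()
    b2 = (x * y).sum() / (x ** 2).sum()   # ß^2 = Σxy / Σx²
    b1 = Y.mean() - b2 * X.mean()         # ß^1 = Ȳ - ß^2·X̄
    resid = Y - (b1 + b2 * X)
    print(b1, b2, (resid ** 2).sum())     # estimates and the minimized Σu^i²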
3-5. The coefficient of determination r²: A measure of "goodness of fit"
TSS = Σ yi² = total sum of squares
ESS = Σ y^i² = ß^2² Σ xi² = explained sum of squares
RSS = Σ u^i² = residual sum of squares
Since TSS = ESS + RSS:
1 = ESS/TSS + RSS/TSS, or
1 = r² + RSS/TSS, or r² = 1 - RSS/TSS
3-5. The coefficient of determination r²: A measure of "goodness of fit"
r² = ESS/TSS is the coefficient of determination; it measures the proportion (or percentage) of the total variation in Y explained by the regression model
0 ≤ r² ≤ 1
r = ±√r² is the sample correlation coefficient
Some properties of r
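A minimal sketch computing TSS, ESS, RSS and r² for the Table 2-4 sample, verifying that the two expressions for r² agree:

    import numpy as np

    X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], float)
    Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], float)
    b2, b1 = np.polyfit(X, Y, 1)          # slope and intercept of the SRF
    Yhat = b1 + b2 * X

    TSS = ((Y - Y.mean()) ** 2).sum()     # total sum of squares
    ESS = ((Yhat - Y.mean()) ** 2).sum()  # explained sum of squares
    RSS = ((Y - Yhat) ** 2).sum()         # residual sum of squares
    r2 = ESS / TSS
    print(r2, 1 - RSS / TSS)              # the two expressions agree
    print(np.sqrt(r2))                    # |r|, the sample correlation coefficient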
Basic Econometrics
Chapter 4: THE NORMALITY ASSUMPTION: Classical Normal Linear Regression Model (CNLRM)
4-2. The Normality Assumption
CNLRM assumes that each ui is distributed normally, ui ~ N(0, σ²), with:
Mean: E(ui) = 0 (Assumption 3)
Variance: E(ui²) = σ² (Assumption 4)
Cov(ui, uj) = E(ui uj) = 0 for i ≠ j (Assumption 5)
Note: for two normally distributed variables, zero covariance or correlation means independence, so ui and uj are not only uncorrelated but also independently distributed. Therefore ui ~ NID(0, σ²): normally and independently distributed
4-2. The Normality Assumption
Why the normality assumption?
(1) With a few exceptions, the distribution of the sum of a large number of independent and identically distributed random variables tends to a normal distribution as the number of such variables increases indefinitely (the central limit theorem)
(2) Even if the number of variables is not very large, or they are not strictly independent, their sum may still be normally distributed
4-2. The Normality Assumption
Why the normality assumption?
(3) Under the normality assumption for ui, the OLS estimators ß^1 and ß^2 are also normally distributed
(4) The normal distribution is a comparatively simple distribution involving only two parameters (mean and variance)
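Point (3) can be seen in a small Monte Carlo sketch: draw many samples from a model with normal errors and re-estimate ß2 each time; the sampling distribution of ß^2 is centered on the true value with the theoretical standard error. The model and all parameter values below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(42)
    b1_true, b2_true, sigma, n = 10.0, 0.5, 2.0, 50   # hypothetical values
    X = rng.uniform(0, 100, size=n)                   # fixed in repeated sampling
    x = X - X.mean()

    estimates = []
    for _ in range(5000):
        Y = b1_true + b2_true * X + rng.normal(0, sigma, size=n)
        estimates.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())
    estimates = np.array(estimates)
    print(estimates.mean(), estimates.std())  # close to ß2 = 0.5 and to σ/√Σx²
    print(sigma / np.sqrt((x ** 2).sum()))    # theoretical standard error of ß^2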
4-3. Properties of OLS Estimators under the Normality Assumption
With the normality assumption, the OLS estimators ß^1, ß^2 and σ^2 have the following properties:
1. They are unbiased
2. They have minimum variance; combining 1 and 2, they are efficient estimators
3. They are consistent; that is, as the sample size increases indefinitely, the estimators converge to their true population values
4-3. Properties of OLS Estimators under the Normality Assumption
4. ß^1 is normally distributed: ß^1 ~ N(ß1, σ²ß1), and Z = (ß^1 - ß1)/σß1 is N(0, 1)
5. ß^2 is normally distributed: ß^2 ~ N(ß2, σ²ß2), and Z = (ß^2 - ß2)/σß2 is N(0, 1)
6. (n-2)σ^2/σ² is distributed as χ²(n-2)
4-3. Properties of OLS Estimators under the Normality Assumption
7. ß^1 and ß^2 are distributed independently of σ^2. They have minimum variance in the entire class of unbiased estimators, whether linear or not: they are best unbiased estimators (BUE)
8. If ui ~ N(0, σ²), then Yi ~ N[E(Yi), var(Yi)] = N[ß1 + ß2Xi, σ²]
Some last points of chapter 4

Basic Econometrics
Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
Chapter 5 TWO-VARIABLE REGRESSION:
Interval Estimation and Hypothesis Testing

Type of hypothesis | H0        | H1        | Reject H0 if
Two-tail           | ß2 = ß2*  | ß2 ≠ ß2*  | |t| > t(α/2, df)
Right-tail         | ß2 ≤ ß2*  | ß2 > ß2*  | t > t(α, df)
Left-tail          | ß2 ≥ ß2*  | ß2 < ß2*  | t < -t(α, df)
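A minimal sketch of these decision rules, using scipy for the critical values; the computed t statistic, significance level and degrees of freedom below are hypothetical placeholders:

    from scipy.stats import t

    t_stat, alpha, df = 2.41, 0.05, 8      # hypothetical computed t, level, df

    # Two-tail: reject H0 if |t| > t(alpha/2, df)
    print(abs(t_stat) > t.ppf(1 - alpha / 2, df))
    # Right-tail: reject H0 if t > t(alpha, df)
    print(t_stat > t.ppf(1 - alpha, df))
    # Left-tail: reject H0 if t < -t(alpha, df)
    print(t_stat < -t.ppf(1 - alpha, df))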
Chapter 5 TWO-VARIABLE REGRESSION:
Interval Estimation and Hypothesis Testing
5-7. Hypothesis Testing: The test-of-significance approach
Testing the significance of σ²: the χ² test
Under the normality assumption we have
χ² = (n-2) σ^2 / σ² ~ χ²(n-2)   (5.4.1)
From (5.4.2) and (5.4.3) on page 520 =>
Chapter 5 TWO-VARIABLE REGRESSION:
Interval Estimation and Hypothesis Testing
5-7. Hypothesis Testing: The test-of-significance approach
Table 5-2: A summary of the χ² test

H0       | H1       | Reject H0 if
σ² = σ0² | σ² > σ0² | df·σ^2/σ0² > χ²(α, df)
σ² = σ0² | σ² < σ0² | df·σ^2/σ0² < χ²(1-α, df)
σ² = σ0² | σ² ≠ σ0² | df·σ^2/σ0² > χ²(α/2, df) or < χ²(1-α/2, df)
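A minimal sketch of the χ² test of σ², with hypothetical values for n, σ^2 and the hypothesized σ0²:

    from scipy.stats import chi2

    n, sigma2_hat, sigma2_0, alpha = 30, 52.0, 40.0, 0.05   # hypothetical values
    df = n - 2
    stat = df * sigma2_hat / sigma2_0      # df·σ^2/σ0²

    # Right-tail test of H0: σ² = σ0² against H1: σ² > σ0²
    print(stat > chi2.ppf(1 - alpha, df))  # True means reject H0
    # Two-tail test against H1: σ² ≠ σ0²
    lo, hi = chi2.ppf(alpha / 2, df), chi2.ppf(1 - alpha / 2, df)
    print(stat < lo or stat > hi)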
Chapter 5 TWO-VARIABLE REGRESSION:
Interval Estimation and Hypothesis Testing
5-12. (Continued) An illustration:
Under the null hypothesis H0 that the residuals are normally distributed, Jarque and Bera showed that in large samples (asymptotically) the JB statistic given in (5.12.12) follows the chi-square distribution with 2 df. If the p-value of the computed chi-square statistic in an application is sufficiently low, one can reject the hypothesis that the residuals are normally distributed. But if the p-value is reasonably high, one does not reject the normality assumption.
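Concretely, JB = n[S²/6 + (K-3)²/24], where S and K are the skewness and kurtosis of the residuals. A minimal sketch using scipy's implementation; the residuals here are simulated stand-ins, not from any model in the text:

    import numpy as np
    from scipy.stats import jarque_bera

    rng = np.random.default_rng(1)
    resid = rng.normal(0, 1, size=500)   # stand-in for regression residuals

    res = jarque_bera(resid)             # JB statistic and its chi-square(2) p-value
    print(res.statistic, res.pvalue)
    # A low p-value rejects normality; a reasonably high one does not.
    print("reject normality" if res.pvalue < 0.05 else "do not reject normality")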
Basic Econometrics
Chapter 6: EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
Chapter 6 EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODELS
6-2. Scaling and units of measurement
6-4. How to measure elasticity: The log-linear model
Exponential regression model:
Yi = ß1 Xi^ß2 e^ui   (6.4.1)
Taking logarithms to the base e of both sides:
lnYi = lnß1 + ß2 lnXi + ui; by setting α = lnß1,
lnYi = α + ß2 lnXi + ui   (6.4.3)
(the log-log, double-log, or log-linear model)
This can be estimated by OLS by letting Y*i = α + ß2 X*i + ui, where Y*i = lnYi and X*i = lnXi
ß2 measures the ELASTICITY of Y with respect to X, that is, the percentage change in Y for a given (small) percentage change in X
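A minimal sketch of estimating (6.4.3): generate data from a constant-elasticity model and regress lnY on lnX; the slope recovers the elasticity ß2. All parameter values are hypothetical.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    X = rng.uniform(1, 100, size=300)
    u = rng.normal(0, 0.1, size=300)
    Y = 3.0 * X ** 0.8 * np.exp(u)       # Yi = ß1·Xi^ß2·e^ui with ß1=3, ß2=0.8

    res = sm.OLS(np.log(Y), sm.add_constant(np.log(X))).fit()
    alpha_hat, b2_hat = res.params
    print(np.exp(alpha_hat), b2_hat)     # ß1 ≈ 3.0 and the elasticity ß2 ≈ 0.8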
Basic Econometrics
Chapter 7: MULTIPLE REGRESSION ANALYSIS: The Problem of Estimation
7-1. The Three-Variable Model: Notation and Assumptions
Yi = ß1 + ß2X2i + ß3X3i + ui   (7.1.1)
ß2 and ß3 are partial regression coefficients
With the following assumptions:
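A minimal sketch of fitting (7.1.1) by OLS; the data are simulated with hypothetical parameter values, chosen only for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 200
    X2 = rng.normal(10, 2, size=n)
    X3 = rng.normal(5, 1, size=n)
    Y = 1.0 + 2.0 * X2 + 3.0 * X3 + rng.normal(0, 1, size=n)

    X = sm.add_constant(np.column_stack([X2, X3]))   # columns: [1, X2, X3]
    res = sm.OLS(Y, X).fit()
    print(res.params)   # ß^1, ß^2, ß^3
    print(res.bse)      # their standard errors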
7-3. The Meaning of Partial Regression Coefficients
Yi = ß1 + ß2X2i + ß3X3i + ... + ßkXki + ui
ßk measures the change in the mean value of Y per unit change in Xk, holding the remaining explanatory variables constant. It gives the "direct" effect of a unit change in Xk on E(Yi), net of the Xj (j ≠ k).
How to control for the "true" effect of a unit change in Xk on Y? (read pages 195-197)
7-4. OLS and ML Estimation of the Partial Regression Coefficients
This section (pages 197-201) provides:
1. The OLS estimators in the case of the three-variable regression Yi = ß1 + ß2X2i + ß3X3i + ui
2. Variances and standard errors of the OLS estimators
3. Eight properties of the OLS estimators (pp. 199-201)
4. An understanding of the ML estimators
7-5. The Multiple Coefficient of Determination R² and the Multiple Coefficient of Correlation R
This section provides:
1. The definition of R² in the context of multiple regression, analogous to r² in the two-variable case
2. R = √R² is the coefficient of multiple correlation; it measures the degree of association between Y and all the explanatory variables jointly
3. The variance of a partial regression coefficient:
Var(ß^k) = (σ² / Σ x²k) · (1/(1-R²k))   (7.5.6)
where ß^k is the partial regression coefficient of regressor Xk and R²k is the R² in the regression of Xk on the remaining regressors
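Formula (7.5.6) says the variance of ß^k is inflated by the factor 1/(1-R²k), the variance-inflation factor (VIF). A minimal sketch computing R²k and the VIF for one regressor, using hypothetical correlated data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 500
    X2 = rng.normal(size=n)
    X3 = 0.8 * X2 + rng.normal(scale=0.6, size=n)   # X3 correlated with X2

    # R²k: the R² from regressing X2 on the remaining regressor(s)
    r2_k = sm.OLS(X2, sm.add_constant(X3)).fit().rsquared
    vif = 1.0 / (1.0 - r2_k)
    print(r2_k, vif)   # the factor multiplying σ²/Σx²k in Var(ß^k)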
7-6. Example 7.1: The expectations-augmented Phillips Curve for the US (1970-1982)
7-7. Simple regression in the context of multiple regression: Introduction to specification bias
7-8. R² and the Adjusted R²
R² is a non-decreasing function of the number of explanatory variables: an additional X variable will not decrease R².
R² = ESS/TSS = 1 - RSS/TSS = 1 - Σ u^i² / Σ yi²   (7.8.1)
This creates the wrong incentive to add more (possibly irrelevant) variables to the regression, and motivates an adjusted R² (R-bar squared) that takes degrees of freedom into account:
R²bar = 1 - [Σ u^i²/(n-k)] / [Σ yi²/(n-1)], or   (7.8.2)
R²bar = 1 - σ^2/S²Y (S²Y is the sample variance of Y)
k = number of parameters including the intercept term
Substituting (7.8.1) into (7.8.2) we get
R²bar = 1 - (1-R²)(n-1)/(n-k)   (7.8.4)
For k > 1, R²bar < R²; thus as the number of X variables increases, R²bar increases by less than R², and R²bar can be negative.
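A minimal sketch verifying (7.8.4): compute R²bar both from statsmodels and from the formula, and note that adding an irrelevant regressor raises R² but can lower R²bar. The data are hypothetical.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 60
    X2 = rng.normal(size=n)
    noise = rng.normal(size=n)          # an irrelevant regressor
    Y = 1.0 + 2.0 * X2 + rng.normal(size=n)

    r1 = sm.OLS(Y, sm.add_constant(X2)).fit()
    r2 = sm.OLS(Y, sm.add_constant(np.column_stack([X2, noise]))).fit()
    print(r1.rsquared, r2.rsquared)           # R² never decreases
    print(r1.rsquared_adj, r2.rsquared_adj)   # R²bar can decrease

    k = 3                                     # parameters incl. intercept
    print(1 - (1 - r2.rsquared) * (n - 1) / (n - k))   # matches (7.8.4)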
7-10. Example 7.3: The Cobb-Douglas Production Function
More on functional form (7.11.3)
Example 7.4: Estimating the Total Cost Function
The data set is in Table 7.4; the empirical results are on page 221
--------------------------------------------------------------
7-12. Summary and Conclusions (page 221)
Basic Econometrics
Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
Chapter 8
MULTIPLE REGRESSION ANALYSIS:
The Problem of Inference
8-3. Hypothesis testing in multiple regression:
Testing hypotheses about an individual partial regression coefficient
Testing the overall significance of the estimated multiple regression model, that is, finding out whether all the partial slope coefficients are simultaneously equal to zero
Testing that two or more coefficients are equal to one another
Testing that the partial regression coefficients satisfy certain restrictions
Testing the stability of the estimated regression model over time or across different cross-sectional units
Testing the functional form of regression models
Chapter 8
MULTIPLE REGRESSION ANALYSIS:
The Problem of Inference
8-4. Hypothesis testing about individual partial regression coefficients
With the assumption that ui ~ N(0, σ²), we can use the t test to test a hypothesis about any individual partial regression coefficient:
H0: ß2 = 0
H1: ß2 ≠ 0
If the computed |t| value exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it.
Chapter 8
MULTIPLE REGRESSION ANALYSIS:
The Problem of Inference
8-5. Testing the overall significance of a multiple regression: The F test
For Yi = ß1 + ß2X2i + ß3X3i + ... + ßkXki + ui,
to test the hypothesis H0: ß2 = ß3 = ... = ßk = 0 (all slope coefficients are simultaneously zero) versus H1: not all slope coefficients are simultaneously zero, compute
F = (ESS/df)/(RSS/df) = (ESS/(k-1))/(RSS/(n-k))   (8.5.7)
(k = total number of parameters to be estimated, including the intercept)
If F > Fcritical = F(α; k-1, n-k), reject H0;
otherwise do not reject it.
Chapter 8
MULTIPLE REGRESSION ANALYSIS:
The Problem of Inference
8-5. Testing the overall significance of a multiple regression
Alternatively, if the p-value of the F obtained from (8.5.7) is sufficiently low, one can reject H0.
An important relationship between R² and F:
F = (ESS/(k-1))/(RSS/(n-k)), or
F = [R²/(k-1)] / [(1-R²)/(n-k)]   (8.5.1)
(see the proof on page 249)
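A minimal sketch of computing F from R² and getting its p-value with scipy; R², n and k below are hypothetical placeholders:

    from scipy.stats import f

    R2, n, k = 0.85, 30, 4        # hypothetical R², sample size, no. of parameters
    F = (R2 / (k - 1)) / ((1 - R2) / (n - k))
    p_value = f.sf(F, k - 1, n - k)    # upper-tail p-value of F(k-1, n-k)
    print(F, p_value)
    # Reject H0 (all slopes zero) if F exceeds the critical value,
    # or equivalently if the p-value is sufficiently low.
    print(F > f.ppf(0.95, k - 1, n - k))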
Chapter 8
MULTIPLE REGRESSION ANALYSIS:
The Problem of Inference
8-5. Testing the overall significance of a multiple regression in terms of R²
Chapter 8
MULTIPLE REGRESSION ANALYSIS:
The Problem of Inference
8-7. Restricted least squares: Testing linear equality restrictions
How to test (8.7.3):
The t-test approach (unrestricted): a test of the hypothesis H0: ß2 + ß3 = 1 can be conducted with the t statistic
t = [(ß^2 + ß^3) - (ß2 + ß3)] / se(ß^2 + ß^3)   (8.7.4)
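A minimal sketch of (8.7.4): the standard error of (ß^2 + ß^3) needs the covariance of the two estimates, which statsmodels exposes via cov_params(); res.t_test performs the same computation directly. The data are hypothetical, simulated so that ß2 + ß3 = 1 holds.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 200
    X2 = rng.normal(size=n)
    X3 = rng.normal(size=n)
    Y = 0.5 + 0.4 * X2 + 0.6 * X3 + rng.normal(size=n)   # ß2 + ß3 = 1

    res = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
    b2, b3 = res.params[1], res.params[2]
    V = res.cov_params()                  # covariance matrix of the estimates
    se = np.sqrt(V[1, 1] + V[2, 2] + 2 * V[1, 2])   # se(ß^2 + ß^3)
    t_stat = (b2 + b3 - 1.0) / se         # (8.7.4) with ß2 + ß3 = 1 under H0
    print(t_stat)
    print(res.t_test("x1 + x2 = 1"))      # the same test done by statsmodels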