
Econometrics II Chapter One

The document discusses the use of dummy variables in regression analysis, highlighting their role in incorporating qualitative data into models. It explains how dummy variables can represent categories such as gender or education levels, allowing for the analysis of their impact on dependent variables like salary or productivity. The document also provides examples and guidelines for constructing and interpreting regression models that include dummy variables.


Econometrics II

Econ 3062

By: Habtamu Legese (Asst.Prof)



Chapter One
Regression Analysis with Qualitative Data:
Binary (or Dummy Variables)



1.1 The nature of dummy variables

• In regression analysis the dependent variable is
frequently influenced not only by variables that can be
readily quantified on some well-defined scale, but also
by variables that are essentially qualitative in nature
(e.g., sex, race, colour, religion, nationality, wars,
earthquakes, strikes, political upheavals, and
changes in government economic policy).


Cont.
• For example, holding all other factors constant, female
daily wage workers are found to earn less than their
male counterparts, and nonwhites are found to earn
less than whites.
• This pattern may result from sex or racial
discrimination, but whatever the reason, qualitative
variables such as sex and race do influence the
dependent variable and clearly should be included
among the explanatory variables.



Cont.
• Qualitative variables usually indicate the
presence or absence of a “quality” or an
attribute, such as male or female, black or
white, or Christian or Muslim.

• One method of “quantifying” such attributes
is by constructing artificial variables that take
on values of 1 or 0, 0 indicating the absence of
an attribute and 1 indicating the presence (or
possession) of that attribute.


Cont.
•For example, 1 may indicate that a person is a
male, and 0 may designate a female; or 1 may
indicate that a person is a college graduate, and
0 that he is not, and so on.
•Variables that assume such 0 and 1 values are
called dummy variables.
•Alternative names are indicator variables,
binary variables, categorical variables, and
dichotomous variables.



Cont.
• Dummy variables can be used in regression models
just as easily as quantitative variables. As a matter of
fact, a regression model may contain explanatory
variables that are exclusively dummy, or qualitative,
in nature.

Example: $Y_i = \alpha + \beta D_i + u_i$ ------------------------------------------(1.01)

where $Y_i$ = annual salary of a college professor
$D_i = 1$ if male college professor
$D_i = 0$ otherwise (i.e., female professor)


Cont.
• Model (1.01) may enable us to find out whether sex
makes any difference in a college professor’s salary,
assuming, of course, that all other variables such as
age, degree attained, and years of experience are held
constant.
• Assuming that the disturbance satisfies the usual
assumptions of the classical linear regression model,
we obtain from (1.01):

Mean salary of female college professor: $E(Y_i \mid D_i = 0) = \alpha$ -------(1.02)

Mean salary of male college professor: $E(Y_i \mid D_i = 1) = \alpha + \beta$


Cont.
The intercept term $\alpha$ gives the mean salary of female college professors, and the slope
coefficient $\beta$ tells by how much the mean salary of a male college professor differs from the
mean salary of his female counterpart, $\alpha + \beta$ being the mean salary of the male college
professor.

A test of the null hypothesis that there is no sex discrimination ($H_0: \beta = 0$) can easily be made
by running regression (1.01) in the usual manner and checking whether, on the basis of the t
test, the estimated $\beta$ is statistically significant.


A. Dummy Independent Variable Models
1.2 Regression on one quantitative variable and one qualitative
variable with two classes, or categories

Consider the model: $Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i$ ---------------(1.03)

Where: $Y_i$ = annual salary of a college professor
$X_i$ = years of teaching experience
$D_i = 1$ if male
$D_i = 0$ otherwise


Cont.
• Model (1.03) contains one quantitative variable (years of
teaching experience) and one qualitative variable (sex)
that has two classes (or levels, classifications, or
categories), namely, male and female.

What is the meaning of this equation? Assuming, as usual, that $E(u_i) = 0$, we see that

Mean salary of female college professor: $E(Y_i \mid X_i, D_i = 0) = \alpha_1 + \beta X_i$ ---------(1.04)

Mean salary of male college professor: $E(Y_i \mid X_i, D_i = 1) = (\alpha_1 + \alpha_2) + \beta X_i$ ------(1.05)


Cont.
• Geometrically, we have the situation shown in fig. 1.1 (for
illustration, it is assumed that $\alpha_1 > 0$). In words, model (1.03)
postulates that the male and female college professors’ salary
functions in relation to the years of teaching experience have
the same slope but different intercepts.
• In other words, it is assumed that the level of the male
professor’s mean salary is different from that of the female
professor’s mean salary (by $\alpha_2$), but the rate of change in the
mean annual salary by years of experience is the same for both
sexes.
Cont.
• If the assumption of common slopes is valid, a
test of the hypothesis that the two regressions
(1.04) and (1.05) have the same intercept (i.e.,
there is no sex discrimination) can be made
easily by running the regression (1.03) and
noting the statistical significance of the
estimated $\alpha_2$ on the basis of the traditional t
test.
• If the t test shows that $\hat{\alpha}_2$ is statistically
significant, we reject the null hypothesis that the
male and female college professors’ levels of
mean annual salary are the same.
Cont.
• Before proceeding further, note the following features of the
dummy variable regression model considered previously:
1. To distinguish the two categories, male and female, we have
introduced only one dummy variable, $D_i$. For if $D_i = 1$ always
denotes a male, when $D_i = 0$ we know that it is a female, since
there are only two possible outcomes.
Hence, one dummy variable suffices to distinguish two
categories. The general rule is this: If a qualitative variable
has ‘m’ categories, introduce only ‘m-1’ dummy variables.


Cont.
• In our example, sex has two categories, and hence we
introduced only a single dummy variable. If this rule is not
followed, we shall fall into what might be called the dummy
variable trap, that is, the situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as
male and female, is arbitrary in the sense that in our example we
could have assigned D = 1 for female and D = 0 for male.



Cont.
3. The group, category, or classification that is assigned the
value of 0 is often referred to as the base, benchmark, control,
comparison, reference, or omitted category. It is the base in
the sense that comparisons are made with that category.

4. The coefficient $\alpha_2$ attached to the dummy variable D can
be called the differential intercept coefficient because it tells
by how much the value of the intercept term of the category that
receives the value of 1 differs from the intercept coefficient of
the base category.
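
The dummy variable trap mentioned in feature 1 can be checked numerically. The following is a minimal sketch, assuming a hypothetical sample of five professors; the data and the helper names `gram` and `det3` are illustrative and not part of the original slides. With an intercept plus one dummy per category, the dummy columns add up to the intercept column, so the matrix X'X is singular and OLS has no unique solution.

```python
# Dummy variable trap: intercept + one dummy per category => perfect multicollinearity.
# The sample below is hypothetical and only serves to illustrate the point.

def gram(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

male = [1, 0, 1, 1, 0]              # D = 1 if male
female = [1 - d for d in male]      # a second dummy for the same variable
ones = [1] * len(male)              # intercept column
experience = [3, 7, 10, 2, 5]       # hypothetical years of experience

# m dummies for m categories plus an intercept: male + female == ones
trap_det = det3(gram([ones, male, female]))

# m - 1 dummies (the rule in the text): intercept, male, experience
ok_det = det3(gram([ones, male, experience]))

print("det(X'X) with both dummies:", trap_det)
print("det(X'X) with one dummy:   ", ok_det)
```

A zero determinant in the first case means the normal equations cannot be solved uniquely, which is why only m-1 dummies are introduced for m categories.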



What is dummy variable ?
• In statistics and econometrics, particularly in regression
analysis, a dummy variable is one that takes only the
value 0 or 1 to indicate the absence or presence of
some categorical effect that may be expected to shift the
outcome.



What is the purpose of dummy variables?

• Dummy variables are useful because they enable us to use a
single regression equation to represent multiple groups.

• This means that we don't need to write out separate equation
models for each subgroup.

• The dummy variables act like 'switches' that turn various
parameters on and off in an equation.


How do you determine the number of dummy variables?

• The first step in this process is to decide the number of dummy
variables.
• This is easy; it's simply k-1, where k is the number of levels of
the original variable.
• You could also create dummy variables for all levels in the
original variable, and simply drop one from each analysis.


Is 0 male or female?
• In the case of gender, there is typically no natural
reason to code the variable female = 0, male = 1,
versus male = 0, female = 1.
• However, convention may suggest one coding is more
familiar to a reader; or choosing a coding that makes
the regression coefficient positive may ease
interpretation.



Can dummy variables be 1 and 2?
• Technically, dummy variables are dichotomous,
quantitative variables.
• Their range of values is small; they can take on only two
quantitative values.
• As a practical matter, regression results are easiest to interpret
when dummy variables are limited to two specific values, 1 or
0.



Why do we drop one dummy variable?



Numerical Example
• Let's say we want to analyze how a training program (dummy
variable) affects productivity (quantitative variable) measured
in units produced.

Employee   Training (Dummy Variable)   Productivity (Units Produced)
1          0                           50
2          1                           70
3          0                           60
4          1                           80
5          1                           75
6          0                           55


Step 1: Set Up the Variables
Dummy Variable (D):
• 0 if no training
• 1 if training
Quantitative Variable (Y):
• Productivity (units produced)
Step 2: Calculate the Means
Mean of Training (D) = 3/6 = 0.5
Mean of Productivity (Y) = 390/6 = 65


Step 3: Calculate the Regression Coefficients
• The regression equation can be expressed as:
$Y = \beta_0 + \beta_1 D$
• For simplicity, we’ll use only the dummy variable
and the productivity variable in this example.
• The coefficients are estimated by the Ordinary Least
Squares (OLS) method. For the sake of simplicity, let's compute
them manually using the following formulas:
$\hat{\beta}_1 = \dfrac{\sum (D - \bar{D})(Y - \bar{Y})}{\sum (D - \bar{D})^2}$, $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{D}$


Step 4: Calculate β1 (Coefficient for Training)

No.    D     Y     D-D̄     Y-Ȳ     (D-D̄)(Y-Ȳ)    (D-D̄)²
1      0     50    -0.5    -15     7.5            0.25
2      1     70     0.5      5     2.5            0.25
3      0     60    -0.5     -5     2.5            0.25
4      1     80     0.5     15     7.5            0.25
5      1     75     0.5     10     5              0.25
6      0     55    -0.5    -10     5              0.25
∑      3     390    0        0     30             1.5
Mean   0.5   65


Calculate the parameters

$\hat{\beta}_1 = \dfrac{30}{1.5} = 20$
$\hat{\beta}_0 = 65 - 20(0.5) = 55$

The regression equation is:
Y = 55 + 20D


Final Regression Equation
• The regression equation is:
Y = 55 + 20D
• Interpretation
• When the training program is not provided (D=0), the
expected productivity is 55 units.
• When the training program is provided (D=1), the
expected productivity increases to 75 units (55 + 20).
• This simple numerical example illustrates how to
perform linear regression of a quantitative dependent
variable on a single dummy explanatory variable.
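
The hand calculation above can be reproduced in a few lines. This is a minimal sketch using the six observations from the example; the variable names are chosen here for illustration. It also computes the two group means, to show that with a single dummy regressor the intercept equals the mean of the D = 0 group and the slope equals the difference between the two group means.

```python
# OLS with a single dummy regressor, using the six observations from the example.

training = [0, 1, 0, 1, 1, 0]             # D: 1 if the employee was trained
productivity = [50, 70, 60, 80, 75, 55]   # Y: units produced

n = len(training)
d_bar = sum(training) / n
y_bar = sum(productivity) / n

# beta1 = sum((D - D_bar)(Y - Y_bar)) / sum((D - D_bar)^2)
s_dy = sum((d - d_bar) * (y - y_bar) for d, y in zip(training, productivity))
s_dd = sum((d - d_bar) ** 2 for d in training)
beta1 = s_dy / s_dd
beta0 = y_bar - beta1 * d_bar

# Group means, for comparison with the fitted coefficients
mean_untrained = sum(y for d, y in zip(training, productivity) if d == 0) / training.count(0)
mean_trained = sum(y for d, y in zip(training, productivity) if d == 1) / training.count(1)

print(f"beta0 = {beta0}, beta1 = {beta1}")
print(f"mean (D=0) = {mean_untrained}, mean (D=1) = {mean_trained}")
```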



1.3 Regression on one quantitative variable and one
qualitative variable with more than two classes

• Suppose that, on the basis of cross-sectional data, we
want to regress the annual expenditure on health care by
an individual on the income and education of the
individual.
• Since the variable education is qualitative in nature,
suppose we consider three mutually exclusive levels of
education: less than high school, high school, and
college.


Cont.
• Now, unlike the previous case, we have more than
two categories of the qualitative variable education.

• Therefore, following the rule that the number of
dummies be one less than the number of categories
of the variable, we should introduce two dummies
to take care of the three levels of education.

• Assuming that the three educational groups have a
common slope but different intercepts in the
regression of annual expenditure on health care on
annual income, we can use the following model:
Cont.
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ --------------------------(1.06)

Where $Y_i$ = annual expenditure on health care
$X_i$ = annual income
$D_2 = 1$ if high school education
$D_2 = 0$ otherwise
$D_3 = 1$ if college education
$D_3 = 0$ otherwise


Cont.
• Note that in the preceding assignment of the dummy
variables we are arbitrarily treating the “less than
high school education” category as the base
category. Therefore, the intercept $\alpha_1$ will reflect the
intercept for this category.

• The differential intercepts $\alpha_2$ and $\alpha_3$ tell by how
much the intercepts of the other two categories
differ from the intercept of the base category,
which can be readily checked as follows:


Cont.

(Yi | D2 , we
•EAssuming D3  0from
0, obtain   1  X i
, X i )(1.06)
E (Yi | D2  1, D3  0, X i )  (1   2 )  X i
E(Yi | D2  0, D3  1, X i )  (1   3 )  X i

• Which are, respectively the mean health care


expenditure functions for the three levels of
education, namely, less than high school, high
school, and college.
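
The coding of the three education levels into two dummies can be sketched as follows. This is an illustrative sketch only: the function names, the category labels used as strings, and the coefficient values in the usage line are assumptions made here, not estimates from the slides.

```python
# Build m - 1 dummies for a qualitative variable with m = 3 categories.
# Labels and coefficient values are hypothetical.

LEVELS = ["less than high school", "high school", "college"]
BASE = "less than high school"   # omitted (base) category: D2 = D3 = 0

def education_dummies(level):
    """Return (D2, D3): D2 = 1 for high school, D3 = 1 for college."""
    if level not in LEVELS:
        raise ValueError(f"unknown education level: {level!r}")
    return (int(level == "high school"), int(level == "college"))

def mean_expenditure(level, income, a1, a2, a3, b):
    """Conditional mean implied by the model: (a1 + a2*D2 + a3*D3) + b*income."""
    d2, d3 = education_dummies(level)
    return a1 + a2 * d2 + a3 * d3 + b * income

for level in LEVELS:
    # Same slope b for every group; only the intercept shifts.
    print(level, education_dummies(level),
          mean_expenditure(level, income=100, a1=10, a2=2, a3=5, b=0.5))
```

Each observation gets at most one dummy equal to 1, and the base category is identified by both dummies being 0.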



Cont.
• Geometrically, the situation is shown in fig 1.2 (for illustrative
purposes it is assumed that $\alpha_3 > \alpha_2$).


1.4 Regression on one quantitative variable
and two qualitative variables
• The dummy variable technique can be
easily extended to handle more than one
qualitative variable.
• Let us revert to the college professors’ salary
regression (1.03), but now assume that in
addition to years of teaching experience and
sex, the skin colour of the teacher is also an
important determinant of salary.
• For simplicity, assume that colour has two
categories: black and white.
Cont.
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ --------------------------(1.07)
• Where:
$Y_i$ = annual salary
$X_i$ = years of teaching experience
$D_2 = 1$ if male, 0 otherwise
$D_3 = 1$ if white, 0 otherwise


Cont.

• Notice that each of the two qualitative variables, sex and color,
has two categories and hence needs one dummy variable for
each. Note also that the omitted, or base, category now is
“black female professor”.



Cont.
Assuming $E(u_i) = 0$, we can obtain the following regressions from (1.07):
Mean salary for black female professor:
$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$
Mean salary for black male professor:
$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
Mean salary for white female professor:
$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$
Mean salary for white male professor:
$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3) + \beta X_i$


Cont.
• Once again, it is assumed that the preceding regressions
differ only in the intercept coefficient but not in the slope
coefficient.
• An OLS estimation of (1.07) will enable us to test a variety
of hypotheses. Thus, if $\alpha_3$ is statistically significant, it will
mean that colour does affect a professor’s salary.
• Similarly, if $\alpha_2$ is statistically significant, it will mean that
sex also affects a professor’s salary. If both these differential
intercepts are statistically significant, it would mean that sex as
well as colour is an important determinant of professors’
salaries.


Cont.
•From the preceding discussion it follows that
we can extend our model to include more than
one quantitative variable and more than two
qualitative variables.
•The only precaution to be taken is that the
number of dummies for each qualitative
variable should be one less than the number of
categories of that variable.



1.5 Interaction effects
• Consider the following model:

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ ----------------------------(1.08)

where $Y_i$ = annual expenditure on clothing
$X_i$ = income
$D_2 = 1$ if female
$D_2 = 0$ if male
$D_3 = 1$ if college graduate
$D_3 = 0$ otherwise


Cont.
• The implicit assumption in this model is that the
differential effect of the sex dummy is constant
across the two levels of education and the differential
effect of the education dummy is also constant across
the two sexes.
• That is, if, say, the mean expenditure on clothing is
higher for females than males this is so whether they
are college graduates or not. Likewise, if, say,
college graduates on the average spend more on
clothing than non-college graduates, this is so whether
they are female or males.



Cont.
• In many applications, such an assumption may be
untenable. A female college graduate may spend
more on clothing than a male graduate.
• In other words, there may be interaction between the
two qualitative variables and therefore their effect
on mean Y may not be simply additive as in (1.08) but
multiplicative as well, as in the following model:

$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$ -----------------(1.09)

Cont.
• From (1.09) we obtain
$E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i$ ------------(1.10)
• which is the mean clothing expenditure of graduate
females. Notice that
• $\alpha_2$ = differential effect of being a female
• $\alpha_3$ = differential effect of being a college graduate
• $\alpha_4$ = differential effect of being a female graduate


Cont.
• If $\alpha_2$, $\alpha_3$, and $\alpha_4$ are all positive, the average
clothing expenditure of females is higher than the
base category (which here is male non-graduate),
but it is much more so if the females also happen
to be graduates.
•This shows how the interaction dummy modifies
the effect of the two attributes considered
individually.
•Whether the coefficient of the interaction dummy
is statistically significant can be tested by the
usual t test. Omitting a significant interaction
term will lead to a specification bias.
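
For reference, the four group means implied by the interaction model can be written out side by side. The block below is a worked restatement under the assumption that the coefficients are labelled $\alpha_1, \dots, \alpha_4$ and $\beta$, with $D_2 = 1$ for females and $D_3 = 1$ for college graduates; it adds no new result.

```latex
\begin{align*}
E(Y_i \mid D_2 = 0, D_3 = 0, X_i) &= \alpha_1 + \beta X_i
  && \text{(male non-graduate, base)} \\
E(Y_i \mid D_2 = 1, D_3 = 0, X_i) &= (\alpha_1 + \alpha_2) + \beta X_i
  && \text{(female non-graduate)} \\
E(Y_i \mid D_2 = 0, D_3 = 1, X_i) &= (\alpha_1 + \alpha_3) + \beta X_i
  && \text{(male graduate)} \\
E(Y_i \mid D_2 = 1, D_3 = 1, X_i) &= (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i
  && \text{(female graduate)}
\end{align*}
```

The interaction coefficient $\alpha_4$ appears only in the last line, so it measures how much the female-graduate intercept departs from the purely additive sum $\alpha_1 + \alpha_2 + \alpha_3$.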
Cont.
The importance of interactions among
dummy variables
• They help us to identify influential variables
• They help us to avoid misspecification bias


Slope indicator variables

• A slope indicator variable is the interaction between a dummy
variable and a quantitative variable. It affects only the slope,
i.e., it does not affect the intercept.
• It helps us to capture the interaction effect of
the dummy and quantitative variables on the
dependent variable.
• Look at the following example:
The price of a condominium house can be explained
as a function of its characteristics, such as its size,
location, number of bedrooms, age, floor, and so on.
Cont.
• For our discussion, let us assume that the
number of bedrooms of the house, nbdr, is the
only relevant variable in determining the
house price:
$prhou = \beta_0 + \beta_1 nbdr + u_i$

• $\beta_1$ is the value of an additional bedroom.
• $\beta_0$ is the value of the land alone.
• We use the terms dummy variable and indicator
variable interchangeably.
Cont.
$prhou = \beta_0 + \delta\, neib + \beta_1 nbdr + u_i$
$neib = 1$ if desirable neighbourhood
$neib = 0$ if not desirable neighbourhood

• We make the non-desirable neighbourhood the reference
group.

• Instead of assuming that the effect of location on
house price causes a change in the intercept,
let us assume that the change is in the slope of the
relationship.
Cont.
• We can allow for a change in the slope by
including in the model an additional
explanatory variable that is equal to the
product of an indicator variable and a
continuous variable.
• In our model, the slope of the relationship is
the value of an additional bedroom.
• If we assign the value 1 to homes in a desirable
neighbourhood, and 0 otherwise, we can specify
our model as follows:
$prhou = \beta_0 + \beta_1 nbdr + \gamma (nbdr \times neib) + u_i$
Cont.
• The new variable (nbdr*neib), the product of the
number of bedrooms and the indicator variable, is
called an interaction variable, as it captures the
interaction of location and number of bedrooms on
condominium house prices.
• It is also called a slope-indicator variable or a slope
dummy variable, because it allows for a change in the
slope of the relationship.
• The slope indicator variable takes a value equal to
nbdr for houses in the desirable neighbourhood, when
neib = 1, and it is 0 for homes in other
neighbourhoods.
Cont.
• A slope indicator variable is treated just like
any other explanatory variable in a regression
model (D denotes the indicator neib):
$E(prhou) = \beta_0 + \beta_1 nbdr + \gamma\, nbdr$ when D = 1
$E(prhou) = \beta_0 + \beta_1 nbdr$ when D = 0

• In the desirable neighbourhood, the price per
additional bedroom of a house is $\beta_1 + \gamma$.

• In the non-desirable neighbourhood, the price
per additional bedroom of a house is $\beta_1$.

• If $\gamma > 0$, the price per additional bedroom
is higher in the more desirable neighbourhood.
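Cont.
• The effect of including a slope indicator variable
can also be seen by using calculus.
• The partial derivative of the expected house price
with respect to the number of bedrooms is:
$\dfrac{\partial E(prhou)}{\partial nbdr} = \beta_1 + \gamma$ when D = 1
$\dfrac{\partial E(prhou)}{\partial nbdr} = \beta_1$ when D = 0
[Figure, for $\gamma > 0$: prhou plotted against nbdr, both lines starting from the intercept $\beta_0$. Desirable neighbourhood: $E(prhou) = \beta_0 + (\beta_1 + \gamma) nbdr$, slope $\beta_1 + \gamma$. Other neighbourhoods: $E(prhou) = \beta_0 + \beta_1 nbdr$, slope $\beta_1$.]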
Cont.

• If we assume that house location affects both the
intercept and the slope, then both can be
incorporated into a single model.
• The model specification will be:
$prhou = \beta_0 + \delta\, neib + \beta_1 nbdr + \gamma (nbdr \times neib) + u_i$

$E(prhou) = (\beta_0 + \delta) + (\beta_1 + \gamma)\, nbdr$ when D = 1
$E(prhou) = \beta_0 + \beta_1 nbdr$ when D = 0
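
The combined intercept-and-slope specification can be illustrated with a short sketch. The parameter names `b0`, `b1`, `delta` (intercept shift) and `gamma` (slope shift) are labels assumed here, and all numerical values are hypothetical rather than estimates from the slides.

```python
# Expected house price with an intercept dummy and a slope dummy.
# All parameter values used below are hypothetical.

def expected_price(nbdr, neib, b0, b1, delta, gamma):
    """E(prhou) = b0 + delta*neib + b1*nbdr + gamma*(nbdr*neib)."""
    return b0 + delta * neib + b1 * nbdr + gamma * (nbdr * neib)

def bedroom_slope(neib, b1, gamma):
    """Price of one additional bedroom: b1 + gamma if neib = 1, else b1."""
    return b1 + gamma * neib

params = dict(b0=100.0, b1=50.0, delta=20.0, gamma=10.0)

for neib in (0, 1):
    prices = [expected_price(n, neib, **params) for n in (1, 2, 3)]
    slope = bedroom_slope(neib, params["b1"], params["gamma"])
    print(f"neib={neib}: prices={prices}, slope={slope}")
```

Setting `delta = 0` gives the slope-only model, and setting `gamma = 0` gives the intercept-only model, so both earlier specifications are special cases of this one.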
Tests for Structural Change and Stability
•A fundamental assumption in regression
modeling is that the pattern of data on
dependent and independent variables remains
the same throughout the period over which the
data is collected.
• Under such an assumption, a single linear
regression model is fitted over the entire data
set.
• The regression model is estimated and used for
prediction assuming that the parameters
remain the same over the entire time period of
estimation and prediction.
• When it is suspected that there is a
change in the pattern of the data, fitting
a single linear regression model
may not be appropriate, and more than
one regression model may need to be
fitted.
• Before deciding whether to fit a single
regression model or more than one, the
question arises of how to test whether there
is a change in the structure or pattern of the
data.
• Such changes can be characterized by a
change in the parameters of the model and
are termed structural change.
Cont.
• Now we consider some examples to understand the
problem of structural change in the data.
• Suppose the data on the consumption pattern is available
for several years and suppose there was a war in
between the years over which the consumption data is
available.
• Obviously, the consumption pattern before and after the
war does not remain the same as the economy of the
country gets disturbed.
• So if a model is fitted then the regression coefficients
before and after the war period will change. Such a
change is referred to as a structural break or
structural change in the data.
• A better option, in this case, would be to fit two
different linear regression models- one for the data
before the war and another for the data after the war.
Cont.
Testing for structural stability will help us to find
out whether two or more regressions are different,
where the difference may be in the intercepts or
the slopes or both.
Suppose we are interested in estimating a
simple saving function that relates domestic
household savings (S) with gross domestic
product (Y) for Ethiopia.
Suppose further that, at a certain point of
time (1991), a series of economic reforms
have been introduced.

Cont.
• So far we have assumed that the intercept and
all the slope coefficients (βj's) are the
same/stable for the whole set of
observations: Y = Xβ + e
• But structural shifts and/or group
differences are common in the real world.
It may be that:
the intercept differs/changes, or
the slope differs/changes, or
both differ/change across categories or
time periods.
Cont.
• The hypothesis here is that such reforms might have
considerably influenced the savings-income
relationship, that is, the relationship between savings
and income might be different in the post-reform
period as compared to that in the pre-reform period.
• If this hypothesis is true, then we say a structural
change has happened.

• H0: Economic reforms have not influenced the
savings and national income relationship
• H1: Economic reforms have influenced the
savings and national income relationship
• How do we check if this is so?
Cont.
• We can test the structural stability of the parameters
by using two methods:

1. Dummy variables
2. Chow test
1. Using dummy variables
• Write the savings function as:
$S_t = \beta_0 + \beta_1 D_t + \beta_2 Y_t + \beta_3 (Y_t D_t) + u_t$
where $S_t$ is household saving at time t, $Y_t$ is GDP at time t, and
$D_t = 0$ if pre-reform period
$D_t = 1$ if post-reform period (the reforms were introduced in 1991)
Cont.
• $\beta_3$ is the differential slope coefficient,
indicating by how much the slope coefficient of the
post-reform savings function differs from
the slope coefficient of the savings function in
the pre-reform period.
Decision rule:
• If $\beta_1$ and $\beta_3$ are both statistically significant as
judged by the t-test, the pre-reform and post-reform
regressions differ in both the intercept
and the slope.
Cont.
• If only $\beta_1$ is statistically significant, then the
pre-reform and post-reform regressions differ
only in the intercept (meaning the marginal
propensity to save (MPS) is the same for the
pre-reform and post-reform periods).
• If only $\beta_3$ is statistically significant, then the
two regressions differ only in the slope (MPS).
• Check structural stability for the following
regression result:
Ŝt  20.76005  5.9991 D̂t  2.616285 Yˆ  0.5298177 (Yˆ D̂ )
t t t

Cont.
Example 2: Using the dummy variable regression (DVR) to test for a
structural break:
• Recall the example of the consumption function:
Period 1: cons_i = α1 + β1*inc_i + u_i
vs. Period 2: cons_i = α2 + β2*inc_i + u_i
• Let’s define a dummy variable D1, where:
D1 = 1 for the period 1974-1991, and
D1 = 0 for the period 1992-2006
• Then, cons_i = α0 + α1*D1 + β0*inc_i + β1(D1*inc_i) + u_i
For period 1: cons_i = (α0+α1) + (β0+β1)*inc_i + u_i
For period 2 (base category): cons_i = α0 + β0*inc_i + u_i
• Regressing cons on inc, D1 and (D1*inc) gives:
cons = 1.95 + 152D1 + 0.806*inc - 0.056(D1*inc)
p-value: (0.968) (0.010) (0.000) (0.002)
Cont.
• D1 = 1 for i ∈ period 1 and D1 = 0 for i ∈ period 2:
• Period 1 (1974-1991): cons = 153.95 + 0.75*inc
• Period 2 (1992-2006): cons = 1.95 + 0.806*inc
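
The two period-specific lines follow mechanically from the pooled estimates: switch the dummy on or off and collect terms. A minimal sketch, using the four coefficients reported in the example (the function name `period_line` is an illustrative choice):

```python
# Recover the two period-specific consumption functions from the pooled
# dummy variable regression: cons = a0 + a1*D1 + b0*inc + b1*(D1*inc).

a0, a1, b0, b1 = 1.95, 152.0, 0.806, -0.056   # estimates reported in the example

def period_line(d1):
    """Return (intercept, slope) for D1 = 1 (1974-1991) or D1 = 0 (base period)."""
    return (a0 + a1 * d1, b0 + b1 * d1)

for d1, label in ((1, "Period 1 (D1 = 1)"), (0, "Period 2 (D1 = 0)")):
    intercept, mpc = period_line(d1)
    print(f"{label}: cons = {intercept:.2f} + {mpc:.3f}*inc")
```

The differential intercept (152) and differential slope (-0.056) are exactly the gaps between the two fitted lines.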
2. Chow Test
• A Chow test is a statistical test developed by
economist Gregory Chow that is used to test
whether the coefficients in two different
regression models on different datasets are
equal.
• The Chow test is typically used in the field of
econometrics with time series data to
determine if there is a structural break in the
data at some point.

When to use the Chow Test
The following examples illustrate situations where
you may wish to perform a Chow test:
1. To determine if stock prices change at different
rates before and after an election.
2. To determine if housing prices change before
and after an interest rate change.
3. To determine if the average profit of public
companies is different before and after a new
tax law is passed.
In each situation, we could use a Chow test to
determine if there is a structural break point in the
data at a certain point in time.
2. Chow’s test
• One approach for testing the presence of structural change (structural
instability) is by means of Chow’s test. The steps involved in this
procedure are:
• Step 1: Estimate the regression equation for the whole period (pre-reform
plus post-reform periods) and find the error sum of
squares ($RSS_R$, also written $ESS_R$ or RRSS).
• Step 2: Estimate the equation (model) using the available data in the
pre-reform period (say, of size $n_1$), and find the error sum of squares
$RSS_1$ (or $ESS_1$).
• Step 3: Estimate the equation (model) using the available data in the
post-reform period (say, of size $n_2$), and find the error sum of squares
$RSS_2$ (or $ESS_2$).
• Step 4: Calculate $RSS_U = RSS_1 + RSS_2$.
• Step 5: Calculate the Chow test statistic
$F_c = \dfrac{(RSS_R - RSS_U)/k}{RSS_U/(n_1 + n_2 - 2k)}$
• where k is the number of estimated regression coefficients, including the
intercept.
Chow Test
$F = \dfrac{[RSS_c - (RSS_1 + RSS_2)]/k}{(RSS_1 + RSS_2)/(n - 2k)}$
$RSS_c$ = combined RSS
$RSS_1$ = pre-break RSS
$RSS_2$ = post-break RSS
Cont.

• $F_{tab} = F_{(k,\; n_1 + n_2 - 2k)}$ is the critical value from the
F-distribution with k (in our case k = 2) and
$n_1 + n_2 - 2k$ degrees of freedom at a given
significance level.
• Decision rule: Reject the null hypothesis $H_0$ of
identical intercepts and slopes for the pre-reform
and post-reform periods if $F_c > F_{tab}$.
• i.e., rejecting $H_0$ means there is a structural
change.
Cont.
Example: $RSS_1$ = 64,499,436.865 (error sum of
squares in the pre-reform period); $n_1$ = 12;
$RSS_2$ = 2,726,652,790.434 (error sum of squares in
the post-reform period); $n_2$ = 11;
$RSS_R$ = 13,937,337,067.461 (error sum of squares
for the whole period)
• $RSS_U = RSS_1 + RSS_2$ = 2,791,152,227.299
• The test statistic is:

$F_c = \dfrac{(RSS_R - RSS_U)/k}{RSS_U/(n_1 + n_2 - 2k)} = \dfrac{(13{,}937{,}337{,}067.461 - 2{,}791{,}152{,}227.299)/2}{2{,}791{,}152{,}227.299/(12 + 11 - 2(2))} \approx 37.94$

• The tabulated value from the F-distribution with 2
and 19 degrees of freedom at the 5% level of
significance is 3.52.
Cont.
 Decision: Since the calculated value of F exceeds
the tabulated value, we reject the null hypothesis
of identical intercepts and slopes for the pre-
reform and post reform periods at the 5% level of
significance.
 Hence, we can conclude that there is a structural
break.

Cont.
Drawbacks:
• Chow’s test does not tell us whether the
difference (change) is in the slope only, in the
intercept only, or in both the intercept and the
slope.
The Chow Test
• It uses an F-test to determine whether a single
regression is more efficient than two or more
separate regressions on sub-samples.
Example
Suppose we have the following results from estimation
of consumption from disposable income:
i. For the period 1974-1991: cons_i = α1 + β1*inc_i + u_i
Consumption = 153.95 + 0.75*Income
p-value: (0.000) (0.000)
RSS = 4340.26114; R² = 0.9982
ii. For the period 1992-2006: cons_i = α2 + β2*inc_i + u_i
Consumption = 1.95 + 0.806*Income
p-value: (0.975) (0.000)
RSS = 10706.2127; R² = 0.9949
iii. For the period 1974-2006: cons_i = α + β*inc_i + u_i
Consumption = 77.64 + 0.79*Income
t-ratio: (4.96) (155.56)
RSS = 22064.6663; R² = 0.9987
Cont.
1. URSS = RSS1 + RSS2 = 4340.26114 + 10706.2127 = 15046.474
2. RRSS = 22064.6663
 K = 1 and K + 1 = 2; n1 = 18, n2 = 15, n = 33.
3. Thus,
$$F_{cal} = \frac{[22064.6663 - 15046.474]/2}{15046.474/29} \approx 6.7633$$
4. The tabulated value from the F-distribution with 2 and 29 degrees of freedom at the 5% level of significance is 3.33.
5. Since 6.76 > 3.33, reject H0 at α = 5% (the statistic also exceeds the 1% critical value of about 5.42). Thus, there is a structural break.
The pooled consumption model is an
inadequate specification; we should run separate
regressions.
 The above method of calculating the Chow
test breaks down if either n1 < K+1 or n2 < K+1.
 Solution: use Chow’s second (predictive) test!
Cont.
If, for instance, n2 < K+1, then the F-statistic is altered as follows:
$$F_{cal} = \frac{[RRSS - RSS_1]/n_2}{RSS_1/[n_1 - (K + 1)]}$$
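A small helper makes the predictive version concrete. This is a sketch only: the function name `chow_predictive_f` and the numbers passed to it are hypothetical, chosen so that the second sub-sample (one observation) is too short for a separate regression with K = 1.

```python
def chow_predictive_f(rrss, rss_1, n1, n2, K):
    """Chow's predictive F statistic.

    rrss  : RSS from the pooled regression on all n1 + n2 observations
    rss_1 : RSS from the regression on the longer sub-sample (n1 observations)
    K     : number of slope coefficients (K + 1 parameters with the intercept)
    Returns the statistic and its (numerator, denominator) degrees of freedom.
    """
    df_denominator = n1 - (K + 1)
    if df_denominator <= 0:
        raise ValueError("the first sub-sample must have more than K + 1 observations")
    f_stat = ((rrss - rss_1) / n2) / (rss_1 / df_denominator)
    return f_stat, (n2, df_denominator)


# Hypothetical inputs: 20 observations before the break, 1 after it, K = 1.
f_stat, dfs = chow_predictive_f(rrss=130.0, rss_1=90.0, n1=20, n2=1, K=1)
print(f"F = {f_stat:.2f} with degrees of freedom {dfs}")
```

The statistic is compared with the F-distribution with n2 and n1 - (K + 1) degrees of freedom.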
 The Chow test tells if the parameters differ on
average, but not which parameters differ.
 Also, it requires that all groups have the same error variance σ². This assumption is questionable: if the parameters can differ, then so can the variances.
 One way of correcting for unequal σ² is to use dummy variable regression with robust standard errors.
Using Dummy variables vs Chow’s test
Comparing the two methods, it is preferable to use
the method of dummy variables regression.
 This is because with the method of DVR:
1. We run only one regression.
2. We can test whether the change is in the intercept only, in the slope only, or in both.
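The dummy variable approach can be sketched as one regression of Y on a constant, a period dummy D, the regressor X, and the interaction D*X. The coefficient on D then measures the intercept shift and the coefficient on D*X the slope shift. The code below is a minimal pure-Python illustration under stated assumptions: the data are hypothetical and noise-free, and the helpers `solve_linear_system` and `ols` are written for this sketch rather than taken from any library.

```python
def solve_linear_system(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: y = 2 + 0.5x before the break and y = 5 + 0.8x after it,
# so the true intercept shift is 3 and the true slope shift is 0.3.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]                   # D = 1 after the break
y = [2 + 0.5 * xi if di == 0 else 5 + 0.8 * xi for xi, di in zip(x, d)]

# Regressors in each row: constant, D, X, D*X
rows = [[1.0, float(di), float(xi), float(di * xi)] for xi, di in zip(x, d)]
intercept, intercept_shift, slope, slope_shift = ols(rows, y)
print(f"intercept = {intercept:.3f}, intercept shift = {intercept_shift:.3f}")
print(f"slope = {slope:.3f}, slope shift = {slope_shift:.3f}")
```

With real data, a t-test on each shift coefficient shows whether the change lies in the intercept, the slope, or both, which the Chow test alone cannot reveal.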
Dummy dependent variable
(Qualitative Response Model)
A qualitative response model describes situations in which the dependent variable in a regression equation represents a discrete choice and takes only a limited number of values.
 Such a model is also called a:
 Limited dependent variable model
 Discrete dependent variable model
 Qualitative response model
Categories of Qualitative Response Models
 There are two broad categories of QRM
1. Binomial model: the choice is between two alternatives.
e.g.: Decision to participate in the labor force or not
2. Multinomial models: the choice is among more than two alternatives.
e.g.: Y = 1, occupation is farming
      = 2, occupation is carpentry
      = 0, government employee
Important terminologies
Binary variables: variables that have two categories and are used to indicate that an event has occurred or that some characteristic is present.
Cont.
Ordinal variables: variables whose categories can be ranked.
Example: Rank according to educational attainment (Y)
Y = 0 if primary education
  = 1 if secondary education
  = 2 if university education
Nominal variables: variables that arise when there are multiple outcomes that cannot be ordered.
Cont.
Example: Occupation can be grouped as farming, fishing, carpentry, etc.
Y = 0 if farming
  = 1 if fishing
  = 2 if carpentry
  = 3 if government employee
N.B.: Numbers are assigned arbitrarily.
Count variables: indicate the number of times some event has occurred.
Example: How many years of education have you attended?
In all of the above situations, the variables are discrete-valued.
Qualitative Choice Analysis
In such cases instead of standard regression
models, we apply different methods of modeling
and analyzing discrete data.
Qualitative choice models may be used when a decision maker faces a choice set with the following properties:
 The number of choices is finite.
 The choices are mutually exclusive (the person chooses only one of the alternatives).
 The choices are exhaustive (all possible alternatives are included).
Cont.
Throughout our discussion we shall restrict
ourselves to cases of qualitative choice where the
set of alternatives is binary.
For the sake of convenience the dependent
variable is given a value of 0 or 1.
Example: Suppose the choice is whether to work or not. The discrete dependent variable we are working with will assume only two values, 0 and 1:
Yi = 1 if the i-th individual is working
   = 0 if the i-th individual is not working
where i = 1, 2, …, n.
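As a small illustration of this coding, the sketch below turns a list of employment statuses into the 0/1 variable Y. The five observations are hypothetical. Note that the sample mean of Y equals the share of individuals who are working.

```python
# Hypothetical survey responses for n = 5 individuals.
statuses = ["working", "not working", "working", "working", "not working"]

# Y_i = 1 if the i-th individual is working, 0 otherwise.
y = [1 if status == "working" else 0 for status in statuses]

# The mean of a 0/1 variable is the sample proportion with Y = 1.
share_working = sum(y) / len(y)

print(y)              # [1, 0, 1, 1, 0]
print(share_working)  # 0.6
```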
Group Assignment
The four most commonly used approaches to estimating binary response models (types of binomial models) are:
 Linear probability models
 The logit model
 The probit model
 The tobit (censored regression) model
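The slides only name these models. As a rough orientation, and assuming the standard textbook definitions, the first three differ in how they map a linear index z = a + b*x into P(Y = 1): the linear probability model uses z itself, the logit model uses the logistic function, and the probit model uses the standard normal CDF. The function names and index values below are illustrative, and the tobit model is not sketched.

```python
import math


def linear_probability(z):
    """Linear probability model: P(Y = 1) = z (can fall outside [0, 1])."""
    return z


def logit_probability(z):
    """Logit model: P(Y = 1) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_probability(z):
    """Probit model: P(Y = 1) = standard normal CDF evaluated at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values of the linear index z.
for z in (-2.0, 0.0, 1.5):
    print(
        f"z = {z:+.1f}: LPM = {linear_probability(z):+.3f}, "
        f"logit = {logit_probability(z):.3f}, probit = {probit_probability(z):.3f}"
    )
```

The loop shows that the linear probability model can return values below 0 or above 1, whereas the logit and probit values always stay between 0 and 1.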
Thank You!