Lecture 08 Dummy Variables
Dummy variables
Frequently, some factors we would like to include in a regression model are qualitative in
nature and, therefore, not numerically measurable. One possible approach is to divide the
observations into several groups according to whether they possess a certain qualitative
characteristic, and then analyze the differences between the regression coefficients across the
respective groups. Alternatively, one could estimate a single regression for all observations, measuring the
influence of the qualitative factor by introducing a so-called dummy variable. This variable takes the
value of either zero or one, depending on whether the given observation possesses the qualitative
characteristic we want to account for. It allows us to test the significance of the effect of the
corresponding qualitative factor. Moreover, under certain assumptions the regression estimates become
more efficient. This lecture analyzes different ways to include dummy variables in the model, in
accordance with the initial hypothesis of how the difference in qualitative characteristics can affect
the relationship. First, we will illustrate where and how dummy variables are used by looking at different
examples. Second, the lecture will describe different types of dummy variables, including intercept,
slope, and interactive dummy variables. Then it will discuss what the dummy variable trap is and how
estimation results depend on the choice of a reference category. Finally, we will examine the
Chow test, which enables us to compare relationships across different subsamples.
Intercept dummy variables:
The simplest approach assumes that the qualitative variables introduced into the regression are
responsible only for shifts in the constant term: the slope of the regression line is identical for each
category of the qualitative variables. In other words, marginal effects are not affected by the inclusion
of a qualitative characteristic. Consider a model with one regressor 𝑋2 and one dummy variable 𝐷:
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 ⋅ 𝐷 + 𝑢
Slope dummy variables:
The assumption that each category of the qualitative variable does not influence the slope of the
regression line is not always plausible. Sometimes we might want to allow the slope coefficients on
other variables to vary between groups, thus accounting for different marginal effects. This can be
done by creating a slope dummy, equal to a dummy variable times another variable:
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 (𝐷 ⋅ 𝑋2 ) + 𝑢
Combining both types in one model, 𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 ⋅ 𝐷 + 𝛽4 (𝐷 ⋅ 𝑋2 ) + 𝑢, gives two regression lines:
If 𝐷 = 0, then 𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝑢;
If 𝐷 = 1, then 𝑌 = (𝛽1 + 𝛽3 ) + (𝛽2 + 𝛽4 )𝑋2 + 𝑢.
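As an illustration, here is a minimal sketch (Python, simulated data, assumed "true" parameter values) of estimating this combined intercept-and-slope-dummy model by OLS and recovering the two group-specific lines:

```python
# A minimal sketch (simulated data, assumed "true" parameter values) of
# estimating Y = b1 + b2*X2 + b3*D + b4*(D*X2) + u by OLS.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x2 = rng.uniform(0, 10, n)
d = rng.integers(0, 2, n)                              # dummy variable: 0 or 1
y = 2.0 + 0.5 * x2 + 1.5 * d + 0.3 * d * x2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([x2, d, d * x2]))  # columns: 1, X2, D, D*X2
res = sm.OLS(y, X).fit()
b1, b2, b3, b4 = res.params
print(f"D = 0 line: intercept {b1:.2f}, slope {b2:.2f}")
print(f"D = 1 line: intercept {b1 + b3:.2f}, slope {b2 + b4:.2f}")
```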
Interactive dummy variables:
An interactive dummy variable is the product of two dummy variables. For example, in an earnings
regression containing the dummies MALE and WHITE, one can add their interaction
MALEWHITE = MALE ⋅ WHITE with coefficient 𝛽5.
[Figure: earnings functions for WHITE and NON-WHITE respondents]
So, the inclusion of this interaction variable allows us to answer the question: are there ethnic
variations in the effect of the gender of a respondent on earnings? Formally,
𝐻0 : 𝛽5 = 0
If 𝛽5 is significant, then the estimate 𝑏5 of the coefficient on the variable MALEWHITE shows that
the observed interaction between gender and ethnicity (in the sense that there is a significant ethnic
variation in the effect of gender) makes it possible for a white male respondent to earn 𝑏5 % more.
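A hedged sketch of this t-test on simulated data; the names MALE, WHITE, and the interaction follow the lecture, while LGEARN (log earnings) and S (schooling) are assumed variable names:

```python
# Hedged sketch of the t-test on the interaction dummy; data are simulated,
# and LGEARN (log earnings) and S (schooling) are assumed variable names.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "MALE": rng.integers(0, 2, n),
    "WHITE": rng.integers(0, 2, n),
    "S": rng.uniform(8, 20, n),
})
# Assumed data-generating process with a genuine gender-ethnicity interaction
df["LGEARN"] = (1.0 + 0.08 * df["S"] + 0.10 * df["MALE"] + 0.05 * df["WHITE"]
                + 0.07 * df["MALE"] * df["WHITE"] + rng.normal(0, 0.3, n))

# The ':' term creates MALEWHITE = MALE * WHITE inside the regression
res = smf.ols("LGEARN ~ S + MALE + WHITE + MALE:WHITE", data=df).fit()
print(res.tvalues["MALE:WHITE"], res.pvalues["MALE:WHITE"])  # H0: beta5 = 0
```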
The dummy variable trap:
Consider the following model:
𝑌 = 𝛽1 + 𝛽2 𝑋2 + … + 𝛽𝑘 𝑋𝑘 + 𝛿2 𝐷2 + … + 𝛿𝑠 𝐷𝑠 + 𝑢 (1)
So, the qualitative variable has 𝑠 categories. As was discussed, the general procedure is to
include 𝑠 − 1 dummies in the model. The dummy variable trap occurs when the dummy for the
reference category is also included together with the constant term, and as a result it becomes
impossible to fit the model:
𝑌 = 𝛽1 + 𝛽2 𝑋2 + … + 𝛽𝑘 𝑋𝑘 + 𝛿1 𝐷1 + 𝛿2 𝐷2 + … + 𝛿𝑠 𝐷𝑠 + 𝑢 (2) – the dummy variable trap.
Reasons:
1) Intuitive: an intercept dummy variable shows the shift in the intercept relative to that of the
reference category, but if all 𝑠 dummies are included there is no reference category left to
compare with. Hence, there is no logical interpretation.
2) Mathematical: exact multicollinearity. Let's denote the regressor attached to the constant
term 𝛽1 as 𝑋1 ; it is identically equal to one, 𝑋1 ≡ 1. Then 𝐷1 + 𝐷2 + … + 𝐷𝑠 = 1 = 𝑋1 ,
because in any observation exactly one of the dummy variables is equal to 1 and all the
others are equal to 0. Therefore, there is exact multicollinearity and no estimates can be
obtained (a minimal numerical illustration follows the Solutions below).
Solutions:
1) Estimate the model as in (1), with 𝑠 − 1 dummies;
2) Drop the constant term: 𝑌 = 𝛽2 𝑋2 + … + 𝛽𝑘 𝑋𝑘 + 𝛿1 𝐷1 + 𝛿2 𝐷2 + … + 𝛿𝑠 𝐷𝑠 + 𝑢. Note that
the interpretation of the coefficients will change: each 𝛿𝑖 now estimates the intercept of
category 𝑖 itself rather than a difference from a reference category.
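A minimal numerical illustration of the trap, assuming a qualitative variable with 𝑠 = 3 categories: with a constant and the full set of dummies the design matrix loses full column rank, so the OLS estimates cannot be computed.

```python
# Minimal numerical illustration of the dummy variable trap: a constant plus
# the full set of s = 3 dummies makes the columns of X linearly dependent.
import numpy as np

category = np.array([0, 1, 2, 0, 1, 2, 0, 1])
D = (category[:, None] == np.arange(3)).astype(float)   # all three dummies
const = np.ones((len(category), 1))

X_trap = np.hstack([const, D])         # constant + D1 + D2 + D3
print(np.linalg.matrix_rank(X_trap))   # 3 < 4 columns: X'X singular, no OLS fit
X_ok = np.hstack([const, D[:, 1:]])    # drop D1: its category becomes the reference
print(np.linalg.matrix_rank(X_ok))     # 3 = 3 columns: full rank, model estimable
```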
Main results:
The choice of reference category does not affect the substance of the regression results. In a
model with intercept dummies, switching the reference category changes the estimated intercept and
the dummy coefficients (which are always interpreted relative to the chosen reference category), but
the fitted values, the residuals, 𝑅2 , and the coefficients on the other regressors remain the same.
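A small sketch of this invariance on simulated data: two choices of reference category give different coefficient estimates but the same residual sum of squares.

```python
# Sketch (simulated data): changing the reference category changes the
# intercept and dummy coefficients but leaves the SSR (and the fit) unchanged.
import numpy as np

rng = np.random.default_rng(2)
cat = rng.integers(0, 3, 120)
D = (cat[:, None] == np.arange(3)).astype(float)
y = 1.0 + np.array([0.0, 0.5, -0.4])[cat] + rng.normal(0, 1, 120)

const = np.ones((120, 1))
X_ref0 = np.hstack([const, D[:, 1:]])    # category 0 as reference
X_ref2 = np.hstack([const, D[:, :2]])    # category 2 as reference
for X in (X_ref0, X_ref2):
    b, ssr, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b.round(3), ssr)               # coefficients differ; SSR is identical
```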
Statistical tests:
The test for either a change in intercept or a change in slope can be performed by using standard
t-tests on the dummy variable parameters.
Consider a model with both slope and intercept dummies:
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 ⋅ 𝐷 + 𝛽4 (𝐷 ⋅ 𝑋2 ) + 𝑢
𝐻0 : 𝛽3 = 0
Standard t-test: 𝑑. 𝑓. = (number of observations) − (number of estimated parameters) = 𝑛 − 4 in
our case.
The same applies for the slope dummy:
𝐻0 : 𝛽4 = 0.
The test for the joint explanatory power of all dummy variables is carried out with the help of the
following F-statistic, calculated from the residual sums of squares of the models with and without the
dummy variables:
𝐻0 : the coefficients of all dummy variables are simultaneously equal to zero;
𝐻1 : the coefficient of at least one dummy variable is non-zero.
𝐹 = ((𝑆𝑆𝑅no dummies − 𝑆𝑆𝑅dummies )⁄(number of dummies)) ⁄ (𝑆𝑆𝑅dummies ⁄(number of observations − total number of parameters estimated))
Under 𝐻0 , 𝐹 ~ 𝐹(number of dummies, number of observations − total number of parameters estimated).
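A sketch of this joint F-test, assuming one regressor 𝑋2 and a qualitative variable with three categories (two dummies); the data are simulated.

```python
# Sketch of the joint F-test for the dummies, computed from the SSRs of the
# models with and without them; simulated data, assumed parameter values.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 150
x2 = rng.normal(size=n)
cat = rng.integers(0, 3, n)
D = (cat[:, None] == np.arange(1, 3)).astype(float)      # s - 1 = 2 dummies
y = 1.0 + 0.8 * x2 + 0.6 * D[:, 0] - 0.3 * D[:, 1] + rng.normal(size=n)

ssr_no = sm.OLS(y, sm.add_constant(x2)).fit().ssr                       # no dummies
ssr_d = sm.OLS(y, sm.add_constant(np.column_stack([x2, D]))).fit().ssr  # with dummies
q, k_total = 2, 4                    # number of dummies; parameters in full model
F = ((ssr_no - ssr_d) / q) / (ssr_d / (n - k_total))
print(F, stats.f.sf(F, q, n - k_total))   # F-statistic and its p-value
```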
Chow test:
Sometimes a sample of observations consists of two or more subsamples, and it is difficult to
decide whether to estimate one regression for the entire sample or separate regressions for each
subsample. The Chow test is used to solve this problem. It tests the following
hypothesis:
𝐻𝑜 : 𝑡ℎ𝑒 𝑐𝑜𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡𝑠 𝑎𝑟𝑒 𝑡ℎ𝑒 𝑠𝑎𝑚𝑒 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑠𝑢𝑏𝑠𝑎𝑚𝑝𝑙𝑒𝑠
𝐻1 : 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝑐𝑜𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡 𝑑𝑖𝑓𝑓𝑒𝑟𝑠
Moreover, it will be shown that the Chow test is equivalent to an F-test of the joint explanatory
power of the dummy variables as a group (provided we include the full set of dummies).
I. Chow test for 2 subsamples, in each of which there are 𝑘 parameters to estimate (𝑘 − 1
explanatory variables, and 1 intercept):
Subsample 1: 𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + … + 𝛽𝑘 𝑋𝑘 + 𝑢1 , sample size 𝑛1 , residual sum of squares 𝑆𝑆𝑅1
Subsample 2: 𝑌 = 𝛽1′ + 𝛽2′ 𝑋2 + 𝛽3′ 𝑋3 + … + 𝛽𝑘′ 𝑋𝑘 + 𝑢2 , sample size 𝑛2 , residual sum of squares 𝑆𝑆𝑅2
𝐻𝑜 : 𝛽1 = 𝛽1′ , 𝛽2 = 𝛽2′ , … , 𝛽𝑘 = 𝛽𝑘′
Procedures:
1) Estimate the regression for the whole sample of 𝑛 = 𝑛1 + 𝑛2 observations and obtain 𝑆𝑆𝑅0 .
2) Compute the F-statistic:
𝐹(𝑘, 𝑛 − 2𝑘) = ((𝑆𝑆𝑅0 − (𝑆𝑆𝑅1 + 𝑆𝑆𝑅2 ))⁄𝑘) ⁄ ((𝑆𝑆𝑅1 + 𝑆𝑆𝑅2 )⁄(𝑛 − 2𝑘))
3) Perform the F-test: compare the statistic with the critical value 𝐹 𝑐𝑟𝑖𝑡 (𝑘, 𝑛 − 2𝑘) at the 𝛼%
significance level. If 𝐹(𝑘, 𝑛 − 2𝑘) > 𝐹 𝑐𝑟𝑖𝑡 (𝑘, 𝑛 − 2𝑘), then we can reject the null
hypothesis that the relationships in both subsamples are the same.
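A hedged sketch of steps 1)–3) on simulated data; chow_test is a hypothetical helper name, not a library function.

```python
# Hedged sketch of the two-subsample Chow test; data are simulated.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def chow_test(y1, X1, y2, X2):
    """F-test of H0: same coefficients in both subsamples.
    X1 and X2 must already include the constant column (k columns each)."""
    ssr1 = sm.OLS(y1, X1).fit().ssr
    ssr2 = sm.OLS(y2, X2).fit().ssr
    ssr0 = sm.OLS(np.concatenate([y1, y2]), np.vstack([X1, X2])).fit().ssr
    k = X1.shape[1]
    n = len(y1) + len(y2)
    F = ((ssr0 - (ssr1 + ssr2)) / k) / ((ssr1 + ssr2) / (n - 2 * k))
    return F, stats.f.sf(F, k, n - 2 * k)

rng = np.random.default_rng(4)
X1 = sm.add_constant(rng.normal(size=(60, 1)))
X2 = sm.add_constant(rng.normal(size=(80, 1)))
y1 = X1 @ np.array([1.0, 0.5]) + rng.normal(size=60)   # subsample 1 coefficients
y2 = X2 @ np.array([2.0, 1.5]) + rng.normal(size=80)   # different coefficients
print(chow_test(y1, X1, y2, X2))                        # large F: reject H0
```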
Showing the equivalence:
Let’s show that this test is equivalent to the following F-test for linear restrictions:
Define the dummy variable:
𝐷 = 1 if the observation belongs to subsample 1, and 𝐷 = 0 if it belongs to subsample 2.
Let's include the full set of dummies in the regression:
𝑌 = 𝛽1 + 𝛽1′ 𝐷 + 𝛽2 𝑋2 + 𝛽2′ (𝐷 ⋅ 𝑋2 ) + 𝛽3 𝑋3 + 𝛽3′ (𝐷 ⋅ 𝑋3 ) + … + 𝛽𝑘 𝑋𝑘 + 𝛽𝑘′ (𝐷 ⋅ 𝑋𝑘 ) + 𝑢
Now the number of estimated parameters is equal to 2𝑘.
So it becomes equivalent to test:
𝐻𝑜 : 𝛽1′ = 𝛽2′ = … = 𝛽𝑘′ = 0. There are 𝑘 restrictions.
Unrestricted model with 𝑆𝑆𝑅𝑈𝑅 :
𝑌 = 𝛽1 + 𝛽1′ 𝐷 + 𝛽2 𝑋2 + 𝛽2′ (𝐷 ⋅ 𝑋2 ) + 𝛽3 𝑋3 + 𝛽3′ (𝐷 ⋅ 𝑋3 ) + … + 𝛽𝑘 𝑋𝑘 + 𝛽𝑘′ (𝐷 ⋅ 𝑋𝑘 ) + 𝑢
OLS will choose the intercept 𝑏1 and the coefficients 𝑏2 , … , 𝑏𝑘 on 𝑋2 , … , 𝑋𝑘 so as to optimize the
fit for the 𝐷 = 0 observations. These coefficients will be exactly the same as if the regression
had been run on the subsample of 𝐷 = 0 observations only. The same logic applies for 𝐷 =
1. So, 𝑆𝑆𝑅𝑈𝑅 = 𝑆𝑆𝑅1 + 𝑆𝑆𝑅2 .
Restricted model with 𝑆𝑆𝑅𝑅 :
𝑌 = 𝛽1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + … + 𝛽𝑘 𝑋𝑘 + 𝑢
Therefore, we get the same statistic:
𝐹(𝑘, 𝑛 − 2𝑘) = ((𝑆𝑆𝑅𝑅 − 𝑆𝑆𝑅𝑈𝑅 )⁄𝑘) ⁄ (𝑆𝑆𝑅𝑈𝑅 ⁄(𝑛 − 2𝑘)) = ((𝑆𝑆𝑅0 − (𝑆𝑆𝑅1 + 𝑆𝑆𝑅2 ))⁄𝑘) ⁄ ((𝑆𝑆𝑅1 + 𝑆𝑆𝑅2 )⁄(𝑛 − 2𝑘))
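A numerical check of this equivalence on simulated data: the unrestricted regression with the full set of dummies reproduces 𝑆𝑆𝑅1 + 𝑆𝑆𝑅2 exactly.

```python
# Numerical check (simulated data): the unrestricted regression with the full
# set of dummies yields exactly SSR_UR = SSR1 + SSR2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X1 = sm.add_constant(rng.normal(size=(60, 1)))   # subsample 1: columns [1, X2]
X2 = sm.add_constant(rng.normal(size=(80, 1)))   # subsample 2: columns [1, X2]
y1 = X1 @ np.array([1.0, 0.5]) + rng.normal(size=60)
y2 = X2 @ np.array([2.0, 1.5]) + rng.normal(size=80)

D = np.concatenate([np.ones(60), np.zeros(80)])  # D = 1 for subsample 1
Xp = np.vstack([X1, X2])                         # pooled [1, X2]
X_ur = np.column_stack([Xp, D[:, None] * Xp])    # adds the D and D*X2 columns
ssr_ur = sm.OLS(np.concatenate([y1, y2]), X_ur).fit().ssr
ssr_sep = sm.OLS(y1, X1).fit().ssr + sm.OLS(y2, X2).fit().ssr
print(np.isclose(ssr_ur, ssr_sep))               # True
```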
II. Generalizing for m subsamples:
Using the equivalence result: in this case there are 𝑚 categories, so we include 𝑚 − 1 dummies;
the number of restrictions is 𝑘(𝑚 − 1) and the number of estimated parameters is 𝑚𝑘.
Therefore, the F-statistic is:
𝐹(𝑘(𝑚 − 1), 𝑛 − 𝑚𝑘) = ((𝑆𝑆𝑅0 − (𝑆𝑆𝑅1 + … + 𝑆𝑆𝑅𝑚 ))⁄(𝑘(𝑚 − 1))) ⁄ ((𝑆𝑆𝑅1 + … + 𝑆𝑆𝑅𝑚 )⁄(𝑛 − 𝑚𝑘))
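A hedged sketch generalizing the earlier helper to 𝑚 subsamples; chow_test_m is again a hypothetical name.

```python
# Hedged sketch of the m-subsample Chow test. groups is a list of (y, X)
# pairs, each X with the same k columns including the constant.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def chow_test_m(groups):
    m = len(groups)
    k = groups[0][1].shape[1]
    n = sum(len(y) for y, _ in groups)
    ssr_sep = sum(sm.OLS(y, X).fit().ssr for y, X in groups)   # SSR1 + ... + SSRm
    ssr0 = sm.OLS(np.concatenate([y for y, _ in groups]),
                  np.vstack([X for _, X in groups])).fit().ssr  # pooled SSR0
    df1, df2 = k * (m - 1), n - m * k
    F = ((ssr0 - ssr_sep) / df1) / (ssr_sep / df2)
    return F, stats.f.sf(F, df1, df2)
```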
The Chow test is easier to perform than the test of the joint explanatory power of the group of dummy
variables, but it is less informative in the sense that it does not distinguish between the contributions of
the individual dummy variables to the difference between the regressions and does not test them for
significance. However, the test statistic and, accordingly, the conclusions of the two tests are identical.