Econometrics For Finance Chapter 5
Mean salary of public school teachers in the West: E (Yi | D2i = 0, D3i = 0) = β1
The benchmark category is the Western region.
The intercept value (β1) represents the mean value of the benchmark category.
The coefficients attached to the dummy variables (standing alone) are known as the
differential intercept coefficients because they tell by how much the value of the
intercept of non-base group differs from the intercept coefficient of the benchmark
category.
But how do we know if these differences are statistically significant? By checking whether each of the “slope” coefficients (the differential intercepts) is statistically significant.
Ŷi = 26,158.62 − 1734.473D2i − 3264.615D3i
se = (1128.523) (1435.953) (1499.615)
t = (23.1759) (− 1.2078) (− 2.1776)
pval= (0.0000) (0.2330) (0.0349) R2 = 0.0901
Therefore, the overall conclusion is that statistically the mean salaries of public school
teachers in the West and the North are about the same but the mean salary of teachers in
the South is statistically significantly lower by about $3265.
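As a quick check on the reported results, the t ratios and the implied group means can be recomputed from the coefficients and standard errors above. This is only an illustrative sketch: the variable names are ours, and the cut-off of 2 is a rough rule of thumb for a 5% two-sided test, not a figure taken from the text.

```python
# Coefficients and standard errors as reported in the regression above.
coef = {"const": 26158.62, "D2": -1734.473, "D3": -3264.615}
se = {"const": 1128.523, "D2": 1435.953, "D3": 1499.615}

# t ratio = coefficient / standard error
t_ratio = {name: coef[name] / se[name] for name in coef}

# The intercept is the mean of the benchmark category; each dummy
# coefficient is a differential intercept added to it.
mean_salary = {
    "benchmark (D2=0, D3=0)": coef["const"],
    "D2 = 1": coef["const"] + coef["D2"],
    "D3 = 1": coef["const"] + coef["D3"],
}

for name, t in t_ratio.items():
    # |t| > 2 is only a rough 5% rule of thumb (our assumption).
    print(f"{name}: t = {t:.4f}, roughly significant: {abs(t) > 2}")
for group, m in mean_salary.items():
    print(f"{group}: mean salary = {m:.3f}")
```

The recomputed t ratios agree with the reported ones up to rounding, and only the D3 differential passes the rough significance check.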
The dummy variables will simply point out the differences, if they exist, but they do not
suggest the reasons for the differences.
Note:
First, if the regression contains a constant term, the number of dummy variables must be
one less than the number of classes of each qualitative variable. If all categories of a
qualitative variable are incorporated with intercept, there will be perfect (multi)
collinearity and regression will be impossible. This is called dummy variable trap.
There is a way to avoid this trap by introducing as many dummy variables as the
number of categories of that variable, provided we do not introduce the intercept in
such a model.
Yi = β1D1i + β2D2i + β3D3i + ui
β’s now represent mean salary of teachers in the respective regions.
Second, if there is base group in the model, the coefficient attached to the dummy variables
must always be interpreted in relation to the base, or reference, group. The base chosen
will depend on the purpose of research at hand.
Finally, if a model has several qualitative variables with several classes, introduction of
dummy variables can consume a large number of degrees of freedom.
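The dummy variable trap in the first note can be illustrated with a small sketch. The salary figures and region labels below are made up for illustration; the point is only that the dummies sum to the constant column, and that dropping the intercept makes each coefficient a group mean.

```python
# Hypothetical salaries by region (made-up numbers, for illustration only).
data = [
    ("West", 30.0), ("West", 34.0),
    ("North", 27.0), ("North", 29.0),
    ("South", 24.0), ("South", 26.0),
]
regions = ["West", "North", "South"]
y = [salary for _, salary in data]

# One dummy per category: D[r][i] = 1 if observation i is in region r.
D = {r: [1 if reg == r else 0 for reg, _ in data] for r in regions}

# With an intercept, the constant column equals D1 + D2 + D3 for every
# observation: an exact linear dependence (the dummy variable trap).
constant = [1] * len(data)
row_sums = [sum(D[r][i] for r in regions) for i in range(len(data))]
trap = row_sums == constant

# Without an intercept, the dummy columns are mutually orthogonal, so each
# OLS coefficient is sum(D*Y) / sum(D*D), i.e. the mean of Y in that region.
beta = {
    r: sum(d * yi for d, yi in zip(D[r], y)) / sum(d * d for d in D[r])
    for r in regions
}

print("dummies sum to the constant column:", trap)
print("no-intercept coefficients (group means):", beta)
```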
E.g. 2 (The case of multiple qualitative variables)
From a certain sample, the following regression results were obtained for hourly wages in
relation to marital status and region of residence:
Yi = 8.8148 + 1.0997D2i − 1.6729D3i
se = (0.4015) (0.4642) (0.4854)
t = (21.9528) (2.3688) (− 3.4462)
pval = (0.0000) (0.0182) (0.0006) R2 = 0.0322
where Y = hourly wage ($)
D2 = marital status; 1 = married, 0 = otherwise
D3 = region of residence; 1 = South, 0 = otherwise
Implicit in this model is the assumption that the differential effect of the marital status dummy D2
is constant across the levels of region of residence and ….
i. What is the benchmark category for this model? Unmarried workers living outside the South.
ii. What is the mean hourly wage of the benchmark category? About $8.81.
iii. Interpret the coefficients. The mean hourly wage of married workers outside the South is 8.8148 + 1.0997 = $9.9145; the mean hourly wage of unmarried workers in the South is 8.8148 − 1.6729 = $7.1419.
iv. What is the mean hourly wage of married workers in the South? 8.8148 + 1.0997 − 1.6729 = $8.2416.
v. Are the average hourly wages statistically different compared to the base category? Yes, both differential intercept coefficients are statistically significant (p-values 0.0182 and 0.0006).
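The answers above can be reproduced with a few lines of arithmetic. The function name below is ours; the coefficients are the reported estimates.

```python
# Estimates reported above: intercept, married dummy (D2), South dummy (D3).
b1, b2, b3 = 8.8148, 1.0997, -1.6729

def mean_wage(married: int, south: int) -> float:
    """Mean hourly wage implied by the additive dummy model."""
    return b1 + b2 * married + b3 * south

for married in (0, 1):
    for south in (0, 1):
        print(f"married={married}, south={south}: {mean_wage(married, south):.4f}")

# Additivity: the marriage differential is the same in both regions.
diff_non_south = mean_wage(1, 0) - mean_wage(0, 0)
diff_south = mean_wage(1, 1) - mean_wage(0, 1)
```

Because the dummies enter additively, the marriage differential is identical inside and outside the South, which is exactly the assumption stated for this model.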
E.g. 3 (Regression with a mixture of quantitative and qualitative regressors)
Let’s introduce a quantitative variable for the above regression in example 1.
Yi = β1 + β2D2i + β3D3i + β4Xi + ui
where Xi = spending on public school per pupil ($)
Teacher’s salary in relation to region and spending on public school per pupil
Ŷi = 13,269.11 − 1673.514D2i − 1144.157D3i + 3.2889Xi
se = (1395.056) (801.1703) (861.1182) (0.3176)
t = (9.5115)* (− 2.0889)* (− 1.3286)** (10.3539)* R2 = 0.7266
The constant term in this model is the salary of public school teachers in the West for
zero spending on public school per pupil.
Ceteris paribus, as public spending per pupil goes up by one dollar, a public school
teacher’s salary goes up, on average, by about $3.29.
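To see how the region dummies and the spending variable combine, the fitted equation can be evaluated at a chosen spending level. The spending value of $4,000 below is hypothetical, chosen only for illustration.

```python
# Estimates reported above: intercept, D2, D3, and spending per pupil (X).
b1, b2, b3, b4 = 13269.11, -1673.514, -1144.157, 3.2889

def predicted_salary(d2: int, d3: int, spending: float) -> float:
    """Fitted mean salary for a region (via D2, D3) and a spending level."""
    return b1 + b2 * d2 + b3 * d3 + b4 * spending

x = 4000.0  # hypothetical spending per pupil, in dollars
west = predicted_salary(0, 0, x)
d2_region = predicted_salary(1, 0, x)
south = predicted_salary(0, 1, x)

# The regional gaps do not depend on X: the three regression lines are parallel.
gap_at_x = d2_region - west
gap_elsewhere = predicted_salary(1, 0, x + 1000) - predicted_salary(0, 0, x + 1000)

# Effect of an extra $100 of spending in any region.
effect_of_100 = predicted_salary(0, 0, x + 100) - west
print(west, d2_region, south, effect_of_100)
```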
Dummy variables can be used in testing for differences in regression functions
across groups.
Dummy variables can tell us whether the difference in the two regressions was
because of differences in the intercept terms or the slope coefficients or both.
When we compare regressions from the two groups, we see that there are four
possibilities:
i. Coincident regressions: Both the intercept and the slope coefficients are
the same in the two regressions.
ii. Parallel regressions: Only the intercepts in the two regressions are
different but the slopes are the same.
iii. Concurrent regressions: The intercepts in the two regressions are the same,
but the slopes are different.
iv. Dissimilar regressions: Both the intercepts and slopes in the two
regressions are different.
E.g. 4 The relationship between savings and income in the United States over the period
1970-1995.
Yt = α1 + α2Dt + β1Xt + β2 (Dt Xt) + ut
where Y = savings
X = income
t = time
D = 1 for observations in 1982–1995; 0 otherwise
Mean savings function for 1970–1981: E (Yt | Dt = 0, Xt) = α1 + β1 Xt
Mean savings function for 1982–1995: E (Yt | Dt = 1, Xt) = (α1 + α2) + (β1 + β2) Xt
α2 is the differential intercept and β2 is the differential slope
Notice how the introduction of the dummy variable D in the interactive, or multiplicative,
form (D multiplied by X) enables us to differentiate between slope coefficients of the two
periods, just as the introduction of the dummy variable in the additive form enabled us to
distinguish between the intercepts of the two periods.
Ŷt = 1.0161 + 152.4786Dt + 0.0803Xt − 0.0655(Dt Xt)
se = (20.1648) (33.0824) (0.0144) (0.0159)
t = (0.0504)** (4.6090)* (5.5413)* ( − 4.0963)* R2 = 0.8819
Both the differential intercept and slope coefficients are statistically significant,
strongly suggesting that the savings–income regressions for the two time periods
are different.
Savings–income regression, 1970–1981: Yt = 1.0161 + 0.0803Xt
Savings–income regression, 1982–1995: Yt = (1.0161 + 152.4786) + (0.0803 − 0.0655)Xt
= 153.4947 + 0.0148Xt
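The two period-specific regressions follow mechanically from the pooled estimates, and the four cases listed earlier can be written as a small decision rule. The helper names below are ours, and the significance flags are inputs supplied by the reader, not computed here.

```python
# Pooled estimates reported above.
a1, a2 = 1.0161, 152.4786   # intercept and differential intercept
b1, b2 = 0.0803, -0.0655    # slope and differential slope

def savings_function(d: int) -> tuple:
    """Return (intercept, slope) of the mean savings function for D = d."""
    return a1 + a2 * d, b1 + b2 * d

def classify(diff_intercept_significant: bool, diff_slope_significant: bool) -> str:
    """Map the significance of the two differential terms to the four cases."""
    if diff_intercept_significant and diff_slope_significant:
        return "dissimilar"
    if diff_intercept_significant:
        return "parallel"
    if diff_slope_significant:
        return "concurrent"
    return "coincident"

early = savings_function(0)   # 1970-1981
late = savings_function(1)    # 1982-1995
kind = classify(True, True)   # both differential terms are significant here
print(early, late, kind)
```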
The use of dummy variables for seasonal adjustment
Seasonality refers to regular oscillatory movements in the data, usually within a year.
E.g. Sales and demand for money during holiday times, prices of cash crops right after
harvest
Often it is desirable to remove the seasonal factor, or component, from a time series so
that one can concentrate on the other components. The process of removing the seasonal
component from a time series is known as deseasonalization or seasonal adjustment.
To deseasonalize a quarterly time series data on variable Y
i. Set up Yt = α1D1t + α2D2t + α3D3t + α4D4t + ut, where the D’s are the dummies, taking
a value of 1 in the relevant quarter and 0 otherwise. We are regressing Y effectively
on an intercept, except that we allow for a different intercept for each quarter.
ii. If there is any seasonal effect in a given quarter, that will be indicated by a
statistically significant t value of the dummy coefficient for that quarter. This method
of assigning a dummy to each quarter assumes that the seasonal factor, if present, is
deterministic and not stochastic.
iii. The deseasonalized time series of refrigerator sales=Yt -Ŷt. They (residuals) represent
the remaining components of the refrigerator time series, namely, the trend, cycle,
and random components.
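The three steps above can be sketched on a short made-up quarterly series. The data are hypothetical; the sketch relies on the fact that regressing Y on a full set of quarter dummies without an intercept gives the quarter means as coefficients.

```python
# Hypothetical quarterly series covering two years (illustrative numbers only).
y = [100.0, 130.0, 150.0, 90.0, 110.0, 140.0, 160.0, 100.0]
quarter = [1, 2, 3, 4, 1, 2, 3, 4]

# Step i: with one dummy per quarter and no intercept, each estimated
# coefficient is the mean of Y in that quarter.
alpha = {}
for q in (1, 2, 3, 4):
    values = [yt for yt, qt in zip(y, quarter) if qt == q]
    alpha[q] = sum(values) / len(values)

# Step iii: the fitted value is the quarter mean; the residual Y - Y_hat
# is the deseasonalized series.
fitted = [alpha[q] for q in quarter]
deseasonalized = [yt - ft for yt, ft in zip(y, fitted)]

print(alpha)
print(deseasonalized)
```

In this toy series the residuals keep only the year-to-year rise, since the quarterly pattern has been absorbed by the dummies.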
E.g. Seasonality in refrigerator sales
Ŷt = 1222.125D1t + 1467.5D2t + 1569.75D3t + 1160.0D4t
t = (20.3720) (24.4622) (26.1666) (19.3364) R2 = 0.5317
se = 59.99 for all coefficients, because all the dummies take only the values 1 or 0 and each quarter has the same number of observations.
The estimated coefficients represent average sales of refrigerators in each season.
Or
Ŷt = 1222.125 + 245.375D2t + 347.625D3t − 62.125D4t
t = (20.3720)* (2.8922)* (4.0974)* (− 0.7322)** R2 = 0.5318
The deseasonalized time series of refrigerator sales=Yt -Ŷt
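The two specifications of the refrigerator regression carry the same information: the intercept plus each differential coefficient reproduces the corresponding quarter mean from the no-intercept version. A short check, using only the reported estimates:

```python
# Quarter means from the regression without an intercept.
quarter_means = {1: 1222.125, 2: 1467.5, 3: 1569.75, 4: 1160.0}

# Intercept (first quarter as base) and differential intercepts.
intercept = 1222.125
differential = {2: 245.375, 3: 347.625, 4: -62.125}

implied = {1: intercept}
for q, d in differential.items():
    implied[q] = intercept + d

for q in sorted(implied):
    print(q, implied[q], quarter_means[q])
```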
Will the picture change if we bring in a quantitative regressor in the model? If a
quantitative variable X (durable goods expenditure) is added to the model
Ŷt = 456.24 + 242.49D2t + 325.26D3t − 86.08D4t + 2.77Xt
t = (2.5593)* (3.6951)* (4.9421)* (− 1.3073)** (4.4496)* R2 = 0.7298
The interesting thing about this equation is that the dummy variables in that
model not only remove the seasonality in Y but also the seasonality, if any, in X.
Policy evaluation using dummy variables
In the simplest case, there are two groups of subjects: control and experimental. The
control group does not participate in the program. The experimental group or treatment
group does take part in the program. E.g. a new fertilizer or a land certification program.
E.g. The dependent variable is hours of training per employee, at the firm level. The
variable grant is a dummy variable equal to one if the firm received a job training grant
for 1988 and zero otherwise. The variables sales and employ represent annual sales and
number of employees, respectively.
hrsemp̂ = 46.67 + 26.25 grant + 0.98 log (sales) - 6.07 log (employ)
se = (43.41) (5.59) (3.54) (3.88) R2 =0.237
Controlling for sales and employment, firms that received a grant trained each worker, on
average, 26.25 hours more.
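The ceteris paribus reading of the grant coefficient can be checked by evaluating the fitted equation for the same firm with and without the grant. The sales and employment figures below are hypothetical and cancel out of the difference.

```python
import math

# Estimates reported above.
b0, b_grant, b_sales, b_employ = 46.67, 26.25, 0.98, -6.07
se_grant = 5.59

def predicted_hours(grant: int, sales: float, employ: float) -> float:
    """Fitted training hours per employee."""
    return b0 + b_grant * grant + b_sales * math.log(sales) + b_employ * math.log(employ)

sales, employ = 5_000_000.0, 50.0  # hypothetical firm
effect = predicted_hours(1, sales, employ) - predicted_hours(0, sales, employ)
t_grant = b_grant / se_grant  # t ratio of the grant dummy

print(round(effect, 2), round(t_grant, 2))
```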
Example
Labor force participation = f (unemployment rate, average wage rate, education, family
income)…. Yes/No
Vote = f (rates of GDP, unemployment, inflation)….Dem/Rep/Lab
In a model where Y is quantitative, our objective is to estimate its expected (mean) value
given the values of the regressors.
In models where Y is qualitative, our objective is to find the probability of something
happening. Hence, qualitative response regression models are often known as probability
models.
The binary response model is a type of limited dependent variable model in which the qualitative
dependent variable takes on only two values, 0 and 1: Y = 1 if yes and Y = 0 if no.
Binary outcome models are among the most used in applied economics.
Look at the OLS model: 𝑌 = x ′ 𝛽 + 𝑒.
Binary outcome models estimate the probability that y=1 as a function of the
independent variables.
𝑝 = 𝑝𝑟[𝑌 = 1|x] = 𝐹(𝑥 ′ 𝛽)
There are three approaches to developing a probability model for a binary (dichotomous)
response variable depending on the functional form of 𝐹(x ′ 𝛽): linear probability model,
logit model and probit model.
Assume X = family income and Y = 1 if the family owns a house and 0 if it does not own
a house and consider the following regression:
Yi = β1 + β2Xi + ui
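This model is called a linear probability model (LPM) because the conditional expectation of Yi
given Xi, E(Yi | Xi), can be interpreted as the conditional probability that the event occurs
given Xi, that is, Pr (Yi = 1 | Xi).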
Now, if P = probability that Y = 1 (that is, the event occurs), and (1– P) = probability that
Y = 0 (that is, the event does not occur), the variable Yi has the following (probability)
distribution.
Yi Probability
0 1–P
1 P
Total 1
That is, Yi follows the Bernoulli probability distribution. Now, by the definition of
mathematical expectation, we obtain:
E(Yi) = ∑YiPi = 0(1 – P) + 1(P) = P, which can be equated with the conditional expectation of the model:
E(Yi | Xi) = β1 + β2Xi = P
In most applications, the primary goal is to explain the effects of Xj on the response
probability Pr (Y = 1).
In the LPM, βj measures the change in the response probability when Xj increases by one
unit: $\partial \Pr(Y_i = 1) / \partial X_j = \beta_j$. For the OLS regression model, the marginal effects are the
coefficients themselves.
Problems of LPM
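Heteroscedasticity of ui
With μ = E(Yi) = P:
$$
\begin{aligned}
\operatorname{Var}(u_i) &= E(u_i^2) = \operatorname{Var}(Y_i) \\
\operatorname{Var}(Y_i) &= E(Y_i - \mu)^2 = E(Y_i^2) - \mu^2 \\
&= 1^2 \cdot P + 0^2 \cdot (1 - P) - P^2 \\
&= P - P^2 = P(1 - P)
\end{aligned}
$$
The variance P(1 − P) changes with Xi because P = β1 + β2Xi, so ui is heteroscedastic.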
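The result Var(Yi) = P(1 − P) can be verified numerically from the definition of variance for a two-point distribution. The probability values below are arbitrary illustrations; the function name is ours.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of Y from the definition: sum of (y - mu)^2 * Pr(y), mu = p."""
    return (0 - p) ** 2 * (1 - p) + (1 - p) ** 2 * p

probs = [0.1, 0.3, 0.5, 0.9]  # arbitrary illustrative probabilities
variances = [bernoulli_variance(p) for p in probs]

for p, v in zip(probs, variances):
    # The variance differs across P, hence across Xi when P = b1 + b2*Xi.
    print(f"P = {p}: Var = {v:.4f}, P(1 - P) = {p * (1 - p):.4f}")
```

Since the variance differs across values of P, and P moves with Xi, the disturbance cannot have a constant variance.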
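Non-normality of the disturbances ui
For a given Xi, Yi takes only the values 0 and 1, so ui = Yi − β1 − β2Xi also takes only two
values. Thus, the distribution of ui is non-normal.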
Possibility of Ŷi lying outside the 0–1 range
Since E(Yi | Xi) is a probability, it must lie between 0 and 1. There is no guarantee that Ŷi,
the estimator of E(Yi | Xi), will necessarily fulfill this restriction, and this is the real
problem with the OLS estimation of the LPM. The LPM assumes that Pi increases linearly with X.
There are two ways of solving this problem: (i) set Ŷi to 0 when Ŷi < 0 and to 1 when Ŷi > 1,
or (ii) devise estimation techniques that guarantee the restriction.
Constant marginal effects
Given the linearity of the model, $\partial \Pr(Y_i = 1) / \partial X_j = \beta_j$, that is, the marginal effect of X remains
constant throughout.
Questionable values of R2 as a measure of goodness of fit
All the Y values will either lie along the X axis or along the line corresponding to 1.
Therefore, generally no LPM is expected to fit such a scatter well.
The LPM estimated by OLS for house ownership:
Ŷi = − 0.9457 + 0.1021Xi
se = (0.1228) (0.0082)
t = (− 7.698) (12.515) R2 = 0.8048
The intercept of − 0.9457 gives the “probability” that a family with zero income will
own a house. The slope value of 0.1021 means that for a unit change in income, on
average, the probability of owning a house increases by 0.1021, or about 10 percentage points.
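The out-of-range problem is easy to see with the estimated house ownership equation. The income values below are hypothetical and expressed in whatever units X is measured in; the clipping helper implements the simple fix of setting fitted values below 0 to 0 and above 1 to 1.

```python
# LPM estimates reported above.
b1, b2 = -0.9457, 0.1021

def lpm_fitted(x: float) -> float:
    """Fitted 'probability' of owning a house from the LPM."""
    return b1 + b2 * x

def clipped(p: float) -> float:
    """Force a fitted value into the 0-1 range."""
    return min(1.0, max(0.0, p))

incomes = [0, 8, 12, 20]  # hypothetical income levels, in the units of X
for x in incomes:
    p = lpm_fitted(x)
    print(f"X = {x}: fitted = {p:.4f}, clipped = {clipped(p):.4f}")

# The marginal effect is the same everywhere: one more unit of X adds b2.
step_low = lpm_fitted(9) - lpm_fitted(8)
step_high = lpm_fitted(21) - lpm_fitted(20)
```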
If x1 is a binary explanatory variable, β1 is just the difference in the probability of success
when x1= 1 and x1 = 0, holding the other xj fixed. For dummy independent variables, the
marginal effect is expressed in comparison to the base category (x1=0).
The limitations of the LPM can be overcome by using more sophisticated response
models: 𝑃𝑟 (𝑌𝑖 = 1) = 𝐺(𝛽0 + 𝛽1 𝑋1𝑖 + 𝛽2 𝑋2𝑖 ), where G (.) is a function taking on values
between zero and one: 0 < G (z) < 1 for any real z.
The two most common functional forms are the logit model and the probit model.
For the logit model, F(x′β) is the cdf of the logistic distribution:
$$F(x'\beta) = \Lambda(x'\beta) = \frac{e^{x'\beta}}{1 + e^{x'\beta}} = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}$$
Let $Z_i = \beta_1 + \beta_2 X_i$; then
$$P_i = E(Y = 1 \mid X_i) = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}}$$
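A minimal sketch of the logistic cdf shows that the two algebraic forms coincide and that the resulting probability always stays strictly between 0 and 1. The coefficient values are hypothetical, not estimates from the text.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logistic cdf, using whichever of the two equivalent forms avoids overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

beta1, beta2 = -4.0, 0.3  # hypothetical logit coefficients

for x in (0.0, 10.0, 20.0, 40.0):
    z = beta1 + beta2 * x          # Z_i = beta1 + beta2 * X_i
    print(f"X = {x}: Z = {z:.1f}, P = {logistic_cdf(z):.4f}")

# The two forms of the formula give the same number.
form_a = 1.0 / (1.0 + math.exp(-2.0))
form_b = math.exp(2.0) / (1.0 + math.exp(2.0))
```

Unlike the LPM, the fitted probability flattens out at both ends instead of crossing 0 or 1.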
We interpret the sign of the coefficient but not the magnitude. The magnitude cannot be
interpreted using the coefficient because different models have different scales of
coefficients.
Average marginal effects: The marginal effects are estimated as the average of the
individual marginal effects:
$$\frac{\partial \Pr(Y_i = 1)}{\partial X_j} = \beta_j \, \frac{\sum_i F'(x_i'\beta)}{n}$$
This is a better approach to estimating marginal effects, and in practice the two ways of
estimating marginal effects produce almost identical results most of the time.
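For the logit model the derivative of the cdf is F′(z) = Λ(z)[1 − Λ(z)], so the average marginal effect can be computed directly. The coefficients and the five X values below are hypothetical, used only to make the averaging step concrete.

```python
import math

def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z: float) -> float:
    """Derivative of the logistic cdf: F'(z) = F(z) * (1 - F(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)

beta1, beta2 = -4.0, 0.3             # hypothetical logit coefficients
x = [6.0, 10.0, 14.0, 18.0, 22.0]    # hypothetical sample of X values

# Individual marginal effects: beta2 * F'(Z_i) for every observation.
individual = [beta2 * logistic_pdf(beta1 + beta2 * xi) for xi in x]

# Average marginal effect: the mean of the individual effects.
ame = sum(individual) / len(individual)
print([round(m, 4) for m in individual], round(ame, 4))
```

Each individual effect equals β2 times P(1 − P) at that observation, so the effect is largest where P is near 0.5 and never exceeds β2/4.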
This shows that the rate of change in probability with respect to X involves not only β2
but also the level of probability from which the change is measured.
Because Pi is nonlinear not only in X but also in the β’s, this creates an estimation
problem, i.e., we cannot use the familiar OLS procedure to estimate the parameters.
Probit and logit models are estimated using the maximum likelihood method.
The odds in favor of Y = 1 (called the odds ratio in these notes) in a binary response model are
defined as Pr (Yi = 1)/[1 − Pr (Yi = 1)] and measure the probability that Y = 1 relative to the
probability that Y = 0.
If Pi is the probability of owning a house, then (1 − Pi), the probability of not owning a
house, is
$$1 - P_i = \frac{1}{1 + e^{Z_i}}$$
Therefore,
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i}$$
Now Pi / (1− Pi) is simply the odds ratio in favor of owning a house - the ratio of the
probability that a family will own a house to the probability that it will not own a house.
If this ratio is equal to 1, then both outcomes have equal probability. If this ratio is equal
to 2, then the outcome Yi = 1 is twice as likely as the outcome Yi = 0.
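The identity Pi / (1 − Pi) = e^Zi, and the reading of ratios of 1 and 2, can be checked numerically. The Z values below are arbitrary; the helper names are ours.

```python
import math

def prob(z: float) -> float:
    """Logit probability P = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(z: float) -> float:
    """Odds in favor of Y = 1: P / (1 - P)."""
    p = prob(z)
    return p / (1.0 - p)

for z in (-2.0, 0.0, math.log(2.0), 1.5):
    # The odds equal e^Z, so the log of the odds equals Z itself.
    print(f"Z = {z:.4f}: P = {prob(z):.4f}, odds = {odds(z):.4f}, e^Z = {math.exp(z):.4f}")

p_when_odds_are_two = prob(math.log(2.0))  # P = 2/3: twice as likely as not
```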
The interpretation of the logit model is as follows: βi, the slope, measures the marginal
effect of Xi on the log odds-ratio in favor of Y=1. The intercept 𝛽0 is the value of the log
odds if Xi’s are zero.
Given a certain level of Xi’s, if we actually want to estimate not the log odds but the
probability that Y=1 itself, this can be done directly once the estimates of the 𝛽𝑖 ’s are
available.
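Going from estimated log odds back to a probability only requires inverting the logit transformation. The coefficient values and the X level below are hypothetical; the sketch also shows that a one-unit increase in X multiplies the odds by e^β.

```python
import math

beta0, beta1 = -4.0, 0.3  # hypothetical logit estimates (intercept, slope)

def log_odds(x: float) -> float:
    return beta0 + beta1 * x

def probability(x: float) -> float:
    """Invert the log odds: P = 1 / (1 + e^(-log odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))

def odds(x: float) -> float:
    p = probability(x)
    return p / (1.0 - p)

x = 15.0                      # hypothetical level of X
p_at_x = probability(x)       # log odds = 0.5 at this X
odds_multiplier = odds(x + 1.0) / odds(x)

print(round(log_odds(x), 2), round(p_at_x, 4), round(odds_multiplier, 4))
```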
The LPM assumes that Pi is linearly related to Xi, whereas the logit model assumes that the log
of the odds ratio is linearly related to Xi.
Logit analysis produces statistically sound results. By allowing for the transformation of
a dichotomous dependent variable to a continuous variable ranging from - ∞ to + ∞, the
problem of out of range estimates is avoided.
The logit analysis provides results which can be easily interpreted and the method is
simple to analyze.
It gives parameter estimates which are consistent, asymptotically efficient and asymptotically
normal, so that the analogue of the regression t-test can be applied.
The predicted probabilities are limited between 0 and 1.
Marginal effect: $\partial \Pr(Y_i = 1) / \partial X_j = \beta_j F'(x'\beta)$. We interpret both the sign and the magnitude of
the marginal effects.
The normal CDF is relatively steeper than the logistic CDF, i.e., the probit curve approaches
its limits of 0 and 1 more quickly than the logistic curve.
The probit and logit models produce almost identical marginal effects. Therefore, there is
no compelling reason to choose one over the other. But in practice many researchers
choose the logit model because of its comparative mathematical simplicity.
In both models, the relative effects of any two continuous independent variables, X1
and X2, are
$$\frac{\partial \Pr(Y_i = 1) / \partial X_1}{\partial \Pr(Y_i = 1) / \partial X_2} = \frac{\beta_1}{\beta_2}$$
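The ratio result holds because both marginal effects share the same factor F′(x′β), which cancels. A small numerical check for the logistic and standard normal densities follows; the coefficients and the evaluation point are hypothetical, and the same coefficients are used for both densities purely to show the cancellation.

```python
import math

def logistic_pdf(z: float) -> float:
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

b0, b1, b2 = 0.5, 0.8, -0.4   # hypothetical coefficients
x1, x2 = 1.2, 3.0             # hypothetical evaluation point
z = b0 + b1 * x1 + b2 * x2

def relative_effect(pdf) -> float:
    """Ratio of the marginal effects of X1 and X2 for a given density F'."""
    me1 = b1 * pdf(z)
    me2 = b2 * pdf(z)
    return me1 / me2

ratio_logit = relative_effect(logistic_pdf)
ratio_probit = relative_effect(normal_pdf)
print(ratio_logit, ratio_probit, b1 / b2)
```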
Comparison of coefficients
Coefficients differ among models because of the functional form of the F function.
𝛽𝑙𝑜𝑔𝑖𝑡 ≅ 4𝛽𝑂𝐿𝑆
𝛽𝑝𝑟𝑜𝑏𝑖𝑡 ≅ 2.5𝛽𝑂𝐿𝑆
𝛽𝑙𝑜𝑔𝑖𝑡 ≅ 1.6𝛽𝑝𝑟𝑜𝑏𝑖𝑡
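One common way to rationalize these rules of thumb (offered here as an assumption, since the notes state them without derivation) is to compare the slope of each F function at z = 0, where the logistic density is 0.25 and the standard normal density is about 0.40. Matching marginal effects at that point gives the scale factors:

```python
import math

# Densities (slopes of the cdf) at z = 0.
logistic_pdf_at_0 = 0.5 * (1.0 - 0.5)             # Lambda(0) * (1 - Lambda(0))
normal_pdf_at_0 = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at 0

# In the LPM the marginal effect is the coefficient itself, so matching
# beta_OLS = beta * F'(0) gives beta / beta_OLS = 1 / F'(0).
logit_over_ols = 1.0 / logistic_pdf_at_0
probit_over_ols = 1.0 / normal_pdf_at_0
logit_over_probit = logit_over_ols / probit_over_ols

print(logit_over_ols, round(probit_over_ols, 4), round(logit_over_probit, 4))
```

These factors are approximations that hold best near the middle of the probability range.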
We should not compare the magnitude of the coefficients among different models. Hence,
comparisons of coefficients across nested models can be misleading because the dependent
variable is scaled differently in each model.