Econometrics For Finance Chapter 5

Uploaded by

haminjohn15

5. Regression Analysis with Qualitative Information


In this chapter, we consider models that may involve not only ratio scale (X1/X2, X1-X2, X1 >
X2) variables but also nominal scale variables.

5.1 Describing Qualitative Information


 In regression analysis the dependent variable can be influenced by variables that are
essentially qualitative in nature, such as sex, race, color, religion, nationality,
geographical region, political upheavals, and party affiliation.
 One way we could “quantify” such attributes is by constructing artificial variables that
take on values of 1 or 0, 1 indicating the presence (or possession) of that attribute and 0
indicating the absence of that attribute.
 Variables that assume such 0 and 1 values are called dummy/ indicator/ binary/
categorical/ dichotomous variables. Such variables are essentially a device to classify
data into mutually exclusive categories.
 Dummy variables are a data-classifying device in that they divide a sample into various
subgroups based on qualities or attributes and implicitly allow one to run individual
regressions for each subgroup.
 The category that receives the value of zero is called the base, reference, or benchmark group, and all comparisons are made in relation to it.
 The choice of omitted category does not affect the substance of the regression results.
 Dummy variables can be incorporated in regression models just as easily as quantitative
variables.
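As a rough illustration of how such 0/1 variables can be built, the sketch below codes a categorical attribute into dummies while leaving out a chosen base group. The helper name `make_dummies` and the five-observation sample are made up for this example.

```python
def make_dummies(values, base):
    """Return one 0/1 column per category, omitting the base (reference) group."""
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical sample of regions; "West" is chosen as the benchmark category.
regions = ["West", "North", "South", "North", "West"]
dummies = make_dummies(regions, base="West")

for name, column in dummies.items():
    print(name, column)
```

An observation from the base group is the one for which every dummy equals zero.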

5.2 Dummy as Independent Variable


E.g. 1 (The case of single qualitative variable)
Suppose we want to find out if the average annual salary of public school teachers differs
among the three geographical regions (West, North and South) of a country.
To do this, we can set up the following model:
Yi = β1 + β2D2i + β3D3i + ui
where Yi = (average) salary of public school teacher in state i ($)
D2 = 1 for states in the North; 0 otherwise
D3 = 1 for states in the South; 0 otherwise
Mean salary of public school teachers in the North: E (Yi | D2i = 1, D3i = 0) = β1 + β2
Mean salary of public school teachers in the South: E (Yi | D2i = 0, D3i = 1) = β1 + β3

Mean salary of public school teachers in the West: E (Yi | D2i = 0, D3i = 0) = β1
 The benchmark category is the Western region.
 The intercept value (β1) represents the mean value of the benchmark category.
 The coefficients attached to the dummy variables (standing alone) are known as the
differential intercept coefficients because they tell by how much the value of the
intercept of non-base group differs from the intercept coefficient of the benchmark
category.
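 But how do we know if these differences are statistically significant? By checking whether each of the “slope” (differential intercept) coefficients is statistically significant in the estimated regression:
Ŷi = 26,158.62 − 1734.473D2i − 3264.615D3i
se = (1128.523) (1435.953) (1499.615)
t = (23.1759) (− 1.2078) (− 2.1776)
pval = (0.0000) (0.2330) (0.0349) R2 = 0.0901
The coefficient on D2 is not significant at the 5% level (p = 0.2330), whereas the coefficient on D3 is (p = 0.0349).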
 Therefore, the overall conclusion is that statistically the mean salaries of public school
teachers in the West and the North are about the same but the mean salary of teachers in
the South is statistically significantly lower by about $3265.
 The dummy variables will simply point out the differences, if they exist, but they do not
suggest the reasons for the differences.
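The arithmetic behind this reading of the results can be sketched as follows. The numbers are the reported estimates; the dictionary keys are labels chosen for this example only.

```python
# Reported estimates for the teacher-salary regression (West is the base group).
coef = {"const": 26158.62, "D2_North": -1734.473, "D3_South": -3264.615}
se = {"const": 1128.523, "D2_North": 1435.953, "D3_South": 1499.615}

# Mean salary per region implied by the intercept and differential intercepts.
mean_salary = {
    "West": coef["const"],
    "North": coef["const"] + coef["D2_North"],
    "South": coef["const"] + coef["D3_South"],
}

# t ratio = coefficient / standard error.
t_ratio = {name: coef[name] / se[name] for name in coef}

for region, value in mean_salary.items():
    print(f"{region}: {value:,.2f}")
for name, t in t_ratio.items():
    print(f"t({name}) = {t:.3f}")
```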
Note:
 First, if the regression contains a constant term, the number of dummy variables must be
one less than the number of classes of each qualitative variable. If all categories of a
qualitative variable are incorporated with intercept, there will be perfect (multi)
collinearity and regression will be impossible. This is called dummy variable trap.
 There is a way to avoid this trap by introducing as many dummy variables as the
number of categories of that variable, provided we do not introduce the intercept in
such a model.
Yi = β1D1i + β2D2i + β3D3i + ui
 β’s now represent mean salary of teachers in the respective regions.
 Second, if there is base group in the model, the coefficient attached to the dummy variables
must always be interpreted in relation to the base, or reference, group. The base chosen
will depend on the purpose of research at hand.

 Finally, if a model has several qualitative variables with several classes, introduction of
dummy variables can consume a large number of degrees of freedom.
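The dummy variable trap described in the first note can be seen on a small made-up sample: with an intercept and a dummy for every category, the dummy columns add up to the constant column in every row. The six observations below are hypothetical.

```python
regions = ["West", "North", "South", "North", "West", "South"]  # hypothetical

const = [1] * len(regions)                     # the intercept column
d1 = [int(r == "West") for r in regions]
d2 = [int(r == "North") for r in regions]
d3 = [int(r == "South") for r in regions]

# Intercept plus all three dummies: D1 + D2 + D3 equals the constant column
# in every row, which is an exact linear dependence among the regressors.
all_dummies_sum = [a + b + c for a, b, c in zip(d1, d2, d3)]
in_trap = all_dummies_sum == const

# Dropping D1 (West becomes the base group): the identity no longer holds.
reduced_sum = [b + c for b, c in zip(d2, d3)]
still_identical = reduced_sum == const

print(in_trap, still_identical)
```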
E.g. 2 (The case of multiple qualitative variables)
From a certain sample, the following regression results were obtained for hourly wages in
relation to marital status and region of residence:
Ŷi = 8.8148 + 1.0997D2i − 1.6729D3i
se = (0.4015) (0.4642) (0.4854)
t = (21.9528) (2.3688) (− 3.4462)
pval = (0.0000) (0.0182) (0.0006) R2 = 0.0322
where Y = hourly wage ($)
D2 = marital status; 1 = married, 0 = otherwise
D3 = region of residence; 1 = South, 0 = otherwise
Implicit in this model is the assumption that the differential effect of the marital status dummy D2
is constant across the levels of region of residence and ….
i. What is the benchmark category for this model? Unmarried, non-South residents.
ii. What is the mean hourly wage of the benchmark category? $8.81.
iii. Interpret the coefficients. Mean hourly wage of married non-South residents = 8.81 + 1.10 = $9.91; mean hourly wage of unmarried South residents = 8.81 − 1.67 = $7.14.
iv. What is the mean hourly wage of those who are married and live in the South? 8.8148 + 1.0997 − 1.6729 = $8.2416.
v. Are the average hourly wages statistically different compared to the base category? Yes: both differential intercepts have p-values below 0.05.
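A short sketch of the four group means implied by this additive model, using the reported coefficients. It also shows the implicit assumption: the marriage differential is the same in both regions. The function name `mean_wage` is chosen for illustration.

```python
b1, b2, b3 = 8.8148, 1.0997, -1.6729  # intercept, married, South (reported)


def mean_wage(married, south):
    """Mean hourly wage implied by the additive dummy model."""
    return b1 + b2 * married + b3 * south


# All four (married, south) combinations.
cells = {(m, s): mean_wage(m, s) for m in (0, 1) for s in (0, 1)}

# The marriage differential is identical inside and outside the South.
premium_non_south = cells[(1, 0)] - cells[(0, 0)]
premium_south = cells[(1, 1)] - cells[(0, 1)]

for key, value in cells.items():
    print(key, round(value, 4))
```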
E.g. 3 (Regression with a mixture of quantitative and qualitative regressors)
Let’s introduce a quantitative variable for the above regression in example 1.
Yi = β1 + β2D2i + β3D3i + β4Xi + ui
where Xi = spending on public school per pupil ($)
Teacher’s salary in relation to region and spending on public school per pupil
Ŷi = 13,269.11 − 1673.514D2i − 1144.157D3i + 3.2889Xi
se = (1395.056) (801.1703) (861.1182) (0.3176)
t = (9.5115)* (− 2.0889)* (− 1.3286)** (10.3539)* R2 = 0.7266
(* p-value below 5 percent; ** p-value above 5 percent)

The constant term in this model is the mean salary of public school teachers in the West for zero spending on public school per pupil.
Ceteris paribus, as public school spending per pupil goes up by one dollar, the average public school teacher’s salary goes up by about $3.29.
 Dummy variables can be used in testing for differences in regression functions
across groups.
 Dummy variables can tell us whether the difference in the two regressions was
because of differences in the intercept terms or the slope coefficients or both.
 When we compare regressions from the two groups, we see that there are four
possibilities:
i. Coincident regressions: Both the intercept and the slope coefficients are
the same in the two regressions.
ii. Parallel regressions: Only the intercepts in the two regressions are
different but the slopes are the same.
iii. Concurrent regressions: The intercepts in the two regressions are the same,
but the slopes are different.
iv. Dissimilar regressions: Both the intercepts and slopes in the two
regressions are different.
E.g. 4 The relationship between savings and income in the United States over the period
1970-1995.
Yt = α1 + α2Dt + β1Xt + β2 (Dt Xt) + ut
where Y = savings
X = income
t = time
D = 1 for observations in 1982–1995; 0 otherwise
Mean savings function for 1970–1981: E (Yt | Dt = 0, Xt) = α1 + β1 Xt
Mean savings function for 1982–1995: E (Yt | Dt = 1, Xt) = (α1 + α2) + (β1 + β2) Xt
α2 is the differential intercept and β2 is the differential slope
 Notice how the introduction of the dummy variable D in the interactive, or multiplicative,
form (D multiplied by X) enables us to differentiate between slope coefficients of the two
periods, just as the introduction of the dummy variable in the additive form enabled us to
distinguish between the intercepts of the two periods.
Ŷt = 1.0161 + 152.4786Dt + 0.0803Xt − 0.0655(Dt Xt)
se = (20.1648) (33.0824) (0.0144) (0.0159)
t = (0.0504)** (4.6090)* (5.5413)* ( − 4.0963)* R2 = 0.8819
 Both the differential intercept and slope coefficients are statistically significant,
strongly suggesting that the savings–income regressions for the two time periods
are different.
Savings–income regression, 1970–1981: Ŷt = 1.0161 + 0.0803Xt
Savings–income regression, 1982–1995: Ŷt = (1.0161 + 152.4786) + (0.0803 − 0.0655)Xt
= 153.4947 + 0.0148Xt
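The derivation of the two period-specific lines, and the mapping from the significance of the differential terms to the four cases listed earlier, can be sketched as below. The coefficients are the reported estimates; the function names are illustrative.

```python
# Reported estimates: Y_t = a1 + a2*D_t + b1*X_t + b2*(D_t * X_t)
a1, a2, b1, b2 = 1.0161, 152.4786, 0.0803, -0.0655


def savings_function(d):
    """Return (intercept, slope) of the savings-income line for dummy value d."""
    return a1 + a2 * d, b1 + b2 * d


def classify(intercept_differs, slope_differs):
    """Name the case implied by which differential terms are significant."""
    if intercept_differs and slope_differs:
        return "dissimilar"
    if intercept_differs:
        return "parallel"
    if slope_differs:
        return "concurrent"
    return "coincident"


early = savings_function(0)  # 1970-1981
late = savings_function(1)   # 1982-1995
kind = classify(True, True)  # both differential terms are significant here

print(early, late, kind)
```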
 The use of dummy variables for seasonal adjustment
 Seasonality is regular oscillatory movements in the data usually within a year.
 E.g. Sales and demand for money during holiday times, prices of cash crops right after
harvest
 Often it is desirable to remove the seasonal factor, or component, from a time series so
that one can concentrate on the other components. The process of removing the seasonal
component from a time series is known as deseasonalization or seasonal adjustment.
 To deseasonalize a quarterly time series data on variable Y
i. Set up Yt = α1D1t + α2D2t + α3D3t + α4D4t + ut, where the D’s are the dummies, taking
a value of 1 in the relevant quarter and 0 otherwise. We are regressing Y effectively
on an intercept, except that we allow for a different intercept for each quarter.
ii. If there is any seasonal effect in a given quarter, that will be indicated by a
statistically significant t value of the dummy coefficient for that quarter. This method
of assigning a dummy to each quarter assumes that the seasonal factor, if present, is
deterministic and not stochastic.
iii. The deseasonalized time series is the series of residuals, Yt − Ŷt. These residuals represent the remaining components of the time series, namely, the trend, cycle, and random components.
E.g. Seasonality in refrigerator sales
Ŷt = 1222.125D1t + 1467.5D2t + 1569.75D3t + 1160.0D4t
t = (20.3720) (24.4622) (26.1666) (19.3364) R2 = 0.5317
se = 59.99 for every coefficient, because all the dummies take only a value of either 1 or 0.
The estimated coefficients represent average sales of refrigerators in each season.
Or, with an intercept and the first quarter as the base category:
Ŷt = 1222.125 + 245.375D2t + 347.625D3t − 62.125D4t
t = (20.3720)* (2.8922)* (4.0974)* (− 0.7322)** R2 = 0.5318
 The deseasonalized time series of refrigerator sales=Yt -Ŷt
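A minimal sketch of the procedure on a made-up quarterly series (not the refrigerator data). It relies on the fact that regressing Y on a full set of quarterly dummies without an intercept returns the quarterly means as coefficients, so no regression routine is needed here.

```python
from statistics import mean

# Hypothetical quarterly series covering two years.
y = [10.0, 14.0, 16.0, 9.0, 12.0, 16.0, 18.0, 11.0]
quarter = [1, 2, 3, 4, 1, 2, 3, 4]

# No-intercept form: each dummy coefficient is that quarter's mean.
alpha = {q: mean(v for v, qq in zip(y, quarter) if qq == q) for q in (1, 2, 3, 4)}

# Fitted values and the deseasonalized series (residuals Y_t - Yhat_t).
fitted = [alpha[q] for q in quarter]
deseasonalized = [v - f for v, f in zip(y, fitted)]

# Intercept form: quarter 1 is the base, the rest are differential intercepts.
differential = {q: alpha[q] - alpha[1] for q in (2, 3, 4)}

print(alpha)
print(deseasonalized)
print(differential)
```

Both parameterizations give the same fitted values, which is why the reported refrigerator coefficients satisfy 1222.125 + 245.375 = 1467.5 and so on.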
Will the picture change if we bring in a quantitative regressor in the model? If a
quantitative variable X (durable goods expenditure) is added to the model
Ŷt = 456.24 + 242.49D2t + 325.26D3t − 86.08D4t + 2.77Xt
t = (2.5593)* (3.6951)* (4.9421)* (− 1.3073)** (4.4496)* R2 = 0.7298
The interesting thing about this equation is that the dummy variables in that
model not only remove the seasonality in Y but also the seasonality, if any, in X.
 Policy evaluation using dummy variables
 In the simplest case, there are two groups of subjects: control and experimental. The
control group does not participate in the program. The experimental group or treatment
group does take part in the program. E.g. a new fertilizer or a land certification program.
E.g. The dependent variable is hours of training per employee, at the firm level. The
variable grant is a dummy variable equal to one if the firm received a job training grant
for 1988 and zero otherwise. The variables sales and employ represent annual sales and
number of employees, respectively.
hrsemp̂ = 46.67 + 26.25 grant + 0.98 log (sales) - 6.07 log (employ)
se = (43.41) (5.59) (3.54) (3.88) R2 =0.237
Controlling for sales and employment, firms that received a grant trained each worker, on
average, 26.25 hours more.

5.3 Dummy as Dependent Variable


 In this section we consider models in which the regressand itself is qualitative in nature.

 Example

Labor force participation = f (unemployment rate, average wage rate, education, family
income)…. Yes/No
Vote = f (rates of GDP, unemployment, inflation)….Dem/Rep/Lab
 In a model where Y is quantitative, our objective is to estimate its expected (mean) value
given the values of the regressors.

 In models where Y is qualitative, our objective is to find the probability of something
happening. Hence, qualitative response regression models are often known as probability
models.

 The binary response model is a type of limited dependent variable where the qualitative variable takes on only two values, 0 and 1:
$$Y = \begin{cases} 1 & \text{if yes} \\ 0 & \text{if no} \end{cases}$$

 Binary outcome models

 Binary outcome models are among the most used in applied economics.
 Look at the OLS model: 𝑌 = x ′ 𝛽 + 𝑒.
 Binary outcome models estimate the probability that y=1 as a function of the
independent variables.
𝑝 = 𝑝𝑟[𝑌 = 1|x] = 𝐹(𝑥 ′ 𝛽)
There are three approaches to developing a probability model for a binary (dichotomous)
response variable depending on the functional form of 𝐹(x ′ 𝛽): linear probability model,
logit model and probit model.
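The three choices of F can be compared side by side in a small sketch. The index values passed in are arbitrary illustrations of x′β, and the function names are chosen for this example.

```python
import math
from statistics import NormalDist


def F_lpm(z):
    """Linear probability model: F(z) = z, not confined to [0, 1]."""
    return z


def F_logit(z):
    """Logit: CDF of the logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def F_probit(z):
    """Probit: CDF of the standard normal distribution."""
    return NormalDist().cdf(z)


for z in (-3.0, 0.0, 3.0):  # illustrative values of the index x'beta
    print(z, F_lpm(z), round(F_logit(z), 4), round(F_probit(z), 4))
```

Only the logit and probit forms keep the implied probability inside the unit interval for every value of the index.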

5.3.1 The Linear Probability Model (LPM)


 In the linear probability model, 𝐹(𝑥 ′ 𝛽 ) = 𝑥 ′ 𝛽.

 Assume X = family income and Y = 1 if the family owns a house and 0 if it does not own
a house and consider the following regression:

Yi = β1 + β2Xi + ui
 This model is called a linear probability model (LPM) because

i. the dependent variable is binary


ii. the response probability is linear in the parameters βj
iii. the conditional expectation of Yi given Xi, E (Yi | Xi), can be interpreted as the
conditional probability that the event will occur given Xi, that is, Pr (Yi = 1 | Xi)
 Justification

Assume E(ui) = 0 to obtain unbiased estimators.


E(Yi |Xi ) = β1 + β2Xi

Now, if P = probability that Y = 1 (that is, the event occurs), and (1– P) = probability that
Y = 0 (that is, the event does not occur), the variable Yi has the following (probability)
distribution.
Yi Probability
0 1–P
1 P
Total 1
That is, Yi follows the Bernoulli probability distribution. Now, by the definition of
mathematical expectation, we obtain:
E(Yi) = ∑YiPi = 0(1 – P) + 1(P) = P, which can be equated with the conditional expectation above:
E(Yi | Xi) = β1 + β2Xi = P
 In most of applications, the primary goal is to explain the effects of Xj on the response
probability Pr (Y = 1).

 In the LPM, βj measures the change in the response probability when Xj increases by one unit:
$$\frac{\partial \Pr(Y_i = 1)}{\partial X_j} = \beta_j$$
For the OLS regression model, the marginal effects are the coefficients and they do not depend on x.

 Problems of LPM

 Heteroscedasticity of ui
Var (ui) = E(ui²) = Var (Yi)
Var (Yi) = E(Yi − μ)² = E(Yi²) − μ², or equivalently ∑(Yi − μ)² Pi
= ∑Yi² Pi − μ² ∑Pi
= 1²(P) + 0²(1 − P) − P²
= P − P² = P(1 − P) → heteroscedasticity, because P = β1 + β2Xi varies with Xi
 Non-normality of the disturbances ui
Because Yi takes only the values 0 and 1, ui can take only two values for a given Xi. Thus, the distribution of ui is non-normal.
 Possibility of Ŷi lying outside the 0–1 range
Since E(Yi | Xi) is a probability, it must lie between 0 and 1. There is no guarantee that Ŷi, the estimator of E(Yi | Xi), will fulfill this restriction, and this is the real problem with the OLS estimation of the LPM. The LPM assumes that Pi increases linearly with X. There are two ways of solving this problem: (i) set Ŷi to 0 when Ŷi < 0 and to 1 when Ŷi > 1; (ii) devise techniques that guarantee the restriction.
 Constant marginal effects
Given the linearity of the model, $\partial \Pr(Y_i = 1)/\partial X_j = \beta_j$, that is, the marginal effect of X remains constant throughout.
 Questionable values of R2 as a measure of goodness of fit
All the Y values will either lie along the X axis or along the line corresponding to 1.
Therefore, generally no LPM is expected to fit such a scatter well.
The LPM estimated by OLS for house ownership:
Ŷi = − 0.9457 + 0.1021Xi
se = (0.1228) (0.0082)
t = (− 7.698) (12.515) R2 = 0.8048
The intercept of − 0.9457 gives the “probability” that a family with zero income will own a house; since a negative probability is impossible, this illustrates the out-of-range problem noted above. The slope value of 0.1021 means that for a unit change in income, on the average, the probability of owning a house increases by 0.1021, or about 10 percentage points.
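The out-of-range and heteroscedasticity problems can be made concrete with the reported house-ownership estimates. The income levels below are hypothetical, in whatever units X is measured, and `clamp` is an illustrative name for the crude remedy of forcing fitted values into the unit interval.

```python
b1, b2 = -0.9457, 0.1021  # reported LPM estimates for house ownership


def lpm_fit(x):
    """Fitted 'probability' from the LPM; may fall outside [0, 1]."""
    return b1 + b2 * x


def clamp(p):
    """Force a fitted value into the unit interval (0 if below, 1 if above)."""
    return min(1.0, max(0.0, p))


for income in (6, 12, 20):  # hypothetical income levels
    p = lpm_fit(income)
    p_clamped = clamp(p)
    var_u = p_clamped * (1 - p_clamped)  # Var(u_i) = P(1 - P) changes with X
    print(income, round(p, 4), p_clamped, round(var_u, 4))
```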
 If x1 is a binary explanatory variable, β1 is just the difference in the probability of success
when x1= 1 and x1 = 0, holding the other xj fixed. For dummy independent variables, the
marginal effect is expressed in comparison to the base category (x1=0).

The limitations of the LPM can be overcome by using more sophisticated response
models: 𝑃𝑟 (𝑌𝑖 = 1) = 𝐺(𝛽0 + 𝛽1 𝑋1𝑖 + 𝛽2 𝑋2𝑖 ), where G (.) is a function taking on values
between zero and one: 0 < G (z) < 1 for any real z.
The two most common choices of G are the logit model and the probit model.

5.3.2 The Logit Model

 For the logit model, 𝐹(𝑥 ′ 𝛽 ) is the cdf of the logistic distribution.

$$F(x'\beta) = \Lambda(x'\beta) = \frac{e^{x'\beta}}{1 + e^{x'\beta}} = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}$$
 Let $Z_i = \beta_1 + \beta_2 X_i$; then
$$P_i = \Pr(Y_i = 1 \mid X_i) = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}}$$

 The predicted probabilities are limited between 0 and 1.

 Depending on the sign of its coefficient, an increase in x increases or decreases the likelihood that Y = 1, i.e., it makes that outcome more or less likely.

 We interpret the sign of the coefficient but not the magnitude. The magnitude cannot be
interpreted using the coefficient because different models have different scales of
coefficients.

 A standard logistic distribution has a mean of 0 and a variance of π²/3.

Estimating marginal effects


 The marginal effects reflect the change in the probability of Y=1, given a 1 unit change in an independent variable x. For the logit model, a one unit increase in Xj leads to a change of approximately $\beta_j \Pr(Y_i = 1)\,[1 - \Pr(Y_i = 1)]$ in the response probability.

 Average marginal effects: The marginal effects are estimated as the average of the individual marginal effects:
$$\frac{\partial \Pr(Y_i = 1)}{\partial X_j} = \beta_j \, \frac{\sum_i F'(x_i'\beta)}{n}$$

 Averaging the individual marginal effects is generally the better approach; in practice, it and the alternative of evaluating the marginal effect at the sample means of the regressors produce almost identical results most of the time.

 This shows that the rate of change in probability with respect to X involves not only β2
but also the level of probability from which the change is measured.
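The dependence of the logit marginal effect on the probability level can be sketched numerically. The coefficients and the sample of X values are hypothetical, chosen only so that the numbers are easy to follow.

```python
import math

b1, b2 = -4.0, 0.5                # hypothetical logit coefficients
xs = [2.0, 6.0, 8.0, 10.0, 14.0]  # hypothetical sample values of X


def prob(x):
    """Logit response probability P = 1 / (1 + exp(-Z)), with Z = b1 + b2*x."""
    return 1.0 / (1.0 + math.exp(-(b1 + b2 * x)))


def marginal_effect(x):
    """dP/dX = b2 * P * (1 - P): largest where P is near 0.5."""
    p = prob(x)
    return b2 * p * (1.0 - p)


effects = [marginal_effect(x) for x in xs]
average_marginal_effect = sum(effects) / len(effects)

for x, e in zip(xs, effects):
    print(x, round(prob(x), 4), round(e, 4))
print("average marginal effect:", round(average_marginal_effect, 4))
```

The same coefficient produces a small effect when P is close to 0 or 1 and the largest effect, b2/4, when P = 0.5.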

 Because Pi is nonlinear not only in X but also in the β’s, this creates an estimation
problem, i.e., we cannot use the familiar OLS procedure to estimate the parameters.

 Probit and logit models are estimated using the maximum likelihood method.

 Odds ratios are estimated with the logistic model.

 Reporting marginal effects instead of odds ratios is more popular in economics.

 The odds ratio (more precisely, the odds) in a binary response model is defined as $\Pr(Y_i = 1)/[1 - \Pr(Y_i = 1)]$ and measures the probability that Y=1 relative to the probability that Y=0.

 If Pi is the probability of owning a house, then (1 − Pi), the probability of not owning a house, is
$$1 - P_i = \frac{1}{1 + e^{Z_i}}$$
so that
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i}$$

 Now Pi / (1− Pi) is simply the odds ratio in favor of owning a house - the ratio of the
probability that a family will own a house to the probability that it will not own a house.

 If this ratio is equal to 1, then both outcomes have equal probability. If this ratio is equal
to 2, then the outcome Yi = 1 is twice as likely as the outcome Yi = 0.

 The odds ratio is always non-negative.


$$L_i = \ln\left[\frac{\Pr(Y_i = 1)}{1 - \Pr(Y_i = 1)}\right] = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$
 The log of the odds ratio, L, is not only linear in X but also linear in the parameters. L is called the logit, hence the name logit model.

 The interpretation of the logit model is as follows: βi, the slope, measures the marginal
effect of Xi on the log odds-ratio in favor of Y=1. The intercept 𝛽0 is the value of the log
odds if Xi’s are zero.
 Given a certain level of Xi’s, if we actually want to estimate not the log odds but the
probability that Y=1 itself, this can be done directly once the estimates of the 𝛽𝑖 ’s are
available.
 The LPM assumes that Pi is linearly related to Xi, whereas the logit model assumes that the log of the odds ratio is linearly related to Xi.
 Logit analysis produces statistically sound results. By allowing for the transformation of
a dichotomous dependent variable to a continuous variable ranging from - ∞ to + ∞, the
problem of out of range estimates is avoided.
 The logit analysis provides results which can be easily interpreted and the method is
simple to analyze.
 It gives parameter estimates which are asymptotically consistent, efficient and normal, so
that the analogue of the regression t-test can be applied.
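The link between the odds, the log odds, and the probability can be checked numerically. This is a sketch with made-up coefficient estimates and regressor values; none of these numbers come from the notes.

```python
import math


def odds(p):
    """Odds in favor of Y = 1: p / (1 - p)."""
    return p / (1.0 - p)


def logit(p):
    """Log of the odds, L = ln(p / (1 - p))."""
    return math.log(odds(p))


def prob_from_logit(L):
    """Invert the logit: P = 1 / (1 + exp(-L))."""
    return 1.0 / (1.0 + math.exp(-L))


b0, b1, b2 = -1.0, 0.8, -0.3  # hypothetical estimates
x1, x2 = 2.0, 1.0             # hypothetical regressor values

L = b0 + b1 * x1 + b2 * x2    # log odds: linear in X and in the betas
p = prob_from_logit(L)        # probability recovered from the log odds

print(round(L, 4), round(p, 4), round(odds(p), 4), round(math.exp(L), 4))
```

Because L can range over the whole real line while the recovered probability stays strictly between 0 and 1, out-of-range predictions cannot occur.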

5.3.3 The Probit (Normit) Model

 The term probit is short for “probability unit”.


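 For the probit model, $F(x'\beta)$ is the cdf of the standard normal distribution:
$$F(x'\beta) = \Phi(x'\beta) = \int_{-\infty}^{x'\beta} \phi(z)\,dz, \qquad \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
More generally, if a variable x follows the normal distribution with mean µ and variance σ², its density is
$$f(x) = \frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
and the standard normal case corresponds to µ = 0 and σ² = 1.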

 The predicted probabilities are limited between 0 and 1.
 Marginal effect: $\partial \Pr(Y_i = 1)/\partial X_j = \beta_j F'(x'\beta)$. We interpret both the sign and the magnitude of the marginal effects.


 Coefficients and marginal effects have the same signs because 𝐹′(x ′ 𝛽 ) > 0.
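A sketch of the probit marginal effect using the standard normal density from Python's standard library. The coefficients and X values are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
b1, b2 = -4.0, 0.5         # hypothetical probit coefficients


def probit_prob(x):
    """Response probability: standard normal CDF evaluated at b1 + b2*x."""
    return std_normal.cdf(b1 + b2 * x)


def probit_marginal_effect(x):
    """dP/dX = b2 * phi(b1 + b2*x); phi > 0, so the sign matches b2."""
    return b2 * std_normal.pdf(b1 + b2 * x)


for x in (2.0, 8.0, 14.0):  # hypothetical values of X
    print(x, round(probit_prob(x), 4), round(probit_marginal_effect(x), 4))
```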
 E.g. of grouped probit

Relationship between Logit and Probit Models

 The normal CDF is relatively steeper than logistic CDF, i.e., the probit curve approaches
the axis more quickly than the logistic curve.
 The probit and logit models produce almost identical marginal effects. Therefore, there is
no compelling reason to choose one over the other. But in practice many researchers
choose the logit model because of its comparative mathematical simplicity.
 In both models, the relative effect of any two continuous independent variables, X1 and X2, is
$$\frac{\partial \Pr(Y_i = 1)/\partial X_1}{\partial \Pr(Y_i = 1)/\partial X_2} = \frac{\beta_1}{\beta_2}$$

Comparison of coefficients
 Coefficients differ among models because of the functional form of the F function.
𝛽𝑙𝑜𝑔𝑖𝑡 ≅ 4𝛽𝑂𝐿𝑆
𝛽𝑝𝑟𝑜𝑏𝑖𝑡 ≅ 2.5𝛽𝑂𝐿𝑆
𝛽𝑙𝑜𝑔𝑖𝑡 ≅ 1.6𝛽𝑝𝑟𝑜𝑏𝑖𝑡

 We should not compare the magnitude of the coefficients among different models. Hence,
comparisons of coefficients across nested models can be misleading because the dependent
variable is scaled differently in each model.
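One common heuristic for where the approximate scaling factors come from (offered here as an assumption, not as the derivation used in these notes) is to match marginal effects at the center of the distribution, where the index equals zero. There the logit marginal effect is β times the logistic density at 0 and the probit marginal effect is β times the standard normal density at 0.

```python
import math
from statistics import NormalDist

# Densities F'(0) at the center of each distribution.
logistic_density_at_0 = math.exp(0) / (1 + math.exp(0)) ** 2  # equals 0.25
normal_density_at_0 = NormalDist().pdf(0.0)                   # about 0.3989

# Equating beta_OLS with beta * F'(0) gives the rough conversion factors.
logit_vs_ols = 1 / logistic_density_at_0        # 4
probit_vs_ols = 1 / normal_density_at_0         # about 2.5
logit_vs_probit = logit_vs_ols / probit_vs_ols  # about 1.6

print(logit_vs_ols, round(probit_vs_ols, 3), round(logit_vs_probit, 3))
```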
