Introduction:
Multiple Regression - Further Applications
Dragos Radu
[email protected]
Recommended readings:
Stock and Watson, chapter: 8
test scores: language learning and small classes
What if students who are still learning English benefit in a different way
from one-on-one or small-group instruction?
• perhaps smaller classes help more if there are many English learners,
who need individual attention
• in that case, $\Delta TestScore / \Delta STR$ might depend on PctEL
• or more generally: $\Delta Y / \Delta X_1$ might depend on $X_2$
• how do we model such “interactions” between $X_1$ and $X_2$?
three types of interactions
$Y = \beta_0 + \beta_1 \cdot X_1 + \beta_2 \cdot X_2 + \beta_3 \cdot (X_1 \times X_2) + u$
We consider three cases:
• both X1 and X2 are binary
• X1 is continuous and X2 is binary
• both X1 and X2 are continuous
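In all three cases the interaction term works the same way. As a short sketch, with coefficients $\beta_0, \dots, \beta_3$ as in the interaction regression above, change $X_1$ by $\Delta X_1$ while holding $X_2$ and $u$ fixed, then subtract the original equation:

```latex
\begin{align*}
Y + \Delta Y &= \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2
                + \beta_3 (X_1 + \Delta X_1) X_2 + u \\
\Delta Y     &= \beta_1 \Delta X_1 + \beta_3 X_2 \, \Delta X_1 \\
\frac{\Delta Y}{\Delta X_1} &= \beta_1 + \beta_3 X_2
\end{align*}
```

The effect of $X_1$ is therefore $\beta_1$ when $X_2 = 0$ and shifts by $\beta_3$ for each unit of $X_2$.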
our variables
outcome of interest:
testscr    average of reading and math scores on achievement test
control variables:
el_pct     percent of English learners
expn_stu   expenditures per student ($)
avginc     district average income (in $1000s)
example: TestScr , STR and English learners
How can we allow the effect of being in a small class to depend on the
percentage of English learners?
One way is to regress test scores on two dummies, one for class size and one
for the share of English learners, and on the interaction between them.
regression with two dummies, no interaction
we first generate the two dummies:
. gen histr=str>=20
. gen hiel=el_pct>=10
we could then run the following regression:
. reg testscr histr hiel
We can see why the previous regression (just on the two dummies)
assumes that the effect of class size is the same in districts with high and
low % of English learners using the equation:
$\widehat{TestScr} = \beta_0 + \beta_1 \cdot HiSTR + \beta_2 \cdot HiEL$
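To spell the argument out, here is a short worked step. It compares predicted test scores for HiSTR = 1 and HiSTR = 0 at a fixed value $h$ of HiEL; the symbol $h$ is introduced only for this illustration.

```latex
\begin{align*}
\widehat{TestScr}(HiSTR = 1, HiEL = h) &= \beta_0 + \beta_1 + \beta_2 h \\
\widehat{TestScr}(HiSTR = 0, HiEL = h) &= \beta_0 + \beta_2 h \\
\text{difference} &= \beta_1 \qquad \text{for both } h = 0 \text{ and } h = 1
\end{align*}
```

Because $h$ cancels, the estimated class-size effect cannot differ between the two HiEL groups. Adding a term $\beta_3 \cdot (HiSTR \times HiEL)$ changes the difference to $\beta_1 + \beta_3 h$, which is what lets the effect vary.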
your turn: regression with two interacted binary variables
To allow the effect of HiSTR to depend on HiEL, we include the interaction term in the
regression. We can do this directly in Stata using the ## operator:
. reg testscr histr##hiel
------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.histr |  -1.907842   2.233654    -0.85   0.394    -6.298497    2.482813
      1.hiel |  -18.16295   2.150084    -8.45   0.000    -22.38933   -13.93656
             |
  histr#hiel |
        1 1  |  -3.494335    3.22244    -1.08   0.279     -9.82863     2.83996
             |
       _cons |   664.1433   1.314807   505.13   0.000     661.5588    666.7278
------------------------------------------------------------------------------
Can you relate these coefficients to the following table of group means?
Can you fill in the rest of the cells using the regression results?
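One way to check your answers is to ask Stata for the four group means directly and then rebuild each one from the coefficients. The following is a minimal sketch for a do-file, assuming histr and hiel were generated as above; no output is shown, so the exercise stays open.

```stata
* Fit the interacted model first so that lincom can use its coefficients
reg testscr histr##hiel

* Sample means of testscr in each of the four histr-by-hiel cells
tabulate histr hiel, summarize(testscr) means

* The same four cell means rebuilt from the regression coefficients
lincom _cons                                      // histr = 0, hiel = 0
lincom _cons + 1.histr                            // histr = 1, hiel = 0
lincom _cons + 1.hiel                             // histr = 0, hiel = 1
lincom _cons + 1.histr + 1.hiel + 1.histr#1.hiel  // histr = 1, hiel = 1
```

With two dummies and their interaction the model has one parameter per cell, so each lincom estimate should coincide with the corresponding sample mean.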
Stata has four factor-variable operators for creating indicator variables from
categorical variables and for interacting categorical and continuous variables:
Operator    Description
----------------------------------------------------------
i.          operator to specify indicators (categories)
c.          operator to treat a variable as continuous
#           binary operator to specify interactions
##          binary operator to specify factorial interactions
            (main effects plus the interaction)
----------------------------------------------------------
To see how prefixes and binary interaction operators work in Stata, use:
. help fvvarlist
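As an illustration of what ## expands to, the sketch below fits the same two-dummy model in three ways. The variable name histr_hiel is hypothetical and is used only for this example.

```stata
* 1. Factorial interaction: main effects plus interaction in one term
reg testscr histr##hiel

* 2. The same model written out with i. and #
reg testscr i.histr i.hiel i.histr#i.hiel

* 3. The same model with a hand-made product variable
*    (histr_hiel is a hypothetical name, not part of the dataset)
gen histr_hiel = histr*hiel
reg testscr histr hiel histr_hiel
```

All three should report the same coefficients; the factor-variable versions have the advantage that postestimation commands such as margins know which terms belong together.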
intercept shift
We impose a common slope on exper for men and women ($\beta_1 = .333$ in this example);
only the intercepts are allowed to differ.
[Figure: graph of $wage = \beta_0 + \delta_0 \cdot female + \beta_1 \cdot exper$ for $\delta_0 < 0$. Axes: wage against exper (0 to 14). The line for men has slope .333; the labelled difference between the two parallel lines is 2.99.]
                                       Intercept              Slope
HiEL=0 (Low % EL)                      $\beta_0$              $\beta_1$
HiEL=1 (High % EL)                     $\beta_0 + \beta_2$    $\beta_1 + \beta_3$
Difference (High % EL) − (Low % EL)    $\beta_2$              $\beta_3$
interaction between a binary and a continuous variable
we allow the effect of STR to depend on HiEL by including their interaction in the regression:
. reg testscr c.str##hiel
------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.9684601    .539787    -1.79   0.074     -2.02951    .0925899
      1.hiel |   5.639141   16.71767     0.34   0.736     -27.2225    38.50078
             |
  hiel#c.str |
          1  |  -1.276613   .8440608    -1.51   0.131    -2.935769    .3825425
             |
       _cons |   682.2458   10.51094    64.91   0.000     661.5847     702.907
------------------------------------------------------------------------------
In c.str##hiel, the c. prefix tells Stata to treat str as a continuous variable;
hiel has no prefix, so it is treated as categorical (here, binary).
interacted continuous and binary variables
TestScr , STR and English learners
• when HiEL = 0:
$\widehat{TestScr} = 682.2 - 0.97 \cdot STR$
• when HiEL = 1:
$\widehat{TestScr} = 682.2 - 0.97 \cdot STR + 5.6 - 1.28 \cdot STR = 687.8 - 2.25 \cdot STR$
• two regression lines: one for each HiEL group
• class size reduction is estimated to have a larger effect when the
percent of English learners is large
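The two slopes can also be obtained from Stata directly, together with their standard errors. The following is a minimal sketch using standard postestimation commands; the choice of tests is an illustration rather than part of the original slides.

```stata
reg testscr c.str##hiel

* Slope of testscr with respect to str in each hiel group
margins hiel, dydx(str)

* Slope in the hiel = 1 group, built directly from the coefficients
lincom str + 1.hiel#c.str

* Joint test that the two lines coincide (same intercept and same slope)
test 1.hiel 1.hiel#c.str
```

The joint test is worth running because the coefficients on 1.hiel and on the interaction can each be individually insignificant while still being jointly significant.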
allowing for different slopes
$TestScore = \beta_0 + \beta_1 \cdot STR + \beta_2 \cdot HiEL + \beta_3 \cdot (HiEL \times STR) + u$
back to our wage regression
$lwage = \beta_0 + \delta_0 \cdot female + \beta_1 \cdot exper + \delta_1 \cdot female \cdot exper + u$
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.5184152   .1469194    -3.53   0.000    -.8068399   -.2299905
       exper |   .0283287   .0111194     2.55   0.011     .0064997    .0501577
    femexper |   .0233771   .0134822     1.73   0.083    -.0030905    .0498447
       _cons |   2.097866   .1258909    16.66   0.000     1.850724    2.345009
------------------------------------------------------------------------------
allowing for different slopes in the wage regression
$\widehat{lwage} = \underset{(.126)}{2.098} - \underset{(.147)}{.518} \cdot female + \underset{(.0111)}{.0283} \cdot exper + \underset{(.0135)}{.0234} \cdot female \cdot exper$
$n = 750, \quad R^2 = .180$
• the intercept for men is 2.098 and the slope is .0283, about 2.8% for
each year of experience.
• the intercept for women is 2.098 − .518 = 1.58 and the slope is
.0283 + .0234 = .0517, about 5.2% for each year of experience.
• the interaction term is marginally statistically significant, with p-value
= .083 (significant at the 10% level but not at the 5% level).
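allowing for different slopes in the wage regression
[Figure: fitted lines from $lwage = \beta_0 + \delta_0 \cdot female + \beta_1 \cdot exper + \delta_1 \cdot female \cdot exper + u$, plotted as lwage against exper (0 to 14) for male and female. Male line: slope = .0283; female line: slope = .0517. Labelled differences between the lines: .518 (at exper = 0) and .190 (at exper = 14).]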
interpretation
$\widehat{lwage} = \underset{(.126)}{2.098} - \underset{(.147)}{.518} \cdot female + \underset{(.0111)}{.0283} \cdot exper + \underset{(.0135)}{.0234} \cdot female \cdot exper$
$n = 750, \quad R^2 = .180$
We must take care in interpreting the coefficient on female when female · exper is
included: at any level of experience, the predicted difference in lwage
between females and males is
$-.518 + .0234 \cdot exper$
so the coefficient on female alone is the gap at exper = 0.
More interesting is the gap at around the mean, say exper = 10:
$-.518 + .0234 \cdot 10 = -.284$
or about 28.4% less for women. The gap never fully closes within the sample: the largest
amount of experience in the sample is between 13 and 14 years, while the predicted gap
reaches zero only at exper = .518/.0234 ≈ 22 years.
• we can centre the variable by replacing $female \cdot exper$ with
$female \cdot (exper - 10)$
• the coefficient on female becomes the difference at 10 years of exper
• 10 is close to the mean value of experience in the sample.
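The centring step can be sketched in Stata as follows. The variable names lwage, female, exper, femexper and femexper_10 are those that appear in the regression output on these slides; the lincom line is an added cross-check, not part of the original lecture.

```stata
* Build the centred interaction and re-estimate
gen femexper_10 = female*(exper - 10)
reg lwage female exper femexper_10

* Cross-check without centring: after the uncentred regression, the gap at
* exper = 10 is the coefficient on female plus 10 times the interaction
reg lwage female exper femexper
lincom female + 10*femexper
```

Both routes should give the same point estimate and standard error for the gap at 10 years, since centring only reparameterises the model.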
results after centring
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.2846441   .0350777    -8.11   0.000    -.3535068   -.2157814
       exper |   .0283287   .0111194     2.55   0.011     .0064997    .0501577
 femexper_10 |   .0233771   .0134822     1.73   0.083    -.0030905    .0498447
       _cons |   2.097866   .1258909    16.66   0.000     1.850724    2.345009
------------------------------------------------------------------------------
To allow the effect of STR to depend on PctEL, we include the interaction term in the
regression. Both variables are continuous, so each takes the c. prefix with the ## operator:
. reg testscr c.str##c.el_pct, r
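With two continuous variables the estimated effect of str changes smoothly with el_pct, so it helps to evaluate it at a few values. The following is a minimal sketch; the values 0, 10 and 20 are illustrative choices, not taken from the lecture.

```stata
* r is the abbreviated robust option (heteroskedasticity-robust standard errors)
reg testscr c.str##c.el_pct, r

* Estimated effect of str on testscr at selected values of el_pct
* (0, 10 and 20 are illustrative values only)
margins, dydx(str) at(el_pct=(0 10 20))
```

Each row of the margins output is the coefficient on str plus the interaction coefficient times the chosen value of el_pct, reported with its own standard error.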
the cubic regressions from columns (5) and (7) are very similar
regression functions of test scores on class size
the two lines have similar shapes and slopes for most districts (17 < STR < 23)
conclusions