Perraillon Marginal Effects Lecture Lisbon
University of Colorado
Anschutz Medical Campus
—
https://www.perraillon.com/PLH/
Why do we need marginal effects?
Logit/probit model reminder
There are several ways of deriving the logit model. We can assume a
latent outcome or assume the observed outcome 1/0 distributes either
Binomial or Bernoulli. The latent approach is convenient because it
can be used to derive both logit and probit models
We assume that there is a latent (unobserved) variable y* that is
continuous. Think about it as a measure of illness
If y* crosses a threshold, then the person dies. We only observe if the
person died but we can’t, by definition, observe the latent variable y*
What is the probability of dying? We can write this problem as:
$$P(y = 1 \mid X) = P(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + u > 0) = P(-u < \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p) = F(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)$$
F() is the cdf of -u. If we assume a logistic distribution, we get the
logistic regression model; if we assume a standard normal distribution, we get the probit model
Logit/probit model reminder
And that’s the probit model. Note that because we use the cdf, the
probability will obviously be constrained between 0 and 1 because,
well, it’s a cdf
If we assume that u follows a standard logistic distribution, then our model
becomes $P(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$
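To make the two cdfs concrete, here is a minimal Stata sketch that evaluates both at the same value of the linear index. The index value 0.5 and the scalar name xb are arbitrary choices for illustration, not numbers from the lecture data. invlogit() and normal() are Stata's built-in logistic and standard normal cdfs.

```stata
* Hypothetical value of the linear index b0 + b1*x (not from the lecture data)
scalar xb = 0.5
* Logistic cdf written out, as in the formula above
di exp(scalar(xb)) / (1 + exp(scalar(xb)))
* Same quantity using the built-in inverse logit function
di invlogit(scalar(xb))
* Probit: standard normal cdf evaluated at the same index
di normal(scalar(xb))
```

Whatever value you plug in for the index, both results stay between 0 and 1.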
Logit/probit model reminder
For now, the most important part to remember is that the scale of
estimation is not the same as the scale of interest (more on this in
30 seconds)
This is because we use transformations to constrain the probability
between 0 and 1
Keep in mind that the logistic model can be derived in different ways.
This tends to confuse students. All ways lead to same likelihood
function and therefore the same parameters
Back to why we need marginal effects...
Why do we need marginal effects?
We can write the logistic model as: $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 age + \beta_2 male$
The estimated parameters are in the log-odds scale and, other
than their sign, don’t have a useful interpretation
In the above equation, β1 is the effect of age on the log-odds of the
outcome, not on the probability, which is often what we care
about
As an alternative, economists prefer to estimate Probit models for
binary outcomes:
P(y = 1|male, age) = Φ(γ0 + γ1 age + γ2 male)
But we still have a similar problem. In the estimation scale, γ1 is interpreted as
a shift in the index (z-score) of the standard normal cdf, which, again, is of little help
Why do we need marginal effects?
With the logit model we could present odds ratios ($e^{\beta_1}$ and $e^{\beta_2}$) but
odds ratios are often misinterpreted as if they were relative
risks/probabilities (nonetheless, presenting odds ratios is standard
practice in the medical literature)
A simple example with no covariates: Say that the probability of
death in a control group is 0.40. The probability of death in the
treatment group is 0.20
The odds ratio is: $\frac{0.2/(1-0.2)}{0.4/(1-0.4)} = 0.375$. The treatment reduces the odds of
death by a factor of 0.375. Or in reverse, the odds of death are 2.67
times higher in the control group ($\frac{1}{0.375}$)
But that’s not the relative risk, even though most people, including
journalists, would interpret the odds ratio as a relative risk. The
relative risk is $\frac{0.2}{0.4} = 0.5$. The probability of death is reduced by half
in the treatment group
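The arithmetic in this example can be checked directly in Stata. This is a small sketch; the scalar names are made up for illustration and only the two probabilities come from the example.

```stata
* Probabilities of death from the example above
scalar p_control = 0.40
scalar p_treat   = 0.20
* Odds in each group
scalar odds_control = scalar(p_control) / (1 - scalar(p_control))
scalar odds_treat   = scalar(p_treat) / (1 - scalar(p_treat))
* Odds ratio (treatment vs control): 0.375
di scalar(odds_treat) / scalar(odds_control)
* Relative risk: 0.5
di scalar(p_treat) / scalar(p_control)
* Difference in probabilities: -0.20, that is, 20 percentage points lower
di scalar(p_treat) - scalar(p_control)
```

The three numbers summarize the same two probabilities in three different scales, which is why mixing them up is so easy.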
Why do we need marginal effects?
Note something else. With odds ratios and relative risks, we don’t
have a sense of the magnitude. Same example but now the
probability of death in the control group is 0.0004 and 0.0002 in the
treatment group
The relative risk is still 0.5 and the odds ratio is now about 0.4999 (with a
rare outcome the odds ratio is very close to the relative risk). The
magnitudes are of course quite different
A journalist could still say that, for example, eating broccoli sprouts
daily reduces the probability of dying of cancer by half. By half!!!
But if you learned that the reduction is 0.0004 - 0.0002 = 0.0002, or
0.02 percentage points, you probably are not going to run to Miosotis or
Celeiro or Pingo Doce to get a $4 serving of broccoli sprouts every day
On the other hand, a difference of 20 percentage points looks quite
impressive
As we will see, marginal effects are a way of presenting results as
differences in probabilities, which is more informative than odds
ratios and relative risks
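The rare-outcome version of the example can be checked the same way. A minimal sketch with illustrative scalar names; note that with probabilities this small the computed odds ratio is about 0.4999, practically the same as the relative risk, while the difference in probabilities is tiny.

```stata
* Rare-outcome version of the example
scalar p_control = 0.0004
scalar p_treat   = 0.0002
* Relative risk: still 0.5
di scalar(p_treat) / scalar(p_control)
* Odds ratio: about 0.4999, close to the relative risk because the outcome is rare
di (scalar(p_treat)/(1 - scalar(p_treat))) / (scalar(p_control)/(1 - scalar(p_control)))
* Difference in probabilities: -0.0002, or 0.02 percentage points
di scalar(p_treat) - scalar(p_control)
```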
Why do we need marginal effects?: Recap
Ideally, we want to understand what the model is saying in the
probability scale and not in the odds scale, much less in the
estimation scale, the log-odds.
In the probability scale, all effects are non-linear because,
conditional on covariate values, the probability must be bounded
between 0 and 1
Here is when numerical methods come to the rescue
We call them marginal effects in econometrics but they come in
many other names and there are different types
Big picture: marginal effects use model PREDICTION for
INTERPRETATION. We are using the estimated model to make
predictions so we can better interpret the model in the scale that
makes more sense (but we are not trying to evaluate how good the
model is at predicting...)
Big picture: not just for logit/probit models
It’s about derivatives
Derivative, review
The analytical derivative is a limit:
$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
All the formulas for the derivative can be derived using the
definition and taking the limit. For example, an easy one for
$f(x) = x^2$:
$$\lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h) = 2x$$
Numerically, that is, without finding the analytical formula, we could
use the definition plugging in a number for h that is small enough. In
that case:
$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \approx \frac{f(x+h) - f(x)}{h}$$
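As a quick illustration of the numerical approach, the sketch below approximates the derivative of f(x) = x^2 at x = 3, where the analytical answer is 2x = 6. The point x = 3, the step h = 0.0001, and the scalar names are arbitrary choices for this example. The second line uses a two-sided difference, which is generally more accurate than the one-sided version.

```stata
* Numerical derivative of f(x) = x^2 at x = 3 (analytical answer: 2x = 6)
scalar x0 = 3
scalar h  = 0.0001
* One-sided: (f(x+h) - f(x)) / h, which equals 2x + h before rounding error
di ((scalar(x0) + scalar(h))^2 - scalar(x0)^2) / scalar(h)
* Two-sided: (f(x+h) - f(x-h)) / (2h)
di ((scalar(x0) + scalar(h))^2 - (scalar(x0) - scalar(h))^2) / (2*scalar(h))
```

Both results should be very close to 6; the one-sided version is off by roughly h.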
Terminology
As usual, language that originates in one discipline doesn’t translate
well to others. The term “marginal effects” is common in economics
and is the language of Stata
Gelman and Hill (2007) use the term “average predicted probability”
to refer to the same concept as marginal effects (in the logit model)
SAS and R also have procedures that compute marginal effects, and
they are called marginal effects there as well
One confusion is that when you tell your statistician friend about
marginal effects, your friend imagines an integral because of marginal
probability density functions (in a table of joint probabilities, the
probabilities “at the margin” are the marginal probabilities)
In economics, marginal means “additional” or “incremental,” which is
a derivative
Career advice: When you use marginal effects in a
presentation/paper for non-economists, make sure that you explain
what you mean when you show marginal effects
Digression: Is it a unit change?
Data
gen lw = 0
replace lw = 1 if bwght < 100 & bwght ~= .
Model
------------------------------------------------------------------------------
lw | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | .0449006 .0104436 4.30 0.000 .0244316 .0653696
faminc | -.0080855 .004801 -1.68 0.092 -.0174953 .0013243
motheduc | .0031552 .037153 0.08 0.932 -.0696634 .0759738
_cons | -1.678173 .4497551 -3.73 0.000 -2.559676 -.7966687
------------------------------------------------------------------------------
Model
------------------------------------------------------------------------------
lw | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | 1.045924 .0109232 4.30 0.000 1.024733 1.067554
faminc | .9919471 .0047623 -1.68 0.092 .9826569 1.001325
motheduc | 1.00316 .0372704 0.08 0.932 .9327077 1.078934
_cons | .1867149 .083976 -3.73 0.000 .0773298 .4508283
------------------------------------------------------------------------------
Model
We can also run our trusty linear model with the caveat that SEs are
likely not right (but probably close) and that since the probability of
low birth weight is (relatively) low we should be extra careful
Now, in the probability scale, an extra cigarette increases the
probability of low birth weight by about 0.8 percentage points. With 10
cigarettes, about 7.8 percentage points (but that assumes a linear effect)
reg lw cigs faminc motheduc, robust
------------------------------------------------------------------------------
| Robust
lw | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | .007757 .0020677 3.75 0.000 .0037009 .0118131
faminc | -.0009345 .0005785 -1.62 0.106 -.0020693 .0002004
motheduc | .0005403 .0042972 0.13 0.900 -.0078895 .00897
_cons | .1531912 .0532648 2.88 0.004 .0487027 .2576797
------------------------------------------------------------------------------
A plot is always helpful
A plot will help you understand the shape of the relationship of
interest but remember that other variables may change the shape
lowess lw cigs, gen(lw_c)
scatter lw cigs, jitter(3) msize(small) || ///
line lw_c cigs, color(blue) sort legend(off) saving(l.gph, replace)
graph export l.png, replace
Average Marginal Effect (AME)
Let’s calculate the AME for the cigarette variable numerically, using the
definition of the derivative with a small change h
* Get the "small change"
qui sum cigs
scalar h = (abs(r(mean))+.0001)*.0001
di h
*.00020873
preserve
qui logit lw cigs faminc motheduc, nolog
* as is
predict double lw_0 if e(sample)
* Change cigs by a bit
replace cigs = cigs + scalar(h)
predict double lw_1 if e(sample)
* For each obs
gen double dydx = (lw_1-lw_0)/scalar(h)
* Average
sum dydx
restore
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
dydx | 1,387 .0055768 .0012444 .0040507 .0113006
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | .0055782 .0012814 4.35 0.000 .0030666 .0080898
------------------------------------------------------------------------------
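For the logit model the derivative also has a closed form, which is a useful check on the numerical result. A short derivation, writing $p_i$ for the predicted probability of observation i, $x_i'\beta$ for its linear index, and $\beta_1$ for the coefficient on cigs (notation introduced here for illustration):

```latex
p_i = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}
\quad\Rightarrow\quad
\frac{\partial p_i}{\partial cigs_i} = \beta_1 \, p_i (1 - p_i),
\qquad
\text{AME} = \frac{1}{n} \sum_{i=1}^{n} \beta_1 \, p_i (1 - p_i)
```

Because $p_i(1 - p_i)$ differs across observations, each observation has its own marginal effect, which is why the dydx variable has a non-zero standard deviation. The AME is the average of those individual effects.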
Average Marginal Effect (AME) the (almost) Stata way
* Two-sided derivative
preserve
qui logit lw cigs faminc motheduc
* Define small change for cigs
qui sum cigs
scalar h = (abs(r(mean))+0.0001)*0.0001
* Duplicate variable
clonevar cigs_c = cigs
* Everybody smokes
replace smoked = 1
predict double lw_1 if e(sample)
AME for indicator variables
We can of course also use the margins command with caution (!)
qui logit lw smoked faminc motheduc, nolog
margins, dydx(smoked)
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoked | .0988076 .0230959 4.28 0.000 .0535405 .1440748
------------------------------------------------------------------------------
qui logit lw i.smoked faminc motheduc, nolog
margins, dydx(smoked)
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.smoked | .118284 .0322576 3.67 0.000 .0550602 .1815078
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Even though the margins statement is the same, the results are different. The first one
is not what we wanted. We did not use the factor syntax in the first
model so Stata didn’t go from 0 to 1; instead it used a “small” change
Smoking increases the probability of low birth weight by almost 12 percentage
points (yikes)
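The 0 to 1 change that margins performs with the factor syntax can also be sketched by hand. This is a minimal version under the assumption that the lecture's birth weight data, including the lw and smoked variables, are in memory; the names p0, p1, and diff are made up for this example.

```stata
* Assumes the lecture's data with lw, smoked, faminc, motheduc are in memory
preserve
qui logit lw i.smoked faminc motheduc, nolog
* Nobody smokes: predict with smoked set to 0 for everyone
replace smoked = 0
predict double p0 if e(sample)
* Everybody smokes: predict with smoked set to 1 for everyone
replace smoked = 1
predict double p1 if e(sample)
* Incremental effect for each observation, then the average
gen double diff = p1 - p0
sum diff
restore
```

The mean of diff should agree, up to rounding, with the dy/dx reported for 1.smoked by margins.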
The margins command must be treated with caution
AME for indicator variables
With indicator variables, we can also get what Stata calls predictive
margins (not marginal effects). Marginal effects are their difference
We can also use the results to go from margins to relative risk and to
odds ratios
qui logit lw i.smoked faminc motheduc, nolog
margins smoked
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoked |
0 | .1305183 .0099014 13.18 0.000 .1111118 .1499248
1 | .2488023 .0304311 8.18 0.000 .1891584 .3084461
------------------------------------------------------------------------------
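Going from the two predictive margins to a difference, a relative risk, and an odds ratio is simple arithmetic. A small sketch using the margins copied from the table above; the scalar names m0 and m1 are illustrative.

```stata
* Predictive margins copied from the table above
scalar m0 = .1305183
scalar m1 = .2488023
* Marginal (incremental) effect: difference in predictive margins, .118284
di scalar(m1) - scalar(m0)
* Relative risk
di scalar(m1) / scalar(m0)
* Odds ratio implied by the two margins
di (scalar(m1)/(1 - scalar(m1))) / (scalar(m0)/(1 - scalar(m0)))
```

The odds ratio here is computed from averaged predictions, so it need not equal the exponentiated coefficient from the logit output.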
Marginal Effect at the Mean (MEM)
We have left the values of the covariates as they were observed
rather than holding them fixed at a certain value
We can also calculate marginal effects at the mean (of each
covariate)
There is some discussion about which way is better (see Williams,
2012)
For example, does it make sense to hold male at 0.6? In a sense,
yes. We are giving male the value of the proportion of males in the sample,
0.6. In another sense, it seems odd because male is a dummy variable
Don’t waste too much time thinking about this. When we
calculate marginal effects (not margins), it doesn’t really matter at
which value we hold the other covariates constant because we are
taking differences in effects. There could be some differences in small
samples
In general, the difference will be so small that it is better to spend
mental resources somewhere else
Marginal Effect at the Mean (MEM)
Keep covariates at mean values instead
preserve
qui sum cigs
scalar h = (abs(r(mean))+0.0001)*0.0001
qui logit lw cigs faminc motheduc, nolog
clonevar cigs_c = cigs
* At mean
replace faminc = 29.02666
replace motheduc = 12.93583
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | .005563 .0012843 4.33 0.000 .0030458 .0080801
------------------------------------------------------------------------------
Marginal Effect at the Mean (MEM)
Not the same as using the atmeans option
margins, dydx(cigs) atmeans
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | .0055506 .0012879 4.31 0.000 .0030264 .0080749
------------------------------------------------------------------------------
In this one, cigs was also held at its mean, 2.088. Not a big deal in
this example because the effect of cigs is relatively
linear (see lowess plot above), but in other data you could get a very
different answer
One more time: please be careful with the margins command
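One way to make the distinction explicit with margins is to name the covariates that should be held at their means. This is a sketch, assuming the lecture's birth weight data are in memory; the at((mean) varlist) syntax holds only the listed variables at their means and leaves cigs as observed.

```stata
* Assumes the lecture's data with lw, cigs, faminc, motheduc are in memory
qui logit lw cigs faminc motheduc, nolog
* Hold only faminc and motheduc at their means; cigs stays as observed
margins, dydx(cigs) at((mean) faminc motheduc)
* For comparison: atmeans holds every covariate, including cigs, at its mean
margins, dydx(cigs) atmeans
```

The first statement corresponds to the by-hand calculation in which only faminc and motheduc were replaced by their means.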
Marginal effects at representative values (MER)
We will do it “by hand” for low income (10K) and higher income
(40K) using the one-sided version to make the code shorter
preserve
qui logit lw cigs faminc motheduc, nolog
* income 10k
replace faminc = 10
predict double lw_0_10 if e(sample)
replace cigs = cigs + .00597269
predict double lw_1_10 if e(sample)
gen double dydx10 = (lw_1_10-lw_0_10)/.00597269
* income 40k (first undo the small change made to cigs above)
replace cigs = cigs - .00597269
replace faminc = 40
predict double lw_0_40 if e(sample)
replace cigs = cigs + .00597269
predict double lw_1_40 if e(sample)
gen double dydx40 = (lw_1_40-lw_0_40)/.00597269
sum dydx*
restore
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
dydx10 | 1,387 .0061672 .0010198 .005653 .0112164
dydx40 | 1,387 .0052304 .001039 .0047327 .0111981
Marginal effects at representative values (MER)
Below, income reduces the effect of smoking. Better access to health
care? So income is a modifier of the effect?
qui logit lw cigs faminc motheduc, nolog
margins, dydx(cigs) at(faminc=(10 20 30 40)) vsquish
Same but with LPM
Since there are no interactions, the marginal effect doesn’t depend on
the value of income
qui reg lw cigs faminc motheduc
margins, dydx(cigs) at(faminc=(10 20 30 40)) vsquish
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs |
_at |
1 | .007757 .001631 4.76 0.000 .0045574 .0109566
2 | .007757 .001631 4.76 0.000 .0045574 .0109566
3 | .007757 .001631 4.76 0.000 .0045574 .0109566
4 | .007757 .001631 4.76 0.000 .0045574 .0109566
------------------------------------------------------------------------------
With interactions the effect should be more noticeable
Now adding interactions between cigarettes and income. This is the
right way of making the effect of cigs depend on income
Note how the conclusion is different. However, the interaction is not
statistically significant
qui logit lw c.cigs##c.faminc motheduc, nolog
margins, dydx(cigs) at(faminc=(10 20 30 40)) vsquish
Average marginal effects Number of obs = 1,387
Model VCE : OIM
Expression : Pr(lw), predict()
dy/dx w.r.t. : cigs
1._at : faminc = 10
2._at : faminc = 20
3._at : faminc = 30
4._at : faminc = 40
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs |
_at |
1 | .0054953 .0016702 3.29 0.001 .0022217 .0087689
2 | .0059858 .0013512 4.43 0.000 .0033374 .0086342
3 | .0064006 .0016609 3.85 0.000 .0031453 .0096558
4 | .0067452 .0022472 3.00 0.003 .0023408 .0111497
------------------------------------------------------------------------------
Marginsplot
You can visualize changes using marginsplot. This is a way to get
adjusted plots using the margins command
marginsplot, saving(mp.gph, replace)
graph export mp.png, replace
Interactions
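Difference males - females: $\log\left(\frac{p_m}{1-p_m}\right) - \log\left(\frac{p_f}{1-p_f}\right) = \beta_2 + \beta_3 hsp$
Difference male - female for educated:
$\log\left(\frac{p_{me}}{1-p_{me}}\right) - \log\left(\frac{p_{fe}}{1-p_{fe}}\right) = \beta_2 + \beta_3$
Difference male - female for uneducated:
$\log\left(\frac{p_{mu}}{1-p_{mu}}\right) - \log\left(\frac{p_{fu}}{1-p_{fu}}\right) = \beta_2$
Difference in difference:
$\log\left(\frac{p_{me}}{1-p_{me}}\right) - \log\left(\frac{p_{fe}}{1-p_{fe}}\right) - \left[\log\left(\frac{p_{mu}}{1-p_{mu}}\right) - \log\left(\frac{p_{fu}}{1-p_{fu}}\right)\right] = \beta_3$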
So same as with linear model. In the log-odds scale, it is a
difference-in-difference
Interactions in the odds scale
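$\log\left(\frac{p_{me}}{1-p_{me}}\right) - \log\left(\frac{p_{fe}}{1-p_{fe}}\right) - \left[\log\left(\frac{p_{mu}}{1-p_{mu}}\right) - \log\left(\frac{p_{fu}}{1-p_{fu}}\right)\right] = \beta_3$
We can apply the rules of logs and take e() on both sides:
$$\frac{\frac{p_{me}}{1-p_{me}} \Big/ \frac{p_{fe}}{1-p_{fe}}}{\frac{p_{mu}}{1-p_{mu}} \Big/ \frac{p_{fu}}{1-p_{fu}}} = e^{\beta_3}$$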
Wait, two effects? The model has three coefficients. Where is the
interaction?
Interactions and marginal effects
This may seem confusing but it’s not when you remember how Stata
calculates marginal effects
For cigs, a continuous variable, it’s using the two-sided derivative
increasing cigs by a little bit and calculating predictions. It’s
increasing cigs in both the main effect and the interaction
Then it takes an average so the marginal effect of cigs is the
numerical derivative for both inc=1 and inc=0 combined
For the marginal effect of inc, it’s doing the same going from 0 to 1,
averaging over the values of cigs
To get what we need, which in this case is the marginal effect of cigs
separately for inc=1 and inc=0, we have to be more specific
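One way to be more specific is to ask margins for the derivative at each value of the indicator. The sketch below rests on assumptions: that inc is a 0/1 income indicator in the data and that the model interacts it with cigs. The exact model specification is not shown in this part of the slides, so treat the logit line as a guess at the setup rather than the original command.

```stata
* Assumption: inc is a 0/1 income indicator and the model interacts it with cigs
qui logit lw c.cigs##i.inc, nolog
* Marginal effect of cigs computed separately at inc = 0 and inc = 1
margins, dydx(cigs) at(inc=(0 1)) vsquish
```

The at() option fixes inc at each listed value for everyone before taking the numerical derivative with respect to cigs, so each row of the output is the effect of cigs within one income group.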
Interactions and marginal effects
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs |
_at |
1 | .0062867 .0012881 4.88 0.000 .0037621 .0088113
2 | -.0004394 .0062301 -0.07 0.944 -.0126501 .0117713
------------------------------------------------------------------------------
Interactions and marginal effects
Of course, interactions go both ways. So the effect of income
depends on the number of cigs. But cigs is continuous; we have to
choose some values
margins, dydx(inc) at(cigs=(0 10 20 40)) vsquish
Conditional marginal effects Number of obs = 1,388
Model VCE : OIM
Expression : Pr(lw), predict()
dy/dx w.r.t. : 1.inc
1._at : cigs = 0
2._at : cigs = 10
3._at : cigs = 20
4._at : cigs = 40
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
0.inc | (base outcome)
-------------+----------------------------------------------------------------
1.inc |
_at |
1 | -.0114123 .0214935 -0.53 0.595 -.0535388 .0307143
2 | -.0851994 .0622866 -1.37 0.171 -.2072788 .03688
3 | -.1819228 .1224851 -1.49 0.137 -.4219893 .0581436
4 | -.4251388 .2397838 -1.77 0.076 -.8951064 .0448287
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Digression
Margins are predictions
A very brief summary of margins and the margins command
Most common uses: estimate “effects” in the scale of interest.
That is, 1) a numerical derivative for continuous covariates or 2)
incremental effects for dummy variables. Syntax is “margins,
dydx(varname)”
Another possibility is to use margins to obtain “predictive margins”
of dummy variables and, if you fix a continuous covariate at some
values, of continuous variables as well. Syntax is “margins varname”
or “margins varname, at(...)”
With the previous syntax you can use margins to obtain predictions.
Just specify values for all covariates: “margins, at(var1=10 var2=20
var3=...)”
You can also use margins to obtain “adjusted predictions,” which is
essentially the same idea as the previous point. You need to fix covariates
at some values: margins, at(cigs=(0(1)50) faminc=50 motheduc=13)
A very brief summary of margins and the margins
command
Confusion alert: Make sure you understand the difference between
marginal effects and predictive margins. I guarantee you are going to
get confused
Marginal effects (dydx) is about effects; the other is about calculating
predictions but not effects. Yet, part of the confusion is that in order
to calculate effects you also use predictions BUT changing values by a
“small” amount or from 0 to 1
We haven’t discussed other features but you can use the margins
command to express effects as elasticities, for example
The marginsplot command has many options. It’s especially helpful to display
interactions and understand the model
You can produce adjusted plots as in the example in Excel using a
reference population...
Predictions in logistic models
We saw that we can easily make predictions in the probability scale
logit lw smoked faminc, nolog
Logistic regression Number of obs = 1,388
LR chi2(2) = 24.30
Prob > chi2 = 0.0000
Log likelihood = -572.48309 Pseudo R2 = 0.0208
------------------------------------------------------------------------------
lw | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoked | .7898437 .1830874 4.31 0.000 .4309989 1.148688
faminc | -.0076573 .0043414 -1.76 0.078 -.0161662 .0008516
_cons | -1.681833 .151239 -11.12 0.000 -1.978256 -1.38541
------------------------------------------------------------------------------
di exp(_b[_cons] + _b[smoked] + _b[faminc]*30) / (1+exp(_b[_cons] + _b[smoked] + _b[faminc]*30))
.24569462
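The same prediction can be obtained more compactly. In this sketch the first line plugs in the coefficients printed in the table above and uses Stata's built-in invlogit() function; the margins line assumes the logit model from this slide is the most recent estimation in memory.

```stata
* Same prediction using the printed coefficients and the built-in inverse logit
di invlogit(-1.681833 + .7898437 - .0076573*30)
* After the logit above, margins gives the same prediction with a standard error
* (assumes that model is the most recent estimation)
margins, at(smoked=1 faminc=30)
```

Both should give a predicted probability of about 0.2457 for a smoker with faminc equal to 30, matching the hand calculation up to rounding of the coefficients.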
More custom: Two-part models
Two-part models are often used to estimate cost data with a large
proportion of costs that are zero
The idea is to estimate the model in two parts:
1) Estimate the probability that the cost is greater than zero
conditional on covariates: P(yi > 0|Xi ) (using logit, probit,
complementary log-log models)
2) For those observations with non-zero costs, estimate the expected
costs conditional on covariates: E (yi |yi > 0, Xi ) (using Poisson, linear
models, Gamma, log (y ), Box-Cox, etc)
Predictions are obtained by combining both parts (multiplication)
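Written out, the multiplication follows from the law of total expectation, since the contribution of the zero-cost observations is zero:

```latex
E(y_i \mid X_i) = P(y_i > 0 \mid X_i) \times E(y_i \mid y_i > 0, X_i) + P(y_i = 0 \mid X_i) \times 0
                = P(y_i > 0 \mid X_i) \times E(y_i \mid y_i > 0, X_i)
```

The first factor comes from part 1 and the second from part 2.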
Two-part models marginal effects
If you know how to get predictions, you know how to calculate
marginal effects
First, using the twopm user-written command and the margins
command
* Get data from example in twopm command
webuse womenwk
replace wage = 0 if wage==.
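A possible continuation with twopm is sketched below. The option names are written as I recall them from the twopm help file, and the covariates mirror the by-hand version of this example, so check help twopm before relying on the exact syntax. The specification (logit first part, gamma GLM with log link second part) is an assumption chosen to match that by-hand code.

```stata
* twopm is user-written; install it once if needed
* ssc install twopm
* Assumes the womenwk data loaded above, with missing wages already set to 0
* First part: logit for wage > 0; second part: gamma GLM with log link
twopm wage i.married children, firstpart(logit) secondpart(glm, family(gamma) link(log))
* Marginal effect of married on expected wage, combining both parts
margins, dydx(married)
```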
Two-part models marginal effects “by hand”
gen nozeroc = 0
replace nozeroc = 1 if wage >0
preserve
clonevar marr = married
* not married
qui logit nozeroc i.married children
replace married = 0
predict double fp0
replace married = marr
qui glm wage i.married children if wage > 0, f(gamma) l(log)
replace married = 0
predict double c0
gen chat0 = fp0*c0
* married
replace married = marr
qui logit nozeroc i.married children
replace married = 1
predict double fp1
replace married = marr
qui glm wage i.married children if wage > 0, f(gamma) l(log)
replace married = 1
predict double c1
gen chat1 = fp1*c1
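The code above stops after computing the two combined predictions. One plausible way to finish, sketched here with a made-up variable name (dchat), is to take the difference for each observation and average it.

```stata
* Continues the code above (chat0 and chat1 already exist)
* Incremental effect of married on expected wage for each observation
gen double dchat = chat1 - chat0
* Average over the sample: the marginal effect of married
sum dchat
* Undo the changes made to married after preserve
restore
```

The mean of dchat plays the same role as the dy/dx for married that margins would report after a two-part model.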