Public Economics Lecture Notes

Matteo Paradisi

Contents

1 Section 1-2: Uncompensated and Compensated Elasticities; Static and Dynamic Labor Supply
1.1 Uncompensated Elasticity and the Utility Maximization Problem
1.2 Substitution Elasticity and the Expenditure Minimization Problem
1.3 Relating Walrasian and Hicksian Demand: The Slutsky Equation
1.4 Static Labor Supply Choice
1.5 Dynamic Labor Supply

2 Section 2: Introduction to Optimal Income Taxation
2.1 The Income Taxation Problem
2.2 Taxation in a Model With No Behavioral Responses
2.3 Towards the Mirrlees Optimal Income Tax Model
2.4 Optimal Linear Tax Rate
2.5 Optimal Top Income Taxation

3 Section 3-4: Mirrlees Taxation
3.1 The Model Setup
3.2 Optimal Income Tax
3.3 Diamond ABC Formula
3.4 Optimal Taxes With Income Effects
3.5 Pareto Efficient Taxes
3.6 A Test of the Pareto Optimality of the Tax Schedule

4 Section 5: Optimal Taxation with Income Effects and Bunching
4.1 Optimal Taxes with Income Effects
4.2 Bunching Estimator

5 Section 6: Optimal Income Transfers
5.1 Optimal Income Transfers in a Formal Model
5.2 Optimal Tax/Transfer with Extensive Margin Only
5.3 Optimal Tax/Transfer with Intensive Margin Responses
5.4 Optimal Tax/Transfer with Intensive and Extensive Margin Responses

6 Section 7: Optimal Top Income Taxation
6.1 Trickle Down: A Model With Endogenous Wages
6.2 Taxation in the Roy Model and Rent-Seeking
6.3 Wage Bargaining and Tax Avoidance

7 Section 8: Optimal Minimum Wage and Introduction to Capital Taxation
7.1 Optimal Minimum Wage

8 Section 9: Linear Capital Taxation
8.1 A Two-Period Model
8.2 Infinite Horizon Model - Chamley (1986)
8.3 Infinite Horizon - Judd (1985)

9 Section 10: Education Policies and Simpler Theory of Capital Taxation
9.1 Education Policies
9.2 A Simpler Theory of Capital Taxation

10 Section 11: Non-Linear Capital Taxation
10.1 Non-Linear Capital Taxation: Two-Periods Model
10.2 Infinite Horizon Model

1 Section 1-2: Uncompensated and Compensated Elasticities; Static and Dynamic Labor Supply

In this section, we will briefly review the concepts of substitution (compensated) elasticity and
uncompensated elasticity. Compensated and uncompensated labor elasticities play a key role in studies
of optimal income taxation. In the second part of the section we will study labor supply choices in a
static and in a dynamic framework.

1.1 Uncompensated Elasticity and the Utility Maximization Problem


The utility maximization problem: We start by defining the concept of Walrasian demand in
a standard utility maximization problem (UMP). Suppose the agent chooses a bundle of consumption
goods $x_1, \dots, x_N$ with prices $p_1, \dots, p_N$ and her endowment is denoted by $w$. The optimal
consumption bundle solves the following:

$$\max_{x_1,\dots,x_N} u(x_1,\dots,x_N) \quad \text{s.t.} \quad \sum_{i=1}^{N} p_i x_i \le w$$

We solve the problem using a Lagrangian approach and we get the following optimality condition
(if an interior optimum exists) for every good $i$:

$$u_i(x^*) - \lambda^* p_i = 0$$

Solving this equation for $\lambda^*$ and doing the same for good $j$ yields:

$$\frac{u_i(x^*)}{u_j(x^*)} = \frac{p_i}{p_j}$$

This is an important condition in economics and it equates the relative price of two goods to the
marginal rate of substitution (MRS) between them. The MRS measures the amount of good j that
the consumer must be given to compensate the utility loss from a one-unit marginal reduction in her
consumption of good $i$. Graphically, the price ratio is the slope of the budget constraint, while the
ratio of marginal utilities represents the slope of the indifference curve.¹
We call the solution to the utility maximization problem Walrasian or Marshallian demand and
we represent it as a function x (p, w) of the price vector and the endowment. The Walrasian demand
has the following two properties:
• homogeneity of degree zero: $x_i(\alpha p, \alpha w) = x_i(p, w)$ for every $\alpha > 0$
• Walras' Law: for every $p \gg 0$ and $w > 0$ we have $p \cdot x(p, w) = w$
¹ Notice that in a two-goods economy, differentiating the indifference curve $u(x_1, x_2(x_1)) = k$
with respect to $x_1$ gives
$$u_1 + u_2 \frac{dx_2}{dx_1} = 0$$
which delivers
$$\frac{dx_2}{dx_1} = -\frac{u_1}{u_2}$$
which shows that (minus) the ratio of marginal utilities is the slope of the indifference curve at a
point $(x_1, x_2)$.
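
As a quick numerical check of the tangency condition and of the two properties of the Walrasian demand, here is a minimal sketch. It assumes a Cobb-Douglas utility $u(x) = \sum_i a_i \log x_i$, which is our own illustrative choice and is not used elsewhere in these notes; the function name `walrasian_demand` and all numbers are hypothetical.

```python
# Assumed example: Cobb-Douglas utility u(x) = sum_i a_i * log(x_i), a_i > 0.
# Its Walrasian demand has the closed form x_i(p, w) = (a_i / sum(a)) * w / p_i.

def walrasian_demand(prices, wealth, alphas):
    total = sum(alphas)
    return [a / total * wealth / p for a, p in zip(alphas, prices)]

prices = [2.0, 5.0]
wealth = 100.0
alphas = [0.3, 0.7]

x = walrasian_demand(prices, wealth, alphas)

# Homogeneity of degree zero: scaling all prices and wealth leaves demand unchanged.
scale = 3.0
x_scaled = walrasian_demand([scale * p for p in prices], scale * wealth, alphas)

# Walras' Law: the consumer spends her whole endowment.
spending = sum(p * xi for p, xi in zip(prices, x))

# Tangency: the ratio of marginal utilities u_1 / u_2 equals the price ratio p_1 / p_2.
mrs = (alphas[0] / x[0]) / (alphas[1] / x[1])
```

With these numbers the demanded bundle is (15, 14), total spending equals the endowment of 100, and the ratio of marginal utilities equals the price ratio 2/5.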

We define the uncompensated elasticity as the percentage change in the consumption of good $i$ in
response to a one percent increase in the price $p_k$. Using the Walrasian demand we can write the
uncompensated elasticity as:

$$\varepsilon^u_{i,p_k} = \frac{\partial x_i(p, w)}{\partial p_k} \frac{p_k}{x_i(p, w)}$$

Elasticities can also be defined using logarithms such that:

$$\varepsilon^u_{i,p_k} = \frac{\partial \log x_i(p, w)}{\partial \log p_k}$$
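
The logarithmic definition lends itself to a finite-difference computation. The sketch below is only an illustration: it assumes a Cobb-Douglas demand $x_i = a_i w / p_i$ (with weights summing to one), for which the own-price elasticity is $-1$ and the cross-price elasticity is $0$. The helper names and the step size `h` are hypothetical choices.

```python
import math

def demand(prices, wealth, alphas):
    # Assumed Cobb-Douglas demand: x_i = (a_i / sum(a)) * w / p_i
    total = sum(alphas)
    return [a / total * wealth / p for a, p in zip(alphas, prices)]

def uncompensated_elasticity(i, k, prices, wealth, alphas, h=1e-6):
    """Central-difference estimate of d log x_i / d log p_k."""
    up = list(prices)
    up[k] = prices[k] * math.exp(h)      # raise log p_k by h
    down = list(prices)
    down[k] = prices[k] * math.exp(-h)   # lower log p_k by h
    log_up = math.log(demand(up, wealth, alphas)[i])
    log_down = math.log(demand(down, wealth, alphas)[i])
    return (log_up - log_down) / (2 * h)

own = uncompensated_elasticity(0, 0, [2.0, 5.0], 100.0, [0.3, 0.7])
cross = uncompensated_elasticity(0, 1, [2.0, 5.0], 100.0, [0.3, 0.7])
```

Here `own` is approximately $-1$ and `cross` is approximately $0$, as the closed form implies.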

Indirect utility: We introduce the concept of indirect utility that will be useful throughout the
class. It also helps interpreting the role of the Lagrange multiplier. The indirect utility is the utility
that the agent achieves when consuming the optimal bundle x (p, w). It can be obtained by plugging
the Walrasian demand into the utility function:

v (p, w) = u (x (p, w))

The indirect utility has the following properties:

• homogeneity of degree zero: since the Walrasian demand is homogeneous of degree zero, it follows
that the indirect utility will inherit this property
• $\partial v(p, w)/\partial w > 0$ and $\partial v(p, w)/\partial p_k \le 0$

Roy's Identity and the multiplier interpretation: Using the indirect utility function, the value
of the problem can be written as follows at the optimum:

$$v(p, w) = u(x^*(p, w)) + \lambda^* \left(w - p \cdot x^*(p, w)\right)$$

Applying the Envelope theorem, we can study how the indirect utility responds to changes in the
agent's wealth:

$$\frac{\partial v(p, w)}{\partial w} = \lambda^*$$
The value of the Lagrange multiplier at the optimum is the shadow value of the constraint. Specif-
ically, it is the increase in the value of the objective function resulting from a slight relaxation of the
constraint achieved by giving an extra dollar of endowment to the agent. This interpretation of the
Lagrangian multiplier is particularly important in the study of optimal Ramsey taxes and transfers.
You will see more about it in the second part of the PF sequence.
The Envelope theorem also implies that:

$$\frac{\partial v(p, w)}{\partial p_i} = -\lambda^* x_i(p, w)$$

Using the two conditions together we have:

$$-\frac{\partial v(p, w)/\partial p_i}{\partial v(p, w)/\partial w} = x_i(p, w)$$

This equation is known as Roy's Identity and it derives the Walrasian demand from the indirect
utility function.
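
Roy's Identity can be verified numerically. The following sketch assumes the Cobb-Douglas indirect utility $v(p, w) = \sum_i a_i \log(a_i w / p_i)$ with $\sum_i a_i = 1$ (an illustrative assumption, not a functional form taken from these notes) and recovers the demand for good 1 from finite-difference derivatives of $v$.

```python
import math

ALPHAS = [0.3, 0.7]  # assumed Cobb-Douglas weights, summing to one

def indirect_utility(prices, wealth):
    # v(p, w) = sum_i a_i * log(a_i * w / p_i)
    return sum(a * math.log(a * wealth / p) for a, p in zip(ALPHAS, prices))

def roy_demand(i, prices, wealth, h=1e-4):
    """Demand for good i computed as -(dv/dp_i) / (dv/dw) by central differences."""
    up = list(prices)
    up[i] += h
    down = list(prices)
    down[i] -= h
    dv_dp = (indirect_utility(up, wealth) - indirect_utility(down, wealth)) / (2 * h)
    dv_dw = (indirect_utility(prices, wealth + h)
             - indirect_utility(prices, wealth - h)) / (2 * h)
    return -dv_dp / dv_dw

prices, wealth = [2.0, 5.0], 100.0
x0_roy = roy_demand(0, prices, wealth)
x0_closed = ALPHAS[0] * wealth / prices[0]   # closed-form Walrasian demand
```

Both `x0_roy` and `x0_closed` are close to 15, the demand for the first good at these prices.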

1.2 Substitution Elasticity and the Expenditure Minimization Problem
In this section we aim to isolate the substitution effect of a change in price. An increase in the price
of good i typically generates two effects:
• substitution effect: the relative price of xi increases, therefore the consumer substitutes away
from this good towards other goods,
• income effect: the consumer's purchasing power has decreased, therefore she needs to reoptimize
her entire bundle. If good $i$ is a normal good, this reduces its consumption even further.
We define substitution or compensated elasticity as the percentage change in the demand for a good in
response to a change in a price that ignores the income effect. In order to get at this new concept, we
focus on a problem that is “dual” to the utility maximization problem: the expenditure minimization
problem (EMP). The consumer solves:
$$\min_{x_1,\dots,x_N} \sum_{i=1}^{N} p_i x_i \quad \text{s.t.} \quad u(x_1,\dots,x_N) \ge \bar{u}$$

The problem asks to solve for the consumption bundle that minimizes the amount spent to achieve
utility level ū. The solution delivers two important functions: the expenditure function e (p, ū), which
measures the total expenditure needed to achieve utility ū under the price vector p, and the Hicksian
(or compensated ) demand h (p, ū), which is the demand vector that solves the minimization problem.
The Walrasian and Hicksian demands answer two different but related problems. The following
two statements establish a relationship between the two concepts:
1. If $x^*$ is optimal in the UMP when wealth is $w$, then $x^*$ is optimal in the EMP when
$\bar{u} = u(x^*)$. Moreover, $e(p, \bar{u}) = w$.
2. If $x^*$ is optimal in the EMP when $\bar{u}$ is the required level of utility, then $x^*$ is optimal
in the UMP when $w = p \cdot x^*$. Moreover, $\bar{u} = u(x^*)$.
The Hicksian demand allows us to isolate the pure substitution effect in response to a price change.
We call it compensated since it is derived following the idea that, after a price change, the consumer
will be given enough wealth (the “compensation”) to maintain the same utility level she experienced
before the price change. Suppose that under the price vector $p$ the consumer demands a bundle $x$
such that $p \cdot x = w$. When the price vector is $p'$, the consumer solves the new expenditure
minimization problem and switches to $x'$ such that $u(x) = u(x')$ and $p' \cdot x' = w'$. The change
$\Delta w = w' - w$ is the compensation that the agent receives to be as well off in utility terms after
the price change as she was before. Thanks to the compensation there is no income effect coming from
the reduction in the agent's purchasing power.
We call the elasticity of the Hicksian demand function the compensated elasticity and it reads:

$$\varepsilon^c_{i,p_k} = \frac{\partial h_i(p, \bar{u})}{\partial p_k} \frac{p_k}{h_i(p, \bar{u})}$$

1.3 Relating Walrasian and Hicksian Demand: The Slutsky Equation


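We now establish a relationship between the Walrasian and the Hicksian demand elasticities. We know
that $u(x(p, w)) = \bar{u}$ and $e(p, \bar{u}) = w$. Start from the following identity:

$$x_i(p, e(p, \bar{u})) = h_i(p, \bar{u})$$

and differentiate both sides with respect to $p_k$ to get:

$$\begin{aligned}
\frac{\partial h_i(p, \bar{u})}{\partial p_k}
&= \frac{\partial x_i(p, e(p, \bar{u}))}{\partial p_k} + \frac{\partial x_i(p, e(p, \bar{u}))}{\partial e(p, \bar{u})} \frac{\partial e(p, \bar{u})}{\partial p_k} \\
&= \frac{\partial x_i(p, w)}{\partial p_k} + \frac{\partial x_i(p, w)}{\partial w} h_k(p, \bar{u}) \\
&= \frac{\partial x_i(p, w)}{\partial p_k} + \frac{\partial x_i(p, w)}{\partial w} x_k(p, e(p, \bar{u})) \\
&= \frac{\partial x_i(p, w)}{\partial p_k} + \frac{\partial x_i(p, w)}{\partial w} x_k(p, w)
\end{aligned}$$

where the second line uses $\partial e(p, \bar{u})/\partial p_k = h_k(p, \bar{u})$ (Shephard's lemma).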
Rearranging, we derive the following relation:

$$\underbrace{\frac{\partial x_i(p, w)}{\partial p_k}}_{\text{uncompensated change}} = \underbrace{\frac{\partial h_i(p, \bar{u})}{\partial p_k}}_{\text{substitution effect}} \underbrace{- \frac{\partial x_i(p, w)}{\partial w} x_k(p, w)}_{\text{income effect}}$$

We have thus decomposed the uncompensated change into income and substitution effects. Notice
also how the income effect is the product of two terms: $\partial x_i(p, w)/\partial w$ is the response
of the Walrasian demand for good $i$ to a change in wealth; $x_k(p, w)$ is the mechanical effect of an
increase in $p_k$ on the agent's purchasing power: an agent whose demand for $k$ was $x_k(p, w)$
experiences a mechanical reduction of her purchasing power amounting to $x_k(p, w)$ when $p_k$
increases by 1.
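
The decomposition can be checked numerically. The sketch below assumes Cobb-Douglas preferences $u(x) = \sum_i a_i \log x_i$ with $\sum_i a_i = 1$, for which the expenditure function is $e(p, \bar{u}) = e^{\bar{u}} \prod_i (p_i/a_i)^{a_i}$; this functional form and all helper names are illustrative assumptions. It computes the own-price uncompensated change, the substitution effect and the income effect for good 1 by central differences.

```python
import math

ALPHAS = [0.3, 0.7]  # assumed Cobb-Douglas weights, summing to one

def walrasian(prices, wealth):
    return [a * wealth / p for a, p in zip(ALPHAS, prices)]

def utility(x):
    return sum(a * math.log(xi) for a, xi in zip(ALPHAS, x))

def expenditure(prices, u_bar):
    # e(p, u) = exp(u) * prod_i (p_i / a_i) ** a_i for this utility function
    return math.exp(u_bar) * math.prod((p / a) ** a for a, p in zip(ALPHAS, prices))

def hicksian(prices, u_bar):
    # h(p, u) = x(p, e(p, u)): the identity used to derive the Slutsky equation
    return walrasian(prices, expenditure(prices, u_bar))

def bump(prices, k, h):
    out = list(prices)
    out[k] += h
    return out

prices, wealth, h = [2.0, 5.0], 100.0, 1e-5
i, k = 0, 0
u_bar = utility(walrasian(prices, wealth))

dx_dp = (walrasian(bump(prices, k, h), wealth)[i]
         - walrasian(bump(prices, k, -h), wealth)[i]) / (2 * h)
dh_dp = (hicksian(bump(prices, k, h), u_bar)[i]
         - hicksian(bump(prices, k, -h), u_bar)[i]) / (2 * h)
dx_dw = (walrasian(prices, wealth + h)[i] - walrasian(prices, wealth - h)[i]) / (2 * h)

substitution_effect = dh_dp
income_effect = -dx_dw * walrasian(prices, wealth)[k]
```

For these numbers the uncompensated change of about $-7.5$ splits into a substitution effect of about $-5.25$ and an income effect of about $-2.25$.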

1.4 Static Labor Supply Choice


In this paragraph we study a simple framework of labor supply choice and we derive uncompensated
labor elasticities. Assume an agent derives utility from consumption, but disutility from labor. Her
preferences are represented by the utility function $u(c, n)$ where $\partial u/\partial c > 0$ and
$\partial u/\partial n < 0$. The agent has an amount $I$ of wealth and earns salary $w$. We normalize
the price of consumption to 1.² The utility maximization problem now is:

$$\max_{c,n} u(c, n) \quad \text{s.t.} \quad c = wn + I$$

Taking FOCs and rearranging we get the following:

$$-\frac{u_n}{u_c} = w$$
This condition is similar to the one we derived above. It equates the cost of leisure w to the marginal
rate of substitution between labor and consumption. Dividing the marginal disutility of labor by the
marginal utility of consumption we get the marginal utility cost of labor in consumption units. The
condition therefore equates the marginal utility cost of labor to the salary.
² Notice that we can normalize the price of consumption in a two-goods economy and interpret the
salary $w$ as the relative price of leisure over consumption.

We now want to study the labor supply response to a change in salary. Suppose that the wage
increases. Since the consumer gets paid more for every hour she works, she will tend to work more
(which implies that she will consume less leisure). This is the substitution effect. However, since the
agent earns more for every hour of work, she gets paid more for the hours she was already
working. Since the consumer is wealthier, if leisure is a normal good, she will tend to work less and
consume more leisure. This is the income effect. Notice that, even if the cost of leisure has increased,
the income and substitution effects do not go in the same direction, unlike in standard consumer
problems where an increase in the price of good $i$ generates a negative income and substitution effect
for good $i$. The reason is that this is an endowment economy where we think about leisure $l$ as the
difference between the total time endowment $T$ and labor. We have $l = T - n$. In this setup the agent
is a net seller of leisure and therefore the income effect is positive for leisure when the salary
increases.
Now we get a little more formal and we study analytically the response of labor supply to changes
in the wage rate. Totally differentiating the optimality condition with respect to $w$ we get:

$$\frac{\partial n}{\partial w} = -\frac{u_c + n(u_{nc} + w u_{cc})}{w^2 u_{cc} + 2w u_{nc} + u_{nn}}$$

Notice that the denominator of the expression is the second order condition of the problem and
can therefore be signed. If we assume the problem is concave (in order to get an interior solution), the
denominator is negative. This implies that:

$$\frac{\partial n}{\partial w} \propto u_c + n(u_{nc} + w u_{cc})$$
This expression captures the intuition provided above. The first term is the substitution effect,
which is always positive and proportional to the marginal utility of consumption: the extent to which
the consumer substitutes labor and consumption depends on how attractive consumption is. The
second term measures the income effect. It depends on the cross-derivative of consumption and labor
and the concavity of the utility function in consumption. The cross-derivative measures how changes
in consumption affect the labor disutility. Faster decreasing marginal returns to consumption imply
lower incentive to consume when the agent becomes wealthier (remember that ucc < 0). The income
effect is scaled by n, which is the mechanical effect on endowment of a one unit increase of w.

Example: We now study a functional form for preferences that is particularly convenient for the
study of optimal tax problems. Suppose the agent has the following utility:

$$u(c, n) = c - \frac{n^{1 + \frac{1}{\varepsilon}}}{1 + \frac{1}{\varepsilon}}$$

This is a quasi-linear utility function whose property is to rule out income effects. We will come
back to this point later.
The optimality condition reads:

$$n^{\frac{1}{\varepsilon}} = w$$

Taking logs we get:

$$\frac{1}{\varepsilon} \log n = \log w$$

Since $\varepsilon^u_{n,w} = \partial \log n/\partial \log w$ we can write:

$$\varepsilon^u_{n,w} = \frac{\partial \log n}{\partial \log w} = \varepsilon$$

Therefore, this utility function has a constant elasticity of labor supply. Also, given the absence of
income effects, we know that $\varepsilon^u_{n,w} = \varepsilon^c_{n,w}$.
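
To see the constant elasticity and the absence of income effects at work, the sketch below maximizes the quasi-linear utility numerically instead of using the closed form $n = w^{\varepsilon}$. The value $\varepsilon = 0.5$, the search bounds and the function names are hypothetical choices made for illustration.

```python
import math

EPS = 0.5  # assumed value of the elasticity parameter

def utility(c, n):
    # Quasi-linear utility from the example: u(c, n) = c - n**(1 + 1/EPS) / (1 + 1/EPS)
    return c - n ** (1 + 1 / EPS) / (1 + 1 / EPS)

def labor_supply(w, I, lo=0.0, hi=50.0, iters=200):
    """Maximize u(w*n + I, n) over n by ternary search (the objective is concave in n)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if utility(w * m1 + I, m1) < utility(w * m2 + I, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

n_low = labor_supply(2.0, I=0.0)
n_high = labor_supply(2.0 * math.exp(0.01), I=0.0)   # wage raised by 1% in logs
elasticity = (math.log(n_high) - math.log(n_low)) / 0.01
n_rich = labor_supply(2.0, I=10.0)                   # same wage, higher wealth
```

The estimated `elasticity` is close to `EPS`, and `n_rich` coincides with `n_low` up to numerical error: extra wealth does not change hours worked.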

Compensated Labor Supply Elasticity: We can derive the compensated response of labor supply
by using the Slutsky equation. We already know the uncompensated response to wage changes and
we therefore need to find $\partial n/\partial I$. Totally differentiating the FOC with respect to $I$
we get:

$$\frac{\partial n}{\partial I} = -\frac{u_{nc} + w u_{cc}}{w^2 u_{cc} + 2w u_{nc} + u_{nn}}$$

The Slutsky equation is the following:

$$\frac{\partial n}{\partial w} = \frac{\partial n^c}{\partial w} + n \frac{\partial n}{\partial I}$$

Notice that the sign of the income effect is flipped since $w$ is the price of leisure, while we are
studying the response of labor. We therefore conclude:

$$\frac{\partial n^c}{\partial w} = -\frac{u_c}{w^2 u_{cc} + 2w u_{nc} + u_{nn}}$$

By comparing the compensated and uncompensated response we clearly see why quasi-linear
preferences imply no income effect: they are separable and linear in consumption. Therefore,
$u_{nc} = 0$ and $u_{cc} = 0$.

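$\lambda$-constant Elasticity: We introduce a concept that will be useful later in the analysis of
intertemporal elasticities. The first order conditions for the static labor supply model solved with a
Lagrangian approach are:

$$u_c = \lambda$$
$$u_n = -\lambda w$$

Define the $\lambda$-constant or Frisch elasticity as the elasticity that is computed assuming $\lambda$
does not change. Totally differentiating with respect to $w$ we get:

$$\begin{bmatrix} u_{cc} & u_{cn} \\ u_{nc} & u_{nn} \end{bmatrix} \cdot \begin{bmatrix} \frac{\partial c^F}{\partial w} \\ \frac{\partial n^F}{\partial w} \end{bmatrix} = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix}$$

By inverting the $2 \times 2$ matrix we can solve the system. The solutions are:

$$\begin{aligned}
\begin{bmatrix} \frac{\partial c^F}{\partial w} \\ \frac{\partial n^F}{\partial w} \end{bmatrix}
&= \begin{bmatrix} u_{cc} & u_{cn} \\ u_{nc} & u_{nn} \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ -\lambda \end{bmatrix} \\
&= \frac{1}{u_{cc} u_{nn} - u_{cn}^2} \begin{bmatrix} u_{nn} & -u_{cn} \\ -u_{nc} & u_{cc} \end{bmatrix} \begin{bmatrix} 0 \\ -\lambda \end{bmatrix} \\
&= \begin{bmatrix} \dfrac{\lambda u_{cn}}{u_{cc} u_{nn} - u_{cn}^2} \\ \dfrac{-\lambda u_{cc}}{u_{cc} u_{nn} - u_{cn}^2} \end{bmatrix}
\end{aligned}$$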

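A Comparison Among Elasticities: We now draw a comparison among the three elasticities
presented so far. We already know that $\varepsilon^c_{n,w} \ge \varepsilon^u_{n,w}$ since the income
effect is negative on labor supply. We therefore need to compare the compensated and the
$\lambda$-constant elasticity. We will prove that $\varepsilon^F_{n,w} \ge \varepsilon^c_{n,w}$. Both
elasticities multiply the underlying response of $n$ by the same positive factor $w/n$, so we can
start by writing the following (using $\lambda = u_c$):

$$\begin{aligned}
\frac{w}{n}\left(\frac{1}{\varepsilon^c_{n,w}} - \frac{1}{\varepsilon^F_{n,w}}\right)
&= -\frac{w^2 u_{cc} + 2w u_{nc} + u_{nn}}{u_c} + \frac{u_{cc} u_{nn} - u_{cn}^2}{u_c u_{cc}} \\
&= \frac{1}{u_c}\left(-w^2 u_{cc} - 2w u_{nc} - \frac{u_{cn}^2}{u_{cc}}\right)
\end{aligned}$$

Multiplying and dividing the term in parentheses by $u_{cc}$ shows that it is a perfect square divided
by $-u_{cc}$:

$$-w^2 u_{cc} - 2w u_{nc} - \frac{u_{cn}^2}{u_{cc}} = -\frac{(w u_{cc} + u_{cn})^2}{u_{cc}} \ge 0$$

where the inequality uses the fact that $u_{cc} < 0$. Since $u_c > 0$, we established that
$\frac{1}{\varepsilon^c_{n,w}} - \frac{1}{\varepsilon^F_{n,w}} \ge 0$. Both elasticities are positive (the
second order condition is negative and concavity implies $u_{cc} u_{nn} - u_{cn}^2 > 0$), which implies
$\varepsilon^F_{n,w} \ge \varepsilon^c_{n,w}$. Keeping the marginal utility of consumption
constant implies that there are no income effects: higher wealth given the same amount of hours of
work does not change preferences towards consumption. Thus, the $\lambda$-constant elasticity is at
least as big as the compensated one.
We therefore conclude that the following is always true:

$$\varepsilon^F_{n,w} \ge \varepsilon^c_{n,w} \ge \varepsilon^u_{n,w}$$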
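
The ranking can be illustrated with a utility function that does feature income effects. The sketch below assumes $u(c, n) = \log c - n^{1 + 1/\varepsilon}/(1 + 1/\varepsilon)$, which is our own example and not a specification from these notes. It solves the first order condition by bisection and evaluates the three responses derived above, each multiplied by $w/n$.

```python
EPS = 0.5  # assumed curvature parameter of the labor disutility

def solve_labor(w, I, lo=1e-9, hi=100.0):
    """Bisection on the FOC  w * u_c + u_n = 0  for u = log(c) - n**(1+1/EPS)/(1+1/EPS)."""
    def foc(n):
        return w / (w * n + I) - n ** (1 / EPS)
    for _ in range(200):
        mid = (lo + hi) / 2
        if foc(mid) > 0:      # marginal benefit of work still exceeds its cost
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def elasticities(w, I):
    n = solve_labor(w, I)
    c = w * n + I
    u_c, u_cc, u_cn = 1 / c, -1 / c ** 2, 0.0
    u_nn = -(1 / EPS) * n ** (1 / EPS - 1)
    soc = w ** 2 * u_cc + 2 * w * u_cn + u_nn   # negative at an interior optimum
    scale = w / n                               # converts responses into elasticities
    uncompensated = -(u_c + n * (u_cn + w * u_cc)) / soc * scale
    compensated = -u_c / soc * scale
    frisch = -u_c * u_cc / (u_cc * u_nn - u_cn ** 2) * scale
    return uncompensated, compensated, frisch

e_u, e_c, e_f = elasticities(w=2.0, I=1.0)
```

With these assumed numbers the Frisch elasticity equals `EPS`, the compensated elasticity is smaller, and the uncompensated elasticity is smaller still.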

1.5 Dynamic Labor Supply


In the previous paragraph we studied the static labor supply choice. Now we will switch to a dynamic
setting that allows us to study labor supply responses to changes in salaries over time. Agents make
labor supply decisions in view of their lifetime. Current labor supply depends on current and future
wages and income. Compared to static labor supply models, the substitution effect is similar, but the
income effect differs since the agent faces a lifetime budget constraint. MaCurdy (1981) provides a
useful framework to study labor supply elasticities over the lifecycle. In order to achieve our goal, we
need to separate exogenous static changes (such as the ones studied above) from evolutionary changes,
due to shifts in the life-cycle wage profile. In this analysis we need to distinguish between expected
and unexpected wage changes. While expected changes do not lead to wealth effects, permanent
unexpected changes generate strong wealth effects.
We distinguish among three dimensions of labor supply:
1. the pure lifecycle dimension. Usually, wages have a hump-shaped pattern over the lifecycle.
Agents adjust the hours of work in response to the different salaries they observe along their
lifetime.
2. the macro dimension. Hours of work vary over the business cycle following unexpected shocks.
3. the idiosyncratic dimension. A person may have temporarily higher wages in some period.
In order to isolate the labor supply response to expected changes in wage we need to rule out wealth
effects. We will employ the concept of Frisch elasticity, which allows us to keep the marginal utility of
consumption constant.
We study intertemporal labor supply in the same framework as before, but we introduce the time
dimension. Preferences are now:

$$\sum_{s=t}^{+\infty} \beta^{s} u(c_s, n_s)$$

where $\beta$ is the discount factor.

The consumer faces the following period-by-period budget constraint:

$$A_{t+1} = (1 + r_t)(A_t + y_t + w_t n_t - c_t)$$

The Bellman equation for the problem, with the continuation value discounted by $\beta$, is:

$$V_t(A_t) = \max_{c_t, n_t} \; u(c_t, n_t) + \beta V_{t+1}\left((1 + r_t)(A_t + y_t + w_t n_t - c_t)\right)$$

The FOCs for the problem read:

$$\lambda_t = \beta (1 + r_t) V'_{t+1}(A_{t+1})$$
$$u_n(c_t, n_t) = -\lambda_t w_t$$

By envelope $\lambda_t = V'_t(A_t)$. Therefore, the conditions become:

$$\lambda_t = \beta (1 + r_t) \lambda_{t+1}$$
$$u_n(c_t, n_t) = -\lambda_t w_t$$

Since $\lambda_t = u_c(c_t, n_t)$ the static labor supply choice is the same as in the previous paragraph:

$$-\frac{u_n(c_t, n_t)}{u_c(c_t, n_t)} = w_t$$
Rearranging the budget constraint we have:

$$c_t = w_t n_t + y_t + A_t - \frac{A_{t+1}}{1 + r_t}$$

Notice that the problem is identical to the previous one where income is
$I_t = y_t - \left[\frac{A_{t+1}}{1 + r_t} - A_t\right]$.
In order to assess the Frisch elasticity, we need to compute the labor responses to changes in $w$
when we keep $\lambda$ constant, avoiding any wealth effect. The Frisch demands are defined as follows:

$$c_t = c^F_t(w_t, \lambda_t)$$
$$n_t = n^F_t(w_t, \lambda_t)$$

Since the model is identical to the static labor supply choice and we already derived the Frisch
elasticity for the latter, we can write:

$$\varepsilon^F_{n_t, w_t} = \frac{\partial n^F_t}{\partial w_t} \frac{w_t}{n_t} = \frac{-\lambda_t u_{cc}(c_t, n_t)}{u_{cc}(c_t, n_t) u_{nn}(c_t, n_t) - u_{cn}^2(c_t, n_t)} \frac{w_t}{n_t}$$
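
The following sketch illustrates how hours respond to a fully anticipated wage profile when the marginal utility of consumption follows $\lambda_t = \beta(1 + r_t)\lambda_{t+1}$. It rests on several assumptions of ours: preferences are separable with labor disutility $n^{1 + 1/\varepsilon}/(1 + 1/\varepsilon)$, so that $-u_n = \lambda_t w_t$ gives $n_t = (\lambda_t w_t)^{\varepsilon}$; the interest rate is constant and satisfies $\beta(1 + r) = 1$; and the wage path and the initial level of $\lambda$ are arbitrary numbers.

```python
import math

EPS = 0.5          # assumed Frisch elasticity parameter
BETA = 0.96        # assumed discount factor
R = 1 / BETA - 1   # interest rate chosen so that BETA * (1 + R) = 1

def marginal_utility_path(lambda_0, periods):
    """Iterate lambda_t = BETA * (1 + R) * lambda_{t+1} forward from lambda_0."""
    path = [lambda_0]
    for _ in range(periods - 1):
        path.append(path[-1] / (BETA * (1 + R)))
    return path

def frisch_hours(wage, lam):
    # From -u_n = lambda * w with u_n = -n**(1/EPS):  n = (lambda * w)**EPS
    return (lam * wage) ** EPS

wages = [1.0, 1.5, 2.0, 1.8, 1.2]                 # hump-shaped, fully anticipated
lambdas = marginal_utility_path(0.8, len(wages))  # 0.8 is an arbitrary level
hours = [frisch_hours(w, lam) for w, lam in zip(wages, lambdas)]

# Ratio of log hours growth to log wage growth between the first two periods
growth_ratio = ((math.log(hours[1]) - math.log(hours[0]))
                / (math.log(wages[1]) - math.log(wages[0])))
```

Because $\lambda_t$ stays constant along the path, `growth_ratio` equals `EPS` up to rounding: hours track the anticipated wage profile with the Frisch elasticity.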

References
MaCurdy, T. E. "An Empirical Model of Labor Supply in a Life-Cycle Setting," Journal of Political
Economy, 1981, vol. 89, issue 6, pages 1059-85.
Mas-Colell, A., M. Whinston, and J. R. Green. Microeconomic Theory. New York: Oxford University
Press, 1995.
Miller, N. H. Notes on Microeconomic Theory.

2 Section 2: Introduction to Optimal Income Taxation

In this section we will introduce the problem of optimal income taxation. We will set up the government
problem and derive optimal taxes. We will study the optimal linear tax rate, the optimal top tax rate
and the revenue-maximizing tax rate.

2.1 The Income Taxation Problem


Our goal for most of this class is to derive the properties of optimal taxes in different contexts. We
will define the tax in a flexible way using the mathematical object $T(z)$, where $z$ is the income
reported by the agent. The tax $T(z)$ generates the retention function $R(z) = z - T(z)$. $R(z)$
measures how much the agent can retain out of total income $z$. We denote transfers to income $z$ with
$-T(z)$ so that the transfer $-T(0)$ to non-working individuals is the intercept of the retention
function.
If $T(z)$ is differentiable, $T'(z)$ represents the marginal tax rate. It measures how much the agent
gets taxed out of one additional dollar of income.
In order to study the extensive margin decision between working and remaining unemployed, we
need to know the participation tax rate $\tau_p = \frac{T(z) - T(0)}{z}$. It is the fraction of income
that an agent pays in taxes when she moves from 0 income to $z$.
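
A small sketch may help to keep the three objects apart. The tax schedule below is entirely hypothetical: a transfer of 5,000 at zero income that is phased out at a 50% rate up to an income of 10,000, followed by a 20% marginal rate.

```python
def tax(z):
    """Hypothetical T(z): a 5,000 transfer phased out at 50% up to 10,000, then a 20% rate."""
    if z <= 10_000:
        return -5_000 + 0.5 * z
    return 0.2 * (z - 10_000)

def retention(z):
    # R(z) = z - T(z)
    return z - tax(z)

def marginal_tax_rate(z, h=1.0):
    # Forward difference: tax paid on one additional dollar of income
    return (tax(z + h) - tax(z)) / h

def participation_tax_rate(z):
    # tau_p = (T(z) - T(0)) / z
    return (tax(z) - tax(0)) / z

z = 20_000
mtr = marginal_tax_rate(z)
ptr = participation_tax_rate(z)
transfer_at_zero = retention(0)
```

At an income of 20,000 the marginal tax rate is 20% while the participation tax rate is 35%, because moving from zero income also means losing the 5,000 transfer, which is the intercept of the retention function.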

2.2 Taxation in a Model With No Behavioral Responses


We start with a simple version of an optimal income taxation problem that ignores the labor supply
response to taxation. Suppose the agent has utility $u(c)$ such that $u'(c) > 0$ and $u''(c) \le 0$.
Labor does not enter the utility function and it is supplied inelastically. The agent consumes
everything that is left after taxes so that $c = z - T(z)$. The economy is populated by several agents
and their income is distributed according to $h(z)$ with support $[0, \infty)$.
We study the problem of a government, whose goal is to maximize the total utility of the economy.
Every agent in the economy is equally weighted such that:

$$\int_0^\infty u(z - T(z)) h(z) \, dz$$

We call this type of social welfare function utilitarian. The government targets a level of revenues
$E$ and its budget constraint is:

$$\int_0^\infty T(z) h(z) \, dz \ge E$$

The Lagrangian for the problem reads:

$$L = [u(z - T(z)) + \lambda T(z)] h(z)$$

where $\lambda$ is constant across individuals and measures the value of government revenues in
equilibrium. The optimal choice of $T(z)$ delivers the following first order condition:

$$\frac{\partial L}{\partial T(z)} = [-u'(z - T(z)) + \lambda] h(z) = 0$$

Rearranging:

$$u'(z - T(z)) = \lambda$$

Notice that since $\lambda$ is constant and all agents have the same preferences, the equilibrium
condition implies that consumption is equalized across all individuals. This is a direct consequence
of the utilitarian social welfare function and the concavity of the utility. Suppose that we taxed a
rich individual who would otherwise have a high level of consumption to redistribute to a poor
individual who would otherwise have low consumption. The marginal utility gain of the poor would be
higher than the marginal utility loss of the rich if the utility has decreasing marginal returns
(implied by the concavity of the utility function). This implies that until all consumption levels are
equalized across the economy the government can increase social welfare through "redistribution" from
rich to poor individuals. Since every agent has the same weight in the government social welfare
function, the optimal policy will treat all individuals equally. There is no gain for the government
from guaranteeing a higher level of consumption to a particular group of individuals.
Taxes will serve the purpose of collecting the revenues needed to meet the requirement $E$. Each
individual consumes $c = \bar{z} - E$, where $\bar{z} = \int_0^\infty z h(z) \, dz$ is the average
income. Therefore, we have a 100% marginal tax rate above $\tilde{z} = \bar{z} - E$.
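
A discrete illustration of this result, under assumptions of our own: four agents with hypothetical incomes, a per capita revenue requirement $E = 5$, and logarithmic utility as one example of a concave $u$. The sketch computes the equalized consumption level and compares utilitarian welfare with the one obtained under a uniform lump-sum tax that raises the same revenue.

```python
import math

incomes = [10.0, 20.0, 30.0, 60.0]   # assumed incomes, one agent each
revenue_requirement = 5.0            # E, expressed per capita

mean_income = sum(incomes) / len(incomes)
consumption = mean_income - revenue_requirement   # c = z_bar - E for everyone

taxes = [z - consumption for z in incomes]        # T(z) = z - (z_bar - E)
revenue_per_capita = sum(taxes) / len(taxes)

# Utilitarian welfare with log utility under the two policies
welfare_equalized = len(incomes) * math.log(consumption)
welfare_lump_sum = sum(math.log(z - revenue_requirement) for z in incomes)
```

Everyone consumes 25, agents with income below 25 receive a transfer, the budget constraint holds with equality, and welfare is higher than under the uniform lump-sum tax.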

2.3 Towards the Mirrlees Optimal Income Tax Model


The main limitation of the model presented in the previous paragraph is the absence of behavioral
responses. Agents were not allowed to respond to fiscal incentives and adjust the labor supply according
to the tax schedule. We showed that an extreme case of 100% marginal tax rate can be optimal without
causing a loss of revenues due to lower labor supply. We now relax the assumption of inelastic labor
supply and study a more flexible model.
Suppose the agent has preferences over consumption and labor represented by the utility function
$u(c, l)$. Each agent earns income $wl$ when supplying $l$ hours of labor and consumes
$c = wl - T(wl)$ after taxes. Individuals are heterogeneous in the salary $w$ that we will interpret as
a measure of ability. Salaries are distributed according to $f(w)$.
Changes in taxes have labor supply effects that depend on the characteristics of the change. A
lump-sum change in the level of taxes at a given income changes labor supply through an income effect.
On the other hand, a shift in the marginal tax rate causes a distortion in the labor supply through a
substitution effect.

Social Welfare Functions: The general problem in Mirrlees (1971) assumes that individual welfare
is aggregated through a social welfare function $G(\cdot)$. We typically assume that $G(\cdot)$ is
concave in order to represent redistributive preferences of the government. We define the following
social marginal welfare weight:

$$g_i = \frac{G'(u^i) u^i_c}{\lambda}$$

It measures the government marginal utility from giving a dollar to individual $i$. The expression is
scaled by the marginal value of revenues to the government ($\lambda$), which converts the marginal
utility into a money metric. The concavity of the utility implies that $g_i$ is decreasing in $z_i$.
The social welfare effect of giving \$1 to all the individuals in the economy is therefore
$\int_i g_i$.

2.4 Optimal Linear Tax Rate


In this paragraph we study the optimal income tax when we restrict the instruments that the
government can use to tax income. We focus on linear taxes $\tau$. The revenues of the tax are rebated
through lump-sum transfers. The individual therefore consumes:

$$c_i = (1 - \tau) w_i l_i + \tau Z$$

where $Z$ represents the total income level in equilibrium and therefore $\tau Z$ is the total tax
revenue.

The government sets the linear tax to maximize the following:

$$\int_i G\left[u_i\left((1 - \tau) w_i l_i + \tau Z, l_i\right)\right]$$

Notice that we do not have any government budget constraint since the entire revenue is rebated.
Applying the Envelope theorem we get:

$$\int_i G'(u_i) u_i' \left[-w_i l_i + Z - \tau \frac{dZ}{d(1 - \tau)}\right] = 0$$

$$\int_i G'(u_i) u_i' \left[-z_i + Z - \frac{\tau}{1 - \tau} Z \varepsilon_{z,1-\tau}\right] = 0$$

Where the second line exploits the definition of uncompensated elasticity. Unlike zi , we implicitly
differentiate Z since the individual does not maximize over Z, but takes the transfer as given. In other
words the agent does not internalize the effect of her labor supply choice on aggregate revenues and
transfers. This is why the Envelope theorem does not apply to Z.
The two terms in the expression above are central in the optimal taxation literature:
• $Z - z_i$ is the mechanical effect of the tax. Suppose we keep labor supply unchanged: an increase
in $\tau$ generates a drop in income of $z_i$ and a mechanical increase in transfers of $Z$ due to
higher revenues.
• $-\frac{\tau}{1 - \tau} Z \varepsilon_{z,1-\tau}$ is the behavioral effect of the tax. If we allow
individuals to adjust their labor supplies we have to take into account the fiscal externality on
revenues: when people work less the government collects lower revenues.
We could expect to see in the formula the utility consequence of a change in labor supply. However,
any welfare effect related to the behavioral response of the individual is excluded. The reason is that
although the agent changes the labor supply, if the tax change is small enough we can neglect the
utility effect invoking the envelope theorem. Remember that the logic of the envelope theorem is that
the agent was already choosing her labor supply optimally, so a marginal adjustment of that choice
after we shift a parameter (the tax in this case) has no first-order effect on her utility.
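Rearranging the optimality condition we find:

$$Z \int_i g_i - \int_i g_i z_i = \frac{\tau}{1 - \tau} Z \varepsilon_{z,1-\tau} \int_i g_i$$

$$1 - \frac{\int_i g_i z_i}{Z \int_i g_i} = \frac{\tau}{1 - \tau} \varepsilon_{z,1-\tau}$$

We define $\bar{g} = \frac{\int_i g_i z_i}{Z \int_i g_i}$ and rewrite the condition above to get the
optimal tax rate:

$$\tau^* = \frac{1 - \bar{g}}{1 - \bar{g} + \varepsilon_{z,1-\tau}}$$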

The optimal tax is decreasing in $\varepsilon_{z,1-\tau}$ and $\bar{g}$. When income is very elastic to
taxes, the government will tax less to avoid negative effects on revenues and transfers coming from
distortions to the labor supply. This is the efficiency part of the formula. On the other hand,
$\bar{g}$ is a measure of inequality in the economy. It is low when income is extremely polarized.
Therefore, the government increases taxes at the optimum when inequality is high. This is the equity
part of the formula.
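
The formula is easy to evaluate once $\bar{g}$ and the elasticity are given. In the sketch below the incomes, the welfare weights $g_i = 1/z_i$ and the elasticity of 0.25 are hypothetical numbers chosen for illustration; with a finite number of agents, $Z$ is computed as average income so that it matches the integrals over a unit mass of individuals.

```python
def g_bar(weights, incomes):
    """g_bar = sum(g_i * z_i) / (Z * sum(g_i)), with Z the average income."""
    Z = sum(incomes) / len(incomes)
    return sum(g * z for g, z in zip(weights, incomes)) / (Z * sum(weights))

def optimal_linear_tax(g_bar_value, elasticity):
    """tau* = (1 - g_bar) / (1 - g_bar + elasticity)."""
    return (1 - g_bar_value) / (1 - g_bar_value + elasticity)

incomes = [10.0, 20.0, 30.0, 60.0]     # assumed income distribution
weights = [1 / z for z in incomes]     # assumed weights, decreasing in income
elasticity = 0.25                      # assumed elasticity of income w.r.t. 1 - tau

gb = g_bar(weights, incomes)
tau_star = optimal_linear_tax(gb, elasticity)

# Special case g_bar = 0: only revenue matters and tau = 1 / (1 + elasticity)
tau_revenue_max = optimal_linear_tax(0.0, elasticity)
```

With these numbers $\bar{g} = 2/3$ and $\tau^* = 4/7$, while setting $\bar{g} = 0$ gives a rate of 0.8.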

2.5 Optimal Top Income Taxation
We now derive taxes as in Saez (2001). Instead of specifying a model, we consider the different effects
of a tax change and derive the tax by imposing that their sum is zero in equilibrium. Suppose the
government wants to optimally set a constant marginal tax rate $\tau$ above an income threshold $z^*$.
The average income above $z^*$ is denoted by $z(1 - \tau)$ and it depends on the tax rate in place. The
uncompensated elasticity of $z$ for top earners is constant and denoted by $\varepsilon_{z,1-\tau}$.
When the tax $\tau$ is raised we have no effects on individuals with income below $z^*$, while all
incomes above $z^*$ are affected by the change. We will study three different effects of the tax.

Mechanical Effect: Suppose labor supply was inelastic. When $\tau$ increases we would see a mechanical
increase in revenues of the following form:

$$dM = d\tau (z - z^*)$$

The mechanical effect is proportional to the difference between the average income above $z^*$ and
$z^*$. It measures the mechanical increase in revenues that is generated by the tax change.

Behavioral Effect: Top earners react to the tax increase by adjusting their labor supply. The
behavioral response triggers a fiscal externality and a reduction in revenues. The behavioral effect is:

$$\begin{aligned}
dB = \tau \, dz &= -\tau \frac{dz}{d(1 - \tau)} d\tau \\
&= -\frac{\tau}{1 - \tau} \frac{1 - \tau}{z} \frac{dz}{d(1 - \tau)} z \, d\tau \\
&= -\frac{\tau}{1 - \tau} \varepsilon_{z,1-\tau} z \, d\tau
\end{aligned}$$

It is proportional to the elasticity of labor supply since the more elastic labor is, the higher the
revenue loss.

Welfare Effect. Denote with $\bar g$ the (assumed) constant social marginal welfare weight for earners above $z^*$. The tax change mechanically raises revenues from top income individuals, generating the following welfare effect:

$$dW = -d\tau\,\bar g\,(z - z^*)$$

We also showed that the tax increase triggers a behavioral response. The reason it is not included in the welfare effect is that, if the tax change is small, people reoptimize at the margin and their utility level is unaffected to first order. Again, this is an envelope theorem argument.

Optimal Tax. At the optimum the three effects must sum to zero. If they did not, the government could adjust the tax rate and achieve higher social welfare. We therefore have:

$$dM + dB + dW = d\tau\left[(1-\bar g)(z - z^*) - \varepsilon_{z,1-\tau}\,\frac{\tau}{1-\tau}\,z\right] = 0$$

Rearranging:

$$\tau^* = \frac{1-\bar g}{1-\bar g + a\,\varepsilon_{z,1-\tau}}$$

with $a = \frac{z}{z - z^*}$ measuring the thinness of the right tail of the income distribution. The optimal tax is decreasing in the social marginal welfare weight of top earners $\bar g$: the more the government cares about top income individuals, the less they will be taxed. As we could expect, the optimal tax is also decreasing in the elasticity of labor supply. Higher elasticity implies larger efficiency costs. Finally, $\tau^*$ decreases in $a$. The shape of the income distribution matters: the government sets lower top income taxes when earners above $z^*$ are mostly concentrated around $z^*$. If instead there is a thicker tail, the top income tax is higher.
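The perturbation argument can be checked numerically. The sketch below, under illustrative parameter values that are assumptions rather than numbers from the notes, computes the top rate $\tau^* = (1-\bar g)/(1-\bar g + a\varepsilon)$ and verifies that the mechanical, behavioral and welfare effects sum to zero at that rate.

```python
def top_tax_rate(g_bar: float, elasticity: float, a: float) -> float:
    """Optimal top rate tau* = (1 - g_bar) / (1 - g_bar + a * elasticity)."""
    return (1.0 - g_bar) / (1.0 - g_bar + a * elasticity)


def reform_effects(tau, z_bar, z_star, g_bar, elasticity, d_tau=1.0):
    """Mechanical, behavioral and welfare effects of raising tau by d_tau.

    z_bar is the average income above the threshold z_star.
    """
    d_m = d_tau * (z_bar - z_star)                            # mechanical
    d_b = -elasticity * tau / (1.0 - tau) * z_bar * d_tau     # behavioral
    d_w = -d_tau * g_bar * (z_bar - z_star)                   # welfare
    return d_m, d_b, d_w


# Hypothetical economy: threshold 100, average top income 300, so a = 1.5.
z_star, z_bar = 100.0, 300.0
a = z_bar / (z_bar - z_star)
tau_star = top_tax_rate(g_bar=0.2, elasticity=0.25, a=a)
effects = reform_effects(tau_star, z_bar, z_star, g_bar=0.2, elasticity=0.25)
print(f"a={a:.2f} tau*={tau_star:.4f} sum of effects={sum(effects):.2e}")
```

Away from $\tau^*$ the sum is positive when the rate is too low (raising it increases welfare) and negative when it is too high.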

References
Piketty, Thomas and Emmanuel Saez (2013), “Optimal Labor Income Taxation”, Handbook of Public Economics, Volume 5, Amsterdam: Elsevier-North Holland.

3 Section 3-4: Mirrlees Taxation

In this section we will solve the Mirrlees tax problem. We will derive optimal taxes, introduce the concept of wedges, and study the model with and without income effects.

3.1 The Model Setup


Suppose the agent has preferences over consumption and labor represented by the utility function $u(c,l)$, which we assume separable and quasi-linear such that $u(c,l) = c - v(l)$. We assume that $v'(l) > 0$ and $v''(l) \geq 0$. Each agent earns income $z = nl$ when supplying $l$ hours of labor and consumes $c = nl - T(nl)$ after taxes. Individuals are heterogeneous in the salary $n$, which represents their type and which we interpret as a measure of ability. Salaries are distributed according to $f(n)$, with $n \in [\underline{n}, \bar{n}]$. Individual welfare is aggregated through a social welfare function $G(\cdot)$, which we assume differentiable and concave.

Revelation Principle. Throughout all of the tax problems that we study we will assume that the government cannot observe the labor choice of the agent or her type. Income is the only observed choice that the government can target. We solve the model using a revelation mechanism. Our goal is to define an optimal tax schedule that delivers an allocation $z(n)$, $c(n)$ to each agent $n$. The Revelation Principle states that if an allocation can be implemented through some mechanism, then it can also be implemented through a direct truthful mechanism where the agent reveals her information about $n$.
We imagine that each agent reports a type $n'$ to the government and that allocations are a function of $n'$, so that we can write $c(n')$, $l(n')$, $z(n')$ and $u(n')$. By the revelation principle, the government cannot do better than defining functions $c(n)$, $z(n)$ such that the agent finds it optimal to reveal her true productivity:

$$c(n) - v\left(\frac{z(n)}{n}\right) \geq c(n') - v\left(\frac{z(n')}{n}\right)$$

for every $n$ and $n'$, where $n$ is the true type of the agent. Notice that since $n$ is continuous we have an infinity of constraints. In order to reduce the dimensionality of the problem, we assume that the marginal rate of substitution between consumption and before-tax income is decreasing in $n$:

$$MRS_{cz} = \frac{v'(z(n)/n)}{n\,u'(c(n))} \quad \text{decreases in } n$$

This is the so called single-crossing condition (or Spence-Mirrlees condition). Single-crossing and
incentive compatibility imply the monotonicity of allocations (i.e. c (n), z (n) are increasing in n). If
monotonicity and single-crossing are satisfied, we can replace the incentive constraint with the first-
order necessary conditions of the agent that provide a local incentive condition. Under monotonicity
and single-crossing the local conditions are also sufficient. While solving these problems we will only
impose local incentive constraints and ignore the monotonicity of allocations, which is then verified
ex-post.

Incentive Compatibility. We reduce the dimensionality of the problem by taking a first-order approach that replaces the infinity of constraints for each individual with a local condition relying on the optimal revelation choice. When reporting, the individual of type $n$ solves the following problem:

$$\max_{n'}\; c(n') - v\left(\frac{z(n')}{n}\right)$$

The first-order necessary condition for this problem is:

$$c'(n') - \frac{z'(n')}{n}\,v'\left(\frac{z(n')}{n}\right) = 0$$

If the government wants the agent to reveal her true type, it must be that:

$$c'(n) = \frac{z'(n)}{n}\,v'\left(\frac{z(n)}{n}\right)$$

Under the concavity assumption on the preferences, this is a global incentive constraint condition. Suppose we study local utility changes by totally differentiating the utility wrt $n$. We get:

$$\frac{du(n)}{dn} = \left(c'(n) - \frac{z'(n)}{n}\,v'\left(\frac{z(n)}{n}\right)\right) + \frac{z(n)}{n^2}\,v'\left(\frac{z(n)}{n}\right)$$

Notice that the term in the first bracket is the first-order condition of the agent, which is zero. We can thus write $du(n)/dn = \left(z(n)/n^2\right) v'\left(z(n)/n\right)$. This equation pins down the slope of the utility assigned to the agent at the optimum. Since $v'(\cdot) > 0$, the slope is always positive: the government assigns higher utility to higher types at the optimum. Higher types have a lower marginal disutility of earning a given level of income and they get informational rents in equilibrium.

Labor Supply and Labor Wedge. The individual solves the following optimization problem:

$$\max_z\; z - T(z) - v\left(\frac{z}{n}\right)$$

The first-order condition is:

$$T'(z) = 1 - \frac{v'(l)}{n}$$

The second term on the right-hand side of the equation is the marginal rate of substitution between consumption and income, so we can always write $T'(n) = 1 - MRS(n)$. When the agent is not distorted, the $MRS$ is equal to 1, implying $T'(z) = 0$. We can interpret $T'(z)$ as a wedge on the optimal labor supply: whenever it is different from zero, labor supply is distorted. Wedges are a central concept in the optimal taxation literature and we will encounter them throughout the class.
From the optimality condition, we can derive the elasticity of labor wrt the net-of-tax wage. Rewrite the optimality condition as:

$$v'\left(\frac{z}{n}\right) = (1 - T'(z))\,n$$

Totally differentiating wrt $(1-T'(z))\,n$, we have:

$$v''(l)\,\frac{dl}{d\left[(1-T'(z))\,n\right]} = 1$$

which implies the following elasticity:

$$\varepsilon = \frac{dl}{d\left[(1-T'(z))\,n\right]}\,\frac{(1-T'(z))\,n}{l} = \frac{v'(l)}{l\,v''(l)}$$
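To see the elasticity formula $\varepsilon = v'(l)/(l\,v''(l))$ at work, the sketch below assumes an isoelastic disutility $v(l) = l^{1+1/e}/(1+1/e)$, a functional form chosen here for illustration and not specified in the notes. With this choice labor supply is $l = \left[(1-T')n\right]^e$ and the formula returns the constant $e$. The code compares the formula with a finite-difference elasticity of labor supply.

```python
def v_prime(l: float, e: float) -> float:
    """v'(l) for the assumed disutility v(l) = l**(1 + 1/e) / (1 + 1/e)."""
    return l ** (1.0 / e)


def v_double_prime(l: float, e: float) -> float:
    """v''(l) for the same assumed disutility."""
    return (1.0 / e) * l ** (1.0 / e - 1.0)


def labor_supply(net_wage: float, e: float) -> float:
    """Solve v'(l) = net_wage, where net_wage = (1 - T') * n."""
    return net_wage ** e


def elasticity_from_formula(l: float, e: float) -> float:
    """Elasticity v'(l) / (l * v''(l)) from the notes."""
    return v_prime(l, e) / (l * v_double_prime(l, e))


def elasticity_numeric(net_wage: float, e: float, h: float = 1e-6) -> float:
    """Central finite-difference elasticity of labor wrt the net wage."""
    dl = (labor_supply(net_wage + h, e) - labor_supply(net_wage - h, e)) / (2.0 * h)
    return dl * net_wage / labor_supply(net_wage, e)


w = 1.5  # hypothetical net-of-tax wage
l_opt = labor_supply(w, 0.5)
print(elasticity_from_formula(l_opt, 0.5), elasticity_numeric(w, 0.5))
```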

Resource Constraint. Suppose the government has an exogenous revenue requirement $E$. The revenues collected through taxation must be at least equal to $E$. Using the agent’s budget constraint we can write the tax levied on a single individual as $T(z(n)) = z(n) - c(n)$. Summing over all the individuals in the economy we get:

$$\int_{\underline{n}}^{\bar{n}} c(n)\,f(n)\,dn \leq \int_{\underline{n}}^{\bar{n}} z(n)\,f(n)\,dn - E$$

This is the resource constraint for this economy. Notice that, unlike the incentive constraints, there is a single resource constraint.

3.2 Optimal Income Tax
We now solve the constrained maximization problem using optimal control theory. Instead of having taxes as a choice variable, we assume that the government chooses an allocation for each agent. Given the individual’s budget constraint, this is equivalent to choosing a tax level. The government problem is:

$$\max_{c(n),\,u(n),\,z(n)} \int_{\underline{n}}^{\bar{n}} G(u(n))\,f(n)\,dn$$

s.t.

$$\frac{du(n)}{dn} = \frac{z(n)}{n^2}\,v'\left(\frac{z(n)}{n}\right)$$

$$\int_{\underline{n}}^{\bar{n}} c(n)\,f(n)\,dn \leq \int_{\underline{n}}^{\bar{n}} z(n)\,f(n)\,dn - E$$

We solve the problem with a Hamiltonian where $n$ plays the role of the continuous (time) variable, $u(n)$ is the state variable and $z(n)$ is the control. The incentive constraint becomes the law of motion of the state variable: it measures how utility changes across types in equilibrium. In order to set up the Hamiltonian, we need to replace consumption in the resource constraint with a function of state and control variables. Using the definition of indirect utility, we can write $c(n) = u(n) + v(z(n)/n)$. We substitute this expression into the resource constraint and set up the following Hamiltonian:

$$H = \left[G(u(n)) + \lambda\left(z(n) - u(n) - v\left(\frac{z(n)}{n}\right)\right)\right] f(n) - \mu(n)\,\frac{z(n)}{n^2}\,v'\left(\frac{z(n)}{n}\right)$$

$\mu(n)$ denotes the multiplier on the incentive constraint of type $n$ and $\lambda$ is the multiplier on the resource constraint. The incentive constraint enters with a negative sign so that $\mu(n) \geq 0$ at the optimum.
The first-order conditions of the optimal control problem are:

$$\frac{\partial H}{\partial z(n)} = \lambda\left[1 - \frac{v'(l(n))}{n}\right] f(n) - \frac{\mu(n)}{n^2}\left[v'\left(\frac{z(n)}{n}\right) + \frac{z(n)}{n}\,v''\left(\frac{z(n)}{n}\right)\right] = 0 \tag{1}$$

$$\frac{\partial H}{\partial u(n)} = \left[G'(u(n)) - \lambda\right] f(n) = \mu'(n) \tag{2}$$
The transversality (boundary) conditions read:

$$\mu(\underline{n}) = \mu(\bar{n}) = 0$$

The Hamiltonian solution requires $\mu(\bar{n})\,u(\bar{n}) = 0$. However, if we want to provide positive utility to type $\bar{n}$ we must require $\mu(\bar{n}) = 0$. At the same time, since at the optimum the incentive constraints bind downwards, we require $\mu(\underline{n}) = 0$. As is standard in this kind of problem, the lowest type does not want to imitate any other agent in equilibrium, implying that her incentive constraint is slack, while everyone else is indifferent between her own allocation and the allocation provided to the immediately lower type.
If we integrate equation (2) over the entire type space and use the transversality conditions we find:

$$\lambda = \int_{\underline{n}}^{\bar{n}} G'(u(n))\,f(n)\,dn$$

This is an expression for the marginal value of public funds to the government. It states that the value of public funds depends on the marginal social welfare gains across the entire type space: $\lambda$ is equal to the welfare effect of transferring one dollar to every individual in the economy. In other words, public funds are more valuable the higher are the social welfare gains achievable in the economy.
We can also integrate equation (2) to find the value of $\mu(n)$:

$$\mu(n) = \int_{n}^{\bar{n}} \left[\lambda - G'(u(m))\right] f(m)\,dm \tag{3}$$

Using the definition of labor elasticity, we rearrange the following:

$$\left[v'\left(\frac{z(n)}{n}\right) + \frac{z(n)}{n}\,v''\left(\frac{z(n)}{n}\right)\right] = v'\left(\frac{z(n)}{n}\right)\left(1 + \frac{1}{\varepsilon}\right)$$

Exploiting the definition of the tax wedge, $v'(z(n)/n) = (1 - T'(z(n)))\,n$, we simplify equation (1) to get:

$$T'(z(n)) = \left(1 - T'(z(n))\right)\frac{\mu(n)}{\lambda\,n\,f(n)}\left(1 + \frac{1}{\varepsilon}\right)$$

Using the expression for $\mu$ derived in equation (3) we get:

$$\frac{T'(z(n))}{1 - T'(z(n))} = \left(\frac{1+\varepsilon}{\varepsilon}\right)\frac{\int_{n}^{\bar{n}}\left[1 - g(m)\right] f(m)\,dm}{n\,f(n)} \tag{4}$$

where $g(n) = G'(u(n))/\lambda$ is the relative social welfare weight of individual $n$, such that $\int_{\underline{n}}^{\bar{n}} g(n)\,f(n)\,dn = 1$. The optimal tax is decreasing in the elasticity of labor supply. Remember that $\lambda$ aggregates the social welfare weights across the entire economy. Thus, a higher $g(n)$ means that the government cares relatively more about individual $n$ and will tax her less.
A Rawlsian government would have $g(n) = 0$ for any $n > \underline{n}$ and the formula would reduce to:

$$\frac{T'(z(n))}{1 - T'(z(n))} = \left(\frac{1+\varepsilon}{\varepsilon}\right)\frac{1 - F(n)}{n\,f(n)}$$

The second part of the expression captures the ratio of the mass above type $n$ to the density at $n$ (scaled by $n$). It is a measure of the thickness of the right tail: the higher it is, the higher the marginal tax rate will be.
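The Rawlsian formula can be evaluated for any skill distribution. The sketch below assumes, purely for illustration, a Pareto skill distribution with shape parameter $a$ on $[n_{min}, \infty)$ (the notes use a bounded support, so this is an assumption). For a Pareto distribution $(1-F(n))/(n f(n)) = 1/a$, so the implied marginal tax rate is constant across types.

```python
def rawlsian_marginal_rate(n, elasticity, cdf, pdf):
    """Marginal tax rate T' solving T'/(1-T') = (1+e)/e * (1-F(n)) / (n f(n))."""
    ratio = (1.0 + elasticity) / elasticity * (1.0 - cdf(n)) / (n * pdf(n))
    return ratio / (1.0 + ratio)


# Hypothetical Pareto skill distribution (shape a, lower bound n_min).
def pareto_cdf(n, a=2.0, n_min=1.0):
    return 1.0 - (n_min / n) ** a


def pareto_pdf(n, a=2.0, n_min=1.0):
    return a * n_min ** a / n ** (a + 1.0)


for n in (1.5, 3.0, 10.0):
    rate = rawlsian_marginal_rate(n, 0.5, pareto_cdf, pareto_pdf)
    print(f"n={n:5.1f}  T'={rate:.3f}")
```

With $a = 2$ and $\varepsilon = 0.5$ the right-hand side equals $3 \times 1/2 = 1.5$, so $T' = 0.6$ at every skill level.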

3.3 Diamond ABC Formula


In this paragraph we derive a tax formula presented in Diamond (1998). We change our assumption about welfare weights and assume that they are distributed according to a density $\psi(n)$ with cdf $\Psi(n)$. The government objective function becomes:

$$\int_{\underline{n}}^{\bar{n}} u(n)\,\psi(n)\,dn$$

By assumption $\int_{\underline{n}}^{\bar{n}} \psi(n)\,dn = 1$, which implies $\lambda = 1$. First-order conditions can be derived exactly as before. We therefore have:

$$\mu'(n) = \psi(n) - f(n)$$

and after integration:

$$\mu(n) = \int_{n}^{\bar{n}} \left(f(m) - \psi(m)\right) dm = \Psi(n) - F(n)$$

Using the expression above the tax formula reads:

$$\frac{T'(z(n))}{1 - T'(z(n))} = \left(\frac{1+\varepsilon}{\varepsilon}\right)\frac{\Psi(n) - F(n)}{n\,f(n)}$$

To write the ABC formula we divide and multiply by $1 - F(n)$ to get:

$$\frac{T'(z(n))}{1 - T'(z(n))} = \underbrace{\left(\frac{1+\varepsilon}{\varepsilon}\right)}_{A(n)}\;\underbrace{\frac{\Psi(n) - F(n)}{1 - F(n)}}_{B(n)}\;\underbrace{\frac{1 - F(n)}{n\,f(n)}}_{C(n)}$$

$A(n)$ captures the standard elasticity and efficiency argument. $B(n)$ measures the desire for redistribution: if the sum of weights below $n$ is high relative to the mass above $n$, the government will tax more. Finally, $C(n)$ measures the thickness of the right tail of the distribution. A thicker tail will be associated with higher tax rates.
Notice that in the Rawlsian case $\Psi(n) = 1$ for every $n > \underline{n}$ and the formula converges to the one presented in the previous paragraph.
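The three terms of the ABC formula can be computed separately. The sketch below uses hypothetical distributions chosen only for illustration: skills are Pareto with shape 2 on $[1,\infty)$ and welfare weights are Pareto with shape 3, so that weights are more concentrated on low types than the population is. None of these choices come from the notes.

```python
def abc_terms(n, elasticity, weight_cdf, skill_cdf, skill_pdf):
    """Return the A, B and C terms of the Diamond (1998) formula at type n."""
    a_term = (1.0 + elasticity) / elasticity
    b_term = (weight_cdf(n) - skill_cdf(n)) / (1.0 - skill_cdf(n))
    c_term = (1.0 - skill_cdf(n)) / (n * skill_pdf(n))
    return a_term, b_term, c_term


def marginal_rate(a_term, b_term, c_term):
    """Solve T'/(1-T') = A*B*C for T'."""
    x = a_term * b_term * c_term
    return x / (1.0 + x)


# Hypothetical distributions (assumptions for illustration only).
def skill_cdf(n):
    return 1.0 - n ** -2.0


def skill_pdf(n):
    return 2.0 * n ** -3.0


def weight_cdf(n):
    return 1.0 - n ** -3.0


A, B, C = abc_terms(2.0, 0.5, weight_cdf, skill_cdf, skill_pdf)
print(A, B, C, marginal_rate(A, B, C))
```

At $n = 2$ this gives $A = 3$, $B = 0.5$ and $C = 0.5$, hence $T'/(1-T') = 0.75$.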

3.4 Optimal Taxes With Income Effects


We now relax the assumption of no income effects. Suppose the utility of the agent takes the form $\tilde u(c,l) = u(c) - v(l)$ where $u'(c) > 0$ and $u''(c) \leq 0$.

Elasticity of Labor Supply. The optimality condition for the labor supply choice becomes:

$$\frac{v'(l)}{u'(c)} = (1 - T'(z))\,n$$

The uncompensated response of labor supply to the net-of-tax wage is:

$$\frac{\partial l^u}{\partial\left[(1-T'(z))\,n\right]} = \frac{u'(c) + l\,(1-T'(z))\,n\,u''(c)}{v''(l) - (1-T'(z))^2 n^2 u''(c)}$$

implying the following uncompensated elasticity:

$$\varepsilon^u = \frac{v'(l)/l + \frac{v'(l)^2}{u'(c)^2}\,u''(c)}{v''(l) - \frac{v'(l)^2}{u'(c)^2}\,u''(c)}$$

The response of labor to income changes is given by:

$$\frac{\partial l}{\partial I} = \frac{(1-T'(z))\,n\,u''(c)}{v''(l) - (1-T'(z))^2 n^2 u''(c)}$$

Using the Slutsky equation (as we did in Section notes 1):

$$\frac{\partial l^c}{\partial\left[(1-T'(z))\,n\right]} = \frac{u'(c) + l\,(1-T'(z))\,n\,u''(c)}{v''(l) - (1-T'(z))^2 n^2 u''(c)} - \frac{l\,(1-T'(z))\,n\,u''(c)}{v''(l) - (1-T'(z))^2 n^2 u''(c)} = \frac{u'(c)}{v''(l) - (1-T'(z))^2 n^2 u''(c)}$$

Therefore:

$$\varepsilon^c = \frac{v'(l)/l}{v''(l) - (1-T'(z))^2 n^2 u''(c)}$$
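Optimal Tax. Everything is similar to the previous case except for the fact that now we cannot replace the variable $c(n)$ in the resource constraint using the definition of indirect utility. We will define consumption as an expenditure function $\tilde c(\tilde u(n), z(n), n)$ and implicitly differentiate it wrt $\tilde u(n)$ and $z(n)$. Start from the definition of indirect utility:

$$\tilde u(n) = u(\tilde c(n)) - v\left(z^*(n)/n\right)$$

It follows that the following two conditions will hold at the optimum:

$$d\tilde u(n) = u'(\tilde c(n))\,d\tilde c(n)$$

$$0 = u'(\tilde c(n))\,d\tilde c(n) - \frac{1}{n}\,v'\left(z^*(n)/n\right) dz^*(n)$$

Rearranging:

$$\frac{d\tilde c(n)}{d\tilde u(n)} = \frac{1}{u'(\tilde c(n))}$$

$$\frac{d\tilde c(n)}{dz(n)} = \frac{v'\left(z^*(n)/n\right)}{n\,u'(\tilde c(n))}$$

The Hamiltonian for the problem is:

$$H = \left[G(\tilde u(n)) + \lambda\left(z(n) - \tilde c(\tilde u(n), z(n), n)\right)\right] f(n) - \mu(n)\,\frac{z(n)}{n^2}\,v'\left(\frac{z(n)}{n}\right)$$

and the FOCs are:

$$\frac{\partial H}{\partial z(n)} = \lambda\left[1 - \frac{v'(z(n)/n)}{n\,u'(c(n))}\right] f(n) - \frac{\mu(n)}{n^2}\left[v'\left(\frac{z(n)}{n}\right) + \frac{z(n)}{n}\,v''\left(\frac{z(n)}{n}\right)\right] = 0$$

$$\frac{\partial H}{\partial \tilde u(n)} = \left[G'(\tilde u(n)) - \frac{\lambda}{u'(c(n))}\right] f(n) = \mu'(n)$$

In order to find the equilibrium value of the multiplier, we can integrate the second FOC:

$$\mu(n) = \int_{n}^{\bar{n}}\left[\frac{\lambda}{u'(c(m))} - G'(\tilde u(m))\right] f(m)\,dm$$

We exploit the definition of the two elasticities to write:

$$\left[v'\left(\frac{z(n)}{n}\right) + \frac{z(n)}{n}\,v''\left(\frac{z(n)}{n}\right)\right] = v'\left(\frac{z(n)}{n}\right)\left[1 + \frac{z(n)}{n}\,\frac{v''\left(\frac{z(n)}{n}\right)}{v'\left(\frac{z(n)}{n}\right)}\right] = v'\left(\frac{z(n)}{n}\right)\left(\frac{1+\varepsilon^u}{\varepsilon^c}\right)$$

The optimal tax formula will then become:

$$\frac{T'(z(n))}{1 - T'(z(n))} = \left(\frac{1+\varepsilon^u}{\varepsilon^c}\right)\frac{\eta(n)}{n\,f(n)} \tag{5}$$

where $\eta(n) = \frac{u'(c(n))\,\mu(n)}{\lambda}$.
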
3.5 Pareto Efficient Taxes
We now ask whether a tax system $T_0(z)$ in place is Pareto-optimal, meaning that there exists no feasible adjustment of the tax schedule that makes all individuals in the economy weakly better off and at least one strictly better off.
We can characterize the Pareto frontier of the previous problem by solving the following:

$$\max \int_{\underline{n}}^{\bar{n}} u(n)\,\psi(n)\,dn$$

s.t.

$$u(c(n)) - h\left(z(n)/n\right) \geq u(c(n')) - h\left(z(n')/n\right) \quad \forall n, n'$$

$$\int_{\underline{n}}^{\bar{n}} \left[z(n) - c(n)\right] f(n)\,dn \geq E$$

By varying the social marginal welfare weights, we can trace out every point on the Pareto frontier.
However, there might be points on the Pareto frontier that can be improved upon increasing the utility
of all the agents in the economy.
Werning (2007) develops a test for the Pareto optimality of a tax schedule. The first important
result of the paper is the following:
Proposition 1: A tax code fails to be constrained Pareto optimal if and only if there exists a feasible tax reform that (weakly) reduces taxes at all incomes.³
Proof: (if) suppose we weakly reduce taxes over the entire economy, then every individual is at least as well off.
(only if) suppose there exists a Pareto improving feasible tax reform $T_1(z)$. Then we have:

$$U\left(z_1(n) - T_1(z_1(n)), z_1(n), n\right) \geq U\left(z_0(n) - T_0(z_0(n)), z_0(n), n\right) \geq U\left(z_1(n) - T_0(z_1(n)), z_1(n), n\right)$$

where the first inequality comes from the assumption of Pareto-improvement and the second from the assumption that under $T_0(z)$ the agent truthfully reveals her type and chooses $z_0(n)$. The chain of inequalities implies that $T_1(z_1(n)) \leq T_0(z_1(n))$ for every $n$.
Proposition 1 implies that since the resource constraint is satisfied and both tax systems raise
revenues at least equal to E, a Pareto improvement can only occur through a tax reduction that does not
generate a drop in revenues. This can be interpreted as a Laffer effect: although the government lowers
taxes, the behavioral response (increase in labor supply) is strong enough to more than compensate
the revenue loss.

3.6 A Test of the Pareto Optimality of the Tax Schedule


In order to implement the test, Werning takes a dual approach to the optimal taxation problem that we studied in the previous paragraphs. We rewrite the problem such that, instead of maximizing the social welfare function, the government maximizes net resources subject to providing at least a minimum level $\bar v(n)$ of indirect utility to every agent in the economy. We write the problem as follows:

$$\max_{v(n),\,z(n)} \int_{\underline{n}}^{\bar{n}} \left(z(n) - \tilde c(v(n), z(n), n)\right) f(n)\,dn$$

s.t.

$$\frac{dv(n)}{dn} = U_n\left(\tilde c(v(n), z(n), n), z(n), n\right) \tag{6}$$

$$v(n) \geq \bar v(n) \quad \forall n \tag{7}$$

Notice that $\tilde c(v(n), z(n), n)$ is the expenditure function that we introduced to study optimal taxes with income effects. The problem would also have a monotonicity constraint that we relax for the moment, as we usually do. Notice that by changing the levels of $\bar v(n)$ we can characterize the entire Pareto frontier.⁴

3 Feasible means that it satisfies the resource constraint.
We solve the problem with a Lagrangian by attaching multiplier $\psi(n) f(n)$ to (7) and $\mu(n)$ to the local incentive constraint:

$$L = \int_{\underline{n}}^{\bar{n}} \left(z(n) - \tilde c(v(n), z(n), n)\right) f(n)\,dn + \int_{\underline{n}}^{\bar{n}} \psi(n)\,v(n)\,f(n)\,dn + \int_{\underline{n}}^{\bar{n}} \mu(n)\,v'(n)\,dn - \int_{\underline{n}}^{\bar{n}} \mu(n)\,U_n\left(\tilde c(v(n), z(n), n), z(n), n\right) dn$$

Notice that this is identical to the Lagrangian that we would obtain in a classical optimal tax problem with welfare weights $\psi(n) f(n)$.⁵ We integrate $\int_{\underline{n}}^{\bar{n}} \mu(n)\,v'(n)\,dn$ by parts to obtain:

$$\int_{\underline{n}}^{\bar{n}} \mu(n)\,v'(n)\,dn = \mu(\bar{n})\,v(\bar{n}) - \mu(\underline{n})\,v(\underline{n}) - \int_{\underline{n}}^{\bar{n}} \mu'(n)\,v(n)\,dn$$

and rewrite the Lagrangian as follows:

$$L = \int_{\underline{n}}^{\bar{n}} \left(z(n) - \tilde c(v(n), z(n), n)\right) f(n)\,dn + \int_{\underline{n}}^{\bar{n}} \psi(n)\,v(n)\,f(n)\,dn + \mu(\bar{n})\,v(\bar{n}) - \mu(\underline{n})\,v(\underline{n}) - \int_{\underline{n}}^{\bar{n}} \mu'(n)\,v(n)\,dn - \int_{\underline{n}}^{\bar{n}} \mu(n)\,U_n\left(\tilde c(v(n), z(n), n), z(n), n\right) dn$$

The FOC wrt $z(n)$ is:

$$\left(1 - \frac{d\tilde c(v(n), z(n), n)}{dz(n)}\right) f(n) - \mu(n)\left[U_{nc}(n)\,\frac{d\tilde c(v(n), z(n), n)}{dz(n)} + U_{nz}(n)\right] = 0 \tag{8}$$

and the FOC wrt $v(n)$ is:

$$-\frac{d\tilde c(v(n), z(n), n)}{dv(n)}\,f(n) - \mu'(n) - \mu(n)\,U_{nc}(n)\,\frac{d\tilde c(v(n), z(n), n)}{dv(n)} + \psi(n)\,f(n) = 0 \tag{9}$$

4 The first order conditions for this problem will be sufficient. We can rewrite the problem in terms of $\tilde u(n) = u(c(n))$ and $\tilde h(n) = h(z(n)/n)$ so that the objective function becomes $n\,h^{-1}(\tilde h(n)) - u^{-1}(\tilde u(n))$ and is concave in $\tilde u(n)$ and $\tilde h(n)$, which become the new control and state variables.
5 The problem would be:

$$\max \int_{\underline{n}}^{\bar{n}} v(n)\,\psi(n)\,f(n)\,dn$$

s.t.

$$\frac{dv(n)}{dn} = U_n\left(\tilde c(v(n), z(n), n), z(n), n\right)$$

$$\int_{\underline{n}}^{\bar{n}} \left[z(n) - \tilde c(v(n), z(n), n)\right] f(n)\,dn \geq E$$
We know from the paragraph about optimal taxation with income effects that we can write:

$$\frac{d\tilde c(v(n), z(n), n)}{dz(n)} = -\frac{U_z(n)}{U_c(n)} = MRS(n)$$

Also, since $T'(n) = 1 - MRS(n)$ (see the discussion about wedges) we have that:

$$1 - \frac{d\tilde c(v(n), z(n), n)}{dz(n)} = 1 - MRS(n) = T'(n)$$

The term in square brackets in equation (8) can be written as follows:

$$U_{nc}(n)\,\frac{d\tilde c(v(n), z(n), n)}{dz(n)} + U_{nz}(n) = -U_{nc}(n)\,\frac{U_z(n)}{U_c(n)} + U_{nz}(n) = U_c(n)\,\frac{U_{zn}(n)\,U_c(n) - U_{cn}(n)\,U_z(n)}{U_c(n)^2} = U_c(n)\,\frac{\partial}{\partial n}\left(\frac{U_z(n)}{U_c(n)}\right) = -U_c(n)\,\frac{\partial MRS(n)}{\partial n}$$

Therefore, equation (8) becomes:

$$T'(n)\,f(n) = -\mu(n)\,U_c(n)\,\frac{\partial MRS(n)}{\partial n}$$

Using $MRS(n) = 1 - T'(n)$, we can rewrite the condition as:

$$\frac{T'(n)}{1 - T'(n)}\,f(n) = -\mu(n)\,U_c(n)\,\frac{\partial \log MRS(n)}{\partial n}$$

Now, we move to the second FOC. We know from before that:

$$\frac{d\tilde c(v(n), z(n), n)}{dv(n)} = \frac{1}{U_c(n)}$$

We can therefore rewrite (9) as:

$$-\frac{f(n)}{U_c(n)} - \mu'(n) - \mu(n)\,\frac{U_{nc}(n)}{U_c(n)} + \psi(n)\,f(n) = 0$$

Any Pareto efficient allocation must satisfy (7) and provide at least utility $\bar v(n)$ to every agent $n$. By the complementary slackness condition, this requires $\psi(n) f(n) \geq 0$, that is, the multipliers associated with the constraints are never negative. We rewrite the FOC imposing the following inequality:

$$-U_c(n)\,\mu'(n) - \mu(n)\,U_{nc}(n) \leq f(n) \tag{10}$$

If we change variables and define $\hat\mu(n) \equiv U_c(n)\,\mu(n)$, we have:

$$\hat\mu'(n) = U_c(n)\,\mu'(n) + \mu(n)\left[U_{cn}(n) + U_{cc}(n)\,c'(n) + U_{cz}(n)\,z'(n)\right]$$

Substituting into (10) we find:

$$-\hat\mu'(n) + \hat\mu(n)\,\frac{U_{cc}(n)\,c'(n) + U_{cz}(n)\,z'(n)}{U_c(n)} \leq f(n)$$

The local incentive constraint of the agent (FOC for optimal reporting) implies that $c'(n)/z'(n) = -U_z(n)/U_c(n)$. It follows that:

$$\frac{U_{cc}(n)\,c'(n) + U_{cz}(n)\,z'(n)}{U_c(n)} = \frac{U_{cc}(n)\,\frac{c'(n)}{z'(n)} + U_{cz}(n)}{U_c(n)}\,z'(n) = \frac{-U_{cc}(n)\,U_z(n) + U_{cz}(n)\,U_c(n)}{U_c(n)^2}\,z'(n) = \frac{\partial}{\partial c}\left(\frac{U_z(n)}{U_c(n)}\right) z'(n) = -\frac{\partial MRS(n)}{\partial c}\,z'(n)$$

We finally establish two conditions for Pareto efficiency:

$$\frac{T'(n)}{1 - T'(n)}\,f(n) = -\hat\mu(n)\,\frac{\partial \log MRS(n)}{\partial n} \tag{11}$$

$$-\hat\mu'(n) - \hat\mu(n)\,\frac{\partial MRS(n)}{\partial c}\,z'(n) \leq f(n) \tag{12}$$

Using the tax schedule in place, the preferences and the skill distribution we can derive $\hat\mu$ from equation (11). We can then use equation (12) to test for the Pareto efficiency of the tax schedule.

Applying the Test and Interpreting the Conditions. Suppose the agent has preferences $U(c,z,n) = c - \frac{1}{\gamma}\left(z/n\right)^{\gamma}$ with elasticity $\varepsilon = 1/(\gamma - 1)$. The FOCs of the dual problem read:

$$\mu'(n) = \psi(n)\,f(n) - f(n) \tag{13}$$

$$\frac{T'(n)}{1 - T'(n)}\,f(n) = \frac{\mu(n)}{n}\,\frac{1+\varepsilon}{\varepsilon} \tag{14}$$

The tax schedule is Pareto-optimal if and only if $\psi(n) f(n) \geq 0$, which implies $-\mu'(n) \leq f(n)$. This inequality is the same as the one derived in (12) since $U_c(n) = 1$ and $\partial MRS(n)/\partial c = 0$. Equation (14) is the same as (11) when we notice that $\partial \log MRS(n)/\partial n = -(1 + 1/\varepsilon)/n$.⁶
Suppose that the tax is linear with a constant marginal rate $\tau$. When we put the two conditions together we get:

$$-\frac{\varepsilon}{1+\varepsilon}\,\frac{\partial}{\partial n}\left[\frac{\tau}{1-\tau}\,n\,f(n)\right] = -\mu'(n) \leq f(n) \quad \forall n$$

Taking the derivative wrt $n$ the condition becomes:

$$\frac{\varepsilon}{1+\varepsilon}\,\frac{\tau}{1-\tau}\left(-1 - \frac{n\,f'(n)}{f(n)}\right) \leq 1 \quad \forall n \tag{15}$$

First, the condition in (15) shows that for any $\tau$ and $\varepsilon$ there exists a set of densities $f(n)$ such that $\tau$ is Pareto efficient and a set of $f(n)$ such that it is not Pareto efficient. At the same time, for any $\varepsilon$ and $f(n)$ we can find flat tax rates $\tau$ that are efficient and a set of $\tau$'s that are inefficient. It follows that it is crucial to know the distribution of skills. The test can also be written in terms of income distributions, which are easier to infer from the data. A higher $\varepsilon$ makes the condition harder to satisfy: when individuals react more to changes in taxes, a tax reduction is more likely to lead to a Pareto improvement. When taxes are locally lowered at some $n$, the individuals below $n$ will tend to increase their labor supply and individuals above will reduce their labor supply. The term $n f'(n)/f(n)$ measures the elasticity of the skill distribution and captures how fast the skill density is decreasing at some $n$. A highly negative elasticity of the skill distribution at $n$ means that the density decreases fast and that the mass of individuals below $n$ is significantly larger than above $n$, implying a local Laffer effect from the increase in labor supply of individuals below $n$. In other words, by locally decreasing taxes at $n$ the government can increase revenues by incentivizing the labor supply of the large mass of individuals below $n$. For this reason, when the elasticity of the skill distribution is highly negative the test is harder to pass.⁷

6 Notice that $MRS(n) = \frac{1}{n}\left(\frac{z(n)}{n}\right)^{\gamma-1}$. Therefore:

$$\frac{\partial \log MRS(n)}{\partial n} = \frac{\partial MRS(n)}{\partial n}\,\frac{1}{MRS(n)} = -\frac{\gamma}{n}$$

where $\gamma = 1 + 1/\varepsilon$.
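Condition (15) is easy to evaluate once the local elasticity of the skill density, $n f'(n)/f(n)$, is known. The sketch below assumes a Pareto skill density with shape parameter $a$, for which $n f'(n)/f(n) = -(a+1)$ at every $n$; the distribution and the parameter values are illustrative assumptions, not taken from the notes.

```python
def flat_tax_test(tau: float, elasticity: float, skill_density_elasticity: float):
    """Evaluate the left-hand side of condition (15) and whether it is <= 1.

    skill_density_elasticity is n * f'(n) / f(n) at the type being tested.
    """
    lhs = (
        elasticity / (1.0 + elasticity)
        * tau / (1.0 - tau)
        * (-1.0 - skill_density_elasticity)
    )
    return lhs, lhs <= 1.0


def pareto_density_elasticity(a: float) -> float:
    """n f'(n) / f(n) for a Pareto density f(n) proportional to n**-(a+1)."""
    return -(a + 1.0)


# Hypothetical calibration: labor elasticity 0.5, Pareto shape 2.
for tau in (0.3, 0.5, 0.8):
    lhs, passes = flat_tax_test(tau, 0.5, pareto_density_elasticity(2.0))
    print(f"tau={tau:.1f}  lhs={lhs:.3f}  Pareto efficient: {passes}")
```

In this calibration the left-hand side equals $\frac{1}{3}\cdot\frac{\tau}{1-\tau}\cdot 2$, so flat rates below 0.6 pass the test and rates above 0.6 fail it.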

References
Diamond, Peter (1998), “Optimal Income Taxation: An Example with a U-Shaped Pattern of Optimal
Marginal Tax Rates”, American Economic Review, 88, 83–95.
Salanié, Bernard (2002), “The Economics of Taxation”, The MIT Press
Scheuer, Florian (2016), Lecture Notes
Werning, Ivan (2007), “Pareto Efficient Income Taxation”, Working Paper

7 In the second part of the sequence you will study how to run the same test in the inequality deflator framework using the concept of fiscal externality.

4 Section 5: Optimal Taxation with Income Effects
and Bunching
In this section we will see the derivation of optimal income taxes proposed by Saez (2001). We will
also introduce a way to estimate the elasticity of reported income that exploits the degree of bunching
at the kinks of the existing tax schedule (Saez, 2010).8

4.1 Optimal Taxes with Income Effects


In this paragraph we derive optimal income taxes with income effects following the experiment in Saez (2001). Suppose individuals in the economy are heterogeneous in ability $n$ and work $l_n$ hours earning income $z_n$. We can write labor supply as a Walrasian (uncompensated) function $l_n(w_n, R_n)$, where $w_n = n(1-T')$ is the net-of-tax wage that the agent earns in equilibrium and $R_n$ is the virtual (non-labor) income. Define $R_n$ under the assumption that the tax is linear and tangent to the tax schedule at $z_n$, such that $c = (1-T')\,n\,l_n + R_n$. Using the fact that $c = n l_n - T(n l_n)$ we can write $R_n = n l_n - T(n l_n) - n l_n (1-T')$. Using the Walrasian labor supply we can define the uncompensated elasticity and the income parameter as follows:

$$\zeta^u_n = \frac{\partial l_n}{\partial (1-T')}\,\frac{(1-T')}{l_n} \tag{16}$$

$$\eta_n = w_n\,\frac{\partial l_n}{\partial R_n} = n\,(1-T')\,\frac{\partial l_n}{\partial R_n} \tag{17}$$

Using the definitions above and the Slutsky equation, we can write the compensated elasticity as:

$$\zeta^c_n = \zeta^u_n - \eta_n \tag{18}$$

We now derive a result that will be useful later. Totally differentiating $l_n$ wrt $n$ (a dot denotes a derivative wrt $n$) we get:

$$\dot l_n = \frac{\partial l_n}{\partial w_n}\left[1 - T' - n\left(n\dot l_n + l_n\right) T''\right] + \frac{\partial l_n}{\partial R_n}\left(n\dot l_n + l_n\right)\left(n\,l_n\,T''\right)$$

Rearranging:

$$\dot l_n = \frac{w_n}{l_n}\,\frac{\partial l_n}{\partial w_n}\,\frac{l_n}{n} + \left[w_n\,\frac{\partial l_n}{\partial R_n} - \frac{w_n}{l_n}\,\frac{\partial l_n}{\partial w_n}\right]\frac{l_n\,T''}{(1-T')}\left[l_n + n\dot l_n\right]$$

Notice that $\dot z_n = l_n + n\dot l_n$ and apply the definitions in (16) and (17) to get:

$$\dot l_n = \zeta^u_n\,\frac{l_n}{n} - \dot z_n\,\frac{l_n\,T''}{1-T'}\,\zeta^c_n \tag{19}$$

Using (19) we can write:

$$\frac{\dot z_n}{z_n} = \frac{l_n + n\dot l_n}{n\,l_n} = \frac{1+\zeta^u_n}{n} - \dot z_n\,\frac{T''}{1-T'}\,\zeta^c_n \tag{20}$$
In order to derive the optimal tax, we follow the experiment in Saez (2001). Suppose we introduce a perturbation around the optimal tax schedule such that we raise the marginal tax rate by $d\tau$ in a small interval $[z^*, z^* + dz^*]$.⁹ As we have already seen in previous Sections, the tax has three main effects: mechanical, welfare and behavioral. While the mechanical and welfare effects are similar to the ones we have previously studied, the behavioral effect now consists of two components: an elasticity effect for people in the interval $[z^*, z^* + dz^*]$ and an income effect for taxpayers above $z^*$.
8 The bunching paragraph is based on previous notes by Simon Jager.
9 The experiment also assumes that $d\tau$ is second order relative to $dz^*$ to avoid any bunching at the kink.

Figure 1: Tax Reform Experiment

Mechanical and Welfare Effects. Every taxpayer above $z^*$ will pay an extra $dz^* d\tau$ of taxes, which for welfare purposes are valued according to the social marginal welfare weights $g(z)$. The mechanical effect net of the welfare loss for pre-tax income $z$ is $(1 - g(z))\,dz^* d\tau$. Summing up over all incomes above $z^*$ we get:

$$M = dz^* d\tau \int_{z^*}^{\infty} (1 - g(z))\,h(z)\,dz$$

Social marginal welfare weights $g(z)$ represent the value for the government of giving one dollar to some level of income $z$. In particular, the government is indifferent between giving $1/g(z_1)$ dollars to individual 1 and $1/g(z_2)$ dollars to individual 2. The social marginal welfare weights are expressed in government money. Going back to the fully specified problem, we can interpret the weights as being normalized by $\lambda$, the multiplier on the resource constraint. The multiplier $\lambda$ measures the value of transferring one dollar to every individual in the economy and captures the value of government funds. A higher $\lambda$ means that the government can significantly raise welfare by transferring money to the individuals in the economy; a low $\lambda$, on the other hand, implies that the gain from transfers is low and public funds are less valuable.

Elasticity Effect. The increase in the marginal tax rate has an effect on individuals’ labor supply that is denoted by $dz$. The effect consists of two parts. First, there is the direct consequence of the increase in taxes, which depends on the compensated elasticity of labor supply. Second, since the taxpayer changes her labor supply by $dz$, shifting along the tax schedule, she will face an additional change in the tax. We write the change in the marginal tax rate induced by the shift $dz$ as $dT' = T'' dz$. The behavioral response is proportional to the total tax change $d\tau + dT'$:

$$dz = -\frac{dz}{d(1-T')}\left(d\tau + dT'\right) = -\zeta^c z^*\,\frac{d\tau + dT'}{1 - T'}$$

Rearranging:

$$dz = -\zeta^c z^*\,\frac{d\tau}{1 - T' + \zeta^c z^* T''} \tag{21}$$

Income Effect. All individuals above $z^*$ face a parallel shift of the tax schedule and pay additional taxes of $dz^* d\tau$. The mechanical increase in taxes paid has a direct income effect that depends on the income parameter $\eta = (dz/dR)(1 - T')$. Moreover, since the individual shifts along the tax schedule we must take into account a further change in the tax rate. The two effects combined are:

$$dz = -\zeta^c z\,\frac{T''\,dz}{1 - T'} - \eta\,\frac{d\tau\,dz^*}{1 - T'}$$

Rearranging:

$$dz = -\eta\,\frac{d\tau\,dz^*}{1 - T' + z\,\zeta^c\,T''}$$

In order to compute the total revenue effect we then need to sum over all taxpayers above $z^*$ and account for the marginal tax rate $T'$:

$$-\int_{z^*}^{\infty} \eta\,T'\,\frac{d\tau\,dz^*}{1 - T' + z\,\zeta^c\,T''}\,h(z)\,dz \tag{22}$$

Virtual and Actual Income Density. Saez (2001) introduces the concept of virtual density in order to simplify the tax formulas. The virtual density is closely related to the virtual income in that it is the income density that would arise if the tax system were linear and tangent to the tax schedule $T(z)$ at every $z$. We denote the virtual density with $h^*(z)$. The mapping between the virtual density and the type distribution $f(n)$ is given by the following:

$$h^*(z)\,\dot z^* = f(n)$$

where $\dot z^*$ is the derivative of earnings wrt $n$ when the linear tax schedule is in place. A similar relation holds for $h(z)$ and we have $h(z)\,\dot z = f(n)$. Using the definition in (20) and the fact that $T'' = 0$ when a linear tax schedule is in place we can write:

$$h(z_n)\,z_n\,\frac{1+\zeta^u_n}{n}\,\frac{1 - T'}{1 - T' + z_n\,\zeta^c_n\,T''} = h^*(z_n)\,z_n\,\frac{1+\zeta^u_n}{n}$$

It follows that:

$$\frac{h^*(z)}{1 - T'(z)} = \frac{h(z)}{1 - T'(z) + \zeta^c z\,T''(z)} \tag{23}$$

Optimal Income Taxes Starting from (21) and using equation (23) we can write the elasticity
effect as follows:

d⌧ T0
E= h (z) dz ⇤ T 0 ⇣ c z ⇤ = ⇣ cz⇤ h⇤ (z) d⌧ dz ⇤ (24)
1 T + ⇣ c z ⇤ T 00
0 1 T0
Notice that in order to get the revenue effect that the elasticity effect should measure, we multiplied
the expression in (21) by the marginal tax rate T 0 and by h (z) dz ⇤ , which is the share of taxpayers
affected by the tax reform. Using (23) we can write the income effect as follows:
ˆ 1
T0
I = d⌧ dz ⇤ ⌘ h⇤ (z) dz (25)
z⇤ 1 T0

30
At the optimum, the sum of the three effects must be zero. We thus impose:

M +E+I =0

and find
1 1
T0 T0
ˆ ˆ
dz ⇤ d⌧ (1 g (z)) h (z) dz ⇣ cz⇤ h⇤ (z) d⌧ dz ⇤ d⌧ dz ⇤ ⌘ h⇤ (z) dz = 0
z⇤ 1 T0 z⇤ 1 T0

Rearranging:

ˆ 1 ˆ 1
T0 1 1 T0
= (1 g (z)) h (z) dz ⌘ h⇤ (z) dz
1 T0 ⇣ c h⇤ (z) z ⇤ z⇤ z⇤ 1 T0
ˆ 1 ˆ 1
1 1 H (z ⇤ ) h (z) T0 h⇤ (z)
= (1 g (z)) dz ⌘ dz (26)
⇣ c h⇤ (z) z ⇤ z⇤ 1 H (z ⇤ ) z⇤ 1 T 0 1 H (z ⇤ )

As we have already seen in the previous Sections, the formula consists of different terms. The ratio $(1 - H(z^*))/(h^*(z^*)\,z^*)$ captures the shape of the income distribution: it measures how many people are above $z^*$ relative to how much income is accumulated at $z^*$ (i.e. $z^* h(z^*)$). The former is proportional to the mechanical increase in revenues, while the latter measures the total income that is distorted by the tax. The marginal tax rate is also decreasing in the compensated elasticity $\zeta^c$, following a classical efficiency argument, and it increases the larger the income effect is (in absolute value): a stronger income effect means that the negative fiscal externality from higher taxes is reduced.
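As a rough numerical illustration of (26), the sketch below evaluates the formula in a special case that is an assumption of this example and not part of the derivation above: no income effects ($\eta = 0$), a constant welfare weight above $z^*$ (called `g_bar` here), and a Pareto upper tail with parameter `a`, for which $(1-H(z^*))/(z^* h(z^*)) = 1/a$. The function name is hypothetical.

```python
def pareto_tail_marginal_rate(zeta_c: float, a: float, g_bar: float = 0.0) -> float:
    """Marginal tax rate T' implied by (26) under simplifying assumptions.

    Assumptions (not from the notes): no income effects, a constant welfare
    weight g_bar above z*, and a Pareto tail so that (1 - H)/(z* h) = 1/a.
    Then T'/(1 - T') = (1 - g_bar) / (zeta_c * a).
    """
    if zeta_c <= 0 or a <= 0:
        raise ValueError("zeta_c and a must be positive")
    ratio = (1.0 - g_bar) / (zeta_c * a)  # this is T'/(1 - T')
    return ratio / (1.0 + ratio)          # solve for T'


# Example call with illustrative (made-up) parameter values.
rate = pareto_tail_marginal_rate(zeta_c=0.25, a=2.0, g_bar=0.0)
```

A larger compensated elasticity or a thinner tail (larger `a`) lowers the rate, in line with the discussion of the terms in (26).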

4.2 Bunching Estimator


In this paragraph we study a way to estimate the elasticity of income that was introduced by Saez (2010). The methodology exploits the degree of bunching at the kinks that characterize the tax schedule. Suppose that individual incomes $z$ are distributed according to a smooth density $h(z)$. There is a constant marginal tax rate $t$ around income $z^*$, and a reform raises the marginal tax rate to $t + dt$ for all incomes above $z^*$. The kink will induce people who were falling in the interval $[z^*, z^* + dz^*]$ before the reform to bunch at $z^*$. Denote with $L$ an individual who is exactly indifferent between the pre- and post-reform tax schedules and does not change her income in equilibrium. This individual’s indifference curve has slope $1 - t$ at $z^*$. There is also an individual $H$ who represents the highest pre-reform income bunching at $z^*$. The indifference curve of $H$ has slope $1 - t - dt$ at $z^*$ and is tangent to the retention function above $z^*$.

Figure 2: Bunching

The response of income to the tax reform for individual $H$ is:

$$dz^* = -\frac{dz}{d(1-t)}\bigg|_{z=z_H} d(1-t) = -e\,\frac{z_H}{1-t}\,d(1-t) = e\,(z^* + dz^*)\,\frac{dt}{1-t}$$

where $e$ is the compensated elasticity of income and $d(1-t) = -dt$. Notice that $dz^*$ is proportional to the ratio between the change in the tax rate and the net-of-tax rate $1 - t$. It follows that, everything else being equal, a change in marginal tax rates from 0 percent to 10 percent should produce the same amount of bunching as a change from 90 percent to 91 percent. Rearranging:

$$\left(\frac{1 - t - e\,dt}{1-t}\right) dz^* = e\,\frac{z^*}{1-t}\,dt$$

which implies:

$$dz^* = e\,\frac{z^*}{1 - t - e\,dt}\,dt \qquad (27)$$

It is not surprising that $dz^*$ is increasing in the elasticity of income, implying that if income is more elastic more people will bunch.
Suppose the income distribution is locally continuous. The share of people bunching at the kink is:

$$s(z^*) = h(z^*)\,dz^*$$

Using the definition in (27):

$$\frac{s(z^*)}{h(z^*)} = e\,\frac{z^*}{1 - t - e\,dt}\,dt$$

$$e\left(\frac{s(z^*)}{h(z^*)} + z^*\right)dt = \frac{s(z^*)}{h(z^*)}\,(1-t)$$

$$e = \frac{s(z^*)}{s(z^*) + z^* h(z^*)}\,\frac{1-t}{dt} \approx \frac{s(z^*)}{h(z^*)\,z^*}\,\frac{1-t}{dt}$$

where we assumed that $s(z^*)/z^* \approx 0$, so that $s(z^*)$ is negligible relative to $z^* h(z^*)$. The formula shows that using the share of people bunching at the kink and the income distribution that would arise under the no-reform scenario we can estimate the elasticity of labor supply.
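The approximate estimator at the end of the derivation, $e \approx \frac{s(z^*)}{h(z^*) z^*}\frac{1-t}{dt}$, can be sketched as follows. The function names and the numbers in the example are hypothetical; `first_order_bunching_share` uses the first-order approximation $dz^* \approx e\,z^*\,dt/(1-t)$ only to generate an input that the estimator should recover.

```python
def bunching_elasticity(s: float, h: float, z_star: float, t: float, dt: float) -> float:
    """Approximate elasticity from the bunching share s at the kink z_star.

    s  : share of taxpayers bunching at the kink
    h  : counterfactual (no-reform) density at z_star
    t  : marginal tax rate below the kink, dt : increase in the rate above it
    """
    if min(h, z_star, dt) <= 0 or not (0 <= t < 1):
        raise ValueError("need h, z_star, dt > 0 and 0 <= t < 1")
    return s / (h * z_star) * (1.0 - t) / dt


def first_order_bunching_share(e: float, h: float, z_star: float, t: float, dt: float) -> float:
    """Share of bunchers implied by dz* ~ e * z_star * dt / (1 - t) (first order in dt)."""
    return h * e * z_star * dt / (1.0 - t)


# Round trip with made-up numbers: the estimator recovers the elasticity.
s = first_order_bunching_share(e=0.25, h=1e-4, z_star=10_000.0, t=0.1, dt=0.1)
e_hat = bunching_elasticity(s, h=1e-4, z_star=10_000.0, t=0.1, dt=0.1)
```

In applied work the counterfactual density `h` is itself estimated, so the sketch should be read as the last step of the procedure only.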

References
Saez, E. “Using Elasticities to Derive Optimal Income Tax Rates”, Review of Economic Studies, Vol. 68, 2001, 205-229.
Saez, E. “Do Taxpayers Bunch at Kink Points?”, AEJ: Economic Policy, Vol. 2, 2010, 180-212.

5 Section 6: Optimal Income Transfers
In this Section we study the optimal design of income transfers. We start from a formal model where we
specify individuals’ preferences and a government’s social welfare function. We then take the approach
by Saez (2002) to derive optimal transfers using an “experiment” where we introduce a perturbation
around the optimal tax schedule for a generic “occupation” and derive a formula for the optimal tax.

5.1 Optimal Income Transfers in a Formal Model


We introduce in this paragraph a model of discrete choices in which we derive optimal taxes. Suppose agents choose an occupation $i$ among a set of occupations $\{0, 1, \ldots, I\}$, where occupation 0 denotes unemployment, and earn income $w_i$ in occupation $i$. Each individual is indexed by $m \in M$, with $M$ a multidimensional set of measure one. The measure of individuals on $M$ is denoted by $d\nu(m)$. Agents maximize $u^m(c_i, i)$, which is differentiable in consumption. Individual consumption after taxes is $c_i = w_i - T_i$. A tax schedule defines a vector $(c_0, \ldots, c_I)$ such that the set $M$ is partitioned into subsets $M_0, M_1, \ldots, M_I$. Denote with $h_i(c_0, c_1, \ldots, c_I)$ the fraction of individuals choosing occupation $i$, such that $\sum_i h_i = 1$. $h_i$ is differentiable under the assumption that tastes for work captured by $u^m(\cdot)$ are regularly distributed. We define the elasticity of participation for occupation $i$ as follows:

$$\eta_i = \frac{c_i - c_0}{h_i}\,\frac{\partial h_i}{\partial (c_i - c_0)} \qquad (28)$$
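Suppose the government weights individual utilities through linear welfare weights $\mu^m$ and that the social welfare function is:

$$W = \int_M \mu^m\,u^m(w_{i^*} - T_{i^*}, i^*)\,d\nu(m) \qquad (29)$$

where $i^*$ denotes the occupation chosen by individual $m$. The government has some revenue requirement $H$ such that the budget can be written as:

$$\sum_i h_i T_i = H \qquad (30)$$

We solve the problem with a Lagrangian where we attach multiplier $\lambda$ to the government constraint. The FOC wrt $T_i$ reads:

$$-\int_{M_i}\mu^m\,\frac{\partial u^m(c_{i^*}, i^*)}{\partial c_i}\,d\nu(m) + \lambda\left[h_i - \sum_{j=0}^{I} T_j\,\frac{\partial h_j}{\partial c_i}\right] = 0 \qquad (31)$$

By the usual envelope argument, equation (31) ignores the welfare effect on individuals who change occupation in response to a change in $c_i$, since they are indifferent at the margin.
A social marginal welfare weight is:

$$g_i = \frac{1}{\lambda h_i}\int_{M_i}\mu^m\,\frac{\partial u^m(c_{i^*}, i^*)}{\partial c_i}\,d\nu(m) \qquad (32)$$

Using the definition of $g_i$ we can rewrite (31) as:

$$(1 - g_i)\,h_i = \sum_{j=0}^{I} T_j\,\frac{\partial h_j}{\partial c_i} \qquad (33)$$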

This formula is very similar to the one you will see in the spring studying Ramsey taxation.10 Take a benchmark case of no income effects such that $h_j(c_0, \ldots, c_I) = h_j(c_0 + R, \ldots, c_I + R)$. Then $\sum_i \partial h_j/\partial c_i = 0$ for every $j$, and summing (33) over all $i$ implies $\sum_i (1 - g_i)\,h_i = 0$, that is:

$$\sum_i h_i g_i = \sum_i h_i = 1 \qquad (34)$$

10 The formula implies that the following is true for every $i$:

$$\sum_{j=0}^{I}\frac{T_j}{h_i\,(1 - g_i)}\,\frac{\partial h_j}{\partial c_i} = 1$$

5.2 Optimal Tax/Transfer with Extensive Margin Only


Suppose each individual only chooses between some occupation $i$ and being unemployed. This can be rationalized by a utility function where $u^m(c_j, j) = -\infty$ for any $j \neq 0, i$. The assumption implies that $\partial h_i/\partial c_i + \partial h_0/\partial c_i = 0$ and we can rewrite (33) as:

$$(1 - g_i)\,h_i = T_i\,\frac{\partial h_i}{\partial c_i} + T_0\,\frac{\partial h_0}{\partial c_i} = (T_i - T_0)\,\frac{\partial h_i}{\partial c_i}$$

Using the definition of the elasticity of participation:

$$\frac{T_i - T_0}{c_i - c_0} = \frac{1}{\eta_i}\,(1 - g_i) \qquad (35)$$

The level of taxation at occupation $i$ decreases in the elasticity of participation for the usual efficiency argument.
Redistributive preferences imply $g_0 \geq g_1 \geq \ldots \geq g_I$. Suppose there are no income effects. We know from (34) that the weighted average of the $g_i$s is 1 and therefore there is an $i^*$ such that $g_j \geq 1$ for $j \leq i^*$ and $g_j < 1$ for $j > i^*$. This implies that $T_i - T_0 \leq 0$ for $i \leq i^*$, meaning that the government provides a higher transfer to workers with low income than to the unemployed. Therefore, we established that it is optimal for the government to implement negative marginal tax rates at the bottom of the income distribution.
If the government were Rawlsian, we could have that only $g_0$ is higher than 1. When this is the case, the tax schedule does not display negative marginal tax rates and we have a classical negative income tax. On the other hand, a utilitarian government would have constant $g_i$s such that the budget constraint is satisfied. We therefore have two cases. First, if every individual can pay $H$, the government will charge a constant lump-sum tax equal to $H$ to every taxpayer and $g_i = 1$ for every $i$. Second, if low incomes cannot afford the tax, the government will only impose the tax on higher incomes, setting their social marginal welfare weights below 1 and having positive marginal tax rates throughout occupations.

10 (continued) We can interpret the lhs as an index of how much labor supply is discouraged. The formula holds for every $i$ and implies that discouragement is equalized across all occupations.

Tax Experiment The same formula for optimal taxes can be derived through the following experiment. Suppose taxes increase by $dT_i$ for occupation $i$. The mechanical increase in tax revenues is $h_i\,dT_i$ and it will be valued $(1 - g_i)\,h_i\,dT_i$ by the government, taking into account the welfare effect of the change. The government must also account for the fiscal externality generated by the behavioral response of agents in occupation $i$. Using the elasticity of participation, the change in the share of people in occupation $i$ is:

$$dh_i = -\eta_i\,\frac{h_i}{c_i - c_0}\,dT_i$$

Each worker leaving occupation $i$ generates a loss in revenues equal to $T_i - T_0$. The total behavioral effect of the tax increase is:

$$dh_i\,(T_i - T_0) = -\eta_i\,h_i\,\frac{T_i - T_0}{c_i - c_0}\,dT_i$$

Summing the mechanical and behavioral effects at the optimum we get:

$$(1 - g_i)\,h_i\,dT_i - \eta_i\,h_i\,\frac{T_i - T_0}{c_i - c_0}\,dT_i = 0$$
Rearranging we can derive (35). The decomposition of the formula in mechanical and behavioral
effects provides further intuition for why marginal tax rates can be negative at the optimum. For very
low incomes the mechanical effect of providing an extra dollar is positive (gi > 1) and at the same
time a decrease in taxes at i provides incentives for unemployed workers to enter the labor force. The
sum of the two effects is unambiguously positive.
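To see what (35) implies for tax levels, the sketch below solves it for $T_i$, taking the welfare weight and the participation elasticity as given numbers. This is only an illustration: at a true optimum $g_i$ and $\eta_i$ are endogenous, and the assumption that the unemployed earn $w_0 = 0$ (so that $c_0 = -T_0$) is made for this example. The function name is hypothetical.

```python
def extensive_margin_tax(w_i: float, g_i: float, eta_i: float, T0: float = 0.0) -> float:
    """Solve (T_i - T_0)/(c_i - c_0) = (1 - g_i)/eta_i for T_i.

    Uses c_i = w_i - T_i and, as an assumption of this sketch, c_0 = -T0
    (the unemployed earn nothing). Writing x = T_i - T0 and
    k = (1 - g_i)/eta_i, the condition x = k * (w_i - x) gives
    x = k * w_i / (1 + k).
    """
    if eta_i <= 0:
        raise ValueError("eta_i must be positive")
    k = (1.0 - g_i) / eta_i
    if k <= -1.0:
        raise ValueError("no solution with c_i > c_0 for these parameters")
    return T0 + k * w_i / (1.0 + k)


# A weight above one yields T_i < T_0: low-income workers receive a larger
# transfer than the unemployed, as discussed in the text.
T_low = extensive_margin_tax(w_i=100.0, g_i=1.2, eta_i=0.5)
T_high = extensive_margin_tax(w_i=100.0, g_i=0.5, eta_i=0.5)
```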

5.3 Optimal Tax/Transfer with Intensive Margin Responses


Suppose that agents’ preferences are such that they can only choose between two adjacent occupations and that we can write the share of workers in occupation $i$ as $h_i(c_{i+1} - c_i, c_i - c_{i-1})$ when we assume there are no income effects.11 The behavioral elasticity is defined as follows:

$$\zeta_i = \frac{c_i - c_{i-1}}{h_i}\,\frac{\partial h_i}{\partial (c_i - c_{i-1})} \qquad (36)$$

Equation (33) becomes:

$$(1 - g_i)\,h_i = -T_{i+1}\,\frac{\partial h_{i+1}}{\partial (c_{i+1} - c_i)} - T_i\,\frac{\partial h_i}{\partial (c_{i+1} - c_i)} + T_i\,\frac{\partial h_i}{\partial (c_i - c_{i-1})} + T_{i-1}\,\frac{\partial h_{i-1}}{\partial (c_i - c_{i-1})}$$

By assumption on agents’ preferences, $\partial h_{i+1}/\partial (c_{i+1} - c_i) = -\partial h_i/\partial (c_{i+1} - c_i)$ (and likewise for the pair $i-1$, $i$). Rearranging we find:

$$(1 - g_i)\,h_i = -(T_{i+1} - T_i)\,\frac{\partial h_{i+1}}{\partial (c_{i+1} - c_i)} + (T_i - T_{i-1})\,\frac{\partial h_i}{\partial (c_i - c_{i-1})}$$

Summing over $i, i+1, \ldots, I$ and using the definition in (36) we can derive the optimal tax formula:

$$\frac{T_i - T_{i-1}}{c_i - c_{i-1}} = \frac{1}{\zeta_i}\,\frac{(1 - g_i)\,h_i + (1 - g_{i+1})\,h_{i+1} + \ldots + (1 - g_I)\,h_I}{h_i} \qquad (37)$$

Non-increasing social marginal welfare weights imply that $(1 - g_i)\,h_i + (1 - g_{i+1})\,h_{i+1} + \ldots + (1 - g_I)\,h_I \geq 0$ for any $i > 0$. Thus, the tax $T_i$ is increasing in $i$ and it is not optimal to set negative marginal tax rates. Using (34), (37) and computing the formula for the tax rate at the bottom of the income distribution we get:

$$\frac{T_1 - T_0}{c_1 - c_0} = \frac{1}{\zeta_1}\,\frac{(g_0 - 1)\,h_0}{h_1} \qquad (38)$$
A higher social marginal welfare weight $g_0$ implies a higher tax rate at the bottom. The reason is that if the government cares more about the unemployed individual it should set the lump-sum transfer $-T_0$ as large as possible by imposing large phasing-out tax rates at the bottom. Negative marginal tax rates at the bottom can still occur for $g_0 < 1$, but this would imply that the unemployed worker has a lower welfare weight than the average taxpayer in the economy, meaning that the government has unusual redistributive tastes.
11 We can write the share of people working in occupation $i$ as $h_i(c_{i-1}, c_i, c_{i+1})$. No income effects imply $h_i(c_0, c_1, \ldots, c_I) = h_i(c_0 + R, c_1 + R, \ldots, c_I + R)$. Setting $R = -c_i$, it follows that $h_i$ depends only on the differences $c_i - c_{i-1}$ and $c_{i+1} - c_i$.

Tax Experiment The formula in (37) can be derived through an experiment where taxes increase by $dT$ for every occupation $i, i+1, \ldots, I$. This change decreases $c_i - c_{i-1}$ by $dT$ and leaves any other difference unaltered. The mechanical increase in revenues is $[h_i + h_{i+1} + \ldots + h_I]\,dT$ and net of the welfare cost it is valued $[h_i (1 - g_i) + h_{i+1} (1 - g_{i+1}) + \ldots + h_I (1 - g_I)]\,dT$. When we assume income effects away, the behavioral effect of the tax change arises from individuals in occupation $i$ only. The change in the share of workers in occupation $i$ is $dh_i = -h_i \zeta_i\,dT/(c_i - c_{i-1})$ and it must be scaled by the loss in revenues $T_i - T_{i-1}$ generated by each worker switching to occupation $i - 1$. Summing the two impacts:

$$[h_i (1 - g_i) + h_{i+1} (1 - g_{i+1}) + \ldots + h_I (1 - g_I)]\,dT - h_i \zeta_i\,(T_i - T_{i-1})\,\frac{dT}{c_i - c_{i-1}} = 0$$

Rearranging we get the formula in (37). The mechanical and behavioral effects help provide intuition for why negative marginal tax rates are not optimal with intensive margin responses only. Suppose the government raised taxes at $i$ when there is a negative marginal tax rate in the interval $[i-1, i]$. Individuals would respond by shifting their labor supply to $i - 1$ and, since $T_{i-1} > T_i$ when the marginal tax rate is negative, would pay more taxes. At the same time the tax change would mechanically increase revenues. Therefore, the government could always improve welfare by increasing taxes as long as the marginal tax rate is negative.
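The right-hand side of (37) can be evaluated mechanically once the welfare weights, occupation shares and elasticities are given. The sketch below does so for illustrative inputs; treating $g_i$, $h_i$ and $\zeta_i$ as fixed numbers is an assumption of the example (they are endogenous at the optimum), and the function name is hypothetical.

```python
from typing import List, Optional


def intensive_margin_rates(g: List[float], h: List[float],
                           zeta: List[Optional[float]]) -> List[Optional[float]]:
    """Evaluate (T_i - T_{i-1})/(c_i - c_{i-1}) from (37) for i = 1..I.

    g, h and zeta are indexed by occupation 0..I; zeta[0] is unused.
    Entry 0 of the result is None because no rate is defined for i = 0.
    """
    if not (len(g) == len(h) == len(zeta)):
        raise ValueError("g, h and zeta must have the same length")
    top = len(h) - 1
    rates: List[Optional[float]] = [None]
    for i in range(1, top + 1):
        # Sum of (1 - g_j) h_j over all occupations j >= i.
        tail = sum((1.0 - g[j]) * h[j] for j in range(i, top + 1))
        rates.append(tail / (zeta[i] * h[i]))
    return rates


# Made-up inputs satisfying (34): sum_i h_i g_i = 1 with non-increasing g.
g = [2.0, 1.0, 0.6]
h = [0.2, 0.3, 0.5]
zeta = [None, 0.5, 0.5]
rates = intensive_margin_rates(g, h, zeta)
```

With these inputs the bottom rate coincides with (38), $(g_0 - 1) h_0 / (\zeta_1 h_1)$, because the weights average to one.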

5.4 Optimal Tax/Transfer with Intensive and Extensive Margin Responses


We present for the sake of simplicity only the tax experiment derivation of the formula. Suppose taxes are raised by $dT$ for everyone in occupations $i, i+1, \ldots, I$. The mechanical effect is the same as the one observed in the previous paragraph. However, we have to add the participation effect of an increase in the tax for all the occupations from $i$ upwards. The share of people who become unemployed leaving a generic occupation $j \geq i$ is $h_j \eta_j\,dT/(c_j - c_0)$, generating a revenue loss equal to $h_j \eta_j\,(T_j - T_0)\,dT/(c_j - c_0)$. Summing this effect over every occupation $j \geq i$ and setting the sum of behavioral and mechanical effects equal to 0, we can derive the following formula:

$$\frac{T_i - T_{i-1}}{c_i - c_{i-1}} = \frac{1}{\zeta_i h_i}\sum_{j=i}^{I} h_j\left[1 - g_j - \eta_j\,\frac{T_j - T_0}{c_j - c_0}\right] \qquad (39)$$

When a tax is lowered in the pure extensive margin model, labor supply unambiguously increases. On the other hand, if a tax is decreased in a pure intensive margin model individuals will have incentives to lower their labor supply. The formula shows how to optimally trade off the two effects.
Notice that (39) can be rewritten as (37) where we employ augmented social welfare weights $\hat g_i = g_i + \eta_i\,(T_i - T_0)/(c_i - c_0)$. When the participation elasticity is high enough, the augmented welfare weights are not necessarily decreasing in $w_i$ even if the $g_i$s are. This explains why an earned income tax credit could be optimal in a mixed model.

References
Saez, E. “Optimal Income Transfer Programs: Intensive Versus Extensive Labor Supply Responses”, Quarterly Journal of Economics, 117, 2002, 1039-1073.
Saez, E. and P. Diamond. “The Case for a Progressive Tax: From Basic Research to Policy Recommendations”, Journal of Economic Perspectives 25(4), Fall 2011, 165-190.

6 Section 7: Optimal Top Income Taxation
In this Section we study the optimal design of top income taxes.12 We have already covered optimal top income taxation in a simple Mirrlees framework in Section 2. Today, we start from the “trickle down” model with endogenous wages introduced by Stiglitz (1982). We then analyze an example of optimal taxation in a general equilibrium model where workers choose between different occupations/sectors (Rothschild and Scheuer, 2016). Finally, we present a model where top earners respond to taxes on three margins: labor supply, tax avoidance, and compensation bargaining (Piketty, Saez and Stantcheva, 2014).

6.1 Trickle Down: A Model With Endogenous Wages


Stiglitz (1982) studies a model with two unobservable types and endogenous wages. Suppose there are two types of workers: $H$ (high skill) and $L$ (low skill). For simplicity we assume they have equal mass and work $l_i$ hours. The utility of a generic type $i$ is $u(c_i, l_i)$. Work is the only input in the constant returns to scale (CRS) production function $F(l_L, l_H)$. With competitive labor markets wages are equal to the marginal product of labor:

$$w_i = \frac{\partial F(l_L, l_H)}{\partial l_i} \qquad (40)$$

The standard Mirrlees model implicitly assumes a linear production function $F(l_L, l_H) = \theta_L l_L + \theta_H l_H$, where $\theta_i$ is the ability of agent $i$, so that wages are $w_i = \theta_i$.
The resource constraint of this economy is:

$$\sum_i c_i \leq F(l_L, l_H)$$

The government assigns linear welfare weights $\psi_L$ and $\psi_H$ to the two types. If $\psi_H < \psi_L$ the government wants to redistribute to low types and we know that in equilibrium the incentive constraint for the high type is binding:

$$u(c_H, l_H) = u\left(c_L, \frac{w_L l_L}{w_H}\right) \qquad (41)$$

We solve the problem with the following Lagrangian:

$$\begin{aligned}
\mathcal{L} = {} & \psi_L\,u(c_L, l_L) + \psi_H\,u(c_H, l_H) + \lambda\left[F(l_L, l_H) - \sum_i c_i\right] \\
& + \mu\left[u(c_H, l_H) - u\left(c_L, \frac{w_L l_L}{w_H}\right)\right] + \sum_i \eta_i\left(w_i - F_i(l_L, l_H)\right) \qquad (42)
\end{aligned}$$

$\lambda$ is the marginal value of public funds, $\mu$ is the value of relaxing the incentive constraint for type $H$, and the $\eta_i$s are the multipliers on the constraints in (40).
We derive the optimal marginal tax rate for the high type by optimally choosing $c_H$ and $l_H$. The FOCs are respectively:

$$[\psi_H + \mu]\,u_c(c_H, l_H) = \lambda \qquad (43)$$

$$[\psi_H + \mu]\,u_l(c_H, l_H) = -\lambda F_H(l_L, l_H) + \sum_i \eta_i F_{iH}(l_L, l_H) \qquad (44)$$
i
12 The first two paragraphs of this section are based on notes by Florian Scheuer.

The optimal labor supply choice implies the following labor wedge:

$$T'(z_H) = 1 + \frac{u_l(c_H, l_H)}{u_c(c_H, l_H)\,w_H}$$

Using (43) and (44), together with $w_H = F_H$, we can rewrite the labor wedge as follows:

$$T'(z_H) = 1 + \frac{-\lambda F_H(l_L, l_H) + \sum_i \eta_i F_{iH}(l_L, l_H)}{\lambda F_H(l_L, l_H)} = \frac{\sum_i \eta_i F_{iH}(l_L, l_H)}{\lambda F_H(l_L, l_H)} \qquad (45)$$

where $\lambda > 0$ is the multiplier on the resource constraint. The sign of (45) depends on $\eta_L$ and $\eta_H$. In order to sign them, we exploit the government’s optimal choice of $w_i$, characterized by the following FOCs:

$$-\mu\,u_l\left(c_L, \frac{w_L l_L}{w_H}\right)\frac{l_L}{w_H} + \eta_L = 0$$

$$\mu\,u_l\left(c_L, \frac{w_L l_L}{w_H}\right)\frac{w_L l_L}{w_H^2} + \eta_H = 0$$

Since $u_l < 0$, they imply $\eta_L < 0$ and $\eta_H > 0$. The CRS technology and concavity imply $F_{HL} > 0$ and $F_{HH} < 0$, which means complementarity between the two factors of production. Therefore, $\sum_i \eta_i F_{iH}(l_L, l_H) < 0$ and we conclude that $T'(z_H) < 0$. Top earners are subsidized at the margin because their labor raises the wages of lower earners. By closing the gap between the two wages the government can relax the incentive constraint for the high type and allow for additional redistribution. The result is entirely driven by the complementarity of the two factors in the production function, which generates a “positive externality” of the high type on the low type. In the classical Mirrlees model with linear technology there is no complementarity and top incomes are not subsidized.
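The sign argument can be checked numerically for a specific technology. The sketch below assumes a Cobb-Douglas production function $F(l_L, l_H) = l_L^{\alpha} l_H^{1-\alpha}$, which is not used in the notes, and evaluates the wedge $\sum_i \eta_i F_{iH} / (\lambda F_H)$ for given multipliers. The multiplier values are illustrative inputs with the signs derived in the text ($\eta_L < 0$, $\eta_H > 0$), not solutions of the planner's problem, and `lam` stands for the multiplier on the resource constraint.

```python
def high_type_wedge(l_L: float, l_H: float, eta_L: float, eta_H: float,
                    lam: float = 1.0, alpha: float = 0.5) -> float:
    """Labor wedge of the high type for F = l_L**alpha * l_H**(1 - alpha).

    Returns (eta_L * F_LH + eta_H * F_HH) / (lam * F_H), using the
    analytic derivatives of the assumed Cobb-Douglas technology.
    """
    F_H = (1 - alpha) * l_L ** alpha * l_H ** (-alpha)
    F_LH = alpha * (1 - alpha) * l_L ** (alpha - 1) * l_H ** (-alpha)   # > 0
    F_HH = -alpha * (1 - alpha) * l_L ** alpha * l_H ** (-alpha - 1)    # < 0
    return (eta_L * F_LH + eta_H * F_HH) / (lam * F_H)


# With eta_L < 0 and eta_H > 0 both terms are negative, so the wedge is a subsidy.
wedge = high_type_wedge(l_L=1.0, l_H=1.0, eta_L=-0.1, eta_H=0.2)
```

Setting both multipliers to zero, as in the linear-technology case where the wage constraints do not bind, returns a zero wedge.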

6.2 Taxation in the Roy Model and Rent-Seeking


In this paragraph we study a more general model introduced by Rothschild and Scheuer (2013, 2016) where individuals have a multidimensional vector of skills (one for each sector) and choose both the sector in which they work and how much they work. We present a simple example with two activities and a two-dimensional skill vector. Workers can choose between two activities: a traditional productive activity where the wage reflects the social marginal product of labor, and rent-seeking, where the marginal product of labor is zero and workers compete for a fixed rent $\mu$ such that wages are proportional to $\mu/E$, with $E$ being the total effort in the rent-seeking sector. Every individual has a skill vector $(\theta, \varphi)$ such that $\theta$ is the ability in the productive sector and $\varphi$ is the ability in the rent-seeking sector. Suppose there are only two types of workers in the economy: productive workers with $\theta = \varphi = 1$ and rent-seekers with $\theta = 0$ and $\varphi = \varphi_R$.
The total rent-seeking effort is:

$$E = \varphi_R e_R + P e_P$$

where $P$ is the fraction of productive workers working in the rent-seeking sector. Productive workers are indifferent between the two sectors when the wage in the rent-seeking sector is equal to 1 (the marginal product of labor in the productive sector), so that $\mu/E = 1$, implying $\mu = E$. If instead $E > \mu$ they would all work in the traditional sector, while when $E < \mu$ they would all choose the rent-seeking activity.
Suppose that preferences are quasi-linear, $u(c, e) = c - h(e)$. It can be shown that the optimal allocation involves an interior equilibrium where productive workers are indifferent between the two occupations. If $P$ is the share of productive workers working in the rent-seeking sector, we have:

$$E = \varphi_R e_R + P e_P = \mu$$

which implies that $P$ is:

$$P(e_R, e_P) = \frac{\mu - \varphi_R e_R}{e_P}$$

Given that the share of productive workers employed in the productive sector is $1 - P$, total output produced in the economy is:

$$Y = \mu + (1 - P(e_R, e_P))\,e_P = e_P + \varphi_R e_R$$

Suppose the government is utilitarian. The welfare function is:

$$W = e_P + \varphi_R e_R - h(e_P) - h(e_R)$$

If the government can observe and tax income through a non-linear tax schedule but cannot tax occupational choices, we can solve the problem by choosing an optimal effort level for each type. The FOCs for effort are:

$$h'(e_P) = 1$$

$$h'(e_R) = \varphi_R$$

Notice that the two conditions imply zero wedge on labor income for both types. Although rent-seekers are not productive at all, they are not taxed in equilibrium. The reason is that in this model rent-seekers are “indirectly productive” by crowding out productive workers from the rent-seeking sector. If rent-seekers were taxed, productive workers would be attracted into the rent-seeking sector ($e_R$ would fall but $E = \mu$, so $P$ would have to increase to balance the change), which would decrease total production. Notice that the result does not depend on the assumption of utilitarian social preferences, but would hold for any other combination of social welfare weights.
This example shows how general equilibrium considerations might be extremely important in shaping optimal marginal tax rates. Even under the extreme assumption that all top earners are rent-seekers, general equilibrium considerations would put downward pressure on marginal tax rates at the top to avoid attracting productive workers into rent-seeking.
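The crowding-out logic can be illustrated with a small numerical sketch. It assumes a quadratic effort cost $h(e) = e^2/2$ and a unit mass of each type, neither of which is specified in the notes, so that the first-order conditions give $e_P = 1$ and $e_R = \varphi_R$. The function names are hypothetical.

```python
def rent_seeking_share(mu: float, phi_R: float, e_R: float, e_P: float) -> float:
    """Share P of productive workers in rent-seeking implied by E = mu."""
    P = (mu - phi_R * e_R) / e_P
    if not 0.0 <= P <= 1.0:
        raise ValueError("parameters do not give an interior equilibrium")
    return P


def total_output(mu: float, phi_R: float, e_R: float, e_P: float) -> float:
    """Y = mu + (1 - P) * e_P, which simplifies to e_P + phi_R * e_R."""
    return mu + (1.0 - rent_seeking_share(mu, phi_R, e_R, e_P)) * e_P


# Undistorted allocation under the assumed cost h(e) = e**2 / 2.
mu, phi_R = 1.0, 0.8
Y_first_best = total_output(mu, phi_R, e_R=phi_R, e_P=1.0)

# A tax that lowers rent-seekers' effort pulls productive workers into
# rent-seeking (P rises) and lowers output.
Y_taxed = total_output(mu, phi_R, e_R=0.5 * phi_R, e_P=1.0)
P_first_best = rent_seeking_share(mu, phi_R, phi_R, 1.0)
P_taxed = rent_seeking_share(mu, phi_R, 0.5 * phi_R, 1.0)
```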
The model with occupational choices can also be employed to study the “trickle down” effects in Stiglitz (1982) (see Rothschild and Scheuer, 2013). Allowing for occupational choices still pushes towards lower top marginal tax rates than in a standard Mirrlees model with linear production, but less so than in a world without occupational choice (as in Stiglitz, 1982). The reason is that, unlike in Stiglitz’s model, if the government subsidizes high types, effort increases in the high-skilled sector, decreasing wages in the high-skilled sector and increasing wages in the low-skilled sector. In a Roy model this would attract to the low-skilled sector some workers who were indifferent between the two sectors, reducing the increase in the low-skilled wage. This effect works against the standard general equilibrium effect presented in the previous paragraph, making trickle down less effective.

6.3 Wage Bargaining and Tax Avoidance


In this paragraph we study a standard Mirrlees model with a linear production function where individual income can depart from actual output. We present two potential departures: wage bargaining and tax avoidance.

Wage Bargaining Suppose top earners have measure 1 and after bargaining get a fraction $\eta$ of their output $y$ (where we allow for $\eta > 1$) such that earnings are $z = \eta y$. Bargained earnings are $b = (\eta - 1)\,y$ and average bargained earnings in the economy are $E(b)$. In the aggregate, it must be the case that total product is equal to total compensation. Hence, if $E(b) > 0$, so that there is overpay on average, $E(b)$ must come at the expense of somebody. The opposite is true if $E(b) < 0$. For simplicity, we assume that any gain made through bargaining comes uniformly at the expense of everybody else in the economy. Hence, individual incomes are all reduced by a uniform amount $E(b)$ if $E(b) > 0$.13 We further assume that individuals can exert effort to increase $\eta$ and their preferences are:

$$u_i(c, \eta, y) = c - h_i(y) - k_i(\eta)$$

When the cost of bargaining $E(b)$ is uniformly distributed across all agents, the government can offset it with the demogrant $-T(0)$. It follows that earnings can be written as $z = \eta y = y + b$. Each individual chooses $\eta$ and $y$ to maximize $u_i(c, \eta, y) = \eta y - T(\eta y) - h_i(y) - k_i(\eta)$ and the FOCs are:

$$(1 - \tau)\,\eta = h_i'(y)$$

$$(1 - \tau)\,y = k_i'(\eta)$$
with $\tau = T'(z)$. Let us denote the average reported income, output and bargaining as Walrasian demands $z(1 - \tau)$, $y(1 - \tau)$ and $b(1 - \tau)$. The implied elasticities are:

$$e_1 = \frac{1 - \tau}{y}\,\frac{dy}{d(1 - \tau)}$$

$$e = \frac{1 - \tau}{z}\,\frac{dz}{d(1 - \tau)}$$

$$e_2 = \frac{1 - \tau}{z}\,\frac{db}{d(1 - \tau)} = s \cdot e$$

with

$$s = \frac{db/d(1 - \tau)}{dz/d(1 - \tau)}$$

The definitions imply that $(y/z)\,e_1 = (1 - s)\,e$ and that $e = (y/z)\,e_1 + e_2$.
When the social welfare weight on top incomes is zero, the government chooses the top tax rate to maximize total revenues:

$$\max_{\tau}\ \tau\,[z(1 - \tau) - \bar z] - N \cdot E(b)$$

and, with the measure of top earners normalized to one ($N = 1$), the FOC is:

$$[z - \bar z] - \tau\,\frac{dz}{d(1 - \tau)} + \frac{db}{d(1 - \tau)} = 0$$

Rearranging, using $db/d(1 - \tau) = s\,dz/d(1 - \tau)$:

$$[\tau - s]\,\frac{dz}{d(1 - \tau)} = z - \bar z \quad\Rightarrow\quad \frac{\tau - s}{1 - \tau}\cdot e = \frac{z - \bar z}{z} = \frac{1}{a}$$

Using $e_2 = s \cdot e$ we get:

$$\tau e - e_2 = \frac{1 - \tau}{a} \quad\Rightarrow\quad \tau^* = \frac{1 + a\,e_2}{1 + a\,e} \qquad (46)$$

If top earners are overpaid relative to their productivity, $s > 0$ and $e_2 > 0$, implying that the optimal tax rate is higher than the one maximizing revenues in the standard model ($\tau^* > 1/(1 + a e)$). This is because of a trickle up effect that arises when a higher tax on high incomes reduces the cost of bargaining for low incomes. On the other hand, if $z < y$ and $e_2 < 0$ we would have a trickle down situation where a lower tax on top incomes shifts resources to individuals at the bottom.
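Reading (46) as $\tau^* = (1 + a\,e_2)/(1 + a\,e)$, the comparison with the standard revenue-maximizing rate $1/(1 + a\,e)$ can be sketched as follows. The parameter values are illustrative and the function names are hypothetical.

```python
def top_rate_bargaining(a: float, e: float, e2: float) -> float:
    """Top rate from (46): tau* = (1 + a*e2) / (1 + a*e).

    a  : Pareto parameter of the top tail
    e  : total elasticity of reported income
    e2 : bargaining elasticity component (e2 = s * e)
    """
    if a <= 0 or 1.0 + a * e <= 0:
        raise ValueError("need a > 0 and 1 + a*e > 0")
    return (1.0 + a * e2) / (1.0 + a * e)


def foc_residual(tau: float, a: float, e: float, e2: float) -> float:
    """Residual of tau*e - e2 = (1 - tau)/a; zero at the optimal rate."""
    return tau * e - e2 - (1.0 - tau) / a


standard = top_rate_bargaining(a=1.5, e=0.5, e2=0.0)       # no bargaining response
trickle_up = top_rate_bargaining(a=1.5, e=0.5, e2=0.2)     # overpaid top earners
trickle_down = top_rate_bargaining(a=1.5, e=0.5, e2=-0.2)  # underpaid top earners
```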
13 This assumption can be relaxed allowing for income reductions affecting only workers in the same occupation using

a framework similar to the one we presented in Section 6.

Tax Avoidance Responses to tax rates can also take the form of tax avoidance. Define tax avoidance as changes in reported income due to changes in the form of compensation, but not in the total level of compensation. We observe tax avoidance if taxpayers can shift part of their taxable income into another form that is treated more favorably from a tax perspective. Denote with $x$ total sheltered income such that ordinary taxable income is $z = y - x$. Sheltered income is taxed at a constant marginal tax rate $t$. Suppose that the individual faces a utility cost for sheltering income and utility is $u_i(c, y, x) = c - h_i(y) - d_i(x)$ where $c = y - \tau z - t x + R = (1 - \tau)\,y + (\tau - t)\,x + R$ and $R = \tau \bar z - T(\bar z)$ is virtual income. We can write Walrasian demands $z(1 - \tau, t) = y(1 - \tau) - x(\tau - t)$. Let us define $e_3$ as the elasticity of sheltered income:

$$e_3 = \frac{1 - \tau}{z}\,\frac{dx}{d(\tau - t)} = s \cdot e$$

where

$$s = \frac{dx/d(\tau - t)}{dy/d(1 - \tau) + dx/d(\tau - t)} = \frac{dx/d(\tau - t)}{\partial z/\partial (1 - \tau)}$$

and $e = (y/z)\,e_1 + e_3$.
The government problem is:

$$\max_{\tau, t}\ \tau\,[z(1 - \tau, t) - \bar z] + t\,x(\tau - t)$$

Suppose the government could only optimally set $\tau$ given some $t$. The FOC would be:

$$\begin{aligned}
0 &= [z - \bar z] - \tau\,\frac{\partial z}{\partial (1 - \tau)} + t\,\frac{dx}{d(\tau - t)} \\
&= [z - \bar z] - \tau\,\frac{\partial z}{\partial (1 - \tau)} + t s\,\frac{\partial z}{\partial (1 - \tau)} \\
&= [z - \bar z] - \frac{\tau - t s}{1 - \tau}\,e\,z
\end{aligned}$$

Rearranging:

$$\tau^* = \frac{1 + t \cdot a \cdot e_3}{1 + a \cdot e} \qquad (47)$$

Notice how the tax rate increases with $t \cdot a \cdot e_3$, which captures the fiscal externality of tax avoidance. If $t = 0$ and the government cannot do anything to prevent income shifting, it is irrelevant whether $e$ is due to real responses or tax avoidance responses (see Feldstein, 1999).
If instead the government could also optimally set $t$, we would have an extra optimality condition:

$$\begin{aligned}
0 &= \tau\,\frac{\partial z}{\partial t} + x - t\,\frac{dx}{d(\tau - t)} \\
&= x + (\tau - t)\,\frac{dx}{d(\tau - t)}
\end{aligned}$$

Since $x \geq 0$ and $dx/d(\tau - t) \geq 0$, the first order condition can only hold if $\tau = t$. If this is the case $x(\tau - t) = x(0) = 0$ and $z = y$, so that $e - e_3 = e_1$. If we replace this in (47) we obtain:

$$\tau^* = t^* = \frac{1}{1 + a \cdot e_1} \qquad (48)$$

Intuitively, the government finds it optimal to close any tax avoidance opportunity at the optimum. When this is the case the real elasticity of income $e_1$ is the only one that matters.
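The link between (47) and (48) can be verified numerically: when the rate on sheltered income equals the top rate, (47) has a fixed point at $1/(1 + a\,e_1)$ with $e_1 = e - e_3$. In the sketch below the avoidance elasticity is the elasticity of sheltered income defined above (written `e3`), the parameter values are illustrative, and the function names are hypothetical.

```python
def top_rate_with_avoidance(a: float, e: float, e3: float, t: float) -> float:
    """Top rate from (47) for a given rate t on sheltered income."""
    return (1.0 + t * a * e3) / (1.0 + a * e)


def top_rate_no_avoidance(a: float, e1: float) -> float:
    """Top rate from (48), where only the real elasticity e1 matters."""
    return 1.0 / (1.0 + a * e1)


a, e, e3 = 1.5, 0.5, 0.2

# With t = 0 only the total elasticity matters (the Feldstein case).
tau_t0 = top_rate_with_avoidance(a, e, e3, t=0.0)

# With tau = t (so z = y and e1 = e - e3), (48) is a fixed point of (47).
tau_48 = top_rate_no_avoidance(a, e1=e - e3)
tau_check = top_rate_with_avoidance(a, e, e3, t=tau_48)
```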

References
Feldstein, Martin. 1999. “Tax Avoidance and the Deadweight Loss of the Income Tax.” Review of
Economics and Statistics 81 (4): 674–80.
Piketty, Thomas, Emmanuel Saez, and Stefanie Stantcheva. 2014. “Optimal Taxation of Top Labor
Incomes: A Tale of Three Elasticities.” American Economic Journal: Economic Policy 6 (1): 230–71.
Rothschild, Casey and Florian Scheuer, “Redistributive Taxation in the Roy Model,” Quarterly Journal
of Economics, 2013, 128, 623–668.
Rothschild, Casey and Florian Scheuer, “Optimal Taxation with Rent-Seeking,” Review of Economic
Studies, 2016, forthcoming
Scheuer, Florian (2014), Lecture Notes
Stiglitz, Joseph, “Self-Selection and Pareto Efficient Taxation,” Journal of Public Economics, 1982, 17,
213–240.

7 Section 8: Optimal Minimum Wage and Introduction to Capital Taxation
In this Section we develop a theoretical analysis of optimal minimum wage policy in a perfectly com-
petitive labor market following Lee and Saez (2012).

7.1 Optimal Minimum Wage


We study a model with extensive and intensive margin labor supply responses where wages are endogenous. Suppose output is produced through a constant returns to scale production function $F(h_1, h_2)$ where $h_1$ and $h_2$ are low- and high-skilled workers respectively. Profits are given by $\Pi = F(h_1, h_2) - w_1 h_1 - w_2 h_2$ and wages are equal to the marginal product of labor:

$$w_i = \frac{\partial F(h_1, h_2)}{\partial h_i} \qquad (49)$$

A mass 1 of individuals has three labor supply options: i) not work and earn zero income, ii) work in occupation 1 and earn $w_1$, iii) work in occupation 2 and earn $w_2$. Individuals are heterogeneous in their tastes for work. Every individual faces a vector $\theta = (\theta_1, \theta_2)$ of work costs that is smoothly distributed across the entire population according to $H(\theta)$ with support $\Theta$. The government perfectly observes the wage $w_i$, but does not observe the cost of working. There are no savings and after-tax income equals consumption such that $c_i = w_i - T_i$. Suppose there are no income effects and utility is linear in consumption:

$$u_i = c_i - \theta_i$$

The subset of individuals choosing occupation $i$ is $\Theta_i = \{\theta \in \Theta \mid u_i = \max_j u_j\}$. The fraction of the population working in occupation $i$ is $h_i(c) = |\Theta_i|$ and is a function of $c = (c_0, c_1, c_2)$. The tax system defines a competitive equilibrium $(h_1, h_2, w_1, w_2)$.
Equation (49) implies that $w_2/w_1 = F_2(1, h_2/h_1)/F_1(1, h_2/h_1)$. Constant returns to scale along with decreasing marginal productivity along each skill imply that the right-hand side is a decreasing function of $h_2/h_1$. Therefore, the function is invertible and the ratio $h_2/h_1$ can be written as a function of the wage ratio $w_2/w_1$: $h_2/h_1 = \rho(w_2/w_1)$ with $\rho(\cdot)$ a decreasing function. Constant returns to scale also imply that there are no profits in equilibrium. Hence $\Pi = F(h_1, h_2) - w_1 h_1 - w_2 h_2 = 0$, so that $w_1 + w_2\,\rho(w_2/w_1) = F(1, \rho(w_2/w_1))$, which defines a decreasing mapping between $w_1$ and $w_2$ so that we can express $w_2$ as a decreasing function of $w_1$: $w_2(w_1)$.
Labor demand and supply for the low-skilled labor market are $D_1(w_1)$ and $S_1(w_1)$ with $D_1'(w_1) \leq 0$ and $S_1'(w_1) \geq 0$, and are defined assuming that the market clears in the high-skilled labor market. The low-skilled labor demand elasticity is:

$$\eta_1 = -\frac{w_1}{h_1}\,D_1'(w_1)$$

The resource constraint of the economy is:

$$h_0 c_0 + h_1 c_1 + h_2 c_2 \leq h_1 w_1 + h_2 w_2 \qquad (50)$$

The government weights individual utilities through a social welfare function $G(\cdot)$ and we can write the social welfare of the economy as:

$$SW = (1 - h_1 - h_2)\,G(c_0) + \int_{\Theta_1} G(c_1 - \theta_1)\,dH(\theta) + \int_{\Theta_2} G(c_2 - \theta_2)\,dH(\theta) \qquad (51)$$

We define social marginal welfare weights as usual: $g_0 = G'(c_0)/\lambda$ and $g_i = \int_{\Theta_i} G'(c_i - \theta_i)\,dH(\theta)/(\lambda h_i)$, where $\lambda$ is the marginal value of public funds. The concavity of the SWF implies $g_0 > g_1$ and $g_1 > g_2$. Since there are no income effects the value of transferring \$1 to everyone in the economy is \$1 and we have $g_0 h_0 + g_1 h_1 + g_2 h_2 = 1$.

Figure 3:

Minimum Wage with No Taxes Suppose there are no taxes and transfers, so that $c_0 = 0$, $c_1 = w_1$ and $c_2 = w_2$. Suppose the economy is at the equilibrium and the government introduces a small minimum wage above the equilibrium wage of the low-skilled market such that $\bar w = w_1^* + d\bar w$. The change will generate a drop in employment $h_1$. The workers who drop out of the low-skilled sector will move either to unemployment or to the high-skilled sector depending on their preferences. We will assume efficient rationing: the workers who involuntarily lose their low-skilled jobs due to the minimum wage are those with the least surplus from working in the low-skilled sector.14 This is clearly the most favorable case for minimum wage policy. We establish the first result of the paper:

Proposition 1: With no taxes/transfers, if (i) efficient rationing holds; (ii) the government values redistribution from high-skilled workers toward low-skilled workers ($g_1 > g_2$); (iii) the demand elasticity $\eta_1$ for low-skilled workers is finite; and (iv) the supply elasticity of low-skilled workers is positive, then introducing a minimum wage increases social welfare.

Consider the changes $dw_1$, $dw_2$, $dh_1$ and $dh_2$ following the increase in the minimum wage. We have $d\Pi = \sum_i [(\partial F/\partial h_i)\,dh_i - w_i\,dh_i - h_i\,dw_i] = -h_1\,dw_1 - h_2\,dw_2$ and the no profit condition implies:

$$h_1\,dw_1 + h_2\,dw_2 = 0 \qquad (52)$$

Therefore, the earnings gain for low-skilled people $h_1\,dw_1 > 0$ is compensated by a loss in the earnings of high-skilled workers $h_2\,dw_2 < 0$. The government values the transfer of resources $[g_1 - g_2]\,h_1\,dw_1$. Under efficient rationing, positive supply elasticity and finite demand elasticity, the welfare loss due to low-skilled individuals moving to unemployment is second-order (see Figure 3).
More formally, the first order condition wrt dw̄ is:


$$\frac{dSW}{d\bar{w}} = -\left(\frac{dh_1}{d\bar{w}} + \frac{dh_2}{d\bar{w}}\right) G(0) + \frac{dh_1}{d\bar{w}}\, G(0) + \frac{dh_2}{d\bar{w}}\, G(0) + \int_{\Theta_1} G'(c_1 - \theta_1)\, dH(\theta) + \frac{dw_2}{d\bar{w}} \int_{\Theta_2} G'(c_2 - \theta_2)\, dH(\theta)$$

14 The assumption can be relaxed: a working paper version of the paper shows how the model can be derived under the assumption of uniform rationing.

Figure 4:


The second and third terms come from the assumption of efficient rationing: the workers moving to unemployment from the two occupations are those with zero surplus from working, and therefore the welfare loss associated with the change of occupation is zero. Also, those who drop out of occupation 1 and move to occupation 2 are indifferent between the two, so by the envelope theorem we can ignore the welfare effect associated with the change. Using (52) we have $dw_2/d\bar{w} = -h_1/h_2$ and the FOC becomes:

$$\frac{dSW}{d\bar{w}} = \lambda h_1 [g_1 - g_2] > 0$$

which proves Proposition 1.
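
As a quick numerical illustration of Proposition 1, the sketch below evaluates the wage adjustment implied by the no-profit condition (52) and the first-order welfare effect $\lambda h_1 [g_1 - g_2]\, d\bar{w}$. The function names and all parameter values are hypothetical and chosen only for illustration; they do not come from Lee and Saez (2012).

```python
def high_skilled_wage_change(h1: float, h2: float, dw_bar: float) -> float:
    """Wage change in occupation 2 implied by h1*dw1 + h2*dw2 = 0 with dw1 = dw_bar."""
    return -h1 / h2 * dw_bar


def welfare_change(lam: float, h1: float, g1: float, g2: float, dw_bar: float) -> float:
    """First-order welfare effect of a small minimum wage: lam * h1 * (g1 - g2) * dw_bar."""
    return lam * h1 * (g1 - g2) * dw_bar


if __name__ == "__main__":
    # Made-up employment shares, welfare weights and minimum wage increase
    h1, h2, dw_bar = 0.3, 0.5, 0.01
    dw2 = high_skilled_wage_change(h1, h2, dw_bar)
    print("dw2:", dw2)
    print("change in profits:", -(h1 * dw_bar + h2 * dw2))
    print("dSW:", welfare_change(1.0, h1, 1.2, 0.6, dw_bar))
```

The welfare effect is positive whenever $g_1 > g_2$ and vanishes when the two weights coincide, which is the content of condition (ii) in the proposition.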

Minimum Wage with Taxes and Transfers We now assume that the government can use taxes
and transfers jointly with the minimum wage policy.

Proposition 2: Under efficient rationing, assuming $\eta_1 < \infty$, if $g_1 > 1$ at the optimal tax allocation (with no minimum wage), then introducing a minimum wage is desirable. Furthermore, at the joint minimum wage and tax optimum, we have: (i) $g_1 = 1$ (full redistribution to low-skilled workers); (ii) $h_0 g_0 + h_1 g_1 + h_2 g_2 = 1$ (social welfare weights average to one).

Suppose there is no minimum wage. An attempt to increase $c_1$ by $dc_1$ while keeping $c_0$ and $c_2$ constant through an increased work subsidy provides incentives for some of the non-workers to start working in occupation 1 (extensive labor supply response) and for some of the workers in occupation 2 to switch to occupation 1 (intensive labor supply response). This leads to a reduction in $w_1$ through demand-side effects (as long as $\eta_1 < \infty$). See Figure 4.
Consider the same increase in $c_1$ when the minimum wage is initially set at $\bar{w} = w_1^T$, where $(w_i^T, c_i^T)$ is the optimal tax and transfer system that maximizes social welfare absent the minimum wage. Since $w_1$ cannot fall, labor supply responses are effectively blocked (Figure 5). Efficient rationing guarantees that individuals willing to leave occupation 1 are precisely those with the lowest surplus from working in occupation 1 relative to their next best option. Therefore, the $dc_1$ change is like a lump-sum tax reform and its net welfare effect is simply $[g_1 - 1]\, h_1\, dc_1$. If $g_1 > 1$, the introduction of the minimum wage improves upon the tax/transfer optimum allocation. This result shows that under the minimum wage policy, redistribution to low-skilled workers can be made lump-sum. Furthermore, raising the lump-sum transfer to occupation 1 improves welfare as long as $g_1 > 1$, and therefore the government finds it optimal to do so until $g_1 = 1$. With no behavioral responses, a one-dollar increase in every consumption level has a welfare effect of $h_0 g_0 + h_1 g_1 + h_2 g_2$ and a cost of one dollar; at the optimum the two are equal.

Figure 5:

To prove it formally, write the consumption gain in occupation $i$ as $\Delta c_i = c_i - c_0$ and the resource constraint as $h_1 (w_1 - \Delta c_1) + h_2 (w_2 - \Delta c_2) - c_0 \geq 0$. The Lagrangian of the problem is:

$$\mathcal{L} = SW + \lambda \left[ h_1 (w_1 - \Delta c_1) + h_2 (w_2 - \Delta c_2) - c_0 \right]$$

Suppose there is a minimum wage and the government introduces a change $dc_1$ (equivalently $d\Delta c_1$, holding $c_0$ fixed). The wage of occupation 1 does not change because of the minimum wage, and neither does $w_2$, since $w_2$ is a function of $w_1$ (as we showed above). As a consequence, there is no change in the ratio $h_2/h_1 = \rho(w_2/w_1)$ and no change in the levels of $h_1$ and $h_2$, since they cannot increase simultaneously. Therefore:

$$\frac{d\mathcal{L}}{d\Delta c_1} = \int_{\Theta_1} G'(c_0 + \Delta c_1 - \theta_1)\, dH(\theta) - \lambda h_1 = \lambda [g_1 - 1]\, h_1$$

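At the optimum it must be that $g_1 = 1$. Taking the FOC with respect to $c_0$ we have:

$$\frac{d\mathcal{L}}{dc_0} = (1 - h_1 - h_2)\, G'(c_0) + \int_{\Theta_1} G'(c_0 + \Delta c_1 - \theta_1)\, dH(\theta) + \int_{\Theta_2} G'(c_0 + \Delta c_2 - \theta_2)\, dH(\theta) - \lambda = \lambda \left[ h_0 g_0 + h_1 g_1 + h_2 g_2 - 1 \right]$$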

which proves Proposition 2.

Pareto Improving Reform  In this section we review the last result in Lee and Saez (2012), which shows how the minimum wage and low-skilled labor subsidies can be complementary. Suppose there are extensive margin responses only. The participation tax rate of low-skilled workers $\tau_1$ is defined by $1 - \tau_1 = (c_1 - c_0)/w_1$, such that $c_1 = c_0 + (1 - \tau_1) w_1$.

Proposition 3: In a model with extensive labor supply responses only, a binding minimum wage associated with a positive tax rate on minimum wage earnings ($\tau_1 > 0$) is second-best Pareto inefficient. This result remains a fortiori true when rationing is not efficient.

Suppose the government reduces the minimum wage by $d\bar{w} < 0$ while keeping $c_0$, $c_1$ and $c_2$ constant. The change induces unemployed individuals to enter occupation 1, generating a change $dh_1 > 0$ and increasing revenues since $\tau_1 > 0$. The change $dh_1 > 0$ induces a change $dw_2 > 0$. However, since $h_1\, d\bar{w} + h_2\, dw_2 = 0$, the mechanical effect of changes in wages is zero. Since $c_0$, $c_1$ and $c_2$ are constant, the total effect of the government policy is only given by the increase in revenues, which is positive. Proposition 3 implies that, when labor supply responses are concentrated along the extensive margin, a minimum wage should always be associated with low-skilled work subsidies such as the EITC.
To prove it formally, notice that since consumption does not change in any occupation, the utility of those who do not switch jobs is not affected. From the demand side, we have $w_2(w_1)$ with $dw_2/dw_1 = -h_1/h_2 < 0$ and hence $dw_2 > 0$. This implies that the relative demand for high-skilled work $h_2/h_1 = \rho(w_2/w_1)$ decreases, as $\rho(\cdot)$ is decreasing. Because $c_2 - c_0$ remains constant, and labor supply responds only along the extensive margin, the supply of high-skilled workers is unchanged so that $dh_2 = 0$, which then implies that $dh_1 > 0$. The $dh_1$ individuals shifting from no work to low-skilled work are weakly better off because they were by definition rationed by the minimum wage (strictly better off in case of inefficient rationing). The government budget is $h_1 (w_1 - \Delta c_1) + h_2 (w_2 - \Delta c_2) - c_0 \geq 0$. Therefore, the net effect of the reform on the budget is: $dh_1 \cdot (w_1 - \Delta c_1) + h_1\, dw_1 + h_2\, dw_2 = dh_1\, \tau_1 w_1 > 0$. Thus, with $\tau_1 > 0$, the reform creates a budget surplus which can be used to increase $c_0$ and improve everybody's welfare (with no behavioral response effects), a Pareto improvement.
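
The budget arithmetic in the proof can be checked numerically. The sketch below is only an illustration with made-up numbers: it computes the participation tax rate from $1 - \tau_1 = (c_1 - c_0)/w_1$ and confirms that the budget effect $dh_1 (w_1 - \Delta c_1) + h_1\, dw_1 + h_2\, dw_2$ collapses to $dh_1\, \tau_1 w_1$ once $dw_2 = -(h_1/h_2)\, dw_1$ is imposed. The function names are hypothetical.

```python
def participation_tax_rate(c0: float, c1: float, w1: float) -> float:
    """tau_1 defined by 1 - tau_1 = (c1 - c0) / w1."""
    return 1.0 - (c1 - c0) / w1


def budget_effect(dh1: float, h1: float, h2: float, w1: float,
                  c0: float, c1: float, dw1: float) -> float:
    """Net budget effect of lowering the minimum wage with c0, c1, c2 held fixed."""
    dw2 = -h1 / h2 * dw1          # wage response from the no-profit condition
    delta_c1 = c1 - c0            # consumption gain from working in occupation 1
    return dh1 * (w1 - delta_c1) + h1 * dw1 + h2 * dw2


if __name__ == "__main__":
    # Made-up values: the minimum wage falls by 0.1 and employment rises by 0.02
    tau1 = participation_tax_rate(c0=2.0, c1=9.0, w1=10.0)
    effect = budget_effect(dh1=0.02, h1=0.3, h2=0.5, w1=10.0, c0=2.0, c1=9.0, dw1=-0.1)
    print("tau_1:", tau1)
    print("budget effect:", effect, "vs dh1 * tau_1 * w1:", 0.02 * tau1 * 10.0)
```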

References
[1] David Lee & Emmanuel Saez, 2012. "Optimal minimum wage policy in competitive labor markets,"
Journal of Public Economics, vol 96(9-10), pages 739-749.

8 Section 9: Linear Capital Taxation
In this section we introduce a framework to study optimal linear capital taxation. We first focus on a two-period model, define the concept of intertemporal wedge and derive optimal capital taxes using the Atkinson-Stiglitz result. We then move to an infinite horizon model with aggregate uncertainty and derive optimal taxes. Finally, we study a model with capitalists and workers and show that only under some assumptions about preferences can we recover a zero capital tax in steady state.$^{15}$

8.1 A Two-Period Model


We introduce a two-period model with capital accumulation that will be useful to study the problem of capital taxation. The two time periods are $t = 0$ and $t = 1$. The preferences of individual $i$ are $U^i(c_0, c_1, y_0)$. The individual can save period 0 income and earn the gross interest rate $R$ on savings. We start by constraining the government's instruments and focusing on a linear consumption tax, while keeping a non-linear tax on income. The budget constraints for the two periods read:

$$(1 + \tau_0)\, c_0 \leq y_0 - T(y_0) - k_1$$

$$(1 + \tau_1)\, c_1 \leq R k_1$$

Combining the two we get:

$$(1 + \tau_1)\, c_1 \leq R \left( y_0 - T(y_0) - (1 + \tau_0)\, c_0 \right) \tag{53}$$


Rearranging we have:

$$c_0 + \frac{1}{R}\, \frac{1 + \tau_1}{1 + \tau_0}\, c_1 \leq \frac{y_0 - T(y_0)}{1 + \tau_0}$$

Let us denote $1 + \tilde{\tau}_1 = (1 + \tau_1)/(1 + \tau_0)$ and $\tilde{T}(y) = (\tau_0 y + T(y))/(1 + \tau_0)$. The budget constraint can be written:

$$c_0 + \frac{1}{R} [1 + \tilde{\tau}_1]\, c_1 \leq y_0 - \tilde{T}(y_0)$$
When the budget constraint holds with equality we can write consumption at time 1 as:

$$c_1 = \frac{R}{1 + \tilde{\tau}_1}\, s_0$$

where $s_0 = y_0 - \tilde{T}(y_0) - c_0$ is the total level of savings when there are no distortions in the economy. We can interpret $\tilde{\tau}_1$ as a capital income tax. It distorts intertemporal consumption decisions by changing the relative price of $c_0$ and $c_1$. It can be interpreted as a wedge on the optimal savings decision. Notice that whenever $\tilde{\tau}_1 = 0$ we have no distortion in the intertemporal choice of the agent.
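
The change of variables can be verified with a small numerical sketch. The code below, with hypothetical function names and made-up parameter values, computes the implied wedge $\tilde{\tau}_1$ and checks that period-1 consumption obtained from the rewritten budget constraint coincides with the one implied by (53) holding with equality.

```python
def implied_capital_wedge(tau0: float, tau1: float) -> float:
    """tau_tilde_1 defined by 1 + tau_tilde_1 = (1 + tau1) / (1 + tau0)."""
    return (1.0 + tau1) / (1.0 + tau0) - 1.0


def c1_rewritten(y0: float, c0: float, T_y0: float, tau0: float, tau1: float, R: float) -> float:
    """c1 = R * s0 / (1 + tau_tilde_1), with s0 = y0 - T_tilde(y0) - c0."""
    T_tilde = (tau0 * y0 + T_y0) / (1.0 + tau0)
    s0 = y0 - T_tilde - c0
    return R * s0 / (1.0 + implied_capital_wedge(tau0, tau1))


def c1_original(y0: float, c0: float, T_y0: float, tau0: float, tau1: float, R: float) -> float:
    """c1 from (53) with equality: (1 + tau1) c1 = R (y0 - T(y0) - (1 + tau0) c0)."""
    return R * (y0 - T_y0 - (1.0 + tau0) * c0) / (1.0 + tau1)


if __name__ == "__main__":
    # Made-up values: income 100, income tax 20, consumption 40
    args = dict(y0=100.0, c0=40.0, T_y0=20.0, tau0=0.10, tau1=0.21, R=1.05)
    print("wedge:", implied_capital_wedge(0.10, 0.21))
    print("c1 (rewritten):", c1_rewritten(**args))
    print("c1 (original): ", c1_original(**args))
```

A uniform consumption tax ($\tau_0 = \tau_1$) gives a zero wedge: it is equivalent to a change in the income tax and leaves the intertemporal margin undistorted.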
Suppose the agent has separable preferences in consumption and labor of the form:

$$U(c_0, c_1, y_0) = u(c_0) + \beta u(c_1) - h(y_0/w)$$

We can rewrite the preferences as $U(g(c_0, c_1), y_0)$, so that utility is weakly separable in $g(\cdot)$ and $y_0$. It follows that Atkinson-Stiglitz applies: if non-linear income taxation is available, the government finds it optimal to set a flat zero tax on $c_0$ and $c_1$.$^{16}$
15 The second part of these notes is based on lecture notes by Florian Scheuer.
16 Here is a short proof of the Atkinson Stiglitz result provided in Kaplow (2006). Suppose all individuals have weakly

8.2 Infinite Horizon Model - Chamley (1986)
In this section we introduce a model where capital returns and wages are endogenous. We focus on linear capital and labor taxes in an infinite horizon economy. There is aggregate uncertainty in the economy: in each period $t$ a state $s_t$ is realized, so that the history of aggregate uncertainty is the sequence $s^t = (s_0, s_1, \ldots, s_t)$. Output is produced according to a constant-returns-to-scale production function:

$$F\left( K(s^{t-1}), L(s^t), s^t, t \right) \tag{54}$$

The productive capital at time $t$ is the stock that was chosen at time $t - 1$. The firm solves the following profit maximization problem:

$$\max_{K, L}\; F\left( K(s^{t-1}), L(s^t), s^t, t \right) - w(s^t)\, L(s^t) - r(s^t)\, K(s^{t-1})$$

Competitive labor and capital markets imply that input prices are equal to their marginal products:

$$w(s^t) = F_L\left( K(s^{t-1}), L(s^t), s^t, t \right)$$

$$r(s^t) = F_K\left( K(s^{t-1}), L(s^t), s^t, t \right)$$
The economy is populated by a representative agent whose utility is:

$$\sum_{t, s^t} \beta^t \Pr(s^t)\, u\left( c(s^t), L(s^t) \right)$$

where $\Pr(s^t)$ is the probability of history $s^t$. The aggregate resource constraint of the economy is:

$$c(s^t) + g(s^t) + K(s^t) - (1 - \delta)\, K(s^{t-1}) \leq F\left( K(s^{t-1}), L(s^t), s^t, t \right) \tag{55}$$
The output produced is employed to finance consumption, public spending and investment. The resource constraint implicitly assumes that aggregate uncertainty results from technology or government spending shocks. The government optimally chooses taxes on labor income $\tau^l(s^t)$ and taxes on capital income $\tau^k(s^t)$, and starts with initial debt $B_0$.
We assume complete markets where the price of an Arrow-Debreu security is $p(s^t)$.$^{17}$ The government budget constraint is:

$$\sum_{t, s^t} p(s^t) \left[ g(s^t) - \tau^l(s^t)\, w(s^t)\, L(s^t) - \tau^k(s^t) \left( r(s^t) - \delta \right) K(s^{t-1}) \right] \leq -B_0$$

Taxes on labor and capital income are employed to finance government outlays $g(s^t)$. Notice that the capital tax is levied on the return to capital net of depreciation.
separable preferences $V(g(x), y)$, where $x$ is a vector of commodities. Suppose we start from a situation where there are positive taxes on commodities and we implement a policy that sets all of them to zero ($\tau_k \to 0$): a zero flat tax on all commodities. Suppose the government offsets the utility change of the agent with non-linear income taxes such that labor supply is unchanged at the optimum and $V(g(x'), y) = V(g(x), y)$, where $x'$ denotes the old bundle. By definition every agent has the same utility as before and no one is willing to imitate another individual (if they were not willing to imitate before the tax change). By revealed preference we know that $\sum_k p_k x'_k > y - T^*(y)$: the agent cannot afford the old bundle under the current income taxation. Under the old policy scenario the agent could afford the bundle and we had $\sum_k (p_k + \tau_k)\, x'_k \leq y - T(y)$. Combining the two inequalities we find that $T^*(y) > \sum_k \tau_k x'_k + T(y)$, and the total revenue raised after the tax change is strictly higher than total revenues before the tax change. Since incentive compatibility holds and we have no welfare effect by construction, the new policy is welfare improving since it raises more revenues.
17 An Arrow-Debreu security is a financial instrument that provides one unit of consumption in a state $s_t$ and zero units in any other state. We talk about complete markets whenever we can price such an asset in every state of the world.

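The household budget constraint reads:

$$\sum_{t, s^t} p(s^t) \left[ c(s^t) + K(s^t) - w(s^t) \left( 1 - \tau^l(s^t) \right) L(s^t) - R(s^t)\, K(s^{t-1}) \right] \leq B_0 \tag{56}$$

where $R(s^t) = 1 + \left( 1 - \tau^k(s^t) \right) \left( r(s^t) - \delta \right)$ is the gross interest rate net of taxes. We can set up the Lagrangian for the consumer problem:

$$\mathcal{L} = \sum_{t, s^t} \beta^t \Pr(s^t)\, u\left( c(s^t), L(s^t) \right) + \lambda \left[ B_0 - \sum_{t, s^t} p(s^t) \left[ c(s^t) + K(s^t) - w(s^t) \left( 1 - \tau^l(s^t) \right) L(s^t) - R(s^t)\, K(s^{t-1}) \right] \right]$$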

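The first order conditions are:

$$\beta^t \Pr(s^t)\, u_c\left( c(s^t), L(s^t) \right) - \lambda\, p(s^t) = 0 \tag{57}$$

$$\beta^t \Pr(s^t)\, u_L\left( c(s^t), L(s^t) \right) + \lambda\, p(s^t) \left( 1 - \tau^l(s^t) \right) w(s^t) = 0 \tag{58}$$

$$-p(s^t) + \sum_{s^{t+1}} p(s^{t+1})\, R(s^{t+1}) = 0 \tag{59}$$

On top of the FOCs, a no-arbitrage condition must hold between capital and Arrow-Debreu securities:

$$p(s^t) = \sum_{s^{t+1}} p(s^{t+1})\, R(s^{t+1}) \tag{60}$$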

We can define a competitive equilibrium as follows:

Definition: A competitive equilibrium is a policy $\{g(s^t), \tau^k(s^t), \tau^l(s^t)\}$, an allocation $\{c(s^t), K(s^t), L(s^t)\}$ and prices $\{w(s^t), r(s^t), p(s^t)\}$ such that households maximize utility subject to their budget constraint, firms maximize profits, the government budget constraint holds and markets clear.

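Combining (57) and (58) we get the standard intratemporal condition for labor supply:

$$\frac{\beta^t \Pr(s^t)\, u_c\left( c(s^t), L(s^t) \right)}{p(s^t)} = -\frac{\beta^t \Pr(s^t)\, u_L\left( c(s^t), L(s^t) \right)}{p(s^t) \left( 1 - \tau^l(s^t) \right) w(s^t)}$$

$$\left( 1 - \tau^l(s^t) \right) w(s^t) = -\frac{u_L\left( c(s^t), L(s^t) \right)}{u_c\left( c(s^t), L(s^t) \right)} \tag{61}$$

From (57), with the period-0 price normalized to one, we obtain the condition that, together with (59), pins down the slope of the consumption path of the agent (the so-called Euler equation):

$$u_c(c_0, L_0) = \frac{\beta^t \Pr(s^t)\, u_c\left( c(s^t), L(s^t) \right)}{p(s^t)} \tag{62}$$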
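Starting from (56), we can rewrite it using the optimality conditions and the no-arbitrage condition:

$$\sum_{t, s^t} p(s^t) \left[ c(s^t) - w(s^t) \left( 1 - \tau^l(s^t) \right) L(s^t) \right] \leq B_0 + R_0 K_0$$

$$\sum_{t, s^t} \frac{\beta^t \Pr(s^t)\, u_c\left( c(s^t), L(s^t) \right)}{u_c(c_0, L_0)} \left[ c(s^t) + \frac{u_L\left( c(s^t), L(s^t) \right)}{u_c\left( c(s^t), L(s^t) \right)}\, L(s^t) \right] \leq B_0 + R_0 K_0$$

$$\sum_{t, s^t} \beta^t \Pr(s^t)\, u_c\left( c(s^t), L(s^t) \right) \left[ c(s^t) + \frac{u_L\left( c(s^t), L(s^t) \right)}{u_c\left( c(s^t), L(s^t) \right)}\, L(s^t) \right] \leq u_c(c_0, L_0) \left[ B_0 + R_0 K_0 \right]$$

$$\sum_{t, s^t} \beta^t \Pr(s^t) \left[ u_c\left( c(s^t), L(s^t) \right) c(s^t) + u_L\left( c(s^t), L(s^t) \right) L(s^t) \right] \leq u_c(c_0, L_0) \left[ B_0 + R_0 K_0 \right] \tag{63}$$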

We call the constraint in (63) implementability constraint since it captures the agent’s optimal
choices subject to their feasibility.

Optimal Taxes  The government chooses taxes to maximize the welfare of the representative agent subject to the resource constraint and the implementability constraint. The problem reads:

$$\max_{c(s^t),\, L(s^t),\, K(s^t),\, \tau_0^k}\; \sum_{t, s^t} \beta^t \Pr(s^t)\, u\left( c(s^t), L(s^t) \right)$$

s.t.

$$c(s^t) + g(s^t) + K(s^t) - (1 - \delta)\, K(s^{t-1}) \leq F\left( K(s^{t-1}), L(s^t), s^t, t \right)$$

$$\sum_{t, s^t} \beta^t \Pr(s^t) \left[ u_c\left( c(s^t), L(s^t) \right) c(s^t) + u_L\left( c(s^t), L(s^t) \right) L(s^t) \right] \leq u_c(c_0, L_0) \left[ B_0 + R_0 K_0 \right]$$

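We assume $\tau_0^k$ is fixed, denote by $\mu$ the multiplier on the implementability constraint, and define:

$$W\left( c(s^t), L(s^t) \right) = u\left( c(s^t), L(s^t) \right) + \mu \left[ u_c\left( c(s^t), L(s^t) \right) c(s^t) + u_L\left( c(s^t), L(s^t) \right) L(s^t) \right]$$

The government problem becomes:

$$\max_{c(s^t),\, L(s^t),\, K(s^t)}\; \sum_{t, s^t} \beta^t \Pr(s^t)\, W\left( c(s^t), L(s^t) \right) - \mu\, u_c(c_0, L_0) \left[ B_0 + R_0 K_0 \right]$$

s.t.

$$c(s^t) + g(s^t) + K(s^t) - (1 - \delta)\, K(s^{t-1}) \leq F\left( K(s^{t-1}), L(s^t), s^t, t \right)$$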

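For any period $t \neq 0$, denoting by $\lambda(s^t)$ the multiplier on the resource constraint, the FOCs are:

$$\beta^t \Pr(s^t)\, W_c\left( c(s^t), L(s^t) \right) - \lambda(s^t) = 0$$

$$\beta^t \Pr(s^t)\, W_L\left( c(s^t), L(s^t) \right) + \lambda(s^t)\, F_L\left( K(s^{t-1}), L(s^t), s^t, t \right) = 0$$

$$-\lambda(s^t) + \sum_{s^{t+1}} \lambda(s^{t+1}) \left[ F_K\left( K(s^t), L(s^{t+1}), s^{t+1}, t+1 \right) + (1 - \delta) \right] = 0$$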

Combining the FOCs we get an intratemporal condition:

$$-\frac{W_L\left( c(s^t), L(s^t) \right)}{W_c\left( c(s^t), L(s^t) \right)} = F_L\left( K(s^{t-1}), L(s^t), s^t, t \right)$$

Using the household's intratemporal condition (61), we can rewrite the optimality condition as a function of the tax:

$$\tau^{l*}(s^t) = 1 - \frac{u_L\left( c(s^t), L(s^t) \right)}{W_L\left( c(s^t), L(s^t) \right)}\, \frac{W_c\left( c(s^t), L(s^t) \right)}{u_c\left( c(s^t), L(s^t) \right)} \tag{64}$$
From the government FOCs we can also derive an intertemporal condition:

$$W_c\left( c(s^t), L(s^t) \right) = \beta \sum_{s^{t+1}} \Pr(s^{t+1} | s^t)\, W_c\left( c(s^{t+1}), L(s^{t+1}) \right) R^*(s^{t+1})$$

where $R^*(s^t) = 1 + F_K\left( K(s^{t-1}), L(s^t), s^t, t \right) - \delta$ is the untaxed gross return on capital net of depreciation. Again, exploiting the household's Euler equation we can derive:

$$R(s^{t+1}) = R^*(s^{t+1})\, \frac{W_c\left( c(s^{t+1}), L(s^{t+1}) \right)}{u_c\left( c(s^{t+1}), L(s^{t+1}) \right)}\, \frac{u_c\left( c(s^t), L(s^t) \right)}{W_c\left( c(s^t), L(s^t) \right)} \tag{65}$$

Proposition 1: Suppose that (i) there is no uncertainty and (ii) there is a steady state. Then in the steady state $\tau^k = 0$ is optimal.

It is easy to see from (65) that in a steady state with no uncertainty $R(ss) = R^*(ss)$, which implies:

$$1 + \left( 1 - \tau^k(ss) \right) \left( F_K(K(ss), L(ss)) - \delta \right) = 1 + F_K(K(ss), L(ss)) - \delta$$

and hence $\tau^k(ss) = 0$.

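Now consider a special case with separable preferences and constant intertemporal elasticity of substitution:

$$u(c, L) = \frac{c^{1-\sigma}}{1-\sigma} - v(L)$$

Then we have:

$$W(c, L) = \frac{c^{1-\sigma}}{1-\sigma} - v(L) + \mu \left[ c^{-\sigma} c - v'(L)\, L \right] = \left( \frac{1}{1-\sigma} + \mu \right) c^{1-\sigma} - \left[ v(L) + \mu\, v'(L)\, L \right]$$

Therefore:

$$W_c = \left( 1 + \mu (1 - \sigma) \right) c^{-\sigma} = \left( 1 + \mu (1 - \sigma) \right) u_c$$

$$\frac{W_c}{u_c} = 1 + \mu (1 - \sigma)$$

Since this ratio is constant, equation (65) reduces to $R(s^{t+1}) = R^*(s^{t+1})$. Hence, we have established that for separable preferences with constant intertemporal elasticity of substitution we have zero capital taxation even out of the steady state and in a model with uncertainty.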

Tax Smoothing  Take now the special case where $v(L) = \alpha L^{\gamma}/\gamma$ is isoelastic. We have:

$$W(c, L) = \left( \frac{1}{1-\sigma} + \mu \right) c^{1-\sigma} - \alpha \left( \frac{1}{\gamma} + \mu \right) L^{\gamma}$$

It follows that

$$\frac{W_L}{u_L} = 1 + \mu \gamma$$

and the optimal linear labor tax becomes:

$$\tau^{l*}(s^t) = 1 - \frac{1 + \mu (1 - \sigma)}{1 + \mu \gamma}$$

Therefore, labor taxes are constant across states and over time. The government finds it optimal to smooth distortions to labor supply. This result depends on the possibility of setting state-contingent capital taxes. If the labor elasticity is constant and shocks can be offset using capital taxes, there is no residual reason to tax labor differentially.
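
The constancy of the labor tax can be illustrated numerically. The sketch below is an illustration under assumed parameter values, with hypothetical function names: it evaluates the closed-form tax and compares it with formula (64) computed from finite-difference derivatives of $W$ at two different allocations. The multiplier $\mu$ is taken as given rather than solved for.

```python
def u(c, L, sigma, alpha, gamma):
    """Period utility c^(1-sigma)/(1-sigma) - alpha * L^gamma / gamma (requires sigma != 1)."""
    return c ** (1.0 - sigma) / (1.0 - sigma) - alpha * L ** gamma / gamma


def W(c, L, mu, sigma, alpha, gamma):
    """Pseudo-utility W = u + mu * (u_c * c + u_L * L)."""
    u_c = c ** (-sigma)
    u_L = -alpha * L ** (gamma - 1.0)
    return u(c, L, sigma, alpha, gamma) + mu * (u_c * c + u_L * L)


def optimal_labor_tax(mu, sigma, gamma):
    """Closed form: 1 - (1 + mu * (1 - sigma)) / (1 + mu * gamma)."""
    return 1.0 - (1.0 + mu * (1.0 - sigma)) / (1.0 + mu * gamma)


def labor_tax_from_W(c, L, mu, sigma, alpha, gamma, h=1e-6):
    """Formula (64), 1 - (u_L / W_L) * (W_c / u_c), with central differences for W_c and W_L."""
    W_c = (W(c + h, L, mu, sigma, alpha, gamma) - W(c - h, L, mu, sigma, alpha, gamma)) / (2.0 * h)
    W_L = (W(c, L + h, mu, sigma, alpha, gamma) - W(c, L - h, mu, sigma, alpha, gamma)) / (2.0 * h)
    u_c = c ** (-sigma)
    u_L = -alpha * L ** (gamma - 1.0)
    return 1.0 - (u_L / W_L) * (W_c / u_c)


if __name__ == "__main__":
    # Assumed values: mu is treated as a given number, not solved from the planner's problem
    mu, sigma, alpha, gamma = 0.5, 2.0, 1.5, 2.0
    print("closed form:", optimal_labor_tax(mu, sigma, gamma))
    for c, L in [(1.3, 0.7), (2.0, 0.4)]:
        print("allocation", (c, L), "->", labor_tax_from_W(c, L, mu, sigma, alpha, gamma))
```

The tax computed from (64) is the same at both allocations, which is the tax-smoothing result: with isoelastic preferences the wedge depends only on $\mu$, $\sigma$ and $\gamma$.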

8.3 Infinite Horizon - Judd (1985)


We now introduce the model by Judd (1985), where the famous “zero steady state capital tax” result arises. We then show that the result is not general and depends on the agents' preferences, as shown in Straub and Werning (2015). Suppose there are two types of agents: capitalists and workers. The former save, earn the return to capital and do not work; the latter supply one unit of labor inelastically and consume everything they earn. The government taxes the return to capital and pays transfers to workers. Output is produced according to a constant-returns-to-scale technology with production function $F(k_t, n_t)$. Aggregate labor is $n_t = 1$, so that we can write $f(k_t) = F(k_t, 1)$. Capitalists have utility $U(C_t)$ and workers' utility is $u(c_t)$. The resource constraint of the economy is:

$$c_t + C_t + g + k_{t+1} \leq f(k_t) + (1 - \delta)\, k_t$$

Under the assumption of perfectly competitive labor markets, wages are:

$$w_t = F_n(k_t, n_t) = f(k_t) - k_t f'(k_t)$$

The after-tax return to capital is:

$$R_t = 1 + (1 - \tau_t)(R_t^* - 1)$$

where $R_t^* = f'(k_t) + 1 - \delta$.
Capitalists solve the following maximization problem:

$$\max_{C_t,\, a_{t+1}}\; \sum_{t=0}^{\infty} \beta^t U(C_t)$$

s.t.

$$C_t + a_{t+1} = R_t a_t$$

The optimality condition delivers the standard Euler equation $U'(C_t) = \beta R_{t+1} U'(C_{t+1})$. Since total wealth must equal the total capital stock in equilibrium, using the Euler equation:

$$C_t + k_{t+1} = R_t k_t = \frac{U'(C_{t-1})}{\beta U'(C_t)}\, k_t$$

rearranging:

$$\beta U'(C_t) (C_t + k_{t+1}) = U'(C_{t-1})\, k_t \tag{66}$$


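Equation (66) is the implementability constraint. The government maximizes the following objective function, where $\gamma$ is the Pareto weight on capitalists:

$$\max_{c_t,\, C_t,\, k_{t+1}}\; \sum_{t=0}^{\infty} \beta^t \left( u(c_t) + \gamma U(C_t) \right)$$

The Lagrangian of the problem is:

$$\mathcal{L} = \sum_{t=0}^{\infty} \beta^t \left( u(c_t) + \gamma U(C_t) \right) + \sum_{t=0}^{\infty} \beta^t \lambda_t \left( f(k_t) + (1 - \delta)\, k_t - c_t - C_t - g - k_{t+1} \right) + \sum_{t=0}^{\infty} \beta^t \mu_t \left( \beta U'(C_t)(C_t + k_{t+1}) - U'(C_{t-1})\, k_t \right)$$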

with $\mu_0 = 0$ since there is no implementability constraint in the first period ($\tau_0$ is taken as given). The first order conditions with respect to $c_t$, $k_{t+1}$ and $C_t$ are respectively:

$$u'(c_t) = \lambda_t \tag{67}$$

$$\frac{\lambda_{t+1}}{\lambda_t} \left( f'(k_{t+1}) + 1 - \delta \right) = \frac{1}{\beta} + \frac{U'(C_t)}{\lambda_t} (\mu_{t+1} - \mu_t) \tag{68}$$

$$\mu_{t+1} = \frac{\mu_t}{k_{t+1}} \left( C_t + k_{t+1} + \frac{U'(C_t)}{U''(C_t)} \right) + \frac{1}{\beta k_{t+1}} \left( \gamma \frac{U'(C_t)}{U''(C_t)} - \frac{\lambda_t}{U''(C_t)} \right) \tag{69}$$

It is straightforward from equation (68) that whenever a steady state exists it involves zero capital taxes: with constant multipliers, (68) gives $R^* = f'(k) + 1 - \delta = 1/\beta$, while the capitalists' Euler equation gives $R(ss) = 1/\beta$, so that $R(ss) = R^*$. This result is extremely powerful since it is independent of the welfare weight attached to capitalists. However, the result does not hold for the case where $\sigma = 1$. Rewrite the FOCs (68) and (69) using the inverse intertemporal elasticity of substitution $\sigma_t = -U''(C_t)\, C_t / U'(C_t)$ and defining the ratio $v_t = U'(C_t)/u'(c_t)$:

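$$(68) \Rightarrow \frac{u'(c_{t+1})}{u'(c_t)} \left( f'(k_{t+1}) + 1 - \delta \right) = \frac{1}{\beta} + v_t (\mu_{t+1} - \mu_t)$$

$$(69) \Rightarrow \mu_{t+1} = \mu_t \left( \frac{1 - 1/\sigma_t}{\kappa_{t+1}} + 1 \right) + \frac{1}{\beta \kappa_{t+1} \sigma_t v_t} (1 - \gamma v_t)$$

where $\kappa_{t+1} = k_{t+1}/C_t$. Take the case where $\sigma = 1$ (log preferences) and the allocation converges to a steady state. Then:

$$\mu_{t+1} - \mu_t = \frac{R^* - \frac{1}{\beta}}{v}$$

$$\mu_{t+1} - \mu_t = \frac{1 - \gamma v}{\beta \kappa v}$$

$$\Rightarrow R^* - \frac{1}{\beta} = \frac{1 - \gamma v}{\beta \kappa}$$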
As long as there is a low enough weight on capitalists, capital is taxed in steady state. For a long time this was thought to be simply an anomaly of the logarithmic case. However, Straub and Werning (2015) show that the zero-tax result does not hold for any $\sigma > 1$, by noticing that the steady state does not necessarily exist.

Proposition 2: If $\sigma > 1$ and $\gamma = 0$, then for any initial $k_0$ the solution to the planning problem does not converge to the zero-tax steady state, or any other interior steady state.

Suppose capital taxes are raised in the future. Capitalists decrease savings today because of the substitution effect. A capital tax increase also reduces the capitalists' wealth and lowers their consumption through the income effect. When $\sigma > 1$ the income effect prevails and capitalists save more. The increase in the capital stock increases wages and is beneficial for workers. For this reason the government wants to set positive capital taxes in the long term. The opposite is true when $\sigma < 1$: the substitution effect is larger than the income effect, and zero taxes in the future increase savings in the short term, increasing wages and workers' consumption.

References
Chamley, Christophe, “Optimal Taxation of Capital Income in General Equilibrium with Infinite Lives,”
Econometrica, 1986, 54 (3), pp. 607–622.
Judd, Kenneth L., “Redistributive taxation in a simple perfect foresight model,” Journal of Public
Economics, 1985, 28 (1), 59 – 83.
Kaplow, Louis, 2006. "On the undesirability of commodity taxation even when income taxation is not
optimal," Journal of Public Economics, Elsevier, vol. 90(6-7), pages 1235-1250, August.
Scheuer, Florian (2014), Lecture Notes
Straub, Ludwig and Ivan Werning, “Positive Long Run Capital Taxation: Chamley-Judd Revisited,”
Working Paper, MIT 2015.

9 Section 10: Education Policies and Simpler Theory of Capital Taxation
In this section we study education policies in a simplified version of the framework analyzed by Stantcheva (2016). We then review a simpler theory of capital taxation proposed by Saez and Stantcheva (2016) in a continuous time model.

9.1 Education Policies


We study a static model with human capital investments based on Bovenberg and Jacobs (2005, 2011) and Stantcheva (2016). Suppose individuals are heterogeneous in ability $\theta$, distributed according to $f(\theta)$. Agents can invest in education paying a monetary cost $M(e)$ such that $M'(e) > 0$ and $M''(e) \geq 0$. Wages are a function of ability and human capital that we can write as $w(\theta, e)$. Denote with $\rho_{\theta e}$ the Hicksian coefficient of complementarity between ability and education, defined as:

$$\rho_{\theta e} = \frac{w_{\theta e}\, w}{w_{\theta}\, w_e}$$

Consider two special cases for the wage function: (i) multiplicative, $w(\theta, e) = \theta e$, implies $\rho_{\theta e} = 1$; (ii) constant elasticity of substitution, $w(\theta, e) = \left[ \alpha \theta^{1-\rho} + (1 - \alpha)\, e^{1-\rho} \right]^{\frac{1}{1-\rho}}$, implies $\rho_{\theta e} = \rho$. The elasticity of the wage to ability is $\varepsilon_{w,\theta} = \partial \log(w) / \partial \log(\theta)$.
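
The two special cases can be checked numerically. The sketch below approximates the derivatives in the definition of $\rho_{\theta e}$ with central finite differences; the function names, the step size and the parameter values ($\alpha = 0.4$, $\rho = 0.5$) are assumptions made only for this illustration.

```python
def hicksian_complementarity(w, theta, e, h=1e-4):
    """Approximate rho = w_theta_e * w / (w_theta * w_e) with central finite differences."""
    w0 = w(theta, e)
    w_theta = (w(theta + h, e) - w(theta - h, e)) / (2.0 * h)
    w_e = (w(theta, e + h) - w(theta, e - h)) / (2.0 * h)
    w_theta_e = (w(theta + h, e + h) - w(theta + h, e - h)
                 - w(theta - h, e + h) + w(theta - h, e - h)) / (4.0 * h * h)
    return w_theta_e * w0 / (w_theta * w_e)


def multiplicative_wage(theta, e):
    """Case (i): w = theta * e."""
    return theta * e


def ces_wage(theta, e, alpha=0.4, rho=0.5):
    """Case (ii): CES wage function (requires rho != 1)."""
    return (alpha * theta ** (1.0 - rho) + (1.0 - alpha) * e ** (1.0 - rho)) ** (1.0 / (1.0 - rho))


if __name__ == "__main__":
    print("multiplicative:", hicksian_complementarity(multiplicative_wage, 2.0, 3.0))
    print("CES, rho = 0.5:", hicksian_complementarity(ces_wage, 2.0, 3.0))
```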
Suppose utility is quasi-linear in consumption such that:

$$u(c, l) = c - h(l)$$

The agent consumes everything that is left after taxes and education investments, such that $c(\theta) = w(\theta, e(\theta))\, l(\theta) - M(e(\theta)) - T\left( w(\theta, e(\theta))\, l(\theta), e(\theta) \right)$. Solving the individual maximization problem we can define income and education wedges as:

$$\tau_y(\theta) = 1 - \frac{h'(l(\theta))}{w(\theta, e(\theta))} \tag{70}$$

$$\tau_e(\theta) = w_e(\theta, e(\theta))\, l(\theta) \left( 1 - \tau_y(\theta) \right) - M'(e(\theta)) \tag{71}$$

The elasticity of labor to the net-of-tax wage is:

$$\varepsilon = \frac{h'(l)}{h''(l)\, l}$$

We can write the indirect utility as $u(\theta) = c(\theta) - h(l(\theta))$. Using the envelope theorem we can derive the local incentive constraint:

$$\frac{\partial u(\theta)}{\partial \theta} = \frac{w_{\theta}(\theta, e(\theta))}{w(\theta, e(\theta))}\, l(\theta)\, h'(l(\theta)) \tag{72}$$

It differs from the one derived in the standard problem by the term $w_{\theta}(\theta, e(\theta))$, which takes into account the effect of ability on the wage. In the standard problem we normalized it to 1.
The resource constraint is:

$$\int \left( w(\theta, e(\theta))\, l(\theta) - u(\theta) - h(l(\theta)) - M(e(\theta)) \right) f(\theta)\, d\theta \geq E \tag{73}$$

The government assigns welfare weight $\psi(\theta)$ to each $\theta$ and solves:

$$\max_{c(\theta),\, l(\theta),\, e(\theta),\, u(\theta)}\; \int \psi(\theta)\, u(\theta)\, f(\theta)\, d\theta$$

subject to (72) and (73).
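The first order conditions with respect to $u(\theta)$, $l(\theta)$ and $e(\theta)$ are respectively:

$$\psi(\theta)\, f(\theta) - f(\theta) = \mu'(\theta) \tag{74}$$

$$-\mu(\theta) \left[ \frac{w_{\theta}}{w}\, h'(l) + l\, \frac{w_{\theta}}{w}\, h''(l) \right] + \left[ w - h'(l) \right] f(\theta) = 0 \tag{75}$$

$$-\mu(\theta) \left[ -\frac{l}{w^2}\, w_e w_{\theta}\, h'(l) + \frac{l}{w}\, w_{\theta e}\, h'(l) \right] + \left[ w_e l - M'(e) \right] f(\theta) = 0 \tag{76}$$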
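Taking the integral over (74) we derive $\mu(\theta) = \Psi(\theta) - F(\theta)$, where $\Psi(\theta)$ is the cumulative of the welfare weights $\psi(\theta) f(\theta)$ up to $\theta$. Using (75), (70) and the definition of the elasticity of labor supply we get:

$$\mu(\theta)\, \frac{w_{\theta}}{w}\, h'(l) \left[ 1 + l\, \frac{h''(l)}{h'(l)} \right] = w\, \tau_y(\theta)\, f(\theta)$$

$$\mu(\theta)\, w_{\theta} \left( 1 - \tau_y(\theta) \right) \left[ 1 + l\, \frac{h''(l)}{h'(l)} \right] = w\, \tau_y(\theta)\, f(\theta)$$

$$\frac{\tau_y(\theta)}{1 - \tau_y(\theta)} = \frac{\Psi(\theta) - F(\theta)}{f(\theta)}\, \frac{w_{\theta}}{w} \left( \frac{1 + \varepsilon}{\varepsilon} \right) = \frac{\Psi(\theta) - F(\theta)}{f(\theta)}\, \frac{\varepsilon_{w,\theta}}{\theta} \left( \frac{1 + \varepsilon}{\varepsilon} \right) \tag{77}$$

The optimal income wedge is similar to the one we studied in Section 3. The formula has an extra term proportional to the elasticity of the wage to ability. Labor distortions are higher at the optimum when income is highly elastic to ability: the government distorts labor more when income is mostly explained by ability and less by effort or investment in education.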
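Notice that:

$$w_e l - M'(e) = \frac{\tau_e + M'(e)\, \tau_y}{1 - \tau_y}$$

Rearranging (76) and using (70) and (71) we get:

$$\mu(\theta)\, l\, \frac{w_e w_{\theta}}{w^2}\, h'(l) \left[ 1 - \frac{w_{\theta e}\, w}{w_{\theta}\, w_e} \right] + \frac{\tau_e + M'(e)\, \tau_y}{1 - \tau_y}\, f(\theta) = 0$$

$$\frac{\tau_e(\theta) + M'(e)\, \tau_y(\theta)}{\left( 1 - \tau_y(\theta) \right)^2} = -\frac{\Psi(\theta) - F(\theta)}{f(\theta)}\, \frac{w_{\theta}\, w_e}{w}\, l \left( 1 - \rho_{\theta e} \right) \tag{78}$$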
By (78), the optimal education subsidy (a negative wedge $\tau_e$ under definition (71)) decreases in the Hicksian coefficient of complementarity. When education and ability are complements (i.e. $\rho_{\theta e} > 0$), the government wants to discourage human capital investments in order to redistribute income more. On the other hand, if the coefficient is negative or low, education benefits low-ability individuals more and a government subsidy to education helps redistribution. Suppose the wage function is $w(\theta, e) = \theta e$ and the monetary cost is linear in $e$ with $M'(e) = 1$: then $\rho_{\theta e} = 1$ and the optimal wedge is $\tau_e(\theta) = -\tau_y(\theta)$, that is, a subsidy at rate $\tau_y(\theta)$. This is the special case studied by Bovenberg and Jacobs (2005), whose result is that income and education taxes are “Siamese Twins” and both margins should be distorted in the same way. They also prove that the optimal linear education subsidy is equal to the optimal linear income tax rate, which is equivalent to making human capital expenses fully tax deductible.

9.2 A Simpler Theory of Capital Taxation
We introduce in this section a continuous time model with wealth in the utility function. We study the case where utility is quasi-linear in consumption, which allows us to transform the problem into a static taxation problem.
Suppose individual $i$ has utility $u_i(c, k, z) = c + a_i(k) - h_i(z)$, where $a_i(\cdot)$ is increasing and concave and $h_i(\cdot)$ is the standard disutility from labor. Agents have heterogeneous discount rates $\delta_i$. The discounted utility is:

$$V_i\left( \{ c_i(t), k_i(t), z_i(t) \} \right) = \delta_i \int_0^{\infty} \left[ c_i(t) + a_i(k_i(t)) - h_i(z_i(t)) \right] e^{-\delta_i t}\, dt \tag{79}$$

Capital accumulates according to:

$$\frac{dk_i(t)}{dt} = r k_i(t) + z_i(t) - T\left( z_i(t), r k_i(t) \right) - c_i(t) \tag{80}$$

where $T(z_i(t), r k_i(t))$ is the tax paid by individual $i$, which depends on labor income and capital returns. Wealth accumulation depends on the heterogeneous individual preferences, as embodied in the taste for wealth $a_i(\cdot)$ and in the impatience $\delta_i$. It also depends on the net-of-tax return $\bar{r} = r (1 - T_k)$: capital taxes discourage wealth accumulation through a substitution effect (there are no income effects).
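The Hamiltonian for the individual maximization problem is:

$$H_i\left( c_i(t), k_i(t), z_i(t), \lambda_i(t) \right) = \left[ c_i(t) + a_i(k_i(t)) - h_i(z_i(t)) \right] e^{-\delta_i t} + \lambda_i(t) \left[ r k_i(t) + z_i(t) - T\left( z_i(t), r k_i(t) \right) - c_i(t) \right]$$

Taking first order conditions we have:

$$\frac{\partial H_i}{\partial c_i} = e^{-\delta_i t} - \lambda_i(t) = 0$$

$$\frac{\partial H_i}{\partial z_i} = -h_i'(z_i(t))\, e^{-\delta_i t} + \lambda_i(t) \left[ 1 - T_z\left( z_i(t), r k_i(t) \right) \right] = 0$$

$$\frac{\partial H_i}{\partial k_i(t)} = a_i'(k_i(t))\, e^{-\delta_i t} + \lambda_i(t)\, r \left( 1 - T_k\left( z_i(t), r k_i(t) \right) \right) = -\lambda_i'(t)$$

Rearranging:

$$\lambda_i(t) = e^{-\delta_i t}, \quad h_i'(z_i(t)) = 1 - T_z\left( z_i(t), r k_i(t) \right), \quad a_i'(k_i(t)) = \delta_i - r \left( 1 - T_k\left( z_i(t), r k_i(t) \right) \right)$$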

Since utility is quasi-linear in consumption, the model converges immediately to a steady state. Denoting by $(c_i, z_i, k_i)$ the steady state allocation, the problem collapses to a static optimization of the following objective function:

$$V_i\left( \{ c_i(t), k_i(t), z_i(t) \} \right) = \left[ c_i + a_i(k_i) - h_i(z_i) \right] + \delta_i \left( k_i^{init} - k_i \right)$$

where $k_i^{init}$ is the inherited level of capital and $\delta_i (k_i^{init} - k_i)$ is the utility cost of going from $k_i^{init}$ to the steady-state level.
The government maximizes the following:

$$SWF = \int_i \omega_i\, U_i(c_i, k_i, z_i)\, di$$

where $\omega_i \geq 0$ is the Pareto weight on individual $i$. The social marginal welfare weight is $g_i = \omega_i U_{ic}$.

Optimal Linear Taxes  Suppose the government sets linear labor income and capital taxes $\tau_L$ and $\tau_K$. The individual chooses labor and capital according to $a_i'(k_i) = \delta_i - \bar{r}$ and $h_i'(z_i) = 1 - \tau_L$, with $\bar{r} = r (1 - \tau_K)$. The government balances the budget through lump-sum transfers for a total of $G = \tau_K r k^m(\bar{r}) + \tau_L \cdot z^m(1 - \tau_L)$, where $z^m(1 - \tau_L) = \int_i z_i\, di$ is aggregate labor income and $k^m(\bar{r}) = \int_i k_i\, di$ is aggregate capital. Consumption of individual $i$ is $c_i = (1 - \tau_K)\, r k_i + (1 - \tau_L)\, z_i + \tau_K r k^m(\bar{r}) + \tau_L \cdot z^m(1 - \tau_L)$ and the government maximizes:

$$SWF = \int_i \omega_i \left[ (1 - \tau_K)\, r k_i + (1 - \tau_L)\, z_i + \tau_K r k^m + \tau_L \cdot z^m + a_i(k_i) - h_i(z_i) + \delta_i \cdot \left( k_i^{init} - k_i \right) \right] di$$

Using the envelope theorem we get:

$$\frac{dSWF}{d\tau_K} = \int_i \omega_i \left[ -r k_i + r k^m + \tau_K r\, \frac{\partial k^m}{\partial \tau_K} \right] di = r k^m \left[ \int_i \omega_i \left( 1 - \frac{k_i}{k^m} \right) di - \frac{\tau_K}{1 - \tau_K}\, e_K \right]$$

where $e_K$ is the elasticity of aggregate capital with respect to the net-of-tax return $\bar{r}$. At the optimum $dSWF/d\tau_K = 0$ and the optimal linear tax is:

$$\tau_K = \frac{1 - \bar{g}_K}{1 - \bar{g}_K + e_K}$$

where $\bar{g}_K = \int_i g_i k_i\, di / \int_i k_i\, di$. This is the standard formula for optimal linear taxes that we studied in Section 2, applied to capital. Notice that whenever capital accumulation is uncorrelated with social marginal welfare weights (i.e. $\bar{g}_K = 1$) the optimal tax is zero. The reason is that if capital has no tag value, the government does not find it optimal to tax capital for redistributive purposes. We also know from previous sections that the revenue maximizing tax rate corresponds to the case of $\bar{g}_K = 0$ and is $\tau_K = 1/(1 + e_K)$.
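
A minimal numerical sketch of the linear formula follows. The three-person economy, the welfare weights and the elasticity are made-up values, and the function names are hypothetical; the weights are chosen to average to one across individuals.

```python
def average_capital_weight(g, k):
    """Capital-weighted average welfare weight: sum(g_i * k_i) / sum(k_i)."""
    return sum(gi * ki for gi, ki in zip(g, k)) / sum(k)


def optimal_linear_capital_tax(g_bar_K: float, e_K: float) -> float:
    """tau_K = (1 - g_bar_K) / (1 - g_bar_K + e_K)."""
    return (1.0 - g_bar_K) / (1.0 - g_bar_K + e_K)


if __name__ == "__main__":
    # Made-up economy: weights fall with wealth, so capital has tag value
    g = [1.5, 1.0, 0.5]
    k = [0.0, 1.0, 3.0]
    g_bar = average_capital_weight(g, k)
    print("g_bar_K:", g_bar)
    print("optimal tau_K:", optimal_linear_capital_tax(g_bar, e_K=0.5))
    print("revenue-maximizing tau_K:", optimal_linear_capital_tax(0.0, e_K=0.5))
    print("no tag value (g_bar_K = 1):", optimal_linear_capital_tax(1.0, e_K=0.5))
```

The optimal rate lies between zero (no tag value) and the revenue-maximizing rate $1/(1 + e_K)$, and it rises as wealth becomes more concentrated among individuals with low welfare weights.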

Optimal Non-Linear Separable Taxes  Suppose the government optimally sets $T_K(rk)$ and $T_L(z)$. The individual budget constraint is:

$$c_i = r k_i - T_K(r k_i) + z_i - T_L(z_i)$$

Define with $\bar{G}_K(rk)$ the average relative welfare weight on individuals with capital income higher than $rk$. We have:

$$\bar{G}_K(rk) = \frac{\int_{\{i:\, r k_i \geq rk\}} g_i\, di}{P(r k_i \geq rk)}$$

Let $h_K(rk)$ be the density of capital income and $H_K(rk)$ its cumulative distribution, so that the Pareto parameter associated with the capital income distribution is:

$$\alpha_K(rk) = \frac{rk \cdot h_K(rk)}{1 - H_K(rk)}$$

Denote by $e_K(rk)$ the elasticity of capital income with respect to the net-of-tax return $r (1 - T_K'(rk))$.
Suppose the government introduces a small reform $\delta T_K(rk)$ where the marginal tax rate is increased by $\delta \tau_K$ in a small interval of capital income from $rk$ to $rk + d(rk)$. The mechanical effect associated with the reform is:

$$d(rk)\, \delta \tau_K \left[ 1 - H_K(rk) \right]$$

The welfare effect simply weights the mechanical effect by $\bar{G}_K(rk)$, the social marginal welfare weight associated with capital incomes above $rk$. Individuals who face the increase in the marginal tax rate change their capital incomes by $\delta(rk) = -e_K\, rk\, \delta \tau_K / (1 - T_K'(rk))$. There are $h_K(rk)\, d(rk)$ individuals in the window affected by the tax change. Therefore, the total behavioral effect on revenue is:

$$-h_K(rk)\, d(rk)\, \delta \tau_K \cdot rk\, \frac{T_K'(rk)}{1 - T_K'(rk)}\, e_K(rk)$$

Summing up the three effects, setting the total to zero and rearranging, we find:

$$\frac{T_K'(rk)}{1 - T_K'(rk)} = \frac{1}{e_K(rk)} \cdot \frac{1 - H_K(rk)}{rk \cdot h_K(rk)} \cdot \left( 1 - \bar{G}_K(rk) \right)$$

Using the definition of the Pareto parameter we derive:

$$T_K'(rk) = \frac{1 - \bar{G}_K(rk)}{1 - \bar{G}_K(rk) + \alpha_K(rk) \cdot e_K(rk)}$$

which looks like the standard optimal non-linear tax formula.
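
The non-linear formula can be evaluated in the same way. The sketch below assumes, purely for illustration, that capital income follows a Pareto distribution with shape 1.5 and scale 1, so that the local Pareto parameter $\alpha_K(rk)$ is constant and equal to the shape; the welfare weight and the elasticity are made-up numbers and the function names are hypothetical.

```python
def pareto_parameter(rk: float, density: float, cdf: float) -> float:
    """alpha_K(rk) = rk * h_K(rk) / (1 - H_K(rk))."""
    return rk * density / (1.0 - cdf)


def optimal_marginal_capital_tax(G_bar: float, alpha: float, e: float) -> float:
    """T_K'(rk) = (1 - G_bar) / (1 - G_bar + alpha * e)."""
    return (1.0 - G_bar) / (1.0 - G_bar + alpha * e)


if __name__ == "__main__":
    # Assumed Pareto distribution of capital income: shape a, scale m
    a, m, rk = 1.5, 1.0, 4.0
    density = a * m ** a / rk ** (a + 1.0)
    cdf = 1.0 - (m / rk) ** a
    alpha = pareto_parameter(rk, density, cdf)
    print("alpha_K:", alpha)
    print("T_K':", optimal_marginal_capital_tax(G_bar=0.2, alpha=alpha, e=0.5))
```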

References
Bovenberg, L. and B. Jacobs. 2005. “Redistribution and Education Subsidies are Siamese Twins,”
Journal of Public Economics, 89(11-12), 2005-2035
Bovenberg, L. and B. Jacobs. 2011. “Optimal Taxation of Human Capital and the Earnings Function,”
Journal of Public Economic Theory, 13, (6), 957-971.
Saez, Emmanuel, and Stefanie Stantcheva. Working Paper. “A Simpler Theory of Optimal Capital
Taxation.” NBER Working Paper 22664.
Stantcheva, Stefanie. “Optimal Taxation and Human Capital Policies over the Life Cycle,” Journal of
Political Economy, forthcoming.

10 Section 11: Non-Linear Capital Taxation
In this section we first introduce a framework to study non-linear capital taxes and establish the
inverse Euler equation relation at the optimum. We then discuss implications of the inverse Euler
equation, the effect of skill shocks on consumption and the trend in consumption inequality required
by a Pareto-optimal allocation.18

10.1 Non-Linear Capital Taxation: Two-Periods Model


In this paragraph we introduce a two-period model with idiosyncratic uncertainty that unfolds over
time. The agent chooses savings and consumption today without knowing the realization of a stochastic
variable $s$ in period 1, which can be interpreted as the agent's skill. Preferences are represented by
$U(c_0, c_1(s), y(s)/s)$. Notice that the two-period model we saw in Section 9 had a time-0 shock, while
in the current model uncertainty unfolds in period 1. We will show that in the current model, when
preferences are weakly separable, the Atkinson-Stiglitz result does not hold.
Individuals save through a linear storage technology with rate of return $R^* = 1/q$ and, denoting by $p(s)$
the probability of state $s$, the aggregate resource constraint is:

$$c_0 + q \sum_s c_1(s)\, p(s) \leq q \sum_s y(s)\, p(s) \tag{81}$$

The individual maximizes utility:

$$\sum_s U(c_0, c_1(s), y(s)/s)\, p(s)$$

subject to (81).

First-Best. The first-best allocation is characterized by the following two first-order conditions, where
$\lambda$ is the multiplier on (81):

$$E[U_{c_0}(c_0, c_1(s), y(s)/s)] = \lambda$$

$$U_{c_1(s)}(c_0, c_1(s), y(s)/s) = \lambda q$$

which imply:

$$E[U_{c_0}(c_0, c_1(s), y(s)/s)] = R^* U_{c_1(s)}(c_0, c_1(s), y(s)/s) \quad \forall s \tag{82}$$
Since the condition holds for every realization of $s$ and $E[U_{c_0}(c_0, c_1(s), y(s)/s)]$ is constant, it
follows that in the first best the marginal utility of period-1 consumption is equalized across all states.
This implies we have full insurance and consumption is the same in every state of the world. Assuming
separability in consumption and labor, with discount factor $\beta$ on period-1 consumption utility, and
taking expectations on both sides of (82) we obtain the standard Euler equation:

$$u'(c_0) = \beta R^* E[u'(c_1(s))]$$

and $c_1(s) = c_1$ for every $s$.


18 Part of these notes is based on the Lecture notes by Florian Scheuer.

Second-Best. Suppose now that $s$ is private information of the agent. We know that we can solve the
problem by invoking the revelation principle. We assume the agent reports $r = \sigma(s)$, where $\sigma$ is the
reporting strategy and truth-telling implies $\sigma^*(s) = s$. We denote consumption and income under the
reporting strategy by $c_1^\sigma(s) = c_1(\sigma(s))$ and $y^\sigma(s) = y(\sigma(s))$. Incentive compatibility requires:

$$E[U(c_0, c_1(s), y(s)/s)] \geq E[U(c_0, c_1^\sigma(s), y^\sigma(s)/s)] \quad \forall \sigma$$

which is equivalent to:

$$U(c_0, c_1(s), y(s)/s) \geq U(c_0, c_1(r), y(r)/s) \quad \forall r, s \tag{83}$$
An allocation is feasible whenever it satisfies (81) and (83).
Given a feasible allocation, we can ask whether free savings are compatible with it, meaning that extra
savings $\Delta$ in period 0 leave incentive compatibility unchanged. Suppose that preferences are
quasi-linear in consumption such that:

$$U(c_0, c_1(s), y(s)/s) = \hat{U}(c_0, c_1(s) - h(y(s)/s))$$

After saving $\Delta$, the incentive compatibility condition holds if:

$$\hat{U}(c_0 - \Delta, c_1(s) + R^*\Delta - h(y(s)/s)) \geq \hat{U}(c_0 - \Delta, c_1(r) + R^*\Delta - h(y(r)/s))$$

This is true if and only if:

$$c_1(s) + R^*\Delta - h(y(s)/s) \geq c_1(r) + R^*\Delta - h(y(r)/s)$$

and therefore if and only if $c_1(s) - h(y(s)/s) \geq c_1(r) - h(y(r)/s)$, which is implied by the feasibility of the
original allocation. This is always true when there are no income effects. However, if there were income
effects, a change in savings in period 0 would have a negative effect on labor supply in period 1.
We study the case with income effects assuming preferences that are separable in consumption and labor.
Consider the following preferences, where $\beta$ is the discount factor:

$$U(c_0, c_1(s), y(s)/s) = u(c_0) + \beta u(c_1(s)) - h(y(s)/s)$$

Consider the variation:

$$(c_0 - \Delta,\; c_1(s) + \delta(\Delta, s),\; y(s))$$

such that

$$u(c_0 - \Delta) + \beta u(c_1(s) + \delta(\Delta, s)) = u(c_0) + \beta u(c_1(s)) + A(\Delta) \quad \forall s, \Delta \tag{84}$$

for some $A(\Delta)$ such that the variation is resource neutral:

$$-\Delta + q \sum_s \delta(\Delta, s)\, p(s) = 0 \quad \forall \Delta \tag{85}$$

Incentive compatibility after the variation holds if:

$$u(c_0 - \Delta) + \beta u(c_1(s) + \delta(\Delta, s)) - h(y(s)/s) \geq u(c_0 - \Delta) + \beta u(c_1(r) + \delta(\Delta, r)) - h(y(r)/s)$$

Using (84), we have:

$$u(c_0) + \beta u(c_1(s)) + A(\Delta) - h(y(s)/s) \geq u(c_0) + \beta u(c_1(r)) + A(\Delta) - h(y(r)/s)$$

$$u(c_0) + \beta u(c_1(s)) - h(y(s)/s) \geq u(c_0) + \beta u(c_1(r)) - h(y(r)/s)$$

which holds since the original allocation was incentive compatible. It follows that the total utility
from consumption is all that matters for incentive compatibility, as long as it is changed independently
of $s$.

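Inverse Euler Equation. Suppose a feasible allocation solves the second-best problem. Then the variation
that we just studied should not improve the welfare of the agent. We have:

$$\begin{aligned}
0 &= \arg\max_{\Delta} \sum_s p(s) \left[ u(c_0 - \Delta) + \beta u(c_1(s) + \delta(\Delta, s)) - h(y(s)/s) \right] \\
  &= \arg\max_{\Delta} \sum_s p(s) \left[ u(c_0) + \beta u(c_1(s)) + A(\Delta) - h(y(s)/s) \right] \\
  &= \arg\max_{\Delta} A(\Delta)
\end{aligned}$$

where the FOC is $A'(0) = 0$. Differentiating (84) at $\Delta = 0$ we have:

$$-u'(c_0) + \beta u'(c_1(s)) \left. \frac{\partial \delta(\Delta, s)}{\partial \Delta} \right|_{\Delta = 0} = A'(0)$$

which we can rearrange to get:

$$\left. \frac{\partial \delta(\Delta, s)}{\partial \Delta} \right|_{\Delta = 0} = \frac{u'(c_0) + A'(0)}{\beta u'(c_1(s))} = \frac{u'(c_0)}{\beta u'(c_1(s))} \tag{86}$$

Differentiating equation (85), which requires the variation to be neutral in terms of resources, we find:

$$-1 + q \sum_s p(s) \left. \frac{\partial \delta(\Delta, s)}{\partial \Delta} \right|_{\Delta = 0} = 0$$

Combining this equation with (86) we get:

$$\frac{1}{u'(c_0)} = \frac{1}{\beta R^*} \sum_s \frac{p(s)}{u'(c_1(s))} \tag{87}$$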
This is the so-called inverse Euler equation. Notice that we can rewrite the expression above as:

$$\frac{1}{u'(c_0)} = \frac{1}{\beta R^*}\, E\left[ \frac{1}{u'(c_1(s))} \right]$$

Using Jensen's inequality, which holds strictly whenever $c_1(s)$ varies across states, we obtain:

$$u'(c_0) = \left( E\left[ \frac{1}{\beta R^* u'(c_1(s))} \right] \right)^{-1} < E\left[ \beta R^* u'(c_1(s)) \right] = \beta R^* E[u'(c_1(s))]$$

Since $u'(c_0) < \beta R^* E[u'(c_1(s))]$, we have established that in the second best the government distorts
savings downwards, so that there is a positive intertemporal wedge. The reason is that savings change
the incentives to work in period 1 whenever there are income effects. By transferring resources to period
1 the agent becomes less willing to work and more prone to imitate lower-skilled agents. By taxing savings
the government can partially prevent this behavior. Therefore, it is impossible to achieve a Pareto
optimal allocation when agents are allowed to undertake unlimited trading of bonds.
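
The Jensen argument can be checked numerically. The sketch below assumes logarithmic utility, so that 1/u'(c) = c and the inverse Euler equation pins down period-0 consumption as expected period-1 consumption divided by the discount factor times the gross return. The wedge is defined here, as an assumption for illustration, as the number tau such that u'(c0) = (1 - tau) * beta * R * E[u'(c1)]. All parameter values and function names are hypothetical.

```python
def inverse_euler_c0(beta, gross_return, probs, c1):
    """Period-0 consumption implied by the inverse Euler equation under log utility.

    With u(c) = log(c) we have 1/u'(c) = c, so the condition reads
    c0 = E[c1] / (beta * R).
    """
    expected_c1 = sum(p * c for p, c in zip(probs, c1))
    return expected_c1 / (beta * gross_return)


def intertemporal_wedge(beta, gross_return, probs, c0, c1):
    """Return tau solving u'(c0) = (1 - tau) * beta * R * E[u'(c1)] for u'(c) = 1/c."""
    expected_marginal_utility = sum(p / c for p, c in zip(probs, c1))
    return 1.0 - (1.0 / c0) / (beta * gross_return * expected_marginal_utility)


# Hypothetical two-state example: low and high skill realizations.
beta, R = 0.95, 1.05
probs = [0.5, 0.5]
c1 = [0.8, 1.4]

c0 = inverse_euler_c0(beta, R, probs, c1)
wedge = intertemporal_wedge(beta, R, probs, c0, c1)
print(f"c0 = {c0:.4f}, intertemporal wedge = {wedge:.4f}")
```

When period-1 consumption is the same in both states the computed wedge is zero, which matches the full-insurance case in which the standard Euler equation holds.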

Dual Approach. Consider the allocation in terms of utils:

$$u_0 = u(c_0)$$
$$u_1(s) = u(c_1(s))$$

Suppose we change the allocation so that the agent is indifferent and we have:

$$u_0 + \beta u_1(s) = \tilde{u}_0 + \beta \tilde{u}_1(s) \quad \forall s \tag{88}$$

We can write:

$$\tilde{u}_0 = u_0 - \beta\Delta$$

and

$$\tilde{u}_1(s) = u_1(s) + \Delta \quad \forall s$$

Then the new allocation satisfies incentive compatibility only if:

$$u_0 - \beta\Delta + \beta (u_1(s) + \Delta) - h(y(s)/s) \geq u_0 - \beta\Delta + \beta (u_1(r) + \Delta) - h(y(r)/s) \quad \forall r, s$$

This is true if and only if:

$$u_0 + \beta u_1(s) - h(y(s)/s) \geq u_0 + \beta u_1(r) - h(y(r)/s) \quad \forall r, s$$

which is true if and only if the initial allocation was incentive compatible. The dual problem rewrites the
objective function of the government so that at the optimum the total resource cost of the allocation is minimized:

$$\min_{\Delta} \left\{ C(u_0 - \beta\Delta) + q \sum_s p(s)\, C(u_1(s) + \Delta) \right\}$$

where $C(u)$ is the inverse function of $u(c)$. If an allocation is optimal, then $\Delta = 0$ must solve the
problem. The first-order condition evaluated at $\Delta = 0$ is:

$$-\beta C'(u_0) + q \sum_s p(s)\, C'(u_1(s)) = 0$$

Since $C(u)$ is the inverse function of $u(c)$, then $C'(u) = 1/u'(c)$ (we also know it from Section 3).
It follows that:

$$\frac{1}{u'(c_0)} = \frac{q}{\beta} \sum_s \frac{p(s)}{u'(c_1(s))} \tag{89}$$

We can interpret $1/u'(c)$ as the resource cost of providing some incentives, and therefore the inverse
Euler equation equalizes the expected resource cost of providing incentives across the two periods.
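
The cost-minimization logic can be illustrated with a short numerical sketch. It assumes logarithmic utility, so that the cost function is C(u) = exp(u) and C'(u) = c, and it perturbs utilities by lowering period-0 utility by beta times a shift and raising period-1 utility by that shift in every state. The parameter values and function names are hypothetical.

```python
import math


def resource_cost(shift, beta, q, probs, c0, c1):
    """Resource cost of the perturbed allocation under log utility, C(u) = exp(u).

    Period-0 utility is lowered by beta * shift and period-1 utility is raised
    by shift in every state, which leaves the agent's total utility unchanged.
    """
    cost_today = math.exp(math.log(c0) - beta * shift)
    cost_tomorrow = sum(p * math.exp(math.log(c) + shift) for p, c in zip(probs, c1))
    return cost_today + q * cost_tomorrow


def cost_slope_at_zero(beta, q, probs, c0, c1):
    """Derivative of the cost at shift = 0: -beta * C'(u0) + q * sum_s p(s) * C'(u1(s))."""
    return -beta * c0 + q * sum(p * c for p, c in zip(probs, c1))


# Hypothetical values; c0 is chosen to satisfy the inverse Euler equation.
beta, R = 0.95, 1.05
q = 1.0 / R
probs = [0.5, 0.5]
c1 = [0.8, 1.4]
c0 = (q / beta) * sum(p * c for p, c in zip(probs, c1))

print(f"slope at zero = {cost_slope_at_zero(beta, q, probs, c0, c1):.2e}")
for shift in (-0.01, 0.0, 0.01):
    print(f"shift = {shift:+.2f}  cost = {resource_cost(shift, beta, q, probs, c0, c1):.6f}")
```

Because the cost is convex in the shift, a zero slope at the unperturbed allocation means that no utility-neutral perturbation of this kind saves resources.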

10.2 Infinite Horizon Model


The dual approach goes through very easily in an infinite horizon setting with separable preferences.
The individual utility is:

$$\sum_{t, s^t} \beta^t \left[ u(c(s^t)) - h(y(s^t)/s_t) \right] \Pr(s^t)$$

where $s^t$ is the history of shocks up to time $t$. We implement a revelation mechanism where each
agent must report her type through reporting strategies:

$$r_t = \sigma_t(s^t)$$

and truth-telling implies:

$$\sigma_t^*(s^t) = s_t \quad \forall s^t, t$$

The history of reports is:

$$\sigma^t(s^t) = (r_0, r_1, \ldots, r_t) = \left( \sigma_0(s_0), \sigma_1(s^1), \ldots, \sigma_t(s^t) \right)$$

The dynamic incentive constraint is:

$$\sum_{t, s^t} \beta^t \left[ u(c(s^t)) - h(y(s^t)/s_t) \right] \Pr(s^t) \geq \sum_{t, s^t} \beta^t \left[ u(c(\sigma^t(s^t))) - h(y(\sigma^t(s^t))/s_t) \right] \Pr(s^t)$$

for every $\sigma$. Start from a node $s^t$ and set the following perturbation:

$$\tilde{u}(s^\tau) = u(s^\tau)$$

for any $s^\tau \neq s^t$ and $s^\tau \neq (s^t, s_{t+1})$, so that consumption utilities are unchanged at any node that is
not $s^t$ or one of its direct successors. At those nodes we set:

$$\tilde{u}(s^t) = u(s^t) - \beta\Delta$$

and

$$\tilde{u}(s^t, s_{t+1}) = u(s^t, s_{t+1}) + \Delta \quad \forall s_{t+1}$$

The perturbation is incentive compatible if the starting allocation was incentive compatible. Moreover,
the total expected utility after the perturbation is unchanged. At the optimal allocation we know
that $\Delta = 0$ must solve the following problem:

$$\min_{\Delta} \left\{ C(u(s^t) - \beta\Delta) + q \sum_{s^{t+1} | s^t} \Pr(s^{t+1} \mid s^t)\, C(u(s^{t+1}) + \Delta) \right\}$$

The FOC is:

$$-\beta C'(u(s^t)) + q \sum_{s^{t+1} | s^t} \Pr(s^{t+1} \mid s^t)\, C'(u(s^{t+1})) = 0$$

which can be rearranged to get the inverse Euler equation:

$$\frac{1}{u'(c(s^t))} = \frac{1}{\beta R^*}\, E\left[ \frac{1}{u'(c(s^{t+1}))} \,\Big|\, s^t \right]$$

History Dependence. When there is private information it is not Pareto-efficient to fully insure the
agent in equilibrium. The government must provide the agent with the incentive to produce higher
output by allowing her to receive a higher level of consumption in case of a positive skill shock. Suppose
that $R = \beta^{-1}$. Define the innovation in the agent's information about $1/u'(c_{t+1})$ as follows:

$$\Delta_{t+1} = \frac{1}{u'(c_{t+1})} - E_t\left[ \frac{1}{u'(c_{t+1})} \right] \tag{90}$$

Define the change in future forecasts as:

$$E_{t+1}\left[ \frac{1}{u'(c_{t+s+1})} \right] - E_t\left[ \frac{1}{u'(c_{t+s})} \right] \tag{91}$$

Under the inverse Euler equation, $1/u'(c_t)$ is a martingale when $\beta R = 1$, which together with the law
of iterated expectations implies:

$$\frac{1}{u'(c_t)} = E_t\left[ \frac{1}{u'(c_{t+1})} \right] = E_t\left[ \frac{1}{u'(c_{t+s})} \right] \quad \forall s \tag{92}$$

Therefore the first term of (91) becomes:

$$E_{t+1}\left[ \frac{1}{u'(c_{t+s+1})} \right] = E_{t+1}\left[ \frac{1}{u'(c_{t+2})} \right] = \frac{1}{u'(c_{t+1})}$$

while using (92) the second term becomes:

$$E_t\left[ \frac{1}{u'(c_{t+s})} \right] = \frac{1}{u'(c_t)} = E_t\left[ \frac{1}{u'(c_{t+1})} \right]$$

Therefore:

$$E_{t+1}\left[ \frac{1}{u'(c_{t+s+1})} \right] - E_t\left[ \frac{1}{u'(c_{t+s})} \right] = \Delta_{t+1} \tag{93}$$

The change in future forecasts equals the innovation in the agent's information at $t + 1$. Any shock
that generates a change $\Delta_{t+1}$ in $1/u'(c)$ leads to a change $\Delta_{t+1}$ in the agent's forecasts of $1/u'(c)$ at
any future date. This implies that skill shocks have a permanent effect on the reciprocal of marginal
utility. When utility is logarithmic, $1/u'(c) = c$ and permanent changes are reflected directly in the level
of consumption. The reason is that if an agent receives a positive skill shock, it is efficient for the
government to require a higher level of output. However, since there is private information, the government
must reward higher output with a lifetime increase in consumption to provide the right incentive to the
agent. The inverse Euler equation governs how the increase is spread over time. In the special case of
$\beta R = 1$ the increase is evenly spread over time.
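
The permanence of shocks can be illustrated on a small example. The sketch below assumes logarithmic utility, so that the reciprocal of marginal utility is consumption itself, and a hypothetical binomial process in which consumption moves up or down by 10 percent with equal probability, which makes it a martingale. It then checks that after a positive shock the forecast revision equals the innovation at every horizon. The process and the function name are illustrative assumptions, not part of the notes.

```python
def forecast(c, horizon, step=0.1):
    """E[c_{t+horizon} | c_t = c] on a binomial tree.

    Each period consumption moves to c * (1 + step) or c * (1 - step) with
    equal probability, so the process is a martingale by construction.
    """
    if horizon == 0:
        return c
    up = forecast(c * (1 + step), horizon - 1, step)
    down = forecast(c * (1 - step), horizon - 1, step)
    return 0.5 * up + 0.5 * down


c_t = 1.0
c_next = c_t * 1.1  # a positive shock is realized at t + 1

# Innovation: realized value minus what was expected one period earlier.
innovation = c_next - forecast(c_t, 1)

for s in range(1, 5):
    # Forecast made at t + 1, looking s periods ahead, minus the forecast
    # made at t, looking s periods ahead.
    revision = forecast(c_next, s) - forecast(c_t, s)
    print(f"s = {s}: revision = {revision:.4f}, innovation = {innovation:.4f}")
```

The revision does not shrink with the horizon, which is the sense in which the shock is permanent under the martingale property.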

Consumption Inequality. Suppose that $\beta R = 1$. The inverse Euler equation implies:

$$\frac{1}{u'(c_t)} = E_t\left[ \frac{1}{u'(c_{t+1})} \right]$$

$$\frac{1}{u'(c_t)} + \varepsilon_{t+1} = \frac{1}{u'(c_{t+1})}$$

where $\varepsilon_{t+1}$ has mean zero and is uncorrelated with $1/u'(c_t)$. Taking the variance:

$$Var\left( \frac{1}{u'(c_t)} \right) + Var(\varepsilon_{t+1}) = Var\left( \frac{1}{u'(c_{t+1})} \right)$$

We know from above that $1/u'(c_{t+1})$ depends on new information revealed at time $t + 1$ and
therefore $Var(\varepsilon_{t+1}) > 0$. It follows that:

$$Var\left( \frac{1}{u'(c_{t+1})} \right) > Var\left( \frac{1}{u'(c_t)} \right)$$

Therefore, in a Pareto optimal allocation the variance of the reciprocal of marginal utility grows
over time. If utility is logarithmic this is equivalent to an increase in the inequality of consumption over
time. The reason for this result is that the government must provide incentives through changes in
consumption. Therefore, consumption will grow more for workers with high skill realizations.
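
The growth of the variance can be computed exactly in a simple case. The sketch below again assumes logarithmic utility, so that consumption itself is the martingale, and a hypothetical process in which all agents start at the same consumption level and each period move up or down by 10 percent with equal probability. The cross-sectional mean and variance are obtained by enumerating the binomial distribution rather than by simulation. The process, names and parameter values are illustrative assumptions.

```python
from math import comb


def cross_section_moments(t, c0=1.0, step=0.1):
    """Exact mean and variance of consumption after t periods.

    After t periods an agent with k up-moves has consumption
    c0 * (1 + step)**k * (1 - step)**(t - k), with probability comb(t, k) / 2**t.
    """
    mean = 0.0
    second_moment = 0.0
    for k in range(t + 1):
        prob = comb(t, k) / 2 ** t
        c = c0 * (1 + step) ** k * (1 - step) ** (t - k)
        mean += prob * c
        second_moment += prob * c * c
    return mean, second_moment - mean ** 2


for t in range(0, 6):
    mean, variance = cross_section_moments(t)
    print(f"t = {t}: mean = {mean:.4f}, variance = {variance:.4f}")
```

The mean stays at its initial level while the variance rises every period, since each period adds the variance of a new mean-zero innovation.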

References
Golosov, M., A. Tsyvinski and I. Werning. 2006. "New Dynamic Public Finance: A User's Guide." In
NBER Macroeconomics Annual 2006. MIT Press.
Kocherlakota, Narayana R. 2010. The New Dynamic Public Finance. Princeton University Press.
Scheuer, Florian. 2014. Lecture Notes.
