Lecture Notes Paradisi
Matteo Paradisi
Contents
1 Section 1-2: Uncompensated and Compensated Elasticities; Static and Dynamic Labor Supply
8 Section 9: Linear Capital Taxation
8.1 A Two-Period Model
8.2 Infinite Horizon Model - Chamley (1986)
8.3 Infinite Horizon - Judd (1985)
1 Section 1-2: Uncompensated and Compensated Elasticities; Static and Dynamic Labor Supply
In this section, we briefly review the concepts of substitution (compensated) elasticity and uncompensated elasticity. Compensated and uncompensated labor supply elasticities play a key role in studies of optimal income taxation. In the second part of the section we study labor supply choices in static and dynamic frameworks.
$$\max_{x_1,\dots,x_N} u(x_1,\dots,x_N) \quad \text{s.t.} \quad \sum_{i=1}^{N} p_i x_i \le w$$
We solve the problem using a Lagrangian approach and we get the following optimality condition
(if an interior optimum exists) for every good i:
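$$u_i(x^*) - \lambda^* p_i = 0$$
Taking the ratio of this condition for two goods $i$ and $j$ gives $u_i(x^*)/u_j(x^*) = p_i/p_j$. This is an important condition in economics: it equates the relative price of two goods to the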
marginal rate of substitution (MRS) between them. The MRS measures the amount of good j that
the consumer must be given to compensate the utility loss from a one-unit marginal reduction in her
consumption of good i. Graphically, the price ratio is the slope of the budget constraint, while the
ratio of marginal utilities represents the slope of the indifference curve.1
We call the solution to the utility maximization problem Walrasian or Marshallian demand and
we represent it as a function x (p, w) of the price vector and the endowment. The Walrasian demand
has the following two properties:
• homogeneity of degree zero: $x_i(\alpha p, \alpha w) = x_i(p, w)$ for every $\alpha > 0$
• Walras’ law: for every $p \gg 0$ and $w > 0$ we have $p \cdot x(p, w) = w$
1 Notice that in a two-good economy, differentiating the indifference curve $u(x_1, x_2(x_1)) = k$ with respect to $x_1$ gives:
$$u_1 + u_2\frac{dx_2}{dx_1} = 0$$
which delivers
$$\frac{dx_2}{dx_1} = -\frac{u_1}{u_2}$$
This shows that the ratio of marginal utilities is (minus) the slope of the indifference curve at the point $(x_1, x_2)$.
We define the uncompensated elasticity as the percentage change in the consumption of good $i$ in response to a one percent increase in the price $p_k$. Using the Walrasian demand we can write the uncompensated elasticity as:
$$\varepsilon^u_{i,p_k} = \frac{\partial x_i(p,w)}{\partial p_k}\frac{p_k}{x_i(p,w)} = \frac{\partial \log x_i(p,w)}{\partial \log p_k}$$
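As a quick numerical check of the log-derivative definition, the sketch below assumes hypothetical Cobb-Douglas preferences $u(x) = \sum_i a_i \log x_i$ with $\sum_i a_i = 1$, for which the Walrasian demand is $x_i(p,w) = a_i w/p_i$. The function names and parameter values are illustrative and not part of the notes. A central finite difference in $\log p_k$ recovers an own-price elasticity of $-1$ and a cross-price elasticity of $0$ under these assumptions.

```python
import math

def walrasian_demand(p, w, a):
    """Cobb-Douglas demand x_i = a_i * w / p_i (assumed preferences)."""
    return [ai * w / pi for ai, pi in zip(a, p)]

def uncompensated_elasticity(i, k, p, w, a, h=1e-6):
    """Central difference of log x_i with respect to log p_k."""
    p_up, p_dn = list(p), list(p)
    p_up[k] *= math.exp(h)
    p_dn[k] *= math.exp(-h)
    log_up = math.log(walrasian_demand(p_up, w, a)[i])
    log_dn = math.log(walrasian_demand(p_dn, w, a)[i])
    return (log_up - log_dn) / (2 * h)

prices, wealth, shares = [1.0, 2.0, 4.0], 10.0, [0.2, 0.3, 0.5]
own = uncompensated_elasticity(0, 0, prices, wealth, shares)    # approximately -1
cross = uncompensated_elasticity(0, 1, prices, wealth, shares)  # 0: x_0 does not depend on p_1
print(round(own, 6), round(cross, 6))
```

Cobb-Douglas is chosen only because its demand has a closed form; any demand function can be plugged into `walrasian_demand`.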
Indirect utility: We introduce the concept of indirect utility, which will be useful throughout the class. It also helps interpret the role of the Lagrange multiplier. The indirect utility is the utility that the agent achieves when consuming the optimal bundle $x(p, w)$. It is obtained by plugging the Walrasian demand into the utility function:
$$v(p, w) = u(x(p, w))$$
The indirect utility has the following properties:
• homogeneity of degree zero: since the Walrasian demand is homogeneous of degree zero, the indirect utility inherits this property
• $\partial v(p, w)/\partial w > 0$ and $\partial v(p, w)/\partial p_k \le 0$
Roy’s Identity and the multiplier interpretation: Using the indirect utility function, the value of the problem can be written as follows at the optimum:
$$v(p, w) = u(x(p, w)) + \lambda^*\left[w - p \cdot x(p, w)\right]$$
Applying the Envelope theorem, we can study how the indirect utility responds to changes in the agent’s wealth:
$$\frac{\partial v(p, w)}{\partial w} = \lambda^*$$
The value of the Lagrange multiplier at the optimum is the shadow value of the constraint. Specif-
ically, it is the increase in the value of the objective function resulting from a slight relaxation of the
constraint achieved by giving an extra dollar of endowment to the agent. This interpretation of the
Lagrange multiplier is particularly important in the study of optimal Ramsey taxes and transfers.
You will see more about it in the second part of the PF sequence.
The Envelope theorem also implies that:
$$\frac{\partial v(p, w)}{\partial p_i} = -\lambda^* x_i(p, w)$$
Using the two conditions together we have:
$$-\frac{\partial v(p,w)/\partial p_i}{\partial v(p,w)/\partial w} = x_i(p, w)$$
This equation is known as Roy’s identity: it recovers the Walrasian demand from the indirect utility function.
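Roy’s identity and the multiplier interpretation can be checked numerically. The sketch below again assumes hypothetical Cobb-Douglas preferences, for which $v(p,w) = \sum_i a_i \log(a_i w/p_i)$, and approximates $\partial v/\partial p_i$ and $\partial v/\partial w$ by central differences. Under these assumptions $\partial v/\partial w = 1/w$, which is the multiplier $\lambda^*$, and minus the ratio of the two derivatives recovers the demand $a_i w/p_i$.

```python
import math

def indirect_utility(p, w, a):
    """v(p, w) = u(x(p, w)) for u(x) = sum_i a_i log x_i (assumed preferences)."""
    return sum(ai * math.log(ai * w / pi) for ai, pi in zip(a, p))

def roy_demand(i, p, w, a, h=1e-6):
    """Return (x_i recovered by Roy's identity, dv/dw) using central differences."""
    p_up, p_dn = list(p), list(p)
    p_up[i] += h
    p_dn[i] -= h
    dv_dpi = (indirect_utility(p_up, w, a) - indirect_utility(p_dn, w, a)) / (2 * h)
    dv_dw = (indirect_utility(p, w + h, a) - indirect_utility(p, w - h, a)) / (2 * h)
    return -dv_dpi / dv_dw, dv_dw

prices, wealth, shares = [1.0, 2.0, 4.0], 10.0, [0.2, 0.3, 0.5]
for i in range(3):
    x_roy, multiplier = roy_demand(i, prices, wealth, shares)
    print(i, round(x_roy, 6), round(multiplier, 6))  # demand a_i*w/p_i and lambda* = 1/w
```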
1.2 Substitution Elasticity and the Expenditure Minimization Problem
In this section we aim to isolate the substitution effect of a change in price. An increase in the price
of good i typically generates two effects:
• substitution effect: the relative price of xi increases, therefore the consumer substitutes away
from this good towards other goods,
• income effect: the consumer’s purchasing power has decreased, so she reoptimizes her entire bundle. If good $i$ is normal, this further reduces the consumption of good $i$.
We define substitution or compensated elasticity as the percentage change in the demand for a good in
response to a change in a price that ignores the income effect. In order to get at this new concept, we
focus on a problem that is “dual” to the utility maximization problem: the expenditure minimization
problem (EMP). The consumer solves:
$$\min_{x_1,\dots,x_N} \sum_{i=1}^{N} p_i x_i \quad \text{s.t.} \quad u(x_1,\dots,x_N) \ge \bar u$$
The problem asks to solve for the consumption bundle that minimizes the amount spent to achieve
utility level ū. The solution delivers two important functions: the expenditure function e (p, ū), which
measures the total expenditure needed to achieve utility ū under the price vector p, and the Hicksian
(or compensated ) demand h (p, ū), which is the demand vector that solves the minimization problem.
The Walrasian and Hicksian demands answer two different but related problems. The following
two statements establish a relationship between the two concepts:
1. If $x^*$ is optimal in the UMP when wealth is $w$, then $x^*$ is optimal in the EMP when $\bar u = u(x^*)$. Moreover, $e(p, \bar u) = w$.
2. If $x^*$ is optimal in the EMP when $\bar u$ is the required level of utility, then $x^*$ is optimal in the UMP when $w = p \cdot x^*$. Moreover, $\bar u = u(x^*)$.
The Hicksian demand allows us to isolate the pure substitution effect in response to a price change.
We call it compensated since it is derived following the idea that, after a price change, the consumer
will be given enough wealth (the “compensation”) to maintain the same utility level she experienced
before the price change. Suppose that under the price vector $p$ the consumer demands a bundle $x$ such that $p \cdot x = w$. When the price vector is $p'$, the consumer solves the new expenditure minimization problem and switches to $x'$ such that $u(x) = u(x')$ and $p' \cdot x' = w'$. The change $\Delta w = w' - w$ is the compensation that the agent receives to be as well off in utility terms after the price change as she was before. Thanks to the compensation, there is no income effect coming from the reduction in the agent’s purchasing power.
We call the elasticity of the Hicksian demand function the compensated elasticity:
$$\varepsilon^c_{i,p_k} = \frac{\partial h_i(p,\bar u)}{\partial p_k}\frac{p_k}{h_i(p,\bar u)}$$
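To relate the two concepts, start from the identity $h_i(p, \bar u) = x_i(p, e(p, \bar u))$, which follows from the duality results above, and differentiate both sides with respect to $p_k$:
$$\begin{aligned}
\frac{\partial h_i(p,\bar u)}{\partial p_k} &= \frac{\partial x_i(p, e(p,\bar u))}{\partial p_k} + \frac{\partial x_i(p, e(p,\bar u))}{\partial e(p,\bar u)}\frac{\partial e(p,\bar u)}{\partial p_k} \\
&= \frac{\partial x_i(p,w)}{\partial p_k} + \frac{\partial x_i(p,w)}{\partial w}\,h_k(p,\bar u) \\
&= \frac{\partial x_i(p,w)}{\partial p_k} + \frac{\partial x_i(p,w)}{\partial w}\,x_k(p, e(p,\bar u)) \\
&= \frac{\partial x_i(p,w)}{\partial p_k} + \frac{\partial x_i(p,w)}{\partial w}\,x_k(p, w)
\end{aligned}$$
where the second line uses Shephard’s lemma, $\partial e(p,\bar u)/\partial p_k = h_k(p,\bar u)$, and all derivatives are evaluated at $w = e(p,\bar u)$.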
Rearranging, we derive the Slutsky equation:
$$\frac{\partial x_i(p,w)}{\partial p_k} = \frac{\partial h_i(p,\bar u)}{\partial p_k} - x_k(p,w)\frac{\partial x_i(p,w)}{\partial w}$$
We have thus decomposed the uncompensated change into a substitution effect (the first term) and an income effect (the second term). Notice also that the income effect is the product of two terms: $\partial x_i(p,w)/\partial w$ is the response of the Walrasian demand for good $i$ to a change in wealth; $x_k(p,w)$ is the mechanical effect of an increase in $p_k$ on the agent’s purchasing power: an agent whose demand for $k$ was $x_k(p,w)$ experiences a mechanical reduction of her purchasing power amounting to $x_k(p,w)$ when $p_k$ increases by 1.
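The Slutsky decomposition can be verified numerically under the same hypothetical Cobb-Douglas preferences, whose expenditure function is $e(p,\bar u) = e^{\bar u}\prod_j (p_j/a_j)^{a_j}$ and whose Hicksian demand is $h_i = a_i\,e(p,\bar u)/p_i$. The sketch evaluates both sides of $\partial h_i/\partial p_k = \partial x_i/\partial p_k + x_k\,\partial x_i/\partial w$ by central differences at $\bar u = v(p,w)$; all helper names are illustrative.

```python
import math

def walrasian(p, w, a):
    return [ai * w / pi for ai, pi in zip(a, p)]

def expenditure(p, ubar, a):
    # e(p, ubar) = exp(ubar) * prod_j (p_j / a_j)**a_j for u = sum_j a_j log x_j (assumed)
    return math.exp(ubar) * math.prod((pj / aj) ** aj for pj, aj in zip(p, a))

def hicksian(p, ubar, a):
    e = expenditure(p, ubar, a)
    return [ai * e / pi for ai, pi in zip(a, p)]

def bumped(vec, j, d):
    out = list(vec)
    out[j] += d
    return out

def slutsky_sides(i, k, p, w, a, h=1e-6):
    """Return (dh_i/dp_k, dx_i/dp_k + x_k * dx_i/dw), evaluated at ubar = v(p, w)."""
    ubar = sum(aj * math.log(xj) for aj, xj in zip(a, walrasian(p, w, a)))
    dh = (hicksian(bumped(p, k, h), ubar, a)[i] - hicksian(bumped(p, k, -h), ubar, a)[i]) / (2 * h)
    dx_dp = (walrasian(bumped(p, k, h), w, a)[i] - walrasian(bumped(p, k, -h), w, a)[i]) / (2 * h)
    dx_dw = (walrasian(p, w + h, a)[i] - walrasian(p, w - h, a)[i]) / (2 * h)
    return dh, dx_dp + walrasian(p, w, a)[k] * dx_dw

prices, wealth, shares = [1.0, 2.0, 4.0], 10.0, [0.2, 0.3, 0.5]
print(slutsky_sides(0, 0, prices, wealth, shares))  # both sides close to -1.6
print(slutsky_sides(0, 1, prices, wealth, shares))  # both sides close to 0.3
```

The cross-price case is instructive: the Walrasian cross derivative is zero here, so the whole Hicksian response comes from the income-effect term $x_k\,\partial x_i/\partial w$.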
$$\max_{c,n} u(c, n) \quad \text{s.t.} \quad c = wn + I$$
working. Since the consumer is wealthier, if leisure is a normal good, she will tend to work less and consume more leisure. This is the income effect. Notice that, even though the cost of leisure has increased, the income and substitution effects go in opposite directions, unlike in standard consumer problems where an increase in the price of a normal good $i$ generates both a negative income effect and a negative substitution effect on good $i$. The reason is that this is an endowment economy where we think of leisure $l$ as the difference between the total time endowment $T$ and labor: $l = T - n$. In this setup the agent is a net seller of leisure, and therefore the income effect on leisure is positive when the wage increases.
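Now we get a little more formal and study analytically the response of labor supply to changes in the wage rate. Totally differentiating the optimality condition $w\,u_c + u_n = 0$ (evaluated at $c = wn + I$) with respect to $w$ we get:
$$\frac{\partial n}{\partial w} = -\frac{u_c + n\left(u_{nc} + w u_{cc}\right)}{w^2 u_{cc} + 2w u_{nc} + u_{nn}}$$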
Notice that the denominator of the expression is the second order condition of the problem and can therefore be signed. If we assume the problem is concave (in order to get an interior solution), the denominator is negative. This implies that:
$$\frac{\partial n}{\partial w} \propto u_c + n\left(u_{nc} + w u_{cc}\right)$$
This expression captures the intuition provided above. The first term is the substitution effect,
which is always positive and proportional to the marginal utility of consumption: the extent to which
the consumer substitutes labor and consumption depends on how attractive consumption is. The
second term measures the income effect. It depends on the cross-derivative $u_{nc}$ and on the concavity of the utility function in consumption. The cross-derivative measures how changes in consumption affect the marginal disutility of labor. Faster decreasing marginal returns to consumption imply a lower incentive to consume when the agent becomes wealthier (remember that $u_{cc} < 0$). The income effect is scaled by $n$, which is the mechanical effect on income of a one-unit increase in $w$.
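The comparative static can be checked against a direct numerical solution. The sketch assumes the hypothetical utility $u(c,n) = \log c - n^2/2$ with non-labor income $I > 0$, for which the optimality condition $w/(wn + I) = n$ has the closed-form positive root used below. It compares the formula for $\partial n/\partial w$ with a finite-difference derivative of the solved labor supply.

```python
import math

def labor_supply(w, I):
    """Positive root of w*n**2 + I*n - w = 0, i.e. the FOC w/(w*n + I) = n
    for the assumed utility u(c, n) = log(c) - n**2 / 2."""
    return (-I + math.sqrt(I**2 + 4 * w**2)) / (2 * w)

def dn_dw_formula(w, I):
    """-(u_c + n*(u_nc + w*u_cc)) / (w**2*u_cc + 2*w*u_nc + u_nn) at the optimum."""
    n = labor_supply(w, I)
    c = w * n + I
    u_c, u_cc, u_nc, u_nn = 1 / c, -1 / c**2, 0.0, -1.0
    return -(u_c + n * (u_nc + w * u_cc)) / (w**2 * u_cc + 2 * w * u_nc + u_nn)

def dn_dw_numeric(w, I, h=1e-6):
    return (labor_supply(w + h, I) - labor_supply(w - h, I)) / (2 * h)

w, I = 2.0, 1.0
print(dn_dw_formula(w, I), dn_dw_numeric(w, I))  # the two values agree
```

With these illustrative parameters the substitution effect dominates and labor supply slopes upward; other values of $I$ can flip the sign.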
Example: We now study a functional form for preferences that is particularly convenient for the study of optimal tax problems. Suppose the agent has the following utility:
$$u(c, n) = c - \frac{n^{1+\frac{1}{\varepsilon}}}{1 + \frac{1}{\varepsilon}}$$
This is a quasi-linear utility function whose property is to rule out income effects. We will come
back to this point later.
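The optimality condition reads:
$$n^{\frac{1}{\varepsilon}} = w$$
so that $n = w^{\varepsilon}$. Therefore, this utility function has a constant elasticity of labor supply equal to $\varepsilon$. Also, given the absence of income effects, we know that $\varepsilon^u_{n,w} = \varepsilon^c_{n,w}$.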
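A minimal check that the quasi-linear specification delivers a constant elasticity: solving the optimality condition gives $n = w^{\varepsilon}$, independent of non-labor income, and a numerical log-derivative returns $\varepsilon$ at any wage. The parameter values are arbitrary.

```python
import math

def quasilinear_labor(w, eps):
    """Labor supply under u(c, n) = c - n**(1 + 1/eps) / (1 + 1/eps).

    The FOC n**(1/eps) = w gives n = w**eps; non-labor income plays no role.
    """
    return w ** eps

def log_derivative(f, x, h=1e-6):
    """d log f / d log x by central differences."""
    return (math.log(f(x * math.exp(h))) - math.log(f(x * math.exp(-h)))) / (2 * h)

for eps in (0.25, 0.5, 1.0):
    row = [log_derivative(lambda x: quasilinear_labor(x, eps), w) for w in (0.5, 1.0, 3.0)]
    print(eps, [round(e, 6) for e in row])  # each entry equals eps up to rounding
```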
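Compensated Labor Supply Elasticity: We can derive the compensated response of labor supply by using the Slutsky equation. We already know the uncompensated response to wage changes, and we therefore need to find $\partial n/\partial I$. Totally differentiating the FOC with respect to $I$ we get:
$$\frac{\partial n}{\partial I} = -\frac{u_{nc} + w u_{cc}}{w^2 u_{cc} + 2w u_{nc} + u_{nn}}$$
The Slutsky equation then gives the compensated response:
$$\frac{\partial n^c}{\partial w} = \frac{\partial n}{\partial w} - n\frac{\partial n}{\partial I} = -\frac{u_c}{w^2 u_{cc} + 2w u_{nc} + u_{nn}} > 0$$
By comparing the compensated and uncompensated responses we clearly see why quasi-linear preferences imply no income effect: they are separable and linear in consumption. Therefore, $u_{nc} = 0$ and $u_{cc} = 0$, so that $\partial n/\partial I = 0$.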
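$\lambda$-constant Elasticity: We introduce a concept that will be useful later in the analysis of intertemporal elasticities. The first order conditions for the static labor supply model solved with a Lagrangian approach are:
$$u_c = \lambda, \qquad u_n = -\lambda w$$
Define the $\lambda$-constant or Frisch elasticity as the elasticity computed assuming that $\lambda$ does not change. Totally differentiating we get:
$$\begin{bmatrix} u_{cc} & u_{cn} \\ u_{nc} & u_{nn} \end{bmatrix}\begin{bmatrix} \partial c^F/\partial w \\ \partial n^F/\partial w \end{bmatrix} = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix}$$
By inverting the $2 \times 2$ matrix we can solve the system. The solutions are:
$$\begin{bmatrix} \partial c^F/\partial w \\ \partial n^F/\partial w \end{bmatrix} = \frac{1}{u_{cc}u_{nn} - u_{cn}^2}\begin{bmatrix} u_{nn} & -u_{cn} \\ -u_{nc} & u_{cc} \end{bmatrix}\begin{bmatrix} 0 \\ -\lambda \end{bmatrix} = \begin{bmatrix} \dfrac{\lambda u_{cn}}{u_{cc}u_{nn} - u_{cn}^2} \\[2ex] -\dfrac{\lambda u_{cc}}{u_{cc}u_{nn} - u_{cn}^2} \end{bmatrix}$$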
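A Comparison Among Elasticities: We now compare the three elasticities presented so far (in this comparison $l$ denotes hours of work, $n$). We already know that $\varepsilon^c_{l,w} \ge \varepsilon^u_{l,w}$ since the income effect on labor supply is negative. We therefore need to compare the compensated and the $\lambda$-constant elasticities. We will prove that $\varepsilon^F_{l,w} \ge \varepsilon^c_{l,w}$. From the responses derived above, and recalling that $\lambda = u_c$:
$$\varepsilon^c_{l,w} = -\frac{u_c\,w}{n\left(w^2 u_{cc} + 2w u_{nc} + u_{nn}\right)}, \qquad \varepsilon^F_{l,w} = -\frac{u_c u_{cc}\,w}{n\left(u_{cc}u_{nn} - u_{cn}^2\right)}$$
Start by writing the following:
$$\begin{aligned}
\frac{1}{\varepsilon^c_{l,w}} - \frac{1}{\varepsilon^F_{l,w}} &= \frac{n}{w\,u_c}\left[-\left(w^2 u_{cc} + 2w u_{nc} + u_{nn}\right) + \frac{u_{cc}u_{nn} - u_{cn}^2}{u_{cc}}\right] \\
&= \frac{n}{w\,u_c}\left(-w^2 u_{cc} - 2w u_{nc} - \frac{u_{cn}^2}{u_{cc}}\right)
\end{aligned}$$
Concavity of the utility function (which guarantees a positive $\lambda$-constant elasticity) implies $u_{cc}u_{nn} - u_{cn}^2 \ge 0$, that is, $u_{nn} \le u_{cn}^2/u_{cc}$. It follows that:
$$-w^2 u_{cc} - 2w u_{nc} - \frac{u_{cn}^2}{u_{cc}} \ge -w^2 u_{cc} - 2w u_{nc} - u_{nn} = -\text{SOC} \ge 0$$
where the last inequality uses the fact that the second order condition must be negative. Hence, we established that $\frac{1}{\varepsilon^c_{l,w}} - \frac{1}{\varepsilon^F_{l,w}} \ge 0$, which implies $\varepsilon^F_{l,w} \ge \varepsilon^c_{l,w}$ (both elasticities being positive). Keeping the marginal utility of consumption constant implies that there are no income effects: higher wealth for the same amount of hours worked does not change the marginal value the agent attaches to consumption. Thus, the $\lambda$-constant elasticity is at least as large as the compensated one.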
We therefore conclude that the following is always true:
$$\varepsilon^F_{l,w} \ge \varepsilon^c_{l,w} \ge \varepsilon^u_{l,w}$$
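The ranking can be illustrated numerically. The sketch below assumes the hypothetical utility $u(c,n) = \log c - n^2/2$ with $w = 2$ and $I = 1$, computes the three labor supply responses from the formulas derived in this section (the Frisch response with $\lambda = u_c$), and converts them into elasticities by multiplying by $w/n$.

```python
import math

def elasticities(u_c, u_cc, u_cn, u_nn, w, n):
    """Return (Frisch, compensated, uncompensated) labor supply elasticities."""
    soc = w**2 * u_cc + 2 * w * u_cn + u_nn      # negative at an interior optimum
    det = u_cc * u_nn - u_cn**2                   # positive under concavity
    dn_dI = -(u_cn + w * u_cc) / soc
    dn_dw_u = -(u_c + n * (u_cn + w * u_cc)) / soc
    dn_dw_c = dn_dw_u - n * dn_dI                 # Slutsky: strip the income effect
    dn_dw_F = -u_c * u_cc / det                   # lambda-constant response, lambda = u_c
    scale = w / n
    return dn_dw_F * scale, dn_dw_c * scale, dn_dw_u * scale

w, I = 2.0, 1.0
n = (-I + math.sqrt(I**2 + 4 * w**2)) / (2 * w)   # solves w / (w*n + I) = n
c = w * n + I
e_F, e_c, e_u = elasticities(1 / c, -1 / c**2, 0.0, -1.0, w, n)
print(round(e_F, 4), round(e_c, 4), round(e_u, 4))
```

The function takes the second derivatives as inputs, so the same check applies to any twice-differentiable utility once the optimum $(c, n)$ is known.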
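The consumer faces the following period-by-period budget constraint:
$$A_{t+1} = (1 + r_t)\left(A_t + y_t + w_t n_t - c_t\right)$$
The first order conditions with respect to $c_t$ and $n_t$ are:
$$u_c(c_t, n_t) = (1 + r_t)V'_{t+1}(A_{t+1}), \qquad u_n(c_t, n_t) = -w_t(1 + r_t)V'_{t+1}(A_{t+1})$$
Defining $\lambda_t \equiv (1 + r_t)V'_{t+1}(A_{t+1})$, they become:
$$u_c(c_t, n_t) = \lambda_t, \qquad u_n(c_t, n_t) = -\lambda_t w_t$$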
Since $\lambda_t = u_c(c_t, n_t)$, the static labor supply choice is the same as in the previous paragraph:
$$-\frac{u_n(c_t, n_t)}{u_c(c_t, n_t)} = w_t$$
Rearranging the budget constraint we have:
$$c_t = w_t n_t + y_t + A_t - \frac{A_{t+1}}{1 + r_t}$$
Notice that the problem is identical to the previous one with income $I_t = y_t - \left[\frac{A_{t+1}}{1 + r_t} - A_t\right]$.
To assess the Frisch elasticity, we compute the labor response to changes in $w_t$ holding $\lambda_t$ constant, which shuts down any wealth effect. The Frisch demands are defined as follows:
$$c_t = c^F_t(w_t, \lambda_t), \qquad n_t = n^F_t(w_t, \lambda_t)$$
Since the model is identical to the static labor supply choice and we already derived the Frisch elasticity for the latter, we can write:
$$\varepsilon^F_{n_t,w_t} = \frac{\partial n^F_t}{\partial w_t}\frac{w_t}{n_t} = -\frac{\lambda_t\,u_{cc}(c_t, n_t)}{u_{cc}(c_t, n_t)\,u_{nn}(c_t, n_t) - u_{cn}^2(c_t, n_t)}\frac{w_t}{n_t}$$
References
MaCurdy, T. E. An Empirical Model of Labor Supply in a Life-Cycle Setting. Journal of Political Economy, 1981, vol. 89, issue 6, pages 1059-85.
Mas-Colell, A., M. Whinston, and J. R. Green. Microeconomic Theory. New York: Oxford University Press, 1995.
Miller, N. H. Notes on Microeconomic Theory.
2 Section 2: Introduction to Optimal Income Taxation
In this section we introduce the problem of optimal income taxation. We set up the government problem and derive optimal taxes. We study the optimal linear tax rate, the optimal top tax rate, and the revenue-maximizing tax rate.
We call this type of social welfare function utilitarian. The government targets a level of revenues $E$ and its budget constraint is:
$$\int_0^\infty T(z)\,h(z)\,dz \ge E$$
The multiplier $\lambda$ on this constraint is constant across individuals and measures the value of government revenues in equilibrium. The optimal choice of $T(z)$ delivers the following first order condition:
$$\frac{\partial \mathcal{L}}{\partial T(z)} = \left[-u'(z - T(z)) + \lambda\right]h(z) = 0$$
Rearranging:
$$u'(z - T(z)) = \lambda$$
Notice that since $\lambda$ is constant and all agents have the same preferences, the optimality condition implies that consumption is equalized across all individuals. This is a direct consequence of the utilitarian social welfare function and the concavity of the utility function. Suppose that we taxed a rich individual who would otherwise have a high level of consumption to redistribute to a poor individual who would otherwise have low consumption. The marginal utility gain of the poor would be higher than the marginal utility loss of the rich, because the utility function has decreasing marginal returns (implied by its concavity). This implies that until all consumption levels are equalized across the economy the government can increase social welfare through redistribution from rich to poor individuals. Since every agent has the same weight in the government social welfare function, the optimal policy treats all individuals equally: there is no gain for the government from guaranteeing a higher level of consumption to a particular group of individuals.
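Taxes serve the purpose of collecting the revenues needed to meet the requirement $E$. Each individual consumes $c = \bar z - E$, where $\bar z = \int_0^\infty z\,h(z)\,dz$ is the average income. Therefore $T(z) = z - (\bar z - E)$: the marginal tax rate is 100%, individuals with income above $\tilde z = \bar z - E$ pay positive taxes, and those below $\tilde z$ receive transfers.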
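A small arithmetic illustration, with a hypothetical income sample (each observation equally weighted) and a hypothetical revenue requirement: everyone consumes $\bar z - E$, taxes are $T(z) = z - (\bar z - E)$, and average taxes exactly cover $E$.

```python
def utilitarian_schedule(incomes, E):
    """Equal-consumption allocation: c = zbar - E and T(z) = z - c for every z."""
    zbar = sum(incomes) / len(incomes)
    c = zbar - E
    taxes = [z - c for z in incomes]
    return c, taxes

incomes = [10.0, 20.0, 30.0, 60.0]   # hypothetical incomes, equally weighted
c, taxes = utilitarian_schedule(incomes, E=5.0)
print(c, taxes)   # 25.0 [-15.0, -5.0, 5.0, 35.0]
```

Incomes below $\tilde z = 25$ receive transfers and incomes above it pay taxes; each extra unit of income is taxed away entirely.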
Social Welfare Functions: The general problem in Mirrlees (1971) assumes that individual welfare is aggregated through a social welfare function $G(\cdot)$. We typically assume that $G(\cdot)$ is concave in order to represent redistributive preferences of the government. We define the social marginal welfare weight of individual $i$ as:
$$g_i = \frac{G'(u^i)\,u^i_c}{\lambda}$$
It measures the government’s marginal value of giving one dollar to individual $i$. The expression is scaled by the marginal value of revenues to the government ($\lambda$), which converts the marginal utility into money metric. The concavity of $G$ and of the utility function implies that $g_i$ is decreasing in $z_i$. The social welfare effect of giving one dollar to all the individuals in the economy is therefore $\int_i g_i$.
$$c_i = (1 - \tau)\,w_i l_i + \tau Z$$
where $Z$ represents the total income level in equilibrium, so that $\tau Z$ is the total tax revenue, which is rebated as a lump-sum transfer.
The government sets the linear tax to maximize the following:
$$\int_i G\left[u_i\left((1 - \tau)\,w_i l_i + \tau Z,\; l_i\right)\right]$$
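Notice that we do not have any government budget constraint since the entire revenue is rebated. Applying the Envelope theorem we get:
$$\begin{aligned}
\int_i G'(u_i)\,u_i'\left[-w_i l_i + Z - \tau\frac{dZ}{d(1-\tau)}\right] &= 0 \\
\int_i G'(u_i)\,u_i'\left[-z_i + Z - \frac{\tau}{1-\tau}Z\,\varepsilon_{z,1-\tau}\right] &= 0
\end{aligned}$$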
where the second line exploits the definition of the uncompensated elasticity of aggregate income with respect to the net-of-tax rate, $\varepsilon_{z,1-\tau} = \frac{dZ}{d(1-\tau)}\frac{1-\tau}{Z}$. Unlike $z_i$, we explicitly differentiate $Z$ since the individual does not choose $Z$, but takes the transfer as given. In other words, the agent does not internalize the effect of her labor supply choice on aggregate revenues and transfers. This is why the Envelope theorem does not apply to $Z$.
The two terms in the expression above are central in the optimal taxation literature:
• $Z - z_i$ is the mechanical effect of the tax. Holding labor supply fixed, an increase in $\tau$ generates a drop in income of $z_i$ and a mechanical increase in transfers of $Z$ due to higher revenues.
• $-\frac{\tau}{1-\tau}Z\,\varepsilon_{z,1-\tau}$ is the behavioral effect of the tax. If we allow individuals to adjust their labor supply, we have to take into account the fiscal externality on revenues: when people work less, the government collects lower revenues.
We might expect to see in the formula the utility consequences of the change in labor supply. However, any welfare effect related to the behavioral response of the individual is excluded. The reason is that, although the agent changes her labor supply, for a small tax change we can neglect the resulting utility effect by invoking the envelope theorem: since the agent was already choosing labor optimally, the utility impact of her reoptimization is of second order.
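Rearranging the optimality condition we find:
$$Z\int_i g_i - \int_i g_i z_i = \frac{\tau}{1-\tau}Z\,\varepsilon_{z,1-\tau}\int_i g_i$$
$$1 - \frac{\int_i g_i z_i}{Z\int_i g_i} = \frac{\tau}{1-\tau}\varepsilon_{z,1-\tau}$$
We define $\bar g = \frac{\int_i g_i z_i}{Z\int_i g_i}$ and rewrite the condition above to get the optimal tax rate:
$$\tau^* = \frac{1 - \bar g}{1 - \bar g + \varepsilon_{z,1-\tau}}$$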
The optimal tax is decreasing in "z,1 ⌧ and ḡ. When income is very elastic to taxes, the government
will tax less to avoid negative effects on revenues and transfers coming from distortions to the labor
supply. This is the efficiency part of the formula. On the other hand, ḡ is a measure of inequality in
the economy. It is low when income is extremely polarized. Therefore, the government increases taxes
at the optimum when inequality is high. This is the equity part of the formula.
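The formula is easy to evaluate once $\bar g$ is known. The sketch computes $\bar g$ from a hypothetical discrete income sample with hypothetical welfare weights, treating the weights as given (in equilibrium they depend on $\tau$ through consumption, which this sketch ignores), and then applies $\tau^* = (1-\bar g)/(1-\bar g+\varepsilon)$.

```python
def g_bar(incomes, weights):
    """Welfare-weighted average income relative to average income: sum(g*z) / (Z * sum(g))."""
    Z = sum(incomes) / len(incomes)
    return sum(g * z for g, z in zip(weights, incomes)) / (Z * sum(weights))

def optimal_linear_tax(gbar, eps):
    return (1 - gbar) / (1 - gbar + eps)

incomes = [10.0, 20.0, 30.0, 60.0]   # hypothetical
weights = [4.0, 2.0, 1.0, 0.5]       # hypothetical, decreasing in income
gb = g_bar(incomes, weights)
for eps in (0.25, 0.5, 1.0):
    print(eps, round(gb, 4), round(optimal_linear_tax(gb, eps), 4))
```

With equal weights $\bar g = 1$ and the optimal rate is zero: without a redistributive motive, the tax only generates efficiency costs.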
2.5 Optimal Top Income Taxation
We now derive taxes as in Saez (2001). Instead of specifying a full model, we consider the different effects of a tax change and derive the optimal tax by imposing that their sum is zero at the optimum. Suppose the government wants to optimally set a constant marginal tax rate $\tau$ above an income threshold $z^*$. The average income above $z^*$ is denoted by $z(1-\tau)$ and it depends on the tax rate in place. The uncompensated elasticity of $z$ for top earners is constant and denoted by $\varepsilon_{z,1-\tau}$.
When the tax $\tau$ is raised there are no effects on individuals with income below $z^*$, while all incomes above $z^*$ are affected by the change. We study three different effects of the tax.
Mechanical Effect Suppose labor supply were inelastic. When $\tau$ increases we would see a mechanical increase in revenues of the following form:
$$dM = d\tau\,(z - z^*)$$
The mechanical effect is proportional to the difference between the average income above $z^*$ and $z^*$. It measures the mechanical increase in revenues generated by the tax change.
Behavioral Effect Top earners react to the tax increase by adjusting their labor supply. The behavioral response triggers a fiscal externality and a reduction in revenues. The behavioral effect is:
$$\begin{aligned}
dB = \tau\,dz &= -\tau\frac{dz}{d(1-\tau)}\,d\tau \\
&= -\frac{\tau}{1-\tau}\frac{1-\tau}{z}\frac{dz}{d(1-\tau)}\,z\,d\tau \\
&= -\frac{\tau}{1-\tau}\varepsilon_{z,1-\tau}\,z\,d\tau
\end{aligned}$$
It is proportional to the elasticity of labor supply: the more elastic labor supply is, the larger the revenue loss.
Welfare Effect Denote with $\bar g$ the (assumed) constant social marginal welfare weight for earners above $z^*$. The tax change mechanically raises revenues on top income individuals, generating the following welfare effect:
$$dW = -d\tau\,\bar g\,(z - z^*)$$
The tax increase also triggers a behavioral response. The reason it is not included in the welfare effect is that, if the tax change is small, people reoptimize at the margin and their utility level is unaffected to first order. Again, this is an Envelope theorem argument.
Optimal Tax At the optimum the three effects must sum to zero. If they did not, the government could adjust the tax rate and achieve a higher social welfare. We therefore have:
$$dM + dB + dW = d\tau\left[(1 - \bar g)(z - z^*) - \frac{\tau}{1-\tau}\varepsilon_{z,1-\tau}\,z\right] = 0$$
Rearranging:
$$\tau^* = \frac{1 - \bar g}{1 - \bar g + a\,\varepsilon_{z,1-\tau}}$$
with $a = \frac{z}{z - z^*}$ measuring the thinness of the right tail of the income distribution. The optimal tax is decreasing in the social marginal welfare weight of top earners $\bar g$: the more the government cares about top income individuals, the less they are taxed. As we could expect, the optimal tax is also decreasing in the elasticity of labor supply: higher elasticity implies larger efficiency costs. Finally, $\tau^*$ decreases in $a$. The shape of the income distribution matters: the government sets lower top income taxes when earners above $z^*$ are mostly concentrated around $z^*$ (so that $a$ is large). If instead the tail is thicker, the top income tax is higher.
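A numerical sketch of the top-rate formula. It assumes, for illustration, that incomes above $z^*$ are Pareto distributed with tail parameter $\alpha > 1$, in which case the mean above $z^*$ is $\alpha z^*/(\alpha - 1)$ and hence $a = \alpha$. The parameter values ($\bar g = 0$, $\varepsilon = 0.25$, $\alpha = 1.5$) are illustrative assumptions, not estimates from the notes. The last computation confirms that $dM + dB + dW = 0$ at $\tau^*$.

```python
def top_tax_rate(gbar, eps, a):
    return (1 - gbar) / (1 - gbar + a * eps)

def pareto_mean_above(z_star, alpha):
    """Mean income above z_star when the tail is Pareto with parameter alpha > 1 (assumed)."""
    return alpha * z_star / (alpha - 1)

def net_welfare_effect(tau, gbar, eps, z_mean, z_star):
    """(dM + dB + dW) / dtau per top earner."""
    return (1 - gbar) * (z_mean - z_star) - tau / (1 - tau) * eps * z_mean

z_star, alpha, eps, gbar = 1.0, 1.5, 0.25, 0.0
z_mean = pareto_mean_above(z_star, alpha)
a = z_mean / (z_mean - z_star)              # equals alpha for a Pareto tail
tau = top_tax_rate(gbar, eps, a)
print(a, round(tau, 4), net_welfare_effect(tau, gbar, eps, z_mean, z_star))
```

Below $\tau^*$ the net effect is positive (raising the rate improves welfare); above it, negative.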
References
Piketty, Thomas and Emmanuel Saez. “Optimal Labor Income Taxation,” Handbook of Public Economics, Volume 5, Amsterdam: Elsevier-North Holland, 2013.
3 Section 3-4: Mirrlees Taxation
In this section we solve the Mirrlees tax problem. We derive optimal taxes, introducing the concept of wedges, and study the model with and without income effects.
Revelation Principle Throughout all of the tax problems that we study, we assume that the government observes neither the labor choice of the agent nor her type. Income is the only observed
choice that the government can target. We solve the model using a revelation mechanism. Our goal is
to define an optimal tax schedule that delivers an allocation z (n), c (n) to each agent n. The Revelation
Principle claims that if an allocation can be implemented through some mechanism, then it can also
be implemented through a direct truthful mechanism where the agent reveals her information about
n.
We imagine that each agent reports to the government her type $n'$ and that allocations are a function of $n'$, so that we can write $c(n')$, $l(n')$, $z(n')$ and $u(n')$. By the revelation principle, the government cannot do better than defining functions $c(n)$, $z(n)$ such that the agent finds it optimal to reveal her true productivity:
$$c(n) - v\left(\frac{z(n)}{n}\right) \ge c(n') - v\left(\frac{z(n')}{n}\right)$$
for every $n$ and $n'$, where $n$ is the true type of the agent. Notice that since $n$ is continuous we have an infinity of constraints. In order to reduce the dimensionality of the problem, we assume that the marginal rate of substitution between consumption and before-tax income is decreasing in $n$:
$$MRS_{cz} = \frac{v'(z(n)/n)}{n\,u'(c(n))} \text{ decreases in } n$$
This is the so called single-crossing condition (or Spence-Mirrlees condition). Single-crossing and
incentive compatibility imply the monotonicity of allocations (i.e. c (n), z (n) are increasing in n). If
monotonicity and single-crossing are satisfied, we can replace the incentive constraint with the first-
order necessary conditions of the agent that provide a local incentive condition. Under monotonicity
and single-crossing the local conditions are also sufficient. While solving these problems we will only
impose local incentive constraints and ignore the monotonicity of allocations, which is then verified
ex-post.
Incentive Compatibility We reduce the dimensionality of the problem by taking a first order approach that replaces the infinity of constraints for each individual with a local condition relying on the optimal revelation choice. When reporting, the individual of type $n$ solves the following problem:
$$\max_{n'}\; c(n') - v\left(\frac{z(n')}{n}\right)$$
The first order necessary condition for this problem is:
$$c'(n') - \frac{z'(n')}{n}v'\left(\frac{z(n')}{n}\right) = 0$$
If the government wants the agent to reveal her true type, it must be that:
$$c'(n) = \frac{z'(n)}{n}v'\left(\frac{z(n)}{n}\right)$$
Under the concavity assumption on the preferences, this is a global incentive constraint condition.
To study local utility changes, we totally differentiate the utility with respect to $n$ and get:
$$\frac{du(n)}{dn} = \left(c'(n) - \frac{z'(n)}{n}v'\left(\frac{z(n)}{n}\right)\right) + \frac{z(n)}{n^2}v'\left(\frac{z(n)}{n}\right)$$
Notice that the term in the first bracket is the first order condition of the agent, which equals zero. We can thus write $du(n)/dn = \frac{z(n)}{n^2}v'(z(n)/n)$. This equation pins down the slope of the utility assigned to the agent at the optimum. Since $v(\cdot)$ is increasing ($v' > 0$), the slope is always positive: the government assigns higher utility to higher types at the optimum. Higher types have a lower marginal disutility of producing a given level of income, and they obtain informational rents in equilibrium.
Labor Supply and Labor Wedge The individual solves the following optimization problem:
$$\max_z\; z - T(z) - v\left(\frac{z}{n}\right)$$
The first order condition is:
$$T'(z) = 1 - \frac{v'(l)}{n}$$
The second term on the right-hand side of the equation is the marginal rate of substitution between consumption and income, and we can always write $T'(z(n)) = 1 - MRS(n)$. When the agent is not distorted, the $MRS$ is equal to 1, implying $T'(z) = 0$. We can interpret $T'(z)$ as a wedge on the optimal labor supply: whenever it is different from zero, labor supply is distorted. Wedges are a central concept in the optimal taxation literature and we will encounter them throughout the class.
From the optimality condition, we can derive the elasticity of labor with respect to the net-of-tax wage. Rewrite the optimality condition as:
$$v'\left(\frac{z}{n}\right) = (1 - T'(z))\,n$$
Totally differentiating with respect to $(1 - T'(z))\,n$, we have:
$$v''(l)\,\frac{dl}{d\left[(1 - T'(z))\,n\right]} = 1$$
which implies the following elasticity:
$$\varepsilon = \frac{dl}{d\left[(1 - T'(z))\,n\right]}\frac{(1 - T'(z))\,n}{l} = \frac{v'(l)}{l\,v''(l)}$$
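The elasticity formula $\varepsilon = v'(l)/(l\,v''(l))$ holds for any convex disutility, not only isoelastic ones. The sketch assumes the hypothetical $v(l) = l^2/2 + l^4/4$, solves the first order condition $v'(l) = \omega$ for the net-of-tax wage $\omega = (1 - T'(z))\,n$ by bisection, and compares a numerical log-derivative of labor supply with the formula. Here the elasticity is not constant: it falls with $\omega$.

```python
import math

def v_prime(l):
    return l + l**3            # v(l) = l**2/2 + l**4/4 (assumed disutility)

def v_second(l):
    return 1 + 3 * l**2

def labor_from_foc(omega):
    """Solve v'(l) = omega by bisection; v'(l) >= l guarantees l <= omega."""
    lo, hi = 0.0, max(1.0, omega)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if v_prime(mid) < omega:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def elasticity_numeric(omega, h=1e-5):
    up = labor_from_foc(omega * math.exp(h))
    dn = labor_from_foc(omega * math.exp(-h))
    return (math.log(up) - math.log(dn)) / (2 * h)

def elasticity_formula(omega):
    l = labor_from_foc(omega)
    return v_prime(l) / (l * v_second(l))

for omega in (0.5, 1.0, 2.0):
    print(omega, round(elasticity_numeric(omega), 6), round(elasticity_formula(omega), 6))
```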
Resource Constraint Suppose the government has an exogenous revenue requirement $E$. The revenues collected through taxation must be at least equal to $E$. Using the agent’s budget constraint, we can write the tax levied on a single individual as $T(z(n)) = z(n) - c(n)$. Summing over all the individuals in the economy we get:
$$\int_{\underline n}^{\bar n} c(n) f(n)\,dn \le \int_{\underline n}^{\bar n} z(n) f(n)\,dn - E$$
This is the resource constraint for this economy. Notice that, unlike the incentive constraints (one for each type), there is a single resource constraint.
3.2 Optimal Income Tax
We now solve the constrained maximization problem using optimal control theory. Instead of having taxes as a choice variable, we assume that the government chooses an allocation for each agent. Given the individual’s budget constraint, this is equivalent to choosing a tax level. The government problem is:
$$\max_{c(n),u(n),z(n)} \int_{\underline n}^{\bar n} G(u(n))\,f(n)\,dn$$
s.t.
$$\frac{du(n)}{dn} = \frac{z(n)}{n^2}v'\left(\frac{z(n)}{n}\right)$$
$$\int_{\underline n}^{\bar n} c(n) f(n)\,dn \le \int_{\underline n}^{\bar n} z(n) f(n)\,dn - E$$
We solve the problem with a Hamiltonian, where we interpret $n$ as the continuous ("time") variable and choose $u(n)$ as the state variable and $z(n)$ as the control. The incentive constraint becomes the law of motion of the state variable: it measures how utility changes across types in equilibrium. In order to set up the Hamiltonian, we need to replace consumption in the resource constraint with a function of the state and control variables. Using the definition of indirect utility, we can write $c(n) = u(n) + v(z(n)/n)$. We replace this condition into the resource constraint and set up the following Hamiltonian:
$$H = \left[G(u(n)) + \lambda\left(z(n) - u(n) - v\left(\frac{z(n)}{n}\right)\right)\right]f(n) + \mu(n)\frac{z(n)}{n^2}v'\left(\frac{z(n)}{n}\right)$$
$\mu(n)$ denotes the multiplier on the incentive constraint of type $n$ and $\lambda$ is the multiplier on the resource constraint.
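The first order conditions of the optimal control problem are:
$$\frac{\partial H}{\partial z(n)} = \lambda\left(1 - \frac{v'(l(n))}{n}\right)f(n) + \frac{\mu(n)}{n^2}\left[v'\left(\frac{z(n)}{n}\right) + \frac{z(n)}{n}v''\left(\frac{z(n)}{n}\right)\right] = 0 \qquad (1)$$
$$\frac{\partial H}{\partial u(n)} = \left[G'(u(n)) - \lambda\right]f(n) = -\mu'(n) \qquad (2)$$
The transversality (boundary) conditions read:
$$\mu(\underline n) = \mu(\bar n) = 0$$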
The Hamiltonian solution requires $\mu(\bar n)\,u(\bar n) = 0$. However, if we want to provide positive utility to type $\bar n$, we must require $\mu(\bar n) = 0$. At the same time, since at the optimum the incentive constraints are binding downwards, we require $\mu(\underline n) = 0$. As is standard in this kind of problem, the lowest type does not want to imitate any other agent in equilibrium, implying that her incentive constraint is slack, while everyone else is indifferent between her own allocation and the allocation provided to the immediately lower type.
If we integrate equation (2) over the entire type space and use the transversality conditions, we find:
$$\lambda = \int_{\underline n}^{\bar n} G'(u(n))\,f(n)\,dn$$
This is an expression for the marginal value of public funds to the government. It states that the value of public funds depends on the marginal social welfare gains across the entire type space, and it is equal to the welfare effect of transferring one dollar to every individual in the economy. In other words, public funds are more valuable the higher are the social welfare gains achievable in the economy.
We can also integrate equation (2) from $n$ to $\bar n$ to find the value of $\mu(n)$:
$$\mu(n) = \int_n^{\bar n}\left[G'(u(m)) - \lambda\right]f(m)\,dm \qquad (3)$$
Using the definition of labor elasticity, we can rewrite:
$$v'\left(\frac{z(n)}{n}\right) + \frac{z(n)}{n}v''\left(\frac{z(n)}{n}\right) = v'\left(\frac{z(n)}{n}\right)\left(1 + \frac{1}{\varepsilon}\right)$$
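Exploiting the definition of the tax wedge, $v'(z(n)/n)/n = 1 - T'(z(n))$, we simplify equation (1) to get:
$$T'(z(n)) = -\frac{\mu(n)}{\lambda\,n f(n)}\left(1 - T'(z(n))\right)\left(1 + \frac{1}{\varepsilon}\right)$$
By (3), $-\mu(n)/\lambda = \int_n^{\bar n}\left[1 - G'(u(m))/\lambda\right]f(m)\,dm$ is a welfare-weighted mass of types above $n$, so the ratio $-\mu(n)/(\lambda\,n f(n))$ compares the mass above type $n$ with the density at $n$. It is a measure of thickness of the distribution: the larger it is, the higher the marginal tax rate.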
By assumption $\int_{\underline n}^{\bar n}\psi(n)\,dn = 1$, which implies $\lambda = 1$. First order conditions can be derived exactly as before. We therefore have:
$$-\mu(n) = \int_n^{\bar n}\left(f(m) - \psi(m)\right)dm = \Psi(n) - F(n)$$
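Using the expression above, the tax formula reads:
$$\frac{T'(z(n))}{1 - T'(z(n))} = \left(\frac{1+\varepsilon}{\varepsilon}\right)\frac{\Psi(n) - F(n)}{n f(n)}$$
To write the ABC formula we divide and multiply by $1 - F(n)$ to get:
$$\frac{T'(z(n))}{1 - T'(z(n))} = \underbrace{\left(\frac{1+\varepsilon}{\varepsilon}\right)}_{A(n)}\underbrace{\frac{\Psi(n) - F(n)}{1 - F(n)}}_{B(n)}\underbrace{\frac{1 - F(n)}{n f(n)}}_{C(n)}$$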
$A(n)$ captures the standard elasticity and efficiency argument. $B(n)$ measures the desire for redistribution: if the sum of weights below $n$ is high relative to the mass above $n$, the government taxes more. Finally, $C(n)$ measures the thickness of the right tail of the distribution: a thicker tail is associated with higher tax rates.
Notice that in the Rawlsian case $\Psi(n) = 1$ for every $n > \underline n$, so that $B(n) = 1$ and the formula reduces to the one presented in the previous paragraph.
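The ABC decomposition can be evaluated for any assumed skill distribution and weights. The sketch assumes, for illustration, a Pareto skill distribution $F(n) = 1 - (\underline n/n)^{\alpha}$ and Rawlsian weights ($\Psi(n) = 1$ above the lowest type). Then $B(n) = 1$ and $C(n) = 1/\alpha$, so $T'/(1-T') = (1+\varepsilon)/(\varepsilon\alpha)$ at every $n$; with the illustrative values $\varepsilon = 0.5$ and $\alpha = 2$ this gives $T' = 0.6$.

```python
def abc_marginal_rate(n, eps, F, f, Psi):
    """Marginal tax rate T'(z(n)) implied by T'/(1 - T') = A(n) * B(n) * C(n)."""
    A = (1 + eps) / eps
    B = (Psi(n) - F(n)) / (1 - F(n))
    C = (1 - F(n)) / (n * f(n))
    ratio = A * B * C
    return ratio / (1 + ratio)

n_low, alpha, eps = 1.0, 2.0, 0.5    # illustrative values

def pareto_cdf(n):
    return 1 - (n_low / n) ** alpha

def pareto_pdf(n):
    return alpha * n_low**alpha / n ** (alpha + 1)

def rawlsian_Psi(n):
    return 1.0                       # all welfare weight on the lowest type

for n in (1.5, 3.0, 10.0):
    print(n, round(abc_marginal_rate(n, eps, pareto_cdf, pareto_pdf, rawlsian_Psi), 6))
```

Passing `pareto_cdf` as `Psi` (weights proportional to the density, i.e. no redistributive motive) gives $B(n) = 0$ and a zero marginal rate.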
Elasticity of Labor Supply The optimality condition for the labor supply choice becomes:
$$\frac{v'(l)}{u'(c)} = (1 - T'(z))\,n$$
The uncompensated response of labor supply to the net-of-tax wage is:
$$\frac{\partial l^u}{\partial\left[(1 - T'(z))\,n\right]} = \frac{u'(c) + l\,(1 - T'(z))\,n\,u''(c)}{v''(l) - (1 - T'(z))^2 n^2\,u''(c)}$$
implying the following uncompensated elasticity:
$$\varepsilon^u = \frac{\dfrac{v'(l)}{l} + \dfrac{v'(l)^2}{u'(c)^2}u''(c)}{v''(l) - \dfrac{v'(l)^2}{u'(c)^2}u''(c)}$$
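The uncompensated elasticity formula can be checked against a direct solution. The sketch assumes hypothetical preferences $u(c) = \log c$ and $v(l) = l^2/2$, writes $\omega = (1 - T'(z))\,n$ for the net-of-tax wage, and takes consumption $c = \omega l + R$ with a hypothetical virtual income $R > 0$. The optimality condition $\omega/(\omega l + R) = l$ then has a closed-form root, and the numerical log-derivative of labor supply should match $\varepsilon^u$.

```python
import math

def labor(omega, R):
    """Positive root of omega*l**2 + R*l - omega = 0, i.e. omega*u'(c) = v'(l)."""
    return (-R + math.sqrt(R**2 + 4 * omega**2)) / (2 * omega)

def eps_u_formula(omega, R):
    l = labor(omega, R)
    c = omega * l + R
    v1, v2 = l, 1.0                  # v'(l), v''(l) for v(l) = l**2 / 2 (assumed)
    u1, u2 = 1 / c, -1 / c**2        # u'(c), u''(c) for u(c) = log c (assumed)
    ratio = v1**2 / u1**2
    return (v1 / l + ratio * u2) / (v2 - ratio * u2)

def eps_u_numeric(omega, R, h=1e-6):
    up = labor(omega * math.exp(h), R)
    dn = labor(omega * math.exp(-h), R)
    return (math.log(up) - math.log(dn)) / (2 * h)

for omega, R in ((2.0, 1.0), (1.0, 0.5), (3.0, 4.0)):
    print(omega, R, round(eps_u_formula(omega, R), 6), round(eps_u_numeric(omega, R), 6))
```

For these preferences the formula simplifies to $(1 - l^2)/(1 + l^2)$, which makes the role of the income effect (the $u''(c)$ terms) easy to see.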
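Optimal Tax Everything is similar to the previous case, except that now we cannot replace the variable $c(n)$ in the resource constraint using the definition of indirect utility. We define consumption through an expenditure function $\tilde c(\tilde u(n), z(n), n)$ and implicitly differentiate it with respect to $\tilde u(n)$ and $z(n)$. Start from the definition of indirect utility:
$$\tilde u(n) = u(\tilde c(n)) - v\left(\frac{z(n)}{n}\right)$$
It follows that the following two conditions hold at the optimum:
$$d\tilde u(n) = u'(\tilde c(n))\,d\tilde c(n) \;\;(\text{for } dz(n) = 0), \qquad 0 = u'(\tilde c(n))\,d\tilde c(n) - \frac{1}{n}v'\left(\frac{z^*(n)}{n}\right)dz^*(n) \;\;(\text{for } d\tilde u(n) = 0)$$
Rearranging:
$$\frac{d\tilde c(n)}{d\tilde u(n)} = \frac{1}{u'(\tilde c(n))}, \qquad \frac{d\tilde c(n)}{dz(n)} = \frac{v'(z(n)/n)}{n\,u'(\tilde c(n))}$$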
$$\frac{\partial H}{\partial u(n)} = \left[G'(u(n)) - \frac{\lambda}{u'(c^*(n))}\right]f(n) = -\mu'(n)$$
In order to find the equilibrium value of the multiplier, we can integrate the second FOC:
$$\mu(n) = \int_n^{\bar n}\left[G'(u(m)) - \frac{\lambda}{u'(c(m))}\right]f(m)\,dm$$
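We exploit the definition of the two elasticities to write:
$$v'\!\left(\frac{z(n)}{n}\right) + \frac{z(n)}{n}\,v''\!\left(\frac{z(n)}{n}\right) = v'\!\left(\frac{z(n)}{n}\right)\left[1 + \frac{\frac{z(n)}{n}\,v''\!\left(\frac{z(n)}{n}\right)}{v'\!\left(\frac{z(n)}{n}\right)}\right] = v'\!\left(\frac{z(n)}{n}\right)\left(\frac{1+\varepsilon^u}{\varepsilon^c}\right)$$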
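The step above relies on the identity $(1+\varepsilon^u)/\varepsilon^c = 1 + l\,v''(l)/v'(l)$ for separable preferences. The sketch below checks it numerically under the same assumed forms as before ($u = \log c$, $v = l^2/2$, $c = wl + R$, none of which come from the notes), computing $\varepsilon^c$ from the Slutsky equation in elasticity form, $\varepsilon^c = \varepsilon^u - w\,\partial l/\partial R$. With $v = l^2/2$ the right-hand side equals 2 at every $l$. Function names are hypothetical.

```python
import math


def labor_supply(w, R):
    """l solving l * (w*l + R) = w, i.e. u = log c, v = l**2 / 2, c = w*l + R."""
    return (-R + math.sqrt(R * R + 4.0 * w * w)) / (2.0 * w)


def elasticities(w, R, h=1e-6):
    """Labor supply, uncompensated and compensated elasticities by finite differences."""
    l = labor_supply(w, R)
    dl_dw = (labor_supply(w + h, R) - labor_supply(w - h, R)) / (2.0 * h)
    dl_dR = (labor_supply(w, R + h) - labor_supply(w, R - h)) / (2.0 * h)
    eps_u = w / l * dl_dw
    income_effect = w * dl_dR        # negative when leisure is a normal good
    eps_c = eps_u - income_effect    # Slutsky equation in elasticity form
    return l, eps_u, eps_c


if __name__ == "__main__":
    l, eps_u, eps_c = elasticities(2.0, 1.0)
    lhs = (1.0 + eps_u) / eps_c
    rhs = 1.0 + l * 1.0 / l          # 1 + l v''(l) / v'(l) with v' = l, v'' = 1
    print(lhs, rhs)
```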
The optimal tax formula will then become:
$$\frac{T'(z(n))}{1-T'(z(n))} = \left(\frac{1+\varepsilon^u}{\varepsilon^c}\right)\frac{\eta(n)}{nf(n)} \qquad (5)$$
where $\eta(n) = u'(c(n))\,\mu(n)/\lambda$.
3.5 Pareto Efficient Taxes
We now ask whether a tax system $T_0(z)$ in place is Pareto optimal, meaning that there exists no feasible adjustment in the tax schedule that makes all individuals in the economy weakly better off and some strictly better off.
We can characterize the Pareto frontier of the previous problem by solving the following:
$$\max\int_{\underline{n}}^{\bar n}u(n)\,\psi(n)\,dn \quad \text{s.t.} \quad \int_{\underline{n}}^{\bar n}\left[z(n) - c(n)\right]f(n)\,dn \ge E$$
By varying the social marginal welfare weights, we can trace out every point on the Pareto frontier. However, a given tax schedule in place need not lie on the frontier: it may be possible to improve upon it by increasing the utility of all the agents in the economy.
Werning (2007) develops a test for the Pareto optimality of a tax schedule. The first important
result of the paper is the following:
Proposition 1: A tax code fails to be constrained Pareto optimal if and only if there exists a
feasible tax reform that (weakly) reduces taxes at all incomes.3
Proof: (if) Suppose we weakly reduce taxes everywhere in the economy; then every individual is at least as well off.
(only if) Suppose there exists a Pareto-improving feasible tax reform $T_1(z)$. Then we have:
$$U\left(z_1(n) - T_1(z_1(n)), z_1(n), n\right) \ge U\left(z_0(n) - T_0(z_0(n)), z_0(n), n\right) \ge U\left(z_1(n) - T_0(z_1(n)), z_1(n), n\right)$$
where the first inequality comes from the assumption of Pareto improvement and the second from the assumption that under $T_0(z)$ the agent truthfully reveals her type and chooses $z_0(n)$. The chain of inequalities implies that $T_1(z_1(n)) \le T_0(z_1(n))$ for every $n$.
Since both tax systems must satisfy the resource constraint and raise revenue of at least $E$, Proposition 1 implies that a Pareto improvement can only occur through a tax reduction that does not lower revenue. This can be interpreted as a Laffer effect: although the government lowers taxes, the behavioral response (an increase in labor supply) is strong enough to offset the revenue loss.
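s.t.
$$\frac{dv(n)}{dn} = U_n\left(\tilde c(v(n), z(n), n), z(n), n\right) \qquad (6)$$
$$\mathcal{L} = \int_{\underline{n}}^{\bar n}\left(z(n) - \tilde c(v(n), z(n), n)\right)f(n)\,dn + \int_{\underline{n}}^{\bar n}\psi(n)\,v(n)\,f(n)\,dn + \int_{\underline{n}}^{\bar n}\mu(n)\,v'(n)\,dn - \int_{\underline{n}}^{\bar n}\mu(n)\,U_n\left(\tilde c(v(n), z(n), n), z(n), n\right)dn$$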
Notice that this is identical to the Lagrangian that we would obtain in a classical optimal tax problem with welfare weights $\psi(n)f(n)$.5 We integrate $\int_{\underline{n}}^{\bar n}\mu(n)\,v'(n)\,dn$ by parts to obtain:
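$$\int_{\underline{n}}^{\bar n}\mu(n)\,v'(n)\,dn = \mu(\bar n)\,v(\bar n) - \mu(\underline{n})\,v(\underline{n}) - \int_{\underline{n}}^{\bar n}\mu'(n)\,v(n)\,dn$$
$$\mathcal{L} = \int_{\underline{n}}^{\bar n}\left(z(n) - \tilde c(v(n), z(n), n)\right)f(n)\,dn + \int_{\underline{n}}^{\bar n}\psi(n)\,v(n)\,f(n)\,dn + \mu(\bar n)\,v(\bar n) - \mu(\underline{n})\,v(\underline{n}) - \int_{\underline{n}}^{\bar n}\mu'(n)\,v(n)\,dn - \int_{\underline{n}}^{\bar n}\mu(n)\,U_n\left(\tilde c(v(n), z(n), n), z(n), n\right)dn$$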
$$\max\int_{\underline{n}}^{\bar n}v(n)\,\psi(n)\,f(n)\,dn$$
s.t.
$$\frac{dv(n)}{dn} = U_n\left(\tilde c(v(n), z(n), n), z(n), n\right)$$
$$\int_{\underline{n}}^{\bar n}\left[z(n) - \tilde c(v(n), z(n), n)\right]f(n)\,dn \ge E$$
$$-\frac{d\tilde c(v(n), z(n), n)}{dv(n)}\,f(n) - \mu'(n) - \mu(n)\,U_{nc}(n)\,\frac{d\tilde c(v(n), z(n), n)}{dv(n)} + \psi(n)\,f(n) = 0 \qquad (9)$$
We know from the paragraph about optimal taxation with income effects that we can write:
Also, since $T'(n) = 1 - MRS(n)$ (see the discussion about wedges), we have that:
$$T'(n)\,f(n) = \mu(n)\,U_c(n)\,\frac{\partial MRS(n)}{\partial n}$$
Using $MRS(n) = 1 - T'(n)$, we can rewrite the condition as:
Any Pareto efficient allocation must satisfy (7) and provide at least utility $\bar v(n)$ to every agent $n$. By the complementary-slackness condition, this is equivalent to requiring $\psi(n)f(n) \ge 0$, that is, that the multipliers associated with these constraints are never negative. We rewrite the FOC imposing the following inequality:
$$\hat\mu'(n) = U_c(n)\,\mu'(n) + \mu(n)\left[U_{cn}(n) + U_{cc}(n)\,c'(n) + U_{cz}(n)\,z'(n)\right]$$
Substituting into (10) we find:
The local incentive constraint of the agent (FOC for optimal reporting) implies that $c'(n)/z'(n) = -U_z(n)/U_c(n)$. It follows that:
Applying the Test and Interpreting the Conditions Suppose the agent has preferences $U(c, z, n) = c - \frac{1}{\gamma}(z/n)^{\gamma}$, with elasticity $\varepsilon = 1/(\gamma - 1)$. The FOCs of the dual problem read:
$$\frac{\varepsilon}{1+\varepsilon}\,\frac{\tau}{1-\tau}\left(-1 - \frac{nf'(n)}{f(n)}\right) \le 1 \qquad \forall n \qquad (15)$$
First, the condition in (15) shows that for any $\tau$ and $\varepsilon$ there exists a set of densities $f(n)$ such that $\tau$ is Pareto efficient and a set of $f(n)$ such that it is not. At the same time, for any $\varepsilon$ and $f(n)$ we can find flat tax rates $\tau$ that are efficient and flat tax rates that are inefficient. It follows that knowing the distribution of skills is crucial. The test can also be written in terms of income distributions, which are easier to infer from the data. A higher $\varepsilon$ makes the condition harder to satisfy: when individuals react more to changes in taxes, a tax reduction is more likely to lead to a Pareto improvement. When taxes are locally lowered at some $n$, the individuals below $n$ will tend to increase their labor supply and the individuals above will reduce theirs. The term $nf'(n)/f(n)$ measures the elasticity of the skill distribution and captures how fast the skill density is decreasing at $n$. A highly negative elasticity of the skill distribution at $n$ means that the density decreases fast and that the mass of individuals just below $n$ is significantly larger than just above $n$, implying a local Laffer effect from the increase in labor supply of individuals below $n$. In other words, by locally decreasing taxes at $n$ the government can increase revenues by encouraging the labor supply of the large mass of individuals below $n$. For this reason, when the elasticity of the skill distribution is highly negative the test is harder to pass.7
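Condition (15) is easy to check on a grid once the density elasticity $nf'(n)/f(n)$ is known. The sketch below is a minimal illustration under assumptions not in the notes: the density is given in closed form (a Pareto density, for which $nf'/f = -(a+1)$ everywhere, or a lognormal density, for which $nf'/f = -1 - (\ln n - m)/s^2$), and the condition is only checked on a finite grid of skills. All function names are hypothetical. For the Pareto case (15) reduces to $\tau \le (1+\varepsilon)/(1+\varepsilon+a\varepsilon)$, while for the lognormal case the left-hand side grows without bound in $n$, so any positive flat tax eventually fails on a wide enough grid.

```python
import math


def werning_lhs(tau, eps, dens_elast):
    """Left-hand side of (15) at one skill level; dens_elast = n f'(n) / f(n)."""
    return eps / (1.0 + eps) * tau / (1.0 - tau) * (-1.0 - dens_elast)


def flat_tax_passes(tau, eps, dens_elast_fn, grid):
    """True if condition (15) holds at every skill level on the grid."""
    return all(werning_lhs(tau, eps, dens_elast_fn(n)) <= 1.0 for n in grid)


def pareto_dens_elast(a):
    """Pareto density f(n) proportional to n**-(a+1): n f'(n)/f(n) = -(a+1)."""
    return lambda n: -(a + 1.0)


def lognormal_dens_elast(m, s):
    """Lognormal density: n f'(n)/f(n) = -1 - (ln n - m) / s**2."""
    return lambda n: -1.0 - (math.log(n) - m) / s**2


if __name__ == "__main__":
    grid = [0.5 + 0.05 * k for k in range(200)]
    print(flat_tax_passes(0.55, 0.5, pareto_dens_elast(2.0), grid))
    print(flat_tax_passes(0.65, 0.5, pareto_dens_elast(2.0), grid))
    print(flat_tax_passes(0.30, 0.5, lognormal_dens_elast(0.0, 0.5), grid))
```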
References
Diamond, Peter (1998), “Optimal Income Taxation: An Example with a U-Shaped Pattern of Optimal
Marginal Tax Rates”, American Economic Review, 88, 83–95.
Salanié, Bernard (2002), “The Economics of Taxation”, The MIT Press
Scheuer, Florian (2016), Lecture Notes
Werning, Ivan (2007), “Pareto Efficient Income Taxation”, Working Paper
7 In the second part of the sequence you will study how to run the same test in the inequality deflator framework
4 Section 5: Optimal Taxation with Income Effects and Bunching
In this section we will see the derivation of optimal income taxes proposed by Saez (2001). We will
also introduce a way to estimate the elasticity of reported income that exploits the degree of bunching
at the kinks of the existing tax schedule (Saez, 2010).8
$$\zeta^u_n = \frac{1-T'}{l_n}\,\frac{\partial l_n}{\partial(1-T')} \qquad (16)$$
$$\eta_n = (1-T')\,\frac{\partial l_n}{\partial R_n} \qquad (17)$$
Using the definitions above and the Slutsky equation, we can write the compensated elasticity as:
$$\zeta^c_n = \zeta^u_n - \eta_n \qquad (18)$$
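Notice that $\dot z_n = l_n + n\dot l_n$ and apply the definitions in (16) and (17) to get:
$$\dot l_n = \frac{l_n}{n}\,\zeta^u_n - \frac{l_n\,T''\,\dot z_n}{1-T'}\,\zeta^c_n \qquad (19)$$
Using (19) we can write:
$$\frac{\dot z_n}{z_n} = \frac{l_n + n\dot l_n}{nl_n} = \frac{1+\zeta^u_n}{n} - \dot z_n\,\frac{T''}{1-T'}\,\zeta^c_n \qquad (20)$$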
In order to derive the optimal tax, we follow the experiment in Saez (2001). Suppose we introduce a perturbation around the optimal tax schedule such that we raise the marginal tax rate by $d\tau$ in a small interval $[z^*, z^* + dz^*]$.9 As we have already seen in previous Sections, the tax has three main effects: mechanical, welfare and behavioral. While the mechanical and welfare effects are similar to the ones we have previously studied, the behavioral effect now consists of two components: an elasticity effect for people in the interval $[z^*, z^* + dz^*]$ and an income effect for taxpayers above $z^*$.
8 The bunching paragraph is based on previous notes by Simon Jager.
9 The experiment also assumes that $d\tau$ is second order relative to $dz^*$ to avoid any bunching at the kink.
Figure 1: Tax Reform Experiment
Mechanical and Welfare Effects Every taxpayer above $z^*$ will pay an extra $dz^*\,d\tau$ of taxes, which for welfare purposes are valued according to the social marginal welfare weights $g(z)$. The net-of-welfare mechanical effect for pre-tax income $z$ is $(1-g(z))\,dz^*\,d\tau$. Summing up over all incomes above $z^*$ we get:
$$M = dz^*\,d\tau\int_{z^*}^{\infty}(1-g(z))\,h(z)\,dz$$
Social marginal welfare weights $g(z)$ represent the value for the government of giving one dollar to an individual with income $z$. In particular, the government is indifferent between giving $1/g(z_1)$ dollars to individual 1 and $1/g(z_2)$ dollars to individual 2. The social marginal welfare weights are expressed in units of government money. Going back to the fully specified problem, we can interpret the weights as being normalized by $\lambda$, the multiplier on the resource constraint. $\lambda$ measures the value of transferring one dollar to every individual in the economy and captures the value of government funds. A higher $\lambda$ means that the government can significantly raise welfare by transferring money to the individuals in the economy; a low $\lambda$, on the other hand, implies that the gain from transfers is low and public funds are less valuable.
Elasticity Effect The increase in the marginal tax rate has an effect on individuals' labor supply that is denoted by $dz$. The effect consists of two parts. First, there is the direct consequence of the increase in taxes, which depends on the compensated elasticity of labor supply. Second, since the taxpayer changes her earnings by $dz$ and moves along the tax schedule, she faces an additional change in the marginal tax rate. We write the change in the marginal tax rate induced by the shift $dz$ as $dT' = T''dz$. The behavioral response is proportional to the total change $d\tau + dT'$:
$$dz = -\frac{\partial z}{\partial(1-T')}\left(d\tau + dT'\right) = -\zeta^c z^*\,\frac{d\tau + dT'}{1-T'}$$
Rearranging:
$$dz = -\zeta^c z^*\,\frac{d\tau}{1 - T' + \zeta^c z^*\,T''} \qquad (21)$$
Income Effect All individuals above $z^*$ face a parallel shift of the tax schedule and pay additional taxes of $dz^*\,d\tau$. The mechanical increase in taxes paid has a direct income effect that depends on the income parameter $\eta = (1-T')\,\partial z/\partial R$. Moreover, since the individual shifts along the tax schedule, we must take into account a further change in the marginal tax rate. The two effects combined are:
$$dz = -\zeta^c z\,\frac{T''\,dz}{1-T'} - \eta\,\frac{d\tau\,dz^*}{1-T'}$$
Rearranging:
$$dz = -\eta\,\frac{d\tau\,dz^*}{1 - T' + z\zeta^c T''}$$
In order to compute the total revenue effect we then need to sum over all taxpayers above $z^*$ and account for the marginal tax rate $T'$:
$$-\int_{z^*}^{\infty}\eta\,\frac{d\tau\,dz^*}{1 - T' + z\zeta^c T''}\,T'\,h(z)\,dz \qquad (22)$$
Virtual and Actual Income Density Saez (2001) introduces the concept of virtual density in order to simplify the tax formulas. The virtual density is closely related to virtual income in that it is the income density that would arise if the tax system were linear and tangent to the tax schedule $T(z)$ at every $z$. We denote the virtual density by $h^*(z)$. The mapping between the virtual density and the type distribution $f(n)$ is given by:
$$h^*(z)\,\dot z^* = f(n)$$
where $\dot z^*$ is the derivative of earnings with respect to $n$ when the linear tax schedule is in place. A similar relation holds for $h(z)$: $h(z)\,\dot z = f(n)$. Using (20) and the fact that $T'' = 0$ when a linear tax schedule is in place, we can write:
$$h(z_n)\,z_n\,\frac{1+\zeta^u_n}{n}\,\frac{1-T'}{1 - T' + z_n\zeta^c_n T''} = h^*(z_n)\,z_n\,\frac{1+\zeta^u_n}{n}$$
It follows that:
$$\frac{h^*(z)}{1-T'(z)} = \frac{h(z)}{1 - T'(z) + \zeta^c z\,T''(z)} \qquad (23)$$
Optimal Income Taxes Starting from (21) and using equation (23) we can write the elasticity effect as follows:
$$E = -h(z^*)\,dz^*\,T'\,\zeta^c z^*\,\frac{d\tau}{1 - T' + \zeta^c z^*\,T''} = -\zeta^c z^*\,\frac{T'}{1-T'}\,h^*(z^*)\,d\tau\,dz^* \qquad (24)$$
Notice that in order to get the revenue effect that the elasticity effect should measure, we multiplied the expression in (21) by the marginal tax rate $T'$ and by $h(z^*)\,dz^*$, which is the share of taxpayers affected by the tax reform. Using (23) we can write the income effect as follows:
$$I = -d\tau\,dz^*\int_{z^*}^{\infty}\eta\,\frac{T'}{1-T'}\,h^*(z)\,dz \qquad (25)$$
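At the optimum, the sum of the three effects must be zero. We thus impose:
$$M + E + I = 0$$
and find
$$dz^*\,d\tau\int_{z^*}^{\infty}(1-g(z))\,h(z)\,dz - \zeta^c z^*\,\frac{T'}{1-T'}\,h^*(z^*)\,d\tau\,dz^* - d\tau\,dz^*\int_{z^*}^{\infty}\eta\,\frac{T'}{1-T'}\,h^*(z)\,dz = 0$$
Rearranging:
$$\frac{T'}{1-T'} = \frac{1}{\zeta^c\,h^*(z^*)\,z^*}\left[\int_{z^*}^{\infty}(1-g(z))\,h(z)\,dz - \int_{z^*}^{\infty}\eta\,\frac{T'}{1-T'}\,h^*(z)\,dz\right]$$
$$= \frac{1}{\zeta^c}\,\frac{1-H(z^*)}{h^*(z^*)\,z^*}\left[\int_{z^*}^{\infty}(1-g(z))\,\frac{h(z)}{1-H(z^*)}\,dz - \int_{z^*}^{\infty}\eta\,\frac{T'}{1-T'}\,\frac{h^*(z)}{1-H(z^*)}\,dz\right] \qquad (26)$$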
As we have already seen in the previous Sections, the formula consists of several terms. The ratio $(1-H(z^*))/(z^*h^*(z^*))$ captures the shape of the income distribution: it measures how many people are above $z^*$ relative to how much income is accumulated at $z^*$ (i.e., $z^*h^*(z^*)$). The former is proportional to the mechanical increase in revenues, while the latter measures the total income that is distorted by the tax. The marginal tax rate is also decreasing in the compensated elasticity $\zeta^c$, following the classical efficiency argument, and it increases with the size of the income effect (in absolute value): a stronger income effect means that the negative fiscal externality from higher taxes is reduced, because taxpayers above $z^*$ respond to the tax increase by earning more.
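Formula (26) is implicit in $T'$ because of the income-effect term. The sketch below solves it for a top bracket under assumptions that the general derivation does not make: $T'$, $\zeta^c$ and $\eta$ are constant above $z^*$, the top bracket is linear ($T'' = 0$, so $h^* = h$), incomes follow a Pareto tail with parameter $a = zh(z)/(1-H(z))$, and $\bar g$ denotes the average of $g(z)$ above $z^*$. Under these assumptions (26) becomes $x = [(1-\bar g) - \eta x]/(\zeta^c a)$ with $x = T'/(1-T')$, whose solution is $T' = (1-\bar g)/(1-\bar g + a\zeta^c + \eta)$. The function names are hypothetical.

```python
def top_rate_closed_form(zeta_c, eta, a, g_bar):
    """T' solving (26) in a linear top bracket with a Pareto income tail."""
    return (1.0 - g_bar) / (1.0 - g_bar + a * zeta_c + eta)


def top_rate_fixed_point(zeta_c, eta, a, g_bar, tol=1e-12, max_iter=10_000):
    """Iterate x <- [(1 - g_bar) - eta * x] / (zeta_c * a), with x = T'/(1 - T').

    The map is a contraction when |eta| < zeta_c * a.
    """
    x = 0.0
    for _ in range(max_iter):
        x_new = ((1.0 - g_bar) - eta * x) / (zeta_c * a)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x_new / (1.0 + x_new)


if __name__ == "__main__":
    # Hypothetical values: zeta_c = 0.25, eta = -0.1, a = 2, zero weight at the top
    print(top_rate_closed_form(0.25, -0.1, 2.0, 0.0))
    print(top_rate_fixed_point(0.25, -0.1, 2.0, 0.0))
```

Substituting $\eta = \zeta^u - \zeta^c$ turns the closed form into the familiar top-rate expression $(1-\bar g)/(1-\bar g+\zeta^u+\zeta^c(a-1))$; a negative $\eta$ raises the rate, as argued above.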
Figure 2: Bunching
$$dz^* = -\left.\frac{dz}{d(1-t)}\right|_{z=z^H}d(1-t) = -e\,\frac{z^H}{1-t}\,d(1-t) = e\,(z^* + dz^*)\,\frac{dt}{1-t}$$
where $e$ is the compensated elasticity of income and $z^H = z^* + dz^*$ is the pre-reform income of the marginal buncher. Notice that $dz^*$ is proportional to the ratio between the change in the tax rate and the net-of-tax rate $1-t$. It follows that, everything else being equal, a change in marginal tax rates from 0 percent to 10 percent should produce the same amount of bunching as a change from 90 percent to 91 percent, since $dt/(1-t) = 0.1$ in both cases. Rearranging:
$$dz^*\left(\frac{1 - t - e\,dt}{1-t}\right) = e\,\frac{z^*}{1-t}\,dt$$
which implies:
$$dz^* = \frac{e\,z^*\,dt}{1 - t - e\,dt} \qquad (27)$$
It is not surprising that $dz^*$ is increasing in the elasticity of income: if income is more elastic, more people bunch.
Suppose the income density is locally continuous (approximately constant on $[z^*, z^* + dz^*]$). The share of people bunching at the kink is then:
$$s(z^*) = h(z^*)\,dz^*$$
so that
$$\frac{s(z^*)}{h(z^*)} = \frac{e\,z^*\,dt}{1 - t - e\,dt}$$
$$e\left(\frac{s(z^*)}{h(z^*)} + z^*\right)dt = (1-t)\,\frac{s(z^*)}{h(z^*)}$$
$$e = \frac{s(z^*)}{s(z^*) + z^*h(z^*)}\,\frac{1-t}{dt} \approx \frac{s(z^*)}{h(z^*)}\,\frac{1-t}{z^*\,dt}$$
where we assumed that $s(z^*)/(z^*h(z^*)) = dz^*/z^* \approx 0$. The formula shows that, using the share of people bunching at the kink and the income density that would arise in the no-reform scenario, we can estimate the compensated elasticity of reported income.
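The bunching estimator can be checked with a round trip: generate the bunching mass implied by (27) for a chosen elasticity and recover that elasticity from the exact formula. The numbers below (kink at 50,000, tax rate rising from 20 to 30 percent, a locally uniform density of $10^{-5}$) are hypothetical and only illustrate the algebra; the function names are also hypothetical.

```python
def bunching_width(e, z_star, t, dt):
    """Width dz* of the bunching interval, from (27)."""
    return e * z_star * dt / (1.0 - t - e * dt)


def elasticity_exact(s, h, z_star, t, dt):
    """e = s / (s + z* h) * (1 - t) / dt, inverting s = h * dz*."""
    return s / (s + z_star * h) * (1.0 - t) / dt


def elasticity_approx(s, h, z_star, t, dt):
    """Approximation that neglects s / (z* h) = dz*/z*."""
    return s / h * (1.0 - t) / (z_star * dt)


if __name__ == "__main__":
    e, z_star, t, dt, h = 0.3, 50_000.0, 0.2, 0.1, 1e-5
    s = h * bunching_width(e, z_star, t, dt)   # share of taxpayers bunching
    print(s, elasticity_exact(s, h, z_star, t, dt), elasticity_approx(s, h, z_star, t, dt))
```

The exact inversion recovers the elasticity used to generate the data, while the approximation slightly overstates it because it ignores $dz^*/z^*$.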
References
Saez, Emmanuel (2001), “Using Elasticities to Derive Optimal Income Tax Rates”, Review of Economic Studies, 68, 205–229.
Saez, Emmanuel (2010), “Do Taxpayers Bunch at Kink Points?”, American Economic Journal: Economic Policy, 2, 180–212.
5 Section 6: Optimal Income Transfers
In this Section we study the optimal design of income transfers. We start from a formal model where we specify individuals' preferences and a government's social welfare function. We then follow Saez (2002) and derive optimal transfers using an “experiment” in which we introduce a perturbation around the optimal tax schedule for a generic “occupation”, obtaining a formula for the optimal tax.
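We solve the problem with a Lagrangian where we attach the multiplier $\lambda$ to the government budget constraint. The FOC with respect to $T_i$ reads:
$$-\int_{M_i}\mu_m\,\frac{\partial u^m(c_{i^*}, i^*)}{\partial c_i}\,d\nu(m) + \lambda\left[h_i - \sum_{j=0}^{I}T_j\,\frac{\partial h_j}{\partial c_i}\right] = 0 \qquad (31)$$
By the usual envelope argument, equation (31) ignores the welfare effect of the occupational changes induced by a change in $c_i$: individuals who switch occupations are indifferent at the margin.
A social marginal welfare weight is:
$$g_i = \frac{1}{\lambda h_i}\int_{M_i}\mu_m\,\frac{\partial u^m(c_{i^*}, i^*)}{\partial c_i}\,d\nu(m) \qquad (32)$$
Using the definition of $g_i$ we can rewrite (31) as:
$$(1-g_i)\,h_i = \sum_{j=0}^{I}T_j\,\frac{\partial h_j}{\partial c_i} \qquad (33)$$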
This formula is very similar to the one you will see in the spring when studying Ramsey taxation.10 Take a benchmark case with no income effects, such that $h_j(c_0, \dots, c_I) = h_j(c_0 + R, \dots, c_I + R)$, which implies $\sum_i \partial h_j/\partial c_i = 0$ for every $j$. Summing (33) over all $i$ then gives $\sum_i (1-g_i)\,h_i = 0$, that is:
$$\sum_i h_i\,g_i = \sum_i h_i = 1 \qquad (34)$$
10 The formula implies that the following is true for every $i$:
$$\sum_{j=0}^{I}\frac{T_j}{h_i(1-g_i)}\,\frac{\partial h_j}{\partial c_i} = 1$$
Tax Experiment The same formula for optimal taxes can be derived through the following experiment. Suppose taxes increase by $dT_i$ for occupation $i$. The mechanical increase in tax revenues is $h_i\,dT_i$, which the government values at $(1-g_i)\,h_i\,dT_i$ once it takes into account the welfare effect of the change. The government must also account for the fiscal externality generated by the behavioral response of agents in occupation $i$. Using the participation elasticity $\eta_i$, the share of people leaving occupation $i$ is:
$$dh_i = -\eta_i\,\frac{h_i}{c_i - c_0}\,dT_i$$
Each worker leaving occupation $i$ generates a loss in revenues equal to $T_i - T_0$. The total behavioral effect of the tax increase is:
$$dh_i\,(T_i - T_0) = -\eta_i\,h_i\,\frac{T_i - T_0}{c_i - c_0}\,dT_i$$
The left-hand side of the condition in footnote 10 can be interpreted as an index of how much labor supply is discouraged. Since the condition holds for every $i$, it implies that discouragement is equalized across all occupations.
Summing the mechanical and behavioral effects at the optimum we get:
$$(1-g_i)\,h_i\,dT_i - \eta_i\,h_i\,\frac{T_i - T_0}{c_i - c_0}\,dT_i = 0$$
Rearranging, we obtain the optimal participation tax rate:
$$\frac{T_i - T_0}{c_i - c_0} = \frac{1 - g_i}{\eta_i} \qquad (35)$$
The decomposition of the formula into mechanical and behavioral effects provides further intuition for why marginal tax rates can be negative at the optimum. For very low incomes the net mechanical effect of providing an extra dollar is positive ($g_i > 1$) and, at the same time, a decrease in taxes at $i$ provides incentives for unemployed workers to enter the labor force. The sum of the two effects is unambiguously positive.
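Formula (35) maps welfare weights and participation elasticities directly into participation tax rates. The sketch below evaluates it for hypothetical values of $g_i$ and $\eta_i$ (not taken from the notes); the function name is also hypothetical.

```python
def participation_tax_rates(g, eta):
    """Optimal (T_i - T_0)/(c_i - c_0) = (1 - g_i)/eta_i for occupations i = 1..I.

    g   : social marginal welfare weights g_1..g_I
    eta : participation elasticities eta_1..eta_I
    """
    return [(1.0 - gi) / ei for gi, ei in zip(g, eta)]


if __name__ == "__main__":
    # Hypothetical low-, middle- and high-paid occupations
    rates = participation_tax_rates([1.2, 0.9, 0.6], [0.5, 0.4, 0.8])
    print(rates)
```

A weight above one (the first occupation) produces a negative participation tax, i.e. working is subsidized relative to not working.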
Summing over $i, i+1, \dots, I$ and using the definition in (36), we can derive the optimal tax formula:
$$\frac{T_i - T_{i-1}}{c_i - c_{i-1}} = \frac{1}{\zeta_i}\,\frac{(1-g_i)\,h_i + (1-g_{i+1})\,h_{i+1} + \dots + (1-g_I)\,h_I}{h_i} \qquad (37)$$
Non-increasing social marginal welfare weights imply that $(1-g_i)\,h_i + (1-g_{i+1})\,h_{i+1} + \dots + (1-g_I)\,h_I \ge 0$ for any $i > 0$. Thus, the tax $T_i$ is increasing in $i$ and it is not optimal to set negative marginal tax rates. Using (34) and (37) and computing the formula for the tax rate at the bottom of the income distribution, we get:
$$\frac{T_1 - T_0}{c_1 - c_0} = \frac{1}{\zeta_1}\,\frac{(g_0 - 1)\,h_0}{h_1} \qquad (38)$$
A higher social marginal welfare weight g0 implies a higher tax rate at the bottom. The reason is
that if the government cares more about the unemployed individual it should set the lump-sum transfer
T0 as large as possible by imposing large phasing-out tax rates at the bottom. Negative marginal
tax rates at the bottom can still occur for g0 < 1, but this would imply that the unemployed worker
has a lower welfare weight than the average taxpayer in the economy, meaning that the government
has unusual redistributive tastes.
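Formulas (37) and (38) can be evaluated for a small discrete economy. The sketch below uses hypothetical shares $h_i$, non-increasing weights $g_i$ normalized so that $\sum_i g_ih_i = 1$ as in (34), and unit intensive elasticities $\zeta_i$; none of these numbers come from the notes, and the function name is hypothetical. It returns the rates $(T_i - T_{i-1})/(c_i - c_{i-1})$; the corresponding marginal rates on earnings are $r/(1+r)$.

```python
def intensive_rates(h, g, zeta):
    """Optimal (T_i - T_{i-1})/(c_i - c_{i-1}) from (37), for i = 1..I.

    h[i]    : share of the population in occupation i (h[0] = non-workers)
    g[i]    : social marginal welfare weight of occupation i
    zeta[i] : intensive-margin elasticity of occupation i (zeta[0] unused)
    """
    I = len(h) - 1
    rates = []
    for i in range(1, I + 1):
        tail = sum((1.0 - g[j]) * h[j] for j in range(i, I + 1))
        rates.append(tail / (zeta[i] * h[i]))
    return rates


if __name__ == "__main__":
    h = [0.1, 0.3, 0.4, 0.2]
    g = [2.0, 1.2, 0.85, 0.5]          # non-increasing, sum(h * g) = 1
    zeta = [None, 1.0, 1.0, 1.0]
    r = intensive_rates(h, g, zeta)
    print(r, [x / (1.0 + x) for x in r])
```

The first entry coincides with (38), and all entries are non-negative because the weights are non-increasing.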
11 We can write the share of people working in occupation $i$ as $h_i(c_{i-1}, c_i, c_{i+1})$. No income effects imply $h(c_0, c_1, \dots, c_I) = h(c_0 + R, c_1 + R, \dots, c_I + R)$. Setting $R = -c_{i-1}$, it follows that $h_i$ depends only on the differences $c_i - c_{i-1}$ and $c_{i+1} - c_i$.
Tax Experiment The formula in (37) can be derived through an experiment in which taxes increase by $dT$ for all occupations $i, i+1, \dots, I$. This change decreases $c_i - c_{i-1}$ by $dT$ and leaves every other difference unaltered. The mechanical increase in revenues is $[h_i + h_{i+1} + \dots + h_I]\,dT$ and, net of welfare effects, it is valued $[h_i(1-g_i) + h_{i+1}(1-g_{i+1}) + \dots + h_I(1-g_I)]\,dT$. When we assume income effects away, the behavioral effect of the tax change arises only from individuals in occupation $i$. The change in employment in occupation $i$ is $dh_i = -h_i\zeta_i\,dT/(c_i - c_{i-1})$, and it must be scaled by the loss in revenues $T_i - T_{i-1}$ generated by each worker switching to occupation $i-1$. Summing the two impacts:
$$\left[h_i(1-g_i) + h_{i+1}(1-g_{i+1}) + \dots + h_I(1-g_I)\right]dT - h_i\,\zeta_i\,\frac{T_i - T_{i-1}}{c_i - c_{i-1}}\,dT = 0$$
Rearranging we get the formula in (37). The mechanical and behavioral effects help provide intuition for why negative marginal tax rates are not optimal with an intensive margin only. Suppose the government raised taxes at $i$ when there is a negative marginal tax rate in the interval $[i-1, i]$, so that $T_{i-1} > T_i$. Individuals would respond by shifting their labor supply to $i-1$ and, since $T_{i-1} > T_i$, would pay more taxes. At the same time the tax change would mechanically increase revenues. Therefore, the government could always improve welfare by increasing taxes as long as the marginal tax rate is negative.
When a tax is lowered in the pure extensive margin model, labor supply unambiguously increases. On the other hand, if a tax is decreased in a pure intensive margin model, individuals will have incentives to lower their labor supply. The formula shows how to optimally trade off the two effects.
Notice that (39) can be rewritten as (37), where we employ augmented social welfare weights $\hat g_i = g_i + \eta_i(T_i - T_0)/(c_i - c_0)$. When the participation elasticity is high enough, the augmented welfare weights are not necessarily decreasing in $w_i$, even if the $g_i$'s are. This explains why an earned income tax credit could be optimal in a mixed model.
References
Saez, Emmanuel (2002), “Optimal Income Transfer Programs: Intensive Versus Extensive Labor Supply Responses”, Quarterly Journal of Economics, 117, 1039–1073.
Diamond, Peter and Emmanuel Saez (2011), “The Case for a Progressive Tax: From Basic Research to Policy Recommendations”, Journal of Economic Perspectives, 25(4), 165–190.
6 Section 7: Optimal Top Income Taxation
In this Section we study the optimal design of top income taxes.12 We have already covered optimal top income taxation in a simple Mirrlees framework in Section 2. Today, we will start from the “trickle down” model with endogenous wages introduced by Stiglitz (1982). We then analyze an example of optimal taxation in a general equilibrium model where workers choose between different occupations/sectors (Rothschild and Scheuer, 2016). Finally, we present a model where top earners respond to taxes on three margins: labor supply, tax avoidance, and compensation bargaining (Piketty, Saez and Stantcheva, 2014).
$$w_i = \frac{\partial F(l_L, l_H)}{\partial l_i} \qquad (40)$$
The standard Mirrlees model implicitly assumes a linear production function $F(l_L, l_H) = \theta_L l_L + \theta_H l_H$, where $\theta_i$ is the ability of agent $i$, so that wages are $w_i = \theta_i$.
The resource constraint of this economy is:
$$\sum_i c_i \le F(l_L, l_H)$$
The government assigns linear welfare weights $\gamma_L$ and $\gamma_H$ to the two types. If $\gamma_H < \gamma_L$ the government wants to redistribute to low types, and we know that in equilibrium the incentive constraint for the high type is binding:
$$u(c_H, l_H) = u\left(c_L, \frac{w_L l_L}{w_H}\right) \qquad (41)$$
We solve the problem with the following Lagrangian:
$$\mathcal{L} = \gamma_L u(c_L, l_L) + \gamma_H u(c_H, l_H) + \lambda\left[F(l_L, l_H) - \sum_i c_i\right] + \mu\left[u(c_H, l_H) - u\left(c_L, \frac{w_L l_L}{w_H}\right)\right] + \sum_i \eta_i\left(w_i - F_i(l_L, l_H)\right) \qquad (42)$$
$\lambda$ is the marginal value of public funds, $\mu$ is the value of relaxing the incentive constraint for type $H$, and the $\eta_i$'s are the multipliers on the constraints in (40).
We derive the optimal marginal tax rate for the high type by optimally choosing $c_H$ and $l_H$. The FOCs are respectively:
$$[\gamma_H + \mu]\,u_c(c_H, l_H) = \lambda \qquad (43)$$
$$[\gamma_H + \mu]\,u_l(c_H, l_H) = -\lambda F_H(l_L, l_H) + \sum_i \eta_i F_{iH}(l_L, l_H) \qquad (44)$$
12 The first two paragraphs of this section are based on notes by Florian Scheuer.
The optimal labor supply choice implies the following labor wedge:
$$T'(z_H) = 1 + \frac{u_l(c_H, l_H)}{u_c(c_H, l_H)\,w_H}$$
Using (43) and (44) we can rewrite the labor wedge as follows:
$$T'(z_H) = 1 + \frac{-\lambda F_H(l_L, l_H) + \sum_i \eta_i F_{iH}(l_L, l_H)}{\lambda F_H(l_L, l_H)} = \frac{\sum_i \eta_i F_{iH}(l_L, l_H)}{\lambda F_H(l_L, l_H)} \qquad (45)$$
The sign of (45) depends on $\eta_L$ and $\eta_H$. In order to sign them, we exploit the government's optimal choice of $w_i$, characterized by the following FOCs:
$$-\mu\,u_l\left(c_L, \frac{w_L l_L}{w_H}\right)\frac{l_L}{w_H} + \eta_L = 0$$
$$\mu\,u_l\left(c_L, \frac{w_L l_L}{w_H}\right)\frac{w_L l_L}{w_H^2} + \eta_H = 0$$
Since $u_l < 0$, they imply $\eta_L < 0$ and $\eta_H > 0$. The CRS technology and concavity imply $F_{HL} > 0$ and $F_{HH} < 0$, which means complementarity between the two factors of production. Therefore, $\sum_i \eta_i F_{iH}(l_L, l_H) < 0$ and we conclude that $T'(z_H) < 0$. Top earners are subsidized at the margin because their labor raises the wages of lower earners. By closing the gap between the two wages, the government can relax the incentive constraint for the high type and allow for additional redistribution. The result is entirely driven by the complementarity of the two factors in the production function, which generates a “positive externality” of the high type on the low type. In the classical Mirrlees model with linear technology there is no complementarity and top incomes are not subsidized.
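The sign argument can be verified numerically. The sketch below assumes a Cobb-Douglas technology $F = l_L^{\alpha}l_H^{1-\alpha}$ (CRS with complementary inputs) and hypothetical values for $\mu$, $\lambda$ and the mimicker's marginal disutility $u_l(c_L, w_Ll_L/w_H) < 0$; none of these come from the notes. It plugs $\eta_L$ and $\eta_H$ from the FOCs for $w_L$ and $w_H$ into (45). Both terms of the numerator are negative: $\eta_L < 0$ multiplies $F_{LH} > 0$ and $\eta_H > 0$ multiplies $F_{HH} < 0$. Function names are hypothetical.

```python
def cobb_douglas_partials(lL, lH, alpha):
    """F = lL**alpha * lH**(1 - alpha): return F_H, F_LH and F_HH."""
    F = lL**alpha * lH**(1.0 - alpha)
    F_H = (1.0 - alpha) * F / lH
    F_LH = alpha * (1.0 - alpha) * F / (lL * lH)
    F_HH = -alpha * (1.0 - alpha) * F / lH**2
    return F_H, F_LH, F_HH


def top_wedge(lL, lH, alpha, mu, lam, u_l_mimic):
    """T'(z_H) from (45), with eta_L and eta_H taken from the FOCs for w_L and w_H.

    u_l_mimic is u_l(c_L, w_L l_L / w_H) < 0, the mimicker's marginal disutility.
    """
    F_H, F_LH, F_HH = cobb_douglas_partials(lL, lH, alpha)
    F = lL**alpha * lH**(1.0 - alpha)
    w_L, w_H = alpha * F / lL, F_H                 # competitive wages (40)
    eta_L = mu * u_l_mimic * lL / w_H              # negative
    eta_H = -mu * u_l_mimic * w_L * lL / w_H**2    # positive
    return (eta_L * F_LH + eta_H * F_HH) / (lam * F_H)


if __name__ == "__main__":
    print(top_wedge(1.0, 0.5, 0.6, mu=0.3, lam=1.0, u_l_mimic=-0.8))
```

Setting $\mu = 0$ (a slack incentive constraint) returns a zero wedge, consistent with the fact that the subsidy exists only to relax that constraint.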
$$E = \varphi_R\,e_R + P\,e_P$$
where $P$ is the fraction of productive workers working in the rent-seeking sector. Productive workers are indifferent between the two sectors when the wage in the rent-seeking sector is equal to 1 (the marginal product of labor in the productive sector), so that $\mu/E = 1$, implying $\mu = E$. If instead $E > \mu$ they would all work in the traditional sector, while if $E < \mu$ they would all choose the rent-seeking activity.
Suppose that preferences are quasi-linear, $u(c, e) = c - h(e)$. It can be shown that the optimal allocation involves an interior equilibrium where productive workers are indifferent between the two occupations. If $P$ is the share of productive workers working in the rent-seeking sector, we have:
$$E = \varphi_R\,e_R + P\,e_P = \mu$$
which implies that $P$ is:
$$P(e_R, e_P) = \frac{\mu - \varphi_R\,e_R}{e_P}$$
Given that the share of productive workers employed in the productive sector is $1 - P$, total output produced in the economy is:
$$Y = \mu + \left(1 - P(e_R, e_P)\right)e_P = e_P + \varphi_R\,e_R$$
If the government can observe and tax income through a nonlinear tax schedule but cannot tax occupational choices, we can solve the problem by choosing an optimal effort level for each type. The FOCs for effort are:
$$h'(e_P) = 1, \qquad h'(e_R) = \varphi_R$$
Notice that the two conditions imply a zero wedge on labor income for both types. Although rent-seekers are not productive at all, they are not taxed in equilibrium. The reason is that in this model rent-seekers are “indirectly productive”: they crowd productive workers out of the rent-seeking sector. If rent-seekers were taxed, productive workers would be attracted into the rent-seeking sector ($e_R$ would fall but $E = \mu$, so $P$ would have to increase to balance the change), which would decrease total production. Notice that the result does not depend on the assumption of utilitarian social preferences, but would hold for any other combination of social welfare weights.
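The crowding-out mechanism can be seen directly from the expressions for $P$ and $Y$. In the sketch below the parameter values are hypothetical and the function name is also hypothetical. Lowering rent-seekers' effort $e_R$ (as a tax on them would) raises the share $P$ of productive workers drawn into rent-seeking and lowers total output $Y = e_P + \varphi_R e_R$.

```python
def rent_seeking_equilibrium(mu, phi_R, e_R, e_P):
    """Share P of productive workers in rent-seeking and total output Y."""
    P = (mu - phi_R * e_R) / e_P
    if not 0.0 <= P <= 1.0:
        raise ValueError("no interior equilibrium for these parameters")
    Y = mu + (1.0 - P) * e_P          # equals e_P + phi_R * e_R
    return P, Y


if __name__ == "__main__":
    print(rent_seeking_equilibrium(1.0, 0.2, 1.5, 2.0))   # baseline effort
    print(rent_seeking_equilibrium(1.0, 0.2, 1.0, 2.0))   # rent-seekers work less
```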
This example shows how general equilibrium considerations might be extremely important in shap-
ing optimal marginal tax rates. Even under the extreme assumption that all top earners are rent-
seekers, general equilibrium considerations would put downward pressure on marginal tax rates at the
top to avoid attracting productive workers into rent-seeking.
The model with occupational choices can also be employed to study the “trickle down” effects in Stiglitz (1982) (see Rothschild and Scheuer, 2013). Allowing for occupational choices still pushes towards lower top marginal tax rates than in a standard Mirrlees model with linear production, but less so than in a world without occupational choice (as in Stiglitz, 1982). The reason is that, unlike in Stiglitz's model, if the government subsidizes high types, effort increases in the high-skill sector, decreasing wages in the high-skill sector and increasing wages in the low-skill sector. In a Roy model this would attract to the low-skilled sector some workers who were indifferent between the two sectors, reducing the increase in the low-skilled wage. This effect works against the standard general equilibrium effect presented in the previous paragraph, making trickle down less effective.
Wage Bargaining Suppose top earners have measure 1 and, after bargaining, receive a fraction $\eta$ of their output $y$ (where we allow for $\eta > 1$), so that their compensation is $z = \eta y$. Bargained earnings are $b = (\eta - 1)\,y$ and average bargained earnings in the economy are $E(b)$. In the aggregate, total product must equal total compensation. Hence, if $E(b) > 0$, so that there is overpay on average, $E(b)$ must come at the expense of somebody. The opposite is true if $E(b) < 0$. For simplicity, we assume that any gain made through bargaining comes uniformly at the expense of everybody else in the economy. Hence, individual incomes are all reduced by a uniform amount $E(b)$ if $E(b) > 0$.13 We further assume that individuals can exert effort to increase $\eta$ and that their preferences are $u_i = c - h_i(y) - k_i(\eta)$, where $c$ is after-tax income, so that the first-order conditions are:
$$(1-\tau)\,\eta = h_i'(y)$$
$$(1-\tau)\,y = k_i'(\eta)$$
with $\tau = T'(z)$. Let us denote average reported income, output and bargained earnings as Walrasian demands $z(1-\tau)$, $y(1-\tau)$ and $b(1-\tau)$. The implied elasticities are:
$$e_1 = \frac{1-\tau}{y}\,\frac{dy}{d(1-\tau)}$$
$$e = \frac{1-\tau}{z}\,\frac{dz}{d(1-\tau)}$$
$$e_2 = \frac{1-\tau}{z}\,\frac{db}{d(1-\tau)} = s\cdot e$$
with
$$s = \frac{db/d(1-\tau)}{dz/d(1-\tau)}$$
The definitions imply that $(y/z)\,e_1 = (1-s)\,e$ and that $e = (y/z)\,e_1 + e_2$.
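The decomposition of $e$ into an output response and a bargaining response holds for any demand functions. The sketch below checks it with finite differences for hypothetical constant-elasticity forms $y(1-\tau) = 100\,(1-\tau)^{0.2}$ and $b(1-\tau) = 20\,(1-\tau)^{0.5}$, which are not taken from the notes; $z = y + b$ by construction and the function names are hypothetical.

```python
def derivative(fn, r, h=1e-6):
    """Central finite-difference derivative of fn at r, where r = 1 - tau."""
    return (fn(r + h) - fn(r - h)) / (2.0 * h)


def decomposition(y_fn, b_fn, r):
    """Return y, z, e1, e, e2 and s at net-of-tax rate r, with z = y + b."""
    def z_fn(x):
        return y_fn(x) + b_fn(x)

    y, z = y_fn(r), z_fn(r)
    dy, db, dz = derivative(y_fn, r), derivative(b_fn, r), derivative(z_fn, r)
    e1 = r / y * dy          # output elasticity
    e = r / z * dz           # reported-income elasticity
    e2 = r / z * db          # bargaining elasticity
    s = db / dz              # share of the response due to bargaining
    return y, z, e1, e, e2, s


if __name__ == "__main__":
    y, z, e1, e, e2, s = decomposition(lambda r: 100 * r**0.2, lambda r: 20 * r**0.5, 0.6)
    print(e1, e, e2, s)
```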
When the social welfare weight on top incomes is zero, the government chooses the top tax rate to maximize total revenues:
$$\max_{\tau}\ \tau\left[z(1-\tau) - \bar z\right] - N\cdot E(b)$$
Tax Avoidance Responses to tax rates can also take the form of tax avoidance. Define tax avoidance as changes in reported income due to changes in the form of compensation, but not in the total level of compensation. We observe tax avoidance if taxpayers can shift part of their taxable income into another form that is treated more favorably from a tax perspective. Denote by $x$ total sheltered income, so that ordinary taxable income is $z = y - x$. Sheltered income is taxed at a constant marginal tax rate $t$. Suppose that the individual faces a utility cost of sheltering income, and that utility is $u_i(c, y, x) = c - h_i(y) - d_i(x)$, where $c = y - \tau z - tx + R = (1-\tau)\,y + (\tau - t)\,x + R$ and $R = \tau\bar z - T(\bar z)$ is virtual income. We can write Walrasian demands as $z(1-\tau, t) = y(1-\tau) - x(\tau - t)$. Let us define $e_3$, the elasticity of sheltered income:
$$e_3 = -\frac{1-\tau}{z}\,\frac{dx}{d(1-\tau)} = s\cdot e$$
where
$$s = \frac{dx/d(\tau - t)}{dy/d(1-\tau) + dx/d(\tau - t)} = \frac{dx/d(\tau - t)}{\partial z/\partial(1-\tau)}$$
and $e = (y/z)\,e_1 + e_3$.
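The government problem is:
$$\max_{\tau, t}\ \tau\left[z(1-\tau, t) - \bar z\right] + t\,x(\tau - t)$$
Suppose the government could only optimally set $\tau$ given some $t$; the FOC would be:
$$0 = [z - \bar z] - \tau\,\frac{\partial z}{\partial(1-\tau)} + t\,\frac{dx}{d(\tau - t)} = [z - \bar z] - \tau\,\frac{\partial z}{\partial(1-\tau)} + ts\,\frac{\partial z}{\partial(1-\tau)} = [z - \bar z] - \frac{\tau - ts}{1-\tau}\,e\,z$$
Rearranging, with $a = z/(z - \bar z)$:
$$\tau^* = \frac{1 + t\cdot a\cdot e_3}{1 + a\cdot e} \qquad (47)$$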
Notice how the tax rate increases with $t\cdot a\cdot e_3$, which captures the fiscal externality of tax avoidance: income shifted out of the ordinary base is still taxed at rate $t$. If $t = 0$ and the government cannot do anything to prevent income shifting, it is irrelevant whether $e$ is due to a real response or a tax-avoidance response (see Feldstein, 1999).
If instead the government could also optimally set $t$, we would have an extra optimality condition:
$$0 = \tau\,\frac{\partial z}{\partial t} + x - t\,\frac{dx}{d(\tau - t)} = x + (\tau - t)\,\frac{dx}{d(\tau - t)}$$
Since $x \ge 0$ and $dx/d(\tau - t) \ge 0$, the first-order condition can only hold if $\tau = t$. If this is the case, $x(\tau - t) = x(0) = 0$ and $z = y$, so that $e_3 = 0$ and $e = e_1$. If we replace this in (47) we obtain:
$$\tau^* = t^* = \frac{1}{1 + a\cdot e_1} \qquad (48)$$
Intuitively, the government finds it optimal to close any tax avoidance opportunity. When this is the case, the elasticity of real income $e_1$ is the only one that matters.
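Formulas (47) and (48) are simple enough to verify directly. The sketch below takes $a$, $e$, $s$ and $t$ as given numbers (hypothetical values, not from the notes), computes $\tau^*$ from (47) with $e_3 = s\cdot e$, and checks that it sets the $\tau$-FOC divided by $z$, namely $1/a - e(\tau - ts)/(1-\tau)$, to zero. Function names are hypothetical.

```python
def tau_given_t(t, a, e, e3):
    """Formula (47): optimal ordinary rate tau* given the sheltered-income rate t."""
    return (1.0 + t * a * e3) / (1.0 + a * e)


def foc_residual(tau, t, a, e, s):
    """The tau-FOC divided by z: (z - zbar)/z - (tau - t s)/(1 - tau) * e."""
    return 1.0 / a - (tau - t * s) / (1.0 - tau) * e


def tau_closed_loophole(a, e1):
    """Formula (48): tau* = t* = 1 / (1 + a e1) once avoidance is shut down."""
    return 1.0 / (1.0 + a * e1)


if __name__ == "__main__":
    a, e, s, t = 1.5, 0.5, 0.4, 0.2
    tau = tau_given_t(t, a, e, s * e)
    print(tau, foc_residual(tau, t, a, e, s), tau_closed_loophole(a, 0.3))
```

With $t = 0$ the formula collapses to $1/(1 + a e)$ regardless of how $e$ splits between real and avoidance responses, which is the Feldstein (1999) case discussed above.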
References
Feldstein, Martin. 1999. “Tax Avoidance and the Deadweight Loss of the Income Tax.” Review of
Economics and Statistics 81 (4): 674–80.
Piketty, Thomas, Emmanuel Saez, and Stefanie Stantcheva. 2014. “Optimal Taxation of Top Labor
Incomes: A Tale of Three Elasticities.” American Economic Journal: Economic Policy 6 (1): 230–71.
Rothschild, Casey and Florian Scheuer, “Redistributive Taxation in the Roy Model,” Quarterly Journal
of Economics, 2013, 128, 623–668.
Rothschild, Casey and Florian Scheuer, “Optimal Taxation with Rent-Seeking,” Review of Economic
Studies, 2016, forthcoming
Scheuer, Florian (2014), Lecture Notes
Stiglitz, Joseph, “Self-Selection and Pareto Efficient Taxation,” Journal of Public Economics, 1982, 17,
213–240.
7 Section 8: Optimal Minimum Wage and Introduction to Capital Taxation
In this Section we develop a theoretical analysis of optimal minimum wage policy in a perfectly com-
petitive labor market following Lee and Saez (2012).
$$w_i = \frac{\partial F(h_1, h_2)}{\partial h_i} \qquad (49)$$
A mass 1 of individuals has three labor supply options: i) not work and earn zero income, ii) work in occupation 1 and earn $w_1$, iii) work in occupation 2 and earn $w_2$. Individuals are heterogeneous in their tastes for work. Every individual faces a vector $\theta = (\theta_1, \theta_2)$ of work costs that is smoothly distributed across the entire population according to $H(\theta)$ with support $\Theta$. The government perfectly observes the wage $w_i$, but does not observe the cost of working. There are no savings and after-tax income equals consumption, so that $c_i = w_i - T_i$. Suppose there are no income effects and utility is linear in consumption:
$$u_i = c_i - \theta_i$$
The subset of individuals choosing occupation $i$ is $\Theta_i = \{\theta \in \Theta \mid u_i = \max_j u_j\}$. The fraction of the population working in occupation $i$ is $h_i(c) = |\Theta_i|$ and is a function of $c = (c_0, c_1, c_2)$. The tax system defines a competitive equilibrium $(h_1, h_2, w_1, w_2)$.
Equation (49) implies that $w_2/w_1 = F_2(1, h_2/h_1)/F_1(1, h_2/h_1)$. Constant returns to scale together with decreasing marginal productivity of each skill imply that the right-hand side is a decreasing function of $h_2/h_1$. Therefore, the function is invertible and the ratio $h_2/h_1$ can be written as a function of the wage ratio: $h_2/h_1 = \rho(w_2/w_1)$, with $\rho(\cdot)$ a decreasing function. Constant returns to scale also imply that there are no profits in equilibrium, $\Pi = F(h_1, h_2) - w_1 h_1 - w_2 h_2 = 0$, so that, dividing by $h_1$, $w_1 + w_2\,\rho(w_2/w_1) = F(1, \rho(w_2/w_1))$. This defines a decreasing mapping between $w_1$ and $w_2$, so that we can express $w_2$ as a decreasing function of $w_1$: $w_2(w_1)$.
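As a numerical sanity check of the zero-profit wage frontier, the sketch below assumes a Cobb-Douglas technology $F(h_1, h_2) = h_1^a h_2^{1-a}$ (an illustrative choice: the notes only require constant returns to scale and decreasing marginal products) with a hypothetical share `a`. It verifies that profits are zero and that moving along the frontier gives $dw_2/dw_1 = -h_1/h_2$, which follows from differentiating $w_1 h_1 + w_2 h_2 = F(h_1, h_2)$.

```python
# Sketch: wage frontier under an assumed Cobb-Douglas technology
# F(h1, h2) = h1**a * h2**(1 - a). The share A_SHARE is a hypothetical parameter.

A_SHARE = 0.6


def output(h1, h2, a=A_SHARE):
    """Constant-returns-to-scale output."""
    return h1 ** a * h2 ** (1 - a)


def wages(h1, h2, a=A_SHARE):
    """Competitive wages w_i = dF/dh_i, as in equation (49)."""
    w1 = a * (h2 / h1) ** (1 - a)
    w2 = (1 - a) * (h1 / h2) ** a
    return w1, w2


h1, h2 = 0.5, 0.3
w1, w2 = wages(h1, h2)

# Zero profits under constant returns: F - w1*h1 - w2*h2 = 0.
profit = output(h1, h2) - w1 * h1 - w2 * h2

# Move along the frontier by perturbing the input ratio h2/h1 slightly;
# the induced wage changes satisfy dw2/dw1 = -h1/h2 to first order.
eps = 1e-6
w1_new, w2_new = wages(h1, h2 * (1 + eps))
slope = (w2_new - w2) / (w1_new - w1)

print(f"profit = {profit:.2e}, slope = {slope:.4f}, -h1/h2 = {-h1 / h2:.4f}")
```

The finite-difference slope matches $-h_1/h_2$ up to an error of the order of `eps`.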
Labor demand and supply in the low-skilled market are $D_1(w_1)$ and $S_1(w_1)$, with $D_1'(w_1) \le 0$ and $S_1'(w_1) \ge 0$; both are defined assuming that the high-skilled labor market clears. The low-skilled labor demand elasticity is:
$$\eta_1 = -D_1'(w_1)\,\frac{w_1}{h_1}$$
The resource constraint of the economy is:
$$h_0 c_0 + h_1 c_1 + h_2 c_2 \le h_1 w_1 + h_2 w_2 \tag{50}$$
The government weights individual utilities through a social welfare function $G(\cdot)$, and we can write the social welfare of the economy as:
$$SW = (1 - h_1 - h_2)\, G(c_0) + \int_{\Theta_1} G(c_1 - \theta_1)\, dH(\theta) + \int_{\Theta_2} G(c_2 - \theta_2)\, dH(\theta) \tag{51}$$
We define social marginal welfare weights as usual: $g_0 = G'(c_0)/\lambda$ and $g_i = \int_{\Theta_i} G'(c_i - \theta_i)\, dH(\theta) / (\lambda h_i)$, where $\lambda$ is the marginal value of public funds. The concavity of the social welfare function implies $g_0 > g_1 > g_2$. Since there are no income effects, the value of transferring one dollar to everyone in the economy is one dollar, and we have $h_0 g_0 + h_1 g_1 + h_2 g_2 = 1$, with $h_0 = 1 - h_1 - h_2$.
Figure 3:
Minimum Wage with No Taxes Suppose there are no taxes and transfers, so that $c_0 = 0$, $c_1 = w_1$ and $c_2 = w_2$. Suppose the economy is in equilibrium and the government introduces a small minimum wage above the equilibrium wage of the low-skilled market, $\bar{w} = w_1^* + d\bar{w}$. The change generates a drop in low-skilled employment $h_1$. The workers who drop out of the low-skilled sector move either to unemployment or to the high-skilled sector, depending on their preferences. We assume efficient rationing: the workers who involuntarily lose their low-skilled jobs because of the minimum wage are those with the least surplus from working in the low-skilled sector.14 This is clearly the case most favorable to minimum wage policy. We establish the first result of the paper:
Proposition 1: With no taxes/transfers, if (i) efficient rationing holds; (ii) the government values redistribution from high-skilled workers toward low-skilled workers ($g_1 > g_2$); (iii) the demand elasticity $\eta_1$ for low-skilled workers is finite; and (iv) the supply elasticity of low-skilled workers is positive, then introducing a minimum wage increases social welfare.
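Consider the changes $dw_1$, $dw_2$, $dh_1$ and $dh_2$ following the increase in the minimum wage, with $dw_1 = d\bar{w}$. We have $d\Pi = \sum_i [(\partial F/\partial h_i)\, dh_i - w_i\, dh_i - h_i\, dw_i] = -h_1\, dw_1 - h_2\, dw_2$, and the no-profit condition implies:
$$h_1\, d\bar{w} + h_2\, dw_2 = 0 \tag{52}$$
The effect of the reform on social welfare is:
$$\frac{dSW}{d\bar{w}} = -G(0)\left[\frac{dh_1}{d\bar{w}} + \frac{dh_2}{d\bar{w}}\right] + G(0)\frac{dh_1}{d\bar{w}} + G(0)\frac{dh_2}{d\bar{w}} + \int_{\Theta_1} G'(c_1 - \theta_1)\, dH(\theta) + \int_{\Theta_2} G'(c_2 - \theta_2)\, dH(\theta)\,\frac{dw_2}{d\bar{w}}$$
14 The assumption can be relaxed and a working paper version of the paper shows how the model can be derived under
Figure 4: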
The second and third terms come from the assumption of efficient rationing: the workers moving to unemployment from either occupation are those with zero surplus from working, so the welfare loss associated with their change of occupation is zero. Also, those who drop out of occupation 1 and move to occupation 2 are indifferent between the two, and we can ignore the welfare effect of the switch by the envelope theorem. Using (52) we have $dw_2/d\bar{w} = -h_1/h_2$ and the derivative becomes:
$$\frac{dSW}{d\bar{w}} = \lambda h_1 [g_1 - g_2] > 0$$
which proves Proposition 1.
Minimum Wage with Taxes and Transfers We now assume that the government can use taxes and transfers jointly with the minimum wage.
Proposition 2: Under efficient rationing, assuming $\eta_1 < \infty$, if $g_1 > 1$ at the optimal tax allocation (with no minimum wage), then introducing a minimum wage is desirable. Furthermore, at the joint minimum wage and tax optimum we have: (i) $g_1 = 1$ (full redistribution to low-skilled workers); (ii) $h_0 g_0 + h_1 g_1 + h_2 g_2 = 1$ (social welfare weights average to one).
Suppose there is no minimum wage. An attempt to increase $c_1$ by $dc_1$ while keeping $c_0$ and $c_2$ constant, through an increased work subsidy, gives some non-workers an incentive to start working in occupation 1 (extensive labor supply response) and some workers in occupation 2 an incentive to switch to occupation 1 (intensive labor supply response). This leads to a reduction in $w_1$ through demand-side effects (as long as $\eta_1 < \infty$). See Figure 4.
Consider the same increase in $c_1$ when the minimum wage is initially set at $\bar{w} = w_1^T$, where $(w_i^T, c_i^T)$ is the optimal tax and transfer system that maximizes social welfare absent the minimum wage. Since $w_1$ cannot fall, labor supply responses are effectively blocked (Figure 5). Efficient rationing guarantees that individuals willing to leave occupation 1 are precisely those with the lowest surplus from working in occupation 1 relative to their next best option. Therefore, the $dc_1$ change is like a lump-sum tax reform, and its net welfare effect is simply $\lambda [g_1 - 1] h_1\, dc_1$. If $g_1 > 1$, the introduction of the minimum wage improves upon the tax/transfer optimum allocation. This result shows that under a minimum wage, redistribution to low-skilled workers can be made lump-sum. Furthermore, raising the lump-sum transfer to occupation 1 improves welfare as long as $g_1 > 1$, and therefore the government finds it optimal to do so until $g_1 = 1$. Similarly, with no behavioral responses, a one-dollar increase in the transfer to everyone costs one dollar and has a welfare effect of $h_0 g_0 + h_1 g_1 + h_2 g_2$; at the optimum the two are equal.
Figure 5:
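Suppose there is a minimum wage and the government introduces a change $dc_1$. The wage of occupation 1 does not change because of the minimum wage, and neither does $w_2$, given $w_2(w_1)$ (as shown above). As a consequence, there is no change in $h_2/h_1 = \rho(w_2/w_1)$ and, since employment in occupation 1 is pinned down by labor demand at the binding minimum wage, no change in the levels of $h_1$ and $h_2$ either. In the two derivatives of the government's Lagrangian $\mathcal{L}$ below, $c_1$ and $c_2$ denote consumption in excess of the transfer $c_0$, so that raising $c_0$ shifts everyone's consumption. Therefore:
$$\frac{d\mathcal{L}}{dc_1} = \int_{\Theta_1} G'(c_0 + c_1 - \theta_1)\, dH(\theta) - \lambda h_1 = \lambda [g_1 - 1] h_1$$
$$\frac{d\mathcal{L}}{dc_0} = (1 - h_1 - h_2)\, G'(c_0) + \int_{\Theta_1} G'(c_0 + c_1 - \theta_1)\, dH(\theta) + \int_{\Theta_2} G'(c_0 + c_2 - \theta_2)\, dH(\theta) - \lambda = \lambda [h_0 g_0 + h_1 g_1 + h_2 g_2 - 1]$$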
Pareto Improving Reform In this section we review the last result in Lee and Saez (2012), which shows how a minimum wage and low-skilled labor subsidies can be complementary. Suppose there are extensive margin responses only, and define the participation tax rate of low-skilled workers $\tau_1$ by $1 - \tau_1 = (c_1 - c_0)/w_1$, so that $c_1 = c_0 + (1 - \tau_1) w_1$.
Proposition 3: In a model with extensive labor supply responses only, a binding minimum wage associated with a positive tax rate on minimum wage earnings ($\tau_1 > 0$) is second-best Pareto inefficient. This result remains a fortiori true when rationing is not efficient.
Suppose the government reduces the minimum wage by $d\bar{w} < 0$ while keeping $c_0$, $c_1$ and $c_2$ constant. The change induces unemployed individuals to enter occupation 1, generating a change $dh_1 > 0$ and increasing revenues since $\tau_1 > 0$. The change $dh_1 > 0$ induces a change $dw_2 > 0$. However, since $h_1\, d\bar{w} + h_2\, dw_2 = 0$, the mechanical effect of the wage changes is zero. Since $c_0$, $c_1$ and $c_2$ are constant, the total effect of the policy is given only by the increase in revenues, which is positive. Proposition 3 implies that, when labor supply responses are concentrated along the extensive margin, a minimum wage should always be associated with low-skilled work subsidies such as the EITC.
To prove it formally, notice that since consumption does not change in any occupation, the utility of those who do not switch jobs is not affected. From the demand side, we have $w_2(w_1)$ with $dw_2/dw_1 = -h_1/h_2 < 0$ and hence $dw_2 > 0$. This implies that relative demand for high-skilled work, $h_2/h_1 = \rho(w_2/w_1)$, decreases, as $\rho(\cdot)$ is decreasing. Because $c_2 - c_0$ remains constant and labor supply responds only along the extensive margin, the supply of high-skilled workers is unchanged, so that $dh_2 = 0$, which then implies $dh_1 > 0$. The $dh_1$ individuals shifting from no work to low-skilled work are weakly better off because they were by definition rationed by the minimum wage (strictly better off in the case of inefficient rationing). The government budget is $h_1(w_1 - c_1) + h_2(w_2 - c_2) - h_0 c_0 \ge 0$, with $h_0 = 1 - h_1 - h_2$. Therefore, using $dh_0 = -dh_1$ and $dh_2 = 0$, the net effect of the reform on the budget is $dh_1 (w_1 - c_1 + c_0) + h_1\, dw_1 + h_2\, dw_2 = \tau_1 w_1\, dh_1 > 0$. Thus, with $\tau_1 > 0$, the reform creates a budget surplus which can be used to increase $c_0$ and improve everybody's welfare (with no behavioral response effects), a Pareto improvement.
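The budget accounting behind Proposition 3 can be checked with a few lines of arithmetic. The sketch below uses made-up numbers (all of them assumptions), keeps $c_0, c_1, c_2$ fixed, moves wages along the zero-profit frontier $h_1\, dw_1 + h_2\, dw_2 = 0$, and compares the exact change in the budget with $\tau_1 w_1\, dh_1$. The two differ only by the second-order term $dh_1\, dw_1$.

```python
def participation_tax(c0, c1, w1):
    """tau1 defined by 1 - tau1 = (c1 - c0) / w1."""
    return 1.0 - (c1 - c0) / w1


def budget_surplus(h1, h2, c0, c1, c2, w1, w2):
    """h1 (w1 - c1) + h2 (w2 - c2) - h0 c0, with h0 = 1 - h1 - h2."""
    h0 = 1.0 - h1 - h2
    return h1 * (w1 - c1) + h2 * (w2 - c2) - h0 * c0


# Hypothetical initial allocation (illustrative numbers only).
h1, h2 = 0.4, 0.3
c0, c1, c2 = 0.2, 0.7, 1.5
w1, w2 = 0.8, 1.8

# Reform: cut the minimum wage, keep all consumption levels fixed.
dw1 = -0.005                 # d w_bar < 0
dw2 = -h1 * dw1 / h2         # zero-profit frontier: h1 dw1 + h2 dw2 = 0
dh1 = 0.01                   # rationed non-workers enter occupation 1; dh2 = 0

before = budget_surplus(h1, h2, c0, c1, c2, w1, w2)
after = budget_surplus(h1 + dh1, h2, c0, c1, c2, w1 + dw1, w2 + dw2)
tau1 = participation_tax(c0, c1, w1)

print(after - before, tau1 * w1 * dh1)
```

With $\tau_1 > 0$ the surplus rises; setting $c_1 - c_0 = w_1$ (so $\tau_1 = 0$) removes the first-order gain.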
References
Lee, David, and Emmanuel Saez. 2012. “Optimal Minimum Wage Policy in Competitive Labor Markets.” Journal of Public Economics 96 (9–10): 739–749.
8 Section 9: Linear Capital Taxation
In this section we introduce a framework to study optimal linear capital taxation. We first focus on a two-period model, define the concept of an intertemporal wedge, and derive optimal capital taxes using the Atkinson-Stiglitz result. We then move to an infinite horizon model with aggregate uncertainty and derive optimal taxes. Finally, we study a model with capitalists and workers and show that a zero capital tax in the steady state can be recovered only under some assumptions about preferences.15
8.1 A Two-Period Model
$$(1 + \tau_0)\, c_0 \le y_0 - T(y_0) - k_1$$
$$(1 + \tau_1)\, c_1 \le R\, k_1$$
where $s_0 = y_0 - \tilde{T}(y_0) - c_0$ is the total level of savings when there are no distortions in the economy. We can interpret $\tilde{\tau}$ as a capital income tax: it distorts intertemporal consumption decisions by changing the relative price of $c_0$ and $c_1$, and so acts as a wedge on the optimal savings decision. Notice that whenever $\tilde{\tau}_1 = 0$ there is no distortion in the agent's intertemporal choice.
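To see why a gap between $\tau_0$ and $\tau_1$ works like a capital income tax, combine the two budget constraints: $(1+\tau_0)c_0 + (1+\tau_1)c_1/R \le y_0 - T(y_0)$, so one unit of $c_0$ forgone buys $R(1+\tau_0)/(1+\tau_1)$ units of $c_1$. The sketch below converts this effective return into the rate `t_k` on the net return $R - 1$ that would produce the same intertemporal price. Reading `t_k` as the capital income tax $\tilde{\tau}$ of the text is an assumption, since the formal definition of $\tilde{\tau}$ is not reproduced in this excerpt.

```python
def effective_return(R, tau0, tau1):
    """Units of c1 obtained per unit of c0 forgone.

    From (1 + tau0) c0 <= y0 - T(y0) - k1 and (1 + tau1) c1 <= R k1.
    """
    return R * (1.0 + tau0) / (1.0 + tau1)


def equivalent_capital_income_tax(R, tau0, tau1):
    """Rate t_k on the net return R - 1 giving the same effective return.

    Solves 1 + (1 - t_k) (R - 1) = effective_return(R, tau0, tau1).
    """
    return 1.0 - (effective_return(R, tau0, tau1) - 1.0) / (R - 1.0)


# Uniform consumption taxes leave saving undistorted (t_k = 0 up to rounding).
print(equivalent_capital_income_tax(1.05, 0.10, 0.10))
# A 2% tax on c1 only is equivalent to a large tax on a 5% net return.
print(equivalent_capital_income_tax(1.05, 0.00, 0.02))
```

Because the tax on $c_1$ applies to the whole gross payoff while a capital income tax applies only to the net return, a small differential consumption tax maps into a large equivalent capital income tax.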
Suppose the agent has separable preferences in consumption and labor of the form:
We can rewrite the preferences as $U(g(c_0, c_1), y_0)$, so that utility is weakly separable between $g(\cdot)$ and $y_0$. It follows that the Atkinson-Stiglitz result applies: if non-linear income taxation is available, the government finds it optimal to set a uniform (zero) tax on $c_0$ and $c_1$, that is, not to distort savings.16
15 The second part of these notes is based on lecture notes by Florian Scheuer.
16 Here is a short proof of the Atkinson-Stiglitz result provided in Kaplow (2006). Suppose all individuals have weakly
8.2 Infinite Horizon Model - Chamley (1986)
In this section we introduce a model where capital returns and wages are endogenous. We focus on linear capital and labor taxes in an infinite horizon economy. There is aggregate uncertainty: each period $t$ a state $s_t$ is realized, so that the history of aggregate uncertainty is the sequence $s^t = (s_0, s_1, \ldots, s_t)$. Output is produced according to a constant-returns-to-scale production function:
$$F\left(K(s^{t-1}), L(s^t), s_t, t\right) \tag{54}$$
where the productive capital at time $t$ is the stock that was chosen at time $t-1$. The firm solves the following profit maximization problem:
$$\max_{K, L}\; F\left(K(s^{t-1}), L(s^t), s_t, t\right) - w(s^t) L(s^t) - r(s^t) K(s^{t-1})$$
Competitive labor and capital markets imply that input prices equal their marginal products:
$$w(s^t) = F_L\left(K(s^{t-1}), L(s^t), s_t, t\right)$$
$$r(s^t) = F_K\left(K(s^{t-1}), L(s^t), s_t, t\right)$$
where $\Pr(s^t)$ is the probability of history $s^t$. The aggregate resource constraint of the economy is:
$$c(s^t) + g(s^t) + K(s^t) - (1 - \delta) K(s^{t-1}) \le F\left(K(s^{t-1}), L(s^t), s_t, t\right) \tag{55}$$
Output is used to finance consumption, public spending and investment. The resource constraint implicitly assumes that aggregate uncertainty results from technology or government spending shocks. The government optimally chooses taxes on labor income $\tau^l(s^t)$ and on capital income $\tau^k(s^t)$, and starts with initial debt $B_0$.
We assume complete markets, where the price of an Arrow-Debreu security is $p(s^t)$.17 The government budget constraint is:
$$\sum_{t, s^t} p(s^t)\left[g(s^t) - \tau^l(s^t)\, w(s^t) L(s^t) - \tau^k(s^t)\left(r(s^t) - \delta\right) K(s^{t-1})\right] \le -B_0$$
Taxes on labor and capital are used to finance government outlays $g(s^t)$. Notice that the capital tax is levied on the return to capital net of depreciation.
separable preferences $V(g(x), y)$, where $x$ is a vector of commodities. Suppose we start from a situation with positive taxes $\tau_k$ on commodities and implement a policy that sets $\tau \to 0$: a zero flat tax on all commodities. Suppose the government offsets the utility change of the agent with a non-linear income tax $T^*(y)$ such that labor supply is unchanged at the optimum and $V(g(x), y) = V(g(x^0), y)$, where $x^0$ is the bundle chosen under the old policy and $x$ the bundle chosen under the new one. By construction every agent has the same utility as before, and no one is willing to imitate another individual (if no one was willing to imitate before the tax change). By revealed preference we know that $\sum_k p_k x^0_k > y - T^*(y)$: the agent cannot afford the old bundle under the new income tax. Under the old policy the agent could afford the bundle, so $\sum_k (p_k + \tau_k)\, x^0_k \le y - T(y)$. Combining the two inequalities we find $T^*(y) > \sum_k \tau_k x^0_k + T(y)$: total revenue after the tax change is strictly higher than before. Since incentive compatibility holds and there is no welfare effect by construction, the new policy is welfare improving, as it raises more revenue.
17 An Arrow-Debreu security is a financial instrument that provides one unit of consumption in a state $s^t$ and zero units in any other state. We talk about complete markets whenever such an asset can be priced in every state of the world.
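The household budget constraint reads:
$$\sum_{t, s^t} p(s^t)\left[c(s^t) + K(s^t) - w(s^t)\left(1 - \tau^l(s^t)\right) L(s^t) - R(s^t) K(s^{t-1})\right] \le B_0 \tag{56}$$
where $R(s^t) = 1 + \left(1 - \tau^k(s^t)\right)\left(r(s^t) - \delta\right)$ is the gross interest rate net of taxes. We can set up the Lagrangian for the consumer problem:
$$\mathcal{L} = \sum_{t, s^t} \beta^t \Pr(s^t)\, u\left(c(s^t), L(s^t)\right) + \lambda\left[B_0 - \sum_{t, s^t} p(s^t)\left(c(s^t) + K(s^t) - w(s^t)\left(1 - \tau^l(s^t)\right) L(s^t) - R(s^t) K(s^{t-1})\right)\right]$$
The first-order conditions with respect to $c(s^t)$, $L(s^t)$ and $K(s^t)$ are (dividing the last one by $\lambda$):
$$\beta^t \Pr(s^t)\, u_c\left(c(s^t), L(s^t)\right) - \lambda p(s^t) = 0 \tag{57}$$
$$\beta^t \Pr(s^t)\, u_L\left(c(s^t), L(s^t)\right) + \lambda p(s^t)\left(1 - \tau^l(s^t)\right) w(s^t) = 0 \tag{58}$$
$$-p(s^t) + \sum_{s^{t+1} \mid s^t} p(s^{t+1})\, R(s^{t+1}) = 0 \tag{59}$$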
On top of the FOCs, a no-arbitrage condition must hold between capital and Arrow-Debreu securities:
$$p(s^t) = \sum_{s^{t+1} \mid s^t} p(s^{t+1})\, R(s^{t+1}) \tag{60}$$
Definition: A competitive equilibrium is a policy $\{g(s^t), \tau^k(s^t), \tau^l(s^t)\}$, an allocation $\{c(s^t), K(s^t), L(s^t)\}$ and prices $\{w(s^t), r(s^t), p(s^t)\}$ such that households maximize utility subject to their budget constraint, firms maximize profits, the government budget constraint holds and markets clear.
Combining (57) and (58) we get the standard intratemporal condition for labor supply:
$$\frac{\beta^t \Pr(s^t)\, u_c\left(c(s^t), L(s^t)\right)}{p(s^t)} = -\frac{\beta^t \Pr(s^t)\, u_L\left(c(s^t), L(s^t)\right)}{p(s^t)\left(1 - \tau^l(s^t)\right) w(s^t)}$$
$$\left(1 - \tau^l(s^t)\right) w(s^t) = -\frac{u_L\left(c(s^t), L(s^t)\right)}{u_c\left(c(s^t), L(s^t)\right)} \tag{61}$$
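From (57), evaluated at $t = 0$ with the normalization $p(s^0) = 1$ (so that $\lambda = u_c(c_0, L_0)$), we obtain the state prices below; combined with (59), they deliver the so-called Euler equation that pins down the slope of the agent's consumption path:
$$u_c(c_0, L_0) = \frac{\beta^t \Pr(s^t)\, u_c\left(c(s^t), L(s^t)\right)}{p(s^t)} \tag{62}$$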
Starting from (56), we can rewrite it using the optimality conditions and the no-arbitrage condition:
$$\sum_{t, s^t} p(s^t)\left[c(s^t) - w(s^t)\left(1 - \tau^l(s^t)\right) L(s^t)\right] \le B_0 + R_0 K_0$$
$$\sum_{t, s^t} \frac{\beta^t \Pr(s^t)\, u_c\left(c(s^t), L(s^t)\right)}{u_c(c_0, L_0)}\left[c(s^t) + \frac{u_L\left(c(s^t), L(s^t)\right)}{u_c\left(c(s^t), L(s^t)\right)}\, L(s^t)\right] \le B_0 + R_0 K_0$$
$$\sum_{t, s^t} \beta^t \Pr(s^t)\left[u_c\left(c(s^t), L(s^t)\right) c(s^t) + u_L\left(c(s^t), L(s^t)\right) L(s^t)\right] \le u_c(c_0, L_0)\left[B_0 + R_0 K_0\right] \tag{63}$$
We call (63) the implementability constraint, since it captures the agent's optimal choices subject to their feasibility.
Optimal Taxes The government chooses taxes to maximize the welfare of the representative agent subject to the resource constraint and the implementability constraint. The problem reads:
$$\max_{c(s^t), L(s^t), K(s^t), \tau_0^k}\; \sum_{t, s^t} \beta^t \Pr(s^t)\, u\left(c(s^t), L(s^t)\right)$$
s.t.
$$c(s^t) + g(s^t) + K(s^t) - (1 - \delta) K(s^{t-1}) \le F\left(K(s^{t-1}), L(s^t), s_t, t\right)$$
$$\sum_{t, s^t} \beta^t \Pr(s^t)\left[u_c\left(c(s^t), L(s^t)\right) c(s^t) + u_L\left(c(s^t), L(s^t)\right) L(s^t)\right] = u_c(c_0, L_0)\left[B_0 + R_0 K_0\right]$$
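We assume $\tau_0^k$ is fixed, denote by $\mu$ the multiplier on the implementability constraint and define:
$$W\left(c(s^t), L(s^t)\right) = u\left(c(s^t), L(s^t)\right) + \mu\left[u_c\left(c(s^t), L(s^t)\right) c(s^t) + u_L\left(c(s^t), L(s^t)\right) L(s^t)\right]$$
The problem becomes:
$$\max_{c(s^t), L(s^t), K(s^t)}\; \sum_{t, s^t} \beta^t \Pr(s^t)\, W\left(c(s^t), L(s^t)\right) - \mu\, u_c(c_0, L_0)\left[B_0 + R_0 K_0\right]$$
s.t.
$$c(s^t) + g(s^t) + K(s^t) - (1 - \delta) K(s^{t-1}) \le F\left(K(s^{t-1}), L(s^t), s_t, t\right)$$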
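Denoting by $\psi(s^t)$ the multiplier on the resource constraint, the first-order conditions with respect to $L(s^t)$ and $K(s^t)$ are:
$$\beta^t \Pr(s^t)\, W_L\left(c(s^t), L(s^t)\right) + \psi(s^t)\, F_L\left(K(s^{t-1}), L(s^t), s_t, t\right) = 0$$
$$-\psi(s^t) + \sum_{s^{t+1} \mid s^t} \psi(s^{t+1})\left[F_K\left(K(s^t), L(s^{t+1}), s_{t+1}, t+1\right) + (1 - \delta)\right] = 0$$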
Proposition 1: Suppose that (i) there is no uncertainty and (ii) there is a steady state. Then in the steady state $\tau^k = 0$ is optimal.
It is easy to see that in a steady state with no uncertainty $R(ss) = R^*(ss)$, where $R^* = F_K + 1 - \delta$ denotes the pre-tax gross return, which implies $\tau^k = 0$.
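Now consider a special case with separable preferences and a constant intertemporal elasticity of substitution:
$$u(c, L) = \frac{c^{1-\sigma}}{1-\sigma} - v(L)$$
Then we have:
$$W(c, L) = \frac{c^{1-\sigma}}{1-\sigma} - v(L) + \mu\left[c^{-\sigma} c - v'(L) L\right] = \left(\frac{1}{1-\sigma} + \mu\right) c^{1-\sigma} - \left[v(L) + \mu v'(L) L\right]$$
therefore:
$$W_c = \left(1 + \mu(1-\sigma)\right) c^{-\sigma} = \left(1 + \mu(1-\sigma)\right) u_c$$
$$\frac{W_c}{u_c} = 1 + \mu(1-\sigma)$$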
Since $W_c/u_c$ is constant across dates and states, equation (65) reduces to $R(s^{t+1}) = R^*(s^{t+1})$. Hence, for separable preferences with a constant intertemporal elasticity of substitution, the optimal capital tax is zero even outside the steady state and in a model with uncertainty.
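A quick numerical check of the key step, assuming an illustrative disutility $v(L) = L^2/2$ and parameter values chosen only for the example: the sketch builds $W(c, L) = u + \mu(u_c c + u_L L)$, differentiates it with a central finite difference, and confirms that $W_c/u_c = 1 + \mu(1-\sigma)$ at every consumption level, which is what makes the capital condition collapse to $R = R^*$.

```python
def u_c(c, sigma):
    """Marginal utility of consumption for u = c**(1-sigma)/(1-sigma) - v(L)."""
    return c ** (-sigma)


def v(L):
    """Hypothetical disutility of labor, v(L) = L**2 / 2."""
    return 0.5 * L ** 2


def v_prime(L):
    return L


def W(c, L, sigma, mu):
    """W = u + mu (u_c c + u_L L), with u_L = -v'(L)."""
    u = c ** (1 - sigma) / (1 - sigma) - v(L)
    return u + mu * (u_c(c, sigma) * c - v_prime(L) * L)


def W_c(c, L, sigma, mu, h=1e-6):
    """Central finite-difference approximation of dW/dc."""
    return (W(c + h, L, sigma, mu) - W(c - h, L, sigma, mu)) / (2 * h)


sigma, mu, L = 2.0, 0.3, 0.8
ratios = [W_c(c, L, sigma, mu) / u_c(c, sigma) for c in (0.5, 1.0, 2.0)]
print(ratios, 1 + mu * (1 - sigma))
```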
Tax Smoothing Take now the special case where $v(L) = \alpha L^{\gamma}/\gamma$ is isoelastic. We have:
$$W(c, L) = \left(\frac{1}{1-\sigma} + \mu\right) c^{1-\sigma} - \alpha\left(\frac{1}{\gamma} + \mu\right) L^{\gamma}$$
It follows that
$$\frac{W_L}{u_L} = 1 + \mu\gamma$$
and the optimal linear labor tax becomes:
$$\tau^{l*}(s^t) = 1 - \frac{1 + \mu(1-\sigma)}{1 + \mu\gamma}$$
Therefore, labor taxes are constant across states and over time: the government finds it optimal to smooth distortions to labor supply. This result relies on the possibility of setting state-contingent capital taxes. If the labor elasticity is constant and shocks can be absorbed through capital taxes, there is no residual reason to tax labor differently across states and dates.
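The constancy of the labor wedge can also be checked numerically. The sketch uses illustrative values for $\alpha$, $\gamma$, $\sigma$ and $\mu$. It combines the household condition (61) with the planner's condition $w = F_L = -W_L/W_c$, which relies on the planner's first-order condition for consumption, $\beta^t \Pr(s^t) W_c = \psi(s^t)$ with $\psi(s^t)$ the multiplier on the resource constraint; that condition is assumed here because it is not reproduced in this excerpt. Together they give $1 - \tau^l = (W_c/u_c)/(W_L/u_L)$, and evaluating this at different $(c, L)$ pairs returns the closed-form $\tau^{l*}$.

```python
ALPHA = 1.0  # hypothetical scale of v(L) = ALPHA * L**gamma / gamma


def closed_form_labor_tax(mu, sigma, gamma):
    """tau_l* = 1 - (1 + mu (1 - sigma)) / (1 + mu gamma)."""
    return 1.0 - (1.0 + mu * (1.0 - sigma)) / (1.0 + mu * gamma)


def W(c, L, sigma, mu, gamma):
    """W = u + mu (u_c c + u_L L) for CRRA consumption and isoelastic labor."""
    u = c ** (1 - sigma) / (1 - sigma) - ALPHA * L ** gamma / gamma
    u_c = c ** (-sigma)
    u_L = -ALPHA * L ** (gamma - 1)
    return u + mu * (u_c * c + u_L * L)


def derivative(f, x, h=1e-6):
    """Central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def implied_labor_tax(c, L, sigma, mu, gamma):
    # Household (61): (1 - tau) w = -u_L / u_c; planner: w = F_L = -W_L / W_c.
    # Hence 1 - tau = (W_c / u_c) / (W_L / u_L).
    u_c = c ** (-sigma)
    u_L = -ALPHA * L ** (gamma - 1)
    W_c = derivative(lambda x: W(x, L, sigma, mu, gamma), c)
    W_L = derivative(lambda x: W(c, x, sigma, mu, gamma), L)
    return 1.0 - (W_c / u_c) / (W_L / u_L)


sigma, mu, gamma = 2.0, 0.3, 2.0
points = [(0.5, 0.6), (1.0, 1.0), (2.0, 0.3)]
taxes = [implied_labor_tax(c, L, sigma, mu, gamma) for c, L in points]
print(taxes, closed_form_labor_tax(mu, sigma, gamma))
```

The implied wedge does not depend on $(c, L)$, which is the tax smoothing result.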
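8.3 Infinite Horizon - Judd (1985)
Consider an economy with workers, who consume $c_t$, and capitalists, who consume $C_t$ and own the capital stock. The resource constraint and the after-tax gross return on capital are:
$$c_t + C_t + g + k_{t+1} \le f(k_t) + (1 - \delta) k_t$$
$$R_t = 1 + (1 - \tau_t)(R_t^* - 1)$$
where $R_t^* = f'(k_t) + 1 - \delta$.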
Capitalists solve the following maximization problem:
$$\max_{C_t, a_{t+1}}\; \sum_{t=0}^{+\infty} \beta^t U(C_t)$$
s.t.
$$C_t + a_{t+1} = R_t a_t$$
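The optimality condition delivers the standard Euler equation $U'(C_t) = \beta R_{t+1} U'(C_{t+1})$. Since total wealth must equal the total capital stock in equilibrium ($a_t = k_t$), using the Euler equation:
$$C_t + k_{t+1} = R_t k_t = \frac{U'(C_{t-1})}{\beta U'(C_t)}\, k_t$$
Rearranging gives the implementability constraint $\beta U'(C_t)(C_t + k_{t+1}) = U'(C_{t-1})\, k_t$, which enters the planner's Lagrangian: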
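$$\mathcal{L} = \sum_{t=0}^{\infty} \beta^t\left(u(c_t) + \gamma U(C_t)\right) + \sum_{t=0}^{\infty} \beta^t \lambda_t\left(f(k_t) + (1 - \delta) k_t - c_t - C_t - g - k_{t+1}\right) + \sum_{t=0}^{\infty} \beta^t \mu_t\left(\beta U'(C_t)(C_t + k_{t+1}) - U'(C_{t-1})\, k_t\right)$$
where $\gamma \ge 0$ is the welfare weight attached to capitalists, and $\mu_0 = 0$ since there is no implementability constraint in the first period ($\tau_0$ is taken as given). The first-order conditions with respect to $c_t$, $k_{t+1}$ and $C_t$ are respectively: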
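$$u'(c_t) = \lambda_t \tag{67}$$
$$\frac{\lambda_{t+1}}{\lambda_t}\left(f'(k_{t+1}) + 1 - \delta\right) = \frac{1}{\beta} + (\mu_{t+1} - \mu_t)\,\frac{U'(C_t)}{\lambda_t} \tag{68}$$
$$\mu_{t+1} = \frac{\mu_t}{k_{t+1}}\left(C_t + k_{t+1} + \frac{U'(C_t)}{U''(C_t)}\right) + \frac{1}{\beta k_{t+1}}\left(\gamma\,\frac{U'(C_t)}{U''(C_t)} - \frac{\lambda_t}{U''(C_t)}\right) \tag{69}$$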
It is straightforward from equation (68) that whenever a steady state exists (with constant multipliers) it involves zero capital taxes: (68) gives $f'(k) + 1 - \delta = 1/\beta$, which equals $R(ss)$ by the capitalists' Euler equation, so that $R(ss) = f'(k) + 1 - \delta = R^*$. This result is extremely powerful since it is independent of the welfare weight attached to capitalists. However, the result does not hold for the case where $\sigma = 1$. Rewrite the FOCs (68) and (69) using the inverse intertemporal elasticity of substitution $\sigma_t = -U''(C_t)\, C_t / U'(C_t)$ and the ratio $v_t = U'(C_t)/u'(c_t)$:
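$$(68) \Rightarrow \frac{u'(c_{t+1})}{u'(c_t)}\left(f'(k_{t+1}) + 1 - \delta\right) = \frac{1}{\beta} + v_t\,(\mu_{t+1} - \mu_t)$$
$$(69) \Rightarrow \mu_{t+1} = \mu_t\left[1 + \frac{C_t}{k_{t+1}}\left(1 - \frac{1}{\sigma_t}\right)\right] + \frac{C_t}{\beta\, \sigma_t\, k_{t+1}\, v_t}\left(1 - \gamma v_t\right)$$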
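Take the case where $\sigma = 1$ (log preferences) and suppose the allocation converges to a steady state. Then:
$$(68) \Rightarrow \mu_{t+1} - \mu_t = \frac{R^* - 1/\beta}{v}$$
$$(69) \Rightarrow \mu_{t+1} - \mu_t = \frac{C\,(1 - \gamma v)}{\beta k v}$$
$$\Rightarrow R^* - \frac{1}{\beta} = \frac{C\,(1 - \gamma v)}{\beta k}$$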
Since the capitalists' Euler equation gives $R = 1/\beta$ in the steady state, $R^* > R$ whenever the welfare weight on capitalists is low enough, and capital is then taxed in the steady state. For a long time this was thought to be simply an anomaly of the logarithmic case. However, Straub and Werning (2015) show that the zero-tax result fails for any $\sigma > 1$, by noticing that a steady state does not necessarily exist.
Proposition 2: If $\sigma > 1$ and $\gamma = 0$ (no welfare weight on capitalists), then for any initial $k_0$ the solution to the planning problem does not converge to the zero-tax steady state, or to any other interior steady state.
Suppose capital taxes are raised in the future. Through the substitution effect, capitalists decrease savings today. A capital tax increase also reduces capitalists' wealth and lowers their consumption through the income effect. When $\sigma > 1$ the income effect prevails and capitalists save more. The increase in the capital stock raises wages and benefits workers. For this reason the government wants to set positive capital taxes in the long run. The opposite is true when $\sigma < 1$: the substitution effect is larger than the income effect, and zero taxes in the future increase savings in the short run, raising wages and workers' consumption.
References
Chamley, Christophe. 1986. “Optimal Taxation of Capital Income in General Equilibrium with Infinite Lives.” Econometrica 54 (3): 607–622.
Judd, Kenneth L. 1985. “Redistributive Taxation in a Simple Perfect Foresight Model.” Journal of Public Economics 28 (1): 59–83.
Kaplow, Louis. 2006. “On the Undesirability of Commodity Taxation Even When Income Taxation Is Not Optimal.” Journal of Public Economics 90 (6–7): 1235–1250.
Scheuer, Florian. 2014. Lecture Notes.
Straub, Ludwig, and Ivan Werning. 2015. “Positive Long Run Capital Taxation: Chamley-Judd Revisited.” Working Paper, MIT.
9 Section 10: Education Policies and Simpler Theory of Capital Taxation
In this section we study education policies in a simplified version of the framework analyzed by Stantcheva (2016). We then review a simpler theory of capital taxation proposed by Saez and Stantcheva (2016) in a continuous time model.
$$u(c, l) = c - h(l)$$
The agent consumes everything that is left after taxes and education investments, so that $c(\theta) = w(\theta, e(\theta))\, l(\theta) - M(e(\theta)) - T\left(w(\theta, e(\theta))\, l(\theta), e(\theta)\right)$. Solving the individual maximization problem, we can define income and education wedges as:
$$\tau_y(\theta) = 1 - \frac{h'(l(\theta))}{w(\theta, e(\theta))} \tag{70}$$
We can write the indirect utility as $u(\theta) = c(\theta) - h(l(\theta))$. Using the envelope theorem we can derive the local incentive constraint:
subject to (72) and (73).
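The first-order conditions are taken with respect to $u(\theta)$, $l(\theta)$ and $e(\theta)$. The condition for $l(\theta)$ can be written as:
$$\mu(\theta)\, w_\theta\, \left(1 - \tau_y(\theta)\right)\left[1 + \frac{l\, h''(l)}{h'(l)}\right] = \lambda\, w\, \tau_y(\theta)\, f(\theta)$$
where $\lambda$ is the multiplier on the resource constraint. With the labor supply elasticity $\varepsilon = h'(l)/(l\, h''(l))$, and writing $\mu(\theta)/\lambda = \Psi(\theta) - F(\theta)$ as implied by the condition for $u(\theta)$, the optimal income wedge is:
$$\frac{\tau_y(\theta)}{1 - \tau_y(\theta)} = \frac{\Psi(\theta) - F(\theta)}{f(\theta)}\,\frac{w_\theta}{w}\,\frac{1+\varepsilon}{\varepsilon} = \frac{\Psi(\theta) - F(\theta)}{f(\theta)}\,\frac{\varepsilon_{w,\theta}}{\theta}\,\frac{1+\varepsilon}{\varepsilon} \tag{77}$$
where $\varepsilon_{w,\theta} = \theta w_\theta / w$ is the elasticity of the wage with respect to ability.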
The optimal income wedge is similar to the one we studied in Section 3. The formula has an extra term proportional to the elasticity of the wage with respect to ability. Labor distortions are higher at the optimum when the wage is highly elastic to ability: the government distorts labor more when income is mostly explained by ability, and less by effort or investment in education.
Notice that:
$$w_e\, l - M'(e) = \frac{\tau_e + M'(e)\, \tau_y}{1 - \tau_y}$$
9.2 A Simpler Theory of Capital Taxation
We introduce in this paragraph a continuous time model with wealth in the utility function. We study the case where utility is quasi-linear in consumption, which allows us to transform the problem into a static taxation problem.
Suppose individual $i$ has utility $u_i(c, k, z) = c + a_i(k) - h_i(z)$, where $a_i(\cdot)$ is increasing and concave and $h_i(\cdot)$ is the standard disutility from labor. Agents have heterogeneous discount rates $\delta_i$. The discounted utility is:
$$V_i\left(\{c_i(t), k_i(t), z_i(t)\}\right) = \delta_i \int_0^{\infty}\left[c_i(t) + a_i(k_i(t)) - h_i(z_i(t))\right] e^{-\delta_i t}\, dt \tag{79}$$
Capital accumulates according to:
$$\frac{dk_i(t)}{dt} = r k_i(t) + z_i(t) - T\left(z_i(t), r k_i(t)\right) - c_i(t) \tag{80}$$
where $T(z_i(t), r k_i(t))$ is the tax paid by individual $i$, which depends on labor income and capital income. Wealth accumulation depends on heterogeneous individual preferences, as embodied in the taste for wealth $a_i(\cdot)$ and in the impatience $\delta_i$. It also depends on the net-of-tax return $\bar{r} = r(1 - T_k)$: capital taxes discourage wealth accumulation through a substitution effect (there are no income effects).
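The Hamiltonian for the individual maximization problem is:
$$H_i\left(c_i(t), k_i(t), z_i(t), \lambda_i(t)\right) = \left[c_i(t) + a_i(k_i(t)) - h_i(z_i(t))\right] e^{-\delta_i t} + \lambda_i(t)\left[r k_i(t) + z_i(t) - T\left(z_i(t), r k_i(t)\right) - c_i(t)\right]$$
$$\frac{\partial H_i}{\partial z_i} = -h_i'(z_i(t))\, e^{-\delta_i t} + \lambda_i(t)\left[1 - T_z\left(z_i(t), r k_i(t)\right)\right] = 0$$
$$\frac{\partial H_i}{\partial k_i(t)} = a_i'(k_i(t))\, e^{-\delta_i t} + \lambda_i(t)\, r\left(1 - T_k\left(z_i(t), r k_i(t)\right)\right) = -\dot{\lambda}_i(t)$$
Rearranging, and using the condition for consumption:
$$\lambda_i(t) = e^{-\delta_i t}, \qquad h_i'(z_i(t)) = 1 - T_z\left(z_i(t), r k_i(t)\right), \qquad a_i'(k_i(t)) = \delta_i - r\left(1 - T_k\left(z_i(t), r k_i(t)\right)\right)$$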
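To make these conditions concrete, the sketch below assumes functional forms that the notes do not impose: $a_i(k) = A_i \log k$ and $h_i(z) = z^{1+1/\varepsilon}/(1+1/\varepsilon)$. Then $a_i'(k_i) = \delta_i - \bar{r}$ gives $k_i = A_i/(\delta_i - \bar{r})$ (which requires $\delta_i > \bar{r}$), and $h_i'(z_i) = 1 - T_z$ gives $z_i = (1 - T_z)^{\varepsilon}$. The example illustrates the substitution effect: a higher capital tax lowers $\bar{r}$ and hence desired wealth. All parameter values are hypothetical.

```python
def steady_state_capital(A, delta, r, T_k):
    """Solve a'(k) = delta - r (1 - T_k) for the assumed a(k) = A * log(k)."""
    r_bar = r * (1.0 - T_k)
    if delta <= r_bar:
        raise ValueError("an interior solution needs delta > r * (1 - T_k)")
    return A / (delta - r_bar)


def steady_state_labor_income(eps, T_z):
    """Solve h'(z) = 1 - T_z for the assumed h(z) = z**(1 + 1/eps) / (1 + 1/eps)."""
    return (1.0 - T_z) ** eps


# Hypothetical individual: taste for wealth A = 1, discount rate 5%, return 4%.
for T_k in (0.0, 0.25, 0.5):
    print(T_k, steady_state_capital(1.0, 0.05, 0.04, T_k))
print(steady_state_labor_income(0.5, 0.36))
```

Under these assumptions the elasticity of $k_i$ with respect to $\bar{r}$ is $\bar{r}/(\delta_i - \bar{r})$, so patient individuals (low $\delta_i$) respond more strongly to capital taxes.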
Since utility is quasi-linear in consumption, the model converges immediately to a steady state. Denoting by $(c_i, z_i, k_i)$ the steady-state allocation, the problem collapses to a static optimization of the following objective function:
where $k_i^{init}$ is the inherited level of capital and $\delta_i(k_i^{init} - k_i)$ is the utility cost of going from $k_i^{init}$ to the steady-state level.
The government maximizes the following:
$$SWF = \int_i \omega_i\, U_i(c_i, k_i, z_i)\, di$$
where $\omega_i \ge 0$ is the Pareto weight on individual $i$. The social marginal welfare weight is $g_i = \omega_i U_{ic}$.
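Optimal Linear Taxes Suppose the government sets linear income and capital taxes $\tau_L$ and $\tau_K$. The individual chooses labor and capital according to $a_i'(k_i) = \delta_i - \bar{r}$ and $h_i'(z_i) = 1 - \tau_L$, with $\bar{r} = r(1 - \tau_K)$. The government balances the budget through lump-sum transfers totaling $G = \tau_K r k^m(\bar{r}) + \tau_L z^m(1 - \tau_L)$, where $z^m(1 - \tau_L) = \int_i z_i\, di$ is aggregate labor income and $k^m(\bar{r}) = \int_i k_i\, di$ is aggregate capital. The effect of a change in $\tau_K$ on social welfare is:
$$\frac{dSWF}{d\tau_K} = \int_i \omega_i\left[-r k_i + r k^m + \tau_K r\,\frac{\partial k^m}{\partial \tau_K}\right] di = r k^m \int_i \omega_i\left[1 - \frac{k_i}{k^m} - \frac{\tau_K}{1 - \tau_K}\, e_K\right] di$$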
where $e_K$ is the elasticity of aggregate capital with respect to the net-of-tax return $\bar{r}$. At the optimum $dSWF/d\tau_K = 0$ and, normalizing the weights so that $\int_i g_i\, di = 1$, the optimal linear tax is:
$$\tau_K = \frac{1 - \bar{g}_K}{1 - \bar{g}_K + e_K}$$
where $\bar{g}_K = \int_i g_i k_i\, di / \int_i k_i\, di$. This is the standard formula for optimal linear taxes that we studied in Section 2, applied to capital. Notice that whenever capital accumulation is uncorrelated with social marginal welfare weights (i.e. $\bar{g}_K = 1$), the optimal tax is zero. The reason is that if capital has no tag value, the government does not find it optimal to tax capital for redistributive purposes. We also know from previous sections that the revenue-maximizing tax rate corresponds to the case $\bar{g}_K = 0$, which gives $\tau_K = 1/(1 + e_K)$.
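A short numerical illustration of the formula, assuming a constant-elasticity aggregate capital supply $k^m(\bar{r}) = B\,\bar{r}^{e_K}$ (a functional form chosen only for the example, with hypothetical values of `r`, `B` and `e_K`): a grid search for the revenue-maximizing $\tau_K$ recovers $1/(1+e_K)$, the $\bar{g}_K = 0$ case of the optimal tax formula, while $\bar{g}_K = 1$ returns a zero tax.

```python
def optimal_linear_capital_tax(g_bar_K, e_K):
    """tau_K = (1 - g_bar_K) / (1 - g_bar_K + e_K)."""
    return (1.0 - g_bar_K) / (1.0 - g_bar_K + e_K)


def capital_tax_revenue(tau_K, r=0.05, B=1.0, e_K=0.5):
    """tau_K * r * k_m(r_bar) with the assumed k_m(r_bar) = B * r_bar**e_K."""
    r_bar = r * (1.0 - tau_K)
    return tau_K * r * B * r_bar ** e_K


grid = [i / 10_000 for i in range(10_000)]          # tau_K in [0, 1)
tau_revenue_max = max(grid, key=capital_tax_revenue)

print(tau_revenue_max, optimal_linear_capital_tax(0.0, 0.5))
print(optimal_linear_capital_tax(1.0, 0.5))          # no tag value: zero tax
```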
Optimal Non-Linear Separable Taxes Suppose the government optimally sets $T_K(rk)$ and $T_L(z)$. The individual budget constraint is:
Define by $\bar{G}_K(rk)$ the average relative welfare weight on individuals with capital income higher than $rk$:
$$\bar{G}_K(rk) = \frac{\int_{\{i:\, rk_i \ge rk\}} g_i\, di}{P(rk_i \ge rk)}$$
Let $h_K(rk)$ be the density of capital income and $H_K(rk)$ its cumulative distribution, so that the Pareto parameter associated with the capital income distribution is:
$$\alpha_K(rk) = \frac{rk \cdot h_K(rk)}{1 - H_K(rk)}$$
Denote by $e_K(rk)$ the elasticity of capital income with respect to the net-of-tax return $r\left(1 - T_K'(rk)\right)$.
Suppose the government introduces a small reform $dT_K(rk)$ in which the marginal tax rate is increased by $d\tau_K$ in a small interval of capital income from $rk$ to $rk + d(rk)$. The mechanical effect of the reform is:
$$d(rk)\, d\tau_K\, \left[1 - H_K(rk)\right]$$
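The welfare effect simply weights the mechanical effect by $\bar{G}_K(rk)$, the average social marginal welfare weight on individuals with capital income above $rk$. Individuals who face the increase in the marginal tax rate change their capital income by $-e_K(rk)\, rk\, d\tau_K / \left(1 - T_K'(rk)\right)$. There are $h_K(rk)\, d(rk)$ individuals in the window affected by the tax change. Therefore, the total behavioral effect is:
$$-h_K(rk)\, d(rk)\, rk\, \frac{T_K'(rk)}{1 - T_K'(rk)}\, e_K(rk)\, d\tau_K$$
Setting the sum of the mechanical, welfare and behavioral effects to zero and using the definition of $\alpha_K(rk)$, the optimal marginal tax rate is:
$$T_K'(rk) = \frac{1 - \bar{G}_K(rk)}{1 - \bar{G}_K(rk) + \alpha_K(rk) \cdot e_K(rk)}$$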
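For a Pareto-distributed capital income tail the parameter $\alpha_K(rk)$ is constant, so the formula delivers a constant top marginal rate. The sketch below assumes a Pareto distribution with tail index `a` and lower bound `x_m`, plus an elasticity `e_K` (all illustrative values), computes $\alpha_K(rk)$ from the density and survival function, and plugs it into the optimal rate with $\bar{G}_K = 0$ (the revenue-maximizing case).

```python
def pareto_density(x, a, x_m):
    """h_K(x) for a Pareto distribution with tail index a and lower bound x_m."""
    return a * x_m ** a / x ** (a + 1)


def pareto_survival(x, a, x_m):
    """1 - H_K(x) for the same Pareto distribution."""
    return (x_m / x) ** a


def local_pareto_parameter(x, a, x_m):
    """alpha_K(rk) = rk * h_K(rk) / (1 - H_K(rk))."""
    return x * pareto_density(x, a, x_m) / pareto_survival(x, a, x_m)


def optimal_marginal_rate(G_bar, alpha, e):
    """T_K'(rk) = (1 - G_bar) / (1 - G_bar + alpha * e)."""
    return (1.0 - G_bar) / (1.0 - G_bar + alpha * e)


a, x_m, e_K = 1.5, 1.0, 0.5   # hypothetical tail index, threshold, elasticity
for rk in (2.0, 10.0, 100.0):
    alpha = local_pareto_parameter(rk, a, x_m)
    print(rk, alpha, optimal_marginal_rate(0.0, alpha, e_K))
```

With a thinner tail (larger `a`) or a larger elasticity, the optimal top rate falls.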
References
Bovenberg, L., and B. Jacobs. 2005. “Redistribution and Education Subsidies Are Siamese Twins.” Journal of Public Economics 89 (11–12): 2005–2035.
Bovenberg, L., and B. Jacobs. 2011. “Optimal Taxation of Human Capital and the Earnings Function.” Journal of Public Economic Theory 13 (6): 957–971.
Saez, Emmanuel, and Stefanie Stantcheva. 2016. “A Simpler Theory of Optimal Capital Taxation.” NBER Working Paper 22664.
Stantcheva, Stefanie. Forthcoming. “Optimal Taxation and Human Capital Policies over the Life Cycle.” Journal of Political Economy.
10 Section 11: Non-Linear Capital Taxation
In this section we first introduce a framework to study non-linear capital taxes and establish the
inverse Euler equation relation at the optimum. We then discuss implications of the inverse Euler
equation, the effect of skill shocks on consumption and the trend in consumption inequality required
by a Pareto-optimal allocation.18
subject to (81).
First-Best The first-best allocation is characterized by the following two first-order conditions:
which imply:
$$E\left[U_{c_0}\left(c_0, c_1(s), y(s)/s\right)\right] = R^*\, U_{c_1(s)}\left(c_0, c_1(s), y(s)/s\right) \quad \forall s \tag{82}$$
Since the condition holds for every realization of $s$ and $E\left[U_{c_0}(c_0, c_1(s), y(s)/s)\right]$ is constant, it follows that in the first best the marginal utility of consumption is equalized across all states. This implies full insurance: consumption is the same in every state of the world. Assuming separability between consumption and labor and taking expectations on both sides of (82), we obtain the standard Euler equation:
Second-Best Suppose now that $s$ is private information of the agent. We know that we can solve the problem by invoking the revelation principle. We assume the agent reports $r = \sigma(s)$, where $\sigma$ is the reporting strategy, and truth-telling implies $\sigma^*(s) = s$. We denote consumption and income under the reporting strategy by $c_1^{\sigma}(s) = c_1(\sigma(s))$ and $y^{\sigma}(s) = y(\sigma(s))$. Incentive compatibility requires:
$$U\left(c_0, c_1(s), y(s)/s\right) \ge U\left(c_0, c_1(r), y(r)/s\right) \quad \forall r, s \tag{83}$$
An allocation is feasible whenever it satisfies (81) and (83).
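Given a feasible allocation, we can ask whether free savings is compatible with it, meaning whether extra savings in period 0 leave incentive compatibility unchanged. Suppose that preferences are additively separable between consumption and labor, and consider a perturbation that reduces $c_0$ by $\Delta$ and raises period-1 consumption in state $s$ by $\varphi(\Delta, s)$. Incentive compatibility of the perturbed allocation requires:
$$u(c_0 - \Delta) + u\left(c_1(s) + \varphi(\Delta, s)\right) - h\left(y(s)/s\right) \ge u(c_0 - \Delta) + u\left(c_1(r) + \varphi(\Delta, r)\right) - h\left(y(r)/s\right)$$
Using (84), which sets $\varphi(\Delta, s)$ so that $u\left(c_1(s) + \varphi(\Delta, s)\right) = u(c_1(s)) + A(\Delta)$ in every state, we have:
$$u(c_0 - \Delta) + u(c_1(s)) + A(\Delta) - h\left(y(s)/s\right) \ge u(c_0 - \Delta) + u(c_1(r)) + A(\Delta) - h\left(y(r)/s\right)$$
$$\Leftrightarrow\; u(c_0) + u(c_1(s)) - h\left(y(s)/s\right) \ge u(c_0) + u(c_1(r)) - h\left(y(r)/s\right)$$
which holds since the original allocation was incentive compatible. It follows that a perturbation that shifts period-1 utility from consumption by the same amount in every state, independently of $s$, leaves incentive compatibility intact.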
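Inverse Euler Equation Suppose a feasible allocation solves the second-best problem. Then the variation that we just studied, financed by saving $\Delta$ at the gross return $R^*$ so that $\sum_s p(s)\, \varphi(\Delta, s) = R^* \Delta$, should not improve the welfare of the agent. We have:
$$0 = \arg\max_{\Delta} \sum_s p(s)\left[u(c_0 - \Delta) + u\left(c_1(s) + \varphi(\Delta, s)\right) - h\left(y(s)/s\right)\right] = \arg\max_{\Delta} \sum_s p(s)\left[u(c_0 - \Delta) + u(c_1(s)) + A(\Delta) - h\left(y(s)/s\right)\right] = \arg\max_{\Delta}\left[u(c_0 - \Delta) + A(\Delta)\right]$$
so that
$$u'(c_0) = A'(0) = u'(c_1(s))\left.\frac{\partial \varphi(\Delta, s)}{\partial \Delta}\right|_{\Delta = 0} \quad \forall s$$
Dividing by $u'(c_1(s))$, multiplying by $p(s)$, summing over $s$ and using $\sum_s p(s)\, \partial\varphi/\partial\Delta = R^*$, we can rearrange to get:
$$\frac{1}{u'(c_0)} = \frac{1}{R^*}\sum_s \frac{p(s)}{u'(c_1(s))} \tag{87}$$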
This is the so-called inverse Euler equation. Notice that we can rewrite the expression above as:
$$\frac{1}{u'(c_0)} = \frac{1}{R^*}\, E\left(\frac{1}{u'(c_1(s))}\right)$$
Using Jensen's inequality (the function $x \mapsto 1/x$ is convex), we obtain, whenever $c_1(s)$ varies with $s$:
$$u'(c_0) = \left[\frac{1}{R^*}\, E\left(\frac{1}{u'(c_1(s))}\right)\right]^{-1} < \left[\frac{1}{R^*}\,\frac{1}{E\left(u'(c_1(s))\right)}\right]^{-1} = R^*\, E\left(u'(c_1(s))\right)$$
Since $u'(c_0) < R^* E\left(u'(c_1(s))\right)$, we have established that in the second best the government distorts savings downwards, so that there is a positive intertemporal wedge. The reason is that savings change the incentives to work in period 1 whenever there are income effects. By transferring resources to period 1, the agent becomes less willing to work and more prone to imitate lower-skilled agents. By taxing savings, the government can partially prevent this behavior. Therefore, it is impossible to achieve a Pareto optimal allocation when agents are allowed to undertake unlimited trading of bonds.
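The Jensen step can be illustrated with logarithmic utility, an assumption made only for this example. With $u'(c) = 1/c$, the inverse Euler equation (87) pins down $c_0 = E[c_1(s)]/R^*$. For any non-degenerate distribution of $c_1(s)$ (the two-state numbers below are hypothetical), $u'(c_0)$ falls short of $R^* E[u'(c_1(s))]$, so at this allocation the agent would like to save more; with full insurance the two coincide.

```python
def inverse_euler_c0(p, c1, R_star):
    """c0 implied by (87) under log utility: 1/u'(c0) = c0 = E[c1] / R*."""
    return sum(ps * cs for ps, cs in zip(p, c1)) / R_star


def euler_gap(p, c1, R_star):
    """Return (u'(c0), R* E[u'(c1)]) at the inverse-Euler allocation, u'(c) = 1/c."""
    c0 = inverse_euler_c0(p, c1, R_star)
    lhs = 1.0 / c0
    rhs = R_star * sum(ps / cs for ps, cs in zip(p, c1))
    return lhs, rhs


R_star = 1.05
p = [0.5, 0.5]                                        # hypothetical state probabilities
lhs, rhs = euler_gap(p, [0.8, 1.4], R_star)           # consumption varies with skill
print(lhs, rhs, "intertemporal wedge:", 1 - lhs / rhs)
lhs_fi, rhs_fi = euler_gap(p, [1.1, 1.1], R_star)     # full insurance
print(lhs_fi, rhs_fi)
```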
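Let $u_0 = u(c_0)$ and $u_1(s) = u(c_1(s))$. Suppose we change the allocation so that the agent is indifferent:
$$u_0 + u_1(s) = \tilde{u}_0 + \tilde{u}_1(s) \quad \forall s \tag{88}$$
We can write:
$$\tilde{u}_0 = u_0 - \Delta$$
and
$$\tilde{u}_1(s) = u_1(s) + \Delta$$
Then the new allocation satisfies incentive compatibility only if the labor disutility under the new allocation is such that:
$$u_0 - \Delta + \left(u_1(s) + \Delta\right) - h\left(y(s)/s\right) \ge u_0 - \Delta + \left(u_1(r) + \Delta\right) - h\left(y(r)/s\right) \quad \forall r, s$$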
which is true only if the initial allocation was feasible. The dual problem rewrites the objective
function of the government so that at the optimum the total resource cost of the allocation is minimized:
( )
X
min C (u0 )+q p (s) C (u1 (s) + )
s
where C (u) is the inverse function of u (c). If an allocation is optimal, then = 0 must solve the
problem. The first order condition evaluated at = 0 is:
$$-\beta C'(u_0) + q\sum_s p(s)\, C'\big(u_1(s)\big) = 0$$
Since $C(u)$ is the inverse function of $u(c)$, $C'(u) = 1/u'(c)$ (we also know it from Section 3). It follows that:

$$\frac{1}{u'(c_0)} = \frac{q}{\beta}\sum_s \frac{p(s)}{u'\big(c_1(s)\big)} \qquad (89)$$
We can interpret $1/u'(c) = C'(u)$ as the marginal resource cost of providing utility, and hence incentives. The inverse Euler equation therefore equalizes the expected marginal resource cost of providing incentives across the two periods.
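The dual problem can be checked in a special case. This is a minimal sketch, assuming log utility $u(c) = \ln c$, so that $C(u) = e^u$ and $C'(u) = 1/u'(c) = c$, with illustrative numbers and $q = \beta$; `resource_cost` and `optimal_delta` are hypothetical helper names. Under these assumptions the first-order condition $-\beta c_0 e^{-\beta\Delta} + q e^{\Delta} E[c_1(s)] = 0$ has the closed-form solution $\Delta^* = \ln\big(\beta c_0 / (q E[c_1(s)])\big)/(1+\beta)$, which is zero exactly when $c_0 = (q/\beta)E[c_1(s)]$, i.e. when (89) holds.

```python
import math


def resource_cost(delta, c0, probs, c1, beta, q):
    """Dual objective C(u0 - beta*delta) + q * sum_s p(s) * C(u1(s) + delta).

    Assumes log utility, so C(u) = exp(u) and C(log(c) + x) = c * exp(x).
    """
    period0 = c0 * math.exp(-beta * delta)
    period1 = q * sum(p * c * math.exp(delta) for p, c in zip(probs, c1))
    return period0 + period1


def optimal_delta(c0, probs, c1, beta, q):
    """Closed-form minimizer of resource_cost (the objective is strictly convex in delta)."""
    mean_c1 = sum(p * c for p, c in zip(probs, c1))
    return math.log(beta * c0 / (q * mean_c1)) / (1.0 + beta)


beta = 0.95
q = beta                      # illustrative choice of the period-1 price
probs, c1 = [0.3, 0.7], [0.6, 1.4]

# c0 chosen to satisfy (89) under log utility: c0 = (q / beta) * E[c1(s)]
c0_opt = (q / beta) * sum(p * c for p, c in zip(probs, c1))
print(optimal_delta(c0_opt, probs, c1, beta, q))

# an allocation that violates (89): the variation strictly lowers the resource cost
c0_bad = 1.3
d_bad = optimal_delta(c0_bad, probs, c1, beta, q)
print(d_bad, resource_cost(d_bad, c0_bad, probs, c1, beta, q),
      resource_cost(0.0, c0_bad, probs, c1, beta, q))
```

When $c_0$ violates (89), the minimizer $\Delta^*$ is nonzero and delivers the same utility at a strictly lower resource cost, so the original allocation could not have been optimal.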
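and $s^t$ is the history of shocks up to time $t$. We implement a revelation mechanism where each agent must report her type through reporting strategies:

$$r_t = \sigma_t(s^t)$$

Truthful reporting corresponds to

$$\sigma^*_t(s^t) = s_t \quad \forall s^t, t$$

and an allocation is incentive compatible if

$$\sum_{t,s^t}\beta^t\left[u\big(c(s^t)\big) - h\big(y(s^t)/s_t\big)\right]\Pr(s^t) \ge \sum_{t,s^t}\beta^t\left[u\big(c(\sigma^t(s^t))\big) - h\big(y(\sigma^t(s^t))/s_t\big)\right]\Pr(s^t)$$

for every $\sigma$, where $\sigma^t(s^t)$ denotes the history of reports up to $t$. Start from a node $s^t$ and set the following perturbation: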
$$\tilde u(s^\tau) = u(s^\tau)$$

for any $s^\tau \neq s^t$ and $s^\tau \neq (s^t, s_{t+1})$, so that consumption utilities are unchanged at any node that is not $s^t$ or one of its direct successors. We therefore have:

$$\tilde u(s^t) = u(s^t) - \beta\Delta$$

and

$$\tilde u(s^t, s_{t+1}) = u(s^t, s_{t+1}) + \Delta \quad \forall s_{t+1}$$
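The perturbation is incentive compatible if the starting allocation was incentive compatible. Moreover, the total expected utility after the perturbation is unchanged. At the optimal allocation we know that $\Delta = 0$ must solve the following problem:

$$\min_{\Delta}\left\{ C\big(u(s^t) - \beta\Delta\big) + q\sum_{s^{t+1}\mid s^t}\Pr\big(s^{t+1}\mid s^t\big)\, C\big(u(s^{t+1})+\Delta\big)\right\}$$

As in the two-period case, the first-order condition at $\Delta = 0$, together with $C'(u) = 1/u'(c)$, gives the inverse Euler equation at every node:

$$\frac{1}{u'\big(c(s^t)\big)} = \frac{q}{\beta}\sum_{s^{t+1}\mid s^t}\frac{\Pr\big(s^{t+1}\mid s^t\big)}{u'\big(c(s^{t+1})\big)}$$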
History Dependence When there is private information, it is not Pareto efficient to fully insure the agent in equilibrium. The government must provide the agent with the incentive to produce higher output by allowing her to receive a higher level of consumption in case of a positive skill shock. Suppose that $\beta R = 1$. Define the innovation in the agent's information about $1/u'(c_{t+1})$ as follows:
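$$\nu_{t+1} = \frac{1}{u'(c_{t+1})} - E_t\left[\frac{1}{u'(c_{t+1})}\right] \qquad (90)$$

Define the change in future forecasts as:

$$E_{t+1}\left[\frac{1}{u'(c_{t+s+1})}\right] - E_t\left[\frac{1}{u'(c_{t+s})}\right] \qquad (91)$$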
The inverse Euler equation is a martingale when $\beta R = 1$ and, together with the law of iterated expectations, implies:

$$\frac{1}{u'(c_t)} = E_t\left[\frac{1}{u'(c_{t+1})}\right] = E_t\left[\frac{1}{u'(c_{t+s})}\right] \quad \forall s \ge 1 \qquad (92)$$
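Therefore the first term of (91) becomes:

$$E_{t+1}\left[\frac{1}{u'(c_{t+s+1})}\right] = E_{t+1}\left[\frac{1}{u'(c_{t+2})}\right] = \frac{1}{u'(c_{t+1})}$$

while, using (92), the second term becomes:

$$E_t\left[\frac{1}{u'(c_{t+s})}\right] = \frac{1}{u'(c_t)} = E_t\left[\frac{1}{u'(c_{t+1})}\right]$$

Therefore:

$$E_{t+1}\left[\frac{1}{u'(c_{t+s+1})}\right] - E_t\left[\frac{1}{u'(c_{t+s})}\right] = \nu_{t+1} \qquad (93)$$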
The change in future forecasts equals the innovation in the agent's information at $t+1$. Any shock that generates a change $\nu_{t+1}$ in $1/u'(c)$ leads to a change $\nu_{t+1}$ in the agent's forecasts of $1/u'(c)$ at any future date. This implies that skill shocks have a permanent effect on the reciprocal of marginal utility. When utility is logarithmic, $1/u'(c) = c$, so these permanent changes show up directly in the level of consumption. The reason is that if an agent receives a positive skill shock, it is efficient for the government to require a higher level of output. However, since there is private information, the government must reward higher output with a lifetime increase in consumption to provide the right incentive to the agent. The inverse Euler equation governs how the increase is spread over time. In the special case of $\beta R = 1$, the increase in $1/u'(c)$ is evenly spread over time.
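Result (93) can be checked on a small shock tree. In this minimal sketch, `x` stands for $1/u'(c)$ (equal to $c$ under log utility), and we assume, purely for illustration, that it follows the multiplicative martingale $x_{t+1} = x_t(1 + \sigma e)$ with $e = \pm 1$ equally likely and $\sigma = 0.2$; this process and the helper names `step` and `conditional_mean` are hypothetical, not derived from the skill model in the notes. Conditional expectations are computed by enumerating every future shock path, so the check does not simply presuppose the martingale property.

```python
from itertools import product

SIGMA = 0.2  # hypothetical size of the innovation; keeps x > 0 since SIGMA < 1


def step(x, shock):
    """One period of the assumed martingale for x = 1/u'(c): x' = x * (1 + SIGMA * shock)."""
    return x * (1.0 + SIGMA * shock)


def conditional_mean(x_now, horizon):
    """E[x_{now + horizon} | x_now], averaging over all 2**horizon equally likely shock paths."""
    total = 0.0
    for path in product((-1, 1), repeat=horizon):
        x = x_now
        for shock in path:
            x = step(x, shock)
        total += x
    return total / 2 ** horizon


x_t = 1.0                                # 1/u'(c_t) at the current node
x_next = step(x_t, +1)                   # a positive skill shock is realized at t + 1
nu = x_next - conditional_mean(x_t, 1)   # innovation nu_{t+1}, equation (90)

revisions = []
for s in range(1, 6):
    # E_{t+1}[x_{t+s+1}] - E_t[x_{t+s}], the change in forecasts of equation (91)
    revisions.append(conditional_mean(x_next, s) - conditional_mean(x_t, s))
    print(s, round(revisions[-1], 10), round(nu, 10))
```

Each printed revision equals $\nu_{t+1} = 0.2$ up to floating-point rounding, at every horizon: the shock at $t+1$ shifts the forecast of $1/u'(c)$ permanently by the same amount.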
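$$\frac{1}{u'(c_t)} = E_t\left[\frac{1}{u'(c_{t+1})}\right]$$

$$\frac{1}{u'(c_t)} + \varepsilon_{t+1} = \frac{1}{u'(c_{t+1})}$$

where $\varepsilon_{t+1}$ has mean zero and is uncorrelated with $1/u'(c_t)$. Taking the variance:

$$Var\left(\frac{1}{u'(c_t)}\right) + Var(\varepsilon_{t+1}) = Var\left(\frac{1}{u'(c_{t+1})}\right)$$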
We know from above that $1/u'(c_{t+1})$ depends on new information revealed at time $t+1$, and therefore $Var(\varepsilon_{t+1}) > 0$. It follows that:

$$Var\left(\frac{1}{u'(c_{t+1})}\right) > Var\left(\frac{1}{u'(c_t)}\right)$$
Therefore, in a Pareto optimal allocation the variance of the reciprocal of marginal utility grows over time. If utility is logarithmic, this is equivalent to an increase in consumption inequality over time. The reason for this result is that the government must provide incentives through changes in consumption. Therefore, consumption will grow more for workers with high skill realizations.
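The variance argument can be made concrete under log utility, where $1/u'(c) = c$. This is a minimal sketch assuming $\beta R = 1$, identical agents with $c_0 = 1$, and the hypothetical multiplicative martingale $c_{t+1} = c_t(1 + \sigma e)$, $e = \pm 1$ equally likely, as an illustrative stand-in for the optimal allocation (not one derived in the notes). The cross-section is enumerated exactly over all $2^t$ shock histories, so the variance is exact rather than simulated; for this process $Var(c_t) = (1+\sigma^2)^t - 1$.

```python
from itertools import product

SIGMA = 0.2   # hypothetical dispersion of the innovation epsilon_{t+1}
T = 6         # number of periods to track


def cross_section(t):
    """Consumption c_t of every agent type: all 2**t equally likely shock histories, c_0 = 1."""
    values = []
    for history in product((-1, 1), repeat=t):
        c = 1.0
        for e in history:
            c *= 1.0 + SIGMA * e   # c_{k+1} = c_k + epsilon_{k+1}, with epsilon = SIGMA * e * c_k
        values.append(c)
    return values


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


means = [mean(cross_section(t)) for t in range(T + 1)]
variances = [variance(cross_section(t)) for t in range(T + 1)]
for t in range(T + 1):
    closed_form = (1.0 + SIGMA ** 2) ** t - 1.0
    print(t, round(means[t], 6), round(variances[t], 6), round(closed_form, 6))
```

Average consumption stays at 1 while its cross-sectional variance rises every period and matches the closed form, which is the growing consumption inequality described above.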