DISCUSSION PAPER SERIES
DP 11125
Georg Graetz
Uppsala University, CEP (LSE), CESifo and IZA
NOVEMBER 2017
Any opinions expressed in this paper are those of the author(s) and not those of IZA. Research published in this series may
include views on policy, but IZA takes no institutional policy positions. The IZA research network is committed to the IZA
Guiding Principles of Research Integrity.
The IZA Institute of Labor Economics is an independent economic research institute that conducts research in labor economics
and offers evidence-based policy advice on labor market issues. Supported by the Deutsche Post Foundation, IZA runs the
world’s largest network of economists, whose research aims to provide answers to the global labor market challenges of our
time. Our key objective is to build bridges between academic research, policymakers and society.
IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper
should account for its provisional character. A revised version may be available directly from the author.
ABSTRACT
Human Capital, Signaling, and Employer
Learning: What Insights Do We Gain from
Regression Discontinuity Designs?*
Several recent papers employ the regression discontinuity design (RDD) to estimate the
causal effect of a diploma (or similar credentials) on wages. Using a simple model of
asymmetric information, I show that RDD estimates the information value of a diploma.
A positive information value arises if employers, unable to observe the test score that
determines diploma receipt, infer that workers with a diploma have higher average
productivity than those without. Crucially, a diploma can have information value regardless
of whether workers’ productivity is solely determined by acquisition of knowledge and
skills through studying (the pure human capital model) or whether studying has no effect
on productivity (the pure signaling model). Thus, while RDD estimates of diploma effects
are evidence for information frictions and statistical discrimination, they do not help to
distinguish between human capital and signaling. However, with longitudinal data, RDD
can be used to estimate the speed of employer learning, since RDD coefficients are direct
estimates of (differences in) expectation errors.
Corresponding author:
Georg Graetz
Economics Department
Uppsala University
P.O. Box 513
75120 Uppsala
Sweden
E-mail: [email protected]
* Part of the work on this paper was done while I visited ifo Institute Munich, and I thank its faculty and staff for
their hospitality. I am grateful to Mattias Nordin for many helpful discussions during the gestation of this paper.
I also thank Michael Böhm, Steve Pischke, and Oskar Nordström Skans for their suggestions, which substantially
improved the paper. And for helpful comments, I thank audiences at the Stockholm-Uppsala Economics of Education
Workshop and the Helsinki Center of Economic Research. All remaining errors are my own.
1 Introduction
Why do some people get more education than others, and why are wages and education positively
correlated? According to the human capital view, education raises productivity and hence
wages, and individuals who are more efficient at learning respond to this incentive more strongly
(Becker, 1964; Ben-Porath, 1967). In contrast, the pure signaling view denies any productive
effect of education. Instead, it assumes that individuals’ study efficiency is positively correlated
with productivity, and argues that more-skilled individuals choose higher levels of education to
signal their higher productivity to employers (Spence, 1973). The two theories are difficult to
distinguish in the data, as they both predict positive relationships between wages and education,
as well as between education and skill measures such as IQ.1
Evidence in favor of the pure signaling view is especially hard to come by. For example,
a comparison of two workers with the same number of years in school, where one obtained
a graduation diploma but the other did not, may be contaminated by heterogeneity in human
capital accumulation that is unobserved to the econometrician but observed to employers. To
address this problem, researchers have searched for exogenous variation in diplomas (or similar
credentials) to estimate their signaling value. Regression discontinuity designs (RDDs) using
the exam score that determines diploma receipt have become especially popular in recent years.
By focussing on individuals close to the threshold that must be cleared to obtain the diploma,
one can consistently estimate the causal effect of the diploma on wages. It is often argued that a
positive estimate constitutes evidence in favor of the signaling view of education.2
1 Throughout the paper, I use productivity in the same way as Spence (1973, p.361): “For productivity the reader
may read ‘what the individual is worth to the employer.’ There is no need to rely on marginal productivity here.”
2 For instance, Clark and Martorell (2014) propose using an RDD to estimate the signaling value of a high school
In this paper I show instead that random assignment of signals via an RDD cannot be used
to distinguish between signaling and human capital theories. Wage differences detected by
an RDD do reflect differences in beliefs that employers attach to different diploma statuses.
However, these differences must be due to comparisons of large groups of workers on either
side of the score threshold, since employers do not know which individuals are close to the
threshold. In the sample that the employers’ comparisons are based on, there are true productivity
differences by diploma status. Crucially, these productivity differences could be due to variation
in accumulated human capital, or due to a correlation between productivity and study efficiency
when knowledge has no effect on productivity (the pure signaling view), or due to a combination
of both. Unfortunately, there is no way to tell from the RDD evidence what the source of the
productivity differences is.
I develop a simple model that clarifies these points. The model is one of asymmetric
information, where employers must infer workers’ productivity based on some signal such
as a diploma. Workers can influence the probability of receiving the diploma by acquiring
knowledge and skills through studying, and they differ by study efficiency. Despite the premise
that employers must solve a signal extraction problem, there is still room in my model for both
the pure human capital and the pure signaling views of education in the following sense. Worker
productivity may be completely determined by studying, it may be unaffected by studying,
or it may be a function of both studying and pre-determined characteristics. Thus, I take as the
central question in the human capital versus signaling debate whether education has a causal
effect on productivity, and not whether there exist information frictions in the labor market.
Indeed, human capital theory suggests that, regardless of the existence of information frictions,
policy should remove barriers that prevent people from acquiring their desired level of education.
In contrast, pure signaling theory suggests that education may be socially harmful, leading to
very different policy implications.
Although the conclusion of this paper is negative with respect to the ability of RDD to
address the human capital versus signaling question, I do offer two insights which I hope readers
will find useful. First, I show that RDD estimates the information value of a diploma or similar
certification, that is, the causal effect of the diploma on employers’ beliefs about worker productivity.
This parameter is of interest for our understanding of the labor market and also for policy. If the
information value is estimated to be positive, then this is evidence for information frictions, for
the ability of a diploma to alleviate these frictions, and for statistical discrimination by employers.
If the information value appears to be zero, then this suggests that either information frictions
are absent, or, perhaps more likely, that the diploma does not carry any useful information—and
the latter would be cause for concern to policy makers.
The concept of the information value that I use in this paper is arguably related to the ‘signal-
ing value’ referred to in previous literature. My contribution is to define this concept in relation
to a theoretical model. In this way, misunderstandings can be avoided. For instance, it becomes
clear that RDD or any other quasi-experimental design cannot estimate the information value
“net of human capital effects” (Tyler, Murnane, and Willett, 2000). Any positive information
value will reflect differences in accumulated human capital if the knowledge required to pass the
test is useful in production.
Second, I derive implications of diploma RDD coefficients for employer learning. Given that
diploma RDDs estimate employers’ initial beliefs, the connection with the employer learning
literature is natural.3 To build intuition, I first discuss a simple extension to the theoretical model.
I show that, if worker ability gets revealed over time, then an RDD using wages long after
graduation as the outcome will estimate a zero diploma effect (again, regardless of the relative
importance of human capital and signaling). I then present a more general framework in which
worker productivity is time-varying. I show that if wages are observed at several points in time,
then diploma RDDs can be used to trace out the time path of (differences in) expectation errors,
thus measuring the speed at which employers learn about worker ability. This result does not
depend on the functional form and distributional assumptions made in previous work.
The focus of this paper is RDD because it is arguably the most common research design for
estimating causal effects of diplomas or other certifications (see the next paragraph). However,
the conclusions generalize to any setting in which diplomas are (quasi-)randomly assigned.
Regardless of the identification strategy employed, positive estimates of a diploma’s information
value (or signaling value) cannot be evidence in favor of pure signaling. This is because diplomas
will also have information value in a human capital model featuring asymmetric information.4
There are already several papers estimating diploma RDDs and their number is sure to
increase in the coming years. Clark and Martorell (2014) estimate a zero effect of high-school
diploma receipt in data from Texas. Di Pietro (2017) and Feng and Graetz (2017) use RDDs
to estimate the causal effects of degree class (a coarse measure of university performance) on
early labor market outcomes in the UK. Both papers find no effect on employment, but Feng and
Graetz (2017) estimate a positive effect on the probability of working in a high-wage industry, as
well as on (expected) industry wages and earnings. In Feng and Graetz (2017) we are careful to
point out that our estimates do not speak to the signaling versus human capital question.5 Freier,
Schumann, and Siedler (2015) estimate positive earnings effects of graduating with honors for
German law graduates. While their setting lends itself to a typical ‘diploma RDD,’ they do not
actually implement this due to lack of statistical power. However, their differences-in-differences
design closely mimics an RDD. With reference to Feng and Graetz (2017), they acknowledge
that their evidence cannot distinguish between signaling and human capital. Khoo and Ost (2017)
use an RDD to estimate the effects of Latin honors in the US. They find positive effects on wages
in the first two years after graduating, but not in the third, consistent with employer learning.
3 See for instance Altonji and Pierret (2001), Arcidiacono, Bayer, and Hizmo (2010), Farber and Gibbons (1996),
Fredriksson, Hensvik, and Nordström Skans (2015), Gibbons, Katz, Lemieux, and Parent (2005), and Lange (2007).
4 Tyler, Murnane, and Willett (2000) estimate the information value (they call it signaling value) of the General
Educational Development (GED) credential in the US. Since there are differences in passing standards across states,
Tyler, Murnane, and Willett (2000) are able to essentially compare earnings between individuals with the same
raw score but different GED status. They are motivated by the observations that “[it] has proved difficult (...) to
distinguish between human capital and signaling explanations of the observed relationship between education and
earnings” (p.431) and that “[ideal] data for identifying the returns to a signal would contain exogenous variation in
signaling status among individuals with similar levels of human capital” (p.432). In this paper I affirm the latter
statement, but emphasize that estimating the return to a signal is not sufficient for distinguishing between human
capital and signaling theories.
5 We made the contrary claim in the first working paper version of that article, before realizing our error.
Their interpretation (p.4) is in line with the argument in this paper: “[It] is possible that firms
view Latin honors as a signal of ability, but this ability was learned in college as opposed to
being an innate trait as in Spence’s original model. As such, the signaling value of Latin honors
could theoretically operate through a human capital effect.”6,7
The plan of the paper is as follows. Section 2 presents the model. Section 3 characterizes
a separating equilibrium and proves its existence. Section 4 clarifies what RDD estimates in
relation to the model. Section 5 first extends the model to incorporate learning, and then presents
a more general framework to demonstrate how RDD can be used to estimate the speed of learning.
Section 6 offers a brief concluding discussion of alternative approaches to testing signaling
theory.
I begin with an informal overview. The model economy is populated by large numbers of
risk-neutral workers as well as risk-neutral entrepreneurs who run firms employing the workers.
All workers first study at school. At the end of their studies they take a test, and if they score
above a known threshold, they receive a diploma. Workers can influence their test score through
knowledge acquisition, but cannot manipulate the score precisely. Workers differ by innate
and time-invariant talent, which determines the cost of acquiring knowledge. Productivity is
an increasing function of acquired knowledge, or of innate talent, or of both. After graduation,
workers enter a perfectly competitive labor market and are paid their expected marginal product.
Prior to employing them, firms know nothing about workers apart from whether they have
received a diploma. Hence, there are at most two distinct values that starting wages may take.
There is only one period of production activity, and no learning takes place. I also discuss two
extensions of the model, one that allows for a second signal and hence wage variation conditional
on diploma status, and one that includes learning.
6 A further concern with the interpretation of diploma RDDs is the possibility that employers observe the
underlying running variable. In that case, a positive diploma effect indicates limits on employers’ ability to process
information.
7 There is a close connection between the literature on ‘sheepskin effects’ and the more recent stream of papers
employing RDD to estimate degree effects. Both literatures claim to offer tests of signaling theory. Layard and
Psacharopoulos (1974) argue that if signaling is important, then graduation should be a stronger indication of
productive skills than mere attendance. Hence, there should be larger returns to an extra year of education when
completion of that year coincides with a degree award. They do not find evidence of this in US data. However,
subsequent literature has consistently documented substantial sheepskin effects (see for instance Jaeger and Page,
1996). It is questionable whether these findings are in support of signaling, since graduation years may feature a faster
rate of human capital accumulation, and this may be known by employers if not measured by the econometrician.
An alternative interpretation is that individuals face uncertainty about their returns to schooling, but partially resolve
this uncertainty while in school. This causes low-return individuals to drop out, see Lange and Topel (2006).
I now turn to the formal exposition of the model, starting with test score production and
diploma receipt. Worker $i$’s test score $z_i$ is the sum of her knowledge $k_i \geq 0$ and a noise term $u_i$,
which is revealed only after knowledge has been acquired,
$$z_i = k_i + u_i. \tag{1}$$
The noise is independently and identically distributed with mean zero. Let $F_u$ denote its CDF,
with unbounded support and the symmetry property $F_u(x) = 1 - F_u(-x)$.8 The assumption that
workers cannot perfectly set their test score is realistic, but it is also needed for the validity of an
RDD that uses data generated from this model.
Workers incur a cost when studying, in wage units, of $\theta_i^{-1} c(k_i)$. Innate talent $\theta_i$ is strictly
positive and continuously distributed with bounded support. The cost function is defined on
$[0, \bar{k})$, with $\bar{k} < \infty$, and has the following properties:
$$c'(k) > 0 \text{ if } k > 0, \quad c'(0) = 0; \quad c''(k) > 0; \quad \lim_{k \to \bar{k}} c(k) = \infty. \tag{2}$$
These properties will ensure existence of a separating equilibrium in which optimal knowledge
is distributed over finite support.9 Diploma status is indicated by $d_i$. Receipt of the diploma
requires a non-negative test score,
$$d_i = \begin{cases} 0 & \text{if } z_i < 0 \\ 1 & \text{if } z_i \geq 0. \end{cases} \tag{3}$$
To complete the setup of the model, I specify the production side. Worker $i$ produces $p_i$ units
of output. Productivity is a function of knowledge and innate talent as follows:
$$p_i = a_k k_i + a_\theta \theta_i, \quad a_k, a_\theta \geq 0. \tag{4}$$
Equation (4) nests the pure human capital model where only knowledge affects productivity:
$a_k > 0$, $a_\theta = 0$; the pure signaling model where only innate talent affects productivity:
$a_k = 0$, $a_\theta > 0$ (recall that the cost of acquiring knowledge is an inverse function of innate talent); as
well as a combination of human capital and signaling, $a_k > 0$, $a_\theta > 0$.
8 All functions in this paper are assumed to be differentiable as often as needed.
9 Note that $\bar{k}$ could be a very large number, so the assumption of finite support is not a very restrictive one.
Given the setup of my model, a suitable equilibrium concept is Bayesian Nash Equilibrium
(BNE). In a BNE, workers set their knowledge level, and by implication the probability of
obtaining the diploma, to maximize expected wages net of study costs, given the wage schedule
offered by employers. In turn, employers’ beliefs about workers’ acquired knowledge and type,
conditional on diploma status, are consistent with workers’ actions. While there is a multiplicity
of BNEs in this model, I am most interested in a separating equilibrium in which workers with
a diploma are correctly expected to have acquired more knowledge than those without.10 The
following result claims existence of such an equilibrium.
Proposition 1. Suppose that $a_k, a_\theta \geq 0$ and $a_k + a_\theta > 0$. There exists a Bayesian Nash
Equilibrium in which workers’ chosen knowledge level is a strictly increasing function of innate
talent; and in which workers with a diploma receive a wage $w_1$ that is strictly higher than the
wage earned by workers without a diploma, $w_0$.
To prove the claim, I start by characterizing the properties that such an equilibrium would
have if it existed.
Suppose that $w_1 > w_0$ and consider the workers’ optimal knowledge acquisition choice. The
probability of receiving a diploma, given knowledge, is $1 - F_u(-k_i)$, which equals $F_u(k_i)$ due
to the symmetric distribution of the noise term. Hence, optimal knowledge is determined by
solving the following optimization problem:
$$\max_{k_i \geq 0} \; w_0 + F_u(k_i)(w_1 - w_0) - \theta_i^{-1} c(k_i). \tag{5}$$
Let $f_u$ denote the density of the noise term. The first-order condition for an interior solution is
$$f_u(k_i)(w_1 - w_0) = \theta_i^{-1} c'(k_i). \tag{6}$$
Recall that $k_i$ is non-negative and $u_i$ has a symmetric distribution with mean zero. Therefore,
$f_u'(k_i) \leq 0$. This, together with the properties of the cost function (2), ensures that the second-order
condition is always satisfied. The first- and second-order conditions can then be used to
demonstrate that optimal knowledge at the interior solution is strictly increasing in talent $\theta_i$ and
strictly increasing in the wage gap $w_1 - w_0$. To ensure that the interior solution delivers the
global maximum, we need to rule out that $k_i = 0$ is optimal for some workers, which is done by
assuming that the derivative of the cost function is zero at zero knowledge, as stated in (2).
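The comparative statics claimed in this paragraph follow from implicit differentiation. As a sketch, write $\delta = w_1 - w_0$ and take the interior first-order condition implied by the setup, $f_u(k)\,\delta = \theta^{-1} c'(k)$, where $f_u$ is the density of the noise term (assumed here to be strictly positive, in line with unbounded support). The function $G$ is notation introduced only for this derivation.

$$G(k, \theta, \delta) \equiv f_u(k)\,\delta - \theta^{-1} c'(k) = 0, \qquad G_k = f_u'(k)\,\delta - \theta^{-1} c''(k) < 0,$$

$$\frac{\partial k^*}{\partial \theta} = -\frac{G_\theta}{G_k} = \frac{\theta^{-2} c'(k^*)}{\theta^{-1} c''(k^*) - f_u'(k^*)\,\delta} > 0, \qquad \frac{\partial k^*}{\partial \delta} = -\frac{G_\delta}{G_k} = \frac{f_u(k^*)}{\theta^{-1} c''(k^*) - f_u'(k^*)\,\delta} > 0.$$

The common denominator is positive by the second-order condition, and the numerators are positive because $c'(k^*) > 0$ at any interior $k^* > 0$ and $f_u > 0$.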
10 There is also a trivial pooling equilibrium in which employers pay the same wage to everybody, regardless of
diploma status, and workers rationally refrain from studying. The test score and diploma status are then completely
uninformative about worker types, reinforcing the employers’ equal-pay strategy.
I now turn to employers’ beliefs. If the distribution of knowledge is non-degenerate, then
the expected knowledge level of a worker with diploma is higher than that of one without. Let
the CDF of optimal knowledge conditional on diploma status be denoted by $F_{k^*|d}$. Then we
have $F_{k^*|d=1}(x) \leq F_{k^*|d=0}(x)$, with strict inequality for some $x$. In other words, $F_{k^*|d=1}$ first-order
stochastically dominates $F_{k^*|d=0}$, implying that the expected value of $k^*$ is higher for workers
with a diploma.11 The same argument applies to expected innate talent: it is also higher for
workers holding a diploma, $F_{\theta|d=1}(x) \leq F_{\theta|d=0}(x)$. In a competitive market, wages equal the
expected value of productivity conditional on diploma status, $w_i = E[p_i|d_i]$. Since workers who
possess a diploma have more knowledge on average than those who do not, they earn higher
wages. This may be because higher expected knowledge implies higher levels of human capital,
$a_k > 0$. Or it could be because higher expected knowledge implies higher levels of innate talent,
implying higher productivity if $a_\theta > 0$. In sum, a diploma leads to higher wages whenever
knowledge, or innate talent, or both, affect productivity.
Let the difference in wages between workers with diploma and those without be denoted by
$\delta \equiv w_1 - w_0$. Then we can write, using (4),
$$\begin{aligned}
\delta &= E[p_i|d_i = 1] - E[p_i|d_i = 0] \\
&= a_k \int_{S_{k^*}} x \, d\left[F_{k^*|d=1}(x) - F_{k^*|d=0}(x)\right] + a_\theta \int_{S_\theta} x \, d\left[F_{\theta|d=1}(x) - F_{\theta|d=0}(x)\right].
\end{aligned}$$
Applying integration by parts, and using the fact that knowledge and talent have bounded support,
we obtain
$$\delta = a_k \int_{S_{k^*}} \left[F_{k^*|d=0}(x) - F_{k^*|d=1}(x)\right] dx + a_\theta \int_{S_\theta} \left[F_{\theta|d=0}(x) - F_{\theta|d=1}(x)\right] dx. \tag{7}$$
The right-hand side of (7) is strictly positive (because $F_{\cdot|d=1}(x) \leq F_{\cdot|d=0}(x)$, with strict inequality
for some $x$), and it is a function of $\delta$ since, by the FOC (6), $\delta$ affects knowledge choice and
hence the distribution of the test score.
Thus, all that is needed for a separating equilibrium to exist is for the right-hand side of (7)
to be bounded even as δ grows arbitrarily large. Since all terms on the right-hand side of (7) are
either constants or CDFs that are integrated over finite support, this condition is satisfied. This
completes the proof of Proposition 1.
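The fixed-point logic behind (7) can be explored numerically. The following is a minimal sketch under assumptions that are not in the paper: standard normal noise, talent uniform on [0.5, 1.5], the cost function c(k) = k^2 / (K_BAR - k) with K_BAR = 3 (which satisfies the properties in (2)), the productivity weights 150 and 40, and the search bracket [1, 5]. All function names are hypothetical. For each candidate wage gap, the code computes workers' best responses from the interior first-order condition f_u(k)(w1 - w0) = c'(k)/theta, evaluates the implied difference in expected productivity by diploma status, and bisects for a wage gap that reproduces itself.

```python
import math

K_BAR = 3.0   # hypothetical upper bound of the knowledge domain [0, K_BAR)


def pdf(x):
    """Density f_u of the noise term, assumed standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """CDF F_u of the noise term, assumed standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_cost(k):
    """c'(k) for the assumed cost c(k) = k^2 / (K_BAR - k)."""
    return k * (2.0 * K_BAR - k) / (K_BAR - k) ** 2


def optimal_knowledge(theta, delta):
    """Bisection on the interior condition f_u(k) * delta = c'(k) / theta."""
    if delta <= 0.0:
        return 0.0                      # no wage gap, no incentive to study
    lo, hi = 0.0, K_BAR * (1.0 - 1e-9)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if pdf(mid) * delta * theta > marginal_cost(mid):
            lo = mid                    # marginal benefit still exceeds marginal cost
        else:
            hi = mid
    return 0.5 * (lo + hi)


def belief_gap(delta, a_k, a_theta, n=200):
    """E[p | d = 1] - E[p | d = 0] when workers best-respond to the wage gap delta."""
    sum1 = mass1 = sum0 = mass0 = 0.0
    for j in range(n):
        theta = 0.5 + (j + 0.5) / n     # midpoint grid for theta ~ U[0.5, 1.5]
        k = optimal_knowledge(theta, delta)
        p = a_k * k + a_theta * theta   # productivity, equation (4)
        pass_prob = cdf(k)              # Pr(d = 1 | k) = F_u(k)
        sum1 += p * pass_prob
        mass1 += pass_prob
        sum0 += p * (1.0 - pass_prob)
        mass0 += 1.0 - pass_prob
    return sum1 / mass1 - sum0 / mass0


def positive_fixed_point(a_k, a_theta, lo=1.0, hi=5.0):
    """Bisect belief_gap(delta) - delta on [lo, hi]; None if there is no sign change."""
    def excess(delta):
        return belief_gap(delta, a_k, a_theta) - delta
    if excess(lo) <= 0.0 or excess(hi) >= 0.0:
        return None
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


delta_human_capital = positive_fixed_point(a_k=150.0, a_theta=0.0)
delta_signaling = positive_fixed_point(a_k=0.0, a_theta=40.0)
print("pure human capital:", delta_human_capital)
print("pure signaling:    ", delta_signaling)
```

Under these assumed values, a strictly positive self-confirming wage gap is found in both polar cases. A wage gap of zero also reproduces itself, which corresponds to the pooling equilibrium mentioned in footnote 10. Whether a positive fixed point lies in a given bracket depends on the chosen parameters, so the function returns None rather than guessing when the bracket shows no sign change.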
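11 To see that $F_{k^*|d=1}(x) \leq F_{k^*|d=0}(x)$, note that
$$F_{k^*|d=1}(x) \equiv \Pr(k^* \leq x|z \geq 0) = \frac{\Pr(z \geq 0|k^* \leq x)\Pr(k^* \leq x)}{\Pr(z \geq 0)}$$
and
$$F_{k^*|d=0}(x) \equiv \Pr(k^* \leq x|z < 0) = \frac{\Pr(z < 0|k^* \leq x)\Pr(k^* \leq x)}{\Pr(z < 0)} = \frac{[1 - \Pr(z \geq 0|k^* \leq x)]\Pr(k^* \leq x)}{1 - \Pr(z \geq 0)}.$$
Comparing the two rightmost expressions, we see that $F_{k^*|d=1}(x) \leq F_{k^*|d=0}(x)$ requires $\Pr(z \geq 0|k^* \leq x) \leq \Pr(z \geq 0)$,
which holds since $\Pr(z \geq 0) \equiv \Pr(z \geq 0|k^* < \infty)$ and the probability of passing is increasing in knowledge.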
4 RDD estimates the information value of a diploma, but cannot distinguish between
signaling and human capital theories of education
Recall that the difference in wages between workers with diploma and those without is denoted by
$\delta \equiv w_1 - w_0$. I call this the information value of the diploma, because in a separating equilibrium
it reflects the difference in employers’ beliefs about the productivities of workers with different
diploma status. Here I show that an RDD that uses data generated by my model consistently
estimates this value.
Before proceeding, it is worth noting that the model as stated does not require an RDD for
estimation of δ . In fact, a comparison of mean wages by diploma status will suffice. In reality,
however, this is unlikely to be the case, and a simple modification of the model demonstrates
why. Suppose that employers receive a second signal about workers’ ability, $s_i = \theta_i + \eta_i$, where
$\eta_i$ is an independently and identically distributed noise term. Suppose further that this signal,
although it is correlated with type, cannot be controlled by the workers, and that workers acquire
knowledge unaware of the value of the second signal. This signal will generate wage dispersion
even conditional on diploma status. Moreover, because test scores vary with knowledge, which
in turn is a deterministic function of innate talent, there will be a positive correlation between
wages and test scores even conditional on diploma status. A comparison of mean wages by
diploma status will then give an upwardly biased estimate of δ . However, an RDD would address
this problem. To simplify the discussion, I will focus for now on the case where the diploma
is the only signal (but perhaps the econometrician is not aware of this). I return to the case of
multiple signals below.
The running variable for the RDD is the test score $z_i$, the treatment is diploma receipt as
indicated by $d_i$, and the assignment rule is given by (3). The RDD consistently estimates the
causal effect of diploma receipt on wages, or the information value of the diploma $\delta$, provided
that the distribution of innate talent does not jump at the cutoff. A sufficient condition is that
workers cannot precisely manipulate their test score (Lee and Lemieux, 2010). This condition is
satisfied thanks to the noise term in equation (1).
Why then does the RDD not help to distinguish between signaling and human capital
theories of education? In terms of the model and Proposition 1, the reason is simply that the
diploma has information value ($\delta > 0$) regardless of whether productivity is determined by
acquired knowledge only, $a_k > 0$ and $a_\theta = 0$, which corresponds to a pure human capital theory
of education; or whether acquired knowledge has no effect on productivity, $a_k = 0$ and $a_\theta > 0$,
corresponding to a pure signaling view; or whether a combination of both theories holds, $a_k > 0$
and $a_\theta > 0$.
Equation (7) demonstrates that RDD, although comparing statistically identical marginal
workers, identifies a quantity that contains information about infra-marginal workers (the
cumulative distribution functions conditional on diploma receipt). Perhaps it is this insight
that is overlooked by researchers who interpret RDD estimates of diploma effects as evidence
of signaling. The implication of an RDD detecting a positive diploma information value is
that employers have different beliefs about workers with diploma from those without. These
differences in beliefs must be based on a comparison of a large number of workers with diploma to
a large number of workers without, and cannot be based on a comparison just of workers close
to the threshold. Since employers do not observe test scores, they cannot make the latter comparison;
and if they could, there would be no diploma information value. Finally, the observation that
employers expect workers with diploma to have higher productivity than those without, tells us
nothing about the sources of these differences in beliefs. It could be that knowledge generates
human capital, of which workers who study more efficiently will accumulate more. Or, it could
be that knowledge has no impact on productivity, but that study efficiency and productivity
are positively correlated, hence allowing workers to signal their type through acquisition of a
diploma. These issues cannot be resolved by an RDD, nor any other way of (quasi-)randomly
assigning diplomas. And although quasi-experimental designs are capable of estimating diploma
information values, these estimates cannot be interpreted as “net of human capital effects” (Tyler,
Murnane, and Willett, 2000).
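The argument of this section can be illustrated with simulated data. The sketch below is not the paper's equilibrium computation. It assumes standard normal test-score noise, talent uniform on [0.5, 1.5], and, as a stand-in for the strictly increasing knowledge choice of Proposition 1, the hypothetical rule k = sqrt(theta). Wages are set to the mean productivity of each diploma group, and the RDD is implemented as a simple difference in means within a narrow bandwidth around the cutoff. The function and variable names are illustrative only.

```python
import math
import random


def simulate_rdd(a_k, a_theta, n=200_000, bandwidth=0.05, seed=0):
    """Simulate the one-signal model and compare wages and productivity at the cutoff."""
    rng = random.Random(seed)
    workers = []
    for _ in range(n):
        theta = rng.uniform(0.5, 1.5)      # innate talent, bounded support
        k = math.sqrt(theta)               # ASSUMPTION: stand-in for the increasing policy k*(theta)
        z = k + rng.gauss(0.0, 1.0)        # test score, equation (1)
        d = 1 if z >= 0.0 else 0           # diploma rule, equation (3)
        p = a_k * k + a_theta * theta      # productivity, equation (4)
        workers.append((z, d, p))

    # Employers observe only d, so each worker is paid the mean productivity of her group.
    wage = {}
    for status in (0, 1):
        group = [p for _, d, p in workers if d == status]
        wage[status] = sum(group) / len(group)

    def local_gap(outcome):
        """Difference in mean outcomes just above and just below the cutoff."""
        near = [(d, outcome(d, p)) for z, d, p in workers if abs(z) < bandwidth]
        above = [y for d, y in near if d == 1]
        below = [y for d, y in near if d == 0]
        return sum(above) / len(above) - sum(below) / len(below)

    return {
        "delta": wage[1] - wage[0],                        # information value of the diploma
        "rdd_wage": local_gap(lambda d, p: wage[d]),       # RDD with wages as the outcome
        "rdd_productivity": local_gap(lambda d, p: p),     # true productivity gap at the cutoff
    }


human_capital = simulate_rdd(a_k=1.0, a_theta=0.0)
signaling = simulate_rdd(a_k=0.0, a_theta=1.0)
for label, result in (("pure human capital", human_capital), ("pure signaling", signaling)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

In both polar cases the wage discontinuity equals the population-wide difference in expected productivity by diploma status, while marginal workers on either side of the cutoff have nearly identical productivity. The sign of the RDD estimate therefore carries no information about which of the two cases generated the data.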
It is worth thinking about wage variation away from the cutoff more thoroughly. Clark and
Martorell (2014, p.284) argue that “in the broader population, the measured earnings advantage
enjoyed by workers with diplomas reflects [the diploma’s] signaling value plus any productivity
differences that firms observe.” Let us return to the case mentioned above where employers
receive another noisy signal $s_i$ about workers’ type. Worker $i$ now earns a wage $w_i = E[p_i|d_i, s_i]$.
In general, it is difficult to determine the functional form of this conditional expectation. A natural
starting point is an additive formulation, $E[p_i|d_i, s_i] = \delta d_i + g(s_i)$, where $g(\cdot)$ is some strictly
increasing, continuous function. The simple mean comparison $E[w_i|d_i = 1] - E[w_i|d_i = 0]$ now
delivers $\delta + E[g(s_i)|d_i = 1] - E[g(s_i)|d_i = 0]$, which overstates the diploma information value
since innate talent is positively correlated with both diploma receipt and the second signal.
However, by comparing workers close to the threshold, an RDD is capable of recovering $\delta$.12
In general, we may have $E[p_i|d_i, s_i] = h(d_i, s_i)$, where $h(\cdot, \cdot)$ is not additive in the two signals.
The information value of the diploma will then not be constant, and a standard RDD will deliver
a local estimate instead.13 In any case, the substantive point is that wage variation around the
cutoff comes from the diploma signal; wage variation away from the cutoff, and conditional on
diploma receipt, is due to separate information about workers’ types that employers receive; and
wage differences by diploma status in the whole population reflect both the diploma signal and
the separate information. Again, this is true regardless of whether the labor market is described
by a pure signaling model, or by a human capital model.
12 Of course, implementation of an RDD requires dealing with the relationship between wages and test scores
away from the cutoff, for instance using local polynomials.
13 It is possible to extrapolate RDD treatment effects in some circumstances, as shown by Angrist and Rokkanen
(2015).
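The two-signal case with an additive wage equation can also be sketched in a short simulation. Everything numerical here is an assumption made for illustration: delta = 0.2, g(s) = s, normal noise terms, talent uniform on [0.5, 1.5], the stand-in knowledge rule k = sqrt(theta), and a bandwidth of 0.5. In the spirit of footnote 12, the RDD is implemented with a separate linear fit on each side of the cutoff (a local linear regression with a uniform kernel), and the estimate is the difference between the two fitted values at the cutoff.

```python
import math
import random


def boundary_intercept(points):
    """Fitted value at z = 0 of an OLS line through (z, wage) pairs."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_w = sum(w for _, w in points) / n
    s_zz = sum((z - mean_z) ** 2 for z, _ in points)
    s_zw = sum((z - mean_z) * (w - mean_w) for z, w in points)
    return mean_w - (s_zw / s_zz) * mean_z


def compare_estimators(delta=0.2, n=200_000, bandwidth=0.5, seed=1):
    """Return (naive mean comparison, local linear RDD estimate) for a known delta."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        theta = rng.uniform(0.5, 1.5)                 # innate talent
        z = math.sqrt(theta) + rng.gauss(0.0, 1.0)    # ASSUMPTION: k = sqrt(theta), then equation (1)
        d = 1 if z >= 0.0 else 0                      # diploma rule, equation (3)
        s = theta + rng.gauss(0.0, 0.5)               # second signal s = theta + eta
        wage = delta * d + s                          # additive wage equation with g(s) = s
        sample.append((z, d, wage))

    with_diploma = [w for _, d, w in sample if d == 1]
    without_diploma = [w for _, d, w in sample if d == 0]
    naive = sum(with_diploma) / len(with_diploma) - sum(without_diploma) / len(without_diploma)

    right = [(z, w) for z, d, w in sample if 0.0 <= z < bandwidth]
    left = [(z, w) for z, d, w in sample if -bandwidth <= z < 0.0]
    rdd = boundary_intercept(right) - boundary_intercept(left)
    return naive, rdd


TRUE_DELTA = 0.2
naive_estimate, rdd_estimate = compare_estimators(delta=TRUE_DELTA)
print("true delta:", TRUE_DELTA)
print("mean comparison:", round(naive_estimate, 3))
print("local linear RDD:", round(rdd_estimate, 3))
```

The mean comparison picks up the difference in the second signal between diploma groups and so exceeds the assumed delta, whereas the estimate at the cutoff stays close to it. The bandwidth and the uniform kernel are convenience choices for this sketch, not recommendations.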
5 Both signaling and human capital will produce zero RDD estimates for wages in the
long run if ability is revealed over time, and RDD can be used to estimate the speed of
learning
In this section I explore how RDD can contribute to our understanding of employer learning. I
first take the model of Sections 2 and 3 and modify it to allow for learning. Leaving that model
behind, I then consider a more general framework.
To analyze the role of learning in the model of the previous sections, let us add a second
period of production to the model. At the end of the first period, workers’ ability is fully revealed.
This means that in the second period, all workers are paid their true marginal product pi . Since
innate talent, and hence knowledge, vary continuously around the threshold for diploma receipt,
an RDD using second-period wages will estimate a zero effect. Because workers’ ability has
been revealed, the diploma loses its information value. Once more, this holds regardless of the
relative importance of signaling and human capital.
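The contrast between the two periods can be stated in terms of the RDD estimand. As a sketch, let $w_{i1}$ and $w_{i2}$ denote first- and second-period wages and $k^*(\theta_i)$ the equilibrium knowledge choice (labels introduced here for illustration only). In the first period wages depend only on diploma status, while in the second period they equal true productivity:

$$\lim_{z \downarrow 0} E[w_{i1}|z_i = z] - \lim_{z \uparrow 0} E[w_{i1}|z_i = z] = w_1 - w_0 = \delta,$$

$$\lim_{z \downarrow 0} E[w_{i2}|z_i = z] - \lim_{z \uparrow 0} E[w_{i2}|z_i = z] = \lim_{z \downarrow 0} E[a_k k^*(\theta_i) + a_\theta \theta_i|z_i = z] - \lim_{z \uparrow 0} E[a_k k^*(\theta_i) + a_\theta \theta_i|z_i = z] = 0.$$

The second expression is zero for any $a_k, a_\theta \geq 0$ because the distribution of $\theta_i$ conditional on the test score is continuous at the cutoff.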
Now consider a more realistic framework in which productivity may be time-varying. Let
individual $i$’s productivity at time $t$ be given by
$$p_{it} = \pi(\theta_i, t). \tag{8}$$
The function $\pi$ is assumed to be constant across individuals and over time, and continuous in
its arguments. These arguments include the time-invariant worker trait θi , where we assume as
before that such innate talent may influence productivity directly or indirectly through acquired
human capital; and time t reflecting productivity growth due to further human capital investments
such as on-the-job training as well as any aggregate labor productivity growth. The two arguments
may interact, so that differences in innate talent may lead to differences in productivity growth.
Employers know the function π.
Note that I omit knowledge k from (8) to save on notation. In terms of the model of Section
2, its effects are of course captured by the presence of θ given the one-to-one mapping between
innate talent and knowledge. The distinction between direct productivity effects of θ , and
indirect effects through k, is not essential here.
Equation (8) is a generalization of productivity processes typically assumed in the employer
learning literature, see for instance equation (2) in Lange (2007).14 It implies that true productiv-
ity is independent of past productivity realizations, as well as past wages, conditional on θi and t.
I discuss the plausibility of these assumptions below. For now, let us simply observe that they
are in line with the employer learning literature. (To be sure, expectations of productivity formed
by employers are of course assumed to depend on past productivity in this literature.)
14 Note that θ could be a vector rather than a scalar.
Given risk neutrality of employers and workers, wages equal the expected value of productivity
according to the employers’ beliefs, $w_{it} = \hat{E}_t[p_{it}]$, which may be different from $p_{it}$. Indeed,
let us assume that employers are initially uncertain about θi , and that this is the object that
they learn about over time. Following the employer learning literature I assume that any new
information that is revealed about individuals’ productivity is known to all market participants,
so that all employers share the same beliefs. The time subscript in $\hat{E}_t$ reflects the information
that employers acquire over time (as well as their prior knowledge about how the conditional
means change with time, see below).
Average productivity may be conditioned on whether individuals fall below or above the
threshold, denoted by $E[p_{it} \mid d_i = 0]$ and $E[p_{it} \mid d_i = 1]$, respectively. We can similarly condition
employers’ beliefs, and hence we can write average wages for workers below and above the
threshold as $w_{0t} = \hat{E}_t[p_{it} \mid d_i = 0]$ and $w_{1t} = \hat{E}_t[p_{it} \mid d_i = 1]$. For workers close to the threshold
(‘local’), we have $E[p_{it} \mid d_i = 0, \text{local}] = E[p_{it} \mid d_i = 1, \text{local}]$ given (8) and since θi does not jump at
the threshold. As in the simple extension of the model, if employers have learned about workers’
productivity (that is, the time-invariant component θi has been revealed), then
$\hat{E}_t[p_{it}] = E[p_{it}]$ and in particular $\hat{E}_t[p_{it} \mid d_i, \text{local}] = E[p_{it} \mid d_i, \text{local}]$, hence
$w^{\text{local}}_{0t} = w^{\text{local}}_{1t}$.
How fast do employers learn about worker ability? If wage data at several points in time are
available, then an RDD can actually help answer this question. Estimating the speed of learning
is a difficult problem because it is typically not possible to observe the mistakes that employers
make when assessing workers’ productivity. Lange (2007) employs an indirect approach using
the insight of Altonji and Pierret (2001) that initially-unobserved (by the employer) correlates
of productivity, such as IQ scores, should increase in importance in a wage equation over time,
while the opposite is true for easily observed correlates such as years of schooling. Lange (2007)
shows that the changes in coefficients over time are informative about the speed of learning under
specific functional form and distributional assumptions. Using data from the NLSY, he estimates
that employers’ expectation errors—the difference between wages and true productivity—decline
by 50 percent within three years.
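To make the idea of reading a learning speed off RDD estimates concrete, here is a hedged sketch. It assumes, purely for illustration, that the RDD estimate (which prior to revelation measures the gap in average expectation errors at the threshold) decays exponentially with time since labor market entry. The functional form, the function names, and the numbers in `rdd_estimates` are all hypothetical and are not taken from any study. Under this assumption, a 50 percent decline within three years corresponds to a decay rate of ln(2)/3 per year.

```python
import math


def decay_rate(years, estimates):
    """Estimate the exponential decay rate by OLS of log(estimate) on time.

    Assumes estimate_t = estimate_0 * exp(-rate * t), so all estimates
    must be strictly positive for the logarithm to be defined.
    """
    logs = [math.log(e) for e in estimates]
    n = len(years)
    t_bar = sum(years) / n
    l_bar = sum(logs) / n
    stt = sum((t - t_bar) ** 2 for t in years)
    stl = sum((t - t_bar) * (l - l_bar) for t, l in zip(years, logs))
    return -stl / stt


def half_life(rate):
    """Years until the estimate has fallen by half, given the decay rate."""
    return math.log(2) / rate


# Hypothetical RDD estimates of the diploma effect on log wages,
# by years since labor market entry (made-up numbers).
years = [0, 1, 2, 3, 4]
rdd_estimates = [0.080, 0.063, 0.051, 0.040, 0.032]

rate = decay_rate(years, rdd_estimates)
benchmark_rate = math.log(2) / 3   # 50 percent decline within three years

print(f"estimated decay rate: {rate:.3f} per year")
print(f"implied half-life:    {half_life(rate):.2f} years")
print(f"benchmark decay rate: {benchmark_rate:.3f} per year")
```

The made-up estimates were chosen so that the implied half-life is roughly three years. With real data one would also want standard errors, and the exponential form would itself need to be checked.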
Any estimation approach for the speed of learning of course requires there to be evidence for
learning in the first place. For Lange (2007), this evidence is the agreement of his data with the
predictions of Altonji and Pierret (2001). The same predictions can be applied to the diploma
RDD context. Learning requires the RDD coefficients to decline over time—the diploma here
takes over the role of years of schooling; and it requires the slope coefficients to increase over
time—the running variable takes over the role of the IQ score. I assume from now on that we
deal with data in which such evidence for learning is present. In particular, I assume that RDD
coefficients are indistinguishable from zero from some point T onwards, as in Khoo and Ost
(2017).
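A stylized worked example may help to see why learning produces both patterns at once. The reduced form below is an assumption made for illustration, not part of the framework in this section: wages are a weighted average of the diploma-based belief $m_{dt} \equiv E[p_{it} \mid d_i = d]$ and true productivity, with a hypothetical weight $\lambda_t$ that rises from zero to one as employers learn, and productivity is locally linear in the running variable $s_i$ around the threshold $c$.

```latex
% Illustrative assumptions (not from the paper): the weight \lambda_t and the
% local linearity of productivity in the running variable s_i.
\begin{align*}
w_{it} &= (1-\lambda_t)\, m_{d_i t} + \lambda_t\, p_{it},
          \qquad 0 \le \lambda_t \le 1, \\
p_{it} &= \alpha_t + \beta_t s_i \quad \text{for } s_i \text{ close to } c, \\
\lim_{s \downarrow c} E[w_{it} \mid s_i = s] - \lim_{s \uparrow c} E[w_{it} \mid s_i = s]
       &= (1-\lambda_t)\,(m_{1t} - m_{0t}), \\
\frac{\partial E[w_{it} \mid s_i = s]}{\partial s}
       &= \lambda_t\, \beta_t \quad \text{for } s \neq c.
\end{align*}
```

The jump contains no productivity term because $p_{it}$ is continuous at $c$. As $\lambda_t$ rises, the RDD coefficient shrinks toward zero while the slope in the running variable rises toward $\beta_t$, which is the pattern described above.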
Positive diploma RDD coefficients for entry wages imply that employers initially over-
(under-)predict the productivity of workers closely above (below) the threshold, on average.
Thus, RDD estimates prior to ability being revealed are direct estimates of the difference
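between expectation errors for workers just above and just below the cutoff. Let the expectation
error for individual i be defined as $e_{it} \equiv \hat{E}_t[p_{it}] - p_{it}$. Furthermore, let average expectation
errors, conditioned on diploma status, be defined as $e_{1t} \equiv \hat{E}_t[p_{it} \mid d_i = 1] - E[p_{it} \mid d_i = 1]$ and
$e_{0t} \equiv \hat{E}_t[p_{it} \mid d_i = 0] - E[p_{it} \mid d_i = 0]$, respectively. The diploma information value (DIV) that
RDD estimates is now formally expressed as
$$\mathrm{DIV} \equiv \hat{E}_t[p_{it} \mid d_i = 1, \text{local}] - \hat{E}_t[p_{it} \mid d_i = 0, \text{local}].$$
Because true average productivity does not jump at the threshold, $E[p_{it} \mid d_i = 0, \text{local}] = E[p_{it} \mid d_i = 1, \text{local}]$,
this equals the difference in average expectation errors for workers close to the threshold:
$$\mathrm{DIV} = \left\{ \hat{E}_t[p_{it} \mid d_i = 1, \text{local}] - E[p_{it} \mid d_i = 1, \text{local}] \right\} - \left\{ \hat{E}_t[p_{it} \mid d_i = 0, \text{local}] - E[p_{it} \mid d_i = 0, \text{local}] \right\} \equiv e^{\text{local}}_{1t} - e^{\text{local}}_{0t}.$$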
15 One could go further by allowing productivity to be stochastic: Each individual’s productivity is drawn from an
idiosyncratic, time-varying distribution, whose mean is given by (8). By learning about θi , employers then learn
about the mean of the idiosyncratic productivity distributions. Rather than differences in expectation errors, RDD
estimates differences in “errors in beliefs.” The result that RDD can estimate the speed of learning carries over to
this more general setting. I have chosen to focus on the non-stochastic case mainly to avoid additional notation.
$\hat{E}_T[p_{iT} \mid d_i = 0] = \hat{E}_T[p_{iT} \mid d_i = 1]$. In this scenario, employers acknowledge some persistence of
individual heterogeneity across education and labor market entry, $\hat{E}_{t'}[p_{it'} \mid d_i = 0] < \hat{E}_{t'}[p_{it'} \mid d_i = 1]$,
so that RDD estimates are initially positive. But employers also believe that after a certain
amount of time, individuals’ productivity draws are uncorrelated with their initial ones, so that
any early signals become uninformative eventually, and this is what produces zero RDD estimates
in the long run. Empirically, this scenario could be ruled out by gathering independent evidence
on the persistence of wages and (beliefs about) productivity. One possibility is to examine the
slope of the running variable: mean reversion implies that the slope declines over time. But note
that this would contradict my premise that the data show evidence of learning.
Second, in contradiction to (8), true expected productivity may depend on the history of
productivity draws as well as on past wages, even after conditioning on time-invariant worker
traits. Suppose, for instance, that human capital accumulation such as on-the-job training is
affected by transitory productivity variation. If there is learning, then RDD estimates will still
be zero in the long run as there are no jumps at the threshold in the parameters determining
transitory productivity variation. However, the interpretation of the speed of learning will be
more nuanced since the learning is now about a moving target. More problematically, past
wages may affect human capital accumulation, due to liquidity constraints or because employers’
training choices are based on the same information as their wage setting (Kahn and Lange, 2014).
Thus, workers just above the cutoff accumulate human capital at a faster rate than workers just
below, at least temporarily. With learning, the differences in investment rates disappear over
time, and if human capital depreciates sufficiently fast, so do the productivity differences. The
time it takes for RDD estimates to become zero (or to stabilize around some positive value, if
depreciation is relatively slow) is then an upper bound to the time it would take for expectation
errors to disappear in the absence of differential human capital accumulation.
Regression discontinuity designs that estimate the causal effect of a diploma or similar
certification on wages are useful because they can detect the existence of information frictions and
statistical discrimination; because they quantify the information value of a diploma; and because
they can be used to measure the speed of employer learning. However, the notion that RDDs can
speak to the question of whether education produces human capital or serves a pure signaling
function does not withstand the scrutiny of a theoretical model. A positive diploma information
value (or signaling value) is not evidence in favor of pure signaling, because diplomas can
also have information value in a human capital model featuring asymmetric information. This
conclusion applies to any setting in which diplomas are (quasi-)randomly assigned.
There are at least three viable, alternative approaches to test for the relative importance of
human capital and signaling. One approach is to estimate the causal effect of schooling on
productive skills such as IQ. Carlsson, Dahl, Öckert, and Rooth (2015) accomplish this by using
exogenous variation in the date on which Swedish men took cognitive tests for (compulsory)
military enlistment. They find that an extra ten days in school raise cognitive skills by one percent
of a standard deviation, while extra non-school days have almost no effect. This contradicts the
pure signaling view of education.
A second approach is to deduce additional predictions from signaling theory that are truly
distinct from those of the human capital model. Bedard (2001) develops a signaling model in
which some high-ability individuals face barriers to entering university. If university access is
expanded, then some of the previously constrained individuals enter. As the pool of high school
graduates decreases in average quality, and employers realize this, some low-skill individuals no
longer find high school worthwhile. The high school dropout rate increases, a prediction that
contradicts the pure human capital model. Bedard (2001) does find suggestive evidence of this
in US data. It seems worthwhile to search for further distinct predictions of signaling theory, and
to confront them with data.16
A third approach to assessing the importance of signaling has been developed in the employer
learning literature. This literature documents the importance of information frictions in the labor
market, but also the ability of employers to overcome these frictions through learning. Lange
(2007) and Lange and Topel (2006) highlight that learning imposes limits on the importance of
signaling: the faster true ability is revealed, the less time is available for recouping the educational
investment. With the help of a theoretical model, one can then use estimates of the speed of
learning together with estimates of the opportunity cost of schooling to bound the importance of
signaling. Lange (2007) suggests that for reasonable parameter choices, the upper bound on the
contribution of signaling to the observed return to schooling is smaller than 15 percent. Learning
appears to be simply too fast for signaling to play an important role in education choices. By
estimating the speed of learning, diploma RDDs—and other strategies for isolating quasi-random
variation in labor market signals—could potentially make a contribution to the signaling versus
human capital debate, after all, albeit perhaps in a different way than originally intended.
16 In a similar spirit, Lang and Kropp (1986) argue that in a signaling model, increases in the compulsory schooling
age will shift the entire distribution of schooling. They do find evidence of such a ‘ripple effect’ in US data. However,
Lange and Topel (2006) suggest that increases in the compulsory schooling age may reflect secular increases in the
value of education, which would produce the same patterns in the data.
References
ACEMOGLU, D., AND D. H. AUTOR (2009): “Lectures in Labor Economics,” unpublished manuscript,
https://economics.mit.edu/files/4689, accessed on July 17, 2017.
ALTONJI, J. G., AND C. R. PIERRET (2001): “Employer Learning and Statistical Discrimination,” The Quarterly
Journal of Economics, 116(1), 313–350.
ANGRIST, J. D., AND M. ROKKANEN (2015): “Wanna Get Away? Regression Discontinuity Estimation of Exam
School Effects Away From the Cutoff,” Journal of the American Statistical Association, 110(512), 1331–1344.
ARCIDIACONO, P., P. BAYER, AND A. HIZMO (2010): “Beyond Signaling and Human Capital: Education and the
Revelation of Ability,” American Economic Journal: Applied Economics, 2(4), 76–104.
BECKER, G. S. (1964): Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education.
University of Chicago Press.
BEDARD, K. (2001): “Human Capital versus Signaling Models: University Access and High School Dropouts,”
Journal of Political Economy, 109(4), 749–775.
BEN-PORATH, Y. (1967): “The Production of Human Capital and the Life Cycle of Earnings,” Journal of Political
Economy, 75(4), 352–365.
CARLSSON, M., G. B. DAHL, B. ÖCKERT, AND D.-O. ROOTH (2015): “The Effect of Schooling on Cognitive
Skills,” The Review of Economics and Statistics, 97(3), 533–547.
CLARK, D., AND P. MARTORELL (2014): “The Signaling Value of a High School Diploma,” Journal of Political
Economy, 122(2), 282–318.
DI PIETRO, G. (2017): “Degree classification and recent graduates’ ability: Is there any signalling effect?,” Journal
of Education and Work, 30(5), 501–514.
FARBER, H. S., AND R. GIBBONS (1996): “Learning and Wage Dynamics,” The Quarterly Journal of Economics,
111(4), 1007–1047.
FENG, A., AND G. GRAETZ (2017): “A Question of Degree: The Effects of Degree Class on Labor Market
Outcomes,” Economics of Education Review, 59, to appear.
FREDRIKSSON, P., L. HENSVIK, AND O. NORDSTRÖM SKANS (2015): “Mismatch of Talent: Evidence on
Match Quality, Entry Wages, and Job Mobility,” Research Papers in Economics 2015:10, Stockholm University,
Department of Economics.
FREIER, R., M. SCHUMANN, AND T. SIEDLER (2015): “The earnings returns to graduating with honors: Evidence
from law graduates,” Labour Economics, 34(0), 39–50, European Association of Labour Economists 26th
Annual Conference.
GIBBONS, R., L. F. KATZ, T. LEMIEUX, AND D. PARENT (2005): “Comparative Advantage, Learning, and
Sectoral Wage Determination,” Journal of Labor Economics, 23(4), 681–724.
JAEGER, D. A., AND M. E. PAGE (1996): “Degrees Matter: New Evidence on Sheepskin Effects in the Returns to
Education,” The Review of Economics and Statistics, 78(4), 733–740.
KAHN, L. B., AND F. LANGE (2014): “Employer Learning, Productivity, and the Earnings Distribution: Evidence
from Performance Measures,” Review of Economic Studies, 81(4), 1575–1613.
KHOO, P., AND B. OST (2017): “The Effect of Latin Honors on Earnings,” Discussion paper, University of Illinois
at Chicago.
LANG, K., AND D. KROPP (1986): “Human Capital Versus Sorting: The Effects of Compulsory Attendance Laws,”
The Quarterly Journal of Economics, 101(3), 609–624.
LANGE, F. (2007): “The Speed of Employer Learning,” Journal of Labor Economics, 25(1), 1–35.
LANGE, F., AND R. TOPEL (2006): “Chapter 8: The Social Value of Education and Human Capital,” Handbook of
the Economics of Education, 1, 459–509.
LAYARD, R., AND G. PSACHAROPOULOS (1974): “The Screening Hypothesis and the Returns to Education,”
Journal of Political Economy, 82(5), 985–998.
LEE, D. S., AND T. LEMIEUX (2010): “Regression Discontinuity Designs in Economics,” Journal of Economic
Literature, 48(2), 281–355.
SPENCE, M. (1973): “Job Market Signaling,” The Quarterly Journal of Economics, 87(3), 355–374.
TYLER, J. H., R. J. MURNANE, AND J. B. WILLETT (2000): “Estimating the Labor Market Signaling Value of
the GED,” The Quarterly Journal of Economics, 115(2), 431–468.