Using Geographic Variation in College Proximity to Estimate the Return to Schooling
David Card
I am grateful to Charles Thomas and Norman Thurston for outstanding research assistance, and
to Michael Boozer, Alan Krueger, and Cecilia Rouse for comments. This research was funded
by the Industrial Relations Section of Princeton University. This paper is part of NBER's
research program in Labor Studies. Any opinions expressed are those of the author and not
those of the National Bureau of Economic Research.
NBER Working Paper #4483
October 1993
ABSTRACT
A convincing analysis of the causal link between schooling and earnings requires an
exogenous source of variation in education outcomes. This paper explores the use of college
proximity as an exogenous determinant of schooling. Analysis of the NLS Young Men Cohort
reveals that men who grew up in local labor markets with a nearby college have significantly
higher education and earnings than other men. The education and earnings gains are concentrated
among men with poorly-educated parents -- men who would otherwise stop schooling at
relatively low levels. When college proximity is taken as an exogenous determinant of schooling
the implied instrumental variables estimates of the return to schooling are 25-60% higher than
the corresponding ordinary least squares estimates. Since the effect of a nearby college on
schooling attainment varies by family background, it is possible to test whether college
proximity is a legitimately exogenous determinant of schooling. The results affirm that marginal
returns to education among children of less-educated parents are as high as, and perhaps much
higher than, the rates of return estimated by conventional methods.
David Card
Department of Economics
Princeton University
Princeton, N.J. 08544
and NBER
One of the most important "facts" about the labor market is
that better-educated workers earn higher wages. Hundreds of
studies in virtually every country show earnings gains of 5-15
percent (or more) per additional year of schooling.1 Despite this
evidence, most analysts are reluctant to interpret the earnings gap
between more and less educated workers as a reliable estimate of
the economic return to schooling. Education levels are not
randomly assigned across the population; rather, individuals make
their own schooling choices. Depending on how these choices
are made, measured earnings differences between workers with
different levels of schooling may over-state or under-state the
"true" return to education.2
A convincing analysis of the causal link between education and
earnings requires an exogenous source of variation in education
choices. In this paper I argue that geographic differences in the
accessibility of college are a potential source of such exogenous
variation.3 Using data from the Young Men Cohort of the
1
Studies of the United States are reviewed in Rosen (1977),
and Willis (1986). A survey of international studies is presented
in Psacharopoulos (1985).
2
See Griliches (1977) for an overview of the issues.
3
A similar idea is used by Kane and Rouse (1993) to control
for the endogeneity of choice between a four-year college and a
two-year college.
Mallar (1979) used proximity to a training site to estimate the
effect of the Job Corps program.
National Longitudinal Survey I find that men who were raised in
local labor markets with a nearby 4-year college have
significantly higher levels of education and earnings. This
differential persists even after controlling for regional and family
background factors (including parental education and family
structure). The effects of a nearby college are largest for men
with the lowest predicted levels of schooling attainment,
suggesting that the presence of a local college lowers the costs
and/or raises the perceived benefits of education among children
with relatively poor family backgrounds.
When college proximity is taken as an exogenous determinant
of schooling the implied instrumental variables estimates of the
return to education are 25-60 percent higher than the
corresponding ordinary least squares estimates. Contrary to
widespread belief (e.g. Ehrenberg and Smith (1991, pp. 320-
322)) but consistent with a growing number of studies of
endogenous school choice, these findings suggest that the cross-
sectional earnings gap between more- and less-educated workers
may under-state the economic return to schooling for some groups
of workers. 4
4
See e.g. Angrist and Krueger (1991a), Ashenfelter and
Krueger (1992), Kane and Rouse (1993) and Butcher and Case
(1993). All four of these studies report instrumental variables
estimates of the return to schooling that exceed the conventional
ordinary least-squares estimate in the same data set.
Since the effect of a nearby college on schooling attainment
varies with family background it is possible to test whether
college proximity is a legitimately exogenous determinant of
schooling -- i.e., whether growing up near a college has a direct
effect on earnings or only an indirect effect through the education
decision. Specifically, one can include college proximity in the
earnings equation and use the interaction of college proximity
with an indicator for low parental education as an instrumental
variable for education. This identification strategy relies on the
extra boost to education and earnings among children with poor
family backgrounds. The resulting estimates are still substantially
higher than the ordinary least squares estimates, and provide no
evidence against the hypothesis that college proximity is an
exogenous determinant of schooling.
5
See Hall and Turner (1970).
The 1966 interview also included a 28-item test of
"Knowledge of the World of Work"6 (see row 7). The overall
score on this test is correlated with completed education and wage
rates in later waves of the survey, and the test has been used as
a measure of "ability" in several previous studies of education and
earnings (e.g. Griliches (1976, 1977)).
Finally, the NLSYM data set contains a number of
characteristics of the respondent's local labor market in 1966.7
Among these is an indicator for the presence of an accredited 4-
year college in the local labor market (row 8).8 About 70
percent of individuals lived in a labor market area with a nearby
college. The college proximity rate varies by region (lower in
the South and Mountain regions), by urban versus rural location
(higher for individuals living in a Standard Metropolitan
Statistical Area), and is correlated with race and parental
education (see below).
6
The test items were questions on the job activities of 10
specific occupations, the education requirements for these 10
occupations, and the relative earnings of 8 different pairs of
occupations.
7
These are based on the county of residence in 1966.
8
An indicator for the presence of a 2-year college is also
included in the NLSYM, but this variable turns out to be only
weakly correlated with education or earnings. See below.
$$y_i = X_i\alpha + S_i\beta + u_i \qquad (2)$$
9
Note that the estimated coefficient of a linear education
variable is only strictly interpretable as a "rate of return" to
schooling under very rigid conditions (see Mincer (1974)). I use
the terminology "rate of return to schooling" to refer to the
education coefficient in the conventional human capital model.
10
If the return to education varies across individuals then the
coefficient $\beta$ in equation (2) should be interpreted as the average
return to education. Specifically, suppose $y_i = X_i\alpha + S_i\beta_i + \epsilon_i$,
where $\beta_i$ is the marginal return to education for individual $i$. Then
equation (2) holds with $\beta = E(\beta_i)$ and $u_i = \epsilon_i + S_i(\beta_i - \beta)$.
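Adding and subtracting $S_i\beta$ makes the step in this footnote explicit:

```latex
\begin{aligned}
y_i &= X_i\alpha + S_i\beta_i + \epsilon_i \\
    &= X_i\alpha + S_i\beta + \left[\epsilon_i + S_i(\beta_i - \beta)\right] \\
    &= X_i\alpha + S_i\beta + u_i, \qquad \beta \equiv E(\beta_i).
\end{aligned}
```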
There are a variety of reasons why schooling may be
correlated with the unobserved component of earnings. One that
has received considerable attention in the literature is "ability
bias" (see e.g. Griliches (1977)). Suppose that some individuals
have an unobserved characteristic ("ability") that enables them to
earn higher wages at any level of education. If these individuals
acquire higher-than-average schooling then the OLS estimate of
$\beta$ will be upward-biased. The fact that individuals with higher
test scores (on IQ or achievement tests) tend to have higher
earnings and more schooling is often interpreted as evidence of
ability bias.
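To make the direction of ability bias concrete, the following Python sketch simulates a hypothetical population in which an unobserved "ability" term raises both schooling and log wages. The data-generating process, parameter values, and the function name `simulate_ability_bias` are illustrative assumptions, not features of the NLSYM data. With the defaults, the omitted-variable formula gives a probability limit of 0.07 + 0.05 × Cov(A, S)/Var(S) = 0.07 + 0.05 × 1/5 = 0.08 for the OLS slope, so OLS overstates the true return of 0.07.

```python
import numpy as np


def simulate_ability_bias(n=200_000, beta=0.07, ability_on_wage=0.05,
                          ability_on_school=1.0, seed=0):
    """OLS slope of log wage on schooling when an unobserved 'ability' term
    raises both schooling and wages (hypothetical data-generating process)."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)  # not observed by the analyst
    school = 12 + ability_on_school * ability + rng.normal(scale=2.0, size=n)
    log_wage = (1.0 + beta * school + ability_on_wage * ability
                + rng.normal(scale=0.4, size=n))
    # OLS of log wage on a constant and schooling, omitting ability
    X = np.column_stack([np.ones(n), school])
    coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return coef[1]


print(round(simulate_ability_bias(), 3))  # should be near 0.08, above the true 0.07
```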
Another important source of correlation between ui and vi is
measurement error in schooling. Measurement error induces a
negative correlation between the error components of earnings
and observed schooling, leading to a downward bias in OLS
estimates of $\beta$ (see Griliches (1977)).11 A similar negative bias
arises if the true return to schooling varies across the population
and if individuals with lower levels of schooling have higher
returns to schooling. Such a negative correlation is implied by a
model of school choice in which individuals with different
11
Estimates in the literature (cited by Griliches) suggest that
10% of the variance in measured education is due to measurement
error. In this case the OLS estimate of the return to education is
downward biased by 10-15 percent, depending on what other
covariates are included in the model.
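The 10-15 percent range in this footnote is consistent with the textbook attenuation formula for classical measurement error: if a share λ of the variance in measured schooling is noise, and R² is the fit of measured schooling on the other (error-free) covariates, the OLS schooling coefficient is biased toward zero by the proportion λ/(1 − R²). A minimal sketch, assuming classical (mean-zero, independent) errors; the function name and the R² values are hypothetical.

```python
def attenuation_share(noise_share, r2_on_covariates=0.0):
    """Proportional downward bias of the OLS schooling coefficient under
    classical measurement error: lambda / (1 - R^2).

    noise_share      : lambda, share of measured-schooling variance that is noise
    r2_on_covariates : R^2 from regressing measured schooling on the other
                       (error-free) covariates in the earnings equation
    """
    if not 0.0 <= r2_on_covariates < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return noise_share / (1.0 - r2_on_covariates)


for r2 in (0.0, 0.2, 1 / 3):
    print(f"R^2 = {r2:.2f}: bias = {attenuation_share(0.10, r2):.1%}")
# R^2 = 0.00: bias = 10.0%
# R^2 = 0.20: bias = 12.5%
# R^2 = 0.33: bias = 15.0%
```

Adding covariates that are correlated with schooling raises R² and therefore increases the attenuation, which is why the bias depends on the other variables in the model.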
12
If the true rate of return to education varies across the
population then one can obtain a consistent estimate of the
average return to education for some subset of the population.
See Angrist and Imbens (1993).
13
Something like this idea is used by Angrist and Krueger
(1991b), who use draft-lottery status as an instrument for
schooling of men who could have served in the Vietnam War.
14
Tabulations of the October 1973 Current Population Survey
show that in the early 1970s 34% of college students age 18-24
lived with their parents while attending school. The fraction is
higher (39%) for black students.
investments in higher education, at least among children from
relatively low-income families. 15
To check this basic insight I fit a linear model to years of
completed schooling (in 1976) for the subset of men who grew up
in local labor markets without an accredited 4-year college. The
determinants of schooling include region and urban/rural
indicators (measured as of 1966), age and race dummies, and
family background factors (family structure and parental
education).16 I then divided the overall sample into quartiles of
predicted education in the absence of a nearby college and
calculated the mean levels of education by quartile of predicted
education for men who grew up in areas with and without a local
college. Figure 1 plots the mean levels of education. In every
quartile the mean level of education is higher for those who grew
up near a college. For men in the three highest predicted
quartiles of education the effect of college proximity is modest
(0.2 to 0.4 years). For men in the lowest quartile, however, the
difference in mean education is 1.1 years. As expected, the
presence of a nearby college has its strongest effect on men with the
lowest propensities to continue their education (e.g. men from
15
See Anderson, Bowman, and Tinto (1972) for a review of
the sociological literature on the effects of college accessibility on
attendance probabilities.
16
The R-squared of the regression is 0.30.
single-headed families with low parental education in rural
Southern areas).
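The tabulation described above can be sketched in Python as follows. This is a minimal illustration assuming a generic covariate matrix; the function name, argument layout, and the use of sample quartile cut-points are assumptions for the sketch, not a description of the original computations.

```python
import numpy as np


def education_by_predicted_quartile(X, school, near_college):
    """Mean schooling by quartile of predicted education (rows 0-3, lowest
    first) for men without (column 0) and with (column 1) a nearby college.

    X            : (n, k) covariates measured in 1966 or earlier (no constant)
    school       : (n,) completed years of schooling in 1976
    near_college : (n,) True if a 4-year college was in the local labor market
    """
    near = np.asarray(near_college, dtype=bool)
    Z = np.column_stack([np.ones(len(school)), X])
    # 1. Fit the linear schooling model on men with no nearby college only
    coef, *_ = np.linalg.lstsq(Z[~near], school[~near], rcond=None)
    # 2. Predicted education "in the absence of a nearby college", for everyone
    predicted = Z @ coef
    # 3. Quartiles of predicted education in the overall sample
    cuts = np.quantile(predicted, [0.25, 0.5, 0.75])
    quartile = np.searchsorted(cuts, predicted, side="right")  # values 0..3
    # 4. Mean actual schooling by quartile and proximity status
    table = np.empty((4, 2))
    for q in range(4):
        for j, flag in enumerate((False, True)):
            table[q, j] = school[(quartile == q) & (near == flag)].mean()
    return table
```

The proximity effect in each quartile is then `table[:, 1] - table[:, 0]`; the pattern described in the text corresponds to a much larger gap in the first row than in the others.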
17
Under the null hypothesis that the OLS estimates are
consistent the variance of the difference between the IV and OLS
estimates of the return to education is the difference in their
variances, which is approximately equal to the variance of the IV
estimate.
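The approximation in this footnote amounts to a Hausman-type t-statistic for the IV-minus-OLS contrast. The sketch below is hypothetical: the function name and arguments are illustrative, and the numbers in the example are made up rather than taken from the tables.

```python
import math


def iv_ols_contrast(b_iv, se_iv, b_ols, se_ols=None):
    """t-statistic for b_iv - b_ols under the null that OLS is consistent
    (and efficient), so Var(b_iv - b_ols) = Var(b_iv) - Var(b_ols).
    If se_ols is omitted, use the approximation Var(diff) ~ Var(b_iv)."""
    var_diff = se_iv ** 2 - (se_ols ** 2 if se_ols is not None else 0.0)
    if var_diff <= 0:
        raise ValueError("IV variance must exceed OLS variance under the null")
    return (b_iv - b_ols) / math.sqrt(var_diff)


# Hypothetical inputs, not estimates from the paper
t_exact = iv_ols_contrast(0.13, 0.05, 0.07, se_ols=0.004)
t_approx = iv_ols_contrast(0.13, 0.05, 0.07)
print(round(t_exact, 3), round(t_approx, 3))
```

Because the OLS standard error is typically far smaller than the IV standard error, the two versions differ only slightly.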
18
Education, experience, and the current location variables are
all defined as of the 1978 survey.
19
Assuming that measurement errors account for 10 percent
of the cross-sectional variance in observed schooling (Siegel and
Hodge (1968)), and that the true effect of KWW on earnings is
0, the expected attenuation of the schooling coefficient when
KWW is added to the model is about 5%.
20
Note that one could include IQ in the earnings equation and
use the KWW score as an instrument. This has no effect on the
conclusions from Table 4.
schooling. In row 5 college proximity is defined as living in a
local labor market with a public 4-year college.21 Proximity to
a public college has a slightly smaller reduced form effect on
education (0.31 years versus 0.32 for proximity to any college)
and a slightly larger reduced form effect on earnings (6.2%
versus 4.2%). Thus the implied IV estimate of the return to
college is higher than the IV estimate using proximity to any
college, although the standard error is again relatively large.
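As a rough check on this comparison, a just-identified IV estimate with a single binary instrument equals the ratio of the reduced-form effect on log earnings to the reduced-form effect on schooling. The sketch below applies that ratio to the rounded figures quoted in the text. It ignores the other endogenous regressors in the tabulated models (experience and its square) as well as rounding, so it only approximates the reported IV estimates; the function name is hypothetical.

```python
def implied_iv(rf_earnings, rf_schooling):
    """Just-identified IV estimate from a single instrument: the reduced-form
    effect on log earnings divided by the reduced-form effect on schooling."""
    if rf_schooling == 0:
        raise ValueError("instrument has no first-stage effect")
    return rf_earnings / rf_schooling


# Rounded reduced-form effects quoted in the text (log points, years)
any_college = implied_iv(0.042, 0.32)     # proximity to any 4-year college
public_college = implied_iv(0.062, 0.31)  # proximity to a public 4-year college
print(f"{any_college:.3f} {public_college:.3f}")  # 0.131 0.200
```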
The IV estimation in row 6 combines 2 college proximity
variables: one for any accredited 4-year college, another for any
accredited 2-year college. In the reduced form equations the
presence of a nearby 2-year college has small positive effects on
schooling and earnings (whether or not an indicator is included
for proximity to a 4-year college). Using both indicators as
instruments leads to an estimated rate of return to education of
0.12, and a very slight improvement in the standard error of the
estimate relative to the baseline estimate in row 1.
One difficulty with these college proximity measures is that
they pertain to the place of residence in 1966 rather than the
place of residence at age 18 or 19, when the college enrollment
decision is typically made. By the time of the 1966 interview
21
Among the men who grew up in local labor markets with
accredited 4-year colleges, 73% were in labor markets with a
public 4-year college.
22
For unmarried respondents enrolled in college and living
away from home in 1966 the place of residence was defined as
the place of residence of their parents. Thus there should be no
reverse-causation for these individuals.
Is College Proximity a Legitimate Instrument?
For college proximity to serve as a legitimate instrument for
completed education it must affect individual schooling decisions
but have no direct effect on earnings. There are at least three
reasons why men who grew up near a college may have higher
earnings than other men, controlling for education, geographic
information, and parental background. First, families that place
a strong emphasis on education may choose to live near a college.
Children of these families may have higher "ability" or may be
more highly motivated to achieve labor market success. Either
factor could induce a positive correlation between college
proximity and the unobserved determinants of wages (i.e. $u_i$ in
equation (2)). Second, the presence of a college may be
associated with higher school quality at nearby elementary and
secondary schools. Card and Krueger (1992) show that higher
school quality is associated with higher earnings. The omission
of direct information on the quality of schools attended by men
in the NLSYM may then lead to an error component in wages
that is correlated with college proximity. Finally, if only
imperfect indicators are available for the place of residence in
1976, and if men who grew up in areas with a nearby college
tend to live in higher-wage areas, then college proximity may be
correlated with unobserved geographic wage premiums.
$$S_i = X_{1i}\gamma_1 + C_i\delta_0 + (C_i \times P_i)\,\delta_1 + v_i \qquad (1b)$$
$$y_i = X_{1i}\alpha_1 + C_i\alpha_0 + S_i\beta + u_i \qquad (2b)$$
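Equations (1b) and (2b) can be estimated by two-stage least squares, treating the controls $X_{1i}$ (including the family-background indicator $P_i$) and the proximity dummy $C_i$ as included exogenous regressors, so that only the interaction $C_i \times P_i$ is excluded from the earnings equation. Below is a minimal numpy sketch assuming a single endogenous regressor; the helper `tsls`, its argument layout, and the simulated data are hypothetical, and standard errors are omitted. The same helper accepts several excluded instruments, e.g. two proximity indicators.

```python
import numpy as np


def tsls(y, endog, exog, instruments):
    """Two-stage least squares with one endogenous regressor.

    y           : (n,) log wages
    endog       : (n,) years of schooling S_i
    exog        : (n, k) included exogenous regressors, with a constant
                  (here the X_1i controls and the proximity dummy C_i)
    instruments : (n, m) excluded instruments, m >= 1 (here C_i * P_i)
    Returns coefficients ordered as [exog columns..., schooling].
    """
    W = np.column_stack([exog, endog])        # regressors in (2b)
    Z = np.column_stack([exog, instruments])  # all exogenous variables
    # First stage: fitted values of every regressor given Z
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
    # Second stage: regress y on the fitted regressors
    return np.linalg.lstsq(W_hat, y, rcond=None)[0]


# Hypothetical data: proximity raises schooling more when P_i = 1, and C_i has
# a small direct wage effect (alpha_0 = 0.05) that 2SLS should pick up.
rng = np.random.default_rng(0)
n = 400_000
P = (rng.random(n) < 0.25).astype(float)   # low family background
C = (rng.random(n) < 0.7).astype(float)    # nearby 4-year college
ability = rng.normal(size=n)
S = 13 - 1.5 * P + 0.3 * C + 0.8 * C * P + ability + rng.normal(size=n)
y = (1 + 0.1 * S + 0.05 * C - 0.1 * P + 0.05 * ability
     + rng.normal(scale=0.3, size=n))
coef = tsls(y, S, np.column_stack([np.ones(n), P, C]), (C * P)[:, None])
print(coef.round(3))  # [const, P, C (alpha_0), S (beta)]; last two near 0.05, 0.1
```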
23
This definition of low family background was derived by
comparing mean education levels of men in the 8 parental
education classes used in the models in Tables 3 and 4. The
means show a discrete drop for men from the two lowest parental
education categories. I therefore combined the two categories as
a "low family background" indicator.
set has the effect of lowering the standard error of the IV
estimate, while raising the point estimate slightly. An over-
identification test for the mutual consistency of the available
instruments is insignificant (p-value = 0.28). As in column (3),
the estimate of the direct earnings effect of living near a college
is small and statistically insignificant.
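An over-identification test of this kind can be sketched with the Sargan statistic: regress the 2SLS residuals on all exogenous variables and compare n·R² to a chi-square distribution with (excluded instruments minus endogenous regressors) degrees of freedom. The text does not state which variant was used, so the code below is an illustration under that assumption, for a model with one endogenous regressor; `chi2_sf` and `sargan` are hypothetical helpers, and `chi2_sf` evaluates chi-square tail probabilities in closed form for integer degrees of freedom.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:  # even df: exp(-h) * sum_{j < df/2} h^j / j!
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    # odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h^(j-1/2) / Gamma(j+1/2)
    series = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series


def sargan(y, endog, exog, instruments):
    """Sargan statistic n * R^2, degrees of freedom, and p-value for a model
    with one endogenous regressor and m >= 2 excluded instruments."""
    df = instruments.shape[1] - 1
    if df < 1:
        raise ValueError("model is not over-identified")
    W = np.column_stack([exog, endog])
    Z = np.column_stack([exog, instruments])
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
    coef = np.linalg.lstsq(W_hat, y, rcond=None)[0]   # 2SLS coefficients
    resid = y - W @ coef                              # uses actual schooling
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    stat = len(y) * r2
    return stat, df, chi2_sf(stat, df)
```

For example, nine excluded instruments and one endogenous schooling variable give 8 degrees of freedom; a large p-value means the instruments yield mutually consistent estimates.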
Another alternative is to interpret predicted education in the
absence of a nearby college (i.e. the predicted education level
used to generate the quartiles in Figure 1) as a continuous
indicator of "family background". Using the interaction of
predicted education and college proximity as an instrument, and
including college proximity directly in the earnings equation, the
IV estimate of the return to education is 0.122, with a standard
error of 0.075.
Regardless of the method of classifying family background, IV
estimates based on the interaction of family background and
college proximity are similar to IV estimates based on college
proximity alone. Furthermore, estimates of the direct effect of
college proximity on wages are uniformly small and statistically
insignificant. Assuming that college proximity can be excluded
from the earnings equation, both college proximity and its
interaction with family-background indicators can be used as
instruments for schooling. For example, using 9 parental
education indicators interacted with college proximity as
24
The over-identification test statistic for this estimate (with
8 degrees of freedom) has a probability value of 0.38.
this paper and in other recent studies is substantially above this
range.
An alternative possibility, discussed in some detail in Card
(1993), is that the "true" rate of return to education varies across
the population, and that the increase in education associated with
college proximity occurs for individuals with relatively high rates
of return to schooling. Algebraically, the IV estimate of the
return to schooling is the ratio of the differences in average wages
and average education between individuals who grew up in labor
markets with and without a nearby college.25 If the presence of
a nearby college affects only the education decisions of men with
poor family backgrounds, then the IV estimate depends only on
the marginal return to schooling in this subset of the population.
Thus one explanation for the relatively high IV estimates of the
return to education in Tables 3-5 is that the marginal return to
education among men with poor family backgrounds is relatively
high.
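The argument that the IV estimate reflects the marginal return of men whose schooling responds to proximity can be illustrated with a small simulation. The population, parameter values, and the function name `wald_estimate` are hypothetical: a nearby college adds one year of schooling only for men with low family background, whose return (0.12) is double that of other men (0.06). The Wald ratio then recovers roughly 0.12, not the population-average return of 0.25 × 0.12 + 0.75 × 0.06 = 0.075.

```python
import numpy as np


def wald_estimate(log_wage, school, near_college):
    """IV estimate with a single binary instrument (the Wald ratio): the
    near-minus-far difference in mean log wages divided by the near-minus-far
    difference in mean schooling."""
    near = np.asarray(near_college, dtype=bool)
    dy = log_wage[near].mean() - log_wage[~near].mean()
    ds = school[near].mean() - school[~near].mean()
    return dy / ds


# Hypothetical population: 25% low family background; proximity is
# independent of background and shifts schooling only for that group.
rng = np.random.default_rng(3)
n = 1_000_000
low = rng.random(n) < 0.25
near = rng.random(n) < 0.7
school = np.where(low, 10.0, 13.0) + np.where(low & near, 1.0, 0.0) + rng.normal(size=n)
beta_i = np.where(low, 0.12, 0.06)              # heterogeneous returns
log_wage = 1.0 + beta_i * school + rng.normal(scale=0.1, size=n)
print(round(wald_estimate(log_wage, school, near), 3))  # should be close to 0.12
```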
Why do men with poorly educated parents have high returns
to schooling? According to the simplest economic model of
25
Specifically, let $y_1$ and $y_2$ represent mean wages of
individuals who grew up in labor markets with and without a
nearby college (adjusted for other covariates), and let $S_1$ and $S_2$
represent mean years of schooling for the same two groups (again
adjusted for other covariates). Then the IV estimate of the return
to schooling is $(y_1 - y_2)/(S_1 - S_2)$.
26
This is a condensed version of the argument developed in
Card (1993). See also Lang (1993).
that the economic value of education for many children may be
significantly understated.
Conclusion
Any credible analysis of the causal link between education and
earnings requires an exogenous source of variation in education
choices. In this paper I explore the use of college accessibility as
an exogenous determinant of schooling. An analysis of education
and earnings outcomes for men in the NLS Young Men Cohort
shows that men who grew up in areas with a nearby 4-year
college have significantly higher schooling and significantly
higher earnings. These effects are concentrated among men with
poorly-educated parents -- men who would otherwise stop
schooling at relatively low levels. The implied instrumental
variables estimates of the earnings gain per year of additional
schooling (10-14%) are substantially above the earnings gains
estimated by a conventional ordinary least squares procedure
(7.3%).
These inferences are robust to minor changes in specification,
including the addition of measured test scores to the earnings
model and changes in the definition of college proximity.
Nevertheless, they rely on the restrictive assumption that living
near a college has no effect on earnings apart from the effect
through education. To test this assumption I use the fact that
college proximity has a larger impact on the schooling choices of
men with poorer family backgrounds. Thus, an interaction of
college proximity and low family background can be used as an
instrumental variable for observed schooling even in earnings
models that include a direct college proximity effect. The results
of this test give rise to estimates in the same range as the simpler
instrumental variables estimates based on college proximity alone.
References
Anderson, C. Arnold, Mary Jean Bowman, and Vincent Tinto.
Where Colleges Are and Who Attends. New York:
McGraw Hill, 1972.
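Figure 1: Mean years of completed education, by quartile of predicted education, for men who grew up with and without a nearby 4-year college (horizontal axis: Quartile of Predicted Education).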
Table 1: Sample Characteristics for Overall Sample and 1976 Subset
of National Longitudinal Survey of Young Men
(1) (2) (3) (4) (5)
B: Treat Experience and Experience Squared as Endogenous
OLS Estimate    IV Estimate
Notes: The dependent variable in row 1 and rows 3-7 is the log of hourly
wages in 1976. The dependent variable in row 2 is the log of
hourly wages in 1978. Reported estimates are coefficients of a
linear education variable in models that also include a black
indicator, indicators for southern residence and residence in an
SMSA in 1976, indicators for region in 1966 and living in an SMSA
in 1966, experience and experience squared, and 14 variables
representing mother's and father's education, indicators for missing
father's or mother's education, interactions of mother's and
father's education, and dummies for family structure at age 14.
a
In these models education and experience are treated as endogenous.
Instruments for experience and experience squared are age and age
squared. The instrument for education is proximity to a 4-year college
unless otherwise noted.
b
In column 3 the instrument for education is an interaction of
an indicator for low parental education with an indicator for
living near a college in 1966. In column 4 the instruments are
interactions of 8 parental education class indicators with an
indicator for living near a college in 1966.
c
14 variables representing mother's and father's education,
indicators for missing father's or mother's education,
interactions of mother's and father's education, and dummies
for family structure at age 14.