Which Moments to Match?
A. RONALD GALLANT
University of North Carolina
GEORGE TAUCHEN
Duke University
1. INTRODUCTION
We present a systematic approach to generating moment conditions for the
generalized method of moments (GMM) estimator (Hansen, 1982) of the
parameters of a structural model. The approach is an alternative to the com-
mon practice of selecting a few low-order moments on an ad hoc basis and
then proceeding with GMM. The idea is simple: Use the expectation under
the structural model of the score from an auxiliary model as the vector of
moment conditions.
This score is the derivative of the log density of the auxiliary model with
respect to the parameters of the auxiliary model. Thus, the moment condi-
tions depend on both the parameters of the auxiliary model and the param-
eters of the structural model. The parameters of the auxiliary model are
replaced by their quasimaximum likelihood estimates, which are computed
by maximizing the pseudolikelihood of the auxiliary model.
The estimates of the structural parameters are computed by minimizing a
GMM criterion function. As seen later, the optimal weighting matrix for
forming the GMM criterion from the moment conditions depends only on
the auxiliary model and is easily computed.
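In the simplest, time-invariant case, the construction can be summarized in two displays (a compact restatement; the general, time-dependent version is developed in Section 2). With p(y|x, ρ) the structural conditional density, f(y|x, θ) the auxiliary density, and θ̂_n the quasimaximum likelihood estimate of θ, the moment conditions are

    m(ρ, θ̂_n) = ∫∫ [(∂/∂θ) ln f(y|x, θ̂_n)] p(y|x, ρ) dy p(x|ρ) dx,

and the estimator is

    ρ̂_n = argmin_{ρ∈R} m′(ρ, θ̂_n) (ℐ̃_n)⁻¹ m(ρ, θ̂_n),

where ℐ̃_n estimates the information matrix of the auxiliary model.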
We thank Laura Baldwin, Ravi Bansal, John Coleman, and Anthony Smith for helpful discussions and two
referees for very useful comments. This paper was supported by the National Science Foundation. Address
correspondence to: A. Ronald Gallant, Department of Economics, University of North Carolina, CB #3305,
6F Gardner Hall, Chapel Hill, NC 27599-3305, USA; e-mail: [email protected].
We call the auxiliary model the score generator. The score generator need
not encompass (nest) the structural model. If it does, then the estimator is
as efficient as the maximum likelihood estimator. Hence, our approach
ensures efficiency against a given parametric model. If the score generator
closely approximates the actual distribution of the data, even though it does
not encompass it, then the estimator is nearly fully efficient.
The estimation context that we have in mind is one where a structural
model defines a data generation process for the data. The key feature of this
data generation process is that it is relatively easy to compute the expecta-
tion of a nonlinear function given values for the structural parameters. An
expectation may be computed by simulation, by numerical quadrature, or by
analytic expressions, whichever is the most convenient.
Examples of this estimation context are the panel data models motivating
the simulated method of moments approach of Pakes and Pollard (1989) and
McFadden (1989). Another is the asset pricing model that motivates the
dynamic method of moments estimator of Duffie and Singleton (1993). In
Section 4, we present three such situations drawn from macroeconomics,
finance, and empirical auction modeling. In these examples, the likelihood
is difficult to compute, so maximum likelihood is infeasible. Simulation and
moment matching thus naturally arise.
As indicated, there is no presumption that the score generator encompasses
the structural model, although an order condition for identification requires
a minimal level of complexity of the score generator. Under weak regularity
conditions, our estimator is root-n consistent and asymptotically normal with
an asymptotic distribution that depends on both the structural model and the
score generator. If there exists a local, smooth mapping of the structural
parameters into the parameters of the score generator, then the estimator has
the same asymptotic distribution as the maximum likelihood estimator under
the structural model.
The asymptotic theory of the estimator subsumes situations with strictly
exogenous variables, where one conditions on particular values of the explan-
atory variables. It also subsumes situations with predetermined but not
strictly exogenous variables, as is typical of stationary Markov data gener-
ation processes. The most general version allows for processes with time-
dependent laws of motion and dependence extending into the indefinite past.
Section 2 presents the asymptotic justification of the estimator. Section 3
presents some candidate specifications for the score generator. Section 4
presents three proposed applications, each of which is a substantive empiri-
cal project.
2. THEORY
For a stochastic process described by a sequence of densities {p₁(x₁|ρ), {p_t(y_t|x_t, ρ)}_{t=1}^∞} for which expectations of nonlinear functions are easily computed by simulation, by quadrature, or by analytic expressions, we assume that the observed data were generated at the parameter value ρ°, which is to say that ρ° denotes the true value of the parameter ρ in the model

    {p₁(x₁|ρ), {p_t(y_t|x_t, ρ)}_{t=1}^∞},  ρ ∈ R,  (2)

where R denotes the parameter space. The model

    {f₁(y₁|x₁, θ), {f_t(y_t|x_t, θ)}_{t=1}^∞}  (3)

is called the score generator. The variables y_t and x_t can be univariate or multivariate or have a dimension that depends on t. The functional form of the score generator may act to exclude elements of the vector y_t; that is, the score generator may define a stochastic process for only some elements of y_t.
When we say that a process is time invariant we mean that the densities of the process do not depend on t, in which case the t subscripts on the densities may be suppressed and the dimensions of y_t and x_t are fixed. Writing √n X_n ≈ N(0, V_n) means √n (V_n^{1/2})⁻¹ X_n ⇒ N(0, I), where V_n = (V_n^{1/2})(V_n^{1/2})′. The notion of a smooth embedding is made precise by Assumption 3.
The moment conditions that define the estimator are

    m_n(ρ, θ) = (1/n) Σ_{t=1}^n ∫···∫ (∂/∂θ) ln f_t(y_t|x_t, θ) Π_{τ=1}^t p_τ(y_τ|x_τ, ρ) dy_τ p₁(x₁|ρ) dx₁.

In most applications, analytic expressions for the integrals will not be available, and simulation or quadrature will be required to compute them.
If the integrals are computed by simulation, then the formula used in practice is

    m_n(ρ, θ) = (1/N) Σ_{τ=1}^N (∂/∂θ) ln f(ŷ_τ|x̂_τ, θ),

where {ŷ_τ, x̂_τ}_{τ=1}^N is a simulated realization from the structural model at parameter value ρ. If they are computed by quadrature, the formula is

    m_n(ρ, θ) = Σ_{τ=1}^N W(y_τ, x_τ|ρ) (∂/∂θ) ln f(y_τ|x_τ, θ),

where W(y_τ, x_τ|ρ) and (y_τ, x_τ) are the weights and the abscissae implied by the quadrature rule. Of course, N is dramatically smaller for quadrature rules than for Monte Carlo integration. Quadrature for Case 3 will be at too high a dimension to be practical in most applications.
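To make the simulation formula concrete, here is a minimal Python sketch. The names score_fn and simulate_fn are hypothetical stand-ins for a user-supplied auxiliary score and structural simulator; any score generator and any structural model with these interfaces would do:

    import numpy as np

    def emm_moments(score_fn, simulate_fn, rho, theta_hat, N=50_000, seed=0):
        """Approximate m_n(rho, theta_hat) by Monte Carlo: the average of the
        auxiliary-model score over a long simulation from the structural model.

        score_fn(y, x, theta)     -> score vector (d/d theta) ln f(y | x, theta)
        simulate_fn(rho, N, seed) -> arrays (y_hat, x_hat) of length N
        Both are hypothetical stand-ins for user-supplied functions.
        """
        y_hat, x_hat = simulate_fn(rho, N, seed)   # draw from the structural model
        scores = np.array([score_fn(y_hat[t], x_hat[t], theta_hat) for t in range(N)])
        return scores.mean(axis=0)                 # the simulated moment vector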
As to statistical theory, Cases 1 and 2 are special cases of Case 3, so that throughout the rest of the discussion we can discuss Case 3 and specialize to Cases 1 and 2, as required.
The randomness in m_n(ρ, θ̂_n) is solely due to the random fluctuation of the quasimaximum likelihood estimator θ̂_n. Under the regularity conditions imposed earlier, there is a sequence {θ_n°} such that m_n(ρ°, θ_n°) = 0, lim_{n→∞}(θ̂_n − θ_n°) = 0 almost surely, and

    √n (θ̂_n − θ_n°) ⇒ N[0, (𝒥_n°)⁻¹(ℐ_n°)(𝒥_n°)⁻¹],  (15)

where 𝒥_n° = −(∂/∂θ′) m_n(ρ°, θ_n°) and

    ℐ_n° = Var[(1/√n) Σ_{t=1}^n (∂/∂θ) ln f_t(ỹ_t|x̃_t, θ_n°)]  (16)

(Gallant, 1987, Ch. 7, Theorem 6). Note that 𝒥_n° and ℐ_n° are not random quantities because we have assumed that either quadrature has been employed to compute m_n(ρ, θ) or that N is as large as necessary to make the average essentially the same as the expected value. Using Taylor's theorem,

    m_n(ρ°, θ̂_n) = m_n(ρ°, θ_n°) + [(∂/∂θ′) m̄_n](θ̂_n − θ_n°).  (17)

This implies that

    √n m_n(ρ°, θ̂_n) ⇒ N[0, ℐ_n°].  (18)

Thus, given an estimator ℐ̃_n of ℐ_n° that is consistent in the sense that lim_{n→∞}(ℐ̃_n − ℐ_n°) = 0 almost surely, the GMM estimator with an efficient weighting matrix is

    ρ̂_n = argmin_{ρ∈R} m_n′(ρ, θ̂_n)(ℐ̃_n)⁻¹ m_n(ρ, θ̂_n).  (19)

When the information matrix equality holds for the score generator, the estimator

    ℐ̃_n = (1/n) Σ_{t=1}^n [(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n)][(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n)]′  (20)

can be used. This estimator can also be used with Gaussian QMLE scores if the conditional mean and variance functions are correctly specified (Bollerslev and Wooldridge, 1992). A sufficient (but not necessary) condition is Assumption 2.
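In code, (20) is just the mean outer product of the fitted scores; a minimal sketch under the same hypothetical score_fn interface as above:

    import numpy as np

    def info_outer_product(score_fn, y, x, theta_hat):
        """Eq. (20): estimate the auxiliary model's information matrix by the
        mean outer product of scores at the observed data and the QMLE."""
        S = np.array([score_fn(y[t], x[t], theta_hat) for t in range(len(y))])
        return S.T @ S / len(y)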
A weaker assumption that facilitates estimation of ℐ_n° is the following.

Assumption 1. There is a θ° such that lim_{n→∞} θ_n° = θ°.

Under Assumption 1, ℐ_n° can be estimated by a weighted sum of autocovariances of the fitted scores,

    ℐ̃_n = Σ_{τ=−ℓ_n}^{ℓ_n} w(τ/ℓ_n) S_{nτ},  (24)

with Parzen weights w(x) and a lag length ℓ_n that grows as n^{1/5}, where

    S_{nτ} = (1/n) Σ_{t=1+τ}^n [(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n)][(∂/∂θ) ln f_{t−τ}(ỹ_{t−τ}|x̃_{t−τ}, θ̂_n)]′ if τ ≥ 0,
    S_{nτ} = (S_{n,−τ})′ if τ < 0  (25)

(Gallant, 1987, Ch. 7, Theorem 5). See Andrews (1991) for alternative suggestions as to appropriate weights and rates one might use instead of w(x) and n^{1/5}. The Parzen weights suggested above guarantee the positive definiteness of ℐ̃_n, which is essential. Weights that do not guarantee positive definiteness cannot be used.
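A sketch of this weighted-autocovariance estimate follows, assuming the fitted scores have been collected into an n-by-K array S. The Parzen kernel formula below is the standard one, and the default lag follows the n^{1/5} rate mentioned above:

    import numpy as np

    def parzen_weight(x):
        """Parzen kernel; kernels of this form keep the estimate positive definite."""
        ax = abs(x)
        if ax <= 0.5:
            return 1.0 - 6.0 * ax**2 + 6.0 * ax**3
        if ax <= 1.0:
            return 2.0 * (1.0 - ax)**3
        return 0.0

    def hac_information(S, lag=None):
        """Weighted-autocovariance estimate of the information matrix, cf. (24)-(25),
        from an n-by-K array S of fitted scores; lag defaults to about n^(1/5)."""
        n, K = S.shape
        ell = int(np.ceil(n ** 0.2)) if lag is None else lag
        I = np.zeros((K, K))
        for tau in range(-ell, ell + 1):
            w = parzen_weight(tau / ell)
            a = S[abs(tau):] if tau >= 0 else S[:n - abs(tau)]
            b = S[:n - abs(tau)] if tau >= 0 else S[abs(tau):]
            I += w * (a.T @ b) / n     # S_{n,tau}, transposed automatically for tau < 0
        return I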
If one is unwilling to accept Assumption 1, then the estimator ℐ̃_n is modified as follows. First, compute the initial estimator

    ρ̃_n = argmin_{ρ∈R} m_n′(ρ, θ̂_n) m_n(ρ, θ̂_n).  (26)

Compute

    Â_t = ∫ (∂/∂θ) ln f_t(y|x̃_t, θ̂_n) p_t(y|x̃_t, ρ̃_n) dy  (27)

using the integration methods already described. For Case 1, use the estimator

    ℐ̃_n = (1/n) Σ_{t=1}^n [(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n) − Â_t][(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n) − Â_t]′.  (28)

For Cases 2 and 3, use

    ℐ̃_n = Σ_{τ=−ℓ_n}^{ℓ_n} w(τ/ℓ_n) S̃_{nτ},  (29)

where

    S̃_{nτ} = (1/n) Σ_{t=1+τ}^n [(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n) − Â_t][(∂/∂θ) ln f_{t−τ}(ỹ_{t−τ}|x̃_{t−τ}, θ̂_n) − Â_{t−τ}]′  (30)

for τ ≥ 0, and S̃_{nτ} = (S̃_{n,−τ})′ for τ < 0. It is unlikely that this generality will be necessary in practice
because the use of this formula means that one thinks that the score gener-
ator is a poor statistical approximation to the data generating process, which
is unlikely to be true for the following reasons. The score generator is con-
ceptually a reduced form model, not a structural model. Thus, it is ordinarily
easy to modify it by adding a few parameters so that it fits the data well. The
situation where one thinks the score generator is a poor approximation might
arise in hypothesis testing, but even then the null hypothesis will usually imply
either Assumption 1 or Assumption 2 and the generality is, again, unnecessary.
Theorem 1 gives the asymptotic distribution of ρ̂_n.
THEOREM 1. For Case 1, let Assumptions 8-11 of Gallant (1987, Ch. 3) hold. For Cases 2 and 3, let Assumptions 8-11 of Gallant (1987, Ch. 7) hold. Then ρ̂_n → ρ° almost surely and

    √n (ρ̂_n − ρ°) ⇒ N{0, [(M_n°)′(ℐ_n°)⁻¹(M_n°)]⁻¹},

where M_n° = (∂/∂ρ′) m_n(ρ°, θ_n°).

Proof. Apply Theorems 7 and 9 of Gallant (1987, Ch. 3) for Case 1 and Theorems 8 and 10 of Gallant (1987, Ch. 7) for Cases 2 and 3. Make these associations: λ = ρ, λ° = ρ°, m_n(λ) = m_n(ρ, θ̂_n), m_n°(λ) = m_n(ρ, θ_n°), and S_n° = Var[√n m_n°(λ°)] = ℐ_n°. ∎
The identification condition

    m_n(ρ, θ_n°) = 0 ⇒ ρ = ρ° for all n larger than some n°  (34)

is among the regularity conditions of Theorem 1. The situation is analogous to verification of the order and rank conditions of simultaneous equations models. The order condition is that the dimension of θ must exceed the dimension of ρ. However, due to nonlinearity, analytic verification of the analog of the rank condition, which is that the equations m_n(ρ, θ_n°) = 0 do not have multiple solutions for ρ ∈ R, is difficult. See Gallant (1977) for discussion.
When the score generator is correctly specified, there is a θ° such that

    (1/n) Σ_{t=1}^n ∫···∫ (∂/∂θ) ln f_t(y_t|x_t, θ°) Π_{τ=1}^t p_τ(y_τ|x_τ, ρ°) dy_τ p₁(x₁|ρ°) dx₁ = 0  (37)

and the information equality holds:

    ∫···∫ [(∂/∂θ) ln f_t(y_t|x_t, θ°)][(∂/∂θ) ln f_t(y_t|x_t, θ°)]′ Π_{τ=1}^t p_τ(y_τ|x_τ, ρ°) dy_τ p₁(x₁|ρ°) dx₁
        = −∫···∫ (∂²/∂θ∂θ′) ln f_t(y_t|x_t, θ°) Π_{τ=1}^t p_τ(y_τ|x_τ, ρ°) dy_τ p₁(x₁|ρ°) dx₁  (43)

(Gallant, 1987, Ch. 7, Sect. 4). Because the score generator, evaluated at such a θ°, defines the same stochastic process as

    {p₁(x₁|ρ), {p_t(y_t|x_t, ρ)}_{t=1}^∞},  (44)

ρ̂_mle is also the maximum likelihood estimator for the process {p₁(x₁|ρ), {p_t(y_t|x_t, ρ)}_{t=1}^∞}.
If the binding function g, the mapping of the structural parameters ρ into the auxiliary parameters θ, cannot be computed, the minimum chi-square estimator is not practical. However, ℰ(θ̂_n|ρ), the expectation of θ̂_n under the structural model with parameter ρ, can be computed by simulation, and the preceding remarks suggest that minimum chi-square with ℰ(θ̂_n|ρ) replacing g(ρ) would be a practical, fully efficient estimator. See Gouriéroux, Monfort, and Renault (1993) and Smith (1993) for examples. The difficulty with this approach is that the simulated minimum chi-square estimator is computationally inefficient relative to the GMM estimator proposed here because at each of the N Monte Carlo repetitions in the expression ℰ(θ̂_n|ρ) = (1/N) Σ_{τ=1}^N θ̂_{nτ} an optimization to compute θ̂_{nτ} is required. The GMM estimator requires only the one optimization to compute θ̂_n and avoids the N extra optimizations required to compute ℰ(θ̂_n|ρ). Moreover, one would actually have to invoke Assumption 2 or estimate 𝒥_n° to follow this approach. See Gouriéroux et al. (1993) for additional remarks on the relationships among various approaches.

We conclude this section by showing that ρ̂_n has the same asymptotic distribution as ρ̂_mle.
THEOREM 2. Assumption 3 implies

    √n (ρ̂_n − ρ°) ⇒ N{0, [(G_n°)′(ℐ_n°)(G_n°)]⁻¹}.  (45)

Proof. From the first-order conditions

    0 = (∂/∂ρ)[m_n′(ρ̂_n, θ̂_n)(ℐ̃_n)⁻¹ m_n(ρ̂_n, θ̂_n)] = 2[(∂/∂ρ′) m_n(ρ̂_n, θ̂_n)]′(ℐ̃_n)⁻¹ m_n(ρ̂_n, θ̂_n),  (46)

we have, after a Taylor's expansion of m_n(ρ̂_n, θ̂_n),

    [(M̄_n)′(ℐ̃_n)⁻¹(M̄_n)] √n (ρ̂_n − ρ°) = −[(M̄_n)′(ℐ̃_n)⁻¹(∂/∂θ′) m̄_n] √n (θ̂_n − θ_n°),  (47)

where the overbars indicate that the rows of M_n(ρ, θ) = (∂/∂ρ′) m_n(ρ, θ) and (∂/∂θ′) m_n(ρ, θ) have been evaluated at points on the line segment joining (ρ̂_n, θ̂_n) to (ρ°, θ_n°). Recall that M̂_n and M_n° indicate evaluation of M_n(ρ, θ) at (ρ̂_n, θ̂_n) and (ρ°, θ_n°), respectively. Now lim_{n→∞}(M̂_n − M_n°) = 0, lim_{n→∞}(M̄_n − M_n°) = 0, lim_{n→∞}[−(∂/∂θ′) m̄_n − 𝒥_n°] = 0, and lim_{n→∞}(ℐ̃_n − ℐ_n°) = 0 almost surely. Furthermore,

    √n (θ̂_n − θ_n°) ⇒ N[0, (𝒥_n°)⁻¹(ℐ_n°)(𝒥_n°)⁻¹].  (48)

Therefore, because correct specification implies 𝒥_n° = ℐ_n°, the preceding equation can be rewritten as

    [(M̄_n)′(ℐ̃_n)⁻¹(M̄_n)] √n (ρ̂_n − ρ°) = (M_n°)′ √n (θ̂_n − θ_n°) + o_p(1),  (49)
which implies that

    √n (ρ̂_n − ρ°) ⇒ N{0, [(M_n°)′(ℐ_n°)⁻¹(M_n°)]⁻¹}.  (50)

It remains to evaluate M_n° = (∂/∂ρ′) m_n(ρ°, θ_n°). Under Assumption 3 there is a local, smooth mapping g with θ_n° = g(ρ°); write G_n° = (∂/∂ρ′) g(ρ°). Differentiating the moment conditions with respect to ρ under the integral sign, using the fact that the joint density integrates to one for every ρ, and applying the information equality (43) yields

    M_n° = (1/n) Σ_{t=1}^n ∫···∫ [(∂/∂θ) ln f_t(y_t|x_t, θ°)][(∂/∂θ) ln f_t(y_t|x_t, θ°)]′ Π_{τ=1}^t p_τ(y_τ|x_τ, ρ°) dy_τ p₁(x₁|ρ°) dx₁ G_n° = ℐ_n° G_n°.  (51)

Substituting M_n° = ℐ_n° G_n° into (50) gives [(G_n°)′(ℐ_n°)(G_n°)]⁻¹ as the asymptotic variance, which is (45). ∎
3. THE SCORE GENERATOR

When the score generator closely approximates the data generating process, one can expect the approximation error (52) to be small enough that Â_t can be put to zero with little effect upon the accuracy of the computation of S̃_{nτ}, and small enough that

    S̃_{nτ}, τ ≠ 0,  (53)

can be put to zero with little effect on the accuracy of the computation of ℐ̃_n. The estimator of ℐ_n° would then assume its simplest form:

    ℐ̃_n = (1/n) Σ_{t=1}^n [(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n)][(∂/∂θ) ln f_t(ỹ_t|x̃_t, θ̂_n)]′.  (54)
Both SNP and neural nets are series expansions that have the property that (52) can be made arbitrarily small by using enough terms in the expansion (Gallant and Nychka, 1987; Gallant and White, 1992). Hence, Â_t and (53) can be made arbitrarily small by using enough terms. The appropriate number of terms relative to the sample size is suggested by the results of Fenton and Gallant (1996) and McCaffrey and Gallant (1994). However, there is as yet no general theory giving the rate at which terms can be added so as to retain √n-asymptotic normality, so one must guard against taking too many terms and then claiming that standard asymptotics apply.
4. APPLICATIONS
The first application, drawn from macroeconomics, is a one-sector stochastic growth model. The social planner's problem is to maximize

    E₀ Σ_{t=0}^∞ β^t v_{2t} c_t^{1−γ}/(1 − γ)  (55)

subject to

    c_t + k_{t+1} − k_t ≤ A k_t^α v_{1t},  (56)

where c_t is consumption at time t and k_t the capital stock at the beginning of period t (i.e., inherited from period t − 1); v_{1t} and v_{2t} are strictly positive shocks to technology and preferences; E_t(·) is shorthand for the conditional expectation given all variables in the model dated time t and earlier; and the parameters satisfy 0 < β < 1, γ > 0, A > 0, and 0 < α < 1. The agent's choice variables at time t are c_t and k_{t+1}. The stochastic process v_t = (v_{1t}, v_{2t})′ is strictly stationary and Markovian of order r, with conditional density φ(v_{t+1}|v_t*, δ), where v_t* = (v_t′, ..., v_{t−r+1}′)′ and δ is a parameter vector.

The Euler equation for this problem is

    v_{2t} c_t^{−γ} = E_t[β v_{2,t+1} c_{t+1}^{−γ}(α A k_{t+1}^{α−1} v_{1,t+1} + 1)].  (57)

The solution of the optimization problem is

    k_{t+1} = ψ_k(k_t, v_t*),  (58)
    c_t = ψ_c(k_t, v_t*),  (59)

where ψ_c and ψ_k are the policy functions.
There is no known closed form solution for the policy functions, though
the policy functions can be well approximated using one of the newly devel-
oped methods for solving nonlinear rational expectations models. The 1990
symposium in the Journal of Business and Economic Statistics (Tauchen,
1990) surveys many of the extant methods. For this model, and the proposed
application, the method of Coleman (1990), which uses quadrature for
numerical integration and computes the policy function over an extremely
fine grid, is probably the most accurate and numerically efficient.
Using Coleman's method to evaluate the policy functions, one can then easily simulate from this model. Given an initial value k° for the capital stock and a simulated realization {v_τ} generated from φ(v|v*, δ), one generates simulated {k̂_τ, ĉ_τ} by recursively feeding the v_τ and k̂_τ through the policy function for capital. Good practice is to allow the iterations to run for a long while in order to let the effects of the transients wear off. A simulated realization of length N, {k̂_τ, ĉ_τ}_{τ=1}^N, would be the last N values of the iterations.
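In outline, the simulation step might look as follows. This is a sketch, not Coleman's algorithm itself; psi_k and draw_v are hypothetical stand-ins for the numerically approximated policy function and a sampler from φ(v|v*, δ):

    import numpy as np

    def simulate_growth(psi_k, draw_v, A, alpha, k0, N, burn_in=1000, seed=0):
        """Simulate {k_hat, c_hat} of length N from the solved growth model.

        psi_k(k, v_star)    -> k_{t+1}, the numerically approximated policy function
        draw_v(rng, v_star) -> (v, v_star), a draw from phi(v | v_star, delta);
                               it must initialize its own state when v_star is None
        The first burn_in iterations are discarded so transients wear off.
        """
        rng = np.random.default_rng(seed)
        k, v_star, k_path, c_path = k0, None, [], []
        for _ in range(burn_in + N):
            v, v_star = draw_v(rng, v_star)
            k_next = psi_k(k, v_star)
            c = A * k ** alpha * v[0] + k - k_next   # budget constraint (56) at equality
            k_path.append(k_next)
            c_path.append(c)
            k = k_next
        return np.array(k_path[burn_in:]), np.array(c_path[burn_in:])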
Strategies to implement empirically the corresponding competitive equilibrium of this model differ depending on which variables are used to confront the model with data, that is, which variables enter the score generator. For example, with good data on both consumption and capital, the researcher could use (c_t, k_t)′. However, if capital is poorly measured but output well measured, then it would be better to use (c_t, q_t)′, where q_t is total output, which in the simulation would be computed as q̂_τ = A k̂_τ^α v_{1τ}. Neither of these strategies, though, makes use of price data.
A strategy that incorporates price information is to use c_t along with the returns on a pure discount risk-free bond, r_{bt}, and a stock, r_{st}. Asset returns are determined via asset pricing calculations, carried out as follows. (It turns out to be a bit easier to think of the equations defining returns between t and t + 1.) The bond return, r_{b,t+1}, is the solution to

    v_{2t} c_t^{−γ} = E_t(β v_{2,t+1} c_{t+1}^{−γ})(1 + r_{b,t+1}),  (60)

and r_{b,t+1} is known to agents at time t. For the stock return, the dividend process is d_{st} = A k_t^α v_{1t} − r_{bt} k_t, and the stock price process {p_{st}} is the solution to the expectational equation

    p_{st} v_{2t} c_t^{−γ} = E_t[β v_{2,t+1} c_{t+1}^{−γ}(p_{s,t+1} + d_{s,t+1})].  (61)

The stock return between t and t + 1 is r_{s,t+1} = (p_{s,t+1} + d_{s,t+1})/p_{st} − 1. Solving for the asset returns entails additional computation that could potentially be as numerically intensive as approximating the policy functions.
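For example, given a conditional quadrature rule for next period's consumption and preference shock, the bond return implied by (60) can be computed pointwise. In the sketch below, c_next, v2_next, and weights are hypothetical abscissae and weights for the conditional distribution given the time-t state:

    def bond_return(c_t, v2_t, c_next, v2_next, weights, beta, gamma):
        """Solve eq. (60) for the bond return known at time t:
        r_{b,t+1} = v2_t c_t^(-gamma) / E_t[beta v2_{t+1} c_{t+1}^(-gamma)] - 1."""
        m = sum(w * beta * v2 * c ** (-gamma)
                for w, v2, c in zip(weights, v2_next, c_next))
        return v2_t * c_t ** (-gamma) / m - 1.0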
This formulation presumes that, in the competitive equilibrium, the firm uses 100% debt financing to rent from a household the capital stock k_{t+1} for one period at interest rate r_{b,t+1}. (Both r_{b,t+1} and k_{t+1} are determined and known at time t.) The firm distributes to the household the dividend d_{s,t+1} = A k_{t+1}^α v_{1,t+1} − r_{b,t+1} k_{t+1}, which is the firm's cash flow in period t + 1,
that is, the proceeds after paying off the bondholder. Other conceptualiza-
tions are possible, and, in particular, the stock price and returns process
could be different if the firm retains earnings or uses different forms of debt
financing.
One typically does not observe a risk-free real bond return. Common prac-
tice in empirical asset pricing is to use the consumption series along with
either the real ex-post return on the stock (deflated using a price index) or
the excess of the stock return over the bond return, r_{et} = r_{st} − r_{bt}, from
which inflation cancels out. This practice presumes that the observed data
come from a monetary economy with exactly the same real side as above and
a nominal side characterized by a binding cash-in-advance constraint, which
implies unitary monetary velocity.
We show how to implement the estimator on data consisting of consump-
tion and the excess stock return. This is done for illustrative purposes. The
proposal offers an alternative to the standard SMM strategy of selecting out
a set of low-order moments, as in Gennotte and Marsh (1993), for estima-
tion of an asset pricing model. In actual practice, one would want to employ
more sophisticated versions of the model with time nonseparabilities in consumption and production and also include additional latent taste and/or technology shocks when additional asset returns are observed. Common practice in stochastic modeling is to include sufficient shocks or measurement errors to keep the predicted distribution of the data from being concentrated on a lower dimensional manifold, which would be counterfactual and would cause the model to be dismissed out of hand.

Put y_t = (r_{et}, c_t)′. Let ρ = (γ, A, α, δ′)′ denote the vector of structural parameters. The numerical solution of the model provides a means to simulate data given a value of ρ.
Experience with financial data suggests that a reasonable choice for the
score generator is the sequence of densities defined by an ARCH (Engle,
1982) or GARCH (Engle and Bollerslev, 1986) process. For ease of exposi-
tion, we show the ARCH case. Consider the multivariate ARCH model
    y_t = b₀ + Σ_{j=1}^{L₁} B_j y_{t−j} + u_t,  (62)

where u_t ~ N(0, Σ_t) with

    vech(Σ_t) = c₀ + Σ_{j=1}^{L₂} C_j vech(u_{t−j} u_{t−j}′).  (63)

Let pdf(y_t|x_t, ψ) denote the implied conditional density of {y_t} under the ARCH model, where x_t = (y′_{t−L}, ..., y′_{t−1})′, L = L₁ + L₂, and

    ψ = (b₀′, vec([B₁ B₂ ··· B_{L₁}])′, c₀′, vec([C₁ C₂ ··· C_{L₂}])′)′.  (64)

Common practice in ARCH modeling is to impose a priori restrictions, such as diagonality or factor restrictions, so as to constrain ψ = ψ(θ) to depend on a lower dimensional parameter vector θ.
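For the univariate case, the following sketch shows how the score of a simple Gaussian ARCH(1) generator can be computed. The initialization of the first conditional variance at the sample variance and the use of numerical differentiation are simplifying assumptions made only for brevity:

    import numpy as np

    def arch1_loglik(psi, y):
        """Average Gaussian ARCH(1) log likelihood: y_t = b0 + b1 y_{t-1} + u_t,
        u_t ~ N(0, h_t), h_t = c0 + c1 u_{t-1}^2, with psi = (b0, b1, c0, c1).
        The first conditional variance is initialized at the sample variance."""
        b0, b1, c0, c1 = psi
        u = y[1:] - b0 - b1 * y[:-1]                              # residuals
        h = c0 + c1 * np.concatenate(([np.var(y)], u[:-1] ** 2))  # cond. variances
        return np.mean(-0.5 * np.log(2 * np.pi * h) - 0.5 * u**2 / h)

    def arch1_score(psi, y, eps=1e-6):
        """Score (d/d psi) of the average log likelihood, by central differences."""
        g = np.zeros(len(psi))
        for i in range(len(psi)):
            dp = np.zeros(len(psi)); dp[i] = eps
            g[i] = (arch1_loglik(psi + dp, y) - arch1_loglik(psi - dp, y)) / (2 * eps)
        return g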
The second application, drawn from finance, is the stochastic volatility model

    y_t = μ_y + c y_{t−1} + exp(w_t) r_y z_t,  (65)
    w_t = a w_{t−1} + r_w z̃_t.  (66)

The first equation is the mean equation with parameters μ_y, c, and r_y; the second is the volatility equation with parameters a and r_w. {y_t} is an observed financial returns process and {w_t} is an unobserved volatility process. In the basic specification, z_t and z̃_t are mutually independent iid N(0,1) shocks. The model can be generalized in an obvious way to accommodate longer lag lengths in either equation. Versions of this model have been examined by Clark (1973), Melino and Turnbull (1990), Harvey, Ruiz, and Shephard (1993), Jacquier, Polson, and Rossi (1994), and many others. The
appeal of the model is that it provides a simple specification for speculative
price movements that accounts, in qualitative terms, for broad general fea-
tures of data from financial markets such as leptokurtosis and persistent vol-
atility. The complicating factor for estimation is that the likelihood function
is not readily available in closed form, which motivates consideration of
other approaches.
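The contrast is easy to see in code: simulating the model takes only a few lines (a sketch of the specification above, with parameter names mirroring the equations):

    import numpy as np

    def simulate_sv(mu_y, c, r_y, a, r_w, N, burn_in=1000, seed=0):
        """Simulate the stochastic volatility model:
            y_t = mu_y + c y_{t-1} + exp(w_t) r_y z_t
            w_t = a w_{t-1} + r_w z_tilde_t,   z_t, z_tilde_t iid N(0,1).
        The likelihood is intractable, but simulation is immediate."""
        rng = np.random.default_rng(seed)
        y, w, path = 0.0, 0.0, []
        for _ in range(burn_in + N):
            w = a * w + r_w * rng.standard_normal()
            y = mu_y + c * y + np.exp(w) * r_y * rng.standard_normal()
            path.append(y)
        return np.array(path[burn_in:])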
Gallant, Hsieh, and Tauchen (1994) employ the estimator of this paper to estimate the stochastic volatility model on a long time series comprised of 16,127 daily observations {y_t}_{t=1}^{16,127} on adjusted movements in the Standard and Poor's Composite Index, 1928-1987. The score generator is an SNP model, as described in Section 3. The specification search for appropriate auxiliary models for {y_t} leads to two scores: a "Nonparametric ARCH Score," when errors are constrained to be homogeneous, and a "Nonlinear Nonparametric Score," when errors are allowed to be conditionally heterogeneous. The Nonparametric ARCH Score contains indicators for both deviations from conditional normality and ARCH. Together, these scores suffice
to identify the stochastic volatility model; indeed, the stochastic volatility
model places overidentifying restrictions across these scores. The Nonlinear
Nonparametric Score contains additional indicators for conditional
heterogeneity, most importantly, the leverage type effect of Nelson (1991),
which is a form of dynamic asymmetry. These additional indicators identify
dynamic asymmetries like those suggested by Harvey and Shephard (1993),
which the Nonparametric ARCH Score does not identify. When fitted to
either of these two scores, the standard stochastic volatility model fails to
approximate the distribution of the data adequately; it is overwhelmingly
rejected on the chi-square goodness-of-fit tests. After altering the distribu-
tion of z_t to accommodate thickness in both tails along with left skewness
and generalizing the volatility equation to include long memory (Harvey,
1993), the stochastic volatility model can match the moments defined by the
simpler Nonparametric ARCH Score, but not those defined by the Nonlin-
ear Nonparametric Score. Introducing cross-correlation between z_{t−1} and z̃_t
as in Harvey and Shephard (1993) improves the fit to the Nonlinear Nonpara-
metric Score substantially, but still the stochastic volatility model cannot fit
that score. Overall, Gallant et al. (1994) find the estimation provides a computationally tractable means to assess the relative plausibility of a wide class of alternative specifications of the stochastic volatility model.
Auctions are commonly used to sell assets. Game theoretic models of auc-
tions provide a detailed theory of the mapping from the disparate values that
bidders place on the asset to the final outcome (the winner and the sales
price). The predictions of this theory depend strongly on the assumptions
regarding the characteristics of the auction and the bidders. Generally, the
specific rules of the auction along with the information structure, the atti-
tudes of the bidders toward risk, and the bidders' strategic behavior all mat-
ter a great deal in determining the final outcome (Milgrom, 1986).
Empirical implementation of game theoretic models of auctions lags well
behind the theory. The extreme nonlinearities and numerical complexity of
auction models present substantial obstacles to direct implementation. Two
recent papers, by Paarsch (1991) and Laffont, Ossard, and Vuong (1991),
make substantial progress, however. In both papers, the task is to estimate
the parameters of the distribution of values across bidders. Paarsch devel-
ops a framework based on standard maximum likelihood. His approach can
handle a variety of informational environments but is restricted to a relatively
narrow set of parametric models for the valuation distribution—essentially
the Pareto and Weibull. Laffont et al. use a simulation approach, and they
can thereby handle a much broader class of valuation distributions. How-
ever, their approach imposes only the predictions of the theory regarding first
moments and ignores higher order structure, which can cause problems of
inefficiency and identification.
The method set forth in Section 2 imposes all restrictions and generates an
efficient estimate of the valuation distribution. In what follows, we illustrate
how one would implement the method for some of the simpler models of
auctions. A full empirical study would go much further and, in particular,
would relax our strong assumptions and consider other environments known
to be theoretically important.
We first provide a short overview of some of the simplest auction models
and then proceed to the econometrics.
In the oral ascending auction, B bidders have valuations v₁, ..., v_B drawn independently from a density h(v|q, ρ), where q is a vector of observed covariates and r₀ denotes the reserve price. The winning bid is

    y = max(v_{(B−1:B)}, r₀) I(v_{(B:B)} ≥ r₀) if B ≥ 2,  (67)
    y = r₀ I(v₁ ≥ r₀) if B = 1,  (68)

where v_{(1:B)} ≤ ··· ≤ v_{(B:B)} are the order statistics of v₁, ..., v_B, and I(·) is the zero-one indicator function. On the event v_{(B:B)} < r₀, the winning bid is defined as zero and the item is unsold.
Let p_oa(y|r₀, B, q, ρ), or simply p_oa(y|x, ρ) with x = (r₀, B, q), denote the conditional probability density of the winning bid. Below, we write either p_oa(y|r₀, B, q, ρ) or p_oa(y|x, ρ), depending on whether or not we wish to emphasize dependence on each of the different components of x. In general, p_oa(y|x, ρ) is an ordinary density on the region y > r₀, so long as h(v|q, ρ) is smooth, whereas p_oa(y|x, ρ) has atoms at y = 0 and y = r₀. In certain circumstances, for example when h(v|q, ρ) is Pareto or Weibull as in Paarsch (1991), p_oa(y|x, ρ) has a manageable closed-form expression. In other circumstances, for example when h(v|q, ρ) is lognormal as in Laffont et al. (1991), p_oa(y|x, ρ) admits no tractable expression. However, so long as it is easy to simulate from h(v|q, ρ), then it is easy to simulate from p_oa(y|x, ρ).
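A sketch of that simulation follows; draw_v is a hypothetical sampler returning B i.i.d. draws from h(v|q, ρ):

    import numpy as np

    def simulate_oral_bid(draw_v, r0, B, q, rho, rng):
        """One draw of the winning bid y from p_oa(y | r0, B, q, rho); cf. (67)-(68).
        draw_v(rng, B, q, rho) -> B i.i.d. valuations from h(v | q, rho)."""
        v = np.sort(draw_v(rng, B, q, rho))
        if v[-1] < r0:
            return 0.0               # highest valuation below reserve: no sale
        if B == 1:
            return r0
        return max(v[-2], r0)        # second-highest order statistic vs. reserve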
For the sealed bid first price auction, the winning bid is

    y = E[max(v_{(B−1:B)}, r₀) | v_{(B:B)}] I(v_{(B:B)} ≥ r₀) if B ≥ 2,  (69)
    y = r₀ I(v₁ ≥ r₀) if B = 1.  (70)
Thus, when there are two or more bidders and v_{(B:B)} ≥ r₀, the winning bid follows the distribution of the conditional expectation of max(v_{(B−1:B)}, r₀) given v_{(B:B)}. Let p_sb(y|r₀, B, q, ρ), or p_sb(y|x, ρ), denote the implied conditional density of the winning bid in the sealed bid case.

Generally, p_sb(y|x, ρ) is less manageable in practice than is p_oa(y|x, ρ). Generation of a simulated draw from p_sb(y|x, ρ) entails either numerical integration of the cumulative distribution function of the valuation distribution or a double-nested set of simulations.
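A sketch of the double-nested simulation, under the same hypothetical draw_v interface as above; it uses the fact that, conditional on the highest valuation m, the remaining B − 1 valuations are i.i.d. from h truncated above at m:

    import numpy as np

    def simulate_sealed_bid(draw_v, r0, B, q, rho, rng, inner=200):
        """One draw of the winning bid from p_sb(y | r0, B, q, rho); cf. (69)-(70).
        Outer draw: B valuations. Inner Monte Carlo: the conditional expectation
        of max(second-highest, r0) given the highest valuation m."""
        v = np.sort(draw_v(rng, B, q, rho))
        m = v[-1]
        if m < r0:
            return 0.0               # no sale
        if B == 1:
            return r0
        total = 0.0
        for _ in range(inner):
            others = []
            while len(others) < B - 1:   # rejection sampling from h truncated at m
                cand = draw_v(rng, B - 1, q, rho)
                others.extend(c for c in cand if c <= m)
            total += max(max(others[:B - 1]), r0)
        return total / inner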
Laffont et al. (1991) develop a simulated nonlinear least-squares (SNLLS) estimator built on the conditional mean of the winning bid, μ_oa(r₀, B, q, ρ) = ∫ y p_oa(y|r₀, B, q, ρ) dy, which they approximate by the Monte Carlo average

    μ̂_oa(r_{0t}, B_t, q_t, ρ) = (1/R) Σ_{r=1}^R max(v_{r,(B_t−1:B_t)}, r_{0t}) I(v_{r,(B_t:B_t)} ≥ r_{0t}),  (72)

where v_{r,(B_t−1:B_t)} is the second highest order statistic of the rth independent simulated realization of (v_{r1}, ..., v_{rB_t}) i.i.d. from h(v|q_t, ρ). In their motivating examples and empirical applications, v is conditionally lognormal with a mean that depends on q_t, and ρ contains the parameters of this conditional lognormal distribution. The SNLLS estimator is nonlinear least squares with a heteroskedasticity-robust estimate of the asymptotic variance of ρ̂ that accounts for conditional heteroskedasticity of

    e_t = y_t − μ_oa(r_{0t}, B_t, q_t, ρ°).  (73)
Laffont et al. (1991) noted that revenue equivalence implies the same formulation of the conditional mean function applies for a sealed bid auction. Revenue equivalence implies

    μ_sb(r₀, B, q, ρ) = ∫ y p_sb(y|r₀, B, q, ρ) dy = ∫ y p_oa(y|r₀, B, q, ρ) dy = μ_oa(r₀, B, q, ρ)  (74)

for all r₀, B, q, and ρ. Hence, one can evaluate the conditional mean function at the data, that is, compute μ_sb(r_{0t}, B_t, q_t, ρ), by simulating and averaging exactly as one does under oral ascending rules. The result can be a significant reduction in computational demands.
The SNLLS approach works off of the conditional first moment implica-
tions alone, though, and auction models place additional structure on the
data. An auction model has second moment implications as well as first
moment implications. In fact, it actually dictates the functional form of the
conditional heteroskedasticity in the nonlinear regression equation, which
suggests additional moment conditions. There are practical consequences
from not incorporating additional restrictions beyond first moment informa-
tion. Laffont et al. (1991) and Baldwin (1992) find it difficult to estimate the
variance of the underlying parent lognormal using SNLLS. Bringing in sec-
ond moment estimation can be expected to alleviate this difficulty. In gen-
eral, there are further implications beyond first and second moments as well;
imposition of all implications of the model can be expected to sharpen even
further the estimates of the parameter p.
Ideally, one would impose all of the model's implications by maximum likelihood, using either p_oa(y|r₀, B, q, ρ) or p_sb(y|r₀, B, q, ρ), as appropriate, to define the likelihood. The difficulty is that both densities are intractable, except in the special circumstances assumed by Paarsch (1991).
The approach outlined in Section 2 can come close to the maximum like-
lihood ideal. Our analysis pertains to the just-described situation where the
likelihood is smooth but intractable; it does not cover cases where the like-
lihood is nondifferentiable in parameters. The consistency of the estimator
ρ̂_n is not affected by nondifferentiability, but asymptotic normality may be.
See Hansen, Heaton, and Luttmer (1995, Appendix C) for a discussion of
differentiability considerations with respect to GMM estimators.
The approach would be applied to the auction data as follows. θ̂_n is obtained as

    θ̂_n = argmax_θ (1/n) Σ_{t=1}^n ln f(y_t|x_t, θ),  (75)

where f(y|x, θ) is the score generator fitted to the observed winning bids and auction covariates.
REFERENCES
Ellner, S., A.R. Gallant, & J. Theiler (1995) Detecting nonlinearity and chaos in epidemic data.
In D. Mollison (ed.), Epidemic Models: Their Structure and Relation to Data, pp. 54-78. Cam-
bridge: Cambridge University Press.
Engle, R.F. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance
of United Kingdom inflation. Econometrica 50, 987-1007.
Engle, R.F. (1994) Indirect Inference on Volatility Diffusions and Stochastic Volatility Models.
Manuscript, University of California at San Diego.
Engle, R.F. & T. Bollerslev (1986) Modelling the persistence of conditional variances. Economet-
ric Reviews 5, 1-50.
Fenton, V. & A.R. Gallant (1996) Convergence rates of SNP density estimators. Econometrica
64, 719-727.
Gallant, A.R. (1977) Three stage least squares estimation for a system of simultaneous, non-
linear, implicit equations. Journal of Econometrics 5, 71-88.
Gallant, A.R. (1987) Nonlinear Statistical Models. New York: Wiley.
Gallant, A.R., D.A. Hsieh, & G. Tauchen (1994) Estimation of stochastic volatility models with
diagnostics. Manuscript, Duke University.
Gallant, A.R. & D.W. Nychka (1987) Semi-nonparametric maximum likelihood estimation. Econ-
ometrica 55, 363-390.
Gallant, A.R., P.E. Rossi, & G. Tauchen (1992) Stock prices and volume. Review of Financial
Studies 5, 199-242.
Gallant, A.R. & G. Tauchen (1989) Seminonparametric estimation of conditionally constrained
heterogeneous processes: Asset pricing applications. Econometrica 57, 1091-1120.
Gallant, A.R. & G. Tauchen (1992) A nonparametric approach to nonlinear time series analysis:
Estimation and simulation. In E. Parzen, D. Brillinger, M. Rosenblatt, M. Taqqu, J. Geweke,
& P. Caines (eds.), New Dimensions in Time Series Analysis, pp. 71-92. New York:
Springer-Verlag.
Gallant, A.R. & H. White (1992) On learning the derivatives of an unknown mapping with multi-
layer feedforward networks. Neural Networks 5, 129-138.
Gennotte, G. & T.A. Marsh (1993) Variations in economic uncertainty and risk premiums on
capital assets. European Economic Review 37, 1021-1041.
Ghysels, E. & J. J. Jasiak (1994) Stochastic volatility and time deformation: An application to
trading volume and leverage effects. Manuscript, University of Montreal.
Gouriéroux, C., A. Monfort, & E. Renault (1993) Indirect inference. Journal of Applied Econo-
metrics 8, S85-S118.
Hansen, L.P. (1982) Large sample properties of generalized method of moments estimators.
Econometrica 50, 1029-1054.
Hansen, L.P., J. Heaton, & E.J.G. Luttmer (1995) Econometric evaluation of asset pricing mod-
els. The Review of Financial Studies 8, 237-274.
Harvey, A.C. (1993) Long Memory in Stochastic Volatility. Manuscript, London School of
Economics.
Harvey, A.C., E. Ruiz, & N. Shephard (1993) Multivariate stochastic variance models. Review
of Economic Studies 61, 247-264.
Harvey, A.C. & N. Shephard (1993) Estimation of an Asymmetric Stochastic Volatility Model
for Asset Returns. Manuscript, London School of Economics.
Jacquier, E., N.G. Polson, & P.E. Rossi (1994) Bayesian analysis of stochastic volatility mod-
els. Journal of Business and Economic Statistics 12, 371-388.
Laffont, J.-J., H. Ossard, & Q. Vuong (1991) The Econometrics of First-Price Auctions. Doc-
ument de Travail 7, Institut d'Economie Industrielle, Toulouse.
McCaffrey, D.F. & A.R. Gallant (1994) Convergence rates for single hidden layer feedforward
networks. Neural Networks 7, 147-158.
McFadden, D. (1989) A method of simulated moments for estimation of discrete response mod-
els without numerical integration. Econometrica 57, 995-1026.
Melino, A. & S.M. Turnbull (1990) Pricing foreign currency options with stochastic volatility.
Journal of Econometrics 45, 239-266.
Milgrom, P. (1986) Auction theory. In T. Bewley (ed.), Advances in Economic Theory, pp. 1-32.
Cambridge: Cambridge University Press.
Nelson, D. (1991) Conditional heteroskedasticity in asset returns: A new approach. Economet-
rica 59, 347-370.
Paarsch, H.J. (1991) Empirical Models of Auctions and an Application to British Columbian
Timber Sales. Discussion paper 91-19, University of British Columbia.
Pakes, A. & D. Pollard (1989) Simulation and the asymptotics of optimization estimators. Econ-
ometrica 57, 1027-1058.
Smith, A.A. (1993) Estimating nonlinear time series models using vector autoregressions: Two
approaches. Journal of Applied Econometrics 8, 63-84.
Tauchen, G. (1990) Associate editor's introduction. Journal of Business and Economic Statis-
tics 8, 1.
Tauchen, G. & M. Pitts (1983) The price variability-volume relationship on speculative mar-
kets. Econometrica 51, 485-505.