
Econometric Theory, 12, 1996, 657-681. Printed in the United States of America.

WHICH MOMENTS TO MATCH?

A. RONALD GALLANT
University of North Carolina
GEORGE TAUCHEN
Duke University

We describe an intuitive, simple, and systematic approach to generating moment conditions for generalized method of moments (GMM) estimation of the parameters of a structural model. The idea is to use the score of a density that has an analytic expression to define the GMM criterion. The auxiliary model that generates the score should closely approximate the distribution of the observed data but is not required to nest it. If the auxiliary model nests the structural model, then the estimator is as efficient as maximum likelihood. The estimator is advantageous when expectations under a structural model can be computed by simulation, by quadrature, or by analytic expressions but the likelihood cannot be computed easily.

1. INTRODUCTION
We present a systematic approach to generating moment conditions for the
generalized method of moments (GMM) estimator (Hansen, 1982) of the
parameters of a structural model. The approach is an alternative to the com-
mon practice of selecting a few low-order moments on an ad hoc basis and
then proceeding with GMM. The idea is simple: Use the expectation under
the structural model of the score from an auxiliary model as the vector of
moment conditions.
This score is the derivative of the log density of the auxiliary model with
respect to the parameters of the auxiliary model. Thus, the moment condi-
tions depend on both the parameters of the auxiliary model and the param-
eters of the structural model. The parameters of the auxiliary model are
replaced by their quasimaximum likelihood estimates, which are computed
by maximizing the pseudolikelihood of the auxiliary model.
The estimates of the structural parameters are computed by minimizing a
GMM criterion function. As seen later, the optimal weighting matrix for
forming the GMM criterion from the moment conditions depends only on
the auxiliary model and is easily computed.

We thank Laura Baldwin, Ravi Bansal, John Coleman, and Anthony Smith for helpful discussions and two
referees for very useful comments. This paper was supported by the National Science Foundation. Address
correspondence to: A. Ronald Gallant, Department of Economics, University of North Carolina, CB #3305,
6F Gardner Hall, Chapel Hill, NC 27599-3305, USA; e-mail: [email protected].

© 1996 Cambridge University Press 0266-4666/96 $9.00 + .10



We call the auxiliary model the score generator. The score generator need
not encompass (nest) the structural model. If it does, then the estimator is
as efficient as the maximum likelihood estimator. Hence, our approach
ensures efficiency against a given parametric model. If the score generator
closely approximates the actual distribution of the data, even though it does
not encompass it, then the estimator is nearly fully efficient.
The estimation context that we have in mind is one where a structural
model defines a data generation process for the data. The key feature of this
data generation process is that it is relatively easy to compute the expecta-
tion of a nonlinear function given values for the structural parameters. An
expectation may be computed by simulation, by numerical quadrature, or by
analytic expressions, whichever is the most convenient.
Examples of this estimation context are the panel data models motivating
the simulated method of moments approach of Pakes and Pollard (1989) and
McFadden (1989). Another is the asset pricing model that motivates the
dynamic method of moments estimator of Duffie and Singleton (1993). In
Section 4, we present three such situations drawn from macroeconomics,
finance, and empirical auction modeling. In these examples, the likelihood
is difficult to compute, so maximum likelihood is infeasible. Simulation and
moment matching thus naturally arise.
As indicated, there is no presumption that the score generator encompasses
the structural model, although an order condition for identification requires
a minimal level of complexity of the score generator. Under weak regularity
conditions, our estimator is root-n consistent and asymptotically normal with
an asymptotic distribution that depends on both the structural model and the
score generator. If there exists a local, smooth mapping of the structural
parameters into the parameters of the score generator, then the estimator has
the same asymptotic distribution as the maximum likelihood estimator under
the structural model.
The asymptotic theory of the estimator subsumes situations with strictly
exogenous variables, where one conditions on particular values of the explan-
atory variables. It also subsumes situations with predetermined but not
strictly exogenous variables, as is typical of stationary Markov data gener-
ation processes. The most general version allows for processes with time-
dependent laws of motion and dependence extending into the indefinite past.
Section 2 presents the asymptotic justification of the estimator. Section 3
presents some candidate specifications for the score generator. Section 4
presents three proposed applications, each of which is a substantive empiri-
cal project.

2. THEORY
For a stochastic process described by a sequence of densities $\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}$ for which expectations of nonlinear functions are easily computed by simulation, by quadrature, or by analytic expressions, we derive a computationally convenient GMM estimator for $\rho$ that uses the scores $(\partial/\partial\theta)\ln f_t(y_t|x_t,\theta)$ from another sequence of densities $\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}$ to generate the moment conditions. As an example, $\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}$ might be a model that describes asset prices $y_t$ in terms of exogenous variables $x_t$ and structural parameters $\rho$, and $\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}$ might be the sequence of densities of a GARCH process (Bollerslev, 1986, 1987). The estimator is consistent and asymptotically normally distributed in general. It is fully efficient if the model $\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}_{\rho\in R}$ is smoothly embedded (Definition 1) within the model $\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}_{\theta\in\Theta}$. The estimator is attractive when the density $f_t(y_t|x_t,\theta)$ has a convenient analytic expression whereas the density $p_t(y_t|x_t,\rho)$ does not.
Throughout, the observed data $\{y_t, x_t\}_{t=1}^n$ are assumed to have been generated from the sequence of densities

$$\{p_1(x_1|\rho^0), \{p_t(y_t|x_t,\rho^0)\}_{t=1}^\infty\}, \qquad (1)$$

which is to say that $\rho^0$ denotes the true value of the parameter $\rho$ in the model

$$\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}_{\rho\in R}, \qquad (2)$$

where $R$ denotes the parameter space. The model

$$\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}_{\theta\in\Theta} \qquad (3)$$

is called the score generator. The variables $y_t$ and $x_t$ can be univariate or multivariate or have a dimension that depends on $t$. The functional form of the score generator may act to exclude elements of the vector $y_t$; that is, the score generator may define a stochastic process for only some elements of $y_t$.

When we say that a process is time invariant we mean that the densities of the process do not depend on $t$, in which case the $t$ subscripts on the densities may be suppressed and the dimensions of $y_t$ and $x_t$ are fixed. Writing $X_n \approx N(0, V_n)$ means $(V_n^{1/2})^{-1} X_n \Rightarrow N(0, I)$, where $V_n = (V_n^{1/2})(V_n^{1/2})'$. Smoothly embedded is defined as follows.

DEFINITION 1. The model $\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}_{\rho\in R}$ is said to be smoothly embedded within the score generator $\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}_{\theta\in\Theta}$ if for some open neighborhood $R^0$ of $\rho^0$ there is a twice continuously differentiable mapping $g: R^0 \to \Theta$ such that

$$p_t(y_t|x_t,\rho) = f_t[y_t|x_t, g(\rho)], \quad t = 1, 2, \ldots, \qquad (4)$$

for every $\rho \in R^0$, and $p_1(x_1|\rho) = f_1[x_1|g(\rho)]$ for every $\rho \in R^0$.
We consider three cases.
Case 1. All densities are time invariant, the analysis is conditional on the observed sequence $\{x_t\}_{t=1}^n$, and the data $\{(y_t,x_t)\}_{t=1}^n$ are a sample from $\prod_{t=1}^n p(y_t|x_t,\rho^0)$. An example for which these assumptions are appropriate is the nonlinear regression model on cross-sectional data with independently and identically distributed errors. For this case, using simulation to compute the GMM estimator proposed here requires $N$ simulated sequences $\{\{\hat y_{t\tau}\}_{t=1}^n\}_{\tau=1}^N$ from the density $\prod_{t=1}^n p(y_t|x_t,\rho)$. We impose Assumptions 1-6 of Gallant (1987, Ch. 3) on both $p(y|x,\rho)$ and $f(y|x,\theta)$.
Case 2. All densities are time invariant, the analysis is unconditional, and the data $\{(y_t,x_t)\}_{t=1}^n$ are a sample from $\prod_{t=1}^n p(y_t|x_t,\rho^0)\,p(x_1|\rho^0)$. An example for which these assumptions are appropriate is an autoregressive process where $x_t$ is comprised of $L$ lagged values of $y_t$. We impose Assumptions 1-6 of Gallant (1987, Ch. 7) on both $p(y|x,\rho)$ and $f(y|x,\theta)$. In addition, $\{(y_t,x_t)\}_{t=-\infty}^\infty$ is assumed to be stationary with joint density $p(y,x|\rho)$, marginal density $p(x|\rho) = \int p(y,x|\rho)\,dy$, and conditional density $p(y|x,\rho) = p(y,x|\rho)/p(x|\rho)$; similarly for $f(y|x,\theta)$. For this case, using simulation to compute the GMM estimator requires only a single simulated sequence $\{(\hat y_\tau, \hat x_\tau)\}_{\tau=1}^N$ from the density $\prod_{\tau=1}^N p(y_\tau|x_\tau,\rho)\,p(x_1|\rho)$, generated as follows: Start at an arbitrary $\hat x_1 = (\hat y_0, \ldots, \hat y_{-L+1})$, simulate $\hat y_1$ from $p(y_1|\hat x_1,\rho)$, put $\hat x_2 = (\hat y_1, \ldots, \hat y_{-L+2})$, simulate $\hat y_2$ from $p(y_2|\hat x_2,\rho)$, and so on. So that $\hat x_\tau$ is plausibly a sample from $p(x|\rho)$, enough initial simulations are discarded for transients to die out, and the next $N$ simulations are retained as the sequence $\{(\hat y_\tau, \hat x_\tau)\}_{\tau=1}^N$.
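For concreteness, the following is a minimal sketch of this burn-in scheme in Python, with a Gaussian AR(1) standing in as a hypothetical structural model $p(y|x,\rho)$; the function name and parameterization are ours, not part of the paper.

```python
import numpy as np

def simulate_case2(rho, N, burn_in=1000, seed=0):
    """Return {(y_tau, x_tau)} of length N after discarding transients."""
    c, phi, sigma = rho                      # hypothetical AR(1) parameters
    rng = np.random.default_rng(seed)
    y = 0.0                                  # arbitrary starting value
    draws = []
    for _ in range(burn_in + N):
        x = y                                # x_tau is the lagged value (L = 1)
        y = c + phi * x + sigma * rng.standard_normal()
        draws.append((y, x))
    return draws[burn_in:]                   # keep only the last N pairs

sims = simulate_case2(rho=(0.0, 0.9, 1.0), N=50_000)
```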
Case 3. Densities are not time invariant, the analysis is unconditional, and the data $\{(y_t,x_t)\}_{t=1}^n$ are a sample from $\prod_{t=1}^n p_t(y_t|x_t,\rho^0)\,p_1(x_1|\rho^0)$. This framework does permit conditioning on the initial observation $x_1$ and conditioning on exogenous variables. Conditioning on $x_1$ is accomplished by letting $p_1(x_1|\rho)$ put its mass on a single point. Conditioning on exogenous variables $w_t$ is accomplished through the dependence of $p_t(y_t|x_t,\rho)$ on $t$ by putting $p_t(y_t|x_t,\rho) = p(y_t|x_t,w_t,\rho)$. An example for which these assumptions are appropriate is a nonlinear regression with fixed regressors $w_t$ and lagged dependent variables $x_t$. For Case 3, using simulation to compute the GMM estimator may require $N$ simulated sequences $\{\{(\hat y_{t\tau}, \hat x_{t\tau})\}_{t=1}^n\}_{\tau=1}^N$ from the density $\prod_{t=1}^n p_t(y_t|x_t,\rho)\,p_1(x_1|\rho)$. However, in the common case where the structural model is Case 2 and the score generator describes an asymptotically strictly stationary process, a single simulated sequence as in Case 2 suffices. We impose Assumptions 1-6 of Gallant (1987, Ch. 7) on both $p_t(y_t|x_t,\rho)$ and $f_t(y_t|x_t,\theta)$.
Our idea is to use the scores

$$(\partial/\partial\theta)\ln f_t(y_t|x_t,\theta) \qquad (5)$$

evaluated at the quasimaximum likelihood estimate

$$\tilde\theta_n = \operatorname*{argmax}_{\theta\in\Theta}\; \frac{1}{n}\sum_{t=1}^n \ln f_t(y_t|x_t,\theta) \qquad (6)$$

to generate GMM moment conditions. The GMM moment equations are

$$\text{Case 1:}\quad m_n(\rho,\tilde\theta_n) = \frac{1}{n}\sum_{t=1}^n \int (\partial/\partial\theta)\ln f(y|x_t,\tilde\theta_n)\,p(y|x_t,\rho)\,dy, \qquad (7)$$

$$\text{Case 2:}\quad m_n(\rho,\tilde\theta_n) = \iint (\partial/\partial\theta)\ln f(y|x,\tilde\theta_n)\,p(y|x,\rho)\,dy\;p(x|\rho)\,dx, \qquad (8)$$

$$\text{Case 3:}\quad m_n(\rho,\tilde\theta_n) = \frac{1}{n}\sum_{t=1}^n \int\cdots\int (\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\,\prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho)\,dy_\tau\;p_1(x_1|\rho)\,dx_1. \qquad (9)$$

These are the moment conditions that define the estimator. In most applications, analytic expressions for the integrals will not be available and simulation or quadrature will be required to compute them.
If the integrals are computed by simulation, then the formulas used in practice are

$$\text{Case 1:}\quad m_n(\rho,\tilde\theta_n) = \frac{1}{n}\sum_{t=1}^n \frac{1}{N}\sum_{\tau=1}^N (\partial/\partial\theta)\ln f(\hat y_{t\tau}|x_t,\tilde\theta_n), \qquad (10)$$

$$\text{Case 2:}\quad m_n(\rho,\tilde\theta_n) = \frac{1}{N}\sum_{\tau=1}^N (\partial/\partial\theta)\ln f(\hat y_\tau|\hat x_\tau,\tilde\theta_n), \qquad (11)$$

$$\text{Case 3:}\quad m_n(\rho,\tilde\theta_n) = \frac{1}{n}\sum_{t=1}^n \frac{1}{N}\sum_{\tau=1}^N (\partial/\partial\theta)\ln f_t(\hat y_{t\tau}|\hat x_{t\tau},\tilde\theta_n). \qquad (12)$$

We assume that $N$ is large enough that the Monte Carlo integral approximates the analytic integral to within a negligible error of the same sort as is made in computing any mathematical expression on a digital computer.
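To illustrate (11) under the same hypothetical AR(1) setup as in the sketch above, the following averages the closed-form score of a Gaussian AR(1) auxiliary model over the simulated pairs; the parameterization $\theta = (b_0, b_1, s)$ is our notation, not the paper's.

```python
import numpy as np

def score_f(y, x, theta):
    """(d/dtheta) ln f(y|x,theta) for f = N(b0 + b1*x, s^2)."""
    b0, b1, s = theta
    e = y - b0 - b1 * x
    return np.array([e / s**2, x * e / s**2, e**2 / s**3 - 1.0 / s])

def m_n(sims, theta):
    """Equation (11): mean auxiliary score over simulated (y, x) pairs."""
    return np.mean([score_f(y, x, theta) for y, x in sims], axis=0)
```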
There are instances where the integral can be computed to a given accuracy at less cost by quadrature. Quadrature rules have the generic form

$$\text{Case 1:}\quad m_n(\rho,\tilde\theta_n) = \frac{1}{n}\sum_{t=1}^n \sum_{\tau=1}^N (\partial/\partial\theta)\ln f(\hat y_{t\tau}|x_t,\tilde\theta_n)\,W(\hat y_{t\tau}|x_t,\rho), \qquad (13)$$

$$\text{Case 2:}\quad m_n(\rho,\tilde\theta_n) = \sum_{\tau=1}^N (\partial/\partial\theta)\ln f(\hat y_\tau|\hat x_\tau,\tilde\theta_n)\,W(\hat y_\tau,\hat x_\tau|\rho), \qquad (14)$$

where $W(\hat y_\tau,\hat x_\tau|\rho)$ and $(\hat y_\tau,\hat x_\tau)$ are the weights and the abscissae implied by the quadrature rule. Of course, $N$ is dramatically smaller for quadrature rules than for Monte Carlo integration. Quadrature for Case 3 will be at too high a dimension to be practical in most applications.
As to statistical theory, Cases 1 and 2 are special cases of Case 3, so that throughout the rest of the discussion we can discuss Case 3 and specialize to Cases 1 and 2, as required.

The randomness in $m_n(\rho,\tilde\theta_n)$ is solely due to the random fluctuation of the quasimaximum likelihood estimator $\tilde\theta_n$. Under the regularity conditions imposed earlier, there is a sequence $\{\theta_n^0\}$ such that $m_n(\rho^0,\theta_n^0) = 0$, $\lim_{n\to\infty}(\tilde\theta_n - \theta_n^0) = 0$ almost surely, and

$$\sqrt{n}\,(\tilde\theta_n - \theta_n^0) \approx N\{0,\,(\mathcal{J}_n^0)^{-1}(\mathcal{I}_n^0)(\mathcal{J}_n^0)^{-1}\}, \qquad (15)$$

where $\mathcal{J}_n^0 = (\partial/\partial\theta')\,m_n(\rho^0,\theta_n^0)$ and

$$\mathcal{I}_n^0 = \operatorname{Var}\!\left[\frac{1}{\sqrt{n}}\sum_{t=1}^n (\partial/\partial\theta)\ln f_t(y_t|x_t,\theta_n^0)\right] \qquad (16)$$

(Gallant, 1987, Ch. 7, Theorem 6). Note that $m_n(\rho,\theta)$ and $\mathcal{J}_n^0$ are not random quantities because we have assumed that either quadrature has been employed to compute $m_n(\rho,\theta)$ or that $N$ is as large as necessary to make the Monte Carlo average essentially the same as the expected value. Using Taylor's theorem, to first order,

$$m_n(\rho^0,\tilde\theta_n) = m_n(\rho^0,\theta_n^0) + \mathcal{J}_n^0\,(\tilde\theta_n - \theta_n^0). \qquad (17)$$

This implies that

$$\sqrt{n}\,m_n(\rho^0,\tilde\theta_n) \approx N(0,\,\mathcal{I}_n^0). \qquad (18)$$

Thus, given an estimator $\tilde{\mathcal{I}}_n$ of $\mathcal{I}_n^0$ that is consistent in the sense that $\lim_{n\to\infty}(\tilde{\mathcal{I}}_n - \mathcal{I}_n^0) = 0$ almost surely, the GMM estimator with an efficient weighting matrix is

$$\hat\rho_n = \operatorname*{argmin}_{\rho\in R}\; m_n'(\rho,\tilde\theta_n)\,(\tilde{\mathcal{I}}_n)^{-1}\,m_n(\rho,\tilde\theta_n). \qquad (19)$$

The computations necessary to estimate $\mathcal{I}_n^0$ depend on how well one thinks that the score generator approximates the true data generating process. If one is confident that the score generator is a good statistical approximation to the data generating process, then the estimator

$$\tilde{\mathcal{I}}_n = \frac{1}{n}\sum_{t=1}^n \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]\left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]' \qquad (20)$$

can be used. This estimator can also be used with Gaussian QMLE scores if the conditional mean and variance functions are correctly specified (Bollerslev and Wooldridge, 1992). A sufficient (but not necessary) condition is Assumption 2.
A weaker assumption that facilitates estimation of $\mathcal{I}_n^0$ is the following.

Assumption 1. There is a $\theta_n^0$ such that

$$\int\cdots\int (\partial/\partial\theta)\ln f_t(y_t|x_t,\theta_n^0)\,\prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho^0)\,dy_\tau\;p_1(x_1|\rho^0)\,dx_1 = 0 \qquad (21)$$

for every $t \le n$.



Case 2 will always satisfy Assumption 1 because of stationarity and time invariance. Thus, it is an assumption that only affects Cases 1 and 3. For Case 1, the preceding estimator,

$$\tilde{\mathcal{I}}_n = \frac{1}{n}\sum_{t=1}^n \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]\left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]', \qquad (22)$$

retains its consistency under the weaker Assumption 1.


For Cases 2 and 3, the following estimator is consistent under Assumption 1:

$$\tilde{\mathcal{I}}_n = \sum_{\tau=-[n^{1/5}]}^{[n^{1/5}]} w\!\left(\frac{\tau}{[n^{1/5}]}\right)\tilde{\mathcal{I}}_{n\tau}, \qquad (23)$$

where

$$w(x) = \begin{cases} 1 - 6|x|^2 + 6|x|^3 & \text{if } 0 \le |x| \le \tfrac12, \\ 2(1 - |x|)^3 & \text{if } \tfrac12 \le |x| \le 1, \end{cases} \qquad (24)$$

and

$$\tilde{\mathcal{I}}_{n\tau} = \begin{cases} \dfrac{1}{n}\displaystyle\sum_{t=1+\tau}^n \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]\left[(\partial/\partial\theta)\ln f_{t-\tau}(y_{t-\tau}|x_{t-\tau},\tilde\theta_n)\right]' & \text{if } \tau \ge 0, \\ (\tilde{\mathcal{I}}_{n,-\tau})' & \text{if } \tau < 0 \end{cases} \qquad (25)$$

(Gallant, 1987, Ch. 7, Theorem 5). See Andrews (1991) for alternative suggestions as to appropriate weights and rates one might use instead of $w(x)$ and $n^{1/5}$. The Parzen weights suggested above guarantee the positive definiteness of $\tilde{\mathcal{I}}_n$, which is essential. Weights that do not guarantee positive definiteness cannot be used.
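A sketch of (23)-(25) in Python, taking as input the $n \times \dim(\theta)$ matrix of score vectors $(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)$; the lag length $[n^{1/5}]$ and the Parzen weight function follow the text, while the function names are ours.

```python
import numpy as np

def parzen(x):
    """The weight function w(x) of equation (24)."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax**2 + 6.0 * ax**3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax)**3
    return 0.0

def I_tilde(scores):
    """Equations (23)-(25): positive definite weighted covariance estimate."""
    n, d = scores.shape
    L = max(1, int(n ** 0.2))                    # lag length [n^{1/5}]
    I = np.zeros((d, d))
    for tau in range(-L, L + 1):
        k = abs(tau)
        S = scores[k:].T @ scores[:n - k] / n    # lag-k autocovariance of scores
        I += parzen(tau / L) * (S if tau >= 0 else S.T)
    return I
```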
If one is unwilling to accept Assumption 1, then the estimator $\tilde{\mathcal{I}}_n$ is modified as follows. First, compute the initial estimator

$$\rho_n^* = \operatorname*{argmin}_{\rho\in R}\; m_n'(\rho,\tilde\theta_n)\,m_n(\rho,\tilde\theta_n). \qquad (26)$$

Compute

$$\hat\mu_t = \int\cdots\int (\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\,\prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho_n^*)\,dy_\tau\;p_1(x_1|\rho_n^*)\,dx_1, \qquad (27)$$

using the integration methods already described. For Case 1, use the estimator

$$\tilde{\mathcal{I}}_n = \frac{1}{n}\sum_{t=1}^n \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n) - \hat\mu_t\right]\left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n) - \hat\mu_t\right]'. \qquad (28)$$

For Case 3, use the formula

$$\tilde{\mathcal{I}}_n = \sum_{\tau=-[n^{1/5}]}^{[n^{1/5}]} w\!\left(\frac{\tau}{[n^{1/5}]}\right)\tilde{\mathcal{I}}_{n\tau}, \qquad (29)$$

with $\tilde{\mathcal{I}}_{n\tau}$ above modified to read

$$\tilde{\mathcal{I}}_{n\tau} = \frac{1}{n}\sum_{t=1+\tau}^n \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n) - \hat\mu_t\right]\left[(\partial/\partial\theta)\ln f_{t-\tau}(y_{t-\tau}|x_{t-\tau},\tilde\theta_n) - \hat\mu_{t-\tau}\right]' \qquad (30)$$

for $\tau \ge 0$. It is unlikely that this generality will be necessary in practice because the use of this formula means that one thinks that the score generator is a poor statistical approximation to the data generating process, which is unlikely to be true for the following reasons. The score generator is conceptually a reduced form model, not a structural model. Thus, it is ordinarily easy to modify it by adding a few parameters so that it fits the data well. The situation where one thinks the score generator is a poor approximation might arise in hypothesis testing, but even then the null hypothesis will usually imply either Assumption 1 or Assumption 2, and the generality is, again, unnecessary.
Theorem 1 gives the asymptotic distribution of $\hat\rho_n$.

THEOREM 1. For Case 1, let Assumptions 8-11 of Gallant (1987, Ch. 3) hold. For Cases 2 and 3, let Assumptions 8-11 of Gallant (1987, Ch. 7) hold. Then,

$$\lim_{n\to\infty} \hat\rho_n = \rho^0 \quad \text{a.s.}, \qquad (31)$$

$$\sqrt{n}\,(\hat\rho_n - \rho^0) \approx N\{0,\,[(M_n^0)'(\mathcal{I}_n^0)^{-1}(M_n^0)]^{-1}\}, \qquad (32)$$

$$\lim_{n\to\infty}(\hat M_n - M_n^0) = 0 \quad \text{a.s.} \qquad (33)$$

Proof. Apply Theorems 7 and 9 of Gallant (1987, Ch. 3) for Case 1 and Theorems 8 and 10 of Gallant (1987, Ch. 7) for Cases 2 and 3. Make these associations: $\lambda = \rho$, $\lambda^0 = \rho^0$, $\tilde m_n(\lambda) = m_n(\rho,\tilde\theta_n)$, $m_n^0(\lambda) = m_n(\rho,\theta_n^0)$, and $\mathcal{S}_n^0 = \operatorname{Var}[\sqrt{n}\,\tilde m_n(\lambda^0)] = \mathcal{I}_n^0$. $\blacksquare$
The identification condition

$$m_n(\rho,\theta_n^0) = 0 \;\Rightarrow\; \rho = \rho^0 \quad \text{for all } n \text{ larger than some } n^0 \qquad (34)$$

is among the regularity conditions of Theorem 1. The situation is analogous to verification of the order and rank conditions of simultaneous equations models. The order condition is that the dimension of $\theta$ must exceed the dimension of $\rho$. However, due to nonlinearity, analytic verification of the analog of the rank condition, which is that the equations $m_n(\rho,\theta_n^0) = 0$ do not have multiple solutions for $\rho \in R$, is difficult. See Gallant (1977) for discussion and examples. It is usually adequate to rely on the optimization program used to compute $\operatorname{argmin}_{\rho\in R} m_n'(\rho,\tilde\theta_n)(\tilde{\mathcal{I}}_n)^{-1}m_n(\rho,\tilde\theta_n)$ to indicate the flat spots on the surface $m_n'(\rho,\tilde\theta_n)(\tilde{\mathcal{I}}_n)^{-1}m_n(\rho,\tilde\theta_n)$ that suggest identification failure. For example, the parameters of the mixing process of a stochastic volatility model (see Section 4.2) require third- and fourth-order moment information for identification; using the score of a Gaussian vector autoregression will not provide this information. We have actually done this inadvertently by setting some tuning parameters erroneously in a computation and learned of the error from the behavior of the optimizer with respect to the parameters of the mixing process.
Direct use of Theorem 1 for setting confidence intervals on the elements of $\hat\rho_n$ or testing hypotheses with the Wald test requires computation of $M_n(\rho,\theta)$. This is probably easiest to do by saving the trial values of $\rho$ and $m_n(\rho,\tilde\theta_n)$ generated over the course of the optimization that computes $\operatorname{argmin}_{\rho\in R} m_n'(\rho,\tilde\theta_n)(\tilde{\mathcal{I}}_n)^{-1}m_n(\rho,\tilde\theta_n)$, fitting the local quadratic regressions $m_i = b_{0i} + b_i'(\rho - \hat\rho_n) + (\rho - \hat\rho_n)'B_i(\rho - \hat\rho_n)$ for $i = 1, 2, \ldots, \dim(\theta)$ to the elements of $m_n(\rho,\tilde\theta_n)$ at points near $\hat\rho_n$, and taking $\hat M_n$ to be the matrix with rows $b_i'$. Computation of $M_n(\rho,\theta)$ can be avoided by testing hypotheses using the criterion difference test statistic (Gallant, 1987, Ch. 7, Theorem 15) and setting confidence intervals by inverting it. Under Assumption 1, the condition $HVH' = H\mathcal{I}^{-1}H'$ of Gallant (1987, Ch. 7, Theorem 15) will be satisfied.
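The local quadratic regression device can be sketched as follows; here `rhos` and `ms` hold the saved trial values of $\rho$ and of $m_n(\rho,\tilde\theta_n)$, and all names are ours.

```python
import numpy as np
from itertools import combinations_with_replacement

def estimate_M(rhos, ms, rho_hat):
    """Fit m_i = b0_i + b_i'(rho - rho_hat) + quadratic terms for each
    moment element; the stacked linear coefficients b_i' form M_n."""
    d = rhos - rho_hat                       # deviations from the optimum
    quad = np.column_stack([d[:, i] * d[:, j]
                            for i, j in combinations_with_replacement(range(d.shape[1]), 2)])
    X = np.column_stack([np.ones(len(d)), d, quad])
    coef, *_ = np.linalg.lstsq(X, ms, rcond=None)
    return coef[1:1 + d.shape[1]].T          # dim(theta) x dim(rho) matrix
```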
It is important to note that we have not, as yet, made use of an assumption that the score generator $\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}_{\theta\in\Theta}$ contains the true model. That is, we have not yet imposed the following assumption.

Assumption 2. There is a $\theta^0$ such that $p_t(y_t|x_t,\rho^0) = f_t(y_t|x_t,\theta^0)$ for $t = 1, 2, \ldots$.

Because Assumption 2 implies that the score generator is a correctly specified model, it implies Assumption 1 and the following standard results from the theory of maximum likelihood estimation:

$$\int (\partial/\partial\theta)\ln f_t(y_t|x_t,\theta^0)\,p_t(y_t|x_t,\rho^0)\,dy_t = 0 \qquad (35)$$

for $t = 1, 2, \ldots$,

$$\int (\partial/\partial\theta)\ln f_1(x_1|\theta^0)\,p_1(x_1|\rho^0)\,dx_1 = 0, \qquad (36)$$

and

$$\int\cdots\int \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\theta^0)\right]\left[(\partial/\partial\theta)\ln f_s(y_s|x_s,\theta^0)\right]' \prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho^0)\,dy_\tau\;p_1(x_1|\rho^0)\,dx_1 = 0 \qquad (37)$$

when $t \ne s$. These results allow use of the estimator

$$\tilde{\mathcal{I}}_n = \frac{1}{n}\sum_{t=1}^n \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]\left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]' \qquad (38)$$

in Cases 1-3. Moreover,

$$\int\cdots\int \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\theta^0)\right]\left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\theta^0)\right]' \prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho^0)\,dy_\tau\;p_1(x_1|\rho^0)\,dx_1 = -\int\cdots\int (\partial^2/\partial\theta\,\partial\theta')\ln f_t(y_t|x_t,\theta^0)\,\prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho^0)\,dy_\tau\;p_1(x_1|\rho^0)\,dx_1 \qquad (39)$$

for $t = 1, 2, \ldots$, so that

$$\sqrt{n}\,(\tilde\theta_n - \theta^0) \approx N\{0,\,(\mathcal{I}_n^0)^{-1}\}. \qquad (40)$$
Now let us examine the consequences of the smoothly embedded assumption (see Definition 1, earlier).

Assumption 3. The model $\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}_{\rho\in R}$ is smoothly embedded within the score generator $\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}_{\theta\in\Theta}$.

Assumption 3 implies Assumption 2. Moreover, the consistency of $\hat\rho_n$ implies that $\hat\rho_n$ is tail equivalent (Gallant, 1987, p. 187) to a GMM estimator obtained by optimizing over the closure of $R^0$ instead of over $R$. Therefore, without loss of generality, we may assume that the twice continuously differentiable function $g$ given by Definition 1 is defined over $R$. Let $G(\rho) = (\partial/\partial\rho')g(\rho)$, $G^0 = G(\rho^0)$, and $\hat G = G(\hat\rho_n)$.

A consequence of Assumption 3 is that the minimum chi-square estimator

$$\hat\rho_{mcs} = \operatorname*{argmin}_{\rho\in R}\; [\tilde\theta_n - g(\rho)]'(\tilde{\mathcal{I}}_n)[\tilde\theta_n - g(\rho)] \qquad (41)$$

is as efficient as the maximum likelihood estimator for $\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}$. To see this, first note that

$$\sqrt{n}\,(\hat\rho_{mcs} - \rho^0) \approx N\{0,\,[(G^0)'(\mathcal{I}_n^0)(G^0)]^{-1}\}. \qquad (42)$$

Now, if $\hat\rho_{mle}$ denotes the maximum likelihood estimator for $\{f_1[x_1|g(\rho)], \{f_t[y_t|x_t,g(\rho)]\}_{t=1}^\infty\}$, then

$$\sqrt{n}\,(\hat\rho_{mle} - \rho^0) \approx N\{0,\,[(G^0)'(\mathcal{I}_n^0)(G^0)]^{-1}\} \qquad (43)$$

(Gallant, 1987, Ch. 7, Sect. 4). Because

$$f_t[y_t|x_t,g(\rho)] = p_t(y_t|x_t,\rho), \quad t = 1, 2, \ldots, \qquad (44)$$

$\hat\rho_{mle}$ is also the maximum likelihood estimator for the process $\{p_1(x_1|\rho), \{p_t(y_t|x_t,\rho)\}_{t=1}^\infty\}$.
If $g$ cannot be computed, the minimum chi-square estimator is not practical. However, $\mathcal{E}(\tilde\theta_n|\rho)$ can be computed by simulation, and the preceding remarks suggest that minimum chi-square with $\mathcal{E}(\tilde\theta_n|\rho)$ replacing $g(\rho)$ would be a practical, fully efficient estimator. See Gourieroux, Monfort, and Renault (1993) and Smith (1993) for examples. The difficulty with this approach is that the simulated minimum chi-square estimator is computationally inefficient relative to the GMM estimator proposed here because at each of the $N$ Monte Carlo repetitions in the expression $\hat{\mathcal{E}}(\tilde\theta_n|\rho) = (1/N)\sum_{\tau=1}^N \tilde\theta_{n\tau}$ an optimization to compute $\tilde\theta_{n\tau}$ is required. The GMM estimator requires only the one optimization to compute $\tilde\theta_n$ and avoids the $N$ extra optimizations required to compute $\hat{\mathcal{E}}(\tilde\theta_n|\rho)$. Moreover, one would actually have to invoke Assumption 2 or estimate $\mathcal{J}_n^0$ to follow this approach. See Gourieroux et al. (1993) for additional remarks on the relationships among various approaches.
We conclude this section by showing that $\hat\rho_n$ has the same asymptotic distribution as $\hat\rho_{mle}$.

THEOREM 2. Assumption 3 implies

$$\sqrt{n}\,(\hat\rho_n - \rho^0) \approx N\{0,\,[(G^0)'(\mathcal{I}_n^0)(G^0)]^{-1}\}. \qquad (45)$$

Proof. From the first-order conditions

$$0 = (\partial/\partial\rho)\left[m_n'(\hat\rho_n,\tilde\theta_n)(\tilde{\mathcal{I}}_n)^{-1}m_n(\hat\rho_n,\tilde\theta_n)\right] = 2\left[(\partial/\partial\rho')m_n(\hat\rho_n,\tilde\theta_n)\right]'(\tilde{\mathcal{I}}_n)^{-1}m_n(\hat\rho_n,\tilde\theta_n), \qquad (46)$$

we have, after a Taylor's expansion of $m_n(\hat\rho_n,\tilde\theta_n)$,

$$\left[(\hat M_n)'(\tilde{\mathcal{I}}_n)^{-1}(\bar M_n)\right]\sqrt{n}\,(\hat\rho_n - \rho^0) = -\left[(\hat M_n)'(\tilde{\mathcal{I}}_n)^{-1}\overline{(\partial/\partial\theta')m_n}\right]\sqrt{n}\,(\tilde\theta_n - \theta_n^0), \qquad (47)$$

where the overbars indicate that the rows of $M_n(\rho,\theta) = (\partial/\partial\rho')m_n(\rho,\theta)$ and of $(\partial/\partial\theta')m_n(\rho,\theta)$ have been evaluated at points on the line segment joining $(\hat\rho_n,\tilde\theta_n)$ to $(\rho^0,\theta_n^0)$. Recall that $\hat M_n$ and $M_n^0$ indicate evaluation of $M_n(\rho,\theta)$ at $(\hat\rho_n,\tilde\theta_n)$ and $(\rho^0,\theta_n^0)$, respectively. Now $\lim_{n\to\infty}(\hat M_n - M_n^0) = 0$, $\lim_{n\to\infty}(\bar M_n - M_n^0) = 0$, $\lim_{n\to\infty}[\overline{(\partial/\partial\theta')m_n} - \mathcal{J}_n^0] = 0$, and $\lim_{n\to\infty}(\tilde{\mathcal{I}}_n - \mathcal{I}_n^0) = 0$ a.s. Furthermore,

$$\sqrt{n}\,(\tilde\theta_n - \theta_n^0) \approx N\{0,\,(\mathcal{I}_n^0)^{-1}\}. \qquad (48)$$

Because Assumption 3 implies the information equality $\mathcal{J}_n^0 = -\mathcal{I}_n^0$, the preceding equation can therefore be rewritten as

$$\left[(M_n^0)'(\mathcal{I}_n^0)^{-1}(M_n^0)\right]\sqrt{n}\,(\hat\rho_n - \rho^0) = (M_n^0)'\sqrt{n}\,(\tilde\theta_n - \theta_n^0) + o_p(1), \qquad (49)$$

which implies that

$$\sqrt{n}\,(\hat\rho_n - \rho^0) \approx N\{0,\,[(M_n^0)'(\mathcal{I}_n^0)^{-1}(M_n^0)]^{-1}\}. \qquad (50)$$

We complete the proof by showing that $M_n^0 = \mathcal{I}_n^0 G^0$:
$$M_n^0 = (\partial/\partial\rho')\,\frac{1}{n}\sum_{t=1}^n \int\cdots\int (\partial/\partial\theta)\ln f_t(y_t|x_t,\theta^0)\,\prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho)\,dy_\tau\;p_1(x_1|\rho)\,dx_1\,\Big|_{\rho=\rho^0}.$$

Differentiating under the integral sign, each term acquires a factor $(\partial/\partial\rho')\ln p_\tau(y_\tau|x_\tau,\rho)$ or $(\partial/\partial\rho')\ln p_1(x_1|\rho)$. By Assumption 3, $p_\tau(y_\tau|x_\tau,\rho) = f_\tau[y_\tau|x_\tau,g(\rho)]$, so that at $\rho = \rho^0$

$$(\partial/\partial\rho')\ln p_\tau(y_\tau|x_\tau,\rho) = \left[(\partial/\partial\theta')\ln f_\tau(y_\tau|x_\tau,\theta^0)\right]G^0.$$

The terms with $\tau \ne t$ vanish by (37), and the term involving $p_1(x_1|\rho)$ vanishes by (36), leaving

$$M_n^0 = \frac{1}{n}\sum_{t=1}^n \int\cdots\int \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\theta^0)\right]\left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\theta^0)\right]' \prod_{\tau=1}^n p_\tau(y_\tau|x_\tau,\rho^0)\,dy_\tau\;p_1(x_1|\rho^0)\,dx_1\; G^0 = \mathcal{I}_n^0 G^0. \qquad (51)\quad\blacksquare$$

3. GENERAL PURPOSE SCORE GENERATORS

As pointed out in Section 2, if a model $\{f_1(x_1|\theta), \{f_t(y_t|x_t,\theta)\}_{t=1}^\infty\}_{\theta\in\Theta}$ is known to accurately describe the distribution of the data $\{y_t\}_{t=1}^n$, then that model should be the score generator that defines $m_n(\rho,\tilde\theta_n)$ and $\hat\rho_n$. If not, we can suggest two general purpose score generators.

The first is the SNP score, which can be expected to closely approximate any nonlinear Markovian process. An example of its use in connection with the estimator $\hat\rho_n$ proposed here is that by Bansal, Gallant, Hussey, and Tauchen (1995), who fit a general equilibrium, two-country, monetary model using high-frequency financial market data. The second is the neural net score, which can be expected to closely approximate any cross-sectional nonlinear regression or any dynamic nonlinear autoregression, including deterministic chaos. An example of its use in connection with $\hat\rho_n$ is that by Ellner, Gallant, and Theiler (1995), who use data widely believed to exhibit chaotic dynamics to calibrate the parameters of the SEIR model, which is a model of epidemics often used in health economics. The cited applications contain descriptions of the SNP and neural net scores, respectively.
In terms of convenience, what one would like is for

$$\hat\mu_t = \int (\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\,p_t(y_t|x_t,\rho^0)\,dy_t \qquad (52)$$

to be small enough that $\hat\mu_t$ can be put to zero with little effect upon the accuracy of the computation of $\tilde{\mathcal{I}}_{n\tau}$ and small enough that

$$\tilde{\mathcal{I}}_{n\tau}, \quad \tau \ne 0, \qquad (53)$$

can be put to zero with little effect on the accuracy of the computation of $\tilde{\mathcal{I}}_n$. The estimator of $\mathcal{I}_n^0$ would then assume its simplest form:

$$\tilde{\mathcal{I}}_n = \frac{1}{n}\sum_{t=1}^n \left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]\left[(\partial/\partial\theta)\ln f_t(y_t|x_t,\tilde\theta_n)\right]'. \qquad (54)$$
Both SNP and neural nets are series expansions that have the property that (52) can be made arbitrarily small by using enough terms in the expansion (Gallant and Nychka, 1987; Gallant and White, 1992). Hence, $\hat\mu_t$ and (53) can be made arbitrarily small by using enough terms. The appropriate number of terms relative to the sample size is suggested by the results of Fenton and Gallant (1996) and McCaffrey and Gallant (1994). However, there is as yet no general theory giving the rate at which terms can be added so as to retain $\sqrt{n}$-asymptotic normality, so one must guard against taking too many terms and then claiming that standard asymptotics apply.

4. APPLICATIONS

We discuss three classes of applications of the estimator developed in the previous sections. In the setup for each application, it is relatively simple to generate simulated realizations from the structural model while computation of the likelihood is infeasible. Hence, simulation and moment matching are appropriate estimation strategies.

4.1. Consumption and Asset Returns in a Production Economy

Consider the following version of the Brock-Mirman one-sector setup. The representative agent's problem is

$$\max\; \mathcal{E}_t\left\{\frac{1}{1-\gamma}\sum_{s=0}^\infty \beta^s\, c_{t+s}^{1-\gamma}\, v_{2,t+s}\right\} \qquad (55)$$

subject to

$$c_t + k_{t+1} - k_t \le A k_t^\alpha v_{1t}, \qquad (56)$$

where $c_t$ is consumption at time $t$ and $k_t$ the capital stock at the beginning of period $t$ (i.e., inherited from period $t-1$); $v_{1t}$ and $v_{2t}$ are strictly positive shocks to technology and preferences; $\mathcal{E}_t(\cdot)$ is shorthand for the conditional expectation given all variables in the model dated time $t$ and earlier; and the parameters satisfy $0 < \beta < 1$, $\gamma > 0$, $A > 0$, and $0 < \alpha < 1$. The agent's choice variables at time $t$ are $c_t$ and $k_{t+1}$. The stochastic process $v_t = (v_{1t}, v_{2t})'$ is strictly stationary and Markovian of order $r$, with conditional density $\phi(v_{t+1}|v_t^*,\delta)$, where $v_t^* = (v_t', \ldots, v_{t-r+1}')'$ and $\delta$ is a parameter vector.

The Euler equation for this problem is

$$c_t^{-\gamma} v_{2t} = \mathcal{E}_t\left\{\beta c_{t+1}^{-\gamma} v_{2,t+1}\left(1 + \alpha A k_{t+1}^{\alpha-1} v_{1,t+1}\right)\right\}. \qquad (57)$$

The solution of the optimization problem is

$$k_{t+1} = \psi_k(k_t, v_t^*), \qquad (58)$$

$$c_t = \psi_c(k_t, v_t^*), \qquad (59)$$

where $\psi_k$ and $\psi_c$ are the policy functions.

There is no known closed-form solution for the policy functions, though the policy functions can be well approximated using one of the newly developed methods for solving nonlinear rational expectations models. The 1990 symposium in the Journal of Business and Economic Statistics (Tauchen, 1990) surveys many of the extant methods. For this model and the proposed application, the method of Coleman (1990), which uses quadrature for numerical integration and computes the policy function over an extremely fine grid, is probably the most accurate and numerically efficient.

Using Coleman's method to evaluate the policy functions, one can then easily simulate from this model. Given an initial value $k^0$ for the capital stock and a simulated realization $\{\hat v_\tau\}$ generated from $\phi(v|v^*,\delta)$, one generates simulated $\{\hat k_\tau, \hat c_\tau\}$ by recursively feeding the $\hat v_\tau$ and $\hat k_\tau$ through the policy functions. Good practice is to allow the iterations to run for a long while in order to let the effects of the transients wear off. A simulated realization of length $N$, $\{\hat k_\tau, \hat c_\tau\}_{\tau=1}^N$, would be the last $N$ values of the iterations.
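A sketch of this simulation step in Python; `psi_k`, `psi_c`, and `draw_v` are placeholders for the numerically approximated policy functions and a sampler for $\phi(\cdot|v^*,\delta)$, none of which are specified here.

```python
import numpy as np

def simulate_economy(psi_k, psi_c, draw_v, k0, v0, N, burn_in=500, seed=0):
    """Feed shocks through the policy functions; keep the last N values."""
    rng = np.random.default_rng(seed)
    k, v = k0, v0
    path = []
    for _ in range(burn_in + N):
        c = psi_c(k, v)                  # consumption policy, equation (59)
        k = psi_k(k, v)                  # capital policy, equation (58)
        v = draw_v(v, rng)               # next shock from phi(. | v*, delta)
        path.append((k, c))
    return path[burn_in:]                # transients discarded
```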
Strategies to implement empirically the corresponding competitive equilibrium of this model differ depending on which variables are used to confront the model with data, that is, which variables enter the score generator. For example, with good data on both consumption and capital, the researcher could use $(c_t, k_t)'$. However, if capital is poorly measured but output well measured, then it would be better to use $(c_t, q_t)'$, where $q_t$ is total output, which in the simulation would be computed as $\hat q_\tau = A\hat k_\tau^\alpha \hat v_{1\tau}$. Neither of these strategies, though, makes use of price data.
A strategy that incorporates price information is to use $c_t$ along with the returns on a pure discount risk-free bond, $r_{bt}$, and a stock, $r_{st}$. Asset returns are determined via asset pricing calculations, carried out as follows. (It turns out to be a bit easier to think of the equations defining returns between $t$ and $t+1$.) The bond return, $r_{b,t+1}$, is the solution to

$$c_t^{-\gamma} v_{2t} = \mathcal{E}_t\!\left(\beta c_{t+1}^{-\gamma} v_{2,t+1}\right)\left(1 + r_{b,t+1}\right), \qquad (60)$$

and $r_{b,t+1}$ is known to agents at time $t$. For the stock return, the dividend process is $d_{st} = A k_t^\alpha v_{1t} - r_{bt} k_t$, and the stock price process $\{p_{st}\}$ is the solution to the expectational equation

$$p_{st}\, c_t^{-\gamma} v_{2t} = \mathcal{E}_t\!\left[\beta c_{t+1}^{-\gamma} v_{2,t+1}\left(p_{s,t+1} + d_{s,t+1}\right)\right]. \qquad (61)$$

The stock return between $t$ and $t+1$ is $r_{s,t+1} = (p_{s,t+1} + d_{s,t+1})/p_{st}$. Solving for the asset returns entails additional computation that could potentially be as numerically intensive as approximating the policy functions.
This formulation presumes that, in the competitive equilibrium, the firm uses 100% debt financing to rent from the household the capital stock $k_{t+1}$ for one period at interest rate $r_{b,t+1}$. (Both $r_{b,t+1}$ and $k_{t+1}$ are determined and known at time $t$.) The firm distributes to the household as the dividend $d_{s,t+1} = A k_{t+1}^\alpha v_{1,t+1} - r_{b,t+1} k_{t+1}$, which is the firm's cash flow in period $t+1$, that is, the proceeds after paying off the bondholder. Other conceptualizations are possible, and, in particular, the stock price and returns process could be different if the firm retains earnings or uses different forms of debt financing.
One typically does not observe a risk-free real bond return. Common practice in empirical asset pricing is to use the consumption series along with either the real ex post return on the stock (deflated using a price index) or the excess of the stock return over the bond return, $r_{et} = r_{st} - r_{bt}$, from which inflation cancels out. This practice presumes that the observed data come from a monetary economy with exactly the same real side as above and a nominal side characterized by a binding cash-in-advance constraint, which implies unitary monetary velocity.
We show how to implement the estimator on data consisting of consumption and the excess stock return. This is done for illustrative purposes. The proposal offers an alternative to the standard SMM strategy of selecting out a set of low-order moments, as in Gennotte and Marsh (1993), for estimation of an asset pricing model. In actual practice, one would want to employ more sophisticated versions of the model with time nonseparabilities in consumption and production and also include additional latent taste and/or technology shocks when additional asset returns are observed. Common practice in stochastic modeling is to include sufficient shocks or measurement errors to preclude the predicted distribution of the data from being concentrated on a lower dimensional manifold, which is normally counterfactual and would cause the model to be dismissed out of hand.
Put $y_t = (r_{et}, c_t)'$. Let $\rho = (\gamma, A, \alpha, \delta')'$ denote the vector of structural parameters. The numerical solution of the model provides a means to simulate data given a value of $\rho$.

Experience with financial data suggests that a reasonable choice for the score generator is the sequence of densities defined by an ARCH (Engle, 1982) or GARCH (Engle and Bollerslev, 1986) process. For ease of exposition, we show the ARCH case. Consider the multivariate ARCH model

$$y_t = b_0 + \sum_{j=1}^{L_1} B_j y_{t-j} + u_t, \qquad (62)$$

$$\operatorname{vech}(\Sigma_t) = c_0 + \sum_{j=1}^{L_2} C_j \operatorname{vech}(u_{t-j} \otimes u_{t-j}), \qquad (63)$$

where $u_t \sim N(0, \Sigma_t)$. Let $\operatorname{pdf}(y_t|x_t,\psi)$ denote the implied conditional density of $\{y_t\}$ under the ARCH model, where $x_t = (y_{t-L}', \ldots, y_{t-1}')'$, $L = L_1 + L_2$, and

$$\psi = \left(b_0',\, \operatorname{vec}([B_1\; B_2 \cdots B_{L_1}])',\, c_0',\, \operatorname{vec}([C_1\; C_2 \cdots C_{L_2}])'\right)'. \qquad (64)$$

Common practice in ARCH modeling is to impose a priori restrictions, such as diagonality or factor restrictions, so as to constrain $\psi = \psi(\theta)$ to depend on a lower dimensional parameter, $\theta$. Let $f(y_t|x_t,\theta) = \operatorname{pdf}(y_t|x_t,\psi[\theta])$ denote the ARCH conditional density under the restrictions, which we take as the score generator.
Given the observed data set $\{y_t\}_{t=1}^n$, the first step in the estimation is to apply quasimaximum likelihood to the ARCH model:

$$\tilde\theta_n = \operatorname*{argmax}_{\theta\in\Theta}\; \frac{1}{n}\sum_{t=1}^n \ln f_t(y_t|x_t,\theta). \qquad (65)$$

The second step is to estimate $\rho$ by

$$\hat\rho = \operatorname*{argmin}_{\rho\in R}\; m_n'(\rho,\tilde\theta_n)(\tilde{\mathcal{I}}_n)^{-1} m_n(\rho,\tilde\theta_n), \qquad (66)$$

where $m_n(\rho,\tilde\theta_n) = (1/N)\sum_{\tau=1}^N (\partial/\partial\theta)\ln f(\hat y_\tau|\hat x_\tau,\tilde\theta_n)$, $\{\hat y_\tau\}$ is a simulated realization from the model given $\rho$, and $\hat x_\tau = (\hat y_{\tau-L}', \ldots, \hat y_{\tau-1}')'$.
The relevant asymptotic distribution theory is that of Case 2 in Section 2. The order condition for identification is that the length of $\theta$ be at least as long as the length of $\rho$. The analog of the rank condition is given in the discussion following Theorem 1. It is exceedingly difficult to determine analytically whether the ARCH scores suffice to identify the asset pricing model, which, as noted earlier, is typical of nonlinear statistics. In practice, near flat spots in the sample objective function would be a strong indicator of failure of identification. In such a case, further expansion of the score generator, such as relaxing conditional normality or using a non-Markov (GARCH) model, could bring in additional score components to achieve identification. The mechanics of implementing a GARCH-type score generator are similar to those just described for ARCH, though the notation is more cumbersome. In either case, use of this estimator provides a means to bring to bear on the task of selecting moments the knowledge that ARCH-GARCH models fit returns data well.
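To fix ideas, here is a compact end-to-end sketch of the two-step procedure (65)-(66) with a univariate Gaussian ARCH(1) score generator. The structural simulator is a stand-in (a simple AR(1)), the finite-difference score and the identity weighting matrix are simplifications for exposition, and all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, y):
    """Negative Gaussian ARCH(1) log likelihood, theta = (b0, b1, c0, c1)."""
    b0, b1, c0, c1 = theta
    e = y[1:] - b0 - b1 * y[:-1]
    h = c0 + c1 * np.concatenate(([np.var(y)], e[:-1] ** 2))  # h_1 seeded by var(y)
    h = np.maximum(h, 1e-12)                 # guard against nonpositive variances
    return 0.5 * np.sum(np.log(h) + e**2 / h)

def mean_score(theta, y, eps=1e-5):
    """Numerical (d/dtheta) of the mean log likelihood at theta."""
    g = np.zeros(len(theta))
    for i in range(len(theta)):
        d = np.zeros(len(theta)); d[i] = eps
        g[i] = (neg_loglik(theta - d, y) - neg_loglik(theta + d, y)) / (2 * eps * (len(y) - 1))
    return g

def simulate_model(rho, N=50_000, seed=0):
    """Stand-in structural simulator; a fixed seed gives common random numbers."""
    rng = np.random.default_rng(seed)
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = rho[0] * y[t - 1] + rho[1] * rng.standard_normal()
    return y

def fit(y_obs):
    # Step 1, equation (65): QMLE of the ARCH(1) score generator.
    theta = minimize(neg_loglik, x0=np.array([0.0, 0.1, np.var(y_obs), 0.1]),
                     args=(y_obs,), method="Nelder-Mead").x
    # Step 2, equation (66): choose rho so the mean score on a long
    # simulation is near zero (identity weight used for simplicity).
    obj = lambda rho: np.sum(mean_score(theta, simulate_model(rho)) ** 2)
    return minimize(obj, x0=np.array([0.5, 1.0]), method="Nelder-Mead").x
```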
Another possible score generator would be the SNP model of Gallant and Tauchen (1989, 1992); this strategy is employed by Bansal et al. (1995) for estimation of a model of weekly currency market data. Use of an SNP model would give the exercise a more nonparametric slant, as the choice of dimension of the score generator model would be data-determined. Either choice would ensure efficiency against a class of models known to capture much of the first and second moment dynamics of asset prices and other macro aggregates.

4.2. Stochastic Volatility


Consider the stochastic volatility model

$$y_t - \mu_y = c(y_{t-1} - \mu_y) + \exp(w_t)\, r_y z_{1t},$$

$$w_t = a w_{t-1} + r_w z_{2t}.$$

The first equation is the mean equation with parameters $\mu_y$, $c$, and $r_y$; the second is the volatility equation with parameters $a$ and $r_w$. $\{y_t\}$ is an observed financial returns process and $\{w_t\}$ is an unobserved volatility process. In the basic specification, $z_{1t}$ and $z_{2t}$ are mutually independent iid $N(0,1)$ shocks. The model can be generalized in an obvious way to accommodate longer lag lengths in either equation. Versions of this model have been examined by Clark (1973), Melino and Turnbull (1990), Harvey, Ruiz, and Shephard (1993), Jacquier, Polson, and Rossi (1994), and many others. The appeal of the model is that it provides a simple specification for speculative price movements that accounts, in qualitative terms, for broad general features of data from financial markets such as leptokurtosis and persistent volatility. The complicating factor for estimation is that the likelihood function is not readily available in closed form, which motivates consideration of other approaches.
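Although the likelihood is intractable, simulation from the model is immediate, which is what the estimator exploits; a minimal sketch, with our own parameter ordering:

```python
import numpy as np

def simulate_sv(mu_y, c, r_y, a, r_w, N, burn_in=1000, seed=0):
    """Simulate {y_t} from the stochastic volatility model above."""
    rng = np.random.default_rng(seed)
    y, w = mu_y, 0.0
    out = []
    for _ in range(burn_in + N):
        w = a * w + r_w * rng.standard_normal()              # volatility equation
        y = mu_y + c * (y - mu_y) + np.exp(w) * r_y * rng.standard_normal()  # mean equation
        out.append(y)
    return np.array(out[burn_in:])                           # drop transients
```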
Gallant, Hsieh, and Tauchen (1994) employ the estimator of this paper to estimate the stochastic volatility model on a long time series comprised of 16,127 daily observations $\{y_t\}_{t=1}^{16127}$ on adjusted movements in the Standard and Poor's Composite Index, 1928-1987. The score generator is an SNP model, as described in Section 3. The specification search for appropriate auxiliary models for $\{y_t\}_{t=1}^{16127}$ leads to two scores: a "Nonparametric ARCH Score," when errors are constrained to be homogeneous, and a "Nonlinear Nonparametric Score," when errors are allowed to be conditionally heterogeneous. The Nonparametric ARCH Score contains indicators for both deviations from conditional normality and ARCH. Together, these scores suffice to identify the stochastic volatility model; indeed, the stochastic volatility model places overidentifying restrictions across these scores. The Nonlinear Nonparametric Score contains additional indicators for conditional heterogeneity, most importantly the leverage-type effect of Nelson (1991), which is a form of dynamic asymmetry. These additional indicators identify dynamic asymmetries like those suggested by Harvey and Shephard (1993), which the Nonparametric ARCH Score does not identify. When fitted to either of these two scores, the standard stochastic volatility model fails to approximate the distribution of the data adequately; it is overwhelmingly rejected on the chi-square goodness-of-fit tests. After altering the distribution of $z_{1t}$ to accommodate thickness in both tails along with left skewness and generalizing the volatility equation to include long memory (Harvey, 1993), the stochastic volatility model can match the moments defined by the simpler Nonparametric ARCH Score, but not those defined by the Nonlinear Nonparametric Score. Introducing cross-correlation between $z_{1,t-1}$ and $z_{2t}$ as in Harvey and Shephard (1993) improves the fit to the Nonlinear Nonparametric Score substantially, but still the stochastic volatility model cannot fit that score. Overall, Gallant et al. (1994) find the estimation provides a computationally tractable means to assess the relative plausibility of a wide class of alternative specifications of the stochastic volatility model. They show how to use the score vector of a rejected model to elucidate useful diagnostic information.
There are other ongoing applications of the estimator in the context of stochastic volatility. Engle (1994) employs it to estimate a continuous time stochastic volatility model, with the score generator being a GARCH model fitted to the discrete time data. Ghysels and Jasiak (1994) use it to estimate a continuous time model of stock returns and volume subject to time deformation like that of Clark (1973) and Tauchen and Pitts (1983). Their score generator is an SNP model very similar to that of Gallant, Rossi, and Tauchen (1992) fitted to the discrete time returns and volume data.

4.3. Empirical Modeling of Auction Data

Auctions are commonly used to sell assets. Game theoretic models of auctions provide a detailed theory of the mapping from the disparate values that bidders place on the asset to the final outcome (the winner and the sales price). The predictions of this theory depend strongly on the assumptions regarding the characteristics of the auction and the bidders. Generally, the specific rules of the auction along with the information structure, the attitudes of the bidders toward risk, and the bidders' strategic behavior all matter a great deal in determining the final outcome (Milgrom, 1986).
Empirical implementation of game theoretic models of auctions lags well behind the theory. The extreme nonlinearities and numerical complexity of auction models present substantial obstacles to direct implementation. Two recent papers, by Paarsch (1991) and Laffont, Ossard, and Vuong (1991), make substantial progress, however. In both papers, the task is to estimate the parameters of the distribution of values across bidders. Paarsch develops a framework based on standard maximum likelihood. His approach can handle a variety of informational environments but is restricted to a relatively narrow set of parametric models for the valuation distribution, essentially the Pareto and Weibull. Laffont et al. use a simulation approach, and they can thereby handle a much broader class of valuation distributions. However, their approach imposes only the predictions of the theory regarding first moments and ignores higher order structure, which can cause problems of inefficiency and identification.
The method set forth in Section 2 imposes all restrictions and generates an
efficient estimate of the valuation distribution. In what follows, we illustrate
how one would implement the method for some of the simpler models of
auctions. A full empirical study would go much further and, in particular,
would relax our strong assumptions and consider other environments known
to be theoretically important.
We first provide a short overview of some of the simplest auction models
and then proceed to the econometrics.

Two auction models under independent private valuations. An item, such as a tract of land or stand of timber, is to be sold at auction. The item will be sold so long as a selling price at least as large as a reservation price $r_0 > 0$ is realized; otherwise, it is left unsold.

There are two commonly used auction designs. In an oral ascending auction, the selling value of the item starts at $r_0$ and then increases. Bidders drop out as the selling value rises until one bidder remains, who pays the selling value at which the last of the other bidders dropped out. In a sealed bid first price auction, all bids are collected simultaneously. The object is sold to the highest bidder, who pays his bid so long as it exceeds the reservation price.
The independent private value paradigm is a set of assumptions regarding the characteristics of bidders; the paradigm is applied to either type of auction. In this paradigm, each of $B$ bidders is assumed to have a private valuation, $v_i$, $i = 1, 2, \ldots, B$, for the item to be sold. Each bidder knows his or her own private valuation but does not know the valuations of the other bidders. The bidders act as if the $B$ valuations are i.i.d. drawings from a common valuation distribution $H(v|q,\rho)$, with density $h(v|q,\rho)$, where $q$ is a vector of covariates defining characteristics of the item to be sold and $\rho$ is a parameter vector. Each bidder knows $q$, $\rho$, the functional form of $H(v|q,\rho)$, and the reservation price, $r_0$. Also, each bidder is assumed to be risk-neutral, and the equilibrium concept is the symmetric Bayesian Nash equilibrium.
For the oral ascending auction, the winning bid, $y$, is

$$y = \max[v_{(B-1:B)},\, r_0]\, I(v_{(B:B)} \ge r_0) \quad \text{if } B \ge 2, \qquad (67)$$

$$y = r_0\, I(v_1 \ge r_0) \quad \text{if } B = 1, \qquad (68)$$

where $v_{(1:B)} \le \cdots \le v_{(B:B)}$ are the order statistics of $v_1, \ldots, v_B$, and $I(\cdot)$ is the zero-one indicator function. On the event $v_{(B:B)} < r_0$, the winning bid is defined as zero and the item is unsold.

Let $p_{oa}(y|r_0,B,q,\rho)$, or simply $p_{oa}(y|x,\rho)$ with $x = (r_0,B,q)$, denote the conditional probability density of the winning bid. Below, we write either $p_{oa}(y|r_0,B,q,\rho)$ or $p_{oa}(y|x,\rho)$, depending on whether or not we wish to emphasize dependence on each of the different components of $x$. In general, $p_{oa}(y|x,\rho)$ is an ordinary density on the region $y > r_0$, so long as $h(v|q,\rho)$ is smooth, whereas it has atoms at $y = 0$ and $y = r_0$. In certain circumstances, for example, when $h(v|q,\rho)$ is Pareto or Weibull as in Paarsch (1991), $p_{oa}(y|x,\rho)$ has a manageable closed-form expression. In other circumstances, for example, when $h(v|q,\rho)$ is lognormal as in Laffont et al. (1991), $p_{oa}(y|x,\rho)$ admits no tractable expression. However, so long as it is easy to simulate from $h(v|q,\rho)$, it is easy to simulate from $p_{oa}(y|x,\rho)$.

For the sealed bid first price auction, the winning bid is

$$y = E[\max(v_{(B-1:B)},\, r_0)\,|\,v_{(B:B)}]\, I(v_{(B:B)} \ge r_0) \quad \text{if } B \ge 2, \qquad (69)$$

$$y = r_0\, I(v_1 \ge r_0) \quad \text{if } B = 1. \qquad (70)$$

Thus, when there are two or more bidders and $v_{(B:B)} \ge r_0$, the winning bid follows the distribution of the conditional expectation of $\max(v_{(B-1:B)}, r_0)$ given $v_{(B:B)}$. Let $p_{sb}(y|r_0,B,q,\rho)$, or $p_{sb}(y|x,\rho)$, denote the implied conditional density of the winning bid in the sealed bid case.

Generally, $p_{sb}(y|x,\rho)$ is less manageable in practice than is $p_{oa}(y|x,\rho)$. Generation of a simulated draw from $p_{sb}(y|x,\rho)$ entails either numerical integration of the cumulative distribution of the valuation distribution or a double-nested set of simulations.
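By contrast, a draw from $p_{oa}(y|x,\rho)$ under (67)-(68) takes only a few lines; the lognormal valuation density below is purely illustrative, as is the way $q$ enters its mean.

```python
import numpy as np

def draw_winning_bid(r0, B, q, rho, rng):
    """One draw of the winning bid under oral ascending rules (67)-(68)."""
    mu, sigma = rho                          # hypothetical lognormal h(v|q,rho)
    v = np.sort(rng.lognormal(mu + q, sigma, size=B))
    if v[-1] < r0:
        return 0.0                           # top valuation below r0: unsold
    if B == 1:
        return r0                            # lone bidder pays the reserve
    return max(v[-2], r0)                    # second-highest valuation or r0

rng = np.random.default_rng(0)
bids = [draw_winning_bid(r0=1.0, B=5, q=0.2, rho=(0.0, 0.5), rng=rng) for _ in range(1000)]
```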

A new estimation strategy for auction models. Suppose an econometrician observes $\{y_t, x_t\}_{t=1}^n$, where $y_t$ is the winning bid and $x_t = (r_{0t}, B_t, q_t)$ contains the reservation price, the number of bidders, and covariates for each of $n$ auctions. In what follows, we take the auctions to be oral ascending auctions and point out, where appropriate, how things differ for sealed bid auctions. The econometrician assumes that the same valuation density, $h(v|q,\rho^0)$, describes the bidder valuations for each auction. The analysis is conditional (the $x$'s are strictly exogenous); the econometrician assumes that $y_t$ and $y_s$ are statistically independent for $t \ne s$, conditional on the sequence $\{x_1, x_2, \ldots, x_n\}$. The task is to estimate the true underlying parameter vector, $\rho^0$.
One estimation strategy is straight maximum likelihood. Under special distributional assumptions on the valuation density such as Weibull or Pareto, the conditional density of the winning bid $p_{oa}(y|r_0,B,q,\rho)$ has a manageable closed form. Conventional maximum likelihood estimation can then be undertaken. This is the strategy of Paarsch (1991).
Laffont et al. (1991) developed a simulated nonlinear least squares (SNLLS) estimator that can handle a broader class of parent densities for the valuation distribution. Their approach is to apply nonlinear least squares:

$$\hat\rho = \operatorname*{argmin}_{\rho} \sum_{t=1}^n \left[y_t - \mu_{oa}(r_{0t}, B_t, q_t, \rho)\right]^2, \qquad (71)$$

where $\mu_{oa}(r_{0t}, B_t, q_t, \rho) = \int y\, p_{oa}(y|r_{0t}, B_t, q_t, \rho)\,dy$. In practice, $\mu_{oa}(r_{0t}, B_t, q_t, \rho)$ is approximated via Monte Carlo integration:

$$\hat\mu_{oa}(r_{0t}, B_t, q_t, \rho) = \frac{1}{N}\sum_{\tau=1}^N \max(\hat v_{\tau(B_t-1:B_t)},\, r_{0t})\, I(\hat v_{\tau(B_t:B_t)} \ge r_{0t}), \qquad (72)$$

where $\hat v_{\tau(B_t-1:B_t)}$ is the second highest order statistic of the $\tau$th independent simulated realization of $(\hat v_{\tau 1}, \ldots, \hat v_{\tau B_t})$ i.i.d. from $h(v|q_t,\rho)$. In their motivating examples and empirical applications, $v$ is conditionally lognormal with a mean that depends on $q_t$, and $\rho$ contains the parameters of this conditional lognormal distribution. The SNLLS estimator is nonlinear least squares with a heteroskedasticity-robust estimate of the asymptotic variance of $\hat\rho$ that accounts for conditional heteroskedasticity of

$$e_t = y_t - \mu_{oa}(r_{0t}, B_t, q_t, \rho^0). \qquad (73)$$
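The Monte Carlo approximation (72) then amounts to averaging such draws; a sketch, reusing the illustrative `draw_winning_bid` defined above:

```python
import numpy as np

def mu_oa_hat(r0, B, q, rho, N=10_000, seed=0):
    """Equation (72): Monte Carlo estimate of the mean winning bid."""
    rng = np.random.default_rng(seed)
    return np.mean([draw_winning_bid(r0, B, q, rho, rng) for _ in range(N)])
```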

Laffont et al. (1991) noted that revenue equivalence implies that the same formulation of the conditional mean function applies for a sealed bid auction. Revenue equivalence implies

$$\mu_{oa}(r_0, B, q, \rho) = \int y\, p_{oa}(y|r_0, B, q, \rho)\,dy = \int y\, p_{sb}(y|r_0, B, q, \rho)\,dy = \mu_{sb}(r_0, B, q, \rho) \qquad (74)$$

for all $r_0$, $B$, $q$, and $\rho$. Hence, one can evaluate the conditional mean function at the data, that is, compute $\mu_{sb}(r_{0t}, B_t, q_t, \rho)$, by simulating and averaging exactly as one does under oral ascending rules. The result can be a significant reduction in computational demands.
The SNLLS approach works off of the conditional first moment implications alone, though, and auction models place additional structure on the data. An auction model has second moment implications as well as first moment implications. In fact, it actually dictates the functional form of the conditional heteroskedasticity in the nonlinear regression equation, which suggests additional moment conditions. There are practical consequences of not incorporating additional restrictions beyond first moment information. Laffont et al. (1991) and Baldwin (1992) find it difficult to estimate the variance of the underlying parent lognormal using SNLLS. Bringing second moments into the estimation can be expected to alleviate this difficulty. In general, there are further implications beyond first and second moments as well; imposition of all implications of the model can be expected to sharpen even further the estimates of the parameter $\rho$.
Ideally, one wants to do this by doing maximum likelihood using either $p_{oa}(y|r_0,B,q,\rho)$ or $p_{sb}(y|r_0,B,q,\rho)$, as appropriate, to define the likelihood. The difficulty is that both densities are intractable, except in the special circumstances assumed by Paarsch (1991).

The approach outlined in Section 2 can come close to the maximum likelihood ideal. Our analysis pertains to the just-described situation where the likelihood is smooth but intractable; it does not cover cases where the likelihood is nondifferentiable in parameters. The consistency of the estimator $\hat\rho_n$ is not affected by nondifferentiability but asymptotic normality may be. See Hansen, Heaton, and Luttmer (1995, Appendix C) for a discussion of differentiability considerations with respect to GMM estimators.
The approach would be applied to the auction data as follows. $\tilde\theta_n$ is obtained as

$$\tilde\theta_n = \operatorname*{argmax}_{\theta\in\Theta}\; \frac{1}{n}\sum_{t=1}^n \ln f(y_t|x_t,\theta), \qquad (75)$$

where $f(y|x,\theta)$ is a score generator that gives a good approximation to the conditional distribution of $y$ given the exogenous variables. One choice for $f(y|x,\theta)$ is a truncated Hermite expansion, or SNP model, of Gallant and Tauchen (1992), which has been found in practice to be sufficiently flexible to approximate well a wide class of densities. Note that $f(y|x,\theta)$ does not have to smoothly embed $p_{oa}(y|x,\rho)$, although, if it does, then the estimator is as efficient as maximum likelihood.
Our estimator is GMM using the score function of the $\theta$ estimation to define the moment conditions:

$$\hat\rho = \operatorname*{argmin}_{\rho\in R}\; m_n'(\rho,\tilde\theta_n)(\tilde{\mathcal{I}}_n)^{-1} m_n(\rho,\tilde\theta_n), \qquad (76)$$

where

$$m_n(\rho,\tilde\theta_n) = \frac{1}{n}\sum_{t=1}^n \frac{1}{N}\sum_{\tau=1}^N (\partial/\partial\theta)\ln f(\hat y_{t\tau}|x_t,\tilde\theta_n) \qquad (77)$$

and where, for each $t$, $\{\hat y_{t\tau}\}_{\tau=1}^N$ is a simulated realization of length $N$ from either $p_{oa}(y|r_{0t},B_t,q_t,\rho)$ or $p_{sb}(y|r_{0t},B_t,q_t,\rho)$, depending on whether the data are from an oral ascending or sealed bid auction. Sampling from $p_{oa}(y|r_{0t},B_t,q_t,\rho)$ is relatively easy, while sampling from $p_{sb}(y|r_{0t},B_t,q_t,\rho)$ is more difficult. (The revenue equivalence property only simplifies the sampling for the conditional first moment.)

The appropriate asymptotic theory for this estimator is Case 1, as the entire analysis is conditional on the realization of the strictly exogenous process $\{x_t\}$. To the extent that $f(y|x,\theta)$ provides a good approximation of the distribution of $y$ given the exogenous variables, this estimator will have efficiency close to that of maximum likelihood.
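A sketch of the moment conditions (77) for the oral ascending case, again reusing the illustrative `draw_winning_bid` from above; `score_f(y, x, theta)` stands for the score of whatever score generator was fitted in (75).

```python
import numpy as np

def m_n_auction(rho, theta_tilde, data, score_f, N=500, seed=0):
    """Equation (77): average auxiliary score over N simulated bids per auction."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for (r0, B, q) in data:                  # data holds x_t = (r0_t, B_t, q_t)
        sims = [draw_winning_bid(r0, B, q, rho, rng) for _ in range(N)]
        total += np.mean([score_f(y, (r0, B, q), theta_tilde) for y in sims], axis=0)
    return total / len(data)
```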

REFERENCES

Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance matrix


estimation. Econometrica 59, 307-346.
Baldwin, L. (1992) Essays on Auctions and Procurement. Ph.D. Dissertation, Duke University.
Bansal, R., A.R. Gallant, R. Hussey, & G. Tauchen (1995) Nonparametric estimation of struc-
tural models for high-frequency currency market data. Journal of Econometrics 66, 251-287.
Bollerslev, T. (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econo-
metrics 31, 307-327.
Bollerslev, T. (1987) A conditionally heteroskedastic time series model of speculative prices and rates of return. Review of Economics and Statistics 69, 542-547.
Bollerslev, T. & J.M. Wooldridge (1992) Quasi-maximum likelihood estimation and inference
in dynamic models with time-varying covariances. Econometric Reviews 11, 143-172.
Clark, P.K. (1973) A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135-155.
Coleman, J. (1990) Solving the stochastic growth model by policy-function iteration. Journal
of Business and Economic Statistics 8, 27-30.
Duffie, D. & K.J. Singleton (1993) Simulated moments estimation of Markov models of asset prices. Econometrica 61, 929-952.

Ellner, S., A.R. Gallant, & J. Theiler (1995) Detecting nonlinearity and chaos in epidemic data.
In D. Mollison (ed.), Epidemic Models: Their Structure and Relation to Data, pp. 54-78. Cam-
bridge: Cambridge University Press.
Engle, R.F. (1982) Autoregressive conditional heteroskedasticity with estimates of the variance
of United Kingdom inflation. Econometrica 50, 987-1007.
Engle, R.F. (1994) Indirect Inference on Volatility Diffusions and Stochastic Volatility Models.
Manuscript, University of California at San Diego.
Engle, R.F. & T. Bollerslev (1986) Modeling the persistence of conditional variance. Economet-
ric Reviews 5, 1-50.
Fenton, V. & A.R. Gallant (1996) Convergence rates of SNP density estimators. Econometrica
64, 719-727.
Gallant, A.R. (1977) Three stage least squares estimation for a system of simultaneous, non-
linear, implicit equations. Journal of Econometrics 5, 71-88.
Gallant, A.R. (1987) Nonlinear Statistical Models. New York: Wiley.
Gallant, A.R., D.A. Hsieh, & G. Tauchen (1994) Estimation of stochastic volatility models with
diagnostics. Manuscript, Duke University.
Gallant, A.R. & D.W. Nychka (1987) Semi-nonparametric maximum likelihood estimation. Econ-
ometrica 55, 363-390.
Gallant, A.R., P.E. Rossi, & G. Tauchen (1992) Stock prices and volume. Review of Financial
Studies 5, 199-242.
Gallant, A.R. & G. Tauchen (1989) Seminonparametric estimation of conditionally constrained
heterogeneous processes: Asset pricing applications. Econometrica 57, 1091-1120.
Gallant, A.R. & G. Tauchen (1992) A nonparametric approach to nonlinear time series analysis:
Estimation and simulation. In E. Parzen, D. Brillinger, M. Rosenblatt, M. Taqqu, J. Geweke,
& P. Caines (eds.), New Dimensions in Time Series Analysis, pp. 71-92. New York:
Springer-Verlag.
Gallant, A.R. & H. White (1992) On learning the derivatives of an unknown mapping with multi-
layer feedforward networks. Neural Networks 5, 129-138.
Gennotte, G. & T.A. Marsh (1993) Variations in economic uncertainty and risk premiums on
capital assets. European Economic Review 37, 1021-1041.
Ghysels, E. & J. Jasiak (1994) Stochastic volatility and time deformation: An application to trading volume and leverage effects. Manuscript, University of Montreal.
Gourieroux, C., A. Monfort, & E. Renault (1993) Indirect inference. Journal of Applied Econometrics 8, S85-S118.
Hansen, L.P. (1982) Large sample properties of generalized method of moments estimators.
Econometrica 50, 1029-1054.
Hansen, L.P., J. Heaton, & E.J.G. Luttmer (1995) Econometric evaluation of asset pricing mod-
els. The Review of Financial Studies 8, 237-274.
Harvey, A.C. (1993) Long Memory in Stochastic Volatility. Manuscript, London School of
Economics.
Harvey, A.C., E. Ruiz, & N. Shephard (1993) Multivariate stochastic variance models. Review of Economic Studies 61, 247-264.
Harvey, A.C. & N. Shephard (1993) Estimation of an Asymmetric Stochastic Volatility Model
for Asset Returns. Manuscript, London School of Economics.
Jacquier, E., N.G. Polson, & P.E. Rossi (1994) Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics 12, 371-388.
Laffont, J.-J., H. Ossard, & Q. Vuong (1991) The Econometrics of First-Price Auctions. Doc-
ument de Travail 7, Institut d'Economie Industrielle, Toulouse.
McCaffrey, D.F. & A.R. Gallant (1994) Convergence rates for single hidden layer feedforward
networks. Neural Networks 7, 147-158.
McFadden, D. (1989) A method of simulated moments for estimation of discrete response mod-
els without numerical integration. Econometrica 57, 995-1026.

Melino, A. & S.M. Turnbull (1990) Pricing foreign currency options with stochastic volatility.
Journal of Econometrics 45, 239-266.
Milgrom, P. (1986) Auction theory. In T. Bewley (ed.), Advances in Economic Theory, pp. 1-32.
Cambridge: Cambridge University Press.
Nelson, D. (1991) Conditional heteroskedasticity in asset returns: A new approach. Economet-
rica 59, 347-370.
Paarsch, H.J. (1991) Empirical Models of Auctions and an Application to British Columbian
Timber Sales. Discussion paper 91-19, University of British Columbia.
Pakes, A. & D. Pollard (1989) Simulation and the asymptotics of optimization estimators. Econ-
ometrica 57, 1027-1058.
Smith, A.A. (1993) Estimating nonlinear time series models using vector autoregressions: Two
approaches. Journal of Applied Econometrics 8, 63-84.
Tauchen, G. (1990) Associate editor's introduction. Journal of Business and Economic Statistics 8, 1.
Tauchen, G. & M. Pitts (1983) The price variability-volume relationship on speculative mar-
kets. Econometrica 51, 485-505.
