Matthias R. Fengler
University of St.Gallen
All content following this page was uploaded by Matthias R. Fengler on 19 November 2019.
Semiparametric Modeling of
Implied Volatility
– Monograph –
Springer
Berlin Heidelberg NewYork
Hong Kong London
Milan Paris Tokyo
Le Monde Instable
Corrozet (1543)
quoted from Henkel and Schöne (1996)
Acknowledgements
This book has benefitted greatly from the suggestions and comments of colleagues,
fellow students and friends, whom I wish to thank here. First and foremost,
I thank Wolfgang Härdle. He directed my interest to implied volatilities and
made me familiar with non- and semiparametric modeling in finance. Without
him, his encouragement and his advice, this work would not exist. Furthermore,
I would like to thank Vladimir Spokoiny, in particular for his comments during my
talks in the Seminar for Mathematical Statistics at the WIAS, Berlin.
This work is closely connected with essays I have written with a number of
coauthors. Above all, I thank Enno Mammen: the cooperation in semiparametric
modeling has been highly instructive and fruitful for me. In this regard,
I also thank Qihua Wang.
For countless helpful discussions and for proofreading, my thanks
go to Peter Bank, Michal Benko, Szymon Borak, Kai Detlefsen, Erhard and
Martin Fengler, Patrick Herbst, Zdeněk Hlávka, Torsten Kleinow, Danilo Mercurio
and Marlene Müller, and to all current and former members of the
ISE and CASE for the inspiring working environment they created there.
Finally, I wish to thank the members of my family not explicitly mentioned
so far, Stephanus and especially my mother Brigitte Fengler, and Georgia
Mavrodi, who in their own ways did their best to support me and the project
at its different stages.
I gratefully acknowledge financial support by the Deutsche Forschungs-
gemeinschaft in having been a member of the Sonderforschungsbereich 373
Quantifikation und Simulation ökonomischer Prozesse at the Humboldt-
Universität zu Berlin.
ATM at-the-money
BS Black and Scholes (1973)
cdf cumulative distribution function
$C_t$ price of a call option at time $t$
$C_t^{BS}$ Black-Scholes price of a call option at time $t$
$C(A)$ the continuous functions $f : A \to \mathbb{R}$
$C^k(A)$ functions in $C(A)$ with continuous derivatives up to order $k$
$C^{k,l}(\mathbb{R} \times \mathbb{R})$ the functions $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ which are $C^k$ w.r.t. the first and $C^l$ w.r.t. the second argument
Cov(X, Y) covariance of two random variables X and Y
CPC(A) common principal component (analysis)
δ dividend yield
$\delta_{x_0}$ Dirac delta function defined by the property: $\int f(x)\,\delta_{x_0}(x)\,dx = f(x_0)$ for a smooth function $f$
E(X) expected value of the random variable X
$F_t$ forward or futures price of an asset at time $t$
$\mathcal{F}_t$ filtration, the information set generated by the information available up to time $t$
$I_p$ $p \times p$ identity matrix
IV implied volatility
IVS implied volatility surface
ITM in-the-money
1(A) indicator function of the set A
K exercise price
$\stackrel{\mathrm{def}}{=}$ is defined as
∼ if X ∼ D, the random variable X has the distribution D
$\xrightarrow{\mathcal{L}}$ converges in distribution to
$\xrightarrow{p}$ converges in probability to
$(X)^+$ $(X)^+ \stackrel{\mathrm{def}}{=} \max(X, 0)$
$\langle X \rangle_t$ quadratic variation process of the stochastic process X
$\langle X, Y \rangle_t$ covariation process of the stochastic processes X and Y
|x| absolute value of the scalar x
|X| determinant of the matrix X
$X^\top$ transpose of the matrix X
tr X trace of the matrix X
$\langle f, g \rangle$ inner product of the functions f and g
\[ \frac{dS_t}{S_t} = \mu(S_t, t)\,dt + \sigma(S_t, t, \cdot)\,dW_t . \]
volatility: instantaneous $\sigma(S_t, t, \cdot)$, implied $\hat\sigma_t(K, T)$, local $\sigma_{K,T}(S_t, t)$
1 Introduction
References
Index
1 Introduction
volatility for finding an option price, one aims at recovering that volatility
which the market has priced into a given option price observation. To put it
in other words, the question is:
Fig. 1.1. DAX option IVs on 20000502. IV observations are displayed as black dots.
Lower left axis is moneyness and lower right time to maturity measured in years.
artefact and curiosity – bears valuable information on the asset price pro-
cess and its dynamics, and that this information can be exploited in models
for the pricing and the hedging of other complex derivatives or positions.
This development goes hand in hand with the advent of highly liquid option and
futures markets that were established all around the world beginning in the
nineteen-nineties. Before this, model calibration and pricing typically relied
on historically sampled time series data. This bears the disadvantage that
the results are predominantly determined by the price history and that the
adjustment to new information is too slow. Unlike time series data, the
cross-sectional dimension of option prices across different strikes and over a
range of times to maturity offers the unique opportunity to directly exploit
instantaneous data for model calibration.
This breakthrough, initiated by the work of Derman and Kani (1994a),
Dupire (1994) and Rubinstein (1994), triggered the literature on smile consis-
tent pricing. It led, for instance, to the development of static option replication
as a means of hedging or to implied trees as a pricing tool. The challenge for
this new approach is that IV cannot be directly used as an input factor, since
– as shall be seen in the course of this book – IV is a global measure of volatil-
ity. Pricing requires a local measure of volatility. Hence, at the heart of this
theory there is another volatility concept, called local volatility. Local volatil-
[Diagram: three nodes, local variance $\sigma^2_{K,T}(S_t, t)$, implied variance $\hat\sigma^2_t(K, T)$ and instantaneous variance $\sigma^2(S_t, t, \cdot)$, connected by arrows whose labels refer to (2.78), (2.93), (3.4), (3.46), (3.47), (3.124) and Section 3.8.]
Fig. 1.2. Overview on the volatility concepts important to this work. Solid lines
denote exact concepts about how the different types of volatility are linked. The
dotted line represents an ad-hoc relationship. The arrows denote the direction of the
relation. The term volatility is reserved for objects of the kind $\sigma$ and $\hat\sigma$, while their
squared counterparts $\sigma^2$ and $\hat\sigma^2$ are called variance.
are provided, e.g., by Efromovich (1999), Härdle (1990), Härdle et al. (2004),
Horowitz (1998), Pagan and Ullah (1999), and Ramsay and Silverman (1997).
Local volatility models or their stochastic ramifications are not the only
way to price derivatives. Of equal significance are approaches relying on
stochastic volatility specifications and on Lévy processes. Indeed, the current
literature on derivatives pricing may be divided into two main camps:
the partisans of local volatility models, who prefer them because local
volatility models produce a nearly perfect fit to the observed option data; and
those who criticize local volatility models principally for predicting the wrong
smile dynamics. It is this second camp that favors stochastic volatility spec-
ifications and Lévy models. In this book, we enter the particulars of this
debate, but topics like stochastic volatility and Lévy models are only briefly
touched. In doing so, we do not intend to argue that these competing model-
ing approaches are not justified: they certainly are, and there are very good
arguments in favor of them. Rather it is our intention to bring together this
important strand of literature and to discuss advantages and potential draw-
backs. The pricing of derivatives in stochastic volatility models can be found
in the excellent textbooks by Fouque et al. (2000) and Lewis (2000), and an
outstanding treatment of jump diffusions is provided in Cont and Tankov
(2004), or in Schoutens (2003).
smooths the IVS in the space of option prices and avoids the potentially
undesirable two-step procedure of previous estimators: traditionally, implied
volatilities are derived in the first step, and the actual fitting algorithm is
applied in the second. A two-step estimator may be biased when option
prices or other input parameters are observed only with error.
The probably biggest challenge in IVS modeling is dimension reduction.
This is the topic of Chapter 5, which is divided into two major parts. The
first part focuses on linear transformations of the IVS. A standard approach
in statistics is to apply principal component analysis. In principal component
analysis the high-dimensional variables are projected into a lower dimensional
space such that as little information as possible is lost. However, this approach
is not directly applicable to the IVS due to the surface structure. Hence, we
use the common principal component models that we find to allow for a parsi-
monious, yet flexible model choice. A concern of applying the principal com-
ponent transformation is stability across time. We derive and apply stability
tests across different annual samples. The first part concludes by modeling
the resulting factors via standard GARCH time series techniques.
The second part of Chapter 5 is devoted to nonlinear transformations via
functional principal component techniques. We first outline the functional
principal component framework. Then we propose a semiparametric factor
model for the IVS. The semiparametric factor model provides a number of
advantages compared with other methods: first, surface estimation and di-
mension reduction can be achieved in one single step. Second, it estimates in
the local neighborhood of the design points of the surface, only. With regard
to Figure 1.1 this means that we estimate only in the local vicinity of the black
dots. This will avoid model biases. Third, the technique delivers a small set of
functions and factor loadings that span the propagation of the IVS through
space and time. We provide another time series analysis of these factors based
on vector autoregressive models and perform a horse race which compares the
model against a simpler practitioners’ model.
Chapter 6 concludes and gives directions to future research.
2 The implied volatility surface
The option pricing model developed by Black and Scholes (1973) and further
extended by Merton (1973) is a landmark in financial theory. It laid the foun-
dations of preference-free valuation of contingent claims. Despite its rather
restrictive assumptions and the large number of refinements to the model
available today, it remains an important benchmark and cornerstone of finan-
cial model building. Here, we give a short review of the BS model and present
the fundamental results necessary for the further development of this work.
For a more detailed account, we refer to textbooks in Finance, such as Musiela
and Rutkowski (1997) or Karatzas (1997).
We consider a continuous-time economy with a trading interval $[0, T^*]$,
where $T^* > 0$. It is assumed that trading can take place continuously, that
there are no differences between lending and borrowing rates, no taxes and
no short-sale constraints.
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and $(W_t)_{0 \le t \le T^*}$ a Brownian motion
(see appendix Chapter B for a definition of the Brownian motion) defined on
this space. $P$ is the objective probability measure. Information in the economy
is revealed by a filtration $(\mathcal{F}_t)_{0 \le t \le T^*}$, which is the $P$-augmentation of the
natural filtration
\[ \mathcal{F}_t^W = \sigma\left( W_s,\ 0 \le s \le t \right), \quad 0 \le t \le T^* . \tag{2.1} \]
The filtration is assumed to satisfy the 'usual' conditions, namely that it is
right-continuous, and that $\mathcal{F}_0$ contains all null sets.
The asset price $(S_t)_{0 \le t \le T^*}$, which pays a constant dividend yield $\delta$, is modelled
by a geometric Brownian motion adapted to $(\mathcal{F}_t)_{0 \le t \le T^*}$. The evolution
of the asset is given by the stochastic differential equation (SDE):
\[ \frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t , \tag{2.2} \]
where $\mu$ denotes the (constant) instantaneous drift and $\sigma$ the (constant)
instantaneous (or spot) volatility function. The quantity $\sigma^2$ measures the
instantaneous variance of the return process of $\ln S_t$. Thus, instantaneous volatility
$\sigma$ can be interpreted as the (local) measure of the risk incurred when investing
one monetary unit into the risky asset, Frey (1996).
The solution to the SDE (2.2) is given by
\[ S_t = S_0 \exp\left\{ \left( \mu - \tfrac{1}{2}\sigma^2 \right) t + \sigma W_t \right\}, \quad \forall t \in [0, T^*] , \tag{2.3} \]
where $S_0 > 0$. This is seen from applying the Itô formula, given in (B.10),
to (2.3). Since (2.3) is a functional of the Brownian motion $W_t$, it is a strong
solution; for the precise conditions guaranteeing uniqueness and
existence of a solution to (2.2), see appendix Chapter B.
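The closed-form solution (2.3) can be illustrated numerically. The following is a minimal sketch, not part of the original exposition: it samples the asset price at a fixed time directly from (2.3) and compares the sample moments of the log-return with the values implied by the model. The function name `simulate_gbm_terminal` and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def simulate_gbm_terminal(s0, mu, sigma, t, n_paths, seed=0):
    """Sample S_t = S_0 * exp((mu - sigma^2 / 2) * t + sigma * W_t), W_t ~ N(0, t)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_paths):
        w_t = rng.gauss(0.0, math.sqrt(t))  # Brownian motion evaluated at time t
        samples.append(s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w_t))
    return samples


# Hypothetical parameter values, chosen only for illustration.
s0, mu, sigma, t = 100.0, 0.05, 0.2, 1.0
prices = simulate_gbm_terminal(s0, mu, sigma, t, n_paths=20000)

# Under (2.3), ln(S_t / S_0) is normal with mean (mu - sigma^2/2) t and variance sigma^2 t.
log_returns = [math.log(s / s0) for s in prices]
mean_lr = sum(log_returns) / len(log_returns)
var_lr = sum((x - mean_lr) ** 2 for x in log_returns) / (len(log_returns) - 1)

print("sample mean of ln(S_t/S_0):", mean_lr, "model:", (mu - 0.5 * sigma ** 2) * t)
print("sample variance of ln(S_t/S_0):", var_lr, "model:", sigma ** 2 * t)
```

Sampling from the exact solution avoids any time-discretization error; only Monte Carlo noise remains.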
The economy is endowed with a savings account or riskless bond with con-
stant interest rate r, which is described by the ordinary differential equation:
\[ \psi(S_T) = (K - S_T)^+ . \tag{2.6} \]
These simple derivatives are also called plain vanilla options. They are nowa-
days tradable as standardized contracts on almost any futures exchange mar-
ket around the world.
In order to receive a payoff such as (2.5) and (2.6), the investor must pay
an option price, or option premium, to a counterparty when the contract is
entered. The investor is then said to be long in the option, while the
counterparty has a short position. The counterparty is obliged to deliver the payoff
according to the prespecified conditions. In any case, even when the option
expires worthless, the short position earns the option premium paid initially
by the long side. Option theory deals with finding this option premium, i.e.
it is about the valuation, or the pricing, of contingent claims.
There are two important methodologies for deriving the prices of contingent
claims: first, a replication strategy based on a self-financing portfolio that
provides the same terminal payoff as the derivative. By no-arbitrage
considerations, the capital necessary for setting up this portfolio must equal the price
of the derivative. Second, there is a probabilistic approach which computes the
derivative price as the discounted expectation of the payoff under an equivalent
martingale measure (the so-called risk neutral measure). Both approaches will
be sketched in the following.
2.2 The self-financing replication strategy
since the stock pays a dividend δSt dt within the small interval dt. Self-
financing means that gains and losses in the portfolio are entirely due to
changes in the stock and the bond.
It should be remarked that the self-financing property is not sufficient
to exclude arbitrage opportunities. Additionally, it is required that the value
process $(V_t)_{0 \le t \le T}$ has a finite lower bound: it is then called tame, Karatzas
(1997).
The price of a contingent claim is a function denoted by $H(S_t, t)$. It shall be
assumed that $H \in C^{2,1}\left( \mathbb{R}_+ \times (0, T) \right)$, i.e. it is contained in the set of functions
which are twice continuously differentiable in their first argument and once in
their second. The portfolio replicates the contingent claim if for some pair
$(a_t)_{0 \le t \le T}$ and $(b_t)_{0 \le t \le T}$:
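\[ dH(S_t, t) = \frac{\partial H}{\partial t}\,dt + \frac{\partial H}{\partial S}\,dS_t + \frac{1}{2}\frac{\partial^2 H}{\partial S^2}\,d\langle S \rangle_t
= \left( \frac{\partial H}{\partial t} + \mu S_t \frac{\partial H}{\partial S} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 H}{\partial S^2} \right) dt + \sigma S_t \frac{\partial H}{\partial S}\,dW_t . \tag{2.10} \]
\[ a_t = \frac{\partial H}{\partial S} . \tag{2.11} \]
From the replication condition (2.9), the trading strategy in the bond is
obtained as
\[ b_t = e^{-rt} \left( H(S_t, t) - \frac{\partial H}{\partial S} S_t \right) . \tag{2.12} \]
\[ 0 = \frac{\partial H}{\partial t} + (r - \delta) S \frac{\partial H}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 H}{\partial S^2} - rH . \tag{2.13} \]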
Thus, the price of any European option has to satisfy this partial differential
equation (henceforth: BS PDE) with the appropriate boundary condition
$H(S_T, T) = \psi(S_T)$. The solution to (2.13) is the value of the replicating
portfolio. In fact, for any payoff function $\psi(x)$ continuous on $\mathbb{R}$, for which the
condition $\int_{-\infty}^{+\infty} e^{-\alpha x^2} |\psi(x)|\,dx < \infty$ holds for some $\alpha > 0$, derivative prices
can be found by solving (2.13), Musiela and Rutkowski (1997).
The remarkable feature of this result is that pricing the derivative within
this model is independent of the appreciation rate $\mu$. Thus, although market
participants may have different ideas about the appreciation rate of the stock,
they will agree on the derivative price as long as they agree on the other
parameters in the model. This result is closely related to the uniqueness of a
risk neutral measure, to be introduced next.
2.3 Risk neutral pricing
The idea of risk neutral pricing is to introduce a new probability measure
$Q$ such that the discounted value process $\tilde V_t \stackrel{\mathrm{def}}{=} e^{-rt} V_t$ of the replicating
portfolio is a martingale under $Q$, i.e. it satisfies:
where
\[ \lambda \stackrel{\mathrm{def}}{=} \frac{\mu + \delta - r}{\sigma} . \tag{2.17} \]
where
\[ \overline{W}_t \stackrel{\mathrm{def}}{=} W_t + \lambda t , \quad \forall t \in [0, T^*] , \tag{2.19} \]
is a Brownian motion on the space $(\Omega, \mathcal{F}, Q)$. The object $\lambda$ is called the market
price of risk, since it measures the excess return $\mu + \delta - r$ per unit of risk
borne by the investor. The term vanishes under $Q$, whence the name risk
neutral pricing. The risk neutral measure is unique if and only if the market
where the second step follows from (2.8) and (2.9) rewritten in terms of Ŝt .
The BS pricing formula for a plain vanilla call is found by computing
2.4 The BS formula and the greeks
The price $C(S_t, t)$ of a plain vanilla call is the solution to the PDE (2.13)
with the boundary condition $C(S_T, T) = (S_T - K)^+$. The explicit solution is
known as the Black and Scholes (1973) formula for calls:
\[ C_t^{BS} = e^{-\delta\tau} S_t \Phi(d_1) - e^{-r\tau} K \Phi(d_2) , \tag{2.23} \]
where
\[ d_1 = \frac{\ln(S_t / K) + (r - \delta + \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}} , \tag{2.24} \]
\[ d_2 = d_1 - \sigma\sqrt{\tau} , \tag{2.25} \]
and where $\Phi(u) \stackrel{\mathrm{def}}{=} \int_{-\infty}^{u} \varphi(x)\,dx$ is the cdf of the standard normal distribution,
whose pdf is given by $\varphi(x) \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for $x \in \mathbb{R}$. $S_t$ denotes the asset price
at time $t$. $K$ is the strike or exercise price (the notation is not to be confused
with the kernel functions denoted by $K(\cdot)$ in Chapter 4). The expiry date of
the option is $T$, and $\tau \stackrel{\mathrm{def}}{=} T - t$ denotes its time to maturity. As in (2.2), $\sigma$ is the
constant volatility function. The riskless interest rate is denoted by $r$, and the
constant dividend yield by $\delta$. It is easy to check using the relevant derivatives
given in (2.28) to (2.37) that the BS price satisfies the BS PDE (2.13).
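The claim that the BS price satisfies the BS PDE (2.13) can also be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original text: it implements the call price in its standard dividend-adjusted form, with $d_1$ and $d_2$ as in (2.24) and (2.25), and evaluates the left-hand side of (2.13) with central finite differences. The function names, step sizes and parameter values are hypothetical.

```python
import math


def norm_cdf(u):
    """Standard normal cdf, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def bs_call(s, k, tau, sigma, r, delta):
    """BS call price with dividend yield delta and time to maturity tau."""
    sqrt_tau = math.sqrt(tau)
    d1 = (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * sqrt_tau)
    d2 = d1 - sigma * sqrt_tau
    return math.exp(-delta * tau) * s * norm_cdf(d1) - math.exp(-r * tau) * k * norm_cdf(d2)


def bs_pde_residual(s, k, tau, sigma, r, delta, h_s=1e-2, h_tau=1e-4):
    """Right-hand side of (2.13) evaluated by central finite differences."""
    c = bs_call(s, k, tau, sigma, r, delta)
    c_up = bs_call(s + h_s, k, tau, sigma, r, delta)
    c_dn = bs_call(s - h_s, k, tau, sigma, r, delta)
    dc_ds = (c_up - c_dn) / (2.0 * h_s)
    d2c_ds2 = (c_up - 2.0 * c + c_dn) / h_s ** 2
    # tau = T - t, hence dH/dt = -dH/dtau
    dc_dt = -(bs_call(s, k, tau + h_tau, sigma, r, delta)
              - bs_call(s, k, tau - h_tau, sigma, r, delta)) / (2.0 * h_tau)
    return dc_dt + (r - delta) * s * dc_ds + 0.5 * sigma ** 2 * s ** 2 * d2c_ds2 - r * c


# Hypothetical inputs, chosen only for illustration.
price = bs_call(100.0, 100.0, 0.5, 0.2, 0.03, 0.01)
residual = bs_pde_residual(100.0, 100.0, 0.5, 0.2, 0.03, 0.01)
print("call price:", price)
print("PDE residual (should be close to zero):", residual)
```

The residual is not exactly zero because of the finite-difference approximation; it shrinks as the step sizes are reduced, up to rounding error.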
To clarify notation: we will rarely enumerate all parameters of an option
pricing function $C(S_t, t, K, T, \sigma, r, \delta)$ explicitly. Rather, we limit the
enumeration to those parameters that are important for the exposition in the given
context. Sometimes we find it convenient to simply denote the time
dependence as a $t$-subscript: $C_t$.
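The price of a put option $P(S_t, t, K, T, \sigma, r, \delta)$ on the same asset with
the same expiry and the same strike price, which has the payoff function $\psi(S_T) =
(K - S_T)^+$, can be obtained from the put-call parity:
\[ C_t - P_t = e^{-\delta\tau} S_t - e^{-r\tau} K . \tag{2.26} \]
This is a model-free relationship that follows from the trivial fact that $S_T -
K = (S_T - K)^+ - (K - S_T)^+$. The BS put price is found to be:
\[ P_t^{BS} = e^{-r\tau} K \Phi(-d_2) - e^{-\delta\tau} S_t \Phi(-d_1) . \tag{2.27} \]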
Fig. 2.1. Call delta (2.28) as a function of asset prices (left axes) and time to
maturity (right axes) for K = 100. SCMdelta.xpl
negative. Theta measures the sensitivity of the option to time decay, and rho
is the sensitivity with respect to interest rate changes.
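\[ \text{gamma} = \frac{\partial^2 C_t}{\partial S^2} = \frac{e^{-\delta\tau} \varphi(d_1)}{S_t \sigma \sqrt{\tau}} \tag{2.29} \]
\[ \text{vega} = \frac{\partial C_t}{\partial \sigma} = e^{-\delta\tau} S_t \sqrt{\tau}\, \varphi(d_1) \tag{2.30} \]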
Fig. 2.2. Gamma (2.29) as a function of asset prices (left axes) and time to maturity
(right axes) for K = 100. SCMgamma.xpl
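\[ \text{volga} = \frac{\partial^2 C_t}{\partial \sigma \partial \sigma} = e^{-\delta\tau} S_t \sqrt{\tau}\, \varphi(d_1) \frac{d_1 d_2}{\sigma} \tag{2.31} \]
\[ \text{vanna} = \frac{\partial^2 C_t}{\partial \sigma \partial S} = -e^{-\delta\tau} \varphi(d_1) \frac{d_2}{\sigma} \tag{2.32} \]
\[ \frac{\partial C_t}{\partial K} = -e^{-r\tau} \Phi(d_2) \tag{2.33} \]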
Fig. 2.3. Vega (2.30) as a function of asset prices (left axes) and time to maturity
(right axes) for K = 100. SCMvega.xpl
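\[ \frac{\partial^2 C_t}{\partial \sigma \partial K} = \frac{e^{-\delta\tau} S_t d_1 \varphi(d_1)}{\sigma K} \tag{2.35} \]
\[ \text{theta} = \frac{\partial C_t}{\partial t} = -\frac{\partial C_t}{\partial T}
= -\frac{e^{-\delta\tau} S_t \sigma \varphi(d_1)}{2\sqrt{\tau}} + \delta e^{-\delta\tau} S_t \Phi(d_1) - r e^{-r\tau} K \Phi(d_2) \tag{2.36} \]
\[ \text{rho} = \frac{\partial C_t}{\partial r} = \tau e^{-r\tau} K \Phi(d_2) \tag{2.37} \]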
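The closed-form sensitivities can be cross-checked against finite differences of the call price. The following is a minimal sketch under stated assumptions: the call price and the delta $e^{-\delta\tau}\Phi(d_1)$ are taken in their standard dividend-adjusted form, the remaining expressions follow (2.29) to (2.32), (2.36) and (2.37), and all names and parameter values are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def d1_d2(s, k, tau, sigma, r, delta):
    d1 = (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def bs_call(s, k, tau, sigma, r, delta):
    d1, d2 = d1_d2(s, k, tau, sigma, r, delta)
    return math.exp(-delta * tau) * s * norm_cdf(d1) - math.exp(-r * tau) * k * norm_cdf(d2)


def bs_call_greeks(s, k, tau, sigma, r, delta):
    """Closed-form sensitivities of the BS call price."""
    d1, d2 = d1_d2(s, k, tau, sigma, r, delta)
    dq, dr, st = math.exp(-delta * tau), math.exp(-r * tau), math.sqrt(tau)
    return {
        "delta": dq * norm_cdf(d1),  # standard form, assumed for (2.28)
        "gamma": dq * norm_pdf(d1) / (s * sigma * st),
        "vega": dq * s * st * norm_pdf(d1),
        "volga": dq * s * st * norm_pdf(d1) * d1 * d2 / sigma,
        "vanna": -dq * norm_pdf(d1) * d2 / sigma,
        "theta": (-dq * s * sigma * norm_pdf(d1) / (2.0 * st)
                  + delta * dq * s * norm_cdf(d1) - r * dr * k * norm_cdf(d2)),
        "rho": tau * dr * k * norm_cdf(d2),
    }


def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Hypothetical inputs, chosen only for illustration.
s, k, tau, sigma, r, delta = 105.0, 100.0, 0.5, 0.2, 0.03, 0.01
greeks = bs_call_greeks(s, k, tau, sigma, r, delta)
fd = {
    "delta": central_diff(lambda x: bs_call(x, k, tau, sigma, r, delta), s, 1e-3),
    "vega": central_diff(lambda x: bs_call(s, k, tau, x, r, delta), sigma, 1e-5),
    "rho": central_diff(lambda x: bs_call(s, k, tau, sigma, x, delta), r, 1e-5),
    # theta is the derivative in t, i.e. minus the derivative in tau
    "theta": -central_diff(lambda x: bs_call(s, k, x, sigma, r, delta), tau, 1e-5),
}
for name, value in fd.items():
    print(name, "closed form:", greeks[name], "finite difference:", value)
```

Second-order sensitivities such as volga and vanna can be checked in the same way by differencing the closed-form vega and delta with respect to the volatility.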
Fig. 2.4. Volga (2.31) as a function of asset prices (left axes) and time to maturity
(right axes) for K = 100. SCMvolga.xpl
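\[ \phi(K, T | S_t, t) \stackrel{\mathrm{def}}{=} e^{r(T - t)} \frac{\partial^2 C_t(K, T)}{\partial K^2} . \tag{2.38} \]
\[ \phi(K, T | S_t, t) = \frac{\varphi(d_2)}{\sigma \sqrt{\tau} K} , \tag{2.40} \]
which is a log-normal pdf in $K$. Of course, this is just another way to see that
\[ \ln S_T \sim N\left( \ln S_t + \left( r - \delta - \tfrac{1}{2}\sigma^2 \right)\tau ,\ \sigma^2 \tau \right) , \tag{2.41} \]
as was explained earlier.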
The second derivative with respect to the strike, however, is useful to
recover the transition probability also in more general contexts than the BS
model: result (2.38) – first shown by Breeden and Litzenberger (1978) – hinges
on the particular form of the call payoff function ψ(ST ) = (ST −K)+ , only, and
is thus applicable in more general circumstances, irrespective of the particular
distributional assumptions on the underlying asset price process. It derives its
importance from the results of Section 2.3: if one knew this density, either
by believing in the BS model or by obtaining an empirical estimate of it,
any path independent contingent claim could be priced by simply integrating
the payoff function over this density. The state price density is also useful
for trading strategies which try to exploit systematic deviations between the
risk neutral and the historical properties of the underlying stock price time
series, Aït-Sahalia et al. (2001b) and Blaskowitz et al. (2004). It shall also play
an important role in deriving local volatility, see Chapter 3 and in particular
Section 3.3.
For this reason, the statistical literature has developed a whole battery of
methods for estimating state price densities from observed option prices, see
e.g. Jackwerth (1999) or Weinberg (2001) for reviews. The extraction of the
state price density can be achieved, for instance, via parametric specifications
of the density, or, in a discrete way, via implied trees, Section 3.10.1. This
is discussed in more depth in Härdle and Zheng (2002). Recent advances by
Härdle and Yatchew (2003) and Hlávka (2003) allow the state price density to be
estimated via non- and semiparametric procedures.
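To make the Breeden and Litzenberger (1978) result (2.38) concrete, the following sketch differentiates BS call prices twice with respect to the strike and compares the result with the log-normal density (2.40). It is a minimal illustration, not one of the estimators cited above; the call price is used in its standard dividend-adjusted form, and the function names, step size and parameter values are assumptions.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def d1_d2(s, k, tau, sigma, r, delta):
    d1 = (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def bs_call(s, k, tau, sigma, r, delta):
    d1, d2 = d1_d2(s, k, tau, sigma, r, delta)
    return math.exp(-delta * tau) * s * norm_cdf(d1) - math.exp(-r * tau) * k * norm_cdf(d2)


def spd_from_calls(s, k, tau, sigma, r, delta, h=0.05):
    """State price density via (2.38): exp(r tau) times the second strike derivative."""
    c_up = bs_call(s, k + h, tau, sigma, r, delta)
    c_mid = bs_call(s, k, tau, sigma, r, delta)
    c_dn = bs_call(s, k - h, tau, sigma, r, delta)
    return math.exp(r * tau) * (c_up - 2.0 * c_mid + c_dn) / h ** 2


def spd_lognormal(s, k, tau, sigma, r, delta):
    """Closed-form BS state price density (2.40)."""
    _, d2 = d1_d2(s, k, tau, sigma, r, delta)
    return norm_pdf(d2) / (sigma * math.sqrt(tau) * k)


# Hypothetical inputs, chosen only for illustration.
s, tau, sigma, r, delta = 100.0, 0.5, 0.2, 0.03, 0.01
for strike in (80.0, 100.0, 120.0):
    print(strike,
          spd_from_calls(s, strike, tau, sigma, r, delta),
          spd_lognormal(s, strike, tau, sigma, r, delta))
```

With observed rather than model prices, the second difference amplifies noise considerably, which is one motivation for the smoothing-based estimators referred to in the text.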
Finally, there is an identity which is useful for many manipulations of
the BS formula, see for instance Equation (2.34):
\[ e^{-\delta\tau} S_t \varphi(d_1) = e^{-r\tau} K \varphi(d_2) . \tag{2.42} \]
2.5 The IV smile
It is obvious that the BS formula is derived under assumptions that are
unlikely to be met in practice: frictionless markets, the ability to hedge
continuously without transaction costs, asset prices without jumps but with independent
Gaussian increments, and, last but not least, a constant volatility function.
Due to the simplicity of the model, any deviation from these assumptions is
empirically summarized in one single parameter or object: the IV smile and
the IVS.
The only unknown parameter in the BS pricing formula (2.23) is the
volatility. Given observed market prices $\tilde C_t$, it is therefore natural to define an
implicit or implied volatility (IV), first introduced by Latané and Rendelman
(1976):
\[ \hat\sigma : \quad C^{BS}(S_t, t, K, T, \hat\sigma) - \tilde C_t = 0 . \tag{2.43} \]
\[ \hat\sigma : (t, K, T) \to \hat\sigma_t(K, T) . \tag{2.44} \]
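Since the BS call price is strictly increasing in the volatility, the root in (2.43) can be found with any bracketing method. The sketch below uses plain bisection; it is one simple possibility rather than the procedure used for the data in this work, and the function names, bracket, tolerance and parameter values are assumptions.

```python
import math


def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def bs_call(s, k, tau, sigma, r, delta):
    sqrt_tau = math.sqrt(tau)
    d1 = (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * sqrt_tau)
    d2 = d1 - sigma * sqrt_tau
    return math.exp(-delta * tau) * s * norm_cdf(d1) - math.exp(-r * tau) * k * norm_cdf(d2)


def implied_vol(price, s, k, tau, r, delta, lo=1e-6, hi=5.0, tol=1e-10, max_iter=200):
    """Solve C_BS(sigma) - price = 0 for sigma by bisection on [lo, hi]."""
    if not bs_call(s, k, tau, lo, r, delta) <= price <= bs_call(s, k, tau, hi, r, delta):
        raise ValueError("price is outside the range attainable on the volatility bracket")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, tau, mid, r, delta) > price:
            hi = mid  # model price too high: volatility must be lower
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip with hypothetical inputs: price an option at 25% volatility, then invert.
s, k, tau, r, delta = 100.0, 110.0, 0.5, 0.03, 0.01
observed_price = bs_call(s, k, tau, 0.25, r, delta)
iv = implied_vol(observed_price, s, k, tau, r, delta)
print("implied volatility:", iv)
```

Bisection is slow but robust; a Newton iteration using vega (2.30) converges faster but may fail far from the money, where vega is close to zero.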
where $F_t = e^{(r - \delta)(T - t)} S_t$ denotes the (fair) futures or forward price at time
$t$, Hull (2002). A stock price moneyness can be defined by:
\[ \kappa \stackrel{\mathrm{def}}{=} K / S_t . \tag{2.46} \]
Forward moneyness is a natural choice of the moneyness scale, when one works
with European style option data. European options can only be exercised at
expiry. From this point of view, one incorporates the risk neutral drift in the
moneyness measure, which is taken into account by dividing by the futures
price.
We say that an option is at-the-money (ATM) when κ ≈ 1. A call option
is called out-of-the-money, OTM, (in-the-money, ITM), if κ > 1 (κ < 1) with
the reverse applying to puts. Sometimes the literature also works in units
of log-moneyness: ln(K/Ft ) or ln(K/St ). Given a quantity in one moneyness
definition, it is often not difficult to switch between the different scales.
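The moneyness conventions can be summarized in a few lines of code. The sketch below is only an illustration: forward moneyness is taken to be the strike divided by the forward price, the width of the band treated as at-the-money is an arbitrary assumption (the text only requires κ ≈ 1), and the function names and input values are hypothetical.

```python
import math


def forward_price(s, r, delta, tau):
    """Fair forward price F_t = exp((r - delta) * tau) * S_t."""
    return math.exp((r - delta) * tau) * s


def forward_moneyness(k, s, r, delta, tau):
    """Strike divided by the forward price."""
    return k / forward_price(s, r, delta, tau)


def stock_moneyness(k, s):
    """Stock price moneyness (2.46)."""
    return k / s


def classify(kappa, option_type="call", atm_band=0.02):
    """Label an option as ATM, OTM or ITM; atm_band is an assumed tolerance."""
    if abs(kappa - 1.0) <= atm_band:
        return "ATM"
    call_label = "OTM" if kappa > 1.0 else "ITM"
    if option_type == "call":
        return call_label
    return "ITM" if call_label == "OTM" else "OTM"  # the reverse applies to puts


# Hypothetical inputs, chosen only for illustration.
s, r, delta, tau = 7500.0, 0.04, 0.0, 45.0 / 365.0
for strike in (6400.0, 7000.0, 7500.0):
    kappa_f = forward_moneyness(strike, s, r, delta, tau)
    print(strike, kappa_f, math.log(kappa_f), classify(kappa_f, "put"))
```

Switching between the scales only requires the forward price: forward moneyness equals stock price moneyness multiplied by $e^{-(r-\delta)\tau}$.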
A typical picture of the IV smile is presented in Figures 2.5 and 2.7. IV
observations appear as black dots. The IV data, which are the basis for all
empirical parts of this study, are obtained from prices of DAX index options
traded at the EUREX in Frankfurt am Main. The original raw data were
provided by Deutsche Börse AG, Frankfurt. They have undergone considerable
refinement and are stored in the financial database MD*base located at
the Center for Applied Statistics and Economics (CASE) at the Humboldt-Universität
zu Berlin. A detailed description of the data and the preparation
scheme is given in Appendix A. The option data are contract-based: each
price observation stems from an actual trade, i.e. we do not work with price
quotations or settlement data. Due to the nature of transaction-based data, the
data set may contain noise, potential misprints and other errors. This is also
seen in Figures 2.5 and 2.7 with the two single observations traded at an IV
of 21% in the lower left of the smile function.
Figure 2.5 shows a downward-sloping smile across strikes for the 45 days
to expiry contract as observed on 20000502. Obviously, OTM puts and ITM
calls are traded at higher prices than the corresponding ATM options. Since
the contracts are highly standardized on organized markets, IV observations
are only available for a small subset of strikes. Consequently, observations are
concentrated at these strikes.
In the lower panel of Figure 2.5, we added the intraday movements of the
futures contract (expiry June 2000). Given the observation that the futures
contract gains approximately 1% during the day, one may ask whether the
dispersion of IV observations for a fixed strike is due to intraday movements
of IV. In the top panel of Figure 2.6, we present the intraday movements of
IV at the fixed strikes 6400, 7000 and 7500. No particular directional moves
of IV are evident. Rather – this is especially pronounced for the 6400-strike
contract – IV jumps up and down between two distinct levels: this is the
bid-ask bounce. During the day, the bid-ask spread seems to widen beginning
from 3:00 p.m. Note that this coincides with a strong increase of the futures
price in this part of the day. This contract, which is already in the OTM put
region, is floating further away from the ATM region. The other contracts,
closer to ATM, exhibit a much less pronounced jump behavior due to the
bid-ask prices of the options.

Fig. 2.5. Top panel: DAX option IV smile for 45 days to expiry on 20000502 plotted
per strike. IV observations are displayed as black dots. Bottom panel: DAX futures
contract, June 2000 contract, between 8:00 a.m. and 5:30 p.m. on 20000502
(horizontal axis: time in hours). SCMivanalysis.xpl

Fig. 2.6. Top panel: IV smile for 45 days to expiry plotted per strikes 6400 (blue,
top), 7000 (black, middle), and 7500 (cyan, bottom), between 8:00 a.m. and 5:30 p.m.
on 20000502. Bottom panel: same IV smile plotted per (forward) moneyness 0.85
(blue, top), 1.00 (black, middle), and 1.05 (cyan, bottom). Horizontal axes: time in
hours. SCMivanalysis.xpl

Fig. 2.7. Upper panel: IV smile for 45 days to expiry on 20000502. IV observations
are displayed as black dots; the smile estimate is obtained from a local linear
estimator with localized bandwidths. Lower panel: first order derivative obtained from a local
linear estimator with localized bandwidths (solid line). No-arbitrage bounds (2.53) on
the smile (dashed). SCMsmile.xpl
In Figure 2.7, the very same IV data are plotted against (forward) money-
ness as defined in (2.45). As a proxy, we divide the strike of each option by
the futures price which is closest in time within an interval of five minutes.
It should be remarked that due to the daily settlement of futures contracts,
futures prices and forward prices are not equal when interest rates are stochastic.
However, for the times to maturity we consider throughout this work (up
to half a year), we believe this difference to be negligible; see Hull (2002, pp. 51-52)
for a more detailed discussion and further references on this topic. As is
seen in Figure 2.7, the overall shape of the smile function is not altered, but
the data appear smeared across moneyness, which is due to the intraday
fluctuations of the futures price. The lower panel of Figure 2.6 gives the intraday
movements for fixed moneyness in the neighborhood of κf = 0.85, 1.00, 1.05
(maximum distance is ± 0.02). It is seen that the turnover for ITM put (OTM
call) options is very thin compared with OTM puts. Most trading activity
takes place ATM.
Comparing Figure 2.5 with Figure 2.7 reveals a nice feature of the moneyness
data. Plotting the smile against moneyness not only makes the smile
independent of large moves in the underlying asset over months
and years; to some extent, it also acts as a 'smoothing' device. This facilitates the
aggregation of intraday data to daily samples, as we do throughout this work.
Especially from the perspective of curve estimation, which is the topic of
Chapters 4 and 5, moneyness data are more tractable and more convenient.
Finally, let us present the entire IVS in Figure 2.8. The IV smiles appear
as black rows, which we shall call strings. The strings belong to different
maturities of the option contracts. Similarly to the discrete set of strikes, only
a very small number of maturities, here five, are actively traded at the same
time. Also, one can discern that not all maturity strings are of comparable
size: the third one is much shorter than the others. Obviously, the IVS has a
degenerate design. This poses several challenges to the modeling task, which
will be addressed in Section 5.4.
As a general pattern, it is seen from Figure 2.8 that the smile curve flattens
out with longer time to maturity. The lower panel shows the term structure
for various slices in the IVS for moneyness κf = 0.75 top line, κf = 1, i.e.
ATM, middle line, κf = 1.1, bottom line. There is a slightly increasing slope
for ATM IV and OTM call (ITM put) IV, while OTM put (ITM call) IV
displays a decreasing term structure. This is due to the more shallow smiles
for the long-term maturities.
The most fundamental conclusion of this section is that OTM puts and
ITM calls are traded at higher prices than the corresponding ATM options.
Obviously, the BS model does not properly capture the probability of large
downward movements of the underlying asset price. To arrive at an explanation,
one needs to relax the assumptions of the BS model. This literature is
summarized in Section 2.11.
The pivotal question following this conclusion is: what does the IVS imply
for practice? Two points shall be raised here:
The good, or perhaps lucky, news is that the ambiguity of the model is
condensed into one single entity, the IV. This allows traders to think of themselves
as making a market for volatility rather than for specific equity contracts:
hence, it is common practice to quote options in terms of IV. The BS formula
is only employed as a simple and convenient mapping to assign to each option
on the same underlying a strike-dependent (and a maturity-dependent) IV.
For this purpose it is not necessary to believe in the BS model. It simply acts
as a computational tool ensuring a common language among traders.
The bad news is that for each K and each T across the IVS a different
BS model applies. This causes difficulties in managing option books, as shall
be discussed in Section 2.9. The reason is that for hedging purposes it may
not be a good idea to evaluate the delta of the option using its own ‘quoted
IV’. Also for pricing exotic options, the presence of the IVS poses challenges,
especially for volatility-sensitive exotics, such as barrier options. This issue is
addressed in Chapter 3.
Often, IV is interpreted as the market’s expectation of average volatility
through the life time of the option. At first glance this notion seems sensible,
since if the market has a consensus about future volatility, it will be reflected
in IV. Unfortunately, from a theoretical point of view, this notion can only
be validated for a very limited class of models, as shall be demonstrated
in Section 2.8. Furthermore, option markets are also driven by supply and
demand. If market participants, for whatever reason, seek protection against a
down-swing in the market, this will drive up put prices. Since the
put price (like the call price) is monotonically increasing in volatility, this will
be reflected in higher IVS levels for OTM puts. Thus, this notion should be
treated with caution.
2.6 Static properties of the smile function
From the general fact that (European) call prices are monotonically decreasing
and put prices are monotonically increasing functions of the strike price, compare
(2.33), it is possible to obtain broad no-arbitrage bounds on the slope of
the smile, Lee (2002). If $K_1 < K_2$ for any expiry date $T$, we have
\[ C_t(K_1,T) \ge C_t(K_2,T), \qquad P_t(K_1,T) \le P_t(K_2,T). \tag{2.47} \]
\[ C_t(K_1,T) \ge C_t(K_2,T), \qquad \frac{P_t(K_1,T)}{K_1} \le \frac{P_t(K_2,T)}{K_2}. \tag{2.48} \]
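\[ \frac{\partial\hat\sigma}{\partial K} \ge \frac{P_t^{BS}/K - \partial P_t^{BS}/\partial K}{\partial P_t^{BS}/\partial\hat\sigma}. \tag{2.51} \]
Finally, insert the analytical expressions of the option derivatives and the
put price and make use of relationship (2.42). This shows:
\[ -\frac{\Phi(-d_1)}{\sqrt{\tau}\,K\,\varphi(d_1)} \le \frac{\partial\hat\sigma}{\partial K} \le \frac{\Phi(d_2)}{\sqrt{\tau}\,K\,\varphi(d_2)}. \tag{2.52} \]
\[ -\frac{\Phi(-d_1)}{\sqrt{\tau}\,\kappa_f\,\varphi(d_1)} \le \frac{\partial\hat\sigma}{\partial\kappa_f} \le \frac{\Phi(d_2)}{\sqrt{\tau}\,\kappa_f\,\varphi(d_2)}, \tag{2.53} \]
since $\partial\hat\sigma/\partial K = (\partial\hat\sigma/\partial\kappa_f)\,F_t^{-1}$.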
The bounds are displayed in the lower panel of Figure 2.7 together with
the estimated first order derivative. It is seen that the bounds are very broad
given the estimated slope of the smile function. Without the refinement (2.48),
the lower bound is only $-\{1-\Phi(d_2)\}/\{\sqrt{\tau}\,\kappa_f\,\varphi(d_2)\}$.
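A small numerical sketch of the bounds (2.53) may help to see how wide they are. It assumes the usual forward-moneyness definitions $d_1 = \{-\ln\kappa_f + \hat\sigma^2\tau/2\}/(\hat\sigma\sqrt{\tau})$ and $d_2 = d_1 - \hat\sigma\sqrt{\tau}$; the function name and the input values are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def smile_slope_bounds(kappa_f, tau, sigma):
    """Bounds on d(sigma_hat)/d(kappa_f) at forward moneyness kappa_f = K / F_t.

    Returns (lower, upper, lower_unrefined): the two bounds of (2.53) and the
    weaker lower bound that results without the refinement (2.48).
    """
    sq = sigma * math.sqrt(tau)
    d1 = (-math.log(kappa_f) + 0.5 * sq * sq) / sq   # assumed definition of d1
    d2 = d1 - sq
    scale = math.sqrt(tau) * kappa_f
    lower = -norm_cdf(-d1) / (scale * norm_pdf(d1))
    upper = norm_cdf(d2) / (scale * norm_pdf(d2))
    lower_unrefined = -(1.0 - norm_cdf(d2)) / (scale * norm_pdf(d2))
    return lower, upper, lower_unrefined

# Illustrative inputs: three-month options with an IV of 20%.
for kappa in (0.8, 1.0, 1.2):
    print(kappa, smile_slope_bounds(kappa, tau=0.25, sigma=0.20))
```

Because the ratio $\Phi(-d)/\varphi(d)$ is decreasing in $d$ and $d_1 > d_2$, the refined lower bound is never weaker than the unrefined one.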
Lee (2003) derived remarkable results for the large and small strike behavior
of the smile function. Define $x \stackrel{\mathrm{def}}{=} \ln\kappa_f = \ln(K/F_t)$, where the futures price
remains fixed in this section. He shows that
\[ \hat\sigma_t(x,T) < \sqrt{\frac{2|x|}{T}} \tag{2.54} \]
for some sufficiently large |x| > x∗ (see also Zhu and Avellaneda (1998) who
derive this bound in a less general setting). He proceeds in showing that there
For the left-hand side one sees for any call price function that
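since by $E^Q S_T < \infty$ we can interchange the limit and the expectation by the
dominated convergence theorem. For the right-hand side one obtains
\[ \lim_{x\uparrow\infty} C_t^{BS}\bigl(x,T,\sqrt{2|x|/T}\bigr) = e^{-r\tau}F_t\Bigl\{\Phi(0) - \lim_{x\uparrow\infty} e^{x}\,\Phi\bigl(-\sqrt{2|x|}\bigr)\Bigr\} = e^{-r\tau}F_t/2, \tag{2.57} \]
and
\[ \beta_R \stackrel{\mathrm{def}}{=} \limsup_{x\uparrow\infty} \frac{\hat\sigma^2(x,T)}{|x|/T}, \tag{2.60} \]
\[ \beta_L \stackrel{\mathrm{def}}{=} \limsup_{x\downarrow-\infty} \frac{\hat\sigma^2(x,T)}{|x|/T}. \tag{2.61} \]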
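The limit in (2.57) can be checked numerically. The sketch below evaluates the term $e^x\Phi(-\sqrt{2x})$ for growing $x > 0$ and the BS call price, normalized by $e^{-r\tau}F_t$, at the volatility $\sqrt{2|x|/T}$. The function names are hypothetical; the tail probability is computed with `math.erfc` to avoid cancellation far out in the tail.

```python
import math

def norm_cdf(x):
    # erfc keeps precision for strongly negative arguments
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def tail_term(x):
    """e^x * Phi(-sqrt(2x)) for x > 0, the term whose limit is taken in (2.57)."""
    return math.exp(x) * norm_cdf(-math.sqrt(2.0 * x))

def normalized_call_at_bound(x, T=1.0):
    """BS call divided by e^{-r tau} F_t at log-moneyness x and sigma = sqrt(2|x|/T)."""
    sq = math.sqrt(2.0 * abs(x) / T) * math.sqrt(T)   # sigma * sqrt(T)
    d1 = (-x + 0.5 * sq * sq) / sq
    d2 = d1 - sq
    return norm_cdf(d1) - math.exp(x) * norm_cdf(d2)

for x in (1.0, 10.0, 100.0, 500.0):
    print(x, tail_term(x), normalized_call_at_bound(x))
```

The tail term vanishes only slowly, roughly like $1/(2\sqrt{\pi x})$ by the standard Mills ratio approximation, so the normalized price approaches 1/2 from below at that rate.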
2.7 General regularities of the IVS
Despite its daily fluctuations, the IVS exhibits a number of empirical regularities,
both from a static and a dynamic perspective. Here, they shall be
summarized with respect to the DAX index options traded at the German
option market, Deutsche Börse AG, Frankfurt, see Appendix A for details
concerning the data. Typically these stylized facts are observed for any equity
index market. Markets with other underlying assets display similar features.
For a compendium on single stocks, interest rate and foreign exchange markets
see Rebonato (1999) or Tompkins (1999).
1. For short times to maturity the smile is very pronounced, while it
becomes more and more shallow for longer times to maturity: the IVS
flattens out, Figure 2.8. Figure 2.9 displays the mean IVS of 1996. Since this
is computed from smoothed IVS data and on a relatively small grid, this
effect is less apparent than in Figure 2.8. For more pictures of this kind,
see Fengler (2002).
2. The smile function achieves its minimum in the neighborhood of ATM to
near OTM call options, Figure 2.7 and Figure 2.9. The term structure is
increasing, but may also display a humped profile, especially in periods of
market turmoil as during the Asian crisis in 1997, Figure 2.11.
3. OTM put regions display higher levels of IV than OTM call regions, lower
panel of Figure 2.8 and Figure 2.9. However, this has not always been
the case: a more or less symmetric smile became strongly asymmetric (a
‘sneer’, or ‘smirk’) and considerably more pronounced after the 1987 crash.
It is widely argued that this is due to the investors’ increased awareness
of market down-swings since this period, Rubinstein (1994).
4. The volatility of IV is biggest for short maturity options and monotonically
declining with time to maturity, Figure 2.10 and Figure 2.12, also Fengler
(2002).
5. Returns of the underlying asset and returns of IV are negatively correlated,
indicating a leverage effect, Black (1976). For the entire data 1995 to May
2001 we find a correlation between ATM IV (three months) and DAX
returns of ρ = −0.32. This point is further discussed in the context of the
principal component analysis in Section 5.2.5.
6. IV appears to be mean-reverting, Cont and da Fonseca (2002). For ATM
IV (three months) we find a mean reversion of approximately 60 days, see
also Section 5.2.5 and Table 5.4.
7. Shocks across the IVS are highly correlated. Thus, IVS dynamics can be
decomposed into a small number of driving factors, Chapter 5.
An overview of the three-month ATM IV time series between 1995 and May
2001 is given in Figure 2.13. In Figure 2.14, the same time series is shown
together with the rescaled DAX. At the beginning of 1995, the DAX index
was at around 2100 points and increased moderately until the beginning of
1997. During this time ATM IV was below 20% and gradually fell until 1997
towards 14%. From the end of 1996 the DAX commenced a steady
and smooth increase until mid-1998, which was briefly interrupted by the Asian
crisis in the second half of 1997. The entire ascent of the DAX was
accompanied by steadily increasing IV levels, rising as high as 35% when the
DAX fell sharply at the peak of the market turmoil. From then on, IVs gradually
declined, but remained, although relatively volatile, at historically high levels,
while the index rose again. Between mid-July and the beginning of October
1998 the DAX dropped again by about 2000 points, followed by a sharp increase
of IVs with peaks up to 50%. During the recovery of the index between 1999
and 2000, IVs returned to the levels recorded before the late 1998 increase.
Although increasingly volatile, they remained at these levels when the DAX
began its gradual decline from the post-war peak of 8000 points in March
2000.
The annual standard deviation of IV can be inferred from Figure 2.12.
As already observed, short-run volatilities are more subject to daily variation
than long-term volatilities, which is reflected in the downward sloping
functions. The year 1998 was the period of the highest volatility, followed by
the years 1997 and 1999.
From this description, the only obvious regularity between the time series
patterns of the underlying asset, the DAX index, and ATM IV is that times of
market crises lead to sharply increasing levels of the IVS. This is likely to be
due to the increased demand for put options. Otherwise no clear-cut relation
between the time series patterns emerges: levels of the IVS may be constant,
downward or upward-trending independently of the index. Some authors,
like Derman (1999) and Alexander (2001b), have suggested distinguishing
different market regimes. In this interpretation, the IVS acts as an indicator of
market sentiment, i.e. as an additional financial variable describing the current
state of the market. This view explains the increased interest in accurate
modeling techniques of the IVS.
Fig. 2.8. Top panel: DAX option IVS on 20000502. IV observations are displayed
as black dots; the surface estimate is obtained from a local quadratic estimator with
localized bandwidths. Bottom panel: term structure of the IVS. κf = 0.75 top line,
κf = 1, i.e. ATM, middle line, κf = 1.1, bottom line. (SCMivsts.xpl)
Fig. 2.10. Standard deviation of the IVS 1996, computed from smoothed surfaces.
Fig. 2.11. ATM mean IV term structures between 1995 and 2001, computed from
smoothed surfaces (vertical axis: implied volatility; one curve per year).
Fig. 2.12. ATM standard deviation of IV term structures between 1995 and 2001,
computed from smoothed surfaces (vertical axis: implied volatility; one curve per year).
Fig. 2.14. German DAX ×10−4 (upper line) and three-months ATM IV levels
(lower line), also given in Figure 2.13.
2.8 Relaxing the constant volatility case
The fact that IV is counterfactual to the BS model has spurred a large number
of alternative pricing models. The easiest way to gain more flexibility is to allow
the coefficients of the SDE, which describes the stock price evolution, to be
deterministic functions in the asset price and time. This preserves the complete
market setting, Section 2.8.1. A second important class of models specifies
volatility as an additional stochastic process. Since volatility is not a tradable
asset, this implies that the market is incomplete. A short review is given in
Section 2.8.2.
Allowing for coefficients of the SDE of the stock price evolution that are
deterministic functions of the asset price and time leads to
\[ \frac{dS_t}{S_t} = \mu(S_t,t)\,dt + \sigma(S_t,t)\,dW_t, \tag{2.66} \]
where $\mu, \sigma : \mathbb{R} \times [0,T^*] \to \mathbb{R}$ are deterministic functions. For the existence of
a unique strong solution, the functions must satisfy a global Lipschitz and a
linear growth condition, see Appendix B.
For pricing derivatives, one may proceed as in Section 2.1. This leads to
the generalized BS PDE for the derivative price:
\[ 0 = \frac{\partial H}{\partial t} + (r-\delta)S\frac{\partial H}{\partial S} + \frac{1}{2}\sigma^2(S,t)S^2\frac{\partial^2 H}{\partial S^2} - rH. \tag{2.67} \]
As before, the delta hedge ratio is given by the first order derivative of the
solution to (2.67) with respect to $S_t$.
For plain vanilla options, closed-form solutions can be derived for some particular
specifications of volatility. Generally, this can be achieved via change
of variable techniques, e.g. Bluman (1980) and Harper (1994). For the special
case when volatility is only time-dependent, i.e. $\sigma(S_t,t) \stackrel{\mathrm{def}}{=} \sigma(t)$, one can use
the following arguments to solve (2.67), Wilmott (2001a, Chapter 8):
One introduces the new variables
\[ \bar S \stackrel{\mathrm{def}}{=} S e^{(r-\delta)(T-t)}, \tag{2.68} \]
\[ \bar t \stackrel{\mathrm{def}}{=} f(t), \tag{2.69} \]
\[ \bar H(\bar S,\bar t) \stackrel{\mathrm{def}}{=} H(S,t)\,e^{r(T-t)}, \tag{2.70} \]
where f is some smooth function. Expressing the PDE (2.67) in terms of the
new variables (2.68) to (2.70) yields
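\[ \frac{\partial\bar H}{\partial\bar t}\,\frac{\partial\bar t}{\partial t} = -\frac{1}{2}\sigma^2(t)\,\bar S^2\,\frac{\partial^2\bar H}{\partial\bar S^2}. \tag{2.71} \]
If we choose
\[ f(t) \stackrel{\mathrm{def}}{=} \int_t^T \sigma^2(s)\,ds, \tag{2.72} \]
so that $\partial\bar t/\partial t = -\sigma^2(t)$, equation (2.71) reduces to
\[ \frac{\partial\bar H}{\partial\bar t} = \frac{1}{2}\bar S^2\frac{\partial^2\bar H}{\partial\bar S^2}, \tag{2.73} \]
which is independent of time in its coefficients. Also, the boundary condition
for a (European) call, $\bar H(\bar S_T, T) = H(S_T,T) = (S_T-K)^+$ (or a European
put), stays the same after these manipulations. Consequently, denoting by
$\bar H(\bar S,\bar t)$ the solution to (2.73), we can rewrite this in the original variables as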
The two Brownian motions $(W_t^{(0)})_{0\le t\le T^*}$ and $(W_t^{(1)})_{0\le t\le T^*}$ are defined
on the probability space $(\Omega,\mathcal F,P)$, and let $(\mathcal F_t)_{0\le t\le T^*}$ be the P-augmented
filtration generated by both Brownian motions. Again we suppose that the
sufficient conditions are met, such that (2.79) and (2.80) have unique strong
solutions, Appendix B. The function $f(y)$ is chosen for positivity and analytical
tractability; typical examples are $f(y) = \sqrt{y}$, Hull and White (1987), $f(y) = e^y$,
Stein and Stein (1991), or $f(y) = |y|$, Scott (1987).
Empirical analysis suggests a mean-reverting behavior of volatility. To capture
this regularity, $(Y_t)_{0\le t\le T^*}$ is often assumed to be an Ornstein-Uhlenbeck
process, which is defined as the solution to the SDE
\[ dY_t = \alpha(\mu_y - Y_t)\,dt + \theta\,dW_t^{(1)}, \tag{2.81} \]
where $\alpha, \mu_y, \theta > 0$, Scott (1987) and Stein and Stein (1991). Here, $\alpha$ is the
rate of mean reversion, pulling the levels of the process back to its long-run
mean $\mu_y$. The solution of (2.81) is given by:
\[ Y_t = \mu_y + (y_0-\mu_y)e^{-\alpha t} + \theta\int_0^t e^{-\alpha(t-s)}\,dW_s^{(1)}, \tag{2.82} \]
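Since the stochastic integral in (2.82) is Gaussian, the solution can be sampled exactly on any time grid: by the Itô isometry, $Y_t$ is normal with mean $\mu_y + (y_0-\mu_y)e^{-\alpha t}$ and variance $\theta^2(1-e^{-2\alpha t})/(2\alpha)$. The following sketch illustrates this; the function names and all parameter values are assumptions chosen for the example.

```python
import math
import random

def ou_moments(y0, alpha, mu_y, theta, t):
    """Mean and variance of Y_t given Y_0 = y0, read off the solution (2.82)."""
    mean = mu_y + (y0 - mu_y) * math.exp(-alpha * t)
    var = theta ** 2 * (1.0 - math.exp(-2.0 * alpha * t)) / (2.0 * alpha)
    return mean, var

def simulate_ou(y0, alpha, mu_y, theta, dt, n_steps, rng):
    """Exact sampling on a grid: every step draws from the Gaussian transition law."""
    path = [y0]
    for _ in range(n_steps):
        mean, var = ou_moments(path[-1], alpha, mu_y, theta, dt)
        path.append(mean + math.sqrt(var) * rng.gauss(0.0, 1.0))
    return path

# Hypothetical parameters: start above the long-run mean and simulate one year.
rng = random.Random(0)
terminal = [simulate_ou(0.4, 2.0, 0.2, 0.1, 1.0 / 50, 50, rng)[-1]
            for _ in range(1000)]
sample_mean = sum(terminal) / len(terminal)
theory_mean, theory_var = ou_moments(0.4, 2.0, 0.2, 0.1, 1.0)
```

The sample mean of the simulated terminal values should lie close to `theory_mean`; for large $t$ the variance tends to the stationary value $\theta^2/(2\alpha)$.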
where $(\lambda_t^{(0)})_{t\ge 0}$ and $(\lambda_t^{(1)})_{t\ge 0}$ are $(\mathcal F_t)_{0\le t\le T^*}$-adapted processes. Furthermore,
\[ \overline W_t^{(0)} \stackrel{\mathrm{def}}{=} W_t^{(0)} + \int_0^t \lambda_s^{(0)}\,ds \quad\text{and}\quad \overline W_t^{(1)} \stackrel{\mathrm{def}}{=} W_t^{(1)} + \int_0^t \lambda_s^{(1)}\,ds, \tag{2.86} \]
are Brownian motions on the space $(\Omega,\mathcal F,Q)$ for all $t \in [0,T^*]$. If (and only
if)
\[ \lambda_t^{(0)} \stackrel{\mathrm{def}}{=} \frac{\mu+\delta-r}{\sigma(t,Y_t)}, \tag{2.87} \]
the discounted price process $e^{-rt}S_t$ is a martingale. The process $(\lambda_t^{(1)})_{t\ge 0}$
can be any adapted process satisfying the required integrability condition. In
analogy to (2.87) it is called the market price of volatility risk. The measure
Q depends on the choice of $(\lambda_t^{(1)})_{t\ge 0}$. In some sense, one may think of
the measure as being ‘parameterized’ by this process; we therefore write $Q^{\lambda_1}$.
Option prices are computed by exploiting the risk neutral pricing relationship:
\[ H_t = E^{Q^{\lambda_1}}\bigl\{e^{-r(T-t)}\psi(S_T)\mid\mathcal F_t\bigr\}. \tag{2.88} \]
Let us assume that $(\lambda_t^{(1)})_{t\ge 0}$ is a function of $Y_t$, $S_t$ and $t$ only, i.e. a Markov
process, and that $\rho \stackrel{\mathrm{def}}{=} 0$. In this particular case, we can compute option prices
by conditioning on the volatility path. By the law of iterated expectations, we
have, e.g. for a call:
\[ C(S_t,t) = E^{Q^{\lambda_1}}\Bigl[ E^{Q^{\lambda_1}}\bigl\{ e^{-r(T-t)}(S_T-K)^+ \bigm| \mathcal F_t,\ \sigma(Y_s,s),\ t\le s\le T \bigr\} \Bigm| \mathcal F_t \Bigr]. \tag{2.89} \]
where now $\bar\sigma^2 \stackrel{\mathrm{def}}{=} \frac{1}{T-t}\int_t^T \{f(Y_s)\}^2\,ds$. As before, we insert the root-mean-square
time average over a particular trajectory of volatility into the BS formula.
The call price is given by an average of prices over all possible volatility
paths.
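Due to its similarity to the case of deterministic volatility, one is tempted
to interpret IV as an average volatility over the remaining life time of the
option. However, in general we have
\[ E^{Q^{\lambda_1}}\bigl\{C^{BS}\bigl(\cdot,\sqrt{\bar\sigma^2}\bigr)\bigm|\mathcal F_t\bigr\} \ne C^{BS}\bigl(\cdot, E^{Q^{\lambda_1}}\bigl\{\sqrt{\bar\sigma^2}\bigm|\mathcal F_t\bigr\}\bigr), \tag{2.91} \]
and thus:
\[ \hat\sigma \ne E^{Q^{\lambda_1}}\bigl\{\sqrt{\bar\sigma^2}\bigm|\mathcal F_t\bigr\}, \tag{2.92} \]
since $\bar\sigma^2$ is random and the call price a nonlinear function of volatility. For
ATM strikes, where the volga is small and negative (recall our remark from
page 15 with regard to Figure 2.4), Jensen’s inequality yields:
\[ \hat\sigma < E^{Q^{\lambda_1}}\bigl\{\sqrt{\bar\sigma^2}\bigm|\mathcal F_t\bigr\}, \tag{2.93} \]
but $\hat\sigma \approx E^{Q^{\lambda_1}}\{\sqrt{\bar\sigma^2}\mid\mathcal F_t\}$ may be considered as a sufficiently good approximation.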
Thus, it can still be justified to interpret IV as an average volatility over
the remaining life time of the option. It should be borne in mind, however,
that this interpretation is limited to ATM strikes and ρ = 0. For the
case ρ ≠ 0, a representation of this form depends strongly on the specification
of the underlying volatility process. A generalization of (2.89) within the Hull
and White (1987) model for any ρ has been obtained by Zhu and Avellaneda
(1998), see also the discussion in Fouque et al. (2000).
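The conditioning argument and the inequality (2.93) can be illustrated by a small Monte Carlo experiment. The sketch below is an illustration under explicit assumptions, not the author's procedure: volatility is $f(Y_t) = |Y_t|$ with an Ornstein-Uhlenbeck driver as in (2.81), $\rho = 0$, the market price of volatility risk is set to zero, the integral in $\bar\sigma^2$ is approximated by a left-point rule, and all parameter values and function names are hypothetical.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bs_call(S, K, tau, r, delta, sigma):
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta) * tau + 0.5 * sq * sq) / sq
    d2 = d1 - sq
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def implied_vol(price, S, K, tau, r, delta, lo=1e-6, hi=5.0, tol=1e-12):
    """Bisection on sigma; the call price is increasing in sigma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, tau, r, delta, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rms_volatilities(n_paths, n_steps, tau, y0, alpha, mu_y, theta, rng):
    """Root-mean-square volatility of each simulated path, with f(y) = |y|."""
    dt = tau / n_steps
    decay = math.exp(-alpha * dt)
    step_sd = theta * math.sqrt((1.0 - decay * decay) / (2.0 * alpha))
    result = []
    for _ in range(n_paths):
        y, integral = y0, 0.0
        for _ in range(n_steps):
            integral += y * y * dt                      # left-point rule for f(Y_s)^2
            y = mu_y + (y - mu_y) * decay + step_sd * rng.gauss(0.0, 1.0)
        result.append(math.sqrt(integral / tau))
    return result

S, r, delta, tau = 100.0, 0.03, 0.0, 0.5
K_atm = S * math.exp((r - delta) * tau)                 # ATM in the forward sense
rng = random.Random(42)
vols = rms_volatilities(2000, 50, tau, y0=0.2, alpha=2.0, mu_y=0.2, theta=0.2, rng=rng)

# Average of BS prices over volatility paths, then inversion to a single IV.
mixed_price = sum(bs_call(S, K_atm, tau, r, delta, v) for v in vols) / len(vols)
iv_atm = implied_vol(mixed_price, S, K_atm, tau, r, delta)
mean_rms_vol = sum(vols) / len(vols)
```

For the forward ATM strike the BS price is concave in volatility, so `iv_atm` falls below `mean_rms_vol`, in line with (2.93); the gap is small, which is why the approximation is often regarded as acceptable at the money.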
Due to the incompleteness of the market, the construction of a riskless
hedge portfolio as in Section 2.1 is not possible. One solution, referred to as
delta-sigma hedging, is to introduce another option with a longer maturity into
the market, whose price is given exogenously. This completes the market under
some conditions, Bajeux and Rochet (1992). Given these three instruments,
the stock, the bond and the additional option, a riskless hedge portfolio can
be constructed that prices the option.
Another strategy is to assume that $(\lambda_t^{(1)})_{t\ge 0} \stackrel{\mathrm{def}}{=} 0$, i.e. volatility risk is
unpriced. This is sensible if volatility risk can be diversified away, or preferences
are logarithmic, Pham and Touzi (1996). The measure $Q^0$ can also be interpreted
as the closest measure to P in a relative entropy sense, Föllmer and
Schweizer (1990). In the general case, one needs to resort to hedging strategies
that have been developed within the incomplete markets literature: in
super-hedging the contingent claim is ‘super-replicated’, i.e. a self-financing
strategy with minimum initial costs is sought such that any future obligation
from selling the contingent claim is covered, while in quantile-hedging one tries
to cover this obligation only with a sufficiently high probability. Finally, one
may consider trading strategies which are not necessarily self-financing, i.e.
which allow for the additional transfer of wealth to the hedge portfolio. This is
called risk-minimizing hedging, originated by Föllmer and Sondermann (1986).
See Föllmer and Schweizer (1990), Karatzas (1997) and Föllmer and Schied
(2002) for a detailed mathematical treatment of these hedging approaches.
2.9 Challenges arising from the smile
In the presence of the smile, a first obvious challenge is the computation of the
relevant hedge ratios. At first glance, an answer may be to insert IV into the BS
derivatives in order to compute the hedge ratios for some option position. This
strategy is also called an ‘IV compensated BS hedge’. However, one should be
aware that this strategy can be erroneous, since IV is not necessarily equal to
the hedging volatility. Analogously to IV, the hedging volatility, for instance
for the delta, is defined by:
\[ \hat\sigma_h : \quad \frac{\partial C^{BS}}{\partial S}(S_t,t,K,T,\hat\sigma_h) - \frac{\partial\widetilde C_t}{\partial S} = 0, \tag{2.94} \]
which is the volatility that equates the BS delta with the delta of the true,
but unknown pricing model, Renault and Touzi (1996). Unfortunately, the
hedging volatility is not directly observable.
Renault and Touzi (1996) prove that the bias in this approximation is sys-
tematic, when the classical Hull and White (1987) model is the true underlying
price process. The bias translates into the following errors in the hedge ratios:
for ITM options the use of IV to compute the hedge ratios leads to an under-
hedged position in the delta, while for OTM options the use of IV leads to
an overhedged position. Only for ATM options, in the log-forward moneyness
sense, the delta-hedge is perfect. This problem is also demonstrated for both
delta and vega risks in a simulation study by Rebonato (1999, Case Study
4.1).
An alternative approach for approximating the unknown delta is to explicitly
assume that the smile depends on the underlying asset price $S_t$. Then
one approximates the delta as:
\[ \frac{\partial\widetilde C_t}{\partial S} = \frac{\partial C^{BS}}{\partial S}(S_t,t,K,T,\hat\sigma) + \frac{\partial C^{BS}}{\partial\hat\sigma}(S_t,t,K,T,\hat\sigma)\,\frac{\partial\hat\sigma}{\partial S}. \tag{2.95} \]
In (2.95) all quantities are known except for $\partial\hat\sigma/\partial S$. It cannot directly be
recovered from the IVS, since the IVS is a function of maturity and strikes,
not of the underlying. One solution would be to use simple conjectures about
this quantity, see Derman et al. (1996b) or Derman (1999) and Section 3.11
for typical examples. Assuming that volatility is a deterministic function of
$S_t$ and $t$ as in Section 2.8.1, Coleman et al. (2001) suggest the following
approximation to $\partial\hat\sigma/\partial S$. They observe that under this assumption European
put and call prices are related to each other through a reversal of S and K,
and r and δ, respectively, i.e.
\[ C(S_t,t,K,T,\hat\sigma,r,\delta) = P(K,t,S_t,T,\hat\sigma,\delta,r), \tag{2.96} \]
where $P(S_t,t,K,T,\sigma,r,\delta)$ denotes the price of a European put option whose
arguments are, in this order, the current asset price $S_t$, time $t$, strike $K$, expiry
date $T$, volatility $\sigma$, interest rate $r$ and dividend yield $\delta$. Further assuming that a
similar relationship holds in terms of IV, they derive
\[ \frac{\partial\hat\sigma^{C}(S_t,t,K,T,r,\delta)}{\partial S} = \frac{\partial\hat\sigma^{P}(K,t,S_t,T,\delta,r)}{\partial S}, \tag{2.97} \]
where the superscripts C and P denote call and put, respectively. Note that the
left-hand side in (2.97) is the unknown derivative of IV with respect to the
underlying asset, while the right-hand side of (2.97) is a strike derivative which
can be reconstructed from the IVS. Relation (2.97) is particularly convenient
when r ≈ δ (as in the case of futures options), since the switch of both
quantities becomes obsolete. In terms of empirical performance, Coleman
et al. (2001) report that this approximation substantially improves the hedges
based on a simple constant volatility method.
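In the constant volatility BS model, which is the simplest special case of a deterministic volatility function, the reversal relation (2.96) can be verified directly. The sketch below uses hypothetical function names and inputs.

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def _d1_d2(S, K, tau, sigma, r, delta):
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta) * tau + 0.5 * sq * sq) / sq
    return d1, d1 - sq

def bs_call(S, K, tau, sigma, r, delta):
    d1, d2 = _d1_d2(S, K, tau, sigma, r, delta)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def bs_put(S, K, tau, sigma, r, delta):
    d1, d2 = _d1_d2(S, K, tau, sigma, r, delta)
    return K * math.exp(-r * tau) * norm_cdf(-d2) - S * math.exp(-delta * tau) * norm_cdf(-d1)

# Call with spot 100 and strike 110 versus the put with the roles of
# spot and strike, and of r and delta, interchanged.
call = bs_call(100.0, 110.0, 0.5, 0.25, r=0.04, delta=0.01)
put_reversed = bs_put(110.0, 100.0, 0.5, 0.25, r=0.01, delta=0.04)
```

Both prices coincide up to floating point error, because the reversal maps $d_1$ to $-d_2$ and $d_2$ to $-d_1$.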
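Another hedging strategy due to Lee (2001) also includes the stochastic
volatility case: reconsider the strike-analogue to (2.95). It is given by:
\[ \frac{\partial\widetilde C_t}{\partial K} = \frac{\partial C^{BS}}{\partial K}(S_t,t,K,T,\hat\sigma) + \frac{\partial C^{BS}}{\partial\hat\sigma}(S_t,t,K,T,\hat\sigma)\,\frac{\partial\hat\sigma}{\partial K}. \tag{2.98} \]
Multiply (2.95) by $S_t$ and (2.98) by $K$ and sum both equations. Assuming
further that $\widetilde C_t$ is homogeneous of degree one in $S_t$ and $K$ (the BS
price $C_t^{BS}$ fulfills this property, as can easily be checked), we find:
\[ \frac{\partial\hat\sigma}{\partial S} = -\frac{K}{S_t}\,\frac{\partial\hat\sigma}{\partial K}. \tag{2.99} \]
Inserting (2.99) into (2.95) yields
\[ \frac{\partial\widetilde C_t}{\partial S} = \frac{\partial C^{BS}}{\partial S}(S_t,t,K,T,\hat\sigma) - \frac{\partial C^{BS}}{\partial\hat\sigma}(S_t,t,K,T,\hat\sigma)\,\frac{K}{S_t}\,\frac{\partial\hat\sigma}{\partial K}. \tag{2.100} \]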
This delta-hedge has a direct reference to the IVS and can be implemented
without estimating an underlying stochastic volatility model. When the smile
is negatively skewed, this approach delivers smile dynamics that proxy the
so called sticky-moneyness assumption, see the discussion in Section 3.11.
Moreover, it also gives insights into the results by Renault and Touzi (1996):
when the smile is U-shaped, the IV compensated delta overhedges the OTM
call, since $\partial\hat\sigma/\partial K$ in the second term of the right-hand side in (2.100) has a
positive sign in the OTM regions of calls, Section 2.7.
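The delta in (2.100) needs only the BS delta, the BS vega and the strike slope of the smile. A minimal sketch follows; the function names, the linear smile in moneyness $K/S$ and all numerical inputs are assumptions made for illustration.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def _d1(S, K, tau, r, delta, sigma):
    return (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def bs_call(S, K, tau, r, delta, sigma):
    d1 = _d1(S, K, tau, r, delta, sigma)
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def bs_delta(S, K, tau, r, delta, sigma):
    return math.exp(-delta * tau) * norm_cdf(_d1(S, K, tau, r, delta, sigma))

def bs_vega(S, K, tau, r, delta, sigma):
    return S * math.exp(-delta * tau) * norm_pdf(_d1(S, K, tau, r, delta, sigma)) * math.sqrt(tau)

def smile_adjusted_delta(S, K, tau, r, delta, sigma_hat, dsigma_dK):
    """Delta as in (2.100): BS delta minus vega * (K / S) * d(sigma_hat)/dK."""
    return (bs_delta(S, K, tau, r, delta, sigma_hat)
            - bs_vega(S, K, tau, r, delta, sigma_hat) * (K / S) * dsigma_dK)

def smile(m):
    """Hypothetical negatively skewed smile as a function of moneyness m = K / S."""
    return 0.2 - 0.1 * (m - 1.0)

S0, K0, tau0, r0, q0 = 100.0, 105.0, 0.5, 0.03, 0.0
slope_in_K = -0.1 / S0                      # d smile(K/S) / dK at S = S0
delta_iv = bs_delta(S0, K0, tau0, r0, q0, smile(K0 / S0))
delta_adj = smile_adjusted_delta(S0, K0, tau0, r0, q0, smile(K0 / S0), slope_in_K)
```

With a negative strike slope the adjusted delta exceeds the IV compensated BS delta, and with a positive slope it falls below it, which is the overhedging effect described for OTM calls. For a smile that depends on $K/S$ only, the adjusted delta coincides with the total derivative of the price with respect to $S$.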
For risk management, other difficulties appear, especially when IV com-
pensated hedge ratios are used. When different BS models apply for different
strikes, one may question whether delta and vega risks across different strikes
can simply be added to assess the overall risk in the option book: being a cer-
tain amount of euro delta long in high strike options, and the same amount
delta short in low strike options, need not necessarily imply that the book
is eventually delta-neutral. There may be residual delta risk that has to be
hedged, even on this aggregate level.
Similarly, the vega risk of the portfolio needs to be carefully assessed. In
stress scenarios it is crucial how the IVS is shocked, e.g. whether one shifts
the IVS across strikes and time to maturity in an entirely parallel fashion or
in more sophisticated ways. This is explored with the dimension reduction
techniques developed in Chapter 5, which offer empirical answers to these
questions: typically, the most important shocks are due to almost parallel up
and down shifts of the IVS. A second source of shocks affects the moneyness
slope of the IVS, while a third type influences the moneyness curvature of the
IVS or, depending on the modeling approach, its term structure. This will
be studied in detail in Chapter 5.
2.9.2 Pricing
A next challenge is valuing exotic options. The reason is that even weakly
path-dependent options, such as barrier options, require sophisticated volatil-
ity specifications. Consider, e.g. an ITM knock-out option with strike K and
barrier L > K. In this case, explicit valuation formulae are known when
the underlying follows a geometric Brownian motion, Musiela and Rutkowski
(1997, Chapter 9). However, which IV should be used for pricing? One could
use the IV at the strike K, the one at the barrier L, or some average of both.
The more sensitive the exotic option is to volatility, the more acute this
problem becomes.
At this point it becomes clear that, in the presence of the IVS, pricing is
not sensible without a self-consistent and reliable model. One way is taken by
the stochastic volatility models sketched in Section 2.8. Another way, which is
much closer to the concept of the IVS, and hence to the topic of this research,
is offered by the smile consistent local volatility models. These models rely on a
volatility function that is directly backed out of prices of plain vanilla options
observed in the market. Thus, the exotic option is priced consistently with the
entire IVS. This is a natural approach, especially when the exotic option is
to be hedged with plain vanilla options. It will be the topic of Chapter 3.
2.10 IV as predictor of realized volatility
amount of information on future volatility and are better than (only) time
series based methods. At the same time, most authors conclude that IV is a
biased predictor. Deeper theoretical insight into the bias is provided by Britten-Jones
and Neuberger (2000), see Section 3.9, who provide a model-free option
based volatility forecast. They show that if the IVS does not depend on K
and T , but is not necessarily constant, then squared IV is exactly this forecast,
however only under the risk neutral measure. Hence, the bias may not be due
to model misspecification or measurement errors, but rather due to the way
the market prices volatility risk. Thus, research now proposes the volatility
risk premium as a possible explanation, Lee (2001) and Bakshi and Kapadia
(2003).
2.11 Why do we smile?
Ever since the observation of the smile function, research has aimed at explaining
this striking deviation from the BS constant volatility assumption. This is
achieved by subsequently relaxing the assumptions of the models. Nowadays,
the literature comprises a lot of factors possibly responsible for smile and term
structure patterns: they range from market microstructure frictions, such as
liquidity constraints and transaction costs, to stochastic volatility and Lévy-
processes for the underlying asset price process. Although stochastic volatility
and asset prices driven by Lévy-processes may be the best understood and
most prominent explanations, the empirical literature has had little success
in disentangling the different factors: since IV is a free parameter, it comprises
‘expected volatility and everything else that affects option supply and demand
but is not in the model’, Figlewski (1989, p. 13).
It has been conjectured quite early in the literature that stochastic volatil-
ity is responsible for the smile effect, but Renault and Touzi (1996) are the
first to formally prove this suspicion under the assumption of an underlying
Hull and White (1987) model with zero correlation between the two Brownian
motions. They show that stochastic volatility necessarily implies a U-shaped
smile which attains its minimum for ATM options (in the sense of forward
moneyness). A similar conclusion is drawn in stochastic IV models, see Sec-
tion 3.12. The stochastic volatility smile effect is also confirmed in empirical
work: for instance, Härdle and Hafner (2000) demonstrate that GARCH-type
models considerably reduce the pricing error of options compared with the
simple BS model. However, the smile patterns generated by these stochastic
volatility models do not appear to match well the ones empirically observed,
Heynen (1994). This is also confirmed by Jorion (1988) and Bates (1996)
who overall favor jump diffusion models over stochastic volatility. Das and
Sundaram (1999) investigate more deeply the implications of the models concerning
the shape of the smile and the term structure of the IVS. According
to them, stochastic volatility smiles are too shallow, while jump diffusions
imply the smile only for short maturity options. Moreover, they prove that
jump diffusions always imply an increasing term structure of IV. However,
empirically, also a decreasing or at least humped term structure is observed,
compare Figure 2.11 for the years 1997 and 2000. Similar results are reported
by Tompkins (2001) for a variety of stochastic and jump models. Summing up,
it appears that only a combination of jump and stochastic volatility models
is sufficiently capable of capturing the stylized facts of the IVS, Bakshi et al.
(1997).
As in the literature on the predictive power of IV, new studies seek the
reasons for the IVS in long memory in volatility of the underlying process,
e.g. Breidt et al. (1998). There is evidence that in particular the upward-
sloping term structure of the IVS can strongly be influenced by long memory
in volatility, Taylor (2000).
Given that the distributional assumption of normal returns behind the
BS model is frequently rejected, see e.g. Ederington and Guan (2002) for an
analysis that is based on delta-hedging an option portfolio, processes whose
marginal distributions have heavier tails than the Gaussian are considered. For
instance, Barndorff-Nielsen (1997) discusses the normal inverse Gaussian distribution.
This distribution is from the class of generalized hyperbolic distributions,
proposed by Eberlein and Keller (1995), Küchler et al. (1999), and Eberlein
and Prause (2002) for modeling asset price processes that capture the smile
effect. A comprehensive introduction to smile consistent option pricing with
Lévy processes is found in Cont and Tankov (2004).
An increasing literature seeks the reasons for the smiling volatility func-
tions in market imperfections. Jarrow and O’Hara (1989) argue that the dif-
ferences between IV and the historical volatility reflect the transaction costs
of the dynamic hedge portfolio. In approximating transaction costs by the bid-
ask spread, a similar conclusion is reached by Peña et al. (1999). Within an
equilibrium framework, Grossman and Zhou (1996) analyze possible feedback
effects from hedging and market illiquidity. In their set-up, portfolio insurance
can generate volatility skews. Frey and Patie (2002) show that market liquidity
that depends on the asset price level produces smile patterns as they are
typically observed. This assumption is in line with the experience that large
up or down swings in asset prices lead to a decrease of market liquidity.
In Section 2.5, it has been argued that also supply and demand condi-
tions may contribute to the shape of the smile. In a recent study, Bollen and
Whaley (2003) examine the net buying pressure proxied by the difference of
buyer-motivated and seller-motivated contracts. To them, net buying pressure
plays an important role for the shape of the IVS in the S&P 500 market. One
is tempted to argue that, in an efficient market, an increased demand
for portfolio insurance should provide incentives to agents to sell options and
to replicate them synthetically. However, the presence of short sale and borrowing
constraints among investors may make the replication strategy more
costly, thereby driving up option prices. Fahlenbrach and Strobl (2002) provide
empirical evidence for this argument.
Finally, an interesting explanation for the index smile has recently been
put forward: it is a well-known fact that stock smile functions are shallow com-
pared with the smile of index options, Bollen and Whaley (2003). The risk
neutral distribution of the index – since it is a (deterministically) weighted
average of single stocks – is completely determined by the risk neutral distribu-
tions of the single stocks. Branger and Schlag (2004) show that the steepness
of the smile is an immediate result of the dependence structure of the single
stocks in the basket. Moreover, a change in this dependence structure, which
has been addressed by Fengler and Schwendner (2004) in the context of pricing
multiasset equity options, can have dramatic consequences for the shape of
the IVS. Indeed, the relation of prices, risk neutral distributions and volatility
functions between stock options and basket options is relatively unexplored.
First approaches in this direction are Avellaneda et al. (2002), Bakshi et
al. (2003) and Lee et al. (2003).
2.12 Summary
In this chapter we introduced the phenomenon of the IV smile and the IVS:
the first part was devoted to an introduction into the BS model for the pricing
of contingent claims. Two principles in pricing, the self-financing replication
strategy and the probabilistic approach based on the risk neutral measure,
were presented. We derived the BS formula for plain vanilla European calls
and puts.
The second part treated the concept of IV. We discussed the static prop-
erties of the smile function, such as no-arbitrage bounds on the IV slope and
the asymptotic behavior of the smile function. A discussion of the general
empirical regularities of the IVS observed on equity markets followed. In a
first attempt to explain the smile, we presented two typical approaches for
relaxing the strict assumptions of the BS model: time-dependent and stochas-
tic volatility. In both frameworks, we arrived at an interpretation of IV as an
average of the squared volatility function. Then we discussed the challenges in
hedging and pricing in the presence of a smile. The chapter concluded with
a short summary on the literature that employs IV as a predictor for future
stock price fluctuations, and with a complementary section on other possible
explanations for the existence of the smile phenomenon.
3 Smile consistent volatility models
3.1 Introduction
The existence of the smile requires the development of new pricing models
that capture the static and dynamic distortions of the IVS. One pathway
taken first in Merton (1976) and Hull and White (1987) and the subsequent
related literature is to add another degree of freedom either to the process
of the underlying asset or to the volatility process. This approach has been
sketched in Section 2.8. The advent of highly liquid option markets, on which
large numbers of standardized plain vanilla options are traded at low costs,
has reversed the procedure: an emerging strand of literature of so called smile
consistent volatility models takes the prices of plain vanilla options as given.
The aim is to extract information about the asset price dynamics and the
volatility directly from the observed option prices and the IVS only, which
then is employed to price and hedge other derivative products. The decisive
point is that these other derivatives are priced and hedged relative to the
observed plain vanilla options. The name smile consistent volatility models
is derived from the fact that the (European) options priced in these models
exactly reproduce the IVS observed empirically.
This approach is justified by at least two empirical facts and one practi-
cal consideration: first, option prices and the IVS are readily at hand, if not
directly observed. Second, recent studies demonstrate that a large number of
option price movements cannot be attributed to movements in the underlying
or to market microstructure frictions, Bakshi et al. (2000). This leads to the
impression that option markets, owing to their depth and liquidity, are
increasingly governed by their own supply and demand conditions. This seems
to be particularly pronounced at the joint expiry dates of futures contracts and
options, the ‘triple witch days’.
The third and more practical point concerns portfolios of exotic options:
necessarily, positions in these options need to be hedged, and in most cases,
this hedge will be sought by employing plain vanilla options. A particular
strategy is static hedging. Unlike dynamic hedging where the hedge is (almost)
continuously adjusted, in static hedging, the payoff of the exotic option is
replicated by an appropriate portfolio of plain vanilla options. This portfolio
remains unaltered up to expiry, Derman et al. (1995), Carr et al. (1998) and
Andersen et al. (2002). In this case, pricing exotic derivatives correctly relative
to the options that will be used for the hedge is vital for its accuracy.
In achieving these goals, two main lines of models have emerged: first, local
volatility models and their most recent stochastic ramifications, and second,
stochastic implied volatility models. In both approaches, the parameters are
obtained from a calibration of the model to a cross-section of option prices.
Furthermore, both models allow for preference-free derivative valuation (ex-
cept for the stochastic local volatility models), since within each modeling
framework the market is complete. Thus they do not require additional as-
sumptions on the market price of risk.
The concept central to local volatility models is the local volatility surface
(LVS). Unlike the IVS, which is a global measure of volatility, as can be under-
stood from the averaging concept usually attributed to it, the LVS is a local
measure in the sense that it gives a volatility forecast for a pair of a particular
strike and a particular expiry date (K, T ). In this framework instantaneous
volatility is not necessarily a deterministic function of time and asset prices; it
may well be stochastic. However, in the derivation of the LVS all sources
of risk in the stochastic volatility are integrated out, which leaves the
fluctuations of the asset price as the only risky element, Derman and Kani (1998).
By its local nature, the LVS, in contrast to the IVS, is the correct input parameter
for pricing models. Most recently, a number of studies try to circumvent the
static implications of the LVS in moving towards stochastic local volatility
models.
Stochastic IV models explicitly allow for a stochastic setting. However,
the additional state variable is not introduced in the instantaneous volatil-
ity function as in the classical stochastic volatility literature, but tied to a
stochastic IV. Ultimately, of course, this also implies a stochastic instanta-
neous volatility. Since plain vanilla options are still priced via the BS formula
using the contemporaneous realization of the IV, volatility risk is tradable and
the market complete. This allows for a preference-free valuation of contingent
claims.
In this chapter, we aim at giving a comprehensive review on the current
state of literature of local volatility and stochastic IV models (see also Ski-
adopoulos (2001) for an excellent review). First, the notion of local volatility
as pioneered by Dupire (1994) and Derman and Kani (1998) is presented. In
Section 3.3 we relate local volatility to observed option prices. The central
result will be the so called Dupire formula. An alternative path to the Dupire
formula is given in Section 3.4. Section 3.5 establishes the link between lo-
cal volatility and IV. The theoretical part of local volatility is deepened in
Section 3.8, which develops the local volatility as an expected value of instan-
3.2 The theory of local volatility
The concept of local volatility (also called forward volatility) was introduced
by Dupire (1994), and further developed in Derman and Kani (1998). Intu-
itively one may think about local volatility, denoted by σK,T , as the market’s
consensus of instantaneous volatility for a market level K at some future date
T . The ensemble of such estimates for a collection of market levels and fu-
ture dates is called the local volatility surface (LVS). Since it is implied from
observed option prices, the LVS gives the fair value of the asset price volatil-
ity for future market levels and times. Note the difference to the concept of
IV which under certain conditions is thought of as the market’s estimate of
expected average volatility through the life time of the option, Section 2.8.
To make the concept of local volatility more precise, we reconsider the
continuous-time economy with a trading interval $[0, T^*]$, where $T^* > 0$. Let
$(\Omega, \mathcal{F}, P)$ be a probability space, on which at least one Brownian motion
$(W_t^{(0)})_{0 \le t \le T^*}$, but possibly also more Brownian motions, are defined. As
usual, $P$ is the objective probability measure and information is revealed by a
filtration $(\mathcal{F}_t)_{0 \le t \le T^*}$. The asset price $(S_t)_{0 \le t \le T^*}$ is modelled by an
$(\mathcal{F}_t)_{0 \le t \le T^*}$-adapted stochastic process, driven by the SDE
\[ \frac{dS_t}{S_t} = \mu(S_t, t)\,dt + \sigma(S_t, t, \cdot)\,dW_t^{(0)} , \tag{3.1} \]
where $\mu(\cdot, \cdot)$ denotes the instantaneous drift. We assume that the instantaneous
volatility $(\sigma(S_t, t, \cdot))_{0 \le t \le T^*}$ follows some $(\mathcal{F}_t)_{0 \le t \le T^*}$-adapted stochastic
process possibly depending on $S_t$, the history of $S_t$ or on other state variables.
This arbitrary dependence is meant with the ‘$\cdot$-notation’. Finally, we assume
absence of arbitrage, which implies the existence of some risk neutral measure
$Q \in \mathcal{Q}$ equivalent to $P$, under which the discounted asset price $(\tilde{S}_t)_{0 \le t \le T^*}$ is a
martingale. If the martingale measure is not unique, we think about $Q$ as the
risk-neutral measure ‘the market has agreed upon’, i.e. some market measure,
see Cont (1999) or Björk (1998, p. 150) for a discussion of this notion. It is
also assumed that the entire spectrum of European plain vanilla call prices
$C_t(K, T)$, which are priced under $Q$, is given for any strike $K$ and maturity
date $T$: $G = \{C_t(K, T),\ K \ge 0,\ 0 \le T \le T^*\}$.
The local variance $\sigma^2_{K,T}(S_t, t)$ is defined as the risk-neutral expectation of
squared instantaneous volatility conditional on $S_T = K$, and time $t$ information
$\mathcal{F}_t$:
\[ \sigma^2_{K,T}(S_t, t) \stackrel{\mathrm{def}}{=} E^Q\{\sigma^2(S_T, T, \cdot) \mid S_T = K, \mathcal{F}_t\} , \tag{3.2} \]
where $E^Q(\cdot)$ is the expectation operator under the measure $Q$. Then local
volatility is given by:
\[ \sigma_{K,T} \stackrel{\mathrm{def}}{=} \sqrt{\sigma^2_{K,T}} . \tag{3.3} \]
This definition of local volatility has two implications: first, the use of the
market’s view on future volatility expressed by the expectation operator clar-
ifies that all sources of risk from the stochastic volatility are integrated out.
Instead, the evolution of volatility is compressed into a single function that
is deterministic in St and t. To put it differently, the concept of local volatil-
ity presumes – as time elapses – that the instantaneous volatility will evolve
entirely along today’s market expectations sublimated in the local volatility
function. Therefore, within a local volatility framework, for some market level
K = St at T = t, the instantaneous volatility is:
In this case, instantaneous volatility evolves along the static local volatility
function, since the right-hand side is independent of S and t.
Derman and Kani (1998) further characterize the local variance in show-
ing that it can be represented as the risk-adjusted expectation of the future
instantaneous variance at time T :
\[ \sigma^2_{K,T}(S_t, t) = E^{(K,T)}\{\sigma^2(S_T, T, \cdot) \mid \mathcal{F}_t\} , \tag{3.7} \]
where the expectation is now taken with respect to a new measure, which
is called the K-strike and T -maturity forward risk-adjusted measure. This is
again in analogy with the theory of forward rates: here, the forward rate is
obtained by taking the expectation of the short rate under the T -maturity
forward measure, Jamshidian (1993). The derivation of (3.7) will be delayed
until Section 3.8.
Clearly, for pricing, the assumption that the only source of risk is the
asset price may be considered as a drawback. It may be good for markets in
which asset prices and volatility are strongly correlated, as is commonly seen
in equity markets, but can be questioned for foreign exchange markets. The
dynamic hedging performance of the deterministic local volatility models is
criticized, Hagan et al. (2002). Furthermore, they do not provide a genuine
explanation for the smile phenomenon, but rather overstretch the ordinary
BS world, Ayache et al. (2004). This, however, does not appear to diminish
their significance in pricing exotic derivatives in practice. In order to meet
this criticism and to improve the hedging performance, recent work aims at
relaxing the deterministic framework and moves towards a stochastic theory
of local volatility.
Under the equivalent martingale measure asset prices follow the SDE:
\[ \frac{dS_t}{S_t} = (r - \delta)\,dt + \sigma(S_t, t, \cdot)\,d\overline{W}_t^{(0)} , \tag{3.8} \]
where $\overline{W}_t^{(0)}$ denotes the Brownian motion, which drives the asset price, under
the risk neutral measure $Q$. The interest rate and the continuously compounded
dividend yield are denoted by $r$ and $\delta$, respectively.
By the martingale property, the calls are priced by
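\[ \frac{\partial^2 C_t(K, T)}{\partial K^2} = e^{-r\tau} E^Q\{\delta_K(S_T) \mid \mathcal{F}_t\} , \tag{3.11} \]
where $\delta_{x_0}(\cdot)$ denotes the Dirac delta function, which is defined by the property
$\int f(x)\,\delta_{x_0}(x)\,dx = f(x_0)$ for a smooth function $f$. The derivative ∂²/∂K² (S_T −
\[ \frac{\partial C_t(K, T)}{\partial T} = -r\,C_t(K, T) + e^{-r\tau} \frac{\partial}{\partial T} E^Q\{(S_T - K)^+ \mid \mathcal{F}_t\} . \tag{3.12} \]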
To evaluate the right-hand side of (3.12) we apply a generalization of the
Itô formula to the convex function (ST − K)+ , called Tanaka-Meyer formula,
Appendix (B.14). This yields
\[ d(S_T - K)^+ = \mathbf{1}(S_T > K)\,dS_T + \frac{1}{2} S_T^2\,\sigma^2(S_T, T, \cdot)\,\delta_K(S_T)\,dT . \tag{3.13} \]
Taking expectations in (3.13) together with the asset price dynamics (3.8)
yields:
(3.14)
\[ E^Q\{S_T \mathbf{1}(S_T > K)\} = E^Q\{(S_T - K)^+\} + K\,E^Q\{\mathbf{1}(S_T > K)\} . \tag{3.15} \]
Plugging this into (3.14) and using (3.9) and (3.10), one obtains:
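\[ \frac{\partial}{\partial T} E^Q\{(S_T - K)^+ \mid \mathcal{F}_t\} = e^{r\tau} (r - \delta) \left\{ C_t(K, T) - K \frac{\partial C_t(K, T)}{\partial K} \right\} + \frac{1}{2} K^2 E^Q\{\sigma^2(S_T, T, \cdot)\,\delta_K(S_T) \mid \mathcal{F}_t\} . \tag{3.16} \]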
By the law of iterated expectations the last term in (3.16) can be rewritten
as:
\[ \sigma^2_{K,T}(S_t, t) = 2\, \frac{\dfrac{\partial C_t(K,T)}{\partial T} + \delta\,C_t(K, T) + (r - \delta) K \dfrac{\partial C_t(K,T)}{\partial K}}{K^2 \dfrac{\partial^2 C_t(K,T)}{\partial K^2}} , \tag{3.19} \]
where $\sigma^2_{K,T}(S_t, t) \stackrel{\mathrm{def}}{=} E^Q\{\sigma^2(S_T, T, \cdot) \mid S_T = K, \mathcal{F}_t\}$. This is the Dupire
formula, Dupire (1994). The Dupire formula gives a representation of the
local volatility function completely in terms of observed call prices and their
derivatives.
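As a rough numerical illustration of (3.19), the following sketch evaluates the Dupire formula by central finite differences on a call price function. The helper names (`bs_call`, `dupire_local_vol`), the step sizes and the parameter values are illustrative assumptions and not part of the text; time $t$ is set to zero so that $\tau = T$. If the call prices are generated by the BS formula with a constant volatility, the formula should return that volatility.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, sigma, r, delta):
    """BS price of a European call with dividend yield delta (t = 0, tau = T)."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * math.exp(-delta * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def dupire_local_vol(call, K, T, r, delta, hK=0.05, hT=1e-4):
    """Local volatility from (3.19); call(K, T) is any smooth call price function.
    Derivatives are approximated by central differences (step sizes are assumptions)."""
    dC_dT = (call(K, T + hT) - call(K, T - hT)) / (2.0 * hT)
    dC_dK = (call(K + hK, T) - call(K - hK, T)) / (2.0 * hK)
    d2C_dK2 = (call(K + hK, T) - 2.0 * call(K, T) + call(K - hK, T)) / hK ** 2
    numerator = dC_dT + delta * call(K, T) + (r - delta) * K * dC_dK
    local_var = 2.0 * numerator / (K ** 2 * d2C_dK2)
    return math.sqrt(local_var)

# Illustrative parameters (assumptions): flat BS world with sigma = 0.2
S0, r, delta, sigma = 100.0, 0.03, 0.01, 0.2
flat_call = lambda K, T: bs_call(S0, K, T, sigma, r, delta)
local_vol = dupire_local_vol(flat_call, K=110.0, T=0.5, r=r, delta=delta)
print(local_vol)
```

In practice the call price function would come from a smoothed surface of observed prices; differentiating raw quotes twice in $K$ is numerically delicate, which is one reason why the text later works with the IVS instead.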
It remains to show that local volatility $\sigma_{K,T}(S_t, t) \stackrel{\mathrm{def}}{=} \sqrt{\sigma^2_{K,T}(S_t, t)}$ is
indeed a real number. This can be seen by the following observations: the
denominator of (3.19) is positive by no-arbitrage, since the transition
probability must be positive on the entire support. Positiveness of the numerator
is obtained by portfolio dominance arguments similar to those in Merton
(1973), see Andersen and Brotherton-Ratcliffe (1997). We have:
There is a remarkable second approach for deriving the Dupire formula (3.19),
Dupire (1994). This approach directly builds on the transition probability.
While in general it is not possible to recover the dynamics of the asset price
process from the transition probability, there is one exception: if one considers
one-factor diffusions only, i.e. if one initially assumes instantaneous volatility
to be a deterministic function in the asset price and time. The reason is that
there exists a dual or adjoint PDE to the BS PDE (2.13) which has, instead
of S and t, K and T as independent variables.
Assume now that under the risk neutral measure Q the asset price dynam-
ics are given by:
\[ \frac{dS_t}{S_t} = (r - \delta)\,dt + \sigma(S_t, t)\,d\overline{W}_t^{(0)} , \tag{3.22} \]
where the notation stays as before except that $\sigma(S_t, t)$ is deterministic.
It is well known that the risk neutral transition probability
$\phi(K, T \mid S_t, t) \stackrel{\mathrm{def}}{=} e^{r\tau}\,\partial^2 C_t(K, T)/\partial K^2$, introduced in (2.38), satisfies the BS PDE (2.13) with
terminal condition:
\[ \phi(K, T \mid S_T, T) = \delta_K(S_T) . \tag{3.23} \]
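\[ \frac{\partial \phi(K', T \mid S_t, t)}{\partial T} = \frac{1}{2} \frac{\partial^2}{\partial (K')^2} \left\{ \sigma^2(K', T)\,K'^2\,\phi(K', T \mid S_t, t) \right\} - \frac{\partial}{\partial K'} \left\{ (r - \delta) K'\,\phi(K', T \mid S_t, t) \right\} \tag{3.24} \]
for fixed $S_t$ and $t$, over all maturities $T$ and strikes $K'$ with initial condition: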
To derive the Dupire formula one substitutes for φ(K, T |St , t). Evaluating
the first term in (3.24) yields:
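\[ \frac{\partial \phi(K', T \mid S_t, t)}{\partial T} = \frac{\partial}{\partial T} \left\{ e^{r\tau} \frac{\partial^2 C_t(K', T)}{\partial (K')^2} \right\} = r e^{r\tau} \frac{\partial^2 C_t(K', T)}{\partial (K')^2} + e^{r\tau} \frac{\partial^2}{\partial (K')^2} \frac{\partial C_t(K', T)}{\partial T} . \tag{3.26} \]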
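\[ r \frac{\partial^2 C_t(K', T)}{\partial (K')^2} + \frac{\partial^2}{\partial (K')^2} \frac{\partial C_t(K', T)}{\partial T} = \frac{1}{2} \frac{\partial^2}{\partial (K')^2} \left\{ \sigma^2(K', T)\,K'^2 \frac{\partial^2 C_t}{\partial (K')^2} \right\} - (r - \delta) \frac{\partial}{\partial K'} \left\{ K' \frac{\partial^2 C_t}{\partial (K')^2} \right\} . \tag{3.28} \]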
\[ C_t(K, T),\quad K \frac{\partial C_t}{\partial K},\quad K^2 \frac{\partial^2 C_t}{\partial K^2},\quad K^2 \frac{\partial^3 C_t}{\partial K^3} \to 0 \quad \text{as } K \to \infty . \tag{3.30} \]
Note that (3.30) has implications for the tail behavior of the (risk-neutral)
transition density $\phi(K, T \mid S_t, t)$, which must be $O(K^{-2})$. With regard to the
BS pricing function, it is evident that the assumptions (3.30) hold given the
exponential decay of the (log-normal) transition density, see Equation (2.40).
From (3.29) the Dupire formula (3.19) is readily obtained by solving for
$\sigma^2(K, T)$. The final arguments are the same as given in Section 3.3 following
Equation (3.19). Uniqueness is proved in Derman and Kani (1994b).
3.5 From the IVS to the LVS
An open question up to now is how the IVS and the LVS can be linked. This
would be desirable from two points of view: first, in a static situation, one
could immediately recover the LVS, which in principle is unobservable, from
the easily observable IVS. Second, in a dynamic context, it adds additional
value to the dynamical description of the IVS, for instance in terms of the
semiparametric factor model, Chapter 5: given a low-dimensional description
of the IVS dynamics, a representation of the Dupire formula in terms of IV
could be exploited to yield the corresponding LVS dynamics. This may help
improve the hedging performance of local volatility models. Another obvious
application could be stress tests for portfolios of exotic options. Here, one could
simulate the IVS within the semiparametric factor model. IVS scenarios are
then converted into LVS scenarios. The latter are the basis for correctly pricing
the exotic options in the portfolio and computing a value at risk measure.
The central idea to obtain such an IV counterpart of the Dupire formula is
to exploit the BS formula as an analytical vehicle, Andersen and Brotherton-
Ratcliffe (1997) and Dempster and Richards (2000). More precisely, we insert
the BS formula and its derivatives into the Dupire formula (3.19). In doing so,
the BS formula is interpreted as if IV depended on K and T as one empirically
observes on the markets, i.e. we assume:
\[ C^{BS}(S_t, t, K, T, \sigma, r, \delta) = C^{BS}(S_t, t, K, T, \hat{\sigma}(K, T), r, \delta) . \tag{3.31} \]
Furthermore, we maintain our assumption that local volatility is a determin-
istic function.
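Applying the chain rule of differentiation, we obtain for the numerator of
the Dupire formula, suppressing the dependence of $\hat{\sigma}$ on $K$ and $T$:
\[ 2 \left\{ \frac{\partial C_t^{BS}}{\partial T} + \frac{\partial C_t^{BS}}{\partial \hat{\sigma}} \frac{\partial \hat{\sigma}}{\partial T} + \delta\,C_t^{BS} + (r - \delta) K \left( \frac{\partial C_t^{BS}}{\partial K} + \frac{\partial C_t^{BS}}{\partial \hat{\sigma}} \frac{\partial \hat{\sigma}}{\partial K} \right) \right\} . \tag{3.32} \]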
Now, the analytical expressions for the BS formula (2.23) and its K- and
T -derivatives in (2.33) and in (2.36) are inserted. Most of the terms cancel
out. The strategy in the further derivation is to express the remaining terms
using the volatility derivative, the vega (2.30). This yields:
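\[ 2\, \frac{\partial C_t^{BS}}{\partial \hat{\sigma}} \left\{ \frac{\hat{\sigma}}{2\tau} + \frac{\partial \hat{\sigma}}{\partial T} + (r - \delta) K \frac{\partial \hat{\sigma}}{\partial K} \right\} . \tag{3.33} \]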
Finally, collecting the numerator (3.33) and the denominator (3.35) shows:
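\[ \sigma^2_{K,T}(S_t, t) = \frac{\dfrac{\hat{\sigma}}{\tau} + 2 \dfrac{\partial \hat{\sigma}}{\partial T} + 2 K (r - \delta) \dfrac{\partial \hat{\sigma}}{\partial K}}{K^2 \left\{ \dfrac{1}{K^2 \hat{\sigma} \tau} + 2 \dfrac{d_1}{K \hat{\sigma} \sqrt{\tau}} \dfrac{\partial \hat{\sigma}}{\partial K} + \dfrac{d_1 d_2}{\hat{\sigma}} \left( \dfrac{\partial \hat{\sigma}}{\partial K} \right)^2 + \dfrac{\partial^2 \hat{\sigma}}{\partial K^2} \right\}} . \tag{3.36} \]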
This is the Dupire formula in terms of the IVS and its derivatives.
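The IV representation (3.36) can be checked against the price representation (3.19) numerically. The sketch below is only an illustration under stated assumptions: the smile function `smile`, all parameter values and the helper names are hypothetical, $t = 0$ so that $\tau = T$, and $d_1$, $d_2$ are taken to be the usual BS quantities evaluated at $\hat{\sigma}(K, T)$. Local variance is computed once from (3.36) with analytic IV derivatives and once from (3.19) with finite differences on the BS prices implied by the same smile.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, tau, sigma, r, delta):
    """BS call price with continuous dividend yield delta."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def local_var_from_iv(S, K, tau, r, delta, iv, iv_T, iv_K, iv_KK):
    """Right-hand side of (3.36); iv_T, iv_K, iv_KK are the IV derivatives."""
    root_tau = math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta + 0.5 * iv ** 2) * tau) / (iv * root_tau)
    d2 = d1 - iv * root_tau
    numerator = iv / tau + 2.0 * iv_T + 2.0 * K * (r - delta) * iv_K
    bracket = (1.0 / (K ** 2 * iv * tau)
               + 2.0 * d1 / (K * iv * root_tau) * iv_K
               + d1 * d2 / iv * iv_K ** 2
               + iv_KK)
    return numerator / (K ** 2 * bracket)

# Hypothetical smooth smile (an assumption), quadratic in m = ln(K / S0)
S0, r, delta = 100.0, 0.03, 0.01

def smile(K, T):
    m = math.log(K / S0)
    return 0.25 - 0.10 * m + 0.10 * m * m + 0.02 * T

K, T = 95.0, 1.0
m = math.log(K / S0)
iv_T = 0.02                           # d smile / dT
iv_K = (-0.10 + 0.20 * m) / K         # d smile / dK
iv_KK = (0.30 - 0.20 * m) / K ** 2    # d^2 smile / dK^2
lv_from_iv = local_var_from_iv(S0, K, T, r, delta, smile(K, T), iv_T, iv_K, iv_KK)

# Cross-check: Dupire formula (3.19) by central differences on call prices
call = lambda k, t: bs_call(S0, k, t, smile(k, t), r, delta)
hK, hT = 0.05, 1e-4
dC_dT = (call(K, T + hT) - call(K, T - hT)) / (2.0 * hT)
dC_dK = (call(K + hK, T) - call(K - hK, T)) / (2.0 * hK)
d2C_dK2 = (call(K + hK, T) - 2.0 * call(K, T) + call(K - hK, T)) / hK ** 2
lv_from_prices = 2.0 * (dC_dT + delta * call(K, T) + (r - delta) * K * dC_dK) / (K ** 2 * d2C_dK2)

print(lv_from_iv, lv_from_prices)
```

Both numbers should agree up to the finite-difference error; for a flat smile (all IV derivatives zero) the function returns $\hat{\sigma}^2$.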
Obviously, this approach does not provide a theory unifying both concepts.
This requires more careful treatment, and – up to now – has only been achieved
in certain asymptotic situations, Berestycki et al. (2002) and Section 3.6.
Rather, it is an ad hoc, but successful procedure to link the unobservable
LVS with the IVS. Given (3.36) and (3.39), one estimates the IVS and plugs
it into (3.36), which yields an estimate of the LVS. The LVS is then used
as input factor in pricing algorithms, e.g. in finite difference schemes that
solve the generalized BS PDE, Andersen and Brotherton-Ratcliffe (1997) and
Randall and Tavella (2000).
For a deeper understanding of formula (3.36) it is instructive to inspect
the situation of no strike-dependence in the IVS. In this case all derivatives
with respect to $K$ vanish and (3.36) reduces to
\[ \sigma_T^2(t) = \hat{\sigma}^2 + 2 \tau \hat{\sigma} \frac{\partial \hat{\sigma}}{\partial T} , \tag{3.37} \]
which implies:
\[ \hat{\sigma}^2 = \frac{1}{\tau} \int_t^T \sigma_T^2(u)\,du . \tag{3.38} \]
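The step from (3.37) to (3.38) can be illustrated numerically. The sketch below assumes a hypothetical strike-independent IV term structure (`implied_vol`, linear in time to maturity) and $t = 0$, so that $T = \tau$; both are assumptions made only for this example. It computes the local variance from (3.37) and verifies that its time average reproduces the squared IV, as stated in (3.38).

```python
def implied_vol(tau):
    """Hypothetical strike-independent IV term structure (an assumption)."""
    return 0.20 + 0.05 * tau

def implied_vol_slope(tau):
    """Derivative of implied_vol with respect to maturity."""
    return 0.05

def local_variance(tau):
    """Equation (3.37) with t = 0, so that T = tau."""
    iv = implied_vol(tau)
    return iv ** 2 + 2.0 * tau * iv * implied_vol_slope(tau)

def average_local_variance(T, n=2000):
    """Right-hand side of (3.38): (1/T) * integral_0^T local_variance(u) du,
    approximated by the trapezoidal rule with n steps."""
    h = T / n
    total = 0.5 * (local_variance(0.0) + local_variance(T))
    total += sum(local_variance(k * h) for k in range(1, n))
    return total * h / T

T = 1.5
squared_iv = implied_vol(T) ** 2
averaged = average_local_variance(T)
print(squared_iv, averaged)
```

The agreement follows from $\partial(\tau \hat{\sigma}^2)/\partial T = \hat{\sigma}^2 + 2\tau\hat{\sigma}\,\partial\hat{\sigma}/\partial T$, which is exactly the right-hand side of (3.37).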
[Figure 3.1: surface plots not reproduced. Top panel: IVS (SCMivs.xpl). Bottom panel: LVS: 20000502 (SCMlvs.xpl).]
Fig. 3.1. Top panel: DAX option IVS on 20000502. IV observations are displayed
as black dots; the surface estimate is obtained from a local quadratic estimator with
localized bandwidths. Bottom panel: LVS on 20000502; obtained from the IVS given
in the top panel via the moneyness representation of the Dupire formula (3.39).
[Figure 3.2: plot not reproduced (SCMlvs.xpl).]
Fig. 3.2. DAX option implied (squares) versus local (circles) volatility smiles for
one month and three months to expiry respectively on 20000502 taken as slices from
Figure 3.1.
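\[ \sigma^2_{\kappa_f, \tau}(S_t, t) = \frac{\hat{\sigma}^2 + 2 \hat{\sigma} \tau \dfrac{\partial \hat{\sigma}}{\partial \tau}}{1 + 2 \kappa_f \sqrt{\tau}\, d_1 \dfrac{\partial \hat{\sigma}}{\partial \kappa_f} + d_1 d_2 (\kappa_f)^2 \tau \left( \dfrac{\partial \hat{\sigma}}{\partial \kappa_f} \right)^2 + \hat{\sigma} \tau (\kappa_f)^2 \dfrac{\partial^2 \hat{\sigma}}{\partial \kappa_f^2}} , \tag{3.39} \]
where $d_1$ and $d_2$ are interpreted as $d_1 = -\ln(\kappa_f)/(\hat{\sigma} \sqrt{\tau}) + 0.5\,\hat{\sigma} \sqrt{\tau}$ and
$d_2 = d_1 - \hat{\sigma} \sqrt{\tau}$.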
In Figure 3.1, we present an estimate of the LVS based on the moneyness
representation of the Dupire formula. The derivatives of the IVS are estimated
as derivatives of local polynomials of order two which are used to smooth the
IVS, see Section 4.3 for a description of this procedure. Due to the different
scales, the LVS appears to be flatter than the IVS at first glance. As we show
in Figure 3.2, which displays slices from both functions at the maturity of one
and three months, this impression is erroneous: it is the LVS which is steeper
than the IVS (leaving out the spiky short term local volatilities). Derman et al.
(1996b) report as an empirical regularity in equity markets that the smile of
the local volatility is approximately two times steeper than the IV smile. They
call this relationship the two-times-IV-slope-rule for local volatility. Using a
recent result by Berestycki et al. (2002) we shall prove in Section 3.7 that this
conjecture can be made more precise for short maturity ATM options.
In fact, there are a large number of other procedures to reconstruct the
LVS. They will be separately surveyed in Section 3.10, among them the implied
tree approaches. Another important stream of literature calls for a more formal
mathematical treatment and recovers the LVS from the Dupire formula or the
dual PDE in terms of an (ill-posed) inverse problem.
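As a final cursory remark, note that Equation (3.35), if we ignore the initial
$K^2$-term, is nothing but an expansion of the state price density in terms of the
BS vega, the smile and its first and second order derivatives, see the discussion
in Section 2.4, pp. 19:
\[ \phi(K, T \mid S_t, t) = e^{-\delta \tau} S_t \sqrt{\tau}\, \varphi(d_1) \times \left\{ \frac{1}{K^2 \hat{\sigma} \tau} + \frac{2 d_1}{K \hat{\sigma} \sqrt{\tau}} \frac{\partial \hat{\sigma}}{\partial K} + \frac{d_1 d_2}{\hat{\sigma}} \left( \frac{\partial \hat{\sigma}}{\partial K} \right)^2 + \frac{\partial^2 \hat{\sigma}}{\partial K^2} \right\} , \tag{3.40} \]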
In estimating the smile and its derivatives, expression in (3.40) may serve
as a vehicle to recover the state price density, see Huynh et al. (2002) and
Brunner and Hafner (2003) for details.
3.6 Asymptotic relations between implied and local volatility
Recent research has identified situations in which the relation between im-
plied and local volatility can be established more exactly. These results are of
asymptotic nature and more general than those stated so far, since they allow
the local volatility to be strike-dependent. More precisely, Berestycki et al.
(2002) show that near expiry, IV can be represented as the spatial harmonic
mean of local volatility. The key consequence of this result is that the IVS
can be extended up to τ = 0 as a continuous function. This can be exploited
in the calibration of local volatility models, Section 3.10.3. Additionally, they
prove that the representation (3.38), i.e. squared IV as an average of squared
local volatility, holds also for deep OTM options under certain assumptions.
To obtain their results, Berestycki et al. (2002) assume that local volatility
is deterministic. As noted in (3.6), this implies
\[ \sigma^2_{K,T}(S_t, t) = \sigma^2(K, T) , \tag{3.41} \]
i.e. local volatility is the instantaneous volatility function for all $S_t = K$ and
$t = T$. Further they transform the Dupire formula into the (inverse) log-forward
moneyness space, similarly as we have done to derive the forward
moneyness representation for the empirical demonstration in the previous
section. Define
\[ x \stackrel{\mathrm{def}}{=} -\ln \kappa_f = \ln(S_t / K) + (r - \delta) \tau . \tag{3.42} \]
To gain an insight into the nature of this first result, consider the following:
let $\hat{\sigma}(x, 0)$ be the unique solution to the PDE at $\tau = 0$. Then (3.43) reduces
to
\[ \hat{\sigma}^2(x, 0) - \sigma^2(x, 0) \left( 1 - \frac{x}{\hat{\sigma}(x, 0)} \frac{\partial \hat{\sigma}(x, 0)}{\partial x} \right)^2 = 0 . \tag{3.44} \]
holds in fact.
Result (3.46) establishes that for options near to expiry IV can be un-
derstood as the harmonic mean of local volatility. Note that – unlike the
situations seen so far – the mean is taken across log-forward moneyness, i.e.
in a spatial sense across the LVS. Berestycki et al. (2002) point out that this
result relies on the particular boundary condition imposed by the call payoff
function: $\psi(x) = (e^x - 1)^+$ (here in the inverse log-forward moneyness
notation). Indeed, if it is replaced by any strictly convex function they show that
$\lim_{\tau \downarrow 0} \hat{\sigma}(x, \tau) = \sigma(x, 0)$.
The authors also provide an intuitive argument for their result: consider
the situation of an asset price process, the local volatility of which vanishes
in some interval $[\tilde{x}, 0]$ for $x < \tilde{x} < 0$. Then, we get $\hat{\sigma}(x, 0) = 0$ from (3.46).
Clearly, this result, which is obtained by averaging harmonically, is correct
also from a probabilistic point of view, since the stock starting in x will never
cross the interval and never reach the ITM region of the call. Thus the call
must have a price of zero. However, an IV of zero is inconsistent with the
simple (spatial) arithmetic averages.
For the second result, assume that local volatility is bounded away from
zero and infinity and that it has the continuous limits: $\lim_{x \uparrow \infty} \sigma(x, \tau) = \sigma_+(\tau)$
and $\lim_{x \downarrow -\infty} \sigma(x, \tau) = \sigma_-(\tau)$. Then
\[ \lim_{x \to \pm\infty} \hat{\sigma}^2(x, \tau) = \frac{1}{\tau} \int_0^\tau \sigma_\pm^2(s)\,ds . \tag{3.47} \]
For understanding this result, note that e.g. $\hat{\sigma}^2(+\infty, \tau) \stackrel{\mathrm{def}}{=} \frac{1}{\tau} \int_0^\tau \sigma_+^2(s)\,ds$
already has the correct behavior by the arguments on the non-strike dependent
local volatility in the previous section. To prove (3.47), Berestycki et al. (2002)
construct sub- and supersolutions for any $\tau > 0$ with the required behavior
at infinity and apply a comparison principle.
3.7 The two-times-IV-slope rule for local volatility
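\[ \hat{\sigma}(0, 0) + x \frac{\partial \hat{\sigma}(0, 0)}{\partial x} = \sigma(0, 0) + x\, \sigma^2(0, 0) \int_0^1 \frac{\partial \sigma(0, 0)}{\partial x} \frac{s\,ds}{\sigma^2(0, 0)} = \sigma(0, 0) + x\, \frac{1}{2} \frac{\partial \sigma(0, 0)}{\partial x} . \tag{3.49} \]
\[ 2\, \frac{\partial \hat{\sigma}(0, 0)}{\partial x} = \frac{\partial \sigma(0, 0)}{\partial x} , \tag{3.50} \]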
i.e. the two-times-IV-slope rule holds for short-to-expiry ATM options.
We complete this section by a simulation. Suppose the local volatility smile
for some close expiry date can be approximated within the interval [−0.2, 0.2]
by the function:
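\[ \sigma(x) = a (x + b)^2 + c , \tag{3.51} \]
where $a, b, c \in \mathbb{R}$. Computing the harmonic mean according to (3.48) yields
for the IV smile
\[ \hat{\sigma}(x) = \frac{x \sqrt{ac}}{\arctan\left\{ \sqrt{a/c}\,(x + b) \right\} - \arctan\left\{ \sqrt{a/c}\; b \right\}} . \tag{3.52} \]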
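A small numerical check of (3.52) and of the two-times-IV-slope rule (3.50) may be helpful. The sketch below uses the parameter values quoted for the simulation ($a = 0.5$, $b = 0.15$, $c = 0.3$). It assumes that the harmonic mean referred to in the text has the form $\{\int_0^1 ds / \sigma(sx)\}^{-1}$, which is the expression that integrates to (3.52); the function names, the quadrature rule and the step sizes are illustrative choices only.

```python
import math

A, B, C = 0.5, 0.15, 0.3   # a, b, c of the simulated local volatility smile

def local_vol(x):
    """Local volatility smile (3.51)."""
    return A * (x + B) ** 2 + C

def iv_closed_form(x):
    """IV smile (3.52); valid for x != 0."""
    k = math.sqrt(A / C)
    return x * math.sqrt(A * C) / (math.atan(k * (x + B)) - math.atan(k * B))

def iv_harmonic_mean(x, n=2000):
    """Assumed harmonic mean {int_0^1 ds / local_vol(s * x)}^{-1}, midpoint rule."""
    h = 1.0 / n
    integral = sum(h / local_vol((k + 0.5) * h * x) for k in range(n))
    return 1.0 / integral

# Closed form versus numerical harmonic mean at the ends of [-0.2, 0.2]
for x in (-0.2, 0.2):
    print(x, iv_closed_form(x), iv_harmonic_mean(x))

# Slopes at the money (x = 0) by central differences
eps = 1e-3
iv_slope = (iv_closed_form(eps) - iv_closed_form(-eps)) / (2.0 * eps)
lv_slope = (local_vol(eps) - local_vol(-eps)) / (2.0 * eps)
print(2.0 * iv_slope, lv_slope)
```

The last line should show that twice the ATM slope of the IV smile approximately equals the ATM slope of the local volatility smile, in line with (3.50).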
[Figure 3.3: plot not reproduced (SCMsimuIVLV.xpl).]
Fig. 3.3. Simulation of option implied (squares) versus local (circles) volatility
smiles according to (3.51) and (3.52) for a = 0.5, b = 0.15, c = 0.3. Moneyness
is (inverse) forward moneyness $x \stackrel{\mathrm{def}}{=} -\ln \kappa_f$. The interval [−0.2, 0.2] corresponds to
[1.22, 0.81] in the usual forward moneyness metric, compare Figure 3.2.
3.8 The K-strike and T-maturity forward risk-adjusted measure
which we give in a simplified setting here for the sake of clarity. Originally,
the authors allow for multi-factor dynamics. The process of the local variance
$(\sigma^2_{K,T}(S_t, t))_{0 \le t \le T^*}$ is adapted to the filtration $(\mathcal{F}_t)_{0 \le t \le T^*}$ generated by two
uncorrelated Brownian motions $(W_t^{(0)})_{0 \le t \le T^*}$ and $(W_t^{(1)})_{0 \le t \le T^*}$. The drift
process $(\alpha_{K,T}(S_t, t))_{0 \le t \le T^*}$ and the volatility process $(\theta_{K,T}(S_t, t))_{0 \le t \le T^*}$,
which reflects the sensitivity of the LVS with respect to random shocks, are not
further specified, but satisfy mild integrability and measurability conditions
(see Derman and Kani (1998) for details).
In this set-up instantaneous variance is given by
\[ \sigma^2_{S_t,t}(S_t, t) = \sigma^2_{S_t,t}(S_0, 0) + \int_0^t \alpha_{S_t,t}(S_s, s)\,ds + \int_0^t \theta_{S_t,t}(S_s, s)\,dW_s^{(1)} , \tag{3.54} \]
measure also the transition probability $\phi(K, T \mid S_t, t) = E^Q\{\delta_K(S_T) \mid \mathcal{F}_t\}$ is a
martingale. Thus, it evolves according to a SDE of the form
The previous analysis has shown, compare (3.17), that local volatility
$\sigma^2_{K,T}(S_t, t)$ obeys
Now introduce new Brownian motions $\widehat{W}_t^{(i)} = \overline{W}_t^{(i)} - \int_0^t \zeta^{(i)}_{K,T}(S_s, s)\,ds$, for
$i = 0, 1$. From (3.60) and (3.57) it is seen that the stochastic evolution of the
local variance is given by
\[ \frac{d\sigma^2_{K,T}(S_t, t)}{\sigma^2_{K,T}(S_t, t)} = \theta_{K,T}(S_t, t)\,d\widehat{W}_t^{(1)} , \tag{3.61} \]
which is a martingale.
We define the new measure $Q^{(K,T)}$ via its Radon-Nikodým derivative:
\[ \frac{dQ^{(K,T)}}{dQ} = \exp\left[ \sum_{i=0}^{1} \left\{ \int_0^T \zeta^{(i)}_{K,T}(S_s, s)\,d\overline{W}_s^{(i)} - \frac{1}{2} \int_0^T \left( \zeta^{(i)}_{K,T}(S_s, s) \right)^2 ds \right\} \right] . \tag{3.62} \]
This measure explicitly depends on $K$ and $T$. Hence it is called the K-strike
and T-maturity forward risk-adjusted measure, in analogy to the theory
of interest rates. Denoting the expectation with respect to the new measure
by $E^{(K,T)}(\cdot)$ shows that (3.2) can be rewritten as
\[ \sigma^2_{K,T}(S_t, t) \stackrel{\mathrm{def}}{=} E^Q\{\sigma^2_{S_T,T}(S_T, T) \mid S_T = K, \mathcal{F}_t\} = E^{(K,T)}\{\sigma^2_{S_T,T}(S_T, T) \mid \mathcal{F}_t\} , \tag{3.63} \]
which provides the desired representation.
3.9 Model-free (implied) volatility forecasts
In a large number of studies that have been surveyed in Section 2.7, the qual-
ity of IV as a predictor of stock price volatility is discussed. However, it may
be advantageous to resort to a volatility measure implied from options that
is independent of the BS model, or at best: model-free. This goal has been
achieved by Britten-Jones and Neuberger (2000). They assume that dividends
and interest rates are zero. In the presence of nonzero interest rates and div-
idends, Britten-Jones and Neuberger (2000) interpret option and asset prices
as forward prices.
Usually one is interested in comparing multi-period forecasts of volatility
with volatility over several periods. To obtain the unconditional expectation
of the Dupire formula (3.19), one first integrates across all strikes K:
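\[ E^Q\{\sigma^2(S_T, T, \cdot) \mid \mathcal{F}_t\} = \int_0^\infty E^Q\{\sigma^2(S_T, T, \cdot) \mid S_T = K, \mathcal{F}_t\}\, \phi(K, T \mid S_t, t)\,dK = 2 \int_0^\infty \frac{\partial C_t(K, T)}{\partial T}\, K^{-2}\,dK . \tag{3.64} \]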
For the forecast between the two time horizons T1 < T2 , integrate again
with respect to time to maturity. This yields:
\[ E^Q\left\{ \int_{T_1}^{T_2} \sigma^2(S_T, T, \cdot)\,dT \,\Big|\, \mathcal{F}_t \right\} = 2 \int_0^\infty \frac{C_t(K, T_2) - C_t(K, T_1)}{K^2}\,dK . \tag{3.65} \]
Thus, if the IVS is flat in K and T, but not necessarily a constant, squared
IV, i.e. $\hat{\sigma}^2$ in our common notation, is the risk-neutral forecast as given in
(3.65). There is also an intuitive argument: a lot of processes are consistent
with the squared volatility forecast (3.65). Naturally, one of them is the BS
deterministic (squared) volatility process. Hence, it precisely provides the fore-
cast.
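The claim that the BS volatility provides the forecast (3.65) can be illustrated numerically. The sketch below assumes zero interest rates and dividends, as in the setting of Britten-Jones and Neuberger (2000), and generates call prices from the BS formula with a constant volatility. The function names, the substitution $u = \ln(K/S)$, the truncation of the strike range and the grid size are illustrative assumptions. The right-hand side of (3.65) should then be close to $\sigma^2 (T_2 - T_1)$.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_zero_rates(S, K, T, sigma):
    """BS call price for r = delta = 0 and t = 0."""
    d1 = (math.log(S / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * norm_cdf(d2)

def model_free_variance(call, S, T1, T2, u_max=3.0, n=6000):
    """Right-hand side of (3.65), integrated over u = ln(K/S) on [-u_max, u_max]
    by the trapezoidal rule. With K = S * exp(u) one has dK / K^2 = du / K."""
    h = 2.0 * u_max / n
    total = 0.0
    for k in range(n + 1):
        u = -u_max + k * h
        K = S * math.exp(u)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (call(K, T2) - call(K, T1)) / K
    return 2.0 * total * h

S0, sigma, T1, T2 = 100.0, 0.2, 0.5, 1.0
call = lambda K, T: bs_call_zero_rates(S0, K, T, sigma)
forecast = model_free_variance(call, S0, T1, T2)
print(forecast, sigma ** 2 * (T2 - T1))
```

With market data the integral would have to be approximated from a finite set of traded strikes, so the truncation and interpolation error would need separate attention.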
Only if the IVS were a constant, i.e. if no stochastics were involved, IV would
be an unbiased forecast for realized volatility. This, however, is a case of little
interest.
The forecast (3.64) is a risk-neutral one. It will necessarily differ from
the forecast under the objective measure, unless volatility risk is unpriced,
and both forecasts cannot simply be compared. Nevertheless, studying the
systematic deviations between realized variance and its risk-neutral forecast
would certainly contribute to our understanding of how volatility risk is priced.
3.10 Local volatility models
Here, we survey models and techniques to recover the LVS from observed
option prices. First, deterministic implied trees are presented. They are grown
either by forward induction or by backward induction. Next, trinomial trees
are discussed. Stochastic implied trees are considered in Section 3.10.2. The
section concludes with methods motivated from continuous time theory.
Valuation methods based on trees are workhorses in option pricing. Pioneered
by Cox, Ross and Rubinstein (1979) (CRR), they provide a simple
framework in which pricing of path-independent and path-dependent options
alike can be accomplished fast and efficiently by backward induction. Most
importantly, under certain regularity conditions, they are the discrete time
approximations to the diffusion
[Figure 3.4: schematic binomial trees not reproduced; both panels plot stock price against time.]
Fig. 3.4. Left panel: standard binomial tree, e.g. as in Cox et al. (1979). Right
panel: implied binomial tree derived from market data, Derman and Kani (1994b).
Derman and Kani (1994b), Barle and Cakici (1998). The principle of
constructing implied binomial trees according to Derman and Kani (1994b)
and Barle and Cakici (1998) is forward induction. The tree is (for simplic-
ity) equally spaced with ∆t and has levels j = 1, . . . , J. Since the tree is
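where
$$\lambda_{i,j+1} = \begin{cases} q_{j,j}\,\lambda_{j,j} & \text{for } i = j+1 , \\ q_{i-1,j}\,\lambda_{i-1,j} + (1-q_{i,j})\,\lambda_{i,j} & \text{for } 2 \le i \le j , \\ (1-q_{1,j})\,\lambda_{1,j} & \text{for } i = 1 . \end{cases} \tag{3.72}$$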
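As a hedged illustration of (3.72), the following sketch propagates the quantities λ from level j to level j + 1. The function name `next_arrow_debreu` and the list layout (position i − 1 holds node i) are assumptions made for this example; the update is taken literally from (3.72) and contains no discount factor.

```python
def next_arrow_debreu(lam, q):
    """Apply (3.72): lam[i-1] = lambda_{i,j}, q[i-1] = q_{i,j} for i = 1..j.

    Returns the list of lambda_{i,j+1} for i = 1..j+1.
    """
    j = len(lam)
    if len(q) != j:
        raise ValueError("need one transition probability per node")
    new = [0.0] * (j + 1)
    new[0] = (1.0 - q[0]) * lam[0]                 # lowest node, i = 1
    for i in range(2, j + 1):                      # interior nodes, 2 <= i <= j
        new[i - 1] = q[i - 2] * lam[i - 2] + (1.0 - q[i - 1]) * lam[i - 1]
    new[j] = q[j - 1] * lam[j - 1]                 # highest node, i = j + 1
    return new
```

Starting from a root value of one, repeated calls build the whole tree of λ values level by level; the values at each level sum to one because every node distributes its full mass.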
$$\Delta^C_{i_0} = q_{i_0,j}\,\lambda_{i_0,j}\,(S_{i_0+1,j+1} - K) , \tag{3.73}$$
3.10 Local volatility models 73
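[Figure 3.5: one step of the tree from level j (time t_j) to level j + 1 (time t_{j+1}); node (i0, j) with stock price s_{i0,j} branches up with probability q_{i0,j} to S_{i0+1,j+1} and down to S_{i0,j+1}]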
Fig. 3.5. Construction of the implied binomial tree from level j to level j + 1 accord-
ing to Derman and Kani (1994b) and Barle and Cakici (1998) by forward induction.
si0 ,j denotes the (known) stock price at node (i0 , j), Si0 +1,j+1 the (unknown) stock
price at node (i0 + 1, j + 1). qi0 ,j is the (unknown) risk neutral transition probability
from node (i0 , j) to node (i0 + 1, j + 1). At level j there are i = 1, . . . , j nodes (i, j).
where $\Delta^C_{i_0} \stackrel{\mathrm{def}}{=} e^{r\Delta t}\, C(K, t_{j+1}) - \sum_{i=i_0+1}^{j} \lambda_{i,j+1}(F_{i,j} - K)$. Equation (3.73)
depends on the two unknown parameters $q_{i_0,j}$ and $S_{i_0+1,j+1}$. Exploiting the risk
neutrality condition (3.69), we obtain from (3.73) the fundamental recursion
formula for the implied binomial trees by Derman and Kani (1994b) and Barle
and Cakici (1998):
$$S_{i_0+1,j+1} = \frac{\Delta^C_{i_0}\, S_{i_0,j+1} - \lambda_{i_0,j}\, K\, (F_{i_0,j} - S_{i_0,j+1})}{\Delta^C_{i_0} - \lambda_{i_0,j}\,(F_{i_0,j} - S_{i_0,j+1})} . \tag{3.74}$$
Using (3.69) and (3.74) iteratively, one solves for $S_{i_0+1,j+1}$ and $q_{i_0,j}$
through the upper part of the tree, if an initial $S_{i_0,j+1}$ is known. Indeed,
there are 2j + 1 unknown parameters in the tree at level j: j + 1 stock prices
and j transition probabilities, while the number of equations in (3.69) and
(3.71) is only 2j. This remaining degree of freedom is closed by fixing the
root (the center) of the tree. If the number of nodes j + 1 is odd, one fixes
$S_{j/2+1,j+1} = S$. Otherwise, if the number of nodes j + 1 is even, one employs
the logarithmic centering condition known from the CRR tree, i.e. one posits
$S_{(j+1)/2,j+1}\, S_{(j+3)/2,j+1} = S^2$. Once the center is fixed, the recursions (3.69)
and (3.74) can be used to unfold the upper part of the tree.
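A single step of this upward recursion may be sketched as follows. The function name `upper_node` and its arguments are hypothetical; the risk neutrality condition (3.69) is assumed here to have the one-period form F = q S_up + (1 − q) S_low, from which the transition probability is backed out once (3.74) has delivered the new node.

```python
def upper_node(delta_c, s_low, lam, strike, fwd):
    """One step of recursion (3.74) for the upper part of the tree (sketch).

    delta_c : the quantity Delta^C_{i0} defined below (3.73)
    s_low   : known lower successor price S_{i0,j+1}
    lam     : lambda_{i0,j}
    strike  : strike K of the call used at this node
    fwd     : forward price F_{i0,j}
    Returns (S_{i0+1,j+1}, q_{i0,j}).
    """
    denom = delta_c - lam * (fwd - s_low)
    if denom == 0.0:
        raise ZeroDivisionError("recursion (3.74) is degenerate at this node")
    s_up = (delta_c * s_low - lam * strike * (fwd - s_low)) / denom
    # Assumed form of (3.69): fwd = q * s_up + (1 - q) * s_low.
    q = (fwd - s_low) / (s_up - s_low)
    return s_up, q
```

Calling the function node by node from the center upwards, with the freshly computed price serving as `s_low` of the next node, unfolds the upper half of one level.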
Similarly, the lower part of the tree is grown from put prices. One steps
down from the center, and the recursion formula (3.74) is altered to
$$S_{i_0,j+1} = \frac{\Delta^P_{i_0}\, S_{i_0+1,j+1} - \lambda_{i_0,j}\, K\, (S_{i_0+1,j+1} - F_{i_0,j})}{\Delta^P_{i_0} - \lambda_{i_0,j}\,(S_{i_0+1,j+1} - F_{i_0,j})} . \tag{3.75}$$
The trees by Derman and Kani (1994b) and Barle and Cakici (1998) differ
in the choice of the strike prices and the centering condition. Derman and
Kani (1994b) put $K = s_{i_0,j}$ and $S = s_{1,1}$, i.e. they fix the center of the tree
at the current asset price. Barle and Cakici (1998) choose $K = F_{i_0,j}$ and
$S = s_{1,1}\, e^{(r-\delta)t}$, i.e. their tree bends upward with the risk-neutral drift. They
show that this choice produces a better fit to the IV smile, especially when
interest rates are very high.
Both trees are calibrated to the entire set of available option prices, both
across the strike dimension and across the term structure of the IVS. How-
ever, an inherent difficulty in both trees is the fact that none of them can
prevent transition probabilities from being negative. From negative transition
probabilities, arbitrage possibilities ensue. Derman and Kani (1994b) avoid
this by checking node by node whether $F_{i,j} < S_{i,j+1} < F_{i+1,j}$. If this
condition is violated, they take a stock price that keeps the logarithmic spacing
between neighboring nodes equal to the corresponding nodes at the previous
level. Barle and Cakici (1998) propose to set $S_{i,j+1} = (F_{i,j} + F_{i+1,j})/2$. But
even with these modifications, as the authors note, negative transition
probabilities cannot be ruled out entirely.
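The node check and the Barle and Cakici (1998) override can be sketched in a few lines. The function names are illustrative; the arguments are the candidate node price and the two neighboring forward prices from the previous level.

```python
def admissible_node(s_new, fwd_low, fwd_high):
    """Check the condition F_{i,j} < S_{i,j+1} < F_{i+1,j}."""
    return fwd_low < s_new < fwd_high

def barle_cakici_override(s_new, fwd_low, fwd_high):
    """Keep an admissible node; otherwise use the average of the two forwards."""
    if admissible_node(s_new, fwd_low, fwd_high):
        return s_new
    return 0.5 * (fwd_low + fwd_high)
```

The override is applied immediately after each recursion step, so that the repaired price enters the computation of the next node.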
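[Figure 3.6: one step of the tree between level j (time t_j) and level j + 1 (time t_{j+1}); node (i0, j) with Q_{i0,j}, S_{i0,j} is linked with probability q_{i0,j} to the upper node with Q_{i0+1,j+1}, S_{i0+1,j+1} and to the lower node with Q_{i0,j+1}, S_{i0,j+1}]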
Fig. 3.6. Construction of the implied binomial tree from level j + 1 to level j ac-
cording to Rubinstein (1994) by backward induction. Si0 ,j denotes the asset price at
(i0 , j) and Qi0 ,j its risk neutral nodal probability. qi0 ,j is the (unknown) risk neutral
transition probability from node (i0 , j) to node (i0 + 1, j + 1). Quantities at level j + 1
are known, while those at j are unknown.
(3) $S_{i_0,j} = e^{-(r-\delta)\Delta t}\,\{(1 - q_{i_0,j})\,S_{i_0,j+1} + q_{i_0,j}\,S_{i_0+1,j+1}\}$ ,
where $q_{i,j}$ denotes again the risk neutral transition probability. $w(i,j) \stackrel{\mathrm{def}}{=} \frac{i-1}{j-1}$
is a weight function, more precisely, the fraction of the nodal probability in
node (i, j) which is going down to its preceding lower node in (i−1, j −1). The
weight function is a consequence of the assumption of path independence and
derived from the arithmetics of the CRR tree, Jackwerth (1997). Note that
our notation follows Jackwerth (1997), but is adapted to observe consistency
with our previous presentation: our tree has the root node (1, 1), which is
different to both authors who start with zero.
An interesting feature of the trees implied by backward induction is that
negative transition probabilities cannot occur by construction. This can di-
rectly be seen from (3.76). However, the crucial assumption in the tree by Ru-
binstein (1994) is the aforementioned property of path independence. While
it facilitates the tree’s construction enormously, it is also its biggest weakness:
only a single maturity of options is calibrated to the tree. This may be disad-
vantageous when pricing exotic options, the expiry of which does not match
with the maturity of the options used as inputs. This deficiency is remedied
by Jackwerth (1997) in allowing for more arbitrary weight functions w(i, j).
More precisely, he proposes the piecewise function:
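$$w(i,j) = \begin{cases} 2w\,\dfrac{i-1}{j-1} & \text{for } 0 \le \dfrac{i-1}{j-1} \le \dfrac12 , \\[2mm] -1 + 2w + (2-2w)\,\dfrac{i-1}{j-1} & \text{for } \dfrac12 < \dfrac{i-1}{j-1} \le 1 , \end{cases} \tag{3.77}$$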
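A short sketch of the piecewise weight function (3.77) follows. The function name `generalized_weight` is an illustrative choice; `w` is the free parameter of (3.77), and the default w = 0.5 is chosen here because it reproduces the linear weight (i − 1)/(j − 1) of the path-independent tree.

```python
def generalized_weight(i, j, w=0.5):
    """Piecewise linear weight function (3.77) for node (i, j), 1 <= i <= j, j >= 2."""
    if j < 2 or not 1 <= i <= j:
        raise ValueError("need j >= 2 and 1 <= i <= j")
    x = (i - 1) / (j - 1)          # relative position of the node within its level
    if x <= 0.5:
        return 2.0 * w * x
    return -1.0 + 2.0 * w + (2.0 - 2.0 * w) * x
```

Both branches meet at x = 1/2 with value w, and the function runs from zero at the lowest node to one at the highest node for any choice of w.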
Local volatilities. Given an implied tree, the local volatility σi,j at asset
price level i in time step j is calculated via:
where Ri+1,j denotes the return between the node (i, j − 1) and (i + 1, j) in
the tree. Note that the local volatility may need to be annualized to make it
comparable with IV. If we hold the horizon T of the tree fixed, and let the step
size shrink to zero, the approximation tends to the local variance function of
the corresponding underlying continuous time process.
Practically, this could be the smile function obtained from the smoothing
techniques in Chapter 4.
Let’s assume that S0 = 100, and T = 0.5 years discretized in five time
steps. In this case, the stock price evolution is found to be:
117.9
113.8
110.1 110.0
106.6 106.5
103.2 103.2 103.2
100.0 100.0 100.0
96.9 96.9 96.9
93.8 93.9
90.8 90.9
87.8
84.8
0.483
0.486
0.488 0.488
0.490 0.490
0.492 0.492 0.492
0.494 0.494
0.496 0.496
0.498
0.500
0.028
0.057
0.118 0.148
0.241 0.242
0.492 0.370 0.310
1.000 0.502 0.378
0.508 0.382 0.320
0.257 0.258
0.130 0.162
0.065
0.033
[Figure: Implied vs local volatility from implied trees]
SCMibtITTconv.xpl
Fig. 3.7. Convex IV smile (squares) computed from (3.79) and local (circles) volatil-
ity recovered from the implied binomial tree (filled circles) and trinomial tree (empty
circles).
SCMibtITTmon.xpl
Fig. 3.8. Monotone IV smile (squares) computed from $\hat\sigma = -0.06 \ln(K/S) + 0.15$
and local (circles) volatility recovered from the implied binomial tree (filled circles)
and trinomial tree (empty circles).
0.109
0.105
0.102 0.102
0.100 0.100
0.100 0.100 0.099
0.100 0.100
0.102 0.102
0.105
0.109
In Figure 3.7, we display the smile together with the terminal local volatil-
ities (filled circles). It is seen that near ATM the local volatility smile is at the
levels of the IV smile, but increases in either direction from ATM. This is due
to the fact that the IV smile is convex. If it were monotonically decreasing,
local volatility would be below IV in the right-hand side of the Figure. This is
seen for another example in Figure 3.8. The two-times-IV-slope-rule is visible
as well, Section 3.7.
[Figure 3.9: one step of the trinomial tree from level j (time t_j) to level j + 1 (time t_{j+1}); node (i0, j) with stock price s_{i0,j} branches to S_{i0+2,j+1} with probability q^u_{i0,j}, to S_{i0+1,j+1} with probability 1 − q^u_{i0,j} − q^d_{i0,j}, and to S_{i0,j+1} with probability q^d_{i0,j}]
Fig. 3.9. Construction of the implied trinomial tree from level j to level j + 1
according to Derman et al. (1996a) by forward induction. $s_{i_0,j}$ denotes the (known)
stock price at node $(i_0, j)$, $S_{i_0,j+1}$ the (known, since a priori specified) stock price
at node $(i_0, j+1)$. $q^u_{i_0,j}$ is the (unknown) risk neutral transition probability from
node $(i_0, j)$ to the upper node $(i_0+2, j+1)$, $q^d_{i_0,j}$ to $(i_0, j+1)$. At level j there are
i = 1, . . . , (2j − 1) nodes (i, j).
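$$F_{i,j} = q^u_{i,j}\, S_{i+2,j+1} + (1 - q^u_{i,j} - q^d_{i,j})\, S_{i+1,j+1} + q^d_{i,j}\, S_{i,j+1} , \tag{3.80}$$
and the option pricing equation (3.71) for calls maturing one period later
becomes:
$$C(K, t_{j+1}) = e^{-r\Delta t} \sum_{i=1}^{2j+1} \lambda_{i,j+1}\,(S_{i,j+1} - K)^+ , \tag{3.81}$$
where
$$\lambda_{i,j+1} = \begin{cases} \lambda_{2j-1,j}\, q^u_{2j-1,j} & \text{for } i = 2j+1 , \\ \lambda_{2j-2,j}\, q^u_{2j-2,j} + \lambda_{2j-1,j}\,(1 - q^u_{2j-1,j} - q^d_{2j-1,j}) & \text{for } i = 2j , \\ \lambda_{i-2,j}\, q^u_{i-2,j} + \lambda_{i-1,j}\,(1 - q^u_{i-1,j} - q^d_{i-1,j}) + \lambda_{i,j}\, q^d_{i,j} & \text{for } i = 3, \ldots, 2j-1 , \\ \lambda_{1,j}\,(1 - q^u_{1,j} - q^d_{1,j}) + \lambda_{2,j}\, q^d_{2,j} & \text{for } i = 2 , \\ \lambda_{1,j}\, q^d_{1,j} & \text{for } i = 1 . \end{cases} \tag{3.82}$$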
In fixing the strike of the option at K = Si0 +1,j+1 , Derman et al. (1996a)
show that (3.71) together with (3.80) can be solved for the unknown transition
probabilities:
$$q^u_{i_0,j} = \frac{e^{r\Delta t}\, C(S_{i_0+1,j+1}, t_{j+1}) - \sum_{i=i_0+1}^{2j-1} \lambda_{i,j}\,(F_{i,j} - S_{i_0+1,j+1})}{\lambda_{i_0,j}\,(S_{i_0+2,j+1} - S_{i_0+1,j+1})} , \tag{3.83}$$
while $q^d_{i_0,j}$ follows immediately from (3.80). This determines the upper tree
from the center, while the lower part is grown from
$$q^d_{i_0,j} = \frac{e^{r\Delta t}\, P(S_{i_0+1,j+1}, t_{j+1}) - \sum_{i=1}^{i_0-1} \lambda_{i,j}\,(S_{i_0+1,j+1} - F_{i,j})}{\lambda_{i_0,j}\,(S_{i_0+1,j+1} - S_{i_0,j+1})} . \tag{3.84}$$
Again, $q^u_{i_0,j}$ is given by (3.80).
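How the second probability follows from (3.80) can be sketched as below: once the upward probability is known, the forward condition is linear in the downward probability. The function names are hypothetical, and the validity check simply tests that all three branch probabilities are non-negative.

```python
def trinomial_down_probability(q_up, s_low, s_mid, s_high, fwd):
    """Solve the forward condition (3.80) for q^d given q^u (sketch)."""
    if not s_low < s_mid < s_high:
        raise ValueError("successor prices must be strictly increasing")
    return (q_up * (s_high - s_mid) - (fwd - s_mid)) / (s_mid - s_low)

def probabilities_valid(q_up, q_down):
    """True if up, middle and down probabilities are all non-negative."""
    return q_up >= 0.0 and q_down >= 0.0 and q_up + q_down <= 1.0
```

If the check fails at a node, the probabilities have to be overridden, which is the situation the state space choice is meant to avoid.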
Trinomial trees can be considered to be advantageous compared to bino-
mial ones, since with the same number of steps, the approximation to the
diffusion is finer. Thus, pricing is more accurate at a given number of steps.
Furthermore they provide more flexibility, which – if judiciously handled –
may help avoid negative transition probabilities as encountered in the bi-
nomial trees implied from forward induction. As a drawback, one needs to
specify a priori the state space of the evolution of the asset price. Derman et
al. (1996a) discuss several techniques of doing so, usually taking an equally
spaced trinomial tree as starting point. From our experience, the more curved
the IV function is, the more easily the standard CRR tree is overtaxed as
state space: more and more transition probabilities need to be overridden, which
can produce implausible local volatilities. Thus, the challenge in trinomial trees
lies in an appropriate choice of the state space, which should directly
reflect the structure of the (at this point unknown!) local volatility function.
0.466
0.393 0.346
0.296 0.276 0.267
0.250 0.249 0.246 0.245
0.244 0.244 0.243 0.242 0.241
0.250 0.249 0.246 0.245
0.296 0.276 0.267
0.393 0.346
0.466
0.487
0.411 0.362
0.309 0.289 0.279
0.262 0.260 0.257 0.256
0.256 0.256 0.254 0.253 0.252
0.262 0.260 0.257 0.256
0.309 0.289 0.279
0.411 0.362
0.487
0.003
0.007 0.010
0.018 0.027 0.038
0.061 0.084 0.100 0.108
0.244 0.241 0.229 0.215 0.202
1.000 0.500 0.378 0.316 0.278 0.251
0.256 0.253 0.240 0.224 0.211
0.067 0.092 0.110 0.118
0.021 0.031 0.044
0.008 0.012
0.004
In this case, the price of the digital call is Cdig (100, 1) = 0.361. The large
difference in the results of the two trees is of course due to the small number of
levels used in the simulation. After increasing the levels, both prices converge
to Cdig (100, 1) ≈ 0.40. From (3.78) the tree of local volatilities is calculated
as:
0.138
0.127 0.119
0.110 0.106 0.104
0.101 0.101 0.100 0.100
0.100 0.100 0.100 0.099 0.099
0.101 0.101 0.100 0.100
0.110 0.106 0.104
0.127 0.119
0.138
In Figure 3.7, we display the smile together with the terminal local volatil-
ities of the binomial (filled circles) and trinomial trees (empty circles). Natu-
rally, the trinomial tree is more finely spaced.
Derman and Kani (1998). The starting point in Derman and Kani (1998) is
the trinomial tree introduced by Derman et al. (1996a), which is calibrated to
the set of observed option prices. Next, local volatilities are perturbed by the
discretized SDE
$$\Delta\sigma^2_{m,n}(i,j) = \sigma^2_{m,n}(i,j)\,\big\{\tilde\alpha_{m,n}(i,j)\,\Delta t_j + \theta\,\Delta W^{(1)}\big\} , \tag{3.85}$$
where the pair (i, j) denotes the node $(S_i, t_j)$ in the tree, while (m, n) denotes
all future nodes in the tree. This equation is meant to discretize the SDE
in (3.57).
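and
$$Q(S_{t+h} = K' \mid S_t = K, \mathcal{F}_0) = \begin{cases} \Lambda(K,t) & \text{if } K' = Ku , \\ 1 - (1+u)\,\Lambda(K,t) & \text{if } K' = K , \\ u\,\Lambda(K,t) & \text{if } K' = K/u . \end{cases} \tag{3.90}$$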
In (3.89) it is seen that the probability of the asset arriving at any price
level in the tree on a future date t ∈ T is fully determined by an initial set of
option prices. However, it is also obvious from (3.89) and (3.90) that this does
not determine the probability of a specific price path, since the conditioning
information in (3.90) is neither Ft nor the price history up to t. Thus prices
of exotic options are not unique. The probability of a price path would be
determined if (and only if) the price process were Markovian, i.e. if
This would be the case if the volatility were fully deterministic in S and t.
Thus, under the assumption of a deterministic volatility this approach can be
used to recover the complete price process from option prices.
Under stochastic volatility, however, all risk-neutral processes consistent
with the initial option prices share that the expectation of the squared returns
is given by:
$$E^Q\left\{ \left( \frac{S_{t+h} - S_t}{S_t} \right)^2 \,\Big|\, S_t = K \right\} = \Lambda(K,t)\, \frac{(u-1)^2 (u+1)}{u} . \tag{3.92}$$
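The expression (3.92) follows by direct summation over the three branches in (3.90), which the following sketch verifies numerically. The function names and the test values of u and Λ are illustrative only.

```python
def one_step_distribution(K, u, lam):
    """Successor prices and probabilities from (3.90) for a node at level K."""
    p_up, p_down = lam, u * lam
    p_stay = 1.0 - (1.0 + u) * lam
    if min(p_up, p_down, p_stay) < 0.0:
        raise ValueError("Lambda is too large for a valid distribution")
    return [(K * u, p_up), (K, p_stay), (K / u, p_down)]

def return_moments(K, u, lam):
    """Expected one-step return and expected squared return under (3.90)."""
    dist = one_step_distribution(K, u, lam)
    mean = sum(p * (s - K) / K for s, p in dist)
    second = sum(p * ((s - K) / K) ** 2 for s, p in dist)
    return mean, second
```

The first moment comes out as zero, since Λ(u − 1) and uΛ(1/u − 1) cancel, and the second moment agrees with the right-hand side of (3.92).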
that affects the one-step transition probabilities, i.e. the local volatilities
in the tree. The chain Z ∈ {1, 2, . . . , N } takes values on a set of inte-
gers with the transition matrix defined by its elements Q = (qm,n ), where
qm,n = Q(Zt+h = m|Zt = n, Ft ). The transition probabilities are chosen
independently and depend on the specific volatility process to be modelled.
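Define $\Pi(K,t,z) \stackrel{\mathrm{def}}{=} Q(S_t = K \text{ and } Z_t = z \mid \mathcal{F}_0)$ and $\Lambda(K,t,z) \stackrel{\mathrm{def}}{=} Q(S_{t+h} = Ku \mid S_t = K \text{ and } Z_t = z, \mathcal{F}_0)$. The authors show that, in order to be consistent
with the initial set of option prices, $\Lambda(K,t,z)$ must satisfy:
$$\Lambda(K,t)\,\Pi(K,t) = \sum_{n=1}^{N} \Lambda(K,t,n)\,\Pi(K,t,n) . \tag{3.93}$$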
The left-hand side of (3.93) is extracted from the option data. In order to
identify the right-hand side they put $\Lambda(K,t,z) = \tilde q(K,t)\, v(z)$, where $v(z)$ is
an exogenously chosen volatility function depending on the state z, and $\tilde q(K,t)$
a multiplicative, node-dependent drift adjustment.
If all $\Pi(K,t,z)$ and $\tilde q(K,t)$ are known for all prices K and volatility states
z up to t, forward induction of the tree is done via the following two steps:
first imply
$$\Pi(K, t+h, z) = \sum_{n=1}^{N} q_{z,n} \Big[ \Lambda(K/u,t,n)\,\Pi(K/u,t,n) + u\,\Lambda(Ku,t,n)\,\Pi(Ku,t,n) + \{1 - (1+u)\,\Lambda(K,t,n)\}\,\Pi(K,t,n) \Big] . \tag{3.94}$$
The first step (3.94) follows a discrete version of the forward Kolmogorov
equation, in that the probability of a time-dependent state event is expressed
as the sum of the products of the preceding events and the one-step transition
probabilities. Equation (3.95) is obtained from (3.93).
Pricing works via backward valuation. Let V (K, t, z) be the value of an
option depending on level K and volatility state z. It has the terminal payoff
V (K, T, z). By the backward iteration
$$V(K, t-h, z) = \sum_{n=1}^{N} q_{z,n} \Big[ \Lambda(K/u,t-h,n)\,V(Ku,t,n) + u\,\Lambda(Ku,t-h,n)\,V(K/u,t,n) + \{1 - (1+u)\,\Lambda(K,t-h,n)\}\,V(K,t,n) \Big] , \tag{3.96}$$
the price of the option is computed. Any contingent claim can be valued using
the lattice but the prices depend on the volatility process chosen.
The approach by Britten-Jones and Neuberger (2000) is an elegant and
fast methodology for valuing options under stochastic volatility. It allows for
a wide range of volatility specifications including mean-reversion, GARCH, or
regime-switching models. Rossi (2002) investigates the ability of this model
to capture the smile dynamics among alternative volatility specifications.
Another recent advance in stochastic local volatility models is an approach
by Alexander et al. (2003): they model the local volatility function by a
stochastic mixture of local variances derived from a small number of base
processes. From this point of view they extend the work by Brigo and
Mercurio (2001) discussed in Section 3.10.3. Alexander et al. (2003) report that
the model captures the patterns of the IVS very well for both short and long
times to maturity. Overall, stochastic local volatility models appear to be a
fruitful line of research. Their empirical performance in hedging and pricing,
for instance along the lines of Dumas et al. (1998) and Rosenberg (2000),
remains to be investigated more deeply.
Parametric approaches
$$dS_t = \mu(S_t,t)\,dt + \tilde\sigma(S_t,t)\,dW_t , \tag{3.99}$$
which is unlike our terminology. Typically $\tilde\sigma(S_t,t) \stackrel{\mathrm{def}}{=} \gamma(t)\,p(S_t)$ for a strictly
positive and bounded function γ and a quadratic polynomial $p(x) = a + bx + cx^2$.
Zühlsdorff (2002) shows existence and uniqueness of the solution
to (3.99) and also discusses option pricing, when p has no, one and two real
roots. According to his simulations this model is perfectly able to mimic the
smile patterns one usually observes in the markets. An empirical application
with bounded polynomials up to order two in asset prices and time to maturity
is given by Dumas et al. (1998).
Other more flexible specifications have been proposed: Brown and Randall
(1999) use sums of hyperbolic trigonometric functions designed to capture the
term structure, smile and skew effects in the surface. Piecewise quadratic and
cubic splines are employed by Beaglehole and Chebanier (2002) and Cole-
man et al. (1999), respectively. McIntyre (2001) approximates the LVS with
Hermite polynomials.
The general advantage of these approaches appears to be that the es-
timated LVS does not exhibit excessive spikes as fully nonparametric cali-
brations are prone to unless strongly regularized. However, the parametric
calibration problem can be underdetermined given the small number of ob-
served market prices and the large number of parameters. Thus, the optimal
parameters may not be uniquely identifiable, which may cause instability for
instance in the computation of value at risk measures, Bouchouev and Isakov
(1999).
Mixture diffusions
with common initial value S0 . The volatility functions θi (·) satisfy similar
growth-conditions. Denote by φi (K, T |St , t) the risk neutral transition density
of these processes. The task is to identify the volatility function of (3.100) such
that the risk neutral transition density satisfies:
$$\phi(K,T \mid S_t,t) = \sum_{i=1}^{N} \lambda_i\, \phi_i(K,T \mid S_t,t) , \tag{3.102}$$
where $\lambda_i \ge 0$ and $\sum_{i=1}^{N} \lambda_i = 1$.
As shown in Brigo and Mercurio (2001), the solution is found by insert-
ing the candidate solution (3.102) into the Fokker-Planck equation (see Ap-
pendix B) and solving for the variance function by integrating twice. The
solution is given by:
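$$\sigma^2(S_t,t) = \frac{\sum_{i=1}^{N} \lambda_i\, \theta_i^2(S_t,t)\, \phi_i(\cdot)}{\sum_{i=1}^{N} \lambda_i\, S_t^2\, \phi_i(\cdot)} . \tag{3.103}$$
In the special case, where $\theta_i(S_t,t) \stackrel{\mathrm{def}}{=} \theta_i(t)\,S_t$, the variance can be written
as a weighted average of the individual variance functions:
$$\sigma^2(S,t) = \sum_{i=1}^{N} \tilde\lambda_i\, \theta_i^2(t) , \tag{3.104}$$
where $\tilde\lambda_i \stackrel{\mathrm{def}}{=} \lambda_i\, \phi_i(\cdot)/\phi(\cdot)$.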
Hence the asset price process satisfies:
$$\frac{dS_t}{S_t} = (r-\delta)\,dt + \sqrt{\sum_{i=1}^{N} \tilde\lambda_i\, \theta_i^2(t)}\; dW_t^{(0)} . \tag{3.105}$$
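For the log-normal special case, the weighted average (3.104) can be evaluated directly. The sketch below assumes constant base volatilities θ_i, so that each base density is log-normal with drift r − δ (passed as `mu`); the function names and this simplification are assumptions made for illustration.

```python
import math

def lognormal_density(s, s0, mu, theta, t):
    """Density at s of a geometric Brownian motion started at s0 (constant theta)."""
    m = math.log(s0) + (mu - 0.5 * theta ** 2) * t
    v = theta ** 2 * t
    return math.exp(-(math.log(s) - m) ** 2 / (2.0 * v)) / (s * math.sqrt(2.0 * math.pi * v))

def mixture_local_variance(s, t, s0, mu, weights, thetas):
    """Local variance (3.104): density-weighted average of the theta_i squared."""
    dens = [lam * lognormal_density(s, s0, mu, th, t) for lam, th in zip(weights, thetas)]
    total = sum(dens)   # mixture density phi
    return sum(d * th ** 2 for d, th in zip(dens, thetas)) / total
```

Far from the money the high-volatility component dominates the mixture density, so the local variance rises towards the largest θ_i squared, which produces a smile-shaped local volatility function.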
Brigo and Mercurio (2001) point out that the conditions for existence
and uniqueness of a strong solution to (3.105) must be given case by case for
different specifications of the base transition densities φi (·). Brigo et al. (2002)
and Brigo and Mercurio (2002) analyze the cases of mixtures of normals, log-
normals and sine-hyperbolic processes.
The elegance of this approach becomes apparent in option pricing, espe-
cially when there are analytical pricing formulae for the base transition den-
sities. Due to linearity of the integration and derivative operators, the price
Ht of an option is given by
where ψ is some payoff function and $H_t^{(i)}$ denotes the corresponding option
prices of the base processes. Also all greeks of $H_t$ are convex sums of the base
option greeks. In the special case of log-normal mixtures, option prices are
weighted sums of the BS prices of the options in the base processes which
makes the computation of prices particularly easy.
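The log-normal mixture price can be sketched as a weighted sum of BS call prices. The helper names `norm_cdf`, `bs_call` and `mixture_call` are illustrative, the base volatilities are taken to be constants, and the normal distribution function is built from `math.erf`.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, delta, sigma, tau):
    """BS price of a European call with continuous dividend yield delta."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def mixture_call(S, K, r, delta, tau, weights, sigmas):
    """Call price in a log-normal mixture: convex combination of BS prices."""
    if abs(sum(weights) - 1.0) > 1e-12 or min(weights) < 0.0:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * bs_call(S, K, r, delta, s, tau) for w, s in zip(weights, sigmas))
```

The same weighting applies to any greek, so a mixture delta or vega is obtained by replacing `bs_call` with the corresponding BS formula.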
The approach is beautiful, since it provides a close link between the lo-
cal volatility and the risk neutral transition density. In the aforementioned
approaches it is difficult, if not impossible, to determine the risk neutral tran-
sition density from its specific parameterization at hand. However, as was seen,
this is desirable as it can make the computation of hedge ratios and prices
more straightforward, especially, when closed-form solutions are available.
Nonparametric methods
As has been pointed out throughout this work, the decisive virtue of smile
consistent models, local volatility models in particular, is that they completely
reproduce or reprice the market, so that plain vanilla options and exotic
options alike can be priced with the same model. This holds simply by construction,
and is theoretically appealing, since many types of exotic options can be hedged
via static approaches, Derman et al. (1995), Carr et al. (1998) and Andersen et
al. (2002). However, since the conditions under which static hedging works are
typically not met on real markets, in practice one often hedges dynamically.
Dynamic hedging depends on the accuracy with which the greeks describe the
price dynamics to first or second order. However, this is exactly where local
volatility models have come under severe fire in an article by Hagan et
al. (2002). The authors focus their criticism on the delta computed from local
volatility models.
To illustrate their main argument, they consider the special case where
local volatility is a function of the form:
[Figure 3.10: three panels, each plotting the IV smile $\hat\sigma$ against the strike K]
Fig. 3.10. Alternative IV smile dynamics assuming an upward shift of the asset
price. Left panel: dynamics of the IV smile implied from (deterministic) local volatil-
ity models. Central panel: sticky-strike assumption. Right panel: sticky-moneyness
assumption.
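$$\frac{\partial \tilde C_t}{\partial S} = \frac{\partial C_t^{BS}}{\partial S} + \frac{\partial C_t^{BS}}{\partial \hat\sigma}\, \frac{\partial \hat\sigma}{\partial S} . \tag{3.112}$$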
Since the local volatility model predicts that the smile moves left when the
spot moves up and vice versa, which is opposite to common market behavior,
Hagan et al. (2002) conclude that the local volatility delta is wrong or at best
very misleading.
In practice this problem is met by recalibration of the model. Instead of
reading the delta from the finite difference scheme, which yields the model-
implied delta, one shifts the spot and computes the delta via a finite difference
quotient. In shifting the spot, one imposes the IV smile dynamics that are
considered as appropriate, i.e. one recomputes the new option prices either
at the same smile (sticky-strike), or at a smile function shifted with the spot
(sticky-moneyness). This practice, however, has led to a whole delta menu and
a fierce debate on which is the best: the model-implied local volatility delta,
the sticky-strike or BS delta, and the sticky-moneyness delta.
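The bump-and-reprice procedure can be sketched as follows. The smile function reuses the monotone example $\hat\sigma = -0.06 \ln(K/S) + 0.15$ from Figure 3.8 purely for illustration; the function names, the bump size `h` and the rule labels are assumptions of this sketch.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, delta, sigma, tau):
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def smile(K, S):
    """Illustrative IV smile as a function of moneyness K/S."""
    return -0.06 * math.log(K / S) + 0.15

def bumped_delta(S, K, r, delta, tau, rule, h=0.01):
    """Central finite difference delta under an imposed smile dynamic."""
    if rule == "sticky_strike":        # IV of a given strike stays unchanged
        vol_up = vol_dn = smile(K, S)
    elif rule == "sticky_moneyness":   # smile is shifted along with the spot
        vol_up, vol_dn = smile(K, S + h), smile(K, S - h)
    else:
        raise ValueError("unknown rule")
    up = bs_call(S + h, K, r, delta, vol_up, tau)
    dn = bs_call(S - h, K, r, delta, vol_dn, tau)
    return (up - dn) / (2.0 * h)
```

Under the sticky-strike rule the result is the BS delta, while under sticky-moneyness the second term of (3.112) adds vega times the slope of the smile in the spot, which is positive for this downward sloping smile.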
From a theoretical perspective, the answer can be given case by case de-
pending on the prevailing market regime, Derman (1999) and more recently
Crépey (2004), but practically the question appears to be unsolved. In simu-
lating alternative asset price dynamics, McIntyre (2001) finds that the local
volatility model is not delivering robust delta hedges when the true model
is a jump-diffusion, but fairly accurate ones in a pure stochastic volatility
setting. In hedging exercises with real data, Dumas et al. (1998) prefer the
sticky-strike delta to the local volatility variant, whereas Coleman et al. (2001)
and Vähämaa (2004) find opposite evidence. Clearly, the contradicting results
can be due to the fact that the ‘right’ delta depends on the current market
regime, and a final answer cannot be given, or must be sought in stochastic
local volatility settings, Alexander and Nogueira (2004).
Clearly, the delta discussion extends also to other higher order greeks
involving a spot derivative, in particular gamma and vanna, but the literature
appears to be silent on this topic. The difficulty is that higher order greeks are
prone to numerical errors making an analysis very cumbersome. But still, since
the local volatility models are frequently used for options with non-convex
payoff profiles, such as barrier options, this discussion is of vital importance,
and needs to be addressed in the future.
Aside from the delta problem, another unsatisfying feature of LV models
is that they predict flat future smiles: since the IVS flattens out for longer
time horizons, so does the LVS, compare Figure 3.1. Implicitly, the
model thus predicts flat future smiles, which is typically not what one expects.
As a consequence, options that start in the long dated future, such as forward start
options and cliquet structures, will be priced incorrectly, as their prices are
computed under the assumption of a flat (forward) IVS at their starting date.
These types of exotics need to be priced with stochastic LV models, stochastic
volatility or jump diffusion models that do not suffer from this drawback,
Kruse (2003).
Stochastic IV models follow a different strategy than the local volatility and
the classical stochastic volatility models: the idea is not to introduce the
stochastic setting via the instantaneous volatility function, but through a
stochastic IV process. Like deterministic local volatility models, they allow
for a preference-free option valuation, since markets are complete owing to
the fact that volatility is tradable through options, usually plain vanilla op-
tions of European style. Stochastic IV models were developed by Ledoit and
Santa-Clara (1998) and Schönbucher (1999), and have recently been more
deeply analyzed by Brace et al. (2001), Amerio et al. (2003) and Daglish et
al. (2003).
The (somewhat simplified) model set-up is as follows: for a fixed time
interval $[0, T^*]$, we consider a probability space $(\Omega, \mathcal{F}, Q)$, where Q is the
(unique) martingale measure in the economy. We define two Brownian motions
$(W_t^{(0)})_{0 \le t \le T^*}$ and $(W_t^{(1)})_{0 \le t \le T^*}$ on this space. Without loss of generality
they are assumed to be uncorrelated. The space is equipped with a filtration
$(\mathcal{F}_t)_{0 \le t \le T^*}$. As tradable assets, we have the underlying asset $S_t$ paying a
constant dividend yield δ, a riskless investment with constant interest rate r,
and a European call option $C(S_t, t, K, T)$.
3.12 Stochastic IV models 95
Under the measure Q the asset price dynamics are governed by the SDE
$$\frac{dS_t}{S_t} = (r-\delta)\,dt + \sigma(S_t,t,\hat\sigma_t)\,dW_t^{(0)} , \tag{3.113}$$
where $\{\sigma(S_t,t,\hat\sigma_t)\}_{0 \le t \le T^*}$ is some $(\mathcal{F}_t)_{0 \le t \le T^*}$-adapted stochastic process. It
will be seen that it is driven by the stochastic IV process which follows
$$\frac{d\hat\sigma_t(K,T)}{\hat\sigma_t(K,T)} = \alpha(\hat\sigma_t,t,S_t)\,dt + \theta_0(\hat\sigma_t,t,S_t)\,dW_t^{(0)} + \theta_1(\hat\sigma_t,t,S_t)\,dW_t^{(1)} , \tag{3.114}$$
where $\{\alpha(\hat\sigma_t,t,S_t)\}_{0 \le t \le T^*}$ and $\{\theta_i(\hat\sigma_t,t,S_t)\}_{0 \le t \le T^*}$ are predictable stochastic
processes. The explicit dependence on $(\hat\sigma_t, t, S_t)$ is dropped in the following for
the sake of clarity. Also we will write $\hat\sigma_t$ only, but the dependence of IV on K
and T should be borne in mind. Finally, all diffusion parameters are assumed
to satisfy the regularity assumptions such that unique strong solutions exist,
see appendix Chapter B. The option is priced using the BS formula together
with the current realization of the IV process $\hat\sigma_t$.
A first set of restrictions on the drift of the IV process ensures that no
arbitrage opportunities exist. They are derived as follows: by Itô's lemma the
dynamics of the call are given by:
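$$dC_t = \frac{\partial C_t}{\partial t}\,dt + \frac{\partial C_t}{\partial S}\,dS_t + \frac12\, \sigma^2(S_t,t,\hat\sigma_t)\, S_t^2\, \frac{\partial^2 C_t}{\partial S^2}\,dt + \frac{\partial C_t}{\partial \hat\sigma}\,d\hat\sigma_t + \frac12\, \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial \hat\sigma}\, d\langle \hat\sigma \rangle_t + \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial S}\, d\langle \hat\sigma, S \rangle_t . \tag{3.115}$$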
In the risk neutral world, the drift of the call must be equal to rCt dt.
Thus, by collecting the dt-terms in (3.115) and rearranging, the condition on
the drift reads as
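$$\begin{aligned} 0 = {} & \frac{\partial C_t}{\partial t} + (r-\delta)\, S_t\, \frac{\partial C_t}{\partial S} + \frac12\, \hat\sigma_t^2\, S_t^2\, \frac{\partial^2 C_t}{\partial S^2} - r C_t \\ & + \frac12\, \big\{ \sigma^2(S_t,t,\hat\sigma_t) - \hat\sigma_t^2 \big\}\, S_t^2\, \frac{\partial^2 C_t}{\partial S^2} \\ & + \alpha\, \frac{\partial C_t}{\partial \hat\sigma} + \frac12\, \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial \hat\sigma}\, (\theta_0^2 + \theta_1^2) + \sigma(S_t,t,\hat\sigma_t)\, \theta_0\, S_t\, \frac{\partial^2 C_t}{\partial S\, \partial \hat\sigma} . \end{aligned} \tag{3.116}$$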
Obviously, the first line of (3.116) is the BS PDE (2.13) with IV replacing
the volatility function. It must be equal to zero. Taking this into account, the
condition on the drift is identified as
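$$\alpha = \frac12 \left( \frac{\partial C_t}{\partial \hat\sigma} \right)^{-1} \left[ \big\{ \hat\sigma_t^2 - \sigma^2(S_t,t,\hat\sigma_t) \big\}\, S_t^2\, \frac{\partial^2 C_t}{\partial S_t^2} - \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial \hat\sigma}\, (\theta_0^2 + \theta_1^2) - 2\, \sigma(S_t,t,\hat\sigma_t)\, \theta_0\, S_t\, \frac{\partial^2 C_t}{\partial S\, \partial \hat\sigma} \right] . \tag{3.117}$$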
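$$-\hat\sigma_t^4 + \hat\sigma_t^2\, \sigma^2(S_t,t,\hat\sigma_t) - 2\, x\, \theta_0\, \hat\sigma_t\, \sigma(S_t,t,\hat\sigma_t) + x^2\, (\theta_0^2 + \theta_1^2) = 0 , \tag{3.120}$$
where $x \stackrel{\mathrm{def}}{=} -\ln \kappa_f = \ln\{e^{(r-\delta)\tau} S_t / K\}$ is (inverse) forward log-moneyness,
bubbles are excluded from the model. This holds uniquely, since for $\theta_0, \theta_1, \sigma > 0$
and $x \in \mathbb{R}$ this polynomial has only one solution for $\hat\sigma_t > 0$.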
Equation (3.120) has at least two implications: first, it is seen that $\hat\sigma_t$ is
quadratic in x, which implies a smile across K. Since its shape is directly
determined by θ0 and θ1 , both parameters may be identified by calibration
to the market smile. If θ0 = 0, i.e. if there is no correlation between the asset
price and the IV dynamics, the smile is symmetric in x. Thus, asymmetry in
the smile, the ‘sneer’, is introduced through the Brownian motion driving both
variables. This parallels the work of Renault and Touzi (1996) as discussed in
Section 2.11.
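To make the first implication concrete, the sketch below solves (3.120) numerically for a positive root by bisection, treating σ, θ0 and θ1 as given constants. The function names, the bracketing interval and the tolerance are assumptions of this sketch; it returns one positive root and does not itself establish uniqueness.

```python
def no_bubble_poly(s, x, sigma, theta0, theta1):
    """Left-hand side of (3.120) as a function of the IV level s."""
    return (-s ** 4 + s ** 2 * sigma ** 2
            - 2.0 * x * theta0 * s * sigma
            + x ** 2 * (theta0 ** 2 + theta1 ** 2))

def smile_from_no_bubble(x, sigma, theta0, theta1, tol=1e-12):
    """Find a positive root of (3.120) by bisection (sketch)."""
    lo = 1e-8
    hi = 10.0 * (1.0 + sigma + abs(x) * (abs(theta0) + abs(theta1)))
    if no_bubble_poly(lo, x, sigma, theta0, theta1) <= 0.0:
        raise ValueError("no sign change found near zero")
    # The quartic term dominates at hi, so the polynomial is negative there.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if no_bubble_poly(mid, x, sigma, theta0, theta1) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating the function on a grid of x values traces out the implied smile: it is symmetric for θ0 = 0, equals σ at x = 0, and becomes skewed once θ0 differs from zero.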
Second, the ATM IV defined in terms of the forward moneyness, i.e. where
x = 0 converges to instantaneous volatility as T − t tends to zero. However,
this is not a consequence of the no-bubble restriction, but can be formally
proved, Ledoit and Santa-Clara (1998); Daglish et al. (2003). This is seen as
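follows: from a first order Taylor series expansion of the BS pricing formula
in the neighborhood of ATM (in the sense of log-forward moneyness), i.e. at
$d_1 = -d_2 = \frac12\, \hat\sigma_t \sqrt{\tau}$, we obtain that
$$C_t(S_t, t, e^{(r-\delta)\tau} S_t, T) \approx \frac{1}{\sqrt{2\pi}}\, e^{-\delta\tau}\, S_t\, \hat\sigma_t \sqrt{\tau} . \tag{3.121}$$
This implies
$$\lim_{t \uparrow T} \hat\sigma_t = \lim_{t \uparrow T} \sqrt{\frac{2\pi}{\tau}}\, \frac{C_t}{e^{-\delta\tau} S_t} . \tag{3.122}$$
The call price can be approximated for small τ by
$$\begin{aligned} C_t & = e^{-r\tau}\, E^Q\big\{ (S_T - e^{(r-\delta)\tau} S_t)^+ \mid \mathcal{F}_t \big\} \\ & \approx e^{-r\tau}\, E^Q\big\{ S_t\, \sigma(S_t,t,\hat\sigma_t)\, (W_T^{(0)} - W_t^{(0)})^+ \mid \mathcal{F}_t \big\} \\ & = e^{-r\tau}\, \sqrt{\frac{\tau}{2\pi}}\, S_t\, \sigma(S_t,t,\hat\sigma_t) , \end{aligned} \tag{3.123}$$
where the last line follows from the fact that $E(z)^+ = \sqrt{\mathrm{Var}(z)/(2\pi)}$, where z is
a normally distributed random variable with zero mean and variance Var(z).
Inserting (3.122) and taking limits yields the desired result:
$$\lim_{t \uparrow T} \hat\sigma_t = \lim_{t \uparrow T} \sigma(S_t,t,\hat\sigma_t) . \tag{3.124}$$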
Note that this parallels the harmonic mean averaging result of Berestycki
et al. (2002): here also, ATM local volatility, which is instantaneous volatility,
converges to IV, Section 3.6.
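The quality of the approximation (3.121) and of the inversion (3.122) can be checked numerically. In the sketch below the exact BS price at the forward ATM strike is compared with the first order expression; the function names and the parameter values used are illustrative.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def atm_forward_call(S, delta, sigma, tau):
    """Exact BS call price at the strike K = exp((r - delta) tau) S, where d1 = -d2."""
    d1 = 0.5 * sigma * math.sqrt(tau)
    return math.exp(-delta * tau) * S * (norm_cdf(d1) - norm_cdf(-d1))

def atm_forward_call_approx(S, delta, sigma, tau):
    """First order approximation (3.121)."""
    return math.exp(-delta * tau) * S * sigma * math.sqrt(tau) / math.sqrt(2.0 * math.pi)

def iv_from_atm_price(C, S, delta, tau):
    """Invert (3.121) for the IV, as in (3.122) before taking the limit."""
    return math.sqrt(2.0 * math.pi / tau) * C / (math.exp(-delta * tau) * S)
```

The relative error of the approximation is of order $\hat\sigma^2 \tau$, so it vanishes as the time to maturity shrinks, which is what the limit argument requires.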
The pricing of path-independent options works along standard lines. By
standard results, the option price H must satisfy the following PDE subject
to the appropriate boundary conditions:
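$$\begin{aligned} 0 = {} & \frac{\partial H}{\partial t} + (r-\delta)\, S_t\, \frac{\partial H}{\partial S} + \frac12\, \sigma^2(S_t,t,\hat\sigma_t)\, S_t^2\, \frac{\partial^2 H}{\partial S^2} - r H \\ & + \sigma(S_t,t,\hat\sigma_t)\, \theta_0\, S_t\, \frac{\partial^2 H}{\partial \hat\sigma\, \partial S} + \alpha\, \frac{\partial H}{\partial \hat\sigma} + \frac12\, (\theta_0^2 + \theta_1^2)\, \frac{\partial^2 H}{\partial \hat\sigma\, \partial \hat\sigma} . \end{aligned} \tag{3.125}$$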
In the implementation, difficulties may arise from the rather involved
no-arbitrage conditions, Balland (2002). To simplify, Brace et al. (2001) propose
to parameterize the volatility of IV as $\theta_i(\hat\sigma, t) \stackrel{\mathrm{def}}{=} \theta_i\, \hat\sigma$, i = 0, 1, where $\theta_i > 0$
is constant. This removes the singularities apparent in (3.118). Instead of
obtaining the parameters from fitting the smile as suggested above, they can be
recovered from PCA methods developed in Section 5.2. This path is taken for
instance in Fengler et al. (2002b) and Cont et al. (2002). For the specification
of the instantaneous volatility a lot of freedom remains, as long as (3.120)
is satisfied in the limit. Alternatively, one may fix a drift function and re-
cover from (3.118) the corresponding instantaneous volatility. For instance,
the simplest choice would be to put α = 0.
Finally, it should be remarked that specifying the model in absolute terms, i.e. in terms of a fixed expiry date and a fixed strike, may sometimes prove inconvenient in practice. In particular, an empirical identification of the parameters is likely to be more stable in terms of moneyness and time to maturity than in terms of strikes and expiry dates. This is addressed in Brace
et al. (2001) who show how to switch from the absolute to the relative nota-
tion of the model and its no-arbitrage conditions. Amerio et al. (2003) follow
this approach and show how to price volatility derivatives using stochastic IV
models.
3.13 Summary
In the first part of this chapter we introduced the theory of local volatility.
Also several techniques for extracting local volatility from option prices were
discussed. The focus was on implied trees. In the second part stochastic IV
models were presented. At this point, we consider it to be appropriate to recall
the concepts of volatility systematically. As explained in the introduction, we
collected the main results in Figure 3.11.
Starting from the leftmost arrow with the instantaneous variance, the first (and trivial) relation is the identity of local and instantaneous variance for $K = S_t$ and $T = t$. Moreover, local and implied variance can be represented as
averages of instantaneous variance: local variance is the expectation under the
(K, T )-risk adjusted measure. Implied variance is – for ATM options under
the Hull and White (1987) model – the expectation under the risk neutral
measure. Finally, the stochastic IV models show that ATM IV converges to
instantaneous volatility as time to maturity converges to zero.
The asymptotic relations between implied and local volatility are presented
in the top of the figure. They hold under the assumption of a deterministic
instantaneous volatility function: first, IV is a spatial harmonic mean as time
to maturity converges to zero. Second, if no strike dependence is present or
for far OTM/ITM options, IV is a time average of local volatility. The Dupire formula in its IV representation (the dotted line) allows for recovering the LVS from the IVS and its derivatives. It is an ad-hoc concept, but a convenient way to reconstruct the LVS. Finally, the two-times-IV-slope was shown to hold for ATM options near to expiry.
[Figure 3.11: diagram with arrows relating instantaneous variance $\sigma^2(S_t, t, \cdot)$, local variance $\sigma^2_{K,T}(S_t, t)$ and implied variance $\hat\sigma_t^2(K, T)$; the arrows are labeled with (3.4), Section 3.8, (2.93), (3.124), (3.36), (3.46), (2.78) and (3.47).]
Fig. 3.11. Overview on volatility concepts. Solid lines denote exact relations between the different types of volatility. The dashed line denotes an ad-hoc concept. The arrows denote the direction of the relations.
After the recent theoretical and computational advances, local volatility models are increasingly criticized, either for practical reasons or on theoretical grounds. From the practical perspective, there is the criticism that local volatility models deliver a wrong delta, Hagan et al. (2002). As discussed, the empirical literature does not appear to be strongly conclusive on the matter. A harsh methodological criticism is given by Ayache et al.
(2004). Their main argument against local volatility is that these models lack
economic grounds by not offering a reasonable smile explanation, as stochastic
volatility or jump diffusion models do. Rather these models ‘tweak’ the diffu-
sion coefficient in the BS PDE, until the observed option prices are matched.
Smoothing techniques
4.1 Introduction
Functional flexibility is a key requirement for model building and model se-
lection in quantitative finance: often it is difficult, and sometimes impossible
to justify on theoretical grounds a specific parametric form of an economic
relationship under investigation. Furthermore, in a dynamic context, the eco-
nomic structure may be liable to sizable changes and considerable fluctuations.
Thus, estimation techniques that do not impose any a priori restrictions on the
estimate, such as non- and semiparametric methods, are increasingly popular
in financial practice.
In the case of the IVS, model flexibility is a prerequisite rather than an
option: as has been seen in Chapter 2, from the BS theory, the IVS should
be a flat and constant function across strike prices and the term structure
of the option’s time to maturity. Yet, as a matter of fact, one observes rich
functional patterns fluctuating through time. This feature together with the
discrete design, i.e. the fact that the daily IV observations occur only for a
limited number of maturities, render IVS estimation an intricate challenge.
Parametric attempts to model the IVS along the strike profile, i.e. the
‘smile’, usually employ quadratic specifications, Shimko (1993), Ané and Ge-
man (1999), and Tompkins (1999) among others. Also some of the methods
listed for estimating the local volatility function are applicable here, Sec-
tion 3.10.3. To allow for more flexibility, Hafner and Wallmeier (2001) fit
quadratic splines to the smile function. However, it seems that these para-
metric approaches are not capable of capturing the salient features of IVS
patterns, and hence estimates may be biased.
Recently, non- and semiparametric smoothing techniques for estimating
the IVS have been used more and more: Aït-Sahalia and Lo (1998), Rosenberg (2000), Cont and da Fonseca (2002), Fengler et al. (2003b) employ a Nadaraya-Watson estimator of the IVS function, and higher order local polynomial
$$y_i = m(x_i) + \varepsilon_i , \quad i = 1, \ldots, n . \tag{4.1}$$
where $\{w_{i,n}(x)\}_{i=1}^n$ denotes a sequence of weights. The weights reflect the fact that one will typically give higher weight to observations $x_i$ in the near vicinity of $x$ than to those far off. Most nonparametric techniques can be written in this way, and differ only in the way the weights are computed.
In Sections 4.2 and 4.3 of this chapter, we give an introduction to Nadaraya-Watson and local polynomial smoothing, which are the techniques employed for almost all of the graphical illustrations throughout this work.
In Nadaraya-Watson smoothing one estimates a local constant, while in local
polynomial smoothing one fits a polynomial of order p within a small neighbor-
hood. From this point of view, Nadaraya-Watson smoothing is the special case
of local polynomial smoothing with degree p = 0. Usually, in local polynomial
smoothing, one uses a local linear estimator, i.e. p = 1, which is less affected
by a bias in the boundary regions of the estimate than the Nadaraya-Watson
estimator, Härdle et al. (2004). This however is asymptotically negligible.
When it is mandatory to also estimate derivatives, e.g. when the LVS is
recovered from the IVS, Section 3.5, one needs to use higher order local poly-
nomials. The degree of the polynomial depends on the number of derivatives
desired. Since the local polynomial estimator can be written as a weighted
least squares estimator, implementation is straightforward.
Section 4.5 presents an IVS estimator, a least squares kernel smoother,
proposed by Gouriéroux et al. (1994) and Fengler and Wang (2003). This
approach smoothes the IVS in the space of option prices and avoids a po-
tentially undesirable feature of previous estimators: the two-step procedure.
Traditionally, in a first step, IVs are derived by equating the BS formula with
observed market prices and by solving for the diffusion coefficient, Section 2.5.
In the second step the actual fitting algorithm is applied. A two-step estimator
may be biased when option prices or other input parameters can only be observed with errors. Moreover, the nonlinear transformation of the option prices
makes the error distribution less tractable. Indeed, it has been conjectured
that the presence of measurement errors can be of substantial impact, see
Roll (1984), Harvey and Whaley (1991), and particularly Hentschel (2003)
for an extensive study on errors in IV estimation and their possible magni-
tude. Potential error sources are the bid-ask bounce, nonsynchronous pricing,
infrequent trading of index stocks, and finite quote precision. Unlike the lo-
cal polynomial smoother, the least squares kernel smoother does not have a
closed-form solution, and for each grid point the estimate must be obtained separately by minimizing the objective function. On the other hand, as shall
be seen, our results allow for the estimation of confidence bands that take the
nonlinear transformation of the option prices into IVs into account.
A third methodology due to Fengler et al. (2003a) estimates the IVS via a
semiparametric factor model. The reason for investigating this third approach
lies in the very nature of the IVS data. As has been pointed out in Section 2.5,
the IVS data are not equally distributed in the space, but occur in strings. Un-
less carefully calibrated, the fits obtained by the methods, which are discussed
in this chapter, can be biased. The estimation strategy of the semiparametric
factor model is specifically tailored to the degenerated, discrete string struc-
ture of the IVS data. It shall be discussed in Chapter 5, since we consider
the dimension reduction aspects of this approach as its dominating feature,
although it may also be seen as a pure estimation technique.
$$K(u) \stackrel{\mathrm{def}}{=} \varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} , \tag{4.6}$$
where π = 3.141... denotes the circle constant.
For multidimensional smoothing tasks, as for IVS estimation, one needs
multidimensional kernels. It is most common to obtain multidimensional ker-
nels via products of univariate kernels:
$$K(u_1, \ldots, u_d) = \prod_{j=1}^{d} K_{(j)}(u_j) , \tag{4.7}$$
which in this way inherit the properties of the univariate kernel function.
While different kernels have a different impact on the theoretical properties
of the estimator, in practice the choice of the kernel function is not of big
importance, and is mainly driven by practical considerations, Marron and
Nolan (1988). For our work, we will only use quartic kernels and products of
them.
The degree of localization or smoothing is steered via the bandwidth h.
For instance, for a given data set $\{(x_i, y_i)\}_{i=1}^n$, for $x, y \in \mathbb{R}$, the bandwidths enter the kernel functions via
$$\frac{1}{h_n} K\left(\frac{x - x_i}{h_n}\right) , \quad i = 1, \ldots, n . \tag{4.8}$$
The index $n$ for the bandwidth clarifies that $h_n$ actually depends on the number of observations. This is natural, since in the asymptotic perspective, as the number of observations tends to infinity, the degree of localization can shrink to zero without 'losing' information about the regression function. In most cases, however, we will suppress this explicit notation.
Finally, it will occasionally be convenient to use the abbreviation
$$K_h(u) \stackrel{\mathrm{def}}{=} \frac{1}{h} K\left(\frac{u}{h}\right) . \tag{4.9}$$
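The kernel notation above translates directly into code. The sketch below is illustrative only: the function names are assumptions, and the quartic kernel is written in its standard form $K(u) = \frac{15}{16}(1 - u^2)^2$ for $|u| \le 1$ and zero otherwise. It implements the Gaussian kernel (4.6), the rescaled kernel $K_h$ of (4.9), and the product kernel (4.7).

```python
import numpy as np


def gaussian_kernel(u):
    """Gaussian kernel (4.6)."""
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def quartic_kernel(u):
    """Quartic (biweight) kernel with support [-1, 1] (standard form, assumed)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def scaled_kernel(kernel, u, h):
    """K_h(u) = K(u / h) / h as in (4.9)."""
    return kernel(np.asarray(u, dtype=float) / h) / h


def product_kernel(kernels, u):
    """Product kernel (4.7); u has shape (n, d), one univariate kernel per column."""
    u = np.atleast_2d(np.asarray(u, dtype=float))
    values = np.ones(u.shape[0])
    for j, kernel in enumerate(kernels):
        values = values * kernel(u[:, j])
    return values
```

Rescaling by $h$ keeps the kernel a probability density: the support shrinks to $[-h, h]$ for the quartic kernel while the integral stays equal to one.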
4.2 Nadaraya-Watson smoothing
$$Y = m(X) + \varepsilon , \tag{4.10}$$
with the unknown (but twice differentiable) regression function $m$. The explanatory variable $X$ and the response variable $Y$ take values in $\mathbb{R}$, have the joint pdf $f(x, y)$ and are independent of $\varepsilon$. The error $\varepsilon$ has the properties $E(\varepsilon|x) = 0$ and $E(\varepsilon^2|x) = \sigma^2(x)$.
Taking the (conditional) expectation of (4.10) yields
$$E(Y|X = x) = m(x) , \tag{4.11}$$
which says that the unknown regression function is the conditional expectation function of $Y$ given $X = x$. Using the definition of the conditional expectation, (4.11) can be written as
$$m(x) = E(Y|X = x) = \frac{\int y f(x, y)\, dy}{f_x(x)} , \tag{4.12}$$
where fx denotes the marginal pdf. Representation (4.12) shows that the
regression function m can be estimated via the kernel density estimates of
the joint and the marginal density. This approach was first introduced by
Nadaraya (1964) and Watson (1964).
Suppose we are given the randomly sampled iid data set $\{(x_i, y_i)\}_{i=1}^n$. Then, the Nadaraya-Watson estimator is given by:
$$\hat m(x) = \frac{n^{-1} \sum_{i=1}^n K_h(x - x_i)\, y_i}{n^{-1} \sum_{i=1}^n K_h(x - x_i)} . \tag{4.13}$$
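Rewriting (4.13) as
$$\hat m(x) = \frac{1}{n} \sum_{i=1}^n \frac{K_h(x - x_i)}{n^{-1} \sum_{j=1}^n K_h(x - x_j)}\, y_i = \frac{1}{n} \sum_{i=1}^n w_{i,n}(x)\, y_i \tag{4.14}$$
$$w_{i,n}(x) \stackrel{\mathrm{def}}{=} \frac{K_h(x - x_i)}{n^{-1} \sum_{j=1}^n K_h(x - x_j)} . \tag{4.15}$$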
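The estimator (4.13) is a ratio of two kernel-weighted sums and can be sketched in a few lines. The following is an illustration under stated assumptions, not the implementation used for the figures in this work: the function names are hypothetical, the quartic kernel is used in its standard form, and the smile data are simulated.

```python
import numpy as np


def quartic_kernel(u):
    """Quartic kernel with support [-1, 1] (standard form, assumed)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def nadaraya_watson(x_grid, x_obs, y_obs, h, kernel=quartic_kernel):
    """Evaluate the Nadaraya-Watson estimator (4.13) at every point of x_grid."""
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    # K_h(x - x_i): rows index grid points, columns index observations.
    k = kernel((x_grid[:, None] - x_obs[None, :]) / h) / h
    denom = k.sum(axis=1)
    # Grid points without any observation in their window yield NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return (k @ y_obs) / denom


if __name__ == "__main__":
    # Simulated smile: parabola in moneyness plus noise (illustrative only).
    rng = np.random.default_rng(0)
    kappa = rng.uniform(0.8, 1.2, 400)
    iv = 0.2 + 2.0 * (kappa - 1.0) ** 2 + rng.normal(0.0, 0.005, 400)
    print(nadaraya_watson([0.9, 1.0, 1.1], kappa, iv, h=0.05))
```

Because the weights in (4.15) average to one at every $x$, a constant response is reproduced exactly, which is a convenient sanity check for any implementation.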
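$$\mathrm{Bias}\{\hat m(x)\} = \frac{h^2}{2}\, \mu_2(K) \left\{ m''(x) + 2\, \frac{m'(x) f_x'(x)}{f_x(x)} \right\} + O(n^{-1} h^{-1}) + O(h^2) , \tag{4.17}$$
$$\mathrm{Var}\{\hat m(x)\} = \frac{1}{nh}\, \frac{\sigma^2(x)}{f_x(x)} \int K^2(u)\, du + O(n^{-1} h^{-1}) . \tag{4.18}$$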
For a precise treatment of the preceding statements see for instance Härdle
(1990) or Pagan and Ullah (1999).
The Nadaraya-Watson estimator generalizes in a straightforward manner to the multivariate case: for some $\mathbb{R}^d$-valued sample $\{(x_i, y_i)\}_{i=1}^n$, the multivariate Nadaraya-Watson estimator is given by
$$\hat m(x) = \frac{\sum_{i=1}^n K_h(x - x_i)\, y_i}{\sum_{i=1}^n K_h(x - x_i)} , \tag{4.19}$$
$$m(\xi) \approx m(x) + m'(x)(x - \xi) + \ldots + \frac{1}{p!}\, m^{(p)}(x)(x - \xi)^p \tag{4.21}$$
for $\xi$ in the neighborhood of $x$. Again we include the neighborhood of $x$ via kernel weights. Thus, an estimator of $m(x)$ can be formulated in terms of the quadratic minimization problem
$$\min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^n \left\{ y_i - \beta_0 - \beta_1 (x - x_i) - \ldots - \beta_p (x - x_i)^p \right\}^2 K_h(x - x_i) , \tag{4.22}$$
Then we can write the solution of (4.22) in the usual least squares formulation as
$$\hat\beta(x) = (X^\top W X)^{-1} X^\top W y . \tag{4.25}$$
$$\hat m(x) = \hat\beta_0(x) , \tag{4.26}$$
by comparison of (4.21) and (4.22). Equation (4.25) makes it obvious that the estimator can be written as a local average of the response.
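Since the solution is an ordinary weighted least squares fit, it can be sketched with a linear solve. The block below is a minimal illustration with hypothetical names; it assumes a design matrix with rows $(1, x_i - x, \ldots, (x_i - x)^p)$ and a diagonal weight matrix holding $K_h(x - x_i)$. Note that the regressors are centered as $x_i - x$ here, whereas (4.22) is written in terms of $x - x_i$; the intercept $\hat\beta_0$ is the same in both conventions, and only the signs of the odd-order coefficients differ.

```python
import numpy as np


def quartic_kernel(u):
    """Quartic kernel with support [-1, 1] (standard form, assumed)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def local_polynomial(x0, x_obs, y_obs, h, p=1, kernel=quartic_kernel):
    """Return beta_hat(x0) of the kernel-weighted least squares fit of degree p."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    diff = x_obs - x0
    # Design matrix with columns 1, diff, ..., diff**p.
    X = np.vander(diff, N=p + 1, increasing=True)
    w = kernel(diff / h) / h            # diagonal of W
    XtW = X.T * w                       # X'W without forming W explicitly
    return np.linalg.solve(XtW @ X, XtW @ y_obs)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 50)
    y = 1.0 + 2.0 * x
    beta = local_polynomial(0.5, x, y, h=0.2, p=1)
    print("m_hat(0.5) =", beta[0])
```

With $p = 0$ the fit collapses to the Nadaraya-Watson estimator, and with $p = 1$ exactly linear data are reproduced without error.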
Practice requires the choice of p. From the asymptotic behavior it is known
that polynomials with odd degrees are to be preferred to those with even ones,
i.e. the order one polynomial outperforms the order zero polynomial, the order
three polynomial the order two polynomial etc. A case used particularly often
is the local linear estimator with p = 1. It has been studied extensively by
Fan (1992, 1993) and Fan and Gijbels (1992).
For the local linear estimator the asymptotic variance is identical to that
stated in (4.18) for the Nadaraya-Watson estimator. The asymptotic bias takes
the form:
$$\mathrm{Bias}\{\hat m(x)\} = \frac{h^2}{2}\, \mu_2(K)\, m''(x) + O(h^2) . \tag{4.27}$$
Comparing (4.27) with (4.17) uncovers a remarkable difference: the bias does not depend on the densities, i.e. it is said to be design adaptive, Fan (1992). Moreover, the bias vanishes when $m$ is linear. Thus local linear estimation can be superior to Nadaraya-Watson smoothing when the design becomes sparse, as is typically the case for the IVS data. Another advantage of the local linear estimator is that its bias and variance are of the same order of magnitude in both the interior and the boundary of the support of $f_x$. In practice, this may improve the behavior of the estimate near the boundary of the design.
An important byproduct of local polynomial estimators is that they provide an easy and efficient way for computing derivatives up to order $(p + 1)$ of the regression function. For instance, the $j$th order derivative of $m$, $m^{(j)}$, is given by
$$\hat m^{(j)}(x) = j!\, \hat\beta_j(x) . \tag{4.28}$$
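Derivative estimates follow from the fitted coefficients by the factorial scaling in (4.28). The sketch below uses hypothetical names and an artificial, noise-free quadratic smile; it assumes regressors centered as $x_i - x$, so that $j!\,\hat\beta_j$ estimates $m^{(j)}(x)$ with the usual sign (with regressors $x - x_i$ the odd-order derivatives would carry the opposite sign).

```python
import math

import numpy as np


def local_poly_derivatives(x0, x_obs, y_obs, h, p=2):
    """Return estimates of m(x0), m'(x0), ..., m^(p)(x0) via j! * beta_j."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    diff = x_obs - x0
    u = diff / h
    # Quartic kernel weights K_h(x0 - x_i) (standard form, assumed).
    w = np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0) / h
    X = np.vander(diff, N=p + 1, increasing=True)
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y_obs)
    return np.array([math.factorial(j) * beta[j] for j in range(p + 1)])


if __name__ == "__main__":
    kappa = np.linspace(0.8, 1.2, 81)
    iv = 0.2 + 0.1 * (kappa - 1.0) + 2.0 * (kappa - 1.0) ** 2
    print(local_poly_derivatives(1.05, kappa, iv, h=0.1, p=2))
```

For an exactly quadratic function a local quadratic fit recovers level, slope and curvature without smoothing bias, which makes this a useful test case before moving to noisy IV data.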
For the $\mathbb{R}^d$-variate extension to (4.22), one proceeds similarly. For instance, for the local linear estimator we have
$$\min_{\beta_0,\, \beta_1 \in \mathbb{R}^d} \sum_{i=1}^n \left\{ y_i - \beta_0 - \beta_1^\top (x - x_i) \right\}^2 K_h(x - x_i) , \tag{4.29}$$
where $\mathrm{Bias}\{\hat m(x)\} = E\{\hat m(x) - m(x)\}$.
Denote by AMSE the asymptotic MSE, which is obtained by ignoring all lower order terms in expressions like (4.17) and (4.18). This shows for the case of the Nadaraya-Watson estimates (as for most other nonparametric estimates) that
$$\mathrm{AMSE}(h) = \frac{1}{nh}\, c_1 + h^4 c_2 , \tag{4.32}$$
where $c_1$ and $c_2$ are constants. Minimizing with respect to $h$ yields that
$$h \propto n^{-1/5} . \tag{4.33}$$
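The rate in (4.33) follows from the first order condition of (4.32). As a short worked step, with $h_{opt}$ introduced here only as notation for the minimizer:

```latex
\frac{\partial}{\partial h}\,\mathrm{AMSE}(h)
  = -\frac{c_1}{n h^2} + 4 c_2 h^3 = 0
\quad\Longrightarrow\quad
h_{opt} = \left(\frac{c_1}{4 c_2}\right)^{1/5} n^{-1/5},
\qquad
\mathrm{AMSE}(h_{opt}) \propto n^{-4/5}.
```

The variance term $c_1/(nh)$ and the squared bias term $c_2 h^4$ are thus balanced at the optimum, and both are of order $n^{-4/5}$.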
where $\tilde w(\cdot)$ is some weight function. It may be employed to assign less weight to regions where the data are sparse. A discrete approximation to the ISE is the average squared error (ASE):
$$\mathrm{ASE}(h) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^n \{\hat m(x_i) - m(x_i)\}^2\, \tilde w(x_i) . \tag{4.35}$$
Both the ISE and the ASE are random variables. Taking the expectation of the ISE yields the mean integrated squared error (MISE)
$$\mathrm{MISE}(h) \stackrel{\mathrm{def}}{=} E\{\mathrm{ISE}(h)\} , \tag{4.36}$$
which is not a random variable. One may also take the expected value of the ASE, which yields the mean average squared error (MASE). We use a weighted version of the MASE for model selection in the semiparametric factor model, see Section 5.4.3.
For the Nadaraya-Watson estimator it has been shown by Marron and
Härdle (1986, Theorem 3.4) that under mild conditions the ISE, ASE, and
MISE are asymptotically equivalent in the sense that
and
$$\sup_h\, |\mathrm{ISE}(h) - \mathrm{MISE}(h)| / \mathrm{MISE}(h) \longrightarrow 0 \quad \text{a.s.} , \tag{4.38}$$
For the generalized cross validation selector, we have $CV(h) = G(h)$ with $\Xi_{GCV}$. For other asymptotically equivalent choices of $\Xi(\cdot)$ see Härdle et al.
(2004).
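If we denote by $\hat h$ the minimizer of $G(h)$ and by $\hat h^*$ the minimizer of ASE, then for $n \uparrow \infty$
$$\frac{\mathrm{ASE}(\hat h)}{\mathrm{ASE}(\hat h^*)} \stackrel{p}{\longrightarrow} 1 \quad \text{and} \quad \frac{\hat h}{\hat h^*} \stackrel{p}{\longrightarrow} 1 . \tag{4.46}$$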
$$\kappa \stackrel{\mathrm{def}}{=} \frac{K}{S_t} , \tag{4.47}$$
where $S_t = F_t e^{-r\tau}$, since $\delta \approx 0$.
In Table 4.1, we give an overview of the data employed. We prefer to present the summary statistics in the form of the IV data obtained by inverting the BS formula separately for each observation rather than in the form of the option price data itself. The corresponding option prices will be displayed
later in the context of the least squares kernel estimator, see the top panel
of Figure 4.7. For the distribution of the data across moneyness compare
Figure 4.1, which presents density plots of moneyness for calls, puts, and all
the observations observed on 20010102 for 17 days to expiry. The densities are
obtained via a nonparametric density estimator, and bandwidths are chosen
by Silverman’s rule of thumb. Silverman’s rule of thumb is a particular way
to choose bandwidths in nonparametric density estimation, see Härdle et al. (2004) for details. Put and call densities appear shifted. This is due to the higher liquidity of ATM and OTM options. For the sake of space, we do not present the very similar plots for the other expiry dates and 20010202.
Table 4.1. IV data as obtained by inverting the BS formula separately for each observation in the sense of two-step estimators.
For our smile fits, we pick the options nearest to expiry from the 20010102
data. We start using the Nadaraya-Watson estimator for different bandwidths
to demonstrate the tradeoff between bias and variance. The top left estimate
for the bandwidth h = 0.005 in Figure 4.2 is clearly undersmoothed: the
estimate is very rough especially in the far OTM regions and has spikes. Since
the smile in the ATM region looks already quite reasonable, one solution is to
employ local bandwidths h(x) that vary in x. In this case bandwidths should be
an increasing function in either direction from ATM. Alternatively one may
increase the global bandwidth: the estimate obtained for h = 0.01 appears already smoother, but still has some 'wiggles'. Increasing the bandwidth
further to h = 0.05 yields the smooth smile function seen in the lower left panel
in Figure 4.2. However, the function appears already slightly biased, since in
the wings of the smile the estimated function tends to lie systematically below
the IV observations. This becomes more obvious for the large and extremely
oversmoothing bandwidth h = 0.1, Figure 4.2 lower right panel. The reason
for this behavior of the Nadaraya-Watson estimator is that the number of observations becomes smaller and smaller the farther we move into the wings
of the smile. Thus, within the local window of averaging, the estimate will be
strongly influenced by the mass of the observations which have a lower IV.
Next, we run an Akaike penalizing approach for the bandwidth choice. In
the top panel of Figure 4.3, we display the penalized objective function. It
is a convex function that takes its minimum in the neighborhood of 0.0285,
for which we display the estimate in the lower panel. It appears to provide a
reasonable fit to the data.
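A penalized bandwidth choice of this kind can be sketched as a grid search. The block below is an illustration under explicit assumptions, not the procedure behind Figure 4.3: it assumes the penalized resubstitution estimate has the common textbook form $G(h) = n^{-1} \sum_i \{y_i - \hat m_h(x_i)\}^2\, \Xi\{n^{-1} w_{i,n}(x_i)\}$ with unit weight function, and that the Akaike penalty is $\Xi(u) = \exp(2u)$. The function names and the simulated data are hypothetical.

```python
import numpy as np


def quartic_kernel(u):
    """Quartic kernel with support [-1, 1] (standard form, assumed)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def penalized_resubstitution(h, x, y, penalty=None):
    """Penalized resubstitution estimate G(h) for the Nadaraya-Watson smoother."""
    if penalty is None:
        penalty = lambda u: np.exp(2.0 * u)   # Akaike-type penalty (assumed form)
    k = quartic_kernel((x[:, None] - x[None, :]) / h) / h
    denom = k.sum(axis=1)                     # positive: every point lies in its own window
    fitted = (k @ y) / denom
    leverage = (15.0 / 16.0 / h) / denom      # K_h(0) / sum_j K_h(x_i - x_j)
    return float(np.mean((y - fitted) ** 2 * penalty(leverage)))


def select_bandwidth(x, y, grid):
    """Return the grid value minimizing G(h) together with all scores."""
    scores = np.array([penalized_resubstitution(h, x, y) for h in grid])
    return float(grid[int(np.argmin(scores))]), scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0.8, 1.2, 300))
    y = 0.2 + 2.0 * (x - 1.0) ** 2 + rng.normal(0.0, 0.005, 300)
    h_opt, _ = select_bandwidth(x, y, np.linspace(0.01, 0.1, 10))
    print("selected bandwidth:", h_opt)
```

The grid should not extend to bandwidths so small that windows contain a single observation, since the resubstitution error is then zero and an exponential penalty cannot compensate for it.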
The exercise can be repeated for the local linear estimator. The results are
displayed in Figure 4.4. Typically the bandwidths for the local polynomial
estimator need to be bigger than for the Nadaraya-Watson estimator. This
is seen in the upper left plot of the figure. Here, the bandwidth in the wings
of the smile is too small to yield a reasonable estimate. For the bigger bandwidths better estimates are obtained. Note that the bias problem visible for the Nadaraya-Watson estimator is less present in local linear smoothing: even for the biggest bandwidth from our set, 0.1, we obtain a reasonable result. This is because even for larger intervals, the IV smile can be reasonably well fitted by piecewise linear splines. The bandwidth needs to be increased much more strongly to produce an estimate similar to that in the lower right panel of Figure 4.2. Given the typical parabolic shape of the smile function, this effect
is even more striking for local quadratic fits.
For precisely this reason we prefer local polynomial smoothing in smile
modeling: for the functional shapes that are usually encountered in smile
modeling, the local polynomial estimates appear to be relatively robust against
oversmoothing. This facilitates bandwidth choice enormously for two reasons:
first, the data in the outer wings of the smile can become very sparse. Thus, if a global bandwidth is to be used, it is likely that the smile needs to be oversmoothed. Second, from the perspective of computing daily estimates of the smile in a large sample as ours, it can be justified to employ one single and potentially slightly oversmoothing bandwidth for all estimates without minimizing the penalized resubstitution estimate again and again.
[Figure 4.2: four panels showing the estimated IV smile against moneyness.]
Fig. 4.2. Smile function obtained via Nadaraya-Watson smoothing for various bandwidths h. From top left to lower right h is: 0.005, 0.01, 0.05, 0.1.
For estimates of the entire IVS, in principle one could proceed similarly.
The empirical difficulty, however, seems to be that both cross validation and
penalizing approaches tend to yield unsatisfactory results due to the intricate
design of the IV data in the time to maturity direction: while the bandwidth
optimization in moneyness direction poses no difficulty, adding the time to
maturity dimension leads to convexity problems in the penalized function
and consequently to unreasonable minimizers, such as boundary solutions.
This phenomenon was first discussed by Fengler et al. (2003b), see also Fengler et al. (2003a).
[Figure 4.3: two panels, the penalized objective function (top) and the fitted smile (bottom).]
Fig. 4.3. The top panel displays the penalized resubstitution estimate of the Nadaraya-Watson estimator. Penalizing function is the Akaike function (4.44). The lower panel shows the smile function obtained for the optimal bandwidth h = 0.028.
[Figure 4.4: four panels showing the estimated IV smile against moneyness.]
Fig. 4.4. Smile function obtained via local linear smoothing for various bandwidths h. From top left to lower right h is: 0.005, 0.01, 0.05, 0.1.
The practical solution we adopt in most cases, where we use global band-
widths, such as in the CPC analysis, is the following: we run the aforemen-
tioned minimization only across moneyness in each of a number of daily
samples. Next, we inspect the minimizers and the bias over a wide range
of bandwidths. Typically the conclusions are similar, and we use slightly over-
smoothing, but fixed bandwidths for all estimates. This approach is justified
by the fact that in the time to maturity direction one is more interested in
interpolation rather than in smoothing. In the semiparametric factor model, where visual inspection is not directly possible, we propose a weighted Akaike penalization that explicitly takes into account the sparseness of the data. This
is explained in Section 5.4.
[Figure 4.5: two surface plots of the IVS over moneyness and time to maturity.]
Fig. 4.5. IVS estimation via Nadaraya-Watson (top panel) and local linear smoothing (lower panel). Global bandwidth $h_1 = 0.04$ in moneyness and $h_2 = 0.3$ in time to maturity direction.
[Figure 4.6: four surface plots; panel titles include "IVS: first order time to mat. derivative" and "IVS: second order moneyness derivative".]
Fig. 4.6. IVS derivative estimation via local polynomial estimation of order two. From upper left to lower right, the plots show the IVS on 20000502, the first order moneyness derivative, the first order time to maturity derivative, and the second order moneyness derivative. Bandwidths are localized. The corresponding LVS plot was given in Figure 3.1.
4.5 Least squares kernel smoothing
In this section, we propose a special smoother designed for estimating the IVS.
It is a one-step procedure based on a least squares kernel (LSK) estimator that
smoothes IV in the space of option prices. There is no need for first inverting
the BS formula to recover IV observations – the observed option prices are the
input parameters required. The LSK estimator is a special case of a general
class of estimators, the so called kernel M-estimators, that has been introduced
by Gouriéroux et al. (1994). Gouriéroux et al. (1995) employ this estimator
to model and predict stochastic IV.
Since we aim at estimating on a moneyness metric, we rewrite the BS
formula for calls (2.23) in terms of moneyness as follows, Gouriéroux et al.
(1995):
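$$C^{BS}(S_t, t, K, T, \sigma, r, \delta) = S_t\, c^{BS}(\kappa_t, \tau, \sigma, r, \delta) , \tag{4.48}$$
where $c^{BS}(\kappa_t, \tau, \sigma, r, \delta) \stackrel{\mathrm{def}}{=} \Phi(d_1) - \kappa_t e^{-r\tau} \Phi(d_2)$, and $d_1 = \frac{-\ln \kappa_t + (r + \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}}$, $d_2 = d_1 - \sigma\sqrt{\tau}$ as before. We recall that throughout this section we work with the simple moneyness measure
$$\kappa_t \stackrel{\mathrm{def}}{=} \frac{K}{S_t} . \tag{4.49}$$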
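The moneyness form (4.48) is easy to verify numerically. The following minimal sketch, with hypothetical function names and no dividend yield (in line with $\delta \approx 0$), evaluates $c^{BS}$ and recovers the ordinary call price by multiplying with $S_t$.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def c_bs(kappa: float, tau: float, sigma: float, r: float) -> float:
    """Relative BS call price c^BS as a function of moneyness kappa = K / S_t."""
    sqrt_tau = math.sqrt(tau)
    d1 = (-math.log(kappa) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt_tau)
    d2 = d1 - sigma * sqrt_tau
    return norm_cdf(d1) - kappa * math.exp(-r * tau) * norm_cdf(d2)


def bs_call(S: float, K: float, tau: float, sigma: float, r: float) -> float:
    """Call price via (4.48): C^BS = S_t * c^BS(K / S_t, ...)."""
    return S * c_bs(K / S, tau, sigma, r)


if __name__ == "__main__":
    print(bs_call(100.0, 100.0, 1.0, 0.2, 0.05))
```

Working with $c^{BS}$ makes option prices comparable across days with different levels of the underlying, which is what allows smoothing on a moneyness metric.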
(A1) The moneyness of the option prices is iid, and $E\kappa_t^4 < \infty$.
(A2) The weight function $w(\cdot)$ is uniformly continuous and bounded.
(A3) $K_{(1)}(\cdot)$ and $K_{(2)}(\cdot)$ are bounded probability density kernel functions with bounded support.
(A4) Interest rate $r$ is a fixed constant.
the impact from changing interest rates can be substantial for options with a
very long time to maturity.
Given assumptions (A1) to (A4), we have:
The proof can be found in Gouriéroux et al. (1994) and is contained for the
sake of completeness in the version of Fengler and Wang (2003) in Appendix C.
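where
$$\gamma^2 \stackrel{\mathrm{def}}{=} \Big[ E\{ -B^2(\kappa_t, \tau, r, \sigma)\, w(\kappa_t) + A(\kappa_t, \tau, r, \sigma)\, D(\kappa_t, \tau, r, \sigma)\, w(\kappa_t) \,|\, \mathcal{F}_t \}\, f_t(\kappa_t, \tau) \Big]^2 , \tag{4.53}$$
$$\nu^2 \stackrel{\mathrm{def}}{=} E\{ A^2(\kappa_t, \tau, r, \sigma)\, B^2(\kappa_t, \tau, r, \sigma)\, w^2(\kappa_t) \,|\, \mathcal{F}_t \} \times \int\!\!\int K_{(1)}^2(u)\, K_{(2)}^2(v)\, du\, dv , \tag{4.54}$$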
For the proof see Gouriéroux et al. (1994) and Appendix C. Finally, the results carry over to put options: by the put-call parity and the bounded pay-off of put options, both results also hold for put options, with $A$ replaced correspondingly.
The asymptotic distribution depends intricately on the first and second
order derivatives, and the particular weight function. Nevertheless an approx-
imation is simple, since the first and second order derivatives have the ana-
lytical expressions given in Equations (4.51) and (4.52).
For the choice of the weighting function, one may go back to the early literature on IV. In the vein of obtaining a good forecast of the asset price variability, these studies discuss weighting the observations intensively, see the discussion in Section 2.10. Schmalensee and Trippi (1978) and Whaley (1982) argue in favor of unweighted averages, i.e. they use the scalar estimate
$$\hat\sigma = \arg\min_{\sigma} \sum_{i=1}^n \{\tilde C_i - C^{BS}(\cdot, \sigma)\}^2 , \tag{4.55}$$
where $w_i \stackrel{\mathrm{def}}{=} \partial C_i / \partial\sigma$ is the option vega.
Similarly, Latané and Rendelman (1976) use the squared vega as weights:
$$\hat\sigma = \sqrt{\sum_{i=1}^n w_i^2\, \hat\sigma_i^2} \Big/ \sum_{i=1}^n w_i . \tag{4.57}$$
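Finally, Chiras and Manaster (1978) propose to employ the elasticity with respect to volatility:
$$\hat\sigma^* = \sum_{i=1}^n \eta_i\, \hat\sigma_i \Big/ \sum_{i=1}^n \eta_i , \tag{4.58}$$
where $\eta_i \stackrel{\mathrm{def}}{=} \frac{\partial C_i}{\partial\sigma} \frac{\sigma}{C_i}$.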
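The two scalar estimates (4.57) and (4.58) are simple functions of the individual IVs, the vegas and the option prices. The sketch below uses hypothetical function names and made-up inputs; it evaluates the elasticity at each option's own IV, which is an assumption about how $\sigma$ in $\eta_i$ is to be read.

```python
import math


def squared_vega_average(ivs, vegas):
    """Scalar IV estimate in the form of (4.57)."""
    numerator = math.sqrt(sum((w * s) ** 2 for w, s in zip(vegas, ivs)))
    return numerator / sum(vegas)


def elasticity_average(ivs, vegas, prices):
    """Scalar IV estimate in the form of (4.58), weights eta_i = vega_i * sigma_i / C_i."""
    etas = [w * s / c for w, s, c in zip(vegas, ivs, prices)]
    return sum(e * s for e, s in zip(etas, ivs)) / sum(etas)


if __name__ == "__main__":
    ivs = [0.22, 0.20, 0.21]        # illustrative values only
    vegas = [8.0, 11.0, 9.0]
    prices = [6.5, 3.2, 1.4]
    print(squared_vega_average(ivs, vegas), elasticity_average(ivs, vegas, prices))
```

The weights in (4.58) are normalized to sum to one, so identical IVs are reproduced exactly. The expression (4.57) does not have this property when more than one option enters: since $\sqrt{\sum_i w_i^2} \le \sum_i w_i$ for nonnegative weights, it never exceeds the largest individual IV.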
For calls and puts, vega is a Gaussian shaped function in the underlying
centered (roughly) ATM, compare Equation (4.51) and Figure 2.3. Elasticity is
a decreasing (increasing) function in the underlying for calls (puts). A common concern of the weighting procedures is to give low weight to ITM options, and
highest weight to ATM or OTM options: ITM options are more expensive than
ATM and OTM options because their intrinsic value, i.e. the payoff function
evaluated at the current underlying prices, is already positive. Thus, they
provide lower leverage for speculation, and produce higher costs in portfolio
hedging. Due to their lower trading volume, they are suspected to sell at a liquidity premium from which biased estimates of IV may ensue. Consequently, some authors delete or downweigh ITM options, Aït-Sahalia and Lo (1998).
[Figure 4.7: two panels, observed relative option prices (top) and the LSK smoothed IV smile (bottom).]
Fig. 4.7. Upper panel: Observed option price data on 20010102. From lower left to upper right relative put prices, from upper left to lower right relative call prices. Lower panel: LSK smoothed IV smile for 17 days to expiry on 20010102. Bandwidth $h_1 = 0.025$, quartic kernels employed. Minimization achieved by Golden section search. Dotted lines are the 95% confidence intervals for $\hat\sigma$. Single dots are IV data obtained by inverting the BS formula separately for each observation in the sense of two-step estimators.
[Figure 4.8: two panels, observed relative option prices (top) and the LSK smoothed IV smile (bottom).]
Fig. 4.8. Upper panel: Observed option price data on 20010202. From lower left to upper right relative put prices, from upper left to lower right relative call prices. Lower panel: LSK smoothed IV smile for 14 days to expiry on 20010202. Bandwidth $h_1 = 0.015$, quartic kernels employed. Minimization achieved by Golden section search. Dotted lines are the 95% confidence intervals for $\hat\sigma$. Single dots are IV data obtained by inverting the BS formula separately for each observation in the sense of two-step estimators.
The LSK estimator is general enough to allow for uniformly continuous
and bounded weighting functions w(κ) depending on moneyness. Technically,
it is possible to use weights depending also on other variables including σ
as done in (4.56) to (4.58). For several reasons, however, we refrain from
using more involved weight functions: first, when ITM options are deleted or
downweighted in the more recent literature, this choice is entirely determined
by moneyness, not by the vega. From this point of view, to have the weighting
scheme depend on σ is rather implicit. Second, from a statistical point of view,
weights depending on σ are likely to blow up the asymptotic variances in form
of the derivatives of w. This complicates the estimation and the computation
of the confidence bands without adding to the problem of recovering a good
estimate of the IVS. Finally, if one likes weights looking like the option vega
or elasticity with respect to volatility, one may very easily construct weights
w(κ) that look very similar. For instance, an estimator in the type of Latané
and Rendelman (1976) would put w shaped as a Gaussian density.
For the IVS estimation in our particular application, we want to give less
weight to ITM options. This can be achieved by using as weighting functions:
$$w(\kappa) = \frac{1}{\pi} \arctan\{\alpha(1 - \kappa)\} + 0.5 , \tag{4.59}$$
for calls, and for puts:
$$w(\kappa) = \frac{1}{\pi} \arctan\{\alpha(\kappa - 1)\} + 0.5 , \tag{4.60}$$
where π = 3.141... is the circle constant. The parameter α controls the
speed, with which ITM options receive lower weight. ATM options are equally
weighted. Outside κ ≈ 1, only OTM options enter the minimization with sig-
nificant weight. In our application we choose α = 9. Other values are perfectly
possible, and this choice is motivated to have a gentle transition between OTM
call and OTM put options. The ultimate choice of α will depend on the specific
application at hand.
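The two weight functions can be evaluated directly. The following is a minimal Python sketch of (4.59) and (4.60) as printed, with α = 9 as the default; the function names `weight_call` and `weight_put` are illustrative and not part of the original text.

```python
import math


def weight_call(kappa, alpha=9.0):
    """Weight (4.59) for calls: arctan(alpha * (1 - kappa)) / pi + 0.5."""
    return math.atan(alpha * (1.0 - kappa)) / math.pi + 0.5


def weight_put(kappa, alpha=9.0):
    """Weight (4.60) for puts: arctan(alpha * (kappa - 1)) / pi + 0.5."""
    return math.atan(alpha * (kappa - 1.0)) / math.pi + 0.5


if __name__ == "__main__":
    # Tabulate both weights on a small, hypothetical moneyness grid.
    for kappa in (0.80, 0.90, 1.00, 1.10, 1.20):
        print(f"kappa={kappa:.2f}  w_call={weight_call(kappa):.3f}  "
              f"w_put={weight_put(kappa):.3f}")
```

Because arctan is odd, the call and put weights sum to one at every moneyness and both equal 0.5 at κ = 1, which is the equal weighting of ATM options mentioned above. Larger values of α make the transition around κ = 1 steeper.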
As kernel functions we employ the quartic kernels given in Equation (4.4).
Other bounded kernels can perfectly be used, such as the Epanechnikov kernel
as stated in (4.5). In practice, the choice of the kernel functions has little
impact on the estimates, Marron and Nolan (1988) and Härdle (1990). Since
the minimization is globally convex (compare the proof of consistency in the
appendix), and well posed as long as h1 and h2 do not become unreasonably
small, any minimization algorithm for globally convex objective functions can
be employed. We use the Golden section search, described for instance in
Press et al. (1993) and implemented in XploRe, Härdle et al. (2000b). The
tolerance, i.e. the fractional precision of the minimum, is fixed at $10^{-8}$.
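To illustrate the minimization step, here is a minimal Python sketch of a Golden section search. It is not the XploRe routine used in the text. As a toy objective it takes the squared pricing error of a single call, which is a strong simplification of the kernel-weighted LSK criterion; the undiscounted Black formula, the zero interest rate and all numerical inputs are assumptions made for the example.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_call(forward, strike, tau, sigma):
    """Undiscounted Black call price (interest rate set to zero for brevity)."""
    sd = sigma * math.sqrt(tau)
    d1 = (math.log(forward / strike) + 0.5 * sd * sd) / sd
    return forward * norm_cdf(d1) - strike * norm_cdf(d1 - sd)


def golden_section_min(f, a, b, tol=1e-8, max_iter=200):
    """Minimize a unimodal function f on [a, b] by Golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # approx. 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        # Stop once the bracket is small relative to the location of the minimum.
        if abs(b - a) <= tol * (abs(c) + abs(d)):
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Hypothetical observation: forward 100, strike 105, 14 days to expiry, true IV 0.25.
observed = black_call(100.0, 105.0, 14.0 / 365.0, 0.25)


def objective(sigma):
    return (observed - black_call(100.0, 105.0, 14.0 / 365.0, sigma)) ** 2


sigma_hat = golden_section_min(objective, 0.01, 1.0)
print(f"recovered volatility: {sigma_hat:.6f}")
```

The search needs only function values, no derivatives, and each iteration shrinks the bracket by the constant factor of about 0.618.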
Fig. 4.9. Top panel: IVS fit for 20010102; lower panel: IVS fit for 20010202, both
with the LSK smoother. In both panels, bandwidths are h1 = 0.03 in the moneyness
direction and h2 = 0.07 in the time to maturity direction. Single dots denote IV data
obtained by inverting the BS formula separately for each observation in the sense of
two-step estimators. All observations are equally weighted.
We use the same data as already presented in Table 4.1. For the smile
estimation, we pick the options with the shortest time to expiry from the
20010102 and the 20010202 data. Plots are displayed in Figures 4.7 and 4.8.
The top panel shows the observed option prices given on the moneyness scale.
The function from the lower left to the upper right is the put price function,
the one from the upper left to the lower right the call price function. This is
at odds with the familiar ways of plotting these functions. The effect is due to
our definition of moneyness. The lower panels in Figures 4.7 and 4.8 present
the smile together with the asymptotic confidence bands. They fan out at the
wings of the smile since the data become increasingly sparse.
In Figure 4.9, fits for the entire IVS are presented. They appear under-
smoothed compared with Figure 4.5, since we used very small bandwidths in
the time to maturity direction. For these estimates, we do not employ the
weight functions (4.59) and (4.60): all observations are equally weighted.
4.6 Summary
Dimension-reduced modeling
5.1 Introduction
The IVS is a complex, high-dimensional random object. In building a model,
it is thus desirable to have a low-dimensional representation of the IVS. This
aim can be achieved by employing dimension reduction techniques. Generally
it is found that two or three factors with appealing financial interpretations
are sufficient to capture more than 90% of the IVS dynamics. This implies for
instance for a scenario analysis in risk-management that only a parsimonious
model needs to be implemented to study the vega-sensitivity of an option
portfolio, Fengler et al. (2002b). This section will give a general overview on
dimension reduction techniques in the context of IVS modeling. We will con-
sider techniques from multivariate statistics and methods from functional data
analysis. Sections 5.2 and 5.3 will provide an in-depth treatment of the CPC
and the semiparametric factor model of the IVS together with an extensive
empirical analysis of the German DAX index data.
In multivariate analysis, the most prominent technique for dimension re-
duction is principal component analysis (PCA). The idea is to seek linear com-
binations of the original observations, so called principal components (PCs)
that inherit as much information as possible from the original data. In PCA,
this means to look for standardized linear combinations with maximum vari-
ance. The approach appears to be sensible in an analysis of the IVS dynamics,
since a large variance separates out systematic from idiosyncratic shocks that
drive the surface. As a nice byproduct, the structure of the linear combinations
reveals relationships among the variables that are not apparent in the origi-
nal data. This helps understand the nature of the interdependence between
different regions in the IVS.
In finance, PCA is a well-established tool in the analysis of the term struc-
ture of interest rates, see Gouriéroux et al. (1997) or Rebonato (1998) for text-
book treatments: PCA is applied to a multiple time series of interest rates (or
forward rates) of various maturities that is recovered from the term structure
where xi are the vectors that contain the log-differences of IVs, and Σi are
the sample covariance matrices. The ellipse (5.1) is an approximate 95% con-
fidence region for a zero mean multivariate normal distribution.
The striking observation is that the principal axes in both time to matu-
rity groups are almost similar. It is only the volatility of IV that is different.
A natural assumption therefore is to attribute the variability of the axes to
sampling variability, and otherwise to estimate principal axes jointly in both
groups under the constraint that they are equal: for the same data, the re-
sults are displayed in Figure 5.2. Now, principal axes in both cases are iden-
tical. We shall show and test that this also holds across the short-term IVS.
Consequently, via CPC methods, a significant reduction of dimension can be
achieved for the IVS dynamics.
Denote by $X_i \overset{\mathrm{def}}{=} (x_{i1}, \ldots, x_{ip}) \in \mathbb{R}^p$, $i = 1, \ldots, k$, the IV returns for $k$
maturity groups at $p$ grid points in the IVS. The hypothesis for a CPC model
is written as:
$$ H_{CPC} : \Psi_i = \Gamma \Lambda_i \Gamma^\top , \quad i = 1, \ldots, k , \tag{5.2} $$
[Figure: scatterplots under PCA for 22 days and 90 days to maturity; horizontal axis: {0.925} moneyness, vertical axis: {1.050} moneyness. SCMcpcpca.xpl]
[Figure: scatterplots under CPC for 22 days and 90 days to maturity; horizontal axis: {0.925} moneyness, vertical axis: {1.050} moneyness. SCMcpccpc.xpl]
good description of the IVS dynamics. Furthermore, since the data in our
IVS groups are very much correlated, the factor series may be regarded as
scaled versions of each other. Thus, we can reduce our attention to study one
maturity group only. In total, instead of modeling kp factor series, we end up
with modeling three of them. These considerations demonstrate the usefulness
of dimension reduction techniques.
A particular strength of CPC models is that they encompass a whole family
of models with varying degrees of flexibility in the eigenstructure. The
proportional model puts additional constraints on the matrix of eigenvalues $\Lambda_i$
by imposing that $\lambda_{ij} = \rho_i \lambda_{1j}$, where $\rho_i > 0$ are unknown constants. This is
equivalent to writing:
$$ H_{prop} : \Psi_i = \rho_i \Psi_1 , \quad i = 2, \ldots, k . \tag{5.5} $$
The number of parameters here is $p(p + 1)/2 + (k - 1)$. For the IVS this
means that the variances of the common components between the groups are
proportionally scaled versions of each other. In terms of modeling the IVS,
this implies that one needs to resort to one maturity group only, once the
scaling constants $\rho_i$ are estimated.
By leaving the eigenvalues unrestricted as in the CPC hypothesis, one can
also ease the restrictions on the transformation matrix $\Gamma$: this leads to partial
Table 5.1. The table presents the hierarchy of nested CPC models. From top to
bottom restrictions on the estimated population covariance matrices are eased. Se-
quentially, starting from top, each model is tested against the next lower one in the
hierarchy. The degrees of freedom of the corresponding χ2 test as given in column
(3) are obtained by subtracting the number of parameters to be estimated in each
model, compare Flury (1988), p. 151, and Fengler et al. (2003b). After arriving at
the CPC hypothesis, one tests the CPC against the pCPC(p − 2) model. Next, the
pCPC(p − 2) model is tested against the pCPC(p − 3) model, and so on, down to the
pCPC(1) model which is finally tested against the hypothesis of arbitrary covariance
matrices.
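The degrees of freedom described in this caption can be reproduced by counting parameters. The Python sketch below does this for p = 6 moneyness points and k = 4 maturity groups. Only the count for the proportional model, p(p + 1)/2 + (k − 1), is stated in this text; the counts for the equality, CPC, pCPC(q) and unrelated models are assumptions taken from the standard hierarchy in Flury (1988), and the function name `n_params` is illustrative.

```python
def n_params(model, p, k, q=None):
    """Number of free covariance parameters per model (assumed standard counts)."""
    if model == "equality":
        return p * (p + 1) // 2
    if model == "proportionality":
        return p * (p + 1) // 2 + (k - 1)
    if model == "cpc":
        # p(p-1)/2 for the common orthogonal matrix, kp eigenvalues
        return p * (p - 1) // 2 + k * p
    if model == "pcpc":
        # CPC count plus (k-1) extra rotations of the p-q non-common axes
        return p * (p - 1) // 2 + k * p + (k - 1) * (p - q) * (p - q - 1) // 2
    if model == "unrelated":
        return k * p * (p + 1) // 2
    raise ValueError(f"unknown model: {model}")


p, k = 6, 4
hierarchy = [
    ("Equality", n_params("equality", p, k)),
    ("Proportionality", n_params("proportionality", p, k)),
    ("CPC", n_params("cpc", p, k)),
]
# pCPC(p-2) down to pCPC(1), as in the testing sequence described in the caption
hierarchy += [(f"pCPC({q})", n_params("pcpc", p, k, q)) for q in range(p - 2, 0, -1)]
hierarchy.append(("Unrelated", n_params("unrelated", p, k)))

# Degrees of freedom: difference in parameter counts of adjacent models
dfs = [lower[1] - higher[1] for higher, lower in zip(hierarchy, hierarchy[1:])]
for (higher, _), (lower, _), df in zip(hierarchy, hierarchy[1:], dfs):
    print(f"{higher:16s} vs {lower:16s} df = {df}")
```

Under these assumed counts the degrees of freedom are 3, 15, 3, 6, 9, 12 and 15, which agrees with the df column reported in Table 5.2.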
Here, we focus on the ordinary CPC model given in (5.2) due to its practical
importance and its similarity with the proportional and the pCPC models.
For the theory on the other models we refer to Flury (1988).
In abuse of notation, let $X_i \overset{\mathrm{def}}{=} (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, k$, be the $(n_i \times p)$
matrices of IV returns sampled from $k$ underlying $p$-variate normal
distributions $N(\mu, \Psi_i)$. As stated earlier, $\Psi_i$ denotes the population covariance
matrix. In our view the sample is recovered from a grid of size $(k \times p)$
obtained by smoothing the IVS as discussed in the previous chapter. Let $\Sigma_i$ be
the (unbiased) sample covariance matrix of the returns of IV. In our
applications, we derive returns as first order log-differences of IVs. The sample size
is $n_i > p$ for $i = 1, \ldots, k$.
Applying general results from multivariate analysis, Härdle and Simar
(2003), under the assumption of normality, the distribution of $\Sigma_i$ is a
generalization of the chi-squared variate, the Wishart distribution with scale matrix
$\Psi_i$ and $(n_i - 1)$ degrees of freedom. It is denoted by:
$$ n_i \Sigma_i \sim W_p(\Psi_i, n_i - 1) . $$
denotes the multivariate Gamma function, where $\pi = 3.141\ldots$, and
$\Gamma(t) \overset{\mathrm{def}}{=} \int_0^\infty e^{-s} s^{t-1} \, ds$ is the univariate Gamma function.
Taking partial derivatives with respect to all λij and γ j , it can be shown
that the solution of the CPC model can be written as the generalized system
of characteristic equations:
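$$ \gamma_m^\top \left\{ \sum_{i=1}^{k} (n_i - 1) \frac{\lambda_{im} - \lambda_{ij}}{\lambda_{im} \lambda_{ij}} \, \Sigma_i \right\} \gamma_j = 0 , \quad m, j = 1, \ldots, p , \; m \neq j . \tag{5.13} $$
This is solved observing
$$ \lambda_{im} = \gamma_m^\top \Sigma_i \gamma_m , \quad i = 1, \ldots, k , \; m = 1, \ldots, p , \tag{5.14} $$
and $\theta_{jm} \overset{\mathrm{def}}{=} \left\{ \sum_{i=1}^{k} \left( \frac{N - k}{n_i - 1} \, \frac{\lambda_{ij} \lambda_{im}}{(\lambda_{ij} - \lambda_{im})^2} \right)^{-1} \right\}^{-1}$ with $m \neq j$. We point out that
the variance matrix, as usual in PCA, does not have full rank. Instead, it has
rank $p(p - 1)/2$.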
Eigenvalues
subsamples:
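$$ H_0 : \lambda_{ij}^{(1)} = \cdots = \lambda_{ij}^{(r)} = \cdots = \lambda_{ij}^{(R)} $$
against the alternative $H_1 : \exists \, \lambda_{ij}^{(r_1)}, \lambda_{ij}^{(r_2)}$ such that $\lambda_{ij}^{(r_1)} \neq \lambda_{ij}^{(r_2)}$ for some
$r_1, r_2$. $H_0$ can be written as
$$ H_0 : \begin{cases} \lambda_{ij}^{(1)} - \lambda_{ij}^{(2)} = 0 \\ \quad \vdots \\ \lambda_{ij}^{(1)} - \lambda_{ij}^{(r)} = 0 \\ \quad \vdots \\ \lambda_{ij}^{(1)} - \lambda_{ij}^{(R)} = 0 \end{cases} . \tag{5.19} $$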
where nir is the sample size of group i and subsample r. A test for (5.19) can
be based on:
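$$ T_{equ} = (C_1 \tilde{\lambda}_{ij})^\top \left\{ C_1 \mathrm{Var}(\tilde{\lambda}) C_1^\top \right\}^{-1} C_1 \tilde{\lambda}_{ij} . \tag{5.21} $$
Since the $\lambda_{ij}$ are asymptotically normal and independent by virtue of (5.16),
$z \overset{\mathrm{def}}{=} \left\{ C_1 \mathrm{Var}(\tilde{\lambda}) C_1^\top \right\}^{-1/2} C_1 \tilde{\lambda}_{ij}$ is asymptotically $N(0_{R-1}, I_{R-1})$ under $H_0$.
Thus $T_{equ} = z^\top z$ is asymptotically $\chi^2$ distributed with $(R - 1)$ degrees of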
freedom. In practice all unknowns are to be replaced by consistent estimates,
which does not alter the asymptotic distribution of (5.21).
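would be
$$ H_0 : \begin{cases} \lambda_{ij}^{(2)} - \lambda_{ij}^{(1)} = 0 \\ \lambda_{ij}^{(3)} - \lambda_{ij}^{(2)} = 0 \\ \quad \vdots \\ \lambda_{ij}^{(R)} - \lambda_{ij}^{(R-1)} = 0 \end{cases} . $$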
The equivalence of the tests is due to the fact that any pair of contrast
matrices is related by a nonsingular matrix A such that C1 = AC2 . Inserting
AC2 into T yields:
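$$ \begin{aligned} T &= (C_1 \tilde{\lambda})^\top \left\{ C_1 \mathrm{Var}(\tilde{\lambda}) C_1^\top \right\}^{-1} C_1 \tilde{\lambda} \\ &= (A C_2 \tilde{\lambda})^\top \left\{ A C_2 \mathrm{Var}(\tilde{\lambda}) C_2^\top A^\top \right\}^{-1} A C_2 \tilde{\lambda} \\ &= (C_2 \tilde{\lambda})^\top \left\{ C_2 \mathrm{Var}(\tilde{\lambda}) C_2^\top \right\}^{-1} C_2 \tilde{\lambda} , \end{aligned} $$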
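The invariance of the statistic to the choice of contrast matrix can be checked numerically. The sketch below, in Python with NumPy, computes the quadratic form once with benchmark contrasts (first subsample against each other subsample) and once with sequential contrasts (adjacent subsamples). The eigenvalue estimates and sample sizes are hypothetical, and the variance 2λ²/(n − 1) is the usual asymptotic variance of an eigenvalue under normality, assumed here because (5.16) is not reproduced in this passage.

```python
import numpy as np


def wald_stat(C, lam, var_lam):
    """Quadratic form (C lam)' {C Var(lam) C'}^{-1} (C lam)."""
    d = C @ lam
    V = C @ var_lam @ C.T
    return float(d @ np.linalg.solve(V, d))


R = 4                                        # number of subsamples
lam = np.array([16.0, 15.2, 17.1, 14.5])     # hypothetical eigenvalue estimates
n = np.array([250, 250, 250, 250])           # hypothetical subsample sizes
var_lam = np.diag(2.0 * lam ** 2 / (n - 1))  # assumed asymptotic variances

# Benchmark contrasts: lambda^(1) - lambda^(r), r = 2, ..., R
C1 = np.hstack([np.ones((R - 1, 1)), -np.eye(R - 1)])
# Sequential contrasts: lambda^(r+1) - lambda^(r), r = 1, ..., R-1
C2 = np.eye(R, k=1)[: R - 1] - np.eye(R)[: R - 1]

T1 = wald_stat(C1, lam, var_lam)
T2 = wald_stat(C2, lam, var_lam)
print(f"T with C1: {T1:.6f}, T with C2: {T2:.6f}")
```

Both contrast matrices span the same (R − 1)-dimensional space of contrasts, so the two values coincide up to rounding error. The benchmark and sequential hypotheses are therefore tested by the same statistic when all R subsamples are compared jointly.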
Eigenvectors
In testing for the eigenvectors one faces the difficulty that the covariance
matrix Var(Γ) given in equation (5.18) is singular. The problem was first
solved for testing one single eigenvector by Anderson (1963) and generalized
for the case of several eigenvectors by Flury (1988). We adapt their strategies
here for our stability tests.
We will be interested in testing for stability of a single eigenvector across
different samples. Without loss of generality we focus on the first eigenvector.
Thus, the test will be based on the upper $p \times p$ matrix $\sum_{j=1, \, j \neq 1}^{p} \theta_{1j} \gamma_j \gamma_j^\top$ in (5.18),
only. Tests for equality of q > 1 eigenvectors would need to employ the qp × qp
upper submatrix of (5.18).
In analogy to (5.19) write
$$ H_0 : \gamma_1^{(1)} = \cdots = \gamma_1^{(r)} = \cdots = \gamma_1^{(R)} $$
against the alternative $H_1 : \exists \, \gamma_1^{(r_1)}, \gamma_1^{(r_2)}$ such that $\gamma_1^{(r_1)} \neq \gamma_1^{(r_2)}$ for some
$r_1, r_2$.
Again we rewrite $H_0$ as
$$ H_0 : \begin{cases} \gamma_1^{(1)} - \gamma_1^{(2)} = 0_p \\ \gamma_1^{(1)} - \gamma_1^{(3)} = 0_p \\ \quad \vdots \\ \gamma_1^{(1)} - \gamma_1^{(r)} = 0_p \\ \quad \vdots \\ \gamma_1^{(1)} - \gamma_1^{(R)} = 0_p \end{cases} , \tag{5.22} $$
and use the p(R − 1) × pR contrast matrix:
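$$ C_3 = \begin{pmatrix} I & -I & 0 & \cdots & 0 \\ I & 0 & -I & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ I & 0 & 0 & \cdots & -I \end{pmatrix} , \tag{5.23} $$
where $I \overset{\mathrm{def}}{=} I_p$ and $0 \overset{\mathrm{def}}{=} 0_{p \times p}$, here.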
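The block structure of the contrast matrix in (5.23) is conveniently built with Kronecker products. The following Python sketch with NumPy is one possible construction; the function name `contrast_matrix` and the example vector are illustrative assumptions.

```python
import numpy as np


def contrast_matrix(p, R):
    """Build the p(R-1) x pR matrix [1 (x) I_p, -I_{R-1} (x) I_p] as in (5.23)."""
    identity = np.eye(p)
    left = np.kron(np.ones((R - 1, 1)), identity)   # column of I_p blocks
    right = -np.kron(np.eye(R - 1), identity)       # block diagonal of -I_p
    return np.hstack([left, right])


p, R = 6, 3
C3 = contrast_matrix(p, R)

# Stacking the same (hypothetical) eigenvector R times satisfies H0 in (5.22),
# so all contrasts vanish.
g = np.full(p, 1.0 / np.sqrt(p))
stacked = np.tile(g, R)
print(C3.shape, np.abs(C3 @ stacked).max())
```

Each block row of the result compares the eigenvector of the first subsample with that of one other subsample, so the product with the stacked vector is zero exactly when all R eigenvectors agree.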
1). Thus $C_3 \mathrm{Var}(\tilde{\gamma}) C_3^\top$ has full rank only if $R(p - 1) \geq (R - 1)p$, or, equivalently,
There are several strategies of model selection. Given our maximum likelihood
framework, on the one hand, one may construct likelihood ratio tests and test
each model separately against the unrestricted model. The log-likelihood ratio
statistic for testing $H_{CPC}$ against the unrestricted model (unrelatedness
between covariance matrices) is given by:
$$ T = -2 \ln \frac{L(\hat{\Psi}_1, \ldots, \hat{\Psi}_k)}{L(S_1, \ldots, S_k)} = \sum_{i=1}^{k} (n_i - 1) \ln \frac{|\hat{\Psi}_i|}{|S_i|} , \tag{5.27} $$
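The statistic in (5.27) only requires log-determinants of the fitted and the sample covariance matrices. The Python sketch below, using NumPy, evaluates it on simulated data. As a stand-in for the maximum likelihood estimate of the common eigenvectors it uses the eigenvectors of the pooled covariance matrix; this is a simplifying assumption made for illustration and not the estimator used in the text. The fitted eigenvalues follow (5.14).

```python
import numpy as np


def lr_statistic(psi_hats, sample_covs, sample_sizes):
    """T = sum_i (n_i - 1) * ln(|Psi_hat_i| / |S_i|), as in (5.27)."""
    total = 0.0
    for psi, s, n in zip(psi_hats, sample_covs, sample_sizes):
        _, logdet_psi = np.linalg.slogdet(psi)
        _, logdet_s = np.linalg.slogdet(s)
        total += (n - 1) * (logdet_psi - logdet_s)
    return total


rng = np.random.default_rng(0)
sizes = [300, 300]
scales = [1.0, 0.6]                      # hypothetical group volatilities
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])     # hypothetical dependence structure
samples = [scale * rng.normal(size=(n, 3)) @ mixing
           for scale, n in zip(scales, sizes)]
covs = [np.cov(x, rowvar=False) for x in samples]

# Stand-in for the common eigenvectors: eigenvectors of the pooled covariance.
pooled = sum((n - 1) * s for n, s in zip(sizes, covs)) / (sum(sizes) - len(sizes))
_, gamma = np.linalg.eigh(pooled)

# Fitted matrices Gamma * diag(gamma_m' S_i gamma_m) * Gamma', compare (5.14).
psi_hats = [gamma @ np.diag(np.diag(gamma.T @ s @ gamma)) @ gamma.T for s in covs]

T = lr_statistic(psi_hats, covs, sizes)
print(f"T = {T:.3f}")
```

By Hadamard's inequality the determinant of each fitted matrix is at least that of the corresponding sample covariance matrix, so the statistic is nonnegative for any orthogonal matrix of candidate eigenvectors.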
For the empirical CPC analysis, we estimate the IVS for the 1995 to 2001
data from daily samples by means of a local polynomial estimator, Section 4.3.
The data set is described in the appendix. The moneyness grid is κf ∈ {0.925,
0.950, 0.975, 1.000, 1.025, 1.050} and the maturity grid is τ ∈ {0.0625, 0.1250,
0.1875, 0.2500} years, which corresponds to 22, 45, 68, and 90 days to expiry.
As kernel function we choose the product of univariate quartic kernels. In
the bandwidth selection, we proceed as discussed in Section 4.4.2. Since our
estimation grid only covers the short maturity data, there is no particular
Model
higher           lower            Chi. Sqr.   df.   p-val.   AIC      SIC
Equality         Proportionality  1174.9       3    0.00     3529.3   3529.3
Proportionality  CPC              1488.8      15    0.00     2360.3   2407.0
CPC              pCPC(4)           122.6       3    0.00      901.5   1181.2
pCPC(4)          pCPC(3)           210.6       6    0.00      784.9   1111.2
pCPC(3)          pCPC(2)           115.5       9    0.00      586.2   1005.8
pCPC(2)          pCPC(1)           398.9      12    0.00      488.7   1048.2
pCPC(1)          Unrelated          17.6      15    0.28      113.7    859.7
Unrelated                                                     126.0   1105.1
The results of our model selection procedures are displayed in Table 5.2. Ac-
cording to the sequential chi-squared tests the model to be preferred is a
pCPC(1) model, since this test is the first that cannot be rejected against the
next more flexible model. Also when testing directly against the unrelated
model, which is done by adding up the test statistics and the corresponding
degrees of freedom between the model of interest and the unrelated model in
Table 5.2, it is the pCPC(1) model which is not rejected. AIC and SIC both
recommend the pCPC(1). Note also that according to the SIC all CPC(q)
models with q ≤ 3 are superior to the unrelated model. For the remaining
CPC models, the SIC is slightly higher than for the unrelated case, whereas
for the proportional and the equality models the information criteria increase
tremendously. As shall be seen in the following, for an approximation up to
88%, one component will be sufficient, while the second and third add
only 6% and 3% of explained variance. Thus, we believe, also for
computational and practical simplicity, that a CPC model can be chosen as a valid
description of the IVS dynamics.
The estimation results of the eigenvectors for the entire sample period
exhibit the same stylized facts as documented in Fengler et al. (2003b) for
the year 1999 for daily settlement prices. In Figure 5.3, we display the results
for the first three eigenvectors. Table 5.3 reports the estimation results of
the entire matrix of eigenvectors. The numbers given in parentheses are the
asymptotic standard errors.
[Plot of eigenvector loadings; horizontal axis: index of eigenvectors.]
Fig. 5.3. CPC model for the entire sample period 19950101 - 20010531. First eigen-
vector horizontal line, second eigenvector diagonal line, third eigenvector U-shaped
line. Compare with Fengler et al. (2003b).
The factor loadings of the first eigenvector, the blue line in Figure 5.3, are
of the same sign throughout (eigenvectors are unique up to sign), and give
approximately the same weight to each volatility shock across the smile. We
hence interpret this factor as a common shift factor. In Figure 5.4, we present
the projection of the longest IV maturity group (three months maturity) using
the first eigenvector. The upper panel shows the PC, the lower the integrated
process. The shift interpretation of the first component is also visible from the
general structure of this process: in comparison with Figure 2.13, it is seen
that it exhibits almost the same patterns as the IV process itself.
In PCA, one typically employs the following measure to gauge the fraction
of variance, which is captured by the $j'$-th factor:
$$ \frac{\hat{\lambda}_{ij'}}{\sum_{j=1}^{p} \hat{\lambda}_{ij}} , \tag{5.30} $$
mon. index   γ̂_1       γ̂_2       γ̂_3       γ̂_4       γ̂_5       γ̂_6
1            0.344     -0.598     0.530     0.472    -0.129     0.055
            (0.0021)  (0.0095)  (0.0113)  (0.0070)  (0.0085)  (0.0037)
2            0.373     -0.385     0.022    -0.614     0.502    -0.288
            (0.0014)  (0.0044)  (0.0096)  (0.0086)  (0.0105)  (0.0068)
3            0.397     -0.173    -0.339    -0.326    -0.457     0.618
            (0.0010)  (0.0065)  (0.0055)  (0.0090)  (0.0090)  (0.0056)
4            0.419      0.024    -0.482     0.250    -0.337    -0.644
            (0.0011)  (0.0085)  (0.0038)  (0.0083)  (0.0088)  (0.0043)
5            0.440      0.270    -0.252     0.432     0.610     0.334
            (0.0012)  (0.0057)  (0.0074)  (0.0105)  (0.0081)  (0.0074)
6            0.463      0.625     0.554    -0.213    -0.191    -0.073
            (0.0022)  (0.0095)  (0.0108)  (0.0076)  (0.0052)  (0.0032)

mat. group   λ̂_i1      λ̂_i2      λ̂_i3      λ̂_i4      λ̂_i5      λ̂_i6
1            16.39      0.90      0.55      0.11      0.04      0.01
            (0.578)   (0.032)   (0.019)   (0.004)   (0.001)   (0.0003)
2            10.14      0.41      0.16      0.07      0.03      0.01
            (0.357)   (0.014)   (0.006)   (0.002)   (0.001)   (0.0004)
3             7.20      0.33      0.14      0.07      0.04      0.02
            (0.254)   (0.012)   (0.005)   (0.002)   (0.001)   (0.001)
4             6.01      0.40      0.23      0.09      0.06      0.02
            (0.211)   (0.014)   (0.008)   (0.003)   (0.002)   (0.001)

Table 5.3. In the top position the eigenvectors $\hat{\Gamma} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_6)$. From top to
bottom, the numbers denote the moneyness grid $\kappa_f \in \{0.925, 0.950, 0.975, 1.000,
1.025, 1.050\}$. The eigenvalues below, $\hat{\lambda}_{ij} \times 10^3$, are ordered from top to bottom
with increasing maturity $\tau_i \in \{0.0625, 0.1250, 0.1875, 0.2500\}$, standard errors in
parentheses; sample period 19950101 to 20010531.
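As a worked example of the measure in (5.30), the following Python sketch computes the share of variance captured by each component from the eigenvalues reported in Table 5.3. The common scaling factor of the eigenvalues cancels in the ratio. The dictionary keys (days to expiry) and the function name `explained_share` are illustrative choices.

```python
# Eigenvalues from Table 5.3, keyed by days to expiry of the maturity group.
eigenvalues = {
    22: [16.39, 0.90, 0.55, 0.11, 0.04, 0.01],
    45: [10.14, 0.41, 0.16, 0.07, 0.03, 0.01],
    68: [7.20, 0.33, 0.14, 0.07, 0.04, 0.02],
    90: [6.01, 0.40, 0.23, 0.09, 0.06, 0.02],
}


def explained_share(lams):
    """Fraction of total variance captured by each component, as in (5.30)."""
    total = sum(lams)
    return [lam / total for lam in lams]


shares = {days: explained_share(lams) for days, lams in eigenvalues.items()}
for days, share in shares.items():
    first_three = ", ".join(f"{100 * s:.1f}%" for s in share[:3])
    print(f"{days} days: {first_three}")
```

For the longest maturity group (90 days) this gives roughly 88%, 6% and 3% for the first three components. In the shortest maturity group the first component accounts for about 91%.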
the second type of shocks as common slope shocks. Figure 5.5 displays this
component. The integrated second PC has a stable downward trend, which
appears to revert around 1999. The third eigenvector can be interpreted as
a common twist factor. This factor hits the curvature of the surface, since
the sign of the eigenvector switches within the near-the-money region. Again
the projection and the integrated process are shown in Figure 5.6. These
components account for only 6% and 3% of the variance. Similar results have
been obtained by Zhu and Avellaneda (1997), Skiadopoulos et al. (1999),
Fig. 5.4. Projection of the longest maturity group (90 days to expiry) using the first
eigenvector. The upper panel shows returns, the lower panel the integrated series.
Alexander (2001b), Cont and da Fonseca (2002), Fengler et al. (2002b). The
interpretations of the factor loadings in terms of shift, slope and twist shocks
are also known from PCA studies on interest and forward rates, Bliss (1997)
and Rebonato (1998).
Table 5.4 summarizes the descriptive statistics of the PCs. The results are
similar to the findings of Cont and da Fonseca (2002) reported on the S&P 500
and the FTSE 100, except for the mean reversion. Whereas skewness is close
to zero for the three PCs, there is evidence for excess kurtosis especially for the
second and third PC. The mean reversion of the integrated first PC is found
to be around 230 days, i.e. almost a year, while the second PC exhibits a more
Fig. 5.5. Projection of the longest maturity group (90 days to expiry) using the
second eigenvector. The upper panel shows returns, the lower panel the integrated
series.
Fig. 5.6. Projection of the longest maturity group (90 days to expiry) using the third
eigenvector. The upper panel shows returns, the lower panel the integrated series.
Fig. 5.7. CPC model estimated separately in each annual sample 1995, 1996, 1997,
1998, 1999, 2000, 2001. Colors move from light to intensive tones the more recent
the subsample.
In Table 5.5, we present the test statistics and the p-values of our tests. For
the first eigenvector, against the benchmark year 1995, the stability hypothesis
cannot be rejected at the 5% level of significance except for the years 2000 and
2001. The sequential tests reveal that there is a significant change from 1996 to
1997 and from 1999 to 2000. Our interpretation of these results, together with
the visual inspection of Figure 5.7, is that the first eigenvector is relatively
reliable across the sample periods.
For the second and third eigenvectors, as can be conjectured from Fig-
ure 5.7, the case is much different: the stability hypothesis is strongly rejected
against the benchmark year. In the sequential tests, only from year 2000 to
2001 the null hypothesis cannot be rejected. There is a marginal case with
respect to the third eigenvector, from 1997 to 1998. Altogether, we conclude
that the second and third eigenvectors exhibit significant changes over time.
In the stability case of the eigenvalues, we present tests of only one group.
There would be – if we tested three eigenvalues only – 132 tests (benchmark
and sequential tests) to study in four groups with a moneyness grid of dimen-
sion six. Table 5.6 displays the results of the group with the shortest time to
maturity (22 days to expiry). Results for the other groups are very similar.
First eigenvectors
Sample 1  Sample 2  T       p-val.    Sample 1  Sample 2  T       p-val.
1995      1996       11.8   0.066
1995      1997        5.6   0.465     1996      1997       28.8   0.000
1995      1998       12.5   0.051     1997      1998        2.4   0.873
1995      1999        6.6   0.352     1998      1999        4.7   0.580
1995      2000       29.2   0.000     1999      2000       39.1   0.000
1995      2001       22.6   0.001     2000      2001       1.38   0.966

Second eigenvectors
Sample 1  Sample 2  T       p-val.    Sample 1  Sample 2  T       p-val.
1995      1996      205.9   0.000
1995      1997      188.8   0.000     1996      1997      289.0   0.000
1995      1998       85.8   0.000     1997      1998       19.6   0.003
1995      1999      539.1   0.000     1998      1999       99.3   0.000
1995      2000      100.8   0.000     1999      2000      173.8   0.000
1995      2001       28.4   0.000     2000      2001       9.34   0.155

Third eigenvectors
Sample 1  Sample 2  T       p-val.    Sample 1  Sample 2  T       p-val.
1995      1996      444.3   0.000
1995      1997      108.0   0.000     1996      1997      532.1   0.000
1995      1998       36.4   0.000     1997      1998       16.7   0.010
1995      1999      251.8   0.000     1998      1999       66.7   0.000
1995      2000       48.1   0.000     1999      2000       92.9   0.000
1995      2001       19.8   0.000     2000      2001        7.4   0.284

Table 5.5. Stability tests of eigenvectors. Tests are constructed as derived in (5.25).
The p-value is from a chi-squared variate with six degrees of freedom.
This is not surprising given the high degree of co-movements in the IVS. As is
seen in Table 5.6, the null hypothesis is rejected against the benchmark year
for all three eigenvalues. For the sequential tests, results are mixed: mostly
all tests reject, but e.g. between the years of financial crisis 1997 and 1998,
differences between the two samples are not significant in the first and second
eigenvalue. As a general bottom line, for the second eigenvalue, differences
between the years seem to be much less important than for the first and third
one. This is an interesting result since it says – given our interpretation of this
component earlier – that volatility in the wings of the IVS is more constant
than in the level and the twist component.
Summing up, from the stability tests, we draw the following conclusions:
stability of the eigenvalues – except for the second one – is rejected. This is
not a particular threat to modeling PCs or PCA in general, since it simply
indicates that GARCH-type models can be an adequate choice in the time
series context. For the eigenvectors, things look different: the good news is
that the first eigenvector, the component, which captures more than 80% of
the variance, is fairly stable. Thus in applications of risk controlling, such
as scenario analysis or stress tests, see e.g. Fengler et al. (2002b), one can
build on reliable estimates. The results from these experiments may not be
completely correct in the wings of the IVS. However, since the biggest threat
to option portfolios stems from level changes, the risk may be bearable from
a risk management point of view. The bad news applies to trading strategies
that aim at exploiting the wings of the IVS, i.e. trading in OTM puts or OTM
calls. Here, continuous recalibration of the models appears to be mandatory.
From our point of view, the results call for adaptive techniques of PCA
that identify homogeneous subintervals in the sample period by data-driven
methods. On homogeneous subintervals, reliable estimates are recovered. The
literature on adaptive estimation, as pioneered by Lepski and Spokoiny (1997)
and Spokoiny (1998), has been applied successfully in other contexts in finance
Due to the similarity within the groups (we consider the time series as scaled
versions of each other), we concentrate on one group only. We pick the longest
time to maturity group. The time series of the first three PCs $y_{k1}, y_{k2}, y_{k3}$ are
obtained from the projection $Y_k = X_k \hat{\Gamma}$. Since the stability of the second and
third eigenvectors was rejected, we reestimate the model in each subsample
and project using the new matrices Γ̂(r) . Based on autocorrelation and partial
autocorrelation plots, we propose adequate models for each univariate series.
Of course the univariate time series are not independent, but they are un-
correlated by construction. This is why modeling the univariate series can be
justified, see Zhu and Avellaneda (1997) for a similar approach. By AIC and
SIC searches we will identify a best fitting model, and present the estimation
results in more detail.
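The projection step and the lack of correlation between the resulting series can be illustrated as follows. This Python sketch with NumPy uses simulated returns and the eigenvectors of a single sample covariance matrix as a stand-in for the common eigenvector matrix; with a jointly estimated CPC matrix the sample covariance of the projections is only approximately diagonal. All data and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 6

# Hypothetical IV returns: one strong common shock plus idiosyncratic noise.
common = rng.normal(scale=0.04, size=(n, 1))
X = common + rng.normal(scale=0.01, size=(n, p))

S = np.cov(X, rowvar=False)
eigval, eigvec = np.linalg.eigh(S)       # eigenvalues in ascending order
order = np.argsort(eigval)[::-1]         # reorder to descending variance
Gamma = eigvec[:, order]

Y = X @ Gamma                            # projection, columns are the PC series
cov_Y = np.cov(Y, rowvar=False)
off_diag = cov_Y - np.diag(np.diag(cov_Y))

print("variances of the PCs:", np.round(np.diag(cov_Y), 6))
print("largest absolute covariance between PCs:", np.abs(off_diag).max())
```

The off-diagonal covariances vanish up to rounding error, which is the sense in which the factor series are uncorrelated by construction. Uncorrelatedness does not imply independence, so separate univariate models remain an approximation.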
From Figures 5.4 to 5.6, it is seen that the first three PCs display a be-
havior close to white noise. This impression is reinforced when inspecting
the autocorrelation and partial autocorrelation functions as displayed in Fig-
ures 5.8 to 5.13. From Figure 5.8 it is seen that the first component exhibits
no autocorrelation: it immediately dies off. Also the partial autocorrelation
function in Figure 5.9 does not show a particular structure. Thus, the first
component, which explains up to 88% of the variance, can be considered as
noise.
For the second and third components a different picture arises: from Fig-
ures 5.10 and 5.12 a negative first order correlation is visible hinting towards
an MA(1) model. Also the partial autocorrelation functions in Figures 5.11
and 5.13 display the typical patterns of an MA process.
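The MA(1) signature referred to here can be made concrete: for an MA(1) process with coefficient b, the first-order autocorrelation is b/(1 + b²) and all higher-order autocorrelations are zero. The Python sketch below compares this theoretical value with the sample autocorrelations of a simulated series. The coefficient −0.733 is the MA estimate reported in Table 5.8; the simulation setup (Gaussian unit-variance shocks, 5000 observations, fixed seed) is an assumption for illustration.

```python
import random


def ma1_acf_lag1(b):
    """Theoretical first-order autocorrelation of an MA(1) process."""
    return b / (1.0 + b * b)


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]


random.seed(0)
b = -0.733
eps = [random.gauss(0.0, 1.0) for _ in range(5001)]
y = [eps[t] + b * eps[t - 1] for t in range(1, len(eps))]

acf = sample_acf(y, 3)
print(f"theoretical lag-1 autocorrelation: {ma1_acf_lag1(b):.4f}")
print("sample autocorrelations, lags 1 to 3:", [round(r, 3) for r in acf[1:]])
```

The theoretical lag-one value is about −0.48, and the sample autocorrelations beyond lag one are close to zero. A single negative spike at lag one followed by nothing is the pattern that suggests an MA(1) specification.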
With this preliminary analysis at hand, we perform AIC and SIC searches
over MA(q)-GARCH(r, s) models, where q = 0, r = 1, 2, s = 1, 2 for the
first, and q = 1, r = 1, 2, s = 1, 2 for the second and third component. We
also estimate different types of GARCH models such as TGARCH
specifications in order to investigate asymmetries in shocks. Since Table 5.4 suggests
a substantial correlation with the contemporaneous index returns, we
include index returns in the mean equations of all processes, and
additionally in the variance equation of the first component.
The MA-GARCH models for the components j = 1, 2, 3 are given by:
$$ y_{jt} = c + a_1 z_t + \varepsilon_{jt} + b_1 \varepsilon_{j,t-1} , \tag{5.31} $$
$$ \varepsilon_{jt} \sim N(0, \sigma_{jt}^2) , $$
$$ \sigma_{jt}^2 = c_\sigma + \sum_{m=1}^{r} \alpha_m \sigma_{j,t-m}^2 + \sum_{m=1}^{s} \beta_m \varepsilon_{j,t-m}^2 + \gamma z_t^2 , \tag{5.32} $$
where we denote the elements of $y_{k1}, y_{k2}, y_{k3}$ by $y_{1t}, y_{2t}, y_{3t}$ to put ourselves
into the usual time series notation. Log-returns in the DAX index are denoted
by $z_t$.
[Figures: ACF and PACF of the 1st PC, lags 0 to 30.]
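To see how the mean equation (5.31) and the variance equation (5.32) interact, the following Python sketch simulates one factor series. It follows the notation of the text, where the α coefficients multiply lagged conditional variances and the β coefficients multiply lagged squared shocks. The parameter values are chosen in the range of the estimates in Table 5.8, the index returns are simulated Gaussian noise, and the start-up values are set to the unconditional variance; all of these are assumptions for illustration.

```python
import math
import random


def simulate_ma_garch(z, c, a1, b1, c_sigma, alpha, beta, gamma=0.0, seed=0):
    """Simulate y_t from (5.31) and (5.32); returns (y, sigma2)."""
    rng = random.Random(seed)
    persistence = sum(alpha) + sum(beta)
    # Pre-sample variances and squared shocks: unconditional variance if it exists.
    start_var = c_sigma / (1.0 - persistence) if persistence < 1.0 else c_sigma
    y, eps, sigma2 = [], [], []
    for t, z_t in enumerate(z):
        var_t = c_sigma + gamma * z_t ** 2
        for m, alpha_m in enumerate(alpha, start=1):
            var_t += alpha_m * (sigma2[t - m] if t >= m else start_var)
        for m, beta_m in enumerate(beta, start=1):
            var_t += beta_m * (eps[t - m] ** 2 if t >= m else start_var)
        eps_t = math.sqrt(var_t) * rng.gauss(0.0, 1.0)
        lagged_eps = eps[t - 1] if t >= 1 else 0.0
        y.append(c + a1 * z_t + eps_t + b1 * lagged_eps)
        eps.append(eps_t)
        sigma2.append(var_t)
    return y, sigma2


# Hypothetical index log-returns with 1.5% daily volatility.
index_rng = random.Random(42)
z = [index_rng.gauss(0.0, 0.015) for _ in range(1000)]

# MA(1)-GARCH(1,1) with illustrative parameter values.
y, sigma2 = simulate_ma_garch(
    z, c=1.9e-4, a1=0.086, b1=-0.733, c_sigma=6.7e-5, alpha=[0.425], beta=[0.200]
)
print(f"mean conditional variance: {sum(sigma2) / len(sigma2):.6f}")
```

The sketch assumes nonnegative variance coefficients. With a negative coefficient, such as the negative second β reported for the first component, the conditional variance is not guaranteed to stay positive in simulation and would need an explicit check.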
Table 5.7 displays the statistics of the model selection criteria for the
different models under consideration. For $y_{1t}$ both AIC and SIC suggest a
GARCH(1,2) specification. For $y_{2t}$ and $y_{3t}$, the results are not as clear-cut.
Since the model selection criteria differ very little across specifications, we
decided for the more parsimonious model, i.e. an MA(1)-GARCH(1,1) model
for both.
Given these results, one may like to alter the variance equation to allow
for asymmetries in shocks: under the TGARCH model, Glosten et al. (1993)
[Figures: ACF and PACF of the 2nd PC, lags 0 to 30.]
In this model, good news, $\varepsilon_t > 0$, and bad news, $\varepsilon_t < 0$, have differential effects
on the conditional variance: good news have an impact of $\sum_{m=1}^{s} \beta_m$, while
bad news have an impact of $\sum_{m=1}^{s} \beta_m + \beta_1^-$. If $\beta_1^- > 0$, a leverage effect exists,
and the news impact is asymmetric if $\beta_1^- \neq 0$. We also estimated EGARCH
models, Nelson (1991); however, since they did not produce any substantial
gain compared to the other models, we do not report the estimation results
here.
[Figures: ACF of the 3rd PC and PACF (panel titled "PACF 2nd PC" in the source), lags 0 to 30.]
In Table 5.8, the estimation results are displayed in more detail. From
the mean equation for y1t it is evident that the index returns have a highly
significant impact on the first PC. The sign is in line with the leverage effect
hypothesis. In the variance equation all parameters are significant. β2 < 0
may be interpreted as an ‘over-reaction correction’ in terms of the variance:
high two-period lagged returns have a dampening impact on the variance. As
is to be expected, volatility increases also when volatility in the underlying
is high (γ > 0). From the TGARCH model, no evidence for a GARCH type
leverage effect is found, since β1− < 0. The other parameter estimates for the
TGARCH are of same size and significance level. The adjusted R̄2 is around
23%. This is high, however, it is entirely due to the index returns included in
Factor            y1t                   y2t                   y3t
cond. mean
c            0.001     0.001      1.9E-4    1.0E-4    -3.8E-05  -5.8E-05
            [0.407]   [1.048]    [1.170]   [0.566]    [-0.592]  [-0.907]
a1          -2.920    -2.930      0.086     0.079      0.005     0.004
           [-24.46]  [-24.21]    [4.860]   [4.564]    [0.457]   [0.351]
b1                               -0.733    -0.501     -0.733    -0.729
                                [-35.50]  [-21.78]   [-35.50]  [-34.81]
cond. var.
cσ           1.4E-4    1.6E-4     6.7E-5    6.4E-5     1.7E-05   2.2E-05
            [3.945]   [4.141]    [7.515]   [7.353]    [8.687]   [8.681]
α1           0.803     0.797      0.425     0.462      0.686     0.631
            [32.09]   [29.07]    [6.774]   [7.791]    [24.41]   [17.11]
β1           0.246     0.284      0.200     0.115      0.147     0.082
            [7.112]   [7.598]    [6.840]   [3.505]    [8.027]   [3.206]
β2          -0.130    -0.124
           [-4.110]  [-3.611]
β1−                   -0.950                0.150                0.142
                     [-3.706]              [3.239]              [3.916]
γ            1.480     1.580
            [4.991]   [4.909]
R̄²           0.23      0.23       0.22      0.21       0.33      0.33

Table 5.8. Estimation results of GARCH models for the three PCs, t-statistics in
brackets.
the regression. Leaving zt out of the mean equations reduces the R̄2 to around
2%, only.
In the mean equations of y2t and y3t , the MA(1) components are negative
and significant. The index returns are only significant for y2t and positively
influence the slope structure in the surface. Thus, together with the results
for y1t , we see that positive shocks in the underlying tend to reduce IV levels,
while at the same time the slope of the surface is intensified. The variance
equations do not exhibit any special features; it is interesting, however,
that a GARCH type leverage effect is present, since β1− > 0: lagged negative
shocks increase the variance of both processes.
We have seen that CPC models yield a valid description of the IVS dynamics.
They offer a convenient framework for model choice – and ultimately – for a
low-dimensional description of the IVS. Three components that have intuitive
financial interpretations as a shift, a slope and a twist shock appear to yield a
sufficiently exact representation. Stability tests indicate that the first and most
important component is fairly stable, while this conclusion cannot be drawn
for the other two components. We employed GARCH models to describe the
dynamics of the resulting factor series.
Within this framework, risk and scenario analysis for portfolios can be
implemented in a straightforward manner, see Fengler et al. (2002b) and
Fengler et al. (2003b). Forecasting is likely to be limited: at best, a one-day
forecast can be performed. Since this will be done in the context of the
semiparametric factor model, we do not perform a separate forecast exercise
at this point.
A potential disadvantage of CPC models is that the number of time series
to be modelled is a multiple of the number of time to maturity groups, if
one does not follow our simplification of modeling the series as scaled
versions of each other. Also, it would be more elegant if factor extraction
and surface estimation could be performed within a single step. This can be
resolved by applying a functional PCA or using a semiparametric factor model,
as shall be seen presently.
Rethinking the approach taken in Section 5.2 suggests carrying the idea of
PCA over to the functional case: this leads to functional PCA (FPCA). In
PCA we obtain eigenvectors which are used to project the slices of the IVS
into a lower dimensional space. In FPCA, we will recover eigenfunctions, or
eigenmodes, for this projection (now defined in a functional sense). Similarly
to PCA, we can represent the IVS as a linear combination of uncorrelated
(scalar) random variables, which – via their eigenfunctions – unfold the high-
dimensional dynamics of the IVS. In the literature of signal processing this
representation is often called Karhunen-Loève expansion or decomposition.
In the following subsection the basic ideas of the FPCA approach will be
sketched. Key references for functional data analysis are Besse (1991) and
Ramsay and Silverman (1997), who coined the term functional data analysis.
We also briefly address ways of computing FPCs. The first application in the
context of the IVS is due to Cont and da Fonseca (2002), who studied the
IVS derived from options on the S&P 500 index and the FTSE 100 index. A
treatment that also focuses on the computational aspects of FPCA is given
by Benko and Härdle (2004).
The notation of the inner product can be distinguished from the covariation
process of two stochastic processes, $\langle \cdot, \cdot \rangle_t$, which
is indexed by t.
We interpret a random surface X as a random function such that each
realization ω ∈ Ω gives a smooth surface X(ω, ·) : J → R. Without loss of
generality we assume that X is mean zero. For the precise probabilistic set-up,
we refer to Dauxois et al. (1982) and Pezzulli and Silverman (1993).
One can derive FPCA in the same step-wise manner as is typically done
in PCA in standard textbook treatments: find linear combinations, i.e. weight
functions γ(u), such that the projection
5.3 Functional data analysis 161
$$ Y_1 = \int_J \gamma_1(u)\, X(u)\, du = \langle \gamma_1, X \rangle \tag{5.36} $$
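subject to $\|\gamma_j\|^2 = 1$ and $\langle \gamma_{j'}, \gamma_j \rangle = 0$
for $j' < j$. The covariance between the two surface values at $u, v \in J$ is
denoted by $C(u, v) \stackrel{\mathrm{def}}{=} \mathrm{Cov}\{X(u), X(v)\}$, and
the integral transform of the weight function $\gamma$ with kernel $C$ is
defined by:
$$ A\gamma(\cdot) \stackrel{\mathrm{def}}{=} \int_J C(\cdot, v)\, \gamma(v)\, dv . \tag{5.38} $$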
There are a number of methods for computing FPCs and solving (5.39), see
Ramsay and Silverman (1997). The first approach consists in discretizing the
functions. In the simplest case, when J is only a one-dimensional interval,
say a particular smile or the ATM term structure, one can recover the values
$x_i(u_1), x_i(u_2), \ldots, x_i(u_p)$ on a dense grid, and store the data in
an (n × p) matrix. Then an ordinary PCA is applied. Since in practice it can
happen that p > n, it may be necessary to recover the solution to the
eigenvalue problem from the singular value decomposition of the data matrix.
In order to recover the functional form of the eigenvectors, they are
renormalized and suitably interpolated, Ramsay and Silverman (1997,
Section 6.4.1). In principle, one could proceed similarly in the
two-dimensional case where J contains the full region of moneyness by
stacking the surfaces into a huge matrix. After applying an ordinary PCA, the
resulting eigenvectors are rearranged to recover the two-dimensional
eigenfunctions.
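The discretization approach can be illustrated by a minimal sketch. It assumes an equally spaced one-dimensional grid and uses the grid spacing as quadrature weight; the function name `fpca_discretized` and this particular renormalization are illustrative choices, not taken from the references above.

```python
import numpy as np

def fpca_discretized(X, grid):
    """FPCA for curves sampled on a common, equally spaced grid (assumption).

    X    : (n, p) array, row i holds x_i(u_1), ..., x_i(u_p)
    grid : (p,) equally spaced points u_1 < ... < u_p
    """
    n, p = X.shape
    du = grid[1] - grid[0]              # grid spacing, used as quadrature weight
    Xc = X - X.mean(axis=0)             # centre: the text assumes mean zero
    # The SVD also works when p > n; no (p x p) covariance matrix is formed.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (n - 1) * du       # eigenvalues of the covariance operator
    eigfuns = Vt / np.sqrt(du)          # renormalised: sum(gamma^2) * du = 1
    scores = Xc @ eigfuns.T * du        # Riemann sum for <gamma_j, x_i>
    return eigvals, eigfuns, scores
```

Each row of `eigfuns` holds one discretized eigenfunction on the grid; interpolating between grid points would recover a functional form.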
Another, more elegant solution relies on basis expansions of the
eigenfunctions, Ramsay and Silverman (1997, Section 6.4.2) and Cont and da
Fonseca (2002). Suppose that the IVS admits an expansion in terms of a set of
L basis functions $\phi_1(u), \phi_2(u), \ldots, \phi_L(u)$, $u \in J$. Then
each function is written as:
$$ x_i(u) = \sum_{l=1}^{L} c_{il}\, \phi_l(u) , \tag{5.43} $$
$$ \widehat{\mathrm{Cov}}\{X(u), X(v)\} \stackrel{\mathrm{def}}{=} \frac{1}{n-1}\, \phi^\top(u)\, \mathbf{C}^\top \mathbf{C}\, \phi(v) . \tag{5.45} $$
$$ \frac{1}{n-1}\, \phi^\top(u)\, \mathbf{C}^\top \mathbf{C} \mathbf{W} b = \lambda\, \phi^\top(u)\, b . \tag{5.47} $$
Since the last equation must hold for any $u \in J$, it reduces to the pure
matrix equation:
$$ \frac{1}{n-1}\, \mathbf{C}^\top \mathbf{C} \mathbf{W} b = \lambda\, b . \tag{5.48} $$
Equation (5.48) is further simplified by the following observation: in our
basis framework, the inner product corresponds to
$$ \langle \gamma_j, \gamma_{j'} \rangle = \int_J b_j^\top \phi(u)\, \phi^\top(u)\, b_{j'}\, du = b_j^\top \mathbf{W} b_{j'} . \tag{5.49} $$
Defining $u \stackrel{\mathrm{def}}{=} \mathbf{W}^{1/2} b$, one can transform
(5.48) into the symmetric eigenvalue problem:
$$ \frac{1}{n-1}\, \mathbf{W}^{1/2} \mathbf{C}^\top \mathbf{C} \mathbf{W}^{1/2} u = \lambda\, u . \tag{5.50} $$
This is solved using any standard PCA routine in statistical packages. The
desired eigenfunctions are recovered by $b = \mathbf{W}^{-1/2} u$.
A special case occurs when the basis functions are orthonormal. Then
$\mathbf{W} = \mathbf{I}_L$, i.e. it becomes the identity matrix of order L.
Hence, FPCA is reduced to the multivariate PCA performed on the coefficient
matrix $\mathbf{C}$, Ramsay and Silverman (1997).
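The route from (5.48) to (5.50) may be sketched numerically as follows. The sketch assumes that the coefficient matrix C of the centred curves and the Gram matrix W of the basis are already available; the function name `fpca_basis` is hypothetical, and the symmetric square root of W is computed from its spectral decomposition.

```python
import numpy as np

def fpca_basis(C, W):
    """Solve (5.48) via the symmetric eigenvalue problem (5.50).

    C : (n, L) basis coefficients c_il of the centred curves
    W : (L, L) symmetric positive definite Gram matrix of the basis
    Returns eigenvalues (descending) and coefficient vectors b as columns.
    """
    n = C.shape[0]
    w, V = np.linalg.eigh(W)                 # spectral decomposition of W
    W_half = (V * np.sqrt(w)) @ V.T          # W^{1/2}
    W_half_inv = (V / np.sqrt(w)) @ V.T      # W^{-1/2}
    S = W_half @ (C.T @ C) @ W_half / (n - 1)
    lam, U = np.linalg.eigh(S)               # eigh returns ascending order
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    B = W_half_inv @ U                       # b = W^{-1/2} u
    return lam, B
```

With an orthonormal basis, W is the identity and the function reduces to an ordinary PCA of C, in line with the special case noted above.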
The concept of expanding the unknown solution to (5.39) on a set of basis
functions is also known as collocation. It should be pointed out that this
superposition of basis functions leads to a strong solution of the underlying
Fredholm integral equation of the second kind. This highlights the main
difference to the well-known Galerkin methods that solve (5.39) in a weak
sense, i.e. with respect to the corresponding dual space of H. To implement
the Galerkin method, one starts with a finite dimensional subspace of this
dual space, and solves (5.39) with respect to a basis of this subspace. As
the dimension of that subspace tends to infinity, one can obtain a solution
that holds for all linear functionals. The Galerkin approach is taken by
Cont and da Fonseca (2002).
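$$ \langle \varepsilon_j, \phi_l \rangle = 0 . \tag{5.53} $$
This yields
$$ \sum_{l=1}^{L} b_{j,l} \int_J \int_J C(u, v)\, \phi_j(u)\, \phi_l(v)\, dv\, du - \lambda_j \int_J \phi_j(u)\, \phi_l(u)\, du = 0 . \tag{5.54} $$
$$ \mathbf{B} = (b_{j,l}) \tag{5.55} $$
$$ \mathbf{W} = (w_{j,l}) \stackrel{\mathrm{def}}{=} \int_J \phi_j(v)\, \phi_l(v)\, dv \tag{5.56} $$
$$ \mathbf{C} = (c_{j,l}) \stackrel{\mathrm{def}}{=} \int_J \int_J C(u, v)\, \phi_j(u)\, \phi_l(v)\, dv\, du \tag{5.57} $$
$$ \mathbf{\Lambda} \stackrel{\mathrm{def}}{=} \mathrm{diag}(\lambda_j,\ j = 1, \ldots, N) . \tag{5.58} $$
$$ \mathbf{C}\mathbf{B} = \mathbf{\Lambda}\mathbf{W}\mathbf{B} . \tag{5.59} $$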
In modeling the IVS one faces two main challenges: first, the data design is
degenerate. Due to trading conventions, observations of the IVS occur only
for a small number of maturities such as one, two, three, six, nine, twelve, 18,
and 24 months to expiry on the date of issue. Consequently, IVs appear like
pearls strung on a necklace – or in short – as strings. This pattern has been
discussed in Section 2.5. For convenience, we display again the IVS together
with a plot, which shows the data design as seen from the top, Figure 5.14.
Options belonging to the same string have a common time to maturity, i.e.
lie on the same line. As time passes, the strings move through the maturity
axis towards expiry, while changing levels and shape in a random fashion.
As a second challenge, also in the moneyness dimension, the observation
grid does not cover the desired estimation grid at any point in time with the
same density. Consider, for instance, the third IV string from the bottom:
only a moneyness interval between 0.8 and 1.1 is occupied, while the coverage
for the second string from the bottom is much wider. The reasons for this
pattern can be twofold: first, these contracts have simply not been traded and
consequently do not show up in a (transaction based) data set. The second
reason – which is the more likely in this particular case – is hidden in the
specific institutional arrangements at the futures exchange with regard to the
creation of new contracts. Note that the options belonging to the third string
expire in July and have been created at the beginning of April. When new
contracts of a particular time to maturity are created, they are not available
on the entire strike spectrum: initially, only a certain range of OTM and
ITM options are open for trading. New contracts of this time to maturity are
subsequently born, as the underlying price moves. This practice ensures that a
minimum range of OTM and ITM options around the current spot price of the
underlying asset is always maintained. In reference to Figure 5.14, this means
that contracts of other strikes may simply not exist, since the underlying
moved too little between April and May.
Whatever the precise reasons are, it needs to be taken as a fact that even
when the data sets are as huge as ours, IV observations are missing for
certain subregions of the desired estimation grid in a large number of cases.
Of course, this is a point that will be most virulent in transaction based
data sets.
The dimension reduction techniques from the previous sections fit the IVS
on a grid for each day. Afterwards a PCA using a functional norm is applied
to the surfaces. For the semi- or nonparametric approximations to the IVS,
which are used within this work and which are promoted by Aït-Sahalia and
Lo (1998), Rosenberg (2000), Aït-Sahalia et al. (2001b), Cont and da Fonseca
(2002), Fengler et al. (2003b), and Fengler and Wang (2003), this design may
pose difficulties. For illustration, consider in Figure 5.15 (left panel) the fit
of a standard Nadaraya-Watson estimator. Bandwidths are h1 = 0.03 for the
moneyness and h2 = 0.04 for the time to maturity dimension (measured in
Fig. 5.14. Left panel: call and put IVs observed on 20000502. Right panel: data
design on 20000502 (moneyness against time to maturity).
5.4 Semiparametric factor models 167
Fig. 5.15. Nadaraya-Watson estimate and SFM fit for 20000502. Bandwidths for
both estimates h1 = 0.03 for the moneyness and h2 = 0.04 for the time to maturity
dimension.
years). The fit appears very rough, and there are huge holes in the surface,
since the bandwidths are too small to ‘bridge’ the gaps between the maturity
strings. In order to remedy this deficiency one would need to strongly increase
the bandwidths. But this can induce a model bias. Moreover, since the design
is time-varying, bandwidths would also need to be adjusted anew for each
trading day, which complicates daily applications.
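The appearance of holes can be reproduced with a small sketch of a two-dimensional Nadaraya-Watson estimator. The product Epanechnikov kernel is an assumption made here for illustration (any kernel with compact support shows the same effect), and the function names are hypothetical.

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel with support [-1, 1] (assumed kernel choice)."""
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0)

def nadaraya_watson(x, y, grid, h):
    """Local constant fit with a product kernel.

    x    : (J, 2) design points (moneyness, time to maturity)
    y    : (J,)   observed (log-)IVs
    grid : (G, 2) estimation points
    h    : (2,)   bandwidths (h1, h2)
    Returns (G,) estimates; NaN marks a hole, i.e. no data within the support.
    """
    z = (grid[:, None, :] - x[None, :, :]) / h    # (G, J, 2) scaled distances
    K = epanechnikov(z).prod(axis=2)              # product kernel weights
    denom = K.sum(axis=1)
    num = K @ y
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, num / denom, np.nan)
```

With two strings at maturities 0.1 and 0.3 and h2 = 0.04, a grid point at maturity 0.2 receives zero weight from both strings and the estimate is undefined there.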
As an alternative, we will introduce the semiparametric factor model
(SFM) with time-varying coefficients due to Fengler et al. (2003a). In this
approach the IVS is fitted each day at the observed design points which will
lead to a minimization with respect to functional norms that depend on time.
This procedure avoids bias effects which can ensue from global daily fits used
in standard FPCA. In the following, we present the model, discuss its esti-
mation, and provide an empirical analysis for our data for the years 1998 to
2001.
$$ y_{i,j} \approx m_0(x_{i,j}) + \sum_{l=1}^{L} \beta_{i,l}\, m_l(x_{i,j}) , \tag{5.60} $$
(2000), Gouriéroux and Jasiak (2001), Fan et al. (2003), and Linton et al.
(2003) among others. Nonparametric techniques are now broadly used in
option pricing, e.g. Broadie et al. (2000), Aït-Sahalia et al. (2001a),
Aït-Sahalia and Duarte (2003), Daglish (2003), and interest rate modeling,
e.g. Aït-Sahalia (1996), Ghysels and Ng (1989), and Linton et al. (2001).
Estimates $\hat m_l$ $(l = 0, \ldots, L)$ and $\hat\beta_{i,l}$
$(i = 1, \ldots, I;\ l = 1, \ldots, L)$ are defined as minimizers of the
following least squares criterion ($\hat\beta_{i,0} \stackrel{\mathrm{def}}{=} 1$):
$$ \sum_{i=1}^{I} \sum_{j=1}^{J_i} \int \Big\{ y_{i,j} - \sum_{l=0}^{L} \hat\beta_{i,l}\, \hat m_l(u) \Big\}^2 K_h(u - x_{i,j})\, du , \tag{5.61} $$
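$$ b^{(r)}_{l,l'}(u) \stackrel{\mathrm{def}}{=} \sum_{i=1}^{I} J_i\, \hat\beta^{(r-1)}_{i,l'}\, \hat\beta^{(r-1)}_{i,l}\, \hat p_i(u) , \quad 0 \le l, l' \le L , \tag{5.68} $$
$$ q^{(r)}_{l}(u) \stackrel{\mathrm{def}}{=} \sum_{i=1}^{I} J_i\, \hat\beta^{(r-1)}_{i,l}\, \hat q_i(u) , \quad 0 \le l \le L . \tag{5.69} $$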
The algorithm is run until only minor changes occur. In the implementation,
we choose a grid of points and calculate $\hat m_l$ at these points. In the
calculation of $\mathbf{M}^{(r)}(i)$ and $s^{(r)}(i)$, we replace the integral
by a Riemann sum approximation using the values of the integrated functions
at the grid points.
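A compact sketch of such an alternating iteration is given below. It is written under several assumptions that are not stated in this passage: $\hat p_i(u)$ and $\hat q_i(u)$ are taken to be the kernel averages $J_i^{-1}\sum_j K_h(u - x_{i,j})$ and $J_i^{-1}\sum_j K_h(u - x_{i,j})\, y_{i,j}$, the kernel is a product Gaussian, the loadings are initialized at random, and a tiny ridge term guards against singular systems. Both update steps are the first order conditions of (5.61), with the function step built from the quantities in (5.68) and (5.69); the names `kernel_matrix` and `sfm_fit` are hypothetical.

```python
import numpy as np

def kernel_matrix(grid, x, h):
    """Product Gaussian kernel K_h(u - x_j): (G, d), (J, d), (d,) -> (G, J)."""
    z = (grid[:, None, :] - x[None, :, :]) / h
    norm = np.prod(h) * (2.0 * np.pi) ** (grid.shape[1] / 2.0)
    return np.exp(-0.5 * (z**2).sum(axis=2)) / norm

def sfm_fit(days, grid, h, L, n_iter=50, seed=0, ridge=1e-10):
    """days: list of (x_i, y_i) with x_i of shape (J_i, d), y_i of shape (J_i,)."""
    rng = np.random.default_rng(seed)
    I, G = len(days), grid.shape[0]
    J = np.array([len(y) for _, y in days], dtype=float)
    p = np.empty((I, G))
    q = np.empty((I, G))
    for i, (x, y) in enumerate(days):
        K = kernel_matrix(grid, x, h)
        p[i] = K.mean(axis=1)          # assumed form of p_i(u)
        q[i] = (K * y).mean(axis=1)    # assumed form of q_i(u)
    # beta_{i,0} = 1 by definition; the remaining loadings start at random
    beta = np.hstack([np.ones((I, 1)), rng.normal(size=(I, L))])
    for _ in range(n_iter):
        # Function step: solve B(u) m(u) = Q(u) at every grid point,
        # with B and Q built as in (5.68) and (5.69).
        B = np.einsum("i,ig,il,ik->glk", J, p, beta, beta)
        Q = np.einsum("i,ig,il->gl", J, q, beta)
        B += ridge * np.eye(L + 1)
        m = np.linalg.solve(B, Q[..., None])[..., 0].T      # (L + 1, G)
        # Loading step: for each day solve M(i) beta_i = s(i); integrals are
        # Riemann sums over the grid (the common cell volume cancels).
        for i in range(I):
            M = (m[1:] * p[i]) @ m[1:].T
            s = m[1:] @ (q[i] - p[i] * m[0])
            beta[i, 1:] = np.linalg.solve(M + ridge * np.eye(L), s)
    return m, beta
```

The returned functions and loadings are not yet normalized; only the fitted surfaces $\hat m_0 + \sum_l \hat\beta_{i,l} \hat m_l$ are identified at this stage.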
As discussed above, $\hat m_l$ and $\hat\beta_{i,l}$ are not uniquely defined.
Therefore, we orthogonalize $\hat m_0, \ldots, \hat m_L$ in $L^2(\hat p)$,
where $\hat p(u) = I^{-1} \sum_{i=1}^{I} \hat p_i(u)$, such that
$\sum_{i=1}^{I} \hat\beta_{i,1}^2$ is maximum, and given
$\hat\beta_{i,1}, \hat m_0, \hat m_1$, $\sum_{i=1}^{I} \hat\beta_{i,2}^2$ is
maximum, and so forth. These aims can be achieved by the following two steps:
first replace
$$ \begin{aligned}
\hat m_0 \ &\text{by}\ \hat m_0^{new} = \hat m_0 - \gamma^\top \Gamma^{-1} \hat m , \\
\hat m \ &\text{by}\ \hat m^{new} = \Gamma^{-1/2} \hat m , \\
\hat\beta_i \ &\text{by}\ \hat\beta_i^{new} = \Gamma^{1/2} \big\{ \hat\beta_i + \Gamma^{-1} \gamma \big\} ,
\end{aligned} \tag{5.74} $$
where we redefine the vector $\hat m = (\hat m_1, \ldots, \hat m_L)^\top$ not
to contain $\hat m_0$ any more. Further we define the (L × L) matrix
$\Gamma = \int \hat m(u)\, \hat m(u)^\top \hat p(u)\, du$, or for clarity
elementwise by $\Gamma = (\gamma_{l,l'})$, with
$\gamma_{l,l'} \stackrel{\mathrm{def}}{=} \int \hat m_l(u)\, \hat m_{l'}(u)\, \hat p(u)\, du$.
Finally, we have $\gamma = (\gamma_l)$, with
$\gamma_l \stackrel{\mathrm{def}}{=} \int \hat m_0(u)\, \hat m_l(u)\, \hat p(u)\, du$.
Note that by applying (5.74), $\hat m_0$ is replaced by a function that
minimizes $\int \hat m_0^2(u)\, \hat p(u)\, du$. This is evident because
$\hat m_0$ is orthogonal to the linear space spanned by
$\hat m_1, \ldots, \hat m_L$. By the second equation of (5.74),
$\hat m_1, \ldots, \hat m_L$ are replaced by orthonormal functions in
$L^2(\hat p)$.
In a second step, we proceed as in PCA and define a matrix $\tilde{B}$ with
elements $\tilde b_{l,l'} = \sum_{i=1}^{I} \hat\beta_{i,l}\, \hat\beta_{i,l'}$
and calculate the eigenvalues of $\tilde{B}$, $\lambda_1 > \ldots > \lambda_L$,
and the corresponding eigenvectors $z_1, \ldots, z_L$. Put
$Z = (z_1, \ldots, z_L)$. Replace
$$ \hat m \ \text{by}\ \hat m^{new} = Z^\top \hat m , \tag{5.75} $$
(i.e. $\hat m_l^{new} = z_l^\top \hat m$), and
$$ \hat\beta_i \ \text{by}\ \hat\beta_i^{new} = Z^\top \hat\beta_i . \tag{5.76} $$
After the application of (5.75) and (5.76), the orthonormal basis of the
model $\hat m_1, \ldots, \hat m_L$ is chosen such that
$\sum_{i=1}^{I} \hat\beta_{i,1}^2$ is maximum, and, given
$\hat\beta_{i,1}, \hat m_0, \hat m_1$, the quantity
$\sum_{i=1}^{I} \hat\beta_{i,2}^2$ is maximum, and so on, i.e. $\hat m_1$ is
chosen such that as much as possible is explained by
$\hat\beta_{i,1} \hat m_1$. Next $\hat m_2$ is chosen to achieve the maximum
explanation by $\hat\beta_{i,1} \hat m_1 + \hat\beta_{i,2} \hat m_2$, and so
forth.
Unlike in Section 5.3.1 on FPCA, the functions $\hat m_l$ are not
eigenfunctions of an operator. This is because we use a different norm,
namely $\int f^2(u)\, \hat p_i(u)\, du$, for each day. Through the norming
procedure the functions are chosen as eigenfunctions in an L-dimensional
approximating linear space. The L-dimensional approximating spaces are not
necessarily nested for increasing L. For this reason the estimates cannot be
calculated by an iterative procedure that starts by fitting a model with one
component, and that uses the old L − 1 components in the iteration step from
L − 1 to L to fit the next component. The calculation of
$\hat m_0, \ldots, \hat m_L$ has to be redone for different choices of L.
For the choice of L, we consider the residual sum of squares for different L:
$$ RV(L) \stackrel{\mathrm{def}}{=} \frac{\sum_{i}^{I} \sum_{j}^{J_i} \big\{ y_{i,j} - \sum_{l=0}^{L} \hat\beta_{i,l}\, \hat m_l(x_{i,j}) \big\}^2}{\sum_{i}^{I} \sum_{j}^{J_i} (y_{i,j} - \bar y)^2} , \tag{5.77} $$
where $\bar y$ denotes the overall mean of the observations. The quantity
1 − RV(L) is the portion of variance explained in the approximation, and L
can be increased until a sufficiently high level of fitting accuracy is
achieved. As has been explained for the CPC models, see Equation 5.30, this
is a common selection method also in PCA.
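As a small illustration, (5.77) can be evaluated as below once the observations and the model fits at the design points have been pooled over all days into flat arrays; the function name is hypothetical.

```python
import numpy as np

def residual_variance_ratio(y, fitted):
    """RV(L) of (5.77): residual sum of squares over total sum of squares.

    y      : observations y_ij pooled over all days i and points j
    fitted : model fits at the same design points, for a given L
    """
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    rss = np.sum((y - fitted) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return rss / tss
```

For example, `residual_variance_ratio([1, 2, 3, 4], [1, 2, 3, 5])` has a residual sum of squares of 1 and a total sum of squares of 5, so that 1 − RV = 0.8.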
For a data-driven choice of bandwidths, we propose an approach based on
a weighted Akaike Information Criterion (AIC). We argue for using a weighted
criterion, since the distribution of the observations is far from regular, as
was seen from Figure 5.16. As mentioned in Section 4.3, this leads to
nonconvexity in the criterion and typically to unacceptably small bandwidths.
Given the unequal distribution of observations, it is natural to punish the
criterion in areas where the distribution is sparse. For a given weight
function w, consider:
$$ \Delta(m_0, \ldots, m_L) \stackrel{\mathrm{def}}{=} \frac{1}{N}\, E \sum_{i,j} \Big\{ y_{i,j} - \sum_{l=0}^{L} \beta_{i,l}\, m_l(x_{i,j}) \Big\}^2 w(x_{i,j}) , \tag{5.78} $$
Putting $w(u) \stackrel{\mathrm{def}}{=} 1$ delivers the common AIC, see in
particular Section 4.4.1. This, however, does not take into account the
quality of the estimation at the boundary regions or in regions where the
data are sparse, since in these regions p(u) is small. We propose to choose
$$ w(u) \stackrel{\mathrm{def}}{=} \frac{1}{p(u)} , \tag{5.81} $$
which gives equal weight everywhere as can be seen by the following
considerations:
$$ \begin{aligned}
\Delta(m_0, \ldots, m_L) &= \frac{1}{N}\, E \sum_{i,j} \varepsilon^2\, w(x_{i,j}) \\
&\quad + \frac{1}{N}\, E \sum_{i,j} \Big[ \sum_{l=0}^{L} \beta_{i,l} \{ m_l(x_{i,j}) - \hat m_l(x_{i,j}) \} \Big]^2 w(x_{i,j}) \\
&\approx \sigma^2 \int w(u)\, p(u)\, du \\
&\quad + \frac{1}{N} \sum_{i,j} \int \Big[ \sum_{l=0}^{L} \beta_{i,l} \{ m_l(u) - \hat m_l(u) \} \Big]^2 w(u)\, p(u)\, du .
\end{aligned} \tag{5.82} $$
As above (r) denotes the result from the rth cycle of the estimation. Here, we
approximate the integral by a simple sum over the estimation grid. Putting
k = 1, 2, we have an L1 and an L2 measure of convergence. Iterations are
stopped when Qk (r) ≤ k for some small > 0.
IVs are observed only for particular strings, but in practice one thinks of
them as the observed values of an entire surface, the IVS. This is evident
when one wants to price and hedge over-the-counter options expiring at
intermediate maturities. We model log-IV on
$x_{i,j} = (\kappa_{i,j}, \tau_{i,j})^\top$. Our estimation set J covers
$\kappa_f \in [0.80, 1.20]$ in moneyness and $\tau \in [0.05, 0.5]$ in time
to maturity, measured in years.
precisely, we compute:
$$ V_{\hat\beta}(h_k) = \sqrt{ \sum_{l=0}^{L} \mathrm{Var}\big\{ |\hat\beta_{i,l}(h_k) - \hat\beta_{i,l}(h^*)| \big\} } , \tag{5.86} $$
$$ \text{and} \quad V_{\hat m}(h_k) = \sqrt{ \sum_{l=0}^{L} \mathrm{Var}\big\{ |\hat m_l(u; h_k) - \hat m_l(u; h^*)| \big\} } , \tag{5.87} $$
where $h_k$ runs over the values given in Table 5.9, and Var(x) denotes the
variance of x. It is seen that changes in $\hat m$ are 10 to 100 times higher
in magnitude than those for $\hat\beta$. This corroborates the approximation
in (5.82) that treats the factor loadings as known.
The ability to choose such small bandwidths demonstrates the strength of the
modeling approach: the bandwidth in the time to maturity dimension is so
small that in the fit of a particular day, data from contracts with two
adjacent times to maturity do not enter together into $\hat p_i(u)$ in (5.64)
and $\hat q_i(u)$ in (5.65). In fact, for a given $u_0$, the quantities
$\hat p_i(u_0)$ and $\hat q_i(u_0)$ are zero most of the time, and only
assume positive values for dates i when the observations are in the local
neighborhood of $u_0$. The same applies to the moneyness dimension. Of
course, during the entire observation period I, it is mandatory that at
least some observations for each u are made at some dates i.
In Figure 5.16, we display the L1 and L2 measures of convergence.
Convergence is achieved quickly. The iterations were stopped after 25 cycles,
when the L2 measure was less than $10^{-5}$. Figures 5.17 to 5.19 display the
functions $\hat m_1$ to $\hat m_4$ together with contour plots. We do not
display the invariant function $\hat m_0$, since it essentially is the zero
function of the affine space fitted by the data: both mean and median are
zero up to $10^{-2}$ in magnitude. We believe this to be pure estimation
error. The remaining functions exhibit more interesting patterns: $\hat m_1$
in Figure 5.17 is positive throughout, and mildly concave. There is little
variability across the term structure. Since this function belongs to the
weights with highest variance, we interpret it as the time dependent mean of
the (log)-IVS, i.e. a shift effect. Clearly, these observations are (and must
be) a reiteration of the results from our CPC analysis in Section 5.2.5, see
also Cont and da Fonseca (2002).
Fig. 5.16. Left panel: convergence in the SFM model (log_10 of the fitting
criterion against the number of iterations). Solid line shows the L1, the
dotted line the L2 measure of convergence. The total number of iterations is 25.
Right panel: average density $\hat p(u) = I^{-1} \sum_{i=1}^{I} \hat p_i(u)$.
Bandwidths are h1 = 0.03 for moneyness and h2 = 0.04 for time to maturity.
Function $\hat m_2$, depicted in Figure 5.18, changes sign around the ATM
region, which implies that the smile deformation of the IVS is exacerbated or
mitigated by this eigenfunction. Hence we consider this function as a
moneyness slope effect of the IVS. Finally, $\hat m_3$ is positive for the
very short term contracts, and negative for contracts with maturity longer
than 0.1 years, Figure 5.19. Thus, a positive weight in $\hat\beta_{i,3}$
lowers short term IVs and increases long term IVs: $\hat m_3$ generates the
term structure dynamics of the IVS, i.e. it provides a term structure slope
effect.
To appreciate the power of the SFM, we inspect again the situation of
20000502. In Figure 5.20 we compare a Nadaraya-Watson estimator (left
panel) with the SFM (right panel). In the first case, the bandwidths are
increased to $h = (0.06, 0.25)^\top$ in order to remove all holes and
excessive variation in the fit, while for the latter the bandwidths are kept
at $h = (0.03, 0.04)^\top$. While both fits look quite similar at first
glance, the differences are best visible when both cases are contrasted for
each time to maturity string separately, Figures 5.21 to 5.24. Note that
these figures do not display separate fits of the smile functions. What we
display are slices from the two-dimensional surfaces.
As is clearly seen, the standard Nadaraya-Watson fit exhibits a strong
directional bias, especially in the wings of the IVS. For instance, for the
short maturity contracts, Figure 5.21, the estimated IVS is too low both in
the OTM put and the OTM call region. At the same time, levels are too high
for the 45 days to expiry contracts, Figure 5.22. For the 80 days to expiry
case, Figure 5.23, the fit exhibits an S-shaped form, although the data lie
almost on a straight line. The SFM is not entirely free of a directional bias
either, but clearly the fit is superior.
Figure 5.25 shows the entire time series of $\hat\beta_{i,1}$ to
$\hat\beta_{i,3}$; the summary statistics are given in Table 5.10 and
contemporaneous correlation in Table 5.11. The correlograms given in the
lower panel of Figure 5.25 display the rich
Fig. 5.17. Factor $\hat m_1$ in the left panel (moneyness lower left axis). Right
panel shows contour plots of this function (moneyness left axis). Lines are thick
for positive level values, thin for negative ones. The gray scale becomes
increasingly lighter the higher the level in absolute value. Stepwidth between
contour lines is 0.028, estimated from ODAX data 19980101-20010531.
Fig. 5.18. Factor $\hat m_2$ in the left panel (moneyness lower left axis). Right
panel shows contour plots of this function (moneyness left axis). Lines are thick
for positive level values, thin for negative ones. The gray scale becomes
increasingly lighter the higher the level in absolute value. Stepwidth between
contour lines is 0.225, estimated from ODAX data 19980101-20010531.
Fig. 5.19. Factor $\hat m_3$ in the left panel (moneyness lower left axis). Right
panel shows contour plots of this function (moneyness left axis). Lines are thick
for positive level values, thin for negative ones. The gray scale becomes
increasingly lighter the higher the level in absolute value. Stepwidth between
contour lines is 0.240, estimated from ODAX data 19980101-20010531.
autoregressive dynamics of the factor loadings. The ADF tests, Table 5.12,
indicate a unit root for $\hat\beta_{i,1}$ and $\hat\beta_{i,2}$ at the 5%
level. Following the path taken in Section 5.2.5 for the CPC models, one may
model the first differences of the first two loading series together with the
levels of $\hat\beta_{i,3}$ in a parsimonious VAR framework. Alternatively,
since the results are only marginally significant, one may estimate the
levels of the loading series in a rich VAR model. Although our results from
Section 5.2.5 also suggest a GARCH specification, we opt for the VAR model in
levels. The main reason is that the loading series of the SFM, unlike those
obtained from the CPC models, are not uncorrelated. Accordingly, one would
need to specify a multivariate GARCH model. However, even for moderate
dimensions the likelihood function of the multivariate GARCH model quickly
becomes intractable or can deliver unstable results, Fengler and Herwartz
(2002). As an alternative, one may consider dynamic correlation models.
Introduced by Engle (2002) and Tse and Tsui (2002), they enjoy increasing
popularity due to their tractability and the richness of volatility and
correlation patterns they allow for. We shall not pursue this model class
at this point, but it may be profitable to do so in the future.
Given the preceding considerations, we model the levels of the factor
loadings in a VAR(2) model. The results are presented in Table 5.13. The
estimation also includes a constant and two dummy variables, assuming the
value one right at those days and one day after, when the corresponding IV
observations of the minimum time to maturity string (10 days to expiry) were
to be
Fig. 5.20. Nadaraya-Watson estimator with $h = (0.06, 0.25)^\top$ and SFM with
$h = (0.03, 0.04)^\top$ for 20000502.
[Figures 5.21 to 5.24: panels "Traditional string fit" and "Individual string
fit" for 20000502 at 17, 45, 80 and 136 days to expiry, plotted against
moneyness.]
$\hat\beta_{i,1}$, have a positive impact on the term structure,
$\hat\beta_{i,3}$. Second order lags in the term structure dynamics
themselves influence positively the moneyness slope effect,
$\hat\beta_{i,2}$, and negatively the shift variable $\hat\beta_{i,1}$: thus
shocks in the term structure may decrease the level of the smile and
aggravate the skew. Similar interpretations can be derived from other
significant coefficients in Table 5.13.
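For readers who wish to reproduce a regression of this type, the following is a minimal least squares sketch of a VAR(2) in levels. It includes a constant but omits the two dummy variables, it is not the estimation code behind Table 5.13, and the function name `fit_var2` is hypothetical.

```python
import numpy as np

def fit_var2(Y):
    """Equation-by-equation OLS for y_t = c + A1 y_{t-1} + A2 y_{t-2} + e_t.

    Y : (T, k) array of loading series, one column per factor
    Returns c (k,), A1 (k, k), A2 (k, k) and residuals (T - 2, k).
    """
    T, k = Y.shape
    # Regressors: constant, first lag, second lag
    X = np.hstack([np.ones((T - 2, 1)), Y[1:-1], Y[:-2]])
    target = Y[2:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)   # (1 + 2k, k)
    c = coef[0]
    A1 = coef[1:1 + k].T      # A1[r, s]: effect of lag-1 series s on series r
    A2 = coef[1 + k:].T       # A2[r, s]: effect of lag-2 series s on series r
    resid = target - X @ coef
    return c, A1, A2, resid
```

In this layout, the entry `A1[2, 0]` is the coefficient of the first lag of the first loading series in the equation for the third loading series.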
[Figure 5.25: upper panels show the time series of the three loading series
over the years 1998 to 2001; lower panels show their autocorrelation functions
(acf) for lags 0 to 30.]
Table 5.12. ADF tests on $\hat\beta_{i,1}$ to $\hat\beta_{i,3}$ for the full IVS
model, intercept included in each case. Third column gives the number of lags
included in the ADF regression. For the choice of lag length, we started with
four lags, and subsequently deleted lag terms, until the last lag term became
significant at least at a 5% level. MacKinnon critical values for rejecting the
hypothesis of a unit root are -2.87 at 5% significance level, and -3.44 at 1%
significance level.
                                       Equation
Dependent variable      $\hat\beta_{i,1}$   $\hat\beta_{i,2}$   $\hat\beta_{i,3}$
$\hat\beta_{i-1,1}$           0.978              -0.009               0.047
                            [24.40]             [-1.21]              [3.70]
$\hat\beta_{i-2,1}$           0.004               0.012              -0.047
                             [0.08]              [1.63]             [-3.68]
$\hat\beta_{i-1,2}$           0.182               0.861               0.134
                             [0.92]             [23.88]              [2.13]
$\hat\beta_{i-2,2}$          -0.129               0.109              -0.126
                            [-0.65]              [3.03]             [-2.01]
$\hat\beta_{i-1,3}$           0.115              -0.019               0.614
                             [0.97]             [-0.89]             [16.16]
$\hat\beta_{i-2,3}$          -0.231               0.030               0.248
                            [-1.96]              [1.40]              [6.60]
R̄2                           0.957               0.948               0.705
F-statistic               2405.273            1945.451             258.165
a time series model with fitted values $\tilde\beta_{i,l}(\hat\theta)$ based
on $\hat\beta_{i',l}$ with $i' \le i - 1$, $1 \le l \le L$, where
$\hat\theta$ is a vector of estimated coefficients seen in Table 5.13.
Similarly as before, we employ an AIC based on the fitted values as an
asymptotically unbiased estimate of the mean square prediction error.
For the model comparison, we use the criterion $\Xi_{AIC_1}$ with
$w(u) \stackrel{\mathrm{def}}{=} 1$. Additionally we penalize the dimension
of the fitted time series model $\tilde\beta(\theta)$:
$$ \tilde\Xi_{AIC} \stackrel{\mathrm{def}}{=} N^{-1} \sum_{i}^{I} \sum_{j}^{J_i} \Big\{ y_{i,j} - \sum_{l=0}^{L} \tilde\beta_{i,l}(\hat\theta)\, \hat m_l(x_{i,j}) \Big\}^2 \times \exp\Big\{ 2\, \frac{L}{N}\, K_h(0)\, \mu_\lambda + \frac{2 \dim(\theta)}{N} \Big\} . \tag{5.88} $$
In our case dim(θ) = 27, since each of the three equations has six VAR
coefficients plus the constant and two dummy variables.
Criterion (5.88) is compared with the squared one-day prediction error of
the sticky moneyness (StM) model:
$$ \Xi_{StM} \stackrel{\mathrm{def}}{=} N^{-1} \sum_{i}^{I} \sum_{j}^{J_i} (y_{i,j} - y_{i-1,j'})^2 . \tag{5.89} $$
$$ \Xi_{StM} = 0.00476 , \qquad \tilde\Xi_{AIC} = 0.00439 . $$
Thus, the model comparison reveals that the SFM is approximately 10%
better than the naïve trader model. This is a substantial improvement given
the high variance in IV and financial data in general. An alternative approach
would investigate the hedging performance of our model compared with other
models, e.g. in following Engle and Rosenberg (2000). This is left for further
research.
5.5 Summary
This chapter is divided into two main parts. In the first part, we presented
CPC models as a natural means of modeling the IVS. The CPC approach
comprises an entire hierarchy of models. This allows for a detailed analysis of
the ‘degree of commonness’ within different maturity groups of the IVS. We
derived tests to assess stability of the factor loadings across different samples
and found that only the first component may be considered as being suffi-
ciently stable. The other components fluctuate from sample to sample year.
Finally, we modelled the resulting time series by means of ARCH and GARCH
processes.
In the second part, we digressed on FPCA for IVS modeling. Then, we
presented a semiparametric factor model as a new modeling approach to the
IVS. The key advantage is that it takes care of the discrete string structure of
IV data. The technique can be seen as a combination from FPCA and back-
fitting in additive models. Unlike other studies, this ansatz is tailored to the
degenerated design of IV data by fitting basis functions in the local neighbor-
hood of the design points only. This can reduce bias effects in the estimation
of the IVS. Due to its flexible semiparametric structure, the SFM may also
be advantageous compared to the CPC approach given the structural shifts
in the underlying data. After estimating the factor functions, we fitted vector
autoregressive processes of order two to the factor series. The presentation of
the SFM concluded with a horse race between the SFM and the ‘naı̈ve trader
model’. We found the SFM to be approximately 10% superior to the more
simple model.
Our analysis has shown that CPC and SFM models are powerful dimension
reduction techniques in the context of IVS modeling. Typically, the IVS allows
for a decomposition into three factors that drive the surface. These factors
can be interpreted as a shift factor, which accounts for around 80% of the
variation, a slope and a twist or term structure factor. This result can have
numerous applications: an obvious one is risk management, for instance in
scenario analysis and stress tests of portfolios. In order to make the SFM
more tractable, it may be good to replace the nonparametric functions by
suitable parametric approximations. Then, Monte Carlo simulations of the
models along the lines of Jamshidian and Zhu (1997) are straightforward.
6 Conclusion and outlook
The implied volatility (IV) smile and implied volatility surface (IVS) are em-
pirical phenomena that have spurred research since the discovery of the Black-
Scholes (BS) formula in the nineteen-seventies. Two main strands of literature
have dominated the research agenda since then. The first tries to exploit IV
as a predictor for asset price fluctuations. The second seeks to provide alter-
native option pricing models that explain the existence of the volatility smile.
Recently, a third line of research has emerged: shaped by the establishment
of organized futures markets that allow trading of standardized derivatives at
low costs with high liquidity, this new research aims at exploiting the infor-
mation content of option prices or the IVS for the pricing of more complicated
derivatives or positions. This approach has been termed smile consistent mod-
eling.
The IVS is an input factor in almost any smile consistent model, either
directly or in some intermediate step such as the reconstruction of the lo-
cal volatility surface: it may come along as a simple estimate of the current
surface or as a fully specified dynamic model describing the propagation of
the IVS through time. Its accuracy and precision are the decisive competitive
advantages for any smile consistent pricing model. This is particularly obvi-
ous for the complex derivatives and structured products that emerged on the
markets: several underlying assets of all different kinds such as stocks, bonds
and commodity linked products are comprised into a single structured deriva-
tive with complicated path-dependent payoffs, Overhaus (2002) and Quessette
(2002). These products are likely to exhibit high sensitivity to volatility and
are very susceptible to any misspecification of the volatility process.
Besides introducing the financial theory of smile consistent approaches,
the aim of this book is to take a specific semiparametric perspective towards
two main aspects of model building of the IVS: smoothing and dimension-
reduced modeling. We believe that such an approach is well placed given the
challenges we face in this context: the unknown, complicated functional form
of the IVS and its intricate discrete design. Non- and semiparametric
techniques do not require any a priori knowledge of the functional form which
is fitted to the data. Rather, it is the IV observations that ‘decide’. Since
from theory only loose restrictions on the IVS can be derived, for instance in
terms of wide no-arbitrage bounds on the slopes, this approach appears to be
particularly attractive.
Smile consistent models are a fruitful field of research, and we can draw
on a wide spectrum of different approaches and specifications today. However,
the current literature lacks empirical assessments and especially investigations
of their hedging performance. These studies should include exotic options and
be performed in comparison with competing model classes, such as stochastic
volatility and jump-diffusion models. This will also shed new light on the delta
debate. Stochastic variants of local volatility models may serve as an elegant
way to circumvent the delta problem, Derman and Kani (1998), Alexander
and Nogueira (2004), but it remains to be shown how they can be employed
effectively for the pricing of exotic derivatives.
A topic for further research is the stability of the dimension reduction.
Instead of estimating on predefined intervals, an alternative is to embed it
into a framework of adaptive window choice as developed by Spokoiny (1998).
Within this setting, one would aim at identifying time-homogeneous intervals
on which the dimension reduction is performed. Examples of this approach in
(realized) volatility modeling are Härdle et al. (2003), Mercurio (2004), and
Mercurio and Spokoiny (2004).
Apart from modeling the IVS, common principal component (CPC) models
are a natural choice whenever the data fall into a number of groups. This
situation arises frequently in economics and finance: for instance, the same
variables may be measured in different countries and markets. Thus, CPC
models have found application in the analysis of the term structure of interest
rates across different countries, Alexander and Lvov (2003) and Pérignon and
Villa (2002, 2004). Other possible applications are obvious. Similar reflections
apply to the semiparametric factor model (SFM). Its main properties – esti-
mation in the local neighborhood of the design points and suitable dimension
reduction – make it an ideal candidate for functional modeling. Potential fields
of application are the term structure of interest rates, or swap and forward
rates.
We believe that semiparametric modeling in finance is an inspiring field of
research, and – in recalling the words of Corrozet (1543) – it appears to be
particularly fruitful in a financial world that is ‘un monde instable porté sur
la mer tant esmeue et rogue’.
A Description and preparation of the IV data
A.1 Preliminaries
The data set employed for this research contains tick statistics on the DAX
futures contract and DAX index options and is provided by the EUREX
(Frankfurt am Main) for the period from 19950101 to 20010531. Both futures
contract data and option data are contract based data, i.e. each single contract
is registered together with its price, contract size, and time of settlement to
within a hundredth of a second. Interest rate data in daily frequency, i.e. one, three, six
and twelve months FIBOR rates for the years 1995–1999 and EURIBOR rates
for the period 2000–2001, are obtained from Thomson Financial Datastream.
Interest rate data are linearly interpolated to approximate the riskless interest
rate for the option’s time to maturity. In order to avoid a German tax bias,
option raw data has undergone a preparation scheme which is due to Hafner
and Wallmeier (2001) and described in the following. The entire data set
is stored in the financial database MD*base, maintained at the Center for
Applied Statistics and Economics (CASE) at the Humboldt-Universität zu
Berlin.
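The linear interpolation of the interest rate data can be sketched as follows. This is a minimal illustration in Python, not the code used for this work: the function name `interpolate_rate`, the quoted rates, and the flat extrapolation outside the quoted maturities are assumptions made for the example.

```python
from bisect import bisect_right


def interpolate_rate(tau, maturities, rates):
    """Linearly interpolate a riskless rate for time to maturity `tau` (in years).

    `maturities` must be sorted in ascending order. Outside the quoted range
    the nearest quoted rate is returned (flat extrapolation, an assumption).
    """
    if tau <= maturities[0]:
        return rates[0]
    if tau >= maturities[-1]:
        return rates[-1]
    j = bisect_right(maturities, tau)        # index of the first node right of tau
    t0, t1 = maturities[j - 1], maturities[j]
    weight = (tau - t0) / (t1 - t0)
    return rates[j - 1] + weight * (rates[j] - rates[j - 1])


# Hypothetical quotes for the one, three, six and twelve month rates.
maturities = [1 / 12, 3 / 12, 6 / 12, 1.0]
rates = [0.030, 0.032, 0.035, 0.040]

# Rate assigned to an option with 45 days to maturity.
r_45d = interpolate_rate(45 / 365, maturities, rates)
```

The interpolated rate always lies between the two neighboring quotes, so an option with 45 days to maturity receives a rate between the one and three month rates.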
It is important to remark that a number of fundamental amendments in
income taxation were introduced in Germany in 2000 (Steuersenkungsgesetz,
BGBl. Teil I, Nr. 46 dating from 20001026). After a transition period starting
in 2001, the changes came fully into effect beginning from 2002. The former
legislation granted a tax voucher to domestic shareholders in compensation
for the corporate tax paid by the company (Anrechnungsverfahren). However,
this did not apply to foreign investors. Since 2002, the taxes paid on corpo-
rate income can no longer be deducted by domestic shareholders. Instead,
50% of the distributed dividends are taxed at the personal income tax (Halb-
einkünfteverfahren), while the other 50% of the capital income are not liable
to any further taxation. Therefore, the correction may no longer be mandatory
for the DAX index option data beginning from 2002. Regrettably, we are not
aware of any study investigating this issue. For details on German taxation
law, we refer for instance to Tipke et al. (2002) or Rose (2004).
In a first step of the correction scheme, the DAX index values are recovered.
To this end, we match each option price observation $H_t$ with the futures price $F_t$
of the nearest available futures contract that was traded within a one-minute
interval around the observed option. The futures price observation was taken
from the most heavily traded futures contract on the particular day, which is
the three-month contract. The no-arbitrage price of the underlying index in
a frictionless market without dividends is given by
\[
S_t = F_t \, e^{-r_{T_F,t}(T_F - t)} , \tag{A.1}
\]
where $S_t$ and $F_t$ denote the index and the futures price respectively, $T_F$ the
maturity date of the futures contract, and $r_{T,t}$ the interest rate with maturity
$T - t$.
The DAX index is a capital weighted performance index, Deutsche Börse
(2002), i.e. dividends less corporate tax are reinvested into the index.
Therefore, at first glance, dividend payments should have little or no
impact on the index options. However, when only the interest rate discounted
futures price is used to recover IVs by inverting the BS formula, IVs of calls
and puts can differ significantly. This discrepancy is especially large during
spring, when most of the 30 companies listed in the DAX distribute dividends.
The point is best visible in Figure A.1 from 20000404: IVs of calls (crosses) and
puts (circles) fall apart, thus violating the put-call-parity (2.26) and general
market efficiency considerations.
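To illustrate why a misstated index level separates call and put IVs, the following minimal Python sketch inverts the BS formula by bisection. It is not the XploRe code used for this work. All numerical inputs are hypothetical, and the shift of 10 index points merely mimics an index level that does not match the one the market uses.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(S, K, tau, r, sigma, is_call=True):
    """BS price of a European option on an index without dividend payments."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    if is_call:
        return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)


def implied_vol(price, S, K, tau, r, is_call=True, lo=1e-4, hi=5.0, tol=1e-10):
    """Invert the BS formula by bisection (the price is increasing in sigma)."""
    p_lo = bs_price(S, K, tau, r, lo, is_call)
    p_hi = bs_price(S, K, tau, r, hi, is_call)
    if not p_lo <= price <= p_hi:
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_price(S, K, tau, r, mid, is_call) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical market data (illustrative values only).
F, r, tau_F, tau_H, K = 7500.0, 0.035, 0.20, 0.12, 7400.0
S = F * math.exp(-r * tau_F)               # interest rate discounted futures price

# Prices generated with a common volatility of 25% at the index level S.
call = bs_price(S, K, tau_H, r, 0.25, is_call=True)
put = bs_price(S, K, tau_H, r, 0.25, is_call=False)

# Inversion at the correct index level recovers one common IV.
iv_call = implied_vol(call, S, K, tau_H, r, is_call=True)
iv_put = implied_vol(put, S, K, tau_H, r, is_call=False)

# Inversion at an index level that is 10 points too low drives the IVs apart.
S_low = S - 10.0
iv_call_low = implied_vol(call, S_low, K, tau_H, r, is_call=True)
iv_put_low = implied_vol(put, S_low, K, tau_H, r, is_call=False)
```

With the correct index level both inversions return the same volatility. With the shifted level, the call IV rises and the put IV falls, which is the kind of gap between the two strings described above.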
Hafner and Wallmeier (2001) argue that the marginal investor’s individual
tax scheme is different from the one actually assumed to compute the DAX
index. As has been explained in Section A.1, this can be the case between for-
eign and domestic shareholders, or between domestic shareholders of different
individual taxation. Consequently, the net dividend for this investor can be
higher or lower than the one used for the index computation. The discrep-
ancy, which the authors call difference dividend, has the same impact as a
dividend payment for an unprotected option, i.e. it drives a wedge into the
option prices and hence into IVs. Denote by ∆Dt,T the time T value of this
difference dividend incurred between t and T . Consider the dividend adjusted
futures price, which is approximated here by the forward price:
Fig. A.1. IVS ticks on 20000404, derived from futures prices that are interest rate
discounted only. Put IVs are circles, call IVs crosses.
with TH denoting the call’s Ct and the put’s Pt maturity date. Inserting
equation (A.2) into (A.3) yields
is that index level, which ties put and call IVs exactly to the same levels when
used in the inversion of the BS formula.
For an estimate $\Delta\hat{D}_{t,T_H,T_F}$, pairs of puts and calls of the same strike and
same maturity are identified, provided they were traded within a five-minute
interval. For each pair, $\Delta D_{t,T_H,T_F}$ is derived from Equation (A.4). To
ensure robustness, $\Delta\hat{D}_{t,T_H,T_F}$ is estimated by the median of all $\Delta D_{t,T_H,T_F}$ of
the pairs for a given maturity at day $t$. IVs are recovered by inverting the BS
formula using the corrected index value $\tilde{S}_t = F_t e^{-r_F (T_F - t)} + \Delta\hat{D}_{t,T_H,T_F}$. Note
that $\Delta D_{t,T_H,T_F} = 0$ when $T_H = T_F$. Indeed, when calculated also in this case,
$\Delta\hat{D}_{t,T_H,T_F}$ proved to be very small (compared with the index value), which
supports the validity of this approach. The described procedure is applied on
a daily basis throughout the entire data set from 19950101 to 20010531. All
computations have been made with XploRe, Härdle et al. (2000b).
Fig. A.2. IVS ticks on 20000404, derived from futures prices that are interest rate
discounted and corrected with the implied difference dividend. Put IVs are circles, call
IVs crosses.
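The estimation step can be sketched in Python as follows. The sketch assumes that the relation used for each pair is the put-call parity written with the corrected index value, i.e. $C_t - P_t = \tilde{S}_t - e^{-r_H (T_H - t)} K$ with $\tilde{S}_t = F_t e^{-r_F (T_F - t)} + \Delta D_{t,T_H,T_F}$, solved for the difference dividend. The function names, the symbol $r_H$ for the rate up to the option's maturity, and all numbers are hypothetical, and the code is not the XploRe implementation used for this work.

```python
import math
from statistics import median


def difference_dividend(pairs, F, r_F, tau_F, r_H, tau_H):
    """Median implied difference dividend over matched put-call pairs.

    `pairs` holds (strike, call_price, put_price) tuples of one maturity.
    Each pair is solved for the difference dividend via put-call parity
    (an assumed form, see the text above this block).
    """
    discounted_futures = F * math.exp(-r_F * tau_F)
    implied = [
        c - p + K * math.exp(-r_H * tau_H) - discounted_futures
        for K, c, p in pairs
    ]
    return median(implied)


def corrected_index(F, r_F, tau_F, delta_d):
    """Discounted futures price plus the estimated difference dividend."""
    return F * math.exp(-r_F * tau_F) + delta_d


# Hypothetical example: three consistent pairs and one misprinted call price.
F, r_F, tau_F, r_H, tau_H = 7500.0, 0.035, 0.20, 0.034, 0.12
true_dd = 5.0
S_tilde = corrected_index(F, r_F, tau_F, true_dd)

pairs = []
for K, p in [(7200.0, 150.0), (7400.0, 220.0), (7600.0, 310.0)]:
    c = p + S_tilde - K * math.exp(-r_H * tau_H)    # call price from parity
    pairs.append((K, c, p))
pairs.append((7400.0, pairs[1][1] + 50.0, 220.0))   # outlier: call misprinted by 50

dd_hat = difference_dividend(pairs, F, r_F, tau_F, r_H, tau_H)
S_hat = corrected_index(F, r_F, tau_F, dd_hat)
```

In this example the single misprinted pair implies a difference dividend of 55 instead of 5, yet the median still returns 5, which shows why the median is preferred over the mean for transaction data.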
In Figure A.2, also from 20000404, we present the data after correcting
the discounted futures price with an implied difference dividend
$\Delta\hat{D}_t = (10.3, 5.0, 1.9)^\top$, where the first entry refers to 16 days, the second to 45 days
and the third to 73 days to maturity. IVs of puts and calls converge to
one single string, and the concavity of the put volatility smile is remedied,
too. Note that the overall level of the IV string is not altered by this
procedure.
The data are transaction based and may contain potential misprints and
outliers. This is seen in Figures A.1 and A.2. To accommodate for this, a mild
filter is applied: observations with IV less than 4% and bigger than 80% are
Table A.1. Summary statistics on the data base from 19950101 to 20010531, en-
tirely and on an annual basis. 2001 is from 20010101 to 20010531, only.
B Some results from stochastic calculus
This chapter contains a number of basic definitions and results from stochas-
tic calculus. They are collected in order to make our treatment more self-
contained. Thus, the selection of the issues is driven by their complementary
function to our work, rather than by their importance in stochastic calculus.
For any deeper treatment or proofs, we refer to standard textbooks such as
Øksendal (1998), Karatzas and Shreve (1991), or Steele (2000).
In this chapter, we consider stochastic processes defined on a complete
probability space $(\Omega, \mathcal{F}, \mathrm{P})$. The probability space is equipped with a
filtration, i.e. a nondecreasing family $(\mathcal{F}_t)_{t \ge 0}$ of sub-$\sigma$-fields
$\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$, for
$0 \le s < t$. The filtration is assumed to satisfy the ‘usual’ conditions, namely
that it is right-continuous, and that $\mathcal{F}_0$ contains all null sets. A stochastic
process is a collection of random variables $(X_t)_{t \ge 0}$ on $(\Omega, \mathcal{F})$, which take values
in $\mathbb{R}^d$. The index $t$ is interpreted as ‘time’. We say that a stochastic process
$X$ is adapted to $(\mathcal{F}_t)_{t \ge 0}$, if each $X_t$ is $\mathcal{F}_t$-measurable. For a fixed $\omega \in \Omega$,
the mapping $t \mapsto X_t(\omega)$ for $t \ge 0$ is called the sample path of $X$ associated
with $\omega$.
Martingale
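Let $(X_t)_{0 \le t < \infty}$ be an $(\mathcal{F}_t)_{t \ge 0}$-adapted stochastic process on $(\Omega, \mathcal{F}, \mathrm{P})$
satisfying $\mathrm{E}|X_t| < \infty$ for all $0 \le t < \infty$. The process $X$ is called an
$(\mathcal{F}_t)_{t \ge 0}$-martingale, if for every $0 \le s < t < \infty$, we have
\[
\mathrm{E}(X_t \mid \mathcal{F}_s) = X_s . \tag{B.1}
\]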
Let $(X_t)_{0 \le t \le T}$, for $T < \infty$, be an $(\mathcal{F}_t)_{t \ge 0}$-adapted stochastic process on
$(\Omega, \mathcal{F}, \mathrm{P})$. Further, let $D_n$ be the dyadic decomposition of order $n$ on the
interval $[0, T]$, i.e.
Brownian motion
A real-valued stochastic process $(W_t)_{0 \le t \le T}$, $T < \infty$, adapted to $(\mathcal{F}_t)_{0 \le t \le T}$ is called
a standard Brownian motion with respect to $(\mathcal{F}_t)_{0 \le t \le T}$ on the interval $[0, T]$
if it satisfies the following properties:
(i) W0 = 0
(ii) For any 0 ≤ s < t ≤ T the increment
Wt − Ws (B.5)
Itô formula
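Suppose that the real-valued process $X$ taking values in $\mathbb{R}$ has the (stochastic)
integral representation
\[
X_t = x_0 + \int_0^t a_s \, ds + \int_0^t b_s \, dW_s \tag{B.8}
\]
on $0 \le t \le T$, where $(a_t)_{0 \le t \le T}$ and $(b_t)_{0 \le t \le T}$ are real-valued
$(\mathcal{F}_t)_{0 \le t \le T}$-adapted processes satisfying
\[
\mathrm{P}\left( \int_0^T |a_s| \, ds < \infty \right) = 1
\quad \text{and} \quad
\mathrm{P}\left( \int_0^T b_s^2 \, ds < \infty \right) = 1 .
\]
Then $X$ is called an Itô process. Its quadratic variation process exists and
is given by:
\[
\langle X \rangle_t = \int_0^t b_s^2 \, ds \tag{B.9}
\]
for $0 \le t \le T$.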
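Let $f \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$. Then Itô’s formula states
\[
f(X_t, t) = f(X_0, 0) + \int_0^t \frac{\partial f(X_s, s)}{\partial t} \, ds
+ \int_0^t \frac{\partial f(X_s, s)}{\partial x} \, dX_s
+ \frac{1}{2} \int_0^t \frac{\partial^2 f(X_s, s)}{\partial x^2} \, d\langle X \rangle_s , \tag{B.10}
\]
for $0 \le t \le T$.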
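As a simple worked case, take $X = W$ (i.e. $a = 0$, $b = 1$) and $f(x, t) = x^2$. Itô's formula then reads $W_T^2 = 2 \int_0^T W_s \, dW_s + T$, since $\langle W \rangle_T = T$. The following Python sketch checks this identity on one simulated path. It is an illustration only: the step number, the seed, and the function name `ito_check` are arbitrary choices, and the stochastic integral is approximated by left-endpoint sums.

```python
import math
import random


def ito_check(T=1.0, n=100_000, seed=0):
    """Compare W_T^2 with 2 * int W dW + T on one simulated Brownian path."""
    rng = random.Random(seed)
    dt = T / n
    sd = math.sqrt(dt)
    w = 0.0
    ito_integral = 0.0
    quad_var = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sd)
        ito_integral += 2.0 * w * dw     # integrand taken at the left endpoint
        quad_var += dw * dw              # realized quadratic variation
        w += dw
    return w * w, ito_integral + T, quad_var


lhs, rhs, quad_var = ito_check()
```

On the discrete grid, the gap between both sides equals the gap between the realized quadratic variation and $T$, which shrinks as the grid is refined. This is the numerical counterpart of the second-order term in (B.10).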
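For the vector-valued process $X = (X^{(1)}, \ldots, X^{(d)})^\top$ and
$f \in C^{2,1}(\mathbb{R}^d \times \mathbb{R}^+)$, Itô’s formula generalizes to
\[
f(X_t, t) = f(X_0, 0) + \int_0^t \frac{\partial f(X_s, s)}{\partial t} \, ds
+ \sum_{i=1}^d \int_0^t \frac{\partial f(X_s, s)}{\partial x_i} \, dX_s^{(i)}
+ \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \int_0^t \frac{\partial^2 f(X_s, s)}{\partial x_i \partial x_j} \, d\langle X^{(i)}, X^{(j)} \rangle_s . \tag{B.11}
\]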
Tanaka-Meyer formula
is called the local time at level c. Intuitively, it measures the ‘time spent at
level c’.
This is the one-dimensional version of, e.g., Karatzas and Shreve (1991,
Theorem 5.2.9). In the vector-valued case, the absolute value is to be replaced
by a norm, but similar results hold.
Fokker-Planck equation
for some jointly measurable density function φ(y, T |Xt , t) ≥ 0. The notation
makes precise that it is a density conditional on Xt and t. Then, φ(y, T |Xt , t)
can be characterized by the Fokker-Planck or forward Kolmogorov equation
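\[
0 = \frac{\partial \phi(y, T | X_t, t)}{\partial T}
+ \frac{\partial \{ a(y, T) \phi(y, T | X_t, t) \}}{\partial y}
- \frac{1}{2} \frac{\partial^2 \{ b^2(y, T) \phi(y, T | X_t, t) \}}{\partial y^2} \tag{B.24}
\]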
for fixed (Xt , t) ∈ R × R+ and with the initial condition
Girsanov’s theorem
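where $\| \cdot \|$ denotes the Euclidian norm. Assume that $\alpha$ satisfies the Novikov
condition:
\[
\mathrm{E} \left\{ \exp \left( \frac{1}{2} \int_0^T \| \alpha_s \|^2 \, ds \right) \right\} < \infty . \tag{B.27}
\]
\[
\mathrm{E} M_t = 1 , \tag{B.28}
\]
i.e. $\tilde{\mathrm{P}}$ has the Radon-Nikodým derivative:
\[
\frac{d\tilde{\mathrm{P}}_T}{d\mathrm{P}} = M_T . \tag{B.30}
\]
for $i = 1, \ldots, d$ and $0 \le t \le T$.
In this situation Girsanov’s theorem asserts that $\tilde{W}$ is a standard Brownian
motion on the new probability space $(\Omega, \mathcal{F}, \tilde{\mathrm{P}}_T)$.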
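A small Monte Carlo experiment can make the statement tangible. The sketch below is an illustration under explicit assumptions: it takes $d = 1$, a constant $\alpha$, the exponential martingale in the common form $M_T = \exp(-\alpha W_T - \frac{1}{2} \alpha^2 T)$, and $\tilde{W}_T = W_T + \alpha T$. The sign convention and all parameter values are choices made for this example and may differ from the definitions used in this appendix.

```python
import math
import random


def girsanov_check(alpha=0.5, T=1.0, n_paths=200_000, seed=1):
    """Monte Carlo check of E[M_T] = 1 and of the zero mean of W-tilde under P-tilde.

    Assumes a constant alpha, M_T = exp(-alpha * W_T - alpha^2 * T / 2) and
    W_tilde_T = W_T + alpha * T (one common sign convention).
    """
    rng = random.Random(seed)
    sd = math.sqrt(T)
    sum_m = 0.0
    sum_m_w_tilde = 0.0
    for _ in range(n_paths):
        w_T = rng.gauss(0.0, sd)
        m_T = math.exp(-alpha * w_T - 0.5 * alpha ** 2 * T)
        sum_m += m_T
        sum_m_w_tilde += m_T * (w_T + alpha * T)   # expectation under P-tilde
    return sum_m / n_paths, sum_m_w_tilde / n_paths


mean_m, mean_w_tilde = girsanov_check()
```

The first average estimates $\mathrm{E} M_T$ and should be close to one. The second reweights the shifted endpoint by $M_T$, which estimates its mean under the new measure and should be close to zero, in line with $\tilde{W}$ being a Brownian motion there.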
C Proofs of the results on the LSK IV estimator
and
\[
\hat{L}_n(\sigma) \stackrel{\mathrm{def}}{=} \frac{1}{n h_{1,n} h_{2,n}} \sum_{i=1}^n
\{ \tilde{c}_{t_i} - c^{BS}(\kappa_{t_i}, \tau_i, r_i, \sigma) \}^2 Z(\kappa_{t_i}, \tau_i) , \tag{C.2}
\]
and we recall that throughout this chapter $\kappa_t \stackrel{\mathrm{def}}{=} K/S_t$. For the sake of clarity,
we drop in the following the explicit dependence of the option prices and their
derivatives on $r$. Moreover, in this and the following section $\mathrm{E}_t$ is an
abbreviation for the conditional expectation with respect to $\mathcal{F}_t$.
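To make the criterion (C.2) concrete, the following Python sketch minimizes a kernel-weighted sum of squared price differences over $\sigma$ at a fixed point $(\kappa, \tau)$. Several ingredients are assumptions made for illustration only: $Z$ is taken as a product of quartic kernels with the weight function $w$ set to one, the minimizer is found by a golden-section search, and the data are synthetic prices generated from a hypothetical smile. It is not the implementation behind the results in this book.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def c_bs(kappa, tau, r, sigma):
    """BS call price divided by the spot, as a function of kappa = K / S."""
    vol = sigma * math.sqrt(tau)
    d1 = (-math.log(kappa) + (r + 0.5 * sigma ** 2) * tau) / vol
    return norm_cdf(d1) - kappa * math.exp(-r * tau) * norm_cdf(d1 - vol)


def quartic(u):
    """Quartic kernel, an assumed choice for both kernels."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def lsk_iv(kappa0, tau0, data, h1, h2, lo=0.01, hi=2.0, tol=1e-8):
    """Minimize the kernel-weighted squared price error over sigma.

    `data` holds (kappa_i, tau_i, r_i, c_i), with c_i the observed call
    price divided by the spot. The weight function w is set to one.
    """
    def loss(sigma):
        total = 0.0
        for kappa, tau, r, c in data:
            z = quartic((kappa0 - kappa) / h1) * quartic((tau0 - tau) / h2)
            if z > 0.0:
                total += (c - c_bs(kappa, tau, r, sigma)) ** 2 * z
        return total / (len(data) * h1 * h2)

    # Golden-section search, assuming a single minimum of the loss on [lo, hi].
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = loss(x1), loss(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = loss(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = loss(x2)
    return 0.5 * (a + b)


def true_smile(kappa):
    """Hypothetical smile used to generate the synthetic prices."""
    return 0.20 + 0.5 * (kappa - 1.0) ** 2


data = [
    (k / 100, tau, 0.03, c_bs(k / 100, tau, 0.03, true_smile(k / 100)))
    for k in range(90, 111)
    for tau in (0.20, 0.25, 0.30)
]
sigma_hat = lsk_iv(1.0, 0.25, data, h1=0.05, h2=0.10)
```

The normalization by $n h_{1,n} h_{2,n}$ does not change the minimizer and is kept only to mirror (C.2). With the small bandwidths chosen here, the estimate at $\kappa = 1$ stays close to the at-the-money value of the hypothetical smile.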
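As a first step, let us prove
\[
\hat{L}_n(\sigma) \stackrel{p}{\longrightarrow} L(\sigma) \stackrel{\mathrm{def}}{=}
\mathrm{E}_t \{ \tilde{c}_t - c^{BS}(\kappa_t, \tau, \sigma) \}^2 w(\kappa_t) . \tag{C.3}
\]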
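It is observed that
\[
\begin{aligned}
\hat{L}_n(\sigma) &= \frac{1}{n h_{1,n} h_{2,n}} \sum_{i=1}^n \Big[
\{ \tilde{c}_{t_i} - c^{BS}(\kappa_{t_i}, \tau_i, \sigma) \}^2 Z(\kappa_{t_i}, \tau_i)
- \mathrm{E}_t \{ \tilde{c}_{t_i} - c^{BS}(\kappa_{t_i}, \tau_i, \sigma) \}^2 Z(\kappa_{t_i}, \tau_i) \Big] \\
&\quad + \frac{1}{h_{1,n} h_{2,n}} \mathrm{E}_t \{ \tilde{c}_{t_1} - c^{BS}(\kappa_{t_1}, \tau_1, \sigma) \}^2 Z(\kappa_{t_1}, \tau_1) \\
&\stackrel{\mathrm{def}}{=} \alpha_n + \beta_n .
\end{aligned} \tag{C.4}
\]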
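\[
\begin{aligned}
&+ 2 \, \mathrm{E}_t \, w(\kappa_t) \, c^{BS}(\kappa_t, \tau, \sigma(\kappa, \tau))
\frac{\partial^2}{\partial \sigma^2} c^{BS}(\kappa_t, \tau, \sigma) \Big|_{\sigma = \sigma(\kappa, \tau)} \\
&= 2 \, \mathrm{E}_t \, w(\kappa_t) \left\{ \frac{\partial}{\partial \sigma} c^{BS}(\kappa_t, \tau, \sigma) \Big|_{\sigma = \sigma(\kappa_t, \tau)} \right\}^2 .
\end{aligned} \tag{C.8}
\]
$\sigma(\kappa_t, \tau)$ is proved.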
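where $\sigma^*$ lies between $\sigma$ and $\hat{\sigma}$ and
$U_n'(\sigma^*) \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \sigma} U_n(\sigma) \big|_{\sigma = \sigma^*}$.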
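\[
\begin{aligned}
\Delta_{2,n} &= \frac{1}{h_{1,n} h_{2,n}} \mathrm{E}_t \int \Big\{ -B^2(x, y, \sigma) + A(x, y, \sigma) D(x, y, \sigma) \Big\}
Z(x, y) f_t(x, y) \, dx \, dy \\
&= \mathrm{E}_t \int \Big\{ -B^2(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma)
+ A(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) D(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) \Big\} \\
&\qquad \times w(\kappa_t) f_t(\kappa_t - h_{1,n} u, \tau - h_{2,n} v) K_{(1)}(u) K_{(2)}(v) \, du \, dv \\
&\longrightarrow \Big[ -\mathrm{E}_t \big\{ B^2(\kappa_t, \tau, \sigma) w(\kappa_t) \big\}
+ \mathrm{E}_t \big\{ A(\kappa_t, \tau, \sigma) D(\kappa_t, \tau, \sigma) w(\kappa_t) \big\} \Big] f_t(\kappa_t, \tau) .
\end{aligned} \tag{C.14}
\]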
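Equations (C.12), (C.13), (C.14) and the fact $U_n'(\sigma^*) - U_n'(\sigma) \to 0$ together
prove:
\[
U_n'(\sigma^*) \stackrel{p}{\longrightarrow} \Big[ \mathrm{E}_t \big\{ -B^2(\kappa_t, \tau, \sigma) w(\kappa_t) \big\}
+ \mathrm{E}_t \big\{ A(\kappa_t, \tau, \sigma) D(\kappa_t, \tau, \sigma) w(\kappa_t) \big\} \Big] f_t(\kappa_t, \tau) . \tag{C.15}
\]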
Now, let
\[
u_{ni} \stackrel{\mathrm{def}}{=} \frac{1}{h_{1,n} h_{2,n}} A(\kappa_{t_i}, \tau_i, \sigma) B(\kappa_{t_i}, \tau_i, \sigma) Z(\kappa_{t_i}, \tau_i) . \tag{C.16}
\]
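Similarly, we get:
\[
\mathrm{E}_t u_{ni}^2 = \frac{f_t(\kappa_t, \tau) \int K_{(1)}^2(u) \, du \int K_{(2)}^2(v) \, dv}{h_{1,n} h_{2,n}}
\, \mathrm{E}_t \{ A^2(\kappa_t, \tau, \sigma) B^2(\kappa_t, \tau, \sigma) w^2(\kappa_t) \}
+ O\left( \frac{1}{h_{1,n} h_{2,n}} \right) . \tag{C.18}
\]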
as nh1,n h2,n → 0.
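Applying the Liapounov central limit theorem, we get
\[
\sqrt{n h_{1,n} h_{2,n}} \; U_n(\sigma) \stackrel{L}{\longrightarrow} \mathrm{N}\left( 0, f_t(\kappa_t, \tau) \, \nu^2 \right) , \tag{C.20}
\]
where
\[
\nu^2 \stackrel{\mathrm{def}}{=} \mathrm{E}_t \{ A^2(\kappa_t, \tau, \sigma) B^2(\kappa_t, \tau, \sigma) w^2(\kappa_t) \}
\int K_{(1)}^2(u) K_{(2)}^2(v) \, du \, dv . \tag{C.21}
\]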
References
Cox, J. C. and Ross, S. A. (1976). The valuation of options for alternative stochastic
processes, Journal of Financial Economics 3: 145–166.
Cox, J. C., Ross, S. A. and Rubinstein, M. (1979). Option pricing: A simplified
approach, Journal of Financial Economics 7: 229–263.
Crépey, S. (2004). Delta-hedging vega risk?, Technical report, Université d’Évry,
France.
Daglish, T. (2003). Pricing and hedging comparison for index options, Journal of
Financial Econometrics 1(3): 327–364.
Daglish, T., Hull, J. C. and Suo, W. (2003). Volatility surfaces: Theory, rules of
thumb, and empirical evidence, Working paper, J. L. Rotman School of Man-
agement, University of Toronto.
Das, S. and Sundaram, R. (1999). Of smiles and smirks: A term-structure perspec-
tive, Journal of Financial and Quantitative Analysis 34(2): 211–240.
Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal
component analysis of a vector random function: Some applications to statistical
inference, Journal of Multivariate Analysis 12: 136–154.
Dempster, M. A. H. and Richards, D. G. (2000). Pricing American options fitting
the smile, Mathematical Finance 10(2): 157–177.
Derman, E. (1999). Regimes of volatility, RISK 12(4): 55–59.
Derman, E. and Kani, I. (1994a). Riding on a smile, RISK 7(2): 32–39.
Derman, E. and Kani, I. (1994b). The volatility smile and its implied tree, Quanti-
tative strategies research notes, Goldman Sachs.
Derman, E. and Kani, I. (1998). Stochastic implied trees: Arbitrage pricing with
stochastic term and strike structure of volatility, International Journal of The-
oretical and Applied Finance 1(1): 61–110.
Derman, E., Ergener, D. and Kani, I. (1995). Static options replication, Journal of
Derivatives 2(4): 78–95.
Derman, E., Kani, I. and Chriss, N. (1996a). Implied trinomial trees of the volatility
smile, Journal of Derivatives 3(4): 7–22.
Derman, E., Kani, I. and Kamal, M. (1997). Trading and hedging local volatility,
Journal of Financial Engineering 6(3): 1233–1268.
Derman, E., Kani, I. and Zou, J. Z. (1996b). The local volatility surface: Unlocking
the information in index option prices, Financial Analysts Journal 7-8: 25–36.
Deutsche Börse (2002). Leitfaden zu den Aktienindizes der Deutschen Börse, 4.3
edn, Deutsche Börse AG, 60284 Frankfurt am Main.
Duffie, D. (2001). Dynamic Asset Pricing Theory, 3rd edn, Princeton University
Press, Princeton.
Dumas, B., Fleming, J. and Whaley, R. E. (1998). Implied volatility functions:
Empirical tests, Journal of Finance 53(6): 2059–2106.
Dupire, B. (1994). Pricing with a smile, RISK 7(1): 18–20.
Eberlein, E. and Keller, U. (1995). Hyperbolic distributions in finance, Bernoulli
1: 281–299.
Eberlein, E. and Prause, K. (2002). The generalized hyperbolic model: Financial
derivatives and risk measures, in H. Geman, D. Madan, S. Pliska and T. Vorst
(eds), Mathematical Finance - Bachelier Congress 2000, Springer-Verlag, Berlin,
Heidelberg, pp. 245–267.
Ederington, L. and Guan, W. (2002). Why are those options smiling?, Journal of
Derivatives 10(2): 9–34.
Figlewski, S. (1989). What does an option pricing model tell us about option prices?,
Financial Analysts Journal 45: 12–15.
Flury, B. (1988). Common Principal Components and Related Multivariate Models,
Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons,
New York.
Flury, B. and Gautschi, W. (1986). An algorithm for simultaneous orthogonal
transformations of several positive definite matrices to nearly diagonal form,
SIAM Journal on Scientific and Statistical Computing 7: 169–184.
Föllmer, H. and Schied, A. (2002). Stochastic Finance: An Introduction in Dis-
crete Time, Wiley Series in Probability and Mathematical Statistics, Walter de
Gruyter, Berlin, New York.
Föllmer, H. and Schweizer, M. (1990). Hedging of contingent claims under incom-
plete information, in M. H. A. Davis and R. J. Elliott (eds), Applied Stochasti-
cal Analysis, Vol. 5 of Stochastics Monographs, Gordon and Breach, New York,
pp. 389–414.
Föllmer, H. and Sondermann, D. (1986). Hedging of non-redundant contingent
claims, in W. Hildenbrand and A. Mas-Colell (eds), Contributions to Math-
ematical Economics in Honor of Gérard Debreu, North-Holland, Amsterdam,
pp. 206–223.
Fouque, J.-P., Papanicolaou, G. and Sircar, K. R. (2000). Derivatives in Financial
Markets with Stochastic Volatility, Cambridge University Press, Cambridge.
Franke, J., Härdle, W. and Hafner, C. (2004). Introduction to the Statistics of
Financial Markets, Springer-Verlag, Berlin, Heidelberg. Forthcoming.
Frey, R. (1996). Derivative asset analysis in models with level-dependent and
stochastic volatility, CWI Quarterly 10(1): 1–34.
Frey, R. and Patie, P. (2002). Risk management for derivatives in illiquid markets:
A simulation study, in K. Sandmann and P. Schönbucher (eds), Advances in
Finance and Stochastics, Springer-Verlag, Berlin, Heidelberg.
Gatheral, J. (1999). The volatility skew: Arbitrage constraints and asymptotic
behavior, Technical report, Merrill Lynch.
Ghysels, E. and Ng, S. (1998). A semiparametric factor model of interest rates and
tests of the affine term structure, Review of Economics and Statistics 80: 535–
548.
Glosten, L., Jagannathan, R. and Runkle, D. (1993). On the relation between the
expected value and the volatility of the nominal excess return on stocks, Journal
of Finance 48: 1779–1801.
Golub, B. and Tilman, L. M. (1997). Measuring yield curve risk using principal
component analysis, value at risk, and key rate durations, Journal of Portfolio
Management 23(4): 72–84.
Gouriéroux, C. and Jasiak, J. (2001). Dynamic factor models, Econometric Reviews
20(4): 385–424.
Gouriéroux, C., Monfort, A. and Tenreiro, C. (1994). Nonparametric diagnostics for
structural models, Document de travail 9405, CREST, Paris.
Gouriéroux, C., Monfort, A. and Tenreiro, C. (1995). Kernel M-estimators and
functional residual plots, Document de travail 9546, CREST, Paris.
Gouriéroux, C., Scaillet, O. and Szafarz, A. (1997). Econométrie de la finance,
Economica, Paris.
Grossman, S. and Zhou, Z. (1996). Equilibrium analysis of portfolio insurance,
Journal of Finance 51(4): 1379–1403.
Heston, S. (1993). A closed-form solution for options with stochastic volatility with
applications to bond and currency options, Review of Financial Studies 6: 327–
343.
Heynen, R. (1994). An empirical investigation of observed smile patterns, Review
of Futures Markets 13: 317–353.
Hlávka, Z. (2003). Constrained estimation of state price densities, Discussion Paper
2003-22, SfB 373, Humboldt-Universität zu Berlin.
Hörmander, L. (1990). The Analysis of Linear Partial Differential Operators I:
Distribution Theory and Fourier Analysis, 2nd edn, Springer-Verlag, Berlin,
Heidelberg.
Horowitz, J. (1998). Semiparametric Methods in Econometrics, number 131 in Lec-
ture Notes in Statistics, Springer-Verlag, Berlin, Heidelberg.
Horowitz, J., Klemelä, J. and Mammen, E. (2002). Optimal estimation in additive
models, Preprint.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal
components, Journal of Educational Psychology 24: 417–441.
Hull, J. (2002). Options, Futures, and Other Derivatives, Prentice Hall, New Jersey,
USA.
Hull, J. and White, A. (1987). The pricing of options on assets with stochastic
volatilities, Journal of Finance 42: 281–300.
Huynh, K., Kervella, P. and Zheng, J. (2002). Estimating state price densities with
nonparametric regression, in W. Härdle, T. Kleinow and G. Stahl (eds), Applied
Quantitative Finance, Springer-Verlag, Berlin, Heidelberg.
Ingersoll, J. E. (1997). Valuing foreign exchange rate derivatives with a bounded
exchange rate process, Review of Derivatives Research 1: 159–181.
Jackson, N., Süli, E. and Howison, S. (1998). Computation of deterministic volatility
surfaces, Journal of Computational Finance 2(2): 5–32.
Jackwerth, J. C. (1997). Generalized binomial trees, Journal of Derivatives 5: 7–17.
Jackwerth, J. C. (1999). Option-implied risk-neutral distributions and implied bi-
nomial trees: A literature review, Journal of Derivatives 7(2): 66–82.
Jackwerth, J. C. and Rubinstein, M. (2001). Recovering stochastic processes from
option prices, Working paper, Universität Konstanz.
Jamshidian, F. (1993). Options and futures evaluation with deterministic volatilities,
Mathematical Finance 3(2): 149–159.
Jamshidian, F. and Zhu, Y. (1997). Scenario simulation: Theory and methodology,
Finance and Stochastics 1: 43–67.
Jarrow, R. A. and O’Hara, M. (1989). Primes and scores: An essay on market
imperfections, Journal of Finance 44: 1265–1287.
Jiang, L. and Tao, Y. (2001). Identifying the volatility of the underlying assets from
option prices, Inverse Problems 17: 137–155.
Jiang, L., Chen, Q., Wang, L. and Zhang, J. E. (2003). A new well-posed algorithm
to recover implied local volatility, Quantitative Finance 3: 451–457.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis,
4th edn, Prentice-Hall, Englewood Cliffs, N.J.
Jorion, P. (1988). On jump processes in the foreign exchange and stock markets,
Review of Financial Studies 1(4): 427–445.
Jorion, P. (1995). Predicting volatility in the foreign exchange market, Journal of
Finance 50(2): 507–528.
Pong, S., Shackleton, M., Taylor, S. and Xu, X. (2003). Forecasting currency volatil-
ity: A comparison of implied volatilities and AR(FI)MA models, Journal of
Banking and Finance. Forthcoming.
Poon, S.-H. and Granger, C. W. J. (2003). Forecasting volatility in financial markets:
A review, Journal of Economic Literature 41: 478–539.
Press, W., Flannery, B., Teukolsky, S. and Vetterling, W. (1993). Numerical Recipes
in C: The Art of Scientific Computing, 2nd edn, Cambridge University Press.
Quessette, R. (2002). New products, new risks, RISK 15(3): 97–100.
Rady, S. (1997). Option pricing in the presence of natural boundaries and a quadratic
diffusion term, Finance and Stochastics 1: 331–344.
Ramsay, J. O. and Silverman, B. W. (1997). Functional Data Analysis, Springer-
Verlag, Berlin, Heidelberg.
Randall, C. and Tavella, D. (2000). Pricing Financial Instruments: The Finite
Difference Method, John Wiley & Sons, New York.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edn, Wiley,
New York.
Rebonato, R. (1998). Interest-Rate Option Models: Understanding, Analyzing and
Using Models for Exotic Interest-Rate Options, Wiley Series in Financial
Engineering, 2nd edn, John Wiley & Sons Ltd.
Rebonato, R. (1999). Volatility and Correlation, Wiley Series in Financial
Engineering, John Wiley & Sons Ltd.
Renault, E. and Touzi, N. (1996). Option hedging and implied volatilities in a
stochastic volatility model, Mathematical Finance 6(3): 279–302.
Riesz, F. and Nagy, B. (1956). Functional Analysis, Blackie, London.
Roll, R. (1984). A simple implicit measure of the effective bid-ask spread, Journal
of Finance 39: 1127–1139.
Rookley, C. (1997). Fully exploiting the information content of intra-day option
quotes: Applications in option pricing and risk management, Technical report,
Department of Finance, University of Arizona.
Rose, G. (2004). Unternehmenssteuerrecht, E. Schmidt Verlag.
Rosenberg, J. (2000). Implied volatility functions: A reprise, Journal of Derivatives
7: 51–64.
Rossi, A. (2002). The Britten-Jones and Neuberger smile-consistent with stochas-
tic volatility option pricing model: A further analysis, International Journal of
Theoretical and Applied Finance 5(1): 1–31.
Rubinstein, M. (1994). Implied binomial trees, Journal of Finance 49: 771–818.
Ruppert, D. (1997). Empirical-bias bandwidths for local polynomial nonparametric
regression and density estimation, Journal of the American Statistical Associa-
tion 92: 1049–1062.
Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares
regression, Annals of Statistics 22(3): 1346–1370.
Schmalensee, R. and Trippi, R. R. (1978). Common stock volatility expectations
implied by option premia, Journal of Finance 33: 129–147.
Schönbucher, P. J. (1999). A market model for stochastic implied volatility, Philo-
sophical Transactions of the Royal Society 357(1758): 2071–2092.
Schoutens, W. (2003). Lévy Processes in Finance, John Wiley & Sons, New York.
Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics
6: 461–464.
Scott, L. (1987). Option pricing when the variance changes randomly: Theory,
estimation, and an application, Journal of Financial and Quantitative Analysis
22: 419–438.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under
conditions of risk, Journal of Finance 19: 425–442.
Shimko, D. (1993). Bounds on probability, RISK 6(4): 33–37.
Shu, J. and Zhang, J. E. (2003). The relationship between implied and realized
volatility of S&P 500 index, Wilmott magazine Jan.: 83–91.
Skiadopoulos, G. (2001). Volatility smile consistent option models: A survey, Inter-
national Journal of Theoretical and Applied Finance 4(3): 403–437.
Skiadopoulos, G., Hodges, S. and Clewlow, L. (1999). The dynamics of the S&P 500
implied volatility surface, Review of Derivatives Research 3: 263–282.
Spokoiny, V. (1998). Estimation of a function with discontinuities via local polyno-
mial fit with an adaptive window choice, Annals of Statistics 26: 1356–1378.
Steele, J. M. (2000). Stochastic Calculus and Financial Applications, Springer-
Verlag, Berlin, Heidelberg, New York.
Stein, E. M. and Stein, J. C. (1991). Stock price distributions with stochastic
volatility: An analytic approach, Review of Financial Studies 4: 727–752.
Stone, C. J. (1986). The dimensionality reduction principle for generalized additive
models, The Annals of Statistics 14: 592–606.
Tanaka, H. (1963). Note on continuous additive functionals of the 1-dimensional
Brownian path, Zeitschrift für Wahrscheinlichkeitstheorie 1: 251–257.
Taylor, S. J. (2000). Consequences for option pricing of a long memory in volatility,
Working paper, Department of Accounting and Finance, Lancaster University,
UK.
Tipke, K., Lang, J. and Seer, R. (2002). Steuerrecht, O. Schmidt Verlag, Köln.
Tompkins, R. (1999). Implied volatility surfaces: Uncovering regularities for options
on financial futures, Working paper, Vienna University of Technology.
Tompkins, R. (2001). Stock index futures markets: Stochastic volatility models and
smiles, The Journal of Futures Markets 21(1): 43–78.
Tse, Y. and Tsui, A. (2002). A multivariate generalized autoregressive conditional
heteroscedastic model with time-varying correlations, Journal of Business and
Economic Statistics 20(3): 351–362.
Vähämaa, S. (2004). Delta hedging with the smile, Financial Markets and Portfolio
Management 18(3): 241–255.
Watson, G. S. (1964). Smooth regression analysis, Sankhyā, Series A 26: 359–372.
Weinberg, S. A. (2001). Interpreting the volatility smile: An examination of the
informational content of option prices, International Finance Discussion Papers
706, Federal Reserve Board, Washington, D. C.
Whaley, R. (1982). Valuation of American call options on dividend-paying stocks:
Empirical tests, Journal of Financial Economics 10: 29–58.
Wilmott, P. (2001a). Paul Wilmott on Quantitative Finance, Vol. 1, John Wiley &
Sons.
Wilmott, P. (2001b). Paul Wilmott on Quantitative Finance, Vol. 2, John Wiley &
Sons.
Zakoian, J. M. (1994). Threshold heteroskedastic models, Journal of Economic
Dynamics and Control 18: 931–955.
Zhu, Y. and Avellaneda, M. (1997). An E-ARCH model for the term-structure of
implied volatility of FX options, Applied Mathematical Finance 4: 81–100.