Semiparametric Modeling of Implied Volatility

Book · December 2005


DOI: 10.1007/3-540-30591-2

Matthias R. Fengler
University of St.Gallen


Matthias R. Fengler

Semiparametric Modeling of
Implied Volatility

– Monograph –

August 11, 2005

Springer
Berlin Heidelberg New York
Hong Kong London
Milan Paris Tokyo
Le Monde Instable

Le monde en vne isle porté
Sur la mer tant esmeue et rogue,
Sans seur gouuernal nage et vogue,
Monstrant son instabilité.

Corrozet (1543)
quoted from Henkel and Schöne (1996)
Acknowledgements

This book has benefited greatly from the suggestions and comments of colleagues,
fellow students and friends, whom I wish to thank here. First and foremost,
I thank Wolfgang Härdle. He directed my interest to implied volatilities and
made me familiar with non- and semiparametric modeling in finance. Without
him, his encouragement and his advice, this work would not exist. Furthermore,
I would like to thank Vladimir Spokoiny, in particular for his comments during
my talks in the Seminar for Mathematical Statistics at the WIAS, Berlin.
This work is closely connected with essays I have written with a number of
coauthors. Above all, I thank Enno Mammen: the cooperation in semiparametric
modeling has been highly instructive and fruitful for me. In this regard,
I also thank Qihua Wang.
For countless helpful discussions and for proofreading, my thanks
go to Peter Bank, Michal Benko, Szymon Borak, Kai Detlefsen, Erhard and
Martin Fengler, Patrick Herbst, Zdeněk Hlávka, Torsten Kleinow, Danilo
Mercurio and Marlene Müller, and to all current and former members of the
ISE and CASE for the inspiring working environment they created there.
Finally, I wish to thank the members of my family not explicitly mentioned
up to now, Stephanus and especially my mother Brigitte Fengler, and Georgia
Mavrodi, who in their own ways did their best to support me and the project
at its different stages.
I gratefully acknowledge financial support by the Deutsche
Forschungsgemeinschaft through my membership in the Sonderforschungsbereich 373
Quantifikation und Simulation ökonomischer Prozesse at the
Humboldt-Universität zu Berlin.

Berlin, May 2005 Matthias R. Fengler


Frequently used notation

Abbreviation or symbol        Explanation

ATM                           at-the-money
BS                            Black and Scholes (1973)
cdf                           cumulative distribution function
$C_t$                         price of a call option at time $t$
$C_t^{BS}$                    Black-Scholes price of a call option at time $t$
$C(A)$                        the continuous functions $f : A \to \mathbb{R}$
$C^k(A)$                      functions in $C(A)$ with continuous derivatives up to order $k$
$C^{k,l}(\mathbb{R} \times \mathbb{R})$   the functions $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ which are $C^k$ w.r.t. the first
                              and $C^l$ w.r.t. the second argument
Cov$(X, Y)$                   covariance of two random variables $X$ and $Y$
CPC(A)                        common principal component (analysis)
$\delta$                      dividend yield
$\delta_{x_0}$                Dirac delta function defined by the property:
                              $\int f(x)\, \delta_{x_0}(x)\, dx = f(x_0)$ for a smooth function $f$
E$(X)$                        expected value of the random variable $X$
$F_t$                         forward or futures price of an asset at time $t$
$\mathcal{F}_t$               filtration, the information set generated by the information
                              available up to time $t$
$I_p$                         $p \times p$ identity matrix
IV                            implied volatility
IVS                           implied volatility surface
ITM                           in-the-money
$1(A)$                        indicator function of the set $A$
$K$                           exercise price
$K(\cdot)$                    kernel function: continuous, bounded and symmetric real
                              function satisfying $\int K(u)\, du = 1$
$\kappa_f$                    forward or futures moneyness: $\kappa_f \stackrel{\text{def}}{=} K/F_t$
LVS                           local volatility surface
$\mu$                         mean of a random variable
N$(\mu, \Sigma)$              normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$
OTM                           out-of-the-money
$o$                           $\alpha_n = o(\beta_n)$ means: $\alpha_n / \beta_n \to 0$ as $n \to \infty$
$\mathcal{O}$                 $\alpha_n = \mathcal{O}(\beta_n)$ means: $\alpha_n / \beta_n \to$ some constant as $n \to \infty$
pdf                           probability density function
pCPC(q)                       partial CPC model of order $q$
$P_t$                         price of a put option at time $t$
PCA                           principal component analysis
P$(A)$                        probability of the set $A$, objective measure
PDE                           partial differential equation
Q                             a risk neutral measure
$r$                           interest rate
$\mathbb{R}^d$                $d$-dimensional Euclidean space, $\mathbb{R} = \mathbb{R}^1$
$\mathbb{R}_+$                the non-negative real numbers
$S_t$                         price of a stock at time $t$
SDE                           stochastic differential equation
$\Sigma$                      covariance matrix
$t$                           time
$T$                           expiry date of a financial contract
$\tau$                        $\tau \stackrel{\text{def}}{=} T - t$, time to maturity of an option or a forward
Var$(X)$                      variance of the random variable $X$
$W_t$                         Brownian motion at time $t$
$\overline{W}_t$              Brownian motion under the risk neutral measure at time $t$
$\varphi(x)$                  pdf of the normal distribution: $\varphi(x) \stackrel{\text{def}}{=} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
$\Phi(u)$                     cdf of a normal random variable: $\Phi(u) \stackrel{\text{def}}{=} \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx$
$\stackrel{\text{def}}{=}$    is defined as
$\sim$                        if $X \sim D$, the random variable $X$ has the distribution $D$
$\stackrel{L}{\longrightarrow}$   converges in distribution to
$\stackrel{p}{\longrightarrow}$   converges in probability to
$(X)^+$                       $(X)^+ \stackrel{\text{def}}{=} \max(X, 0)$
$\langle X \rangle_t$         quadratic variation process of the stochastic process $X$
$\langle X, Y \rangle_t$      covariation process of the stochastic processes $X$ and $Y$
$|x|$                         absolute value of the scalar $x$
$|X|$                         determinant of the matrix $X$
$X^\top$                      transpose of the matrix $X$
tr $X$                        trace of the matrix $X$
$\langle f, g \rangle$        inner product of the functions $f$ and $g$

In this book, we will mainly employ three concepts of volatility based on
the following stochastic differential equation for the asset price process:

\[ \frac{dS_t}{S_t} = \mu(S_t, t)\, dt + \sigma(S_t, t, \cdot)\, dW_t . \]

These concepts are in particular:

Instantaneous volatility $\sigma(S_t, t, \cdot)$
    Instantaneous volatility measures the instantaneous standard deviation
    of the return process of the log-asset price. It depends on the current
    level of the asset price $S_t$, time $t$ and possibly on other state
    variables abbreviated with '$\cdot$'.

Implied volatility $\hat{\sigma}_t(K, T)$
    Implied volatility is the BS option price implied measure of volatility.
    It is the volatility parameter that equates the BS price and a particular
    observed market price of an option. Thus, it depends on the strike $K$,
    the expiry date $T$ and time $t$.

Local volatility $\sigma_{K,T}(S_t, t)$
    Local volatility is the expected instantaneous volatility conditional on
    a particular level of the asset price $S_T = K$ at $t = T$. If the
    instantaneous volatility is a deterministic function in $S_t$ and $t$,
    i.e. can be written as $\sigma(S_t, t)$, then
    $\sigma_{K,T}(S_t, t) = \sigma(K, T)$.

The term volatility is reserved for objects of the kind $\sigma$ and
$\hat{\sigma}$, while their squared counterparts $\sigma^2$ and
$\hat{\sigma}^2$ are called variance.
Contents

1 Introduction

2 The implied volatility surface
  2.1 The Black-Scholes model
  2.2 The self-financing replication strategy
  2.3 Risk neutral pricing
  2.4 The BS formula and the greeks
  2.5 The IV smile
  2.6 Static properties of the smile function
    2.6.1 Bounds on the slope
    2.6.2 Large and small strike behavior
  2.7 General regularities of the IVS
    2.7.1 Static stylized facts
    2.7.2 DAX index IV between 1995 and 2001
  2.8 Relaxing the constant volatility case
    2.8.1 Deterministic volatility
    2.8.2 Stochastic volatility
  2.9 Challenges arising from the smile
    2.9.1 Hedging and risk management
    2.9.2 Pricing
  2.10 IV as predictor of realized volatility
  2.11 Why do we smile?
  2.12 Summary

3 Smile consistent volatility models
  3.1 Introduction
  3.2 The theory of local volatility
  3.3 Backing the LVS out of observed option prices
  3.4 The dual PDE approach to local volatility
  3.5 From the IVS to the LVS
  3.6 Asymptotic relations between implied and local volatility
  3.7 The two-times-IV-slope rule for local volatility
  3.8 The K-strike and T-maturity forward risk-adjusted measure
  3.9 Model-free (implied) volatility forecasts
  3.10 Local volatility models
    3.10.1 Deterministic implied trees
    3.10.2 Stochastic implied trees
    3.10.3 Reconstructing the LVS
  3.11 Excellent fit, but...: the delta problem
  3.12 Stochastic IV models
  3.13 Summary

4 Smoothing techniques
  4.1 Introduction
  4.2 Nadaraya-Watson smoothing
    4.2.1 Kernel functions
    4.2.2 The Nadaraya-Watson estimator
  4.3 Local polynomial smoothing
  4.4 Bandwidth selection
    4.4.1 Theoretical framework
    4.4.2 Bandwidth choice in practice
  4.5 Least squares kernel smoothing
    4.5.1 The LSK estimator of the IVS
    4.5.2 Application of the LSK estimator
  4.6 Summary

5 Dimension-reduced modeling
  5.1 Introduction
  5.2 Common principal component analysis
    5.2.1 The family of CPC models
    5.2.2 Estimating common eigenstructures
    5.2.3 Stability tests for eigenvalues and eigenvectors
    5.2.4 CPC model selection
    5.2.5 Empirical results
  5.3 Functional data analysis
    5.3.1 Basic set-up of FPCA
    5.3.2 Computing FPCs
  5.4 Semiparametric factor models
    5.4.1 The model
    5.4.2 Norming of the estimates
    5.4.3 Choice of model parameters
    5.4.4 Empirical analysis
    5.4.5 Assessing prediction performance
  5.5 Summary

6 Conclusion and outlook

A Description and preparation of the IV data
  A.1 Preliminaries
  A.2 Data correction scheme

B Some results from stochastic calculus

C Proofs of the results on the LSK IV estimator
  C.1 Proof of consistency
  C.2 Proof of asymptotic normality

References

Index
1

Introduction

Yet that weakness is also its greatest strength. People
like the model because they can easily understand
its assumptions. The model is often good as a first
approximation, and if you can see the holes in
the assumptions you can use the model in more
sophisticated ways.
Black (1992)

Expected volatility as a measure of risk involved in economic decision making
is a key ingredient in modern financial theory: the rational, risk-averse investor
will seek to balance the tradeoff between the risk he bears and the return he
expects. The more volatile the asset is, i.e. the more it is prone to exces-
sive price fluctuations, the higher will be the expected premium he demands.
Markowitz (1959), followed by Sharpe (1964) and Lintner (1965), were among
the first to quantify the idea of the simple equation ‘more risk means higher
return’ in terms of equilibrium models. Since then, the analysis of volatility
and price fluctuations has sparked a vast literature in theoretical and quanti-
tative finance that refines and extends these early models. As the most recent
climax of this story, one may see the Nobel prize in Economics granted to
Robert Engle in 2003 for his path-breaking work on modeling time-dependent
volatility.
Long before this, a decisive turn in the research of volatility was rendered
possible with the seminal publication by Black and Scholes (1973) on the
pricing of options and corporate liabilities. Their fundamental result, the cel-
ebrated Black-Scholes (BS) formula, offers a framework for the valuation of
European style derivatives within a simple set of assumptions. Six parameters
enter the pricing formula: the current underlying asset price, the strike price,
the expiry date of the option, the riskless interest rate, the dividend yield, and
a constant volatility parameter that describes the instantaneous standard de-
viation of the returns of the log-asset price. The application of the formula,
however, faces an obstacle: only its first five parameters are known quantities.
The last one, the volatility parameter, is not.
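
For orientation, the six inputs listed above enter the BS call price as follows. This is the standard textbook form with a continuous dividend yield, written in the symbols of the notation list; it is given here as a preview under that assumption, while the derivation itself belongs to Chapter 2:

\[
C_t^{BS} = e^{-\delta\tau} S_t\, \Phi(d_1) - e^{-r\tau} K\, \Phi(d_2),
\qquad
d_1 = \frac{\ln(S_t/K) + (r - \delta + \tfrac{1}{2}\sigma^2)\,\tau}{\sigma\sqrt{\tau}},
\qquad
d_2 = d_1 - \sigma\sqrt{\tau},
\]

with $\tau = T - t$. Every quantity on the right-hand side except $\sigma$ can be read off from the contract or the market.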
An obvious way to respond to this dilemma is to resort to well-established
statistical tools and to estimate the volatility parameter from the time series
data of the underlying asset. However, there is also a second perspective that
the markets and the literature quickly adopted: instead of estimating the
volatility for finding an option price, one aims at recovering that volatility
which the market has priced into a given option price observation. To put it
in other words, the question is:

what volatility is implied in observed option prices, if the BS model is
a valid description of market conditions?

This reverse perspective constitutes the concept of the BS implied volatility.
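
The reverse perspective can be made concrete with a small numerical sketch. It assumes the standard BS call formula with a continuous dividend yield and inverts it by bisection, which works because the call price is increasing in the volatility parameter. The function names and all numerical inputs are hypothetical and chosen only for illustration; they are not taken from the book's data.

```python
import math

def norm_cdf(u):
    """Standard normal cdf Phi(u), computed via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def bs_call(S, K, tau, r, delta, sigma):
    """BS price of a European call with continuous dividend yield delta."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def implied_vol(price, S, K, tau, r, delta, lo=1e-6, hi=5.0, tol=1e-10):
    """Find sigma with bs_call(sigma) = price by bisection on [lo, hi]."""
    if not (bs_call(S, K, tau, r, delta, lo) <= price <= bs_call(S, K, tau, r, delta, hi)):
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, tau, r, delta, mid) < price:
            lo = mid   # model price too low: volatility must be higher
        else:
            hi = mid   # model price too high (or equal): volatility is lower
    return 0.5 * (lo + hi)

# Round trip with made-up inputs: price an option, then recover its volatility.
true_sigma = 0.25
price = bs_call(S=100.0, K=105.0, tau=0.5, r=0.03, delta=0.01, sigma=true_sigma)
iv = implied_vol(price, S=100.0, K=105.0, tau=0.5, r=0.03, delta=0.01)
```

Bisection is used here only because it is easy to verify; faster root finders are common in practice.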


A typical picture of implied volatility (IV), as observed on 2nd May, 2000,
or 20000502 (a date notation we will adopt from now on) is presented in
Figure 1.1. IV is displayed across different strike prices and expiry dates.
Strikes are rescaled in a moneyness metric, where strikes near the current
asset price are mapped into the neighborhood of one, and the expiry dates
are converted into the time to maturity of the option expressed in years. As is
visible, IV exhibits a pronounced curvature across strikes and is also curved
across time to maturity, albeit not so much. For a given time to maturity, this
function has been named smile, and the entire ensemble is called the implied
volatility surface (IVS). The striking conclusion from a picture like Figure 1.1
is the clear contradiction to an assumption fundamental to the BS model:
instead of being constant, IV is nonlinear in strikes and time to maturity and,
when viewed over a sequence of points in time, also time-dependent.
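
The rescaling of strikes and expiry dates described above can be sketched as follows. The forward moneyness $\kappa_f = K/F_t$ follows the notation list; the cost-of-carry forward price $F_t = S_t e^{(r-\delta)\tau}$, the ACT/365 day count, and all numerical inputs are assumptions of this sketch rather than details of the DAX data.

```python
import math
from datetime import date

def parse_yyyymmdd(s):
    """Parse a date written in the 20000502 notation."""
    return date(int(s[:4]), int(s[4:6]), int(s[6:8]))

def time_to_maturity(today, expiry, days_per_year=365.0):
    """tau = T - t in years (the ACT/365 convention is an assumption)."""
    return (parse_yyyymmdd(expiry) - parse_yyyymmdd(today)).days / days_per_year

def forward_moneyness(K, S, r, delta, tau):
    """kappa_f = K / F_t, with F_t = S_t * exp((r - delta) * tau) assumed."""
    forward = S * math.exp((r - delta) * tau)
    return K / forward

# Hypothetical contract: values are illustrative only.
tau = time_to_maturity("20000502", "20000616")
kappa = forward_moneyness(K=7500.0, S=7400.0, r=0.04, delta=0.0, tau=tau)
```

A strike close to the forward price maps to a moneyness close to one, which is the neighborhood referred to in the text.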
This evident antagonism has been a fruitful starting-point for variations
and extensions of this basic pricing model in any direction. At the same time, it
does not appear to harm the model itself or the popularity of IV. Nowadays,
IV is ubiquitous: it serves as a convenient way of quoting options among
market participants, volatility trading is common practice on trading floors,
market models incorporate the risk from fluctuating IVs for hedges, and risk
management tools, which are approved by banking regulators to steer the
allocation of economic capital, include models of the IVS.
A number of reasons may be put forward for explaining the unrivalled
popularity of IV. One of them – already anticipated by the initial words by
Fisher Black – can be seen in the set of easy-to-communicate assumptions
associated with the BS model. Another, more fundamental reason is that a
volatility concept implied from option prices enjoys a particular – if not pivotal
– property: as options are bets on the future development of the underlying
asset, the key advantage of this option implied volatility is the fact that it is
a forward looking variable by nature. Thus, unlike volatility measures based
on historical data, it should reflect market expectations on volatility over the
remaining life time of the option. Consequently, the information content of
IV and its capability of being a predictor for future asset price volatility has
been of primary concern in the literature on IV from the early studies up to
now.

Fig. 1.1. DAX option IVs on 20000502 (plot title: IVS Ticks 20000502). IV observations are displayed as black dots.
Lower left axis is moneyness and lower right time to maturity measured in years.

Yet, it was only in the recent decade that the finance community recognized
that the IVS, aside from being a potential predictor or well-known
artefact and curiosity, bears valuable information on the asset price process
and its dynamics, and that this information can be exploited in models
for the pricing and the hedging of other complex derivatives or positions.
This development goes hand in hand with the advent of highly liquid option and
futures markets that were established all around the world beginning in the
nineteen-nineties. Before this, model calibration and pricing typically relied
on historically sampled time series data. This has the disadvantage that
the results are predominantly determined by the price history and that the
adjustment to new information is too slow. Unlike time series data, the
cross-sectional dimension of option prices across different strikes over a range
of times to maturity offers the unique opportunity to directly exploit
instantaneous data for model calibration.
This breakthrough, initiated by the work of Derman and Kani (1994a),
Dupire (1994) and Rubinstein (1994), triggered the literature on smile consis-
tent pricing. It led, for instance, to the development of static option replication
as a means of hedging or to implied trees as a pricing tool. The challenge for
this new approach is that IV cannot be directly used as an input factor, since
– as shall be seen in the course of this book – IV is a global measure of volatil-
ity. Pricing requires a local measure of volatility. Hence, at the heart of this
theory there is another volatility concept, called local volatility. Local volatility,
unfortunately, cannot be observed and needs to be extracted from market
data, either from option prices or from the IVS. Other modeling approaches
formulate IV as an additional stochastic process, that – together with the
asset price process – enters the pricing equation of derivatives.
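
To indicate what extracting local volatility from option prices involves, the Dupire formula, treated in Chapter 3, can be previewed here. The version below is the standard form from the general literature for an interest rate $r$ and a dividend yield $\delta$; it is stated under that assumption and may differ in presentation from the book's own derivation:

\[
\sigma^2_{K,T}(S_t, t) =
\frac{2\left\{ \dfrac{\partial C_t}{\partial T} + \delta\, C_t + (r - \delta)\, K\, \dfrac{\partial C_t}{\partial K} \right\}}
     {K^2\, \dfrac{\partial^2 C_t}{\partial K^2}} ,
\]

where $C_t = C_t(K, T)$ denotes the call price as a function of strike and expiry. The formula requires first and second derivatives of the price surface, which is one reason why smooth estimates of that surface matter.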
These developments explain why the new focus spurred interest in
refined modeling techniques for the IVS and in the structural analysis of its
dynamics. In modeling the IVS, one faces two principal challenges: as is visible
from Figure 1.1, the estimators are required to provide sufficient functional
flexibility in order to optimally fit the shape of the IVS. Otherwise, a model
bias will ensue. Second, given the high-dimensional complexity of the IVS,
low-dimensional representations are desirable from a dynamic standpoint. Not
only does a low-dimensional representation of the IVS facilitate the practical
implementation of any (dynamic) model, it additionally uncovers the struc-
tural basis of the data. This will ultimately lead to a better understanding of
the IVS as a financial variable. Natural candidates of techniques that meet
these key requirements are non- and semiparametric methods: they allow for
high functional flexibility and parsimonious modeling. Therefore, results from
this line of research are of immediate importance when local volatility or
stochastic IV models are to be implemented in practice.
The aim of this book is twofold: the first objective is to give a thorough
treatment of the financial theory on implied and local volatility and smile
consistent modeling. Particular attention is given to highlighting the
cross-relationships between the volatility concepts as shown in Figure 1.2. The
second objective is to familiarize the reader with refined non- and
semiparametric estimation strategies and dimension reduction methods for
functional surfaces and to demonstrate their effectiveness in the field of IV
modeling. The majority of results and techniques we discuss are currently
available only in preprints or published papers. With their applicability in
mind, we take care to illustrate them with empirical investigations that
underline their use in practice. We believe that by combining the two fields of
research, smile consistent modeling and non- and semiparametric estimation
techniques, in this way, we can fill a gap among the textbooks available today.
Writing a book at the intersection of two fields of research requires concessions in
the breadth with which each topic can be treated. Since our emphasis is on financial
modeling aspects, we introduce both financial and statistical theory to the
extent we deem necessary for the reader to fully appreciate the core concepts
of the book. At the same time, we try to keep the book as self-contained as
possible in providing an appendix that collects main results from stochastic
calculus and statistics. Therefore, general asset pricing theory is introduced
only in its basics. For a broader and more general overview on asset pricing
theory the reader is referred to classical textbooks such as Björk (1998), Duffie
(2001), Föllmer and Schied (2002), Hull (2002), Joshi (2003), or Lipton (2001)
to name but a few. The same philosophy applies to the non- and semiparametric
methods. Standard books the reader may like to consult in this direction
are provided, e.g., by Efromovich (1999), Härdle (1990), Härdle et al. (2004),
Horowitz (1998), Pagan and Ullah (1999), and Ramsay and Silverman (1997).

[Figure 1.2: diagram linking local variance $\sigma^2_{K,T}(S_t, t)$, implied variance $\hat{\sigma}_t^2(K, T)$ and instantaneous variance $\sigma^2(S_t, t, \cdot)$. The edge labels refer to the IV counterpart of the Dupire formula (3.36), the spatial harmonic mean of volatility for $t \uparrow T$ (3.46), the arithmetic mean (2.78) and (3.47), and to (2.93), (3.4), (3.124) and Section 3.8.]

Fig. 1.2. Overview on the volatility concepts important to this work. Solid lines
denote exact concepts about how the different types of volatility are linked. The
dotted line represents an ad-hoc relationship. The arrows denote the direction of the
relation. The term volatility is reserved for objects of the kind $\sigma$ and $\hat{\sigma}$, while their
squared counterparts $\sigma^2$ and $\hat{\sigma}^2$ are called variance.

Local volatility models or their stochastic ramifications are not the only
way to price derivatives. Of equal significance are approaches relying on
stochastic volatility specifications and on Lévy processes. Indeed, the current
literature on derivatives pricing may be divided into two main camps:
the partisans of local volatility models, who prefer them because they
produce an almost perfect fit to the observed option data; and
those who criticize local volatility models principally for predicting the wrong
smile dynamics. It is this second camp that favors stochastic volatility
specifications and Lévy models. In this book, we enter the particulars of this
debate, but topics like stochastic volatility and Lévy models are only briefly
touched upon. In doing so, we do not intend to argue that these competing
modeling approaches are not justified: they certainly are, and there are very good
arguments in favor of them. Rather, it is our intention to bring together this
important strand of literature and to discuss advantages and potential
drawbacks. The pricing of derivatives in stochastic volatility models can be found
in the excellent textbooks by Fouque et al. (2000) and Lewis (2000), and an
outstanding treatment of jump diffusions is provided in Cont and Tankov
(2004), or in Schoutens (2003).

Organization of the book

In Chapter 2, we give an introduction to the classical BS model. The basic
option valuation techniques are presented to derive the celebrated BS pricing
formula. Next, the concepts of IV and the IVS are introduced. Given
the model's inconsistency with the empirical evidence, potential directions for
relaxing the rigid assumptions are discussed. This will lead to new
interpretations of IV as averages of volatility. We proceed by discussing the
consequences that arise for pricing and hedging in the presence of the smile. A
short summary of the literature that investigates IV as a predictor of realized
volatility follows. The chapter concludes by giving an account of the potential
reasons for the existence of a non-constant smile function.
Chapter 3 is devoted to local volatility. Up to now, the theoretical
relationship between implied and local volatility, and finally instantaneous
volatility as the measure of the contemporaneous asset price variability, is not
as clear-cut as one might wish. Only in certain boundary situations or asymptotic
regimes has it been possible to make the relation more precise. Figure 1.2 gives
an overview of the current state of research. All relations are developed in the
course of the next two chapters. The relationship that is possibly most important
from a practical point of view is represented by the dotted line linking implied
and local volatility. It represents the so-called IV counterpart of the Dupire
formula, which enables the pricing of exotic options directly from an estimate
of the IVS and its derivatives. The chapter discusses several methods to
extract local volatility, especially implied tree techniques. Implied trees can be
considered as nonparametric approximations to the local volatility function.
The so-called delta debate of local volatility models is covered. The chapter
concludes by presenting the class of stochastic IV models.
In Chapter 4, we move to smoothing techniques for the IVS. We introduce
the Nadaraya-Watson estimator as the simplest nonparametric estimator
for the IVS. This is followed by local polynomial estimation, which is
decisive when it comes to the estimation of derivatives. Finally, we introduce a
least squares kernel estimator of the IVS. The least squares kernel estimator
smooths the IVS in the space of option prices and avoids the potentially
undesirable two-step procedure of previous estimators: traditionally, in the
first step, implied volatilities are derived. In the second step, the actual fitting
algorithm is applied. A two-step estimator may be biased when option
prices or other input parameters can only be observed with errors.
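
The simplest of these smoothers can be sketched in a few lines. The example below is a one-dimensional Nadaraya-Watson estimator in the moneyness direction only, whereas the IVS itself is smoothed over moneyness and time to maturity. The quartic kernel is one choice satisfying the kernel properties in the notation list; the smile data, the bandwidth and the function names are hypothetical.

```python
import math

def quartic_kernel(u):
    """Quartic (biweight) kernel: symmetric, bounded, integrates to one."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def nadaraya_watson(x0, xs, ys, h, kernel=quartic_kernel):
    """Kernel-weighted local average of ys at the point x0 with bandwidth h."""
    weights = [kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations within the bandwidth around x0")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical smile: IV observations on a moneyness grid (made-up numbers).
moneyness = [0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15, 1.20]
iv = [0.20 + 1.5 * (k - 1.0) ** 2 for k in moneyness]

iv_hat = nadaraya_watson(1.0, moneyness, iv, h=0.12)
flat_hat = nadaraya_watson(1.0, moneyness, [0.30] * len(moneyness), h=0.12)
```

Because the estimate is a local average, it lies slightly above the minimum of a convex smile at its trough; this smoothing bias is one motivation for the local polynomial estimators mentioned above.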
Probably the biggest challenge in IVS modeling is dimension reduction.
This is the topic of Chapter 5, which is divided into two major parts. The
first part focuses on linear transformations of the IVS. A standard approach
in statistics is to apply principal component analysis. In principal component
analysis the high-dimensional variables are projected into a lower dimensional
space such that as little information as possible is lost. However, this approach
is not directly applicable to the IVS due to the surface structure. Hence, we
use the common principal component models, which we find to allow for a
parsimonious, yet flexible model choice. A concern in applying the principal
component transformation is stability across time. We derive and apply stability
tests across different annual samples. The first part concludes by modeling
the resulting factors via standard GARCH time series techniques.
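
The projection idea behind principal component analysis can be illustrated with a minimal sketch. It covers ordinary PCA only, not the common principal component models used in Chapter 5. The leading eigenvector of a sample covariance matrix is found by power iteration, and the data (daily IV changes at three moneyness levels) are made-up numbers.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of observations given as rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(cov, iterations=500):
    """Leading eigenvalue and unit eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v

# Hypothetical daily IV changes at three moneyness levels (rows = days).
changes = [
    [0.012, 0.010, 0.009],
    [-0.008, -0.007, -0.006],
    [0.004, 0.005, 0.003],
    [-0.011, -0.009, -0.010],
    [0.006, 0.004, 0.005],
]
cov = covariance_matrix(changes)
lam, vec = leading_component(cov)
# Share of total variance captured by the first component.
explained = lam / sum(cov[i][i] for i in range(len(cov)))
```

In this toy data set the three series move almost in parallel, so a single component captures nearly all of the variance; this is the sense in which little information is lost by the projection.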
The second part of Chapter 5 is devoted to nonlinear transformations via
functional principal component techniques. We first outline the functional
principal component framework. Then we propose a semiparametric factor
model for the IVS. The semiparametric factor model provides a number of
advantages compared with other methods: first, surface estimation and di-
mension reduction can be achieved in one single step. Second, it estimates in
the local neighborhood of the design points of the surface, only. With regard
to Figure 1.1 this means that we estimate only in the local vicinity of the black
dots. This will avoid model biases. Third, the technique delivers a small set of
functions and factor loadings that span the propagation of the IVS through
space and time. We provide another time series analysis of these factors based
on vector autoregressive models and perform a horse race which compares the
model against a simpler practitioners’ model.
Chapter 6 concludes and gives directions to future research.
2

The implied volatility surface

A smiley implied volatility is the wrong number to put
in the wrong formula to obtain the right price.
Rebonato (1999)

2.1 The Black-Scholes model

The option pricing model developed by Black and Scholes (1973) and further
extended by Merton (1973) is a landmark in financial theory. It laid the foun-
dations of preference-free valuation of contingent claims. Despite its rather
restrictive assumptions and the large number of refinements to the model
available today, it remains an important benchmark and cornerstone of finan-
cial model building. Here, we give a short review of the BS model and present
the fundamental results necessary for the further development of this work.
For a more detailed account, we refer to textbooks in Finance, such as Musiela
and Rutkowski (1997) or Karatzas (1997).
We consider a continuous-time economy with a trading interval [0, T ∗ ],
where T ∗ > 0. It is assumed that trading can take place continuously, that
there are no differences between lending and borrowing rates, and that there
are no taxes and no short-sale constraints.
Let (Ω, F, P) be a probability space, and (Wt )0≤t≤T ∗ a Brownian motion
(see appendix Chapter B for a definition of the Brownian motion) defined on
this space. P is the objective probability measure. Information in the economy
is revealed by a filtration (Ft )0≤t≤T ∗ , which is the P-augmentation of the
natural filtration
$$\mathcal{F}_t^W = \sigma\{W_s,\ 0 \le s \le t\}, \qquad 0 \le t \le T^*. \tag{2.1}$$
The filtration is assumed to satisfy the ‘usual’ conditions, namely that it is
right-continuous, and that F0 contains all null sets.
The asset price (St )0≤t≤T ∗ , which pays a constant dividend yield δ, is mod-
elled by a geometric Brownian motion adapted to (Ft )0≤t≤T ∗ . The evolution
of the asset is given by the stochastic differential equation (SDE):
$$\frac{dS_t}{S_t} = \mu\, dt + \sigma\, dW_t, \tag{2.2}$$

where µ denotes the (constant) instantaneous drift and σ the (constant)
instantaneous (or spot) volatility function. The quantity σ² measures the
instantaneous variance of the return process of ln St . Thus, instantaneous
volatility σ can be interpreted as the (local) measure of the risk incurred
when investing one monetary unit into the risky asset, Frey (1996).
The solution to the SDE (2.2) is given by

$$S_t = S_0 \exp\left\{\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W_t\right\}, \qquad \forall t \in [0, T^*], \tag{2.3}$$

where S0 > 0. This is seen from applying the Itô formula, given in (B.10),
to (2.3). Since (2.3) is a functional of the Brownian motion Wt , it is a strong
solution; for the precise conditions guaranteeing existence and uniqueness of
a solution to (2.2), see Appendix B.
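Because (2.3) is available in closed form, the asset price at any date can be sampled exactly, without discretizing the SDE (2.2). The following is a minimal Python sketch of that idea. The function name and all parameter values are illustrative assumptions, not taken from the text. As a plausibility check it compares the sample mean with E(St) = S0 exp(µt), a standard property of geometric Brownian motion.

```python
import math
import random

def simulate_gbm_terminal(s0, mu, sigma, t, n_paths, seed=0):
    """Draw samples of S_t from the exact solution (2.3).

    All arguments are illustrative; W_t ~ N(0, t), so sigma * W_t is
    generated as sigma * sqrt(t) * Z with Z standard normal.
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]

# Hypothetical parameters: S0 = 100, mu = 8%, sigma = 20%, t = 1 year.
samples = simulate_gbm_terminal(s0=100.0, mu=0.08, sigma=0.2, t=1.0, n_paths=100_000)
sample_mean = sum(samples) / len(samples)
theoretical_mean = 100.0 * math.exp(0.08 * 1.0)
```

The sample mean should lie close to the theoretical mean, and every sampled price is strictly positive, as the exponential form of (2.3) requires.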
The economy is endowed with a savings account or riskless bond with con-
stant interest rate r, which is described by the ordinary differential equation:

$$dB_t = r B_t\, dt, \tag{2.4}$$

with boundary condition $B_0 = 1$, or equivalently $B_t = e^{rt}$, for all $t \in [0, T^*]$.
An option, also called a derivative or contingent claim, is a security whose
payoff depends on a primary asset, such as a stock. This asset is
usually referred to as the underlying asset. For instance, a call option gives
the buyer the right, but not the obligation, to buy the underlying asset
for a known price K, the exercise price. A put option gives the buyer the
right to sell the underlying asset for a known price K. We say that an option
is of European style if it can only be exercised at a prespecified expiry date
T ≤ T ∗ . If the option can be exercised at any date t ∈ [0, T ] during its
lifetime, the option is said to be of American style.
At the maturity date T , the value of a European call contract is given by
the payoff function

$$\psi(S_T) = (S_T - K)^+, \tag{2.5}$$

where $(S_T - K)^+ \stackrel{\mathrm{def}}{=} \max(S_T - K, 0)$. For a put option the payoff is:

$$\psi(S_T) = (K - S_T)^+. \tag{2.6}$$

These simple derivatives are also called plain vanilla options. They are nowa-
days tradable as standardized contracts on almost any futures exchange mar-
ket around the world.
In order to receive a payoff such as (2.5) and (2.6), the investor must pay
an option price, or option premium, to a counterparty when the contract is
entered. The investor is also said to be long in the option, while the counter-
party has a short position. The counterparty is obliged to deliver the payoff
according to the prespecified conditions. In any case, also when the option
expires worthless, the short position earns the option premium paid initially
by the long side. Option theory deals with finding this option premium, i.e.
it is about the valuation, or the pricing of contingent claims.
There are two important methodologies for deriving the prices of contin-
gent claims: first, a replication strategy based on a self-financing portfolio that
provides the same terminal payoff as the derivative. By no-arbitrage consider-
ations, the capital necessary for setting up this portfolio must equal the price
of the derivative. Second, there is a probabilistic approach which computes the
derivative price as the discounted expectation of the payoff under an equivalent
martingale measure (the so-called risk neutral measure). Both approaches will
be sketched in the following.

2.2 The self-financing replication strategy

A trading strategy is given by a pair of progressively measurable processes
$(a_t)_{0 \le t \le T}$ and $(b_t)_{0 \le t \le T}$, which denote the number of shares held in the stock
and the amount of money stored in the savings account. They must satisfy
$P\left(\int_0^T a_t^2\, dt < \infty\right) = 1$ and $P\left(\int_0^T |b_t|\, dt < \infty\right) = 1$ such that the stochastic and
usual integrals involving $a_t$ and $b_t$ are well defined. Denote the portfolio value
by $V_t = a_t S_t + b_t B_t$.
We say that there is an arbitrage opportunity in the market if $(V_t)_{0 \le t \le T}$
satisfies for $V_0 = 0$:

$$V_T \ge 0 \quad \text{and} \quad P(V_T > 0) > 0. \tag{2.7}$$

In words: if there were an arbitrage opportunity in the market, we would start
from zero capital and finish in T with a positive probability of gain at no risk.
The portfolio is called self-financing if it satisfies:

$$\begin{aligned}
dV_t &= a_t\, dS_t + b_t\, dB_t + a_t \delta S_t\, dt \\
     &= a_t (\mu + \delta) S_t\, dt + a_t \sigma S_t\, dW_t + b_t r B_t\, dt,
\end{aligned} \tag{2.8}$$

since the stock pays a dividend $\delta S_t\, dt$ within the small interval dt.
Self-financing means that gains and losses in the portfolio are entirely due to
changes in the stock and the bond.
It should be remarked that the self-financing property is not sufficient
to exclude arbitrage opportunities. Additionally, it is required that the value
process (Vt )0≤t≤T has a finite lower bound; such a portfolio is called tame,
Karatzas (1997).
The price of a contingent claim is a function denoted by $H(S_t, t)$. It shall be
assumed that $H \in C^{2,1}\big(\mathbb{R}_+ \times (0, T)\big)$, i.e. it is contained in the set of functions
which are twice continuously differentiable in their first argument and once in
their second. The portfolio replicates the contingent claim if for some pair
$(a_t)_{0 \le t \le T}$ and $(b_t)_{0 \le t \le T}$:

$$V_t = a_t S_t + b_t B_t = H(S_t, t), \qquad \forall t \in [0, T]. \tag{2.9}$$

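Applying the Itô formula (B.10) to $H(S_t, t)$ yields:

$$\begin{aligned}
dH(S_t, t) &= \frac{\partial H}{\partial t}\, dt + \frac{\partial H}{\partial S}\, dS_t + \frac{1}{2} \frac{\partial^2 H}{\partial S^2}\, d\langle S \rangle_t \\
&= \left( \frac{\partial H}{\partial t} + \mu S_t \frac{\partial H}{\partial S} + \frac{1}{2} \sigma^2 S_t^2 \frac{\partial^2 H}{\partial S^2} \right) dt + \sigma S_t \frac{\partial H}{\partial S}\, dW_t.
\end{aligned} \tag{2.10}$$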

The quadratic variation process $\langle X \rangle_t$ has a t-subscript in order to distinguish
it from our notation for the inner product $\langle \cdot, \cdot \rangle$, which is introduced in
Section 5.3.
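Equating the coefficients of (2.8) and (2.10) in the $dW_t$ terms shows:

$$a_t = \frac{\partial H}{\partial S}. \tag{2.11}$$

From the replication condition (2.9), the trading strategy in the bond is
obtained as

$$b_t = e^{-rt} \left( H(S_t, t) - S_t \frac{\partial H}{\partial S} \right). \tag{2.12}$$

With these results of $a_t$ and $b_t$, equate the coefficients of the dt-terms in (2.8)
and (2.10). This shows:

$$0 = \frac{\partial H}{\partial t} + (r - \delta) S \frac{\partial H}{\partial S} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 H}{\partial S^2} - rH. \tag{2.13}$$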

Thus, the price of any European option has to satisfy this partial differential
equation (henceforth: BS PDE) with the appropriate boundary condition
$H(S_T, T) = \psi(S_T)$. The solution to (2.13) is the value of the replicating
portfolio. In fact, for any payoff function $\psi(x)$ continuous on $\mathbb{R}$, for which the
condition $\int_{-\infty}^{+\infty} e^{-\alpha x^2} |\psi(x)|\, dx < \infty$ holds for some $\alpha > 0$, derivative prices
can be found by solving (2.13), Musiela and Rutkowski (1997).
The remarkable feature of this result is that pricing the derivative within
this model is independent of the appreciation rate µ. Thus, although market
participants may have different ideas about the appreciation rate of the stock,
they will agree on the derivative price as long as they agree on the other
parameters in the model. This result is closely related to the uniqueness of a
risk neutral measure to be introduced next.
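The replication argument can be illustrated numerically. The sketch below rebalances a self-financing portfolio at discrete dates using the hedge ratio (2.11), and compares its terminal value with the call payoff (2.5). It relies on the closed-form call price (2.23) and delta (2.28) given in Section 2.4. The discretization scheme, the function names and all parameter values are assumptions made for illustration. The stock is simulated under the objective measure with drift µ, which enters neither the price nor the hedge ratio.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def _d1(s, k, tau, sigma, r, delta):
    return (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def bs_call(s, k, tau, sigma, r, delta):
    """Closed-form call price, see (2.23)."""
    d1 = _d1(s, k, tau, sigma, r, delta)
    d2 = d1 - sigma * math.sqrt(tau)
    return math.exp(-delta * tau) * s * _N.cdf(d1) - math.exp(-r * tau) * k * _N.cdf(d2)

def bs_delta(s, k, tau, sigma, r, delta):
    """Hedge ratio a_t = dH/dS, see (2.11) and (2.28)."""
    return math.exp(-delta * tau) * _N.cdf(_d1(s, k, tau, sigma, r, delta))

def hedging_rmse(n_steps, n_paths=400, s0=100.0, k=100.0, T=1.0,
                 mu=0.10, sigma=0.2, r=0.03, delta=0.01, seed=1):
    """Root mean squared replication error of a discretely rebalanced hedge."""
    rng = random.Random(seed)
    dt = T / n_steps
    sq_err = 0.0
    for _ in range(n_paths):
        s = s0
        v = bs_call(s0, k, T, sigma, r, delta)    # initial capital = option price
        for i in range(n_steps):
            tau = T - i * dt
            a = bs_delta(s, k, tau, sigma, r, delta)
            cash = v - a * s                       # b_t * B_t from (2.9)
            z = rng.gauss(0.0, 1.0)
            s_next = s * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
            # self-financing update, a discrete analogue of (2.8)
            v = a * s_next + a * delta * s * dt + cash * math.exp(r * dt)
            s = s_next
        sq_err += (v - max(s - k, 0.0)) ** 2
    return math.sqrt(sq_err / n_paths)

rmse_coarse = hedging_rmse(n_steps=10)
rmse_fine = hedging_rmse(n_steps=160)
```

With more frequent rebalancing the replication error should shrink, which is the discrete-time counterpart of the exact replication obtained in continuous time.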
2.3 Risk neutral pricing

The idea of risk neutral pricing is to introduce a new probability measure
Q such that the discounted value process $\tilde{V}_t \stackrel{\mathrm{def}}{=} e^{-rt} V_t$ of the replicating
portfolio is a martingale under Q, i.e. it satisfies:

$$\tilde{V}_t = E^Q(\tilde{V}_T \mid \mathcal{F}_t), \tag{2.14}$$

see Appendix B. By the fundamental theorem of asset pricing, originally due
to Harrison and Kreps (1979), such a measure exists if and only if the market
is arbitrage-free.
Since the portfolio replicates the derivative, i.e. $V_T = \psi(S_T)$, (2.14) implies

$$V_t = E^Q\{ e^{-r(T-t)} \psi(S_T) \mid \mathcal{F}_t \}, \tag{2.15}$$

which is, by the arguments in Section 2.1, the price of the derivative. This
means that under the measure Q, pricing derivatives is reduced to computing
(conditional) expectations.
The change of measure is achieved by Girsanov’s Theorem, which is given
in the technical appendix in Chapter B. It states that there exists a measure Q
equivalent to P (i.e. both measures agree on the same null sets) such that
the discounted stock price is a martingale under Q. In our setting, since we
have a continuous dividend payment, we require the discounted process with
cumulative dividend re-investments $\hat{S}_t \stackrel{\mathrm{def}}{=} e^{\delta t} S_t$ to be the martingale. It is
denoted by $\tilde{S}_t \stackrel{\mathrm{def}}{=} e^{-rt} \hat{S}_t$.
By Girsanov’s Theorem the new measure Q is computed via the Radon-Nikodým
derivative, P almost surely,

$$\frac{dQ}{dP} = \exp\left( -\lambda W_{T^*} - \frac{1}{2} \lambda^2 T^* \right), \tag{2.16}$$

where

$$\lambda \stackrel{\mathrm{def}}{=} \frac{\mu + \delta - r}{\sigma}. \tag{2.17}$$

Under Q the discounted price process $\tilde{S}_t$ satisfies

$$d\tilde{S}_t = \sigma \tilde{S}_t\, d\overline{W}_t, \tag{2.18}$$

where

$$\overline{W}_t \stackrel{\mathrm{def}}{=} W_t + \lambda t, \qquad \forall t \in [0, T^*], \tag{2.19}$$
is a Brownian motion on the space (Ω, F, Q). The object λ is called market
price of risk, since it measures the excess return µ + δ − r per unit of risk
borne by the investor. The term vanishes under Q, whence the name risk
neutral pricing. The risk neutral measure is unique if and only if the market
is complete, i.e., every integrable contingent claim can be replicated by a tame
portfolio. This is the case when the number of tradable assets and the number
of driving Brownian motions coincide, Karatzas (1997, Theorem 0.3.5).
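That $(\tilde{V}_t)_{0 \le t \le T}$ is indeed a martingale is seen by the following manipulations:

$$\begin{aligned}
d\tilde{V}_t &= -r e^{-rt} V_t\, dt + e^{-rt}\, dV_t \\
&= -r e^{-rt} (a_t e^{-\delta t} \hat{S}_t + b_t B_t)\, dt + e^{-rt} (a_t e^{-\delta t}\, d\hat{S}_t + b_t\, dB_t) \\
&= a_t e^{-\delta t}\, d(e^{-rt} \hat{S}_t) \\
&= a_t e^{-\delta t}\, d\tilde{S}_t \\
&= a_t e^{-\delta t} \sigma \tilde{S}_t\, d\overline{W}_t,
\end{aligned} \tag{2.20}$$

where the second step follows from (2.8) and (2.9) rewritten in terms of $\hat{S}_t$.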
The BS pricing formula for a plain vanilla call is found by computing, in
line with (2.15),

$$E^Q\{ e^{-r(T-t)} (S_T - K)^+ \mid \mathcal{F}_t \}. \tag{2.21}$$

This is done by noting that under Q

$$\ln(S_T / S_t) \sim N\left( \left( r - \delta - \frac{1}{2} \sigma^2 \right)(T - t),\ \sigma^2 (T - t) \right), \tag{2.22}$$

where $N(\mu, \sigma^2)$ is the normal distribution with mean µ and variance σ². The
solution is given in the following section.
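The expectation (2.21) can also be approximated by simulation, which is a useful cross-check on any closed-form result. The following minimal Python sketch draws terminal prices from the risk neutral distribution (2.22) and averages the discounted call payoff. The function name, the sample size and the parameter values are assumptions chosen for illustration only.

```python
import math
import random

def mc_call_price(s, k, tau, sigma, r, delta, n_paths=200_000, seed=42):
    """Monte Carlo estimate of E^Q{ e^{-r tau} (S_T - K)^+ | F_t }, see (2.21).

    Terminal prices are sampled from (2.22): ln(S_T / S_t) is normal with
    mean (r - delta - sigma^2 / 2) * tau and variance sigma^2 * tau.
    """
    rng = random.Random(seed)
    mean = (r - delta - 0.5 * sigma ** 2) * tau
    std = sigma * math.sqrt(tau)
    total = 0.0
    for _ in range(n_paths):
        s_T = s * math.exp(mean + std * rng.gauss(0.0, 1.0))
        total += max(s_T - k, 0.0)
    return math.exp(-r * tau) * total / n_paths

# Hypothetical at-the-money call with half a year to expiry.
mc_price = mc_call_price(s=100.0, k=100.0, tau=0.5, sigma=0.25, r=0.03, delta=0.01)
```

The estimate carries sampling error of order one over the square root of the number of paths, so it should agree with the closed-form price only up to a few cents for this sample size.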

2.4 The BS formula and the greeks

The price $C(S_t, t)$ of a plain vanilla call is the solution to the PDE (2.13)
with the boundary condition $C(S_T, T) = (S_T - K)^+$. The explicit solution is
known as the Black and Scholes (1973) formula for calls:

$$C^{BS}(S_t, t, K, T, \sigma, r, \delta) = e^{-\delta\tau} S_t \Phi(d_1) - e^{-r\tau} K \Phi(d_2), \tag{2.23}$$

where

$$d_1 = \frac{\ln(S_t / K) + (r - \delta + \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}}, \tag{2.24}$$

$$d_2 = d_1 - \sigma\sqrt{\tau}, \tag{2.25}$$

and where $\Phi(u) \stackrel{\mathrm{def}}{=} \int_{-\infty}^{u} \varphi(x)\, dx$ is the cdf of the standard normal distribution,
whose pdf is given by $\varphi(x) \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for $x \in \mathbb{R}$. $S_t$ denotes the asset price
at time t. K is the strike or exercise price (the notation is not to be confused
with the kernel functions denoted by K(·) in Chapter 4). The expiry date of
the option is T , and $\tau \stackrel{\mathrm{def}}{=} T - t$ denotes its time to maturity. As in (2.2), σ is the
constant volatility function. The riskless interest rate is denoted by r, and the
constant dividend yield by δ. It is easy to check using the relevant derivatives
given in (2.28) to (2.37) that the BS price satisfies the BS PDE (2.13).
To clarify notation: we will rarely enumerate all parameters of an option
pricing function C(St , t, K, T, σ, r, δ) explicitly. Rather, we limit the
enumeration to those parameters that are important for the exposition in the
given context. Sometimes we find it convenient to simply denote the time
dependence as a t-subscript: Ct .
The price of a put option $P(S_t, t, K, T, \sigma, r, \delta)$ on the same asset with
same expiry and same strike price, which has the payoff function $\psi(S_T) =
(K - S_T)^+$, can be obtained from the put-call parity:

$$C_t - P_t = e^{-\delta\tau} S_t - e^{-r\tau} K. \tag{2.26}$$

This is a model-free relationship that follows from the trivial fact that
$S_T - K = (S_T - K)^+ - (K - S_T)^+$. The BS put price is found to be:

$$P^{BS}(S_t, t, K, T, \sigma, r, \delta) = e^{-r\tau} K \Phi(-d_2) - e^{-\delta\tau} S_t \Phi(-d_1), \tag{2.27}$$

where $d_1$ and $d_2$ are defined as in (2.24) and (2.25).
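Formulas (2.23) to (2.27) translate directly into code. The sketch below is one possible Python implementation using only the standard library. The function names and the numerical example (S = K = 100, one year to expiry, σ = 20%, r = 5%, no dividends) are illustrative assumptions. The last line evaluates the put-call parity (2.26) as a consistency check.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # cdf of the standard normal distribution

def bs_d1_d2(s, k, tau, sigma, r, delta):
    """d1 and d2 as defined in (2.24) and (2.25)."""
    d1 = (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def bs_call(s, k, tau, sigma, r, delta):
    """BS call price (2.23)."""
    d1, d2 = bs_d1_d2(s, k, tau, sigma, r, delta)
    return math.exp(-delta * tau) * s * _PHI(d1) - math.exp(-r * tau) * k * _PHI(d2)

def bs_put(s, k, tau, sigma, r, delta):
    """BS put price (2.27)."""
    d1, d2 = bs_d1_d2(s, k, tau, sigma, r, delta)
    return math.exp(-r * tau) * k * _PHI(-d2) - math.exp(-delta * tau) * s * _PHI(-d1)

# Illustrative inputs, not taken from the text.
s, k, tau, sigma, r, delta = 100.0, 100.0, 1.0, 0.2, 0.05, 0.0
call = bs_call(s, k, tau, sigma, r, delta)
put = bs_put(s, k, tau, sigma, r, delta)

# Put-call parity (2.26): the gap should vanish up to rounding error.
parity_gap = (call - put) - (math.exp(-delta * tau) * s - math.exp(-r * tau) * k)
```

For these inputs the call is worth roughly 10.45 and the put roughly 5.57.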


In hedging and risk management, but also for the further exposition of
this book, the derivatives of the BS formula, the so called greeks, play an
important role. In the following, we present their formulae together with the
names commonly used on the trading floor, and briefly discuss the most
important ones. For some of these derivatives – to the best of our knowledge
– there do not exist any nicknames. Usually, these are sensitivities of less
immediate concern in daily practice, such as the derivatives with respect to
the strike price. A more detailed discussion of the properties and the use of
the greeks can be found in Hull (2002) and Franke et al. (2004).
The first derivative with respect to the stock price, the delta, gives the
number of shares of the underlying asset to be held in the hedge portfolio.
This was shown in Equation (2.11). From (2.28), but also from the example
in Figure 2.1, it is seen that the delta for a call is positive throughout. Thus,
the replication portfolio is always long in the stock. Of equal importance
in practice is the gamma (2.29), which measures the convexity of the price
function in the stock. The gamma achieves its maximum in the neighborhood
of the current asset price, Figure 2.2. From the put-call parity (2.26) it is seen
that the put gamma and the call gamma are equal.
The vega (2.30), which is the option’s sensitivity to changes in volatility, is
plotted in Figure 2.3. It is seen that it increases with time to maturity.
Put and call vega are equal. The second derivative with respect to volatility
(2.31), which is termed volga, is displayed in Figure 2.4. For the strikes
in the neighborhood of the current asset price it is typically very low and
negative. Theta measures the sensitivity of the option to time decay, and rho
is the sensitivity with respect to interest rate changes.

Fig. 2.1. Call delta (2.28) as a function of asset prices (left axes) and time to
maturity (right axes) for K = 100. [SCMdelta.xpl]

The formulae of the greeks are given by:

$$\text{delta:} \quad \frac{\partial C_t}{\partial S} = e^{-\delta\tau} \Phi(d_1) \tag{2.28}$$

$$\text{gamma:} \quad \frac{\partial^2 C_t}{\partial S^2} = \frac{e^{-\delta\tau} \varphi(d_1)}{S_t \sigma \sqrt{\tau}} \tag{2.29}$$

$$\text{vega:} \quad \frac{\partial C_t}{\partial \sigma} = e^{-\delta\tau} S_t \sqrt{\tau}\, \varphi(d_1) \tag{2.30}$$

$$\text{volga:} \quad \frac{\partial^2 C_t}{\partial \sigma\, \partial \sigma} = e^{-\delta\tau} S_t \sqrt{\tau}\, \varphi(d_1) \frac{d_1 d_2}{\sigma} \tag{2.31}$$

$$\text{vanna:} \quad \frac{\partial^2 C_t}{\partial \sigma\, \partial S} = -e^{-\delta\tau} \varphi(d_1) \frac{d_2}{\sigma} \tag{2.32}$$

$$\frac{\partial C_t}{\partial K} = -e^{-r\tau} \Phi(d_2) \tag{2.33}$$

$$\frac{\partial^2 C_t}{\partial K^2} = \frac{e^{-r\tau} \varphi(d_2)}{\sigma \sqrt{\tau} K} = \frac{e^{-\delta\tau} S_t \varphi(d_1)}{\sigma \sqrt{\tau} K^2} \tag{2.34}$$

$$\frac{\partial^2 C_t}{\partial \sigma\, \partial K} = \frac{e^{-\delta\tau} S_t d_1 \varphi(d_1)}{\sigma K} \tag{2.35}$$

$$\text{theta:} \quad \frac{\partial C_t}{\partial t} = -\frac{\partial C_t}{\partial T} = -\frac{e^{-\delta\tau} S_t \sigma \varphi(d_1)}{2\sqrt{\tau}} + \delta e^{-\delta\tau} S_t \Phi(d_1) - r e^{-r\tau} K \Phi(d_2) \tag{2.36}$$

$$\text{rho:} \quad \frac{\partial C_t}{\partial r} = \tau e^{-r\tau} K \Phi(d_2) \tag{2.37}$$

Fig. 2.2. Gamma (2.29) as a function of asset prices (left axes) and time to maturity
(right axes) for K = 100. [SCMgamma.xpl]

Fig. 2.3. Vega (2.30) as a function of asset prices (left axes) and time to maturity
(right axes) for K = 100. [SCMvega.xpl]

Fig. 2.4. Volga (2.31) as a function of asset prices (left axes) and time to maturity
(right axes) for K = 100. [SCMvolga.xpl]
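The sensitivities (2.28) to (2.37) can be checked against finite differences of the price (2.23). The sketch below is a possible Python implementation. The function names, the dictionary keys, the step sizes and the parameter values are assumptions for illustration. Each closed-form greek is compared with a central difference quotient of the quantity it differentiates.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # _N.cdf is Phi, _N.pdf is the density phi

def d1_d2(s, k, tau, sigma, r, delta):
    """d1 and d2 as in (2.24) and (2.25)."""
    d1 = (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def bs_call(s, k, tau, sigma, r, delta):
    """BS call price (2.23)."""
    d1, d2 = d1_d2(s, k, tau, sigma, r, delta)
    return math.exp(-delta * tau) * s * _N.cdf(d1) - math.exp(-r * tau) * k * _N.cdf(d2)

def bs_greeks(s, k, tau, sigma, r, delta):
    """Call sensitivities (2.28)-(2.37); the key names are illustrative."""
    d1, d2 = d1_d2(s, k, tau, sigma, r, delta)
    div, disc, rt = math.exp(-delta * tau), math.exp(-r * tau), math.sqrt(tau)
    vega = div * s * rt * _N.pdf(d1)
    return {
        "delta": div * _N.cdf(d1),                          # (2.28)
        "gamma": div * _N.pdf(d1) / (s * sigma * rt),       # (2.29)
        "vega": vega,                                       # (2.30)
        "volga": vega * d1 * d2 / sigma,                    # (2.31)
        "vanna": -div * _N.pdf(d1) * d2 / sigma,            # (2.32)
        "dC_dK": -disc * _N.cdf(d2),                        # (2.33)
        "d2C_dK2": disc * _N.pdf(d2) / (sigma * rt * k),    # (2.34)
        "theta": (-div * s * sigma * _N.pdf(d1) / (2.0 * rt)
                  + delta * div * s * _N.cdf(d1)
                  - r * disc * k * _N.cdf(d2)),             # (2.36)
        "rho": tau * disc * k * _N.cdf(d2),                 # (2.37)
    }

def central_diff(f, x, h):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Hypothetical contract used only for the comparison.
args = dict(s=100.0, k=105.0, tau=0.5, sigma=0.25, r=0.03, delta=0.01)
greeks = bs_greeks(**args)

def price_with(name):
    """Price as a function of one argument, the others held fixed."""
    return lambda x: bs_call(**{**args, name: x})

fd_delta = central_diff(price_with("s"), args["s"], 1e-4)
fd_vega = central_diff(price_with("sigma"), args["sigma"], 1e-5)
fd_rho = central_diff(price_with("r"), args["r"], 1e-5)
fd_theta = -central_diff(price_with("tau"), args["tau"], 1e-5)  # d/dt = -d/dtau
```

The finite-difference values should agree with the corresponding entries of `greeks` up to the discretization error of the difference quotient.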

An important quantity among the ‘unnamed’ greeks is the second derivative
of the option with respect to the strike price (2.34). The reason is that
the second derivative with respect to the strike price yields the risk neutral
(transition) density of the process. In the empirical literature it is also called
state price density:

$$\phi(K, T \mid S_t, t) \stackrel{\mathrm{def}}{=} e^{r(T-t)} \frac{\partial^2 C_t(K, T)}{\partial K^2}. \tag{2.38}$$

The probability that the stock arrives at levels $K \in [K_1, K_2]$ at date T ,
given that the stock is at level $S_t$ in t, is computed by:

$$Q(S_T \in [K_1, K_2]) = \int_{K_1}^{K_2} \phi(K, T \mid S_t, t)\, dK. \tag{2.39}$$

Relationship (2.34) yields the specific BS transition density as:

$$\phi(K, T \mid S_t, t) = \frac{\varphi(d_2)}{\sigma \sqrt{\tau} K}, \tag{2.40}$$

which is a log-normal pdf in K. Of course, this is just another way to see that

$$\ln S_T \sim N\left( \ln S_t + \left( r - \delta - \frac{1}{2}\sigma^2 \right)\tau,\ \sigma^2 \tau \right), \tag{2.41}$$

as was explained earlier.
The second derivative with respect to the strike, however, is useful to
recover the transition probability also in more general contexts than the BS
model: result (2.38) – first shown by Breeden and Litzenberger (1978) – hinges
on the particular form of the call payoff function ψ(ST ) = (ST −K)+ , only, and
is thus applicable in more general circumstances, irrespective of the particular
distributional assumptions on the underlying asset price process. It derives its
importance from the results of Section 2.3: if one knew this density, either
by believing in the BS model or by obtaining an empirical estimate of it,
any path independent contingent claim could be priced by simply integrating
the payoff function over this density. The state price density is also useful
for trading strategies, which try to exploit systematic deviations between the
risk neutral and the historical properties of the underlying stock price time
series, Aït-Sahalia et al. (2001b) and Blaskowitz et al. (2004). It will also
play an important role in deriving local volatility, see Chapter 3 and in
particular Section 3.3.
For this reason, the statistical literature has developed a whole battery of
methods for estimating state price densities from observed option prices, see
e.g. Jackwerth (1999) or Weinberg (2001) for reviews. The extraction of the
state price density can be achieved for instance via parametric specifications
of the density, or, in a discrete way, via implied trees, Section 3.10.1. This
is discussed in more depth in Härdle and Zheng (2002). Recent advances by
Härdle and Yatchew (2003) and Hlávka (2003) allow the state price density to
be estimated via non- and semiparametric procedures.
Finally, there is an identity which is useful for a lot of manipulations of
the BS formula, see for instance Equation (2.34):

e−rτ Kϕ(d2 ) = e−δτ St ϕ(d1 ) , (2.42)

which we state for completeness.


2.5 The IV smile

It is obvious that the BS formula is derived under assumptions that are
unlikely to be met in practice: frictionless markets, the ability to hedge
continuously without transaction costs, asset prices without jumps and with
independent Gaussian increments, and last but not least, a constant volatility function.
Due to the simplicity of the model, any deviation from these assumptions is
empirically summarized in one single parameter or object: the IV smile and
the IVS.
The only unknown parameter in the BS pricing formula (2.23) is the
volatility. Given observed market prices $\tilde{C}_t$, it is therefore natural to define an
implicit or implied volatility (IV), first introduced by Latané and Rendelman
(1976):

$$\hat{\sigma}: \quad C^{BS}(S_t, t, K, T, \hat{\sigma}) - \tilde{C}_t = 0. \tag{2.43}$$

IV is the empirically determined parameter that ‘makes the BS formula
fit market prices of options’. Since the BS price is monotone in σ, as can be
inferred from the positiveness of the call vega (2.30), there exists a unique
solution $\hat{\sigma} > 0$. Numerically, $\hat{\sigma}$ can be found e.g. by a Newton-Raphson
algorithm as discussed in Manaster and Koehler (1982). Finally, by the
put-call parity (2.26), put and call IV are equal.
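A Newton-Raphson search for the root of (2.43) needs only the price (2.23) and the vega (2.30). The Python sketch below is one way to set it up and is not meant to reproduce any particular published algorithm. As a starting value it uses the point where the volga (2.31) vanishes, i.e. the inflection point of the price as a function of σ. This starting rule, the fallback value for contracts at the money forward, the tolerance and the function names are all assumptions of this sketch.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def _d1(s, k, tau, sigma, r, delta):
    return (math.log(s / k) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def bs_call(s, k, tau, sigma, r, delta):
    """BS call price (2.23)."""
    d1 = _d1(s, k, tau, sigma, r, delta)
    d2 = d1 - sigma * math.sqrt(tau)
    return math.exp(-delta * tau) * s * _N.cdf(d1) - math.exp(-r * tau) * k * _N.cdf(d2)

def bs_vega(s, k, tau, sigma, r, delta):
    """Vega (2.30)."""
    return math.exp(-delta * tau) * s * math.sqrt(tau) * _N.pdf(_d1(s, k, tau, sigma, r, delta))

def implied_vol(price, s, k, tau, r, delta, tol=1e-10, max_iter=100):
    """Solve (2.43) for sigma by Newton-Raphson iterations."""
    # Start where d1 * d2 = 0, i.e. sigma^2 = 2 |ln(S/K) + (r - delta) tau| / tau.
    m = math.log(s / k) + (r - delta) * tau
    sigma = math.sqrt(2.0 * abs(m) / tau)
    if sigma < 1e-4:
        sigma = 0.2  # hypothetical fallback when the option is at the money forward
    for _ in range(max_iter):
        diff = bs_call(s, k, tau, sigma, r, delta) - price
        if abs(diff) < tol:
            return sigma
        sigma = max(sigma - diff / bs_vega(s, k, tau, sigma, r, delta), 1e-8)
    raise RuntimeError("Newton-Raphson iteration did not converge")

# Round trip: price an option at a known volatility and recover it.
target_price = bs_call(100.0, 110.0, 0.5, 0.30, 0.02, 0.01)
recovered_vol = implied_vol(target_price, 100.0, 110.0, 0.5, 0.02, 0.01)
```

The round trip should return the volatility that generated the price. For observed prices outside the no-arbitrage bounds no solution exists, and the function raises an error after `max_iter` iterations.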
In the derivation of the BS model it is presumed that the diffusion
coefficient of the Brownian motion is a constant. IV $\hat{\sigma}$, however, displays a
pronounced curvature across option strikes K and, albeit to a lesser extent,
across different expiry days T . Thus IV is in fact a mapping from time, strike
prices and expiry days to $\mathbb{R}_+$:

$$\hat{\sigma}: (t, K, T) \to \hat{\sigma}_t(K, T). \tag{2.44}$$

This mapping is called the implied volatility surface (IVS).


Often it is not convenient to work in absolute variables such as expiry dates T
and strikes K. Rather one prefers relative variables, since the analysis becomes
independent of expiry effects and the movements of the underlying. Moreover,
the options with strikes close to the spot price of the underlying asset are
traded with high liquidity. As a new scale, one typically employs time to
maturity $\tau \stackrel{\mathrm{def}}{=} T - t$ and moneyness. In this work, we will predominantly use
the following forward (or futures) moneyness definition:

$$\kappa_f \stackrel{\mathrm{def}}{=} K / F_t, \tag{2.45}$$

where $F_t = e^{(r-\delta)(T-t)} S_t$ denotes the (fair) futures or forward price at time
t, Hull (2002). A stock price moneyness can be defined by:

$$\kappa \stackrel{\mathrm{def}}{=} K / S_t. \tag{2.46}$$

Forward moneyness is a natural choice of the moneyness scale, when one works
with European style option data. European options can only be exercised at
expiry. From this point of view, one incorporates the risk neutral drift in the
moneyness measure, which is taken into account by dividing by the futures
price.
We say that an option is at-the-money (ATM) when κ ≈ 1. A call option
is called out-of-the-money, OTM, (in-the-money, ITM), if κ > 1 (κ < 1) with
the reverse applying to puts. Sometimes the literature also works in units
of log-moneyness: ln(K/Ft ) or ln(K/St ). Given a quantity in one moneyness
definition, it is often not difficult to switch between the different scales.
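The moneyness definitions (2.45) and (2.46) and the resulting classification of calls can be written down in a few lines. The Python sketch below is illustrative only. In particular the width of the ATM band, which makes the statement κ ≈ 1 operational, is a hypothetical choice and not prescribed by the text, as are the function names and the example values.

```python
import math

def forward_price(s, r, delta, tau):
    """Fair forward price F_t = exp((r - delta) * tau) * S_t."""
    return math.exp((r - delta) * tau) * s

def forward_moneyness(k, s, r, delta, tau):
    """Forward moneyness (2.45): kappa_f = K / F_t."""
    return k / forward_price(s, r, delta, tau)

def classify_call(kappa, atm_band=0.02):
    """Label a call option by moneyness; atm_band is a hypothetical tolerance."""
    if abs(kappa - 1.0) <= atm_band:
        return "ATM"
    return "OTM" if kappa > 1.0 else "ITM"

# Hypothetical index option: spot 7500, strike 7000, 45 days to expiry.
kappa_f = forward_moneyness(k=7000.0, s=7500.0, r=0.04, delta=0.0, tau=45 / 365)
log_moneyness = math.log(kappa_f)
label = classify_call(kappa_f)
```

For puts the labels ITM and OTM are interchanged, as stated above.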
A typical picture of the IV smile is presented in Figures 2.5 and 2.7. IV
observations appear as black dots. The IV data, which are the basis for all
empirical parts of this study, are obtained from prices of DAX index options
traded at the EUREX in Frankfurt am Main. The original raw data was
provided by the Deutsche Börse AG, Frankfurt. It has undergone considerable
refinement and is stored in the financial data base MD*base located at
the Center for Applied Statistics and Economics (CASE) at the Humboldt-
Universität zu Berlin. A detailed description of the data and the preparation
scheme is given in Appendix A. The option data is contract based data: each
price observation belongs to actual trades, i.e. we do not work with price quo-
tations or settlement data. Due to the nature of transaction based data, the
data set may contain noise, potential misprints and other errors. This is also
seen in Figures 2.5 and 2.7 with the two single observations traded at an IV
of 21% in the lower left of the smile function.
Figure 2.5 shows a downward-sloping smile across strikes for the 45 days
to expiry contract as observed on 20000502. Obviously, OTM puts and ITM
calls are traded at higher prices than the corresponding ATM options. Since
the contracts are highly standardized on organized markets, IV observations
are only available for a small subset of strikes. Consequently, observations are
concentrated at these strikes.
In the lower panel of Figure 2.5, we added the intraday movements of the
futures contract (expiry June 2000). Given the observation that the futures
contract gains approximately 1% during the day, one may ask whether the
dispersion of IV observations for a fixed strike is due to intraday movements
of IV. In the top panel of Figure 2.6, we present the intraday movements of
IV at the fixed strikes 6400, 7000 and 7500. No particular directional moves
of IV are evident. Rather – this is especially pronounced for the 6400-strike
contract – IV jumps up and down between two distinct levels: this is the
bid-ask bounce. During the day, the bid-ask spread seems to widen beginning
from 3:00 p.m. Note that this coincides with a strong increase of the futures
price in this part of the day. This contract, which is already in the OTM put
region, is floating further away from the ATM region. The other contracts,
closer to ATM, exhibit a much less pronounced jump behavior due to the
bid-ask prices of the options.

Fig. 2.5. Top panel: DAX option IV smile for 45 days to expiry on 20000502 plotted
per strike. IV observations are displayed as black dots. Bottom panel: DAX futures
contract, June 2000 contract, between 8:00 a.m. and 5:30 p.m. on 20000502.
[SCMivanalysis.xpl]

Fig. 2.6. Top panel: IV smile for 45 days to expiry plotted per strikes 6400 (blue,
top), 7000 (black, middle), and 7500 (cyan, bottom), between 8:00 a.m. and 5:30 p.m.
on 20000502. Bottom panel: same IV smile plotted per (forward) moneyness 0.85
(blue, top), 1.00 (black, middle), and 1.05 (cyan, bottom). [SCMivanalysis.xpl]

Fig. 2.7. Upper panel: IV smile for 45 days to expiry on 20000502. IV observations
are displayed as black dots; the smile estimate is obtained from a local linear
estimator with localized bandwidths. Lower panel: first order derivative obtained from
a local linear estimator with localized bandwidths (solid line). No-arbitrage
bounds (2.53) on the smile (dashed). [SCMsmile.xpl]

In Figure 2.7, the very same IV data are plotted against (forward) money-
ness as defined in (2.45). As a proxy, we divide the strike of each option by
the futures price which is closest in time within an interval of five minutes.
It should be remarked that due to the daily settlement of futures contracts,
futures prices and forward prices are not equal when interest rates are
stochastic. However, for the times to maturity we consider throughout this work (up
to half a year), we believe this difference to be negligible, see Hull (2002,
p. 51-52) for a more detailed discussion and further references to this topic. As is
seen in Figure 2.7, the overall shape of the smile function is not altered, but
the data appear smeared across moneyness, which is due to the intraday fluc-
tuations of the futures price. The lower panel of Figure 2.6 gives the intraday
movements for fixed moneyness in the neighborhood of κf = 0.85, 1.00, 1.10
(maximum distance is ± 0.02). It is seen that the turnover for ITM put (OTM
call) options is very thin compared with OTM puts. Most trading activity is
taking place ATM.
Comparing Figure 2.5 with Figure 2.7 exhibits a nice feature of the moneyness
data. Plotting the smile against moneyness not only makes the smile
independent of large moves in the underlying asset over months
and years; to some extent, it also acts as a ‘smoothing’ device. This facilitates
the aggregation of intraday data to daily samples as we do throughout this work.
Especially from the perspective of curve estimation, which is the topic of
Chapters 4 and 5, moneyness data are more tractable and more convenient.
Finally, let us present the entire IVS in Figure 2.8. The IV smiles appear
as black rows, which we shall call strings. The strings belong to different
maturities of the option contracts. Similarly to the discrete set of strikes, only
a very small number of maturities, here five, are actively traded at the same
time. Also, one can discern that not all maturity strings are of comparable
size: the third one is much shorter than the others. Obviously the IVS has a
degenerated design. This poses several challenges to the modeling task which
will be addressed in Section 5.4.
As a general pattern, it is seen from Figure 2.8 that the smile curve flattens
out with longer time to maturity. The lower panel shows the term structure
for various slices in the IVS for moneyness κf = 0.75 top line, κf = 1, i.e.
ATM, middle line, κf = 1.1, bottom line. There is a slightly increasing slope
for ATM IV and OTM call (ITM put) IV, while OTM put (ITM call) IV
displays a decreasing term structure. This is due to the more shallow smiles
for the long-term maturities.
The most fundamental conclusion of this section is that OTM puts and
ITM calls are traded at higher prices than the corresponding ATM options.
Obviously, the BS model does not properly capture the probability of large
downward movements of the underlying asset price. To arrive at an
explanation, one needs to relax the assumptions of the BS model. This literature is
summarized in Section 2.11.
The pivotal question following this conclusion is: what does the IVS imply
for practice? Two points shall be raised here:
The good, or perhaps lucky news is that the ambiguity of the model is
sublimated in one single entity, the IV. This allows traders to think of
themselves as making a market for volatility rather than for specific equity
contracts: hence, it is common practice to quote options in terms of IV. The BS
formula is only employed as a simple and convenient mapping to assign to each
option on the same underlying a strike-dependent (and a maturity-dependent) IV.
For this purpose it is not necessary to believe in the BS model. It simply acts
as a computational tool ensuring a common language among traders.
The bad news is that for each K and each T across the IVS a different
BS model applies. This causes difficulties in managing option books, as shall
be discussed in Section 2.9. The reason is that for hedging purposes it may
not be a good idea to evaluate the delta of the option using its own ‘quoted
IV’. Also for pricing exotic options, the presence of the IVS poses challenges,
especially for volatility-sensitive exotics, such as barrier options. This issue is
addressed in Chapter 3.
Often, IV is interpreted as the market’s expectation of average volatility
through the lifetime of the option. At first glance this notion seems sensible,
since if the market has a consensus about future volatility, it will be reflected
in IV. Unfortunately, from a theoretical point of view, this notion can only
be validated for a very limited class of models, as shall be demonstrated
in Section 2.8. Furthermore, option markets are also driven by supply and
demand. If market participants seek, for some reason, protection against a
down-swing in the market, this will drive up put prices. Eventually, since the
put price (like the call price) is monotonically increasing in volatility, this
will be reflected in higher IVS levels for OTM puts. Thus, this notion should be
treated with caution.

2.6 Static properties of the smile function

2.6.1 Bounds on the slope

From the general fact that (European) call prices are monotonically decreas-
ing and puts are monotonically increasing functions of strike prices, com-
pare (2.33), it is possible to obtain broad no-arbitrage bounds on the slope of
the smile, Lee (2002). If $K_1 < K_2$ for any expiry date $T$, we have
\[ C_t(K_1, T) \ge C_t(K_2, T) , \qquad P_t(K_1, T) \le P_t(K_2, T) . \tag{2.47} \]

Due to an observation by Gatheral (1999), this can be improved to:
\[ C_t(K_1, T) \ge C_t(K_2, T) , \qquad \frac{P_t(K_1, T)}{K_1} \le \frac{P_t(K_2, T)}{K_2} . \tag{2.48} \]

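Assuming the explicit dependence of volatility on strikes, we obtain by differentiating:
\[ \frac{\partial C_t}{\partial K} = \frac{\partial C_t^{BS}}{\partial K} + \frac{\partial C_t^{BS}}{\partial \hat\sigma} \frac{\partial \hat\sigma}{\partial K} \le 0 , \tag{2.49} \]
which implies
\[ \frac{\partial \hat\sigma}{\partial K} \le - \frac{\partial C_t^{BS} / \partial K}{\partial C_t^{BS} / \partial \hat\sigma} . \tag{2.50} \]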

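Differentiating $P_t^{BS}/K$ with respect to $K$ yields for the lower bound:
\[ \frac{\partial \hat\sigma}{\partial K} \ge \frac{P_t^{BS}/K - \partial P_t^{BS} / \partial K}{\partial P_t^{BS} / \partial \hat\sigma} . \tag{2.51} \]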

Finally, insert the analytical expressions of the option derivatives and the
put price and make use of relationship (2.42). This shows:
\[ - \frac{\Phi(-d_1)}{\sqrt{\tau} K \varphi(d_1)} \le \frac{\partial \hat\sigma}{\partial K} \le \frac{\Phi(d_2)}{\sqrt{\tau} K \varphi(d_2)} . \tag{2.52} \]

Using $F_t = e^{(r-\delta)(T-t)} S_t$ and $\kappa_f = K/F_t$, this can be expressed in terms
of our forward moneyness measure as:
\[ - \frac{\Phi(-d_1)}{\sqrt{\tau} \kappa_f \varphi(d_1)} \le \frac{\partial \hat\sigma}{\partial \kappa_f} \le \frac{\Phi(d_2)}{\sqrt{\tau} \kappa_f \varphi(d_2)} , \tag{2.53} \]
since $\partial \hat\sigma / \partial K = (\partial \hat\sigma / \partial \kappa_f) F_t^{-1}$.
The bounds are displayed in the lower panel of Figure 2.7 together with
the estimated first order derivative. It is seen that the bounds are very broad
given the estimated slope of the smile function. Without the refinement (2.48),
the lower bound is only $-\{1 - \Phi(d_2)\} / \{\sqrt{\tau} \kappa_f \varphi(d_2)\}$.
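
To get a feeling for how wide the bounds in (2.53) are, they can be evaluated numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and it assumes that $d_1$ and $d_2$ are written in forward moneyness terms, i.e. $d_1 = (-\ln \kappa_f + \hat\sigma^2 \tau / 2) / (\hat\sigma \sqrt{\tau})$ and $d_2 = d_1 - \hat\sigma \sqrt{\tau}$.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density phi."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def smile_slope_bounds(kappa_f, sigma_hat, tau):
    """No-arbitrage bounds (2.53) on the slope of the smile in forward moneyness.

    kappa_f   : forward moneyness K / F_t
    sigma_hat : implied volatility at this moneyness
    tau       : time to maturity T - t
    Returns the pair (lower bound, upper bound).
    """
    # Assumption: d1, d2 expressed in forward moneyness, see the lead-in.
    d1 = (-log(kappa_f) + 0.5 * sigma_hat ** 2 * tau) / (sigma_hat * sqrt(tau))
    d2 = d1 - sigma_hat * sqrt(tau)
    lower = -norm_cdf(-d1) / (sqrt(tau) * kappa_f * norm_pdf(d1))
    upper = norm_cdf(d2) / (sqrt(tau) * kappa_f * norm_pdf(d2))
    return lower, upper


if __name__ == "__main__":
    for kappa in (0.8, 1.0, 1.2):
        print(kappa, smile_slope_bounds(kappa, sigma_hat=0.2, tau=0.25))
```

For ATM forward moneyness ($\kappa_f = 1$) one has $d_2 = -d_1$, so the two bounds are symmetric around zero in this sketch.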

2.6.2 Large and small strike behavior

Lee (2003) derived remarkable results for the large and small strike behavior
of the smile function. Define $x \stackrel{\mathrm{def}}{=} \ln \kappa_f = \ln (K/F_t)$, where the futures price
remains fixed in this section. He shows that
\[ \hat\sigma_t(x, T) < \sqrt{\frac{2|x|}{T}} \tag{2.54} \]
for sufficiently large $|x| > x^*$ (see also Zhu and Avellaneda (1998) who
derive this bound in a less general setting). He proceeds in showing that there
is a precise one-to-one correspondence between the asymptotic behavior of
the smile function and the number of finite moments of the distribution of the
underlying $S_T$ and its inverse, $S_T^{-1}$.
To understand the first result consider the large strike case. Due to monotonicity
of the BS formula in volatility, it is equivalent to (2.54) to show for
some $x > x^*$ that
\[ C_t^{BS}\big(x, T, \hat\sigma_t(x, T)\big) < C_t^{BS}\big(x, T, \sqrt{2|x|/T}\big) . \tag{2.55} \]

For the left-hand side one sees for any call price function that
\[ \lim_{x \uparrow \infty} C_t(x, T) = \lim_{K \uparrow \infty} e^{-r\tau} E^{Q} (S_T - K)^+ = 0 , \tag{2.56} \]
since by $E^{Q} S_T < \infty$ we can interchange the limit and the expectation by the
dominated convergence theorem. For the right-hand side one obtains
\[ \lim_{x \uparrow \infty} C_t^{BS}\big(x, T, \sqrt{2|x|/T}\big) = e^{-r\tau} F_t \Big\{ \Phi(0) - \lim_{x \uparrow \infty} e^{x} \Phi\big(-\sqrt{2|x|}\big) \Big\} = e^{-r\tau} F_t / 2 , \tag{2.57} \]

by applying L'Hôpital's rule. A similar approach involving puts proves the
small strike case, Lee (2003, Lemma 3.3).
The second result relates a coefficient that can replace the '2' in (2.54)
with the number of finite moments in the underlying distribution of $S_T$ and
$S_T^{-1}$. Define
\[ \tilde p \stackrel{\mathrm{def}}{=} \sup\{ p : E S_T^{1+p} < \infty \} , \tag{2.58} \]
\[ \tilde q \stackrel{\mathrm{def}}{=} \sup\{ q : E S_T^{-q} < \infty \} , \tag{2.59} \]

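and
\[ \beta_R \stackrel{\mathrm{def}}{=} \limsup_{x \uparrow \infty} \frac{\hat\sigma^2(x, T)}{|x|/T} , \tag{2.60} \]
\[ \beta_L \stackrel{\mathrm{def}}{=} \limsup_{x \downarrow -\infty} \frac{\hat\sigma^2(x, T)}{|x|/T} . \tag{2.61} \]

The coefficients $\beta_R$, $\beta_L$ can be interpreted as the slope coefficients of the
asymptotes of the implied variance function.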
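Lee (2003, Theorem 3.2 and 3.4) shows that $\beta_R, \beta_L \in [0, 2]$. Moreover, he
proves that
\[ \tilde p = \frac{1}{2\beta_R} + \frac{\beta_R}{8} - \frac{1}{2} , \tag{2.62} \]
\[ \tilde q = \frac{1}{2\beta_L} + \frac{\beta_L}{8} - \frac{1}{2} , \tag{2.63} \]
where $1/0 \stackrel{\mathrm{def}}{=} \infty$, and
\[ \beta_R = 2 - 4 \Big( \sqrt{\tilde p^2 + \tilde p} - \tilde p \Big) , \tag{2.64} \]
\[ \beta_L = 2 - 4 \Big( \sqrt{\tilde q^2 + \tilde q} - \tilde q \Big) , \tag{2.65} \]
where $\beta_R$, $\beta_L$ are read as zero, if $\tilde p$, $\tilde q$ are infinity.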
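
The moment formulae (2.62) to (2.65) are easy to tabulate. The following minimal sketch (the function names are hypothetical and not taken from Lee (2003)) maps a number of finite moments to the asymptotic slope of implied variance and back, using the conventions $1/0 = \infty$ and $\beta = 0$ for infinitely many finite moments.

```python
from math import inf, isinf, sqrt


def beta_from_moments(p):
    """Slope coefficient beta of (2.64)/(2.65) for the moment index p (p-tilde or q-tilde)."""
    if isinf(p):
        return 0.0  # all moments finite: beta is read as zero
    return 2.0 - 4.0 * (sqrt(p * p + p) - p)


def moments_from_beta(beta):
    """Moment index of (2.62)/(2.63) for a slope coefficient beta in [0, 2]."""
    if beta == 0.0:
        return inf  # convention 1/0 = infinity
    return 1.0 / (2.0 * beta) + beta / 8.0 - 0.5


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0, 5.0, inf):
        beta = beta_from_moments(p)
        print(p, beta, moments_from_beta(beta))
```

The two maps are inverse to each other on $[0, 2]$: the steepest admissible slope $\beta = 2$ corresponds to a moment index of zero, and a larger number of finite moments gives a flatter asymptote.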


The intuition of these results is as follows: the IV smile must carry the
same information as the underlying risk neutral transition density. For in-
stance, in the empirical literature it is well known that certain shapes of the
state price density determine the shape of the smile function. Also the asymp-
totic behavior of the smile is shaped by the tail behavior of the risk neutral
transition density and vice versa. The tail decay of the risk neutral transition
density, however, determines the number of finite moments in the distribu-
tion. This context is made rigorous by employing two fundamental results: on
the one hand, options are bounded by moments, Broadie et al. (1998), and
on the other hand, moments, which can be interpreted as exotic options with
power payoffs, are bounded by mixtures of a strike continuum of plain vanilla
options, Carr and Madan (1998).
The results by Lee (2003) have implications for the extrapolation of the
IVS, for instance in the context of local volatility models, see Section 3.10.3,
p. 90. In a meaningful pricing algorithm it is often necessary to extrapolate
the IVS beyond the values at which options are typically observed. The choice
of the extrapolation is an intricate task since the prices of exotic options
depend significantly on the specific extrapolation. For instance, assuming that
at large and small strikes the IVS flattens out to a constant may produce
prices of exotic options that are far different from those obtained in a model
that allows the IVS to moderately increase at large and small strikes. The
results now provide guidelines for this extrapolation: the smile should not
grow faster than $\sqrt{|x|}$. Furthermore, it should not grow slower than $\sqrt{|x|}$
unless one assumes that $S_T$ has finite moments of all orders. Finally, the
moment formula allows one to determine the number of finite moments that one
implicitly assumes when choosing the multiplicative factor in the growth
term that extrapolates the smile.

2.7 General regularities of the IVS

2.7.1 Static stylized facts

Despite its daily fluctuations, the IVS exhibits a number of empirical regu-
larities, both from a static and a dynamic perspective. Here, they shall be
summarized with respect to the DAX index options traded at the German
option market, Deutsche Börse AG, Frankfurt, see Appendix A for details
concerning the data. Typically these stylized facts are observed for any equity
index market. Markets with other underlying assets display similar features.
For a compendium on single stocks, interest rate and foreign exchange markets
see Rebonato (1999) or Tompkins (1999).

1. For short times to maturity the smile is very pronounced, while it
becomes more and more shallow for longer times to maturity: the IVS
flattens out, Figure 2.8. Figure 2.9 displays the mean IVS 1996. Since this
is computed from smoothed IVS data and on a relatively small grid, this
effect is less apparent than in Figure 2.8. For more pictures of this kind,
see Fengler (2002).
2. The smile function achieves its minimum in the neighborhood of ATM to
near OTM call options, Figure 2.7 and Figure 2.9. The term structure is
increasing, but may also display a humped profile, especially in periods of
market turmoil as during the Asian crisis in 1997, Figure 2.11.
3. OTM put regions display higher levels of IV than OTM call regions, lower
panel of Figure 2.8 and Figure 2.9. However, this has not always been
the case: a more or less symmetric smile became strongly asymmetric (a
‘sneer’, or ‘smirk’) and considerably more pronounced after the 1987 crash.
It is widely argued that this is due to the investors’ increased awareness
of market down-swings since this period, Rubinstein (1994).
4. The volatility of IV is biggest for short maturity options and monotonically
declining with time to maturity, Figure 2.10 and Figure 2.12, also Fengler
(2002).
5. Returns of the underlying asset and returns of IV are negatively correlated,
indicating a leverage effect, Black (1976). For the entire data 1995 to May
2001 we find a correlation between ATM IV (three months) and DAX
returns of ρ = −0.32. This point is further discussed in the context of the
principal component analysis in Section 5.2.5.
6. IV appears to be mean-reverting, Cont and da Fonseca (2002). For ATM
IV (three months) we find a mean reversion of approximately 60 days, see
also Section 5.2.5 and Table 5.4.
7. Shocks across the IVS are highly correlated. Thus, IVS dynamics can be
decomposed into a small number of driving factors, Chapter 5.

2.7.2 DAX index IV between 1995 and 2001

An overview on the three-month ATM IV time series between 1995 and May
2001 is given in Figure 2.13. In Figure 2.14, the same time series together
with the rescaled DAX is shown. At the beginning of 1995, the DAX index
was at around 2100 points and increased moderately till the beginning of
1997. During this time ATM IV was below 20% and gradually fell till 1997
towards 14%. Beginning from the end of 1996 the DAX commenced a steady
and smooth increase till mid 1998, which was shortly interrupted by the Asian
crisis in the second half of the year 1997. The entire ascent of the DAX was
accompanied by steadily increasing IV levels rising as high as 35%, when the
DAX fell sharply at the peak of the market turmoil. From then on, IVs gradually
declined, but remained at historically high levels and were relatively volatile,
while the index rose again. Between mid-July and the beginning of October
1998 the DAX dropped again by about 2000 points, followed by a sharp increase
of IVs with peaks up to 50%. During the recovery of the index between 1999
and 2000, IVs returned to the levels recorded before the late 1998 increase.
Although increasingly volatile, they remained at these levels, when the DAX
began its gradual decline from the post war peak of 8000 points in March
2000.
The annual standard deviation of IV can be inferred from Figure 2.12.
As already observed short run volatilities are more subject to daily variation
than the long term volatilities, which is reflected in the downward sloping
functions. The year 1998 was the period of the highest volatility, followed by
the years 1997 and 1999.
From this description, the only obvious regularity between the time series
patterns of the underlying asset, the DAX index, and ATM IV is that times of
market crises lead to sharply increasing levels of the IVS. This is likely to be
due to the increased demand for put options. Otherwise no clear-cut relation
between the time series patterns emerges: levels of the IVS may be constant,
downward or upward-trending independently from the index. Some authors,
like Derman (1999) and Alexander (2001b), have suggested distinguishing
different market regimes. In this interpretation, the IVS acts as an indicator of
market sentiment, i.e. as an additional financial variable describing the current
state of the market. This view explains the increased interest in accurate
modeling techniques of the IVS.
[Figure, two panels. Top panel: "IV ticks and IVS: 20000502" (surface plot). Bottom panel: "IVS term structure: 20000502", horizontal axis: time to maturity. SCMivsts.xpl]

Fig. 2.8. Top panel: DAX option IVS on 20000502. IV observations are displayed
as black dots; the surface estimate is obtained from a local quadratic estimator with
localized bandwidths. Bottom panel: term structure of the IVS. κf = 0.75 top line,
κf = 1, i.e. ATM, middle line, κf = 1.1, bottom line.

[Figure: surface plot "Mean IVS 1996".]

Fig. 2.9. Mean IVS 1996, computed from smoothed surfaces.

[Figure: surface plot "Standard Deviation of IVS 1996".]

Fig. 2.10. Standard deviation of the IVS 1996, computed from smoothed surfaces.

[Figure: "ATM Mean IV Term Structures", one curve per year from 1995 to 2001; vertical axis: implied volatility, horizontal axis: time to maturity.]

Fig. 2.11. ATM mean IV term structures between 1995 and 2001, computed from
smoothed surfaces.

[Figure: "ATM StdDev IV Term Structures", one curve per year from 1995 to 2001; vertical axis: implied volatility, horizontal axis: time to maturity.]

Fig. 2.12. ATM standard deviation of IV term structures between 1995 and 2001,
computed from smoothed surfaces.

[Figure: "3 mths ATM Implied Volatility", horizontal axis: time, 1995 to 2001.]

Fig. 2.13. The three-months ATM IV levels of DAX index options.

[Figure: "ATM IV and DAX Index", horizontal axis: time, 1995 to 2001.]

Fig. 2.14. German DAX $\times 10^{-4}$ (upper line) and three-months ATM IV levels
(lower line), also given in Figure 2.13.

2.8 Relaxing the constant volatility case

The fact that IV is counterfactual to the BS model has spurred a large number
of alternative pricing models. The easiest way for more flexibility is to allow
the coefficients of the SDE, which describes the stock price evolution, to be
deterministic functions in the asset price and time. This preserves the complete
market setting, Section 2.8.1. A second important class of models specifies
volatility as an additional stochastic process. Since volatility is not a tradable
asset, this implies that the market is incomplete. A short review is given in
Section 2.8.2.

2.8.1 Deterministic volatility

Allowing for coefficients of the SDE of the stock price evolution that are
deterministic functions in the asset price and time, leads to
\[ \frac{dS_t}{S_t} = \mu(S_t, t)\, dt + \sigma(S_t, t)\, dW_t , \tag{2.66} \]
where µ, σ : R × [0, T ∗ ] → R are deterministic functions. For the existence of
a unique strong solution, the functions must satisfy a global Lipschitz and a
linear growth condition, see appendix Chapter B.
For pricing derivatives, one may proceed as in Section 2.1. This leads to
the generalized BS PDE for the derivative price:

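\[ 0 = \frac{\partial H}{\partial t} + (r - \delta) S \frac{\partial H}{\partial S} + \frac{1}{2} \sigma^2(S, t) S^2 \frac{\partial^2 H}{\partial S^2} - rH . \tag{2.67} \]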
As before the delta hedge ratio is given by the first order derivative of the
solution to (2.67) with respect to St .
For plain vanilla options, closed-form solutions can be derived for some particular
specifications of volatility. Generally, this can be achieved via change
of variable techniques, e.g. Bluman (1980) and Harper (1994). For the special
case, when volatility is only time-dependent, i.e. $\sigma(S_t, t) \stackrel{\mathrm{def}}{=} \sigma(t)$, one can use
the following arguments to solve (2.67), Wilmott (2001a, Chapter 8):
One introduces the new variables
\[ \bar S \stackrel{\mathrm{def}}{=} S e^{(r-\delta)(T-t)} , \tag{2.68} \]
\[ \bar t \stackrel{\mathrm{def}}{=} f(t) , \tag{2.69} \]
\[ \bar H(\bar S, \bar t) \stackrel{\mathrm{def}}{=} H(S, t)\, e^{r(T-t)} , \tag{2.70} \]
where $f$ is some smooth function. Expressing the PDE (2.67) in terms of the
new variables (2.68) to (2.70), yields
\[ \frac{\partial \bar H}{\partial \bar t} \frac{\partial \bar t}{\partial t} = - \frac{1}{2} \sigma^2(t) \bar S^2 \frac{\partial^2 \bar H}{\partial \bar S^2} . \tag{2.71} \]

If we choose
\[ f(t) \stackrel{\mathrm{def}}{=} \int_t^T \sigma^2(s)\, ds , \tag{2.72} \]
so that $\partial \bar t / \partial t = -\sigma^2(t)$, Equation (2.71) reduces to
\[ \frac{\partial \bar H}{\partial \bar t} = \frac{1}{2} \bar S^2 \frac{\partial^2 \bar H}{\partial \bar S^2} , \tag{2.73} \]
which is independent from time in its coefficients. Also, the boundary condition
for a (European) call $\bar H(\bar S_T, T) = H(S_T, T) = (S_T - K)^+$ (or a European
put) stays the same after these manipulations. Consequently, in denoting by
$\bar H(\bar S, \bar t)$ the solution to (2.73), we can rewrite this in the original variables as
\[ H(S, t) = e^{-r(T-t)} \bar H\{ S e^{(r-\delta)(T-t)}, f(t) \} . \tag{2.74} \]

Now denote by $H^{BS}$ the BS solution with constant volatility $\sigma$. It can be
written in the form
\[ H^{BS} = e^{-r(T-t)} \bar H^{BS}\{ S e^{(r-\delta)(T-t)}, \sigma^2 (T-t) \} , \tag{2.75} \]
for some function $\bar H^{BS}$. By comparison of (2.74) and (2.75), it is seen that
the constant and the time-dependent coefficient case have the same solutions
if we put
\[ \bar\sigma^2 \stackrel{\mathrm{def}}{=} \frac{1}{T-t} f(t) = \frac{1}{T-t} \int_t^T \sigma^2(s)\, ds . \tag{2.76} \]
Hence, in the case of a European call option we have
\[ C(S_t, t) = C^{BS}\big( S_t, t, K, T, \sqrt{\bar\sigma^2} \big) . \tag{2.77} \]

This is the common BS formula with the volatility parameter $\sigma$ replaced
by an average volatility up to expiry. It follows by the definition of IV in (2.43)
that
\[ \hat\sigma = \sqrt{\bar\sigma^2} . \tag{2.78} \]
Therefore, this result provides a justification for interpreting IV as the average
volatility over the option's life time from $t$ to $T$, Section 2.5.
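
The result (2.76) to (2.78) can be illustrated numerically. The following is a minimal sketch under stated assumptions: the volatility function $\sigma(s) = 0.2 + 0.1 s$, the midpoint rule for the integral and all function names are illustrative choices, not taken from the text.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, tau, r, delta, sigma):
    """BS call price with continuous dividend yield delta."""
    d1 = (log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * exp(-delta * tau) * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


def rms_volatility(sigma_fn, t, T, n=10_000):
    """Square root of (2.76): root-mean-square average of sigma(s) over [t, T].

    The integral is approximated by the midpoint rule with n subintervals.
    """
    h = (T - t) / n
    integral = sum(sigma_fn(t + (i + 0.5) * h) ** 2 for i in range(n)) * h
    return sqrt(integral / (T - t))


def call_time_dependent_vol(S, K, t, T, r, delta, sigma_fn):
    """Call price (2.77): the BS formula evaluated at the average volatility."""
    return bs_call(S, K, T - t, r, delta, rms_volatility(sigma_fn, t, T))


if __name__ == "__main__":
    sigma_fn = lambda s: 0.2 + 0.1 * s  # hypothetical increasing volatility path
    for T in (0.25, 0.5, 1.0):
        # by (2.78) the IV equals the average volatility: a term structure, no smile
        print(T, rms_volatility(sigma_fn, 0.0, T),
              call_time_dependent_vol(100.0, 100.0, 0.0, T, 0.03, 0.0, sigma_fn))
```

In this sketch the average volatility does not depend on the strike, which is the numerical counterpart of the statement that time-dependent volatility generates a term structure only.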
The time-dependent volatility generates a term structure of the IVS, only,
not a smile. To obtain a smile, volatility must depend on St as well. This
is achieved in the more general local volatility models. Actually, the model
with time-dependent volatility is just a special case of local volatility models.
Consequently, the further discussion of these models is delayed to Chapter 3.

2.8.2 Stochastic volatility

In stochastic volatility models, in addition to the Brownian motion which
drives the asset price, a second stochastic process is introduced that governs
the volatility dynamics. A typical model set-up can be given by:
\[ \frac{dS_t}{S_t} = \mu\, dt + \sigma(t, Y_t)\, dW_t^{(0)} , \tag{2.79} \]
\[ \sigma(t, Y_t) = f(Y_t) , \]
\[ dY_t = \alpha(Y_t, t)\, dt + \theta(Y_t, t)\, dW_t^{(1)} . \tag{2.80} \]

The two Brownian motions $(W_t^{(0)})_{0 \le t \le T^*}$ and $(W_t^{(1)})_{0 \le t \le T^*}$ are defined
on the probability space $(\Omega, \mathcal{F}, P)$, and let $(\mathcal{F}_t)_{0 \le t \le T^*}$ be the $P$-augmented
filtration generated by both Brownian motions. Again we suppose that the
sufficient conditions are met, such that (2.79) and (2.80) have unique strong
solutions, Chapter B. The function $f(y)$ is chosen for positivity and analytical
tractability. Typical examples are: $f(y) = y$, Hull and White (1987), $f(y) = e^y$,
Stein and Stein (1991), or $f(y) = |y|$, Scott (1987).
Empirical analysis suggests a mean-reverting behavior of volatility. To cap-
ture this regularity, (Yt )0≤t≤T ∗ is often assumed to be an Ornstein-Uhlenbeck
process, which is defined as the solution to the SDE
\[ dY_t = \alpha(\mu_y - Y_t)\, dt + \theta\, dW_t^{(1)} , \tag{2.81} \]
where $\alpha, \mu_y, \theta > 0$, Scott (1987) and Stein and Stein (1991). Here, $\alpha$ is the
rate of mean reversion, pulling the levels of the process back to its long run
mean $\mu_y$. The solution of (2.81) is given by:
\[ Y_t = \mu_y + (y_0 - \mu_y) e^{-\alpha t} + \theta \int_0^t e^{-\alpha(t-s)}\, dW_s^{(1)} , \tag{2.82} \]
where $y_0$ denotes a known starting value. The distribution of $Y_t$ is
\[ N\Big( \mu_y + (y_0 - \mu_y) e^{-\alpha t} , \; \frac{\theta^2}{2\alpha} (1 - e^{-2\alpha t}) \Big) . \tag{2.83} \]

The stationary distribution for $t \uparrow \infty$ is given by $N\big(\mu_y, \frac{\theta^2}{2\alpha}\big)$, which does not
depend on $y_0$.
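
The moments in (2.83) can be checked by simulation. The sketch below is illustrative only: parameter values and function names are hypothetical. It uses the fact that, by (2.82), the transition of the Ornstein-Uhlenbeck process over a step of length $h$ is Gaussian, so the path can be sampled without discretization error.

```python
import random
from math import exp, sqrt


def ou_mean(y0, mu_y, alpha, t):
    """Mean of Y_t in (2.83) given the starting value y0."""
    return mu_y + (y0 - mu_y) * exp(-alpha * t)


def ou_variance(alpha, theta, t):
    """Variance of Y_t in (2.83); tends to theta^2 / (2 alpha) for large t."""
    return theta ** 2 / (2.0 * alpha) * (1.0 - exp(-2.0 * alpha * t))


def simulate_ou_terminal(y0, mu_y, alpha, theta, t, n_steps, rng):
    """Sample Y_t by chaining exact Gaussian transitions over n_steps steps."""
    h = t / n_steps
    step_sd = sqrt(ou_variance(alpha, theta, h))
    y = y0
    for _ in range(n_steps):
        y = ou_mean(y, mu_y, alpha, h) + step_sd * rng.gauss(0.0, 1.0)
    return y


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    y0, mu_y, alpha, theta, t = 0.4, 0.2, 2.0, 0.3, 1.0  # hypothetical values
    draws = [simulate_ou_terminal(y0, mu_y, alpha, theta, t, 10, rng)
             for _ in range(20_000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    print("sample mean", mean, "theory", ou_mean(y0, mu_y, alpha, t))
    print("sample var ", var, "theory", ou_variance(alpha, theta, t))
```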
Alternatively, the literature considers a log-normal process (Hull and
White; 1987), or the Cox-Ingersoll-Ross process, Heston (1993) and Ball and
Roma (1994), or a combination of the constant elasticity of variance model
with stochastic volatility, Hagan et al. (2002), the so called SABR model.
Usually, it is supposed that both Brownian motions are correlated, i.e.
\[ \langle W^{(0)}, W^{(1)} \rangle_t = \rho\, t , \tag{2.84} \]
where $-1 \le \rho \le 1$ is the instantaneous correlation between the processes. The
case $\rho < 0$ is associated with the leverage effect, Black (1976): volatility rises,
when there is a negative shock in the market value of the firms, since this
results in an increase in the debt-equity ratio. This pattern is also observed
for IV processes, Section 2.7.
As has been stated earlier, the assumption of no-arbitrage is equivalent
to the existence of an equivalent martingale measure. Unlike in Section 2.1,
the market is not complete due to the additional source of risk. In general,
there exists an entire set of equivalent martingale measures $\mathcal{Q}$. A martingale
measure $Q \in \mathcal{Q}$ can be characterized by the Radon-Nikodým derivative using
Girsanov's Theorem, Chapter B:
\[ \frac{dQ}{dP} = \exp\Big\{ - \int_0^{T^*} \lambda_s^{(0)}\, dW_s^{(0)} - \frac{1}{2} \int_0^{T^*} \big(\lambda_s^{(0)}\big)^2 ds - \int_0^{T^*} \lambda_s^{(1)}\, dW_s^{(1)} - \frac{1}{2} \int_0^{T^*} \big(\lambda_s^{(1)}\big)^2 ds \Big\} , \tag{2.85} \]
where $(\lambda_t^{(0)})_{t \ge 0}$ and $(\lambda_t^{(1)})_{t \ge 0}$ are $(\mathcal{F}_t)_{0 \le t \le T^*}$-adapted processes. Furthermore,
\[ \overline{W}_t^{(0)} \stackrel{\mathrm{def}}{=} W_t^{(0)} + \int_0^t \lambda_s^{(0)}\, ds \quad \text{and} \quad \overline{W}_t^{(1)} \stackrel{\mathrm{def}}{=} W_t^{(1)} + \int_0^t \lambda_s^{(1)}\, ds , \tag{2.86} \]
are Brownian motions on the space $(\Omega, \mathcal{F}, Q)$ for all $t \in [0, T^*]$. If (and only
if)
\[ \lambda_t^{(0)} \stackrel{\mathrm{def}}{=} \frac{\mu + \delta - r}{\sigma(t, Y_t)} , \tag{2.87} \]
the discounted price process $e^{-rt} S_t$ is a martingale. The process $(\lambda_t^{(1)})_{t \ge 0}$
can be any adapted process satisfying the required integrability condition. In
analogy to (2.87) it is called the market price of volatility risk. The measure
$Q$ depends on the choice of $(\lambda_t^{(1)})_{t \ge 0}$. In some sense, one may think about
the measure as being 'parameterized' by this process; we therefore write $Q^{\lambda_1}$.
Option prices are computed by exploiting the risk neutral pricing relationship:
\[ H_t = E^{Q^{\lambda_1}} \{ e^{-r(T-t)} \psi(S_T) \,|\, \mathcal{F}_t \} . \tag{2.88} \]
Let us assume that $(\lambda_t^{(1)})_{t \ge 0}$ is a function of $Y_t$, $S_t$ and $t$ only, i.e. a Markov
process, and that $\rho \stackrel{\mathrm{def}}{=} 0$. In this particular case, we can compute option prices
by conditioning on the volatility path. By the law of iterated expectations, we
have, e.g. for a call:
\[ C(S_t, t) = E^{Q^{\lambda_1}} \Big[ E^{Q^{\lambda_1}} \big\{ e^{-r(T-t)} (S_T - K)^+ \,|\, \mathcal{F}_t, \sigma(Y_s, s), t \le s \le T \big\} \,\Big|\, \mathcal{F}_t \Big] . \tag{2.89} \]

Since the inner expectation is the same as in the time-dependent volatility
case, this reduces to
\[ C(S_t, t) = E^{Q^{\lambda_1}} \Big\{ C^{BS}\big( S_t, t, K, T, \sqrt{\bar\sigma^2} \big) \,\Big|\, \mathcal{F}_t \Big\} , \tag{2.90} \]
where now $\bar\sigma^2 \stackrel{\mathrm{def}}{=} \frac{1}{T-t} \int_t^T \{f(Y_s)\}^2\, ds$. As before we insert the root-mean-square
time average over a particular trajectory of volatility into the BS formula.
The call price is given by an average of prices over all possible volatility
paths.
Due to its similarity to the case of deterministic volatility, one is tempted
to interpret IV as an average volatility over the remaining life time of the
option. However, in general we have
\[ E^{Q^{\lambda_1}} \Big\{ C^{BS}\big( \cdot, \sqrt{\bar\sigma^2} \big) \,\Big|\, \mathcal{F}_t \Big\} \ne C^{BS}\Big( \cdot, E^{Q^{\lambda_1}} \big( \sqrt{\bar\sigma^2} \,|\, \mathcal{F}_t \big) \Big) , \tag{2.91} \]
and thus:
\[ \hat\sigma \ne E^{Q^{\lambda_1}} \big( \sqrt{\bar\sigma^2} \,|\, \mathcal{F}_t \big) , \tag{2.92} \]
since $\bar\sigma^2$ is random and the call price a nonlinear function of volatility. For
ATM strikes, where the volga is small and negative (recall our remark from
page 15 with regard to Figure 2.4), Jensen's inequality yields:
\[ \hat\sigma < E^{Q^{\lambda_1}} \big( \sqrt{\bar\sigma^2} \,|\, \mathcal{F}_t \big) , \tag{2.93} \]
but $\hat\sigma \approx E^{Q^{\lambda_1}} \big( \sqrt{\bar\sigma^2} \,|\, \mathcal{F}_t \big)$ may be considered as a sufficiently good approximation.
Thus, it can still be justified to interpret IV as an average volatility over
the remaining life time of the option. It should be borne in mind, however,
that this interpretation is limited to ATM strikes and $\rho = 0$, only. For the
case $\rho \ne 0$, a representation of this form depends strongly on the specification
of the underlying volatility process. A generalization of (2.89) within the Hull
and White (1987) model for any $\rho$ has been obtained by Zhu and Avellaneda
(1998), see also the discussion in Fouque et al. (2000).
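
The effect of Jensen's inequality in (2.91) to (2.93) can be made concrete with a toy example. The sketch below rests on strong simplifying assumptions that are not in the text: the average volatility $\sqrt{\bar\sigma^2}$ takes only the two values 0.1 and 0.3 with equal probability, interest rates and dividends are zero, and the IV is backed out by bisection. All function names are hypothetical.

```python
from math import erf, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_forward(F, K, tau, sigma):
    """Undiscounted BS call price written on the forward F (assumes r = 0)."""
    d1 = (log(F / K) + 0.5 * sigma ** 2 * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return F * norm_cdf(d1) - K * norm_cdf(d2)


def mixing_price(F, K, tau, rms_vols, probs):
    """Call price as in (2.90): average of BS prices over average volatilities."""
    return sum(p * bs_call_forward(F, K, tau, s) for s, p in zip(rms_vols, probs))


def implied_vol(price, F, K, tau, lo=1e-4, hi=3.0):
    """Invert the BS formula by bisection; the price is increasing in sigma."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call_forward(F, K, tau, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rms_vols, probs = [0.1, 0.3], [0.5, 0.5]  # hypothetical two-point distribution
    expected_vol = sum(p * s for s, p in zip(rms_vols, probs))  # equals 0.2
    for K in (70.0, 100.0, 130.0):
        iv = implied_vol(mixing_price(100.0, K, 1.0, rms_vols, probs), 100.0, K, 1.0)
        print(K, iv, iv - expected_vol)
```

In this toy setting the ATM IV lies slightly below the expected average volatility, in line with (2.93), while away from the money, where the BS price is convex in volatility over the range considered, the IV lies above it.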
Due to the incompleteness of the market, the construction of a riskless
hedge portfolio as in Section 2.1 is not possible. One solution, referred to as
delta-sigma hedging, is to introduce another option with a longer maturity into
the market, whose price is given exogenously. This completes the market under
some conditions, Bajeux and Rochet (1992). Given these three instruments,
the stock, the bond and the additional option, a riskless hedge portfolio can
be constructed that prices the option.
Another strategy is to assume that $(\lambda_t^{(1)})_{t \ge 0} \stackrel{\mathrm{def}}{=} 0$, i.e. volatility risk is
unpriced. This is sensible if volatility risk can be diversified away, or preferences
are logarithmic, Pham and Touzi (1996). The measure $Q^0$ can also be interpreted
as the closest measure to $P$ in a relative entropy sense, Föllmer and
Schweizer (1990). In the general case, one needs to resort to hedging strategies
that have been developed within the incomplete markets literature: in
super-hedging the contingent claim is 'super-replicated', i.e. a self-financing
strategy with minimum initial costs is sought such that any future obligation
from selling the contingent claim is covered, while in quantile-hedging one tries
to cover this obligation only with a sufficiently high probability. Finally, one
may consider trading strategies which are not necessarily self-financing, i.e.
which allow for the additional transfer of wealth to the hedge portfolio. This is
called risk-minimizing hedging, originated by Föllmer and Sondermann (1986).
See Föllmer and Schweizer (1990), Karatzas (1997) and Föllmer and Schied
(2002) for a detailed mathematical treatment of these hedging approaches.

2.9 Challenges arising from the smile

2.9.1 Hedging and risk management

In the presence of the smile, a first obvious challenge is the computation of the
relevant hedge ratios. At first glance, an answer may be to insert IV into the BS
derivatives in order to compute the hedge ratios for some option position. This
strategy is also called an ‘IV compensated BS hedge’. However, one should be
aware that this strategy can be erroneous, since IV is not necessarily equal to
the hedging volatility. Analogously to IV, the hedging volatility, for instance
for the delta, is defined by:
\[ \hat\sigma_h : \quad \frac{\partial C^{BS}}{\partial S} (S_t, t, K, T, \hat\sigma_h) - \frac{\partial \tilde C_t}{\partial S} = 0 , \tag{2.94} \]
which is the volatility that equates the BS delta with the delta of the true,
but unknown pricing model, Renault and Touzi (1996). Unfortunately, the
hedging volatility is not directly observable.
Renault and Touzi (1996) prove that the bias in this approximation is sys-
tematic, when the classical Hull and White (1987) model is the true underlying
price process. The bias translates into the following errors in the hedge ratios:
for ITM options the use of IV to compute the hedge ratios leads to an under-
hedged position in the delta, while for OTM options the use of IV leads to
an overhedged position. Only for ATM options, in the log-forward moneyness
sense, the delta-hedge is perfect. This problem is also demonstrated for both
delta and vega risks in a simulation study by Rebonato (1999, Case Study
4.1).
An alternative approach for approximating the unknown delta is to explicitly
assume that the smile depends on the underlying asset price $S_t$. Then
one approximates the delta as:
\[ \frac{\partial \tilde C_t}{\partial S} = \frac{\partial C^{BS}}{\partial S} (S_t, t, K, T, \hat\sigma) + \frac{\partial C^{BS}}{\partial \hat\sigma} (S_t, t, K, T, \hat\sigma) \frac{\partial \hat\sigma}{\partial S} . \tag{2.95} \]
In (2.95) all quantities are known except for $\partial \hat\sigma / \partial S$. It cannot directly be
recovered from the IVS, since the IVS is a function in maturity and strikes,
not in the underlying. One solution would be to use simple conjectures about
this quantity, see Derman et al. (1996b) or Derman (1999) and Section 3.11
for typical examples. Assuming that volatility is a deterministic function of
$S_t$ and $t$ as in Section 2.8.1, Coleman et al. (2001) suggest the following
approximation to $\partial \hat\sigma / \partial S$. They observe that under this assumption European
put and call prices are related to each other through a reversal of $S$ and $K$,
and $r$ and $\delta$, respectively, i.e.
\[ C(S_t, t, K, T, \hat\sigma, r, \delta) = P(K, t, S_t, T, \hat\sigma, \delta, r) , \tag{2.96} \]
where $P(S_t, t, K, T, \sigma, r, \delta)$ denotes the price of a European put option, where
(in this order!) the current asset price is $S_t$, the time $t$, the strike $K$, the expiry date $T$,
the volatility $\sigma$, the interest rate $r$ and the dividend yield $\delta$. In further assuming that a
similar relationship holds in terms of IV, they derive
\[ \frac{\partial \hat\sigma^C (S_t, t, K, T, r, \delta)}{\partial S} = \frac{\partial \hat\sigma^P (K, t, S_t, T, \delta, r)}{\partial S} , \tag{2.97} \]
where the superscript C and P denote call and put respectively. Note that the
left-hand side in (2.97) is the unknown derivative of IV with respect to the
underlying asset, while the right-hand side of (2.97) is a strike derivative which
can be reconstructed from the IVS. Relation (2.97) is particularly convenient
when r ≈ δ (as in the case of futures options), since the switch of both
quantities becomes obsolete. In terms of an empirical performance, Coleman
et al. (2001) report that this approximation substantially improves the hedges
based on a simple constant volatility method.
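
In the simplest deterministic volatility case, constant BS volatility, the reversal (2.96) can be verified directly. The following minimal sketch (hypothetical function names, arbitrary parameter values) prices the call and the put with the roles of asset price and strike, and of interest rate and dividend yield, exchanged.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def _d1_d2(S, K, tau, sigma, r, delta):
    d1 = (log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    return d1, d1 - sigma * sqrt(tau)


def bs_call(S, K, tau, sigma, r, delta):
    """BS call: asset price S, strike K, interest rate r, dividend yield delta."""
    d1, d2 = _d1_d2(S, K, tau, sigma, r, delta)
    return S * exp(-delta * tau) * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


def bs_put(S, K, tau, sigma, r, delta):
    """BS put with the same argument order as bs_call."""
    d1, d2 = _d1_d2(S, K, tau, sigma, r, delta)
    return K * exp(-r * tau) * norm_cdf(-d2) - S * exp(-delta * tau) * norm_cdf(-d1)


if __name__ == "__main__":
    S, K, tau, sigma, r, delta = 100.0, 115.0, 0.5, 0.25, 0.04, 0.01
    call = bs_call(S, K, tau, sigma, r, delta)
    # reversal of S and K, and of r and delta, as in (2.96)
    put_reversed = bs_put(K, S, tau, sigma, delta, r)
    print(call, put_reversed, call - put_reversed)
```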
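Another hedging strategy due to Lee (2001) also includes the stochastic
volatility case: reconsider the strike-analogue to (2.95). It is given by:
\[ \frac{\partial \tilde C_t}{\partial K} = \frac{\partial C^{BS}}{\partial K} (S_t, t, K, T, \hat\sigma) + \frac{\partial C^{BS}}{\partial \hat\sigma} (S_t, t, K, T, \hat\sigma) \frac{\partial \hat\sigma}{\partial K} . \tag{2.98} \]
Multiply (2.95) with $S_t$ and (2.98) with $K$ and sum both equations. Assuming
further that $\tilde C_t$ is homogeneous of degree one in $S_t$ and $K$ (the BS
price $C_t^{BS}$ fulfills this property as can easily be checked), we find:
\[ \frac{\partial \hat\sigma}{\partial S} = - \frac{K}{S_t} \frac{\partial \hat\sigma}{\partial K} . \tag{2.99} \]

Thus the corrected hedge ratio is given by:
\[ \frac{\partial \tilde C_t}{\partial S} = \frac{\partial C^{BS}}{\partial S} (S_t, t, K, T, \hat\sigma) - \frac{\partial C^{BS}}{\partial \hat\sigma} (S_t, t, K, T, \hat\sigma) \frac{K}{S_t} \frac{\partial \hat\sigma}{\partial K} . \tag{2.100} \]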

This delta-hedge has a direct reference to the IVS and can be implemented
without estimating an underlying stochastic volatility model. When the smile
is negatively skewed, this approach delivers smile dynamics that proxy the
so called sticky-moneyness assumption, see the discussion in Section 3.11.
Moreover, it also gives insights for the results by Renault and Touzi (1996):
when the smile is u-shaped, the IV compensated delta overhedges the OTM
call, since $\partial \hat\sigma / \partial K$ in the second term of the right-hand side in (2.100) has a
positive sign in the OTM regions of calls, Section 2.7.
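
The corrected hedge ratio (2.100) only requires the BS delta, the BS vega and the strike slope of the smile. The sketch below is an illustration under assumptions that are not in the text: the smile is a hypothetical quadratic function of the strike, and its slope is approximated by a central finite difference. All function names are made up for this example.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def bs_d1(S, K, tau, r, delta, sigma):
    return (log(S / K) + (r - delta + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))


def bs_delta(S, K, tau, r, delta, sigma):
    """BS call delta."""
    return exp(-delta * tau) * norm_cdf(bs_d1(S, K, tau, r, delta, sigma))


def bs_vega(S, K, tau, r, delta, sigma):
    """BS vega, the derivative of the call price with respect to sigma."""
    return S * exp(-delta * tau) * norm_pdf(bs_d1(S, K, tau, r, delta, sigma)) * sqrt(tau)


def smile_corrected_delta(S, K, tau, r, delta, smile, rel_step=1e-4):
    """Delta as in (2.100); `smile` maps a strike to its IV (hypothetical input)."""
    sigma_hat = smile(K)
    dK = rel_step * K
    slope = (smile(K + dK) - smile(K - dK)) / (2.0 * dK)  # central difference
    return (bs_delta(S, K, tau, r, delta, sigma_hat)
            - bs_vega(S, K, tau, r, delta, sigma_hat) * (K / S) * slope)


if __name__ == "__main__":
    u_smile = lambda K: 0.2 + 0.5 * (K / 100.0 - 1.0) ** 2  # hypothetical u-shape
    S, tau, r, delta = 100.0, 0.5, 0.03, 0.0
    for K in (90.0, 100.0, 110.0):
        print(K, bs_delta(S, K, tau, r, delta, u_smile(K)),
              smile_corrected_delta(S, K, tau, r, delta, u_smile))
```

For the OTM call ($K = 110$) the smile slope is positive in this example, so the corrected delta is smaller than the IV compensated BS delta, which is the overhedging effect described above.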
For risk management, other difficulties appear, especially when IV com-
pensated hedge ratios are used. When different BS models apply for different
strikes, one may question whether delta and vega risks across different strikes
can simply be added to assess the overall risk in the option book: being a cer-
tain amount of euro delta long in high strike options, and the same amount
delta short in low strike options, need not necessarily imply that the book
is eventually delta-neutral. There may be residual delta risk that has to be
hedged, even on this aggregate level.
Similarly, the vega risk of the portfolio needs to be carefully assessed. In
stress scenarios it is crucial how the IVS is shocked, e.g. whether one shifts
the IVS across strikes and time to maturity in an entirely parallel fashion or
in more sophisticated ways. This is explored with the dimension reduction
techniques developed in Chapter 5, which offer empirical answers to these
questions: typically, the most important shocks are due to almost parallel up
and down shifts of the IVS. A second source of shocks affects the moneyness
slope of the IVS, while a third type influences the moneyness curvature of the
IVS or, depending on the modeling approach, its term structure. This will
be studied in detail in Chapter 5.

2.9.2 Pricing

A next challenge is valuing exotic options. The reason is that even weakly
path-dependent options, such as barrier options, require sophisticated volatil-
ity specifications. Consider, e.g. an ITM knock-out option with strike K and
barrier L > K. In this case, explicit valuation formulae are known when
the underlying follows a geometric Brownian Motion, Musiela and Rutkowski
(1997, Chapter 9). However, which IV should be used for pricing? One could
use the IV at the strike K, the one at the barrier L, or some average of both.
This problem is the more virulent the more sensitive the exotic option is to
volatility.
At this point it becomes clear that, in the presence of the IVS, pricing is
not sensible without a self-consistent and reliable model. One way is taken by
the stochastic volatility models sketched in Section 2.8. Another way, which is
much closer to the concept of the IVS, and hence to the topic of this research,
is offered by the smile consistent local volatility models. These models rely on a
volatility function that is directly backed out of prices of plain vanilla options
observed in the market. Thus, the exotic option is priced consistently with the
entire IVS. This is a natural approach, especially when the exotic option is
to be hedged with plain vanilla options. It will be the topic of Chapter 3.

2.10 IV as predictor of realized volatility

Forecasting volatility is a major topic in economics and finance: whether in
monetary policy making, for investment decisions, in security valuation, or
in risk management, a precise assessment of the market's expectations on
volatility is indispensable. Consequently, forecasting volatility has received high
attention in the past twenty years. One main strand of this literature employs
genuine time series models to produce volatility forecasts, and most of these
studies rely on the class of autoregressive conditional heteroscedasticity mod-
els, which emerged from the initial work of Engle (1982). Another natural
methodology is to exploit option IV as a predictor for future volatility. Here, we
give a short and by no means comprehensive summary of this second stream
of studies. For an excellent survey on this enormous body of literature we refer
to Poon and Granger (2003).
In an efficient market, options instantaneously adjust to new information.
Thus, IV predictions do not depend on the historical price or volatility series
in an adaptive sense, as is typically the case in time series based methods.
While this may be seen as a general advantage of IV based methods, there
are two methodological caveats: first, the test on the forecasting ability of
IV is always a joint test of option market efficiency and the option pricing
model, which can hardly be disentangled. Second, given the presence of the
smile, one either has to restrict the analysis to ATM options or to find an
appropriate weighting scheme of IV across different strikes, see Section 4.5.2
for a discussion of this point in the context of least squares kernel smoothing
of the IVS.
The first to study IV as a predictor for individual stock volatility are Latané
and Rendelman (1976), followed by Chiras and Manaster (1978), Schmalensee
and Trippi (1978), Beckers (1981), and Lamoureux and Lastrapes (1993).
Harvey and Whaley (1992), Christensen and Prabhala (1998), and Shu and
Zhang (2003) among others investigate stock market indices. Foreign exchange
markets and interest rate futures options are examined by e.g. Jorion (1995),
and Amin and Ng (1997), respectively. Most recent research focusses on long
memory and fits fractionally integrated autoregressive models for the volatility
forecast, Andersen et al. (2003) and Pong et al. (2003). This appears to be a
promising line of research.
The overall consensus of the literature regardless of the market and the
forecasting horizon under scrutiny appears to be – for an exception see Canina
and Figlewski (1993) – that IV based predictors do contain a substantial
amount of information on future volatility and outperform purely time
series based methods. At the same time, most authors conclude that IV is a
biased predictor. Deeper theoretical insight into the bias is provided by
Britten-Jones and Neuberger (2000), see Section 3.9, who derive a model-free option
based volatility forecast. They show that if the IVS does not depend on K
and T , but is not necessarily constant, then squared IV is exactly this forecast,
albeit under the risk neutral measure. Hence, the bias may not be due
to model misspecification or measurement errors, but rather due to the way
the market prices volatility risk. Thus, research now proposes the volatility
risk premium as a possible explanation, Lee (2001) and Bakshi and Kapadia
(2003).

2.11 Why do we smile?

Ever since the observation of the smile function, research has aimed at explaining
this striking deviation from the BS constant volatility assumption. This is
achieved by successively relaxing the assumptions of the models. Nowadays,
the literature comprises a large number of factors possibly responsible for smile and term
structure patterns: they range from market microstructure frictions, such as
liquidity constraints and transaction costs, to stochastic volatility and Lévy
processes for the underlying asset price process. Although stochastic volatility
and asset prices driven by Lévy processes may be the best understood and
most prominent explanations, the empirical literature has had little success
in disentangling the different factors: since IV is a free parameter, it comprises
‘expected volatility and everything else that affects option supply and demand
but is not in the model’, Figlewski (1989, p. 13).
It has been conjectured quite early in the literature that stochastic volatil-
ity is responsible for the smile effect, but Renault and Touzi (1996) are the
first to formally prove this suspicion under the assumption of an underlying
Hull and White (1987) model with zero correlation between the two Brownian
motions. They show that stochastic volatility necessarily implies a U-shaped
smile which attains its minimum for ATM options (in the sense of forward
moneyness). A similar conclusion is drawn in stochastic IV models, see Sec-
tion 3.12. The stochastic volatility smile effect is also confirmed in empirical
work: for instance, Härdle and Hafner (2000) demonstrate that GARCH-type
models considerably reduce the pricing error of options compared with the
simple BS model. However, the smile patterns generated by these stochastic
volatility models do not appear to match the empirically observed ones well,
Heynen (1994). This is also confirmed by Jorion (1988) and Bates (1996)
who overall favor jump diffusion models over stochastic volatility. Das and
Sundaram (1999) investigate more deeply the implications of the models con-
cerning the shape of the smile and the term structure of the IVS. According
to them, stochastic volatility smiles are too shallow, while jump diffusions
imply the smile only for short maturity options. Moreover, they prove that
jump diffusions always imply an increasing term structure of IV. However,
empirically, a decreasing or at least humped term structure is also observed,
compare Figure 2.11 for the years 1997 and 2000. Similar results are reported
by Tompkins (2001) for a variety of stochastic and jump models. Summing up,
it appears that only a combination of jump and stochastic volatility models
is sufficiently capable of capturing the stylized facts of the IVS, Bakshi et al.
(1997).
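The U-shape result of Renault and Touzi (1996) quoted above can be illustrated numerically. The following Python sketch assumes, purely for illustration, that the average volatility over the life of the option takes only two values with equal probability and is independent of the asset price, so that the call price is a mixture of BS prices. Inverting the BS formula then gives a smile with its lowest value at the forward ATM strike among the strikes considered. All function names and parameter values are hypothetical and not taken from the studies cited.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, tau, sigma, r=0.0, delta=0.0):
    """BS price of a European call with continuous dividend yield delta."""
    d1 = (log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * exp(-delta * tau) * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


def mixture_call(S, K, tau, sigmas, weights):
    """Call price when the average volatility is drawn independently of S."""
    return sum(w * bs_call(S, K, tau, s) for s, w in zip(sigmas, weights))


def implied_vol(price, S, K, tau, lo=1e-6, hi=5.0):
    """Invert the BS formula by bisection (the price is increasing in sigma)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, tau, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


S, tau = 100.0, 0.5                          # zero rates, so forward = spot
sigmas, weights = (0.10, 0.30), (0.5, 0.5)   # assumed two-point distribution
strikes = [80.0, 90.0, 100.0, 110.0, 125.0]
smile = {K: implied_vol(mixture_call(S, K, tau, sigmas, weights), S, K, tau)
         for K in strikes}
for K in strikes:
    print(f"K = {K:6.1f}   IV = {smile[K]:.4f}")
```

The mechanism is Jensen's inequality: away from the money the BS price is convex in volatility, so the mixture price exceeds the BS price at the mean volatility, whereas at the money it is slightly concave.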
As in the literature on the predictive power of IV, new studies seek the
reasons for the IVS in long memory in volatility of the underlying process,
e.g. Breidt et al. (1998). There is evidence that in particular the upward-
sloping term structure of the IVS can strongly be influenced by long memory
in volatility, Taylor (2000).
Given that the distributional assumption of normal returns behind the
BS model is frequently rejected, see e.g. Ederington and Guan (2002) for an
analysis that is based on delta-hedging an option portfolio, processes whose
marginal distributions have heavier tails than the Gaussian are considered. For
instance, Barndorff-Nielsen (1997) discusses the inverse Gaussian distribution.
This distribution is from the class of generalized hyperbolic distributions,
proposed by Eberlein and Keller (1995), Küchler et al. (1999), and Eberlein
and Prause (2002) for modeling asset price processes that capture the smile
effect. A comprehensive introduction into smile consistent option pricing with
Lévy processes is found in Cont and Tankov (2004).
An increasing literature seeks the reasons for the smiling volatility func-
tions in market imperfections. Jarrow and O’Hara (1989) argue that the dif-
ferences between IV and the historical volatility reflect the transaction costs
of the dynamic hedge portfolio. In approximating transaction costs by the bid-
ask spread, a similar conclusion is reached by Peña et al. (1999). Within an
equilibrium framework, Grossman and Zhou (1996) analyze possible feedback
effects from hedging and market illiquidity. In their set-up, portfolio insurance
can generate volatility skews. Frey and Patie (2002) show that market liquidity
which depends on the asset price level produces smile patterns of the kind
typically observed. This assumption is in line with the experience that large
up or down swings in asset prices lead to a decrease in market liquidity.
In Section 2.5, it was argued that supply and demand conditions
may also contribute to the shape of the smile. In a recent study, Bollen and
Whaley (2003) examine the net buying pressure proxied by the difference of
buyer-motivated and seller-motivated contracts. According to them, net buying pressure
plays an important role for the shape of the IVS in the S&P 500 market. One
is tempted to argue that, in an efficient market, an increased demand
for portfolio insurance should provide incentives to agents to sell options and
to replicate them synthetically. However, the presence of short sale and borrowing
constraints among investors may make the replication strategy more
costly, thereby driving up option prices. Fahlenbrach and Strobl (2002) pro-
vide empirical evidence for this argument.
Finally, an interesting explanation for the index smile has recently been
put forward: it is a well-known fact that stock smile functions are shallow com-
pared with the smile of index options, Bollen and Whaley (2003). The risk
neutral distribution of the index – since it is a (deterministically) weighted
average of single stocks – is completely determined by the risk neutral distribu-
tions of the single stocks. Branger and Schlag (2004) show that the steepness
of the smile is an immediate result of the dependence structure of the single
stocks in the basket. Moreover, a change in this dependence structure, which
has been addressed by Fengler and Schwendner (2004) in the context of pricing
multiasset equity options, can have dramatic consequences for the shape of
the IVS. Indeed the relation of prices, risk neutral distributions and volatility
functions between stock options and basket options is relatively unexplored.
First approaches in this direction are Avellaneda et al. (2002), Bakshi et
al. (2003) and Lee et al. (2003).

2.12 Summary

In this chapter we introduced the phenomenon of the IV smile and the IVS:
the first part was devoted to an introduction to the BS model for the pricing
of contingent claims. Two principles in pricing, the self-financing replication
strategy and the probabilistic approach based on the risk neutral measure,
were presented. We derived the BS formula for plain vanilla European calls
and puts.
The second part treated the concept of IV. We discussed the static prop-
erties of the smile function, such as no-arbitrage bounds on the IV slope and
the asymptotic behavior of the smile function. A discussion of the general
empirical regularities of the IVS observed on equity markets followed. In a
first attempt to explain the smile, we presented two typical approaches for
relaxing the strict assumptions of the BS model: time-dependent and stochas-
tic volatility. In both frameworks, we arrived at an interpretation of IV as an
average of the squared volatility function. Then we discussed the challenges in
hedging and pricing in the presence of a smile. The chapter concluded with
a short summary of the literature that employs IV as a predictor for future
stock price fluctuations, and with a complementary section on other possible
explanations for the existence of the smile phenomenon.
3

Smile consistent volatility models

3.1 Introduction
The existence of the smile requires the development of new pricing models
that capture the static and dynamic distortions of the IVS. One pathway
taken first in Merton (1976) and Hull and White (1987) and the subsequent
related literature is to add another degree of freedom either to the process
of the underlying asset or to the volatility process. This approach has been
sketched in Section 2.8. The advent of highly liquid option markets, on which
large numbers of standardized plain vanilla options are traded at low costs,
has reversed the procedure: an emerging strand of literature of so called smile
consistent volatility models takes the prices of plain vanilla options as given.
The aim is to extract information about the asset price dynamics and the
volatility directly from the observed option prices and the IVS only, which
then is employed to price and hedge other derivative products. The decisive
point is that these other derivatives are priced and hedged relative to the
observed plain vanilla options. The name smile consistent volatility models
is derived from the fact that the (European) options priced in these models
exactly reproduce the IVS observed empirically.
This approach is justified by at least two empirical facts and one practi-
cal consideration: first, option prices and the IVS are readily at hand, if not
directly observed. Second, recent studies demonstrate that a large number of
option price movements cannot be attributed to movements in the underlying
or to market microstructure frictions, Bakshi et al. (2000). This gives the
impression that option markets, owing to their depth and liquidity, are
increasingly governed by their own supply and demand conditions. This seems
to be particularly pronounced at the joint expiry dates of futures contracts and
options, the ‘triple witch days’.
The third and more practical point concerns portfolios of exotic options:
necessarily, positions in these options need to be hedged, and in most cases,
this hedge will be sought by employing plain vanilla options. A particular
strategy is static hedging. Unlike dynamic hedging where the hedge is (almost)
continuously adjusted, in static hedging, the payoff of the exotic option is
replicated by an appropriate portfolio of plain vanilla options. This portfolio
remains unaltered up to expiry, Derman et al. (1995), Carr et al. (1998) and
Andersen et al. (2002). In this case, pricing exotic derivatives correctly relative
to the options that will be used for the hedge is vital for its accuracy.
In achieving these goals, two main lines of models have emerged: first, local
volatility models and their most recent stochastic ramifications, and second,
stochastic implied volatility models. In both approaches, the parameters are
obtained from a calibration of the model to a cross-section of option prices.
Furthermore, both models allow for preference-free derivative valuation (ex-
cept for the stochastic local volatility models), since within each modeling
framework the market is complete. Thus they do not require additional as-
sumptions on the market price of risk.
The concept central to local volatility models is the local volatility surface
(LVS). Unlike the IVS, which is a global measure of volatility, as can be understood
from the averaging concept usually attributed to it, the LVS is a local
measure in the sense that it gives a volatility forecast for a pair of a particular
strike and a particular expiry date (K, T ). In this framework instantaneous
volatility is not necessarily a deterministic function of time and asset prices; it
may well be stochastic. However, in the derivation of the LVS all sources
of risk in the stochastic volatility are integrated out, which leaves the
fluctuations of the asset price as the only risky element, Derman and Kani (1998). By its
local nature, the LVS, in contrast to the IVS, is the correct input parameter
for pricing models. Most recently, a number of studies have tried to circumvent the
static implications of the LVS by moving towards stochastic local volatility
models.
Stochastic IV models explicitly allow for a stochastic setting. However,
the additional state variable is not introduced in the instantaneous volatil-
ity function as in the classical stochastic volatility literature, but tied to a
stochastic IV. Ultimately, of course, this also implies a stochastic instanta-
neous volatility. Since plain vanilla options are still priced via the BS formula
using the contemporaneous realization of the IV, volatility risk is tradable and
the market complete. This allows for a preference-free valuation of contingent
claims.
In this chapter, we aim at giving a comprehensive review of the current
state of the literature on local volatility and stochastic IV models (see also
Skiadopoulos (2001) for an excellent review). First, the notion of local volatility
as pioneered by Dupire (1994) and Derman and Kani (1998) is presented. In
Section 3.3 we relate local volatility to observed option prices. The central
result will be the so-called Dupire formula. An alternative path to the Dupire
formula is given in Section 3.4. Section 3.5 establishes the link between local
volatility and IV. The theoretical part of local volatility is deepened in
Section 3.8, which develops the local volatility as an expected value of
instantaneous volatility under the K-strike and T-maturity forward risk-adjusted
measure. Section 3.9 shows how model-free (implied) volatility forecasts can
be extracted from option data, Britten-Jones and Neuberger (2000). Finally,
a variety of specific models for pricing and extracting local volatility are pre-
sented: the main focus is on implied trees, but we also inspect methods mo-
tivated from a continuous time setting. Stochastic local volatility models are
also considered. The chapter concludes by presenting the stochastic IV model
and its properties in Section 3.12.

3.2 The theory of local volatility

The concept of local volatility (also called forward volatility) was introduced
by Dupire (1994), and further developed in Derman and Kani (1998). Intuitively
one may think about local volatility, denoted by $\sigma_{K,T}$, as the market’s
consensus of instantaneous volatility for a market level K at some future date
T. The ensemble of such estimates for a collection of market levels and future
dates is called the local volatility surface (LVS). Since it is implied from
observed option prices, the LVS gives the fair value of the asset price volatility
for future market levels and times. Note the difference to the concept of
IV which under certain conditions is thought of as the market’s estimate of
expected average volatility over the lifetime of the option, Section 2.8.
To make the concept of local volatility more precise, we reconsider the
continuous-time economy with a trading interval $[0, T^*]$, where $T^* > 0$. Let
$(\Omega, \mathcal{F}, \mathrm{P})$ be a probability space, on which at least one Brownian motion
$(W_t^{(0)})_{0 \le t \le T^*}$, but possibly also more Brownian motions, are defined. As usual,
$\mathrm{P}$ is the objective probability measure and information is revealed by a
filtration $(\mathcal{F}_t)_{0 \le t \le T^*}$. The asset price $(S_t)_{0 \le t \le T^*}$ is modelled by a
$(\mathcal{F}_t)_{0 \le t \le T^*}$-adapted stochastic process, driven by the SDE
$$\frac{dS_t}{S_t} = \mu(S_t, t)\,dt + \sigma(S_t, t, \cdot)\,dW_t^{(0)} , \tag{3.1}$$
where $\mu(\cdot, \cdot)$ denotes the instantaneous drift. We assume that the instantaneous
volatility $(\sigma(S_t, t, \cdot))_{0 \le t \le T^*}$ follows some $(\mathcal{F}_t)_{0 \le t \le T^*}$-adapted stochastic
process possibly depending on $S_t$, the history of $S_t$ or on other state variables.
This arbitrary dependence is what the ‘$\cdot$-notation’ indicates. Finally, we assume
absence of arbitrage, which implies the existence of some risk neutral measure
$\mathrm{Q} \in \mathcal{Q}$ equivalent to $\mathrm{P}$, under which the discounted asset price $(\tilde{S}_t)_{0 \le t \le T^*}$ is a
martingale. If the martingale measure is not unique, we think about Q as the
risk-neutral measure ‘the market has agreed upon’, i.e. some market measure,
see Cont (1999) or Björk (1998, p. 150) for a discussion of this notion. It is
also assumed that the entire spectrum of European plain vanilla call prices
$C_t(K, T)$, which are priced under Q, is given for any strike K and maturity
date T: $G = \{C_t(K, T),\ K \ge 0,\ 0 \le T \le T^*\}$.
The local variance $\sigma^2_{K,T}(S_t, t)$ is defined as the risk-neutral expectation of
squared instantaneous volatility conditional on $S_T = K$, and time $t$ information
$\mathcal{F}_t$:
$$\sigma^2_{K,T}(S_t, t) \stackrel{\mathrm{def}}{=} E^{Q}\{\sigma^2(S_T, T, \cdot) \,|\, S_T = K, \mathcal{F}_t\} , \tag{3.2}$$

where $E^{Q}(\cdot)$ is the expectation operator under the measure Q. Then local
volatility is given by:
$$\sigma_{K,T} \stackrel{\mathrm{def}}{=} \sqrt{\sigma^2_{K,T}} . \tag{3.3}$$

This definition of local volatility has two implications: first, the use of the
market’s view on future volatility expressed by the expectation operator clar-
ifies that all sources of risk from the stochastic volatility are integrated out.
Instead, the evolution of volatility is compressed into a single function that
is deterministic in $S_t$ and $t$. To put it differently, the concept of local
volatility presumes that, as time elapses, the instantaneous volatility will evolve
entirely along today’s market expectations condensed in the local volatility
function. Therefore, within a local volatility framework, for some market level
$K = S_t$ at $T = t$, the instantaneous volatility is:

$$\sigma(S_t, t) = \sigma_{S_t,t}(S_t, t) , \tag{3.4}$$

and the asset price is driven by:

$$\frac{dS_t}{S_t} = \mu(S_t, t)\,dt + \sigma_{S_t,t}(S_t, t)\,dW_t^{(0)} . \tag{3.5}$$
It is precisely this feature which allows the LVS to be used directly in the
generalized BS PDE (2.67) as a (market implied) volatility function to price exotic or
illiquid options, since the market remains complete and derivative valuation
preference-free. In this sense local volatility ensures that these other options
are correctly priced relative to the observed plain vanilla options. This simplic-
ity, however, comes at a cost: whereas (3.1) includes all stochastic volatility
models, (3.5) is a one-factor diffusion with a deterministic (though possibly
very complicated) volatility function. It can be questioned whether this deliv-
ers an adequate description of asset price behavior, Hagan et al. (2002) and
Ayache et al. (2004), and Section 3.11.
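To make the one-factor character of (3.5) concrete, the following Python sketch simulates the asset price with a volatility that is read off a deterministic function of spot and time at every step. It is a minimal illustration under stated assumptions: the drift is taken to be a constant rather than a function $\mu(S_t, t)$, a log-Euler discretization is used, and both volatility functions (`flat_vol` and `skewed_vol`) are hypothetical examples rather than surfaces backed out of market data.

```python
import numpy as np


def simulate_local_vol(S0, mu, local_vol, T, n_steps, n_paths, seed=0):
    """Log-Euler scheme for dS/S = mu dt + local_vol(S, t) dW.

    local_vol is any deterministic function of spot and time; mu is a
    constant here, which simplifies the drift mu(S_t, t) in (3.5).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(S0))
    for i in range(n_steps):
        sig = local_vol(S, i * dt)          # the only input is (S_t, t)
        z = rng.standard_normal(n_paths)    # single Brownian driver
        S = S * np.exp((mu - 0.5 * sig**2) * dt + sig * np.sqrt(dt) * z)
    return S


def flat_vol(S, t):
    """Constant volatility: the BS special case."""
    return np.full_like(S, 0.2)


def skewed_vol(S, t):
    """Hypothetical surface: volatility rises as the spot falls."""
    return 0.2 * (S / 100.0) ** (-0.5)


def skewness(x):
    c = x - x.mean()
    return (c**3).mean() / (c**2).mean() ** 1.5


S_flat = simulate_local_vol(100.0, 0.05, flat_vol, 1.0, 50, 100_000)
S_skew = simulate_local_vol(100.0, 0.05, skewed_vol, 1.0, 50, 100_000)
print("skewness of log returns, flat:  ", skewness(np.log(S_flat / 100.0)))
print("skewness of log returns, skewed:", skewness(np.log(S_skew / 100.0)))
```

Because volatility is a function of the simulated spot alone, no second source of randomness enters. A downward sloping volatility function already produces negatively skewed log returns, which is how such a model can reproduce a skew with one Brownian motion.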
At this point it is useful to recall the similarity of local volatility and
the forward rate. The key insight by Derman and Kani (1998) is that local
volatilities are constructed in a way analogous to forward rates in the theory
of interest rates. They play the same role: in the same way as bond prices that
are computed from the forward rate curve match their market prices, so do
option prices calculated from the local volatility function. Just as the former is
extracted from bond prices and then employed to correctly price derivatives
or other bonds, so is the latter. Using forward rates does not imply the belief
that they are the right predictors for interest rates. The same applies to local
volatility with respect to future instantaneous volatility. However, despite this
fact, forward rates are the relevant quantities for a bond trader in this context.
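Second, as a minor and obvious implication, if instantaneous volatility
is deterministic in spot and time, i.e. $\sigma(S_t, t, \cdot) \stackrel{\mathrm{def}}{=} \sigma(S_t, t)$, both concepts,
instantaneous and local variance, coincide:
$$\begin{aligned}
\sigma^2_{K,T}(S_t, t) &\stackrel{\mathrm{def}}{=} E^{Q}\{\sigma^2(S_T, T, \cdot) \,|\, S_T = K, \mathcal{F}_t\} \\
&= E^{Q}\{\sigma^2(S_T, T) \,|\, S_T = K, \mathcal{F}_t\} = \sigma^2(K, T) .
\end{aligned} \tag{3.6}$$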

In this case, instantaneous volatility evolves along the static local volatility
function, since the right-hand side is independent of S and t.
Derman and Kani (1998) further characterize the local variance by showing
that it can be represented as the risk-adjusted expectation of the future
instantaneous variance at time T:
$$\sigma^2_{K,T}(S_t, t) = E^{(K,T)}\{\sigma^2(S_T, T, \cdot) \,|\, \mathcal{F}_t\} , \tag{3.7}$$

where the expectation is now taken with respect to a new measure, which
is called the K-strike and T -maturity forward risk-adjusted measure. This is
again in analogy with the theory of forward rates: here, the forward rate is
obtained by taking the expectation of the short rate under the T -maturity
forward measure, Jamshidian (1993). The derivation of (3.7) will be delayed
until Section 3.8.
Clearly, for pricing, the assumption that the only source of risk is the
asset price may be considered as a drawback. It may be good for markets in
which asset prices and volatility are strongly correlated, as is commonly seen
in equity markets, but can be questioned for foreign exchange markets. The
dynamic hedging performance of the deterministic local volatility models has been
criticized, Hagan et al. (2002). Furthermore, they do not provide a genuine
explanation for the smile phenomenon, but rather overstretch the ordinary
BS world, Ayache et al. (2004). This, however, does not appear to diminish
their significance in pricing exotic derivatives in practice. In order to meet
this criticism and to improve the hedging performance, recent work aims at
relaxing the deterministic framework and moves towards a stochastic theory
of local volatility.

3.3 Backing the LVS out of observed option prices

As forward rates are intricately linked to observed bond prices, so is local
volatility to observed option prices. Here, we show how the local volatility
function is recovered from the set of European call option prices G. The
exposition follows Derman and Kani (1998).
Under the equivalent martingale measure asset prices follow the SDE:
$$\frac{dS_t}{S_t} = (r - \delta)\,dt + \sigma(S_t, t, \cdot)\,d\overline{W}_t^{(0)} , \tag{3.8}$$
where $\overline{W}_t^{(0)}$ denotes the Brownian motion, which drives the asset price, under
the risk neutral measure Q. The interest rate and the continuously compounded
dividend yield are denoted by $r$ and $\delta$, respectively.
By the martingale property, the calls are priced by

$$C_t(K, T) = e^{-r\tau} E^{Q}\{(S_T - K)^+ \,|\, \mathcal{F}_t\} , \tag{3.9}$$

where $\tau \stackrel{\mathrm{def}}{=} T - t$. Taking the left side first order derivative with respect to $K$
yields
$$D^- C_t(K, T) = -e^{-r\tau} E^{Q}\{\mathbf{1}(S_T > K) \,|\, \mathcal{F}_t\} . \tag{3.10}$$

Differentiating again we recover

$$\frac{\partial^2 C_t(K, T)}{\partial K^2} = e^{-r\tau} E^{Q}\{\delta_K(S_T) \,|\, \mathcal{F}_t\} , \tag{3.11}$$

where $\delta_{x_0}(\cdot)$ denotes the Dirac delta function, which is defined by the property
$\int f(x)\,\delta_{x_0}(x)\,dx = f(x_0)$ for a smooth function $f$. The derivative
$\frac{\partial^2}{\partial K^2}(S_T - K)^+ = \delta_K(S_T)$ is defined in a distributional sense, see Hörmander (1990) for a
proper mathematical formulation of distributions. Note that Equation (3.11)
shows yet another derivation of the state price density, Section 2.4.
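Equation (3.11) can be checked numerically. The Python sketch below approximates $e^{r\tau}\,\partial^2 C_t/\partial K^2$ by a central second difference of BS call prices and compares it with the log-normal risk neutral density of $S_T$ implied by the BS model. The parameter values and helper names are illustrative assumptions; with market data, the call price function would be replaced by an interpolated or smoothed price curve.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, tau, sigma, r, delta):
    """BS price of a European call with continuous dividend yield delta."""
    d1 = (log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * exp(-delta * tau) * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


def density_from_calls(call, K, tau, r, h=0.1):
    """e^{r tau} times a central second difference of the call price in K."""
    second_diff = (call(K + h) - 2.0 * call(K) + call(K - h)) / h**2
    return exp(r * tau) * second_diff


def lognormal_density(K, S, tau, sigma, r, delta):
    """Risk neutral density of S_T at K in the BS model."""
    m = log(S) + (r - delta - 0.5 * sigma**2) * tau
    v = sigma**2 * tau
    return exp(-(log(K) - m) ** 2 / (2.0 * v)) / (K * sqrt(2.0 * pi * v))


S, tau, sigma, r, delta = 100.0, 1.0, 0.2, 0.03, 0.01   # assumed values
call = lambda K: bs_call(S, K, tau, sigma, r, delta)
for K in (80.0, 100.0, 120.0):
    print(K, density_from_calls(call, K, tau, r),
          lognormal_density(K, S, tau, sigma, r, delta))
```

The second difference is the price of a butterfly spread scaled by the squared strike spacing, which is why a negative value would signal an arbitrage opportunity.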
In a second step we take the derivative of (3.9) with respect to $T$:

$$\frac{\partial C_t(K, T)}{\partial T} = -r\,C_t(K, T) + e^{-r\tau}\,\frac{\partial}{\partial T} E^{Q}\{(S_T - K)^+ \,|\, \mathcal{F}_t\} . \tag{3.12}$$

To evaluate the right-hand side of (3.12) we apply a generalization of the
Itô formula to the convex function $(S_T - K)^+$, called the Tanaka-Meyer formula,
Appendix (B.14). This yields
$$d(S_T - K)^+ = \mathbf{1}(S_T > K)\,dS_T + \frac{1}{2} S_T^2\,\sigma^2(S_T, T, \cdot)\,\delta_K(S_T)\,dT . \tag{3.13}$$

Taking expectations in (3.13) together with the asset price dynamics (3.8)
yields:

$$\begin{aligned}
dE^{Q}\{(S_T - K)^+ \,|\, \mathcal{F}_t\} ={}& (r - \delta)\,E^{Q}\{S_T\,\mathbf{1}(S_T > K) \,|\, \mathcal{F}_t\}\,dT \\
&+ E^{Q}\Big\{\tfrac{1}{2} S_T^2\,\sigma^2(S_T, T, \cdot)\,\delta_K(S_T) \,\Big|\, \mathcal{F}_t\Big\}\,dT .
\end{aligned} \tag{3.14}$$

The first term in the previous equation can be split into

$$E^{Q}\{S_T\,\mathbf{1}(S_T > K)\} = E^{Q}\{(S_T - K)^+\} + K\,E^{Q}\{\mathbf{1}(S_T > K)\} . \tag{3.15}$$

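Plugging this into (3.14) and using (3.9) and (3.10), one obtains:
$$\begin{aligned}
\frac{\partial}{\partial T} E^{Q}\{(S_T - K)^+ \,|\, \mathcal{F}_t\} ={}& e^{r\tau}(r - \delta)\left\{C_t(K, T) - K\,\frac{\partial C_t(K, T)}{\partial K}\right\} \\
&+ \frac{1}{2} K^2\,E^{Q}\{\sigma^2(S_T, T, \cdot)\,\delta_K(S_T) \,|\, \mathcal{F}_t\} .
\end{aligned} \tag{3.16}$$

By the law of iterated expectations the last term in (3.16) can be rewritten
as:

$$\begin{aligned}
E^{Q}\{\sigma^2(S_T, T, \cdot)\,\delta_K(S_T) \,|\, \mathcal{F}_t\} &= E^{Q}\Big[E^{Q}\{\sigma^2(S_T, T, \cdot)\,\delta_K(S_T) \,|\, S_T = K, \mathcal{F}_t\} \,\Big|\, \mathcal{F}_t\Big] \\
&= E^{Q}\{\sigma^2(S_T, T, \cdot) \,|\, S_T = K, \mathcal{F}_t\}\,E^{Q}\{\delta_K(S_T) \,|\, \mathcal{F}_t\} .
\end{aligned} \tag{3.17}$$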

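Thus, inserting (3.16) and (3.17) into (3.12), and using (3.11), we find that

$$\begin{aligned}
\frac{\partial C_t(K, T)}{\partial T} ={}& -\delta\,C_t(K, T) - (r - \delta)K\,\frac{\partial C_t(K, T)}{\partial K} \\
&+ \frac{1}{2} K^2\,\frac{\partial^2 C_t(K, T)}{\partial K^2}\,E^{Q}\{\sigma^2(S_T, T, \cdot) \,|\, S_T = K, \mathcal{F}_t\} .
\end{aligned} \tag{3.18}$$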

Solving for the volatility function $E^{Q}\{\sigma^2(S_T, T, \cdot) \,|\, S_T = K, \mathcal{F}_t\}$ yields:

$$\sigma^2_{K,T}(S_t, t) = 2\,\frac{\dfrac{\partial C_t(K, T)}{\partial T} + \delta\,C_t(K, T) + (r - \delta)K\,\dfrac{\partial C_t(K, T)}{\partial K}}{K^2\,\dfrac{\partial^2 C_t(K, T)}{\partial K^2}} , \tag{3.19}$$

where $\sigma^2_{K,T}(S_t, t) \stackrel{\mathrm{def}}{=} E^{Q}\{\sigma^2(S_T, T, \cdot) \,|\, S_T = K, \mathcal{F}_t\}$. This is the Dupire
formula, Dupire (1994). The Dupire formula gives a representation of the
local volatility function completely in terms of observed call prices and their
derivatives.
It remains to show that local volatility $\sigma_{K,T}(S_t, t) \stackrel{\mathrm{def}}{=} \sqrt{\sigma^2_{K,T}(S_t, t)}$ is
indeed a real number. This can be seen by the following observations: the
denominator of (3.19) is positive by no-arbitrage, since the transition probability
must be positive on the entire support. Positivity of the numerator
is obtained by portfolio dominance arguments similar to those in Merton
(1973), see Andersen and Brotherton-Ratcliffe (1997). We have:

$$e^{\delta\varepsilon}\,C_t(K e^{(r - \delta)\varepsilon}, T + \varepsilon) \ge C_t(K, T) \tag{3.20}$$

for $\varepsilon > 0$. A Taylor series expansion of order one in the neighborhood of $\varepsilon = 0$
yields:

$$\frac{\partial C_t(K, T)}{\partial T} + \delta\,C_t(K, T) + (r - \delta)K\,\frac{\partial C_t(K, T)}{\partial K} \ge 0 . \tag{3.21}$$

Thus it is verified that local volatility $\sqrt{\sigma^2_{K,T}(S_t, t)}$ is indeed a real number.
The result in (3.19) holds irrespective of the assumptions made on the
process $(\sigma(S_t, t, \cdot))_{0 \le t \le T^*}$. If instantaneous volatility is assumed to be
deterministic, as Dupire (1994) did in his original work, the expectation operator
can be dropped. In this case, it can be shown that the diffusion is completely
characterized by the (risk neutral) transition probability, Section 3.4.
It is interesting to interpret (3.19) in terms of trading strategies: the nu-
merator is related to an infinitesimal calendar spread, while the denominator
contains the position of an infinitesimal butterfly spread. Thus, from a trad-
ing perspective local volatility is linked to the ratio of both types of spreads.
These considerations imply that local volatility can be locked in by appropriate
trading strategies, just as one can lock in forward rates by trading bonds. This is
discussed in Derman et al. (1997).
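The spread interpretation of the Dupire formula (3.19) can be sketched in a few lines of Python. The numerator is built from a finite calendar spread and a finite call spread, the denominator from a finite butterfly spread. As a sanity check, the call prices are generated from the BS formula with constant volatility, in which case the recovered local volatility should equal that constant. The function names, step sizes and parameter values are assumptions made for this illustration, and no claim is made that this is how the cited authors implement the formula.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(S, K, tau, sigma, r, delta):
    """BS price of a European call with continuous dividend yield delta."""
    d1 = (log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * exp(-delta * tau) * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)


def dupire_local_vol(call, K, T, r, delta, dK=0.1, dT=1e-3):
    """Local volatility from a call price function call(K, T) via (3.19)."""
    c = call(K, T)
    calendar = (call(K, T + dT) - call(K, T - dT)) / (2.0 * dT)     # dC/dT
    slope = (call(K + dK, T) - call(K - dK, T)) / (2.0 * dK)        # dC/dK
    butterfly = (call(K + dK, T) - 2.0 * c + call(K - dK, T)) / dK**2
    numerator = calendar + delta * c + (r - delta) * K * slope
    if butterfly <= 0.0 or numerator < 0.0:
        raise ValueError("call prices violate the no-arbitrage conditions")
    return sqrt(2.0 * numerator / (K**2 * butterfly))


S, r, delta, sigma = 100.0, 0.03, 0.01, 0.25   # assumed values, t = 0
call = lambda K, T: bs_call(S, K, T, sigma, r, delta)
for K in (80.0, 100.0, 120.0):
    print(K, dupire_local_vol(call, K, 1.0, r, delta))
```

With observed data, the finite differences would be taken on a smoothed price surface, since the ratio is sensitive to noise in the butterfly spread in the denominator.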

3.4 The dual PDE approach to local volatility

There is a remarkable second approach for deriving the Dupire formula (3.19),
Dupire (1994). This approach directly builds on the transition probability.
While in general it is not possible to recover the dynamics of the asset price
process from the transition probability, there is one exception: if one considers
one-factor diffusions only, i.e. if one initially assumes instantaneous volatility
to be a deterministic function in the asset price and time. The reason is that
there exists a dual or adjoint PDE to the BS PDE (2.13) which has, instead
of S and t, K and T as independent variables.
Assume now that under the risk neutral measure Q the asset price dynamics
are given by:
$$\frac{dS_t}{S_t} = (r - \delta)\,dt + \sigma(S_t, t)\,d\overline{W}_t^{(0)} , \tag{3.22}$$
where the notation stays as before except that $\sigma(S_t, t)$ is deterministic.
It is well known that the risk neutral transition probability
$\phi(K, T | S_t, t) \stackrel{\mathrm{def}}{=} e^{r\tau}\,\partial^2 C_t(K, T)/\partial K^2$, introduced in (2.38), satisfies the BS PDE (2.13) with
terminal condition:
$$\phi(K, T | S_T, T) = \delta_K(S_T) . \tag{3.23}$$
However, it also satisfies the Fokker-Planck or forward Kolmogorov PDE,
see appendix (B.24). This yields:

$$\frac{\partial \phi(K', T | S_t, t)}{\partial T} = \frac{1}{2}\,\frac{\partial^2}{\partial (K')^2}\Big\{\sigma^2(K', T)\,(K')^2\,\phi(K', T | S_t, t)\Big\} - \frac{\partial}{\partial K'}\Big\{(r - \delta)\,K'\,\phi(K', T | S_t, t)\Big\} \tag{3.24}$$

for fixed $S_t$ and $t$, over all maturities $T$ and strikes $K'$ with initial condition:

$$\phi(K', t | S_t, t) = \delta_{S_t}(K') . \tag{3.25}$$

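To derive the Dupire formula one substitutes $e^{r\tau}\,\partial^2 C_t(K', T)/\partial (K')^2$ for
$\phi(K', T | S_t, t)$. Evaluating the first term in (3.24) yields:

$$\begin{aligned}
\frac{\partial \phi(K', T | S_t, t)}{\partial T} &= \frac{\partial}{\partial T}\left\{e^{r\tau}\,\frac{\partial^2 C_t(K', T)}{\partial (K')^2}\right\} \\
&= r e^{r\tau}\,\frac{\partial^2 C_t(K', T)}{\partial (K')^2} + e^{r\tau}\,\frac{\partial^2}{\partial (K')^2}\,\frac{\partial C_t(K', T)}{\partial T} .
\end{aligned} \tag{3.26}$$

Next, we find for the last term on the right-hand side of (3.24):

$$\frac{\partial}{\partial K'}\Big\{(r - \delta)\,K'\,\phi(K', T | S_t, t)\Big\} = (r - \delta)\,e^{r\tau}\,\frac{\partial}{\partial K'}\left\{K'\,\frac{\partial^2 C_t}{\partial (K')^2}\right\} . \tag{3.27}$$

Thus, (3.24) results in

$$\begin{aligned}
r\,\frac{\partial^2 C_t(K', T)}{\partial (K')^2} + \frac{\partial^2}{\partial (K')^2}\,\frac{\partial C_t(K', T)}{\partial T} ={}& \frac{1}{2}\,\frac{\partial^2}{\partial (K')^2}\left\{\sigma^2(K', T)\,(K')^2\,\frac{\partial^2 C_t}{\partial (K')^2}\right\} \\
&- (r - \delta)\,\frac{\partial}{\partial K'}\left\{K'\,\frac{\partial^2 C_t}{\partial (K')^2}\right\} .
\end{aligned} \tag{3.28}$$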

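Integrating (3.28) twice from $K$ to infinity yields:

$$\begin{aligned}
r\,C_t(K, T) + \frac{\partial C_t(K, T)}{\partial T} - \frac{1}{2} K^2 \sigma^2(K, T)\,\frac{\partial^2 C_t(K, T)}{\partial K^2}& \\
{}+ (r - \delta)K\,\frac{\partial C_t(K, T)}{\partial K} - (r - \delta)\,C_t(K, T) &= 0 ,
\end{aligned} \tag{3.29}$$
under the following assumptions: given that the payoff function of a call is
$\psi = (S - K)^+$, the call price and its first and second order derivatives as
functions of the strike must tend to zero as $K$ tends to infinity. More precisely,
we require that

$$C_t(K, T),\quad K\,\frac{\partial C_t}{\partial K},\quad K^2\,\frac{\partial^2 C_t}{\partial K^2},\quad K^2\,\frac{\partial^3 C_t}{\partial K^3} \;\to\; 0 \quad \text{as } K \to \infty . \tag{3.30}$$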
Note that (3.30) has implications for the tail behavior of the (risk-neutral) transition density $\phi(K,T|S_t,t)$, which must be $O(K^{-2})$. With regard to the BS pricing function, it is evident that the assumptions (3.30) hold given the exponential decay of the (log-normal) transition density, see Equation (2.40). From (3.29) the Dupire formula (3.19) is readily obtained by solving for $\sigma^2(K,T)$. The final arguments are the same as given in Section 3.3 following Equation (3.19). Uniqueness is proved in Derman and Kani (1994b).
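
As a numerical plausibility check of (3.29), the following minimal Python sketch solves (3.29) for $\sigma^2(K,T)$ and evaluates the result on BS call prices by central finite differences. The helper names (`bs_call`, `dupire_local_var`), the step sizes and the parameter values are illustrative assumptions and not part of the original text; valuation time is taken as $t = 0$, so that $\tau = T$.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, tau, sigma, r, delta):
    """BS call price with continuous dividend yield delta."""
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / sq
    return (S * math.exp(-delta * tau) * norm_cdf(d1)
            - K * math.exp(-r * tau) * norm_cdf(d1 - sq))

def dupire_local_var(call, K, T, r, delta, hK=1e-2, hT=1e-4):
    """Solve (3.29) for sigma^2(K, T); call(K, T) returns the call price.
    All derivatives are approximated by central finite differences."""
    C = call(K, T)
    C_T = (call(K, T + hT) - call(K, T - hT)) / (2 * hT)
    C_K = (call(K + hK, T) - call(K - hK, T)) / (2 * hK)
    C_KK = (call(K + hK, T) - 2 * C + call(K - hK, T)) / hK**2
    return 2.0 * (C_T + delta * C + (r - delta) * K * C_K) / (K**2 * C_KK)

# Assumed example values: a BS world with constant volatility 0.2.
S0, r, delta, sigma = 100.0, 0.03, 0.01, 0.2
call = lambda K, T: bs_call(S0, K, T, sigma, r, delta)
local_vol = math.sqrt(dupire_local_var(call, K=105.0, T=0.5, r=r, delta=delta))
```

In this constant-volatility setting the recovered `local_vol` should agree with the input volatility up to discretization error, which is what one expects from (3.29) when the BS PDE holds.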

3.5 From the IVS to the LVS

An open question up to now is how the IVS and the LVS can be linked. This
would be desirable from two points of view: first, in a static situation, one
could immediately recover the LVS, which in principle is unobservable, from
the easily observable IVS. Second, in a dynamic context, it adds additional
value to the dynamical description of the IVS, for instance in terms of the
semiparametric factor model, Chapter 5: given a low-dimensional description
of the IVS dynamics, a representation of the Dupire formula in terms of IV
could be exploited to yield the corresponding LVS dynamics. This may help
improve the hedging performance of local volatility models. Another obvious
application could be stress tests for portfolios of exotic options. Here, one could
simulate the IVS within the semiparametric factor model. IVS scenarios are
then converted into LVS scenarios. The latter are the basis for correctly pricing
the exotic options in the portfolio and computing a value at risk measure.
The central idea to obtain such an IV counterpart of the Dupire formula is
to exploit the BS formula as an analytical vehicle, Andersen and Brotherton-
Ratcliffe (1997) and Dempster and Richards (2000). More precisely, we insert
the BS formula and its derivatives into the Dupire formula (3.19). In doing so,
the BS formula is interpreted as if IV depended on K and T as one empirically
observes on the markets, i.e. we assume:
$$C^{BS}(S_t,t,K,T,\sigma,r,\delta) = C^{BS}\big(S_t,t,K,T,\widehat{\sigma}(K,T),r,\delta\big) . \tag{3.31}$$

Furthermore, we maintain our assumption that local volatility is a deterministic function.
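Applying the chain rule of differentiation, we obtain for the numerator of the Dupire formula, suppressing the dependence of $\widehat{\sigma}$ on $K$ and $T$:

$$2\left\{ \frac{\partial C_t^{BS}}{\partial T} + \frac{\partial C_t^{BS}}{\partial \widehat{\sigma}}\frac{\partial \widehat{\sigma}}{\partial T} + \delta\,C_t^{BS} + (r-\delta)\,K\left( \frac{\partial C_t^{BS}}{\partial K} + \frac{\partial C_t^{BS}}{\partial \widehat{\sigma}}\frac{\partial \widehat{\sigma}}{\partial K} \right) \right\} . \tag{3.32}$$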
Now, the analytical expressions for the BS formula (2.23) and its K- and
T -derivatives in (2.33) and in (2.36) are inserted. Most of the terms cancel
out. The strategy in the further derivation is to express the remaining terms
using the volatility derivative, the vega (2.30). This yields:
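
$$2\,\frac{\partial C_t^{BS}}{\partial \widehat{\sigma}}\left\{ \frac{\widehat{\sigma}}{2\tau} + \frac{\partial \widehat{\sigma}}{\partial T} + (r-\delta)\,K\,\frac{\partial \widehat{\sigma}}{\partial K} \right\} . \tag{3.33}$$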

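In differentiating the denominator, we get:

$$K^2\left\{ \frac{\partial^2 C_t^{BS}}{\partial K^2} + 2\,\frac{\partial^2 C_t^{BS}}{\partial K\,\partial \widehat{\sigma}}\frac{\partial \widehat{\sigma}}{\partial K} + \frac{\partial^2 C_t^{BS}}{\partial \widehat{\sigma}^2}\left(\frac{\partial \widehat{\sigma}}{\partial K}\right)^2 + \frac{\partial C_t^{BS}}{\partial \widehat{\sigma}}\frac{\partial^2 \widehat{\sigma}}{\partial K^2} \right\} . \tag{3.34}$$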

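Once again one substitutes the analytical BS derivatives and introduces into each term the BS vega. This results in

$$K^2\,\frac{\partial C_t^{BS}}{\partial \widehat{\sigma}}\left\{ \frac{1}{K^2\widehat{\sigma}\tau} + \frac{2d_1}{K\widehat{\sigma}\sqrt{\tau}}\frac{\partial \widehat{\sigma}}{\partial K} + \frac{d_1 d_2}{\widehat{\sigma}}\left(\frac{\partial \widehat{\sigma}}{\partial K}\right)^2 + \frac{\partial^2 \widehat{\sigma}}{\partial K^2} \right\} . \tag{3.35}$$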

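Finally, collecting the numerator (3.33) and the denominator (3.35) shows:

$$\sigma^2_{K,T}(S_t,t) = \frac{ \dfrac{\widehat{\sigma}}{\tau} + 2\,\dfrac{\partial \widehat{\sigma}}{\partial T} + 2K(r-\delta)\,\dfrac{\partial \widehat{\sigma}}{\partial K} }{ K^2\left\{ \dfrac{1}{K^2\widehat{\sigma}\tau} + \dfrac{2d_1}{K\widehat{\sigma}\sqrt{\tau}}\dfrac{\partial \widehat{\sigma}}{\partial K} + \dfrac{d_1 d_2}{\widehat{\sigma}}\left(\dfrac{\partial \widehat{\sigma}}{\partial K}\right)^2 + \dfrac{\partial^2 \widehat{\sigma}}{\partial K^2} \right\} } . \tag{3.36}$$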

This is the Dupire formula in terms of the IVS and its derivatives.
Obviously, this approach does not provide a theory unifying both concepts.
This requires more careful treatment, and – up to now – has only been achieved
in certain asymptotic situations, Berestycki et al. (2002) and Section 3.6.
Rather, it is an ad hoc, but successful procedure to link the unobservable
LVS with the IVS. Given (3.36) and (3.39), one estimates the IVS and plugs
it into (3.36), which yields an estimate of the LVS. The LVS is then used
as input factor in pricing algorithms, e.g. in finite difference schemes that
solve the generalized BS PDE, Andersen and Brotherton-Ratcliffe (1997) and
Randall and Tavella (2000).
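
The plug-in step can be sketched in a few lines of Python. The sketch below evaluates the right-hand side of (3.36) for a given IVS function and, as a cross-check, compares it with the Dupire formula applied to call prices that are generated from the same IVS via the BS formula, as in (3.31). The function names, the hypothetical skewed surface `skewed_iv`, the finite-difference step sizes and all parameter values are assumptions made for illustration; valuation time is $t = 0$, so $\tau = T$.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, tau, sigma, r, delta):
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / sq
    return (S * math.exp(-delta * tau) * norm_cdf(d1)
            - K * math.exp(-r * tau) * norm_cdf(d1 - sq))

def local_var_from_iv(iv, S, K, T, r, delta, hK=1e-2, hT=1e-4):
    """Right-hand side of (3.36) with t = 0; iv(K, T) is the IVS."""
    s = iv(K, T)
    s_T = (iv(K, T + hT) - iv(K, T - hT)) / (2 * hT)
    s_K = (iv(K + hK, T) - iv(K - hK, T)) / (2 * hK)
    s_KK = (iv(K + hK, T) - 2 * s + iv(K - hK, T)) / hK**2
    sq = s * math.sqrt(T)
    d1 = (math.log(S / K) + (r - delta + 0.5 * s**2) * T) / sq
    d2 = d1 - sq
    num = s / T + 2 * s_T + 2 * K * (r - delta) * s_K
    den = K**2 * (1 / (K**2 * s * T) + 2 * d1 / (K * sq) * s_K
                  + d1 * d2 / s * s_K**2 + s_KK)
    return num / den

def local_var_from_prices(iv, S, K, T, r, delta, hK=1e-2, hT=1e-4):
    """Dupire formula in call prices; prices are BS prices at the IV iv(K, T)."""
    call = lambda k, t: bs_call(S, k, t, iv(k, t), r, delta)
    C = call(K, T)
    C_T = (call(K, T + hT) - call(K, T - hT)) / (2 * hT)
    C_K = (call(K + hK, T) - call(K - hK, T)) / (2 * hK)
    C_KK = (call(K + hK, T) - 2 * C + call(K - hK, T)) / hK**2
    return 2 * (C_T + delta * C + (r - delta) * K * C_K) / (K**2 * C_KK)

# Hypothetical smooth IVS with a skew in strike and a mild term structure.
S0, r, delta = 100.0, 0.03, 0.01
skewed_iv = lambda K, T: 0.25 - 0.1 * math.log(K / S0) + 0.02 * T
lv_iv = math.sqrt(local_var_from_iv(skewed_iv, S0, 95.0, 0.75, r, delta))
lv_px = math.sqrt(local_var_from_prices(skewed_iv, S0, 95.0, 0.75, r, delta))
flat = math.sqrt(local_var_from_iv(lambda K, T: 0.25, S0, 95.0, 0.75, r, delta))
```

Both routes should give the same local volatility up to discretization error, and a flat IVS should return the IV itself. In practice the derivatives of the IVS would come from a smoother rather than from finite differences of a closed-form surface.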
For a deeper understanding of formula (3.36), it is instructive to inspect the situation of no strike-dependence in the IVS. In this case all derivatives with respect to $K$ vanish and (3.36) reduces to

$$\sigma_T^2(t) = \widehat{\sigma}^2 + 2\tau\,\widehat{\sigma}\,\frac{\partial \widehat{\sigma}}{\partial T} , \tag{3.37}$$

which implies:

$$\widehat{\sigma}^2 = \frac{1}{\tau}\int_t^T \sigma_T^2(u)\,du . \tag{3.38}$$

[Figure 3.1: two surface plots, "IV ticks and IVS: 20000502" (SCMivs.xpl) and "LVS: 20000502" (SCMlvs.xpl).]

Fig. 3.1. Top panel: DAX option IVS on 20000502. IV observations are displayed as black dots; the surface estimate is obtained from a local quadratic estimator with localized bandwidths. Bottom panel: LVS on 20000502; obtained from the IVS given in the top panel via the moneyness representation of the Dupire formula (3.39).

[Figure 3.2: two panels, "IV smile vs LV smile, 1 month" and "IV smile vs LV smile, 3 months"; horizontal axis: forward moneyness (SCMlvs.xpl).]

Fig. 3.2. DAX option implied (squares) versus local (circles) volatility smiles for one month and three months to expiry respectively on 20000502 taken as slices from Figure 3.1.


Thus, this situation specializes to our previous interpretation of squared IV as average squared (local) volatility through the lifetime of an option, Section 2.8. It demonstrates that IV is a global measure of volatility, while the LVS is a local measure of volatility giving a volatility forecast for a particular pair $(K,T)$.
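For a graphical illustration of the LVS, we derive another version of (3.36) in terms of the forward moneyness measure $\kappa_f \stackrel{\mathrm{def}}{=} K/F_t = K/\{e^{(r-\delta)\tau} S_t\}$ and time to maturity $\tau$. After some manipulations, the LVS is given by

$$\sigma^2_{\kappa_f,\tau}(S_t,t) = \frac{ \widehat{\sigma}^2 + 2\,\widehat{\sigma}\,\tau\,\dfrac{\partial \widehat{\sigma}}{\partial \tau} }{ 1 + 2\kappa_f\sqrt{\tau}\,d_1\,\dfrac{\partial \widehat{\sigma}}{\partial \kappa_f} + d_1 d_2\,\kappa_f^2\,\tau\left(\dfrac{\partial \widehat{\sigma}}{\partial \kappa_f}\right)^2 + \widehat{\sigma}\,\tau\,\kappa_f^2\,\dfrac{\partial^2 \widehat{\sigma}}{\partial \kappa_f^2} } , \tag{3.39}$$

where $d_1$ and $d_2$ are interpreted as $d_1 = -\ln(\kappa_f)/(\widehat{\sigma}\sqrt{\tau}) + 0.5\,\widehat{\sigma}\sqrt{\tau}$ and $d_2 = d_1 - \widehat{\sigma}\sqrt{\tau}$.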
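
To make the moneyness representation (3.39) concrete, the following Python sketch evaluates it for an IVS given as a function of forward moneyness and time to maturity. The function name `local_var_moneyness`, the hypothetical linear skew and the step sizes are assumptions for illustration only; derivatives are approximated by central finite differences rather than by the local polynomial estimates used for the figures.

```python
import math

def local_var_moneyness(iv, kappa, tau, h_k=1e-3, h_t=1e-5):
    """Right-hand side of (3.39); iv(kappa, tau) is the IVS in forward moneyness."""
    s = iv(kappa, tau)
    s_tau = (iv(kappa, tau + h_t) - iv(kappa, tau - h_t)) / (2 * h_t)
    s_k = (iv(kappa + h_k, tau) - iv(kappa - h_k, tau)) / (2 * h_k)
    s_kk = (iv(kappa + h_k, tau) - 2 * s + iv(kappa - h_k, tau)) / h_k**2
    sq = math.sqrt(tau)
    d1 = -math.log(kappa) / (s * sq) + 0.5 * s * sq
    d2 = d1 - s * sq
    num = s**2 + 2 * s * tau * s_tau
    den = (1 + 2 * kappa * sq * d1 * s_k
           + d1 * d2 * kappa**2 * tau * s_k**2
           + s * tau * kappa**2 * s_kk)
    return num / den

# Hypothetical linear skew in forward moneyness at a short time to maturity.
iv = lambda kappa, tau: 0.25 - 0.3 * (kappa - 1.0)
tau, dk = 0.02, 0.01
local_vol = lambda kappa: math.sqrt(local_var_moneyness(iv, kappa, tau))

# ATM slopes of the implied and the local smile by central differences.
iv_slope = (iv(1 + dk, tau) - iv(1 - dk, tau)) / (2 * dk)
lv_slope = (local_vol(1 + dk) - local_vol(1 - dk)) / (2 * dk)
slope_ratio = lv_slope / iv_slope
```

For this short maturity the ratio of the local to the implied ATM slope comes out close to two, and a flat IVS returns the squared IV unchanged.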
In Figure 3.1, we present an estimate of the LVS based on the moneyness
representation of the Dupire formula. The derivatives of the IVS are estimated
as derivatives of local polynomials of order two which are used to smooth the
IVS, see Section 4.3 for a description of this procedure. Due to the different
scales, the LVS appears to be flatter than the IVS at first glance. As we show
in Figure 3.2, which displays slices from both functions at the maturity of one
and three months, this impression is erroneous: it is the LVS which is steeper
than the IVS (leaving out the spiky short term local volatilities). Derman et al.
(1996b) report as an empirical regularity in equity markets that the smile of
the local volatility is approximately two times steeper than the IV smile. They
call this relationship the two-times-IV-slope-rule for local volatility. Using a
recent result by Berestycki et al. (2002) we shall prove in Section 3.7 that this
conjecture can be made more precise for short maturity ATM options.
In fact, there are a large number of other procedures to reconstruct the
LVS. They will be separately surveyed in Section 3.10, among them the implied
tree approaches. Another important stream of literature calls for a more formal
mathematical treatment and recovers the LVS from the Dupire formula or the
dual PDE in terms of an (ill-posed) inverse problem.
As a final cursory remark, note that Equation (3.35), if we ignore the initial $K^2$-term, is nothing but an expansion of the state price density in terms of the BS vega, the smile and its first and second order derivatives, see the discussion in Section 2.4, p. 19:

$$\phi(K,T|S_t,t) = e^{-\delta\tau} S_t \sqrt{\tau}\,\varphi(d_1) \times \left\{ \frac{1}{K^2\widehat{\sigma}\tau} + \frac{2d_1}{K\widehat{\sigma}\sqrt{\tau}}\frac{\partial \widehat{\sigma}}{\partial K} + \frac{d_1 d_2}{\widehat{\sigma}}\left(\frac{\partial \widehat{\sigma}}{\partial K}\right)^2 + \frac{\partial^2 \widehat{\sigma}}{\partial K^2} \right\} . \tag{3.40}$$

In estimating the smile and its derivatives, expression (3.40) may serve as a vehicle to recover the state price density, see Huynh et al. (2002) and Brunner and Hafner (2003) for details.
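
A small Python sketch may illustrate how (3.40) is used. It evaluates the right-hand side of (3.40) for a hypothetical quadratic smile in log-strike and compares the result with a finite-difference second strike derivative of the corresponding BS call prices. Interest rate and dividend yield are set to zero here so that discounting plays no role; the function names, the smile and all numbers are assumptions for illustration.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, tau, sigma, r, delta):
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / sq
    return (S * math.exp(-delta * tau) * norm_cdf(d1)
            - K * math.exp(-r * tau) * norm_cdf(d1 - sq))

def spd_from_smile(smile, S, K, tau, r, delta, h=1e-2):
    """Right-hand side of (3.40); smile(K) is the IV smile at a fixed maturity."""
    s = smile(K)
    s_K = (smile(K + h) - smile(K - h)) / (2 * h)
    s_KK = (smile(K + h) - 2 * s + smile(K - h)) / h**2
    sq = s * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta + 0.5 * s**2) * tau) / sq
    d2 = d1 - sq
    vega = math.exp(-delta * tau) * S * math.sqrt(tau) * norm_pdf(d1)
    return vega * (1 / (K**2 * s * tau) + 2 * d1 / (K * sq) * s_K
                   + d1 * d2 / s * s_K**2 + s_KK)

# Hypothetical smile; r = delta = 0 by assumption.
S0, tau, r, delta = 100.0, 0.5, 0.0, 0.0
smile = lambda K: 0.2 - 0.1 * math.log(K / S0) + 0.5 * math.log(K / S0) ** 2
K, h = 95.0, 0.05
call = lambda k: bs_call(S0, k, tau, smile(k), r, delta)
spd_formula = spd_from_smile(smile, S0, K, tau, r, delta)
spd_fd = (call(K + h) - 2 * call(K) + call(K - h)) / h**2
```

The two numbers should agree up to discretization error, which illustrates that (3.40) needs only the smile and its first two strike derivatives, not the option prices themselves.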

3.6 Asymptotic relations between implied and local volatility

Recent research has identified situations in which the relation between im-
plied and local volatility can be established more exactly. These results are of
asymptotic nature and more general than those stated so far, since they allow
the local volatility to be strike-dependent. More precisely, Berestycki et al.
(2002) show that near expiry, IV can be represented as the spatial harmonic
mean of local volatility. The key consequence of this result is that the IVS
can be extended up to τ = 0 as a continuous function. This can be exploited
in the calibration of local volatility models, Section 3.10.3. Additionally, they
prove that the representation (3.38), i.e. squared IV as an average of squared
local volatility, holds also for deep OTM options under certain assumptions.
To obtain their results, Berestycki et al. (2002) assume that local volatility is deterministic. As noted in (3.6), this implies

$$\sigma^2_{K,T}(S_t,t) = \sigma^2(K,T) , \tag{3.41}$$

i.e. local volatility is the instantaneous volatility function for all $S_t = K$ and $t = T$. Further, they transform the Dupire formula into the (inverse) log-forward moneyness space, similarly to what we have done to derive the forward moneyness representation for the empirical demonstration in the previous section. Define

$$x \stackrel{\mathrm{def}}{=} -\ln \kappa_f = \ln(S_t/K) + (r-\delta)\tau . \tag{3.42}$$

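Straightforward calculations show that this transforms the IV counterpart of Dupire (3.36) into the following quasilinear parabolic PDE of IV, where we suppress the dependence of IV on $x$ and $\tau$:

$$2\tau\,\widehat{\sigma}\,\frac{\partial \widehat{\sigma}}{\partial \tau} + \widehat{\sigma}^2 - \sigma^2(x,\tau)\left(1 - \frac{x}{\widehat{\sigma}}\frac{\partial \widehat{\sigma}}{\partial x}\right)^2 - \sigma^2(x,\tau)\,\tau\,\widehat{\sigma}\,\frac{\partial^2 \widehat{\sigma}}{\partial x^2} + \frac{1}{4}\,\sigma^2(x,\tau)\,\tau^2\,\widehat{\sigma}^2\left(\frac{\partial \widehat{\sigma}}{\partial x}\right)^2 = 0 . \tag{3.43}$$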

To gain an insight into the nature of this first result, consider the following: let $\widehat{\sigma}(x,0)$ be the unique solution to the PDE at $\tau = 0$. Then (3.43) reduces to

$$\widehat{\sigma}^2(x,0) - \sigma^2(x,0)\left(1 - \frac{x}{\widehat{\sigma}(x,0)}\frac{\partial \widehat{\sigma}(x,0)}{\partial x}\right)^2 = 0 . \tag{3.44}$$

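By simple calculations it is seen that the solution is:

$$\widehat{\sigma}(x,0) = \left\{\int_0^1 \frac{ds}{\sigma(sx,0)}\right\}^{-1} = \left\{\frac{1}{x}\int_0^x \frac{dy}{\sigma(y,0)}\right\}^{-1} , \tag{3.45}$$

where the second, more familiar representation is obtained by the variable substitution $y = sx$ for $x \neq 0$.

Berestycki et al. (2002) prove that

$$\lim_{\tau \downarrow 0} \widehat{\sigma}(x,\tau) = \widehat{\sigma}(x,0) \stackrel{\mathrm{def}}{=} \left\{\int_0^1 \frac{ds}{\sigma(sx,0)}\right\}^{-1} \tag{3.46}$$

does in fact hold.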
Result (3.46) establishes that for options near to expiry IV can be understood as the harmonic mean of local volatility. Note that, unlike the situations seen so far, the mean is taken across log-forward moneyness, i.e. in a spatial sense across the LVS. Berestycki et al. (2002) point out that this result relies on the particular boundary condition imposed by the call payoff function: $\psi(x) = (e^x - 1)^+$ (here in the inverse log-forward moneyness notation). Indeed, if it is replaced by any strictly convex function they show that $\lim_{\tau \downarrow 0} \widehat{\sigma}(x,\tau) = \sigma(x,0)$.

The authors also provide an intuitive argument for their result: consider the situation of an asset price process, the local volatility of which vanishes in some interval $[\tilde{x}, 0]$ for $x < \tilde{x} < 0$. Then, we get $\widehat{\sigma}(x,0) = 0$ from (3.46). Clearly, this result, which is obtained by averaging harmonically, is correct also from a probabilistic point of view, since the stock starting in $x$ will never cross the interval and never reach the ITM region of the call. Thus the call must have a price of zero. However, an IV of zero is inconsistent with the simple (spatial) arithmetic averages.
For the second result, assume that local volatility is bounded away from zero and infinity and that it has the continuous limits $\lim_{x \uparrow \infty} \sigma(x,\tau) = \sigma_+(\tau)$ and $\lim_{x \downarrow -\infty} \sigma(x,\tau) = \sigma_-(\tau)$. Then

$$\lim_{x \to \pm\infty} \widehat{\sigma}^2(x,\tau) = \frac{1}{\tau}\int_0^\tau \sigma_\pm^2(s)\,ds . \tag{3.47}$$

For understanding this result, note that e.g. $\widehat{\sigma}^2(+\infty,\tau) \stackrel{\mathrm{def}}{=} \frac{1}{\tau}\int_0^\tau \sigma_+^2(s)\,ds$ already has the correct behavior by the arguments on the non-strike dependent local volatility in the previous section. To prove (3.47), Berestycki et al. (2002) construct sub- and supersolutions for any $\tau > 0$ with the required behavior at infinity and apply a comparison principle.


3.7 The two-times-IV-slope rule for local volatility

In our empirical demonstration of local volatility we remarked that in equity markets the local smile is approximately twice as steep as the implied smile. Derman et al. (1996b) call this empirical regularity the two-times-IV-slope rule for local volatility. Here, we show how this conjecture can be made more precise by using the results of the previous section.

For convenience, we reiterate the key result:

$$\widehat{\sigma}(x,0) = \left\{\int_0^1 \frac{ds}{\sigma(sx,0)}\right\}^{-1} . \tag{3.48}$$

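Consider a Taylor expansion on both sides of (3.48) in the neighborhood of $x \approx 0$. This yields:

$$\widehat{\sigma}(0,0) + x\,\frac{\partial \widehat{\sigma}(0,0)}{\partial x} = \sigma(0,0) + x\,\sigma^2(0,0)\int_0^1 \frac{\partial \sigma(0,0)}{\partial x}\,\frac{s\,ds}{\sigma^2(0,0)} = \sigma(0,0) + \frac{1}{2}\,x\,\frac{\partial \sigma(0,0)}{\partial x} . \tag{3.49}$$

Since $\widehat{\sigma}(0,0) = \sigma(0,0)$ by (3.48), this proves:

$$2\,\frac{\partial \widehat{\sigma}(0,0)}{\partial x} = \frac{\partial \sigma(0,0)}{\partial x} , \tag{3.50}$$

i.e. the two-times-IV-slope rule holds for short-to-expiry ATM options.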
We complete this section by a simulation. Suppose the local volatility smile for some close expiry date can be approximated within the interval $[-0.2, 0.2]$ by the function:

$$\sigma(x) = a(x+b)^2 + c , \tag{3.51}$$

where $a, b, c \in \mathbb{R}$. Computing the harmonic mean according to (3.48) yields for the IV smile

$$\widehat{\sigma}(x) = \frac{x\sqrt{ac}}{\arctan\!\left\{\sqrt{\tfrac{a}{c}}\,(x+b)\right\} - \arctan\!\left\{\sqrt{\tfrac{a}{c}}\,b\right\}} . \tag{3.52}$$

In Figure 3.3 we display the situation for $a = 0.5$, $b = 0.15$, $c = 0.3$. Note that moneyness is measured in terms of the (inverse) log-forward moneyness $x \stackrel{\mathrm{def}}{=} -\ln \kappa_f$. Thus, the interval $[-0.2, 0.2]$ corresponds to $[1.22, 0.81]$ in the usual forward moneyness metric, and the smiles appear as a mirror image to Figure 3.2. Otherwise the plots look remarkably similar. Also the two-times-IV-slope rule is well visible.

[Figure 3.3: "Implied vs local smile"; horizontal axis: (inverse) log-forward moneyness from -0.2 to 0.2 (SCMsimuIVLV.xpl).]

Fig. 3.3. Simulation of option implied (squares) versus local (circles) volatility smiles according to (3.51) and (3.52) for $a = 0.5$, $b = 0.15$, $c = 0.3$. Moneyness is (inverse) forward moneyness $x \stackrel{\mathrm{def}}{=} -\ln \kappa_f$. The interval $[-0.2, 0.2]$ corresponds to $[1.22, 0.81]$ in the usual forward moneyness metric, compare Figure 3.2.
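
The simulation can be reproduced with a short Python sketch. It evaluates the closed form (3.52), checks it against a numerical harmonic mean computed from (3.48) by Simpson's rule, and compares the ATM slopes of both smiles. The function names and the numerical settings (number of Simpson intervals, finite-difference step) are assumptions for illustration; the parameter values are those quoted for Figure 3.3.

```python
import math

A, B, C = 0.5, 0.15, 0.3  # a, b, c as quoted for Figure 3.3

def local_smile(x):
    """Local volatility smile (3.51)."""
    return A * (x + B) ** 2 + C

def implied_smile_closed_form(x):
    """IV smile (3.52); at x = 0 the harmonic mean collapses to sigma(0)."""
    if x == 0.0:
        return local_smile(0.0)
    k = math.sqrt(A / C)
    return x * math.sqrt(A * C) / (math.atan(k * (x + B)) - math.atan(k * B))

def implied_smile_numeric(x, n=2000):
    """Harmonic mean (3.48): integral of ds / sigma(s x) over [0, 1] by Simpson."""
    h = 1.0 / n
    total = 1.0 / local_smile(0.0) + 1.0 / local_smile(x)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / local_smile(i * h * x)
    return 1.0 / (total * h / 3.0)

# ATM slopes: IV slope by central differences, local slope analytically.
eps = 1e-4
iv_slope_atm = (implied_smile_closed_form(eps)
                - implied_smile_closed_form(-eps)) / (2 * eps)
lv_slope_atm = 2 * A * B
```

With these parameters the local smile has ATM slope 0.15, and twice the implied ATM slope reproduces this value up to discretization error, in line with (3.50).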

3.8 The K-strike and T-maturity forward risk-adjusted measure

As outlined in the introduction to this chapter, it is possible to characterize the local variance as the unconditional expectation under a K-strike and T-maturity forward risk-adjusted measure. Such a result is similar to the case of forward rates: Jamshidian (1993) proves that the forward rate can be obtained by taking the expectation of the short rate under a T-maturity forward measure.
To derive their result, Derman and Kani (1998) assume the following stochastic structure of the LVS under the objective measure $P$:

$$\frac{d\sigma^2_{K,T}(S_t,t)}{\sigma^2_{K,T}(S_t,t)} = \alpha_{K,T}(S_t,t)\,dt + \theta_{K,T}(S_t,t)\,dW_t^{(1)} , \tag{3.53}$$

which we give in a simplified setting here for the sake of clarity. Originally, the authors allow for multi-factor dynamics. The process of the local variance $\big(\sigma^2_{K,T}(S_t,t)\big)_{0 \le t \le T^*}$ is adapted to the filtration $(\mathcal{F}_t)_{0 \le t \le T^*}$ generated by two uncorrelated Brownian motions $\big(W_t^{(0)}\big)_{0 \le t \le T^*}$ and $\big(W_t^{(1)}\big)_{0 \le t \le T^*}$. The drift process $\big(\alpha_{K,T}(S_t,t)\big)_{0 \le t \le T^*}$ and the volatility process $\big(\theta_{K,T}(S_t,t)\big)_{0 \le t \le T^*}$, which reflects the sensitivity of the LVS with respect to random shocks, are not further specified, but satisfy mild integrability and measurability conditions (see Derman and Kani (1998) for details).
In this set-up instantaneous variance is given by

$$\sigma^2_{S_t,t}(S_t,t) = \sigma^2_{S_t,t}(S_0,0) + \int_0^t \alpha_{S_t,t}(S_s,s)\,ds + \int_0^t \theta_{S_t,t}(S_s,s)\,dW_s^{(1)} , \tag{3.54}$$

where $\sigma^2_{S_t,t}(S_0,0)$ is a known constant. Instantaneous volatility enters the asset price dynamics in the usual manner via

$$\frac{dS_t}{S_t} = \mu(S_t,t)\,dt + \sigma_{S_t,t}(S_t,t)\,dW_t^{(0)} . \tag{3.55}$$
In this general set-up, arbitrage may be possible. To avoid arbitrage opportunities generated by (3.53) and (3.55), the drift function $\alpha_{K,T}(t,S)$ must satisfy certain conditions, similar to those known from the Heath, Jarrow and Morton (1992) theory of interest rates. More precisely, the drift condition is given by

$$\alpha_{K,T}(S_t,t) = -\theta_{K,T}(S_t,t)\left\{ \frac{1}{\phi(K,T|S_t,t)} \int_t^T\!\!\int_0^\infty \theta_{K',T'}(S_t,t)\,\phi(K',T'|S_t,t) \times (K')^2\,\frac{\partial^2}{\partial (K')^2}\,\phi(K',T'|S_t,t)\,dK'\,dT' - \lambda^{(1)} \right\} , \tag{3.56}$$
where $\phi(K,T|S_t,t)$ denotes as usual the transition probability. The term $\lambda^{(1)}$ is the market price of volatility risk. Derman and Kani (1998) show the existence of a unique martingale measure $Q$ if and only if the market prices of risk do not depend on $K$ and $T$.

Condition (3.56) is much more involved than the classical one known from the Heath et al. (1992) theory of interest rates. This is due to the two-dimensional dependence of local volatilities on $K$ and $T$. Also unlike the latter, (3.56) depends on the market price of risk and on the transition density, which render an implementation difficult. Therefore, Derman and Kani (1998) propose a discrete approximation by means of a stochastic implied tree, Section 3.10.2.
The dynamic evolution of local volatility under the equivalent martingale measure is given by

$$\frac{d\sigma^2_{K,T}(S_t,t)}{\sigma^2_{K,T}(S_t,t)} = \tilde{\alpha}_{K,T}(S_t,t)\,dt + \theta_{K,T}(S_t,t)\,d\overline{W}_t^{(1)} , \tag{3.57}$$

where $\tilde{\alpha}_{K,T}(S_t,t)$ is the instantaneous drift under $Q$. The Brownian motion under the equivalent martingale measure is denoted by $\overline{W}_t^{(1)}$. Under this measure also the transition probability $\phi(K,T|S_t,t) = E^Q\{\delta_K(S_T)\,|\,\mathcal{F}_t\}$ is a martingale. Thus, it evolves according to an SDE of the form

$$\frac{d\phi(K,T|S_t,t)}{\phi(K,T|S_t,t)} = \zeta^{(0)}_{K,T}\,d\overline{W}_t^{(0)} + \zeta^{(1)}_{K,T}\,d\overline{W}_t^{(1)} . \tag{3.58}$$

The previous analysis has shown, compare (3.17), that local volatility $\sigma^2_{K,T}(S_t,t)$ obeys

$$E^Q\{\sigma^2_{S_T,T}(S_T,T)\,\delta_K(S_T)\,|\,\mathcal{F}_t\} = \sigma^2_{K,T}(S_t,t)\,E^Q\{\delta_K(S_T)\,|\,\mathcal{F}_t\} . \tag{3.59}$$

Like the transition probability on the right-hand side of (3.59), the left-hand side of (3.59) must be a martingale. Applying Itô's lemma to the product on the right-hand side of (3.59), and collecting the drift terms arising from (3.57) and the covariation process of (3.57) and (3.58) shows that

$$\tilde{\alpha}_{K,T}(S_t,t) + \zeta^{(1)}_{K,T}(S_t,t)\,\theta_{K,T}(S_t,t) = 0 . \tag{3.60}$$

Now introduce new Brownian motions $\widehat{W}_t^{(i)} = \overline{W}_t^{(i)} - \int_0^t \zeta^{(i)}_{K,T}(S_s,s)\,ds$, for $i = 0, 1$. From (3.60) and (3.57) it is seen that the stochastic evolution of the local variance is given by

$$\frac{d\sigma^2_{K,T}(S_t,t)}{\sigma^2_{K,T}(S_t,t)} = \theta_{K,T}(S_t,t)\,d\widehat{W}_t^{(1)} , \tag{3.61}$$

which is a martingale.
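We define the new measure $Q^{(K,T)}$ via its Radon-Nikodým derivative:

$$\frac{dQ^{(K,T)}}{dQ} = \exp\left[ \sum_{i=0}^{1} \left\{ \int_0^T \zeta^{(i)}_{K,T}(S_s,s)\,d\overline{W}_s^{(i)} - \frac{1}{2}\int_0^T \big(\zeta^{(i)}_{K,T}(S_s,s)\big)^2\,ds \right\} \right] . \tag{3.62}$$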
This measure explicitly depends on $K$ and $T$. Hence it is called the K-strike and T-maturity forward risk-adjusted measure, in analogy to the theory of interest rates. Denoting the expectation with respect to the new measure by $E^{(K,T)}(\cdot)$ shows that (3.2) can be rewritten as

$$\sigma^2_{K,T}(S_t,t) \stackrel{\mathrm{def}}{=} E^Q\{\sigma^2_{S_T,T}(S_T,T)\,|\,S_T = K, \mathcal{F}_t\} = E^{(K,T)}\{\sigma^2_{S_T,T}(S_T,T)\,|\,\mathcal{F}_t\} , \tag{3.63}$$

which provides the desired representation.


3.9 Model-free (implied) volatility forecasts

In a large number of studies that have been surveyed in Section 2.7, the qual-
ity of IV as a predictor of stock price volatility is discussed. However, it may
be advantageous to resort to a volatility measure implied from options that
is independent of the BS model, or at best: model-free. This goal has been
achieved by Britten-Jones and Neuberger (2000). They assume that dividends
and interest rates are zero. In the presence of nonzero interest rates and div-
idends, Britten-Jones and Neuberger (2000) interpret option and asset prices
as forward prices.
Usually one is interested in comparing multi-period forecasts of volatility with volatility over several periods. To obtain the unconditional expectation of the Dupire formula (3.19), one first integrates across all strikes $K$:

$$E^Q\{\sigma^2(S_T,T,\cdot)\,|\,\mathcal{F}_t\} = \int_0^\infty E^Q\{\sigma^2(S_T,T,\cdot)\,|\,S_T = K, \mathcal{F}_t\}\,\phi(K,T|S_t,t)\,dK = 2\int_0^\infty \frac{\partial C_t(K,T)}{\partial T}\,K^{-2}\,dK . \tag{3.64}$$

For the forecast between the two time horizons $T_1 < T_2$, integrate again with respect to time to maturity. This yields:

$$E^Q\left\{ \int_{T_1}^{T_2} \sigma^2(S_T,T,\cdot)\,dT \,\Big|\, \mathcal{F}_t \right\} = 2\int_0^\infty \frac{C_t(K,T_2) - C_t(K,T_1)}{K^2}\,dK . \tag{3.65}$$

This is the unconditional expectation of the instantaneous squared volatility over a finite period $[T_1, T_2]$. Or more precisely, since the interest rate is assumed to be zero, it is the expectation of the forward squared volatility.

How does this forecast relate to the classical BS IV? Inserting the BS formula in (3.65) and integrating by parts reveals (after carefully examining the limits):

$$E^Q\left\{ \int_{T_1}^{T_2} \sigma^2(S_T,T,\cdot)\,dT \,\Big|\, \mathcal{F}_t \right\} = 2\int_0^\infty \frac{C_t^{BS}(K,T_2) - C_t^{BS}(K,T_1)}{K^2}\,dK = \sigma^2\,(T_2 - T_1) . \tag{3.66}$$

Thus, if the IVS is flat in $K$ and $T$, but not necessarily a constant, squared IV, i.e. $\widehat{\sigma}^2$ in our common notation, is the risk-neutral forecast as given in (3.65). There is also an intuitive argument: a lot of processes are consistent with the squared volatility forecast (3.65). Naturally, one of them is the BS deterministic (squared) volatility process. Hence, it precisely provides the forecast.
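
Relation (3.66) is easy to check numerically. The following Python sketch integrates the right-hand side of (3.65) for BS call prices with zero interest rate and dividend yield, using the substitution $K = S_t e^u$ and Simpson's rule. The function names, the truncation width of the integration range and the parameter values are assumptions for illustration.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_zero_rates(S, K, tau, sigma):
    """BS call price for r = delta = 0, as assumed in this section."""
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + 0.5 * sigma**2 * tau) / sq
    return S * norm_cdf(d1) - K * norm_cdf(d1 - sq)

def model_free_variance(call, S, T1, T2, width=3.0, n=4000):
    """Right-hand side of (3.65): 2 * int {C(K,T2) - C(K,T1)} / K^2 dK.
    With K = S * exp(u) one has dK / K^2 = du / K; Simpson rule on [-width, width]."""
    h = 2.0 * width / n
    total = 0.0
    for i in range(n + 1):
        K = S * math.exp(-width + i * h)
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * (call(K, T2) - call(K, T1)) / K
    return 2.0 * total * h / 3.0

# Assumed example values: constant BS volatility 0.2, horizons 0.25 and 1.
S0, sigma, T1, T2 = 100.0, 0.2, 0.25, 1.0
call = lambda K, T: bs_call_zero_rates(S0, K, T, sigma)
forecast = model_free_variance(call, S0, T1, T2)
```

Here `forecast` should be close to $\sigma^2 (T_2 - T_1) = 0.03$. With market prices in place of BS prices, the same integral would have to be computed from a strike grid that is completed by interpolation and extrapolation.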

However, BS IV is a biased estimator of realized volatility, since the unbiased forecast holds only for squared volatility. This is seen from Jensen's inequality:

$$E^Q\left\{ \sqrt{\int_{T_1}^{T_2} \sigma^2(T,\cdot)\,dT} \;\Bigg|\; \mathcal{F}_t \right\} \le \sqrt{ 2\int_0^\infty \frac{C_t(K,T_2) - C_t(K,T_1)}{K^2}\,dK } \tag{3.67}$$

Only if the IVS were a constant, i.e. if no stochastics were involved, IV would
be an unbiased forecast for realized volatility. This, however, is a case of little
interest.
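
The size of the bias implied by (3.67) can be illustrated with a deliberately simple Python sketch. The two-state distribution of integrated variance below is a purely hypothetical assumption chosen for arithmetic convenience.

```python
import math

# Hypothetical two-state example: integrated variance over [T1, T2] equals
# 0.01 or 0.09, each with risk-neutral probability 0.5.
scenarios = [(0.5, 0.01), (0.5, 0.09)]

expected_variance = sum(p * v for p, v in scenarios)
expected_volatility = sum(p * math.sqrt(v) for p, v in scenarios)

# Square root of the variance forecast, i.e. the right-hand side of (3.67).
vol_from_variance_forecast = math.sqrt(expected_variance)
```

In this example the expected volatility is 0.2, while the square root of the expected variance is about 0.224, so the square root of the variance forecast overstates expected volatility as soon as variance is random.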
The forecast (3.64) is a risk-neutral one. It will necessarily differ from
the forecast under the objective measure, unless volatility risk is unpriced,
and both forecasts cannot simply be compared. Nevertheless, studying the
systematic deviations between realized variance and its risk-neutral forecast,
would certainly contribute to our understanding of how volatility risk is priced.

3.10 Local volatility models

Here, we survey models and techniques to recover the LVS from observed
option prices. First, deterministic implied trees are presented. They are grown
either by forward induction or by backward induction. Next, trinomial trees
are discussed. Stochastic implied trees are considered in Section 3.10.2. The
section concludes with methods motivated from continuous time theory.

3.10.1 Deterministic implied trees

Valuation methods based on trees are workhorses in option pricing. Pioneered by Cox, Ross and Rubinstein (1979) (CRR), they provide a simple framework in which pricing of path-independent and path-dependent options alike can be accomplished fast and efficiently by backward induction. Most importantly, under certain regularity conditions, they are the discrete time approximations to the diffusion

$$dS_t = \mu(S_t,t)\,dt + \sigma(S_t,t)\,dW_t , \tag{3.68}$$

where $\mu, \sigma : \mathbb{R} \times [0, T^*] \to \mathbb{R}$ are deterministic functions. As is well known, the CRR tree is the discrete time approximation of the geometric Brownian motion with constant drift and constant volatility.
In the tree framework, a given interval $[0,T]$ is divided into $J$ equally spaced pieces of length $\Delta t = T/J$, indexed by $j = 1, 2, \ldots, J$. As an approximation to (3.68) one chooses a step function starting at $S_0$, which jumps with a certain probability at the discrete times $\Delta t, 2\Delta t, 3\Delta t, \ldots$ from its value at level $j$ to one out of two (binomial tree) or out of three (trinomial tree) values at level $j+1$. Nelson and Ramaswamy (1990) discuss the conditions under which this process indeed converges to (3.68) as $\Delta t$ tends to zero, and they also show how to construct a binomial approximation for a specific diffusion.

[Figure 3.4: two schematic trees, each with axes "time" (horizontal) and "stock" (vertical).]

Fig. 3.4. Left panel: standard binomial tree, e.g. as in Cox et al. (1979). Right panel: implied binomial tree derived from market data, Derman and Kani (1994b).
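
For reference, the standard CRR tree that serves as the benchmark for the implied trees can be sketched as follows. This is a minimal Python illustration of backward induction for a European call; the function names and the parameter values are assumptions, and the BS price is computed only to show the convergence of the tree price.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, tau, sigma, r, delta):
    sq = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / sq
    return (S * math.exp(-delta * tau) * norm_cdf(d1)
            - K * math.exp(-r * tau) * norm_cdf(d1 - sq))

def crr_call(S0, K, T, sigma, r, delta, J):
    """European call on a CRR binomial tree with J time steps."""
    dt = T / J
    u = math.exp(sigma * math.sqrt(dt))       # up factor
    d = 1.0 / u                               # down factor
    q = (math.exp((r - delta) * dt) - d) / (u - d)   # risk neutral up probability
    disc = math.exp(-r * dt)
    # Payoffs at the terminal nodes, ordered from bottom to top.
    values = [max(S0 * u**i * d**(J - i) - K, 0.0) for i in range(J + 1)]
    # Backward induction: discounted risk neutral expectation, level by level.
    for level in range(J, 0, -1):
        values = [disc * (q * values[i + 1] + (1.0 - q) * values[i])
                  for i in range(level)]
    return values[0]

tree_price = crr_call(100.0, 100.0, 1.0, 0.2, 0.03, 0.01, J=500)
bs_price = bs_call(100.0, 100.0, 1.0, 0.2, 0.03, 0.01)
```

In the CRR tree the up probability `q` and the spacing of the nodes are the same everywhere; in the implied trees discussed in this section both vary from node to node.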
In the smile consistent implied lattice approaches, the tree is not specified in advance and its parameters are not inferred from a calibration to the underlying process or by an estimation from historical data of the underlying. Rather, the tree as the approximation to (3.68) is recovered from observed option data. In implied trees, the transition probabilities change from node to node, and the state space is distorted in a way which mimics the LVS reflected in the option prices. This is displayed schematically in Figure 3.4. Thus, European options priced on this tree will correctly reproduce the IVS, and exotic options will be priced relative to them.
An implicit assumption (or, from a practical point of view, a necessity) is that for any strike and any time to maturity plain vanilla option prices are available. From our discussion in Chapter 2, it is clear that this is not the case. A typical approach to resolve this problem is to smooth the IVS on the desired grid, e.g. by the smoothing techniques given in Chapter 4. Other interpolation and extrapolation techniques are a valid choice as well. The values estimated from the IVS are then inserted into the BS formula to obtain the prices of plain vanilla options at pairs of strikes and times to maturity where they are not available otherwise.

Derman and Kani (1994b), Barle and Cakici (1998). The principle of constructing implied binomial trees according to Derman and Kani (1994b) and Barle and Cakici (1998) is forward induction. The tree is (for simplicity) equally spaced with $\Delta t$ and has levels $j = 1, \ldots, J$. Since the tree is recombining, there are $i = 1, \ldots, j$ nodes $(i,j)$ at level $j$. The node index is running from the bottom to the top. For the presentation, suppose that the first $j$ levels of the tree have already been implied from the option data, i.e. up to level $j$ all stock prices $s_{i,j}$, all risk neutral transition probabilities $q_{i,j-1}$ from nodes $(i,j-1)$ to node $(i+1,j)$, and Arrow-Debreu prices $\lambda_{i,j}$ have been recovered from the option data. The Arrow-Debreu price $\lambda_{i,j}$ of node $(i,j)$ is the price of a digital option paying one unit in this particular state and is calculated as follows: one sums over all possible paths the product of all risk neutral transition probabilities along a single path from the root of the tree to node $(i,j)$, and discounts. In this sense the entire ensemble of the (undiscounted) Arrow-Debreu prices is the discrete version of the risk neutral transition density as introduced in Section 2.4.
Departing from a node $(i_0, j)$ with stock price $s_{i_0,j}$, we consider the construction of the up-value $S_{i_0+1,j+1}$ and the down-value $S_{i_0,j+1}$ at the nodes $(i_0+1, j+1)$ and $(i_0, j+1)$, respectively, Figure 3.5.

Denote by $F_{i,j} = s_{i,j}\,e^{(r-\delta)\Delta t}$ the (known) forward price maturing at time $t_{j+1} = t_j + \Delta t$, where $r$ and $\delta$ are the interest rate and the dividend yield, respectively. Then, by risk neutrality,

$$F_{i,j} = q_{i,j}\,S_{i+1,j+1} + (1 - q_{i,j})\,S_{i,j+1} . \tag{3.69}$$

There are $j$ equations of this type, one for each $i$.


The second set of equations is derived from option prices, calls $C(K, t_{j+1})$ and puts $P(K, t_{j+1})$ struck at an exercise price $K$ and expiring at $t_{j+1}$. Assume that

$$s_{i_0,j} \le K \le S_{i_0+1,j+1} . \tag{3.70}$$

This choice guarantees that only the up (down) node and all nodes above (below) this node contribute to the value of the call (put) with exercise price $K$.
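Theoretically, the prices of the call options are given from the tree by evaluating the payoff function and discounting:

$$C(K, t_{j+1}) = e^{-r\Delta t} \sum_{i=1}^{j+1} \lambda_{i,j+1}\,(S_{i,j+1} - K)^+ , \tag{3.71}$$

where

$$\lambda_{i,j+1} = \begin{cases} q_{j,j}\,\lambda_{j,j} & \text{for } i = j+1 , \\ q_{i-1,j}\,\lambda_{i-1,j} + (1 - q_{i,j})\,\lambda_{i,j} & \text{for } 2 \le i \le j , \\ (1 - q_{1,j})\,\lambda_{1,j} & \text{for } i = 1 . \end{cases} \tag{3.72}$$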

In light of condition (3.70), Equation (3.71) can be written as

$$\Delta^C_{i_0} = q_{i_0,j}\,\lambda_{i_0,j}\,(S_{i_0+1,j+1} - K) , \tag{3.73}$$

where $\Delta^C_{i_0} \stackrel{\mathrm{def}}{=} C(K,t_{j+1})\,e^{r\Delta t} - \sum_{i=i_0+1}^{j} \lambda_{i,j}\,(F_{i,j} - K)$; the sum collects, via (3.72) and (3.69), the contributions of all nodes above $(i_0, j)$. Equation (3.73) depends on the two unknown parameters $q_{i_0,j}$ and $S_{i_0+1,j+1}$. Exploiting the risk neutrality condition (3.69), we obtain from (3.73) the fundamental recursion formula for the implied binomial trees by Derman and Kani (1994b) and Barle and Cakici (1998):

$$S_{i_0+1,j+1} = \frac{\Delta^C_{i_0}\,S_{i_0,j+1} - \lambda_{i_0,j}\,K\,(F_{i_0,j} - S_{i_0,j+1})}{\Delta^C_{i_0} - \lambda_{i_0,j}\,(F_{i_0,j} - S_{i_0,j+1})} . \tag{3.74}$$

[Figure 3.5: schematic of a single binomial step from node $(i_0, j)$ with price $s_{i_0,j}$ at time $t_j$ to the nodes $S_{i_0+1,j+1}$ (up, probability $q_{i_0,j}$) and $S_{i_0,j+1}$ (down) at time $t_{j+1}$.]

Fig. 3.5. Construction of the implied binomial tree from level $j$ to level $j+1$ according to Derman and Kani (1994b) and Barle and Cakici (1998) by forward induction. $s_{i_0,j}$ denotes the (known) stock price at node $(i_0, j)$, $S_{i_0+1,j+1}$ the (unknown) stock price at node $(i_0+1, j+1)$. $q_{i_0,j}$ is the (unknown) risk neutral transition probability from node $(i_0, j)$ to node $(i_0+1, j+1)$. At level $j$ there are $i = 1, \ldots, j$ nodes $(i,j)$.

Using (3.69) and (3.74) iteratively, one solves for $S_{i_0+1,j+1}$ and $q_{i_0,j}$ through the upper part of the tree, if an initial $S_{i_0,j+1}$ is known. Indeed, there are $2j+1$ unknown parameters in the tree at level $j$: $j+1$ stock prices and $j$ transition probabilities, while the number of equations in (3.69) and (3.71) is only $2j$. This remaining degree of freedom is closed by fixing the root (the center) of the tree. If the number of nodes $j+1$ is odd, one fixes $S_{j/2+1,j+1} = S$. Otherwise, if the number of nodes $j+1$ is even, one employs the logarithmic centering condition known from the CRR tree, i.e. one posits $S_{(j+1)/2,j+1}\,S_{(j+3)/2,j+1} = S^2$. Once the center is fixed the recursions (3.69) and (3.74) can be used to unfold the upper part of the tree.
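
A single step of the forward induction can be sketched in Python as follows. The sketch implements the recursion (3.74), the transition probability from (3.69) and the Arrow-Debreu update (3.72), and checks them on hypothetical numbers: starting from a known up-node, it computes the quantity on the left-hand side of (3.73) and recovers the up-node from it. Function names and all numerical values are assumptions for illustration, and lists are ordered from the bottom node to the top node.

```python
import math

def transition_prob(F, S_up, S_down):
    """Risk neutral up probability, i.e. (3.69) solved for q."""
    return (F - S_down) / (S_up - S_down)

def dk_up_node(delta_c, lam, F, S_down, K):
    """Recursion (3.74) for the upper node S_{i0+1, j+1}."""
    num = delta_c * S_down - lam * K * (F - S_down)
    den = delta_c - lam * (F - S_down)
    return num / den

def next_arrow_debreu(lam, q):
    """Update (3.72): Arrow-Debreu prices at level j+1 from those at level j."""
    j = len(lam)
    nxt = [(1.0 - q[0]) * lam[0]]                      # bottom node, i = 1
    for i in range(1, j):                              # interior nodes
        nxt.append(q[i - 1] * lam[i - 1] + (1.0 - q[i]) * lam[i])
    nxt.append(q[j - 1] * lam[j - 1])                  # top node, i = j + 1
    return nxt

# Hypothetical step: node price 100, (r - delta) = 0.02, dt = 0.1, strike K = s.
s, S_down, S_up_true, lam, K = 100.0, 95.0, 106.0, 0.4, 100.0
F = s * math.exp(0.02 * 0.1)
q_true = transition_prob(F, S_up_true, S_down)
delta_c = q_true * lam * (S_up_true - K)               # left-hand side of (3.73)

S_up = dk_up_node(delta_c, lam, F, S_down, K)          # recovered via (3.74)
q = transition_prob(F, S_up, S_down)
lam_next = next_arrow_debreu([0.25, 0.5, 0.25], [0.4, 0.5, 0.6])
```

In an actual implementation `delta_c` would be computed from an interpolated market call price rather than from a known up-node, and each new node would be checked for admissible transition probabilities.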
Similarly, the lower part of the tree is grown from put prices. One steps down from the center, and the recursion formula (3.74) is altered to

$$S_{i_0,j+1} = \frac{\Delta^P_{i_0}\,S_{i_0+1,j+1} - \lambda_{i_0,j}\,K\,(S_{i_0+1,j+1} - F_{i_0,j})}{\Delta^P_{i_0} - \lambda_{i_0,j}\,(S_{i_0+1,j+1} - F_{i_0,j})} . \tag{3.75}$$

The trees by Derman and Kani (1994b) and Barle and Cakici (1998) differ in the choice of the strike prices and the centering condition. Derman and Kani (1994b) put $K = s_{i_0,j}$ and $S = s_{1,1}$, i.e. they fix the center of the tree at the current asset price. Barle and Cakici (1998) choose $K = F_{i_0,j}$ and $S = s_{1,1}\,e^{(r-\delta)t}$, i.e. their tree bends upward with the risk-neutral drift. They show that this choice produces a better fit to the IV smile, especially when interest rates are very high.
Both trees are calibrated to the entire set of available option prices, both across the strike dimension and across the term structure of the IVS. However, an inherent difficulty in both trees is the fact that neither of them can prevent transition probabilities from being negative. From negative transition probabilities, arbitrage possibilities ensue. Derman and Kani (1994b) avoid this by checking node by node whether $F_{i,j} < S_{i,j+1} < F_{i+1,j}$. If this condition is violated, they take a stock price that keeps the logarithmic spacing between neighboring nodes equal to the corresponding nodes at the previous level. Barle and Cakici (1998) propose to set $S_{i,j+1} = (F_{i,j} + F_{i+1,j})/2$. But even with these modifications, as the authors note, negative transition probabilities may not be avoided entirely.

Rubinstein (1994), Jackwerth (1997). Contrary to the above approach,
Rubinstein (1994) and Jackwerth (1997) construct the tree by backward
induction beginning from a risk neutral distribution at the terminal nodes. This
distribution is recovered by minimizing, in a least squares sense, the distance
to a prior distribution, which is obtained from the binomial distribution of a
standard CRR tree. The minimization is carried out subject to the conditions of
being a distribution (positivity, summability to one), and subject to correctly
pricing the observed (European) options and the asset under the new measure.
Different measures of distance do not appear to strongly affect the resulting
risk neutral distribution, Jackwerth and Rubinstein (2001).
The central assumption in the tree by Rubinstein (1994) is path independence
within the tree, i.e. a downward move followed by an upward move
is as likely as an upward move followed by a downward move. Given the known
asset prices Si0 +1,j+1 and Si0 ,j+1 at level j + 1 and the corresponding nodal
probabilities Qi0 +1,j+1 and Qi0 ,j+1 , the tree is constructed in three steps and
iterated from the terminal nodes to the first one:

Fig. 3.6. Construction of the implied binomial tree from level $j+1$ to level $j$
according to Rubinstein (1994) by backward induction. $S_{i_0,j}$ denotes the asset
price at node $(i_0,j)$ and $Q_{i_0,j}$ its risk neutral nodal probability. $q_{i_0,j}$ is the
(unknown) risk neutral transition probability from node $(i_0,j)$ to node
$(i_0+1,j+1)$. Quantities at level $j+1$ are known, while those at $j$ are unknown.

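\begin{align}
(1)\quad & Q_{i_0,j} = w(i_0+1,j+1)\, Q_{i_0+1,j+1} + \{1 - w(i_0,j+1)\}\, Q_{i_0,j+1} , \nonumber \\
(2)\quad & q_{i_0,j} = w(i_0+1,j+1)\, Q_{i_0+1,j+1} / Q_{i_0,j} , \tag{3.76} \\
(3)\quad & S_{i_0,j} = e^{-(r-\delta)\Delta t} \{ (1-q_{i_0,j})\, S_{i_0,j+1} + q_{i_0,j}\, S_{i_0+1,j+1} \} , \nonumber
\end{align}

where $q_{i,j}$ denotes again the risk neutral transition probability. $w(i,j) \stackrel{\mathrm{def}}{=} \frac{i-1}{j-1}$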
is a weight function, more precisely, the fraction of the nodal probability in
node (i, j) which is going down to its preceding lower node in (i−1, j −1). The
weight function is a consequence of the assumption of path independence and
derived from the arithmetics of the CRR tree, Jackwerth (1997). Note that
our notation follows Jackwerth (1997), but is adapted to observe consistency
with our previous presentation: our tree has the root node (1, 1), which is
different to both authors who start with zero.
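
The three steps in (3.76) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name, the list-based storage (index i - 1 within level j) and the CRR parameters used for the sanity check are our assumptions. Starting from a CRR terminal distribution, the backward induction should return the CRR transition probability at every node and the initial asset price at the root.

```python
import math


def rubinstein_backward(Q_term, S_term, r, delta, dt):
    """Backward induction (3.76) under path independence.

    Q_term, S_term: nodal probabilities and asset prices at the terminal
    level, ordered from the lowest node (i = 1) to the highest (i = n).
    Returns dicts Q, S, q keyed by level j; each list is indexed by i - 1.
    """
    n = len(Q_term)
    Q, S, q = {n: list(Q_term)}, {n: list(S_term)}, {}

    def w(i, j):
        # Rubinstein weight: share of Q[i, j] that flows down to (i-1, j-1)
        return (i - 1) / (j - 1)

    disc = math.exp(-(r - delta) * dt)
    for j in range(n - 1, 0, -1):
        Q[j], S[j], q[j] = [], [], []
        for i in range(1, j + 1):
            Q_up, Q_dn = Q[j + 1][i], Q[j + 1][i - 1]  # nodes (i+1, j+1), (i, j+1)
            S_up, S_dn = S[j + 1][i], S[j + 1][i - 1]
            from_up = w(i + 1, j + 1) * Q_up
            Q_ij = from_up + (1.0 - w(i, j + 1)) * Q_dn        # step (1)
            q_ij = from_up / Q_ij                               # step (2)
            S_ij = disc * ((1.0 - q_ij) * S_dn + q_ij * S_up)   # step (3)
            Q[j].append(Q_ij)
            S[j].append(S_ij)
            q[j].append(q_ij)
    return Q, S, q


# Sanity check with a CRR terminal distribution (hypothetical parameters).
S0, sigma, r, delta, dt, steps = 100.0, 0.2, 0.03, 0.0, 0.1, 5
u = math.exp(sigma * math.sqrt(dt))
d = 1.0 / u
p = (math.exp((r - delta) * dt) - d) / (u - d)
Q_T = [math.comb(steps, k) * p**k * (1 - p) ** (steps - k) for k in range(steps + 1)]
S_T = [S0 * u**k * d ** (steps - k) for k in range(steps + 1)]
Q, S, q = rubinstein_backward(Q_T, S_T, r, delta, dt)
```

The sketch assumes strictly positive terminal probabilities, so that the division in step (2) is well defined.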
An interesting feature of the trees implied by backward induction is that
negative transition probabilities cannot occur by construction. This can di-
rectly be seen from (3.76). However, the crucial assumption in the tree by Ru-
binstein (1994) is the aforementioned property of path independence. While
it facilitates the tree's construction enormously, it is also its biggest weakness:
the tree is calibrated to options of a single maturity only. This may be
disadvantageous when pricing exotic options whose expiry does not match
the maturity of the options used as inputs. This deficiency is remedied
by Jackwerth (1997) in allowing for more arbitrary weight functions w(i, j).
More precisely, he proposes the piecewise function:

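\[
w(i,j) = \begin{cases}
2w\, \frac{i-1}{j-1} & \text{for } 0 \le \frac{i-1}{j-1} \le \frac{1}{2} \\
-1 + 2w + (2-2w)\, \frac{i-1}{j-1} & \text{for } \frac{1}{2} < \frac{i-1}{j-1} \le 1
\end{cases} , \tag{3.77}
\]

where $i = 1, \ldots, j$ and $0 < w < 1$ is some value that allows $w(i,j)$ to
be concave or convex in $\frac{i-1}{j-1}$. Concavity implies that a path moving down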
and then up, is more likely than a path moving up and afterwards down.
For w = 0.5, w(i, j) collapses to the Rubinstein (1994) case. The choice of
w can be added to the least squares problem used to recover the posterior
risk neutral distribution. Jackwerth (1997) reports that a concave weight, i.e.
w > 0.5, explains the post-crash data (beginning from 1987) best.
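
As a small illustration, the piecewise linear weight (3.77) may be coded as follows. The function name and argument order are our own choices; the sketch simply evaluates the two branches and can be used to check that w = 0.5 reproduces the Rubinstein (1994) weight and that the function is continuous at one half.

```python
def jackwerth_weight(i, j, w=0.5):
    """Piecewise linear weight (3.77); needs j >= 2, 1 <= i <= j, 0 < w < 1."""
    x = (i - 1) / (j - 1)
    if x <= 0.5:
        return 2.0 * w * x
    return -1.0 + 2.0 * w + (2.0 - 2.0 * w) * x


# w = 0.5 gives back the linear weight (i - 1) / (j - 1)
linear_case = [jackwerth_weight(i, 5) for i in range(1, 6)]
# a concave choice, w > 0.5, lifts the weight at the midpoint to w
concave_mid = jackwerth_weight(3, 5, w=0.7)
```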
Generalized binomial implied trees preserve the property that non-positive
transition probabilities cannot occur, while at the same time the entire term
structure of options can be employed for their construction. Furthermore,
unlike the trees by Derman and Kani (1994b), Barle and Cakici (1998), and the
trinomial tree by Derman et al. (1996a) to be discussed next, they are easily
calibrated to non-European style options. A semi-recombining version of the
trees by Rubinstein (1994) and Jackwerth (1997) is proposed by Nagot and
Trommsdorff (1999).

Local volatilities. Given an implied tree, the local volatility σi,j at asset
price level i in time step j is calculated via:

\begin{align}
\mu_{i,j} &= q_{i,j}\, R_{i+1,j+1} + (1 - q_{i,j})\, R_{i,j+1} , \nonumber \\
\sigma^2_{i,j} &= q_{i,j}\, (R_{i+1,j+1} - \mu_{i,j})^2 + (1 - q_{i,j})\, (R_{i,j+1} - \mu_{i,j})^2 , \tag{3.78}
\end{align}

where Ri+1,j denotes the return between the node (i, j − 1) and (i + 1, j) in
the tree. Note that the local volatility may need to be annualized to make it
comparable with IV. If we hold the horizon T of the tree fixed, and let the step
size shrink to zero, the approximation tends to the local variance function of
the corresponding underlying continuous time process.
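
A minimal sketch of (3.78) for a single binomial node is given below. Two choices are assumptions on our part and not prescribed by the text: returns are taken as log returns, and annualization is done by dividing the one-step variance by the step length. The numerical inputs are illustrative and mirror the order of magnitude of a tree with a step length of 0.1 years.

```python
import math


def local_vol_binomial(S, S_up, S_dn, q, dt):
    """Annualized local volatility at one node following (3.78).

    Assumptions: log returns; annualization via division of the
    one-step variance by dt.
    """
    R_up = math.log(S_up / S)
    R_dn = math.log(S_dn / S)
    mu = q * R_up + (1.0 - q) * R_dn
    var = q * (R_up - mu) ** 2 + (1.0 - q) * (R_dn - mu) ** 2
    return math.sqrt(var / dt)


# Illustrative node: 100 moves to 103.2 or 96.9 over dt = 0.1 years
sigma_loc = local_vol_binomial(100.0, 103.2, 96.9, q=0.492, dt=0.1)
```

With these inputs the local volatility comes out close to 0.10.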

Example. At this point, we illustrate the deterministic implied binomial trees
using the Derman and Kani (1994b) approach. We put $r = \delta = 0$. As IVS
function, we use (also displayed in Figure 3.7):
\[
\hat\sigma = \frac{-0.2}{\{\ln(K/S)\}^2 + 1} + 0.3 . \tag{3.79}
\]
Thus, we do not model a term structure of the IVS. From this IV function,
the BS option prices are computed, which are employed for growing the tree.
Practically, this could be the smile function obtained from the smoothing
techniques in Chapter 4.
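
The inputs to the tree can be generated as in the following sketch: the smile (3.79) is evaluated at a strike and plugged into the BS call formula. The function names are ours, and the normal distribution function is built from the error function of the standard library.

```python
import math


def iv_smile(K, S):
    """IV function (3.79); no term structure is modeled."""
    return -0.2 / (math.log(K / S) ** 2 + 1.0) + 0.3


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, tau, sigma, r=0.0, delta=0.0):
    """BS price of a European call with dividend yield delta."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


S0, T = 100.0, 0.5
strikes = [90.0, 100.0, 110.0]
calls = [bs_call(S0, K, T, iv_smile(K, S0)) for K in strikes]
```

At the money the smile takes the value 0.1 and increases symmetrically in log-moneyness in both directions.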
Let’s assume that S0 = 100, and T = 0.5 years discretized in five time
steps. In this case, the stock price evolution is found to be:

                                     117.9
                              113.8
                       110.1         110.0
                106.6         106.5
         103.2         103.2         103.2
  100.0         100.0         100.0
          96.9          96.9          96.9
                 93.8          93.9
                        90.8          90.9
                               87.8
                                      84.8

The tree of the upward transition probabilities is given by:

                              0.483
                       0.486
                0.488         0.488
         0.490         0.490
  0.492         0.492         0.492
         0.494         0.494
                0.496         0.496
                       0.498
                              0.500

and, finally, the tree of the Arrow-Debreu prices is:

                                     0.028
                              0.057
                       0.118         0.148
                0.241         0.242
         0.492         0.370         0.310
  1.000         0.502         0.378
         0.508         0.382         0.320
                0.257         0.258
                       0.130         0.162
                              0.065
                                     0.033

Fig. 3.7. Convex IV smile (squares) computed from (3.79) and local volatility
(circles) recovered from the implied binomial tree (filled circles) and
trinomial tree (empty circles). Panel title: Implied vs local volatility from
implied trees; horizontal axis: strike. SCMibtITTconv.xpl

Fig. 3.8. Monotonic IV smile (squares) computed from $\hat\sigma = -0.06 \ln(K/S) + 0.15$
and local volatility (circles) recovered from the implied binomial tree (filled
circles) and trinomial tree (empty circles). Panel title: Implied vs local
volatility from implied trees; horizontal axis: strike. SCMibtITTmon.xpl

Exotic options of European style can be priced by evaluating the payoff
function at each terminal node, multiplying it by the Arrow-Debreu price at this
node, and summing over all terminal nodes. Since r = 0 we do not need to
discount. For instance, for K = 100, the price of a digital call is the sum of
the Arrow-Debreu prices for $S_T > K$: $C_{dig}(100, 1) = 0.485$. For
path-dependent options, one calculates the path probabilities from the
transition probabilities and iterates through
the tree by backward induction.
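
The pricing rule for European style payoffs can be sketched as follows, using the terminal stock prices and Arrow-Debreu prices of the binomial example as printed above (rounded to three digits). The function name and the optional discount argument are our assumptions.

```python
def price_european(payoff, terminal_prices, arrow_debreu, discount=1.0):
    """Sum of payoff times Arrow-Debreu price over all terminal nodes."""
    return discount * sum(ad * payoff(s) for s, ad in zip(terminal_prices, arrow_debreu))


# Terminal level of the binomial example, lowest node first (rounded values)
S_T = [84.8, 90.9, 96.9, 103.2, 110.0, 117.9]
AD = [0.033, 0.162, 0.320, 0.310, 0.148, 0.028]

digital_call = price_european(lambda s: 1.0 if s > 100.0 else 0.0, S_T, AD)
```

Because the table entries are rounded, the sum agrees with the reported value only up to the third digit.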
From Equation (3.78), the tree of local volatilities is calculated as:

                              0.109
                       0.105
                0.102         0.102
         0.100         0.100
  0.100         0.100         0.099
         0.100         0.100
                0.102         0.102
                       0.105
                              0.109

In Figure 3.7, we display the smile together with the terminal local volatil-
ities (filled circles). It is seen that near ATM the local volatility smile is at the
levels of the IV smile, but increases in either direction from ATM. This is due
to the fact that the IV smile is convex. If it were monotonically decreasing,
local volatility would be below IV in the right-hand side of the figure. This is
seen for another example in Figure 3.8. The two-times-IV-slope-rule is visible
as well, Section 3.7.

Derman et al. (1996a). Trinomial trees provide a more flexible approximation
to the state space than a binomial tree, Figure 3.9: from each node (i, j)
at a (known) stock price si,j , there is the possibility of an upward move to
Si+2,j+1 , a downward move to Si,j+1 , and an intermediate move to Si+1,j+1 .
Again we let the node index i run from the bottom to the top. As will become
clear in the following, unlike the implied binomial tree which is uniquely deter-
mined (up to its trunk), the trinomial tree is underdetermined. At each node
(i, j) there are five unknowns: three subsequent stock prices and two transi-
tion probabilities. Consequently, Derman et al. (1996a) propose to fix a priori
the state space of the asset price evolution and to reduce the construction of
the tree to backing out the transition probabilities by forward induction. We
thus assume in the following that the asset price evolution has already been
specified.
The trinomial tree is recovered as the binomial one. First, as in (3.69), the
risk neutrality condition is

Fig. 3.9. Construction of the implied trinomial tree from level $j$ to level $j+1$
according to Derman et al. (1996a) by forward induction. $s_{i_0,j}$ denotes the
(known) stock price at node $(i_0,j)$, $S_{i_0,j+1}$ the (known, since a priori
specified) stock price at node $(i_0,j+1)$. $q^u_{i_0,j}$ is the (unknown) risk neutral
transition probability from node $(i_0,j)$ to the upper node $(i_0+2,j+1)$,
$q^d_{i_0,j}$ to $(i_0,j+1)$; the intermediate move to $(i_0+1,j+1)$ has probability
$1 - q^u_{i_0,j} - q^d_{i_0,j}$. At level $j$ there are $i = 1, \ldots, (2j-1)$ nodes $(i,j)$.

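\[
F_{i,j} = q^u_{i,j}\, S_{i+2,j+1} + (1 - q^u_{i,j} - q^d_{i,j})\, S_{i+1,j+1} + q^d_{i,j}\, S_{i,j+1} , \tag{3.80}
\]

and the option pricing equation (3.71) for calls maturing one period later
becomes:
\[
C(K, t_{j+1}) = e^{-r\Delta t} \sum_{i=1}^{2j+1} \lambda_{i,j+1} \max(S_{i,j+1} - K, 0) , \tag{3.81}
\]

where
\[
\lambda_{i,j+1} = \begin{cases}
\lambda_{2j-1,j}\, q^u_{2j-1,j} & \text{for } i = 2j+1 \\
\lambda_{2j-2,j}\, q^u_{2j-2,j} + \lambda_{2j-1,j}\, (1 - q^u_{2j-1,j} - q^d_{2j-1,j}) & \text{for } i = 2j \\
\lambda_{i-2,j}\, q^u_{i-2,j} + \lambda_{i-1,j}\, (1 - q^u_{i-1,j} - q^d_{i-1,j}) + \lambda_{i,j}\, q^d_{i,j} & \text{for } i = 3, \ldots, 2j-1 \\
\lambda_{1,j}\, (1 - q^u_{1,j} - q^d_{1,j}) + \lambda_{2,j}\, q^d_{2,j} & \text{for } i = 2 \\
\lambda_{1,j}\, q^d_{1,j} & \text{for } i = 1
\end{cases} . \tag{3.82}
\]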
In fixing the strike of the option at K = Si0 +1,j+1 , Derman et al. (1996a)
show that (3.71) together with (3.80) can be solved for the unknown transition
probabilities:
\[
q^u_{i_0,j} = \frac{e^{r\Delta t}\, C(S_{i_0+1,j+1}, t_{j+1}) - \sum_{i=i_0+1}^{2j-1} \lambda_{i,j}\, (F_{i,j} - S_{i_0+1,j+1})}
                   {\lambda_{i_0,j}\, (S_{i_0+2,j+1} - S_{i_0+1,j+1})} , \tag{3.83}
\]
while $q^d_{i_0,j}$ follows immediately from (3.80). This determines the upper tree
from the center, while the lower part is grown from
\[
q^d_{i_0,j} = \frac{e^{r\Delta t}\, P(S_{i_0+1,j+1}, t_{j+1}) - \sum_{i=1}^{i_0-1} \lambda_{i,j}\, (S_{i_0+1,j+1} - F_{i,j})}
                   {\lambda_{i_0,j}\, (S_{i_0+1,j+1} - S_{i_0,j+1})} . \tag{3.84}
\]
Again, $q^u_{i_0,j}$ is given by (3.80).
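
The recursion (3.82) for the Arrow-Debreu prices simply distributes the price of each node at level j over its three successors. A compact sketch, with a function name and list layout (lowest node first) of our own choosing, reads as follows. The inputs of the usage example are the rounded transition probabilities of the first two levels of the trinomial example further below.

```python
def propagate_arrow_debreu(lam, q_up, q_dn):
    """One forward step of (3.82).

    lam, q_up, q_dn: values at the 2j-1 nodes of level j, lowest node first.
    Returns the 2j+1 Arrow-Debreu prices at level j+1.
    """
    nxt = [0.0] * (len(lam) + 2)
    for k, (l, qu, qd) in enumerate(zip(lam, q_up, q_dn)):
        nxt[k] += l * qd                    # downward move to node i
        nxt[k + 1] += l * (1.0 - qu - qd)   # intermediate move to node i+1
        nxt[k + 2] += l * qu                # upward move to node i+2
    return nxt


level_2 = propagate_arrow_debreu([1.0], [0.244], [0.256])
level_3 = propagate_arrow_debreu(level_2, [0.250, 0.244, 0.250], [0.262, 0.256, 0.262])
```

By construction the Arrow-Debreu prices at each level sum to one when no discounting is applied, as in the example with r = 0.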
Trinomial trees can be considered to be advantageous compared to bino-
mial ones, since with the same number of steps, the approximation to the
diffusion is finer. Thus, pricing is more accurate at a given number of steps.
Furthermore they provide more flexibility, which – if judiciously handled –
may help avoid negative transition probabilities as encountered in the bi-
nomial trees implied from forward induction. As a drawback, one needs to
specify a priori the state space of the evolution of the asset price. Derman et
al. (1996a) discuss several techniques of doing so, usually taking an equally
spaced trinomial tree as starting point. From our experience, the more curved
the IV function is, the easier the standard CRR tree as state space is over-
taxed: more and more transition probabilities need to be overridden, which
can produce unlikely local volatilities. Thus, the challenge in trinomial trees
lies in an appropriate choice of the state space, which should immediately
reflect the structure of the – at this point unknown! – local volatility function.

Local volatilities. In trinomial trees, local volatilities are computed via an
obvious generalization of (3.78).

Example. We illustrate the implied trinomial tree. For comparison, we put
ourselves in the same situation as before. As IVS function, we use again (3.79).
For the trinomial tree, the stock price evolution fixed a priori from the
CRR tree is:
                                     125.1
                              119.6  119.6
                       114.4  114.4  114.4
                109.4  109.4  109.4  109.4
         104.6  104.6  104.6  104.6  104.6
  100.0  100.0  100.0  100.0  100.0  100.0
          95.6   95.6   95.6   95.6   95.6
                 91.4   91.4   91.4   91.4
                        87.4   87.4   87.4
                               83.6   83.6
                                      80.0

The tree of the upward transition probabilities is given by:

                              0.466
                       0.393  0.346
                0.296  0.276  0.267
         0.250  0.249  0.246  0.245
  0.244  0.244  0.243  0.242  0.241
         0.250  0.249  0.246  0.245
                0.296  0.276  0.267
                       0.393  0.346
                              0.466

and the tree of the downward transition probabilities by:

                              0.487
                       0.411  0.362
                0.309  0.289  0.279
         0.262  0.260  0.257  0.256
  0.256  0.256  0.254  0.253  0.252
         0.262  0.260  0.257  0.256
                0.309  0.289  0.279
                       0.411  0.362
                              0.487

Finally, the tree of the Arrow-Debreu prices is:

                                     0.003
                              0.007  0.010
                       0.018  0.027  0.038
                0.061  0.084  0.100  0.108
         0.244  0.241  0.229  0.215  0.202
  1.000  0.500  0.378  0.316  0.278  0.251
         0.256  0.253  0.240  0.224  0.211
                0.067  0.092  0.110  0.118
                       0.021  0.031  0.044
                              0.008  0.012
                                     0.004

In this case, the price of the digital call is Cdig (100, 1) = 0.361. The large
difference in the results of the two trees is of course due to the small number of
levels used in the simulation. After increasing the levels, both prices converge
to Cdig (100, 1) ≈ 0.40. From (3.78) the tree of local volatilities is calculated
as:

                              0.138
                       0.127  0.119
                0.110  0.106  0.104
         0.101  0.101  0.100  0.100
  0.100  0.100  0.100  0.099  0.099
         0.101  0.101  0.100  0.100
                0.110  0.106  0.104
                       0.127  0.119
                              0.138

In Figure 3.7, we display the smile together with the terminal local volatil-
ities of the binomial (filled circles) and trinomial trees (empty circles). Natu-
rally, the trinomial tree is more finely spaced.

3.10.2 Stochastic implied trees

Stochastic implied trees are stochastic extensions of the models discussed up
to now and combine Monte Carlo and lattice approaches. They have been
troduced as tractable implementations of the continuous time stochastic local
volatility models as presented in Section 3.8. Let (Ω, F, Q) be a probability
space with some martingale measure Q, which is equipped with the filtra-
tion (Ft )0≤t≤T . The key idea is to stochastically perturb the LVS observed
for a given set of option prices. While the asset price, which is adapted to
(Ft )0≤t≤T , moves randomly from node to node through the state space, local
transition probabilities between the nodes vary as well, thereby reflecting the
stochastic perturbations in local volatilities.

Derman and Kani (1998). Starting point in Derman and Kani (1998) is
the trinomial tree introduced by Derman et al. (1996a) which is calibrated to
the set of observed option prices. Next local volatilities are perturbed by the
discretized SDE
\[
\Delta\sigma^2_{m,n}(i,j) = \sigma^2_{m,n}(i,j) \left\{ \tilde\alpha_{m,n}(i,j)\, \Delta t_j + \theta\, \Delta W^{(1)} \right\} , \tag{3.85}
\]

where the pair $(i,j)$ denotes the node $(S_i, t_j)$ in the tree, while $(m,n)$ denotes
all future nodes in the tree. This equation is meant to discretize the SDE
in (3.57).

The volatility parameter $\theta$ is chosen in advance, e.g. via the principal
component analysis (PCA) presented in Chapter 5, while the drift coefficients
$\tilde\alpha_{m,n}(i,j)$ are obtained from the no-arbitrage requirement that the total
probabilities $Q_{m,n}(i,j)$ of arriving at the future nodes $(m,n)$ from the fixed
initial node $(i,j)$ must jointly be martingales for all future nodes $(m,n)$. Next
a random vector denoted by $(\Delta W^{(0)}, \Delta W^{(1)})^\top$ is drawn. The first entry is used to
determine a new level of the underlying asset given by the three subsequent
nodes in the tree. The second one is directly inserted into (3.85) to arrive at
a new location for the entire volatility surface $\sigma^2_{m,n}(i,j+1)$. In the following,
all steps described are repeated for each node (i, j) in the tree. After each
draw of the random sample, the new drift coefficients are calculated from the
conditions on $Q_{m,n}(i,j)$, and so on. Thus, one generates many sample paths
through the tree as random realizations of arbitrage-free dynamics.
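
The perturbation step (3.85) alone may be sketched as follows. This is a deliberately reduced illustration: the no-arbitrage computation of the drift coefficients is not implemented, and the drifts are passed in as given numbers (set to zero as a placeholder in the usage example). The dictionary layout, the function name and all numerical values are hypothetical.

```python
import math
import random


def perturb_local_variance(var, alpha, theta, dt, dW1):
    """Apply (3.85) once to the local variances of all future nodes (m, n).

    var, alpha: dicts keyed by node (m, n). The drifts alpha are assumed
    to be supplied from outside; they are not derived here.
    """
    return {node: v + v * (alpha[node] * dt + theta * dW1) for node, v in var.items()}


rng = random.Random(0)
dt = 0.1
dW1 = rng.gauss(0.0, math.sqrt(dt))           # increment of the volatility noise
var = {(1, 2): 0.0100, (2, 2): 0.0104, (3, 2): 0.0110}
alpha = {node: 0.0 for node in var}           # placeholder drifts
new_var = perturb_local_variance(var, alpha, theta=0.5, dt=dt, dW1=dW1)
```

With identical drifts at all nodes, the shock rescales every local variance by the same factor.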
In specification (3.85), $W^{(1)}$ is interpreted as a proportional shift to all
local volatilities. This corresponds to the main source of noise in the IVS as
shall be seen in Chapter 5. It is natural to assume that this holds also for
the LVS. Of course, multi-factor, node-dependent dynamics for the LVS could
be specified as well. Also parametric choices of the eigenfunctions recovered
by the functional PCA methods in Section 5.3 could be used. They could be
chosen in a way to model slope and twist shocks in the surface.
Stochastic implied trees are a flexible framework for option pricing and
hedging, since they also comprise non-Markovian volatility processes. How-
ever, the calculation of the drift-parameters becomes increasingly involved,
and the tree must be recalculated in each single simulation step, which is com-
putationally very demanding. Also, since the state space remains fixed from
the beginning, negative transition probabilities may occur when the volatili-
ties become very large. They need to be overwritten manually. Derman and
Kani (1998) report for their simulations that this occurred in less than 3% of
all paths simulated.

Britten-Jones and Neuberger (2000). In following the work by Derman
and Kani (1998), Britten-Jones and Neuberger (2000) propose a trinomial
implied tree that allows for stochastic volatility. Unlike the former approach,
their setting is Markovian, but for this reason also much simpler. As usual in
the trinomial tree framework, they start on a discrete time interval h = ∆t
by fixing the state space of the asset price evolution under the risk neutral
measure Q. The state space is chosen to be a finite geometric series (without
loss of generality):

\[
\mathcal{K} = \{ K \mid K = S_0 u^j ,\ j = 0, \pm 1, \pm 2, \ldots, \pm T/h \} , \tag{3.86}
\]

where $u > 0$. Additionally, they require that if $|j - k| > 1$, then
$Q(S_{t+h} = S_0 u^k \mid S_t = S_0 u^j) = 0$ with $j, k = 0, \pm 1, \pm 2, \ldots, \pm T/h$. The latter
assumption can be thought of as a continuity assumption. As data input they
require that a complete set of European calls $C(K, t)$ for all expirations
$t \in \mathcal{T} = \{0, h, 2h, \ldots, T\}$ and strikes $K \in \mathcal{K}$ be given.
Define the quantities
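\begin{align}
\Pi(K, t) &\stackrel{\mathrm{def}}{=} \frac{C(Ku, t) - (1+u)\, C(K, t) + u\, C(K/u, t)}{K(u-1)} , \tag{3.87} \\
\Lambda(K, t) &\stackrel{\mathrm{def}}{=} \frac{C(K, t+h) - C(K, t)}{C(Ku, t) - (1+u)\, C(K, t) + u\, C(K/u, t)} . \tag{3.88}
\end{align}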
Note that Π(K, t) is the cost of a butterfly spread paying one euro, if St = K,
and zero otherwise. Thus it is the Arrow-Debreu security in this framework.
Assuming that the asset price process adapted to the filtration (Ft )0≤t≤T
is a martingale with respect to the risk neutral measure Q, they show:

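\[
Q(S_t = K \mid \mathcal{F}_0) = \Pi(K, t) \quad \text{for all } t \in \mathcal{T},\ K \in \mathcal{K} , \tag{3.89}
\]

and

\[
Q(S_{t+h} = K' \mid S_t = K, \mathcal{F}_0) = \begin{cases}
\Lambda(K, t) & \text{if } K' = Ku \\
1 - (1+u)\, \Lambda(K, t) & \text{if } K' = K \\
u\, \Lambda(K, t) & \text{if } K' = K/u
\end{cases} . \tag{3.90}
\]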

In (3.89) it is seen that the probability of the asset arriving at any price
level in the tree on a future date t ∈ T is fully determined by an initial set of
option prices. However, it is also obvious from (3.89) and (3.90) that this does
not determine the probability of a specific price path, since the conditioning
information in (3.90) is neither Ft nor the price history up to t. Thus prices
of exotic options are not unique. The probability of a price path would be
determined if (and only if) the price process were Markovian, i.e. if

Q(St = K|Ft−1 ) = Q(St = K|St−1 ) for all t . (3.91)

This would be the case if the volatility were fully deterministic in S and t.
Thus, under the assumption of a deterministic volatility this approach can be
used to recover the complete price process from option prices.
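
The quantities (3.87), (3.88) and the transition probabilities (3.90) translate directly into code. In the sketch below the call surface is passed as a function C(K, t); the toy surface used for the check is our own construction (all probability mass at the initial price today, and a single trinomial step afterwards) and serves only to verify that the formulas recover the inputs.

```python
def butterfly_price(C, K, t, u):
    """Pi(K, t) in (3.87): cost of a butterfly paying one unit if S_t = K."""
    return (C(K * u, t) - (1 + u) * C(K, t) + u * C(K / u, t)) / (K * (u - 1))


def lambda_ratio(C, K, t, h, u):
    """Lambda(K, t) in (3.88): calendar spread over butterfly."""
    return (C(K, t + h) - C(K, t)) / (C(K * u, t) - (1 + u) * C(K, t) + u * C(K / u, t))


def transition_probs(lam, u):
    """(3.90): probabilities of moving from K to K*u, K and K/u."""
    return lam, 1.0 - (1.0 + u) * lam, u * lam


# Toy call surface (purely illustrative assumptions)
S0, u, h, lam_true = 100.0, 1.1, 0.25, 0.2
p_up, p_mid, p_dn = transition_probs(lam_true, u)


def C(K, t):
    if t == 0.0:
        return max(S0 - K, 0.0)
    states = ((S0 * u, p_up), (S0, p_mid), (S0 / u, p_dn))
    return sum(p * max(s - K, 0.0) for s, p in states)


pi_0 = butterfly_price(C, S0, 0.0, u)      # all mass sits at S0 today
lam_0 = lambda_ratio(C, S0, 0.0, h, u)     # should recover lam_true
```

The three transition probabilities sum to one and leave the expected asset price unchanged, in line with the martingale assumption.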
Under stochastic volatility, however, all risk-neutral processes consistent
with the initial option prices share that the expectation of the squared returns
is given by:
\[
E^Q \left\{ \left( \frac{S_{t+h} - S_t}{S_t} \right)^2 \,\Big|\, S_t = K \right\} = \Lambda(K, t)\, \frac{(u-1)^2 (u+1)}{u} . \tag{3.92}
\]

This is a necessary and sufficient condition, and the discrete-time counterpart
of the Dupire formula (3.19) in this tree.
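
For orientation, the right-hand side of (3.92) can be checked by inserting the one-step transition probabilities (3.90): the return is $u - 1$ with probability $\Lambda(K,t)$, $1/u - 1$ with probability $u\Lambda(K,t)$, and zero otherwise. The following short computation is our own verification and is not taken from the cited paper:

```latex
\begin{align*}
E^Q \left\{ \left( \frac{S_{t+h} - S_t}{S_t} \right)^2 \Big| S_t = K \right\}
  &= \Lambda(K,t)\,(u-1)^2 + u\,\Lambda(K,t) \left( \frac{1}{u} - 1 \right)^2 \\
  &= \Lambda(K,t)\,(u-1)^2 \left( 1 + \frac{1}{u} \right)
   = \Lambda(K,t)\, \frac{(u-1)^2 (u+1)}{u} .
\end{align*}
```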
To implement their stochastic volatility framework Britten-Jones and
Neuberger (2000) assume the existence of a time-homogeneous Markov chain $Z$
that affects the one-step transition probabilities, i.e. the local volatilities
in the tree. The chain $Z \in \{1, 2, \ldots, N\}$ takes values on a set of
integers with the transition matrix defined by its elements $Q = (q_{m,n})$, where
$q_{m,n} = Q(Z_{t+h} = m \mid Z_t = n, \mathcal{F}_t)$. The transition probabilities are chosen
independently and depend on the specific volatility process to be modelled.
Define $\Pi(K, t, z) \stackrel{\mathrm{def}}{=} Q(S_t = K \text{ and } Z_t = z \mid \mathcal{F}_0)$ and
$\Lambda(K, t, z) \stackrel{\mathrm{def}}{=} Q(S_{t+h} = Ku \mid S_t = K \text{ and } Z_t = z, \mathcal{F}_0)$.
The authors show that – in order to be consistent
with the initial set of option prices – $\Lambda(K, t, z)$ must satisfy:
\[
\Lambda(K, t)\, \Pi(K, t) = \sum_{n=1}^{N} \Lambda(K, t, n)\, \Pi(K, t, n) . \tag{3.93}
\]

The left-hand side of (3.93) is extracted from the option data. In order to
identify the right-hand side they put $\Lambda(K, t, z) = \tilde q(K, t)\, v(z)$, where $v(z)$ is
an exogenously chosen volatility function depending on the state $z$, and $\tilde q(K, t)$
a multiplicative, node-dependent drift adjustment.
If all $\Pi(K, t, z)$ and $\tilde q(K, t)$ are known for all prices $K$ and volatility states
$z$ up to $t$, forward induction of the tree is done via the following two steps:
first imply
\begin{align}
\Pi(K, t+h, z) = \sum_{n=1}^{N} q_{z,n} \Big[ & \Lambda(K/u, t, n)\, \Pi(K/u, t, n) + u\, \Lambda(Ku, t, n)\, \Pi(Ku, t, n) \nonumber \\
 & + \{1 - (1+u)\, \Lambda(K, t, n)\}\, \Pi(K, t, n) \Big] . \tag{3.94}
\end{align}

Second calculate the adjustments from
\[
\tilde q(K, t+h) = \frac{\Lambda(K, t+h)\, \Pi(K, t+h)}{\sum_{n} v(n)\, \Pi(K, t+h, n)} , \quad \text{for } K = S_0 u^j ,\ |j| \le t/h . \tag{3.95}
\]

The first step (3.94) follows a discrete version of the forward Kolmogorov
equation, in that the probability of a time-dependent state event is expressed
as the sum of the products of the preceding events and the one-step transition
probabilities. Equation (3.95) is obtained from (3.93).
Pricing works via backward valuation. Let V (K, t, z) be the value of an
option depending on level K and volatility state z. It has the terminal payoff
V (K, T, z). By the backward iteration
\begin{align}
V(K, t-h, z) = \sum_{n=1}^{N} q_{z,n} \Big[ & \Lambda(K/u, t-h, n)\, V(Ku, t, n) + u\, \Lambda(Ku, t-h, n)\, V(K/u, t, n) \nonumber \\
 & + \{1 - (1+u)\, \Lambda(K, t-h, n)\}\, V(K, t, n) \Big] , \tag{3.96}
\end{align}
the price of the option is computed. Any contingent claim can be valued using
the lattice but the prices depend on the volatility process chosen.
The approach by Britten-Jones and Neuberger (2000) is an elegant and
fast methodology for valuing options under stochastic volatility. It allows for
a wide range of volatility specifications including mean-reversion, GARCH, or
regime-switching models. Rossi (2002) investigates the ability of this model
to capture the smile dynamics among alternative volatility specifications.
Another recent advance in stochastic local volatility models is an approach
by Alexander et al. (2003): they model the local volatility function by a
stochastic mixture of local variances derived from a small number of base
processes. From this point of view they extend the work by Brigo and
Mercurio (2001) discussed in Section 3.10.3. Alexander et al. (2003) report that
the model captures the patterns of the IVS very well for both short and long
times to maturity. Overall, stochastic local volatility models appear to be a
fruitful line of research. Their empirical performance in hedging and pricing,
for instance along the lines of Dumas et al. (1998) and Rosenberg (2000),
remains to be investigated more deeply.

3.10.3 Reconstructing the LVS

Parametric approaches

In this section, we survey approaches that aim at identifying the LVS as a
continuous function. In parametric approaches a functional form of the local
volatility is chosen and calibrated to the market data. As pioneering work for
alternative volatility specifications one may consider the constant elasticity
of variance model due to Cox and Ross (1976). In this model instantaneous
volatility is specified as
\[
\sigma(S_t, t) = \sigma S_t^{\alpha - 1} , \tag{3.97}
\]
with constants $\sigma, \alpha > 0$. Since volatility is a deterministic function in $S_t$, the
LVS is:
\[
\sigma_{K,T}(S_t, t) = \sqrt{\sigma^2(K, T)} = \sigma K^{\alpha - 1} . \tag{3.98}
\]

For $\alpha = 1$ we obtain the BS case. When $\alpha < 1$, the volatility increases as
the stock price decreases. This corresponds to a transition probability
function with heavy left tail and less heavy right tail. Consequently, this model
produces a downward sloping IV smile.
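
A two-line sketch makes the leverage effect in (3.98) visible. The parameter values are hypothetical and chosen only for illustration.

```python
def cev_local_vol(K, sigma, alpha):
    """Local volatility (3.98) of the constant elasticity of variance model."""
    return sigma * K ** (alpha - 1.0)


# alpha = 1: flat BS volatility; alpha < 1: volatility rises as the price falls
flat = [cev_local_vol(K, 0.2, 1.0) for K in (80.0, 100.0, 120.0)]
skewed = [cev_local_vol(K, 2.0, 0.5) for K in (80.0, 100.0, 120.0)]
```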
Another class of models that received recent attention are the quadratic
volatility models, Ingersoll (1997) and Rady (1997). Note however that for
this class of models, the term volatility refers to the function $\tilde\sigma(S_t, t)$ in the
SDE of the form:
\[
dS_t = \mu(S_t, t)\, dt + \tilde\sigma(S_t, t)\, dW_t , \tag{3.99}
\]
which is unlike our terminology. Typically $\tilde\sigma(S_t, t) \stackrel{\mathrm{def}}{=} \gamma(t)\, p(S_t)$ for a strictly
positive and bounded function $\gamma$ and a quadratic polynomial $p(x) = a + bx + cx^2$.
Zühlsdorff (2002) shows existence and uniqueness of the solution
to (3.99) and also discusses option pricing, when p has no, one and two real
roots. According to his simulations this model is perfectly able to mimic the
smile patterns one usually observes in the markets. An empirical application
with bounded polynomials up to order two in asset prices and time to maturity
is given by Dumas et al. (1998).
Other more flexible specifications have been proposed: Brown and Randall
(1999) use sums of hyperbolic trigonometric functions designed to capture the
term structure, smile and skew effects in the surface. Piecewise quadratic and
cubic splines are employed by Beaglehole and Chebanier (2002) and Cole-
man et al. (1999), respectively. McIntyre (2001) approximates the LVS with
Hermite polynomials.
The general advantage of these approaches appears to be that the es-
timated LVS does not exhibit excessive spikes as fully nonparametric cali-
brations are prone to unless strongly regularized. However, the parametric
calibration problem can be underdetermined given the small number of ob-
served market prices and the large number of parameters. Thus, the optimal
parameters may not be uniquely identifiable, which may cause instability for
instance in the computation of value at risk measures, Bouchouev and Isakov
(1999).

Mixture diffusions

A very flexible parametric, yet parsimonious modeling strategy based on
mixture diffusions was introduced by Brigo and Mercurio (2001). Let the dynamics
of the asset price under the risk-neutral measure Q be given by:
\[
\frac{dS_t}{S_t} = (r - \delta)\, dt + \sigma(S_t, t)\, dW_t^{(0)} , \tag{3.100}
\]
where $\sigma(S_t, t)$ is a deterministic function satisfying the linear-growth
condition spelled out in Appendix B to guarantee a unique solution to this SDE.
Furthermore, we are given $N$ diffusions
\[
dS_t^{(i)} = (r - \delta)\, S_t^{(i)}\, dt + \theta_i\big(S_t^{(i)}, t\big)\, dW_t^{(0)} , \quad i = 1, \ldots, N , \tag{3.101}
\]
with common initial value $S_0$. The volatility functions $\theta_i(\cdot)$ satisfy similar
growth-conditions. Denote by $\phi_i(K, T \mid S_t, t)$ the risk neutral transition density
of these processes. The task is to identify the volatility function of (3.100) such
that the risk neutral transition density satisfies:
\[
\phi(K, T \mid S_t, t) = \sum_{i=1}^{N} \lambda_i\, \phi_i(K, T \mid S_t, t) , \tag{3.102}
\]
where $\lambda_i \ge 0$ and $\sum_{i=1}^{N} \lambda_i = 1$.
As shown in Brigo and Mercurio (2001), the solution is found by insert-
ing the candidate solution (3.102) into the Fokker-Planck equation (see Ap-
pendix B) and solving for the variance function by integrating twice. The
solution is given by:
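\[
\sigma^2(S_t, t) = \frac{\sum_{i=1}^{N} \lambda_i\, \theta_i^2(S_t, t)\, \phi_i(\cdot)}{\sum_{i=1}^{N} \lambda_i\, S_t^2\, \phi_i(\cdot)} . \tag{3.103}
\]

In the special case, where $\theta_i(S_t, t) \stackrel{\mathrm{def}}{=} \theta_i(t)\, S_t$, the variance can be written
as a weighted average of the individual variance functions:
\[
\sigma^2(S, t) = \sum_{i=1}^{N} \tilde\lambda_i\, \theta_i^2(t) , \tag{3.104}
\]
where $\tilde\lambda_i \stackrel{\mathrm{def}}{=} \lambda_i\, \phi_i(\cdot) / \phi(\cdot)$.
Hence the asset price process satisfies:
\[
\frac{dS_t}{S_t} = (r - \delta)\, dt + \sqrt{\sum_{i=1}^{N} \tilde\lambda_i\, \theta_i^2(t)}\; dW_t^{(0)} . \tag{3.105}
\]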

Brigo and Mercurio (2001) point out that the conditions for existence
and uniqueness of a strong solution to (3.105) must be given case by case for
different specifications of the base transition densities φi (·). Brigo et al. (2002)
and Brigo and Mercurio (2002) analyze the cases of mixtures of normals, log-
normals and sine-hyperbolic processes.
The elegance of this approach becomes apparent in option pricing, espe-
cially when there are analytical pricing formulae for the base transition den-
sities. Due to linearity of the integration and derivative operators, the price
Ht of an option is given by

\begin{align}
H_t &= e^{-r\tau}\, E^Q \{ \psi(S_T) \mid \mathcal{F}_t \} \nonumber \\
    &= e^{-r\tau} \int_0^\infty \psi(K) \sum_{i=1}^{N} \lambda_i\, \phi_i(K, T \mid S_t, t)\, dK \nonumber \\
    &= \sum_{i=1}^{N} \lambda_i\, H_t^{(i)} , \tag{3.106}
\end{align}

where $\psi$ is some payoff function and $H_t^{(i)}$ denotes the corresponding option
prices of the base processes. Also all greeks of $H_t$ are convex sums of the base
option greeks. In the special case of log-normal mixtures, option prices are
weighted sums of the BS prices of the options in the base processes which
makes the computation of prices particularly easy.
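
For the log-normal mixture, (3.106) reduces to a weighted sum of BS prices. The sketch below assumes constant base volatilities (so that each base price is a plain BS price) and uses hypothetical weights and volatilities; the function names are ours.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, tau, sigma, r=0.0, delta=0.0):
    """BS price of a European call with dividend yield delta."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def lognormal_mixture_call(S, K, tau, weights, vols, r=0.0, delta=0.0):
    """Call price (3.106) for a mixture of log-normal base processes.

    Assumption: each base volatility theta_i is constant over [t, T].
    """
    if min(weights) < 0.0 or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * bs_call(S, K, tau, v, r, delta) for w, v in zip(weights, vols))


# Hypothetical two-component mixture
price = lognormal_mixture_call(100.0, 100.0, 1.0, weights=[0.6, 0.4], vols=[0.15, 0.30])
```

The same weighting applies to the greeks, so that hedge ratios can be assembled from the base processes in the same way.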
The approach is beautiful, since it provides a close link between the lo-
cal volatility and the risk neutral transition density. In the aforementioned
approaches it is difficult, if not impossible, to determine the risk neutral tran-
sition density from its specific parameterization at hand. However, as was seen,
this is desirable as it can make the computation of hedge ratios and prices
more straightforward, especially, when closed-form solutions are available.

Nonparametric methods

As an alternative to the approaches above, another strand of literature aims
at recovering the full LVS directly from a set of observed option prices.
After estimating the LVS, it is implemented into pricing algorithms, e.g. finite
difference schemes to solve the generalized BS PDE, Randall and Tavella
(2000). Formally, reconstructing the LVS from option prices is an inverse
problem. Since the number of parameters for the calibration of the volatility
surface by far exceeds the number of observations, which is typically
very small, Section 2.5, the problem is ill-posed in general. A review on this
literature is given by Bouchouev and Isakov (1999). They distinguish three
main approaches of numerical methods: optimization based algorithms, extra-
and interpolation schemes and iterative procedures.
Optimization based algorithms recover the LVS directly from the gener-
alized BS PDE (2.67) or from the dual PDE (3.19) by optimizing some cost
functional subject to the appropriate boundary conditions. Due to the ill-
posedness of the problem, small perturbations of the input data tend to result
in very different solutions of the minimizing functional. In order to stabilize
the computation, regularization methods for calibration are implemented. In
the Tikhonov regularization, one adds a smoothing device which ensures that
the optimization problem has a unique solution under some goodness-of-fit
measure. For instance, Lagnado and Osher (1997) minimize the L2 -norm of
the gradient of the LVS such that the squared difference of the theoretical and
the observed prices is as close as possible to zero. The ‘closeness’ is steered by
a parameter to be chosen by the user. In each step the variational derivatives
are calculated for each point on a finite difference grid in a steepest descent
minimization. Berestycki et al. (2002) formulate the regularized cost func-
tional based on their asymptotic results reported in Section 3.6. Alternative
approaches using Tikhonov regularization are Jackson et al. (1998), Bodurtha
and Jermakyan (1999), Coleman et al. (1999), and Bodurtha (2000). As an
alternative means of regularization, Avellaneda et al. (1997) minimize the
relative-entropy distance to a prior distribution. They solve a constrained op-
timal control problem for a Bellman parabolic equation. An optimal control
framework is also chosen by Jiang and Tao (2001) and Jiang et al. (2003) to
determine the LVS.
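
To fix ideas, a generic Tikhonov-type cost functional of the kind described above may be written as follows. The notation is our own shorthand and not taken from any of the cited papers: $C(K_i, T_i; \sigma)$ denotes the model price of the $i$-th option under the local volatility function $\sigma$, $C_i^{\mathrm{obs}}$ its observed price, and $\lambda > 0$ the user-chosen regularization parameter.

```latex
\[
J(\sigma) = \sum_{i=1}^{n} \left\{ C(K_i, T_i; \sigma) - C_i^{\mathrm{obs}} \right\}^2
          + \lambda \int\!\!\int \left| \nabla \sigma(S, t) \right|^2 dS\, dt .
\]
```

The first term measures the fit to the data, while the second penalizes rough surfaces; a larger $\lambda$ yields a smoother LVS at the cost of a worse fit.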

Particularly simple methods are extra- and interpolation techniques. They
discretize the dual PDE (3.19) and extra- and interpolate the data for all
strikes and maturities. This is most conveniently achieved in the IV repre-
sentation of the LVS as derived in Equation (3.36). As extra- and interpola-
tion techniques Andersen and Brotherton-Ratcliffe (1997) and Dempster and
Richards (2000) employ cubic splines. Typically, a first set of splines is fitted
across strikes only, and a second set across maturities. The relevant
derivatives are computed and inserted into (3.36). The evident disadvantage
of this approach is that the smoothness of the derivatives is guaranteed only
in the strike direction. In our computations of the LVS, we overcome this
problem by employing a second-order local polynomial estimator to estimate
the IVS. In this case, all derivatives are natural byproducts of the estimation.
A possible drawback of extra- and interpolation methods is that they are sen-
sitive and unstable. A particular challenge is the extrapolation of the IVS into
areas where no IVs are observed. Theoretical results by Lee (2003) suggest
that the smile function should be extended in log-moneyness as a square root
multiplied with a constant reflecting the number of moments assumed to ex-
ist in the underlying risk neutral distribution, Section 2.6.2. The difficulty of
extra- and interpolation methods is that there does not appear to be a way to
guarantee that standard arbitrage bounds are not violated, or that the local
variance remains positive and finite.
Finally, given the small number of observations, Bouchouev and Isakov
(1999) suggest iterative procedures to reconstruct the LVS. In their first ap-
proach they employ an analytic approximation for the solution to the general-
ized BS PDE. This leads to an integral equation for the LVS that is discretized
at the points where the option data are available. A resulting system of non-
linear equations is solved, where the values of the LVS are recovered as BS
IVs from adjusted option prices. Their second method iteratively exploits the
fundamental solution to the generalized BS PDE (2.67).
To our knowledge, little is known about how the different algorithms compare
with each other. In particular, a comprehensive appraisal of the different
approaches in terms of stability, computational costs, and proneness to errors
remains to be done.

3.11 Excellent fit, but...: the delta problem

As has been pointed out throughout this work, the decisive virtue of smile
consistent models, local volatility models in particular, is that they completely
reproduce or reprice the market, thereby allowing one to price plain vanilla options
and exotic options alike with the same model. This is simply by construction,
and theoretically appealing, since many types of exotic options can be hedged
via static approaches, Derman et al. (1995), Carr et al. (1998) and Andersen et
al. (2002). However, since the conditions under which static hedging works are
typically not met on real markets, in practice one often hedges dynamically.
Dynamic hedging depends on the accuracy with which the greeks describe the
price dynamics to first or second order. However, this is exactly where local
volatility models have come under severe fire in an article by Hagan et
al. (2002). The authors focus their criticism on the delta computed from local
volatility models.
To illustrate their main argument, they consider the special case where
local volatility is a function of the form:

\[ \sigma(S_t, t) = \sigma(S_t) , \tag{3.107} \]

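where $\sigma$ is deterministic. By singular perturbation techniques, Hagan and
Woodward (1999) and Hagan et al. (2002) show that today's IV function
$\hat\sigma_0(S_0, K)$ is related to local volatility to leading order by:
\[ \hat\sigma_0(S_0, K) = \sigma\left(\tfrac{1}{2}(S_0 + K)\right) \left\{ 1 + \frac{1}{24}\, \frac{\sigma''\left(\tfrac{1}{2}(S_0 + K)\right)}{\sigma\left(\tfrac{1}{2}(S_0 + K)\right)}\, (S_0 - K)^2 + \ldots \right\} , \tag{3.108} \]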
where $\sigma''$ denotes the second derivative of the local volatility function with
respect to $S$. According to Hagan et al. (2002) the first term in (3.108) accounts
already for 99% of the IV function. Therefore, one can safely pretend
that
\[ \hat\sigma_0(S_0, K) \approx \sigma\left(\tfrac{1}{2}(S_0 + K)\right) , \tag{3.109} \]
which also uncovers the two-times-IV-slope rule, Section 3.7. Thus, for a given
IV function the fitted local volatility must satisfy (3.109), or equivalently, at
a strike $K = 2S - S_0$, we find that
\[ \hat\sigma_0(S_0, 2S - S_0) \approx \sigma\left(\tfrac{1}{2}(S_0 + 2S - S_0)\right) = \sigma(S) . \tag{3.110} \]
Put in words, this means that the local volatility at some point $S$ corresponds
(approximately) to the IV function at the strike $K = 2S - S_0$.
Suppose now that the current spot value S0 changes by ∆S to S1 . The
decisive point to remember is that the local volatility function remains the
same, it is simply evaluated at the new spot level S1 = S0 + ∆S. Therefore,
reading Equation (3.109) from left to right and from right to left shows that
the new IV smile is related to the previous one by:
\[ \hat\sigma_1(S_1, K) \approx \sigma\left(\tfrac{1}{2}(S_0 + \Delta S + K)\right) \approx \hat\sigma_0(S_0, K + \Delta S) . \tag{3.111} \]
Thus, as the spot moves up, the smile shifts to the left, and vice versa. This
behavior, however, is against the common market experience, Figure 3.10:
instead, the smile is expected to remain constant at the strikes (sticky-strike-assumption),
i.e. the smile function for a given $K$ does not change, or to shift
with the spot (sticky-moneyness-assumption), i.e. the smile stays constant
measured in terms of moneyness, Derman (1999).

[Figure 3.10: three panels titled "Model consistent", "Sticky strike" and "Sticky moneyness", each plotting the IV smile $\hat\sigma$ against the strike $K$; the first panel is annotated with $\hat\sigma_1(S_1, K) \approx \hat\sigma_0(S_0, K + \Delta S)$.]
Fig. 3.10. Alternative IV smile dynamics assuming an upward shift of the asset
price. Left panel: dynamics of the IV smile implied from (deterministic) local volatility
models. Central panel: sticky-strike assumption. Right panel: sticky-moneyness
assumption.

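Now, consider the delta in the local volatility model, which is simply given
by the BS delta and a vega correction, compare with Section 2.9:
\[ \frac{\partial \widetilde{C}_t}{\partial S} = \frac{\partial C_t^{BS}}{\partial S} + \frac{\partial C_t^{BS}}{\partial \hat\sigma}\, \frac{\partial \hat\sigma}{\partial S} . \tag{3.112} \]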
Since the local volatility model predicts that the smile moves left when the
spot moves up and vice versa, which is opposite to common market behavior,
Hagan et al. (2002) conclude that the local volatility delta is wrong or at best
very misleading.
In practice this problem is met by recalibration of the model. Instead of
reading the delta from the finite difference scheme, which yields the model-
implied delta, one shifts the spot and computes the delta via a finite difference
quotient. In shifting the spot, one imposes the IV smile dynamics that are
considered as appropriate, i.e. one recomputes the new option prices either
at the same smile (sticky-strike), or at a smile function shifted with the spot
(sticky-moneyness). This practice, however, has led to a whole delta menu and
a fierce debate on which is the best: the model-implied local volatility delta,
the sticky-strike or BS delta, and the sticky-moneyness delta.
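
The delta menu can be made concrete with a small numerical sketch. The Python code below evaluates (3.112) under the three smile dynamics discussed above. It is an illustration under stated assumptions, not a production implementation: the linear smile `smile`, its slope `skew` and all parameter values are hypothetical, and the three slopes $\partial\hat\sigma/\partial S$ are derived from (3.111), from a fixed smile in $K$, and from a smile that depends on $K/S$ only.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_delta_vega(S, K, tau, r, delta, iv):
    """BS call delta and vega for dividend yield `delta` and volatility `iv`."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * iv ** 2) * tau) / (iv * math.sqrt(tau))
    bs_delta = math.exp(-delta * tau) * norm_cdf(d1)
    bs_vega = S * math.exp(-delta * tau) * norm_pdf(d1) * math.sqrt(tau)
    return bs_delta, bs_vega


def smile(K, S0=100.0, atm=0.20, skew=-0.002):
    """Hypothetical linear smile sigma_hat_0(S0, K); for illustration only."""
    return atm + skew * (K - S0)


def delta_menu(S0, K, tau, r, delta, skew=-0.002):
    """Delta (3.112) under three assumptions on d(sigma_hat)/dS."""
    iv = smile(K, S0, skew=skew)
    d_bs, vega = bs_delta_vega(S0, K, tau, r, delta, iv)
    div_dS = {
        "sticky_strike": 0.0,                   # smile unchanged at each strike
        "sticky_moneyness": -(K / S0) * skew,   # smile is a function of K/S only
        "local_vol": skew,                      # (3.111): d(sigma_hat)/dS = d(sigma_hat_0)/dK
    }
    return {name: d_bs + vega * slope for name, slope in div_dS.items()}


menu = delta_menu(S0=100.0, K=100.0, tau=0.5, r=0.03, delta=0.0)
for name, value in menu.items():
    print(f"{name:>17s}: {value:.4f}")
```

For a negatively sloped smile, the sketch orders the deltas as local volatility delta < sticky-strike (BS) delta < sticky-moneyness delta, which is the sign pattern underlying the argument of Hagan et al. (2002).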
From a theoretical perspective, the answer can be given case by case de-
pending on the prevailing market regime, Derman (1999) and more recently
Crépey (2004), but practically the question appears to be unsolved. In simu-
lating alternative asset price dynamics, McIntyre (2001) finds that the local
volatility model is not delivering robust delta hedges when the true model
is a jump-diffusion, but fairly accurate ones in a pure stochastic volatility
setting. In hedging exercises with real data, Dumas et al. (1998) prefer the
sticky-strike delta to the local volatility variant, whereas Coleman et al. (2001)
and Vähämaa (2004) find the opposite. The contradictory results
may be due to the fact that the ‘right’ delta depends on the current market
regime, so that a final answer cannot be given, or must be sought in stochastic
local volatility settings, Alexander and Nogueira (2004).
Clearly, the delta discussion extends also to other higher order greeks
involving a spot derivative, in particular gamma and vanna, but the literature
appears to be silent on this topic. The difficulty is that higher order greeks are
prone to numerical errors making an analysis very cumbersome. But still, since
the local volatility models are frequently used for options with non-convex
payoff profiles, such as barrier options, this discussion is of vital importance,
and needs to be addressed in the future.
Aside from the delta problem, another unsatisfying feature of LV models
is that they predict flat future smiles: since the IVS flattens out for longer
time horizons, so does the LVS, compare Figure 3.1. Implicitly, the
model thus predicts flat future smiles, which is typically not what one expects.
As a consequence, options that start in the long-dated future, such as forward start
options and cliquet structures, will be priced incorrectly, as their prices are
computed under the assumption of a flat (forward) IVS at their starting date.
These types of exotics need to be priced with stochastic LV, stochastic
volatility or jump diffusion models that do not suffer from this drawback,
Kruse (2003).

3.12 Stochastic IV models

Stochastic IV models follow a different strategy than the local volatility and
the classical stochastic volatility models: the idea is not to introduce the
stochastic setting via the instantaneous volatility function, but through a
stochastic IV process. Like deterministic local volatility models, they allow
for a preference-free option valuation, since markets are complete owing to
the fact that volatility is tradable through options, usually plain vanilla op-
tions of European style. Stochastic IV models were developed by Ledoit and
Santa-Clara (1998) and Schönbucher (1999), and have recently been more
deeply analyzed by Brace et al. (2001), Amerio et al. (2003) and Daglish et
al. (2003).
The (somewhat simplified) model set-up is as follows: for a fixed time
interval $[0, T^*]$, we consider a probability space $(\Omega, \mathcal{F}, Q)$, where $Q$ is the
(unique) martingale measure in the economy. We define two Brownian motions
$(W_t^{(0)})_{0 \le t \le T^*}$ and $(W_t^{(1)})_{0 \le t \le T^*}$ on this space. Without loss of generality
they are assumed to be uncorrelated. The space is equipped with a filtration
$(\mathcal{F}_t)_{0 \le t \le T^*}$. As tradable assets, we have the underlying asset $S_t$ paying a
constant dividend yield $\delta$, a riskless investment with constant interest rate $r$,
and a European call option $C(S_t, t, K, T)$.
Under the measure $Q$ the asset price dynamics are governed by the SDE
\[ \frac{dS_t}{S_t} = (r - \delta)\, dt + \sigma(S_t, t, \hat\sigma_t)\, dW_t^{(0)} , \tag{3.113} \]
where $\{\sigma(S_t, t, \hat\sigma_t)\}_{0 \le t \le T^*}$ is some $(\mathcal{F}_t)_{0 \le t \le T^*}$-adapted stochastic process. It
will be seen that it is driven by the stochastic IV process which follows
\[ \frac{d\hat\sigma_t(K, T)}{\hat\sigma_t(K, T)} = \alpha(\hat\sigma_t, t, S_t)\, dt + \theta_0(\hat\sigma_t, t, S_t)\, dW_t^{(0)} + \theta_1(\hat\sigma_t, t, S_t)\, dW_t^{(1)} , \tag{3.114} \]
where $\{\alpha(\hat\sigma_t, t, S_t)\}_{0 \le t \le T^*}$ and $\{\theta_i(\hat\sigma_t, t, S_t)\}_{0 \le t \le T^*}$ are predictable stochastic
processes. The explicit dependence on $(\hat\sigma_t, t, S_t)$ is dropped in the following for
the sake of clarity. Also we will write $\hat\sigma_t$ only, but the dependence of IV on $K$
and $T$ should be borne in mind. Finally, all diffusion parameters are assumed
to satisfy the regularity assumptions such that unique strong solutions exist,
see appendix Chapter B. The option is priced using the BS formula together
with the current realization of the IV process $\hat\sigma_t$.
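A first set of restrictions on the drift of the IV process ensures that no
arbitrage opportunities exist. They are derived as follows: by Itô’s lemma the
dynamics of the call are given by:
\[ \begin{aligned}
dC_t = {} & \frac{\partial C_t}{\partial t}\, dt + \frac{\partial C_t}{\partial S}\, dS_t + \frac{1}{2} \frac{\partial^2 C_t}{\partial S^2}\, \sigma^2(S_t, t, \hat\sigma_t) S_t^2\, dt \\
& + \frac{\partial C_t}{\partial \hat\sigma}\, d\hat\sigma_t + \frac{1}{2} \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial \hat\sigma}\, d\langle \hat\sigma \rangle_t + \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial S}\, d\langle \hat\sigma, S \rangle_t .
\end{aligned} \tag{3.115} \]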

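In the risk neutral world, the drift of the call must be equal to $r C_t\, dt$.
Thus, by collecting the $dt$-terms in (3.115) and rearranging, the condition on
the drift reads as
\[ \begin{aligned}
0 = {} & \frac{\partial C_t}{\partial t} + (r - \delta) S_t \frac{\partial C_t}{\partial S} + \frac{1}{2} \hat\sigma_t^2 S_t^2 \frac{\partial^2 C_t}{\partial S^2} - r C_t \\
& + \frac{1}{2} \left\{ \sigma^2(S_t, t, \hat\sigma_t) - \hat\sigma_t^2 \right\} S_t^2 \frac{\partial^2 C_t}{\partial S^2} \\
& + \alpha \frac{\partial C_t}{\partial \hat\sigma} + \frac{1}{2} \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial \hat\sigma} (\theta_0^2 + \theta_1^2) + \sigma(S_t, t, \hat\sigma_t)\, \theta_0 S_t \frac{\partial^2 C_t}{\partial S\, \partial \hat\sigma} .
\end{aligned} \tag{3.116} \]
Obviously, the first line of (3.116) is the BS PDE (2.13) with IV replacing
the volatility function. It must be equal to zero. Taking this into account, the
condition on the drift is identified as
\[ \alpha = \frac{1}{2} \left( \frac{\partial C_t}{\partial \hat\sigma} \right)^{-1} \left[ \left\{ \hat\sigma_t^2 - \sigma^2(S_t, t, \hat\sigma_t) \right\} S_t^2 \frac{\partial^2 C_t}{\partial S_t^2} - \frac{\partial^2 C_t}{\partial \hat\sigma\, \partial \hat\sigma} (\theta_0^2 + \theta_1^2) - 2 \sigma(S_t, t, \hat\sigma_t)\, \theta_0 S_t \frac{\partial^2 C_t}{\partial S\, \partial \hat\sigma} \right] . \tag{3.117} \]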
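Using the analytical derivatives of the BS call pricing formula given in
(2.28) to (2.37), this reduces to
\[ \alpha = \frac{1}{2 \hat\sigma_t \tau} \left\{ \hat\sigma_t^2 - \sigma^2(S_t, t, \hat\sigma_t) \right\} - \frac{1}{2} \frac{d_1 d_2}{\hat\sigma_t} (\theta_0^2 + \theta_1^2) + \frac{d_2}{\hat\sigma_t \sqrt{\tau}}\, \sigma(S_t, t, \hat\sigma_t)\, \theta_0 , \tag{3.118} \]
which must be satisfied $Q$-almost surely to avoid arbitrage.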
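
The reduction from (3.117) to (3.118) can be checked numerically. The following Python sketch evaluates (3.117) with central finite differences of the BS call price and compares the result with the closed form (3.118). All parameter values, the step sizes `hS` and `hv`, and the function names are hypothetical choices for illustration; the BS formula with dividend yield is the standard one.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d1_d2(S, K, tau, r, delta, iv):
    d1 = (math.log(S / K) + (r - delta + 0.5 * iv ** 2) * tau) / (iv * math.sqrt(tau))
    return d1, d1 - iv * math.sqrt(tau)


def bs_call(S, K, tau, r, delta, iv):
    d1, d2 = d1_d2(S, K, tau, r, delta, iv)
    return S * math.exp(-delta * tau) * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def alpha_from_greeks(S, K, tau, r, delta, iv, sigma, th0, th1, hS=1e-2, hv=1e-4):
    """Drift (3.117), with the BS derivatives replaced by central differences."""
    C = lambda s, v: bs_call(s, K, tau, r, delta, v)
    C_SS = (C(S + hS, iv) - 2.0 * C(S, iv) + C(S - hS, iv)) / hS ** 2
    C_v = (C(S, iv + hv) - C(S, iv - hv)) / (2.0 * hv)
    C_vv = (C(S, iv + hv) - 2.0 * C(S, iv) + C(S, iv - hv)) / hv ** 2
    C_Sv = (C(S + hS, iv + hv) - C(S + hS, iv - hv)
            - C(S - hS, iv + hv) + C(S - hS, iv - hv)) / (4.0 * hS * hv)
    bracket = ((iv ** 2 - sigma ** 2) * S ** 2 * C_SS
               - C_vv * (th0 ** 2 + th1 ** 2)
               - 2.0 * sigma * th0 * S * C_Sv)
    return 0.5 * bracket / C_v


def alpha_closed_form(S, K, tau, r, delta, iv, sigma, th0, th1):
    """Drift (3.118), written in terms of d1 and d2."""
    d1, d2 = d1_d2(S, K, tau, r, delta, iv)
    return ((iv ** 2 - sigma ** 2) / (2.0 * iv * tau)
            - 0.5 * d1 * d2 / iv * (th0 ** 2 + th1 ** 2)
            + d2 / (iv * math.sqrt(tau)) * sigma * th0)


args = (100.0, 110.0, 0.5, 0.03, 0.01, 0.25, 0.20, -0.05, 0.08)
print(alpha_from_greeks(*args), alpha_closed_form(*args))
```

The two numbers agree up to the discretization error of the finite differences, which confirms that the ratios of gamma, volga and vanna to vega produce the three terms of (3.118).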
Equation (3.118) provides a number of interesting insights:

1. When IV is constant, i.e. $\theta_0 = \theta_1 = 0$, the instantaneous volatility $\sigma$ must
be a constant as well in order to satisfy (3.118).
2. If IV is a function of time and strikes only, i.e. again $\theta_0 = \theta_1 = 0$, the
dynamics of the IV process reduce to $d\hat\sigma_t = (2\tau)^{-1} \left\{ \hat\sigma_t^2 - \sigma^2(S_t, t, \hat\sigma_t) \right\} dt$,
which can be written as
\[ \sigma^2(S_t, t, \hat\sigma_t) = - \frac{d(\tau \hat\sigma_t^2)}{dt} . \tag{3.119} \]
This in turn implies that the instantaneous volatility is likewise a function
of time and strikes only, as assumed in the (deterministic) local volatility
models. Equation (3.119) relates back to the interpretation of the squared
IV as the average squared volatility through the life time of the option.
This was discussed in Section 2.8.
3. The drift is mean-fleeing, as the first term on the right-hand side
of (3.118) shows. The further IV is away from instantaneous volatility, the further
it is going to be pushed away. The speed of the mean-fleeing behavior
increases as $T - t$ tends to zero, causing a ‘volatility bubble’, Schönbucher
(1999).

The existence of the volatility bubble can be avoided by imposing restrictions
on the instantaneous volatility as $T - t$ tends to zero. Indeed, it is easy
to show that if the instantaneous volatility satisfies in the limit of $t \uparrow T$
\[ -\hat\sigma_t^4 + \hat\sigma_t^2\, \sigma^2(S_t, t, \hat\sigma_t) - 2 x \theta_0\, \hat\sigma_t\, \sigma(S_t, t, \hat\sigma_t) + x^2 (\theta_0^2 + \theta_1^2) = 0 , \tag{3.120} \]
where $x \stackrel{\mathrm{def}}{=} -\ln \kappa_f = \ln\{e^{(r-\delta)\tau} S_t / K\}$ is (inverse) forward log-moneyness,
bubbles are excluded from the model. This holds uniquely, since for $\theta_0, \theta_1, \sigma > 0$
and $x \in \mathbb{R}$ this polynomial has only one solution for $\hat\sigma_t > 0$.
Equation (3.120) has at least two implications: first, it is seen that $\hat\sigma_t$ is
quadratic in $x$, which implies a smile across $K$. Since its shape is directly
determined by $\theta_0$ and $\theta_1$, both parameters may be identified by calibration
to the market smile. If $\theta_0 = 0$, i.e. if there is no correlation between the asset
price and the IV dynamics, the smile is symmetric in $x$. Thus, asymmetry in
the smile, the ‘sneer’, is introduced through the Brownian motion driving both
variables. This parallels the work of Renault and Touzi (1996) as discussed in
Section 2.11.
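
The shape of the smile implied by the no-bubble condition can be traced numerically. The Python sketch below solves (3.120) for $\hat\sigma_t$ on a grid of $x$, taking $\sigma$, $\theta_0$ and $\theta_1$ as given constants. It is an illustration only: the function names, the parameter values and the root search (a downward scan followed by bisection) are our own choices, and as a convention we pick the largest positive root, which equals $\sigma$ at $x = 0$.

```python
import math


def no_bubble_poly(iv, x, sigma, th0, th1):
    """Left-hand side of (3.120) as a function of the IV level `iv`."""
    return (-iv ** 4 + iv ** 2 * sigma ** 2
            - 2.0 * x * th0 * iv * sigma
            + x ** 2 * (th0 ** 2 + th1 ** 2))


def no_bubble_iv(x, sigma, th0, th1, n_grid=4000, n_bisect=80):
    """Largest positive root of (3.120): downward scan, then bisection."""
    # Every root satisfies iv <= sigma + sqrt(|x| (|th0| + |th1|)),
    # so the polynomial is negative at `hi`.
    hi = sigma + math.sqrt(abs(x) * (abs(th0) + abs(th1))) + 1e-6
    step = hi / n_grid
    upper = hi
    for k in range(1, n_grid + 1):
        lower = hi - k * step
        if no_bubble_poly(lower, x, sigma, th0, th1) >= 0.0:
            break  # sign change located in [lower, upper]
        upper = lower
    for _ in range(n_bisect):
        mid = 0.5 * (lower + upper)
        if no_bubble_poly(mid, x, sigma, th0, th1) >= 0.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


sigma = 0.20
for x in (-0.2, -0.1, 0.0, 0.1, 0.2):
    symmetric = no_bubble_iv(x, sigma, th0=0.0, th1=0.05)
    skewed = no_bubble_iv(x, sigma, th0=0.05, th1=0.05)
    print(f"x={x:+.1f}  theta0=0: {symmetric:.4f}  theta0=0.05: {skewed:.4f}")
```

With $\theta_0 = 0$ the condition is a quadratic in $\hat\sigma_t^2$ with solution $\hat\sigma_t^2 = \frac{1}{2}\{\sigma^2 + (\sigma^4 + 4 x^2 \theta_1^2)^{1/2}\}$, so the output is symmetric in $x$; a nonzero $\theta_0$ tilts the smile.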
Second, the ATM IV defined in terms of the forward moneyness, i.e. where
x = 0 converges to instantaneous volatility as T − t tends to zero. However,
this is not a consequence of the no-bubble restriction, but can be formally
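proved, Ledoit and Santa-Clara (1998); Daglish et al. (2003). This is seen as
follows: from a first order Taylor series expansion of the BS pricing formula
in the neighborhood of ATM (in the sense of log-forward moneyness), i.e. at
$d_1 = -d_2 = \frac{1}{2} \hat\sigma_t \sqrt{\tau}$, we obtain that
\[ C_t(S_t, t, e^{(r-\delta)\tau} S_t, T) \approx \frac{1}{\sqrt{2\pi}}\, e^{-\delta\tau} S_t\, \hat\sigma_t \sqrt{\tau} . \tag{3.121} \]
This implies
\[ \lim_{t \uparrow T} \hat\sigma_t = \lim_{t \uparrow T} \sqrt{\frac{2\pi}{\tau}}\, \frac{C_t}{e^{-\delta\tau} S_t} . \tag{3.122} \]
The call price can be approximated for small $\tau$ by
\[ \begin{aligned}
C_t &= e^{-r\tau} E^Q\left\{ (S_T - e^{(r-\delta)\tau} S_t)^+ \,\middle|\, \mathcal{F}_t \right\} \\
&\approx e^{-r\tau} E^Q\left\{ S_t\, \sigma(S_t, t, \hat\sigma_t) \left( W_T^{(0)} - W_t^{(0)} \right)^+ \,\middle|\, \mathcal{F}_t \right\} \\
&= e^{-r\tau} S_t\, \sigma(S_t, t, \hat\sigma_t) \sqrt{\frac{\tau}{2\pi}} ,
\end{aligned} \tag{3.123} \]
where the last line follows from the fact that $E(z^+) = \sqrt{\mathrm{Var}(z)/(2\pi)}$, where $z$ is
a normally distributed random variable with zero mean and variance $\mathrm{Var}(z)$.
Inserting (3.123) into (3.122) and taking limits yields the desired result:
\[ \lim_{t \uparrow T} \hat\sigma_t = \lim_{t \uparrow T} \sigma(S_t, t, \hat\sigma_t) . \tag{3.124} \]
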

Note that this parallels the harmonic mean averaging result of Berestycki
et al. (2002): here also, ATM local volatility, which is instantaneous volatility,
converges to IV, Section 3.6.
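
The quality of the first order approximation (3.121) for short maturities can be inspected with a few lines of Python. The sketch compares the exact BS price of a call struck at the forward with (3.121) and inverts the approximation as in (3.122). Function names and parameter values are hypothetical and serve only as an illustration.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def atm_forward_call(S, tau, delta, iv):
    """Exact BS call with K = exp((r - delta) tau) S, where d1 = -d2 = iv sqrt(tau) / 2.

    At this strike K exp(-r tau) = S exp(-delta tau), so r drops out.
    """
    d1 = 0.5 * iv * math.sqrt(tau)
    return S * math.exp(-delta * tau) * (norm_cdf(d1) - norm_cdf(-d1))


def atm_approx(S, tau, delta, iv):
    """First order approximation (3.121)."""
    return math.exp(-delta * tau) * S * iv * math.sqrt(tau) / math.sqrt(2.0 * math.pi)


def iv_from_atm_price(C, S, tau, delta):
    """Expression under the limit in (3.122)."""
    return math.sqrt(2.0 * math.pi / tau) * C / (math.exp(-delta * tau) * S)


S, delta, iv = 100.0, 0.01, 0.20
for tau in (1.0 / 12.0, 1.0 / 52.0, 1.0 / 252.0):
    exact = atm_forward_call(S, tau, delta, iv)
    approx = atm_approx(S, tau, delta, iv)
    print(f"tau={tau:.4f}  exact={exact:.6f}  approx={approx:.6f}  "
          f"recovered IV={iv_from_atm_price(exact, S, tau, delta):.6f}")
```

The relative error of the approximation is of order $\hat\sigma_t^2 \tau / 24$ and therefore vanishes as $\tau$ tends to zero, which is what the limit argument requires.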
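The pricing of path-independent options works along standard lines. By
standard results, the option price $H$ must satisfy the following PDE subject
to the appropriate boundary conditions:
\[ \begin{aligned}
0 = {} & \frac{\partial H}{\partial t} + (r - \delta) S_t \frac{\partial H}{\partial S} + \frac{1}{2} \sigma^2(S_t, t, \hat\sigma_t) S_t^2 \frac{\partial^2 H}{\partial S^2} - r H \\
& + \sigma(S_t, t, \hat\sigma_t)\, \theta_0 S_t \frac{\partial^2 H}{\partial \hat\sigma\, \partial S} + \alpha \frac{\partial H}{\partial \hat\sigma} + \frac{1}{2} (\theta_0^2 + \theta_1^2) \frac{\partial^2 H}{\partial \hat\sigma\, \partial \hat\sigma} .
\end{aligned} \tag{3.125} \]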

Path-dependent options can be priced through Monte Carlo simulation.


In the implementation, difficulties may arise from the rather involved no-arbitrage
conditions, Balland (2002). To simplify, Brace et al. (2001) propose
to parameterize the volatility of IV as $\theta_i(\hat\sigma, t) \stackrel{\mathrm{def}}{=} \theta_i \hat\sigma$, $i = 0, 1$, where $\theta_i > 0$
is constant. This removes the singularities apparent in (3.118). Instead of obtaining
the parameters from fitting the smile as suggested above, they can be
recovered from PCA methods developed in Section 5.2. This path is taken for
instance in Fengler et al. (2002b) and Cont et al. (2002). For the specification
of the instantaneous volatility a lot of freedom remains, as long as (3.120)
is satisfied in the limit. Alternatively, one may fix a drift function and recover
from (3.118) the corresponding instantaneous volatility. For instance,
the simplest choice would be to put $\alpha = 0$.
Finally, it should be remarked that the specification of the model in abso-
lute terms, i.e. in terms of a fixed expiry date and a fixed strike may sometimes
prove to be inconvenient in practice. Especially, an empirical identification of
the parameters is likely to be more stable in terms of moneyness and time to
maturity rather than in strikes and expiry dates. This is addressed in Brace
et al. (2001) who show how to switch from the absolute to the relative nota-
tion of the model and its no-arbitrage conditions. Amerio et al. (2003) follow
this approach and show how to price volatility derivatives using stochastic IV
models.

3.13 Summary

In the first part of this chapter we introduced the theory of local volatility.
Also several techniques for extracting local volatility from option prices were
discussed. The focus was on implied trees. In the second part stochastic IV
models were presented. At this point, we consider it to be appropriate to recall
the concepts of volatility systematically. As explained in the introduction, we
collected the main results in Figure 3.11.
Starting from the leftmost arrow with the instantaneous variance, the
first (and trivial) relation is the identity of local and instantaneous variance for
K = St and T = t. Moreover, local and implied variance can be represented as
averages of instantaneous variance: local variance is the expectation under the
(K, T )-risk adjusted measure. Implied variance is – for ATM options under
the Hull and White (1987) model – the expectation under the risk neutral
measure. Finally, the stochastic IV models show that ATM IV converges to
instantaneous volatility as time to maturity converges to zero.
The asymptotic relations between implied and local volatility are presented
in the top of the figure. They hold under the assumption of a deterministic
instantaneous volatility function: first, IV is a spatial harmonic mean as time
to maturity converges to zero. Second, if no strike dependence is present or
for far OTM/ITM options, IV is a time average of local volatility. The Dupire
formula in its IV representation – the dotted line – allows for recovering the
LVS from the IVS and its derivatives. It is an ad-hoc concept, but a convenient
way to reconstruct the LVS. Finally, the two-times-IV-slope was shown to hold
for ATM options near to expiry.

[Figure 3.11: diagram linking local variance $\sigma^2_{K,T}(S_t, t)$ and implied variance $\hat\sigma_t^2(K, T)$ (top) with instantaneous variance $\sigma^2(S_t, t, \cdot)$ (bottom). Between local and implied variance: IV counterpart of the Dupire formula (3.36); for deterministic volatility, $t \uparrow T$: spatial harmonic mean of volatility (3.46); no strike dependence or far OTM/ITM: arithmetic mean (2.78) and (3.47). Between instantaneous and local variance: $K = S_t$, $T = t$, see (3.4); expectation $E^{(K,T)}\{\sigma^2(S_T, T, \cdot) | \mathcal{F}_t\}$, Section 3.8. Between instantaneous and implied variance: risk neutral expectation for $K \approx F_t$, see (2.93); $K = F_t$, $t \uparrow T$, see (3.124).]
Fig. 3.11. Overview on volatility concepts. Solid lines denote exact relations between
the different types of volatility. The dashed line denotes an ad-hoc concept. The
arrows denote the direction of the relations.

After the recent theoretical and computational advances, local volatility
models are increasingly criticized, on practical as well as on theoretical
grounds. From the practical perspective, there is the criticism
that local volatility models deliver a wrong delta, Hagan et al. (2002).
As discussed, the empirical literature does not appear to be strongly conclusive
on the matter. A harsh methodological criticism is given by Ayache et al.
(2004). Their main argument against local volatility is that these models lack
economic grounds by not offering a reasonable smile explanation, as stochastic
volatility or jump diffusion models do. Rather these models ‘tweak’ the diffu-
sion coefficient in the BS PDE, until the observed option prices are matched.
From their point of view, local volatility is just a computational construct
bearing no economic content whatsoever. Given the highly spiky and coun-
terintuitive surface structures that are typically recovered in local volatility
models, this position cannot be completely dismissed. A somewhat milder po-
sition is taken by Wilmott (2001a, Chapter 25). He argues that local volatility
models may be good when the options, from which the LVS is backed out, are
simultaneously employed for static hedges: this can reduce the model error.
In this case, one computes the LVS, and prices for instance a barrier option
with respect to it. The option is then statically hedged by mimicking as closely
as possible the boundary condition and the payoff. However, we are not aware
of a study simulating this strategy based on real data and assessing its
success.
To summarize, given the current state of research, it seems to be difficult to
give a concluding appraisal of local volatility models. The stochastic variants
of local volatility or the stochastic IV models may offer fruitful solutions. But
finally, it is daily practice on trading floors that has to determine whether
local volatility can compete with stochastic volatility and jump-diffusions or
not.
4

Smoothing techniques

4.1 Introduction

Functional flexibility is a key requirement for model building and model se-
lection in quantitative finance: often it is difficult, and sometimes impossible
to justify on theoretical grounds a specific parametric form of an economic
relationship under investigation. Furthermore, in a dynamic context, the eco-
nomic structure may be liable to sizable changes and considerable fluctuations.
Thus, estimation techniques that do not impose any a priori restrictions on the
estimate, such as non- and semiparametric methods, are increasingly popular
in financial practice.
In the case of the IVS, model flexibility is a prerequisite rather than an
option: as has been seen in Chapter 2, from the BS theory, the IVS should
be a flat and constant function across strike prices and the term structure
of the option’s time to maturity. Yet, as a matter of fact, one observes rich
functional patterns fluctuating through time. This feature together with the
discrete design, i.e. the fact that the daily IV observations occur only for a
limited number of maturities, render IVS estimation an intricate challenge.
Parametric attempts to model the IVS along the strike profile, i.e. the
‘smile’, usually employ quadratic specifications, Shimko (1993), Ané and Ge-
man (1999), and Tompkins (1999) among others. Also some of the methods
listed for estimating the local volatility function are applicable here, Sec-
tion 3.10.3. To allow for more flexibility, Hafner and Wallmeier (2001) fit
quadratic splines to the smile function. However, it seems that these para-
metric approaches are not capable of capturing the salient features of IVS
patterns, and hence estimates may be biased.
Recently, non- and semiparametric smoothing techniques for estimating
the IVS have been used more and more: Aït-Sahalia and Lo (1998), Rosenberg
(2000), Cont and da Fonseca (2002), Fengler et al. (2003b) employ a Nadaraya-
Watson estimator of the IVS function, and higher order local polynomial
smoothing of the IVS is used in Rookley (1997). Aït-Sahalia et al. (2001a)
discuss model selection between fully parametric, semi- and nonparametric
IVS specifications and argue in favor of the latter approaches.
The key idea in nonparametric estimation can be summarized as follows:
suppose we are given a data set $\{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}$ denotes the predictor
or the explanatory variables, and $y_i \in \mathbb{R}$ the response variable. In the
context of IVS estimation, this would be some moneyness measure and time
to maturity, or either of them, and IV respectively. The aim is to estimate the
regression relationship

\[ y_i = m(x_i) + \varepsilon_i , \quad i = 1, \ldots, n . \tag{4.1} \]

If one believes in some degree of smoothness between the explanatory
variables and the response variable, it appears natural to assume that the
data in the local neighborhood of a fixed point $x$ contain information on $m$ at
$x$: thus, the basic idea in nonparametric estimation is to obtain an estimate
$\hat m(x)$ by locally averaging the data. More formally, this can be described by
\[ \hat m(x) = \frac{1}{n} \sum_{i=1}^n w_{i,n}(x)\, y_i , \tag{4.2} \]
where $\{w_{i,n}(x)\}_{i=1}^n$ denotes a sequence of weights. The weights reflect the
fact that one will typically give higher weights to the observations $x_i$ in the near
vicinity of $x$ than to those far off. Most nonparametric techniques can be
written in this way, and differ only in the way the weights are computed.
In Sections 4.2 and 4.3 of this chapter, we give an introduction to
Nadaraya-Watson and local polynomial smoothing, which are the techniques
employed for almost all of the graphical illustrations throughout this work.
In Nadaraya-Watson smoothing one estimates a local constant, while in local
polynomial smoothing one fits a polynomial of order p within a small neighbor-
hood. From this point of view, Nadaraya-Watson smoothing is the special case
of local polynomial smoothing with degree p = 0. Usually, in local polynomial
smoothing, one uses a local linear estimator, i.e. p = 1, which is less affected
by a bias in the boundary regions of the estimate than the Nadaraya-Watson
estimator, Härdle et al. (2004). This however is asymptotically negligible.
When it is mandatory to also estimate derivatives, e.g. when the LVS is
recovered from the IVS, Section 3.5, one needs to use higher order local poly-
nomials. The degree of the polynomial depends on the number of derivatives
desired. Since the local polynomial estimator can be written as a weighted
least squares estimator, implementation is straightforward.
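
To indicate how little is needed for an implementation, the following Python sketch computes a local linear fit ($p = 1$) at a point $x_0$ by solving the weighted least squares problem in closed form. It is a minimal illustration, not the estimator used for the figures in this work: the function names are hypothetical, the quartic kernel (4.4) is used for the weights, and the fitted slope serves as an estimate of the first derivative.

```python
def quartic(u):
    """Quartic kernel (4.4)."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def local_linear(x0, xs, ys, h):
    """Local linear fit at x0; returns estimates of (m(x0), m'(x0)).

    Minimizes sum_i K((x_i - x0) / h) * (y_i - b0 - b1 (x_i - x0))^2 over (b0, b1).
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        w = quartic((x - x0) / h)
        d = x - x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few observations within one bandwidth of x0")
    b0 = (s2 * t0 - s1 * t1) / det   # local level
    b1 = (s0 * t1 - s1 * t0) / det   # local slope
    return b0, b1


xs = [i / 20.0 for i in range(21)]
ys = [2.0 + 3.0 * x for x in xs]
print(local_linear(0.0, xs, ys, h=0.3))   # boundary point of the design
```

On exactly linear data the fit reproduces level and slope even at the boundary of the design, which illustrates the smaller boundary bias of the local linear estimator compared with a local constant fit.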
Section 4.5 presents an IVS estimator, a least squares kernel smoother,
proposed by Gouriéroux et al. (1994) and Fengler and Wang (2003). This
approach smoothes the IVS in the space of option prices and avoids a po-
tentially undesirable feature of previous estimators: the two-step procedure.
Traditionally, in a first step, IVs are derived by equating the BS formula with
observed market prices and by solving for the diffusion coefficient, Section 2.5.
In the second step the actual fitting algorithm is applied. A two-step estimator
may be biased when option prices or other input parameters can only be observed
with errors. Moreover, the nonlinear transformation of the option prices
makes the error distribution less tractable. Indeed, it has been conjectured
that the presence of measurement errors can be of substantial impact, see
Roll (1984), Harvey and Whaley (1991), and particularly Hentschel (2003)
for an extensive study on errors in IV estimation and their possible magnitude.
Potential error sources are the bid-ask bounce, nonsynchronous pricing,
infrequent trading of index stocks, and finite quote precision. Unlike the local
polynomial smoother, the least squares kernel smoother does not have a
closed-form solution, and for each grid point, the estimation must be achieved
separately by minimizing the objective function. On the other hand, as shall
be seen, our results allow for the estimation of confidence bands that take the
nonlinear transformation of the option prices into IVs into account.
A third methodology due to Fengler et al. (2003a) estimates the IVS via a
semiparametric factor model. The reason for investigating this third approach
lies in the very nature of the IVS data. As has been pointed out in Section 2.5,
the IVS data are not evenly distributed in space, but occur in strings. Unless
carefully calibrated, the fits obtained by the methods discussed
in this chapter can be biased. The estimation strategy of the semiparametric
factor model is specifically tailored to the degenerate, discrete string structure
of the IVS data. It shall be discussed in Chapter 5, since we consider
the dimension reduction aspects of this approach as its dominating feature,
although it may also be seen as a pure estimation technique.

4.2 Nadaraya-Watson smoothing

4.2.1 Kernel functions

Nonparametric estimates are obtained by averaging the data locally. Usually,
in this averaging, the data are given weights depending on the vicinity to
$x \in \mathbb{R}$, at which the regression function $m$ is to be found. The weighting is
achieved by kernel functions $K(\cdot)$. The kernel functions employed in standard
situations are continuous, positive, bounded and symmetric real functions
which integrate to one:
\[ \int K(u)\, du = 1 . \tag{4.3} \]

Kernel functions that are typically employed in nonparametric smoothing
are the quartic kernel
\[ K(u) = \frac{15}{16} (1 - u^2)^2\, \mathbf{1}(|u| \le 1) , \tag{4.4} \]
and the Epanechnikov kernel
\[ K(u) = \frac{3}{4} (1 - u^2)\, \mathbf{1}(|u| \le 1) , \tag{4.5} \]
which both have a bounded support. A kernel with infinite support is the
Gaussian kernel, which is given by:
\[ K(u) = \varphi(u) \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} , \tag{4.6} \]
where $\pi = 3.141...$ denotes the circle constant.
For multidimensional smoothing tasks, as for IVS estimation, one needs
multidimensional kernels. It is most common to obtain multidimensional kernels
via products of univariate kernels:
\[ K(u_1, \ldots, u_d) = \prod_{j=1}^d K_{(j)}(u_j) , \tag{4.7} \]

which in this way inherit the properties of the univariate kernel function.
While different kernels have a different impact on the theoretical properties
of the estimator, in practice the choice of the kernel function is not of big
importance, and is mainly driven by practical considerations, Marron and
Nolan (1988). For our work, we will only use quartic kernels and products of
them.
The degree of localization or smoothing is steered via the bandwidth $h$.
For instance, for a given data set $\{(x_i, y_i)\}_{i=1}^n$, for $x, y \in \mathbb{R}$, the bandwidths
enter the kernel functions via
\[ \frac{1}{h_n} K\left( \frac{x - x_i}{h_n} \right) , \quad i = 1, \ldots, n . \tag{4.8} \]

The index $n$ for the bandwidth clarifies that $h_n$ actually depends on the
number of observations. This is natural, since in the asymptotic perspective,
as the number of observations tends to infinity, the degree of localization can
shrink to zero without ‘losing’ information about the regression function. In
most cases, however, we will suppress this explicit notation.
Finally, it will occasionally be convenient to use the abbreviation
\[ K_h(u) \stackrel{\mathrm{def}}{=} \frac{1}{h} K\left( \frac{u}{h} \right) . \tag{4.9} \]
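
The kernels (4.4) to (4.6), the product construction (4.7) and the rescaling (4.9) translate directly into code. The Python sketch below is meant as an illustration; the helper names `scaled`, `product_kernel` and `integrate` are hypothetical, and the midpoint rule is only used to check numerically that the kernels integrate to one as required by (4.3).

```python
import math


def quartic(u):
    """Quartic kernel (4.4)."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def epanechnikov(u):
    """Epanechnikov kernel (4.5)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gaussian(u):
    """Gaussian kernel (4.6)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def scaled(kernel, h):
    """K_h(u) = K(u / h) / h as in (4.9)."""
    return lambda u: kernel(u / h) / h


def product_kernel(kernels):
    """Multivariate kernel (4.7) built from univariate kernels."""
    def k(*u):
        return math.prod(kj(uj) for kj, uj in zip(kernels, u))
    return k


def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b]; accurate enough for a sanity check."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


print(integrate(quartic, -1.0, 1.0))
print(integrate(epanechnikov, -1.0, 1.0))
print(integrate(gaussian, -10.0, 10.0))
print(integrate(scaled(quartic, 0.5), -0.5, 0.5))
```

Rescaling by the bandwidth shrinks the support of a compactly supported kernel to $[-h, h]$ while preserving property (4.3).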
4.2.2 The Nadaraya-Watson estimator

For simplicity, consider the univariate model

Y = m(X) + ε , (4.10)

with the unknown (but twice differentiable) regression function m. The ex-
planatory variable X and the response variable Y take values in R, have the
joint pdf f(x, y) and are independent of ε. The error ε has the properties
$E(\varepsilon \mid x) = 0$ and $E(\varepsilon^2 \mid x) = \sigma^2(x)$.
Taking the (conditional) expectation of (4.10) yields

E(Y |X = x) = m(x) , (4.11)

which says that the unknown regression function is the conditional expectation
function of Y given X = x. Using the definition of the conditional
expectation, (4.11) can be written as
\[ m(x) = E(Y \mid X = x) = \frac{\int y\, f(x, y)\, dy}{f_x(x)} , \tag{4.12} \]

where fx denotes the marginal pdf. Representation (4.12) shows that the
regression function m can be estimated via the kernel density estimates of
the joint and the marginal density. This approach was first introduced by
Nadaraya (1964) and Watson (1964).
Suppose we are given the randomly sampled iid data set $\{(x_i, y_i)\}_{i=1}^n$.
Then, the Nadaraya-Watson estimator is given by:
\[ \hat m(x) = \frac{n^{-1} \sum_{i=1}^n K_h(x - x_i)\, y_i}{n^{-1} \sum_{i=1}^n K_h(x - x_i)} . \tag{4.13} \]

Rewriting (4.13) as
\[ \hat m(x) = \frac{1}{n} \sum_{i=1}^n \frac{K_h(x - x_i)}{n^{-1} \sum_{j=1}^n K_h(x - x_j)}\, y_i = \frac{1}{n} \sum_{i=1}^n w_{i,n}(x)\, y_i \tag{4.14} \]
reveals that the Nadaraya-Watson estimator can be written as the locally
weighted average of the response variable with weights
\[ w_{i,n}(x) \stackrel{\mathrm{def}}{=} \frac{K_h(x - x_i)}{n^{-1} \sum_{j=1}^n K_h(x - x_j)} . \tag{4.15} \]
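The weighted-average form (4.14) translates almost literally into code. The following is a minimal sketch, assuming NumPy and a quartic kernel; the function names `nw_weights` and `nadaraya_watson` are hypothetical.

```python
import numpy as np


def quartic(u):
    """Quartic kernel (4.4)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def nw_weights(x0, x, h, kernel=quartic):
    """Weights w_{i,n}(x0) from (4.15); by construction they average to one."""
    x = np.asarray(x, dtype=float)
    k = kernel((x0 - x) / h) / h           # K_h(x0 - x_i)
    denom = k.mean()                       # n^{-1} sum_j K_h(x0 - x_j)
    if denom == 0.0:
        raise ValueError("no observations within the kernel window at x0")
    return k / denom


def nadaraya_watson(x0, x, y, h, kernel=quartic):
    """Nadaraya-Watson estimate (4.14) at a single point x0."""
    w = nw_weights(x0, x, h, kernel)
    return np.mean(w * np.asarray(y, dtype=float))
```

With a kernel of bounded support the denominator can vanish where no data fall into the window; the sketch raises an error in that case rather than returning an undefined ratio.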

Under some regularity conditions, the Nadaraya-Watson estimator is
consistent, i.e.
\[ \hat m(x) \xrightarrow{\;p\;} m(x) \tag{4.16} \]
as nh ↑ ∞, h ↓ 0 and n ↑ ∞. As opposed to parametric models under the
correct specification, nonparametric estimates are biased. The bias, which is
defined by $\mathrm{Bias}\{\hat m(x)\} = E\{\hat m(x) - m(x)\}$, can be reduced by decreasing the
bandwidth, but this increases the variance of the estimate. The art
in nonparametric regression lies in trading off the variance and the bias.
The asymptotic bias of the Nadaraya-Watson estimator is
\[ \mathrm{Bias}\{\hat m(x)\} = \frac{h^2}{2}\, \mu_2(K) \left\{ m''(x) + 2\, \frac{m'(x)\, f_x'(x)}{f_x(x)} \right\} + O(n^{-1} h^{-1}) + O(h^2) , \tag{4.17} \]
where $\mu_2(K) = \int u^2 K(u)\, du$, and the asymptotic variance is given by:
\[ \mathrm{Var}\{\hat m(x)\} = \frac{1}{nh}\, \frac{\sigma^2(x)}{f_x(x)} \int K^2(u)\, du + O(n^{-1} h^{-1}) . \tag{4.18} \]

For a precise treatment of the preceding statements see for instance Härdle
(1990) or Pagan and Ullah (1999).
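The kernel enters the leading terms of (4.17) and (4.18) only through the two constants $\mu_2(K)$ and $\int K^2(u)\,du$. As a small illustration, the sketch below approximates both by a midpoint rule and evaluates the leading terms for user-supplied values of $m'$, $m''$, $f_x$, $f_x'$ and $\sigma^2$ at a point x. The function names and arguments are hypothetical; for the quartic kernel the two integrals evaluate to 1/7 and 5/7.

```python
import numpy as np


def quartic(u):
    """Quartic kernel (4.4)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def kernel_constants(kernel, lo=-1.0, hi=1.0, m=200_000):
    """Midpoint-rule approximations of mu_2(K) and int K(u)^2 du on [lo, hi]."""
    du = (hi - lo) / m
    u = lo + (np.arange(m) + 0.5) * du
    k = kernel(u)
    return np.sum(u**2 * k) * du, np.sum(k**2) * du


def nw_leading_terms(h, n, m1, m2, f, f1, sigma2, mu2, k2):
    """Leading terms of (4.17) and (4.18) at a fixed point x.

    m1, m2: first and second derivative of m at x; f, f1: design density and
    its derivative at x; sigma2: conditional error variance at x.
    """
    bias = 0.5 * h**2 * mu2 * (m2 + 2.0 * m1 * f1 / f)
    var = sigma2 * k2 / (n * h * f)
    return bias, var


mu2, k2 = kernel_constants(quartic)
```

Halving h divides the leading bias term by four and doubles the leading variance term, which is the tradeoff described above.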
The Nadaraya-Watson estimator generalizes in a straightforward manner
to the multivariate case: for a sample $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, the
multivariate Nadaraya-Watson estimator is given by
\[ \hat m(x) = \frac{\sum_{i=1}^n K_h(x - x_i)\, y_i}{\sum_{i=1}^n K_h(x - x_i)} , \tag{4.19} \]
where $K_h(\cdot)$ denotes a multivariate kernel function with bandwidth vector
$h = (h_1, \ldots, h_d)^\top$. Similar results for the asymptotic bias and the asymptotic
variance hold; see Härdle et al. (2004).

4.3 Local polynomial smoothing

Another view on the Nadaraya-Watson estimator can be taken by noting that


it can be written as the minimizer of (returning to the univariate case)
\[ \hat m(x) = \arg\min_{m \in \mathbb{R}} \sum_{i=1}^n (y_i - m)^2\, K_h(x - x_i) . \tag{4.20} \]

Computing the normal equations of (4.20) leads to (4.13) as solution for


m. This reveals that Nadaraya-Watson is a special case of fitting a constant
in the local neighborhood of x. In local polynomial smoothing this idea is
generalized to fitting locally a polynomial of order p.
Assume that the regression function has continuous derivatives up to order p.
By expanding the regression function m from (4.10) in a Taylor series, we obtain
\[ m(\xi) \approx m(x) + m'(x)(x - \xi) + \ldots + \frac{1}{p!}\, m^{(p)}(x)(x - \xi)^p \tag{4.21} \]
for ξ in the neighborhood of x. Again we include the neighborhood of x via
kernel weights. Thus, an estimator of m(x) can be formulated in terms of the
quadratic minimization problem
\[ \min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^n \left\{ y_i - \beta_0 - \beta_1 (x - x_i) - \ldots - \beta_p (x - x_i)^p \right\}^2 K_h(x - x_i) , \tag{4.22} \]

where $\beta = (\beta_0, \ldots, \beta_p)^\top$ denotes the vector of coefficients. Obviously the
result of this minimization problem is a weighted least squares estimator with
weights $K_h(x - x_i)$.
We introduce the following matrix notation:
\[ X = \begin{pmatrix}
1 & x - x_1 & (x - x_1)^2 & \cdots & (x - x_1)^p \\
1 & x - x_2 & (x - x_2)^2 & \cdots & (x - x_2)^p \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x - x_n & (x - x_n)^2 & \cdots & (x - x_n)^p
\end{pmatrix} , \tag{4.23} \]
and $y = (y_1, \ldots, y_n)^\top$, and finally
\[ W = \begin{pmatrix}
K_h(x - x_1) & 0 & \cdots & 0 \\
0 & K_h(x - x_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & K_h(x - x_n)
\end{pmatrix} . \tag{4.24} \]
Then we can write the solution of (4.22) in the usual least squares
formulation as
\[ \hat\beta(x) = (X^\top W X)^{-1} X^\top W y . \tag{4.25} \]

Note that this estimator, unlike in the common parametric minimization
schemes, varies in x, so that the minimization must be repeated for every x.
This is highlighted by the notation $\hat\beta(x)$. The local polynomial estimator for the
regression function is given by
\[ \hat m(x) = \hat\beta_0(x) , \tag{4.26} \]
by comparison of (4.21) and (4.22). From Equation (4.25), it is evident that
the estimator can be written as a locally weighted average of the response variable.
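The weighted least squares solution (4.25) can be sketched in a few lines of Python, assuming NumPy; the function name `local_polynomial` is hypothetical. One choice of the sketch should be noted: it uses regressors $(x_i - x)^j$, so that $j!\,\hat\beta_j$ can be read directly as an estimate of the jth derivative of m at x. With powers of $(x - x_i)$ as in (4.23), the intercept $\hat\beta_0$ is unchanged while the coefficients of odd order change sign.

```python
import numpy as np
from math import factorial


def quartic(u):
    """Quartic kernel (4.4)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def local_polynomial(x0, x, y, h, p=1, kernel=quartic):
    """Local polynomial fit of order p at x0 via weighted least squares.

    Returns an array d of length p + 1, where d[0] is the estimate of m(x0)
    and d[j] = j! * beta_j is the estimate of the j-th derivative at x0.
    The window must contain at least p + 1 distinct design points,
    otherwise the normal equations are singular.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = x - x0                                   # regressor x_i - x0 (see lead-in)
    w = kernel(z / h) / h                        # diagonal of W in (4.24)
    X = np.vander(z, N=p + 1, increasing=True)   # columns 1, z, ..., z^p
    XtW = X.T * w                                # X^T W without forming W
    beta = np.linalg.solve(XtW @ X, XtW @ y)     # normal equations of (4.25)
    return np.array([factorial(j) * beta[j] for j in range(p + 1)])
```

Setting `p=0` reproduces the Nadaraya-Watson estimator and `p=1` gives the local linear estimator.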
Practice requires the choice of p. From the asymptotic behavior it is known
that polynomials with odd degrees are to be preferred to those with even ones,
i.e. the order one polynomial outperforms the order zero polynomial, the order
three polynomial the order two polynomial etc. A case used particularly often
is the local linear estimator with p = 1. It has been studied extensively by
Fan (1992, 1993) and Fan and Gijbels (1992).
For the local linear estimator the asymptotic variance is identical to that
stated in (4.18) for the Nadaraya-Watson estimator. The asymptotic bias takes
the form:
\[ \mathrm{Bias}\{\hat m(x)\} = \frac{h^2}{2}\, \mu_2(K)\, m''(x) + O(h^2) . \tag{4.27} \]
Comparing (4.27) with (4.17) uncovers a remarkable difference: the bias
does not depend on the design density, i.e. the estimator is said to be design
adaptive, Fan (1992). Moreover, the bias vanishes when m is linear. Thus local
linear estimation can be superior to Nadaraya-Watson smoothing when the design
becomes sparse, as is typically the case for the IVS data. Another advantage
of the local linear estimator is that its bias and variance are of the same order
of magnitude in the interior and at the boundary of the support of $f_x$. In
practice, this may improve the behavior of the estimate near the boundary of
the design.
An important byproduct of local polynomial estimators is that they provide
an easy and efficient way for computing derivatives up to order p
of the regression function. For instance, the jth order derivative of m, $m^{(j)}$,
is given by
\[ \hat m^{(j)}(x) = j!\, \hat\beta_j(x) . \tag{4.28} \]

For the $\mathbb{R}^d$-variate extension to (4.22), one proceeds similarly. For instance,
for the local linear estimator we have
\[ \min_{\beta_0 \in \mathbb{R},\, \beta_1 \in \mathbb{R}^d} \sum_{i=1}^n \left\{ y_i - \beta_0 - \beta_1^\top (x - x_i) \right\}^2 K_h(x - x_i) , \tag{4.29} \]
and a representation of the kind of (4.25) is obtained by introducing a suitable
matrix notation; see Härdle et al. (2004). For results on the asymptotic variance
and bias see Ruppert and Wand (1994). Derivatives and cross-derivatives are
obtained in the same way, i.e. by differentiating the local polynomial and
by picking the appropriate entries of the coefficient vector.

4.4 Bandwidth selection

4.4.1 Theoretical framework

The crucial task in nonparametric smoothing is the bandwidth choice. This


involves trading off the bias and the variance of the estimate. Typically, this
is done by considering L2 -measures of distance between the estimate and
the true regression curve. For a more detailed treatment of the following
statements see Härdle (1990).
A way of balancing bias and variance in a pointwise sense is to minimize
the mean squared error (MSE). The MSE is defined by:
\[ \mathrm{MSE}\{\hat m(x)\} \stackrel{\mathrm{def}}{=} E[\{\hat m(x) - m(x)\}^2] , \tag{4.30} \]
which can be written as
\[ \mathrm{MSE}\{\hat m(x)\} = \mathrm{Var}\{\hat m(x)\} + [\mathrm{Bias}\{\hat m(x)\}]^2 , \tag{4.31} \]
where $\mathrm{Bias}\{\hat m(x)\} = E\{\hat m(x) - m(x)\}$.
Denote by AMSE the asymptotic MSE, which is obtained by ignoring all
lower order terms in expressions like (4.17) and (4.18). This shows for the
Nadaraya-Watson estimator (as for most other nonparametric estimators)
that
\[ \mathrm{AMSE}(h) = \frac{1}{nh}\, c_1 + h^4 c_2 , \tag{4.32} \]
where $c_1$ and $c_2$ are constants. Minimizing with respect to h yields
\[ h \propto n^{-1/5} . \tag{4.33} \]
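The minimization step behind (4.33) can be made explicit. Treating $c_1$ and $c_2$ in (4.32) as positive constants that do not depend on h (the symbol $h_{\mathrm{opt}}$ is introduced here for illustration only), the first order condition gives:

```latex
\[
\frac{\partial\, \mathrm{AMSE}(h)}{\partial h}
  = -\frac{c_1}{n h^2} + 4 c_2 h^3 = 0
\quad \Longrightarrow \quad
h_{\mathrm{opt}} = \left( \frac{c_1}{4 c_2} \right)^{1/5} n^{-1/5} ,
\qquad
\mathrm{AMSE}(h_{\mathrm{opt}}) \propto n^{-4/5} .
\]
```

At this bandwidth the variance term $c_1/(nh)$ and the squared bias term $h^4 c_2$ are of the same order $n^{-4/5}$, which is slower than the parametric rate $n^{-1}$.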

This, however, is of little use in practice, since the constants depend on
unknown quantities like $\sigma^2(x)$ or $m''(x)$. Moreover, since the MSE is calculated
for a specific point x only, it is a local measure. To reduce the dimensionality
problem of optimizing h, one usually considers global measures.
A number of global measures can be defined. A typical choice is the
integrated squared error (ISE):
\[ \mathrm{ISE}(h) \stackrel{\mathrm{def}}{=} \int \{\hat m(x) - m(x)\}^2\, \tilde w(x)\, f_x(x)\, dx , \tag{4.34} \]
where $\tilde w(\cdot)$ is some weight function. It may be employed to assign less weight
to regions where the data are sparse. A discrete approximation to the ISE is
the average squared error (ASE):
\[ \mathrm{ASE}(h) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^n \{\hat m(x_i) - m(x_i)\}^2\, \tilde w(x_i) . \tag{4.35} \]

Both the ISE and the ASE are random variables. Taking the expectation
of the ISE yields the mean integrated squared error (MISE)
\[ \mathrm{MISE}(h) \stackrel{\mathrm{def}}{=} E\{\mathrm{ISE}(h)\} , \tag{4.36} \]
which is not a random variable. One may also take the expected value of the
ASE, which yields the mean average squared error (MASE). We use a weighted
version of the MASE for model selection in the semiparametric factor model,
see Section 5.4.3.
For the Nadaraya-Watson estimator it has been shown by Marron and
Härdle (1986, Theorem 3.4) that under mild conditions the ISE, ASE, and
MISE are asymptotically equivalent in the sense that
\[ \sup_h\, |\mathrm{ASE}(h) - \mathrm{MISE}(h)| / \mathrm{MISE}(h) \longrightarrow 0 \quad \text{a.s.} , \tag{4.37} \]
and
\[ \sup_h\, |\mathrm{ISE}(h) - \mathrm{MISE}(h)| / \mathrm{MISE}(h) \longrightarrow 0 \quad \text{a.s.} , \tag{4.38} \]
for h from a closed set.


Still, we face the problem that these distance measures are not immediately
computationally feasible in practice, since they depend on unknown
quantities. However, the preceding asymptotic equivalence results open a way
out: they allow us to focus on the numerically most convenient of the three
criteria, the ASE, and to suitably replace or estimate the unknowns. This
leads to cross validation and penalizing techniques as methods for the
bandwidth choice. Both are based on a bias-corrected version of the resubstitution
estimate of the prediction error:
\[ p(h) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^n \{y_i - \hat m(x_i)\}^2\, \tilde w(x_i) . \tag{4.39} \]

The cross validation function is defined by
\[ CV(h) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{j=1}^n \{y_j - \hat m_{-j}(x_j)\}^2\, \tilde w(x_j) , \tag{4.40} \]
where $\hat m_{-j}$ denotes the leave-one-out estimator of the regression function,
in which the jth observation is left out. For instance, for the case of the
Nadaraya-Watson estimator, the leave-one-out estimator evaluated at $x_j$ is given by
\[ \hat m_{-j}(x_j) = \frac{\sum_{i \neq j} K_h(x_j - x_i)\, y_i}{\sum_{i \neq j} K_h(x_j - x_i)} . \tag{4.41} \]
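The cross validation criterion for the Nadaraya-Watson estimator can be sketched as follows, assuming NumPy, a quartic kernel and, by default, a unit weight function. The function names, the synthetic data and the bandwidth grid are hypothetical and serve only as an illustration.

```python
import numpy as np


def quartic(u):
    """Quartic kernel (4.4)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def loo_cv(x, y, h, kernel=quartic, weight=None):
    """Cross validation score for the Nadaraya-Watson estimator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = kernel((x[:, None] - x[None, :]) / h) / h   # k[j, i] = K_h(x_j - x_i)
    np.fill_diagonal(k, 0.0)                        # leave observation j out
    denom = k.sum(axis=1)
    if np.any(denom == 0.0):
        return np.inf                               # some x_j has no neighbor within h
    m_loo = (k @ y) / denom                         # leave-one-out fits at the x_j
    w = np.ones_like(y) if weight is None else weight(x)
    return np.mean((y - m_loo) ** 2 * w)


def cv_bandwidth(x, y, grid):
    """Return the grid point minimizing the CV score, and all scores."""
    scores = np.array([loo_cv(x, y, h) for h in grid])
    return grid[int(np.argmin(scores))], scores


# Synthetic example on a regular design (purely illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
grid = np.linspace(0.02, 0.5, 25)
h_cv, scores = cv_bandwidth(x, y, grid)
```

Returning infinity for bandwidths that isolate an observation is a design choice of the sketch; it keeps such bandwidths from being selected when the kernel has bounded support.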

In penalizing approaches one employs a weighted version of the
resubstitution estimate:
\[ G(h) = p(h)\, \Xi\!\left( \frac{1}{n}\, w_{i,n}(x_i) \right) \tag{4.42} \]
with the correction function Ξ(·). It is required to have the first order Taylor
expansion
\[ \Xi(u) = 1 + 2u + O(u^2) , \quad u \to 0 . \tag{4.43} \]
Typical choices of Ξ(·) are the Akaike information criterion:
\[ \Xi_{AIC}(u) = \exp(2u) , \tag{4.44} \]
and the generalized cross validation selector:
\[ \Xi_{GCV}(u) = (1 - u)^{-2} . \tag{4.45} \]


For the generalized cross validation selector, we have CV (h) = G(h) with
ΞGCV . For other asymptotically equivalent choices of Ξ(·) see Härdle et al.
(2004).
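A penalized bandwidth choice can be sketched along the same lines, assuming NumPy, a quartic kernel and a unit weight function. The sketch applies the correction Ξ to each squared residual inside the average, with argument $n^{-1} w_{i,n}(x_i)$; this reading of (4.42) is an assumption made here, and it is the form in which the identity with the cross validation score for the correction (4.45) can be checked numerically. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def quartic(u):
    """Quartic kernel (4.4)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def xi_aic(u):
    """Akaike correction (4.44)."""
    return np.exp(2.0 * u)


def xi_gcv(u):
    """Correction (4.45)."""
    return (1.0 - u) ** (-2.0)


def penalized_score(x, y, h, xi, kernel=quartic):
    """Penalized resubstitution estimate with unit weight function.

    Assumption of this sketch: xi is applied to each squared residual with
    argument u_i = K_h(0) / sum_j K_h(x_i - x_j), i.e. n^{-1} w_{i,n}(x_i).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = kernel((x[:, None] - x[None, :]) / h) / h   # k[i, j] = K_h(x_i - x_j)
    s = k.sum(axis=1)                               # positive because K(0) > 0
    fit = (k @ y) / s                               # Nadaraya-Watson fit at x_i
    u = np.diag(k) / s
    return np.mean((y - fit) ** 2 * xi(u))


def loo_cv(x, y, h, kernel=quartic):
    """Leave-one-out cross validation score, used here only for comparison."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = kernel((x[:, None] - x[None, :]) / h) / h
    np.fill_diagonal(k, 0.0)
    return np.mean((y - (k @ y) / k.sum(axis=1)) ** 2)


# Synthetic example on a regular design (purely illustrative).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
grid = np.linspace(0.02, 0.3, 15)
aic = np.array([penalized_score(x, y, h, xi_aic) for h in grid])
h_aic = grid[int(np.argmin(aic))]
```

The penalty grows as the window around $x_i$ empties, so small bandwidths that merely interpolate the data are discouraged.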
If we denote by $\hat h$ the minimizer of G(h) and by $\hat h^*$ the minimizer of the ASE,
then for n ↑ ∞
\[ \frac{\mathrm{ASE}(\hat h)}{\mathrm{ASE}(\hat h^*)} \xrightarrow{\;p\;} 1 \quad \text{and} \quad \frac{\hat h}{\hat h^*} \xrightarrow{\;p\;} 1 . \tag{4.46} \]
Hence, independent of the specific choice of Ξ(·), the bandwidth chosen by the
penalizing approach is asymptotically equivalent to the bandwidth obtained by
minimizing the ASE.

4.4.2 Bandwidth choice in practice

Here, we give a short empirical demonstration of the estimators and explore
the consequences of the bandwidth choice. For this application, we use option
data from the dates 20010102 and 20010202. We explain in Appendix A
that call and put IVs can diverge when simply discounted futures prices are
used for the inversion of the BS formula. This is due to dividend
effects and tax distortions, Hafner and Wallmeier (2001). It is best observed
in late spring and early summer during the dividend season of the DAX index
companies. To resolve this issue one applies a correction scheme, see Appendix A.
In this section, we do not use the ‘corrected data’, since the least squares
kernel estimator to be presented in the following employs a weight function
that achieves this correction automatically by downweighting ITM options,
which are most sensitive to the dividend wedge. Moreover, for the January
and February data we use here this effect is hardly present. This implies that
we can use the simple moneyness measure
\[ \kappa \stackrel{\mathrm{def}}{=} \frac{K}{S_t} , \tag{4.47} \]
where $S_t = F_t e^{-r\tau}$, since δ ≈ 0.
In Table 4.1, we give an overview of the data employed. We prefer to
present the summary statistics in form of the IV data obtained by inverting
the BS formula separately for each observation rather than in form of the
option price data itself. The corresponding option prices will be displayed
later in the context of the least squares kernel estimator, see the top panel
of Figure 4.7. For the distribution of the data across moneyness compare
Figure 4.1, which presents density plots of moneyness for calls, puts, and all
the observations observed on 20010102 for 17 days to expiry. The densities are
obtained via a nonparametric density estimator, and bandwidths are chosen
by Silverman’s rule of thumb. Silverman’s rule of thumb is a particular way
to choose bandwidths in nonparametric density estimation, see Härdle et al.
(2004) for details. Put and call densities appear shifted. This is due to the
higher liquidity of ATM and OTM options. For the sake of space, we do not
present the very similar plots for the other expiry dates and 20010202.

[Figure 4.1: density plot; horizontal axis: Moneyness]
Fig. 4.1. Nonparametrically estimated densities of observed moneyness κ = K/St
for 20010102, and options with 17 days to expiry. Solid line for all observations,
thickly dashed line for puts, and the more thinly dashed line for calls only. Quartic
kernel used, bandwidth chosen according to Silverman’s rule of thumb.

Observation  Time to expiry  min     max     mean    standard   total number of  calls
date         (days)                                  deviation  observations
20010102      17             0.1711  0.3796  0.2450  0.0190     1219             561
              45             0.2112  0.2839  0.2425  0.0169      267             134
              73             0.1951  0.3190  0.2497  0.0199      391             209
             164             0.1777  0.3169  0.2528  0.0229      178              76
20010202      14             0.1199  0.4615  0.1730  0.0211     1560             813
              42             0.1604  0.2858  0.1855  0.0188      715             329
              77             0.1628  0.2208  0.1910  0.0172      128              45
             133             0.1645  0.2457  0.1954  0.0221      119              63

Table 4.1. IV data as obtained by inverting the BS formula separately for each
observation in the sense of two-step estimators.

For our smile fits, we pick the options nearest to expiry from the 20010102
data. We start using the Nadaraya-Watson estimator for different bandwidths
to demonstrate the tradeoff between bias and variance. The top left estimate
for the bandwidth h = 0.005 in Figure 4.2 is clearly undersmoothed: the
estimate is very rough especially in the far OTM regions and has spikes. Since
the smile in the ATM region looks already quite reasonable, one solution is to
employ local bandwidths h(x) that vary in x. In this case bandwidths should be
an increasing function in either direction from ATM. Alternatively one may
increase the global bandwidth: the estimate obtained for h = 0.01 appears
already smoother, but still has some ‘wiggles’. Increasing the bandwidth
further to h = 0.05 yields the smooth smile function seen in the lower left panel
in Figure 4.2. However, the function appears already slightly biased, since in
the wings of the smile the estimated function tends to lie systematically below
the IV observations. This becomes more obvious for the large and extremely
oversmoothing bandwidth h = 0.1, Figure 4.2 lower right panel. The reason
for this behavior of the Nadaraya-Watson estimator is that the number of
observations becomes smaller and smaller the farther we move into the wings
of the smile. Thus, within the local window of averaging, the estimate will be
strongly influenced by the mass of the observations which have a lower IV.
Next, we run an Akaike penalizing approach for the bandwidth choice. In
the top panel of Figure 4.3, we display the penalized objective function. It
is a convex function that takes its minimum in the neighborhood of 0.0285,
for which we display the estimate in the lower panel. It appears to provide a
reasonable fit to the data.
The exercise can be repeated for the local linear estimator. The results are
displayed in Figure 4.4. Typically the bandwidths for the local polynomial
estimator need to be bigger than for the Nadaraya-Watson estimator. This
is seen in the upper left plot of the figure. Here, the bandwidth in the wings
of the smile is too small to yield a reasonable estimate. For the bigger
bandwidths better estimates are obtained. Note that the bias problem visible for
the Nadaraya-Watson estimator is less present in local linear smoothing: even
for the biggest bandwidth from our set, 0.1, we obtain a reasonable result.
This is because even for larger intervals, the IV smile can be reasonably well
fitted by piecewise linear splines. The bandwidth needs to be increased much
more strongly to produce an estimate similar to that in the lower right panel of
Figure 4.2. Given the typical parabolic shape of the smile function, this effect
is even more striking for local quadratic fits.
For precisely this reason we prefer local polynomial smoothing in smile
modeling: for the functional shapes that are usually encountered in smile
modeling, the local polynomial estimates appear to be relatively robust against
oversmoothing. This facilitates bandwidth choice enormously for two reasons:
first, the data in the outer wings of the smile can become very sparse. Thus,
if a global bandwidth is to be used, it is likely that the smile needs to be
oversmoothed. Second, from the perspective of computing daily estimates of
the smile in a large sample such as ours, it can be justified to employ one single
and potentially slightly oversmoothing bandwidth for all estimates without
minimizing the penalized resubstitution estimate again and again.

[Figure 4.2: four panels titled “Smile estimation: h = 0.005”, “h = 0.01”,
“h = 0.05” and “h = 0.1”; horizontal axis: Moneyness]
Fig. 4.2. Smile function obtained via Nadaraya-Watson smoothing for various
bandwidths h. From top left to lower right h is: 0.005, 0.01, 0.05, 0.1.

For estimates of the entire IVS, in principle one could proceed similarly.
The empirical difficulty, however, seems to be that both cross validation and
penalizing approaches tend to yield unsatisfactory results due to the intricate
design of the IV data in the time to maturity direction: while the bandwidth
optimization in moneyness direction poses no difficulty, adding the time to
maturity dimension leads to convexity problems in the penalized function
and consequently to unreasonable minimizers, such as boundary solutions.
This phenomenon was first discussed by Fengler et al. (2003b), see also
Fengler et al. (2003a).

[Figure 4.3: two panels; top panel titled “Penalizing Approach: hopt = 0.026”,
horizontal axis: bandwidths, vertical axis scaled by 1E-3; lower panel,
horizontal axis: Moneyness]
Fig. 4.3. The top panel displays the penalized resubstitution estimate of the
Nadaraya-Watson estimator. Penalizing function is the Akaike function (4.44). The
lower panel shows the smile function obtained for the optimal bandwidth h = 0.028.

[Figure 4.4: four panels titled “Smile estimation: h = 0.005”, “h = 0.01”,
“h = 0.05” and “h = 0.1”; horizontal axis: Moneyness]
Fig. 4.4. Smile function obtained via local linear smoothing for various bandwidths
h. From top left to lower right h is: 0.005, 0.01, 0.05, 0.1.

The practical solution we adopt in most cases where we use global
bandwidths, such as in the CPC analysis, is the following: we run the
aforementioned minimization only across moneyness in each of a number of daily
samples. Next, we inspect the minimizers and the bias over a wide range
of bandwidths. Typically the conclusions are similar, and we use slightly
oversmoothing, but fixed bandwidths for all estimates. This approach is justified
by the fact that in the time to maturity direction one is more interested in
interpolation than in smoothing. In the semiparametric factor model,
where visual inspection is not directly possible, we propose a weighted Akaike
penalization that explicitly takes into account the sparseness of the data. This
is explained in Section 5.4.
[Figure 4.5: two surface plots]
Fig. 4.5. IVS estimation via Nadaraya-Watson (top panel) and local linear
smoothing (lower panel). Global bandwidth h1 = 0.04 in moneyness and h2 = 0.3 in time
to maturity direction.

[Figure 4.6: four surface plots titled “IVS”, “IVS: first order moneyness
derivative”, “IVS: first order time to mat. derivative” and “IVS: second order
moneyness derivative”]
Fig. 4.6. IVS derivative estimation via local polynomial estimation of order two.
From upper left to lower right, the plots show the IVS on 20000502, the first order
moneyness derivative, the first order time to maturity derivative, and the second
order moneyness derivative. Bandwidths are localized. The corresponding LVS plot
was given in Figure 3.1.

In Figure 4.5, we present two surfaces, a Nadaraya-Watson and a local
linear estimate, for the data of 20010102. It is again visible that the local
linear estimator captures the shape of the smile better, especially for IVs
near to expiry.
Before concluding let us revisit the estimate of the LVS in Section 3.1
recovered from the IV counterpart of the Dupire formula, Equation (3.39).
This requires estimating the first order derivatives of the IVS with respect to
moneyness and time to maturity as well as the second order derivative with
respect to moneyness. This can be achieved by local polynomial smoothing
of order p ≥ 2. Here, we employ an order two polynomial. Moreover, in order
to achieve an exact fit of the data we use local bandwidths h(x). These can be
obtained from a smooth function h(x) that approximates the global
bandwidths that have been obtained from separate cross validations for the
short and long time to maturity data. A more sophisticated method is an
empirical-bias bandwidth selection procedure, Ruppert (1997). Local bandwidths
allow for better capturing the gap between the short and the long term time
to maturity strings. In Figure 4.6, we present the derivatives together with
the regression function itself, from which the LVS is recovered via the
moneyness representation of the Dupire formula (3.39).

4.5 Least squares kernel smoothing

4.5.1 The LSK estimator of the IVS

In this section, we propose a special smoother designed for estimating the IVS.
It is a one-step procedure based on a least squares kernel (LSK) estimator that
smooths IV in the space of option prices. There is no need for first inverting
the BS formula to recover IV observations: the observed option prices are the
only input required. The LSK estimator is a special case of a general
class of estimators, the so-called kernel M-estimators, that has been introduced
by Gouriéroux et al. (1994). Gouriéroux et al. (1995) employ this estimator
to model and predict stochastic IV.
Since we aim at estimating on a moneyness metric, we rewrite the BS
formula for calls (2.23) in terms of moneyness as follows, Gouriéroux et al.
(1995):
\[ C^{BS}(S_t, t, K, T, \sigma, r, \delta) = S_t\, c^{BS}(\kappa_t, \tau, \sigma, r, \delta) , \tag{4.48} \]
where $c^{BS}(\kappa_t, \tau, \sigma, r, \delta) \stackrel{\mathrm{def}}{=} \Phi(d_1) - \kappa_t e^{-r\tau} \Phi(d_2)$, and
$d_1 = \frac{-\ln \kappa_t + (r + \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}}$, $d_2 = d_1 - \sigma\sqrt{\tau}$ as before.
We recall that throughout this section we work with
the simple moneyness measure
\[ \kappa_t \stackrel{\mathrm{def}}{=} \frac{K}{S_t} . \tag{4.49} \]
We add a subscript t in order to highlight the time dependence. For simplicity,
we shall also assume zero dividends. Inserting a constant dividend yield would
be a simple extension.
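The homogeneity property (4.48) is easy to verify numerically. The sketch below implements the normalized call price under the zero-dividend assumption made above, using only the Python standard library; the function names are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cdf Phi."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_moneyness(kappa, tau, sigma, r):
    """Normalized BS call price c^BS = Phi(d1) - kappa e^{-r tau} Phi(d2)."""
    d1 = (-log(kappa) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return norm_cdf(d1) - kappa * exp(-r * tau) * norm_cdf(d2)


def bs_call(S, K, tau, sigma, r):
    """BS call price in levels (zero dividends), for comparison with (4.48)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)
```

Up to floating point error, `bs_call(S, K, tau, sigma, r)` equals `S * bs_call_moneyness(K / S, tau, sigma, r)`, so only moneyness and time to maturity enter the normalized price.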
The LSK estimator for the IVS is defined by:
\[ \hat\sigma(\kappa_t, \tau) = \arg\min_{\sigma} \sum_{i=1}^n \left\{ \tilde c_{t_i} - c^{BS}(\cdot, \sigma) \right\}^2 w(\kappa_{t_i})\, K_{(1)}\!\left( \frac{\kappa_t - \kappa_{t_i}}{h_1} \right) K_{(2)}\!\left( \frac{\tau - \tau_i}{h_2} \right) . \tag{4.50} \]
Here, the observed call prices are normalized by the asset price, i.e.
$\tilde c_{t_i} \stackrel{\mathrm{def}}{=} \tilde C_{t_i} / S_{t_i}$, for i = 1, . . . , n. Otherwise notation stays as introduced in Section 2.4.
$K_{(1)}(\cdot)$ and $K_{(2)}(\cdot)$ are univariate kernel functions, and w(·) denotes a
uniformly continuous and bounded weight function, which allows for differential
weights of observed option prices. This weight function is useful in the
following respect: it is usually argued that ITM options contain a liquidity premium
and should be incorporated to a lesser extent into the IV estimate, or even
excluded, Aït-Sahalia and Lo (1998) and Skiadopoulos et al. (1999). This goal
can be achieved by using an appropriate weight function w(κ). In Section 4.5.2,
we will discuss plausible choices of the weight functions.
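To illustrate the one-step character of (4.50), the following sketch minimizes the kernel-weighted sum of squared price residuals over σ by a golden section search. It assumes NumPy, quartic kernels in both directions, a constant interest rate, and, unless a weight function is passed, unit weights w ≡ 1. The model price is evaluated at each observation's own moneyness and time to maturity, in line with the residual notation $A_i$ used below. All function names, the search interval for σ and the synthetic data are hypothetical.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)


def quartic(u):
    """Quartic kernel (4.4)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def bs_call_moneyness(kappa, tau, sigma, r):
    """Normalized BS call price c^BS (zero dividends), vectorized."""
    d1 = (-np.log(kappa) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    cdf = lambda z: 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    return cdf(d1) - kappa * np.exp(-r * tau) * cdf(d2)


def golden_section(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    g = (sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def lsk_iv(kappa0, tau0, kappa, tau, c_obs, r, h1, h2, weight=None,
           bounds=(0.01, 2.0)):
    """LSK estimate of IV at (kappa0, tau0) from normalized call prices."""
    kappa = np.asarray(kappa, dtype=float)
    tau = np.asarray(tau, dtype=float)
    c_obs = np.asarray(c_obs, dtype=float)
    k = quartic((kappa0 - kappa) / h1) * quartic((tau0 - tau) / h2)
    w = np.ones_like(kappa) if weight is None else weight(kappa)
    kw = k * w
    if not np.any(kw > 0.0):
        raise ValueError("no observations in the local window")

    def objective(sigma):
        resid = c_obs - bs_call_moneyness(kappa, tau, sigma, r)
        return np.sum(resid**2 * kw)

    return golden_section(objective, *bounds)


# Synthetic prices generated from a flat volatility of 0.25 (illustrative only).
kappa = np.linspace(0.9, 1.1, 41)
tau = np.full(kappa.size, 0.1)
prices = bs_call_moneyness(kappa, tau, 0.25, 0.03)
sigma_hat = lsk_iv(1.0, 0.1, kappa, tau, prices, 0.03, h1=0.05, h2=0.05)
```

Since the BS call price is increasing in σ, each residual changes sign at most once, which makes a bracketing search such as golden section a natural choice for this scalar problem.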

We make the following assumptions:

(A1) The moneyness of the option prices is iid, and $E\kappa_t^4 < \infty$.
(A2) The weight function w(·) is uniformly continuous and bounded.
(A3) K(1) (·) and K(2) (·) are bounded probability density kernel functions
with bounded support.
(A4) Interest rate r is a fixed constant.

Assumption (A1) is a very weak assumption. It can very well be considered
to hold in practice, since by the institutional arrangements at futures
exchanges, options at new strikes are always launched in the neighborhood of
$S_t$. The proof relies on $E(\tilde c_t^4 \mid \mathcal{F}_t) = E\{(\tilde C_t / S_t)^4 \mid \mathcal{F}_t\} < \infty$. However, since $S_t$
is measurable with respect to $\mathcal{F}_t$ and by simple no-arbitrage considerations,
we have $0 \le E_t(\tilde C_t \mid \mathcal{F}_t) \le S_t$. So this condition is implied by option pricing
theory.
Assumption (A2) is very common, and some important weight functions
satisfy it. In Section 4.5.2 we will discuss possible choices of w(·). (A3) is
a condition met by many kernels used in nonparametric regression, such
as the quartic or the Epanechnikov kernel functions, Section 4.2.1. (A4) is
an assumption often used in the option pricing literature including the BS
model. It is generally justified by the empirical observation that asset price
variability largely outweighs the changes of the interest rates. Nevertheless,
the impact from changing interest rates can be substantial for options with a
very long time to maturity.
Given assumptions (A1) to (A4), we have:

Consistency. Let σ(κt , τ ) be the solution of
$E[\{\tilde c_{t_1} - c^{BS}(\kappa_t, \tau, r_1, \sigma)\}\, w(\kappa_t) \mid \mathcal{F}_t] = 0$. If conditions (A1), (A2), (A3) and
(A4) are satisfied, then
\[ \hat\sigma(\kappa_t, \tau) \xrightarrow{\;p\;} \sigma(\kappa_t, \tau) \]
as $n h_{1,n} h_{2,n} \to \infty$.

The proof can be found in Gouriéroux et al. (1994) and, for the sake of
completeness, is reproduced in the version of Fengler and Wang (2003) in Appendix C.

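For the next result, we introduce the notation:
\[ A_i(\kappa_t, \tau, r, \sigma) \stackrel{\mathrm{def}}{=} \tilde c_{t_i} - c^{BS}(\kappa_{t_i}, \tau_i, r_i, \sigma) , \]
\[ B(\kappa_t, \tau, r, \sigma) \stackrel{\mathrm{def}}{=} \frac{\partial c^{BS}(\cdot)}{\partial \sigma} = S_t^{-1} \frac{\partial C^{BS}(\cdot)}{\partial \sigma} = \sqrt{\tau}\, \varphi(d_1) , \tag{4.51} \]
\[ D(\kappa_t, \tau, r, \sigma) \stackrel{\mathrm{def}}{=} \frac{\partial^2 c^{BS}(\cdot)}{\partial \sigma^2} = S_t^{-1} \frac{\partial^2 C^{BS}(\cdot)}{\partial \sigma^2} = \sqrt{\tau}\, \varphi(d_1)\, \frac{d_1 d_2}{\sigma} . \tag{4.52} \]
The quantities B and D are the ‘moneyness versions’ of the vega and the
volga, which have been introduced in Section 2.4.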
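The closed forms (4.51) and (4.52) can be checked against finite differences of the normalized call price. The sketch below does this under the zero-dividend assumption, using only the Python standard library; the function names are hypothetical.

```python
from math import erf, exp, log, pi, sqrt


def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """Standard normal pdf."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def d1_d2(kappa, tau, sigma, r):
    d1 = (-log(kappa) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return d1, d1 - sigma * sqrt(tau)


def c_bs(kappa, tau, sigma, r):
    """Normalized BS call price (zero dividends)."""
    d1, d2 = d1_d2(kappa, tau, sigma, r)
    return norm_cdf(d1) - kappa * exp(-r * tau) * norm_cdf(d2)


def vega_m(kappa, tau, sigma, r):
    """B in (4.51): first derivative of c^BS with respect to sigma."""
    d1, _ = d1_d2(kappa, tau, sigma, r)
    return sqrt(tau) * norm_pdf(d1)


def volga_m(kappa, tau, sigma, r):
    """D in (4.52): second derivative of c^BS with respect to sigma."""
    d1, d2 = d1_d2(kappa, tau, sigma, r)
    return sqrt(tau) * norm_pdf(d1) * d1 * d2 / sigma


def finite_differences(kappa, tau, sigma, r, eps):
    """Central first and second differences of c^BS in sigma."""
    up = c_bs(kappa, tau, sigma + eps, r)
    mid = c_bs(kappa, tau, sigma, r)
    dn = c_bs(kappa, tau, sigma - eps, r)
    return (up - dn) / (2.0 * eps), (up - 2.0 * mid + dn) / eps**2
```

Because both derivatives are available analytically, plugging estimates into the asymptotic variance requires no numerical differentiation.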

Asymptotic normality. Under conditions (A1), (A2), (A3), and (A4), if
$E\{B^2(\kappa_t, \tau, r, \sigma)\, w(\kappa_t) \mid \mathcal{F}_t\} \neq E\{A(\kappa_t, \tau, r, \sigma)\, D(\kappa_t, \tau, r, \sigma)\, w(\kappa_t) \mid \mathcal{F}_t\}$, we have
\[ \sqrt{n h_{1,n} h_{2,n}}\, \{\hat\sigma(\kappa_t, \tau) - \sigma(\kappa_t, \tau)\} \xrightarrow{\;L\;} N(0, \gamma^{-2} \nu^2) , \]
where
\[ \gamma^2 \stackrel{\mathrm{def}}{=} \Big[ E\{ -B^2(\kappa_t, \tau, r, \sigma)\, w(\kappa_t) + A(\kappa_t, \tau, r, \sigma)\, D(\kappa_t, \tau, r, \sigma)\, w(\kappa_t) \mid \mathcal{F}_t \} \Big]^2 f_t(\kappa_t, \tau) , \tag{4.53} \]
\[ \nu^2 \stackrel{\mathrm{def}}{=} E\{ A^2(\kappa_t, \tau, r, \sigma)\, B^2(\kappa_t, \tau, r, \sigma)\, w^2(\kappa_t) \mid \mathcal{F}_t \} \int\!\!\int K_{(1)}^2(u)\, K_{(2)}^2(v)\, du\, dv , \tag{4.54} \]
and $f_t(\kappa_t, \tau)$ is the joint (time-t conditional) probability density function of
κt and τ.

For the proof see Gouriéroux et al. (1994) and Appendix C. Finally,
the results carry over to put options: by the put-call parity and the bounded
pay-off of put options, both results hold also for put options, with A replaced
correspondingly.
The asymptotic distribution depends intricately on the first and second
order derivatives, and the particular weight function. Nevertheless an approx-
imation is simple, since the first and second order derivatives have the ana-
lytical expressions given in Equations (4.51) and (4.52).

4.5.2 Application of the LSK estimator

For the choice of the weighting function, one may go back to the early
literature on IV. With the aim of obtaining a good forecast of the asset price
variability, these studies discuss weighting the observations intensively, see
the discussion in Section 2.10. Schmalensee and Trippi (1978) and Whaley
(1982) argue in favor of unweighted averages, i.e. they use the scalar estimate
\[ \hat\sigma = \arg\min_{\sigma} \sum_{i=1}^n \{\tilde C_i - C^{BS}(\cdot, \sigma)\}^2 , \tag{4.55} \]
as a predictor of the future stock price variability. Beckers (1981) minimizes
\[ \hat\sigma = \arg\min_{\sigma} \sum_{i=1}^n w_i \{\tilde C_i - C^{BS}(\cdot, \sigma)\}^2 \Big/ \sum_{i=1}^n w_i , \tag{4.56} \]
where $w_i \stackrel{\mathrm{def}}{=} \partial C_i / \partial \sigma$ is the option vega.
Similarly, Latané and Rendelman (1976) use the squared vega as weights:
\[ \hat\sigma = \sqrt{\sum_{i=1}^n w_i^2\, \hat\sigma_i^2} \Big/ \sum_{i=1}^n w_i . \tag{4.57} \]
Finally, Chiras and Manaster (1978) propose to employ the elasticity with
respect to volatility:
\[ \hat\sigma = \sum_{i=1}^n \eta_i\, \hat\sigma_i \Big/ \sum_{i=1}^n \eta_i , \tag{4.58} \]
where $\eta_i \stackrel{\mathrm{def}}{=} \frac{\partial C_i}{\partial \sigma} \frac{\sigma}{C_i}$.
For calls and puts, vega is a Gaussian shaped function in the underlying
centered (roughly) ATM, compare Equation (4.51) and Figure 2.3. Elasticity is
a decreasing (increasing) function in the underlying for calls (puts). Common
concern of the weighting procedures is to give low weight to ITM options, and
highest weight to ATM or OTM options: ITM options are more expensive than
ATM and OTM options because their intrinsic value, i.e. the payoff function
evaluated at the current underlying prices, is already positive. Thus, they
provide lower leverage for speculation, and produce higher costs in portfolio
4.5 Least squares kernel smoothing 123

0.1
0.05
0

0.9 1 1.1 1.2


Moneyness

Smile, 20010102, 17 days to expiry


0.45
0.4
0.35
0.3
0.25
0.2
0.15

0.9 1 1.1 1.2


Moneyness

Fig. 4.7. Upper panel: Observed option price data on 20010102. From lower left
to upper right relative put prices, from upper left to lower right relative call prices.
Lower panel: LSK smoothed IV smile for 17 days to expiry on 20010102. Bandwidth
h1 = 0.025, quartic kernels employed. Minimization achieved by Golden section
search. Dotted lines are the 95% confidence intervals for σb. Single dots are IV data
obtained by inverting the BS formula separately for each observation in the sense of
two-step estimators.
124 4 Smoothing techniques

0.15
0.1
0.05
0

0.85 0.9 0.95 1 1.05 1.1 1.15


Moneyness

Smile, 20010202, 14 days to expiry


0.4
0.3
0.2
0.1

0.85 0.9 0.95 1 1.05 1.1 1.15


Moneyness

Fig. 4.8. Upper panel: Observed option price data on 20010202. From lower left
to upper right relative put prices, from upper left to lower right relative call prices.
Lower panel: LSK smoothed IV smile for 14 days to expiry on 20010202. Bandwidth
h1 = 0.015, quartic kernels employed. Minimization achieved by Golden section
search. Dotted lines are the 95% confidence intervals for σb. Single dots are IV data
obtained by inverting the BS formula separately for each observation in the sense of
two-step estimators.
hedging. Due to their lower trading volume, they are suspected to sell at a
liquidity premium from which biased estimates of IV may ensue. Consequently,
some authors delete or downweight ITM options, Aït-Sahalia and Lo (1998).
The LSK estimator is general enough to allow for uniformly continuous
and bounded weighting functions w(κ) depending on moneyness. Technically,
it is possible to use weights depending also on other variables including σ
as done in (4.56) to (4.58). For several reasons, however, we refrain from
using more involved weight functions: first, when ITM options are deleted or
downweighted in the more recent literature, this choice is entirely determined
by moneyness, not by the vega. From this point of view, to have the weighting
scheme depend on σ is rather implicit. Second, from a statistical point of view,
weights depending on σ are likely to blow up the asymptotic variances in the form
of the derivatives of w. This complicates the estimation and the computation
of the confidence bands without contributing to the recovery of a good
estimate of the IVS. Finally, if one likes weights looking like the option vega
or elasticity with respect to volatility, one may very easily construct weights
w(κ) that look very similar. For instance, an estimator of the type of Latané
and Rendleman (1976) would put w shaped as a Gaussian density.
For the IVS estimation in our particular application, we want to give less
weight to ITM options. This can be achieved by using as weighting functions:
$$ w(\kappa) = \frac{1}{\pi} \arctan\{\alpha (1 - \kappa)\} + 0.5 , \tag{4.59} $$
for calls, and for puts:
$$ w(\kappa) = \frac{1}{\pi} \arctan\{\alpha (\kappa - 1)\} + 0.5 , \tag{4.60} $$
where π = 3.141... is the circle constant. The parameter α controls the
speed with which ITM options receive lower weight. ATM options are equally
weighted. Outside κ ≈ 1, only OTM options enter the minimization with
significant weight. In our application we choose α = 9. Other values are perfectly
possible; this choice is motivated by the aim of a gentle transition between OTM
call and OTM put options. The ultimate choice of α will depend on the specific
application at hand.
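
As a small illustration, the two weight functions (4.59) and (4.60) can be evaluated directly. The following Python sketch simply codes the formulas as printed; the function names and the default α = 9 are chosen here for illustration and are not part of the original implementation.

```python
import math


def weight_call(kappa: float, alpha: float = 9.0) -> float:
    """Weight (4.59) applied to a call observed at moneyness kappa."""
    return math.atan(alpha * (1.0 - kappa)) / math.pi + 0.5


def weight_put(kappa: float, alpha: float = 9.0) -> float:
    """Weight (4.60) applied to a put observed at moneyness kappa."""
    return math.atan(alpha * (kappa - 1.0)) / math.pi + 0.5


if __name__ == "__main__":
    # Tabulate both weights on a coarse moneyness grid.
    for kappa in (0.8, 0.9, 1.0, 1.1, 1.2):
        print(f"{kappa:.1f}  call {weight_call(kappa):.3f}  put {weight_put(kappa):.3f}")
```

Because arctan is an odd function, the call and put weights at the same moneyness add up to one, and both equal 0.5 at κ = 1. A larger α makes the transition around κ = 1 steeper.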
As kernel functions we employ the quartic kernels given in Equation (4.4).
Other bounded kernels can equally well be used, such as the Epanechnikov kernel
as stated in (4.5). In practice, the choice of the kernel functions has little
impact on the estimates, Marron and Nolan (1988) and Härdle (1990). Since
the minimization is globally convex (compare the proof of consistency in the
appendix), and well posed as long as h1 and h2 do not become unreasonably
small, any minimization algorithm for globally convex objective functions can
be employed. We use the Golden section search, described for instance in
Press et al. (1993) and implemented in XploRe, Härdle et al. (2000b). The
tolerance, i.e. the fractional precision of the minimum, is fixed at 10^{-8}.
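
For readers unfamiliar with the method, a golden section search can be sketched in a few lines of Python. This is a generic textbook version, not the XploRe routine; the function name, the bracketing interval and the toy objective in the usage example are assumptions made for illustration. In the LSK setting the objective would be the kernel-weighted sum of squared pricing errors as a function of σ at a fixed grid point.

```python
import math


def golden_section_min(f, a, b, tol=1e-8, max_iter=200):
    """Return the abscissa of the minimum of a unimodal function f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0      # 1/phi = 0.618...
    c = b - inv_phi * (b - a)                   # interior points with a < c < d < b
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        # Stop once the bracket is small relative to the location of the minimum.
        if abs(b - a) <= tol * (abs(c) + abs(d)):
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Toy convex objective standing in for the local least squares criterion.
    sigma_hat = golden_section_min(lambda s: (s - 0.25) ** 2, 0.01, 1.0)
    print(sigma_hat)
```

Each iteration reuses one of the two interior function values, so only one new evaluation of the objective is needed per step.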

[Figure 4.9. Top panel: "IVS on Jan. 02, 2001". Lower panel: "IVS on Feb. 02, 2001".]

Fig. 4.9. Top panel: IVS fit for 20010102; lower panel: IVS fit for 20010202, both
with the LSK smoother. In both panels, bandwidths are h1 = 0.03 in the moneyness
direction and h2 = 0.07 in the time to maturity direction. Single dots denote IV data
obtained by inverting the BS formula separately for each observation in the sense of
two-step estimators. All observations are equally weighted.
We use the same data as already presented in Table 4.1. For the smile
estimation, we pick the options with the shortest time to expiry from the
20010102 and the 20010202 data. Plots are displayed in Figures 4.7 and 4.8.
The top panel shows the observed option prices given on the moneyness scale.
The function from the lower left to the upper right is the put price function,
the one from the upper left to the lower right the call price function. This is
at odds with the familiar ways of plotting these functions. The effect is due to
our definition of moneyness. The lower panels in Figures 4.7 and 4.8 present
the smile together with the asymptotic confidence bands. They fan out at the
wings of the smile since the data become increasingly sparse.
In Figure 4.9, fits for the entire IVS are presented. They appear under-
smoothed compared with Figure 4.5, since we used very small bandwidths in
the time to maturity direction. For these estimates, we do not employ the
weight functions (4.59) and (4.60): all observations are equally weighted.

4.6 Summary

In this chapter, we introduced smoothing techniques to estimate the IV smile
and the IVS. We considered Nadaraya-Watson, local polynomial and least
squares kernel smoothing and discussed the bandwidth choice.
In Nadaraya-Watson smoothing, one fits a local constant. This can have
disadvantageous effects: due to the unequal distribution of the IV data, this
may induce a bias in the wings of the smile function. In local polynomial
smoothing this effect is less present. Also for larger bandwidths the bias re-
mains small. Additionally, local polynomial smoothing allows for efficiently
estimating derivatives of the regression function. This feature is ideal for es-
timating the LVS.
Finally, we introduced an IVS estimator based on least squares kernel
smoothing. This estimator takes the option prices as input parameters, and
not IV data. Thus, in computing the asymptotic confidence bands, one directly
takes the nonlinear transformation of computing the IV into account.
5

Dimension-reduced modeling

5.1 Introduction
The IVS is a complex, high-dimensional random object. In building a model,
it is thus desirable to have a low-dimensional representation of the IVS. This
aim can be achieved by employing dimension reduction techniques. Generally
it is found that two or three factors with appealing financial interpretations
are sufficient to capture more than 90% of the IVS dynamics. This implies for
instance for a scenario analysis in risk-management that only a parsimonious
model needs to be implemented to study the vega-sensitivity of an option
portfolio, Fengler et al. (2002b). This section will give a general overview of
dimension reduction techniques in the context of IVS modeling. We will
consider techniques from multivariate statistics and methods from functional data
analysis. Sections 5.2 and 5.4 will provide an in-depth treatment of the CPC
and the semiparametric factor model of the IVS together with an extensive
empirical analysis of the German DAX index data.
In multivariate analysis, the most prominent technique for dimension re-
duction is principal component analysis (PCA). The idea is to seek linear com-
binations of the original observations, so called principal components (PCs)
that inherit as much information as possible from the original data. In PCA,
this means to look for standardized linear combinations with maximum vari-
ance. The approach appears to be sensible in an analysis of the IVS dynamics,
since a large variance separates out systematic from idiosyncratic shocks that
drive the surface. As a nice byproduct, the structure of the linear combinations
reveals relationships among the variables that are not apparent in the origi-
nal data. This helps understand the nature of the interdependence between
different regions in the IVS.
In finance, PCA is a well-established tool in the analysis of the term struc-
ture of interest rates, see Gouriéroux et al. (1997) or Rebonato (1998) for text-
book treatments: PCA is applied to a multiple time series of interest rates (or
forward rates) of various maturities that is recovered from the term structure
of interest rates. Typically, a small number of factors is found to represent
the dynamic variations of the term structure of interest rates. The studies of
Bliss (1997), Golub and Tilman (1997), Niffikeer et al. (2000), and Molgedey
and Galic (2001) are examples of this kind of literature.
This approach does not immediately carry over to the analysis of IVs due
to the surface structure. Consequently, in analogy to the interest rate case,
empirical work first analyzes the term structure of IVs of ATM options, only,
Zhu and Avellaneda (1997) and Fengler et al. (2002b). Alternatively, one smile
at one given maturity can be analyzed within the PCA framework, Alexander
(2001b). Skiadopoulos et al. (1999) group IVs into maturity buckets, average
the IVs of the options, whose maturities fall into them, and apply a PCA to
each bucket covariance matrix separately. A good overview of these methods
can be found in Alexander (2001a).
A surface perspective on IVS dynamics is adopted in Fengler et al. (2003b)
within a common principal component (CPC) framework for the IVS. The
approach is motivated by two salient features that characterize the IVS
dynamics: first, the instantaneous profile of the IVS is subject to changes, but
most shocks tend to move it in the same direction. Second, the size of the
shocks decreases with the option's maturity. This leads to high spatial
correlation between contemporaneous surface values, while at the same time the
‘volatility’ of IV is highest for the short maturity contracts. The insight from
these observations is that IVs of different maturity groups may obey a com-
mon eigenstructure. The CPC model exactly features this structure, since it
assumes that the space spanned by the eigenvectors of the covariance matri-
ces is identical across different groups, whereas the variances associated with
the components are allowed to vary. In order to mitigate the mixing effect of
IVs of different expiries, Fengler et al. (2003b) fit the daily IVS nonparamet-
rically and investigate a number of time to maturity slices. They show that
the dynamics of these slices can be generated by a small number of factors
from a lower dimensional space spanned by the eigenvectors of a common
transformation matrix.
Multivariate analysis is based on the idea that we observe a number of
random variables on a set of objects. The interest is to study the inherent
interdependence of these variables. To put the analysis of the IVS into this
framework, one recovers the IVS on a grid by applying some fitting algorithm,
e.g. as discussed in Chapter 4. The discrete ensemble of the observations at the
grid points is treated as the set of variables. As shall be seen, this approach
will yield a lot of insights into the nature of the IVS and its dynamics, and of
course, it is not against the nature of multivariate analysis. However, it may
be considered as being somewhat artificial, since the actual objects of interest
are functions rather than realizations of multivariate random variables. This
perspective is taken in functional data analysis.
In functional data analysis, we treat the observed IVS as a single entity –
as a function, though discretely sampled in practice – and not as a sequence
of individual observations for a choice of times to maturity and moneyness.
The term ‘functional’ is derived from the intrinsic nature of the data, rather
than from their explicit form. In treating the data in this way, the techniques
of multivariate analysis can be generalized to the functional case. This leads
to a functional PCA (FPCA) of the IVS as proposed by Cont and da Fonseca
(2002) and Benko and Härdle (2004).
(C)PCA and also FPCA both require an estimate of the IVS. The challenge
is to obtain a good fit given the degenerate string structure of the IV data.
By string structure, we mean the fact, discussed already in Section 2.5,
that in standardized markets only a very limited number of observations of
the IVS exist in the time to maturity direction. Unless the estimator is carefully
calibrated to this structure, one may quickly obtain biased estimates. In
nonparametric estimation, this will be the case when the bandwidths are chosen too large.
This disadvantage is addressed in a new modeling approach by Fengler et al.
(2003a). They propose a dynamic semiparametric factor model (SFM), which
approximates the IVS in a finite dimensional function space. The key feature
is that this model fits in the local neighborhood of the design points. The
approach can be considered as a combination of methods from FPCA and
backfitting techniques for additive models.
In practice, functional techniques often require a discretization of the
functional object that is to be estimated. Quite often, one is then back in the multivariate
framework again, which additionally bears the advantages of an easy
implementation and cheap computation. Also, unlike functional data analysis, the
statistical properties of the techniques of multivariate analysis are usually
well-known. Nevertheless, a functional approach may be considered as being more
elegant. This is particularly obvious for the SFM, which delivers a bias-reduced
surface estimation, dimension reduction and dynamic modeling in a
single step.
The structure of the remaining parts of this chapter is as follows:
Section 5.2 introduces the CPC models of the IVS. Since PCA is a special case
of CPCA with one group only, we skip a separate presentation of PCA there.
For an introduction to PCA, we refer to classical textbooks in multivariate
statistics such as Mardia et al. (1992) or Härdle and Simar (2003). After
a motivation of CPC models, we present their theory. Next, we derive test
statistics to analyze the stability of the principal component transformation
of the IVS. An empirical analysis of DAX IVs from 1995 to May 2001
follows. Section 5.3 introduces FPCA. Section 5.4 will be devoted to the
new class of SFMs. An exposition of the techniques will be given as well as
an extensive empirical analysis.
5.2 Common principal component analysis

5.2.1 The family of CPC models

PCA, as introduced into statistics by Pearson (1901) and Hotelling (1933), is a
dimension reduction technique for one group. In many applications, however,
the data fall into groups in which the same variables are measured. For exam-
ple, in a zoological application one measures the same characteristics across
different species, Airoldi and Flury (1988), or in an economic case study one
observes the same variables across different countries or markets. In an anal-
ysis of the IVS the data fall into maturity groups, as for a given observation
date a limited number of maturities are traded. This is visible from the black
dots in the plot of the IVS in Figure 1.1. In these situations, it is natural
to assume that the structure observed between groups is governed by one or
more common unobservable factors. The ‘degree of commonness’ between the
factors in each group may be of different nature. The CPC model and its
related methods, which were developed by Flury (1988), allow for a thorough
analysis of the eigenstructure of the different groups.
For a graphical justification of CPC models of the IVS, observe Figures 5.1
and 5.2: in Figure 5.1, we present scatterplots of the IV returns of 22 days
and 90 days to expiry recovered for two fixed points of (forward) moneyness.
Due to the higher volatility of short term IV returns, the corresponding point
cloud is bigger compared with the one belonging to the data of the 90 days to
expiry. Together with the zero mean data, we present the principal axes and
the ellipse given by the Mahalanobis distance:
$$ \sqrt{x_i^\top \Sigma_i^{-1} x_i} = 2 , \qquad i = 1, 2 , \tag{5.1} $$

where xi are the vectors that contain the log-differences of IVs, and Σi are
the sample covariance matrices. The ellipse (5.1) is an approximate 95% con-
fidence region for a zero mean multivariate normal distribution.
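
To make the construction of the ellipse (5.1) concrete, the following Python sketch computes points at a fixed Mahalanobis distance together with the principal axes of a covariance matrix. It is an illustration only: the function name and the example covariance matrix are hypothetical and do not stem from the DAX data.

```python
import numpy as np


def mahalanobis_ellipse(cov, radius=2.0, n_points=200):
    """Points x with sqrt(x' cov^{-1} x) = radius, plus the principal axes of cov."""
    eigval, eigvec = np.linalg.eigh(cov)                    # columns of eigvec are the axes
    angles = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.vstack([np.cos(angles), np.sin(angles)])    # unit circle, shape (2, n_points)
    # Stretch the circle by radius * sqrt(eigenvalue) along each axis, then rotate.
    points = eigvec @ (radius * np.sqrt(eigval)[:, None] * circle)
    return points.T, eigvec


if __name__ == "__main__":
    # Hypothetical covariance matrix of two IV return series.
    cov = np.array([[0.004, 0.003],
                    [0.003, 0.005]])
    points, axes = mahalanobis_ellipse(cov)
    print(points[:3])
    print(axes)
```

The principal axes returned here are the eigenvectors of the group's own covariance matrix, which corresponds to the separate PCA per maturity group.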
The striking observation is that the principal axes in both time to maturity
groups are almost identical. It is only the volatility of IV that is different.
A natural assumption therefore is to attribute the variability of the axes to
sampling variability, and otherwise to estimate principal axes jointly in both
groups under the constraint that they are equal: for the same data, the re-
sults are displayed in Figure 5.2. Now, principal axes in both cases are iden-
tical. We shall show and test that this also holds across the short-term IVS.
Consequently, via CPC methods, a significant reduction of dimension can be
achieved for the IVS dynamics.
Denote by $X_i \stackrel{\mathrm{def}}{=} (x_{i1}, \ldots, x_{ip}) \in \mathbb{R}^p$, $i = 1, \ldots, k$, the IV returns for k
maturity groups at p grid points in the IVS. The hypothesis for a CPC model
is written as:
$$ H_{CPC} : \Psi_i = \Gamma \Lambda_i \Gamma^\top , \qquad i = 1, \ldots, k , \tag{5.2} $$

[Figure 5.1. Panels: "Scatterplot under PCA: 22 days to maturity" and "Scatterplot under PCA: 90 days to maturity"; horizontal axis: {0.925} moneyness, vertical axis: {1.050} moneyness. SCMcpcpca.xpl]

Fig. 5.1. Scatterplots of IV returns of moneyness κf = 0.925 against κf = 1.050 for
the groups of 22 days and 90 days to expiry. IV returns computed as log-differences
from the IVS recovered on a fixed grid. The ellipse given by the Mahalanobis distance
is a 95% confidence region for a bivariate normal distribution. Principal axes of the
ellipses are the eigenvectors obtained by a separate PCA for each maturity group,
compare Figure 5.2.

where $\Psi_1, \ldots, \Psi_k$ are positive definite p × p population covariance matrices
of $X_i$. Further, $\Gamma = (\gamma_1, \ldots, \gamma_p)$ denotes an orthogonal p × p matrix of
eigenvectors and $\Lambda_i = \mathrm{diag}(\lambda_{i1}, \ldots, \lambda_{ip})$ is the matrix of eigenvalues. The number
of parameters in the CPC model is p(p − 1)/2 for the orthogonal matrix Γ
plus kp for the eigenvalues in $\Lambda_1, \ldots, \Lambda_k$.
The PCs $Y_i \stackrel{\mathrm{def}}{=} (y_{i1}, \ldots, y_{ip})$ are obtained by projecting $X_i$ into the space
spanned by its eigenvectors, i.e. by computing $Y_i = X_i \Gamma$. The variance of $Y_i$
is
$$ \mathrm{Var}(Y_i) = \mathrm{Var}(X_i \Gamma) = \Gamma^\top \mathrm{Var}(X_i) \Gamma = \Gamma^\top \Gamma \Lambda_i \Gamma^\top \Gamma = \Lambda_i , \tag{5.3} $$
since the eigenvectors are orthogonal. This confirms that PCs are uncorrelated
and that the eigenvalues correspond to their variances. The sum of the
eigenvalues, i.e. $\mathrm{tr}\, \Lambda_i = \sum_{j=1}^{p} \lambda_{ij}$, is the total variance in the sample. If a small
number of our p-variate PCs, say three of them, capture a large portion of the
total variance, a considerable reduction of dimension is achieved. For we can
write for each group i:
$$ X_i \approx Y_i \tilde{\Gamma}^\top , \tag{5.4} $$
where $\tilde{\Gamma} = (\gamma_1, \gamma_2, \gamma_3)$ and $Y_i$ is restricted to the corresponding first three
PCs. We say that the three-dimensional factor series unfolds
the full set of IV returns in group i. Instead of studying a p-variate factor
series, we inspect only three of them in each group. For a model in
risk-management or trading, this low-dimensional series can deliver a sufficiently

[Figure 5.2. Panels: "Scatterplot under CPC: 22 days to maturity" and "Scatterplot under CPC: 90 days to maturity"; horizontal axis: {0.925} moneyness, vertical axis: {1.050} moneyness. SCMcpccpc.xpl]

Fig. 5.2. Scatterplots of IV returns of moneyness κf = 0.925 against κf = 1.050 for
the groups of 22 days and 90 days to expiry. IV returns computed as log-differences
from the IVS recovered on a fixed grid. The ellipse given by the Mahalanobis distance
is a 95% confidence region for a bivariate normal distribution. Principal axes of the
ellipses are the eigenvectors obtained by the CPC model for both maturity groups, i.e.
eigenvectors are estimated under the restriction to be identical, compare Figure 5.1.

good description of the IVS dynamics. Furthermore, since the data in our
IVS groups are very much correlated, the factor series may be regarded as
scaled versions of each other. Thus, we can reduce our attention to study one
maturity group only. In total, instead of modeling kp factor series, we end up
with modeling three of them. These considerations demonstrate the usefulness
of dimension reduction techniques.
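
The projection (5.3) and the low-dimensional approximation (5.4) can be sketched for a single group as follows. This is a minimal one-group PCA in Python with NumPy, run on simulated returns; the function name, the simulated data and the choice of three factors are assumptions for illustration and do not reproduce the CPC estimator itself.

```python
import numpy as np


def pca_reduce(X, n_factors=3):
    """One-group PCA: eigenvalues, leading eigenvectors, factor series, explained share."""
    Xc = X - X.mean(axis=0)                       # zero-mean returns, shape (n, p)
    cov = np.cov(Xc, rowvar=False)                # unbiased sample covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]              # reorder: largest variance first
    eigval, eigvec = eigval[order], eigvec[:, order]
    gamma_tilde = eigvec[:, :n_factors]           # p x n_factors matrix of eigenvectors
    factors = Xc @ gamma_tilde                    # factor series Y = X * Gamma_tilde
    explained = eigval[:n_factors].sum() / eigval.sum()
    return eigval, gamma_tilde, factors, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 8
    # Simulated returns driven by two common shocks plus small idiosyncratic noise.
    loadings = rng.standard_normal((2, p))
    X = rng.standard_normal((n, 2)) @ loadings * 0.02 + rng.standard_normal((n, p)) * 0.002
    eigval, gamma_tilde, factors, explained = pca_reduce(X)
    X_approx = factors @ gamma_tilde.T            # approximation in the sense of (5.4)
    print(f"share of total variance captured by three PCs: {explained:.3f}")
```

The sample variances of the columns of `factors` coincide with the three largest eigenvalues, which is the sample counterpart of (5.3).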
A particular strength of CPC models is that they enclose a whole family
of models with varying degrees of flexibility in the eigenstructure. The pro-
portional model puts additional constraints on the matrix of eigenvalues Λi
by imposing that λij = ρi λ1j , where ρi > 0 are unknown constants. This is
equivalent to writing:

Hprop : Ψi = ρi Ψ1 , i = 2, . . . , k . (5.5)

The number of parameters here is p(p + 1)/2 + (k − 1). For the IVS this
means that the variances of the common components between the groups are
proportionally scaled versions of each other. In terms of modeling the IVS,
this implies that one needs to resort to one maturity group only, once the
scaling constants ρi are estimated.
Leaving the eigenvalues unrestricted as in the CPC hypothesis, one can
also ease the restrictions on the transformation matrix Γ: this leads to partial

higher model      lower model                      degrees of freedom

equality          proportionality                  k − 1
proportionality   CPC                              (p − 1)(k − 1)
CPC               pCPC(q) (1 ≤ q ≤ p − 2)          (1/2)(k − 1)(p − q)(p − q − 1)
pCPC(1)           arbitrary covariance matrices    (p − 1)(k − 1)

Table 5.1. The table presents the hierarchy of nested CPC models. From top to
bottom restrictions on the estimated population covariance matrices are eased. Se-
quentially, starting from top, each model is tested against the next lower one in the
hierarchy. The degrees of freedom of the corresponding χ2 test as given in column
(3) are obtained by subtracting the number of parameters to be estimated in each
model, compare Flury (1988), p. 151, and Fengler et al. (2003b). After arriving at
the CPC hypothesis, one tests the CPC against the pCPC(p − 2) model. Next, the
pCPC(p − 2) model is tested against the pCPC(p − 3) model, and so on, down to the
pCPC(1) model which is finally tested against the hypothesis of arbitrary covariance
matrices.

CPC models, pCPC(q), where q denotes the order of common eigenvectors in
Γ. This is appropriate under the assumption that each maturity group is hit
by q joint shocks, and by (p − q) shocks differing among the groups. Formally,
the hypothesis of the pCPC(q) model is
$$ H_{pCPC} : \Psi_i = \Gamma^{(i)} \Lambda_i \Gamma^{(i)\top} , \qquad i = 1, \ldots, k , \tag{5.6} $$
where $\Lambda_i$ is as in (5.2) and $\Gamma^{(i)} = (\Gamma_c , \Gamma_s^{(i)})$. Here, the p × q matrix $\Gamma_c$
contains the q common eigenvectors, while $\Gamma_s^{(i)}$ of dimension p × (p − q) holds
the p − q group specific eigenvectors. The $\Gamma^{(i)}$ are still orthogonal matrices.
This implies that the necessary dimension to estimate a pCPC(1) model is
p = 3. When all possible pCPC(q) are to be estimated sequentially moving
from the pCPC(p − 2) down to the pCPC(1) model, it is left to the modeling
approach in which order the constraints on γ j are relaxed. A natural way to
proceed is to allow in each step for group specific eigenvectors in the ‘least
important’ case, where importance is measured in terms of the size of the
corresponding eigenvalue. The total number of parameters amounts to
p(p − 1)/2 + kp + (k − 1)(p − q)(p − q − 1)/2.
CPC and pCPC(q) models can be ordered in a hierarchical fashion, which
allows a detailed analysis of the involved covariance matrices of different ma-
turity groups. The highest level of similarity would be to assume equality
between covariance matrices of different maturity groups Ψi . In this case the
number of parameters to be estimated are p(p + 1)/2, and one may obtain
the parameters by one single PCA applied to one pooled sample covariance
matrix of all k groups. The models, which relax the restrictions subsequently,
are the proportional model and the CPC model itself. The following levels
in the hierarchy are given by the pCPC(q) models starting from q = p − 2
and stepping down to q = 1. The relations between different groups disappear
successively, until at the last level the Ψi do not share any common
eigenstructure. As all these different models are nested, one can decompose the
total χ² statistic and test one model against a more flexible one in a step-up
procedure. Table 5.1 displays these sequential tests. By this summation
property, a test against any lower model is given by adding up the χ² test statistics
and the degrees of freedom between the two models under comparison, Flury
(1988). Additionally, we present Akaike and Schwarz information criteria for
model selection.
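
The degrees of freedom in Table 5.1 follow from differences in the parameter counts stated in the text. The short Python sketch below reproduces this bookkeeping. The function names are hypothetical, and the count k·p(p + 1)/2 for arbitrary covariance matrices is not stated in the text: it is added here as the number of free elements of k unrestricted symmetric p × p matrices.

```python
def n_params(model, p, k, q=None):
    """Number of free parameters of a model in the CPC hierarchy."""
    if model == "equality":
        return p * (p + 1) // 2
    if model == "proportional":
        return p * (p + 1) // 2 + (k - 1)
    if model == "cpc":
        return p * (p - 1) // 2 + k * p
    if model == "pcpc":
        # Requires the number q of common eigenvectors, 1 <= q <= p - 2.
        return p * (p - 1) // 2 + k * p + (k - 1) * (p - q) * (p - q - 1) // 2
    if model == "arbitrary":
        # Assumption: k unrestricted symmetric p x p covariance matrices.
        return k * p * (p + 1) // 2
    raise ValueError(f"unknown model: {model}")


def degrees_of_freedom(higher, lower, p, k, q_higher=None, q_lower=None):
    """Degrees of freedom of the chi-squared test of a higher against a lower model."""
    return n_params(lower, p, k, q_lower) - n_params(higher, p, k, q_higher)


if __name__ == "__main__":
    p, k = 6, 3
    print(degrees_of_freedom("equality", "proportional", p, k))       # k - 1
    print(degrees_of_freedom("proportional", "cpc", p, k))            # (p - 1)(k - 1)
    print(degrees_of_freedom("cpc", "pcpc", p, k, q_lower=2))         # (k-1)(p-q)(p-q-1)/2
    print(degrees_of_freedom("pcpc", "arbitrary", p, k, q_higher=1))  # (p - 1)(k - 1)
```

Adding the steps from equality down to arbitrary covariance matrices gives (k − 1)p(p + 1)/2 in total, which is the summation property used for tests between non-adjacent models.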

5.2.2 Estimating common eigenstructures

Here, we focus on the ordinary CPC model given in (5.2) due to its practical
importance and its similarity with the proportional and the pCPC models.
For the theory on the other models we refer to Flury (1988).
With a slight abuse of notation, let $X_i \stackrel{\mathrm{def}}{=} (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, k$, be the ($n_i$ × p)
matrices of IV returns sampled from k underlying p-variate normal
distributions $N(\mu, \Psi_i)$. As stated earlier, $\Psi_i$ denotes the population covariance
matrix. In our view the sample is recovered from a grid of size (k × p)
obtained by smoothing the IVS as discussed in the previous chapter. Let $\Sigma_i$ be
the (unbiased) sample covariance matrix of the returns of IV. In our
applications, we derive returns as first order log-differences of IVs. The sample size
is $n_i > p$ for $i = 1, \ldots, k$.
Applying general results from multivariate analysis, Härdle and Simar
(2003), under the assumption of normality, the distribution of $\Sigma_i$ is a
generalization of the chi-squared variate, the Wishart distribution with scale matrix
$\Psi_i$ and ($n_i$ − 1) degrees of freedom. It is denoted by:
$$ (n_i - 1)\, \Sigma_i \sim W_p(\Psi_i , n_i - 1) . $$

The pdf of the Wishart distribution is:
$$ f(\Sigma) = \frac{1}{\Gamma_p\!\left(\frac{n-1}{2}\right) |\Psi|^{(n-1)/2}} \left( \frac{n-1}{2} \right)^{p(n-1)/2} \exp\left\{ \mathrm{tr}\left( -\frac{n-1}{2} \Psi^{-1} \Sigma \right) \right\} |\Sigma|^{(n-p-2)/2} \tag{5.7} $$
for Σ positive definite, and zero otherwise, Evans et al. (2000). Here,
$$ \Gamma_p(u) \stackrel{\mathrm{def}}{=} \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left\{ u - \frac{1}{2}(j-1) \right\} \tag{5.8} $$
denotes the multivariate Gamma function, where π = 3.141..., and
$\Gamma(t) \stackrel{\mathrm{def}}{=} \int_0^\infty e^{-s} s^{t-1} \, ds$ is the univariate Gamma function.
For the k Wishart matrices $\Sigma_i$ the likelihood function is given by
$$ L(\Psi_1, \ldots, \Psi_k) = c \prod_{i=1}^{k} \exp\left\{ \mathrm{tr}\left( -\frac{1}{2}(n_i - 1) \Psi_i^{-1} \Sigma_i \right) \right\} |\Psi_i|^{-\frac{1}{2}(n_i - 1)} , \tag{5.9} $$
where c is a constant not depending on the parameters. Maximizing the
likelihood is equivalent to minimizing the function
$$ g(\Gamma, \Psi_1, \ldots, \Psi_k) = \sum_{i=1}^{k} (n_i - 1) \left\{ \ln |\Psi_i| + \mathrm{tr}(\Psi_i^{-1} \Sigma_i) \right\} . \tag{5.10} $$
Assuming that $H_{CPC}$ in equation (5.2) holds, yields
$$ g(\Gamma, \Lambda_1, \ldots, \Lambda_k) = \sum_{i=1}^{k} (n_i - 1) \sum_{j=1}^{p} \left( \ln \lambda_{ij} + \frac{\gamma_j^\top \Sigma_i \gamma_j}{\lambda_{ij}} \right) . \tag{5.11} $$
We impose the orthogonality constraints of Γ by introducing the Lagrange
multipliers $\mu_j$ for the p constraints $\gamma_j^\top \gamma_j = 1$, and the Lagrange multipliers
$\mu_{hj}$ for the p(p − 1)/2 constraints $\gamma_h^\top \gamma_j = 0$ (h ≠ j). Hence the Lagrange
function to be minimized is given by
$$ g^*(\Gamma, \Lambda_1, \ldots, \Lambda_k) = g(\cdot) - \sum_{j=1}^{p} \mu_j (\gamma_j^\top \gamma_j - 1) - 2 \sum_{h<j}^{p} \mu_{hj} \gamma_h^\top \gamma_j . \tag{5.12} $$
Taking partial derivatives with respect to all $\lambda_{ij}$ and $\gamma_j$, it can be shown
that the solution of the CPC model can be written as the generalized system
of characteristic equations:
$$ \gamma_m^\top \left\{ \sum_{i=1}^{k} (n_i - 1) \frac{\lambda_{im} - \lambda_{ij}}{\lambda_{im} \lambda_{ij}} \Sigma_i \right\} \gamma_j = 0 , \qquad m, j = 1, \ldots, p , \quad m \neq j . \tag{5.13} $$
This is solved observing
$$ \lambda_{im} = \gamma_m^\top \Sigma_i \gamma_m , \qquad i = 1, \ldots, k , \quad m = 1, \ldots, p , \tag{5.14} $$
and the constraints:
$$ \gamma_m^\top \gamma_j = \begin{cases} 0 & m \neq j \\ 1 & m = j \end{cases} . \tag{5.15} $$
If k = 1, the one-group case, it is quickly seen that (5.13) to (5.15) collapse
to the usual system of equations for an eigenvalue problem of Σ; this leads to
an ordinary PCA.
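
To illustrate the estimating equations numerically, the sketch below evaluates the objective (5.11), with the eigenvalues profiled out through (5.14), and the left-hand side of (5.13) for a candidate orthogonal matrix. It is not the algorithm of Flury and Gautschi (1986), only a check of the equations; the function names and the constructed covariance matrices are assumptions for illustration.

```python
import numpy as np


def group_eigenvalues(gamma, sigma):
    """lambda_ij = gamma_j' Sigma_i gamma_j for all columns gamma_j, see (5.14)."""
    return np.einsum("pj,pq,qj->j", gamma, sigma, gamma)


def cpc_objective(gamma, sigmas, ns):
    """Objective (5.11) with lambda_ij set by (5.14), so the ratio term equals one."""
    total = 0.0
    for sigma, n in zip(sigmas, ns):
        lam = group_eigenvalues(gamma, sigma)
        total += (n - 1) * np.sum(np.log(lam) + 1.0)
    return total


def cpc_equation_residual(gamma, sigmas, ns):
    """Matrix whose (m, j) entry is the left-hand side of (5.13)."""
    p = gamma.shape[1]
    residual = np.zeros((p, p))
    for sigma, n in zip(sigmas, ns):
        lam = group_eigenvalues(gamma, sigma)
        weight = (lam[:, None] - lam[None, :]) / (lam[:, None] * lam[None, :])
        residual += (n - 1) * weight * (gamma.T @ sigma @ gamma)
    return residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two covariance matrices constructed to share the eigenvectors in Q exactly.
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
    sigmas = [Q @ np.diag(d) @ Q.T for d in ([4.0, 2.0, 1.0, 0.5], [1.0, 3.0, 0.2, 0.7])]
    ns = [100, 150]
    G, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # some other orthogonal matrix
    print(cpc_objective(Q, sigmas, ns), cpc_objective(G, sigmas, ns))
    print(np.abs(cpc_equation_residual(Q, sigmas, ns)).max())
```

At the common eigenvectors the residual matrix vanishes and the objective is no larger than at any other orthogonal matrix, which is what (5.13) to (5.15) express.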
Flury (1988) proves existence and uniqueness of the maximum of the
likelihood function, and Flury and Gautschi (1986) provide a numerical
algorithm solving (5.13) to (5.15) that is implemented in XploRe, Härdle
et al. (2000b). The maximum likelihood estimates of $\Psi_i$ are denoted by
$\hat{\Psi}_i = \hat{\Gamma} \hat{\Lambda}_i \hat{\Gamma}^\top$, $i = 1, \ldots, k$. Sample common PCs of the maturity groups
are given by $Y_i = X_i \hat{\Gamma}$.
Furthermore the following results are due to Flury (1988):
The estimated eigenvalues $\hat{\lambda}_{ij}$, $i = 1, \ldots, k$, $j = 1, \ldots, p$ are asymptotically
distributed as
$$ \sqrt{n_i - 1}\, (\hat{\lambda}_{ij} - \lambda_{ij}) \stackrel{L}{\longrightarrow} N(0, 2\lambda_{ij}^2) \tag{5.16} $$
as $\min n_i \uparrow \infty$, and are asymptotically independent of each other and
independent of $\hat{\Gamma}$.
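
A direct consequence of (5.16) is an approximate confidence interval for a single eigenvalue. The following Python sketch uses the plug-in standard error λ̂ √(2/(n − 1)); the function name, the default normal quantile for a 95% level and the numbers in the usage example are assumptions for illustration.

```python
import math


def eigenvalue_interval(lam_hat, n, z=1.959964):
    """Approximate interval for an eigenvalue based on the normal limit (5.16)."""
    # Plug-in standard error: sqrt(2 * lam_hat^2 / (n - 1)).
    se = lam_hat * math.sqrt(2.0 / (n - 1))
    return lam_hat - z * se, lam_hat + z * se


if __name__ == "__main__":
    # Hypothetical eigenvalue estimate from a group with 201 observations.
    lower, upper = eigenvalue_interval(0.04, 201)
    print(lower, upper)
```

Since the standard error is proportional to the eigenvalue itself, the relative width of the interval depends on the sample size only.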
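Denote by $N = \sum_{i=1}^{k} n_i$ the overall number of observations. The
asymptotic distribution of the p eigenvectors is given by:
$$ \sqrt{N - k}\, \left( \hat{\Gamma} - \Gamma \right) \stackrel{L}{\longrightarrow} N\left( 0, \mathrm{Var}(\Gamma) \right) , \tag{5.17} $$
where Var(Γ) is the p² × p² matrix
$$ \mathrm{Var}(\Gamma) = \begin{pmatrix}
\sum_{j=1,\, j \neq 1}^{p} \theta_{1j} \gamma_j \gamma_j^\top & -\theta_{12} \gamma_2 \gamma_1^\top & \cdots & -\theta_{1p} \gamma_p \gamma_1^\top \\
-\theta_{21} \gamma_1 \gamma_2^\top & \sum_{j=1,\, j \neq 2}^{p} \theta_{2j} \gamma_j \gamma_j^\top & \cdots & -\theta_{2p} \gamma_p \gamma_2^\top \\
\vdots & \vdots & \ddots & \vdots \\
-\theta_{p1} \gamma_1 \gamma_p^\top & -\theta_{p2} \gamma_2 \gamma_p^\top & \cdots & \sum_{j=1,\, j \neq p}^{p} \theta_{pj} \gamma_j \gamma_j^\top
\end{pmatrix} \tag{5.18} $$
and
$$ \theta_{jm} \stackrel{\mathrm{def}}{=} \left\{ \sum_{i=1}^{k} \left( \frac{N - k}{n_i - 1} \, \frac{\lambda_{ij} \lambda_{im}}{(\lambda_{ij} - \lambda_{im})^2} \right)^{-1} \right\}^{-1} $$
with m ≠ j. We point out that
the variance matrix, as usual in PCA, does not have full rank. Instead, it has
rank p(p − 1)/2.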

5.2.3 Stability tests for eigenvalues and eigenvectors

With the preceding results, we are in a position to conduct hypothesis
tests about the eigenvalues $\lambda_{ij}$ and the eigenvectors $\gamma_j$ in the multisample
framework. The tests are formulated in this section.

Eigenvalues

Suppose we estimate a CPC model in R subsamples and wish to test the
hypothesis of equality of the jth eigenvalue $\lambda_{ij}^{(r)}$ in the ith group across R
subsamples:
$$ H_0 : \lambda_{ij}^{(1)} = \cdots = \lambda_{ij}^{(r)} = \cdots = \lambda_{ij}^{(R)} $$
against the alternative $H_1$: there exist $\lambda_{ij}^{(r_1)}, \lambda_{ij}^{(r_2)}$ such that $\lambda_{ij}^{(r_1)} \neq \lambda_{ij}^{(r_2)}$ for some
$r_1, r_2$. $H_0$ can be written as
$$ H_0 : \quad \begin{aligned}
\lambda_{ij}^{(1)} - \lambda_{ij}^{(2)} &= 0 \\
&\;\;\vdots \\
\lambda_{ij}^{(1)} - \lambda_{ij}^{(r)} &= 0 \\
&\;\;\vdots \\
\lambda_{ij}^{(1)} - \lambda_{ij}^{(R)} &= 0
\end{aligned} \; . \tag{5.19} $$

To formulate the test statistic, it is useful to define a contrast matrix:
$C = (c_1, \ldots, c_L)$ is called a contrast matrix, if $\sum_{l=1}^{L} c_l = 0$, and if its rows
are linearly independent, Johnson and Wichern (1998).
In particular, define by $C_1$ the (R − 1) × R contrast matrix
$$ C_1 = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
1 & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & -1
\end{pmatrix} . \tag{5.20} $$
Equality of the jth eigenvalue in the ith group in R subsamples.
Denote by $\tilde{\lambda}_{ij}$ the R × 1 stacked vector of $\hat{\lambda}_{ij}^{(r)}$, in r = 1, . . . , R subsamples
and its asymptotic variance by the R × R matrix
$$ \mathrm{Var}(\tilde{\lambda}) = 2\, \mathrm{diag}\left( \frac{\lambda_{ij}^{(1)\,2}}{n_{i1} - 1} , \ldots , \frac{\lambda_{ij}^{(r)\,2}}{n_{ir} - 1} , \ldots , \frac{\lambda_{ij}^{(R)\,2}}{n_{iR} - 1} \right) , $$
where $n_{ir}$ is the sample size of group i and subsample r. A test for (5.19) can
be based on:
$$ T_{equ} = (C_1 \tilde{\lambda}_{ij})^\top \left\{ C_1 \mathrm{Var}(\tilde{\lambda}) C_1^\top \right\}^{-1} C_1 \tilde{\lambda}_{ij} . \tag{5.21} $$

Since the $\lambda_{ij}$ are asymptotically normal and independent by virtue of (5.16),
$z \stackrel{\mathrm{def}}{=} \{ C_1 \mathrm{Var}(\tilde{\lambda}) C_1^\top \}^{-1/2} C_1 \tilde{\lambda}_{ij}$ is asymptotically $N(0_{R-1}, I_{R-1})$ under $H_0$.
Thus $T_{equ} = z^\top z$ is asymptotically χ² distributed with (R − 1) degrees of
freedom. In practice all unknowns are to be replaced by consistent estimates,
which does not alter the asymptotic distribution of (5.21).
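In fact, there are a lot of ways of formulating the aforementioned
hypothesis by a different choice of the contrast matrix. However, the test statistic does
not depend on this particular choice. For example, an equivalent formulation
of the hypothesis using the contrast matrix
$$ C_2 = \begin{pmatrix}
-1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 1
\end{pmatrix} $$
would be
$$ H_0 : \quad \begin{aligned}
\lambda_{ij}^{(2)} - \lambda_{ij}^{(1)} &= 0 \\
\lambda_{ij}^{(3)} - \lambda_{ij}^{(2)} &= 0 \\
&\;\;\vdots \\
\lambda_{ij}^{(R)} - \lambda_{ij}^{(R-1)} &= 0
\end{aligned} \; . $$

The equivalence of the tests is due to the fact that any pair of contrast
matrices is related by a nonsingular matrix A such that $C_1 = A C_2$. Inserting
$A C_2$ into T yields:
$$ \begin{aligned}
T &= (C_1 \tilde{\lambda})^\top \left\{ C_1 \mathrm{Var}(\tilde{\lambda}) C_1^\top \right\}^{-1} C_1 \tilde{\lambda} \\
  &= (A C_2 \tilde{\lambda})^\top \left\{ A C_2 \mathrm{Var}(\tilde{\lambda}) C_2^\top A^\top \right\}^{-1} A C_2 \tilde{\lambda} \\
  &= (C_2 \tilde{\lambda})^\top \left\{ C_2 \mathrm{Var}(\tilde{\lambda}) C_2^\top \right\}^{-1} C_2 \tilde{\lambda} ,
\end{aligned} $$
which is the same as before.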
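
The statistic (5.21) and its invariance with respect to the contrast matrix can be checked numerically. The Python sketch below builds $C_1$ and $C_2$, plugs in estimated eigenvalues for the unknown variances and evaluates the statistic; the function names and the numbers in the usage example are hypothetical. The resulting value would be compared with a quantile of the χ² distribution with R − 1 degrees of freedom.

```python
import numpy as np


def contrast_first_vs_rest(R):
    """C1 of (5.20): each row compares the first subsample with one other subsample."""
    return np.hstack([np.ones((R - 1, 1)), -np.eye(R - 1)])


def contrast_successive(R):
    """C2: each row compares two successive subsamples."""
    C = np.zeros((R - 1, R))
    idx = np.arange(R - 1)
    C[idx, idx] = -1.0
    C[idx, idx + 1] = 1.0
    return C


def eigenvalue_equality_stat(lams, ns, C):
    """Statistic (5.21) for eigenvalue estimates lams from subsamples of sizes ns."""
    lams = np.asarray(lams, dtype=float)
    ns = np.asarray(ns, dtype=float)
    var = np.diag(2.0 * lams ** 2 / (ns - 1.0))    # plug-in version of Var(lambda)
    d = C @ lams
    return float(d @ np.linalg.solve(C @ var @ C.T, d))


if __name__ == "__main__":
    lams = [0.040, 0.046, 0.035, 0.051]            # hypothetical estimates, R = 4
    ns = [250, 250, 250, 250]
    t1 = eigenvalue_equality_stat(lams, ns, contrast_first_vs_rest(4))
    t2 = eigenvalue_equality_stat(lams, ns, contrast_successive(4))
    print(t1, t2)                                  # both formulations agree
```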

Eigenvectors

In testing for the eigenvectors one faces the difficulty that the covariance
matrix Var(Γ) given in equation (5.18) is singular. The problem was first
solved for testing one single eigenvector by Anderson (1963) and generalized
for the case of several eigenvectors by Flury (1988). We adapt their strategies
here for our stability tests.
We will be interested in testing for stability of a single eigenvector across
different samples. Without loss of generality we focus on the first eigenvector.
Thus, the test will be based on the upper p × p matrix
$\sum_{j=1,\, j\neq 1}^{p} \theta_{1j}\gamma_j\gamma_j^\top$ in (5.18),
only. Tests for equality of q > 1 eigenvectors would need to employ the qp × qp
upper submatrix of (5.18).
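In analogy to (5.19) write
$$H_0: \gamma_1^{(1)} = \cdots = \gamma_1^{(r)} = \cdots = \gamma_1^{(R)}$$
against the alternative $H_1: \exists\, \gamma_1^{(r_1)}, \gamma_1^{(r_2)}$ such that $\gamma_1^{(r_1)} \neq \gamma_1^{(r_2)}$ for some
$r_1, r_2$.
Again we rewrite H0 as
$$H_0: \quad \begin{array}{l}
\gamma_1^{(1)} - \gamma_1^{(2)} = 0_p \\
\gamma_1^{(1)} - \gamma_1^{(3)} = 0_p \\
\qquad \vdots \\
\gamma_1^{(1)} - \gamma_1^{(r)} = 0_p \\
\qquad \vdots \\
\gamma_1^{(1)} - \gamma_1^{(R)} = 0_p
\end{array} \; , \qquad (5.22)$$
and use the p(R − 1) × pR contrast matrix:
$$C_3 = \begin{pmatrix}
I & -I & 0 & \cdots & 0 \\
I & 0 & -I & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
I & 0 & 0 & \cdots & -I
\end{pmatrix} , \qquad (5.23)$$
where $I \overset{def}{=} I_p$ and $0 \overset{def}{=} 0_{p\times p}$, here.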

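Test of equality of the first eigenvector in R subsamples. Denote by $\tilde{\gamma}_1$
the pR × 1 vector of stacked eigenvectors $\gamma_1^{(r)}$. Suppose that the R subsamples
are independently drawn, and define $\operatorname{Var}(\tilde{\gamma})$ as the pR × pR block-diagonal
matrix
$$\operatorname{Var}(\tilde{\gamma}) = \begin{pmatrix}
\sum_{j=1,\, j\neq 1}^{p} \theta_{1j}\gamma_j^{(1)}\gamma_j^{(1)\top} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \sum_{j=1,\, j\neq 1}^{p} \theta_{1j}\gamma_j^{(R)}\gamma_j^{(R)\top}
\end{pmatrix} . \qquad (5.24)$$

The α-level test of equality
$$H_0: C_3\tilde{\gamma}_1 = 0_{p(R-1)}$$
for the R first eigenvectors against $H_1: C_3\tilde{\gamma}_1 \neq 0$ is given by:
$$T_{equ} = (C_3\tilde{\gamma}_1)^\top \left\{ C_3 \operatorname{Var}(\tilde{\gamma})\, C_3^\top \right\}^{-1} C_3\tilde{\gamma}_1 . \qquad (5.25)$$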
Since $\sum_{j=1,\, j\neq 1}^{p} \theta_{1j}\gamma_j\gamma_j^\top$ has rank p − 1, the pR × pR matrix $\operatorname{Var}(\tilde{\gamma})$ has rank R(p − 1).
Thus $C_3 \operatorname{Var}(\tilde{\gamma})\, C_3^\top$ has full rank only if R(p − 1) ≥ (R − 1)p, or, equivalently,
when p ≥ R. In this case, since the $\gamma_j$ are asymptotically normal by (5.17),
$z \overset{def}{=} \{ C_3 \operatorname{Var}(\tilde{\gamma})\, C_3^\top \}^{-1/2} C_3\tilde{\gamma}_1$ is asymptotically $N(0_{p(R-1)}, I_{p(R-1)})$ under
H0. Thus $T_{equ} = z^\top z$ is asymptotically $\chi^2$ distributed with p(R − 1) degrees
of freedom.
If p < R, the test is computed with the generalized inverse of $C_3 \operatorname{Var}(\tilde{\gamma})\, C_3^\top$.
Then, by Theorem 1 in Khatri (1980) on quadratic forms of (singular) normal
variables, $T_{equ} = z^\top z$ is asymptotically $\chi^2$ distributed with (p − 1)R degrees
of freedom.
Thus, H0 is rejected, if
$$T_{equ} > \chi^2\left(1-\alpha;\ \min\{(p-1)R,\ p(R-1)\}\right) , \qquad (5.26)$$
where $\chi^2(1-\alpha; \nu)$ is the (1 − α)-quantile of the chi-squared distribution with
ν degrees of freedom.
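
The mechanics of (5.23) to (5.26) can be sketched as follows, assuming NumPy is available. The covariance blocks of (5.24) are taken as given inputs, and the pseudo-inverse is used throughout, since it coincides with the ordinary inverse whenever the matrix is nonsingular. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def eigenvector_equality_test(gammas, covs):
    """gammas: R first-eigenvector estimates (length p each, signs aligned);
    covs: R asymptotic p x p covariance blocks, assumed to be given."""
    R, p = len(gammas), len(gammas[0])
    # Contrast matrix (5.23): row block r compares subsample 1 with subsample r + 1.
    C3 = np.hstack([np.kron(np.ones((R - 1, 1)), np.eye(p)), -np.eye(p * (R - 1))])
    stacked = np.concatenate(gammas)
    # Block-diagonal covariance (5.24) for independently drawn subsamples.
    V = np.zeros((p * R, p * R))
    for r, cov in enumerate(covs):
        V[r * p:(r + 1) * p, r * p:(r + 1) * p] = cov
    d = C3 @ stacked
    t_equ = float(d @ np.linalg.pinv(C3 @ V @ C3.T) @ d)
    # Degrees of freedom as in the rejection rule (5.26).
    df = min((p - 1) * R, p * (R - 1))
    return t_equ, df

def toy_cov(g, scale=0.01):
    # Hypothetical rank p - 1 covariance with the eigenvector in its null space.
    return scale * (np.eye(len(g)) - np.outer(g, g))

g1 = np.array([1.0, 0.0])
g2 = np.array([np.cos(0.1), np.sin(0.1)])
t_equ, df = eigenvector_equality_test([g1, g2], [toy_cov(g1), toy_cov(g2)])
```

With p = 6 and R = 2, as in pairwise comparisons on a six-point moneyness grid, the rule gives six degrees of freedom.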

5.2.4 CPC model selection

There are several strategies of model selection. Given our maximum likelihood
framework, on the one hand, one may construct likelihood ratio tests and test
each model separately against the unrestricted model. The log-likelihood ratio
statistic for testing $H_{CPC}$ against the unrestricted model (unrelatedness
between covariance matrices) is given by:
$$T = -2 \ln \frac{L(\hat{\Psi}_1, \ldots, \hat{\Psi}_k)}{L(S_1, \ldots, S_k)} = \sum_{i=1}^{k} (n_i - 1) \ln \frac{|\hat{\Psi}_i|}{|S_i|} , \qquad (5.27)$$
where $L(S_1, \ldots, S_k)$ denotes the unrestricted maximum of the likelihood
function. The number of parameters estimated in the CPC model is p(p − 1)/2
for the orthogonal matrix Γ plus kp for the eigenvalues Λi, and the number
of parameters in the unrelated case is kp(p − 1)/2 + kp. Hence the
test is asymptotically chi-squared with (k − 1)p(p − 1)/2 degrees of freedom
as min ni ↑ ∞, Rao (1973).
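
The parameter count and the statistic (5.27) can be written down directly. This is a minimal sketch; the function names are illustrative, and the log-determinants passed to `lr_statistic` are assumed to come from a separate estimation step.

```python
def n_params_cpc(p, k):
    # p(p-1)/2 for the common orthogonal matrix plus k*p eigenvalues.
    return p * (p - 1) // 2 + k * p

def n_params_unrelated(p, k):
    # Each of the k groups has its own orthogonal matrix and eigenvalues.
    return k * p * (p - 1) // 2 + k * p

def lr_degrees_of_freedom(p, k):
    return n_params_unrelated(p, k) - n_params_cpc(p, k)

def lr_statistic(n, logdet_psi, logdet_s):
    """(5.27): sum over groups of (n_i - 1) * ln(|Psi_i| / |S_i|)."""
    return sum((ni - 1) * (lp - ls) for ni, lp, ls in zip(n, logdet_psi, logdet_s))

# Six moneyness points and four maturity groups, as in Section 5.2.5.
df = lr_degrees_of_freedom(p=6, k=4)   # equals (k - 1) p (p - 1) / 2 = 45
```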
On the other hand, it was said that the CPC models are nested, since each
model implies all the models which are lower in the hierarchy. For instance, the
proportional model necessarily implies the CPC model, or a pCPC(3) model
implies the pCPC(2) model. From this feature, one can decompose the total
chi-squared statistic, i.e. the test of equality against inequality, into partial
chi-squared statistics in the following way, Flury (1988):

$$\begin{aligned}
T_{total} = {} & T(\text{inequality of proportionality constants} \mid \text{proportionality}) \\
 & + T(\text{deviation from proportionality} \mid \text{CPC}) \\
 & + T(\text{nonequality of last } p-q \text{ components} \mid \text{pCPC}(q)) \\
 & + T(\text{nonequality of the first } q \text{ components}) .
\end{aligned}$$

This decomposition of the log-likelihood function and testing along these
lines, i.e. the more restrictive model against the less restrictive model, is called
the step-up procedure. The degrees of freedom of these sequential tests have
already been presented in Table 5.1.
Alternative model selection approaches are the AIC and SIC criteria,
Akaike (1973) and Schwarz (1978), see also our discussions on bandwidth
choice in Sections 4.4 and 5.4.3. The AIC is defined by:
$$\Xi_{AIC} \overset{def}{=} -2 \times (\text{maximum of log-likelihood}) + 2 \times (\text{number of parameters estimated}) .$$

Following Flury (1988), we use a modified AIC. Assume we have U
hierarchically ordered models to compare, with $a_1 < \ldots < a_u < \ldots < a_U$
parameters in model u. Then define the modified AIC as:
$$\Xi_{AIC}(u) \overset{def}{=} -2\,(L_u - L_U) + 2\,(a_u - a_1) , \qquad (5.28)$$
where $L_u$ is the maximum of the log-likelihood function of model u. Selecting
the model with the lowest ordinary AIC is equivalent to selecting the model
with the lowest modified $\Xi_{AIC}(u)$. Observe that $\Xi_{AIC}(U) = 2\,(a_U - a_1)$ and
$\Xi_{AIC}(1) = -2\,(L_1 - L_U)$.
The SIC, which aims at finite dimensional models, is defined as
$$\Xi_{SIC} \overset{def}{=} -2 \times (\text{maximum of log-likelihood}) + (\text{number of parameters}) \times \ln(\text{number of observations}) .$$

As in Fengler et al. (2003b), we modify this criterion to:
$$\Xi_{SIC}(u) \overset{def}{=} -2\,(L_u - L_U) + (a_u - a_1) \ln(N) , \qquad (5.29)$$
where $N = \sum_{i=1}^{k} n_i$ denotes the overall number of observations across the k
groups. The model with the lowest SIC is the best fitting one.
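
The two modified criteria (5.28) and (5.29) translate into a few lines of code. The sketch below uses hypothetical log-likelihood values; models are assumed to be ordered from the fewest to the most parameters.

```python
import math

def modified_aic(loglik, n_params):
    """(5.28) for hierarchically ordered models, most restrictive model first."""
    L_U, a_1 = loglik[-1], n_params[0]
    return [-2.0 * (L - L_U) + 2.0 * (a - a_1) for L, a in zip(loglik, n_params)]

def modified_sic(loglik, n_params, n_obs):
    """(5.29) with n_obs the overall number of observations N."""
    L_U, a_1 = loglik[-1], n_params[0]
    return [-2.0 * (L - L_U) + (a - a_1) * math.log(n_obs)
            for L, a in zip(loglik, n_params)]

# Hypothetical maximized log-likelihoods of three nested models.
loglik = [-1200.0, -1100.0, -1095.0]
n_params = [21, 39, 84]

aic = modified_aic(loglik, n_params)
sic = modified_sic(loglik, n_params, n_obs=1000)
best_by_aic = min(range(len(aic)), key=aic.__getitem__)
```

Since the modified criteria differ from the ordinary ones only by a constant, the ranking of the models is unchanged.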

5.2.5 Empirical results

For the empirical CPC analysis, we estimate the IVS for the 1995 to 2001
data from daily samples by means of a local polynomial estimator, Section 4.3.
The data set is described in the appendix. The moneyness grid is κf ∈ {0.925,
0.950, 0.975, 1.000, 1.025, 1.050} and the maturity grid is τ ∈ {0.0625, 0.1250,
0.1875, 0.2500} years, which corresponds to 22, 45, 68, and 90 days to expiry.
As kernel function we choose the product of univariate quartic kernels. In
the bandwidth selection, we proceed as discussed in Section 4.4.2. Since our
estimation grid only covers the short maturity data, there is no particular
need for a localization of bandwidths. For robustness, IVs with maturity of
less than 10 days are excluded from the estimation.

Model
higher           lower            Chi. Sqr.   df.   p-val.      AIC      SIC
Equality         Proportionality     1174.9     3     0.00   3529.3   3529.3
Proportionality  CPC                 1488.8    15     0.00   2360.3   2407.0
CPC              pCPC(4)              122.6     3     0.00    901.5   1181.2
pCPC(4)          pCPC(3)              210.6     6     0.00    784.9   1111.2
pCPC(3)          pCPC(2)              115.5     9     0.00    586.2   1005.8
pCPC(2)          pCPC(1)              398.9    12     0.00    488.7   1048.2
pCPC(1)          Unrelated             17.6    15     0.28    113.7    859.7
Unrelated                                                     126.0   1105.1

Table 5.2. Step-up & model building approach of CPC models.

First, we will estimate the family of CPC models in the entire sample
period. This will be followed by a stability analysis of eigenvalues and eigen-
vectors across the different samples.

The entire sample period

The results of our model selection procedures are displayed in Table 5.2. Ac-
cording to the sequential chi-squared tests the model to be preferred is a
pCPC(1) model, since this test is the first that cannot be rejected against the
next more flexible model. Also when testing directly against the unrelated
model, which is done by adding up the test statistics and the corresponding
degrees of freedom between the model of interest and the unrelated model in
Table 5.2, it is the pCPC(1) model which is not rejected. AIC and SIC both
recommend the pCPC(1). Note also that according to the SIC all pCPC(q)
models with q ≤ 3 are superior to the unrelated model. For the remaining
CPC models, the SIC is slightly higher than for the unrelated case, whereas
for the proportional and the equality models the information criteria increase
tremendously. As shall be seen in the following, one component is sufficient
for an approximation of up to 88%, while the second and third add
only 6% and 3% of explained variance. Thus, we believe, also for
computational and practical simplicity, that a CPC model can be chosen as a valid
description of the IVS dynamics.
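
The adding-up of test statistics described above can be retraced from the entries of Table 5.2. The sketch assumes that each partial statistic is the likelihood ratio statistic between adjacent models, so that the cumulated statistic equals −2(L_u − L_U) and the cumulated degrees of freedom equal a_u − a_1 in (5.28). The function names are illustrative.

```python
# (higher model, chi-squared statistic, degrees of freedom) as printed in Table 5.2.
steps = [
    ("Equality", 1174.9, 3),
    ("Proportionality", 1488.8, 15),
    ("CPC", 122.6, 3),
    ("pCPC(4)", 210.6, 6),
    ("pCPC(3)", 115.5, 9),
    ("pCPC(2)", 398.9, 12),
    ("pCPC(1)", 17.6, 15),
]

def against_unrelated(steps):
    """Cumulate statistics and degrees of freedom from each model up to the unrelated one."""
    out, chi2, df = {}, 0.0, 0
    for name, stat, d in reversed(steps):
        chi2, df = chi2 + stat, df + d
        out[name] = (round(chi2, 1), df)
    return out

def modified_aic_from_steps(steps):
    total = against_unrelated(steps)
    aic, used = {}, 0
    for name, _, d in steps:
        # used = a_u - a_1: parameters freed when moving from Equality to model u.
        aic[name] = round(total[name][0] + 2 * used, 1)
        used += d
    aic["Unrelated"] = float(2 * used)
    return aic

direct_tests = against_unrelated(steps)
aic = modified_aic_from_steps(steps)
preferred = min(aic, key=aic.get)
```

Under this assumption the computed values agree with the AIC column of Table 5.2 up to rounding of the printed statistics.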
The estimation results of the eigenvectors for the entire sample period
exhibit the same stylized facts as documented in Fengler et al. (2003b) for
the year 1999 for daily settlement prices. In Figure 5.3, we display the results
for the first three eigenvectors. Table 5.3 reports the estimation results of
the entire matrix of eigenvectors. The numbers given in parentheses are the
asymptotic standard errors.

[Figure: common coordinate plot of the first three eigenvectors, plotted against
the index of eigenvectors.]

Fig. 5.3. CPC model for the entire sample period 19950101 - 20010531. First
eigenvector horizontal line, second eigenvector diagonal line, third eigenvector
U-shaped line. Compare with Fengler et al. (2003b).

The factor loadings of the first eigenvector, the blue line in Figure 5.3, are
of the same sign throughout (eigenvectors are unique up to sign), and give
approximately the same weight to each volatility shock across the smile. We
hence interpret this factor as a common shift factor. In Figure 5.4, we present
the projection of the longest IV maturity group (three months maturity) using
the first eigenvector. The upper panel shows the PC, the lower the integrated
process. The shift interpretation of the first component is also visible from the
general structure of this process: in comparison with Figure 2.13, it is seen
that it exhibits almost the same patterns as the IV process itself.
In PCA, one typically employs the following measure to gauge the fraction
of variance which is captured by the j'th factor:
$$\frac{\hat{\lambda}_{ij'}}{\sum_{j=1}^{p} \hat{\lambda}_{ij}} , \qquad (5.30)$$
for i = 1, . . . , k and j' = 1, . . . , p. This is reasonable, since PCs are uncorrelated
by construction and each eigenvalue is the variance of the corresponding PC.
For the first PC, this amounts to 88% in the longest maturity group.
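
As a quick check of (5.30), the shares can be recomputed from the eigenvalue estimates printed in Table 5.3. The dictionary layout and function name below are illustrative choices.

```python
# Estimated eigenvalues (times 10^3) by maturity group, copied from Table 5.3.
eigenvalues = {
    1: [16.39, 0.90, 0.55, 0.11, 0.04, 0.01],
    2: [10.14, 0.41, 0.16, 0.07, 0.03, 0.01],
    3: [7.20, 0.33, 0.14, 0.07, 0.04, 0.02],
    4: [6.01, 0.40, 0.23, 0.09, 0.06, 0.02],
}

def explained_variance(lams):
    """Fraction of variance captured by each factor, as in (5.30)."""
    total = sum(lams)
    return [lam / total for lam in lams]

# Longest maturity group (90 days to expiry).
shares = explained_variance(eigenvalues[4])
first_three = [round(s, 2) for s in shares[:3]]
```

For the longest maturity group the first three shares round to 0.88, 0.06 and 0.03.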
The second eigenvector, the green line, switches its sign at ATM, and gives
opposite weights to the shocks in the wings of the IVS. Thus, we interpret
the second type of shocks as common slope shocks. Figure 5.5 displays this
component. The integrated second PC has a stable downward trend, which
appears to revert around 1999. The third eigenvector can be interpreted as
a common twist factor. This factor hits the curvature of the surface, since
the sign of the eigenvector switches within the near-the-money region. Again
the projection and the integrated process are shown in Figure 5.6. These
components account for only 6% and 3% of the variance. Similar results have
been obtained by Zhu and Avellaneda (1997), Skiadopoulos et al. (1999),
Alexander (2001b), Cont and da Fonseca (2002), Fengler et al. (2002b). The
interpretations of the factor loadings in terms of shift, slope and twist shocks
are also known from PCA studies on interest and forward rates, Bliss (1997)
and Rebonato (1998).

mon. index       γ̂1        γ̂2        γ̂3        γ̂4        γ̂5        γ̂6
1              0.344    -0.598     0.530     0.472    -0.129     0.055
            (0.0021)  (0.0095)  (0.0113)  (0.0070)  (0.0085)  (0.0037)
2              0.373    -0.385     0.022    -0.614     0.502    -0.288
            (0.0014)  (0.0044)  (0.0096)  (0.0086)  (0.0105)  (0.0068)
3              0.397    -0.173    -0.339    -0.326    -0.457     0.618
            (0.0010)  (0.0065)  (0.0055)  (0.0090)  (0.0090)  (0.0056)
4              0.419     0.024    -0.482     0.250    -0.337    -0.644
            (0.0011)  (0.0085)  (0.0038)  (0.0083)  (0.0088)  (0.0043)
5              0.440     0.270    -0.252     0.432     0.610     0.334
            (0.0012)  (0.0057)  (0.0074)  (0.0105)  (0.0081)  (0.0074)
6              0.463     0.625     0.554    -0.213    -0.191    -0.073
            (0.0022)  (0.0095)  (0.0108)  (0.0076)  (0.0052)  (0.0032)

mat. group      λ̂i1       λ̂i2       λ̂i3       λ̂i4       λ̂i5       λ̂i6
1              16.39      0.90      0.55      0.11      0.04      0.01
             (0.578)   (0.032)   (0.019)   (0.004)   (0.001)  (0.0003)
2              10.14      0.41      0.16      0.07      0.03      0.01
             (0.357)   (0.014)   (0.006)   (0.002)   (0.001)  (0.0004)
3               7.20      0.33      0.14      0.07      0.04      0.02
             (0.254)   (0.012)   (0.005)   (0.002)   (0.001)   (0.001)
4               6.01      0.40      0.23      0.09      0.06      0.02
             (0.211)   (0.014)   (0.008)   (0.003)   (0.002)   (0.001)

Table 5.3. In the top position the eigenvectors Γ̂ = (γ̂1, . . . , γ̂6). From top to
bottom, the numbers denote the moneyness grid κf ∈ {0.925, 0.950, 0.975, 1.000,
1.025, 1.050}. The eigenvalues below, λ̂ij × 10^3, are ordered from top to bottom
with increasing maturity τi ∈ {0.0625, 0.1250, 0.1875, 0.2500}, standard errors in
parentheses; sample period 19950101 to 20010531.

PC   Variance    Standard    Skewness   Kurtosis   Mean        Corr.
     explained   deviation                         reversion   with index
1      0.88        0.078       0.34       4.12       227.7       -0.48
2      0.06        0.020       0.30       6.54        36.5        0.08
3      0.03        0.015       0.22       7.30         2.2       -0.03

Table 5.4. Descriptive statistics of the first three PCs, 90 days to expiry.

[Figure: two panels, "1st PC" and "1st PC, integrated", plotted against time,
1995 to 2001.]

Fig. 5.4. Projection of the longest maturity group (90 days to expiry) using the first
eigenvector. The upper panel shows returns, the lower panel the integrated series.

Table 5.4 summarizes the descriptive statistics of the PCs. The results are
similar to the findings of Cont and da Fonseca (2002) reported on the S&P 500
and the FTSE 100, except for the mean reversion. Whereas skewness is close
to zero for the three PCs, there is evidence for excess kurtosis especially for the
second and third PC. The mean reversion of the integrated first PC is found
to be around 230 days, i.e. almost a year, while the second PC exhibits a more
short-lived mean reversion of 36 days. The third PC has a mean reversion of 2
days. In our experience, however, the estimates of the mean reversion tend to
be very sensitive to the sample size chosen, and change significantly in annual
subsamples. Thus, the estimates of the mean reversion coefficient should be
taken with caution.

[Figure: two panels, "2nd PC" and "2nd PC, integrated", plotted against time,
1995 to 2001.]

Fig. 5.5. Projection of the longest maturity group (90 days to expiry) using the
second eigenvector. The upper panel shows returns, the lower panel the integrated
series.

The correlation with the returns of the underlying is around -0.5 for the first
PC. This is in line with the leverage effect: according to this argument
(implied) volatility rises when there is a negative shock in the market value of the
firm, since this results in an increase in the debt-equity ratio. For the second
and third PCs, the correlations are negligible.

[Figure: two panels, "3rd PC" and "3rd PC, integrated", plotted against time,
1995 to 2001.]

Fig. 5.6. Projection of the longest maturity group (90 days to expiry) using the third
eigenvector. The upper panel shows returns, the lower panel the integrated series.

Stability analysis among different samples

For any application in trading or risk management, model stability is a decisive
model characteristic, since otherwise model risk is incalculable. In terms
of the CPC models, there are two types of stability of interest: the more
important one refers to the stability of the transformation matrices Γ. Stability
of Γ implies that the model can be estimated in a given (historical) sample
period and contemporaneous PCs can be obtained by daily updating the data
base of IVs and by projecting them into the same space without explicitly
estimating Γ again. The second type of stability refers to the variances of the
components collected in Λi. Instability of Λi does not imply the need to
re-estimate the model often, since the CPC model places no restrictions on Λi:
$\Psi_i = \Gamma \Lambda_i \Gamma^\top$ may very well hold across time in the sense of time-dependent
variances $\Psi_{i,t}$ and $\Lambda_{i,t}$. However, instability of Λi implies the need of time
series models capturing the heteroscedasticity of the PCs and thus has an
impact on the choice of the time series model of the PCs.
To assess stability we split the entire sample into R = 7 annual, non-
overlapping subsamples with around 250 observations in each subsample, ex-
cept for the last one with 105 observations. In each of the annual samples, we
estimate the CPC model separately.
Let us first address stability among eigenvectors: in Figure 5.7, we display
the estimation results, where again blue refers to the first, green to the second
and red to the third eigenvector. To highlight time dependence, colors move
gradually from light to intensive tones the more recent the subsample.
As is immediately seen, the general structure of the eigenvectors is not
altered: shift, slope and twist interpretation are visible for each sample.
However, estimates display variability of different degrees. The first eigenvector
(blue) changes only little and appears to wander around the ATM IV, thereby
giving more equal weights to IVs across different moneyness for the most
recent samples. The second and third eigenvectors exhibit greater variance
through time. For the years 1995, 1999, 2000, 2001 the second eigenvector
appears concave, for 1996, 1997, 1998 convex. From the color intensity it is
also seen that the third eigenvectors appear hardly altered for the most
recent samples, whereas only those belonging to the samples 1995 and 1996 are
largely different.
In a first testing attempt, we constructed a test for stability for all seven
subsamples in one big test. However, it turned out that in this encompass-
ing test, the variance-covariance matrix tended to be ill-conditioned, which
caused numerical problems in computing its inverse. Accordingly, we decided
to proceed sequentially, testing each year against the following one.
There are two caveats in this procedure: first, when there is a
small but persistent trend in the data, it may be that the deviations from one
year to the next one are too little to be detected by the test. However, over
a longer horizon a large deviation may accumulate. To capture this possible
effect, we choose the oldest subsample from 1995 as a benchmark, and test
each eigenvector also against the 1995 estimates. As a second peril, we enter
the well-known statistical problem of pre-tests. However, given the numerical
difficulties encountered in the single test, we think that the sequential proce-
dure is more reliable and can be justified. Furthermore, from a financial point
of view, the information that significant changes occur from one year to the
next may be of more interest than the information that something happened
within the past seven years, since at some point – given the general concern
of model risk – one will update or recalibrate the model in any case.

[Figure: common coordinate plot of the first three eigenvectors for each annual
subsample, plotted against the index of eigenvectors.]

Fig. 5.7. CPC model estimated separately in each annual sample 1995, 1996, 1997,
1998, 1999, 2000, 2001. Colors move from light to intensive tones the more recent
the subsample.

In Table 5.5, we present the test-statistics and the p -values of our tests. For
the first eigenvector, against the benchmark year 1995, the stability hypothesis
cannot be rejected at the 5% level of significance except for the years 2000 and
2001. The sequential tests reveal that there is a significant change from 1996 to
1997 and from 1999 to 2000. Our interpretation of these results, together with
the visual inspection of Figure 5.7, is that the first eigenvector is relatively
reliable across the sample periods.
For the second and third eigenvectors, as can be conjectured from Fig-
ure 5.7, the case is much different: the stability hypothesis is strongly rejected
against the benchmark year. In the sequential tests, only from year 2000 to
2001 the null hypothesis cannot be rejected. There is a marginal case with
respect to the third eigenvector, from 1997 to 1998. Altogether, we conclude
that the second and third eigenvectors exhibit significant changes over time.
In the stability case of the eigenvalues, we present tests of only one group.
There would be – if we tested three eigenvalues only – 132 tests (benchmark
and sequential tests) to study in four groups with a moneyness grid of dimen-
sion six. Table 5.6 displays the results of the group with the shortest time to
maturity (22 days to expiry). Results for the other groups are very similar.
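
The count of 132 tests follows from enumerating the benchmark and sequential comparisons. A short sketch, with variable names chosen for illustration:

```python
years = list(range(1995, 2002))

# Benchmark tests: every later year against 1995.
benchmark = [(years[0], y) for y in years[1:]]
# Sequential tests: each year against the following one, starting in 1996,
# because 1995 against 1996 is already part of the benchmark tests.
sequential = list(zip(years[1:], years[2:]))
pairs = benchmark + sequential

# Three eigenvalues in each of the four maturity groups.
n_tests = len(pairs) * 3 * 4
```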

First eigenvectors
Sample 1  Sample 2       T   p-val.     Sample 1  Sample 2       T   p-val.
1995      1996        11.8   0.066
1995      1997         5.6   0.465      1996      1997        28.8   0.000
1995      1998        12.5   0.051      1997      1998         2.4   0.873
1995      1999         6.6   0.352      1998      1999         4.7   0.580
1995      2000        29.2   0.000      1999      2000        39.1   0.000
1995      2001        22.6   0.001      2000      2001        1.38   0.966

Second eigenvectors
Sample 1  Sample 2       T   p-val.     Sample 1  Sample 2       T   p-val.
1995      1996       205.9   0.000
1995      1997       188.8   0.000      1996      1997       289.0   0.000
1995      1998        85.8   0.000      1997      1998        19.6   0.003
1995      1999       539.1   0.000      1998      1999        99.3   0.000
1995      2000       100.8   0.000      1999      2000       173.8   0.000
1995      2001        28.4   0.000      2000      2001        9.34   0.155

Third eigenvectors
Sample 1  Sample 2       T   p-val.     Sample 1  Sample 2       T   p-val.
1995      1996       444.3   0.000
1995      1997       108.0   0.000      1996      1997       532.1   0.000
1995      1998        36.4   0.000      1997      1998        16.7   0.010
1995      1999       251.8   0.000      1998      1999        66.7   0.000
1995      2000        48.1   0.000      1999      2000        92.9   0.000
1995      2001        19.8   0.000      2000      2001         7.4   0.284

Table 5.5. Stability tests of eigenvectors. Tests are constructed as derived in (5.25).
The p-value is from a chi-squared variate with six degrees of freedom.
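
The p-values in Tables 5.5 and 5.6 can be approximately retraced with closed-form chi-squared tail probabilities, which exist for one degree of freedom and for even degrees of freedom. The helper below is a sketch using only the standard library; small deviations from the printed values are to be expected, because the printed statistics are rounded.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variate for df = 1 or even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        return math.exp(-half) * sum(half ** k / math.factorial(k)
                                     for k in range(df // 2))
    raise ValueError("closed form implemented for df = 1 and even df only")

# First eigenvector, 1995 against 1996 (Table 5.5, six degrees of freedom).
p_1995_1996 = chi2_sf(11.8, 6)
# First eigenvalue in group 1, 1996 against 1997 (Table 5.6, one degree of freedom).
p_1996_1997 = chi2_sf(10.3, 1)
```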

This is not surprising given the high degree of co-movements in the IVS. As is
seen in Table 5.6, the null hypothesis is rejected against the benchmark year
for all three eigenvalues. For the sequential tests, results are mixed: most
tests reject, but e.g. between the years of financial crisis 1997 and 1998,
differences between the two samples are not significant in the first and second
eigenvalue. As a general bottom line, for the second eigenvalue, differences
between the years seem to be much less important than for the first and third
one. This is an interesting result since it says, given our interpretation of this
component earlier, that volatility in the wings of the IVS is more constant
than in the level and the twist component.

First eigenvalues in group 1
Sample 1  Sample 2       T   p-val.     Sample 1  Sample 2       T   p-val.
1995      1996        16.3   0.000
1995      1997        41.6   0.000      1996      1997        10.3   0.001
1995      1998        54.3   0.000      1997      1998         2.5   0.108
1995      1999        33.8   0.000      1998      1999         6.8   0.009
1995      2000        15.0   0.000      1999      2000         6.1   0.013
1995      2001        18.7   0.000      2000      2001         5.8   0.016

Second eigenvalues in group 1
Sample 1  Sample 2       T   p-val.     Sample 1  Sample 2       T   p-val.
1995      1996        17.1   0.000
1995      1997        18.4   0.000      1996      1997        52.4   0.000
1995      1998        55.1   0.000      1997      1998        19.9   0.108
1995      1999        56.9   0.000      1998      1999         0.1   0.807
1995      2000        62.5   0.000      1999      2000         0.6   0.418
1995      2001        71.0   0.000      2000      2001         2.5   0.109

Third eigenvalues in group 1
Sample 1  Sample 2       T   p-val.     Sample 1  Sample 2       T   p-val.
1995      1996        44.3   0.000
1995      1997        35.7   0.000      1996      1997        87.6   0.000
1995      1998        72.5   0.000      1997      1998        22.4   0.000
1995      1999        40.7   0.000      1998      1999        18.0   0.000
1995      2000        79.1   0.000      1999      2000        26.8   0.000
1995      2001        83.6   0.000      2000      2001         1.3   0.240

Table 5.6. Stability tests of eigenvalues in group 1. Tests are constructed as derived
in (5.19). The p-value is from a chi-squared variate with one degree of freedom.

Summing up, from the stability tests, we draw the following conclusions:
stability of the eigenvalues, except for the second one, is rejected. This is
not a particular threat to modeling PCs or PCA in general, since it simply
indicates that GARCH-type models can be an adequate choice in the time
series context. For the eigenvectors, things look different: the good news is
that the first eigenvector, the component which captures more than 80% of
the variance, is fairly stable. Thus in applications of risk controlling, such
as scenario analysis or stress tests, see e.g. Fengler et al. (2002b), one can
build on reliable estimates. The results from these experiments may not be
completely correct in the wings of the IVS. However, since the biggest threat
to option portfolios stems from level changes, the risk may be bearable from
a risk management point of view. The bad news applies to trading strategies
that aim at exploiting the wings of the IVS, i.e. trading in OTM puts or OTM
calls. Here, continuous recalibration of the models appears to be mandatory.
From our point of view, the results call for adaptive techniques of PCA
that identify homogeneous subintervals in the sample period by data-driven
methods. On homogeneous subintervals, reliable estimates are recovered. The
literature on adaptive estimation, as pioneered by Lepski and Spokoiny (1997)
and Spokoiny (1998), has been applied successfully in other contexts in finance
such as time-inhomogeneous volatility modeling, Härdle et al. (2003) and
Mercurio and Spokoiny (2004).

Time series models

Due to the similarity within the groups (we consider the time series as scaled
versions of each other), we concentrate on one group only. We pick the longest
time to maturity group. The time series of the first three PCs $y_{k1}, y_{k2}, y_{k3}$ are
obtained from the projection $Y_k = X_k \hat{\Gamma}$. Since the stability of the second and
third eigenvectors was rejected, we reestimate the model in each subsample
and project using the new matrices Γ̂(r) . Based on autocorrelation and partial
autocorrelation plots, we propose adequate models for each univariate series.
Of course the univariate time series are not independent, but they are un-
correlated by construction. This is why modeling the univariate series can be
justified, see Zhu and Avellaneda (1997) for a similar approach. By AIC and
SIC searches we will identify a best fitting model, and present the estimation
results in more detail.
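
The projection step and the integrated series shown in Figures 5.4 to 5.6 can be sketched as follows. The loadings are the first three columns of Table 5.3; the two data rows are hypothetical and only illustrate that a parallel move of the smile loads almost entirely on the first PC.

```python
from itertools import accumulate

# First three eigenvectors (columns of Table 5.3); rows follow the moneyness grid.
GAMMA = [
    (0.344, -0.598, 0.530),
    (0.373, -0.385, 0.022),
    (0.397, -0.173, -0.339),
    (0.419, 0.024, -0.482),
    (0.440, 0.270, -0.252),
    (0.463, 0.625, 0.554),
]

def project(x_row):
    """One row of Y = X Gamma: scores of one observation on the three eigenvectors."""
    return [sum(x * g[j] for x, g in zip(x_row, GAMMA)) for j in range(3)]

# Hypothetical data: a parallel move of the smile followed by its reversal.
X = [[0.01] * 6, [-0.01] * 6]
Y = [project(row) for row in X]

# Integrated first PC: cumulative sum of the scores over time.
integrated_pc1 = list(accumulate(y[0] for y in Y))
```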
From Figures 5.4 to 5.6, it is seen that the first three PCs display a be-
havior close to white noise. This impression is reinforced when inspecting
the autocorrelation and partial autocorrelation functions as displayed in Fig-
ures 5.8 to 5.13. From Figure 5.8 it is seen that the first component exhibits
no autocorrelation: it immediately dies off. Also the partial autocorrelation
function in Figure 5.9 does not show a particular structure. Thus, the first
component, which explains up to 88% of the variance, can be considered as
noise.
For the second and third components a different picture arises: from Fig-
ures 5.10 and 5.12 a negative first order correlation is visible hinting towards
an MA(1) model. Also the partial autocorrelation functions in Figures 5.11
and 5.13 display the typical patterns of an MA process.
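
The MA(1) signature referred to here is a single non-zero autocorrelation at lag one. The sketch below computes the sample autocorrelation function and compares it with the theoretical lag-one value on a simulated series; the coefficient −0.733 is the b1 estimate printed in Table 5.8, while the simulated data are purely illustrative.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t - h] - mean) for t in range(h, n)) / c0
            for h in range(max_lag + 1)]

def ma1_acf_lag1(b1):
    """Theoretical lag-one autocorrelation of y_t = e_t + b1 * e_{t-1}."""
    return b1 / (1.0 + b1 ** 2)

b1 = -0.733
rho1 = ma1_acf_lag1(b1)

# Simulated MA(1) series with standard normal innovations.
rng = random.Random(1)
e = [rng.gauss(0.0, 1.0) for _ in range(20001)]
y = [e[t] + b1 * e[t - 1] for t in range(1, len(e))]
acf = sample_acf(y, 2)
```

A negative b1 thus produces the negative first-order autocorrelation and the vanishing higher-order autocorrelations described above.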

Fig. 5.8. Autocorrelation function of the first PC (plotted against the lag).

Fig. 5.9. Partial autocorrelation function of the first PC (plotted against the lag).

With this preliminary analysis at hand, we perform AIC and SIC searches
over MA(q)-GARCH(r, s) models, where q = 0, r = 1, 2, s = 1, 2 for the
first, and q = 1, r = 1, 2, s = 1, 2 for the second and third component. We
also estimate different types of GARCH models such as TGARCH
specifications in order to investigate asymmetries in shocks. Since Table 5.4 suggests
a substantial correlation with the contemporaneous index returns, we
additionally include index returns into the mean equations of all processes, and
also into the variance equation of the first component.
The MA-GARCH models for the components j = 1, 2, 3 are given by:
$$y_{jt} = c + a_1 z_t + \varepsilon_{jt} + b_1 \varepsilon_{j,t-1} , \qquad (5.31)$$
$$\varepsilon_{jt} \sim N(0, \sigma_{jt}^2) ,$$
$$\sigma_{jt}^2 = c_\sigma + \sum_{m=1}^{r} \alpha_m \sigma_{j,t-m}^2 + \sum_{m=1}^{s} \beta_m \varepsilon_{j,t-m}^2 + \gamma z_t^2 , \qquad (5.32)$$
where we denote the elements of $y_{k1}, y_{k2}, y_{k3}$ by $y_{1t}, y_{2t}, y_{3t}$ to put ourselves
into the usual time series notation. Log-returns in the DAX index are denoted
by $z_t$.
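
To make the recursion in (5.31) and (5.32) concrete, the following sketch simulates the model with r = s = 1. Note the convention of the text: α multiplies the lagged variance and β the lagged squared shock. The parameter values and the index return series are hypothetical, and the starting value of the variance is an assumption of this sketch.

```python
import random

def simulate_ma_garch(z, c, a1, b1, c_sigma, alpha1, beta1, gamma, seed=0):
    """Simulate (5.31)-(5.32) with r = s = 1; z holds the index log-returns."""
    rng = random.Random(seed)
    # Assumed start: variance level implied without the index term (alpha1 + beta1 < 1).
    sigma2 = c_sigma / (1.0 - alpha1 - beta1)
    eps_prev = 0.0
    y, variances = [], []
    for z_t in z:
        # Variance equation (5.32).
        sigma2 = c_sigma + alpha1 * sigma2 + beta1 * eps_prev ** 2 + gamma * z_t ** 2
        eps = rng.gauss(0.0, sigma2 ** 0.5)
        # Mean equation (5.31) with an MA(1) term.
        y.append(c + a1 * z_t + eps + b1 * eps_prev)
        variances.append(sigma2)
        eps_prev = eps
    return y, variances

# Hypothetical index log-returns and illustrative parameter values.
z_rng = random.Random(42)
z = [z_rng.gauss(0.0, 0.01) for _ in range(500)]
params = dict(c=0.0, a1=0.1, b1=-0.7, c_sigma=6.0e-5, alpha1=0.45, beta1=0.2, gamma=0.0)
y, variances = simulate_ma_garch(z, **params)
```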

Fig. 5.10. Autocorrelation function of the second PC (plotted against the lag).

Fig. 5.11. Partial autocorrelation function of the second PC (plotted against the lag).

Table 5.7 displays the statistics of the model selection criteria for the
different models under consideration. For y1t both AIC and SIC suggest a
GARCH(1,2) specification. For y2t and y3t, the results are not as clear-cut.
Since the values of the model selection criteria are very much the same, we
decided for the more parsimonious model, i.e. an MA(1)-GARCH(1,1) model
for both.
Given these results, one may like to alter the variance equation to allow
for asymmetries in shocks: under the TGARCH model, Glosten et al. (1993)
and Zakoian (1994), the variance equation (5.32) becomes
$$\sigma_{jt}^2 = c_\sigma + \sum_{m=1}^{r} \alpha_m \sigma_{j,t-m}^2 + \sum_{m=1}^{s} \beta_m \varepsilon_{j,t-m}^2 + \beta_1^- \varepsilon_{j,t-1}^2\, \mathbf{1}(\varepsilon_{j,t-1} < 0) + \gamma z_t^2 . \qquad (5.33)$$
In this model, good news, $\varepsilon_t > 0$, and bad news, $\varepsilon_t < 0$, have differential effects
on the conditional variance: good news have an impact of $\sum_{m=1}^{s} \beta_m$, while
bad news have an impact of $\sum_{m=1}^{s} \beta_m + \beta_1^-$. If $\beta_1^- > 0$, a leverage effect exists,
and the news impact is asymmetric if $\beta_1^- \neq 0$. We also estimated EGARCH
models, Nelson (1991); however, since they did not produce any substantial
gain compared to the other models, we do not report the estimation results
here.
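
The asymmetric news impact in (5.33) can be illustrated with a one-step variance update for r = s = 1. The parameter values are the y2t estimates printed in Table 5.8, taking the second column of the pair to be the TGARCH fit, which is an assumption of this sketch; the previous variance and the size of the shock are hypothetical.

```python
def tgarch_variance(sigma2_prev, eps_prev, c_sigma, alpha1, beta1, beta1_neg,
                    gamma=0.0, z_t=0.0):
    """One step of the variance equation (5.33) with r = s = 1."""
    bad_news = 1.0 if eps_prev < 0 else 0.0
    return (c_sigma + alpha1 * sigma2_prev
            + (beta1 + beta1_neg * bad_news) * eps_prev ** 2
            + gamma * z_t ** 2)

# Estimates for y2t as printed in Table 5.8 (assumed to be the TGARCH column).
params = dict(c_sigma=6.4e-5, alpha1=0.462, beta1=0.115, beta1_neg=0.150)

# Same shock size, opposite signs; hypothetical previous variance of 4e-4.
after_good = tgarch_variance(4e-4, +0.02, **params)
after_bad = tgarch_variance(4e-4, -0.02, **params)
```

The two updates differ exactly by β1− times the squared shock, which is the additional impact of bad news.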

Fig. 5.12. Autocorrelation function of the third PC (plotted against the lag).

Fig. 5.13. Partial autocorrelation function of the third PC (plotted against the lag).

Model                          AIC                          SIC
                       y1t      y2t      y3t        y1t      y2t      y3t
GARCH(1,1)          -2.674                       -2.654
GARCH(1,2)          -2.681                       -2.657
GARCH(2,1)          -2.677                       -2.654
GARCH(2,2)          -2.681                       -2.654
MA(1)-GARCH(1,1)             -5.872   -6.460              -5.849   -6.436
MA(1)-GARCH(1,2)             -5.871   -6.460              -5.844   -6.443
MA(1)-GARCH(2,1)             -5.871   -6.461              -5.844   -6.434
MA(1)-GARCH(2,2)             -5.870   -6.461              -5.840   -6.431

Table 5.7. Univariate model selection: Akaike and Schwarz Information Criteria
(AIC, SIC) over a variety of MA(q)-GARCH(r, s) models of yjt.

Factor              y1t                    y2t                    y3t
              GARCH     TGARCH       GARCH     TGARCH       GARCH     TGARCH
cond. mean
c             0.001      0.001      1.9E-4     1.0E-4    -3.8E-05   -5.8E-05
            [0.407]    [1.048]     [1.170]    [0.566]    [-0.592]   [-0.907]
a1           -2.920     -2.930       0.086      0.079       0.005      0.004
           [-24.46]   [-24.21]     [4.860]    [4.564]     [0.457]    [0.351]
b1                                  -0.733     -0.501      -0.733     -0.729
                                  [-35.50]   [-21.78]    [-35.50]   [-34.81]
cond. var.
cσ           1.4E-4     1.6E-4      6.7E-5     6.4E-5     1.7E-05    2.2E-05
            [3.945]    [4.141]     [7.515]    [7.353]     [8.687]    [8.681]
α1            0.803      0.797       0.425      0.462       0.686      0.631
            [32.09]    [29.07]     [6.774]    [7.791]     [24.41]    [17.11]
β1            0.246      0.284       0.200      0.115       0.147      0.082
            [7.112]    [7.598]     [6.840]    [3.505]     [8.027]    [3.206]
β2           -0.130     -0.124
           [-4.110]   [-3.611]
β1−                     -0.950                  0.150                  0.142
                      [-3.706]                [3.239]                [3.916]
γ             1.480      1.580
            [4.991]    [4.909]
R̄2            0.23       0.23        0.22       0.21        0.33       0.33

Table 5.8. Estimation results of GARCH models for the three PCs, t-statistics in
brackets.

In Table 5.8, the estimation results are displayed in more detail. From
the mean equation for y1t it is evident that the index returns have a highly
significant impact on the first PC. The sign is in line with the leverage effect
hypothesis. In the variance equation all parameters are significant. β2 < 0
may be interpreted as an ‘over-reaction correction’ in terms of the variance:
high two-period lagged returns have a dampening impact on the variance. As
is to be expected, volatility increases also when volatility in the underlying
is high (γ > 0). From the TGARCH model, no evidence for a GARCH type
leverage effect is found, since β1− < 0. The other parameter estimates for the
TGARCH are of same size and significance level. The adjusted R̄2 is around
23%. This is high, however, it is entirely due to the index returns included in
the regression. Leaving zt out of the mean equations reduces the R̄2 to around
2%, only.
In the mean equations of y2t and y3t , the MA(1) components are negative
and significant. The index returns are only significant for y2t and positively
influence the slope structure in the surface. Thus, together with the results
for y1t , we see that positive shocks in the underlying tend to reduce IV levels,

while at the same time the slope of the surface is intensified. The variance
equations do not exhibit any special features; it is interesting, however, that a
GARCH type leverage effect is present, since β1− > 0: lagged negative shocks
increase the variance of both processes.

CPC models: an intermediate summary

We have seen that CPC models yield a valid description of the IVS dynamics.
They offer a convenient framework for model choice – and ultimately – for a
low-dimensional description of the IVS. Three components that have intuitive
financial interpretations as a shift, a slope and a twist shock appear to yield a
sufficiently exact representation. Stability tests indicate that the first and most
important component is fairly stable, while this conclusion cannot be drawn
for the other two components. We employed GARCH models to describe the
dynamics of the resulting factor series.
Within this framework risk and scenario analysis for portfolios can be
implemented in a straightforward manner, see Fengler et al. (2002b) and Fengler
et al. (2003b). Forecasting is likely to be limited: at best, a one-day forecast
can be performed. Since this will be done in the context of the semiparametric
factor model, we do not perform a separate forecast exercise at this point.
A potential disadvantage of CPC models is that the number of time series
to be modelled is a multiple of the number of time to maturity groups, if
one does not follow our simplification to model the series as scaled versions
of each other. Also, it would be more elegant if factor extraction and surface
estimation could be performed within a single step. This can be resolved by
applying a functional PCA or using a semiparametric factor model, as shall
be seen presently.

5.3 Functional data analysis

Rethinking the approach taken in Section 5.2 suggests carrying the idea of
PCA over to the functional case: this leads to functional PCA (FPCA). In
PCA we obtain eigenvectors which are used to project the slices of the IVS
into a lower dimensional space. In FPCA, we will recover eigenfunctions, or
eigenmodes, for this projection (now defined in a functional sense). Similarly
to PCA, we can represent the IVS as a linear combination of uncorrelated
(scalar) random variables, which, via their eigenfunctions, unfold the
high-dimensional dynamics of the IVS. In the signal processing literature this
representation is often called the Karhunen-Loève expansion or decomposition.
In the following subsection the basic ideas of the FPCA approach will be
sketched. Key references are Besse (1991) and Ramsay and Silverman (1997),
who coined the term functional data analysis.
We also briefly address ways of computing FPCs. The first application in the
context of the IVS is due to Cont and da Fonseca (2002), who studied the
IVS derived from options on the S&P 500 index and the FTSE 100 index. A
treatment that also focuses on the computational aspects of FPCA is given
by Benko and Härdle (2004).

5.3.1 Basic set-up of FPCA

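We consider the $L^2$ Hilbert space $H(\mathcal{J})$ on a bounded interval
$\mathcal{J} \subset \mathbb{R}^2$, where
$\mathcal{J} = [\kappa_{\min}, \kappa_{\max}] \times [\tau_{\min}, \tau_{\max}]$
represents a region of moneyness and time to maturity. To model the IVS, we
concentrate on sufficiently smooth elements of $H$ that we interpret as
surfaces over $\mathcal{J}$.
The inner product on $H$ is given by

\[
  \langle f, g \rangle \stackrel{\mathrm{def}}{=} \int_{\mathcal{J}} f(u)\, g(u)\, du ,
  \quad \text{for } f, g \in H(\mathcal{J}) , \tag{5.34}
\]

and the norm $\|\cdot\|$ by

\[
  \|f\| \stackrel{\mathrm{def}}{=} \left\{ \int_{\mathcal{J}} f(u)^2 \, du \right\}^{1/2} . \tag{5.35}
\]

The notation of the inner product can be distinguished from the covariation
process of two stochastic processes $\langle \cdot, \cdot \rangle_t$, which is indexed by $t$.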
We interpret a random surface X as a random function such that each
realization ω ∈ Ω gives a smooth surface X(ω, ·) : J → R. Without loss of
generality we assume that X is mean zero. For the precise probabilistic set-up,
we refer to Dauxois et al. (1982) and Pezzulli and Silverman (1993).
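One can derive FPCA in the same step-wise manner as is typically done
in PCA in standard textbook treatments: find linear combinations, i.e. a weight
function $\gamma_1(u)$, such that the projection

\[
  Y_1 = \int_{\mathcal{J}} \gamma_1(u)\, X(u)\, du = \langle \gamma_1, X \rangle \tag{5.36}
\]

has maximum variance subject to $\|\gamma_1\| = 1$. Continue by finding another
weight function $\gamma_2$ such that $Y_2 = \langle \gamma_2, X \rangle$ has maximum variance subject to
$\|\gamma_2\| = 1$ and is orthogonal to $\gamma_1$ in the sense that $\langle \gamma_2, \gamma_1 \rangle = 0$, and so on.
This leads to the following constrained optimization problem:

\[
\begin{aligned}
  \max \operatorname{Var}\langle \gamma_j, X \rangle
    &= \max \operatorname{E} \int_{\mathcal{J}} \int_{\mathcal{J}} \gamma_j(u) X(u)\, \gamma_j(v) X(v)\, du\, dv \\
    &= \max \int_{\mathcal{J}} \int_{\mathcal{J}} \gamma_j(u)\, C(u, v)\, \gamma_j(v)\, du\, dv \\
    &= \max \langle \gamma_j, A \gamma_j \rangle
\end{aligned} \tag{5.37}
\]

subject to $\|\gamma_j\|^2 = 1$ and $\langle \gamma_{j'}, \gamma_j \rangle = 0$ for $j' < j$. The covariance between the
two surface values at $u, v \in \mathcal{J}$ is denoted by
$C(u, v) \stackrel{\mathrm{def}}{=} \operatorname{Cov}\{X(u), X(v)\}$, and
the integral transform of the weight function $\gamma$ with kernel $C$ is defined by:

\[
  A\gamma(\cdot) \stackrel{\mathrm{def}}{=} \int_{\mathcal{J}} C(\cdot, v)\, \gamma(v)\, dv . \tag{5.38}
\]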

We call the integral transform $A$, which acts on $\gamma$, the covariance operator.
Since $C(\cdot, \cdot)$ is continuous and $\mathcal{J}$ bounded, $A$ is compact. By definition, we
have that $A$ is symmetric and positive.
By general results from functional analysis, Riesz and Nagy (1956), the
solution to this problem is obtained by solving the functional eigenvalue
problem:

\[
  \int_{\mathcal{J}} C(u, v)\, \gamma_j(v)\, dv = \lambda_j \gamma_j(u) , \tag{5.39}
\]

which is a Fredholm integral equation of the second kind. The sequence of
eigenfunctions $\gamma_1, \gamma_2, \ldots$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq 0$ are the solutions
to the maximization problem associated with FPCA. An important difference
to multivariate PCA is the number of eigenfunction-eigenvalue pairs. In
multivariate PCA, their number is equal to the number of variables measured, $p$ in
our former notation, whereas there are infinitely many in the functional case.
In practice, the number depends on the rank of the covariance operator $A$.
The projection of $X$ on $\gamma_j(u)$ is given by $Y_j = \langle \gamma_j, X \rangle$. By orthogonality of
the sequence of eigenfunctions $\gamma_j$, the $Y_j$ are a sequence of uncorrelated PCs.
This implies that $X$ is spanned by

\[
  X(u) = \sum_j Y_j\, \gamma_j(u) , \tag{5.40}
\]

which yields the desired dimension reduction if the number of eigenfunctions,
which are surfaces themselves, can be chosen to be small, see the discussion in
Section 5.2.1, in particular Equation (5.4), and Cont and da Fonseca (2002).
Finally, as in multivariate PCA, compare (5.3), the following link between
the eigenvalues and the variance of the components holds:

\[
  \operatorname{Var}\langle \gamma_j, X \rangle = \langle \gamma_j, A\gamma_j \rangle = \lambda_j \|\gamma_j\|^2 = \lambda_j . \tag{5.41}
\]

5.3.2 Computing FPCs

Denote by $x_i(u)$, $i = 1, 2, \ldots, n$ and $u \in \mathcal{J}$, a sample of realizations of the IVS.
As a first step, replace the unknown covariance function $\operatorname{Cov}$ by its sample
analogue $\widehat{\operatorname{Cov}}$, while maintaining the assumption of a zero mean:

\[
  \widehat{\operatorname{Cov}}\{X(u), X(v)\} \stackrel{\mathrm{def}}{=} \frac{1}{n-1} \sum_{i=1}^{n} x_i(u)\, x_i(v) . \tag{5.42}
\]

There are a number of methods for computing FPCs and solving (5.39), see
Ramsay and Silverman (1997). The first approach consists in discretizing the
functions. In the simplest case, when J is only a one-dimensional interval,
say a particular smile or the ATM term structure, one can recover the values
xi(u1), xi(u2), ..., xi(up) on a dense grid, and store the data in an (n × p) matrix.
Then an ordinary PCA is applied. Since in practice it can happen that p >
n, it may be necessary to recover the solution to the eigenvalue problem
from the singular value decomposition of the data matrix. In order to recover
the functional form of the eigenvectors, they are renormalized and suitably
interpolated, Ramsay and Silverman (1997, Section 6.4.1). In principle, one
could proceed similarly in the two-dimensional case, where J contains the
full region of moneyness and time to maturity, by stacking the surfaces into
a huge matrix. After applying an ordinary PCA, the resulting eigenvectors are
rearranged to recover the two-dimensional eigenfunctions.
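
The discretization approach for the one-dimensional case can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes an equally spaced grid, uses NumPy, and the function name `fpca_discretized` is hypothetical. The eigenvectors of the data matrix are rescaled by the grid spacing so that the discretized eigenfunctions have unit $L^2$ norm, and the eigenvalues are rescaled so that they approximate those of the covariance operator in (5.39).

```python
import numpy as np

def fpca_discretized(X, grid, n_comp=3):
    """FPCA for curves sampled on a common, equally spaced 1-d grid (sketch).

    X    : (n, p) array, row i holds x_i(u_1), ..., x_i(u_p)
    grid : (p,) array of equally spaced points u_1 < ... < u_p
    """
    n, p = X.shape
    du = grid[1] - grid[0]                 # Riemann weight of one grid cell
    Xc = X - X.mean(axis=0)                # enforce the zero-mean assumption
    # The SVD also works when p > n, where the (p x p) covariance is singular.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = s[:n_comp] ** 2 / (n - 1) * du   # approximate operator eigenvalues
    gamma = Vt[:n_comp] / np.sqrt(du)      # rescale: sum gamma_j(u)^2 du = 1
    scores = Xc @ gamma.T * du             # Y_j = <gamma_j, x_i> as a Riemann sum
    return lam, gamma, scores
```

With this scaling the sample variance of each score series equals the corresponding eigenvalue, which mirrors (5.41). Interpolating the rows of `gamma` between grid points would give the functional form of the eigenfunctions.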
Another, more elegant solution relies on basis expansions of the
eigenfunctions, Ramsay and Silverman (1997, Section 6.4.2) and Cont and da Fonseca
(2002). Suppose that the IVS admits an expansion in terms of a set of $L$ basis
functions $\phi_1(u), \phi_2(u), \ldots, \phi_L(u)$, $u \in \mathcal{J}$. Then each function is written as:

\[
  x_i(u) = \sum_{l=1}^{L} c_{il}\, \phi_l(u) , \tag{5.43}
\]

or, more compactly in matrix notation:

\[
  \mathbf{x}(u) = \mathbf{C} \boldsymbol{\phi}(u) , \tag{5.44}
\]

where the vectors $\mathbf{x}(u) \stackrel{\mathrm{def}}{=} \big(x_i(u)\big)$ and
$\boldsymbol{\phi}(u) \stackrel{\mathrm{def}}{=} \big(\phi_l(u)\big)$, and the matrix
$\mathbf{C} \stackrel{\mathrm{def}}{=} (c_{il})$ for $i = 1, \ldots, n$ and $l = 1, \ldots, L$ are defined by their elements. In
this case the covariance function is expressed as
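
\[
  \widehat{\operatorname{Cov}}\{X(u), X(v)\} \stackrel{\mathrm{def}}{=}
  \frac{1}{n-1}\, \boldsymbol{\phi}^\top(u)\, \mathbf{C}^\top \mathbf{C}\, \boldsymbol{\phi}(v) . \tag{5.45}
\]

Similarly, the eigenfunction is expressed in terms of the basis functions as
$\gamma(u) = \sum_{l=1}^{L} b_l \phi_l(u)$, or again in matrix form by
$\gamma(u) = \boldsymbol{\phi}^\top(u)\, \mathbf{b}$.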
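With these preparations, one transforms the left-hand side of (5.39) to:

\[
  \frac{1}{n-1} \int_{\mathcal{J}} \boldsymbol{\phi}^\top(u)\, \mathbf{C}^\top \mathbf{C}\,
  \boldsymbol{\phi}(v)\, \boldsymbol{\phi}^\top(v)\, \mathbf{b}\, dv
  = \frac{1}{n-1}\, \boldsymbol{\phi}^\top(u)\, \mathbf{C}^\top \mathbf{C} \mathbf{W} \mathbf{b} , \tag{5.46}
\]

where $\mathbf{W} \stackrel{\mathrm{def}}{=} (w_{l,l'})$ with
$w_{l,l'} = \int_{\mathcal{J}} \phi_l(v)\, \phi_{l'}(v)\, dv$. Thus, Equation (5.39) reads as

\[
  \frac{1}{n-1}\, \boldsymbol{\phi}^\top(u)\, \mathbf{C}^\top \mathbf{C} \mathbf{W} \mathbf{b}
  = \lambda\, \boldsymbol{\phi}^\top(u)\, \mathbf{b} . \tag{5.47}
\]

Since the last equation must hold for any $u \in \mathcal{J}$, it reduces to the pure
matrix equation:

\[
  \frac{1}{n-1}\, \mathbf{C}^\top \mathbf{C} \mathbf{W} \mathbf{b} = \lambda\, \mathbf{b} . \tag{5.48}
\]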
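Equation (5.48) is further simplified by the following observation: in our
basis framework, the inner product corresponds to

\[
  \langle \gamma_j, \gamma_{j'} \rangle
  = \int_{\mathcal{J}} \mathbf{b}_j^\top \boldsymbol{\phi}(u)\, \boldsymbol{\phi}^\top(u)\, \mathbf{b}_{j'}\, du
  = \mathbf{b}_j^\top \mathbf{W} \mathbf{b}_{j'} . \tag{5.49}
\]

Defining $\mathbf{u} \stackrel{\mathrm{def}}{=} \mathbf{W}^{1/2} \mathbf{b}$, one can transform (5.48) into the symmetric
eigenvalue problem:

\[
  \frac{1}{n-1}\, \mathbf{W}^{1/2} \mathbf{C}^\top \mathbf{C} \mathbf{W}^{1/2} \mathbf{u} = \lambda\, \mathbf{u} . \tag{5.50}
\]

This is solved using any standard PCA routine in statistical packages. The
desired eigenfunctions are recovered by $\mathbf{b} = \mathbf{W}^{-1/2} \mathbf{u}$.
A special case occurs when the basis functions are orthonormal. Then
$\mathbf{W} = \mathbf{I}_L$, i.e. it becomes the identity matrix of order $L$. Hence, FPCA is
reduced to the multivariate PCA performed on the coefficient matrix $\mathbf{C}$, Ramsay
and Silverman (1997).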
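
The route from (5.48) to (5.50) can be sketched in a few lines. The sketch assumes that the coefficient matrix C (already centered, in line with the zero-mean assumption) and the Gram matrix W of the basis are available as NumPy arrays; the function name `fpca_basis` is hypothetical. The symmetric square root of W is taken from its eigendecomposition, which presumes W is positive definite.

```python
import numpy as np

def fpca_basis(C, W, n_comp=3):
    """Solve (5.50) for coefficients C (n x L) and Gram matrix W (L x L)."""
    n = C.shape[0]
    # Symmetric square root of W and its inverse via the eigendecomposition.
    w_val, w_vec = np.linalg.eigh(W)
    W_half = (w_vec * np.sqrt(w_val)) @ w_vec.T
    W_half_inv = (w_vec / np.sqrt(w_val)) @ w_vec.T
    S = W_half @ C.T @ C @ W_half / (n - 1)      # symmetric matrix in (5.50)
    lam, U = np.linalg.eigh(S)                   # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:n_comp]       # keep the largest n_comp
    lam, U = lam[order], U[:, order]
    B = W_half_inv @ U                           # columns b_j = W^{-1/2} u_j
    return lam, B
```

The columns of `B` satisfy (5.48) and are normalized such that $\mathbf{b}_j^\top \mathbf{W} \mathbf{b}_{j'}$ is one for $j = j'$ and zero otherwise, i.e. the eigenfunctions are orthonormal in the sense of (5.49). For an orthonormal basis, W is the identity and the routine collapses to an ordinary PCA of C.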
The concept of expanding the unknown solution to (5.39) on a set of basis
functions is also known as collocation. It should be pointed out that this
superposition of basis functions leads to a strong solution of the underlying Fredholm
integral equation of the second kind. This highlights the main difference to
the well-known Galerkin methods that solve (5.39) in a weak sense, i.e. with
respect to the corresponding dual space of H. To implement the Galerkin
method, one starts with a finite dimensional subspace of this dual space, and
solves (5.39) with respect to a basis of this subspace. As the dimension of that
subspace tends to infinity, one can obtain a solution that holds for all linear
functionals. The Galerkin approach is taken by Cont and da Fonseca (2002).

To this end, Cont and da Fonseca (2002) expand the eigenfunctions up to
an error on a basis

\[
  \gamma_j(u) = \sum_{l=1}^{L} b_{j,l}\, \phi_l(u) + \epsilon_j . \tag{5.51}
\]

Plugging (5.51) into (5.39) yields, up to another error term:

\[
  \varepsilon_j = \sum_{l=1}^{L} b_{j,l} \left\{ \int_{\mathcal{J}} C(u, v)\, \phi_l(v)\, dv - \lambda_j\, \phi_l(u) \right\} . \tag{5.52}
\]

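It should be noted that implicitly both errors $\epsilon_j$ and $\varepsilon_j$ depend on $L$, the
number of basis functions.
The Galerkin approach requires the orthogonality of the error $\varepsilon_j$ to the
approximating functions $\phi_l$, $l = 1, \ldots, L$, i.e.:

\[
  \langle \varepsilon_j, \phi_l \rangle = 0 . \tag{5.53}
\]

This yields

\[
  \sum_{l=1}^{L} b_{j,l} \left\{ \int_{\mathcal{J}} \int_{\mathcal{J}} C(u, v)\, \phi_j(u)\, \phi_l(v)\, dv\, du
  - \lambda_j \int_{\mathcal{J}} \phi_j(u)\, \phi_l(u)\, du \right\} = 0 . \tag{5.54}
\]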

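Assuming that $N$ eigenfunctions are to be recovered, introduce the following
matrix notation (in an elementwise sense):

\[
\begin{aligned}
  \mathbf{B} &= (b_{j,l}) && (5.55) \\
  \mathbf{W} &\stackrel{\mathrm{def}}{=} (w_{j,l}) = \int_{\mathcal{J}} \phi_j(v)\, \phi_l(v)\, dv && (5.56) \\
  \mathbf{C} &\stackrel{\mathrm{def}}{=} (c_{j,l}) = \int_{\mathcal{J}} \int_{\mathcal{J}} C(u, v)\, \phi_j(u)\, \phi_l(v)\, dv\, du && (5.57) \\
  \boldsymbol{\Lambda} &\stackrel{\mathrm{def}}{=} \operatorname{diag}(\lambda_j,\ j = 1, \ldots, N) . && (5.58)
\end{aligned}
\]

Then (5.54) can be summarized as

\[
  \mathbf{C}\mathbf{B} = \boldsymbol{\Lambda} \mathbf{W} \mathbf{B} . \tag{5.59}
\]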

The solution of this generalized eigenvalue problem, B and Λ, delivers the
eigenfunctions by substituting into (5.51). The functional PCs are obtained
via the projection (5.36), the associated variances of which are given by λj.
Cont and da Fonseca (2002) show that three eigenfunctions explain more
than 95% of the variance of the IVS found in S&P 500 and FTSE 100
options. The particulars of their empirical evidence are very close to the results
obtained from the DAX index options using either the CPC models or the
semiparametric factor model, see Section 5.2 and Section 5.4, respectively.
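
For a single eigenfunction with coefficient vector $\mathbf{b}_j$, the Galerkin condition (5.54) amounts to a generalized symmetric eigenvalue problem of the form $\mathbf{C}\mathbf{b}_j = \lambda_j \mathbf{W}\mathbf{b}_j$. A minimal NumPy sketch of one way to solve it is given below. It is not the procedure of Cont and da Fonseca (2002): it assumes that the matrices C and W from (5.56) and (5.57) have already been computed, that C is symmetric and W positive definite, and it stores the coefficient vectors as columns. The name `galerkin_eig` is hypothetical.

```python
import numpy as np

def galerkin_eig(C, W, n_comp=3):
    """Solve C b = lambda W b for symmetric C and positive definite W."""
    Lc = np.linalg.cholesky(W)                  # W = Lc Lc^T, Lc lower triangular
    T = np.linalg.solve(Lc, C)                  # Lc^{-1} C
    A = np.linalg.solve(Lc, T.T).T              # Lc^{-1} C Lc^{-T}
    A = 0.5 * (A + A.T)                         # remove rounding asymmetry
    lam, Y = np.linalg.eigh(A)                  # ordinary symmetric problem
    order = np.argsort(lam)[::-1][:n_comp]      # largest eigenvalues first
    B = np.linalg.solve(Lc.T, Y[:, order])      # back-transform: b_j = Lc^{-T} y_j
    return lam[order], B
```

The Cholesky factor reduces the generalized problem to an ordinary symmetric one, so that only standard routines are needed. The resulting columns are normalized such that $\mathbf{b}_j^\top \mathbf{W} \mathbf{b}_j = 1$.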

5.4 Semiparametric factor models

In modeling the IVS one faces two main challenges. First, the data design is
degenerate. Due to trading conventions, observations of the IVS occur only
for a small number of maturities such as one, two, three, six, nine, twelve, 18,
and 24 months to expiry on the date of issue. Consequently, IVs appear like
pearls strung on a necklace or, in short, as strings. This pattern has been
discussed in Section 2.5. For convenience, we display again the IVS together
with a plot which shows the data design as seen from the top, Figure 5.14.
Options belonging to the same string have a common time to maturity, i.e.
lie on the same line. As time passes, the strings move through the maturity
axis towards expiry, while changing levels and shape in a random fashion.
As a second challenge, also in the moneyness dimension, the observation
grid does not cover the desired estimation grid at any point in time with the
same density. Consider, for instance, the third IV string from the bottom: only
a moneyness interval between 0.8 and 1.1 is occupied, while the coverage
for the second string from the bottom is much wider. There can be two reasons
for this pattern. First, these contracts have simply not been traded and
consequently do not show up in a (transaction based) data set. The second
reason, which is the more likely one in this particular case, is hidden in the
specific institutional arrangements at the futures exchange with regard to the
creation of new contracts. Note that the options belonging to the third string
expire in July and have been created at the beginning of April. When new
contracts of a particular time to maturity are created, they are not available
on the entire strike spectrum: initially, only a certain range of OTM and
ITM options are open for trading. New contracts of this time to maturity are
subsequently created as the underlying price moves. This practice ensures that a
minimum range of OTM and ITM options around the current spot price of the
underlying asset is always maintained. In reference to Figure 5.14, this means
that contracts of other strikes may simply not exist, since the underlying
moved too little between April and May.
Whatever the precise reasons are, it needs to be taken as a fact that even
when the data sets are as huge as ours, IV observations are missing in a large
number of cases for certain subregions of the desired estimation grid. Of course,
this is a point that will be most acute in transaction based data sets.
The dimension reduction techniques from the previous sections fit the IVS
on a grid for each day. Afterwards a PCA using a functional norm is applied
to the surfaces. For the semi- or nonparametric approximations to the IVS,
which are used within this work and which are promoted by Aït-Sahalia and
Lo (1998), Rosenberg (2000), Aït-Sahalia et al. (2001b), Cont and da Fonseca
(2002), Fengler et al. (2003b), and Fengler and Wang (2003), this design may
pose difficulties. For illustration, consider in Figure 5.15 (left panel) the fit
of a standard Nadaraya-Watson estimator. Bandwidths are h1 = 0.03 for the
moneyness and h2 = 0.04 for the time to maturity dimension (measured in
years). The fit appears very rough, and there are huge holes in the surface,
since the bandwidths are too small to ‘bridge’ the gaps between the maturity
strings. In order to remedy this deficiency one would need to strongly increase
the bandwidths. But this can induce a model bias. Moreover, since the design
is time-varying, bandwidths would also need to be adjusted anew for each
trading day, which complicates daily applications.

Fig. 5.14. Left panel: call and put IVs observed on 20000502. Right panel: data
design on 20000502 (moneyness against time to maturity).

Fig. 5.15. Nadaraya-Watson estimate and SFM fit for 20000502. Bandwidths for
both estimates h1 = 0.03 for the moneyness and h2 = 0.04 for the time to maturity
dimension.

As an alternative, we will introduce the semiparametric factor model
(SFM) with time-varying coefficients due to Fengler et al. (2003a). In this
approach the IVS is fitted each day at the observed design points which will
lead to a minimization with respect to functional norms that depend on time.
This procedure avoids bias effects which can ensue from global daily fits used
in standard FPCA. In the following, we present the model, discuss its esti-
mation, and provide an empirical analysis for our data for the years 1998 to
2001.

5.4.1 The model

We denote by $\mathcal{J} = [\kappa_{f,\min}, \kappa_{f,\max}] \times [\tau_{\min}, \tau_{\max}]$ a two-dimensional interval
that represents a region of moneyness and time to maturity. Further, define
(log)-IV as $y_{i,j} \stackrel{\mathrm{def}}{=} \ln\{\widehat{\sigma}_{i,j}(\kappa_f, \tau)\}$, where for our transaction based volatility
data set the index $i$ is the number of the day ($i = 1, \ldots, I$), and $j = 1, \ldots, J_i$
is an intra-day numbering of the option traded on day $i$. The observations $y_{i,j}$
are regressed on the two-dimensional covariables $x_{i,j}$ that contain forward
moneyness $\kappa_{f\,i,j}$ and maturity $\tau_{i,j}$. The SFM approximates the IVS by:

\[
  y_{i,j} \approx m_0(x_{i,j}) + \sum_{l=1}^{L} \beta_{i,l}\, m_l(x_{i,j}) , \tag{5.60}
\]

where $m_l : \mathcal{J} \to \mathbb{R}$ are smooth basis functions ($l = 0, \ldots, L$). The IVS
is approximated by a weighted sum of smooth functions $m_l$ with weights
$\beta_{i,l}$ depending on time $i$. The factor loading
$\boldsymbol{\beta}_i \stackrel{\mathrm{def}}{=} (\beta_{i,1}, \ldots, \beta_{i,L})^\top$ forms an
unobserved multivariate time series. By fitting the model (5.60) to the IV
strings, we obtain approximations $\widehat{\boldsymbol{\beta}}_i$. We argue that VAR estimation based
on $\widehat{\boldsymbol{\beta}}_i$ is asymptotically equivalent to an estimation based on the unobserved
$\boldsymbol{\beta}_i$. After recovering the $\boldsymbol{\beta}_i$, we will model them in a suitable time series model.
Hence, the time series of the factor loadings may be seen as state variables.
This perspective reveals a close relationship of the model to Kalman filtering
and is discussed in Borak et al. (2005).
In order to estimate the nonparametric components ml and the state
variables βi,l in (5.60), ideas from fitting additive models as in Stone (1986),
Hastie and Tibshirani (1990), and Horowitz et al. (2002) are borrowed. The
approach is related to functional coefficient models such as Cai et al. (2000).
Other semi- and nonparametric factor models include Connor and Linton
(2000), Gouriéroux and Jasiak (2001), Fan et al. (2003), and Linton et al.
(2003) among others. Nonparametric techniques are now broadly used in
option pricing, e.g. Broadie et al. (2000), Aït-Sahalia et al. (2001a), Aït-Sahalia
and Duarte (2003), Daglish (2003), and interest rate modeling, e.g. Aït-Sahalia
(1996), Ghysels and Ng (1989), and Linton et al. (2001).
Estimates $\widehat{m}_l$ ($l = 0, \ldots, L$) and $\widehat{\beta}_{i,l}$ ($i = 1, \ldots, I$; $l = 1, \ldots, L$) are
defined as minimizers of the following least squares criterion
($\widehat{\beta}_{i,0} \stackrel{\mathrm{def}}{=} 1$):

\[
  \sum_{i=1}^{I} \sum_{j=1}^{J_i} \int \left\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(u) \right\}^2
  K_h(u - x_{i,j})\, du , \tag{5.61}
\]

where $u = (u_1, u_2) \in \mathcal{J}$. Further, $K_h$ with $h = (h_1, h_2)$ denotes the
two-dimensional product kernel,
$K_h(u) \stackrel{\mathrm{def}}{=} h_1^{-1} K_{(1)}(h_1^{-1} u_1) \times h_2^{-1} K_{(2)}(h_2^{-1} u_2)$,
which is computed from one-dimensional kernels $K(v)$.
In (5.61) the minimization runs over all functions $\widehat{m}_l : \mathcal{J} \to \mathbb{R}$ and all
values $\widehat{\beta}_{i,l} \in \mathbb{R}$. For illustration let us consider the case $L = 0$: the IVs $y_{i,j}$
are approximated by a surface $\widehat{m}_0$ that does not depend on time $i$. In this
degenerate case,
$\widehat{m}_0(u) = \sum_{i,j} K_h(u - x_{i,j})\, y_{i,j} \big/ \sum_{i,j} K_h(u - x_{i,j})$, which is the
Nadaraya-Watson estimate based on the pooled sample of all days, compare
with Section 4.2 and particularly with Equation (4.13). In the algorithmic
implementation of (5.61), the integral is replaced by Riemann sums on a fine
grid.
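
The pooled Nadaraya-Watson estimate of the case $L = 0$ can be sketched as follows. The sketch uses NumPy and, as an assumption made only for illustration, a quartic kernel in both dimensions; the function names are hypothetical. Grid points that have no observation inside the kernel support are returned as NaN, which reproduces the holes that appear between maturity strings when the bandwidths are small.

```python
import numpy as np

def quartic(v):
    """Quartic (biweight) kernel with support [-1, 1] (an assumed choice)."""
    return np.where(np.abs(v) <= 1, 15.0 / 16.0 * (1 - v**2) ** 2, 0.0)

def product_kernel(u, x, h):
    """K_h(u - x) for grid points u (G, 2), observations x (N, 2), bandwidths h (2,)."""
    d = (u[:, None, :] - x[None, :, :]) / h     # (G, N, 2) scaled differences
    return np.prod(quartic(d) / h, axis=2)      # (G, N) product kernel weights

def nadaraya_watson(u, x, y, h):
    """Pooled Nadaraya-Watson estimate of the surface at the grid points u."""
    K = product_kernel(u, x, h)
    den = K.sum(axis=1)
    num = K @ y
    # Grid points without any observation in the kernel support stay NaN.
    return np.divide(num, den, out=np.full(len(u), np.nan), where=den > 0)
```

For example, with observations at maturities 0.1 and 0.3 only and a maturity bandwidth of 0.04, the estimate at a grid point with maturity 0.2 is undefined, whereas it is well defined on the strings themselves.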
Using (5.61) the IVS is approximated by surfaces moving in an $L$-dimensional
affine function space
$\{\widehat{m}_0 + \sum_{l=1}^{L} \alpha_l \widehat{m}_l : \alpha_1, \ldots, \alpha_L \in \mathbb{R}\}$. The
estimates $\widehat{m}_l$ are not uniquely defined: they can be replaced by functions that
span the same affine space. In order to respond to this problem, we select $\widehat{m}_l$
such that they are orthogonal.
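Replacing $\widehat{m}_l$ in (5.61) by $\widehat{m}_l + \delta g$ with arbitrary functions $g$ and taking
derivatives with respect to $\delta$ yields for $0 \leq l' \leq L$

\[
  \sum_{i=1}^{I} \sum_{j=1}^{J_i} \left\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(u) \right\}
  \widehat{\beta}_{i,l'}\, K_h(u - x_{i,j}) = 0 . \tag{5.62}
\]

Furthermore, by replacing $\widehat{\beta}_{i,l}$ by $\widehat{\beta}_{i,l} + \delta$ in (5.61) and again taking
derivatives with respect to $\delta$, we get, for $1 \leq l' \leq L$ and $1 \leq i \leq I$:

\[
  \sum_{j=1}^{J_i} \int \left\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(u) \right\}
  \widehat{m}_{l'}(u)\, K_h(u - x_{i,j})\, du = 0 . \tag{5.63}
\]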
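Introducing the following notation for $1 \leq i \leq I$

\[
  \widehat{p}_i(u) = \frac{1}{J_i} \sum_{j=1}^{J_i} K_h(u - x_{i,j}) , \tag{5.64}
\]

\[
  \widehat{q}_i(u) = \frac{1}{J_i} \sum_{j=1}^{J_i} K_h(u - x_{i,j})\, y_{i,j} , \tag{5.65}
\]

we obtain from (5.62)-(5.63) for $1 \leq l' \leq L$, $1 \leq i \leq I$:

\[
  \sum_{i=1}^{I} J_i\, \widehat{\beta}_{i,l'}\, \widehat{q}_i(u)
  = \sum_{i=1}^{I} J_i \sum_{l=0}^{L} \widehat{\beta}_{i,l'}\, \widehat{\beta}_{i,l}\, \widehat{p}_i(u)\, \widehat{m}_l(u) , \tag{5.66}
\]

\[
  \int \widehat{q}_i(u)\, \widehat{m}_{l'}(u)\, du
  = \sum_{l=0}^{L} \widehat{\beta}_{i,l} \int \widehat{p}_i(u)\, \widehat{m}_{l'}(u)\, \widehat{m}_l(u)\, du . \tag{5.67}
\]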

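We calculate the estimates by iterative use of (5.66) and (5.67). We start
with initial values $\widehat{\beta}^{(0)}_{i,l}$ for $\widehat{\beta}_{i,l}$. A possible choice of the initial
$\widehat{\boldsymbol{\beta}}_i$ could correspond to fits of an IVS that is piecewise constant on
time intervals $I_1, \ldots, I_L$. This means, for $l = 1, \ldots, L$, put
$\widehat{\beta}^{(0)}_{i,l} = 1$ (for $i \in I_l$), and $\widehat{\beta}^{(0)}_{i,l} = 0$ (for
$i \notin I_l$). Here $I_1, \ldots, I_L$ are pairwise disjoint subsets of $\{1, \ldots, I\}$ and
$\bigcup_{l=1}^{L} I_l$ is a strict subset of $\{1, \ldots, I\}$. For $r \geq 0$, we put
$\widehat{\beta}^{(r)}_{i,0} = 1$. Define the matrix $\mathbf{B}^{(r)}(u)$ by its elements:

\[
  b^{(r)}_{l,l'}(u) \stackrel{\mathrm{def}}{=}
  \sum_{i=1}^{I} J_i\, \widehat{\beta}^{(r-1)}_{i,l'}\, \widehat{\beta}^{(r-1)}_{i,l}\, \widehat{p}_i(u) ,
  \quad 0 \leq l, l' \leq L , \tag{5.68}
\]

and introduce a vector $\mathbf{q}^{(r)}(u)$ with elements

\[
  q^{(r)}_l(u) \stackrel{\mathrm{def}}{=}
  \sum_{i=1}^{I} J_i\, \widehat{\beta}^{(r-1)}_{i,l}\, \widehat{q}_i(u) ,
  \quad 0 \leq l \leq L . \tag{5.69}
\]

In the $r$-th iteration the estimate $\widehat{\mathbf{m}} = (\widehat{m}_0, \ldots, \widehat{m}_L)^\top$ is given by

\[
  \widehat{\mathbf{m}}^{(r)}(u) = \mathbf{B}^{(r)}(u)^{-1}\, \mathbf{q}^{(r)}(u) . \tag{5.70}
\]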

This update step is motivated by (5.66). The values of $\widehat{\beta}$ are updated in
the $r$-th cycle as follows: define the matrix $\mathbf{M}^{(r)}(i)$

\[
  M^{(r)}_{l,l'}(i) \stackrel{\mathrm{def}}{=}
  \int \widehat{p}_i(u)\, \widehat{m}^{(r)}_{l'}(u)\, \widehat{m}^{(r)}_l(u)\, du ,
  \quad 1 \leq l, l' \leq L , \tag{5.71}
\]

and define a vector $\mathbf{s}^{(r)}(i)$:

\[
  s^{(r)}_l(i) \stackrel{\mathrm{def}}{=}
  \int \widehat{q}_i(u)\, \widehat{m}^{(r)}_l(u)\, du
  - \int \widehat{p}_i(u)\, \widehat{m}^{(r)}_0(u)\, \widehat{m}^{(r)}_l(u)\, du ,
  \quad 1 \leq l \leq L . \tag{5.72}
\]

Motivated by (5.67), put

\[
  \big( \widehat{\beta}^{(r)}_{i,1}, \ldots, \widehat{\beta}^{(r)}_{i,L} \big)^\top
  = \mathbf{M}^{(r)}(i)^{-1}\, \mathbf{s}^{(r)}(i) . \tag{5.73}
\]

The algorithm is run until only minor changes occur. In the
implementation, we choose a grid of points and calculate $\widehat{m}_l$ at these points. In the
calculation of $\mathbf{M}^{(r)}(i)$ and $\mathbf{s}^{(r)}(i)$, we replace the integral by a Riemann
integral approximation using the values of the integrated functions at the grid
points.
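
The two update steps (5.70) and (5.73) can be combined into a compact sketch. It is an illustration under several assumptions rather than the implementation used for the empirical results: it relies on NumPy, uses a quartic product kernel, takes an equally spaced grid with cell area `du`, runs a fixed number of cycles instead of a convergence check, and adds a tiny ridge term (an addition for numerical stability that is not part of the algorithm above) so that the linear systems remain solvable at grid points without nearby data. The initial loadings are indicators of consecutive blocks of days whose union leaves out the last block. All function and argument names are hypothetical.

```python
import numpy as np

def kernel_weights(u, x, h):
    """Quartic product kernel K_h(u - x); u is (G, 2), x is (J, 2), h is (2,)."""
    d = (u[:, None, :] - x[None, :, :]) / h
    k = np.where(np.abs(d) <= 1, 15.0 / 16.0 * (1 - d**2) ** 2, 0.0) / h
    return k.prod(axis=2)

def fit_sfm(days, u, du, h, L, n_iter=50, ridge=1e-8):
    """days: list of (x_i, y_i); u: (G, 2) grid; du: area of one grid cell."""
    I, G = len(days), len(u)
    J = np.array([len(y) for _, y in days], dtype=float)
    p = np.empty((I, G))
    q = np.empty((I, G))
    for i, (x, y) in enumerate(days):
        K = kernel_weights(u, x, h)
        p[i] = K.mean(axis=1)                  # (5.64)
        q[i] = (K @ y) / len(y)                # (5.65)
    # Initial loadings: indicators of L disjoint blocks of days, with the
    # last of L + 1 blocks left out so that the union is a strict subset.
    beta = np.zeros((I, L + 1))
    beta[:, 0] = 1.0                           # beta_{i,0} = 1 throughout
    blocks = np.array_split(np.arange(I), L + 1)
    for l in range(1, L + 1):
        beta[blocks[l - 1], l] = 1.0
    eye = ridge * np.eye(L + 1)                # stabilizer (assumption)
    for _ in range(n_iter):
        # (5.68)-(5.70): update all basis functions at every grid point.
        Jb = J[:, None] * beta
        B = np.einsum("il,ik,ig->glk", Jb, beta, p)        # (G, L+1, L+1)
        qv = np.einsum("il,ig->gl", Jb, q)                 # (G, L+1)
        m = np.linalg.solve(B + eye, qv[..., None])[..., 0].T   # (L+1, G)
        # (5.71)-(5.73): update the loadings day by day with Riemann sums.
        for i in range(I):
            M = (m[1:] * p[i]) @ m[1:].T * du
            s = (m[1:] * (q[i] - p[i] * m[0])).sum(axis=1) * du
            beta[i, 1:] = np.linalg.solve(M + eye[1:, 1:], s)
    return m, beta
```

The fitted surface of day i on the grid is `beta[i] @ m`. The estimates returned here are not yet normed, so `m` and `beta` are only determined up to the transformations discussed in Section 5.4.2.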

5.4.2 Norming of the estimates

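As discussed above, $\widehat{m}_l$ and $\widehat{\beta}_{i,l}$ are not uniquely defined. Therefore, we
orthogonalize $\widehat{m}_0, \ldots, \widehat{m}_L$ in $L^2(\widehat{p})$, where
$\widehat{p}(u) = I^{-1} \sum_{i=1}^{I} \widehat{p}_i(u)$, such that
$\sum_{i=1}^{I} \widehat{\beta}_{i,1}^2$ is maximum, and given $\widehat{\beta}_{i,1}, \widehat{m}_0, \widehat{m}_1$,
$\sum_{i=1}^{I} \widehat{\beta}_{i,2}^2$ is maximum, and so
forth. These aims can be achieved by the following two steps: first replace

\[
\begin{aligned}
  \widehat{m}_0 \quad &\text{by} \quad \widehat{m}_0^{\,new} = \widehat{m}_0 - \boldsymbol{\gamma}^\top \boldsymbol{\Gamma}^{-1} \widehat{\mathbf{m}} , \\
  \widehat{\mathbf{m}} \quad &\text{by} \quad \widehat{\mathbf{m}}^{new} = \boldsymbol{\Gamma}^{-1/2}\, \widehat{\mathbf{m}} , \\
  \widehat{\boldsymbol{\beta}}_i \quad &\text{by} \quad \widehat{\boldsymbol{\beta}}_i^{\,new}
    = \boldsymbol{\Gamma}^{1/2} \left\{ \widehat{\boldsymbol{\beta}}_i + \boldsymbol{\Gamma}^{-1} \boldsymbol{\gamma} \right\} ,
\end{aligned} \tag{5.74}
\]

where we redefine the vector $\widehat{\mathbf{m}} = (\widehat{m}_1, \ldots, \widehat{m}_L)^\top$ not to contain $\widehat{m}_0$ any
more. Further we define the $(L \times L)$ matrix
$\boldsymbol{\Gamma} = \int \widehat{\mathbf{m}}(u)\, \widehat{\mathbf{m}}(u)^\top\, \widehat{p}(u)\, du$, or for
clarity elementwise by $\boldsymbol{\Gamma} = (\gamma_{l,l'})$, with
$\gamma_{l,l'} \stackrel{\mathrm{def}}{=} \int \widehat{m}_l(u)\, \widehat{m}_{l'}(u)\, \widehat{p}(u)\, du$. Finally,
we have $\boldsymbol{\gamma} = (\gamma_l)$, with
$\gamma_l \stackrel{\mathrm{def}}{=} \int \widehat{m}_0(u)\, \widehat{m}_l(u)\, \widehat{p}(u)\, du$.
Note that by applying (5.74), $\widehat{m}_0$ is replaced by a function that minimizes
$\int \widehat{m}_0^2(u)\, \widehat{p}(u)\, du$. This is evident because $\widehat{m}_0$ is orthogonal to the linear space
spanned by $\widehat{m}_1, \ldots, \widehat{m}_L$. By the second equation of (5.74), $\widehat{m}_1, \ldots, \widehat{m}_L$ are
replaced by orthonormal functions in $L^2(\widehat{p})$.
In a second step, we proceed as in PCA and define a matrix $\widetilde{\mathbf{B}}$ with
elements $\widetilde{b}_{l,l'} = \sum_{i=1}^{I} \widehat{\beta}_{i,l}\, \widehat{\beta}_{i,l'}$ and calculate the eigenvalues of
$\widetilde{\mathbf{B}}$, $\lambda_1 > \ldots > \lambda_L$, and the corresponding eigenvectors $z_1, \ldots, z_L$.
Put $\mathbf{Z} = (z_1, \ldots, z_L)$. Replace

\[
  \widehat{\mathbf{m}} \quad \text{by} \quad \widehat{\mathbf{m}}^{new} = \mathbf{Z}^\top \widehat{\mathbf{m}} , \tag{5.75}
\]

(i.e. $\widehat{m}_l^{\,new} = z_l^\top \widehat{\mathbf{m}}$), and

\[
  \widehat{\boldsymbol{\beta}}_i \quad \text{by} \quad \widehat{\boldsymbol{\beta}}_i^{\,new} = \mathbf{Z}^\top \widehat{\boldsymbol{\beta}}_i . \tag{5.76}
\]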
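
The two norming steps can be sketched on a grid as follows. The sketch assumes NumPy, replaces the integrals by Riemann sums with cell area `du`, and expects the averaged design density `p_bar` to be evaluated on the same grid; the function name `norm_estimates` is hypothetical. Loadings are stored row-wise, so the transformations of (5.74) and (5.76) appear as right multiplications.

```python
import numpy as np

def norm_estimates(m0, m, beta, p_bar, du):
    """Norm SFM estimates on a grid.

    m0: (G,), m: (L, G) without m0, beta: (I, L), p_bar: (G,) average density.
    """
    Gam = (m * p_bar) @ m.T * du                 # Gamma, (L, L)
    gam = (m * p_bar) @ m0 * du                  # gamma, (L,)
    val, vec = np.linalg.eigh(Gam)
    G_half = (vec * np.sqrt(val)) @ vec.T        # Gamma^{1/2}
    G_half_inv = (vec / np.sqrt(val)) @ vec.T    # Gamma^{-1/2}
    shift = np.linalg.solve(Gam, gam)            # Gamma^{-1} gamma
    # First step, (5.74); G_half is symmetric, so rows can be multiplied
    # from the right.
    m0 = m0 - shift @ m
    m = G_half_inv @ m
    beta = (beta + shift) @ G_half
    # Second step, (5.75)-(5.76): rotate by the eigenvectors of B-tilde.
    lam, Z = np.linalg.eigh(beta.T @ beta)
    Z = Z[:, np.argsort(lam)[::-1]]              # largest eigenvalue first
    return m0, Z.T @ m, beta @ Z
```

The fitted surfaces `m0 + beta @ m` are unchanged by the procedure; only the representation is fixed. After norming, the basis functions are orthonormal with respect to `p_bar`, and the sums of squared loadings are in decreasing order.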

After the application of (5.75) and (5.76), the orthonormal basis of the
model $\widehat{m}_1, \ldots, \widehat{m}_L$ is chosen such that $\sum_{i=1}^{I} \widehat{\beta}_{i,1}^2$ is maximum, and, given
$\widehat{\beta}_{i,1}, \widehat{m}_0, \widehat{m}_1$, the quantity $\sum_{i=1}^{I} \widehat{\beta}_{i,2}^2$ is maximum, and so on,
i.e. $\widehat{m}_1$ is chosen such that as much as possible is explained by
$\widehat{\beta}_{i,1} \widehat{m}_1$. Next $\widehat{m}_2$ is chosen to
achieve the maximum explanation by $\widehat{\beta}_{i,1} \widehat{m}_1 + \widehat{\beta}_{i,2} \widehat{m}_2$, and so forth.
Unlike in Section 5.3.1 on FPCA, the functions $\widehat{m}_l$ are not eigenfunctions of
an operator. This is because we use a different norm, namely
$\int f^2(u)\, \widehat{p}_i(u)\, du$, for each day. Through the norming procedure the functions are chosen
as eigenfunctions in an $L$-dimensional approximating linear space. The
$L$-dimensional approximating spaces are not necessarily nested for increasing $L$.
For this reason the estimates cannot be calculated by an iterative procedure
that starts by fitting a model with one component, and that uses the old $L - 1$
components in the iteration step from $L - 1$ to $L$ to fit the next component.
The calculation of $\widehat{m}_0, \ldots, \widehat{m}_L$ has to be redone for different choices of $L$.

5.4.3 Choice of model parameters

For the choice of $L$, we consider the residual sum of squares for different $L$:

\[
  RV(L) \stackrel{\mathrm{def}}{=}
  \frac{\sum_{i}^{I} \sum_{j}^{J_i} \left\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(x_{i,j}) \right\}^2}
       {\sum_{i}^{I} \sum_{j}^{J_i} (y_{i,j} - \bar{y})^2} , \tag{5.77}
\]

where $\bar{y}$ denotes the overall mean of the observations. The quantity $1 - RV(L)$
is the portion of variance explained in the approximation, and $L$ can be
increased until a sufficiently high level of fitting accuracy is achieved. As has
been explained for the CPC models, see Equation (5.30), this is a common
selection method also in PCA.
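
As a small illustration, the explained portion of variance can be computed from the observations and the fitted values of each day. The sketch assumes NumPy and that both are given as lists of arrays, one entry per day; the function name is hypothetical.

```python
import numpy as np

def explained_variance(y, fitted):
    """Return 1 - RV(L) for lists of daily observations and fitted values."""
    y_all = np.concatenate(y)                    # pool all days
    f_all = np.concatenate(fitted)
    rv = np.sum((y_all - f_all) ** 2) / np.sum((y_all - y_all.mean()) ** 2)
    return 1.0 - rv
```

In a model selection loop one would refit the model for L = 1, 2, ... and stop as soon as the returned value exceeds a chosen accuracy level.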
For a data-driven choice of bandwidths, we propose an approach based on
a weighted Akaike Information Criterion (AIC). We argue for using a weighted
criterion, since the distribution of the observations is far from regular, as was
seen from Figure 5.16. As mentioned in Section 4.3, this leads to nonconvexity
in the criterion and typically to unacceptably small bandwidths. Given the
unequal distribution of observations, it is natural to punish the criterion in
areas where the distribution is sparse. For a given weight function $w$, consider:

\[
  \Delta(m_0, \ldots, m_L) \stackrel{\mathrm{def}}{=}
  \operatorname{E} \frac{1}{N} \sum_{i,j} \Big\{ y_{i,j} - \sum_{l=0}^{L} \beta_{i,l}\, m_l(x_{i,j}) \Big\}^2 w(x_{i,j}) , \tag{5.78}
\]

for functions $m_0, \ldots, m_L$. We choose bandwidths such that
$\Delta(\widehat{m}_0, \ldots, \widehat{m}_L)$ is
minimum. According to the AIC this is asymptotically equivalent to
minimizing:
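
\[
  \Xi_{\mathrm{AIC}_1} \stackrel{\mathrm{def}}{=}
  \frac{1}{N} \sum_{i,j} \Big\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(x_{i,j}) \Big\}^2 w(x_{i,j})
  \times \exp\left\{ 2\, \frac{L}{N}\, K_h(0) \int w(u)\, du \right\} . \tag{5.79}
\]

Alternatively, one may consider the computationally easier criterion:

\[
  \Xi_{\mathrm{AIC}_2} \stackrel{\mathrm{def}}{=}
  \frac{1}{N} \sum_{i,j} \Big\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(x_{i,j}) \Big\}^2
  \times \exp\left\{ 2\, \frac{L}{N}\, K_h(0)\, \frac{\int w(u)\, du}{\int w(u)\, p(u)\, du} \right\} . \tag{5.80}
\]
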
Putting $w(u) \stackrel{\mathrm{def}}{=} 1$ delivers the common AIC, see in particular
Section 4.4.1. This, however, does not take into account the quality of the
estimation at the boundary regions or in regions where the data are sparse,
since in these regions $p(u)$ is small. We propose to choose

\[
  w(u) \stackrel{\mathrm{def}}{=} \frac{1}{p(u)} , \tag{5.81}
\]

which gives equal weight everywhere as can be seen by the following
considerations:

\[
\begin{aligned}
  \Delta(m_0, \ldots, m_L)
  &= \operatorname{E} \frac{1}{N} \sum_{i,j} \varepsilon^2\, w(x_{i,j})
   + \operatorname{E} \frac{1}{N} \sum_{i,j} \Big[ \sum_{l=0}^{L} \beta_{i,l} \{ m_l(x_{i,j}) - \widehat{m}_l(x_{i,j}) \} \Big]^2 w(x_{i,j}) \\
  &\approx \sigma^2 \int w(u)\, p(u)\, du
   + \frac{1}{N} \sum_{i,j} \int \Big[ \sum_{l=0}^{L} \beta_{i,l} \{ m_l(u) - \widehat{m}_l(u) \} \Big]^2 w(u)\, p(u)\, du .
\end{aligned} \tag{5.82}
\]

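The two criteria become:

\[
  \Xi_{\mathrm{AIC}_1} \stackrel{\mathrm{def}}{=}
  \frac{1}{N} \sum_{i,j} \frac{\big\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(x_{i,j}) \big\}^2}{\widehat{p}(x_{i,j})}
  \times \exp\left\{ 2\, \frac{L}{N}\, K_h(0) \int \frac{1}{\widehat{p}(u)}\, du \right\} , \tag{5.83}
\]

and

\[
  \Xi_{\mathrm{AIC}_2} \stackrel{\mathrm{def}}{=}
  \frac{1}{N} \sum_{i,j} \Big\{ y_{i,j} - \sum_{l=0}^{L} \widehat{\beta}_{i,l}\, \widehat{m}_l(x_{i,j}) \Big\}^2
  \times \exp\left\{ 2\, \frac{L}{N}\, K_h(0)\, \mu_\lambda^{-1} \int \frac{1}{\widehat{p}(u)}\, du \right\} , \tag{5.84}
\]

where $\mu_\lambda \stackrel{\mathrm{def}}{=} (\kappa_{f,\max} - \kappa_{f,\min})(\tau_{\max} - \tau_{\min})$ denotes the Lebesgue measure of
the design set $\mathcal{J}$.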
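
Given residuals and density estimates, both criteria with the weight $w = 1/\widehat{p}$ of (5.81) reduce to a few lines. The sketch below is an illustration only. It assumes NumPy, a strictly positive density estimate on the grid, and a Riemann sum with cell area `du` for the integral; the function and argument names are hypothetical. In line with (5.79), the squared residuals in the first criterion are weighted by $1/\widehat{p}(x_{i,j})$.

```python
import numpy as np

def weighted_aic(resid, p_obs, p_grid, du, L, K0, area):
    """Weighted AIC criteria for w = 1 / p (sketch).

    resid  : (N,) residuals y_ij minus fitted values
    p_obs  : (N,) density estimate at the observation points x_ij
    p_grid : (G,) density estimate on the estimation grid (must be > 0)
    du     : area of one grid cell; K0 : K_h(0); area : Lebesgue measure of J
    """
    N = resid.size
    # Riemann sum approximating 2 (L / N) K_h(0) int 1 / p(u) du.
    pen = 2.0 * L / N * K0 * np.sum(1.0 / p_grid) * du
    aic1 = np.mean(resid**2 / p_obs) * np.exp(pen)
    aic2 = np.mean(resid**2) * np.exp(pen / area)
    return aic1, aic2
```

A bandwidth search would evaluate this function for each candidate pair of bandwidths, refitting the model and the density estimate each time, and keep the pair with the smallest criterion value.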
Under some regularity conditions, the AIC is an asymptotically unbiased
estimate of the mean average squared error (MASE), see Section 4.4. In our
setting it would be consistent if the density of $x_{i,j}$ did not depend on day $i$.
Due to the irregular design, this is an unrealistic assumption. For this reason,
$\Xi_{\mathrm{AIC}_1}$ and $\Xi_{\mathrm{AIC}_2}$ estimate weighted versions of the MASE.
In our AIC, the penalty term does not punish for the number of
parameters $\widehat{\beta}_{i,l}$ that are employed to model the time series. This can be neglected
because we will use a finite dimensional model for the dynamics of $\beta_{i,l}$. The
corresponding penalty term is negligible compared to the smoothing penalty
term. A corrected penalty term that takes care of the parametric model of $\widehat{\beta}_{i,l}$
will be considered in the empirical part in Section 5.4.4 where the prediction
performance is assessed.
Clearly, the choices of $h$ and $L$ are not independent. From this point of
view, one may think about minimizing (5.83) or (5.84) over both parameters.
However, our practical experience shows that for a given $L$, changes in the
criteria from a variation in $h$ are small compared to a variation in $L$ for a
given $h$. To reduce the computational burden, we use (5.77) to determine the
model size $L$, and then (5.83) and (5.84) to optimize $h$ for a given $L$.
The convergence of the iteration cycles is measured by

\[
  Q_k(r) \stackrel{\mathrm{def}}{=}
  \sum_{i=1}^{I} \int \Big| \sum_{l=0}^{L} \widehat{\beta}^{(r)}_{i,l}\, \widehat{m}^{(r)}_l(u)
  - \widehat{\beta}^{(r-1)}_{i,l}\, \widehat{m}^{(r-1)}_l(u) \Big|^k du . \tag{5.85}
\]

As above, $(r)$ denotes the result from the $r$th cycle of the estimation. Here, we
approximate the integral by a simple sum over the estimation grid. Putting
$k = 1, 2$, we have an $L^1$ and an $L^2$ measure of convergence. Iterations are
stopped when $Q_k(r) \leq \epsilon^k$ for some small $\epsilon > 0$.
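
On a grid, the convergence measure is a sum of absolute differences between the fitted surfaces of two consecutive cycles. A minimal sketch, assuming NumPy, loadings stored as an array with one row per day including the column of ones, basis functions stored row-wise on the grid, and a Riemann sum with cell area `du` (the function name is hypothetical):

```python
import numpy as np

def convergence_measure(m_new, beta_new, m_old, beta_old, du, k=2):
    """Q_k between two cycles; m: (L+1, G), beta: (I, L+1)."""
    # Row i of beta @ m is the fitted surface of day i on the grid.
    diff = beta_new @ m_new - beta_old @ m_old
    return np.sum(np.abs(diff) ** k) * du
```

An iteration loop would keep the estimates of the previous cycle, evaluate this measure after each update, and stop once it falls below the chosen tolerance.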

5.4.4 Empirical analysis

IVs are observed only for particular strings, but in practice one thinks of
them as the observed values of an entire surface, the IVS. This is
evident when one wishes to price and hedge over-the-counter options expiring
at intermediate maturities. We model log-IV on xi,j = (κi,j , τi,j )⊤. Our
estimation set J covers κf ∈ [0.80, 1.20] in moneyness and
τ ∈ [0.05, 0.5] in time to maturity, measured in years.

h1 h2 ΞAIC1 ΞAIC2 Vβ̂ Vm̂


0.01 0.02 0.000737 0.00151 0.015 0.938
0.01 0.04 0.000741 0.00150 0.003 0.579
0.01 0.06 0.000739 0.00152 0.005 0.416
0.01 0.08 0.000736 0.00163 0.011 0.434
0.02 0.02 0.001895 0.00237 0.104 3.098
0.02 0.04 0.000738 0.00150 0.001 0.181
0.02 0.06 0.000741 0.00151 0.004 0.196
0.02 0.08 0.000742 0.00156 0.008 0.279
0.02 0.10 0.000744 0.00162 0.011 0.339
0.03 0.02 0.002139 0.00256 0.111 3.050
0.03 0.04 0.000739 0.00149 − −
0.03 0.06 0.000743 0.00152 0.004 0.180
0.03 0.08 0.000743 0.00156 0.008 0.273
0.03 0.10 0.000744 0.00162 0.011 0.337
0.04 0.02 0.002955 0.00323 0.138 3.017
0.04 0.04 0.000743 0.00151 0.001 0.088
0.04 0.06 0.000746 0.00154 0.005 0.211
0.04 0.08 0.000745 0.00157 0.008 0.293
0.04 0.10 0.000746 0.00163 0.012 0.353
0.05 0.02 0.003117 0.00341 0.142 2.962
0.05 0.04 0.000748 0.00155 0.001 0.148
0.05 0.06 0.000749 0.00157 0.005 0.241
0.05 0.08 0.000748 0.00160 0.008 0.312
0.05 0.10 0.000749 0.00167 0.012 0.368
0.06 0.02 0.003054 0.00343 0.139 2.923
0.06 0.04 0.000755 0.00160 0.002 0.193
0.06 0.06 0.000756 0.00163 0.005 0.268
0.06 0.08 0.000754 0.00166 0.009 0.330
0.06 0.10 0.000754 0.00172 0.012 0.383
Table 5.9. Bandwidth selection via AIC as given in (5.83) and (5.84) for different
choices of $h = (h_1, h_2)^\top$: $h_1$ refers to moneyness and $h_2$ to time to maturity measured
in years; the bandwidths chosen are $h^* = (0.03, 0.04)^\top$. In all cases L = 3. $V_{\hat{\beta}}$ and $V_{\hat{m}}$
measure the change in $\hat{\beta}$ and $\hat{m}$ as functions of h relative to the optimal bandwidth
$h^* = (0.03, 0.04)^\top$, compare (5.86) and (5.87).

In this model, we employ L = 3 basis functions, which capture around
96.0% of the variations in the IVS. We believe this to be of sufficiently high
accuracy. Bandwidths used are $h_1 = 0.03$ for moneyness and $h_2 = 0.04$ for
time to maturity. This choice is justified by Table 5.9, which presents estimates
for the two AIC criteria. Both criterion functions become very flat near the
minimum, especially $\Xi_{AIC_1}$. However, $\Xi_{AIC_2}$ assumes its global minimum in
the neighborhood of $h^* = (0.03, 0.04)^\top$, which is why we opt for this pair of
bandwidths. In Table 5.9, we also display a measure of how the factor loadings
and the basis functions change relative to the optimal bandwidth h∗ . More
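precisely, we compute:

$$V_{\hat{\beta}}(h_k) = \sqrt{ \sum_{l=0}^{L} \operatorname{Var}\bigl\{ |\hat{\beta}_{i,l}(h_k) - \hat{\beta}_{i,l}(h^*)| \bigr\} } , \qquad (5.86)$$

and

$$V_{\hat{m}}(h_k) = \sqrt{ \sum_{l=0}^{L} \operatorname{Var}\bigl\{ |\hat{m}_l(u; h_k) - \hat{m}_l(u; h^*)| \bigr\} } , \qquad (5.87)$$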

where $h_k$ runs over the values given in Table 5.9, and Var(x) denotes the
variance of x. It is seen that changes in $\hat{m}$ are 10 to 100 times higher in
magnitude than those for $\hat{\beta}$. This corroborates the approximation in (5.82)
that treats the factor loadings as known.
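
A minimal sketch of how the measures (5.86) and (5.87) may be evaluated is given below. The function name and the toy input are assumptions made for illustration, and the population variance is used, since the text does not state which variance estimator was applied.

```python
from math import sqrt
from statistics import pvariance


def change_measure(series_k, series_star):
    """Square root of the summed variances of absolute differences.

    series_k, series_star: lists over l of equally long sequences, taken
    over days i for the loadings (5.86) or over grid points u for the
    basis functions (5.87), at bandwidth h_k and h* respectively.
    """
    return sqrt(sum(
        pvariance([abs(a - b) for a, b in zip(s_k, s_star)])
        for s_k, s_star in zip(series_k, series_star)
    ))


# Hypothetical toy input with a single series (one value of l).
series_star = [[0.0, 0.0, 0.0, 0.0]]
series_k = [[0.0, 1.0, 0.0, 1.0]]
v = change_measure(series_k, series_star)
```

The same function serves both measures; only the index over which the variance is taken differs.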
The ability to choose such small bandwidths demonstrates the strength of the
modeling approach: the bandwidth in the time to maturity dimension is so small
that, in the fit of a particular day, data from contracts with two adjacent
times to maturity do not jointly enter $\hat{p}_i(u)$ in (5.64) and $\hat{q}_i(u)$ in (5.65).
In fact, for a given $u_0$, the quantities $\hat{p}_i(u_0)$ and $\hat{q}_i(u_0)$
are zero most of the time, and only assume positive values for dates i when
observations lie in the local neighborhood of $u_0$. The same applies to the
moneyness dimension. Of course, over the entire observation period I, it
is mandatory that for each u at least some observations are made at some
dates i.
In Figure 5.16, we display the L1 and L2 measures of convergence. Convergence
is achieved quickly. The iterations were stopped after 25 cycles, when
the L2 measure was less than $10^{-5}$. Figures 5.17 to 5.19 display the functions
$\hat{m}_1$ to $\hat{m}_3$ together with contour plots. We do not display the invariant
function $\hat{m}_0$, since it is essentially the zero function of the affine space
fitted by the data: both mean and median are zero up to $10^{-2}$ in magnitude.
We believe this to be pure estimation error. The remaining functions exhibit
more interesting patterns: $\hat{m}_1$ in Figure 5.17 is positive throughout, and
mildly concave. There is little variability across the term structure. Since this
function belongs to the weights with highest variance, we interpret it as the
time dependent mean of the (log)-IVS, i.e. a shift effect. Clearly, these
observations are (and must be) a reiteration of the results from our CPC
analysis in Section 5.2.5, see also Cont and da Fonseca (2002).
[Figure: left panel plots log_10(Fitting Criterion) against Number of Iterations; right panel plots the average density over moneyness and time to maturity.]

Fig. 5.16. Left panel: convergence in the SFM model. Solid line shows the L1, the
dotted line the L2 measure of convergence. The total number of iterations is 25.
Right panel: average density $\hat{p}(u) = I^{-1} \sum_{i=1}^{I} \hat{p}_i(u)$. Bandwidths are $h_1 = 0.03$ for
moneyness and $h_2 = 0.04$ for time to maturity.
                    Min.     Max.     Mean     Median   Stdd.   Skewn.   Kurt.
$\hat{\beta}_1$   -1.541   -0.462   -1.221   -1.260   0.206   1.101    4.082
$\hat{\beta}_2$   -0.075    0.106    0.001    0.002   0.034   0.046    2.717
$\hat{\beta}_3$   -0.144    0.116    0.002   -0.001   0.025   0.108    5.175

Table 5.10. Summary statistics of SFM factor loadings $\hat{\beta}$.

                        $\hat{\beta}_{i,1}$   $\hat{\beta}_{i,2}$   $\hat{\beta}_{i,3}$
$\hat{\beta}_{i,1}$     1                     0.241                 0.368
$\hat{\beta}_{i,2}$                           1                     -0.003
$\hat{\beta}_{i,3}$                                                 1

Table 5.11. Contemporaneous correlation matrix of $\hat{\beta}$.

Function $\hat{m}_2$, depicted in Figure 5.18, changes sign around the ATM region,
which implies that the smile deformation of the IVS is exacerbated or
mitigated by this eigenfunction. Hence we consider this function as a
moneyness slope effect of the IVS. Finally, $\hat{m}_3$ is positive for the very short term
contracts, and negative for contracts with maturity longer than 0.1 years,
Figure 5.19. Thus, a positive weight in $\hat{\beta}_{i,3}$ lowers short term IVs and
increases long term IVs: $\hat{m}_3$ generates the term structure dynamics of the IVS,
i.e. it provides a term structure slope effect.
To appreciate the power of the SFM, we inspect again the situation of
20000502. In Figure 5.20 we compare a Nadaraya-Watson estimator (left
panel) with the SFM (right panel). In the first case, the bandwidths are
increased to $h = (0.06, 0.25)^\top$ in order to remove all holes and excessive
variation in the fit, while for the latter the bandwidths are kept at $h = (0.03, 0.04)^\top$.
While both fits look quite similar at first glance, the differences are best
visible when both cases are contrasted for each time to maturity string separately,
Figures 5.21 to 5.24. Note that these figures do not display separate fits of the
smile functions. What we display are slices from the two-dimensional surfaces.
As is clearly visible, the standard Nadaraya-Watson fit exhibits a strong
directional bias, especially in the wings of the IVS. For instance, for the short
maturity contracts, Figure 5.21, the estimated IVS is too low both in the
OTM put and the OTM call region. At the same time, levels are too high for
the 45 days to expiry contracts, Figure 5.22. For the 80 days to expiry case,
Figure 5.23, the fit is S-shaped, although the data lie almost
on a straight line. The SFM is not entirely free of a directional bias either, but
clearly the fit is superior.
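
The trade-off between holes and bias in the Nadaraya-Watson fit can be illustrated with a small sketch. It assumes a product quartic kernel, which has compact support; the kernel choice, the function names and the toy data are assumptions for illustration, not the settings behind the figures.

```python
import math


def quartic_kernel(z):
    """Quartic (biweight) kernel with support [-1, 1]."""
    return 0.9375 * (1.0 - z * z) ** 2 if abs(z) <= 1.0 else 0.0


def nadaraya_watson(u, xs, ys, h):
    """Local-constant estimate at u = (moneyness, maturity).

    xs: list of design points (kappa, tau); ys: observed log-IVs;
    h:  bandwidths (h1, h2) of the product kernel.
    Returns NaN where no observation receives positive weight (a hole).
    """
    num = den = 0.0
    for (kappa, tau), y in zip(xs, ys):
        w = quartic_kernel((u[0] - kappa) / h[0]) * quartic_kernel((u[1] - tau) / h[1])
        num += w * y
        den += w
    return num / den if den > 0.0 else float("nan")


# Hypothetical data: two maturity strings (0.05 and 0.30 years), three strikes each.
xs = [(0.9, 0.05), (1.0, 0.05), (1.1, 0.05), (0.9, 0.30), (1.0, 0.30), (1.1, 0.30)]
ys = [-1.20, -1.30, -1.36, -1.22, -1.28, -1.32]

u0 = (1.0, 0.15)  # a point between the two strings
hole = nadaraya_watson(u0, xs, ys, (0.03, 0.04))    # small bandwidths: no data nearby
smooth = nadaraya_watson(u0, xs, ys, (0.06, 0.25))  # large bandwidths: strings are mixed
is_hole = math.isnan(hole)
```

With the small bandwidths the estimate at `u0` is undefined, whereas the large bandwidths fill the hole only by averaging observations from both strings, which is the source of the directional bias discussed above.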
Figure 5.25 shows the entire time series of $\hat{\beta}_{i,1}$ to $\hat{\beta}_{i,3}$; the summary
statistics are given in Table 5.10 and the contemporaneous correlations in Table 5.11.
The correlograms given in the lower panel of Figure 5.25 display the rich
[Figure: surface plot and contour plot over moneyness and time to maturity.]

Fig. 5.17. Factor $\hat{m}_1$ in the left panel (moneyness lower left axis). Right panel shows
contour plots of this function (moneyness left axis). Lines are thick for positive level
values, thin for negative ones. The gray scale becomes increasingly lighter the higher
the level in absolute value. Stepwidth between contour lines is 0.028, estimated from
ODAX data 19980101-20010531.

[Figure: surface plot and contour plot over moneyness and time to maturity.]

Fig. 5.18. Factor $\hat{m}_2$ in the left panel (moneyness lower left axis). Right panel shows
contour plots of this function (moneyness left axis). Lines are thick for positive level
values, thin for negative ones. The gray scale becomes increasingly lighter the higher
the level in absolute value. Stepwidth between contour lines is 0.225, estimated from
ODAX data 19980101-20010531.
[Figure: surface plot and contour plot over moneyness and time to maturity.]

Fig. 5.19. Factor $\hat{m}_3$ in the left panel (moneyness lower left axis). Right panel
shows contour plots of this function (moneyness left axis). Lines are thick for
positive level values, thin for negative ones. The gray scale becomes increasingly
lighter the higher the level in absolute value. Stepwidth between contour lines is
0.240, estimated from ODAX data 19980101-20010531.

autoregressive dynamics of the factor loadings. The ADF tests, Table 5.12,
do not reject a unit root for $\hat{\beta}_{i,1}$ at the 5% level and for $\hat{\beta}_{i,2}$ at the 1% level.
Following the pathway taken in Section 5.2.5 for the CPC models, one may
model the first differences of the first two loading series together with the
levels of $\hat{\beta}_{i,3}$ in a parsimonious VAR framework. Alternatively, since the
results are only marginally significant, one may estimate the levels of the
loading series in a rich VAR model.
Although our results from Section 5.2.5 also suggest a GARCH specification,
we opt for the VAR model in levels. The main reason is that the loading series
of the SFM, unlike those obtained from the CPC models, are not uncorrelated.
Accordingly, one would need to specify a multivariate GARCH model.
However, even for moderate dimensions the likelihood function of the
multivariate GARCH model quickly becomes intractable or can deliver unstable results,
Fengler and Herwartz (2002). As an alternative, one may consider dynamic
correlation models. Introduced by Engle (2002) and Tse and Tsui (2002), they
enjoy increasing popularity due to their tractability and the richness of the
volatility and correlation patterns they allow for. We shall not pursue this model class
at this point, but it may be profitable to do so in the future.
Given the preceding considerations we model the levels of the factor loadings
in a VAR(2) model. The results are presented in Table 5.13. The estimation
also includes a constant and two dummy variables, assuming the value
one right at those days and one day after, when the corresponding IV
observations of the minimum time to maturity string (10 days to expiry) were to be
[Figure: two surface plots over moneyness and time to maturity, titled 'Model fit 20000502' and 'Semiparametric factor model fit 20000502'.]

Fig. 5.20. Nadaraya-Watson estimator with $h = (0.06, 0.25)^\top$ and SFM with
$h = (0.03, 0.04)^\top$ for 20000502.
[Figure: panels 'Traditional string fit 20000502, 17 days to exp.' and 'Individual string fit 20000502, 17 days to exp.'; horizontal axis: Moneyness.]

Fig. 5.21. Bias comparison of the Nadaraya-Watson estimator with
$h = (0.06, 0.25)^\top$ (left panel) and the SFM with $h = (0.03, 0.04)^\top$ (right
panel) for the 17 days to expiry data (black dots) on 20000502.

[Figure: panels 'Traditional string fit 20000502, 45 days to exp.' and 'Individual string fit 20000502, 45 days to exp.'; horizontal axis: Moneyness.]

Fig. 5.22. Bias comparison of the Nadaraya-Watson estimator with
$h = (0.06, 0.25)^\top$ (left panel) and the SFM with $h = (0.03, 0.04)^\top$ (right
panel) for the 45 days to expiry data (black dots) on 20000502.

excluded from the estimation of the SFM, as is described in Appendix A.
This is to capture possible seasonality effects introduced by the data filter.
In the equations of $\hat{\beta}_{i,1}$ and $\hat{\beta}_{i,2}$ the constants and dummies are weakly
significant. For the sake of clarity, estimation results on the constant and the
dummy variables are not shown. As can be seen, all factor loadings follow AR(2)
processes. There are also a number of remarkable cross dynamics: first order
lags in the level dynamics,
[Figure: panels 'Traditional string fit 20000502, 80 days to exp.' and 'Individual string fit 20000502, 80 days to exp.'; horizontal axis: Moneyness.]

Fig. 5.23. Bias comparison of the Nadaraya-Watson estimator with
$h = (0.06, 0.25)^\top$ (left panel) and the SFM with $h = (0.03, 0.04)^\top$ (right panel) for the 80
days to expiry data (black dots) on 20000502.

[Figure: panels 'Traditional string fit 20000502, 136 days to exp.' and 'Individual string fit 20000502, 136 days to exp.'; horizontal axis: Moneyness.]

Fig. 5.24. Bias comparison of the Nadaraya-Watson estimator with
$h = (0.06, 0.25)^\top$ (left panel) and the SFM with $h = (0.03, 0.04)^\top$ (right panel) for
the 136 days to expiry data (black dots) on 20000502.

$\hat{\beta}_{i,1}$, have a positive impact on the term structure, $\hat{\beta}_{i,3}$. Second order lags in
the term structure dynamics themselves influence positively the moneyness
slope effect, $\hat{\beta}_{i,2}$, and negatively the shift variable $\hat{\beta}_{i,1}$: thus shocks in the
term structure may decrease the level of the smile and aggravate the skew.
Similar interpretations can be derived from other significant coefficients in
Table 5.13.
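
For readers who wish to reproduce this type of estimation, the following sketch fits a VAR(2) with a constant by equation-wise ordinary least squares. It is an illustration under stated assumptions only: the data are simulated from hypothetical coefficient matrices `A1`, `A2` and intercept `c`, the dummy variables of the actual specification are omitted, and the helper names `solve` and `fit_var2` are not taken from any library.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    a: square matrix (list of rows); b: matrix of right-hand sides.
    """
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [[aug[r][n + c] / aug[r][r] for c in range(len(b[0]))] for r in range(n)]


def fit_var2(series):
    """OLS fit of y_i = c + A1 y_{i-1} + A2 y_{i-2} + e_i.

    series: list of observation vectors (lists). Returns a matrix whose rows
    are the regressors [const, lag-1 block, lag-2 block] and whose columns
    are the equations, i.e. the layout of Table 5.13 plus the constant.
    """
    X = [[1.0] + series[i - 1] + series[i - 2] for i in range(2, len(series))]
    Y = [series[i] for i in range(2, len(series))]
    p, d = len(X[0]), len(Y[0])
    xtx = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    xty = [[sum(x[a] * y[c] for x, y in zip(X, Y)) for c in range(d)] for a in range(p)]
    return solve(xtx, xty)


# Simulate a stationary two-dimensional VAR(2) with hypothetical coefficients.
random.seed(0)
A1 = [[0.5, 0.1], [0.0, 0.4]]
A2 = [[0.2, 0.0], [0.1, 0.1]]
c = [0.1, -0.2]
ys = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(5000):
    prev1, prev2 = ys[-1], ys[-2]
    ys.append([
        c[e]
        + sum(A1[e][j] * prev1[j] + A2[e][j] * prev2[j] for j in range(2))
        + random.gauss(0.0, 1.0)
        for e in range(2)
    ])

coef = fit_var2(ys)  # coef[1][0] estimates A1[0][0], coef[3][0] estimates A2[0][0]
```

Solving the normal equations directly is adequate for a handful of regressors; for larger systems a numerically more stable least squares routine would be preferable.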
[Figure: upper panels 'basis coeff. 1', 'basis coeff. 2', 'basis coeff. 3' plotted against Time (1998 to 2001); lower panels 'basis coeff. 1: ACF', 'basis coeff. 2: ACF', 'basis coeff. 3: ACF' plotted against lag (0 to 30).]

Fig. 5.25. Upper panel: time series of weights $\hat{\beta}$. Lower panel: autocorrelation
functions.

Coefficient             Test Statistic   # of lags
$\hat{\beta}_{i,1}$     -2.68            3
$\hat{\beta}_{i,2}$     -3.20            1
$\hat{\beta}_{i,3}$     -6.11            2

Table 5.12. ADF tests on $\hat{\beta}_{i,1}$ to $\hat{\beta}_{i,3}$ for the full IVS model, intercept included in
each case. Third column gives the number of lags included in the ADF regression.
For the choice of lag length, we started with four lags, and subsequently deleted lag
terms, until the last lag term became significant at least at a 5% level. MacKinnon
critical values for rejecting the hypothesis of a unit root are -2.87 at 5% significance
level, and -3.44 at 1% significance level.
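
The decision rule behind Table 5.12 can be written out explicitly. The sketch below only compares the reported test statistics with the reported critical values; the dictionary keys and the function name are illustrative assumptions.

```python
# Test statistics and MacKinnon critical values as reported in Table 5.12.
adf_stats = {"beta_1": -2.68, "beta_2": -3.20, "beta_3": -6.11}
critical_values = {"5%": -2.87, "1%": -3.44}


def rejects_unit_root(stat, crit):
    """The unit root hypothesis is rejected if the statistic lies below the critical value."""
    return stat < crit


decisions = {
    name: {level: rejects_unit_root(stat, crit) for level, crit in critical_values.items()}
    for name, stat in adf_stats.items()
}
```

Each entry of `decisions` records, per loading series and significance level, whether the hypothesis of a unit root is rejected.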

In earlier specifications of the model, we also included contemporaneous
and lagged DAX returns in the regression equation. However, the competitor
in our horse-race in the following section is a simple one-step predictor without
any exogenous information. Therefore, for the sake of fairness, we choose a
simple VAR framework without exogenous variables.
                          Equation (dependent variable)
                          $\hat{\beta}_{i,1}$   $\hat{\beta}_{i,2}$   $\hat{\beta}_{i,3}$
$\hat{\beta}_{i-1,1}$      0.978      -0.009       0.047
                          [24.40]    [-1.21]     [ 3.70]
$\hat{\beta}_{i-2,1}$      0.004       0.012      -0.047
                          [ 0.08]    [ 1.63]     [-3.68]
$\hat{\beta}_{i-1,2}$      0.182       0.861       0.134
                          [ 0.92]    [23.88]     [ 2.13]
$\hat{\beta}_{i-2,2}$     -0.129       0.109      -0.126
                          [-0.65]    [ 3.03]     [-2.01]
$\hat{\beta}_{i-1,3}$      0.115      -0.019       0.614
                          [ 0.97]    [-0.89]     [16.16]
$\hat{\beta}_{i-2,3}$     -0.231       0.030       0.248
                          [-1.96]    [ 1.40]     [ 6.60]
$\bar{R}^2$                0.957       0.948       0.705
F-statistic             2405.273    1945.451     258.165

Table 5.13. Estimation results of a VAR(2) of the factor loadings $\hat{\beta}_i$. t-statistics
given in brackets, $\bar{R}^2$ denotes the adjusted coefficient of determination. The
estimation includes an intercept and two dummy variables (both not shown), which assume
the value one right at those days and one day after, when the corresponding IV
observations of the minimum time to maturity string (10 days to expiry) were to be
excluded from the estimation of the SFM.

5.4.5 Assessing prediction performance

We now study the prediction performance of our model compared with a
benchmark model. Model comparisons that have been conducted, for instance
by Bakshi et al. (1997), Dumas et al. (1998), Bates (2000), and Jackwerth
and Rubinstein (2001), often show that so called ‘naïve trader models’ perform
best or only a little worse than more sophisticated models. These models,
used by professionals, simply assert that today’s IV is tomorrow’s IV. There
are two versions: the sticky strike assumption presumes that IV is constant at
fixed strikes. The sticky delta or sticky moneyness version asserts the same
for IVs observed at a fixed moneyness or option delta, Derman (1999). We use
the sticky moneyness model as our benchmark. There are two reasons for this
choice: first, from a methodological point of view, as has been shown by
Balland (2002) and Daglish et al. (2003), the sticky strike rule as an assumption
on the stochastic process governing IVs is not consistent with the existence
of a smile. The sticky moneyness rule, however, can be. Second, since we
estimate our model in terms of moneyness, the sticky moneyness rule is most
natural.
The methodology in comparing the prediction performance is as follows: as
presented earlier, the resulting time series of latent factors $\hat{\beta}_{i,l}$ is replaced by
a time series model with fitted values $\tilde{\beta}_{i,l}(\hat{\theta})$ based on $\hat{\beta}_{i',l}$ with $i' \le i - 1$,
$1 \le l \le L$, where $\hat{\theta}$ is the vector of estimated coefficients given in Table 5.13. As
before, we employ an AIC based on the fitted values as an asymptotically
unbiased estimate of the mean square prediction error.
For the model comparison, we use the criterion $\Xi_{AIC_1}$ with $w(u) \overset{\text{def}}{=} 1$.
Additionally we penalize the dimension of the fitted time series model $\tilde{\beta}(\theta)$:

$$\tilde{\Xi}_{AIC} \overset{\text{def}}{=} N^{-1} \sum_{i}^{I} \sum_{j}^{J_i} \Bigl\{ y_{i,j} - \sum_{l=0}^{L} \tilde{\beta}_{i,l}(\hat{\theta})\, \hat{m}_l(x_{i,j}) \Bigr\}^2 \times \exp\Bigl\{ 2 \frac{L}{N} K_h(0)\, \mu_\lambda + \frac{2 \dim(\theta)}{N} \Bigr\} . \qquad (5.88)$$

In our case dim(θ) = 27, since each of the three equations has six VAR coefficients
plus the constant and two dummy variables.
Criterion (5.88) is compared with the squared one-day prediction error of
the sticky moneyness (StM) model:

$$\Xi_{StM} \overset{\text{def}}{=} N^{-1} \sum_{i}^{I} \sum_{j}^{J_i} (y_{i,j} - y_{i-1,j'})^2 . \qquad (5.89)$$

In practice, since one hardly ever observes $y_{i,j}$ at the same moneyness as on
day i − 1, $y_{i-1,j'}$ is obtained via a localized interpolation of the previous day’s
smile. Time to maturity effects are neglected, and observations whose previous
values are lost due to expiry are deleted from the sample.
Running the model comparison shows:

$\Xi_{StM} = 0.00476$ ,
$\tilde{\Xi}_{AIC} = 0.00439$ .

Thus, the model comparison reveals that the SFM is approximately 8%
better than the naïve trader model in terms of these criteria. This is a
substantial improvement given the high variance in IV and financial data in
general. An alternative approach would investigate the hedging performance of
our model compared with other models, e.g. following Engle and Rosenberg
(2000). This is left for further research.
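
The benchmark criterion (5.89) can be sketched for a single maturity string as follows. The text only states that a localized interpolation of the previous day's smile is used; the piecewise-linear interpolation below, the function names and the toy smiles are assumptions made for illustration. Observations outside the previous day's moneyness range are dropped, in the spirit of deleting observations without a usable previous value.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation on sorted xs; None outside the observed range."""
    if x < xs[0] or x > xs[-1]:
        return None
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[0]  # only reached for a one-point smile with x == xs[0]


def sticky_moneyness_error(prev_smile, today_smile):
    """Mean squared one-day prediction error of the sticky moneyness rule.

    Both smiles are lists of (moneyness, log_iv) pairs sorted by moneyness.
    """
    prev_k = [k for k, _ in prev_smile]
    prev_y = [y for _, y in prev_smile]
    errors = []
    for k, y in today_smile:
        y_prev = interpolate(k, prev_k, prev_y)
        if y_prev is not None:
            errors.append((y - y_prev) ** 2)
    return sum(errors) / len(errors) if errors else float("nan")


def relative_improvement(xi_benchmark, xi_model):
    """Relative reduction of the criterion achieved by the model."""
    return (xi_benchmark - xi_model) / xi_benchmark


# Hypothetical smiles of one maturity string on two consecutive days.
prev_smile = [(0.90, -1.20), (1.00, -1.30), (1.10, -1.36)]
today_smile = [(0.95, -1.24), (1.05, -1.35), (1.15, -1.40)]
xi_toy = sticky_moneyness_error(prev_smile, today_smile)

# Relative reduction implied by the two criterion values reported in the text.
gain = relative_improvement(0.00476, 0.00439)
```

Summing the squared errors over all days and strings and dividing by the number of retained observations gives the full-sample criterion.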
5.5 Summary

This chapter is divided into two main parts. In the first part, we presented
CPC models as a natural means of modeling the IVS. The CPC approach
comprises an entire hierarchy of models. This allows for a detailed analysis of
the ‘degree of commonness’ within different maturity groups of the IVS. We
derived tests to assess the stability of the factor loadings across different samples
and found that only the first component may be considered sufficiently
stable. The other components fluctuate from sample year to sample year.
Finally, we modelled the resulting time series by means of ARCH and GARCH
processes.
In the second part, we digressed to FPCA for IVS modeling. Then, we
presented a semiparametric factor model as a new modeling approach to the
IVS. The key advantage is that it takes care of the discrete string structure of
IV data. The technique can be seen as a combination of FPCA and backfitting
in additive models. Unlike other studies, this ansatz is tailored to the
degenerate design of IV data by fitting basis functions in the local neighborhood
of the design points only. This can reduce bias effects in the estimation
of the IVS. Due to its flexible semiparametric structure, the SFM may also
be advantageous compared to the CPC approach given the structural shifts
in the underlying data. After estimating the factor functions, we fitted vector
autoregressive processes of order two to the factor series. The presentation of
the SFM concluded with a horse race between the SFM and the ‘naïve trader
model’. We found the SFM to be approximately 8% superior to the simpler
model.
Our analysis has shown that CPC and SFM models are powerful dimension
reduction techniques in the context of IVS modeling. Typically, the IVS allows
for a decomposition into three factors that drive the surface. These factors
can be interpreted as a shift factor, which accounts for around 80% of the
variation, a slope and a twist or term structure factor. This result can have
numerous applications: an obvious one is risk management, for instance in
scenario analysis and stress tests of portfolios. In order to make the SFM
more tractable, it may be useful to replace the nonparametric functions by
suitable parametric approximations. Then, Monte Carlo simulations of the
models along the lines of Jamshidian and Zhu (1997) are straightforward.
6 Conclusion and outlook

The implied volatility (IV) smile and implied volatility surface (IVS) are em-
pirical phenomena that have spurred research since the discovery of the Black-
Scholes (BS) formula in the nineteen-seventies. Two main strands of literature
have dominated the research agenda since then. The first tries to exploit IV
as a predictor for asset price fluctuations. The second seeks to provide alter-
native option pricing models that explain the existence of the volatility smile.
Recently, a third line of research has emerged: shaped by the establishment
of organized futures markets that allow trading of standardized derivatives at
low costs with high liquidity, this new research aims at exploiting the infor-
mation content of option prices or the IVS for the pricing of more complicated
derivatives or positions. This approach has been termed smile consistent mod-
eling.
The IVS is an input factor in almost any smile consistent model, either
directly or in some intermediate step such as the reconstruction of the lo-
cal volatility surface: it may come along as a simple estimate of the current
surface or as a fully specified dynamic model describing the propagation of
the IVS through time. Its accuracy and precision are the decisive competitive
advantages for any smile consistent pricing model. This is particularly obvi-
ous for the complex derivatives and structured products that emerged on the
markets: several underlying assets of all different kinds such as stocks, bonds
and commodity linked products are comprised into a single structured deriva-
tive with complicated path-dependent payoffs, Overhaus (2002) and Quessette
(2002). These products are likely to exhibit high sensitivity to volatility and
are very susceptible to any misspecification of the volatility process.
Besides giving an introduction to the financial theory of smile consistent
approaches, the aim of this book is to take a specific semiparametric perspective
towards two main aspects of model building of the IVS: smoothing and
dimension-reduced modeling. We believe that such an approach is well placed
given the challenges we face in this context: the unknown, complicated functional
form of the IVS and its intricate discrete design. Non- and semiparametric
techniques do not require any a priori knowledge of the functional form which
is fitted to the data. Rather, it is the IV observations that ‘decide’. Since
from theory only loose restrictions on the IVS can be derived, for instance in
terms of wide no-arbitrage bounds on the slopes, this approach appears to be
particularly attractive.
Smile consistent models are a fruitful field of research, and we can draw
on a wide spectrum of different approaches and specifications today. However,
the current literature lacks empirical assessments and especially investigations
of their hedging performance. These studies should include exotic options and
be performed in comparison with competing model classes, such as stochastic
volatility and jump-diffusion models. This will also shed new light on the delta
debate. Stochastic variants of local volatility models may serve as an elegant
way to circumvent the delta problem, Derman and Kani (1998), Alexander
and Nogueira (2004), but it remains to be shown how they can be employed
effectively for the pricing of exotic derivatives.
A topic for further research is the stability of the dimension reduction.
Instead of estimating on predefined intervals, an alternative is to embed it
into a framework of adaptive window choice as developed by Spokoiny (1998).
Within this setting, one would aim at identifying time-homogeneous intervals
on which the dimension reduction is performed. Examples of this approach in
(realized) volatility modeling are Härdle et al. (2003), Mercurio (2004), and
Mercurio and Spokoiny (2004).
Apart from modeling the IVS, common principal component (CPC) models
are a natural choice whenever the data fall into a number of groups. This
is frequently encountered in economics and finance: for instance, the same
variables may be measured in different countries and markets. Thus, CPC
models have found application in the analysis of the term structure of interest
rates across different countries, Alexander and Lvov (2003) and Pérignon and
Villa (2002, 2004). Other possible applications are obvious. Similar reflections
apply to the semiparametric factor model (SFM). Its main properties, namely
estimation in the local neighborhood of the design points and suitable dimension
reduction, make it an ideal candidate for functional modeling. Potential fields
of application are the term structure of interest rates, or swap and forward
rates.
We believe that semiparametric modeling in finance is an inspiring field of
research, and – in recalling the words of Corrozet (1543) – it appears to be
particularly fruitful in a financial world that is ‘un monde instable porté sur
la mer tant esmeue et rogue’.
A Description and preparation of the IV data

A.1 Preliminaries

The data set employed for this research contains tick statistics on the DAX
futures contract and DAX index options and is provided by the EUREX
(Frankfurt am Main) for the period from 19950101 to 20010531. Both futures
contract data and option data are contract based data, i.e. each single contract
is registered together with its price, contract size, and time of settlement to
a hundredth of a second. Interest rate data in daily frequency, i.e. one, three, six
and twelve months FIBOR rates for the years 1995–1999 and EURIBOR rates
for the period 2000–2001, are obtained from Thomson Financial Datastream.
Interest rate data are linearly interpolated to approximate the riskless interest
rate for the option’s time to maturity. In order to avoid a German tax bias,
option raw data has undergone a preparation scheme which is due to Hafner
and Wallmeier (2001) and described in the following. The entire data set
is stored in the financial database MD*base, maintained at the Center for
Applied Statistics and Economics (CASE) at the Humboldt-Universität zu
Berlin.
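
The linear interpolation of the interest rate to an option's time to maturity may be sketched as follows. The quotes in the example are hypothetical, and holding the rate constant outside the quoted maturity range is an assumption, since the text does not specify the treatment of maturities shorter than one month or longer than twelve months.

```python
def interpolate_rate(tau, maturities, rates):
    """Linear interpolation of the rate in time to maturity (in years).

    maturities must be sorted in ascending order; outside the quoted range
    the nearest quote is used (an assumption, see the lead-in).
    """
    if tau <= maturities[0]:
        return rates[0]
    if tau >= maturities[-1]:
        return rates[-1]
    for t0, t1, r0, r1 in zip(maturities, maturities[1:], rates, rates[1:]):
        if t0 <= tau <= t1:
            return r0 + (r1 - r0) * (tau - t0) / (t1 - t0)


# Hypothetical one, three, six and twelve months quotes.
maturities = [1 / 12, 3 / 12, 6 / 12, 1.0]
rates = [0.030, 0.032, 0.035, 0.040]
r_option = interpolate_rate(0.125, maturities, rates)  # option with 1.5 months to expiry
```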
It is important to remark that a number of fundamental amendments in
income taxation were introduced in Germany in 2000 (Steuersenkungsgesetz,
BGBl. Teil I, Nr. 46 dating from 20001026). After a transition period starting
in 2001, the changes came fully into effect beginning from 2002. The former
legislation granted a tax voucher to domestic shareholders in compensation
for the corporate tax paid by the company (Anrechnungsverfahren). However,
this did not apply to foreign investors. Since 2002, the taxes paid on corpo-
rate income can no longer be deducted by domestic shareholders. Instead,
50% of the distributed dividends are taxed at the personal income tax rate
(Halbeinkünfteverfahren), while the other 50% of the capital income are not liable
to any further taxation. Therefore, the correction may no longer be mandatory
for the DAX index option data beginning from 2002. Regrettably, we are not
aware of any study investigating this issue. For details on German taxation
law, we refer for instance to Tipke et al. (2002) or Rose (2004).

A.2 Data correction scheme

In a first step of the correction scheme, the DAX index values are recovered.
To this end, we match each option price observation $H_t$ with the futures price $F_t$
of the nearest available futures contract, which was traded within a one-minute
interval around the observed option. The futures price observation was taken
from the most heavily traded futures contract on the particular day, which is
the three months contract. The no-arbitrage price of the underlying index in
a frictionless market without dividends is given by

$$S_t = e^{-r_{T_F,t}(T_F - t)} F_t , \qquad (A.1)$$

where $S_t$ and $F_t$ denote the index and the futures price respectively, $T_F$ the
maturity date of the futures contract, and $r_{T,t}$ the interest rate with maturity
T − t.
The DAX index is a capital weighted performance index, Deutsche Börse
(2002), i.e. dividends less corporate tax are reinvested into the index.
Therefore, at first glance, dividend payments should have no or very little
impact on the index options. However, when only the interest rate discounted
futures price is used to recover IVs by inverting the BS formula, IVs of calls
and puts can differ significantly. This discrepancy is especially large during
spring, when most of the 30 companies listed in the DAX distribute dividends.
The point is best visible in Figure A.1 from 20000404: IVs of calls (crosses) and
puts (circles) diverge, thus violating the put-call parity (2.26) and general
market efficiency considerations.
Hafner and Wallmeier (2001) argue that the marginal investor’s individual
tax scheme is different from the one actually assumed to compute the DAX
index. As has been explained in Section A.1, this can be the case between for-
eign and domestic shareholders, or between domestic shareholders of different
individual taxation. Consequently, the net dividend for this investor can be
higher or lower than the one used for the index computation. The discrep-
ancy, which the authors call difference dividend, has the same impact as a
dividend payment for an unprotected option, i.e. it drives a wedge into the
option prices and hence into IVs. Denote by $\Delta D_{t,T}$ the time T value of this
difference dividend incurred between t and T. Consider the dividend adjusted
futures price, which is approximated here by the forward price:

$$F_t = e^{r_F (T_F - t)} S_t - \Delta D_{t,T_F} , \qquad (A.2)$$

and the dividend adjusted put-call parity:


[Figure: 'Implied Volatility Surface Ticks'.]

Fig. A.1. IVS ticks on 20000404, derived from futures prices that are interest rate
discounted only. Put IVs are circles, call IVs crosses.

$$C_t - P_t = S_t - \Delta D_{t,T_H} e^{-r_H (T_H - t)} - e^{-r_H (T_H - t)} K , \qquad (A.3)$$

with $T_H$ denoting the maturity date of the call $C_t$ and the put $P_t$. Inserting
equation (A.2) into (A.3) yields

$$C_t - P_t = F_t e^{-r_F (T_F - t)} + \Delta D_{t,T_H,T_F} - e^{-r_H (T_H - t)} K , \qquad (A.4)$$

where $\Delta D_{t,T_H,T_F} \overset{\text{def}}{=} \Delta D_{t,T_F} e^{-r_F (T_F - t)} - \Delta D_{t,T_H} e^{-r_H (T_H - t)}$ is the desired
difference dividend.
The ‘adjusted’ index level

$$\tilde{S}_t = F_t e^{-r_F (T_F - t)} + \Delta D_{t,T_H,T_F} \qquad (A.5)$$

is that index level, which ties put and call IVs exactly to the same levels when
used in the inversion of the BS formula.
For an estimate $\Delta\hat{D}_{t,T_H,T_F}$, pairs of puts and calls with the same strike and
the same maturity are identified, provided they were traded within a five-minute
interval. For each pair, $\Delta D_{t,T_H,T_F}$ is derived from Equation (A.4). To
ensure robustness, $\Delta\hat{D}_{t,T_H,T_F}$ is estimated by the median of all $\Delta D_{t,T_H,T_F}$ of
[Figure: 'Implied Volatility Surface Ticks'.]

Fig. A.2. IVS ticks on 20000404, derived from futures prices that are interest rate
discounted and corrected with the implied difference dividend. Put IVs are circles, call
IVs crosses.

the pairs for a given maturity at day t. IVs are recovered by inverting the BS
formula using the corrected index value S̃t = Ft e−rF (TF −t) + ∆D̂t,TH ,TF . Note
that ∆Dt,TH ,TF = 0, when TH = TF . Indeed, when calculated also in this case,
∆D̂t,TH ,TF proved to be very small (compared with the index value), which
supports the validity of this approach. The described procedure is applied on
a daily basis throughout the entire data set from 19950101 to 20010531. All
computations have been made with XploRe, Härdle et al. (2000b).
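The median-based estimate can be sketched in a few lines of Python. This is only an illustration of Equations (A.4) and (A.5), not the XploRe code used for the book: the function names, the argument layout (time to maturity passed as `tau_f` $= T_F - t$ and `tau_h` $= T_H - t$) and the tuple format of the matched pairs are assumptions made for the example.

```python
import math
from statistics import median


def implied_difference_dividend(pairs, futures, r_f, tau_f, r_h, tau_h):
    """Median of the per-pair difference dividends implied by Equation (A.4).

    pairs: iterable of (call_price, put_price, strike) for puts and calls
           already matched on strike, maturity and trade time
           (hypothetical input layout).
    """
    discounted_futures = futures * math.exp(-r_f * tau_f)
    discount_h = math.exp(-r_h * tau_h)
    # (A.4) solved for the difference dividend, pair by pair.
    per_pair = [c - p - discounted_futures + discount_h * k for c, p, k in pairs]
    return median(per_pair)


def adjusted_index(futures, r_f, tau_f, diff_dividend):
    """'Adjusted' index level of Equation (A.5)."""
    return futures * math.exp(-r_f * tau_f) + diff_dividend
```

Using the median rather than the mean means that a single mismatched or misprinted pair does not move the estimate, which is the robustness argument made in the text.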
In Figure A.2, also from 20000404, we present the data after correcting
the discounted futures price with an implied difference dividend $\Delta \hat{D}_t =
(10.3, 5.0, 1.9)^\top$, where the first entry refers to 16 days, the second to 45 days
and the third to 73 days to maturity. IVs of puts and calls converge to
one single string, while the concavity of the put volatility smile is remedied,
too. Note that the overall level of the IV string is not altered through that
procedure.
                      Min.    Max.    Mean    Median  Stdd.   Skewn.  Kurt.

All   T. to mat.      0.028   2.014   0.134   0.084   0.149    3.623  22.574
      Moneyn.         0.325   1.856   0.987   0.994   0.097   -0.303   5.801
      IV              0.041   0.799   0.255   0.246   0.088    1.531   7.531
1995  T. to mat.      0.028   0.769   0.132   0.086   0.121    2.265   8.441
      Moneyn.         0.771   1.207   0.996   0.997   0.040   -0.111   4.530
      IV              0.046   0.622   0.149   0.147   0.021    1.218  12.165
1996  T. to mat.      0.028   2.011   0.152   0.097   0.167    3.915  28.561
      Moneyn.         0.687   1.221   0.987   0.993   0.044   -0.723   5.887
      IV              0.046   0.789   0.134   0.130   0.028    2.893  33.466
1997  T. to mat.      0.028   1.964   0.147   0.086   0.172    3.503  21.267
      Moneyn.         0.446   1.441   0.979   0.988   0.077   -0.546   5.442
      IV              0.043   0.800   0.246   0.233   0.073    1.149   5.027
1998  T. to mat.      0.028   2.014   0.134   0.081   0.148    3.548  22.957
      Moneyn.         0.386   1.856   0.984   0.992   0.108   -0.030   5.344
      IV              0.041   0.799   0.335   0.306   0.114    0.970   3.471
1999  T. to mat.      0.028   1.994   0.126   0.083   0.139    4.331  32.578
      Moneyn.         0.371   1.516   0.979   0.992   0.099   -0.595   5.563
      IV              0.047   0.798   0.273   0.259   0.076    0.942   4.075
2000  T. to mat.      0.028   1.994   0.130   0.083   0.151    3.858  23.393
      Moneyn.         0.325   1.611   0.985   0.992   0.092   -0.337   6.197
      IV              0.041   0.798   0.254   0.242   0.060    1.463   7.313
2001  T. to mat.      0.028   0.978   0.142   0.083   0.159    2.699  10.443
      Moneyn.         0.583   1.811   1.001   1.001   0.085    0.519   6.762
      IV              0.043   0.789   0.230   0.221   0.049    1.558   7.733

Table A.1. Summary statistics on the data base from 19950101 to 20010531, for the
entire sample and on an annual basis. 2001 is from 20010101 to 20010531 only.

The data are transaction based and may contain potential misprints and
outliers. This is seen in Figures A.1 and A.2. To account for this, a mild
filter is applied: observations with IV less than 4% or bigger than 80% are
dropped. Furthermore, we disregard all observations having a maturity $\tau \le 10$
days. Obviously, this filter does not detect outliers within these bounds. At this
point robust statistical methods may be an adequate choice. However, given
the sheer vastness of the data set, we believe this filter to be sufficient.
After this filtering, the total number of observations exceeds 5.7
million contracts. Trading volume increased considerably during the sample
period. From 1998 onwards, there are around 5 200 observations per day.
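The mild filter amounts to two range checks per observation. A minimal Python sketch, assuming each observation is a dictionary with the hypothetical keys `iv` (implied volatility as a decimal) and `days_to_maturity`:

```python
def mild_filter(observations, iv_min=0.04, iv_max=0.80, min_days=10):
    """Drop observations with IV below iv_min or above iv_max,
    and all observations with a maturity of min_days or less.

    The dictionary keys 'iv' and 'days_to_maturity' are assumptions
    made for this illustration.
    """
    return [
        obs
        for obs in observations
        if iv_min <= obs["iv"] <= iv_max and obs["days_to_maturity"] > min_days
    ]
```

As noted in the text, such a filter only removes values outside the bounds; implausible observations inside the bounds pass unchanged.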
Table A.1 gives a short summary of our IVS data. The heaviest trading
occurs in the short-term contracts, as is seen from the difference between
median and mean of the term structure distribution of the observations as
well as from its skewness. Median time to maturity is 30 days (0.083 years).
Across moneyness the distribution is slightly negatively skewed. Mean IV over
the sample period is 27.9%.
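The columns of Table A.1 are standard sample moments. The sketch below shows one way to compute them in Python; it assumes population (divide-by-$n$) moments and non-excess kurtosis, which is a guess about the convention and not taken from the text.

```python
from statistics import mean, median


def summary_statistics(xs):
    """Min, max, mean, median, standard deviation, skewness and kurtosis.

    Assumes moment-based definitions with divisor n and raw (non-excess)
    kurtosis; the convention actually used for Table A.1 is not stated.
    """
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {
        "min": min(xs),
        "max": max(xs),
        "mean": m,
        "median": median(xs),
        "std": m2 ** 0.5,
        "skew": m3 / m2 ** 1.5,
        "kurt": m4 / m2 ** 2,
    }
```

A mean well above the median together with positive skewness, as in the time-to-maturity rows, indicates that most observations sit at the short end.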
B

Some results from stochastic calculus

This chapter contains a number of basic definitions and results from stochas-
tic calculus. They are collected in order to make our treatment more self-
contained. Thus, the selection of the issues is driven by their complementary
function to our work, rather than by their importance in stochastic calculus.
For any deeper treatment or proofs, we refer to standard textbooks such as
Øksendal (1998), Karatzas and Shreve (1991), or Steele (2000).
In this chapter, we consider stochastic processes defined on a complete
probability space $(\Omega, \mathcal{F}, \mathrm{P})$. The probability space is equipped with a filtration,
i.e. a nondecreasing family $(\mathcal{F}_t)_{t \ge 0}$ of sub-$\sigma$-fields $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$, for
$0 \le s < t$. The filtration is assumed to satisfy the 'usual' conditions, namely
that it is right-continuous, and that $\mathcal{F}_0$ contains all null sets. A stochastic
process is a collection of random variables $(X_t)_{t \ge 0}$ on $(\Omega, \mathcal{F})$, which take values
in $\mathbb{R}^d$. The index $t$ is interpreted as 'time'. We say that a stochastic process
$X$ is adapted to $(\mathcal{F}_t)_{t \ge 0}$, if each $X_t$ is $\mathcal{F}_t$-measurable. For a fixed $\omega \in \Omega$,
the mapping $t \mapsto X_t(\omega)$ for $t \ge 0$ is called the sample path of $X$ associated
with $\omega$.

Martingale

Let (Xt )0≤t<∞ be an (Ft )t≥0 -adapted stochastic process on (Ω, F, P) satis-
fying E|Xt | < ∞ for all 0 ≤ t < ∞. The process X is called an (Ft )t≥0 -
martingale, if for every 0 ≤ s < t < ∞, we have
E(Xt |Fs ) = Xs . (B.1)

The quadratic variation and covariation process

Let $(X_t)_{0 \le t \le T}$, for $T < \infty$, be an $(\mathcal{F}_t)_{t \ge 0}$-adapted stochastic process on
$(\Omega, \mathcal{F}, \mathrm{P})$. Further, let $D_n$ be the dyadic decomposition of order $n$ on the
interval $[0, T]$, i.e.

$$D_n = \{ i 2^{-n} \mid i = 0, 1, 2, 3, \ldots \} \cap [0, T] . \tag{B.2}$$

The quadratic variation process of $X$ is defined by (provided it exists):

$$\langle X \rangle_t \stackrel{\mathrm{def}}{=} \lim_{n \uparrow \infty} \sum_{0 < t_i \le t} (X_{t_i} - X_{t_{i-1}})^2 , \quad \text{for } 0 \le t \le T , \tag{B.3}$$

where the sum runs over $t_i \in D_n$ and the limit is understood in probability.


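Let $(Y_t)_{0 \le t \le T}$ be a second stochastic process on $(\Omega, \mathcal{F}, \mathrm{P})$. The covariation
process of $X$ and $Y$ is defined by (if it exists):

$$\langle X, Y \rangle_t \stackrel{\mathrm{def}}{=} \lim_{n \uparrow \infty} \sum_{0 < t_i \le t} (X_{t_i} - X_{t_{i-1}})(Y_{t_i} - Y_{t_{i-1}}) , \quad \text{for } 0 \le t \le T , \tag{B.4}$$

where the limit is understood in probability.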

Brownian motion

A real-valued stochastic process $(W_t)_{0 \le t \le T}$, $T < \infty$, adapted to $(\mathcal{F}_t)_{0 \le t \le T}$ is called
a standard Brownian motion with respect to $(\mathcal{F}_t)_{0 \le t \le T}$ on the interval $[0, T]$
if it satisfies the following properties:

(i) $W_0 = 0$.
(ii) For any $0 \le s < t \le T$ the increment

$$W_t - W_s \tag{B.5}$$

is independent of $\mathcal{F}_s$ and has the Gaussian distribution $N(0, t - s)$.
(iii) $(W_t)_{0 \le t \le T}$ has continuous sample paths.

For $0 \le s < t \le T$ the covariance is calculated as $\mathrm{Cov}(W_s, W_t) = \mathrm{E}\{(W_t -
W_s + W_s) W_s\} = \mathrm{E}(W_s^2) = s$, so in general

$$\mathrm{Cov}(W_s, W_t) = \min(s, t) \quad \text{for } 0 \le s, t \le T . \tag{B.6}$$

For almost every $\omega \in \Omega$ the Brownian sample path associated with $\omega$ is
nowhere differentiable. However, its quadratic variation process exists and is
P-almost surely:

$$\langle W \rangle_t = t , \quad \text{for } 0 \le t \le T . \tag{B.7}$$
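Property (B.7) is easy to observe numerically. The following Python sketch simulates one path from independent Gaussian increments, as in property (ii), and sums the squared increments over the grid; the function names, the grid size and the random seed are choices made for this illustration only.

```python
import math
import random


def brownian_path(T, n, rng):
    """Simulate W on the grid i*T/n, i = 0, ..., n, from independent
    N(0, T/n) increments, starting at W_0 = 0."""
    dt = T / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def realized_quadratic_variation(path):
    """Sum of squared increments along the simulated grid."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


rng = random.Random(0)            # fixed seed, illustration only
path = brownian_path(1.0, 2 ** 12, rng)
qv = realized_quadratic_variation(path)   # should be close to T = 1
```

With $n = 2^{12}$ steps the standard deviation of the realized sum around $T$ is roughly $\sqrt{2/n} \approx 0.02$, so values close to one are to be expected.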

Itô formula

Suppose that the real-valued process $X$ has the (stochastic)
integral representation

$$X_t = x_0 + \int_0^t a_s \, ds + \int_0^t b_s \, dW_s \tag{B.8}$$

on $0 \le t \le T$, where $(a_t)_{0 \le t \le T}$ and $(b_t)_{0 \le t \le T}$ are real-valued $(\mathcal{F}_t)_{0 \le t \le T}$-adapted
processes satisfying

$$\mathrm{P}\left( \int_0^T |a_s| \, ds < \infty \right) = 1 \quad \text{and} \quad \mathrm{P}\left( \int_0^T b_s^2 \, ds < \infty \right) = 1 .$$

Then $X$ is called an Itô process. Its quadratic variation process exists and
is given by:

$$\langle X \rangle_t = \int_0^t b_s^2 \, ds \tag{B.9}$$

for $0 \le t \le T$.
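Let $f \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$. Then Itô's formula states

$$
\begin{aligned}
f(X_t, t) = f(X_0, 0) &+ \int_0^t \frac{\partial f(X_s, s)}{\partial t} \, ds + \int_0^t \frac{\partial f(X_s, s)}{\partial x} \, dX_s \\
&+ \frac{1}{2} \int_0^t \frac{\partial^2 f(X_s, s)}{\partial x^2} \, d\langle X \rangle_s ,
\end{aligned}
\tag{B.10}
$$

for $0 \le t \le T$.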
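As a short worked illustration of (B.10), assume that $X$ is a geometric Brownian motion, i.e. $a_s = \mu X_s$ and $b_s = \sigma X_s$ in (B.8) with constants $\mu$, $\sigma$ and $x_0 > 0$ (an assumption made only for this example), and take $f(x, t) = \ln x$, which is twice continuously differentiable on the positive half-line where $X$ lives:

```latex
% Ito's formula (B.10) for f(x,t) = ln x and dX_s = mu X_s ds + sigma X_s dW_s.
% Partial derivatives: f_t = 0, f_x = 1/x, f_xx = -1/x^2.
% Quadratic variation (B.9): d<X>_s = sigma^2 X_s^2 ds.
\begin{align*}
\ln X_t
  &= \ln x_0 + \int_0^t \frac{1}{X_s}\, dX_s
     - \frac{1}{2} \int_0^t \frac{1}{X_s^2}\, d\langle X \rangle_s \\
  &= \ln x_0 + \int_0^t \mu \, ds + \int_0^t \sigma \, dW_s
     - \frac{1}{2} \int_0^t \sigma^2 \, ds \\
  &= \ln x_0 + \left( \mu - \tfrac{1}{2} \sigma^2 \right) t + \sigma W_t .
\end{align*}
```

The second-order term is what produces the drift correction $-\tfrac{1}{2}\sigma^2 t$, which would be absent under the ordinary chain rule.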
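For the vector-valued process $X = (X^{(1)}, \ldots, X^{(d)})^\top$ and $f \in C^{2,1}(\mathbb{R}^d \times
\mathbb{R}^+)$, Itô's formula generalizes to

$$
\begin{aligned}
f(X_t, t) = f(X_0, 0) &+ \int_0^t \frac{\partial f(X_s, s)}{\partial t} \, ds + \sum_{i=1}^d \int_0^t \frac{\partial f(X_s, s)}{\partial x_i} \, dX_s^{(i)} \\
&+ \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \int_0^t \frac{\partial^2 f(X_s, s)}{\partial x_i \partial x_j} \, d\langle X^{(i)}, X^{(j)} \rangle_s .
\end{aligned}
\tag{B.11}
$$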

Tanaka-Meyer formula

The Itô formula can be generalized to convex functions $f$, Tanaka (1963),
Meyer (1976), in which case it is known as the Tanaka-Meyer formula, Karatzas
and Shreve (1991, Theorem 3.6.22 and p. 220).
For some $c \in \mathbb{R}$ consider the convex function $f : \mathbb{R} \to \mathbb{R}$, $x \mapsto (x - c)^+$,
which is the relevant special case in this book. The left side derivative of $f$ is
given by

$$D^- f(x) = 1(x > c) , \tag{B.12}$$

where $1(A)$ denotes the indicator function of the set $A$.
Define the second derivative in a distributional sense by

$$\frac{\partial^2 f(x)}{\partial x^2} = \delta_c(x) , \tag{B.13}$$

where $\delta_c$ is the Dirac delta function centered at $c$.
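Let $X$ satisfy representation (B.8). The Tanaka-Meyer formula states:

$$(X_t - c)^+ = (x_0 - c)^+ + \int_0^t 1(X_s > c) \, dX_s + \frac{1}{2} L_t^c , \tag{B.14}$$

for $0 \le t \le T$. Here,

$$L_t^c \stackrel{\mathrm{def}}{=} \lim_{n \uparrow \infty} n \int_0^t 1\left( X_s \in \left[ c, c + \frac{1}{n} \right] \right) d\langle X \rangle_s \tag{B.15}$$

$$\phantom{L_t^c} = \int_0^t \delta_c(X_s) \, b_s^2 \, ds \tag{B.16}$$

is called the local time at level $c$. Intuitively, it measures the 'time spent at
level $c$'.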

Uniqueness and existence of SDE

In the following, we shall denote by $(\mathcal{F}_t)_{0 \le t \le T}$ the P-augmentation of the
filtration

$$\mathcal{F}_t^W = \sigma\left( W_s , \; 0 \le s \le t \right) , \quad 0 \le t \le T , \tag{B.17}$$

generated by $W$. It can be shown that $(\mathcal{F}_t)_{0 \le t \le T}$ is already right-continuous
and thus satisfies the 'usual' conditions.
For $x_0 \in \mathbb{R}$, consider the one-dimensional SDE:

$$dX_t = a(X_t, t) \, dt + b(X_t, t) \, dW_t , \tag{B.18}$$

with initial condition $X_0 = x_0$, and with functions $a, b : \mathbb{R} \times [0, T] \to \mathbb{R}$.


Assume that they satisfy the global Lipschitz condition:

$$|a(x, t) - a(y, t)| + |b(x, t) - b(y, t)| \le K |x - y| , \tag{B.19}$$

and the linear growth condition:

$$|a(x, t)| + |b(x, t)| \le L (1 + |x|) , \tag{B.20}$$

for any $0 \le t \le T$, and $x, y \in \mathbb{R}$, where $K, L$ are positive constants.


Then there exists a strong solution to (B.18), i.e. there exists a continuous
$(\mathcal{F}_t)_{0 \le t \le T}$-adapted process $(X_t)_{0 \le t \le T}$ satisfying (B.18) and the initial
condition $X_0 = x_0$.
Moreover, if $(Y_t)_{0 \le t \le T}$ is another solution to (B.18), then strong uniqueness
holds, i.e.

$$\mathrm{P}(X_t = Y_t \text{ for all } t \in [0, T]) = 1 . \tag{B.21}$$

This is the one-dimensional version of, e.g., Karatzas and Shreve (1991,
Theorem 5.2.9). In the vector-valued case, the absolute value is to be replaced
by a norm, but similar results hold.
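Under conditions (B.19) and (B.20), a path of the strong solution can be approximated by the Euler-Maruyama scheme, which replaces $dt$ and $dW_t$ in (B.18) by finite increments. The Python sketch below is a generic illustration, not a method used in this book; the function name and arguments are hypothetical.

```python
import math
import random


def euler_maruyama(a, b, x0, T, n, rng):
    """Approximate one path of dX_t = a(X_t, t) dt + b(X_t, t) dW_t
    on the grid i*T/n, i = 0, ..., n, with X_0 = x0."""
    dt = T / n
    x, t = x0, 0.0
    path = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        x = x + a(x, t) * dt + b(x, t) * dw
        t += dt
        path.append(x)
    return path


# Example with Lipschitz, linearly growing coefficients:
# mean-reverting drift and constant diffusion (illustrative values).
rng = random.Random(1)
sample_path = euler_maruyama(lambda x, t: -x, lambda x, t: 0.2, 1.0, 1.0, 1000, rng)
```

Setting the diffusion coefficient to zero reduces the scheme to the explicit Euler method for the ordinary differential equation $dX_t = a(X_t, t)\,dt$, which gives a simple sanity check.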

Fokker-Planck equation

Let $(X_t)_{0 \le t \le T}$, which takes values in $\mathbb{R}$, satisfy the SDE

$$dX_t = a(X_t, t) \, dt + b(X_t, t) \, dW_t , \tag{B.22}$$

with initial condition $X_0 = x_0$. Under the ellipticity condition $b^2 \ge \varepsilon > 0$, $X$
is a Markov process and its transition kernel takes the form

$$\mathrm{P}(X_T \in dy \mid X_t = x) = \phi(y, T \mid X_t, t) \, dy \tag{B.23}$$

for some jointly measurable density function $\phi(y, T \mid X_t, t) \ge 0$. The notation
makes precise that it is a density conditional on $X_t$ and $t$. Then, $\phi(y, T \mid X_t, t)$
can be characterized by the Fokker-Planck or forward Kolmogorov equation

$$0 = \frac{\partial \phi(y, T \mid X_t, t)}{\partial T} + \frac{\partial \left\{ a(y, T) \phi(y, T \mid X_t, t) \right\}}{\partial y} - \frac{1}{2} \frac{\partial^2 \left\{ b^2(y, T) \phi(y, T \mid X_t, t) \right\}}{\partial y^2} \tag{B.24}$$

for fixed $(X_t, t) \in \mathbb{R} \times \mathbb{R}^+$ and with the initial condition

$$\phi(y, t \mid X_t, t) = \delta_y(X_t) . \tag{B.25}$$
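For constant coefficients $a \equiv \mu$ and $b \equiv \sigma$ (an assumption made for this example), the transition density is Gaussian with mean $X_t + \mu (T - t)$ and variance $\sigma^2 (T - t)$, and (B.24) can be checked numerically. The Python sketch below evaluates the left-hand side of (B.24) by central finite differences; the function names and the step size are illustrative choices.

```python
import math


def gaussian_transition_density(y, T, x, t, mu, sigma):
    """Transition density of dX = mu dt + sigma dW (constant coefficients)."""
    var = sigma ** 2 * (T - t)
    mean = x + mu * (T - t)
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fokker_planck_residual(y, T, x, t, mu, sigma, h=1e-3):
    """Right-hand side of (B.24), approximated by central differences.

    Because mu and sigma are constant here, they can be pulled out of
    the derivatives with respect to y.
    """
    def phi(yy, TT):
        return gaussian_transition_density(yy, TT, x, t, mu, sigma)

    d_T = (phi(y, T + h) - phi(y, T - h)) / (2.0 * h)
    d_y = (phi(y + h, T) - phi(y - h, T)) / (2.0 * h)
    d_yy = (phi(y + h, T) - 2.0 * phi(y, T) + phi(y - h, T)) / h ** 2
    return d_T + mu * d_y - 0.5 * sigma ** 2 * d_yy


residual = fokker_planck_residual(y=0.2, T=1.0, x=0.0, t=0.0, mu=0.1, sigma=0.3)
```

The residual is not exactly zero because of the discretization error of the finite differences, but it should be small relative to the size of the individual terms.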



Girsanov’s theorem

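Let $W = (W^{(1)}, \ldots, W^{(d)})^\top$ be a $d$-dimensional standard Brownian motion
defined on $(\Omega, \mathcal{F}, \mathrm{P})$ and $0 \le T < \infty$. Further let $\alpha = (\alpha^{(1)}, \ldots, \alpha^{(d)})$ be an $\mathbb{R}^d$-valued
$(\mathcal{F}_t)_{0 \le t \le T}$-adapted process which satisfies $\mathrm{P}\{ \int_0^T (\alpha_s^{(i)})^2 \, ds < \infty \} = 1$ for each
$i = 1, \ldots, d$.
Define the process

$$M_t \stackrel{\mathrm{def}}{=} \exp\left( \sum_{i=1}^d \int_0^t \alpha_s^{(i)} \, dW_s^{(i)} - \frac{1}{2} \int_0^t \| \alpha_s \|^2 \, ds \right) , \tag{B.26}$$

where $\| \cdot \|$ denotes the Euclidean norm. Assume that $\alpha$ satisfies the Novikov
condition:

$$\mathrm{E}\left\{ \exp\left( \frac{1}{2} \int_0^T \| \alpha_s \|^2 \, ds \right) \right\} < \infty . \tag{B.27}$$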
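Then $(M_t)_{0 \le t \le T}$ is a martingale and

$$\mathrm{E} M_t = 1 , \tag{B.28}$$

for each $0 \le t \le T$. Thus, we can define a new probability measure $\widetilde{\mathrm{P}}_T$ on
$(\Omega, \mathcal{F}_T)$ by

$$\widetilde{\mathrm{P}}_T(A) \stackrel{\mathrm{def}}{=} \mathrm{E}\left\{ 1(A) M_T \right\} , \quad A \in \mathcal{F}_T , \tag{B.29}$$

i.e. $\widetilde{\mathrm{P}}_T$ has the Radon-Nikodým derivative:

$$\frac{d\widetilde{\mathrm{P}}_T}{d\mathrm{P}} = M_T . \tag{B.30}$$

We can also define a new process $\widetilde{W} = (\widetilde{W}^{(1)}, \ldots, \widetilde{W}^{(d)})^\top$ by

$$\widetilde{W}_t^{(i)} \stackrel{\mathrm{def}}{=} W_t^{(i)} - \int_0^t \alpha_s^{(i)} \, ds , \tag{B.31}$$

for $i = 1, \ldots, d$ and $0 \le t \le T$.
In this situation Girsanov's theorem asserts that $\widetilde{W}$ is a standard Brownian
motion on the new probability space $(\Omega, \mathcal{F}, \widetilde{\mathrm{P}}_T)$.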
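In the simplest case $d = 1$ with a constant $\alpha$ (assumed here purely for illustration), $M_T = \exp(\alpha W_T - \tfrac{1}{2} \alpha^2 T)$ depends on $W_T$ only, so expectations under $\widetilde{\mathrm{P}}_T$ reduce to one-dimensional Gaussian integrals. The Python sketch below evaluates them with the trapezoidal rule; the function names, the integration range and the grid size are illustrative and sized for $T$ of order one.

```python
import math


def normal_pdf(w, var):
    """Density of N(0, var) at w."""
    return math.exp(-w * w / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def expect_under_p(g, T, lo=-12.0, hi=12.0, n=20000):
    """Trapezoidal approximation of E g(W_T) with W_T ~ N(0, T) under P."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        w = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * g(w) * normal_pdf(w, T)
    return total * h


def density_process_at_T(w, alpha, T):
    """M_T of (B.26) for d = 1 and constant alpha, as a function of W_T = w."""
    return math.exp(alpha * w - 0.5 * alpha ** 2 * T)


alpha, T = 0.7, 1.0
mass = expect_under_p(lambda w: density_process_at_T(w, alpha, T), T)
mean_new = expect_under_p(
    lambda w: density_process_at_T(w, alpha, T) * (w - alpha * T), T)
var_new = expect_under_p(
    lambda w: density_process_at_T(w, alpha, T) * (w - alpha * T) ** 2, T)
```

The three quantities correspond to (B.28) and to the mean and variance of $\widetilde{W}_T = W_T - \alpha T$ under $\widetilde{\mathrm{P}}_T$, which should be one, zero and $T$, respectively.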
C

Proofs of the results on the LSK IV estimator

As mentioned in Section 4.5, proofs of these results in the general class of
kernel M-estimators are due to Gouriéroux et al. (1994), here given as in
Fengler and Wang (2003).

C.1 Proof of consistency

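For notational simplicity, we introduce:

$$Z(x, y) \stackrel{\mathrm{def}}{=} w(x) \, K_{(1)}\left( \frac{\kappa_t - x}{h_{1,n}} \right) K_{(2)}\left( \frac{\tau - y}{h_{2,n}} \right) , \tag{C.1}$$

and

$$\widehat{L}_n(\sigma) \stackrel{\mathrm{def}}{=} \frac{1}{n h_{1,n} h_{2,n}} \sum_{i=1}^n \{ \tilde{c}_{t_i} - c^{BS}(\kappa_{t_i}, \tau_i, r_i, \sigma) \}^2 Z(\kappa_{t_i}, \tau_i) . \tag{C.2}$$

We recall that throughout this chapter $\kappa_t \stackrel{\mathrm{def}}{=} K / S_t$. For the sake of clarity,
we drop in the following the explicit dependence of the option prices and their
derivatives on $r$. Moreover, in this and the following section $\mathrm{E}_t$ is an abbreviation
for the conditional expectation with respect to $\mathcal{F}_t$.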
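As a first step, let us prove

$$\widehat{L}_n(\sigma) \stackrel{p}{\longrightarrow} L(\sigma) \stackrel{\mathrm{def}}{=} \mathrm{E}_t\left[ \{ \tilde{c}_t - c^{BS}(\kappa_t, \tau, \sigma) \}^2 w(\kappa_t) \right] . \tag{C.3}$$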

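It is observed that

$$
\begin{aligned}
\widehat{L}_n(\sigma) &= \frac{1}{n h_{1,n} h_{2,n}} \sum_{i=1}^n \Big[ \{ \tilde{c}_{t_i} - c^{BS}(\kappa_{t_i}, \tau_i, \sigma) \}^2 Z(\kappa_{t_i}, \tau_i) \\
&\qquad\qquad - \mathrm{E}_t\Big( \{ \tilde{c}_{t_i} - c^{BS}(\kappa_{t_i}, \tau_i, \sigma) \}^2 Z(\kappa_{t_i}, \tau_i) \Big) \Big] \\
&\quad + \frac{1}{h_{1,n} h_{2,n}} \mathrm{E}_t\Big[ \{ \tilde{c}_{t_1} - c^{BS}(\kappa_{t_1}, \tau_1, \sigma) \}^2 Z(\kappa_{t_1}, \tau_1) \Big] \\
&\stackrel{\mathrm{def}}{=} \alpha_n + \beta_n .
\end{aligned}
\tag{C.4}
$$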

Standard arguments can be used to prove

$$\mathrm{E}_t \alpha_n^2 = O\left( (n h_{1,n} h_{2,n})^{-1} \right) \tag{C.5}$$

by conditions (A1) and (A2) on page 120.


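By Taylor's expansion, we have

$$
\begin{aligned}
\beta_n &= \frac{1}{h_{1,n} h_{2,n}} \mathrm{E}_t \int \{ \tilde{c}_{t_1} - c^{BS}(x, y, \sigma) \}^2 Z(x, y) \, dx \, dy \\
&= \mathrm{E}_t \int \{ \tilde{c}_t - c^{BS}(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) \}^2 \\
&\qquad \times w(\kappa_t - h_{1,n} u) K_{(1)}(u) K_{(2)}(v) \, du \, dv \stackrel{p}{\longrightarrow} L(\sigma) .
\end{aligned}
\tag{C.6}
$$

Equations (C.5) and (C.6) together prove (C.3).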

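In a second step, we have, recalling the definition of $\sigma(\kappa_t, \tau)$:

$$
\begin{aligned}
\frac{\partial L(\sigma)}{\partial \sigma} \bigg|_{\sigma = \sigma(\kappa_t, \tau)} &= -2 \, \mathrm{E}_t\left[ \tilde{c}_t \, w(\kappa_t) \frac{\partial}{\partial \sigma} c^{BS}(\kappa_t, \tau, \sigma) \bigg|_{\sigma = \sigma(\kappa_t, \tau)} \right] \\
&\quad + 2 \, \mathrm{E}_t\left[ c^{BS}(\kappa_t, \tau, \sigma(\kappa_t, \tau)) \, w(\kappa_t) \frac{\partial}{\partial \sigma} c^{BS}(\kappa_t, \tau, \sigma) \bigg|_{\sigma = \sigma(\kappa_t, \tau)} \right] \\
&= 0 ,
\end{aligned}
\tag{C.7}
$$

and

$$
\begin{aligned}
\frac{\partial^2 L(\sigma)}{\partial \sigma^2} \bigg|_{\sigma = \sigma(\kappa_t, \tau)} &= -2 \, \mathrm{E}_t\left[ \tilde{c}_t \, w(\kappa_t) \frac{\partial^2}{\partial \sigma^2} c^{BS}(\kappa_t, \tau, \sigma) \bigg|_{\sigma = \sigma(\kappa_t, \tau)} \right] \\
&\quad + 2 \, \mathrm{E}_t\left[ w(\kappa_t) \left\{ \frac{\partial}{\partial \sigma} c^{BS}(\kappa_t, \tau, \sigma) \bigg|_{\sigma = \sigma(\kappa_t, \tau)} \right\}^2 \right] \\
&\quad + 2 \, \mathrm{E}_t\left[ w(\kappa_t) \, c^{BS}(\kappa_t, \tau, \sigma(\kappa_t, \tau)) \frac{\partial^2}{\partial \sigma^2} c^{BS}(\kappa_t, \tau, \sigma) \bigg|_{\sigma = \sigma(\kappa_t, \tau)} \right] \\
&= 2 \, \mathrm{E}_t\left[ w(\kappa_t) \left\{ \frac{\partial}{\partial \sigma} c^{BS}(\kappa_t, \tau, \sigma) \bigg|_{\sigma = \sigma(\kappa_t, \tau)} \right\}^2 \right] .
\end{aligned}
\tag{C.8}
$$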

This together with (C.3) proves that $\widehat{L}_n(\sigma)$ converges in probability to a
convex function with a unique minimum at $\sigma = \sigma(\kappa_t, \tau)$. Thus, $\widehat{\sigma}_n(\kappa_t, \tau) \stackrel{p}{\longrightarrow}
\sigma(\kappa_t, \tau)$ is proved.
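To make the object of the proof concrete, the following Python sketch minimizes the criterion (C.2) over $\sigma$ at one point $(\kappa_t, \tau)$. Several ingredients are assumptions made for the illustration and are not taken from this appendix: the Black-Scholes call price is written per unit of the asset price as a function of $\kappa = K/S$, both kernels are quartic, the weight function is constant, and the minimizer is found by a golden-section search over a fixed interval.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_moneyness(kappa, tau, r, sigma):
    """BS call price divided by the asset price, with kappa = K / S
    (assumed parameterization for this sketch)."""
    sq = sigma * math.sqrt(tau)
    d1 = (-math.log(kappa) + (r + 0.5 * sigma ** 2) * tau) / sq
    return norm_cdf(d1) - kappa * math.exp(-r * tau) * norm_cdf(d1 - sq)


def quartic_kernel(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def lsk_objective(sigma, obs, kappa0, tau0, h1, h2, weight=lambda k: 1.0):
    """Criterion (C.2); obs holds tuples (c_tilde, kappa_i, tau_i, r_i)."""
    total = 0.0
    for c_tilde, kappa, tau, r in obs:
        z = (weight(kappa) * quartic_kernel((kappa0 - kappa) / h1)
             * quartic_kernel((tau0 - tau) / h2))
        if z > 0.0:
            total += (c_tilde - bs_call_moneyness(kappa, tau, r, sigma)) ** 2 * z
    return total / (len(obs) * h1 * h2)


def golden_section(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def lsk_iv(obs, kappa0, tau0, h1, h2, lo=0.01, hi=2.0):
    """Least squares kernel IV estimate at (kappa0, tau0)."""
    return golden_section(
        lambda s: lsk_objective(s, obs, kappa0, tau0, h1, h2), lo, hi)
```

When the observed prices are generated from a flat volatility, the criterion is zero at that volatility and the search should recover it; on market data the minimizer is a locally weighted compromise across neighbouring strikes and maturities.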

C.2 Proof of asymptotic normality

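Recalling the definition of $\widehat{\sigma}(\kappa_t, \tau)$, it follows that $\widehat{\sigma}(\kappa_t, \tau)$ is the solution of
the following equation:

$$U_n(\sigma) \stackrel{\mathrm{def}}{=} \frac{1}{n h_{1,n} h_{2,n}} \sum_{i=1}^n A_i(\kappa_{t_i}, \tau_i, \sigma) B_i(\kappa_{t_i}, \tau_i, \sigma) Z(\kappa_{t_i}, \tau_i) = 0 . \tag{C.9}$$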

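By Taylor's expansion, we get

$$0 = U_n(\widehat{\sigma}(\kappa_t, \tau)) = U_n(\sigma(\kappa_t, \tau)) + U_n'(\sigma^*) \left\{ \widehat{\sigma}(\kappa_t, \tau) - \sigma(\kappa_t, \tau) \right\} , \tag{C.10}$$

where $\sigma^*$ lies between $\sigma$ and $\widehat{\sigma}$ and $U_n'(\sigma^*) \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \sigma} U_n(\sigma) \big|_{\sigma = \sigma^*}$.

From (C.10), we have

$$\widehat{\sigma}(\kappa_t, \tau) - \sigma(\kappa_t, \tau) = - \{ U_n'(\sigma^*) \}^{-1} U_n(\sigma) . \tag{C.11}$$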

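By some algebra, we obtain

$$
\begin{aligned}
U_n'(\sigma) &= \frac{1}{n h_{1,n} h_{2,n}} \sum_{i=1}^n \bigg[ \Big\{ \frac{\partial}{\partial \sigma} A_i(\kappa_{t_i}, \tau_i, \sigma) \, B_i(\kappa_{t_i}, \tau_i, \sigma) \\
&\qquad\qquad + A_i(\kappa_{t_i}, \tau_i, \sigma) \frac{\partial}{\partial \sigma} B_i(\kappa_{t_i}, \tau_i, \sigma) \Big\} Z(\kappa_{t_i}, \tau_i) \\
&\qquad - \mathrm{E}_t\bigg( \Big\{ \frac{\partial}{\partial \sigma} A_i(\kappa_{t_i}, \tau_i, \sigma) \, B_i(\kappa_{t_i}, \tau_i, \sigma) \\
&\qquad\qquad + A_i(\kappa_{t_i}, \tau_i, \sigma) \frac{\partial}{\partial \sigma} B_i(\kappa_{t_i}, \tau_i, \sigma) \Big\} Z(\kappa_{t_i}, \tau_i) \bigg) \bigg] \\
&\quad + \frac{1}{n h_{1,n} h_{2,n}} \sum_{i=1}^n \mathrm{E}_t\bigg[ \Big\{ \frac{\partial}{\partial \sigma} A_i(\kappa_{t_i}, \tau_i, \sigma) \, B_i(\kappa_{t_i}, \tau_i, \sigma) \\
&\qquad\qquad + A_i(\kappa_{t_i}, \tau_i, \sigma) \frac{\partial}{\partial \sigma} B_i(\kappa_{t_i}, \tau_i, \sigma) \Big\} Z(\kappa_{t_i}, \tau_i) \bigg] \\
&\stackrel{\mathrm{def}}{=} \Delta_{1,n} + \Delta_{2,n} .
\end{aligned}
\tag{C.12}
$$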

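Inspect first $\Delta_{1,n}$ in Equation (C.12): by some algebra, we get

$$
\begin{aligned}
\mathrm{E}_t \Delta_{1,n}^2 &\le \frac{1}{n^2 h_{1,n}^2 h_{2,n}^2} \sum_{i=1}^n \mathrm{E}_t \bigg\{ \Big[ \Big( \frac{\partial}{\partial \sigma} A_i(\kappa_{t_i}, \tau_i, \sigma) \Big) B_i(\kappa_{t_i}, \tau_i, \sigma) \\
&\qquad\qquad + A_i(\kappa_{t_i}, \tau_i, \sigma) \frac{\partial}{\partial \sigma} B_i(\kappa_{t_i}, \tau_i, \sigma) \Big] Z(\kappa_{t_i}, \tau_i) \bigg\}^2 \\
&= \frac{f_t^2(\kappa_t, \tau) \int K_{(1)}^2(u) \, du \int K_{(2)}^2(v) \, dv}{n h_{1,n} h_{2,n}} \, \mathrm{E}_t \bigg[ \Big\{ \frac{\partial}{\partial \sigma} A_1(\kappa_t, \tau, \sigma) \, B_1(\kappa_t, \tau, \sigma) \\
&\qquad\qquad + A_1(\kappa_t, \tau, \sigma) \frac{\partial}{\partial \sigma} B_1(\kappa_t, \tau, \sigma) \Big\}^2 w(\kappa_t) \bigg] \\
&\quad + O\left( \frac{1}{n h_{1,n} h_{2,n}} \right) \longrightarrow 0 ,
\end{aligned}
\tag{C.13}
$$

as $n h_{1,n} h_{2,n} \to \infty$. The joint (time-$t$ conditional) probability density function
of $\kappa_t$ and $\tau$ is denoted by $f_t(\kappa_t, \tau)$.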
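To consider $\Delta_{2,n}$ in Equation (C.12), denote $D(\kappa_t, \tau, \sigma) \stackrel{\mathrm{def}}{=} \frac{\partial}{\partial \sigma} B(\kappa_t, \tau, \sigma)$,
for simplicity. Note that $\frac{\partial}{\partial \sigma} A(\kappa_t, \tau, \sigma) = -B(\kappa_t, \tau, \sigma)$. Thus, we have:

$$
\begin{aligned}
\Delta_{2,n} &= \frac{1}{h_{1,n} h_{2,n}} \mathrm{E}_t \bigg\{ \int \Big( -B^2(x, y, \sigma) + A(x, y, \sigma) D(x, y, \sigma) \Big) \\
&\qquad\qquad \times Z(x, y) f_t(x, y) \, dx \, dy \bigg\} \\
&= \mathrm{E}_t \int \Big\{ -B^2(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) \\
&\qquad\qquad + A(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) D(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) \Big\} \\
&\qquad \times w(\kappa_t) \, f_t(\kappa_t - h_{1,n} u, \tau - h_{2,n} v) K_{(1)}(u) K_{(2)}(v) \, du \, dv \\
&\longrightarrow \Big[ -\mathrm{E}_t\big\{ B^2(\kappa_t, \tau, \sigma) w(\kappa_t) \big\} \\
&\qquad\qquad + \mathrm{E}_t\big\{ A(\kappa_t, \tau, \sigma) D(\kappa_t, \tau, \sigma) w(\kappa_t) \big\} \Big] f_t(\kappa_t, \tau) .
\end{aligned}
\tag{C.14}
$$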

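Equations (C.12), (C.13), (C.14) and the fact $U_n'(\sigma^*) - U_n'(\sigma) \to 0$ together
prove:

$$
\begin{aligned}
U_n'(\sigma^*) \stackrel{p}{\longrightarrow} \Big[ &\mathrm{E}_t\big\{ -B^2(\kappa_t, \tau, \sigma) w(\kappa_t) \big\} \\
&+ \mathrm{E}_t\big\{ A(\kappa_t, \tau, \sigma) D(\kappa_t, \tau, \sigma) w(\kappa_t) \big\} \Big] f_t(\kappa_t, \tau) .
\end{aligned}
\tag{C.15}
$$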

Now, let

$$u_{ni} \stackrel{\mathrm{def}}{=} \frac{1}{h_{1,n} h_{2,n}} A(\kappa_{t_i}, \tau_i, \sigma) B(\kappa_{t_i}, \tau_i, \sigma) Z(\kappa_{t_i}, \tau_i) . \tag{C.16}$$

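For some $\delta > 0$, we have:

$$
\begin{aligned}
\mathrm{E}_t |u_{ni}|^{2+\delta} &= \frac{1}{h_{1,n}^{2+\delta} h_{2,n}^{2+\delta}} \mathrm{E}_t \Big[ A^{2+\delta}(\kappa_{t_i}, \tau_i, \sigma) B^{2+\delta}(\kappa_{t_i}, \tau_i, \sigma) Z^{2+\delta}(\kappa_{t_i}, \tau_i) \Big] \\
&= \frac{1}{h_{1,n}^{1+\delta} h_{2,n}^{1+\delta}} \int \mathrm{E}_t \Big[ A^{2+\delta}(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) \\
&\qquad \times B^{2+\delta}(\kappa_t - h_{1,n} u, \tau - h_{2,n} v, \sigma) \\
&\qquad \times Z^{2+\delta}(\kappa_t - h_{1,n} u, \tau - h_{2,n} v) \Big] \, du \, dv \\
&= \frac{f_t(\kappa_t, \tau) \int K_{(1)}^{2+\delta}(u) \, du \int K_{(2)}^{2+\delta}(v) \, dv}{h_{1,n}^{1+\delta} h_{2,n}^{1+\delta}} \\
&\qquad \times \mathrm{E}_t \Big[ A^{2+\delta}(\kappa_t, \tau, \sigma) B^{2+\delta}(\kappa_t, \tau, \sigma) w^{2+\delta}(\kappa_t) \Big] \\
&\quad + O\left( \frac{1}{h_{1,n}^{1+\delta} h_{2,n}^{1+\delta}} \right) .
\end{aligned}
\tag{C.17}
$$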

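Similarly, we get:

$$
\begin{aligned}
\mathrm{E}_t u_{ni}^2 &= \frac{f_t(\kappa_t, \tau) \int K_{(1)}^2(u) \, du \int K_{(2)}^2(v) \, dv}{h_{1,n} h_{2,n}} \\
&\qquad \times \mathrm{E}_t \{ A^2(\kappa_t, \tau, \sigma) B^2(\kappa_t, \tau, \sigma) w^2(\kappa_t) \} \\
&\quad + O\left( \frac{1}{h_{1,n} h_{2,n}} \right) .
\end{aligned}
\tag{C.18}
$$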

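Equations (C.17) and (C.18) together prove

$$\frac{\sum_{i=1}^n \mathrm{E}_t |u_{ni}|^{2+\delta}}{\left( \sum_{i=1}^n \mathrm{E}_t |u_{ni}|^2 \right)^{\frac{2+\delta}{2}}} = O\left( (n h_{1,n} h_{2,n})^{-\frac{\delta}{2}} \right) = o(1) \tag{C.19}$$

as $n h_{1,n} h_{2,n} \to \infty$.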
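Applying the Liapounov central limit theorem, we get

$$\sqrt{n h_{1,n} h_{2,n}} \; U_n(\sigma) \stackrel{L}{\longrightarrow} N\left( 0, f_t(\kappa_t, \tau) \, \nu^2 \right) , \tag{C.20}$$

where

$$\nu^2 \stackrel{\mathrm{def}}{=} \mathrm{E}_t \{ A^2(\kappa_t, \tau, \sigma) B^2(\kappa_t, \tau, \sigma) w^2(\kappa_t) \} \int K_{(1)}^2(u) K_{(2)}^2(v) \, du \, dv . \tag{C.21}$$

By (C.15) and (C.20), asymptotic normality is proved.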


References

Airoldi, J.-P. and Flury, B. D. (1988). An application of common principal
component analysis to cranial morphometry of Microtus californicus and M. ochrogaster
(Mammalia, Rodentia), Journal of Zoology, Lond. 216: 21–36.
Aït-Sahalia, Y. (1996). Nonparametric pricing of interest rate derivative securities,
Econometrica 64: 527–560.
Aït-Sahalia, Y. and Duarte, J. (2003). Nonparametric option pricing under shape
restrictions, Journal of Econometrics 116: 9–47.
Aït-Sahalia, Y. and Lo, A. (1998). Nonparametric estimation of state-price densities
implicit in financial asset prices, Journal of Finance 53: 499–548.
Aït-Sahalia, Y., Bickel, P. J. and Stoker, T. M. (2001a). Goodness-of-fit tests for
regression using kernel methods, Journal of Econometrics 105: 363–412.
Aït-Sahalia, Y., Wang, Y. and Yared, F. (2001b). Do options markets correctly price
the probabilities of movement of the underlying asset?, Journal of Econometrics
102: 67–110.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood
principle, 2nd International Symposium on Information Theory, Akademiai Ki-
ado, Budapest.
Alexander, C. (2001a). Market Models, John Wiley & Sons, New York.
Alexander, C. (2001b). Principles of the skew, RISK 14(1): S29–S32.
Alexander, C. and Lvov, D. (2003). Statistical properties of forward rates, Working
paper, ISMA Centre, University of Reading.
Alexander, C. and Nogueira, L. M. (2004). Hedging with stochastic local volatility,
Discussion Papers in Finance 2004-11, ISMA Centre, University of Reading.
Alexander, C., Brintalos, G. and Nogueira, L. (2003). Short and long term smile
effects: The binomial normal mixture diffusion model, Working paper, ISMA
Centre, University of Reading.
Amerio, E., Fusai, G. and Vulcano, A. (2003). Pricing of implied volatility deriva-
tives, FORC Preprint 2003/126, University of Warwick.
Amin, K. I. and Ng, V. K. (1997). Inferring future volatility from the information
in implied volatility in eurodollar options: A new approach, Review of Financial
Studies 10(2): 333–367.
Andersen, L. B. G. and Brotherton-Ratcliffe, R. (1997). The equity option volatility
smile: An implicit finite-difference approach, Journal of Computational Finance
1(2): 5–37.
Andersen, L. B. G., Andreasen, J. and Eliezer, D. (2002). Static replication of barrier
options: Some general results, Journal of Computational Finance 5(4): 1–25.
Andersen, T. G., Bollerslev, T., Diebold, F. X. and Labys, P. (2003). Modelling and
forecasting realized volatility, Econometrica 71: 579–625.
Anderson, T. W. (1963). Asymptotic theory for principal component analysis, An-
nals of Mathematical Statistics 34: 122–148.
Ané, T. and Geman, H. (1999). Stochastic volatility and transaction time: An
activity-based volatility estimator, Journal of Risk 2(1): 57–69.
Avellaneda, M., Boyer-Olson, D., Busca, J. and Friz, P. (2002). Reconstructing
volatility, RISK 15(10): 91–95.
Avellaneda, M., Friedman, C., Holmes, R. and Samperi, D. (1997). Calibrating
volatility surfaces via relative entropy minimization, Applied Mathematical Fi-
nance 4: 37–64.
Ayache, E., Henrotte, P., Nassar, S. and Wang, X. (2004). Can anyone solve the
smile problem?, Wilmott magazine (Jan.): 78–96.
Bajeux, I. and Rochet, J. C. (1992). Dynamic spanning: Are options an appropriate
instrument?, Mathematical Finance 6: 1–16.
Bakshi, G. and Kapadia, N. (2003). Delta-hedged gains and the negative market
volatility risk premium, Review of Financial Studies 16(2): 527–566.
Bakshi, G., Cao, C. and Chen, Z. (1997). Empirical performance of alternative
option pricing models, Journal of Finance 52(5): 2003–2049.
Bakshi, G., Cao, C. and Chen, Z. (2000). Do call and underlying prices always move
in the same direction?, Review of Financial Studies 13(3): 549–584.
Bakshi, G., Kapadia, N. and Madan, D. (2003). Stock return characteristics, skew
laws, and the differential pricing of individual equity options, Review of Financial
Studies 16(1): 101–143.
Ball, C. and Roma, A. (1994). Stochastic volatility option pricing, Journal of Fi-
nancial and Quantitative Analysis 29(4): 589–607.
Balland, P. (2002). Deterministic implied volatility models, Quantitative Finance
2: 31–44.
Barle, S. and Cakici, N. (1998). How to grow a smiling tree, The Journal of Financial
Engineering 7: 127–146.
Barndorff-Nielsen, O. E. (1997). Normal inverse Gaussian distributions and stochas-
tic volatility modelling, Scandinavian Journal of Statistics 24: 1–13.
Bates, D. S. (1996). Jumps and stochastic volatility: Exchange rate processes implicit
in deutsche mark options, Review of Financial Studies 9: 69–107.
Bates, D. S. (2000). Post-’87 crash fears in the S&P 500 futures option market,
Journal of Econometrics 94(1-2): 181–238.
Beaglehole, D. and Chebanier, A. (2002). Mean-reverting smiles, RISK 15(4): 95–98.
Beckers, S. (1981). Standard deviations implied in option prices as predictors of
future stock price variability, Journal of Banking and Finance 5: 363–382.
Benko, M. and Härdle, W. (2004). Common functional implied volatility analysis, in
P. Čı́žek, W. Härdle and R. Weron (eds), Statistical Tools in Finance, Springer-
Verlag, Berlin, Heidelberg. Forthcoming.
Berestycki, H., Busca, J. and Florent, I. (2002). Asymptotics and calibration of local
volatility models, Quantitative Finance 2: 61–69.
Besse, P. (1991). Approximation spline de l’analyse en composantes principales d’une
variable aléatoire hilbertienne, Annales de la Faculté des Sciences de Toulouse
12: 329–346.
Björk, T. (1998). Arbitrage Theory in Continuous Time, Oxford University Press,
Oxford.
Black, F. (1976). Studies of stock price volatility changes, Proceedings of the 1976
Meetings of the American Statistical Association pp. 177–181.
Black, F. (1992). Living up to the model, in P. Field and R. Jaycobs (eds), From
Black-Scholes to Black Holes: New Frontiers in Option Pricing, Risk Magazine
Ltd, London, pp. 17–20.
Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities,
Journal of Political Economy 81: 637–654.
Blaskowitz, O., Härdle, W. and Schmidt, P. (2004). Skewness and kurtosis trades,
in S. T. Rachev (ed.), Handbook: Computational and Numerical Methods in Fi-
nance, Birkhäuser.
Bliss, R. (1997). Movements in the term structure of interest rates, Economic Review
Q IV, Federal Reserve Bank of Atlanta.
Bluman, G. (1980). On the transformation of diffusion processes into Wiener pro-
cesses, SIAM Journal on Applied Mathematics 39(2): 238–247.
Bodurtha, J. N. (2000). A linearization-based solution to the ill-posed local volatility
estimation problem, Working paper, Georgetown University.
Bodurtha, J. N. and Jermakyan, M. (1999). Nonparametric estimation of an implied
volatility surface, Journal of Computational Finance 2(4): 29–60.
Bollen, N. and Whaley, R. E. (2003). Does net buying pressure affect the shape of
the implied volatility functions?, Working paper.
Borak, S., Fengler, M. R., Härdle, W. and Mammen, E. (2005). Semiparametric
state space factor models, CASE Discussion Paper, Humboldt-Universität zu
Berlin.
Bouchouev, I. and Isakov, V. (1999). Uniqueness, stability and numerical meth-
ods for the inverse problem that arises in financial markets, Inverse Problems
15: R95–R116.
Brace, A., Goldys, B., Klebaner, F. and Womersley, R. (2001). Market model of
stochastic implied volatility with application to the BGM model, Working paper,
Department of Statistics, University of New South Wales, Sydney.
Branger, N. and Schlag, C. (2004). Why is the index smile so steep?, Review of
Finance 8: 109–127.
Breeden, D. and Litzenberger, R. (1978). Price of state-contingent claims implicit
in options prices, Journal of Business 51: 621–651.
Breidt, F. J., Crato, N. and de Lima, P. (1998). The detection and estimation of
long memory in stochastic volatility, Journal of Econometrics 83: 325–348.
Brigo, D. and Mercurio, F. (2001). Displaced and mixture diffusions for analytically-
tractable smile models, in H. Geman, D. B. Madan, S. R. Pliska and A. C. F.
Vorst (eds), Mathematical Finance Bachelier Congress 2000, Springer-Verlag,
Berlin, Heidelberg.
Brigo, D. and Mercurio, F. (2002). Log-normal-mixture dynamics and calibration
to market volatility smiles, International Journal of Theoretical and Applied
Finance 5(4): 427–446.
Brigo, D., Mercurio, F. and Sartorelli, G. (2002). Alternative asset price dynamics
and volatility smile, Banca IMI report.
Britten-Jones, M. and Neuberger, A. J. (2000). Option prices, implied price
processes, and stochastic volatility, Journal of Finance 55(2): 839–866.
Broadie, M., Cvitanić, J. and Soner, H. M. (1998). Optimal replication of contingent
claims under portfolio constraints, Review of Financial Studies 11(1): 59–79.
Broadie, M., Detemple, J., Ghysels, E. and Torrès, O. (2000). American options with
stochastic dividends and volatility: A nonparametric investigation, Journal of
Econometrics 94: 53–92.
Brown, G. and Randall, C. (1999). If the skew fits, RISK 12(4): 62–65.
Brunner, B. and Hafner, R. (2003). Arbitrage-free estimation of the risk-neutral
density from the implied volatility smile, Journal of Computational Finance
7(1): 75–106.
Cai, Z., Fan, J. and Yao, Q. (2000). Functional-coefficient regression models for
nonlinear time series, Journal of the American Statistical Association 95: 941–
956.
Canina, L. and Figlewski, S. (1993). The informational content of implied volatility,
Review of Financial Studies 6: 659–681.
Carr, P. and Madan, D. (1998). Towards a theory of volatility trading, in R. Jarrow
(ed.), Volatility, Risk Publications, pp. 417–427.
Carr, P., Ellis, K. and Gupta, V. (1998). Static hedging of exotic options, Journal
of Finance 53(3): 1165–1190.
Chiras, D. P. and Manaster, S. (1978). The information content of option prices and
a test for market efficiency, Journal of Financial Economics 6: 213–234.
Christensen, B. and Prabhala, N. (1998). The relation between implied and realized
volatility, Journal of Financial Economics 50: 125–150.
Čı́žek, P., Härdle, W. and Weron, R. (2004). Statistical Tools in Finance, Springer-
Verlag, Berlin, Heidelberg. Forthcoming.
Coleman, T. F., Kim, Y., Li, Y. and Verma, A. (2001). Dynamic hedging with a
deterministic local volatility function model, Journal of Risk 4(1): 63–89.
Coleman, T. F., Li, Y. and Verma, A. (1999). Reconstructing the unknown local
volatility function, Journal of Computational Finance 2(3): 77–102.
Connor, G. and Linton, O. (2000). Semiparametric estimation of a characteristic-
based factor model of stock returns, Technical report, LSE, London.
Cont, R. (1999). Beyond implied volatility: Extracting information from option
prices, in I. Kondor and J. Kertesz (eds), Econophysics: An Emerging Science,
Kluwer Academic Publishers, Dordrecht.
Cont, R. and da Fonseca, J. (2002). The dynamics of implied volatility surfaces,
Quantitative Finance 2(1): 45–60.
Cont, R. and Tankov, P. (2003). Calibration of jump-diffusion option pricing models:
A robust non-parametric approach, Journal of Computational Finance. Forth-
coming.
Cont, R. and Tankov, P. (2004). Financial modelling with Jump Processes, Chapman
& Hall, CRC Press, London.
Cont, R., da Fonseca, J. and Durrleman, V. (2002). Stochastic models of implied
volatility surfaces, Economic Notes 31(2): 361–377.
Corrozet, G. (1543). Hecaton-GRAPHIE. C’est à dire les descriptions de cent
figures & hystoires, contenants plusieurs appophthegmes, prouerbes, sentences
& dictz tant des anciens, que des modernes. Le tout reueu par son autheur.
Auecq’Priuilege. A Paris chez Denys Ianot Imprimeur & Libraire.
Cox, J. C. and Ross, S. A. (1976). The valuation of options for alternative stochastic
processes, Journal of Financial Economics 3: 145–166.
Cox, J. C., Ross, S. A. and Rubinstein, M. (1979). Option pricing: A simplified
approach, Journal of Financial Economics 7: 229–263.
Crépey, S. (2004). Delta-hedging vega risk?, Technical report, Université d’Évry,
France.
Daglish, T. (2003). Pricing and hedging comparison for index options, Journal of
Financial Econometrics 1(3): 327–364.
Daglish, T., Hull, J. C. and Suo, W. (2003). Volatility surfaces: Theory, rules of
thumb, and empirical evidence, Working paper, J. L. Rotman School of Man-
agement, University of Toronto.
Das, S. and Sundaram, R. (1999). Of smiles and smirks: A term-structure perspec-
tive, Journal of Financial and Quantitative Analysis 34(2): 211–240.
Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal
component analysis of a vector random function: Some applications to statistical
inference, Journal of Multivariate Analysis 12: 136–154.
Dempster, M. A. H. and Richards, D. G. (2000). Pricing American options fitting
the smile, Mathematical Finance 10(2): 157–177.
Derman, E. (1999). Regimes of volatility, RISK 12(4): 55–59.
Derman, E. and Kani, I. (1994a). Riding on a smile, RISK 7(2): 32–39.
Derman, E. and Kani, I. (1994b). The volatility smile and its implied tree, Quanti-
tative strategies research notes, Goldman Sachs.
Derman, E. and Kani, I. (1998). Stochastic implied trees: Arbitrage pricing with
stochastic term and strike structure of volatility, International Journal of The-
oretical and Applied Finance 1(1): 61–110.
Derman, E., Ergener, D. and Kani, I. (1995). Static options replication, Journal of
Derivatives 2(4): 78–95.
Derman, E., Kani, I. and Chriss, N. (1996a). Implied trinomial trees of the volatility
smile, Journal of Derivatives 3(4): 7–22.
Derman, E., Kani, I. and Kamal, M. (1997). Trading and hedging local volatility,
Journal of Financial Engineering 6(3): 1233–1268.
Derman, E., Kani, I. and Zou, J. Z. (1996b). The local volatility surface: Unlocking
the information in index option prices, Financial Analysts Journal 7-8: 25–36.
Deutsche Börse (2002). Leitfaden zu den Aktienindizes der Deutschen Börse, 4.3
edn, Deutsche Börse AG, 60284 Frankfurt am Main.
Duffie, D. (2001). Dynamic Asset Pricing Theory, 3rd edn, Princeton University
Press, Princeton.
Dumas, B., Fleming, J. and Whaley, R. E. (1998). Implied volatility functions:
Empirical tests, Journal of Finance 53(6): 2059–2106.
Dupire, B. (1994). Pricing with a smile, RISK 7(1): 18–20.
Eberlein, E. and Keller, U. (1995). Hyperbolic distributions in finance, Bernoulli
1: 281–299.
Eberlein, E. and Prause, K. (2002). The generalized hyperbolic model: Financial
derivatives and risk measures, in H. Geman, D. Madan, S. Pliska and T. Vorst
(eds), Mathematical Finance - Bachelier Congress 2000, Springer-Verlag, Berlin,
Heidelberg, pp. 245–267.
Ederington, L. and Guan, W. (2002). Why are those options smiling?, Journal of
Derivatives 10(2): 9–34.
Efromovich, S. (1999). Nonparametric Curve Estimation, Springer-Verlag, Berlin,
Heidelberg.
Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of
the variance of United Kingdom inflation, Econometrica 50(4): 987–1007.
Engle, R. (2002). Dynamic conditional correlation: A simple class of multivariate
generalized autoregressive conditional heteroscedastic models, Journal of
Business and Economic Statistics 20(3): 339–350.
Engle, R. and Rosenberg, J. (2000). Testing the volatility term structure using
option hedging criteria, Journal of Derivatives 8(1): 10–28.
Evans, M., Hastings, N. and Peacock, B. (2000). Statistical Distributions, 3rd edn,
John Wiley & Sons, New York.
Fahlenbrach, R. and Strobl, G. (2002). Is the volatility constrained to smile? An
empirical investigation of option pricing models under portfolio constraints,
Working paper, University of Pennsylvania.
Fan, J. (1992). Design adaptive nonparametric regression, Journal of the American
Statistical Association 87: 998–1004.
Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies,
Annals of Statistics 21: 196–216.
Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression
smoothers, Annals of Statistics 20: 2008–2036.
Fan, J., Yao, Q. and Cai, Z. (2003). Adaptive varying-coefficient linear models, J.
Roy. Statist. Soc. B. 65: 57–80.
Fengler, M. R. (2002). The phenomenology of implied volatility surfaces, Master
thesis. Department of Business and Economics, Humboldt-Universität zu Berlin.
Fengler, M. R. and Herwartz, H. (2002). Multivariate volatility models, in W. Härdle,
T. Kleinow and G. Stahl (eds), Applied Quantitative Finance, Springer-Verlag,
Berlin, Heidelberg.
Fengler, M. R. and Schwendner, P. (2004). Quoting multiasset equity options in the
presence of errors from estimating correlations, Journal of Derivatives 11(4): 43–
54.
Fengler, M. R. and Wang, Q. (2003). Fitting the smile revisited: A least squares
kernel estimator for the implied volatility surface, SfB 373 Discussion Paper
2003-25, Humboldt-Universität zu Berlin.
Fengler, M. R. and Winter, J. (2004). Price variability and price dispersion in a
stable monetary environment: Evidence from Germany, Managerial and Decision
Economics. Special Issue on Price Flexibility: Theories and Evidence, D. Levy
(ed.), forthcoming.
Fengler, M. R., Härdle, W. and Mammen, E. (2003a). A dynamic semiparametric
factor model for implied volatility string dynamics, Discussion paper, SfB 373,
Humboldt-Universität zu Berlin.
Fengler, M. R., Härdle, W. and Schmidt, P. (2002a). The analysis of implied volatili-
ties, in W. Härdle, T. Kleinow and G. Stahl (eds), Applied Quantitative Finance,
Springer-Verlag, Berlin, Heidelberg.
Fengler, M. R., Härdle, W. and Schmidt, P. (2002b). Common factors governing
VDAX movements and the maximum loss, Journal of Financial Markets and
Portfolio Management 16(1): 16–29.
Fengler, M. R., Härdle, W. and Villa, C. (2003b). The dynamics of implied
volatilities: A common principal components approach, Review of Derivatives
Research 6: 179–202.
Figlewski, S. (1989). What does an option pricing model tell us about option prices?,
Financial Analysts Journal 45: 12–15.
Flury, B. (1988). Common Principal Components and Related Multivariate Models,
Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons,
New York.
Flury, B. and Gautschi, W. (1986). An algorithm for simultaneous orthogonal trans-
formations of several positive definite matrices to nearly diagonal form, Journal
on Scientific and Statistical Computing 7: 169–184.
Föllmer, H. and Schied, A. (2002). Stochastic Finance: An Introduction in Dis-
crete Time, Wiley Series in Probability and Mathematical Statistics, Walter de
Gruyter, Berlin, New York.
Föllmer, H. and Schweizer, M. (1990). Hedging of contingent claims under
incomplete information, in M. H. A. Davis and R. J. Elliott (eds), Applied
Stochastic Analysis, Vol. 5 of Stochastics Monographs, Gordon and Breach, New York,
pp. 389–414.
Föllmer, H. and Sondermann, D. (1986). Hedging of non-redundant contingent
claims, in W. Hildenbrand and A. Mas-Colell (eds), Contributions to Math-
ematical Economics in Honor of Gérard Debreu, North-Holland, Amsterdam,
pp. 206–223.
Fouque, J.-P., Papanicolaou, G. and Sircar, K. R. (2000). Derivatives in Financial
Markets with Stochastic Volatility, Cambridge University Press, Cambridge.
Franke, J., Härdle, W. and Hafner, C. (2004). Introduction to the Statistics of
Financial Markets, Springer-Verlag, Berlin, Heidelberg. Forthcoming.
Frey, R. (1996). Derivative asset analysis in models with level-dependent and
stochastic volatility, CWI Quarterly 10(1): 1–34.
Frey, R. and Patie, P. (2002). Risk management for derivatives in illiquid markets:
A simulation study, in K. Sandmann and P. Schönbucher (eds), Advances in
Finance and Stochastics, Springer-Verlag, Berlin, Heidelberg.
Gatheral, J. (1999). The volatility skew: Arbitrage constraints and asymptotic
behavior, Technical report, Merrill Lynch.
Ghysels, E. and Ng, S. (1989). A semiparametric factor model of interest rates and
tests of the affine term structure, Review of Economics and Statistics 80: 535–
548.
Glosten, L., Jagannathan, R. and Runkle, D. (1993). Relationship between the
expected value and the volatility of the nominal excess return on stocks, Journal
of Finance 48: 1779–1801.
Golub, B. and Tilman, L. M. (1997). Measuring yield curve risk using principal
component analysis, value at risk, and key rate durations, Journal of Portfolio
Management 23(4): 72–84.
Gouriéroux, C. and Jasiak, J. (2001). Dynamic factor models, Econometric Reviews
20(4): 385–424.
Gouriéroux, C., Monfort, A. and Tenreiro, C. (1994). Nonparametric diagnostics for
structural models, Document de travail 9405, CREST, Paris.
Gouriéroux, C., Monfort, A. and Tenreiro, C. (1995). Kernel M-estimators and
functional residual plots, Document de travail 9546, CREST, Paris.
Gouriéroux, C., Scaillet, O. and Szafarz, A. (1997). Econométrie de la finance,
Economica, Paris.
Grossman, S. and Zhou, Z. (1996). Equilibrium analysis of portfolio insurance,
Journal of Finance 51(4): 1379–1403.
Hafner, R. and Wallmeier, M. (2001). The dynamics of DAX implied volatilities,
International Quarterly Journal of Finance 1(1): 1–27.
Hagan, P. and Woodward, D. (1999). Equivalent Black volatilities, Applied Mathe-
matical Finance 6: 147–157.
Hagan, P., Kumar, D., Lesniewski, A. and Woodward, D. (2002). Managing smile
risk, Wilmott magazine 1: 84–108.
Härdle, W. (1990). Applied Nonparametric Regression, Cambridge University Press,
Cambridge, UK.
Härdle, W. and Hafner, C. (2000). Discrete time option pricing with flexible volatility
estimation, Finance and Stochastics 4(2): 189–207.
Härdle, W. and Hlávka, Z. (2004). Dynamics of state price densities, CASE Discus-
sion Paper, Humboldt-Universität zu Berlin.
Härdle, W. and Simar, L. (2003). Applied Multivariate Statistical Analysis, Springer-
Verlag, Berlin, Heidelberg.
Härdle, W. and Yatchew, A. (2003). Dynamic state price density estimation using
constrained least squares and the bootstrap, Journal of Econometrics. Forth-
coming.
Härdle, W. and Zheng, J. (2002). How precise are price distributions predicted by
implied binomial trees?, in W. Härdle, T. Kleinow and G. Stahl (eds), Applied
Quantitative Finance, Springer-Verlag, Berlin, Heidelberg.
Härdle, W., Herwartz, H. and Spokoiny, V. (2003). Time inhomogeneous multiple
volatility modelling, Journal of Financial Econometrics 1(2): 55–95.
Härdle, W., Hlávka, Z. and Klinke, S. (2000a). XploRe – Application Guide,
Springer-Verlag, Berlin, Heidelberg.
Härdle, W., Kleinow, T. and Stahl, G. (2002). Applied Quantitative Finance,
Springer-Verlag, Berlin, Heidelberg.
Härdle, W., Klinke, S. and Müller, M. (2000b). XploRe – Learning Guide,
Springer-Verlag, Berlin, Heidelberg.
Härdle, W., Müller, M., Sperlich, S. and Werwatz, A. (2004). Nonparametric and
Semiparametric Models, Springer-Verlag, Berlin, Heidelberg.
Harper, J. (1994). Reducing parabolic partial differential equations to canonical
form, European Journal of Applied Mathematics 5: 159–165.
Harrison, J. and Kreps, D. (1979). Martingales and arbitrage in multiperiod secu-
rities markets, Journal of Economic Theory 20: 381–408.
Harvey, C. R. and Whaley, R. E. (1991). S&P 100 index option volatility, Journal
of Finance 46(4): 1551–1561.
Harvey, C. R. and Whaley, R. E. (1992). Market volatility prediction and the
efficiency of the S&P 100 index option market, Journal of Financial Economics
31: 43–73.
Hastie, T. and Tibshirani, R. (1990). Generalized additive models, Chapman and
Hall, London.
Heath, D., Jarrow, R. A. and Morton, A. (1992). Bond pricing and the term structure
of interest rates: A new methodology for contingent claims valuation, Economet-
rica 60: 77–105.
Henkel, A. and Schöne, A. (1996). Emblemata. Handbuch zur Sinnbildkunst des
XVI. und XVII. Jahrhunderts, Verlag J. B. Metzler, Stuttgart, Weimar.
Hentschel, L. (2003). Errors in implied volatility estimation, Journal of Financial
and Quantitative Analysis 38: 779–810.
Heston, S. (1993). A closed-form solution for options with stochastic volatility with
applications to bond and currency options, Review of Financial Studies 6: 327–
343.
Heynen, R. (1994). An empirical investigation of observed smile patterns, Review
of Futures Markets 13: 317–353.
Hlávka, Z. (2003). Constrained estimation of state price densities, Discussion Paper
2003-22, SfB 373, Humboldt-Universität zu Berlin.
Hörmander, L. (1990). The Analysis of Linear Partial Differential Operators I:
Distribution Theory and Fourier Analysis, 2nd edn, Springer-Verlag, Berlin,
Heidelberg.
Horowitz, J. (1998). Semiparametric Methods in Econometrics, number 131 in Lec-
ture Notes in Statistics, Springer-Verlag, Berlin, Heidelberg.
Horowitz, J., Klemelä, J. and Mammen, E. (2002). Optimal estimation in additive
models, Preprint.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal
components, Journal of Educational Psychology 24: 417–441.
Hull, J. (2002). Options, Futures, and Other Derivatives, Prentice Hall, New Jersey,
USA.
Hull, J. and White, A. (1987). The pricing of options on assets with stochastic
volatilities, Journal of Finance 42: 281–300.
Huynh, K., Kervalla, P. and Zheng, J. (2002). Estimating state price densities with
nonparametric regression, in W. Härdle, T. Kleinow and G. Stahl (eds), Applied
Quantitative Finance, Springer-Verlag, Berlin, Heidelberg.
Ingersoll, J. E. (1997). Valuing foreign exchange rate derivatives with a bounded
exchange rate process, Review of Derivatives Research 1: 159–181.
Jackson, N., Süli, E. and Howison, S. (1998). Computation of deterministic volatility
surfaces, Journal of Computational Finance 2(2): 5–32.
Jackwerth, J. C. (1997). Generalized binomial trees, Journal of Derivatives 5: 7–17.
Jackwerth, J. C. (1999). Option-implied risk-neutral distributions and implied bi-
nomial trees: A literature review, Journal of Derivatives 7(2): 66–82.
Jackwerth, J. C. and Rubinstein, M. (2001). Recovering stochastic processes from
option prices, Working paper, Universität Konstanz.
Jamshidian, F. (1993). Options and futures evaluation with deterministic volatilities,
Mathematical Finance 3(2): 149–159.
Jamshidian, F. and Zhu, Y. (1997). Scenario simulation: Theory and methodology,
Finance and Stochastics 1: 43–67.
Jarrow, R. A. and O’Hara, M. (1989). Primes and scores: An essay on market
imperfections, Journal of Finance 44: 1265–1287.
Jiang, L. and Tao, Y. (2001). Identifying the volatility of the underlying assets from
option prices, Inverse Problems 17: 137–155.
Jiang, L., Chen, Q., Wang, L. and Zhang, J. E. (2003). A new well-posed algorithm
to recover implied local volatility, Quantitative Finance 3: 451–457.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis,
4th edn, Prentice-Hall, Englewood Cliffs, N.J.
Jorion, P. (1988). On jump processes in the foreign exchange and stock markets,
Review of Financial Studies 1(4): 427–445.
Jorion, P. (1995). Predicting volatility in the foreign exchange market, Journal of
Finance 50(2): 507–528.
Joshi, M. S. (2003). The Concepts and Practice of Mathematical Finance, Cambridge
University Press, Cambridge.
Karatzas, I. (1997). Lectures on the Mathematics of Finance, Vol. 8 of CRM Mono-
graph Series, American Mathematical Society, Providence, Rhode Island.
Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus,
Springer-Verlag, Berlin, Heidelberg.
Khatri, C. G. (1980). Quadratic forms in normal variables, in P. R. Krishnaiah (ed.),
Handbook of Statistics, Vol. I, North-Holland Publishing Company, Amsterdam,
New York, Oxford, Tokyo, pp. 443–469.
Kruse, S. (2003). On the pricing of forward starting options under stochastic volatil-
ity, Berichte des Fraunhofer ITWM 53(2003), Fraunhofer Institut Techno- und
Wirtschaftsmathematik, Kaiserslautern.
Küchler, U., Neumann, K., Sørensen, M. and Streller, A. (1999). Stock returns and
hyperbolic distributions, Mathematical and Computer Modelling 29: 1–15.
Lagnado, R. and Osher, S. (1997). A technique for calibrating derivative security
pricing models: Numerical solution of an inverse problem, Journal of Computa-
tional Finance 1(1): 13–25.
Lamoureux, C. G. and Lastrapes, W. D. (1993). Forecasting stock-return variance:
Toward an understanding of stochastic implied volatilities, Review of Financial
Studies 6(2): 293–326.
Latané, H. A. and Rendleman, J. (1976). Standard deviations of stock price ratios
implied in option prices, Journal of Finance 31: 369–381.
Ledoit, O. and Santa-Clara, P. (1998). Relative option pricing with stochastic volatil-
ity, Working paper, UCLA, Los Angeles, USA.
Lee, P., Wang, L. and Karim, A. (2003). Index volatility surface via moment-
matching techniques, RISK 16(12): 85–89.
Lee, R. W. (2001). Implied and local volatilities under stochastic volatility, Inter-
national Journal of Theoretical and Applied Finance 4(1): 45–89.
Lee, R. W. (2002). Implied volatility: Statics, dynamics, and probabilistic interpre-
tation, Recent Advances in Applied Probability. Forthcoming.
Lee, R. W. (2003). The moment formula for implied volatility at extreme strikes,
Mathematical Finance. Forthcoming.
Lepski, O. and Spokoiny, V. (1997). Optimal pointwise adaptive methods in non-
parametric estimation, Annals of Statistics 25: 2512–2546.
Lewis, A. L. (2000). Option Valuation under Stochastic Volatility, Finance Press.
Lintner, J. (1965). The valuation of risky assets and the selection of risky invest-
ments in stock portfolios and capital budgets, Review of Economics and Statistics
47: 13–37.
Linton, O., Mammen, E., Nielsen, J. and Tanggaard, C. (2001). Yield curve estima-
tion by kernel smoothing, Journal of Econometrics 105(1): 185–223.
Linton, O., Nguyen, T. and Jeffrey, A. (2003). Nonparametric estimation of single
factor Heath-Jarrow-Morton term structure models and a test for path indepen-
dence, Technical report, LSE, London.
Lipton, A. (2001). Mathematical Methods For Foreign Exchange: A Financial En-
gineer’s Approach, World Scientific Publishing Company.
Manaster, S. and Koehler, G. (1982). The calculation of implied variances from the
Black-and-Scholes model: A note, Journal of Finance 37: 227–230.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1992). Multivariate Analysis, 8th edn,
Academic Press, London.
Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investments,
John Wiley, New York.
Marron, J. S. and Härdle, W. (1986). Random approximations to an error criterion
of nonparametric statistics, Journal of Multivariate Analysis 20: 91–113.
Marron, J. S. and Nolan, D. (1988). Canonical kernels for density estimation, Statis-
tics and Probability Letters 7(3): 195–199.
McIntyre, M. L. (2001). Performance of Dupire’s implied diffusion approach under
sparse and incomplete data, Journal of Computational Finance 4(4): 33–84.
Mercurio, D. (2004). Adaptive estimation for financial time series, PhD thesis,
Humboldt-Universität zu Berlin, Berlin.
Mercurio, D. and Spokoiny, V. (2004). Statistical inference for time-inhomogeneous
volatility models, Annals of Statistics. Forthcoming.
Merton, R. C. (1973). Theory of rational option pricing, Bell Journal of Economics
and Management Science 4(Spring): 141–183.
Merton, R. C. (1976). Option pricing when underlying stock returns are discontin-
uous, Journal of Financial Economics 3: 125–144.
Meyer, P. A. (1976). Un cours sur les intégrales stochastiques, number 511 in Lecture
Notes in Mathematics, Springer-Verlag, Berlin, Heidelberg.
Molgedey, L. and Galic, E. (2001). Extracting factors for interest rate scenarios,
European Physical Journal B 20(4): 517–522.
Musiela, M. and Rutkowski, M. (1997). Martingale Methods for Financial Modelling,
Springer-Verlag, Berlin, Heidelberg.
Nadaraya, E. A. (1964). On estimating regression, Theory of Probability and its
Applications 10: 186–190.
Nagot, I. and Trommsdorff, R. (1999). The tree of knowledge, RISK 12(8): 99–102.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new ap-
proach, Econometrica 59: 347–370.
Nelson, D. B. and Ramaswamy, K. (1990). Simple binomial processes as diffusion
approximations in financial models, Review of Financial Studies 3(3): 393–430.
Niffikeer, C. L., Hewins, R. D. and Flavell, R. B. (2000). A synthetic factor approach
to the estimation of value-at-risk of a portfolio of interest rate swaps, Journal of
Banking and Finance 24: 1903–1932.
Øksendal, B. (1998). Stochastic Differential Equations, 5th edn, Springer-Verlag,
Berlin, Heidelberg.
Overhaus, M. (2002). Himalaya options, RISK 15(3): 101–104.
Pagan, A. and Ullah, A. (1999). Nonparametric Econometrics, Cambridge University
Press, Cambridge.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space,
Philosophical Magazine 2(6): 559–572.
Peña, I., Rubio, G. and Serna, G. (1999). Why do we smile? On the determinants of
the implied volatility function, Journal of Banking and Finance 23: 1151–1179.
Pérignon, C. and Villa, C. (2002). Component proponents, RISK 15(9): 154–156.
Pérignon, C. and Villa, C. (2004). Component proponents II, RISK 17(7): 77–79.
Pezzulli, S. and Silverman, B. W. (1993). Some properties of smoothed principal
components analysis for functional data, Computational Statistics 8: 1–13.
Pham, H. and Touzi, N. (1996). Intertemporal equilibrium risk premia in a stochastic
volatility model, Journal of Mathematical Finance 6: 215–236.
Pong, S., Shackleton, M., Taylor, S. and Xu, X. (2003). Forecasting currency volatil-
ity: A comparison of implied volatilities and AR(FI)MA models, Journal of
Banking and Finance. Forthcoming.
Poon, S.-H. and Granger, C. W. J. (2003). Forecasting volatility in financial markets:
A review, Journal of Economic Literature 41: 478–539.
Press, W., Flannery, B., Teukolsky, S. and Vetterling, W. (1993). Numerical Recipes
in C: The Art of Scientific Computing, 2nd edn, Cambridge University Press.
Quessette, R. (2002). New products, new risks, RISK 15(3): 97–100.
Rady, S. (1997). Option pricing in the presence of natural boundaries and a quadratic
diffusion term, Finance and Stochastics 1: 331–344.
Ramsay, J. O. and Silverman, B. W. (1997). Functional Data Analysis, Springer-
Verlag, Berlin, Heidelberg.
Randall, C. and Tavella, D. (2000). Pricing Financial Instruments: The Finite
Difference Method, John Wiley & Sons, New York.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edn, Wiley,
New York.
Rebonato, R. (1998). Interest-Rate Option Models: Understanding, Analyzing and
Using Models for Exotic Interest-Rate Options, Wiley Series in Financial
Engineering, 2nd edn, John Wiley & Sons Ltd.
Rebonato, R. (1999). Volatility and Correlation, Wiley Series in Financial
Engineering, John Wiley & Sons Ltd.
Renault, E. and Touzi, N. (1996). Option hedging and implied volatilities in a
stochastic volatility model, Mathematical Finance 6(3): 279–302.
Riesz, F. and Nagy, B. (1956). Functional Analysis, Blackie, London.
Roll, R. (1984). A simple implicit measure of the effective bid-ask spread, Journal
of Finance 39: 1127–1139.
Rookley, C. (1997). Fully exploiting the information content of intra-day option
quotes: Applications in option pricing and risk management, Technical report,
Department of Finance, University of Arizona.
Rose, G. (2004). Unternehmenssteuerrecht, E. Schmidt Verlag.
Rosenberg, J. (2000). Implied volatility functions: A reprise, Journal of Derivatives
7: 51–64.
Rossi, A. (2002). The Britten-Jones and Neuberger smile-consistent with stochas-
tic volatility option pricing model: A further analysis, International Journal of
Theoretical and Applied Finance 5(1): 1–31.
Rubinstein, M. (1994). Implied binomial trees, Journal of Finance 49: 771–818.
Ruppert, D. (1997). Empirical-bias bandwidths for local polynomial nonparametric
regression and density estimation, Journal of the American Statistical Associa-
tion 92: 1049–1062.
Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares
regression, Annals of Statistics 22(3): 1346–1370.
Schmalensee, R. and Trippi, R. R. (1978). Common stock volatility expectations
implied by option premia, Journal of Finance 33: 129–147.
Schönbucher, P. J. (1999). A market model for stochastic implied volatility, Philo-
sophical Transactions of the Royal Society 357(1758): 2071–2092.
Schoutens, W. (2003). Lévy Processes in Finance, John Wiley & Sons, New York.
Schwarz, G. (1978). Estimating the dimension of a model, Annals of Statistics
6: 461–464.
Scott, L. (1987). Option pricing when the variance changes randomly: Theory,
estimation, and an application, Journal of Financial and Quantitative Analysis
22: 419–438.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under
conditions of risk, Journal of Finance 19: 425–442.
Shimko, D. (1993). Bounds on probability, RISK 6(4): 33–37.
Shu, J. and Zhang, J. E. (2003). The relationship between implied and realized
volatility of S&P 500 index, Wilmott magazine Jan.: 83–91.
Skiadopoulos, G. (2001). Volatility smile consistent option models: A survey, Inter-
national Journal of Theoretical and Applied Finance 4(3): 403–437.
Skiadopoulos, G., Hodges, S. and Clewlow, L. (1999). The dynamics of the S&P 500
implied volatility surface, Review of Derivatives Research 3: 263–282.
Spokoiny, V. (1998). Estimation of a function with discontinuities via local polyno-
mial fit with an adaptive window choice, Annals of Statistics 26: 1356–1378.
Steele, J. M. (2000). Stochastic Calculus and Financial Applications, Springer-
Verlag, Berlin, Heidelberg, New York.
Stein, E. M. and Stein, J. C. (1991). Stock price distributions with stochastic
volatility: An analytic approach, Review of Financial Studies 4: 727–752.
Stone, C. J. (1986). The dimensionality reduction principle for generalized additive
models, The Annals of Statistics 14: 592–606.
Tanaka, H. (1963). Note on continuous additive functionals of the 1-dimensional
Brownian path, Zeitschrift für Wahrscheinlichkeitstheorie 1: 251–257.
Taylor, S. J. (2000). Consequences for option pricing of a long memory in volatility,
Working paper, Department of Accounting and Finance, Lancaster University,
UK.
Tipke, K., Lang, J. and Seer, R. (2002). Steuerrecht, O. Schmidt Verlag, Köln.
Tompkins, R. (1999). Implied volatility surfaces: Uncovering regularities for options
on financial futures, Working paper, Vienna University of Technology.
Tompkins, R. (2001). Stock index futures markets: Stochastic volatility models and
smiles, The Journal of Futures Markets 21(1): 43–78.
Tse, Y. and Tsui, A. (2002). A multivariate generalized autoregressive conditional
heteroscedastic model with time-varying correlations, Journal of Business and
Economic Statistics 20(3): 351–362.
Vähämaa, S. (2004). Delta hedging with the smile, Financial Markets and Portfolio
Management 18(3): 241–255.
Watson, G. S. (1964). Smooth regression analysis, Sankhyā, Series A 26: 359–372.
Weinberg, S. A. (2001). Interpreting the volatility smile: An examination of the
informational content of option prices, International Finance Discussion Papers
706, Federal Reserve Board, Washington, D. C.
Whaley, R. (1982). Valuation of American call options on dividend-paying stocks:
Empirical tests, Journal of Financial Economics 10: 29–58.
Wilmott, P. (2001a). Paul Wilmott on Quantitative Finance, Vol. 1, John Wiley &
Sons.
Wilmott, P. (2001b). Paul Wilmott on Quantitative Finance, Vol. 2, John Wiley &
Sons.
Zakoian, J. M. (1994). Threshold heteroskedastic functions, Journal of Economic
Dynamics and Control 18: 931–955.
Zhu, Y. and Avellaneda, M. (1997). An E-ARCH model for the term-structure of
implied volatility of FX options, Applied Mathematical Finance 4: 81–100.
Zhu, Y. and Avellaneda, M. (1998). A risk-neutral stochastic volatility model,
International Journal of Theoretical and Applied Finance 1(2): 289–310.
Zühlsdorff, C. (2002). The pricing of derivatives on assets with quadratic volatility,
Working Paper B-451, Bonn SfB 303.
Index

Akaike information criterion, 108, 111, 116, 132, 168
arbitrage, 11
at-the-money, 21
average squared error, 107

bandwidth choice, 106–113
Barle Cakici implied tree, 70
Black Scholes formula, 20, 40, 50, 58, 117, 186
Black Scholes model, 9–10
  call option, 14
  generalized PDE, 36, 52
  partial differential equation, 12
bond, riskless, 10
Brigo Mercurio model, 87
Britten-Jones Neuberger implied tree, 83
Brownian motion
  definition, 192
  geometric, 9

call option, 10
  Black Scholes formula, 14
common principal component models, 128–148
  asymptotic distribution of eigenvalues, 134
  asymptotic distribution of eigenvectors, 134
  hierarchy of models, 131
  likelihood function, 133
  model selection, 138–139
  motivation, 128
  of implied volatility surface, 139–144
  partial, 131
  proportional model, 130
  stability analysis, 144–149
  stability tests, 134–138
  time series models, 149–153
constant elasticity of variance model, 86
contingent claim, 10
counterparty, 10
covariation process, 192
cross validation, 107, 108, 115

delta, 15, 27, 41, 90
  model consistent, 91
  recalibration, 92
  sticky-moneyness, 91
  sticky-strike, 91
  vega correction, 42, 91
delta hedging, 15, 36, 41–44, 46, 90–92
delta-sigma hedging, 40
derivative, 10
derivatives estimation, 61, → nonparametric regression
Derman Kani Chriss implied tree, 78
Derman Kani implied tree, 70
Derman Kani stochastic implied tree, 82
difference dividend, 186, 187
dimension reduction, → common principal component models, functional principal component analysis, semiparametric factor models
Dupire formula, 53, 55, 57
  discrete-time version, 84
  implied volatility counterpart, 58, 116

exercise price, 10
expectation
  K-strike, T-maturity forward risk-adjusted, 53, 65–68
  risk neutral, 13, 39, 51

filtration, 191
Fokker-Planck equation, 56, 195
forward price, 21, 22, 186
functional data analysis, 156–160
functional principal component analysis
  computation, 158–160
    basis expansions, 158
    discretization, 158
    Galerkin method, 159–160
  set-up, 156
fundamental theorem of asset pricing, 13
futures price, 22

gamma, 15, 92
GARCH models
  of implied volatility, 150–153
Girsanov’s theorem, 13, 39, 195
greeks, → delta, gamma, rho, vanna, vega, volga, theta, 15–20

hedging volatility, 41

implied tree
  binomial, 69–78
  stochastic, 82–86
  trinomial, 78–82
implied volatility surface, → volatility, implied
  estimation of derivatives, 61, 100, 105
  least squares kernel smoothing, 116–124
  nonparametric smoothing, 102–106
  shift factor, 140, 173
  slope factor, 141, 173
  term structure factor, 173
  twist factor, 141, 173
in-the-money, 21
integrated squared error, 107
Itô formula
  multi-dimensional, 193
  one-dimensional, 192

Jackwerth implied tree, 73
jump diffusion, 45, 46, 93

Karhunen-Loève expansion, 156
kernels, 101–102
  Epanechnikov, 101
  Gaussian, 102
  multivariate, 102
  quartic, 101

Lévy-process, 45, 46
least squares kernel smoothing, → semiparametric regression
Lipschitz condition, 194
local polynomial smoothing, → nonparametric regression
local volatility, → volatility, local
local volatility model, 69–92
local volatility surface, → volatility, local
  estimation via local polynomials, 61, 89, 100, 105

market
  complete, 13, 50
  incomplete, 39–41
market price of risk, 13, 50, 67
market price of volatility risk, 39, 66
martingale, 191
mean integrated squared error, 107
mean squared error, 106
  asymptotic, 106
measure
  K-strike, T-maturity forward risk-adjusted, 65–68
  risk neutral, 13, 39, 51
mixture diffusions, 87
moneyness
  forward, futures, 21
  log-, 21
  stock price, 21

Nadaraya-Watson estimator, → nonparametric regression
nonparametric regression
  bandwidth choice, 106–116
  leave-one-out estimator, 108
  local polynomial smoothing, 61, 90, 104–106, 139
    asymptotic bias, 105
    asymptotic variance, 105
    derivatives estimation, 105
    multi-variate, 106
  Nadaraya-Watson estimator, 102–104
    asymptotic bias, 103
    asymptotic variance, 103
    multi-variate, 104

option
  American style, 10
  barrier, 43, 92, 98
  call, 10
  European style, 10
  forward starting, 93
  plain vanilla, 10
  put, 10
  underlying asset, 10
Ornstein-Uhlenbeck process, 38
out-of-the-money, 21

payoff function
  call, 10
  put, 10
portfolio
  replicating, 11
  self-financing, 11
  tame, 11
principal component analysis, 125
put option, 10
  Black Scholes formula, 14
put-call parity, 14, 15, 186

quadratic variation process, 192
quantile-hedging, 41

Radon-Nikodým derivative, 13, 39, 67, 196
rho, 15
risk-minimizing hedging, 41
Rubinstein implied tree, 73

Schönbucher model, 93
Schwarz information criterion, 132
semiparametric regression
  least squares kernel smoothing, 116–124
    assumptions, 117
    asymptotic normality, 118, 199–201
    consistency, 118, 197–198
    weighting schemes, 119–120
semiparametric factor model, 161–181
  estimation, 164–166
  model selection, 168–170
  norming, 167
  of implied volatility, 170–179
  prediction performance, 179–181
  set-up, 164
state price density, 18, 20, 30, 54, 62
stochastic differential equation, 194
stochastic implied volatility model, 93–96
  PDE, 96
super-hedging, 41

Tanaka-Meyer formula, 54, 194
theta, 15
time to maturity, 21
trading strategy, 11, 12

vanna, 15, 92
variance
  instantaneous, 10
  local, 51
variance explained, 168
  implied volatility surface, 140
vega, 15, 58, 118, 119
volatility
  Black Scholes model, 10
  constant, 10, 14, 20, 36
  deterministic, 36–38
  implied, 20–28
    as spatial harmonic mean, 62, 63
    DAX, 32–36
    explanation for, 45–47
    forecast, 68–69
    interpretation as average, 37, 40
    large, small strike behavior, 29–31
    link to local, 57–64
    overview, 98
    predictor of realized, 44–45
    slope bounds, 28–29
    stochastic, 93–96
    stylized facts, 31–32
  instantaneous, 10, 51, 62
    Cox-Ross model, 86
    deterministic, 52, 55, 56, 95
    in local volatility models, 51, 52
    in stochastic implied volatility models, 94
    overview, 98
    stochastic, 51, 66
    unconditional expectation of, 68
  local, 51–65
    characterization as risk-adjusted expectation, 53, 65–68
    definition, 52
    dual PDE approach, 56–57
    Dupire formula, 53–57
    implied tree, 75, 80
    link to implied, 57–64
    mixture diffusions, 87–88
    nonparametric approaches, 89–90
    overview, 98
    parametric approaches, 86–88
    slope rule, 64, 91
  quadratic, 86
  stochastic, 38–41, 46
  time-dependent, 36
volga, 15, 118

Wishart distribution, 132
