
Regime Switching Models: An Example for a Stock Market Index
Erik Kole∗
Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam

April 2010

In this document, I discuss in detail how to estimate regime switching models with an
example based on a US stock market index.

1 Specification
We assume that the returns on the US stock market index, Yt , follow a distribution that
depends on a latent process St . At each point in time, the process St is in one out of two
regimes, which we indicate by St = 0 and St = 1. The return Yt behaves according to
Y_t \sim \begin{cases} N(\mu_0, \sigma_0^2) & \text{if } S_t = 0 \\ N(\mu_1, \sigma_1^2) & \text{if } S_t = 1. \end{cases} \qquad (1)

In both regimes, the return follows a normal distribution, though with different means and
variances. We use the function f to denote the normal pdf,
f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y - \mu)^2}{2\sigma^2} \right). \qquad (2)

Of course it is possible to have different distributions in regime 0 and 1.


The latent process S_t follows a first order Markov chain. This means that the probability for a regime to occur at time t depends solely on the regime at time t − 1. We denote these transition probabilities by

p_{ij} = \Pr[S_t = i \mid S_{t-1} = j]. \qquad (3)

The transition probabilities out of each departure state j should add up to one, i.e., p_{00} + p_{10} = 1 and p_{01} + p_{11} = 1. So, for a binary process S_t, we have two free parameters, p_{00} and p_{11}.

Corresponding author. Address: Burg. Oudlaan 50, Room H11-13, P.O. Box 1738, 3000DR Rotterdam, The Netherlands, Tel. +31 10 408 12 58. E-mail address: [email protected].

We gather the transition probabilities in a transition matrix
P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix} = \begin{pmatrix} p_{00} & 1 - p_{11} \\ 1 - p_{00} & p_{11} \end{pmatrix}. \qquad (4)
Since the whole process St is unobserved, this also applies to the initial regime S1 . We
introduce a separate parameter ζ for the probability that the first regime occurs,
ζ = Pr[S1 = 0]. (5)
Naturally, we have Pr[S1 = 1] = 1 − ζ. Because no conditional information on S0 is
available, we cannot directly use the transition matrix to determine this probability, and
we need the extra parameter. This last parameter can be estimated, but it can also be specified exogenously. We assume in this document that the parameter is estimated.
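To make the specification concrete, the following minimal Python sketch simulates returns from this two-regime model. The function name and the parameter values in the example call are illustrative assumptions, not part of the original text.

```python
import numpy as np

def simulate_regime_switching(T, mu, sigma, p00, p11, zeta, seed=0):
    """Simulate T returns from the two-regime model of Section 1.

    mu, sigma hold the regime means and volatilities (index 0 and 1),
    p00 and p11 are the staying probabilities, zeta = Pr[S_1 = 0].
    """
    rng = np.random.default_rng(seed)
    stay = (p00, p11)                      # Pr[S_t = j | S_{t-1} = j]
    s = np.empty(T, dtype=int)
    y = np.empty(T)
    s[0] = 0 if rng.random() < zeta else 1
    y[0] = rng.normal(mu[s[0]], sigma[s[0]])
    for t in range(1, T):
        # stay in the previous regime with probability p_jj, otherwise switch
        s[t] = s[t - 1] if rng.random() < stay[s[t - 1]] else 1 - s[t - 1]
        y[t] = rng.normal(mu[s[t]], sigma[s[t]])
    return y, s

# Illustrative call with assumed parameter values (regime 1 is the volatile one)
y_sim, s_sim = simulate_regime_switching(1000, mu=(0.04, -0.04), sigma=(1.0, 4.0),
                                         p00=0.98, p11=0.95, zeta=0.5)
```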

2 Inference on St
The process St is latent, which means that we will never know for sure which regime
prevailed at a certain point in time. However, we can use the information from the current
and past observations, combined with the distributions and transition probabilities to make
an inference on Pr[St = 0|yt , yt−1 , . . . , y1 ]. We accomplish this by using Bayes’ rule,
\Pr[A \mid B] = \frac{\Pr[B \mid A] \Pr[A]}{\Pr[B]}.
For the inference of the regime at time t = 1, this means
\begin{aligned}
\Pr[S_1 = 0 \mid Y_1 = y_1] &= \frac{\Pr[Y_1 = y_1 \mid S_1 = 0] \cdot \Pr[S_1 = 0]}{\Pr[Y_1 = y_1]} \\
&= \frac{\Pr[Y_1 = y_1 \mid S_1 = 0] \cdot \Pr[S_1 = 0]}{\Pr[Y_1 = y_1 \mid S_1 = 0] \cdot \Pr[S_1 = 0] + \Pr[Y_1 = y_1 \mid S_1 = 1] \cdot \Pr[S_1 = 1]} \\
&= \frac{f(y_1; \mu_0, \sigma_0^2) \cdot \zeta}{f(y_1; \mu_0, \sigma_0^2) \cdot \zeta + f(y_1; \mu_1, \sigma_1^2) \cdot (1 - \zeta)}.
\end{aligned}
In the second equality, we use conditioning again, because conditional on the regime the
distribution of Y1 is given. We make the distributions explicit in the third equality. In a
similar way, we find an expression for Pr[S1 = 1|Y1 = y1 ], but we can also compute this
using Pr[S1 = 1|Y1 = y1 ] = 1 − Pr[S1 = 0|Y1 = y1 ].
After computing the inferences for the regimes at time 1, we can use them to make a
forecast for the regime distribution at time 2,
\begin{aligned}
\Pr[S_2 = 0 \mid Y_1 = y_1] &= \Pr[S_2 = 0 \mid S_1 = 0, Y_1 = y_1] \cdot \Pr[S_1 = 0 \mid Y_1 = y_1] \\
&\quad + \Pr[S_2 = 0 \mid S_1 = 1, Y_1 = y_1] \cdot \Pr[S_1 = 1 \mid Y_1 = y_1] \\
&= \Pr[S_2 = 0 \mid S_1 = 0] \cdot \Pr[S_1 = 0 \mid Y_1 = y_1] + \Pr[S_2 = 0 \mid S_1 = 1] \cdot \Pr[S_1 = 1 \mid Y_1 = y_1] \\
&= p_{00} \Pr[S_1 = 0 \mid Y_1 = y_1] + p_{01} \Pr[S_1 = 1 \mid Y_1 = y_1].
\end{aligned}

In the first equality we condition on the regime at time 1. In the second equality we use
the fact that St follows a first order Markov chain independent of the process Yt . Again, we
can similarly derive Pr[S2 = 1|Y1 = y1 ] or use Pr[S2 = 1|Y1 = y1 ] = 1 − Pr[S2 = 0|Y1 = y1 ].
The steps of calculating inference and forecast probabilities define a recursion. Based
on the forecast probabilities for time 2 and the observation y2 we can calculate inference
probabilities for the regime at time 2. In turn, we use these inferences for forecasts for
the regime at time 3. We can write these recursions more compactly using vector-matrix notation. We use ξ_{t|t} to denote the vector of inference probabilities \Pr[S_t = i \mid y_t, y_{t-1}, \ldots, y_1] at time t, and ξ_{t+1|t} for the vector of forecast probabilities \Pr[S_{t+1} = i \mid y_t, y_{t-1}, \ldots, y_1], which use information up to time t. We gather the densities of observation y_t conditional on the regimes in a vector f_t. We can construct the series of inference and forecast probabilities by the recursion

\xi_{t|t} = \frac{1}{\xi_{t|t-1}' f_t}\, \xi_{t|t-1} \odot f_t \qquad (6)

\xi_{t+1|t} = P\, \xi_{t|t}, \qquad (7)

where ⊙ indicates element-by-element multiplication. We call this the filter recursion.
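A minimal Python sketch of the filter recursion (6)–(7) could look as follows. The function and variable names are assumptions of this sketch; the transition matrix is stored column-wise as in (4), so that P[i, j] = Pr[S_t = i | S_{t-1} = j].

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Normal density of equation (2)."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def hamilton_filter(y, mu, sigma, P, zeta):
    """Filter recursion of equations (6)-(7) for the two-regime model.

    Returns the inference probabilities xi_{t|t}, the forecast probabilities
    xi_{t|t-1} and, as a byproduct, the log likelihood of equation (10).
    """
    T = len(y)
    # f_t: densities of each observation under regime 0 and regime 1
    f = np.column_stack([normal_pdf(y, mu[0], sigma[0]),
                         normal_pdf(y, mu[1], sigma[1])])
    xi_pred = np.empty((T, 2))             # xi_{t|t-1}
    xi_filt = np.empty((T, 2))             # xi_{t|t}
    xi_pred[0] = (zeta, 1.0 - zeta)        # Pr[S_1 = 0] = zeta
    loglik = 0.0
    for t in range(T):
        num = xi_pred[t] * f[t]            # xi_{t|t-1} (element-wise) f_t
        denom = num.sum()                  # xi_{t|t-1}' f_t
        xi_filt[t] = num / denom           # equation (6)
        loglik += np.log(denom)            # one summand of equation (10)
        if t < T - 1:
            xi_pred[t + 1] = P @ xi_filt[t]   # equation (7)
    return xi_filt, xi_pred, loglik
```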


It is also possible to determine the probability of the occurrence of a specific regime
at time t, using all available information, i.e., information before and after time t, which
we call smoothed inference probabilities. These probabilities, denoted by ξ_{t|T}, can also be calculated by recursion,

\xi_{t|T} = \xi_{t|t} \odot \left( P' \left( \xi_{t+1|T} \div \xi_{t+1|t} \right) \right), \qquad (8)

where we use the inference and forecast probabilities (see Kim, 1994, §2.2, for a derivation).
We use the smoothed inference probabilities mostly to show how the regimes are identified.
This recursion is called the smoother recursion.
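The smoother recursion (8) runs backwards through the filtered output. Again a sketch with assumed names, building on the hamilton_filter sketch above.

```python
def kim_smoother(xi_filt, xi_pred, P):
    """Smoother recursion of equation (8) (Kim, 1994)."""
    T = len(xi_filt)
    xi_smooth = np.empty_like(xi_filt)
    xi_smooth[-1] = xi_filt[-1]            # at t = T the smoothed and filtered probabilities coincide
    for t in range(T - 2, -1, -1):
        # xi_{t|T} = xi_{t|t} * [ P' (xi_{t+1|T} / xi_{t+1|t}) ]  (element-wise * and /)
        xi_smooth[t] = xi_filt[t] * (P.T @ (xi_smooth[t + 1] / xi_pred[t + 1]))
    return xi_smooth
```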

3 Estimation
We can estimate the parameters of the regime switching models using a maximum likelihood approach. As with other conditional models such as ARMA- or GARCH-models, the likelihood function will take a conditional form, too. We gather the parameters of the model in a vector θ = (µ_0, σ_0, µ_1, σ_1, p_{00}, p_{11}, ζ)'. The conditional likelihood function is given by

L(y_1, y_2, \ldots, y_T; \theta) = \prod_{t=1}^{T} \Pr[Y_t = y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1]. \qquad (9)

Conditioning on the regime at time t, we find
\begin{aligned}
\Pr[Y_t = y_t \mid y_1, \ldots, y_{t-1}] &= \Pr[Y_t = y_t \mid S_t = 0, y_1, \ldots, y_{t-1}] \cdot \Pr[S_t = 0 \mid y_1, \ldots, y_{t-1}] \\
&\quad + \Pr[Y_t = y_t \mid S_t = 1, y_1, \ldots, y_{t-1}] \cdot \Pr[S_t = 1 \mid y_1, \ldots, y_{t-1}] \\
&= \Pr[Y_t = y_t \mid S_t = 0] \cdot \xi_{t|t-1,0} + \Pr[Y_t = y_t \mid S_t = 1] \cdot \xi_{t|t-1,1} \\
&= \xi_{t|t-1}' f_t.
\end{aligned}
In the second equality, we use the information that the distribution of Yt |St does not
depend on further prior realizations. The conditional log likelihood function can thus be
calculated as

\ell(y_1, y_2, \ldots, y_T; \theta) = \sum_{t=1}^{T} \log\!\left( \xi_{t|t-1}' f_t \right), \qquad (10)

which follows as a byproduct of the filter recursion.
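In the hamilton_filter sketch above, this byproduct is the returned loglik value. As an illustration, evaluating (10) at assumed (not estimated) parameter values on the simulated series y_sim from the Section 1 sketch could look like this.

```python
# Log likelihood (10) at illustrative parameter values
P = np.array([[0.98, 0.05],
              [0.02, 0.95]])              # columns are departure states and sum to one
_, _, ll = hamilton_filter(y_sim, mu=(0.04, -0.04), sigma=(1.0, 4.0), P=P, zeta=0.5)
print(ll)
```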


Straightforward maximum likelihood estimation implies maximizing (10) as a function of θ. Because of the filter recursion, the log likelihood function exhibits a complicated structure with many local optima. Optimizing this function directly may be computationally demanding. Therefore, we will use a special optimization algorithm, the Expectation-Maximization (EM) algorithm of Dempster et al. (1977).

3.1 The Expectation-Maximization Algorithm


Suppose that we could actually observe the realizations of the latent process St , and we
would have a set {s1 , s2 , . . . , sT } similar to the set {y1 , y2 , . . . , yT }. To simplify notation,
we write St = {s1 , s2 , . . . , st } and Yt = {y1 , y2 , . . . , yt }. The realization of St is either zero
or one, so it corresponds with a draw from a Bernoulli distribution. We find the density
of the combination (yt , st ) conditional on past observations as
\begin{aligned}
\Pr[Y_t = y_t, S_t = s_t \mid Y_{t-1}, S_{t-1}; \theta] &= \Pr[Y_t = y_t \mid S_t = s_t; \theta] \Pr[S_t = s_t \mid S_{t-1} = s_{t-1}; \theta] \\
&= \begin{cases}
f(y_t; \mu_0, \sigma_0^2)\, p_{00} & \text{if } s_t = 0,\ s_{t-1} = 0 \\
f(y_t; \mu_0, \sigma_0^2)\,(1 - p_{11}) & \text{if } s_t = 0,\ s_{t-1} = 1 \\
f(y_t; \mu_1, \sigma_1^2)\,(1 - p_{00}) & \text{if } s_t = 1,\ s_{t-1} = 0 \\
f(y_t; \mu_1, \sigma_1^2)\, p_{11} & \text{if } s_t = 1,\ s_{t-1} = 1
\end{cases} \qquad (11) \\
&= \left( f(y_t; \mu_0, \sigma_0^2)\, p_{00} \right)^{(1-s_t)(1-s_{t-1})} \times \left( f(y_t; \mu_0, \sigma_0^2)(1 - p_{11}) \right)^{(1-s_t)s_{t-1}} \\
&\quad \times \left( f(y_t; \mu_1, \sigma_1^2)(1 - p_{00}) \right)^{s_t(1-s_{t-1})} \times \left( f(y_t; \mu_1, \sigma_1^2)\, p_{11} \right)^{s_t s_{t-1}}.
\end{aligned}
We see that the density of (yt , st ) combines the fact that conditionally, yt follows a normal
distribution, with the fact that st follows a Bernoulli distribution, conditionally on its
previous realization st−1 .

When we construct the log likelihood function of the joint observations (YT , ST ), we
need the log of (11)

\begin{aligned}
\log \Pr[Y_t = y_t, S_t = s_t \mid Y_{t-1}, S_{t-1}; \theta]
&= \log\!\left( f(y_t; \mu_0, \sigma_0^2)\, p_{00} \right) \cdot (1-s_t)(1-s_{t-1}) \\
&\quad + \log\!\left( f(y_t; \mu_0, \sigma_0^2)(1 - p_{11}) \right) \cdot (1-s_t)\, s_{t-1} \\
&\quad + \log\!\left( f(y_t; \mu_1, \sigma_1^2)(1 - p_{00}) \right) \cdot s_t (1-s_{t-1}) \\
&\quad + \log\!\left( f(y_t; \mu_1, \sigma_1^2)\, p_{11} \right) \cdot s_t s_{t-1} \\
&= (1-s_t) \log f(y_t; \mu_0, \sigma_0^2) + s_t \log f(y_t; \mu_1, \sigma_1^2) \\
&\quad + (1-s_t)(1-s_{t-1}) \log p_{00} + (1-s_t) s_{t-1} \log(1 - p_{11}) \\
&\quad + s_t (1-s_{t-1}) \log(1 - p_{00}) + s_t s_{t-1} \log p_{11}.
\end{aligned}

A small alteration must be made for the density of Pr[Y1 = y1 , S1 = s1 ; θ], since no history
will be available there. So, instead of the Markov chain parameters p00 and p11 we find an
expression with the parameter ζ,
\Pr[Y_1 = y_1, S_1 = s_1; \theta] = \left( f(y_1; \mu_0, \sigma_0^2)\, \zeta \right)^{1-s_1} \left( f(y_1; \mu_1, \sigma_1^2)(1 - \zeta) \right)^{s_1}.

Now, we can simply construct the log likelihood for (YT , ST ) as


\begin{aligned}
\ell_{Y,S}(Y_T, S_T; \theta) = \sum_{t=1}^{T} &\left( (1-s_t) \log f(y_t; \mu_0, \sigma_0^2) + s_t \log f(y_t; \mu_1, \sigma_1^2) \right) \\
+ \sum_{t=2}^{T} &\Big( (1-s_t)(1-s_{t-1}) \log p_{00} + (1-s_t) s_{t-1} \log(1 - p_{11}) \qquad\qquad (12) \\
&\ + s_t (1-s_{t-1}) \log(1 - p_{00}) + s_t s_{t-1} \log p_{11} \Big) \\
+ &\,(1-s_1) \log \zeta + s_1 \log(1 - \zeta).
\end{aligned}

This log likelihood function would be much easier to optimize than the actual log likelihood
function in (10), because (12) does not exhibit a recursive relation. However, we cannot
observe St .
The EM-algorithm proposes to base the estimation on (12). Because we do not have actual observations on S_t, the EM-algorithm maximizes the expectation of the log likelihood function in (12), conditional on the data that we do observe, Y_T. So, instead of working with s_t, we work with the expectation of S_t conditional on the data and the parameters,

E[St |YT ; θ] = Pr[St = 0|YT ; θ] · 0 + Pr[St = 1|YT ; θ] · 1 = Pr[St = 1|YT ; θ]. (13)

The last probability is a smoothed inference probability as in (8). Similarly, we find

E[St St−1 |YT ; θ] = Pr[St = St−1 = 1|YT ; θ]. (14)

This approach would almost retain the attractive structure of the log likelihood function
in (12). Almost, as the expectations of St and St St−1 depend on θ and are calculated again
via the recursion in (8). The trick of the EM-algorithm is to treat the expectation part
and the maximization separately. So, for a given parameter vector θ, the expectations in
(13) and (14) are calculated. Then, these expectations are treated as given, and a new
parameter vector θ ∗ is calculated which maximizes the expected log likelihood function.
Of course, this new parameter vector gives rise to other expectations, which in turn lead
to a new parameter vector. So, instead of one direct maximum likelihood estimation, we
conduct a series of expectation maximization steps, which produce a series of parameter
estimates θ^{(k)},

\theta^{(k)} = \arg\max_{\theta} \; \mathrm{E}\!\left[ \ell_{Y,S}(Y_T, S_T; \theta) \mid Y_T; \theta^{(k-1)} \right]. \qquad (15)

Dempster et al. (1977) and Hamilton (1990) show that this sequence of θ (k) converges and
produces a maximum of (10). As always, this maximum can be local, and may depend on
starting values θ (0) .
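One EM iteration thus combines the filter and smoother of Section 2 (the E-step) with the closed-form parameter updates derived in Section 3.2 (the M-step). A sketch with assumed names follows; the M-step function m_step is itself sketched after the derivations at the end of Section 3.2.

```python
def em_step(y, theta):
    """One iteration of (15): E-step with theta^(k-1), M-step yields theta^(k).

    theta is a dict with keys "mu", "sigma", "P" and "zeta"; m_step is
    sketched at the end of Section 3.2.
    """
    xi_filt, xi_pred, loglik = hamilton_filter(y, theta["mu"], theta["sigma"],
                                               theta["P"], theta["zeta"])
    xi_smooth = kim_smoother(xi_filt, xi_pred, theta["P"])           # E-step
    theta_new = m_step(y, xi_filt, xi_pred, xi_smooth, theta["P"])   # M-step: maximize (16)
    return theta_new, loglik
```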

3.2 The Maximization Step


We now look at the maximization step in more detail. Our starting point is the log likelihood function in (12), for which we calculate the expectation conditional on the data and parameters θ^{(k-1)},

\ell_{EM}\!\left( Y_T; \theta, \theta^{(k-1)} \right) = \mathrm{E}\!\left[ \ell_{Y,S}(Y_T, S_T; \theta) \mid Y_T; \theta^{(k-1)} \right]. \qquad (16)

The updated parameters θ (k) maximize this expected log likelihood function, so they satisfy
the first order conditions
\left. \frac{\partial \ell_{EM}\!\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial \theta} \right|_{\theta = \theta^{(k)}} = 0. \qquad (17)

Taking a closer look at (12), we see that the log likelihood function can be split into terms that exclusively relate to specific parameters. The parameters of the distribution for the first regime, µ_0 and σ_0^2, are only related to the first term, and the parameters of the distribution for the second regime only to the second. The transition probability p_{00} is related to the third and fifth terms, and so on. So differentiation will produce relatively simple conditions.
We first look at differentiating (16) with respect to µ_0. We will use ξ^{(k-1)}_{t|T,0} to denote \Pr[S_t = 0 \mid Y_T; \theta^{(k-1)}] = 1 - \mathrm{E}[S_t \mid Y_T; \theta^{(k-1)}], which is the smoothed inference probability that we find when we apply the filter and smoother recursions in (6)–(8) with parameters θ^{(k-1)}. We find

\begin{aligned}
\frac{\partial \ell_{EM}\!\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial \mu_0}
&= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \log f(y_t; \mu_0, \sigma_0^2)}{\partial \mu_0} \\
&= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( -\tfrac{1}{2} \log 2\pi - \log \sigma_0 - \tfrac{(y_t - \mu_0)^2}{2\sigma_0^2} \right)}{\partial \mu_0} \qquad (18) \\
&= \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0}\, \frac{y_t - \mu_0}{\sigma_0^2}.
\end{aligned}

For the optimal µ_0^{(k)} this expression equals zero, which means that we find

\mu_0^{(k)} = \frac{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0}\, y_t}{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0}}. \qquad (19)

This estimate for µ0 can be interpreted as a weighted average of the observations, where
the smoothed inference probabilities for regime 0 serve as weights. It is a clear extension
of the usual maximum likelihood estimator for the mean of a normal distribution. For µ_1^{(k)} we find a similar expression, with ξ^{(k-1)}_{t|T,1} instead of ξ^{(k-1)}_{t|T,0}.
Next we consider the estimate for σ_0^2. Differentiation yields

\begin{aligned}
\frac{\partial \ell_{EM}\!\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial \sigma_0}
&= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \log f(y_t; \mu_0, \sigma_0^2)}{\partial \sigma_0} \\
&= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( -\tfrac{1}{2} \log 2\pi - \log \sigma_0 - \tfrac{(y_t - \mu_0)^2}{2\sigma_0^2} \right)}{\partial \sigma_0} \qquad (20) \\
&= \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( \frac{(y_t - \mu_0)^2}{\sigma_0^3} - \frac{1}{\sigma_0} \right).
\end{aligned}

The optimal σ_0^{(k)} sets this expression to zero, so

\sigma_0^{(k)} = \sqrt{ \frac{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( y_t - \mu_0^{(k)} \right)^2}{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0}} }, \qquad (21)

which is again based on a weighted average, now of the squared deviations from the updated mean.

In a similar way we can derive the estimates for p00 and p11 . Before we derive these
estimates, note that
\begin{aligned}
\mathrm{E}\!\left[ (1 - S_t)(1 - S_{t-1}) \mid Y_T; \theta \right]
&= 1 - \mathrm{E}\!\left[ S_t \mid Y_T; \theta \right] - \mathrm{E}\!\left[ S_{t-1} \mid Y_T; \theta \right] + \mathrm{E}\!\left[ S_t S_{t-1} \mid Y_T; \theta \right] \\
&= 1 - \Pr\!\left[ S_t = 1 \mid Y_T; \theta \right] - \Pr\!\left[ S_{t-1} = 1 \mid Y_T; \theta \right] + \Pr\!\left[ S_t = S_{t-1} = 1 \mid Y_T; \theta \right] \\
&= \Pr\!\left[ S_t = S_{t-1} = 0 \mid Y_T; \theta \right],
\end{aligned}

and similarly \mathrm{E}[S_t (1 - S_{t-1}) \mid Y_T; \theta] = \Pr[S_t = 1, S_{t-1} = 0 \mid Y_T; \theta] and \mathrm{E}[(1 - S_t) S_{t-1} \mid Y_T; \theta] = \Pr[S_t = 0, S_{t-1} = 1 \mid Y_T; \theta]. These probabilities can be calculated with a slight modification of the recursion in (8),

\Pr[S_{t+1} = i, S_t = j \mid Y_T; \theta^{(k-1)}] = \tilde{p}_{ij,t+1} = \xi_{t|t,j} \cdot \frac{\xi_{t+1|T,i}}{\xi_{t+1|t,i}}\, p^{(k-1)}_{ij}. \qquad (22)
The derivative for p00 is given by

\begin{aligned}
\frac{\partial \ell_{EM}\!\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial p_{00}}
&= \frac{\partial \sum_{t=2}^{T} \left( \tilde{p}_{00,t} \log p_{00} + \tilde{p}_{10,t} \log(1 - p_{00}) \right)}{\partial p_{00}} \qquad (23) \\
&= \sum_{t=2}^{T} \left( \frac{\tilde{p}_{00,t}}{p_{00}} - \frac{\tilde{p}_{10,t}}{1 - p_{00}} \right).
\end{aligned}

Setting this expression to zero implies


p_{00}^{(k)} = \frac{\sum_{t=2}^{T} \tilde{p}_{00,t}}{\sum_{t=2}^{T} \left( \tilde{p}_{00,t} + \tilde{p}_{10,t} \right)} = \frac{\sum_{t=2}^{T} \tilde{p}_{00,t}}{\sum_{t=2}^{T} \xi_{t-1|T,0}}. \qquad (24)

This can be generalized to


p_{ij}^{(k)} = \frac{\sum_{t=2}^{T} \tilde{p}_{ij,t}}{\sum_{t=2}^{T} \xi_{t-1|T,j}}, \qquad (25)

which corresponds with (3.45) in Franses and van Dijk (2000).


Finally, we consider the estimate for the ζ parameter, which is easy to derive. The
derivative of interest is
\begin{aligned}
\frac{\partial \ell_{EM}\!\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial \zeta}
&= \frac{\partial \left( \xi_{1|T,0} \log \zeta + \xi_{1|T,1} \log(1 - \zeta) \right)}{\partial \zeta} \qquad (26) \\
&= \frac{\xi_{1|T,0}}{\zeta} - \frac{\xi_{1|T,1}}{1 - \zeta}.
\end{aligned}
Setting this expression to zero we find
\zeta^{(k)} = \xi^{(k-1)}_{1|T,0}. \qquad (27)
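Collecting the updates (19), (21), (25) and (27), the M-step can be sketched as follows. The p̃_{ij,t} are computed as in (22) from the previous transition matrix; all function and variable names are assumptions of this sketch.

```python
def m_step(y, xi_filt, xi_pred, xi_smooth, P_prev):
    """M-step: closed-form updates (19), (21), (25) and (27)."""
    y = np.asarray(y, dtype=float)
    w = xi_smooth                                   # smoothed inference probabilities xi_{t|T}
    mu = np.array([np.sum(w[:, j] * y) / np.sum(w[:, j]) for j in (0, 1)])      # (19)
    sigma = np.array([np.sqrt(np.sum(w[:, j] * (y - mu[j]) ** 2) / np.sum(w[:, j]))
                      for j in (0, 1)])                                          # (21)
    P_new = np.empty((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            # p~_{ij,t}: equation (22), using the previous transition probabilities
            p_tilde = xi_filt[:-1, j] * xi_smooth[1:, i] / xi_pred[1:, i] * P_prev[i, j]
            P_new[i, j] = p_tilde.sum() / w[:-1, j].sum()                        # (25)
    zeta = w[0, 0]                                                               # (27)
    return {"mu": mu, "sigma": sigma, "P": P_new, "zeta": zeta}
```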

3.3 Remarks
1. The EM-algorithm needs starting values θ^{(0)}. In principle, these starting values can be picked at random, as long as they are feasible, i.e., positive volatilities and probabilities between zero and one. It is advisable to make sure that the distribution parameters for regime 0 differ substantially from those for regime 1. For example, take the volatility for regime 1 three or four times that for regime 0. Regimes tend to be persistent, so set the transition probabilities at a high value, say 0.9.

2. The EM-algorithm converges and maximizes the likelihood. This means that each maximization step in the EM-algorithm should yield an improvement. In other words, for each new set of parameters θ^{(k)}, the log likelihood function in (10) should increase. In implementing the algorithm, an important control mechanism is whether ℓ(Y_T; θ^{(k)}) > ℓ(Y_T; θ^{(k-1)}). If not, the EM-algorithm is not implemented correctly.

3. Each step in the EM-algorithm yields an improvement in the likelihood function. This improvement will get smaller and smaller, with parameters that also no longer change very much. So, you have to specify a stopping criterion, which is best formulated as the increase in the log likelihood falling below a threshold (see the sketch below).
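These remarks translate into a small driver loop. A sketch, assuming the em_step function from Section 3.1 and an initial parameter dictionary theta0 chosen along the lines of remark 1.

```python
def estimate_em(y, theta0, tol=1e-8, max_iter=1000):
    """Run the EM-algorithm until the improvement in (10) falls below tol."""
    theta, ll_prev = theta0, -np.inf
    for k in range(max_iter):
        theta_new, ll = em_step(y, theta)        # ll is evaluated at the current theta
        if ll < ll_prev:                         # remark 2: the likelihood must never decrease
            raise RuntimeError("log likelihood decreased; check the implementation")
        if ll - ll_prev < tol:                   # remark 3: stopping criterion
            break
        theta, ll_prev = theta_new, ll
    return theta, ll
```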

4 An example
In the example we look at weekly excess returns on the MSCI US Stock Market Index. For
each week, I have calculated the log return on the index, from which I have subtracted the
1-week risk free rate. The first return is for January 2, 1980 and the last for July 1, 2009.
In total we have 1540 observations (see Kole and van Dijk, 2010, for more details on the
data). The data is available in the file RSExample MSCIUS.xls. The returns are given in
%.

4.1 Inferences
First, we look at the inferences that we make for a given set of parameters. As values for
the parameters we take

µ_0 = 0.04    σ_0 = 1    p_{00} = 0.80    ζ = 0.50
µ_1 = −0.04   σ_1 = 4    p_{11} = 0.80

The means and volatilities are based on the overall sample mean, which was close to zero,
and the overall sample variance which was around two.
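With the filter and smoother sketches from Section 2, the probabilities in Table 1 can be reproduced along the following lines. The variable returns is an assumption of this sketch: it is taken to hold the 1540 weekly excess returns (in %) from the data file.

```python
# Fixed parameter values of this subsection (not yet estimated);
# `returns` is assumed to hold the weekly excess returns in %
mu = (0.04, -0.04)
sigma = (1.0, 4.0)
P = np.array([[0.80, 0.20],
              [0.20, 0.80]])               # p00 = p11 = 0.80
xi_filt, xi_pred, loglik = hamilton_filter(returns, mu, sigma, P, zeta=0.50)
xi_smooth = kim_smoother(xi_filt, xi_pred, P)
print(xi_pred[:10])                        # forecast probabilities, cf. Table 1
print(xi_filt[:10])                        # inference probabilities
print(xi_smooth[:10])                      # smoothed inference probabilities
```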
In Table 1 we see the first ten forecast, inference and smoothed inference probabilities.
The first forecast probabilities are given by ζ and 1 − ζ. Based on the first return of
-1.01923, the inference probabilities are calculated. This return is relatively close to zero,
and fits better with the first regime (low volatility) than the second regime (high volatility).
Therefore the inference probability for state 0 is higher than for state 1. Because of the

Table 1: Inferences for the first ten returns.
forecast inference smoothed inf.
probabilities probabilities probabilities
observation return St = 0 St = 1 St = 0 St = 1 St = 0 St = 1
1 −1.01923 0.50000 0.50000 0.70167 0.29833 0.51467 0.48533
2 2.64830 0.62100 0.37900 0.21490 0.78510 0.27057 0.72943
3 1.54639 0.32894 0.67106 0.40549 0.59451 0.45034 0.54966
4 2.02344 0.44329 0.55671 0.33727 0.66273 0.51982 0.48018
5 0.96257 0.40236 0.59764 0.64486 0.35514 0.72967 0.27033
6 0.04977 0.58691 0.41309 0.85040 0.14960 0.73656 0.26344
7 1.81177 0.71024 0.28976 0.69432 0.30568 0.40332 0.59668
8 −2.47153 0.61659 0.38341 0.24830 0.75170 0.07637 0.92363
9 −4.24477 0.34898 0.65102 0.00038 0.99962 0.00018 0.99982
10 −1.69100 0.20023 0.79977 0.19599 0.80401 0.05800 0.94201
This table shows the first ten returns with their forecast probabilities, inference probabilities and smoothed inference probabilities. The inferences are based on the two-state regime switching model specified in Sec. 1. The parameter values are µ_0 = 0.04, σ_0 = 1, µ_1 = −0.04, σ_1 = 4, p_{00} = 0.80, p_{11} = 0.80 and ζ = 0.50.

persistence of the regimes (p_{00} and p_{11} are high), the forecast probability for state 0 at time 2 is higher than the 0.5 at time 1. Returns at times 2, 3 and 4 match better with the high volatility regime (inference probabilities for regime 1 exceed 0.5). Consequently, when we smooth the series of inference probabilities, the probability for regime 0 at time 1 goes down, from 0.70167 to 0.51467.

4.2 Estimation
We can use the parameters we picked in the previous subsection to start the EM-algorithm
to estimate the model parameters. We set the stopping criterion at an increase in the
log likelihood function in (10) below 10^{-8}. In Table 2 we show how the EM algorithm
proceeds. We see that the likelihood increases with every iteration. The EM-algorithm
needs 48 steps in 0.719 seconds to converge to the optimal solution in this case.
In Table 3 we report the forecast, inference and smoothed inference probabilities for
the first ten returns, based on the parameters estimates produced by the EM-algorithm.
Compared to Table 1, we see the regimes are better defined now: the probabilities are
either close to zero or to one. The inference probabilities signal a possible switch for the
return after 9 weeks, where the probability for regime 1 increases above 0.5. It is still close to 0.5, so based on the 9 weeks of information the regime switching model does not produce certain inferences about the switch. Using all information, the inference is more certain for regime 1, and dates the switch already in week 8.
In Figure 1, we see the smoothed inference probabilities for regime 0 over time. This
low volatility regime prevails during prolonged periods of time, but we also see clear periods
identified as exhibiting high volatility, notably around the crash of October 1987, the Asian
crisis (1997), the Ruble crisis (1998), the burst of the IT-bubble after 2001 and the credit
crisis in 2007-2008.

Table 2: Steps of the EM-algorithm
starting iteration optimal
values 1 2 3 solution
µ0 0.0400 0.1426 0.1980 0.2240 0.1573
σ0 1.0000 1.1445 1.2182 1.2645 1.5594
µ1 −0.0400 −0.1262 −0.1887 −0.2324 −0.2988
σ1 4.0000 3.1417 3.0916 3.1030 3.4068
p00 0.8000 0.8222 0.8345 0.8532 0.9770
p11 0.8000 0.7899 0.8072 0.8195 0.9484
ζ 0.5000 0.5147 0.5585 0.6501 1.0000
ℓ(YT ; θ) −3423.5840 −3352.8306 −3343.2509 −3337.7226 −3310.2279
This table shows the steps of the EM-algorithm, applied to the full sample. Starting values for the parameters are µ_0 = 0.04, σ_0 = 1, µ_1 = −0.04, σ_1 = 4, p_{00} = 0.80, p_{11} = 0.80 and ζ = 0.50. The algorithm stops when the improvement in the log likelihood function falls below 10^{-8}. We show the parameters after
the first three iterations, and the optimal values. For each parameter set we calculate the value of the log
likelihood function in (10).

Table 3: Inferences for the first ten returns, based on estimated parameters.
forecast inference smoothed inf.
probabilities probabilities probabilities
observation return St = 0 St = 1 St = 0 St = 1 St = 0 St = 1
1 −1.01923 1.00000 0.00000 1.00000 0.00000 1.00000 0.00000
2 2.64830 0.97697 0.02303 0.97411 0.02589 0.97756 0.02244
3 1.54639 0.95301 0.04699 0.97184 0.02816 0.95963 0.04037
4 2.02344 0.95091 0.04909 0.96308 0.03692 0.92842 0.07158
5 0.96257 0.94281 0.05719 0.97123 0.02877 0.88600 0.11400
6 0.04977 0.95035 0.04965 0.97671 0.02329 0.79482 0.20518
7 1.81177 0.95542 0.04458 0.96998 0.03002 0.58738 0.41262
8 −2.47153 0.94919 0.05081 0.92354 0.07646 0.26443 0.73557
9 −4.24477 0.90622 0.09378 0.43437 0.56563 0.04898 0.95103
10 −1.69100 0.45357 0.54643 0.49407 0.50593 0.03344 0.96657
This table shows the first ten returns with their forecast probabilities, inference probabilities and smoothed
inference probabilities. The inferences are based on the two-state regime switching model specified in Sec. 1.
The parameters are estimated with the EM-algorithm and reported in Table 2.

Figure 1: Smoothed Inference Probability for Regime 0

[Time-series plot: smoothed inference probability for regime 0 on the vertical axis (0 to 1), weekly observations from 2-1-1980 through 2-1-2009 on the horizontal axis.]

This figure shows the smoothed inference probabilities for regime 0 over time for the US stock market.
The probabilities are constructed using the filter recursion in (6) and (7) and the smoother recursion of
Kim (1994) in (8). The parameters are estimated with the EM-algorithm and reported in Table 2.

References
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1):1–38.

Franses, P. H. and van Dijk, D. (2000). Non-Linear Time Series Models in Empirical Finance.
Cambridge University Press, Cambridge, UK.

Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. Journal of Econo-
metrics, 45(1-2):39–70.

Kim, C.-J. (1994). Dynamic linear models with Markov-switching. Journal of Econometrics,
60(1):1–22.

Kole, E. and van Dijk, D. J. C. (2010). How to predict bull and bear markets? Working paper,
Econometric Institute, Erasmus University Rotterdam, The Netherlands.
