Regime Switching Models: An Example For A Stock Market Index
Erik Kole∗
Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam
April 2010
In this document, I discuss in detail how to estimate regime switching models with an
example based on a US stock market index.
1 Specification
We assume that the returns on the US stock market index, Yt , follow a distribution that
depends on a latent process St . At each point in time, the process St is in one out of two
regimes, which we indicate by St = 0 and St = 1. The return Yt behaves according to
$$Y_t \sim \begin{cases} N(\mu_0, \sigma_0^2) & \text{if } S_t = 0 \\ N(\mu_1, \sigma_1^2) & \text{if } S_t = 1. \end{cases} \qquad (1)$$
In both regimes, the return follows a normal distribution, though with different means and
variances. We use the function f to denote the normal pdf,
$$f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right). \qquad (2)$$
The regime process $S_t$ is a first-order Markov chain. Its transition probabilities give the probability of arriving in regime $i$ at time $t+1$ conditional on being in regime $j$ at time $t$,
$$p_{ij} = \Pr[S_{t+1} = i \mid S_t = j], \qquad i, j \in \{0, 1\}. \qquad (3)$$
The transition probabilities for each departure state $j$ should add up to one, i.e., $p_{00} + p_{10} = 1$ and $p_{01} + p_{11} = 1$. So, for a binary process $S_t$, we have two free parameters, $p_{00}$ and $p_{11}$.
∗ Corresponding author. Address: Burg. Oudlaan 50, Room H11-13, P.O. Box 1738, 3000DR Rotterdam, The Netherlands. Tel. +31 10 408 12 58. E-mail address: [email protected].
We gather the transition probabilities in a transition matrix
$$P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix} = \begin{pmatrix} p_{00} & 1-p_{11} \\ 1-p_{00} & p_{11} \end{pmatrix}. \qquad (4)$$
Since the whole process $S_t$ is unobserved, this also applies to the initial regime $S_1$. We introduce a separate parameter $\zeta$ for the probability that the first regime occurs,
$$\zeta = \Pr[S_1 = 0]. \qquad (5)$$
Naturally, we have $\Pr[S_1 = 1] = 1 - \zeta$. Because no conditional information on $S_0$ is available, we cannot directly use the transition matrix to determine this probability, so we need the extra parameter. This last parameter can be estimated, but it can also be specified exogenously. In this document we assume that it is estimated.
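To make the specification concrete, the following Python sketch simulates returns from the model in (1)-(5). It is an illustration added here, not code from the original example; the parameter values are the starting values used in Section 4, and all names are ad hoc.

```python
import numpy as np

# Minimal sketch: simulate T returns from the two-regime model in (1)-(5).
rng = np.random.default_rng(0)

mu    = np.array([0.04, -0.04])   # regime means mu_0, mu_1
sigma = np.array([1.0, 4.0])      # regime volatilities sigma_0, sigma_1
P = np.array([[0.80, 0.20],       # P[i, j] = Pr[S_{t+1} = i | S_t = j], cf. (3)-(4)
              [0.20, 0.80]])
zeta = 0.50                       # Pr[S_1 = 0], cf. (5)

T = 1540                          # same length as the data set in Section 4
s = np.empty(T, dtype=int)
s[0] = rng.choice(2, p=[zeta, 1 - zeta])
for t in range(1, T):
    s[t] = rng.choice(2, p=P[:, s[t - 1]])   # column s[t-1] holds the transition probs
y = rng.normal(mu[s], sigma[s])              # Y_t | S_t ~ N(mu_{S_t}, sigma_{S_t}^2)
```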
2 Inference on St
The process St is latent, which means that we will never know for sure which regime
prevailed at a certain point in time. However, we can use the information from the current
and past observations, combined with the distributions and transition probabilities to make
an inference on Pr[St = 0|yt , yt−1 , . . . , y1 ]. We accomplish this by using Bayes’ rule,
$$\Pr[A|B] = \frac{\Pr[B|A]\,\Pr[A]}{\Pr[B]}.$$
For the inference of the regime at time $t = 1$, this means
\begin{align*}
\Pr[S_1 = 0|Y_1 = y_1] &= \frac{\Pr[Y_1 = y_1|S_1 = 0] \cdot \Pr[S_1 = 0]}{\Pr[Y_1 = y_1]} \\
&= \frac{\Pr[Y_1 = y_1|S_1 = 0] \cdot \Pr[S_1 = 0]}{\Pr[Y_1 = y_1|S_1 = 0] \cdot \Pr[S_1 = 0] + \Pr[Y_1 = y_1|S_1 = 1] \cdot \Pr[S_1 = 1]} \\
&= \frac{f(y_1; \mu_0, \sigma_0^2) \cdot \zeta}{f(y_1; \mu_0, \sigma_0^2) \cdot \zeta + f(y_1; \mu_1, \sigma_1^2) \cdot (1 - \zeta)}.
\end{align*}
In the second equality, we use conditioning again, because conditional on the regime the
distribution of Y1 is given. We make the distributions explicit in the third equality. In a
similar way, we find an expression for Pr[S1 = 1|Y1 = y1 ], but we can also compute this
using Pr[S1 = 1|Y1 = y1 ] = 1 − Pr[S1 = 0|Y1 = y1 ].
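As a quick sanity check, this formula can be evaluated by hand. The sketch below (mine; normpdf is an ad hoc helper) uses the parameter values of Section 4 and the first return of the data set, and reproduces the inference probability 0.70167 reported in Table 1.

```python
from math import exp, pi, sqrt

def normpdf(y, mu, sigma):
    # Normal density f(y; mu, sigma^2), cf. (2)
    return exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

y1, zeta = -1.01923, 0.50
num = normpdf(y1, 0.04, 1.0) * zeta                  # f(y_1; mu_0, sigma_0^2) * zeta
den = num + normpdf(y1, -0.04, 4.0) * (1 - zeta)
print(num / den)   # ~0.70167, the inference probability Pr[S_1 = 0 | y_1] in Table 1
```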
After computing the inferences for the regimes at time 1, we can use them to make a
forecast for the regime distribution at time 2,
\begin{align*}
\Pr[S_2 = 0|Y_1 = y_1] &= \Pr[S_2 = 0|S_1 = 0, Y_1 = y_1] \cdot \Pr[S_1 = 0|Y_1 = y_1] \\
&\quad + \Pr[S_2 = 0|S_1 = 1, Y_1 = y_1] \cdot \Pr[S_1 = 1|Y_1 = y_1] \\
&= \Pr[S_2 = 0|S_1 = 0] \cdot \Pr[S_1 = 0|Y_1 = y_1] + \Pr[S_2 = 0|S_1 = 1] \cdot \Pr[S_1 = 1|Y_1 = y_1] \\
&= p_{00} \Pr[S_1 = 0|Y_1 = y_1] + p_{01} \Pr[S_1 = 1|Y_1 = y_1].
\end{align*}
In the first equality we condition on the regime at time 1. In the second equality we use
the fact that St follows a first order Markov chain independent of the process Yt . Again, we
can similarly derive Pr[S2 = 1|Y1 = y1 ] or use Pr[S2 = 1|Y1 = y1 ] = 1 − Pr[S2 = 0|Y1 = y1 ].
The steps of calculating inference and forecast probabilities define a recursion. Based on the forecast probabilities for time 2 and the observation $y_2$ we can calculate inference probabilities for the regime at time 2. In turn, we use these inferences for forecasts for the regime at time 3. We can write these recursions more compactly in vector-matrix notation. We use $\xi_{t|t} = \Pr[S_t|y_t, y_{t-1}, \ldots, y_1]$ to denote the vector of inference probabilities at time $t$, and $\xi_{t+1|t} = \Pr[S_{t+1}|y_t, y_{t-1}, \ldots, y_1]$ for the forecast probabilities for time $t+1$, using information up to time $t$. We gather the densities of observation $y_t$ conditional on the regimes in a vector $f_t$. We can construct the series of inference and forecast probabilities by the recursion
$$\xi_{t|t} = \frac{1}{\xi_{t|t-1}' f_t} \left( \xi_{t|t-1} \odot f_t \right) \qquad (6)$$
$$\xi_{t+1|t} = P \, \xi_{t|t}, \qquad (7)$$
where $\odot$ denotes element-by-element multiplication; the recursion is started from $\xi_{1|0} = (\zeta, 1-\zeta)'$. Besides the inference probabilities $\xi_{t|t}$, which use information up to time $t$ only, we can construct smoothed inference probabilities $\xi_{t|T}$ that use all information in the sample. They follow from the backward recursion
$$\xi_{t|T} = \xi_{t|t} \odot \left( P' \left( \xi_{t+1|T} \oslash \xi_{t+1|t} \right) \right), \qquad t = T-1, \ldots, 1, \qquad (8)$$
where $\oslash$ denotes element-by-element division and $\xi_{T|T}$ serves as the starting point (see Kim, 1994, §2.2, for a derivation). This recursion is called the smoother recursion. We use the smoothed inference probabilities mostly to show how the regimes are identified.
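The recursions (6)-(8) translate almost line by line into code. The following Python sketch (an illustration with ad hoc names, not the original implementation; it assumes scipy is available) runs the filter forward and the smoother backward for given parameters, and also accumulates the log likelihood needed in Section 3.

```python
import numpy as np
from scipy.stats import norm

def filter_and_smooth(y, mu, sigma, P, zeta):
    """Hamilton filter (6)-(7) and Kim smoother (8) for a two-regime model.
    Convention: P[i, j] = Pr[S_{t+1} = i | S_t = j]."""
    T = len(y)
    f = np.column_stack([norm.pdf(y, mu[0], sigma[0]),
                         norm.pdf(y, mu[1], sigma[1])])   # densities f_t per regime
    xi_pred = np.empty((T, 2))      # forecast probabilities xi_{t|t-1}
    xi_filt = np.empty((T, 2))      # inference probabilities xi_{t|t}
    xi_pred[0] = [zeta, 1 - zeta]   # initial forecast uses zeta, cf. (5)
    loglik = 0.0
    for t in range(T):
        num = xi_pred[t] * f[t]                 # xi_{t|t-1} (elementwise) f_t
        denom = num.sum()                       # xi_{t|t-1}' f_t
        loglik += np.log(denom)                 # contribution to (10)
        xi_filt[t] = num / denom                # (6)
        if t < T - 1:
            xi_pred[t + 1] = P @ xi_filt[t]     # (7)
    xi_smooth = np.empty((T, 2))                # smoothed probabilities xi_{t|T}
    xi_smooth[-1] = xi_filt[-1]
    for t in range(T - 2, -1, -1):              # Kim smoother (8), backward
        xi_smooth[t] = xi_filt[t] * (P.T @ (xi_smooth[t + 1] / xi_pred[t + 1]))
    return xi_pred, xi_filt, xi_smooth, loglik
```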
3 Estimation
We can estimate the parameters of the regime switching model using a maximum likelihood approach. As with other conditional models such as ARMA or GARCH models, the likelihood function takes a conditional form, too. We gather the parameters of the model in a vector $\theta = (\mu_0, \sigma_0, \mu_1, \sigma_1, p_{00}, p_{11}, \zeta)'$. The conditional likelihood function is given by
$$L(y_1, y_2, \ldots, y_T; \theta) = \prod_{t=1}^{T} \Pr[Y_t = y_t | y_{t-1}, y_{t-2}, \ldots, y_1]. \qquad (9)$$
Conditioning on the regime at time $t$, we find
\begin{align*}
\Pr[Y_t = y_t | y_1, \ldots, y_{t-1}] &= \Pr[Y_t = y_t | S_t = 0, y_1, \ldots, y_{t-1}] \cdot \Pr[S_t = 0 | y_1, \ldots, y_{t-1}] \\
&\quad + \Pr[Y_t = y_t | S_t = 1, y_1, \ldots, y_{t-1}] \cdot \Pr[S_t = 1 | y_1, \ldots, y_{t-1}] \\
&= \Pr[Y_t = y_t | S_t = 0] \cdot \xi_{t|t-1,0} + \Pr[Y_t = y_t | S_t = 1] \cdot \xi_{t|t-1,1} \\
&= \xi_{t|t-1}' f_t.
\end{align*}
In the second equality, we use the information that the distribution of $Y_t|S_t$ does not depend on further prior realizations. The conditional log likelihood function can thus be calculated as
$$\ell(y_1, y_2, \ldots, y_T; \theta) = \sum_{t=1}^{T} \log\left( \xi_{t|t-1}' f_t \right). \qquad (10)$$
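In principle, (10) can be handed directly to a numerical optimizer. A minimal sketch, reusing the filter_and_smooth helper above; the bounds and optimizer choice are my assumptions, not prescriptions from this document.

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(params, y):
    # Unpack theta = (mu_0, sigma_0, mu_1, sigma_1, p00, p11, zeta), cf. Section 3
    mu0, s0, mu1, s1, p00, p11, zeta = params
    P = np.array([[p00, 1 - p11], [1 - p00, p11]])
    *_, loglik = filter_and_smooth(y, [mu0, mu1], [s0, s1], P, zeta)
    return -loglik

# theta0 holds the starting values of Section 4; bounds keep the sigmas positive
# and the probabilities inside the unit interval.
theta0 = [0.04, 1.0, -0.04, 4.0, 0.80, 0.80, 0.50]
bounds = [(None, None), (1e-6, None), (None, None), (1e-6, None),
          (1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6)]
# res = minimize(negloglik, theta0, args=(y,), bounds=bounds, method="L-BFGS-B")
```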
Now suppose for the moment that we could observe the regimes $s_t$ as well. The joint density of $(y_t, s_t)$, conditional on the previous regime $s_{t-1}$, is then given by
\begin{align*}
\Pr[Y_t = y_t, S_t = s_t | S_{t-1} = s_{t-1}; \theta] = \; & \left( f(y_t; \mu_0, \sigma_0^2) \, p_{00} \right)^{(1-s_t)(1-s_{t-1})} \times \\
& \left( f(y_t; \mu_0, \sigma_0^2) (1 - p_{11}) \right)^{(1-s_t)s_{t-1}} \times \\
& \left( f(y_t; \mu_1, \sigma_1^2) (1 - p_{00}) \right)^{s_t(1-s_{t-1})} \times \\
& \left( f(y_t; \mu_1, \sigma_1^2) \, p_{11} \right)^{s_t s_{t-1}}. \qquad (11)
\end{align*}
We see that the density of $(y_t, s_t)$ combines the fact that, conditionally, $y_t$ follows a normal distribution with the fact that $s_t$ follows a Bernoulli distribution, conditionally on its previous realization $s_{t-1}$.
When we construct the log likelihood function of the joint observations $(Y_T, S_T)$, we need the log of (11),
\begin{align*}
\ell_{Y,S}(Y_T, S_T; \theta) = \sum_{t=2}^{T} \Big[ & (1-s_t)(1-s_{t-1}) \left( \log f(y_t; \mu_0, \sigma_0^2) + \log p_{00} \right) \\
+ \; & (1-s_t)s_{t-1} \left( \log f(y_t; \mu_0, \sigma_0^2) + \log(1 - p_{11}) \right) \\
+ \; & s_t(1-s_{t-1}) \left( \log f(y_t; \mu_1, \sigma_1^2) + \log(1 - p_{00}) \right) \\
+ \; & s_t s_{t-1} \left( \log f(y_t; \mu_1, \sigma_1^2) + \log p_{11} \right) \Big] + \log \Pr[Y_1 = y_1, S_1 = s_1; \theta]. \qquad (12)
\end{align*}
A small alteration must be made for the density $\Pr[Y_1 = y_1, S_1 = s_1; \theta]$, since no history is available there. So, instead of the Markov chain parameters $p_{00}$ and $p_{11}$ we find an expression with the parameter $\zeta$,
$$\Pr[Y_1 = y_1, S_1 = s_1; \theta] = \left( f(y_1; \mu_0, \sigma_0^2) \, \zeta \right)^{1-s_1} \left( f(y_1; \mu_1, \sigma_1^2) (1 - \zeta) \right)^{s_1}.$$
This log likelihood function would be much easier to optimize than the actual log likelihood
function in (10), because (12) does not exhibit a recursive relation. However, we cannot
observe St .
The EM-algorithm proposes to base the estimation on (12). Because we do not have actual observations on $S_t$, the EM-algorithm maximizes the expectation of the log likelihood function in (12), conditional on the data that we do observe, $Y_T$. So, instead of working with $s_t$, we work with the expectation of $S_t$ conditional on the data and the parameters,
$$E[S_t|Y_T; \theta] = \Pr[S_t = 0|Y_T; \theta] \cdot 0 + \Pr[S_t = 1|Y_T; \theta] \cdot 1 = \Pr[S_t = 1|Y_T; \theta]. \qquad (13)$$
Similarly, for the cross products of adjacent regimes we work with
$$E[S_t S_{t-1}|Y_T; \theta] = \Pr[S_t = S_{t-1} = 1|Y_T; \theta]. \qquad (14)$$
This approach would almost retain the attractive structure of the log likelihood function
in (12). Almost, as the expectations of St and St St−1 depend on θ and are calculated again
via the recursion in (8). The trick of the EM-algorithm is to treat the expectation part
and the maximization separately. So, for a given parameter vector θ, the expectations in
(13) and (14) are calculated. Then, these expectations are treated as given, and a new
parameter vector θ ∗ is calculated which maximizes the expected log likelihood function.
Of course, this new parameter vector gives rise to other expectations, which in turn lead
to a new parameter vector. So, instead of one direct maximum likelihood estimation, we
conduct a series of expectation maximization steps, which produce a series of parameter
estimates θ (k)
$$\theta^{(k)} = \arg\max_{\theta} \; E\left[ \ell_{Y,S}(Y_T, S_T; \theta) \,|\, Y_T; \theta^{(k-1)} \right]. \qquad (15)$$
Dempster et al. (1977) and Hamilton (1990) show that this sequence of θ (k) converges and
produces a maximum of (10). As always, this maximum can be local, and may depend on
starting values θ (0) .
We denote this expected log likelihood function by $\ell_{EM}$. Taking the expectation of (12) conditional on $Y_T$ and $\theta^{(k-1)}$ replaces the unobserved regime indicators by smoothed probabilities,
\begin{align*}
\ell_{EM}\left( Y_T; \theta, \theta^{(k-1)} \right) = \; & \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \log f(y_t; \mu_0, \sigma_0^2) + \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,1} \log f(y_t; \mu_1, \sigma_1^2) \\
+ \; & \sum_{t=2}^{T} \tilde p^{(k-1)}_{00,t} \log p_{00} + \sum_{t=2}^{T} \tilde p^{(k-1)}_{01,t} \log(1 - p_{11}) \\
+ \; & \sum_{t=2}^{T} \tilde p^{(k-1)}_{10,t} \log(1 - p_{00}) + \sum_{t=2}^{T} \tilde p^{(k-1)}_{11,t} \log p_{11} \\
+ \; & \xi^{(k-1)}_{1|T,0} \log \zeta + \xi^{(k-1)}_{1|T,1} \log(1 - \zeta), \qquad (16)
\end{align*}
where $\xi^{(k-1)}_{t|T,i} = \Pr[S_t = i|Y_T; \theta^{(k-1)}]$ denotes the smoothed inference probability computed under $\theta^{(k-1)}$, and $\tilde p^{(k-1)}_{ij,t} = \Pr[S_t = i, S_{t-1} = j|Y_T; \theta^{(k-1)}]$ the corresponding smoothed transition probability, derived below in (22). The updated parameters $\theta^{(k)}$ maximize this expected log likelihood function, so they satisfy the first order conditions
$$\left. \frac{\partial \ell_{EM}\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial \theta} \right|_{\theta = \theta^{(k)}} = 0. \qquad (17)$$
Taking a closer look at (16), we see that the expected log likelihood function can be split into terms that each relate exclusively to specific parameters. The parameters of the distribution for the first regime, $\mu_0$ and $\sigma_0^2$, are only related to the first term, and the parameters of the distribution for the second regime only to the second. The transition probability $p_{00}$ is related to the third and fifth term, and so on. So differentiation will produce relatively simple conditions.
We first look at differentiating (16) with respect to $\mu_0$. We will use $\xi^{(k-1)}_{t|T,0}$ to denote $\Pr[S_t = 0|Y_T; \theta^{(k-1)}] = 1 - E[S_t|Y_T; \theta^{(k-1)}]$, which is the smoothed inference probability that we find when we apply the filter and smoother recursions in (6)-(8) with parameters $\theta^{(k-1)}$. We find
\begin{align*}
\frac{\partial \ell_{EM}\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial \mu_0} &= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \log f(y_t; \mu_0, \sigma_0^2)}{\partial \mu_0} \\
&= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( -\frac{1}{2} \log 2\pi - \log \sigma_0 - \frac{(y_t - \mu_0)^2}{2\sigma_0^2} \right)}{\partial \mu_0} \qquad (18) \\
&= \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \, \frac{y_t - \mu_0}{\sigma_0^2}.
\end{align*}
For the optimal $\mu_0^{(k)}$ this expression equals zero, which means that we find
$$\mu_0^{(k)} = \frac{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \, y_t}{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0}}. \qquad (19)$$
This estimate for $\mu_0$ can be interpreted as a weighted average of the observations, where the smoothed inference probabilities for regime 0 serve as weights. It is a clear extension of the standard maximum likelihood estimator for the mean of a normal distribution. For $\mu_1^{(k)}$ we find a similar expression, with $\xi^{(k-1)}_{t|T,1}$ instead of $\xi^{(k-1)}_{t|T,0}$.
Next we consider the estimate for $\sigma_0^2$. Differentiation yields
\begin{align*}
\frac{\partial \ell_{EM}\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial \sigma_0} &= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \log f(y_t; \mu_0, \sigma_0^2)}{\partial \sigma_0} \\
&= \frac{\partial \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( -\frac{1}{2} \log 2\pi - \log \sigma_0 - \frac{(y_t - \mu_0)^2}{2\sigma_0^2} \right)}{\partial \sigma_0} \qquad (20) \\
&= \sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( \frac{(y_t - \mu_0)^2}{\sigma_0^3} - \frac{1}{\sigma_0} \right).
\end{align*}
The optimal $\sigma_0^{(k)}$ sets this expression to zero, so
$$\sigma_0^{(k)} = \sqrt{ \frac{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0} \left( y_t - \mu_0^{(k)} \right)^2}{\sum_{t=1}^{T} \xi^{(k-1)}_{t|T,0}} }, \qquad (21)$$
which is again a probability-weighted version of the standard maximum likelihood estimator. For $\sigma_1^{(k)}$ we find a similar expression with $\xi^{(k-1)}_{t|T,1}$.
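The updates (19) and (21) are one-liners in code. A sketch under the same ad hoc naming as before, where xi_smooth comes from the filter_and_smooth helper:

```python
import numpy as np

def update_mu_sigma(y, xi_smooth):
    """M-step updates (19) and (21) for both regimes.
    xi_smooth[t, i] = smoothed probability of regime i at time t."""
    w = xi_smooth / xi_smooth.sum(axis=0)    # normalized weights per regime
    mu = w.T @ y                             # weighted means, cf. (19)
    sigma = np.sqrt(np.array([w[:, i] @ (y - mu[i]) ** 2
                              for i in range(2)]))   # weighted volatilities, cf. (21)
    return mu, sigma
```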
In a similar way we can derive the estimates for p00 and p11 . Before we derive these
estimates, note that
\begin{align*}
E\left[ (1 - S_t)(1 - S_{t-1}) | Y_T; \theta \right] &= 1 - E\left[ S_t | Y_T; \theta \right] - E\left[ S_{t-1} | Y_T; \theta \right] + E\left[ S_t S_{t-1} | Y_T; \theta \right] \\
&= 1 - \Pr\left[ S_t = 1 | Y_T; \theta \right] - \Pr\left[ S_{t-1} = 1 | Y_T; \theta \right] + \Pr\left[ S_t = S_{t-1} = 1 | Y_T; \theta \right] \\
&= \Pr\left[ S_t = S_{t-1} = 0 | Y_T; \theta \right],
\end{align*}
and similarly $E[S_t(1 - S_{t-1})|Y_T; \theta] = \Pr[S_t = 1, S_{t-1} = 0|Y_T; \theta]$ and $E[(1 - S_t)S_{t-1}|Y_T; \theta] = \Pr[S_t = 0, S_{t-1} = 1|Y_T; \theta]$. These probabilities can be calculated with a slight modification of the recursion in (8),
$$\Pr[S_{t+1} = i, S_t = j | Y_T; \theta^{(k-1)}] = \tilde p^{(k-1)}_{ij,t+1} = \frac{\xi_{t+1|T,i}}{\xi_{t+1|t,i}} \, \xi_{t|t,j} \, p_{ij}, \qquad (22)$$
where all probabilities on the right-hand side are evaluated at $\theta^{(k-1)}$.
The derivative for $p_{00}$ is given by
$$\frac{\partial \ell_{EM}\left( Y_T; \theta, \theta^{(k-1)} \right)}{\partial p_{00}} = \sum_{t=2}^{T} \left( \frac{\tilde p^{(k-1)}_{00,t}}{p_{00}} - \frac{\tilde p^{(k-1)}_{10,t}}{1 - p_{00}} \right),$$
and setting it to zero yields
$$p_{00}^{(k)} = \frac{\sum_{t=2}^{T} \tilde p^{(k-1)}_{00,t}}{\sum_{t=2}^{T} \left( \tilde p^{(k-1)}_{00,t} + \tilde p^{(k-1)}_{10,t} \right)} = \frac{\sum_{t=2}^{T} \tilde p^{(k-1)}_{00,t}}{\sum_{t=2}^{T} \xi^{(k-1)}_{t-1|T,0}},$$
the expected number of stays in regime 0 relative to the expected number of visits to regime 0. The update for $p_{11}$ is analogous, with $\tilde p^{(k-1)}_{11,t}$ and $\xi^{(k-1)}_{t-1|T,1}$. Finally, the first order condition for $\zeta$ gives $\zeta^{(k)} = \xi^{(k-1)}_{1|T,0}$.
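The joint probabilities in (22) and the resulting updates for $p_{00}$, $p_{11}$ and $\zeta$ can be coded as follows (again an illustrative sketch under the naming conventions introduced earlier):

```python
import numpy as np

def update_transitions(xi_pred, xi_filt, xi_smooth, P):
    """Compute p_tilde[t, i, j] = Pr[S_t = i, S_{t-1} = j | Y_T], cf. (22),
    and the EM updates for p00, p11 and zeta."""
    T = len(xi_filt)
    p_tilde = np.zeros((T, 2, 2))
    for t in range(1, T):
        ratio = xi_smooth[t] / xi_pred[t]            # xi_{t|T,i} / xi_{t|t-1,i}
        p_tilde[t] = ratio[:, None] * P * xi_filt[t - 1][None, :]
    counts = p_tilde[1:].sum(axis=0)                 # expected transition counts
    p00 = counts[0, 0] / counts[:, 0].sum()          # stays in 0 / visits to 0
    p11 = counts[1, 1] / counts[:, 1].sum()
    zeta = xi_smooth[0, 0]                           # update for zeta
    return p00, p11, zeta
```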
3.3 Remarks
1. The EM-algorithm needs starting values $\theta^{(0)}$. In principle, these starting values can be picked at random, as long as they are feasible, i.e., positive volatilities and probabilities between zero and one. It is advisable to make sure that the distribution parameters for regime 0 differ substantially from those for regime 1. For example, take the volatility for regime 1 three or four times that for regime 0. Regimes tend to be persistent, so set the transition probabilities at a high value, say 0.9.
2. The EM-algorithm converges and maximizes the likelihood. This means that each maximization step in the EM-algorithm should yield an improvement. In other words, for each new set of parameters $\theta^{(k)}$, the log likelihood function in (10) should increase. In implementing the algorithm, an important control mechanism is therefore to check whether $\ell(Y_T; \theta^{(k)}) > \ell(Y_T; \theta^{(k-1)})$. If not, the EM-algorithm is not implemented correctly.
3. Each step in the EM-algorithm yields an improvement in the likelihood function. This improvement will get smaller and smaller, with parameters that also change less and less. So, you have to specify a stopping criterion, which is best formulated as the increase in the log likelihood falling below a threshold. The sketch after this list puts these pieces together.
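Combining the helpers above gives a complete EM iteration. The sketch below is illustrative; the tolerance mirrors the 10^{-8} criterion used in Section 4.2, and the assert implements the control mechanism of remark 2.

```python
import numpy as np

def em_estimate(y, mu, sigma, P, zeta, tol=1e-8, max_iter=1000):
    """Iterate E- and M-steps until the log likelihood improvement falls below tol."""
    loglik_old = -np.inf
    for k in range(max_iter):
        # E-step: filter, smoother and joint probabilities at the current parameters
        xi_pred, xi_filt, xi_smooth, loglik = filter_and_smooth(y, mu, sigma, P, zeta)
        assert loglik >= loglik_old - 1e-12, "likelihood decreased: check implementation"
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
        # M-step: closed-form updates (19), (21) and the transition updates from (22)
        mu, sigma = update_mu_sigma(y, xi_smooth)
        p00, p11, zeta = update_transitions(xi_pred, xi_filt, xi_smooth, P)
        P = np.array([[p00, 1 - p11], [1 - p00, p11]])
    return mu, sigma, P, zeta, loglik
```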
4 An example
In the example we look at weekly excess returns on the MSCI US Stock Market Index. For
each week, I have calculated the log return on the index, from which I have subtracted the
1-week risk free rate. The first return is for January 2, 1980 and the last for July 1, 2009.
In total we have 1540 observations (see Kole and van Dijk, 2010, for more details on the
data). The data is available in the file RSExample MSCIUS.xls. The returns are given in
%.
4.1 Inferences
First, we look at the inferences that we make for a given set of parameters. As values for the parameters we take $\mu_0 = 0.04$, $\sigma_0 = 1$, $\mu_1 = -0.04$, $\sigma_1 = 4$, $p_{00} = 0.80$, $p_{11} = 0.80$ and $\zeta = 0.50$. The means and volatilities are based on the overall sample mean, which was close to zero, and the overall sample variance, which was around two.
In Table 1 we see the first ten forecast, inference and smoothed inference probabilities.
The first forecast probabilities are given by ζ and 1 − ζ. Based on the first return of
-1.01923, the inference probabilities are calculated. This return is relatively close to zero,
and fits better with the first regime (low volatility) than the second regime (high volatility).
Therefore the inference probability for state 0 is higher than for state 1.
Table 1: Inferences for the first ten returns.
forecast inference smoothed inf.
probabilities probabilities probabilities
observation return St = 0 St = 1 St = 0 St = 1 St = 0 St = 1
1 −1.01923 0.50000 0.50000 0.70167 0.29833 0.51467 0.48533
2 2.64830 0.62100 0.37900 0.21490 0.78510 0.27057 0.72943
3 1.54639 0.32894 0.67106 0.40549 0.59451 0.45034 0.54966
4 2.02344 0.44329 0.55671 0.33727 0.66273 0.51982 0.48018
5 0.96257 0.40236 0.59764 0.64486 0.35514 0.72967 0.27033
6 0.04977 0.58691 0.41309 0.85040 0.14960 0.73656 0.26344
7 1.81177 0.71024 0.28976 0.69432 0.30568 0.40332 0.59668
8 −2.47153 0.61659 0.38341 0.24830 0.75170 0.07637 0.92363
9 −4.24477 0.34898 0.65102 0.00038 0.99962 0.00018 0.99982
10 −1.69100 0.20023 0.79977 0.19599 0.80401 0.05800 0.94201
This table shows the first ten returns with their forecast probabilities, inference probabilities and smoothed inference probabilities. The inferences are based on the two-state regime switching model specified in Sec. 1. The parameter values are µ0 = 0.04, σ0 = 1, µ1 = −0.04, σ1 = 4, p00 = 0.80, p11 = 0.80 and ζ = 0.50.
Because of the persistence of the regimes ($p_{00}$ and $p_{11}$ are high), the forecast probability for state 0 at time 2 is higher than the 0.5 at time 1. The returns at times 2, 3 and 4 match better with the high volatility regime (their inference probabilities for regime 1 exceed 0.5). Consequently, when we smooth the series of inference probabilities, the probability for regime 0 at time 1 goes down, from 0.70167 to 0.51467.
4.2 Estimation
We can use the parameters we picked in the previous subsection to start the EM-algorithm
to estimate the model parameters. We set the stopping criterion at an increase in the log likelihood function in (10) below $10^{-8}$. In Table 2 we show how the EM-algorithm proceeds. We see that the likelihood increases with every iteration. The EM-algorithm needs 48 steps in 0.719 seconds to converge to the optimal solution in this case.
In Table 3 we report the forecast, inference and smoothed inference probabilities for the first ten returns, based on the parameter estimates produced by the EM-algorithm. Compared to Table 1, we see that the regimes are better defined now: the probabilities are either close to zero or close to one. The inference probabilities signal a possible switch for the return after 9 weeks, where the probability for regime 1 increases above 0.5. It is still close to 0.5, so based on the 9 weeks of information the regime switching model does not produce certain inferences about the switch. Using all information, the inference is more certain about regime 1, and dates the switch already in week 8.
In Figure 1, we see the smoothed inference probabilities for regime 0 over time. This
low volatility regime prevails during prolonged periods of time, but we also see clear periods
identified as exhibiting high volatility, notably around the crash of October 1987, the Asian
crisis (1997), the Ruble crisis (1998), the burst of the IT-bubble after 2001 and the credit
crisis in 2007-2008.
Table 2: Steps of the EM-algorithm
starting iteration optimal
values 1 2 3 solution
µ0 0.0400 0.1426 0.1980 0.2240 0.1573
σ0 1.0000 1.1445 1.2182 1.2645 1.5594
µ1 −0.0400 −0.1262 −0.1887 −0.2324 −0.2988
σ1 4.0000 3.1417 3.0916 3.1030 3.4068
p00 0.8000 0.8222 0.8345 0.8532 0.9770
p11 0.8000 0.7899 0.8072 0.8195 0.9484
ζ 0.5000 0.5147 0.5585 0.6501 1.0000
ℓ(YT ; θ) −3423.5840 −3352.8306 −3343.2509 −3337.7226 −3310.2279
This table shows the steps of the EM-algorithm, applied to the full sample. Starting values for the parameters are µ0 = 0.04, σ0 = 1, µ1 = −0.04, σ1 = 4, p00 = 0.80, p11 = 0.80 and ζ = 0.50. The algorithm stops when the improvement in the log likelihood function falls below $10^{-8}$. We show the parameters after the first three iterations, and the optimal values. For each parameter set we calculate the value of the log likelihood function in (10).
Table 3: Inferences for the first ten returns, based on estimated parameters.
forecast inference smoothed inf.
probabilities probabilities probabilities
observation return St = 0 St = 1 St = 0 St = 1 St = 0 St = 1
1 −1.01923 1.00000 0.00000 1.00000 0.00000 1.00000 0.00000
2 2.64830 0.97697 0.02303 0.97411 0.02589 0.97756 0.02244
3 1.54639 0.95301 0.04699 0.97184 0.02816 0.95963 0.04037
4 2.02344 0.95091 0.04909 0.96308 0.03692 0.92842 0.07158
5 0.96257 0.94281 0.05719 0.97123 0.02877 0.88600 0.11400
6 0.04977 0.95035 0.04965 0.97671 0.02329 0.79482 0.20518
7 1.81177 0.95542 0.04458 0.96998 0.03002 0.58738 0.41262
8 −2.47153 0.94919 0.05081 0.92354 0.07646 0.26443 0.73557
9 −4.24477 0.90622 0.09378 0.43437 0.56563 0.04898 0.95103
10 −1.69100 0.45357 0.54643 0.49407 0.50593 0.03344 0.96657
This table shows the first ten returns with their forecast probabilities, inference probabilities and smoothed inference probabilities. The inferences are based on the two-state regime switching model specified in Sec. 1. The parameters are estimated with the EM-algorithm and reported in Table 2.
Figure 1: Smoothed Inference Probability for Regime 0
[Time series plot of the smoothed inference probability for regime 0; vertical axis: probability from 0 to 1; horizontal axis: weekly dates from 2-1-1980 through 2-1-2009.]
This figure shows the smoothed inference probabilities for regime 0 over time for the US stock market.
The probabilities are constructed using the filter recursion in (6) and (7) and the smoother recursion of
Kim (1994) in (8). The parameters are estimated with the EM-algorithm and reported in Table 2.
References
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1):1–38.
Franses, P. H. and van Dijk, D. (2000). Non-Linear Time Series Models in Empirical Finance.
Cambridge University Press, Cambridge, UK.
Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. Journal of Econo-
metrics, 45(1-2):39–70.
Kim, C.-J. (1994). Dynamic linear models with Markov-switching. Journal of Econometrics, 60(1):1–22.
Kole, E. and van Dijk, D. J. C. (2010). How to predict bull and bear markets? Working paper,
Econometric Institute, Erasmus University Rotterdam, The Netherlands.