Estimation of the transition matrix of a discrete-time
Summary
Discrete-time Markov chains have been successfully used to investigate treatment programs and health care
protocols for chronic diseases. In these situations, the transition matrix, which describes the natural progression of
the disease, is often estimated from a cohort observed at common intervals. Estimation of the matrix, however, is
often complicated by the complex relationship among transition probabilities. This paper summarizes methods to
obtain the maximum likelihood estimate of the transition matrix in three situations: when the cycle length of the model
coincides with the observation interval, when the cycle length does not coincide with the observation interval, and when the
observation intervals are unequal in length. In addition, the bootstrap is discussed as a method to assess the uncertainty of the
maximum likelihood estimate and to construct confidence intervals for functions of the transition matrix such as
expected survival. Copyright © 2002 John Wiley & Sons, Ltd.
*Correspondence to: Department of Statistics, 1399 Mathematical Sciences, Purdue University, West Lafayette, IN 47907-1399,
USA. Tel.: +765-494-6043; fax: +765-494-0558; e-mail: [email protected]
Copyright © 2002 John Wiley & Sons, Ltd. Health Econ. 11: 33–42 (2002)
Discrete-Time Markov Chain 35
This approach is only appropriate if restricted to a single probability. With a Markov chain, a transition across two cycles involves a complex combination of transition probabilities (through matrix multiplication). Appropriate methods need to take into account this dependent structure of the transition probabilities over cycles.

Estimation

In this section, it is assumed that the transition matrix will be estimated from longitudinal cohort data with observation intervals common to all subjects. Attention is restricted to obtaining the maximum likelihood estimate of the transition matrix for three specific cases of increasing complexity. The first case is when the observation intervals are constant and coincide with the cycle length. The second case is when the observation intervals are constant but do not coincide with the cycle length. The method discussed for this case can only be used in certain situations; when it cannot, the method discussed for the third case can be used instead. The third case represents the most common situation, in which the observation intervals are not equal in length. The cycle length may or may not coincide with one of these intervals.

Observation intervals coincide

Suppose you have a disease with $h$ distinct health states. You want to estimate a two-year transition matrix and the data are from a cohort that was followed for four years with two two-year observation intervals. The three health states for individual $i$ are labeled $s_{i0}$, $s_{i2}$ and $s_{i4}$.

In this case, the observed two-year intervals coincide with the desired two-year transition matrix. Because the model is homogeneous, the observed transitions between the first two years can be pooled with the transitions between the second two years to form an observed two-year transition count matrix:

\[
N = \begin{pmatrix}
n_{11} & n_{12} & \cdots & n_{1h} \\
n_{21} & n_{22} & \cdots & n_{2h} \\
\vdots & \vdots & \ddots & \vdots \\
n_{h1} & n_{h2} & \cdots & n_{hh}
\end{pmatrix},
\]

where $n_{rc}$ is the number of occurrences where $s_{i0} = r$ and $s_{i2} = c$, or $s_{i2} = r$ and $s_{i4} = c$.

Given the observed count matrix, the maximum likelihood estimate of the transition matrix is simply the row proportions of $N$,

\[
\hat{M} = \{\hat{\theta}_{rc}\} \quad \text{where} \quad \hat{\theta}_{rc} = n_{rc} \Big/ \sum_{j=1}^{h} n_{rj}.
\]

This estimation technique is commonly used and is presented here as a reference for the other two situations [9].

Observation intervals do not coincide

Let $L_o$ be the common observation interval and $L_d$ the desired cycle length. The maximum likelihood estimate of the transition matrix $\hat{M}_o$, associated with the cycle length $L_o$, is obtained using the methods of 'Observation intervals coincide'. By the invariance property, the maximum likelihood estimate of the transition matrix associated with cycle length $L_d$ is

\[
\hat{M}_d = \hat{M}_o^{\,t}
\]

where $t = L_d / L_o$. For example, if in the previous example a one-year rather than a two-year transition matrix were desired ($L_o = 2$ and $L_d = 1$), one would take the square root of the estimated two-year transition matrix ($t = 0.5$). Computation of this matrix is straightforward from the decomposition of $\hat{M}_o$ into its eigenvalues and eigenvectors (spectral decomposition). Based on this decomposition, the $h \times h$ matrix $\hat{M}_o$ can be expressed as

\[
\hat{M}_o = P D P^{-1} \quad \text{where} \quad
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \lambda_h
\end{pmatrix}
\]

and $\lambda_i$ is the $i$th eigenvalue and its associated eigenvector is the $i$th column of $P$. It then follows that

\[
\hat{M}_o^{\,t} = P D^{t} P^{-1} \quad \text{where} \quad
D^{t} = \begin{pmatrix}
\lambda_1^{t} & 0 & \cdots & 0 \\
0 & \lambda_2^{t} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \lambda_h^{t}
\end{pmatrix}.
\]

The eigenvalues are raised to the power $t$ but the eigenvectors do not change. Many software
36 B.A. Craig and P.P. Sendi
packages, such as Splus, have matrix decomposition functions, so these calculations can be done very quickly.

This method is very similar to one method used to obtain the MLE of the continuous-time transition matrix [11]. However, while this always works in the continuous-time case, it does not always work in the discrete-time case. A discrete-time model is not necessarily Markov at all cycle lengths. This is comparable to saying the eigenvalues of the transition matrix can be negative. Provided the estimated transition matrix $\hat{M}$ is positive semidefinite (all the eigenvalues are non-negative), this method will allow you to compute the MLE directly. In situations where $L_o$ is even and $\hat{M}$ is not positive semidefinite, the method described in the following section can be used.

Unequal observation intervals

In many situations, the observation intervals may be unequal in length [6,8]. As an example, suppose a one-year transition matrix is desired but the cohort was observed at years two and three. In this situation, the one-year transition matrix could be estimated using only the year two to three information, but this throws away half of the observed data. Ideally, one would like to use all the information. This can be done using the EM algorithm [12]. The E-step imputes the missing data by computing the expected number of single-cycle transitions. The M-step treats the expected number of single-cycle transitions as the true data set and maximizes the likelihood. This is repeated until the transition probabilities stabilize.

Recall the situation where the observation intervals and cycle length coincide. If $n_{rc}$ represents the number of individuals that move from state $r$ to state $c$ in one cycle, the likelihood function is

\[
L(\theta) = \prod_{r=1}^{h} \prod_{c=1}^{h} \theta_{rc}^{\,n_{rc}}
\]

and the method of 'Observation intervals coincide' provides the MLE of $M = \{\theta_{rc}\}$.

Consider that there are $T$ observation intervals which are integer multiples $(k_1, k_2, \ldots, k_T)$ of the cycle length. The missing data are the health states for each individual at the unobserved cycles. Thus the EM algorithm involves imputing these states at the unobserved cycles, tallying the expected number of transitions, and then using the methods of 'Observation intervals coincide' to obtain a new estimate of the transition matrix. This is repeated until the transition matrix stabilizes. An initial transition matrix is needed to start the algorithm. Convergence to the MLE is not guaranteed (the algorithm may converge to a local maximum), so trying several initial transition matrices is recommended.

For the E-step, the estimated transition matrix is used to compute the probability of each path a subject could have followed to end up where he/she did after $k_t$ cycles. For example, given a one-year transition matrix $M$, the two-year transition probabilities are given by computing $M \times M$. Labeling the one-year transition matrix

\[
M = \begin{pmatrix}
\theta_{11} & \theta_{12} & \cdots & \theta_{1h} \\
\theta_{21} & \theta_{22} & \cdots & \theta_{2h} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_{h1} & \theta_{h2} & \cdots & \theta_{hh}
\end{pmatrix}
\]

this product can be expressed in terms of the one-year transition probabilities as

\[
M \times M = \begin{pmatrix}
\sum_{j=1}^{h} \theta_{1j}\theta_{j1} & \sum_{j=1}^{h} \theta_{1j}\theta_{j2} & \cdots & \sum_{j=1}^{h} \theta_{1j}\theta_{jh} \\
\sum_{j=1}^{h} \theta_{2j}\theta_{j1} & \sum_{j=1}^{h} \theta_{2j}\theta_{j2} & \cdots & \sum_{j=1}^{h} \theta_{2j}\theta_{jh} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{h} \theta_{hj}\theta_{j1} & \sum_{j=1}^{h} \theta_{hj}\theta_{j2} & \cdots & \sum_{j=1}^{h} \theta_{hj}\theta_{jh}
\end{pmatrix}
\]

where each probability product ($\theta_{rj}\theta_{jc}$) represents one possible path from the initial state $r$ to state $c$ after two cycles (years).

Denote the number of observed subjects moving between state $r$ and state $c$ after $k_t$ cycles as $n^{k_t}_{rc}$. Given the probability of each possible path, the remainder of this step involves estimating the number of subjects who follow each of these paths and then tallying the number of single-cycle transitions. In the above $h$-state model, there are $h$ paths in each cell of the two-year transition matrix. Each one of the individuals in that cell must have followed one of the $h$ paths. The expected number of individuals to follow each path is based on the relative probability of each path (multinomial distribution). For example, in the upper left cell, the probability of an individual
following the path $(1 \to 1 \to 1)$ is

\[
P(1 \to 1 \to 1 \mid 1 \to {?} \to 1) = \frac{\theta_{11}\theta_{11}}{\sum_{j=1}^{h} \theta_{1j}\theta_{j1}}
\]

so the expected number of subjects to have followed this path is

\[
n^{2}_{11}\, \frac{\theta_{11}\theta_{11}}{\sum_{j=1}^{h} \theta_{1j}\theta_{j1}}.
\]

The expected number of one-cycle transitions is then a bookkeeping exercise. For example, the path $1 \to 1 \to 1$ involves two $1 \to 1$ transitions and the path $1 \to 2 \to 1$ involves a $1 \to 2$ transition and a $2 \to 1$ transition. A single-cycle transition count matrix is generated and the M-step estimates a new transition matrix. This matrix is used to redefine the probability of each path in the next iteration.

The number of paths each subject could have followed depends on the number of health states $h$, the number of cycles between observations $k_t$, and any restrictions imposed on the transition matrix (e.g. progressive disease). While the number of paths can be quite large, it is easy for a computer to handle. The appendix contains a description of one possible algorithm for the E-step.

Confidence intervals using the bootstrap

As one can see from the product $M \times M$, model summaries, such as the probability of entering the absorbing state by cycle 5 or the expected survival, are a complex function of single-cycle probabilities. Methods which vary only a subset of the transition parameters (e.g. sensitivity analysis) do not properly address this complex relationship. While they can still be very helpful in understanding the behavior of the model, other methods must be used to assess uncertainty and construct confidence intervals.

For this purpose, Efron's bootstrap is recommended [13]. With this method, other possible data sets, the same size as the original, are formed by sampling with replacement from the original data set. This is done by addressing each row of the transition count matrix $N$ separately. Letting $n_{r\cdot}$ denote the total number of transitions for row $r$, bootstrapping row $r$ simply involves sampling $n_{r\cdot}$ transitions with replacement from the observed $n_{r\cdot}$ transitions. In other words, $n_{r\cdot}$ draws are taken from a multinomial distribution with probabilities $\{\hat{\theta}_{rc}\}$ to generate a new set of transition counts for row $r$. Combining the results of each row forms a new transition count matrix $N^{*}$ and thus another possible transition probability matrix $\hat{M}^{*}$. If the desired cycle length and observation interval do not coincide, then spectral decomposition or the EM algorithm would be used on each bootstrap sample to obtain a new transition matrix [14,15].

The collection of bootstrapped transition matrices approximates the sampling distribution. From this distribution, one could assess the uncertainty of each probability in the transition matrix as well as any function of the transition matrix. For example, if one were interested in the expected survival of an individual starting in state $s_1$, one could compute this expected value for each matrix in the bootstrap set, thereby creating a sampling distribution for expected survival. An example of this bootstrap approach is found in Sendi et al. [5].

Examples

To illustrate these methods, two examples are presented. The first example utilizes the Swiss HIV cohort study (SHCS) database to estimate one-month CD4-cell count state transition probabilities from six-month follow-ups. The second example, in order to describe the EM calculations in detail, is based on a simulated data set. In addition, for the second data set, bootstrap procedures are used to form a 95% CI for the expected number of cycles until entering the absorbing state.

Swiss HIV cohort study

Researchers constructed a homogeneous Markov chain to describe the monthly progression of HIV-infected subjects at the greatest risk of developing Mycobacterium avium complex (MAC) infection [16]. This progression included the possibility of movement between three distinct CD4-cell count ranges (with and without AIDS). Estimates of the monthly transitional probabilities are based on data from the SHCS. This is a multi-center,
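
For a six-month-to-one-month conversion of the kind described for this example, the row-proportion estimate and the spectral-decomposition root can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the authors' implementation (the paper mentions Splus). The function names `mle_from_counts` and `matrix_power_t` and the count matrix `N6` are hypothetical, and the counts are invented for illustration rather than taken from the SHCS. The final check that the root has no negative entries is an added safeguard assumed by this sketch.

```python
import numpy as np


def mle_from_counts(N):
    """Row proportions of a transition count matrix (MLE when intervals coincide)."""
    N = np.asarray(N, dtype=float)
    return N / N.sum(axis=1, keepdims=True)


def matrix_power_t(M, t):
    """Compute M**t as P D^t P^{-1} from the eigendecomposition of M."""
    eigvals, P = np.linalg.eig(M)
    # The method requires real, non-negative eigenvalues.
    if np.any(np.abs(np.imag(eigvals)) > 1e-12) or np.any(np.real(eigvals) < 0):
        raise ValueError("eigenvalues are not all real and non-negative")
    D_t = np.diag(np.real(eigvals) ** t)
    M_t = np.real(P @ D_t @ np.linalg.inv(P))
    # Added safeguard (an assumption of this sketch): the result should still
    # be a valid transition matrix.
    if np.any(M_t < -1e-12):
        raise ValueError("fractional power has negative entries")
    return M_t


# Hypothetical six-month counts for three states (state 3 absorbing).
N6 = np.array([[80, 15, 5],
               [10, 70, 20],
               [0, 0, 100]])
M6_hat = mle_from_counts(N6)               # six-month transition matrix
M1_hat = matrix_power_t(M6_hat, 1.0 / 6)   # one-month matrix, t = L_d / L_o
print(np.round(M1_hat, 4))
```

Multiplying `M1_hat` by itself six times should recover `M6_hat`, which is a convenient check on any root computed this way.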
\[
n_{12} = 22 + 214\, \frac{\theta_{12}\theta_{21}}{\theta_{11}\theta_{11} + \theta_{12}\theta_{21}} + 45 + 41\, \frac{\theta_{12}\theta_{23}}{\theta_{11}\theta_{13} + \theta_{12}\theta_{23} + \theta_{13}} + 62\, \frac{\theta_{21}\theta_{12}}{\theta_{21}\theta_{12} + \theta_{22}\theta_{22}}
\]

\[
n_{13} = 21 + 41\, \frac{\theta_{13} + \theta_{11}\theta_{13}}{\theta_{11}\theta_{13} + \theta_{12}\theta_{23} + \theta_{13}} + 82\, \frac{\theta_{21}\theta_{13}}{\theta_{21}\theta_{13} + \theta_{22}\theta_{23} + \theta_{23}}
\]

Bootstrap example

A function of the transition matrix that is usually of great interest is the time until a subject reaches the absorbing state. For example, when the absorbing state is death, this time is the life expectancy of the subject. Consider the EM example with death as the absorbing state and suppose there was interest in estimating the life expectancy of someone in state 1. Using the fundamental matrix solution [2], the expected number of cycles is estimated to be 10.23 cycles. Since each cycle represents one month, the
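
The life-expectancy calculation and its bootstrap interval can be sketched as follows. This Python/NumPy illustration assumes the simple case in which the cycle length and observation interval coincide and the last state is the only absorbing state. The count matrix `N`, the function names and the number of bootstrap replicates are hypothetical choices for illustration; they are not the simulated data set or the code used in the paper. Where the EM algorithm is required, each bootstrap sample would first be passed through the EM estimation before the summary is computed.

```python
import numpy as np


def row_proportions(N):
    """MLE of the transition matrix from single-cycle counts."""
    N = np.asarray(N, dtype=float)
    return N / N.sum(axis=1, keepdims=True)


def expected_cycles_to_absorption(M, start=0):
    """Expected number of cycles before absorption via the fundamental matrix.

    Assumes the last state of M is the only absorbing state.
    """
    Q = M[:-1, :-1]                      # transitions among transient states
    # Row sums of the fundamental matrix (I - Q)^{-1}.
    steps = np.linalg.solve(np.eye(Q.shape[0]) - Q, np.ones(Q.shape[0]))
    return steps[start]


def bootstrap_matrix(N, rng):
    """Resample each row of N from a multinomial with the estimated probabilities."""
    N = np.asarray(N)
    theta_hat = row_proportions(N)
    rows = [rng.multinomial(int(n_r), p_r)
            for n_r, p_r in zip(N.sum(axis=1), theta_hat)]
    return row_proportions(np.vstack(rows))


# Hypothetical single-cycle counts; state 3 (death) is absorbing.
N = np.array([[300, 60, 40],
              [50, 200, 80],
              [0, 0, 1]])
rng = np.random.default_rng(2002)
point = expected_cycles_to_absorption(row_proportions(N))
boot = [expected_cycles_to_absorption(bootstrap_matrix(N, rng))
        for _ in range(2000)]
lower, upper = np.percentile(boot, [2.5, 97.5])   # percentile 95% interval
print(point, lower, upper)
```

The percentile interval is used here only because it is the simplest to state; other bootstrap intervals could be substituted without changing the resampling step.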
these stochastic models provide more realism, this also means more uncertainty which can mask treatment protocol differences. As a result, it is important that researchers assess which model better addresses the primary questions of their research and not just select the most realistic model.

Although not discussed, model fit is also very important to consider and is often overlooked. If one is considering the model for prediction purposes, predicting the outcomes of an independent yet similar cohort is one valuable check of the model [16]. Also, a likelihood ratio, or asymptotically equivalent $\chi^2$ test statistic, as described in Anderson and Goodman [9], provides a measure of fit to one's own data. If the cohort data set is large enough, one could also use a cross-validation technique.

Acknowledgements

Contract grant/sponsor: NEI Small Research; number: EY12254-01.

Appendix

In this problem, the key to the EM algorithm is an efficient E-step. Recall, the E-step involves (1) calculating the probability of each possible path, (2) obtaining the expected number of subjects to follow each path, and (3) tallying the number of single-cycle transitions. In this section, a matrix-oriented approach to keep track of all the potential paths is described. Shown below is an example of such a matrix which contains all potential paths for $k = 3$ cycles when there are only $h = 2$ states:

\[
P_3 = \begin{pmatrix}
\theta_{11}\theta_{11}\theta_{11} & \theta_{11}\theta_{11}\theta_{12} \\
\theta_{11}\theta_{12}\theta_{21} & \theta_{11}\theta_{12}\theta_{22} \\
\theta_{12}\theta_{21}\theta_{11} & \theta_{12}\theta_{21}\theta_{12} \\
\theta_{12}\theta_{22}\theta_{21} & \theta_{12}\theta_{22}\theta_{22} \\
\theta_{21}\theta_{11}\theta_{11} & \theta_{21}\theta_{11}\theta_{12} \\
\theta_{21}\theta_{12}\theta_{21} & \theta_{21}\theta_{12}\theta_{22} \\
\theta_{22}\theta_{21}\theta_{11} & \theta_{22}\theta_{21}\theta_{12} \\
\theta_{22}\theta_{22}\theta_{21} & \theta_{22}\theta_{22}\theta_{22}
\end{pmatrix}
\]

Notice the organization of these paths. The first column contains those paths that end in state 1 and the second column contains those paths that end in state 2. The first four rows contain paths which start in state 1 and the last four rows contain paths which start in state 2. Finally, the three single-cycle transitions that make up a path have a particular pattern as you go down the paths within a column. This type of organization makes the accounting of the E-step very easy.

To construct such a matrix, consider an $h \times h$ single-cycle transition matrix $M$ and data observed at $T$ unique interval lengths equal to $k_t$, $t = 1, 2, \ldots, T$, cycles. Since the matrix construction is similar for each cycle length, assume a single interval equal to $k$ cycles. The matrix $P_k$ (an $h^{k} \times h$ matrix) is constructed using the following iterative matrix multiplication equation, $P_1 = M$ and

\[
P_k\bigl(h(r-1) + c,\; j\bigr) = P_{k-1}(r, c) \times M(c, j), \qquad
\begin{cases}
r = 1, 2, \ldots, h^{k-1} \\
c = 1, 2, \ldots, h \\
j = 1, 2, \ldots, h
\end{cases}
\]

In other words, the first row of $P_k$ is the Kronecker product of the first element in $P_{k-1}$ and the first row of $M$. The second row is the Kronecker product of the second element $P_{k-1}(1, 2)$ and the second row of $M$, and so on. The matrix $P_k$ has all potential paths arranged such that each column $c$ contains all paths that end in state $c$, with the first $h^{k-1}$ rows containing the paths that start in state 1, the next $h^{k-1}$ rows containing the paths that start in state 2, and so on. This allows easy computation of the expected number of subjects to follow each path since it arranges all the possible paths in adjacent rows and a single column.

In the construction of each of the probabilities in $P_k$, $k$ single elements of $M$ were multiplied together. We use the multiplication pattern to tally the single-cycle transitions. Let $\hat{N}(i, j)$ represent the expected number of subjects to follow the path described in row $i$ and column $j$ of $P_k$. The expected number of single-cycle transitions from $r$ to $c$ is

\[
\hat{n}_{r,c} = \sum_{l=1}^{k-1} \sum_{i=1}^{h^{l-1}} \sum_{j=1}^{h} \sum_{m=1}^{h^{k-1-l}} \hat{N}\bigl(s(c, h, k, l) + h^{k+1-l}(i-1) + m,\; j\bigr) + \sum_{i=1}^{h^{k-1}} \hat{N}\bigl(r + h(i-1),\; c\bigr)
\]
where $s(c, h, k, l) = h^{k-1-l}\bigl(h(r-1) + c - 1\bigr)$. The variable $l$ represents the $l$th ordered single-cycle transition of a path. This equation simply sums together each $\hat{N}(i, j)$ whose path contains an $r \to c$ transition in the $l$th position. The second sum represents this process for the last single-cycle transition in each path. It is separate because the $r \to c$ transition is only possible in column $c$.

References

1. Buxton MJ, Drummond MF, van Hout BA, et al. Modelling in economic evaluation: an unavoidable fact of life. Health Econ 1997; 6: 217–227.
2. Beck JR, Pauker SG. The Markov process in medical prognosis. Med Decision Making 1983; 3: 419–458.
3. Briggs A, Sculpher M. An introduction to Markov modeling for economic evaluation. Pharmacoeconomics 1998; 13: 397–409.
4. Sonnenberg FA, Beck JR. Markov models in medical decision making: a practical guide. Med Decision Making 1993; 13: 322–338.
5. Sendi PP, Bucher HC, Craig BA, Pfluger D, Battegay M. Estimating AIDS-free survival in a severely immunosuppressed asymptomatic HIV-infected population in the era of antiretroviral triple combination therapy. J Acquir Immune Defic Syndr Hum Retrovirol 1999; 20: 376–381.
6. Dasbach EJ, Fryback DG, Newcomb PA, Klein R, Klein BEK. Cost-effectiveness of strategies for detecting diabetic retinopathy. Med Care 1991; 29: 20–39.
7. Miller DK, Homan SM. Determining transition probabilities: confusions and suggestions. Med Decision Making 1994; 14: 52–58.
8. Craig BA, Fryback DG, Klein R, Klein BEK. A Bayesian approach to modelling the natural history of a chronic condition from observations with intervention. Stat Med 1999; 18: 1355–1371.
9. Anderson TW, Goodman LA. Statistical inference about Markov chains. Ann Math Stat 1957; 28: 89–110.
10. Manning WG, Fryback DG, Weinstein MC. Reflecting uncertainty in cost-effectiveness analysis. In Cost-Effectiveness in Health and Medicine, Gold MR, Siegel JE, Russell LB, Weinstein MC (eds). Oxford University Press: New York, 1996; 247–275.
11. Kalbfleisch JD, Lawless JF. The analysis of panel data under a Markov assumption. J Amer Stat Assoc 1985; 80: 863–871.
12. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc 1977; 39: 1–38.
13. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall: New York, 1993.
14. LePage R, Billard L. Exploring the Limits of the Bootstrap. Wiley: New York, 1992.
15. McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. Wiley: New York, 1997.
16. Sendi PP, Craig BA, Pfluger D, Gafni A, Bucher HC. Systematic validation of disease models for pharmacoeconomic evaluations. J Eval Clin Pract 1999; 5: 283–295.
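
Supplementary sketch of the EM iteration

As a supplement to the Appendix, the EM iteration for unequal observation intervals can be sketched in Python. This illustration enumerates every path explicitly with `itertools.product` instead of building the path matrix $P_k$ described in the Appendix, so it is a simpler (and slower) stand-in for that bookkeeping and not the authors' algorithm. The names `e_step`, `m_step`, `em` and `counts_by_k`, the convergence tolerance, and the example counts are all hypothetical. The sketch also assumes that every state has at least one expected departure, so no row total is zero.

```python
import itertools
import numpy as np


def e_step(M, counts_by_k):
    """Expected single-cycle transition counts under the current matrix M.

    counts_by_k maps an interval length k (in cycles) to an h x h array whose
    (r, c) entry counts subjects seen in state r and, k cycles later, in c.
    """
    h = M.shape[0]
    expected = np.zeros((h, h))
    for k, N_k in counts_by_k.items():
        N_k = np.asarray(N_k, dtype=float)
        for r in range(h):
            for c in range(h):
                if N_k[r, c] == 0:
                    continue
                # Enumerate every k-step path from r to c and its probability.
                paths, probs = [], []
                for mid in itertools.product(range(h), repeat=k - 1):
                    path = (r, *mid, c)
                    steps = list(zip(path[:-1], path[1:]))
                    paths.append(steps)
                    probs.append(np.prod([M[a, b] for a, b in steps]))
                total = sum(probs)
                if total == 0:
                    continue  # cell impossible under the current M
                # Split the observed count over paths by relative probability,
                # then tally each path's single-cycle transitions.
                for steps, p in zip(paths, probs):
                    for a, b in steps:
                        expected[a, b] += N_k[r, c] * p / total
    return expected


def m_step(expected):
    """Row proportions of the expected counts (assumes no zero row totals)."""
    return expected / expected.sum(axis=1, keepdims=True)


def em(counts_by_k, M0, tol=1e-10, max_iter=10000):
    """Alternate E- and M-steps until the matrix stabilizes."""
    M = np.asarray(M0, dtype=float)
    for _ in range(max_iter):
        M_new = m_step(e_step(M, counts_by_k))
        if np.max(np.abs(M_new - M)) < tol:
            return M_new
        M = M_new
    return M


# Hypothetical counts: some subjects observed one cycle apart, others two.
counts = {1: np.array([[40, 10], [5, 45]]),
          2: np.array([[60, 40], [30, 70]])}
M_hat = em(counts, np.full((2, 2), 0.5))
print(np.round(M_hat, 4))
```

Because convergence to the MLE is not guaranteed, `em` would in practice be run from several starting matrices `M0` and the results compared.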