Introduction to MCMC and PageRank
Eric Vigoda
Georgia Tech
Outline:
1 Markov Chain Basics
2 Ergodicity
4 PageRank
5 Mixing Time
What is a Markov chain?
[Figure: transition diagram on four states (Listen to Kishore, Check Email, StarCraft, Sleep) with the transition probabilities given by the matrix below.]

P =
[ .5  .5   0   0 ]
[ .2   0  .5  .3 ]
[  0  .3  .7   0 ]
[ .7   0   0  .3 ]

(Rows and columns ordered 1 = Listen, 2 = Email, 3 = StarCraft, 4 = Sleep.)
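To make the example concrete, here is a minimal Python sketch (assuming NumPy is available; the seed and trajectory length are arbitrary choices) that encodes this transition matrix and simulates a short run of the chain.

import numpy as np

# Transition matrix of the example chain.
# States: 0 = Listen, 1 = Email, 2 = StarCraft, 3 = Sleep (0-indexed).
P = np.array([
    [.5, .5, .0, .0],
    [.2, .0, .5, .3],
    [.0, .3, .7, .0],
    [.7, .0, .0, .3],
])

rng = np.random.default_rng(0)

def step(state: int) -> int:
    """Sample the next state from row `state` of P."""
    return int(rng.choice(4, p=P[state]))

# Simulate 10 steps starting from state 0 (Listen).
x, trajectory = 0, [0]
for _ in range(10):
    x = step(x)
    trajectory.append(x)
print(trajectory)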
One-step transitions

Time: t = 0, 1, 2, ...
Let Xt denote the state at time t; Xt is a random variable.
For states k and j, Pr(X1 = j | X0 = k) = P(k, j).

In general, for t ≥ 1, given:
in state k0 at time 0, in k1 at time 1, ..., in k_{t−1} at time t − 1,
what's the probability of being in state j at time t?

Answer: Pr(Xt = j | X0 = k0, ..., X_{t−1} = k_{t−1}) = P(k_{t−1}, j).

The process is memoryless: only the current state matters; previous states do not.
Known as the Markov property, hence the term Markov chain.
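As an illustration of the Markov property, the following sketch (assumptions: NumPy, the example matrix P above, an arbitrary seed) estimates the distribution of X2 conditioned on X1 = 1 while X0 varies; empirically it matches row 1 of P, independent of how the chain got there.

import numpy as np

rng = np.random.default_rng(1)
P = np.array([
    [.5, .5, .0, .0],
    [.2, .0, .5, .3],
    [.0, .3, .7, .0],
    [.7, .0, .0, .3],
])

# Run many 2-step trajectories with a random start X0, and look at X2
# among those that passed through X1 = 1 (Email). By the Markov
# property this conditional distribution is row 1 of P, whatever X0 was.
hits, count = np.zeros(4), 0
for _ in range(100_000):
    x0 = int(rng.integers(4))
    x1 = int(rng.choice(4, p=P[x0]))
    x2 = int(rng.choice(4, p=P[x1]))
    if x1 == 1:
        hits[x2] += 1
        count += 1
print(hits / count)  # approximately [.2, 0, .5, .3] = P[1]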
2-step transitions

P =
[ .5  .5   0   0 ]
[ .2   0  .5  .3 ]
[  0  .3  .7   0 ]
[ .7   0   0  .3 ]

P^2 =
[ .35  .25  .25  .15 ]
[ .31  .25  .35  .09 ]
[ .06  .21  .64  .09 ]
[ .56  .35   0   .09 ]

States: 1 = Listen, 2 = Email, 3 = StarCraft, 4 = Sleep.
k-step transitions

Pr(Xt+2 = j | Xt = i)
  = Σ_{k=1}^{N} Pr(Xt+2 = j | Xt+1 = k) × Pr(Xt+1 = k | Xt = i)
  = Σ_k P(k, j) P(i, k)
  = Σ_k P(i, k) P(k, j)
  = P^2(i, j).

More generally, Pr(Xt+k = j | Xt = i) = P^k(i, j).
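The 2-step matrix can be checked directly; a quick sketch with NumPy, using the example P from above:

import numpy as np

P = np.array([
    [.5, .5, .0, .0],
    [.2, .0, .5, .3],
    [.0, .3, .7, .0],
    [.7, .0, .0, .3],
])

# The matrix product P @ P gives the 2-step transition probabilities.
print(P @ P)
# [[0.35 0.25 0.25 0.15]
#  [0.31 0.25 0.35 0.09]
#  [0.06 0.21 0.64 0.09]
#  [0.56 0.35 0.   0.09]]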
Random Initial State

Suppose X0 ∼ µ0, where µ0 is a probability distribution (row vector) over the N states.
Then Pr(X1 = j) = Σ_k µ0(k) P(k, j).
So X1 ∼ µ1 where µ1 = µ0 P.
And Xt ∼ µt where µt = µ0 P^t.
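In code, propagating an initial distribution is a single matrix power (a sketch reusing the P above; the choice of µ0 as a point mass on Listen is arbitrary):

import numpy as np

P = np.array([
    [.5, .5, .0, .0],
    [.2, .0, .5, .3],
    [.0, .3, .7, .0],
    [.7, .0, .0, .3],
])

mu0 = np.array([1., 0., 0., 0.])            # start in state 1 (Listen)
mu20 = mu0 @ np.linalg.matrix_power(P, 20)  # mu_t = mu_0 P^t with t = 20
print(mu20)  # ~ [.24419, .24419, .40697, .10465]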
Back to CS 6210 example: big t?

P^20 =
[ .244190  .244187  .406971  .104652 ]
[ .244187  .244186  .406975  .104651 ]
[ .244181  .244185  .406984  .104650 ]
[ .244195  .244188  .406966  .104652 ]
Limiting Distribution

For big t,

P^t ≈
[ .244186  .244186  .406977  .104651 ]
[ .244186  .244186  .406977  .104651 ]
[ .244186  .244186  .406977  .104651 ]
[ .244186  .244186  .406977  .104651 ]

Every row is the same, so regardless of the starting state:
Pr(Xt = 1) = .244186
Pr(Xt = 2) = .244186
Pr(Xt = 3) = .406977
Pr(Xt = 4) = .104651

Let π = [.244186, .244186, .406977, .104651].
In other words, for big t, Xt ∼ π.
π is called a stationary distribution.
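A quick numerical check that this π is indeed stationary, i.e. that πP = π (a sketch assuming NumPy):

import numpy as np

P = np.array([
    [.5, .5, .0, .0],
    [.2, .0, .5, .3],
    [.0, .3, .7, .0],
    [.7, .0, .0, .3],
])
pi = np.array([.244186, .244186, .406977, .104651])

# Stationarity: taking one more step leaves the distribution unchanged.
print(pi @ P)                              # ~ pi again
print(np.allclose(pi @ P, pi, atol=1e-6))  # True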
Limiting Distribution

Key questions:
- When is there a stationary distribution?
- If there is at least one, is it unique, or can there be several?
- Assuming there is a unique stationary distribution:
  - Do we always reach it?
  - What is it?
- Mixing time = time to reach the unique stationary distribution.

Algorithmic Goal:
If we have a distribution π that we want to sample from, can we design a Markov chain that has:
- unique stationary distribution π,
- convergence to π from every X0,
- fast mixing time?
2 Ergodicity
[Figures: example transition diagrams for the ergodicity discussion.]
For an ergodic chain, from every initial distribution µ0:

lim_{t→∞} µt = π.

What is π?
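For a chain this small, π can be computed directly as the solution of π = πP together with Σ_i π(i) = 1 (a sketch, NumPy assumed, using the example P from earlier):

import numpy as np

P = np.array([
    [.5, .5, .0, .0],
    [.2, .0, .5, .3],
    [.0, .3, .7, .0],
    [.7, .0, .0, .3],
])

# pi = pi P means pi^T is in the null space of (P^T - I); append the
# normalization constraint sum(pi) = 1 and solve by least squares.
A = np.vstack([P.T - np.eye(4), np.ones((1, 4))])
b = np.array([0., 0., 0., 0., 1.])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)  # ~ [.244186, .244186, .406977, .104651]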
4 PageRank
Refining the Ranking Idea

What if:
- a webpage has 500 links and one is to Eric's page;
- another webpage has only 5 links and one is to Santosh's page.
Which link is more valuable?

Academic papers: if a paper cites 50 other papers, then each reference gets 1/50 of a citation.

Webpages: if a page y has |Out(y)| outgoing links, then each linked page gets 1/|Out(y)|.

New solution:

π(x) = Σ_{y ∈ In(x)} 1/|Out(y)|.
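A sketch of this counting rule on a small made-up link graph (the pages and links are hypothetical, purely for illustration):

# Adjacency list: page -> list of pages it links to (hypothetical data).
out_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

# Each page y contributes 1/|Out(y)| to every page it links to.
score = {page: 0.0 for page in out_links}
for y, outs in out_links.items():
    for x in outs:
        score[x] += 1.0 / len(outs)

print(score)  # c collects 1/2 + 1 + 1 = 2.5; a gets 1.0; b gets 0.5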
Further Refining the Ranking Idea

Previous:

π(x) = Σ_{y ∈ In(x)} 1/|Out(y)|.

A link from an important page should itself count for more. Importance of page x:

π(x) = Σ_{y ∈ In(x)} π(y)/|Out(y)|.

Random Walk

This is exactly the stationarity condition π = πP for the random walk on the web graph: from page y, move to one of its |Out(y)| linked pages chosen uniformly at random.
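The recursive definition can be solved by power iteration, repeatedly applying π ← πP for the random walk just described. A sketch on the same hypothetical graph as above (NumPy assumed):

import numpy as np

out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
pages = sorted(out_links)
idx = {p: i for i, p in enumerate(pages)}
n = len(pages)

# Random-walk matrix: from y, move to a uniformly random out-link.
P = np.zeros((n, n))
for y, outs in out_links.items():
    for x in outs:
        P[idx[y], idx[x]] = 1.0 / len(outs)

# Power iteration: apply pi <- pi P until the fixed point pi = pi P.
pi = np.full(n, 1.0 / n)
for _ in range(200):
    pi = pi @ P
print(dict(zip(pages, pi.round(4))))  # d gets 0: nothing links to it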
5 Mixing Time
Total variation distance between distributions µ and ν:

dTV(µ, ν) = (1/2) Σ_x |µ(x) − ν(x)|.

Example: dTV(µ, ν) = (1/2)(.25 + .15 + .1 + 0) = .25.
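In code the definition is one line; the µ and ν below are a hypothetical pair whose coordinate-wise gaps are the .25, .15, .1, 0 of the example:

import numpy as np

def d_tv(mu, nu):
    """Total variation distance: half the L1 distance."""
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()

# Hypothetical distributions with gaps .25, .15, .10, 0.
mu = [0.50, 0.25, 0.15, 0.10]
nu = [0.25, 0.40, 0.25, 0.10]
print(d_tv(mu, nu))  # 0.25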
Mixing Time of Random Surfer

Coupling proof:
Consider 2 copies of the Random Surfer chain, (Xt) and (Yt).
Choose Y0 from π; thus Yt ∼ π for all t.
And X0 is arbitrary.

Couple the chains as follows:
If Xt−1 = Yt−1, then they choose the same transition at time t.
If Xt−1 ≠ Yt−1, then with probability 1 − α both chains jump to the same random page z.

Once the chains meet they move together forever, and unmet chains couple with probability at least 1 − α at each step. Therefore,

Pr(Xt ≠ Yt) ≤ α^t.
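The coupling argument can also be simulated. Below is a sketch with invented parameters (50 pages, α = 0.85, a uniformly random page standing in for "follow a link"); the fraction of runs with Xt ≠ Yt stays below α^t:

import numpy as np

rng = np.random.default_rng(2)
n, alpha = 50, 0.85  # number of pages and link-following prob. (assumed)

def coupled_step(x, y):
    """One coupled move of the two Random Surfer copies."""
    if rng.random() < 1 - alpha:
        z = int(rng.integers(n))   # both chains jump to the same page z
        return z, z
    if x == y:                     # already met: make the same move
        z = int(rng.integers(n))
        return z, z
    return int(rng.integers(n)), int(rng.integers(n))

t_max, trials = 20, 20_000
apart = np.zeros(t_max)
for _ in range(trials):
    x, y = 0, int(rng.integers(n))  # X0 arbitrary; Y0 ~ pi (uniform here)
    for t in range(t_max):
        x, y = coupled_step(x, y)
        apart[t] += (x != y)
for t in (0, 4, 9, 19):
    print(t + 1, apart[t] / trials, alpha ** (t + 1))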