Lecture11_PageRank_V0

The document discusses the PageRank algorithm, which is used to rank web pages based on their importance determined by the link structure of the web. It explains how PageRank treats links as votes, where incoming links from important pages contribute more to a page's rank. The document also outlines the mathematical formulation of PageRank, including the use of stochastic matrices and the power iteration method for calculating the rank of web pages.

Uploaded by

taiiq zhou

COMP4434 Big Data Analytics

Lecture 11 PageRank
HUANG Xiao
[Figure: course overview map. Recoverable labels: Big Data Characteristics (volume, velocity, variety, veracity); data types (time-series, tabular, image, text, graph); basic statistical analysis; machine learning (supervised, unsupervised, deep learning); dimensionality reduction (autoencoder, SVD); clustering: K-means; large-scale data analytics systems (Hadoop, MapReduce); graph analytics with big data (web search, PageRank); recommender systems (content-based, collaborative filtering); applications: AI (ChatGPT, AlphaGo).]

New Jersey Institute of Technology

PageRank Motivation: How to organize the Web?

§ First try: Human-curated Web directories
  § Yahoo, DMOZ, LookSmart
§ Second try: Web search
  § Information Retrieval investigates finding relevant docs in a small and trusted set
    § Newspaper articles, patents, etc.
  § But: the Web is huge, full of untrusted documents, random things, web spam, etc.

Challenges in Web Search

§ (1) The Web contains many sources of information. Who to “trust”?
  § Trick: Trustworthy pages may point to each other!
§ (2) What is the “best” answer to the query “newspaper”?
  § No single right answer
  § Trick: Pages that actually know about newspapers might all be pointing to many newspapers

Hint: Web as a Directed Graph

§ Nodes: Webpages
§ Edges: Hyperlinks

[Figure: example web pages as nodes joined by hyperlinks, with page texts “I teach Big Data in COMP”, “COMP is in Faculty of Engineering”, and “The Hong Kong Polytechnic University”]

Web as a Directed Graph

Ranking Nodes on the Graph

§ Not all web pages are equally “important”
  https://xhuang31.github.io vs. https://www.polyu.edu.hk
§ There is large diversity in the web-graph node connectivity. Let’s rank the pages by the link structure!

Example of Node Ranking

§ Page Ranking
§ Social Ranking
§ Paper Ranking
§ Scholar Ranking
§ ……

Idea: Links as votes

§ A page is more important if it has more links
  § In-coming links? Out-going links?
§ Think of in-links as votes:
  § www.stanford.edu has 23,400 in-links
  § https://xhuang31.github.io has 0 in-links
§ Are all in-links equal?
  § Links from important pages count more
  § Recursive question!

Google PageRank

Is Page == “Webpage”?

§ Larry Page, co-founder of Google
§ Born on March 26, 1973
§ Founded Google on September 4, 1998
§ As of Nov 2024, had an estimated net worth of $163 billion (No. 15 richest)
§ PageRank began when “Larry Page and Sergey Brin developed PageRank at Stanford University in 1996” as part of a research project about a new kind of search engine

Simple Recursive Formulation

§ Each link’s vote is proportional to the importance of its source page
§ If page j with importance PR(j) has n out-links, each link gets PR(j) / n votes
§ Page j’s own importance is the sum of the votes on its in-links

[Figure: page i (3 out-links) and page k (4 out-links) each link to page j, carrying votes r_i/3 and r_k/4; page j has 3 out-links, each carrying r_j/3]

PR(j) = PR(i)/3 + PR(k)/4

How to Represent a Graph

§ Graph model $G = (V, E)$
  § $V$ is a set of pages
  § $E$ is a set of edges
  § Each edge $(u, v) \in E$ represents that page $u$ points to (references) page $v$
§ Adjacency list
  § A data structure for a graph
  § $Adj[u] = \{v : (u, v) \in E\}$ contains each vertex $v$ adjacent to $u$
  § Example: $Adj[2] = \{3, 4\}$

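The adjacency-list definition can be sketched in a few lines of Python. The edge list below is hypothetical: only Adj[2] = {3, 4} comes from the slide, and the other edges and the helper name `build_adjacency_list` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical edge list; only the out-links of page 2 are taken from the slide.
edges = [(1, 2), (2, 3), (2, 4), (3, 1), (4, 3)]

def build_adjacency_list(edge_list):
    """Map each page u to the set of pages v such that (u, v) is an edge."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v]  # touch v so that pages without out-links still get an entry
    return dict(adj)

adj = build_adjacency_list(edges)
print(adj[2])  # {3, 4}
```

A dictionary of sets keeps lookups of a page's out-links cheap, which is what the later per-page update rules need.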
PageRank: The “Flow” Model

§ A “vote” from an important page is worth more
§ A page is important if it is pointed to by other important pages
§ Define a “rank” $r_j$ for page j:
  $$r_j = \sum_{i \to j} \frac{r_i}{d_i}$$
  where $d_i$ is the out-degree of node i

[Figure: graph with nodes y, a, m. y links to itself and to a (y/2 on each link), a links to y and to m (a/2 on each link), m links to a (m)]

“Flow” equations:
r_y = r_y/2 + r_a/2
r_a = r_y/2 + r_m
r_m = r_a/2

Solving the Flow Equations
Flow equations:
r_y = r_y/2 + r_a/2
r_a = r_y/2 + r_m
r_m = r_a/2

§ 3 equations, 3 unknowns, no constants
  § No unique solution
  § All solutions equivalent modulo the scale factor
§ Additional constraint forces uniqueness:
  § $r_y + r_a + r_m = 1$
  § Solution: $r_y = \frac{2}{5}$, $r_a = \frac{2}{5}$, $r_m = \frac{1}{5}$
§ But we need a better method for large web-size graphs

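As a quick check of the solution above, the following sketch substitutes it back into the three flow equations using exact fractions. It also shows why the extra constraint is needed: any rescaled copy of the solution satisfies the equations too. The function name `satisfies_flow` is an illustrative choice.

```python
from fractions import Fraction as F

def satisfies_flow(ry, ra, rm):
    """Check the three flow equations of the y/a/m graph exactly."""
    return (ry == ry / 2 + ra / 2
            and ra == ry / 2 + rm
            and rm == ra / 2)

solution = (F(2, 5), F(2, 5), F(1, 5))
scaled = tuple(3 * x for x in solution)  # same direction, different scale

print(satisfies_flow(*solution), sum(solution))  # True 1
print(satisfies_flow(*scaled), sum(scaled))      # True 3
```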
PageRank: Matrix Formulation

§ Stochastic adjacency matrix $M$
  § Let page $i$ have $d_i$ out-links
  § If $i \to j$, then $M_{ji} = \frac{1}{d_i}$, else $M_{ji} = 0$
  § $M$ is a column-stochastic matrix: columns sum to 1
§ Rank vector $r$: vector with an entry per page
  § $r_i$ is the importance score of page $i$
  § $\sum_i r_i = 1$
§ The flow equations $r_j = \sum_{i \to j} \frac{r_i}{d_i}$ can be written $r = M \cdot r$

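A minimal sketch of how M could be built from an adjacency list, using the y/a/m graph implied by the flow equations (y links to y and a, a links to y and m, m links to a). The function name `stochastic_matrix` is an assumption for illustration, and the sketch presumes every page has at least one out-link.

```python
from fractions import Fraction

def stochastic_matrix(adj, pages):
    """Return M with M[j][i] = 1/d_i if page i links to page j, else 0."""
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    M = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in adj.items():
        for j in targets:
            M[index[j]][index[i]] = Fraction(1, len(targets))
    return M

pages = ["y", "a", "m"]
adj = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
M = stochastic_matrix(adj, pages)

# Each column belongs to one source page, so each column should sum to 1.
column_sums = [sum(M[row][col] for row in range(3)) for col in range(3)]
print(column_sums == [1, 1, 1])  # True
```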
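Example

§ Remember the flow equation: $r_j = \sum_{i \to j} \frac{r_i}{d_i}$
§ Flow equation in the matrix form: $M \cdot r = r$
§ Suppose page i links to 3 pages, including j
  § Then column i of $M$ has the entry $M_{ji} = 1/3$ in row j, so the product $M \cdot r$ adds $r_i/3$ to $r_j$

[Figure: pages i and k link to page j with votes r_i/3 and r_k/4; alongside, the product M · r = r with the entry 1/3 highlighted in row j, column i]
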
Example: Flow Equations & M

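[Figure: graph with nodes y, a, m]

M (column = source page, row = destination page):

      y    a    m
y     ½    ½    0
a     ½    0    1
m     0    ½    0

r = M·r:

r_y = r_y/2 + r_a/2
r_a = r_y/2 + r_m
r_m = r_a/2

$$\begin{bmatrix} r_y \\ r_a \\ r_m \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{bmatrix} \begin{bmatrix} r_y \\ r_a \\ r_m \end{bmatrix}$$
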
Power Iteration Method

§ Given a web graph with N nodes, where the nodes are pages and edges are hyperlinks
§ Power iteration: a simple iterative scheme
  § Initialize: $r^{(0)} = [1/N, \dots, 1/N]^T$
  § Iterate: $r^{(t+1)} = M \cdot r^{(t)}$, that is, $r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}$, where $d_i$ is the out-degree of node i
  § Stop when $|r^{(t+1)} - r^{(t)}|_1 < \varepsilon$
§ $|x|_1 = \sum_{1 \le i \le N} |x_i|$ is the L1 norm
§ Can use any other vector norm, e.g., Euclidean

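The three steps (initialize, iterate, stop on a small L1 change) might be sketched as follows. The function name, the default tolerance `eps`, and the iteration cap `max_iter` are assumptions for illustration; the matrix is the y/a/m example from the earlier slides.

```python
def power_iteration(M, eps=1e-10, max_iter=1000):
    """Iterate r <- M r from the uniform vector until the L1 change is below eps."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]
        # L1 norm of the difference between successive rank vectors
        if sum(abs(a - b) for a, b in zip(r_next, r)) < eps:
            return r_next
        r = r_next
    return r

M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
r = power_iteration(M)
print([round(x, 4) for x in r])  # [0.4, 0.4, 0.2]
```

The iteration cap is only a safety net for graphs on which the plain iteration never settles.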
PageRank: How to solve?
§ Power Iteration:
  § Set $r_j = 1/N$
  § 1: $r'_j = \sum_{i \to j} \frac{r_i}{d_i}$
  § 2: $r = r'$
  § Go to 1

[Figure: graph with nodes y, a, m]

      y    a    m
y     ½    ½    0
a     ½    0    1
m     0    ½    0

r_y = r_y/2 + r_a/2
r_a = r_y/2 + r_m
r_m = r_a/2

§ Example:

Iteration   0     1     2      3       …   limit
r_y        1/3   1/3   5/12   9/24    …   6/15
r_a        1/3   3/6   1/3    11/24   …   6/15
r_m        1/3   1/6   3/12   1/6     …   3/15

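The first columns of the example table can be reproduced with exact arithmetic. This is a small sketch using Python's `fractions` module; the helper name `step` is an illustrative choice, and fractions print in lowest terms (9/24 appears as 3/8).

```python
from fractions import Fraction as F

M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(1)],
     [F(0),    F(1, 2), F(0)]]

def step(M, r):
    """One power-iteration step: return M times r."""
    n = len(r)
    return [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]

history = [[F(1, 3)] * 3]          # iteration 0: uniform start
for _ in range(3):
    history.append(step(M, history[-1]))

for t, r in enumerate(history):
    print(t, [str(x) for x in r])
# 0 ['1/3', '1/3', '1/3']
# 1 ['1/3', '1/2', '1/6']
# 2 ['5/12', '1/3', '1/4']
# 3 ['3/8', '11/24', '1/6']
```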
Why Power Iteration works? (1) Details!

§ Power iteration: a method for finding the dominant eigenvector (the vector corresponding to the largest eigenvalue)
  § $r^{(1)} = M \cdot r^{(0)}$
  § $r^{(2)} = M \cdot r^{(1)} = M (M r^{(0)}) = M^2 \cdot r^{(0)}$
  § $r^{(3)} = M \cdot r^{(2)} = M (M^2 r^{(0)}) = M^3 \cdot r^{(0)}$
§ Claim: the sequence $M \cdot r^{(0)}, M^2 \cdot r^{(0)}, \dots, M^k \cdot r^{(0)}, \dots$ approaches the dominant eigenvector of $M$ ($M$ is a stochastic/Markov matrix)
§ NOTE: $x$ is an eigenvector with the corresponding eigenvalue $\lambda$ if $M x = \lambda x$
§ The optimal $r$ is the first or principal eigenvector of $M$, with corresponding eigenvalue 1

Why Power Iteration works? (2) Details!

§ Claim: the sequence $M \cdot r^{(0)}, M^2 \cdot r^{(0)}, \dots, M^k \cdot r^{(0)}, \dots$ approaches the dominant eigenvector of $M$
§ NOTE: $x$ is an eigenvector with the corresponding eigenvalue $\lambda$ if $M x = \lambda x$
§ Proof:
  § Assume $M$ has $n$ linearly independent eigenvectors $x_1, x_2, \dots, x_n$ with corresponding eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, where $\lambda_1 > \lambda_2 > \dots > \lambda_n$
  § Vectors $x_1, x_2, \dots, x_n$ form a basis and thus we can write:
    $$r^{(0)} = c_1 x_1 + c_2 x_2 + \dots + c_n x_n$$
  § Applying $M$:
    $$M r^{(0)} = M (c_1 x_1 + c_2 x_2 + \dots + c_n x_n)$$
    $$= c_1 (M x_1) + c_2 (M x_2) + \dots + c_n (M x_n)$$
    $$= c_1 (\lambda_1 x_1) + c_2 (\lambda_2 x_2) + \dots + c_n (\lambda_n x_n)$$
  § Repeated multiplication on both sides produces
    $$M^k r^{(0)} = c_1 (\lambda_1^k x_1) + c_2 (\lambda_2^k x_2) + \dots + c_n (\lambda_n^k x_n)$$

Why Power Iteration works? (3) Details!

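§ Claim: the sequence $M \cdot r^{(0)}, M^2 \cdot r^{(0)}, \dots, M^k \cdot r^{(0)}, \dots$ approaches the dominant eigenvector of $M$
§ Proof (continued):
  § Repeated multiplication on both sides produces
    $$M^k r^{(0)} = c_1 (\lambda_1^k x_1) + c_2 (\lambda_2^k x_2) + \dots + c_n (\lambda_n^k x_n)$$
  § Factoring out $\lambda_1^k$:
    $$M^k r^{(0)} = \lambda_1^k \left[ c_1 x_1 + c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k x_2 + \dots + c_n \left(\frac{\lambda_n}{\lambda_1}\right)^k x_n \right]$$
  § Since $\lambda_1 > \lambda_2$, the fractions $\frac{\lambda_2}{\lambda_1}, \frac{\lambda_3}{\lambda_1}, \dots < 1$, and so $\left(\frac{\lambda_i}{\lambda_1}\right)^k \to 0$ as $k \to \infty$ (for all $i = 2 \dots n$)
  § Thus: $M^k r^{(0)} \approx c_1 \lambda_1^k x_1$
§ Note: if $c_1 = 0$ then the method won’t converge
§ The largest eigenvalue of a stochastic matrix is always 1
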
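The proof suggests that the error should shrink by roughly the ratio of the second-largest to the largest eigenvalue (in magnitude) per step. The sketch below checks this numerically on the y/a/m matrix from the earlier slides. The eigenvalues quoted in the comments were worked out by hand from the characteristic polynomial of this particular matrix and are not stated in the lecture; the second one is negative, which is why magnitudes are compared.

```python
def step(M, r):
    """One power-iteration step: return M times r."""
    n = len(r)
    return [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]

M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
r_star = [0.4, 0.4, 0.2]            # stationary vector of this M

r = [1 / 3] * 3
errors = []
for _ in range(30):
    r = step(M, r)
    errors.append(sum(abs(a - b) for a, b in zip(r, r_star)))

# Eigenvalues of this M (hand-derived): 1, (-1 + sqrt(5))/4, (-1 - sqrt(5))/4.
second = (1 + 5 ** 0.5) / 4          # magnitude of the second-largest eigenvalue
ratio = errors[-1] / errors[-2]      # observed per-step error reduction
print(round(ratio, 4), round(second, 4))
```

A ratio this close to 1 means slow convergence, which is one practical reason to care about the eigenvalue gap.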
PageRank: Three Questions

$$r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}$$
or equivalently $r = M r$

§ Does this converge?
§ Does it converge to what we want?
§ Are results reasonable?

Does this converge?

$$r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}$$

[Figure: two pages a and b that link to each other]

§ Example:

Iteration   0   1   2   3   …
r_a         1   0   1   0
r_b         0   1   0   1

Does it converge to what we want?

$$r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}$$

[Figure: page a links to page b; page b has no out-links]

§ Example:

Iteration   0   1   2   3   …
r_a         1   0   0   0
r_b         0   1   0   0

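Both two-page examples are small enough to replay directly. In this sketch the matrices are written out by hand from the two graphs (a and b linking to each other, and a linking to a dead-end b); the helper names `step` and `run` are illustrative.

```python
def step(M, r):
    """One update r <- M r, where M[j][i] is the share page i sends to page j."""
    n = len(r)
    return [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]

def run(M, r, steps):
    """Return the list of rank vectors for iterations 0..steps."""
    history = [r]
    for _ in range(steps):
        history.append(step(M, history[-1]))
    return history

cycle = [[0, 1],      # a receives everything from b
         [1, 0]]      # b receives everything from a
dead_end = [[0, 0],   # nobody links to a
            [1, 0]]   # b receives from a, and b itself has no out-links

cycle_history = run(cycle, [1, 0], 3)
dead_end_history = run(dead_end, [1, 0], 3)
print(cycle_history)     # [[1, 0], [0, 1], [1, 0], [0, 1]]
print(dead_end_history)  # [[1, 0], [0, 1], [0, 0], [0, 0]]
```

The first run oscillates forever and the second loses all of its rank, which motivates the two problems discussed next.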
PageRank: Problems

2 problems:

§ (1) Some pages are dead ends (have no out-links)
  § “Vote” has “nowhere” to go to
  § Such pages cause importance to “leak out”
§ (2) Spider traps (all out-links are within the group)
  § “Vote” gets “stuck” in a trap
  § And eventually spider traps absorb all importance

[Figure: web graph with a dead end and a spider trap highlighted]

Problem: Spider Traps

§ Power Iteration:
  § Set $r_j = 1/N$
  § $r_j = \sum_{i \to j} \frac{r_i}{d_i}$
  § And iterate

[Figure: graph with nodes y, a, m; m links only to itself, so m is a spider trap]

      y    a    m
y     ½    ½    0
a     ½    0    0
m     0    ½    1

r_y = r_y/2 + r_a/2
r_a = r_y/2
r_m = r_a/2 + r_m

§ Example:

Iteration   0     1     2      3       …   limit
r_y        1/3   2/6   3/12   5/24    …   0
r_a        1/3   1/6   2/12   3/24    …   0
r_m        1/3   3/6   7/12   16/24   …   1

All the PageRank score gets “trapped” in node m.

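A short sketch that replays the spider-trap example with exact fractions and then keeps iterating to show the rank piling up in m. The 60-step horizon is an arbitrary choice for illustration, and fractions print in lowest terms (16/24 appears as 2/3).

```python
from fractions import Fraction as F

# Column-stochastic matrix of the y/a/m graph in which m links only to itself.
M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(1, 2), F(1)]]

def step(M, r):
    """One power-iteration step: return M times r."""
    n = len(r)
    return [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]

r = [F(1, 3)] * 3
history = [r]
for _ in range(60):
    r = step(M, r)
    history.append(r)

print([str(x) for x in history[3]])  # ['5/24', '1/8', '2/3']
print(float(history[60][2]))         # very close to 1: m holds almost all rank
```

Total rank is still conserved here (the matrix is column stochastic); it simply all ends up in the trap.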
Solution: Teleports!

§ The Google solution for spider traps: at each time step, the “vote” has two options
  § With prob. β, follow a link at random
  § With prob. 1−β, jump to some random page
  § Common values for β are in the range 0.8 to 0.9
§ “Vote” will teleport out of the spider trap within a few time steps

[Figure: the y/a/m graph without and with teleport links]

Problem: Dead Ends

§ Power Iteration:
  § Set $r_j = 1/N$
  § $r_j = \sum_{i \to j} \frac{r_i}{d_i}$
  § And iterate

[Figure: graph with nodes y, a, m; m has no out-links, so m is a dead end]

      y    a    m
y     ½    ½    0
a     ½    0    0
m     0    ½    0

r_y = r_y/2 + r_a/2
r_a = r_y/2
r_m = r_a/2

§ Example:

Iteration   0     1     2      3      …   limit
r_y        1/3   2/6   3/12   5/24   …   0
r_a        1/3   1/6   2/12   3/24   …   0
r_m        1/3   1/6   1/12   2/24   …   0

Here the PageRank “leaks” out since the matrix is not stochastic.

Solution: Always Teleport!

§ Teleports: Follow random teleport links with probability 1.0 from dead ends
  § Adjust matrix accordingly

[Figure: the y/a/m graph before and after adding teleport links out of the dead end m]

Before (m is a dead end):

      y    a    m
y     ½    ½    0
a     ½    0    0
m     0    ½    0

After (column m replaced by uniform teleports):

      y    a    m
y     ½    ½    ⅓
a     ½    0    ⅓
m     0    ½    ⅓

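The matrix adjustment can be sketched as a small helper that replaces every all-zero column with a uniform 1/N column. The name `fix_dead_ends` is hypothetical; the check at the end compares how much total rank survives one step before and after the fix.

```python
from fractions import Fraction as F

def fix_dead_ends(M):
    """Return a copy of M in which every all-zero column becomes uniform 1/N."""
    n = len(M)
    fixed = [row[:] for row in M]
    for i in range(n):
        if all(M[j][i] == 0 for j in range(n)):   # page i has no out-links
            for j in range(n):
                fixed[j][i] = F(1, n)
    return fixed

def step(M, r):
    """One power-iteration step: return M times r."""
    n = len(r)
    return [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]

M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(1, 2), F(0)]]
M_fixed = fix_dead_ends(M)

r0 = [F(1, 3)] * 3
leaky = sum(step(M, r0))            # rank left after one step with the dead end
conserved = sum(step(M_fixed, r0))  # rank left after one step with the fix
print(leaky, conserved)  # 2/3 1
```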
Why Teleports Solve the Problem?

§ Spider traps are not a problem, but with traps PageRank scores are not what we want
  § Solution: Never get stuck in a spider trap by teleporting out of it in a finite number of steps
§ Dead ends are a problem
  § The matrix is not column stochastic, so our initial assumptions are not met
  § Solution: Make the matrix column stochastic by always teleporting when there is nowhere else to go

Solution: Random Teleports
§ Google’s solution that does it all: at each step, the random surfer has two options:
  § With probability β, follow a link at random
  § With probability 1−β, jump to some random page
§ PageRank equation [Larry Page and Sergey Brin 1998]:
  $$r_j = \sum_{i \to j} \beta \frac{r_i}{d_i} + (1 - \beta) \frac{1}{N}$$
  where $d_i$ is the out-degree of node i

This formulation assumes that $M$ has no dead ends. We can either preprocess matrix $M$ to remove all dead ends or explicitly follow random teleport links with probability 1.0 from dead ends.

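The PageRank equation can be applied directly on an adjacency list, without forming a matrix. The following is a minimal sketch assuming no dead ends, as the slide requires; the function name `pagerank`, the tolerance, and the iteration cap are illustrative choices. It is run on the spider-trap graph (m links only to itself) with β = 0.8.

```python
def pagerank(adj, beta=0.8, eps=1e-12, max_iter=1000):
    """Iterate r_j = sum_{i->j} beta * r_i / d_i + (1 - beta) / N.

    Assumes every page in adj has at least one out-link (no dead ends).
    """
    pages = list(adj)
    n = len(pages)
    r = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        r_next = {p: (1 - beta) / n for p in pages}     # teleport share
        for i, targets in adj.items():
            share = beta * r[i] / len(targets)          # vote sent along each link
            for j in targets:
                r_next[j] += share
        delta = sum(abs(r_next[p] - r[p]) for p in pages)
        r = r_next
        if delta < eps:
            break
    return r

adj = {"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]}
ranks = pagerank(adj)
print({p: round(v, 4) for p, v in ranks.items()})
# {'y': 0.2121, 'a': 0.1515, 'm': 0.6364}
```

With teleports, the trap m still ranks highest but no longer absorbs everything.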
The Google Matrix

§ PageRank equation [Brin-Page, ‘98]
  $$r_j = \sum_{i \to j} \beta \frac{r_i}{d_i} + (1 - \beta) \frac{1}{N}$$
§ The Google Matrix $A$:
  $$A = \beta M + (1 - \beta) \left[ \frac{1}{N} \right]_{N \times N}$$
  where $[1/N]_{N \times N}$ is the N by N matrix with all entries equal to 1/N
§ We have a recursive problem: $r = A \cdot r$, and the power method still works!
§ What is β?
  § In practice β = 0.8 to 0.9 (make about 5 steps on average, then jump)

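Random Teleports (β = 0.8)

$$A = 0.8 \begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{bmatrix} + 0.2 \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix} = \begin{bmatrix} 7/15 & 7/15 & 1/15 \\ 7/15 & 1/15 & 1/15 \\ 1/15 & 7/15 & 13/15 \end{bmatrix}$$

Rows and columns are ordered y, a, m; the first matrix is M and the second is [1/N]NxN.

[Figure: the y/a/m graph with each edge labelled by the corresponding entry of A]

Iteration   0     1      2      3      …   limit
r_y        1/3   0.33   0.28   0.26   …   7/33
r_a        1/3   0.20   0.20   0.18   …   5/33
r_m        1/3   0.47   0.52   0.56   …   21/33
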
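The entries of A and its limit vector can be checked with exact fractions. This sketch builds A = βM + (1−β)[1/N] for the spider-trap matrix with β = 4/5 and confirms that (7/33, 5/33, 21/33) is a fixed point; the helper names `google_matrix` and `step` are assumptions for illustration.

```python
from fractions import Fraction as F

def google_matrix(M, beta):
    """Return A = beta * M + (1 - beta) * [1/N], entry by entry."""
    n = len(M)
    return [[beta * M[j][i] + (1 - beta) / n for i in range(n)] for j in range(n)]

def step(A, r):
    """One power-iteration step: return A times r."""
    n = len(r)
    return [sum(A[j][i] * r[i] for i in range(n)) for j in range(n)]

M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(1, 2), F(1)]]
A = google_matrix(M, F(4, 5))

r_star = [F(7, 33), F(5, 33), F(21, 33)]
is_fixed_point = step(A, r_star) == r_star

iterates = [[F(1, 3)] * 3]
for _ in range(3):
    iterates.append(step(A, iterates[-1]))

print([str(x) for x in A[0]])   # ['7/15', '7/15', '1/15']
print(is_fixed_point)           # True
for r in iterates:
    print([round(float(x), 2) for x in r])
```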
MapReduce Program for PageRank
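Map(key, value) {
    // key: a page
    // value: page rank of the page, PR(key)
    // Adj[key]: the pages that key links to

    for each page in Adj[key]
        emit(page, PR(key) / sizeof(Adj[key]));
}

Reduce(key, values) {
    // key: a page
    // values: a list of page rank contributions from all its incoming pages
    // b: probability of following a link (teleport with probability 1-b)
    // the constant term is 1-b rather than (1-b)/N, so when every page
    // starts with PR = 1 the ranks are scaled to sum to N

    PR(key) = 1 - b;
    for each pagerank in values
        PR(key) = PR(key) + b * pagerank;
    emit(key, PR(key));
}
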
MapReduce Program for PageRank

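Links.txt (each line is a page followed by its out-links) and initial PR:

  A B C D     PR(A) = 1
  B A D       PR(B) = 1
  C A B       PR(C) = 1
  D B C       PR(D) = 1

Map output (target page, contribution):

  from A: (B, 1/3), (C, 1/3), (D, 1/3)
  from B: (A, 1/2), (D, 1/2)
  from C: (A, 1/2), (B, 1/2)
  from D: (B, 1/2), (C, 1/2)

Reduce input (values grouped by key):

  A: 1/2, 1/2
  B: 1/3, 1/2, 1/2
  C: 1/3, 1/2
  D: 1/3, 1/2
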
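One round of the Map and Reduce functions can be simulated in plain Python on the four-page example. This is a single-process sketch, not a distributed job: the `shuffle` step stands in for the framework's grouping by key, and b = 4/5 is an assumed value since the example slide does not fix b.

```python
from collections import defaultdict
from fractions import Fraction as F

adj = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A", "B"], "D": ["B", "C"]}
pr = {page: F(1) for page in adj}   # initial PR of 1 per page, as on the slide
b = F(4, 5)                         # assumed; not given for this example

def map_phase(adj, pr):
    """Emit (target, PR(source) / out-degree) for every link."""
    for page, targets in adj.items():
        for target in targets:
            yield target, pr[page] / len(targets)

def shuffle(pairs):
    """Group emitted values by key, as the MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped, b):
    """PR(key) = (1 - b) + b * sum of incoming contributions."""
    # Pages with no in-links never reach the reducer in this simple version.
    return {key: (1 - b) + b * sum(values) for key, values in grouped.items()}

grouped = shuffle(map_phase(adj, pr))
new_pr = reduce_phase(grouped, b)
print({k: str(v) for k, v in sorted(new_pr.items())})
# {'A': '1', 'B': '19/15', 'C': '13/15', 'D': '13/15'}
```

Feeding `new_pr` back in as `pr` gives the next iteration; the total stays at 4 because every page here has out-links and in-links.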
Web Search Engines

§ Indexer
  § Processes the retrieved pages/documents and represents them in efficient search data structures (inverted files)
§ Query Server
  § Accepts the query from the user and returns the result pages by consulting the search data structures
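To make the indexer and query server roles concrete, here is a toy inverted index. The three documents, the whitespace tokenization, and the AND semantics of `search` are all assumptions for illustration; a real engine would also rank the matching pages, for example by PageRank.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Indexer: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Query server: return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical mini-collection of documents
docs = {
    1: "big data analytics",
    2: "pagerank ranks web pages",
    3: "web search and big data",
}
index = build_inverted_index(docs)
print(sorted(search(index, "big data")))  # [1, 3]
print(sorted(search(index, "web")))       # [2, 3]
```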
