CC-4007, Large-Scale Machine Learning on Graphs, by Yucheng Low, Joseph Gonzalez and Carlos Guestrin

Large-Scale Machine Learning and Graphs

Yucheng Low

Phase 1: POSSIBILITY
Benz Patent Motorwagen (1886)

Phase 2: SCALABILITY
Model T Ford (1908)

Possibility

Scalability

+ Graph

Usability

The Big Question of Big Learning

How will we design and implement parallel learning systems?

MapReduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel: MapReduce
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics

Graph-Parallel: ?
Is there more to Machine Learning?
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting

Estimate Political Bias

Semi-Supervised & Transductive Learning

[Figure: a graph of posts in which a few nodes are labeled Liberal or Conservative and the remaining nodes (?) are unlabeled]

Flashback to 1998

First Google advantage: a Graph Algorithm & a System to Support it!

The Power of Dependencies
where the value is!

It’s all about the graphs…

Graphs encode the relationships between:
People, Products, Ideas, Facts, Interests
(Social Media, Advertising, Web, Science)

• Big: 100 billion vertices and edges, plus rich metadata
• Facebook (10/2012): 1B users, 144B friendships
• Twitter (2011): 15B follower edges

Examples of Graphs in Machine Learning

Collaborative Filtering: Exploiting Dependencies

Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita

Latent Factor Models
Matrix Completion/Factorization Models

Topic Modeling

Latent Dirichlet Allocation, etc.

Cat, Apple, Growth, Hat, Plant

Example Topics Discovered from Wikipedia

Machine Learning Pipeline

Data → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data

Data: images, docs, movie ratings, social activity
Value from Data: face labels, doc topics, movie recommendations, sentiment analysis

ML Tasks Beyond Data-Parallelism

Data-Parallel: MapReduce
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics

Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting

Example of a Graph-Parallel Algorithm

PageRank

What’s the rank of this user?
Depends on rank of who follows her
Depends on rank of who follows them…

Loops in graph → Must iterate!

PageRank Iteration

Iterate until convergence:
“My rank is weighted average of my friends’ ranks”

$$R[i] = \alpha + (1 - \alpha) \sum_{(j,i) \in E} w_{ji} R[j]$$

• α is the random reset probability
• w_ji is the probability of transitioning (similarity) from j to i
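
The update rule above can be sketched as a plain synchronous fixed-point iteration. This is only an illustration, not GraphLab code: the function name `pagerank`, the `in_edges` dictionary layout, the tolerance, and the example graph are all assumptions made for this sketch.

```python
def pagerank(in_edges, alpha=0.15, tol=1e-10, max_iters=1000):
    """Iterate R[i] = alpha + (1 - alpha) * sum_j w_ji * R[j] until convergence.

    in_edges maps each vertex i to a list of (j, w_ji) pairs, one per edge (j, i).
    (Hypothetical data layout chosen for this sketch.)
    """
    ranks = {v: 1.0 for v in in_edges}
    for _ in range(max_iters):
        # Synchronous sweep: every new rank is computed from the previous ranks.
        new_ranks = {
            i: alpha + (1 - alpha) * sum(w * ranks[j] for j, w in nbrs)
            for i, nbrs in in_edges.items()
        }
        delta = max(abs(new_ranks[v] - ranks[v]) for v in ranks)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Hypothetical two-vertex graph: a single edge b -> a with weight 0.5.
chain = {"a": [("b", 0.5)], "b": []}
chain_ranks = pagerank(chain)
```

Because the graph may contain loops, a single pass is generally not enough; the loop stops once no rank moves by more than `tol`.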

Properties of Graph Parallel Algorithms

• Dependency Graph
• Local Updates
• Iterative Computation

[Figure labels: My Rank, Friends Rank]

The Need for a New Abstraction

• Need: Asynchronous, Dynamic Parallel Computations

Data-Parallel: MapReduce
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics

Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Data-Mining: PageRank, Triangle Counting

The GraphLab Goals

Know how to solve ML problem on 1 machine → Efficient parallel predictions

Data Graph

Data associated with vertices and edges

Graph:
• Social Network

Vertex Data:
• User profile text
• Current interests estimates

Edge Data:
• Similarity weights

How do we program graph computation?

“Think like a Vertex.”
- Malewicz et al. [SIGMOD’10]

Update Functions

User-defined program: applied to a vertex, transforms data in the scope of the vertex

    pagerank(i, scope){
      // Get Neighborhood data
      (R[i], wji, R[j]) ← scope;

      // Update the vertex data
      R[i] ← α + (1 − α) Σ_{j ∈ N[i]} wji × R[j];

      // Reschedule Neighbors if needed
      if R[i] changes then
        reschedule_neighbors_of(i);
    }

• Update function applied (asynchronously) in parallel until convergence
• Many schedulers available to prioritize computation
• Dynamic computation
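
The rescheduling step of the update function can be sketched as a sequential worklist. This is a simplified illustration under stated assumptions: a FIFO queue stands in for the scheduler, execution is single-threaded rather than asynchronous and parallel, and the names `dynamic_pagerank`, `in_edges`, and `out_nbrs` are hypothetical.

```python
from collections import deque


def dynamic_pagerank(in_edges, out_nbrs, alpha=0.15, tol=1e-9):
    """Run the update function only on scheduled vertices.

    in_edges[i] -> list of (j, w_ji) pairs for edges (j, i)
    out_nbrs[i] -> list of vertices k with an edge (i, k)
    A FIFO queue stands in for the scheduler (an assumption of this sketch).
    """
    ranks = {v: 1.0 for v in in_edges}
    queue = deque(in_edges)      # initially every vertex is scheduled
    scheduled = set(in_edges)
    updates = 0
    while queue:
        i = queue.popleft()
        scheduled.discard(i)
        new_rank = alpha + (1 - alpha) * sum(w * ranks[j] for j, w in in_edges[i])
        changed = abs(new_rank - ranks[i]) > tol
        ranks[i] = new_rank
        updates += 1
        if changed:
            # Reschedule only the neighbors whose input just changed.
            for k in out_nbrs[i]:
                if k not in scheduled:
                    scheduled.add(k)
                    queue.append(k)
    return ranks, updates


# Hypothetical two-vertex graph: a single edge b -> a with weight 0.5.
in_edges = {"a": [("b", 0.5)], "b": []}
out_nbrs = {"a": [], "b": ["a"]}
ranks, updates = dynamic_pagerank(in_edges, out_nbrs)
```

Vertices whose inputs have stopped changing are never revisited, which is the sense in which the computation is dynamic.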

The GraphLab Framework

• Graph Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Alternating Least Squares, CoEM, Lasso, SVD, Belief Propagation, LDA, Splash Sampler, Bayesian Tensor Factorization, PageRank, SVM, Gibbs Sampling, Dynamic Block Gibbs Sampling, K-Means, Linear Solvers, Matrix Factorization, …Many others…

Never Ending Learner Project (CoEM)

Hadoop: 95 Cores, 7.5 hrs
Distributed GraphLab: 32 EC2 machines, 80 secs

0.3% of Hadoop time
2 orders of mag faster → 2 orders of mag cheaper

Thus far…
• ML algorithms as vertex programs
• Asynchronous execution and consistency models

GraphLab 1 provided exciting scaling performance

But…

We couldn’t scale up to the Altavista Webgraph 2002:
1.4B vertices, 6.7B edges

Natural Graphs
[Image from WikiCommons]

Problem:
Existing distributed graph computation systems perform poorly on Natural Graphs

Achilles Heel: Idealized Graph Assumption

Assumed…
Small degree → Easy to partition

But, Natural Graphs…
Many high degree vertices (power-law degree distribution) → Very hard to partition

Power-Law Degree Distribution

[Figure: log-log plot of Number of Vertices (count) against Degree for the AltaVista WebGraph, 1.4B Vertices, 6.6B Edges]

High-Degree Vertices: 1% of vertices adjacent to 50% of edges

High Degree Vertices are Common

[Figure: Netflix graph of Users and Movies; LDA graphical model over Docs and Words with variables α, β, θ, Z, w. Labels: Popular Movies, “Social” People, Hyper Parameters, Common Words, Obama]

Power-Law Degree Distribution

“Star Like” Motif
President Obama, Followers

Problem: High Degree Vertices → High Communication for Distributed Updates

Data transmitted across network: O(# cut edges)
[Figure: vertex Y with its edges cut between Machine 1 and Machine 2]

Natural graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]

Popular partitioning tools (Metis, Chaco, …) perform poorly [Abou-Rjeili et al. 06]
Extremely slow and require substantial memory

Random Partitioning

• GraphLab 1, Pregel, Twitter, Facebook, … all rely on Random (hashed) partitioning for Natural Graphs

For p Machines:

$$\mathbb{E}\left[\frac{|\text{Edges Cut}|}{|E|}\right] = 1 - \frac{1}{p}$$

10 Machines → 90% of edges cut
100 Machines → 99% of edges cut!

All data is communicated… Little advantage over MapReduce
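
The expected cut fraction of 1 − 1/p can be checked empirically with a small simulation. The sketch below is an illustration only: the helper names `cut_fraction` and `expected_cut_fraction` and the synthetic random graph are assumptions, not part of the talk.

```python
import random


def cut_fraction(edges, num_machines, seed=0):
    """Fraction of edges whose endpoints land on different machines when
    every vertex is assigned to a machine uniformly at random."""
    rng = random.Random(seed)
    machine = {}
    for u, v in edges:
        for x in (u, v):
            if x not in machine:
                machine[x] = rng.randrange(num_machines)
    cut = sum(1 for u, v in edges if machine[u] != machine[v])
    return cut / len(edges)


def expected_cut_fraction(num_machines):
    """Closed form from the slide: 1 - 1/p."""
    return 1 - 1 / num_machines


# Synthetic random graph (hypothetical data, only for illustration).
graph_rng = random.Random(1)
edges = []
while len(edges) < 20000:
    u, v = graph_rng.randrange(5000), graph_rng.randrange(5000)
    if u != v:
        edges.append((u, v))

for p in (2, 10, 100):
    print(p, round(cut_fraction(edges, p), 3), expected_cut_fraction(p))
```

The measured fraction should track the closed form closely, since two endpoints placed independently share a machine with probability 1/p.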

In Summary

GraphLab 1 and Pregel are not well suited for natural graphs

• Poor performance on high-degree vertices
• Low Quality Partitioning

Common Pattern for Update Fncs.

[Figure: edge from R[j] to R[i] with weight wji]

    GraphLab_PageRank(i)
      // Compute sum over neighbors
      // (Gather Information About Neighborhood)
      total = 0
      foreach (j in in_neighbors(i)):
        total = total + R[j] * wji

      // Update the PageRank
      // (Apply Update to Vertex)
      R[i] = 0.1 + total

      // Trigger neighbors to run again
      // (Scatter Signal to Neighbors & Modify Edge Data)
      if R[i] not converged then
        foreach (j in out_neighbors(i))
          signal vertex-program on j

GAS Decomposition

Gather (Reduce): Accumulate information about neighborhood
[Figure: parallel “sum” Σ = + + … + over the neighbors of vertex Y]

Apply: Apply the accumulated value to center vertex
[Figure: Σ applied to Y gives Y’]

Scatter: Update adjacent edges and vertices.
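
The three phases can be sketched as a small sequential engine that calls user-supplied `gather`, `merge`, `apply`, and `scatter` functions. This is a toy illustration of the decomposition, not the PowerGraph engine: the class `PageRankProgram`, the function `run_gas`, and the data layout are hypothetical, and everything runs on one thread.

```python
from functools import reduce


class PageRankProgram:
    """Hypothetical vertex program split into gather / merge / apply / scatter."""

    def gather(self, ranks, j, w_ji):
        # Contribution of one in-edge (j, i).
        return w_ji * ranks[j]

    def merge(self, a, b):
        # Commutative, associative combine, so partial sums can be merged in any order.
        return a + b

    def apply(self, acc):
        # New value of the center vertex from the accumulated sum.
        return 0.15 + 0.85 * acc

    def scatter(self, old, new, tol=1e-9):
        # Signal out-neighbors only if the value actually changed.
        return abs(new - old) > tol


def run_gas(program, in_edges, out_nbrs):
    ranks = {v: 1.0 for v in in_edges}
    active = list(in_edges)
    while active:
        i = active.pop()
        partials = [program.gather(ranks, j, w) for j, w in in_edges[i]]
        acc = reduce(program.merge, partials, 0.0)
        old, ranks[i] = ranks[i], program.apply(acc)
        if program.scatter(old, ranks[i]):
            active.extend(k for k in out_nbrs[i] if k not in active)
    return ranks


# Hypothetical two-vertex graph: a single edge b -> a with weight 0.5.
gas_ranks = run_gas(
    PageRankProgram(),
    in_edges={"a": [("b", 0.5)], "b": []},
    out_nbrs={"a": [], "b": ["a"]},
)
```

Keeping `merge` commutative and associative is what allows the gather phase to be computed as partial sums and combined afterwards.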

Many ML Algorithms fit into GAS Model

graph analytics, inference in graphical models, matrix factorization, collaborative filtering, clustering, LDA, …

Minimizing Communication in GL2 PowerGraph: Vertex Cuts

Communication linear in # spanned machines

A vertex-cut minimizes # machines per vertex

GL2 PowerGraph includes novel vertex cut algorithms
Provides order of magnitude gains in performance

Percolation theory suggests Power Law graphs can be split by removing only a small set of vertices [Albert et al. 2000]
→ Small vertex cuts possible!
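
A rough way to see what "# machines per vertex" measures is to place edges on machines and count how many machines each vertex spans. The sketch below uses random edge placement, which is an assumption made for simplicity and not one of the novel vertex-cut algorithms mentioned above; the name `replication_factor` and the star graph are hypothetical.

```python
import random
from collections import defaultdict


def replication_factor(edges, num_machines, seed=0):
    """Place each edge on a random machine (a random vertex cut) and return
    the average number of machines spanned per vertex."""
    rng = random.Random(seed)
    spans = defaultdict(set)
    for u, v in edges:
        m = rng.randrange(num_machines)
        spans[u].add(m)
        spans[v].add(m)
    return sum(len(s) for s in spans.values()) / len(spans)


# Hypothetical star graph: one hub (vertex 0) with 1000 followers.
star = [(0, leaf) for leaf in range(1, 1001)]
rf = replication_factor(star, num_machines=8)
```

In the star graph only the hub is replicated, on at most 8 machines, while each follower lives on exactly one machine, so the average stays close to 1 even though the hub has degree 1000.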

From the Abstraction to a System

Triangle Counting on Twitter Graph

34.8 Billion Triangles

Hadoop [WWW’11]: 1636 Machines, 423 Minutes
GraphLab2: 64 Machines, 15 Seconds

Why? Wrong Abstraction → Broadcast O(degree^2) messages per Vertex

S. Suri and S. Vassilvitskii, “Counting triangles and the curse of the last reducer,” WWW’11
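
For comparison with the broadcast approach, triangles can be counted by intersecting the neighbor sets of the two endpoints of every edge. The sketch below is a single-machine illustration with a hypothetical function name; it is not claimed to be the implementation behind the numbers above.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Deduplicate edges so each undirected edge is visited once.
    unique_edges = {(min(u, v), max(u, v)) for u, v in edges if u != v}

    # Each common neighbor of u and v closes one triangle on edge (u, v).
    closed = sum(len(adj[u] & adj[v]) for u, v in unique_edges)

    # Every triangle is found once per edge, i.e. three times.
    return closed // 3


# Complete graph on 4 vertices (hypothetical example).
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
k4_triangles = count_triangles(k4)
```

The work per edge is a set intersection over two neighborhoods, rather than a message for every pair of neighbors of a vertex.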

Topic Modeling (LDA)

• English language Wikipedia
• 2.6M Documents, 8.3M Words, 500M Tokens
• Computationally intensive algorithm

[Figure: bar chart of Million Tokens Per Second, axis from 0 to 160]
Smola et al.: 100 Yahoo! Machines, specifically engineered for this task
GL2 PowerGraph: 64 cc2.8xlarge EC2 Nodes, 200 lines of code & 4 human hours

How well does GraphLab scale?

Yahoo Altavista Web Graph (2002):
One of the largest publicly available webgraphs
1.4B Webpages, 6.7 Billion Links

7 seconds per iter.
1B links processed per second
64 HPC Nodes
30 lines of user code

GraphChi: Going small with GraphLab

Solve huge problems on small or embedded devices?

Key: Exploit non-volatile memory (starting with SSDs and HDs)

GraphChi: disk-based GraphLab

Challenge: Random Accesses

Novel GraphChi solution:
Parallel sliding windows method → minimizes number of random accesses

Triangle Counting on Twitter Graph

40M Users, 1.2B Edges
Total: 34.8 Billion Triangles

Hadoop: 1636 Machines, 423 Minutes
GraphChi: 59 Minutes, 1 Mac Mini!
GraphLab2: 64 Machines, 1024 Cores, 15 Seconds

Hadoop results from [Suri & Vassilvitskii '11]

• ML algorithms as vertex programs
• Asynchronous execution and consistency models

• Natural graphs change the nature of computation
• Vertex cuts and gather/apply/scatter model

GL2 PowerGraph focused on Scalability at the loss of Usability

GraphLab 1

Explicitly described operations:

    PageRank(i, scope){
      acc = 0
      for (j in InNeighbors) {
        acc += pr[j] * edge[j].weight
      }
      pr[i] = 0.15 + 0.85 * acc
    }

Code is intuitive

GraphLab 1

Explicitly described operations (the same PageRank update function as on the previous slide)
Code is intuitive

GL2 PowerGraph

Implicit operation, implicit aggregation:

    gather(edge) {
      return edge.source.value * edge.weight
    }
    merge(acc1, acc2) {
      return acc1 + acc2
    }
    apply(v, accum) {
      v.pr = 0.15 + 0.85 * accum
    }

Need to understand engine to understand code

GraphLab 1: Great flexibility, but hit scalability wall

GL2 PowerGraph: Scalability, but very rigid abstraction
(many contortions needed to implement SVD++, Restricted Boltzmann Machines)

What now?

GL3 WarpGraph Goals

Program Like GraphLab 1
Run Like GraphLab 2

[Figure: Machine 1, Machine 2]

Fine-Grained Primitives

Expose Neighborhood Operations through Parallel Iterators

$$R[i] = 0.15 + 0.85 \sum_{(j,i) \in E} w[j,i] \cdot R[j]$$

    PageRankUpdateFunction(Y) {
      Y.pagerank = 0.15 + 0.85 *
        MapReduceNeighbors(
          lambda nbr: nbr.pagerank * nbr.weight,
          lambda (a, b): a + b
        )   // aggregate sum over neighbors
    }
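
A map-reduce-over-neighbors primitive of this kind can be imitated in a few lines. The sketch below is a sequential stand-in written for illustration: the `Vertex` class, the `map_reduce_neighbors` signature, and the way edge weights are passed to the map function are assumptions, not the WarpGraph API.

```python
from functools import reduce


class Vertex:
    """Hypothetical vertex holding a PageRank value and its in-neighbors."""

    def __init__(self, pagerank=1.0):
        self.pagerank = pagerank
        self.in_nbrs = []  # list of (neighbor Vertex, edge weight) pairs


def map_reduce_neighbors(vertex, map_fn, combine_fn, initial=0.0):
    """Map over the in-neighbors, then fold the results with combine_fn.

    A real engine could evaluate the map calls in parallel; here they run
    one after another.
    """
    mapped = (map_fn(nbr, weight) for nbr, weight in vertex.in_nbrs)
    return reduce(combine_fn, mapped, initial)


def pagerank_update(y):
    y.pagerank = 0.15 + 0.85 * map_reduce_neighbors(
        y,
        lambda nbr, weight: nbr.pagerank * weight,  # map: weighted neighbor rank
        lambda a, b: a + b,                         # combine: sum
    )


# Hypothetical neighborhood: two in-neighbors with weights 0.5 and 0.25.
a, b, y = Vertex(2.0), Vertex(4.0), Vertex()
y.in_nbrs = [(a, 0.5), (b, 0.25)]
pagerank_update(y)
```

The update function reads like the explicit loop of GraphLab 1, while the aggregation itself stays a map plus an associative combine.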

Expressive, Extensible Neighborhood API

• MapReduce over Neighbors: Parallel Sum
• Parallel Transform Adjacent Edges: Modify adjacent edges
• Broadcast: Schedule a selected subset of adjacent vertices

Can express every GL2 PowerGraph program (more easily) in GL3 WarpGraph

But GL3 is more expressive:
• Multiple gathers
• Scatter before gather
• Conditional execution

    UpdateFunction(v) {
      if (v.data == 1)
        accum = MapReduceNeighs(g,m)
      else ...
    }

Graph Coloring
Twitter Graph: 41M Vertices, 1.4B Edges

GL2 PowerGraph: 227 seconds
GL3 WarpGraph: 60 seconds (3.8x Faster)

WarpGraph outperforms PowerGraph with simpler code

32 Nodes x 16 Cores (EC2 HPC cc2.8x)

• ML algorithms as vertex programs
• Asynchronous execution and consistency models

• Natural graphs change the nature of computation
• Vertex cuts and gather/apply/scatter model

• Usability is key
• Access neighborhood through parallelizable iterators and latency hiding

Usability for Whom???

PowerGraph, WarpGraph, …

Machine Learning
PHASE 3 (part 2): USABILITY

Exciting Time to Work in ML

“With Big Data, I’ll take over the world!!!”
“We met because of Big Data”
“Why won’t Big Data read my mind???”

Unique opportunities to change the world!! ☺

But, every deployed system is a one-off solution, and requires PhDs to make work…

But…

• Even basics of scalable ML can be challenging
• ML key to any new service we want to build
• 6 months from R/Matlab to production, at best
• State-of-art ML algorithms trapped in research papers

Goal of GraphLab 3:
Make huge-scale machine learning accessible to all!

Step 0: Learn ML with GraphLab Notebook

GraphLab + Python
Step 1: pip install graphlab
prototype on local machine

GraphLab + Python
Step 2: scale to full dataset in the cloud with minimal code changes

GraphLab + Python
Step 3: deploy in production

GraphLab + Python
Step 4: ???

GraphLab + Python
Step 4: Profit

GraphLab Toolkits

Highly scalable, state-of-the-art machine learning methods… all accessible from Python

Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering

Now with GraphLab: Learn/Prototype/Deploy

• Even basics of scalable ML can be challenging → Learn ML with GraphLab Notebook
• 6 months from R/Matlab to production, at best → pip install graphlab, then deploy on EC2/Cluster
• State-of-art ML algorithms trapped in research papers → Fully integrated via GraphLab Toolkits

We’re selecting strategic partners

Help define our strategy & priorities
And, get the value of GraphLab in your company

partners@graphlab.com

C++ GraphLab 2.2 available now: graphlab.com
Beta Program: beta.graphlab.com
Follow us on Twitter: @graphlabteam
Define our future: partners@graphlab.com
Needless to say: jobs@graphlab.com
