ML Workshop 1: A New Architecture for Machine Learning Logistics

© 2017 MapR Technologies
Machine Learning Model Management
How the rendezvous framework works

Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Committer, PMC member, board member, ASF
O’Reilly author
Email: tdunning@mapr.com, tdunning@apache.org
Twitter: @Ted_Dunning

Traditional View

Traditional View: This isn’t the whole story

90% of the effort in successful machine learning isn’t in the training or model development…
it’s the logistics

Rendezvous Architecture
[Diagram: request and response, an Input stream, Model 1, Model 2 and Model 3, a Scores stream, the Rendezvous server, and Results]

What We Ultimately Want
[Diagram: a single Model that turns a request into a response]

But This Isn’t The Answer
[Diagram: request and response passing through a Load balancer in front of Model 1, Model 2 and Model 3]

First Try with Streams
[Diagram: a request goes into an Input stream read by Model 1, Model 2 and Model 3; how the response gets back is marked “?”]

First Rendezvous
[Diagram: request and response, an Input stream, Model 1, Model 2 and Model 3, a Scores stream, the Rendezvous server, and Results]

Some Key Points
• Note that all models see identical inputs
• All models run in a production setting
• All models send scores to the same stream
• The rendezvous server decides which scores to ignore
• Roll-forward, roll-back, and correlated comparison all become trivial
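To make these points concrete, here is a minimal in-memory sketch. It assumes plain Python lists stand in for the input and scores streams and that the three models are toy functions; `run_models`, `rendezvous`, `m1`, `m2` and `m3` are hypothetical names, not part of any real framework. Every model scores every input, all scores land in one shared list, and only the rendezvous step decides which score to use, so rolling forward or back is just a change to a preference list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Score:
    message_id: str
    model: str
    value: float


def run_models(inputs: List[dict], models: Dict[str, Callable[[dict], float]]) -> List[Score]:
    """Every model consumes the same inputs and appends to one shared scores list."""
    scores: List[Score] = []
    for msg in inputs:
        for name, model in models.items():
            scores.append(Score(msg["messageId"], name, model(msg)))
    return scores


def rendezvous(scores: List[Score], preference: List[str]) -> Dict[str, Score]:
    """For each message keep the score from the most preferred model; ignore the rest."""
    rank = {name: i for i, name in enumerate(preference)}
    chosen: Dict[str, Score] = {}
    for s in scores:
        if s.model not in rank:
            continue  # scores from models outside the preference list are ignored
        best = chosen.get(s.message_id)
        if best is None or rank[s.model] < rank[best.model]:
            chosen[s.message_id] = s
    return chosen


# Hypothetical toy models; real ones would run in their own containers.
models = {
    "m1": lambda m: m["x"] * 1.0,
    "m2": lambda m: m["x"] * 2.0,
    "m3": lambda m: m["x"] * 3.0,
}
inputs = [{"messageId": "a", "x": 1.0}, {"messageId": "b", "x": 2.0}]
scores = run_models(inputs, models)

# Roll forward to m2: only the preference list changes, no model is redeployed.
print({k: v.value for k, v in rendezvous(scores, ["m2", "m1"]).items()})  # {'a': 2.0, 'b': 4.0}
# Roll back to m1 by reordering the preferences.
print({k: v.value for k, v in rendezvous(scores, ["m1", "m2"]).items()})  # {'a': 1.0, 'b': 2.0}
```

Because every model already produced a score for every message, a correlated comparison between any two models is just a join on `message_id` over the same list.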

Reality Check, Injecting External State
[Diagram: a request from The world enters a Raw stream; an “Add external data” step consults a Database and writes to the Input stream read by Model 1, Model 2 and Model 3]
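The state-injection step can be sketched as follows, assuming a Python dict stands in for the external database and that requests carry a `user` key used for the lookup; `StateInjector`, `inject`, `user` and `account_age_days` are hypothetical names chosen for illustration. The injector queries external state once per request and attaches it to the input message, so every model sees the same enriched data without querying the database itself.

```python
from typing import Dict, List


class StateInjector:
    """Hypothetical state injector: enriches each raw request with external data once."""

    def __init__(self, database: Dict[str, dict]) -> None:
        self.database = database  # a dict stands in for the external database
        self.lookups = 0  # counts external queries so duplication is visible

    def lookup(self, user_id: str) -> dict:
        self.lookups += 1
        return dict(self.database.get(user_id, {}))  # copy so later DB updates cannot leak in

    def inject(self, raw: dict) -> dict:
        """Turn a raw request into an input message carrying the state the models need."""
        enriched = dict(raw)  # leave the raw record itself untouched
        enriched["external"] = self.lookup(raw["user"])
        return enriched


# Illustrative data; these field names are assumptions, not part of the architecture.
db = {"u1": {"account_age_days": 400}, "u2": {"account_age_days": 3}}
injector = StateInjector(db)
raw_stream: List[dict] = [{"messageId": "a", "user": "u1"}, {"messageId": "b", "user": "u2"}]
input_stream = [injector.inject(r) for r in raw_stream]

# Three models read the same enriched messages and never touch the database.
features = {
    name: [msg["external"]["account_age_days"] for msg in input_stream]
    for name in ("m1", "m2", "m3")
}

print(injector.lookups)  # 2: one lookup per request, not one per request per model
```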

Recording Raw Data (as it really was)
[Diagram: an Input stream read by a Decoy, Model 2 and Model 3; a Scores stream; and an Archive]

Quality & Reproducibility of Input Data is Important!
• Recording raw-ish data is really a big deal
– Data as seen by a model is worth gold
– Data reconstructed later often has time-machine leaks (values that were not yet known when the model made its decision)
– Databases are made for updates; streams are safer
• Raw data is useful for non-ML cases as well (think flexibility)
• The decoy model records training data exactly as seen by models under development & evaluation
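A decoy can be sketched as a "model" that reads the same input stream as the real models but only archives what it sees. This assumes `io.StringIO` stands in for a durable archive and JSON lines as the record format; `decoy` and the `balance` field are hypothetical. The example shows why recording at decision time matters: when the external record is later updated in place, the archive still holds the value the models actually saw.

```python
import io
import json
from typing import Iterable, TextIO


def decoy(input_stream: Iterable[dict], archive: TextIO) -> int:
    """A decoy 'model': reads the same input as real models but only archives it.

    It never writes to the scores stream, so the rendezvous server never sees it.
    Returns the number of messages archived.
    """
    count = 0
    for msg in input_stream:
        # Serialize immediately so the record reflects the data exactly as models saw it.
        archive.write(json.dumps(msg, sort_keys=True) + "\n")
        count += 1
    return count


# io.StringIO stands in for a durable archive (a file, table, or stream).
archive = io.StringIO()
inputs = [{"messageId": "a", "external": {"balance": 100}}]
n = decoy(inputs, archive)

# Simulate the external record being updated in place after the decision was made.
inputs[0]["external"]["balance"] = 5

recorded = [json.loads(line) for line in archive.getvalue().splitlines()]
print(recorded[0]["external"]["balance"])  # 100: the value the models actually saw
```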

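Canary for Comparison
[Diagram: an Input stream feeding the Real model, a Canary, and a Decoy with its Archive; a ∆ comparison between the real model and the canary; and the Result]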

What Does the Canary Do?
• The canary is a real model, but it is very rarely updated
• The canary’s results are almost never used for decisioning
• The virtue of the canary is stability
• Comparing new models’ results with the canary’s gives insight into the new models
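Since the canary and the new model see identical inputs, comparing them is a correlated comparison: pair scores by `messageId` and summarize the differences. The sketch below assumes scores are plain floats keyed by message id; `compare_to_canary`, the `tol` threshold, and the summary fields are hypothetical choices, and a real deployment might track score distributions or decision disagreement instead.

```python
from statistics import mean
from typing import Dict


def compare_to_canary(candidate: Dict[str, float], canary: Dict[str, float], tol: float = 0.1) -> dict:
    """Correlated comparison: pair scores by messageId and summarize the differences."""
    common = sorted(candidate.keys() & canary.keys())
    if not common:
        raise ValueError("no messages scored by both models")
    diffs = [candidate[k] - canary[k] for k in common]
    return {
        "n": len(diffs),
        "mean_abs_diff": mean(abs(d) for d in diffs),
        "frac_beyond_tol": sum(abs(d) > tol for d in diffs) / len(diffs),
    }


canary_scores = {"a": 0.2, "b": 0.5, "c": 0.9}
candidate_scores = {"a": 0.25, "b": 0.8, "c": 0.9, "d": 0.1}  # "d" has no canary score yet
report = compare_to_canary(candidate_scores, canary_scores)
print(report["n"], round(report["frac_beyond_tol"], 3))
```

Because the canary rarely changes, a sudden shift in these summaries points at the new model (or at the input data), not at the baseline.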

Isolated Development With Stream Replication
[Diagram, two zones. Production: a request from The world enters Raw; “Add external data” produces Input; Model 1, Model 2 and Model 3 with Internal 1, Internal 2 and Internal 3. Development: a replicated Raw stream; “New external data” produces Input; Model 4 with Internal 4.]

[Diagram, built up over three slides: Raw and Input streams, a Features / profiles stage, models m1, m2 and m3, a Decoy with its Archive, a Scores stream, the Rendezvous server and Results, and finally Metrics]

Some Details
• Inside the rendezvous server
– Message contents, especially the return address
– Rendezvous mailbox
– Schedule ideas
• Inside a model container
– Identical inputs make scaling easy
– Nearly stateless models
– Streaming shims, latency rig
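A streaming shim keeps the model itself a plain, nearly stateless function and puts all stream handling around it. The sketch below assumes an iterable of dicts stands in for the input stream and that each score message carries its own latency measurement as a simple "latency rig"; `shim`, `double_x` and the `latencyMs` field are hypothetical names.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def shim(model_name: str, model: Callable[[dict], float], inputs: Iterable[dict]) -> Iterator[Dict]:
    """Streaming shim: adapts a plain scoring function to the input/scores stream protocol.

    The model itself stays (nearly) stateless; all stream handling lives here.
    """
    for msg in inputs:
        start = time.perf_counter()
        value = model(msg)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        yield {
            "messageId": msg["messageId"],  # lets the rendezvous match score to request
            "model": model_name,
            "score": value,
            "latencyMs": elapsed_ms,  # latency rig: each score carries its own timing
        }


def double_x(msg: dict) -> float:
    """Toy stand-in for a real model."""
    return 2.0 * msg["x"]


inputs = [{"messageId": "a", "x": 3}, {"messageId": "b", "x": 5}]
scores = list(shim("m1", double_x, inputs))
print([(s["messageId"], s["score"]) for s in scores])  # [('a', 6.0), ('b', 10.0)]
```

Since every replica of a model receives identical inputs and keeps no cross-message state, scaling out amounts to running more copies of the same shim on different partitions of the input.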

Message Content
• The input request contains the request data plus administrivia
{
  timestamp: 1501020498314,
  messageId: "2a5f2b61fdd848d7954a51b49c2a9e2c",
  return: "proxy-217",
  provenance: { ... },
  diagnostics: { ... },
  ... application-specific data here ...
}
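A request with this shape could be built as below. This is a sketch under assumptions: the timestamp is taken to be milliseconds since the Unix epoch (the example value falls in July 2017), the message id is assumed to be a random UUID in 32-digit hex form (the same shape as the example id), and `make_request`, `amount` and `merchant` are hypothetical. The `return` field is what lets the rendezvous server route the chosen result back to the right proxy.

```python
import json
import time
import uuid

ADMIN_FIELDS = ("timestamp", "messageId", "return", "provenance", "diagnostics")


def make_request(payload: dict, return_address: str) -> dict:
    """Wrap application data with the administrivia fields listed above."""
    clash = set(payload) & set(ADMIN_FIELDS)
    if clash:
        raise ValueError(f"payload uses reserved field(s): {sorted(clash)}")
    msg = {
        "timestamp": int(time.time() * 1000),  # assumed: milliseconds since the Unix epoch
        "messageId": uuid.uuid4().hex,  # assumed: random UUID as 32 hex digits
        "return": return_address,  # where the result should be routed back to
        "provenance": {},
        "diagnostics": {},
    }
    msg.update(payload)  # application-specific data sits alongside the administrivia
    return msg


req = make_request({"amount": 42.5, "merchant": "m-17"}, return_address="proxy-217")
print(json.dumps(req, indent=2))
```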

Rendezvous Schedules
• Simple part
– Up to the first deadline, accept only preferred models
– Up to the next deadline, accept more models
– Near the final deadline, accept a default answer
• But also allow some probabilistic choice
• And also consider external experimental control
– Inject it as external state
– Use it in the rendezvous to select a model’s result
– Open question: how much power to expose
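The simple part of a schedule can be sketched as a function over score arrivals. Assumptions: arrival times are milliseconds after the request, stages are `(deadline, accepted_models)` pairs in increasing order, the default answer is used when nothing acceptable arrives by the last deadline, and the `explore` probability is a crude, hypothetical stand-in for "some probabilistic choice". External experimental control is not modeled here; `choose_result` and the stage values are illustrative only.

```python
import random
from typing import Optional, Sequence, Set, Tuple

# (arrival time in ms after the request, model name, score)
Arrival = Tuple[float, str, float]


def choose_result(
    arrivals: Sequence[Arrival],
    stages: Sequence[Tuple[float, Set[str]]],
    default: float,
    explore: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float]:
    """Pick a result following a staged deadline schedule.

    stages: (deadline_ms, accepted_models) pairs in increasing deadline order.
    In each stage a score is eligible if it arrived by that stage's deadline from
    an accepted model. Normally the earliest eligible score wins; with probability
    `explore` a random eligible score is chosen instead. If no stage yields a
    score, the default answer is returned.
    """
    rng = rng or random.Random()
    ordered = sorted(arrivals)
    for deadline, accepted in stages:
        eligible = [a for a in ordered if a[0] <= deadline and a[1] in accepted]
        if eligible:
            pick = rng.choice(eligible) if explore > 0 and rng.random() < explore else eligible[0]
            return pick[1], pick[2]
    return "default", default


stages = [(10.0, {"m1"}), (50.0, {"m1", "m2", "m3"})]
print(choose_result([(8.0, "m1", 0.9), (4.0, "m2", 0.7)], stages, default=0.5))   # ('m1', 0.9)
print(choose_result([(4.0, "m2", 0.7), (30.0, "m3", 0.4)], stages, default=0.5))  # ('m2', 0.7)
print(choose_result([(60.0, "m1", 0.9)], stages, default=0.5))                    # ('default', 0.5)
```

In the second call the m2 score arrived early but is only accepted once the first deadline has passed without a preferred answer.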

The rendezvous server is simpler than it looks at first

Model Life Cycle
• Developer / modeler produces a container spec
– And uses it to build their development article
• QA inspects the container spec
– And uses it to build a test article
• Security inspects the container spec
– And uses it to build the final artifact
• Important to use tools like Grafeas to inspect the supply chain
http://bit.ly/grafeas
• Important that each step be inspectable

Almost all of the framework scales by trivial parallelism

Scaling Up
• Note about streams
– At millions of updates per server, the streams aren’t part of the scaling question
• Scaling up state injection
– Partition raw input, replicate the state injector
– Beware external throughput limits
– State injection does avoid duplicate queries: each request’s external lookup happens once, not once per model
• Scaling up models
– Stateless models allow trivial scaling
– Sequence state is typically also trivial to scale
• Scaling up the rendezvous
– Match partitioning on the raw and scores streams
– Replicate trivially
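Matching partitions means a request and all of its scores must land in the same partition number on both streams, so each rendezvous replica can own one partition and never needs to talk to the others. A minimal sketch, assuming partitioning by a stable hash of `messageId` (the hash choice, `partition_for`, and the partition count are illustrative assumptions):

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def partition_for(message_id: str, num_partitions: int) -> int:
    """Stable partition choice: the same messageId maps to the same partition everywhere.

    hashlib is used instead of the built-in hash(), whose string hashing is
    randomized per Python process and so would not agree across servers.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


NUM_PARTITIONS = 4
raw = [{"messageId": f"req-{i}"} for i in range(20)]
scores = [{"messageId": r["messageId"], "model": m} for r in raw for m in ("m1", "m2")]

raw_parts: Dict[int, List[dict]] = defaultdict(list)
score_parts: Dict[int, List[dict]] = defaultdict(list)
for r in raw:
    raw_parts[partition_for(r["messageId"], NUM_PARTITIONS)].append(r)
for s in scores:
    score_parts[partition_for(s["messageId"], NUM_PARTITIONS)].append(s)

# Rendezvous replica p reads only partition p of both streams and still sees
# every score for every request it is responsible for.
for p in range(NUM_PARTITIONS):
    ids = {r["messageId"] for r in raw_parts[p]}
    assert all(s["messageId"] in ids for s in score_parts[p])
print(sorted(len(raw_parts[p]) for p in range(NUM_PARTITIONS)))
```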

In-place update of the framework via a modified Chandy-Lamport algorithm

Transition Message
[Diagram, three build steps: Raw, Input, and a Features / profiles stage; in the later steps a second Features / profiles stage appears beside the first]
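The slides do not spell out how Chandy-Lamport is modified, so what follows is only a loose, hypothetical sketch of the marker idea behind it. A transition message flows through the streams like any other message; each stage switches to its new version when the marker reaches it and forwards the marker downstream. Every data message is then handled entirely by old versions or entirely by new ones. `stage`, the `"transition"` message type, and the `v1`/`v2` handlers are all assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator

Handler = Callable[[dict], dict]


def stage(messages: Iterable[dict], versions: Dict[str, Handler], current: str) -> Iterator[dict]:
    """One pipeline stage that switches implementation when a transition marker passes through."""
    for msg in messages:
        if msg.get("type") == "transition":
            current = msg["to"]  # every message after the marker uses the new version
            yield msg  # forward the marker so downstream stages switch at the same point
        else:
            yield versions[current](msg)


# Hypothetical old and new versions of a features stage and a scoring stage.
features = {
    "v1": lambda m: {**m, "f": m["x"]},
    "v2": lambda m: {**m, "f": m["x"] * 10},
}
scorers = {
    "v1": lambda m: {**m, "score": m["f"] + 1},
    "v2": lambda m: {**m, "score": m["f"] + 2},
}

inputs = [{"x": 1}, {"x": 2}, {"type": "transition", "to": "v2"}, {"x": 3}]
out = list(stage(stage(inputs, features, "v1"), scorers, "v1"))
print([m.get("score") for m in out])  # [2, 3, None, 32]
```

No message is ever scored by the new scorer using old features, or vice versa, because both stages switch at the same position in the stream.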

Summary:
This is easy-ish
Well, it isn’t really hard

Additional Resources
O’Reilly report by Ted Dunning & Ellen Friedman © March 2017
Read free courtesy of MapR:
https://mapr.com/geo-distribution-big-data-and-analytics/
O’Reilly book by Ted Dunning & Ellen Friedman © March 2016
https://mapr.com/streaming-architecture-using-apache-kafka-mapr-streams/

O’Reilly book by Ted Dunning & Ellen Friedman © June 2014
https://mapr.com/practical-machine-learning-new-look-anomaly-detection/
O’Reilly book by Ellen Friedman & Ted Dunning © February 2014
https://mapr.com/practical-machine-learning/

by Ellen Friedman, 8 Aug 2017, on the MapR blog:
https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/
by Ted Dunning, 13 Sept 2017, in InfoWorld:
https://www.infoworld.com/article/3223688/machine-learning/machine-learning-skills-for-software-engineers.html

New book: Machine Learning Logistics: Model Management in the Real World
O’Reilly book by Ellen Friedman & Ted Dunning © Sept 2017
Download free from MapR:
http://info.mapr.com/2017_Content_Machine-Learning-Logistics_eBook_Prereg_RegistrationPage.html
Going to Strata Data NYC? The book will be released 26 Sept 2017:
visit the MapR booth for free book signings or to talk about logistics

Please support women in tech – help build girls’ dreams of what they can accomplish
© Ellen Friedman 2015 #womenintech #datawomen

Q&A
@mapr
tdunning@mapr.com
ENGAGE WITH US
@Ted_Dunning
