Distributed Deep Learning on Spark

© 2014 MapR Technologies
Mathieu Dumoulin - Data Engineer
MapR Professional Services APAC

Tonight’s Presentation, FAQ-Style
• Short intro on machine learning
• What’s Deep learning?
• Why distributed? Why do we need a computer cluster?
• Why run it on Spark?
• How does it work?
– Case Study of SparkNet: Training Deep Networks in Spark
– Case Study of CaffeOnSpark
• Can I see a Demo?
– Installation Process
– Caffe demo
– CaffeOnSpark demo

Machine Learning is all around us!
• Internet search with Google and Bing
• Contextual ads (Adsense)
• Apple iOS 9 and 10
• Google Gmail/Inbox (Priority Inbox, spam filtering)
• Fraud Detection
• Recommendations (Amazon)
• Image recognition (I can see… cats!)
• Language Modeling & Speech Recognition (Siri, Google Now, Google Translate)

© 2016 MapR Technologies, MapR Confidential
Classification of images

Why Deep Learning?
• Because it works really, really well!
• Deep learning is the state of the art in applied machine learning
– Winning approach in many major machine learning competitions, e.g.:
• Kaggle
• ImageNet
• Especially well suited for:
– Images (classification, object detection, etc.)
– Sounds (speech, music)
– Text (translation)
• Deep learning is very compute-intensive (CPU/GPU)
– More processing power enables better models
– More processing power enables faster training

MNIST digits task
• Classify images of handwritten digits (60,000 training and 10,000 test examples) as the correct digit, 0 to 9
Taken from Wikipedia (https://en.wikipedia.org/wiki/MNIST_database)
More deep learning results: http://yann.lecun.com/exdb/mnist/

Type                            Error rate (%)
K-Nearest Neighbors             0.52 [14]
Support vector machine          0.56 [16]
Deep neural network             0.35 [18]
Convolutional neural network    0.23 [8]

Results are now competitive with humans!
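To make these percentages concrete: MNIST error rates are measured on the 10,000-image test set, so each rate converts directly into a number of misclassified digits. A small sketch of that arithmetic; the dictionary simply restates the table above, and the helper name `misclassified` is hypothetical.

```python
# Error rates (%) on the MNIST test set, restated from the table above.
ERROR_RATES = {
    "K-Nearest Neighbors": 0.52,
    "Support vector machine": 0.56,
    "Deep neural network": 0.35,
    "Convolutional neural network": 0.23,
}

TEST_SET_SIZE = 10_000  # number of images in the MNIST test set


def misclassified(rate_percent: float, n: int = TEST_SET_SIZE) -> int:
    """Convert an error rate in percent into a count of wrong predictions."""
    return round(rate_percent / 100 * n)


if __name__ == "__main__":
    for model, rate in ERROR_RATES.items():
        print(f"{model:30s} {misclassified(rate):3d} errors out of {TEST_SET_SIZE}")
```

At 0.23%, the convolutional network gets about 23 of the 10,000 test digits wrong, versus about 52 for k-nearest neighbors.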

Why Distributed
“training can be time consuming, often requiring multiple days on a single GPU using [SGD]” (Moritz et al., SparkNet)
• One physical node typically holds at most 3 to 4 GPUs
• A cluster can spread the CPU/GPU load, at the cost of increased complexity
• Google built such software (DistBelief) from scratch in the early 2010s

How to Distribute: Parameter Server
• Li et al. proposed the “Parameter Server” approach in 2014
– https://www.cs.cmu.edu/~dga/papers/osdi14-paper-li_mu.pdf
Figure from Arimo’s Distributed TensorFlow blog post
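The parameter server idea can be illustrated with a toy, single-process simulation: a server object owns the model weight, and each worker pulls the current weight, computes a gradient on its own data shard, and pushes that gradient back. This is only a sketch under simplifying assumptions (a one-parameter linear model, workers taking turns instead of running concurrently, hypothetical class names `ParameterServer` and `Worker`); the system in Li et al.'s paper is asynchronous, sharded across many server nodes, and far more elaborate.

```python
import random
from typing import List, Tuple


class ParameterServer:
    """Holds the shared model weight and applies gradients pushed by workers."""

    def __init__(self, w: float, lr: float) -> None:
        self.w = w
        self.lr = lr

    def pull(self) -> float:
        # Workers fetch the current weight before computing a gradient.
        return self.w

    def push(self, grad: float) -> None:
        # Plain SGD update applied on the server side.
        self.w -= self.lr * grad


class Worker:
    """Owns one shard of (x, y) pairs for the toy model y ~ w * x."""

    def __init__(self, shard: List[Tuple[float, float]]) -> None:
        self.shard = shard

    def gradient(self, w: float) -> float:
        # Gradient of 0.5 * (w*x - y)^2, averaged over this worker's shard.
        return sum((w * x - y) * x for x, y in self.shard) / len(self.shard)


def train(num_workers: int = 4, steps: int = 200, true_w: float = 3.0) -> float:
    rng = random.Random(0)
    data = [(x, true_w * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(400))]
    # Round-robin split: worker i gets every num_workers-th example.
    workers = [Worker(data[i::num_workers]) for i in range(num_workers)]
    server = ParameterServer(w=0.0, lr=0.5)
    for _ in range(steps):
        for worker in workers:  # workers take turns in this simulation
            w = server.pull()
            server.push(worker.gradient(w))
    return server.w


if __name__ == "__main__":
    print(f"learned w = {train():.4f}")
```

With the defaults, the learned weight converges to the true value 3.0. In a real deployment, `pull` and `push` are network calls and the server state is split across several machines.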

Why Spark?
• Integrates well with existing “big data” batch processing frameworks (Hadoop/MapReduce)
• Allows data to be kept in memory from start to finish
• Lets you work with a single computational framework
• Makes a parameter server relatively easy to implement

New frameworks for Spark-based distributed DL
• CaffeOnSpark (Yahoo America)
• SparkNet (UC Berkeley AMPLab)
• DeepLearning4J (Skymind)
• Elephas (Keras on Spark)
• Distributed TensorFlow (Arimo)

SparkNet implementation
From: https://arxiv.org/pdf/1511.06051v4.pdf

SparkNet implementation 2

SparkNet implementation 3

We need a Solver: Caffe
● (+) Good for feedforward networks and image processing
● (+) Good for finetuning existing networks
● (+) Train models without writing any code
● (+) Python interface is pretty useful
● (-) Need to write C++ / CUDA for new GPU layers
● (-) Not good for recurrent networks
● (-) Cumbersome for big networks (GoogLeNet, ResNet)
● (-) Not extensible, bit of a hairball
● (-) No commercial support
Taken from: http://deeplearning4j.org/compare-dl4j-torch7-pylearn.html#caffe

Distributed SGD and Parameter Server
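A common way to write synchronous data-parallel SGD, the simplest scheme a parameter server implements, is shown below. The notation is chosen here for illustration and is not taken from the slides: $K$ workers, worker $k$ computing gradient $g_k^{(t)}$ on its own mini-batch $B_k$, per-example loss $\ell$, learning rate $\eta$, and weights $w_t$ at step $t$.

```latex
g_k^{(t)} = \frac{1}{|B_k|} \sum_{(x,y) \in B_k} \nabla_w \ell(w_t; x, y),
\qquad
w_{t+1} = w_t - \eta \, \frac{1}{K} \sum_{k=1}^{K} g_k^{(t)}
```

Each worker only needs the current $w_t$ and its own data, so the gradient computation parallelizes; the averaging step is the communication that a parameter server (or an allreduce) has to provide.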

SparkNet’s implementation of DSGD
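As described in the SparkNet paper, each round broadcasts the current weights to the workers, lets every worker run a fixed number τ of local SGD steps on its own partition, and then averages the resulting weights on the driver. The sketch below simulates that loop in plain Python, with a one-parameter least-squares model standing in for a Caffe network; the function names are hypothetical. In an actual Spark job, the same shape would map onto broadcasting the weights, a per-partition computation such as `mapPartitions`, and a `collect` or `reduce` on the driver.

```python
import random
from typing import List, Tuple

Shard = List[Tuple[float, float]]


def local_sgd(w: float, shard: Shard, tau: int, lr: float, rng: random.Random) -> float:
    """Run tau plain SGD steps (one random example per step) starting from w."""
    for _ in range(tau):
        x, y = rng.choice(shard)
        w -= lr * (w * x - y) * x  # gradient of 0.5 * (w*x - y)^2
    return w


def sparknet_style_training(partitions: List[Shard], rounds: int, tau: int,
                            lr: float = 0.1, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        # "Broadcast" w: every partition starts from the same weight
        # and trains independently for tau steps.
        local_weights = [local_sgd(w, shard, tau, lr, rng) for shard in partitions]
        # The driver averages the workers' weights to form the next model.
        w = sum(local_weights) / len(local_weights)
    return w


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(1000))]
    partitions = [data[i::5] for i in range(5)]
    print(f"w after training: {sparknet_style_training(partitions, rounds=20, tau=50):.4f}")
```

Averaging only once every τ steps, instead of after every mini-batch, cuts the number of communication rounds, which matters when exchanging weights through Spark is slow.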

Benefits of the approach

Scaling performance of SparkNet

CaffeOnSpark
• Mixed Java and Scala implementation
• Developed and used in production at Yahoo America
• Much easier to install than SparkNet, and less buggy
• Can take advantage of InfiniBand networks
• Enhanced Caffe to use multiple GPUs
• CaffeOnSpark executors communicate with each other through an MPI allreduce-style interface
• The Spark+MPI architecture achieves performance similar to dedicated deep learning clusters
– Peer-to-peer parameter server
• Faster than SparkNet
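The “allreduce” pattern mentioned above means that every executor contributes its local gradient and every executor ends up with the same sum (or average), with no central server. One common way to implement it is a ring all-reduce: a reduce-scatter phase followed by an all-gather phase. The plain-Python simulation below illustrates that algorithm; it assumes gradients are equal-length lists split into one chunk per worker, and it is not CaffeOnSpark's or MPI's actual code.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce: every worker ends with the element-wise sum."""
    k = len(grads)
    n = len(grads[0])
    bufs = [list(g) for g in grads]  # each worker's local buffer (inputs untouched)
    bounds = [(c * n) // k for c in range(k + 1)]

    def chunk(c: int) -> range:
        c %= k
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: at step s, worker i sends chunk (i - s) to its
    # right neighbour, which adds it in. Afterwards worker i holds the full
    # sum of chunk (i + 1).
    for s in range(k - 1):
        msgs = [(i, [bufs[i][j] for j in chunk(i - s)]) for i in range(k)]
        for i, payload in msgs:  # all sends happen "at once"
            dst = (i + 1) % k
            for j, v in zip(chunk(i - s), payload):
                bufs[dst][j] += v

    # Phase 2, all-gather: pass the completed chunks around the ring,
    # overwriting the receiver's partial values.
    for s in range(k - 1):
        msgs = [(i, [bufs[i][j] for j in chunk(i + 1 - s)]) for i in range(k)]
        for i, payload in msgs:
            dst = (i + 1) % k
            for j, v in zip(chunk(i + 1 - s), payload):
                bufs[dst][j] = v
    return bufs
```

Dividing each result by the number of workers gives the averaged gradient, which every executor can then apply locally to its own copy of the model.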

CaffeOnSpark System Architecture
From: http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop

CaffeOnSpark vs. SparkNet
• Much faster communication between nodes (InfiniBand capability)
• The peer-to-peer parameter exchange model is a much faster implementation
• The enhanced multi-GPU Caffe is also faster
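A rough way to see why a peer-to-peer exchange can beat a central server: with $K$ workers and a model of $N$ parameters, a single central server must receive $K$ gradients and send back $K$ copies of the weights every iteration, while in a ring-based allreduce each worker sends only a fraction of the model in each of its two phases. These are standard back-of-the-envelope counts, not measurements from the CaffeOnSpark or SparkNet work, and they assume one unsharded server and a ring allreduce:

```latex
\text{central server, per iteration: } \underbrace{KN}_{\text{gradients in}} + \underbrace{KN}_{\text{weights out}} = 2KN
\qquad
\text{ring all-reduce, per worker: } \underbrace{\tfrac{K-1}{K}N}_{\text{reduce-scatter}} + \underbrace{\tfrac{K-1}{K}N}_{\text{all-gather}} = \tfrac{2(K-1)}{K}N
```

The central server's load grows linearly with $K$, while each worker's ring traffic stays below $2N$ however many workers join.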

Comparison of Frameworks (Spark Summit 2016)
By Yu Cao (EMC) and Zhe Dong (EMC) (Slideshare)

Benchmark 2
By Yu Cao (EMC) and Zhe Dong (EMC) (Slideshare)

Installing CaffeOnSpark
• I recommend CentOS 7 or Ubuntu 14.04 or later
• The process is very “touchy” and easy to mess up
• Go step by step!
Process:
1. Update the OS and kernel, install dev tools (gcc, etc.), and reboot
a. Disable the “nouveau” driver!!!
2. Install the latest NVIDIA drivers, CUDA 7.5, and cuDNN 4
3. Install Caffe
a. Install all Caffe dependencies; make sure it compiles and the examples run.
4. Install CaffeOnSpark
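The process above stresses disabling the “nouveau” driver before installing the NVIDIA driver. One way to verify the state of the kernel modules before and after step 2 is a small Python check. It assumes a Linux host where `/proc/modules` lists one loaded module per line with the module name first; the helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Module names from /proc/modules content (the first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def driver_problems(modules: Set[str]) -> List[str]:
    """Human-readable problems with the GPU driver state; empty means it looks OK."""
    problems = []
    if "nouveau" in modules:
        problems.append("nouveau is loaded: disable (blacklist) it and reboot first")
    if "nvidia" not in modules:
        problems.append("nvidia module not loaded: NVIDIA driver missing or inactive")
    return problems


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if not proc.exists():
        print("/proc/modules not found; this check assumes a Linux host")
    else:
        issues = driver_problems(loaded_modules(proc.read_text()))
        print("\n".join(issues) if issues else "GPU driver modules look OK")
```

Before step 2, only the nouveau check is meaningful. After the NVIDIA driver is installed and the machine rebooted, both checks should pass.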

Installing Caffe
Good tutorials are few!
• Ubuntu works more “out of the box”: the default paths are all correct
• CentOS 7: a few changes are needed, but it’s still OK
The Caffe web site instructions for CentOS are a bit outdated.

Demos
• Running an example on Caffe
– Caffe deep network description files
– MNIST example
• Running an example with CaffeOnSpark
– MNIST example
– Running on YARN or Spark Standalone
