SlideShare a Scribd company logo
Deep learning on a mixed
cluster with Deeplearning4j
and Spark
Barcelona Spark meetup, Dec 9, 2016
(right after NIPS)
francois@garillot.net @huitseeker
Agenda
Intro
Why Deep Learning on a
Cluster
Big Data Architecture
Deeplearning4j
Spark challenges
Introduction : Deep Learning
in the trenches today
The bad thing about doing a
talk right after NIPS
you guys are scary.
The good thing about doing a
talk right after NIPS
You guys don't need to be told SkyNet is a fantasy (for now).
Paying algorithms
Anomaly detection in many forms (bad guys / predictive
maintenance / market rally)
Fraud detection
Network intrusion
Fintech secutiries churn prediction
Video object detection (security)
Models that are being
neglected in benchmarks and
implementation efforts
LSTMs
Autoencoders
How to deal with this in the
Spark world ?
experiment with trained model application: Tensorframes,
what are the deep learning frameworks that let you train
?
Why Deep Learning on a
cluster ?
Practically ... let's look at benchmarks
Practically ... let's look at benchmarks
Practically ... let's look at benchmarks
Practically ... let's look at benchmarks
Training, but how ?
New Amazon GPU instances
Training, but how ?
Training, but how ?
Cluster training in the
enterprise
it's really about multi-tenancy & economies of scale
a big bunch of machines shared among everybody shares
better
if only because you can reuse it for other workloads
Minor reasons
enterprises may not have
GPUs
Distributing training
basically distributing SGD (R)
challenge is AllReduce Communication
Sparse updates, async
communications
Distributing training : good
engineering matters
Cluster training in your
(experimentor) case ?
it's a fun problem : AllReduce
Ultimately solved for people with a large amount of images
that solution is not open-source (but at Facebook, Google,
Amazon, Microsoft¹, Baidu)
¹: 1-bit SGD is under non-commercial license in CNTK 2.0
Big Data architecture
With a parameter server
With Spark
Spark does the initial ETL
Spark ingests the nal result
In the middle : parameter
server.
Spark cluster modes
Mesos GPU support merged
devices cgroups !
YARN GPU support through
tags
Spark Standalone : ?
Deeplearning4j
Deeplearning4j
the rst commercial-grade, open-source, distributed deep-
learning library written for Java and Scala
Skymind its commercial support arm
Scienti c computing on the JVM
libnd4j : Vectorization, 32-bit addressing, linalg (BLAS!)
JavaCPP: generates JNI bindings to your CPP libs
ND4J : numpy for the JVM, native superfast arrays
Datavec : one-stop interface to an NDArray
DeepLearning4J: orchestration, backprop, layer de nition
ScalNet: gateway drug, inspired from (and closely following)
Keras
RL4J : Reinforcement learning for the JVM
With Spark
JavaSparkContent sc = ...;
JavaRDD<DataSet> trainingData = ...;
MultiLayerConfiguration networkConfig = ...;
//Create the TrainingMaster instance
int examplesPerDataSetObject = 1;
TrainingMaster trainingMaster =
new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObjec
.(other configuration options)
.build();
//Create the SparkDl4jMultiLayer instance
SparkDl4jMultiLayer sparkNetwork =
new SparkDl4jMultiLayer(sc, networkConfig, trainingMaster);
//Fit the network using the training data:
sparkNetwork.fit(trainingData);
Spark Challenges
Even if you don't care about Deep
learning
(from Kazuaki Ishizaki @ IBM Japan)
SPARK-6442 : better linear algebra than
breeze
ND4J will have sparse representations soon
Even if you don't care about Deep
learning II
Meta-
RDDs
Killing the bottlenecks
Spark has already changed its networking backend once.
better support for parameters servers and their fault
tolerance.
A Last Word (from Andrew Y. Ng)
get involved !
don't just read papers, reproduce research
results
Also
We're happy to mentor contributions, and there's a book !
Questions ?
Ad

More Related Content

What's hot (20)

Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
Adam Gibson
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
Adam Gibson
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
Adam Gibson
 
Advanced deeplearning4j features
Advanced deeplearning4j featuresAdvanced deeplearning4j features
Advanced deeplearning4j features
Adam Gibson
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
Adam Gibson
 
Deploying signature verification with deep learning
Deploying signature verification with deep learningDeploying signature verification with deep learning
Deploying signature verification with deep learning
Adam Gibson
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
Shivaji Dutta
 
A Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate ArraysA Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate Arrays
Taylor Riggan
 
DeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoTDeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoT
Romeo Kienzler
 
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
MLconf
 
Deep learning in production with the best
Deep learning in production   with the bestDeep learning in production   with the best
Deep learning in production with the best
Adam Gibson
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
Paolo Platter
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
Adam Gibson
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Jen Aman
 
Amazon Deep Learning
Amazon Deep LearningAmazon Deep Learning
Amazon Deep Learning
Amanda Mackay (she/her)
 
Best Practices for On-Demand HPC in Enterprises
Best Practices for On-Demand HPC in EnterprisesBest Practices for On-Demand HPC in Enterprises
Best Practices for On-Demand HPC in Enterprises
geetachauhan
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
mikaelhuss
 
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ..."Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
Edge AI and Vision Alliance
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Josh Patterson
 
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGData Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Thamme Gowda
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
Adam Gibson
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
Adam Gibson
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
Adam Gibson
 
Advanced deeplearning4j features
Advanced deeplearning4j featuresAdvanced deeplearning4j features
Advanced deeplearning4j features
Adam Gibson
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
Adam Gibson
 
Deploying signature verification with deep learning
Deploying signature verification with deep learningDeploying signature verification with deep learning
Deploying signature verification with deep learning
Adam Gibson
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
Shivaji Dutta
 
A Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate ArraysA Primer on FPGAs - Field Programmable Gate Arrays
A Primer on FPGAs - Field Programmable Gate Arrays
Taylor Riggan
 
DeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoTDeepLearning and Advanced Machine Learning on IoT
DeepLearning and Advanced Machine Learning on IoT
Romeo Kienzler
 
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
MLconf
 
Deep learning in production with the best
Deep learning in production   with the bestDeep learning in production   with the best
Deep learning in production with the best
Adam Gibson
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
Paolo Platter
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
Adam Gibson
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
Jen Aman
 
Best Practices for On-Demand HPC in Enterprises
Best Practices for On-Demand HPC in EnterprisesBest Practices for On-Demand HPC in Enterprises
Best Practices for On-Demand HPC in Enterprises
geetachauhan
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
mikaelhuss
 
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ..."Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
Edge AI and Vision Alliance
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Josh Patterson
 
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRGData Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Data Programming: Creating Large Datasets, Quickly -- Presented at JPL MLRG
Thamme Gowda
 

Viewers also liked (7)

Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)
신동 강
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
BigDataCloud
 
RBM with DL4J for Deep Learning
RBM with DL4J for Deep Learning RBM with DL4J for Deep Learning
RBM with DL4J for Deep Learning
신동 강
 
Deep Learning on Hadoop
Deep Learning on HadoopDeep Learning on Hadoop
Deep Learning on Hadoop
DataWorks Summit
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
Asim Jalis
 
Distributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop ClustersDistributed Deep Learning on Hadoop Clusters
Distributed Deep Learning on Hadoop Clusters
DataWorks Summit/Hadoop Summit
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
Benjamin Le
 
Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)
신동 강
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
BigDataCloud
 
RBM with DL4J for Deep Learning
RBM with DL4J for Deep Learning RBM with DL4J for Deep Learning
RBM with DL4J for Deep Learning
신동 강
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
Asim Jalis
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
Benjamin Le
 
Ad

Similar to Deep learning on a mixed cluster with deeplearning4j and spark (20)

Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
DataWorks Summit
 
Machine Learning with Scala
Machine Learning with ScalaMachine Learning with Scala
Machine Learning with Scala
Susan Eraly
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
DESMOND YUEN
 
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Databricks
 
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Tomasz Sikora
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
sparktc
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
sparktc
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
Deep Learning Italia
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Greg Makowski
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
Happiest Minds Technologies
 
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningNeural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
Asim Jalis
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Databricks
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
Databricks
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTech
geetachauhan
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
MLconf
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
Craig Chao
 
Meetup deeplearningitalia-milano-valerio-morfino
Meetup deeplearningitalia-milano-valerio-morfinoMeetup deeplearningitalia-milano-valerio-morfino
Meetup deeplearningitalia-milano-valerio-morfino
Deep Learning Italia
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
Databricks
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI Day
Nick Pentreath
 
Urs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonUrs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in Boston
Intel Nervana
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
DataWorks Summit
 
Machine Learning with Scala
Machine Learning with ScalaMachine Learning with Scala
Machine Learning with Scala
Susan Eraly
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
DESMOND YUEN
 
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Databricks
 
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23Sjug #26   ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Sjug #26 ml is in java but is dl too - ver1.04 - tomasz sikora 2018-03-23
Tomasz Sikora
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
sparktc
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
sparktc
 
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
APACHE SPARK PER IL MACHINE LEARNING: INTRODUZIONE ED UN CASO DI STUDIO_ Meet...
Deep Learning Italia
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Greg Makowski
 
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningNeural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
Asim Jalis
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Databricks
 
Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
Databricks
 
Deep learning for FinTech
Deep learning for FinTechDeep learning for FinTech
Deep learning for FinTech
geetachauhan
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
MLconf
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
Craig Chao
 
Meetup deeplearningitalia-milano-valerio-morfino
Meetup deeplearningitalia-milano-valerio-morfinoMeetup deeplearningitalia-milano-valerio-morfino
Meetup deeplearningitalia-milano-valerio-morfino
Deep Learning Italia
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
Databricks
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI Day
Nick Pentreath
 
Urs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in BostonUrs Köster Presenting at RE-Work DL Summit in Boston
Urs Köster Presenting at RE-Work DL Summit in Boston
Intel Nervana
 
Ad

More from François Garillot (8)

Growing Your Types Without Growing Your Workload
Growing Your Types Without Growing Your WorkloadGrowing Your Types Without Growing Your Workload
Growing Your Types Without Growing Your Workload
François Garillot
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
François Garillot
 
Delivering near real time mobility insights at swisscom
Delivering near real time mobility insights at swisscomDelivering near real time mobility insights at swisscom
Delivering near real time mobility insights at swisscom
François Garillot
 
Spark Streaming : Dealing with State
Spark Streaming : Dealing with StateSpark Streaming : Dealing with State
Spark Streaming : Dealing with State
François Garillot
 
A Gentle Introduction to Locality Sensitive Hashing with Apache Spark
A Gentle Introduction to Locality Sensitive Hashing with Apache SparkA Gentle Introduction to Locality Sensitive Hashing with Apache Spark
A Gentle Introduction to Locality Sensitive Hashing with Apache Spark
François Garillot
 
Ramping up your Devops Fu for Big Data developers
Ramping up your Devops Fu for Big Data developersRamping up your Devops Fu for Big Data developers
Ramping up your Devops Fu for Big Data developers
François Garillot
 
Diving In The Deep End Of The Big Data Pool
Diving In The Deep End Of The Big Data PoolDiving In The Deep End Of The Big Data Pool
Diving In The Deep End Of The Big Data Pool
François Garillot
 
Scala Collections : Java 8 on Steroids
Scala Collections : Java 8 on SteroidsScala Collections : Java 8 on Steroids
Scala Collections : Java 8 on Steroids
François Garillot
 
Growing Your Types Without Growing Your Workload
Growing Your Types Without Growing Your WorkloadGrowing Your Types Without Growing Your Workload
Growing Your Types Without Growing Your Workload
François Garillot
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
François Garillot
 
Delivering near real time mobility insights at swisscom
Delivering near real time mobility insights at swisscomDelivering near real time mobility insights at swisscom
Delivering near real time mobility insights at swisscom
François Garillot
 
Spark Streaming : Dealing with State
Spark Streaming : Dealing with StateSpark Streaming : Dealing with State
Spark Streaming : Dealing with State
François Garillot
 
A Gentle Introduction to Locality Sensitive Hashing with Apache Spark
A Gentle Introduction to Locality Sensitive Hashing with Apache SparkA Gentle Introduction to Locality Sensitive Hashing with Apache Spark
A Gentle Introduction to Locality Sensitive Hashing with Apache Spark
François Garillot
 
Ramping up your Devops Fu for Big Data developers
Ramping up your Devops Fu for Big Data developersRamping up your Devops Fu for Big Data developers
Ramping up your Devops Fu for Big Data developers
François Garillot
 
Diving In The Deep End Of The Big Data Pool
Diving In The Deep End Of The Big Data PoolDiving In The Deep End Of The Big Data Pool
Diving In The Deep End Of The Big Data Pool
François Garillot
 
Scala Collections : Java 8 on Steroids
Scala Collections : Java 8 on SteroidsScala Collections : Java 8 on Steroids
Scala Collections : Java 8 on Steroids
François Garillot
 

Recently uploaded (20)

Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Foundation Models for Time Series : A Survey
Foundation Models for Time Series : A SurveyFoundation Models for Time Series : A Survey
Foundation Models for Time Series : A Survey
jayanthkalyanam1
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Apple Logic Pro X Crack FRESH Version 2025
Apple Logic Pro X Crack FRESH Version 2025Apple Logic Pro X Crack FRESH Version 2025
Apple Logic Pro X Crack FRESH Version 2025
fs4635986
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Implementing promises with typescripts, step by step
Implementing promises with typescripts, step by stepImplementing promises with typescripts, step by step
Implementing promises with typescripts, step by step
Ran Wahle
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Odoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education ProcessOdoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education Process
iVenture Team LLP
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Full Cracked Resolume Arena Latest Version
Full Cracked Resolume Arena Latest VersionFull Cracked Resolume Arena Latest Version
Full Cracked Resolume Arena Latest Version
jonesmichealj2
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Foundation Models for Time Series : A Survey
Foundation Models for Time Series : A SurveyFoundation Models for Time Series : A Survey
Foundation Models for Time Series : A Survey
jayanthkalyanam1
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Apple Logic Pro X Crack FRESH Version 2025
Apple Logic Pro X Crack FRESH Version 2025Apple Logic Pro X Crack FRESH Version 2025
Apple Logic Pro X Crack FRESH Version 2025
fs4635986
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Implementing promises with typescripts, step by step
Implementing promises with typescripts, step by stepImplementing promises with typescripts, step by step
Implementing promises with typescripts, step by step
Ran Wahle
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Odoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education ProcessOdoo ERP for Education Management to Streamline Your Education Process
Odoo ERP for Education Management to Streamline Your Education Process
iVenture Team LLP
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Full Cracked Resolume Arena Latest Version
Full Cracked Resolume Arena Latest VersionFull Cracked Resolume Arena Latest Version
Full Cracked Resolume Arena Latest Version
jonesmichealj2
 

Deep learning on a mixed cluster with deeplearning4j and spark

  • 1. Deep learning on a mixed cluster with Deeplearning4j and Spark Barcelona Spark meetup, Dec 9, 2016 (right after NIPS) [email protected] @huitseeker
  • 2. Agenda Intro Why Deep Learning on a Cluster Big Data Architecture Deeplearning4j Spark challenges
  • 3. Introduction : Deep Learning in the trenches today
  • 4. The bad thing about doing a talk right after NIPS you guys are scary.
  • 5. The good thing about doing a talk right after NIPS You guys don't need to be told SkyNet is a fantasy (for now).
  • 6. Paying algorithms Anomaly detection in many forms (bad guys / predictive maintenance / market rally) Fraud detection Network intrusion Fintech secutiries churn prediction Video object detection (security)
  • 7. Models that are being neglected in benchmarks and implementation efforts LSTMs Autoencoders
  • 8. How to deal with this in the Spark world ? experiment with trained model application: Tensorframes, what are the deep learning frameworks that let you train ?
  • 9. Why Deep Learning on a cluster ?
  • 10. Practically ... let's look at benchmarks
  • 11. Practically ... let's look at benchmarks
  • 12. Practically ... let's look at benchmarks
  • 13. Practically ... let's look at benchmarks
  • 14. Training, but how ? New Amazon GPU instances
  • 17. Cluster training in the enterprise it's really about multi-tenancy & economies of scale a big bunch of machines shared among everybody shares better if only because you can reuse it for other workloads Minor reasons enterprises may not have GPUs
  • 18. Distributing training basically distributing SGD (R) challenge is AllReduce Communication Sparse updates, async communications
  • 19. Distributing training : good engineering matters
  • 20. Cluster training in your (experimentor) case ? it's a fun problem : AllReduce Ultimately solved for people with a large amount of images that solution is not open-source (but at Facebook, Google, Amazon, Microsoft¹, Baidu) ¹: 1-bit SGD is under non-commercial license in CNTK 2.0
  • 23. With Spark Spark does the initial ETL Spark ingests the nal result In the middle : parameter server.
  • 24. Spark cluster modes Mesos GPU support merged devices cgroups ! YARN GPU support through tags Spark Standalone : ?
  • 26. Deeplearning4j the rst commercial-grade, open-source, distributed deep- learning library written for Java and Scala Skymind its commercial support arm
  • 27. Scienti c computing on the JVM libnd4j : Vectorization, 32-bit addressing, linalg (BLAS!) JavaCPP: generates JNI bindings to your CPP libs ND4J : numpy for the JVM, native superfast arrays Datavec : one-stop interface to an NDArray DeepLearning4J: orchestration, backprop, layer de nition ScalNet: gateway drug, inspired from (and closely following) Keras RL4J : Reinforcement learning for the JVM
  • 28. With Spark JavaSparkContent sc = ...; JavaRDD<DataSet> trainingData = ...; MultiLayerConfiguration networkConfig = ...; //Create the TrainingMaster instance int examplesPerDataSetObject = 1; TrainingMaster trainingMaster = new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObjec .(other configuration options) .build(); //Create the SparkDl4jMultiLayer instance SparkDl4jMultiLayer sparkNetwork = new SparkDl4jMultiLayer(sc, networkConfig, trainingMaster); //Fit the network using the training data: sparkNetwork.fit(trainingData);
  • 30. Even if you don't care about Deep learning (from Kazuaki Ishizaki @ IBM Japan) SPARK-6442 : better linear algebra than breeze ND4J will have sparse representations soon
  • 31. Even if you don't care about Deep learning II Meta- RDDs
  • 32. Killing the bottlenecks Spark has already changed its networking backend once. better support for parameters servers and their fault tolerance.
  • 33. A Last Word (from Andrew Y. Ng) get involved ! don't just read papers, reproduce research results Also We're happy to mentor contributions, and there's a book !