SlideShare a Scribd company logo
Integrating deep learning
libraries with Apache Spark
Joseph K. Bradley
O’Reilly AI Conference NYC
June 29, 2017
About me
Software engineer at Databricks
Apache Spark committer & PMC member
Ph.D. Carnegie Mellon in Machine Learning
TEAM
About Databricks
Started Spark project (now Apache Spark) at UC Berkeley in 2009
3	3	
PRODUCT
Unified Analytics Platform
MISSION
Making Big Data Simple
Deep Learning and Apache Spark
Deep Learning frameworks w/ Spark bindings
•  Caffe (CaffeOnSpark)
•  Keras (Elephas)
•  mxnet
•  Paddle
•  TensorFlow (TensorFlow on Spark, TensorFrames)
•  CNTK (mmlspark)
Extensions to Spark for specialized hardware
•  Blaze (UCLA & Falcon Computing Solutions)
•  IBM Conductor with Spark
Native Spark
•  BigDL
•  DeepDist
•  DeepLearning4J
•  MLlib
•  SparkCL
•  SparkNet
Deep Learning and Apache Spark
2016: the year of emerging solutions for Spark + Deep Learning
No consensus
•  Many approaches for libraries
–  integrate with Spark
–  build on Spark
–  modify Spark
•  Official Spark MLlib support is limited (perceptron-like networks)
One Framework to Rule Them All?
Databricks’ perspective
•  Databricks: hosted Spark platform on public cloud
•  GPUs for compute-intensive workloads
•  Customers use many Deep Learning frameworks: TensorFlow, Keras, BigDL,
CNTK, MXNet, Theano, Caffe and more
This talk
•  Lessons learned from supporting many Deep Learning frameworks
•  Multiple ways to integrate Deep Learning & Spark
•  High-level APIs vs. low-level cluster/dev issues
Outline
Deep Learning in data pipelines
Recurring patterns in Apache Spark + Deep Learning integrations
Integrating Spark + DL: APIs
Integrating Spark + DL: clusters
Outline
Deep Learning in data pipelines
Recurring patterns in Apache Spark + Deep Learning integrations
Integrating Spark + DL: APIs
Integrating Spark + DL: clusters
DL in data pipelines
Data
collection
ETL Featurization Validation Export,
Serving
compute intensive IO intensiveIO intensive
Large cluster
High memory/CPU ratio
Small cluster
Low memory/CPU ratio
Training
Outline
Deep Learning in data pipelines
Recurring patterns in Apache Spark + DL integrations
Integrating Spark + DL: APIs
Integrating Spark + DL: clusters
Recurring patterns
Data
collection
ETL Featurization Validation Export,
Serving
Training
DL for featurization
à other ML for training
ETL/featurization on cluster
à DL training on 1 beefy machine
Distributed model tuning:
local DL training for each
parameter set
Distributed DL
training
Recurring patterns
DL for featurization
à other ML for training
ETL/featurization on cluster
à DL training on 1 beefy machine
Distributed model tuning:
local DL training for each
parameter set
Deep Learning Transformer Deep Learning Estimator
Flexible cluster deployments
Distributed DL
training
APIs
Clusters
Outline
Deep Learning in data pipelines
Recurring patterns in Apache Spark + Deep Learning integrations
Integrating Spark + DL: APIs
Integrating Spark + DL: clusters
Recurring patterns: APIs
Deep learning is used in many parts of a data pipeline:
•  Featurization
•  Training
•  Local
•  Distributed: 1 model per machine
•  Distributed: 1 model per cluster
•  Prediction
Our answer:
•  Use familiar MLlib APIs
•  Leverage existing deep
learning frameworks
New library: Deep Learning Pipelines
• Simple API for deep learning, based on MLlib Pipelines
• Scales common tasks with Transformers and Estimators
• Exposes deep learning models in Spark DataFrames & SQL
Example: Image classification
Example: Identify the James Bond cars
Example: Identify the James Bond cars
• Neural networks are very good at image analysis
• Can work with complex situations:
INVARIANCE TO ROTATIONS
Good application for Deep Learning
INCOMPLETE DATA
Transfer Learning
Transfer Learning
Transfer Learning
Transfer Learning
Transfer Learning
SoftMax
GIANT PANDA
0.9
RED PANDA 0.05
RACCOON 0.01
…
Classifier
Transfer Learning
DeepImageFeaturize
r
Deep Learning Pipelines
MLlib Pipeline APIs for
•  Featurization (Transformers)
•  Training (Estimators)
Conversion of deep learning models to SQL UDFs
Early release of Deep Learning Pipelines
https://github.com/databricks/spark-deep-learning
MLlib Pipeline APIs for Deep Learning
Growing set of API integrations:
•  Spark MLlib (Apache)
•  Deep Learning Pipelines (Databricks)
•  mmlspark (Microsoft)
•  Others are under active development!
General trend in Spark + Machine Learning:
•  xgboost
•  many Spark Packages (spark-packages.org)
Outline
Deep Learning in data pipelines
Recurring patterns in Apache Spark + Deep Learning integrations
Integrating Spark + DL: APIs
Integrating Spark + DL: clusters
Recurring patterns: clusters
Flexible cluster deployments are critical.
• Handle different parts of workflow
•  Featurization
•  Training 1 model
•  Training many models
• Handle different data & model dimensions
•  CPUs vs. GPUs
•  Local vs. distributed
Diving deeper: communication
Spark as a scheduler
•  Data-parallel tasks
•  Data stored outside Spark
Embedded Deep Learning transforms
•  Data-parallel tasks
•  Data stored in DataFrames/RDDs
Cooperative frameworks
•  Multiple passes over data
•  Heavy and/or specialized communication
Streaming data through DL
Primary storage choices:
•  Cold layer (HDFS, S3)
•  Local storage (files, Spark’s on-disk persistence)
•  Memory (Spark RDDs, DataFrames)
Find out if you are I/O constrained or processor-constrained
•  How big is your dataset? MNIST or ImageNet?
Cooperative frameworks
Use Spark for data input. Alternative communication layer.
Examples:
•  Skymind’s DeepLearning4J
•  DistML and other Parameter Server projects
RDD	
Par((on	1	
Par((on	n	
RDD	
Par((on	1	
Par((on	m	
Black	box
Cooperative frameworks
Use Spark for data input. Alternative communication layer.
Examples:
•  Skymind’s DeepLearning4J
•  DistML and other Parameter Server projects
Bypass Spark communication:
•  Lose reproducibility/determinism of RDDs and DataFrames
•  But that’s OK: “reproducibility is worth a factor of 2” (Leon Bottou)
Simplified deployment
• Pre-installed GPU drivers on machine
• Docker image with full GPU SDK
GPU integration (in Databricks)
Container:	
nvidia-docker,	
lxc,	etc.	
GPU	hardware	
Linux	kernel	 NV	Kernel	driver	
CuBLAS	 CuDNN	
Deep	learning	libraries	
(Tensorflow,	etc.)	 JCUDA	
Python	/	JVM	clients	
CUDA	
NV	kernel	driver	(userspace	interface)	
Flexible cluster setup
•  CPUs or GPUs
•  Transfer & share data
across clusters
Monitoring Spark + DL integrations
The best method depends on task granularity.
•  Around tasks
•  Inside (long-running) tasks
Accumulators in tasks
•  Throughput or failure rate within tasks
External systems
•  Logging: TensorBoard
•  General system: Grafana, Graphite, Prometheus, etc.
Resources
Early release of Deep Learning Pipelines
https://github.com/databricks/spark-deep-learning
Recent blog posts
http://databricks.com/blog
Deep Learning Pipelines, TensorFrames, GPU acceleration, getting started with Deep
Learning, Intel’s BigDL
Docs for Deep Learning on Databricks
http://docs.databricks.com
Getting started, Spark integration
Deep Learning without Deep Pockets
For users
•  Life’s getting better.
•  MLlib APIs for Deep
Learning integrations
(Deep Learning Pipelines)
•  Flexible cluster
deployments
For developers
•  Challenges remain.
•  Communication patterns
for Deep Learning
•  Maintaining GPU software
stacks
UNIFIED ANALYTICS PLATFORM
Try Apache Spark in Databricks!
•  Collaborative cloud environment
•  Free version (community edition)
39	39	
DATABRICKS RUNTIME 3.0
•  Apache Spark - optimized for the cloud
•  Caching and optimization layer - DBIO
•  Enterprise security - DBES
Try for free today.
databricks.com
Thank you!
Twitter: @jkbatcmu à I’ll share my slides.
http://databricks.com/try
Ad

More Related Content

What's hot (20)

Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
Cloudera, Inc.
 
Spark r under the hood with Hossein Falaki
Spark r under the hood with Hossein FalakiSpark r under the hood with Hossein Falaki
Spark r under the hood with Hossein Falaki
Databricks
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Databricks
 
Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0
Databricks
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache Spark
Meeraj Kunnumpurath
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
Databricks
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Daniel Rodriguez
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
Databricks
 
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Databricks
 
Lightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkLightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache Spark
Manish Gupta
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
DataWorks Summit
 
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
Databricks
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Wee Hyong Tok
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Databricks
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark Summit
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 
Python and Bigdata - An Introduction to Spark (PySpark)
Python and Bigdata -  An Introduction to Spark (PySpark)Python and Bigdata -  An Introduction to Spark (PySpark)
Python and Bigdata - An Introduction to Spark (PySpark)
hiteshnd
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
Cloudera, Inc.
 
Spark r under the hood with Hossein Falaki
Spark r under the hood with Hossein FalakiSpark r under the hood with Hossein Falaki
Spark r under the hood with Hossein Falaki
Databricks
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
 
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan ZhuBuilding a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu
Databricks
 
Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0Large-Scale Data Science in Apache Spark 2.0
Large-Scale Data Science in Apache Spark 2.0
Databricks
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache Spark
Meeraj Kunnumpurath
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
Databricks
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Daniel Rodriguez
 
What's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You CareWhat's New in Apache Spark 2.3 & Why Should You Care
What's New in Apache Spark 2.3 & Why Should You Care
Databricks
 
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Databricks
 
Lightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkLightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache Spark
Manish Gupta
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
DataWorks Summit
 
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
Databricks
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Wee Hyong Tok
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Databricks
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
Jen Aman
 
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark as the Gateway Drug to Typed Functional Programming: Spark Summit East ...
Spark Summit
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 
Python and Bigdata - An Introduction to Spark (PySpark)
Python and Bigdata -  An Introduction to Spark (PySpark)Python and Bigdata -  An Introduction to Spark (PySpark)
Python and Bigdata - An Introduction to Spark (PySpark)
hiteshnd
 

Similar to Integrating Deep Learning Libraries with Apache Spark (20)

AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI Day
Nick Pentreath
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
Databricks
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
Databricks
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
DataWorks Summit/Hadoop Summit
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
Ned Shawa
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Jose Quesada (hiring)
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
MLconf
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
DESMOND YUEN
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
Xiao Li
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
 
Big Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudDataBig Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudData
WeCloudData
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
Databricks
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
Databricks
 
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
Jen Aman
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI Day
Nick Pentreath
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Tuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache SparkTuning and Monitoring Deep Learning on Apache Spark
Tuning and Monitoring Deep Learning on Apache Spark
Databricks
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
Databricks
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
DataWorks Summit/Hadoop Summit
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
Ned Shawa
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Jose Quesada (hiring)
 
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT...
MLconf
 
BigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for SparkBigDL webinar - Deep Learning Library for Spark
BigDL webinar - Deep Learning Library for Spark
DESMOND YUEN
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
Xiao Li
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
 
Big Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudDataBig Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudData
WeCloudData
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
Databricks
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
Databricks
 
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
Jen Aman
 
Ad

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Ad

Recently uploaded (20)

Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest (MSR...
Andre Hora
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Download Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With LatestDownload Wondershare Filmora Crack [2025] With Latest
Download Wondershare Filmora Crack [2025] With Latest
tahirabibi60507
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 

Integrating Deep Learning Libraries with Apache Spark

  • 1. Integrating deep learning libraries with Apache Spark Joseph K. Bradley O’Reilly AI Conference NYC June 29, 2017
  • 2. About me Software engineer at Databricks Apache Spark committer & PMC member Ph.D. Carnegie Mellon in Machine Learning
  • 3. TEAM About Databricks Started Spark project (now Apache Spark) at UC Berkeley in 2009 3 3 PRODUCT Unified Analytics Platform MISSION Making Big Data Simple
  • 4. Deep Learning and Apache Spark Deep Learning frameworks w/ Spark bindings •  Caffe (CaffeOnSpark) •  Keras (Elephas) •  mxnet •  Paddle •  TensorFlow (TensorFlow on Spark, TensorFrames) •  CNTK (mmlspark) Extensions to Spark for specialized hardware •  Blaze (UCLA & Falcon Computing Solutions) •  IBM Conductor with Spark Native Spark •  BigDL •  DeepDist •  DeepLearning4J •  MLlib •  SparkCL •  SparkNet
  • 5. Deep Learning and Apache Spark 2016: the year of emerging solutions for Spark + Deep Learning No consensus •  Many approaches for libraries –  integrate with Spark –  build on Spark –  modify Spark •  Official Spark MLlib support is limited (perceptron-like networks)
  • 6. One Framework to Rule Them All?
  • 7. Databricks’ perspective •  Databricks: hosted Spark platform on public cloud •  GPUs for compute-intensive workloads •  Customers use many Deep Learning frameworks: TensorFlow, Keras, BigDL, CNTK, MXNet, Theano, Caffe and more This talk •  Lessons learned from supporting many Deep Learning frameworks •  Multiple ways to integrate Deep Learning & Spark •  High-level APIs vs. low-level cluster/dev issues
  • 8. Outline Deep Learning in data pipelines Recurring patterns in Apache Spark + Deep Learning integrations Integrating Spark + DL: APIs Integrating Spark + DL: clusters
  • 9. Outline Deep Learning in data pipelines Recurring patterns in Apache Spark + Deep Learning integrations Integrating Spark + DL: APIs Integrating Spark + DL: clusters
  • 10. DL in data pipelines Data collection ETL Featurization Validation Export, Serving compute intensive IO intensiveIO intensive Large cluster High memory/CPU ratio Small cluster Low memory/CPU ratio Training
  • 11. Outline Deep Learning in data pipelines Recurring patterns in Apache Spark + DL integrations Integrating Spark + DL: APIs Integrating Spark + DL: clusters
  • 12. Recurring patterns Data collection ETL Featurization Validation Export, Serving Training DL for featurization à other ML for training ETL/featurization on cluster à DL training on 1 beefy machine Distributed model tuning: local DL training for each parameter set Distributed DL training
  • 13. Recurring patterns DL for featurization à other ML for training ETL/featurization on cluster à DL training on 1 beefy machine Distributed model tuning: local DL training for each parameter set Deep Learning Transformer Deep Learning Estimator Flexible cluster deployments Distributed DL training APIs Clusters
  • 14. Outline Deep Learning in data pipelines Recurring patterns in Apache Spark + Deep Learning integrations Integrating Spark + DL: APIs Integrating Spark + DL: clusters
  • 15. Recurring patterns: APIs Deep learning is used in many parts of a data pipeline: •  Featurization •  Training •  Local •  Distributed: 1 model per machine •  Distributed: 1 model per cluster •  Prediction Our answer: •  Use familiar MLlib APIs •  Leverage existing deep learning frameworks
  • 16. New library: Deep Learning Pipelines • Simple API for deep learning, based on MLlib Pipelines • Scales common tasks with Transformers and Estimators • Exposes deep learning models in Spark DataFrames & SQL
  • 18. Example: Identify the James Bond cars
  • 19. Example: Identify the James Bond cars
  • 20. • Neural networks are very good at image analysis • Can work with complex situations: INVARIANCE TO ROTATIONS Good application for Deep Learning INCOMPLETE DATA
  • 26. SoftMax GIANT PANDA 0.9 RED PANDA 0.05 RACCOON 0.01 … Classifier Transfer Learning DeepImageFeaturize r
  • 27. Deep Learning Pipelines MLlib Pipeline APIs for •  Featurization (Transformers) •  Training (Estimators) Conversion of deep learning models to SQL UDFs Early release of Deep Learning Pipelines https://github.com/databricks/spark-deep-learning
  • 28. MLlib Pipeline APIs for Deep Learning Growing set of API integrations: •  Spark MLlib (Apache) •  Deep Learning Pipelines (Databricks) •  mmlspark (Microsoft) •  Others are under active development! General trend in Spark + Machine Learning: •  xgboost •  many Spark Packages (spark-packages.org)
  • 29. Outline Deep Learning in data pipelines Recurring patterns in Apache Spark + Deep Learning integrations Integrating Spark + DL: APIs Integrating Spark + DL: clusters
  • 30. Recurring patterns: clusters Flexible cluster deployments are critical. • Handle different parts of workflow •  Featurization •  Training 1 model •  Training many models • Handle different data & model dimensions •  CPUs vs. GPUs •  Local vs. distributed
  • 31. Diving deeper: communication Spark as a scheduler •  Data-parallel tasks •  Data stored outside Spark Embedded Deep Learning transforms •  Data-parallel tasks •  Data stored in DataFrames/RDDs Cooperative frameworks •  Multiple passes over data •  Heavy and/or specialized communication
  • 32. Streaming data through DL Primary storage choices: •  Cold layer (HDFS, S3) •  Local storage (files, Spark’s on-disk persistence) •  Memory (Spark RDDs, DataFrames) Find out if you are I/O constrained or processor-constrained •  How big is your dataset? MNIST or ImageNet?
  • 33. Cooperative frameworks Use Spark for data input. Alternative communication layer. Examples: •  Skymind’s DeepLearning4J •  DistML and other Parameter Server projects RDD Par((on 1 Par((on n RDD Par((on 1 Par((on m Black box
  • 34. Cooperative frameworks Use Spark for data input. Alternative communication layer. Examples: •  Skymind’s DeepLearning4J •  DistML and other Parameter Server projects Bypass Spark communication: •  Lose reproducibility/determinism of RDDs and DataFrames •  But that’s OK: “reproducibility is worth a factor of 2” (Leon Bottou)
  • 35. Simplified deployment • Pre-installed GPU drivers on machine • Docker image with full GPU SDK GPU integration (in Databricks) Container: nvidia-docker, lxc, etc. GPU hardware Linux kernel NV Kernel driver CuBLAS CuDNN Deep learning libraries (Tensorflow, etc.) JCUDA Python / JVM clients CUDA NV kernel driver (userspace interface) Flexible cluster setup •  CPUs or GPUs •  Transfer & share data across clusters
  • 36. Monitoring Spark + DL integrations The best method depends on task granularity. •  Around tasks •  Inside (long-running) tasks Accumulators in tasks •  Throughput or failure rate within tasks External systems •  Logging: TensorBoard •  General system: Grafana, Graphite, Prometheus, etc.
  • 37. Resources Early release of Deep Learning Pipelines https://github.com/databricks/spark-deep-learning Recent blog posts http://databricks.com/blog Deep Learning Pipelines, TensorFrames, GPU acceleration, getting started with Deep Learning, Intel’s BigDL Docs for Deep Learning on Databricks http://docs.databricks.com Getting started, Spark integration
  • 38. Deep Learning without Deep Pockets For users •  Life’s getting better. •  MLlib APIs for Deep Learning integrations (Deep Learning Pipelines) •  Flexible cluster deployments For developers •  Challenges remain. •  Communication patterns for Deep Learning •  Maintaining GPU software stacks
  • 39. UNIFIED ANALYTICS PLATFORM Try Apache Spark in Databricks! •  Collaborative cloud environment •  Free version (community edition) 39 39 DATABRICKS RUNTIME 3.0 •  Apache Spark - optimized for the cloud •  Caching and optimization layer - DBIO •  Enterprise security - DBES Try for free today. databricks.com
  • 40. Thank you! Twitter: @jkbatcmu à I’ll share my slides. http://databricks.com/try