Intel® Confidential — INTERNAL USE ONLY
Building Deep Learning Powered Big Data Analytics using BigDL
Yiheng Wang, Jennie Wang
BDT / SSG / Intel
2
What is BigDL?
BigDL is a distributed deep learning library for Apache Spark
3
Why BigDL?
§ Big data boosts deep learning (figure source: Andrew Ng, Baidu)
§ Production ML/DL systems are complex (figure source: NIPS 2015 paper)
4
Why BigDL?
BigDL was open sourced on Dec 30, 2016
§ Write deep learning applications as standard Spark programs
§ Run on top of existing Spark or Hadoop clusters (no change to the clusters)
§ Rich deep learning support
§ High performance powered by Intel MKL and multi-threaded programming
§ Efficient scale-out with all-reduce communication on Spark
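To illustrate the all-reduce idea in the last bullet, here is a minimal single-process sketch in plain Scala (no Spark dependency). It shows the general technique only and is not BigDL's implementation; the names `AllReduceSketch`, `reduceScatter`, and `allGather` are hypothetical. Each simulated worker sums one slice of every worker's gradient, then all workers collect the summed slices.

```scala
object AllReduceSketch {
  type Vec = Vector[Double]

  /** Start (inclusive) and end (exclusive) of the slice owned by worker `i`. */
  def sliceBounds(len: Int, n: Int, i: Int): (Int, Int) =
    (i * len / n, (i + 1) * len / n)

  /** Reduce-scatter: worker `i` sums slice `i` of every worker's gradient. */
  def reduceScatter(grads: Seq[Vec]): Seq[Vec] = {
    val n = grads.length
    val len = grads.head.length
    (0 until n).map { i =>
      val (start, end) = sliceBounds(len, n, i)
      (start until end).map(j => grads.map(_(j)).sum).toVector
    }
  }

  /** All-gather: every worker receives the concatenation of all reduced slices. */
  def allGather(slices: Seq[Vec]): Seq[Vec] = {
    val full = slices.flatten.toVector
    slices.map(_ => full)
  }

  /** All-reduce = reduce-scatter followed by all-gather. */
  def allReduce(grads: Seq[Vec]): Seq[Vec] = allGather(reduceScatter(grads))

  def main(args: Array[String]): Unit = {
    // One local gradient per simulated worker.
    val grads = Seq(
      Vector(1.0, 2.0, 3.0, 4.0),
      Vector(10.0, 20.0, 30.0, 40.0),
      Vector(100.0, 200.0, 300.0, 400.0)
    )
    allReduce(grads).zipWithIndex.foreach { case (g, w) => println(s"worker $w: $g") }
  }
}
```

Because every worker reduces only its own slice, the summation work is spread evenly across workers instead of being concentrated on a single node.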
5
Usage and Examples of BigDL
6
Fraud Transaction Detection
Fraud transaction detection is very important to finance companies. A good fraud detection
solution can save a lot of money.
ML solution challenges
§ Data cleaning
§ Feature engineering
§ Unbalanced data
§ Hyperparameter tuning
7
Fraud Transaction Detection
§ Historical data is stored in Hive
§ Easy data preprocessing/cleaning with Spark SQL
§ Spark ML pipeline for complex feature engineering
§ Undersampling + bagging to solve the unbalanced-data problem
§ Grid search for hyperparameter tuning
Powered by BigDL
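The undersampling + bagging step on this slide can be sketched in plain Scala as follows. This is an illustrative sketch under stated assumptions, not the pipeline used in the talk: `Example`, `underSample`, `bag`, and `vote` are hypothetical names, the data is synthetic, and the toy threshold learner stands in for whatever model is actually trained.

```scala
import scala.util.Random

object UnderSampleBagging {
  /** A labeled example; `label` is 1 for fraud, 0 for normal. */
  final case class Example(features: Vector[Double], label: Int)

  /** Keep every fraud example and draw an equally sized random subset of normal ones. */
  def underSample(data: Seq[Example], rng: Random): Seq[Example] = {
    val (fraud, normal) = data.partition(_.label == 1)
    fraud ++ rng.shuffle(normal).take(fraud.size)
  }

  /** Train one model per balanced bag; `train` stands in for any learner. */
  def bag[M](data: Seq[Example], bags: Int, seed: Long)(train: Seq[Example] => M): Seq[M] = {
    val rng = new Random(seed)
    Seq.fill(bags)(train(underSample(data, rng)))
  }

  /** Majority vote over the predictions of all bagged models. */
  def vote[M](models: Seq[M], x: Vector[Double])(predict: (M, Vector[Double]) => Int): Int = {
    val positives = models.count(m => predict(m, x) == 1)
    if (positives * 2 > models.size) 1 else 0
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    // Synthetic, heavily unbalanced data: 1000 normal transactions, 20 fraudulent ones.
    val normal = Seq.fill(1000)(Example(Vector(rng.nextDouble()), 0))
    val fraud = Seq.fill(20)(Example(Vector(2.0 + rng.nextDouble()), 1))

    // Toy learner: threshold halfway between the two class means of the first feature.
    val trainThreshold = (bagData: Seq[Example]) => {
      def mean(label: Int): Double = {
        val xs = bagData.filter(_.label == label).map(_.features.head)
        xs.sum / xs.size
      }
      (mean(0) + mean(1)) / 2
    }
    val predict = (t: Double, x: Vector[Double]) => if (x.head > t) 1 else 0

    val models = bag(normal ++ fraud, bags = 5, seed = 7L)(trainThreshold)
    println(vote(models, Vector(2.5))(predict))
  }
}
```

Each bag sees all of the rare fraud examples but a different random slice of the normal ones, so the ensemble uses more of the majority class than a single undersampled model would.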
8
Product Defect Detection and Classification
Data source
§ Cameras installed on the manufacturing pipeline
Task
§ Detect defects in the photos
§ Classify the defects
9
Product Defect Detection and Classification
(KeyStone ML Pipeline)
10
Object Detection on PASCAL (http://host.robots.ox.ac.uk/pascal/VOC/)
11
Faster-RCNN
§ Faster-RCNN is a popular object detection framework
§ It shares features between the detection network and the region proposal network
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.
12
Object Detection with Faster-RCNN
See the code at: https://github.com/intel-analytics/BigDL/pull/387
13
Language Model with RNN
Pipeline: Text Preprocessing -> RNN Model Training -> Sentence Generating
Text Preprocessing
§ Sentence Tokenizer
§ Dictionary Building
§ Input Document Transformer
Sentence Generating
§ Generate sentences with regard to trigger words
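The three text preprocessing stages on this slide (sentence tokenizer, dictionary building, input document transformer) can be sketched in plain Scala as below. This is a minimal illustration, not the code in the BigDL repository: the object and method names are hypothetical, sentences are split naively on `.`, `!`, and `?`, and index 0 is an assumed convention for out-of-vocabulary words.

```scala
object TextPreprocessing {
  /** Split a document into sentences on . ! ? and each sentence into lowercase word tokens. */
  def tokenize(document: String): Seq[Seq[String]] =
    document.split("[.!?]").toSeq
      .map(_.trim.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq)
      .filter(_.nonEmpty)

  /** Map the `vocabSize` most frequent words to indices 1..vocabSize (ties broken alphabetically). */
  def buildDictionary(sentences: Seq[Seq[String]], vocabSize: Int): Map[String, Int] =
    sentences.flatten
      .groupBy(identity).map { case (w, ws) => (w, ws.size) }.toSeq
      .sortBy { case (w, c) => (-c, w) }
      .take(vocabSize)
      .zipWithIndex.map { case ((w, _), i) => w -> (i + 1) }
      .toMap

  /** Turn each sentence into a sequence of word indices; unknown words become 0. */
  def transform(sentences: Seq[Seq[String]], dict: Map[String, Int]): Seq[Seq[Int]] =
    sentences.map(_.map(w => dict.getOrElse(w, 0)))

  def main(args: Array[String]): Unit = {
    val sentences = tokenize("Long live the King. The King and Queen!")
    val dict = buildDictionary(sentences, vocabSize = 3)
    transform(sentences, dict).foreach(s => println(s.mkString(" ")))
  }
}
```

Capping the dictionary at `vocabSize` keeps the RNN's input and output layers small; rare words all share the unknown index.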
14
RNN Model
See the code at: https://github.com/intel-analytics/BigDL/tree/master/dl/src/main/scala/com/intel/analytics/bigdl/models/rnn
15
Learn from Shakespeare Poems
Output of RNN:
Long live the King . The King and Queen , and the Strange of the Veils of the rhapsodic .
and grapple, and the entreatments of the pressure .
Upon her head , and in the world ? `` Oh, the gods ! O Jove ! To whom the king : `` O
friends !
Her hair, nor loose ! If , my lord , and the groundlings of the skies . jocund and Tasso in
the Staggering of the Mankind . and
16
Fine-tune Caffe/Torch Model on Spark
Diagram: Caffe Model / Torch Model -> Load -> BigDL Model -> Fine-tune -> BigDL Model
Example image styles: Melancholy, Sunny, Macro
• Train on a different dataset based on a pre-trained model
• Predict image style instead of type
• Save training time and improve accuracy
Image source: https://www.flickr.com/photos/
17
Fine-tune Caffe/Torch Model on Spark
Accuracy increases by 10% and convergence time decreases.
18
Integration with Spark Streaming
BigDL integrates with Spark Streaming for runtime training and prediction
Input sources: HDFS/S3, Kafka, Flume, Kinesis, Twitter
Diagram components: Spark Streaming RDDs, BigDL Model (Train, Predict), Evaluator, StreamWriter
19
Tight Integration with Spark SQL and DataFrames
df.select($"image")
  .withColumn(
    "image_type",
    ImgClassifier("image"))
  .filter($"image_type" === "dog")
  .show()
Image classification on ImageNet (http://www.image-net.org)
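For readers who want to try the DataFrame pattern from this slide end to end, here is a small sketch that runs on a local Spark session (Spark 2.x or later on the classpath is assumed). The slide does not show how `ImgClassifier` is defined, so this example substitutes a hypothetical stand-in, `classifyByName`, which labels rows from the file name instead of running a BigDL model; the input rows are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object ImageTypeFilter {
  /** Stand-in classifier (hypothetical): a real pipeline would run a trained model here. */
  def classifyByName(path: String): String =
    if (path.toLowerCase.contains("dog")) "dog" else "other"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("image-type-filter")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Wrap the plain Scala function as a Spark SQL UDF so it can be used in column expressions.
    val imgClassifier = udf(classifyByName _)

    // Made-up input rows standing in for an image table.
    val df = Seq("img/dog_001.jpg", "img/cat_007.jpg").toDF("image")

    df.select($"image")
      .withColumn("image_type", imgClassifier($"image"))
      .filter($"image_type" === "dog")
      .show()

    spark.stop()
  }
}
```

Exposing the model as a UDF is what lets prediction compose with ordinary `select`, `withColumn`, and `filter` calls.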
20
More BigDL Examples
BigDL provides examples to help developers experiment with BigDL and get started with popular models.
https://github.com/intel-analytics/BigDL/wiki/Examples
Models (training and inference example code):
§ LeNet, Inception, VGG, ResNet, RNN, Auto-encoder
Examples:
• Text Classification
• Image Classification
• Load Torch/Caffe model
Building Deep Learning Powered Big Data: Spark Summit East talk by Jiao Wang and Yiheng Wang

