Ultra Fast Deep Learning in Hybrid Cloud
Using Intel Analytics Zoo & Alluxio
Jennie Wang, Intel
Louie Tsai, Intel
Bin Fan, Alluxio
04/23/2020
Agenda
• Part 1: Deep Learning & Analytics Zoo
• Part 2: Challenges in Hybrid Environment
• Architecture: Analytics Zoo + Alluxio
• Part 3: Experimental Results
Deep Learning & Analytics Zoo
Data Scale Driving
Deep Learning Process
“Machine Learning Yearning”,
Andrew Ng, 2016
Real-World ML/DL Systems Are
Complex Big Data Analytics Pipelines
“Hidden Technical Debt in Machine Learning Systems”,
Sculley et al., Google, NIPS 2015 Paper
Analytics Zoo: End-to-End DL Pipeline Made Easy for Big Data
Prototype on laptop using sample data
Experiment on clusters with historical data
Deployment with production, distributed big data pipelines
• “Zero” code change from laptop to distributed cluster
• Directly accessing production big data (Hadoop/Hive/HBase)
• Easily prototyping the end-to-end pipeline
• Seamlessly deployed on production big data clusters
Analytics Zoo
Unified Data Analytics and AI Platform
https://github.com/intel-analytics/analytics-zoo
• Models & Algorithms: Recommendation, Time Series, Computer Vision, NLP
• Automated ML Workflow: AutoML for Time Series, Automatic Cluster Serving
• Integrated Analytics & AI Pipelines: Distributed TensorFlow & PyTorch on Spark, Spark Dataframes & ML Pipelines for DL, RayOnSpark, Inference Model
• Compute Environment: Laptop, K8s Cluster, Hadoop Cluster, Spark Cluster
Python Libraries (Numpy/Pandas/sklearn/…), DL Frameworks (TF/PyTorch/OpenVINO/…), Distributed Analytics (Spark/Flink/Ray/…)
Powered by oneAPI
Distributed TensorFlow on Spark in Analytics Zoo
Write TensorFlow inline with Spark code

# PySpark code
train_rdd = spark.hadoopFile(…).map(…)
dataset = TFDataset.from_rdd(train_rdd, …)

# TensorFlow code
import tensorflow as tf
slim = tf.contrib.slim
images, labels = dataset.tensors
with slim.arg_scope(lenet.lenet_arg_scope()):
    logits, end_points = lenet.lenet(images, …)
loss = tf.reduce_mean(
    tf.losses.sparse_softmax_cross_entropy(
        logits=logits, labels=labels))

# Distributed training on Spark
optimizer = TFOptimizer.from_loss(loss, Adam(…))
optimizer.optimize(end_trigger=MaxEpoch(5))

Analytics Zoo API in blue
Spark Dataframe & ML Pipeline for DL

# Spark DataFrame code
parquetfile = spark.read.parquet(…)
train_df = parquetfile.withColumn(…)

# Keras API
model = Sequential() \
    .add(Convolution2D(32, 3, 3)) \
    .add(MaxPooling2D(pool_size=(2, 2))) \
    .add(Flatten()).add(Dense(10))

# Spark ML pipeline code
estimator = NNEstimator(model, CrossEntropyCriterion()) \
    .setMaxEpoch(5) \
    .setFeaturesCol("image")
nnModel = estimator.fit(train_df)

Analytics Zoo API in blue
RayOnSpark
Run Ray programs directly on YARN/Spark/K8s cluster
“RayOnSpark: Running Emerging AI Applications on Big Data Clusters with Ray and Analytics Zoo”
https://medium.com/riselab/rayonspark-running-emerging-ai-applications-on-big-data-clusters-with-ray-and-analytics-zoo-923e0136ed6a
Analytics Zoo API in blue

sc = init_spark_on_yarn(...)
ray_ctx = RayContext(sc=sc, ...)
ray_ctx.init()

# Ray code
@ray.remote
class TestRay():
    def hostname(self):
        import socket
        return socket.gethostname()

actors = [TestRay.remote() for i in range(0, 100)]
print([ray.get(actor.hostname.remote()) for actor in actors])
ray_ctx.stop()
Distributed Cluster Serving
[Figure: a simple Python script on a local node or Docker container connects over the network to a Hadoop/Yarn/K8s cluster, where the model takes requests (R1 to R5) from an input queue and writes prediction results (P1 to P5) to an output queue (or files/DB tables).]
https://software.intel.com/en-us/articles/distributed-inference-made-easy-with-analytics-zoo-cluster-serving

# enqueue request
input = InputQueue()
img = cv2.imread(path)
img = cv2.resize(img, (224, 224))
input.enqueue_image(id, img)

# dequeue response
output = OutputQueue()
result = output.dequeue()
for k in result.keys():
    # str() keeps the concatenation valid when the decoded JSON is not a string
    print(k + ": " + str(json.loads(result[k])))
√ Users freed from complex distributed inference solutions
√ Distributed, real-time inference automatically managed by Analytics Zoo
− TensorFlow, PyTorch, Caffe, BigDL, OpenVINO, …
− Spark Streaming, Flink, …
Analytics Zoo API in blue
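The input-queue and output-queue pattern on this slide can be illustrated with a small in-process sketch. It uses only Python's standard `queue` and `threading` modules and is not the Cluster Serving API: `serve`, `toy_model`, and `run_requests` are hypothetical names, and the "model" is a stand-in that returns the index of the largest input value.

```python
import queue
import threading


def serve(model, input_queue, output_queue):
    """Worker loop: take (request_id, payload) pairs, emit (request_id, prediction)."""
    while True:
        item = input_queue.get()
        if item is None:  # sentinel value: no more requests
            break
        request_id, payload = item
        output_queue.put((request_id, model(payload)))


def toy_model(values):
    # Hypothetical stand-in for a real model: index of the largest value.
    return max(range(len(values)), key=values.__getitem__)


def run_requests(requests):
    input_queue, output_queue = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=serve, args=(toy_model, input_queue, output_queue))
    worker.start()
    for request_id, payload in requests:  # client side: enqueue requests
        input_queue.put((request_id, payload))
    input_queue.put(None)
    worker.join()
    results = {}
    while not output_queue.empty():  # client side: dequeue responses
        request_id, prediction = output_queue.get()
        results[request_id] = prediction
    return results


results = run_requests([("r1", [0.1, 0.7, 0.2]), ("r2", [0.9, 0.05, 0.05])])
print(results)
```

The client only ever touches the two queues, which is the property the slide highlights: the serving side can be scaled or moved without changing the client script.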
Scalable AutoML for Time Series Prediction
“Scalable AutoML for Time Series Prediction using Ray and Analytics Zoo”
https://medium.com/riselab/scalable-automl-for-time-series-prediction-using-ray-and-analytics-zoo-b79a6fd08139
Automated feature selection, model selection and hyperparameter tuning using Ray

tsp = TimeSequencePredictor(
    dt_col="datetime",
    target_col="value")
pipeline = tsp.fit(train_df, val_df, metric="mse", recipe=RandomRecipe())
pipeline.predict(test_df)

Analytics Zoo API in blue
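As a rough illustration of hyperparameter tuning by random search against an MSE metric, here is a minimal stdlib-only sketch. It is not how `TimeSequencePredictor` or `RandomRecipe` work internally; the moving-average forecaster, the candidate window sizes, and the validation series are all assumptions made for the example.

```python
import random


def moving_average_forecast(series, window):
    """Predict each point as the mean of the previous `window` points."""
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


def mse(series, window):
    """Mean squared error of the moving-average forecast on `series`."""
    preds = moving_average_forecast(series, window)
    actual = series[window:]
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)


def random_search(series, candidate_windows, trials, seed=0):
    """Sample window sizes at random and keep the one with the lowest MSE."""
    rng = random.Random(seed)
    best_window, best_score = None, float("inf")
    for _ in range(trials):
        window = rng.choice(candidate_windows)
        score = mse(series, window)
        if score < best_score:
            best_window, best_score = window, score
    return best_window, best_score


# Hypothetical validation series: a repeating pattern with period 4.
val_series = [1.0, 2.0, 3.0, 2.0] * 6
best_window, best_score = random_search(val_series, [1, 2, 3, 4, 5], trials=20)
print(best_window, best_score)
```

Each trial is independent of the others, which is what makes this kind of search straightforward to distribute across workers.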
Production Deployment with Analytics Zoo for Spark and BigDL
http://mp.weixin.qq.com/s/xUCkzbHK4K06-v5qUsaNQQ
https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
• Reuse existing Hadoop/Spark clusters for deep learning with no changes (image search, IP protection, etc.)
• Efficiently scale out on Spark with superior performance (3.83x speed-up vs. GPU servers) as benchmarked by JD
[Figure: logos grouped as Technology, Cloud Service Providers, and End Users]
And Many More
*Other names and brands may be claimed as the property of others.
software.intel.com/AIonBigData
Not a full list
Hybrid Cloud & Alluxio
An Open Source Data Orchestration Layer
www.alluxio.io
Big data journey & innovation
• Co-located: co-located compute & HDFS on the same cluster (MR / Hive, HDFS)
• Disaggregated: disaggregated compute & HDFS on the same cluster (Hive, HDFS)
• HDFS for Hybrid Cloud: burst HDFS data in the cloud, public or private
• Support analytics across datacenters: enable & accelerate access to big data across data centers
Challenge: Data Gets Increasingly Remote
from Compute
▪ Challenging Scenarios
▪ Data-driven initiatives in need of more compute
▪ Hadoop system on-prem, but it’s remote
▪ Object data growth in a cloud region, but it’s remote
▪ How to make remote data local to the compute without copies?
▪ Business benefits
▪ Faster data-driven insights: data immediately available for compute
▪ More elastic computing power to solve problems quicker
▪ Up to 80% lower egress costs
Solution: “Zero-copy” bursting to scale to the cloud
[Figure: two deployments of Analytics Zoo on top of Alluxio, one of them connected to an on-premise site]
• Accelerate big data frameworks on the public cloud
• Burst big data workloads in hybrid cloud environments
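One way to picture bursting without manual copies is a read-through cache placed next to the compute: the first read of a path goes to the remote store, and later reads are served locally. The sketch below is a toy under that assumption. `RemoteStore` and `ReadThroughCache` are hypothetical classes, and Alluxio's real caching, eviction, and consistency behavior are not modeled.

```python
class RemoteStore:
    """Hypothetical stand-in for remote storage (on-prem HDFS or an object store)."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.reads = 0  # how many times the remote side was actually hit

    def read(self, path):
        self.reads += 1
        return self._objects[path]


class ReadThroughCache:
    """Serve repeated reads locally; go to the remote store only on a miss."""

    def __init__(self, remote):
        self._remote = remote
        self._local = {}

    def read(self, path):
        if path not in self._local:
            self._local[path] = self._remote.read(path)
        return self._local[path]


remote = RemoteStore({"/imagenet/train/part-0": b"training bytes"})
cache = ReadThroughCache(remote)
for _ in range(5):  # e.g. five passes over the same file
    data = cache.read("/imagenet/train/part-0")
print(remote.reads)
```

Only the first pass crosses the network, which is where the latency and egress savings would come from in a setup like this.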
The Alluxio Story
2013: Originated as the Tachyon project at UC Berkeley AMPLab by Ph.D. student Haoyuan (H.Y.) Li, now Alluxio CTO
2015: Open Source project established & company to commercialize Alluxio founded
Goal: Orchestrate Data at Memory Speed for the Cloud for data driven apps such as Big Data Analytics, ML and AI.
Alluxio is Open-Source Data Orchestration
Data Orchestration for the Cloud
Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface
Drivers: HDFS Driver, GCS Driver, S3 Driver, Azure Driver
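The layering on this slide, with application-facing interfaces on top and storage drivers underneath, can be sketched as a single read call that dispatches to a driver. The example below picks the driver by URI scheme, which is a simplification chosen for illustration and not a description of Alluxio's internals; `Orchestrator` and `InMemoryDriver` are hypothetical names.

```python
from urllib.parse import urlparse


class InMemoryDriver:
    """Hypothetical driver; a real one would talk to HDFS, S3, GCS, or Azure."""

    def __init__(self, name, objects):
        self.name = name
        self._objects = objects

    def read(self, key):
        return self._objects[key]


class Orchestrator:
    """One read API in front of several storage drivers, selected by URI scheme."""

    def __init__(self):
        self._drivers = {}

    def register(self, scheme, driver):
        self._drivers[scheme] = driver

    def read(self, uri):
        parsed = urlparse(uri)
        driver = self._drivers[parsed.scheme]
        return driver.read(parsed.netloc + parsed.path)


fs = Orchestrator()
fs.register("hdfs", InMemoryDriver("hdfs", {"namenode/data/a.txt": "from hdfs"}))
fs.register("s3", InMemoryDriver("s3", {"bucket/data/b.txt": "from s3"}))
print(fs.read("hdfs://namenode/data/a.txt"), fs.read("s3://bucket/data/b.txt"))
```

The calling code is identical for both backends, so switching where the data lives does not require changing the application.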
Zero-Copy Burst: View the I/O Stack
Tier   Speed      Throughput          Access
Mem    FAST       10^4 - 10^5 MB/s    Often
SSD    MODERATE   10^3 - 10^4 MB/s    Limited
HDD    SLOW       10 - 10^3 MB/s      Only when necessary
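To get a feel for what these throughput ranges mean in practice, the sketch below converts them into read times. It reads the slide's ranges as powers of ten (10^4 to 10^5 MB/s for FAST, 10^3 to 10^4 MB/s for MODERATE, 10 to 10^3 MB/s for SLOW). The 100 GB dataset size is a hypothetical figure that does not come from the slides.

```python
def read_seconds(size_gb, throughput_mb_per_s):
    """Time to stream `size_gb` gigabytes at a sustained throughput (1 GB = 1000 MB)."""
    return size_gb * 1000 / throughput_mb_per_s


DATASET_GB = 100  # hypothetical dataset size, not taken from the slides

# Lower and upper throughput bounds per tier, in MB/s.
tiers = {
    "FAST": (10**4, 10**5),
    "MODERATE": (10**3, 10**4),
    "SLOW": (10, 10**3),
}

for name, (low, high) in tiers.items():
    slowest = read_seconds(DATASET_GB, low)
    fastest = read_seconds(DATASET_GB, high)
    print(f"{name}: {fastest:g} s to {slowest:g} s")
```

Under these assumptions the same dataset takes seconds from the fast tier and up to hours from the slow one.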
Benchmark (Louie)
Environments for performance results
EC2 instance type: r5.8xlarge
Number of vCPUs per instance: 32
Memory per instance: 256 GB
Network speed: 10 Gbps
Disk space: 100 GB
Operating system: Ubuntu 18.04
Apache Spark version: 2.4.3
BigDL version: 0.10.0
Analytics Zoo version: 0.7.0
Alluxio version: 2.2.0
Environments for performance results
Application: Inception model on ImageNet
https://github.com/intel-analytics/analytics-zoo/tree/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/inception
Used 6 r5.8xlarge instances with one worker per instance, for 6 executors in total.
Performance measurement
Measure data loading time for the training and test data sets
Job 0: load training data set (two stages: stage 0 and stage 1)
Job 1: load testing data set (two stages: stage 2 and stage 3)
Performance measurement
[Figure: measurements using S3 data vs. using Alluxio data]
Performance Results
Achieve 1.5X speedup in data loading by using Alluxio
Standard deviation is small for runs both with and without Alluxio
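For readers who want to reproduce this kind of summary from their own runs, the sketch below shows how a speedup and per-configuration standard deviation could be computed. The timings are made-up numbers chosen only to keep the arithmetic easy to follow. They are not the measurements behind this slide, and `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize(baseline_seconds, candidate_seconds):
    """Return (speedup, baseline stdev, candidate stdev) for two sets of timings."""
    speedup = mean(baseline_seconds) / mean(candidate_seconds)
    return speedup, stdev(baseline_seconds), stdev(candidate_seconds)


# Made-up timings in seconds, for illustration only (not the measured results).
s3_runs = [30.0, 31.0, 29.0]
alluxio_runs = [20.0, 21.0, 19.0]

speedup, s3_sd, alluxio_sd = summarize(s3_runs, alluxio_runs)
print(f"speedup={speedup:.2f}x, s3 stdev={s3_sd:.2f}s, alluxio stdev={alluxio_sd:.2f}s")
```

Reporting the standard deviation alongside the ratio helps show that the speedup is larger than the run-to-run noise.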
Legal Disclaimers
• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
• No computer system can be absolutely secure.
• Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Intel, the Intel logo, Xeon, Xeon Phi, Lake Crest, etc. are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2019 Intel Corporation
